JavaScript Iteration Protocols

How many ways do you know to do iteration with JavaScript and Node.js? while, for loop, for...of, .map(), .forEach(), streams, iterators, etc! Yes, there are a lot of ways! But did you know that JavaScript has iteration protocols to standardise synchronous and even asynchronous iteration?

In this workshop we will learn about these protocols and discover how to build iterators and iterable objects, both synchronous and asynchronous. We will learn about some common use cases for these protocols, explore generators and async generators (great tools for iteration) and finally discuss some hot tips, common pitfalls, and some (more or less successful) wild ideas!


The initial task assigned by the CTO was to help troubleshoot a production issue by analyzing log files to determine the impact on customers and count occurrences of a specific error for each customer.

The team approached the problem by using Node.js to read the log files, parse each line as JSON, filter to keep lines with specific errors, and then reduce the data to count occurrences per customer.

The next day, a 'RangeError: Invalid string length' error occurred because the script attempted to load the entire content of a file into memory as a single string, which is not feasible for large log files.

The proposed solution was to process the log files line by line using streams and iterators, which avoids loading the entire file into memory and helps handle very large files efficiently.

Iterators are a fundamental concept in JavaScript that allow sequential access to the elements of a collection. They are important because they provide a mechanism to handle data one item at a time, which is efficient especially for operations on large or potentially infinite data sets.
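
As a minimal sketch of this idea, the hypothetical countdown function below implements the iterable and iterator protocols by hand. The function name and its behaviour are assumptions made for illustration and do not come from the talk.

```javascript
// Hypothetical example: an iterable that counts down from `start` to 0.
function countdown(start) {
  return {
    // Iterable protocol: Symbol.iterator returns an iterator object.
    [Symbol.iterator]() {
      let current = start
      return {
        // Iterator protocol: next() returns an object with `done` and `value`.
        next() {
          if (current < 0) {
            return { done: true, value: undefined }
          }
          return { done: false, value: current-- }
        }
      }
    }
  }
}

// for...of and the spread syntax both consume iterables one item at a time.
const values = [...countdown(3)]
console.log(values) // [ 3, 2, 1, 0 ]
```

Only one value exists at any moment inside the iterator, which is why the same pattern also works for very large or infinite sequences.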

Yes, generator functions can be used with asynchronous operations. By combining generators with async functions (async generators), you can handle asynchronous data flows more effectively, for example by pausing the function execution until data has been fetched and then yielding it to the consumer.
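
A rough sketch of an async generator follows. The fetchItem helper is a made-up stand-in for any asynchronous data source (it simply resolves after a short timer), so treat the names here as assumptions rather than code from the talk.

```javascript
// Hypothetical async data source: resolves with a value after a short delay.
function fetchItem(id) {
  return new Promise((resolve) => {
    setTimeout(() => resolve(`item-${id}`), 10)
  })
}

// An async generator can use both `await` and `yield`.
async function* fetchItems(ids) {
  for (const id of ids) {
    // Execution pauses here until the data is available...
    const item = await fetchItem(id)
    // ...and pauses again here until the consumer asks for the next value.
    yield item
  }
}

async function main() {
  // for await...of consumes the values one by one, in order.
  for await (const item of fetchItems([1, 2, 3])) {
    console.log(item)
  }
}

main()
```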

The async-iterable protocol allows for the handling of asynchronous data sources, such as streaming data or paginated API responses. It provides a structured way to process data chunks as they become available, which is crucial for performance and efficiency in web applications.
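
To make the paginated case concrete, here is a small sketch that implements the async-iterable protocol by hand. The PAGES data and the fetchPage function are invented for illustration (an in-memory stand-in for a real API), and the convention of resolving to null after the last page is an assumption of this sketch.

```javascript
// Hypothetical in-memory "API" with three pages of results.
const PAGES = [['a', 'b'], ['c', 'd'], ['e']]

// Simulates an asynchronous request for a single page.
// Assumption: resolves to null when there are no more pages.
function fetchPage(pageNumber) {
  return Promise.resolve(PAGES[pageNumber] ?? null)
}

function paginate() {
  return {
    // Async-iterable protocol: Symbol.asyncIterator returns an async iterator.
    [Symbol.asyncIterator]() {
      let pageNumber = 0
      return {
        // Async-iterator protocol: next() returns a promise of { done, value }.
        async next() {
          const page = await fetchPage(pageNumber++)
          if (page === null) {
            return { done: true, value: undefined }
          }
          return { done: false, value: page }
        }
      }
    }
  }
}

async function main() {
  // Each page is requested only when the loop asks for it.
  for await (const page of paginate()) {
    console.log(page)
  }
}

main()
```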

Luciano Mammino
27 min
01 Jun, 2023


We are working on troubleshooting a production issue in a startup. The CTO identified a problem with loading large files into memory and suggested reading the file line by line. We learn about iterators and generators in JavaScript, which allow us to process data one item at a time. Async generators, which combine async functions and generator functions, can be used for file processing. The speaker also discusses using a for loop instead of map, filter, and reduce. The talk concludes with the speaker mentioning polyfilling the implementation using core-js and offering free workshops on iteration protocols and Node.js streams.

1. Introduction to the Problem

We just started to work in a new startup. The CTO reached out to us for help with troubleshooting a production issue. We need to analyze log files, count occurrences of a specific error, and group them by customer.

So, thank you for the introduction. I know we are just after lunch, but I hope we can forget about that for a second and start a new adventure together. So this is basically the story. We just started to work in a new company, it's a startup, and this is our first day in this startup. And the first thing that happens is that the CTO reaches out to us asking, can you help me with troubleshooting a problem that I have? It turns out that it's an actual production issue. And the CTO is looking at some logs and is trying to figure out which customers are impacted by these logs. And the way that we can help the CTO is by helping him to analyze some log files, where we want to count how many are FCKD, I don't know what that means, that are in that particular file, and we want to count how many of them are happening for each and every customer. So if we look at the log file, it looks more or less like this. We are interested in these three lines here, in this particular section, just because that particular error shows up. And then we want to take all these lines and count how many occurrences there are for each customer. So this should be pretty easy, so let's work together on an implementation for this.

2. Reading and Processing Log Files

We import the readFile function from Node.js to read the log file and store its content in the raw data variable. We then split the content into individual lines and parse each line into a JSON object. After filtering for the specific error, we use reduce to group and count occurrences by customer. We provide the script to the CTO, who is initially impressed. However, the next day, the CTO identifies a problem with loading large files into memory. Instead, we should read the file line by line, filtering and aggregating the results as we go.

So we need to read log files. So the first thing that we do is we import readFile from Node.js, then we use that function to actually read the file, and this is going to load all that file information into this variable called raw data. Then we want to split on every line because eventually we want to process every single line individually.

Now we take this array of lines, we map it so that we can use JSON.parse to actually create JSON objects for every single line. And at this point we can filter and just say, okay, just keep the lines where we have that message dot error equal to the specific error we are looking for. And finally, we can just do a reduce to group things, actually to count things by specific customers. And we just log the result of that reduce. And we should see something like this. Basically, we see an object where the keys are customer IDs and the values are the number of times that that particular error was happening for every single customer.
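
The slides are not part of this transcript, so the following is only a reconstruction of what such a script could look like. The function name countErrorsNaive, the customerId field, the exact log shape, and the use of CommonJS require are all assumptions made for this sketch.

```javascript
const { readFile } = require('node:fs/promises')

// Assumed log shape (hypothetical): one JSON object per line, for example
// {"customerId": "c1", "message": {"error": "SOME_ERROR"}}
async function countErrorsNaive(path, errorName) {
  // Loads the ENTIRE file into memory as a single string.
  const rawData = await readFile(path, 'utf-8')

  return rawData
    .split('\n')
    // Skip empty lines (such as the one after a trailing newline).
    .filter((line) => line.trim() !== '')
    // Parse every line into an object.
    .map((line) => JSON.parse(line))
    // Keep only the entries with the error we are looking for.
    .filter((entry) => entry.message.error === errorName)
    // Count the occurrences per customer.
    .reduce((counts, entry) => {
      counts[entry.customerId] = (counts[entry.customerId] ?? 0) + 1
      return counts
    }, {})
}
```

Calling countErrorsNaive with a file path and an error name resolves to an object whose keys are customer IDs and whose values are error counts. Every step in the chain also creates a full intermediate array, on top of the raw string that readFile returns.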

So we give this script to the CTO, the CTO is really happy and is telling us, great job, great first day, looking forward to seeing you tomorrow. So experience plus plus. All right.

The next day, the CTO is telling us, well, you know that script you gave me yesterday? Actually, now there is a problem. I am seeing this error called RangeError: Invalid string length. Now I'm going to show you the code again. Can anyone tell me in which line we have a problem? Come on, don't be shy. Okay, somebody says number three. Good guess. Why is this a problem? The problem is there because we are basically loading the entire content of the file in memory. We are just saying take all that data and store it into a variable as a string. And if the file starts to be relatively big, we are starting to talk about the order of gigabytes, you can start to see that particular problem. Or if it's over two gigabytes, you will see another problem, which is very similar. So the problem conceptually here is that we are preloading all the information in memory, which for big files is not something that is going to work.

So what is an ideal approach instead? The best approach would be, OK, why not parse this file line by line? At the end of the day, we just care about reading one line at a time. We don't need to preload everything in memory. So an ideal implementation would look like this. We just load this one line, we parse it, we filter for it, and we start to aggregate the result. We get the next line and in this case, when we try to filter, we realize we don't care about this line, so it just gets disregarded. We move to the next line, again we parse, we filter, and we aggregate. And again, just to show you another example, we parse, we filter, aggregate, and in this case we find again the first customer, so we can just increment the counter to 2.
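
One possible way to sketch this line-by-line flow is with the readline module from Node.js, whose interface can be consumed with for await...of. This is not necessarily the implementation shown in the talk; the function name countErrorsByLine, the customerId field, and the log shape are assumptions for illustration.

```javascript
const { createReadStream } = require('node:fs')
const { createInterface } = require('node:readline')

// Assumed log shape (hypothetical): one JSON object per line, for example
// {"customerId": "c1", "message": {"error": "SOME_ERROR"}}
async function countErrorsByLine(path, errorName) {
  // The readline interface is an async iterable that yields one line at a time.
  const lines = createInterface({
    input: createReadStream(path, { encoding: 'utf-8' }),
    crlfDelay: Infinity // treat \r\n as a single line break
  })

  const counts = {}
  for await (const line of lines) {
    if (line.trim() === '') continue // skip empty lines
    const entry = JSON.parse(line) // parse
    if (entry.message.error !== errorName) continue // filter
    counts[entry.customerId] = (counts[entry.customerId] ?? 0) + 1 // aggregate
  }
  return counts
}
```

With this approach only the current line and the running counts are kept in memory, so the size of the file no longer limits what the script can process.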


