How JS Modules work: a Browser Perspective
AI Generated Video Summary
That is a very cool intro. Incidentally, if anybody is thinking, oh, what question should I ask in the Q&A, ask me about how we got the name SpiderMonkey. It's a funny story.
Here we have an example of such a Ruby program in which we have two versions of a fish picture, one done with angle brackets and the other done as an emoji. And you will notice that when we run this code, indeed, the fish remains looking in the same direction. We may say that this fish is invariant, an important word that we'll be using. So fish is our theme, and since I have fish as a theme, I get to use one of my favourite opening jokes from a keynote which comes from David Foster Wallace in his 2005 keynote to a university, which goes something like this. Two fish are swimming along in the ocean and just minding their own business. They're young. They're new to this beautiful blue world, and an older fish comes by and swims along and says, how's the water, boys? It's just a greeting, and this older fish swims off.
The module system is similar to something we take for granted. ES6 modules were introduced in 2015. Let's explore how modules work, their differences from CommonJS, and how the module system builds a graph. The module record acts as a blueprint for our module.
A little while later, the two fish are still swimming along in silence when one fish turns to the other and says, what the hell is water? This is a great joke for setting the stage for something that we might take for granted, for something that exists in the ether and seems like just there, something that you don't need to question.
One question you might ask is, well, modules, when did they start? How old is this problem? I have something of an answer here for you. I can't give you a definitive answer, but here's a piece of code in the Mozilla code base. It's called the mozJSComponentLoader. I want to call out the date here, which is 1999. This code has a special place in my heart because I happen to be working on it today. It's not every day that you get to say that the code that you're writing is ready to go and get its master's degree.
Let's get into the meat of this talk. How do modules work? What is this module system? What does it do? How does it differ from CommonJS? Why didn't we specify CommonJS? Eventually, we will get into what the future looks like for the module system in browsers. One thing to start with is that the module system builds a graph. This graph allows cycles. If you have a module importing some neighbour, and that neighbour imports an ancestor of your initial module, this will work. It's an important feature for developer ergonomics. You don't want to always be breaking cycles manually. The browser does this for you. So, how do we actually build this graph? How does it work? How do we ensure that you actually can write your modules in this cyclic manner? I'm going to start by taking the perspective of a node. The node, in this case, in this graph, is going to be a single module script. In the specification, we have a data structure called the module record, and the module record is this node. It's a bit like a blueprint for our fish, for our module that we're writing. And it comes somewhere in the middle of the process that we're about to describe. If you want to take a look at the codebase, I do have the source text for the Mozilla implementation of this linked there.
3. Loading ES6 Modules and Module Records
A module record is a table with imports and exports fields. It defines keys for imports and exports, using URL specifiers for imports and pointing to live code for exports. In CommonJS, loading a module is synchronous and blocks the main thread, causing performance issues. ES6 modules are loaded differently, parsing the entire file and building a module record. This record provides a localized view of the module graph, including incoming and outgoing edges.
You can take a look at the slides later. Effectively, what is a module record? It's basically a table with a couple of fields. You have an imports field and an exports field. The imports field defines a number of keys, which are, for example, what you might call variable names, and so does the exports field. The difference is that the imports use the URL specifier for the given child that we want to import, and the exports point to a specific piece of code that's going to be live.
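To make the table concrete, here is a minimal sketch of what such a record might look like. The helper name buildModuleRecord, the example.com URLs and the regular expressions are all assumptions for illustration; a real engine uses a full parser and the record in the specification has more fields than this.

```javascript
// Hypothetical helper: builds a simplified "module record" from source text.
// The regular expressions only handle the simple forms
// `import { a, b } from "spec"` and `export const|let|function name`.
function buildModuleRecord(url, source) {
  const imports = {};
  const importPattern = /import\s*\{([^}]*)\}\s*from\s*["']([^"']+)["']/g;
  for (const [, names, specifier] of source.matchAll(importPattern)) {
    // Imports are keyed by local name and point at a fully resolved URL.
    const resolved = new URL(specifier, url).href;
    for (const name of names.split(",").map((n) => n.trim()).filter(Boolean)) {
      imports[name] = resolved;
    }
  }

  const exports = {};
  const exportPattern = /export\s+(?:const|let|function)\s+([A-Za-z_$][\w$]*)/g;
  for (const [, name] of source.matchAll(exportPattern)) {
    // Exports point at code in this module; nothing has been evaluated yet.
    exports[name] = { definedIn: url };
  }

  return { url, imports, exports };
}

const record = buildModuleRecord(
  "https://example.com/js/main.js",
  `import { swim } from "./path.js";
   export const fish = "<><";`
);
console.log(record);
```

Note that building the record only reads the text of the file. Nothing in the module has run at this point.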
So what's the issue with this design? Why didn't we implement this? The problem is, you will notice that there is no promise syntax here anywhere, and of course CommonJS came before top-level await. One issue here is that this is fully synchronous, and a problem with this is that on the web platform, we can't block the main thread for a network request. For system.js and CommonJS, this was fine. I'm going to give you a ballpark estimation in terms of timing here. Let's say that you've got a processor, and accessing an on-processor register takes about one second. On that scale, accessing main memory, meaning RAM, takes six seconds or so, and getting something off the network takes around four years. This is a really significant chunk of time you're going to be spending on the network. In addition, there is an important invariant of the web platform. It's called run-to-completion. What does run-to-completion mean? If you've studied operating systems, you may be familiar with it. But if you haven't, run-to-completion means a given task will continue running until it voluntarily yields its control of the processor, or it finishes its task. That means that we cannot interrupt a task that, for example, is blocking the main thread, so it will continue to block. That's not a great experience for users of the web, and it makes for a very error-prone API for developers to use. So we couldn't introduce synchronous loading.
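Run-to-completion can be observed in a few lines. This is a small sketch, not anything from the talk's slides: the 50 ms busy-wait is a made-up stand-in for a synchronous load, and the timer stands in for any other work the page would like to do.

```javascript
const log = [];

// Queue a separate task; it cannot start until the current task yields.
setTimeout(() => log.push("timer fired"), 0);

log.push("task start");

// Hypothetical stand-in for a synchronous load: busy-wait for about 50 ms.
const deadline = Date.now() + 50;
while (Date.now() < deadline) {
  // Nothing can interrupt this loop: no timers, no input events, no rendering.
}

log.push("task end");
console.log(log); // the timer callback has not run yet
```

Even though the timer was due after 0 ms, its callback only runs once the current task has finished. A synchronous network request would hold up the page in the same way, for much longer.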
So how do we solve this problem? Well, this brings us to the question of how do we load an ES6 module? It looks a little bit like this. Recall that I said CommonJS is loading, parsing, and evaluating the module all in one step. In ES6, we do that differently. We first parse the entire file, and then we build this module record that I mentioned to you before. The module record gives us this picture of a localised view on to the graph: here I am, these are my neighbours, these are my incoming edges, these are my outgoing edges. Once we have this, we also have the imports, the other URLs that we need to load, so we can go ahead and load another script. As we load that other script, we can go and do the same process here, which is we first parse, and then build that module record.
4. Loading Modules and the Module Map
The parse-fetch loop is an important part of how the module system differs from CommonJS. The module map, a global data structure, helps us break cycles when loading modules. URLs are used to determine the location of modules and fetch the necessary files.
You will notice that we do not do any evaluation in this phase. This is called the parse-fetch loop. It is an important piece of how the module system differs from CommonJS.
An important question you may be wondering is how do you implement that looping behaviour of modules? How do you break loops and ensure that you don't get into an infinite loop of modules importing each other continuously? The answer for that lies in a global data structure representing our graph. Now that's called the module map. You can find it in the specification if you wish, and I'll quickly go through how the module map helps us break cycles when we're loading modules.
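The parse-fetch loop and the module map can be sketched together. Everything named here is an assumption for illustration: the in-memory network, the example.com URLs, and the helpers fetchSource, parse and loadGraph. A browser fetches children in parallel and tracks more state; this version goes one module at a time to keep the cycle handling easy to see.

```javascript
// Hypothetical in-memory "network": fully resolved URL -> source text.
const network = new Map([
  ["https://example.com/main.js", 'import "./path.js";'],
  ["https://example.com/path.js", 'import "./another-module.js";'],
  ["https://example.com/another-module.js", 'import "./path.js";'], // cycle
]);

// Stand-in for a network fetch: always asynchronous, never blocking.
async function fetchSource(url) {
  if (!network.has(url)) throw new Error(`404: ${url}`);
  return network.get(url);
}

// Stand-in for the parser: finds import specifiers and resolves them to URLs.
function parse(url, source) {
  const requested = [...source.matchAll(/import\s+["']([^"']+)["']/g)].map(
    ([, specifier]) => new URL(specifier, url).href
  );
  return { url, requested };
}

// Parse-fetch loop: fetch, parse, queue the children, repeat. No evaluation.
async function loadGraph(rootUrl, moduleMap = new Map()) {
  const pending = [rootUrl];
  while (pending.length > 0) {
    const url = pending.pop();
    if (moduleMap.has(url)) continue; // already in the map: the cycle stops here
    moduleMap.set(url, "fetching");
    const record = parse(url, await fetchSource(url));
    moduleMap.set(url, record);
    pending.push(...record.requested);
  }
  return moduleMap;
}

loadGraph("https://example.com/main.js").then((map) => {
  console.log([...map.keys()]);
});
```

Because the map is keyed by fully resolved URL and an entry is created before the fetch starts, a module that is requested a second time is never fetched again, which is what stops the loop.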
An important note I'm going to make here: you will see that the origin of these URLs is in red. It's important to note that we have to work with URLs, and they are fully resolved URLs that you could actually go and write into your browser and resolve to a real web page. But due to space on the slides, this is going to disappear, so in your mind, if you see a relative URL, always replace that relative URL with a fully resolved URL. But let's get started with how the module map resolves cycles in the graph. We start in a state called unlinked. We have not set up relationships between neighbours on the graph. Main.js is our root, path.js is the immediate child, and another module is the immediate child of path.js.
Main.js starts its linking process: I'm going to go and look at my children. It sees it has a child, and that child starts doing the same thing. Path says, I've got children, I'm going to start linking. Another module is the next one in the algorithm and says, it's my turn to start linking. But I don't have any children, so now I am automatically linked. That's my default state. Because another module has become linked, path.js says, all of my children are now linked, that means I too am linked. Main.js says, all of my children are linked, therefore I am also linked. Then we bring in the cycle, which is import path.js. It's going to go and look at path.js. This is where the cycle break happens. Because path.js is already linked, we terminate the algorithm there, and import path.js gets to say that it, too, is linked. In the Firefox code base, the way we represent our module map is with two hash tables. One of them is the fetching hash table, and the other one is the fetched hash table. They tell us what we are currently loading and what we have already loaded. That prevents us from reloading stuff from the network. Links are all there. The next question is why are we using URLs? The reason we're using URLs is that we have to know where to look, and prior to a certain proposal that we're going to talk about in a second, there was no other way to determine what the location was. Without the location we can't fetch the file, and we don't have the information we need to apply CSP, which is content security policy.
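The linking walk can be sketched as a depth-first traversal over module states. The graph below is a hypothetical one with a cycle between path.js and another-module.js, and treating any module that is not unlinked as a stopping point is a simplification of what the specification does.

```javascript
// Hypothetical graph: main.js -> path.js -> another-module.js -> path.js (cycle).
const graph = {
  "main.js": ["path.js"],
  "path.js": ["another-module.js"],
  "another-module.js": ["path.js"],
};

// Every module starts out unlinked.
const status = new Map(Object.keys(graph).map((name) => [name, "unlinked"]));
const order = [];

function link(name) {
  if (status.get(name) !== "unlinked") return; // already visited: break the cycle
  status.set(name, "linking");
  for (const child of graph[name]) link(child);
  status.set(name, "linked"); // all children are done, so this module is too
  order.push(name);
}

link("main.js");
console.log(order); // [ 'another-module.js', 'path.js', 'main.js' ]
```

The deepest module finishes first and the root finishes last, which matches the order described above.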
5. Module Loading and Evaluation Process
Gecko fetches a script by transforming bytes into a document object model. SpiderMonkey parses and builds the module record. Instantiation creates a module instance with code and state. ES6 uses live bindings, unlike CommonJS, which makes copies of code. Gecko executes the module graph, starting from the last child and then the root.
Both very important. Very quickly, from start to finish, here is the entire module loading process followed by the evaluation process. Gecko gets a stream of bytes from the Internet which it transforms into a document object model, and then it gets to say, oh, I've got a script, I need to fetch that. The fetch goes through another component called Necko. It takes care of downloading stuff from the Internet, and once Necko comes back to Gecko, the fetched bytes are turned into UTF-8 and passed over to SpiderMonkey. SpiderMonkey knows how to parse, it knows how to instantiate, it knows how to evaluate. Gecko knows who to talk to. Those are the relationships between those two components. SpiderMonkey gets this file, it sees, oh, yeah, cool, I can parse this. It parses it, builds the module record, and says, hey, actually, I need path.js, can you fetch that for me? Gecko says, sure, I can get that for you. Here are the contents of that file. SpiderMonkey goes ahead and parses it. Great, we've just finished the loop that we discussed up to this point. Now, the next step here is how do we instantiate this module? Instantiation means taking this module record and turning it into a living, breathing piece of code. Now, what does that mean? It means that a module instance is something that has both code and state in one place. This also differs from CommonJS. CommonJS would make copies of a given piece of code. So you would have multiples of that state and multiples of the code running. In ES6, it's all one singleton. We have something called live bindings. If this fish dies, it will be visible across the entire graph. So we've gone through the entire fetch process. The execution process looks similar. Gecko informs SpiderMonkey, hey, can you execute that module graph you got? Gecko has access to the root module of a given graph. That's coming from a script tag or a dynamic import. SpiderMonkey says, sure. But this root module has several children. I'm going to go and traverse the children all the way down and start by executing the last child that it makes sense to, even if it's in a cycle. We have a way of determining that. So we'll start executing that child, and then we'll execute the root. And that's how it works.
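The difference between a copied value and a live binding can be simulated in a plain script. This is only a simulation: a real live binding comes from an export statement and needs two module files, so here a getter stands in for the binding. The names createFishModule, snapshot, live and kill are made up for the sketch.

```javascript
// Hypothetical stand-in for a module "fish.js" with one piece of state.
function createFishModule() {
  let alive = true; // the module's single piece of state

  return {
    // Copy: a snapshot of the value, taken once.
    snapshot: { alive },
    // Live binding (simulated): every read goes back to the module's variable.
    live: {
      get alive() {
        return alive;
      },
    },
    kill() {
      alive = false;
    },
  };
}

const fish = createFishModule();
const copied = fish.snapshot;
const bound = fish.live;

fish.kill();

console.log(copied.alive); // true: the copy never sees the change
console.log(bound.alive); // false: the live binding does
```

With one instance and live bindings, every importer in the graph sees the same state change.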
6. Module Adoption and Performance
When we talk about cycles, we traverse the entire tree down and set the state of the last possible child. The module adoption on the web is relatively low, with only around 5-8% of the live web using modules. The main reason for this is the network speed and the multiple network requests required by the module system. To address this problem, a proposal called deferred module evaluation has been written, which aims to improve performance and adoption of modules.
When we talk about cycles, so if we have this sort of situation, I'm not going to show the slides because it's exactly the same thing as linking, just replace unlinked with linked, and the transition that you're making is from evaluating to evaluated. We are again traversing the entire tree down and then setting the state of the last possible child and then reversing our direction. All right.
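The evaluation walk has the same shape as linking, but each module also has a body that runs exactly once. In this sketch the module names, the modules table and the body functions are hypothetical, and the state handling is a simplification of the real algorithm.

```javascript
const executed = [];

// Hypothetical linked graph with a cycle between path.js and another-module.js.
const modules = {
  "main.js": {
    children: ["path.js"],
    status: "linked",
    body: () => executed.push("main.js"),
  },
  "path.js": {
    children: ["another-module.js"],
    status: "linked",
    body: () => executed.push("path.js"),
  },
  "another-module.js": {
    children: ["path.js"],
    status: "linked",
    body: () => executed.push("another-module.js"),
  },
};

function evaluate(name) {
  const mod = modules[name];
  if (mod.status !== "linked") return; // evaluating or evaluated: stop here
  mod.status = "evaluating";
  for (const child of mod.children) evaluate(child);
  mod.body(); // children have run, so this module's code can run now
  mod.status = "evaluated";
}

evaluate("main.js");
console.log(executed); // [ 'another-module.js', 'path.js', 'main.js' ]
```

The traversal goes all the way down, runs the last possible child, and then reverses direction until it reaches the root.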
We have three minutes. Let's see if I can get through the future. So here's the graph of what module adoption looks like on the wild web, and this is coming from Google's telemetry. This is exactly what you want a graph to look like. It's going up and to the right. This is perfect. However, we should all be critical of data and always check the axis, and you'll notice that there is a good old-fashioned 8% there, so 8% of the live web is actually using modules. In fact, this is maybe a little bit lower, maybe like 5%, it depends on which part of the data you're looking at. So that's a little low for something that's been in production for the last eight years or so, seven years. It's a little low. So the question is, why aren't people adopting the module syntax? I have a feeling everybody in this room is using it as an authoring tool. However, few people are shipping it to browsers.
So we had this question, what the hell is water? What have I been talking about? I just told you how the module system works in browsers, but it's actually not being used all that much; the module system is not being hit by code. One problem is network speed. This is the problem that we mentioned before when I discussed run-to-completion. Now Tobias did a great job talking about Webpack, and Webpack has been a solution because you can package all of those files that you need to use and ship them directly to the users rather than doing multiple network requests, which is what the module system requires. We are making multiple hits to the network and that can be very costly. In addition there is a tooling proposal called the web bundles proposal. I'm not going to go into details because that's really for tooling, and if we are looking from a developer perspective, there are other, potentially more interesting proposals, I hope, that I can show you. We can continue talking about this broader problem of performance, which is something that holds back the adoption of modules.
7. Lazy Loading and Dynamic Import
To address the issue of immediate loading, a lazy method can be implemented using dynamic import. However, this approach has a significant impact on the code base, as it converts everything into async and await, potentially altering the original intention of the code.
But that means, like, we don't need to immediately load this information up front. All of this stuff happens, like, maybe a minute into the application's runtime, so we could do the load in between. So how do we fix this? You might write a lazy method that does a dynamic import of the file that you were originally interested in, but this has a significant impact on your code base. In particular, it turns everything into async and await. And this async await isn't just a layer of performance on your code; it's semantically changing how that code works, which potentially confuses the original intention of what that code was doing.
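Here is a minimal sketch of that lazy method. The module is inlined as a data: URL only so the example is self-contained; in an application it would be a separate file. The names reportModuleUrl, lazyReport, showReport and render are assumptions made up for the sketch.

```javascript
// Hypothetical module, inlined as a data: URL so the sketch is self-contained.
// In a real application this would be a separate file such as "./report.js".
const reportModuleUrl =
  "data:text/javascript," +
  encodeURIComponent(
    "export function render(n) { return `report for ${n} fish`; }"
  );

let reportModule; // cached promise, so the module is only requested once

function lazyReport() {
  if (!reportModule) reportModule = import(reportModuleUrl);
  return reportModule;
}

// Before: a synchronous `showReport(n)` could return a string directly.
// After: this function, and every caller above it, has to deal with a promise.
async function showReport(n) {
  const { render } = await lazyReport();
  return render(n);
}

showReport(3).then((text) => console.log(text));
```

The load is deferred until the first call, but the return type of showReport has changed from a value to a promise, and that change spreads to everything that calls it.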
8. Module Evaluation and Import Reflection
Deferred module evaluation introduces a new piece to the import statement, which is with lazy init. It's not quite the same as doing a dynamic import, but it will allow you to defer some of that work, so that you can have a more performant application with a couple of caveats. I have been thinking about an alternative syntax here, which is to assert pureness of a module. There is a counter proposal called import reflection. It breaks up the module loader into pieces that you as a developer can program yourself. Finally, one interesting problem is specifiers. There is now a proposal called import maps that will allow you to do that. One extra, import assertions and JSON modules.
Deferred module evaluation introduces a new piece to the import statement, which is with lazy init. This allows you to defer the evaluation of that module. It's not quite the same as doing a dynamic import, but it will allow you to defer some of that work, so that you can have a more performant application with a couple of caveats. Come talk to me after the talk if you want to hear more details.
There is a counter proposal which is not trying to solve the same problem, but it gives us the tools to solve the same problem which I'm also thinking about, it's called import reflection. It breaks up the module loader into pieces that you as a developer can program yourself. In particular, the use case is for WebAssembly. WebAssembly does not always want to instantiate a module as part of the module graph. This allows you to break it up and do it on your own time.
Finally, one interesting problem is specifiers. I'm sure this is one that's close to everyone's heart. Why write this when you can write this? There is now a proposal that will allow you to do that. It's called import maps. It is in the WICG, the Web Incubator Community Group. It's implemented in both Chrome and Firefox, but it isn't a web spec yet. We're waiting on that. So you can bug those folks about getting that into W3C.
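The idea behind import maps can be sketched as a lookup that turns a short specifier into a fully resolved URL. This is a simplified illustration, not the full resolution algorithm: the resolveSpecifier helper, the "fish" and "utils/" keys and the example.com URLs are all made up, and features such as scopes are left out.

```javascript
// Hypothetical import map, as it might appear inside <script type="importmap">.
const importMap = {
  imports: {
    fish: "/js/vendor/fish.js",
    "utils/": "/js/utils/",
  },
};

function resolveSpecifier(specifier, map, baseUrl) {
  const entries = map.imports;

  // Exact match: the whole specifier is a key in the map.
  if (Object.prototype.hasOwnProperty.call(entries, specifier)) {
    return new URL(entries[specifier], baseUrl).href;
  }

  // Keys ending in "/" remap every specifier with that prefix; longest wins.
  const prefix = Object.keys(entries)
    .filter((key) => key.endsWith("/") && specifier.startsWith(key))
    .sort((a, b) => b.length - a.length)[0];
  if (prefix !== undefined) {
    const target = new URL(entries[prefix], baseUrl);
    return new URL(specifier.slice(prefix.length), target).href;
  }

  // Without a mapping, only "/", "./" and "../" specifiers resolve here.
  if (/^\.{0,2}\//.test(specifier)) {
    return new URL(specifier, baseUrl).href;
  }
  throw new TypeError(`Bare specifier "${specifier}" is not mapped`);
}

const base = "https://example.com/app/index.html";
console.log(resolveSpecifier("fish", importMap, base));
console.log(resolveSpecifier("utils/tank.js", importMap, base));
console.log(resolveSpecifier("./local.js", importMap, base));
```

The result is always a fully resolved URL, so the rest of the loading process, including the module map, does not need to change.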
One extra, import assertions and JSON modules. I'm throwing this up here because this allows you to import a JSON module. In the other room, about five minutes ago it started, is an excellent talk by Rick Hart talking about records and tuples, and here is an interesting thought, for import assertions, rather than import asserting that you have a JSON file, maybe you have a read-only file and it's pure data that can't be modified. That's it. Thank you.
9. Q&A: Origin of the Awesome DOM Shirt
During the Q&A session, the speaker humorously reveals that they stole the awesome DOM shirt they are wearing from someone on the DOM team. However, they clarify that they were given the shirt by a team member. They mention that they now work on the DOMs as well.
Awesome, thank you so much for your talk. We'll get to the little Q&A bit. And we'll start with some of the most important questions. We'll get them out of the way first. So where did you get this awesome DOM shirt? I stole it from somebody. You stole it from someone? Yeah. This is on... Don't do this. It's recording now. No, they gave it to me. I got it from somebody working on the DOM team, although I am getting my own version of this shirt because now I also work on the DOMs, so... Yeah, it is... Great. Well, whoever asked that question, here's your answer.