How JS Modules work: a Browser Perspective

Modules are a popular tool for JavaScript developers. Recently, there have been a number of proposals touching on how modules work, including import maps, top-level await, JSON modules, module asserts, and many others. But how does the module loading system work, and how do these proposals augment it? What does it look like from the browser's perspective to load a module tree with an import map? We will explore these questions and more, giving you a behind-the-scenes look at module loading in JS.

Transcription


That is a very cool intro. Incidentally, if anybody is thinking, what question should I ask in the Q&A? Ask me about how we got the name SpiderMonkey. It's a funny story. Hi, everyone. We're going to talk about JavaScript modules. In particular, we're going to take a slightly unusual perspective, which is that of a browser. So my name is... That was fast. My name is Yulia Startsev. I'm a staff software engineer at Mozilla. In particular, I work on SpiderMonkey, which is the JavaScript compiler for Firefox. In fact, it's not just JavaScript. We also do WebAssembly. The portion that I work on is the JavaScript side. I also do a bit of work on the DOM. And my focus is the design and implementation of JavaScript features. To start this talk, when I was writing it, I was a little stuck and I was like, I don't know how I'm going to tie all of this together and make it entertaining. This is dry stuff. I came across this great tweet from my former colleague, Jason Orendorff. He used to work on SpiderMonkey with me and I learned a lot from him, especially about language design and languages. He wrote this great... Let's call it an invariant of all programming languages that implement a string reverse method. Incidentally, JavaScript does not implement a string reverse method. But effectively, string.reverse in any language that you try it on will not reverse a picture of a fish. Here we have an example of such a Ruby program in which we have two versions of a fish picture, one done with angle brackets and the other done as an emoji. You will notice that when we run this code, indeed, the fish remains looking in the same direction. We may say that this fish is invariant, an important word that we will be using. So, fish is our theme, and since I have fish as a theme, I get to use one of my favorite opening jokes from a keynote, which comes from David Foster Wallace in his 2005 keynote to a university, which goes something like this. 
Two fish are swimming along in the ocean and just minding their own business. They're young. They're new to this beautiful blue world, and an older fish comes by and swims along and says, how's the water, boys? It's just a greeting. And this older fish swims off. A little while later, the two fish are still swimming along in silence when one fish turns to the other and says, what the hell is water? This is a great joke for setting the stage for something that we might take for granted, for something that exists in the ether and seems like just there, something that you don't need to question. That's a little bit like what the module system is. I imagine that in the last seven years of ES6 modules existing, many of you have adopted it and use it as your primary way of writing JavaScript modules, especially the import-export syntax. Oh, yes, I forgot to fix that. For some reason, I have that twice. How do we get here? One question you might ask is, well, modules, when did they start? How old is this problem? And I have something of an answer here for you. I can't give you a definitive answer. But here's a piece of code that's in the Mozilla code base. It's called the mozJSComponentLoader. I want to call out the date here, which is 1999. This code base has a special place in my heart because I happen to be working on it today. It's not every day that you get to say that the code that you're writing is ready to go and get its master's degree. However, for many of you in the audience, it's likely that modules really came to the forefront with the introduction of Node, and in particular, this blog post from 2009 by Kevin Dangoor is an important touch point because here he is asking, he's also a former Mozilla employee, he's asking what server-side JavaScript needs. In this blog post, he introduces the need for a module system and introduces a new community group called the ServerJS community group. 
This group was later renamed to CommonJS which I imagine sounds rather familiar. As mentioned, six years later, in 2015, ES6 modules were finally introduced into the specification. Browsers took a little longer to implement it. They came in 2018. It introduced a number of features to the browser including the import-export syntax that many of you are familiar with. So let's get into the meat of this talk. How do modules work? What is this module system? What does it do? How does it differ from CommonJS? Why didn't we specify CommonJS? Eventually, we will get into what the future looks like for the module system in browsers. Now, one thing to start with is the module system builds a graph. This graph allows cycles, so, if you have a module importing some neighbour, and that neighbour imports an ancestor of your initial module, this will work. It's an important feature for developer ergonomics. You don't want to always be breaking cycles manually. The browser does this for you. So, how do we actually build this graph? How does it work? How do we ensure that you actually can write your modules in this cyclic manner? I'm going to start from taking the perspective of a node, so, the node in this case, in this graph, is going to be a single module script. In the specification, we have a data structure called the module record, and the module record is this node. It's a bit like a blueprint for our fish, for our module that we are writing. It comes somewhere in the middle of the process that we are about to describe. If you want to take a look at the codebase, I have the source text for the Mozilla implementation of this linked there. You can take a look at the slides later. What is a module record? It's basically a table with a couple of fields. You have an imports and an exports field, and the imports field defines a number of keys that are, for example, what you might call a variable name, as do the exports. They define a number of keys. 
The difference being that the imports use the URL specifier for the given child that we want to import, and the exports are pointing to a specific piece of code that is going to be live. Now, let's talk a little bit about how we might load this module. How do we start building up this data structure? What is its relationship to its peers? Before I tell you how this works in ES6, I'm going to tell you how it works in CommonJS, because the contrast is important, especially for later discussions. Let's pretend this is the typical case for CommonJS. You write a piece of code and you've got a block of JavaScript that is doing some kind of work, then you hit a require statement. In this case, the work that we are doing is we are creating a dynamic path, and then we are requiring that path and loading it. The browser has already done the step of loading this script, it's parsed it and now it's executing it, so it pauses execution and goes ahead and does another load, another parse, and another execution. In our other module, we start executing, and then we find another require statement, so we go off into the ether of the internet to load that new module, then we continue our execution, and finally, we return to our previous module and continue executing. So, what is the issue with this design? Why didn't we implement this? The problem is you will notice that there is no promise syntax here anywhere, and of course CommonJS was before top-level await. One issue here is that this is fully synchronous, and a problem with this is that, on the web platform, we can't actually block the main thread for something like a network request. For server-side JS and CommonJS, this was totally fine. I'm going to give you a ballpark estimation in terms of timing here. Let's say that you've got a processor with an on-processor register, and accessing that register takes something like one second. To access the main memory, you're looking at six seconds or so. 
Main memory here being RAM. If you want to get something off the network on this timescale, you're looking at around four years. This is a really significant chunk of time you're going to be spending on the network. In addition, there is an important invariant of the web platform, it's called run-to-completion. What does run-to-completion mean? If you've studied operating systems, you may be familiar, but if you haven't, run-to-completion means that a given task will continue running until it voluntarily yields its control of the processor, or it finishes its task. That means that we cannot interrupt a task that, for example, is blocking the main thread, so that will continue to block. That's not a great experience for users of the web, and that makes for a very error-prone API for developers to use, so we couldn't introduce synchronous loading. How do we solve this problem? This brings us to the question of how do we load an ES6 module? It looks a little bit like this. Recall that I said CommonJS is loading, parsing, and evaluating the module all in one step. In ES6, we do that differently. We first parse the entire file and then we build this module record that I mentioned to you before. The module record gives us a picture of a localised view onto the graph, so I am a node, these are my neighbours, these are my incoming edges, these are my outgoing edges. Once we have this, we also have the imports, the other URLs that we need to load, so we can go ahead and load another script. As we load that other script, we can go and do the same process here which is we first parse and then build that module record. You will notice that we do not do any evaluation in this phase. This is called the parse-fetch loop. It's an important piece to how the module system differs from CommonJS. Now, an important question you may be wondering is, well, how do you implement that looping behaviour of modules? 
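The parse-fetch loop described above can be sketched in a few lines. This is only an illustration, not how any engine implements it: the in-memory `network` map, the example.com URLs, the regular-expression "parser" and the function names are all assumptions made so the sketch runs on its own. The point it shows is that each file is fetched, turned into a module record with fully resolved URLs, and its children queued, with no evaluation at any step.

```javascript
// Hypothetical in-memory "network": fully resolved URL -> source text.
const network = new Map([
  ["https://example.com/main.js", 'import { join } from "./path.js";\nexport const app = 1;'],
  ["https://example.com/path.js", 'import "./another-module.js";\nexport function join() {}'],
  ["https://example.com/another-module.js", "export const leaf = true;"],
]);

// "Parse" step: collect static import specifiers without running any code.
// A real engine uses a full parser; this regular expression is only a stand-in.
function buildModuleRecord(url, source) {
  const importPattern = /import\s+(?:[^"']*?\s+from\s+)?["']([^"']+)["']/g;
  const requestedModules = [];
  for (const match of source.matchAll(importPattern)) {
    // Resolve each specifier against the URL of the importing module.
    requestedModules.push(new URL(match[1], url).href);
  }
  return { url, requestedModules, status: "unlinked" };
}

// Parse-fetch loop: fetch, parse, record, then queue the children.
function loadGraph(rootUrl) {
  const records = new Map();
  const queue = [rootUrl];
  while (queue.length > 0) {
    const url = queue.shift();
    if (records.has(url)) continue; // already fetched and parsed
    const source = network.get(url);
    if (source === undefined) throw new Error(`Cannot fetch ${url}`);
    const record = buildModuleRecord(url, source);
    records.set(url, record);
    queue.push(...record.requestedModules);
  }
  return records;
}

const records = loadGraph("https://example.com/main.js");
for (const record of records.values()) {
  console.log(record.url, "->", record.requestedModules);
}
```

Keeping the records in a map keyed by resolved URL also means the same file is never fetched twice in this sketch.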
How do you break loops and ensure that you don't get into an infinite loop of modules importing each other continuously? The answer for that lies in a global data structure representing our graph. Now, that's called the module map. You can find it in the specification if you wish. I will quickly go through how the module map helps us break cycles when we are loading modules. An important note I'm going to make here, you will see that the origin of these URLs is in red. Important to note that we have to work with URLs, and they are fully resolved URLs that you could actually go and write into your browser and resolve it to a real web page, but due to space on the slides, this is going to disappear, so, in your mind, if you see a relative URL, always replace that relative URL with a fully resolved URL. Let's get started with how the module map resolves cycles in the graph. We start in a state called unlinked. Unlinked means we have not linked our neighbours, we have not set up the relationships between neighbours on the graph, and main.js is our root, path.js is its immediate child, and we have another module which is the immediate child of path.js. Main.js starts its linking process. It says I'm going to go and look at my children. It sees it has a child path, and path starts doing the same thing. Path says I've got children, I'm going to start linking. Another module is the next one in the algorithm, and it says it's my turn to start linking but I don't have any children so I'm automatically linked. That's my default state. Because another module has become linked, path.js says all of my children are now linked, that means that I too am linked. Main.js says all of my children are linked, therefore, I am also linked. Then we bring in the cycle which is import path.js. It's going to go and look at path.js. This is where the cycle break happens. 
Because path.js is already linked, we terminate the algorithm there, and import path.js gets to say that it too is linked. In the Firefox codebase, the way that we represent our module map is with two hash tables, one of them is the fetching hash table and the other is the fetched hash table which tells us what we are currently loading and what we have already loaded. It also prevents us from reloading stuff from the network. Links are all there. The next question is why are we using URLs? The reason we are using URLs is we have to know where to look, and prior to a certain proposal that we are going to talk about in a second, there was no way to determine what the location was, and without the location, we can't fetch the file and we don't have the information we need to apply CSP, which is content security policy. Both very important. Very quickly, from start to finish, the entire module loading process followed by the evaluation process. Gecko gets a stream of bytes from the internet which it transforms into a document object model and then it gets to say, I've got a script, I need to fetch that. The fetch goes through another component called Necko, which is our network component, it takes care of downloading stuff from the internet. Once Necko comes back to Gecko, the fetch hands the bytes over, converted into UTF-8, to SpiderMonkey. SpiderMonkey knows how to parse, it knows how to instantiate, it knows how to evaluate. Gecko knows who to talk to. Those are the relationships between those two components. SpiderMonkey gets this file, it sees, yes, cool, I can parse this, it parses it, it builds the module record, it says, hey, actually, I need path.js, Gecko, can you get that for me? Gecko says, sure, I can get that for you. Here is the contents of that file. SpiderMonkey goes ahead and parses it. Great. We've just finished the loop that we discussed up to this point. Now, the next step here is how do we instantiate this module?
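Going back to the linking walk-through, here is a minimal sketch of a depth-first link that terminates on a cycle. The graph is hypothetical (main.js imports path.js, path.js imports another.js, and another.js imports main.js back), and the intermediate "linking" state is an assumption added so that a module already on the current path is not entered twice. The specification's algorithm is more involved than this and tracks whole cycles as a group.

```javascript
// Hypothetical module records: each lists its children and starts out unlinked.
const graph = new Map([
  ["main.js", { requestedModules: ["path.js"], status: "unlinked" }],
  ["path.js", { requestedModules: ["another.js"], status: "unlinked" }],
  ["another.js", { requestedModules: ["main.js"], status: "unlinked" }], // closes the cycle
]);

const linkOrder = [];

function link(name) {
  const record = graph.get(name);
  // Cycle break: a module that is already linking or linked is not visited again.
  if (record.status !== "unlinked") return;
  record.status = "linking";
  for (const child of record.requestedModules) {
    link(child);
  }
  // All children have been handled, so this module is linked too.
  record.status = "linked";
  linkOrder.push(name);
}

link("main.js");
console.log(linkOrder.join(" -> "));
```

The deepest child finishes first and the root finishes last, which is the same down-then-back-up traversal the walk-through describes.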
Instantiation means taking this module record and turning it into a living, breathing piece of code. What does that mean? It means that a module instance is something that has both code and state in one place. This also differs from CommonJS. CommonJS would make copies of a given piece of code, so you would have multiples of that state and multiples of the code running. In ES6, it's all one singleton. We have something called live bindings. If, for example, this fish dies, then that will be visible across the entire graph. So, we've gone through the entire fetch process. The execution process looks very similar. Gecko informs SpiderMonkey, can you execute that module graph you got? Gecko has access to the root module of a given graph, coming from a script tag or a dynamic import. SpiderMonkey says, sure, but this root module has several children. I'm going to go and traverse the children all the way down and start from executing the last child that it makes sense to, even if it's in a cycle. We have a way of determining that. We will start executing that child, and then we will execute the root. That's how it works. When we talk about cycles, so, if we have this sort of situation, I'm not going to show the slides because it's exactly the same thing as linking. Just replace unlinked with linked, and the transition that you're making is from evaluating to evaluated. We are again traversing the entire tree down, and then setting the state of the last possible child and then reversing our direction. All right. We have three minutes. Let's see if I can get through the future. So here's the graph of what module adoption looks like in the wild web, and this is coming from Google's telemetry. This is exactly what you want a graph to look like. It's going up and to the right. This is perfect. However, we should all be critical of data and always check the axis and you will notice there is a good old-fashioned 8 per cent there. 8 per cent of the live web is using modules. 
This is maybe a little bit lower, maybe like 5 per cent. It depends on which part of the data you're looking at. That's a little low for something that's been in production for the last eight years or so, seven years. It's a little low. The question is why aren't people adopting the module syntax? I have a feeling everybody is using it as an authoring tool in this room. However, few people are shipping it to browsers. So we had this question, what the hell is water? What have I been talking about? I just told you how the module system works in browsers but it's not being used all that much. The module system is not being hit by code. One problem is network speed. This is the problem that we mentioned before when I discussed run-to-completion. Tobias did a great job talking about Webpack. Webpack has been a solution because you can package all of those files that you need to use and ship them directly to the users rather than doing multiple network requests which is what the module system requires. We are making multiple hits to the network and that can be very costly. In addition, there is a tooling proposal called the web bundles proposal. I'm not going to go into details because that's really for tooling. If we are looking from a developer perspective, there are other more interesting proposals potentially I hope that I can show you. But we can continue talking about this broader problem of performance which is something that pauses the adoption of modules. Recall that we have this invariant of run-to-completion which means that we cannot stop the main thread and block it with a network request because we have no way of preemption, we have no way of reordering tasks. They have to be run to completion. And the other invariant which I haven't mentioned yet is order of execution. 
What module syntax does, if you have noticed, because we are going to the last child and executing that first, it means that if you have concatenated that into one big file, the behaviour of a concatenated file is exactly the same as the loaded module. So, this is a proposal that I have written that's currently at stage one, it's called deferred module evaluation that's trying to address this problem. We have, let's say, a JavaScript file with an import that has been, this JavaScript file has been written for the best possible readability from a programmer's perspective. We have a static import and several rarely used functions that will eventually use this code. That means we don't need to immediately load this information up front. All of this stuff happens like maybe a minute into the application's run time which we could do a load in between. So, how do we fix this? Well, you might write a lazy method that does a dynamic import of the file that you were originally interested in, but this has a significant impact on your codebase, and particularly it turns everything into async and await, but this async await is just a layer of performance on your code, but it's semantically changing how that code works, which potentially confuses the original intention of what that code was doing. Deferred module evaluation introduces a new piece to the import statement which is with lazy init. This allows you to defer the evaluation of that module. It's not quite the same as doing a dynamic import but it will allow you to defer some of that work so that you can have a more performant application with a couple of caveats. Come talk to me after the talk if you want to hear more details. I will drop one more interesting nugget for you to think about. I have been thinking about an alternative syntax here which is to assert pureness of a module. What does pure mean in JavaScript? Excellent question. Come talk to me about that after. 
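The "lazy method that does a dynamic import" workaround mentioned above can be sketched as follows. So that the sketch runs without a second file, `loadRarelyUsed` is a hypothetical stand-in for `() => import("./rarely-used.js")`, and `formatReport` is an invented export. It shows the cost described in the talk: once the load is deferred, every caller has to become async.

```javascript
// Hypothetical stand-in for `() => import("./rarely-used.js")`.
// It resolves to a module-like object and counts how often it is called.
let loadCount = 0;
function loadRarelyUsed() {
  loadCount += 1;
  return Promise.resolve({ formatReport: (name) => `report for ${name}` });
}

// Wrap a loader so the work happens at most once, on first use.
function lazy(loader) {
  let pending = null;
  return function load() {
    if (pending === null) pending = loader();
    return pending;
  };
}

const rarelyUsed = lazy(loadRarelyUsed);

// Nothing has been loaded yet at this point; the load starts on the first call.
// The caller has to be async, and so does everything that calls it.
async function printReport(name) {
  const mod = await rarelyUsed();
  return mod.formatReport(name);
}

const first = printReport("fish");
const second = printReport("water");
first.then((text) => console.log(text));
second.then((text) => console.log(text));
```

A deferred-evaluation import, as proposed in the talk, aims to keep the call sites synchronous; the sketch above only illustrates the pattern it is meant to replace.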
This may be a more interesting way of hinting to the browser that this can be lazily loaded. So, you can find this on the URL that you find in the slide. It's currently stage 1 and I'm soliciting feedback, that's why I'm here after all, on this proposal. There is a counter proposal which isn't trying to solve the same problem but it gives us the tools to solve the same problem which I'm also thinking about. It's called import reflection. It breaks up the module loader into pieces that you as a developer can program yourself. In particular, the use case is for WebAssembly. It doesn't want to instantiate a module as part of the module graph. This allows you to break it up and do it on your own time. Finally, one interesting problem is specifiers. I'm sure this is one that's close to everyone's heart. Why write this when you can write this? There is now a proposal that will allow you to do that. It's called import maps. It is in the WICG, the Web Incubator Community Group. It's implemented in both Chrome and Firefox but it isn't a web spec. We're waiting on that. You can bug those folks about getting that into W3C. One extra, import assertions and JSON modules. I'm throwing this up here because this allows you to import a JSON module. In the other room, about five minutes ago it started, is an excellent talk by Ricard talking about records and tuples. Here's an interesting thought for import assertions: rather than asserting that you have a JSON file, maybe you have a read-only file and it's pure data that can't be modified. That's it. Thank you.
Awesome. Thank you so much for your talk. We will get to the little Q&A bit. We will start with some of the most important questions. We will get them out of the way first. Where did you get this awesome DOM shirt? I stole it from somebody. You stole it from somebody. This is on... Don't do this. They gave it to me. I got it from somebody working on the DOM team. 
I am getting my own version of this shirt because now I also work on the DOM. It is great. Whoever asked that question, here's your answer. There's another question. How did you come up with the name SpiderMonkey? I did not come up with the name SpiderMonkey. In fact, the person who came up with the name SpiderMonkey was Brendan Eich, the original author of the JavaScript language, not specification because that came later. The funny story about why it's named SpiderMonkey is Brendan Eich was... SpiderMonkey is the second JavaScript compiler. The first JavaScript compiler was something else. SpiderMonkey is the second JavaScript compiler that has been made, and it was so ugly that Brendan Eich who had just been to the zoo with his kids, he said this is the ugliest piece of code I've ever seen, and it reminds me of the ugliest animal I've ever seen, which is a spider monkey. Wow. That's some sentiment there. Good. I think we have an actual question about the content, which is always great. Are there differences between the ES modules implementation between Mozilla and Chromium or WebKit that we should be aware of? No. That's why we have a standard. That's a very elaborate answer. There can't be any confusion about the answer. As we don't have any more questions for you, but I'm sure people know how to find you in the hallway track, and in the Q&A later, we will move right on. Sounds good to me. Thank you.
26 min
16 Jun, 2022
