1. Introduction to ESM and Module Loading in Node.js
ESM Loaders: Enhancing Module Loading in Node.js. Hi, I'm Gil Tayar. I'm pretty old. We go all the way back to the 80s. I always was a developer. I did lots of stuff, but basically that is what I love to do. Another thing I love to do is use npm and ESM and all that to enhance my code using extreme modularity and testing code. I'm currently a software engineer at Microsoft, working on cool stuff like the Azure Data Explorer, which is a real-time analytics database engine. Nothing to do with ESM.
What is module loading in Node.js? Let's talk a little bit about module loading. First of all, what is a module? A module is basically a glorified file; it's just called a module. So when I run a module or import a module, I'm basically executing the code inside the module. Module loading is glorified execution. So when I'm importing module.js at the top of main.js, Node.js executes module.js and then continues executing main.js. It's all about glorified execution.

A note to TypeScript users: if you're using TypeScript in Node.js, you're using the import/export syntax, but you're probably transpiling that to CommonJS, because ESM is pretty new to the game and TypeScript usually, by default, transpiles to CommonJS. So if you try the things I'm doing here with ESM loaders on your TypeScript code, it won't work. You have to tell TypeScript not to transpile the ESM away, using the nodenext module setting. There's an example in the GitHub repo. Don't worry, at the end you'll have a link to the GitHub repo, with a nice QR code, and a link to this presentation.
What are the phases of module execution? How does Node.js execute a module? It's not simple. First of all, we resolve the URL. So if we're doing import './module.js', we have to resolve it to the absolute path on the disk. Except that ESM doesn't talk about paths. It talks about URLs.
2. ESM Module Loading in Node.js
ESM module loading in Node.js involves resolving URLs and reading files from the disk. It recursively loads all the modules in the file without executing them. Once all the files are loaded, it executes the code of the modules in the right order.
Usually the URLs are file URLs, but they're URLs nonetheless. Once Node.js resolves the URL, it reads the file from the disk. And then, and this is different from CommonJS but let's not get into how, it recursively loads all the modules that the file imports, without executing them.
Notice that up to this point we haven't executed any module. Once Node.js has recursively loaded all the files, it executes the code of all the modules bottom-up, in the right order. These are the phases of module execution in Node.js, and basically in browsers too: anywhere ESM is implemented, these are the phases. And this is different from CommonJS, but we won't get into how.
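The two steps described here, load everything first and only then execute bottom-up, can be sketched with a toy model. This is not how Node.js is implemented: the `graph` object, the file names, and both functions are hypothetical, and the sketch assumes there are no circular imports.

```javascript
// Toy module graph (hypothetical): each key imports the modules listed in its array.
const graph = {
  'main.js': ['a.js', 'b.js'],
  'a.js': ['shared.js'],
  'b.js': ['shared.js'],
  'shared.js': [],
};

// Phase 1: "load" every module reachable from the entry point, executing nothing.
function loadAll(entry, loaded = new Set()) {
  if (loaded.has(entry)) return loaded;
  loaded.add(entry);
  for (const dependency of graph[entry]) loadAll(dependency, loaded);
  return loaded;
}

// Phase 2: "execute" bottom-up, so every module runs after its dependencies
// and each module runs only once. Assumes the graph has no cycles.
function executionOrder(entry, executed = []) {
  for (const dependency of graph[entry]) {
    if (!executed.includes(dependency)) executionOrder(dependency, executed);
  }
  if (!executed.includes(entry)) executed.push(entry);
  return executed;
}
```

In this toy graph, shared.js runs first and main.js runs last, even though main.js is the file we asked Node.js to run.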
3. Module Loaders and Overriding Modules
Module loaders enhance the resolution and loading phases in Node.js. Examples include pnpm, which changes how modules are found, and import maps for overriding bare specifiers. We can write a loader to override modules by specifying a loader using the --loader flag. The loader.js file reads the overrides.json file and exports a resolve function that takes the module specifier as a parameter.
Okay, let's talk about module loaders. Module loaders enhance the first two phases: they change and transform how Node.js does resolution and how it reads the file, the module. Let's talk about enhancing the resolution phase first, with an example, and then we'll give an example for the loading phase.
Let's talk about the resolution phase. Let's give two examples. One is pnpm. It's a package manager, like yarn and npm, but pnpm changes the way modules are found: it searches for them in a central cache repository on your disk. Currently, because they don't have loaders, they use various hacks with hard links to make it work. But if they had loaders, if it was native ESM, they could modify the resolution phase to search wherever they want. Another example is import maps, overriding bare specifiers, and this is what we'll be trying to do. Let's write a loader that does this, and you'll understand what it is in a second.
Now, notice that the sample code is demonstration code. Do not use it in production: it doesn't have error handling and it doesn't handle edge cases. These are just toy examples.
Okay, let's talk about overriding modules. This is main.js, and we're importing "a module to override". That module doesn't exist in node_modules or anywhere else, so if I run it I get a module-not-found error, obviously. But let's say I have an overrides.json file which says that "a module to override" actually lives in ./moduleoverride.js. And this is moduleoverride.js: we console.log "module overridden", and we want this to work. So we want to run main.js with the loader. How do we do that? We add --loader and point it to the loader, and if we run it, boom, it works. The syntax is --loader, then an equals sign or a space, it doesn't matter, and then ./loader.js. Note that the ./ is important: if you say loader.js, it will look for the package loader.js, not the file loader.js. So the ./ is essential.
Okay, let's look at loader.js. Don't worry, we'll go through it piece by piece and understand it. This is the loader. It's very, very small; as you can see, writing a loader is not that difficult. First of all, we have to read overrides.json, so we just read it using top-level await and parse it to get the overrides. Perfect, easy, no problem. Now we export a function. That function has to be named resolve, because when Node.js loads a loader it looks for that exported function. It will receive three parameters, which we'll see in a second, and it's async, so you can do whatever you want in there; you're not limited to synchronous code. So let's talk about the three parameters. First of all, the specifier. The module specifier is whatever is inside the quotes, in our example "a module to override", just as it appears in the code.
4. Enhancing Loading Phase: HTTP and TypeScript
We'll talk about nextResolve and passing specifiers on to Node.js. If the specifier is in the overrides, we take the overriding specifier and pass it on to Node.js. Enhancing the loading phase involves loading directly from HTTP and loading TypeScript code without building it. Loading over HTTPS from esm.sh gives us ESM modules for all npm packages. Running the code without a loader results in an unsupported URL scheme error, but with a loader, it works.
The context, which we'll see later, has all sorts of things in it, but mainly we'll talk about nextResolve: if you can't resolve something, or don't want to, and you want to pass it on to Node.js, you call nextResolve. So let's look at the code. We check whether the specifier is in the overrides (remember, overrides is the JSON file). If it's not in the overrides, we call nextResolve with the specifier. So if we don't know what to do with it, we pass it on to Node.js. Easy. But if we do know what to do with it, we take whatever is in the overrides for that specifier and pass that on to Node.js. Now Node.js will not get the bare specifier, "a module to override"; it will get ./moduleoverride.js. But we still pass it on to Node.js, and boom, there we go. It works. See how easy it is to write an ESM loader? It really is that simple. Obviously edge cases, error handling, blah blah blah, but in essence, this is it. So we've seen the resolution phase. Let's talk about the loading phase and how to override that.
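A minimal sketch of the override resolver just described might look like the following. It is not the code from the slides: the `overrides` object stands in for the parsed overrides.json, and the specifier and file names in it are hypothetical. In a real loader file the function would be exported under the name resolve.

```javascript
// Stand-in for the parsed overrides.json (hypothetical names). A real loader
// file could read and JSON.parse the file once, using top-level await.
const overrides = { 'a-module-to-override': './module-override.js' };

// A real loader.js would write `export async function resolve(...)`.
async function resolve(specifier, context, nextResolve) {
  if (Object.hasOwn(overrides, specifier)) {
    // We know this one: hand the overriding specifier to the next resolver.
    return nextResolve(overrides[specifier], context);
  }
  // We don't know what to do with it: pass it on unchanged.
  return nextResolve(specifier, context);
}
```

Either way the final answer comes from the next resolver in the chain, usually the Node.js one; the loader only decides which specifier that resolver gets to see.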
So remember, resolving is taking the specifier and resolving it to a full URL, and reading the file is what we mean by loading. Okay? So why enhance the loading phase? Well, there are lots of examples. We're going to do two things: loading directly from HTTP instead of from a file, and loading TypeScript code without building it, so that we can give it a .ts file and it will just work. Let's talk about loading from HTTP. Okay, so this is the code. We're importing from an https://esm.sh URL. esm.sh, and there are others out there, is a service that gives you ESM modules of all the npm packages out there. It's really, really, really cool. Okay, so this is what we want to run. If we run it like this, with no loader, we will get an unsupported URL scheme error, because Node.js is telling us, I don't know what to do with HTTPS. Perfect. But if we run it with a loader, boom, it works. We're very, very happy. Let's see the loader. Just like we have an export of resolve for resolving the URL, there's an export of load.
5. Loader and Fetching Source Code
The HTTP loader is simple: it exports an async load function. The function receives the module URL after resolution and uses fetch to get the source code. Redirects are followed, and the result is returned to Node.js.
So there we go. This is the loader. As you can see, it's just as simple. We'll go over it piece by piece, just like previously. Just like the resolver, the loader has a load function, which is an exported async function, like in the previous example.
Let's talk about the parameters. The URL is the module URL. Notice this is the module URL after resolution, so all the resolvers are finished with it and we get the full absolute URL. The context we'll see later. And nextLoad: if you can't load something, you can use nextLoad to load it. It's usually the Node.js one. Okay. So if the URL isn't an HTTP or HTTPS URL, we just pass it on to nextLoad. Very, very easy. If it is, we use fetch. Fetch is now native to Node.js, just like in the browser. You can use it without importing anything, from Node v18, I think, or something like that. So we use fetch to fetch the source code. We follow redirects, because esm.sh has redirects, and we have the source code. What do we do with it? We return the result to Node.js. The source code of the module is in source.
6. Module Format and Resolver
Format of the module: ESM, CommonJS, JSON, WASM. Modules from HTTPS are always returned as module, with shortCircuit set. The Node.js resolver throws an error for HTTPS URLs; a bug and a pull request were opened. Until that lands, the HTTP loader needs its own resolver, which has to deal with relative, absolute, and bare specifiers.
Format is the format of the module: is it module, which means ESM, or commonjs, json, wasm. In this case, we're always saying that if it's coming from HTTPS, it has to be module. And shortCircuit. shortCircuit is telling Node.js: look, we didn't call nextLoad, we know what the source is, and that's fine. If we don't add shortCircuit: true, Node.js will fail and say, are you sure you didn't want to call nextLoad? If you didn't, please send shortCircuit: true. And then you add shortCircuit: true, and you're good.
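Putting the pieces of the load function together, a sketch under the same demo-only assumptions could look like this. It relies on the global fetch, so it assumes a Node.js version that ships one; the response.ok check is an addition for illustration, and a real loader file would export the function under the name load.

```javascript
// A real loader file would write `export async function load(...)`.
async function load(url, context, nextLoad) {
  // Not an HTTP(S) URL: let the next loader (usually Node.js) read it.
  if (!url.startsWith('http://') && !url.startsWith('https://')) {
    return nextLoad(url, context);
  }

  // fetch follows redirects by default; the option is spelled out for clarity.
  const response = await fetch(url, { redirect: 'follow' });
  if (!response.ok) {
    throw new Error(`Failed to fetch ${url}: HTTP ${response.status}`);
  }
  const source = await response.text();

  // Anything that came over HTTP(S) is treated as ESM. shortCircuit tells
  // Node.js that skipping nextLoad was intentional.
  return { format: 'module', source, shortCircuit: true };
}
```

Treating everything from HTTP as format 'module' is the talk's simplification; a more careful loader might look at the response's content type.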
So will this work? Let's see. No. Because the Node.js resolver throws an error. So it's not the loader that says, I don't know what to do with HTTPS. It's the resolver that says, I don't know what to do with HTTPS. This is infuriating, actually, because why should it care? Why should the resolver say, I don't know what to do with HTTPS? Maybe somebody else knows what to do with HTTPS. So I actually found this out when I was working on this talk, and I opened a bug and implemented a pull request. So in Node 20, if this pull request passes, we will not need the next part, the thing that fixes this, because the Node.js resolver will say, oh, I don't know this URL, but it's fine, I'll throw in the loader instead. But for now we still have this problem. So we need a resolver in the HTTP loader that solves this problem. We have to override resolution, too. And this is infuriating, but this is what it is. OK.
This is where the code becomes interesting. So please, please, please, please pay attention. Specifiers. Remember, these are specifiers. Our resolver will meet three kinds of specifiers: relative URL specifiers, absolute URL specifiers, and what are called bare specifiers. And we will need to deal with all of them, just like Node.js does. So relative are these kinds of specifiers.
7. Handling Bare Specifiers in Node.js
Bare specifiers are the ones at the bottom. If it's a bare specifier, pass it on to Node.js.
Bare specifiers are the ones at the bottom. And absolute, yes, theoretically somebody can give us an absolute URL. So bare specifiers: what do we do? We let Node.js handle them. We don't know how to handle bare specifiers. So if it is a bare specifier, we continue on to nextResolve. There's no problem of HTTP URLs there, so we can just let Node.js handle it. And this is isBareSpecifier. It's ugly. Maybe somebody else has a better way. What it says is: if the specifier starts with a dot, then it's not a bare specifier. Otherwise, try to parse it as a URL. If it doesn't parse as a URL, then it's a bare specifier. Otherwise it's a URL, so not a bare specifier. So if it's a bare specifier, pass it on to Node.js.
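The check described above fits in a few lines. This is a sketch of that logic as spoken, not necessarily the exact code from the slides, and the example specifiers in the usage note are made up.

```javascript
// Returns true for specifiers like 'lodash' or '@scope/pkg', which are
// neither relative ('./x.js', '../x.js') nor parseable as absolute URLs.
function isBareSpecifier(specifier) {
  // Relative specifiers start with a dot, so they are not bare.
  if (specifier.startsWith('.')) return false;
  try {
    // Parses as an absolute URL (https:, file:, node:, ...): not bare.
    new URL(specifier);
    return false;
  } catch {
    // Not a URL and not relative: treat it as a bare specifier.
    return true;
  }
}
```

So a package name like 'lodash' counts as bare, while './module.js', 'https://esm.sh/lodash' and 'node:fs' do not.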
8. Relative URLs and TypeScript Loader
To deal with relative URLs, we resolve the specifier against the importing module's URL to absolutize it. If the resulting URL starts with HTTP, we return it as-is. Otherwise, we pass it to the Node.js resolver. Another loader is for transpiling TypeScript. For .ts files it calls nextLoad to have Node.js read the source, then transforms the source using esbuild.
Now we need to deal with relative URLs. How do we absolutize a relative URL? We take the URL of the module that contains the import and resolve the specifier against it. So in this case, for ./module.js, we have the absolute URL of the importing module, and we absolutize against it to get the absolute URL of module.js. We can do it with new URL in Node.js, very easy. And this is what we do: we take the specifier, we take context.parentURL, which is the URL of the parent module, and we get a URL back. And this also takes care of absolute URLs.
Now, if the URL starts with HTTP, we do not want to send it to the Node.js resolver, because then it will throw. So we just return the URL as-is and short-circuit. Otherwise, if it's not an HTTP URL, we can pass it on to the Node.js resolver. This takes care of the problem with HTTP and resolve. And there you go. The loader works.
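The whole HTTP resolver, as walked through above, can be sketched like this. The bare-specifier helper is repeated in compact form so the block stands on its own, the URLs in the usage note are made up, and a real loader file would export the function under the name resolve.

```javascript
// Compact version of the bare-specifier check: not relative, not a URL.
function isBareSpecifier(specifier) {
  if (specifier.startsWith('.')) return false;
  try { new URL(specifier); return false; } catch { return true; }
}

// A real loader file would write `export async function resolve(...)`.
async function resolve(specifier, context, nextResolve) {
  // Bare specifiers (package names) are left to Node.js.
  if (isBareSpecifier(specifier)) return nextResolve(specifier, context);

  // Absolutize against the importing module's URL. For an absolute
  // specifier the base is simply ignored. (Demo code: a relative specifier
  // with no parentURL would throw here.)
  const url = new URL(specifier, context.parentURL).href;

  // HTTP(S) URLs must not reach the Node.js resolver, which would throw.
  if (url.startsWith('http://') || url.startsWith('https://')) {
    return { url, shortCircuit: true };
  }

  // Everything else (file:, node:, ...) is resolved by Node.js as usual.
  return nextResolve(specifier, context);
}
```

For example, './dep.js' imported from https://esm.sh/pkg/index.js comes back as https://esm.sh/pkg/dep.js without Node.js ever seeing it, while './dep.js' imported from a file: URL goes to nextResolve untouched.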
9. Chaining Loaders for HTTP and TypeScript
You can use any transpiler, but ESBuild is my preferred choice due to its simplicity. It only requires about 10 lines of code for transpilation. We've seen three loaders: one for overriding modules, and two for loading HTTP and transpiling TypeScript. These loaders can be chained together by importing the necessary modules and adding them in the desired order. By specifying the URLs as HTTP in the overrides, we can use bare specifiers and achieve successful loading and transpilation.
You can use whatever transpiler you want. I use esbuild because it's easy. We get back the new code and return it as the transpiled code, returning source and format. Very, very easy, and boom, you get transpilation with, like, 10 lines of code.
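A sketch of such a TypeScript loader is below. It assumes the esbuild package is installed and uses its transform function with the 'ts' loader option. The createTsLoad factory and the injectable transpile parameter are hypothetical additions, there only so the hook can be tried without esbuild; a real loader file would simply export the resulting function as load.

```javascript
// Default transpiler: esbuild's transform API (assumes esbuild is installed).
async function transpileWithEsbuild(typescriptSource) {
  const { transform } = await import('esbuild');
  const { code } = await transform(typescriptSource, { loader: 'ts' });
  return code;
}

// Hypothetical factory; a real loader file would do
// `export const load = createTsLoad();`
function createTsLoad(transpile = transpileWithEsbuild) {
  return async function load(url, context, nextLoad) {
    // Only .ts files are our business; everything else goes down the chain.
    if (!url.endsWith('.ts')) return nextLoad(url, context);

    // Let the next loader read the file. Passing a format up front keeps
    // Node.js from rejecting the unknown .ts extension.
    const { source } = await nextLoad(url, { ...context, format: 'module' });
    const text = typeof source === 'string' ? source : Buffer.from(source).toString('utf8');

    // Return the transpiled JavaScript as an ES module.
    return { format: 'module', source: await transpile(text) };
  };
}
```

Because this loader gets its source from nextLoad rather than reading the disk itself, it also works when an earlier loader in the chain fetched the .ts file from somewhere else.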
Look at ts-node. It does the same thing, but robustly, so you can see it has hundreds of lines of code. But in essence, it's doing what I showed previously.
OK. We've seen three loaders. One that overrides modules, which works in the resolving phase. And two that work in the loading phase: HTTP and TypeScript transpiling. Can we chain loaders? Can we add them one after the other? And the answer is absolutely.
So let's use the HTTP and TypeScript loaders together. I import modules over HTTP, but one of them is TypeScript code, and I want the two loaders together to make it work. As you can see, we're using HTTP and we're using TypeScript. And these are the modules, nothing interesting here. But let's add the override.
OK, so instead of giving the full URLs, we'll just give, you know, bare specifiers, A and B, and specify in the overrides that they map to HTTP URLs. We add multiple loaders in a single --loader. Notice the commas between them. We can also use --loader for the HTTP loader, --loader for the TS loader, --loader for the override loader. Same thing. We add them together and, boom, it works. Let's see how, because this is interesting. Notice the order: HTTP loader, TS loader, override loader. First of all, it does the resolution. The resolution gets a specifier and returns a URL, if you remember.
10. Loader Functionality and Future
Although the HTTP loader is listed first, Node.js calls the last loader first: the override resolver calls the HTTP resolver, which calls the Node.js resolver. The override resolver reads overrides.json and passes the result to the next loader. The HTTP resolver handles HTTP specifiers, while the Node.js resolver absolutizes everything else. Loaders enhance URL resolution and file loading. They can load from different sources, transform source code, and resolve URLs differently. A loader has resolve and load functions. To use a loader, use --loader. The future of module loading enhancements is promising, and loaders are simple to use.
So notice that the HTTP loader is first, but Node.js will call the override loader first. It calls the last loader first. OK, why does that work? Because when the override resolver calls nextResolve, it calls the HTTP resolve. And when the HTTP resolve calls nextResolve, it calls the Node.js resolve. So the override resolver reads overrides.json, resolves via overrides.json, and passes the result on to the next loader. And the HTTP resolver, if you remember, passes non-HTTP specifiers to Node.js and deals with HTTP specifiers on its own. The Node.js resolver absolutizes everything and passes it on, and boom, we get the URL. With load, same thing. Node.js calls the last loader that has a load function, this one, the TS loader, which passes on to the HTTP loader, which passes on to Node.js, and so on. The HTTP loader fetches the HTTP URLs and passes on non-HTTP URLs, and boom, it works.
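The "last loader is called first" behavior can be modeled in a few lines. This is a toy model of the chaining, not Node.js's actual implementation; chainHooks, tracingHook, and the calls array are hypothetical names used only for illustration.

```javascript
// Builds one function out of a list of hooks given in command-line order.
// Each hook's `next` is the hook registered before it, so the last
// registered hook ends up outermost and runs first.
function chainHooks(hooks, defaultHook) {
  return hooks.reduce(
    (next, hook) => (value, context) => hook(value, context, next),
    defaultHook,
  );
}

// Records the order in which hooks run.
const calls = [];

// A pass-through hook that only notes its name and delegates to `next`.
const tracingHook = (name) => async (value, context, next) => {
  calls.push(name);
  return next(value, context);
};

// Stand-in for the built-in Node.js resolver at the end of the chain.
const defaultResolve = async (specifier) => {
  calls.push('node');
  return { url: specifier };
};

// Resolve hooks in command-line order: HTTP loader first, override loader last.
const resolveChain = chainHooks(
  [tracingHook('http'), tracingHook('override')],
  defaultResolve,
);
```

Calling resolveChain with any specifier records the override hook first, then the HTTP hook, then the Node.js stand-in, which is the order described above.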
Now, my time is short, so I'll skip this, but you can see everything in my presentation. But you can actually use loaders with APIs, so specify a loader that also has an API. It's a more advanced scenario, not used a lot, but you can do that.
Okay, let's summarize. Loaders enhance these two phases: not the execution phase, but the resolution of the URL, which is finding where the module is, and reading the URL, or what we call loading the file. Loaders can be used to load from different sources, transform source code, transform resolved URLs, and resolve URLs differently. Lots of uses for that. A loader has at most two exported functions, resolve and load. It can have only one of them. One deals with resolution: it takes a specifier and resolves it to a URL, and it can use the nextResolve in the chain if it doesn't know what to do with something. And loading takes the URL, the resolved URL, and returns the source code, and if it wants, it can use nextLoad to load it. To use a loader, use --loader. You can chain them, which is great. And communicating with a loader, well, that's the part we skipped; you can communicate with the loader using various methods. And the future is bright. We finally have a formal way of doing module loading enhancements. It has withstood the test of time. It's still experimental, so small changes can still happen. Be careful. It is very simple to use, and can be used in a large variety of use cases.