Don’t Try This at Home: Synchronous I/O in Node.js
AI Generated Video Summary
This Talk explores synchronous IO in Node.js and the advantages of asynchronous IO. It discusses exceptions to synchronous IO and approaches to achieving synchronous IO in Node.js, including using WASI and native add-ons. The Talk also covers mixing workers with atomics for synchronous IO and embedding Node.js to create a synchronous worker. Additionally, it touches on TypeScript migration, optimizations in Node.js, and experiences and advice on contributing to Node.js.
1. Introduction to Synchronous IO in Node.js
Hi, everyone. I'm Anna and I'll be talking about synchronous IO in Node.js. I have a background in the Node.js Technical Steering Committee and now I'm part of the MongoDB DevTools team.
Hi, everyone. So I'm Anna and I'm going to be talking a bit about synchronous IO in Node.js. Before we get started on this, who am I? So I'm Anna, pronouns are she/her. I was previously on the Node.js Technical Steering Committee, so I was getting paid full-time to work on Node.js core. Then in September, I joined the MongoDB DevTools team. And my handle, if you have any questions or want to reach out in some other way, is addaleax on Twitter and GitHub, at least. And I'm also the mom of these two little cuties.
2. Synchronous vs Asynchronous IO in Node.js
The synchronous and asynchronous ways of loading files in Node.js give the same result, but the synchronous way has performance issues. Initially, I expected the synchronous version to be slightly faster for a single call, but I discovered a bug in the async version that affected performance. However, the async version is now faster. The advantage of asynchronous IO is that multiple things can happen at the same time, allowing for concurrent operations.
3. Exceptions to Synchronous IO in Node.js
But yeah, so there are exceptions to when it is okay to do this. You should not, but there are some cases. First, loading code: require and, as far as I know, also ESM import do synchronous file system IO. For ESM, that's not something that's technically necessary, and I hope we can get away from that at some point, because ESM loading is asynchronous anyway and you wouldn't necessarily notice if the file system access was happening asynchronously.
Second case: you absolutely know that what you're doing is the right thing to do. For example, you're writing a CLI application where there's a very limited set of things going on at a time and you know that nothing else is happening at the same time. I would still encourage you to write asynchronous code, simply because you should follow that best practice, even if it's not strictly necessary.
And the third case is you need synchronous code for some reason. And that might be because some API or some user facing interface exposes your code as synchronous and you have no other choice. And I'm going to be talking about that last case here.
What do I actually do at my job these days? So if you've ever worked with MongoDB, this might seem a little familiar. There's this Mongo CLI utility where you can pass something that looks like a URL and say, hey, connect to this server, connect to this database. And then you get a shell where you can run commands such as, in this case, db.test.find. It doesn't really matter for this talk that this is MongoDB. This might as well be the MySQL CLI and the command could be SELECT * FROM test, basically the same thing.
4. Approaches to Synchronous IO in Node.js
We're building on top of the Node.js driver and need to make the method do something synchronously. The easy way is using synchronous methods, but they don't cover network I/O. I believe the existence of synchronous file operations is a design flaw. To achieve synchronous I/O in Node, you can write C, Rust, or C++ code and compile it to Wasm.
We're actually doing that first thing. So there's this GUI for MongoDB, which is called Compass, which is also maintained by our team, and we embed this shell in an Electron app as a React component basically, which is pretty cool. But yeah, so the old shell, the way it works is you type some command like db.test.find, and it synchronously does I/O, because there is no event loop, no nothing, no Node.js involved. There's no reason to do anything asynchronous.
But we are building on top of the Node.js driver, and the driver just does network I/O. You don't have synchronous network I/O in Node. It's just not there. And that was tricky, because people shouldn't have to know about async/await in order to be able to use our shell. People have written scripts for the old shell that ideally we want to keep working as much as possible.
So the question becomes, how do we make this method do something synchronously? That's what inspired me to give this talk. What are the different approaches that we could take here? So, first of all, there's the easy way of doing synchronous I/O in Node, which is the synchronous methods. They are just there, in the API. You have fs.readFileSync, which just does a synchronous file operation. That doesn't really solve our use case here, obviously, because it doesn't cover network I/O, and that's what we're mostly concerned about here. So that is kind of a non-starter for us. And also, if you ask me, and this is just my personal opinion, the fact that this is even possible, that readFileSync and similar operations are there, is a design flaw. There's no good reason why file system I/O in libuv, the underlying library that supports Node, is implemented the way it is. There's no good reason why it shouldn't work just like accessing the network or stdio streams or anything else. So I think we shouldn't have been able to have these things in the first place. Obviously, they're not going away, because millions of people are using them. But yeah.
So, then... and this is something that, if you think about doing synchronous IO in Node, is probably not going to pop into your head at first. What you can do is write C code, or Rust or C++, and compile it to Wasm. So basically anything that Clang supports would work here.
5. Using WASI for Synchronous IO in Node.js
6. Brute Force for Synchronous IO in Node.js
You could write a native add-on that you load from Node.js to perform IO. However, reimplementing the whole Node.js networking stack just for synchronous IO would be too much work, and it's not something that libuv supports.
Then there's the brute force, very straightforward way, which is you write a C++ or Rust add-on, and it doesn't really require more than this. This would also be a working example, except there's boilerplate missing, obviously. But you could do this: you could write a native add-on that you load from Node.js, and that performs the IO for you. This is also something we don't want to do. We don't want to reimplement the whole Node.js networking stack just so that we can have synchronous IO; that would be far too much work. It's not something that libuv supports, so we would have to come up with some clever ways of doing it in other ways. It would require rewriting so much code. We're not doing that.
7. Mixing Workers with Atomics
And now, let's get to the exciting part: mixing workers with atomics. We create a message channel and a shared array buffer. The main thread starts a worker and waits on the shared array buffer. The worker receives data from the main thread and runs an async function using node-fetch to load HTTP requests. After getting the response, it posts it back to the main thread and calls Atomics.notify.
And now, let's get to like the ways that I am more excited about personally. So this is my favorite, probably, because it's kind of production ready at this point and we might actually be able to use it in the future.
So what do we do? We mix workers with atomics, and the reason this is the nuclear sign and not the atom emoji is that everybody would have just thought of React.
So how does this work? Let's look at an example. Again, this is pretty much runnable code. The left-hand side, the main thread side, is missing some imports, but it's basically working. What the main thread does when it starts up is it creates a message channel and a shared array buffer. It needs a shared array buffer; we'll get to that in a second. And it starts a worker, to which it passes one side of the communication channel and that shared array buffer. And then it waits on that shared array buffer using Atomics.wait. And in the worker, when it starts up, well, it gets the data that it was sent from the main thread, and it runs an async function. And that is actually just doing things that are super familiar to you already. Most of you have probably seen node-fetch at some point. It's a very nice API for making HTTP requests, and it's also very good for these examples because it's very straightforward to do IO with it; it's like three lines of code here. So what happens is we load it, we await fetch, we await the whole response body getting back to us. This is obviously missing error handling, but you know, you don't really do that on slides like these where there's limited space. And then, after we got that response, we post it back to the main thread and call Atomics.notify. The shared array buffer is shared memory between the main thread and the worker thread, so this actually wakes up that Atomics.wait call in the main thread, which was blocking. Nothing else progressed in the main thread; it was still waiting at that line for somebody to call Atomics.notify on another thread.
8. Using Workers for Synchronous IO in Node.js
And then on the main thread, we look at what we got, use the receiveMessageOnPort API, and print out the response. This approach allows synchronous operations and has some advantages. The main thread is blocked: it spawns a worker thread and waits for the response before progressing. Node.js offers the full API and NPM packages in the worker, but there are downsides. Atomics.wait is not allowed on main threads in browsers, and manipulating objects inside the worker is not easily possible. However, it's production-ready and can be used in worker threads.
And then on the main thread, we look at what we got, we use this receiveMessageOnPort API, which is Node.js-specific. You can emulate it on the web, but it's a bit more convenient this way. And then we print out the response.
This general idea, I think this is pretty cool. It allows you to do things synchronously if you really need to. And it has some advantages. So, again, this is, like, how it looks schematically. The main thread is blocked and does not progress. It does not return to its own event loop. It just spawns a worker thread, lets that loop run, and then waits for that response to come back before it progresses in any way.
And the big advantage is Node.js: you can use the full Node.js API and NPM packages in the worker. The downsides are, first, that Atomics.wait is not allowed on main threads in browsers, because if it were, that would block the page. Atomics.wait is a blocking call that does not allow anything to progress on the main thread. So it would block rendering, for example, indefinitely, which is not something that should be allowed for a web page. And it still doesn't fully give us what we need, because it doesn't allow manipulating objects inside the worker. So, if you think about the fetch example: if we had wanted to, for example, add an event listener to the response object that we saw there, we could not have done that easily, because there's no way to access these objects from outside the worker. There would have to be some kind of RPC protocol that takes care of that. So it's generally not very ergonomic in that way. But it's very cool, very production-ready; there was nothing experimental in what I showed. And you could use this, for example, inside a worker thread: Atomics.wait does work inside worker threads in the browser, so you could kind of do things like this there. But, yeah, anyway, none of these things really worked for us.
9. Embedding Node.js and Synchronous Worker
I went to my evil scientist lab and thought about a solution for synchronous IO in Node.js. The idea is to embed Node.js into itself, starting a new instance on the same thread. This eliminates the need for separate threads and reduces complexity. I came up with a project called synchronous worker, which achieves the desired result.
So, what I did was I went to my evil scientist lab, and I thought: I know Node.js very well, I'm very familiar with its internals, I should be able to come up with a solution for this, right? And, yeah, remember when I made workers? I didn't make them all by myself; obviously other people were involved. But the statement doesn't feel entirely inaccurate. And, well, yeah. Anyway.
So, back then I obviously gave talks about that, too. And one of the slides from back then said: the idea behind workers is to embed Node.js into itself, to start a new Node.js instance, just like the main thread, except on a different operating system thread. And it turns out, if you think about it a bit more, you don't even need a separate thread for this. You can do it on the same thread. This is something I have thought about in the past for various reasons. For example, testing frameworks like tap might want to run pieces of code inside somewhat isolated environments; I had conversations about that. And I thought about what we could do about execSync and similar child process functions in Node.js. The way these are implemented in Node, they have entirely separate implementations from the async methods, and there's no good reason for that. I think with this, we could even reduce complexity inside of Node.js quite a bit, if it ever ended up in Node.js.
So, the idea is, instead of having separate threads, the main thread event loop still runs and gives us its callback. And inside the callback, or during startup code, we start a new Node.js instance with its own event loop on the same thread. And until we're done with that, nothing else on the main thread progresses. And so, it was a pandemic, I was a bit bored during the holidays, and this is a project that I came up with. The idea is, this is all you need to actually achieve what we want. You create a synchronous worker. That's what I call it, because it's kind of like a worker in that it starts a new Node.js instance, but there's also no multi-threading involved. So it's synchronous.
10. Using Workers for Synchronous IO
So, you can create a require function inside that worker that loads node-fetch. And then you have a fetch function that only runs inside the worker. And then you can do cool things like worker.runLoopUntilPromiseResolved and pass it a promise that was created inside this worker. You can do that twice to get the full text, and you can print that out. And this is, again, a runnable example.
If you were wondering: none of these things actually work for mongosh, because they all have drawbacks, as you saw. What we actually currently do is, if we get input like this, where somebody tries to use the result of an asynchronous call synchronously, we use Babel to transpile it to async code, and that works well enough for us. It also has some drawbacks: it works on a best-effort basis, we're inserting await in places where we think it should be applied, and some language features are not supported. But overall, this works well enough for us currently. Thank you for listening. I'll upload the slides soon, and if you have any questions or want to reach out at some point in the future, you can ping me on Twitter. Thank you.

And that's it. Hello, hello, how are you doing? I'm good. I'm good. We're so happy to have you join us. Thank you so much. What did you think of the poll results? Honestly, I have to think about it for a bit, because I have to go through my head to figure out what the yes and no exactly stand for. It sounds like at least the majority of people are using TypeScript and they are very happy with that. Yeah, I noticed that most people are just basically happy with what they're working with. But also, I was just curious, because I know people have very strong opinions about this sometimes, and I wanted to get a feeling of what people really think. Of course, of course.
11. TypeScript Migration and Optimizations in Node.js
How are you finding that migration, or what's the road plan for that? We don't have one yet, I think. It's not at the top of the priority list for us right now, but I hope it's going to be some way where we can just migrate things on a file-per-file or project-per-project basis somehow. Yeah, yeah, I know what you mean. It's going to take somebody who's really spending a lot of time doing that. At the company I work for, Buffer, we're right now working on this theory, coined, I think, by one of our engineers, Mike Sanderman. He talks about single-gear development, and it's basically: as you go along, the next time you touch a file, that's when you do the migration to the thing you want to migrate to. Yeah, I'm not sure how easy that would be with TypeScript, but yeah. So...
Interesting, interesting. Let's take a look at some of the questions for you. Somebody asked, which optimizations did you make to fs.readFile in Node.js while preparing this talk? Right. I'm going to share the link in the Q&A channel on Discord, but basically, for some reason, the fs.promises.readFile implementation in Node read files in small chunks of 16 kilobytes. There was no real reason why it would do that, especially because it did get the file size first. And when you know you're going to read the entire file, you might as well allocate one big buffer that is large enough to hold it, instead of reading small chunks over and over again. And that improves performance a lot, just changing that. That's so cool. In your bio you mentioned that you've been one of the top contributors to Node.js in the last four years. Can you talk a little bit about how that experience has been for you? It's a very special experience.
12. Experiences and Advice on Contributing to Node.js
I don't think it's one that many of us get to have, honestly. I'm really glad. I appreciate that a lot. It's just very different working on code that affects so many people and that is so visible in the community. I'm also not that sad that I'm not actively working on Node that much anymore. Do you have any advice for folks that are interested in starting to contribute? Start with what would I want to change about Node if I could. Or what is something that I know I could help with? And then focus on that instead of just looking for an easy contribution. What was your first change in Node? My first change was providing a test case for a bug that I found. It's not very exciting on its own, but it helped me work on my own project. I like the philosophy of making changes that are useful to you and probably useful to a lot of other people.
I don't think it's one that many of us get to have, honestly. I'm really glad. I appreciate that a lot. Yeah, it's just very different working on code that affects so many people and that is so visible in the community.
I'm also not that sad that I'm not actively working on Node that much anymore. It also contains a lot of long discussions with lots of people and lots of different opinions. It's okay. It is good to do something else for a change, honestly. Four years is a really long time.
Do you have any advice for folks that are interested in starting to contribute? I mean, maybe ideally don't start with "I want to contribute", but start with "what would I want to change about Node if I could?" Or "what is something that I know I could help with?" And then focus on that, instead of just looking for, hey, what would an easy contribution be? Because there's always some tiny style change that you can do. But if you look for something that you want to do, you're going to get a bigger sense of accomplishment, and it's also something that is more specific to what you personally want to see changed. That's how I'd put it, I guess. Yeah.
What was your first change in Node? Do you remember it? Yeah, I think I do. So, technically my first change was providing just a test case for some bug that I found. So, not that exciting, but a lot of the first changes that I made... oh yeah, my first real change to Node was adding type checking to setTimeout and the other timer functions. And it's not very exciting on its own, but it helped me work on my own project that I was doing at the time. So. That's really cool. I like that. I like that philosophy, you know? Make the changes that you need or that are useful to you. They're probably useful to a lot of other people. Right. Yeah. Yeah. Exactly.
13. Improving Node.js with Supported Official API
That's how open source works. You do things that are good for you and hope that somebody else also finds them helpful. There's a lot to do around tracing and async operations in Node. If we got everything inside Node on one format, we could provide a supported official API that gives you all the things that currently keep the event loop alive. It would be really nice to have something inside of Node that is a fully supported public API. Somebody sufficiently motivated could definitely do that.
That's how open source works. You do things that are good for you and hope that somebody else also finds them helpful. Did you have anything in your, I don't know, in your personal backlog that you wish you had worked on that you wish someone would work on right now?
Yeah, there's a couple of things that I wish I would have done, or that I wish somebody would do. A lot around tracing and async operations in Node. I know that there are some very good, very talented and engaged people working on that. But there are some areas of the code where, if we got everything inside Node on one format, we could do some really cool things. For example, provide a supported official API that gives you all the things that currently keep the event loop alive. There are some internal ones, and they return some internal objects, and it's very tricky, and there are people who have built npm modules on top of that. But it would be really nice to have something inside of Node that actually is a fully supported public API. And I think somebody sufficiently motivated could definitely do that. So if you're watching this live stream: get the motivation, there's something that you can do to improve this environment for lots of folks.