Node.js startup snapshots


V8 provides the ability to capture a snapshot of an initialized heap and rehydrate a heap from the snapshot instead of initializing it from scratch. One of the most important use cases of this feature is to improve the startup performance of an application built on top of V8. In this talk we are going to take a look at the integration of the V8 startup snapshots in Node.js, how the snapshots have been used to speed up the startup of Node.js core, and how userland startup snapshots can be used to speed up the startup of user applications.

28 min
14 Apr, 2023



AI Generated Video Summary

The Talk discusses the Startup Snapshot initiative in Node, which aims to improve startup performance by adding new features and optimizing initialization costs. Startup snapshots, serialized binary blobs, are used to speed up startup and can be generated for both the core and user applications. Custom snapshots allow deserializing a heap from a specified snapshot, skipping parsing and compilation. The Talk also addresses misconceptions and limitations of startup snapshots, and highlights the different use cases for heap snapshots and startup snapshots.

1. Introduction to Startup Snapshot in Node

Short description:

I'm Joy, working on the startup performance strategic initiative in Node. The initiative has been renamed to Startup Snapshot. Node has been adding new features, requiring additional setup during startup. From LTS 18 to the upcoming 20, support for fetch, Web Crypto, the File API, Blob, Web Streams, and APIs under util has been added. Node core is half in JavaScript and half in C++.

As mentioned, I'm Joy. I work at Igalia, and I work on Node and V8. I've been working on the startup performance strategic initiative in Node for a while. The initiative has recently been renamed to Startup Snapshot, as we have done the integration within Node core and we are enabling this feature for userland applications, which is what I'm going to talk about today.

So let's get started. So a bit of history. The Startup Snapshot integration started while Node started gradually dropping the old small core philosophy and adding a lot more built-in features. This includes new globals, in particular, new web APIs, new built-in modules, and new APIs in existing modules. These new features either require additional setup during the startup of Node core or require additional internal modules to be loaded during the startup.

So to give you an overview: from the last LTS, version 18, to the upcoming 20, we've added support for fetch, Web Crypto, the File API, Blob, a bunch of Web Streams, and a bunch of new APIs under util, such as the argument parser and the MIME type parser. The list is longer than that, but you get the idea: Node is growing a lot. Another part of this challenge is that Node core is written about half in JavaScript and half in C++. So a lot of those internals are actually implemented in JavaScript.

2. Startup Performance and Initialization

Short description:

Startup performance is harder to maintain as the JavaScript code needs to be parsed and compiled before execution. To mitigate potential prototype pollution, JavaScript builtins are copied at startup for internal use. Node core uses multiple strategies to control startup initialization costs, including lazy loading, precompiling internal modules, and using V8 startup snapshots. The snapshots are serialized binary blobs capturing the V8 heap and execution contexts. They are used for isolates and contexts in Node.

The upside of this is that it lowers the contribution barrier. In some cases, it reduces the C++-to-JavaScript call costs. But at the same time, this makes it harder to keep the startup performant. For one, the JavaScript code needs to be parsed and compiled before it can be executed, and that takes time. Also, most of the JavaScript code for initialization only gets run once during startup, because it's just initialization, so it doesn't get optimized by the JavaScript engine.

When implementing a library in JavaScript, we have to take potential prototype pollution into account. You don't want the user to blow up the runtime just because they delete something from a builtin prototype, like String.prototype.startsWith. So to mitigate this, Node creates copies of these JavaScript builtins at startup for the internals to use, so they don't actually use the prototype methods that we expose to users. All this can slow down the startup as Node grows.
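The copying idea can be sketched like this. This is a minimal illustration of the pattern, not Node's actual internal code (Node core calls its copies "primordials"): the internals bind a builtin method before user code runs, so they keep working even if user code later mutates the prototype.

```javascript
// A minimal sketch of the "copy the builtins at startup" idea described above.
// This is only an illustration of the pattern, not Node core's implementation.
const uncurryThis = (fn) => Function.prototype.call.bind(fn);

// Save a copy of the builtin before any user code can tamper with it.
const originalStartsWith = String.prototype.startsWith;
const StringPrototypeStartsWith = uncurryThis(originalStartsWith);

// Simulate prototype pollution by user code:
delete String.prototype.startsWith;

// The saved copy still works:
const stillWorks = StringPrototypeStartsWith('node', 'no'); // true

// Restore the builtin for the rest of this demo.
String.prototype.startsWith = originalStartsWith;
```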

So to keep the cost of the startup initialization under control, Node core uses multiple strategies. First, we do not initialize all the globals and builtins at startup. For features that are still experimental, too new to be used widely, or that only serve a specific type of application, we only install accessors that load them lazily when the user accesses them for the first time. Second, when building releases, we precompile all the internal modules to generate code caches, which contain bytecode and metadata, and we embed them into the executable, so that when we do have to load additional modules as the user requests them, we pass the code cache to V8, and V8 can skip the parsing and compilation and just use the serialized code once it validates the cache. And finally, for essential features that we almost always have to load, for example the URL API or the fs module, which are also used by other internals, or widely used features like timers, we capture them in a V8 startup snapshot, which lets us simply skip the execution of the initialization code and save time during startup.
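The first strategy, lazy loading via accessors, can be sketched like this. This is a simplified illustration, not Node's internal code; `defineLazyGlobal` and `loadFeature` are names made up for this example.

```javascript
// Sketch of lazy initialization via an accessor, as described above:
// install a getter that loads the feature on first access and then
// replaces itself with the loaded value.
function defineLazyGlobal(name, loadFeature) {
  Object.defineProperty(globalThis, name, {
    configurable: true,
    enumerable: false,
    get() {
      const value = loadFeature(); // pay the cost only on first access
      // Replace the accessor with a plain value property.
      Object.defineProperty(globalThis, name, {
        value,
        writable: true,
        configurable: true,
      });
      return value;
    },
  });
}

let loaded = false;
defineLazyGlobal('myLazyFeature', () => {
  loaded = true; // stands in for an expensive module load
  return { ready: true };
});

// Nothing is paid for at "startup"...
console.log(loaded); // false
// ...until the first access triggers the getter.
console.log(globalThis.myLazyFeature.ready); // true
```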

So this is kind of how the Node executable used to be built and run. Initially, we were just embedding the JavaScript code into the executable at build time. At run time, we need to parse it, compile it, and execute it to get Node core initialized before we can run user code and process system state to initialize the user app. Then we introduced the embedded code cache: at build time, we precompile all the internal JavaScript code to generate the compiled code cache, and then we embed it into the executable. At run time, we ask V8 to use the code cache and skip the parsing and compilation process. We still keep the internal JavaScript code as the source of truth, in case the code cache doesn't validate in the current execution environment, but most of the time the cache is used and we just skip the compilation process. And now, with the startup snapshot integration, we just run the internal JavaScript code at build time to initialize a Node heap, and then we capture a snapshot and embed that into the executable. The other two are still kept as fallbacks, but at runtime we simply deserialize the snapshot to get the initialized heap. So there is no need to even parse, compile, or execute the internal code; you just deserialize the result. So what exactly are these V8 startup snapshots? They're basically the V8 heap serialized into a binary blob. There are two layers of snapshots: one that captures all the primitives and the native bindings, and one that captures the execution contexts, like the objects and functions. Currently, Node uses the isolate snapshot for all the isolates that you can create from userland, including the main isolate and the worker isolates. We also have built-in context snapshots for the main context, the vm contexts, and the worker contexts, although the worker context snapshot currently only contains very minimal stuff.

3. Startup Snapshot and Userland Snapshot Generation

Short description:

With the default snapshot, the startup is generally twice as fast compared to launching without a snapshot. This gives us more sustainability as we grow Node core while keeping the startup under control. Users can now create snapshots of their own applications, which is useful for applications where startup performance matters. The general workflow for building a snapshot is similar to building the core snapshot. Currently, the userland snapshot only takes one file as input. There are two ways to generate the userland snapshot: building Node from source with the --node-snapshot-main configure option, or using the --build-snapshot runtime option of the official node executable.

And we're still working on including more stuff there. So with the default snapshot, the startup is generally about twice as fast compared to launching without a snapshot. For example, on this MacBook, it goes from about 40 milliseconds to 20 milliseconds to start up Node core itself. On the left, that's Node core started without the snapshot, and on the right, that's Node core started with the snapshot. You can see, even just from the flame graph, there's less to be done; it's obviously much simpler, and it runs faster. This also gives us more sustainability as we grow Node core while keeping the startup under control. We are still tweaking the internal snapshot to make sure that the built-in one contains just the right amount of essential features. But at the same time, the feature is now also available to users, so that people can create snapshots of their own applications. This can be useful for applications where the startup performance matters, for example command-line tools. In particular, if the application needs to run a lot of code during startup, or it needs to load a lot of system-independent data, these operations can be done when building the snapshot instead of at runtime. The general workflow is similar to the workflow for building the core snapshot. Node can take a user-provided script that does some essential initialization for the user application and run that script to completion. After all the asynchronous operations are finished, for example all the promises are resolved, Node can take a snapshot of the heap and write it somewhere, either into one binary along with the Node executable itself or as a separate blob on disk. And when starting up again, Node can just get the pre-built snapshot from somewhere and then deserialize a user heap from it to skip the setup.
So currently, the userland snapshot only takes one file as input, so you'll have to bundle the setup code into one file. But we are also looking into module support in the snapshot-building script, so that's also coming. Currently, there are two ways to generate the userland snapshot. The first is the tougher one, which is building Node from source with the --node-snapshot-main configure option. This tells the toolchain to generate a snapshot using the provided user script and replace the default snapshot with the custom snapshot, so the final node executable will contain the user snapshot. For example, say we have a file here that assigns some string to globalThis.data. You can put many other, more complicated things there, but this is just an example. Then you go to the Node source directory and build it with that configure option, and the executable produced by the compilation process will contain a snapshot that already has the thing you put on globalThis. Another option, which does not require building Node from source, is using the --build-snapshot runtime option of the official node executable. That can come in handy if you just don't want to build Node from source, which can take a lot of time.
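As a concrete sketch of the --build-snapshot flow described above (the script contents and file names here are made up for illustration): you write a setup script, build a blob from it once, and then launch from the blob.

```javascript
// snapshot-entry.js — a hypothetical snapshot entry script.
//
// Build the blob once (the "build time" step):
//   node --snapshot-blob snapshot.blob --build-snapshot snapshot-entry.js
// Then start from the pre-initialized heap:
//   node --snapshot-blob snapshot.blob
//
// Anything computed here is captured in the heap, so the work below
// happens only when the blob is built, not on every startup.
const entries = [];
for (let i = 0; i < 1000; i++) {
  entries.push({ id: i, label: `item-${i}` });
}
globalThis.precomputedIndex = new Map(entries.map((e) => [e.id, e]));
```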

4. Custom Snapshots and Runtime Synchronization

Short description:

By default, a snapshot.blob is generated in the current working directory. You can specify the input/output path using the --snapshot-blob option. Launching Node with a custom snapshot allows you to deserialize a heap from a specified snapshot, skipping parsing, compilation, and execution. A work in progress is the new single executable application feature, which allows generating and adding a snapshot to a single executable without compiling Node from source. Node offers JavaScript APIs to synchronize runtime states in the snapshot script, refreshing states like process.env and process.argv. Users can use snapshot synchronization APIs to reset and synchronize states during serialization.

So by default, this generates a blob called snapshot.blob in the current working directory using the given script. But you can also specify the output path with the --snapshot-blob option. And when you launch node using a custom snapshot, you can again use the --snapshot-blob option to tell Node to deserialize a heap from the specified custom snapshot instead of setting up a default Node core heap. That will help you skip the parsing, compilation, and execution of your own code and help you start faster.

And now there is another option that's work in progress, which is the new single executable application feature, and the snapshot can be layered on top of that. This means it will be possible to generate a snapshot and add it into a single executable with Node itself, without having to compile Node from source. A quick preview of the current design: the user creates a JSON configuration that specifies the main script for the snapshot and the path where the output blob should be written. Then you use the official node executable to take this JSON configuration and generate the blob. You copy the executable to your destination path, because you're going to inject that blob into it, and you use, for example, the postject command-line tool, which is officially maintained by the Node.js project, to inject that blob into the binary, producing the single executable application. After you inject the blob, the binary will contain a snapshot, and Node will just know: I have a snapshot embedded in this binary; when I am launched, I will initialize from that. With this, you don't have to compile Node from source to use an embedded snapshot. That's still a work in progress, but it's coming; it will probably land in 20. And we're also thinking about, instead of all these steps, providing some kind of single-command utility that just takes a JSON configuration and generates a single executable that you can run directly.

And to help users create custom snapshots, Node also offers several JavaScript APIs to synchronize the runtime state in the snapshot script. By default, after deserializing a snapshot blob, Node will refresh runtime state like process.env and process.argv. If the user code precomputes something from this state, or caches it, before the snapshot gets serialized, then it should be refreshed during deserialization. For example, suppose the user code computes some debug level based on an environment variable in the snapshot script; they might already be doing this when they are building the snapshot. At runtime, they can synchronize this with the snapshot synchronization APIs that we provide through the v8.startupSnapshot namespace. In the startup snapshot script, you can add a few callbacks: first a callback that is called during the serialization process to reset that debug level, and then a deserialize callback that recomputes the debug level according to the environment variables configured at runtime. That will help you synchronize the state. Or you can just defer the computation until deserialization if you are building a snapshot.

5. Design and Misconceptions of Startup Snapshots

Short description:

There is a function called isBuildingSnapshot() that you can use from the API to determine whether the code is being run to build a snapshot. Another useful API is setDeserializeMainFunction, which allows us to specify a main function in the snapshot without passing another main script. By including a main function in the snapshot, we no longer need additional input and it's faster. We have integrated startup snapshots into Node core to speed up the startup of the core. There is experimental support for userland snapshots with JavaScript APIs in the v8.startupSnapshot namespace. We are also working on support for single executable applications and more features in the snapshot. Thank you to all the contributors and supporting organizations.

There is also a function called isBuildingSnapshot() that you can use from the API to determine whether the code is actually being run to build a snapshot. And another useful API is setDeserializeMainFunction, which allows us to specify a main function in the snapshot without having to pass another main script.

So for example, suppose in the snapshot we have a database of greetings in different languages. One way to log a greeting according to some environment variable at runtime is to pass a separate main script that does the logging. But that means we need an additional input, which also has to be parsed and compiled at runtime. Instead, you can use the JavaScript API, which includes a main function in the snapshot. The code of this main function will also be compiled and serialized into the snapshot, so at runtime Node can just deserialize this main function and run it. Then we no longer need additional input, and it's also faster, because there is no need to compile more code.

So, a summary: we have been integrating startup snapshots into Node core to speed up the startup of the core. And now there is experimental support for userland snapshots, with some JavaScript APIs in the v8.startupSnapshot namespace to help build them. We are also working on support for single executable applications, as well as more features in the snapshot. Finally, I'd like to thank all the people who have been contributing to this feature, including Anna, Colin, James, Chengzhong, Darshan, and many others that I'm forgetting in the slides, sorry. Also, personally, I'd like to thank Bloomberg and Igalia for supporting my part of the work on this. And that's it, thank you.

Thank you so much for that talk. Audience, both online and in the room, please do feel free to ask questions. Let me pull up the questions. While I'm getting questions and waiting for them, I'm just going to open with: so far, what have you found to be people's common misconceptions with the design of startup snapshots? I think one common misconception would be that the feature itself is somehow related to heap snapshots; people mix up heap snapshots with startup snapshots a lot.


Different Tools for Different Means and Q&A

Short description:

They use the same underlying infrastructure in V8 to serialize a heap, but they are different tools for different means. Heap snapshots are designed for doing diagnostics on the heap, while startup snapshots are meant to be rehydrated, which heap snapshots are not. We've got loads of questions. First question with the most upvotes: can snapshots also be used to start up AWS Lambdas faster? Next question: is there any chance of accidentally leaking sensitive environment variables to the snapshot during build time? How much of a startup boost has been tested to be achieved for userland startup snapshots?

They use the same underlying infrastructure in V8 to serialize a heap, but they are kind of different tools for different means. Heap snapshots are designed for doing diagnostics on the heap, and startup snapshots are meant to be rehydrated, whereas heap snapshots are not. I think there are so many kinds of snapshots in V8 that people can get confused a bit. Yeah, that's something that's commonly seen, which also makes sense; they have similar names, and people are still learning them.

We've got loads of questions. We went on my phone from having no questions to having absolutely loads of questions, so thank you to everyone in the room online who's been submitting them, wow.

All right, first question with the most upvotes: can snapshots also be used to start up AWS Lambdas faster? So, in AWS, it's not something you, as a user, can control, like how Node gets started. The responsibility of using this kind of falls on whoever runs Node; I think in this case, it falls on Amazon. So as a user of the platform, rather than an embedder of Node, that's kind of out of your control. But if you are someone who can run Node yourself, or if you can pass additional flags to Node, then yes, you can do that. Awesome, thank you.

Next question: is there any chance of accidentally leaking sensitive environment variables to the snapshot during build time? So in Node core, we have some internal assertions to make sure we don't leak them, and we are also considering exposing this to userland. But if you're deserializing a startup snapshot, usually you don't get to see them; it's very unlikely that you see them, because Node will refresh all of them. If you intentionally put something in there, that's possible, but you can also refresh it later. All right. Yeah, although I imagine if people were doing it, it probably wouldn't be an intentional act. Definitely something I've been guilty of. How much of a startup boost has been tested to be achieved for userland startup snapshots? So there is literally a TypeScript compiler in Node core as a test fixture, to test that we can snapshot the TypeScript compiler. And you still get something in the range I mentioned: about two times faster. Before, it's like 200 milliseconds, for example, on my test machine.

Snapshot Performance and Limitations

Short description:

And then when you turn the snapshot on, it's like 100 milliseconds. If you do a lot of initialization in your application, you can get like two times faster. The limitations of snapshots include async operations that need to be finished before taking a snapshot. Putting all the code in a snapshot to optimize the startup of the whole app is doable, but may not provide enough compilation cache.

And then when you turn the snapshot on, it's like 100 milliseconds. It also depends on how much initialization you do in your application. If you do a lot, then you're going to save more, because you're not even going to run code to initialize your application; you're just going to deserialize the result. So yeah, in general, I would say you can get about two times faster.

Cool, thank you. We have time for one more and we have a bunch that have two ticks in. Let's actually go with the one that's right here at the top. What are the limitations of snapshots? For example, a TCP connection to a database won't be serialized or perhaps there's other things to consider as well.

Yeah, that's what I mentioned earlier. That, for example, would be an async operation that needs to be finished before you take a snapshot, because, while it might be possible to somehow deserialize an in-flight request, that's not currently a goal of this feature. So you have to make sure there are no pending async requests before you take the snapshot, and you need to resolve the promises. Those are the current limitations of these startup snapshots. Yeah, and hopefully that starts to build people's mental models around when it is appropriate to do this and when it may not make sense for a project.

Let's do just one more here. How good of an idea is it to put all the code in a snapshot to optimize the startup of the whole app? So one thing you can do is wrap everything in a function that you don't actually invoke when you build the snapshot, and then you set that function as the deserialize main function in the snapshot. When you deserialize, it runs the whole function. That's doable, but you probably don't get enough compilation cache with that, because if the functions are not at the top level, V8 will selectively compile some of them but not all of them; there are some heuristics. So it's something you can do, but if you want optimal results, you can try to put more of them at the top level. Yeah, makes sense, awesome.

Look, there are so many more questions; they fall down the bottom of the page, but we are out of time. So I will remind everyone, both in the room and online, that the speaker Q&A room is where to go to continue asking questions about this topic. The physical space is out by reception, to the left of the door as you're facing the exit, and those online can use the Q&A room in the spatial chat. But please join me in a massive round of applause for Joy. Thank you so much, what a fantastic talk.
