Node.js startup snapshots

Rate this content
Bookmark

V8 provides the ability to capture a snapshot out of an initialized heap and rehydrate a heap from the snapshot instead of initializing it from scratch. One of the most important use cases of this feature is to improve the startup performance of an application built on top of V8. In this talk we are going to take a look at the integration of the V8 startup snapshots in Node.js, how the snapshots have been used to speed up the startup of Node.js core, and how user-land startup snapshots can be used to speed up the startup of user applications.

Joyee Cheung
Joyee Cheung
28 min
14 Apr, 2023

Comments

Sign in or register to post your comment.

Video Summary and Transcription

The Talk discusses the Startup Snapshot initiative in Node, which aims to improve startup performance by adding new features and optimizing initialization costs. Startup snapshots, serialized binary blobs, are used to speed up startup and can be generated for both the core and user applications. Custom snapshots allow deserializing a heap from a specified snapshot, skipping parsing and compilation. The Talk also addresses misconceptions and limitations of startup snapshots, and highlights the different use cases for heap snapshots and startup snapshots.

Available in Español

1. Introduction to Startup Snapshot in Node

Short description:

I'm Joy, working on the startup performance strategic initiative in Node. The initiative has been renamed to Startup Snapshot. Node has been adding new features, requiring additional setup during startup. From LTS 18 to upcoming 20, support for VEJ, web crypto, file API, blob, web strings, and APIs under Yotel has been added. Node core is half in JavaScript and half in C++.

As mentioned, I'm Joy. I work at Egaleo and I work on Node and V8. So I've been working on the startup performance strategic initiative in Node for a while. The initiative has recently been renamed to Startup Snapshot as we have done the integration within Node core and we are enabling this feature for userland applications, which is what I'm going to talk about today.

So let's get started. So a bit of history. The Startup Snapshot integration started while Node started gradually dropping the old small core philosophy and adding a lot more built-in features. This includes new globals, in particular, new web APIs, new built-in modules, and new APIs in existing modules. These new features either require additional setup during the startup of Node core or require additional internal modules to be loaded during the startup.

So to give you an overview from the last LTS version 18 to the upcoming 20, we've added support for VEJ, web crypto, file API, blob, a bunch of web strings, and a bunch of new APIs under Yotel, such as the argument parser and the MIE type parser. The list is longer than that, but you get the idea. Like Node is growing a lot. Another part of this challenge is that the Node core is written about half in JavaScript and half in C++. So a lot of those internals are actually implemented in JavaScript.

2. Startup Performance and Initialization

Short description:

The startup performant is harder to maintain as the JavaScript code needs to be parsed and compiled before execution. To mitigate potential prototype pollution, JavaScript buildings are not copied for internal use. Node core uses multiple strategies to control startup initialization costs, including lazy loading, precompiling internal modules, and using V8 Startup snapshots. The snapshots are serialized binary blobs capturing the VA heap and execution contacts. They are used for isolates and contacts in NOE.

The upside of this is that this lowers the contribution barrier. In some cases, it reduces the C++ to JavaScript callback costs. But at the same time, this makes it harder to keep the startup performant. For one, the JavaScript code needs to be parsed and compiled before they can be executed, and that takes time. Also, most of the JavaScript code for initialization only gets run once during startup because it's just initialization, so it doesn't get optimized by the JavaScript engine.

When implementing a library in JavaScript, we have to take potential prototype pollution into account. You don't want the user to blow up the runtime just because they delete something from the building to a prototype, like string prototype that started with. So to mitigate this, no need to create copies of these JavaScript buildings as startup for the internals to use. They don't actually use the prototype methods that we expose to users. All this can slow down the startup as node grows.

So to keep the cost of the startup initialization under control, node core uses multiple strategies. First, we do not initialize all the globals and buildings as startup. For features that are still experimental or too new to be used widely or only serve a specific type of application, we only install accessories that will load them lazily when the user access them for the first time. And second, when building releases, we precompile all the internal modules to generate the code cache which contains bytecode and metadata, and we amp them to the executable so that when we do have to load additional modules, a user requests, we pass the code cache to V8, and V8 can skip the parsing and the compilation, and just use the serialized code when it updates, when it validates that cache. And finally, for essential features that we almost always have to load, for example, the web URL API, the FS module, which are used also by other internals, or like widely used timers, like time, widely used features like timers, we captured them in a V8 Startup snapshot, which helps simply skipping the execution of the initialization code and saving time during startup.

So this is kind of like how the node executable used to be built and run. Initially, we were just embedding the JavaScript code into the executable, at build time. And at run time, we need to parse it, we need to compile it, we need to execute it to get the node core initialized, and before we can run user code and process system states to initialize the user app. And then we introduced embedded code cache. So at build time, we precompile all the internal JavaScript code and generate compile code cache, and then we embed them into the executable. And the run time, we'll ask VA to use the code cache and skip the parsing and compilation process. We'll still keep the internal JavaScript code as the source of truth, in case the code cache doesn't validate in the current execution environment, but most of the time, the code used, and we just skip the compilation process. And now, with the starter snapshot integration, we just run the internal JavaScript code at build time to initialize a note heap and then we capture a snapshot and embed that into the executable. The other two are still kept as fallback, but at runtime we just simply deserialize the snapshot to get the initialized heap. So there is no need to even parse, compile, execute. The internal code is just like, you deserialize the result. So what exactly are these VA startup snapshots? They're basically the VA heap serialized into a binary blob. There are two layers of snapshots, one that captures all the primitives and the native bindings, and one that captures the execution contacts, like the objects and functions. So currently, NOE uses the isolate snapshot for all the isolates that you can create from the useland, including the main isolate and the worker isolates. We also have built-in contacts snapshots for the main contacts, the VM contacts, and the worker contacts, although the worker contacts snapshot currently only contains very minimal stuff.

QnA

Check out more articles and videos

We constantly think of articles and videos that might spark Git people interest / skill us up or help building a stellar career

It's a Jungle Out There: What's Really Going on Inside Your Node_Modules Folder
Node Congress 2022Node Congress 2022
26 min
It's a Jungle Out There: What's Really Going on Inside Your Node_Modules Folder
Top Content
Do you know what’s really going on in your node_modules folder? Software supply chain attacks have exploded over the past 12 months and they’re only accelerating in 2022 and beyond. We’ll dive into examples of recent supply chain attacks and what concrete steps you can take to protect your team from this emerging threat.
You can check the slides for Feross' talk here.
Towards a Standard Library for JavaScript Runtimes
Node Congress 2022Node Congress 2022
34 min
Towards a Standard Library for JavaScript Runtimes
Top Content
You can check the slides for James' talk here.
ESM Loaders: Enhancing Module Loading in Node.js
JSNation 2023JSNation 2023
22 min
ESM Loaders: Enhancing Module Loading in Node.js
Native ESM support for Node.js was a chance for the Node.js project to release official support for enhancing the module loading experience, to enable use cases such as on the fly transpilation, module stubbing, support for loading modules from HTTP, and monitoring.
While CommonJS has support for all this, it was never officially supported and was done by hacking into the Node.js runtime code. ESM has fixed all this. We will look at the architecture of ESM loading in Node.js, and discuss the loader API that supports enhancing it. We will also look into advanced features such as loader chaining and off thread execution.
Out of the Box Node.js Diagnostics
Node Congress 2022Node Congress 2022
34 min
Out of the Box Node.js Diagnostics
In the early years of Node.js, diagnostics and debugging were considerable pain points. Modern versions of Node have improved considerably in these areas. Features like async stack traces, heap snapshots, and CPU profiling no longer require third party modules or modifications to application source code. This talk explores the various diagnostic features that have recently been built into Node.
You can check the slides for Colin's talk here. 
Node.js Compatibility in Deno
Node Congress 2022Node Congress 2022
34 min
Node.js Compatibility in Deno
Can Deno run apps and libraries authored for Node.js? What are the tradeoffs? How does it work? What’s next?
Multithreaded Logging with Pino
JSNation Live 2021JSNation Live 2021
19 min
Multithreaded Logging with Pino
Top Content
Almost every developer thinks that adding one more log line would not decrease the performance of their server... until logging becomes the biggest bottleneck for their systems! We created one of the fastest JSON loggers for Node.js: pino. One of our key decisions was to remove all "transport" to another process (or infrastructure): it reduced both CPU and memory consumption, removing any bottleneck from logging. However, this created friction and lowered the developer experience of using Pino and in-process transports is the most asked feature our user.In the upcoming version 7, we will solve this problem and increase throughput at the same time: we are introducing pino.transport() to start a worker thread that you can use to transfer your logs safely to other destinations, without sacrificing neither performance nor the developer experience.

Workshops on related topic

Node.js Masterclass
Node Congress 2023Node Congress 2023
109 min
Node.js Masterclass
Top Content
Workshop
Matteo Collina
Matteo Collina
Have you ever struggled with designing and structuring your Node.js applications? Building applications that are well organised, testable and extendable is not always easy. It can often turn out to be a lot more complicated than you expect it to be. In this live event Matteo will show you how he builds Node.js applications from scratch. You’ll learn how he approaches application design, and the philosophies that he applies to create modular, maintainable and effective applications.

Level: intermediate
Build and Deploy a Backend With Fastify & Platformatic
JSNation 2023JSNation 2023
104 min
Build and Deploy a Backend With Fastify & Platformatic
WorkshopFree
Matteo Collina
Matteo Collina
Platformatic allows you to rapidly develop GraphQL and REST APIs with minimal effort. The best part is that it also allows you to unleash the full potential of Node.js and Fastify whenever you need to. You can fully customise a Platformatic application by writing your own additional features and plugins. In the workshop, we’ll cover both our Open Source modules and our Cloud offering:- Platformatic OSS (open-source software) — Tools and libraries for rapidly building robust applications with Node.js (https://oss.platformatic.dev/).- Platformatic Cloud (currently in beta) — Our hosting platform that includes features such as preview apps, built-in metrics and integration with your Git flow (https://platformatic.dev/). 
In this workshop you'll learn how to develop APIs with Fastify and deploy them to the Platformatic Cloud.
0 to Auth in an Hour Using NodeJS SDK
Node Congress 2023Node Congress 2023
63 min
0 to Auth in an Hour Using NodeJS SDK
WorkshopFree
Asaf Shen
Asaf Shen
Passwordless authentication may seem complex, but it is simple to add it to any app using the right tool.
We will enhance a full-stack JS application (Node.JS backend + React frontend) to authenticate users with OAuth (social login) and One Time Passwords (email), including:- User authentication - Managing user interactions, returning session / refresh JWTs- Session management and validation - Storing the session for subsequent client requests, validating / refreshing sessions
At the end of the workshop, we will also touch on another approach to code authentication using frontend Descope Flows (drag-and-drop workflows), while keeping only session validation in the backend. With this, we will also show how easy it is to enable biometrics and other passwordless authentication methods.
Table of contents- A quick intro to core authentication concepts- Coding- Why passwordless matters
Prerequisites- IDE for your choice- Node 18 or higher
Building a Hyper Fast Web Server with Deno
JSNation Live 2021JSNation Live 2021
156 min
Building a Hyper Fast Web Server with Deno
WorkshopFree
Matt Landers
Will Johnston
2 authors
Deno 1.9 introduced a new web server API that takes advantage of Hyper, a fast and correct HTTP implementation for Rust. Using this API instead of the std/http implementation increases performance and provides support for HTTP2. In this workshop, learn how to create a web server utilizing Hyper under the hood and boost the performance for your web apps.
GraphQL - From Zero to Hero in 3 hours
React Summit 2022React Summit 2022
164 min
GraphQL - From Zero to Hero in 3 hours
Workshop
Pawel Sawicki
Pawel Sawicki
How to build a fullstack GraphQL application (Postgres + NestJs + React) in the shortest time possible.
All beginnings are hard. Even harder than choosing the technology is often developing a suitable architecture. Especially when it comes to GraphQL.
In this workshop, you will get a variety of best practices that you would normally have to work through over a number of projects - all in just three hours.
If you've always wanted to participate in a hackathon to get something up and running in the shortest amount of time - then take an active part in this workshop, and participate in the thought processes of the trainer.
Mastering Node.js Test Runner
TestJS Summit 2023TestJS Summit 2023
78 min
Mastering Node.js Test Runner
Workshop
Marco Ippolito
Marco Ippolito
Node.js test runner is modern, fast, and doesn't require additional libraries, but understanding and using it well can be tricky. You will learn how to use Node.js test runner to its full potential. We'll show you how it compares to other tools, how to set it up, and how to run your tests effectively. During the workshop, we'll do exercises to help you get comfortable with filtering, using native assertions, running tests in parallel, using CLI, and more. We'll also talk about working with TypeScript, making custom reports, and code coverage.