Parsing Millions of URLs per Second

Rate this content
Bookmark

With the end of Dennard scaling, the cost of computing is no longer falling at the hardware level: to improve efficiency, we need better software. Competing JavaScript runtimes are sometimes faster than Node.js: can we bridge the gap? We show that Node.js can not only match faster competitors but even surpass them given enough effort. URLs are the most fundamental element in web applications. Node.js 16 was significantly slower than competing engines (Bun and Deno) at URL parsing. By reducing the number of instructions and vectorizing sub-algorithms, we multiplied by three the speed of URL parsing in Node.js (as of Node.js 20). If you have upgraded Node.js, you have the JavaScript engine with the fastest URL parsing in the industry with uncompromising support for the latest WHATGL URL standard. We share our strategies for accelerating both C++ and JavaScript processing in practice.

 Yagiz Nizipli
Yagiz Nizipli
14 min
04 Apr, 2024

Comments

Sign in or register to post your comment.

Video Summary and Transcription

Today's talk explores the performance of URL parsing in Node.js and introduces the ADA URL parser, which can parse 6 million URLs per second. The ADA URL parser includes optimizations such as perfect hashing, memoization tables, and vectorization. It is available in multiple languages and has bindings for popular programming languages. Reach out to Ada URL and Daniel Lemire's blog for more information.

1. URL Parsing and Performance

Short description:

Today's talk is about parsing millions of URLs per second and achieving a 400% improvement. We will explore the state of Node.js performance in 2023 and the impact of a new URL parsing dependency. We'll also discuss the structure of a URL and the various components involved.

Hello. Today I'm going to talk about parsing millions of URLs per second. My name is Elzen Zipli and I'm a senior software engineer at Sentry. I'm an OGS technical steering committee member. I'm an OpenJS foundation cross-project council member. You can reach me from my GitHub account, github.com, and from X, formerly known as Twitter, from X.com.

Software performance in the last decade has changed drastically. The main goal was to reduce cost in cloud environments such as AWS, Azure or Google Cloud. The latency has been a problem, and in order to improve it, we need to optimize our code more than ever right now. Reduce complexity, parallelism, caching, and performance brings those kinds of things. And most importantly, the climate change. Faster computers resulted in better tomorrows and better climate.

So state of Node.js performance 2023. This is a quote from that. Since Node.js 18, a new URL parsing dependency was added to Node.js 8. This addition bumps the Node.js performance from parsing URLs to a new level. Some results could reach up to an improvement of 400%. State of Node.js performance 2023 and this is written by Rafael Gonzaga, which is a Node.js technical steering committee member. This talk is about how we reach 400% improvement in URL parsing. Another quote from James Snell from Cloudflare and also Node.js TSC. Just set a benchmark for a code change, go from 11 seconds to complete down to about half a second to complete, this makes me very happy. This is in reference to adding Ada URL to Cloudflare.

So let's start with the structure of a URL. For example, there is HTTPS user pass at example.com, 1 2 3 4, which is the port number, then we have Foo, Bar, Buzz, and QUU. So it starts with the protocol, HTTPS is the protocol, it ends with the slash. Then we have the username and password. This is an optional field in all URLs. Then we have the host name, which is example.com. Then we have the port, which is 1 2 3 4. And then we have the path name, which is slash Foo slash Bar.

2. URL Parsing and Assumptions

Short description:

URLs have various optional components, different encodings, and support for different types of URLs like file-based URLs, JavaScript URLs, and path names with dots. Implementations like PHP, Python, curl, and Go follow different URL parsing specifications. We challenge the assumptions that URL parsing doesn't matter and URLs are free.

And then we see the search, which starts with a question mark Buzz. And then we have the hash, which is QUU. So port number, path name, search, hash, username, password, they're all optional. Even host name is optional if you have a file URL. But this is just an example about how this structure of a URL is. There are also, despite the structure of the URL, there is also different encodings that the URL specification supports, such as non-ASCII format, which is the first one. Then we support file-based URLs, which is what you see in Unix-based systems, file, slash, slash, slash, Foo, Bar, Buzz, Foo, Bar, Test, Node.js. Then we have JavaScript URLs, which is JavaScript colon alert. Then we have percent encoding that starts with a URL that has subsections, sub strings that has a percentage character in dash. And then we have path names with dots, which is like example.org slash dot A slash A dot dot slash B, which basically resolves into a different URL according to the URL specification. Then we have IPv4 addresses with hex and octal digits, 127.0.0.0.0.0.0.0.1, which is 127.0.0.0.1. And we have also IPv6 and so on and so forth. According to what we do URL, if we put in this input string, HTTPS711home dot dot slash Montreal. PHP in PHP, it's unchanged. In Python, it's unchanged. In what we do URL, which is implemented by Chrome, Safari and all the browsers, including Ada, it's xn dash dash 711 and so on so forth. In curl, it's a lot different. And as you see, in Go runtime, it's a lot different as well. This is mostly because of different implementations and also from all other subsystems, all other languages doesn't follow the what we do URL strictly. For PHP and Python, they basically parse the URL from start and string without any making allocations. And for curl and Go, they implement a different specification called RFC 3787. Or similar, I'm not really quite sure about. So we have these old assumptions like does URL parsing really matter? Is it the bottleneck to some performance metric? URLs are free, you don't gain anything by overlaying. This is what, these were the assumptions that we broke with our work. And you will see why.

Check out more articles and videos

We constantly think of articles and videos that might spark Git people interest / skill us up or help building a stellar career

It's a Jungle Out There: What's Really Going on Inside Your Node_Modules Folder
Node Congress 2022Node Congress 2022
26 min
It's a Jungle Out There: What's Really Going on Inside Your Node_Modules Folder
Top Content
Do you know what’s really going on in your node_modules folder? Software supply chain attacks have exploded over the past 12 months and they’re only accelerating in 2022 and beyond. We’ll dive into examples of recent supply chain attacks and what concrete steps you can take to protect your team from this emerging threat.
You can check the slides for Feross' talk here.
Towards a Standard Library for JavaScript Runtimes
Node Congress 2022Node Congress 2022
34 min
Towards a Standard Library for JavaScript Runtimes
Top Content
You can check the slides for James' talk here.
ESM Loaders: Enhancing Module Loading in Node.js
JSNation 2023JSNation 2023
22 min
ESM Loaders: Enhancing Module Loading in Node.js
Native ESM support for Node.js was a chance for the Node.js project to release official support for enhancing the module loading experience, to enable use cases such as on the fly transpilation, module stubbing, support for loading modules from HTTP, and monitoring.
While CommonJS has support for all this, it was never officially supported and was done by hacking into the Node.js runtime code. ESM has fixed all this. We will look at the architecture of ESM loading in Node.js, and discuss the loader API that supports enhancing it. We will also look into advanced features such as loader chaining and off thread execution.
Out of the Box Node.js Diagnostics
Node Congress 2022Node Congress 2022
34 min
Out of the Box Node.js Diagnostics
In the early years of Node.js, diagnostics and debugging were considerable pain points. Modern versions of Node have improved considerably in these areas. Features like async stack traces, heap snapshots, and CPU profiling no longer require third party modules or modifications to application source code. This talk explores the various diagnostic features that have recently been built into Node.
You can check the slides for Colin's talk here. 
Node.js Compatibility in Deno
Node Congress 2022Node Congress 2022
34 min
Node.js Compatibility in Deno
Can Deno run apps and libraries authored for Node.js? What are the tradeoffs? How does it work? What’s next?
Multithreaded Logging with Pino
JSNation Live 2021JSNation Live 2021
19 min
Multithreaded Logging with Pino
Top Content
Almost every developer thinks that adding one more log line would not decrease the performance of their server... until logging becomes the biggest bottleneck for their systems! We created one of the fastest JSON loggers for Node.js: pino. One of our key decisions was to remove all "transport" to another process (or infrastructure): it reduced both CPU and memory consumption, removing any bottleneck from logging. However, this created friction and lowered the developer experience of using Pino and in-process transports is the most asked feature our user.In the upcoming version 7, we will solve this problem and increase throughput at the same time: we are introducing pino.transport() to start a worker thread that you can use to transfer your logs safely to other destinations, without sacrificing neither performance nor the developer experience.

Workshops on related topic

Node.js Masterclass
Node Congress 2023Node Congress 2023
109 min
Node.js Masterclass
Top Content
Workshop
Matteo Collina
Matteo Collina
Have you ever struggled with designing and structuring your Node.js applications? Building applications that are well organised, testable and extendable is not always easy. It can often turn out to be a lot more complicated than you expect it to be. In this live event Matteo will show you how he builds Node.js applications from scratch. You’ll learn how he approaches application design, and the philosophies that he applies to create modular, maintainable and effective applications.

Level: intermediate
Build and Deploy a Backend With Fastify & Platformatic
JSNation 2023JSNation 2023
104 min
Build and Deploy a Backend With Fastify & Platformatic
WorkshopFree
Matteo Collina
Matteo Collina
Platformatic allows you to rapidly develop GraphQL and REST APIs with minimal effort. The best part is that it also allows you to unleash the full potential of Node.js and Fastify whenever you need to. You can fully customise a Platformatic application by writing your own additional features and plugins. In the workshop, we’ll cover both our Open Source modules and our Cloud offering:- Platformatic OSS (open-source software) — Tools and libraries for rapidly building robust applications with Node.js (https://oss.platformatic.dev/).- Platformatic Cloud (currently in beta) — Our hosting platform that includes features such as preview apps, built-in metrics and integration with your Git flow (https://platformatic.dev/). 
In this workshop you'll learn how to develop APIs with Fastify and deploy them to the Platformatic Cloud.
0 to Auth in an Hour Using NodeJS SDK
Node Congress 2023Node Congress 2023
63 min
0 to Auth in an Hour Using NodeJS SDK
WorkshopFree
Asaf Shen
Asaf Shen
Passwordless authentication may seem complex, but it is simple to add it to any app using the right tool.
We will enhance a full-stack JS application (Node.JS backend + React frontend) to authenticate users with OAuth (social login) and One Time Passwords (email), including:- User authentication - Managing user interactions, returning session / refresh JWTs- Session management and validation - Storing the session for subsequent client requests, validating / refreshing sessions
At the end of the workshop, we will also touch on another approach to code authentication using frontend Descope Flows (drag-and-drop workflows), while keeping only session validation in the backend. With this, we will also show how easy it is to enable biometrics and other passwordless authentication methods.
Table of contents- A quick intro to core authentication concepts- Coding- Why passwordless matters
Prerequisites- IDE for your choice- Node 18 or higher
Building a Hyper Fast Web Server with Deno
JSNation Live 2021JSNation Live 2021
156 min
Building a Hyper Fast Web Server with Deno
WorkshopFree
Matt Landers
Will Johnston
2 authors
Deno 1.9 introduced a new web server API that takes advantage of Hyper, a fast and correct HTTP implementation for Rust. Using this API instead of the std/http implementation increases performance and provides support for HTTP2. In this workshop, learn how to create a web server utilizing Hyper under the hood and boost the performance for your web apps.
GraphQL - From Zero to Hero in 3 hours
React Summit 2022React Summit 2022
164 min
GraphQL - From Zero to Hero in 3 hours
Workshop
Pawel Sawicki
Pawel Sawicki
How to build a fullstack GraphQL application (Postgres + NestJs + React) in the shortest time possible.
All beginnings are hard. Even harder than choosing the technology is often developing a suitable architecture. Especially when it comes to GraphQL.
In this workshop, you will get a variety of best practices that you would normally have to work through over a number of projects - all in just three hours.
If you've always wanted to participate in a hackathon to get something up and running in the shortest amount of time - then take an active part in this workshop, and participate in the thought processes of the trainer.
Mastering Node.js Test Runner
TestJS Summit 2023TestJS Summit 2023
78 min
Mastering Node.js Test Runner
Workshop
Marco Ippolito
Marco Ippolito
Node.js test runner is modern, fast, and doesn't require additional libraries, but understanding and using it well can be tricky. You will learn how to use Node.js test runner to its full potential. We'll show you how it compares to other tools, how to set it up, and how to run your tests effectively. During the workshop, we'll do exercises to help you get comfortable with filtering, using native assertions, running tests in parallel, using CLI, and more. We'll also talk about working with TypeScript, making custom reports, and code coverage.