Making JavaScript on WebAssembly Fast

JavaScript in the browser runs many times faster than it did two decades ago. And that happened because the browser vendors spent that time working on intensive performance optimizations in their JavaScript engines.

Because of this optimization work, JavaScript is now running in many places besides the browser. But there are still some environments where the JS engines can’t apply those optimizations in the right way to make things fast.

We’re working to solve this, beginning a whole new wave of JavaScript optimization work. We’re improving JavaScript performance for entirely different environments, where different rules apply. And this is possible because of WebAssembly. In this talk, I'll explain how this all works and what's coming next.


Transcript


About Lin


Hi, I'm Lin Clark and I make code cartoons. I also work at Fastly, which is doing a ton of cool things with WebAssembly to make better edge compute possible. And I'm a co-founder of the Bytecode Alliance. We're working on tools for a WebAssembly ecosystem that extends beyond the browser. And it's one of those tools that I want to talk to you about today.


About JavaScript and WebAssembly


JavaScript was first created to run in the browser so that people could add a little bit of interactivity to their web pages. No one would have guessed that 20 years later, people would be using JavaScript to build all sorts of big, complex applications to run in your browser. What made this possible is that JavaScript in the browser runs a lot faster than it did two decades ago. And that happened because the browser vendors spent that time working on some pretty intensive performance optimizations.

[01:14] Now this started with the introduction of Just-in-Time compilers around 2008. And the browsers have built on top of that, continuing these optimization efforts. Now we're starting work on optimizing JavaScript performance for an entirely different set of environments where different rules apply. And this is possible because of WebAssembly. So today I want to explain what it is about WebAssembly that enables this.

But first, I want to give you a heads up. This talk is structured a bit differently than speaking experts would tell me I should be structuring this presentation. I'm going to start with telling you how we're making this work at all. And once you've heard that you might not be on board, you might think that this is a pretty ridiculous idea. So that's why I'm then going to explain why you would actually want to do this. And then once you're bought in, and I know you'll be bought in, then I'm going to come back and explain exactly how it is that we're making this fast.


How we're running JavaScript inside a WebAssembly engine


[02:16] So let's get started with how we're running JavaScript inside a WebAssembly engine. Whenever you're running JavaScript, the JS code needs to be executed as machine code in one way or another. Now this is done by the JS engine using a variety of different techniques, from interpreters to compilers. And I explained this all in more detail in my first set of articles about WebAssembly back in 2017. So if you want to understand more about how this works, you can go back and read those articles.

[02:51] Running this JavaScript code is really quite easy in environments like the web, where you know that you're going to have a JavaScript engine available. But what if your target platform doesn't have a JavaScript engine? Then you need to deploy your JavaScript engine with your code. So that's what we need to do to bring JavaScript to these different environments. So how do we do this?

Well, we deploy the JavaScript engine as a WebAssembly module, and that makes it portable across a bunch of different machine architectures. And with WASI, we can make it portable across a bunch of different operating systems as well. This means that the whole JavaScript environment is bundled up into the WebAssembly module. And once you deploy it, all you need to do is feed in the JavaScript code. And that JavaScript engine will run that code. Now, instead of working directly on the machine's memory, like it would for a browser, the JavaScript engine puts everything, from Bytecode to the garbage-collected objects that the Bytecode works on, into the WebAssembly module's linear memory.

[04:00] For our JS engine, we went with SpiderMonkey. That's the JS engine that Firefox uses. It's one of the industrial strength JavaScript virtual machines, because it's been battle-tested in the browser. And this kind of battle testing and investment in security is really important when you're running untrusted code or running code that processes untrusted input. SpiderMonkey also uses a technique called precise stack scanning, which is important for some of the optimizations I'll be describing a bit later in the talk.

So far there's nothing revolutionary about the approach that I've described. People have already been running JavaScript inside of WebAssembly like this for a number of years. The problem is that it's slow. WebAssembly doesn't allow you to dynamically generate new machine code and run it from within pure WebAssembly code. So this means that you can't use the JIT. You can only use the interpreter. Now, given this constraint, you might be asking why. Since JITs are how the browsers made JS code run fast, and since you can't JIT compile inside of a WebAssembly module, this just doesn't make sense. But what if, even given these constraints, we could actually make this JavaScript run fast? Let's look at a couple of use cases where a fast version of this approach could be really useful.


Use Cases


[05:31] There are some places where you can't use a Just-in-Time compiler due to security concerns. So for example, iOS devices, or some smart TVs and gaming consoles. On these platforms, you have to use an interpreter. But the kinds of applications that you run on these platforms are long running, and they require lots of code. And those are exactly the kinds of conditions where historically you wouldn't want to use an interpreter, because of how much it slows down your execution. If we can make our approach fast, then these developers could use JavaScript on JIT-less platforms without taking a massive performance hit.

Now, there are other places where using a JIT isn't a problem, but where startup times are prohibitive. So an example of this is in Serverless functions, and this plays into that cold-start latency problem that you might've heard people talking about. Even if you're using the most pared-down JavaScript environment, which is an isolate that just starts up a bare JavaScript engine, you're looking at about five milliseconds of startup latency. Now there are some ways to hide this startup latency for an incoming request. But it's getting harder to hide as connection times are being optimized in the network layer, with proposals such as QUIC. It's also harder to hide when you're chaining different Serverless functions together.

[06:58] But more than this, platforms that use these kinds of techniques to hide latency also often reuse instances between requests. In some cases, this means that global state can be observed between different requests, which can be a security issue. And because of this cold start problem, developers also often don't follow best practices. They stuff a lot of functions into one Serverless deployment. So this results in another security issue, which is a larger blast radius. If one part of the Serverless deployment is exploited, the attacker has access to everything in that deployment. But if we can get JavaScript startup times low enough in these contexts, then we wouldn't need to hide startup times with any tricks. We could just start up an instance in microseconds. With this, we can provide a new instance for each request, which means that there's no state lying around between requests. And because the instances are so lightweight, developers could feel free to break up their code into fine-grained pieces, and this would bring their blast radius down to a minimum for any single piece of code.


How can we make JavaScript on WebAssembly fast?


[08:08] So for these use cases, there's a big benefit to making JavaScript on Wasm fast. But how can we do that? In order to answer that question, we need to understand where the JavaScript engine spends its time. We can break down the work that a JavaScript engine has to do into two different parts: initialization and runtime.

I think of the JS engine as a contractor. This contractor is retained to complete a job. And that job is running the JavaScript code and getting to a final result. Before this contractor can actually start running the project, though, it needs to do a little bit of preliminary work. This initialization phase includes everything that only needs to happen once, at the very start of the project.

[09:00] So one part of this is application initialization. For any project, the contractor needs to take a look at the work that the client wants it to do, and then set up the resources that it needs in order to complete that job. So for example, the contractor reads through the project briefing and other supporting documents and turns them into something that it can work with. So this might be something like setting up the project management system with all of the documents stored and organized and breaking things into tasks that go into the task management system.

In the case of the JS engine, this work looks more like reading through the top-level of the source code and parsing functions into Bytecode, or allocating memory for the variables that are declared, and setting values where they're already defined.

[09:46] So that's application initialization, but in some cases there's also engine initialization and you see this in contexts like Serverless. The JS engine itself needs to be started up in the first place, and built-in functions need to be added to the environment.

I think of this like setting up the office itself, doing things like assembling the IKEA chairs and tables and everything else in the environment, before starting the work. Now, this can take considerable time and that's part of what can make the cold start such an issue for serverless use cases.

[10:23] Once the initialization phase is done, the JS engine can start its work: running the code. The speed of this part of the work is called throughput, and this throughput is affected by lots of different variables. So for example: which language features are being used, whether the code behaves predictably from the JS engine's point of view, what sorts of data structures are used, and whether or not the code runs long enough to benefit from the JS engine's optimizing compiler.

So these are the two phases where the JS engine spends its time. Initialization and Runtime. Now, how can we make the work in these two phases go faster?

[11:10] Let's start with initialization. Can we make that fast? And spoiler alert. Yes, we can. We used a tool called Wizer for this and I'll explain how that works in a minute. But first I want to show you some of the results that we saw. We tested with a small Markdown application. And using Wizer, we were able to make startup time six times faster. If we look in more depth at this case, about 80% of this was spent on engine initialization. And the remaining 20% was spent on application initialization. And part of that is because this Markdown renderer is a very small and simple application. As apps get larger and more complex, application initialization time just takes longer. So we would see even larger comparative speed ups for real-world applications.

[12:02] Now we get this fast startup using a technique called snapshotting. Before the code is deployed, as part of the build step, we run the JavaScript code using the JavaScript engine to the end of the initialization phase. And at this point, the JS engine has parsed all of the JS and turned it into Bytecode, which the JS engine module stores in the linear memory. And the engine also does a lot of memory allocation and initialization in this phase.

Because this linear memory is so self-contained, once all of the values have been filled in, we can just take that memory and attach it as a data section to a Wasm module. When the JS engine module is instantiated, it has access to all of the data in the data section. Whenever the engine needs a bit of that memory, it can copy the section, or rather the memory page, that it needs into its own linear memory. With this, the JS engine doesn't have to do any setup when it starts up. All of this is pre-initialized, ready and waiting for it to start its work.
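As a rough illustration of the snapshotting idea described here, the following Python sketch models linear memory as a plain byte buffer. It is only a toy under stated assumptions: `ToyEngine`, `build_snapshot`, and the memory layout are hypothetical names invented for this example, and nothing in it reflects how Wizer or SpiderMonkey are actually implemented.

```python
import struct

PAGE_SIZE = 64 * 1024  # WebAssembly linear memory is sized in 64 KiB pages


class ToyEngine:
    """Hypothetical stand-in for an engine whose entire state lives in one buffer."""

    def __init__(self, memory=None):
        # A fresh instance starts from zeroed memory; a pre-initialized one
        # starts from its own private copy of the snapshot bytes.
        self.memory = bytearray(memory) if memory is not None else bytearray(PAGE_SIZE)
        self.init_calls = 0

    def initialize(self, values):
        """The once-only setup work: write a count, then the values, into memory."""
        self.init_calls += 1
        struct.pack_into("<I", self.memory, 0, len(values))
        for i, value in enumerate(values):
            struct.pack_into("<i", self.memory, 4 + 4 * i, value)

    def read_values(self):
        """Runtime work: read back whatever initialization left in memory."""
        (count,) = struct.unpack_from("<I", self.memory, 0)
        return [struct.unpack_from("<i", self.memory, 4 + 4 * i)[0] for i in range(count)]


def build_snapshot(values):
    """Build step: run initialization once, then freeze the memory as immutable bytes."""
    engine = ToyEngine()
    engine.initialize(values)
    return bytes(engine.memory)


# Build time: pay the initialization cost exactly once.
snapshot = build_snapshot([10, 20, 30])

# Deploy time: each new instance copies the snapshot instead of re-running setup.
instance = ToyEngine(memory=snapshot)
other_instance = ToyEngine(memory=snapshot)
other_instance.memory[4] = 99  # a write in one instance stays private to it

print(instance.read_values(), instance.init_calls)
```

The point of the sketch is that the instances never call `initialize`: all the setup cost is paid once at build time, and because each instance works on its own copy, no state leaks from one instance to another.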

[13:11] Currently, we attach the data section to the same module as the JS engine. But in the future, once WebAssembly module linking is in place, we'll be able to ship the data section as a separate module. So this provides a really clean separation and allows the JS engine module to be reused across a bunch of different JS applications.

The JS engine module only contains the code for the engine. That means that once it's compiled, that code can be effectively cached and reused between lots of different instances. Now, on the other hand, the application-specific module contains no WebAssembly code. It only contains the linear memory, which in turn contains the JavaScript Bytecode, along with all of the rest of the JS engine state that was initialized. This makes it really easy to move this memory around and send it wherever it needs to go.

[14:06] It's kind of like the JS engine contractor doesn't need to set up its own office at all. It just gets this travel case shipped to it. And that travel case has the whole office, with everything in it all set up and ready to go for the JS engine to just get to work. And the coolest thing about this is that it doesn't rely on anything that's JS dependent. It's just using an existing property of WebAssembly itself. So you could use the same technique with languages like Python, Ruby, Lua, and other runtimes, too.

So with this approach we can get to this super fast startup time. But what about throughput? Well, for some use cases, the throughput is actually not too bad. If you have a very short running piece of JavaScript, it wouldn't go through the JIT anyways, it would stay in the interpreter the whole time. So in that case, the throughput will be about the same as in the browser, and the code will have finished before a traditional JavaScript engine would even have finished initialization, in the case where you need to do engine initialization. But for longer-running JavaScript, it doesn't take all that long before the JIT starts kicking in. And once this happens, the throughput difference does become pretty obvious.

[15:24] Now, as I said before, it's not possible to JIT compile code within a pure WebAssembly module at the moment. But it turns out that we can apply some of the same thinking that comes with Just-in-Time compilation to an ahead-of-time compilation model.

So one optimizing technique that JITs use is inline caching, which I also explained in my first series about WebAssembly. When the same bit of code gets interpreted over and over and over again, the engine decides to store its translation for that bit of code to reuse next time. And the stored translation is called the stub. Now these stubs are chained together into a linked list. And they're based on what types are used for that particular invocation. The next time that the code is run, the engine will check through this list to see whether or not it actually has a translation that is available for those types. And if so, it'll just reuse the stub.
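To make the stub chain concrete, here is a minimal Python sketch of one property-access site with a linked list of stubs, each specialized to one object shape. All of the names (`Shape`, `Obj`, `Stub`, `GetPropSite`) are hypothetical and chosen for this example; real engines generate machine code for stubs rather than storing Python objects.

```python
class Shape:
    """Hypothetical hidden class: maps property names to slot offsets."""

    def __init__(self, *names):
        self.offsets = {name: i for i, name in enumerate(names)}


class Obj:
    """An object is a shape plus a flat list of slots laid out in shape order."""

    def __init__(self, shape, *slots):
        self.shape = shape
        self.slots = list(slots)


class Stub:
    """One cached translation, specialized to a single shape, linked to the next."""

    def __init__(self, shape, offset, next_stub):
        self.shape = shape
        self.offset = offset
        self.next = next_stub


class GetPropSite:
    """One property-access site in a program, such as `obj.x`."""

    def __init__(self, name):
        self.name = name
        self.first_stub = None
        self.slow_lookups = 0

    def get(self, obj):
        # Fast path: walk the chain looking for a stub that matches this shape.
        stub = self.first_stub
        while stub is not None:
            if stub.shape is obj.shape:
                return obj.slots[stub.offset]
            stub = stub.next
        # Slow path: do the full lookup, then cache it as a new stub at the head.
        self.slow_lookups += 1
        offset = obj.shape.offsets[self.name]
        self.first_stub = Stub(obj.shape, offset, self.first_stub)
        return obj.slots[offset]


point = Shape("x", "y")
flipped = Shape("y", "x")
site = GetPropSite("x")

values = [
    site.get(Obj(point, 1, 2)),    # miss: first time this shape is seen
    site.get(Obj(point, 3, 4)),    # hit: reuses the stub for `point`
    site.get(Obj(flipped, 9, 5)),  # miss: a new shape adds a second stub
]
print(values, site.slow_lookups)
```

Only the first access for each shape takes the slow path; later accesses with a known shape reuse the cached offset.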

[16:22] Because IC stubs are commonly used in JITs, people think of them as being very dynamic and specific to each program. But it turns out that they can be applied in an AOT context, too.

Even before we see the JavaScript code, we already know a lot of the IC stubs that we're going to need to generate. That's because there are some patterns in JavaScript that just get used a whole lot.

[16:49] A good example of this is accessing properties on objects. This happens a lot in JavaScript code, and it can be sped up using an IC stub. For objects that have a certain shape or hidden class, that is where the properties are laid out in the same order, when you get a particular property from those objects, that property will always be at the same offset.

Now, traditionally this kind of IC stub in the JIT would hard code two values: the pointer to the shape and the offset of the property. That requires information that we don't have ahead of time. What we can do is parameterize the IC stub. So we can treat the shape and the property offset as variables that get passed in for the stub. And this way we can create a single stub that loads values from memory, and then use that same stub code everywhere. We can just bake all of the stubs for these common patterns into the AOT compiled module, regardless of what the JavaScript is actually doing.
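The difference between a hard-coded stub and a parameterized one can be sketched in Python as follows. This is an illustrative assumption, not the engine's real design: `load_slot_stub`, `ic_entries`, and the site labels are hypothetical names, and shapes are reduced to string identifiers.

```python
from typing import NamedTuple

MISS = object()  # sentinel meaning "guard failed, fall back to the slow path"


class Obj(NamedTuple):
    shape: str    # hypothetical shape identifier
    slots: tuple  # property values, laid out in shape order


def load_slot_stub(obj, shape, offset):
    """The single stub that exists ahead of time.

    Shape and offset are not baked into the code; they arrive as arguments.
    """
    if obj.shape != shape:
        return MISS
    return obj.slots[offset]


# Per-site cache entries are plain data filled in at run time, so no new
# code has to be generated for a new shape or offset.
ic_entries = {
    "site_a: p.x": ("Point", 0),
    "site_b: p.y": ("Point", 1),
    "site_c: u.name": ("User", 0),
}


def run_site(site, obj):
    """Look up the site's data, then call the one shared stub with it."""
    shape, offset = ic_entries[site]
    return load_slot_stub(obj, shape, offset)


p = Obj("Point", (3, 4))
u = Obj("User", ("lin",))

print(run_site("site_a: p.x", p))
print(run_site("site_b: p.y", p))
print(run_site("site_c: u.name", u))
print(run_site("site_a: p.x", u) is MISS)
```

All three sites share the same stub code; the only thing that differs per site is a pair of values, which is why the stub can be compiled before the JavaScript is ever seen.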

[17:53] And we discovered that with just a couple of kilobytes of IC stubs, we can cover the vast majority of all JS code. For example, with two kilobytes of IC stubs, we can cover 95% of the JavaScript in Google's Octane benchmark. And from preliminary tests, that percentage seems to hold up for general web browsing as well.

Now this is just one example of a potential optimization that we can make. Right now, we're in the same kind of position that the browser JS engines were in in the early days, when they were first experimenting with Just-in-Time compilers. We still have a lot of work to do to find the clever shortcuts that we can use in this context. But we're excited to be starting that work and excited for the changes to come.

If you're excited like we are about this and want to contribute to the optimization efforts, or if you want to try to make this work for another language like Python or Ruby or Lua, we'd be happy to hear from you. You can find us on the messaging platform, Zulip. Feel free to post there if you want to ask for more info. You can also find links to the projects that I mentioned in my recently published blog post on the Bytecode Alliance blog. I want to say thank you to the organizers for inviting me to speak here today. And thank you all for listening.


Questions


[19:17] Mettin Parzinski: Hey, Lin.

Lin Clark: Hi there.

Mettin Parzinski: Good to see you.

Lin Clark: Good to see you too.

[19:22] Mettin Parzinski: So with 55%, we have, "No, but I want to," and 40% said, "No." Does this surprise you? Is this what you were expecting?

[19:34] Lin Clark: No, it doesn't surprise me. It's still pretty early in terms of the tooling for the ecosystem. There are a lot of people that are using WebAssembly as users, without realizing it. So of course, if you're using Facebook, you're using WebAssembly when you upload a picture. So there are lots of folks that are actually on the user side using it under the hood and not really realizing it. There are lots of people also that are using it because it's embedded in modules that they're using. So that all makes sense that a lot of people don't realize that they're using WebAssembly. I think that as things progress, people are going to realize more and more when they are using WebAssembly, they're going to start targeting WebAssembly in certain cases, like with the JavaScript to WebAssembly work.

[20:31] Mettin Parzinski: You're talking more about that people are using it as a consumer, but not as a developer. Yeah, it doesn't surprise me at all. Even though it's been around for a long time, I actually hop around from client to client every year, and I never hear anywhere where it's used yet here in the Netherlands. So of course, there will be companies, but that's my experience, so it doesn't surprise me.

So let's jump to the Q&A. We have some questions from our audience, and if you still have any questions for Lin, then you can jump to the Community Track Q&A channel in Discord. I want to make one little note. Of course, you made your cartoons and I think everyone here must've fallen in love with your slides, and you will see a big spike in your traffic on the Code Cartoons website. Really great styling of your slides, so I wanted to give some compliments on that.

[21:31] Lin Clark: Thank you. The Code cartoons website has not been updated in a while. I need to actually take care of that. So a great place to find them is the Bytecode Alliance blog or Mozilla Hacks website.

Mettin Parzinski: And on your Twitter, I think?

Lin Clark: Yes. And on my Twitter.

[21:51] Mettin Parzinski: For the latest and greatest from Lin, check her Twitter. The first question is from Alexis. "Do you think that WebAssembly will become a major part of web development, bringing other languages to the web, or will it stay in the area of computational heavy applications?"

[22:13] Lin Clark: So I think that this wasn't the plan, but I think that we're actually going to see more of the interesting development happening outside of the web, and then coming back to the web through WebAssembly. Because I think on the web, for most things, keeping your work in JavaScript does make sense. Keeping it in JavaScript that's running natively in the browser does make sense. Except for those computationally heavy pieces. But when you're talking about outside the web, you get so many benefits from WebAssembly. You get the isolation, you get the small footprint, you get the portability that you don't have with a lot of other technologies. So I think that we're going to see a lot of uptake and interesting work happening outside of the web. And then as people want to start using those applications in their own websites, they'll bring it back into the web and start plugging it in there. That's my prediction.

[23:22] Mettin Parzinski: That sounds good. Next question is from Happy to Collaborate. I think this is someone that wants to help out. "Do you believe that the many standard, highly used modules that currently exist in JavaScript would be mimicked or duplicated or replaced by equivalent Wasm-based modules? This would maybe offer better security to the consumer."

[23:47] Lin Clark: I'd be interested. You said managed modules?

Mettin Parzinski: No, they're all highly used modules, standard modules. So I'm thinking this means something like Underscore.

Lin Clark: Okay. So Underscore is an NPM module. It could be talking about built-in modules, which are the standardized modules in JavaScript. And those are browser built-ins. And then there's also the NPM ecosystem that has a lot of commonly used modules. And yes, I do think that we're going to see implementations of that same kind of functionality in WebAssembly, sometimes more performant.

[24:35] Mettin Parzinski: Yeah, that would be awesome. Next question is from our attendee, Keith: "What would be a good way to incorporate WebAssembly in an existing application to minimize risk, but start to get comfortable using it and taking advantage of it and how to use WebAssembly in an existing project?"

[24:55] Lin Clark: So it depends on whether or not you're talking about the web or outside of the web. So if you're talking about on the web, you probably want to be using an application where, if you have something that's computationally heavy, then it makes sense to take that little part that is computationally heavy, and port that little bit to WebAssembly. If you're working outside of the web, there are platforms like the one that we have at Fastly, where you can just easily put up a WebAssembly module that is the artifact that you put up to run on the Compute@Edge platform. So in that case, you're just going to start up a service doing WebAssembly. And so, if you have a microservices architecture, then you can do one of your services in WebAssembly.

[26:01] Mettin Parzinski: Awesome. Next question is from Warlock, "What are build sizes in general? Bandwidth cost of using Wasm?"

[26:14] Lin Clark: So it really depends on what kind of application you're doing. The build sizes? The standard was designed to be very compact. I actually wrote about this in my first series on WebAssembly, which you can find on the Mozilla Hacks blog. So they're as compact as possible basically. But it, of course, depends on exactly what kind of computation you're doing. Sometimes the equivalent JavaScript would still be smaller depending on what you're doing.

[26:58] Mettin Parzinski: Okay. Last question we have time for. Next question is from Bartos, "Is snapshotting possible with V8 too or has it only been explored in the context of Wasm?"

[27:22] Lin Clark: So I think one thing that people sometimes don't realize is that the snapshotting that you have in V8 is application snapshotting, and that part's not new. Application snapshotting is not new. Snapshotting the engine is something that you wouldn't have without the WebAssembly runtime to do that snapshotting. Because if you're opening up a V8 isolate, what you're starting up is the isolate itself. So, that engine initialization is also part of this.

[28:00] Mettin Parzinski: Awesome. So that was the last question. But I have one question that I feel is too important to skip and it's from Richard S and he says, "Where can I learn more about WebAssembly?"

[28:13] Lin Clark: So, I've done a number of blog posts about WebAssembly. You can find those on the Mozilla Hacks website or on the Bytecode Alliance site. If you want to get involved in WebAssembly, there's a GitHub repo, well GitHub organization where all the standardization happens. That's if you really want to get low level deep into the details.

[28:36] Mettin Parzinski: Nice. All right. Thanks. Lovely. So there's some more questions in the Discord channel that we don't have time for, but I will invite everyone that still has questions for Lin to join Lin in our room on Spatial Chat, where she will be going now. So Lin, thanks a lot for joining me here and enjoy your Spatial Chat speaker room.

Lin Clark: Thank you.

Lin Clark
30 min
