Type-safe bindings for Node.js with Rust and WebAssembly


This talk will teach you how to write performance-critical Node.js modules without the burden of distributing platform-dependent artifacts and using the C/C++ toolchain. You will discover how to smoothly integrate Rust code into your Node.js + TypeScript application using WebAssembly. You will also learn how to avoid the typical WebAssembly serialization issues, and understand when other alternatives like Neon or Napi.rs are preferable. Together, we will cross the language bridge between Rust and Node.js while preserving the familiar DX you're used to.



Transcription


Hi everyone, and thanks to Node Congress for having me. This is Type-safe bindings for Node.js with Rust and WebAssembly. We'll be looking at an alternative, easier approach for creating native Node.js modules, while also automatically generating types for them. A little bit about me. I'm Alberto Schiabel. I'm from Venice, Italy. I'm a software engineer at Prisma, where we're porting several Rust modules to WebAssembly. I'm also a consultant working with Node.js, TypeScript, and Rust. You can find me online at jkomyno, and you will also find the slides for this talk on my GitHub page.
So, elephant in the room: what is WebAssembly and how is it useful? Well, WebAssembly, or Wasm, is basically a low-level abstraction for the CPU your code is running on. It's a fast, compact bytecode, in the sense that it's a portable binary format for a virtual machine that models loads and stores of numbers inside a linear memory. It's designed for near-native speed and is also optimized for fast startup time and a small memory footprint. It was created by browser vendors to port C++ code to the web without performance degradation. And now it's a portable compilation target for many languages, including Rust, Go, and many others. That means you can compile your code to Wasm once, and then you can run the same compiled artifact on different platforms. For instance, Node.js has supported WebAssembly since version 8. And now you can import a Wasm module exactly like you would import a standard npm package.
And why is a single portable compilation target useful? Consider node-sass, a popular native add-on that compiles Sass styles to CSS. To support multiple system configurations, architectures, and even Node.js versions, this library needs to be compiled separately for each of these configurations. This means 35 different compilation targets, and it makes every new deployment a time- and resource-consuming task.
Those of you who use Prisma will probably know that we're in a similar situation, with a TypeScript CLI and library that download compiled Rust binaries on demand. This is what made us consider WebAssembly and adopt it as much as we could to simplify our deployment process. And if you've ever tried to write native Node.js add-ons yourself, you probably know that it's not a straightforward process. Sometimes node-gyp fails with cryptic error messages. And frankly, the tooling necessary to build and import C++ modules isn't as human-friendly as what Node.js developers are accustomed to. So this is perhaps one of the major reasons why Rust is consistently voted as the best language for WebAssembly. So let's see how we can create TypeScript modules from it.
For those of you new to Rust, let's define a baseline glossary. Anything you use a package.json for, you will put in a Cargo.toml file in a Rust project. What you usually call npm packages are crates in Rust, and you operate on them via the cargo CLI. For instance, to compile Rust code, you will use the cargo build command and specify the compilation target. In our case, for WebAssembly, it's wasm32-unknown-unknown, which uses a 32-bit address space and isn't tied to any particular OS vendor or CPU architecture.
If we want to move more than purely numeric data across the WebAssembly bridge, we're going to need a binding generation tool, like wasm-bindgen. It's both a CLI and a Rust library. When you install it, you should pin a particular version, because it doesn't yet follow semantic versioning. I'm using version 0.2.84 in this case. In practice, the version you specify in your Cargo.toml file should match the version of the wasm-bindgen CLI you have installed on your machine. Moreover, to support WebAssembly, you need to mark your crate type as cdylib. This tells Rust to compile your code as a dynamic library that can be loaded by a C-compatible runtime, like Node.js.
So compiling Rust to WebAssembly is a two-step process. First, you need to run cargo build to create a compiled WebAssembly artifact, which will have the .wasm extension. Then you run wasm-bindgen to generate the Node.js and TypeScript bindings you will use to import the compiled Wasm module. Of course, if wasm-bindgen supported all the commonly used Rust data structures and TypeScript conventions, this talk would already be over. Clearly, that's not the case, so let's see how we can work around this.
For our first example, let's see how we can define Rust functions that take a number, double it, and return it to the caller. I know that Rust syntax can be a little bit overwhelming, so bear with me. We first import the wasm-bindgen library. We then tell Rust to generate bindings for the function that follows, which will be compiled to WebAssembly. Whenever you see code with a hash sign and square brackets, that's a Rust macro, a special kind of function that expands to generate code at compile time. We define a public function, duplicate_u64, that takes an unsigned 64-bit integer, multiplies it by two, and returns it. The other function is similar, but it uses floating-point numbers instead. And wasm-bindgen generates for us the TypeScript declaration that you see at the bottom. We see that the function names are preserved as is. u64 numbers are mapped to bigints, and f32 numbers are plain numbers in TypeScript, because it doesn't really have a dedicated floating-point type.
And here's a similar example with strings. To the left, we have a toUpperCase function that takes a string and returns a new string in all caps. Observe that in this case, we specify a custom name for the function in the JS bindings, and again, we use the wasm-bindgen macro for that. To the right, we have an int-to-string function that takes a 64-bit signed integer and returns a string representation of it.
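To make the declarations described above a little more concrete, here is a rough TypeScript sketch of the four signatures. The bodies are plain TypeScript stand-ins, not the compiled Wasm exports, and the names duplicate_f32 and int_to_string are guesses reconstructed from the spoken description rather than taken from the slides.

```typescript
// Stand-ins that mirror the *shape* of the bindings described in the talk.
// In a real project these would be imported from the wasm-bindgen output;
// the bodies below are plain TypeScript written only for illustration.

// A Rust u64 surfaces as a bigint (overflow of a real u64 is ignored here).
function duplicate_u64(x: bigint): bigint {
  return x * 2n;
}

// A Rust f32 surfaces as a plain number (name assumed).
function duplicate_f32(x: number): number {
  return x * 2;
}

// Exposed under a custom camelCase name, as described for the string example.
function toUpperCase(s: string): string {
  return s.toUpperCase();
}

// A Rust i64 also surfaces as a bigint (name assumed).
function int_to_string(n: bigint): string {
  return n.toString();
}

console.log(duplicate_u64(21n), duplicate_f32(1.5), toUpperCase("ciao"), int_to_string(-7n));
```

Note how the integer arguments need the bigint literal syntax (21n), while the floating-point one is an ordinary number.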
Notice that strings in Rust are UTF-8 encoded, whereas in JavaScript they are UTF-16 encoded. This is something you need to be aware of, especially if you're manipulating strings that may contain emojis or non-Latin characters, as you may end up with funky results.
What happens when we try to use these functions in TypeScript? Well, if we pass types compatible with the TypeScript declarations, they work as expected at runtime. But if we escape TypeScript's validation by disguising, for instance, a string as a bigint, and we call the int-to-string function with that, we get a syntax error at runtime, because the function expects a number but is being called with a string.
And what if we need more complex data structures? Here's an example with a Scalars struct, which wraps values like numbers, characters, or booleans. Say we want a function that extracts the value of one of the fields, namely the letter, the character field. If we were to write the TypeScript bindings for this manually, we would define Scalars as a typed dictionary, which we can construct in place, and we would type the letter field as a string, because TypeScript doesn't really distinguish between single-character and multi-character strings. However, this is not what wasm-bindgen generates. This is what it creates instead. Although the four struct members have the types we expect, we actually get a Scalars class definition, not a dictionary type. Moreover, we see some internal details leaking into the generated code, namely the free method, which doesn't take any argument and doesn't return any value. This is not something we wrote in our Rust type; it's something that wasm-bindgen generates. Do you also notice that something else is missing? Well, this class doesn't have a constructor. So how do we create instances of it from Node.js?
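For comparison with the generated class, the hand-written binding mentioned above could look roughly like the sketch below. Only the letter field is named in the talk; the type name Scalars as spelled here, the other three field names (n, f, flag), and the plain-TypeScript body of getLetter are hypothetical stand-ins.

```typescript
// A typed dictionary for the struct of scalar values.
// Only `letter` is named in the talk; `n`, `f` and `flag` are made-up names.
type Scalars = {
  n: number;      // e.g. an integer field on the Rust side
  f: number;      // e.g. a floating-point field
  letter: string; // a Rust char; TypeScript has no single-character type
  flag: boolean;
};

// Stand-in for the bound Rust function: returns the letter field.
function getLetter(scalars: Scalars): string {
  return scalars.letter;
}

// A dictionary type can be built in place, with no class and no constructor.
const scalars: Scalars = { n: 1, f: 2.5, letter: "a", flag: true };
console.log(getLetter(scalars));
```

This is the shape a TypeScript developer would expect to work with; it has no free method and needs no constructor.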
Well, we can attempt to call the default JS constructor and assign the fields manually. We just need to specify our own default implementation for this free method. However, if we do this and pass this Scalars class instance to the get-letter function, it will fail at runtime with a cryptic error: null pointer passed to rust. It turns out that we can fix this by manually defining a constructor in Rust that takes the four struct members as arguments, using the wasm-bindgen constructor macro, and then calling that constructor from TypeScript. But I think it's clear that this is not the best developer experience we can get, right? It requires boilerplate code and is not ergonomic for TypeScript devs. By the way, notice that the letter field is automatically truncated to a single-character string, although we initialized it with a longer string.
And what happens if we wrap strings in a struct, similarly to how we did with the scalars? Well, this code won't even compile. That's because strings in Rust are non-copyable, and wasm-bindgen needs to copy strings around. One way to get around the problem is making wasm-bindgen clone the string with a dedicated macro attribute, getter_with_clone. But this is not something a TypeScript developer should be concerned with. It's, again, an internal detail we shouldn't need to be aware of. Also, we still get a class binding rather than a dictionary type, and we've seen how cumbersome and awkward that is to use.
How about enums? Well, C-style enums are translated one-to-one to TypeScript enums, so we can say that wasm-bindgen works out of the box in this case. However, enums are often considered a bad practice in the TypeScript community, as they are a little bit hard to reason about, because the JavaScript runtime doesn't have any notion of enums, right? So that could lead to unexpected bugs.
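As a sketch of the usual alternative to enums, a union of string literals keeps the set of allowed values in the type system, while the runtime only ever sees ordinary strings. The Direction type and its members are hypothetical; they are not the enum shown in the talk.

```typescript
// Hypothetical example: these members are not the ones from the talk's slides.
const DIRECTIONS = ["up", "down", "left", "right"] as const;

// Resolves to the union "up" | "down" | "left" | "right".
type Direction = (typeof DIRECTIONS)[number];

// Runtime guard for values arriving from untyped sources.
function isDirection(value: unknown): value is Direction {
  return typeof value === "string" && (DIRECTIONS as readonly string[]).includes(value);
}

// The switch is checked for exhaustiveness against the union.
function opposite(d: Direction): Direction {
  switch (d) {
    case "up": return "down";
    case "down": return "up";
    case "left": return "right";
    case "right": return "left";
  }
}

console.log(opposite("up"), isDirection("sideways"));
```

Because the values are plain strings, nothing enum-specific has to exist at runtime, which is the property the talk is after.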
So ideally, we would prefer to get a union of literal types instead, like the one we see at the top right. And how about discriminated unions, or tagged unions? They are a popular pattern in TypeScript, especially when encoding algebraic data types. It turns out that Rust supports them in the form of enum variants. In this example, we have an Either type that, at any given time, encodes a successful numeric result with an Ok constructor or a failure message with an Error constructor. However, enum variants are not supported by wasm-bindgen, as the compiler error message tells us, so we cannot really use them as they are.
Finally, wasm-bindgen provides partial support for homogeneous vectors, but only for numeric types, which are translated to typed array instances. They are essentially only useful when manipulating raw binary data, and they are quite far from the standard general-purpose arrays we usually want. Also, vectors of non-primitive types, nested vectors, and tuples are not supported at all. So wasm-bindgen provides the basic tools to port Rust libraries to Node.js, but it's neither ergonomic nor idiomatic for TypeScript apps, and it's overall quite limited.
Can we do better than this? Well, the first non-standard library that every Rust developer usually encounters is Serde, which provides macros and utilities to serialize and deserialize common Rust types to and from several formats with minimal boilerplate. One first step to support more Rust data types in JavaScript is exposing functions that consume and return JSON-encoded strings, which we can then parse and stringify in Rust via Serde's Serialize and Deserialize traits. I've also listed the dependencies that we need to add to our Cargo.toml file, with their versions, for everyone's convenience. Here's how it works. We first import Serde's traits, and we apply them to a Rust struct or enum to make it automatically serializable and deserializable.
Think of those traits as interfaces needed to translate data structures to formats like JSON, and think of the derive macro as something that implements those traits for us. Then we define a public string-to-string function with the wasm-bindgen macro we're already used to. Next, we parse the input string, which we assume to be JSON-encoded, into the Scalars struct we defined above. We compute the result, we serialize it back to JSON, and we return it to the caller. Notice that the TypeScript binding is technically typed, but it's not very useful: we could pass any JSON, or even a string that is not JSON at all, and TypeScript would still accept it at compile time, although it would result in an error at runtime.
JSON serialization can also be expensive in practice, so the serde-wasm-bindgen crate came up with a more efficient approach, providing a native integration of Serde with wasm-bindgen. The project is currently maintained by Cloudflare, by the way. And again, since it relies on Serde, we get support for plenty of Rust data types that we can use in JavaScript. Notable differences from the plain wasm-bindgen approach are that enum variants can be translated to tagged unions, and that we get generic vector support, as well as support for maps.
Similarly to the previous example, we can define a Scalars struct (actually a subset of the previous one) with Serde's traits, and we can expose a public function that takes a Scalars value as input and returns its letter field through wasm-bindgen. However, notice that this time the Rust arguments are typed as JsValue, which models any value that can be passed to or received from JavaScript. Then it's up to us to cast these JsValues to actual types. We do that with serde-wasm-bindgen's from_value utility, and then we cast the result back to a JsValue.
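Stepping back to the JSON-string binding from the first half of this passage, its weakness can be sketched without any Wasm at all. Below, getLetterJson is a hypothetical plain-TypeScript stand-in with the same string-to-string signature such a binding would have; the letter field comes from the talk, everything else is assumed.

```typescript
// Stand-in for a binding typed as (input: string) => string.
// A real binding would parse with Serde on the Rust side; JSON.parse plays that role here.
function getLetterJson(input: string): string {
  const parsed: unknown = JSON.parse(input); // throws SyntaxError on invalid JSON
  if (
    typeof parsed !== "object" ||
    parsed === null ||
    typeof (parsed as { letter?: unknown }).letter !== "string"
  ) {
    throw new TypeError("expected a JSON object with a string `letter` field");
  }
  // The result goes back to the caller as JSON too.
  return JSON.stringify((parsed as { letter: string }).letter);
}

// Helper so the failures can be observed without crashing the script.
function attempt(input: string): string {
  try {
    return getLetterJson(input);
  } catch (e) {
    return `error: ${(e as Error).name}`;
  }
}

// All three calls type-check, because every argument is "just a string".
console.log(attempt('{"letter":"a"}'));  // prints "a" (with the JSON quotes)
console.log(attempt('{"lettr":"a"}'));   // prints error: TypeError
console.log(attempt("not json at all")); // prints error: SyntaxError
```

The compiler cannot tell the three inputs apart, so every mistake surfaces only at runtime.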
Without digging into too many details, we see that the function signature tells us that the result is either a JsValue or a specific error type provided by serde-wasm-bindgen. However, if we use this approach, we lose type safety entirely, as this value could literally be any value. So it's typed as any in TypeScript, and it's not really that useful.
So we started this journey in an effort to seamlessly integrate Rust functions and data structures into TypeScript apps with WebAssembly, and it looks like we should give up. Unless, maybe, there is a magical tool that could help us by generating type-safe and ergonomic bindings? Well, thankfully, that tool exists. It's called Tsify, and I honestly love it. It supports everything we've seen so far, but it doesn't need any manual casting, and it comes with strongly typed bindings. We'll see in a second that we need a few more macros to make things work, but it's still a huge improvement over the previous approaches. Notice also that we need to install Tsify with the js feature flag, which gives us a native JavaScript integration. Otherwise, it uses JSON serialization by default.
As a demonstration, we will readapt the previous Either example using enum variants, which we want to be translated to tagged unions in TypeScript. We see that we need to derive Serde's traits, as well as the new Tsify trait. We also need to use a new Tsify macro to tell wasm-bindgen how to handle a data type that it otherwise doesn't support. The Wasm ABI mentioned there stands for WebAssembly Application Binary Interface, and it describes how to call functions between languages in WebAssembly. We then define the familiar Either enum, with a twist that is common to all the approaches that use Serde: we need to tell Rust how to serialize this enum, because this could happen in a plethora of ways.
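As a reference point, the TypeScript shape this example is aiming for is a tagged union along the lines of the sketch below. The Ok and Error variants and their payload types come from the earlier description of Either, and _tag and value are the key names the talk uses for the discriminant and the content; eitherToString is a hypothetical plain-TypeScript stand-in for the bound Rust function.

```typescript
// Target shape: one object type per variant, discriminated by `_tag`.
type Either =
  | { _tag: "Ok"; value: number }
  | { _tag: "Error"; value: string };

// Stand-in for a bound function that returns a string representation.
function eitherToString(e: Either): string {
  switch (e._tag) {
    case "Ok":
      return `Ok(${e.value})`; // `value` is narrowed to number here
    case "Error":
      return `Error(${e.value})`; // `value` is narrowed to string here
  }
}

// Values are created on the spot as plain dictionaries, with no constructor.
const ok: Either = { _tag: "Ok", value: 42 };
const err: Either = { _tag: "Error", value: "division by zero" };
console.log(eitherToString(ok), eitherToString(err));

// Rejected by the compiler, because "Okay" is not a valid tag:
// const typo: Either = { _tag: "Okay", value: 42 };
```

Switching on the discriminant lets the compiler narrow the payload type in each branch, which is what makes this pattern pleasant to use from TypeScript.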
To get idiomatic tagged unions, like the ones we see on the right, we have to tell Rust that the variant name should be associated with a discriminant key, namely _tag, and that the content of the variant, which is defined between the constructor parentheses, should be associated with a property named value. We can then define a function that, for instance, takes an Either and returns its string representation. Notice that it doesn't require us to write any type-casting boilerplate, and it translates to a clean TypeScript definition. Just like with serde-wasm-bindgen, we can define an Either value in JavaScript without needing any constructor; we can just create it on the spot as a dictionary. But this time we get TypeScript guarantees, so we can leverage the TypeScript compiler to avoid typos in our data types.
So let's wrap up what we've learned so far. WebAssembly is here to stay, and it's good for CPU-intensive tasks that would otherwise be too slow in pure JavaScript, or for porting existing complex logic to the web. Think about Figma. However, it currently provides almost no input/output support, so if you need to interact with the outside world from your functions, you'd better stick with N-API for the moment. We've iterated through several approaches to port Rust functions to Node.js and observed their limitations or awkward developer experience, especially for TypeScript devs. Finally, we've seen that the best solution for type-safe bindings, Tsify, is still relatively new. One caveat is that its source code relies heavily on macro magic, and that could be a deal-breaker for some. Also, for any Serde approach, and that includes Tsify, you can't just use generic containers like vectors or hash maps directly in a function that you bind to WebAssembly. You first need to specify the generic type.
So you have to say, I don't know, vector of strings, and then you have to wrap it in a struct or enum that you expose to Serde via the Serialize and Deserialize traits, and then you use that in your function. If you want to see an example of Tsify being used in the wild, you can check a PR of mine that introduced WebAssembly support to Lyra, a full-text search engine written in TypeScript by Michele Riva, who I believe spoke here at Node Congress as well. Michele was quite happy with the performance improvements.
And that's it from me. I'm Alberto Schiabel. You can find me on Twitter and GitHub at jkomyno. You can also find additional material and code samples for this talk in my repository, node-congress-2023. Feel free to reach out with additional questions right now or later on Twitter. And thank you for your attention. Ciao!
22 min
17 Apr, 2023
