Crafting the Impossible: X86 Virtualization in the Browser with WebAssembly


WebAssembly is a browser feature designed to bring predictable high performance to web applications, but its capabilities are often misunderstood.


This talk will explore how WebAssembly is different from JavaScript, from the point of view of both the developer and the browser engine, with a particular focus on the V8/Chrome implementation.


WebVM is our solution to efficiently run unmodified x86 binaries in the browser and showcases what can be done with WebAssembly today. A high level overview of the project components, including the JIT engine, the Linux emulation layer and the storage backend will be discussed, followed by live demos.

Alessandro Pignotti
21 min
16 Jun, 2022


Video Summary and Transcription

CheerpX is a technology to securely run binary code in the browser, written in C++ and compiled to JavaScript and WebAssembly. It can run a full virtualized system in the browser, including Bash and other languages like Python and JavaScript. CheerpX aims for scalability and the ability to work with large code bases, supporting multiprocessing and multithreading. It uses a two-tiered execution engine with an interpreter and a JIT engine. Future plans include running the full X.Org server in the browser and implementing the Windows system call interface. WebVM, built on this technology, has a virtual file system backed by Cloudflare.


1. Introduction to CheerpX

Short description:

I am Alessandro Pignotti, founder and CTO of Leaning Technologies. We specialize in compiled-to-JavaScript and compiled-to-WebAssembly solutions. We have released three different products: Cheerp, CheerpJ, and CheerpX. CheerpX is a technology to securely run binary code in the browser. It is generic, robust, and scalable. We wrote it in C++ and compiled it to JavaScript and WebAssembly using Cheerp.

So, hello, everybody. And I'd like to start by thanking the organization for inviting me here and giving me the opportunity to show you some of the tech that we build. And especially today, it's a special day for me, because it's my birthday. And thanks a lot for coming here to celebrate with me, I really appreciate it.

So I am Alessandro Pignotti, founder and CTO of Leaning Technologies. And I was born and raised in Rome, and I moved to Pisa for my studies, and then I moved here in 2014. And I've been a proud Amsterdammer since. If you wish, you can follow me on Twitter, but I would recommend not to hold your breath while you wait for me to post something.

So what do we do? We are a small company that specializes in the niche of compiled-to-JavaScript and compiled-to-WebAssembly solutions. In this small niche, I think that we do some pretty cool things. And over the years, we have released three different products. The first of them was Cheerp, which is a C++ to JavaScript and WebAssembly compiler. The second one was CheerpJ, which is not just a compiler for Java, really. It's more like a full Java virtual machine that can run in the browser. And we can use it to run even fully graphical Java applications in the browser right now. And then we decided to move one step further. And we made CheerpX. And CheerpX is not just a product, really. Right now, we consider it to be more of a technology. It's a very generic solution that can do a bunch of different things. And as a first experiment to see how we can eventually make this a product that we can sell, we built WebVM, which is just one of the possible things we can build with this. And we will discuss that. Because, you know, each one of these products would probably require its own couple-of-hours talk. We have 20 minutes, so we need to cut it short and get to the meat. So, CheerpX is a technology to securely run binary code in the browser, okay? And there are three main ideas that we followed when making this. We wanted to build something which is generic, robust, and scalable. I think you might have your own intuition about what these terms mean, but I will go in depth later about what I mean exactly by using these words. And in practical terms, what CheerpX is is a C++ application that we wrote from scratch. We wrote it ourselves, and we compiled it to JavaScript and WebAssembly using Cheerp, our other product, so that it can run in the browser. And so I know that it's nice to talk about things, but it's also nice to see how things actually work in practice.

2. Running Full Virtualized System in Browser

Short description:

I will now demonstrate that we can run a full virtualized system in the browser. Currently, I have Bash, the shell, running from a Debian distribution in the browser. I will write a binary from scratch and test it. I want to return an error code instead of the usual successful completion code to see if the shell can handle it. After compiling the code using GCC, the system loads the required data from the network. The execution is completed, and we can test if it runs as expected.

And everybody recommended that I not do a live demo, but, you know, who am I to ever follow good advice? Let's try this. So what I'd like to do, I'd like to prove to you that we can run a full virtualized system in the browser. And what I have here right now is Bash, the shell, from a Debian distribution, running in the browser. To prove it to you, I can, for example, list the file system, and there is the bunch of things you would expect from an actual running system.

But I'd like to prove that I can actually run a binary that has never been seen before. And to do that, I guess I'll just write one. And I don't know about you guys, but when I want to write a binary from scratch, the first thing I do is open my text editor. So... It's incredibly difficult to spell correctly on stage. So we have Vim, it's running, and I can now type a very small test case. And what's going on right now is that this whole thing is running from the exact binaries that you run on your own computer. And I plan to do a very simple hello world. To do something which is not completely trivial, instead of returning the usual return zero, the code that tells the system that the executable completed successfully, I actually want to return an error code. I want to see if the shell can deal with that. So let's try this out. Cool.

Now I want to compile this. To compile C++ code... C code, actually... I'll of course use GCC. And we can also enable some optimization, because you never know. And this looks correct. Okay. What's going on in the background here? So GCC is a fairly big executable, and currently the system is loading the required data from the network. And this data comes in blocks, because this is actually a full ext2 implementation that runs from a disk device which is backed by a CDN. It's backed by Cloudflare. And the execution has actually completed. We can test if this can even run. It does what we expect it to do. Let's check the error code. It is what we expect.
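For reference, a minimal sketch of the kind of test case used in the demo; the exact source is not in the transcript, so the file name and the specific exit value are assumptions:

```cpp
/* test.c - assumed reconstruction of the demo's tiny test case (the exact
 * source is not in the transcript). Plain C, compiled inside the VM with the
 * stock GCC; the same code is also valid C++. It prints a message and
 * deliberately returns a non-zero exit code so the shell's handling of it
 * can be checked. */
#include <stdio.h>

int main(void) {
    puts("Hello, world!");
    return 42;   /* non-zero exit status; `echo $?` in Bash would show 42 */
}
```

Inside the VM this would be compiled with something like `gcc -O2 test.c -o test`, run, and then checked with `echo $?`, which is roughly what happens in the demo.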

3. Compiling C++ to WebAssembly

Short description:

We can already compile C++ to WebAssembly, so we prove that this is not a special version of GCC. The file type is ELF, the binary format for executable files. We can even dump and examine the binary code.

So this is fairly interesting, I think, but maybe I said it myself before, right? We can compile C++ to WebAssembly, so maybe you're thinking that there is some trick here. That maybe this is a special version of GCC that can magically generate code that runs natively in the browser. And I'd like to prove to you that this is not the case. To do that, we can actually ask the system: what is the type of the file that we just ran? The system says it's ELF, which is the binary format for Linux executables. So Linux, 32-bit Intel x86, which is what we expect. And as the last proof, we can even show the code itself. We can dump this program that we built. We can take a look, and this is binary code, which is what you would expect.

4. Running C++ and Other Languages in Browser

Short description:

There are other ways of running C++ code in the browser. We can try Python and JavaScript as well. CheerpX is able to deal with sophisticated executables that generate code at runtime. Building something generic means using binaries without pre-processing or special tools.

And okay, but, I mean, there are other ways of running C++ code in the browser. So why will we go toward all this complexity? And the issue is that it's complicated, really. So it is true that you can compile C++ in the browser. It is not clear that you can compile any application without manual intervention. But even putting this aside, the point is that this is an extremely generic solution, right?

So we can try something completely different now. Let's try Python. So what I've done now, I've set up the Python interpreter and I can actually type command directly in the shell one more time. I'll do the same thing I've done before. Hello. This is also pretty easy. I can return another code, which will display on screen. And, OK, Python is nice, but it's also relatively simple executable. It's just an interpreter overall, it does not much more than that. So let's try something funnier. Let's try if we can try to run JavaScript. And I will also open again Vim. I will also open one more time and write my simple test case, which is misspelled. And also, not to do something completely obvious, I'm actually going to enable the print code option. So what this option does, it actually prints out all the native code that Node.js and actually the internal engine, which is V8, the same engine that's used in Chrome, is generating just to run this small example. Keep in mind that what you're seeing is not something that happens especially because it's running in this virtualized environment. This thing happens every single time you start up Node.js. And what I found interesting is that what this thing shows to you is that ChirpX is actually able to deal with a fairly sophisticated executable, because this code is being generated at runtime. This was never ever seen before by the engine. It's just been generated, read to memory, and eventually executed. So how do we build something? Oh, by the way, this is a live website. If you want to play with this, you can go and play with that, and if you have bugs, you can report them on GitHub, and members of my team will take care of them. So let's try to define this terminology we've been using before. When building something generic, I mean that we want to do something that does not require any pre-processing, any special metadata. We should not have a special compiler or special build options, we should not have special libraries, none of this. What we do is that we take the binaries as they come out of the Debian packages and we use them.

5. Running Code in the Browser

Short description:

Robust means being able to run Node.js and handle code that is generated, modified, or deleted at runtime. We aim for scalability and the ability to work with large code bases, supporting multiprocessing, multithreading, and thousands of files. CheerpX is a client-side environment in the browser, starting with Bash as the parent process and spawning independent sub-processes like GCC and Python. To address the challenge of distinguishing code from data, we use a two-tiered execution engine with an interpreter and a JIT engine. The JIT engine generates optimized code based on the metadata built by the interpreter. The system interacts with the browser through syscalls implemented in a Linux-compatible ABI. Running everything in the JIT may seem more efficient, but there are reasons for the current approach.

Robust pretty much means being able to run Node.js, since we need to be able to handle a situation where code is not only generated at run time, but is also changed at run time, maybe modified in place, or maybe just deleted and moved somewhere else. This also happens when running code with V8, because the code itself is garbage collected and moves around in memory over time.
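To make that requirement concrete, here is a minimal sketch, not CheerpX's actual code, of the kind of bookkeeping an engine needs when guest code can be overwritten or moved; the names, the page-based granularity and the invalidation policy are all assumptions:

```cpp
// Sketch only: a write into a page that backs previously translated code
// must invalidate the affected compiled blocks, otherwise stale
// translations would keep running after the guest modified its own code.
#include <cstdint>
#include <unordered_map>
#include <unordered_set>

using BlockHandle = int;                    // placeholder for a handle to generated code
constexpr uint32_t kPageShift = 12;         // 4 KiB guest pages (assumed granularity)

struct CodeCache {
    std::unordered_set<uint32_t> pagesWithCode;                // pages backing compiled blocks
    std::unordered_map<uint32_t, BlockHandle> compiledBlocks;  // keyed by guest start address

    // Called for guest memory writes (in practice this check would be
    // filtered far more cheaply, so plain data writes stay fast).
    void onGuestWrite(uint32_t address) {
        const uint32_t page = address >> kPageShift;
        if (!pagesWithCode.count(page))
            return;                          // the write did not touch any translated code

        // Code on this page may have been modified, moved or freed, much
        // like V8 does with its own generated code: drop every translation
        // starting on it and let the slower tier rediscover the new code.
        for (auto it = compiledBlocks.begin(); it != compiledBlocks.end(); ) {
            if ((it->first >> kPageShift) == page)
                it = compiledBlocks.erase(it);
            else
                ++it;
        }
        pagesWithCode.erase(page);
    }
};
```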

And then we wanted to build something scalable. What this means is that although I showed you guys just a bunch of hello worlds, this thing can work on much bigger code bases. We wanted to build something that can work with programs in the wild, which means we want to support multiprocessing, multithreading, thousands of files, and all the sorts of features that are effectively used by programs that are real, not just toys.

To give you an idea of what we've seen so far, CheerpX is the environment in which all the execution happens. And it's all client-side. It's all in the browser. There is no server-side component doing the execution for us. This is not a trick. And the first process to start is Bash. So Bash is the parent process. And then Bash itself can spawn sub-processes, child processes, and I showed you GCC and Python, and all of these are actually completely independent address spaces. They have their own code and they have their own data.

But the issue is that, from the point of view of the system, we don't really know what is code and what is data. These two things are just bytes. It's all just data in memory. And to solve this problem, we actually have a two-tiered execution engine. The first tier is an interpreter, and the second tier is an actual JIT engine that can generate highly optimized code. And the interpreter is able to pretty much run code without any information. It will start from the first instruction and proceed to the next, and so on and so forth. And as it does this, it will also build metadata internally about how the code is structured. And with that, it's now possible to fire up the JIT engine to generate optimized, robust code out of it.

And eventually, all these applications will need to reach the browser somehow, because we need to display text on screen, for example. And this happens as you would expect on a native system: via syscalls. And the syscalls, we implemented them ourselves. So what you saw so far is not a Linux kernel. It is a Linux-compatible ABI, so it's able to run any Linux executable, but it's not Linux itself. And the system call boundary is the place where we stop and implement each system call manually, so that applications can interact with the browser. And now you might wonder why we don't just run everything in the JIT, since it's most likely more efficient.
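Before moving on, here is a minimal sketch of what such a syscall boundary might look like, assuming the classic 32-bit Linux `int 0x80` convention; this is illustrative, not CheerpX's actual code, and the host-side hooks are made up:

```cpp
// Sketch only: the emulator traps the guest's syscall instruction, reads the
// syscall number and arguments from the virtual CPU registers, and implements
// the call itself, e.g. by forwarding write() output to the on-page terminal.
#include <cstddef>
#include <cstdint>

// Linux x86 (32-bit) syscall numbers.
constexpr uint32_t SYS_exit  = 1;
constexpr uint32_t SYS_write = 4;

struct GuestCpu {
    uint32_t eax, ebx, ecx, edx;   // syscall number and arguments, per the i386 ABI
};

// Assumed host-side hooks into the browser environment.
void terminalWrite(const uint8_t* data, size_t length);   // append bytes to the terminal widget
void guestExit(int status);                                // tear down the guest process
const uint8_t* guestMemory(uint32_t address);              // translate a guest pointer

void handleSyscall(GuestCpu& cpu) {
    switch (cpu.eax) {
    case SYS_write:
        // write(fd=ebx, buf=ecx, count=edx); the fd is ignored in this sketch,
        // in practice fd 1 and 2 would end up in the on-page terminal.
        terminalWrite(guestMemory(cpu.ecx), cpu.edx);
        cpu.eax = cpu.edx;                      // return the number of bytes written
        break;
    case SYS_exit:
        guestExit(static_cast<int>(cpu.ebx));
        break;
    default:
        cpu.eax = static_cast<uint32_t>(-38);   // -ENOSYS: syscall not implemented
        break;
    }
}
```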

6. JIT Compilation and CheerpX Features

Short description:

JIT compilation is an investment of execution time, paying off in the future with faster code. Thanks to the interpreter, we can build metadata and generate JIT code only for hot code blocks. CheerpX supports the x86 instruction set, with plans to optimize MMX and SSE using WebAssembly's SIMD extension. It also supports most file system and process handling operations. Persistence is done locally using IndexedDB, ensuring privacy. CheerpX enables zero-maintenance environments for education, full web-based development environments, live documentation for any programming language, and access to heavy-duty engineering applications.

And the issue there is that this is actually not necessarily the case. The way I see it, JIT compilation is pretty much an investment, and you want to make sure you recover your investment. It's an investment of execution time, really. You pay some execution time now in the hope that in the future the same code will run faster, so that overall you run faster as well.

And just JITting everything would be inefficient. So, thanks to the interpreter, we can build this metadata. We build what we call the control flow graph of the program. And then, when blocks of code become sufficiently hot, when they run a sufficient number of times, we start generating JIT code, and only for the parts that are executed a sufficiently high number of times. And in this way we can achieve good runtime performance without exceeding the resources of the browser in terms of compiled code.
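A structural sketch of this hot-block strategy, purely illustrative and not CheerpX's real implementation; the names, the threshold value and the helper functions are all assumptions:

```cpp
// Sketch only: the interpreter executes basic blocks directly, counts how
// often each block runs, and hands sufficiently hot blocks to the JIT.
#include <cstdint>
#include <unordered_map>

constexpr int kHotThreshold = 1000;          // assumed tuning value

struct BlockInfo {
    int executionCount = 0;
    bool jitted = false;
};

struct ExecutionEngine {
    std::unordered_map<uint32_t, BlockInfo> blocks;   // metadata discovered by the interpreter

    // Executes the block starting at `address` and returns the next address.
    uint32_t runBlock(uint32_t address) {
        BlockInfo& info = blocks[address];

        if (info.jitted)
            return runJittedCode(address);            // fast path: previously compiled code

        if (++info.executionCount >= kHotThreshold) {
            compileBlock(address);                    // invest compile time only in hot code
            info.jitted = true;
        }
        return interpretBlock(address);               // always-available slow path
    }

    // In a real engine these decode x86, update the control flow graph and
    // emit WebAssembly; they are left as declarations in this sketch.
    uint32_t interpretBlock(uint32_t address);
    uint32_t runJittedCode(uint32_t address);
    void compileBlock(uint32_t address);
};
```

The point of the split is visible in `runBlock`: the interpreter is always available, and compile time is only spent once a block has proven itself hot.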

So, what can we do with this thing? In terms of features, what we have right now is fairly complete support for the core x86 instruction set. We do support x87, but it's not as fast. MMX and SSE are both supported, but they are currently scalarized. So we expand them to the equivalent scalar operations, which is of course slower, and our plan for the future is of course to use WebAssembly's SIMD extension to shrink this gap. At the level of the OS, we have support for most of the file system and process handling operations. The data comes from a disk backend which is an ext2 implementation. And we have chosen to use ext2 because it's going to be possible for us to extend it in the future to support further extensions and reach the ext3 and ext4 level without having to rewrite everything from scratch.
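As an illustration of what "scalarized" means here, a minimal sketch, not CheerpX's actual code, of a packed SSE add (PADDD) expanded into per-lane scalar additions:

```cpp
// Sketch only: one 128-bit SSE register treated as four 32-bit lanes, and a
// packed add emulated one lane at a time. With WebAssembly SIMD the same
// operation could map to a single 128-bit vector instruction instead.
#include <cstdint>

struct Xmm {
    uint32_t lane[4];   // a 128-bit xmm register seen as 4 x 32-bit integers
};

// PADDD xmm_dst, xmm_src, scalarized.
void paddd_scalarized(Xmm& dst, const Xmm& src) {
    for (int i = 0; i < 4; ++i)
        dst.lane[i] += src.lane[i];   // four scalar adds instead of one packed add
}
```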

In terms of persistence, this is pretty interesting. If you change a file in this VM, it will stay there. The persistence is local. It's done using IndexedDB, which is great because it's privacy preserving. So we are not going to look at your data. It's yours and it's going to be stored on your machine. And with this limited amount of features, we can already do a bunch of interesting things. In the context of education, for example, it would make it possible for schools to set up a zero-maintenance environment that the students can fire up without ever having to worry about whether this thing will run on my computer, or maybe today the setup is not working correctly. For developers like us, it might make it possible to have not just web-based IDEs. This will make it possible to have a full web-based development environment where you can actually build and run the full pipeline on the client. This will be useful in documentation, to have live documentation for any programming language, not just for languages that can already run in the browser. And this may also be useful to open the web to a new category of applications, in particular, heavy-duty engineering applications like computer-aided design programs. Usually, these sorts of applications do not actually have the full source code available, not even to the developer. Because they use binary components which are sold by other companies.

7. Running Binary Systems and Future Plans

Short description:

Thanks to this system, you can run binary components without having the code. In the future, we plan to make the full X.Org server run in the browser and map OpenGL directly to WebGL. With networking support, we aim to have full development environments accessible worldwide. Our goal is to run a fully virtualized desktop environment in a tab, allowing users to access their data from anywhere.

And thanks to this system, it doesn't matter that you don't have the code. You can run the binary components as they are. Fundamentally, you don't care. And this is what we have now.

And what about the future? Well, of course, one thing on my mind is gaming. And to do that, we first need to have some sort of graphical support. We're still getting there. And the plan is to actually make the full X.Org server run in the browser. Believe it or not, this can work. I did a prototype some months ago and it's totally possible. And then we need to figure out a way to map OpenGL directly to WebGL. And what's funny is that with this setup, it's quite possible that the overhead might not even be that high. Because, of course, virtualization implies an overhead in terms of CPU execution. But since we will map OpenGL directly to WebGL, the overhead there is probably going to be much less. And with networking support, which is a whole complicated topic, we may be able to have full development environments where you can fire up a little web application, including server-side code, from your browser tab, which is then reachable from all over the world by other people with their own browsers. And my own personal goal is to reach the point where we can run a fully virtualized desktop environment in a tab, so that you can access the website, log in, and you have your data. You close the tab, you're done, you can continue your work somewhere else with your own system. And this is it, really.

Feel free to get in touch. And we are actually hiring, we are looking for an intern right now. So if you are interested yourself, or you know somebody that could be interested in working with our tech, there is a spot. Thank you. Thank you, thank you very much, Alessandro.

And let's check if we have some questions on Slido. Could I please see this on the screen? Yeah, I think we have some of them, at least this is what I see. Okay, no problem. I will read them from my mobile phone. Yeah, so basically, the first question is very similar to my very initial one. Is it, or will it be, possible to run Windows applications in the browser from the .exe file? For example, a browser, like I mentioned. So, fundamentally, the issue is that we implement system calls.

8. Running Windows Stack and WebVM File System

Short description:

It is possible to implement the Windows system call interface and run a full Windows stack. However, licensing is a tricky issue, as we don't have a license from Microsoft to use all the DLLs. Auto-completion and other features are available in the command line. The technology is not currently open source, but this may change in the future. WebVM has a virtual file system backed by Cloudflare, using a block-based file system that downloads blocks on demand.

And it is, in theory, possible to implement the Windows system call interface and run a full Windows stack. Now, the tricky part with that is licensing. We don't have a license from Microsoft to use all the DLLs from Windows. Now, what you can do is run Wine, the Windows emulation layer for Linux, and run Windows applications on top of that. Fair enough.

The next one. Do we have auto-completion, or are there some limitations? I mean, in the command line? Well, no, it is an actual Bash. It is exactly what you would get on your own system. So, if auto-completion is properly configured, you will get that as well.

Next question is, is it all open source? Of course it is not. So, the thing is that this might change in time. As I was saying, currently we are still trying to figure out what the productization of this technology will be. And right now it seems to us that we keep more paths open by keeping it proprietary. But this might change in the future. We honestly don't know. For now, only we get to look at the code. But we would like you guys to try the thing, if you like.

Okie doke. You said WebVM has a virtual file system backed by Cloudflare. Does it lazy load files or preload the whole VM at once? Does it perform well, after all? So, the backend is served by a CDN, by Cloudflare. But it is not based on files. It is based on blocks. It is a block device. So, it is a very traditional block-based file system. And each block is downloaded on demand. Only when required. And this means that we can actually support pretty large disk images. The image you have seen so far is 2 gigs. But this is mostly due to technical limitations.
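A minimal sketch of how such an on-demand, block-based backend might be organized, combined with the IndexedDB persistence mentioned earlier; this is illustrative, not the real WebVM code, and all names, the block granularity and the host hooks are assumptions:

```cpp
// Sketch only: reads check a local write overlay first (where modified blocks
// persist via IndexedDB), then an in-memory cache, and only fall back to
// fetching the block from the CDN-backed disk image.
#include <cstdint>
#include <optional>
#include <unordered_map>
#include <vector>

constexpr size_t kBlockSize = 128 * 1024;     // assumed block granularity
using Block = std::vector<uint8_t>;           // one block of kBlockSize bytes

// Assumed host hooks.
Block fetchBlockFromCdn(uint64_t blockIndex);                  // HTTP range request to the disk image
std::optional<Block> loadLocalBlock(uint64_t blockIndex);      // IndexedDB-backed write overlay
void storeLocalBlock(uint64_t blockIndex, const Block& data);  // persist a modified block locally

class BlockDevice {
    std::unordered_map<uint64_t, Block> cache;   // blocks already downloaded this session

public:
    const Block& readBlock(uint64_t blockIndex) {
        auto it = cache.find(blockIndex);
        if (it != cache.end())
            return it->second;                           // already in memory

        if (auto local = loadLocalBlock(blockIndex))     // locally modified block wins
            return cache.emplace(blockIndex, std::move(*local)).first->second;

        // Untouched block: download it on demand, so only the parts of the
        // disk image that are actually used ever leave the CDN.
        return cache.emplace(blockIndex, fetchBlockFromCdn(blockIndex)).first->second;
    }

    void writeBlock(uint64_t blockIndex, const Block& data) {
        cache[blockIndex] = data;
        storeLocalBlock(blockIndex, data);               // writes stay on the user's machine
    }
};
```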

QnA

WebVM Offline Mode and System Access

Short description:

We plan to make it possible to go much higher in terms of size. Is there an offline mode? Right now, there is no truly offline mode, but we are working on it. Can you run the top command? Not yet, but eventually you will be able to, and it will only show information about the virtualized system. WebVM is currently of interest to the education and web-based IDE sectors, with a goal of attracting gaming enthusiasts.

And we plan to make it possible to go much, much higher in terms of size. Cool.

So, the next question is... Is there an offline mode? So, you demonstrated this website, like a playground. Is it somehow possible to run this cool stuff when you are disconnected? So, if you ask whether myself and my computer can have an offline setup, yes, of course. But it is not yet available to the public. That is partly because we still need to understand exactly what we're going to ship, what the APIs will be, and these sorts of things. So, right now there is no truly offline mode. But we are working on it. Cool.

Let's take this one. Do you also have access to the system somehow? For example, can you run the top command? Well, to be fair, you cannot yet run top. Ideally, you will be able to run top, but you will get information only about the virtualized system. Of course, you can never access the reality of the underlying system. Right? This is secure. This is not a security hole in your system. Nice.

The question is, who is using WebVM right now, and for what? In terms of customers, we don't have one yet. But we have partners that we're trying to work with to try and build the first product. And the main interest we have is from the education sector and the web-based IDE sector. These are the people that seem to be the most interested right now. But my personal goal is to have some gaming people on board. Sounds good. And let's take this one. And I believe this will be the last question.

Lunch Break and Device Requirements

Short description:

Any browser and device will work, such as Chrome, Firefox, or Safari. Thank you for your questions, and feel free to approach the stage if you have any more. Enjoy the one-hour lunch break!

And next we have lunch break.

What kind of browsers and devices are required? Any. Chrome, Firefox, Safari will work. That's easy.

Thank you very much for your questions. I'll keep some coins on the stage. If you have any questions, feel free to come and grab. And now we have one hour break for the lunch. Some calories needed. Thank you.
