Modern browsers come equipped with three types of Web Workers: Dedicated, Shared, and Service Workers. While each one offers the ability to execute JavaScript in a separate thread, their differing APIs and capabilities mean that any given task usually has an ideal worker. Learn the benefits of each worker and how to choose the right one.
A Comparison of Web Workers
From:

Node Congress 2022
Transcription
Hi, I'm Thomas Hunter, and this talk is a comparison of web workers. The content from this talk is adapted from a book I recently co-authored titled Multithreaded JavaScript. If you'd like more information about the book, feel free to follow that bit.ly URL at the bottom of the screen.
All right, so today we're going to talk about three separate topics. The first one is dedicated workers, the second is shared workers, and the third is service workers. Each one of these is a type of web worker. But first, I'm actually going to talk about some basics.
So first, the concept of multithreading in JavaScript. One thing to keep in mind is that it is the nature of JavaScript and its ecosystem to be single-threaded. For the longest time, there really existed no true multithreading capabilities in JavaScript. You could sort of pull some of this off with basic message passing using iframes, but it wasn't exactly the cleanest solution. However, now we have web workers available to us, and with that comes a feature called shared memory, which allows for higher-performance multithreading than just using message passing. This presentation is going to be from the perspective of using these web workers for multithreading purposes, since it's related to the book.
Another basic concept is, well, what is a JavaScript environment? A JavaScript environment is an isolated collection of variables. Globals like the capital-O Object are going to be different in these separate environments, and the prototype chains, what those objects end up pointing to, are different as well. Each additional environment is going to incur some overhead to spin up. With Node, it's easier to measure: in my experiments, each new worker thread instance consumed about 6 megabytes of memory. In a browser, you're going to get some more overhead. Web workers will incur some additional memory overhead.
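
To make the contrast between message passing and shared memory a little more concrete, here is a minimal sketch that is not from the talk's slides. It runs in a single thread and only illustrates the two data models: `structuredClone()` stands in for the copy that `postMessage()` makes, and two views over one `SharedArrayBuffer` stand in for views held by two different threads. It assumes a runtime that provides `structuredClone`; in browsers, `SharedArrayBuffer` is generally only available on cross-origin isolated pages.

```javascript
// Message passing copies data. structuredClone() applies the same structured
// clone algorithm that postMessage() relies on.
const original = { counter: 1 };
const copy = structuredClone(original);
copy.counter = 2; // mutating the copy leaves the original untouched

// Shared memory does not copy. Every view over a SharedArrayBuffer reads and
// writes the same bytes, even when the views live in different threads.
const shared = new SharedArrayBuffer(4); // room for one 32-bit integer
const viewA = new Int32Array(shared);
const viewB = new Int32Array(shared); // stands in for a view created in a worker
Atomics.store(viewA, 0, 42);

console.log(original.counter, copy.counter); // 1 2
console.log(Atomics.load(viewB, 0)); // 42
```

The `Atomics` calls are not strictly needed in a single thread, but they are the usual way to read and write shared memory once a second thread is involved.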
And then if you have additional pages, there's even more overhead, as there are different documents and rectangles that need to be rendered. The object instances that are created in these different environments can never truly be shared across environments. However, you can serialize these objects. You can clone them, or you can represent them as JSON, and then pass them around between the separate environments. But if you mutate one in one place, you're not mutating it in the other. None of the web workers that we're going to look at today have access to the DOM. So for example, the document global is not available within the web workers. With a SharedArrayBuffer, if you pass one of those between these different environments, a pointer to the same binary data in memory will end up getting shared, and that's how we can perform shared memory data access. A lot of this explanation is sort of hand-waving over some complexities under the hood with relation to contexts and realms and really how the JavaScript VM works.
All right. So now let's take a look at dedicated workers. What is a dedicated worker? Well, a dedicated worker is the simplest of the web workers that we're going to look at. Each one of these dedicated workers has exactly one parent. You can actually end up loading them as a hierarchy if you want, where dedicated workers load other dedicated workers as well. Each one of these workers gives us a new JavaScript environment, and each one is also able to execute on a separate thread.
So now let's look at a code sample. This is how we would work with a dedicated worker from the context of the web page. Maybe this sits in an index.html file, or maybe it sits inside a main.js loaded by an HTML file. At any rate, this runs within the main thread that draws the window. Modern browsers give us a capital-W Worker global, and we're able to instantiate that to create an instance of a worker.
The argument to this is the path to a file that we want to use as the worker. Once we get this worker, we can attach a message handler to it. So here I'm assigning the onmessage handler, which is a callback function. When this function gets called, it's going to print "message from worker" and then it's going to print the data that was passed into it. So this code will get called within the parent thread when the dedicated worker thread has passed a message to it. Conversely, if we want to pass a message into the worker, we call worker.postMessage and pass in an argument. In this case, I'm passing in a string, but we could pass other simple values as well, or basic objects with a few caveats. And then finally, at the end of the file, we're just logging that the end of the main.js file has run.
So now let's look at the dedicated worker from inside the worker. This is our worker.js file that was referenced in the previous slide. In this file, the first thing we do is log that we're inside the worker. Then we assign a global onmessage handler, which accepts the message that was passed in from the parent. Within this handler, we log that we received a message from main, along with the data that was passed to us. At this point within an application, this might be a good place to perform a CPU-heavy calculation. And then finally, we can call the postMessage global to post a message back to the parent.
So let's actually execute the code and look at the output. Here I've already executed it for us. The first thing we see is the message "hello from main". And then even though we pass the message into the child, we end up logging the message "hello from the end of main". So a non-deterministic amount of time passes before the message is actually received by the worker and processed. Within the worker, we print "hello from worker". And then we print "from main" and "message to worker".
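
The two files walked through above can be approximated in a single sketch. This is a reconstruction from the spoken description, not the code from the slides, so the exact log strings are assumptions. The talk loads a separate worker.js with `new Worker('worker.js')`; here the worker source is kept in a string and loaded through a Blob URL purely so that the example fits in one file. The names `workerSource` and `startMain` are hypothetical.

```javascript
// Stand-in for worker.js. In the talk this is a separate file.
const workerSource = `
  console.log('hello from worker');
  self.onmessage = (event) => {
    console.log('from main:', event.data);
    // A CPU-heavy calculation would go here.
    self.postMessage('message from worker');
  };
`;

// Stand-in for main.js, running on the page's main thread.
function startMain() {
  console.log('hello from main');
  const blob = new Blob([workerSource], { type: 'text/javascript' });
  const worker = new Worker(URL.createObjectURL(blob));
  // Called on the main thread when the worker posts a message back.
  worker.onmessage = (event) => console.log('from worker:', event.data);
  worker.postMessage('message to worker');
  // Logged before the worker has had a chance to process the message.
  console.log('hello from the end of main');
  return worker;
}

// Only start when running in a page, where the Worker global exists.
if (typeof document !== 'undefined') {
  startMain();
}
```

Because `postMessage` is asynchronous, both main-thread logs appear before anything printed by the worker, which matches the ordering described in the walkthrough.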
And then finally, the message that was passed back to the parent thread is printed as well. So "from worker" and "message from worker". That's the final message there.
So why might you want to use a dedicated worker? Well, the most important reason, in my opinion, is that it gives you access to an additional thread. This is a great way to offload CPU-intensive work or things that might otherwise slow down a web browser, things that might cause scroll jank. You're able to offload that work to the additional thread. You pass the message to the thread, and then the main thread is able to perform other work. Once it receives a message back from that dedicated worker, it's able to handle that value and continue with what it's doing. One thing to keep in mind is that any time a dedicated worker's single parent dies, that dedicated worker will die as well.
Next up, we're going to look at shared workers. What is a shared worker? Well, a shared worker is actually pretty similar, but it can have multiple parents. So it's useful for communicating across different windows. There's a caveat that these windows need to be on the same origin. For example, you can't have the Google web page communicating with the Microsoft web page. They have to follow the same-origin rules.
So let's again look at some sample code. First we're going to look at the code that might run within an HTML page. We can pretend that we have two different HTML pages, red.html and blue.html, and within each of these files we have a script tag that's instantiating a shared worker. The SharedWorker global is again available in modern browsers, with a small caveat that I'll cover at the end. Much like the other workers, we instantiate the shared worker and we pass an argument, which is a path to the file to be used as the worker file.
However, the API to interact with it is slightly different. Instead of assigning an onmessage property directly on the worker, we assign it on a port. That port property of the worker represents a communication port into the worker itself. The interface otherwise is fairly similar to the dedicated worker, where it's a callback that takes an event argument with a data property. In this case, we're just logging a message that says we received an event. Later, what we'll want to do is pass some message into the shared worker from one of the HTML files. To do that, we would call worker.port.postMessage, passing in the message that we want to send.
All right. So here's some code on how to use the shared worker from the perspective of the worker. This is our shared worker JavaScript file. Now, there's a bit of complexity going on in here, and I'll try to explain everything step by step. But really, the purpose of this is just to show how the shared worker works. The first thing I do in the shared worker is generate an ID. That number is just a big random number to show that this code is only going to be executed once. The next thing we do is print a log message that we're inside the shared.js file, and we log that ID. Next up, we're creating a variable called ports, and that's a Set that's going to contain the ports that are passed in. After that, we have an onconnect handler that's assigned to self. This is a callback that gets called every time a different web page makes a connection to the shared worker. If you've ever worked with WebSockets, this code pattern might feel a little familiar. The first thing we do is extract the port from the connection, and then we add it to our ports set. And then we just log that a connection has been made.
And we're just logging the ID again, and then also the size of the ports set, which tells us how many pages are connected. After that, we assign an onmessage handler to the port itself. That's saying that when this port receives a message, so when one of the HTML pages sends a message into the shared worker, we want to be able to handle it here. Again, we just log another message. We say that a message was received, along with the ID and the data that was passed in. Now, previously with the dedicated worker, when we wanted to send a message back out to the parent, it was pretty straightforward. There was a single port, and we would just post the message to it. However, with shared workers, we can have more than one parent, and that's why we're sticking each one of the ports into the ports set. We iterate over them here, and we call postMessage on each of them. Now, postMessage only allows us to send a single value through it. In this case, I'm just abusing an array to pass both the ID and the data that was received.
So let's say that we actually go ahead and execute this code on our machine. Maybe the first thing we do is open the red.html file. When that happens, we see that the shared.js file is executed, and we see that an ID was generated. That ID is 123456. Then we see that a connection is made, and red.html has connected to the shared worker. Again, we print that ID, and the ID is the same. Then we open up the other file, blue.html. At that point, we see that another connection is made, and the ID is repeated again. And we see that the number of connections is now two. After that, we execute the postMessage example from two slides ago, where we pass the message "hello world" into the shared worker. When that happens, the shared worker prints that it received a message, prints its ID again, and prints the message that was received.
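
The shared worker file described above can be sketched roughly as follows. This is a reconstruction from the walkthrough rather than the slide itself, so the log strings and the helper name `handleConnect` are assumptions. The handler is only attached when the script really runs inside a shared worker, which keeps the connection logic easy to exercise on its own.

```javascript
// shared.js: loaded from red.html and blue.html via new SharedWorker('shared.js').
// On the page side, the talk's usage is approximately:
//   const worker = new SharedWorker('shared.js');
//   worker.port.onmessage = (event) => console.log('EVENT', event.data);
//   worker.port.postMessage('hello, world');

// Generated once per worker instance, to show this file only runs a single time.
const ID = Math.floor(Math.random() * 999999);
console.log('shared.js running, ID:', ID);

// One port per connected page.
const ports = new Set();

// Called every time a page connects to the shared worker.
function handleConnect(event) {
  const port = event.ports[0];
  ports.add(port);
  console.log('connection', ID, 'open ports:', ports.size);

  port.onmessage = (message) => {
    console.log('message', ID, message.data);
    // Broadcast to every connected page. postMessage takes a single value,
    // so the ID and the data travel together in an array.
    for (const p of ports) {
      p.postMessage([ID, message.data]);
    }
  };
}

// Attach the handler only when running inside a real shared worker.
if (typeof SharedWorkerGlobalScope !== 'undefined') {
  self.onconnect = handleConnect;
}
```

This sketch never removes a port when a page closes. A longer-lived application would likely need some way to prune stale ports from the set.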
And then it dispatches those messages to the calling HTML files. When that happens, the event handler is called, and the final two event lines are printed to the screen. Those will be seen within the inspector console on those specific pages. There's no guarantee in this case on the order in which the red or the blue HTML logs get made. That's a bit of the fun and excitement of multithreading: there's not always a guarantee on the order in which this stuff gets executed.
So why might you use a shared worker? Well, perhaps you need to communicate across pages. It's pretty useful for keeping a context with variables associated with it, particularly in cases where you want those variable scopes to outlive the life of a page. Perhaps some heavy single-page applications might depend on this, where we want to coordinate things across different pages. Another pattern is that perhaps you want to make a singleton, a single source of truth, and you want that to be accessed across pages. One thing to keep in mind is that unfortunately, shared workers are not supported by Safari. I believe they supported them at some point, but then for security considerations they ended up removing them. However, they do work in Chrome and Firefox. If you do find yourself needing a pattern to coordinate communication across different pages, you might consider using the BroadcastChannel as an alternative, and that is available in modern browsers as well. Now, one thing to keep in mind is that a shared worker is going to die when its last parent dies. So if you were to open multiple pages, maybe you end up opening 10 of them, you can then start closing these different pages, and it's not until you close the final page that the shared worker would end up being killed as well. You can also manually terminate these web workers if you so choose. All right.
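
For the BroadcastChannel alternative mentioned above, a minimal sketch might look like the following. It is not from the talk. The helper name `joinChannel`, the channel name `app-state`, and the message shape are all hypothetical.

```javascript
// Opens a named channel and routes incoming data to a callback.
function joinChannel(name, onMessage) {
  const channel = new BroadcastChannel(name);
  channel.onmessage = (event) => onMessage(event.data);
  return channel;
}

// Run only in a page. Every same-origin page that runs this joins the same channel.
if (typeof document !== 'undefined') {
  const channel = joinChannel('app-state', (data) => console.log('received', data));
  // Delivered to the other pages on the channel, not back to this sender.
  channel.postMessage({ from: location.pathname, text: 'hello' });
}
```

Unlike a shared worker, a BroadcastChannel provides no extra thread and no shared variable scope. It only relays messages between the pages that have joined the channel.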
So finally, let's take a look at service workers. What is a service worker? Well, a service worker is the most complex of the web workers. The coolest feature that they support is the ability to intercept or otherwise proxy requests that are made to a server from a web page. This includes things like fetch requests, you know, AJAX requests, but it also includes things like images that are loaded, assets that are loaded from CSS, the favicon file, a lot of that stuff. Basically all of it ends up getting sent through the service worker. One thing that's interesting about the service worker is that it can technically have zero parents and, to be a bit hand-wavy, it runs in the background. You do need a page to run to actually end up installing the service worker, but once that page closes, the service worker technically remains installed in the browser. The behavior around when it comes and goes, when it lives and dies, is a little bit less defined, in the sense that it's not something you would necessarily want to depend on. You can use these service workers to share state between same-origin windows as well.
All right. So let's look at another code example. In this case, we're looking at service workers as they're seen in the web page. Here the interface to create the worker is a little bit different: we're calling navigator.serviceWorker.register. Much like with the other web workers, the first argument is a path to the file that represents the worker. There's also an additional configuration object, and the most commonly used configuration property is the scope. The scope allows us to define the URL range that the worker can control. In this case, we're saying that things that are loaded from a URL beginning with a slash end up being under its control. Essentially it's saying everything on this domain is under control.
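
The registration call described above might look roughly like this. It is a sketch rather than the slide's code. The helper `isWithinScope` is hypothetical and only approximates the prefix matching that decides whether a page URL falls inside a scope, and the localhost URLs are example values.

```javascript
// Registers /sw.js and asks for control of everything under "/".
async function registerServiceWorker() {
  const registration = await navigator.serviceWorker.register('/sw.js', {
    scope: '/',
  });
  console.log('service worker registered, scope:', registration.scope);
  return registration;
}

// Hypothetical helper: a page is within a scope when its URL starts with
// the scope URL.
function isWithinScope(pageUrl, scopeUrl) {
  return pageUrl.startsWith(scopeUrl);
}

console.log(isWithinScope('http://localhost:5000/app/index.html', 'http://localhost:5000/')); // true
console.log(isWithinScope('http://localhost:5000/index.html', 'http://localhost:5000/admin/')); // false

// Only attempt registration where the API exists.
if (typeof navigator !== 'undefined' && 'serviceWorker' in navigator) {
  registerServiceWorker().catch((err) => console.error('registration failed', err));
}
```

By default, a service worker cannot claim a scope broader than the directory its own script is served from, which is one reason the worker file usually sits at the root of the site.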
We also get some different state changes that we're able to hook into. In this case, we're handling the oncontrollerchange event, and we're just going to log a message to the screen. What that says is that when a service worker ends up taking control of the current page, which means that from that point in time requests are intercepted, we log a message that it is now functioning. The final function that we have here is makeRequest. This is just for illustrative purposes, and we'll execute it in a little bit. What it does is make a fetch to a file on the current server called data.json, deserialize the result, and log it to the screen.
All right. So now let's check out the service worker from inside the worker itself. This is our sw.js file, and this is the first half. The first thing we're doing in here is creating a variable called counter and setting it to zero. Next we're assigning an oninstall handler on our self object, and that tells us when the service worker is being installed. In this case, we're just going to log a message that the install happened. After that, we have an onactivate handler. This gets called when the service worker reaches the activation stage, or state, as it were. These service workers act as a state machine. When this happens, we're just going to log a message that the service worker is now active. But then we're also going to do something else that's kind of interesting. When a service worker is first activated, requests sent by any pages that are already open will not immediately go through the service worker. It isn't until a page is refreshed that it comes under control of the service worker. So what we're doing here at the bottom is calling event.waitUntil, which accepts a promise, and the promise is the result of the self.clients.claim call.
So what that's saying is basically that for the pages that are currently open, the service worker will now intercept their requests.
All right, so the file continues, and here's the second half. We also have this onfetch handler. This gets called every time the service worker receives a network request from one of the web pages. Inside of this one, what we're doing is first logging the URL that was received. Next, we have an if statement. In here, we check to see if the URL that was received ends in data.json. If it does not, then we skip to the end, and we fall back essentially to a normal HTTP request. Here we have this event.respondWith, which accepts a promise that resolves to a value that is given to the browser as the result of the request. The value that we're giving it is the response from fetch, which is a promise, and the argument that we're providing to fetch is the incoming request, which is event.request. So essentially what that line is saying is, just perform a normal request and don't really do anything with it. However, if the URL did end in data.json, we execute the body of that if statement. What we do in there is increment that counter variable from before, and then we call event.respondWith. You can ignore that return void, but we call event.respondWith and pass in a new response that we're creating from scratch. The body of that response is a JSON object that contains the value of our counter variable. And then, just for illustrative purposes, we're also setting the headers. In this case, we're setting the content type to text/json.
All right. So if we actually execute this code, these are the logs that we'll see. The first thing we see is that the service worker is installed. Next, we see that the service worker has been activated. And then finally, we see that the controller change has happened.
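
Both halves of the sw.js file described above can be approximated in one sketch. This is reconstructed from the spoken walkthrough, so the log strings and the helper names `nextCounterBody` and `handleFetch` are assumptions. One deliberate difference: the sketch uses the registered `application/json` media type rather than the `text/json` mentioned in the talk. The handlers are only attached when the script actually runs as a service worker.

```javascript
// sw.js: in-memory state, kept only to mirror the talk's demonstration.
let counter = 0;

// Increments the counter and returns the JSON body for the synthetic response.
function nextCounterBody() {
  counter++;
  return JSON.stringify({ counter });
}

// Called for every network request made by a controlled page.
function handleFetch(event) {
  console.log('fetch', event.request.url);

  if (event.request.url.endsWith('/data.json')) {
    // Answer from the worker itself. This request never reaches the server.
    event.respondWith(new Response(nextCounterBody(), {
      headers: { 'Content-Type': 'application/json' },
    }));
    return;
  }

  // Everything else falls through to a normal network request.
  event.respondWith(fetch(event.request));
}

// Attach handlers only when running inside a real service worker.
if (typeof ServiceWorkerGlobalScope !== 'undefined') {
  self.oninstall = () => console.log('service worker install');
  self.onactivate = (event) => {
    console.log('service worker activate');
    // Take control of pages that are already open, without waiting for a refresh.
    event.waitUntil(self.clients.claim());
  };
  self.onfetch = handleFetch;
}
```

From a controlled page, a request such as `fetch('/data.json')` would then receive the synthetic JSON body, with the counter increasing on each call for as long as this worker instance stays alive.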
Again, the order of these messages is non-deterministic, as the browser may handle a log and buffer it in memory while it prints a log from another location. So the order of those first three messages is not guaranteed. Next, we call makeRequest from within the web page, and that's going to make our fetch request to the server. So that happens. We print that the fetch was made, and it just prints the URL that was received. Then it returns the JSON object as the response to the network request. And then finally, main.js receives that response and prints the result, which in this case is counter set to 1. If we were to execute the makeRequest function multiple times, each one of those requests would then result in a different counter value. None of these requests actually get sent to the server running at localhost port 5000. They're all handled by the service worker.
And so why might you want to use a service worker? Well, it's really useful for caching network assets so that an application can work offline. You can use it to perform background synchronization of content that is updated. If you update content on the server, the service worker can then decide how to give that back to the client. If you want to be able to support push notifications, that's also done in the service worker. If you're like me and you like to build progressive web apps and you like those apps to be added to the home screen, well, both Android with Chrome and iOS with Safari will make use of these service workers. It's one of the prerequisites before you can add your PWA to the home screen. One thing to keep in mind is that a service worker might die when the last parent dies, but it sort of sits around. There's not really a guarantee on how long those closures remain in memory. So if you think back to the example I had with that counter, that's actually a bit of an anti-pattern.
You really wouldn't want to create state in memory in a service worker where you would expect it to hang around. If you want to do things that are a bit more persistent, for example, you want to put things in caches, there are all sorts of cache APIs available within service workers to keep that stuff in a more persistent state.
All right. If you were to take a screenshot of this presentation, this is probably the slide to do it on. Here we have a comparison between the different types of web workers. On the far left, I have the feature, and then how it works for dedicated, shared and service workers. All three of these web workers provide an additional thread, which can be used for multithreading purposes. If you have a server that's not serving content over HTTPS, you can only use dedicated and shared workers; you are unable to use a service worker with it. If you need to support Safari, you can use dedicated and service workers, but you'd have to avoid a shared worker. If you want to act as an HTTP proxy, well, then you need to use a service worker. A dedicated worker has exactly one parent, a shared worker has at least one parent, and a service worker can have zero or more parents. And then a dedicated worker dies with its parent, a shared worker dies with its last parent, and for a service worker, it's a little bit trickier.
All right. So that's been the presentation. This was a comparison of web workers. If you'd like to follow me, I'm on Twitter at TLHunter. This presentation is available at that bit.ly link online. And then if you're interested in finding out more information about the book, feel free to follow that last URL as well. Thanks.