The Road to Async Context


The AsyncLocalStorage API is arguably one of the most important relatively recent additions to Node.js. Today we are seeing implementations being added to other runtimes such as workerd, Deno, and Bun. And there is an effort underway in TC39 to introduce a new AsyncContext API to the language. This talk will introduce async context tracking with AsyncLocalStorage and AsyncContext and discuss how the model is evolving as it is being implemented across multiple platforms.



Transcription


Good morning. How are you all doing? When they invited me to come out and speak, I was thinking about what I wanted to talk about. We could talk about Node in general, what's happening in the project. But I just wanted to talk about code. So I want to talk about some of the code that we've been writing recently, not just in Node, but also in Cloudflare Workers.

The AsyncLocalStorage API, if you're not familiar with it, has been in Node for a couple of years, but it's one of those APIs that's not really well known: what it does, how it works. As we started to work on this in Workers recently, as part of our Node compatibility story, we tried to figure out how it is implemented. Is the performance of it, and how it's written, what it should be? So in Workers, we decided to take a different design approach to how it works under the covers. I want to talk about the differences between the way it works in Node and the way it works in Workers, and where things are going.

Now, the original title of the talk is The Road to Async Context. AsyncContext is actually a new proposed standard API that covers the exact same use cases as AsyncLocalStorage, but it's going to be added to the language itself. I'll talk a little bit about that at the end.

A little bit more about me, for those of you that don't know me: I have been involved with Node for a number of years. I'm also contributing to Cloudflare Workers; I'm on the Workers runtime team at Cloudflare. I'm not going to talk too much about Workers itself. My colleague, Matt Alonzo, is here. He's going to be showing off some more specific details of Workers, if you're interested in that.
I highly recommend you go over to his talk later on. All right. So let's get things going.

What is AsyncLocalStorage? The Node documentation gives us this very helpful definition: it creates a store that stays coherent for async operations. And that's basically it. And then it gives you an example. It's an extremely unhelpful explanation of what it is, so we're going to break it down a little bit further.

We have this notion of what's called an execution context. This is whatever code is running right now. That execution context can schedule continuations. A continuation is any code that's scheduled to run later. So think of a timer, or a promise, any time you attach a then, a catch, or a finally to it. Or callbacks: when you're calling the fs API, that async callback is run later. Continuation is the generic term for those things that are scheduled to run later.

As the execution context is running, it can schedule any number of these things. If you're using queueMicrotask or promises or callbacks or process.nextTick, all of these are things that the currently running code can schedule to run either just a few moments from now or sometime later in the application. And these things can get stacked up.

AsyncLocalStorage is about being able to associate contextual data with the scheduling of that continuation. When the current execution context is running, we want to be able to set a context value, and then when that function actually does run later, we want to be able to recover that exact same value.

An example makes this a lot easier to see. In this particular case, this log function at the top is our use case. What we want is to be able to log something to the console that has a unique request ID. So imagine that this is a server.
Any time a request comes in, we want to create a unique ID for that request. And any time we have console log output, we want to include that unique ID in that output. Until AsyncLocalStorage existed, you essentially had to pass that ID in explicitly through all your functions, or you had to attach it to something else. We've probably all seen the case where you take the request object, add properties to it, and pass that through. That's very gross. Adding arbitrary properties to somebody else's object that you don't own is always a bit problematic. And having to pass this property down through every function, even if you're not going to use it, just so you can have something like logging is very cumbersome, especially if you don't control that code.

This shows a bit more of the complexity. There are doSomething and doSomethingElse functions, and we're not actually using the ID there. The only reason we have to pass it in is so we can get it to that log.

With AsyncLocalStorage, we have a much better model. We can create this AsyncLocalStorage instance at a scope that is visible to both our createServer and our log function. The doSomething and doSomethingElse functions don't need to know anything about this ID. As we start dealing with our request, we tell the request ID store what that value is going to be. This is setting up a counter that increments on every request. And then we call setTimeout. We're just simulating async activity here. It's going to wait a bit of time.
Then when it actually calls that doSomething function, which then does the log, inside the log we pull out that stored request ID value and go from there. So it greatly simplifies how we pass this information down, especially when we don't control the API that we're passing through. If this doSomething is from some arbitrary third-party module out on npm whose API you don't control, but you still need that ID passed through and in your logs, AsyncLocalStorage is the thing that's going to make that possible. This is what we mean by remaining coherent: it's there when you need it in that async flow.

We can set multiple things. AsyncLocalStorage is not just a single value. You can schedule as many continuations as you want at the same time with different values, and they can run in any order; when those run, the appropriate value is going to be returned when you call als.getStore().

But how do we do this? Unfortunately, it's a bit complicated right now in Node. AsyncLocalStorage is built on APIs called Async Hooks and Promise Hooks. We'll talk more about those in a bit, but they add a certain degree of complexity and performance loss when we go to use this.

Every AsyncLocalStorage instance has a key, a kind of hidden key inside that instance that represents that particular instance of the API. And at any given time within Node, we have this thing called a current execution resource. This is just the object that represents the code that's running right now. So when a timer fires, that current execution resource is the timer. When you're dealing with a continuation from a promise, that current execution resource is the promise. There's always an object that represents the currently running code within Node.
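
The slides are not part of this transcript, so the following is only a minimal sketch of the request ID example as described in the talk, not the speaker's actual code. The names requestId, handleRequest, and lines are assumptions, and a plain function stands in for the HTTP server so the snippet runs on its own in Node.js.

```javascript
const { AsyncLocalStorage } = require('node:async_hooks');

// One storage instance, visible to both the request handler and log().
const requestId = new AsyncLocalStorage();
const lines = []; // collected output, kept only so the result is easy to inspect
let nextId = 0;

// log() never receives the ID as a parameter; it reads it from the store.
function log(message) {
  const line = `[${requestId.getStore()}] ${message}`;
  lines.push(line);
  console.log(line);
}

// Stands in for code we do not control: it knows nothing about the ID.
function doSomething() {
  log('doing something');
}

// Hypothetical stand-in for a server's request callback.
function handleRequest(delayMs) {
  return new Promise((resolve) => {
    // Every request gets its own ID for the duration of its async flow.
    requestId.run(++nextId, () => {
      setTimeout(() => {
        doSomething();
        resolve();
      }, delayMs);
    });
  });
}

// Two overlapping "requests"; the second finishes first.
const done = Promise.all([handleRequest(20), handleRequest(5)]);
// prints "[2] doing something" and then "[1] doing something"
```

Each timer callback sees the value that was current when it was scheduled, even though the two callbacks complete out of order.
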
Whenever you set that value with als.run(), what we do is set that key from that instance to the value that we're setting. And that current execution resource actually maintains the map of all of those values. So let's say we have 10 AsyncLocalStorage instances in our application. On that current execution resource there are going to be 10 keys with 10 different values.

Every time we schedule a new thing, a new timer, a new promise continuation, every time you add a new then, every time you create a new promise, what we end up doing now is taking the full set of those properties and copying them to the new resource. Think about the performance cost of that. I've dealt with production applications that created tens of thousands of promises within just a few seconds. Every time a new promise is created in this model, those key values are copied from one object to the next. That is an incredibly expensive operation. I have seen cases where just turning on AsyncLocalStorage in a heavy application has been a 30 to 50 percent performance loss, and that's being fairly conservative. So we can do better.

Like I said, Node's implementation is based on Async Hooks and Promise Hooks. Async Hooks was originally intended as a diagnostic API. It's a way of monitoring the lifecycle of the async resources that are created within Node. But it is a very low-level internal thing. When we started looking at AsyncLocalStorage, it was discovered that, hey, we could use this to implement this model. It works, but the performance of it is actually rather poor. Promise Hooks is an API built into V8. It's really intended to help with this type of use case.
What it does is set up a series of callbacks that can be fired when a promise is created, when it's resolved, all the different lifecycle events of that promise. But that gets incredibly expensive when you're invoking that code every single time you create a promise and every single time you resolve a promise. And like I said, applications can create tens of thousands, even hundreds of thousands of promises in just a few moments. So that code ends up being extremely expensive to run every single time.

A key problem here is that we are propagating that context, copying those key values from one execution resource to the next, every time one of those continuations is created, not when the context actually changes. And that's the key thing, because the actual contextual information that you're dealing with changes only very rarely. You're only dealing with a few instances of these things, and the values are typically set once during the application. But right now, every time we create those promises, we're copying that data. And it gets very expensive very quickly.

So we recently implemented AsyncLocalStorage in Cloudflare Workers. This is a fun thing. It's not going to be full Node compatibility in Workers, but we will have things like node:net and node:crypto. For these we are using the node: specifier. It is required, just like in Deno. And AsyncLocalStorage is one of the first ones that we added. It just got enabled, I think last month, so it's available for everyone to use. We did this without using Async Hooks or Promise Hooks. At the API level, what code uses, it is very compatible with what Node has. But we've implemented it in a completely different way.
So how do we do it? We introduced this thing called an async context frame. Rather than storing all of those key-value pairs on the actual execution resource, we create a frame only when the data actually changes. Initially, when the application is running, there is no frame; we haven't set any values. The first time an ALS instance is used, we create a frame and set that value. That frame maintains the map. The execution resource only maintains a reference to the current frame. So when a resource is created, we just link it to whatever frame is current. We only create new frames when a new value is specified, so it's much less expensive. We're only propagating that context when the values actually change, not when the individual continuations are scheduled. It results in a massive performance improvement.

This is currently how we do it in Workers. This is how I want to do it in Node. I skipped a few slides here, but there's a challenge with how we can implement this in Node right now. All of this depends on an API that's not even new; it's an API that's been in V8 for years. It's been undocumented, and very few people actually know about it. It completely eliminates any dependency on Async Hooks and Promise Hooks for us to implement this entire model.

The problem is that Chromium also uses this API, and the way that the API is currently designed, you can only use it for one purpose at a time. If Node was just Node by itself, with no dependency on Chromium at all, that'd be fine. We could use it. But there are things like Electron, which use both Node and Chromium, and they're using the same V8 instance. Since Chromium is using this API too in its own way, we can't quite use this in Node yet, until we make changes in V8 to actually make it work.
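
As a rough illustration of the frame idea described above, here is a toy model in plain JavaScript. It is not the Workers or Node.js implementation and does not use the V8 API mentioned in the talk; ToyStorage, schedule, runInFrame, and framesCreated are hypothetical names made up for this sketch. It only shows that a new frame is created when a value is set, while scheduling a continuation just captures a reference to the current frame.

```javascript
// Toy model only: hypothetical names, not the Workers or Node.js internals.
let currentFrame; // undefined until some storage sets a value
let framesCreated = 0;

// Run fn with the given frame current, then restore the previous frame.
function runInFrame(frame, fn) {
  const previous = currentFrame;
  currentFrame = frame;
  try {
    return fn();
  } finally {
    currentFrame = previous;
  }
}

class ToyStorage {
  run(value, fn) {
    // The only place a frame is created: when a value actually changes.
    // The key-value map is copied once here, not once per continuation.
    const frame = new Map(currentFrame);
    frame.set(this, value);
    framesCreated++;
    return runInFrame(frame, fn);
  }

  getStore() {
    return currentFrame?.get(this);
  }
}

// Scheduling a continuation copies nothing; it keeps a reference to the frame.
function schedule(queue, fn) {
  const frame = currentFrame;
  queue.push(() => runInFrame(frame, fn));
}

const als = new ToyStorage();
const queue = [];
const seen = [];

als.run('a', () => {
  for (let i = 0; i < 1000; i++) {
    schedule(queue, () => seen.push(als.getStore()));
  }
});

// Run the continuations later, outside of als.run().
for (const task of queue) task();

console.log(framesCreated, seen.length, seen[0]); // prints: 1 1000 a
```

In this model a thousand continuations share one frame, whereas the copy-on-schedule approach described for Node would have copied the key-value pairs a thousand times.
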
The key change that we need in V8 is to allow multiple uses of this API at the same time, to be able to set this contextual data with multiple keys. So it's something that's coming. What I'm hoping is that later this year we'll be able to make these changes in V8 and make this a significant performance improvement in Node for anyone that is using the AsyncLocalStorage API. Right now, performance is the number one issue you're going to run into if you're using this API.

So, what about standardization? This is a very good question. A little while back, Luca, working over on Deno, posted this comment on Twitter. I found it quite amusing. Vercel, with Next.js, said: it's going to be totally standards-based, we're going to use all web platform standards, but hey, it requires AsyncLocalStorage. AsyncLocalStorage is a Node-specific API. There's no specification for it whatsoever other than the code and the documentation. But Vercel has adopted it. Other people are starting to adopt it. So what are these runtimes supposed to do? Deno and Workers and Bun, do they just follow Node's lead and implement all of the Node-specific APIs? Well, it turns out, yes, we're doing that. We are implementing that Node compatibility layer. But what we really want is a standard API for this.

Fortunately, TC39 is working on this. There is an AsyncContext API that is being actively developed right now. It's in stage two. To give an idea of what this is going to look like, this is the example with AsyncLocalStorage now, and this is how it changes with AsyncContext. We just change new AsyncLocalStorage to new AsyncContext, and getStore just becomes get. It's going to be that simple.
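
AsyncContext is still a proposal, so it cannot be called directly in Node.js today. To make the size of the change concrete, the sketch below wraps AsyncLocalStorage in a small class that exposes run and get, following the shape described in the talk. ContextVariable is a hypothetical name invented here, it is not a polyfill, and the final proposal's constructor and method names may differ.

```javascript
const { AsyncLocalStorage } = require('node:async_hooks');

// Hypothetical adapter: AsyncLocalStorage behind a run()/get() surface.
class ContextVariable {
  #als = new AsyncLocalStorage();

  // Same role as AsyncLocalStorage.prototype.run.
  run(value, fn, ...args) {
    return this.#als.run(value, fn, ...args);
  }

  // The talk describes getStore() becoming get().
  get() {
    return this.#als.getStore();
  }
}

const requestId = new ContextVariable();

// Synchronous use: the value is visible inside run() and gone outside it.
const result = requestId.run(42, () => requestId.get());

// Asynchronous use: the value follows the timer scheduled inside run().
const later = requestId.run(7, () => {
  return new Promise((resolve) => {
    setTimeout(() => resolve(requestId.get()), 1);
  });
});

later.then((value) => console.log(result, value)); // prints: 42 7
```

Code written against a narrow surface like this only needs its constructor and getter changed when a native implementation becomes available, assuming the proposal keeps this shape.
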
There are some other smaller API differences depending on what you're doing. But for the most common use case, this is the extent of the change. So very, very straightforward.

The QR code sends you to the GitHub repository where this proposal is being worked on. It is in active development right now. It's at stage two, and it is expected to go to stage three a little bit later this year. There are a lot of unanswered questions, things that they're still working through. Most of the API, though, is stable, and it is on track to be added soon. If you have an interest in this, they are seeking more input on use cases and general questions about how this is going to be used.

Like I said, the simple API is pretty close. There are some features of AsyncLocalStorage that will not be in AsyncContext, specifically things like enterWith and disable. What those APIs allow you to do is modify the contextual information in place, without using run. Run will create a new frame. enterWith modifies the frame as it exists, so that context frame becomes fully mutable synchronously. That has a number of challenges. Node currently allows that; it's an experimental feature. But that's not going to be carried through to AsyncContext, for a number of reasons. So just don't use those features.

As part of WinterCG, we created what we call, and it's kind of a mouthful, a web-interoperable runtimes portable subset. Basically, it's just the subset of the AsyncLocalStorage API that you can use today and be forwards compatible with AsyncContext when it comes out. The goal here is to implement this not only in Node, where it's already there, but also in Workers. We would like to see it implemented in Deno. We'd like to see it implemented in Bun. We'd like to see it implemented in all the JavaScript runtimes that are out there.
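
The difference between run and enterWith can be seen in a few lines of Node.js. This is a minimal sketch with made-up variable names; it uses enterWith only to show the behavior the talk recommends avoiding.

```javascript
const { AsyncLocalStorage } = require('node:async_hooks');

const als = new AsyncLocalStorage();

// run() scopes the value to the callback and whatever it schedules.
const inside = als.run('scoped', () => als.getStore());
const afterRun = als.getStore(); // back to undefined once run() returns

// enterWith() changes the current context in place, so the value stays
// visible to everything that executes afterwards in this flow.
als.enterWith('leaked');
const afterEnterWith = als.getStore();

console.log(inside, afterRun, afterEnterWith); // prints: scoped undefined leaked
```

Code that sticks to run and getStore keeps every value tied to an explicit callback, which is the part of the API the talk describes as carrying over to AsyncContext.
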
And we are actively working with the runtimes to get them looking at this subset and implementing it in advance of AsyncContext coming. So, that's it. Hopefully, it was useful. If folks have questions about this, just let me know. Thank you.

Thank you, James. That was great. Okay. Do we have any questions? Take a seat. So we have some questions for you. The first one is, how do you have such a cool beard?

When I met my wife, it was just short stubble. And she just said, don't shave it. And so I have it.

Ah, good, good.

Yeah, that was quite a while ago, 2012.

2012, and you haven't shaved even once?

I haven't shaved it.

Wow.

So, yeah, got to keep my wife happy.

Cool. The next question is, wouldn't AsyncContext create a global state with its own set of issues compared to propagating parameters? Is the tradeoff worth it?

Short answer, yes, it can. If you set the AsyncContext at that global scope, then yes, it carries all of the same issues as global state in general. You can set the AsyncLocalStorage or the AsyncContext at any scope you want, though. It just has to be accessible to the functions that are going to need it. So you don't have to do it as a global; it's just that most of the examples show it as a global.

Cool. Well, I'll keep asking your questions. The next question is, AsyncContext is at stage two. Is there any sort of pushback or security concern that could prevent this from landing?

Yeah, so in terms of pushback, the API is pretty stable. There have been some security questions in the committee. But from what I've understood, those have been dealt with and it's pretty well accepted. The key question that's being asked right now is whether it actually belongs in the language at all, versus being pushed through as a web platform API through the WHATWG or W3C, one of those venues. So that's the key debate that's happening right now.
And there are still some really good arguments for it not going into the language.

Cool. And why does AsyncLocalStorage copy key-value pairs instead of doing the same thing as you did?

It's hard to say. Since everything was built on top of the Async Hooks API, there are some limitations to how that works. There's quite a bit of complexity involved in tracking those resources. And given the fact that we're dealing with so many different types of resources, promises and timers and all these kinds of things, it was pretty much the easiest thing to do in the current model. It just wasn't the most performant way of doing it.

Any chance enterWith could be implemented in the proposal? It is basically required for APMs, since sometimes we can't decide if we're sync or async.

Yeah. Probably not. At this point, the authors of the AsyncContext proposal have ruled it out entirely, and they said they do not want to support that enterWith functionality. Whether or not it'll come back in later on, I don't know. I don't foresee it.

We have more questions for you. We have domains, which are deprecated, and AsyncLocalStorage, which is experimental. Which one should I use now if I need it?

Use AsyncLocalStorage now. Stick to that portable subset that WinterCG has defined. If you stick to that subset, then when AsyncContext is finished, Node will support both, Workers will support both, and hopefully Deno will support both. So you have a greater chance of being compatible if you just stick to that subset. The Async Hooks API is not deprecated so much as it's just permanently experimental and nobody likes it. It has some really good cases for diagnostic purposes, and you can use it for those things, but for most application cases, it's probably best to avoid it.

Cool. You just mentioned Deno. We have a question about Deno here.
Do you see some similarities with Deno KV?

Oh yeah. I'm very interested in Deno KV. In Workers we have KV too, and I'm going to be digging into that in a lot more detail. So yeah, no comments on it yet.

Surprise. Can you give us more cool use cases other than logging?

That is an interesting thing. The folks that are working on the AsyncContext proposal are actually struggling to get more use cases other than things like tracing and logging. If you look at most of the applications that are using AsyncLocalStorage now, it's things like APMs, tracers, and logging. Those are the key use cases. We're looking for more, but those are the ones that we've seen the most. Tracing is a big one.

Is AsyncContext related to or inspired by Java's AsyncContext?

You know, that's a good question. I don't know. It's mostly inspired by AsyncLocalStorage.

What kind of applications will benefit from migrating to AsyncContext from ALS?

I think all of them. ALS is what you have available now. It's what you should use now. When AsyncContext is available, there won't be any reason not to use it and not to migrate to it. The models are going to be very similar. The API differences are going to be minimal. As for the types of applications, it's primarily going to be most useful for servers that have that unique context for every single request.

Well, we don't have any more questions. Thank you, and let's give him a hand.
27 min
14 Apr, 2023
