Cracking the Concurrent Mode


With concurrent mode coming in React 18, let's talk about the complexities behind providing declarative APIs for concurrent rendering. While implementing concurrent mode APIs from scratch for Brahmos.js, I came across a lot of use cases and variability, which made it one of the most interesting problems to solve and made me appreciate React's effort in advocating Concurrent UI even more. In this talk we will see what concurrent mode means for a web app, what the internal complexities are, and how I solved them for Brahmos.js.


Hello everyone. In this talk, I will be talking about concurrent mode, and more specifically about my mental model of concurrent mode, which comes from implementing a whole React-like library that supports all the features React has, including the concurrent features. By the way, there is no concurrent mode as such coming in React 18. It's more of a set of incremental concurrent features.

About me: I am Sudhanshu Yadav. I work at Prophecy as a front-end architect. I have authored Brahmos, the React-like library, and have open-sourced many other tools. I am an internals fanatic, and I love discussing the internals of the libraries I use. I like making theories around them. You can find me on Twitter, or you can check out my projects on GitHub.

Before we start, let's understand why we need concurrent mode and what it means for a React application, or any application in general. The most important reason concurrent features exist is to provide a better perceived experience. A lot of libraries focus on improving the performance of the library itself, which should be the priority. But sometimes those improvements are not noticeable by the user. And if a user doesn't feel that your application is smooth, then your application is not smooth. They should perceive your application to be smooth.

There is a problem with the current patterns for improving the perceived experience. They have good context on the application, but they don't have control over the rendering phase, so they can't use resources effectively. Concurrent features make it possible to use all the available resources effectively while keeping the application responsive. Concurrent features also enable background rendering, which in turn enables a lot of things, like suspending a render, pausing it and resuming it, or batching renders together. And they remove a hard part from userland, which is orchestrating the whole rendering process.
They provide declarative APIs that let you define the order and give hints to React, or to any library like it, about how you want your rendering sequence to be, or whether you want to pre-render something or lazy load something. All of those things can be done in a more declarative way.

Concurrent features definitely have a lot of scope, but what enables them? The first and most important pattern, I would say, is time slicing. To understand time slicing, let's look at a React tree. Here, you can think of each node as a component that takes a little bit of time to process. How does an update happen? You change state in a component, which triggers a re-render of that component, which in turn triggers a re-render of its children. All of this happens synchronously, in one go. Once you have found all the changes required for the actual DOM, you commit those changes to the DOM.

Now let's see how this plays out on a frame timeline. Most libraries try to achieve 60 FPS, which means each frame is about 16 milliseconds long, and you actually get less than 16 milliseconds to do all the JavaScript work. There are two types of tasks: JavaScript tasks and browser tasks. A JavaScript task would be processing a component, and a browser task would be painting, giving feedback on user input, running animations, and so on. If your JavaScript task is big and spans multiple frames, what happens if a browser task comes in between? It has to wait until the whole JavaScript task is finished. So your application will stutter, because there will be frame drops.
With time slicing, the idea is that you break a big piece of work into units of work, process it unit by unit, and after every unit check whether the browser has something to do, giving the browser some breathing space. With that, if a browser task comes in, it can easily fit between your JavaScript tasks.

Let's try to visualize it on our application. Again, you are making a state change. This time, let's not call each node a component; let's call it a unit of work, or a fiber in React terms. You process one fiber and then check whether you should yield to the browser. If not, you process the next unit, and you keep going as long as you don't have to yield. As soon as you have to yield, you stop processing and schedule the next unit of work for after the browser has done its work. Once the browser is done, you come back to processing units, and you continue until all the units are processed. Once everything is rendered, you know what changes are required, and you commit all of those changes.

This turns your synchronous rendering into async rendering, and it's interruptible: a browser task can interrupt it, or higher-priority work can interrupt it. But because it is now async, it inherits the problems of async code: multiple things can happen at the same time, and if a lot of things are happening on your main tree, your application can become inconsistent. To solve that, we have a pattern called background rendering.
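
As a rough sketch of the time-slicing loop described above: this is not Brahmos's or React's actual code, and the names `workLoop`, `renderInSlices`, `shouldYield`, and `schedule` are made up for illustration. The yield check and the scheduler are passed in as parameters, on the assumption that any "call me back after the browser has had a turn" mechanism can be plugged in.

```typescript
// A unit of work ("fiber" in React terms). `process` stands in for rendering one component.
type Unit = { name: string; process: () => void };

// Hypothetical helper: processes units until `shouldYield` says the browser needs the thread.
// Returns the index of the next unprocessed unit so the caller can resume later.
function workLoop(units: Unit[], start: number, shouldYield: () => boolean): number {
  let i = start;
  while (i < units.length) {
    units[i].process();
    i++;
    if (i < units.length && shouldYield()) break; // give the browser breathing space
  }
  return i;
}

// Drives the loop to completion. `schedule` is how we ask to be called back
// after the browser has had its turn (assumption: any task scheduler works here).
function renderInSlices(
  units: Unit[],
  shouldYield: () => boolean,
  schedule: (cb: () => void) => void,
  commit: () => void,
): void {
  let next = 0;
  const slice = () => {
    next = workLoop(units, next, shouldYield);
    if (next < units.length) schedule(slice); // more work left: resume later
    else commit(); // everything is rendered: commit all changes at once
  };
  slice();
}
```

Note that `commit` runs only once, after the last unit, which matches the "render in pieces, commit in one go" behavior described in the talk.
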
With background rendering, let's take the same application. Let's mark the existing tree as the current tree, which is a representation of your actual DOM. When there is a state change, instead of updating the current tree directly, you create a new work-in-progress tree and do all the processing there. Here is how it happens: you copy a fiber from the current tree to the work-in-progress tree and check whether it has any pending updates. If it does, you process the fiber, that is, render it, and compare the rendered children with the existing children on the current tree. If they are the same, you clone them from the current tree. If they are different, you create new fibers. Then you process the next fiber, and you repeat until everything is finished.

One benefit you get is that you are not mutating the current tree. Your current tree remains consistent with your actual DOM while you are doing work in the work-in-progress tree. Only when you are done with everything, once you have rendered and know all the changes, do you commit the changes to the DOM. Now your work-in-progress tree is the accurate representation of your DOM, so we swap the current and work-in-progress trees.

Being async in nature, there can now be any number of updates happening at the same time. When there are a lot of updates in a queue, you need to prioritize them, and different types of updates have different priorities. For example, browser tasks have the highest priority; a paint should happen first. Within the application context, there are updates on an input that need to update your state, animations driven through JavaScript, or interactions with a button where you need to give feedback.
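
To make the current / work-in-progress split described above more concrete, here is a minimal sketch. It is a simplification under stated assumptions: the `Fiber` and `Root` shapes and the functions `buildWorkInProgress`, `renderInBackground`, and `commitRoot` are hypothetical, pending updates are modeled as a plain name-to-value record, and unchanged subtrees are reused by reference rather than cloned.

```typescript
// Minimal fiber: a name, a rendered value, and children. Real fibers carry much more.
interface Fiber { name: string; value: string; children: Fiber[] }

interface Root { current: Fiber; workInProgress: Fiber | null }

// Build a work-in-progress copy, applying pending updates (fiber name -> new value).
// Subtrees without updates are reused from the current tree instead of being rebuilt.
function buildWorkInProgress(current: Fiber, updates: Record<string, string>): Fiber {
  const children = current.children.map((c) => buildWorkInProgress(c, updates));
  const childrenChanged = children.some((c, i) => c !== current.children[i]);
  const hasUpdate = Object.prototype.hasOwnProperty.call(updates, current.name);
  if (!hasUpdate && !childrenChanged) return current; // nothing changed: reuse as-is
  return {
    name: current.name,
    value: hasUpdate ? updates[current.name] : current.value,
    children,
  };
}

// Render phase: only the work-in-progress tree is touched.
function renderInBackground(root: Root, updates: Record<string, string>): void {
  root.workInProgress = buildWorkInProgress(root.current, updates);
}

// Commit phase: (DOM mutations would be applied here), then the finished tree becomes current.
function commitRoot(root: Root): void {
  if (root.workInProgress === null) return;
  root.current = root.workInProgress;
  root.workInProgress = null;
}
```

Until `commitRoot` runs, `root.current` still describes exactly what is on screen, which is the consistency property the talk is after.
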
All of these types of updates have a higher priority, and they need to be applied synchronously because you need to give feedback to the user fast. But updates from timeouts, API calls, and lazy loads are lower priority, and it's okay if they are delayed a little. So we can defer those types of updates, and they can also be interrupted. React provides different APIs to manually mark an update as deferred or sync, like useDeferredValue, transitions, flushSync, etc.

There is one more pattern which is very important for concurrent rendering: in the end, you want eventual consistency. Let's understand it through an example. Say we have a UI with a table that has many rows and columns, and the cells take some time to render. There is also an input box where you can type to filter the table. While the user is typing, the input update has to happen fast, because you have to give the user feedback that something is being typed. So it has to be done synchronously. But the table update can lag behind. It's fine if the table renders a few milliseconds later. The user may not notice it at all, or may be fine with it.

There are existing patterns for solving this. One of the common ones is debouncing. With debouncing you say: do something after some number of milliseconds. You don't do the work until that time has passed, and if anything happens in between, you push the deadline out again. Concurrent mode brings a new pattern: deferring instead of debouncing. Debouncing is not very good because it just delays the problem. It doesn't actually solve it. You start rendering after, say, 200 milliseconds, but what happens if the user tries to type while that render is running? With deferring, you don't have to wait for 200 milliseconds.
You can start rendering your table as soon as possible. You keep trying to render it, and the render is interrupted whenever new user input arrives. Eventually the table becomes consistent with what the user has typed. So it's fine for it to lag a little behind. On machines with more resources, it will catch up fast. On machines that don't have a lot of resources, it will lag further behind. It adjusts automatically based on the resources available.

The next pattern is transitions, which you must have come across if you follow React. You must have seen this snippet: in a function component, you call useTransition and you get a startTransition method. Transitions are mostly a declarative way of defining a deferred update. It's like saying your application is at state A and you want to move it to state B, and any update between state A and B should be clubbed together as a transition. It shouldn't block your browser or make it unresponsive; it can happen as a deferred update. All the updates happening in a transition are clubbed together. Here, for example, there is a setState on the same component, and onClick might be calling setState in its parent component. All of these updates are clubbed together in the transition and flushed together.

Because transitions can be interrupted by synchronous events, there is a chance that a transition never gets to finish. For that reason, transitions also have timeouts. You can provide a timeout and say that you want your update to be available on the DOM within that many milliseconds at most. When a transition reaches its timeout, it tries to flush its updates synchronously. There can also be multiple transitions within one action. Here, in handleClick, we are calling startTransitionA and startTransitionB, and both transitions can run independently.
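
Going back to the filter-table example, the deferring pattern can be modeled roughly as follows. This is a toy model, not React or Brahmos code: the `DeferredTable` class and its `type` and `step` methods are invented for illustration, and one call to `step` stands in for one slice of deferred rendering work.

```typescript
// Hypothetical model of the filter-table example: the input updates synchronously,
// while the table is rendered row by row and restarts whenever the input changes.
class DeferredTable {
  input = "";               // always reflects the latest keystroke
  rendered: string[] = [];  // rows committed to the "screen"
  renderedFor = "";         // which input the committed rows correspond to
  private rows: string[];
  private draft: string[] = [];
  private draftFor = "";
  private cursor = 0;

  constructor(rows: string[]) {
    this.rows = rows;
  }

  // Sync update: cheap and immediate. Any in-progress table render is now out of date.
  type(value: string): void {
    this.input = value;
  }

  // One slice of deferred work. Returns true once the table has caught up with the input.
  step(): boolean {
    if (this.draftFor !== this.input) { // input changed: discard the draft and start over
      this.draftFor = this.input;
      this.draft = [];
      this.cursor = 0;
    }
    if (this.cursor < this.rows.length) {
      const row = this.rows[this.cursor++];
      if (row.indexOf(this.draftFor) !== -1) this.draft.push(row);
    }
    if (this.cursor === this.rows.length) { // draft finished: commit it
      this.rendered = this.draft;
      this.renderedFor = this.draftFor;
    }
    return this.cursor === this.rows.length && this.renderedFor === this.input;
  }
}
```

Unlike a debounce, nothing here waits for a fixed delay: rendering starts on the first `step` after a keystroke, and a faster machine simply gets through more steps between keystrokes.
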
It's like each one is executing in its own universe, and once they finish, they can merge their changes into the current tree. Also, when you call the same startTransition method multiple times, all the transitions started by that same method are clubbed together, so you don't see inconsistent behavior or state.

While concurrent mode is great and opens up a lot of possibilities, implementing those APIs is not very straightforward. There are a lot of complexities, and in the next part we'll discuss them. This will be mostly about my experience building those concurrent APIs for Brahmos, but most of it relates to React as well.

The first complexity is that your updates can become stale. Because a deferred update happens asynchronously, it can be interrupted by a sync update, and the state produced by the deferred update can become stale. It can be made stale not only by a sync update but also by another deferred update, because everything is happening asynchronously. The bigger issue is that they might be mutating the same reference, which can have side effects as well. To solve the staleness problem, in Brahmos I keep a timestamp on those updates, and whenever I'm in the render phase, I check whether an update is older than the latest update on the current tree. If it is newer, I go ahead with it; otherwise I discard the old update. The mutation problem mostly happens with class components, where the state is stored on the component instance. Instead of applying the state on setState calls, we create an update queue where all updates are stored, and those updates are applied lazily during the render phase. That way the mutation only happens on the work-in-progress tree and doesn't impact your current tree references.

The other problem is update starvation.
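
As an illustration of the stale-update handling described above (timestamps plus a lazily applied update queue), here is one possible reading of it. This is an interpretation, not the Brahmos source: the `UpdateQueue` class, its method names, and the rule "anything not newer than the last commit is stale" are assumptions made for the sketch.

```typescript
type State = { [key: string]: unknown };

interface Update { partial: State; timestamp: number; deferred: boolean }

class UpdateQueue {
  committed: State;      // state the current tree was rendered with
  committedAt = 0;       // timestamp of the newest update already committed
  pending: Update[] = []; // updates recorded by setState but not applied yet

  constructor(initial: State) {
    this.committed = initial;
  }

  // setState only records the update; the committed state object is never mutated.
  enqueue(update: Update): void {
    this.pending.push(update);
  }

  // Render phase: derive the next state lazily, skipping stale updates.
  render(includeDeferred: boolean): { state: State; appliedAt: number } {
    let state: State = { ...this.committed };
    let appliedAt = this.committedAt;
    for (const u of this.pending) {
      if (u.timestamp <= this.committedAt) continue;  // stale: older than what is on screen
      if (u.deferred && !includeDeferred) continue;   // left for the deferred render
      state = { ...state, ...u.partial };
      appliedAt = Math.max(appliedAt, u.timestamp);
    }
    return { state, appliedAt };
  }

  // Commit phase: adopt the rendered state and drop everything that is now stale.
  commit(result: { state: State; appliedAt: number }): void {
    this.committed = result.state;
    this.committedAt = result.appliedAt;
    this.pending = this.pending.filter((u) => u.timestamp > this.committedAt);
  }
}
```

For example, if a deferred update with timestamp 1 is still pending when a sync update with timestamp 2 is rendered and committed, the deferred update is dropped instead of overwriting the newer state.
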
Your low-priority updates can be eaten up by high-priority updates. A low-priority update can starve because a lot of high-priority updates are happening very frequently and never allow it to complete. To fix this, I added a retry count on low-priority updates, and with more retries, deferred updates are given more time, so that they have more room to process the fibers in the work-in-progress tree. For transitions, having a timeout also helps: if a transition keeps getting eaten up by high-priority updates, after the timeout it becomes a synchronous update.

The most important piece is to reuse the work we have already done as much as possible. Because everything is happening asynchronously, there is a good chance that we might throw away work that is already done. To solve that, in Brahmos I do all the synchronous updates directly in the current tree, because they have to be flushed synchronously, and reserve the work-in-progress tree only for deferred updates. Here is what that gives us. Say a deferred update is being processed in the work-in-progress tree, and at that moment a sync update is triggered on the current tree. You finish that sync update and come back to the work-in-progress tree. There, you can resume from where you left off, because you don't have to throw anything away. You just have to check whether the updates you have made have become stale. If they aren't stale, you can keep those fibers. If they are stale, you clone the fibers again from the current tree.

The other interesting problem, and it was kind of a nightmare for me, was scheduling.
The problem with scheduling is this: if you yield to the browser too rarely, your code will finish faster, but you trade off the responsiveness of the browser because you eat up some frames. If you yield to the browser too often, the browser stays responsive, but your execution takes more time. So how do you know when to yield to the browser? And once you have yielded, how do you know that the browser is done with its work, so that you can schedule the next unit of work accordingly? Browsers are working on APIs like isInputPending and the Scheduler API, which can solve this problem. But until those are available, I had to come up with my own solution.

What I came up with is the concept of slots. I created a five-millisecond slot, and in that slot we process as many fibers as we can. After every slot, we yield to the browser and schedule the next slot with MessageChannel. MessageChannel is good in that it doesn't have a cool-off time. Other scheduling techniques, like setTimeout and requestIdleCallback, have bigger delays between two slots. But MessageChannel can also eat into the time for browser tasks. So whenever we are close to the end of the frame, instead of scheduling through MessageChannel, I schedule through both requestIdleCallback and setTimeout, take whichever responds first, and ignore the other one.

The slots are not constant. The size of a slot can increase if a deferred update has been retried multiple times. The slot size can also shrink: if your slots have already taken a lot of time in a frame and there is little time remaining in that frame, the slot can become smaller to accommodate that.
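
A sketch of how such a slot could be sized and run is shown below. Only the 5 ms base slot and the roughly 16 ms frame come from the talk; the linear growth per retry, the 1 ms floor, and the function names `slotBudget` and `runSlot` are assumptions chosen to keep the example small, and the clock is injected so the logic does not depend on any particular timer API.

```typescript
const FRAME_MS = 16;     // ~60 FPS frame, as in the talk
const BASE_SLOT_MS = 5;  // base slot size, as in the talk
const MIN_SLOT_MS = 1;   // assumption: floor so that a slot always makes some progress

// Hypothetical sizing rule: grow the slot with each retry of a starved deferred update,
// and shrink it when little time is left in the current frame.
function slotBudget(retries: number, usedInFrameMs: number): number {
  const grown = BASE_SLOT_MS * (1 + retries); // assumption: linear growth per retry
  if (retries > 0) return grown;              // assumption: retried work is not clamped to the frame
  const remaining = FRAME_MS - usedInFrameMs;
  return Math.max(MIN_SLOT_MS, Math.min(grown, remaining));
}

// Process as many fibers as fit in one slot. Returns how many were processed,
// so the caller can yield to the browser and schedule the rest in the next slot.
function runSlot(fibers: Array<() => void>, budgetMs: number, now: () => number): number {
  const deadline = now() + budgetMs;
  let done = 0;
  while (done < fibers.length && now() < deadline) {
    fibers[done]();
    done++;
  }
  return done;
}
```

In a browser, `now` would typically be a high-resolution clock and the caller would schedule the next `runSlot` call after yielding. Both are left out here on purpose.
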
By the way, a slot can even become bigger than one frame; it can span two frames. The idea is that we can dynamically tune the FPS based on the execution speed and the resources available to the user.

The next interesting problem is transitions. Transitions are good, but handling multiple transitions is a nightmare. Because they are async in nature, there can be multiple transitions in a queue, and because they are async, they can have race conditions as well, and updates in one transition can be made stale by another transition. Transitions can also be nested, so you can call startTransition inside a startTransition, and you can compose them, with multiple startTransition calls wrapped in another startTransition. You can do all sorts of nasty stuff with transitions. Two transitions can also try to update the same state, and then which one should take precedence?

In Brahmos, I took a simpler approach to solve these. React right now merges all the transitions together, but the long-term plan is to run transitions individually in different lanes. In Brahmos, if two transitions come from different startTransition methods, I always run them independently. If multiple transitions start from the same startTransition method, I always merge them. That could be because you are calling the same startTransition multiple times, or because you pressed a button, the previous press has not been processed completely, and now you have pressed it again. The updates of the previous transition and the new transition get merged.
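
The merge rule just described can be sketched like this. It is an illustrative model only: `TransitionRegistry`, `start`, and `flush` are hypothetical names, and the `source` object stands in for the identity of a particular startTransition method.

```typescript
type Updater = () => void;

// A pending transition: all updates that will be flushed together.
interface Transition { source: object; updates: Updater[] }

class TransitionRegistry {
  pending: Transition[] = [];

  // `source` identifies which startTransition method was called (hypothetical key).
  start(source: object, update: Updater): void {
    for (const t of this.pending) {
      if (t.source === source) { // same startTransition method: merge into it
        t.updates.push(update);
        return;
      }
    }
    this.pending.push({ source, updates: [update] }); // different method: independent transition
  }

  // Flush one transition: run all of its updates together and remove it.
  flush(source: object): void {
    const remaining: Transition[] = [];
    for (const t of this.pending) {
      if (t.source === source) t.updates.forEach((u) => u());
      else remaining.push(t);
    }
    this.pending = remaining;
  }
}
```

Pressing the same button twice before the first transition finishes maps to two `start` calls with the same `source`, so both sets of updates end up in one transition.
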
Different transitions are kept in a transition queue and processed one by one, instead of creating multiple work-in-progress trees and trying to process every transition in its own tree. We keep it simple by processing one transition at a time. Timed-out transitions can move themselves from the transition queue to the sync updates; that is handled by timeouts. For nested transitions, I simply made the innermost transition the one that is respected: the parent transition only keeps the state updates that are its direct children.

There are a lot more complexities, and I could have explained more of them, but we have limited time. I would strongly recommend going through the React 18 Working Group discussions. Those are gold mines. Also look at the older documentation on concurrent mode. You will get a lot of ideas. And after going through those documents and discussions, you will start appreciating the effort of the React team. They are trying to solve one important problem, which is to give the user a better perceived experience. And in the grand scheme of things, that is the only thing that matters. On that note, I would like to thank you all. Thanks.
30 min
25 Oct, 2021
