The Core of Turbopack Explained (Live Coding)

29 min
01 Jun, 2023

AI Generated Video Summary

Tobias Koppers introduces TurboPack and TurboEngine, addressing the limitations of Webpack. He demonstrates live coding to showcase the optimization of cache validation and build efficiency. The talk covers adding logging and memoization, optimizing execution and tracking dependencies, implementing invalidation and a watcher, and storing and deleting invalidators. It also discusses incremental compilation, integration with other monorepo tools, error display, and the possibility of a plugin system for TurboPack. Lastly, a comparison with Bun's bundler is mentioned.

1. Introduction to TurboPack and TurboEngine

Short description:

I'm Tobias Koppers, the creator of TurboPack and TurboEngine. I'm here to demonstrate live coding in JavaScript, focusing on the core of TurboEngine. The motivation behind TurboPack is to address the limitations of Webpack in handling large applications and incremental builds. TurboPack introduces a new architecture to optimize cache validation and improve build efficiency. I will showcase a simple application that copies JavaScript files based on the dependency graph, with the addition of a copyright header. Through live coding, I will explain the process and demonstrate how TurboEngine enhances incremental builds. Let's get started!

Thanks for having me. I'm trying something new: I'm trying to do live coding today, so I hope it works out. My name is Tobias Koppers. I worked on Webpack for 10 years, and I have now joined Vercel and work on TurboPack, trying to do something new, something better. As I said, I'm trying to focus on one aspect of TurboPack and explain in a bit of detail how the core of TurboEngine works. To demo that, I'm going to live code a little bit of the core of TurboEngine in JavaScript.

The motivation is that with Webpack we saw applications grow larger and larger, and the architecture of Webpack is not built for that: incremental builds tend to get slower and slower as the application grows. It's not that huge of a problem yet, but it might become a problem in a few years when applications have millions of modules or whatever. A few problems we isolated were that we do a lot of cache lookups, and we have to compute a lot of stuff to do cache validation, like checking if files are still the same and hashing stuff. That is a problem because you pay all this overhead for every incremental build. We wanted a new architecture to tackle this problem, and that's why we created TurboEngine and TurboPack. I can explain it a little bit by doing live coding.

What I want to show is a small application which is super simple. It's not a bundler but something similar to a bundler: it takes any JavaScript application and just copies it over to another folder by following the dependency graph, and while doing that it also adds a copyright header, just to demo something. I start with the basic application written in JavaScript and explain it, and then I try to add something similar to TurboEngine to make it more efficient, to make incremental builds possible in a similar way to how it works in TurboPack, in Rust, with TurboEngine. For that I prepared this little application. It's really simple, it's just a bunch of Node.js. We use Acorn to parse the modules and get the dependency graph.

I'll go through the application a little bit so you understand it. The main process is really simple: we get the base directory, like the source directory, we have an output directory, and then we have an entry file, which is actually this file we're looking at. So we're actually copying the application itself to another folder. Then we start following the dependency graph from that entry point and copy it from the base dir to the output dir. To make it a little bit more complicated, I add this header file, which is, let me show it, a copyright statement that should be added to every file, to make it a little bit more interesting.

Then we invoke this function, copy graph, which computes the output path for the current file by just relocating it and calls the copy function, which copies the file, super simple. Then it calls another function, get references, which we see later; it gets all the files that have been imported from one file. Copy graph loops over those and calls itself recursively to copy the whole application.

Copy is also pretty simple: read the header, read the file, and write it to another file. Nothing super complicated here. Get references is a little bit more complicated, but you don't really have to understand it. It calls parse to get an AST out of the module, and then does some magic to extract the import statements and return a list of all files referenced by that file. Parse is also pretty simple: it reads the file, obviously, and calls Acorn, which is a JavaScript parsing library, and then it returns the AST. After that I start the whole thing, and that should copy the application to the new folder. So let's try it. Oh, a few things I want to explain first. I also have this task function, which is actually doing nothing currently.
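The copy-graph walk described above can be sketched roughly as follows. This is not the code from the talk: to keep it self-contained, a hypothetical in-memory `files` map stands in for the disk, and a small regex stands in for the Acorn-based parse and import extraction. The function names follow the ones mentioned in the talk; their exact signatures are assumptions.

```javascript
// Hypothetical in-memory "disk": path -> source text (stands in for real files).
const files = new Map([
  ["src/header.txt", "// Copyright (c) Example\n"],
  ["src/index.js", 'import "./a.js";\nimport "./b.js";\n'],
  ["src/a.js", 'import { b } from "./b.js";\nexport const a = b;\n'],
  ["src/b.js", "export const b = 2;\n"],
]);

const read = (path) => files.get(path);
const write = (path, content) => files.set(path, content);

// Stand-in for parse + getReferences: the talk uses Acorn to build an AST;
// this regex only understands relative `import "./x.js"` / `from "./x.js"`.
function getReferences(file) {
  const dir = file.slice(0, file.lastIndexOf("/") + 1);
  const pattern = /(?:import|from)\s+["']\.\/([^"']+)["']/g;
  return [...read(file).matchAll(pattern)].map((match) => dir + match[1]);
}

// Read the header, read the file, write both to the output location.
function copy(file, output, header) {
  write(output, read(header) + read(file));
}

// Copy one file, then recurse into everything it imports.
// Note: without a cache this repeats work for shared modules (b.js is copied
// twice here) and would loop forever on circular imports.
function copyGraph(baseDir, outDir, file, header) {
  const output = outDir + file.slice(baseDir.length);
  copy(file, output, header);
  for (const reference of getReferences(file)) {
    copyGraph(baseDir, outDir, reference, header);
  }
}

copyGraph("src", "out", "src/index.js", "src/header.txt");
console.log([...files.keys()].filter((path) => path.startsWith("out/")));
```

The duplicate copy of `b.js` is the kind of repeated work the following sections remove.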

2. Adding Logging and Memoization

Short description:

So it's basically only adding some logging so you can actually see what the application is doing. We add more logic to that later. The first step is to add some kind of memoization system, which is like a cache. We store the cache somewhere using a map. Now we should have this kind of memoization; it's pretty simple, actually.

So it's basically only adding some logging so you can actually see what the application is doing. Otherwise it just prints nothing, and that's pretty boring. So it only has logging, and what I do is basically call the function with logging. You see it's pretty straightforward; it's not doing anything special.

We add more logic to that later. What you end up seeing is this whole application running: it's calling main, calling copy graph, calling copy, and calling all these functions in a kind of tree-like manner. This is basically a stack trace.
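A logging-only task wrapper of the kind described here might look like the sketch below. The `task(name, fn, ...args)` call shape is an assumption based on the later remark that name, function, and args are stored per task; `read`, `copy`, and `main` are simplified stand-ins for the functions in the demo.

```javascript
let depth = 0; // current nesting level, used only for indentation
const lines = []; // collected log lines, kept so the trace can be inspected

// Calls fn(...args) and logs the call, indented by how deeply it is nested.
function task(name, fn, ...args) {
  const line = `${"  ".repeat(depth)}${name}(${args.join(", ")})`;
  lines.push(line);
  console.log(line);
  depth++;
  try {
    return fn(...args);
  } finally {
    depth--;
  }
}

// Simplified stand-ins for the demo functions.
const read = (file) => `contents of ${file}\n`;
const copy = (file) => task("read", read, "header.txt") + task("read", read, file);
const main = () => ["a.js", "b.js"].map((file) => task("copy", copy, file));

task("main", main);
```

Running it prints an indented call tree in which `read(header.txt)` shows up once per copied file, which is the duplicate work pointed out next.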

But you also see a lot of problems with this application. For example, we're reading the header a bunch of times, like here and here and here. And we're also calling copy graph multiple times for the same file, for example for the task file, because it's referenced from more than one place. So we're doing a lot of duplicate work, and we don't want to do duplicate work.

The first step is to add some kind of memoization system, which is like a cache: if you execute the same function twice, we just return the existing result. So let's add that. To add a cache, we have to store the cache somewhere, and in JavaScript we can just use a map for that. As a first step, we want to get the task from the map. Actually, we want to look it up by the function and all its arguments, because you can call the same function with different arguments, which is basically a different task. Then, if we don't have a task, we can just create one, which means we create a new object that has some result, which is undefined for now, and then we set the result, which is basically what we were doing before, so copy that one here. And then, in any case, we return the result. So now we should have some kind of memoization system.

I missed some stuff, so I actually have to set the cache, yes, like this. And there's a bug, you probably see it if you're a JavaScript developer: the map doesn't work with arrays as keys because they are compared by identity. What we actually need to do is key it by the values in the array, so for that I prepared something which is like a TupleMap, which I need to import. Copilot, don't do it wrong. And now we should have this kind of memoization. It's pretty simple, actually.
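The talk does not show how the prepared TupleMap is implemented, so the version below is only one possible sketch: a small trie of nested `Map`s, so that two different arrays with the same elements reach the same entry. The `task(name, fn, ...args)` shape is the same assumption as before.

```javascript
// Hypothetical TupleMap: looks up values by the elements of an array,
// not by the identity of the array object.
class TupleMap {
  constructor() {
    this.root = { children: new Map(), value: undefined };
  }
  get(tuple) {
    let node = this.root;
    for (const part of tuple) {
      node = node.children.get(part);
      if (!node) return undefined;
    }
    return node.value;
  }
  set(tuple, value) {
    let node = this.root;
    for (const part of tuple) {
      let next = node.children.get(part);
      if (!next) {
        next = { children: new Map(), value: undefined };
        node.children.set(part, next);
      }
      node = next;
    }
    node.value = value;
  }
}

const cache = new TupleMap();

// Memoized task: the same function with the same arguments runs only once.
function task(name, fn, ...args) {
  let t = cache.get([fn, ...args]);
  if (!t) {
    t = { result: undefined };
    cache.set([fn, ...args], t);
    t.result = fn(...args);
  }
  return t.result;
}

let reads = 0;
const read = (file) => {
  reads++;
  return `contents of ${file}`;
};

task("read", read, "header.txt");
task("read", read, "header.txt"); // served from the cache
task("read", read, "a.js"); // different argument, so a different task
console.log(reads); // 2

// The bug from the talk: a plain Map compares array keys by identity.
const plain = new Map();
plain.set([read, "header.txt"], "cached");
console.log(plain.get([read, "header.txt"])); // undefined
```

Keying on the function object plus its arguments means two calls count as the same task only when both match, which is the behavior the speaker describes.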

3. Optimizing Execution and Tracking Dependencies

Short description:

Calling a function multiple times is inefficient, so we optimize it so that the header is only read once. For incremental execution, we take a bottom-up approach, computing affected functions from changed files. To track dependencies, we use a graph to store connected tasks. By setting a current task, we can invalidate dependent tasks and track execution. The invalidate function is used to invalidate and recompute tasks, along with their dependents.

And with that, calling it again, you never execute a function twice. So the header is only read once, and if you copy something again it's not reading the header again, and you see it's not invoking copy graph of task multiple times. So that's kind of optimized, and we need the same thing later for our incremental builds.

So the next step is to add some kind of incremental execution system. What we don't want to do is just start main again and look everything up from the cache, because that's what we've done with Webpack, and the overhead of looking everything up in the cache is real and is a problem. So we want a new approach, and the approach is to do it the opposite way: not calling functions from top to bottom, but going from bottom to top. When a file changes, we want to figure out which functions are affected and recompute them. So we want to start from the changed files, from the file reads, invalidate those, and then bubble up through the whole system to get it into the right state again.

To do that, we actually need some kind of graph that stores which tasks are connected to other tasks. We want to know that when we invalidate, say, a read call, we have to invalidate our copy function or something like that. So we want to store something like dependent tasks, and maybe a set would do it: a set of tasks which depend on that task. And we want to fill it, so we add the current task to the task's dependent tasks. The problem is we don't know the current task. And there might not be a current task, for the main application, so only if we have a current task do we add it to the dependent tasks.

Then we want to track what the current task is. That's actually not super complicated. We can have a variable which is the current task, and then have a function that wraps our calls with some kind of state where this current task is set. So we have this with current task function, where you call a function with the current task set to something. Maybe Copilot can do that. Yeah, it can do it. So what we want to do is just set it before calling the function and restore it to the old value afterwards. And then we use this new function to wrap our call in with current task. So now we know during execution that we are inside of that task, and we can track it in all its children. So now we have this tree of executions, which we see in the output, stored in our task structure.

So how do we use that? We want to do this kind of invalidation and then bubble up to the dependent tasks. So we want to have some function called invalidate, which invalidates a task, recomputes it, and also invalidates the dependent tasks. First, you want to log that, because it's nice to see what's happening.
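The dependency tracking described above could be sketched like this. For brevity the cache is keyed by a JSON string of the task name and arguments instead of a TupleMap, which assumes task names are unique per function; `withCurrentTask` and `dependentTasks` are names guessed from the spoken description.

```javascript
const cache = new Map(); // key: JSON of [name, ...args] (simplification, see above)
let currentTask; // the task whose function is executing right now, if any

// Runs fn with currentTask set to t, restoring the previous value afterwards.
function withCurrentTask(t, fn) {
  const previous = currentTask;
  currentTask = t;
  try {
    return fn();
  } finally {
    currentTask = previous;
  }
}

function task(name, fn, ...args) {
  const key = JSON.stringify([name, ...args]);
  let t = cache.get(key);
  if (!t) {
    t = { name, fn, args, result: undefined, dependentTasks: new Set() };
    cache.set(key, t);
    t.result = withCurrentTask(t, () => fn(...args));
  }
  // Whoever is calling us depends on our result, cached or not.
  // At the top level there is no current task, so nothing is recorded.
  if (currentTask) t.dependentTasks.add(currentTask);
  return t.result;
}

// Simplified stand-ins for the demo functions.
const read = (file) => `contents of ${file}`;
const copy = (file) => task("read", read, "header.txt") + task("read", read, file);
const parse = (file) => task("read", read, file).length;
const main = () => {
  task("copy", copy, "a.js");
  task("parse", parse, "a.js");
};

task("main", main);

const readA = cache.get(JSON.stringify(["read", "a.js"]));
console.log([...readA.dependentTasks].map((t) => t.name)); // [ 'copy', 'parse' ]
```

The edges point from a task to its callers, which is the direction needed to walk upward from a changed file.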

4. Implementing Invalidation and Watcher

Short description:

We need to store the names and other information for invalidation. The first step is to recompute the task, followed by invalidating the dependencies. After recomputing the tasks, we need to invalidate all dependent tasks. To implement this, we add functions for getting and adding invalidators. We can use the invalidation system in our filesystem implementation. We get the invalidator in a specific function and call it when needed. We use a watcher to detect changes in the directory. Finally, we store the invalidators in a list.

That's just for printing out what's happening so you can see it. For that we need to store the names. So we have to store some stuff on the task, like name, function, and args, so we can access it during invalidation.

And then, the first step is to recompute the task, and the second step is to invalidate the dependents. You know, Copilot already knows what we want to do. Okay. So what we want to do is just execute the task again by doing that. Actually, it's not correct: we want to execute the task again and store the new result. So we want to wrap it in with current task again, but then just invoke the task's function with the task's arguments. So we recompute it, and after recomputing the task, the result might have changed, or probably has changed, hopefully. And then we want to go through all dependent tasks and also invalidate them so they can recompute too. So if we have dependent tasks, we loop over them and invalidate them.

This is not called anywhere yet, so we want to call it, for example, from a watcher. So we add a function to this kind of thing, getInvalidator, which returns the invalidation function for the current task. If there is no current task, we want to throw, but in the end we want to return a function that allows you to invalidate the current task. And this function can be used in our file system implementation, which currently looks like that: if you read something, we call read file; if you write something, we call write file. Yeah, it's pretty simple.
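A sketch of the two-step invalidation and of `getInvalidator`, under the same assumptions as before (string cache keys, guessed helper names). The `disk` and `invalidators` maps are hypothetical stand-ins for the real file system and for whatever the watcher will later use to find the right invalidator.

```javascript
const cache = new Map();
const trace = []; // order in which tasks get invalidated
let currentTask;

function withCurrentTask(t, fn) {
  const previous = currentTask;
  currentTask = t;
  try {
    return fn();
  } finally {
    currentTask = previous;
  }
}

function task(name, fn, ...args) {
  const key = JSON.stringify([name, ...args]);
  let t = cache.get(key);
  if (!t) {
    t = { name, fn, args, result: undefined, dependentTasks: new Set() };
    cache.set(key, t);
    t.result = withCurrentTask(t, () => fn(...args));
  }
  if (currentTask) t.dependentTasks.add(currentTask);
  return t.result;
}

function invalidate(t) {
  trace.push(`invalidate ${t.name}`);
  // Step 1: recompute the task and store the new result.
  t.result = withCurrentTask(t, () => t.fn(...t.args));
  // Step 2: everything that used the old result has to recompute as well.
  for (const dependent of [...t.dependentTasks]) invalidate(dependent);
}

// Returns a function that invalidates the task that is currently executing.
function getInvalidator() {
  if (!currentTask) throw new Error("getInvalidator() called outside of a task");
  const t = currentTask;
  return () => invalidate(t);
}

// Hypothetical in-memory disk; read registers its invalidator per file.
const disk = new Map([["a.js", "v1"]]);
const invalidators = new Map();
const read = (file) => {
  invalidators.set(file, getInvalidator());
  return disk.get(file);
};
const copy = (file) => "// header\n" + task("read", read, file);
const main = () => task("copy", copy, "a.js");

task("main", main);
disk.set("a.js", "v2"); // simulate an edit
invalidators.get("a.js")(); // what a watcher would call
console.log(trace); // [ 'invalidate read', 'invalidate copy', 'invalidate main' ]
```

Because dependents call `task(...)` again while recomputing, they pick up the refreshed cached result of the task below them instead of re-running it.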

So how do we make use of our invalidation system? We can get the invalidator in this read function. Now we have that, but when do we call it? We need some kind of watcher. Luckily there is a watcher implementation in Node.js. For simplicity, we just watch the whole directory recursively. And in some edge cases there is no file name, due to Node.js. So now we want to invalidate all read calls that are affected by the file that has changed. What we want to do is store our invalidator somewhere, so we have some kind of list of invalidators.

5. Storing and Deleting Invalidators

Short description:

And then store the invalidator: invalidators.set. Store it in this kind of map. So we have access to all invalidators from all read calls, so we can just call them. But because one file changed, it's like attributes changed, file added, multiple events, and eventually it would trigger multiple times. We want to debounce it a little bit to make it readable in the output: set a timeout and call the invalidator. We also want to log something, so maybe something like file changed. We actually need to delete the invalidator, so remove it, because once it's invalidated, invalidating the function multiple times doesn't make sense. After invalidation, we call it again, and then we have a new invalidator which is in the map again.

And then store the invalidator: invalidators.set. Store it in this kind of map. So we have access to all invalidators from all read calls, so we can just call them. And that's what we actually want to do. So we get the invalidator from the invalidators, via the file name. Actually that's wrong, because the file name is relative, so make it absolute and get it from the list. And if we have an invalidator, we can call it.

But that leads to a problem: because one file changed, it's like attributes changed, file added, multiple events, so eventually it would trigger multiple times. So we want to debounce it a little bit to make it readable in the output. Set a timeout, and call the invalidator. Maybe make it 100 milliseconds. And yeah. We also want to log something, so maybe something like file changed, so we actually see what's happening. Yeah. Oops. Cool. It kind of works, but there's a bug.

So we actually need to delete the invalidator, so remove it, because once it's invalidated, invalidating the function multiple times doesn't make sense. After invalidation, we call the function again, and then we have a new invalidator which is in the map again.
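The watcher wiring from this section might be sketched as below. `fs.watch` with `{ recursive: true }` and `path.resolve` are real Node.js APIs; everything else (`registerInvalidator`, `onFileChanged`, the per-file timer map) is a hypothetical structure, and the per-file debounce is one reading of the "set timeout, 100 milliseconds" step rather than the exact code from the talk. Recursive watching is not available on every platform and Node.js version, so treat `watch` as an illustration.

```javascript
const fs = require("node:fs");
const path = require("node:path");

const invalidators = new Map(); // absolute file path -> invalidator function
const pending = new Map(); // absolute file path -> debounce timer

// Intended to be called by the read task, with the result of getInvalidator().
function registerInvalidator(file, invalidator) {
  invalidators.set(path.resolve(file), invalidator);
}

function onFileChanged(baseDir, filename) {
  if (!filename) return; // fs.watch does not always report a file name
  // The reported name is relative to the watched directory: make it absolute.
  const absolute = path.resolve(baseDir, filename);
  // One save can fire several events, so wait 100 ms for them to settle.
  clearTimeout(pending.get(absolute));
  pending.set(
    absolute,
    setTimeout(() => {
      pending.delete(absolute);
      const invalidator = invalidators.get(absolute);
      if (!invalidator) return;
      // One-shot: recomputing the read task registers a fresh invalidator.
      invalidators.delete(absolute);
      console.log(`file changed: ${absolute}`);
      invalidator();
    }, 100)
  );
}

// Watches the whole directory tree and routes events to onFileChanged.
function watch(baseDir) {
  return fs.watch(baseDir, { recursive: true }, (eventType, filename) =>
    onFileChanged(baseDir, filename)
  );
}
```

With this in place, calling `watch("src")` after the initial build would keep the process alive and trigger the bottom-up invalidation whenever a watched file is saved.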

6. Invalidation and Bubbling

Short description:

So we only want to invalidate it once. We compare the old result with the new result and only invalidate the parents if something has really changed. The new implementation now only invalidates the read function when saving the file without changes. If a task was recomputed but its result is the same, we print unchanged and stop the bubbling.

So we only want to invalidate it once. So yeah, that should work. Well, hopefully. Let's try it and see if I missed something.

Okay, we need to import the getInvalidator function. So it works. The initial build is basically the same. We see all the stuff we printed before, but now it basically doesn't exit, because we're watching something. So technically, if we change the file, it should recompute the stuff and get to the same state again. So maybe I can just save this file. Okay, invalidator. Okay, try again. Saving the file. Oh, now it works.

Okay, now we see it's invalidating the read function and recomputing the read function, and then it bubbles up the graph of execution. Copy is calling read, so copy is also recomputed, and copy is called by copy graph, and it bubbles up until the main function. There are some problems, like it bubbles up multiple times: copy graph is repeated here because read is used in parse and in copy. And it's also pretty useless, if I just save the file, to recompute everything, because I didn't change anything in the file. Why do we need to recompute copy graph if the references didn't change at all?

So what we want to add is some kind of stopgap, so that we can stop the bubbling if the result is the same. So the clock is wrong. Okay, so I have two minutes, so I add this. I want to only invalidate if something has really changed, so I can add some simple code to make this work. We want to compare the old result with the new result. So we check if the old result is not equal to the new one. You shouldn't do that with stringify, but it works for now. So only if something has really changed do we want to invalidate the parents.

So now I'll try this again with the new implementation, and if I only save this file, it now only invalidates the read function. Maybe we add some kind of logging, so we see when nothing changed. Perfect. So if a task was recomputed but its result didn't change, then we can just print unchanged and stop the bubbling. If read is recomputed but nothing changed, we stop there. But if we really change something, like add a comment, we actually have to bubble up more: read bubbles up, we copy the file again and parse the file again because the AST changed, and we recompute get references, but because the references of the file didn't change, we don't have to recompute anything beyond that.
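The early cutoff can be sketched by adding a comparison to `invalidate`. As in the talk, `JSON.stringify` is used as a quick equality check, which the speaker notes is not something to rely on in real code. The cache keying and the regex-based `getReferences` are the same simplifying assumptions used in the earlier sketches.

```javascript
const cache = new Map();
const trace = [];
let currentTask;

function withCurrentTask(t, fn) {
  const previous = currentTask;
  currentTask = t;
  try {
    return fn();
  } finally {
    currentTask = previous;
  }
}

const keyOf = (name, args) => JSON.stringify([name, ...args]);

function task(name, fn, ...args) {
  let t = cache.get(keyOf(name, args));
  if (!t) {
    t = { name, fn, args, result: undefined, dependentTasks: new Set() };
    cache.set(keyOf(name, args), t);
    t.result = withCurrentTask(t, () => fn(...args));
  }
  if (currentTask) t.dependentTasks.add(currentTask);
  return t.result;
}

function invalidate(t) {
  const oldResult = t.result;
  t.result = withCurrentTask(t, () => t.fn(...t.args));
  // Quick-and-dirty comparison, as in the talk; not robust for all values.
  if (JSON.stringify(oldResult) === JSON.stringify(t.result)) {
    trace.push(`unchanged ${t.name}`);
    return; // same result: nothing above this task needs to recompute
  }
  trace.push(`changed ${t.name}`);
  for (const dependent of [...t.dependentTasks]) invalidate(dependent);
}

// Hypothetical in-memory disk and simplified demo functions.
const disk = new Map([["a.js", 'import "./b.js";\n']]);
const read = (file) => disk.get(file);
const getReferences = (file) =>
  [...task("read", read, file).matchAll(/import "\.\/([^"]+)"/g)].map((m) => m[1]);
const copyGraph = (file) => [file, ...task("getReferences", getReferences, file)];

task("copyGraph", copyGraph, "a.js");
const readTask = cache.get(keyOf("read", ["a.js"]));

invalidate(readTask); // file saved without changes
disk.set("a.js", '// a comment\nimport "./b.js";\n');
invalidate(readTask); // content changed, imports did not
console.log(trace); // [ 'unchanged read', 'changed read', 'unchanged getReferences' ]
```

In both cases `copyGraph` is never recomputed, so the amount of work follows the size of the change rather than the size of the graph.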

7. Incremental Compilation and Granular Changes

Short description:

When we reorder the imports, the system can follow the graph and only recompute what's needed. A larger example with 10,000 files demonstrates the system's ability to make granular changes to files. Incremental compilation in TurboEngine recomputes only affected files, making the system independent of application size.

In the case when we do something like reorder the imports, then we also bubble up and get the references and go further down the tree. So the whole system is really great because then we can go from the watch event and only invalidate like certain pieces of our application, independent of how large the application is, so we can just follow the graph and only recompute what's needed.

So let's try it with a larger example. I have prepared a directory here with a lot of files, 10,000 files, and they are basically all imported from this one file. It takes a while to compile that initially because we have to, oh wait, stop, I wanted to disable the logging for that, otherwise it prints millions of lines of output. It takes 10 seconds to compile that, but afterwards, once it's done, we can use the same system to make these granular changes to our files.

So if I only change this small file here and save it, it will do the same as before: it will be really fast in reading the file and seeing it's the same. And if I change the index file, it's a little slower, because parsing the file already takes a second or so. So if we add something here, it takes a little bit longer, but it doesn't have to copy all the files, because they are basically the same. And if I add some more modules to the file here, then it will recompute that and copy the files that are new, and that kind of stuff. That's the system of incremental compilation in TurboEngine. It's a little simplified here, and it's a little bit more involved in practice, but I think you get the idea of how you can tackle the problem from the bottom up, recompute only the files that are affected, and make the system independent of application size, depending only on the size of the change.

8. Integrating with Other Monorepo Tools

Short description:

Thank you for your interest in TurboPack's integration with other monorepo tools. TurboPack is designed to be integrable with every tool, offering command line functionality and integration tools for the cache. We are particularly focused on deep integration with TurboRepo, our tool, to share caches and create a more integrated experience. Similar to our integration with Next.js, starting with a close collaboration allows us to build a proof of concept and ensure TurboPack's usability for other tools. Working with a familiar team provides easier communication and understanding of tool requirements.

Thank you. Seems like I got it in people's minds! What about integrating with other monorepo tools like NX, other than Turbo? TurboPack is integrable in every tool; you can call it as a command-line thing, and we will probably offer some kind of integration tools for the cache. But we also want to work a bit deeper with TurboRepo, because that's our tool, and our idea is to be able to share the caches between TurboRepo and TurboPack, so it's more integrated. But I think we'll offer the same kind of integration abilities for other monorepo tools. I think that makes sense, like you go with something you're close to, and you can work together to build this kind of proof of concept. Yeah, it's similar to what we did with Next. We built TurboPack with deep integration with Next.js, because that's where we can work together and get to a state where we know everything that's needed for TurboPack, and then release TurboPack as a standalone thing, so it's usable for other things too. But going initially with something you know, where you can work with the team, is much easier than communicating with external tools where you don't know what's needed for the tool. Yeah, you can't demand things the same way. You can't be like, hey, we really need to push this. It's a lot harder to negotiate that. Yeah. All right, great.

9. Error Display and TurboPack Reliability

Short description:

How do you display error message on the same line where it was thrown in your IDE? When can I use TurboPack reliably? Our priority is getting Next to work. We want to make TurboPack a standalone tool, simpler and easier to use. We don't support Webpack plugins as they are currently.

Then next one. How do you display error message on the same line where it was thrown in your IDE? What? Yes, that's a plug-in, I think. Not sure which one, but one of these. Whoever asked this, you can find Tobias later and you know, state your case.

So all right. When can I use TurboPack reliably without... Yeah. Yeah. As I said, our priority is getting Next to work, which should be soon. Actually, it's already working for a lot of applications, but we still have to figure out some edge cases and more of the modern stuff of Next.js. And then we want... It's our goal to make it a standalone tool, because we want to have a tool similar to Webpack that you can use standalone. But there's a lot of stuff to figure out, like how we take configuration and what the API is. And we don't want to make the same mistakes with configuration as Webpack, so we want to make it simpler, easier to use. So we need to think about that. And technically, there is a standalone version: it's the TurboPack CLI. You can compile it yourself from our repo, but it's not really... It has no configuration. It's just... It doesn't do a lot yet. It's not something we make public yet, but we plan to do that, and it will take a bit. But we are going to get there.

Yeah, of course. And then you knew this one was coming. So what about apps that use Webpack plugins? What's the migration plan? What tips do you have? Yeah. So we will not support Webpack plugins as they are currently. So it's not... Because Webpack plugins can really deeply integrate into Webpack and modify every tiny bit of that. And we want to offer a similar plugin system, maybe not really that deep as the JavaScript plugins.

10. TurboPack Plugin System

Short description:

We want to offer a plugin system for TurboPack, but due to the different architecture, Webpack plugins will probably have to be ported. We are exploring ways to help with porting, such as documentation or tools like Copilot and GPT.

So like the Rust plugin API, which can really get really deep into TurboPack. But we want to offer some kind of plugin system. And you probably have to port your Webpack plugin to a TurboPack plugin. But that's something we want to do. Because the architecture is different, it's just not something that we can just do, getting Webpack plugins working. Maybe simple ones. But due to the different architecture, we want to have a different kind of plugin system. And yeah. Okay. Yeah. Maybe we can also help with porting. Like, through documentation or something like that. Yeah. Yeah. Maybe we can get Copilot to port it over. And get GPT to port it over.

11. Comparison with Bun's Builder

Short description:

We haven't compared build times with Bun's new builder yet. Since we focus on the incremental build experience and have a dev server, while Bun focuses on fast production builds, it's difficult to make a direct comparison. However, the author of Bun has mentioned on Twitter that they will do a comparison. We'll see how it goes.

Yeah. Of course. And then I think probably our final question for this time. Did you compare build times with Bun's new builder? We didn't compare that yet. But I think the author of Bun did that. But there's a problem, because we currently only have a dev server, and Bun only has a production build system. So it's hard to compare currently. Because we focused on the incremental build kind of experience, the dev experience, and they focused on fast production builds. So I think there will be a comparison eventually. But we probably won't do that. At least not publicly. And they will probably do that. At least they said on Twitter that they will do it. So let's see.
