1. Introduction to TurboPack and TurboEngine
The motivation is that with Webpack we saw applications grow larger and larger, and Webpack's architecture isn't built for that: incremental builds tend to get slower as the application grows. It's not a huge problem yet, but it might become one in a few years when applications reach millions of modules. A few problems we isolated: we do a lot of cache look-ups, and we have to compute a lot of stuff just for cache validation, like checking whether files are still the same and hashing things. That's the real problem, because you pay all this overhead on every incremental build. So we wanted to do something new, a new architecture to tackle this problem. That's why we created Turbo Engine and TurboPack, and I can explain the architecture a little bit by doing some live coding.
2. Adding Logging and Memoization
So the first step is basically only adding some logging, so you can actually see what the application is doing. We add more logic to that later. The next step is to add some kind of memoization system, which is like a cache: we store results somewhere using a map. With that we have memoization; it's pretty simple, actually.
So it's basically only adding some logging so you can actually see what the application is doing. Otherwise it just prints nothing, which is pretty boring. So it only has logging, and what I do is basically call the function with logging. You see it, but it's nothing special, just straightforward.
We add more logic to that later. What you end up seeing is the whole application running: it's calling main, calling copy graph, calling copy, and calling all these functions in a kind of tree. This is basically a stack trace.
But you also see a lot of problems with this application. For example, we're reading the header a bunch of times, like here and here and here. And we're also calling copy graph multiple times: once because it's referenced from fs, and once from the task. So we're doing a lot of duplicate work, and that's exactly what we want to avoid.
3. Optimizing Execution and Tracking Dependencies
Calling a function multiple times is inefficient, so we optimize by only executing each function, like reading the headers, once. For incremental execution, we take a bottom-up approach, computing the affected functions from changed files. To track dependencies, we use a graph that stores which tasks are connected. By keeping track of a current task, we can record dependents and later invalidate them. The invalidate function recomputes a task and then invalidates its dependent tasks.
And with that, calling it again, you never execute a function twice: the header is only read once, and if you copy something again it's not reading the header again, and it's not invoking copy graph multiple times. So that's optimized, and we'll need the same mechanism later for our incremental builds.
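The memoization step can be sketched roughly like this. This is a minimal illustration, not the actual TurboEngine code; the names (`memoize`, `taskCache`, `readHeader`) are hypothetical, and the cache key is a naive function-name-plus-arguments string:

```javascript
// Hypothetical sketch: cache task results in a Map keyed by the
// function name and its arguments, so a task never executes twice.
const taskCache = new Map();

function memoize(fn) {
  return (...args) => {
    const key = fn.name + JSON.stringify(args); // naive cache key
    if (taskCache.has(key)) return taskCache.get(key); // cache hit
    const result = fn(...args); // first call: actually execute
    taskCache.set(key, result);
    return result;
  };
}

// Example: the expensive function only ever runs once per argument.
let executions = 0;
const readHeader = memoize(function readHeader(file) {
  executions++;
  return `header of ${file}`;
});

readHeader("a.h");
readHeader("a.h"); // served from cache, no second execution
```

With this wrapper in place, repeated calls like the duplicated header reads in the demo collapse into a single execution per distinct argument list.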
So, next step: add some kind of incremental execution system. What we don't want to do is just start main again and look everything up from cache, because that's what we did with Webpack, and the overhead of looking everything up in the cache is real and is a problem. Instead we want a new approach that works the opposite way: not calling functions from top to bottom, but going from bottom to top. From a changed file, we figure out which functions are affected and recompute them. So we start from the changed files, from the file reads, invalidate those, and then bubble up through the whole system to get it into the right state again.
To do that, we actually need some kind of graph that stores which tasks are connected to other tasks. We want to know that when we invalidate, say, a read call, we also have to invalidate the copy method that used it. So for each task we store its dependent tasks, and a set will do: a set of tasks that depend on this task. To fill it, we add the current task to the dependent tasks. The problem is that we don't know the current task, and for the main application there might not be one, so we only add it if a current task exists.

Then we need to track what the current task is. That's actually not super complicated: we keep a variable holding the current task, and a function that wraps our calls with that state, where you call a function with the current task set to something. Maybe Copilot can write that. Yeah, it can. We just set the variable before calling the function and restore it to the old value afterwards, and then use this new function to wrap our calls. Now, during an execution, we know which task we're inside of, and we can track it in all its children. So we have this kind of tree of executions, which we see in the output, stored in our task structure.

So how do we use that? We want to do the invalidation and then bubble up to the dependent tasks. So we want some kind of function called invalidate, which invalidates a task, recomputes it, and also invalidates the dependent tasks. First, we log that, because it's nice to see what's happening.
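The current-task tracking described above might look something like this sketch. The structure and names (`withCurrentTask`, `runTask`, `dependentTasks`) are hypothetical simplifications of what's shown in the talk:

```javascript
// Hypothetical sketch: each task remembers which tasks depend on it,
// and a module-level currentTask records who is executing right now.
let currentTask = null;

function withCurrentTask(task, fn) {
  const oldTask = currentTask; // remember the outer task
  currentTask = task;          // enter the new task
  try {
    return fn();
  } finally {
    currentTask = oldTask;     // restore on the way out
  }
}

function runTask(task) {
  // Whoever is currently executing depends on this task's result,
  // so record that edge for later invalidation (skip for main).
  if (currentTask) task.dependentTasks.add(currentTask);
  return withCurrentTask(task, () => task.fn(...task.args));
}

// Example: main calls child, so child records main as a dependent.
const child = { name: "read", fn: () => 42, args: [], dependentTasks: new Set() };
const main = { name: "main", fn: () => runTask(child), args: [], dependentTasks: new Set() };
runTask(main);
```

The try/finally ensures the outer task is restored even if a task throws, so nesting works like a call stack.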
4. Implementing Invalidation and Watcher
We need to store the names and other information needed for invalidation. The first step is to recompute the task; the second is to invalidate all dependent tasks. To implement this, we add functions for getting and adding invalidators, and we use this invalidation system in our filesystem implementation: we get the invalidator inside the read function and call it when needed. We use a watcher to detect changes in the directory. Finally, we store the invalidators in a list.
That's just for printing out what's happening so you can see it. And we need to store the names. So we store some things on the task, like the name, function, and args, so we can access them during invalidation.
Then the first step is to recompute the task, and the second step is to invalidate the dependents. You know, Copilot already knows what we want to do. Okay. So we just execute the task again. Actually, that's not quite correct: we want to execute the task again and store the new result. So we wrap it with the current task again, then just invoke the task function with the task's arguments. So we recompute it, and after recomputing, the result might have changed, probably has, hopefully. Then we go through all dependent tasks and invalidate them too, so they can recompute as well. So if there are dependent tasks, we loop over them and invalidate them. This isn't called anywhere yet, so we want to call it, for example, from a watcher. We add a function called getInvalidator, which returns the invalidation function for the current task. If there is no current task, we throw; otherwise, we return a function that lets you invalidate the current task. And this function can be used in our filesystem implementation, which currently looks like this: if you read something, we call read file; if you write something, we call write file. Yeah, it's pretty simple.
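The invalidate/getInvalidator pair can be sketched as below. This is a simplified, hypothetical version of the talk's code, reusing the same current-task idea; the names are assumptions:

```javascript
// Hypothetical sketch of invalidation: recompute a task, then bubble
// up to every task that depends on its result.
let currentTask = null;

function withCurrentTask(task, fn) {
  const old = currentTask;
  currentTask = task;
  try { return fn(); } finally { currentTask = old; }
}

function invalidate(task) {
  console.log(`invalidated ${task.name}`); // nice to see what's happening
  // 1. Recompute the task itself and store the new result.
  task.result = withCurrentTask(task, () => task.fn(...task.args));
  // 2. Invalidate everything that depends on it so it recomputes too.
  for (const dependent of task.dependentTasks) {
    invalidate(dependent);
  }
}

// Returns a function that invalidates whichever task is currently
// executing; a file read can hand this to the watcher.
function getInvalidator() {
  if (!currentTask) throw new Error("no current task");
  const task = currentTask; // capture now, not when the watcher fires
  return () => invalidate(task);
}
```

Capturing `currentTask` into a local before returning matters: by the time the watcher calls the invalidator, the global variable will long since point elsewhere.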
So how do we make use of our invalidation system? We can get the invalidator in the read function. Now we have it, but when do we call it? We need some kind of watcher. Luckily there is a watcher implementation in Node.js, and for simplicity we just watch the whole directory recursively (in some edge cases recursive watching isn't supported by Node.js). Now we want to invalidate all read calls that are affected by the file that changed. So we want to store our invalidators somewhere, in some kind of list of invalidators.
5. Storing and Deleting Invalidators
And then we store the invalidator in the invalidators map, so we have access to all invalidators from all read calls and can just call them. And that's what we actually want to do: we get the invalidator from the invalidators map by file name. Actually, that's wrong, because the path from the watcher is relative, so we make it absolute and then look it up. And if we have an invalidator, we can call it.
But one file change can fire multiple events, attributes changed, file added, and so on, so eventually it would trigger multiple times. We want to debounce it a little to make the output readable: setTimeout, then call the invalidator, maybe with 100 milliseconds. And yeah. We also want to log something, maybe something like "file changed", so we can actually see what's happening. Yeah. Oops. Cool. It kind of works, but there's a bug.
So we actually need to delete the invalidator, remove it from the map, because invalidating a function multiple times doesn't make sense. After invalidation, we call the function again, and then we get a new invalidator, which lands in the map again.
6. Invalidation and Bubbling
So we only want to invalidate it once. We compare the old result with the new result and only invalidate the parents if something has really changed. With the new implementation, saving the file now only invalidates the read function. If something was recomputed but we don't need to propagate it, we print "unchanged" and stop the bubbling.
So we only want to invalidate it once. So yeah, that should work. Well, hopefully. Let's try it and see if I missed something.
Okay, we need to import the getInvalidator function. So it works. The initial build is basically the same sequence as before, but now it doesn't exit, because we're watching. So if we change a file, it should recompute the affected parts and get back to the right state again. So maybe I can just save this file. Okay, invalidator. Okay, try again. Saving the file. Oh, now it works. Now we see it's invalidating the read function, recomputing it, and then it bubbles up the graph of execution: copy calls read, so copy is also recomputed, copy is called by copy graph, and it bubbles up until the main function.

There are some problems, though. It bubbles up multiple times, copy graph is repeated here, because read is used both in parse and in copy. And it's also pretty useless, if I just save the file without changing anything, to recompute everything. Why do we need to recompute copy graph if the references didn't change at all? So what we want to add is some kind of stop gap: we can stop the bubbling if the result is the same.

So, the clock is wrong. Okay, I have two minutes, so I'll add this. I only want to invalidate if something has really changed, and I can add some simple code to make that work. We compare the old result with the new result, you shouldn't do that with JSON.stringify, but it works for now, and only if something has really changed do we invalidate the parents. Now I'll try this again with the new implementation, and if I only save this file, it now only invalidates the read function. Maybe we add some logging: if something was recomputed but nothing changed, we just print "unchanged", and we stop the bubbling.
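The stop-gap check can be sketched like this. It's a hypothetical simplification; as noted in the talk, JSON.stringify is a crude equality test that happens to work for a demo:

```javascript
// Hypothetical sketch: after recomputing, only bubble up to the
// dependents when the result has really changed.
function invalidate(task) {
  const oldResult = task.result;
  task.result = task.fn(...task.args); // recompute and store
  // Crude structural comparison -- fine for a demo, not production.
  if (JSON.stringify(oldResult) === JSON.stringify(task.result)) {
    console.log(`${task.name} unchanged`); // stop the bubbling here
    return;
  }
  for (const dependent of task.dependentTasks) {
    invalidate(dependent);
  }
}
```

This single early return is what makes a no-op save cheap: the read recomputes, sees an identical result, and nothing above it in the graph runs.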
If read re-ran but nothing actually changed, the bubbling stops. But if we really change something that affects the AST, like adding a comment, it has to bubble further: read bubbles up, we copy the file again and parse the file again, but getting the references doesn't have to recompute, because the references of the file didn't change.
7. Incremental Compilation and Granular Changes
When we reorder the imports, the system can follow the graph and only recompute what's needed. A larger example with 10,000 files demonstrates the system's ability to handle granular changes. Incremental compilation in TurboEngine recomputes only the affected files, making the system independent of application size.
In the case where we do something like reordering the imports, we also bubble up through getting the references and go further down the tree. The whole system is really great because we can go from the watch event and only invalidate certain pieces of our application, independent of how large the application is; we just follow the graph and only recompute what's needed.
So let's try it with a larger example. I've prepared a directory here with a lot of files, 10,000 files, and they're all imported from this one function. It takes a while to compile that initially because we have to, oh wait, stop, I wanted to disable the logging for that, that's why it's printing millions of lines here. It takes 10 seconds to compile, but afterwards we can use the same system to make granular changes to our files. Waiting for it to be done.
So if I only change this small file here and save it, it's really fast: it reads the file and sees it's the same. And if I change the index file, it's a little slower, because parsing the file already takes a second or so, so it takes a bit more, but it doesn't have to copy all the files, because those are the same. And if I add some more modules to the file, it recomputes that and copies the files that are new, that kind of stuff. That's the system of incremental compilation in TurboEngine. It's simplified here, it's a bit more involved in practice, but I think you get the idea of how you can tackle the problem from the bottom up, recompute only the files that are affected, and make the system independent of application size, depending only on the size of the change.
8. Integrating with Other Monorepo Tools
Thank you for your interest in TurboPack's integration with other monorepo tools. TurboPack is designed to be integrable with every tool, offering a command line interface and integration tooling for the cache. We are particularly focused on deep integration with TurboRepo, our own tool, to share caches and create a more integrated experience. Similar to how we built TurboPack in close collaboration with Next.js, starting with a close collaboration lets us build a proof of concept and then make TurboPack usable for other tools. Working with a familiar team makes communication and understanding a tool's requirements much easier.
Thank you. Seems like I got it in people's minds! What about integrating with other monorepo tools like NX, other than Turbo? TurboPack is integrable into every tool, you can call it as a command line thing, and we'll probably offer some kind of integration tooling for the cache. But we also want to work a bit deeper with TurboRepo, because that's our tool, and our idea is to share caches between TurboRepo and TurboPack, which makes it more integrated. But I think we'll offer the same kind of integration abilities to other monorepo tools. I think that makes sense: you go with something you're close to, and you work together to build a proof of concept. It's similar to what we did with Next.js. We built TurboPack with deep Next.js integration, because that's where we can work together and get to a state where we know everything that's needed for TurboPack, and then release TurboPack as a standalone thing, so it's usable for other things too. Going initially with a team you know is much easier than communicating with external tools, where you don't know what's needed for the tool, and you can't demand things the same way. You can't be like, hey, we really need to push this. It's a lot harder to negotiate that. All right, great.
9. Error Display and TurboPack Reliability
How do you display the error message on the same line where it was thrown in your IDE? When can I use TurboPack reliably? Our priority is getting Next.js to work. We want to make TurboPack a standalone tool, simpler and easier to use than Webpack. We don't support Webpack plugins as they currently are.
Then the next one. How do you display the error message on the same line where it was thrown in your IDE? What? Yes, that's a plug-in, I think. Not sure which one, but one of these. Whoever asked this, you can find Tobias later and, you know, state your case.
So, all right. When can I use TurboPack reliably without... Yeah. As I said, our priority is getting Next.js to work, which should be soon. It's already working for a lot of applications, but we're figuring out some edge cases and more of the modern Next.js features. And then our goal is to make it a standalone tool, because we want something similar to Webpack that you can use standalone. But there's a lot of stuff to figure out, like how we take configuration and what the API is. We don't want to make the same configuration mistakes as Webpack, so we want to make it simpler and easier to use, and we need to think about that. Technically, there is a standalone TurboPack CLI you can compile yourself from our repo, but it has no configuration, not a lot yet. It's not something we've made public yet, but we plan to do that, and it will take a bit. We're going to get there.
10. TurboPack Plugin System
We want to offer a plugin system for TurboPack, but because the architecture is different, Webpack plugins will need to be ported. We are exploring ways to make porting easier, through documentation, or perhaps with help from tools like Copilot or GPT.
With the Rust plugin API, we can get really deep into TurboPack. But we want to offer some kind of plugin system, and you'll probably have to port your Webpack plugin to a TurboPack plugin. That's something we want to support, but because the architecture is different, it's just not something where we can get Webpack plugins working as-is. Maybe simple ones. But due to the different architecture, we want a different kind of plugin system. And yeah. Okay. Maybe we can also help with porting, through documentation or something like that. Yeah. Maybe we can get Copilot to port it over, or GPT to port it over.
11. Comparison with Bun's Builder
We haven't compared build times with Bun's new builder yet. Since we focus on the incremental build experience and have a dev server, while Bun focuses on fast production builds, it's difficult to make a direct comparison. However, the author of Bun has mentioned on Twitter that they will do a comparison. We'll see how it goes.
Yeah, of course. And then, I think, probably our final question for this time: did you compare build times with Bun's new builder? We didn't compare that yet, but I think the author of Bun did. There's a problem, though: we currently only have a dev server, and Bun only has a production build system, so it's hard to compare right now. We focused on the incremental build experience, the dev experience, and they focused on fast production builds. So I think there will be a comparison eventually, but we probably won't do it, at least not publicly. They probably will; at least they said on Twitter that they would. So let's see.