Rethinking Bundling Strategies

We take a look at different challenges and decisions when bundling code for web applications. We look at how these are commonly solved and why we need to rethink them.

FAQ

Tobias Koppers is the creator of Webpack, a popular module bundler for JavaScript. He joined Vercel and contributed to improving Webpack for Next.js. Currently, he is working on Turbopack and integrating it with Next.js.

Turbopack is a new tool developed by Tobias Koppers that aims to improve upon Webpack's features and integrate seamlessly with Next.js. It focuses on efficient bundling strategies and better caching mechanisms to enhance web development workflows.

Tobias Koppers highlighted two main challenges in bundling: ensuring deterministic builds and making sure that small input changes result in small output changes. These challenges are crucial for effective long-term caching and minimizing the impact of updates on bundled resources.

Long-term caching in web development involves storing web resources in a browser's cache to improve load times and reduce server requests. It utilizes techniques like immutable caching, where resources are cached without revalidation, and ETag caching, which allows browsers to check if the content has changed before downloading it again.

Webpack addresses content hash dependencies by using a manifest file that lists all the chunk hashes. This prevents changes in one part of the application from affecting unrelated parts, thereby optimizing caching and minimizing the need to re-download unchanged assets.

Turbopack proposes improvements such as more efficient handling of module fragments and exports, reducing unnecessary code in bundles, and optimizing the generation of module graphs to focus only on used exports. This leads to faster builds and more efficient application performance.

Effective code splitting strategies involve isolating changes to specific entry points or pages, ensuring that changes in one part of the application do not impact others. This can be achieved through heuristic methods such as separating node module dependencies from application code to leverage long-term caching more effectively.

Tobias Koppers
32 min
08 Dec, 2023

Video Summary and Transcription

The talk discusses rethinking bundling strategies, focusing on challenges such as long-term caching and improving the state of Next.js and Webpack. It explores handling immutable caching and content hashes, optimizing asset references and page manifests, and addressing issues with client-side navigation and long-term caching. The talk also covers tree shaking and optimization, optimizing module fragments and code placement, and the usage and relationship of Turbopack with Webpack. Additionally, it touches on customizing configuration and hash risks, barrel imports and code splitting, and entry points and chunking heuristics.

1. Rethinking Bundling Strategies

Short description:

I'm Tobias Koppers, the creator of Webpack. Today, I want to talk about rethinking bundling strategies, focusing on two challenges in writing bundlers. The first challenge is long-term caching, leveraging the browser cache to store resources between deployments. The second challenge involves improving the current state of Next.js and Webpack. Let's dive into these challenges and explore how we can do better.

Thank you. Yeah, I'm talking about rethinking bundling strategies today, and my name is Tobias Koppers. I created Webpack 11 or 12 years ago, and two or three years ago I joined Vercel and worked a little bit on Next.js, improving Webpack for Next.js.

Now I'm working on Turbopack and integrating Next.js with Turbopack. My talk is actually a little bit more general, so I want to talk about a few things. I want to look at two different challenges in writing bundlers. We're actually looking at the magic in bundlers. So I picked two topics for that, two challenges that I currently face, or will face in the future, while building Turbopack. And I want to go a little bit deep into that, because I think learning this bundler magic can be important, even if you technically should not face it in your day-to-day job. The bundler should make it transparent and should not confront you with all these challenges. It should just solve them magically. But I think it's still useful to know, you get some deep insight from it, and it may help you in a few edge cases.

First, I want to present these two challenges, and then go into the current state with Next.js and Webpack for that. And after that, I want to spend a bit of time rethinking that and how we can improve on it, what we can do better in the future, and what we actually want to do in Turbopack with these challenges. A little disclaimer first: I mostly work with Next.js, Webpack, and Turbopack, so everything is from the perspective of these tools. There are other tools out there, and they have similar things with different implementations. Also, most of the ideas are not really new; they are inspired by other tools.

The first topic is mostly about long-term caching, which is not very well known. So what is long-term caching at all? Long-term caching means we want to leverage the browser cache, so the cache in the browser, to store our resources, and especially between deployments. There are basically three practical levels of leveraging the browser cache.

The first one is max-age caching, where you just specify that your resources are valid for, say, two hours, so the browser doesn't have to check again and can just use the cache for two hours. But in practice, it's pretty much unsuitable for our kind of application, because we might have a critical bug fix to deploy, and we don't want to wait two hours until the user actually gets the bug fix. So we don't want to use that at all.

What we want to use is ETag caching, for example. ETag caching basically means that when the server responds with the resource, it sends a special header, ETag, which usually contains a hash of the content, and the browser stores that in its cache. You also specify that the resource must be revalidated, so the next time the browser wants to use the resource, it does a new request for it, but it includes a special If-None-Match header, which contains the ETag, so the hash of the content. If the resource didn't change in the meantime, the server can respond with a special status code (304 Not Modified) that says: it hasn't changed, you can just use the cache, and you don't need to download it again. That basically always works, which is great. But it also always revalidates the request. It sends a new request, so you have to pay the round trip, but you don't have to pay the download cost. So it's good, but you can do better.
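As a minimal sketch of the ETag flow described above (not code from the talk): the function `respondWithETag`, the `CachedResponse` shape, and the `toyHash` helper are all hypothetical names introduced for illustration, and `toyHash` is a toy stand-in for a real content hash such as SHA-256.

```typescript
// Toy 32-bit string hash (same recurrence as Java's String.hashCode).
// A stand-in for a real content hash; do not use it in production.
function toyHash(text: string): string {
  let hash = 0;
  for (let i = 0; i < text.length; i++) {
    hash = (hash * 31 + text.charCodeAt(i)) | 0;
  }
  return (hash >>> 0).toString(16);
}

interface CachedResponse {
  status: number;
  headers: Record<string, string>;
  body: string | null;
}

// Hypothetical handler for a single resource. `ifNoneMatch` is the value of
// the If-None-Match request header (undefined on the very first visit).
function respondWithETag(body: string, ifNoneMatch?: string): CachedResponse {
  const etag = `"${toyHash(body)}"`;
  // "no-cache" tells the browser to revalidate before reusing its copy.
  const headers: Record<string, string> = { ETag: etag, "Cache-Control": "no-cache" };
  if (ifNoneMatch === etag) {
    // Unchanged: the browser pays the round trip but not the download.
    return { status: 304, headers, body: null };
  }
  return { status: 200, headers, body };
}

const firstVisit = respondWithETag("console.log('v1');");
const revalidated = respondWithETag("console.log('v1');", firstVisit.headers.ETag);
const afterDeploy = respondWithETag("console.log('v2');", firstVisit.headers.ETag);
```

Here `revalidated` is answered without a body because the content is unchanged, while `afterDeploy` carries the new content because its hash no longer matches the ETag the browser sent.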

2. Handling Immutable Caching and Content Hashes

Short description:

The best way to handle caching for static resources is through immutable caching, where the browser can cache the resources indefinitely. To ensure consistency, a unique URL with a content hash is used, allowing for easy updates without breaking the cache. To achieve deterministic builds, the bundler must generate the same output for the same application, while also ensuring that small changes result in small output changes. However, handling content hashes becomes more complex when there are references between different parts of the application. Webpack and Next.js have made progress in solving these challenges, but the issue of content hashes remains.

The best one, I think, at least for static resources and that kind of stuff, is immutable caching, which means you send Cache-Control: immutable and a few other headers, and that means the browser can cache it forever. It never has to do a round trip, never has to request it again, and can just store it forever, usually one year or something.

But because the browser stores it forever without revalidating, this only works if you never change the content of the resource. If you changed it, things might become inconsistent, because browsers might still have the old version cached.

So usually you tackle that by making the URL unique, in a way that its content never changes. Usually you just add a content hash to the URL; you might have seen file names with this hash attached. That makes the URL so unique that its content will never change, and if you deploy a new version, it just gets a new URL with a new hash.
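A minimal sketch of how a server might pick caching headers under this scheme. The file-naming convention, the regular expression, and the function name `cacheControlFor` are assumptions made for illustration, not something shown in the talk.

```typescript
// Assumed naming convention: "<name>.<8 or more hex chars>.<ext>",
// for example "main.3f9a1c2b.js".
const HASHED_FILE = /\.[0-9a-f]{8,}\.[a-z0-9]+$/;

// Hypothetical helper that chooses a Cache-Control value for a URL path.
function cacheControlFor(urlPath: string): string {
  if (HASHED_FILE.test(urlPath)) {
    // The content under this URL never changes, so the browser may keep it
    // for a year (31536000 seconds) without revalidating.
    return "public, max-age=31536000, immutable";
  }
  // Unhashed entry points such as HTML are revalidated on every use.
  return "no-cache";
}

const chunkPolicy = cacheControlFor("/static/main.3f9a1c2b.js");
const htmlPolicy = cacheControlFor("/index.html");
```

The HTML document keeps a stable URL and is revalidated, while everything it references through hashed URLs can be cached without round trips.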

Yeah, that would be the best one. So how do we face that from a bundler level? The challenge can be solved with a few different techniques. One thing is that we want to build the bundler in a way that it generates deterministic builds. If you build the same application, it should just generate the same output assets, so that the cache can actually be used. If you generated different output assets, you couldn't use the cache.

But you also want another property. Even if you make a small change to your application, which you usually do in every pull request or whatever, you want that small change to result in a small output change. If you only change one module, you would expect only one or a few chunks to change in the output bundle. That's the way we can make good use of our browser cache in general.

Now we want to use this immutable caching thing, so we want to just put a content hash on every file name we emit from the bundler. It sounds pretty easy. You just hash the content and add it to the file name. But it gets a little bit complicated, because there are actually references between the different things in your application. For example, the HTML references your chunks, the chunks reference each other, maybe for async loading and that stuff, and chunks also reference assets, like images and fonts. That's where the problem comes in.

So we basically solved these first few things with Webpack in the current state with Next.js. To make builds deterministic, we are just careful when implementing and try to avoid absolute paths, so that the output is independent of changes like cloning your repository to a different directory. And that's pretty easy, actually.
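The reference problem mentioned above can be sketched as follows. This is a toy model, not how Webpack or Next.js emit files: `build`, `hashedName`, and the file names are hypothetical, and `toyHash` is a toy stand-in for a real content hash.

```typescript
// Toy 32-bit string hash, standing in for a real content hash.
function toyHash(text: string): string {
  let hash = 0;
  for (let i = 0; i < text.length; i++) {
    hash = (hash * 31 + text.charCodeAt(i)) | 0;
  }
  return (hash >>> 0).toString(16);
}

// Emit a file name with the content hash embedded: "<name>.<hash>.<ext>".
function hashedName(name: string, ext: string, content: string): string {
  return `${name}.${toyHash(content)}.${ext}`;
}

// Hypothetical two-level build: one chunk that references one image by URL.
function build(imageContent: string): { image: string; chunk: string } {
  const image = hashedName("logo", "png", imageContent);
  // The chunk's source text contains the image's hashed file name...
  const chunkContent = `export const logoUrl = "/${image}";`;
  // ...so the chunk's own content hash depends on the image's hash.
  const chunk = hashedName("page", "js", chunkContent);
  return { image, chunk };
}

const buildBefore = build("pixels-v1");
const buildAfter = build("pixels-v2");
```

Changing only the image also changes the chunk's file name, even though no JavaScript source was edited. Anything that references the chunk (another chunk, or the HTML) would change in turn, which is the cascade that makes content hashes harder than they first sound.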
The more difficult one is this property of small input change, small output change, where you have to consider every algorithm to make sure it doesn't have this whole-application effect. Take module IDs: we can't really number them one by one, because then inserting one module at the start would renumber all the modules, which is not the property we want. So we make use of hashes to generate module IDs. And also when you group your modules into chunks, you have to make it deterministic in a way that small changes turn into small output changes. It's also relevant for optimizations, like mangling and that stuff. In general, we solved a few things, but let's look into this content hashes problem.
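A minimal sketch of the module ID point, assuming IDs are derived by hashing each module's relative path. This is an illustration only and not Webpack's actual ID algorithm; `sequentialIds`, `hashedIds`, and the module paths are hypothetical, `toyHash` is a toy hash, and a real implementation would also need to handle hash collisions.

```typescript
// Toy 32-bit string hash, standing in for a real hash function.
function toyHash(text: string): string {
  let hash = 0;
  for (let i = 0; i < text.length; i++) {
    hash = (hash * 31 + text.charCodeAt(i)) | 0;
  }
  return (hash >>> 0).toString(16);
}

// Sequential IDs: a module's ID is its position in the module list.
function sequentialIds(paths: string[]): Record<string, string> {
  const ids: Record<string, string> = {};
  paths.forEach((path, index) => {
    ids[path] = String(index);
  });
  return ids;
}

// Hashed IDs: a module's ID depends only on its own (relative) path.
function hashedIds(paths: string[]): Record<string, string> {
  const ids: Record<string, string> = {};
  for (const path of paths) {
    ids[path] = toyHash(path);
  }
  return ids;
}

const modulesBefore = ["./src/app.js", "./src/util.js"];
// One new module is inserted at the start of the list.
const modulesAfter = ["./src/a11y.js", "./src/app.js", "./src/util.js"];

const sequentialBefore = sequentialIds(modulesBefore);
const sequentialAfter = sequentialIds(modulesAfter);
const hashedBefore = hashedIds(modulesBefore);
const hashedAfter = hashedIds(modulesAfter);
```

With sequential numbering, `./src/app.js` moves from ID `0` to ID `1`, so every chunk that mentions it changes. With hashed IDs, the existing modules keep their IDs and only the new module adds output. Using relative paths also keeps the IDs independent of the directory the repository was cloned into.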
