Yarn: From Design to Implementation


In this talk, we'll go over the various iterations the Yarn team went through designing one of the most critical pieces of software in the JavaScript ecosystem. We'll discuss some of the main choices, trade-offs, errors, and successes we faced, and think about evolutions to come and future challenges that await us. By the end of this talk you'll have a better understanding of the work large-scale open-source teams do behind the scenes, and hopefully a better understanding of what makes open-source so special.

28 min
15 Feb, 2024

Video Summary and Transcription

Today we'll discuss the evolution and implementation of YARN, which focuses on determinism and stability. YARN Modern was re-architected to support projects with multiple packages and embraced monorepos. YARN 2 improved workspace implementation, codebase partitioning, and stability. Dependency resolution, fetching, and linking in YARN are handled by different resolvers, fetchers, and linkers. YARN has a plugin system, a constraint engine, and a redesigned website. It prioritizes compatibility, performance, testing, and contributions to other projects.


1. Introduction to YARN

Short description:

Today we're going to talk about how YARN came to be, what it evolved into, and interesting tidbits about its implementation. I'm Maël Nison, a staff developer experience engineer at Datadog, leading the development of YARN since 2017. The talk is split into three parts: YARN Classic, YARN Modern, and implementation details. Classic YARN was created to address problems with consistency, security, and performance at Facebook. We chose to create YARN instead of contributing to NPM because we wanted to address different priorities. YARN focuses on determinism and stability.

Hi, everyone, and welcome to my talk about YARN. Today we're going to talk a bit about how it came to be, what it evolved into, and interesting tidbits about its implementation.

So first, my name is Maël Nison, I'm a staff developer experience engineer at Datadog. And I've been leading the development of YARN since 2017, from the zero point something releases up to the 4.0 that it currently is.

I split the content of this presentation into three parts. First we're going to talk a bit about YARN Classic. Then we're going to go to YARN Modern, so everything from 2.0 until today. And finally, we're going to jump into interesting details about either the implementation of YARN or some meta knowledge about YARN.

First thing, YARN Classic. So Classic is everything that is pre-2.0. Why did we create YARN Classic? If you look at the blog post that was released at the time to introduce YARN, you will see this interesting sentence: "As the size of our code base and the number of engineers grew at Facebook, we ran into problems with consistency, security and performance." At the time, the JavaScript ecosystem was very different from what it currently is, and it was very difficult to be sure that when you were running a project it would be working for everyone across the organization. That didn't really work at the scale Facebook was operating at, so they decided to put some resources into trying to figure out a solution.

However, now let's take the same question but with a different emphasis. Why did we create YARN? Why didn't we just contribute to NPM? The reason is that, at the time, it felt like there were so many different things we wanted to attempt to address that contributing to NPM wouldn't have made sense. Each project has its own priorities, and some of the things that YARN intended to address were not things that NPM intended to challenge, because they felt that was the developer experience they wanted to have. That's completely fair. We decided that it would be easier to just create another project that would make different trade-offs and follow a different exploration path. Contributing wasn't really an option because we didn't want to push against NPM on a project that was entirely theirs. Remember that NPM was also a company at the time, so they had their own priorities.

When we built YARN, we decided to focus on four different core areas. The first one was determinism. It's easy to think of lockfiles as having always been used everywhere, basically. However, at the time, they were very rare. Very few projects were enforcing the versions of their dependencies across different installs, meaning that everyone would get different things. We felt that it was important that projects that work now should work in the future, and for that, YARN had to do some focused work on that. Additionally, we wanted YARN to be safe, meaning that we wanted to catch as many bugs as possible before they would even ship in production. So stability was very important to us.

2. YARN Features and Stability

Short description:

We focused on the user experience of working with the CLI, made YARN fully open source, and released it as a separate project. The initial product offering included lockfiles, an offline mirror, and emojis for better CLI output. YARN was fully typed using Flow for improved stability.

A clean UX was also very important. We wanted YARN to be a tool that could be approached by anyone without having deep knowledge of how package managers work, how projects are set up. To that end, we decided to focus a little on the user experience of working with the CLI and not just the implementation of the software itself.

Finally, we wanted it to be fully open source. I mentioned that NPM was a company, and actually it still is a company, but now owned by GitHub. NPM was a startup, meaning that it was receiving investment from VCs, and it felt like the incentives weren't completely aligned with what the JavaScript ecosystem needed. So we really wanted YARN to be fully open source, and it got released as a project that wasn't even part of Facebook. Nowadays, Facebook doesn't actually have any part in it. It's only maintained by people from different companies.

The initial product offering when we released YARN included four main things. The first one, of course, was lockfiles. Lockfiles all day. At the time, NPM had shrinkwraps, which allowed you to lock the versions of your dependencies, but they were not enabled by default, and the way they affected transitive installs was a little weird. We wanted something simpler to understand that would be enabled by default. We also shipped the offline mirror, which followed the same strategy: how can we make sure that the project will keep installing in the future? What if the NPM registry goes down, whether temporarily or permanently? Can we make sure that it doesn't affect us and doesn't affect our installs? The offline mirror lets you store the tarballs from your project's cache somewhere you control, for instance inside your Git repository, but not necessarily; you can also store them on a specific file system. So even if the registry is down, YARN can still install your project just fine. YARN also shipped with emojis. I know that when you read emoji as a product feature, that doesn't seem very important, but actually, it kind of was for our users. I think it also showed that the output of the CLI is very important, and making it easier to parse for humans makes a lot of sense. Nowadays we do that with different things than emojis; we do that using semantic colors on the different components that we display. But that's the kind of attention that we decided to put into the developer experience.

And finally, this one is more about the maintenance of YARN than about the features themselves, but it goes into the stability bucket that I mentioned earlier. YARN was fully typed using Flow, not TypeScript at the time. And that was very important because it allowed us to catch many issues that we wouldn't have caught otherwise if we had not type-checked the codebase. NPM was following a very different strategy where they intended to have full coverage through tests. However, from our experience, doing something like that would still have let things pass that shouldn't have been shipped.

3. YARN Modern and Codebase Evolution

Short description:

We implemented workspaces to support projects with multiple packages and Plug and Play to eliminate the need to install projects. However, the YARN Classic codebase couldn't handle these evolutions, so we had to re-architect YARN into YARN Modern. Monorepos became the core concept, and we embraced dogfooding.

So we decided that it was worth it to spend the time to properly type everything. Over the years, we also implemented many different features. I listed two critical ones for the future of YARN. The first one was workspaces. What if the project was more than just a single package? This was initially inspired by Lerna, which itself built that feature to support the Babel project. But eventually we felt that it was a very interesting idea, and it should be supported natively in the package manager. So we implemented that in YARN, and monorepos grew and grew, and now it's used by much more than just Babel.

Plug and Play also intended to answer the question, what if we didn't need to install our projects in the first place? Each time you make a git clone, you then run yarn install, and then you wait until all the dependencies are installed. What if we didn't have to do that? What if we didn't have node_modules in each different folder? What if we could reuse all the packages from one project with the other? Plug and Play answered all that. However, as time passed, we started to have a small problem. As good as it was, the YARN Classic codebase hadn't really been designed to support such drastic evolutions. The workspaces had been built as an afterthought. They were built on top of the regular logic, so we had to make a couple of hacks in order to make it work. And over time, it meant that adding features became more and more difficult.

Each time we were adding something, we were also risking breaking something else elsewhere, in places that could sometimes be very difficult to understand. So we wanted to be able to move fast and break nothing when building YARN. Unfortunately, our codebase didn't support that. Which led us to YARN Modern and the need to rewrite YARN, to re-architect it, to take everything that we had learned from YARN 1.0 and put it into something that we felt could support us for the 10 years to come. You remember the focus areas I mentioned before: determinism, stability, clean UX, and open source. We also added four other ones. The first one was monorepos. We decided from the get-go that everything would be a monorepo. Even if you have a single package in your project, that's a monorepo with a single package. So the monorepo concept was made core to YARN. Everything is a monorepo. There is no special case of, oh but wait, here I'm a workspace, so there are actually multiple projects everywhere. No. Everything is a monorepo. We also went with dogfooding. We should use the features that we built.

4. YARN 2 and Codebase Partitioning

Short description:

We improved the workspace implementation by actively using it during the development of YARN 2. We also focused on partitioning the codebase to avoid breaking unrelated parts of the software. Additionally, we aimed for YARN to be evolutive, incorporating lessons learned to improve stability. The install design remained similar to YARN 1.x, with resolution, fetching, and linking as the main steps.

One problem that we had with the workspace implementation in YARN was that we were not actually using it ourselves when working on YARN. We only had a single workspace that was only used for one package that we very rarely published. So it was difficult for us to have a good sense of how easy or not it was to work with workspaces. When we developed YARN 2, we decided to completely change that. We decided that we wanted to actually put effort into splitting our codebase into multiple slices so that we would directly be faced with the problems that you as users are faced with when working on monorepos and YARN workspaces.

Additionally, this also led to another concern which was partitioning. In YARN 1.x, we had a couple of very important files that contained a lot of logic. And the problem was that if we were to work on those files, it would be very easy to break something elsewhere. We decided to change that by better splitting the different parts of the YARN application so that if we are working on something, we don't accidentally risk breaking something in an unrelated part of the software. It also tied into workspaces because we did this partitioning by putting the logic into different pieces.

And finally, we also wanted YARN to be able to evolve. I mentioned that we planned the design of YARN 2 and beyond as if we were designing for the 10-plus years to come. That's exactly that. We didn't want to have to rewrite the software from scratch yet another time. So, this time, we really made a conscious effort to take everything we had learned and improve it in a way that would be stable. In terms of install design, we stayed with what we were doing in YARN 1.x. So, we had three different steps. The first one is the resolution, where we take all dependency ranges and turn them into versions that represent a single package, a single tarball. Then we fetch everything all at once, and once everything has been downloaded, we start the linking, which is when we start putting the packages directly on disk after having downloaded them. However, what we did differently is that now we have a modular pipeline.
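
To make the three phases concrete, here is a minimal sketch of the flow, using hypothetical names (Descriptor, Locator, resolveAll, fetchAll, linkAll) rather than YARN's actual internal API:

```ts
// Illustrative sketch of the three install phases; all names are hypothetical.
type Descriptor = {name: string; range: string};   // e.g. {name: "lodash", range: "^4.17.0"}
type Locator = {name: string; reference: string};  // e.g. {name: "lodash", reference: "npm:4.17.21"}
type FetchResult = {locator: Locator; tarball: Uint8Array};

// Stubs standing in for the real resolvers, fetchers, and linkers.
async function resolveAll(deps: Descriptor[]): Promise<Locator[]> {
  return deps.map(dep => ({name: dep.name, reference: `npm:${dep.range.replace(/^[\^~]/, "")}`}));
}
async function fetchAll(locators: Locator[]): Promise<FetchResult[]> {
  return locators.map(locator => ({locator, tarball: new Uint8Array()}));
}
async function linkAll(packages: FetchResult[]): Promise<void> {
  for (const pkg of packages) console.log(`linking ${pkg.locator.name}@${pkg.locator.reference}`);
}

// The three phases run strictly one after the other:
async function install(dependencies: Descriptor[]) {
  const locators = await resolveAll(dependencies);  // 1. ranges -> pinned references
  const packages = await fetchAll(locators);        // 2. download everything at once
  await linkAll(packages);                          // 3. write the install layout to disk
}

install([{name: "lodash", range: "^4.17.0"}]);
```

The important property is that each phase completes before the next one starts, which is what makes it possible to swap different implementations into each slot of the pipeline.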

5. Dependency Resolution and Linking in YARN

Short description:

Different resolvers handle different dependency syntaxes during resolution, such as npm resolver for semver, git resolver for git tags, and patch resolver for applying patches. Fetchers retrieve package content, while linkers support various installation options in YARN.

During the resolution, you have different resolvers that can take care of different dependency syntaxes. For instance, you have the npm resolver, of course, which handles everything semver-based, but you also have the git resolver, which is able to turn a git branch or tag into a git hash, or the patch resolver, which is able to apply a patch file to a package. And all that is then passed along to the fetchers, which implement their own logic in order to actually fetch the content of the package. Resolvers are for metadata, whereas fetchers are for tarballs.

Finally, during the linking, we have linkers, which are very important because they are what allow YARN to support node_modules installs, pnpm-style installs with symlinks, and PnP installs with neither symlinks nor node_modules. Thanks to the linkers, you have a lot of choice in how you want to use YARN. Do you want to use YARN PnP and be a bit more mindful about the dependencies that you are using? That's fine. Prefer not to have to worry about anything and just have a package manager that works like YARN 1? Just use the node_modules linker and it will work just fine. The node_modules linker is actually more stable in YARN 4 than it used to be in YARN 1; we fixed a couple of the more intricate bugs. So if you are still on YARN 1, consider migrating to YARN 4 with the node_modules linker.
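
The division of labor between resolvers, fetchers, and linkers can be pictured roughly like this — an illustrative sketch only, not the actual @yarnpkg/core interfaces:

```ts
// Illustrative interfaces for the three roles; the real @yarnpkg/core types differ.
type Descriptor = {name: string; range: string};   // what the user asked for
type Locator = {name: string; reference: string};  // one exact, pinned package

interface Resolver {
  supports(descriptor: Descriptor): boolean;          // e.g. semver, git:, patch:
  resolve(descriptor: Descriptor): Promise<Locator>;  // metadata only
}

interface Fetcher {
  supports(locator: Locator): boolean;
  fetch(locator: Locator): Promise<Uint8Array>;       // the actual package archive
}

interface Linker {
  // Writes the fetched packages to disk in one particular layout:
  // node_modules folders, pnpm-style symlinks, or a PnP map.
  link(packages: Array<{locator: Locator; archive: Uint8Array}>): Promise<void>;
}
```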

6. YARN Plugins and Constraint Engine

Short description:

YARN splits commands into different packages called plugins, which are pre-installed and provide various functionalities. The codebase is organized into plugins to facilitate contributions and allow easy modification of specific features or bug fixes. YARN includes a constraint engine that enables the creation of custom rules for projects, such as enforcing consistent package dependencies across workspaces. The constraint engine has transitioned from Prolog to a new JS-based engine in YARN 4.0, making it more accessible for users. Additionally, the YARN website has undergone a complete redesign.

We also split the commands apart into different packages, which we call plugins, but it doesn't change anything for end users because those plugins are built by default into the yarn.js file that we distribute to our users. So for instance, the yarn install, yarn add, yarn remove, and yarn up commands are all put inside plugin-essentials, which contains all the commands critical to the YARN user experience. And all the commands that are related to the npm registry are put inside a different plugin, which is plugin-npm-cli. So you have yarn npm login, yarn npm info, yarn npm publish, all these commands in the same place.
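
For a sense of what a plugin looks like structurally, here is a rough sketch of a minimal plugin adding one command, modeled on the documented hello-world example; exact type names can differ between releases:

```ts
// Rough sketch of the plugin shape: a plugin is an object listing the
// commands (and optionally hooks) it contributes to the CLI.
import {Command} from 'clipanion';
import type {CommandContext, Plugin} from '@yarnpkg/core';

class HelloCommand extends Command<CommandContext> {
  static paths = [[`hello`]];          // invoked as `yarn hello`

  async execute() {
    this.context.stdout.write(`Hello from a plugin!\n`);
  }
}

const plugin: Plugin = {
  commands: [HelloCommand],
  // hooks: {...}                      // plugins can also register hooks
};

export default plugin;
```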

Over time, we decided to make a lot of different split points. So for instance, you have plugin-compat, which contains a bunch of compatibility rules that help both YARN and pnpm properly install projects that are not correctly listing their dependencies. You have plugin-constraints, which implements the constraints engine that I'm going to talk about a bit later. You have plugin-exec, which implements the exec protocol. We have a lot of different packages, and one question that you may have is, isn't it perhaps a little too much? The answer is no.

You see, YARN is always distributed as a single JavaScript file, as I mentioned, yarn.js, which contains all those plugins pre-installed. Our users don't need to see or care about the plugins that we provide. However, it provides a lot of value to contributors. If you are someone who wants to implement a feature or fix a bug, you just have to go into our repository and you will see the names and descriptions of the different plugins. That lets you quickly figure out the place you need to change in order to implement your feature or address the bug that you saw. Splitting the codebase is not about distribution, it's not about shipping it in multiple segments; it's about making sure that we delimit all the different pieces of YARN so that they can be more easily discovered. Now, we are going to talk a bit about interesting tidbits about how YARN is implemented, things that you may not know about us. YARN ships with a project linter; we call it the constraint engine. You can call it with yarn constraints, and it allows you to create custom rules that will be applied to your project.
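
As a hedged illustration of the kind of rule described next — assuming the yarn.config.cjs format used by the JS-based engine introduced in YARN 4 (helper names follow the constraints documentation, but double-check them against your version) — enforcing consistent versions across workspaces looks roughly like this:

```ts
// yarn.config.cjs — checked by `yarn constraints`.
// Hedged sketch: Yarn.dependencies() and dependency.update() follow the
// documented Yarn 4 constraints API; verify the names for your release.
module.exports = {
  async constraints({Yarn}) {
    for (const dependency of Yarn.dependencies()) {
      if (dependency.type === `peerDependencies`)
        continue;
      // Every other declaration of the same package must use the same range.
      for (const otherDependency of Yarn.dependencies({ident: dependency.ident})) {
        if (otherDependency.type === `peerDependencies`)
          continue;
        dependency.update(otherDependency.range);
      }
    }
  },
};
```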

So, for instance, in the YARN repository itself, we have a rule that enforces that no two workspaces in the same project can depend on the same package with different versions. That's something that a lot of projects want, but each time there are small subtleties in the actual implementation. For instance, at Datadog, we use a similar rule; however, we have some special cases that are allowed to have different versions. In order to do that, we just have to write a constraint and declare the special cases directly within the JS code. It used to be Prolog, so it used to be constraints.pro. However, the syntax was a bit too alien for our users, and that's completely fair. Prolog is a very interesting language, but its syntax is a little complicated. So, in YARN 4.0, we introduced a whole new JS-based engine, which is faster and uses the regular JavaScript or TypeScript syntax that you are used to. So, it's much easier now to write constraints for YARN 4.0. We recently completely redesigned the website.

7. YARN Website, Compatibility, and Performance

Short description:

yarnpkg.com showcases the beautiful redesign of the website. The documentation has been rewritten for improved clarity. Users are encouraged to provide feedback and contribute through Discord or by opening a PR. Starting with YARN 2, a package called @yarnpkg/fslib addresses the compatibility issues YARN 1 had with Windows. This library treats all paths as POSIX, ensuring compatibility and preventing bugs. The file system library also supports virtual layers, enabling the overlay of a virtual file system for specific use cases. Performance tracking is automated to support the development of YARN.

If you go on yarnpkg.com, you will see that it's really beautiful. Frankly, I'm super happy about what it now looks like. In order to do that, we actually hired a designer. We worked with a Ukrainian designer on the website for a couple of months. And as you can see, the result is really great.

We also rewrote most of the documentation in order to make sure that most pages were clearer and straight to the point. However, it is sometimes a bit difficult for us to put ourselves in the shoes of someone who is just getting started with YARN, of course, because we are maintainers, so we have a lot of context. So, if you see something that is not quite clear enough, feel free to ping us on Discord to let us know. And, of course, if you want to open a PR to propose something better, that's even better.

YARN implements a file system library. One problem that we had with YARN 1, and that we don't have any more with YARN 4, is that we had a lot of compatibility issues with people using Windows due to paths, especially the separators: slashes versus backslashes. In order to solve them in YARN 1.0, we had a lot of code in various places of the software saying, hey, if we are on Windows, then do the split using a backslash rather than a slash. And if we forgot to do that somewhere, it meant that we had a bug that silently shipped into production.

So, in YARN 2 and 4, we implemented a package called @yarnpkg/fslib that allows us to treat all paths as if they were POSIX. Everything uses regular slashes, and paths are properly typed, TypeScript-wise. By doing that, we can make sure that everything that goes into the Node fs module is in the syntax that we expect, with slashes, and then the library turns them into either slashes or backslashes depending on the actual system. Everything is centralized in this library, so the compatibility logic doesn't have to be hardcoded into the YARN business code itself.
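
In practice this looks roughly like the following sketch, assuming the npath/ppath helpers exported by @yarnpkg/fslib (names may vary slightly between releases):

```ts
// Hedged sketch of the portable-path idea, assuming the npath/ppath helpers
// exported by @yarnpkg/fslib.
import {npath, ppath, PortablePath} from '@yarnpkg/fslib';

// Convert whatever the OS gives us into the portable, slash-only form...
const projectCwd: PortablePath = npath.toPortablePath(process.cwd());

// ...and manipulate it with the portable helpers, which only ever see slashes.
const manifestPath = ppath.join(projectCwd, 'package.json' as PortablePath);

// Only at the boundary with the OS (or with Node's fs module directly)
// do we convert back to the native representation.
console.log(npath.fromPortablePath(manifestPath));
```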

Additionally, this file system library also supports virtual file system layers, which are fairly advanced and let you overlay a virtual file system on top of what the fs module would typically see. For instance, one place where we are using it is when we are building the documentation of YARN. We build a virtual file system that contains generated files for all the CLI commands of YARN. So, for each of those commands, we generate a file on demand that contains the documentation for that CLI command, using the source code as the source. It's quite powerful, although it is very advanced, of course, so it's not something that you want to use everywhere.
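
The idea of a layer can be sketched like this; the code below is purely illustrative and does not use the actual fslib API:

```ts
// Purely illustrative sketch of the "virtual layer" idea — not the fslib API.
// A layer answers reads for paths it owns and delegates everything else.
import {readFileSync} from 'node:fs';

type ReadFile = (path: string) => string;

// Generate documentation pages on demand for a hypothetical /virtual/ prefix.
function makeDocLayer(baseRead: ReadFile): ReadFile {
  return (path: string): string => {
    if (path.startsWith('/virtual/cli/')) {
      const command = path.slice('/virtual/cli/'.length).replace(/\.md$/, '');
      return `# yarn ${command}\n\nGenerated documentation for \`yarn ${command}\`.\n`;
    }
    return baseRead(path);  // fall through to the real file system
  };
}

const read = makeDocLayer(path => readFileSync(path, 'utf8'));
console.log(read('/virtual/cli/install.md'));  // served by the virtual layer
```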

We track our performance. One thing that is very important when you work on a project the size of YARN is to realize that you don't have the time to do everything yourself, and it's very important to have good automation that does the manual work for you. One of the places where we put some automation is in tracking the performance of YARN compared to both itself and other package managers.

8. YARN Testing, Compatibility, and Contributions

Short description:

YARN runs tests on NestJS and Gatsby to compare package manager install times. YARN also tracks compatibility with other projects and works with maintainers to address issues. YARN uses ZIP archives in its cache for efficient file access. YARN contributes the node_modules linker and the shell interpreter to pnpm. YARN is involved in the Corepack initiative and in working groups such as the one defining loaders in Node.js. Feel free to ask questions on Discord or find me on Twitter and GitHub. I hope you enjoyed the talk!

So, every couple of hours, I think it's every six hours, something like that, we are running a set of tests installing both NestJS and Gatsby, and we see how much time it takes for all package managers, depending on multiple parameters. For instance, when there is a full cold install, when there is already a lockfile but no cache, when there is a cache but no lockfile, and when there is everything at once. And it lets us see trends. Nowadays, all package managers are fairly equal in speed; in particular, YARN 4 and pnpm are very similar in install time. However, it's interesting for us to see whether we shipped something that contains a performance regression, or whether one of the competitors managed to find a way to significantly increase their speed, in which case we investigate as well so we can implement that. It's quite interesting to do that automatically.

Still in the domain of automation, we are also tracking compatibility with many projects in the open source ecosystem. So, for instance, every four hours we are running tests on ESBuild to make sure that we can install ESBuild with YARN and that the project we thus install can be used just fine. So, we not only run YARN's own tests, but we also run the ESBuild, Docusaurus, and ESLint test suites on a regular basis. Each time one of those builds turns red, and it sometimes happens, for instance when a project starts relying on a dependency that they forgot to declare in their package.json, we start working with the maintainers to let them know about the potential issue inside their software. We figure out whether it's a problem with their package or with YARN. It sometimes is a YARN problem, although it's rare, and we work together in order to address it.

The YARN cache contains ZIP archives and not TGZs. The reason for that is that TGZ archives must be completely decompressed in order to access a single file, whereas ZIPs allow you to access one file inside the whole archive without having to decompress everything. YARN lets you directly run your scripts by dynamically requiring files from the cache, and in order to do that, we need to be able to pinpoint the file that we access inside the cache, which is why we are using ZIPs and not TGZs. We considered using a custom format; however, ZIP archives are natively supported by most OSes and third-party tools, so it's quite handy to be able to just use ZIPs. We also implemented a ZipFS VS Code extension in order to add ZIP support to VS Code.
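
To see why, here is a minimal sketch (with a hypothetical readFileFromTgz helper) of what reading a single file out of a .tgz requires; a ZIP, in contrast, stores a central directory at the end of the archive, so a reader can seek directly to the one entry it needs:

```ts
// To read a single file from a .tgz you must gunzip the WHOLE archive, then
// walk the tar blocks. The file names used in the example call are hypothetical.
import {readFileSync} from 'node:fs';
import {gunzipSync} from 'node:zlib';

function readFileFromTgz(tgzPath: string, wanted: string): Buffer | null {
  const tar = gunzipSync(readFileSync(tgzPath));  // no way around decompressing everything
  let offset = 0;
  while (offset + 512 <= tar.length) {
    const name = tar.subarray(offset, offset + 100).toString().replace(/\0.*$/, '');
    if (name === '') break;                                      // end of archive
    const size = parseInt(tar.subarray(offset + 124, offset + 136).toString(), 8);
    if (name === wanted)
      return tar.subarray(offset + 512, offset + 512 + size);
    offset += 512 + Math.ceil(size / 512) * 512;                 // skip to next header
  }
  return null;
}

// e.g. readFileFromTgz('lodash-4.17.21.tgz', 'package/package.json');
```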

There are bits of YARN in pnpm. If you are a pnpm user, you might be interested in knowing that if you set the node-linker option in your .npmrc to the hoisted value, it will tell pnpm to install the project using a regular node_modules folder, which can be handy for compatibility purposes, and that's powered by the YARN linker. I mentioned before that we have a node_modules linker, a pnpm linker, and a PnP linker. Both the PnP linker and the node_modules linker are reused by pnpm so that their users can also benefit from those types of installs. If you are using those settings, you are using YARN. We also have a shell interpreter that is used by pnpm. If you have the shell-emulator setting toggled to true, pnpm will take it to mean that you want to run all the scripts in your package.json through a special interpreter that works on both Linux and Windows, without having to configure special shells on your system. That's something that is implemented by YARN in JavaScript. We implemented this portable shell that works across all systems, and we published it as a package so it can be reused by other package managers. YARN also contributes to Node.js.
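
If you want to try the portable shell outside of a package manager, usage looks roughly like this, assuming the execute entry point exposed by @yarnpkg/shell (check the package documentation for the exact signature):

```ts
// Hedged sketch of using the portable shell as a library; I'm assuming the
// execute() export of @yarnpkg/shell — verify against the package docs.
import {execute} from '@yarnpkg/shell';

async function main() {
  // The same command line runs on Linux, macOS, and Windows, because the
  // shell semantics (&&, redirections, globs, ...) are emulated in JS.
  const exitCode = await execute(`echo "building..." && node --version`);
  process.exit(exitCode);
}

main();
```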

So, for instance, the Corepack initiative, which aims to let you easily use your preferred package manager regardless of the project that you are working on, is something that we started ourselves. We are working with Node.js in order to make it stable and enabled by default. We are also working in working groups where we are attempting to tackle large issues that also affect other projects. For instance, the loaders working group, which intends to define what a loader is in the context of Node.js. You might be familiar with the concept of loaders, for instance for Webpack. What does it mean for Node.js? That's something that we are working on, along with a couple of other folks on the Node.js project.

Thanks for your time. This talk was short, so I'm sure you have a lot of questions. Feel free to ask them on Discord, and you can also find me on Twitter and GitHub using the arcanis handle. I hope you enjoyed this talk. Have a good day.
