Yarn in Depth: Why & How


Since 2017, Yarn has proved itself a pillar of JavaScript development, incubating numerous features our ecosystem now heavily relies on. As the years passed and competitors improved, so did Yarn, and it's now time to dive into the features and tradeoffs that make Yarn a truly unique gem of the JavaScript ecosystem.

33 min
01 Jul, 2021

Video Summary and Transcription

Yarn is not just a package manager, it intends to be a project manager with a focus on simplicity and a good developer experience. Yarn's impact on workflows and project management has been positive, improving scalability and release management. It offers features like local fixes, compressed packages, and sharing packages between projects. Yarn's infrastructure and extensive testing ensure compatibility and catch regressions. Yarn is modular, with plans for version 3 and a more powerful plugin ecosystem. The choice between npm and Yarn depends on the project's configuration.


1. Introduction to Yarn and Core Values

Short description:

Today, we're going to talk about what Yarn is and what it can bring you. Yarn is not just a package manager; it intends to be a project manager. It lets you manage scripts, split your application into standalone modules, and more. We rely on our contributors to make Yarn sustainable, and we see it as a project rather than a product. Soundness is a key value when merging PRs, ensuring that Yarn detects errors and doesn't make uncontrolled assumptions.

Hi everyone. For those who don't know me, my name is Mael. I currently work at Datadog, and I've been leading Yarn's development for a few years now. Today, we're going to talk extensively about what Yarn is and what it can bring you. While we won't be able to go into each and every quality-of-life improvement it offers, I hope that by the end of this talk you will have a better idea of what makes this project unique in our ecosystem.

First, we should discuss what Yarn actually is. If you ask anyone, they will likely tell you that Yarn is a package manager for JavaScript, and they would be mostly right, but that's only part of the story. Yarn is not just a package manager; it intends to be a project manager. Indeed, if you think about it, Yarn lets you manage scripts and split your application into standalone modules. But as we will see later, it can also be used to manage your release cycles, to monitor how your team uses your scripts, or even to enforce standards across your monolith. All these tasks go far beyond the typical package manager, and every release pushes the boundaries further by introducing new features.

So, project managers: that sounds nice, but how do we get there? What do we look for when merging PRs? Let's discuss our core values. The first thing to realize is that we're a community of contributors. Open source is a very taxing environment, most projects struggle to find ways to make their work sustainable, and Yarn isn't exempt. To help with that, we rely a lot on our contributors to be the change they want to see and to contribute back to the project they like. In this sense, we don't really see Yarn as a product; we see it as a project. In practice, it means that our core team spends as much time working on our infrastructure as on the product itself. Recently, we moved from Webpack to esbuild to make building Yarn easier. Multiple commands let you build parts of the Yarn binaries from source, so you can easily try out independent features. Yarn is really all about making it possible for you to experiment well past what we as a team could offer by ourselves.

A second important value we always keep in mind when merging PRs is soundness. Yarn must tell you if something is wrong in your application. It shouldn't let you make mistakes, and it shouldn't let them go unnoticed. It must not make uncontrolled assumptions. This might sound rigid, since you may see more errors than you used to, but it's really critical: whether you author applications or libraries, you need confidence that something that works now will also work in production or when installed by your consumers.

2. Yarn's Impact and Workflows

Short description:

Yarn helps users understand the tool and guides them along the JavaScript DevOps path. Default behaviors and user experience are crucial. Using every feature it ships to develop Yarn itself has had a positive impact on the project. A dedicated release workflow has improved scalability and release management. Zero-Installs keep the project cache within the repository.

Another one is good practices. The JavaScript landscape is large, changes fast, and has many very opinionated people. As a package manager, we are in a unique position to help our users understand the tools they are using and to guide them along the hard path. Using Yarn should not only solve a practical need; it should also help you learn and become a better engineer along the way, in terms of JavaScript DevOps.

We've reached the last value in this set: default behaviors, which directly tie into developer experience. Most of our users will only ever use the default commands of our tool, and that's a good thing: they shouldn't have to remember a bunch of command-line flags just to get one specific behavior. For instance, the fact that you can run any script just by prefixing its name with "yarn" on the command line may look like a very simple thing, but it is surely one of the reasons some people find Yarn appealing. Good defaults are crucial to the Yarn user experience.
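The "script without run" default can be sketched as a tiny dispatch function. This is a minimal sketch, not Yarn's actual CLI code: the `resolveInvocation` name, the short built-in command list, and the precedence rule (built-in commands win over same-named scripts) are assumptions made for illustration. The empty-arguments case mirrors how running a bare `yarn` performs an install.

```typescript
// Minimal sketch (hypothetical names): letting "yarn build" mean
// "yarn run build" without the user typing "run".
type Manifest = { scripts?: Record<string, string> };

type Invocation =
  | { kind: "builtin"; command: string; args: string[] }
  | { kind: "script"; script: string; args: string[] }
  | { kind: "unknown"; name: string };

// Assumed subset of built-in commands; the real CLI has many more.
const BUILTIN_COMMANDS = new Set(["add", "install", "remove", "run", "up"]);

function resolveInvocation(argv: string[], manifest: Manifest): Invocation {
  const [name, ...args] = argv;
  // With no arguments at all, fall back to installing.
  if (name === undefined) {
    return { kind: "builtin", command: "install", args: [] };
  }
  // Built-in commands take precedence over scripts with the same name.
  if (BUILTIN_COMMANDS.has(name)) {
    return { kind: "builtin", command: name, args };
  }
  // Otherwise look the name up in package.json "scripts" (own keys only,
  // so names like "toString" don't match Object.prototype members).
  const scripts = manifest.scripts ?? {};
  if (Object.prototype.hasOwnProperty.call(scripts, name)) {
    return { kind: "script", script: name, args };
  }
  return { kind: "unknown", name };
}
```

Giving built-ins precedence is one plausible design. Under that rule, a script that shares a name with a built-in command still needs an explicit `run`.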

We've now seen a bit of what Yarn claims to be. Now, let's talk a bit about DevOps: what does Yarn actually do for you, practically speaking? We are going to go over two interesting stories of projects that adopted it. One is an open source project, and the other is an internal application we use at my company. Unsurprisingly, the first one is Yarn itself. Before we dive in, let me tell you a funny story. Back in Yarn 1, we didn't actually use workspaces to develop Yarn. It was a real problem for us: not only were problems hidden away from the maintainers themselves, but we also weren't directly confronted with the value some features might have. As a result, we didn't merge some things even though we perhaps should have, just because we couldn't see how impactful they would actually be. When you use workspaces, it's very apparent that, for example, you need to be able to run a script in all your workspaces at once. But since we didn't use workspaces, it didn't seem like a huge deal to us at the time. Nowadays, we have an informal rule that the Yarn team needs to use every feature shipped in the core, and I believe that forcing ourselves to use everything we ship within Yarn itself has had a strong positive impact on the work we've done so far.
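Running a script in every workspace usually has to respect the dependencies between workspaces, so that a library is built before the workspaces that use it. The sketch below shows that ordering step using Kahn's topological sort. The `WorkspaceNode` shape and the function name are assumptions for illustration, not Yarn internals.

```typescript
// Minimal sketch: order workspaces so each one runs after the workspaces it
// depends on (Kahn's algorithm). WorkspaceNode is a hypothetical shape.
type WorkspaceNode = { name: string; dependencies: string[] };

function topologicalOrder(workspaces: WorkspaceNode[]): string[] {
  const names = new Set(workspaces.map((w) => w.name));
  const inDegree = new Map<string, number>();
  const dependents = new Map<string, string[]>();
  for (const w of workspaces) {
    inDegree.set(w.name, 0);
    dependents.set(w.name, []);
  }
  for (const w of workspaces) {
    for (const dep of w.dependencies) {
      if (!names.has(dep)) continue; // external packages don't affect ordering
      inDegree.set(w.name, inDegree.get(w.name)! + 1);
      dependents.get(dep)!.push(w.name);
    }
  }
  // Start with workspaces that depend on no other workspace.
  const ready = workspaces.filter((w) => inDegree.get(w.name) === 0).map((w) => w.name);
  const order: string[] = [];
  while (ready.length > 0) {
    const current = ready.shift()!;
    order.push(current);
    for (const next of dependents.get(current)!) {
      const remaining = inDegree.get(next)! - 1;
      inDegree.set(next, remaining);
      if (remaining === 0) ready.push(next);
    }
  }
  // Anything left unvisited is part of a cycle.
  if (order.length !== workspaces.length) {
    throw new Error("Dependency cycle detected between workspaces");
  }
  return order;
}
```

A runner could then execute the script in each workspace in this order. Workspaces with no path between them could also run in parallel, which this sketch does not attempt.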

So, now let's talk about workflows. The first one we are going to discuss is release cycles. Our previous process back in Yarn 1 was very simple: we had a single file at the root of the repository, and each PR was expected to add one line to it. It worked fine, but after switching to a monorepo it wouldn't have scaled very well, since we needed the ability to release each workspace, each package, by itself. So we developed the release workflow, which is not unlike the Changesets package you may know. The idea is that each PR we merge also has to include a little file, created by Yarn itself, that lists all the workspaces changed by the PR and whether they need to be part of the next release. Our CI validates these files' contents, and at release time we simply instruct Yarn to aggregate all the version files into classic package bumps. Another workflow we are using is Zero-Installs. The idea is that we decided to keep the cache of the project inside the repository itself.
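The aggregation step of the release workflow described above can be sketched as follows. Each PR contributes a record mapping changed workspaces to the bump they need. At release time the records are folded together, each workspace gets the strongest bump requested, and that bump is applied to its version. The record format, the `decline` value, and the function names are assumptions for illustration, not the files Yarn actually writes.

```typescript
// Hypothetical sketch: fold per-PR "version files" into one bump per
// workspace, then apply it. Not Yarn's real file format.
type Bump = "decline" | "patch" | "minor" | "major";
type VersionFile = Record<string, Bump>; // workspace name -> requested bump

const STRENGTH: Record<Bump, number> = { decline: 0, patch: 1, minor: 2, major: 3 };

function aggregateBumps(files: VersionFile[]): Map<string, Bump> {
  const result = new Map<string, Bump>();
  for (const file of files) {
    for (const [workspace, bump] of Object.entries(file)) {
      const previous = result.get(workspace);
      // Keep the strongest bump any PR asked for.
      if (previous === undefined || STRENGTH[bump] > STRENGTH[previous]) {
        result.set(workspace, bump);
      }
    }
  }
  return result;
}

function applyBump(version: string, bump: Bump): string {
  // Assumes plain MAJOR.MINOR.PATCH versions (no prerelease tags).
  const [major, minor, patch] = version.split(".").map(Number);
  switch (bump) {
    case "major":
      return `${major + 1}.0.0`;
    case "minor":
      return `${major}.${minor + 1}.0`;
    case "patch":
      return `${major}.${minor}.${patch + 1}`;
    default:
      // "decline": the workspace changed but is deliberately not released.
      return version;
  }
}
```

Taking the maximum means two PRs asking for a patch and a minor bump on the same workspace produce a single minor release, instead of two separate releases.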

3. Yarn's Impact on Projects and Datadog

Short description:

For packages, we store one single archive in the repository, allowing instant access and seamless branch switching for contributors. Storing dependencies in the repository is optional, but recommended. Yarn's constraints feature ensures alignment of workspaces and enforces patterns across them. Open source projects benefit from Yarn's first-party support, while internal projects like those at Datadog can also leverage its capabilities. For example, the telemetry integration plugin monitors script execution and response time, helping identify and prioritize improvements. Keeping a copy of packages in the repository prevents deployment blocks and reduces CI install times.

For each package we depend on, we only need to keep a single archive. As a result, anyone who clones the repository can instantly start working with Yarn. More importantly, it means our contributors can switch from one branch to another at almost no cost in terms of context switching. That is extremely valuable for us because, as I mentioned, we really want contributors to be able to submit lots of different pull requests and to feel confident working on different parts of the code. Being able to switch from one to the other is really important to the contribution workflow.

I still feel the need to mention one last thing about this: this is our trade-off and not necessarily yours. You don't have to store your dependencies in your repository if you don't want to; we just felt it worked well for us. I would certainly recommend it, but it's not forced on you.

One problem when you have a lot of workspaces (and we have a lot; I think something like 44 at the moment) is that it becomes difficult to make sure all the workspaces in your project are aligned in their configuration. For example, how can you ensure that all the workspaces use the exact same version of each dependency? It's far too easy to upgrade one workspace without upgrading the others. So we released a feature called constraints, which you may have heard about because it uses a fairly unusual programming language for its configuration: Prolog. Thanks to constraints, we are able to enforce particular patterns across all workspaces in very few lines. Better yet, Yarn can often apply the fixes itself. For example, in the Yarn repository, we use this feature to enforce the same version of each dependency across all workspaces. We also use it to make sure that all our package.json files list the right build scripts, that they all list the right repository field, that none of them are marked as private, and this kind of stuff.
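To show the kind of rule involved, here is the "same version everywhere" check, plus an autofix, written in TypeScript. This is only an illustration of the logic. Yarn's constraints at the time of the talk were expressed in Prolog, and none of the names below are part of Yarn's API. The sketch also assumes that the most commonly used range is the correct one.

```typescript
// Illustrative rule (not Yarn's constraints API): every workspace must declare
// the same range for a given dependency; the most common range is "expected".
type WorkspaceManifest = { name: string; dependencies: Record<string, string> };

type Violation = { workspace: string; dependency: string; found: string; expected: string };

function checkSameRanges(manifests: WorkspaceManifest[]): Violation[] {
  // Count how often each range appears for each dependency.
  const counts = new Map<string, Map<string, number>>();
  for (const manifest of manifests) {
    for (const [dependency, range] of Object.entries(manifest.dependencies)) {
      const perRange = counts.get(dependency) ?? new Map<string, number>();
      perRange.set(range, (perRange.get(range) ?? 0) + 1);
      counts.set(dependency, perRange);
    }
  }
  // Pick the most common range per dependency (first seen wins ties).
  const expected = new Map<string, string>();
  counts.forEach((perRange, dependency) => {
    let best = "";
    let bestCount = 0;
    perRange.forEach((count, range) => {
      if (count > bestCount) {
        best = range;
        bestCount = count;
      }
    });
    expected.set(dependency, best);
  });
  // Report every declaration that differs from the expected range.
  const violations: Violation[] = [];
  for (const manifest of manifests) {
    for (const [dependency, range] of Object.entries(manifest.dependencies)) {
      const wanted = expected.get(dependency)!;
      if (range !== wanted) {
        violations.push({ workspace: manifest.name, dependency, found: range, expected: wanted });
      }
    }
  }
  return violations;
}

// Autofix: rewrite offending ranges, returning new manifests (inputs untouched).
function applyFixes(manifests: WorkspaceManifest[], violations: Violation[]): WorkspaceManifest[] {
  return manifests.map((manifest) => {
    const dependencies = { ...manifest.dependencies };
    for (const violation of violations) {
      if (violation.workspace === manifest.name) {
        dependencies[violation.dependency] = violation.expected;
      }
    }
    return { ...manifest, dependencies };
  });
}
```

A declarative constraints engine lets you state the rule and derive both the check and the fix from it. The sketch hand-codes each half separately, which is exactly the boilerplate such an engine removes.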

So those three examples give you a good idea of the value open source projects can derive from Yarn. In general, it means they no longer need tools like Lerna or Changesets, because those workflows now receive first-party support. That said, even internal projects can benefit from Yarn as well. To talk about this, I'm going to discuss Datadog. I work at Datadog, and Datadog is a huge web application along with a sizable JavaScript infrastructure that supports everything from linting to deployments. Yarn holds a really central place in this mechanism, and we'll go over three examples.

The first one is the telemetry integration. If you don't know Datadog, we make cloud monitoring software that extracts signals out of arbitrary metrics. Given this DNA, it probably won't surprise you that we also monitor the way our developers interact with our infrastructure, in particular in terms of response time. To achieve this, we wrote a plugin for Yarn that watches all the scripts that get executed and how much time they take, and sends it all to a dashboard. All this data then helps us, the frontend platform team, identify potential bottlenecks before they become a problem, and generally prioritize our work.
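At its core, the telemetry idea means wrapping each script execution, measuring how long it takes, and handing the measurement to a reporter. In a real plugin the wrapping would hook into Yarn's plugin system, which is not shown here. The runner, the reporter, and all names below are hypothetical stand-ins that keep the logic self-contained. The clock is injectable so that the timing can be tested.

```typescript
// Minimal sketch: time a script run and report it, even when it fails.
type ScriptRunner = () => Promise<number>; // resolves to the exit code
type Measurement = { script: string; durationMs: number; exitCode: number };
type Reporter = (measurement: Measurement) => void;

async function runWithTelemetry(
  script: string,
  run: ScriptRunner,
  report: Reporter,
  now: () => number = () => Date.now(),
): Promise<number> {
  const start = now();
  let exitCode = 1; // assumed exit code if the runner throws
  try {
    exitCode = await run();
    return exitCode;
  } finally {
    // Runs on success and on failure, so crashes show up on the dashboard too.
    report({ script, durationMs: now() - start, exitCode });
  }
}
```

Reporting from `finally` is the important design choice here. Slow scripts that eventually fail are often the bottlenecks worth finding.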

As we saw with the Yarn repository itself, checking in dependencies has powerful effects on one's ability to contribute efficiently to a project. But even in corporate environments, this pattern has its uses. From our data, the remote registry goes down approximately once a month, and by keeping our own copy of the packages, we're never blocked during our deployments. Additionally, despite a very high volume of commits on our main repository, our CI doesn't waste much time running installs, because all the packages are already inside the repository: once you clone it, everything is there. And since the clone is usually cached between CI runs, everything is already installed in practice. All in all, instead of spending minutes installing the project in every single CI run, it's now just a matter of seconds.
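The resilience argument comes down to lookup order. Archives committed to the repository are served first, and the remote registry is only contacted for archives that are missing. This is a hypothetical sketch: the key format and the `fetchRemote` callback are stand-ins, and Yarn's real cache layout is not modeled.

```typescript
// Hypothetical sketch: serve archives from the repository's copy first and
// only fall back to the registry when an archive is missing.
type ArchiveStore = Map<string, Uint8Array>;
type FetchResult = { data: Uint8Array; source: "repository" | "registry" };

async function fetchArchive(
  key: string,
  store: ArchiveStore,
  fetchRemote: (key: string) => Promise<Uint8Array>,
): Promise<FetchResult> {
  // Committed archives are served without touching the network at all.
  const local = store.get(key);
  if (local !== undefined) {
    return { data: local, source: "repository" };
  }
  // Only archives missing from the repository require the remote registry.
  const data = await fetchRemote(key);
  store.set(key, data); // keep it so it can be committed for the next clone
  return { data, source: "registry" };
}
```

When every archive is committed, the fallback branch never runs, so a registry outage cannot block a deployment.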

4. Using Local Fixes and Yarn's Additional Features

Short description:

In a large project with many dependencies, it's important to have a simple way to use local fixes while maintaining auditability. Yarn supports this natively, integrated with its cache system, which eliminates the need for third-party tools. Additionally, Yarn offers features like compressed packages, sharing packages between projects, installing packages straight from git repositories, and scripts that are portable across platforms.

And one last item I want to mention. We have a large project and a lot of dependencies. Some of them have bugs, and we try to fix them and submit the patches upstream. But until the fixes are merged, we need a simple way to use them locally while keeping some kind of auditability, because we don't want, for example, to just edit files directly within the node_modules folder.

Some third-party tools exist just for that, for example the patch-package project. But Yarn supports this out of the box, in a way that's directly integrated with its cache system. That's super handy, because we don't have to rely on third-party tools that would not be as well integrated.
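What makes a local fix auditable is that the patch lives in the repository as data. It names the file it touches and records the exact text it expects to replace, so that applying it to a dependency that has since changed fails loudly instead of silently doing nothing. The toy sketch below uses simplified search-and-replace hunks instead of unified diffs, and every name in it is hypothetical. It does not reproduce how Yarn's own `yarn patch` support works internally.

```typescript
// Toy sketch: apply recorded, reviewable hunks to a dependency's files,
// refusing to proceed when a hunk no longer matches.
type Hunk = { file: string; before: string; after: string };

function applyPatch(files: Map<string, string>, hunks: Hunk[]): Map<string, string> {
  const patched = new Map(files); // never mutate the pristine package contents
  for (const hunk of hunks) {
    const content = patched.get(hunk.file);
    if (content === undefined) {
      throw new Error(`Patch targets missing file ${hunk.file}`);
    }
    const index = content.indexOf(hunk.before);
    if (index === -1) {
      // The dependency probably changed (e.g. after an upgrade): fail loudly.
      throw new Error(`Patch no longer applies to ${hunk.file}`);
    }
    patched.set(
      hunk.file,
      content.slice(0, index) + hunk.after + content.slice(index + hunk.before.length),
    );
  }
  return patched;
}
```

Keeping the original files untouched and deriving the patched copy mirrors the idea of leaving node_modules alone. It also matches the soundness value from earlier: a stale patch becomes an error rather than a silent no-op.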

So the use cases we went over give you a good overview of what Yarn has to offer, but keep in mind that they are only examples. For instance, we didn't talk about how Yarn can keep your packages compressed on disk and share them between projects if that's what you prefer. We also didn't cover how you can install packages straight from git repositories, or how your scripts are automatically made portable across both Windows and POSIX. And even that is only a quick list. I really suggest you look at the documentation to find a lot of gems you might not even be aware of.

5. Yarn's Development and Infrastructure

Short description:

Yarn's secret sauce lies in its infrastructure and the layers built around it. The extensive end-to-end tests, which include popular third-party projects, ensure compatibility and catch regressions. Building the code base with TypeScript provides benefits in avoiding mistakes and increasing confidence in PRs. Yarn's testing strategy shifted from unit tests to integration tests, simplifying maintenance and facilitating contributions. Trusting the infrastructure is key to efficient code development.

So far we've discussed what Yarn is, what it can bring you and your organization, but now I want to take the opportunity to go further than usual and tell you why it works so well. So we are going to make a quick dive into Yarn's development itself.

A large part of our secret sauce is our infrastructure. It may come as a surprise that the most valuable asset we have isn't Yarn itself but rather the layers built around it. But that's really the case. You probably noticed that Yarn's iteration speed was far beyond the industry standard. And a big reason for that is that we have spent a lot of resources solidifying our foundations from the get-go.

One example of this is the end-to-end tests. Every four hours, a comprehensive set of tests is run. But rather than testing Yarn's behavior, we actually test popular third-party projects from the ecosystem. For example, we test Next.js, Create React App, Gatsby, Angular, and Jest, and those are only some of the big names we monitor. If any of them accidentally ships something incompatible with Yarn, or, the other way around, if Yarn accidentally ships a regression, we are aware of it less than four hours later and can immediately work with the maintainers to find a solution. Even if such breakages are uncommon, having this mechanism in place means we can afford to be slightly less conservative than usual, because we know our end-to-end tests will catch it if something in the ecosystem breaks due to our work.

Similarly, we made a huge effort to build our codebase using the latest available tools, including TypeScript, and I can't stress enough the benefits it brought us. Not only does it help us, as the core team, avoid silly mistakes in our own PRs, it also raises our confidence that a PR from someone contributing to Yarn for the first time won't have unforeseen effects. And finally, from the perspective of a first-time contributor, it's super handy to have the types available and to get all the benefits a good IDE with TypeScript support can bring.

Finally, our last major release also came with a complete overhaul of our testing strategy. Back in Yarn 1, we had a bunch of unit tests: we instantiated various classes from the core and called their methods to check their behaviors. It worked well enough for a time, but this unit testing approach proved difficult to maintain in the long term. Refactors were practically impossible, because they would have required rewriting all the tests, since every test relied on the architecture. So for Yarn 2, we decided to try something different and went toward integration tests. We still rewrote all the tests incrementally, but this time we updated them to use the CLI binary directly, exactly like real users would. This very simple change in concept allowed us to remove a lot of maintenance costs while making it much easier for external contributors to understand how to write tests. All they have to do now is literally write CLI commands.

So I think the lesson here is very simple: to write code efficiently, you must trust your infrastructure. To get there, you sometimes need to accept upfront costs. They let you reach a faster iteration speed, and they decrease the risk of burnout for a team that would otherwise ship features without being fully convinced that nothing will break. Another reason why Yarn is Yarn is its architecture, more precisely the way it is completely modular. As I mentioned earlier, Yarn used to be a monolith.

6. Yarn's Modularity and Future Plans

Short description:

Workspaces didn't exist at the beginning, but we made Yarn modular. Yarn's architecture makes it easy to extend with new functionalities. Yarn 3 is the next major version, with improvements, new features, and a more powerful plugin ecosystem. We are also looking into making Yarn sustainable through a sponsor program. Yarn is alive and in better shape than ever before.

Workspaces didn't exist at the beginning, so we didn't use them. Yarn was just one large codebase, and everything was importing classes from various files. But when we worked on Yarn, it became clear that some parts of the application would benefit from being independent from the rest, if only to prevent them from accidentally importing unrelated components they shouldn't even be aware of. So one of the first things I worked on was making Yarn modular.

Fast forward to today: we now have a core containing all the critical algorithms and exposing a bunch of interfaces that modules can implement however they want. Most of the features you actually see in Yarn today are implemented this way. The communication with the npm registry is a module. The package tarball generation is a module. And Plug'n'Play itself is a module. It's only at release time that all those modules are bundled together to yield the CLI that you know.
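The core/module split described above can be pictured as a core that only knows an interface, plus modules that contribute implementations of it. The sketch below is a simplified illustration under that assumption. `PackageFetcher`, `CoreModule`, and `Core` are hypothetical names and are not Yarn's real interfaces. The stand-in npm module returns a description string instead of downloading anything.

```typescript
// Simplified sketch of a core/module split (names are illustrative only).

// The core only knows this contract...
interface PackageFetcher {
  supports(reference: string): boolean;
  fetch(reference: string): Promise<string>;
}

// ...and modules contribute implementations of it.
interface CoreModule {
  name: string;
  fetchers: PackageFetcher[];
}

class Core {
  private readonly fetchers: PackageFetcher[];

  constructor(modules: CoreModule[]) {
    // "Bundling" here simply means collecting every module's contributions.
    this.fetchers = modules.flatMap((mod) => mod.fetchers);
  }

  async fetch(reference: string): Promise<string> {
    // The core delegates to the first module that claims the reference.
    const fetcher = this.fetchers.find((f) => f.supports(reference));
    if (fetcher === undefined) {
      throw new Error(`No module can fetch "${reference}"`);
    }
    return fetcher.fetch(reference);
  }
}

// Stand-in for an npm-registry module.
const npmModule: CoreModule = {
  name: "npm",
  fetchers: [
    {
      supports: (reference) => reference.startsWith("npm:"),
      fetch: async (reference) => `archive of ${reference.slice("npm:".length)} from the registry`,
    },
  ],
};
```

Support for a new protocol then becomes one more module passed to the constructor, with no change to `Core`.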

Interestingly, this architecture also makes it very easy to extend Yarn with new functionality. For instance, the focus command as we implemented it is literally a hundred lines inside its own module. You could implement it yourself if you wanted to. And that's really important, because if a behavior doesn't match your expectations, you don't have to just live with it: you can implement a new one and let us know how it goes. To give you an example, in the past few months at least two community members have started to solve their monorepo deployments by offering plugins dedicated to their use case. Thanks to their experience, we get better insight into how deployments should work, and eventually we'll be able to build a standard workflow upon those experiments.
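The heart of a focus-style command is a reachability question: starting from the workspaces you care about, which other workspaces do they depend on, transitively? Only those need to be installed. This hypothetical sketch computes that set with a depth-first walk over a workspace dependency graph. It does not reflect how Yarn's actual focus command is implemented.

```typescript
// Hypothetical sketch: keep only the workspaces reachable from the roots.
// graph maps each workspace name to the names it depends on.
function focusClosure(graph: Map<string, string[]>, roots: string[]): Set<string> {
  const keep = new Set<string>();
  const stack = [...roots];
  while (stack.length > 0) {
    const current = stack.pop()!;
    if (keep.has(current)) continue; // already visited
    keep.add(current);
    for (const dep of graph.get(current) ?? []) {
      // Only follow edges to other workspaces; external packages are
      // installed normally and are not part of the closure.
      if (graph.has(dep)) stack.push(dep);
    }
  }
  return keep;
}
```

Everything outside the returned set, such as a documentation workspace that nothing depends on, can be skipped during the install.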

So where do we see Yarn in the future? No one can really predict what's going to happen, but I can already tell you what's currently happening. We are working on the next major version of Yarn, which will be Yarn 3. It will feature a host of improvements, clean up some behaviors in our CLI, add new features, improve performance on commonly used commands, and make our plugin ecosystem even more powerful. At the same time, we are starting to look into making Yarn sustainable through a sponsorship program. As I mentioned earlier, open source projects require a lot of effort, resources, and time, and that's where you can help us. If you or your team are interested in sponsoring some of the time we spend or some of the releases we make, please contact us, and we would be happy to discuss terms that would be favorable to both parties.

All this to say, Yarn is very much alive, and I would say it's in better shape than ever before. Our team is aligned, our objectives are clear, and our take is sound. Certainly, we have taken a more opinionated course than we used to, and we think that for most people it will still be a net positive, knowing that the tool will protect you from your mistakes. Of course, different people may prefer different kinds of tools, and that's fine, because no single project has to please everyone. Right, so we've reached the end of this fast presentation; I hope you enjoyed it. Now I'll be happy to answer any questions you might have on this subject. Thank you.

Hey there Mael, really great talk, we enjoyed it immensely. I think one of the first orders of business is to look at the results of your poll question, which I thought was really interesting.

QnA

Yarn's Core Values and Q&A

Short description:

What do you look for in a package manager? Yarn focuses on providing a good developer experience with simplicity as a core value. Yarn 2 addresses space-saving with a global store and compressed packages. Yarn version 1 is safe to install even on a mono repo with pre-existing node modules. Yarn is currently happy with Node but is exploring running Yarn inside a browser. Migrating from a multi-repo npm-based system to a mono-repo with Yarn is being improved with tools and features. Being the maintainer of a widely adopted open-source project comes with pressure, balancing feature requests, and maintaining the project's vision. Yarn has gained many contributors and a thriving community engagement.

What do you look for in a package manager? You asked the audience, and to be honest, I was surprised. Not that simplicity matters, but that 80 percent chose it. As you build and maintain Yarn, are these the core values you try to uphold and make sure you deliver?

As I mentioned in the talk, one of the main things we look for is to provide a good developer experience. So I'm not entirely surprised to see simplicity win with a sizeable lead over the other options, because it's something we spend a lot of time on, for example making sure a script can be run with Yarn without the run keyword, this kind of thing. It works well, so yeah, I'm happy to see that people like it too.

Yeah, definitely interesting results. We had some questions come in from the crowd, so get ready. AC asks: one of the great benefits of pnpm is the amount of space it saves on your workstation; does Yarn 2 address this? Yeah, so one of the interesting things with PnP is that we can have a global store that contains all the packages, and instead of copying all the files into each project, we can just reference this main store directly. We have two ways of doing this. The first one still makes the copy, but uses copy-on-write on systems that support it, or hardlinks. The other way is to simply generate the file that references the global cache, so you have literally nothing inside your project. On top of that, all the packages installed by Yarn are compressed as zip archives, so for example, going from v1 to v2, installing Create React App takes roughly a third of what it did before.

Interesting, okay, so that also contributes to the speed element. I'm not quite sure how to pronounce this username, but cinos asks: when using Yarn version 1, is it safe to run yarn install on a monorepo with pre-existing node_modules? Sure, it really should be. There are some bugs in v1 that have been fixed in later versions; right now we are at 2.4 and about to start releasing release candidates for 3.0, so v1 is a few versions old. But it should be fairly safe to install a project even if there's already a node_modules folder.

Cool. William asks: since Yarn is now fully TypeScript, have you considered using Deno instead of Node? Not really. We are currently fairly happy with Node. Something we are currently looking into is whether we could run Yarn inside a browser, which would decrease our dependency on the Node runtime. Right now, too many parts of the core rely on APIs like Node streams that cannot easily be mocked for Deno.

Interesting. Do you have any tips for migrating from a multi-repo npm-based system to a monorepo with Yarn, in particular with regard to the CI config for multiple deployables? So, we are working on release systems. When we moved to v2, one of the things we did was to split the repository into multiple independent workspaces that are each deployed when needed, so releasing versions and making deployables are kind of the same thing. We are building tools that make this kind of thing easier. Right now, I'm actually working on a feature that will let us make a pre-release of packages with just a single command.

That's really great work. I have to commend you for being the maintainer of such a widely adopted open source project. I'd like to squeeze in a question of my own: what is it like to be the maintainer of such a widely adopted project, with so many users and probably a lot of issues and feature requests? It's a lot of different things. On one hand, you have a lot of pressure, both from people who are using your software and from those who aren't using it yet but would like to. On the other hand, you need to strike a balance between the features you implement and those you want to maintain. Sometimes people come to us asking for features we know we wouldn't use, so we have to push back because they would be the wrong fit for the project. On top of that, we need to provide a good experience for contributors so they can make changes and help improve the codebase one PR at a time. And that's something I'm really happy about: during the past two years we have acquired a lot of contributors who stay in our Discord and keep engaging with the community. It's much less lonely than at the beginning, when it was mostly the same one or two contributors, mostly internal to a company.

Yarn's Support for Multi-Repo Approach

Short description:

Yarn is primarily focused on mono-repo management, but it can be installed in multiple repositories. However, some features, like managing versions between workspaces, are specific to monorepos.

A few more questions coming in from the crowd; lots of engagement on this talk. Mac asks: does Yarn help with a multi-repo approach? Not so much. We are focusing a lot on monorepos because we think they're the right way to manage large projects nowadays. That being said, nothing prevents you from installing Yarn in multiple repositories and treating each of them as an independent project. The only thing is that if you do this, you will not benefit from some of the tooling we have built, for example the one I was mentioning where you can easily manage version bumps across multiple workspaces. So if you don't want to use a monorepo, that's fine; you just won't have this feature and some others, since some features are targeted specifically at monorepos.

Choosing Between npm and Yarn

Short description:

The choice between npm and Yarn depends on the project's configuration. If a project uses npm, I use npm for pull requests, and if it uses Yarn, I use Yarn. I contribute to many projects, so I sometimes use npm. The key is to use the right package manager for the job.

OurX asks: would you use npm over Yarn in a specific scenario or use case, and if so, which would that be? I'm of the opinion that which package manager to use is not so much up to the user's preference as up to the maintainers. For example, when I make a pull request to a project configured to use npm, I use npm. If I make a pull request to a project configured to use Yarn, I use Yarn. So I sometimes use npm, because I contribute to a lot of projects and some of them use npm. So, use the right package manager for the job, you're saying? Exactly.
