Yarn in Depth: Why & How
AI Generated Video Summary
Yarn is a package manager and project manager that prioritizes soundness and reliability. It offers workflows and features like zero-installs and workspace alignment. Yarn's success lies in its infrastructure and hidden gems. It has a modular architecture and is committed to the ecosystem. Yarn emphasizes correctness and is a better fit for those who value it.
1. Introduction to Yarn
So it's a project manager, which sounds nice, but how does that translate in practice? What are the things that we look for when we merge PRs, the things that make the project what it is? Let's discuss core values. The first thing to realize is that we are a community of contributors. Open source is a very taxing environment, and most projects struggle to find ways to make their work sustainable. Yarn isn't exempt from that. To help a bit, we rely a lot on our contributors to be the change they want to see, to contribute back to the project they like. In practice, it means that our core team spends as much time working on our infrastructure as on the product itself. Recently, we moved from Webpack to esbuild to make building Yarn easier. Multiple commands let you build parts of the Yarn binaries from sources, so you can easily try out independent features, or even write them yourself and start using them in your project without waiting for them to be merged. Yarn is all about making it possible for you to experiment well past what we could offer by ourselves.
The second very important value we always keep in mind when developing Yarn is soundness. Yarn must tell you if something is wrong in your application. It must not let it go unnoticed. It must not make uncontrolled assumptions. This may sound a bit rigid, and it is the reason why you see so many errors of the form "X depends on Y without listing it in its dependencies", but it's really critical. Because whether you author applications or libraries, you need to have confidence that something that works now will also work in production or when installed by your customers. If something needs to break, then it needs to break early so that you don't hit it later on when you're not prepared for it. Good practices.
2. Yarn's Impact and Workflows
We've reached the last one in this set: developer experience, doing the right thing by default. Let's take an example. In Yarn 1, there was a command named yarn check. It validated that the project was correctly installed. As a result, various companies had a policy of calling yarn check after every install, just to be sure that everything was okay. We removed this command from Yarn 2, and can you guess why? The thing is, Yarn should do, and already does, the right thing by default. You shouldn't have to validate that your project is correctly installed, because it should be correctly installed from the get-go.
So we've seen a bit of what Yarn claims to be. Let's talk a bit about DevOps. What can Yarn do for you, practically speaking? We're going to go over two interesting stories of projects that adopted it. One is an open source project you may have heard of. The other is an internal application. Unsurprisingly, the first one is Yarn itself. Before we dive in, let me tell you a funny story. Back in Yarn 1, we didn't actually use workspaces to develop Yarn, and it was a real problem because not only were bugs hidden from us, we also weren't directly confronted with the value of some important features. For example, when you use workspaces, it's very apparent that you need to be able to run a script in all your workspaces at once. However, since we didn't use workspaces ourselves, it didn't seem like a huge deal to us at the time, and thus the feature took a long time to come. Nowadays, we have an internal rule that the Yarn team needs to use all features shipped in the core, and I believe it has had a strong positive impact on the work we've done so far. This is one reason why the Yarn repository is so important to us: it's where we experiment with a lot of features, not all of which are used by our users themselves. Anyway, let's talk about workflows.
The first one is release cycles. Our previous process was very simple. We had a file at the root of the repository, and each PR was expected to add one line to it. It worked fine, but after switching to a monorepo, it wouldn't have scaled very well, since we needed the ability to release each workspace by itself. So we developed the release workflow instead. The idea is that each PR we merge has to include a little file, created by Yarn, that lists all the workspaces that have been changed by the PR, whether they will need to be part of the next release, and whether it will be a patch, a minor, or a major version. Our CI validates that this file exists and checks its content, and at release time we simply need to tell Yarn to aggregate all the version files currently checked into the repository into package bumps and changelog generation.
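To make the aggregation step concrete, here is a minimal TypeScript sketch of the idea, not Yarn's actual implementation. The `VersionFile` shape, the `aggregateBumps` and `applyBump` helpers, and the `@example/...` workspace names are all hypothetical; Yarn's real version files use their own format. The sketch assumes that each PR's file records one bump per changed workspace and that the strongest requested bump wins at release time.

```typescript
// Hypothetical shapes for illustration only.
type Bump = "patch" | "minor" | "major";

interface VersionFile {
  // Workspace name -> bump requested by one PR.
  releases: Record<string, Bump>;
}

const BUMP_ORDER: Bump[] = ["patch", "minor", "major"];

// Keep the strongest bump requested for each workspace across all files.
function aggregateBumps(files: VersionFile[]): Map<string, Bump> {
  const result = new Map<string, Bump>();
  for (const file of files) {
    for (const workspace of Object.keys(file.releases)) {
      const bump = file.releases[workspace];
      const current = result.get(workspace);
      if (
        current === undefined ||
        BUMP_ORDER.indexOf(bump) > BUMP_ORDER.indexOf(current)
      ) {
        result.set(workspace, bump);
      }
    }
  }
  return result;
}

// Apply a bump to a plain "major.minor.patch" version string.
function applyBump(version: string, bump: Bump): string {
  const [major, minor, patch] = version.split(".").map(Number);
  switch (bump) {
    case "major":
      return `${major + 1}.0.0`;
    case "minor":
      return `${major}.${minor + 1}.0`;
    case "patch":
      return `${major}.${minor}.${patch + 1}`;
  }
}

// Two merged PRs touched the same workspace; the major bump wins.
const pendingBumps = aggregateBumps([
  { releases: { "@example/core": "patch", "@example/cli": "minor" } },
  { releases: { "@example/core": "major" } },
]);
```

Keeping one small file per PR, rather than one shared file, means that two PRs never conflict on the same line, and the release step can be computed entirely from what is checked in.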
3. Yarn's Zero-installs and Workspace Alignment
We use Zero-installs to keep the project cache inside the repository, allowing instant project loading and easy branch switching. Yarn's constraints feature ensures workspace alignment and enforces patterns across all workspaces. This eliminates the need for external tools and provides consistency. At Datadog, we use a Yarn plugin to monitor script execution and identify potential bottlenecks. Checking in dependencies helps us avoid deployment issues caused by remote registry downtime.
Another workflow that we are using is Zero-installs. The idea isn't that complicated. It's just that we decided to keep the cache of the project inside the repository itself. So, for each package that we have, we keep exactly one zip file in the repository. As a result, anyone who loads the project is able to instantly start working on Yarn. They don't even have to run yarn install. More importantly, it means that our contributors are able to switch from one branch to another at almost no cost in terms of context switching. That's extremely valuable because it ties into the fast experimentation subject that I mentioned earlier.
We can quickly iterate on multiple different branches without having to pay the cognitive cost of running an install each time we run git checkout. I certainly need to mention that adding the packages of the cache to the repository is a trade-off that we made for our project, but it's not something that you absolutely have to do yourself if you're not convinced by the idea. We know that a few other projects have decided that they wanted to run yarn install as they always used to, and that's perfectly fine. It's just that nowadays you have the choice.
One problem when you have a lot of workspaces, and we have a lot, I think something like 44 at the moment, is that it becomes difficult to make sure that all the workspaces are aligned in their configuration. For instance, how do you ensure that no two workspaces depend on different versions of the same dependency? It can become a daunting task, and existing tools didn't help much with that. Now, thanks to the constraints feature in the Yarn core, we are able to enforce particular patterns across all workspaces in very few lines. Better yet, Yarn can apply the fixes itself. So for example, in the Yarn repository itself, we are using that feature not only to enforce that all workspaces use the same versions of their dependencies, but also to ensure that, for example, the license is properly set on all of our packages, that the build scripts are consistent, this kind of thing. So, those three examples give you a good idea of what value open source projects can derive from Yarn. In general, they no longer need tools like Lerna or Changesets, as those workflows now receive first-party support. But internal projects can benefit from Yarn as well, as we are going to see.
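The kind of rule described above can be illustrated with a short TypeScript sketch. This is not Yarn's constraints engine or its syntax; it is a standalone illustration of the check itself. The `Workspace` and `Mismatch` shapes, the `findMismatchedDependencies` helper, and the sample workspace names are all assumptions made for the example.

```typescript
// Hypothetical, simplified view of a workspace manifest.
interface Workspace {
  name: string;
  dependencies: Record<string, string>;
}

interface Mismatch {
  dependency: string;
  // Maps each requested range to the workspaces requesting it.
  ranges: Map<string, string[]>;
}

// Report every dependency that is requested with more than one range.
function findMismatchedDependencies(workspaces: Workspace[]): Mismatch[] {
  const seen = new Map<string, Map<string, string[]>>();
  for (const workspace of workspaces) {
    for (const dependency of Object.keys(workspace.dependencies)) {
      const range = workspace.dependencies[dependency];
      let ranges = seen.get(dependency);
      if (ranges === undefined) {
        ranges = new Map<string, string[]>();
        seen.set(dependency, ranges);
      }
      const users = ranges.get(range) ?? [];
      users.push(workspace.name);
      ranges.set(range, users);
    }
  }
  const mismatches: Mismatch[] = [];
  seen.forEach((ranges, dependency) => {
    if (ranges.size > 1) mismatches.push({ dependency, ranges });
  });
  return mismatches;
}

// Sample data: the two workspaces disagree on "typescript" only.
const sampleMismatches = findMismatchedDependencies([
  { name: "core", dependencies: { typescript: "^4.0.0", semver: "^7.0.0" } },
  { name: "cli", dependencies: { typescript: "^3.9.0", semver: "^7.0.0" } },
]);
```

A check like this only detects the problem. The point made in the talk is that the built-in feature goes one step further and can also apply the fix.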
4. Yarn's CI Efficiency and Hidden Gems
Despite a high volume of commits, our CI doesn't waste much time running installs. Yarn supports the patch protocol, integrating it with the cache system. Yarn offers many hidden gems, such as compressed packages, installing from Git, and portable scripts. Yarn's success lies in the infrastructure built around it, solidifying its foundations and enabling fast iteration.
Additionally, despite a high volume of commits, our CI doesn't waste much time running installs. All in all, instead of having to spend minutes installing projects in every single CI run, it's now really just a matter of seconds.
And one last item I want to mention. We have a lot of projects and a lot of dependencies. Some of them have bugs, and we try to address them as well as we can. But until the fixes are actually merged in the upstream project and released, which is sometimes a bit difficult, we need some simple way to use our changes locally while keeping some auditability. Some third-party tools exist just for that, but Yarn now supports it out of the box thanks to the patch protocol, in a way that's directly integrated with the cache system and all its checks. So it's really nice to see this kind of feature, which came from the community through patch-package, for example, published on npm, move into the package manager as a directly integrated first-party feature.
So I think those three examples give you a good overview of what Yarn has to offer in the context of a company. But keep in mind that those are only examples. For example, we didn't talk about how Yarn can keep your packages compressed on disk and share them between projects, if that's what you prefer. Or how you can install packages straight from Git, or how your scripts are made portable across both Windows and POSIX without you having to do anything. And that itself is only a quick list from a single slide. There are a lot more hidden gems in Yarn.
So far we've discussed what Yarn is and what it can bring you and your organization. But now I want to take the opportunity to go further than usual and tell you why it works so well. Let's take a quick dive into Yarn's development process itself. A large part of our secret sauce, if you will, is our infrastructure. It may come as a surprise that the most valuable asset we have isn't Yarn itself but rather the layers built around it. But that's really the case. You have probably noticed that Yarn's iteration speed is far beyond the industry standard, and the big reason for that is that we have spent a lot of resources solidifying our foundations.
5. Testing Strategy and Infrastructure
End-to-end tests are run every four hours on popular third-party projects to ensure compatibility with Yarn. Building the code base using the latest tools like TypeScript has increased confidence in the code. The testing strategy was revised to use the CLI binary, making it easier for external contributors to write tests. Trusting the infrastructure and investing upfront costs leads to faster iteration and decreased burnout risk.
One example of this is the end-to-end tests. Every four hours, a set of comprehensive tests is run. But rather than testing Yarn's behavior, we actually test popular third-party projects from the ecosystem: Next.js, Create React App, Gatsby, Angular, Jest, and those are only some of the big names that we monitor. If any of them accidentally ships something incompatible with Yarn, or if Yarn accidentally ships a regression, we are aware of it less than four hours later and can immediately work with the maintainers to find a solution. Even if that's an uncommon scenario, having this mechanism in place means that we can afford to be slightly less conservative than usual. We don't have to assume that our work is compatible with the ecosystem. We have the proof that it really is the case.
Similarly, we made a huge effort to build our code base using the latest available tools, including TypeScript itself. I can't stress enough the benefit that it gave us. Not only did it help us avoid silly mistakes in our own PRs, but it also raises the confidence we can have that a PR won't have unforeseen effects, which means that it's easier for us to merge pull requests from third-party contributors and trust that things will just work.
Finally, our last major release also came with an overhaul of our testing strategy. At the time, we had a bunch of unit tests where we instantiated a lot of classes from the core and called their methods to check their behaviors. While it worked well enough for a time, this approach proved very difficult to maintain, because it made refactorings practically impossible: they would have required rewriting all the tests, or keeping an outdated interface just for the tests. So we decided to try something a bit different, starting from v2. We still had to rewrite all the tests, unfortunately, and we had to do this incrementally, but this time we updated them to directly use the CLI binary, exactly like real users would. And because the CLI interface is not really meant to change, it allowed us to remove a lot of mental overhead while making it much easier for our external contributors to understand how to write tests, because literally all they have to do is write CLI calls. I think the lesson here is simple. To write code efficiently at large scale, like for a major open source project, you must be able to trust your infrastructure. And by accepting to pay the costs upfront like we did, you will be able to both reach a faster iteration speed and decrease the risk of burnout for your team, because they will have less work and less pressure to assume that things are working.
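As an illustration of testing through the CLI binary rather than through internal classes, here is a minimal TypeScript sketch. It is not taken from Yarn's test suite. The `runCli` helper and the `CliResult` shape are hypothetical, and the Node.js executable stands in for the CLI under test so that the example stays self-contained.

```typescript
import { spawnSync } from "node:child_process";

// What a test observes: exactly what a real user would see.
interface CliResult {
  code: number | null;
  stdout: string;
  stderr: string;
}

// Run a binary with arguments and capture its exit code and output.
function runCli(binary: string, args: string[], cwd?: string): CliResult {
  const result = spawnSync(binary, args, { cwd, encoding: "utf8" });
  return { code: result.status, stdout: result.stdout, stderr: result.stderr };
}

// Stand-in for the CLI under test: `node -p` evaluates and prints an expression.
const demoResult = runCli(process.execPath, ["-p", "1 + 1"]);

// A test then only asserts on the exit code and the printed output.
if (demoResult.code !== 0 || demoResult.stdout.trim() !== "2") {
  throw new Error(`Unexpected CLI result: ${JSON.stringify(demoResult)}`);
}
```

Because such a test depends only on arguments, exit code, and output, the internals can be refactored freely as long as the command-line behavior stays the same.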
6. Yarn's Architecture and Future
Yarn used to be a monolith, but now it is modular, with a core containing critical algorithms. Features are implemented as modules, making it easy to extend Yarn with new functionalities. Community members have built their own plugins to customize workflows. Yarn 3 is in development with improvements and new features. Sponsor programs are being explored to support Yarn's sustainability.
So, let's focus on Yarn itself, and more precisely on one particular part of it: its architecture. As I mentioned earlier, Yarn used to be a monolith. Workspaces didn't exist at the beginning, so we didn't use them. And as we worked on Yarn, it became clear that some parts of the application would benefit from being independent from the rest, even if only to prevent them from accidentally relying on unrelated components they shouldn't even be aware of. So, one of the first things I did when working on it was to figure out how to make Yarn modular.
As of today, we have a core containing all the critical algorithms, for example the resolver, which takes all the packages, queries the npm registry, and gets all the versions needed, or the fetcher, which downloads packages. The core also exposes a bunch of interfaces that modules can implement whenever they want. Most of the features you see in Yarn are implemented this way. Communication with the npm registry is a module. The package tarball generation is a module. Plug'n'Play itself, which you may know as the new install strategy in Yarn, is an external module. It's only at release time that those modules are bundled together to yield the Yarn CLI that you know. Interestingly, this architecture also makes it very easy to extend Yarn with new functionalities. For instance, the focus command as we implemented it is literally a hundred lines inside its own module. You could implement it yourself, and in fact a few people actually did. That's something interesting, because it's sometimes very difficult to build workflows that satisfy everyone. So for example, we have two community members who have built their own plugins that do exactly what they want with their focused install. They remove dependencies if they don't want them. They generate a zip archive that they can publish to AWS in one pass. It's really much easier in some cases to implement your own workflows than to expect an open source tool to implement exactly what you need. So if a behavior doesn't match your expectations, you can experiment with a new one and let us know how it goes.
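The architecture described above, a core that exposes interfaces and modules that implement them, can be sketched in a few lines of TypeScript. This is an illustration of the pattern only. The `Resolver`, `Plugin`, and `Core` names, their methods, and the two sample plugins are hypothetical and much simpler than Yarn's real plugin API.

```typescript
// Hypothetical interfaces; a real package manager core exposes far more.
interface Resolver {
  supports(descriptor: string): boolean;
  resolve(descriptor: string): string;
}

type CommandHandler = (args: string[]) => string;

interface Plugin {
  name: string;
  resolvers?: Resolver[];
  commands?: Record<string, CommandHandler>;
}

// The core only knows the interfaces, never the concrete plugins.
class Core {
  private resolvers: Resolver[] = [];
  private commands = new Map<string, CommandHandler>();

  constructor(plugins: Plugin[]) {
    for (const plugin of plugins) {
      this.resolvers.push(...(plugin.resolvers ?? []));
      const commands = plugin.commands ?? {};
      for (const name of Object.keys(commands)) {
        this.commands.set(name, commands[name]);
      }
    }
  }

  resolve(descriptor: string): string {
    const resolver = this.resolvers.find((r) => r.supports(descriptor));
    if (!resolver) throw new Error(`No plugin can resolve "${descriptor}"`);
    return resolver.resolve(descriptor);
  }

  run(command: string, args: string[]): string {
    const handler = this.commands.get(command);
    if (!handler) throw new Error(`Unknown command "${command}"`);
    return handler(args);
  }
}

// Two sample plugins: one adds a resolver, the other adds a command.
const registryPlugin: Plugin = {
  name: "example-registry",
  resolvers: [
    {
      supports: (descriptor) => descriptor.indexOf("npm:") === 0,
      resolve: (descriptor) => `registry:${descriptor.slice(4)}`,
    },
  ],
};

const focusPlugin: Plugin = {
  name: "example-focus",
  commands: { focus: (args) => `focusing on ${args.join(", ")}` },
};

// Bundling happens in one place, when the plugins are assembled.
const exampleCore = new Core([registryPlugin, focusPlugin]);
```

With this shape, adding a workflow means writing one more plugin object, which is why a command can live in about a hundred lines inside its own module.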
So, as I mentioned, in the past few months, at least two community members have started to solve their monorepo deployments by authoring plugins dedicated to their use case. So where do we see Yarn in the future? No one can predict what's going to happen, but I can already tell you what's currently happening. We're working on the next major Yarn version, which will be Yarn 3. It will feature its share of improvements, clean up some behaviors of our CLI, and add new features, including some that I'm sure will make sense. At the same time, we're starting to look into making Yarn sustainable through sponsor programs. I mentioned earlier that open source projects take a lot of effort, resources, and time, and that we need to find ways to support this work. This is where you can do something. If you or your team are interested in getting your company to sponsor some of the time we spend on Yarn, please contact us. We'd be happy to discuss terms that would be favorable to both parties.
7. Yarn's Impact and Commitment
Yarn is alive and in better shape than ever before. It protects you from mistakes and offers value to the ecosystem. We've made an impact on projects like Next.js, Gatsby, Create React App, webpack, and more. Our test suite rarely reports problems. We're committed to moving the ecosystem in the right direction. Thank you for joining us!
All this to say, Yarn is very much alive and in better shape than ever before. Our team is aligned, our objectives are clear, and our tech is sound. Certainly, we have taken a more opinionated course than we used to. And we think that, for most people, it will be a net positive, knowing that the tool will protect you from your mistakes.
Of course, different people may prefer different kinds of tools. And that's fine, because no one project is for everyone. Overall, I think it's clear that Yarn is here to stay. It's not just a fad. And we truly believe that we offer something different enough to provide value to the ecosystem.
We've been in touch with many projects during the past few years, tweaking our compatibility, explaining concepts, and pushing for good practices. And I think it really made an impact on things like Next.js, Gatsby, Create React App, webpack, and many more. When we started, the tests that I mentioned earlier, those end-to-end tests that we run on all projects every four hours, frequently failed due to missing dependencies. Each time it happened, we went to the relevant project and shared a PR with them in order to add those missing dependencies. Over time, they started to pay more and more attention to it. And nowadays, our test suite very rarely reports problems. This is also the work that Yarn does. We don't only make a CLI tool. We also help the ecosystem move in the right direction, not only for Yarn users, but for the users of every package manager. Of course, the problem is that it's not always public-facing or shiny. So sometimes it may be difficult for outsiders to have any idea of the real amount of work that we put into the project, but it's something that is really important to us and that we are going to keep doing in the future.
So, we've reached the end of this presentation. I hope you enjoyed it, and I'll be happy to answer any questions you might have on this subject. Thank you.
Thank you so much for that amazing talk, Maël. I really, really enjoyed it. I love hearing about Yarn: where it's been, where it came from, and where it's going. Remember, if you have any questions, throw them into the chat so that we can grab them.
8. Yarn Poll Results and Evolution
The poll results show that there are both new adopters and long-time users of Yarn. It's interesting to see a significant number of people who have been using it for less than a year. I joined the Yarn project while working at Facebook and continued working on it even after leaving the company. Yarn has evolved to become more stable and reliable over time, with a strong emphasis on technical correctness.
And we can ask Maël. I'm going to invite Maël on. But the first thing we're going to talk about is the results of the poll. So, the poll question was: for how long have you been using Yarn? And the top answer was one to three years, followed by less than a year. So, quite a lot of new adopters, and just a few people who have been using it for a while. Does that surprise you at all, Maël?
No, actually, that's very interesting, because we hear every now and then that most people using Yarn have been using it for a while. But it's interesting to see that almost half of the people who answered said less than one year, and the rest are at one to three years. So, Yarn isn't that old yet, and one to three years can be a lot of time. But I'm really interested in the less-than-one-year group. It's really interesting.
Well, I mean, it's good to have so many people who are beginning to use it and take it up. We've got a couple of questions that have come into the chat, and probably a few more that are going to come in. I'm going to pick one that I really like, because I love seeing how many people work on different open source or big projects, and I always wonder how they got there. So how did you get into maintaining the Yarn project?
So, at the time it was because I joined Facebook. I joined for no particular reason related to Yarn itself. And then I happened to be in the same country as the team that was in charge of Yarn at the time, so I joined the team. When I say team, it was a very small number of people working on Yarn itself. In fact, when I joined the team, there was a single person, who quickly moved on to other projects across countries. So, I suddenly had to work on Yarn by myself. And I liked it. I really like working on a project that the whole ecosystem can use. It was very difficult, I will say, because it meant that I had to do a lot of work for open source and correctly prioritize it against the work for the company I was working for as well. And I think I learned so much that when I left Facebook, which was a year and a half or two years ago, I decided to keep working on it. And that's about when we started releasing the new branches for Yarn 2 and above, which we are still maintaining to this day with a core team made up almost entirely of new people who joined starting from Yarn 2, except for me, who went from Yarn 1 to Yarn 2.
That's amazing. And one thing I find so interesting about your talk on Yarn is that there's so much I learned about what Yarn could do that I didn't know, and it became so much more. I like what you said, that it's not just a package manager, it's a project manager. But what would you say is the biggest thing, if you had to answer this in one answer? How different is Yarn now from when it was first released?
I think Yarn is much more stable and sounder in terms of the technical aspect, because we put a very strong emphasis on doing things only when we know for sure that they are correct.
9. Yarn's Sound Development and Collaborative Growth
Yarn helps you build your application in a sound way, similar to TypeScript and Flow. The organizational point of view has improved with a core team of lead contributors. Finding people who share the same passion keeps the flame burning.
So, for example, with a default Yarn installation, we throw an error when you try to access packages that are not listed in your package.json, so that you run no risk of accidentally depending on dependencies that might be missing from a given machine. So, we really try to help you build your application in a sound way, a bit like TypeScript and Flow actually do.
And from an organizational point of view, it's very different from what it was before, because before it was mostly one single person, sometimes two, but not for long, and it was extremely exhausting. Nowadays, the core team is actually composed of, I would say, four main lead contributors and a few others who gravitate in the same circle. So, it feels much less lonely to work on the project, and that's a great improvement. In open source, I think that the main way we can keep the flame burning is by finding people who share the same passion as we do, and I think that we have started to find that.
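The strict check described above can be illustrated with a small TypeScript sketch. This is not how Yarn implements it; the `Manifest` shape, the `packageNameOf` and `assertDeclared` helpers, and the sample manifest are hypothetical. The sketch only shows the rule itself: a request for a package that the manifest does not list fails immediately, even if the package happens to be present on disk.

```typescript
// Hypothetical, simplified view of a package.json.
interface Manifest {
  name: string;
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
}

// "lodash/fp" -> "lodash", "@scope/pkg/sub" -> "@scope/pkg".
function packageNameOf(request: string): string {
  const parts = request.split("/");
  return request.charAt(0) === "@" ? parts.slice(0, 2).join("/") : parts[0];
}

// Throw early when a package is used without being declared.
function assertDeclared(manifest: Manifest, request: string): void {
  const packageName = packageNameOf(request);
  const declared = { ...manifest.dependencies, ...manifest.devDependencies };
  if (!Object.prototype.hasOwnProperty.call(declared, packageName)) {
    throw new Error(
      `${manifest.name} tried to access ${packageName}, ` +
        `but it isn't declared in its dependencies`
    );
  }
}

const sampleManifest: Manifest = {
  name: "my-app",
  dependencies: { lodash: "^4.17.21" },
};

// Declared, so this passes even for a subpath import.
assertDeclared(sampleManifest, "lodash/fp");
```

Failing at the point of access is what turns a problem that would otherwise appear only on another machine into an error seen during development.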
Yeah, I love that you spoke about how having different people on the team to help you now must make that a much, much nicer experience.
10. Reasons to Use Yarn over npm
The biggest reason to use Yarn versus npm is Yarn's emphasis on correctness. While npm focuses on things working and doing something sensible, Yarn takes a more radical approach, ensuring that things work as intended. Both approaches have their benefits, but the emphasis on correctness makes Yarn a better fit for me and our users.
There's a question here, and I'm pretty sure it's a question you've been asked over and over again, so I apologize in advance for asking it, but what is the biggest reason to use Yarn versus npm?
So from my perspective there are a lot of reasons, but I would say that if something works for you, then it's fine to keep using it, whether it's Yarn or npm. The reason I choose to work on Yarn is its emphasis on correctness. npm has a slightly different philosophy, which is roughly that things should work and do something sensible. On Yarn we do something a bit more radical, in that things should work if they are meant to work. The two approaches have their own benefits, but I think that the one we have chosen fits me better, and it fits our users better as well.