Since 2017, Yarn has proved itself a pillar of JavaScript development, incubating numerous features our ecosystem now heavily relies on. As years passed and competitors improved, so did Yarn, and it's now time to dive into the features and tradeoffs that make Yarn a truly unique gem of the JavaScript ecosystem.
Yarn in Depth: Why & How
From:

DevOps.js Conf 2021
Transcription
Hi, everyone. For those who don't know me, my name is Maël. I currently work at Datadog, and I've been leading Yarn's development for a few years now. Today we're going to talk extensively about what Yarn is and what it can bring you. While we won't be able to go into every single quality-of-life improvement it offers, I hope that by the end of this talk you will have a better idea of what makes this project unique in our ecosystem.
So first we should discuss what Yarn actually is. If you ask anyone, they will likely tell you that Yarn is a package manager for JavaScript, and they would be mostly right. But that's only part of the story. Yarn is not just a package manager; it intends to be a project manager. Indeed, if you think about it, Yarn lets you manage scripts. It lets you split your application into standalone modules. But as we will see later, it can also be used to manage your release cycles, to monitor how your team uses your scripts, or even to enforce standards across your monorepo. All of these tasks go far beyond the typical package manager, and every release pushes the boundaries further by introducing new features.
So, project manager: that sounds nice, but how do we get there? What are the things that we look for when managing PRs? We are going to discuss core values. The first thing to realize is that we are a community of contributors. Open source is a very taxing environment, and most projects struggle to find ways to make their work sustainable. Yarn isn't exempt. To help with that, we rely a lot on our contributors to be the change that they want to see, to contribute back to the project they like. In this way, we don't really see Yarn as a product; we really see it as a project. In practice, it means that our core team spends as much time working on our infrastructure as on the product itself. Recently, we moved from webpack to esbuild in order to make building Yarn easier.
Multiple commands let you build parts of the Yarn binaries from source, so you can easily try out independent features. Yarn is really all about making it possible for you to experiment well past what we, as a team, could offer by ourselves.
A second important value we always keep in mind when managing PRs is soundness. Yarn must tell you if something is wrong in your application. It shouldn't let you make mistakes. It shouldn't let them go unnoticed. It must not make uncontrolled assumptions. This may sound a bit rigid, because you get more errors than you used to, but it's really critical. Whether you author applications or libraries, you need to have confidence that something that works now will also work in production or when installed by your consumers.
Another one is good practices. The JavaScript landscape is large, changes fast, and has many very opinionated people. As a package manager, we are in a unique position to help our users understand the tools that they are using and to guide them along the happy path. Not only should using Yarn solve a practical need, it should also help you learn and become a better engineer along the way in terms of JavaScript DevOps.
We've reached the last one of this set, default behaviors, which tie directly into developer experience. Most of our users will only ever use the default commands of our tool, which is a good thing, because most of them don't have to remember a bunch of command-line flags just for the sake of running one specific behavior. For instance, the fact that you can run any script by prefixing it in your CLI with just yarn may look like a very simple thing, but it truly is one of the reasons some people find it appealing. Developer experience is really important, and it's crucial to the Yarn user experience.
We've now seen a bit of what Yarn claims to be. Now we are going to talk a bit about DevOps. What does Yarn actually do for you, practically speaking?
We are going to go over two interesting stories of projects that adopted it. One is an open source project, and the other is an internal application that we use at my company. Unsurprisingly, the first one is Yarn itself.
Before we dive in, let me tell you a funny story. Back in Yarn 1, we didn't actually use workspaces to develop Yarn. It was a real problem for us, because not only were problems hidden away from the maintainers themselves, we also weren't directly confronted with the value that some features might have. As a result, we were not merging things, even though perhaps we should have, just because we couldn't see how impactful they would actually be. When you use workspaces, it's very apparent that, for example, you need to be able to run a script in all your workspaces at once. But since we didn't use workspaces, it didn't really seem like a huge deal for us at the time. Anyway, nowadays we have an informal rule that the Yarn team needs to use all features shipped into the core. I believe that has had a strong positive impact on the work we've done so far, by forcing us to use all the features that we ship as part of Yarn itself.
Now let's talk about workflows. The first one we are going to discuss is release cycles. Our previous process back in Yarn 1 was very simple. We just had a single file at the root of the repository, and each PR that people made was expected to add one line to it. It worked fine, but after switching to a monorepo, it wouldn't have scaled very well, since we needed the ability to release each workspace, each package, by itself. So we developed the release workflow, which is not unlike the Changesets package that you may know. The idea is that each PR we merge also has to include a little file, created by Yarn itself, that lists all the workspaces that have been changed by the PR and whether they need to be part of the next release.
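
As a rough illustration of that idea, the sketch below models each per-PR version file as a plain record and folds several of them into one bump per workspace. The `VersionFile` shape, the `decline` marker, and the function names are assumptions made for this example; they are not Yarn's actual file format or API, and the version arithmetic ignores prerelease tags.

```typescript
// Hypothetical model of the release workflow; not Yarn's real format or API.
type Bump = "patch" | "minor" | "major";

// One record per merged PR: workspace name -> requested bump,
// or "decline" when the change does not need a release.
type VersionFile = Record<string, Bump | "decline">;

const BUMP_RANK: Record<Bump, number> = { patch: 0, minor: 1, major: 2 };

// Keep, for every workspace, the strongest bump requested by any PR.
function aggregateBumps(files: VersionFile[]): Map<string, Bump> {
  const result = new Map<string, Bump>();
  for (const file of files) {
    for (const workspace of Object.keys(file)) {
      const requested = file[workspace];
      if (requested === "decline") continue;
      const current = result.get(workspace);
      if (current === undefined || BUMP_RANK[requested] > BUMP_RANK[current]) {
        result.set(workspace, requested);
      }
    }
  }
  return result;
}

// Apply a bump to a plain "major.minor.patch" version string.
function applyBump(version: string, bump: Bump): string {
  const [major, minor, patch] = version.split(".").map(Number);
  if (bump === "major") return `${major + 1}.0.0`;
  if (bump === "minor") return `${major}.${minor + 1}.0`;
  return `${major}.${minor}.${patch + 1}`;
}
```

With two PRs requesting a patch and a minor bump for the same workspace, the aggregate is a single minor bump, so a workspace at 2.4.0 would be released as 2.5.0.
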
Our CI validates these files' content, and at release time, we simply instruct Yarn to aggregate all version files into classic package bumps.
Another workflow that we are using is zero-installs. The idea is that we decided to keep the cache of the project inside the repository itself. For each package we have, we just need to keep one single archive. As a result, anyone who has cloned the repository is able to instantly start working with Yarn. More importantly, it means that our contributors are able to switch from one branch to another at almost no cost in terms of context switching, which is extremely valuable for us because, as I mentioned, we really want contributors to be able to submit a lot of different pull requests and to feel confident working on different parts of the code. So being able to switch from one to the other is really important to the contribution workflow. I still feel the need to mention one last thing about this, which is that this is our trade-off and not necessarily yours. You don't have to store your dependencies in your repository if you don't want to. We just felt like it worked well for us, so I would certainly recommend it, but it's not something that is forced on you.
One problem when you have a lot of workspaces, and we have a lot, I think something like 44 at the moment, is that it becomes difficult to make sure that all the workspaces in your project are aligned in their configuration. For example, how can you ensure that all the workspaces have the exact same version of all your dependencies? It's far too easy to have one workspace suddenly upgraded without upgrading the others, right? So we have released a feature called constraints, which you might have heard of because it uses a fairly novel programming language in its configuration. It uses Prolog. And thanks to the constraints feature, we are able to enforce particular patterns across all workspaces in very few lines.
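
Yarn's constraints are written in Prolog, so the following is not what a constraints file looks like. It is only a TypeScript sketch of the rule being described, namely that no dependency may be requested with two different ranges across the workspaces. The `WorkspaceManifest` shape and `findVersionMismatches` are hypothetical names introduced for this illustration.

```typescript
// Illustrative only: Yarn expresses this rule in Prolog, not TypeScript.
// Hypothetical, simplified view of a workspace's package.json.
interface WorkspaceManifest {
  name: string;
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
}

interface Mismatch {
  dependency: string;
  // Each requested range, mapped to the workspaces that request it.
  ranges: Map<string, string[]>;
}

function findVersionMismatches(workspaces: WorkspaceManifest[]): Mismatch[] {
  const seen = new Map<string, Map<string, string[]>>();

  for (const workspace of workspaces) {
    // If a package appears in both fields, the devDependencies range wins here.
    const all: Record<string, string> = {
      ...workspace.dependencies,
      ...workspace.devDependencies,
    };
    for (const dependency of Object.keys(all)) {
      const range = all[dependency];
      const byRange = seen.get(dependency) ?? new Map<string, string[]>();
      const users = byRange.get(range) ?? [];
      users.push(workspace.name);
      byRange.set(range, users);
      seen.set(dependency, byRange);
    }
  }

  // A dependency is inconsistent when more than one distinct range was seen.
  const mismatches: Mismatch[] = [];
  seen.forEach((ranges, dependency) => {
    if (ranges.size > 1) mismatches.push({ dependency, ranges });
  });
  return mismatches;
}
```

The function only reports the inconsistency; it does not pick a winning range or rewrite any manifest.
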
And better yet, Yarn can often apply the fixes itself. So for example, in the Yarn repository, we use this feature to enforce the same version of dependencies across all workspaces, but we also use it to make sure that all our package.json files list the right build scripts, to make sure that they all list the right repository field, to make sure that none of them are marked as private, this kind of stuff.
So those three examples give you a good idea of what value open source projects can derive from Yarn. In general, it means that they no longer need tools like Lerna or Changesets, because those workflows have directly received first-party support.
That said, internal projects can benefit from Yarn as well. And to talk about this, I'm going to discuss Datadog. So I work at Datadog, and Datadog has a huge web application along with a sizable JavaScript infrastructure, which supports everything from linting to deployments. Yarn holds a really central place in this machinery, and we'll go over three examples.
The first one is the telemetry integration. If you don't know what Datadog is, we make some really good cloud monitoring, extracting signals out of arbitrary metrics. So given this DNA, I guess it won't surprise you to know that we also monitor the way our developers interact with our infrastructure, in particular in terms of response time. To achieve this, we wrote a plugin for Yarn that watches all the scripts that get executed and how much time they take, and sends it all to a dashboard. All this data then helps us, the front-end platform team, identify potential bottlenecks before they become a problem, and generally prioritize our work.
As we saw with the Yarn repository itself, checking in dependencies has powerful effects on one's ability to efficiently contribute to a project. But even in corporate environments, this pattern has its uses. From our data, the remote registry goes down approximately once a month.
And by keeping our own copy of the packages, we are never blocked during our deployments. Additionally, despite a very high volume of commits on our main repository, our CI doesn't waste much time running installs, because all the packages are already inside the repository. So once you clone it, everything is there. And since the clone is usually cached between CI runs, everything is already installed in practice. All in all, instead of having to spend minutes installing projects in every single CI run, it's really now just a matter of seconds.
And one last thing I want to mention. We have a large project and a lot of dependencies. Some of them have bugs, and we try to fix them and submit the patch upstream. But until the fixes are merged, we need some simple way to use them locally while keeping some kind of auditability, because we don't want, for example, to just edit files directly within the node_modules folder. Some third-party tools exist just for that, for example the patch-package project. But Yarn now supports it out of the box, in a way that's directly integrated with the cache system and all its checks. That's super handy, because we don't have to rely on third-party tools that would not be as well integrated.
So those use cases we went over give you a good overview of what Yarn has to offer. But you need to keep in mind that those are only examples. For instance, we didn't talk about how Yarn can keep your packages compressed on disk and share them between projects if that's what you prefer, or how you can install packages straight from Git repositories, or how your scripts are made portable automatically across both Windows and POSIX. And that itself is really only a quick list from a single slide. I really suggest you look at the documentation to find a lot of new gems you might not even be aware of.
So far, we've discussed what Yarn is and what it can bring you and your organization.
But now I want to take the opportunity to go further than usual and tell you why it works so well. So we are going to make a quick dive into Yarn's development itself.
A large part of our secret sauce is our infrastructure. It may come as a surprise that the most valuable asset we have isn't Yarn itself, but rather the layers built around it. But that's really the case. You've probably noticed that Yarn's iteration speed is far beyond the industry standard. And a big reason for that is that we have spent a lot of resources solidifying our foundations from the get-go.
One example of this is the end-to-end tests. Every four hours, a set of comprehensive tests is run. But rather than test Yarn's behavior, we actually test popular third-party projects from the ecosystem. So for example, we test Next.js, Create React App, Gatsby, Angular, Jest. And those are only some of the big names that we monitor. If any of them accidentally ships something incompatible with Yarn or, the other way around, if Yarn accidentally ships a regression, we are aware of it less than four hours later and can immediately work with the maintainers to find a solution. Even if such breakages are uncommon, having this mechanism in place means that we can afford to be slightly less conservative than usual, because we know that our end-to-end tests will catch it if something breaks in the ecosystem due to our work.
Similarly, we made a huge effort to build our codebase using the latest available tools, including TypeScript. And I can't really stress enough the benefits it has brought us. Not only does it help us avoid silly mistakes in our own PRs as part of the core team, but it also raises the confidence we can have that a PR from someone contributing to Yarn for the first time won't have unforeseen effects.
And finally, from the perspective of someone contributing to Yarn for the first time, it's super handy to have the types directly at hand, with all the benefits that a good IDE with TypeScript support can bring you.
Finally, our last major release also came with a complete overhaul of our testing strategy. At the time, we had a bunch of unit tests in Yarn 1, where we instantiated various classes from the core and mocked their methods to change their behaviors. It worked well enough for a time, but this unit testing approach proved difficult to maintain over the long term. Refactors were practically impossible, because they would have required rewriting all the tests, since every test relied on the architecture. So for Yarn 2, we decided to try something different, and we went toward integration tests. We still rewrote all the tests incrementally, but this time we updated them to directly use the CLI binary itself, exactly like real users would. This very simple change in concept allowed us to remove a lot of maintenance costs while making it much easier for external contributors to understand how to write tests. All they have to do now is literally just write CLI commands.
So I think the lesson here is very simple. To write code efficiently, you must trust your infrastructure. And in order to do that, you sometimes need to accept paying upfront costs to make sure that you will be able to both reach a faster iteration speed and decrease the risk of burnout for your team, who would otherwise ship features without being fully convinced that nothing will break.
Another reason why Yarn is Yarn is its architecture. More precisely, one particular part of it: the way that it has been completely modular starting from Yarn 2. As I mentioned earlier, Yarn used to be a monolith. Workspaces didn't exist at the beginning, so we didn't use them. Yarn was just one large codebase, and everything was importing classes from various files.
But when we worked on Yarn 2, it became clear that some parts of the application would benefit from being independent from the rest, even if only to prevent them from accidentally importing unrelated components they shouldn't even be aware of, practically speaking. So one of the first things I worked on was how to make Yarn modular. Fast forward to today, we now have a core containing all the critical algorithms and exposing a bunch of interfaces that modules can implement however they want. Most of the features that you actually see in Yarn today are implemented this way. The communication with the npm registry is a module. The package tarball generation is in a module. And Plug'n'Play itself is a module. It's only at release time that all those modules are bundled together to yield the CLI that you know as Yarn.
Interestingly, this architecture also makes it very easy to extend Yarn with new functionality. For instance, the focus command, as we implemented it, is literally 100 lines inside its own module. You could implement it yourself if you wanted to. And that's really important, because if a behavior doesn't match your expectations, you don't have to just live with it. You can implement a new one and let us know how it goes. To give you an example, in the past few months, at least two community members have started to solve their monorepo deployments by offering plugins dedicated to their use case. Thanks to their experience, we get better insight into how deployments should work. And eventually, we'll be able to build a standard workflow based on those experiments.
So where do we see Yarn in the future? No one can really predict what's going to happen, but I can already tell you what's currently happening. We are working on the next major version of Yarn, which will be Yarn 3.
It will feature its share of improvements: clean up some behaviors in our CLI, add new features, improve performance on commonly used commands, and make our plugin ecosystem even more powerful. At the same time, we are starting to look into making Yarn sustainable by looking into sponsorship programs. I mentioned earlier that open source projects require a lot of effort, resources, and time, and that's where you can help us. If you or your team are interested in sponsoring some of the time we spend or some of the releases we make, please contact us, and we will be happy to discuss terms that are favorable to both parties.
All this to say, Yarn is very much alive, and I would say it's in even better shape than ever before. Our team is aligned, our objectives are clear, and our take is sound. Certainly, we have taken a more opinionated course than we used to, but we think that for most people it will still be a net positive, knowing that the tool will protect you from your mistakes. Of course, different people may prefer different kinds of tools, and that's fine, because no one project has to please everyone, right?
So, we've reached the end of this fast presentation. I hope you enjoyed it. Now I'd be happy to answer questions you might have on the subject. Thank you.
Hey there, Maël! Really great talk. We enjoyed it immensely. I think one of the first orders of business is to actually see the results of your question, which I thought was really interesting. "What do you look for in your package manager?", you asked the audience. To be honest, I wasn't surprised, because simplicity is important, but 80% said simplicity. When you were building out Yarn, or as you maintain Yarn, what are the core values that you try to maintain and make sure you deliver?
As I mentioned in the talk, one of the main things that we are looking for is to provide a good developer experience.
So, I'm not entirely surprised to see simplicity win with a sizable lead over the other options. Because, yeah, it's something that we really spend a lot of time on, to be sure that, for example, running a script with Yarn can be done without the run keyword, this kind of thing. For us, it works well. So, yeah, I'm happy to see that people like it, too.
Yeah. Definitely interesting results. We had some questions come in from the crowd, so get ready. AC asks: one of the great benefits of pnpm is the amount of space it saves on your workstation. Does Yarn 2 address this?
Yeah. So, one of the interesting things with PnP is that we can have a global store that contains all the packages. And instead of copying all the files into each folder, we can just directly reference this main store. We have two ways of doing this. The first one is by still making the copy, but using copy-on-write on systems that support it, or hard links. And another way is to simply generate the file that references the global cache. So, you have literally nothing inside your project. On top of that, all the packages that are installed by PnP are compressed in zip form. So, for example, from v1 to v2, installing a Create React App project takes roughly one third of what it did before.
Interesting. Okay. So, that also contributes to the speed element. CINOS, I'm not quite sure how to pronounce this username, but CINOS asks: when using Yarn version 1, is it safe to run yarn install on a monorepo with preexisting node_modules?
Sure. Yeah, it really should be. There are some bugs in v1 that have been fixed over the versions. Right now we are at 2.4, and we are about to start releasing release candidates for 3.0. So, 1.0 is a few versions old, but it should be fairly safe to install a project even if there's already a node_modules folder.
Cool. And William asks: since Yarn is now fully TypeScript, have you considered using Deno instead of Node?
Not really.
We are currently fairly happy with Node. Something that we are currently looking into is whether we could run Yarn inside a browser. That would decrease the dependency on the Node runtime. Right now we have too many parts of the core that rely on APIs like Node streams that cannot be easily mocked on Deno.
Interesting. Do you have any tips for migrating from a multi-repo npm-based system to a monorepo with Yarn, in particular with regard to CI config for multiple deployables?
So, we are working on release systems. When we moved to v2, one of the things that we did was to split the repository into multiple independent workspaces that are each deployed when needed. Between releasing versions and making deployables, it's kind of the same thing. So, we are building tools that allow us to do this kind of thing more easily. Right now I'm actually working on a feature to allow us to make a pre-release of packages in just a single command line.
That's really great work. I really have to commend you for being the maintainer of an open source project, and such a widely adopted one. I'd like to push in a question of my own and ask: what is it like to be the open source maintainer of such a widely adopted project, with so many users and probably a lot of issues and feature requests, et cetera?
It's a lot of different things. On one hand, you have a lot of pressure from both people who are using your software and those who are not actually using it but would like to. On the other hand, you need to strike a balance between the features that you implement and those you want to maintain, because sometimes people come to us and ask for features, but we know that we wouldn't use them. So, we have to push back on this, because we wouldn't dogfood them. They would be the wrong fit for the project.
And on top of that, we need to make a good experience for contributors to just make changes and help improve the codebase one PR at a time. And something that I'm really happy about is that during the past two years, we have acquired a lot of contributors who actually stay inside our Discord and keep engaging with the community. It's really less lonely than it was at the beginning, when it was mostly the same one or two contributors, mostly internal to a company.
Great work. I know that oftentimes it feels thankless, but I feel the need to say thank you for your excellent open source work. A few more questions are coming in from the crowd. Lots of engagement on this talk. So, Mac asks: does Yarn help with a multi-repo approach?
Not so much. We are focusing a lot on the monorepo because we think it's the right way to manage large projects nowadays. That being said, there is nothing that prevents you from having Yarn installed in multiple repositories and treating each of them as an independent project. The only thing is that if you do this, you will not benefit from some of the tooling that we have built. For example, the one I was mentioning, where you can easily manage the version bumps between multiple workspaces. So, if you don't want to use a monorepo, it's fine. It's just that you will not have this feature and some others.
Yeah. So, some of the features are targeted specifically at monorepos. Okay. Aurix asks: would you use npm over Yarn in a specific scenario or use case? And if so, which would that be?
So, I'm of the opinion that which package manager to use is not so much left to the user's preference, but rather to the maintainers. For example, when I make a pull request to projects that are configured to use npm, I will use npm. If I make a pull request to projects that are configured to use Yarn, I will use Yarn. So, I sometimes use npm, because I contribute to a lot of projects and some of them use npm. So, yeah.
So, you use the right package manager for the job, you're saying.
Exactly.
So, Maël, we have just about time for a couple more questions. I'm going to drop them. And whatever we don't get to in the live feed, Maël will be around in the spatial chat and on the Discord. So, definitely feel free to continue the conversation. Two more questions, and then we'll be rolling the next talk. So, with new tools coming out being built in other languages, like esbuild in Go, for instance, it feels like the community is looking to value speed in their tooling. Do you think there would be any benefit or room to do this with Yarn?
So, one of the things that we have worked on is zero-install. The idea is that the fastest install you can have for your project is to have no install at all. It's working well. We just need a bit more support from third-party projects, from the community, in order to reach the state where it's completely easy to use. That being said, as for the question of whether it would bring value to Yarn to be written in a native language, I don't think so. It's a bit different in our case, in the sense that the main value that we get from being in TypeScript is the ease of contributing to Yarn. One of the very important things for us is to make it a good developer experience to work with Yarn, but also to contribute to it. And if we were to use, for example, something like C++ or Rust or Go or anything else, it would just be that much more difficult to find contributors who would be able to fix the problems that they have with Yarn. So, for us, the blocker is more in terms of contributing experience than speed at this point.
Of course, you have to think about the long-term maintenance of a project as well. So, last question, from Matt. We've had multiple issues trying to migrate a large monorepo from Yarn version 1 to version 2. The biggest issue came from PnP with our own packages and third-party packages.
Any tips for making a smooth transition?
So, the first tip if you hit problems with PnP is that you are not forced to use it. There is the node-modules plugin, which is working very well and receives a lot of support from some of our core contributors. So, if for whatever reason you cannot use PnP, it's fine. You can just use node_modules. That being said, we have a lot of ways to solve problems one by one if you still want to fix them with PnP. For example, with the packageExtensions setting, you can declare the dependencies that are missing from your packages. That's perhaps the major issue that people have: what if one package somewhere in my dependency tree doesn't list its dependencies? What can I do? With this setting, you can just declare them yourself in your .yarnrc.yml and it will work. We also have @yarnpkg/doctor, which is a package that we publish that analyzes your sources to try to figure out the places where you might have forgotten to declare a dependency. So, it helps you make your projects stricter and rely less on hoisting. And if you really are not sure where to start, we have our Discord server, and we are really happy to help anyone who comes to us with questions.
Great. Thank you so much, Maël, for your excellent talk and your very helpful answers. Meet Maël in his speaker room, on the spatial chat, and on Discord. We hope to see you again in the future with us, Maël. Thank you.
Thank you.