Yarn 4 - Modern Package Management

Yarn 4 is the next major release of your favourite JavaScript package manager, with a focus on performance, security, and developer experience. All through this talk we'll go over its new features, major changes, and share our long-term plans for the project.

If you only heard about Yarn without trying it yet, if you're not sure why people make such a fuss over package managers, if you wonder how your package manager can make your work simpler and safer, this is the perfect talk for you!


Transcription


Hello, everyone. My name is Mael. I work at Datadog, and I've been leading the development of Yarn for the past few years. Today I'm going to talk to you a bit about Yarn: what its core values are, what we are aiming for with each version that we release, and I'll show you a glimpse of the future.

Before we start, what is Yarn? Yarn is a package manager that you may know, similar to npm, that allows you to install packages on your system and to resolve dependencies. It favors consistency and stability while still attempting to provide good performance and high modularity to your projects.

It's been a long adventure. The first version of Yarn got released almost six years ago with 0.15. One year later, we released the first stable release with 1.0. And two and a half years later, we decided that it was time to make a change and to decide for sure what we wanted to do with the future of Yarn. With that came 2.0. At the time, there was a lot of discussion about some of the core aspects, which we kept working on in the subsequent 3.0 release and which we are going to keep refining in 4.0.

Why another package manager? We already have npm. We also have pnpm. What does Yarn bring to the table? The thing to remember, and that's true for package managers, but it's also true for, say, bundlers, is that features and performance aside, each project in the open source ecosystem has different properties in terms of priorities, roadmap, governance model, maintainability, and infrastructure. Those are all things that you should keep in mind each time you evaluate a project. For instance, npm is owned by GitHub, whereas Yarn is completely open source. In both cases, there are pros and cons, and that's the kind of thing that you don't see at first glance, but that matters when you're trying to invest in a tool for the long term.

So I talked about priorities. What are Yarn's priorities? We have four of them at the moment.
The last one got added quite recently, and we are going to talk a bit about it in later slides.

But first, stability is the main core tenet of Yarn. We want all your installs, all your experience of using Yarn, to be deterministic and predictable. If something works for you, then it should work for your colleagues. If something crashes for you, then it should also crash for your colleagues. And this last part is quite important, because making sure that a program fails consistently allows you to make sure that it will also work consistently. If someone has a problem, you will be able to reproduce the issue and help them get past it.

Maintainability. We are trying to set up the project not only so that it succeeds now, but also so that it succeeds in the future. The way we see it, Yarn will still be there in ten years. How can we make sure that Yarn will still be in good shape in ten years? That's not so easy, because it means that we need to make some choices in terms of governance and in terms of the architecture of our own repository: how can we keep the codebase healthy? So that's one of our priorities.

Modularity is another one. Back in Yarn 1, we noticed that a lot of you had very specific use cases, and it was very difficult for us to implement all the features that you needed, sometimes features that only one company needed. So instead, what we decided to do with modern releases of Yarn is to make our core modular, meaning that you can write plugins, you can write commands that call into the core Yarn API that we provide and document, and you can write your own logic in just a few simple lines of code. Almost all of the Yarn commands are implemented through this system. For instance, the install command itself takes something like 50 lines to implement.

And finally, security. That's something that we are starting to introduce, because even though Yarn was safe before, in that we tried to prevent packages from accessing your disk, there are other types of attacks.
During the past few months, you may have heard about attacks such as ua-parser-js or faker.js. This kind of problem is starting to rise, and we want to provide a solution so that it's not a problem in the future.

You may notice that I didn't talk about performance. That's because we are in 2022, and all package managers have the same performance for the same features. Sometimes one package manager will do more things than another when doing an install, like for instance with Yarn we are doing some validation, but overall it's roughly the same thing. We track benchmarks of pnpm, npm, and Yarn on an hourly basis in our infrastructure, and something I noticed by trying to see what the differences between package managers were is that even though pnpm has some edge over us in our automated benchmarks, that's something I couldn't reproduce on my laptop. I think it goes to show that the performance of package managers is very relative, and at the point where we are, the differences don't change things that much.

Anyway, enough about perf. I want you to learn something about Yarn. So now we are going to discuss 14 things that you may not know about Yarn but that it actually does quite well.

So first, about installs, since that's the main thing that a package manager does. I don't know if you know that, but Yarn, even in modern versions, supports node_modules just fine. You may remember that in 2.0 we released a new install strategy called Plug'n'Play that allows you to not have a node_modules folder, to share your dependencies across all the projects on your machine, and to not be plagued by ghost dependencies. But we also support node_modules. So if your goal is just to migrate a project quickly, you can do that. It's very well supported.

We have a protocol called exec that allows you to create your own packages dynamically when running yarn install. For instance, let's say you have a package that is on SVN.
I don't know if other people are using SVN, but sometimes we do. If you wanted to fetch a package from there, we would normally have to implement that inside the package manager, so inside Yarn itself. With the exec protocol, you can just define a JavaScript script that fetches these packages from any location you want. I used SVN as the example, but it could be any other location.

Any workspace can be installed from Git. The Git protocol, when declaring a dependency as a Git dependency, allows you to install one workspace from any Yarn, pnpm, or npm project.

And finally, the patch protocol allows you to apply changes to any package and to keep those changes inside your repository. That's a very common use case when a project has a security issue that you need to address and the fix has not been released yet. You can just use the patch protocol to fix whatever is problematic. Another use case is, for instance, when you're trying to migrate from CJS to ESM and there is something breaking somewhere, just one line to change, but it's difficult to change upstream. You can use the patch protocol to change just the one line that is bugging you.

So that was for installs. Now, Yarn has a lot of features, and some of them are optional. You can opt into them and start using them, or you can just completely ignore them.

The first one is that Yarn can install packages using symlinks, just like pnpm. You may know this strategy that pnpm has, where instead of generating a concrete node_modules folder in which each file is actually a file and each folder is actually a folder, pnpm creates symlinks that point to a global store. That's something that Yarn can do. We introduced that in the 3.0 last year, and a few companies have started using it and giving us their feedback. The one thing I really like about Yarn is that we are very modular, as I mentioned. So we can implement a lot of things in very few lines of code.
So when we implemented the pnpm linker that allows you to do the symlinks, it actually took less than 100 lines to make the first iteration. That's the kind of thing that I was mentioning when I said that we want Yarn to be healthy, so that ten years from now we can still keep adding features and fixing bugs without being encumbered by past implementations.

We have a version workflow that allows you to manage versions across workspaces. So if the only reason why you're using, for instance, Lerna is to manage versions, that's something that you can get for free just by using Yarn. We are using this workflow as part of Yarn's own development, since all of our workspaces, and we have a lot of them, like 30 or so, are managed through this system. And so we are improving it frequently.

Constraints allow you to lint your projects' package.json files. Once you have a certain number of workspaces, like, for instance, Yarn with its 30-something workspaces, it becomes difficult to make sure that all of them satisfy the criteria that you have. For instance, you may want none of them to depend on some dependency: if you have both Lodash and Underscore, it would be a problem, so you might want to prevent one of them from being added anywhere. That's what constraints allow you to do. If you want to prevent two workspaces from depending on different versions of React, that's something that you can do with constraints too. So constraints are very powerful and allow you to define in very few lines of code, sometimes as few as two, what you want your workspaces to look like. And the nice thing is that it can just auto-fix all the problems. If you tell it what the state should be, it will apply the changes to all your workspaces in one command.

Another one is that Yarn can auto-install the @types packages as needed.
It's always a little annoying for me when I'm doing some TypeScript development, I add a dependency, and I see in my editor that the types don't show up. With Yarn, we can check whether the package ships its own types, and if it doesn't, we check whether it has a DefinitelyTyped package, and then we install it automatically. In 3.0, this behavior is opt-in. In 4.0, I think we are making it opt-out. But you can disable it if that's not something you want.

In terms of community service: open source is about community. We are writing this tool and making it available to you so that you can use it, and so that other projects can use it to maintain their own architectures. It means that we are trying to be good citizens, and we are trying to work with the community when implementing features or making sure that projects can benefit from the changes that we make, this kind of thing. So I listed a few of them.

The first one is that we actually contribute to third parties to fix dependencies. I mentioned earlier this problem of ghost dependencies, when you start relying on dependencies without declaring them in your package.json. While it may work in some cases, it usually leads to very subtle problems where the versions are not matching, which means that as you add and remove packages, you may suddenly end up in a state where your application doesn't work, even if you didn't touch any related dependency in your project. In order to solve that, the proper fix is to list the dependencies. But most package managers don't really surface this information very well. With Yarn, we try to do that, and each time we notice something that doesn't seem quite right, we try to work with the maintainers to fix those issues. It's important not only for Yarn and its users, but also for npm and pnpm users. As I mentioned, those problems occur everywhere.
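
To make the ghost dependency problem described above concrete, here is a minimal TypeScript sketch of the kind of check involved: compare what a package imports against what its package.json declares. The names `Manifest`, `packageNameOf`, and `findGhostDependencies` are hypothetical and written for this illustration. They are not part of Yarn's API, and Yarn's real detection works differently, at install and resolution time.

```typescript
// Hypothetical types and helpers, for illustration only (not Yarn's API).
interface Manifest {
  name: string;
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
  peerDependencies?: Record<string, string>;
}

// Reduce an import specifier to the package it refers to:
// "lodash/fp" -> "lodash", "@scope/pkg/sub" -> "@scope/pkg".
// Relative paths, absolute paths, and "node:" builtins return null.
// Simplification: bare builtin names such as "fs" are not recognized.
function packageNameOf(specifier: string): string | null {
  if (
    specifier.startsWith(".") ||
    specifier.startsWith("/") ||
    specifier.startsWith("node:")
  ) {
    return null;
  }
  const parts = specifier.split("/");
  return specifier.startsWith("@") ? parts.slice(0, 2).join("/") : parts[0];
}

// Return the sorted list of packages that are imported but never declared.
function findGhostDependencies(
  manifest: Manifest,
  importedSpecifiers: string[],
): string[] {
  const declared = new Set<string>([
    ...Object.keys(manifest.dependencies ?? {}),
    ...Object.keys(manifest.devDependencies ?? {}),
    ...Object.keys(manifest.peerDependencies ?? {}),
  ]);
  const ghosts = new Set<string>();
  for (const specifier of importedSpecifiers) {
    const name = packageNameOf(specifier);
    if (name !== null && name !== manifest.name && !declared.has(name)) {
      ghosts.add(name);
    }
  }
  return Array.from(ghosts).sort();
}

// Example: "lodash" and "@babel/core" are used but not declared.
const exampleGhosts = findGhostDependencies(
  { name: "my-app", dependencies: { react: "^18.0.0" } },
  ["react/jsx-runtime", "lodash/fp", "./utils", "node:fs", "@babel/core/lib/index.js"],
);
// exampleGhosts is ["@babel/core", "lodash"]
```

With a hoisted node_modules layout, both undeclared imports in the example could still resolve by accident if some other package happens to pull them in, which is why the breakage only shows up later, when unrelated packages are added or removed.
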
Each time you have something where the version is not quite compatible, although it probably should be, that's a problem of ghost dependencies.

We are part of the Node.js loaders working group. Loaders are the way that Node.js allows you to intercept the require calls and route them to different locations. For instance, that could be loading modules from HTTP instead of loading them from the disk. It could be loading modules from compiled archives instead of loading them from individual files. There are a lot of use cases for loaders. For instance, you may know Jest, which mocks your modules; that kind of thing goes through loaders. The loaders are very new. They didn't exist for CommonJS. They are starting to appear for ESM. So we are part of the discussion in order to figure out how to make them powerful enough to be practical in our world.

We run end-to-end tests against many open source projects. Something that we noticed by contributing to third parties is that it's easy for them to accidentally add another dependency, forget to list it, and then things start to break. So in order to prevent that, on our side, inside Yarn itself, every three hours we run a bunch of end-to-end tests by installing the very latest version of major open source projects, like Svelte, Next.js, Gatsby, Webpack, all kinds of projects, really, and checking whether they work on simple tests. If they don't, then we can immediately go to the maintainers, talk to them, and see what the best fix would be. It's been quite helpful for both us and the maintainers to track regressions.

And finally, we have implemented Corepack, a new Node.js tool that allows you to manage the version of your package manager on a per-project basis rather than a global basis. That's something I've been feeling very strongly about, because when you think about it, your package manager's job is to lock your dependencies.
Going from there, it feels a bit weird that the package manager is the only dependency of your project that wouldn't be locked, right? So with Corepack, you can actually lock the package manager to a specific version so that you are entirely sure that everyone on your team will have the exact same behavior. One thing to note about Corepack is that it works for Yarn: it's distributed with Node, and when you run corepack enable, you get Yarn inside your bin folder. But it also works for pnpm. That's something that I also felt strongly about, that things should work not only for Yarn, because we are one of the alternative package managers, but also for pnpm, which is another one. We should recognize them and accept them inside the community.

And that brings me to my other point, which is cross-project pollination. We want Yarn to be a kind of platform that can be used to build your own package manager if you want to. How does that translate in practice?

We maintain a database of ghost dependencies. All those problematic dependencies that I mentioned, where if you're missing one you may get different behaviors from one setup to another, are things that we track and store inside a small database. And pnpm, for instance, leverages this database in order to fix problems as they are reported. So basically, it's like a compatibility database.

pnpm itself leverages our code to generate node_modules: not the symlinked node_modules, but the concrete node_modules install, such as the one from npm. We implemented a hoister, which allows you to define the right layout given a set of packages. And we've been able to extract this code into a package so that other package managers can leverage it to implement this same kind of behavior.

And we have a bunch of libraries that we publish on their own. For instance, Clipanion is the framework that we use to build our CLI.
And instead of keeping it inside the Yarn code and just leaving it to live like that, we extracted it into a package that you can use inside your own application. So even if what you're doing has no relation whatsoever to package managers, you can still use code that was written for Yarn.

That's a lot of things, and I'm sure that you didn't know at least one of them. The thing with Yarn, and package managers in general, is that those are diamond fields. There are a lot of very different things that we are doing, and it's sometimes difficult to be aware that they exist before you need them. So at first you're like, yeah, but what's the difference between A and B? And then you start to dig and you start to see that there are very small features that together make up a whole.

Okay. I talked about what Yarn does, but where is Yarn still not good enough? And indirectly, what are we doing for Yarn 4?

In terms of friction, we are not exactly at the stage where I would want us to be. As I mentioned, it doesn't matter how good the buffet is if you can't find the door to the restaurant, right? So that's something that we want to address. We want to help you find the door.

For that, we are going all in on Corepack. Corepack will become the recommended way to install Yarn, and that brings Yarn closer to being part of Node.js. Corepack is still experimental in Node itself, but as far as Yarn is concerned, that's the tool that we are going to recommend people use to install Yarn. It means that even though with Yarn 2 and 3 it was recommended to check the built Yarn binary into your repository, that's no longer true starting from Yarn 4.

The CLI will become batteries-included. One of the changes that we made when we switched from 1 to 2 is that Yarn got a lot of new features, some a bit experimental, that we didn't include in the default binary, which required you to install and manage plugins that were written by us.
That's something that we are changing in Yarn 4. We are graduating all the features that were previously plugins so that when you get the Yarn binary, it will contain all the features that we have been working on. So auto-types, constraints, workspace tools, like for instance yarn workspaces foreach, which allows you to run a script on all your workspaces. Those are things that we know will be part of the default distribution of Yarn.

Another one is that we are making the local cache opt-in. Yarn has two caches. One is global to the machine, the other one is local to the project. With the new version, we are making the local one opt-in so that fewer files are generated when you're running an install.

Security. That's something that should be a default, not something you opt into. For that, we have --check-resolutions and --refresh-lockfile. Those two flags completely prevent any attack carried out by modifying the metadata in your lockfile. You may have heard about supply chain attacks. That's something that is not possible with those flags. However, those are flags. You need to enable them.

Which brings me to my other point, which is hardened mode. Yarn will try to detect which environment it is being run under. If it detects that it's an unsafe environment, for instance a public pull request, it will by default make different tradeoffs between speed and security and, for instance, enable the two flags I mentioned, so that you will not have to think about security when using Yarn.

Stable resolution. That's an alternative resolution strategy that we are working on, and that is actually used in other languages, for instance Go. It protects against attacks like ua-parser-js or faker.js. The thing with most attacks in JavaScript is that they are extremely viral. Once a package reaches the registry, it may be picked up by anything that depends on this package, but also by all the transitive things that depend on the vulnerable package. With stable resolution, we remove the viral factor.
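
The talk does not detail how stable resolution picks versions. As a rough sketch, assuming it follows the idea behind Go's minimal version selection (prefer the lowest version that satisfies a range rather than the highest), the difference can be illustrated in TypeScript as follows. The function names are hypothetical, the range handling only covers `^X.Y.Z` with X of at least 1, and none of this is Yarn's actual implementation.

```typescript
// Illustration only: a simplified resolver, not Yarn's implementation.
type Version = [number, number, number];

function parseVersion(text: string): Version {
  const [major, minor, patch] = text.split(".").map(Number);
  return [major, minor, patch];
}

// Negative if a < b, positive if a > b, zero if equal.
function compareVersions(a: Version, b: Version): number {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

// Simplified caret rule: same major version, and not older than the base.
// Assumption: major >= 1 and no prerelease tags.
function satisfiesCaret(version: Version, base: Version): boolean {
  return version[0] === base[0] && compareVersions(version, base) >= 0;
}

// Pick either the highest or the lowest published version matching "^X.Y.Z".
function resolve(
  range: string,
  published: string[],
  strategy: "highest" | "lowest",
): string | null {
  const base = parseVersion(range.slice(1)); // drop the leading "^"
  const candidates = published
    .filter((v) => satisfiesCaret(parseVersion(v), base))
    .sort((a, b) => compareVersions(parseVersion(a), parseVersion(b)));
  if (candidates.length === 0) return null;
  return strategy === "highest"
    ? candidates[candidates.length - 1]
    : candidates[0];
}

// Hypothetical scenario: 1.9.9 is a freshly published, compromised release.
const publishedVersions = ["1.2.0", "1.2.1", "1.3.0", "1.9.9"];
const pickedHighest = resolve("^1.2.1", publishedVersions, "highest"); // "1.9.9"
const pickedLowest = resolve("^1.2.1", publishedVersions, "lowest"); // "1.2.1"
```

Under the highest-match strategy, every fresh install without a lockfile entry picks up the new release as soon as it is published, which is the viral factor mentioned above. Under the lowest-match strategy, a new release is only used once some package explicitly raises its declared minimum.
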
It's not entirely ready. It's still experimental. It breaks some old packages; for instance, under this mode, things like gulp don't work anymore. So we are still discussing with the community to figure out the best way to roll this out, if we actually want to roll it out.

And finally, documentation. As I mentioned, there are a lot of things to discover in Yarn, and you often have very little bandwidth to look at them. So we are rebuilding our website in order to be clearer, to better present all this information, and to have better content in general.

I only mentioned those three main topics, but there are a lot of other things that we are improving in Yarn, like the patch protocol and faster boot time. We are improving the public APIs that you can use when writing Yarn plugins by adding new hooks and new functions. We are trying to give Yarn a better integration with Git in various commands so that it knows how best to manipulate Git. A lot of things are changing, but this is a 20-minute talk, so I can only go over a few of them.

I hope you enjoyed this talk. If you have any questions, feel free to ask them on Slido or ping me in the hall. I love talking about Yarn. And yeah, thanks for having me here.

Thank you very much, Mael. Very nice session. I learned a lot about Yarn. I will definitely give it a try. I don't think that we have any relevant questions on Slido so far, so folks, wake up, use this tool for sending questions. That doesn't mean that I will let Mael go without a couple of questions. And the first one is from myself. So Yarn is open source, right? And let's change places, so organizers ask us. Your advice: how to start contributing to Yarn? Maybe some issues that are easy to fix?

We have a label for good first issues. Generally speaking, we are extremely open to new people joining us in the Discord. A few times someone has joined us and said, hey, would it be possible for me to do this fix? Would it be merged?
And then we give them some ideas about how to bring it in line with the type of contribution that we are looking for. We have written contribution guidelines. So yeah, we really welcome anyone who wants to help us work on Yarn.

That's amazing. And I think we can take one question or maybe two from the room. Any questions about Yarn? Yes? Let me run to you with the microphone. Come closer. Thank you.

Thank you. Arthur from Big ID. You just showed a lot of amazing things that you did with Yarn. I recently tried to actually use Yarn 3 to improve mostly our CI performance and also, of course, to get all the local benefits from it. And I checked the Plug'n'Play option. I don't know if you mentioned it today, but the question is how a big organization of hundreds of developers can just switch to it, because there are issues with at least IDEs and, you know.

So that's actually what we did at my company. We are using Plug'n'Play. And the way that we do this is that you have various tools inside Yarn itself to fix phantom dependencies. For instance, in the Yarn settings, you can have a packageExtensions field that allows you to declare all the dependencies that are missing, so that Yarn knows they are there and stops telling you that there is a problem. As I mentioned, we also have this database of compatibility problems that we automatically fix when we are aware of them. So from time to time, people come to the repository and make a PR just to add new entries to this database so that everyone who adopts PnP is not affected by those issues. So it's definitely possible. It's a bit of work, because you have to go over the missing deps and add them to those settings. But overall, it's in a pretty good state.

Thanks for answering and thanks for asking. And yeah, we have some questions actually. Let's just read them aloud.
Would it be recommended to switch from npm to Yarn in a big, long-running project?

It really depends on what your priorities are, as I mentioned. I don't use npm, so I don't exactly know all the diamonds that it has over Yarn. I know that Yarn works for me. I don't know if it would work for you. Clearly, we are perhaps a bit more difficult to work with, in that we have more friction at the moment, as I mentioned. But at the same time, we are trying to do more to surface problems before they suddenly appear in the middle of a CI run, and then you have to stop what you're doing to try to fix things. So it's all a matter of trade-offs. See whether things work for you, whether there is a specific thing you would want to see fixed, see whether Yarn fixes it, and make your choice.

Fair enough, fair enough. You started your presentation with lots of statements about performance. Let's finish on the same note, and the question is exactly about that. What's your opinion on the non-deterministic performance results of package managers? Will Yarn become more consistent across different machines?

Overall, I think we are fairly consistent. The timing difference I mentioned between CI and laptops was still very small. So it was more just two lines going from one to the other with pnpm. When you're using PnP, it's even faster, because then we are just writing a single file to the disk. So the link step basically doesn't exist anymore. So depending on your settings, it can be extremely fast.

Awesome, awesome. Folks, I'm sure you have more questions for Mael, more questions about Yarn. And you have a chance to continue this discussion, because I kindly ask the organizers to guide Mael to the special area where you can sit and chat further. Thank you very much, Mael. Great presentation.

Thank you.
29 min
16 Jun, 2022
