The Zen of Yarn


In the past few years, Yarn has become one of the most common tools used to develop JavaScript projects, in no small part thanks to an opinionated set of guiding principles. But what are they? How do they apply to Yarn in practice? And, just as importantly, how do they benefit you and your projects?


In this talk we won't dive into benchmarks or feature sets: instead, you'll learn how we approach Yarn’s development, how we explore new paths, how we keep our codebase healthy, and generally why we think Yarn will remain firmly set in our ecosystem for the years to come.



Transcription


Hello everyone, my name is Maël and today we're going to talk a bit about Yarn. First, let me tell you a bit about who I am. You can find me as Arcanis on social networks, as I said I'm called Maël, and I work at Datadog as part of the developer efficiency team. Our job is to make sure that developers working at Datadog, product developers, can focus on writing products and don't have to deal with maintaining tools or infrastructure or deployments or this kind of stuff. As part of that, I've also been contributing to the Yarn package manager, and in fact leading its development since 2017.

Let me ask you a question for this talk: how do you evaluate projects? All tools have their own strengths and weaknesses, and it's your job as developers to decide which one you will want to use on a project in order to benefit the project itself. And as maintainers of open source projects, our job is to give you all the information you need in order to make a conscious choice that will allow you to move forward in your implementations. In order to do that, I could tell you the feature list of Yarn, but I don't think it would be as useful as many people make it out to be. Indeed, a feature list is transient; it's just a point in time. If I were to tell you the feature list of Yarn, or the nice things it can do for you, it would quickly become obsolete as we implement new ones. And in fact, we are working on Yarn 4, so as you can guess, new things will come in the next version. So instead of doing that and making this talk obsolete as I'm speaking, we are going to focus on the project itself and how it works. Why does Yarn thrive? Why will it keep doing so in the future? Why is it a safe bet for your project? That's what I think would be interesting to discuss.

In order to do this, I remembered something called the Zen of Python. You might not know it, but in Python, if you do a special type of import (import this), you get a poem printed on screen. I put the lines there.
Beautiful is better than ugly. Explicit is better than implicit. Simple is better than complex. You see the picture. The idea is that all of those statements express the philosophy of Python code. If you write Python code, it's supposed to be simple rather than complex, flat rather than nested, this kind of stuff. I really like this format, and I wondered what it would look like for Yarn. So I wrote those statements. I'm leaving them on screen for a few seconds. You don't have to read them; we're going to go over each and every one of them. Just screenshot them if you want. Okay, let's start.

But before we start, one last thing: Yarn is here for the ten years to come. That's our goal. So all of the slides that follow have to be put in this context. We are working on this project not only so that it's cool now, but also for it to remain cool in the future.

First statement: uniform is better than varying. Our whole CLI used to have widely different conventions from one command to another. For example, you had an option called --no-bin-links, another called --ignore-scripts, another --disable-pnp. And that's only for the CLI flags; the configuration settings themselves had different conventions in how they were named. The whole thing made it very difficult to know how to call things, and so it was difficult to remember them, even for us. For our users, it was even worse. It also increased the chance that further deviation would happen, because people contributing to the Yarn codebase would have no idea how they should call things; they would have no idea about the right nomenclature. So now we are doing our best to make sure that all the flags, all the settings, and all the commands have a consistent nomenclature, so that everything is neat and tidy.

Stable is better than unstable. We want your application to have only two states, as far as we are concerned.
Either it works everywhere or it crashes everywhere. Of course, it can be difficult to achieve that in every case, but we really do our best. For example, scripts are run by a cross-platform shell implementation. So even if you're writing scripts that are meant to be run on Linux and one of your contributors works on your project from Windows, there is a good chance that they will just work, even if you are using pipes, even if you are using redirections, because Yarn applies an indirection so that everything is portable. Another example is that Yarn will do the best it can to surface errors early. Let's say that in production you are removing all the dev dependencies. There is a good chance that you will be missing a few of them because of the node_modules hoisting issue, the so-called ghost dependencies. Yarn will do its best to tell you when that happens so that you can fix it. It will tell you: hey, this package is depending on something, but it is not installed. Please install it.

Readable is better than messy. As a package manager, we have a lot of information we could tell you, but only some of it is actually interesting. So we make a conscious effort to restrict the information we print on screen and only show the most important parts. Even more than that, we also try to highlight them so that they are easily parsed by a human, by using colors in semantic places. For instance, all the package names have the same color, all the branches have the same color, all the references have the same color. You see the picture. We use colors not to make things prettier, but to give them a meaning. Additionally, we also care a lot about vertical space, because I don't have a very large terminal. It really is a problem when, for example, you run an install and 300 lines of noise get printed on screen. That's something that we don't want to happen.
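As a sketch of what that portability means in practice (the script names and files here are made up for illustration), a package.json fragment like this would behave the same on Linux, macOS, and Windows when run through Yarn, even though it uses a pipe and a redirection:

```json
{
  "scripts": {
    "test:verbose": "node run-tests.js | node format-output.js",
    "build": "node build.js > build.log 2>&1"
  }
}
```

Yarn parses and executes the script line with its own portable shell interpreter rather than delegating to cmd.exe or /bin/sh, which is why the pipe and the redirection work everywhere.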
So we really make efforts to ensure that none of the PRs we merge have this kind of problem.

Errors are great tools; don't be afraid of them. Warnings and errors are ways to include the user in the decision process, so we don't shy away from printing them. When we're not quite sure what to do, we ask for guidance. And when something doesn't feel safe, we ask for confirmation, usually through settings. But they should be kept actionable; that's the important thing about errors. If we print an error but you have no way to fix it, it will just be frustrating. Too many errors will make our users blind to them: once you start getting errors that you cannot do anything about, for example peer dependency warnings, you will just stop reading them. That's not something that we want to happen. So we try to only show errors and warnings if you can actually do something about them. To help with that, we have a list of all error codes on our website, along with detailed explanations for each of them, like small articles, really, that explain what the problem is and how to fix it. And we frequently improve the wording based on the feedback we receive from our users. We really make conscious efforts to write something readable.

Speed is important, benchmarks not so much. Benchmarks usually compare package managers. Years ago, they were in very different places; a lot of them were much slower than they currently are, including Yarn. But nowadays, speed is rarely the real differentiator between them. Yarn has automated benchmarks, which we use to spot dramatic regressions if it comes to that. And as an example of why benchmarks are not the best way to judge how good a package manager is: the same benchmarks that we use ourselves don't give the same results depending on whether we run them on CI or on our laptops.
So it's really difficult to name a winner in terms of speed nowadays, because it depends not only on the package managers themselves, but also on the environment they run in. Overall, speed isn't a goal in itself; it's a means to an end. We are doing this for a reason: to improve the developer experience, your user experience. So while speed is something we take into account, it's not the main thing we have in mind.

Aim to do the right thing by default. That's something we are currently starting to adopt more and more. Until now, Yarn was a regular tool and had the same behavior in every environment. But the problem is that configuring a package manager is a bit difficult, especially if you have to take into account different environments, if you have to take into account security, this kind of stuff. For instance, would you spot a PR that adds a malicious dependency to your product, a so-called supply chain attack? It's not clear. But now we are trying to do something about it. For instance, Yarn enables the --immutable flag by default on CI, so you cannot commit a lockfile if it's not up to date. In the next major, we are going to add additional checks when we detect that a branch is a PR, so that we can check, for example, for a corrupted package in the cache, all without you having to configure Yarn itself.

Being the status quo doesn't justify itself. This is an interesting statement. A lot of the behaviors in Yarn have been inherited from npm or Yarn 1. For instance, did you know that if a package has a binding.gyp file, it has an implicit postinstall? Even if it doesn't declare it, Yarn and npm will still run node-gyp, just because that's how it has always worked. We sometimes re-evaluate features and modernize or deprecate them. In this case, the node-gyp postinstall has been kept, because deprecating it would have broken too many things.
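To make the earlier CI point concrete: the immutable behavior can also be requested explicitly. A minimal sketch of a CI step, assuming a GitHub Actions workflow (the file layout and action version are illustrative, not prescribed by Yarn):

```yaml
# Hypothetical fragment of .github/workflows/ci.yml
steps:
  - uses: actions/checkout@v3
  - run: corepack enable
  # Fails the job instead of silently updating yarn.lock when it is out of date
  - run: yarn install --immutable
```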
But in other cases, we have decided to deprecate a few features. For instance, bundled dependencies were deprecated a few years ago, and we've been doing just fine without them.

Saying no is both hard and necessary. Many contributions that we receive are one-shot. That's not a problem per se, but we have to assume that we will have to maintain the code we merge ourselves. And given that we are a small team, that's a lot of pressure on us. So we need to assess usefulness, complexity costs, and maintenance costs before deciding to merge something. Sometimes the bar isn't met, and that makes a feature ineligible for core adoption. But that shouldn't be a blocker either: Yarn provides hooks that applications can use to implement their own behaviors. For instance, Yarn plugins can register commands, they can access our JS APIs, they can work with a project, they can manipulate dependencies, they can do all kinds of things. As such, us saying no rarely prevents a feature from being developed independently. For instance, one community member created a plugin that is essentially made to replace Turbo and this kind of tools. It's really interesting to see what the community comes up with just by using plugins.

Users are collaborators, not customers. Open source sustainability is a large topic, and perhaps you will see talks about this in the future. As we said, we don't have the resources to investigate everything. We are not your support team; we don't have the resources for lengthy back and forth on support requests, trying to figure out what's unique about your environment, this kind of thing. What we can do, though, is share context and keep the codebase healthy so that you can investigate your problem yourself. That's the only way we can make sure that all bugs can be addressed: by making sure that you have the information you need to dig into them if it comes to that.
And of course, the positive side is that some people who made the effort to understand their problems sometimes become recurring contributors and help us maintain the project. This has already happened.

We work for the benefit of future contributors. Remember what I said about Yarn being alive for 10 years? There's something called the bus factor: if only one person is handling a project, then if this person disappears, the project has a big problem. And as I said, we want Yarn to survive 10 years or more. I don't know if in 10 years I will still be working on Yarn, but I know that I will probably still be using it. So what I'm doing, and what our whole team is doing, is making sure that we invest in maintenance to help future generations of contributors. Every now and then we improve, for example, the release experience, so that it's just a button on GitHub Actions, or the merge experience, so that we can automatically resolve conflicts just by clicking a button on GitHub, this kind of thing. It's not entirely selfless: the more active contributors we have, the more bandwidth we have ourselves, less pressure, and more time to spend on R&D, fun features, and things like this.

We try to be teachers and advocate for good practices. There's a school of thought that advises shielding developers from the consequences of incorrect usage, always trying to do the thing they probably want to do, because it decreases the barrier to using the tool and that's a good thing for beginners. But there's another school, and I think I'm more of this one, that aims to surface mistakes early and explain how not to repeat them. Yarn is more about the second one. It's difficult to know exactly what users want to do. So rather than make guesses and be wrong, we prefer to tell you what doesn't work, and then you can look for yourself at what it means and what the ways to fix it are, depending on the context.
Because the context can sometimes change how you are supposed to fix a problem, right? So that's what we do.

Benefit the ecosystem, not only direct users. Our team made dozens, and when I say dozens, I'm not entirely sure it shouldn't be hundreds, of PRs and comments to help improve third-party projects. We have been on all kinds of repositories, we have answered questions, and we keep doing that. Plug'n'Play has had tangible effects on ghost dependencies: even if you're an npm or pnpm user, you can partially thank us for decreasing the chances of ghost dependencies, because we've been working with third-party projects to solve their missing dependencies. Each time one of our users told us, hey, Yarn is reporting a problem with this package, we'd look at it, decide whether it was a problem in Yarn or in the third party, and discuss it with their maintainers. That improved a lot of packages in a dramatically short time. We also advocate for new features to other package managers, and sometimes we even implement them ourselves; as an example, I myself implemented optional peer dependencies in npm years ago. Additionally, we work with the Node.js project. For example, I designed and implemented Corepack for Node, which ships both Yarn and pnpm with Node.js. And we also contribute to the Node.js loaders working group, so that the loader API, which is used by things like Jest to mock modules, is as powerful as it can be. So we really try to be good citizens of the whole ecosystem, and not only work to improve our projects, but everything that is JavaScript.

To do that, we often pick the destination and interpolate the journey. The ecosystem has many stakeholders, and each time you want to do something, improvements will often have to be planned very long term, because everyone needs to agree, and no one agrees at first. So you need to discuss for a long time for things to come to fruition.
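For reference, the optional peer dependencies mentioned a moment ago are expressed through the peerDependenciesMeta field in package.json; a minimal sketch (the react package is just an example):

```json
{
  "peerDependencies": {
    "react": ">=16"
  },
  "peerDependenciesMeta": {
    "react": {
      "optional": true
    }
  }
}
```

With this, package managers won't warn when the peer dependency is absent, but will still validate the version range when it is provided.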
Additionally, if we were to fix things at the wrong layer, it would hurt maintainability, because it would become difficult to remove them once they are no longer needed. So first, we figure out what the right fix would be in an ideal world, regardless of how long it would take. Then we decide which paths we want to follow to get there. And if we make mistakes along the road, we correct the trajectory as needed, while still keeping the end of the journey in mind. So far it has worked well. For instance, Plug'n'Play has been developed like this: we knew where we wanted to go, and we interpolated intermediate steps from where we were, in order to progressively improve the situation around ghost dependencies.

And finally, the last rule, but the most important one: if it's not fun, you're not doing it right. Actually, this one is for us as maintainers: if it's not fun, we are not doing it right. Remember what I said about open source sustainability. It's extremely important for maintainers to find fun in what they are doing. Fun creates enthusiasm, it creates energy. And it's more fun when you're not the only person doing it. Maintaining a project can be difficult, stressful. We have to remember every day that we are doing this for us. If I'm working on Yarn, I'm sorry, but it's not for you; it's because I find pleasure in doing it, I find pride in doing it. That's something that is important to remember when you're developing open source. And it's also important to remember as a user, because it's a guarantee that we are going to keep working on this project: we are going to keep finding it fun, and we are making sure that we don't burn out while working on it. Again, we owe it to ourselves to keep the environment as fun and toxicity-free as possible. We are investing in it. So even though it's the last slide, it's the most applicable to open source in general.
I hope you liked this talk. Now we are going to do a quick Q&A; I will be available to answer your questions.

You asked a question earlier about the audience's experience of contributing to open source, and the results are in: 55% say that they haven't tried contributing to open source, 36% loved it, and for 9%, it was difficult. So what do you think about it?

I think it's super interesting, because open source is such a big part of being a developer today. It's a good thing to see that so many people actually tried to make a contribution and that they loved it. The 55% who haven't tried it yet is still a bit too large; that's part of the work that we are doing as maintainers of many projects, trying to decrease these numbers so that more people can feel invested in the projects they use. But yeah, those are great numbers. I'm more bothered by that last number: for 9% of people, it was difficult. It could be difficult at times, I guess.

Yeah. But I'm really glad that most of the people liked it. Awesome. So let's take some questions from the audience. Darkovic asks: naming consistency is cool to have, but how do you keep it consistent in a huge, growing project with multiple teams?

It can be difficult. Something that I noticed can work is to enforce it via linting. You can have lint rules that prevent certain patterns. That works because if you know that people often tend to use, for example, allow instead of enable, then you can just add a lint rule that forbids allow and tells them: hey, please use this nomenclature instead. At the scale of a large codebase, that can work. And of course, there is code review, where people who are used to the codebase can take a look at the code. As a contributor to a codebase you don't know well, one thing that also works is to look at how the existing code is implemented and what things are named.
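The linting approach described here could be sketched as follows; this is a hypothetical standalone check written for illustration, not an actual Yarn lint rule:

```typescript
// Hypothetical naming-convention check: rejects option names built on
// discouraged verbs and suggests the preferred nomenclature instead.
const PREFERRED: Record<string, string> = {
  allow: "enable",
  forbid: "disable",
};

function checkOptionName(name: string): string | null {
  for (const [banned, preferred] of Object.entries(PREFERRED)) {
    if (name.toLowerCase().startsWith(banned)) {
      return `Use "${preferred}..." rather than "${banned}..." (got "${name}")`;
    }
  }
  return null; // name follows the convention
}

console.log(checkOptionName("allowScripts"));  // flags it, suggests "enable..."
console.log(checkOptionName("enableScripts")); // null: follows the convention
```

A real setup would wire a check like this into an ESLint rule or a CI script so that deviations are caught before review.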
And you can often see patterns in there that you can then reuse inside your PR. The same is true not only for naming, but in general: when you're writing a feature, it's often very close to something that already exists. So by looking at what already exists and how it is implemented, not to copy-paste it, but as a reference, things can be simplified a lot.

Yeah, that's nice. We never used ESLint for this in any of my previous projects, but we did have naming convention documentation: you go read it and name things according to that. But yeah, it could also get messy sometimes. Documentation. Interesting. Thank you for answering that; I hope that answers your question, Darkovic. We have another question by Jay Reed, who asks: we have dozens of devs contributing to our monorepo. Is it recommended that everyone use the same exact version of Yarn, or does that not matter?

Yeah, for sure. Even though we try not to ship breaking changes between releases, except majors of course, we are humans: we can make mistakes, we can fix bugs that you were relying on, or we can introduce bugs in our releases. What I notice is that if everyone has the exact same version of a package manager, whatever it is, they are guaranteed to have the exact same behavior. If there are different versions, there's always something that will be slightly different from one developer to another. And even though in general it might not be a problem, it will be a problem at some point, and it will cause a very difficult-to-debug situation that you don't want to happen. With Yarn, we've had for a few years the yarnPath setting, which allows you to check Yarn into your repository so that it is transparently used by anyone in your company. And with Node, we worked during the past year to ship Corepack, which allows you to do the same thing, but without checking Yarn into your repository.
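In practice, that Corepack setup comes down to a single field in package.json (the version number here is just an example):

```json
{
  "packageManager": "yarn@3.2.0"
}
```

Once developers have run corepack enable, invoking yarn inside the project transparently fetches and uses exactly that version.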
You just have to set a packageManager field inside your package.json, and provided people install Yarn through Corepack, they will get the version that you described. Essentially, it's a way of locking down the version of your package manager.

Yeah, that's it. Okay, thank you for answering that. We have another question by Cece Miller: I once heard it said that a good package manager will be able to tell you the reasons not to use their product. What would your answer be to that? And how can we help that answer go away in the open source community?

That's an interesting question. I would say that Yarn is perhaps a bit more opinionated than other package managers. We really try to tell you when something doesn't seem right by our standards. For example, when a dependency is accessed but is not listed, we tell you that it's a problem. In other package managers, it might work, because it usually works until it doesn't. Yarn really surfaces errors early, which can be a friction for adoption, because some packages were not written with this kind of strictness in mind. So there is some work to do. That's something I mentioned in the talk: we have been working with the ecosystem in its entirety. We went to third-party projects that had issues like that, and we discussed and worked with them to fix those problems in all the cases that were reported to us. There are still a few that remain, and every now and then we hit a situation where we need to add a little bit of configuration to instruct Yarn that some dependency should be there and it can proceed. In general, I think the main disadvantage of Yarn is also its biggest advantage, in that we are very strict and we really want your project to be consistent and to use safe patterns.

Consistency is definitely important. We have some generic questions: how do you manage to find new ideas for improvement?

We have a team of contributors.
Usually, any one of us has an idea about how to improve Yarn; an open source project always has more ideas than people to work on them. That's also why we rely so much on people contributing to the project: it gives us room to work on those crazy ideas we have in mind but don't quite have the time for yet. Every now and then, for example right now while we are working on Yarn 4, the next major version of Yarn, we take the time to run some crazier experiments to see what we could improve. For instance, in Yarn 4 we will be experimenting with a few things around the offline mirror, by trying to disable it by default instead of enabling it by default. We've been experimenting with splitting the Yarn bundle. We are trying to decrease the runtime of the Yarn binary. We are doing a lot of very interesting small work. Usually, we have a large backlog; that's how we do it.

That sounds interesting. Very exciting, to be honest. We have a very interesting question by Sergio: if you could redesign Node's dependency system from scratch, what would you change?

Well, I already did, in a sense. We worked on Plug'n'Play, which is the default install strategy for packages when using Yarn. Even though we support node_modules, and it works better with Yarn 3 than it used to with Yarn 1, Plug'n'Play is really the way we see how Node resolution should work. Basically, it's just a map of where all the packages are on your system and which package depends on which other one. In a sense, it's very much like import maps, if you've heard of them, except that it is more integrated with Node. For example, if you're accessing a package that you didn't list, PnP is able to tell you what the exact problem is. Is it a package you didn't list? Is it a peer dependency that didn't get provided by whoever depends on your package?
Is it because a package didn't get installed because it's native and it doesn't match the current system? What we did with PnP is how we see Node resolution working. It's not yet the case natively, and we are working with Node to be able to implement it officially through loaders, so that we would use the right interface.

Cool. We are running out of time now, but for the audience, you can still continue asking questions in the Discord channel and Maël will answer them. There is no special chat, but make sure to ask your questions and he will be there in the Discord to answer them for you. Thank you so much, Maël, for your informative talk and for answering all these questions.

Thank you, Ennerada. It was a pleasure to be here.
31 min
24 Mar, 2022
