pnpm – a Fast, Disk Space Efficient Package Manager for JavaScript

You will learn about one of the most popular package managers for JavaScript and its advantages over npm and Yarn.

  • A brief history of JavaScript package managers
  • The isolated node_modules structure created by pnpm
  • What makes pnpm so fast
  • What makes pnpm disk space efficient
  • Monorepo support
  • Managing Node.js versions with pnpm



Transcription


Hello, my name is Zoltan Kochan, and today I'm going to talk about pnpm, which is a fast, disk space efficient package manager. Before that, let me introduce myself. I was born and raised in Ukraine and I currently live in Ukraine as well; for now I'm safe. I work remotely for Bit, a company that helps developers with component-based development. Before Bit I worked for JustAnswer for 9 years, and in the meantime, since 2016, I have been continuously developing and maintaining pnpm.
Before talking about pnpm, let's briefly talk about other JavaScript package managers. The most popular Node.js or JavaScript package manager is npm, which is the official package manager of the npm registry. It is shipped with Node.js. In the past npm had a lot of issues: it was slow, it was non-deterministic, and it sometimes gave funny results, so some alternatives were created. One of the alternatives was created by Facebook. You have probably heard about Yarn, which is the second most popular package manager after npm. It is now maintained by the community, and it did solve a lot of the issues that npm had in version 3 and, I think, version 4. Since then, in v2, Yarn has switched to using Plug'n'Play by default, so even though it does support classic node_modules installation, it now prefers and recommends Plug'n'Play, which many love and many hate. I personally think it's a cool feature. Yarn is currently shipped with the latest version of Node.js through the Corepack tool.
The third most popular package manager is pnpm. It is a completely indie project; it was created by open source contributors to fix the issues of npm version 3, at the same time as Yarn. pnpm is not a new project; it has existed basically for as long as Yarn has. It is now supported by Bit, because I work at Bit and pnpm is heavily used in the Bit CLI for package management. pnpm is also shipped with Node.js through the Corepack feature of Node.js.
If we compare these package managers by popularity, npm is obviously the most popular one right now, then Yarn, and pnpm is the least popular, even though pnpm came out at the same time as Yarn. Of course, Facebook had a lot of marketing power to make Yarn very popular at the start. However, even though pnpm is less popular for now, it had a big spike in popularity last year: in 2021 pnpm was downloaded three times more than in 2020. A lot of big tech companies already use pnpm; even Microsoft uses pnpm in some of its projects, and the TikTok frontend team uses pnpm. So pnpm works really well and is production-ready for sure.
Let's see what makes pnpm unique. When you install a dependency with npm or Yarn Classic, all the sub-dependencies are hoisted to the root of node_modules. As you can see in this example, even though only express is in the dependencies, all those other packages are also hoisted to the root of node_modules. pnpm, on the contrary, puts only express in the root of node_modules. With the hoisted layout, even though cookie is not a dependency of your project, importing it will work. This is a dangerous situation. Your package will work fine locally, and it will even work when it is installed as a dependency. But what will happen if a new version of express comes out and cookie is not in its dependencies anymore? Your package will break. pnpm prevents such mistakes. With pnpm, your code only has access to packages that are declared as dependencies of your project.
But how is pnpm able to create such an isolated node_modules structure? In this example, express is a direct dependency and cookie is a dependency of express. As you can see, only express is placed in the root of the project's node_modules folder. And that express inside node_modules is actually just a symlink to a folder inside the .pnpm folder.
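The layout described above can be reproduced by hand in a temporary directory. The following is only a minimal sketch, not pnpm's own code: the two packages are empty stand-ins written by the script, the cookie version number is illustrative, and the helper names writePackage and linkDir are made up for this example. It builds a node_modules folder where only express is linked at the root, then checks what the project can and cannot require.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');
const { createRequire } = require('node:module');

// Hypothetical helper: write a tiny stand-in package into `dir`.
function writePackage(dir, name, version, source) {
  fs.mkdirSync(dir, { recursive: true });
  fs.writeFileSync(
    path.join(dir, 'package.json'),
    JSON.stringify({ name, version, main: 'index.js' })
  );
  fs.writeFileSync(path.join(dir, 'index.js'), source);
}

// Hypothetical helper: create a directory symlink.
// The 'junction' type only matters on Windows and is ignored elsewhere.
function linkDir(target, linkPath) {
  fs.mkdirSync(path.dirname(linkPath), { recursive: true });
  fs.symlinkSync(target, linkPath, 'junction');
}

// realpath first, because the temp dir itself may sit behind a symlink.
const project = fs.realpathSync(
  fs.mkdtempSync(path.join(os.tmpdir(), 'isolated-layout-'))
);
const virtualStore = path.join(project, 'node_modules', '.pnpm');

// Real locations: each package lives in its own folder under .pnpm.
const expressReal = path.join(virtualStore, 'express@4.17.3', 'node_modules', 'express');
const cookieReal = path.join(virtualStore, 'cookie@0.4.2', 'node_modules', 'cookie');
writePackage(cookieReal, 'cookie', '0.4.2', "module.exports = 'fake cookie';");
writePackage(expressReal, 'express', '4.17.3', "module.exports = { cookie: require('cookie') };");

// The dependency of express is linked next to the real location of express.
linkDir(cookieReal, path.join(virtualStore, 'express@4.17.3', 'node_modules', 'cookie'));
// Only the direct dependency is linked into the project's node_modules root.
linkDir(expressReal, path.join(project, 'node_modules', 'express'));

// Resolve modules as if we were a file in the project root.
const projectRequire = createRequire(path.join(project, 'index.js'));

const resolvedExpress = projectRequire.resolve('express');
const express = projectRequire('express');

let cookieError = null;
try {
  projectRequire.resolve('cookie');
} catch (err) {
  cookieError = err.code;
}

console.log('express resolved to:', resolvedExpress);
console.log('express can load cookie:', express.cookie);
console.log('project requiring cookie fails with:', cookieError);
```

The project reaches express through the symlink, express reaches cookie from its real location, and the project itself cannot resolve cookie because it was never declared as a direct dependency.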
This works because, when searching for dependencies, the Node.js resolution algorithm uses the real location of a package as the starting point. The real location of express is inside the .pnpm folder, so Node.js will search for cookie in the dedicated node_modules folder, which is at .pnpm/express@4.17.3/node_modules.
Let me show you a quick demo. I have an empty project. I run pnpm add express. Installation is completed. Let's see what we got here. Okay, ignore this; this is something we hoist by default to fix some issues, these types. But you can see that only express is hoisted, and it is a symlink. The real location of express is here. And here, in this folder, are also all the direct dependencies of express. So you can see cookie here. But cookie is actually a symlink to a dedicated, isolated folder for the cookie dependency. So cookie is actually here, and this is a symlink to cookie. Likewise, this is also a symlink to this folder of the accepts package. So this is the real location of accepts, and these are the dependencies of accepts. You can see that each package is in its own isolated folder with its own dependencies.
Let's return to the slides. pnpm also has very good monorepo support, thanks to its superior node_modules structure. The hoisted node_modules directory structure is even worse in a monorepo. In a monorepo, when a hoisted node_modules layout is used, all dependencies of all projects are hoisted to the root of the monorepo. Hence, projects have access even to dependencies of other projects. In this example, there is a monorepo with two projects, foo and bar. foo has one dependency, dep of foo, as you can see in its package.json file. And bar also has one dependency, dep of bar.
So if a hoisted node_modules is used, both dependencies are hoisted to the root of the monorepo. foo has access to the dependency of bar, and bar has access to the dependency of foo. If you accidentally require the dep of bar in foo, this code will work locally, but foo will break when installed as a dependency outside of the monorepo. This is even worse than in a single-project scenario: in a single project it might break in the future, but in this case it will break after the first release of the package. With pnpm's isolated node_modules structure, projects have access only to their own dependencies. In this example, we have a monorepo. pnpm creates a single .pnpm folder in the root of the monorepo, and projects of the workspace link only their direct dependencies. So both foo and bar have access only to their own dependencies.
You may ask yourself, why is npm not using the same approach? Well, actually it will, because they are working on a setting to support a pnpm-style node_modules directory. But the reason they don't use it as a default is that some tools don't support symlinks yet, for instance React Native's Metro. Also, many packages in the ecosystem are basically broken because of this hoisted node_modules architecture: they rely on undeclared dependencies. pnpm has a solution for these cases, because pnpm is also able to create a classic node_modules with classic hoisting and without symlinks. Just set the node-linker setting to hoisted and you have a workaround.
pnpm also solves the disk space usage issue. If you have many Node.js projects on your computer, you may have noticed that node_modules consume a lot of your disk space. pnpm solves this issue by using a content-addressable storage. A content-addressable storage is basically a storage that saves each file under a hash, which is calculated from the content of the file.
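As a rough illustration of that idea, here is a minimal sketch of a content-addressable store in Node.js. It is not pnpm's actual store format: the directory layout (first two hex characters of a SHA-512 hash as a folder), the function names addToStore and linkFromStore, and the package name foo are all assumptions made for this example. Files are written to the store once per unique content and then hard-linked into a project, which is the linking method the talk goes on to describe.

```javascript
const crypto = require('node:crypto');
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Save `content` under a path derived from its hash and return that path.
// Assumed layout for this sketch: <store>/<first 2 hex chars>/<remaining chars>.
function addToStore(storeDir, content) {
  const hash = crypto.createHash('sha512').update(content).digest('hex');
  const storedPath = path.join(storeDir, hash.slice(0, 2), hash.slice(2));
  if (!fs.existsSync(storedPath)) {
    fs.mkdirSync(path.dirname(storedPath), { recursive: true });
    fs.writeFileSync(storedPath, content);
  }
  return storedPath;
}

// Place a file into a project by hard-linking the store entry to `dest`.
function linkFromStore(storeDir, content, dest) {
  const storedPath = addToStore(storeDir, content);
  fs.mkdirSync(path.dirname(dest), { recursive: true });
  fs.linkSync(storedPath, dest);
  return storedPath;
}

// Store and project live under the same temp dir, so they share a file system.
const root = fs.mkdtempSync(path.join(os.tmpdir(), 'cas-sketch-'));
const store = path.join(root, 'store');

// The same unchanged file appears in three versions of a hypothetical package.
const unchanged = 'module.exports = () => 42;\n';
const versions = ['1.0.0', '1.0.1', '1.1.0'];
const installed = versions.map((version) => {
  const dest = path.join(root, 'project', 'node_modules', `foo-${version}`, 'index.js');
  linkFromStore(store, unchanged, dest);
  return dest;
});

// Count the files that actually exist in the store.
const storedFiles = fs
  .readdirSync(store)
  .flatMap((prefix) => fs.readdirSync(path.join(store, prefix)));

// Every installed copy should point at the same inode as the store entry.
const inodes = new Set(installed.map((file) => fs.statSync(file).ino));

console.log('files in store:', storedFiles.length);
console.log('distinct inodes among installed copies:', inodes.size);
console.log('link count of the shared file:', fs.statSync(installed[0]).nlink);
```

Because the path is derived from the content, writing the same file a second or third time finds the existing entry and adds nothing to the store.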
So you can see in this example there is a code file in different versions of the same package. Even though they are different versions, the code file is the same; it has no changes. In this case, this file will have the same hash in every package, so a single file will be created for it in the store. Even though you have three packages, you save this file only once, in one place on the disk. Every file in node_modules is just a hard link to the content-addressable storage. Actually, on systems that support it, copy-on-write files are used instead of hard links, probably on Macs and some newer Linux file systems. These files don't consume additional disk space; they are just references to files in the store. It means that if two projects on your computer have the same file in node_modules, in both projects that file will be linked from the same place on the disk. You won't have two copies of the same package. You will just have hard links to the global content-addressable store. As a result, dependencies in new projects consume a lot less disk space than with npm or Yarn.
Now let's talk about... Oh, I have one slide mixed up, no problem. So why don't other package managers support a content-addressable store? Actually, Yarn has added support for a content-addressable store as well, but for now it is an opt-in feature. I think they plan to make it on by default in the next version.
Now let's talk about why pnpm is so fast. In most cases, pnpm is faster than npm and Yarn, but why is that? npm and Yarn install dependencies in stages. At the first stage, all the packages are resolved. Then all the packages are fetched from the registry. And when all packages are fetched, the dependencies are written to node_modules. pnpm, on the other hand, runs the installation stages separately for each dependency. So while some packages are still being resolved, others are already being fetched.
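The difference between the two strategies just described can be sketched with simulated work. This is a toy model, not how any of these package managers is implemented: the stage names, the tick counts, and the functions installStaged and installPipelined are invented for the illustration, and no network or disk access takes place.

```javascript
// One unit of simulated work. setImmediate keeps the ordering deterministic.
const tick = () => new Promise((resolve) => setImmediate(resolve));

// Run one stage for one package and record when it starts and finishes.
async function runStage(log, stage, pkg, ticks) {
  log.push(`${stage}:start:${pkg.name}`);
  for (let i = 0; i < ticks; i++) await tick();
  log.push(`${stage}:done:${pkg.name}`);
}

// Staged: every package must finish a stage before the next stage begins.
async function installStaged(pkgs) {
  const log = [];
  await Promise.all(pkgs.map((pkg) => runStage(log, 'resolve', pkg, pkg.resolveTicks)));
  await Promise.all(pkgs.map((pkg) => runStage(log, 'fetch', pkg, pkg.fetchTicks)));
  await Promise.all(pkgs.map((pkg) => runStage(log, 'link', pkg, 1)));
  return log;
}

// Pipelined: each package moves through its own stages independently.
async function installPipelined(pkgs) {
  const log = [];
  await Promise.all(
    pkgs.map(async (pkg) => {
      await runStage(log, 'resolve', pkg, pkg.resolveTicks);
      await runStage(log, 'fetch', pkg, pkg.fetchTicks);
      await runStage(log, 'link', pkg, 1);
    })
  );
  return log;
}

// Made-up durations: cookie takes longer to resolve than express.
const packages = [
  { name: 'express', resolveTicks: 1, fetchTicks: 1 },
  { name: 'cookie', resolveTicks: 4, fetchTicks: 1 },
];

const demo = (async () => {
  const staged = await installStaged(packages);
  const pipelined = await installPipelined(packages);
  console.log('staged:   ', staged.join(' '));
  console.log('pipelined:', pipelined.join(' '));
  return { staged, pipelined };
})();
```

In the staged log, fetching express only starts after cookie has been resolved. In the pipelined log, express is fetched and even linked while cookie is still resolving.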
And later on, some packages are written to node_modules while others are still being fetched. This makes pnpm incredibly fast, faster than other package managers. You might ask why other package managers don't do the same. Some of these speed optimizations are possible only thanks to the unique deterministic node_modules structure created by pnpm inside the .pnpm folder. Some are possible thanks to the usage of the content-addressable store. And some of the optimizations make the code of the package manager hard to understand, so it's a trade-off: if you choose to make these optimizations, it will be harder to contribute to your package manager.
The last feature I want to talk about is that pnpm is bundled into an executable. So you can use it on your system even if it doesn't have Node.js pre-installed. Then you can use pnpm to install Node.js and switch between different versions of Node.js. So basically you can use pnpm instead of nvm, nvs or Volta.
So which package manager should you use? It depends. You should know about every package manager. You should know about the features of pnpm, Yarn and npm. All three of these package managers are mature, are used by big tech companies, have many contributors and are reliable. So you really need to learn what features they provide, and on each project you should probably pick the one that best fits your needs. Thanks for attending my presentation. Of course, visit the pnpm website to get more information. Follow the pnpmjs Twitter account and follow me on Twitter as well. Thank you.
That was pretty interesting. Loved all the details that you shared, so thank you for that. You asked the question: how are files from the content-addressable store linked to node_modules? And the results are in: 63% of people say using symlinks, while 38% say hard links. So what do you say?
It's something that many people get wrong: they think that we use only symlinks, but in reality we use both symlinks and hard links in pnpm. Specifically for this purpose, for linking files from the content-addressable store, we use hard links. It's important. We can use hard links, or we can copy files, or we can use copy-on-write files. But we cannot use symlinks for this purpose, because that would break the Node.js resolution algorithm.
Cool. So 38% of you got it right: it's hard links. Just to expand a bit on it, maybe you would like to briefly explain the difference between symlinks and hard links for the audience.
Yeah. So a symlink is basically just a reference to another place on the disk. If a Node module is symlinked, Node.js resolves the symlink to the real location of the file. So when it is searching for the dependencies of this module, it searches from the real location of the file. A hard link is different. It's basically another file with its own location. You can have many hard links on the disk in different places. For Node.js, they are like separate files. But these files point to the same location on the physical disk, so they do not consume additional space on your disk.
Cool. And that is how you make it disk space efficient.
Yeah, correct.
That's perfect. So let's take some questions from the audience. We have a question: how difficult is it to migrate a codebase from Yarn to pnpm, for example?
If it's just a single-package repository, then it's really easy. There is an import command. You can just run pnpm import and it will convert your lockfile from yarn.lock to pnpm-lock.yaml. You just need to remove the yarn.lock file and commit the new lockfile. And then maybe update your CI scripts from yarn install to pnpm install, and from yarn run to pnpm run. In most cases, all you need to do is change yarn to pnpm and it will work as a drop-in replacement.
For a workspace, you need to change how you list the package globs. Yarn uses a field in package.json and pnpm uses a separate file, pnpm-workspace.yaml. That is the biggest change, almost all of it. It's really easy.
Okay. And npm as well?
In most cases, all three package managers are compatible with each other.
Cool. So you heard him: it's not difficult, it's easy, so you can try it. Okay, I had a question. What's the full form of pnpm? What does it stand for?
This name was actually invented before me. The first maintainer of pnpm was Rico Sta. Cruz from the Philippines. It is just "performant npm", because the first version was fast, like 10 times faster than npm at that time. That's why.
That's amazing. So I wanted to ask a question: how does pnpm work on computers that might have multiple drives or multiple file systems? How does everything work on that?
Yeah, so the drawback of hard links is that a hard link can only exist within the scope of one disk. So you cannot have your content-addressable store on disk C and your project on disk D, if you're using Windows for instance. In that case, pnpm creates a separate store for each disk. So if you have projects on disk C, you will have a store for those projects on disk C. And if you have projects on disk D, you will have a separate store for those projects. Alternatively, you can use one store: if the store is on disk C, then on disk C hard links to that store will be used, but on disk D files will just be copied. So you won't have the benefits of the disk space efficiency.
That is interesting. Yeah. Thank you for this answer. I do not see more questions, but for the audience, if you still have more questions, you can always ask them in the Q&A channel, and you will get the answers from Zoltan there. We do not have a speaker room for Zoltan. So yeah, make sure that you ask your questions in the Q&A channel.
Thank you once again, Zoltan, for joining us today for this terrific talk and also for answering these nice questions. Thank you so much. Thank you. Thank you for having me.
31 min
24 Mar, 2022
