1. Introduction to pnpm and its Popularity
Hello, my name is Zoltan Kochan and today I'm going to talk about pnpm, which is a fast, disk space efficient package manager.
Before that, let me introduce myself. I'm Zoltan Kochan. I was born and raised in Ukraine, and I currently live in Ukraine as well. For now I'm safe. I work remotely for Bit, a company that helps developers implement component-based development. Before Bit I worked for JustAnswer for nine years, and in the meantime, since 2016, I have been continuously developing and maintaining pnpm.
Let's see what makes pnpm unique. When you install a dependency with npm or Yarn Classic, all the sub-dependencies are hoisted to the root of node_modules. As you can see in this example, even though only Express is in the dependencies, all those other packages are also hoisted to the root of node_modules. In contrast, pnpm puts only Express in the root of node_modules. With the hoisted layout, even though cookie is not a dependency of your project, importing it will work. This is a dangerous situation.
2. Isolated Node Module Structure with pnpm
But what will happen if a new version of Express comes out and cookie is not in its dependencies anymore? Your package will break. pnpm prevents such mistakes. With pnpm your code only has access to packages that are declared as dependencies of your project. In this example, Express is a direct dependency and cookie is a dependency of Express. The real location of Express is inside the .pnpm folder. So Node.js will search for cookie in the dedicated node_modules folder, which is at .pnpm/express@4.17.3/node_modules. Each package is in its own isolated folder with its own dependencies.
But what will happen if a new version of Express comes out and cookie is not in its dependencies anymore? Your package will break. pnpm prevents such mistakes.
With pnpm your code only has access to packages that are declared as dependencies of your project. How is pnpm able to create such an isolated node_modules structure?
In this example you see that Express is a direct dependency and cookie is a dependency of Express. As you can see, only Express is placed in the root of the project's node_modules folder. And actually that Express inside node_modules is just a symlink to a folder inside the .pnpm folder.
This works because, when searching for dependencies, the Node.js resolution algorithm uses the real location of the package as the starting point. The real location of Express is inside the .pnpm folder. So Node.js will search for cookie in the dedicated node_modules folder, which is at .pnpm/express@4.17.3/node_modules. And of course you can see that cookie inside the node_modules of Express is also just a symlink to a folder with the real location of cookie.
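The lookup just described can be sketched in Python. This is only a simplified model of the Node.js resolution walk, not the real algorithm. The cookie version number, the helper names `build_layout` and `find_dependency`, and the `stop_at` parameter are assumptions made for illustration.

```python
import os
import tempfile
from typing import Optional


def build_layout(project: str) -> None:
    """Create a tiny pnpm-style layout: real folders under .pnpm, symlinks elsewhere."""
    virtual_store = os.path.join(project, "node_modules", ".pnpm")
    express_real = os.path.join(virtual_store, "express@4.17.3", "node_modules", "express")
    # The cookie version is illustrative only.
    cookie_real = os.path.join(virtual_store, "cookie@0.4.2", "node_modules", "cookie")
    os.makedirs(express_real)
    os.makedirs(cookie_real)
    # Only the direct dependency is linked into the project's root node_modules.
    os.symlink(express_real, os.path.join(project, "node_modules", "express"))
    # cookie is linked next to the real location of express, not into the root.
    os.symlink(cookie_real, os.path.join(virtual_store, "express@4.17.3", "node_modules", "cookie"))


def find_dependency(importer_dir: str, name: str, stop_at: Optional[str] = None) -> Optional[str]:
    """Walk up from the importer's REAL location, checking node_modules at each level.

    stop_at is a simplification for this sketch so the walk stays inside the demo project.
    """
    current = os.path.realpath(importer_dir)  # symlinks are resolved before the walk
    stop = os.path.realpath(stop_at) if stop_at else None
    while True:
        if os.path.basename(current) != "node_modules":
            candidate = os.path.join(current, "node_modules", name)
            if os.path.isdir(candidate):
                return os.path.realpath(candidate)
        if current == stop:
            return None
        parent = os.path.dirname(current)
        if parent == current:
            return None
        current = parent


if __name__ == "__main__":
    project = os.path.realpath(tempfile.mkdtemp())
    build_layout(project)
    express_link = os.path.join(project, "node_modules", "express")
    # Express can see cookie because the search starts inside .pnpm.
    print(find_dependency(express_link, "cookie", stop_at=project))
    # The project itself cannot see cookie, because it never declared it.
    print(find_dependency(project, "cookie", stop_at=project))
```

In this sketch the project resolves express but not cookie, while express resolves cookie through the symlink placed next to its real location.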
So let me show you a quick demo. I have an empty project. I run pnpm add express. Installation is completed. Let's see what we got here. Ignore this. This is something we hoist by default to fix some issues. But you can see that only Express is in the root, and it is a symlink. The real location of Express is here. And in this folder are also all the other direct dependencies of Express.
So you can see cookie here. But cookie is actually a symlink to a dedicated isolated folder for the cookie dependency. You can see here. Actually, cookie is here. And this is a symlink. This is also a symlink to this folder of the accepts package. So this is the real location of accepts. And these are the dependencies of accepts. So you can see that each package is in its own isolated folder with its own dependencies. Let's return to the slides.
3. Superior Monorepo Support with pnpm
pnpm has superior monorepo support due to its node_modules structure. In a monorepo with a hoisted layout, all dependencies of all projects are hoisted to the root, allowing access to dependencies of other projects. This can cause issues when a project is installed outside the monorepo. pnpm's isolated module structure ensures projects only have access to their own dependencies. In a monorepo, pnpm creates a single .pnpm folder in the root, linking projects to their direct dependencies.
pnpm also has very, very good monorepo support, thanks to its superior node_modules structure. The hoisted node_modules directory structure is even worse in a monorepo, because when a hoisted node_modules layout is used in a monorepo, all dependencies of all projects are hoisted to the root of the monorepo. Hence projects have access even to dependencies of other projects. So in this example, there is a monorepo with two projects, foo and bar. Foo has one dependency, as you can see in its package.json file. And bar also has one dependency. If a hoisted layout is used, both dependencies are hoisted to the root of the monorepo. So foo has access to the dependency of bar, and bar has access to the dependency of foo. If you accidentally require the dep of bar in foo, this code will work locally. However, foo will break when installed as a dependency outside of the monorepo. This is even worse than in a single project scenario, because in a single project scenario it might break in the future, but in this case it will break after the first release of the package. With pnpm's isolated module structure, projects have access only to their own dependencies. In a monorepo, pnpm creates a single .pnpm folder in the root of the monorepo, and the projects of the workspace link only their direct dependencies. So both foo and bar have access only to their own dependencies.
4. npm's Approach and pnpm's Solution
npm is working on a setting to support a pnpm-style node_modules directory. Some tools and packages in the ecosystem are broken due to the hoisted node_modules architecture. pnpm provides a solution by allowing the creation of a classic node_modules without using symlinks.
You might ask yourself why npm is not using the same approach. Well, actually it will, because they are working on a setting to support a pnpm-style node_modules directory. But the reason they don't use it as a default is that some tools don't support symlinks yet, like for instance React Native, and many packages in the ecosystem are basically broken because of this hoisted node_modules architecture: they rely on undeclared dependencies. But pnpm actually has a solution for these cases, because pnpm is also able to create a classic node_modules without using symlinks and with classic hoisting. Just set the node-linker setting to hoisted, and you have a workaround.
5. Efficient Disk Space Usage with pnpm
pnpm solves the disk space usage issue by using content-addressable storage. Files are saved by a hash of their content, allowing for efficient storage. Every file in node_modules is just a hardlink to the content-addressable storage, reducing disk space consumption. Dependencies in new projects consume a lot less disk space than with npm or Yarn.
pnpm also solves the disk space usage issue. If you have many Node.js projects on your computer, you may have noticed that node_modules consumes a lot of your disk space. pnpm solves this issue by using content-addressable storage. Content-addressable storage is basically a storage that saves files by a hash code which is calculated from the content of the file. So you can see in this example there is a code file in different versions of the same package, but even though they are different versions, the code file is the same; it has no changes. In this case this file will have the same hash in every package, so a single file will be created for it in the store. So even though you have 3 packages, you save this file only once, in one place on the disk.
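As a rough illustration of the idea, here is a minimal Python sketch of a content-addressable store. It is not pnpm's actual store format: the choice of SHA-512, the two-character subfolder layout, and the function names are assumptions made for this example.

```python
import hashlib
import os
import tempfile


def add_to_store(store_dir: str, content: bytes) -> str:
    """Save content under a path derived from its hash; identical content is stored once."""
    digest = hashlib.sha512(content).hexdigest()
    # Hypothetical layout: first two hex characters as a subfolder, the rest as the file name.
    path = os.path.join(store_dir, digest[:2], digest[2:])
    if not os.path.exists(path):
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(content)
    return path


def count_files(store_dir: str) -> int:
    """Count how many files the store actually holds."""
    return sum(len(files) for _, _, files in os.walk(store_dir))


if __name__ == "__main__":
    store = tempfile.mkdtemp()
    unchanged_file = b"module.exports = function () {};\n"
    # The same file shipped by three versions of a package.
    for _version in ("1.0.0", "1.0.1", "1.1.0"):
        add_to_store(store, unchanged_file)
    print(count_files(store))  # 1
```

Because the path depends only on the content, the three versions in the demo share a single file in the store.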
Every file in node_modules is just a hardlink to the content-addressable storage. On systems that support it, copy-on-write files are used instead of hardlinks, probably on Linux and Mac. These files don't consume additional disk space; they are just references to files in the store. It means that if two projects on your computer have the same file in node_modules, in both projects that file will be linked from the same place on the disk. You won't have two copies of the same package. You will just have hardlinks to the global content-addressable store. As a result, dependencies in new projects consume a lot less disk space than with npm or Yarn.
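The effect of hardlinking can be checked with a short Python sketch. The directory names and the `link_from_store` helper are hypothetical, and the sketch assumes a file system that supports hardlinks.

```python
import os
import tempfile


def link_from_store(store_file: str, dest: str) -> None:
    """Expose a store file inside a project as a hardlink (no data is copied)."""
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    os.link(store_file, dest)


if __name__ == "__main__":
    base = tempfile.mkdtemp()
    store_file = os.path.join(base, "store", "abc123")
    os.makedirs(os.path.dirname(store_file))
    with open(store_file, "w") as f:
        f.write("module.exports = 42;\n")

    # Two unrelated projects use the same file.
    a = os.path.join(base, "project-a", "node_modules", "pkg", "index.js")
    b = os.path.join(base, "project-b", "node_modules", "pkg", "index.js")
    link_from_store(store_file, a)
    link_from_store(store_file, b)

    # Both paths refer to the same data on disk as the store file.
    print(os.path.samefile(a, store_file), os.path.samefile(b, store_file))
    # The link count shows three names for one piece of data.
    print(os.stat(store_file).st_nlink)
```

All three paths share one set of data blocks, which is why the second project adds almost nothing to disk usage.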
6. Why pnpm Is So Fast
Why is pnpm so fast? npm and Yarn install dependencies in stages, while pnpm runs the installation stages separately for each dependency. This makes pnpm much faster. Some of these speed optimizations are possible thanks to the unique deterministic node_modules structure created by pnpm and the usage of the content-addressable store. However, these optimizations make the code of the package manager harder to understand, so it's a tradeoff.
I have one slide mixed up, no problem. Why don't other package managers support a content-addressable store? I think in the next version they will make it on by default. Let's talk about why pnpm is so fast. In most cases pnpm is faster than npm and Yarn. But why is that? npm and Yarn install dependencies in stages. At the first stage all the packages are resolved. Then all the packages are fetched from the registry. And when all the packages are fetched, the dependencies are written to node_modules. On the other hand, pnpm runs the installation stages separately for each dependency. So while some packages are still being resolved, others are already being fetched. And later on some packages are written to node_modules while some others are still being fetched. This makes pnpm much faster than other package managers. So you might ask why other package managers don't do the same. Actually, some of these speed optimizations are possible only thanks to the unique deterministic node_modules structure created by pnpm inside the .pnpm folder. Some are possible thanks to the usage of the content-addressable store. And some of the optimizations make the code of the package manager hard to understand, so it's a tradeoff. If you choose to make these optimizations, it will be harder to contribute to your package manager.
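The difference between the two strategies can be illustrated with a toy timing model in Python. It is not a measurement of any real package manager: the per-package durations are made up, and the model assumes unlimited parallelism within a stage.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Pkg:
    name: str
    resolve: int  # made-up time units spent resolving
    fetch: int    # made-up time units spent fetching
    write: int    # made-up time units spent writing to node_modules


def staged_time(pkgs: List[Pkg]) -> int:
    """Each stage starts only after the previous stage has finished for ALL packages."""
    return (
        max(p.resolve for p in pkgs)
        + max(p.fetch for p in pkgs)
        + max(p.write for p in pkgs)
    )


def pipelined_time(pkgs: List[Pkg]) -> int:
    """Each package moves to its next stage as soon as its own previous stage is done."""
    return max(p.resolve + p.fetch + p.write for p in pkgs)


if __name__ == "__main__":
    pkgs = [Pkg("a", 3, 1, 1), Pkg("b", 1, 3, 1), Pkg("c", 1, 1, 3)]
    print(staged_time(pkgs))     # 9
    print(pipelined_time(pkgs))  # 5
```

In the staged model every stage waits for its slowest package, so the slow steps add up. In the pipelined model the total is bounded by the slowest single package.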
7. pnpm Features and File Linking
pnpm is bundled into an executable, allowing you to use it even without Node.js pre-installed. You can also use pnpm to install and switch between different versions of Node.js. When choosing a package manager, consider the features of pnpm, Yarn, and npm. Visit the pnpm website and follow the pnpmjs and my Twitter accounts for more information. Files from the content-addressable store in pnpm are linked using hardlinks, not symlinks. A symlink is a reference to another place on the disk, while a hardlink is another file with its own location.
And the last feature I want to talk about is that pnpm is bundled into an executable, so you can use it on your system even if it doesn't have Node.js pre-installed. And then you can use pnpm to install Node.js and switch between different versions of Node.js. Basically, you can use pnpm instead of nvm, nvs or Volta.
So, which package manager should you use? It depends. You should know about every package manager; you should know about the features of pnpm, Yarn and npm. All three of these package managers are mature, are used by big tech companies, have many contributors and are reliable. So you really need to learn what features they provide, and on each project you should probably pick the one that best fits your needs.
Thanks for attending my presentation. Of course, visit the pnpm website to get more information, follow the pnpmjs Twitter account, and follow me on Twitter as well. Thank you! That was pretty interesting. Loved all the details that you shared. So, thank you for that.
So, you asked the question: how are files from the content-addressable store linked to node_modules? And the results are in. So, 63% say using symlinks, while 38% say hardlinks. So, what do you say? Yeah, many people mistakenly think that we use only symlinks. But in reality we use both symlinks and hardlinks in pnpm. And specifically for this purpose, for linking files from the content-addressable store, we use hardlinks. It's important. We can use hardlinks, we can copy files, or we can use copy-on-write files. But we cannot use symlinks for this purpose, because it would break the Node.js resolution algorithm.
Cool. So 38% of you all got it right. It's hardlinks. So just to expand a bit on it, maybe you would like to briefly explain the difference between symlinks and hardlinks for the audience. Yeah. So a symlink is basically just a reference to another place on the disk. So basically, if a Node module is symlinked, Node.js resolves this location to the real location of the file. So when it is searching for the dependencies of this module, it's searching from the real location of the file. A hardlink is different. It's basically another file with its own location.
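The difference is easy to observe with a small Python sketch. The file names and the `make_links` helper are made up for the example, which assumes a file system that supports both link types.

```python
import os
import tempfile


def make_links(base: str):
    """Create one real file plus a symlink and a hardlink pointing at it."""
    real = os.path.join(base, "store", "index.js")
    os.makedirs(os.path.dirname(real))
    with open(real, "w") as f:
        f.write("module.exports = {};\n")
    sym = os.path.join(base, "via-symlink.js")
    hard = os.path.join(base, "via-hardlink.js")
    os.symlink(real, sym)  # a reference to another path
    os.link(real, hard)    # a second name for the same data
    return real, sym, hard


if __name__ == "__main__":
    base = os.path.realpath(tempfile.mkdtemp())
    real, sym, hard = make_links(base)
    # Resolving the symlink lands on the real location of the file.
    print(os.path.realpath(sym) == real)   # True
    # The hardlink has its own location; there is nothing to resolve.
    print(os.path.realpath(hard) == hard)  # True
    # Yet the hardlink and the original share the same data on disk.
    print(os.path.samefile(hard, real))    # True
```

This is why a hardlinked file inside node_modules is treated as living where it appears, while a symlinked package is treated as living at its target.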
8. Hardlinks, Migration, and Multiple Drives
You can have many hardlinks on the disk in different places. For Node.js, they are like separate files. These files point to the same location on the physical disk. These files do not consume additional space on your disk. So let's take some questions from the audience. How difficult is it to migrate a codebase from Yarn to pnpm, for example? If it's just a single package repository, then it's really easy. What is the full form of pnpm? What does it stand for? This name was actually invented before me. The first maintainer of pnpm was Rico Sta. Cruz from the Philippines. It is just performant npm. I wanted to ask a question. How does pnpm work on computers which might have multiple drives or multiple file systems? How does everything work on that? The drawback with hardlinks is that a hardlink can only exist within the scope of one disk. So you cannot have your content-addressable store on disk C and your project on disk D, if you are using Windows for instance.
You can have many hardlinks on the disk in different places. For Node.js, they are like separate files. These files point to the same location on the physical disk. These files do not consume additional space on your disk.
Cool, that's handy. Yeah, and that is how you make it disk space efficient. Yeah, we use less disk space. Yeah, correct. That's perfect.
So let's take some questions from the audience. So, we have a question. How difficult is it to migrate a codebase from Yarn to pnpm, for example? If it's just a single package repository, then it's really easy. There is an import command. So you can just run pnpm import and it will convert your lockfile from yarn.lock to pnpm-lock.yaml, and you just need to remove the Yarn lockfile and commit the new lockfile. Then maybe update your CI scripts from yarn install to pnpm install, and from yarn run to pnpm run. In most cases, all you need to do is change yarn to pnpm and it will work as a drop-in replacement. For a workspace, you need to change how you list the package globs. Yarn uses a field in package.json and pnpm uses a separate file, pnpm-workspace.yaml.
What is the full form of pnpm? What does it stand for? This name was actually invented before me. The first maintainer of pnpm was Rico Sta. Cruz from the Philippines. It is just performant npm. The first version was 10 times faster than npm at that time. That's amazing.
I wanted to ask a question. How does pnpm work on computers which might have multiple drives or multiple file systems? How does everything work on that? The drawback with hardlinks is that a hardlink can only exist within the scope of one disk. So you cannot have your content-addressable store on disk C and your project on disk D, if you are using Windows for instance. So in that case, pnpm creates a separate store for each disk. So if you have projects on disk C, you will have a store for those projects on disk C, and if you have projects on disk D, you will have a separate store for those projects. Or alternatively you can use one store, but if the store is on disk C, then on disk C hardlinks to that store will be used, while on disk D files will just be copied. So you won't have the benefits of disk space efficiency. That is interesting, yeah. Thank you for this answer. So I do not see more questions, but for the audience, if you still have more questions, you can always ask them in the Q&A channel and you will get the answers from Zoltan there. And we do not have a speaker room for Zoltan, so make sure that you ask the questions in the Q&A channel. Thank you once again, Zoltan, for joining us today for this terrific talk and also for answering these nice questions. Thank you so much. Thank you.
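The fallback described in the answer, hardlink when possible and copy otherwise, might look roughly like the following Python sketch. It is not pnpm's implementation. The `link_or_copy` helper and its `link_fn` parameter are hypothetical, and `link_fn` exists only so the cross-disk failure can be simulated on a single disk.

```python
import os
import shutil
import tempfile
from typing import Callable


def link_or_copy(
    store_file: str,
    dest: str,
    link_fn: Callable[[str, str], None] = os.link,
) -> str:
    """Try to hardlink a store file into a project; fall back to copying.

    Returns "hardlink" or "copy" so the caller can see which path was taken.
    """
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    try:
        link_fn(store_file, dest)
        return "hardlink"
    except FileExistsError:
        raise  # an existing destination is a real error, not a reason to copy
    except OSError:
        # For example, the store and the project are on different disks.
        shutil.copyfile(store_file, dest)
        return "copy"


if __name__ == "__main__":
    base = tempfile.mkdtemp()
    store_file = os.path.join(base, "store", "abc123")
    os.makedirs(os.path.dirname(store_file))
    with open(store_file, "w") as f:
        f.write("module.exports = 1;\n")

    same_disk = os.path.join(base, "project", "node_modules", "pkg", "index.js")
    print(link_or_copy(store_file, same_disk))

    def cross_disk_link(src: str, dst: str) -> None:
        # Simulates the error raised when linking across disks.
        raise OSError(18, "Invalid cross-device link")

    other_disk = os.path.join(base, "other", "node_modules", "pkg", "index.js")
    print(link_or_copy(store_file, other_disk, link_fn=cross_disk_link))
```

The copied file works the same way for Node.js, but it occupies its own space on disk, which is the loss of efficiency mentioned in the answer.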