pnpm – a Fast, Disk Space Efficient Package Manager for JavaScript

You will learn about one of the most popular package managers for JavaScript and its advantages over npm and Yarn.

  • A brief history of JavaScript package managers
  • The isolated node_modules structure created by pnpm
  • What makes pnpm so fast
  • What makes pnpm disk space efficient
  • Monorepo support
  • Managing Node.js versions with pnpm

31 min
24 Mar, 2022

Video Summary and Transcription

pnpm is a fast and efficient package manager that gained popularity in 2021 and is used by big tech companies like Microsoft and TikTok. It has a unique isolated node_modules structure that prevents access to undeclared dependencies and ensures each project only has access to its own dependencies. pnpm also offers superior monorepo support thanks to this node_modules structure. It solves the disk space usage issue by using a content-addressable store, so each file is saved on disk only once. pnpm is fast because it runs the installation stages separately for each dependency and relies on a deterministic node_modules structure. Files from the store are linked into node_modules using hardlinks, not symlinks.

1. Introduction to pnpm and its Popularity

Short description:

Hello, my name is Zoltan Kochan and today I'm going to talk about pnpm, which is a fast, disk space efficient package manager. Let me introduce myself. I work remotely for Bit, a company that helps developers implement component-based development. Before that, I worked for JustAnswer for nine years, and I have been developing and maintaining pnpm since 2016. npm, Yarn, and pnpm are the most popular JavaScript package managers. npm had issues in the past, so alternatives like Yarn and pnpm were created. pnpm is an indie project that fixes the issues of npm v3 and is now supported by Bit. It had a spike in popularity in 2021 and is used by big tech companies like Microsoft and TikTok. pnpm is unique because it places only the direct dependencies of a project in the root of node_modules.

Hello, my name is Zoltan Kochan and today I'm going to talk about pnpm, which is a fast, disk space efficient package manager.

Before that, let me introduce myself. I'm Zoltan Kochan. I was born and raised in Ukraine, and I currently live in Ukraine as well. For now I'm safe. I work remotely for Bit. Bit is a company that helps developers implement component-based development. Before Bit I worked for JustAnswer for nine years, and in the meantime, since 2016, I have been constantly developing and maintaining pnpm.

Before talking about pnpm, let's briefly talk about other JavaScript package managers. The most popular Node.js or JavaScript package manager is npm, which is the official package manager of the npm registry. It is shipped with Node.js. In the past npm had a lot of issues: it was slow, it was non-deterministic, and it sometimes gave funny results, so some alternatives were created.

One of the alternatives was created by Facebook. You have probably heard about Yarn, which is the second most popular package manager after npm. It is now maintained by the community, and it did solve a lot of the issues that npm had in versions 3 and 4. Since then, in v2, Yarn has switched to using Plug'n'Play by default, so even though it does support classic node_modules installation, it now prefers Plug'n'Play, which many love and many hate. I personally think it's a cool feature. And Yarn is currently shipped with the latest version of Node.js.

The third most popular package manager is pnpm. It is a completely indie project; it was created by open source contributors to fix the issues of npm v3, at the same time as Yarn. So pnpm is not a new project; it has basically existed for as long as Yarn has. It is now supported by Bit, because I work at Bit and pnpm is heavily used in the Bit CLI for package management. pnpm is also shipped with Node.js through the Corepack feature of Node.js.

If we compare these package managers by popularity, npm is obviously the most popular one at the moment, then Yarn, and pnpm is the least popular, even though pnpm came out at the same time as Yarn. Of course, Facebook had a lot of marketing power to make Yarn very popular at the start. However, even though pnpm is less popular for now, it had a big spike in popularity last year: in 2021 pnpm was downloaded three times more than in 2020. A lot of big tech companies already use pnpm: Microsoft uses pnpm in some of its projects, and the TikTok frontend team uses pnpm. So pnpm works really well, and is production ready for sure.

Let's see what makes pnpm unique. When you install a dependency with npm or Yarn Classic, all the sub-dependencies are hoisted to the root of node_modules. As you may see in this example, even though only express is in the dependencies, all those other packages are also hoisted to the root of node_modules. In contrast, pnpm puts only express in the root of node_modules. With the hoisted layout, even though cookie is not a dependency of your project, importing it will work. This is a dangerous situation.

2. Isolated node_modules Structure with pnpm

Short description:

But what will happen if a new version of Express comes out and cookie is not in its dependencies anymore? Your package will break. pnpm prevents such mistakes. With pnpm your code only has access to packages that are declared as dependencies of your project. In this example, express is a direct dependency and cookie is a dependency of express. The real location of express is inside the .pnpm folder. So Node.js will search for cookie in the dedicated node_modules folder, which is at .pnpm/express@4.17.3/node_modules. Each package is in its own isolated folder with its own dependencies.

But what will happen if a new version of Express comes out and cookie is not in its dependencies anymore? Your package will break. pnpm prevents such mistakes.

With pnpm your code only has access to packages that are declared as dependencies of your project. How is pnpm able to create such an isolated node_modules structure?

In this example you see that express is a direct dependency and cookie is a dependency of express. As you can see, only express is placed in the root of the project's node_modules folder. And actually that express inside node_modules is a symlink; it is just a symlink to a folder inside the .pnpm folder.

This works because, when searching for dependencies, the Node.js resolution algorithm uses the real location of the package as the starting point. The real location of express is inside the .pnpm folder. So Node.js will search for cookie in the dedicated node_modules folder, which is at .pnpm/express@4.17.3/node_modules. And of course you can see that cookie inside the node_modules of express is also just a symlink to a folder with the real location of cookie.
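To make the layout described above concrete, here is a minimal Python sketch that builds a tiny pnpm-style tree in a temporary folder and then imitates the lookup. It is not pnpm's or Node.js's real code: the resolve function is a simplified stand-in for the Node.js resolution algorithm, the function names are hypothetical, the cookie version is only illustrative, and a POSIX-like system where symlinks can be created is assumed.

```python
import os
import tempfile
from typing import Optional


def build_isolated_layout(root: str) -> None:
    """Create a tiny pnpm-style tree under root (illustrative only)."""
    virtual_store = os.path.join(root, "node_modules", ".pnpm")
    express_home = os.path.join(virtual_store, "express@4.17.3", "node_modules")
    # The cookie version below is illustrative, not taken from the talk.
    cookie_home = os.path.join(virtual_store, "cookie@0.4.2", "node_modules")
    os.makedirs(os.path.join(express_home, "express"))
    os.makedirs(os.path.join(cookie_home, "cookie"))
    # The project root sees only its direct dependency, as a symlink.
    os.symlink(os.path.join(express_home, "express"),
               os.path.join(root, "node_modules", "express"))
    # cookie is linked next to the real express folder, not into the project root.
    os.symlink(os.path.join(cookie_home, "cookie"),
               os.path.join(express_home, "cookie"))


def resolve(from_dir: str, name: str, stop_at: str) -> Optional[str]:
    """Simplified lookup: start at the real path of from_dir and walk upward."""
    stop = os.path.realpath(stop_at)
    current = os.path.realpath(from_dir)
    while True:
        candidate = os.path.join(current, "node_modules", name)
        if os.path.isdir(candidate):
            return os.path.realpath(candidate)
        parent = os.path.dirname(current)
        if current == stop or parent == current:
            return None
        current = parent


if __name__ == "__main__":
    project = os.path.realpath(tempfile.mkdtemp())
    build_isolated_layout(project)
    # The project can find express, but not cookie.
    print(resolve(project, "express", project))
    print(resolve(project, "cookie", project))
    # Starting from express, the lookup begins at its real location and finds cookie.
    print(resolve(os.path.join(project, "node_modules", "express"), "cookie", project))
```

The important detail is the realpath call at the start of resolve: because the search begins at the real location of express, cookie is found next to it inside the .pnpm folder, while code in the project root never sees it.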

So let me show you a quick demo. I have an empty project. I run pnpm add express. Installation is completed. Let's see what we got here. Ignore this; this is something we hoist by default to fix some issues. But you can see that only express is placed in the root, and it is a symlink. The real location of express is here. And in this folder there are also all the other direct dependencies of express.

So you can see cookie here. But cookie is actually a symlink to a dedicated isolated folder for the cookie package. You can see here: actually, cookie is here. And this is a symlink. This is also a symlink, to this folder of the accepts package. So this is the real location of accepts, and these are the dependencies of accepts. So you can see that each package is in its own isolated folder with its own dependencies. Let's return to the slides.

3. Superior Monorepo Support with pnpm

Short description:

pnpm has superior monorepo support due to its node_modules structure. In a monorepo with a hoisted layout, all dependencies of all projects are hoisted to the root, so projects have access to dependencies of other projects. This can cause a project to break when it is installed outside the monorepo. pnpm's isolated node_modules structure ensures projects only have access to their own dependencies. In a monorepo, pnpm creates a single .pnpm folder in the root and links each project to its direct dependencies.

pnpm also has very good monorepo support, thanks to its superior node_modules structure. The hoisted node_modules directory structure is even worse in a monorepo, because when a hoisted node_modules layout is used in a monorepo, all dependencies of all projects are hoisted to the root of the monorepo. Hence projects have access even to dependencies of other projects.

So in this example, there is a monorepo with two projects, foo and bar. foo has one dependency, as you can see in its package.json file, and bar also has one dependency. If a hoisted layout is used, both dependencies are hoisted to the root of the monorepo. So foo has access to the dependency of bar, and bar has access to the dependency of foo. If you accidentally require the dependency of bar in foo, this code will work locally. However, foo will break when installed as a dependency outside of the monorepo. This is even worse than in a single project scenario, because in a single project scenario it might break in the future, but in this case it will break after the first release of the package.

With pnpm's isolated node_modules structure, projects have access only to their own dependencies. In a monorepo pnpm creates a single .pnpm folder in the root of the monorepo, and the projects of the workspace link only their direct dependencies. So both foo and bar have access only to their own dependencies.
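The difference between the two monorepo layouts can be shown with a small in-memory model. This Python sketch is only an approximation: a layout is just a set of folder paths, resolve is a simplified version of the upward node_modules search, and the names dep-of-foo and dep-of-bar are hypothetical placeholders for the two dependencies on the slide.

```python
import posixpath
from typing import Optional, Set

# Hoisted layout: every dependency of every project sits in the monorepo root.
HOISTED: Set[str] = {
    "/repo/node_modules/dep-of-foo",
    "/repo/node_modules/dep-of-bar",
}

# Isolated layout: each project links only its own direct dependencies.
ISOLATED: Set[str] = {
    "/repo/packages/foo/node_modules/dep-of-foo",
    "/repo/packages/bar/node_modules/dep-of-bar",
}


def resolve(layout: Set[str], from_dir: str, name: str) -> Optional[str]:
    """Walk from from_dir up to the root, checking node_modules at each level."""
    current = from_dir
    while True:
        candidate = posixpath.join(current, "node_modules", name)
        if candidate in layout:
            return candidate
        if current == "/":
            return None
        current = posixpath.dirname(current)


if __name__ == "__main__":
    # With hoisting, foo can reach a package that only bar declares.
    print(resolve(HOISTED, "/repo/packages/foo", "dep-of-bar"))
    # With the isolated layout, the same lookup fails, as it would after publishing.
    print(resolve(ISOLATED, "/repo/packages/foo", "dep-of-bar"))
```

In the hoisted layout the accidental import resolves locally and the mistake stays hidden until foo is installed somewhere else; in the isolated layout it fails right away on the developer's machine.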

4. npm's Approach and pnpm's Solution

Short description:

npm is working on a setting to support a pnpm-style node_modules directory. It is not the default because some tools do not support symlinks yet, and many packages in the ecosystem rely on the hoisted node_modules layout. pnpm provides a workaround: it can create a classic hoisted node_modules without using symlinks.

You might ask yourself why npm is not using the same approach. Well, actually it will, because they are working on a setting to support a pnpm-style node_modules directory. But the reason they don't use it as the default is that some tools don't support symlinks yet, like for instance React Native, and also many packages in the ecosystem are basically broken because of this hoisted node_modules architecture: they rely on undeclared dependencies. But pnpm actually has a solution for these cases, because pnpm is also able to create a classic node_modules without using symlinks and with classic hoisting. Just set the node-linker setting to hoisted, and you have a workaround.

5. Efficient Disk Space Usage with pnpm

Short description:

pnpm solves the disk space usage issue by using a content addressable storage. Files are saved by a hash code, allowing for efficient storage. Every file in node modules is just a hardlink to the content addressable storage, reducing disk space consumption. Dependencies in new projects consume a lot less disk space than with npm or Yarn.

pnpm also solves the disk space usage issue. If you have many Node.js projects on your computer, you may have noticed that node_modules consumes a lot of your disk space. pnpm solves this issue by using a content-addressable storage. A content-addressable storage is basically a storage that saves files under a hash which is calculated from the content of the file. So you can see in this example that there is a code file in different versions of the same package, but even though they are different versions, the code file is the same; it has no changes. In this case this file will have the same hash in every package, so a single file will be created for it in the store. So even though you have 3 packages, you save this file only once, in one place on the disk.

Every file in node_modules is just a hardlink to the content-addressable storage. On systems that support it, copy-on-write files are used instead of hardlinks, probably on Linux and Mac. These files don't consume additional disk space; they are just references to files in the store. It means that if two projects on your computer have the same file in node_modules, in both projects that file will be linked from the same place on the disk. You won't have two copies of the same package. You will just have hardlinks to the global content-addressable store. As a result, dependencies in new projects consume a lot less disk space than with npm or Yarn.
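The idea of saving files by a hash of their content can be sketched in a few lines of Python. This is a minimal illustration, not pnpm's store format: the function name, the choice of SHA-512, and the two-character subfolder layout are all assumptions made for the example.

```python
import hashlib
import os
import tempfile


def store_file(store_dir: str, content: bytes) -> str:
    """Save content under a name derived from its hash and return the stored path."""
    digest = hashlib.sha512(content).hexdigest()
    # Assumed layout: the first two hex characters form a subfolder.
    path = os.path.join(store_dir, digest[:2], digest[2:])
    if not os.path.exists(path):
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as handle:
            handle.write(content)
    return path


if __name__ == "__main__":
    store = tempfile.mkdtemp()
    unchanged = b"module.exports = 1;\n"
    # The same file shipped by three versions of a package is stored only once.
    paths = {store_file(store, unchanged) for _version in ("1.0.0", "1.0.1", "1.1.0")}
    print(len(paths))
```

Because the path depends only on the bytes of the file, it does not matter which package or version a file came from: identical content always maps to the same place in the store.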
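How two projects can share one file on disk can be sketched with os.link from the Python standard library. The folder names and the link_into_project helper are hypothetical, and the sketch assumes that the store and the projects are on the same file system.

```python
import os
import tempfile


def link_into_project(stored_file: str, project_dir: str, relative_path: str) -> str:
    """Hardlink a file from the store into a project's node_modules folder."""
    target = os.path.join(project_dir, "node_modules", relative_path)
    os.makedirs(os.path.dirname(target), exist_ok=True)
    os.link(stored_file, target)  # creates a second name for the same data on disk
    return target


if __name__ == "__main__":
    base = tempfile.mkdtemp()
    stored = os.path.join(base, "store", "some-hash")
    os.makedirs(os.path.dirname(stored))
    with open(stored, "w") as handle:
        handle.write("module.exports = 1;\n")

    first = link_into_project(stored, os.path.join(base, "project-a"), "express/index.js")
    second = link_into_project(stored, os.path.join(base, "project-b"), "express/index.js")

    # Both project files refer to the same data as the store entry.
    print(os.path.samefile(first, second))
    # Number of names that point to that data: the store entry plus two projects.
    print(os.stat(stored).st_nlink)
```

The file content exists once; each additional project adds only another directory entry pointing to it.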

6. Why pnpm is so fast

Short description:

Why is pnpm so fast? npm and Yarn install dependencies in stages, while pnpm runs the installation stages separately for each dependency. This makes pnpm much faster. Some of these speed optimizations are possible thanks to the unique deterministic node_modules structure created by pnpm and the usage of the content-addressable store. However, these optimizations make the code of the package manager harder to understand, so it's a tradeoff.

I have one slide mixed up, no problem. Why don't other package managers support the content-addressable store? I think in the next version they will make it on by default.

Let's talk about why pnpm is so fast. In most cases pnpm is faster than npm and Yarn. But why is that so? npm and Yarn install dependencies in stages. In the first stage all the packages are resolved. Then all the packages are fetched from the registry. And when all the packages are fetched, the dependencies are written to node_modules. On the other hand, pnpm runs the installation stages separately for each dependency. So while some packages are still being resolved, others are already being fetched. And later on some packages are written to node_modules while others are still being fetched. This makes pnpm much faster than other package managers.

So you might ask why other package managers don't do the same. Actually, some of these speed optimizations are possible only thanks to the unique deterministic node_modules structure created by pnpm inside the .pnpm folder. Some of them are possible thanks to the usage of the content-addressable store. And some of the optimizations make the code of the package manager hard to understand, so it's a tradeoff. If you choose to make these optimizations, it will be harder to contribute to your package manager.
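A toy model helps to see why running the stages per dependency finishes sooner. In this Python sketch the package names and durations are made up, unlimited parallelism is assumed, and real concerns such as network limits or sub-dependencies that are only discovered during resolution are ignored.

```python
from typing import Dict, Tuple

# Hypothetical durations per package: (resolve, fetch, write to node_modules).
DURATIONS: Dict[str, Tuple[int, int, int]] = {
    "a": (1, 5, 1),
    "b": (4, 1, 1),
    "c": (2, 2, 2),
}


def staged_total(durations: Dict[str, Tuple[int, int, int]]) -> int:
    """A stage starts only after the previous stage has finished for every package."""
    return sum(max(stage) for stage in zip(*durations.values()))


def pipelined_total(durations: Dict[str, Tuple[int, int, int]]) -> int:
    """Each package moves through its own stages without waiting for the others."""
    return max(sum(stages) for stages in durations.values())


if __name__ == "__main__":
    print(staged_total(DURATIONS))     # 4 + 5 + 2 = 11
    print(pipelined_total(DURATIONS))  # slowest single package: 1 + 5 + 1 = 7
```

In the staged model every stage waits for its slowest package, so the slow steps of different packages add up. In the per-package model only the slowest single package determines the total.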

7. pnpm Features and File Linking

Short description:

pnpm is bundled into an executable, allowing you to use it even without Node.js pre-installed. You can also use pnpm to install and switch between different versions of Node.js. When choosing a package manager, consider the features of pnpm, Yarn, and npm. Visit the pnpm website and follow the pnpmjs Twitter account and my Twitter account for more information. Files from the content-addressable store in pnpm are linked using hardlinks, not symlinks. A symlink is a reference to another place on the disk, while a hardlink is another file with its own location.

And the last feature I want to talk about is that pnpm is bundled into an executable, so you may use it on your system even if it doesn't have Node.js pre-installed. And then you may use pnpm to install Node.js and switch between different versions of Node.js. Basically, you can use pnpm instead of nvm, nvs, or Volta.

So, which package manager should you use? It depends. You should know about every package manager; you should know about the features of pnpm, Yarn, and npm. All three of these package managers are mature, are used by big tech companies, have many contributors, and are reliable. So you really need to learn what features they provide, and on each project you should probably pick the one that best fits your needs.

Thanks for attending my presentation. Of course, visit the pnpm website to get more information, follow the pnpmjs Twitter account, and follow me on Twitter as well. Thank you!

That was pretty interesting. Loved all the details that you shared. So, thank you for that.

So, you asked the question: how are files from the content-addressable store linked to node_modules? And the results are in. So, 63% say using symlinks, while 38% say hardlinks. So, what do you say? Yeah, it's something that many people get wrong: they think that we use only symlinks. But in reality we use both symlinks and hardlinks in pnpm. And specifically for this purpose, for linking files from the content-addressable store, we use hardlinks. It's important. We can use hardlinks, we can copy files, or we can use copy-on-write files. But we cannot use symlinks for this purpose, because that would break the Node.js resolution algorithm.

Cool. So 38% of you all got it right. It's hardlinks. So just to expand a bit on it, maybe you would like to briefly explain the difference between symlinks and hardlinks for the audience. Yeah. So a symlink is basically just a reference to another place on the disk. So basically, if a Node module is symlinked, Node.js resolves this location to the real location of the file. So when it is searching for the dependencies of this module, it is searching from the real location of the file. A hardlink is different. It's basically another file with its own location.

8. Hardlinks, Migration, and Multiple Drives

Short description:

You can have many hardlinks on the disk in different places. For Node.js, they are like separate files. These files point to the same location on the physical disk, so they do not consume additional space on your disk. So let's take some questions from the audience. How difficult is it to migrate a codebase from Yarn to pnpm, for example? If it's just a single-package repository, then it's really easy. What is the full form of pnpm? What does it stand for? This name was actually invented before me. The first maintainer of pnpm was Rico Sta. Cruz from the Philippines. It stands for performant npm. I wanted to ask a question. How does pnpm work on computers which might have multiple drives or multiple file systems? The drawback of hardlinks is that a hardlink only works within one disk. So you cannot have your content-addressable store on disk C and your project on disk D, if you are using Windows for instance.

You can have many hardlinks on the disk in different places. For Node.js, they are like separate files. These files point to the same location on the physical disk. These files do not consume additional space on your disk.
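The difference between the two kinds of links, as seen by a program that asks for the real path of a file, can be checked with a short Python sketch. The file and folder names are hypothetical, and a system where both symlinks and hardlinks can be created in the temporary folder is assumed.

```python
import os
import tempfile
from typing import Tuple


def make_links(base: str) -> Tuple[str, str, str]:
    """Create one original file plus a symlink and a hardlink to it."""
    original = os.path.join(base, "store", "index.js")
    os.makedirs(os.path.dirname(original))
    with open(original, "w") as handle:
        handle.write("module.exports = 1;\n")
    project = os.path.join(base, "project")
    os.makedirs(project)
    symlinked = os.path.join(project, "symlinked.js")
    hardlinked = os.path.join(project, "hardlinked.js")
    os.symlink(original, symlinked)
    os.link(original, hardlinked)
    return original, symlinked, hardlinked


if __name__ == "__main__":
    base = os.path.realpath(tempfile.mkdtemp())
    original, symlinked, hardlinked = make_links(base)
    # A symlink resolves to the original location inside the store folder.
    print(os.path.realpath(symlinked) == original)
    # A hardlink keeps its own location inside the project folder.
    print(os.path.realpath(hardlinked) == hardlinked)
    # Both names still refer to the same data on disk.
    print(os.path.samefile(hardlinked, original))
```

This is why symlinks are not suitable for linking files out of the store: a symlinked file would appear to live in the store, so a lookup for its dependencies would start from there instead of from the package folder.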

Cool, that's handy. Yeah, and that is how you make it disk space efficient. Yeah, we use less disk space. Yeah, correct. That's perfect.

So let's take some questions from the audience. So, we have a question. How difficult is it to migrate a codebase from Yarn to pnpm, for example? If it's just a single-package repository, then it's really easy. There is an import command. So you can just run pnpm import and it will convert your lockfile from yarn.lock to pnpm-lock.yaml, and you just need to remove the yarn.lock file and commit the new lockfile. Then maybe update your CI scripts from yarn install to pnpm install, and from yarn run to pnpm run. In most cases, all you need to do is just change yarn to pnpm and it will work as a drop-in replacement. For a workspace, you need to change how you list the package globs. Yarn uses a field in package.json and pnpm uses a separate file, pnpm-workspace.yaml.

What is the full form of pnpm? What does it stand for? This name was actually invented before me. The first maintainer of pnpm was Rico Sta. Cruz from the Philippines. It stands for performant npm. The first version was 10 times faster than npm at that time. That's amazing.
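The workspace part of the migration can be sketched as a small conversion step. This Python example is only an illustration with hypothetical function names: it reads the workspaces field of a package.json and produces the text of a pnpm-workspace.yaml file with a packages list. It assumes the globs contain no single quotes and does not cover any other workspace settings.

```python
import json
from typing import List


def workspace_globs(package_json_text: str) -> List[str]:
    """Read the package globs from the workspaces field of a package.json."""
    manifest = json.loads(package_json_text)
    workspaces = manifest.get("workspaces", [])
    # The field is either a list of globs or an object with a "packages" list.
    if isinstance(workspaces, dict):
        workspaces = workspaces.get("packages", [])
    return list(workspaces)


def to_pnpm_workspace_yaml(globs: List[str]) -> str:
    """Render the globs as the content of a pnpm-workspace.yaml file."""
    lines = ["packages:"] + [f"  - '{glob}'" for glob in globs]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = '{"name": "root", "private": true, "workspaces": ["packages/*", "apps/*"]}'
    print(to_pnpm_workspace_yaml(workspace_globs(example)), end="")
```

For the example manifest this prints a packages key followed by one list entry per glob, which is the content to save as pnpm-workspace.yaml in the root of the repository.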

I wanted to ask a question. How does pnpm work on computers which might have multiple drives or multiple file systems? How does everything work on that? The drawback of hardlinks is that a hardlink only works within one disk. So you cannot have your content-addressable store on disk C and your project on disk D, if you are using Windows for instance. So in that case, pnpm creates a separate store for each disk. So if you have projects on disk C, you will have a store for those projects on disk C, and if you have projects on disk D, you will have a separate store for those projects. Or alternatively you can use one store, but if the store is on disk C, then on disk C hardlinks to that store will be used, while on disk D files will just be copied. So you won't have the benefits of disk space efficiency there.

That is interesting, yeah. Thank you for this answer. So I do not see more questions, but for the audience, if you still have more questions, you can always ask them in the Q&A channel and you will get the answers from Zoltan there. And we do not have a speaker room for Zoltan, so make sure that you ask the questions in the Q&A channel. Thank you once again, Zoltan, for joining us today for this terrific talk and also for answering these nice questions. Thank you so much. Thank you.
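The single-store behavior described in the answer about multiple disks (hardlink when the store is on the same disk, copy otherwise) can be sketched as a simple fallback. This is an assumption-laden Python illustration, not pnpm's implementation: link_or_copy is a hypothetical helper, and it relies on the operating system reporting a cross-device link attempt with the EXDEV error code.

```python
import errno
import os
import shutil
import tempfile


def link_or_copy(stored_file: str, target: str) -> str:
    """Try to hardlink a store file; fall back to copying across file systems."""
    os.makedirs(os.path.dirname(target), exist_ok=True)
    try:
        os.link(stored_file, target)
        return "hardlink"
    except OSError as error:
        # EXDEV means the store and the target are on different file systems.
        if error.errno != errno.EXDEV:
            raise
        shutil.copyfile(stored_file, target)
        return "copy"


if __name__ == "__main__":
    base = tempfile.mkdtemp()
    stored = os.path.join(base, "store", "some-hash")
    os.makedirs(os.path.dirname(stored))
    with open(stored, "w") as handle:
        handle.write("module.exports = 1;\n")
    # Same file system here, so the hardlink path is expected to be taken.
    print(link_or_copy(stored, os.path.join(base, "project", "node_modules", "pkg", "index.js")))
```

When the copy branch is taken the project still works, but the copied file occupies its own space on the second disk, which is the loss of disk space efficiency mentioned in the answer.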
