Ever wondered what happens after you hit npm install and go to grab a coffee? Let's take a deep dive into the npm and Yarn installation process, and how you can put this knowledge into practice.
![Node Congress 2023](https://gitnation.imgix.net/stichting-frontend-amsterdam/image/upload/v1619376938/eav9rff77rtiyz7qse5v.jpg?auto=format,compress&fit=scale&w=60)
Level: intermediate
During 'npm install', npm manages your project's dependencies. If you install a package (e.g., 'npm install foo'), npm adds it to your 'package.json', creates a 'node_modules' directory in your project, and places the package code there. If the package has dependencies, npm recursively installs and nests them in appropriate 'node_modules' directories within each package.
In earlier implementations, npm would create multiple copies of a common dependency for different packages needing it. To optimize, npm now deduplicates by placing common dependencies at the highest possible level in the 'node_modules' directory, allowing multiple packages to share a single instance of the dependency.
The original npm structure led to problems such as bloated 'node_modules' directories, circular dependencies that could cause infinite loops, and issues with package singletons that could result in bugs due to multiple instances.
NPM ensures consistency across different environments using a lock file, known as 'shrinkwrap'. This file captures the exact package structure and versions installed in one environment to ensure they are replicated in others, such as on different development machines or in continuous integration (CI) systems.
While npm has evolved its package handling to create a flatter 'node_modules' structure for efficiency, Yarn introduced a different approach with Yarn 2 (codenamed Berry). Yarn 2 uses a virtual file system and patches the way Node.js requests files, directly managing dependency resolutions and fetching the exact files needed from a mapped list of modules.
PNPM is a package manager that maintains the traditional nested 'node_modules' structure but optimizes storage by using a global cache and hard links. Unlike npm and Yarn, PNPM does not physically duplicate package files in each project; instead, it creates direct links to the cached versions, significantly reducing disk space usage.
npm install can be a mysterious process, but understanding how package managers work is essential. NPM solved problems like large node_modules, circular dependencies, and multiple instances of the same package. Managing package versions and conflicts is crucial for consistency across projects. Alternative approaches to package management, like PNPM and Yarn 2, provide insights into the hidden complexities of package managers.
npm install can be a mysterious process, but understanding how package managers work is essential. When you install a package, it creates a node_modules folder and adds the necessary code. However, this can lead to issues like large node_modules, circular dependencies, and multiple instances of the same package. NPM solved these problems by deduplicating packages and using a hierarchical structure. This ensures efficient package retrieval and eliminates the need for redundant copies.
So, you are running npm install and you go and grab a cup of coffee, and then you come back and you have no idea or maybe you don't even care what npm did during this time.
So, my name is Tali Barak. I work for Youbeek and let me tell you about the secret life of package managers. This is what happens in your project, this is the basics of npm.
So, you have your project and you need a package called foo. So, you run npm install foo, and that adds foo to your package.json, creates a node_modules folder in your project, and puts the code of foo inside it. But what if foo requires buzz? Okay, no problem. It will create another node_modules folder under foo and put buzz there. And what if buzz requires bugs? Well, same thing: it will add it there. And then what happens if your foo requires buzz, but your bar also requires buzz? In this case, in the naive implementation of npm, you will have two copies of buzz in the same project.
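The naive nesting described here can be sketched in a few lines. This is only an illustration, not npm's actual code: the `dependencies` table and the `naiveInstall` function are hypothetical names, and the "installation" just collects the folder paths that would be created.

```typescript
// Hypothetical dependency table: package name -> names of its direct dependencies.
type DependencyTable = Record<string, string[]>;

const dependencies: DependencyTable = {
  foo: ["buzz"],
  bar: ["buzz"],
  buzz: [],
};

// Naive install: every package gets its own node_modules folder holding
// a private copy of each of its dependencies, recursively.
function naiveInstall(
  packageNames: string[],
  table: DependencyTable,
  parentDir = "",
): string[] {
  const installedPaths: string[] = [];
  for (const packageName of packageNames) {
    const packageDir = `${parentDir}node_modules/${packageName}`;
    installedPaths.push(packageDir);
    // Recurse into this package's own dependencies, nested one level deeper.
    const nested = naiveInstall(table[packageName] ?? [], table, `${packageDir}/`);
    installedPaths.push(...nested);
  }
  return installedPaths;
}

// The project itself depends on foo and bar.
const installed = naiveInstall(["foo", "bar"], dependencies);
// installed lists node_modules/foo/node_modules/buzz and
// node_modules/bar/node_modules/buzz: two separate copies of buzz.
```

The folder layout mirrors the dependency structure exactly, which is why a shared dependency ends up on disk once per package that asks for it.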
And this actually creates a file system structure that replicates the package structure of your project. And this was nice, but it created quite a few problems. For example, it made your node_modules huge. It also created a problem with circular dependencies. That means that if your foo needed bar, which needed buzz, which needed bar again, which needed buzz again, you would go into an infinite loop. And this is, by the way, quite common. It's not as rare as you might think it is. Another common issue is with singletons. If you need a single instance of a certain package, like the debug package, for example, in this structure you would have multiple instances, and that can cause bugs when you execute the packages. And the last one is no longer with us. Thank God for that. It was a Windows 8 problem: the file path was limited to 256 characters. This is less common.
So what did NPM do in order to solve that? They decided to dedupe, to de-duplicate the packages that appeared multiple times. So instead of having buzz twice, they would put it at the highest possible level and use it from there. And the reason this works is the way Node resolves the packages you require. So if foo needed buzz, Node would look under foo's own node_modules folder and search for it. If it doesn't find it, it will go one level up and search for it there.
When packages have different versions and conflicts arise, the package structure becomes a graph. NPM and Yarn tackle this issue by taking a snapshot of the node modules, ensuring consistency across projects.
And then if it doesn't find it, it will go another level up. And there it is. It actually found buzz there and it will use it.
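The upward search described here can be sketched as a small lookup function. This is a simplified model, not Node's real resolver: the file system is simulated with a set of hypothetical folder paths, and `resolvePackage` is an illustrative name. Real Node resolution also handles core modules, files, and package entry points, which are left out.

```typescript
// Simulated file system: the set of package folders that exist on disk.
// These paths are hypothetical and only mirror the foo/bar/buzz example.
const existingDirs = new Set<string>([
  "/project/node_modules/foo",
  "/project/node_modules/bar",
  "/project/node_modules/buzz",
]);

// Starting from the folder of the requiring package, check
// <dir>/node_modules/<name>; if it is missing, move one level up and retry.
function resolvePackage(
  packageName: string,
  fromDir: string,
  dirs: Set<string>,
): string | undefined {
  let current = fromDir;
  while (true) {
    // A folder that is itself named node_modules is skipped, as Node does.
    if (!current.endsWith("/node_modules")) {
      const candidate = `${current}/node_modules/${packageName}`;
      if (dirs.has(candidate)) return candidate;
    }
    if (current === "") return undefined; // reached the file system root
    current = current.slice(0, current.lastIndexOf("/"));
  }
}

// foo requires buzz: nothing under foo, so the search climbs to /project.
const resolvedBuzz = resolvePackage("buzz", "/project/node_modules/foo", existingDirs);
```

Because the search always climbs toward the root, a package placed at the top-level node_modules is visible to every package below it, which is what makes deduplication possible.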
Next, they decided, well, let's take it one step further. If we can move packages up the tree, why only the duplicated ones? We can actually do that for all the packages. And they made a very flat tree with all the packages. And this was good. This solved the problem. You now had smaller packages and shorter paths, because it didn't go that deep. It was only unidirectional, with no circular paths. It made every package unique.
This was good, but then we had a problem. Each package might require a different version of its dependencies. So the tree that you need doesn't actually look like this one. You have different versions of the same packages required in different places in the tree. And even worse, in some cases, the versions could conflict. That means that your foo might require buzz in version one, but bar requires buzz in version two. And how do you flatten that? What do you put at the top level? In fact, we have an issue here: your package structure is no longer a tree. It is actually a graph.
And different versions of NPM had different solutions for this. Sometimes it would take the most popular version. Sometimes it would take the first one and put it at the top of the tree, while the other required version was left under the package that required it. Like in this case here, where you could promote version two or version one, depending on the order. And this is a problem because now we get a very shaky and unpredictable tree. And the way that NPM, and also Yarn in version 1, solved it is by actually taking a snapshot of your whole node_modules. And this is the famous lock file. NPM has it as a shrinkwrap file, and then Yarn added the yarn.lock file. And this is the way for NPM to make sure that the package structure, the node_modules files, in one project is the same as the one on the CI or the one your colleagues run.
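To see why the result depends on the order, here is a minimal sketch of a "first one wins" hoisting rule. It is an assumption-laden toy, not the algorithm of any specific NPM or Yarn version: `Requirement`, `hoist`, and the version numbers are hypothetical, versions are compared as exact strings rather than semver ranges, and the serialized snapshot at the end is not the real lock file format.

```typescript
// A requested dependency: which package asked for which version of what.
// All names and versions are hypothetical.
interface Requirement {
  requiredBy: string;
  name: string;
  version: string;
}

// First-come hoisting: the first version seen for a name goes to the top
// level; a later, different version stays nested under its requirer.
function hoist(requirements: Requirement[]): Map<string, string> {
  const layout = new Map<string, string>(); // install path -> version
  const topLevel = new Map<string, string>(); // package name -> hoisted version
  for (const { requiredBy, name, version } of requirements) {
    const hoisted = topLevel.get(name);
    if (hoisted === undefined) {
      topLevel.set(name, version);
      layout.set(`node_modules/${name}`, version);
    } else if (hoisted !== version) {
      layout.set(`node_modules/${requiredBy}/node_modules/${name}`, version);
    }
    // Same version already hoisted: nothing to install, the copy is shared.
  }
  return layout;
}

// Same requirements, processed in two different orders.
const fooFirst = hoist([
  { requiredBy: "foo", name: "buzz", version: "1.0.0" },
  { requiredBy: "bar", name: "buzz", version: "2.0.0" },
]);
const barFirst = hoist([
  { requiredBy: "bar", name: "buzz", version: "2.0.0" },
  { requiredBy: "foo", name: "buzz", version: "1.0.0" },
]);

// Writing one layout down is the idea behind a lock file: every machine
// that installs from the snapshot gets the same tree.
const lockSnapshot = JSON.stringify(Array.from(fooFirst), null, 2);
```

In `fooFirst`, version 1.0.0 of buzz lands at the top level and 2.0.0 is nested under bar; in `barFirst` it is the other way around. Both layouts satisfy the requirements, which is exactly why a snapshot is needed to pick one and keep it.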