The problem is that sometimes we gloss over this: we say the magic word "monorepo" and assume that everyone is talking about the same thing. But in reality, "monorepo" is a big term, and everyone means something completely different. Just to give you an example: two popular tools, Yarn and NX, use the same word for opposite things. In Yarn, a workspace is a single package. In NX, the workspace is the whole monorepo. So, I am Talib Barak, and I've been fiddling with monorepos since 2017, which is quite a lot of time.
2. Decisions and Strategies for Monorepos
Decisions and strategies around monorepos impact the entire development process: product quality, development speed, and team training. A monorepo is a single repository containing multiple artifacts, such as packages, backend services, and frontend applications. The first question to ask is what to include; it doesn't have to be the entire company's code. Once you choose a monorepo, the repository will contain multiple artifacts with separate package.json files. The first decision concerns installation, and the approach of hoisting packages to the root of the repository.
And I'm not going to talk about this tool or that tool, Yarn or NX or TurboRepo. I want to take this opportunity, and the few minutes ahead, to explain the decisions and strategies we need to make when we are moving to, or already have, a monorepo. Because these decisions, which we sometimes make implicitly without really thinking about them, actually impact everything around our development process. They will impact the quality of our product, the speed of development, the training we need to give the team, and so on.
There is no right or wrong here; I'm not going to say "look, this is the one way to go." It's more about the considerations, the things you need to think about. So again, what is a monorepo? A monorepo is a single repository that has multiple artifacts: things you want to share, things you want to publish. This can be a package, a backend service, a frontend application, and so on.
And the first important question you should ask yourself when switching to a monorepo is: what do I want to include in it? This can be a lot of different things: the tools we are using, frontend applications, microservices, packages, backend servers, and so on. This is the ground-zero decision: what do I put in? And it doesn't mean you have to put the whole company's code inside a single monorepo. It can definitely be just part of the code you have.
And the first thing we have inside our package.json is the list of dependencies this artifact needs in order to work. This brings us to our first decision, which is the install. This is what our packages look like: each package.json has its own set of dependencies. So how are we going to install them? If we take the same approach we used in a polyrepo, when we had multiple repos, it means that under each package I will have a node_modules folder with all the packages it needs. But we know node_modules is a really, really big thing, and we don't want to replicate it for each one of the packages, so there is another approach we can use: hoisting. Instead of installing each package's dependencies in its own workspace, we move everything to the root of our repository and install it there once. And the reason this is going to work is Node's module resolution algorithm.
3. Linking Packages in a Monorepo
It means that packages in a monorepo can be linked internally. Just as with external dependencies, one approach is to publish the packages to an artifact registry and install them from there. Yarn also allows creating a symlink between packages instead. With the introduction of tools like TypeScript and Babel, managing monorepos has become more complex.
It means that Node starts at the bottom and tries to find the package we required, like bar here. If it doesn't find it, it goes one level up, and then one level up again, until at the root of the project it finds it. So that's the first thing.
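To make that walk-up concrete, here is a tiny JavaScript sketch of the lookup. The directory names and the `tree` map are made up for illustration, and real Node resolution handles much more (file extensions, package.json `exports`, and so on); this only models the one-level-up search that makes hoisting work.

```javascript
// Sketch of Node's node_modules lookup: starting from a package's own
// directory, walk up one level at a time until the dependency is found.
// `tree` is a toy stand-in for the filesystem: a map from a node_modules
// path to the packages installed there.
function resolve(pkgDir, dep, tree) {
  let dir = pkgDir;
  while (true) {
    const installed = tree[dir + '/node_modules'] || [];
    if (installed.includes(dep)) return dir + '/node_modules/' + dep;
    if (dir === '') return null;                   // passed the root: not found
    dir = dir.split('/').slice(0, -1).join('/');   // go one level up
  }
}

// Hoisted layout: "bar" is installed once, at the repository root.
const tree = { '/repo/node_modules': ['bar'] };

// The lookup starts in the workspace, finds nothing there,
// and walks up until it reaches the root install.
console.log(resolve('/repo/packages/foo', 'bar', tree));
// → /repo/node_modules/bar
```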
There's a caveat that usually happens here. With this hoisting approach, we can forget to declare a dependency in one of the packages, and it will still work locally because the dependency was hoisted to the root. Then, when we deploy to production, where we don't have the monorepo but each package separately, we suddenly find that the required module cannot be found. This can be solved, and there are plugins that do this. That covers external dependencies. But usually, we set up a monorepo because we have artifacts that are connected. This allows us, for example, to share code: one package that multiple other artifacts rely on, like in this example, where a service or another package depends on the shared package. And this leads to our second strategy: how do we link the packages that live inside our monorepo? In our package.json, we don't only have external dependencies, we also have local ones. And even if the problem looks very similar to the previous one, it's actually quite different, because the code of these packages is inside our monorepo. Instead of pointing to something external, we are going to point to a package that is internal.
So some of you would say: look, we solved this for external packages, let's do the same for internal packages. Let's test them, build them, version them, publish them to some artifact registry, and then install them from there. This is actually a valid approach, and I know there are projects implementing it: you cannot install anything locally; you have to publish to a package registry, even a private one, and then install it from there in order to use it. But there is another solution we can implement here, because this code is local, and in some cases we want to make a change that spans not just a single package but multiple artifacts. This was solved by Yarn, even in Yarn version 1. You specify that one package needs another package, and instead of really installing it into node_modules, Yarn just creates a symlink, a reference from the current node_modules to the other package. So, yeah, it was the age of foolishness, it was the epoch of belief, and with all the new tools being introduced, things actually became more complex. We can no longer just point at the source code and say "this is what needs to run," because we wanted types, so we had TypeScript; we wanted newer language features, so we had Babel; we wanted to produce bundles.
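As a sketch of what that linking looks like with Yarn workspaces, a local dependency can be declared like this (the package names are hypothetical; the `workspace:` protocol shown is Yarn 2+, while Yarn 1 matched local packages by their version range):

```json
{
  "name": "@acme/service",
  "dependencies": {
    "@acme/shared-utils": "workspace:*"
  }
}
```

Instead of downloading `@acme/shared-utils` from a registry, Yarn places a symlink in `node_modules` that points at the sibling workspace, so a change in the shared package is visible immediately in every package that depends on it.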
4. Web Application Development and Build Process
For web applications, we introduced tools like Webpack and static module analysis. Pointing directly to the source code poses challenges, so instead we can point to the build artifact. The development process becomes more complex with the need for a build step, testing, and linting. Tools like Yarn workspaces and npm workspaces let each package run its tests independently.
This is mostly for web applications, so we introduced tools like Webpack. And at the same time we wanted to make sure modules could be loaded efficiently, so we wanted static module analysis and all kinds of things.
So now we have a problem. We are pointing at our source code, but our source code cannot just run as it is. We need some sort of translation, a build process, that will make it usable. And this is where we introduce another approach: instead of pointing to the raw source code, you point to the build artifact. It's another way of solving the same problem.
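In practice, pointing at the build artifact usually means the package's entry fields reference the compiled output instead of the source. A minimal sketch, with illustrative names and paths:

```json
{
  "name": "@acme/shared-utils",
  "main": "dist/index.js",
  "types": "dist/index.d.ts",
  "scripts": {
    "build": "tsc -p tsconfig.json"
  }
}
```

Consumers now resolve `dist/index.js`, so the package must be built before anything that depends on it can run, which is exactly what makes the build strategy below necessary.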
The third strategy I want to talk about is the development process. Now that the process is more complex, we need a build step, and we definitely want to test our code, run linting on it, and so on. So how do we define that? There are two approaches here, and they will somehow repeat through other things. Should I do it at the package level, at the artifact level? Each package runs its own tests as if it lived in a polyrepo, and we just run the script for each package separately. This is actually what tools like Yarn workspaces and npm workspaces do: they assume each package is independent.
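As a sketch of the per-package approach, the root package.json declares the workspaces (the `packages/*` layout here is a common convention, not a requirement), and each workspace keeps its own scripts:

```json
{
  "name": "acme-monorepo",
  "private": true,
  "workspaces": ["packages/*"]
}
```

With this in place, `yarn workspaces run test` (Yarn 1) or `npm run test --workspaces` (npm 7+) simply executes each package's own `test` script in turn, just as it would run in a standalone repository.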
5. Configurations, Testing, and the Build Process
We can put each configuration separately in each package, but this makes unified changes across the project difficult. Another approach is to centralize the configuration while still running the tests per project. The build process uses a topological sort to run all the builds in a sensible order. A more advanced approach rebuilds only the packages that changed, using source control and caching to optimize the process.
The problem here is that each configuration lives separately in each of the packages. And if we need to make a change, or want something unified across the whole project, we have to go into each package and fix it.
So another approach is: okay, let's centralize the configuration, but still run the tests per project. And there are tools, like Jest if you want to run tests, that allow you to do that. You define all the projects in one configuration and can share a lot of it, but the execution is on a per-project, per-package basis. So do we work in a centralized way or a distributed way? Does everything we do run across all the packages, or is it spread between them?
The next thing is the build. Okay? We said we have to go through a build process, so how do we run it? You can just run a script that does whatever you want, or use NX's idea, which they are actually now retreating from, of an executor that knows how to build your packages. NX actually started from the Angular CLI project; it has since diverged quite a lot, but that is where the executor idea came from, and as I said, they are now reverting from it. Here I have the monorepo's logical structure. This is not the physical structure of the files but the logical one, and it is shaped as a DAG, a directed acyclic graph (with "acyclic" sometimes being more like wishful thinking). It means you have some roots, depending on how you want to look at it: packages that do not depend on anything, and then on top of them packages that depend on them, and more and more. And to build everything, you do what is called a topological sort. You start from the packages that have no dependencies, then build the packages that depend on them, and so on. That gives you an order for running all the builds that makes sense, where you're not going to miss anything or end up with stale versions.
And a naive build would say: okay, I have my topological sort, I can just go one by one and build everything along the way. That, of course, will take a lot of time, and you might not want to do all of it. A more advanced approach is to say: look, I am working with source control, let's say Git. I know what was changed, so I can start from the packages that changed. Everything further down the topological sort that was not changed doesn't need to be rebuilt; I already have its build. Another optimization on top of that is caching. You build package five, you build package four; package one maybe depends only on five, which was not changed, so you can take its cached build and continue on.
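The two steps above (topological ordering, then rebuilding only what is affected) can be sketched in a few lines of JavaScript. The package names and the `deps` graph are invented for illustration, and the graph is assumed to already be acyclic, as the talk's DAG is; real tools like NX also hash inputs for their cache rather than just tracking names.

```javascript
// Depth-first topological sort: every package appears after all of its
// dependencies, so builds never see a stale or missing dependency.
function buildOrder(deps) {
  const order = [];
  const seen = new Set();
  function visit(pkg) {
    if (seen.has(pkg)) return;
    seen.add(pkg);
    for (const d of deps[pkg] || []) visit(d); // dependencies first
    order.push(pkg);
  }
  Object.keys(deps).forEach(visit);
  return order;
}

// "Affected" build: rebuild a package only if it changed, or if one of
// its (transitive) dependencies was rebuilt; the rest come from cache.
function affectedBuilds(deps, changed) {
  const rebuilt = new Set(changed);
  return buildOrder(deps).filter((pkg) => {
    const needs =
      rebuilt.has(pkg) || (deps[pkg] || []).some((d) => rebuilt.has(d));
    if (needs) rebuilt.add(pkg);
    return needs; // skipped packages are served from the cache
  });
}

const deps = { app: ['ui', 'utils'], ui: ['utils'], utils: [] };
console.log(buildOrder(deps));             // [ 'utils', 'ui', 'app' ]
console.log(affectedBuilds(deps, ['ui'])); // [ 'ui', 'app' ] — utils is cached
```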
6. Release Strategy and Tool Selection
NX optimizes the build process. The release strategy is critical. Two approaches: unified (all packages under one release) and distributed (independent versions). Start with the release goal, derive build and develop strategies, then consider linking and installation. Select tools that fit your strategy.
And this is similar to what NX actually does as part of its optimization. So this is what we need to think about when we think about our build process.
The last strategy, strategy number five, is the release. How do I publish my code to the world, and how do I version it? This is a really critical decision, probably the most critical one. If I have this whole tree of packages, how do I manage their versioning? There are two main approaches. One is unified: even if some packages were not changed, when I publish a new release, a new application, or new microservices, I publish everything under the same release. Angular, as an example, does exactly that: whenever they release, they release all the packages. Storybook does something similar: all the packages in the monorepo have just one version, regardless of what was changed. The other approach is distributed, where everything is independent. This package will be at version 1.1, that one at 3.2, and I will only release the packages I actually made changes to. And, of course, I then need to make sure the other packages and applications that rely on them are backward compatible and can still work with the new version. And if I'm upgrading a dependency, that is itself a change to my package. It was the season of light, it was the season of darkness.
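As one concrete sketch of the two modes, Lerna expresses this choice through the `version` field of lerna.json: a fixed version string gives the unified mode (every package released under that one version), while the keyword `independent` gives per-package versions.

```json
{
  "version": "independent",
  "packages": ["packages/*"]
}
```

With `"version": "independent"`, the tool prompts for a new version only for the packages that actually changed; with a fixed string like `"1.4.0"`, every package in the monorepo moves together on each release.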
So, if we try to summarize everything: the first thing you need to decide is the scope of your monorepo, what goes inside, and then you go through the five strategies: install, link, develop, build, and release. But, in fact, I would suggest this is not the order to think in. You should actually think in the reverse order. Start from the release: what is your goal, how do you want to release? That will drive your build strategy, which will probably drive your develop strategy, because if you release separately you don't need to test everything, and that will drive your link and install strategies. And only after you've done this, and you know which strategy you want to apply in your monorepo, should you go and select the tools (Lerna, NX, PNPM workspaces, TurboRepo, and so on) that really fit your strategy. So there is some hope and some light at the end of the winter. Thank you very much for listening, and this is where you can find me.