Tale of Two Repos


Not all monorepos are made the same, and not all teams are equal. 

Managing a monorepo is much more than just "let's select this tool or that tool". It comprises many design decisions that you need to make for the product and for the developers. 

In this talk, I will go over the main monorepo challenges - package installation and linking, and the development and deployment processes - and describe the options that exist at each stage. 

At the end of the talk, you will have a simple and powerful model that will help you make the right choice of tools and processes for your monorepo. 


It was the best of times, it was the worst of times. JavaScript started to become very popular in the mid-2010s. The introduction of Node was one of the drivers, advanced frameworks like Angular and React were introduced, and it became a real hype. In 2015, JavaScript actually woke up from a deep sleep of a few years and got a lot of new features. This was known as ECMAScript 6, or as we like to call it, ES6. But there were also some problems, because the engines that were running the language - Node on one side, the different browsers on the other, and especially one browser that is no longer with us - did not know how to handle those things. Things like spread operators, arrow functions, const, and let were extremely useful when writing JavaScript code. But they were just not supported yet. And we are developers, and we want those features right now. So in 2014, a very young guy called Sebastian McKenzie started a project to learn more about the inner workings of JavaScript, and he was trying to convert code from ES6 so it could be used in browsers that only supported ES5. Sebastian created a repository and built the core of the project, but because there were so many features in ES6, and more were coming in all the time, they decided to build it with an extensible design, using plugins. So they opened a repository for each one of the plugins, so each could be installed separately. And the project grew and grew and grew, because there were so many plugins. So they decided to put all the repositories of the different plugins together into one repository. And by the way, at the same time, they also renamed the project. They were no longer calling it 6to5, as it was named before; they changed the name to Babel. So they put all their plugins in one repository that could publish many packages. And this decision was very, very controversial at the time. It was not obvious at all. 
And it was so controversial that they actually had to write an explanation in their GitHub repository of why they were doing it. They also built some utilities to help them manage the different packages. Later on, they extracted all these utilities into a separate repository - again, another repository - and they called it Lerna. And back then, and for a long time, maybe even until today, Lerna was a synonym for using monorepos. But I think that the way we used JavaScript and monorepos back in 2015 is not really the way we need to handle modern JavaScript today in 2023. The problem is that sometimes we overlook this: we just say the magic word, monorepo, and we assume that everyone is talking about the same thing. But in reality, monorepo is a big term, and everyone means something completely different by it. Just to give you an example, if we look at two popular tools, yarn and Nx, they use the same term for opposite things. In yarn, a workspace is a single package. In Nx, when they say workspace, they mean the whole monorepo. So, I am Tali Barak, and I have been fiddling with monorepos since 2017 - quite a lot of time. In this talk, I am not going to talk about this tool or that tool, Lerna or Nx or Turborepo. I want to take this opportunity, these few minutes, to explain what decisions and strategies we need to make when we are moving to, or already have, a monorepo. Because these decisions - which we sometimes take implicitly, without really thinking about them - actually impact everything around our development process. They will impact the quality of our product, the speed of development, the training that we need to give to the team, and so on. There is no wrong or right here; I am not going to say, look, this is the one way to go. It is more about what the considerations are. 
What are the things that you need to think of? So again, if we go back: what is a monorepo? A monorepo is a single repository that has multiple artifacts - things that you want to share, things that you want to publish. This can be a package, a backend service, a frontend application, and so on. And the first important question you should ask yourself when you are switching to a monorepo is: what do I want to include in my monorepo? This can be a lot of different things - the tools that we are using, a frontend application, microservices, packages, backend servers, and so on. So this is the ground-zero decision: what do I put in? And it doesn't mean that you have to put the whole company's code inside a single monorepo. It can definitely be just part of the code that you have. The other decision is: should I actually go monorepo? There are a lot of articles - search for "monorepo, please do" and the response, "monorepo, please don't" - because there are different considerations and different things you need to think of when you are running a monorepo. But let's say you decided that, yes, we want to go with a monorepo. Your repository will include multiple artifacts. And in the JavaScript world and in the Node.js world, this means that it will have multiple package.json files. Each package.json is related to an artifact that you want to publish. And the first thing that we have inside our package.json is the set of dependencies that this artifact needs in order to work. This brings us to our first decision, which is install. This is how our packages look: in each package.json we have a set of dependencies. And how are we going to install them? 
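As a minimal sketch of what this setup looks like (the names and paths here are illustrative, not from the talk), the root package.json declares where the workspaces live, and each artifact under that path keeps its own package.json with its own dependencies:

```json
{
  "name": "acme-monorepo",
  "private": true,
  "workspaces": ["packages/*", "apps/*"],
  "devDependencies": {
    "typescript": "^5.0.0"
  }
}
```

With this in place, running a single install at the root is enough for npm or yarn to discover every package matching the `workspaces` globs.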
So if we go with the same approach that we used in a polyrepo, when we had multiple repos, it means that under each package I will have a node_modules folder that contains all the packages it needs. But since we know that node_modules is a really, really big thing, and we don't want to replicate it multiple times for each one of the packages, there is another approach that we can use. And this approach is to hoist the packages. So instead of installing each package's dependencies in its own workspace, we can move everything to the root of our repository and just install it once there. And the reason this works is the module search algorithm of Node: it starts at the bottom and tries to find the package that we need, like bar here; if it does not find it, it goes one level up, and again one level up, until at the root of the project it finds it. So that's the first thing. There is a caveat that usually happens with this hoisting approach: we can forget to declare a dependency in one of the packages. And then when we deploy to production, where we don't have the monorepo - we just have each package separately - we will suddenly find that, oops, we cannot find the required module. This can be solved, and there are plugins that do this check. This is all good for external dependencies. But usually we set up a monorepo because we have artifacts that are connected. This allows us, for example, to share code, so we can have one package that multiple other packages rely on - like in this example, where we have a service or another package that relies on a shared package. And this leads to our second strategy, which is: how do we actually link the packages that exist in our monorepo? In our package.json we don't only have the external dependencies, but also the local ones. 
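The upward search that makes hoisting work can be sketched in a few lines. This is a simplified illustration only (real resolution also consults package.json fields like `main` and `exports`), and the paths are made up:

```javascript
// Simplified sketch of Node's node_modules lookup: resolution starts in
// the requiring package's directory and walks up toward the filesystem
// root, which is why a dependency hoisted to the repo root is still found.
function candidatePaths(fromDir, pkgName) {
  const candidates = [];
  let dir = fromDir;
  while (true) {
    candidates.push(`${dir}/node_modules/${pkgName}`);
    const parent = dir.split('/').slice(0, -1).join('/');
    if (!parent) break; // reached the filesystem root
    dir = parent;
  }
  return candidates;
}

console.log(candidatePaths('/repo/packages/app', 'bar'));
// → ['/repo/packages/app/node_modules/bar',
//    '/repo/packages/node_modules/bar',
//    '/repo/node_modules/bar']
```

Node tries each candidate in order and uses the first one that exists - so a package hoisted to `/repo/node_modules` is reachable from every workspace, which is also exactly why an undeclared ("phantom") dependency can accidentally resolve in development and then break in production.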
And even if the problem looks very similar to the previous one - install it locally or install it globally - it's actually quite different, because the code of these packages is inside our monorepo. So instead of just pointing to something external, we are going to point to a package that is internal. Some of you would say: look, we solved this for the external packages, let's do the same for internal packages. Let's test them, build them, version them, publish them to some artifact registry that we have, and then install them from that registry. This is actually a valid approach, and I know there are projects that implement it. You cannot install anything locally; you have to publish it to a package registry, even if it is a private one, and then in order to use it, you have to install it from there. But there is also another solution that we can implement here, because the code is local, and because in some cases we want to make a change that doesn't span just a single package but multiple artifacts. In this case, this was solved by yarn, even in yarn version 1. You would specify that you need a package from another package, and it would not really install it into node_modules; instead it would just create a symlink, a reference from the current node_modules to the other package. So yeah, it was the age of foolishness, it was the epoch of belief. With all the new tools that were introduced, it actually became more complex. We can no longer just point at the source code and say, this is what needs to run. Because we wanted types, so we had TypeScript. We wanted newer features, so we had Babel, which we talked about. We wanted to bundle our code - this is mostly for web applications - so we introduced tools like webpack. 
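As a sketch of what this linking looks like in practice (the package names are illustrative): yarn v1 linked a sibling workspace whenever its version range matched, while pnpm and yarn 2+ also support an explicit `workspace:` protocol that can never resolve to the registry by mistake:

```json
{
  "name": "@acme/api",
  "version": "1.0.0",
  "dependencies": {
    "@acme/shared-utils": "workspace:*",
    "express": "^4.18.0"
  }
}
```

During install, `node_modules/@acme/shared-utils` becomes a symlink to the local `shared-utils` workspace, while `express` is fetched from the registry as usual; on publish, the `workspace:*` specifier is rewritten to the package's real version.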
But then at the same time, we wanted to make sure that code could be loaded efficiently, so we wanted static module analysis, all kinds of things. So now we have a problem: we are pointing to our source code, but our source code cannot just run as it is. We need some sort of transpilation or build process to make it usable. And this is where another approach comes in: instead of pointing to the raw code, to the source, you actually point to the build artifact. This is another approach that solves the same problem. The third strategy that I want to talk about is the development process. Now the development process is more complex: we need a build process, we definitely want to test our code, we want to run some linting on it, and so on. So how do we define that? Here we have two approaches, which are going to somehow repeat through the other strategies. Should I do it at the package level, at the artifact level? Each package runs its own tests, as if it ran in a polyrepo, and we just run the script for each package separately. This is actually what tools like yarn workspaces and npm workspaces are doing: they just assume that each package is independent. The problem here is that the configuration is placed separately in each one of the packages, and if we need to make a change, or if we want something to be unified across our whole project, we actually need to go into each of the packages and fix it. So another approach is: okay, let's make the configuration centralized, but still run the tests on each of the projects. There are tools like Jest that allow you to do that: you define all the projects in one configuration, so you can share a lot of it, but the execution is actually on a per-project or per-package basis. 
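Jest's `projects` option supports exactly this centralized-config, per-package-execution model. A minimal sketch of a root `jest.config.js` (the paths are illustrative):

```javascript
// Root jest.config.js: one shared configuration, while Jest still
// treats each matching workspace directory as an independent project.
module.exports = {
  // Shared settings inherited by every project.
  testEnvironment: 'node',
  // Each directory matched by the glob runs as its own Jest project,
  // and can still override settings locally if it needs to.
  projects: ['<rootDir>/packages/*'],
};
```

A single `jest` invocation at the root then runs every package's tests, but each package can keep its own local overrides.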
Do we work in a centralized way or in a distributed way? Does everything we do run across all the packages, or is it spread between them? The next thing is the build. We said that we have to go through a build process, and this is how we work on it. You can just run a script that does whatever you want, or - an idea Nx had, which they are actually now retreating from - you can have an executor, a builder that knows how to build your packages. Nx actually started from the Angular CLI project, though it has since diverged quite a lot, and this is where they came up with the idea of executors; as I said, they are now moving away from it. Here I have the monorepo's logical structure. This is not the physical structure of the files, but the logical one. It is structured in the form of a DAG, a directed acyclic graph. Acyclic is more like wishful thinking. It means you have some leaves, or roots, depending on how you want to look at it - packages that do not depend on anything - and on top of them you have packages, and more packages, that depend on them. In order to build everything, you actually do what is called a topological sort. That means you start from the packages that don't have any dependencies, then build all the packages that depend on them, then the packages that depend on those, and so on. Then you get an order in which to run all the builds that makes sense, so you're not going to pick up stale versions. A naive build would say: okay, I have my topological sort, I can just go one by one and build everything along the way. Of course, that will take a lot of time, and you might not want to do all of it. 
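The topological sort described above fits in a few lines of JavaScript. This is a simplified sketch; the dependency graph and package names are made up for illustration:

```javascript
// Compute a build order for a monorepo dependency DAG via depth-first
// topological sort: every package's dependencies are built before it.
function buildOrder(graph) {
  const visited = new Set();
  const order = [];

  function visit(pkg, stack) {
    if (visited.has(pkg)) return;
    if (stack.has(pkg)) throw new Error(`Dependency cycle at ${pkg}`);
    stack.add(pkg);
    for (const dep of graph[pkg] || []) visit(dep, stack);
    stack.delete(pkg);
    visited.add(pkg);
    order.push(pkg); // emitted only after all of its dependencies
  }

  for (const pkg of Object.keys(graph)) visit(pkg, new Set());
  return order;
}

// Hypothetical graph: app depends on ui and utils; ui depends on utils.
const order = buildOrder({ app: ['ui', 'utils'], ui: ['utils'], utils: [] });
console.log(order); // → ['utils', 'ui', 'app']
```

The cycle check is where the "acyclic is wishful thinking" remark bites: if two packages depend on each other, no valid build order exists, and a real tool has to report that rather than loop forever.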
Another approach is to say: look, I am working with source control, let's say Git; I know what was changed, so I can start only from the packages that were changed. Everything that comes earlier in the topological sort but was not changed, I don't need to rebuild - I already have its build. And then I go and rebuild all the affected packages. Another optimization on top of that is to use a cache. So not only do you go through the packages that were changed, you can also skip packages whose inputs did not change. Say you build package five and package four; package one maybe depends only on five, or even on four, but if those did not change, you can take their builds from the cache and continue on. This is similar to what Nx is actually doing as part of its optimization. So this is what we need to think about when we think about our build process. The last strategy, strategy number five, is the release. How do I publish my code to the world, and how do I version it? This is a really critical decision, probably the most critical one. If I have this whole tree of packages, how do I manage the versioning of it? There are two main approaches. One is unified: even if there are packages that were not changed, when I publish a new release - a new application or a new microservice - I publish everything under the same release. Angular, as an example, does exactly that: whenever they release, they release all the packages. Storybook is doing something similar. All the packages in the monorepo have just one version, regardless of what was changed. The other approach is distributed: everything is independent. This package will be version 1.1, that one 3.2, and I will only release the packages that I actually made changes to. 
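Tools encode this unified-versus-independent choice directly in their configuration. In Lerna, for instance, the `version` field in `lerna.json` switches between the two modes (a sketch; the glob is illustrative):

```json
{
  "packages": ["packages/*"],
  "version": "independent"
}
```

The special value `"independent"` lets every package carry its own version, while a concrete version string such as `"version": "1.4.0"` pins the whole monorepo to one shared release number.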
And of course, then I will need to make sure that the other packages or applications that rely on it are backward compatible and can still work with the new version. And if I am upgrading a dependency, that is actually a change to my package. It was the season of light, it was the season of darkness. So if we try to summarize everything: the first thing you need to decide is the scope of your monorepo - what do you put inside? Then you should go through the five strategies: install, link, develop, build, and release. But in fact, I would suggest that this is not the order to think about them in. You should actually think about them in reverse. Start from the release: what is your goal, how do you want to release? That will drive your build strategy, which will probably drive your develop strategy - because if you are releasing separately, you don't need to test everything - and that will lead to your link and install strategies. And only after you have done this, and you know what strategy you want to apply in your monorepo, should you go and select the tools - Lerna, Nx, pnpm workspaces, Turborepo, and so on - the ones that really fit your strategy. So there is some hope and some light at the end of the winter. Thank you very much for listening. And this is where you can find me. Thank you very much.
24 min
17 Apr, 2023
