Tale of Two Repos


Not all monorepos are made the same, and not all teams are equal. 

Managing a monorepo is way more than just "let's select this tool or that tool". It comprises many design decisions that you need to make for the product and for the developers.


In this talk, I will go over the main monorepo challenges - package installation and linking, and the development and deployment processes - and describe the possibilities that exist for each stage.


At the end of the talk, you will have a simple and powerful model that will help you in making the right choice of tools and processes for your monorepo. 

24 min
17 Apr, 2023

Video Summary and Transcription

JavaScript became popular in the mid-2010s with the introduction of Node and advanced frameworks like Angular and React. Monorepos, which are single repositories containing multiple artifacts, are a popular approach for managing JavaScript projects. Linking packages internally in a monorepo can be done through local or global installation, or by publishing them to an artifact registry. Managing monorepos has become more complex with the introduction of tools like TypeScript and Babel. The development process for web applications involves a build process, testing, and linting, which can be facilitated by tools like Yarn Workspace and NPM Workspace. The release strategy for monorepos can be either unified or distributed, and it is important to select tools that align with the chosen strategy.


1. The Rise of JavaScript and Monorepos

Short description:

JavaScript became popular in the mid-2010s with the introduction of Node and advanced frameworks like Angular and React. In 2015, ECMAScript 6 (ES6) brought new features, but the engines running the language didn't support them. Sebastian McKenzie started a project to convert ES6 code so it could run in browsers that only supported ES5. The project grew and became Babel, and consolidating all its plugins into a single repository was a controversial decision. Lerna, initially synonymous with monorepos, was created to manage the project's packages. However, the way we used monorepos in 2015 may not be suitable for modern JavaScript. Yarn and NX use the term workspace differently. I have experience with monorepos since 2017.

It was the best of times, it was the worst of times. Over the last decade, JavaScript became very popular, starting back in the mid-2010s. The introduction of Node was one part of that, advanced frameworks like Angular and React were introduced, and it became a real hype.

In 2015, JavaScript actually woke up from a deep sleep of a few years and got a lot of new features. This was known as ECMAScript 6, or as we like to call it, ES6. But there were also some problems, because the engines that were running the language, that is, Node and the different browsers, and especially a browser that is no longer with us, did not know how to handle those things.

Things like spread operators, arrow functions, const, and let were extremely useful when writing JavaScript code, but they were just not yet supported. And we are developers, and we want those features right now. In 2014, a very young guy called Sebastian McKenzie started a project to learn more about the inner workings of JavaScript, and he was trying to convert code from ES6 so it could be used in browsers that only supported ES5. So Sebastian created a repository and built the core of the project, but because there were so many ES6 features, and more were coming in all the time, they decided to build it with an extensible design using plugins.

So they opened a repository for each one of the plugins so it could be installed separately. And the project grew and grew and grew, because there were so many plugins. So they decided to put all the repositories of the different plugins together into one repository. And by the way, at the same time, they also renamed the project. They were no longer calling it 6to5, as it was named before, and they changed the name to Babel. So they put all their plugins in one repository that could publish many packages. And this decision was very, very controversial at the time; it was not obvious at all. It was so controversial that they actually had to write an explanation in their GitHub repository of why they were doing it. They also built some utilities to help them manage the different packages. Later on, they extracted all these utilities into a separate repository, yet another repository, and they called it Lerna. And back then, and for a long time, maybe even until today, Lerna was a synonym for using monorepos. But I think that the way we used JavaScript and monorepos back in 2015 is not really the way we need to handle modern JavaScript today, in 2023.

The problem is that sometimes we overlook this, and we get to the point where we just say the magic word monorepo and assume that everyone is talking about the same thing. But in reality, a monorepo is a big thing, and everyone means something completely different. Just to give you an example: if we look at two popular tools, Yarn and NX, they use the same term for opposite things. In Yarn, a workspace is a single package. In NX, when they say workspace, they mean the whole monorepo. So, I am Tally Barak, and I've been fiddling with monorepos since 2017. That's quite a lot of time.

2. Decisions and Strategies for Monorepos

Short description:

I'm not going to talk about this tool or that tool, Yarn or NX or TurboRepo. When moving to or maintaining a monorepo, decisions and strategies impact the development process, product quality, speed, and team training. A monorepo is a single repository containing multiple artifacts like packages, backend services, and frontend applications. The first question to ask is what to include. It doesn't have to be the entire company code. Another decision is whether to go monorepo at all. If you do, the repository will include multiple artifacts with separate package.json files. The first decision is about installation and the approach of hoisting packages to the root of the repository.

And I'm not going to talk about this tool or that tool, Yarn or NX or TurboRepo. I want to take this opportunity, and the few minutes ahead, to explain the decisions and the strategies that we need to make when we are moving to, or already have, a monorepo. Because these decisions, which we sometimes make without really thinking about them, taking them as implicit ideas, actually impact everything around our development process. They will impact the quality of our product, they will impact the speed of development, the training that we need to give to the team, and so on.

There is no wrong or right here; I'm not going to say, look, this is the one way to go. It's more about what the considerations are, what the things are that you need to think of. So again, if we go back: what is a monorepo? A monorepo is a single repository that has multiple artifacts, things that you want to share, things that you want to publish. This can be a package, this can be a backend service, this can be a frontend application, and so on.

And the first important question you should ask yourself when you're switching to a monorepo is: what do I want to include in my monorepo? This can be a lot of different things. It can be the different tools that we are using, a frontend application, microservices, packages, backend servers, and so on. So this is the ground zero decision: what do I put in? And it doesn't mean that you have to put the whole company's code inside a single monorepo. It can definitely be just part of the code that you have.

The other decision is: should I actually go monorepo? There are a lot of articles; search for "should I monorepo" and you will find responses like "monorepo you should not", and so on. Read those, because there are different considerations and different things that you need to think of when you have a monorepo. But let's say you decided that, yes, we want to go with a monorepo. Your repository will include multiple artifacts, and in the JavaScript world, in the Node.js world, this means that it will have multiple package.json files. Each package.json is related to an artifact that you want to publish.

And one of the first things we have inside each package.json is the list of dependencies that this artifact needs in order to work. This brings us to our first decision, which is install. This is how our packages look: in each package.json we have a set of dependencies. So how are we going to install them? If we go with the same approach that we used in a polyrepo, when we had multiple repos, it means that under each package I will have a node_modules folder that holds all the packages it needs. But since we know that node_modules is a really, really big thing, and we don't want to replicate it multiple times for each one of the packages, there is another approach that we can use. And this approach is to hoist the packages. So instead of installing the dependencies in each package's own workspace, we can move everything to the root of our project, of our repository, and just install them once there. And the reason this is going to work is the module search algorithm of Node.
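As a rough sketch of that hoisted setup, this is what the root package.json of such a monorepo might look like with npm or Yarn workspaces (the acme-monorepo name and the packages/* and apps/* globs are hypothetical). Declaring the workspaces lets the package manager install all dependencies once, into the node_modules folder at the root.

```json
{
  "name": "acme-monorepo",
  "private": true,
  "workspaces": [
    "packages/*",
    "apps/*"
  ]
}
```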

3. Linking Packages in a Monorepo

Short description:

It means that packages in a monorepo can be linked internally. This can be done by installing the package locally or globally. Another approach is to publish the packages to an artifact registry and install them from there. Yarn also allows creating a symlink between packages. With the introduction of new tools like TypeScript and Babel, managing monorepos has become more complex.

It means that Node will start at the bottom and try to find the package that we required, like bar here. If it does not find it, it will go one level up. If it still does not find it, it will go one level up again. And then, at the root of the project, it will find it. So that's the first thing.
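For reference, Node exposes the list of node_modules folders it would search, so you can watch this walk happen yourself; a tiny, hedged example ('bar' is just a hypothetical package name, and the printed paths depend on where you run it):

```js
// Print the node_modules folders Node would search, from the current file upward.
// Run this from a file deep inside a workspace and you can see the lookup
// climb all the way to the repository root, where the hoisted install lives.
console.log(require.resolve.paths('bar'));
// e.g. [ '/repo/packages/foo/node_modules',
//        '/repo/packages/node_modules',
//        '/repo/node_modules', ... ]
```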

There's a caveat that usually comes up here. When we are using this hoisting approach, we can forget to declare a dependency in one of the packages. And then, when we deploy it to production, where we don't have the monorepo, where we actually have each package separately, we will suddenly find that the required module cannot be found. This can be solved, and there are plugins that do this. This is all good for external dependencies. But usually, we set up a monorepo because we have artifacts that are connected. It allows us, for example, to share code: we can have one package that multiple other artifacts rely on, like in this example, where we have a service or another package that relies on the shared package. And this leads to our second strategy, which is how we actually link the packages that exist in our monorepo. In our package.json, we don't only have the external dependencies, we also have the local ones. And even if the problem looks very similar to the previous one, install it locally or install it globally, it's actually quite different, because the code of these packages is inside our monorepo. So instead of just pointing to something external, we are actually going to point to a package that is internal.

So some of you would say: look, we solved this for the external packages, let's do the same for internal packages. Let's test them, build them, version them, publish them to some artifact registry that we have, and then install them from that artifact registry. And this is actually a valid approach, and I know there are projects that are implementing it: you cannot install anything locally, you have to publish it to a package registry, even if it is a private one, and then in order to use it you have to install it from there. But there is also another solution that we can implement here, because this code is local, and because in some cases we want to make a change that doesn't span just a single package but spans multiple artifacts. This was solved by Yarn, even back in Yarn version 1. You specify that you need a package from another package, and then it will not really install it into node_modules; instead, it will just create a symlink, a reference from the current node_modules to the other package. So, yeah, it was the age of foolishness, it was the epoch of belief, and with all the new tools that were introduced, it actually became more complex. We can no longer just point at the source code and say, this is what needs to be run, because we wanted things like types, so we had TypeScript. We wanted newer features, so we had Babel, which we talked about. We wanted to bundle our code.
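A minimal sketch of what such an internal dependency can look like in a package's package.json; the workspace:* protocol shown here is the pnpm and Yarn Berry form (Yarn 1 and npm workspaces link a matching local package automatically), and the @acme/* names and versions are hypothetical:

```json
{
  "name": "@acme/service",
  "version": "1.0.0",
  "dependencies": {
    "@acme/shared-utils": "workspace:*",
    "express": "^4.18.2"
  }
}
```

At install time the package manager links node_modules/@acme/shared-utils to the workspace folder (typically via a symlink), so local changes are picked up immediately instead of going through a registry.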

4. Web Application Development and Build Process

Short description:

For web applications, we introduced tools like Webpack for static module analysis. However, pointing directly to the source code poses challenges. Instead, we can point to the build artifact. The development process becomes more complex with the need for a build process, testing, and linting. Tools like Yarn workspaces and npm workspaces allow each package to run its tests independently.

This is mostly for web applications, so we introduced tools like Webpack. And at the same time we wanted to make sure that code could be loaded automatically, so we wanted static module analysis and all kinds of other things.

So now we have a problem. We are pointing to our source code, but our source code cannot just run as it is. We need some sort of translation, a build process, that will actually make it usable. And this is where we introduce another approach: instead of pointing to the raw code, to the source, you actually point to the build artifact. This is another approach that solves the same problem.
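A sketch of that "point to the build artifact" idea: the shared package's package.json exposes the compiled output instead of the TypeScript source (the @acme/shared-utils name, the dist/ layout, and the tsc build script are assumptions for illustration):

```json
{
  "name": "@acme/shared-utils",
  "version": "1.0.0",
  "main": "dist/index.js",
  "types": "dist/index.d.ts",
  "scripts": {
    "build": "tsc -p tsconfig.json"
  }
}
```

Consumers then resolve dist/index.js, so the package must be built before anything that depends on it runs, which is exactly why the build order discussed later matters.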

The third strategy that I want to talk about is the development process. The development process is now more complex: we need a build process, we definitely want to test our code, we want to run some linting on it, and so on. So how do we define that? Here we have two approaches, which are going to repeat through other decisions as well. Should I do that at the package level, at the artifact level? Each package runs its own tests as if it were in a polyrepo, and we just run the script for each package separately. This is actually what tools like Yarn workspaces and npm workspaces are doing. They just assume that each one is independent.
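For reference, this is roughly how that per-package execution looks from the command line, assuming the workspaces are already declared in the root package.json (the package name in the last command is hypothetical):

```sh
# Yarn 1 ("classic"): run the test script of every workspace, one after another
yarn workspaces run test

# npm 7+: the equivalent flag-based form
npm run test --workspaces

# Run a script in a single workspace only
npm run test --workspace=@acme/shared-utils
```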

5. Configurations, Testing, and the Build Process

Short description:

We can have each configuration put separately on each package, but this makes it difficult to make unified changes across the project. Another approach is to centralize the configuration but still run tests on each project. The build process involves using a topological sort to run all the builds in a sensible order. An advanced approach is to only rebuild the packages that have changed, using source control and caching to optimize the process.

We can also have the configuration per package. The problem here is that each configuration sits separately in each one of the packages, and if we need to make some change, or we want something to be unified across our whole project, we actually need to go into each of the packages and fix it.

So another approach is: okay, let's make the configuration centralized, but still run the tests per project. There are tools like Jest, if you want to run tests, that allow you to do that. You define all the projects in one configuration, so you can share a lot of the configuration, but the execution is actually on a per-project or per-package basis. So do we work in a centralized way or in a distributed way? Does everything we do run across all the packages, or is it spread between them?
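A minimal sketch of that centralized-but-per-project setup using Jest's projects option (the glob and the shared settings here are just examples):

```js
// jest.config.js at the repository root
module.exports = {
  // every matching folder becomes its own Jest project,
  // so tests still execute per package
  projects: ['<rootDir>/packages/*'],
  // shared settings live here once instead of being copied into every package
  collectCoverage: true,
  coverageDirectory: '<rootDir>/coverage',
};
```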

The next thing is the build. Okay? So we said that we have to go through a build process, and this is how we work on it. You can just run a script that will do whatever you want, or, as NX suggested (an idea they are actually now retreating from), have a builder that knows how to run your builds. NX actually started from the Angular CLI project; it has since diverged quite a lot, but this is where they came up with the idea that we need executors, and as I said, they are now reverting from it. Here I have the monorepo's logical structure. This is not the physical structure of the files but the logical one, and it is structured in the form of a DAG, a directed acyclic graph, where the acyclic part is sometimes more like wishful thinking. It means you have some roots, depending on how you want to look at it: packages that do not depend on anything, and then on top of them you have packages, and more packages, that depend on them. And in order to build everything, you do what is called a topological sort. That means you start from the packages that don't have any dependencies, then the packages that depend on them, then the packages that depend on those, and so on. Then you get a way to run all the builds in an order that makes sense, so that you're not going to miss anything or pick up old versions.

A naive build here would say: okay, I have my topological sort, I can just go one by one and build everything along the way, whatever is there. Of course, that will take a lot of time, and you might not want to do all of that. A more advanced approach on top of it is to say: look, I am working with source control, let's say Git. I know what was changed, so I can start from the packages that were changed. Everything in the topological sort that was not changed and is not affected by a change, I don't need to rebuild; I already have its build. Another optimization you can add on top of that is a cache. Not only do you go through the packages that were changed, you can also skip rebuilding packages whose inputs did not change. So you will build package five and package four; package one maybe depends only on five, or even on four, but it was not changed, so you can use a cached build and then continue on.
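To make the topological sort concrete, here is a small sketch in plain JavaScript, a depth-first walk over a hypothetical dependency graph; tools like NX and Turborepo layer the affected-only and caching optimizations on top of an order like this by hashing each package's inputs:

```js
// Returns a build order in which every package comes after its dependencies.
// `graph` maps a package name to the internal packages it depends on.
function buildOrder(graph) {
  const order = [];
  const done = new Set();
  const inProgress = new Set();

  function visit(pkg) {
    if (done.has(pkg)) return;
    if (inProgress.has(pkg)) throw new Error(`Dependency cycle at ${pkg}`);
    inProgress.add(pkg);
    for (const dep of graph[pkg] || []) visit(dep); // build dependencies first
    inProgress.delete(pkg);
    done.add(pkg);
    order.push(pkg);
  }

  Object.keys(graph).forEach(visit);
  return order;
}

// Hypothetical example: package-1 depends on package-4 and package-5,
// and package-4 itself depends on package-5.
console.log(buildOrder({
  'package-1': ['package-4', 'package-5'],
  'package-4': ['package-5'],
  'package-5': [],
}));
// -> [ 'package-5', 'package-4', 'package-1' ]
```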

6. Release Strategy and Tool Selection

Short description:

NX optimizes the build process. The release strategy is critical. Two approaches: unified (all packages under one release) and distributed (independent versions). Start with the release goal, derive build and develop strategies, then consider linking and installation. Select tools that fit your strategy.

And this is similar to what NX is actually doing as part of its optimization. So this is what we need to think about when we think about our build process.

The last strategy, strategy number five, is about the release. How do I publish my code to the world? And how do I version it? This is a really critical decision, probably the most critical one. If I have all of these packages, how do I manage their versioning? There are two main approaches. One is unified: even if there are packages that were not changed, when I publish a new release, or a new application, or new microservices, I publish everything under the same release. As an example, Angular is doing exactly that. Whenever they release, they release all the packages. Storybook is also doing something similar. All the packages in the monorepo have just one version, regardless of what was changed. The other approach is distributed, where everything is independent. This package will be version 1.1, that one 3.2, and I will only release the packages I actually made changes to. And, of course, then I will need to make sure that the other packages or applications that rely on them are backward compatible and can still work with the new version. And if I'm upgrading a dependency, that is itself a change to my package. It was the season of light, it was the season of darkness.
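As one concrete illustration of this choice (Lerna is just one of the tools that expresses it), lerna.json supports both modes:

```jsonc
// Unified ("fixed") mode: every package in the monorepo shares one version
{ "version": "1.4.0", "packages": ["packages/*"] }

// Distributed ("independent") mode: each package is versioned and released on its own
{ "version": "independent", "packages": ["packages/*"] }
```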

So, if we try to summarize everything: the first thing you need to decide is the scope of your monorepo, what you need to put inside, and then you should go through the five strategies: install, link, develop, build, and release. But, in fact, I would suggest that this is not the order to think about them. Actually, you should think about them in the reverse order. Start from the release. What is your goal? How do you want to release? That will drive your build strategy, and probably your develop strategy, because if you are releasing separately, you don't need to test everything, and that will drive your link and installation strategies. And only after you have done this, and you know what your strategy is and what you want to apply in your monorepo, should you go and select the tools, Lerna, NX, pnpm workspaces, TurboRepo and so on, the ones that really fit your strategy. Yeah, so there is some hope and some light at the end of the winter. Thank you very much for listening, and this is where you can find me.
