The Age of Monorepos


The history of the web can be divided into evolutionary development leaps. The age of inline scripts, the age of jQuery, the age of SPAs, the age of JAMStack...

We are now entering the next stage that has been carefully prepared in the past few years. Let me invite you to the world of modern monorepo solutions and share with you the benefits you will reap by using them in every project size and setup. It's time you automate those boilerplate tasks and reduce the bottlenecks so you can focus on what truly matters.

Get ready for the next leap! Welcome to the age of monorepos!

25 min
16 Jun, 2022

AI Generated Video Summary

Today's Talk is about the world of monorepos, their history, benefits, and features. Monorepos address challenges in web development, such as slow build processes and unstable connections on mobile devices. Collocation in monorepos enables easy sharing of functions and components among projects. Speed and efficiency in monorepos are achieved through collocation, dependency graphs, and task orchestration. Monorepo tools like Lerna offer features such as caching and distributed task execution. Monorepos provide code sharing, consistent tooling, and automated migration, resulting in a 10x developer experience.

1. Introduction to Monorepos

Short description:

Thank you all for joining my talk. Today I'm going to be talking about the amazing world of monorepos. But before we dive into that, I have an important disclaimer. In these slides, you will see some examples of extremely bad web design. You will see some flickering colours that might cause epileptic seizures. And finally, you will see some life-changing features.

Thank you all for joining my talk. Unfortunately, the MC decided to resign at the last moment, so I have to announce myself, but it's fine. Today I'm going to be talking about the amazing world of monorepos. But before we dive into that, I have an important disclaimer. In these slides, you will see some examples of extremely bad web design. You will see some flickering colours that might cause epileptic seizures. And finally, you will see some life-changing features. So, if you have a medical history with any of these symptoms, perhaps it's better to change the track. Otherwise, I assume you take full responsibility for being here.

2. Introduction to Web Development History

Short description:

And with that formal note, let me introduce myself. My name is Miroslav Jonaš. I work for Nrwl on the tool called Nx. Before we dive into what monorepos are, let's take a trip back through history to understand how we got here. In the beginning, the web was static. Pages were boring, but then scripting languages like JavaScript brought dynamicity. As websites became more complicated, single-page applications emerged. However, the rise of smartphones brought new challenges with unstable connections.

And with that formal note, let me introduce myself. My name is Miroslav Jonaš. I work for Nrwl on the tool called Nx, which you're going to hear a lot about today. I also co-organise two meet-ups in Vienna, ViennaJS and AngularVienna.

Now, before we dive into what monorepos are, in order to understand how we came to the point where monorepos are needed, we need to take a trip back through history, all the way to the beginning of the web, to retrace our steps and see how we got here. Fasten your seatbelts, it's history time.

In the beginning, as you all know, the web was static. It was merely a collection of HTML pages linked with hyperlinks. The first web pages looked something like this Yahoo page. They had lots of text, lots of links, and very small images; it was dial-up time, so things had to be small and fast. Usually they had a contrasting choice of colours, but the pages were boring. They were too static, so people came up with an animated image format that would liven things up a bit. Who remembers this dancing baby? Some pages took this to a whole new level, where the entire page was spinning in animations. But you see, it still wasn't what we needed, because this was running in a loop; it wasn't controlled animation, it wasn't controlled movement. So Brendan Eich from Netscape, the company producing the most popular browser at the time, was given the task of coming up, in just two weeks, with a language that would pick up some ideas from Java and finally bring dynamicity to the browser. And two weeks later, LiveScript was born, which was later renamed to JavaScript, much to the delight of generations of recruiters and headhunters ever since.

And so the age of scripting began, and with it we finally had pages with fancy image galleries, crazy menu effects, and buttons that would run away from our cursor. The pages could still look very ugly, but now they finally had controlled movement. And as the number of scripts on a page grew, we started to recognize certain patterns, certain things that kept repeating. This was also the time of the famous browser wars between Microsoft and Netscape, and there were lots of inconsistencies between the standards in these two browsers, so developers usually had to implement things for both. Luckily, we now had helper libraries, most notably jQuery, which would overcome these differences and create a wrapper around DOM manipulation. This allowed you to quickly create your websites. And as websites became more and more complicated, we started calling them web applications, not websites.

But encapsulating DOM manipulation and animations wasn't the only boilerplate. There was still a lot to be encapsulated, like routing, event management, and state management, and this is what led to single-page applications. The first widely popular framework that implemented the single-page application was AngularJS, and soon React and Vue followed. All of these are still used today in some variation, and together they changed our thinking about web development. They set up web development as we know it today. Unfortunately for them, this was also the time when our phones became smart, and suddenly we no longer browsed the internet on our desktop computers; we started browsing it on our mobile phones while sitting on park benches, riding public transport, or sitting on a toilet seat. In these places, the connection wasn't really stable. We could hope for 3G at best, with lots of interruptions.

3. Monorepos and Collocation

Short description:

Single-page applications were too heavy for mobile phones, leading to the birth of Jamstack. However, this came at the expense of developer experience, with increasingly slow build processes. Monorepos address these issues by letting every part of the system immediately see changes in the others. Collocation in a monorepo enables easy sharing of functions and components among projects.

And suddenly we realized that single-page applications were just too heavy for mobile phones, and we needed to address this. People were used to having fast websites, so when you suddenly had to wait minutes for something to load, it just didn't work. This led to the birth of Jamstack: we needed something fast and light, in contrast to the large single-page applications that took minutes, sometimes dozens of minutes, and megabytes to load.

We needed to address the elephant being squeezed through a straw. It was the birth of the first meta-frameworks like Next, Nuxt, or Gatsby, and later of today's popular Remix, Qwik, and Astro, which brought something new to the table: we finally had pages that were fast to load, where you would immediately see the results. Unfortunately, all of this came at the expense of developer experience. In order to have fast websites for users, we had to do the heavy lifting on our own machines, so our build processes became super slow. Not only did this frustrate developers, but when we consider that most CI and CD is now pushed to the cloud, slow build processes also meant that our monthly cloud bills were getting higher and higher.

It was time to address the second part of the slowness: websites were now fast, but our development had become slow, so we needed to speed up our developer experience. This led to monorepos. In order to understand what monorepos actually do, let's first consider a typical web application today. Our application usually consists of a front-end application built with your chosen framework, a back end, which is not a monolith but a micro-services architecture, and some UI components. Now, what happens occasionally is that one of the developers working on the back end slightly changes a method in one of the services, thereby changing a contract, and simply forgets to inform you, the maintainer of the front end. Suddenly your website is broken, everyone is pointing fingers at you, and you have no idea what happened, because you didn't touch that code.

There are some clunky attempts to solve this by constantly exporting contracts from the back end, converting them to TypeScript, and importing them into your front-end applications so that you always have up-to-date information. But this requires manual intervention, and when it comes to the human factor, given enough time, it will eventually fail. On top of that, there is a sync issue: if something changes in the back end, you have to roll it out at the same time on the front end, and there is a lot of coordination and juggling involved. Furthermore, your application might not be the only one. There might be two additional applications, a dedicated mobile app and a dedicated admin portal, maybe built with completely different technologies, plus shared code used across the different applications.

Now, what you want is that whenever one of these changes, every part of the system affected by it is immediately notified. In a typical poly-repo approach, what happens is that if you change, for example, the utility library, you have to publish a new version, then update the dependencies in the home page application and the admin portal, and then test whether it works. If it doesn't, you have to go back again, make a fix in the utility, and again publish a new version. Of course, you can mitigate this with, for example, symlinking or a local package registry, but it requires a lot of coordination, a lot of things you need to maintain, and on top of that, it doesn't scale well.

Now, wouldn't it be nice if they could simply talk to each other? So whenever something changes, the entire ecosystem already knows what happened. And this is what happens with collocation. Collocation means that all of these projects sit in the same repo. Whenever one of them changes, all the projects can immediately see it, because they are sitting together; there's no need to publish a new version, no need for any rollout or other rituals. Having things collocated also helps us identify certain functions that can be reused. So if you have some sophisticated admin function in your code, you can easily tell your colleagues: hey, I have this amazing function, maybe you want to use it in your project as well? And they can see it and say: yeah, amazing, we need this for the mobile app as well. And you can easily extract it into a library and share it with everyone else in the repo. When people think about monorepos, this is usually what they think about: that it's just about collocation. But it's not just about collocation.

4. Achieving Speed and Efficiency in Monorepos

Short description:

Collocation is the first step towards achieving speed in monorepos. By having everything collocated, we can easily identify connections and create dependency graphs. This allows us to optimize our build and deployment processes, reducing the time and effort required. Additionally, we can maintain up-to-date dependency and architectural graphs, eliminating the need for stale diagrams. With this knowledge, we can orchestrate tasks more efficiently, running them in parallel and optimizing the order of execution. Recent advancements in monorepo tools, such as Lerna delegating to Nx, have made them even more powerful and competitive. Check out the new website for more information.

Collocation is just a precondition for everything else. It's the speed that's the main selling point of monorepos. So how do we achieve this speed? First of all, when we have things collocated in one place, we can easily spot how things are connected. And by knowing how things are connected, we can come up with this nice graph of dependencies. For example, if we changed our core library here, we can see that the only two projects that might be affected are store and admin.

Furthermore, if the command we're trying to run is deploy, we know that core is just a building-block library; it doesn't get deployed, so there is nothing to be done there. But our store and admin do get deployed. So by knowing which target we are running and how things are connected, we can reduce this entire graph to just two nodes. Now imagine if this graph had hundreds of projects: how much would you save? Instead of running build, for example, for all of these projects, you would simply run build for core, store, and admin, and then deploy just store and admin.
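The graph reduction described above boils down to a reachability walk: starting from the changed projects, follow reverse dependency edges to collect everything affected. Here is a minimal sketch; the project names mirror the talk's example, and real tools derive the graph from source imports rather than a hand-written object:

```typescript
// "Affected" detection: given a dependency graph, find every project
// that directly or transitively depends on a changed project.
type Graph = Record<string, string[]>; // project -> projects it depends on

export function affected(graph: Graph, changed: string[]): Set<string> {
  // Invert the graph: project -> projects that depend on it
  const dependents: Record<string, string[]> = {};
  for (const [project, deps] of Object.entries(graph)) {
    for (const dep of deps) (dependents[dep] ??= []).push(project);
  }
  // Walk outward from the changed projects
  const result = new Set(changed);
  const queue = [...changed];
  while (queue.length) {
    const current = queue.pop()!;
    for (const dependent of dependents[current] ?? []) {
      if (!result.has(dependent)) {
        result.add(dependent);
        queue.push(dependent);
      }
    }
  }
  return result;
}

// The talk's example: store and admin both depend on core, docs is unrelated.
const graph: Graph = { core: [], store: ['core'], admin: ['core'], docs: [] };
console.log([...affected(graph, ['core'])].sort()); // → [ 'admin', 'core', 'store' ]
```

Running a target like build then means running it only for this reduced set instead of the whole workspace.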

Again, if we know how things are connected, we can always have an up-to-date dependency graph, an up-to-date architectural graph. No longer do you have to maintain stale architectural diagrams on your network drives; you can just run a command and immediately get an up-to-date diagram. And not only that: you can filter it by nodes, you can click on the edges and see which files create the connection between two projects, you can see whether you have circular dependencies. You can spot all these problems.

If we know how things are connected, we can also create a nice task orchestration, because we know in which order things need to be processed. If we need, for example, to run test, build, and lint, a naive way would be to simply run them sequentially: first build, then lint, and then test. Let's assume that test includes end-to-end tests, so it has to come after the build. A better way would be to run build and lint in parallel, because they don't depend on each other, and then run test afterwards. An even better way would be to start running tests as soon as the parts that need to be tested are built.

If you used Lerna up until a month ago, this was the limit of its capabilities. It was fine for most things, but it also left a lot to be desired. And in this state of limbo, a lot of new monorepo tools came out that offered more functionality on top of Lerna. This changed recently, when Nrwl took over stewardship of Lerna: now, with a single flag, useNx, you get most of the functionality that Nx, one of the popular monorepo tools, provides. So Lerna is now competitive with all the other tools, and it became way faster than it was before. And there's also a new website, so if you haven't checked it out, go ahead. Not now, after the talk.
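The build/lint/test ordering above can be sketched as a tiny scheduler: given each task's dependencies, compute "waves" of tasks that can safely run in parallel. This is a simplified illustration of the idea, not how any particular tool implements it; real orchestrators start each task as soon as its own dependencies finish, rather than in lock-step waves:

```typescript
// Task orchestration: group tasks into parallelizable waves.
type TaskGraph = Record<string, string[]>; // task -> tasks it depends on

export function schedule(tasks: TaskGraph): string[][] {
  const waves: string[][] = [];
  const done = new Set<string>();
  const remaining = new Set(Object.keys(tasks));
  while (remaining.size) {
    // Everything whose dependencies are already done can run now, in parallel
    const wave = [...remaining].filter((t) => tasks[t].every((d) => done.has(d)));
    if (!wave.length) throw new Error('circular dependency detected');
    waves.push(wave.sort());
    for (const t of wave) { done.add(t); remaining.delete(t); }
  }
  return waves;
}

// The talk's example: lint is independent, test needs the build output.
console.log(schedule({ build: [], lint: [], test: ['build'] }));
// → [ [ 'build', 'lint' ], [ 'test' ] ]
```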

5. Monorepo Tools and Automation

Short description:

As I mentioned, it's now powered by Nx. Companies like Google and Facebook have had their monorepo solutions for a long time, but setting them up required a lot of manual configuration. Luckily, companies like Microsoft, Nrwl, and Vercel offer automated solutions that simplify the process.

As I mentioned, it's now powered by Nx. This is opt-in, so you can switch it on or off as you like; it's up to you. So what are these other monorepo tools? Companies like Google and Facebook have had their monorepo solutions for a long time. Their repositories consist of thousands of projects that are constantly running, and for that type of ecosystem you need really sophisticated build tools to run everything in real time. It took a very long time for that to happen, and up until recently, it wasn't open to the public. Once it became open to the public, we realized how impressive these tools are, but there was a slight catch: you might need a PhD to be able to set them up, because almost none of it was automated. It required lots and lots of configuration, and not just once, but over and over, every time you created a new project. So in order to maintain this in your repository, you might need a dedicated person doing just that. This, of course, wasn't feasible for any smaller company. But luckily, companies like Microsoft with Rush and Lage, Nrwl with Nx, and now Vercel with Turborepo offer solutions that give you more or less the same feature set, but automated, with all the boilerplate taken away, so you don't have to do it manually.

6. Lerna Features: Caching and Remote Cache

Short description:

The most important feature missing from Lerna was caching. Caching allows you to avoid running code that you've already run. It stores build artifacts and logs in a cache, so when you run a build again, it can replay the results from the cache. With a remote cache, you can store your cache not just locally but also in the cloud, saving time on CI and reducing cloud costs.

So what are these features that were missing from Lerna? The most important one is caching. Caching is the premise that you shouldn't have to run code that you have already run. So if you're running, for example, build on the current state of the repository and you need to run it again, you should just have the results replayed. Why struggle again? And this is what caching does.

So, for example, if I run a build on my system, it will be stored in a cache. And not just my artifacts, but also my logs. The next time I run it, it will not only copy those artifacts back as if the build were running now, it will also replay all the logs, so you wouldn't even notice it was served from a cache, apart from it being there immediately.

But what happens if you have a colleague, for example, in Australia, running the same command on the same state of the repository? They would have to run it again, because they can't see your local cache. Luckily, with a remote cache, you can store your cache not only locally but also in the cloud. The next time your colleague tries to run the build, it will first check: is the result available in my local cache? If yes, copy it from the local cache; if not, check the cloud. Is it there? If yes, copy it from the cloud; otherwise run it from scratch, save it in the local cache, and save it in the cloud. This not only saves developer time, it also saves a tremendous amount of time on CI, especially because while you're working on your PR, on your feature, you're already building these things on your machine, and your cloud already knows about it. So when the CI runs, it just picks things up from the cache. And this not only saves time while you wait for the PR, it also saves a huge amount of money on your monthly cloud bills.
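The local-then-remote lookup just described can be sketched as follows. The Cache interface, the runTask callback, and the way inputs are hashed are illustrative assumptions, not any real tool's API:

```typescript
// Computation caching: hash the inputs, check local then remote cache,
// and only run the task when both miss, storing the result in both.
import { createHash } from 'node:crypto';

interface Cache { get(key: string): string | undefined; set(key: string, value: string): void; }

export function runWithCache(
  inputs: string, // e.g. source file contents + command + dependency versions
  local: Cache,
  remote: Cache,
  runTask: () => string, // produces the build artifact
): { result: string; source: 'local' | 'remote' | 'fresh' } {
  const key = createHash('sha256').update(inputs).digest('hex');
  const fromLocal = local.get(key);
  if (fromLocal !== undefined) return { result: fromLocal, source: 'local' };
  const fromRemote = remote.get(key);
  if (fromRemote !== undefined) {
    local.set(key, fromRemote); // hydrate the local cache for next time
    return { result: fromRemote, source: 'remote' };
  }
  const result = runTask(); // cache miss: actually do the work
  local.set(key, result);
  remote.set(key, result);
  return { result, source: 'fresh' };
}
```

In a test, a plain Map can stand in for both caches; the key point is that changing any input changes the hash, which is what makes the cache safe to replay.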

7. Distributed Task Execution

Short description:

The next very cool feature is distributed task execution. It allows running tasks in parallel or on multiple agents, which is crucial for handling major changes or refactoring in a monorepo. Distributed task execution is the hardest problem of monorepos, and currently, only Bazel and Nx are implementing it.

The next very cool feature is distributed task execution. No matter how good your caching implementation is, or how performant your affected graph is, there are times when none of this matters. For example, if you are updating the version of your framework, suddenly everything is affected and your cache is worthless, because you have a completely new version of the framework; none of the cache you have applies any more. It could be broken, so you have to run everything from scratch. And although this might not seem important, we've seen solutions with hundreds of projects where just running the build takes an hour and a half. Their nightly builds run for a day and a half, so it's not even nightly any more, it's bi-nightly. Imagine if they had to run this every time something major changed in their framework, or every time they did some major refactoring. That would be a disaster. Luckily, distributed task execution gives us the ability to run things in parallel on multiple agents. For example, if we have a hundred tasks to finish, we can split them across, let's say, five agents, with each taking twenty tasks. But it's not that simple, because tasks don't all have the same size. They might be intertwined, there might be dependencies. And on top of that, if something fails, I don't want to go through all of the agents to see exactly where it failed; I want a unified report. This is why distributed task execution is the hardest problem of monorepos, and so far, only two monorepo solutions, Bazel and Nx, implement it.
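One slice of the problem, spreading tasks across agents, can be sketched with a greedy heuristic. This deliberately ignores the hard parts mentioned above (inter-task dependencies, unified reporting), and the Task shape and duration estimates are made-up illustrations:

```typescript
// Distribute tasks across agents, balancing estimated durations:
// longest task first, always onto the currently least-loaded agent
// (the classic "longest processing time first" heuristic).
interface Task { name: string; estimatedMs: number; }

export function distribute(tasks: Task[], agentCount: number): Task[][] {
  const agents: Task[][] = Array.from({ length: agentCount }, () => []);
  const load = new Array(agentCount).fill(0);
  for (const task of [...tasks].sort((a, b) => b.estimatedMs - a.estimatedMs)) {
    const i = load.indexOf(Math.min(...load)); // least-loaded agent
    agents[i].push(task);
    load[i] += task.estimatedMs;
  }
  return agents;
}
```

Real implementations go much further, using historical timing data for the estimates and releasing a task to an agent only once its dependencies have finished.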

8. Monorepo Features and Benefits

Short description:

Code constraints, powerful generators, automated migration, and consistent tooling are key features of monorepos. They provide clarity, speed, and efficiency in project development. Monorepos enable developers to restrict access to projects, configure CLI generators, automate migrations, and maintain consistent tooling. They also offer workspace analysis, graph visualization, local and remote caching, task orchestration, and distributed task execution.

The next feature, one that is quite dear to me, is code constraints. This is your only weapon against unmanageable architectures. Say you are working on some experimental feature, and before you're fully finished, while you're still experimenting, someone suddenly starts using your feature in their project. Now not only do you have a dependent, for whom anything you change might be a breaking change, but if you decide that your entire experiment was a failure and you would rather remove the feature, well, you can't any more, because someone is depending on it.

Code constraints are the ability to restrict access to your project, to specify which types of projects can access it. It could be as simple as: if you have, for example, Angular and React in your repository, you can say that Angular libraries should not load React libraries. But what if you have just a single project? Should you still use monorepos? The answer is simple. Absolutely, yes.
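The Angular/React example can be sketched as a tag-based check, loosely modeled on the sourceTag/onlyDependOnLibsWithTags options of Nx's enforce-module-boundaries lint rule. The tag names and the Constraint shape here are illustrative, not Nx's actual configuration format:

```typescript
// Code constraints: projects carry tags, and rules state which tags
// a source project is allowed to depend on.
interface Constraint { sourceTag: string; onlyDependOnLibsWithTags: string[]; }

export function isDependencyAllowed(
  sourceTags: string[],
  targetTags: string[],
  constraints: Constraint[],
): boolean {
  for (const rule of constraints) {
    if (!sourceTags.includes(rule.sourceTag)) continue;
    // The target must carry at least one tag the rule allows
    if (!targetTags.some((t) => rule.onlyDependOnLibsWithTags.includes(t))) {
      return false;
    }
  }
  return true; // no matching rule restricts this dependency
}

// Hypothetical rule: Angular libraries may only use Angular or framework-agnostic code.
const constraints: Constraint[] = [
  { sourceTag: 'framework:angular', onlyDependOnLibsWithTags: ['framework:angular', 'framework:agnostic'] },
];
console.log(isDependencyAllowed(['framework:angular'], ['framework:react'], constraints)); // → false
console.log(isDependencyAllowed(['framework:angular'], ['framework:agnostic'], constraints)); // → true
```

In a real workspace, a lint rule runs this kind of check on every import, so a forbidden dependency fails CI before it can take root.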

Monorepo tools also provide powerful generators. Normally, when you start a project, you start with a CLI that generates and scaffolds the project for you. Most of the time that's fine, but often it's not; that's why there are so many boilerplate projects that take you one additional step forward. Now, imagine if you could configure your CLI to generate things the way you like: with unit tests, with end-to-end tests, with your state management set up the way you like, with all the utilities and applications just the way you like them. With Nx, for example, you have these powerful generators where you can really configure everything, and not just at init time: every time you create a new component, a new function, or a new project, you can specify what it should look like.

Generators also give you the ability to migrate things nicely. Imagine there's a new version of your beloved framework, and of course there are some breaking changes. Normally, you would have to go into your codebase, spot all those problems, and fix them manually. With generators, with automated migrations, you can just run the migration and it will fix all of this for you.

And finally, we have consistent tooling. I don't know about you, but for me it was always annoying when I switched between different frameworks and then had to remember: was it npm start, or npm serve, or npm develop? And it's not just serving applications; all of these commands have different parameters and different ways of running. Consistent tooling creates a wrapper around these commands, so you can always run them in the same way, regardless of which framework is under the hood, whether it's Next or Nuxt, or Cypress or ESLint. And it's not just the commands; the parameters are always the same too, so you only need to learn one set of commands. And even that you don't have to, because, something I forgot to mention, we also provide an extension for VS Code that gives you a nice graphical representation of all the generators, so you can easily find the generator you want, and it will show you all the parameters you need to pass. You don't have to remember anything.
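The wrapper idea behind consistent tooling can be sketched as a simple lookup from one uniform set of targets to framework-specific commands. The frameworks and command strings in the map are hypothetical examples, not the output of any real tool:

```typescript
// Consistent tooling: the developer always types the same target
// (serve, build, test); the wrapper knows the underlying command.
type Target = 'serve' | 'build' | 'test';

const commandMap: Record<string, Record<Target, string>> = {
  'create-react-app': { serve: 'npm start', build: 'npm run build', test: 'npm test' },
  'next': { serve: 'next dev', build: 'next build', test: 'jest' },
};

export function resolveCommand(framework: string, target: Target): string {
  const entry = commandMap[framework];
  if (!entry) throw new Error(`unknown framework: ${framework}`);
  return entry[target];
}

// Same target, different frameworks, no memorizing per-tool commands.
console.log(resolveCommand('next', 'serve')); // → next dev
```

Real wrappers also normalize flags and parameters, which is where most of the day-to-day friction disappears.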

So let's recap. Monorepos bring clarity by giving us workspace analysis and graph visualization. They give us speed by leveraging local and remote caching, task orchestration, affected-node detection, and distributed task execution.

9. Benefits of Monorepos

Short description:

Monorepos make development easy with code sharing, collocation, generators, consistent tooling, and code constraints. Visit the comparison website for details. Use Nx for a 10x developer experience.

And finally, they make our development life easy, because they give us code sharing, code collocation, powerful generators, consistent tooling, and code constraints. If you would like to know how the different monorepo solutions compare to each other, you can head over to the comparison website, which gives you an in-depth comparison with all the features listed. It's an open-source website, maintained by the creators of the popular monorepo solutions. Or, if you don't want to spend that much time, you can just take my word for it and use Nx. And if you are wondering whether this is the thing that will finally make you a 10x developer: why settle for 10x when you can be an Nx developer? Thank you, and enjoy the conference.
