Monorepos have been around for some time but only recently gained popularity in the JavaScript community. The promise of easily sharing code, better enforcing organizational standards, greater developer mobility thanks to common tooling, and more is very appealing. Still, if approached naively, a monorepo will quickly turn into a huge mess: skyrocketing CI times, spaghetti dependencies among projects, a codebase that is hard to navigate, and ultimately frustration. In this talk, we will look at the available tooling and how to kickstart a new React monorepo in particular, and we will learn the key ingredients required to build a successful, long-running monorepo that scales.
Fast React Monorepos with High-Quality DX
From:

React Summit 2022
Transcription
Hey, and welcome to my talk about fast React monorepos with high-quality DX. But before we go ahead, let me first introduce myself. My name is Juri Strumpflohner. I'm the Director of Developer Experience here at Nrwl. I'm also a Google Developer Expert in web technologies, an egghead instructor, and a Cypress ambassador. Nrwl is the company behind Nx, and we provide consulting services for Fortune 500 companies, but not only those. Our range goes from helping with Angular and React development to, in particular, helping with monorepos. We help teams migrate to monorepo scenarios, set up new monorepos, and especially help those monorepos succeed in the long run. We're also the creators of the open-source toolkit Nx, and in case you missed it, most recently we also took over stewardship of Lerna, which is now joining forces with Nx.

So what is Nx? Nx is a smart, fast, and extensible build system. It can be used for monorepos, and today I'm not going specifically into Nx directly, but I'm using it as an example of a fully feature-complete monorepo tool that can help you not just get started fast, but also succeed in the long run. Nx is hugely popular: we saw it cross 1 million downloads per week in December, and it will probably cross 2 million per week in June. This is super exciting, and it shows how much traction the monorepo space has recently gotten.

Now, what are monorepos? Rich Harris posted a tweet a couple of weeks ago that I agree with 100%, because I get the same question from people and have to answer and explain it over and over again. He was basically not happy with the term monorepo. The problem is that monorepo implies, or at least what many people think, is that you need one single repository for the entire organization. It's perfectly clear why people think that, but in reality, what we see when we at Nrwl work with large companies is more something like this: you have a couple of large monorepos inside your company, maybe split by department, organization, or domain, and then you also have existing polyrepos, or even new ones, that come up within that organization. With such a mixed landscape, you share things over internal registries. Even monorepos share some parts with the outside: if some polyrepos want to benefit from, say, the component library you built within a given monorepo, you might want to publish it to a registry as well, apart from just using it within that monorepo. And that's perfectly fine.

Given that, a term like multi-project repo versus single-project repo would be more useful or more meaningful, but I'm not going to coin a new term here. So whenever I'm speaking about monorepos, what I mean is a single Git repository with two or more distinct projects; a polyrepo, on the other side, is the classic scenario a lot of teams have, a single repository with one project in it.

So what do monorepos look like? They come in different forms. A lot of what you see, especially in the open-source world, is just one repo with code collocation. You have a couple of projects in there that don't necessarily relate to each other or have relationships between them. Most of the time, it's just for reusing CI setups or common utilities, for being quicker in publishing to npm, things like that.
But the real value you get out of a monorepo is when those packages relate to each other to some degree, so they share code and facilitate collaboration within the monorepo. In fact, a monorepo is all about architecting your system. Many times what you have is a scenario like this: you have different domain areas within your organization, which very often exist as separate polyrepos. If you converge them into a monorepo, the scenario might look like the following. You have these domains, and as you evolve your monorepo, you split each one up into smaller subprojects. So you have an application per domain, which you can deploy independently, and each domain has some specific libraries that are not shared with other domains or other parts of the monorepo, but are specific to that single domain. The main reason for having those libraries is to split up the code base, keep teams organized, and let them work on single projects within a single domain. You also get much more fine-grained APIs that way, which helps you create more maintainable code in the long run.

Now, obviously you get more out of it once you also start reusing things. At the very bottom of this image, for instance, you see those libraries that are used a lot by the upper-level domains. Those are usually more like code utilities: authentication libraries, logging libraries, maybe date calculations, or even some parts more specific to your organization. For instance, a UI component library could live down there, used by the various domains to get a uniform UI appearance and look and feel.

Already from that picture, you can see that a common misconception is proven false. A lot of people think monorepo equals monolith, which is quite the contrary: we have just seen that we can have different, independently deployable applications per domain within that monorepo. So you rather split things up than create a monolith.

Down here, as you can see, we have the more utility-like libraries I just explained, but those are not the only ones. You can also have connections between domains. In those cases, it is very useful to create dedicated sharing libraries, which expose common data objects and common API interfaces that other domains can use, such that domains can communicate with each other. For instance, the checkout domain might need utilities, be it UI utilities or communication utilities, from the product domain of your organization. And finally, as I mentioned before, you might have those very leaf nodes, some very general-purpose libraries such as UI components, which you might also want to publish outside the monorepo, such that polyrepos can consume them.

Now, as you can see from my title, the fast part is very important in a monorepo, because if you approach it naively, it might be easy to get started, but a year in or so, things can get very slow very quickly. If your PR takes over an hour to build on CI, that hinders new feature development and slows down your teams. Rather than fostering more collaboration, the monorepo starts to be detrimental.
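Before moving on: to make that domain structure concrete, here is a minimal sketch of how such a layout could be scaffolded in an Nx workspace. The domain and library names (checkout, products, shared) are hypothetical, and the generator invocations reflect the Nx 14-era plugin names:

```sh
# One independently deployable application per domain
npx nx g @nrwl/react:app checkout
npx nx g @nrwl/react:app products

# Domain-specific libraries that stay private to their domain
npx nx g @nrwl/react:lib feature-cart --directory=checkout

# General-purpose leaf libraries shared across domains
npx nx g @nrwl/react:lib ui-components --directory=shared
npx nx g @nrwl/workspace:lib util-auth --directory=shared
```

Each generated project lands in its own folder with build, test, and lint targets preconfigured, which is what later lets the tooling reason about projects individually.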
So you definitely need tooling that allows you to build just what changed. Most monorepo tools have something called a project graph. What can be done with a project graph is, for instance, if a library like the one in the middle of this picture changes, the tool can walk up the project graph to understand what needs to be built, tested, or linted. In this case, you can see we can already cut out quite some libraries and applications that don't need to be touched, so it will obviously be much faster than also running tests for those. You can even leverage this to understand what needs to be redeployed, such that you always have the latest version in production. In this case, all of the applications in our domains are affected, because a very central library got changed, and so we would probably want to redeploy all of those applications as well.

Now, affected is just one part; caching makes this even faster. For instance, take the same scenario where we change that library, everything affected gets run in CI, and one of the libs in the upper-left corner fails its tests. The developer looks at it, pulls it down, fixes the test, and pushes it up again. Now we don't have to run everything again, because all of the other parts are not affected by the change that was just made. They're still affected compared to the main branch, but we only need to rerun the tests for the lib and the app in the upper-left corner; all the other results can be fetched from the cache. So obviously you get much more speed.

Now, distribution is really the key here, because the cache, in Nx specifically, is by default just local. It's local to every developer's workstation, which means local development gets quicker, but you cannot really leverage it in CI. And CI is where we see a lot of folks actually use the caching, because you want to make sure PRs are fast to get merged. To have a distributed cache, you need some central server-side cloud counterpart, which in the Nx case is Nx Cloud. It allows you to distribute that cache among team members as well as CI.

Nx Cloud goes even a step further: it doesn't only distribute the actual cache, it also distributes the task execution, which makes things even faster. Normally, what you do on CI is take the part of the project graph that was affected by the change, and then you have a set of agents, because you want to parallelize as much work as possible. The most naive approach is to split the tasks into equal batches and distribute them to the agents. What might happen, though, is that some agents are super fast, because they got tasks that complete in a couple of seconds, while other agents run for minutes. The entire CI run still has to wait until all the agents finish. The utilization of the agents is therefore suboptimal, and so is the total time in the end; it's not as fast as you would expect. Nx Cloud, by contrast, distributes not only the cache but also the tasks, uniformly, based on historical data.
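Stepping back to the affected and caching mechanics for a moment, here is a rough sketch of what that looks like in practice with Nx (command names as of Nx 14; adjust to your workspace):

```sh
# Run tests only for projects affected by changes since main
npx nx affected --target=test --base=main

# Show the affected portion of the project graph in the browser
npx nx affected:graph --base=main
```

Local caching is driven by which targets are declared cacheable, for example in nx.json:

```json
{
  "tasksRunnerOptions": {
    "default": {
      "runner": "nx/tasks-runners/default",
      "options": {
        "cacheableOperations": ["build", "test", "lint", "e2e"]
      }
    }
  }
}
```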
Nx Cloud knows, based on the graph, which tasks can run in parallel, where it may assign just a single task to an agent because it's a long-running one, and which tasks need to be executed serially. And the DX part that comes in here is that all the logs and artifacts are collected, sent back to the main node, and grouped together. From a developer perspective, if you get an error or want to look at the actual run, you can just go to the logs and see them the same way as if everything had run on a single machine, even though all those tasks were distributed across agents. Here you can see a screenshot of Nx Cloud showing the utilization of the agents, and you can see how they are balanced out based on the previous data Nx Cloud has collected, which is how it knows how to best parallelize those tasks. You even get a nice visualization: whenever your PR runs, you can see in real time how many agents are currently running and which tasks are running on which agent, which is particularly important for debugging purposes.

Now, this was the fast part, already a bit interleaved with the DX part, as we've just seen with the visualization, which helps you debug things from a developer experience perspective. But DX is very important in a monorepo scenario in general, because there's nothing worse than a fast, feature-rich monorepo setup that is super hard to use or configure from a developer perspective. In Nx specifically, the developer experience is a key focus.

First of all, things need to be incrementally adoptable. Nx can be set up in two main ways. You can get started with a core setup, which means you don't use any of the plugins that come with Nx; you just install Nx and use it on your existing monorepo. For migration scenarios this is ideal, because you can keep the same infrastructure you had before, and Nx will just make running the tasks much faster. What you leverage is the fast task scheduler, the caching, and also the distributed task execution if you use Nx Cloud, which you have just seen. The other approach, if you start fresh, is to benefit from setting up Nx with some pre-configured things. You can say: I know I'm building a React monorepo, because React is my main focus. Then you can use one of the pre-configured templates Nx comes with, and Nx will make sure you get Jest, ESLint, Prettier, and Cypress configured for you, so you don't have to worry about a lot of that stuff. Usually, that is the best option if you start a new monorepo right now.

The core setup is super easy to adopt. You basically just run the `npx add-nx-to-monorepo` command, which adds Nx to any npm, Yarn, or pnpm workspace and just makes it a lot faster. Interestingly, as I mentioned initially, we took over stewardship of Lerna, and now we can do some very interesting things, especially for your Lerna workspaces. If you're on Lerna 5.1 or later, you can just install Nx and set `useNx` to `true` in your `lerna.json`, and Lerna will automatically defer its task scheduling to Nx, making Lerna super fast without you having to change anything else, which I think is super important from a developer ergonomics standpoint.
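Both setup paths are essentially a single command; here is a hedged sketch, with package and flag names as they existed around Nx 14 and Lerna 5.1, and a hypothetical workspace name `myorg`:

```sh
# Add Nx to an existing npm/Yarn/pnpm monorepo (core setup)
npx add-nx-to-monorepo

# Or scaffold a new, pre-configured React monorepo
npx create-nx-workspace@latest myorg --preset=react
```

And for an existing Lerna workspace, delegating task scheduling to Nx is a one-line change in lerna.json:

```json
{
  "version": "independent",
  "useNx": true
}
```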
Another thing is beautiful output. Now, this might not sound that important initially, but think about how often you look at terminal output as a developer. To reduce the cognitive load, Nx shows you just what is most important right now. It doesn't show you, for instance, the execution of the pending tasks, as in this animation here, but just what gets executed, unless of course some error occurs; that would obviously be highlighted, big and in red. And even if you rerun the tasks and they get cached, the output is exactly the same. So you have a much lower cognitive load when you parse those logs, because you just see what you need right now.

Also, IDE integration. This is specifically interesting for newcomers, but also if you simply want to explore the capabilities you have. For Visual Studio Code (there are also community extensions, for WebStorm for instance), we have the Nx Console extension, which allows you to navigate within an Nx workspace, and in the future even a Lerna workspace, letting you browse the commands you can execute and the things you can generate in that workspace, visually, from within Visual Studio Code.

The visual part is also very important when exploring the workspace. Every, or at least many, of those monorepo tools have a so-called project graph underneath. In Nx we went a step further and visualized it as a dynamic web application that you can start right from your CLI. Here you can see how you can, for instance, take two nodes and have Nx visualize the shortest path between them, as well as all potential paths. You can imagine how that helps you understand why some nodes are connected, why the connection actually exists, and it also helps, for instance, with debugging circular dependencies.

And finally, a very important part is not just getting started quickly, as we have seen in terms of being fast, but having continuing support as your monorepo grows. Code generators, which Nx comes with, are one thing that can help you with that. For instance, for easier onboarding of junior developers, you can have code generators in place such that all the libraries and applications are generated in a uniform way, as you would expect. You don't have them copy and paste existing libraries, remove the existing code, and then continue working on top of that, which is of course very error-prone. And you can not just use the Nx-provided generators, but tailor them to your own needs. You can customize them so they generate exactly what you need in your current workspace, and they can easily be triggered from Visual Studio Code as well.

Here, for instance, you can see an animated example of setting up that distributed-task-execution environment for CI. You can just generate such a CI setup, which is super useful, and if you take a look in the end, it's even just a couple of lines of code. This is an example of how powerful those generators can be, without you having to write long docs explaining how to set those things up. Similarly, if you want to set up a React module federation setup with Webpack, Nx already comes with such a plugin, which you can use to generate the host and remote applications, and then you can build on top of that and continue developing from what the generators produced.
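As a sketch of those two generators, with names and options as they existed in Nx 14 (shell, shop, and cart are hypothetical project names):

```sh
# Generate a CI workflow pre-wired for distributed task execution
npx nx g @nrwl/workspace:ci-workflow --ci=github

# Scaffold a module federation setup: one host app and two remote apps
npx nx g @nrwl/react:host shell --remotes=shop,cart
```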
Finally, there's preventing spaghetti dependencies, something people underestimate quite a lot when they start with monorepos. They get started very excited in the beginning, but half a year in, when you have multiple teams working in the monorepo, things can get messy very, very quickly: people importing from libs they shouldn't import from, maybe even just by accident, because they saw some cool utility library they were not supposed to use. Your code can quickly become unmaintainable.

In Nx, for instance, as you can see here, if we have those domains, you can use libraries as domain boundaries, where you clearly expose a public API such that people from a different domain can leverage those functions in their own. But how can you prevent someone from importing a different lib directly? In theory, you can just do that, because you live in the same monorepo. In Nx we have a so-called tagging system. Tags are simply strings: you can define something like `scope:` plus some name, which for instance could be the domain, and you give that tag to all the apps and libs in that domain. You add it in your configuration, and all those libs get the tag. Then you also give the public API library a specific tag, for example `scope:api`, to mark it as something that can be consumed from outside the domain.

Nx then comes with a specific ESLint rule. In the rule configuration here, for instance, you can see how we define a source tag and say: something tagged `scope:checkout`, which is our checkout domain, can only depend on other apps and libs that also live in that scope, meaning they also carry the `scope:checkout` tag, or that carry `scope:api`, because a `scope:api` library is something that can be explicitly used from outside, and that is fine. These checks run in CI and in your editor, so you immediately see if you import something wrong, and they are therefore very powerful for keeping your code maintainable.
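A minimal sketch of that setup, assuming hypothetical checkout projects and the `@nrwl/nx/enforce-module-boundaries` rule that ships with Nx. The library's project.json carries the tags:

```json
{
  "tags": ["scope:checkout"]
}
```

And the workspace-root .eslintrc.json declares the constraints:

```json
{
  "overrides": [
    {
      "files": ["*.ts", "*.tsx"],
      "rules": {
        "@nrwl/nx/enforce-module-boundaries": [
          "error",
          {
            "depConstraints": [
              {
                "sourceTag": "scope:checkout",
                "onlyDependOnLibsWithTags": ["scope:checkout", "scope:api"]
              }
            ]
          }
        ]
      }
    }
  ]
}
```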
And finally, automated migrations, another thing a lot of people underestimate, though I think scenarios like this are very common. Developers might raise it at the stand-up: look, we need to upgrade Webpack, our tooling has gotten out of date, it has been running for a couple of years. Also, React 18 came out, so maybe we should give that a look. And what a product manager says, obviously, is: okay, I understand this is important, but how important is it? Does it block us from shipping features? Most of the time it doesn't; it's just some housekeeping that needs to be done. But if it takes up weeks and weeks, it is very hard to get through. So most of the time what happens is: sure, the product manager understands, let's do it in Q3, we can reserve some time. And we all know what happens when Q3 comes around. Tons of new features, very urgent deadlines. What happens to the tooling? It just gets delayed.

So one thing we do with Nx, and we have been very successful implementing this even with super large companies, is automated migrations. Nx has a feature built in that allows it to define migration scripts that migrate not only your configuration, but also your source code, from one version to the next. If you are within an existing Nx workspace and use Nx plugins, Nx kind of knows what features you're using. So whenever you upgrade, you just run `nx migrate`, which will analyze your workspace, upgrade your package.json, and generate a set of migration scripts that you can then inspect and run on your workspace. This will transform things like, for instance, Jest integrations; it will upgrade your Cypress to the latest version, it will upgrade to React 18 or whatever React version you're upgrading to, and it will make sure the code is still consistent by also adjusting the source code where needed. It's a super powerful feature, and one that gets underestimated very quickly.

In general, I'd like to conclude by saying that good tooling should be there to enable you to make the right decisions. It shouldn't just help you get started quickly, which is definitely important for incremental adoption, as we have seen; it should help you throughout the whole development lifecycle. With Nx, you get incrementally adoptable features. You can customize it to your own requirements, for instance by providing custom generators and the specific layout you want to have. And it comes with all those unique features, such as the visual graph exploration, the tagging, and the custom ESLint rules we have seen, that help you keep your workspace and monorepo healthy in the long run.

Definitely check out monorepo.tools. And if you want to reach out to me with questions, whatever it is, reach out to our Nx DevTools Twitter account, to nrwl.io, or to my own Twitter account, @juristr. Thanks a lot for watching.