Victor Savkin
Victor Savkin co-founder, ex-Googler, ex-Angular core team member. Work on dev tools for TS/JS. Nx and Nx Cloud architect.
Panel Discussion: Mono-repo Vs. Multi-repo Vs. Hybrid
DevOps.js Conf 2021
33 min
Welcome, everybody, to our panel discussion. Today we have a great topic: monorepo versus hybrid versus multi-repo. And we have a lineup of speakers. So speakers, please go ahead and introduce yourselves. Luke.
Hi everyone. My name is Lukande, but a lot of people call me Luke. I'm a Star Wars fan, so you can also call me Luke Skywalker if you like. I'm from South Africa, though my actual nationality is Zambian. I'm a senior software engineer working at a company called Entelect Software in South Africa, primarily in the financial services sector, and I focus mostly on the cloud and DevOps space.
Nice. And if you are a supporter of one of the approaches, please tell me which one. Luke, are you a supporter?
Look, I'm between both multi and mono. But to be honest, most of the time I go with the mono approach, especially with my side projects. So yeah, I guess I'd go with mono.
Okay. Julie.
Hi. Okay, I found the unmute button. I'm still doing that after a year of COVID. But yeah, hi, I'm Julie. I'm an engineer at Microsoft. I help customers use Azure and onboard to Azure. Before that, I was an enterprise architect at an insurance company, where I helped a lot of developers figure out DevOps, mentored them, and annoyed them a little, because I always have to do the security thing as well and be the bad person who says you have to sign your commits. You're probably going to ask me now: mono, multi, hybrid? I've seen it all. And I'm going to give the architect's answer, which is that it depends on the situation which one is best.
Nice, nice. I like it. Victor, please go ahead.
Sure. So I'm Victor. I used to be at Google on the Angular team, and this is where I actually, not fell in love, but got to know monorepo-style development and what it can feel like. Google, Facebook, Twitter, and others all use a very similar setup.
And for the last couple of years, I've been leading a project called Nx, which is a Google-style monorepo tool for the masses. So if you're not Google but you want to do something similar, this is what it is. So obviously I'm in the mono camp, but I actually don't care that much. We can do whatever we want to do.
Okay, thank you. And Angel.
Yeah, so my name is Angel Rivera, currently a developer advocate at CircleCI. Before that, I worked pretty much everywhere from the US military, working in Space Command, all the way through to small startups in Europe and the US. And then I also went back to work for the US government for a little while. So I have a wide range of experience in a lot of things. To answer your question about which repository setup I prefer, I'm a hybrid man.
Wow. Definitely. Nice. Yeah. Okay, so the first question: could you please tell me which type is better for different kinds of projects, and why? Luke, please.
Sure. So just from personal experience, as well as learning from others, I think the monorepo approach is great when it comes to working with small or medium-sized projects. And it's really good for knowledge transfer in teams. That's one of the things that I like about it the most. You can easily get people to see the big picture of what the whole project is. If you have different services in that one repo, they get to see the entire lay of the land, so to say. So those would be good scenarios for a monorepo approach. With multi-repo, I think it fits if you're looking to have strong ownership, which is becoming more and more popular, obviously because of microservices. If you want to have that kind of strong ownership and you want integrated teams to have that full end-to-end approach, then definitely.
I am very keen to hear more about the hybrid approach and where there's room for that. But those would be my thoughts.
Okay, thank you. And Victor, what do you think about this?
Yeah, it's a good question in that it depends on how you define the monorepo. You can define it in the sense of the Google-style or Facebook-style monorepo: one super large repo. Obviously that doesn't work for everyone. On the other side, you have three projects side by side in a folder, and that's a monorepo as well. If you think about the former, very few companies should adopt it, because it requires a lot of energy and expertise just to manage the stack. If you think about the latter, it works for any project. Even if I'm building a blog and the blog has two features, the list and the details, I can think of the list and the details as separate projects that my blog consists of, hence it's a monorepo. So it really depends. The rule of thumb I use when I talk to companies that we help to use monorepos is this: if they're closer to a monolith, with one large system, they can still do some structured development, partition the system into packages, and treat it as a monorepo with multiple modules that are built independently and then assembled into a larger system. But the gain is not as big compared to when they have things like micro frontends and microservices, because what monorepos do well, I think, and what many companies realize, is that they help with coordination. Say you have a system that consists of 20 nodes. I need to work on those 20 nodes in a structured way, share code, and figure out how to deploy it. What is a system? It's being able to spawn the 20 nodes in development and interact with them. That's where you need this extra tooling, what we call monorepo-style tooling, that will help you manage that system.
And the more nodes you have, the bigger the gain is. If you have 100 nodes, then you have to have something to help you manage 100 nodes. Obviously it's sort of a chicken-and-egg thing: if you have good tooling, you will create more nodes because it's easier. So it's kind of hard to gauge sometimes. But overall, I think, counterintuitively, if you have microservices and micro frontends, you actually benefit more from a monorepo, because you need to coordinate them in some fashion in development and in deployment.
Okay, thank you. And Julie, what do you think about this?
So I'm going to lower my hand. I feel old when I do stuff like that. So I've seen it all, like I said before. I initially started as an engineer before I was an enterprise architect, and people say, oh, you should use this or that. But let's look at what it's like in real life. We had something that was built as a distributed monolith in a monorepo. And how do you split that out? I think it was okay to start with a monorepo, because we weren't sure what we were building in terms of the business domain: what do we actually need to do and execute in terms of features for this product to be successful for the user? When you're experimenting like that, you don't want to coordinate across repos. Yes, you will shoot yourself in the foot in other places, running builds and whatnot. But in terms of just what the product is that the user actually sees, it's much easier in the monorepo. Maybe after a year, we got to the point where we said, oh, we've got to take this apart now, also because the engineers had developed enough skills that we really did want to separate out the single-page application. And it was a multi-tier application, so there were different layers you had to go through. Again, this is enterprise. And so for a while, we had a distributed monolith.
We split them out into repos, and then we pulled them in as Git submodules, which was also less than ideal, but it's a sort of intermediary state until you get to the point, if you ever reach that point, where you do have microservices. And not everybody reaches that point. Sometimes you go back into a monorepo. And now, as I always have my architect hat on, I say, I don't care. Just ship whatever works for you.
Okay. Angel.
Yeah. So I'm going to take a different approach to your question. I think that it really doesn't matter how you structure your code and where you structure it. These are basically just locations where we're storing the stuff. I think the problem that we all have is the tooling, and Victor has touched on that a little bit. The tooling is inefficient. I don't care where you work: Google, Facebook, CircleCI, Microsoft. The version control systems that we have are just not good. You run them at scale, and it's crap. We're not there yet as an industry. We've made huge progress, believe me. I've been around this business a long time. But the tooling is where the problems are. I think Victor was the one who worked at Google. They have a huge monorepo, and they had to build their own internal tool to manage the versioning process. So let's start with the tooling. I think that's where all the pain is coming from. And once that evolves into a state where we can do versioning fairly proficiently, I think some of these other problems will become irrelevant. That's why I said hybrid is where I'm at. Because when you're talking repos and monorepos, it's literally just where that code is being maintained and stored. And then the problem is the tooling that we use to version that stuff. So that's my answer.
Yeah, thank you so much.
Yeah, you're right about the tooling. Let me ask you a short question, and I'm hoping to hear a short answer, just about the tooling that you use. What is the preferable setup for you if we're talking about the monorepo, multi-repo, and hybrid approaches, Julie? You can just leave. Yeah, what is the preferable setup, the list of technologies that you use, if you choose a monorepo or, let's say, a hybrid approach?
So, here's the thing. I'm not sure I 100% agree with Angel. Why? Well, again, I have my enterprise architect hat on, and there's the security division issue: who's working on something? When you have one system, you always have more management overhead to separate those people out. I come from compliance-heavy industries, and that's always something you have to solve. And if you don't have enough people to do it, then you say, let's just split them up, all the different teams, even if they're building a distributed monolith. Tooling, I don't know. I feel like tooling doesn't solve people problems. At the end of the day, a lot of this is coordinating, and that's what I mentioned in my talk: making sure everything goes well from your laptop, to pushing to Git, to actually deploying. It's all a dance. It's choreography more than tooling. Give me anything, and yes, one tool will be better than the other. But what's really painful is working with other people. If I could just build my own thing, full stack, then no people, no problems. That's my answer.
I'd push back on that.
Let's go. Let's go.
It's very obvious that we have a tooling problem. What you talked about, yes, I agree 100% that people do get in each other's way. I think that's more of a DevOps kind of problem, so we're talking about culture at that point. What I'm talking about is repository indexing: indexes becoming astronomically large and inefficient.
That's the reason why Facebook, Twitter, and Google all have their own in-house-built systems: because of Git's inherently slow, extremely painful indexing once it gets to the point where you're talking about terabytes of data, and actually even less than that in some cases. So the other point, to counter that, would be that yes, we have a people problem, but if you fix that inherent problem with the versioning tooling, you can also address the security problems that you're talking about. That's easily addressable through automation. I would throw CI/CD platforms at that security problem and collaborate around that. And we're seeing huge gains in security. Even Azure, well, GitHub, has now bought Semmle, and now they do security scanning for you out of the box on the platform. I'm sure that's going to grow into its own product line eventually, but right now you're getting it for free. So those security-type questions are being answered currently. They're not perfect by any means, but I think the people portion, which DevOps handles, is a separate issue from the version control systems. The reason why we have these repo versus monorepo talks, or this conversation, in my opinion, is that we come from large industries. I'd love to hear what Luke, for example, sees, because we're very skewed towards the big Microsofts, Googles and Facebooks. So I'd love to hear what Luke thinks.
Let's ask him. Luke.
Yeah, so I'm in the red tape industry, man, with the financial sector.
Yeah, so he's not so small, right?
Right. And look, one of the things that I find really challenging, and I would love to hear your opinions on this, is when you have huge teams working on a monorepo. They believe in the benefits that a monorepo would provide, but they're trying to enforce that kind of system.
And how does that work in terms of commit standards and even standards for best practices? That's why I really do lean towards the view that there's definitely a people element to this, and maybe we should start from there. Tools obviously are an important factor, but I've seen a lot of the people problems, and maybe we should start from there, solve things from there, and build from there in terms of the paradigm we should be using, because not every project is alike and not every company is alike. And there's obviously going to be a temptation to follow the bigger companies in the industry. I've seen it, where it's like, oh, Google is practicing this, therefore let's do that too. But we're not Google. Our team is not structured like the teams they have there. We don't have the same kind of maturity they have in terms of the additional tooling they're using to make things more efficient. So those would be my thoughts on that particular area.
Yeah, I just want to clarify that I mentioned the tooling piece being kind of the genesis of the problem, but I agree a hundred percent. It's all about scale. I'm talking about running these repos at scale. And that's why I said hybrid, right? Remember that.
Oh yeah. Victor, could you please share your thoughts?
Sure. That's actually a very interesting point. And I think that it really depends on, again, how you define a monorepo. If you look at Google scale, Facebook scale, or Microsoft scale, well, yeah, a lot of tools for sure do not work really well. You have to use something like GVFS and do lots of interesting things so you can actually traverse the repo. I personally work with a lot of large companies building large systems, but they don't have one uber monorepo that runs the whole organization. It's usually more like we have 10 teams in a monorepo that contribute to one system.
And once you're at that scale, it's not that bad; it works, right? You have to go past that before tooling breaks down. So I think that for most organizations, one uber monorepo won't work anyhow, for organizational reasons. They can't even pay for the same Azure box to run CI/CD, because two orgs have different budgets. So, let alone tooling, those organizations can't agree to move together just for business reasons. When I talk about monorepos and their benefits, I don't mean a Google monorepo scale-wise, I mean a Google monorepo style-wise. You have the same type of tooling, but a much, much smaller system. Let's say we have a thousand developers working on it, with maybe tens of millions of lines of code. We're not talking about a massive system that spans the world; it's more like we have 10 apps that make up the experience that most customers of, let's say, a bank interact with. At that point, tooling is less of an issue. When it comes to how teams should coordinate, I think it's true that at the end of the day everything ends up being a people issue, in that you have to work together and agree on standards and stuff. But I don't know which one comes first. Take something like social media as an example. You can say, yes, if we were all normal people, it wouldn't be difficult on social media; we'd be nice to each other. But we are not. And that's partially because social media pushes us to be not nice to each other, because of the incentives that are there. It's the same with tooling. If you have good tooling, it allows you to separate your code from everyone else's code and to enforce best practices in a relatively straightforward way.
People kind of coordinate better. Some of the pain goes away, and you have a nice experience where you can actually deal with things. So I don't know which one is more primary. I'm a tools provider, not an agile consultant, so obviously I'm skewed towards tooling, because I feel that tools have a relatively low cost and can be integrated relatively fast compared to changing organizational structure or culture, or even human nature, which is a lot harder to do. So I let the ThoughtWorks people fight that battle. I'm not fighting that battle. I'm fighting the battle that certain things should be simpler, that some interactions should be simpler, such that the cost of an interaction is lower. And on a previous point, I actually wanted to add a comment, but I wasn't sure if I should interject. Another thing about monorepos that I find interesting is that folks often assume that if you're in the same repo, you're necessarily more coupled to each other. But it's not hard to imagine that, with proper tooling, it would be the other way around. The Google repo is not more coupled than a polyrepo organization. You can have boundaries within a monorepo that are just as strict as if you had a polyrepo. Those things you have to agree on, for sure. But it doesn't mean that you're necessarily building a convoluted monolith where you cannot draw organizational boundaries. You can draw organizational boundaries; you can do all the things that Conway's law would require you to do. So that's not really one thought, because a lot of points were made before. But to sum up, tooling is the foundational thing that you can cheaply buy or integrate compared to lots of other things, right?
And then once you have it, you need to solve the people issues, because at the end of the day, that's what results in your code being convoluted and stuff.
Sounds like you're in my camp with hybrid.
I mean, take the word hybrid as well. If you have a repo for a sub-org, it is a monorepo for that org. If it doesn't interact with other monorepos, if you're on an island and you have a single boat going back and forth, it's not really a hybrid. You're still on your own. So I wouldn't call it a hybrid, but if that's how we define hybrid, then yes, I think that most organizations should go hybrid. Having one uber-large repo for the whole organization just doesn't work, let alone the tooling. They won't be able to buy machines or whatever resources to pay for it, because they won't be able to agree on a budget. So there are lots of problems there for most orgs.
Yeah. Hybrid for me is the ability to, like I said, introduce the codebases you need into your automation, your processes, your build processes, when you need to. So to me, it doesn't matter where these things live and how they live. Whether they're separate or coupled, I don't care, as long as my tooling can complete and execute the tasks I need when I need them completed.
Yeah. No, I kind of get it. I haven't had the point, but I know that you have to say it.
I'm going to interrupt, because I've had my virtual hand up for a while and now I have my physical hand up.
I just wanted to ask you, please.
Yeah. So let's backtrack everything a little bit. I worry sometimes: who is our audience? We're throwing so many things out there. Let's make this a little bit more concrete. We keep talking about tooling and getting stuff into repos. All right, so making it more concrete, what do we have for tooling? We have branch protection, and we have pull requests, right?
How do you want to fit all that together? When I talk to customers, I talk about business units: fruits and veggies, fruits and veggies. And then I have dev, QA, and production for each of them. Think about that math. It grows, it feels like, almost exponentially. I'm terrible at math, so probably not exponentially. But it is going to be a combination of tooling and people, and you just have to sit down and plan that. And it's your business driving it. How do you need to separate your groups? And use names. Don't use foo and bar. Fruits and vegetables: how do they work together? Maybe I'm going to ask my fellow panelists what other tools they think of, but the ones that always pop into my head first are pull requests and branch protection.
Well, those elements, like pull requests, are outside of the core versioning tools. That's a product of GitHub; they call it something else in another product. So looking at the tooling natively, I agree 100%. There's obviously a people element to that. But I think that it's a conversation to be had outside of the monorepo question. Once you figure out the continuous integration portions of your processes, that's, in my opinion, the developer and people portion, where they're collaborating around code. I don't think it matters where the code lives and how it gets integrated into your processes. So for me, they're two separate things, though they are related.
Okay. Thank you all. The next question is about statistics. You just said a lot of interesting things about those approaches, but what about statistics? Could you share some statistics on things like development speed, the CI/CD process, code duplication, indexing, and so on? So Luke, what do you think about this?
Yeah, I guess I could speak a bit about the deployment side, if I am understanding the question correctly.
Again, this is going back to the whole "it depends", case-by-case situation. With a monorepo, depending on how you've structured it, you could have a relatively quick deployment stream if it's structured right, and then it depends on the additional tooling that you're using, and this is going to vary. The norm is that the larger and more complex the repo becomes, the slower things may get. But even that is still kind of subjective, because it comes down to what tooling you are using and what paradigm you're actually using in this approach. Whereas with a multi-repo, because things are smaller and on a more micro basis, things will naturally be a lot faster. So in that particular regard, you can expect things to move much faster, statistically speaking. But even then, it's hard to say that that's a universal thing, because this is very case by case across projects. So those would be my thoughts on that.
Angel, you already spoke about indexing and security. What else can you mention here?
Yeah, so I agree 100% with Luke and everyone else who mentioned that this is case by case. Everyone's teams are different. And I think Julie touched on this a little bit too. To go back to Julie's point, we did kind of go all over the place, and it's probably my fault; I tend to do that. But to refocus all of this: you've got to understand what you're inheriting. As a developer, I've been at many organizations and I've inherited people's code. We all inherit some code at some point. The biggest struggle that I ever had was exactly what we're talking about today: all right, what do I do with this?
So it's a monorepo; what do I do? I come from the world of "I'll decouple the hell out of things", because I like smaller, more manageable bits. But you can't just do that everywhere. I don't want to say the name, but there's a customer of ours that's a huge company, been around for a while, and they have a monorepo that is immense. If they were to break that apart, it would cost them millions, probably billions of dollars. We're not going to do that. So we have to operate within our context. And that's my point. You have to first assess what's going on with your teams and your codebase. Can things be decoupled? Slowly, maybe. But should you decouple them? If your monolith is functioning and it's awesome, great; I don't see any reason why you should decouple it. But if there are improvements you can make that make things more efficient, then, to Luke's point, smaller codebases run faster. I'm all about efficiency, getting my build durations to be as small as possible. So look at things from that perspective: how can you optimize? But you also have that cost. Is it going to cost the company or the organization millions of dollars to implement a change, and what is the return going to be on those changes? That's why I'm a huge proponent of, I guess, the hybrid, because you will have situations where you inherit these huge monolithic repositories and then you also have to integrate new technologies or new services with that monolithic app. So do you couple your new microservices and merge them into that monolithic application, or do you leave them decoupled and run in that hybrid state, where you have this one project running in its own repo, where it's mono, and then you have these other smaller services that are maybe newly built and you leave them decoupled?
I think the answer is to leave them as hybrid, leave them decoupled, if you think that that's the most optimal way. But if it's more efficient to couple them and have them in one repo, then do that. It has to be a choice made by the organization given their specific circumstances. I'll leave it at that.
Okay, I see. Thank you. Victor, would you like to share some statistics?
So I think if you want to make a case for poly or mono, you can always come up with some numbers to prove the case. The way I would measure is to look at tooling performance and then at organization performance. Tooling performance would be things like CI/CD average time, worst-case time, things like that. And depending on how you measure, you can make mono or poly look fast. For example, say you have a system consisting of four modules, which can be in one or multiple repos, and they form, let's say, a diamond shape. You make a change to the bottom one. Do I care about verifying only the bottom one? Or do I care about integrating the bottom one all the way up to the top one? And in the polyrepo case, do I count just this one CI/CD run, or all the CI/CD runs that will be required? Things like that. If you start doing that, then, depending on what you choose to measure, I think it ends up being the same. Of course, everyone should distribute CI. Even if you have a small repo, you should distribute CI. You don't run on one agent. Once you start distributing CI, it doesn't really matter whether you have a mono or polyrepo, in that you still run your CI on 30 boxes at the same time, or 50 boxes, however big your repo is. So you can always make your monorepo or polyrepo fast enough, whatever "fast enough" means to you. Let's say it's 20 minutes in the worst-case scenario. That can be achieved for a repo of pretty much any size, right?
Unless you have one large node that cannot be divided, at which point, again, it's a moot point between poly and monorepo, because you have the same node that cannot be divided in either case. So I think it doesn't really matter. In terms of tooling performance, it can be made to work with either. Organization performance is a different thing. Then you start measuring things like: what if I want to share code? This is my intent as a developer. How long does it take for me to share code between myself and another team member, or someone from a different project, for example? Does it take an hour, a day, a week, a month? In a monorepo, that time is much smaller. That's the benefit of the mono: the sharing is easier. But if you measure different things, then you can make the polyrepo look better. It really depends on what you care about. Most organizations I work with want to share more, not less. They are too divided along lines of business and such, so they look at sharing as something to optimize for, and the monorepo is good in that case. But if you have everything glued together and you want to divide things more, perhaps you want to share less, for organizational reasons. At that point you could still use a monorepo to do that, but a polyrepo gives you that in a more natural way.
Yeah, we're almost running out of time. I'd like to ask Julie. That's your last answer. What do you think about this?
It all depends. I think we can continue in the spatial chat for people who want to debate with us.
Okay. Thank you all. It was very interesting. Thank you. Bye bye.
Bye bye.
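Editor's illustration: Victor's diamond-shaped example from his final answer can be made concrete with a small sketch. This is not how Nx or any other tool is implemented; the module names (`base`, `left`, `right`, `app`) and the function `affected_by` are hypothetical, chosen only to show the difference between "verify the module I changed" and "verify everything that depends on it".

```python
from collections import deque

# Hypothetical four-module system forming a diamond: "base" is the bottom
# module, "left" and "right" depend on it, and "app" depends on both.
DEPENDS_ON = {
    "base": [],
    "left": ["base"],
    "right": ["base"],
    "app": ["left", "right"],
}


def affected_by(changed, depends_on):
    """Return the changed module plus every module that depends on it,
    directly or transitively."""
    # Invert the edges so we can walk from a module to its dependents.
    dependents = {name: [] for name in depends_on}
    for name, deps in depends_on.items():
        for dep in deps:
            dependents[dep].append(name)

    # Breadth-first walk upward from the changed module.
    seen = {changed}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for dependent in dependents[current]:
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen


if __name__ == "__main__":
    # Changing the bottom of the diamond touches all four modules.
    print(sorted(affected_by("base", DEPENDS_ON)))
    # Changing one side touches only that side and the top.
    print(sorted(affected_by("left", DEPENDS_ON)))
```

Under these assumptions, a change to `base` means one verification if you only count the changed module, but four if you count every module that has to be re-verified before the change is integrated at the top. Whether those four run as one pipeline in a monorepo or as four pipelines across separate repos, the amount of work is the same, which is the sense in which the comparison depends on what you choose to count.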