Julie Ng
Julie is an Engineer at Microsoft focussing on the Azure Customer Experience who refuses to give up her Mac for Windows. Previously she was an Enterprise Architect at Allianz Germany as they started their cloud journey in 2016 that included full CI/CD with Jenkins, single page apps and containers. When she's not conducting architecture design and CI/CD reviews with Azure customers or building POCs you can find her surviving lockdown by YouTubing about real world engineering or jogging and rock-climbing.
Fine-tuning DevOps for People over Perfection
DevOps.js Conf 2022
34 min
Demand for DevOps has increased in recent years as more organizations adopt cloud native technologies. Complexity has also increased and a "zero to hero" mentality leaves many people chasing perfection and FOMO. This session focusses instead on why maybe we shouldn't adopt a technology practice and how sometimes teams can achieve the same results prioritizing people over ops automation & controls. Let's look at the right amounts of, and fine-tuning, everything as code, pull requests, DevSecOps, monitoring and more to prioritize developer well-being over optimization perfection. It can be a valid decision to deploy less and sleep better. And finally we'll examine how manual practice and discipline can be the key to superb products and experiences.
Transcript
Intro
Hi, my name is Julie. I'm here to talk to you today at devops.js about DevOps. And I want to do something a little bit different, I want to focus more on people as opposed to, let's just say DevOps in theory and on paper. So before we get started, I have to just briefly show a disclaimer that I am appearing here as myself and much of what I'm going to share with you today is my opinion based on my experience. So it's also not a complete guide to DevOps. For this talk and the time duration, I've decided to pick a couple of examples to illustrate the points I want to make.
[00:57] So I've been building for the web for a very long time. I'm a bit older than I look, and I've worked at every place from startups to actually full corporate. I was self-employed for a long time, so I actually was freelancing at various companies and that was my initial exposure to DevOps.
Then I joined Allianz Germany, which is a multi-billion dollar insurance company, and we moved some projects to the cloud in less than a year. It was a crazy ride, but I learned so much, not just about DevOps, the skills, but really at scale. I already knew a lot of those practices and especially around Git and automated deployments, but transferring those across the team is a lot harder than it sounds. And so today I am an engineer at Microsoft, I'm part of the FastTrack for Azure program where I help customers onboard to Azure. I specialize in cloud architecture and DevOps automation. So we don't just help them with best practice guidance. If they run into a big problem or a challenge, we also help unblock them. So some of the content in here is going to be from those customer scenarios as well as internal Microsoft, kind of like my story, my experience, which is why this is my opinion, not a “here's how you should do it”.
[02:23] And the reason is because DevOps is a journey. Every company is going to be a little bit different. They're starting in different places and you always have different people with different preferences and just how they work. So it doesn't matter if you bring in somebody like me who's been doing it for a decade. It depends on the team, the company processes, and you have to make all of this work together. This is a little bit different from some of the talks I give. Part of it is that this is year two in COVID. Doing a lot of these things that have to deal with transformation and cultural transformation is really hard when everything is remote and sometimes you've never met your team. I've never met my team. So some of the things I'm going to talk about in terms of best practices actually become much more challenging when you don't have that face time to have that nuance. And it's like, "Okay, is she serious or is she just being her snarky self?"
And then some of those rules, especially with security, can I bend some of them and why? So we're going to look at that. What I want you to get out of today is that a lot of these things are best practices, and even if I'm telling you they are, you don't have to do them. You don't have to do them today, you don't have to do them next week. You have to eventually get there. And also, you can get there without following some of those best practices. So it's not about tools. It's going to be about people. And people are going to be the difference between success, both in delivering a product and in actually growing a team, investing in a team that will still be with you in a year or 10. So without further ado, let's get started.
Pull requests
[03:58] So the first thing, pull requests. Everybody loves pull requests. It's best practice, but you can also do it wrong. So let's say we have a pull request workflow. Everybody's working in a branch. You might be used to, "I'm just going to go to the main branch again, merge my work and push," and it's going to be like, "Nope." The server might reject it. So you have to then open a pull request. Then you make your commits, do all these things. And then you might have a convention or workflow set up that says, "Oh, I'm going to put a hashtag sign off. A robot should now merge this. It didn't work. Let me do that again. Let me do that again. And again." This is what I have to deal with at work. Then sometimes you just give up and you might not even close the pull request.
There are so many repos with dozens or hundreds of pull requests. It's really frustrating because you want to help, you want to contribute, but it's not going anywhere. It's stuck. Now, even if pull requests are going in and the builds are running, and you can eventually get code merged in, they can be really slow. And that's also just as frustrating and as disappointing. So this is what a pull request might look like. In documentation, it sounds really easy: open, approve, close and merge. So you might have it actually tied to a pipeline and you've got to wait for a build agent. If you have lots of people and you don't have as many build agents as you actually need, because you didn't buy them, you'll wait forever. So then if you're like me, you grab your phone and you're like, "Let me look at Twitter, see what's going on." You go back and then when it finally ran, it's like, "Oh, it failed. But it's half an hour, an hour later, let me go have lunch, let me come back." That's not necessarily super productive either.
[05:40] So what you end up having is this traffic jam of pull requests. And that's not necessarily helpful either. So I love this tweet by Keith and he's talking about pull requests, why you have them. And it's the best way to describe when you should not have them. And that is, a pull request is like airport security. When it was first introduced, it was really designed more for the open source community and you want to welcome outside contributions, large and small. And you want to introduce a way to collaborate on that. It's not just about security, it's about discussing a code change. Oh, what do you mean by this? Can you make that change? But if you're on the same team, do you need all those gates? Maybe you don't. So let's look at what actually this workflow looks like and where people get stuck.
So this is actually a diagram I use at work and it's in the Azure documentation. And if you look at that first part, what you'll see is that the pull request is a gate to that protected branch, the main branch or the production branch. And often I'll also see people who lock that down entirely, so even admins can't push directly to it. So you really are stuck in this pull request workflow. The problem here is that that is all done in code and nobody's perfect, I'm not either. And sometimes those things break for whatever reason. And then what you end up having is actually something like this. So I blanked out everything that you can't see. And I took this screenshot this morning and it made me really sad. 26 days ago, that was the last time somebody contributed to this repository. It's something internal that theoretically as engineers we should be using every day. There's stuff in there because Azure changes every day. There's bugs in there, people should go fix typos and broken links, et cetera. But what's happening is you'll see we have about 50 pull requests that are stuck. This repository has, in my opinion, run amok. It's just too many. And you also just see the number of forks we have for a team that's actually only a few hundred engineers worldwide. It sounds like a lot, but in the scheme of Microsoft or at the scale of Microsoft, it's not a lot of people. What you really want to avoid, because this is the worst feeling ever, is that frustration.
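To make the "stuck pull requests" picture concrete, here is a minimal sketch of how a team might list pull requests that have gone quiet, so someone can decide to merge, fix, or close them. The `PullRequest` shape, the sample data, and the 14-day threshold are all assumptions for illustration; none of this comes from the talk or from a real GitHub API.

```typescript
// Hypothetical shape of a pull request record; not a real GitHub API type.
interface PullRequest {
  id: number;
  title: string;
  lastActivity: Date; // last commit, comment, or review
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Returns the pull requests with no activity for more than maxIdleDays,
// oldest first. The input array is not modified.
function findStalePullRequests(
  prs: PullRequest[],
  today: Date,
  maxIdleDays: number
): PullRequest[] {
  return prs
    .filter((pr) => (today.getTime() - pr.lastActivity.getTime()) / MS_PER_DAY > maxIdleDays)
    .sort((a, b) => a.lastActivity.getTime() - b.lastActivity.getTime());
}

// Made-up sample data: one pull request idle for 26 days, one touched yesterday.
const checkDate = new Date("2022-03-25T00:00:00Z");
const stalePrs = findStalePullRequests(
  [
    { id: 1, title: "Fix broken link", lastActivity: new Date("2022-02-27T00:00:00Z") },
    { id: 2, title: "Update sample", lastActivity: new Date("2022-03-24T00:00:00Z") },
  ],
  checkDate,
  14
);
console.log(stalePrs.map((pr) => pr.id)); // only pull request 1 is idle for more than 14 days
```

A list like this is only a conversation starter. The point of the talk stands: the fix is usually fewer gates and more trust, not another report.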
[07:58] Because what happens is, you stop. And when you stop, you're not delivering business value. And that's really harmful. That's when the tooling isn't helping you. In fact, it's hurting you. So this is a great tweet from Matteo Collina who's on the Node.js core team, and I really loved it. And he was talking about, okay, he helped this company and they were deploying every two weeks. It was an overnight deployment, very long, and they're doing it every day now. So how are they doing that every day? And the key point is actually, you remove the gates and you trust your team. They get to deploy when they're ready. So that sounds really simple, but is it super simple? What does that look like?
So I pulled out this really old photo from the Allianz. So what does it mean when you're ready to deploy? So you think, "Oh, it's when we have tests, everything is great." Well guess what? In real life, not everything is great. Not everything is always great. And one of the things that was very interesting was that sometimes, and I never saw this actually in startups so much, I saw it more actually at this giant corporation. People said it was too risky to deploy during business hours for an internal line of business application. So let's still work eight hours but shift everything, and let's do it when nobody else is using it. And that's okay for the beginning. Trust me, it's not a goal. I think we only did that once with pizza and I have to blank out the faces because yeah, GDPR. So it's really up to the teams. In the beginning, you don't need to deploy 10 times a day. It's okay to start with every two weeks. It's okay to start with every month if you're dealing with a legacy application that is making you a lot of money. It just takes that time.
DevSecOps
[09:43] So another example that's more complicated, especially when you're like, "Ooh, gatekeeping. Why do we have gatekeeping?" We have it for security. So who doesn't want security? But it sounds too easy. Let's go through this example about how maybe it's too much or maybe we don't react to it.
So I was actually giving a webinar last week and I'm demoing how to use security and whatnot. And nobody actually said, "Julie, but look at that. There's a vulnerability, why aren't you looking at it?" I wish they had asked me but they didn't. So let's go through it and I'll tell you why it's still there and it's going to stay there for the time being.
[10:24] So I go into the Dependabot alerts. So this is all automated. "Hey Julie, we found these issues. You should go fix them." One of them is really high. So the bottom one is for glob-parent. Now if we open it up, Dependabot, which was bought by GitHub, is trying to be helpful. "Here you go. It's just that the version's less than 5.1.2. So upgrade it and you should be good," and it actually gives you a suggestion. So let's try that. I'm going to stick that in my package.json. I'm going to do an npm update. This is what I see. npm WARN. All these messages and whatnot. And at the very bottom it says, "You have 15 vulnerabilities." I thought I had two before, now I have 15. And then it says, "Run npm audit for details." And you're like, "Hmm, okay, well guess what? I've done this before. I'm going to run npm audit fix immediately." So again, lots and lots and lots of output. And if I scroll down to the bottom, oh, I have to do that on that screen, I still have 15 vulnerabilities.
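The rule behind that alert is a plain version comparison: anything below 5.1.2 is flagged. A minimal sketch of that check follows. It is a simplification and not how npm or Dependabot implement it (real tooling uses full semver ranges, including pre-release tags), and the installed version numbers are made up for illustration.

```typescript
// Parses "major.minor.patch" into numbers. Pre-release tags and ranges
// are deliberately not handled in this sketch.
function parseVersion(version: string): number[] {
  return version.split(".").map((part) => parseInt(part, 10));
}

// True when `installed` is strictly lower than `firstPatched`.
function isBelow(installed: string, firstPatched: string): boolean {
  const a = parseVersion(installed);
  const b = parseVersion(firstPatched);
  for (let i = 0; i < 3; i++) {
    const left = a[i] || 0;  // a missing or unparsable part counts as 0
    const right = b[i] || 0;
    if (left !== right) {
      return left < right;
    }
  }
  return false; // equal versions are not below
}

console.log(isBelow("3.1.0", "5.1.2")); // true: this version would be flagged
console.log(isBelow("5.1.2", "5.1.2")); // false: already at the patched version
console.log(isBelow("6.0.2", "5.1.2")); // false
```

That is all the tool knows: a version number and a threshold. Whether the upgrade is possible, or whether the flaw matters in your project, is outside its view.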
So why? I was hoping it would be gone. So let's do an npm audit fix --force. I don't even know if that's still a flag, but when you're a developer, that's what you do, you just try again. And we still have 15 issues, you can't get past it. So the hard part is actually now figuring out what should I do? There's no easy answer. You have to just stop, zoom out and think. So we have all these tools and they know about a certain fact at a certain time range, like oh 5.1.2, but it doesn't mean anything. It doesn't help you fix it.
[12:04] So after some Googling and Stack Overflowing, it was like, oh, there's this relatively new npm feature. You can do an override and you can say, "I don't care what version is required in my dependency tree, I want this one." So let me put in the latest ones for these. So I'm going to update my Node version manager on my local machine, my Dockerfile, all these things. I actually have some tests. So I'm going to do a pre-flight check which runs all of them and do a git push. Everything's green, all good. But then the CI build says, "Nope, it's not going to work." And if you dig into the error, it's actually giving me an OpenSSL error. So I'm like, "Eh, no, I don't want that. That's even worse encryption. Let me just ignore it. And I'm actually going to revert that change," and notice that in the commit I also put a link to the Stack Overflow thread, a little bit like, "Why is it there?"
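The override idea can be sketched as a walk over the dependency tree that pins one package to a chosen version wherever it appears. In npm itself this is declared with the `overrides` field in package.json rather than written by hand; the code below only illustrates the concept. The `DependencyNode` shape, the package `build-tool`, and all version numbers other than 5.1.2 are assumptions.

```typescript
// Hypothetical, simplified dependency tree node; npm's real lockfile is richer.
interface DependencyNode {
  name: string;
  version: string;
  dependencies: DependencyNode[];
}

// Returns a copy of the tree where every package named in `overrides`
// is pinned to the given version, no matter how deep it sits.
function applyOverrides(
  node: DependencyNode,
  overrides: { [packageName: string]: string }
): DependencyNode {
  const forced = overrides[node.name];
  return {
    name: node.name,
    version: typeof forced === "string" ? forced : node.version,
    dependencies: node.dependencies.map((child) => applyOverrides(child, overrides)),
  };
}

// Made-up tree: the app never asks for glob-parent directly,
// it arrives through another package.
const originalTree: DependencyNode = {
  name: "my-app",
  version: "1.0.0",
  dependencies: [
    {
      name: "build-tool",
      version: "4.0.0",
      dependencies: [{ name: "glob-parent", version: "3.1.0", dependencies: [] }],
    },
  ],
};

const pinnedTree = applyOverrides(originalTree, { "glob-parent": "5.1.2" });
console.log(pinnedTree.dependencies[0].dependencies[0].version); // "5.1.2"
```

Pinning answers the version question only. As the story shows, the package that depended on the old version may not work with the new one, which is why the tests and the CI build still have to pass afterwards.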
So you'll actually see in all my pipelines, I tend to have continue on error. It's like, "Thank you for the alert, but I'm going to keep going." There's also an Azure Defender for Cloud container scanner, and it'll go through the code and it found the exact same thing. And so it's like, "Okay, thank you for the alert. I need to disable you as well." Now all my builds are running, but that shouldn't be the end of the story. I need to deploy, so continue on error. I know about the error. Let's figure out what we're going to do. The key to that is understanding what is glob-parent, what am I using it for, and what is the OpenSSL issue and how is that going to be used? So I was like, "Hmm, I'm not sure about the OpenSSL issue, but I actually do know what glob-parent is going to do."
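Both Azure Pipelines (`continueOnError`) and GitHub Actions (`continue-on-error`) have a per-step setting for this. What the setting does can be sketched as a tiny pipeline runner: a failing step marked as "continue on error" is recorded as a warning instead of stopping the run. The `Step` and `PipelineResult` types and the step names below are hypothetical and not taken from either product.

```typescript
interface Step {
  label: string;
  run: () => void;          // throws on failure
  continueOnError: boolean; // true: record the failure and keep going
}

interface PipelineResult {
  succeeded: boolean;
  warnings: string[];
}

function runPipeline(steps: Step[]): PipelineResult {
  const warnings: string[] = [];
  for (const step of steps) {
    try {
      step.run();
    } catch (err) {
      if (!step.continueOnError) {
        // A hard failure stops the pipeline; later steps never run.
        return { succeeded: false, warnings: warnings };
      }
      warnings.push(step.label + " failed: " + String(err));
    }
  }
  return { succeeded: true, warnings: warnings };
}

// Made-up pipeline: the audit step fails but is allowed to, so deploy still runs.
const executedSteps: string[] = [];
const outcome = runPipeline([
  { label: "build", run: () => { executedSteps.push("build"); }, continueOnError: false },
  {
    label: "dependency audit",
    run: () => { throw new Error("1 high severity vulnerability"); },
    continueOnError: true,
  },
  { label: "deploy", run: () => { executedSteps.push("deploy"); }, continueOnError: false },
]);

console.log(executedSteps);    // build and deploy both ran
console.log(outcome.warnings); // the audit failure is kept as a warning
```

The warning list matters as much as the green build. Continuing on error is a decision to look at the problem later, not a way to make it disappear.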
[13:49] And so let's first understand what is this kind of vulnerability? And so it is a denial of service kind of vulnerability, which basically means your application isn't serving requests anymore because it's going to be bombarded. And so there's a networking version, but there's also a, let's say bad code version, where if you give it the right, let's just say inputs, then it could slow to a crawl or even crash. Is that going to happen like in my example? So I didn't show you what this project is, but the answer is no. So I'm building this application, it's a proof of concept. And what I want to do is give an assessment that's not a checklist. So if you select certain options, then maybe your security would go up, but at the cost of increased complexity. And so all of the things here, the questions and the factors, all of that is done in Markdown.
So it's pulling this all together. It's a headless CMS, I don't need a database. And it's figuring out what goes where and is attached to what, based on the file tree. So that's what it's globbing. It's matching all these things. How do I throw all this together? That is in my file system. There is no user input. This application also has a build process. So it's doing all that, figuring out live, but you just run a build at some point when you're done and when you deploy it and you're running in production mode or you're just serving it in static mode or whatever it's called, and you don't have that anymore, that gap isn't there. So in this situation, what I tell people is that tools are stupid. I'm just trying to get you to remember that the tool is not the holy grail.
[15:32] Just because it says there's a vulnerability, that doesn't mean, "Okay, stop everything, drop everything you're doing and figure that out." Because even if you figure that out, it might be like in this case, where it's choosing the lesser of two evils. So the OpenSSL issue, whatever it is, has to do with the encryption that is used for the cookie, where the results, or rather the inputs that you gave, the answers that you picked are saved and it's all encrypted. But for me, I've decided that risk, even though there's actually no real user data in there, is worse I think, than, okay, something that can only happen to me in development. So you won't know any of that until you go through that process. And this was just one particular example that I encountered this month. But for every kind of notification, you have to go through this process and it's really, really hard.
And then when you suddenly get something like this, we get to that other point of frustration, things that are clogged up. Some organizations will measure like, "Ooh, how many alerts are still open? How many vulnerabilities are still there?" But it doesn't actually tell you if it is a security issue or how bad it is. Or it says, "High or moderate," but is it really that bad? "Lesser of two evils," that's what I always say.
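The reasoning in this section can be written down as a small triage function: the reported severity is only one input, next to whether the code ships to production and whether untrusted input can reach it. This is a sketch of one possible policy. The fields, the three outcomes, and the rules are assumptions, and every team would weigh them differently.

```typescript
type Severity = "low" | "moderate" | "high" | "critical";

// Hypothetical alert record; the two boolean fields are what a human
// has to find out, because the scanner does not know them.
interface SecurityAlert {
  packageName: string;
  severity: Severity;
  usedInProduction: boolean;      // shipped code, or build/dev tooling only?
  reachesUntrustedInput: boolean; // can someone outside the team feed it data?
}

type Decision = "fix now" | "schedule fix" | "accept and document";

function triage(alert: SecurityAlert): Decision {
  if (alert.usedInProduction && alert.reachesUntrustedInput) {
    const severe = alert.severity === "high" || alert.severity === "critical";
    return severe ? "fix now" : "schedule fix";
  }
  if (alert.usedInProduction || alert.reachesUntrustedInput) {
    return "schedule fix";
  }
  return "accept and document";
}

// The case from the talk, as described: rated high, but only used while
// building from local Markdown files, with no user input.
console.log(
  triage({
    packageName: "glob-parent",
    severity: "high",
    usedInProduction: false,
    reachesUntrustedInput: false,
  })
); // "accept and document"
```

Counting open alerts measures none of this, which is why the number alone says little about actual risk.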
Craftsmanship & the Art of DevOps
[16:52] Okay, so we talked about pull requests, we talked about how you need to trust your team. We talked about as well that even with security, you also have to trust your team to learn and grow. And the last thing I want to touch upon today is actually craftsmanship, because we make DevOps really hard, but I want people to actually want to do it. And key to that, in my opinion, is loving what you're doing and being proud of what you do.
So this is what I see a lot of, everywhere on GitHub, in open source as well, not just at work. You're probably using the GitHub UI if you're clicking something and it says, "Update README. Update README. Update README." That's okay. It's a place to start. Now what's interesting is when customers come to me, one of the biggest challenges people face is actually versioning, especially if they're using microservices. And for whatever reason you have to correlate things, in which case you don't have a microservice, you have a distributed monolith. Anyway, when you see something like this, it looks super nice and like, "Oh my god, simple. It's super easy." And how do you do that? Well, the answer is practice and discipline. So this is a changelog that I made using a tool called standard-version, and it relies on something called Conventional Commits. And that's what it is. It's just a naming convention. So when you make a commit message, this is what it looks like. How it figures out if there's a bug fix or a feature, it's just a prefix. Throw some stuff between the parentheses and it'll try to categorize things for you. And then as well, if you add the hashtag and a number, of course GitHub links it to the issue. Now, if you're really, really paying close attention, you can tell that some of those, I've also adjusted by hand because even I am not perfect, even I will sometimes be in a rush, and I'm super pedantic, which is why I say, "Even I." I'll be in a rush and I'll just also submit something. But when I actually make a release, I will look at the changelog and edit it as necessary before committing it. Because at that point, I am publishing it to you, the consumer. And I want you to understand it. Any tiny mistakes I make in between, that's okay, it's just for me. But at the point where I say, "Okay, it's ready for the public," I want to make it look a little bit nicer.
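The naming convention described above is simple enough to parse by hand. The sketch below reads a commit header of the form "type(scope): subject" and groups features and bug fixes into a plain-text changelog. It is an illustration of the idea, not the standard-version implementation or its output format; the sample commit messages are made up, and markers for breaking changes are not handled.

```typescript
interface ParsedCommit {
  type: string;         // e.g. "fix" or "feat"
  scope: string | null; // text between the parentheses, if any
  subject: string;
}

// Matches "type(scope): subject" and "type: subject".
const COMMIT_HEADER = /^(\w+)(?:\(([^)]+)\))?: (.+)$/;

function parseCommit(header: string): ParsedCommit | null {
  const match = COMMIT_HEADER.exec(header);
  if (match === null) {
    return null; // not following the convention, e.g. "Update README"
  }
  return { type: match[1], scope: match[2] || null, subject: match[3] };
}

// Groups "feat" and "fix" commits; everything else is left out.
function buildChangelog(headers: string[]): string {
  const features: string[] = [];
  const fixes: string[] = [];
  headers.forEach((header) => {
    const commit = parseCommit(header);
    if (commit === null) {
      return;
    }
    const line = "* " + (commit.scope ? commit.scope + ": " : "") + commit.subject;
    if (commit.type === "feat") {
      features.push(line);
    } else if (commit.type === "fix") {
      fixes.push(line);
    }
  });
  const sections: string[] = [];
  if (features.length > 0) {
    sections.push("Features\n" + features.join("\n"));
  }
  if (fixes.length > 0) {
    sections.push("Bug Fixes\n" + fixes.join("\n"));
  }
  return sections.join("\n\n");
}

const changelogText = buildChangelog([
  "feat(assessment): add complexity score (#12)",
  "fix: correct broken link in footer",
  "Update README",
]);
console.log(changelogText);
```

The third message is silently dropped because it has no prefix, which is exactly why the discipline matters: the tool can only categorize what the commit author labeled.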
[19:14] The other thing too is that sometimes you're not doing everything perfectly, and in this case, we had to restructure a bunch of things. Take that extra step to actually make sure everybody gets credit. This I've seen happen a lot in corporate where unfortunately people actually count those commits. And you know that GitHub profile with all those little green squares? So you need to make sure that not just for you, but also for your colleagues, that their GitHub users and usernames are also getting credit for that work, otherwise people don't see it.
Now, it's also not just about that corporate loophole, it's also about, let's say appreciation, of teams and being grateful for the help that they're giving you. So this takes 30 seconds, but it goes a really long way for your team culture.
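One common way to give that credit on GitHub is to add `Co-authored-by:` trailers at the end of the commit message, one per colleague. The transcript does not say which mechanism the slide showed, so treat this as an assumption. The helper below only builds the message text; the names and email addresses are placeholders.

```typescript
interface Contributor {
  fullName: string;
  email: string; // should match an email on the contributor's GitHub account
}

// Appends one "Co-authored-by" trailer per contributor.
// Trailers go at the end of the message, after a blank line.
function withCoAuthors(message: string, coAuthors: Contributor[]): string {
  if (coAuthors.length === 0) {
    return message;
  }
  const trailers = coAuthors.map(
    (person) => "Co-authored-by: " + person.fullName + " <" + person.email + ">"
  );
  return message + "\n\n" + trailers.join("\n");
}

const commitMessage = withCoAuthors("refactor: restructure assessment content", [
  { fullName: "Alex Example", email: "alex@example.com" },
  { fullName: "Sam Sample", email: "sam@example.com" },
]);
console.log(commitMessage);
```

Typing two extra lines is the "30 seconds" from the talk, and it puts the work on each colleague's profile instead of only on the person who pushed.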
[20:05] Another thing that I want to tell people to spend time on is actually doing documentation. So I write this both for myself as well as people who are going to use the software that I publish or samples. It's not meant for your builds, it's meant entirely for humans. This has no purpose in code whatsoever except to help you understand something. And that's really, really important because we almost always have to go back and look at things in the future. And I don't remember what I did last week, let alone six months ago. And if we're working on multiple projects and microservices, et cetera, then yeah, sometimes you take a long break from a project. So it's really important to actually add that documentation.
Now when I show demos at work, you look at something like this and this is a GitHub Actions workflow, and I recently took some time to actually completely migrate an Azure Pipelines workflow. It looks really nice. But how long does it take? In reality, years. There's so much experience that comes from just actually creating pipelines. What works well for me, not necessarily for everybody, and then figuring out, "Okay, there's a gap here. There's a gap here. This feature works that way." It takes a long time. The simpler and more elegant it looks, the more effort it takes to get there. And it's okay. You don't start there. What you want to do is actually invest in yourself, which means good habits. Good habits, little tiny steps that over time, actually result in that big goal you're trying to reach.
[21:38] If you try to get to it from day one, what you might end up doing is actually hurting your own motivation and your team's motivation. And that's almost the worst thing you can do. I think it's better to have more contributions and stuff you need to clean up, than to have no contributions. And you might think, "Oh no, no, they just have to update the pull requests or whatever. Just have to make the build go green." But there's a human element to that that might be missing in those types of rules that are just all or nothing binary. Don't do that. Really it's about people. It's about investing in people long term. So things like documentation, et cetera, it's not a cost, it's not a chore, it's an investment.
And with that, I am done. So thank you for coming to this talk. I hope that it'll get you thinking critically the next time you read some article about Best Practices, 10 Things you Must Do. Do you really need to do it? And if you're interested in more topics like this, counterintuitive, what is it actually like in real life, you can follow me on Twitter or on YouTube. Thank you very much.
Questions
[22:50] Mettin Parzinski: Good to have you here. So let's look at the results. So the question was, what does it feel like to deploy to production? And with 57% we have a winner at passport control and airport security followed with 27% waving at your coach as you step into the field or 17% showing an ID before entering a bar or club. Was this something you were expecting, Julie?
[23:17] Julie Ng: Actually it surprised me, the airport control, that it's that many people. I thought this audience, this conference, folks who use JavaScript, I don't know, they would be more laid back. I think what surprises some people, because my job today, I help Azure customers, is that even Allianz didn't have passport control and airport security. So we had all those really super strict requirements. So my job there, I was a full stack engineer and then I later had a mentoring role across many teams and we established all the rules. But it got to a point, for example, some of the compliance stuff, the product owner, so a non-technical person would create the pull request that could deploy to production, for example. Or the product owner would be the one to click a button in Jenkins, and that deploys. There weren't any hard controls on everything at the time. It's just when you have a business that's worth that much money, it's scary anyway. So you're only going to click that button if you're a hundred percent sure. And because people actually own their repositories, their own repositories for the most part, it's like, "Okay, if I screw up, I'm just going to shoot myself in the face." It might have been that it was really early in the cloud journey and they didn't put all those rules on yet. But I don't think so. I think it was just that there was a good amount of trust, which is hilarious because I have airport security today in internal repositories and I don't even work on Azure. But it's not necessary. I hope it's clear from the talk that it's not necessary to have that much control. Trust is much more important, I think, for long-term growth of the team, team spirit, and then also just adding more features and business value.
[25:00] Mettin Parzinski: But I guess also with our audience, most of the people are not actually deploying user facing code, but they are building the infrastructure, so like the chef doesn't need to have his own kitchen, thinking of that. They're scared of their own code, maybe. But hey, we'll never know unless everyone tells us now on chat. So let's, if you're ready for it, hop into the Q and A. And as a reminder, if you have any questions you can still ask them in the channel, DevOps Talk Q and A. So be sure to join that. First question is from CC Miller. How would you handle the scenario where the vulnerability is a valid issue for the app but can't be updated because other things are not compatible with the fix?
[25:52] Julie Ng: So if I understood correctly, there is a valid security vulnerability, so it's a confirmed vulnerability, but there's no fix. And how should we handle that? Is that the question?
[26:06] Mettin Parzinski: Okay, yeah. And there's also compatibility issues.
[26:13] Julie Ng: Okay, there's compatibility issues. That's something you have to figure out with your security team. So I can say that the customers do this. I can say that a couple years ago, I'm pretty sure that Allianz probably still does this. You have to find that balance. And often even in large organizations, you'll have basically a contract. It's going to be locked, you're deploying something that has a known vulnerability and you have however many days or hours to actually fix it. And you have to make that decision. There's a vulnerability. If the team or the business owner, the product owner says, "We go anyway," okay, then the clock starts ticking after you go into production and you fix it by then. And that might mean rolling back. So there's always going to be a contract I think between all the different stakeholders in your organization and that includes the security team. And I think the hardest thing is to find the right amount of confidence. Like the example I said, it's a valid security issue, but what is the chance of it happening? There's no such thing as 100% security. I think you just have to evaluate it, take a little bit of a risk and figure out what is right for you. Yeah.
[27:28] Mettin Parzinski: Yeah. Agree. Next question is from Jessica. How often should we revisit what good looks like? We have to start somewhere and how do we know how often we should optimize for success?
[27:46] Julie Ng: I think you'll know in your gut. So you never reach perfect. And it's funny because I show demos at work all the time and in webinars and I'm like, "Oh, I'm doing it like this now," and then six months later, I would totally do it differently now. And so as you work with what you've built or as other people join your team, as you get new experience, you just might change it. And whether or not you have the time to change it, that's going to be dependent on what's your workload, how much time do you have to prioritize features, et cetera. So you're probably going to be doing it all the time and you'll never reach perfect. I think if you reach perfect, I'm like, "Hmm, that's an interesting bar." Or you have something that, there are some services you deploy and they're done. You don't really ... If it's not customer facing, let's just say it's an API to consume, some things are done and you don't update them regularly. Maybe for some security maintenance, but same thing then when it's automation, it's done.
[28:48] Mettin Parzinski: So yeah, I'm thinking of, I don't know where I got this wisdom from, but I once heard if you're looking at something you built a year ago or two years ago and you're not ashamed of it, that's a bad thing, because that means you have not progressed in your knowledge for a year or two years. And you mentioned half a year. Half a year is acceptable. And of course you don't need to progress. If you're fine where you are then you're fine. But most people here I'm assuming are here to learn new things. I thought it was a nice rule of thumb. If you're looking at something from two years ago, you're unashamed, it's a bad thing.
[29:35] Julie Ng: Yeah. It's not just though ashamed or whatever learning. So I'm reading this book right now, it's called Think Again by Adam Grant. So sometimes it's just that it's not you learning something new, but you realize something and you just change your mind. And changing your mind is actually not a bad thing. It means you're thinking like a scientist, maybe you have new learning information now and you change your mind. It didn't get better, it's just different. And what I changed, it works for me, it won't work for you, but that's okay. It's for me, not for you.
[30:06] Mettin Parzinski: Nice, nice. We have another question from Sisi Miller. Is it possible, can you over document, and what is too much, or is that a team thing?
[30:20] Julie Ng: You can definitely over document. I wish I could share so many internal things with you. Documentation, it's really hard to find that right balance. And sometimes there's not much text and actually that takes a lot more time to figure out what is actually necessary. Let me write a paragraph and delete 10% of it or 20% of it or rewrite it because it doesn't make sense. As much as you need and not much more. I think one thing I try to tell my colleagues, and I'm not really successful at it, is you don't need to write a book. You don't need to write whole paragraphs. If you can get away with bullet points, do it because nobody reads it anyway. Who has time to read pages? And so you will decide, or the people you're talking to will decide if it's too much. There are a couple times they're like, "Oh, didn't you read the Wiki?" And I'll say, "No, it was too long." And I'm direct, and so people are like, "You didn't read it?" I'm like, "No, I didn't read it. I didn't have time to read this much text. No."
[31:22] Mettin Parzinski: Yeah, it reminds me of a meme that was going on a few weeks back. Something like, why waste two minutes reading some good documentation when you can just spend a whole day trying to figure things out for yourself.
Julie Ng: I like that too. That also works.
Mettin Parzinski: And I felt personally attacked when I read that.
Julie Ng: Nah. It's not attack. It's all good. It's all good. Always good humor.
Mettin Parzinski: But I guess a lot of people are guilty of that. We have one minute. So we're going to do one more quick question from Sergio. Let's say you have some process in the workflow that was added because of reasons. How do you know when it's time to remove that process because it doesn't add value anymore?
[32:14] Julie Ng: I don't think I can do that question justice in under a minute. I would say let's go into the spatial chat, that person, if you can, and then I'll answer it there. I'm going to give you, you'll know when you know. But for details, let's go into the spatial chat room thing.
[32:31] Mettin Parzinski: Yeah, that's a nice bridge to the next step, because this is all the time we have for our Q and A. So Julie is going to go into her spatial chat where you can continue the conversation with Julie. So if you want to have some one-on-one time with Julie, you can do so now in the spatial chat. Julie, thanks a lot for joining us. It's been a pleasure having you. And enjoy our spatial chat speaker room.
Julie Ng: See you later.
Mettin Parzinski: Bye-bye.
Julie Ng: Bye. See you later.
Panel Discussion: Mono-repo Vs. Multi-repo Vs. Hybrid
DevOps.js Conf 2021
33 min
And welcome everybody to our panel discussion. Today we have a great topic: monorepo versus hybrid versus multi-repo. And we have a lineup of speakers. So speakers, please go ahead and introduce yourselves. Luke.
Hi everyone. My name is Lukande, but a lot of people call me Luke. I'm a Star Wars fan, so you can also call me Luke Skywalker if you like. And I'm from South Africa. My actual nationality is that I'm Zambian. Yeah, senior software engineer working at a company called Entelect Software in South Africa. And yeah, primarily in the financial services sector and focus mostly on the cloud and DevOps space.
Nice. And if you guys are a supporter of one of the approaches, please tell me which one. Luke, are you a supporter?
Look, I'm between both multi and mono, hey. But to be honest, most of the time I go with the mono approach, especially even with my side projects. So yeah, I guess I'd go with mono.
Okay. Julie.
Hi. Okay, I found the unmute button. I'm still doing that after a year of COVID. But yeah, so hi, I'm Julie. I'm an engineer at Microsoft. I help customers use Azure and onboard to Azure. Before that, I was an enterprise architect at an insurance company and helped a lot of developers sort of figure out DevOps and mentored them, annoyed them, because I always have to do the security thing as well a little bit, sort of be the bad person that says you have to sign your commits. Yeah, you're probably going to ask me now, mono, multi, hybrid, I've seen it all. And I'm going to give the architect's answer, which is: it depends on the situation which one's best.
Nice, nice. I like it. Victor, please go ahead.
Sure. So I'm Victor. I used to be at Google on the Angular team, and this is where I actually sort of, not fell in love, but sort of got to know monorepo-style development, what it can feel like. So Google, Facebook, Twitter, and others all use a very similar setup.
And for the last couple of years, I've been leading a project called Nx, which is a Google-style monorepo tool for the masses. So if you don't have Google, but you want to do something similar, this is what it is. So obviously I'm in the mono camp, but I actually don't care that much. We can do whatever we want to do.
Okay, thank you. And Angel.
Yeah, so my name is Angel Rivera, developer advocate currently at CircleCI. And before that, I worked pretty much US military, working in Space Command all the way through to small startups in Europe and the US. And then also went back to work for the US government for a little while. So I have a wide range of experience in a lot of things. To answer your question of which preferred repository, I'm a hybrid man.
Wow. Definitely. Nice. Yeah. Okay, so the first question, could you please tell me and think about which type is better for different kinds of projects and why? Luke, please.
Sure. So I guess just from personal experience, as well as learning from others, I think the mono repo approach is great when it comes to working with the smaller sized projects or medium sized. And it's really good to help just with knowledge transfer in teams. That's one of the things that I like about it the most. You can easily have people integrated into getting a big picture of what the whole project is. If you have different services in that one repo, they get to see the entire lay of the land, so to say. So yeah, that would be one of the major reasons why I think those would be good scenarios for a mono repo approach. With multi repo, I think if you're looking to have strong ownership, that's something that I think is becoming more and more popular, obviously because of microservices. That's pretty good to have over there, in my opinion. If you want to have that kind of strong ownership and you want teams that are integrated to have that full end to end approach, then definitely.
I am very keen to hear more about the hybrid approach and where there's room for that. So yeah, but those would be my thoughts. Okay, thank you. And Victor, what do you think about this? Yeah, it's a good question in that it depends on how you define the mono repo. If you define the mono repo in the sense like the Google style mono repo or whatever, like the Facebook style, one super large repo. So obviously that doesn't work for everyone. On the other side, you have three projects side by side in a folder, and it's a mono repo as well. So if you think about the former, so very few companies should adopt the former because it requires a lot of energy expertise to just manage the stack. If you think about the latter, it works for any project, even if I'm building a blog and the blog has two features, like the list and the details, I can think of the list and the details being separate projects that my blog consists of, hence it's a mono repo. So it really depends. In terms of the rule of thumb that I use when I talk to companies who help to use mono repos is that if they do, basically if they're closer to a monolith, they have one large system, they can still do some structured development, partition six of the packages, and treat it as a mono repo with multiple modules and be built independently and then assemble into a larger system. But the gain is not as big comparing to when they have things like micro front end, micro services, because basically what mono repos do well, I think, and that's what many companies realize is that they help with coordination. Once you have a system, let's say it consists of 20 nodes, I need to work on those 20 nodes in a structured way, share code, figure out how to deploy it. What is a system? Being able to spawn the 20 nodes in development and interact with it. That's where you need this extra tooling, the mono repo, what we call style tooling, that will help you to manage that system. 
And the more nodes you have, the bigger the gain is. So if you have 100 nodes, well, then you have to have something to help you manage 100 nodes. So obviously it's sort of a chicken and egg thing. If you have good tooling, you will create more nodes because it's easier. So it's kind of hard to gauge sometimes. But overall, I think counterintuitively, if you have micro services and micro front ends, you actually benefit more from mono repo because you need to coordinate them in some fashion in development and in deployment. Okay, thank you. And Julie, what do you think about this? So I'm going to lower my head. I feel old when I do stuff like that. So I've seen it all, right? Like I said before, and when I was an enterprise architect, I initially started as an engineer and people talk about, oh, you should use this or that. But let's look at what it's like in real life, right? So we had something that was built. It was a distributed monolith in a mono repo. And how do you split that out? So it was okay. I think to start with a mono repo because we weren't sure what we were building, right? In terms of business domain, what do we actually need to do and execute in terms of features for this product to be successful for the user? So when you're kind of experimenting, you don't want to coordinate all that. Yes, you will shoot yourself in the foot in other places, running builds and whatnot. But in terms of just what is my product that the user actually sees, it's much easier in the mono repo. So I think maybe after a year, we got to the point where we're like, oh, we got to take this apart now. Because also the engineers had developed enough skills that we really do want to separate out the single page application. And there was a multi-tier application, right? So there are different sort of layers you have to go through. And again, this is enterprise. And so for a while, we had a distributed monolith. 
We split them out into repos, and then we pulled them in as Git sub modules, which was also less than ideal, but it's a little bit that sort of intermediary state until you get to the point, if you ever reach that point, where you do have microservices. And not everybody reaches that point. Sometimes you go back into a mono repo. And now, as I always have my architect hat on, I say, I don't care. Just ship whatever works for you. Okay. Angel. Yeah. So I'm going to take a different approach to your question. I think that it really doesn't matter, right? How you structure your code and where you structure it. I mean, these are just basically locations, right? Like where we're storing the stuff. And I think the problem that we all have is the tooling. And I think Victor has touched on that a little bit. The tooling is inefficient, right? I don't care where you work. Google, Facebook, CircleCI, Microsoft. The versioning control systems that we have are just not good. You've run them at scale. It's crap, right? We're not there yet. The industry. You know, we've made huge progress. Believe me. I've been around this business a long time. But the tooling is where the problems are. Because if, like, I mean, Victor and I think Victor was the one that worked at Google. They have a huge mono repo. They had to build their own kind of internal tool to manage the versioning process, right? So, let's start with the tooling. I think that's kind of where all the pain is coming from. And until that is, you know, bettered or evolved into a state where, you know, we can do versioning fairly proficiently, then I think some of these other problems will be, like, irrelevant, right? That's why I said hybrid is kind of where I'm at. Because when you're talking repos and mono repos, it's literally just where is that code being maintained and stored? And then the problem is how, you know, the tooling that we use to version that stuff, right? So, that's my answer. Yeah, thank you so much. 
Yeah, you're right about the tooling. And let me ask you a short question. And I'd like to hear a short answer just about the tooling that you use. What is the preferable setup for you if you're talking about the mono repo, multi-repo, hybrid approaches, Julie? You can just leave. Yeah, what is the preferable setup, the list of technologies that you use if you choose mono repo or, let's say, hybrid approach?
So, here's the thing. I'm not sure I 100% agree with Angel. Why? Well, again, I have my enterprise architect hat on and the security division issue, right? Who's working on something? When you have one system, you always have then more management overhead to separate those people out. I come from compliant industries and that's always something you have to solve. And if you don't have enough people to do it, then you say, let's just split them up, all the different teams, even if they're building a distributed monolith. Tooling, I don't know. I feel like tooling doesn't solve people problems. At the end of the day, a lot of this is coordinating, and that's what I mentioned in my talk, making sure everything goes well from your laptop to pushing to Git to actually deploying, it's all a dance. It's all choreography more than the tooling. Give me anything and yes, one will be better than the other. But what's really painful is working with other people, right? If I could just build my own thing and full stack, then no people, no problems. That's my answer.
I'd push back on that.
Let's go. Let's go.
It's very obvious that we have a tooling problem. I mean, you know, what you talked about, yes, I agree 100% that people do get in each other's way. I think that's more of a DevOps kind of problem, right? So we're talking about culture at that point. What I'm talking about is repository indexing, indexes becoming astronomically large, inefficient.
That's the reason why Facebook, Twitter, Google all have their own internal in-house built system, because of Git's inherent slow indexing, extremely painful indexing, once it gets to a point where you're talking about terabytes of data, actually even less than that, right, in some cases. So the other point to kind of counter that would be that, yes, we have a people problem, but if you fix that inherent problem with the versioning tooling, you can also address the security problems that you're talking about. That's easily addressable through automation. I would throw CI/CD platforms at that security problem to collaborate around that. And we're seeing huge gains in security, right? Even Azure, GitHub now has bought Semmle, they acquired them, and now, I would say, they do security scanning for you out of the box on the platform. I'm sure that's going to grow into its own product line eventually, but right now you're getting it for free. So you know, those security type questions are being answered currently. They're not perfect by any means, but I think with DevOps implementing that to handle the people portion, it is a separate issue from the version control systems. The reason why we have repo, monorepo talks, right? Or this conversation, my opinion. I think we come from large industries, right? Like I'd love to hear what Luke, for example, sees, because we're very skewed toward the big Microsofts, Googles and Facebooks. So I'd love to hear what Luke thinks.
Let's ask him, Luke.
Yeah, so I'm in the red tape industry, man, with the financial.
Yeah, so he's not so small, right?
Right. And look, one of the things that I find really challenging, and I would love to hear your opinions on this, is when you have huge teams working on a monorepo. So they believe in the benefits that a monorepo would provide, but they're trying to enforce that kind of system.
And how does that work in terms of like commit standards and even standards in terms of best practices? So that's why I really do lean even towards like, there's definitely a people element to this and maybe starting from there. Tools obviously are an important factor, but I think I've seen a lot of the people problems and that maybe if we started from there and try and solve things from there and build from there in terms of the paradigm we should be using, as opposed to, because not every project is alike, not every company is alike. And there's obviously going to be a temptation with the bigger companies in the industry. I've seen it where it's like, oh, but Google are practicing this, therefore, let's do that too. It was like, we're not Google. And our team is not structured like the teams they have there. We don't have the same kind of maturity they have in terms of the additional tooling they're using to make things more efficient. So yeah, those would be my thoughts on that particular area. Yeah, I just want to clarify that I mentioned teams or the tooling piece being like kind of the genesis of the problem, but I agree a hundred percent. It's all about scale. So I'm talking about running these repos at scale, but yes, of course. And that's why I said hybrid, right? Remember that. Oh yeah. Victor, could you please share your thoughts? Sure. That's actually a very good, very interesting point. And I think that it really depends on, again, how you define a monorepo. If you look at the Google scale or like Facebook scale, Microsoft scale, well, yeah, a lot of tools for sure do not work really well. You have to use like GVFS, do lots of interesting things so you can actually traverse the repo. So I personally work with a lot of large companies building large systems, but they don't have like one uber monorepo that runs the whole organization, right? It's usually like we have 10 teams running a monorepo that contribute to one system, right? 
And once you go to that scale, it's not that, you know, it works, right? So like you have to go past that. So I think that for most organizations, one uber monorepo won't work anyhow for organizational reasons, right? It's like, they can't even pay for the same, like, Azure box, right, to run CI/CD on, because two orgs have different budgets, right? So let alone tooling, those organizations can't agree just for business reasons, right, to move together. So when I talk about monorepos and the benefits of those, I don't mean Google monorepo scale-wise, I mean Google monorepo style-wise, right? You have the same type of tooling, but a smaller, a much, much, much smaller system. Let's say we have a thousand developers working on it, right? Where you have maybe tens of millions of lines of code, right? We're not talking about a massive system that's around the world, more like we have, you know, 10 apps that comprise this, you know, experience that customers of, let's say, a bank interact with, right? So at that point, tooling is less of an issue, right? When it comes to how teams should coordinate, I think that it's true that at the end of the day, everything ends up being a people issue in that you have to work together, agree on standards and stuff, right? But it's not just people, like, I don't know which one comes first. If you take something like social media as an example, right, and say, yes, if we were all normal people, it wouldn't be difficult on social media, we'd be nice to each other, but we are not, right? But it's partially because social media forced us to be not nice to each other, right? Because of the way, you know, the incentives are there, right? Same with tooling. If you have good tooling, it allows you to separate your code from everyone else's code, to enforce best practices in a relatively straightforward way.
You know, like people kind of coordinate better, right? Some of the pain goes away and you have a nice experience, right? Where you can actually deal with things. So I don't know which one is more primary, right? I'm a tools provider, I'm not an agile consultant, right? I'm providing tools, so obviously I'm skewed towards tooling, because I feel like that is something that has a relatively low cost and can be integrated relatively fast compared to changing organizational structure or culture or even human nature, which is a lot harder to do. So those things, okay, I let ThoughtWorks people fight that battle, right? I'm not fighting that battle. I'm fighting the battle that certain things should be simpler, right? Some interactions should be simpler, such that the cost of an interaction is lower, right? And when it comes to a previous point, actually I wanted to add a comment, but I wasn't sure if I should interject. I actually think that another thing about monorepos that I find interesting is that folks often assume that if you're in the same repo, you're necessarily more coupled to each other, right? But it's not hard to imagine that with proper tooling, it would be the other way around, right? The Google repo is not more coupled than a polyrepo organization, right? You can have boundaries within a monorepo that are just as strict as if you had a polyrepo. So those things you have to agree on, right, for sure. But it doesn't mean that you're necessarily building a convoluted monolith where you cannot draw organizational boundaries. You can draw organizational boundaries, you can do all the things that Conway's law would require you to do, right? So, okay. So it's not really one thought, because there were a lot of points made before, right? So just to tell you, tooling is the foundational thing that you can cheaply buy or integrate compared to lots of other things, right?
And then once you have it, you need to solve people's issues because at the end of the day, that's what results in your code being convoluted and stuff, right?
Sounds like you're in my camp with hybrid.
I mean, I think the word hybrid as well, like if you have a repo for a suborg, it is a monorepo for that org, right? If it doesn't interact with other monorepos, like if you're on an island and you have a single boat going back and forth, it's not really a hybrid. You're still on your own, right? So I wouldn't call it a hybrid, but if that's how we define hybrid, then yes, right? I think that most organizations should go hybrid. Having one uber-large repo for the organization just doesn't work, let alone tooling, right? They won't be able to buy machines or whatever resources to pay for it because they won't be able to agree on budget, right? So there are lots of problems there for most orgs.
Yeah. The hybrid for me is the ability to just, like I said, introduce the code bases you need to introduce to your automation, right, your processes, your build processes when you need to. So to me, it doesn't matter where these things live and how they live. If they're separate or if they're coupled, I don't care as long as my tooling can, you know, complete and execute the tasks I need it to complete when I need it to complete them.
Yeah. No, I kind of get it. I haven't had the point, but I know that you have to say it.
I'm going to interrupt because I've had my virtual hand up for a while and now I have my physical hand.
I just wanted to ask you, please.
Yeah. So let's backtrack everything a little bit, right? Like I just, I worry sometimes, like, who is our audience? We're throwing so many things out there, right? Let's make this a little bit more concrete. We keep talking about tooling and getting stuff in repos. All right. So making it more concrete, what do we have for tooling? We have branch protection, we have pull requests, right?
How do you want to fit all that together? And when I talk to customers, I talk about, you know, business units, fruits and veggies, fruits and veggies. And then I have dev, QA and production, dev, QA and production. Think about that math. It grows almost, it feels like exponentially. I'm terrible at math, so probably not exponentially. But it is going to be a combination of tooling and people, and just sit down and plan that. Right. And it's your business driving. How do you need to separate your groups? And use names. Don't use foo and bar. Fruits and vegetables. How do they work together? And again, maybe I'm going to ask fellow panelists what other tools they think of, but the ones that always pop into my head first are pull requests and branch protection.
Well, those elements of pull requests, right? Those are outside of the core versioning tools, right? So that's a product of GitHub. You know, they call it something else in another product. So I think, you know, looking at the tooling natively, then I agree 100%. There's obviously a people element to that. But I think that it's a conversation to be had outside of the monorepo. Because once you figure out the continuous integration portions of your processes, right, that's, in my opinion, the developer people portion where they're collaborating around code, right? I don't think it matters where the code lives and how it gets integrated into your processes, right? So for me, they're two separate things, though they are related.
Okay. Thank you all. Next question, about statistics. So you just said a lot of interesting things about those approaches, but what about statistics? Could you share some statistics like development speed, again, the CI/CD process, code duplication, and so on, indexing? And, so, Luke, what do you think about this?
Yeah, I guess I could speak a bit about the deployment bit. And just if I am understanding the question correctly.
I mean, so you have, again, this is kind of going back to the whole it depends or case by case situation, right? So with monorepo, depending on how you've structured it, you could have a relatively quick deployment stream in that regard, if it's structured right, again, and then depending on the additional tooling that you're using, and this is going to vary. The norm of what you will find is that if you're looking is that the larger that and more complex that that becomes, the slower things may get. But even that is still kind of subjective, because it still comes down to what tooling are using, what is the paradigm that you're actually using in this approach? Whereas naturally, with a multi repo, because things are smaller and a more micro sort of basis, things will be a lot faster. So I think in that particular regard, you can expect things to move much faster, statistically speaking. But even then, it's so hard to say that that's a universal thing, because this is very case by case, so to say, across projects. So yeah, those would be my thoughts on that. Angel, you already said about indexing security. What else you can mention here? Yeah, so you have to, I agree 100% with Luke, and everyone else who mentioned this, this is case by case, right? Everyone's teams are different. And I think Julie even talked, touched on this a little bit too. But you have to understand what, so to kind of go to Julie's point of bringing that back, we did kind of go all over the place. And it's probably my fault, but I tend to do that. But to kind of refocus all of this, you got to understand what you're inheriting, right? So like, as a developer, I've been to many organizations, I inherited people's code, we all inherit, right, at some point, some code. The biggest struggle that I ever had was exactly what we're talking about today is like, all right, what do I do with this? 
So it's a monorepo, how do I, I come from the world of I'll decouple the hell out of things because I like smaller, more manageable bits, right? But you can't go into a company like, I don't want to say the name, but there's a customer of ours that's huge, huge company, been around for a while, they have a monorepo that is immense, right? If they were to break that apart, it would just cost them millions, probably billions of dollars. We're not going to do that. So we have to operate within our context. And that's my point, right? You have to first assess what's going on with your teams, your code base, can things be decoupled, right, slowly, maybe, but should you decouple them, right? Like, so if your monolith is functioning and it's awesome, great, I don't see any reason why you should decouple it. But if there are improvements you can make that make things more efficient, to Luke's point, you know, smaller code bases run faster. I'm all about efficiency, getting my builds durations to be as small as possible. So look at things from that perspective, how can you optimize, but you also have that cost, right? Like, is it going to cost the company or the organization millions of dollars to implement a change and what is that return going to be on those changes, right? So that's why I'm a big, huge proponent of, I guess, the hybrid, because you will have situations where you're going to inherit these huge monolithic repositories and then you're also going to have to integrate some sort of new technologies or new services with that monolithic app, right? So do you couple your new microservices and merge them into that monolithic application or you just leave them decoupled and run in that hybrid state, right, where you have this one project where it's just running in this load or in this repo where it's mono and then you have these other smaller services that are maybe newly built and leave them as decoupled, right? 
I think the answer is leave them as hybrid, leave them decoupled if you think that that's the most optimal way. But if it's more efficient to couple them and have them in one, then do that as well. It has to be a choice made by the organization with their specific circumstances. I'll leave it at that.
Okay, I see. Thank you. Victor, would you like to share some statistics?
So I think if you want to make a case for poly or mono, you can always come up with some numbers to prove the case. So the way I would measure, right, is sort of the tooling performance and then the organization performance. Tooling performance would be things like CI/CD average time, worst case time, things like that, right? And then, I mean, depending on how you measure, you can make mono or poly fast. For example, if you have a system consisting of like four modules that can be in one or multiple repos, they form, let's say, a diamond shape. Yeah? So you can make a change to the bottom one. Do I care about verifying the bottom one? Or do I care about integrating the bottom one all the way across to the top one, right? And do I basically count this CI/CD run, in the case of poly repo, for all the CI/CD runs that will be required, right? Things like that. So if you start doing that, then, you know, depending on what you choose to do, right, I think it ends up being the same, right? In that, of course, everyone distributes CI, right? Even if you have a small repo, you should distribute CI. You don't run on one agent. Once you start distributing CI, it doesn't really matter whether you have a mono or poly repo, in that you still run your CI on like 30 boxes at the same time or 50 boxes, however big your repo is, right? So you can always make your mono repo or poly repo fast enough. What counts as fast enough depends. To you, let's say it's like 20 minutes in the worst case scenario, right? That can be achieved for a repo of pretty much any size, right?
Unless you have one large node that cannot be divided, at which point, again, it's a moot point in poly and mono repo because you have the same node that cannot be divided in either case, right? So I think it doesn't really matter. So in terms of tools performance, it can be made to work with either. Organization performance is a different thing, right? Then you start measuring things like, what if I want to share code? Like, this is my intent as a developer. How long does it take for me to share code between myself and another team member, or someone from a different project, for example, right? Does it take an hour, a day, a week, a month, right? And in a mono repo, that is much smaller. That's the benefit of the mono. The sharing is easier, right? But if you measure different things, then you can make poly repo look better. It really depends on what you care about, right? Most organizations I work with want to share more, not less, right? They are too divided along the lines of business and stuff, right? So for them, they look at sharing as something to optimize for, right? So the mono repo is good in that case, right? But if you have one, you know, glued together, and you want to divide them more, perhaps you want to share less, right, for organizational reasons, at which point you could still use a mono repo to do that, but poly repo gives you that in a more natural way, right?
Yeah, we're almost running out of time. I'd like to ask Julie. That's your last answer. What do you think about this?
It all depends. I think we can continue in the spatial chat for people who want to debate with us.
Okay, I think. Thank you all. It was very interesting. Thank you.
Bye bye. Bye bye.
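Victor's diamond example from the panel (four modules, a change made to the bottom one) can be sketched in a few lines of Python. This is only an illustration under assumptions, not how any particular monorepo tool works: the module names `bottom`, `left`, `right` and `top`, the `DEPENDS_ON` table and the `affected_by` helper are all hypothetical. It shows that a change at the bottom of the diamond touches every module above it, so the set of things to verify is the same whether the modules live in one repo or in several.

```python
from collections import deque

# Hypothetical four-module system in a diamond shape:
# "top" depends on "left" and "right", which both depend on "bottom".
DEPENDS_ON = {
    "bottom": [],
    "left": ["bottom"],
    "right": ["bottom"],
    "top": ["left", "right"],
}


def affected_by(changed, depends_on):
    """Return the changed module plus every module that depends on it,
    directly or transitively."""
    # Invert the graph: for each module, list the modules that depend on it.
    dependents = {name: [] for name in depends_on}
    for name, deps in depends_on.items():
        for dep in deps:
            dependents[dep].append(name)

    # Breadth-first walk upwards from the changed module.
    seen = {changed}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for dependent in dependents[current]:
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen


print(sorted(affected_by("bottom", DEPENDS_ON)))  # ['bottom', 'left', 'right', 'top']
print(sorted(affected_by("left", DEPENDS_ON)))    # ['left', 'top']
```

Whether you count only the run for `bottom` or all four runs is the measurement choice Victor describes; the graph itself does not change with the repo layout.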
Infra vs Apps – Where are my Pipelines?
DevOps.js Conf 2021
32 min
Automation of a single monolithic app is pretty straight-forward. Split it into a frontend and backend and it's still manageable. Throw in more components or infrastructure and suddenly you're scratching your head at why a build ran - or didn't run. How many pipelines do I need? How many git repos should I have? Let's walkthrough use cases from small teams who own their entire stack to organizations with central IT units that manage shared infrastructure. Learn which scenarios and criteria determine how to slice but not spaghettify your pipelines.