How Baselime created a culture where it's possible to move fast, break as little as possible, and recover from failures gracefully. The culture is technically underpinned by Node.js, Event-Driven Architectures (EDAs), and Observability (o11y).
Creating an innovation engine with observability
AI Generated Video Summary
Baselime provides observability for serverless architectures and has created an innovation engine within its team. The team measures its performance using the DORA metrics from the Accelerate book. Baselime emphasizes the importance of foundations, streamlined testing, and fast deployment. They practice observability-driven development and incorporate observability as part of their development lifecycle. Baselime believes in building a culture that fosters ownership and democratizes production.
1. Introduction to Baselime and the Innovation Engine
My name is Boris. I'm the founder and CEO of Baselime. We provide observability for serverless architectures. Today, I want to share how we have created an innovation engine within our team. We ship fast and I will discuss the methods we apply. The question of how well a team is performing can now be answered using the DORA metrics and the Accelerate book. We measure deployment frequency, time to go live, deployment failures, outage recovery time, and mess around lead time.
My name is Boris. I'm the founder and CEO of Baselime. What we do is observability for serverless architectures. So I'm sure most of you guys have heard the word serverless multiple times today, from the talk earlier in the morning to all the demos that have happened since then, and a lot of emphasis is put on how to deploy code onto the cloud, et cetera, but very little effort is actually put into how we actually run and maintain this code over time. And that's the sort of solution that we provide for people that are adopting serverless architectures.
But what I want to talk about today, actually, is something completely different: how internally within the Baselime team, we have been able to create what I like to call an innovation engine thanks to the observability that we have. So compared to other startups at very similar stages of life, we are at this point where we ship really, really, really fast. And I want to share with you the, I wouldn't say tricks, but the methods that we apply so that we can ship so fast. So, the first thing is, is anybody here in the room dealing with tech debt at their job right now? I see one hand, two, oh, wow. Almost all the room. Is anybody dealing with flaky tests? Is anybody dealing with CI/CD pipelines that never work when you need them to work? Again, almost everybody. And that's what I don't like. When we signed up to be software engineers, and for a lot of us cloud engineers, what we wanted is to create things and put them in the hands of people and make that innovation happen and see how people are interacting with those things that we create and we put on the web. But we are left day to day dealing with tech debt, flaky tests, and all of that, which is basically just slowing us down and preventing us from innovating every single day.
And there's this question that comes up a lot in conferences and tech conversations: how well is your team performing? And what this question actually means is, how much innovation is your team shipping every day? And up until very recently, there was no real way of actually answering this question, honestly. People will say, oh, we're doing well, but there was no way of quantifying it. Up until the DORA metrics and the Accelerate book. I hope everybody here has read it. If you haven't, please get a copy. And it gave us a scientific framework that we can use to actually be able to say, okay, we're in the top 10% performing teams. We're in the top 20% or we're in the bottom 10%, and we need to do a lot of work to get out of there. And to be able to answer that question, there are a few metrics that we need to measure. The first one is, how often do you deploy, that's your deployment frequency. Second one is, how long does it take for code to go live? So from a developer writing code in their code editor locally to that code being live in production used by real users, how long does that take? How many of your deployments fail? So when you deploy, you most definitely sometimes introduce defects into production. How often does that happen? And how long does it take to recover from an outage? So when someone introduces a defect to production, how long does it take for your team to detect that defect happened and ultimately fix it, either roll forward or roll back? And at Baselime, we have another one. It's a bonus one. We call it mess around lead time. And it's how long does it take from customer insight to production? So I'm here at a conference. I've spoken with a lot of people, a lot of experts in serverless really. And I've learned a lot, I've gotten a lot of insights.
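As a rough illustration of the four metrics just listed, here is a minimal Python sketch that computes them from a list of deployment records. The `Deployment` record and its field names are hypothetical, chosen only for this example, and recovery time is measured from the moment of deployment to the fix, which is a simplifying assumption rather than anything stated in the talk.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import Optional


@dataclass
class Deployment:
    # Hypothetical record shape, for illustration only.
    committed_at: datetime                   # when the change was written/committed
    deployed_at: datetime                    # when it went live in production
    caused_defect: bool = False              # did this deployment introduce a defect?
    recovered_at: Optional[datetime] = None  # when the defect was fixed, if any


def dora_metrics(deployments: list[Deployment], days: int) -> dict:
    """Compute the four metrics over a window of `days` days."""
    if not deployments or days <= 0:
        raise ValueError("need at least one deployment and a positive window")
    failures = [d for d in deployments if d.caused_defect]
    recoveries = [
        (d.recovered_at - d.deployed_at).total_seconds()
        for d in failures
        if d.recovered_at is not None
    ]
    lead_times = [(d.deployed_at - d.committed_at).total_seconds() for d in deployments]
    return {
        "deployment_frequency_per_day": len(deployments) / days,
        "mean_lead_time_minutes": mean(lead_times) / 60,
        "change_failure_rate": len(failures) / len(deployments),
        # None when no defect has been recovered from in the window.
        "mean_time_to_recover_minutes": mean(recoveries) / 60 if recoveries else None,
    }


# Two made-up deployments on the same day; the second one introduced a defect
# that was fixed 30 minutes after it went live.
t0 = datetime(2024, 1, 1, 9, 0)
sample = [
    Deployment(t0, t0 + timedelta(minutes=2)),
    Deployment(
        t0 + timedelta(hours=1),
        t0 + timedelta(hours=1, minutes=4),
        caused_defect=True,
        recovered_at=t0 + timedelta(hours=1, minutes=34),
    ),
]
print(dora_metrics(sample, days=1))
```

The sample data is invented; in practice the records would come from whatever your CI/CD pipeline and incident tracking already store.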
2. Insights to Production and Deployment Process
Innovative teams have foundations laid, no flaky tests or bad CI/CD pipelines. Low performing teams experience chaos and spend time fixing instead of shipping. Smaller deployments lead to faster detection and recovery time. Baselime has no red tape around deployment and streamlined testing. The bottleneck is deploying to the cloud. They don't test everything and don't do code reviews.
How long does it take from those insights that I have now to being in production at some point in the future? What's that time gap? And this is what innovative teams look like. Every single day, they have the foundations laid, they don't have flaky tests, they don't have bad CI/CD pipelines, and they keep innovating, adding blocks on top of what they already have, such that innovation happens every single day.
And for low performing teams, this is what it looks like. Complete chaos, nobody knows what's happening, and every single day, instead of shipping stuff to production that is actually helpful to your users and customers, you are fixing stuff. You are fighting CI/CD pipelines. That's not what we signed up for, and we want to move away from this.
And when you start moving away from it, it's a self-fulfilling prophecy. That's the expression. Smaller deployments lead to faster deployments. Faster deployments lead to faster detection time. Faster detection time leads to faster recovery time. And if you know that you can recover from outages quickly, you will deploy more often, you will innovate more often.
So, what does all of this look like at Baselime? So, our deployment frequency is whenever you want. You make a typo change in the frontend, git push, gets deployed. We spend the whole day working on this huge migration, and blah, blah, blah. Git push, gets deployed. There is no red tape around deployment. And that's something that we need to introduce more into our development cycles. Because those red tapes that we introduce, they seem like they're helping us be more productive, but actually they're just slowing down everybody.
What's our mean lead time for changes? So how long does it take for somebody writing code on their code editor to that code being in production? That is actually however long infrastructure-as-code takes. We use infrastructure-as-code to manage all our infrastructure, and every time you git push, our CI/CD pipeline picks it up, and it builds the artifacts, and it deploys them to the cloud. And the bottleneck in our process is actually that deploy-to-the-cloud piece. It can take maybe two minutes or so. And the reason we are able to achieve this is controversial. We have very streamlined testing. So we don't test for the sake of testing. We don't have testing suites that take 15 minutes to test buttons, et cetera. We test the critical path in our software, and the rest we are going to discover if there is a problem thanks to the observability that we have. And the second thing, probably even more controversial, we don't do code reviews. I know a lot of people are not, I hear a smile there.
3. Code Reviews and Change Failure Rate
Code reviews have been a legacy practice and are common in the open source community. However, within a team, trust is placed in team members to ship code to the team's standards. The majority of issues caught in code reviews can be prevented through automated processes like linters and testing. For more challenging tasks, the Baselime team practices on-demand pair programming. Their change failure rate is approximately 10%, with one in ten deployments introducing defects in production.
It's not something that you hear a lot, but I think the idea of code reviews comes from two things. It's legacy practices. We have always done code reviews, so we are going to continue doing code reviews. And the second thing... Thank you.
The second thing is really open source. The open source community. When you're running an open source library or an open source framework, you need to have code reviews because anybody can commit to your repos. You need to have processes in place to make sure that the code that gets in your repo is what you expect. The quality that you expect. The security standards that you expect. But within a team, well, you were hired in this team. It means that we trust you to be able to ship code to the standards of this team. Why do we have to block one or two senior or staff engineers to read your code, review it every single time? When, let's be honest with each other, for the vast majority of the things that get caught at the code review stage, linters, testing, and all of that automated stuff can catch them without the whole code review process. And more importantly, sometimes there are things that are actually hard. Like I write this code but I'm not super confident that this is what should go into production. The way we handle that within the Baselime team is pairing. You write in Slack: hey, guys, I'm working on this thing, it's kind of hard, if someone has a second. Two people, three people join you. You pair for three, four, six hours, the whole day, and you ship a product that doesn't need to go through code review. So we do pair programming, but on demand, rather than all the time.
So our change failure rate, about 10%. What that means is one in ten deployments introduces a defect in production. This is still a pretty good number, but we have to remember that I personally deploy 20 to 30 times a day. We're a team of three. I'm bad at math. It's a lot of deployments every single day. So 10% of those introduce a defect in production. Now those defects, sometimes, they're typos.
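The arithmetic behind those numbers is short enough to spell out. This sketch uses only the figures quoted in the talk, a change failure rate of about 10% and 20 to 30 deployments a day for one person. The function name is made up for the example, and no team-wide total is computed because the talk does not give deployment counts for the other two engineers.

```python
CHANGE_FAILURE_RATE = 0.10  # "about 10%", as quoted in the talk


def expected_defects(deployments_per_day: int, failure_rate: float = CHANGE_FAILURE_RATE) -> float:
    """Expected number of defect-introducing deployments per day."""
    return deployments_per_day * failure_rate


# 20 to 30 deployments a day is the speaker's own personal figure.
for deploys in (20, 30):
    print(f"{deploys} deployments/day -> about {expected_defects(deploys):.0f} defects/day")
```

So a single engineer deploying at that rate should expect roughly two to three defect-introducing deployments a day, which is why detection and recovery time matter so much in this way of working.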
4. Defect Recovery Time and Culture
Our recovery time for defects in production is less than one hour. From customer insight to production, it takes about half a day. It's all about the culture, trust, and tooling within the team. Many question if this approach works for everyone and if it's possible to have high velocity and quality software at the same time. The answer is yes, with the right culture, processes, and tooling.
Sometimes they are worse defects than typos, but that leads us to the next thing, which is our recovery time. When a defect gets into production, how long does it take for us to detect that problem and to fix it? On average, less than one hour. So sometimes it's five minutes because it was a typo, sometimes a bit longer because it was a bigger problem. But we bet on our observability to be able to know about these defects so soon that we are able to fix them before they actually impact anybody who is actually using the application out there in the wild.
And our mess around lead time. This is my favorite one. From customer insight to production. And so customer insight can be anything: a conversation, looking at our analytics dashboard, a customer interview, all of that. From insight to production, about half a day, typically. And the great thing about this is it has nothing to do with coding ability. It has nothing to do with credentials. It's all about the culture of your team. It's all about the trust that you have in your teammates. It's all about the tooling that you put in place so that you're able to know about these defects before they actually impact real users.
And the question I get a lot is, is this for everyone? Oh, Boris, you know, you are in a small start-up. It's fine for you to, quote-unquote, YOLO to production. It will not work anywhere else. And this question usually comes with all of that. That approach doesn't scale. How about migrations? You need controls and manual checks around deployment. You need QA. What if you break production? Well, what I'm hearing, actually, is you cannot have high velocity and quality software at the same time. That's what all those questions are saying, that it's impossible to move fast and not break things. And my answer is yes, we can. And we can with the right culture, the right processes and the right tooling. So how do we achieve this at Baselime? It starts with our company culture. So this is one of the most fundamental of our company values, and this is the toned down version because we couldn't really put fuck around and find out on our website. We had to settle for mess around. But the idea is very simple. You are experimenting, you are innovating. You don't know the answers to these questions.
5. Innovation and Observability-Driven Development
The first company value is to mess around and find out: experiment, ship, and fix what turns out to be wrong. The second is to ship skateboards: build the smallest version possible of anything, put it in production, see how people react to it, and keep iterating on it, or scrap it if nobody cares. One of the other key pillars is observability-driven development.
Like there was a talk today about async hooks and how it changed from Worker to Node.js, et cetera. But they were experimenting. They were innovating. They found the first solution that wasn't the right one, and now they're fixing it. That's how software is built. You need to mess around and find out if it actually works for people out there.
The second one is ship skateboards. Nobody on our team skates, maybe one person, but the skateboard emoji is the one that is used in our Slack the most. And the reason is we ship skateboards. And this is what it means. We build the smallest version possible of anything. We put it in production. We see how people react to it, and we keep iterating on it until we get this really fancy car. Or, midway through, we realize that actually, nobody cares about this thing that you just built and you shipped. Scrap it. Remove it. There is no point building this really fancy car, spending six months, deploying it once a month, and discovering that it was completely useless one year later when you could have shipped the smallest version and iterated on it.
So, one of the other key pillars is observability-driven development. Is anybody here familiar with the concept of observability-driven development? Just a few hands. Cool. So, I'm gonna start by telling you what observability actually is, and I have a short video of me here. But before I play the video, I'm not from a software engineering background. So, I studied aerospace engineering, and I was quite shocked when I moved into software a few years ago, and I realized that people actually had no idea what their systems were doing in production. People didn't know how many nodes they had. People didn't know how their users were experiencing their apps. And the reason I was shocked was this. So this is a much younger and skinnier me a few years ago, I was working on drones. So we would fly those drones, pack them with as much instrumentation as possible. Collect as much data as possible about the flight path and the air velocity and all of the things that you can imagine about that drone flying. And when it would land, I would take the stuff out, take the SD card, plug it into my computer.
6. Observability-Driven Development
Observability enables you to ask arbitrary questions to your systems and get answers without code changes. Observability-driven development is about incorporating observability as part of your development lifecycle. You need to instrument your code and actively instrument your application. Testing in production is controversial, but necessary for building distributed systems. Observability without action is just storage. Constructing tight feedback loops is critical for quick innovation.
And from that data, I could tell you absolutely everything that happened to this aircraft whilst it was flying. I didn't need to actually go back and fly the aircraft for whatever reason, because the data was telling me everything. And more importantly, with that data, I could actually build models that will predict how the aircraft will behave in new scenarios that were not part of this flight.
Imagine then when I joined, I started being a software engineer and people are like, oh, we don't know, we need to add logs. Like, what do you mean you need to add logs? Why didn't you have logs in the first place? Why didn't you have tracing in the first place? It's not something that you do at the end. Your feature is not complete until you have proper monitoring, proper observability, dashboards, queries, and alerts on that specific feature that you just built. That's not the right button. And that is what observability is about.
Observability enables you to ask arbitrary questions to your systems and get answers without code changes. That's the critical part, because that issue might happen under such rare conditions that you're actually not able to reproduce it. You should be able to know exactly what happened without changing the system that was in production in the first place. So, what is observability-driven development? I think I'm going to repeat myself a little bit. Observability as part of your development lifecycle. So, when you write code, you need to instrument it. You don't ship code that is not instrumented. Secondly, actively instrument your application. Pretty much the same thing. And the last one, a bit more controversial, testing in production. A lot of people want to be able to replicate everything locally. That's great, but when you're building distributed systems, especially with serverless architectures, a lot of the issues that will happen are emergent behaviors that will be extremely hard to reproduce locally.
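The "you don't ship code that is not instrumented" rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Baselime's implementation: the `instrumented` decorator, the `get_dashboard` handler, and the event fields (`user_id`, `dashboard_id`) are all hypothetical names invented for the example. The idea is that every invocation emits one structured event, so questions can later be asked of production data without a code change.

```python
import functools
import json
import time


def instrumented(emit=print):
    """Decorator factory: emit one structured JSON event per handler invocation.

    `emit` is any callable taking a string; it defaults to printing to stdout.
    """
    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(event, context=None):
            record = {"handler": handler.__name__, "user_id": event.get("user_id")}
            start = time.perf_counter()
            try:
                result = handler(event, context)
                record["outcome"] = "success"
                return result
            except Exception as exc:
                # Record what failed, then re-raise so behavior is unchanged.
                record["outcome"] = "error"
                record["error_type"] = type(exc).__name__
                raise
            finally:
                record["duration_ms"] = round((time.perf_counter() - start) * 1000, 2)
                emit(json.dumps(record))
        return wrapper
    return decorator


@instrumented()
def get_dashboard(event, context=None):
    # Hypothetical handler, standing in for a small serverless function.
    return {"statusCode": 200, "dashboard_id": event["dashboard_id"]}


get_dashboard({"user_id": "u-123", "dashboard_id": "d-1"})
```

Because the event carries the user, the outcome, and the duration, a question such as "which users hit errors on this handler yesterday?" becomes a query over existing data rather than a new deployment that adds logs.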
Imagine having that plane that I was flying and trying to reproduce the gust conditions that happen in the air right on the ground station. That would be pretty much impossible. That's the same thing with distributed systems in the cloud. This is one of my favorite quotes, observability without action is just storage. A lot of people have Elasticsearch that is getting all the logs and they're happy with that. You're paying for storage. If you're not using that data, if you're not querying that data every single day to understand your systems, you're just paying for storage. You're not paying for observability. And the most critical thing to actually be able to innovate quickly is to construct tight feedback loops. And yeah, I'm going to take an example to explain how that works.
7. Customer Call and Deploying to Production
So this was about a month ago, month and a half. I go on a customer call and he tells me, on my Lambda functions, sometimes DynamoDB throttles. It's difficult for me to know that. Thomas shipped the first version of the feature the next day. Everything fails all the time, but we cannot have engineers afraid to deploy to production. We don't have time for observability because we are too busy. Build a culture that fosters ownership, democratize production, instrument everything, ask questions directly to production, build your innovation engine.
So this was about a month ago, month and a half. I go on a customer call and I speak with this guy and he tells me, yo, this is great, but on my Lambda functions, typically I can run them but sometimes DynamoDB, hopefully people are familiar with DynamoDB, it's a database on Amazon Web Services, sometimes it throttles. And it's quite difficult for me to know that it throttled. That was one Wednesday afternoon around 4 or 5 p.m. that I had that call.
I dropped a message on Slack and I said, he was even more excited blah blah blah blah blah. And Thomas, who is on our team, said I know what I'm shipping tomorrow. That was at 6 p.m. one day. The next day at 5.42 p.m. he had the very first version of that feature shipped. It worked only for DynamoDB, none of the other AWS services, but we pushed it to production and were able to get feedback from it. And now we're iterating on it, adding other services and improving on that.
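To make the customer's problem concrete, here is a minimal sketch of spotting DynamoDB throttling in structured log events. It is not how the shipped feature works, which the talk does not describe. The event shape (`function`, `service`, `error_code`) is a hypothetical format invented for this example, and the error codes are, to my knowledge, the ones DynamoDB returns when it throttles a request.

```python
from collections import Counter

# Error codes DynamoDB uses to signal throttling (assumed complete enough
# for illustration; check the AWS documentation for your use case).
THROTTLE_ERROR_CODES = {
    "ProvisionedThroughputExceededException",
    "ThrottlingException",
    "RequestLimitExceeded",
}


def count_dynamodb_throttles(events):
    """Count throttled DynamoDB calls per function from structured log events."""
    counts = Counter()
    for event in events:
        if event.get("service") != "dynamodb":
            continue  # this first version handles DynamoDB only
        if event.get("error_code") in THROTTLE_ERROR_CODES:
            counts[event.get("function", "unknown")] += 1
    return dict(counts)


# Made-up events in the hypothetical format described above.
sample_events = [
    {"function": "checkout", "service": "dynamodb", "error_code": "ProvisionedThroughputExceededException"},
    {"function": "checkout", "service": "dynamodb", "error_code": None},
    {"function": "checkout", "service": "dynamodb", "error_code": "ThrottlingException"},
    {"function": "search", "service": "s3", "error_code": "SlowDown"},
]
print(count_dynamodb_throttles(sample_events))  # {'checkout': 2}
```

Like the first version described in the talk, the sketch covers one service only; other AWS services would need their own error codes.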
Oh, I see that I'm out of time. Well, the thing is, everything fails all the time. It's not all beautiful, you ship to production, it works. It fails all the time. But what we cannot afford is this, people being afraid to deploy to production. That's what we cannot have. As soon as an engineer is afraid to deploy to prod, we failed as a team.
Oh, one thing that I hear a lot is, oh, Boris, that's great, but we don't have time for observability because we have all these other things to do. And this is what I hear. We are too busy, because we are using that to work when we could be using that instead. And it doesn't matter how hard you work with this, you're always going to be slower than if you use the actual wheels.
So, to recap. Build a culture that fosters more ownership by your engineers. Democratize production to your entire team. Instrument absolutely everything. And get all your answers by asking questions directly to production, rather than trying to guess things. Build your innovation engine and spend less time managing tech debt.
Measuring Lead Time and Testing Approach
We measure lead time at the smallest level possible, shipping the smallest version of an idea to production within a few days. The metrics book mentioned is called Accelerate. We run end-to-end tests before deploying to production, using a test suite that runs every time. We deploy small serverless functions, making testing in isolation relatively easy. With trunk-based development, the risk of technical debt increases over time.
That's it for me. Thanks for having me. Do you measure lead time at the task level or at the project EPIC level?
Oh, we measure it on every single thing that gets to production, at the smallest level possible. So we don't work in large EPICs that take one month to build. We ship the very smallest possible version of it. We have this thing that we call scope hammering. So when someone has an idea, we actually take that idea and we're like, okay, what is the smallest version of this that can go to production within the next few days? And we keep reducing the scope of that idea until we have something that we are confident will be in production within a few days. And that's what we work on rather than trying to build something a bit bigger.
Awesome. Next one should be very quick to answer. What's the name or the metrics book that you mentioned?
Oh, Accelerate. I don't remember the name of the authors. But if you google Accelerate, it's a very long name, how to improve your development lifecycle or something like that, you're going to get the book.
All right. Thank you. Do you run end-to-end tests in some areas? If yes, when do you run them? In pull requests or using a schedule?
Oh, so that's very interesting. So given that we deploy so often, we don't need schedulers for doing things like that. So the test suite that we have gets run every single time, before something gets to production. And in terms of end-to-end tests, what we do, actually, is very interesting. We deploy very small serverless functions. So serverless functions do exactly one thing, not two things, like one very simple thing. It means that testing that function in isolation is actually relatively easy, because you just say, these are the inputs, what do I expect as outputs? There is very, okay, the database calls and the cache calls, everything like that, we mock it. But the actual tests themselves are very simple.
Alright, very cool. Accidentally archived one. I'm re-archiving that now. And yeah, with trunk-based development I noticed there is a higher risk of creating a lot of technical debt. It usually shows after seven years of fast development. What's your experience with it?
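The "inputs in, expected outputs out, database calls mocked" approach might look like the following Python sketch. The `get_user_plan` handler and its fields are hypothetical, made up for this example. The mocked `table` object stands in for a DynamoDB table resource whose `get_item(Key=...)` call returns a dictionary with an optional `"Item"` entry.

```python
from unittest.mock import Mock


def get_user_plan(event, table):
    """Small single-purpose handler: look up one user's plan."""
    item = table.get_item(Key={"user_id": event["user_id"]}).get("Item")
    if item is None:
        return {"statusCode": 404}
    return {"statusCode": 200, "plan": item["plan"]}


def test_returns_plan_for_known_user():
    table = Mock()  # the database call is mocked, no real table is needed
    table.get_item.return_value = {"Item": {"user_id": "u1", "plan": "pro"}}
    assert get_user_plan({"user_id": "u1"}, table) == {"statusCode": 200, "plan": "pro"}
    table.get_item.assert_called_once_with(Key={"user_id": "u1"})


def test_returns_404_for_unknown_user():
    table = Mock()
    table.get_item.return_value = {}  # no "Item" key means the user was not found
    assert get_user_plan({"user_id": "missing"}, table) == {"statusCode": 404}


test_returns_plan_for_known_user()
test_returns_404_for_unknown_user()
```

Passing the table in as an argument keeps the function easy to test in isolation; whether the team injects dependencies this way or patches them is not stated in the talk.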
Well, the company is not seven years yet, so I can't quite answer that. But the way we work is push it to main.
Branches, Code Reviews, and Defects
My experience with branches and all of that is always a nightmare. We have a very small team, but we have easily 100 repos. It's very rare that your push to main will actually clash with someone else's push to main. Interesting idea about not having code reviews. It enables you to move so much faster. One out of ten deployments introduces a defect in production, but very rarely are they experience-breaking. Defects are very localized due to our decoupled infrastructure. We are willing to get churn from a few users as long as we are able to innovate every single day and gain more users than we are actually churning.
This is very personal and biased I will say, but my experience with branches and all of that is always a nightmare. So we have multiple repos. We have a very small team, but we have easily 100 repos. So it's very rare that your push to main will actually clash with someone else's push to main. So just push to main.
Okay, cool, the one that I accidentally archived, now it's your turn. Interesting idea about not having code reviews. Did you measure if this works well for the business eventually? How do you know it's not making things worse?
Oh, interesting. So, I think it was a conscious decision for us to not have code reviews, and when the guys join, every single time someone joins the team, they're like oh my god, I can't work in this environment. And three weeks later, they're like oh my god, code reviews, who invented that? Simply because it just enables you to move so much faster. We don't have quantitative numbers that we can share because we started the company with that philosophy. We didn't switch to it, so we are not able to compare the cadence. But from personal experience at previous positions, it's definitely much, much faster.
Okay, there is one related to that. One second. Oh, that's an interesting one. So, I mentioned that one out of ten defects, approximately, one out of ten deployments introduces a defect in production. But very rarely are those defects actually experience-breaking, right? Like the way our infrastructure is done is so decoupled, and that's what you get almost out of the box with serverless architecture. If you work with serverless, typically you'll have a much more decoupled architecture. And, as a result of that, defects are very localized. Very rarely will a defect impact a large swath of users. So, more often than not, we are actually the ones telling customers, hey, there was this issue for ten minutes, you were not able to, I don't know, see your dashboards, but nobody opened their dashboard during those ten minutes. So, that's fine. I'm sure that, you know, there might be occasions where we would get some churn. But that's a decision that we made. We are willing to get churn from a few users as long as we are able to innovate every single day and gain more users than we are actually churning. Right. Yeah, there are more questions related to what has been discussed already. Maybe two more.
Considerations for Regulated Industries
In regulated industries like healthcare and banking, the practices may differ due to strict regulations. However, for industries without such regulations, it is important to focus on what works best for your specific industry rather than blindly following industry practices.
Yeah. Would you change something on your practices if you work in a healthcare or banking industry? Oh, yeah. So, huge caveat in all of this, we are building observability. Right, like we are not handling people's money. We are not handling people's health records. I'm sure that in regulated industries, this will be very, very much different. I spent a very short stint in a regulated startup and I absolutely hated it because of how slow it was. Well, if you're in that space, I don't know. But if you're in a space where you don't have those regulations over your head, don't apply the same principles that they apply simply because those principles are best practices. Do what works for your industry rather than copying what, you know, the rest of the industry does.