Hardly any people in Tech like when there's a lot of tech debt. And most of us would like when there's not too much of it. But how do we understand how much exactly we have of it? Where exactly does it sit? Which part of it is actually the most annoying? What would be the benefit for us if we spend time getting rid of it? When it comes to planning how you tackle your tech debt, all these questions deserve answers. Especially when we're asked about the ROI on our efforts to eliminate some annoying legacy stuff and build a new shiny module or service. Also, when we work on tech debt, we do want to tackle the most impactful parts of it first, don't we? This talk is about all of that: how we measure our tech debt, how we interpret the results of these measurements so that they give us the answers to the right questions, and how we guide our decision making with those answers.
A Quick and Complete Guide to Measuring Your Tech Debt and Using the Results
Hi, everyone. My name is Anton, and thank you for having me at TechLeadConf this year. In the next 20, maybe 25 minutes, I'm going to talk about measuring tech debt and interpreting the results. I hope this talk will be useful and interesting for you. But first, a couple of details about me. I lead engineering organizations. I am a director of engineering at Westing in Germany at the moment. I also mentor and coach other engineering leaders, like engineering managers and staff and principal engineers. So if you're looking for a mentor or a coach, feel free to contact me. I'm always open. Outside of work, I'm a father and a big fan of mountain skiing and hiking, amongst other things. Down below, you can see my Twitter handle and my contact info: my email address and the link to connect with me on LinkedIn. And with that, let's dive right in. Let's start with a shocker. "What if I told you," says our dear friend Morpheus, "that not all your tech debt actually needs fixing?" I fully support this statement, or this question, rather. To understand how it can be that not all tech debt needs fixing, and how that contradicts the relatively popular belief that tech debt is generally the root of all evil, let's look at a few things. Let's start with what tech debt actually is, if not the root of all evil. The term was coined by this guy, Ward Cunningham, in 1992. He was much younger at that time, because this is a screenshot from his explainer video that he posted on YouTube in 2009. The video is called Tech Debt Metaphor, and he talks about it there in depth. And here are a couple of quotes, which we're not going to read. We'll just skip over them. So according to Ward Cunningham, the author of the term tech debt, it is not the root of all evil.
What it is, is a tool to temporarily speed up development: we choose to cut corners to speed up now at the expense of slowing down later. Much like taking a loan, we can afford something before we have fully earned it, and for that we have to pay interest. So since it's a tool, is tech debt inherently bad? Well, let's ask our dear friend Morpheus, and he will tell us that the correct answer is: it depends, as with many questions in the software engineering domain. Well, how does it depend? That's the interesting part, right? What does it depend on, and how can we see what it depends on? First, let's look at how we tell if we have tech debt in the first place. We can do that by looking at our piece of software and answering a number of questions. First, is it easy to analyze and understand this piece of software? Second, is it easy to modify? Third, is it safe to modify? And finally, the fourth question: are there any needed technical requirements, for example scalability, stability, or security requirements, that aren't implemented? If the answer to any of the first three questions is no, or the answer to the fourth is yes, we do have tech debt in this piece of software. Additionally, I like to distinguish between two types of tech debt. The first is maintenance tech debt. This is the tech debt related to the first three questions, which is basically the part of tech debt that slows down our changes, be it features or other types of changes in our code base. We implement them slower because of the tech debt, because the design is not perfect. The second type is continuous tech debt, which means that, because of some tech debt in our application, we need to spend some time on keeping the application operational. This tech debt actually shrinks the bucket of time that we can spend on introducing changes into our code base.
So this is how these two types of tech debt are different, and this is how they differently define the answer to the question, when does tech debt become bad, actually? Because not all of it is inherently bad, but when does it become bad? Well, continuous tech debt is immediately bad as soon as we deploy it to production. So any shortcuts, reliability, scalability, security shortcuts, produce some first order consequences like bugs, downtime, security breaches, God forbid, stolen data, which then in turn all produces ad hoc effort, context switching, which is, as we know, another popular productivity burner, sometimes lost revenue, sometimes things that are even worse than that, like long-term reputational damage and so on and so forth. So we don't want that. And continuous tech debt is immediately bad, and it is posing all of these risks on our product. Now, maintenance tech debt only becomes bad when you need to introduce changes in the area where you have it, which makes sense, right? Because if something is working, you don't care about how bad the design of it is, as long as it's working and you don't need to introduce any new requirements there. So for instance, we're in the microservices world, right? So it can be a simple microservice, poorly designed, made 10 years ago, that sits there and just works and has worked ever since because there were no further requirements to implement and we did not need to care about the tech debt that sits in there. What that all means is that the main questions about tech debt to ask are actually, how do we tell which tech debt needs fixing, at least right now? And also, how do we tell when the tech debt that needs fixing is getting out of hand so that our backlog of the tech debt that actually needs fixing short term is getting too big to fix short term? 
And the answer to that is in the tech debt metrics, or tech debt-related metrics, which I like to divide into three main buckets: heuristic metrics, second tier metrics, and a bucket that contains only one metric, tech debt interest, and you will see why it's so special. Spoiler alert, it is special. So let's go one by one. Heuristic tech debt metrics. When we hear the word heuristic, it's usually about something automated, right? And these metrics are no exception. They are automated. They are usually provided by tooling that already exists. Most of this tooling measures things like cyclomatic code complexity, code duplication, and code smells, a notion popularized by Kent Beck in the 1990s and then hugely promoted in the book Refactoring, which many of you may know, written by Martin Fowler and published in 1999. Then there will be something like a maintainability index, which can have a different name but is generally an aggregation of the above metrics and potentially something else. Then there will be the tech debt ratio, which is the ratio of tech debt remediation cost to development cost. Unfortunately, despite having cost in it, and despite this metric allegedly being about direct business impact in money, this cost is too synthetic and therefore too inaccurate. Usually it's determined by the number of lines of code you have in your code base multiplied by some synthetic quotient, the cost of developing a line of code, which as you can imagine can vary depending on the line. So it's rarely a good metric to show the actual effort. Then there will be two more metrics: statically or heuristically detectable security issues, and heuristically detectable potentially missed edge cases, which are especially important in loosely typed languages where we don't have a compiler to detect those cases.
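As a rough illustration of the tech debt ratio just described, here is a minimal sketch that divides a remediation cost by a synthetic development cost (lines of code times a cost per line). The function name, the unit (hours), and all the numbers are assumptions made up for this example; real tools use their own formulas and defaults.

```python
def tech_debt_ratio(remediation_cost: float,
                    lines_of_code: int,
                    cost_per_line: float) -> float:
    """Remediation cost divided by a synthetic development cost.

    The development cost is approximated as lines of code multiplied by
    an assumed cost of developing one line, which is exactly why the
    talk calls this metric synthetic.
    """
    development_cost = lines_of_code * cost_per_line
    if development_cost <= 0:
        raise ValueError("development cost must be positive")
    return remediation_cost / development_cost


# Hypothetical numbers: 120 hours of estimated remediation work,
# 20,000 lines of code, 0.06 hours assumed per line.
ratio = tech_debt_ratio(remediation_cost=120.0,
                        lines_of_code=20_000,
                        cost_per_line=0.06)
print(f"tech debt ratio: {ratio:.1%}")
```

Changing the assumed cost per line changes the ratio directly, which shows how much the result depends on a quotient nobody actually measured.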
And finally, these metrics can also be divided into the two buckets that we talked about previously, maintenance tech debt and continuous tech debt. It's just that continuous tech debt here is not something that takes us by surprise, but rather something that we can detect while analyzing the code and potentially fix, which is generally a good idea. Now, we mentioned tools, right? These are the tools that I put there off the top of my head, probably the biggest ones: SonarQube, Stepsize, and Code Climate Quality, I believe it's called. And there are other tools, CLI tools, tools with a UI, and what have you. So let's talk about the pros and cons of heuristic tech debt metrics. Among the pros, there is the ease of getting the numbers for the full code base at once. Basically, once you choose the tool and set it up, you get these numbers, with all the necessary splits across modules, folders, and so on, in minutes. Then there is the ease of segmenting the metrics, so getting the split by module or what have you that I just mentioned. This is useful for detecting potential hotspots, places where the metrics show more tech debt than elsewhere, which are spots you potentially want to take a closer look at. Now there are cons, obviously. One is that it's hard to convert these metrics into the amount of actual work that the tech debt behind them requires. Take cyclomatic complexity. You know that in this class or in this method it is 15. What does that give you in terms of the effort to fix it? Practically nothing. You will still need to look into it, estimate, and interpret it somehow. It's also hard to convert them into business impact. Again, a cyclomatic complexity of 15: what business impact does it have? How much does it slow down our feature development, if at all? We actually don't know. We need to look into it and estimate.
And then it's also really hard to prioritize between hotspots because, again, we don't know the business impact of those hotspots. More tech debt by these numbers doesn't always mean more business impact, so prioritization with these numbers alone doesn't give us much. That's it about the heuristic tech debt metrics. Let's look into the next bucket. Second tier tech debt metrics are actually not exactly tech debt metrics, but rather metrics of what tech debt influences or causes. One is the effort split between ad hoc support and features. You can define a threshold there. For instance, everything beyond 30% spent on ad hoc support would potentially be problematic. For some teams, it has to be even less to be problematic. So you define the threshold, but it's important to track that. Then there is the cycle time of feature tickets. We optimize for features in nearly every team because we want to introduce changes as fast as possible, since that is how we improve our product and our business KPIs, revenue, and so on. So if we track cycle time, we will see if we're slowing down, which we don't want. Then there are bug trends. If we consistently open more bugs than we close over a period of time, we may get in trouble, because eventually there will be only bugs to work on, and then the effort split between ad hoc support and features will become too concerning. And there is software uptime, a pretty clear metric, and all sorts of mean times: mean time to recover from an outage, mean time to detect an outage, and mean time between failures, aka outages. These show us how stable we are and how efficient we are at detecting outages and recovering from them, because every outage is potentially lost revenue, and we want to handle them quickly. Now let's look at the pros and cons.
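Before the pros and cons, here is a minimal sketch of how a few of these second tier metrics could be computed from recorded data. The `Outage` record, the function names, and the exact definitions (recovery measured from the start of the outage, time between failures measured from one recovery to the next start) are assumptions for this example; teams define these mean times in slightly different ways.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Outage:
    started: datetime    # when the outage began
    detected: datetime   # when the team noticed it
    recovered: datetime  # when service was restored


def _mean(deltas: list[timedelta]) -> timedelta:
    return sum(deltas, timedelta()) / len(deltas)


def mean_time_to_detect(outages: list[Outage]) -> timedelta:
    return _mean([o.detected - o.started for o in outages])


def mean_time_to_recover(outages: list[Outage]) -> timedelta:
    # Assumption: measured from the start of the outage, not from detection.
    return _mean([o.recovered - o.started for o in outages])


def mean_time_between_failures(outages: list[Outage]) -> timedelta:
    # Assumption: the healthy time between one recovery and the next outage.
    ordered = sorted(outages, key=lambda o: o.started)
    gaps = [b.started - a.recovered for a, b in zip(ordered, ordered[1:])]
    return _mean(gaps)


def ad_hoc_share(ad_hoc_hours: float, feature_hours: float) -> float:
    """Share of logged effort that went into ad hoc support."""
    return ad_hoc_hours / (ad_hoc_hours + feature_hours)


def bug_backlog_growth(opened: list[int], closed: list[int]) -> int:
    """Net growth of open bugs over the period; positive means we fall behind."""
    return sum(opened) - sum(closed)
```

A team could compare `ad_hoc_share` against its own threshold (the 30% mentioned above is only one possible choice) and watch `bug_backlog_growth` over several iterations rather than a single one.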
The biggest pro of this bucket of metrics is that they are directly connected to business impact, which means that they give us leverage to prioritize tech debt related changes against business features, which is what we want, right? We come to our business stakeholders and say, okay, here's how much business impact this tech debt elimination will bring. And they say, okay, that's nice, then we prioritize it higher than this list of features, which is exactly what we want. These metrics are also relatively easy to collect, provided certain best practices are followed, like time logged on tickets, a work log in an issue tracking system, recorded downtimes, and so on, which is something I would strongly suggest that you do anyway. Among the cons, there is actually just one, but a big one. These metrics are very generic, so it is hard to connect them with specific code modules, classes, et cetera, to find out what exactly you need to fix. That means that although they're connected directly to business impact, it's harder to promise the business stakeholders that you will deliver this business impact, because you need to go into some sort of investigation and guessing about what exactly is driving the negative business impact that you have now, so that you can then fix it. And this is where our third bucket, consisting of one beautiful metric called tech debt interest, comes into the picture. This is a concept that was described by Martin Fowler, the Martin Fowler, the author of the book Refactoring, in 2008. I linked the article here, but the concept in a nutshell goes like this. It is important to get down to the ticket level when you want to understand how much tech debt is actually impacting you, because for every ticket we know the effort it took, provided we were logging this, and we should.
And with the team that has worked on this ticket, we can actually also estimate how much effort it would have taken without tech debt. And although it is an imaginary situation, there would never be no tech debt, there can only be very little tech debt, but we can do this mental exercise and assume it would be taking this much time if we weren't slowed down by tech debt. And the difference between these two is tech debt interest. Now there we go over all tickets that we can remember and estimate it. So some tickets will have more tech debt interest, some will have less, some will consist of tech debt interest because those will be some bugs or putting out fires, like outages and so on that wouldn't be there if it wasn't for tech debt. Some will have no tech debt interest because it would be some new functionality that isn't impacted by tech debt at all, like a separate module or something that we're starting anew. And then finally we have the tech debt interest estimated for everything. And we also put labels or tags, whatever you call that, that represent modules or components that all the tickets are related to. And with this info, we can get a table of this sort where we would have our code segments by tags and the effort and consequentially the tech debt interest, the average tech debt interest associated with them. And for every segment of the code, we'll be able to calculate the actual overhead in efforts like person days or whatever the unit of effort you're using that it took, which is then great to detect the actual hotspots, like how much we were affected by this tech debt and the business impact that it will have whenever we fix this tech debt actually. And that is exactly what we want, right? Because this is how we can calculate the ROI. Then the ROI is the metric of them all for any changes that we want to prioritize against business features because those will have ROI as well. So and then we can compare apples with apples. 
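The per-segment table described above can be sketched roughly as follows. This is a minimal example under stated assumptions: the `Ticket` fields, the tag names, and the person-day figures are all invented for illustration, and the ROI is taken as accumulated interest divided by the estimated effort to fix the segment.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Ticket:
    tag: str             # module or component label (hypothetical names below)
    actual_days: float   # effort logged on the ticket
    no_debt_days: float  # team's estimate of the effort without tech debt

    @property
    def interest(self) -> float:
        # Tech debt interest: logged effort minus the no-tech-debt estimate.
        return max(self.actual_days - self.no_debt_days, 0.0)


def interest_by_tag(tickets: list[Ticket]) -> dict[str, dict[str, float]]:
    """Total effort and total tech debt interest per code segment."""
    totals: dict[str, dict[str, float]] = defaultdict(
        lambda: {"effort": 0.0, "interest": 0.0}
    )
    for ticket in tickets:
        totals[ticket.tag]["effort"] += ticket.actual_days
        totals[ticket.tag]["interest"] += ticket.interest
    return dict(totals)


def roi(interest_days: float, fix_days: float) -> float:
    """Overhead paid in the observed period per day invested in the fix."""
    return interest_days / fix_days


tickets = [
    Ticket("authentication", actual_days=8, no_debt_days=3),  # slowed-down feature
    Ticket("authentication", actual_days=9, no_debt_days=0),  # outage caused by debt
    Ticket("checkout", actual_days=4, no_debt_days=4),        # new module, no interest
]
table = interest_by_tag(tickets)
for tag, row in table.items():
    print(f"{tag}: effort={row['effort']} interest={row['interest']}")
print("ROI of a 5-day authentication fix:",
      roi(table["authentication"]["interest"], 5))
```

Sorting the table by interest gives the hotspots that actually cost the team time in the observed period, which is the ordering a prioritization discussion would start from.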
Let's say we want to fix the authentication component, and it will take five person days. Then we take the 14 person days of overhead due to tech debt in the recent six to 12 months, divide it by five, and get what? 2.8. So that's actually a relatively nice ratio. Now, just a couple of words about how to collect tech debt interest, because it may be cumbersome and it's good to have a working, reasonably optimal algorithm. First of all, you make sure your team logs time spent on tickets. Then you add a field, estimation with no tech debt, to every issue, aka ticket, in your issue tracker. Then you sit together with the team and fill in the field for the past tickets within a reasonable timeframe, meaning the timeframe for which they can still remember the details of the tickets. For instance, beyond six months into the past, I would hardly expect anyone to remember anything in detail, but three months into the past, maybe. So define it for yourself. And then the most important thing is that from that point on, you repeat the exercise for the new tickets, for instance every retrospective or, even better, where the issue tracker supports it, by adding this field to the issue close dialogue so that every engineer fills it in when closing the ticket. Then you have the data at your fingertips at all times. Now, specifically about the pros and cons of tech debt interest. First of all, it's also directly connected to business impact, which, as we know, is great. Then, one of the biggest ones: it ignores sleeping tech debt that isn't actually causing any trouble, or hasn't been causing any trouble in the past months that we remember or have data about.
It also establishes a really clear relation between modules, components, or other segments of your code and the business impact of the tech debt in them, which in turn eases tech debt prioritization within the bucket of tech debt. So you know clearly what you need to start working on in that bucket. Among the cons, it's relatively cumbersome to collect, and it is indeed so. But if you ask me, it's totally worth it, and it's really not so much hassle. It is also opinionated, which is questionably a disadvantage. I mean, it is a disadvantage, but you will have opinions involved in the interpretation of all the metrics that I mentioned previously. So, like every estimation, it is probably just a little bit more opinionated than those interpretations of the previous metrics, or sometimes maybe less. And then the biggest one is that this metric is based on the work log, which means that it's completely blind to the parts of the code that haven't been worked on, because we don't have the data there, all those estimations of how much it would have taken if there wasn't any tech debt. That means that we can't really make predictions about the future of those parts of the code, the changes in them, and the tech debt interest that will be imposed on them. We can only know these things about the parts of the code we have worked on. That would be it about tech debt interest. And now we can actually conclude how to use all those metrics. First of all, you have seen that there's no ultimate metric that you can just take and measure everything with. So there has to be some sort of synergy of them all, and this is the synergy I would suggest. At the core of it would be tech debt interest, because it is great for measuring the impact of both types of tech debt, continuous and maintenance tech debt. You can slice and dice the data the way you like, and you will always have some numbers wherever you have the data recorded.
It gives a very good educated prediction, tech debt interest, on the tech debt impact for the parts of the code you have tech debt interest data for. Yeah, so the ones that you have worked on. Because it's a much better prediction. So an educated prediction is a much better prediction than just an estimation. Here you have some stats and more granular estimation, which is almost always better. And then finally, I already mentioned the ROI. And this is the only metric that will give the ROI to you. And then it eases cross-prioritization between tech debt changes and business features greatly. Then we add heuristic metrics into the picture. They give us a good overall idea of the code base health, regardless of whether we work on it or not. They are then useful to estimate the potential tech debt impact for the parts of the code that we're only planning to work on. So we don't have tech debt interest data for. Then we can compare them with the ones that we have tech debt interest data for. And then proportionally assume that the tech debt interest will be this or that. And then these metrics are definitely indispensable for automated quality gates. But likely many of you have seen previously all those things that check your merge requests or pull requests and say that you can't merge them because something like cyclomatic complexity is wrong and then the code is duplicated and so on and so forth. And this is great because then we don't increase the amount of tech debt and we automatically check for this. And finally, second tier metrics. They're great because they show the impact of our tech debt elimination efforts via trends. Like I don't know, the ad hoc share of efforts, the share for ad hoc efforts goes down. Or like, I don't know, the stability goes up and so on. And they're generally indispensable for tracking the overall team and software health. And that's why I suggest that every team tracks them. 
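The proportional estimate mentioned above, for parts of the code that have heuristic scores but no tech debt interest data yet, could look roughly like this. It is only a sketch: the talk says to assume the interest "proportionally", and the linear scaling, the function name, and the example scores used here are assumptions, not a validated model.

```python
def estimate_interest(measured: list[tuple[float, float]],
                      target_score: float) -> float:
    """Extrapolate tech debt interest for a module without work-log data.

    measured: (heuristic_score, interest_days) pairs for modules where
              both a heuristic metric and tech debt interest are known.
    target_score: the same heuristic metric for the module we only plan
                  to work on.

    Assumption: interest scales linearly with the heuristic score.
    """
    total_score = sum(score for score, _ in measured)
    total_interest = sum(interest for _, interest in measured)
    if total_score <= 0:
        raise ValueError("need a positive total heuristic score")
    return target_score * total_interest / total_score


# Hypothetical data: two modules we have worked on, one we have not.
known = [(40.0, 14.0), (10.0, 2.0)]  # (complexity score, interest in person days)
print(estimate_interest(known, target_score=25.0))
```

Treat the result as a starting point for a conversation with the team rather than a forecast; once real tickets land in that module, the logged tech debt interest should replace the estimate.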
And that would be everything I have to say about measuring tech debt and interpreting the results. Here's a slide with the sources whenever you get hold of the presentation. And with that, thank you very much. Here's how you can contact me once again. And I'm looking forward to the Q&A section.