Have you ever been on a project where, because of the tight schedule or tight budget, the focus was only on delivering business stories? How did that work for you? Did you ever manage to pay all the technical debt incurred? What about the process debt?
I think that in most cases this is a false economy. We’re gaining a small benefit now (maybe not even that), but we’re paying a much larger cost in the future. This is because, as Mike Rother says, a process that does not improve degrades over time. For example, if you’re not continuously improving the feedback through the deployment pipeline, your 20-minute test suite will grow into a 1-hour test suite. If you’re not constantly fixing brittle tests, people will get used to ignoring them. This is not a people problem. It’s a system problem. It’s much harder to make people do something than to put the required controls in the process. As an example, fail the build if the tests take longer than 30 minutes.
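One way to put that control into the process is a small wrapper around the test command. The following is a minimal sketch, not a recommendation for any particular CI tool: the function names `exceeds_budget` and `run_with_budget` are made up for illustration, and the only number taken from the text is the 30-minute limit.

```python
import subprocess
import sys
import time

# 30-minute budget, as in the example above.
MAX_SECONDS = 30 * 60


def exceeds_budget(elapsed_seconds, budget_seconds=MAX_SECONDS):
    """Return True when the test run took longer than the allowed budget."""
    return elapsed_seconds > budget_seconds


def run_with_budget(command, budget_seconds=MAX_SECONDS):
    """Run the test command and return an exit code for the build.

    A failing test run keeps its own exit code. A passing run that
    was too slow is turned into a failure (exit code 1).
    """
    start = time.monotonic()
    result = subprocess.run(command)
    elapsed = time.monotonic() - start

    if result.returncode != 0:
        return result.returncode
    if exceeds_budget(elapsed, budget_seconds):
        print(
            f"Tests passed but took {elapsed:.0f}s, "
            f"over the {budget_seconds}s budget. Failing the build.",
            file=sys.stderr,
        )
        return 1
    return 0
```

A build step could then call something like `sys.exit(run_with_budget(["your-test-command"]))`, where the command is whatever your project uses to run its tests. The point is that the limit lives in the pipeline, so nobody has to remember to enforce it.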
Why is this still a problem
What’s interesting is that this is not a new problem. Yet most of us still find ourselves in this position. Why? How come we keep regressing? How come the deadline never allows for any improvement time? How come we still don’t have enough slack?
One problem is that the cost of process degradation is hidden. So what if a test fails from time to time? It only takes 5 minutes to run again. So what if we don’t have automated tests? We only regression test the app every 2 sprints. But these process inefficiencies introduce a lot of waste (like wait time and rework).
Another problem is that the process degrades slowly. The regression test suite didn’t double overnight. It got there slowly, over the course of several months. We’re the frog that gets slowly boiled alive.
Yet another issue is that, many times, we optimize locally instead of globally. Yes, maybe one team delivered on a tight deadline, breaking all the rules and incurring a lot of technical debt on the way. People are celebrating the value that the team delivered. But what about the impact of the technical debt on the other 5 teams that now have to live with it? Is the organization aware of the real price that they will need to pay?
What to do about it
It feels like organizations don’t really learn from other people’s mistakes. They make the same mistakes, regardless of all the books and case studies. So, many times, in order to learn, organizations need to feel the pain. It’s like people are saying “It can’t happen to us. We’ll definitely keep the debt under control”. But then it does happen. So, what can we, as software professionals, do about it?
Back-of-the-envelope cost-benefit analysis
The first thing that we can do is to make the cost clear. As I said previously, many times this cost is hidden. So, in order to improve, we need to inform the stakeholders about how much waste each inefficiency causes. This doesn’t have to be a perfect calculation. Many of these things are hard to estimate. A quick, back-of-the-envelope calculation should suffice. Even if you spend more time analyzing the cost, the result will probably not be much more accurate. Let’s see some examples in practice.
A slow test suite
Let’s say that the automated regression test suite takes 2 hours to run. The team thinks that they can cut this in half if they run more tests in parallel. They estimate that this will take roughly 10 days. Cost = 10 man days.
That was simple. What about the benefit?
Lead Time: If the regression step was the bottleneck, lead time has now improved by 1 hour. This means that we can fix critical bugs 1 hour faster. We can also potentially ship business value 1 hour faster.
Downstream dependencies: If there are other steps that come right after the regression step, those steps will benefit from this improvement. Let’s say that after a successful regression test run, we can deploy to a UAT environment. We do this about 6 times per sprint. So now users will need to wait 1 hour less each time. Benefit = 6 hours/sprint (1 man day/sprint). This assumes that 1 man day has 6 hours of work.
Feedback for developers: Although, in theory, people start working on something else immediately after merging their code, in practice this is not that simple. If the definition of done for a user story states that the change should not cause any regressions, you’ll want to check that it doesn’t. So you’ll follow the build through the deployment pipeline and make sure that it doesn’t break anything. If it breaks something, you’ll need to do some context switching. You’ll drop whatever you’ve started and investigate the failing build. This is why, in practice, people aren’t 100% focused on the new piece of work until the old piece of work is done. So, each developer could potentially gain 1 hour of focused work per merge. If you have a team of 4 developers, and each developer merges their code once a day, then you could gain 4 (developers) * 9 (working days in a sprint) * 1 (hours) = 36 hours/sprint (6 man days/sprint).
So, after this quick back-of-the-envelope calculation, the estimated Cost is 10 man days and the estimated Benefit is 7 man days/sprint. So, you’ll break even after about 1.5 sprints.
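If you want to replay this arithmetic with your own numbers, a few lines of code are enough. This is only a sketch of the calculation above: the variable and function names are mine, and the inputs are the estimates from this example (6-hour man day, 6 UAT deployments per sprint, 4 developers, 9 working days per sprint, 1 hour saved).

```python
HOURS_PER_MAN_DAY = 6  # assumption stated above: 1 man day = 6 hours of work


def break_even_sprints(cost_days, benefit_days_per_sprint):
    """Number of sprints needed for the benefit to pay back the cost."""
    if benefit_days_per_sprint <= 0:
        raise ValueError("benefit must be positive to ever break even")
    return cost_days / benefit_days_per_sprint


hours_saved_per_run = 1  # suite goes from 2 hours to 1 hour

# Downstream dependencies: 6 UAT deployments per sprint.
uat_hours = 6 * hours_saved_per_run

# Feedback for developers: 4 developers, 9 working days, 1 merge per day.
dev_hours = 4 * 9 * hours_saved_per_run

benefit_days = (uat_hours + dev_hours) / HOURS_PER_MAN_DAY  # 42 / 6 = 7.0
cost_days = 10

sprints = break_even_sprints(cost_days, benefit_days)
print(f"Benefit: {benefit_days} man days/sprint")
print(f"Break even after {sprints:.2f} sprints")  # 10 / 7, roughly 1.43
```

Changing a single input (say, 2 merges per developer per day) shows right away how sensitive the break-even point is to each estimate, which helps when discussing the numbers with stakeholders.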
Brittle Tests
Position yourself as a Trusted Adviser
- We should understand the business domain that we are working in. We need to talk with business people using the same ubiquitous language. If you’re working in the accounting domain, you should learn basic accounting. Understand the underlying business need and propose alternative solutions. Maybe even try to solve problems without writing code.
- We should care about the business problem that we are solving. We should make suggestions during sprint reviews. We should delight our customers with small UX improvements. These might be easy to do, but will make the Product Owner trust the team even more (thanks Victor Rentea for the tip).