The Real Cost of Technical Debt

After eighteen years of building and leading engineering teams, I have come to believe that the single biggest communication failure between engineering and the rest of the business is around technical debt. Engineers feel it every day. They know the codebase is slowing them down. But when they try to explain it to executives, they reach for metaphors about messy rooms or credit cards, and the conversation stalls. The executive hears “we want to rewrite things because they’re not pretty enough,” and the engineer hears “they don’t care about quality.” Both walk away frustrated.

I have been on both sides of that table, and I think the problem is simpler than it seems: we are not speaking the same language. So let me try to bridge that gap.

Quantifying Debt in Terms That Matter

The first step is to stop talking about technical debt in technical terms. Nobody outside engineering cares that your service layer has circular dependencies or that your database schema was designed by someone who has since left the company. What they care about is velocity, reliability, and cost.

Here is how I frame it. I pull up our sprint data from the last six months and calculate what I call the “drag coefficient”: the percentage of each sprint consumed by unplanned work, workarounds, and time spent fighting the codebase rather than building features. In the healthy codebases I have worked on, this number sits around ten to fifteen percent. In debt-heavy ones, I have seen it climb past fifty percent.

When you tell a CFO that half of your engineering budget is being spent on fighting your own systems rather than building what customers need, they understand immediately. That is not an abstract metaphor. That is money. If you have a twenty-person engineering team costing three million a year, and your drag coefficient is forty-five percent, you are burning $1.35 million annually just to work around problems you already know about. Now you have their attention.
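
The arithmetic is simple enough to keep in a small script. What follows is a minimal sketch in Python, not a prescribed tool: the `SprintLog` record and its field names are hypothetical, and it assumes you can export hours per sprint split into total, unplanned, and workaround time. The sample hours are invented so that the result lands on the forty-five percent figure used above.

```python
from dataclasses import dataclass


@dataclass
class SprintLog:
    """Hours logged in one sprint (hypothetical record shape)."""
    total_hours: float       # all engineering hours in the sprint
    unplanned_hours: float   # unplanned work such as incidents and surprise fixes
    workaround_hours: float  # time spent working around known problems


def drag_coefficient(sprints: list[SprintLog]) -> float:
    """Fraction of total engineering time lost to drag across the given sprints."""
    total = sum(s.total_hours for s in sprints)
    if total <= 0:
        raise ValueError("need at least one sprint with logged hours")
    drag = sum(s.unplanned_hours + s.workaround_hours for s in sprints)
    return drag / total


def annual_drag_cost(annual_team_cost: float, coefficient: float) -> float:
    """Money spent per year on drag rather than on features."""
    return annual_team_cost * coefficient


# Illustrative numbers only: two sprints of 1,600 engineering hours each.
logs = [SprintLog(1600, 400, 320), SprintLog(1600, 380, 340)]
coefficient = drag_coefficient(logs)
print(f"drag coefficient: {coefficient:.0%}")
print(f"annual drag cost: ${annual_drag_cost(3_000_000, coefficient):,.0f}")
```

Weighting by hours rather than averaging per-sprint percentages keeps a short sprint from skewing the number; that is a design choice for this sketch, and either approach is defensible as long as it stays consistent from quarter to quarter.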

Velocity Metrics Before and After

I track this religiously because the data tells a compelling story. On one team I led, we measured story points delivered per sprint over a six-month window. During the first three months, with significant accumulated debt, we averaged twenty-two points per sprint with high variance — some sprints we would hit thirty, others we would barely crack fifteen because something broke or a “simple” change turned into a week-long archaeology expedition.

After a dedicated three-month debt reduction effort, our average climbed to thirty-one points per sprint, an increase of roughly forty percent, and the sprint-to-sprint variance dropped sharply. We were not only faster, we were more predictable. That predictability matters as much as the raw speed because it means product managers can actually make commitments to stakeholders and keep them. It means marketing can plan launches with confidence. It means the whole business runs better, not just engineering.
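
To put a number on "more predictable," it helps to report spread alongside the average. The sketch below uses Python's standard `statistics` module. The per-sprint values are invented for illustration: they are chosen only to match the averages of twenty-two and thirty-one quoted above, and are not the team's actual data. The `velocity_summary` helper is likewise a hypothetical name.

```python
from statistics import mean, stdev


def velocity_summary(points_per_sprint: list[float]) -> dict[str, float]:
    """Mean, sample standard deviation, and coefficient of variation.

    Requires at least two sprints, because a sample standard
    deviation is undefined for a single data point.
    """
    avg = mean(points_per_sprint)
    spread = stdev(points_per_sprint)
    return {"mean": avg, "stdev": spread, "cv": spread / avg}


# Illustrative sprint totals only (assumed, not measured).
before = [30, 15, 22, 16, 29, 20]   # averages 22, swings between 15 and 30
after = [31, 30, 32, 31, 33, 29]    # averages 31, much tighter

for label, data in (("before", before), ("after", after)):
    s = velocity_summary(data)
    print(f"{label}: mean={s['mean']:.1f} stdev={s['stdev']:.2f} cv={s['cv']:.0%}")
```

The coefficient of variation (standard deviation divided by mean) is the figure to show alongside the average, since it lets you compare spread between periods even when the average itself has moved.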

The “20% Time” Myth

A lot of organizations try to address technical debt by allocating some percentage of each sprint to it. “We’ll spend twenty percent of our time on tech debt.” In theory, this sounds reasonable. In practice, it almost never works.

Here is why. When you mix debt work into feature sprints, the debt work always loses. A customer-facing bug comes in, a deadline gets moved up, a stakeholder makes a last-minute request — and the first thing that gets cut is the debt work because it does not have a visible champion. I have watched this pattern play out at least a dozen times across different teams.

What actually works is dedicated debt sprints. Full sprints where the entire team focuses on nothing but paying down debt. I typically advocate for one dedicated debt sprint out of every five or six. That is roughly the same share of time as the twenty percent rule (about seventeen to twenty percent); what changes is that the time is concentrated and protected, so it cannot be nibbled away by competing priorities. It also creates momentum: the team can make meaningful progress on interconnected debt items rather than trying to chip away at the edges.

The key is to treat these sprints with the same rigor as feature sprints. They need clear goals, defined acceptance criteria, and measurable outcomes. “Make the codebase better” is not a sprint goal. “Reduce average API response time from 800ms to under 200ms by refactoring the query layer” is a sprint goal.

Prioritizing What to Fix First

Not all debt is created equal, and trying to fix everything at once is a recipe for never finishing anything. I use a simple two-axis framework: customer impact and change frequency.

Customer-facing systems that change frequently get top priority. If your checkout flow has accumulated debt and your team touches it every sprint, that is costing you the most in both drag coefficient and risk of customer-facing incidents. Internal systems that rarely change go to the bottom of the list. Yes, that admin panel was written in 2017 and the code makes you wince, but if it works and nobody needs to modify it, leave it alone.

The middle of the priority list is where judgment comes in. Internal systems that change frequently — things like your deployment pipeline or your data ingestion layer — deserve attention because they slow down everything else, even though customers never see them directly. Customer-facing systems that rarely change are lower priority because the debt is not actively costing you velocity, even if it represents latent risk.
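
The two-axis framework can be written down as a small ranking function. This is a sketch under assumptions: the `System` record, the `changes_per_quarter` measure, and the threshold of six changes per quarter for "frequent" are all hypothetical choices, and the ordering of the two middle tiers encodes the judgment described above rather than a fixed rule.

```python
from dataclasses import dataclass

# Assumption: six or more changes per quarter counts as "frequent".
FREQUENT_CHANGE_THRESHOLD = 6


@dataclass
class System:
    name: str
    customer_facing: bool
    changes_per_quarter: int


def priority_tier(system: System, threshold: int = FREQUENT_CHANGE_THRESHOLD) -> int:
    """Return 1 (fix first) through 4 (leave alone) from the two axes."""
    frequent = system.changes_per_quarter >= threshold
    if system.customer_facing and frequent:
        return 1  # customer-facing and touched constantly
    if frequent:
        return 2  # internal, but slows down everything else
    if system.customer_facing:
        return 3  # latent risk, not actively costing velocity
    return 4      # internal and stable: leave it alone


# Illustrative inventory; the change counts are made up.
systems = [
    System("admin panel", customer_facing=False, changes_per_quarter=0),
    System("checkout flow", customer_facing=True, changes_per_quarter=12),
    System("password reset page", customer_facing=True, changes_per_quarter=1),
    System("deployment pipeline", customer_facing=False, changes_per_quarter=9),
]

for s in sorted(systems, key=priority_tier):
    print(priority_tier(s), s.name)
```

Change frequency can usually be approximated from version control history, which keeps the ranking tied to how the team actually works instead of how painful the code feels.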

When to Rewrite vs. Refactor

This is the question that has launched a thousand arguments in engineering teams, and I have a strong opinion: the answer is almost always refactor.

Rewrites are seductive because they promise a clean slate. But they carry enormous hidden costs. You lose all the institutional knowledge embedded in the existing code — all those weird edge cases that got handled over the years, all the subtle business logic that nobody documented. You also lose the ability to deliver incremental value. A rewrite is a binary bet: it works when it is done, and until then, you have nothing.

I have only seen rewrites succeed in two scenarios. First, when the existing system is so fundamentally architecturally wrong that incremental changes cannot get you where you need to go — like when you need to move from a monolithic batch processing system to a real-time event-driven architecture. Second, when the existing system is small enough that a rewrite can be completed in two to three sprints. Anything larger than that, and the risks compound faster than the benefits.

For everything else, the strangler fig pattern is your friend. Wrap the old system, build new functionality alongside it, and migrate piece by piece. It is slower and less satisfying than a clean rewrite, but it is dramatically safer and lets you deliver value continuously.
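
In code, the pattern usually starts as a thin routing layer in front of the old system. The sketch below is one minimal way to express it, with hypothetical names throughout (`StranglerFacade`, `legacy_billing`, `new_create_invoice`); real migrations typically route at the HTTP or message layer, but the shape is the same.

```python
from typing import Callable, Dict

LegacyHandler = Callable[[str, dict], dict]  # old system: takes operation name and request
NewHandler = Callable[[dict], dict]          # new code: one handler per migrated operation


class StranglerFacade:
    """Routes each operation to new code if migrated, otherwise to the legacy system."""

    def __init__(self, legacy: LegacyHandler) -> None:
        self._legacy = legacy
        self._migrated: Dict[str, NewHandler] = {}

    def migrate(self, operation: str, handler: NewHandler) -> None:
        """Switch one operation over to its new implementation."""
        self._migrated[operation] = handler

    def handle(self, operation: str, request: dict) -> dict:
        new_handler = self._migrated.get(operation)
        if new_handler is not None:
            return new_handler(request)
        return self._legacy(operation, request)


# Hypothetical stand-ins for the old and new implementations.
def legacy_billing(operation: str, request: dict) -> dict:
    return {"handled_by": "legacy", "operation": operation}


def new_create_invoice(request: dict) -> dict:
    return {"handled_by": "new", "operation": "create_invoice"}


facade = StranglerFacade(legacy_billing)
facade.migrate("create_invoice", new_create_invoice)

print(facade.handle("create_invoice", {"amount": 100}))  # served by new code
print(facade.handle("refund", {"amount": 100}))          # still served by legacy
```

Because callers only ever talk to the facade, each operation can be moved, verified, and if necessary moved back on its own schedule, which is what makes the approach safer than a single cutover.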

Building Debt Awareness Into Sprint Planning

The most important thing I have learned about technical debt is that managing it is not a one-time project — it is an ongoing discipline. Here is how I build that discipline into the regular cadence of engineering work.

Every sprint planning session starts with a debt check. The tech lead spends five minutes reviewing any new debt that was introduced in the previous sprint — because yes, you are always creating new debt, and pretending otherwise is dishonest — and flagging any existing debt that is starting to bite. This keeps debt visible without turning every planning session into a therapy session about the codebase.

We maintain a debt register, which is just a backlog specifically for debt items. Each item has an estimated drag cost — how much velocity are we losing per sprint because of this item — and an estimated fix cost. This lets us make rational economic decisions about when to address specific items rather than relying on gut feel or whoever complains the loudest.
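
A register like this can live in a spreadsheet, but the economics are easy to sketch in code. The example below assumes both costs are estimated in story points and ranks items by payback period, meaning the number of sprints before a fix has paid for itself. Payback ranking is one reasonable decision rule, not the only one, and the `DebtItem` fields and sample entries are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class DebtItem:
    name: str
    drag_per_sprint: float  # estimated story points lost per sprint
    fix_cost: float         # estimated story points to fix

    def payback_sprints(self) -> float:
        """Sprints until the recovered velocity equals the cost of the fix."""
        if self.drag_per_sprint <= 0:
            return math.inf  # no measurable drag: the fix never pays back
        return self.fix_cost / self.drag_per_sprint


def rank_by_payback(register: list[DebtItem]) -> list[DebtItem]:
    """Shortest payback first; items with no drag sink to the bottom."""
    return sorted(register, key=lambda item: item.payback_sprints())


# Illustrative entries with made-up estimates.
register = [
    DebtItem("circular dependencies in service layer", drag_per_sprint=3, fix_cost=12),
    DebtItem("admin panel cleanup", drag_per_sprint=0, fix_cost=20),
    DebtItem("slow query layer", drag_per_sprint=5, fix_cost=10),
]

for item in rank_by_payback(register):
    print(f"{item.name}: pays back in {item.payback_sprints()} sprints")
```

The estimates will be rough, and that is fine. The point is to compare items against each other on the same scale, not to forecast any one of them precisely.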

Every quarter, I present a debt report to leadership that includes the current drag coefficient, the trend over time, and a projection of velocity gains from the planned debt work in the coming quarter. This keeps the conversation grounded in business outcomes and builds trust that engineering is a responsible steward of the company’s investment.

Technical debt is not a failing. It is a natural consequence of building software in a world where requirements change, deadlines matter, and perfect is the enemy of shipped. The failing is ignoring it, or being unable to articulate its cost in terms the business can act on. Get the language right, get the measurement right, and the rest follows.