DORA metrics have earned their place as the go-to framework for measuring software delivery. Even Reddit engineers, who roll their eyes at most productivity frameworks, tend to trust them.
That said, how teams write and ship code looks different from how it did a year ago. Google’s 2025 DORA report found that 90% of developers now use AI tools at work. Teams are deploying more often, lead times are dropping, and the metrics all trend in the right direction.
But shipping faster doesn’t mean shipping better, especially when AI is writing chunks of the code. The 2025 report addresses this by introducing “rework rate” as a fifth metric: it shows how often teams have to push unplanned fixes to production, a blind spot in the original four.
Below, we’ll break down all five metrics and how engineering leaders should interpret them when AI tooling is in the mix.
This year’s DORA report is called “State of AI-assisted Software Development,” and the name change says a lot. The research team surveyed nearly 5,000 technology professionals worldwide, and the findings line up with what most engineering leaders have already noticed on their own teams.
AI is now part of the daily routine for most developers. 90% of respondents report using AI tools at work, up 14 percentage points from the year before. The average developer now spends around two hours a day with these tools, roughly a quarter of their workday.
(Source: Dora.dev)
On the surface, things look good. Deployment frequency is up, cycle times are down, and most developers report feeling more productive.
But the research also found something that makes the picture more complicated. DORA calls it the “mirror and multiplier” effect, and it means that AI tends to amplify whatever is already happening inside a team.
If you have strong processes and clear workflows, AI helps you move faster. But if your team already deals with messy handoffs, poor documentation, or shaky processes, AI won’t smooth things over; it will amplify the friction.
The report brings two new things to the table. The first is rework rate, a fifth metric that tracks how often teams have to push unplanned fixes to production. It picks up on instability that the original four metrics tend to miss.
The second is the DORA “AI Capabilities Model”, which boils down to seven practices that determine whether AI works for your team or against it. We’ll go through everything in more detail in the next few sections.
The four original DORA metrics still matter. But when AI is part of the workflow, the way you read them changes. A number going up isn’t always good news, and a number staying flat isn’t always bad. Context matters more than ever before. Here’s what to watch for in each one:
What this metric used to tell you: Deployment frequency measures how often your team ships code to production. Historically, higher frequency meant your team had mature CI/CD pipelines, solid automation, and enough confidence in the process to release often.
What changes with AI: AI coding tools make it easier to write code faster, which naturally leads to more commits and more deployments. Teams that rely on AI often see deployment frequency climb without changing anything else about their process. The metric goes up, but not because the underlying system improved.
What to watch out for: More deployments aren’t always a sign of progress. If your team ships more often but each deployment is less stable, you’re just creating more clean-up work downstream.
Watch for deployment frequency rising while the change failure rate or rework rate rises alongside it. That’s a sign you’re moving faster without the guardrails to support it. Developers are already calling this out in the wild.
How to read it accurately in 2025: Don’t look at deployment frequency in isolation. Pair it with change failure rate and rework rate to see whether you’re shipping faster and staying stable. A team that deploys daily with low failure rates is in a different position than one that deploys daily and spends half their time fixing what they just shipped.
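As a rough sketch of that pairing, here’s how the two numbers might be computed side by side from a simple deploy log. The log format and field names here are hypothetical, not a Plandek or DORA-defined schema; adapt them to whatever your pipeline actually records.

```python
from datetime import date

# Hypothetical deploy log: (deploy date, did it fail in production?)
deploys = [
    (date(2025, 6, 2), False),
    (date(2025, 6, 3), True),
    (date(2025, 6, 4), False),
    (date(2025, 6, 5), False),
    (date(2025, 6, 6), True),
]

days_in_window = 5

# Deployment frequency: deploys per day over the window.
deploy_frequency = len(deploys) / days_in_window

# Change failure rate: share of deploys that broke production.
change_failure_rate = sum(failed for _, failed in deploys) / len(deploys)

print(f"{deploy_frequency:.1f} deploys/day, {change_failure_rate:.0%} failure rate")
```

The point is that neither number means much alone: a rising `deploy_frequency` only counts as progress if `change_failure_rate` holds steady or falls over the same window.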
What this metric used to tell you: Lead time tracks how long code takes to go from commit to production. A shorter lead time usually meant your pipeline had fewer bottlenecks and your team could react quickly when something needed to change.
What changes with AI: AI compresses the front end of the process. Code gets written faster and repetitive tasks take less time. But lead time includes the full path to production, and AI doesn’t touch most of that. Reviews, tests, and sign-offs still take time, and sometimes take longer when teams are less sure about AI-generated output.
What to watch out for: The total number might look better while the slowdowns just relocate. AI might cut coding time in half, but if reviews stack up or test cycles take forever, delivery doesn’t really get faster. You need to look at where time piles up across the whole pipeline.
How to read it accurately in 2025: Break lead time into stages. Track how long code waits in review, how long tests take, and how long deploys sit in a queue. Teams that see improvement from AI usually speed up the full pipeline (not just the first step). If the gains stop after the code is written, that tells you where to look next.
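A minimal sketch of that stage breakdown, assuming your tooling can give you timestamps for each pipeline event (the event names below are made up for illustration):

```python
from datetime import datetime

# Hypothetical timestamps for one change, commit through production.
events = {
    "committed":       datetime(2025, 6, 2, 9, 0),
    "review_started":  datetime(2025, 6, 2, 9, 30),
    "review_approved": datetime(2025, 6, 3, 14, 0),
    "tests_passed":    datetime(2025, 6, 3, 16, 0),
    "deployed":        datetime(2025, 6, 4, 10, 0),
}

# Each stage is (label, start event, end event).
stages = [
    ("waiting for review", "committed", "review_started"),
    ("in review",          "review_started", "review_approved"),
    ("testing",            "review_approved", "tests_passed"),
    ("deploy queue",       "tests_passed", "deployed"),
]

stage_hours = {}
for name, start, end in stages:
    stage_hours[name] = (events[end] - events[start]).total_seconds() / 3600
    print(f"{name}: {stage_hours[name]:.1f}h")
```

In this toy example the change was written quickly but spent over a day in review and most of another day queued for deploy, which is exactly the pattern the report describes: AI shortens the first step while the rest of the pipeline absorbs the time.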
What this metric used to tell you: Change failure rate measures the percentage of deployments that break something in production. A low number meant your team shipped stable code and caught most bugs before users ever saw them.
What changes with AI: AI code often looks fine on the surface. It passes tests, matches conventions, and reviewers approve it without much friction. But that doesn’t mean it’s production-ready. Edge cases and subtle bugs can hide in plain sight, and DORA found that around 30% of developers still don’t trust AI-generated output (for good reason).
What to watch out for: Don’t assume a steady failure rate means everything is fine. A flat change failure rate can hide growing problems: if deploys go up but the percentage stays the same, you’re dealing with more failures in absolute terms. Ten percent of 50 deploys is five incidents; ten percent of 100 is ten.
How to read it accurately in 2025: Use rework rate as a companion metric. Change failure rate shows the big failures, while rework rate shows the small ones that slip past. If both hold steady while you ship more often, your process is solid. If they tick up, speed might be winning over stability.
What this metric used to tell you: MTTR measures how long it takes to restore service after something breaks in production. A short recovery time meant your team had solid monitoring, clear runbooks, and the ability to diagnose and handle issues fast.
What changes with AI: AI can speed up parts of the recovery process. It can help diagnose issues and even generate patches on the fly. But when AI wrote the broken code in the first place, things get tricky. Developers often have a harder time debugging code they didn’t write and don’t fully understand, which can drag out recovery even when the tools are fast.
What to watch out for: Fast recovery is good, but not if the same problems keep coming back. Teams can get stuck in a loop where they bounce back quickly but never tackle the root cause. Also be careful with AI-generated fixes. They might clear the immediate issue but create a new one down the line.
How to read it accurately in 2025: Measure MTTR alongside repeat incident rates. Track whether the same failure categories show up again and whether fixes last. Fast recovery plus fewer repeats means your team is healthy. Fast recovery plus the same fires every month means you’re just getting good at cleanup.
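One way to sketch that pairing: compute MTTR from an incident log while also counting which failure categories recur. The log shape and category labels here are invented for illustration.

```python
from collections import Counter

# Hypothetical incident log: (failure category, minutes to restore service)
incidents = [
    ("config drift", 25),
    ("bad migration", 90),
    ("config drift", 20),
    ("memory leak", 45),
    ("config drift", 30),
]

# Mean time to restore across all incidents.
mttr = sum(minutes for _, minutes in incidents) / len(incidents)

# Categories that showed up more than once: the "same fires every month" signal.
repeats = {cat: n for cat, n in Counter(c for c, _ in incidents).items() if n > 1}

print(f"MTTR: {mttr:.0f} min")        # 42 min looks healthy...
print(f"Repeat categories: {repeats}")  # ...but config drift keeps coming back
```

Here the MTTR alone looks respectable, but the repeat count shows the team is recovering from the same class of failure over and over rather than fixing the root cause.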
Why DORA added it: The original four metrics track throughput and big failures well, but they have a blind spot: the quieter cost of going back to fix things that just went out the door. Teams can hit strong numbers on the original four and still spend too much time cleaning up after themselves. That’s why rework rate exists.
What it measures: Rework rate counts the unplanned deployments that happen because something didn’t work right the first time. These are the emergency patches, the quick corrections, and the “we just shipped this yesterday and now we need to fix it” deployments.
What it catches that the other four miss: Change failure rate tracks major incidents. Rework rate tracks the smaller problems that don’t set off alarms but still cost time. Quick patches, follow-up fixes, configs that needed another pass. They don’t look urgent, but they add up. And when AI pushes the pace of delivery up, this kind of low-grade churn becomes a major problem.
How to use it alongside the others: Rework rate works best when you read it next to change failure rate. One tracks major failures, the other tracks the quiet cleanup work. If change failure rate looks healthy but rework rate keeps climbing, small problems are slipping through. If both stay flat while deploys go up, you’re in good shape.
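To make the contrast concrete, here’s a toy computation showing how a healthy change failure rate can coexist with a worrying rework rate. Tagging each deploy with a reason like this is an assumption about your process, not something DORA prescribes.

```python
# Hypothetical deploy log: each deploy tagged with why it shipped.
# "planned"  = normal feature/fix work
# "incident" = the deploy itself broke production
# "rework"   = unplanned follow-up fix for something shipped recently
deploys = [
    {"kind": "planned"}, {"kind": "rework"},  {"kind": "planned"},
    {"kind": "incident"}, {"kind": "rework"}, {"kind": "planned"},
    {"kind": "planned"}, {"kind": "rework"},  {"kind": "planned"},
    {"kind": "planned"},
]

total = len(deploys)
change_failure_rate = sum(d["kind"] == "incident" for d in deploys) / total
rework_rate = sum(d["kind"] == "rework" for d in deploys) / total

# A 10% CFR looks fine on a dashboard; the 30% rework rate is the real story.
print(f"CFR {change_failure_rate:.0%}, rework rate {rework_rate:.0%}")
```

Read together, the two numbers tell you whether the team is genuinely stable or just keeping its big failures rare while quietly re-shipping a third of its work.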
DORA metrics haven’t lost their value. But in 2025, reading them accurately takes more than a dashboard with four or five numbers. You need insights into what’s happening behind the scenes, especially when AI tools are changing how your team writes and ships code.
Plandek helps you get there. It’s a software engineering intelligence platform that integrates with your current tools and shows you what’s happening across the full delivery process. All five DORA metrics are built in, along with the deeper context you need to understand what’s behind them.
AI only helps if you can measure its impact. Plandek shows you what’s working, what’s not, and where to focus next.
If you want to see it in action, go ahead and book a demo.
Free managed POC available.