
AI is already in your software development lifecycle (SDLC) and it is increasing throughput. You can probably see it in the data: more code, more pull requests, more changes.
But software delivery performance isn’t improving at the same rate.
Code is still sitting in review, testing and release are still limiting flow, and predictability hasn’t improved. The symptoms are familiar:
review queues get longer
software testing becomes a bottleneck
defects and rework creep up
delivery becomes less predictable
The system is moving faster. It’s not delivering faster.
If that sounds familiar, it’s not just you. But it also isn’t a problem with the tools – it’s a problem with the system. AI is increasing the rate your SDLC operates at. Whether that translates into better software delivery depends entirely on how that system behaves under pressure.
How AI is actually changing the SDLC
In the right conditions, AI can improve delivery end-to-end. At a high level, the gains are obvious:
planning and requirements move faster
code delivery gets faster
tests can be generated earlier
CI/CD pipelines become more automated and responsive
Most teams start in the same place: development. You introduce code generation, output increases almost immediately, and suddenly you have more pull requests, more changes, more work moving through the system.
But the rest of the SDLC doesn’t change at the same rate.
So what you’ve really done is increase the rate at which work enters a system that already has constraints.
And software delivery is not limited by how fast you can start work. It’s limited by how fast work can move through the system. In most organizations, that flow is already constrained.
AI doesn’t remove those constraints by default. It exposes them, and puts them under pressure. If you apply AI across those constrained stages – improving review, testing, release, and even upstream clarity – you can unlock real gains.
If you don’t, the outcome is predictable: more work enters the system, but it doesn’t come out any faster.
The challenge is not implementing AI in software development, but integrating it across the full SDLC to ease bottlenecks.
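To make that outcome concrete, here is a minimal sketch of a single review stage with fixed capacity. The function name `simulate_review_queue`, the arrival rates, and the capacity of 10 PRs per week are illustrative assumptions, not benchmark data.

```python
def simulate_review_queue(arrivals_per_week, review_capacity_per_week, weeks):
    """Toy model: PRs arrive at a fixed rate; reviewers merge up to a fixed capacity."""
    backlog = 0
    merged_total = 0
    for _ in range(weeks):
        backlog += arrivals_per_week                      # new PRs enter the queue
        merged = min(backlog, review_capacity_per_week)   # review capacity caps output
        backlog -= merged
        merged_total += merged
    return merged_total, backlog


# Same review capacity, 50% more PRs opened after AI code generation is introduced.
before = simulate_review_queue(arrivals_per_week=10, review_capacity_per_week=10, weeks=8)
after = simulate_review_queue(arrivals_per_week=15, review_capacity_per_week=10, weeks=8)

print(before)  # (80, 0)  -> 80 merged, nothing waiting
print(after)   # (80, 40) -> still 80 merged, 40 PRs waiting in review
```

In this toy model, output is identical in both runs. The only thing the extra input changes is the size of the queue.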
What leaders get wrong: measuring AI at the point of generation
Where this really breaks down is in how teams measure what’s happening. Most teams track activity instead of meaningful engineering productivity metrics:
adoption
prompt usage
code generated
pull requests opened
That’s understandable. Activity is easy to see and it moves quickly, but it also carries the least meaningful signal. AI doesn’t create value when code is written, but rather when that code is delivered into production – at quality, and on time.
If you focus on development activity, you miss where delivery actually slows:
work waiting in code review
queues building in software testing and QA
delays between merge and deployment
blockers between stages of the SDLC
And because those delays are less visible than code output, they’re often ignored. This is how teams end up with a false sense of progress. Often, this pattern emerges:
more defects introduced
more rework and bug fixing
more capacity pulled into support and maintenance
less time spent on roadmap delivery
So the system becomes busier, but less of that effort contributes to value.
You can quite easily increase output while reducing value delivery. If flow hasn’t improved – or if quality is degrading – then nothing meaningful has improved. You’ve just increased the amount of work the system has to absorb.
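One way to surface the stage-level delays described above is to measure how long work spends between stages rather than how much work is started. The sketch below assumes you can export four timestamps per pull request; the field names (`opened`, `first_review`, `merged`, `deployed`) and the sample data are hypothetical.

```python
from datetime import datetime
from statistics import median

# Hypothetical event timestamps for three pull requests (illustrative data only).
pull_requests = [
    {"opened": "2024-03-04T09:00", "first_review": "2024-03-05T09:00",
     "merged": "2024-03-05T15:00", "deployed": "2024-03-07T15:00"},
    {"opened": "2024-03-04T10:00", "first_review": "2024-03-06T10:00",
     "merged": "2024-03-06T14:00", "deployed": "2024-03-07T14:00"},
    {"opened": "2024-03-05T08:00", "first_review": "2024-03-06T14:00",
     "merged": "2024-03-06T16:00", "deployed": "2024-03-08T04:00"},
]

# Each stage is the gap between two events (stage names are an assumption).
STAGES = [
    ("waiting for review", "opened", "first_review"),
    ("in review", "first_review", "merged"),
    ("merge to deploy", "merged", "deployed"),
]


def hours_between(start, end):
    delta = datetime.fromisoformat(end) - datetime.fromisoformat(start)
    return delta.total_seconds() / 3600


def median_stage_hours(prs):
    """Median hours spent in each stage across all pull requests."""
    return {
        name: median(hours_between(pr[start], pr[end]) for pr in prs)
        for name, start, end in STAGES
    }


stage_hours = median_stage_hours(pull_requests)
slowest_stage = max(stage_hours, key=stage_hours.get)

print(stage_hours)
print("slowest stage:", slowest_stage)
```

With this sample data, active review takes a median of 4 hours, while waiting for review and waiting for deployment take 30 and 36 hours. Counting pull requests opened would show none of that.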
How to measure AI impact in the software development lifecycle
Because AI-enabled SDLCs need to be viewed at a system level, we look at AI impact through four connected dimensions: Focus, Speed, Predictability, and Quality. We call them the Four Pillars of Productivity.

If those are improving together, AI is working. If one improves while the others degrade, you are not really getting a productivity gain. You are just moving the problem around.
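The “improving together” rule can be expressed as a simple check. This is a sketch under stated assumptions: each pillar is represented by a single illustrative metric chosen for this example (not a definition of the pillars), and the before/after numbers are invented.

```python
# One illustrative metric per pillar, with the direction that counts as "better".
# These metric choices are assumptions made for the example.
HIGHER_IS_BETTER = {
    "focus": True,            # share of capacity spent on roadmap work
    "speed": False,           # lead time in days
    "predictability": True,   # share of planned work delivered
    "quality": False,         # defects per 100 merged PRs
}


def assess(before, after):
    """Return the pillars that improved and the pillars that degraded."""
    improved, degraded = [], []
    for pillar, higher_is_better in HIGHER_IS_BETTER.items():
        change = after[pillar] - before[pillar]
        if change == 0:
            continue  # unchanged pillars are neither improved nor degraded
        got_better = (change > 0) == higher_is_better
        (improved if got_better else degraded).append(pillar)
    return improved, degraded


before = {"focus": 0.60, "speed": 12.0, "predictability": 0.80, "quality": 5.0}
after = {"focus": 0.52, "speed": 10.0, "predictability": 0.80, "quality": 8.0}

improved, degraded = assess(before, after)
print("improved:", improved)  # ['speed']
print("degraded:", degraded)  # ['focus', 'quality']
```

Here lead time improved, but focus and quality degraded. By the rule above, that is the problem moving around rather than a productivity gain.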
Pillar 1: Focus – are you creating more value, or just more work?
The first failure mode is simple: teams mistake activity for progress.
AI increases output. That’s obvious. But what matters is where your capacity is going.
If AI leads to:
more defects
more rework
more support and maintenance
more time spent fixing instead of building
then you have reduced focus, not improved it. This is where a lot of teams quietly lose ground. They look busier, but a smaller proportion of their effort is actually moving the roadmap forward.
AI should increase time spent on value delivery. In many teams, it does the opposite.
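A rough way to check where capacity is going is to total effort by work category. The sketch below assumes each work item carries a category label and an effort figure in days; the categories, field names, and numbers are hypothetical.

```python
from collections import Counter

# Hypothetical completed work items for one sprint (illustrative data only).
work_items = [
    {"category": "roadmap", "days": 5},
    {"category": "roadmap", "days": 3},
    {"category": "rework", "days": 4},
    {"category": "support", "days": 2},
    {"category": "rework", "days": 2},
]


def capacity_share(items):
    """Fraction of total effort spent in each work category."""
    totals = Counter()
    for item in items:
        totals[item["category"]] += item["days"]
    total_days = sum(totals.values())
    return {category: days / total_days for category, days in totals.items()}


shares = capacity_share(work_items)
print(shares)  # {'roadmap': 0.5, 'rework': 0.375, 'support': 0.125}
```

Tracked over time, a falling roadmap share alongside rising output is the signal that activity is being mistaken for progress.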
Pillar 2: Speed – are you delivering faster, or just feeding the system faster?
Most teams see gains here first, and this is exactly where many of them misread what’s happening.
Yes, AI makes developers faster. But delivery speed is not coding speed. It’s how quickly work moves from idea to production.
And that’s where things tend to break.
You generate more code, but:
PRs sit longer in review
senior engineers become bottlenecks
queues build before merge
testing and validation lag behind
So throughput into the system increases, but flow through the system doesn’t. This shows up in metrics like lead time to value and cycle time.
That’s why lead time doesn’t improve, and in some cases, actually gets worse.
Pillar 3: Predictability – are you still in control of delivery?
This is where AI starts to expose deeper issues.
As output increases, variability increases with it.
You see:
more scope change mid-sprint
less stable planning
more inconsistent delivery
greater reliance on coordination
All of this happens because the system is under strain.
More code means more decisions. More decisions mean more dependencies, more handoffs, and more chances for things to slow down.
Without strong delivery discipline, AI doesn’t make teams more predictable. It makes the system harder to control.
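Variability can be put into numbers. One common proxy, used here as an assumption rather than a prescribed metric, is the coefficient of variation of cycle time: the standard deviation divided by the mean. The cycle times below are invented for illustration.

```python
from statistics import mean, pstdev


def coefficient_of_variation(cycle_times_days):
    """Spread of cycle times relative to their average (0 means perfectly consistent)."""
    return pstdev(cycle_times_days) / mean(cycle_times_days)


# Hypothetical cycle times in days for six tickets, before and after an AI rollout.
cycle_times_before = [4, 5, 4, 5, 4, 5]
cycle_times_after = [2, 9, 3, 10, 2, 10]

cv_before = coefficient_of_variation(cycle_times_before)
cv_after = coefficient_of_variation(cycle_times_after)

print(f"before: {cv_before:.2f}, after: {cv_after:.2f}")
```

In this example some tickets finish faster after the rollout, yet the average cycle time rises from 4.5 to 6 days and the spread grows several times over. Faster individual items do not make delivery easier to plan.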
Pillar 4: Quality – are you scaling output, or scaling rework?
This is the most dangerous failure mode, and the one most teams underestimate. Faster code generation creates the illusion of progress, until quality catches up with you.
If review and testing don’t scale, you get:
more defects introduced
more bugs per unit of output
longer resolution times
growing defect backlogs
And over time, something more structural happens. You start to accumulate technical debt, not just in the code, but in the system:
code that’s harder to understand and review
more fragile integrations
more effort required to make changes safely
At that point, the system starts consuming itself.
More and more capacity gets pulled into fixing, reworking, and stabilizing what’s already been built, instead of delivering new value. That’s when AI stops being a multiplier and starts becoming overhead.
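“Bugs per unit of output” can be tracked by normalizing defects against merged work. The sketch below uses defects per 100 merged PRs as the unit, which is an assumption for this example; the counts are invented.

```python
def defects_per_100_prs(defects, merged_prs):
    """Defects normalized by output, so growth in volume does not hide a rising rate."""
    if merged_prs == 0:
        raise ValueError("merged_prs must be greater than zero")
    return 100 * defects / merged_prs


# Hypothetical quarter-on-quarter figures (illustrative data only).
rate_before = defects_per_100_prs(defects=12, merged_prs=200)
rate_after = defects_per_100_prs(defects=27, merged_prs=300)

print(rate_before)  # 6.0
print(rate_after)   # 9.0
```

Output grew by 50% in this example, but the defect rate grew with it, from 6 to 9 per 100 merged PRs. That is rework scaling faster than output.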
How to implement AI in the SDLC (without breaking delivery)
Most teams treat AI adoption in software engineering as a tooling rollout. They track usage, encourage experimentation, and expect results to follow. When they don’t, it’s not obvious why. Is it how the tools are being used? Is it the SDLC? Is it how impact is being measured?
We’ve seen this repeatedly, both in our benchmark data across 2,000+ teams and in the companies we work with.
Teams end up jumping straight from “Are we using AI?” to “What’s the ROI?”, skipping the part where the answer actually sits.
We use the RACER Framework to help engineering leaders address this problem. It’s a way of looking at AI adoption as a system, in the order it actually plays out.

R – Rollout
Are teams using the tools in their day-to-day work?
Rollout only tells you where AI is present. It says nothing about whether it is improving delivery.
A – Approach
Are teams using AI in a way that actually improves how work gets done?
Most teams focus on code generation because it’s immediate and visible. Fewer apply AI to testing, documentation, refactoring, or upstream work. So you increase output in development, without improving how work moves through the rest of the system.
C – Constraints
What is limiting the gain? At some point, every team hits this.
AI increases throughput. Something else in the SDLC becomes the limiting factor. It could be review, testing, requirements, release or some combination of these. It varies by team, but there is always a constraint. This is where most progress stalls, not because the tools aren’t working, but because the system can’t absorb the change.
E – Engineering Impact
Is the system actually performing better? If AI is working, it shows up in delivery:
more time spent on value delivery
faster movement from idea to production
more predictable execution
stable or improving quality
If those aren’t improving together, you don’t have a productivity gain. You have more activity.
R – Results
Is this translating into outcomes? Only once the system improves do the results become clear:
faster time to value
more roadmap capacity
less rework
better use of engineering time
This is where AI becomes meaningful at a business level.
Struggling with AI in your SDLC? Understand the RACER Framework
Where Plandek fits: turning AI activity into delivery performance
AI is increasing throughput across your SDLC. What’s much harder to see is whether that translates into better performance, or simply exposes new constraints and risks.
Teams can usually track who is using AI, how often, and how much. What they struggle to see is:
whether delivery is actually faster
where work is slowing down under increased throughput
how quality and predictability are changing
how much capacity is going toward value delivery versus rework
Plandek is built to close that gap.
It gives you a system-level view of your SDLC, so you can measure how AI is affecting delivery performance, not just activity. Instead of relying on isolated signals, you can see how changes in one part of the system are impacting the whole. Using the same four dimensions we’ve outlined, Plandek helps you understand:
Focus – whether AI is increasing time spent on roadmap work or pulling capacity into support, rework, and technical debt
Speed – whether work is actually moving faster from idea to production, not just entering the system faster
Predictability – whether delivery is becoming more consistent, or more volatile under increased throughput
Quality – whether defects, rework, and technical debt are increasing or under control

Crucially, it also helps you act on what you see.
Plandek identifies the constraints that are limiting AI’s impact across your workflows, codebase, and processes, so teams can prioritise what to fix rather than guessing. It also provides the visibility and controls needed to manage AI-related risk and compliance as adoption scales.
AI amplifies whatever your delivery system already does. Plandek gives you the data, structure, and context to make sure that amplification works in your favour.
👉 See how Plandek helps engineering leaders measure, manage, and scale AI impact across the SDLC
Key Takeaways
AI increases throughput, not delivery performance by default – most teams generate more code without improving flow through the system
The SDLC, not the tools, determines outcomes – constraints in review, testing, and release limit impact
Measuring activity is misleading – code generated and PR volume say nothing about real delivery performance
AI exposes bottlenecks rather than removing them – faster input simply puts more pressure on weak parts of the system
Productivity must be measured across Focus, Speed, Predictability, and Quality – improving one while degrading others is not a gain
Successful AI adoption is a systems problem – teams that identify constraints and optimise flow are the ones that see real results
FAQs
What is AI in the SDLC?
AI in the SDLC refers to using AI tools across the software development lifecycle, from planning and coding to testing and deployment, to improve delivery performance.
Why doesn’t AI improve software delivery performance automatically?
Because AI increases development speed, but delivery is limited by system constraints like code review, testing, and release processes.
What are the most important metrics for AI in software engineering?
The most important metrics are system-level software delivery metrics such as lead time to value, cycle time, defect rates, and delivery predictability.
Where does AI adoption typically fail in software teams?
AI adoption typically fails at the constraint stage, where increased output overwhelms code review, testing, or deployment processes.
How does AI impact code review and testing?
AI increases the volume of changes, which can slow down code review and overload software testing processes if they don’t scale accordingly.
How can engineering leaders measure AI impact effectively?
Leaders should measure AI impact across the full SDLC using balanced metrics for speed, quality, predictability, and value delivery – not just activity or usage.
Written by
Charlie Ponsonby
Co-founder & CEO
Charlie Ponsonby is CEO and Co-founder of Plandek, the leading Developer Productivity Insight (DPI) platform that helps software engineering teams drive productivity and transition to AI-led engineering. He writes widely on the opportunities and challenges inherent in the transition to the agentic SDLC. Prior to founding Plandek, Charlie founded Simplydigital, which grew to become the UK's largest broadband and digital services comparison business before being acquired by Europe's largest consumer electronics retailer. He started his career at Accenture and has held senior leadership roles in retail and telco. Charlie holds a degree from the University of Cambridge.