Mean Time to Restore (MTTR) – DORA Metrics

What is Mean Time to Restore?

Mean Time to Restore (MTTR) measures the time it takes to restore service after a production failure. You’re probably already using some variant of MTTR under a different name or with slightly different parameters, such as Pickup to Resolve Time.

To give you more depth, Accelerate: The Science of Lean Software and DevOps defines MTTR as the measurement of ‘time from an incident having been triggered to the time when it has been resolved’ via a production change.

MTTR is one of the four DORA metrics popularised by the DevOps Research and Assessment (DORA) group and, as such, a core DevOps metric. The other three are Deployment Frequency, Change Failure Rate and Lead Time for Changes.

Mean Time to Restore graph | Plandek dashboard

MTTR is an incident-based metric that helps you understand:

  1. When an incident occurs
  2. When an incident is resolved
  3. How long it took from occurrence to resolution
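
As a rough illustration of the three points above, MTTR can be computed as the average of (resolution time minus trigger time) across resolved incidents. The sketch below is a minimal example, not Plandek's implementation: the function name `mean_time_to_restore`, the tuple layout and the sample incidents are all assumptions made for illustration, and it assumes that incidents still open are left out of the average.

```python
from datetime import datetime, timedelta


def mean_time_to_restore(incidents):
    """Average (resolved - triggered) over resolved incidents.

    `incidents` is a list of (triggered, resolved) datetime pairs.
    A `resolved` value of None marks an incident that is still open;
    this sketch assumes open incidents are excluded from the mean.
    """
    durations = [
        resolved - triggered
        for triggered, resolved in incidents
        if resolved is not None
    ]
    if not durations:
        return None  # no resolved incidents, so no MTTR to report
    return sum(durations, timedelta()) / len(durations)


# Hypothetical sample data, for illustration only.
incidents = [
    (datetime(2024, 3, 1, 9, 0), datetime(2024, 3, 1, 10, 0)),   # restored in 1 hour
    (datetime(2024, 3, 4, 14, 0), datetime(2024, 3, 4, 17, 0)),  # restored in 3 hours
    (datetime(2024, 3, 7, 8, 0), None),                          # still open, excluded
]

mttr = mean_time_to_restore(incidents)
print(mttr)  # 2:00:00
```

Whether open incidents are excluded, or counted up to the current time, is a definitional choice; whichever you pick, apply it consistently so that MTTR values stay comparable over time.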


Plandek adds intelligent insights to this metric by giving you and your team complete visibility of MTTR over time and alongside other metrics. This gives you and your team a deeper understanding of how past incidents were handled, and of how those lessons can and should shape your future Sprints, Epics and general work.
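
To make "MTTR over time" concrete, one simple approach is to group resolved incidents into weekly buckets and average each bucket. The sketch below is illustrative only and does not describe how Plandek calculates its trend: the function name `mttr_by_week`, the choice to bucket by the ISO week of resolution, and the sample data are all assumptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def mttr_by_week(incidents):
    """Return {(iso_year, iso_week): mean restore time}.

    `incidents` is a list of (triggered, resolved) datetime pairs.
    Assumption: each incident is bucketed by the ISO week in which
    it was resolved, not the week in which it was triggered.
    """
    buckets = defaultdict(list)
    for triggered, resolved in incidents:
        iso = resolved.isocalendar()  # (ISO year, ISO week, ISO weekday)
        buckets[(iso[0], iso[1])].append(resolved - triggered)
    return {
        week: sum(durations, timedelta()) / len(durations)
        for week, durations in sorted(buckets.items())
    }


# Hypothetical sample data, for illustration only.
sample = [
    (datetime(2024, 3, 1, 9, 0), datetime(2024, 3, 1, 10, 0)),   # ISO week 9, 1 hour
    (datetime(2024, 3, 4, 14, 0), datetime(2024, 3, 4, 17, 0)),  # ISO week 10, 3 hours
    (datetime(2024, 3, 7, 8, 0), datetime(2024, 3, 7, 9, 0)),    # ISO week 10, 1 hour
]

trend = mttr_by_week(sample)
for (year, week), value in trend.items():
    print(f"{year}-W{week:02d}: {value}")
# 2024-W09: 1:00:00
# 2024-W10: 2:00:00
```

A weekly series like this is what makes it possible to see whether restore times are improving, and to read them alongside other delivery metrics for the same period.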

 

Related DORA Metrics

Mean Time to Restore is one of the four DORA metrics. As such, it is often used as part of a ‘balanced scorecard’ of Agile delivery and DevOps metrics surfaced in real time.

The other DORA metrics often closely associated with Mean Time to Restore are:

  1. Deployment Frequency
  2. Change Failure Rate
  3. Lead Time for Changes

Key Use Cases for Mean Time to Restore

Mean Time to Restore is an essential metric for engineering teams who want to monitor their responsiveness and their team’s capabilities. 

When an incident takes longer to resolve than usual, it often points to a larger issue and raises questions worth asking. What sort of incident was it? Who was responsible for its resolution? What should the team, and the team leader, change for next time?

Ultimately, the goal of optimising your MTTR is to reduce the overall downtime of your service line. Every interruption delays the delivery of new value, so when you minimise MTTR, you minimise those delays.

 

Expected Outcomes

According to Plandek’s research and data, organisations that prioritise the optimisation of their MTTR – and ultimately reduce interruptions – see multiple benefits:

  1. Increased confidence among engineers, leaders and stakeholders
  2. Increased accuracy regarding timelines for deliverables
  3. Reduced Stuck Pull Requests and Stuck Tickets


Overall, Mean Time to Restore provides essential insight into the stability of your organisation’s – and team’s – delivery performance.

About Plandek

Plandek is an intelligent analytics and performance platform to help software delivery teams deliver valuable software faster and more predictably.

Plandek enables technology teams to track and drive their improvement and share understandable KPIs with stakeholders interested in accelerating value creation and improving delivery efficiency.

Plandek works by mining data from delivery teams’ toolsets (such as issue tracking, code repos and CI/CD tools) to provide actionable and intelligent insight across the end-to-end software delivery process.

Plandek is recognised as a top global vendor in the DevOps Value Stream Management space by Gartner and Forrester and is used by private and public organisations globally to optimise their technology delivery and accelerate R&D ROI.

For more information, please visit www.plandek.com
