What is Mean Time to Restore?
Mean Time to Restore (MTTR) measures the time it takes to restore service after a production failure. You’re probably already using some variation of MTTR under a different name or with slightly different parameters, such as Pickup to Resolve Time.
For a more precise definition, Accelerate: The Science of Lean Software and DevOps defines MTTR as the measurement of ‘time from an incident having been triggered to the time when it has been resolved’ via a production change.
MTTR is a DORA metric and, as such, a core DevOps metric. The DevOps Research and Assessment (DORA) group popularised three other DORA metrics: Deployment Frequency, Change Failure Rate and Lead Time for Changes.
MTTR is an incident-based metric that helps you understand:
- When an incident occurs.
- When an incident is resolved.
- How long it took from occurrence to resolution.
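The three points above can be sketched in a few lines of Python. This is a minimal illustration, not how Plandek or any particular tool calculates the metric: the incident records, their timestamps and the function name `mean_time_to_restore` are all hypothetical, and it assumes every incident has both a triggered time and a resolved time.

```python
from datetime import datetime, timedelta

# Hypothetical incident records: (triggered_at, resolved_at).
incidents = [
    (datetime(2024, 3, 1, 9, 0), datetime(2024, 3, 1, 10, 30)),   # 90 minutes
    (datetime(2024, 3, 4, 14, 0), datetime(2024, 3, 4, 14, 45)),  # 45 minutes
    (datetime(2024, 3, 9, 22, 15), datetime(2024, 3, 10, 0, 0)),  # 105 minutes
]


def mean_time_to_restore(records):
    """Average the time from trigger to resolution across incidents."""
    durations = [resolved - triggered for triggered, resolved in records]
    if not durations:
        raise ValueError("no resolved incidents to average")
    # Summing timedeltas needs an explicit timedelta() start value.
    return sum(durations, timedelta()) / len(durations)


mttr = mean_time_to_restore(incidents)
print(mttr)  # 1:20:00
```

Here the three incidents took 90, 45 and 105 minutes to restore, so the MTTR is 80 minutes. Incidents that are still open are left out of this sketch; how to treat them is a choice each team needs to make for itself.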
Plandek adds intelligent insights to this metric by giving you and your team complete visibility of MTTR over time and alongside other metrics. In this sense, Plandek gives you and your team a deeper understanding of previous workflows as well as how this can – and should – affect your future Sprints, Epics and general work.
Related DORA Metrics
Mean Time to Restore is one of four DORA metrics. As such, it is often used as part of a ‘balanced scorecard’ of Agile delivery and DevOps metrics surfaced in real time.
The other DORA metrics often closely associated with Mean Time to Restore are:
- Lead Time for Changes
- Deployment Frequency (DF)
- Change Failure Rate (CFR)
Key Use Cases for Mean Time to Restore
Mean Time to Restore is an essential metric for engineering teams who want to monitor their responsiveness and their team’s capabilities.
When an incident takes longer to resolve than usual, it usually points to a larger issue and should prompt some questions. What sort of incident was it? Who was responsible for its resolution? What should the team – and the team leader – change for next time?
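One way to spot incidents that took ‘longer than usual’ is to compare each restore time against the team’s typical restore time. The sketch below is only an illustration under stated assumptions: the incident IDs and durations are hypothetical, ‘typical’ is taken to be the median, and the threshold of three times the median is an arbitrary choice rather than a recommended standard.

```python
from datetime import timedelta
from statistics import median

# Hypothetical restore times keyed by incident ID.
restore_times = {
    "INC-101": timedelta(minutes=40),
    "INC-102": timedelta(minutes=55),
    "INC-103": timedelta(minutes=50),
    "INC-104": timedelta(hours=6),
    "INC-105": timedelta(minutes=45),
}


def slow_incidents(times, factor=3.0):
    """Return IDs whose restore time exceeds factor x the median restore time."""
    typical_seconds = median(t.total_seconds() for t in times.values())
    threshold = factor * typical_seconds
    return sorted(
        incident_id
        for incident_id, duration in times.items()
        if duration.total_seconds() > threshold
    )


print(slow_incidents(restore_times))  # ['INC-104']
```

The median is used instead of the mean so that one very long incident does not raise the bar it is measured against. The flagged incidents are the ones worth reviewing with the questions above.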
Ultimately, the goal when optimising your MTTR is to reduce the overall downtime of your service. Every interruption delays the delivery of new value, so when you minimise MTTR, you minimise delays.
According to Plandek’s research and data, organisations that prioritise the optimisation of their MTTR – and ultimately reduce interruptions – see multiple benefits:
- Increased confidence amongst engineers, leaders and stakeholders.
- Increased accuracy regarding timelines for deliverables.
- Reduced Stuck Pull Requests and Stuck Tickets.
Overall, Mean Time to Restore provides essential insight into the stability of your organisation’s – and team’s – delivery performance.
Plandek is an intelligent analytics platform that helps software engineering teams deliver value faster and more predictably.
Plandek mines data from delivery teams’ toolsets and gives them the opportunity to optimise their delivery process using both intelligent insights and predictive analytics.
Co-founded in 2017 by Dan Lee (founder of Globrix) and Charlie Ponsonby (founder of Simplifydigital), Plandek is based in London and currently services the UK, Europe, the Middle East and North America.