Revised edition 2023
The Ultimate Guide to Software Delivery and Engineering Metrics
8. DevOps and Engineering Metrics (including DORA metrics)
This category covers a broad range of potential metrics. We focus on those metrics we believe are key to the end-to-end software delivery process and the health of the engineering capability underpinning that process.
The DevOps metrics that we list here reflect the core Agile objective of increasing the frequency of deployments by better managing the continuous integration and continuous deployment process.
As such, the metrics chosen reflect the key objectives of:
- reducing build failure rate – a major source of friction in the process;
- reducing time to build and time to recover from failed builds – another critical determinant of deployment efficiency; and
- streamlining the Pull Request process in order to optimise the time it takes to go from commit to deployment.
Number of Deployments and Deployment Frequency
Tracks the number of deployments to the live (production) environment and the frequency of those deployments. This is a core Agile metric, as the first principle behind the Agile Manifesto is to satisfy the customer through “early and continuous delivery of valuable software”. As such, this is the ultimate test of our ability to deliver software in an agile way.
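As a minimal sketch of how this metric might be calculated, the example below counts deployments within a reporting period and expresses them as deployments per week. The function name, the sample dates and the choice of a weekly rate are illustrative assumptions, not a prescribed method.

```python
from datetime import date


def deployment_frequency(deploy_dates, period_start, period_end):
    """Return (count, deployments per week) for an inclusive reporting period."""
    in_period = [d for d in deploy_dates if period_start <= d <= period_end]
    days = (period_end - period_start).days + 1  # inclusive of both end dates
    return len(in_period), len(in_period) / (days / 7)


# Hypothetical production deployment dates
deploys = [
    date(2023, 3, 1),
    date(2023, 3, 3),
    date(2023, 3, 8),
    date(2023, 3, 14),
    date(2023, 3, 20),  # falls outside the reporting period below
]

count, per_week = deployment_frequency(deploys, date(2023, 3, 1), date(2023, 3, 14))
print(f"{count} deployments, {per_week:.1f} per week")
```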
Number of Builds and Build Frequency
Related to deployments is the number and frequency of builds, where a build covers the steps leading up to deployment, such as compiling, generating code and packaging.
Mean Build Time and Mean Failed Build Time
Mean Build Time is the average time taken to execute a build. It is very often a critical metric because lengthy builds add directly to overall Lead Time. Mean Failed Build Time is of particular importance: as the old adage goes, if you are going to fail, it is better to fail fast. Identifying builds that are slow to fail enables teams to resolve issues and to move more complex steps earlier in the build process, minimising the time that teams are held up during builds.
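The two averages can be sketched as follows, assuming each build is recorded as a status and a duration in seconds. The record layout and the status labels are hypothetical and will differ by CI tool.

```python
from statistics import mean

# Hypothetical build records: (status, duration in seconds)
builds = [
    ("success", 600),
    ("failed", 540),   # fails late: most of the build runs before the error
    ("success", 660),
    ("failed", 60),    # fails fast
]


def mean_build_time(records, status=None):
    """Average duration of all builds, or only those with the given status."""
    durations = [secs for s, secs in records if status is None or s == status]
    return mean(durations) if durations else None


mean_all = mean_build_time(builds)
mean_failed = mean_build_time(builds, status="failed")
print(f"Mean Build Time: {mean_all}s, Mean Failed Build Time: {mean_failed}s")
```

A Mean Failed Build Time close to the overall Mean Build Time suggests that failures are typically detected late in the build.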
Build Failure Rate and Mean Time to Recover from Failures
Build Failure Rate is the percentage of builds that fail and can be filtered by workflow. Failed builds are a significant risk to delivery: they slow the process and create additional work to respond to the failure. The time taken to respond is tracked in Mean Time to Recover from Failures.
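A minimal sketch of both calculations is shown below. It assumes that recovery is measured from the first failed build in a run of failures to the next successful build on the same workflow. That definition, like the sample data, is an assumption made for illustration, and tools may measure recovery differently.

```python
from datetime import datetime, timedelta

# Hypothetical builds for one workflow, in chronological order: (finished_at, passed)
builds = [
    (datetime(2023, 5, 1, 9, 0), True),
    (datetime(2023, 5, 1, 10, 0), False),   # workflow breaks here
    (datetime(2023, 5, 1, 10, 30), False),
    (datetime(2023, 5, 1, 11, 0), True),    # recovered after 1 hour
    (datetime(2023, 5, 1, 13, 0), False),   # breaks again
    (datetime(2023, 5, 1, 13, 30), True),   # recovered after 30 minutes
]


def build_failure_rate(records):
    """Percentage of builds that failed."""
    if not records:
        return None
    failed = sum(1 for _, passed in records if not passed)
    return 100 * failed / len(records)


def mean_time_to_recover(records):
    """Mean time from the first failure in a run to the next passing build."""
    recoveries = []
    broken_since = None
    for finished_at, passed in records:
        if not passed and broken_since is None:
            broken_since = finished_at
        elif passed and broken_since is not None:
            recoveries.append(finished_at - broken_since)
            broken_since = None
    if not recoveries:
        return None
    return sum(recoveries, timedelta()) / len(recoveries)


print(build_failure_rate(builds), mean_time_to_recover(builds))
```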
Flakiest Files (available only in the Plandek dashboard) correlates commit and build data to identify the files that are the source of build failures. This enables teams to find flaky files and fix them more quickly and effectively.
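The general idea of correlating commit and build data can be illustrated with a simple heuristic: for each file, calculate the share of builds containing a change to that file that went on to fail. This is a hypothetical sketch only. It is not the algorithm used in the Plandek dashboard, and the data layout and the min_builds threshold are assumptions.

```python
from collections import Counter

# Hypothetical data: files changed by the commits in each build, and the build result
builds = [
    {"files": ["api/auth.py", "README.md"], "passed": False},
    {"files": ["api/auth.py"], "passed": False},
    {"files": ["ui/app.js"], "passed": True},
    {"files": ["api/auth.py", "ui/app.js"], "passed": True},
    {"files": ["README.md"], "passed": True},
]


def failure_rate_by_file(records, min_builds=2):
    """Rank files by the share of builds touching them that failed."""
    seen, failed = Counter(), Counter()
    for build in records:
        for path in set(build["files"]):
            seen[path] += 1
            if not build["passed"]:
                failed[path] += 1
    # Ignore files seen in too few builds to say anything meaningful
    rates = {p: failed[p] / n for p, n in seen.items() if n >= min_builds}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


ranking = failure_rate_by_file(builds)
for path, rate in ranking:
    print(f"{path}: {rate:.0%} of builds failed")
```

A high rate shows correlation rather than cause, so a ranking like this is best treated as a starting point for investigation.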
There are four DORA metrics, popularised by the DevOps Research and Assessment (DORA) group, which have become increasingly widely used. These are:
- Lead Time for Changes
- Deployment Frequency (described above)
- Change Failure Rate (compare Build Failure Rate, described above)
- Mean Time to Recovery
Lead Time for Changes
Lead Time for Changes is defined in ‘Accelerate: The Science of Lean Software and DevOps’ by Forsgren, Humble and Kim (the book that popularised the DORA metrics) as: ‘the time taken to go from code committed to code successfully running in production’.
As such, it is very similar to the Code Cycle Time metric.
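Following that definition, Lead Time for Changes can be sketched as the interval between a commit timestamp and the timestamp at which the change reached production. The sample data and the use of the median to summarise the results are illustrative assumptions.

```python
from datetime import datetime, timedelta
from statistics import median

# Hypothetical changes: (committed_at, deployed_to_production_at)
changes = [
    (datetime(2023, 6, 1, 9, 0), datetime(2023, 6, 1, 17, 0)),   # 8 hours
    (datetime(2023, 6, 2, 9, 0), datetime(2023, 6, 3, 9, 0)),    # 24 hours
    (datetime(2023, 6, 5, 12, 0), datetime(2023, 6, 5, 16, 0)),  # 4 hours
]


def lead_times(records):
    """Time from code committed to code running in production, per change."""
    return [deployed - committed for committed, deployed in records]


median_lead_time = median(lead_times(changes))
print(f"Median Lead Time for Changes: {median_lead_time}")
```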
Code Cycle Time is a broader metric in that it provides insight into the different stages that a Pull Request goes through and the time to deploy. These stages are defined as:
- Time to Review – From open to the first comment or review
- Time to Approve – From the previous stage to approved
- Time to Merge (Commit)/Close – From the previous stage to merge/close
- Time to Deploy – From the previous stage to deployed to production.
Lead Time for Changes, by contrast, focuses only on the last two of these stages: Time to Merge (Commit)/Close and Time to Deploy.
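The stage breakdown can be sketched as below for a single Pull Request. The timestamp field names are hypothetical, since each Git hosting tool exposes these events under its own names. Lead Time for Changes is derived here as the sum of the last two stages.

```python
from datetime import datetime, timedelta

# Hypothetical event timestamps for a single Pull Request
pr = {
    "opened": datetime(2023, 7, 3, 9, 0),
    "first_review": datetime(2023, 7, 3, 11, 0),
    "approved": datetime(2023, 7, 3, 15, 0),
    "merged": datetime(2023, 7, 4, 9, 0),
    "deployed": datetime(2023, 7, 4, 12, 0),
}

# Each stage runs from the end of the previous stage to its own closing event
STAGES = [
    ("Time to Review", "opened", "first_review"),
    ("Time to Approve", "first_review", "approved"),
    ("Time to Merge", "approved", "merged"),
    ("Time to Deploy", "merged", "deployed"),
]


def stage_durations(pull_request):
    """Duration of each Code Cycle Time stage for one Pull Request."""
    return {name: pull_request[end] - pull_request[start] for name, start, end in STAGES}


durations = stage_durations(pr)
code_cycle_time = sum(durations.values(), timedelta())
lead_time_for_changes = durations["Time to Merge"] + durations["Time to Deploy"]

for name, duration in durations.items():
    print(f"{name}: {duration}")
print(f"Code Cycle Time: {code_cycle_time}")
print(f"Lead Time for Changes: {lead_time_for_changes}")
```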
Mean Time to Recovery
Mean Time to Recovery (also known as Time to Restore) measures how long it takes, on average, to restore service when a service incident or defect impacts customers.
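As a minimal sketch, Mean Time to Recovery can be calculated from incident records that hold a start time and a restore time. The record layout is an assumption, as is the choice to exclude incidents that are still open.

```python
from datetime import datetime, timedelta

# Hypothetical incidents: (started_at, restored_at); None means still open
incidents = [
    (datetime(2023, 8, 1, 10, 0), datetime(2023, 8, 1, 11, 30)),  # 1 hour 30 minutes
    (datetime(2023, 8, 9, 14, 0), datetime(2023, 8, 9, 14, 30)),  # 30 minutes
    (datetime(2023, 8, 20, 8, 0), None),                          # excluded: not yet restored
]


def mean_time_to_recovery(records):
    """Average time to restore service across resolved incidents."""
    resolved = [restored - started for started, restored in records if restored is not None]
    if not resolved:
        return None
    return sum(resolved, timedelta()) / len(resolved)


mttr = mean_time_to_recovery(incidents)
print(f"Mean Time to Recovery: {mttr}")
```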