Measuring the impact of GenAI on engineering productivity and the software delivery process
Pranav Lakhotia

Head of Professional Services, Plandek

In this article, we present Plandek’s findings on where generative AI (GenAI) tools are currently being used within the industry. We then offer a point of view on the impact these tools can have on productivity: where the value lies, how the impact can be measured, and which pitfalls to avoid.

1. Introduction to GenAI tools

GenAI tools are the latest iteration of artificial intelligence and machine learning technologies. They are based on deep learning models trained on vast datasets, which can generate new content that imitates the styles, patterns, and structures found in that data.

In the software development world, we are seeing teams start to use tools like GitHub Copilot and ChatGPT, which help engineers in a variety of ways, from pair programming and automating code documentation to writing unit tests. All of these uses are intended to help engineers save time and bring code to production faster.

Over the past year, Plandek has seen increased usage of these tools across our client base. Due to this, we, along with our customers, were curious to see where the impact of using these tools can be seen. More importantly, we wanted to start measuring whether these new ways of working made a material difference in productivity, efficiency, and ROI.

In this article, we discuss some initial findings from survey results and client interviews on where and how GenAI is being used. We then present a point of view on how the impact can be measured, where you can start to see real value from these tools, and where the impact may be muted and why.

2. Use cases

In April 2023, Gartner conducted an early-stage survey on how and where GenAI tools are used within the software engineering industry. Their findings are highlighted in the illustration below.

Figure: Gartner’s early-stage survey on how and where GenAI tools are used within the software engineering industry

While Gartner’s survey results helped to frame where the impact of GenAI might be seen across broader use cases, we needed to understand how our own clients were adopting GenAI (largely GitHub Copilot) within their engineering teams, which use cases were most prominent, and what challenges teams faced, so that we could help them identify and understand the impact of GenAI.

Interestingly, the feedback we received from clients suggests a very different set of use cases from Gartner’s survey. This could simply be down to our clients’ environments and the areas where they prefer to deploy the technology, or it could reflect evolving use cases more broadly in the market.

Amongst our customers:

  • The highest value for GenAI tools was seen when automating unit tests and documenting code.

  • They have seen moderate value in code generation or pair programming, but the challenge faced so far is that the technology often provides incorrect results, which need to be double-checked and corrected before moving to testing.

  • Finally, the most common reason for the drop-off in GenAI usage amongst our client base was the inaccuracy of results: engineers said that they can write code from scratch faster than they can correct generated code. We dive a little deeper into this in section 4.2.

3. Building a business case for adopting GenAI tools

To build a business case for adopting GenAI tools, engineering leaders could look to measure the impact these tools could have on the following pillars – cost avoidance, cost reduction and roadmap acceleration. We believe these three pillars provide a good basis for someone to build a business case around and understand whether using GenAI tools can ultimately achieve what most organisations in the industry are striving for – “deliver more for less”.

Figure: Three pillars for measuring GenAI tools

To measure the impact of GenAI tools on these three pillars and, ultimately, on ROI, one might start with the areas where the impact is immediately evident, and then measure whether improvements in those areas result in real value being realised.

4. Areas of impact

Software delivery is a complex, multifaceted process requiring various interdependencies to come together in order to successfully deliver new products or upgrade existing ones by adding new features. It’s important to note that the use of GenAI tools will only help certain aspects of the delivery process, not all of them.

When we start to look for the impact these tools can have, we are immediately drawn to value-driving metrics such as Lead Time to Value, Cycle Time, Sprint Completion %, or Deployment Frequency, as they are generally indicative of product/software delivery health and tie most closely to the ROI pillars above. However, these metrics measure the overall efficiency and health of a complex, multifaceted software delivery system involving highly influential factors far beyond the reach of GenAI (e.g. the collaboration required between team members working in different functional roles, such as a product owner and an engineer). As such, it is often difficult to isolate the impact of GenAI tools on the overall system if that system is impaired by factors outside the remit of how GenAI is deployed.

However, if we think about the use cases (section 2) where GenAI tools are being used in the industry right now, we do have certain areas where they can have a direct impact. For example, using GenAI tools to help automate the writing of unit tests can potentially help you reduce the number of bugs coming out of the system or, at the very least, help you get code into testing faster.

Now, it’s natural to assume that if we see improvements in the directly impacted areas, we should start to see improvements in the real value-driving areas as well, but this isn’t necessarily the case. In fact, the feedback we’ve received from early adopters of these tools within our client base bears this out. In this section, we look at both sets of measurable areas and explore some reasons why improvements in the direct areas do not necessarily correlate with improvements in the real value drivers.

4.1 Areas of direct impact

When engineering teams start to adopt GenAI tools, they are likely to see a fast and easy-to-measure impact on one or more of the following areas:

  • Adoption of GenAI tools (as a percentage)

  • Time to Merge Pull Requests (PRs)

  • Number of PRs Merged

  • Code Knowledge Distribution

  • Number of Bugs

  • Number of Escaped Defects

Plandek is the perfect tool to measure all of the above. However, among the early adopters of GenAI tools in our client base, we’re seeing mixed feedback on the impact that the tools have had on these areas.

One client who has been using GitHub Copilot mentioned that they’re seeing a decrease in the number of bugs but an increase in the number of escaped defects. This would suggest that the code generated by Copilot contains subtle bugs that are not caught during code review or testing and so end up in production.
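As a rough illustration of two of the measures listed above, the sketch below computes the number of PRs merged and the median time to merge from a list of pull request records. The `PullRequest` record and its field names are assumptions made for this example, not Plandek’s data model or any particular tool’s API; in practice the timestamps would come from your code repository.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median
from typing import Optional


@dataclass
class PullRequest:
    # Hypothetical record; field names are illustrative only.
    opened_at: datetime
    merged_at: Optional[datetime]  # None if the PR has not been merged


def merged_count(prs: list[PullRequest]) -> int:
    """Number of PRs that have been merged."""
    return sum(1 for pr in prs if pr.merged_at is not None)


def median_hours_to_merge(prs: list[PullRequest]) -> Optional[float]:
    """Median hours between opening and merging, over merged PRs only."""
    hours = [
        (pr.merged_at - pr.opened_at).total_seconds() / 3600
        for pr in prs
        if pr.merged_at is not None
    ]
    return median(hours) if hours else None
```

Comparing these figures for the periods before and after a GenAI rollout, ideally team by team, gives a first signal. On its own it does not show that the tools caused any change.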

Why might this be? A reason for this could be that engineers are still upskilling in writing prompts to generate code from the tool, and the code being generated is not fully robust yet. This problem should slowly go away as your engineers get better at writing prompts, but until then, a good way to alleviate this is to ensure thorough checks are done on any code generated by these tools. In fact, this is the feedback that we’ve gotten from the client – “code generated needs to be thoroughly double-checked and modified, if necessary, before going into testing. However, we are starting to use these tools as the basis of a solution, then inspecting and modifying if needed.”
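One way to make the pattern this client described easier to spot is to track the share of defects that escape to production rather than the two raw counts separately. The sketch below assumes a simple definition (escaped defects divided by all defects found, whether before or after release). Both the function name and the definition are illustrative assumptions, and teams may define the measure differently.

```python
def escaped_defect_rate(bugs_before_release: int, escaped_defects: int) -> float:
    """Share of all known defects that were found in production.

    Assumed definition, for illustration only:
    escaped / (caught before release + escaped).
    """
    if bugs_before_release < 0 or escaped_defects < 0:
        raise ValueError("counts must be non-negative")
    total = bugs_before_release + escaped_defects
    # With no defects recorded at all, report 0.0 rather than dividing by zero.
    return escaped_defects / total if total else 0.0
```

A falling bug count alongside a rising rate from this function would match the client’s observation that fewer defects are being caught before release.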

It’s all well and good being able to measure these areas of direct impact, and one would expect them to improve once you’ve started using GenAI tools, but how, if at all, do they correlate with the real value drivers?

4.2 Real value drivers

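A few areas to measure to understand whether we’re actually deriving value from the use of GenAI tools could be value-driving metrics such as those named at the start of section 4: Lead Time to Value, Cycle Time, Sprint Completion % and Deployment Frequency.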

One would like to assume that improvements in the direct impact areas mentioned in the previous section would result in the above value drivers also improving, but this might not necessarily be the case. It is key to remember that delivering software requires a complex network of interdependencies to come together. There are different people with differing skill sets involved, so expecting an improvement in direct areas of impact to result in real value being realised can lead you to focus on the wrong problems rather than tackling the real issues at hand.

For example, if your sprint completion % is poor, that suggests there may be inefficiencies in your sprint planning process. No amount of GenAI tool usage can help an engineering team improve this metric if their planning is poor to begin with. To improve their sprint completion and overall sprint predictability, the team would have to iron out the issues in their planning process by becoming better at estimation, ensuring work is divided correctly, removing any bottlenecks in their SDLC and resolving any other issues they might be facing.

But once they do and then start to change their ways of working to incorporate GenAI tools, they could start to see significant improvements in all areas.

However, we must caveat the above statement: while your engineers are upskilling themselves in writing prompts, you may see a decrease in productivity for some time. Your engineers may need a couple of months (could be more, could be less) before they are well-versed in “AI speak”, i.e. efficient at writing prompts that generate robust code which doesn’t need a lot of modification before it can go into testing.

The way your engineers spend time delivering functionality will change as they go from writing code from scratch to writing prompts, reviewing generated code, and making changes as necessary. When they’re starting off, it’s likely that they will be slower than if they were writing code from scratch, but given enough time and experience with GenAI tools, we would estimate that they will eventually become a lot more efficient than before and start delivering value for you a lot faster.

4.3 Relating back to the three pillars

1. Cost Avoidance –

  • Normally, to achieve higher throughput, teams tend to hire additional capacity.
  • Using GenAI tools can result in improvements in lead and cycle times, which would indicate that your engineers are becoming more efficient.
  • This increased efficiency can help avoid the need to hire more developers as you’re able to achieve your higher throughput with the same team you have today.

2. Cost Reduction –

  • A good use case for GenAI tools is to tackle areas of large technical debt, whether by writing better unit tests, pair programming with the tools or generating code.
  • If areas of technical debt are tackled and resolved, this can lower certain operational costs such as hosting, reducing your overall outlay.

3. Roadmap Acceleration –

  • Once your engineers reach a good level of maturity with writing prompts and using GenAI tools, they will be able to deliver new products or features faster, thereby reducing your time to market.
  • This can prove invaluable in this competitive day and age, enabling you to get ahead of your competition.
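The cost avoidance pillar above can be sketched as simple arithmetic. The function below is a hypothetical back-of-the-envelope model, not a Plandek calculation: it assumes throughput scales linearly with headcount, and every parameter (team size, efficiency gain, cost per engineer, tool cost) is an input you would supply yourself rather than a figure from our findings.

```python
def avoided_hiring_cost(
    team_size: int,
    efficiency_gain: float,
    annual_cost_per_engineer: float,
    annual_tool_cost_per_engineer: float,
) -> float:
    """Net annual cost avoided by not hiring to reach the same throughput.

    efficiency_gain is a fraction, e.g. 0.10 for a 10% throughput increase.
    Assumes (simplistically) that throughput scales linearly with headcount.
    """
    # Extra engineers you would otherwise need for the same throughput.
    equivalent_hires = team_size * efficiency_gain
    gross_avoided = equivalent_hires * annual_cost_per_engineer
    # Tool licences are paid for the whole existing team.
    tool_cost = team_size * annual_tool_cost_per_engineer
    return gross_avoided - tool_cost
```

A zero or negative efficiency gain produces a negative result, which is what you would expect to see during the initial upskilling period described in section 4.2.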

5. Conclusion

Software delivery is a complex process requiring a lot of different facets to fall into place for delivery to be successful. From what we’ve seen of the use of GenAI tools so far, we are optimistic about their value. However, they need to be used correctly to ensure that you derive the most value from them.

The caveat about an initial drop in productivity cannot be overstated, as this is a very likely scenario when teams start to incorporate these tools into their ways of working. However, we expect this dip to be temporary: once your engineers gain more experience with GenAI tools, they should become more efficient, and you should start seeing improvements in your directly impacted areas.

As mentioned in the article, whether these improvements result in actual value being realised depends on other factors at play. Plandek’s product and people are perfectly positioned to help you improve these other factors to ensure you get the most out of your investment in GenAI tools in terms of cost avoidance, cost reduction, and roadmap acceleration.

We’re very excited by the use of generative AI and encourage customers to start adopting the tools as we see where and how they can be impactful, but they shouldn’t be seen as a silver bullet that will drastically improve productivity. Having said that, if the tools are used correctly and other aspects of your software delivery life cycle are also improved, you are very well positioned to achieve the ultimate objective of delivering more for less.

About Plandek

Plandek is an intelligent analytics and performance platform to help software delivery teams deliver valuable software faster and more predictably.

Plandek enables technology teams to track and drive their improvement and share understandable KPIs with stakeholders interested in accelerating value creation and improving delivery efficiency.

Plandek works by mining data from delivery teams’ toolsets (such as issue tracking, code repos and CI/CD tools) to provide actionable and intelligent insight across the end-to-end software delivery process.

Plandek is recognised as a top global vendor in the DevOps Value Stream Management space by Gartner and Forrester and is used by private and public organisations globally to optimise their technology delivery and accelerate R&D ROI.

For more information, please visit www.plandek.com
