
An incomplete list of software development metrics

It's a list!

There have been a couple of posts on Statistical Process Control (SPC) recently. The most common objections are that you can't measure software development, that software development is too intrinsically variable for these techniques to be worthwhile, or that the measures have no effect on outcomes that matter (e.g., free cash flow (FCF), return on invested capital (RoIC), etc.).

Here's an incomplete list of measurements that might apply to software systems and their associated development processes:

  • Team size
  • Team aggregate years of experience
  • Average years of experience
  • Count of services owned by the team
  • Time since last deployment to a service
  • Time since last commit to a service
  • Count of tickets in the backlog
  • Count of estimated but incomplete tickets
  • Count of unestimated tickets
  • Age of oldest ticket in backlog
  • Age of oldest ticket in $Stage (in-progress, review, to deploy)
  • Count of tickets in $Stage
  • Test non-commentary source statements (NCSS) total
  • Test NCSS added per PR
  • Non-test NCSS total
  • Non-test NCSS added per PR
  • Test cyclomatic complexity
  • Non-test cyclomatic complexity
  • PR age
  • Time between PR ready for review and PR merge
  • PR comment count
  • PR count
  • Time between PR merge and deploy
  • Build duration
  • Deployment duration
  • Count of commits deployed per deployment
  • PRs per story
  • Story cycle time
  • Story cycle time broken down by story point estimate
  • Story lead time (cycle vs. lead time is sketched in code after this list)
  • Story ticket word count
  • Story comment count
  • Stories complete per week
  • Story points complete per week (if tracking story points)
  • Scrum meeting dev-time (grooming, planning, review, daily stand-up, etc.)
  • Non-Scrum meeting dev-time
  • Non-Scrum meeting count
  • Tickets completed without estimate (if estimating)
  • Tickets completed with estimate (if estimating)
  • Dev-time spent per ticket estimate (if estimating)
  • Within-team estimate noise (if using unambiguous estimates and getting multiple estimates for each ticket)
  • Estimate mean squared error (if using unambiguous estimates, e.g., ticket duration or calendar date)
  • Estimate/Actual ratio (if using unambiguous estimates; sketched in code after this list)
  • Tickets in progress per developer (not done or in backlog)
  • Count of tickets "resurrected" from "done"
  • Count of tests failed
  • Time to repair failed test
  • Test duration
  • Deployment count
  • Deployment automatically rolled-back count
  • Deployment manually rolled-back count
  • Statement coverage
  • Branch coverage
  • t-way coverage
  • Mutation coverage
  • Tests per API method
  • Count of tickets of each $Type (bug, task, story, ...) in each $Stage (incl. backlog)
  • Ticket counts by priority (if tracking priority)
  • Paging alert count
  • Non-paging alert count
  • Paging alert response time
  • Non-paging alert response time
  • Time to resolve a paging alert
  • Time to resolve a non-paging alert
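
For the cycle time and lead time items above, here's a minimal sketch of one way to compute them. The ticket fields and dates are invented for illustration, not any particular tracker's schema:

    from datetime import date

    # Hypothetical ticket timestamps; field names are assumptions, not a
    # real tracker's schema.
    tickets = [
        {"created": date(2024, 3, 1), "started": date(2024, 3, 4), "done": date(2024, 3, 8)},
        {"created": date(2024, 3, 2), "started": date(2024, 3, 11), "done": date(2024, 3, 13)},
    ]

    for t in tickets:
        lead_time = (t["done"] - t["created"]).days    # from request to delivery
        cycle_time = (t["done"] - t["started"]).days   # from starting work to delivery
        print(f"lead {lead_time}d, cycle {cycle_time}d")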
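
Similarly, a sketch of the estimate error metrics, assuming unambiguous estimates in developer-days (the numbers are made up):

    # Hypothetical (estimate, actual) pairs in developer-days.
    pairs = [(2, 3.5), (1, 1.0), (5, 8.0), (3, 2.5)]

    mse = sum((est - act) ** 2 for est, act in pairs) / len(pairs)
    ratios = [est / act for est, act in pairs]

    print(f"estimate mean squared error: {mse:.2f}")
    print(f"mean estimate/actual ratio: {sum(ratios) / len(ratios):.2f}")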

There are also the usual system monitoring suspects: cloud provider costs, error counts, request counts, latency percentiles, disk IOPS, count of inodes used, days to TLS expiry, dependency request count, dependency error count, dependency latency, retry count, etc.

There are also the DORA metrics, which I think I've somewhat duplicated above.

I think that this list conclusively disproves the claim that we can't measure software development.

I'd also argue that this weakens the claim that we can't measure things that affect the outputs we care about. There are a lot of possible metrics here. It would be surprising if none of them affected FCF, RoIC, etc.

I don't think this list helps much with the claim that software development is too intrinsically variable for SPC to apply. I think you need to try it in your own context to know whether that's true for you. If it is true, the next question is: are you sure the variance is intrinsic? What have you tried in order to reduce the process's variance?
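
One way to run that experiment is to put one of these metrics on an XmR (individuals and moving range) chart and see how often it signals. A minimal sketch, with made-up weekly throughput numbers:

    # XmR natural process limits for a weekly metric.
    # The throughput numbers below are invented for illustration.
    stories_per_week = [7, 5, 9, 6, 8, 4, 7, 10, 6, 5, 8, 7]

    moving_ranges = [abs(b - a) for a, b in zip(stories_per_week, stories_per_week[1:])]
    mean = sum(stories_per_week) / len(stories_per_week)
    mr_bar = sum(moving_ranges) / len(moving_ranges)

    # Standard XmR constant: limits sit at the mean +/- 2.66 * average moving range.
    upper = mean + 2.66 * mr_bar
    lower = mean - 2.66 * mr_bar

    print(f"centre line {mean:.1f}, limits [{lower:.1f}, {upper:.1f}]")
    print("points outside the limits:", [x for x in stories_per_week if not lower <= x <= upper])

Points inside the limits suggest routine (common-cause) variation; frequent signals suggest special causes worth investigating.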

Some of the metrics are hard to measure, and some may need operational definitions. For example, what exactly is an "API method"? Does it matter whether it is publicly accessible?
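
For instance, one possible operational definition (an assumption, not a standard) is that an "API method" is any public function on a service's handler module, counted roughly like this:

    import inspect

    # Stand-in for a service's handler module; in practice you'd import the real one.
    class handlers:
        def list_users(): ...
        def create_user(): ...
        def _internal_helper(): ...  # leading underscore: not counted as part of the API

    # Operational definition (assumed): an "API method" is any public function
    # defined on the handlers namespace.
    api_methods = [
        name for name, _obj in inspect.getmembers(handlers, inspect.isfunction)
        if not name.startswith("_")
    ]
    print(f"API method count: {len(api_methods)}")  # -> 2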

There's one final objection that this doesn't cover: the quality of the software process is irrelevant compared to the product and positioning. I don't believe this; I think they're different skills. A team with fast, reliable execution and a good sense for the market will probably outperform a team that only has a good sense for the market. There's also the lifetime of the software after you've launched it (bug fixes, improvements, etc.) to consider. A team that can deliver faster and more reliably probably delivers better RoIC.

Thank you to kqr for the original inspiration, and to Cedric Chin for helping me reflect on how I've seen others measure software development.