
An incomplete list of software development metrics

It's a list!

There have been a couple of posts on Statistical Process Control (SPC) recently. The most common objections are that you can't measure software development, that software development is too intrinsically variable for these techniques to be worthwhile, or that the measures have no effect on outcomes that matter (e.g., free cash flow (FCF), return on invested capital (RoIC), etc.).

Here's an incomplete list of measurements that might apply to software systems and their associated development processes:

  • Team size
  • Team aggregate years of experience
  • Average years of experience
  • Count of services owned by the team
  • Time since last deployment to a service
  • Time since last commit to a service
  • Count of tickets in the backlog
  • Count of estimated but incomplete tickets
  • Count of unestimated tickets
  • Age of oldest ticket in backlog
  • Age of oldest ticket in $Stage (in-progress, review, to deploy)
  • Count of tickets in $Stage
  • Test non-commentary source statements (NCSS) total
  • Test NCSS added per PR
  • Non-test NCSS total
  • Non-test NCSS added per PR
  • Test cyclomatic complexity
  • Non-test cyclomatic complexity
  • PR age
  • Time between PR ready for review and PR merge
  • PR comment count
  • PR count
  • Time between PR merge and deploy
  • Build duration
  • Deployment duration
  • Count of commits deployed per deployment
  • PRs per story
  • Story cycle time
  • Story cycle time broken down by story point estimate
  • Story lead time (cycle vs. lead time is sketched in code after this list)
  • Story ticket word count
  • Story comment count
  • Stories complete per week
  • Story points complete per week (if tracking story points)
  • Scrum meeting dev-time (grooming, planning, review, daily stand-up, etc.)
  • Non-Scrum meeting dev-time
  • Non-Scrum meeting count
  • Tickets completed without estimate (if estimating)
  • Tickets completed with estimate (if estimating)
  • Dev-time spent per ticket estimate (if estimating)
  • Within-team estimate noise (if using unambiguous estimates and getting multiple estimates for each ticket)
  • Estimate mean squared error (if using unambiguous estimates, e.g., ticket duration or calendar date)
  • Estimate/Actual ratio (if using unambiguous estimates; sketched in code after this list)
  • Tickets in progress per developer (not done or in backlog)
  • Count of tickets "resurrected" from "done"
  • Count of tests failed
  • Time to repair failed test
  • Test duration
  • Deployment count
  • Deployment automatically rolled-back count
  • Deployment manually rolled-back count
  • Statement coverage
  • Branch coverage
  • t-way coverage
  • Mutation coverage
  • Tests per API method
  • Count of tickets of each $Type (bug, task, story, ...) in each $Stage (incl. backlog)
  • Ticket counts by priority (if tracking priority)
  • Paging alert count
  • Non-paging alert count
  • Paging alert response time
  • Non-paging alert response time
  • Time to resolve a paging alert
  • Time to resolve a non-paging alert
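
For the cycle time and lead time items above, here's a minimal sketch of one way to compute them. The ticket fields and dates are invented for illustration, not any particular tracker's schema:

    from datetime import date

    # Hypothetical ticket timestamps; field names are assumptions, not a
    # real tracker's schema.
    tickets = [
        {"created": date(2024, 3, 1), "started": date(2024, 3, 4), "done": date(2024, 3, 8)},
        {"created": date(2024, 3, 2), "started": date(2024, 3, 11), "done": date(2024, 3, 13)},
    ]

    for t in tickets:
        lead_time = (t["done"] - t["created"]).days    # from request to delivery
        cycle_time = (t["done"] - t["started"]).days   # from starting work to delivery
        print(f"lead {lead_time}d, cycle {cycle_time}d")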
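
Similarly, a sketch of the estimate error metrics, assuming unambiguous estimates in developer-days (the numbers are made up):

    # Hypothetical (estimate, actual) pairs in developer-days.
    pairs = [(2, 3.5), (1, 1.0), (5, 8.0), (3, 2.5)]

    mse = sum((est - act) ** 2 for est, act in pairs) / len(pairs)
    ratios = [est / act for est, act in pairs]

    print(f"estimate mean squared error: {mse:.2f}")
    print(f"mean estimate/actual ratio: {sum(ratios) / len(ratios):.2f}")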

There are also the usual system monitoring suspects: cloud provider costs, error counts, request counts, latency percentiles, disk IOPS, count of inodes used, days to TLS expiry, dependency request count, dependency error count, dependency latency, retry count, etc.

There are also the DORA metrics, which I think I've somewhat duplicated above.

I think that this list conclusively disproves the claim that we can't measure software development.

I'd also argue that this weakens the claim that we can't measure things that affect the outputs we care about. There are a lot of possible metrics here. It would be surprising if none of them affected FCF, RoIC, etc.

I don't think this list helps much with the claim that software development is too intrinsically variable for SPC to apply. I think you need to try it in your own context to know whether that's true for you. If it is true, the next question is: are you sure the variance is intrinsic? What have you tried in order to reduce the process's variance?
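
One way to run that experiment is to put one of these metrics on an XmR (individuals and moving range) chart and see how often it signals. A minimal sketch, with made-up weekly throughput numbers:

    # XmR natural process limits for a weekly metric.
    # The throughput numbers below are invented for illustration.
    stories_per_week = [7, 5, 9, 6, 8, 4, 7, 10, 6, 5, 8, 7]

    moving_ranges = [abs(b - a) for a, b in zip(stories_per_week, stories_per_week[1:])]
    mean = sum(stories_per_week) / len(stories_per_week)
    mr_bar = sum(moving_ranges) / len(moving_ranges)

    # Standard XmR constant: limits sit at the mean +/- 2.66 * average moving range.
    upper = mean + 2.66 * mr_bar
    lower = mean - 2.66 * mr_bar

    print(f"centre line {mean:.1f}, limits [{lower:.1f}, {upper:.1f}]")
    print("points outside the limits:", [x for x in stories_per_week if not lower <= x <= upper])

Points inside the limits suggest routine (common-cause) variation; frequent signals suggest special causes worth investigating.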

Some of the metrics are hard to measure, and some may need operational definitions. For example, what exactly is an "API method"? Does it matter whether it is publicly accessible?
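
For instance, one possible operational definition (an assumption, not a standard) is that an "API method" is any public function on a service's handler module, counted roughly like this:

    import inspect

    # Stand-in for a service's handler module; in practice you'd import the real one.
    class handlers:
        def list_users(): ...
        def create_user(): ...
        def _internal_helper(): ...  # leading underscore: not counted as part of the API

    # Operational definition (assumed): an "API method" is any public function
    # defined on the handlers namespace.
    api_methods = [
        name for name, _obj in inspect.getmembers(handlers, inspect.isfunction)
        if not name.startswith("_")
    ]
    print(f"API method count: {len(api_methods)}")  # -> 2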

There's one final objection that this doesn't cover: the quality of the software process is irrelevant compared to the product and positioning. I don't believe this; I think they're different skills. A team with fast, reliable execution and a good sense for the market will probably outperform a team that only has a good sense for the market. There's also the lifetime of the software after you've launched it (bug fixes, improvements, etc.) to consider. A team that can deliver faster and more reliably probably delivers better RoIC.

Thank you to kqr for the original inspiration, and to Cedric Chin for helping me reflect on how I've seen others measure software development.