👋 Hi, this is Gergely with a free issue of the Pragmatic Engineer Newsletter. In every issue, I cover challenges at Big Tech and startups through the lens of engineering managers and senior engineers. If you’ve been forwarded this email, you can subscribe here.

Measuring Developer Productivity: Real-World Examples

A deep dive into the developer productivity metrics used by Google, LinkedIn, Peloton, Amplitude, Intercom, Notion, Postman, and 10 other tech companies.

Before we start, two notes:
Last year, the most popular article in The Pragmatic Engineer was “Measuring developer productivity? A response to McKinsey,” which I wrote in collaboration with Kent Beck. Clearly, the topic of measuring developer productivity resonated with many readers! One criticism of the article was that Kent and I offered frameworks for thinking about developer productivity, but didn’t provide a blueprint for how developers should be measured. This was intentional because – like many complicated questions – the answer starts with “it depends.” But it’s been bugging me that I didn’t have more concrete answers about which metrics to measure, especially because this newsletter has covered how LinkedIn and Uber track developer efficiency. So, I reached out to someone with close to a decade of experience in precisely this field.

Abi Noda is coauthor – alongside the creators of DORA and SPACE – of the widely cited whitepaper, DevEx: a new way to measure developer productivity. He’s also the cofounder and former CEO of the developer productivity tool Pull Panda, has worked on software delivery measurements at GitHub, and is CEO and cofounder of the developer insights platform, DX. Abi also writes the Engineering Enablement newsletter. For transparency: I’m an investor in, and advisor to, DX.

In his work, Abi has interviewed the teams responsible for measuring developer productivity at 17 well-known tech companies. In this article, we focus on a selection of these businesses, from the largest to medium-sized scaleups. In this issue, we cover:

1. Developer productivity metrics at 17 tech companies
2. Google’s approach
3. LinkedIn
4. Peloton
5. Scaleups and smaller companies
6. Interesting findings
7. How to select your own metrics to measure
With that, it’s over to Abi.

1. Developer productivity metrics at 17 tech companies

There’s been a lot of debate recently about how to measure developer productivity. There’s no question it’s a complex problem: software engineering is knowledge work, so even the question of what it means to be “productive” is tricky. Then you get into metrics that can be harmful or gamed, and it’s obvious how conversations quickly become muddled or pedantic.

But there’s a simpler way. Many companies have dedicated teams focused on making it easier for developers to ship high-quality software. You’ve probably heard of them: they’re often called Developer Productivity (DevProd) or Developer Experience (DevEx) teams. What if, instead of debating how to measure productivity, we looked into what these teams actually measure?

The thing about these teams is that they need developer productivity metrics in order to do their jobs. To prioritize the right projects, they need to be able to measure how productive engineering teams are, and what’s hindering them. They also need metrics to track and show that their work actually moves the needle. By studying which metrics these teams use, I think we can learn a lot about which ones are genuinely helpful.

To find out, I reached out to 17 leading tech companies, asking about the metrics their Developer Productivity functions or teams use. The table they shared shows a wide range of metrics in use, at time of publication.
Every company has its own tailored approach to measuring its engineering organization’s efficiency. For this article, I’ve chosen four deep dives, in decreasing order of company size: Google, LinkedIn, Peloton, and a group of scaleups and smaller companies.
If you’re interested in learning more about the metrics defined in the table above, get a detailed report emailed to your inbox.

2. Google’s approach

I often point to Google as a model for how to measure developer productivity. Still, plenty argue that replicating Google’s level of investment is unattainable (“we’re not Google”). While it’s true that some of the metrics Google captures are out of reach for most companies, I believe organizations of any size can adopt Google’s overall philosophy and approach.

The Developer Intelligence team is a specialized unit at Google, dedicated to measuring developer productivity and providing insights to leaders. For instance, the team helps internal tooling teams understand how developers use their tools, whether they’re satisfied with them, and how fast, reliable, and intuitive the tools are. It also partners with executives to understand the productivity of their organizations.

Whether measuring a tool, process, or team, Google’s Developer Intelligence team subscribes to the belief that no single metric captures productivity. Instead, they look at productivity through three dimensions – speed, ease, and quality – which exist in tension with one another, helping to surface potential tradeoffs. To illustrate, consider measuring the code review process: the team captures metrics for each of the three dimensions.
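To make the speed/ease/quality framing concrete, here is a hypothetical sketch of one metric per dimension for code review. The events, survey scores, and revert counts below are made up for illustration; they are not Google’s actual data or implementation.

```python
from datetime import datetime
from statistics import median

# Hypothetical review events: (sent_for_review, first_reviewer_response).
reviews = [
    (datetime(2024, 1, 8, 9, 0), datetime(2024, 1, 8, 11, 30)),
    (datetime(2024, 1, 8, 14, 0), datetime(2024, 1, 9, 10, 0)),
    (datetime(2024, 1, 9, 9, 0), datetime(2024, 1, 9, 9, 45)),
]

# Speed: median time from "review requested" to first reviewer response.
turnarounds = [resp - sent for sent, resp in reviews]
speed_hours = median(t.total_seconds() / 3600 for t in turnarounds)

# Ease: mean satisfaction with the review process, from a 1-5 survey scale.
survey_scores = [4, 5, 3, 4, 4]
ease = sum(survey_scores) / len(survey_scores)

# Quality: share of merged changes that were later reverted.
merged, reverted = 120, 6
revert_rate = reverted / merged

print(f"speed: {speed_hours:.1f}h, ease: {ease:.1f}/5, reverts: {revert_rate:.1%}")
```

Note how the three numbers pull against each other: pushing reviewers to respond faster (speed) could hurt satisfaction (ease) or let more bad changes through (quality), which is exactly the tension the three-dimension model is meant to surface.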
Of course, the specific metrics used vary depending on what’s being measured, but the three core dimensions remain constant. Google uses both qualitative and quantitative measurements to calculate metrics, relying on this mix to provide the fullest picture possible.
Many of the metrics Google uses are captured via behavioral methods, such as analyzing logs from internal tools, rather than relying on self-reports alone.
3. LinkedIn

LinkedIn employs more than 10,000 people and operates independently within Microsoft. Like Google, it has a centralized Developer Insights team responsible for measuring developer productivity and satisfaction, and for delivering insights to the rest of the organization. This team sits within the broader Developer Productivity and Happiness organization, which focuses on removing friction from key developer activities and improving the internal tools developers use. The Pragmatic Engineer previously covered LinkedIn’s approach to measuring developer productivity, including the three channels LinkedIn uses to capture metrics.
What the previous Pragmatic Engineer article on LinkedIn didn’t cover was the specific metrics LinkedIn uses, examples of which the company shared for this article.
Like Google, LinkedIn’s Developer Insights team looks at both qualitative and quantitative metrics in each area. For example, for build times, they compare the objective measure of how long builds take against how satisfied developers are with their builds.
The quantitative metrics above are typically calculated using medians. One challenge with medians, however, is that the metric doesn’t move even when the outliers improve. In one instance, the LinkedIn team reduced excessively long front-end build times from 25 seconds to 3 seconds. This was a big win! But because of the distribution curve of the data, the improvement barely moved the median. Winsorized means are a way to recognize improvements made at the outliers: they are calculated by replacing the highest and lowest values with numbers closer to the middle, as Grant Jenks of LinkedIn’s team has explained.
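A winsorized mean is simple to compute. Below is a minimal Python sketch using a crude nearest-rank percentile and made-up build-time data; LinkedIn’s actual implementation and percentile choice may well differ. The point it demonstrates: when a handful of slow outlier builds are fixed, the median doesn’t move at all, but the winsorized mean does.

```python
from statistics import median

def winsorized_mean(values, lower=0.05, upper=0.95):
    """Clamp values below/above the given percentiles to the percentile
    values (nearest-rank), then take a plain mean."""
    s = sorted(values)
    n = len(s)
    lo = s[int(lower * (n - 1))]
    hi = s[int(upper * (n - 1))]
    return sum(min(max(v, lo), hi) for v in s) / n

# Hypothetical front-end build times in seconds: most builds are fast,
# a couple of outliers are very slow.
before = [3] * 18 + [25, 300]
after  = [3] * 20  # the slow builds were fixed

print(median(before), median(after))                    # 3.0 3.0 -- median is blind to the fix
print(winsorized_mean(before), winsorized_mean(after))  # 5.2 3.0 -- winsorized mean captures it
```

Winsorizing (clamping extremes) differs from trimming (dropping them): the outliers still count, but their influence is bounded, so improvements to them register without letting one pathological build dominate the average.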
The Developer Experience Index is a special metric LinkedIn provides to teams: an aggregate score based on a number of different metrics, such as those listed earlier. Grant generally cautions against using such composite scores, due to the complexity of developing and calibrating them.
For more details, see Abi’s podcast interview with LinkedIn.

4. Peloton

Peloton employs around 3,000-4,000 people; a large company, but considerably smaller than LinkedIn. Peloton’s measurement approach began with capturing qualitative insights through developer experience surveys; later, the team started pairing this data with quantitative metrics for a fuller picture. Peloton measures productivity by focusing on four key areas: engagement, velocity, quality, and stability.
The developer experience survey – through which many of these metrics are measured – is led by Peloton’s Tech Enablement & Developer Experience team, part of the Product Operations organization. This team is also responsible for analyzing the survey findings and sharing them with leaders across the organization, as Thansha Sadacharam, head of tech learning and insights, has explained.
The survey happens twice a year, and is sent to a random sample of roughly half of all developers. With this approach, each developer only needs to participate in one survey per year – minimizing the time spent filling out surveys, while still producing a statistically representative set of results. For more, check out Abi’s interview with Thansha Sadacharam.

5. Scaleups and smaller companies

There are several scaleups on the developer productivity list: Notion, Postman, Amplitude, GoodRx, Intercom, and Lattice. These companies range from roughly 100 to 1,000 engineers. One thing many of them focus on is measuring “moveable metrics.” A moveable metric is one that a developer productivity team can “move,” positively or negatively, through its work – which makes such metrics helpful for showcasing the team’s own impact. Here are some commonalities in what these companies measure:

1. Ease of delivery (moveable). Most of these companies measure ease of delivery: a qualitative measure of how easy or difficult developers feel it is to do their job. Several DevProd leaders shared that they use this metric as a “north star,” since their teams’ goal is to make developers’ lives easier. The metric is also useful for showing impact, thanks to being fairly moveable. From a theory standpoint, it captures key aspects of the developer experience, such as cognitive load and feedback loops.

2. Engagement. Most of these companies also track engagement: a measure of how excited and stimulated developers feel about their work. While engagement is commonly measured in HR surveys, DevProd teams cited reasons of their own for tracking it.
3. Time loss (moveable). GoodRx and Postman pay attention to the average amount of lost time, measured as the percentage of developers’ time lost to obstacles in the work environment. Like ease of delivery, this gives DevProd teams a moveable metric their work can directly impact. A major benefit is that time loss can be translated into dollars, which makes it easy for business leaders to understand. For example, if an organization with $10M in engineering payroll costs reduces time loss from 20% to 10% through an initiative, that translates into $1M of savings.

4. Change failure rate. This is one of the four key metrics from the DORA research program, and a top-level metric tracked by several companies, including Amplitude and Lattice. The DORA team defines change failure rate as the percentage of changes to production that result in degraded service and subsequently require remediation, such as a hotfix or rollback.
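Both of these calculations are plain arithmetic. A quick sketch, using the dollar figures from the time-loss example above plus hypothetical incident and deploy counts (real pipelines would pull these counts from systems like PagerDuty and a CD service, as described below):

```python
def time_loss_savings(payroll, loss_before, loss_after):
    """Dollar value of reducing the share of developer time lost to friction."""
    return payroll * (loss_before - loss_after)

# The article's example: $10M payroll, time loss cut from 20% to 10%.
print(f"${time_loss_savings(10_000_000, 0.20, 0.10):,.0f}")  # $1,000,000

def change_failure_rate(incidents, deploys):
    """Incidents requiring remediation, per production deployment."""
    return incidents / deploys

# Hypothetical month: 4 incidents across 200 production deploys.
print(f"{change_failure_rate(4, 200):.1%}")  # 2.0%
```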
Lattice measures change failure rate as the number of PagerDuty incidents divided by the number of deployments. Amplitude measures it as P0s (priority zeros – the most important incidents) over production deploys; the P0 count comes from PagerDuty, and the deploy count from their continuous delivery service, Spinnaker.

6. Interesting findings

Several interesting findings stand out to me after reviewing how 17 tech companies measure engineering productivity:

DORA and SPACE metrics are used selectively

I expected the metrics from DORA and SPACE to appear more frequently, since they’re often considered “industry standard.” But only one company, Microsoft, mentioned embracing one of these frameworks wholesale – which makes sense, since Microsoft authored the SPACE framework. At other companies, individual metrics from these frameworks were mentioned, but only as components in a broader, more holistic measurement strategy.

Broad adoption of qualitative metrics

Across the board, all companies shared that they use both qualitative and quantitative measures. For example, developer engagement was shared as a metric by Intercom, Postman, and Peloton, and companies including Atlassian, Lattice, and Spotify similarly rely on self-reported measures. This highlights a marked shift in how top companies approach developer productivity measurement: five years ago, most of them were likely focused exclusively on quantitative metrics. Our recent ACM paper provides a framework for leveraging qualitative metrics to measure developer experience. For more, check out the article, A new way to measure developer productivity – from the creators of DORA and SPACE.

A big emphasis on “focus time”

I was surprised by how many companies track “focus time” as a top-level metric. Although research has shown “deep work” to be an important factor in developer productivity, I didn’t expect as much attention on it as I found.
Stripe and Uber shared specific metrics, such as “Number of Days with Sufficient Focus Time” and “Weekly Focus Time Per Engineer,” while other companies mentioned deep work as a topic they measure in their developer survey programs. The Pragmatic Engineer previously covered how Uber measures engineering productivity.

Unique metrics

There’s quite a bit of overlap in what different developer productivity teams measure, but a few unique metrics are also worth highlighting.
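As an aside on the focus-time metrics mentioned above: a count of “days with sufficient focus time” could plausibly be derived from calendar data. Here is one sketch of the underlying calculation; the workday bounds, two-hour threshold, and meeting data are assumptions for illustration, not Stripe’s or Uber’s actual method.

```python
from datetime import datetime, timedelta

def longest_free_block(day_start, day_end, meetings):
    """Longest meeting-free gap in a workday.
    `meetings` is a list of (start, end) datetimes, possibly unsorted."""
    best, cursor = timedelta(0), day_start
    for start, end in sorted(meetings):
        if start > cursor:
            best = max(best, start - cursor)
        cursor = max(cursor, end)
    return max(best, day_end - cursor)

# Hypothetical day: 9:00-17:00 with two one-hour meetings.
d = datetime(2024, 1, 8)
meetings = [
    (d.replace(hour=10), d.replace(hour=11)),
    (d.replace(hour=14), d.replace(hour=15)),
]
block = longest_free_block(d.replace(hour=9), d.replace(hour=17), meetings)
print(block)  # 3:00:00 -- the 11:00-14:00 gap

# A day "has sufficient focus time" if the longest block clears a threshold,
# e.g. two hours; counting such days per engineer per week gives a
# Stripe-style top-level number.
print(block >= timedelta(hours=2))  # True
```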
7. How to select your own metrics to measure

I always recommend borrowing Google’s Goals, Signals, Metrics (GSM) framework to guide metric selection. Too often, teams jump to metrics before thinking through what they actually want to understand or track. The GSM framework helps teams identify their goal first, then work backwards to pick metrics that serve it – an approach Ciera Jaspan, of Google’s Developer Intelligence team, has described in detail.
Choosing metrics as a Developer Productivity team

Start by defining your charter: why does your DevProd team exist? A charter might be, for example, to make it easier for developers to deliver high-quality software, to reduce friction in day-to-day development work, or to improve developer satisfaction and retention.
Work backwards from your goals to define top-level metrics. If you want to make it easier for developers to deliver high-quality software, how will your team know whether you’re doing that? You might look for signals such as whether developers find delivery easy, how quickly changes reach production, and how reliable those changes are once shipped.
For each of those categories, define metrics to help track how it’s going – for example, ease of delivery scores from surveys, lead time for changes, and change failure rate.
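One lightweight way to run this exercise is to literally write the goal, signals, and metrics down as data before building any dashboards. A sketch in Python – the goal, signal names, and metric choices here are illustrative examples drawn from this article, not a prescribed set:

```python
# A Goals-Signals-Metrics (GSM) worksheet as plain data: one goal,
# the signals that would indicate progress, and the metrics per signal.
gsm = {
    "goal": "Make it easier for developers to deliver high-quality software",
    "signals_to_metrics": {
        "Developers find it easy to ship": ["Ease of delivery (survey)", "Time loss (survey)"],
        "Changes reach production quickly": ["Lead time for changes", "Deployment frequency"],
        "Shipped changes are reliable": ["Change failure rate", "Number of incidents"],
    },
}

# Render the worksheet for review before any instrumentation work starts.
for signal, metrics in gsm["signals_to_metrics"].items():
    print(f"- {signal}: {', '.join(metrics)}")
```

Writing the worksheet down first forces the "work backwards" discipline: any metric that can’t be traced to a signal, and any signal that can’t be traced to the goal, gets cut.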
These metrics should sound similar to many of those discussed in this article. Use similar top-level metrics for your DevProd team to convey the value and impact of your efforts; with the right metrics, you can keep everyone aligned within and outside your team. On top of the top-level metrics, many DevProd teams also use operational metrics: ones you’ll want to tie to specific projects or objectives and key results (OKRs).
I wish there was a good, one-size-fits-all solution, but the reality is that there isn’t. What’s important is to choose metrics your team can – and does! – control. Avoid targeting high-level metrics that can be affected by factors beyond your control.

If you’re an engineering leader

If you’re a CTO, VP of Engineering, or Director of Engineering, it’s almost certain your scope is broader than the definition of developer productivity discussed in this article. When I speak with engineering leaders who are figuring out metrics, they’ve often been asked for metrics by their CEO or leadership team. In this case, my best advice is to reframe the problem: what your leadership team wants is less about the perfect productivity metrics, and much more about feeling confident that you’re being a good steward of the company’s investment in engineering. To demonstrate good stewardship, consider selecting metrics in three buckets:

1. Business impact. Report on current or planned projects, alongside data that answers stakeholders’ questions about what engineering is delivering for the business.
This type of reporting is often seen as the product team’s responsibility. Still, only engineering can represent the full set of projects being worked on; a good example is a database migration, which is unlikely to appear on the product team’s roadmap.

2. System performance. Engineering organizations produce software, so stakeholders will want to know about the health and performance of these systems, and you’ll need to answer their questions about it.
Useful metrics to report here include uptime, number of incidents, and product NPS scores. If you have a dedicated Infra or UXR team, they’re likely already capturing metrics in this bucket.

3. Engineering effectiveness. Stakeholders want to know how effective the engineering organization is, and how it can be improved. This article has focused primarily on this bucket, so you can apply what we’ve learned from how DevProd functions measure things.

Takeaways

Gergely here. Thanks, Abi, for this deep dive into developer productivity metrics. It’s little surprise that the largest tech companies already measure developer productivity. I found it more surprising that businesses with engineering teams in the low hundreds – such as Lattice and GoodRx – also do this.

Measuring a mix of qualitative and quantitative metrics is common across all these companies. They all measure several things, and the metrics often capture different areas. For example, Stripe measures both the number of PRs shipped per developer per week, and the number of days with sufficient focus time per engineer.
Both measurements give a different view of productivity, and help forecast problems. If Stripe only measured weekly PRs, they might see a team that’s smashing it by shipping 10 PRs per developer, per week. But what if that team also has zero days of sufficient focus time? That team is likely close to burnout, at which point its PR count might well drop.

Take inspiration from the wide range of measurements each company uses. Most businesses in the survey measure at least 5-6 different metrics, and there’s big variance in what they focus on, much of it driven by their own priorities and engineering cultures. And I cannot nod enough to Abi’s advice on how to choose your own metrics: start with the problem you want to solve. Is it shipping with less friction, retaining developers by keeping them happy and satisfied, raising the quality of the software you ship, or something else? Then work backwards from there!

Hire Faster With The Pragmatic Engineer Talent Collective

If you’re hiring software engineers or engineering leaders, join The Pragmatic Engineer Talent Collective. It’s the #1 talent collective for software engineers and engineering managers: get weekly drops of outstanding software engineers and engineering leaders open to new opportunities. I vet every software engineer and manager, and add a note on why they are a standout profile. Companies like Linear use this collective to hire better and faster. Read what companies hiring say. And if you’re hiring, apply here.

Featured Pragmatic Engineer Jobs
See more senior engineer and leadership roles with solid engineering cultures on The Pragmatic Engineer Job board – or post your own.

You’re on the free list for The Pragmatic Engineer. For the full experience, become a paying subscriber. Many readers expense this newsletter within their company’s training/learning/development budget. This post is public, so feel free to share and forward it.