👋 Hi, this is Gergely with a 🔒 subscriber-only issue 🔒 of the Pragmatic Engineer Newsletter. In every issue, I cover challenges at Big Tech and startups through the lens of engineering managers and senior engineers.

Inside Stripe’s Engineering Culture: Part 2

Stripe is one of the world’s largest online payment companies. This article is a deep dive into its engineering culture: operational excellence, API review, internal tools, and more.

Founded in 2009, Stripe is one of the biggest tech success stories of both Silicon Valley and Europe. It has dual headquarters in San Francisco, US, and Dublin, in the Republic of Ireland, employs thousands of software engineers, and processed $817B of payments in 2022. That’s around $1.5M processed per minute, on average.

But what’s it like to work at Stripe as an engineer? Over the past few months I’ve talked with CTO David Singleton, and others at the company. This article is the second and final part of a deep dive into how Stripe works, with more previously unshared details of how engineering works there. In Part 1, we covered:
Today, we round up this topic with:
For this article, Stripe has exclusively shared close to a dozen screenshots of its internal systems, which are marked in the captions as “Source: Stripe.” Check out other engineering culture deep dives on Meta, Amazon, Linear, Figma, Sourcegraph, Agoda, and others.

1. More unique features of Stripe’s engineering culture

In Part 1, we covered Stripe’s engineering culture. But there are more unique things about Stripe which deserve a mention for contributing to its success.

Writing culture

It’s rare to come across a company with a writing culture as strong as Stripe’s. When I talked with David about how this works, he had a document – well, a “Go” link (covered below) – for all my questions. I asked him how the writing culture emerged. He said:
But how does Stripe support this writing culture? Undeniably, it's extra effort to write things down, and it’s often tempting to just get to coding, instead. Here’s how David says Stripe does it:
That the CEO and CTO write regularly, and pretty much all engineers produce longer internal documents, seems to encourage new recruits to do the same. I haven’t discovered anything that would help other companies kickstart a writing culture from scratch, so I asked for inspiration and examples of writing by developers. Here are some external documents that showcase the kind of writing engineers at Stripe produce:
Engineers wearing a “product hat”

Stripe didn’t have the Product Manager (PM) role for a long time, which has had an impact on engineering. Engineers participate in all parts of the product development process: figuring out business scope, talking to users, collaborating with designers, lawyers, accountants, etc. They don’t only do software development, as is common at “traditional” companies. I’d compare Stripe’s approach to product-minded engineers.

In the early days, this meant engineers did a lot of what PMs do. These days, Stripe has found it’s helpful for engineers to partner with PMs because the number of features and products has grown. Stripe now has many PMs and product folks who add a ton of value. That being said, Stripe still expects engineers to be close to users and involved in the process, end-to-end. Staying close helps the feedback loop stay tight, and ensures Stripe builds things users want.

“Engineerication” for engineering managers

Stripe encourages engineering leaders to experience being an engineer, in what it calls “engineerication.” The idea is that if a manager can “down tools” and go on vacation, they can also down tools to spend a week as an IC engineer! As Stripe’s CTO, David leads by example, embedding himself in an engineering team and working end-to-end on a project. “Engineerication” is very helpful for identifying pain points in developer productivity, and frequently leads to improvements.

My observation is that very few companies explicitly encourage this, and without empowerment from above it’s hard to justify doing zero managerial work for a week, in favor of engineering work of not especially high value.

Friction logging

A friction log documents the end-to-end user flow. The person playing the role of “friction logger” gets in the mindset of a user trying to do something like sign up to use a new product, understand a warning on the portal, or generate a report.
They note all friction points they hit, as well as ideas for enhancing the UX. Based on a summary from two Stripe engineers, the typical structure of a friction log is:
Stripe has advice for making the most of friction logs:
Stripe encourages engineers, managers, and everyone else, to produce friction logs at least occasionally. The key is to get deep into the mindset and context of a user.

My own note: friction logging is especially powerful when done – or acted on – by people in authority, like managers, directors, and the C-level. This is because they can advocate effectively for change.

2. Operational excellence

Operational excellence – which means operating systems reliably – is a massive focus at Stripe. This should come as no surprise because reliability is non-negotiable for any payments system. Stripe’s annual letter of 2022 highlighted its API reliability, stating:
In Part 1, we covered the focus on automated testing at Stripe. In the same letter, the company reinforces how testing and reliability go hand-in-hand:
The importance of operational excellence is regularly reinforced at CTO-level. In April 2023, David sent an email to all of engineering entitled “Operational excellence.” It started with a reminder of why operational excellence matters so much to a payments company:
What is operational excellence?

David’s email explained:
An interesting paradox David draws out in his email is that changes cause operational failures, but Stripe must ship changes in order to keep moving quickly:
Safe change management

The best way to ship often and safely is to ensure every change is automatically tested via the CI pipeline, and gradually deployed. This is the default for how all new services at Stripe are built.

Writing good tests. TDD (test-driven development) is not mandated at Stripe, but writing good tests is emphasized. Good tests are:
Auto-deploying code. Testing alone is not sufficient to prevent failures for users. Stripe used to ask engineers to carefully monitor the deployment of their code to production by watching charts to spot problems. Today, almost all services are auto-deployed and gradually rolled out. Stripe believes that automatically initiated and monitored deployments are more reliable than those “babysat” by humans. Automated rollout can also proceed more slowly as there’s no impatient engineer waiting to do the next task. We cover more on Stripe’s deployment tooling in Part 1.

Things can still go wrong. No amount of testing can ensure production has no issues, but it can minimize risk. Latent issues which are unobservable outside production can still occur. A common latent issue is when two changes get released at nearly the same time, and impact one another.

Prevention

A key principle at Stripe is to design and build defensive systems with failures in mind. Engineers are encouraged to think about:
Design with fault domain minimization in mind. This principle is commonly known as “reduce the blast radius.” Systems are designed so that failures are isolated, and don’t propagate to impact broad swathes of the user base.

Build scalable systems. Stripe encourages engineers to gather information on how much their systems need to scale in the near future, and to plan proactively; for example, for Black Friday/Cyber Monday loads. We cover the technology choices behind Stripe’s Black Friday/Cyber Monday dashboard.

Build systems where the easiest path is the safest, most reliable one. Stripe makes it hard to do things like disable health checks, skip checks when deploying, and override sensible defaults.

Detection, Remediation, Elimination

There is one dashboard which gives a sense of the service’s health (“detect”). This allows rapid detection when things go wrong. Every service at Stripe has its live metrics dashboard displaying important operating characteristics, all in one place.

Ensure failures do not repeat (“eliminate”). Stripe aims to be a “learning organization,” which learns from incidents so they don’t happen again. The company uses incident reviews, asking “why” questions which get to the root of an issue, and aims to fix it and other similar issues.

Stripe runs an Ops Review within each group. This tends to happen weekly, when teams check the operational posture of systems. Topics include:
3. Internal tools

Like many larger tech companies, Stripe has built plenty of its systems internally. Here are a couple of custom-built tools most people working at Stripe (referred to as “Stripes”) use on a day-to-day basis.

“Go”

When I asked David about a resource within Stripe, he responded that it can be found at “go/{something}.” “Go” is Stripe’s internal URL shortener. For example, go/home resolves to home.corp.stripe.com. On top of being easy to use, Go has the neat feature that when someone makes a “go link,” Go also indexes the content behind this link, making it available to be found via search.

Intranet Home

Stripe custom-built its intranet home page. This site displays announcements, company events, new joiners, and the latest features shipped and fixed. In a nice touch, people whose birthday it is are also listed!

The intranet home page is search-heavy. I’m told many engineers use it as a gateway to resources, company-wide. One neat feature I found was that the search page autofills suggestions based on your own search profile.

Stripe built this custom intranet portal back in the early 2010s, when there were no vendor solutions for portals. A small team still maintains Home.

Trailhead

Trailhead is Stripe’s internal product and documentation system. The structure is based on that of Stripe’s external docs.

Compass

Compass is Stripe’s internal project management tool. It’s custom-built, and pretty popular internally – and with engineering! The tool displays recent updates on a selected project and the documents linked to it, helps assess launch readiness, and more.

Compass has several Stripe-specific features:
A note on how Stripe uses Slack

Stripe uses Slack for conversations, like at many companies. However, the company is very clear that Slack is not meant to be a canonical, long-term repository for important information. If something is posted in Slack, it’s expected it will disappear, or become impossible to find.

Stripe encourages engineers to create longer-term artifacts, instead of relying on Slack. Artifacts like Google Docs or other documents or pages can be found on portals like the Intranet Home. The importance of archiving important information, like details on design decisions, is emphasized in Stripe’s API review guidelines, which we cover later in this article.

It’s smart that Stripe is explicit about Slack usage, given it’s tempting to dump everything into Slack and assume people will find important documents. If your company hasn’t verbalized guidelines on the long-term storage of information, taking inspiration from Stripe could be good!

4. Internal engineering and FinTech platforms

Stripe has built several platforms that are heavily used by its software engineers:...