Leaving Stripe: Parting Thoughts

Goodbye Bazel, Hello Brazil

Aug 05, 2024

Last month I resigned from my post at Stripe to return to my previous role in AWS Cryptography. I am excited for the change, and for the opportunity to do work more aligned with my interests and domain expertise.

I will remember fondly my time at Stripe, and especially all of the engineers I had the opportunity to work with. Working at Stripe was a great learning and growth opportunity, and I have no regrets about my last four and a half years. However, I did have a few things I wanted to share before closing out this chapter of my life.

Edit the company

Every transition has elements of push and pull. I am leaving primarily because of the opportunity I have at AWS. However, I still want to take a moment to reflect on the secondary factors that made Stripe a less appealing place for me to stay. I share this only in the hope of inspiring others to edit the company in ways that I was ultimately unable to, so that we make a great employer even better.[1]

Process, not products

Stripe has a very “ship”-based culture. The processes and norms feel optimized for delivering a big chunk of work where we can clearly say “we did it!” However, not every type of work fits into this paradigm.

The overemphasis on bundling everything into a “project” that we could declare victory over meant teams struggled to incorporate work that requires small-but-constant improvement. In my case, I’m specifically thinking of reliability and operations.

As an example, reliability is not something we can ship or declare victory over (though it often felt like we tried). Reliability requires constant vigilance, the flexibility to tackle problems as they occur, and the knowledge that problems will occur constantly.

Much of what made Stripe reliable and a generally productive place to work was the work of engineers who diligently sanded away the rough edges that cause toil. Paraphrasing one of the lessons from the Toyota Production System, the most critical work is the work that smooths the way for other work.

Execution versus impact

Stripe is also very “impact” driven. This alone is not a problem, and I definitely appreciate how we got here. When you have a large number of high-agency engineers working on various problems, under budget and time constraints, you want folks to leverage their efforts to achieve the highest impact. And it follows that we want to close the loop by holding people accountable for the impact they deliver.

However, as Stripe grows and folks are further removed from the decision-making processes, I see the expectation that folks speak to the impact of their work causing more and more tension with engineers, especially those earlier in their careers. Put another way, an employee who doesn’t have input into the project they are taking on should not have to speak to the impact of the work. At a certain point (at least for junior engineers) we should be holding engineering managers accountable for impact and engineers accountable for execution.

Overloaded engineers

It was not uncommon during my time at Stripe for teams to be loaded with work such that there were two active work streams per engineer. This is a higher level of loading than I think is healthy, and it impacted us in a few ways.

First, there is very little slack in the system to pick up work on an as-needed basis. A certain amount of slack is needed to polish away blemishes as they are noticed.

Second, this type of loading negatively impacts the sense of camaraderie on the team. A team where everyone is working on a different thing is not really a team; it is just a group of engineers sharing a sprint board. For teams with a broad scope of ownership, this also limits the amount of context anyone will have relative to all the systems they’re expected to operate. This makes on-call much more stressful and risks burning out the team.

Lastly, a smaller set of goals that everyone on the team has context on makes it easier for teams to self-organize and allows engineers to experience a higher degree of agency within the team.

Interfaces

I saw a number of initiatives during my time at Stripe that focused on shoring up reliability and security for critical components. These generally took the form of evaluating each component of a system against a rubric.

Complexity, however, is at the edges. It is at the interfaces of our systems. This turns out to be where the reliability risks are as well.

I’m not sure what my point is here. But my advice is that if you ever find yourself asking about the reliability or safety of a service, you’re almost certainly asking the wrong question. The focus should be on systems: the flows and the units of functionality provided to internal and external customers.

Incomplete migrations

Stripe struggled to complete migrations, and we did not properly account for the weight of the work not finished.

Sometimes the migrations were not finished because we did not budget the work to move load to the new system after creating it. Often, these migrations were very expensive to actually do because of how untested the flows that run on them are; moving is too scary and stressful.

Other times we took a dependency on another team's “migration” which never completed.

We could reduce the context engineers need to maintain by a significant percentage by deprecating old systems before building new ones.

On my team, we had at least four incomplete migrations. This work-not-finished was a major drag on the productivity of the team. It made changes more difficult to reason about because of the multitude of code paths that need to be considered. Reliability was impacted for similar reasons: more to think about when making changes or performing mitigations during incidents.

I recommend folks adopt a budget on the number of systems or code paths their team will own. Space is created in the budget by turning off systems which are no longer needed.

Shared context

My largest personal struggle at Stripe was trying to get to where I felt like I was “in the room” where strategy was being formed and solidified. As a staff engineer wearing a tech-lead hat, I need shared context with my manager and their manager to correctly prioritize and do my work. This means knowing the criticism of plans and ideas from higher up. Receiving a filtered view of the world hinders my ability to effect change within the company.

This is actually my largest source of dissatisfaction with working at Stripe. At my previous employer, even though I was nominally operating one level below where I am at Stripe, I had more access to my leaders 2-3 levels up the organizational hierarchy. There was significantly more transparency around the reality of resource constraints and business needs (and, dare I say, internal politics).[2] I felt I had a seat at the table when it came to roadmap building and decision making. Lastly, at my previous employer, I felt that I had leaders willing to lend me their authority in a way that I do not have here.

It is possible that, when it comes to Stripe, I am “holding it wrong” and that getting things done here requires a different skill set than the one I currently possess.

Parting advice to engineers

Finally, I want to share some parting advice to the engineers I worked with, especially those earlier in their careers. You’re all amazing people and you’re going to go places for sure. Hopefully the things I say below were things you heard from me while we worked together, but in case you didn’t (and for everyone else), here they are:

You can do it

The teams that I worked on during my time at Stripe dealt with a lot of gnarly interoperability issues with external companies and services. We often discovered issues in open source packages or had to create our own libraries for obscure protocols. Issues were often found only after a system had been misbehaving for a while and large amounts of money were at stake.

This can be an intimidating and paralyzing environment to work in.

Believe that you can find the answer by reading the code, by reading the RFC, by looking at raw packets. Everyone is capable of seeing the Matrix. It isn’t a quality folks are born with. It just takes practice. Gumption.

It does also often require muting the Zoom call and going heads down for some time. The job of your leaders is to play defense so that you have the room to do what you do best. Hold them accountable to this task. During an emergency the first priority is to mitigate. Let others focus on communication.[3]

Sometimes you fail

A senior engineer is just a junior engineer who has made a lot of mistakes and seen others make a lot of mistakes. Making mistakes, dragging teammates into incidents, and even occasionally breaking production come with the territory.

It doesn’t feel good when it happens. It usually feels really shitty. But you are growing. And it sucks, but this is just what it feels like. You are pushing your own abilities. Failure is truly the best teacher.

You don’t owe the world perfection. Just do your best to make sure you have learned something from the experience.[4]

One way to grow from a mistake is to involve yourself as much as possible in the incident review/RCA process. Anecdotally, I have found that a well-written analysis of an incident with a good list of remediation items has earned me more professional praise than causing the underlying incident earned me blame.

Ask the hard questions

Much of the value I provided at Stripe was being that guy. When engineers and leaders are sitting in a meeting reviewing a retrospective on an incident or current operational metrics, there is an ever-present risk of complacency. This is a mode that even the most disciplined, operationally rigorous teams can get into. And it requires someone to break from the social cohesion.

When you’re looking at a design or reading an incident report, notice when something gives you pause or causes you unease. When something doesn’t feel right, call it out.

Raise your hand.

Practice putting it into words.

Onward

As mentioned earlier, my new role is at AWS Cryptography. I am joining as a Principal Engineer and will work with my team to continue to innovate in the realm of applied cryptography. I’m excited to join an organization that is eager to take on large, moonshot projects and tackle some hard problems.

[1] I also offer this feedback with the understanding that I am but one person with one set of opinions, and that my ideas may conflict with the ideas of many other experienced engineers who know better than I do. When reading, please weigh heavily your own experience and common sense, and decide accordingly.

[2] Why is this transparency important? See

If I do not know my readers/leaders, if I do not know what they value and what they doubt, it is very difficult to create value.

[3] In aviation there is a rule that is drilled into you early on: aviate, navigate, communicate.

[4] Though you may on occasion fail at that too.

Bits and Being
