Our success rests on our ability to keep innovation alive at every level of the company. We know that the fastest way to suppress innovation is to make it unsafe to fail, which is why we’ve worked hard to ensure that everyone at Mode is able to take risks. We’ll try something out even if we’re not 100% sure it will succeed because we know that as long as we did the best we could to de-risk it, no one will shame us if it doesn’t work out. In everything we do, we have a “Shared Responsibility, Shared Risk” mentality.
Making it safe to fail requires a blameless post-mortem process, and for that, we have After Action Reviews (AARs). During an AAR, we start with the assumption that everyone did the best they could with the information they had at the time. Participants focus on how to understand and improve the systems and processes that led to the failure, rather than who did what wrong. We use these meetings as opportunities to reinforce that Mode is a safe place to fail, build stronger safeguards against future failures, and bring the team closer together.
Your current expertise does not limit what you are able to do or work on here at Covariant. Many of us have research backgrounds in academia, where failure is common and expected. We view it as a strength that our domain knowledge differs so widely across the company, which is why we encourage people to speak up and ask questions if they don’t know or understand something. You won’t know everything (no one can!), but embracing this truth gives us the confidence to make risky bets and try new things. Not only does this help us to grow, but this is how we as a company and industry innovate.
Because robotics is so cross-functional, when something isn’t working, it takes an open mind to find the root cause. Triaging each problem means considering mechanical, electrical, general software, or AI errors. Some organizations get stuck blaming the weakest link. At Covariant, we are hyper-focused on making sure our customers are absolutely thrilled with our products, and doing so effectively means short-circuiting that failure mode of blame. Success for us requires teamwork, so we fully embrace every challenge as a single unit.
We think this is necessary to learning and improving as a team. Inevitably, some of the things we try aren't going to be successful. When something goes wrong or we fall short of our goals, we don't look to figure out who is to blame -- we figure out where the gaps are in our processes (or our expectations!). We go into retrospectives with the understanding that people did the best they could with the conditions they had. Thanks to our culture of support and accomplishment, a failed experiment isn't viewed as a personal failure.
Creating cloud-managed IT that simply works
San Francisco, Austin, Chicago, London, Sydney, San Jose, and Remote (US)
In order to push boundaries and truly experiment, we place a heavy focus on testing. Testing enables us to take chances without worrying about irrevocable failures. For example, we have multiple test labs featuring equipment in which we’ve invested millions of dollars for our firmware/product side. We run these testbeds almost 24 hours a day for basic feature testing, end-to-end RF testing, competitive performance tests, scale testing of our VPN features, and more. Software engineers use these testbeds as a key part of feature development, which enables them to innovate with greater freedom.
However, while we’re always looking for opportunities to be better, challenge the norm, and improve our products, we believe it’s important to collectively learn from failures and understand what went wrong, so the same mistake isn’t made twice. To that end, we foster an environment where open discussion and collaboration are the norm, and blame and finger pointing aren’t part of the conversation. If there is a customer-facing outage or a serious bug, we have a blameless postmortem. Instead of sharing it publicly, we’ll have a follow-up meeting so that helpful lessons are learned across the organization to guide us in the future. This cultural consistency can be attributed to a reporting structure where all teams ladder up to a tight-knit engineering leadership team comprised of our CTO Bret Hull and two senior directors, Dan Aguayo, who looks after the Firmware Engineering teams, and Jack Horner, who looks after the Cloud Engineering teams.
We currently have ~200 engineers in our software engineering department. We’re organized into 17 teams which range between 1 and 30 people in size. We spend our energy on creative problem solving, making sure we improve the next time, and having support along the way. Meraki is the type of place where people are always up for lending a helping hand, whether you’re new to the company, or a seasoned Merakian.
When we asked one of our software engineers, Phil Dayboch, about Meraki’s culture of safety, he said “I worked on the Meraki phone from launch day until the product’s cancellation. At every turn of the journey, senior management made it clear that no one was to blame for challenges and that they were always willing to provide additional resources needed to get the job done. After the product’s cancellation, I was provided with the opportunity to join any team I was passionate about. It was a positive experience when all of the other teams actively pursued me and my teammates to join them.”
Summer interns soaking up the sun on our roof deck overlooking the San Francisco Bay Bridge.
We jump on the opportunity to learn from our mistakes. In a world where not all problems are solved, there are bound to be answers that don’t quite hit the mark. Our customers depend on our software to be reliable, so we have many processes in place to prevent regressions. We understand that everyone on the team is human (except you, SlackBot), so we’re bound to make mistakes. When we do make errors, we compile information about the incident, document it, and focus on creating preventative measures to make us more reliable.
Here’s an example to illustrate how we handle failures. Our website’s signup form was recently hit by a malicious script that batch-created a bunch of accounts. The form didn’t include our Google reCAPTCHA snippet and thus opened us up to an attack. The original programmer of the signup form was not targeted or blamed for this. Rather, the team came together to add the Google reCAPTCHA code. As a team, we put together a process to prevent signups like this from occurring in the future. Our team has each other’s back, and “figuring out who pushed the bug” wasn’t on our agenda. Ultimately, we were able to create a stronger signup experience in the process.
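The profile only mentions adding the reCAPTCHA snippet to the form. As a hedged illustration of what the matching server-side check might look like, here is a minimal sketch that posts the submitted token to Google's documented `siteverify` endpoint. The function names (`post_form`, `is_human`) and the injectable `post` parameter are assumptions made for this example, not the team's actual code.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable

# Google's documented verification endpoint for reCAPTCHA tokens.
VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def post_form(url: str, fields: dict) -> dict:
    """POST form-encoded fields and parse the JSON response (hypothetical helper)."""
    data = urllib.parse.urlencode(fields).encode("utf-8")
    with urllib.request.urlopen(url, data=data, timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))


def is_human(token: str, secret: str,
             post: Callable[[str, dict], dict] = post_form) -> bool:
    """Return True only if the verification service reports success.

    `post` is injectable so the check can be exercised without network access.
    """
    if not token:
        # A signup with no token at all is rejected without calling the service.
        return False
    result = post(VERIFY_URL, {"secret": secret, "response": token})
    return result.get("success") is True
```

A signup handler would call `is_human(token, secret)` before creating the account and reject the request when it returns `False`.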
We expect failure in our production systems and are building safety nets into those systems that automatically recover from failure. If machines can fail, so can people. Part of resilience means reducing the (negative) impact of failure. We encourage systems that self-heal and roll-forward. Engineers aren’t dinged when they push an error, but rather praised when they can reduce the amount of time spent in the outage.
We use AI, especially machine learning, precisely because we do not know the right answer before we start. We run many experiments and accept that most will fail. Our success metrics, therefore, are based on learning quickly to ultimately make people more effective through the use of AI: some things we build will help a little, some a lot, and some not at all. In the end, our goal is to keep moving the proverbial needle forward. We do this by measuring everything and constantly evaluating our resilience. Our quantitative measures are around system downtime, mean time to resolution, and time to push a change. Our qualitative measurements include 1:1 conversations between managers and their directs, discussion at all-hands, and our monthly anonymous survey.
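To make one of the quantitative measures above concrete, here is a minimal sketch of how mean time to resolution could be computed from incident timestamps. The `Incident` record, its field names, and the sample data are hypothetical and are not taken from this team's tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass(frozen=True)
class Incident:
    """Hypothetical incident record; the field names are illustrative."""
    opened_at: datetime
    resolved_at: datetime


def mean_time_to_resolution(incidents: List[Incident]) -> timedelta:
    """Average of (resolved_at - opened_at) across resolved incidents."""
    if not incidents:
        raise ValueError("need at least one resolved incident")
    total = sum((i.resolved_at - i.opened_at for i in incidents), timedelta())
    return total / len(incidents)


# Made-up sample data, for illustration only.
incidents = [
    Incident(datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 9, 30)),
    Incident(datetime(2024, 1, 5, 14, 0), datetime(2024, 1, 5, 15, 30)),
]
mttr = mean_time_to_resolution(incidents)
```

Tracking this number over time, rather than who caused each incident, matches the emphasis on reducing time spent in an outage.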
Mobile-based personal and professional development platform
San Francisco, Phoenix, Minneapolis, Washington D.C., or Remote (US)
When we fail, we focus on what we can take away from the experience to apply moving forward. One way we implement this behavior is through a practice we refer to as “learning memos.” Employees are encouraged to take time to reflect on what they’ve learned through various trials and tribulations, and write their thoughts out in a way that can be shared with the rest of the team.
When we introduce a critical bug into a production environment, we follow up the incident with a post-mortem that focuses both on what went wrong and on how to prevent the incident from being repeated in the future. We are strong believers in approaching this process with empathy and acting as humans, not robots, during the post-mortem analysis.
We are also acutely aware that psychological safety is key to both innovation and creating a positive work environment where people feel empowered to speak up with ideas, issues, or concerns. Our managers across the organization welcome and invite feedback of all kinds.
For example, a major increase in writes to our main database caused degraded performance for our entire suite of products. We traced this to a data fetch from one of our APIs that was stuck in a loop. The post-mortem focused on the technical problem and the root causes, including recommendations for process improvements. One of the attributes our team holds dear is that of Team > Self, where any incident is the team’s fault, not that of the individual who may have pushed or deployed some code.
Things are going to go wrong or not work; it is how we react to them that defines how we move forward. We’ve built several services and products that worked for a while but needed to be torn down and rebuilt later. Our needs change, the industry changes, and over time, our customers expect more, too. As a result, we need to be able to adapt and respond quickly. Our data is used to power the growing on-demand economy, and beyond, and we need to be able to deliver what our customers need.
We have a strong practice of internal and external post-mortems when an issue occurs and take ownership when things go wrong. For example, when switching to new infrastructure, a command was accidentally run in production that was designed for development… this resulted in a database being dropped! 😱 Our whole team rallied and we had it restored within minutes along with pull requests out to prevent it from happening again. All the while, we experienced zero downtime! 😎
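The profile does not say what the follow-up pull requests contained. One common safeguard against this kind of slip is an environment check in front of destructive commands; the sketch below illustrates the idea under stated assumptions. The `APP_ENV` variable name and every function name here are hypothetical.

```python
import os
from typing import Callable, Iterable, Optional


class WrongEnvironmentError(RuntimeError):
    """Raised when a development-only command is run in another environment."""


def ensure_allowed_environment(allowed: Iterable[str],
                               current: Optional[str] = None) -> str:
    """Return the current environment name, or raise if it is not allowed."""
    allowed_set = set(allowed)
    # APP_ENV is an assumed variable name. When it is unset we fall back to
    # "production", the most restrictive choice, so a missing setting fails closed.
    env = current if current is not None else os.environ.get("APP_ENV", "production")
    if env not in allowed_set:
        raise WrongEnvironmentError(
            f"refusing to run in {env!r}; allowed: {sorted(allowed_set)}"
        )
    return env


def reset_dev_database(drop: Callable[[], None],
                       current: Optional[str] = None) -> None:
    """Run the destructive `drop` callable only in development or test."""
    ensure_allowed_environment({"development", "test"}, current)
    drop()
```

With this guard, calling `reset_dev_database(drop, current="production")` raises `WrongEnvironmentError` before `drop` is ever invoked.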
It’s completely okay to get it wrong. In fact, we have a strong bias toward taking risks and making mistakes versus playing it safe. As a result, one of our core company values is, “We are an elite force. It’s always we, never me.” We are all open about making mistakes across all levels, including our CEO. Our most senior developers admit to hacks and come forward about being wrong when they are. We’re honest in our PRs and the feedback we give, and also recognize risks and knowingly accept them. We have dedicated time to build test harnesses to catch bugs and alert us (e.g., alarms, emails) because we encourage speed over perfection. It makes us more comfortable taking risks, too, because we are reassured that mission-critical regressions will be caught early.
To us, feeling safe to fail goes hand in hand with open feedback. We frequently reference the radical candor graph when we are giving or asking for feedback, and make sure everyone has context around what they’re building. We document our in-person meetings, whether they be dev team retrospectives or standups, and share them in our shared Slack office. We also have a Facebook Portal in both of our offices, which is on at all times. It provides an additional wormhole from one office to another. And despite our time differences, we schedule weekly All Hands meetings and weekly #devs meetings when our timezones overlap so that everyone can attend. Ideas are shared, concerns are voiced, and we happily address successes and failures from the week.
Seamlessly create, send, and track video emails
Colorado Springs, Denver, or Remote in CO, NY, PA, WI
Failure is inevitable and it can easily be intimidating. We believe it’s important to intentionally work through that fear by experiencing failure, talking about it, and then reinforcing practices that can limit failure’s impact, or, even better, leverage its teachings. Servers crash, ideas don’t pan out, timelines slip; the trick is to anticipate, manage, and mitigate risk to a degree that enables you to thrive without being frozen.
Career network for college students and recent grads
San Francisco, Denver, or Remote (US)
At Handshake, we recognize a growth mindset is only possible when individuals are given a safe environment to fail. Many Handshake managers consider failing a necessary part of growing as an engineer here, which is why we embrace failures as learning opportunities. In fact, many of our employees favorite the #learning-from-losses Slack channel, where employees show humility with their recent failures and how they have learned from them.
As discussed previously (see EQ > IQ), we practice blameless postmortems for mistakes. Rather than pointing fingers, we focus on how to fix the issue at hand and prevent it from occurring in the future. Postmortems are then shared across the entire engineering organization so everyone can learn and improve collectively. For instance, one of our engineers wrote a feature to run a promotion within the Handshake platform, but unfortunately, the data from the promotion was getting corrupted due to a misunderstanding of how Rails destroys records. “There were no finger-pointing or blaming of why this happened,” the engineer said, “only a calm focus on how can we prevent the next incident. In fact, one of the team members even researched more of why this problem happened in the first place and gave a tech talk to fellow engineers on how to avoid these record destroy pitfalls.”
We recognize that these mistakes don't make us bad engineers but actually they are lessons for all of us to be better and help others avoid making the same mistakes.
We believe that failure is one of the most informative teachers and if we want to truly Build for the Long Term, we need to learn as much as we can as quickly as possible. We need to push ourselves and feel comfortable dropping our egos, sharing our learnings from mistakes or missteps, and iterating quickly. Supply chain is also an incredibly complex, mistake-laden industry, so to operate within it successfully we need to understand that failure can happen, acknowledge when it does, and learn how to move forward.
We hold “funerals” for projects or meetings that we stop because we’ve learned from them, team members acknowledge mistakes across retros and informal conversations, and we regularly shout out people who have “Learned and Iterated,” one of our core company values.
Failure isn’t the fault of an individual developer. It’s the fault of the systems and practices that failed to prevent it. We learn from one another and continuously strive to learn from our mistakes and the mistakes of others. Senior developers and management are transparent about our past failures (dropping production databases, for example!) and encourage new staff to try new things and experiment. We chronicle these failures through a postmortem repository and support documentation, and make each error an opportunity to learn.
This is one of our core values. We highly value continuous learning, both at the level of the individual and as a team navigating the ever-changing dating space. We constantly reflect on and find opportunities to do better, which inevitably comes with making mistakes. As one example, we send “nudges” to our users when there’s a lull in their chat conversation. We thought maybe if our nudges were sassy and sarcastic, they might increase chat participation by giving both parties something to laugh about or comment on. Unfortunately, people were writing in that our nudges were too abrasive and over the top. We thought nothing of it and switched back to the old messages. Failure? Not at all. We know more now than we did before.
Our industry is incredibly complicated and opaque. To break through, we need to innovate and make mistakes.
We want developers who bring an inherent curiosity to everything they do and a desire to try new things. We value enthusiasm and smart engineering more than expertise in a particular stack. Some of our most talented engineers have mastered programming languages while on the job.
Following major projects, we conduct thorough post-mortems where everyone is encouraged to contribute. You’ll never hear blame in these meetings. We take shared responsibility for mistakes and a collaborative approach to fixing them. If you try something and fail, we’ll share learnings together and choose a different strategy next time.
We believe it’s crucial to learn from incidents and take actions to prevent them from recurring, which is why we’ve formalized this process with blameless post mortems. Our primary goal in conducting a post mortem is to ensure the incident is documented, that all contributing root causes are well understood, and that effective preventative actions are put in place to reduce the likelihood and impact of a recurrence.
For a post mortem to truly be blameless, we make sure it focuses on identifying an incident’s contributing causes without placing blame on an individual or team for bad or inappropriate behavior. At Wealthfront, a blamelessly-written post mortem assumes everyone involved in an incident had good intentions and did the right thing with the information they had. This means engineers whose actions contributed to an incident can give a detailed account of what happened without fear of punishment or retribution.
It’s a horizontal organization at Monograph. We’re very open about how all of this is new to us and we’re all figuring it out together as we go. It’s part of the process. There is no blame whatsoever and with how much we communicate with one another, it’s never one person’s fault. We’ve actually had to rebuild Monograph two times now, and each time was a decision we made together. We hope you’ll experiment and work openly with us on both Monograph and whatever side project(s) you have.