We jump on the opportunity to learn from our mistakes. In a world where not all problems are solved, there are bound to be answers that don’t quite hit the mark. Our customers depend on our software to be reliable, so we have many processes in place to prevent regressions. We understand that everyone on the team is human (except you, SlackBot), so we’re bound to make mistakes. When we do make errors, we compile information about the incident, document it, and focus on creating preventative measures that make us more reliable.
Here’s an example to illustrate how we handle failures. Our website’s signup form was recently hit with a malicious script that batch-created a bunch of accounts. The form didn’t include our Google reCAPTCHA snippet and thus opened us up to an attack. The original programmer of the signup form was not targeted or blamed for this. Rather, the team came together to add the reCAPTCHA code, and as a team we put together a process to prevent signups like this from occurring in the future. Our team has each other’s back, and “figuring out who pushed the bug” wasn’t on our agendas. Ultimately, we created a stronger signup experience in the process.
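A minimal sketch of the kind of server-side check involved, assuming a Python backend: the siteverify endpoint and its secret/response parameters are Google’s documented reCAPTCHA API, while the handler wiring and names here are purely illustrative, not the team’s actual code.

```python
import requests

RECAPTCHA_VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"

def verify_recaptcha(token: str, secret_key: str) -> bool:
    """Ask Google to confirm the token submitted with the signup form."""
    resp = requests.post(
        RECAPTCHA_VERIFY_URL,
        data={"secret": secret_key, "response": token},
        timeout=5,
    )
    resp.raise_for_status()
    return resp.json().get("success", False)

# In a (hypothetical) signup handler: reject the request before
# creating an account if the token doesn't verify.
# if not verify_recaptcha(form["g-recaptcha-response"], SECRET_KEY):
#     abort(400, "CAPTCHA verification failed")
```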
We think this is necessary for learning and improving as a team. Inevitably, some of the things we try aren’t going to be successful. When something goes wrong or we fall short of our goals, we don’t look to figure out who is to blame; we figure out where the gaps are in our processes (or our expectations!). We go into retrospectives with the understanding that people did the best they could with the conditions they had. Thanks to our culture of support and accomplishment, a failed experiment isn’t viewed as a personal failure.
In terms of engineering, our highest priority is reliability. Our platform is responsible for more than 2 billion dollars in construction bidding across North America every day; we can’t mess that up. The best way to stay reliable is to practice post-mortems, because mistakes are going to happen and there’s no way around it. To give you a concrete example, not too long ago we had an issue where we were inadvertently sending emails to our customers that we should not have been (we were exposing sensitive information). This was a serious problem, and we handled it by meeting as a whole team, identifying the problem, discussing what we needed to do to immediately fix it, and then taking the necessary steps to ensure it doesn’t happen again. The important thing wasn’t figuring out “whose fault it was,” because that doesn’t really matter. Mistakes are going to be made. What matters is acknowledging this fact and focusing our effort on eliminating the possibility of the same mistake happening a second time. We put in patterns and tests to make sure that another developer won’t be tripped up by the same things in the future.
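One way to encode that kind of lesson as a regression test, sketched here with hypothetical names (the team’s actual test suite isn’t described): assert that rendered outbound email bodies never contain sensitive fields.

```python
import re
import unittest

# Illustrative patterns for data that must never appear in outbound email.
SENSITIVE_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),   # SSN-shaped strings
    re.compile(r"password", re.IGNORECASE),
]

def render_welcome_email(user: dict) -> str:
    """Hypothetical template renderer for a welcome email."""
    return f"Hi {user['name']}, welcome aboard!"

class OutboundEmailTest(unittest.TestCase):
    def test_welcome_email_contains_no_sensitive_data(self):
        body = render_welcome_email({"name": "Ada", "password": "hunter2"})
        for pattern in SENSITIVE_PATTERNS:
            self.assertIsNone(pattern.search(body))

if __name__ == "__main__":
    unittest.main()
```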
We take every failure as an opportunity to learn and be better. We know we may not fix the problem right away, but it’s about testing, trialing, and learning, so that the next time we tackle it, we do better than the last. We released the first beta of our API within a few weeks, when it wasn’t really rock solid yet. Our first big user actually broke it, since high availability wasn’t there yet. However, it allowed us to get feedback and to focus on the highest-impact items. Since then, we have done tons of iterations to arrive at the product we have today, and we continue to iterate on it every week. Our biggest failures are oftentimes our best learning opportunities, which is why we spend time thoroughly documenting post-mortems (a DNS DDoS incident, an 8-minute indexing downtime). We share these with the public too, in hopes that everyone can benefit from the lessons we’ve learned the hard way.
At a startup of our size, each team member’s personality and background contributes significantly to the overall culture. Alex and Rebecca are both Hack Reactor grads and fearless, as are many people who self-select into a bootcamp without a traditional computer science education. They have helped set the tone for how we handle confusion (comfortably) and how we approach things we don’t know how to do (“let’s figure it out”). We are fortunate to have a lot of great resources beyond our team, which we frequently leverage: Finbarr (our technical advisor) and Gabe (a close friend to the company) are always available on Slack and help us solve specific technical problems as well as review the scripts we write. Failing is a very natural part of learning, and we are not afraid of it whatsoever. We’ve somehow eliminated any trace of imposter syndrome here and really love that part of our culture.
For example, a major increase in writes to our main database caused degraded performance across our entire suite of products. The cause was isolated to a call fetching data from one of our APIs that was stuck in a loop. The post-mortem focused on the technical problem and its root causes, and included recommendations for process improvements. One of the attributes our team holds dear is Team > Self: any incident is the team’s fault, not that of the individual who may have pushed or deployed some code.
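The write-up doesn’t say what the fix was, but a common guard against this class of “stuck in a loop” fetch bug is to bound retries with backoff, along these lines (a sketch, not their code):

```python
import time

def fetch_with_backoff(fetch, max_attempts: int = 5):
    """Call `fetch`, retrying with exponential backoff, but never forever."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # surface the failure instead of hammering the API
            time.sleep(2 ** attempt)  # back off: 1s, 2s, 4s, ...
```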
Sometimes these experiments do not yield the results we desire, or simply don’t work, but we encourage experimentation and failing quickly because learning from these attempts is what enables future success.
In our experience, failure is rarely a direct function of the individual contributor; it is usually a process inefficiency that needs to be remediated. To diagnose the issue, we practice blameless retrospectives to understand what happened, identify how to improve, and communicate our findings to the team. We look to continually improve our output.
For example, we will push out early software under feature toggles to test and observe customer usage with production traffic. If the experiment and its associated A/B tests prove successful, we’ll expand the rollout accordingly; if the experiment fails, we put the idea on the shelf and move on.
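A minimal sketch of how a percentage-based toggle like this often works (the team’s actual tooling isn’t specified, and the flag and function names here are hypothetical): hash each user into a stable bucket so the same user always sees the same variant, then compare against the rollout percentage.

```python
import hashlib

def is_enabled(flag: str, user_id: str, rollout_percent: float) -> bool:
    """Deterministically bucket a user into [0, 1) and gate on the rollout %."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF
    return bucket < rollout_percent / 100

# Hypothetical usage: raise the percentage to expand the rollout,
# or set it to 0 to shelve the idea.
# if is_enabled("new-checkout-flow", user.id, rollout_percent=5):
#     render_new_checkout()
# else:
#     render_old_checkout()
```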
Our production systems have failsafes in place to give our engineers peace of mind when deploying. Deployment rollbacks are easy (it’s just one click) and blameless. When determining the cause of a production issue, we use the 5 Whys, a process for identifying the root cause of an issue rather than assigning blame.
Things are going to go wrong or not work; it is how we react to them that defines how we move forward. We’ve built several services and products that worked for a while but needed to be torn down and rebuilt later. Our needs change, the industry changes, and over time our customers expect more, too. As a result, we need to be able to adapt and respond quickly. Our data is used to power the growing on-demand economy and beyond, and we need to be able to deliver what our customers need.
We have a strong practice of internal and external post-mortems when an issue occurs, and we take ownership when things go wrong. For example, while switching to new infrastructure, a command designed for development was accidentally run in production… which dropped a database! 😱 Our whole team rallied, and we had it restored within minutes, along with pull requests out to prevent it from happening again. All the while, we experienced zero downtime! 😎
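A typical shape for the kind of safeguard those pull requests might add (an illustration assuming an APP_ENV environment variable; the team’s actual fix isn’t described): refuse to run destructive tooling when pointed at production.

```python
import os
import sys

def require_non_production(command_name: str) -> None:
    """Abort destructive commands unless we're in a dev/test environment."""
    env = os.environ.get("APP_ENV", "development")
    if env == "production":
        sys.exit(f"Refusing to run '{command_name}' against production (APP_ENV={env}).")

require_non_production("drop-database")
# ... only now is it safe to proceed with the destructive operation ...
```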
We actively encourage the offering and exploration of new ideas, even if their value is uncertain to the broader team. If a next step forward on an idea can’t be agreed upon, we encourage the parties to agree on an experiment instead. An experiment is something that is safe to try and has a clear hypothesis about what outcomes are expected. From its execution we’ll all learn something of value, and should it fail, it won’t cause the organization irreversible or serious harm.
Failure isn’t the fault of an individual developer; it’s the fault of the systems and practices that failed to prevent it. We learn from one another and continuously strive to learn from our mistakes and the mistakes of others. Senior developers and management are transparent about our past failures (dropping production databases, for example!) and encourage new staff to try new things and experiment. We chronicle these failures in a postmortem repository and supporting documentation, making each error an opportunity to learn.
This is one of our core values. We highly value continuous learning, both at the level of the individual and as a team navigating the ever-changing dating space. We constantly reflect on our work and find opportunities to do better, which inevitably comes with making mistakes. As one example, we send “nudges” to our users when there’s a lull in their chat conversation. We thought that if our nudges were sassy and sarcastic, they might increase chat participation by giving both parties something to laugh about or comment on. Unfortunately, people wrote in that our nudges were too abrasive and over the top. We didn’t agonize over it; we simply switched back to the old messages. Failure? Not at all. We know more now than we did before.
If we’re getting it done perfectly the first time, every time, we’re not pushing the limits enough. For our highest-priority features and products, we emphasize quality and usability, but the road to production-ready deployments requires devs to make mistakes. When Yuhao (Native Mobile Engineering Manager) first started at reddit, he spent three weeks building a reddit app for iPad on his own. In other words, he spent three weeks hacking together a pretty cool iPad implementation, though he says, “Looking at it now, it’s really not that cool,” and laughs. The point is, failure is not only acceptable, it’s a totally necessary part of the process.
It’s a horizontal organization at Monograph. We’re very open about how all of this is new to us and we’re all figuring it out together as we go. It’s part of the process. There is no blame whatsoever and with how much we communicate with one another, it’s never one person’s fault. We’ve actually had to rebuild Monograph two times now, and each time was a decision we made together. We hope you’ll experiment and work openly with us on both Monograph and whatever side project(s) you have.