Culture of quality
Quality is so important that it can’t be outsourced to one person alone.
We strongly believe in a culture of software quality at Acast, and we depend on this high quality to achieve our goals and move as fast as we need to. This is why we made “quality” one of our core values, and it’s become so embedded in the fabric of our day-to-day work that we don’t even really talk about it much anymore.
But we also know quality has to be the responsibility of every individual. We can’t scale efficiently if we treat quality assurance as a second-class citizen.
It’s a shared expectation for all members of our cross-functional teams. We don’t have a separate role, function or step in our process responsible for ensuring quality. But high quality doesn’t happen just because you want it to, and over the years we’ve adopted several engineering practices to ensure quality is built in by design, not added as an afterthought.
Some might say it would be better to follow all the good engineering practices and have a dedicated QA function on top, but we believe having such a function would actually yield worse quality. We value autonomy and speed most, and by adding another step in the pipeline we’d introduce more opportunities for stalling, blocking and error.
Naming a step in the process, or a person responsible for validating the quality, would also partially remove the responsibility and accountability from the team — in effect removing quality as a shared core value.
Repetitive tasks, like testing, lead to test automation when taken on by engineers. Acast engineers care deeply about their craft and about quality, and teams take pride in solving complex problems and in finding innovative ways to keep quality high. The trick is to know that no one else will do it for you.
We pinpointed three factors that drive quality as a cultural value: ownership, healthy cognitive load, and learning from mistakes. All our engineering practices fall into one of these.
One team has end-to-end ownership of a feature or improvement. They have the autonomy to experiment, plan, develop and release on their own terms. The aim is to increase flow and avoid being blocked or left idle by another function.
To achieve this, each team is composed of diverse individuals with various skills who together cover all the experience needed for that product. Product managers are also team members, present in the day-to-day work and able to provide their distinct point of view to further increase flow.
Ownership extends to the entire lifecycle of a product, including operations. Besides not having a dedicated QA entity, we have also done away with the traditional DevOps team.
We don’t want our teams to toss some code over the fence at the end of the day. Rather, the team takes ownership of operations according to their SLAs and, in the worst case, someone is woken up in the middle of the night.
All engineers are part of the on-call rotation, including engineering managers, and this has been one of the strongest motivators for building resilience and stability. You can read more about our approach to on-call in our previous post here.
Quality is the confidence that your products are doing what they’re supposed to. We have that confidence because we trust our people.
Cognitive load is a precious, finite resource. When the limits are stretched, it creates an environment of uncertainty, speed fallacy, and higher likelihood of mistakes. We do several things to overcome this.
Firstly, by shipping often and in small batches, we increase confidence in releases — and therefore quality. Our teams are cross-functional and never more than 10 people. With the right tools and focus, we’re able to release value every few days to our customers.
We also rely on gradual rollouts and monitoring to make sure the release has the intended effect — and, if not, the cost of reverting is minimal.
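As an illustration of what a gradual rollout can look like in code, here is a minimal sketch of percentage-based bucketing. It is not a description of Acast's actual tooling: the function name `in_rollout`, the feature name and the hashing scheme are all assumptions made for the example. Each user is hashed into a stable bucket from 0 to 99, so raising the percentage only ever adds users, and setting it back to 0 reverts the release for everyone.

```python
import hashlib


def in_rollout(user_id: str, feature: str, percent: int) -> bool:
    """Return True if this user is inside the rollout for `feature`.

    Hypothetical helper: hashes feature and user into a stable bucket (0-99)
    and enables the feature when the bucket is below `percent`.
    """
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % 100
    return bucket < percent


# The same user always gets the same answer for a given percentage,
# and anyone enabled at 10% stays enabled at 50%.
enabled_at_10 = [u for u in map(str, range(1000)) if in_rollout(u, "new-player", 10)]
still_enabled = all(in_rollout(u, "new-player", 50) for u in enabled_at_10)
```

Including the feature name in the hash means different features expose different slices of users, so the same people are not always first in line for every change.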
Another way to maintain healthy cognitive load is to automate testing. In most software systems, introducing new functionality also adds complexity to an existing solution, so automated tests are the only scalable way to maintain speed and confidence.
Testing is crucial — especially integration testing — in the microservices architecture we’ve developed at Acast. As well as increasing confidence, a major benefit of automating testing is that people have more mental capacity to reflect on new challenges.
Quality, while being an expectation from each individual, is also achieved through teamwork and mentorship.
Despite all our efforts, bugs will slip through the net. It’s an inevitable consequence of the complexity we have to master. But, instead of crossing our fingers and hoping for the best, we plan for the worst.
We take several measures to make sure that, when a bug does occur, we minimise its impact. We have extensive monitoring and alerting systems that will point out an error long before it’s noticed by our customers, and we’ve made deployments so fast and trivial that reverts can happen with a flip of a switch — allowing peace to be restored while we investigate what happened.
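To make the idea of monitoring-driven reverts concrete, here is a small sketch of a check that compares a new release's error rate against a baseline. The names `ReleaseWindow` and `should_revert`, and the `tolerance` and `min_requests` thresholds, are hypothetical and chosen for illustration; they do not describe Acast's actual monitoring or alerting setup.

```python
from dataclasses import dataclass


@dataclass
class ReleaseWindow:
    """Request and error counts observed since a release went out."""
    requests: int
    errors: int


def should_revert(
    window: ReleaseWindow,
    baseline_error_rate: float,
    tolerance: float = 2.0,
    min_requests: int = 100,
) -> bool:
    """Return True when the release looks clearly worse than the baseline.

    Assumed policy: wait for at least `min_requests` to avoid reacting to
    noise, then revert if the error rate exceeds `tolerance` times the baseline.
    """
    if window.requests < min_requests:
        return False
    error_rate = window.errors / window.requests
    return error_rate > baseline_error_rate * tolerance


# 5% errors against a 1% baseline is well past the 2x tolerance.
decision = should_revert(ReleaseWindow(requests=1000, errors=50), baseline_error_rate=0.01)
```

A check like this only decides whether to revert. The revert itself stays cheap because deployments are fast, which is what makes an automatic or one-click response practical.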
But the only way to make sure the same mistake doesn’t happen again is by learning from it.
Code reviews ensure every piece of code has at least a second pair of eyes on it, while facilitating the spread of knowledge between team members. All incidents are followed by post-mortem workshops, where we aim to understand the root causes of the malfunction and build long-term fixes that ensure we don’t encounter them again.
Quality is best assured when you don’t separate it from your day-to-day work, but rather incorporate it in an engineering culture that takes pride in building and running high-quality software.
As always, this is only what works for us right now. Speak to us in a year and we will probably have learned many more ways to do it better.