Sam Atkinson proposes that most—maybe even all—developers love space. Even those who don’t probably love Chris Hadfield, the guitar-strumming astronaut who covered David Bowie’s “Space Oddity” from the International Space Station. These two topics are obviously related to each other; less obviously, both are highly relevant to developers who want better, safer development practices and processes. Here are a few highlights from Atkinson’s fast-paced, clever talk on applying lessons from the history of space exploration to modern development practices.
Look outside for insight
The first lesson is to look outside your industry. For most software developers, a process failure isn’t deadly. But if you’re a developer working on a shuttle launch or in critical healthcare environments, mistakes can be fatal. It certainly puts our own security, compliance, and reliability concerns in perspective and can offer valuable ideas for how to approach tough projects with extreme demands. You may not be launching your new widget into orbit, but what if you were to treat it as such?
Build systems that assume you’ll make mistakes
One of the first ways to approach development processes like an astronaut is to stop assuming everything will go according to plan. Build systems for resilience from the start, as you’re not going to be fixing them once they’re in orbit—er, production. Automation can help. And make sure that when failure does happen, you’ve got a plan for recovery. Don’t let failing systems or projects continue to struggle. Kill them off and start again.
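As a rough illustration of that mindset, here is a minimal Python sketch (not from the talk; the function names and retry parameters are assumptions) of an operation that assumes failure from the start, retries with backoff, and switches to a fallback rather than letting a struggling call limp along:

```python
import time

def call_with_recovery(primary, fallback, retries=3, delay=0.01):
    """Try `primary` up to `retries` times with backoff, then fall back."""
    for attempt in range(retries):
        try:
            return primary()
        except Exception:
            time.sleep(delay * (2 ** attempt))  # exponential backoff
    # Primary keeps failing: kill it off and switch rather than limp along.
    return fallback()

def flaky_service():
    raise RuntimeError("service down")  # stand-in for a failing dependency

result = call_with_recovery(flaky_service, lambda: "cached value")
print(result)  # prints "cached value": the recovery plan, not a crash
```

The point isn’t the specific retry logic; it’s that the recovery path is designed in before launch, not bolted on after something is already in orbit.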
Avoid the normalization of deviance
“It’s ok, we can work around it.” Atkinson posits that this is one of the biggest mistakes we—and those exploring space—can make. It’s not the single exception that is so dangerous, it’s when that exception becomes common practice. The poignant example of this costly normalization was the Challenger disaster. A very small thing—an O-ring seal on a solid rocket booster—was allowed to deviate from the normal practice. Shuttles launched successfully on more than a dozen flights with this defect. The fix itself wasn’t impossible, but there was little support for pausing planned missions or taking time away from other work. When the deviance of the rubber ring met the deviance of a particularly cold day, failure was catastrophic.
Use deviance as an indicator
Even on a small dev team, the number of irregular behaviors we see every day can be overwhelming. For most development projects, our tolerance for risk (and deviance) is greater than NASA’s. After a failure, it’s human nature to assume it won’t happen again so soon. But remember: Challenger was lost in 1986. In 2003, Columbia disintegrated on re-entry. The cause? Foam insulation that broke off the external tank and damaged the wing—a small deviance that had been recorded on several earlier missions.
If NASA couldn’t eliminate normalized deviance, despite its cost in lives and billions of dollars, your business leaders probably won’t be swayed on such issues, either. That shouldn’t stop a good development team from using this information to improve its practice. By keeping a record of deviance, you learn which areas of the business are most resistant to correcting bad practice and which ones trust and act on the recommendations of their development organization.
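One lightweight way to keep such a record is a simple deviance log. This Python sketch is purely illustrative (the field names and example entries are invented), but it shows how even a flat list of workarounds, counted by area, makes recurring deviance visible instead of normal:

```python
from collections import Counter

# Each entry records one workaround or irregular behavior as it happens.
deviance_log = []

def record_deviance(area, description):
    deviance_log.append({"area": area, "description": description})

# Hypothetical entries for the demo:
record_deviance("deploy", "manual config edit on prod host")
record_deviance("deploy", "manual config edit on prod host")
record_deviance("billing", "retry script run by hand")

# Which areas of the business tolerate the most deviance?
by_area = Counter(entry["area"] for entry in deviance_log)
print(by_area.most_common())  # deploy workarounds dominate
```

A spreadsheet or issue-tracker label works just as well; what matters is that the exceptions get written down and counted.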
Practice makes (almost) perfect
Hadfield’s memoir, An Astronaut’s Guide to Life on Earth, closely examines how astronauts prepare to identify and solve problems, often by asking (and answering) this question: “What’s the next thing that’s going to kill me?” Atkinson notes that they also practice more than their own roles, becoming familiar with any skill that might be needed and simulating responses to realistic scenarios until they can react calmly and quickly—even while wearing a cumbersome spacesuit.
At best, many development teams practice disaster recovery a few times a year. Atkinson suggests several ways to integrate more rigorous training, such as assigning a creative developer to generate both simple and complex errors on a weekly or monthly basis. These would not be scheduled and would need to be identified and investigated as though they were real. That requires monitoring sharp enough to pick up the anomaly, plus proper processes and a postmortem to make sure everyone learns from it.
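A minimal sketch of that kind of unscheduled drill might look like the following Python; the fault list and injection probability are illustrative assumptions, not anything Atkinson prescribes:

```python
import random

# A designated developer (or a scheduled job) flips a failure on at
# random, so the team's monitoring, not the calendar, has to catch it.
FAULTS = ["latency spike", "dropped connection", "corrupt payload"]

def maybe_inject_fault(rng=random.random, chance=0.05):
    """Return a fault name on roughly 5% of calls, else None."""
    if rng() < chance:
        return random.choice(FAULTS)
    return None

fault = maybe_inject_fault(rng=lambda: 0.0)  # force a drill for the demo
print(f"Injected: {fault}")  # the team should find this via monitoring
```

In practice a tool like a chaos-engineering framework does this with more safeguards, but the shape is the same: faults arrive unannounced, and the investigation and postmortem are run as if they were real.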
Atkinson’s talk reminds us that perfection—even for the really smart people at NASA—isn’t realistic. After all, hundreds of millions of dollars of technology crashed into Mars because NASA teams had trouble sharing information: one worked in metric units, another’s software in imperial (sound familiar?). Accepting and embracing imperfection lets us learn from it so that in the future, our spaceships know which way to go.