Software Resilience Canon

by Martin Monperrus

These papers are foundational to software resilience.

1975: Design of self-checking software

1978: N-version programming: A fault-tolerance approach to reliability of software operation

1988: Data Diversity: An Approach to Software Fault Tolerance

1996: A sense of self for Unix processes

1997: Comparing operating systems using robustness benchmarks

2003: Crash-only software

2004: Basic concepts and taxonomy of dependable and secure computing

2004: Enhancing Server Availability and Security Through Failure-Oblivious Computing

2016: “Let it crash” in Erlang and its error supervision mechanism

2016: Chaos Engineering

And also: Chapter 11 “Exceptions” of the Java Language Specification