
DSN

2024

Mutiny! How does Kubernetes fail, and what can we do about it?

Fault-Error-Failure chain:

  • Fault: a static defect in software
  • Error: an incorrect internal state
  • Failure: external, incorrect behavior
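To make the chain concrete, here is a minimal Python sketch. The names `desired_replicas` and `Controller` are hypothetical and invented for illustration; they are not Kubernetes code. The sketch assumes a toy controller that reads a replica count from a spec. The fault stays dormant until a malformed input activates it.

```python
def desired_replicas(spec: dict) -> int:
    # FAULT: a static defect in the code. A malformed value is silently
    # replaced by 0 instead of being rejected.
    value = spec.get("replicas")
    return value if isinstance(value, int) else 0


class Controller:
    """Hypothetical toy controller, used only to illustrate the chain."""

    def __init__(self) -> None:
        self.target = 1

    def reconcile(self, spec: dict) -> None:
        # ERROR: if the fault is activated, the internal state (target)
        # becomes incorrect, although nothing is visible from outside yet.
        self.target = desired_replicas(spec)

    def running_pods(self) -> list:
        # FAILURE: the incorrect state becomes externally visible behavior,
        # here an empty pod list where the user expected three pods.
        return ["pod-%d" % i for i in range(self.target)]


controller = Controller()
controller.reconcile({"replicas": "3"})  # the string "3" activates the fault
print(controller.target, controller.running_pods())
```

With a well-formed spec such as `{"replicas": 3}` the fault is never activated, so no error or failure follows.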

What kinds of failures?

  • misconfigurations
  • communication errors
  • other underlying errors, such as OS-level errors

Altering the data stored in etcd can recreate a majority (54 of 81) of the real-world failures analyzed.

The core idea is to inject a fault and then study why it propagates to the whole system.
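The inject-then-observe idea can be sketched as follows. This is a rough illustration under strong assumptions, not the paper's tool: a plain Python dict stands in for the etcd key-value store, values are JSON rather than the encoding a real cluster uses, and `inject_field_fault`, `diverged_components`, and the three reader functions are hypothetical names. The sketch alters one stored field and then checks which downstream readers change their output.

```python
import copy
import json

KEY = "/registry/deployments/default/web"  # illustrative key
CPU_PER_POD = 2                            # assumed constant for the toy example


def inject_field_fault(store: dict, key: str, field: str, new_value) -> dict:
    """Return a copy of the store with one field of one object altered."""
    faulty = copy.deepcopy(store)
    obj = json.loads(faulty[key])
    obj[field] = new_value
    faulty[key] = json.dumps(obj).encode()
    return faulty


# Hypothetical components that all derive their behavior from the store.
def planned_pods(store: dict) -> int:
    return json.loads(store[KEY])["replicas"]


def requested_cpu(store: dict) -> int:
    return planned_pods(store) * CPU_PER_POD


def image_name(store: dict) -> str:
    return json.loads(store[KEY])["image"]


COMPONENTS = {
    "planned_pods": planned_pods,
    "requested_cpu": requested_cpu,
    "image_name": image_name,
}


def diverged_components(baseline: dict, faulty: dict) -> list:
    """List the components whose output differs after the injection."""
    return [name for name, read in COMPONENTS.items()
            if read(baseline) != read(faulty)]


baseline = {KEY: json.dumps({"replicas": 3, "image": "nginx"}).encode()}
faulty = inject_field_fault(baseline, KEY, "replicas", 0)
print(diverged_components(baseline, faulty))
```

Comparing a fault-free baseline against the altered store is one simple way to see how far a single alteration spreads; a real experiment would observe a running cluster instead of pure functions.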