Automatic Software Upgrades for Distributed Systems

Unknown author (2005-10-06)

Upgrading the software of long-lived, highly-available distributedsystems is difficult. It is not possible to upgrade all the nodes in asystem at once, since some nodes may be unavailable and halting thesystem for an upgrade is unacceptable. Instead, upgrades must happengradually, and there may be long periods of time when different nodesrun different software versions and need to communicate usingincompatible protocols. We present a methodology and infrastructurethat make it possible to upgrade distributed systems automatically whilelimiting service disruption. We introduce new ways to reason aboutcorrectness in a multi-version system. We also describe a prototypeimplementation that supports automatic upgrades with modest overhead.