Impossibility of boosting distributed service resilience

Unknown author (2005-02-25)

We prove two theorems saying that no distributed system in whichprocesses coordinate using reliable registers and f-resilient servicescan solve the consensus problem in the presence of f+1 undetectableprocess stopping failures. (A service is f-resilient if it isguaranteed to operate as long as no more than f of the processesconnected to it fail.)Our first theorem assumes that the given services are atomic objects,and allows any connection pattern between processes and services. Incontrast, we show that it is possible to boost the resilience ofsystems solving problems easier than consensus: the k-set consensusproblem is solvable for 2k-1 failures using 1-resilient consensusservices. The first theorem and its proof generalize to the largerclass of failure-oblivious services.Our second theorem allows the system to contain failure-awareservices, such as failure detectors, in addition to failure-obliviousservices; however, it requires that each failure-aware service beconnected to all processes. Thus, f+1 process failures overall candisable all the failure-aware services. In contrast, it is possibleto boost the resilience of a system solving consensus if arbitrarypatterns of connectivity are allowed between processes andfailure-aware services: consensus is solvable for any number offailures using only 1-resilient 2-process perfect failure detectors.