Availability-Consistency Trade-Offs in a Fault-Tolerant Stream Processing System

Unknown author (2004-11-22)

processing. In contrast to previous techniques that handle node failures, our approach also tolerates network failures and network partitions. The approach is based on a principled trade-off between consistency and availability in the face of failure, that (1) ensures that all data on an input stream is processed within a specified time threshold, but (2) reduces the impact of failures by limiting, where possible, the number of results produced based on partially available input data, and (3) corrects these results when failures heal. Our approach is well-suited for applications such as environment monitoring, where high availability and real-time response are preferable to perfect answers. Our approach uses replication and guarantees that all processing replicas achieve state consistency, both in the absence of failures and after a failure heals. We achieve consistency in the former case by defining a data-serializing operator that ensures that the order of tuples to a downstream operator is the same at all the replicas. To achieve consistency after a failure heals, we develop approaches based on checkpoint/redo and undo/redo techniques. We have implemented these schemes in a prototype distributed stream processing system, and present experimental results that show that the system meets the desired availability-consistency trade-offs.
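
To illustrate the idea behind the data-serializing operator, the following is a minimal sketch and not the implementation described in this paper. It assumes that each tuple carries a timestamp and that a complete batch of tuples from every input stream is already available. The names `StreamTuple` and `serialize` are hypothetical. The sketch orders tuples by a key derived only from their contents, so replicas that receive the same tuples in different arrival orders still forward an identical sequence downstream.

```python
from typing import Dict, List, Tuple

# Hypothetical tuple shape for this sketch: (timestamp, payload).
StreamTuple = Tuple[int, str]


def serialize(inputs: Dict[str, List[StreamTuple]]) -> List[Tuple[int, str, str]]:
    """Merge several input streams into one deterministic sequence.

    The sort key is (timestamp, stream name, payload). It depends only on
    tuple contents, never on arrival order, so every replica that holds the
    same batch of tuples emits them in the same order.
    """
    merged = [
        (ts, name, payload)
        for name, tuples in inputs.items()
        for ts, payload in tuples
    ]
    return sorted(merged)


# Two replicas receive the same tuples in different arrival orders.
replica_a = {"s1": [(1, "a"), (3, "c")], "s2": [(2, "b")]}
replica_b = {"s2": [(2, "b")], "s1": [(3, "c"), (1, "a")]}

order_a = serialize(replica_a)
order_b = serialize(replica_b)
```

Both replicas produce the same sequence here, which is the property the downstream operator relies on. The sketch leaves open how an operator learns that a batch is complete, which a real system must decide before it can emit the serialized tuples.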