An Analysis of Patch Plausibility and Correctness for Generate-And-Validate Patch Generation Systems
We analyze reported patches for three prior generate-and-validate patch generation systems (GenProg, RSRepair, and AE). Because of experimental error, the majority of the reported patches violate the basic principle behind the design of these systems -- they do not produce correct outputs even for the inputs in the test suite used to validate the patches. We also show that the overwhelming majority of the accepted patches are not correct and are equivalent to a single modification that simply deletes functionality. We also present Kali, a generate-and-validate patch generation system that simply deletes functionality. Working with a simpler and more effectively focused search space, Kali generates at least as many correct patches as prior GenProg, RSRepair, and AE systems. Kali also generates at least as many plausible patches that produce correct outputs for the inputs in the validation test suite as the three prior systems. We also discuss the patches produced by ClearView, a generate-and-validate binary hot patching system that leverages learned invariants to produce patches that enable systems to survive otherwise fatal defects and security attacks.