An Analysis of Patch Plausibility and Correctness for Generate-And-Validate Patch Generation Systems (Supplementary Material)
We analyze reported patches for three prior generate-and-validate patch generation systems (GenProg, RSRepair, and AE). Because of experimental error, the majority of the reported patches violate the basic principle behind the design of these systems: they do not produce correct outputs even for the inputs in the test suite used to validate the patches. We also show that the overwhelming majority of the accepted patches are not correct and are equivalent to a single modification that simply deletes functionality. Finally, we present Kali, a generate-and-validate patch generation system that only deletes functionality. Working with a simpler and more effectively focused search space, Kali produces more correct patches than the prior GenProg, RSRepair, and AE systems, and at least as many patches that produce correct outputs for the inputs in the validation test suite.
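
To make the distinction between a plausible patch (one that produces correct outputs for the validation inputs) and a correct patch concrete, the following is a minimal sketch of a deletion-only generate-and-validate loop. It is an illustration under simplifying assumptions, not the implementation of Kali or of any of the analyzed systems: the program is a hypothetical toy represented as a list of one-line Python statements, and the names `run`, `passes_all`, `deletion_candidates`, and `generate_and_validate` are invented for this example.

```python
from typing import Iterator, List, Optional, Tuple

Program = List[str]         # toy program: one-line Python statements (assumption)
TestCase = Tuple[int, int]  # (input, expected output)


def run(program: Program, x: int) -> Optional[int]:
    """Execute the statements with input x; return the value bound to `result`."""
    env = {"x": x}
    try:
        exec("\n".join(program), {}, env)
    except Exception:
        return None  # a crashing candidate produces no output
    return env.get("result")


def passes_all(program: Program, tests: List[TestCase]) -> bool:
    """Validation step: the candidate must match every expected output."""
    return all(run(program, x) == expected for x, expected in tests)


def deletion_candidates(program: Program) -> Iterator[Tuple[int, Program]]:
    """Generation step: each candidate deletes exactly one statement."""
    for i in range(len(program)):
        yield i, program[:i] + program[i + 1:]


def generate_and_validate(program: Program, tests: List[TestCase]) -> List[int]:
    """Return the indices of statements whose deletion yields a plausible patch."""
    return [i for i, cand in deletion_candidates(program) if passes_all(cand, tests)]


# Hypothetical intended behavior: clamp x to at most 10.
# The second statement is buggy (it zeroes the result instead of clamping).
buggy = [
    "result = x",
    "if x >= 10: result = 0",
]
tests = [(3, 3), (10, 10)]  # a weak validation suite: no input above 10

plausible = generate_and_validate(buggy, tests)
print(plausible)  # [1]
```

Deleting the second statement passes both validation tests, so the patch is plausible, yet the patched program returns 50 for the input 50 where the intended behavior is 10. The patch removes functionality rather than repairing it, which is the kind of plausible but incorrect patch the analysis is concerned with.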