Use this URL to cite or link to this record in EThOS:
Title: Methods and measures for statistical fault localisation
Author: Landsberg, David
Awarding Body: University of Oxford
Current Institution: University of Oxford
Date of Award: 2016
Availability of Full Text:
Access from EThOS:
Full text unavailable from EThOS. Restricted access.
Access from Institution:
Fault localisation is the process of finding the causes of a given error, and is one of the most costly elements of software development. One of the most efficient approaches to fault localisation appeals to statistical methods. These methods are characterised by their ability to estimate how faulty a program artefact is as a function of statistical information about a given program and test suite. However, the major problem facing statistical approaches is their effectiveness -- particularly with respect to finding single (or multiple) faults in large programs typical to the real world. A solution to this problem hinges on discovering new formal properties of faulty programs and developing scalable statistical techniques which exploit them. In this thesis I address this by identifying new properties of faulty programs, developing the formal frameworks and methods which are formally proven to exploit them, and demonstrating that many of our new techniques substantially and statistically significantly outperform competing algorithms at given fault localisation tasks (using p = 0.01) on what (to our knowledge) is one of the largest scale set of experiments in fault localisation to date. This research is thus designed to corroborate the following thesis statement: That the new algorithms presented in this thesis are effective and efficient at software fault localisation and outperform state of the art statistical techniques at a range of fault localisation tasks. In more detail, the major thesis contributions are as follows: 1. We perform a thorough investigation into the existing framework of (sbfl), which currently stands at the cutting edge of statistical fault localisation. To improve on the effectiveness of sbfl, our first contribution is to introduce and motivate many new statistical measures which can be used within this framework. First, we show that many are well motivated to the task of sbfl. Second, we formally prove equivalence properties of large classes of measures. Third, we show that many of the measures perform competitively with the existing measures in experimentation -- in particular our new measure m9185 outperforms all existing measures on average in terms of effectiveness, and along with Kulkzynski2, is in a class of measures which statistically significantly outperforms all other measures at finding a single fault in a program (p = 0.01). 2. Having investigated sbfl, our second contribution is to motivate, introduce, and formally develop a new formal framework which we call probabilistic fault localisation (pfl). pfl is similar to sbfl insofar as it can leverage any suspiciousness measure, and is designed to directly estimate the probability that a given program artefact is faulty. First, we formally prove that pfl is theoretically superior to sbfl insofar as it satisfies and exploits a number of desirable formal properties which sbfl does not. Second, we experimentally show that pfl methods (namely, our measure pfl-ppv) substantially and statistically significantly outperforms the best performing sbfl measures at finding a fault in large multiple fault programs (p = 0.01). Furthermore, we show that for many of our benchmarks it is theoretically impossible to design strictly rational sbfl measures which outperform given pfl techniques. 3. Having addressed the problem of localising a single fault in a pro- gram, we address the problem of localising multiple faults. Accord- ingly, our third major contribution is the introduction and motiva- tion of a new algorithm MOpt(g) which optimises any ranking-based method g (such as pfl/sbfl/Barinel) to the task of multiple fault localisation. First we prove that MOpt(g) formally satisfies and exploits a newly identified formal property of multiple fault optimality. Secondly, we experimentally show that there are values for g such that MOpt(g) substantially and statistically significantly outperforms given ranking-based fault localisation methods at the task of finding multiple faults (p = 0.01). 4. Having developed methods for localising faults as a function of a given test suite, we finally address the problem of optimising test suites for the purposes of fault localisation. Accordingly, we first present an algorithm which leverages model checkers to improve a given test suite by making it satisfy a property of single bug opti- mality. Second, we experimentally show that on small benchmarks single bug optimal test suites can be generated (from scratch) efficiently when the algorithm is used in conjunction with the cbmc model checker, and that the test suite generated can be used effectively for fault localisation.
Supervisor: Kroening, Daniel ; Chockler, Hana Sponsor: Not available
Qualification Name: Thesis (Ph.D.) Qualification Level: Doctoral
EThOS ID:  DOI: Not available
Keywords: Computer science ; Fault Localization ; Verification ; Debugging ; Error Explanation