Title:
|
Practical systematic concurrency testing for concurrent and distributed software
|
Systematic concurrency testing (SCT) is a promising approach to finding and reproducing concurrency bugs. The program under test is executed repeatedly such that each execution explores a particular schedule. Numerous techniques have been proposed to make SCT scalable. Despite this, we have identified the following open problems: (1) there is a major lack of comparison and empirical evaluation of SCT techniques; (2) there is a need for better reduction techniques that go beyond the current theoretical limits; (3) the feasibility of applying SCT in practice is unclear, particularly for distributed systems. This thesis makes the following contributions to the field of SCT: 1. An independent, reproducible empirical study of existing SCT techniques over 49 buggy concurrent software benchmarks. Surprisingly, we found that the "naive" controlled random scheduler performs well, finding more bugs than preemption bounding. We report the results for all techniques and discuss the benchmarks and the challenges faced in applying SCT. 2. The lazy happens-before relation (lazy HBR), which provides reduction beyond partial-order reduction for programs that use mutexes. Our evaluation over 79 publicly available benchmarks shows both a large potential improvement and a large practical improvement from exploiting the lazy HBR. 3. A description of how to create an SCT tool in practice, with a focus on subtle yet important details that are typically not discussed in prior work. 4. A case study in which we apply SCT to distributed systems written for Azure Service Fabric (Fabric). We introduce our Adara actors framework for writing portable, statically-typed actors. We describe our model of Fabric and evaluate it on a system containing 15 bugs, showing that the model includes enough behaviours and asynchrony to expose these subtle bugs.
|