Use this URL to cite or link to this record in EThOS:
Title: Supporting project comprehension with revision control system repository analysis
Author: Burn, Andrew James
ISNI:       0000 0004 2702 6147
Awarding Body: Durham University
Current Institution: Durham University
Date of Award: 2011
Availability of Full Text:
Access from EThOS:
Access from Institution:
Context: Project comprehension is an activity relevant to all aspects of software engineering, from requirements specification to maintenance. The historical, transactional data stored in revision control systems can be mined and analysed to produce a great deal of information about a project. Aims: This research aims to explore how the data-mining, analysis and presentation of revision control systems can be used to augment aspects of project comprehension, including change prediction, maintenance, visualization, management, profiling, sampling and assessment. Method: A series of case studies investigate how transactional data can be used to support project comprehension. A thematic analysis of revision logs is used to explore the development process and developer behaviour. A benchmarking study of a history-based model of change prediction is conducted to assess how successfully such a technique can be used to augment syntax-based models. A visualization tool is developed for managers of student projects with the aim of evaluating what visualizations best support their roles. Finally, a quasi-experiment is conducted to determine how well an algorithmic model can automatically select a representative sample of code entities from a project, in comparison with expert strategies. Results: The thematic analysis case study classified maintenance activities in 22 undergraduate projects and four real-world projects. The change prediction study calculated information retrieval metrics for 34 undergraduate projects and three real-world projects, as well as an in-depth exploration of the model's performance and applications in two selected projects. File samples for seven projects were generated by six experts and three heuristic models and compared to assess agreement rates, both within the experts and between the experts and the models. Conclusions: When the results from each study are evaluated together, the evidence strongly shows that the information stored in revision control systems can indeed be used to support a range of project comprehension activities in a manner which complements existing, syntax-based techniques. The case studies also help to develop the empirical foundation of repository analysis in the areas of visualization, maintenance, sampling, profiling and management; the research also shows that students can be viable substitutes for industrial practitioners in certain areas of software engineering research, which weakens one of the primary obstacles to empirical studies in these areas.
Supervisor: Not available Sponsor: Not available
Qualification Name: Thesis (Ph.D.) Qualification Level: Doctoral
EThOS ID:  DOI: Not available