Use this URL to cite or link to this record in EThOS: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.699309
Title: Improving the quality of bug data in software repositories
Author: Auwal, Bilyaminu Romo
ISNI: 0000 0004 5988 9991
Awarding Body: Brunel University London
Current Institution: Brunel University
Date of Award: 2016
Availability of Full Text:
Access from EThOS:
Full text unavailable from EThOS. Thesis embargoed until 16 Jun 2018
Abstract:
Context: Researchers have increasingly recognised the benefit of mining software repositories to extract information. Integrating a version control tool (VC tool) with a bug tracking tool (BT tool) when mining software repositories, and synchronising missing bug tracking data (BT data) and version control logs (VC logs), therefore becomes of paramount importance for improving the quality of bug data in software repositories. This enables researchers to carry out good-quality research that benefits software projects, especially open source software projects, where information is limited because development is distributed and shared data for tracking project issues is uncommon. BT data often does not mirror the actions that developers logged, which reduces the traceability of defects in the development logs (VC logs). Data from a version control system (VC system) can be enhanced with data from a bug tracking system (BT system), because VC logs report on past software development activities. When VC logs and BT data are used together, researchers obtain a more complete picture of a bug's life cycle, evolution and maintenance. However, current BT systems and VC systems provide insufficient support for cross-analysing VC logs and BT data in empirical software engineering research on prediction of software faults, software reliability, traceability, software quality, effort and cost estimation, bug prediction, and bug fixing.
Aims and objectives: The aim of the thesis is to design and implement a tool chain that supports the integration of a VC tool and a BT tool and automatically synchronises the missing VC logs and BT data of open source software projects. The synchronisation process, using Bicho (BT tool) and CVSAnalY (VC tool), is demonstrated and evaluated on a sample of 344 open source software (OSS) projects.
Method: The tool chain was implemented and its performance evaluated semi-automatically. The SZZ algorithm was used to detect and trace BT data and VC logs. In its formulation, the algorithm looks for the terms "Bugs" or "Fixed" (case-insensitive) along with the '#' sign that denotes a bug ID, in the VC system and BT system respectively. In addition, the SZZ algorithm was dissected into its components, and the precision and recall of "fix", "bug" and "# + digit" (e.g., #1234) were analysed when tracking possible bug IDs in the VC logs of the sample OSS projects.
Results: The results indicate that "# + digit" (e.g., #1234) is more precise for bug traceability than the "bug" and "fix" keywords. Such keywords are indeed present in the VC logs, but they are less useful for connecting development actions with bug traces: their recall is high, but their precision is lower. Overall, the results indicate that VC logs and BT data retrieved and stored by automatic tools can be tracked and recovered with better accuracy using only a part of the SZZ algorithm. In addition, 80-95% of all the missing BT data and VC logs for the 344 OSS projects were synchronised into the Bicho and CVSAnalY databases respectively.
Conclusion: The presented tool chain will eliminate and avoid repetitive activities in traceability tasks, as well as in software maintenance and evolution. This thesis provides a solution towards the automation and traceability of BT data of software projects (in particular, OSS projects), using VC logs to complement and track missing bug data. Synchronising involves completing the missing data of bug repositories with the logs detailing developers' actions, and it benefits various branches of empirical software engineering research: prediction of software faults, software reliability, traceability, software quality, effort and cost estimation, bug prediction, and bug fixing.
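To make the keyword versus "# + digit" comparison concrete, here is a minimal Python sketch of how the two heuristics could be applied to commit messages, how their precision and recall could be compared against a manually labelled sample, and how bug IDs referenced in the VC log but absent from the BT data could be listed as synchronisation candidates. It covers only this linking step, not the full SZZ algorithm. The abstract does not give the thesis's exact patterns. The regular expressions, the function names (`candidate_ids`, `flags_keyword`, `precision_recall`, `missing_bug_ids`), the six commit messages and their labels are all hypothetical and invented for illustration; they are not data from the 344 projects.

```python
import re
from collections.abc import Iterable

# Hypothetical patterns approximating the heuristics named in the abstract;
# the thesis's exact regular expressions are not reproduced here.
BUG_ID = re.compile(r"#(\d+)")  # "# + digit", e.g. "#1234"
KEYWORD = re.compile(r"\b(bugs?|fix(e[sd])?)\b", re.IGNORECASE)  # bug(s), fix/fixed/fixes


def candidate_ids(message: str) -> set[str]:
    """Bug IDs written as '#<digits>' in a commit message."""
    return set(BUG_ID.findall(message))


def flags_keyword(message: str) -> bool:
    """True if the message contains a bug/fix keyword (case-insensitive)."""
    return KEYWORD.search(message) is not None


def precision_recall(flagged: set[str], truth: set[str]) -> tuple[float, float]:
    """Precision and recall of the flagged commits against labelled bug-fix commits."""
    hits = len(flagged & truth)
    precision = hits / len(flagged) if flagged else 0.0
    recall = hits / len(truth) if truth else 0.0
    return precision, recall


def missing_bug_ids(messages: Iterable[str], tracked_ids: set[str]) -> set[str]:
    """IDs referenced in VC log messages but absent from the BT data (sync candidates)."""
    referenced = set().union(*(candidate_ids(m) for m in messages))
    return referenced - tracked_ids


# Toy VC log, invented for illustration (not data from the thesis).
commits = {
    "c1": "Fix #1234: null pointer in parser",
    "c2": "Fixed crash when saving empty file",
    "c3": "Bug #57: handle empty config",
    "c4": "Fix typo in README",
    "c5": "Update docs for bug-report template",
    "c6": "Bump version to 2.1",
}
truth = {"c1", "c2", "c3"}  # hypothetical manual labels: real bug-fixing commits

keyword_flagged = {c for c, m in commits.items() if flags_keyword(m)}
digit_flagged = {c for c, m in commits.items() if candidate_ids(m)}

for name, flagged in (("keywords", keyword_flagged), ("hash-digit", digit_flagged)):
    p, r = precision_recall(flagged, truth)
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
# keywords: precision=0.60 recall=1.00
# hash-digit: precision=1.00 recall=0.67

print(missing_bug_ids(commits.values(), tracked_ids={"1234"}))  # {'57'}
```

On this toy sample the "# + digit" heuristic flags fewer commits, but every one it flags is correct. The keyword heuristic catches every labelled fix at the cost of false positives such as the README typo fix. This illustrates the precision/recall trade-off the abstract reports.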
Supervisor: Capiluppi, A. ; Hall, T.
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID: uk.bl.ethos.699309
DOI: Not available
Keywords: Bug traceability ; Bug accuracy ; Bug fixing-commits ; Synchronising missing bug data ; Identifying bug