Title: A quantitative exploration of causes of false positive single nucleotide polymorphisms in next-generation sequencing data
Author: Bello Ribeiro, Antonio Claudio
ISNI:       0000 0004 7968 9575
Awarding Body: University of Dundee
Current Institution: University of Dundee
Date of Award: 2016
Single Nucleotide Polymorphisms (SNPs) are widely used molecular markers, and their use has increased massively since the inception of Next-Generation Sequencing (NGS) technologies, which allow large numbers of SNPs to be detected at low cost. However, both NGS data and their analysis are error-prone, which can lead to the generation of false positive (FP) SNPs. The traditional approach to SNP discovery is based on mapping reads to a reference sequence. Apart from sequencing errors, which vary in pattern and rate depending on the sequencing platform, the short read lengths that prevail in NGS, together with the repetitive nature of the genomes of many organisms, can lead to errors in the genome assembly and/or read mapping stages of the mapping-based approach to SNP discovery. The work described here investigates and quantifies some of the mechanisms that cause false positive SNPs. These include reference misassembly due to the presence of paralogous sequences and read cross-mapping, along with associated factors such as the quality of the reference sequence, read length, choice of mapper and variant caller, mapping stringency, and filtering of SNPs by read mapping quality and read depth. The study shows that both paralogs and the choice of tools and parameters involved in variant calling can have a dramatic effect on the number of FP SNPs produced. A brief exploration of the influence of these factors on the generation of false negative (FN) SNPs is also carried out at the end of the study, paving the way for new insights. This thesis aims to provide a stepping stone towards a better understanding of the factors influencing the mapping-based SNP discovery approach.
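The filtering of SNPs by read mapping quality and read depth mentioned in the abstract can be illustrated with a minimal Python sketch. It is not taken from the thesis: the `CandidateSNP` record, the function names, the threshold values, and the toy data are all hypothetical, chosen only to show how such filters might change the number of FP SNPs when a set of known true sites is available for comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateSNP:
    """Hypothetical record for one candidate SNP (not from the thesis)."""
    chrom: str
    pos: int          # 1-based position on the reference
    depth: int        # number of reads covering the site
    mean_mapq: float  # mean mapping quality of those reads


def filter_snps(candidates, min_mapq=20.0, min_depth=5, max_depth=100):
    """Keep candidates with well-mapped reads and a plausible read depth.

    Assumption of this sketch: unusually high depth may indicate reads
    from paralogous copies piling up on a single locus.
    """
    return [
        c for c in candidates
        if c.mean_mapq >= min_mapq and min_depth <= c.depth <= max_depth
    ]


def count_false_positives(calls, true_sites):
    """Count calls whose (chrom, pos) is absent from the set of true SNP sites."""
    return sum((c.chrom, c.pos) not in true_sites for c in calls)


# Toy data, invented for illustration only.
candidates = [
    CandidateSNP("chr1", 100, depth=30, mean_mapq=55.0),   # known true site
    CandidateSNP("chr1", 250, depth=28, mean_mapq=8.0),    # ambiguously mapped reads
    CandidateSNP("chr1", 400, depth=310, mean_mapq=40.0),  # excessive depth
    CandidateSNP("chr2", 75, depth=22, mean_mapq=50.0),    # known true site
]
true_sites = {("chr1", 100), ("chr2", 75)}

kept = filter_snps(candidates)
fp_before = count_false_positives(candidates, true_sites)
fp_after = count_false_positives(kept, true_sites)
print(f"FP SNPs before filtering: {fp_before}, after filtering: {fp_after}")
```

In this toy case both filters remove only FP calls; with real data, stricter thresholds would also be expected to discard some true SNPs, which is the FN trade-off the abstract alludes to.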
Supervisor: Not available
Sponsor: James Hutton Institute
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available
Keywords: False positive SNP ; NGS ; Read mismapping ; Misassembly ; Mapping stringency ; Read lengths