Title: Exploiting public human genome NGS datasets to characterize repetitive DNA and recover assembly gaps
Author: Ogeh, Denye Nathaniel
ISNI: 0000 0004 7228 4794
Awarding Body: University of Leicester
Current Institution: University of Leicester
Date of Award: 2018
With the advent of Next Generation Sequencing (NGS), enormous volumes of short-read sequence data can now be generated cheaply and on short time scales. Nevertheless, the quality of genome assemblies generated using NGS technologies has been adversely affected by this innovation, compared to those generated using Sanger DNA sequencing. This is largely due to the inability of short-read sequence data alone to scaffold repetitive structures, which creates gaps, inversions and rearrangements and ultimately results in assemblies that are, at best, drafts (that is, preliminary assemblies that require further work to become a more complete and accurate representation of the genome). Single molecule long-read sequencing (SMS) technologies, on the other hand, address this challenge by generating sequences with greatly increased read lengths, offering the prospect of better recovering these complex repetitive structures and thereby improving assembly quality. Following this development, we evaluate the ability of SMS data (specifically Pacific Biosciences SMRT data and Oxford Nanopore MinION data from human genomes) to recover poorly represented repetitive sequences (specifically, GC-rich human minisatellites), identify novel transposable element insertions and enable the closing of gapped regions. Our results show that poorly represented repetitive sequences (specifically, minisatellites and L1s) and other missing elements in published human genome assemblies can be characterized from single molecule long-read data using custom software developed for this purpose, which scales to the analysis of single molecule long reads (particularly Pacific Biosciences’ SMRT technology). The tool is cross-platform, giving computational and non-computational biologists a straightforward, less technical approach to the local analysis of specific poorly characterized DNA sequences.
Supervisor: Badge, Richard
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available