Title: An integrated suite of informatics tools and resources to support post-genomics investigation
Author: Li, Weizhong
ISNI: 0000 0001 3609 4932
Awarding Body: University of Liverpool
Current Institution: University of Liverpool
Date of Award: 2008
The genome sequencing projects have brought about a massive increase in the scale of bioinformatic analysis. Post-genomic analysis requires techniques for processing these huge datasets automatically, efficiently and effectively, which in turn requires new approaches, new efficient bioinformatics tools and high-quality, accessible information resources. This thesis describes the development of bioinformatic tools, resources and analytical methods for a major post-genomic project: an open transcriptomic screen of the mechanisms involved in the environmental stress adaptation of an important environmental model species, the common carp Cyprinus carpio L.
The project required the identification and characterisation of cDNA resources through expressed sequence tag (EST) analysis, for which a new user-configurable package, EST-ferret, was developed. The package integrates a suite of open source algorithms connected by Perl scripts, with options for EST sequence clean-up, assembly, BLAST homology search, protein domain searches, and Gene Ontology (GO) annotation. About 13,500 ESTs were processed through EST-ferret and the results were incorporated into a comprehensively annotated and searchable database, carpBASE 2.1. In total, 9202 high-quality EST sequences were assembled into 6033 non-redundant sequences. Extending the alignment search methods to include protein domains, UTRs and repeat elements annotated an additional 12.6% of ESTs. Finally, a 'GOprofiler' programme was developed and embedded in EST-ferret to assign GO annotations to ESTs. Collectively, these tools maximised the identification and functional annotation of cDNA clones.
Analysing gene expression profiles from microarrays is fundamental to post-genomic approaches. ExprAlign was developed to cluster and visualise gene expression data. It included CORR, a programme that determines the similarity of expression between genes by computing millions of Pearson correlation coefficients. ExprAlign also used the VxInsight package to assign ESTs to expression clusters and to ordinate and visualise the resulting clusters as a 3D landscape. ExprAlign was used to suggest identities for unidentified ESTs by relating 522 unclassifiable ESTs in carpBASE 2.1 to other BLAST-identified genes, and to separate some unique genes and some gene isoforms.
GOmatrix, using Fisher's exact test, was developed to determine which non-redundant gene expression clusters were statistically over- or under-represented in GO categories of interest. This has greatly assisted the understanding of the biological roles and molecular functions of the different gene groups identified from the transcript profile.
Comparative, cross-species analysis of sequence data and gene expression data is important to functional genomic investigation. Orthology analysis was performed across carp, zebrafish and human, and a tool called FindOrthologs was developed for this purpose. ExprAlign was applied in the orthology analysis to discover how conserved the correlated expression of orthologous genes was across carp and human. GOmatrix also indicated the conserved biological processes for the orthologous gene groups.
Supplied by The British Library - 'The world's knowledge'
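The abstract names two statistical steps: Pearson correlation between expression profiles (CORR) and Fisher's exact test on GO category counts (GOmatrix). The following is a minimal Python sketch of those two calculations, not the thesis code, which this record does not show. The function names, the 2x2 table layout and the example numbers are assumptions made for illustration.

```python
from math import comb, sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    sxx = sum(d * d for d in dx)
    syy = sum(d * d for d in dy)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sum(a * b for a, b in zip(dx, dy)) / sqrt(sxx * syy)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Assumed layout (hypothetical, for illustration):
      a = genes in the cluster annotated with the GO category
      b = genes in the cluster without it
      c = genes outside the cluster annotated with it
      d = genes outside the cluster without it
    The p-value sums the hypergeometric probabilities of all tables with
    the same margins that are no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(k):
        # Probability that the top-left cell equals k, margins fixed.
        return comb(col1, k) * comb(n - col1, row1 - k) / comb(n, row1)

    lowest = max(0, row1 - (n - col1))
    highest = min(row1, col1)
    p_observed = prob(a)
    total = sum(
        p
        for p in (prob(k) for k in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)  # tolerance for float ties
    )
    return min(1.0, total)


if __name__ == "__main__":
    # Made-up expression values for two genes across five arrays.
    gene_a = [0.1, 0.4, 0.9, 1.3, 2.0]
    gene_b = [0.2, 0.5, 1.1, 1.2, 2.2]
    print("r =", pearson(gene_a, gene_b))

    # Made-up counts: 12 of 40 cluster genes carry the GO category,
    # against 30 of 960 genes outside the cluster.
    print("p =", fisher_exact_two_sided(12, 28, 30, 930))
```

Whether a category is over- or under-represented can be read by comparing a / (a + b) with c / (c + d). In a screen that tests many clusters and categories, the resulting p-values would normally need a multiple-testing correction, which this sketch omits.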
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral