Title:
|
Parallel β-helix prediction: high-confidence models from multiple sequence alignments
|
This PhD project consisted of two parts. The first part was our successful prediction of target T0100 in CASP4. For this prediction, we produced one of the highest-ranked threading alignments through sequence analysis, which revealed a “Cys-staples” pattern formed by putative disulphide bridges between consecutive turns in the parallel β-helix core of different homologues. This pattern was used as an anchoring point in the target-template alignment, and this novel approach motivated the follow-up project that constitutes the second part. The aim of the second part was to apply the approach as widely as possible and to produce high-confidence models for all detectable members of the PLL superfamily in GenPept (as retrieved from the NCBI in July 2002). Large-scale detection of PLL proteins was initially achieved with two different third-party fold recognition programs, with parameters and cut-off values set stringently to minimise false positive predictions. The two resulting datasets were then pooled and clustered, yielding twelve families with homologues in PDB, eight families without close homologues in PDB but with some members annotated as pectolytic enzymes, and one new family with no indication of prior classification as PLL. A small fraction of PLL predictions were deemed probable false positives, and a few others could not be followed up confidently because no homologues could be detected by standard BLAST searches of the public sequence databases. After the nine families without known structures had been augmented through standard BLAST searches of SPTrEMBL, and the automated target-template alignments had been carefully analysed and edited, all plausible members of the twenty-one families were modelled using an automated modelling procedure.
|