Use this URL to cite or link to this record in EThOS:
Title: Automating the gathering of relevant information from biomedical text
Author: Canevet, Catherine
ISNI:       0000 0004 2729 1231
Awarding Body: University of Edinburgh
Current Institution: University of Edinburgh
Date of Award: 2009
Availability of Full Text:
Access from EThOS:
Full text unavailable from EThOS. Please try the link below.
Access from Institution:
More and more, database curators rely on literature-mining techniques to help them gather and make use of the knowledge encoded in text documents. This thesis investigates how an assisted annotation process can help and explores the hypothesis that it is only with respect to full-text publications that a system can tell relevant and irrelevant facts apart by studying their frequency. A semi-automatic annotation process was developed for a particular database - the Nuclear Protein Database (NPD), based on a set of full-text articles newly annotated with regards to subnuclear protein localisation, along with eight lexicons. The annotation process is carried out online, retrieving relevant documents (abstracts and full-text papers) and highlighting sentences of interest in them. The process also offers a summary Table of the facts found clustered by type of information. Each method involved in each step of the tool is evaluated using cross-validation results on the training data as well as test set results. The performance of the final tool, called the “NPD Curator System Interface”, is estimated empirically in an experiment where the NPD curator updates the database with pieces of information found relevant in 31 publications using the interface. A final experiment complements our main methodology by showing its extensibility to retrieving information on protein function rather than localisation. I argue that the general methods, the results they produced and the discussions they engendered are useful for any subsequent attempt to generate semi-automatic database annotation processes. The annotated corpora, gazetteers, methods and tool are fully available on request of the author (
Supervisor: Webber, Bonnie. ; Bickmore, Wendy. Sponsor: Medical Research Council (MRC)
Qualification Name: Thesis (Ph.D.) Qualification Level: Doctoral
EThOS ID:  DOI: Not available
Keywords: literature-mining techniques ; semi-automatic database annotation processes