Title:
|
Ontology-based information extraction from pathology reports for cancer registration
|
This research project develops an ontology-based technique to exploit the information contained in
free-text surgical pathology reports for breast cancer patients. A novel ontology for the domain is
designed and several tools for information extraction and reasoning are developed, supported by
machine learning algorithms that help identify the relevant information within the
documents. The research shows that information extraction from surgical pathology reports can be
significantly enhanced by machine learning pre-processing, which selects the appropriate
extraction technique for the report layout and filters out irrelevant portions of text. Also, such a
system can be coupled with clearly defined, formal semantic models: one of the domain reality, which
supports the information extraction tasks, and one of the coding systems, which enables clinical codes
governed by complex rules to be assigned automatically. As a whole, it can relieve cancer registry
staff, researchers and clinicians of the burden of reading pathology reports, calculating cancer staging codes and
entering information into a database. The main benefits of this research are cost savings
and improved completeness and accuracy of both routine cancer registrations and study-specific
cancer data collection for cancer registries. The outcomes of this research will also be
appreciated by the management of pathology laboratories. Increasing their awareness of how the
reports are used in automated contexts will hopefully prompt changes in the writing style
of the documents or, even better, encourage the structured collection of at least the
essential data items used for cancer epidemiology.
|