Use this URL to cite or link to this record in EThOS: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.421298
Title: Measuring the homogeneity and similarity of language corpora
Author: Cavaglia, Gabriela Maria Chiara
Awarding Body: University of Brighton
Current Institution: University of Brighton
Date of Award: 2005
Abstract:
Corpus-based methods are now dominant in Natural Language Processing (NLP). Creating big corpora is no longer difficult, and the technology to analyze them is becoming faster, more robust and more accurate. However, when an NLP application performs well on one corpus, it is unclear whether this level of performance would be maintained on others. To make progress on this question, we need methods for comparing corpora. This thesis investigates comparison methods based on the notions of corpus homogeneity and similarity.
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID: uk.bl.ethos.421298
DOI: Not available
Keywords: G000 Computing and Mathematical Sciences ; G500 Information Systems