Use this URL to cite or link to this record in EThOS:
Title: Context classification for improved semantic understanding of mathematical formulae
Author: Almomen, Randa
ISNI:       0000 0004 7432 4665
Awarding Body: University of Birmingham
Current Institution: University of Birmingham
Date of Award: 2018
Availability of Full Text:
Access from EThOS:
Access from Institution:
The correct semantic interpretation of mathematical formulae in electronic mathematical documents is an important prerequisite for advanced tasks such as search, accessibility or computational processing. Especially in advanced maths, the meaning of characters and symbols is highly domain dependent, and only limited information can be gained from considering individual formulae and their structures. Although many approaches have been proposed for semantic interpretation of mathematical formulae, most of them rely on the limited semantics from maths representation languages whereas very few use maths context as a source of information. This thesis presents a novel approach for principal extraction of semantic information of mathematical formulae from their context in documents. We utilised different supervised machine learning (SML) techniques (i.e. Linear-Chain Conditional Random Fields (CRF), Maximum Entropy (MaxEnt) and Maximum Entropy Markov Models (MEMM) combined with Rprop- and Rprop+ optimisation algorithms) to detect definitions of simple and compound mathematical expressions, thereby deriving their meaning. The learning algorithms demand annotated corpus which its development considered as one of this thesis contributions. The corpus has been developed utilising a novel approach to extract desired maths expressions and sub-formulae and manually annotated by two independent annotators employing a standard measure for inter-annotation agreement. The thesis further developed a new approach to feature representation depending on the definitions' templates that extracted from maths documents to defeat the restraint of conventional window-based features. All contributions were evaluated by various techniques including employing the common metrics recall, precision, and harmonic F-measure.
Supervisor: Not available Sponsor: Not available
Qualification Name: Thesis (Ph.D.) Qualification Level: Doctoral
EThOS ID:  DOI: Not available
Keywords: QA75 Electronic computers. Computer science