Natural language generation in the LOLITA system an engineering approach
Natural Language Generation (NLG) is the automatic generation of Natural Language (NL) by computer in order to meet communicative goals. One aim of NL processing (NLP) is to allow more natural communication with a computer and, since communication is a two-way process, a NL system should be able to produce as well as interpret NL text. This research concerns the design and implementation of a NLG module for the LOLITA system. LOLITA (Large scale, Object-based, Linguistic Interactor, Translator and Analyser) is a general purpose base NLP system which performs core NLP tasks and upon which prototype NL applications have been built. As part of this encompassing project, this research shares some of its properties and methodological assumptions: the LOLITA generator has been built following Natural Language Engineering principles uses LOLITA's SemNet representation as input and is implemented in the functional programming language Haskell. As in other generation systems the adopted solution utilises a two component architecture. However, in order to avoid problems which occur at the interface between traditional planning and realisation modules (known as the generation gap) the distribution of tasks between the planner and plan-realiser is different: the plan-realiser, in the absence of detailed planning instructions, must perform some tasks (such as the selection and ordering of content) which are more traditionally performed by a planner. This work largely concerns the development of the plan- realiser and its interface with the planner. Another aspect of the solution is the use of Abstract Transformations which act on the SemNet input before realisation leading to an increased ability for creating paraphrases. The research has lead to a practical working solution which has greatly increased the power of the LOLITA system. The research also investigates how NLG systems can be evaluated and the advantages and disadvantages of using a functional language for the generation task.