The development of an expert systems approach to the statistical analysis of experimental data
This thesis is concerned with the application of expert systems techniques in the field of statistics. An expert statistician in industry has a twofold role: undertaking the design of complex experiments and the analysis of their data, and supervising and helping research workers who analyse data from simpler designs. There is, therefore, a potential role for a statistical expert system that research workers could use to carry out valid analyses. The expert statistician would be freed from the more straightforward analyses and would need only to 'tune' the system initially to their own application area and then deal with referrals from it. The design and development of such a prototype expert system, THESEUS, is the basis of this work. The area of application chosen for the prototype system is completely randomised designs with one trial factor. It was important at the outset to limit the area of study so that knowledge acquisition for the system would be a manageable task. However, once the difficulties of developing an expert system have been tackled, much of the expertise used in analysing this simple type of study could readily be extended to more complex designs.
The knowledge acquisition phase, the most time-consuming part of developing any expert system, concentrated on developing a rational prototype rule base by reviewing the available literature, interviewing practising statisticians and running workshops in which the analysis of particular data sets was discussed. The prototype software is a production rule system written in Turbo Pascal on an IBM-AT. Pascal was chosen because statistical routines need to be accessed during the consultation process. The prototype uses a combination of forward and backward chaining to process the rules. Information required by the system can come from the user, the data or the rules.
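THESEUS itself was written in Turbo Pascal and its rule base is not reproduced here. The following Python sketch only illustrates, under stated assumptions, how a production rule system can combine forward and backward chaining while drawing facts from the user, the data and the rules. The rule names (`R1` to `R4`), the attributes (`design`, `trial_factors`, `replicated`, `normal_residuals`, `model`, `analysis`) and the analyses they recommend are hypothetical, chosen to suit a one-factor completely randomised design. Here the user is simply asked whether the residuals look normal; in a real consultation such a fact might instead come from a statistical routine.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Facts = Dict[str, object]


@dataclass(frozen=True)
class Rule:
    name: str
    conditions: Dict[str, object]  # attribute -> value that must hold
    attribute: str                 # attribute the rule concludes
    value: object                  # value assigned when the rule fires


# Hypothetical rule base for a one-factor completely randomised design.
RULES: List[Rule] = [
    Rule("R1", {"design": "completely randomised", "trial_factors": 1}, "model", "one-way"),
    Rule("R2", {"model": "one-way", "replicated": False}, "analysis", "refer to statistician"),
    Rule("R3", {"model": "one-way", "replicated": True, "normal_residuals": True},
         "analysis", "one-way ANOVA"),
    Rule("R4", {"model": "one-way", "replicated": True, "normal_residuals": False},
         "analysis", "Kruskal-Wallis test"),
]


def facts_from_data(groups: Dict[str, List[float]]) -> Facts:
    """Facts derived directly from the data (one trial factor, one list per level)."""
    return {"trial_factors": 1,
            "replicated": all(len(obs) >= 2 for obs in groups.values())}


def forward_chain(facts: Facts, rules: List[Rule]) -> Facts:
    """Data-driven: fire every rule whose conditions hold until nothing new is added."""
    facts = dict(facts)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            if rule.attribute not in facts and all(
                facts.get(attr) == val for attr, val in rule.conditions.items()
            ):
                facts[rule.attribute] = rule.value
                changed = True
    return facts


def backward_chain(goal: str, facts: Facts, rules: List[Rule],
                   ask: Callable[[str], object]) -> Optional[object]:
    """Goal-driven: use a known fact, else try rules concluding the goal, else ask the user."""
    if goal in facts:
        return facts[goal]
    concluding = [r for r in rules if r.attribute == goal]
    for rule in concluding:
        if all(backward_chain(attr, facts, rules, ask) == val
               for attr, val in rule.conditions.items()):
            facts[goal] = rule.value
            return rule.value
    if not concluding:  # no rule can derive this attribute, so the user supplies it
        facts[goal] = ask(goal)
        return facts[goal]
    return None  # rules exist but none could be satisfied


def consult(groups: Dict[str, List[float]], design: str,
            ask: Callable[[str], object]) -> Optional[object]:
    """Forward chain from the initial facts, then backward chain for the recommended analysis."""
    facts = facts_from_data(groups)
    facts["design"] = design  # supplied by the user at the start of the consultation
    facts = forward_chain(facts, RULES)
    return backward_chain("analysis", facts, RULES, ask)


if __name__ == "__main__":
    data = {"A": [5.1, 4.8], "B": [6.0, 6.3], "C": [5.5, 5.9]}
    print(consult(data, "completely randomised", lambda attr: True))  # prints "one-way ANOVA"
```

Forward chaining settles whatever follows immediately from the initial facts (here the one-way model, or a referral when groups are unreplicated), and backward chaining then asks the user only for the facts still needed to reach the goal.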
The overall system design also includes facilities for entering and editing data and for altering and adding knowledge, as well as a report generator. These facilities are not implemented as part of this thesis. A small number of sites were selected for industrial trials in order to validate the system and to evaluate the results of local experts 'tuning' the rule base to their own particular application area.