Estimation of reliability of essay tests in public examinations
Essay tests are an indispensable part of public examinations, yet no general method appears to exist for estimating their reliability in the routine operation of an examination. The aim of the present research is to develop a general model for studying the reliability of essay tests as affected by between-marker inconsistencies, within-marker inconsistencies and question choice. The model makes use of multilevel analysis, since data from essay tests naturally fall into a three-level hierarchy: questions within candidates and candidates within markers. Where only one score is available per candidate, the three-level model degenerates into a two-level model. Analyses using the two-level and three-level models illustrate how between-marker inconsistencies, the factors affecting them, systematic within-marker inconsistencies over the marking period, and inconsistencies due to question choice can be analysed. By performing a common factor analysis on the covariance matrix of question scores and taking the factor score of the most dominant factor as the true score, the reliability due to question choice and between-marker variability can be estimated. The approach is illustrated with analyses of the question scores of the 1985 Hong Kong Advanced Level Physics Paper IIA; the data set comprises 22,544 question scores from 7,844 candidates marked by 18 markers. Parameters are estimated by iterative generalised least squares (IGLS). All the analyses reported in this study converged within a reasonably short time using the software ML3E running on a personal computer, showing that the model is practicable in the routine operation of public examinations.
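The core idea of the two-level model (candidates within markers) can be illustrated with a minimal variance-components sketch. The code below is not the IGLS/ML3E analysis used in the study; it is a simplified method-of-moments (one-way random-effects ANOVA) decomposition on synthetic, balanced data, with marker counts loosely echoing the 18 markers of the example. All numbers (means, standard deviations, candidates per marker) are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed synthetic design: 18 markers, 100 candidates each (balanced).
n_markers, n_per_marker = 18, 100
sigma_marker, sigma_within = 2.0, 5.0  # hypothetical SDs, not estimates from the 1985 data

# Simulate: score = grand mean + marker effect + candidate-level residual.
marker_effects = rng.normal(0.0, sigma_marker, n_markers)
scores = (50.0 + marker_effects[:, None]
          + rng.normal(0.0, sigma_within, (n_markers, n_per_marker)))

# One-way random-effects ANOVA estimators (method of moments, balanced case).
marker_means = scores.mean(axis=1)
msb = n_per_marker * marker_means.var(ddof=1)      # between-marker mean square
msw = scores.var(axis=1, ddof=1).mean()            # pooled within-marker mean square
var_marker = max((msb - msw) / n_per_marker, 0.0)  # between-marker variance component
var_within = msw                                   # candidate-level variance (ability + error)

# Share of total score variance attributable to marker inconsistency.
share_marker = var_marker / (var_marker + var_within)
print(f"between-marker variance: {var_marker:.2f}")
print(f"within-marker variance:  {var_within:.2f}")
print(f"share due to markers:    {share_marker:.3f}")
```

In a full multilevel treatment the same variance components are estimated jointly by IGLS, which also handles unbalanced allocations of candidates to markers and extends naturally to the three-level case with questions nested within candidates.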