Title:

Modelling and evaluation of paired-comparison experiments

Paired comparison is a popular method for deriving scale values: numbers that represent
observers' psychophysical responses to sets of physical stimuli. The method presents each
observer with pairs of stimuli and asks which member of each pair is greater in terms of the
psychophysical property being investigated (for example, which of the pair is lighter). However,
it is time-consuming, especially when the number of stimuli n is large, since there are
n(n-1)/2 possible paired comparisons and all of them must be judged.
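The quadratic growth in workload is easy to see numerically. A minimal sketch, assuming each of k observers judges every pair exactly once (the function names are illustrative only):

```python
from itertools import combinations


def n_pairs(n):
    """Number of distinct unordered pairs among n stimuli: n(n - 1)/2."""
    return n * (n - 1) // 2


def total_judgements(n, k):
    """Judgements needed when each of k observers judges every pair once."""
    return k * n_pairs(n)


# The pairs themselves are the index tuples (i, j) with i < j.
pairs = list(combinations(range(10), 2))

for n in (10, 20, 50):
    print(n, n_pairs(n))  # prints 10 45, then 20 190, then 50 1225

print(len(pairs), total_judgements(10, 20))  # prints 45 900
```

Doubling n roughly quadruples the number of pairs, which is what motivates judging only a subset of them.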
It is possible to carry out a so-called incomplete paired-comparison experiment, in which only
a proportion p (0 < p < 1) of the pairs is judged. This thesis primarily addresses questions
about the design of incomplete paired-comparison experiments. For instance, what are the
smallest value of p and the fewest observers that still allow reliable estimates of the scale values?
reliable estimates of the scale values? MonteCarlo computational simulations were carried
out with an ideal observer model assigned with bias. Data were analyzed based on
Morrissey's leastsquares solution. This evaluation indicated that satisfactory results can be
obtained with as few as 30% (in the case where each observer compared the same pairs) or
10% (in the case where each observer compared different pairs) of paired comparisons.
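The simulate-then-analyse loop described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the thesis's code. The observer model is hypothetical: each simulated observer carries a fixed random offset on every stimulus (standing in for the unspecified "bias"), and each presentation adds Gaussian noise. The analysis is an unweighted least-squares Thurstone Case V fit on probit-transformed proportions, z_ij ≈ s_i - s_j with the scale values constrained to sum to zero, in the spirit of Morrissey's solution. Weighting and the handling of proportions of 0 or 1 may differ from Morrissey's. The names `choose_pairs`, `simulate` and `fit_scale` are hypothetical.

```python
import itertools
import random
from statistics import NormalDist

PROBIT = NormalDist().inv_cdf  # inverse standard-normal CDF


def choose_pairs(n, p, rng):
    """Randomly pick a proportion p of the n(n-1)/2 unordered stimulus pairs."""
    pairs = list(itertools.combinations(range(n), 2))
    return rng.sample(pairs, max(1, round(p * len(pairs))))


def simulate(true_s, k, p, same_pairs, rng, noise_sd=1.0, bias_sd=0.2):
    """Return {(i, j): (times_i_chosen, trials)} pooled over k simulated observers.

    Hypothetical observer model: each observer has a fixed N(0, bias_sd) offset
    per stimulus (a stand-in for 'bias'), and every presentation adds
    independent N(0, noise_sd) noise to both stimuli of the pair.
    """
    n = len(true_s)
    shared = choose_pairs(n, p, rng)  # used only when same_pairs is True
    counts = {}
    for _ in range(k):
        own_s = [s + rng.gauss(0.0, bias_sd) for s in true_s]
        pairs = shared if same_pairs else choose_pairs(n, p, rng)
        for i, j in pairs:
            i_wins = own_s[i] + rng.gauss(0.0, noise_sd) > own_s[j] + rng.gauss(0.0, noise_sd)
            wins, trials = counts.get((i, j), (0, 0))
            counts[(i, j)] = (wins + int(i_wins), trials + 1)
    return counts


def fit_scale(counts, n):
    """Unweighted least-squares Case V fit: z_ij ~ s_i - s_j with sum(s) == 0.

    Solves (L + J) s = b, where L is the Laplacian of the comparison graph,
    J is the all-ones matrix and b_i sums the probit scores z_ij involving i.
    """
    A = [[1.0] * n for _ in range(n)]  # start from J
    b = [0.0] * n
    for (i, j), (wins, trials) in counts.items():
        half = 0.5 / trials  # clamp proportions of 0 or 1 so the probit stays finite
        z = PROBIT(min(max(wins / trials, half), 1.0 - half))
        A[i][i] += 1.0
        A[j][j] += 1.0
        A[i][j] -= 1.0
        A[j][i] -= 1.0
        b[i] += z
        b[j] -= z
    # Gaussian elimination with partial pivoting.
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(A[r][c]))
        if abs(A[piv][c]) < 1e-12:
            raise ValueError("comparison graph is disconnected; scale not identifiable")
        A[c], A[piv] = A[piv], A[c]
        b[c], b[piv] = b[piv], b[c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for cc in range(c, n):
                A[r][cc] -= f * A[c][cc]
            b[r] -= f * b[c]
    s = [0.0] * n
    for c in reversed(range(n)):
        s[c] = (b[c] - sum(A[c][cc] * s[cc] for cc in range(c + 1, n))) / A[c][c]
    return s


rng = random.Random(0)
true_s = [i / 3 for i in range(10)]  # hypothetical true scale values, n = 10
est = fit_scale(simulate(true_s, k=30, p=0.3, same_pairs=False, rng=rng), len(true_s))
```

With `same_pairs=True` every observer judges one shared random subset. With `same_pairs=False` each observer draws a fresh subset, so the pooled data cover more distinct pairs. That may be one reason the different-pairs design tolerates a smaller p. A random subset can leave the comparison graph disconnected, most likely with small p and shared pairs. In that case `fit_scale` raises `ValueError` rather than returning an arbitrary scale.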
However, the proportion of paired comparisons actually required depends upon k (the number of
observers) and n (the number of stimuli). A table was produced giving the value of p required,
for various values of n and k, to achieve a given level of performance. This level was somewhat
arbitrarily defined as r² = 0.95, where r is the expected Pearson product-moment correlation
coefficient between the estimated scale values and their true values.
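The performance criterion can be sketched as follows. This reading treats r as the mean Pearson correlation over Monte Carlo replications and compares its square with 0.95. That is our interpretation of "expected" r, not a confirmed detail of the thesis. The helper `smallest_p` is hypothetical and shows how one table cell (fixed n and k) could be filled from per-p simulation results.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def meets_criterion(r_values, threshold=0.95):
    """True if the squared mean correlation over replications reaches the threshold."""
    mean_r = sum(r_values) / len(r_values)
    return mean_r ** 2 >= threshold


def smallest_p(r_values_by_p, threshold=0.95):
    """Smallest p whose replications meet the criterion (one table cell), else None."""
    ok = [p for p, rs in r_values_by_p.items() if meets_criterion(rs, threshold)]
    return min(ok) if ok else None


print(pearson_r([1, 2, 3], [2, 4, 6]))  # 1.0
print(pearson_r([1, 2, 3], [1, 3, 2]))  # 0.5
```

Because r is invariant to linear rescaling, the estimated scale values do not need to be in the same units as the true values for this criterion to apply.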
A psychophysical experiment was conducted employing both the paired-comparison
method and the categorical judgement method to estimate scale values. Results from the
paired-comparison experiment were consistent with those predicted by the Monte Carlo
simulations. The paired-comparison data were analysed for various values of p, and their
performance was compared with results from the categorical judgement method for n = 10 stimuli.
With p = 1 (where all of the pairs are judged), the paired-comparison scale values were more
accurate than those from the categorical judgement experiment; however, as p was reduced, the
accuracy of the paired-comparison scale values also decreased. The two techniques gave broadly
similar performance at p = 0.2 (where each observer compared a different set of pairs) or
p = 0.4 (where all observers compared the same set of pairs).
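Analysing a complete data set "for various values of p" requires some way of thinning it. The abstract does not say how the reduced subsets were drawn. One plausible approach is to keep a random proportion p of the pairs, either as one subset shared by all observers or as a fresh subset per observer. The sketch below assumes a hypothetical storage format: one dict per observer mapping each pair (i, j) to 1 if stimulus i was judged greater, else 0. The name `subsample` is illustrative only.

```python
import itertools
import random


def subsample(judgements, n, p, same_pairs, rng):
    """Restrict complete per-observer judgements to a proportion p of the pairs.

    judgements: one dict per observer, {(i, j): 1 if i was judged greater else 0},
    covering every pair i < j (a hypothetical storage format).
    """
    all_pairs = list(itertools.combinations(range(n), 2))
    m = max(1, round(p * len(all_pairs)))
    shared = set(rng.sample(all_pairs, m))  # used only when same_pairs is True
    reduced = []
    for obs in judgements:
        keep = shared if same_pairs else set(rng.sample(all_pairs, m))
        reduced.append({pair: resp for pair, resp in obs.items() if pair in keep})
    return reduced


# Usage: emulate the p = 0.2 (different pairs) and p = 0.4 (same pairs) conditions.
rng = random.Random(0)
n, k = 10, 5
complete = [{pair: rng.randint(0, 1) for pair in itertools.combinations(range(n), 2)}
            for _ in range(k)]  # placeholder responses, not real data
different = subsample(complete, n, 0.2, same_pairs=False, rng=rng)
same = subsample(complete, n, 0.4, same_pairs=True, rng=rng)
```

Repeating the draw with several seeds and averaging the resulting accuracy would reduce the dependence on any single random subset.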
