Title:

A critical analysis of the role of statistical significance testing in education research, with special attention to mathematics education

This study analyzes the role of statistical significance testing (SST) in education. Although the basic logic underlying SST (a hypothesis is rejected because the observed data would be very unlikely if the hypothesis were true) appears so obvious that many people are tempted to accept it, it is in fact fallacious. In light of the historical background and conceptual development of SST, discussed in Chapter 2, Fisher's significance testing, Neyman-Pearson hypothesis testing, and their hybrids are clearly distinguished. We argue that the probability of obtaining the observed or more extreme outcomes (the p value) can hardly act as a measure of the strength of evidence against the null hypothesis. After discussing the five major interpretations of probability, we conclude that, unless we accept the subjective theory of probability, talking about the probability of a hypothesis that is not the outcome of a chance process is unintelligible. But the subjective theory itself has many intractable difficulties that can hardly be resolved. If we insist on assigning a probability value to a hypothesis in the same way as we assign one to a chance event, we have to accept that it is the hypothesis with low probability, rather than high probability, that we should aim at when conducting scientific research. More importantly, the inferences behind SST are shown to be fallacious from three different perspectives.
The attempt to defend the p value as a measure of the strength of evidence against the null hypothesis by invoking the likelihood ratio for the observed or more extreme data, instead of the probability of a hypothesis, is also shown to be misleading: it can be demonstrated that using the tail region to represent a result that actually lies on its border overstates the evidence against the null hypothesis. Although Neyman-Pearson hypothesis testing does not involve the concept of the probability of a hypothesis, it has other serious problems that can hardly be resolved. We show that it cannot address researchers' genuine concerns. By explaining why the level of significance must be fixed prior to the analysis of data, and why blurring the distinction between the p value and the significance level leads to undesirable consequences, we conclude that Neyman-Pearson hypothesis testing cannot provide an effective means of rejecting false hypotheses. After a thorough discussion of common misconceptions associated with SST and of the major arguments for and against it, we conclude that SST has insurmountable problems that could misguide the research paradigm, although some other criticisms of SST are not as well justified. We also analyze various proposed alternatives to SST and conclude that confidence intervals (CIs) are no better than SST for the purpose of testing hypotheses, and that it is unreasonable to expect any statistical test to provide researchers with algorithms or rigid rules by conforming to which all problems of hypothesis testing could be solved. Finally, we argue that falsificationism could avoid the disadvantages of SST and of other similar statistical inductive inferences, and we discuss how it could bring education research into a more fruitful situation.
Although we pay special attention to mathematics education, the core of the discussion in the thesis might apply equally to other educational contexts.
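
The point about tail regions overstating the evidence can be illustrated numerically. The sketch below is not taken from the thesis. It assumes a standard normal test statistic, a null hypothesis with mean 0, and a single point alternative with mean `mu1`; the function names and the choice `z = mu1 = 1.96` are hypothetical and serve only as an example. It compares the likelihood ratio computed from the tail region (observed or more extreme data) with the one computed from the observed value alone.

```python
import math


def upper_tail(z):
    """P(Z >= z) for a standard normal variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def normal_pdf(x):
    """Density of the standard normal distribution at x."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def tail_likelihood_ratio(z, mu1):
    """Null vs. alternative, using the tail region 'z or more extreme'."""
    return upper_tail(z) / upper_tail(z - mu1)


def point_likelihood_ratio(z, mu1):
    """Null vs. alternative, using only the value actually observed."""
    return normal_pdf(z) / normal_pdf(z - mu1)


# Illustrative assumption: the result sits exactly on the border of the
# conventional 5% two-sided rejection region, and the alternative is the
# mean under which that result is most likely.
z_observed = 1.96
mu_alternative = 1.96

tail_lr = tail_likelihood_ratio(z_observed, mu_alternative)
point_lr = point_likelihood_ratio(z_observed, mu_alternative)

print(f"tail-region likelihood ratio: {tail_lr:.3f}")
print(f"point likelihood ratio:       {point_lr:.3f}")
```

Under these assumptions the tail-region ratio is about 0.05, while the ratio based on the observed value alone is about 0.147. A smaller ratio means stronger apparent evidence against the null hypothesis, so in this example the tail-region calculation makes the evidence look roughly three times stronger than the observed value warrants.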
