Karl Pearson: evolutionary biology and the emergence of a modern theory of statistics (1884-1936)
This thesis examines the development of modern statistical theory and its emergence as a highly specialised mathematical discipline at the end of the nineteenth century. The statistical work of the mathematician and statistician Karl Pearson (1857-1936), who almost single-handedly created the modern theory of statistics, is the focus of the thesis. The impact of the statistical and experimental work of the Darwinian zoologist W.F.R. Weldon (1860-1906) on the emergence and construction of Pearsonian statistical innovation is central to the arguments developed in this thesis. Contributions to the Pearsonian corpus from such statisticians as Francis Ysidro Edgeworth (1845-1926), Francis Galton (1822-1911), and George Udny Yule (1871-1951) are also addressed. The scope of the thesis does not involve a detailed account of every technical contribution that Pearson made to statistics. Instead, it provides a unifying assessment of Pearson's most seminal and innovative contributions to modern statistical theory devised in the Biometric School, at University College London, from 1892 to 1903. An assessment of Pearson's statistical contributions also entails a comprehensive examination of the two separate methodologies he developed in the Drapers' Biometric Laboratory (from 1903 to 1933) and in the Galton Eugenics Laboratory (from 1907 to 1933). This thesis arises, in part, from a desire to reassess the state of the historiography of Pearsonian statistics over the course of the last half century. Some of the earliest work on Pearson came from his former students, who emphasised his achievements as a statistician, usually from the perspective of the state of the discipline in their time. The conventional view has presumed that Pearson's relationship with Galton, and thus to Galton's work on simple correlation, simple regression, inheritance and eugenics, provided the impetus to Pearson's own statistical work. 
This approach, which focuses on a part of Pearson's statistical work, has provided minimal insight into the complexity of the totality of Pearsonian statistics. Another approach, derived from the sociology of knowledge in the 1970s, espoused this conventional view and linked Pearson's statistical work to eugenics by placing his work in a wider context of social and political ideologies. This has usually entailed frequent recourse to Pearson's social and political views vis-a-vis his popular writings on eugenics. This approach, whilst indicating the political and social dimensions of science, has produced a rather mono-causal or uni-dimensional view of history. The crucial question of the relation between his technical contributions and his ideology in the construction of his statistical methods has not yet been adequately considered. This thesis argues that the impetus to Pearson's earliest statistical work was given by his efforts to tackle the problems of asymmetrical biological distributions (arising from Weldon's dimorphic distribution of the female shore crab in the Bay of Naples). Furthermore, it argues that the fundamental developments and construction of Pearsonian statistics arose from the Darwinian biological concepts at the centre of Weldon's statistical and experimental work on marine organisms in Naples and in Plymouth. Charles Darwin's recognition that species comprised different sets of 'statistical' populations (rather than consisting of 'types' or 'essences') led to a reconceptualisation of statistical populations by Pearson and Weldon which, in turn, led to their attempts to find a statistical resolution of the pre-Darwinian Aristotelian essentialistic concept of species. Pearson's statistical developments thus involved a greater consideration of speciation and of Darwin's theory of natural selection than hitherto considered. 
This has, therefore, entailed a reconstruction of the totality of Pearsonian statistics to identify the mathematical and biological developments that underpinned his work and to determine other sources of influence in this development. Pearson's writings are voluminous: as principal author he published more than 540 papers and books, of which 361 are statistical. The other publications include 67 literary and historical writings, 49 eugenics publications, 36 pure mathematics and physics papers and 27 reports on university matters. He also published at least 111 letters, notes and book reviews. His collected papers and letters at University College London consist of 235 boxes of family papers, scientific manuscripts and 14,000 letters. One of the most extensive sets of letters in the collection is that of W.F.R. Weldon and his wife, Florence Joy Weldon, which consists of nearly 1,000 pieces of correspondence. No published work on Pearson to date has properly utilised the correspondence between Pearson and the Weldons. Particular emphasis has been given to this collection, as these letters indicate (in tandem with Pearson's Gresham lectures and the seminal published statistical papers) that Pearson's earliest statistical work started in 1892 (rather than 1895-1896) and that Weldon's influence and work during these years were decisive in the development and advancement of Pearsonian statistics. The approach adopted in this thesis is essentially that of an intellectual biography which is thematic and broadly chronological. This approach has been adopted to make greater use of primary sources in an attempt to provide a more historically sensitive interpretation of Pearson's work than has been offered previously. 
It has thus been possible to examine these three (as yet unexamined) key Pearsonian developments: (1) his earliest statistical work (from 1892 to 1895), (2) his joint biometrical projects with Weldon (from 1898 to 1906) and a shift in the focus of research in the Drapers' Biometric Laboratory following Weldon's death in 1906, and (3) the later work in the twentieth century when he established the two laboratories which were underpinned by two separate methodologies. The arguments, which follow a chronological progression, have been built around Darwin's ideas of biological variation, 'statistical' populations, his theory of natural selection and Galton's law of ancestral inheritance. The first two chapters provide background material to the arguments developed in the thesis. Weldon's use of correlation (for the identification of species) in 1889 is examined in Chapter III. It is argued that Pearson's analysis of Weldon's dimorphic distribution led to their work on speciation, which in turn led to Pearson's earliest innovative statistical work. Weldon's most productive research with Pearson, discussed in Chapter IV, came to fruition when he showed empirical evidence of natural selection by detecting disturbances (or deviations) in the distribution from normality as a consequence of differential mortality rates. This research enabled Pearson to further develop his theory of frequency distributions. The central part of the thesis broadens out to examine further issues that have not been adequately examined. Galton's statistical approach to heredity is addressed in Chapter V, and it is shown that Galton adumbrated Pearson's work on multiple correlation and multiple regression with his law of ancestral heredity. 
This work, in conjunction with Weldon's work on natural selection, led to Pearson's introduction of the use of determinantal matrix algebra into statistical theory in 1896: this (much neglected) development was pivotal in the professionalisation of the emerging discipline of mathematical statistics. Pearson's work on goodness of fit testing provided the machinery for reconstructing his most comprehensive statistical work which spanned four decades and encompassed his entire working life as a statistician. Thus, a greater part of Pearsonian statistics has been examined than in previous studies.