Title: Automated collation and digital editions : from theory to practice
Author: Nury, Elisa Laure
ISNI: 0000 0004 7660 1021
Awarding Body: King's College London
Current Institution: King's College London (University of London)
Date of Award: 2018
The purpose of the dissertation is to investigate, from a theoretical and methodological perspective, the different tools that allow automated collation, and to study the application of such tools to the creation of a digital critical edition in the context of Classical literature. By doing so, the dissertation examines many foundational but often neglected components of the philological method, such as the definition and wider implications of transcription, reading, and variant. The goal is to provide a reflection on automated collation and the theoretical as well as practical challenges it poses: what is automated collation? How is it performed, and what are the main differences from manual collation? What are the benefits of automated collation? Why has it not been widely adopted yet, despite the fact that it was developed to help scholars? How should the results of collation programmes be processed? As a case study, a Classical Latin text has been used to test automated collation and to compare the various existing tools. The method I follow in this dissertation is to apply automated collation to a selected text, the Declamations of Calpurnius Flaccus. For this purpose, the manuscript tradition, as well as the editio princeps, has been entirely transcribed. Afterwards, the transcriptions have been collated with different collation programmes. The results of the collation with different programmes have been examined and compared, as well as the possibilities offered to scholarly editors for visualising and further processing those results.
The content of the thesis is divided into two distinct parts, from theory to practice. The first part discusses the relevance of collation in the broader context of critical editing, introduces automated collation and its various issues, and considers the implications of automated collation for key concepts such as 'witnesses', 'readings' and 'variants'. The second part describes the practical work on the text of Calpurnius Flaccus with automated collation programmes and the visualisation tool that was created to examine the collation results. Each part comprises a short introduction and a conclusion which summarise its content and its outcome.
In the process of editing a text, the collation of manuscripts and previous scholarly editions is a necessary and fundamental step. The work leading to the production of a critical edition is divided into two phases: the recension of witnesses, followed by the constitution of the text (Chiesa 2002). The editor, therefore, starts the recension by gathering all the witnesses, whether manuscripts or editions, that bear a version of the edited text. Collation is the next step: comparing those witnesses to find the differences (or variants) between the versions. Finally, the editor analyses the variants in order to determine the genealogical relationships of the witnesses, if possible, and presents those relationships in the form of a stemma codicum. The stemma's purpose is to help the editor produce a text 'as close as possible to the original' (Maas 1958, 1), and decide which variants are accepted as authorial and which are rejected as errors introduced into the tradition by the copyists of the manuscripts. After the recension, the editor prepares a critical text, selecting variant readings and making emendations when necessary.
Collation is important because it is one of the first stages in the editing process, and the data gathered during collation forms the basis upon which the editor will later make critical decisions (Whittaker 1991). Collation is performed because witnesses of a text always contain variant readings, and the editor needs to be in possession of all the alternatives in order to establish the text. Complete collations are not usually published, which represents a regrettable loss of information, especially given the amount of time and effort invested in collation by the editors (West 1973, 63). While collation is an essential part of textual criticism, it is also a long, tedious and error-prone activity that needs to be checked more than once. For this reason, a new method was created: automated collation, which takes advantage of computers in order to compare texts and find variant readings. Scholars have been developing collation tools since the 1960s, but with limited success at first: what was considered a fairly mechanical process turned out to be more sophisticated than expected (Hockey 2000, 125). Since the pioneering work of Dearing (1962) and Froger (1968), automated collation has been studied for decades and has been constantly improved. In the past 50 years, close to thirty tools have been devised in order to obtain a collation with the support of increasingly complex algorithms. How does automated collation work? To simplify, the tools for automated collation take, as an input, a transcription of each witness that needs to be compared. The transcriptions are then aligned with each other through an alignment algorithm. The task of collation seems to benefit greatly from the application of computing methods. The advantages offered by computing methods are various: consistency in the comparison, the possibility of reusing the material in order to add new manuscripts to the collation, and a common format for sharing collation data with other scholars.
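The two steps described above (transcriptions as input, then alignment) can be illustrated with a minimal sketch. It is not the method of any collation programme examined in the thesis: it assumes only two witnesses, a naive whitespace tokenisation, and Python's standard `difflib.SequenceMatcher` as a stand-in alignment algorithm. The function names `tokenize`, `collate` and `variants` are hypothetical, and the sample readings are invented for illustration (the opening of the Aeneid with two made-up variants), not taken from the tradition of Calpurnius Flaccus.

```python
from difflib import SequenceMatcher


def tokenize(transcription):
    """Split a transcription into lowercase word tokens (naive normalisation)."""
    return transcription.lower().split()


def collate(base_tokens, other_tokens):
    """Align two token lists and return (tag, base_reading, other_reading) rows.

    The tag is one of difflib's opcodes: 'equal', 'replace', 'delete', 'insert'.
    An empty string marks a place where one witness has no text.
    """
    matcher = SequenceMatcher(a=base_tokens, b=other_tokens, autojunk=False)
    rows = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        rows.append((tag, " ".join(base_tokens[i1:i2]), " ".join(other_tokens[j1:j2])))
    return rows


def variants(rows):
    """Keep only the rows where the two witnesses disagree."""
    return [row for row in rows if row[0] != "equal"]


# Invented sample witnesses (assumption: illustrative data only).
witness_a = tokenize("Arma virumque cano Troiae qui primus ab oris")
witness_b = tokenize("Arma virumque cano Troie qui primus oris")

table = collate(witness_a, witness_b)
for tag, reading_a, reading_b in variants(table):
    print(f"{tag}: A={reading_a!r} B={reading_b!r}")
```

In this sketch the editor still decides what counts as a significant variant: the programme only reports where the transcriptions differ. Collating more than two witnesses at once, or normalising spelling before comparison, would require a more elaborate approach than this pairwise alignment.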
The results of automated collation can also be formatted for further processing, such as building a stemma with a different program or creating a digital scholarly edition. However, automated collation is not completely accepted by the community of scholars, nor is its method fully understood. In 1973, in the early days of automated collation, scepticism was understandable because of the many restrictions of early tools, such as the small number of witnesses that could be collated, or the limitation to comparing lines of poetry. Faced with those limitations, West (1973, 71) stated that 'the time has not yet come when manuscripts can be collated automatically'. Forty years later, in spite of huge technical improvements, opinions have not really changed, to the point that Reeve declared that he was not convinced by computer methods, especially for large manuscript traditions (Reeve 2011, 393). Yet the computer is supposed to be better than a human being at handling large amounts of data, and to do so with more consistency. In fact, when dealing with large traditions, it is not always possible to sort out the relationships between manuscripts and draw a stemma by hand, yet editors are not keen on turning to electronic methods. Some of the obstacles to the wide adoption of automated collation seem connected to a general misunderstanding. There is a fear that the computer will eventually replace the editor, and that editors will lose their right to apply their individual judgement to the texts. Greetham (2007, 23) regrets, for instance, that the role of individual, subjective evaluation 'has not always been recognised, especially by those wishing to emphasise the "scientific" aspects of the field'.
The importance of individual judgement, and the role of the editor compared to the role of the computer, can be closely related to the black box issue (Sculley and Pasanek 2008): if scholars do not understand what a piece of software does, how can they trust that the programme did not deprive them of making certain choices or applying their own judgement? In the course of my PhD I had several exchanges with colleagues in the field of Classics which highlighted underlying misunderstandings on either side. For instance, I was once told in an email that 'automated collation is impossible, because computers cannot read manuscripts'. This statement does not recognise the fact that computers collate from transcriptions made by scholars. From this point of view, transcription and collation are closely connected activities. The statement does, however, illustrate perfectly the main tension between traditional, manual collation and automated collation: the difference of methodology. The confusion arises here because the full transcription of manuscripts is not a part of the traditional heuristics in textual criticism. A general lack of tools and guidelines for the production of digital scholarly editions may also explain why automated collation is not the solution of choice.
Supervisor: Pierazzo, Elena ; Moul, Victoria Alice
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available