Dating Victorians: an experimental approach to stylochronometry
The writing style of a number of authors writing in English was empirically investigated for the purpose of detecting stylistic patterns in relation to advancing age. The aim was to identify which type of stylistic marker (lexical, syntactic, phonemic, entropic, character-based, or content-based) best discriminates between early, middle, and late works of the selected authors, and which classification or prediction algorithm is best suited to this task. Two pilot studies were initially conducted. The first concentrated on Christina Georgina Rossetti and Edgar Allan Poe, whose personal letters and poetry were selected as the genres of study, along with a limited selection of variables. Results suggested that authors and genres vary inconsistently. The second pilot study was based on Shakespeare's plays and used a wider selection of variables to assess their discriminating power in relation to a past study. The selected variables showed satisfactory predictive power and were therefore judged suitable for the task. Subsequently, four experiments were conducted using the variables tested in the second pilot study and personal correspondence and poetry from two additional authors, Edna St Vincent Millay and William Butler Yeats. Stepwise multiple linear regression and regression trees were used for the first two experiments, which were prediction tasks, and ordinal logistic regression and artificial neural networks for the two classification experiments. The first experiment revealed that accuracy of prediction and the total number of variables in the final models varied inconsistently with authorship and genre. The second experiment revealed inconsistencies for the same factors in terms of accuracy only. The third experiment showed the total number of variables in the model and the error in the final model to be affected to varying degrees by authorship, genre, variable type, and the order in which the variables had been calculated.
In the last experiment, all measurements were affected by these four factors. An examination of whether differences in method within each task play an important part revealed significant influences of method, authorship, and genre in the prediction problems, whereas all factors, including method and various interactions, dominated in the classification problems. Given the data and methods used, as well as the results obtained, generalizable conclusions for the wider author population have been avoided.