Semiometrics: producing a compositional view of influence
High-impact academic papers are not necessarily the most cited. For example, Einstein's 'Special Relativity' paper from 1905 received (and continues to receive) fewer citations from other papers than his 'Brownian Motion' paper of the same year, even though the former changed the course of an entire scientific discipline to a much greater extent. 'Impact' metrics based on citation count alone are therefore, it is argued, not adequate for determining the scientific influence of papers, authors or small groups of authors. Although valid, they remain controversial when used to determine the influence of larger groups or journals. While the term 'impact' has become closely linked to a journal's citation-based Journal Impact Factor score, this thesis uses the term 'influence' to describe the wider effectiveness of research, combining citation and metadata analysis to allow richer calculations to be performed over large-scale document networks. As a result, more qualitative influence ratings can be determined and a broader outlook on scientific disciplines can be produced. These ratings are best applied using an ontology-based data source, which allows more efficient inference than a traditional RDBMS and easier integration between heterogeneous data sources. These metrics, termed 'Semantic Bibliometrics' or 'Semiometrics', can be applied at a variety of levels of granularity, providing a compositional framework for impact and influence analysis. This thesis describes the process of data preparation, systems architecture, metric value and data integration for such a system, introducing novel approaches at all four stages and thereby creating a working semiometrics system for determining influence at different semantic levels of granularity.