Title: The importance of statistical measure when describing phenotype
Author: Hajne, Joanna
ISNI: 0000 0004 6057 9376
Awarding Body: University of Liverpool
Current Institution: University of Liverpool
Date of Award: 2015
Data collected in life science studies typically include a genotype description of the organism, a phenotype characterisation of the organism, and experiment-specific covariates, including a description of experimental procedures and laboratory (environmental) conditions. Here, phenotype measurements are taken for Neurospora crassa (wild type) growing on agar under standard laboratory conditions. I define a phenotype as a set of traits including apical extension velocity, branching angle, and branching distance. I use these traits to model the biologically complex filamentous fungal network as a simplified 'In Silico Fungus' consisting of a series of straight lines. Phenotype data are often characterised, by appeal to the central limit theorem, by means and standard deviations. Subsequently, P values are used to show statistical validity. Here, I question whether making the normality assumption on the basis of the popularity of this approach is always justified. I therefore test three scenarios, each making different assumptions about the data collected. (1) Firstly, I use the most popular approach: I assume the phenotype data come from a continuous, normal (Gaussian) distribution, and I predict future measurement outcomes using a normal (Gaussian) parametric approximation. (2) Secondly, I use the most intuitive approach: I make no assumptions about the data collected and predict future measurement outcomes by drawing values pseudo-randomly from the actual, raw, discrete dataset. (3) Finally, I use a strategy balanced between the previous two: I construct a customised, continuous, non-parametric distribution based on the data collected, and I predict future measurement outcomes using kernel density estimation. Subsequently, I implement all three strategies, (1), (2), and (3), in the in silico fungus programme to compare the computer simulation outcomes.
More specifically, I compare the surface coverage, expressed as the proportion of the surface occupied by the fungus. The results show that the differences between data regimes (1), (2), and (3) are significant. Therefore, I conclude that correctly assessing the normality of the data is crucial for the correct interpretation and implementation of scientific observations. I suspect that this data classification process determines the successful implementation of biological findings, especially in fields such as medicine and engineering.
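The three sampling strategies (1), (2), and (3) described in the abstract can be illustrated with a minimal sketch. This is not the thesis's own code: the function names, the example branching angles, and the choice of a Gaussian kernel with a normal-reference rule-of-thumb bandwidth are all assumptions made here for illustration, using only the Python standard library.

```python
import random
import statistics


def sample_normal(data, n, rng):
    """Strategy (1): fit a normal distribution to the data and sample from it."""
    mu = statistics.mean(data)
    sigma = statistics.stdev(data)
    return [rng.gauss(mu, sigma) for _ in range(n)]


def sample_empirical(data, n, rng):
    """Strategy (2): draw values pseudo-randomly from the raw, discrete dataset."""
    return [rng.choice(data) for _ in range(n)]


def sample_kde(data, n, rng):
    """Strategy (3): sample from a Gaussian kernel density estimate of the data."""
    # Bandwidth from the normal-reference rule of thumb (an assumption here;
    # the abstract does not say which kernel or bandwidth was used).
    h = 1.06 * statistics.stdev(data) * len(data) ** (-1 / 5)
    # Sampling from a Gaussian KDE: pick an observation, then add kernel noise.
    return [rng.gauss(rng.choice(data), h) for _ in range(n)]


# Hypothetical branching angles in degrees, invented for illustration only.
angles = [38.0, 41.5, 44.0, 47.5, 52.0, 58.5, 63.0, 71.0, 85.0, 62.0]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
draws = {
    "normal": sample_normal(angles, 1000, rng),
    "empirical": sample_empirical(angles, 1000, rng),
    "kde": sample_kde(angles, 1000, rng),
}

for name, values in draws.items():
    print(name, round(statistics.mean(values), 2), round(statistics.stdev(values), 2))
```

In a simulation of the kind the abstract describes, each trait (apical extension velocity, branching angle, branching distance) would presumably be drawn by one of these samplers at every growth step, so that the three regimes differ only in how future measurements are predicted.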
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available
Keywords: Q Science (General) ; QA Mathematics ; QA75 Electronic computers. Computer science ; QA76 Computer software ; QD Chemistry ; QR Microbiology ; RZ Other systems of medicine ; T Technology (General) ; TA Engineering (General). Civil engineering (General)