A statistical model of internet traffic
We present a method for extracting a time series, the Number of Active Requests (NAR), from web cache logs; the series serves as a transport-level measurement of internet traffic. It also reflects the performance, or Quality of Service, of a web cache. Using time series modelling, we interpret the properties of this kind of internet traffic and its effect on the performance perceived by the cache user. Our preliminary analysis suggests that the NAR series is consistent with a long-memory, self-similar process but is not heavy-tailed. Following more in-depth analysis, we propose a three-stage modelling process for the time series: (i) a power transformation to normalise the data, (ii) a polynomial fit to approximate the general trend and (iii) a model of the residuals from the polynomial fit. We analyse the polynomial and show that the residuals may be modelled as a FARIMA(p, d, q) process. Finally, we use Canonical Variate Analysis to determine the most significant defining properties of our measurements and to categorise the differences in traffic properties between the caches studied. The short-memory parameters of the FARIMA fit discriminate most strongly between the caches. We compare the differences revealed between the caches and draw conclusions from them. Several programs were written in the Perl and S programming languages for this analysis, including totalqd.pl for NAR calculation, fullanalysis for general statistical analysis of the data and armamodel for FARIMA modelling.
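The idea of counting active requests can be illustrated with a minimal sketch. It is not the totalqd.pl program. It assumes, purely for illustration, that each log record supplies a completion time and an elapsed time in the same units, and that a request counts as active from its start up to, but not including, its completion. The function name nar_series and the record layout are hypothetical.

```python
from bisect import bisect_right


def nar_series(records, sample_times):
    """Count the requests active at each sample time.

    records      -- iterable of (completion_time, elapsed) pairs (assumed layout)
    sample_times -- times at which to evaluate the count
    A request is treated as active on [completion_time - elapsed, completion_time).
    """
    records = list(records)
    # Sorted start and end times allow each count to be found by binary search.
    starts = sorted(done - elapsed for done, elapsed in records)
    ends = sorted(done for done, _ in records)
    # Active at t = requests started by t minus requests completed by t.
    return [bisect_right(starts, t) - bisect_right(ends, t) for t in sample_times]


if __name__ == "__main__":
    # Three made-up requests, active on [6, 10), [11, 12) and [9, 15).
    log = [(10.0, 4.0), (12.0, 1.0), (15.0, 6.0)]
    print(nar_series(log, [5, 6, 9, 10, 11, 12, 15]))  # [0, 1, 2, 1, 2, 1, 0]
```

Sampling on a regular grid of times would yield an evenly spaced series of the kind that the transformation, trend fit and residual modelling stages operate on; the choice of sampling interval is left open here.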