Title: A machine learning framework for optimising file distribution across multiple cloud storage services
Author: Algarni, Abdullah Fayez H.
ISNI:       0000 0004 6421 9773
Awarding Body: University of York
Current Institution: University of York
Date of Award: 2017
Storing data with a single cloud storage service exposes the data owner to several potential problems, including issues of service continuity, availability, performance, and security, as well as the risk of vendor lock-in. A promising solution is to distribute the data across multiple cloud storage services, in much the same way that data are distributed across multiple physical disk drives to achieve fault tolerance and improve performance. However, cloud providers differ in their pricing schemes and service performance, which makes optimising cost and performance across many cloud storage services at once a challenge. This research proposes a framework for automatically tuning data distribution policies across multiple cloud storage services from the client side, based on file access patterns. The aim of this work is to explore the optimisation of both the average cost per gigabyte and the average service performance (mainly latency) on multiple cloud storage services. To achieve these aims, two machine learning algorithms were used: (1) supervised learning to predict file access patterns, and (2) reinforcement learning to learn the ideal parameters for distributing files over several cloud storage services. The framework was tested in a cloud storage services emulator, which emulated a real multiple-cloud storage setting (such as Google Cloud Storage, Amazon S3, Microsoft Azure Storage, and Rackspace file cloud) in terms of service performance and cost. In addition, the framework was tested in various settings of several cloud storage services. The results showed that the multiple-cloud approach achieved an improvement of about 42% for cost and 76% for performance. These findings indicate that storing data in multiple clouds is a superior approach compared with both the commonly used uniform file distribution and a heuristic distribution method.
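The disk-drive analogy in the abstract can be illustrated with a small sketch. The Python below is not the framework described in the thesis. It is a minimal, hypothetical example of single-parity striping, assuming a file is split into equal chunks with one XOR parity chunk and that each chunk would be placed on a different storage service. The function names (`xor_bytes`, `stripe`, `recover`) are illustrative assumptions, and no real cloud API is called.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def stripe(data: bytes, n_services: int) -> List[bytes]:
    """Split data into n_services - 1 equal chunks plus one XOR parity chunk.

    Hypothetical helper: each returned chunk would go to a different service.
    """
    if n_services < 2:
        raise ValueError("need at least two services")
    k = n_services - 1
    chunk_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(chunk_len * k, b"\0")  # zero-pad to a multiple of k
    chunks = [padded[i * chunk_len:(i + 1) * chunk_len] for i in range(k)]
    parity = reduce(xor_bytes, chunks)
    return chunks + [parity]


def recover(chunks: List[Optional[bytes]], original_len: int) -> bytes:
    """Rebuild the data when at most one chunk (marked None) is unavailable."""
    missing = [i for i, c in enumerate(chunks) if c is None]
    if len(missing) > 1:
        raise ValueError("single parity tolerates only one missing chunk")
    rebuilt = list(chunks)
    if missing:
        present = [c for c in chunks if c is not None]
        # XOR of all surviving chunks equals the missing one.
        rebuilt[missing[0]] = reduce(xor_bytes, present)
    # Drop the parity chunk and the zero padding.
    return b"".join(rebuilt[:-1])[:original_len]


data = b"hello multi-cloud storage"
parts = stripe(data, 4)   # e.g. one chunk per emulated service
parts[1] = None           # simulate one service being unavailable
restored = recover(parts, len(data))
```

In this sketch the number of services plays the role of a distribution parameter: more services mean smaller chunks per provider but more requests per file, which is the kind of cost and latency trade-off the abstract says the framework tunes.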
Supervisor: Kudenko, Daniel
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available