Title: Non-grid opportunistic resources for (big data) volunteer computing
Author: Rashid, Md Mamunur
ISNI: 0000 0004 6349 1038
Awarding Body: University of Kent
Current Institution: University of Kent
Date of Award: 2017
Availability of Full Text:
Access from EThOS:
Full text unavailable from EThOS. Restricted access.
CPU-intensive computing at the LHC (Large Hadron Collider) requires collaborative distributed computing resources to accomplish its data reconstruction and analysis. Currently, institutional Grids manage and process large datasets within limited time and cost. The baseline paradigm is now well established: use the Computing Grid, and more specifically the WLCG (Worldwide LHC Computing Grid) and its supporting infrastructures. To carry out its Grid computing, LHCb has developed a community Grid solution called DIRAC (Distributed Infrastructure with Remote Agent Control), based on a pilot-job submission system for the institutional Grid infrastructures. However, other computing resources exist outside the Grid infrastructures, such as idle desktops (e.g. SETI@home) and idle computing clusters (e.g. CERN's online selection farm outside the data-taking periods of the LHC detectors). Simulation activities in particular, because they are lightweight, could benefit from these opportunistic resources. The DIRAC architecture allows the use of the existing institutional Grid resources. To expand the capability of the existing computing power, I have proposed integrating opportunistic resources into the distributed computing system (DIRAC). In order not to depend on the local settings of the worker nodes at the external resources, I propose using virtual machines. The architectural modifications required for DIRAC are presented here, with specific examples of data analyses on non-Grid clusters. This solution was achieved by making the necessary changes in three state-of-the-art technologies: DIRAC, CernVM and OpenNebula. The combination of these three technologies is referred to as the DiCON architecture. I refer to the new approach as a framework rather than a specific technical solution to a specific scientific problem, as it can be reused in similar big data analysis environments.
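The pilot-job model described above can be illustrated with a minimal sketch: a lightweight "pilot" agent starts inside a virtual machine on an opportunistic worker and pulls payload jobs from a central task queue, rather than having work pushed to it. All class and function names here (`TaskQueue`, `match_job`, `run_pilot`) are illustrative assumptions, not the real DIRAC API.

```python
import queue

class TaskQueue:
    """Stands in for the central matcher service that holds waiting jobs."""
    def __init__(self, jobs):
        self._q = queue.Queue()
        for job in jobs:
            self._q.put(job)

    def match_job(self, capabilities):
        """Return the next job compatible with the pilot's capabilities, or None."""
        try:
            job = self._q.get_nowait()
        except queue.Empty:
            return None
        return job if job["platform"] in capabilities else None


def run_pilot(task_queue, capabilities, results):
    """Pilot loop: poll the central queue, execute payloads, record outcomes."""
    while True:
        job = task_queue.match_job(capabilities)
        if job is None:
            # Queue drained: the pilot exits, so the cloud manager
            # (e.g. OpenNebula) can recycle the virtual machine.
            break
        output = job["payload"]()          # execute the payload job locally
        results.append((job["id"], output))


if __name__ == "__main__":
    # Two hypothetical simulation jobs targeted at a CernVM-based worker.
    jobs = [
        {"id": 1, "platform": "cernvm", "payload": lambda: "simulated events"},
        {"id": 2, "platform": "cernvm", "payload": lambda: "more events"},
    ]
    results = []
    run_pilot(TaskQueue(jobs), {"cernvm"}, results)
    print(results)
```

Because the pilot only needs outbound access to the task queue, the worker node's local configuration is irrelevant, which is precisely what running inside a standard virtual machine image buys on non-Grid resources.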
I have also shown how this framework was used to analyse large-scale climate data; applying an infrastructure developed for one research area to another proved rather challenging. I have also proposed using a dataflow architecture to exploit the possibilities of opportunistic resources while at the same time establishing reliability and stability. A dataflow computing architecture in a virtual environment is seen as a possible future research extension of this work. This is a theoretical contribution only, and it is a unique approach in a virtual cloud (not in-house computing) environment. This paradigm could give the scientific community access to a large number of non-conventional opportunistic CPU resources for scientific data processing. This PhD work examines the challenges posed by such a computing infrastructure and the solutions it provides.
Supervisor: Wang, Frank Sponsor: Not available
Qualification Name: Thesis (Ph.D.) Qualification Level: Doctoral