Use this URL to cite or link to this record in EThOS:
Title: Bloom maps for big data
Author: Talbot, David
Awarding Body: University of Edinburgh
Current Institution: University of Edinburgh
Date of Award: 2010
Availability of Full Text:
Access from EThOS:
Full text unavailable from EThOS. Please try the link below.
Access from Institution:
The ability to retrieve a value given a key is fundamental in computer science. Unfortunately as the a priori set from which keys are drawn grows in size, any exact data structure must use more space per key. This motivates our interest in approximate data structures. We consider the problem of succinctly encoding a map to support queries with bounded error when the distribution over values is known. We give a lower bound on the space required per key in terms of the entropy of the distribution over values and the error rate and present a generalization of the Bloom filter, the Bloom map, that achieves the lower bound up to a small constant factor. We then develop static and on-line approximation schemes for frequency data that use constant space per key to store frequencies with bounded relative error when these follow a power law. Our on-line construction has constant expected update complexity per observation and requires only a single pass over a data set. Finally we present a simple framework for using a priori knowledge to reduce the error rate of an approximate data structure with one-sided error. We evaluate the data structures proposed here empirically and use them to construct randomized language models that significantly reduce the space requirements of a state-of-the-art statistical machine translation system.
Supervisor: Not available Sponsor: Not available
Qualification Name: Thesis (Ph.D.) Qualification Level: Doctoral
EThOS ID:  DOI: Not available