Use this URL to cite or link to this record in EThOS: https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.791743
Title: Computer vision and natural language processing for people with vision impairment
Author: Massiceti, Daniela
ISNI: 0000 0004 8503 5199
Awarding Body: University of Oxford
Current Institution: University of Oxford
Date of Award: 2019
Availability of Full Text:
Access from EThOS:
Full text unavailable from EThOS. Please try the link below.
Access from Institution:
Abstract:
Globally, 253 million people live with severely impaired vision. They face extensive challenges in their day-to-day lives, from independently navigating and socialising, to fine-grained tasks like reading, and identifying and interacting with objects. Assistive tools have been developed to ease these challenges, ranging from the white cane to talking wearables; however, most remain simplistic, doing little to parse and understand the dynamic 3D world that safe and easy interaction requires. However, rapid advances in machine learning, computer vision and natural language processing, coupled with the miniaturisation of electronics and the proliferation of mobile devices and wearables, are redefining the landscape of assistive technologies for people with vision impairment. This thesis takes concrete steps toward the goal of data-driven assistive technologies by exploring methods for i) understanding visual scenes, ii) relaying this information to visually impaired (VI) users, and iii) evaluating models for relaying information through natural language. In the first direction, we develop a state-of-the-art weakly-supervised semantic segmentation method which segments objects using only classification labels. These labels can easily be collected from VI users as they interact with objects in the real world. In the second, we develop two methods for relaying information about the environment: i) creating spatial audio soundscapes of 3D scenes, and ii) allowing users to directly ask questions about their visual environment, which the system then answers. We validate the first with human participants in virtual reality environments, and the second using quantitative metrics on experimental datasets. In the third direction, we investigate a widely-used dataset for a sequential visual question-answering task and find that it contains exploitable biases which are exacerbated by poor evaluation metrics. We then propose a better method for evaluating and quantifying performance on this task. In the future, data-driven assistive technologies hold much promise for people with vision impairment. Efforts must, however, consider how well data-driven models port to real-world scenarios, where i) the incoming data will differ considerably from that in established visual perception datasets, and ii) the portability constraints of mobile devices are much more stringent.
Supervisor: Torr, Philip ; Hicks, Stephen
Sponsor: European Research Council ; South African Skye Foundation ; Engineering and Physical Sciences Research Council ; University of Oxford Clarendon Fund
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID: uk.bl.ethos.791743
DOI: Not available
Keywords: Computer Vision ; Machine Learning ; Natural Language Processing