Title:
|
An investigation on the use of gaze for eye-wear computing
|
This thesis investigates ways in which attention, both temporal and spatial,
as measured by gaze and head motion, can be used to reduce the complexity of
computer vision tasks.
Humans express attention mostly with their eyes, but also through their head gaze.
These gaze patterns often indicate an object or area of interest. This thesis
first presents a method to extract the user's attention from eye gaze fixations
and to filter outliers, and then considers head gaze as a source of information.
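The abstract does not name the fixation-extraction algorithm; purely as an
illustration, a minimal dispersion-threshold (I-DT) fixation detector in Python
might look as follows, where the dispersion threshold and minimum duration are
assumed values:

    import numpy as np

    def detect_fixations(gaze, timestamps, max_dispersion=30.0, min_duration=0.1):
        """Dispersion-threshold (I-DT) fixation detection.

        gaze: (N, 2) array of gaze points in pixels.
        timestamps: (N,) array of sample times in seconds.
        Returns a list of (start_idx, end_idx, centroid) fixations.
        """
        fixations = []
        start, n = 0, len(gaze)
        while start < n - 1:
            end = start + 1
            # Grow the window while the gaze points stay tightly clustered.
            while end < n:
                window = gaze[start:end + 1]
                dispersion = (window[:, 0].max() - window[:, 0].min()
                              + window[:, 1].max() - window[:, 1].min())
                if dispersion > max_dispersion:
                    break
                end += 1
            if timestamps[end - 1] - timestamps[start] >= min_duration:
                fixations.append((start, end - 1, gaze[start:end].mean(axis=0)))
                start = end   # skip past the detected fixation
            else:
                start += 1    # discard the leading sample as an outlier/saccade
        return fixations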
The first approach to estimating the user's attention combines eye gaze
fixations, the visual appearance of the observed region, 3D pose information,
and the user's motion. The proposed method then uses the user's attention to
identify the object of interest and to produce a 3D reconstruction of that
object. The approach is evaluated on both indoor and outdoor objects, and is
compared against baseline image segmentation alternatives.
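The thesis details its own segmentation pipeline; as a loose sketch of how
fixations can seed a baseline segmenter, the following uses OpenCV's GrabCut
initialised from a bounding box around the fixation points (the padding value
is an assumption):

    import cv2
    import numpy as np

    def segment_object_from_fixations(image, fixation_points, pad=40):
        """Seed GrabCut with a box around the user's fixations.

        image: BGR frame (H, W, 3); fixation_points: (N, 2) pixel coords.
        Returns a binary foreground mask of the attended object.
        """
        pts = np.asarray(fixation_points, dtype=np.int32)
        x0, y0 = pts.min(axis=0) - pad
        x1, y1 = pts.max(axis=0) + pad
        h, w = image.shape[:2]
        x0, y0 = max(x0, 0), max(y0, 0)
        x1, y1 = min(x1, w - 1), min(y1, h - 1)

        mask = np.zeros((h, w), np.uint8)
        bgd = np.zeros((1, 65), np.float64)
        fgd = np.zeros((1, 65), np.float64)
        rect = (int(x0), int(y0), int(x1 - x0), int(y1 - y0))
        cv2.grabCut(image, mask, rect, bgd, fgd, 5, cv2.GC_INIT_WITH_RECT)
        # Pixels marked as (probable) foreground form the object mask.
        fg = (mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD)
        return fg.astype(np.uint8)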
Secondly, a method is presented to discover task-relevant objects using the
attention extracted from users performing daily-life activities. A graphical
model representing an activity is generated and used to demonstrate how the
method can predict the next object to be interacted with. In addition, 3D
models of the task-relevant objects are reconstructed from the gaze collected
from the users. This method shows that the information gathered from eye gaze
can be used to teach the computer about the tasks being performed.
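The graphical model itself is developed in the thesis; as a rough illustration
of the prediction idea only, a first-order transition model over observed
object interactions can already suggest the most likely next object (the
object names and sequences here are hypothetical):

    from collections import Counter, defaultdict

    # Sequences of attended objects recorded during a task (hypothetical data).
    sequences = [
        ["kettle", "tap", "kettle", "mug", "coffee_jar", "mug"],
        ["kettle", "tap", "kettle", "coffee_jar", "mug"],
    ]

    # Count first-order transitions between consecutively attended objects.
    transitions = defaultdict(Counter)
    for seq in sequences:
        for current_obj, next_obj in zip(seq, seq[1:]):
            transitions[current_obj][next_obj] += 1

    def predict_next(current_obj):
        """Return the most frequently following object, if any was observed."""
        counts = transitions.get(current_obj)
        return counts.most_common(1)[0][0] if counts else None

    print(predict_next("tap"))  # -> 'kettle'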
Thirdly, a method is presented to estimate the user's temporal and spatial
visual attention from the user's ego-motion. The ego-motion is determined from
scene motion features obtained from optical flow and from a head-mounted
Inertial Measurement Unit (IMU). Threshold values for the temporal attention
model, which indicates when the user is paying attention to a task, are
selected using ROC analysis.
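As a sketch of one common way to pick such a threshold (the thesis describes
its own ROC evaluation; maximising Youden's J statistic here is an
assumption), using scikit-learn:

    import numpy as np
    from sklearn.metrics import roc_curve

    def select_threshold(scores, labels):
        """Pick the score threshold maximising Youden's J = TPR - FPR.

        scores: per-frame attention scores (e.g. from head-motion features).
        labels: 1 where the user was attending to the task, else 0.
        """
        fpr, tpr, thresholds = roc_curve(labels, scores)
        return thresholds[np.argmax(tpr - fpr)]

    # Hypothetical example: higher score tends to mean attention.
    scores = np.array([0.9, 0.8, 0.4, 0.7, 0.2, 0.1])
    labels = np.array([1, 1, 0, 1, 0, 0])
    print(select_threshold(scores, labels))  # -> 0.7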
The spatial attention model, which indicates where the user is looking in the
environment, is built using a data-driven approach via kernel regression.
Comparative results using different motion features extracted from the optical
flow and the IMU are provided, and they show the advantage of the proposed
IMU-based model.
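The thesis builds its spatial model from recorded gaze and motion-feature
data; a minimal Nadaraya-Watson kernel regressor, illustrating such a
data-driven mapping from motion features to a gaze location (the Gaussian
kernel and bandwidth value are assumptions), could be:

    import numpy as np

    def kernel_regress(train_features, train_gaze, query, bandwidth=1.0):
        """Nadaraya-Watson estimate of gaze position for a new feature vector.

        train_features: (N, D) motion features (optical flow / IMU derived).
        train_gaze: (N, 2) corresponding gaze positions on the image plane.
        query: (D,) feature vector for the current frame.
        """
        d2 = np.sum((train_features - query) ** 2, axis=1)
        weights = np.exp(-d2 / (2.0 * bandwidth ** 2))  # Gaussian kernel
        weights /= weights.sum()
        return weights @ train_gaze                      # weighted mean gaze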
Finally, the thesis proposes a mixed reality system named GlaciAR, which aims
to augment users for task guidance applications. The system combines the ideas
presented above and uses an automated, unsupervised information collection
approach to generate video guides. It is a self-contained system that runs
onboard an eye-wear computer (Google Glass). It automatically determines the
user's attention using an IMU-based head motion model and collects video
snippets accordingly. It can then trigger these video guides to help
inexperienced users perform the task. The evaluation compares the system
against video guides manually edited by experts and validates the proposed
attention model for collecting and triggering guidance.
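GlaciAR's own triggering logic is described in the thesis; as a rough sketch
of the underlying idea only (all constants are assumed), attention can be
flagged when gyroscope magnitude stays below a threshold for a sustained
period, at which point a video snippet is captured or a guide is triggered:

    import numpy as np

    GYRO_THRESHOLD = 0.3    # rad/s; assumed head-stillness threshold
    MIN_STILL_SAMPLES = 30  # e.g. ~0.5 s at 60 Hz

    def attention_onsets(gyro):
        """Yield sample indices where sustained head stillness begins.

        gyro: (N, 3) angular velocity samples from a head-mounted IMU.
        """
        still = np.linalg.norm(gyro, axis=1) < GYRO_THRESHOLD
        run = 0
        for i, s in enumerate(still):
            run = run + 1 if s else 0
            if run == MIN_STILL_SAMPLES:
                yield i - MIN_STILL_SAMPLES + 1  # onset of the still period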
|