The human visual system lets us perceive the world around us in three dimensions
by integrating evidence from depth cues into a coherent visual model of the world. The equivalents in computer vision and computer graphics are geometric models,
which provide a wealth of information about represented objects, such as depth and
surface normals. Videos lack this information and only provide per-pixel colour. In this dissertation, I therefore investigate a combination of videos and geometric models: videos with per-pixel depth (also known as RGBZ videos).
I consider the full life cycle of these videos: from their acquisition, through filtering and processing, to stereoscopic display.
I propose two approaches to capture videos with depth. The first is a spatiotemporal
stereo matching approach based on the dual-cross-bilateral grid – a novel real-time
technique derived by accelerating a reformulation of an existing stereo matching
approach. This is the basis for an extension which incorporates temporal evidence in
real time, resulting in increased temporal coherence of disparity maps – particularly
in the presence of image noise.
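To make the idea concrete, here is a minimal sketch in Python/NumPy of colour-guided cost aggregation on a bilateral grid, assuming greyscale frames with intensities in [0, 255]. It uses only the left image as the grid's range dimension, nearest-neighbour splatting and slicing, and a simple exponential blend of successive cost volumes as a stand-in for the temporal evidence; the dual-cross-bilateral grid itself additionally indexes the grid by the right image's intensity.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def cross_bilateral_grid_filter(cost, guide, s_xy=8, s_r=16):
    """Smooth one cost-volume slice with a cross-bilateral grid:
    splat costs into a coarse (y, x, intensity) grid, blur the grid,
    then slice it at each pixel's position and guide intensity."""
    h, w = cost.shape
    ys, xs = np.mgrid[0:h, 0:w]
    gy, gx = ys // s_xy, xs // s_xy
    gr = guide.astype(int) // s_r
    shape = (h // s_xy + 1, w // s_xy + 1, 256 // s_r + 1)
    grid, norm = np.zeros(shape), np.zeros(shape)
    np.add.at(grid, (gy, gx, gr), cost)   # splat costs
    np.add.at(norm, (gy, gx, gr), 1.0)    # splat a homogeneous channel
    grid, norm = gaussian_filter(grid, 1.0), gaussian_filter(norm, 1.0)
    return grid[gy, gx, gr] / np.maximum(norm[gy, gx, gr], 1e-6)

def spatiotemporal_stereo(left, right, prev_cost=None, max_disp=32, alpha=0.7):
    """Aggregate absolute-difference matching costs per disparity with
    the grid filter, blend in the previous frame's cost volume for
    temporal coherence, and pick the winner-takes-all disparity."""
    left, right = np.asarray(left, float), np.asarray(right, float)
    h, w = left.shape
    cost = np.full((max_disp, h, w), 255.0)
    for d in range(max_disp):
        diff = np.abs(left[:, d:] - right[:, :w - d])
        cost[d, :, d:] = cross_bilateral_grid_filter(diff, left[:, d:])
    if prev_cost is not None:
        cost = alpha * cost + (1.0 - alpha) * prev_cost
    return cost.argmin(axis=0), cost
```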
The second acquisition approach is a sensor fusion system which combines data
from a noisy, low-resolution time-of-flight camera and a high-resolution colour
video camera into a coherent, noise-free video with depth. The system consists
of a three-step pipeline that aligns the video streams, efficiently removes and fills
invalid and noisy geometry, and finally uses a spatiotemporal filter to increase the
spatial resolution of the depth data and strongly reduce depth measurement noise.
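The upsampling-and-denoising step can be illustrated with joint bilateral upsampling, a related colour-guided filter: each high-resolution pixel averages nearby low-resolution depth samples, weighted by spatial distance and by colour similarity in the guide image, with invalid samples excluded so that holes are filled from valid neighbours. This brute-force Python/NumPy sketch stands in for the efficient spatiotemporal filter, omits the alignment step, and assumes invalid depth is encoded as zero.

```python
import numpy as np

def joint_bilateral_upsample(depth_lo, colour_hi, factor=4,
                             sigma_s=2.0, sigma_r=12.0):
    """Upsample a low-resolution depth map to the colour camera's
    resolution, guided by the high-resolution colour image."""
    h, w = colour_hi.shape[:2]
    hl, wl = depth_lo.shape
    colour_hi = np.asarray(colour_hi, float)
    out = np.zeros((h, w))
    r = int(np.ceil(2 * sigma_s))               # low-res window radius
    for y in range(h):
        for x in range(w):
            yl, xl = y / factor, x / factor     # position in low-res grid
            y0, x0 = int(round(yl)), int(round(xl))
            wsum = dsum = 0.0
            for yn in range(max(0, y0 - r), min(hl, y0 + r + 1)):
                for xn in range(max(0, x0 - r), min(wl, x0 + r + 1)):
                    d = depth_lo[yn, xn]
                    if d <= 0:                  # invalid ToF sample: skip
                        continue
                    ds2 = (yn - yl) ** 2 + (xn - xl) ** 2
                    c = colour_hi[min(yn * factor, h - 1),
                                  min(xn * factor, w - 1)]
                    dr2 = np.sum((c - colour_hi[y, x]) ** 2)
                    wgt = np.exp(-ds2 / (2 * sigma_s ** 2)
                                 - dr2 / (2 * sigma_r ** 2))
                    wsum += wgt
                    dsum += wgt * d
            out[y, x] = dsum / wsum if wsum > 0 else 0.0
    return out
```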
I show that these videos with depth enable a range of video processing effects that are not achievable using colour video alone. These effects critically rely on the geometric information; a proposed video relighting technique, for example, requires high-quality surface normals to produce plausible results. In addition, I demonstrate
enhanced non-photorealistic rendering techniques and the ability to synthesise
stereoscopic videos, which allows these effects to be applied stereoscopically.
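To illustrate why these effects need geometry, here is a minimal relighting sketch: surface normals estimated from the depth map by central differences, then plain Lambertian shading under a new directional light with the original colour reused as albedo. The orthographic-camera assumption and the shading model are simplifications for brevity, not the dissertation's actual technique.

```python
import numpy as np

def normals_from_depth(depth):
    """Per-pixel surface normals from a depth map via central
    differences, assuming an orthographic camera (a real system
    would unproject with the camera intrinsics first)."""
    dzdy, dzdx = np.gradient(np.asarray(depth, float))
    n = np.dstack([-dzdx, -dzdy, np.ones_like(dzdx)])
    return n / np.linalg.norm(n, axis=2, keepdims=True)

def relight_lambertian(colour, depth, light_dir=(0.5, -0.5, 1.0)):
    """Re-shade a frame under a new directional light using
    depth-derived normals and the colour frame as albedo."""
    l = np.asarray(light_dir, float)
    l /= np.linalg.norm(l)
    shading = np.clip(normals_from_depth(depth) @ l, 0.0, 1.0)
    return np.asarray(colour, float) * shading[..., None]
```

Noisy depth yields noisy gradients and hence implausible shading, which is why relighting depends on the high-quality geometry produced by the acquisition pipelines above.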
These stereoscopic renderings inspired me to study stereoscopic viewing discomfort. The result is a surprisingly simple computational model that predicts the visual comfort of stereoscopic images. I validated this model in a perceptual study, which showed that its predictions correlate strongly with human comfort ratings. This makes it ideal for automatic comfort assessment, without the need for costly and lengthy perceptual studies.
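For flavour, a hypothetical sketch of what a simple disparity-based comfort predictor could look like. The features (disparity spread and mean local disparity gradient, both known correlates of viewing discomfort) and the weights are placeholders chosen for illustration; they are not the model proposed in the dissertation.

```python
import numpy as np

def comfort_score(disparity, w_spread=0.5, w_grad=0.5):
    """Predict (relative) viewing comfort from a disparity map:
    larger disparity spreads and steeper local disparity gradients
    lower the score. Hypothetical features and weights."""
    d = np.asarray(disparity, float)
    spread = np.percentile(d, 95) - np.percentile(d, 5)
    gy, gx = np.gradient(d)
    grad = np.mean(np.hypot(gx, gy))
    return -(w_spread * spread + w_grad * grad)
```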