
Perceptual Evaluation of Video-Realistic Speech

dc.date.accessioned: 2004-10-20T21:05:09Z
dc.date.accessioned: 2018-11-24T10:23:39Z
dc.date.available: 2004-10-20T21:05:09Z
dc.date.available: 2018-11-24T10:23:39Z
dc.date.issued: 2003-02-28
dc.identifier.uri: http://hdl.handle.net/1721.1/7275
dc.identifier.uri: http://repository.aust.edu.ng/xmlui/handle/1721.1/7275
dc.description.abstract: With many visual speech animation techniques now available, there is a clear need for systematic perceptual evaluation schemes. We describe here our scheme and its application to a new video-realistic (potentially indistinguishable from real recorded video) visual-speech animation system, called Mary 101. Two types of experiments were performed: (a) distinguishing visually between real and synthetic image sequences of the same utterances ("Turing tests"), and (b) gauging visual speech recognition by comparing lip-reading performance on real and synthetic image sequences of the same utterances ("intelligibility tests"). Subjects who were presented randomly with either real or synthetic image sequences could not tell the synthetic from the real sequences above chance level. The same subjects, when asked to lip-read the utterances from the same image sequences, recognized speech from real image sequences significantly better than from synthetic ones. However, performance for both real and synthetic sequences was at levels suggested in the literature on lip-reading. We conclude from the two experiments that the animation of Mary 101 is adequate for providing the percept of a talking head. However, additional effort is required to improve the animation for lip-reading purposes such as rehabilitation and language learning. In addition, these two tasks can be considered explicit and implicit perceptual discrimination tasks. In the explicit task (a), each stimulus is classified directly as a synthetic or real image sequence by detecting a possible difference between the synthetic and the real image sequences. The implicit perceptual discrimination task (b) consists of a comparison between visual recognition of speech in real and synthetic image sequences. Our results suggest that implicit perceptual discrimination is a more sensitive method for discriminating between synthetic and real image sequences than explicit perceptual discrimination.
dc.format.extent: 17 p.
dc.format.extent: 1515741 bytes
dc.format.extent: 1358361 bytes
dc.language.iso: en_US
dc.subject: AI
dc.subject: visual speech
dc.subject: speech animation
dc.subject: face animation
dc.subject: image morphing
dc.subject: lip reading
dc.title: Perceptual Evaluation of Video-Realistic Speech


Files in this item

File               Size      Format
AIM-2003-003.pdf   1.358 MB  application/pdf
AIM-2003-003.ps    1.515 MB  application/postscript
