Show simple item record

Determining articulator configuration in voiced stop consonants by matching time-domain patterns in pitch periods

dc.date.accessioned2005-12-22T02:20:51Z
dc.date.accessioned2018-11-24T10:24:22Z
dc.date.available2005-12-22T02:20:51Z
dc.date.available2018-11-24T10:24:22Z
dc.date.issued2005-01-28
dc.identifier.urihttp://hdl.handle.net/1721.1/30518
dc.identifier.urihttp://repository.aust.edu.ng/xmlui/handle/1721.1/30518
dc.description.abstractIn this thesis I will be concerned with linking the observed speech signal to the configuration of articulators. Due to the potentially rapid motion of the articulators, the speech signal can be highly non-stationary. The typical linear analysis techniques that assume quasi-stationarity may not have sufficient time-frequency resolution to determine the place of articulation. I argue that the traditional low- and high-level primitives of speech processing, frequency and phonemes, are inadequate and should be replaced by a representation with three layers: (1) short pitch period resonances and other spatio-temporal patterns, (2) articulator configuration trajectories, and (3) syllables. The patterns indicate articulator configuration trajectories (how the tongue, jaws, etc. are moving), which are interpreted as syllables and words. My patterns are an alternative to frequency. I use short time-domain features of the sound waveform, which can be extracted from each vowel pitch period pattern, to identify the positions of the articulators with high reliability. These features are important because, by capitalizing on detailed measurements within a single pitch period, the rapid articulator movements can be tracked. No linear signal processing approach can achieve the combination of sensitivity to short-term changes and measurement accuracy resulting from these nonlinear techniques. The measurements I use are neurophysiologically plausible: the auditory system could be using similar methods. I have demonstrated this approach by constructing a robust technique for categorizing the English voiced stops as the consonants B, D, or G based on the vocalic portions of their releases. The classification recognizes 93.5%, 81.8% and 86.1% of the b, d and g to ae transitions, with false positive rates of 2.9%, 8.7% and 2.6% respectively.
dc.format.extent96 p.
dc.format.extent85678480 bytes
dc.format.extent3087600 bytes
dc.language.isoen_US
dc.subjectAI
dc.subjectspeech processing
dc.subjectstop consonants
dc.subjectpitch period
dc.subjectspatio-temporal patterns
dc.titleDetermining articulator configuration in voiced stop consonants by matching time-domain patterns in pitch periods


Files in this item

Files                         Size      Format
MIT-CSAIL-TR-2005-005.pdf     3.087Mb   application/pdf
MIT-CSAIL-TR-2005-005.ps      85.67Mb   application/postscript
