Avoiding the Ham in Hamster

Modelling the use of non-segmental information in human spoken-word recognition

In fluent speech, there are no reliable pauses between words. If words are built from a limited set of abstract phonemes, virtually every phoneme string is therefore compatible with many words. Listeners nevertheless recognise the intended word sequence effortlessly. Even for fully embedded words, such as ham (embedded word) in hamster (embedding word), listeners can tell the two apart before the end of ham. There is now considerable evidence that sub-segmental cues (e.g., the time course of change in acoustic-phonetic features) and supra-segmental cues (e.g., prosody) in the speech signal, hereafter together referred to as non-segmental cues, modulate speech recognition and help the listener segment the speech signal into syllables and words.

Mainstream theories of spoken-word recognition hold that words in the mental lexicon are represented as sequences of abstract segments, with stress patterns as the only non-segmental information. These theories, exemplified by the computational model SpeM, therefore wrongly predict that listeners can recognise an embedded word only after part of the following context has been heard.
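The prediction above can be made concrete with a purely segmental lexical lookup. The Python sketch below is a toy illustration, not SpeM: the three-word lexicon and the ASCII phoneme symbols are hypothetical, and candidates are found by simple prefix matching rather than by SpeM's probabilistic search.

```python
# Hypothetical toy lexicon: each word is stored only as a sequence of abstract
# phoneme labels (ad hoc ASCII symbols), with no durational or prosodic detail.
LEXICON = {
    "ham": ("h", "ae", "m"),
    "hamster": ("h", "ae", "m", "s", "t", "@"),
    "hat": ("h", "ae", "t"),
}


def candidates(heard):
    """Return, sorted alphabetically, the words whose phoneme sequence starts with `heard`."""
    prefix = tuple(heard)
    return sorted(
        word for word, phones in LEXICON.items() if phones[: len(prefix)] == prefix
    )


if __name__ == "__main__":
    # Feed the phonemes of "hamster" one at a time and show which words remain.
    spoken = LEXICON["hamster"]
    for i in range(1, len(spoken) + 1):
        print(" ".join(spoken[:i]), "->", candidates(spoken[:i]))
```

After "h ae m", both ham and hamster are still candidates; only the following "s" removes ham. A segment-only model must therefore wait for subsequent context. Adding non-segmental evidence, such as segment duration, to the candidate scores is the kind of extension the proposal describes and is not modelled here.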

In the proposed research, I will extend SpeM so that it decodes non-segmental cues in parallel with segmental information. I will investigate whether the non-segmental information in the speech signal can indeed distinguish embedded from embedding words at or before the end of the embedded word, while retaining the hypothesis that non-segmental information is not explicitly stored in the mental lexicon.

So far, the effect of non-segmental cues on speech recognition has been established only for carefully pronounced laboratory speech. By analysing corpora of increasingly complex speech and testing the extended version of SpeM on read and conversational speech, this project will provide new knowledge about the nature and structure of non-segmental cues in everyday speech.

Keywords: Human word recognition, automatic speech recognition, computational modelling, sub- and supra-segmental cues, embedded words.