Consonant Challenge: An Introduction

Human listeners outperform automatic speech recognition systems at every level of speech recognition, including the very basic level of consonant recognition. What is not clear is where the human advantage originates. Does the fault lie in the acoustic representations of speech, in the recogniser architecture, or in a lack of compatibility between the two? Relatively few studies have compared human and automatic speech recognition on the same task, and those that have rely mainly on overall identification performance as the metric. A far more detailed comparison could yield many further insights.

The purpose of this Special Session is to promote focused human-computer comparisons on a task involving consonant identification in noise, with all participants using the same training and test data. The organisers will provide the training and test data, together with native listener results and baseline recogniser results. Participants are also encouraged to contribute listener responses of their own.

Contributions are sought in (but not limited to) the following areas:

  • Psychological models of human consonant recognition
  • Comparisons of front-end ASR representations
  • Comparisons of back-end recognisers
  • Exemplar vs statistical recognition strategies
  • Native/Non-native listener/model comparisons

The results of the Challenge will be presented at a Special Session of Interspeech 2008 in Brisbane, Australia.

Although the Interspeech 2008 deadline has passed, the Consonant Challenge remains open and we are happy to host new results. Please e-mail us if you have contributions you would like added to this website.

