Consonant Challenge: Analysis of the Speech Materials

Identification of unusable tokens

Manual listening identified 301 unusable tokens (2.9% of the entire corpus). About 16% of these are not poor pronunciations but irrecoverable endpointing errors. A few (4%) contain noise, e.g. finger tapping, which interferes with the VCV. The remaining 80% are mispronunciations of one kind or another. The table below includes all errors regardless of source. Columns are speakers; rows are consonants.

 

gender   f  m  m  m  m  f  f  m  m  m  f  f  f  m  m  m  m  f  m  f  f  f  f  f
SPKR     1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24  SUM     %
b        0  0  0  0  0  0  0  0  1  0  0  0  2  0  0  0  2  0  0  0  0  0  1  0    6  1.39
d        0  0  0  0  0  0  0  0  1  2  0  0  1  1  0  0  0  0  0  0  0  1  0  0    6  1.39
g        0  0  0  1  0  0  0  7  4  2  0  0  1  0  2  1  2  2  0  0  0  1  1  2   26  6.02
p        0  1  0  0  1  0  0  0  1  0  0  0  0  0  0  0  0  1  0  0  0  0  0  1    5  1.16
t        1  0  0  0  2  0  0  0  0  2  0  0  0  0  0  0  0  2  0  0  0  0  0  2    9  2.08
k        0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  1    3  0.69
s        0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  2  0  0  0  0  0  0  0    3  0.69
ʃ        0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1    1  0.23
f        0  1  0  1  2  2  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0    7  1.62
v        1  0  1  1  0  1  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1    6  1.39
θ        0  1  0  1  0  1  0  0 16  3  3  0  0  1  0  1  6  0  0  0  0  1  0  2   36  8.33
ð        0  0  0  0  6  0  0  1 10 14  0  4  1  0  0  0  4  2  3  0  0 11  1  1   58  13.4
tʃ       0  0  0  0  1  0  0  0  1  0  0  0  1  0  0  0  0  0  0  0  0  2  0  2    7  1.62
z        0  0  0  0  1  0  0  1  2  0  0  0  1  1  0  0  0  0  0  0  0  0  0  0    6  1.39
ʒ        0  0  0  0  1  0  0  2  2  9  0  2  9  0 16  1  2  2  0  0  2  2  0  6   56  13.0
h        0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  1  1  1    4  0.93
dʒ       0  0  0  1  0  0  0  0  1  3  0  0  1  0  0  1  8  0  1  0  1  0  1  2   20  4.63
m        0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1    2  0.46
n        0  0  0  0  0  0  0  1  0  2  0  0  0  0  0  0  0  0  0  0  0  0  0  0    3  0.69
ŋ        0  0  0  0  0  0  0  1  3  1  0  0  0  0  0  0  0  3  1  0  0  1  1  2   13  3.01
w        1  0  0  2  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0    4  0.93
r        0  4  0  6  0  0  0  0  0  0  2  0  0  0  0  0  0  0  0  0  0  0  0  0   12  2.78
y        0  1  0  0  0  0  0  3  0  0  0  0  0  0  0  0  0  1  0  0  1  0  0  0    6  1.39
l        0  1  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0    2  0.46
SUM      3 10  1 13 15  4  1 17 43 41  3  6 17  3 18  4 26 14  5  0  4 21  7 25  301
%        1  2  0  3  3  1  0  4 10  9  1  1  4  1  4  1  6  3  1  0  1  5  2  6  2.9
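
The percentages in the table can be reproduced with a little arithmetic. The sketch below is illustrative only. It assumes 432 tokens per consonant and per speaker, a figure that is not stated here but is inferred from the table itself (6 errors correspond to 1.39%). All variable names are hypothetical, and rows are identified by position in the table rather than by phoneme symbol.

```python
# Assumption: 432 tokens per consonant (and per speaker), inferred from
# the table's own percentages, e.g. 6 errors <-> 1.39 %.
TOKENS_PER_ROW_OR_COLUMN = 432
N_SPEAKERS = 24

# SUM column of the table, one entry per consonant row, top to bottom.
row_sums = [6, 6, 26, 5, 9, 3, 3, 1, 7, 6, 36, 58,
            7, 6, 56, 4, 20, 2, 3, 13, 4, 12, 6, 2]

# SUM row of the table, one entry per speaker, 1 to 24.
col_sums = [3, 10, 1, 13, 15, 4, 1, 17, 43, 41, 3, 6,
            17, 3, 18, 4, 26, 14, 5, 0, 4, 21, 7, 25]

# Both margins must add up to the same grand total.
total_errors = sum(row_sums)
assert total_errors == sum(col_sums)

# Corpus size implied by the assumption above.
corpus_size = N_SPEAKERS * TOKENS_PER_ROW_OR_COLUMN
overall_pct = 100 * total_errors / corpus_size

# Per-consonant and per-speaker error rates, in percent.
per_consonant_pct = [100 * n / TOKENS_PER_ROW_OR_COLUMN for n in row_sums]
per_speaker_pct = [round(100 * n / TOKENS_PER_ROW_OR_COLUMN) for n in col_sums]

print(f"unusable tokens: {total_errors} of {corpus_size} ({overall_pct:.1f}%)")
print("per-speaker %:", per_speaker_pct)
```

Under this assumption the margins are consistent with each other: both sum to 301, and the rounded per-speaker rates match the % row of the table.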

 

Consonants

The table indicates that the production of certain consonants (notably those with an ambiguous orthographic correspondence) proved problematic for some speakers. The consonants /θ/, /ð/ and /ʒ/ accounted for a large proportion of production errors (together, 150 of the 301 unusable tokens), while the relatively high number for /g/ was due to mispronunciation of /i:gi:/ as /i:di:/.

Vowels

Inspection of the data also showed some variation in vowel production, including complete vowel reduction (principally for /ae/) and centralisation of /i:/ and /u:/ in a small number of instances. These tokens were retained.

Stress

A pilot listening test to check vowel stress suggested that not all speakers produced the correct stress pattern. Participants who want to use stress information in training their ASR systems should be aware of this.
