Our previous experience has shown that CSLR SONIC and CMU SPHINX are both versatile and powerful tools for Automatic Speech Recognition (ASR). Encouraged by those good results, we compared the two systems on another important ASR challenge: the recognition of children's speech. In this work, SPHINX was used to build from scratch a recognizer for Italian children's speech, and its results were compared with those obtained with SONIC, both in previous experiments and in some new ones designed to ensure uniform experimental conditions across the two systems. This report describes the training process and the evaluation methodology for a speaker-independent phonetic-recognition task. First, we briefly describe the system architectures and their differences; then we analyze the task, the corpus, and the techniques adopted to address the recognition problem. The final discussion presents the scores of multiple tests in terms of Phonetic Error Rate (PER) and an analysis of the differences between the two systems. SONIC turned out to have the best overall performance, reaching a minimum PER of 12.4% with VTLN and SMAPLR adaptation. SPHINX was the easier system to train and test, and its performance (PER of 17.2% with comparable adaptations) was only a few percentage points behind SONIC's (4.8 points absolute).
Comparing SPHINX vs. SONIC Italian Children Speech Recognition Systems
Contribution in conference proceedings
Bulzoni, Rome, Italy
Contesto comunicativo e variabilità nella produzione e percezione della lingua, Atti del VII Convegno Nazionale AISV - Associazione Italiana di Scienze della Voce, 2011, pp. 414–425, Lecce, Italy, 26-28 January 2011