JASMIN-CGN: Uitbreiding van het CGN met spraak van Jongeren, Anderstaligen en Senioren [Extending the CGN with Speech of Youngsters, Non-natives and Seniors]
Year of Publication: 2006
Dag van de Fonetiek 2006
van Herwijnen, Olga, and Catia Cucchiarini
Nederlandse Vereniging voor Fonetische Wetenschappen
Utrecht, The Netherlands
Large speech corpora are an indispensable resource for research in speech processing and for developing real-life speech applications. In 2004 the Spoken Dutch Corpus (Corpus Gesproken Nederlands, CGN), a corpus of standard Dutch as spoken by adult natives in the Netherlands and Flanders, became available. Owing to budget constraints, CGN contains no speech from children, non-natives or elderly people, nor any recordings of speech produced in human-machine interaction. Because such recordings would be extremely useful for research and for developing human language technology (HLT) applications for these specific groups of Dutch speakers, a project, JASMIN-CGN, was started to extend CGN with a corpus of contemporary Dutch as spoken in the Netherlands and Flanders by children of different age groups, non-natives with different mother tongues, and elderly people. In addition, the project will collect speech material in a communication setting that CGN did not cover: human-machine interaction. One third of the data will be collected in Flanders and two thirds in the Netherlands. In this talk I will discuss the rationale of the project, the corpus design, the speech material, the data collection procedure and the possible uses of the project's results.