JASMIN-CGN: Uitbreiding van het CGN met spraak van Jongeren, Anderstaligen en Senioren (JASMIN-CGN: Extending the CGN with Speech of Young People, Non-Native Speakers and Seniors)

Title: JASMIN-CGN: Uitbreiding van het CGN met spraak van Jongeren, Anderstaligen en Senioren
Publication Type: Presentation
Year of Publication: 2006
Conference Name: Dag van de Fonetiek 2006
Authors: van Herwijnen, Olga, and Catia Cucchiarini
Publisher: Nederlandse Vereniging voor Fonetische Wetenschappen
Conference Location: Utrecht, The Netherlands

Large speech corpora are an indispensable resource for research in speech processing and for developing real-life speech applications. In 2004 the Spoken Dutch Corpus (Corpus Gesproken Nederlands, CGN), a corpus of standard Dutch as spoken by adult native speakers in the Netherlands and Flanders, became available. Owing to budget constraints, CGN includes neither speech of children, non-native speakers and elderly people nor recordings of speech produced in human-machine interaction. Since such recordings would be extremely useful for research and for developing human language technology (HLT) applications for these specific groups of speakers of Dutch, a project was started to extend CGN by collecting a corpus of contemporary Dutch as spoken by children of different age groups, non-native speakers with different mother tongues and elderly people in the Netherlands and Flanders (JASMIN-CGN). In addition, the project will collect speech material in a communication setting that was not envisaged in CGN: human-machine interaction. One third of the data will be collected in Flanders and two thirds in the Netherlands. In this talk I will discuss the rationale of the project, the corpus design, the speech material, the procedure and the uses that can be made of the project's results.