Multimedia retrieval

Title: Multimedia retrieval
Publication Type: Presentation
Year of Publication: 2006
Conference Name: Summer Meeting on Corpus-based Research
Authors: van Hessen, Arjan
Publisher: Nederlandse Vereniging voor Fonetische Wetenschappen
Conference Location: Nijmegen, The Netherlands

The number of digital multimedia collections is growing rapidly. Owing to the ever-declining costs of recording audio and video, and to improved preservation technology, huge data sets containing text, audio, video and images are being created by professionals and non-professionals alike.

The reasons for building up these collections vary. Organisations such as broadcast companies consider the production and publishing of multimedia data to be their core business. Within these companies there is a tendency to search for ways to get more out of the produced content: a nice example is the basic search functionality added to the "uitzending gemist" collection. Other organisations are primarily interested in gaining insight into their information flow, whether for internal use (recorded corporate meetings) or public use (council meetings that are recorded and webcast). A number of organisations in the Netherlands administer spoken-word archives: recordings of spoken interviews and testimonies on diverse topics such as retrospective narratives, eyewitness reports and historical site descriptions. Modern variants of these spoken-word archives are archives of 'Podcasts', 'Vodcasts' (video podcasts) and 'Vlogs' (video weblogs), created in order to share 'home-made' information with "the world".

The Human Media Interaction (HMI) group is part of the computer science department and the Centre for Telematics and Information Technology (CTIT), and has a long history in multimedia retrieval research. The use of audio mining and speech recognition technology in multimedia retrieval (spoken document retrieval, or SDR) is a particularly important research focus.

The presentation focuses on the possibility of indexing and accessing spoken archives by means of automatic speech recognition technology. The index, based on the imperfect recognition results, is then used to search the document collection and to relate individual documents to other information sources in (potentially) any media format. We will discuss the running demo application in which the recognised speech of the 8 o'clock news is used to connect news items with the five most similar newspaper documents from the Twente News Corpus.
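
The linking step described above can be illustrated with a minimal sketch. The abstract does not say which similarity measure the demo uses, so TF-IDF weighting with cosine similarity is assumed here purely for illustration, and the function names (tokenize, tfidf_vectors, cosine, most_similar) are hypothetical. The sketch takes the (possibly error-containing) transcript of one news item and ranks a small collection of newspaper texts against it.

```python
import math
from collections import Counter

PUNCTUATION = ".,;:!?\"'()"


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (word.strip(PUNCTUATION).lower() for word in text.split())
    return [t for t in tokens if t]


def tfidf_vectors(token_lists):
    """Return one sparse TF-IDF vector (dict) per document, plus the IDF table."""
    n_docs = len(token_lists)
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))
    # Smoothed IDF so that terms occurring in every document keep a small weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = [
        {t: count * idf[t] for t, count in Counter(tokens).items()}
        for tokens in token_lists
    ]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(transcript, articles, k=5):
    """Rank articles by similarity to an ASR transcript; return (index, score) pairs."""
    vectors, idf = tfidf_vectors([tokenize(a) for a in articles])
    # Words the recogniser got wrong usually fall outside the article
    # vocabulary; they are dropped here and so contribute nothing to the score.
    query = {
        t: count * idf[t]
        for t, count in Counter(tokenize(transcript)).items()
        if t in idf
    }
    scored = [(cosine(query, vec), i) for i, vec in enumerate(vectors)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(i, score) for score, i in scored[:k]]
```

Calling most_similar(transcript, articles) with the default k=5 returns up to five (index, score) pairs, best match first. Because matching is done on whole-word overlap, a transcript with a moderate number of recognition errors can still retrieve the right articles as long as enough content words survive.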