Author: M. Kurimo
DOI: 10.1109/ICASSP.2000.859331
Keywords:
Abstract: This paper describes a new latent semantic indexing (LSI) method for spoken audio documents. The framework is the indexing of broadcast news from radio and TV as a combination of large vocabulary continuous speech recognition (LVCSR), natural language processing (NLP) and information retrieval (IR). For indexing, the documents are presented as vectors of word counts, whose dimensionality is rapidly reduced by random mapping (RM). The obtained vectors are projected into the latent semantic subspace determined by SVD, where the vectors are then smoothed by a self-organizing map (SOM). The smoothing by the closest document clusters is important here, because the documents are often short and have a high word error rate (WER). As the clusters in the semantic subspace reflect the news topics, the SOMs provide an easy way to visualize the index and query results and to explore the database. Test results are reported for TREC's spoken document retrieval databases (www.idiap.ch/kurimo/thisl.html).
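
The indexing pipeline named in the abstract (word-count vectors, random mapping, SVD projection, SOM smoothing) can be sketched roughly as below. This is a minimal NumPy illustration, not the paper's implementation: the toy documents, the Gaussian projection matrix, the 1-D SOM with a linearly shrinking neighborhood, the `blend` weight, and all function names are assumptions made for the example.

```python
import numpy as np

def count_vectors(docs):
    """Return (word-count matrix, vocabulary) for whitespace-tokenized docs."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(docs), len(vocab)))
    for row, d in enumerate(docs):
        for w in d.lower().split():
            counts[row, index[w]] += 1
    return counts, vocab

def random_mapping(counts, dim, rng):
    """Reduce dimensionality with a random Gaussian matrix (assumed RM form)."""
    proj = rng.standard_normal((counts.shape[1], dim)) / np.sqrt(dim)
    return counts @ proj

def svd_subspace(vectors, rank):
    """Project vectors onto the top `rank` right singular vectors."""
    _, _, vt = np.linalg.svd(vectors, full_matrices=False)
    return vectors @ vt[:rank].T

def train_som(data, n_units, epochs, rng):
    """Train a 1-D SOM with a shrinking Gaussian neighborhood."""
    units = data[rng.integers(0, len(data), n_units)].copy()
    grid = np.arange(n_units)
    steps = epochs * len(data)
    t = 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = t / steps
            lr = 0.5 * (1.0 - frac)                      # decaying learning rate
            sigma = max(n_units / 2.0 * (1.0 - frac), 0.5)
            bmu = np.argmin(np.linalg.norm(units - x, axis=1))
            h = np.exp(-((grid - bmu) ** 2) / (2.0 * sigma ** 2))
            units += lr * h[:, None] * (x - units)
            t += 1
    return units

def smooth(data, units, blend=0.5):
    """Blend each vector with its best-matching SOM unit (assumed scheme)."""
    dists = np.linalg.norm(data[:, None, :] - units[None, :, :], axis=2)
    bmus = dists.argmin(axis=1)
    return (1.0 - blend) * data + blend * units[bmus]

# Toy stand-ins for recognized news stories (hypothetical data).
rng = np.random.default_rng(0)
docs = [
    "election results announced by the government",
    "government wins election after recount",
    "storm floods coastal towns",
    "coastal storm warning issued for towns",
]
counts, vocab = count_vectors(docs)
reduced = random_mapping(counts, dim=8, rng=rng)
latent = svd_subspace(reduced, rank=2)
units = train_som(latent, n_units=3, epochs=20, rng=rng)
smoothed = smooth(latent, units)
print(smoothed.shape)  # (4, 2)
```

Blending a document vector with its best-matching unit pulls short or error-prone documents toward the nearest cluster, which is the intuition the abstract gives for smoothing; the actual weighting used in the paper may differ.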