Fast latent semantic indexing of spoken documents by using self-organizing maps

Author: M. Kurimo

DOI: 10.1109/ICASSP.2000.859331

Abstract: This paper describes a new latent semantic indexing (LSI) method for spoken audio documents. The framework is indexing broadcast news from radio and TV as a combination of large vocabulary continuous speech recognition (LVCSR), natural language processing (NLP), and information retrieval (IR). For indexing, the documents are presented as vectors of word counts, whose dimensionality is rapidly reduced by random mapping (RM). The obtained vectors are projected into the latent semantic subspace determined by SVD, where the vectors are then smoothed by a self-organizing map (SOM). The smoothing by the closest document clusters is important here, because the documents are often short and have a high word error rate (WER). As the clusters in the SOM reflect the news topics, the SOMs provide an easy way to visualize the index and query results and to explore the database. Test results are reported for TREC's spoken document retrieval databases (www.idiap.ch/kurimo/thisl.html).
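The pipeline named in the abstract (word counts, random mapping, SVD projection, SOM smoothing) can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the toy Poisson counts, the Gaussian projection matrix, the 3x3 map, the learning-rate and neighbourhood schedules, and the rule of averaging the two closest codebook vectors are all assumptions chosen only to make the steps concrete, and every function name is hypothetical.

```python
import numpy as np

def random_mapping(counts, out_dim, rng):
    """Reduce dimensionality by multiplying with a random Gaussian matrix (assumed form of RM)."""
    proj = rng.standard_normal((counts.shape[1], out_dim)) / np.sqrt(out_dim)
    return counts @ proj

def svd_project(vectors, k):
    """Project rows onto the subspace spanned by the top-k right singular vectors."""
    _, _, vt = np.linalg.svd(vectors, full_matrices=False)
    return vectors @ vt[:k].T

def train_som(data, rows, cols, epochs, rng, lr0=0.5, sigma0=1.0):
    """Train a small rectangular SOM; returns a (rows*cols, dim) codebook."""
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise the codebook from randomly chosen data vectors.
    codebook = data[rng.integers(0, len(data), rows * cols)].copy()
    steps = epochs * len(data)
    t = 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = 1.0 - t / steps  # decays linearly from 1 toward 0
            # Best-matching unit: the codebook vector closest to the sample.
            bmu = np.argmin(np.linalg.norm(codebook - x, axis=1))
            # Gaussian neighbourhood on the map grid, shrinking over time.
            grid_dist2 = np.sum((grid - grid[bmu]) ** 2, axis=1)
            h = np.exp(-grid_dist2 / (2.0 * (sigma0 * frac + 1e-3) ** 2))
            codebook += (lr0 * frac) * h[:, None] * (x - codebook)
            t += 1
    return codebook

def smooth(vectors, codebook, n_closest=2):
    """Replace each vector by the mean of its n closest codebook vectors (assumed smoothing rule)."""
    dists = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :n_closest]
    return codebook[nearest].mean(axis=1)

rng = np.random.default_rng(0)
# Toy stand-in for recognizer output: 40 documents over a 500-word vocabulary.
counts = rng.poisson(0.3, size=(40, 500)).astype(float)
reduced = random_mapping(counts, out_dim=50, rng=rng)   # 500 -> 50 dimensions
latent = svd_project(reduced, k=10)                     # 50 -> 10 dimensions
codebook = train_som(latent, rows=3, cols=3, epochs=10, rng=rng)
index = smooth(latent, codebook)                        # smoothed document vectors
print(index.shape)
```

Running the SVD after the random mapping means the decomposition operates on a 40 x 50 matrix instead of the 40 x 500 count matrix, which is the speed-up the abstract attributes to RM. A query could be passed through the same projection and compared with the rows of `index`, for example by cosine similarity, though that step is not shown here.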
