Self organization of a massive text document collection

Authors: Teuvo Kohonen, Samuel Kaski, Krista Lagus, Jarkko Salojärvi, Jukka Honkela

DOI: 10.1016/B978-044450270-4/50013-9

Abstract: Publisher Summary. This chapter discusses how, when the self-organizing map (SOM) is applied to the mapping of documents, one can represent them statistically by their weighted word frequency histograms, or by some reduced representations of the histograms that can be regarded as data vectors. One SOM of about seven million documents has been made, viz., of all the patent abstracts in the world that have been written in English and are available in electronic form. The map consists of about one million models. Keywords or key texts can be used to search for the most relevant documents first. New, effective coding and computational schemes of the mapping are described. The document organization, searching, and browsing system called WEBSOM is also described in this chapter. The original WEBSOM was a two-level SOM architecture, but it was later simplified.
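The idea summarized in the abstract, documents represented as weighted word frequency histograms and then mapped onto a SOM, can be illustrated with a minimal sketch. This is not the chapter's method: the IDF-style weighting, the tiny 2x2 grid, the linear decay schedules, and all function names (`weighted_histograms`, `best_matching_unit`, `train_som`) are assumptions chosen for illustration, and the sample documents are invented. The coding and computational schemes that the chapter describes for millions of documents are not reproduced here.

```python
import math
import random
from collections import Counter


def weighted_histograms(docs):
    """Turn each document into a weighted, length-normalized word histogram."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    n_docs = len(docs)
    # Document frequency: in how many documents each word occurs.
    df = Counter(w for toks in tokenized for w in set(toks))
    # Assumed weighting: smoothed inverse document frequency.
    idf = {w: math.log((1 + n_docs) / (1 + df[w])) + 1.0 for w in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vec = [counts[w] * idf[w] for w in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors


def best_matching_unit(models, vec):
    """Index of the model vector closest (squared Euclidean) to vec."""
    return min(
        range(len(models)),
        key=lambda i: sum((m - x) ** 2 for m, x in zip(models[i], vec)),
    )


def train_som(vectors, rows=2, cols=2, epochs=50, seed=0):
    """Train a tiny rectangular SOM with a Gaussian neighborhood."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    models = [[rng.random() for _ in range(dim)] for _ in grid]
    for epoch in range(epochs):
        frac = epoch / epochs
        alpha = 0.5 * (1.0 - frac)   # learning rate decays linearly
        sigma = 0.3 + (1.0 - frac)   # neighborhood radius shrinks
        for vec in vectors:
            br, bc = grid[best_matching_unit(models, vec)]
            for i, (r, c) in enumerate(grid):
                d2 = (r - br) ** 2 + (c - bc) ** 2
                h = alpha * math.exp(-d2 / (2 * sigma ** 2))
                models[i] = [m + h * (x - m) for m, x in zip(models[i], vec)]
    return grid, models


# Invented sample texts standing in for patent abstracts.
docs = [
    "laser diode optical amplifier",
    "optical fiber laser amplifier",
    "engine piston cylinder valve",
    "valve engine fuel piston",
]
vocab, vectors = weighted_histograms(docs)
grid, models = train_som(vectors)
# Each document is assigned to the grid node of its best-matching model.
doc_nodes = [grid[best_matching_unit(models, v)] for v in vectors]
print(doc_nodes)
```

In a sketch like this, a keyword search would build a query vector over the same vocabulary and look up its best-matching node, so that documents assigned to that node and its neighbors are inspected first.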

References (2)
Samuel Kaski, Timo Honkela, Krista Lagus, Teuvo Kohonen, WEBSOM - Self-Organizing Maps of Document Collections. Neurocomputing, vol. 21, pp. 101-117 (1998). DOI: 10.1016/S0925-2312(98)00039-3
Scott Deerwester, Susan T. Dumais, George W. Furnas, Thomas K. Landauer, Richard Harshman, Indexing by Latent Semantic Analysis. Journal of the Association for Information Science and Technology, vol. 41, pp. 391-407 (1990). DOI: 10.1002/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9