Authors: Alexander Hinneburg, Andrea Porzel, Karina Wolfram
DOI: 10.1007/978-3-540-71233-6_33
Keywords: NMR spectra database, Mathematics, Vector space model, Task (computing), Function (mathematics), Nearest neighbor search, Information retrieval, Probabilistic latent semantic analysis, Similarity (geometry), Simple (abstract algebra)
Abstract: Searching and mining nuclear magnetic resonance (NMR) spectra of naturally occurring substances is an important task in the investigation of new, potentially useful chemical compounds. Multi-dimensional NMR-spectra are relational objects like documents, but they consist of continuous multi-dimensional points, called peaks, instead of words. We develop several mappings from continuous NMR-spectra to discrete, text-like data. With the help of these mappings, any text retrieval method can be applied. We evaluate the performance of two methods, namely the standard vector space model and probabilistic latent semantic indexing (PLSI). PLSI learns hidden topics in the data, which in the case of 2D-NMR data are interesting in their own right. Additionally, we develop a simple direct similarity function to detect duplicates of NMR-spectra. Our experiments show that the vector space model as well as PLSI, both designed for text created by humans, can effectively handle the mapped NMR-data originating from natural products. Moreover, PLSI is able to find meaningful "topics" in the NMR-data.
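The abstract does not specify how peaks are turned into "words". The sketch below illustrates the general idea under one assumed mapping: hard binning of each 2D peak (1H shift, 13C shift) into a grid cell whose label serves as a token, followed by standard vector space retrieval with TF-IDF weights and cosine similarity. The bin widths `H_STEP` and `C_STEP`, the helper names `peaks_to_tokens`, `tf_idf_vectors` and `cosine`, the smoothed IDF variant, and the toy peak lists are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

# Hypothetical bin widths in ppm for the two chemical-shift axes of a 2D spectrum
# (e.g. 1H and 13C). Illustrative values only, not from the paper.
H_STEP = 0.1
C_STEP = 2.0


def peaks_to_tokens(peaks, h_step=H_STEP, c_step=C_STEP):
    """Map continuous 2D peaks (h_shift, c_shift) to discrete grid-cell "words"."""
    tokens = []
    for h_shift, c_shift in peaks:
        h_bin = math.floor(h_shift / h_step)
        c_bin = math.floor(c_shift / c_step)
        tokens.append(f"h{h_bin}_c{c_bin}")
    return tokens


def tf_idf_vectors(docs):
    """Weight each token by raw term frequency times a smoothed inverse document frequency."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each token once per document
    vectors = []
    for doc in docs:
        term_freq = Counter(doc)
        vectors.append({
            token: count * (math.log((1 + n_docs) / (1 + doc_freq[token])) + 1.0)
            for token, count in term_freq.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(token, 0.0) for token, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Toy peak lists (1H ppm, 13C ppm); made-up values for illustration only.
    spectra = [
        [(1.23, 20.5), (3.61, 62.1), (5.33, 128.4)],
        [(1.25, 20.9), (3.64, 62.7), (5.36, 129.1)],  # slightly shifted copy of spectrum 0
        [(7.26, 77.0), (2.05, 30.2)],
    ]
    docs = [peaks_to_tokens(peaks) for peaks in spectra]
    vectors = tf_idf_vectors(docs)
    query = vectors[0]
    # Rank all spectra by cosine similarity to the query spectrum.
    ranking = sorted(range(len(vectors)), key=lambda i: cosine(query, vectors[i]), reverse=True)
    print(ranking)
```

Hard binning is the simplest possible mapping, and it has an obvious weakness: two nearly identical peaks that straddle a bin edge receive different tokens. Once spectra are token lists, any text model, including a topic model such as PLSI, can be trained on them unchanged; the paper's own mappings may handle bin boundaries differently.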