An Evaluation of Text Retrieval Methods for Similarity Search of Multi-dimensional NMR-Spectra

Authors: Alexander Hinneburg, Andrea Porzel, Karina Wolfram

Keywords: NMR spectra database, Mathematics, Vector space model, Task (computing), Function (mathematics), Nearest neighbor search, Information retrieval, Probabilistic latent semantic analysis, Similarity (geometry), Simple (abstract algebra)

Abstract: Searching and mining nuclear magnetic resonance (NMR) spectra of naturally occurring substances is an important task in the investigation of new, potentially useful chemical compounds. Multi-dimensional NMR-spectra are relational objects like documents, but consist of continuous multi-dimensional points called peaks instead of words. We develop several mappings from continuous NMR-spectra to discrete text-like data. With the help of those mappings, any text retrieval method can be applied. We evaluate the performance of two retrieval methods, namely the standard vector space model and probabilistic latent semantic indexing (PLSI). PLSI learns hidden topics in the data, which in the case of 2D-NMR data is interesting in its own right. Additionally, we develop and evaluate a simple direct similarity function, which can detect duplicates of NMR-spectra. Our experiments show that the vector space model as well as PLSI, both designed for text data created by humans, can effectively handle the mapped NMR-data originating from natural products. Additionally, PLSI is able to find meaningful "topics" in the NMR-data.
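The abstract does not spell out the mappings themselves. As an illustration only, the following minimal sketch assumes one simple possibility: each 2D peak (1H shift, 13C shift, in ppm) is assigned to a fixed grid cell, and the cell label serves as a "word". The resulting token lists are then compared with a standard vector space model (TF-IDF weights and cosine similarity). The function names, cell sizes, and example spectra are hypothetical and are not taken from the paper; PLSI is not shown.

```python
import math
from collections import Counter


def peaks_to_words(peaks, cell_h=0.5, cell_c=5.0):
    """Map continuous 2D peaks (1H ppm, 13C ppm) to discrete grid-cell tokens.

    The cell sizes are illustrative assumptions, not values from the paper.
    """
    return [f"h{math.floor(h / cell_h)}_c{math.floor(c / cell_c)}" for h, c in peaks]


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) for a list of token lists."""
    n = len(docs)
    # Document frequency: in how many spectra does each token occur?
    df = Counter(word for doc in docs for word in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed inverse document frequency keeps every weight positive.
        vectors.append(
            {w: tf[w] * (math.log((1 + n) / (1 + df[w])) + 1.0) for w in tf}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical spectra: b is a slightly shifted copy of a, c is unrelated.
spectrum_a = [(1.20, 22.5), (3.65, 71.0), (7.30, 128.4)]
spectrum_b = [(1.24, 23.1), (3.70, 72.3), (7.35, 129.0)]
spectrum_c = [(0.90, 14.0), (2.10, 30.5), (5.40, 101.2)]

docs = [peaks_to_words(s) for s in (spectrum_a, spectrum_b, spectrum_c)]
vectors = tfidf_vectors(docs)

print(docs[0])
print(cosine(vectors[0], vectors[1]))  # a vs. shifted copy b
print(cosine(vectors[0], vectors[2]))  # a vs. unrelated c
```

A hard grid like this has an obvious weakness: two peaks that lie close together but on opposite sides of a cell boundary receive different tokens, so small measurement shifts can hide a match.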


References (13)

Karina Wolfram, Andrea Porzel, Alexander Hinneburg. Similarity Search for Multi-dimensional NMR-Spectra of Natural Products. Lecture Notes in Computer Science, pp. 650-658 (2006). doi:10.1007/11871637_67

David M. Blei, Andrew Y. Ng, Michael I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, vol. 3, pp. 993-1022 (2003). doi:10.5555/944919.944937

Athanasios Tsipouras, John Ondeyka, Claude Dufresne, Seok Lee, Gino Salituro, Nancy Tsou, Michael Goetz, Sheo Bux Singh, Simon K. Kearsley. Using similarity searches over databases of estimated 13C NMR spectra for structure identification of natural product compounds. Analytica Chimica Acta, vol. 316, pp. 161-171 (1995). doi:10.1016/0003-2670(95)00322-Q

Christoph Steinbeck, Stefan Krause, Stefan Kuhn. NMRShiftDB-constructing a free chemical information system with open-source components. Journal of Chemical Information and Computer Sciences, vol. 43, pp. 1733-1739 (2003). doi:10.1021/CI0341363

Qiaozhu Mei, ChengXiang Zhai. Discovering evolutionary theme patterns from text: an exploration of temporal text mining. Knowledge Discovery and Data Mining, pp. 198-207 (2005). doi:10.1145/1081870.1081895

António S. Barros, Douglas N. Rutledge. Segmented principal component transform–principal component analysis. Chemometrics and Intelligent Laboratory Systems, vol. 78, pp. 125-137 (2005). doi:10.1016/J.CHEMOLAB.2005.01.003

Margit Farkas, János Bendl, Dieter H. Welti, Ernö Pretsch, Stephan Dütsch, Pius Portmann, Martin Zürcher, Jean-Thomas Clerc. Similarity search for a 1H-NMR spectroscopic data base. Analytica Chimica Acta, vol. 206, pp. 173-187 (1988). doi:10.1016/S0003-2670(00)80840-5

Alexandrin Popescul, Steve Lawrence, Lyle H. Ungar, David M. Pennock. Probabilistic Models for Unified Collaborative and Content-Based Recommendation in Sparse-Data Environments. Uncertainty in Artificial Intelligence, pp. 437-444 (2001).

Thomas Hofmann. Probabilistic latent semantic indexing. International ACM SIGIR Conference on Research and Development in Information Retrieval, vol. 51, pp. 50-57 (1999). doi:10.1145/3130348.3130370


Mark Steyvers, Padhraic Smyth, Michal Rosen-Zvi, Thomas Griffiths. Probabilistic author-topic models for information discovery. Knowledge Discovery and Data Mining, pp. 306-315 (2004). doi:10.1145/1014052.1014087
