SECIMTools: a suite of metabolomics data analysis tools.

Authors: Alexander S Kirpich, Miguel Ibarra, Oleksandr Moskalenko, Justin M Fear, Joseph Gerken

DOI: 10.1186/S12859-018-2134-1

Abstract: Metabolomics has the promise to transform the area of personalized medicine with the rapid development of high throughput technology for untargeted analysis of metabolites. Open access, easy to use, analytic tools that are broadly accessible to the biological community need to be developed. While the technology used in metabolomics varies, most metabolomics studies have a set of features identified. Galaxy is an open access platform that enables scientists at all levels to interact with big data. Galaxy promotes reproducibility by saving histories and enabling the sharing of workflows among scientists. SECIMTools (SouthEast Center for Integrated Metabolomics) is a set of Python applications that are available both as standalone tools and wrapped for use in Galaxy. The suite includes a comprehensive set of quality control metrics (retention time window evaluation and various peak evaluation tools), visualization techniques (hierarchical cluster heatmap, principal component analysis, modular modularity clustering), basic statistical analysis methods (partial least squares discriminant analysis, analysis of variance, t-test, Kruskal-Wallis non-parametric test), advanced classification methods (random forest, support vector machines), and advanced variable selection tools (least absolute shrinkage and selection operator, LASSO, and Elastic Net). SECIMTools leverages the Galaxy platform and enables integrated workflows for metabolomics data analysis made from building blocks designed for easy use and interpretability. Standard data formats and a set of utilities allow arbitrary linkages between tools to encourage novel workflow designs. The Galaxy framework enables future data integration for metabolomics studies with other omics data.
