Applications of species accumulation curves in large-scale biological data analysis.

Authors: Chao Deng, Timothy Daley, Andrew Smith

DOI: 10.1007/S40484-015-0049-7

Abstract: The species accumulation curve, or collector’s curve, of a population gives the expected number of observed species or distinct classes as a function of sampling effort. Species accumulation curves allow researchers to assess and compare diversity across populations or to evaluate the benefits of additional sampling. Traditional applications have focused on ecological populations, but emerging large-scale applications, for example in DNA sequencing, are orders of magnitude larger and present new challenges. We developed a method to estimate accumulation curves for predicting the complexity of DNA sequencing libraries. This method uses rational function approximations to a classical nonparametric empirical Bayes estimator due to Good and Toulmin [Biometrika, 1956, 43, 45–63]. Here we demonstrate how the same approach can be highly effective in other large-scale applications involving biological data sets. These include estimating microbial species richness, immune repertoire size, and k-mer diversity for genome assembly applications. We show how the method can be modified to address populations containing an effectively infinite number of species, where saturation cannot practically be attained. We also introduce a flexible suite of tools implemented as an R package that make these methods broadly accessible.
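As a rough illustration of the classical Good and Toulmin estimator named in the abstract, the sketch below extrapolates the number of distinct classes from a counts histogram using the plain alternating power series. It is not the authors' rational function approximation and not the API of their R package; the function names and the toy sample are hypothetical, and the series is only expected to behave well when the additional sample is no larger than the original (0 <= t <= 1).

```python
from collections import Counter


def counts_histogram(observations):
    """Return {j: n_j}, where n_j is the number of classes seen exactly j times."""
    per_class = Counter(observations)      # class -> number of times observed
    return Counter(per_class.values())     # j -> number of classes with count j


def good_toulmin_new_classes(histogram, t):
    """Expected number of new classes in a further sample t times the original size.

    Plain Good-Toulmin power series: sum over j of (-1)^(j+1) * t^j * n_j.
    Assumed reliable only for 0 <= t <= 1; it can diverge for larger t.
    """
    return sum((-1) ** (j + 1) * t ** j * n_j for j, n_j in histogram.items())


def expected_distinct(histogram, t):
    """Point on the extrapolated accumulation curve at relative effort 1 + t."""
    observed = sum(histogram.values())     # distinct classes already seen
    return observed + good_toulmin_new_classes(histogram, t)


# Hypothetical toy sample: each letter stands for one observed class (e.g. a read).
reads = ["a", "a", "a", "b", "b", "c", "d", "e"]
hist = counts_histogram(reads)

if __name__ == "__main__":
    for t in (0.0, 0.5, 1.0):
        print(t, expected_distinct(hist, t))
```

The divergence of this series for t > 1 is the limitation that the rational function approximations described in the abstract are meant to address.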
