ProdMX: Rapid query and analysis of protein functional domain based on compressed sparse matrices.

Authors: Visanu Wanchai, Intawat Nookaew, David W. Ussery

DOI: 10.1016/J.CSBJ.2020.10.023

Keywords: Domain (software engineering); Set (abstract data type); Source code; Sequence; Theoretical computer science; Computer science; Sparse matrix; Python (programming language); Domain analysis; Matrix (mathematics)

Abstract: Large-scale protein analysis has been used to characterize large numbers of proteins across numerous species. One of the applications is its use as a high-throughput screening method for the pathogenicity of genomes. Unlike sequence homology methods, comparison at the functional level provides us with a unique opportunity to classify proteins based on their functional structures, without dealing with the sequence complexity of distantly related species. Protein functions can be abstractly described by a set of protein functional domains, such as PfamA domains; a set of genomes can then be mapped to a matrix, with each row representing a genome and the columns representing the presence or absence of a given functional domain. However, a powerful tool is needed to analyze the large sparse matrices generated by the millions of genomes that will become available in the near future. ProdMX is a tool with user-friendly utilities developed to facilitate high-throughput analysis of proteins, with an ability to be included as an effective module in a high-throughput pipeline. ProdMX employs a compressed sparse matrix algorithm to reduce the computational resources and time used to perform matrix manipulation during functional domain analysis. ProdMX is a free and publicly available Python package which can be installed with popular package managers such as PyPI and Conda, or with a standard installer from the source code available on the ProdMX GitHub repository at https://github.com/visanuwan/prodmx.
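To make the matrix representation in the abstract concrete, the following is a minimal sketch of a genome-by-domain presence/absence matrix stored in compressed sparse row (CSR) form. It is written in plain Python for illustration and is not ProdMX's API: the genome names, the domain accessions, and the helper functions `build_csr` and `genomes_with` are all hypothetical.

```python
# Hypothetical toy data: genome name -> set of Pfam-style domain accessions.
genome_domains = {
    "genome_A": {"PF00001", "PF00005"},
    "genome_B": {"PF00005"},
    "genome_C": {"PF00001", "PF00072", "PF00005"},
}


def build_csr(genome_domains):
    """Build CSR arrays for a binary genome x domain matrix.

    Only the column positions of present domains are stored, so memory
    grows with the number of nonzero entries, not rows x columns.
    """
    genomes = sorted(genome_domains)                         # row labels
    domains = sorted(set().union(*genome_domains.values()))  # column labels
    col = {d: j for j, d in enumerate(domains)}
    indptr, indices = [0], []
    for g in genomes:
        # Column indices of the domains present in this genome (one row).
        indices.extend(sorted(col[d] for d in genome_domains[g]))
        # indptr[i]:indptr[i + 1] delimits row i inside `indices`.
        indptr.append(len(indices))
    return genomes, domains, indptr, indices


def genomes_with(domain, genomes, domains, indptr, indices):
    """Return the genomes whose row contains the given domain."""
    j = domains.index(domain)
    return [
        g for i, g in enumerate(genomes)
        if j in indices[indptr[i]:indptr[i + 1]]
    ]


genomes, domains, indptr, indices = build_csr(genome_domains)
hits = genomes_with("PF00001", genomes, domains, indptr, indices)
```

Because a binary matrix has only ones as nonzero values, the usual CSR `data` array is omitted here. For real workloads a library implementation such as SciPy's `scipy.sparse.csr_matrix` would be the more practical choice.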

References (25)
Helen Cook, David W. Ussery. Sigma factors in a thousand E. coli genomes. Environmental Microbiology, vol. 15, pp. 3121-3129 (2013). 10.1111/1462-2920.12236
Chris P. Ponting, Robert R. Russell. The Natural History of Protein Domains. Annual Review of Biophysics and Biomolecular Structure, vol. 31, pp. 45-71 (2002). 10.1146/ANNUREV.BIOPHYS.31.082901.134314
Ken J Kalafus, Andrew R Jackson, Aleksandar Milosavljevic. Pash: Efficient Genome-Scale Sequence Anchoring by Positional Hashing. Genome Research, vol. 14, pp. 672-678 (2004). 10.1101/GR.1963804
Lewis Y Geer, Michael Domrachev, David J Lipman, Stephen H Bryant. CDART: Protein Homology by Domain Architecture. Genome Research, vol. 12, pp. 1619-1623 (2002). 10.1101/GR.278202
Alejandro Barrera, Ana Alastruey-Izquierdo, María J Martín, Isabel Cuesta, Juan Antonio Vizcaíno. Analysis of the Protein Domain and Domain Architecture Content in Fungi and Its Application in the Search of New Antifungal Targets. PLoS Computational Biology, vol. 10, e1003733 (2014). 10.1371/JOURNAL.PCBI.1003733
Leon M. Arriola, James M. Hyman. Being Sensitive to Uncertainty. Computing in Science and Engineering, vol. 9, pp. 10-20 (2007). 10.1109/MCSE.2007.27
V. Hollich, E. L.L. Sonnhammer. PfamAlyzer: domain-centric homology search. Bioinformatics, vol. 23, pp. 3382-3383 (2007). 10.1093/BIOINFORMATICS/BTM521
Travis E. Oliphant. Python for Scientific Computing. Computing in Science and Engineering (2007).
Robert D Finn, Jaina Mistry, Benjamin Schuster-Böckler, Sam Griffiths-Jones, Volker Hollich, Timo Lassmann, Simon Moxon, Mhairi Marshall, Ajay Khanna, Richard Durbin, Sean R Eddy, Erik LL Sonnhammer, Alex Bateman. Pfam: clans, web tools and services. Nucleic Acids Research, vol. 34, pp. 247-251 (2006). 10.1093/NAR/GKJ149