Authors: Visanu Wanchai, Intawat Nookaew, David W. Ussery
DOI: 10.1016/J.CSBJ.2020.10.023
Keywords: Domain (software engineering), Set (abstract data type), Source code, Sequence, Theoretical computer science, Computer science, Sparse matrix, Python (programming language), Domain analysis, Matrix (mathematics)
Abstract: Large-scale protein analysis has been used to characterize large numbers of proteins across numerous species. One of the applications is its use as a high-throughput screening method for the pathogenicity of genomes. Unlike sequence homology methods, comparison at the functional level provides us with a unique opportunity to classify proteins based on their structures, without dealing with the sequence complexity of distantly related species. Protein functions can be abstractly described by a set of domains, such as PfamA domains; genomes can then be mapped to a matrix, with each row representing a genome and the columns representing the presence or absence of a given domain. However, a powerful tool is needed to analyze the large sparse matrices generated by the millions of genomes that will become available in the near future. ProdMX is a tool with user-friendly utilities developed to facilitate high-throughput analysis of proteins, with the ability to be included as an effective module in a high-throughput pipeline. It employs a compressed sparse matrix algorithm to reduce the computational resources and time used to perform matrix manipulation during domain analysis. ProdMX is a free and publicly available Python package, which can be installed with popular package managers such as PyPI and Conda, or with a standard installer from the source code available on the GitHub repository at https://github.com/visanuwan/prodmx.
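To make the matrix idea in the abstract concrete, the following is a minimal sketch of a genome-by-domain presence/absence matrix stored in compressed sparse row (CSR) form. It is not ProdMX's API or implementation: the function names (`build_csr`, `has_domain`, `domain_counts`), the genome labels, and the assignment of Pfam accessions to genomes are all hypothetical, chosen only for illustration.

```python
from bisect import bisect_left

# Illustrative sketch only; this is NOT ProdMX code or its API.
# Genome labels and their domain assignments below are hypothetical toy data.


def build_csr(genome_domains):
    """Build a binary presence/absence matrix in CSR form.

    Row i's present columns are indices[indptr[i]:indptr[i + 1]].
    No values array is stored because every stored entry is 1.
    """
    genomes = sorted(genome_domains)
    domains = sorted({d for ds in genome_domains.values() for d in ds})
    column_of = {d: j for j, d in enumerate(domains)}
    indptr, indices = [0], []
    for genome in genomes:
        # Keep column indices sorted within each row to allow binary search.
        indices.extend(sorted(column_of[d] for d in set(genome_domains[genome])))
        indptr.append(len(indices))
    return genomes, domains, indptr, indices


def has_domain(indptr, indices, row, column):
    """Return True if the matrix entry (row, column) is 1."""
    lo, hi = indptr[row], indptr[row + 1]
    k = bisect_left(indices, column, lo, hi)
    return k < hi and indices[k] == column


def domain_counts(indices, n_columns):
    """Count how many genomes carry each domain (column sums)."""
    counts = [0] * n_columns
    for j in indices:
        counts[j] += 1
    return counts


toy = {
    "genome_A": {"PF00001", "PF00005"},
    "genome_B": {"PF00005"},
    "genome_C": {"PF00001", "PF00005", "PF00072"},
}
genomes, domains, indptr, indices = build_csr(toy)
counts = domain_counts(indices, len(domains))
```

Only the positions of the present domains are stored, so memory grows with the number of genome-domain hits rather than with the full genomes-by-domains grid, which is the general motivation for compressed sparse formats when most entries are zero.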