More than 1,000 putative new human signalling proteins revealed by EST data mining.

Authors: Jörg Schultz, Tobias Doerks, Chris P. Ponting, Richard R. Copley, Peer Bork

DOI: 10.1038/76069

Abstract: Cloning procedures aided by homology searches of EST databases have accelerated the pace of discovery of new genes, but EST database searching remains an involved and onerous task. More than 1.6 million human EST sequences have been deposited in public databases, making it difficult to identify ESTs that represent new genes. Compounding the problems of scale are difficulties in detection associated with a high sequencing error rate and low sequence similarity between distant homologues. We have developed a new method, coupling BLAST-based searches with a domain identification protocol, that filters candidate homologues. Application of this method in a large-scale analysis of 100 signalling domain families has led to the identification of ESTs representing more than 1,000 novel human signalling genes. The 4,206 publicly available ESTs representing these genes are a valuable resource for rapid cloning of novel human signalling proteins. For example, we were able to identify ESTs of at least 106 new small GTPases, of which 6 are likely to belong to new subfamilies. In some cases, further analyses of genomic DNA led to the discovery of previously unidentified full-length protein sequences. This is exemplified by the in silico cloning (prediction of a gene product sequence using only genomic and EST sequence data) of a new type of GTPase with two catalytic domains.

References (12)
Shamil Sunyaev, Jens Hanke, David Brett, Atakan Aydin, Inga Zastrow, Warren Lathe, Peer Bork, Jens Reich, Individual variation in protein-coding sequences of human genome. Advances in Protein Chemistry, vol. 54, pp. 409-437 (2000), 10.1016/S0065-3233(00)54012-1
Meredith Wadman, Human Genome Project aims to finish 'working draft' next year. Nature, vol. 398, p. 177 (1999), 10.1038/18250
Peer Bork, Toby J. Gibson, Applying motif and profile searches. Methods in Enzymology, vol. 266, pp. 162-184 (1996), 10.1016/S0076-6879(96)66013-3
William R. Pearson, Kevin R. Lynch, Jacques D. Retief, Panning for genes--A visual strategy for identifying novel gene orthologs and paralogs. Genome Research, vol. 9, pp. 373-382 (1999), 10.1101/GR.9.4.373
J. Schultz, F. Milpetz, P. Bork, C. P. Ponting, SMART, a simple modular architecture research tool: Identification of signaling domains. Proceedings of the National Academy of Sciences of the United States of America, vol. 95, pp. 5857-5864 (1998), 10.1073/PNAS.95.11.5857
Anne Mette Wolff, Jens G. Litske Petersen, Torsten Nilsson-Tillgren, Nanni Din, The open reading frame YAL048c affects the secretion of proteinase A in S. cerevisiae. Yeast, vol. 15, pp. 427-434 (1999), 10.1002/(SICI)1097-0061(19990330)15:5<427::AID-YEA362>3.0.CO;2-5
Gregory D. Schuler, Pieces of the puzzle: expressed sequence tags and the catalog of human genes. Journal of Molecular Medicine, vol. 75, pp. 694-698 (1997), 10.1007/S001090050155
Akhilesh Pandey, Fran Lewitter, Nucleotide sequence databases: a gold mine for biologists. Trends in Biochemical Sciences, vol. 24, pp. 276-280 (1999), 10.1016/S0968-0004(99)01400-0
Jörg Schultz, Richard R. Copley, Tobias Doerks, Chris P. Ponting, Peer Bork, SMART: a web-based tool for the study of genetically mobile domains. Nucleic Acids Research, vol. 28, pp. 231-234 (2000), 10.1093/NAR/28.1.231
Richard Durbin, Ewan Birney, Dynamite: A Flexible Code Generating Language for Dynamic Programming Methods Used in Sequence Comparison. Intelligent Systems in Molecular Biology, vol. 5, pp. 56-64 (1997)