Authors: Jörg Schultz, Tobias Doerks, Chris P. Ponting, Richard R. Copley, Peer Bork
DOI: 10.1038/76069
Keywords:
Abstract: Cloning procedures aided by homology searches of EST databases have accelerated the pace of discovery of new genes, but database searching remains an involved and onerous task. More than 1.6 million human sequences have been deposited in public databases, making it difficult to identify ESTs that represent genes. Compounding the problems of scale are difficulties in detection associated with a high sequencing error rate and low sequence similarity between distant homologues. We have developed a method that couples BLAST-based searching with a domain identification protocol and filters candidate ESTs. Application of this method to the large-scale analysis of 100 signalling families has led to the identification of ESTs representing more than 1,000 novel genes. The 4,206 publicly available ESTs of these genes are a valuable resource for the rapid cloning of the corresponding proteins. For example, we were able to identify at least 106 small GTPases, of which 6 are likely to belong to new subfamilies. In some cases, further analyses of genomic DNA revealed previously unidentified full-length protein sequences. This is exemplified by the in silico cloning (prediction of a gene product using only genomic data) of a new type of GTPase with two catalytic domains.