System and method to extract models from semi-structured documents

Authors: Anuradha Bhamidipaty, Rema Ananthanarayanan, Biplav Srivastava, Vibha Singhal Sinha, Debdoot Mukherjee

Keywords: Global model, Domain (software engineering), Information retrieval, Information model, Domain model, Documentation, Computer science

Abstract: Systems and associated methods for automated and semi-automated building of domain models for documents are described. Embodiments provide an approach to discover an information model by mining documentation about a particular domain captured in the documents. Embodiments classify the documents into one or more types corresponding to concepts using indicative words, identify candidate model elements (concepts) for document types, identify relationships both within and across document types, and consolidate and learn the global model for the domain.
