SEMI-AUTOMATIC EXTRACTION OF LINGUISTIC INFORMATION FOR SYNTACTIC DISAMBIGUATION

Authors: Roberto Basili, Maria Teresa Pazienza, Paola Velardi

DOI: 10.1080/08839519308949994

Keywords: Machine learning, Syntax, Natural language processing, Computer science, Rule-based machine translation, Semantic HTML, Robustness (computer science), Artificial intelligence, Statistical analysis, Semi automatic

Abstract: The robustness of NLP techniques can be improved by the use of "shallow" methods, such as statistical analysis, in combination with traditional knowledge-based methods such as syntax and semantics. This paper describes a hybrid methodology to extract from corpora preference criteria for syntactically ambiguous structures. The method is based on word co-occurrences augmented with syntactic and semantic tags, which we call clustered association data. The proposed method is shown to exhibit a better trade-off between the precision of the acquired data and the amount of manual work required, with respect to other similar algorithms in the literature. Furthermore, the use of tags makes it possible to obtain a statistically relevant number of reliable data even when the application corpus does not exceed 500,000 words.
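As a rough illustration of the idea in the abstract, the sketch below clusters word co-occurrences by semantic tag and uses the clustered counts to rank competing attachments of a prepositional phrase. It is a minimal sketch under stated assumptions, not the authors' algorithm: the tag set, the triples, and the function names `cluster` and `prefer` are all hypothetical, and raw frequency stands in for whatever statistical measure the paper actually uses.

```python
from collections import Counter

# Hypothetical semantic tags; the paper's actual tag set is not given in this record.
SEMANTIC_TAGS = {
    "ship": "MACHINE", "truck": "MACHINE",
    "goods": "GOODS", "wine": "GOODS",
    "send": "ACT", "deliver": "ACT",
}

def cluster(triples, tags=SEMANTIC_TAGS):
    """Count (tag(w1), relation, tag(w2)) patterns over word co-occurrences."""
    counts = Counter()
    for w1, rel, w2 in triples:
        # Words without a tag fall back to themselves.
        counts[(tags.get(w1, w1), rel, tags.get(w2, w2))] += 1
    return counts

def prefer(candidates, clustered, tags=SEMANTIC_TAGS):
    """Pick the candidate attachment whose clustered pattern is most frequent."""
    def score(triple):
        w1, rel, w2 = triple
        return clustered[(tags.get(w1, w1), rel, tags.get(w2, w2))]
    return max(candidates, key=score)

# Invented co-occurrence triples (word, preposition, word), for illustration only.
corpus_triples = [
    ("send", "by", "ship"),
    ("deliver", "by", "truck"),
    ("send", "by", "truck"),
    ("wine", "by", "ship"),
]
clustered = cluster(corpus_triples)

# "deliver goods by ship": does "by ship" attach to the verb or to the noun?
candidates = [("deliver", "by", "ship"), ("goods", "by", "ship")]
best = prefer(candidates, clustered)
print(best)
```

In this toy data the pair "deliver ... by ship" never occurs word for word, yet the verb attachment still wins because the tagged pattern (ACT, by, MACHINE) is supported by three observations. This is the sense in which tags can make a small corpus yield statistically usable counts.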

References (19)
Uri Zernik. Lexicon acquisition: learning from corpus by capitalizing on lexical categories. International Joint Conference on Artificial Intelligence, pp. 1556-1562 (1989).
Rajeev Agarwal, Lois Boggess, Ron Davis. Disambiguation of prepositional phrases in automatically labelled technical text. National Conference on Artificial Intelligence, pp. 155-159 (1991).
Robert F. Simmons, Jungyun Seo. Syntactic graphs: a representation for the union of all ambiguous parse trees. Computational Linguistics, vol. 15, pp. 19-32 (1989).
Patrick Hanks, Kenneth Ward Church. Word association norms, mutual information, and lexicography. Computational Linguistics, vol. 16, pp. 22-29 (1990). DOI: 10.5555/89086.89095
F. Antonacci, M. Russo, M. T. Pazienza, P. Velardi. Representation and control strategies for large knowledge domains: an application to NLP. Applied Artificial Intelligence, vol. 2, pp. 213-249 (1988). DOI: 10.1080/08839518808949909
Frank A. Smadja. From N-Grams to Collocations: An Evaluation of Xtract. Meeting of the Association for Computational Linguistics, pp. 279-284 (1991). DOI: 10.3115/981344.981380
Verónica Dahl. Discontinuous grammars. Computational Intelligence, vol. 5, pp. 161-179 (1990). DOI: 10.1111/J.1467-8640.1989.TB00326.X
Roberto Basili, Maria Teresa Pazienza, Paola Velardi. Computational Lexicons: the Neat Examples and the Odd Exemplars. Conference on Applied Natural Language Processing, pp. 96-103 (1992). DOI: 10.3115/974499.974516
Richard M. Tong. Knowledge-Based Techniques for Information Retrieval. International Journal of Intelligent Systems, vol. 4, pp. 221-222 (1989). DOI: 10.1002/INT.4550040302