Authors: ROBERTO BASILI, MARIA TERESA PAZIENZA, PAOLA VELARDI
DOI: 10.1080/08839519308949994
Keywords: Machine learning, Syntax, Natural language processing, Computer science, Rule-based machine translation, Semantic HTML, Robustness (computer science), Artificial intelligence, Statistical analysis, Semi automatic
Abstract: The robustness of NLP techniques can be improved by the use of “shallow” methods, such as statistical analysis, in combination with traditional knowledge-based methods such as syntax and semantics. This paper describes a hybrid methodology to extract from corpora preference criteria for syntactically ambiguous structures. The method is based on word co-occurrences augmented with syntactic and semantic tags, which we call clustered association data. The proposed method is shown to exhibit a better trade-off between the precision of the acquired data and the amount of manual work required, with respect to other similar algorithms in the literature. Furthermore, the use of these tags makes it possible to obtain a statistically relevant number of reliable data even when the application corpus does not exceed 500,000 words.