Authors: Michael R. Brent, Robert C. Berwick
Keywords: Computer science, Artificial intelligence, Speech recognition, Subcategorization, Verb, Completeness (order theory), Natural language processing, Text corpus
Abstract: This paper describes an implemented program that takes a tagged text corpus and generates a partial list of the subcategorization frames in which each verb occurs. The completeness of the output list increases monotonically with the total occurrences of each verb in the training corpus. False positive rates are one to three percent. Five subcategorization frames are currently detected, and we foresee no impediment to detecting many more. Ultimately, we expect to provide a large subcategorization dictionary to the NLP community and to train dictionaries for specific corpora.