Using Statistics in Lexical Analysis

Author: K.W. Church

Keywords: Linguistics, sort, Salient, Lexical analysis, Introspection, Key Word in Context, Lexical functional grammar, Computer science, Syntax, Lexical density

Abstract: The computational tools available for studying machine-readable corpora are at present still rather primitive. In the more advanced lexicographic organizations, there are concordancing programs (see figure below), which are basically KWIC (key word in context (Aho et al., 1988, p. 122), (Salton, 1989, p. 384)) indexes with additional features such as the ability to extend the context, sort leftwards as well as rightwards, and so on. There is very little interactive software. The lack of software is perhaps part of the reason why dictionaries produced in the United States pay little attention to corpora, and are based more on collections of selected citations, augmented by introspection, than on analysis of whole texts. The situation is somewhat different in Britain. British lexicographers, especially those working on dictionaries for foreign learners, are beginning to depend heavily on corpora. They use the basic tools mentioned above to fill in detailed syntactic descriptions (prompting a move, that will probably dominate lexicography in the 1990s, towards a thorough description of lexical syntax). In the Cobuild project of the 1980s, for example, the typical procedure was that a lexicographer was given the concordances for a word or group of words, marked up the printout with colored pens in order to identify the salient senses, and then wrote the definitions.
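
To make the description of a concordancing program concrete, the following is a minimal sketch of a KWIC index with an adjustable context window and leftward or rightward sorting. It is an illustration only, not the software used by any lexicographic organization mentioned above; the function names (`kwic`, `sort_leftwards`, `sort_rightwards`, `format_rows`), the simple regular-expression tokenizer, and the default window of three words are all assumptions made for this example.

```python
import re


def kwic(text, keyword, width=3):
    """Collect (left, key, right) context triples for each hit of `keyword`.

    `width` is the number of context words kept on each side; raising it
    corresponds to "extending the context". Tokenization is a deliberately
    naive lowercase word split (an assumption of this sketch).
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    rows = []
    for i, tok in enumerate(tokens):
        if tok == keyword.lower():
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            rows.append((left, tok, right))
    return rows


def sort_rightwards(rows):
    """Order hits by the words following the key word, nearest word first."""
    return sorted(rows, key=lambda row: row[2])


def sort_leftwards(rows):
    """Order hits by the words preceding the key word, nearest word first."""
    return sorted(rows, key=lambda row: row[0][::-1])


def format_rows(rows):
    """Render rows as aligned lines with the key word in a fixed column."""
    lefts = [" ".join(left) for left, _, _ in rows]
    pad = max((len(s) for s in lefts), default=0)
    lines = []
    for left_text, (_, key, right) in zip(lefts, rows):
        lines.append(left_text.rjust(pad) + "  [" + key + "]  " + " ".join(right))
    return lines


if __name__ == "__main__":
    sample = "save the whales and save the forests but never save money"
    for line in format_rows(sort_rightwards(kwic(sample, "save", width=2))):
        print(line)
```

Sorting on the reversed left context groups hits by the word immediately before the key word, which is the usual sense of sorting a concordance leftwards; sorting on the right context groups hits by what follows.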

References (20)
Jong-Nae Wang, Jing-Shin Chang, Keh-Yih Su, Mei-Hui Su. A Sequential Truncation Parsing Algorithm Based on the Score Function. International Workshop/Conference on Parsing Technologies, pp. 95-104 (1989).
Stephanie Seneff. Probabilistic Parsing for Spoken Language Applications. International Workshop/Conference on Parsing Technologies, pp. 209-218 (1989).
Leonore Crary Hauck, Stuart Berg Flexner. The Random House Dictionary of the English Language. Random House (1968).
Zellig Sabbettai Harris. Mathematical Structures of Language (1968).
Gerald Salton. Automatic Text Processing (1988).
Steven J. DeRose. Grammatical category disambiguation by statistical optimization. Computational Linguistics, vol. 14, pp. 31-39 (1988). 10.5555/49084.49087
Patrick Hanks, Kenneth Ward Church. Word association norms, mutual information, and lexicography. Computational Linguistics, vol. 16, pp. 22-29 (1990). 10.5555/89086.89095