Identifying dictionary-relevant formulaic sequences in written and spoken corpora

Keywords: Computer science, Language and Linguistics

Abstract: In view of the pervasiveness of formulaic language in human communication and the growing awareness of its relevance to modern lexicography, this study presents a corpus-driven identification, analysis and comparison of dictionary-relevant formulaic sequences in reference corpora of written and spoken Slovenian. The sequences were identified using a semi-automatic approach, whereby the most frequently recurring word combinations in each corpus were ranked according to their statistical salience and manually inspected for formulaic expressions with lexicographic relevance. Despite its semantic heterogeneity, the resulting list illustrates the distinct characteristics of formulaic multi-word expressions, such as high frequency of usage, frequent inclusion of grammatical words and common non-propositional meaning, especially in speech, where the analysis revealed numerous understudied formulaic expressions related to interaction management and mitigation. The final evaluation of the measures used in the identification process demonstrates their relative suitability for corpus-driven identification of dictionary-relevant formulaic expressions, with their precision varying in relation to corpus size and the length of the sequences under investigation.
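The automatic half of the semi-automatic approach described in the abstract (collect recurring word combinations, then rank them by statistical salience before manual inspection) can be illustrated with a minimal sketch. The abstract does not name the measures the study used, so the pointwise mutual information score below is an assumption chosen only for illustration, as are the function names `ngrams` and `rank_candidates` and the `min_freq` threshold.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-token sequences as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def rank_candidates(tokens, n=2, min_freq=2):
    """Rank recurring n-grams by a PMI-style score (illustrative assumption).

    Returns a list of (ngram, frequency, score) tuples, highest score first.
    The measure is not taken from the study; it stands in for whatever
    salience measures were actually evaluated there.
    """
    total_tokens = len(tokens)
    total_ngrams = max(total_tokens - n + 1, 1)
    unigram_counts = Counter(tokens)
    ngram_counts = Counter(ngrams(tokens, n))

    scored = []
    for gram, freq in ngram_counts.items():
        if freq < min_freq:
            continue  # keep only recurring combinations
        observed = freq / total_ngrams
        # Probability expected if the words co-occurred independently
        expected = math.prod(unigram_counts[w] / total_tokens for w in gram)
        scored.append((gram, freq, math.log2(observed / expected)))

    # Highest score first; ties broken by frequency, then alphabetically
    scored.sort(key=lambda item: (-item[2], -item[1], item[0]))
    return scored


if __name__ == "__main__":
    sample = "you know it is you know what it is you know".split()
    for gram, freq, score in rank_candidates(sample, n=2, min_freq=2):
        print(" ".join(gram), freq, round(score, 2))
```

The ranked list produced this way would only be the input to the manual step: a lexicographer would still need to inspect each candidate for formulaic status and dictionary relevance, and the choice of measure would have to be evaluated against corpus size and sequence length, as the abstract notes.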


