Big Data Versus Small Data: The Case of ‘Gripe’ (Flu) in Spanish

Authors: Antonio Moreno-Sandoval, Esteban Moro

DOI: 10.1016/J.SBSPRO.2015.07.452

Keywords: Small data, Data science, Big data, Engineering, Corpus linguistics, Information quality, Field (computer science), Content analysis, Predictive analytics, Data processing

Abstract: Big data is a broad term for data sets so large and complex that traditional data processing applications are inadequate. A new field, Predictive Analytics, is trying to extract value from those big (unstructured) data. In Corpus Linguistics, researchers usually deal with small data. In this paper, we compare the amount and quality of information with respect to a single topic (flu) in Twitter and in MultiMedica (a corpus of medicine texts).

References (2)
Antonio Moreno-Sandoval, Leonardo Campillos-Llanos. Design and Annotation of MultiMedica – A Multilingual Text Corpus of the Biomedical Domain. Procedia - Social and Behavioral Sciences, vol. 95, pp. 33-39 (2013). DOI: 10.1016/J.SBSPRO.2013.10.619
Adam Kilgarriff, Vít Baisa, Jan Bušta, Miloš Jakubíček, Vojtěch Kovář, Jan Michelfeit, Pavel Rychlý, Vít Suchomel. The Sketch Engine: ten years on. Lexicography ASIALEX, vol. 1, pp. 7-36 (2014). DOI: 10.1007/S40607-014-0009-9