Blog annotation: from corpus analysis to automatic tag suggestion

Authors: Ivan Garrido-Marquez, Jorge Garcia Flores, François Lévy, Adeline Nazarenko

DOI: 10.13053/RCS-110-1-8

Keywords: Annotation, Mainstream, Exploit, Spam blog, Information retrieval, Corpus analysis, Diachronic analysis, Computer science, Lexical frequency, Categorization, World Wide Web

Abstract: Nowadays, blogs reach a large audience, and they have risen from the underground to become part of mainstream media. Blogs contain information on diverse topics, personal opinions, and discussions between bloggers and readers. Tags and categories are structural elements of a blog post that increase the blog's visibility and enhance navigation and searching within the blog's history. We suppose that those annotations are made on subjective grounds rather than in a systematic way. Even if there are tools that help bloggers tag and categorize their posts, we still don't know to which extent these tools take into account the information contained in previous posts. This paper presents an 11 million word corpus of blog posts in French dedicated to the study of these questions, and an experiment on category prediction. Preliminary results show that around 27% of the overall tags can be predicted by lexical frequency analysis. However, a first comparison experiment with an existing suggestion tool shows that a significant proportion of the tags used for blog description are not present in the post. Suggestion tools should exploit diachronic analysis of blogs.
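
The abstract does not say how the lexical frequency analysis was carried out, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that candidate tags are simply the most frequent non-stopword tokens of a post, and that a tag counts as "predicted" when it appears among those candidates. The names `STOPWORDS`, `tokenize`, `suggest_tags`, and `predicted_share` are hypothetical, and the stopword list is a tiny illustrative one.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny French stopword list (illustration only).
STOPWORDS = {"le", "la", "les", "de", "des", "du", "un", "une",
             "et", "en", "est", "pour", "dans", "aux"}


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[^\W\d_]+", text.lower())


def suggest_tags(post_text, k=3):
    """Return the k most frequent non-stopword tokens as candidate tags."""
    counts = Counter(t for t in tokenize(post_text) if t not in STOPWORDS)
    return [word for word, _ in counts.most_common(k)]


def predicted_share(posts, k=3):
    """Share of author-assigned tags found among the top-k candidates.

    `posts` is an iterable of (text, tags) pairs; this evaluation setup
    is an assumption, not the protocol used in the paper.
    """
    total = hits = 0
    for text, tags in posts:
        suggested = set(suggest_tags(text, k))
        for tag in tags:
            total += 1
            hits += tag.lower() in suggested
    return hits / total if total else 0.0


posts = [
    ("Une recette de tarte : la tarte aux pommes, une tarte simple",
     ["tarte", "dessert"]),
]
print(suggest_tags(posts[0][0], k=1))  # ['tarte']
print(predicted_share(posts, k=1))     # 0.5
```

In this toy example the tag "dessert" never occurs in the post text, so no frequency-based method restricted to the post itself could recover it. That is the kind of gap the abstract points to when it notes that many tags are not present in the post.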
