Authors: Ivan Garrido-Marquez, Jorge Garcia Flores, François Lévy, Adeline Nazarenko
DOI: 10.13053/RCS-110-1-8
Keywords: Annotation, Mainstream, Exploit, Spam blog, Information retrieval, Corpus analysis, Diachronic analysis, Computer science, Lexical frequency, Categorization, World Wide Web
Abstract: Nowadays, blogs reach a large audience, and they have risen from the underground to become part of the mainstream media. Blogs contain information on diverse topics, personal opinions, and discussions between bloggers and readers. Tags and categories are structural elements of a blog post that increase the blog's visibility and enhance navigation and searching within the blog's history. We suppose that those annotations are made on subjective grounds rather than in a systematic way. Even if there are tools to help bloggers tag and categorize their posts, we still don't know to which extent these tools take into account the information contained in previous posts. This paper presents an 11-million-word corpus of blog posts in French dedicated to the study of these questions, and an experiment on tag and category prediction. Preliminary results show that around 27% of the overall tags can be predicted by lexical frequency analysis. However, a first comparison experiment with an existing tag suggestion tool shows that an important proportion of the tags used for the description of a blog are not present in the post. Tag suggestion tools should exploit the diachronic information of blogs.
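
The abstract does not spell out how the lexical frequency analysis works, so the following is only a minimal sketch of one plausible reading, not the authors' method. It proposes the most frequent content words of a post as candidate tags and measures what share of author-assigned tags literally occur in the post text. The function names `suggest_tags` and `predicted_share`, the tiny `STOPWORDS` set, and the sample post are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny French stopword list (an assumption for
# illustration; a real experiment would need a much fuller list).
STOPWORDS = {"le", "la", "les", "de", "des", "du", "un", "une",
             "et", "en", "pour", "est"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def suggest_tags(post_text, k=3):
    """Return the k most frequent non-stopword tokens as candidate tags."""
    counts = Counter(
        t for t in tokenize(post_text)
        if t not in STOPWORDS and len(t) > 2
    )
    return [word for word, _ in counts.most_common(k)]


def predicted_share(posts):
    """Share of author-assigned tags that occur as tokens in their post.

    `posts` is a list of (text, tags) pairs.
    """
    total = 0
    found = 0
    for text, tags in posts:
        tokens = set(tokenize(text))
        for tag in tags:
            total += 1
            if tag.lower() in tokens:
                found += 1
    return found / total if total else 0.0


# Invented sample post, not taken from the corpus.
post = ("La cuisine française et la cuisine italienne : "
        "recettes de cuisine pour les recettes rapides")
candidates = suggest_tags(post, k=2)
share = predicted_share([(post, ["cuisine", "voyage"])])
```

A tag such as "voyage" that never appears in the post cannot be recovered from that post alone, which is the limitation the abstract points to when it recommends exploiting information from earlier posts.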