Authors: Yangqiu Song, Qiang Yang, Hailong Sun, Chenguang Wang, Ming Zhang
DOI:
Keywords: Smoothness (probability theory), Pattern recognition, Artificial intelligence, Automatic label placement, Similarity (geometry), Computer science, Content (measure theory), Measure (data warehouse), Consistency (database systems)
Abstract: With the recent growth of online content on the Web, there has been more user-generated data with noisy and missing labels, e.g., social tags and voted labels from Amazon's Mechanical Turk. Most machine learning methods, which require accurate label sets, cannot be trusted when the label sets are unreliable. In this paper, we provide a text label refinement algorithm to adjust the labels for such labeled datasets. We assume that the labels can be refined based on the labels with certain confidence, with the similarity between data being consistent with the labels. We propose a label smoothness ratio criterion to measure the consistency of the labels with the data. We demonstrate the effectiveness of the label refining algorithm on eight document datasets, and validate that the results are useful for generating better labels.