Authors: Gabriel Pui Cheong Fung, J.X. Yu, Hongjun Lu, P.S. Yu
DOI: 10.1109/TKDE.2006.16
Keywords:
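Abstract: Traditionally, building a classifier requires two sets of examples: positive examples and negative examples. This paper studies the problem of building a text classifier using positive examples (P) and unlabeled examples (U). The unlabeled examples are mixed with both positive and negative examples. Since no negative example is given explicitly, the task of building a reliable text classifier becomes far more challenging. Simply treating all of the unlabeled examples as negative examples and building a classifier thereafter is undoubtedly a poor approach to tackling this problem. Generally speaking, most studies solved this problem by a two-step heuristic: first, extract negative examples (N) from U. Second, build a classifier based on P and N. Surprisingly, most studies did not try to extract positive examples from U. Intuitively, enlarging P by P' (positive examples extracted from U) and building a classifier thereafter should enhance the effectiveness of the classifier. Throughout our study, we find that extracting P' is very difficult. A document in U that possesses the features exhibited in P does not necessarily mean that it is a positive example, and vice versa. The very large size of U and the very high diversity in U also contribute to the difficulties of extracting P'. In this paper, we propose a labeling heuristic called PNLH to tackle this problem. PNLH aims at extracting high-quality positive examples and negative examples from U and can be used on top of any existing classifiers. Extensive experiments based on several benchmarks are conducted. The results indicated that PNLH is highly feasible, especially in the situation where |P| is extremely small.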
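The generic two-step heuristic mentioned in the abstract (extract N from U, then build a classifier on P and N) can be illustrated with a minimal Python sketch. This is not PNLH and not any method taken from the paper: the frequency-based rule for picking negatives, the naive Bayes classifier, the function names (`doc_freq`, `extract_negatives`, `train_nb`, `predict`), and the toy documents are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def doc_freq(docs):
    """Fraction of documents in `docs` that contain each word."""
    counts = Counter(w for d in docs for w in set(d))
    return {w: c / len(docs) for w, c in counts.items()}


def extract_negatives(P, U):
    """Step 1 (assumed rule): a word is a 'positive feature' if it appears in a
    larger fraction of P than of U; a document in U with no positive feature
    is taken as a negative example."""
    df_p, df_u = doc_freq(P), doc_freq(U)
    pos_features = {w for w, f in df_p.items() if f > df_u.get(w, 0.0)}
    return [d for d in U if not pos_features & set(d)]


def train_nb(P, N):
    """Step 2: multinomial naive Bayes with Laplace smoothing on P and N."""
    if not P or not N:
        raise ValueError("both P and N must be non-empty")
    vocab = {w for d in P + N for w in d}
    model = {}
    for label, docs in (("pos", P), ("neg", N)):
        counts = Counter(w for d in docs for w in d)
        total = sum(counts.values())
        model[label] = {
            "prior": math.log(len(docs) / (len(P) + len(N))),
            "loglik": {
                w: math.log((counts[w] + 1) / (total + len(vocab)))
                for w in vocab
            },
        }
    return model


def predict(model, doc):
    """Return 'pos' or 'neg'; words outside the training vocabulary are ignored."""
    scores = {
        label: m["prior"] + sum(m["loglik"].get(w, 0.0) for w in doc)
        for label, m in model.items()
    }
    return max(scores, key=scores.get)


# Toy data (hypothetical): each document is a list of tokens.
P = [["stock", "market", "rise"], ["market", "trade", "stock"]]
U = [
    ["stock", "price", "fall"],
    ["football", "match", "goal"],
    ["goal", "team", "win"],
    ["market", "news", "trade"],
]

N = extract_negatives(P, U)
model = train_nb(P, N)
```

In this sketch the documents in U that share words with P are left unlabeled instead of being added to P. Deciding which of them are truly positive is the step the abstract describes as difficult.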