Authors: Je Hun Jeon, Yang Liu
DOI: 10.1016/J.SPECOM.2011.10.008
Keywords: Event (computing), Supervised learning, Pitch accent, Computer science, Artificial intelligence, Word error rate, Pattern recognition, Learning curve, Phrase, Co-training, Set (abstract data type), Natural language processing
Abstract: Most previous approaches to automatic prosodic event detection are based on supervised learning, relying on the availability of a corpus that is annotated with the prosodic labels of interest in order to train the classification models. However, creating such resources is an expensive and time-consuming task. In this paper, we exploit semi-supervised learning with the co-training algorithm for automatic detection of a coarse-level representation of prosodic events such as pitch accent, intonational phrase boundaries, and break indices. Since co-training works on the condition that the views are compatible and uncorrelated, and real data often do not satisfy these conditions, we propose a method to label and select examples in co-training. In our experiments on the Boston University radio news corpus, when using only a small amount of labeled data as the initial training set, our proposed labeling method can effectively use unlabeled data to improve performance and finally reach results close to those obtained using more labeled data. We perform a thorough analysis of various factors impacting the learning curves, including the labeling error rate and informativeness of the added examples, the performance of the individual classifiers and their difference, and data size.