Authors: CAROLINE SPORLEDER, ALEX LASCARIDES
DOI: 10.1017/S1351324906004451
Keywords:
Abstract: Being able to identify which rhetorical relations (e.g., contrast or explanation) hold between spans of text is important for many natural language processing applications. Using machine learning to obtain a classifier which can distinguish between different relations typically depends on the availability of manually labelled training data, which is very time-consuming to create. However, rhetorical relations are sometimes lexically marked, i.e., signalled by discourse markers (e.g., because, but, consequently etc.), and it has been suggested (Marcu and Echihabi, 2002) that the presence of these cues in some examples can be exploited to label them automatically with the corresponding relation. The discourse markers are then removed and the automatically labelled data are used to train a classifier to determine relations even when no discourse marker is present (based on other linguistic cues such as word co-occurrences). In this paper, we investigate empirically how feasible this approach is. In particular, we test whether automatically labelled, lexically marked examples are really suitable training material for classifiers that are then applied to unmarked examples. Our results suggest that training on this type of data may not be such a good strategy, as models trained in this way do not seem to generalise very well to unmarked data. Furthermore, we found some evidence that this behaviour is largely independent of the classifiers used and seems to lie in the data itself (e.g., marked and unmarked examples may be too dissimilar linguistically, and removing unambiguous markers in the automatic labelling process may lead to a meaning shift in the examples).
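The automatic labelling step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the marker-to-relation map, the function name `auto_label`, and the relation name "result" for "consequently" are assumptions made for illustration only. A sentence is kept only if it contains exactly one known marker; the marker is then removed and the two remaining spans are paired with the relation the marker signals.

```python
import re

# Hypothetical marker-to-relation map (an assumption, not taken from the paper).
# Only markers that signal a single relation should appear here.
MARKER_TO_RELATION = {
    "because": "explanation",
    "but": "contrast",
    "consequently": "result",
}

# Match any known marker as a whole word, ignoring case.
_MARKER_RE = re.compile(
    r"\b(" + "|".join(map(re.escape, MARKER_TO_RELATION)) + r")\b",
    flags=re.IGNORECASE,
)


def auto_label(sentence):
    """Return (left_span, right_span, relation) with the marker removed.

    Returns None when the sentence is unmarked, contains more than one
    marker, or has no text on one side of the marker.
    """
    if len(_MARKER_RE.findall(sentence)) != 1:
        return None
    match = _MARKER_RE.search(sentence)
    # Drop the marker itself and trim the punctuation left around the spans.
    left = sentence[:match.start()].strip(" ,;")
    right = sentence[match.end():].strip(" ,;.")
    if not left or not right:
        return None
    return left, right, MARKER_TO_RELATION[match.group(1).lower()]
```

For instance, `auto_label("The match was cancelled because it rained.")` yields the two spans "The match was cancelled" and "it rained" labelled `"explanation"`, while an unmarked sentence yields `None`. The paper's question is whether a classifier trained on such stripped examples carries over to sentences that never contained a marker.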