Authors: Seung-won Hwang, Reinald Kim Amplayo, Seonjae Lim
DOI:
Keywords:
Abstract: Can a text classifier generalize well to datasets where the text length differs? For example, when short reviews are sentiment-labeled, can these labels transfer to predict the sentiment of long reviews (i.e., short-to-long transfer), or vice versa? While unsupervised transfer learning has been well studied for cross-domain and cross-lingual tasks, Cross Length Transfer (CLT) has not yet been explored. One reason is the assumption that length differences are trivially transferable in classification. We show that this is not the case, because short and long texts differ in context richness and word intensity. We devise new benchmark datasets from diverse domains and languages, and show that existing models for similar tasks cannot deal with the unique challenge of transferring across text lengths. We introduce a strong baseline model called BaggedCNN that treats long texts as bags containing short texts. We then propose a state-of-the-art CLT model, Length Transfer Networks (LeTraNets), which introduces a two-way encoding scheme for short and long texts using multiple training mechanisms. We test our models and find that existing models perform worse than the BaggedCNN baseline, while LeTraNets outperforms all models.
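To make the BaggedCNN idea concrete, below is a minimal PyTorch sketch of a classifier that treats a long text as a bag of short segments: one shared CNN encoder is applied to each segment, and the segment vectors are pooled before classification. The class name, kernel sizes, dimensions, and mean-pooling choice are illustrative assumptions, not the authors' exact configuration.

```python
# Hypothetical sketch of a "bag of short texts" classifier in the spirit of
# BaggedCNN. Hyperparameters and pooling are assumptions for illustration.
import torch
import torch.nn as nn

class BaggedCNNSketch(nn.Module):
    def __init__(self, vocab_size, embed_dim=100, num_filters=64,
                 kernel_sizes=(3, 4, 5), num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # One shared CNN encoder, applied to every short text in the bag.
        self.convs = nn.ModuleList(
            [nn.Conv1d(embed_dim, num_filters, k) for k in kernel_sizes]
        )
        self.classifier = nn.Linear(num_filters * len(kernel_sizes), num_classes)

    def encode_short_text(self, tokens):
        # tokens: (batch, seq_len) -> (batch, num_filters * len(kernel_sizes))
        x = self.embedding(tokens).transpose(1, 2)  # (batch, embed_dim, seq_len)
        feats = [conv(x).relu().max(dim=2).values for conv in self.convs]
        return torch.cat(feats, dim=1)

    def forward(self, bag):
        # bag: (batch, num_segments, seq_len). A long text is a bag of short
        # segments; each is encoded independently, then averaged (one
        # plausible pooling choice) before classification.
        b, s, t = bag.shape
        segment_vecs = self.encode_short_text(bag.view(b * s, t)).view(b, s, -1)
        return self.classifier(segment_vecs.mean(dim=1))

# Usage: 8 long texts, each split into 4 short segments of 50 tokens.
model = BaggedCNNSketch(vocab_size=10000)
logits = model(torch.randint(1, 10000, (8, 4, 50)))  # -> (8, 2)
```

Because the same encoder handles a standalone short text (a bag of size one) and each segment of a long text, a model of this shape can train on one length regime and predict on the other, which is the setting CLT studies.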