Stimulus Description Collections

Authors: William B Dolan, David L Chen

Abstract: The subject disclosure generally describes a technology by which text and/or speech descriptions are collected by showing a stimulus, such as a video clip, to contributors (e.g., of a crowd-sourcing service). The descriptions, each in the language of the contributor's choice, are of the same stimulus and are thus associated with one another. While each contributor may be monolingual, the technique allows for the collection of approximately bilingual data, since more than one language may be represented among the different contributors. The descriptions may be used as translation data for training a machine translation engine and, when grouped by language, as paraphrase data for training a machine paraphrasing system. Also described is a distinctiveness metric for evaluating the quality of a machine paraphrasing system.
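
The grouping idea in the abstract can be illustrated with a minimal sketch. It is not taken from the disclosure: the record layout, the sample sentences, and the function name `build_pairs` are all assumptions made for illustration. Descriptions that share a stimulus are paired; same-language pairs are treated as candidate paraphrases and cross-language pairs as candidate translations.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical records: (stimulus_id, language, description text).
descriptions = [
    ("clip1", "en", "A man is slicing a tomato."),
    ("clip1", "en", "Someone cuts a tomato."),
    ("clip1", "de", "Ein Mann schneidet eine Tomate."),
    ("clip2", "en", "A dog runs on the beach."),
]


def build_pairs(records):
    """Pair descriptions of the same stimulus (illustrative helper)."""
    # Group descriptions by the stimulus they describe.
    by_stimulus = defaultdict(list)
    for stimulus_id, lang, text in records:
        by_stimulus[stimulus_id].append((lang, text))

    paraphrases, translations = [], []
    for items in by_stimulus.values():
        # Every unordered pair of descriptions of one stimulus.
        for (lang_a, text_a), (lang_b, text_b) in combinations(items, 2):
            if lang_a == lang_b:
                # Same language: candidate paraphrase pair.
                paraphrases.append((lang_a, text_a, text_b))
            else:
                # Different languages: candidate translation pair.
                translations.append(((lang_a, text_a), (lang_b, text_b)))
    return paraphrases, translations


paraphrases, translations = build_pairs(descriptions)
```

With this sample data, `clip1` yields one English paraphrase pair and two English-German translation pairs, while `clip2` has a single description and yields none. The pairs are only approximately parallel, since contributors describe the stimulus independently rather than translating one another.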
