Authors: Antoine Raux, Alan Black, Jason Williams, Deepak Ramachandran
DOI:
Keywords:
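Abstract: In a spoken dialog system, dialog state tracking deduces information about the user’s goal as the dialog progresses, synthesizing evidence such as dialog acts over multiple turns with external data sources. Recent approaches have been shown to overcome ASR and SLU errors in some applications. However, there are currently no common testbeds or evaluation measures for this task, hampering progress. The dialog state tracking challenge seeks to address this by providing a heterogeneous corpus of 15K human-computer dialogs in a standard format, along with a suite of 11 evaluation metrics. The challenge received a total of 27 entries from 9 research groups. The results show that the suite of performance metrics cluster into 4 natural groups. Moreover, the dialog systems that benefit most from dialog state tracking are those with less discriminative speech recognition confidence scores. Finally, generalization is a key problem: in 2 of the 4 test sets, fewer than half of the entries out-performed simple baselines.
∗Most of the work was performed when the second and third authors were with Honda Research Institute, Mountain View, CA, USA
1 Overview and motivation
Spoken dialog systems interact with users via natural language to help them achieve a goal. As the interaction progresses, the dialog manager maintains a representation of the state of the dialog in a process called dialog state tracking (DST). For example, in a bus schedule information system, the dialog state might indicate the user’s desired bus route, origin, and destination.
Dialog state tracking is difficult because automatic speech recognition (ASR) and spoken language understanding (SLU) errors are common, and can cause the system to misunderstand the user’s needs. At the same time, state tracking is crucial because the system relies on the estimated dialog state to choose actions, for example which bus schedule information to present to the user.
Most commercial systems use hand-crafted heuristics for state tracking, selecting the SLU result with the highest confidence score, and discarding alternatives. In contrast, statistical approaches compute scores for many hypotheses for the dialog state (Figure 1). By exploiting correlations between turns and information from external data sources, such as maps, bus timetables, or models of past dialogs, statistical approaches can overcome some SLU errors.
Numerous techniques for dialog state tracking have been proposed, including heuristic scores (Higashinaka et al., 2003), Bayesian networks (Paek and Horvitz, 2000; Williams and Young, 2007), kernel density estimators (Ma et al., 2012), and discriminative models (Bohus and Rudnicky, 2006). Techniques have been fielded which scale to realistically sized problems and operate in real time (Young et al., 2010; Thomson and Young, 2010; Williams, 2010; Mehta et al., 2010).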
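In end-to-end dialog systems, dialog state tracking has been shown to improve overall system performance. Despite this progress, direct comparisons between methods have not been possible because past studies use different domains and different system components, for speech recognition, spoken language understanding, dialog control, etc. Moreover, there is little agreement on how to evaluate dialog state tracking. Together these issues limit progress in this research area. The Dialog State Tracking Challenge (DSTC) provides a first common testbed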