Advantages, Opportunities and Limits of Empirical Evaluations: Evaluating Adaptive Systems.

Authors: Stephan Weibelzahl, Gerhard Weber

Abstract: While empirical evaluations are a common research method in some areas of Artificial Intelligence (AI), others still neglect this approach. This article outlines both the opportunities and the limits of empirical evaluations for AI techniques, exemplified by the evaluation of adaptive systems. Using the so-called layered evaluation approach, we demonstrate that empirical evaluations are able to identify errors in AI systems that would otherwise remain undiscovered. To encourage new evaluations, we implemented an online database of studies concerned with the evaluation of adaptive systems (EASy-D).

1 Advantages: Why Empirical Evaluations Are Needed

Some AI areas apply empirical methods regularly. For example, planning and search algorithms are benchmarked in standard domains, and machine learning algorithms are usually tested with real data sets. However, in applied areas such as user modeling, empirical evaluations are rare. Only a quarter of the articles published in User Modeling and User-Adapted Interaction (UMUAI) report significant evaluations [4]. Many of them include a simple evaluation study with small sample sizes, often without any statistical methods. On the other hand, an estimation of the effectiveness, efficiency, and usability of a system that applies AI techniques in real-world scenarios is absolutely necessary. User modeling techniques in particular, which are based on human-computer interaction, require empirical evaluations. Otherwise, as we are going to show in this paper, certain types of errors will remain undiscovered. Undoubtedly, verification, formal correctness, and tests are important methods in software engineering; however, we argue that empirical evaluation, seen as a complement, can improve AI techniques considerably. Moreover, the empirical approach is an important way both to legitimize the efforts spent and to give evidence of usefulness.

2 Opportunities: What We May Learn from Empirical Evaluations

According to Cohen [5], empirical studies should answer three basic questions: How will a change in the agent's structure affect its behavior given a task and an environment? How will a change in the agent's task affect its behavior in a particular environment? How will a change in the agent's environment affect its behavior on a particular task? These questions can be answered by a combination of four kinds of studies: exploratory studies that yield causal hypotheses; assessment studies that establish baselines, ranges, and benchmarks; manipulation experiments that test hypotheses about causal influences; and finally observation experiments (or quasi-experiments) that disclose effects of factors on measured variables without random assignment of treatments [5, 9]. The general goal of defining [...] has to be specified in terms of each area.
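As an illustrative example, we outline the evaluation of adaptive systems in user modeling. Similar results can be obtained for other areas. The adaptation can be seen as a process where each layer is a prerequisite of the subsequent layers (see KI-Lexikon). Three layered approaches have been proposed [3, 12, 16] that basically just differ in granularity. Thus, we follow here the approach introduced by Weibelzahl [16, 17]. The figure shows the layers: During the interaction, the system observes and registers events or cues (1). Based on these input data, abstract user properties are inferred (2). Finally, the system decides what and how to adapt (3) and presents the adapted interface (4). Each layer has to be evaluated to guarantee adaptation success.

2.1 Evaluation of Reliability and Validity of Input Data

The first layer evaluates the reliability and external validity of the input data (see KI-Lexikon). Unreliable input data result in misadaptations. Spooner and Edwards [13] tried to model the typical errors of dyslexic authors in a spell checking system. In their study they [...] stability [...]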
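
The four layers described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the help-request cue, the 0.3 threshold, and the layer labels are all hypothetical and chosen only to show how checking each layer separately can locate the layer where an error first appears.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# All names, cues, and thresholds below are hypothetical illustrations.

@dataclass
class LayerResult:
    layer: str
    passed: bool

def observe(events: List[str]) -> Dict[str, int]:
    """Layer 1: register raw events or cues (here: count help requests)."""
    return {"help_requests": events.count("help"), "total": len(events)}

def infer(input_data: Dict[str, int]) -> str:
    """Layer 2: infer an abstract user property from the input data."""
    ratio = input_data["help_requests"] / max(input_data["total"], 1)
    return "novice" if ratio > 0.3 else "expert"

def decide(user_property: str) -> str:
    """Layer 3: decide what and how to adapt."""
    return "show_hints" if user_property == "novice" else "hide_hints"

def present(decision: str) -> str:
    """Layer 4: present the adapted interface."""
    return "interface with hints" if decision == "show_hints" else "plain interface"

def evaluate_layers(events: List[str], expected: Dict[str, object]) -> List[LayerResult]:
    """Run the pipeline and compare each layer's output with an expected value."""
    input_data = observe(events)
    user_property = infer(input_data)
    decision = decide(user_property)
    interface = present(decision)
    actual = {
        "input": input_data,
        "inference": user_property,
        "decision": decision,
        "presentation": interface,
    }
    return [LayerResult(name, actual[name] == expected[name]) for name in actual]

def first_failing_layer(results: List[LayerResult]) -> Optional[str]:
    """Return the earliest layer whose output deviates, or None if all pass."""
    for result in results:
        if not result.passed:
            return result.layer
    return None

if __name__ == "__main__":
    events = ["click", "help", "help", "scroll"]
    expected = {
        "input": {"help_requests": 2, "total": 4},
        "inference": "expert",  # assumed ground truth, e.g. from a pre-test
        "decision": "hide_hints",
        "presentation": "plain interface",
    }
    print(first_failing_layer(evaluate_layers(events, expected)))
```

In this toy run the input layer matches the expectation while the inference layer does not, so the deviation in the later layers can be traced back to the inference step rather than to the decision or presentation.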

References (17)
C. U. Lauer, S. Weibelzahl. Framework for the Evaluation of Adaptive CBR-Systems (2001).
Gerhard Weber, Marcus Specht. User Modeling and Adaptive Navigation Support in WWW-Based Tutoring Systems. Springer, Vienna, pp. 289-300 (1997). doi:10.1007/978-3-7091-2670-7_30
Peter Brusilovsky, Demetrios Sampson, Charalampos Karagiannidis. The Benefits of Layered Evaluation of Adaptive Applications and Services (2001).
Stephan Weibelzahl, Alexandros Paramythis, Judith Masthoff. Evaluation of Adaptive Systems. International Conference on User Modeling, Adaptation, and Personalization, pp. 292-294 (2020). doi:10.1145/3340631.3398668
Diane J. Litman, Shimei Pan. Empirically Evaluating an Adaptable Spoken Dialogue System. International Conference on User Modeling, Adaptation, and Personalization, pp. 55-64 (1999). doi:10.1007/978-3-7091-2490-1_6
Joseph Beck, Mia Stern, Beverly Park Woolf. Using the Student Model to Control Problem Difficulty. Springer, Vienna, pp. 277-288 (1997). doi:10.1007/978-3-7091-2670-7_29
André Berthold, Anthony Jameson. Interpreting Symptoms of Cognitive Load in Speech Input. International Conference on User Modeling, Adaptation, and Personalization, pp. 235-244 (1999). doi:10.1007/978-3-7091-2490-1_23
L. Miguel Encarnação, Stanislav L. Stoev. An Application-Independent Intelligent User Support System Exploiting Action-Sequence Based User Modelling. International Conference on User Modeling, Adaptation, and Personalization, pp. 245-254 (1999). doi:10.1007/978-3-7091-2490-1_24
Roger I. W. Spooner, Alistair D. N. Edwards. User Modelling for Error Recovery: A Spelling Checker for Dyslexic Users. Springer, Vienna, pp. 147-157 (1997). doi:10.1007/978-3-7091-2670-7_17