Authors: Stephan Weibelzahl, Gerhard Weber
DOI:
Keywords:
Abstract: While empirical evaluations are a common research method in some areas of Artificial Intelligence (AI), others still neglect this approach. This article outlines both the opportunities and the limits of empirical evaluations for AI techniques, exemplified by the evaluation of adaptive systems. Using the so-called layered evaluation approach, we demonstrate that empirical evaluations are able to identify errors in AI systems that would otherwise remain undiscovered. To encourage new evaluations, we implemented an online database of studies concerned with the evaluation of adaptive systems (EASy-D).

1 Advantages: Why Evaluations are needed

Some areas of AI apply empirical methods regularly. E.g., planning and search algorithms are benchmarked in standard domains, and machine learning algorithms are usually tested with real data sets. However, looking at an applied area such as user modeling, empirical evaluations are rare. For example, only a quarter of the articles published in User Modeling and User-Adapted Interaction (UMUAI) report significant empirical evaluations [4]. Many of them include a simple evaluation study with small sample sizes and often without any statistical methods. On the other hand, an estimation of the effectiveness, efficiency, and usability of a system that applies AI techniques in real-world scenarios is absolutely necessary. Especially user modeling techniques, which are based on human-computer interaction, require empirical evaluations. Otherwise, as we are going to show in this paper, certain types of errors will remain undiscovered. Undoubtedly, verification, formal correctness, and tests are important methods of software engineering; however, we argue that empirical evaluation, seen as an important complement, can improve AI techniques considerably. Moreover, the empirical approach is an important way both to legitimize the efforts spent and to give evidence of the usefulness of an approach.

2 Opportunities: What we may learn from Empirical Evaluations

According to Cohen [5], empirical studies should answer three basic questions: How will a change in the agent's structure affect its behavior given a task and an environment? How will a change in an agent's task affect its behavior in a particular environment? How will a change in an agent's environment affect its behavior on a particular task? These questions can be answered by a combination of four kinds of studies: exploratory studies that yield causal hypotheses; assessment studies that establish baselines, ranges, and benchmarks; manipulation experiments that test hypotheses about causal influences; and finally observation experiments (or quasi-experiments) that disclose the effects of factors on measured variables without random assignment of treatments [5, 9]. The general goal and the defining questions have to be specified in terms of each area.
As an illustrative example, we outline the evaluation of user modeling systems. Similar results can be obtained in other areas of AI. The evaluation can be seen as a layered process in which each layer is a prerequisite for the subsequent layers (see KI-Lexikon). Three layered approaches have been proposed [3, 12, 16] that basically just differ in granularity. Thus, the approach by Weibelzahl [16, 17] is introduced here. The figure shows the layers: During the interaction, the system observes the user and registers events or cues (1). Based on these input data, abstract user properties are inferred (2). Finally, the system decides what and how to adapt (3) and presents the adapted interface (4). Each layer has to be evaluated to guarantee adaptation success.

2.1 Evaluation of Reliability and Validity of Input Data

The first layer evaluates the reliability and external validity of the input data (see KI-Lexikon). Unreliable input data result in misadaptations. Spooner and Edwards [13] tried to identify the typical errors of dyslexic authors for a spell checking system. In they stability