Author: José Hernández-Orallo
DOI:
Keywords:
Abstract: Artificial intelligence develops techniques and systems whose performance must be evaluated on a regular basis in order to certify and foster progress in the discipline. We will describe and critically assess the different ways AI systems are evaluated. We first focus on the traditional task-oriented evaluation approach. We see that black-box (behavioural) evaluation is becoming more common, as AI systems become more complex and unpredictable. We identify three kinds of evaluation: human discrimination, problem benchmarks and peer confrontation. We describe the limitations of the many evaluation settings and competitions in these three categories, and we propose several ideas for a more systematic and robust evaluation. We then focus on the less customary (and more challenging) ability-oriented evaluation approach, where a system is characterised by its (cognitive) abilities rather than by the tasks it is designed to solve. We discuss several possibilities: the adaptation of cognitive tests used for humans and animals, the development of tests derived from algorithmic information theory, and more general approaches under the perspective of universal psychometrics.