AI Evaluation: past, present and future

Author: José Hernández-Orallo

DOI:

Keywords:

Abstract: Artificial intelligence develops techniques and systems whose performance must be evaluated on a regular basis in order to certify and foster progress in the discipline. We will describe and critically assess the different ways AI systems are evaluated. We first focus on the traditional task-oriented evaluation approach. We see that black-box (behavioural) evaluation is becoming more and more common, as AI systems are becoming more complex and unpredictable. We identify three kinds of evaluation: human discrimination, problem benchmarks and peer confrontation. We describe the limitations of the many evaluation settings and competitions in these three categories and propose several ideas for a more systematic and robust evaluation. We then focus on a less customary (and challenging) ability-oriented evaluation approach, where a system is characterised by its (cognitive) abilities, rather than by the tasks it is designed to solve. We discuss several possibilities: the adaptation of cognitive tests used for humans and animals, the development of tests derived from algorithmic information theory, or more general approaches under the perspective of universal psychometrics.
