Authors: Veronica Garcia, Catherine Welch, Lawrence M. Rudner
DOI:
Keywords: Probability model, Mathematics education, Computer science, Automated essay scoring, Inter-rater reliability, Scoring system, Bayesian probability, Pearson product-moment correlation coefficient, Writing assessment, Statistics
Abstract: This report provides a two-part evaluation of the IntelliMetricSM automated essay scoring system based on its performance scoring essays from the Analytic Writing Assessment of the Graduate Management Admission TestTM (GMATTM). In the first evaluation, IntelliMetric is compared to individual human raters, a Bayesian system employing simple word counts, and a weighted probability model, using more than 750 responses to each of six prompts. In the second, larger evaluation, IntelliMetric ratings are compared to those of human raters using approximately 500 responses to each of 101 prompts. Results from both evaluations suggest IntelliMetric is a consistent, reliable scoring system for the AWA, with perfect plus adjacent agreement in 96% to 98% and 92% to 100% of instances in evaluations 1 and 2, respectively. Pearson r correlations between IntelliMetric and human ratings averaged .83 across both evaluations.
Volume 4, Number 4