'Flawed automated scoring systems' are grading the essays of millions of students
Essays are often assigned in entrance and graduation exams, but grading the essays of a large number of students is an enormous amount of work. Automated scoring by algorithm is becoming popular in the United States because it reduces the burden on human graders, but Motherboard reports that automated scoring is still unreliable for grading essays.
Flawed Algorithms Are Grading Millions of Students' Essays - VICE
Algorithmic essay scoring systems are formally called 'natural language processing artificial intelligence systems,' but they are commonly known as 'automated essay scoring systems.' The first problem Motherboard points out is that these systems are biased. An automated essay scoring system assigns scores by recognizing patterns that correlate with essays human graders scored highly and essays they scored poorly. However, human graders add or subtract points based on their own prejudices, for example against dialects associated with particular races, so the automated system learns those human prejudices as well.
AI learns sexism and racism from human language - GIGAZINE
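The mechanism described above, where a scorer trained to reproduce human grades also reproduces the graders' prejudices, can be illustrated with a toy model. This is a minimal sketch under loudly hypothetical assumptions: the two features (essay length in hundreds of words and a 0/1 dialect marker), the made-up human grades, and the plain least-squares fit stand in for whatever features and models real systems use. The hypothetical graders dock one point whenever dialect features appear, and the fitted model picks up a negative weight on the dialect marker.

```python
# Toy illustration with hypothetical data: a scorer fitted to human grades
# reproduces whatever bias those grades contain.

def fit_linear(rows, targets, lr=0.05, epochs=5000):
    """Fit score = bias + sum(w_j * x_j) by batch gradient descent on squared error."""
    n, n_features = len(rows), len(rows[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(rows, targets):
            # Prediction error for this essay under the current model.
            err = bias + sum(w * xi for w, xi in zip(weights, x)) - y
            grad_b += err
            for j in range(n_features):
                grad_w[j] += err * x[j]
        bias -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return weights, bias

# Hypothetical essays: (length in hundreds of words, 1 if dialect features appear).
essays = [(2, 0), (3, 0), (4, 0), (5, 0), (2, 1), (3, 1), (4, 1), (5, 1)]
# Hypothetical human grades: one point plus one per hundred words,
# minus a one-point penalty whenever dialect features appear.
human_scores = [1 + length - dialect for length, dialect in essays]

weights, bias = fit_linear(essays, human_scores)
print(f"weight on length:  {weights[0]:+.2f}")
print(f"weight on dialect: {weights[1]:+.2f}")  # negative: the penalty was learned
```

Nothing in the model refers to race or dialect as a quality signal; the penalty enters only through the training labels, which is why the bias is described as learned from human graders.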
The resulting bias favors Chinese students and works against African American students. On the GRE, whose essays are scored on a 6-point scale, Chinese students tended to receive scores about 1.3 points higher on average than human graders gave them, while African American students tended to receive scores about 0.81 points lower.
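Read as gaps between machine and human scores, figures like these are simple averages. A minimal sketch of that calculation follows; the groups `A` and `B` and every score in it are made up for illustration and are not the GRE data.

```python
from statistics import mean

# Hypothetical (group, human_score, machine_score) triples on a 0-6 scale.
scored_essays = [
    ("A", 3.0, 4.5), ("A", 4.0, 5.0), ("A", 3.5, 5.0),
    ("B", 4.0, 3.0), ("B", 5.0, 4.5), ("B", 4.5, 3.5),
]

def mean_gap_by_group(records):
    """Average (machine - human) score difference for each group."""
    gaps = {}
    for group, human, machine in records:
        gaps.setdefault(group, []).append(machine - human)
    return {group: mean(diffs) for group, diffs in gaps.items()}

for group, gap in mean_gap_by_group(scored_essays).items():
    print(f"group {group}: {gap:+.2f} points vs. human graders")
```

A positive gap means the machine scores that group more generously than humans do; a negative gap means it scores them more harshly.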
Chinese students tended to score highly on 'essay length' and 'use of sophisticated vocabulary.' Motherboard suggests this may be because many Chinese students memorize sophisticated passages wholesale and write them out as-is. African American students, on the other hand, tended to receive low scores on items such as 'grammar,' 'style,' and 'organization,' even though, according to Motherboard, some of those essays would have received fairly good scores from human graders.
Asked whether this bias against dialects associated with particular races and nationalities could be corrected, Brent Bridgeman, a senior researcher at ETS, said that adjusting the scores for the dialect of one race or nationality would simply introduce a different bias against test takers of other races and nationalities, so the result would be the same.
Every essay on ETS's exams is scored by both its automated scoring engine, e-rater, and a human grader, and if the two scores differ, the essay is scored again by another person. For this reason, ETS says that e-rater's bias does not affect the final results.
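The double-scoring safeguard ETS describes can be sketched as follows. This is a hypothetical illustration only: the agreement threshold, the function name, and the rules for combining scores (averaging the machine and human scores when they agree, averaging two human scores when they do not) are assumptions, since the article does not say how ETS combines them.

```python
from typing import Callable

# Hypothetical sketch of "machine + human, with another human on disagreement".
# THRESHOLD and both averaging rules are illustrative assumptions, not ETS policy.
THRESHOLD = 1.0  # assumed maximum tolerated machine/human gap, in points

def final_score(human: float, machine: float,
                second_human: Callable[[], float]) -> float:
    """Return a reported score, requesting another human read only on disagreement."""
    if abs(human - machine) <= THRESHOLD:
        # Scores agree closely enough: report their average (assumed rule).
        return (human + machine) / 2
    # Scores disagree: set the machine score aside and use two human reads.
    return (human + second_human()) / 2

print(final_score(4.0, 4.5, second_human=lambda: 4.0))  # scores agree
print(final_score(3.0, 5.0, second_human=lambda: 3.5))  # sent to another grader
```

Passing the second grader as a callable means another human read is requested only when the two scores disagree; in this sketch a gap of exactly `THRESHOLD` still counts as agreement.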
Another problem with automated essay scoring systems is that they award high scores to essays that use sophisticated vocabulary even when the content is meaningless. Motherboard ran an experiment in which essays produced by the Basic Automatic BS Essay Language Generator (BABEL Generator), software that strings sophisticated-sounding words into essays, were scored by ScoreItNow!, an online service that scores GRE practice essays. The essays generated by the BABEL Generator scored 4 out of 6, a score described as indicating writing that conveys its meaning clearly and shows sufficient ability to develop an argument.
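Why a scorer that leans on surface features can be fooled this way is easy to show with a toy example. The scorer below is hypothetical: its features (word count, average word length, share of long words) and weights are invented for illustration, and the nonsense sentence was written for this sketch rather than produced by the BABEL Generator. Real systems use far richer features, but the failure mode is the same when nothing checks meaning.

```python
# Toy surface-feature scorer with hypothetical weights. It rewards length
# and long words and never checks whether the text means anything.

def surface_score(text: str) -> float:
    """Score text on a 0-6 scale from word count and word length only."""
    words = [w.strip(".,;:!?") for w in text.split()]
    words = [w for w in words if w]
    if not words:
        return 0.0
    avg_word_len = sum(len(w) for w in words) / len(words)
    long_word_share = sum(len(w) >= 9 for w in words) / len(words)
    raw = 0.02 * len(words) + 0.4 * avg_word_len + 2.0 * long_word_share
    return round(min(6.0, max(0.0, raw)), 1)  # clamp to the 0-6 range

plain = "Tests should be fair. A good test checks what a student knows."
nonsense = ("Epistemological paradigms inexorably substantiate the "
            "quintessential dichotomy of pedagogical ramifications.")

print(surface_score(plain), surface_score(nonsense))
```

The clear but plain sentence scores low, while the meaningless one scores far higher, because every feature the scorer sees rewards vocabulary rather than sense.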
In the United States, federal law prohibits disclosing students' test scores without written consent, which makes it difficult to audit automated essay scoring systems externally or to study bias in their algorithms. Even so, Motherboard reports that 21 of the 50 US states have adopted automated essay scoring, and 18 of them leave the scoring to the automated system.
in Software, Posted by darkhorse_log