In an experiment using real exams at a prestigious university, cheating with ChatGPT went undetected 94% of the time, slipping past graders and outscoring human students



The University of Reading in the UK ran an experiment in which ChatGPT-written answers were slipped into exams where the use of AI was prohibited. It found that 94% of those answers were not flagged as cheating and that they tended to earn high scores.

A real-world test of artificial intelligence infiltration of a university examinations system: A “Turing Test” case study | PLOS ONE

https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0305354

AI generated exam answers go undetected in real-world blind test - University of Reading
https://www.reading.ac.uk/news/2024/Research-News/AI-generated-exam-answers-go-undetected-in-real-world-blind-test

With the rise of text generation AI such as ChatGPT, many tools for detecting AI-generated text have appeared. However, such detection tools have been criticized for their low accuracy, and OpenAI, the developer of ChatGPT, has also withdrawn its own AI detection tool because of its low accuracy.

To find out whether AI-written answers could be caught by human graders rather than by tools, the research team at the University of Reading slipped ChatGPT answers into university exams without the graders' knowledge and checked whether they were flagged as cheating.

After obtaining permission from the ethics committee and the dean, the research team ran the experiment on five online exams for a bachelor's degree in psychology. ChatGPT was given one of two prompts: 'Please answer the following questions in 160 words, without including a bibliography, while referring to the literature. [Exam question text]' or 'Please write a 2,000-word essay answering the following questions, without including a bibliography, while referring to the literature. [Exam question text]'. Human intervention was limited to three steps: asking ChatGPT to continue when the output was too short, saving the output as a Microsoft Word document, and deleting any bibliography that was output anyway. The result was an 'AI answer that does not include human revisions.'

The word counts in the prompts were fixed regardless of each exam's own instructions, because entering the exam's instructions as is produced answers that were too short.
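The two prompts described above can be thought of as fixed templates with the exam question appended. The following Python sketch illustrates that idea only. The wording is the English rendering quoted in this article, not necessarily the exact text used in the study, and the names `SHORT_ANSWER_TEMPLATE`, `ESSAY_TEMPLATE`, and `build_prompt` are hypothetical.

```python
# Illustrative sketch only: the wording follows this article's rendering of
# the prompts, and all names here are hypothetical, not taken from the study.

SHORT_ANSWER_TEMPLATE = (
    "Please answer the following questions in 160 words, "
    "without including a bibliography, while referring to the literature. "
    "{question}"
)

ESSAY_TEMPLATE = (
    "Please write a 2,000-word essay answering the following questions, "
    "without including a bibliography, while referring to the literature. "
    "{question}"
)


def build_prompt(question: str, essay: bool = False) -> str:
    """Return the fixed-word-count prompt with the exam question appended."""
    template = ESSAY_TEMPLATE if essay else SHORT_ANSWER_TEMPLATE
    return template.format(question=question.strip())


if __name__ == "__main__":
    # Placeholder question text, not a real exam question.
    print(build_prompt("[Exam question text]"))
    print(build_prompt("[Exam question text]", essay=True))
```

Keeping the word count inside the template, rather than copying it from each exam's instructions, mirrors the fixed values the article describes.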

The research team had the university's systems department create fake student accounts, through which the answers created with ChatGPT were submitted. The experiment was conducted in the summer of 2023, and the grading was mainly done by technical staff and doctoral students at the university.

The graph below shows how many of the answers created with ChatGPT were flagged as cheating, with the flagged share in orange and the unflagged share in blue. 94% of the answers created with ChatGPT were not flagged and went through grading as is.



In addition, the graph below shows the percentage of answers that were flagged specifically as 'answers generated by AI' rather than for some other kind of misconduct. Only 3% of the answers created with ChatGPT were flagged as 'answers generated by AI'.



Below is a graph comparing the average scores of human answers (orange) and ChatGPT answers (blue). ChatGPT's average score exceeded the human average in 4 out of 5 exams.



Below is a graph showing, for each exam, the percentage of students who scored higher than the median score of the answers created with ChatGPT. ChatGPT outscored the human students on most of the exams.



Based on the results of this experiment, the research team concluded that 'AI answers consistently score higher than humans, and it is virtually impossible to detect AI answers.' The team noted that 'a simple solution to cheating using AI is to return to face-to-face exams instead of online,' but also argued that 'rather than thinking of AI as a problem that can be dealt with, we need to consider educational methods that accept the use of AI.'

in Software, Science, Posted by log1o_hf