How accurate is 'GPTZero', the tool that distinguishes whether text was written by a human or by an AI?



With the advent of 'GPT-3' and 'ChatGPT', which generate highly fluent text, the question of how to distinguish text written by humans from text written by AI has arisen. Meanwhile, Edward Tian, a student at Princeton University in the United States, has released 'GPTZero', a tool that can distinguish between text written by humans and text written by ChatGPT, and it has attracted a great deal of attention. In response, Jacob Gonzales, a university student who runs a technology blog, has reported the results of testing GPTZero's accuracy on medical papers.

GPTZero Case Study (Exploring False Positives) | Gonzo Knows
https://gonzoknows.com/posts/GPTZero-Case-Study/

GPTZero, released by Tian, is a tool for determining whether entered text was written by a human or by ChatGPT. GPTZero is trained on a data set similar to the one used for ChatGPT, and it examines the complexity and variation of the text to identify passages that are likely to have been written by ChatGPT.
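To make 'complexity' and 'variation' concrete, here is a minimal Python sketch. It is not GPTZero's implementation, which the article does not describe. As an assumption, it measures complexity as perplexity under a toy add-one-smoothed unigram model, and variation as the standard deviation of per-sentence perplexity. All function names are hypothetical, and a real detector would score text with a large language model rather than word counts.

```python
import math
import re
from collections import Counter
from statistics import pstdev


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def unigram_model(reference_text):
    """Build an add-one-smoothed unigram probability function (toy stand-in for a language model)."""
    counts = Counter(tokenize(reference_text))
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot shared by all unseen words

    def prob(token):
        return (counts.get(token, 0) + 1) / (total + vocab)

    return prob


def perplexity(tokens, prob):
    """Perplexity is exp of the average negative log-probability per token."""
    if not tokens:
        raise ValueError("cannot score an empty token list")
    avg_neg_log = -sum(math.log(prob(t)) for t in tokens) / len(tokens)
    return math.exp(avg_neg_log)


def complexity_and_variation(text, prob):
    """Return (overall perplexity, std. dev. of per-sentence perplexity)."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if tokenize(s)]
    per_sentence = [perplexity(tokenize(s), prob) for s in sentences]
    overall = perplexity(tokenize(text), prob)
    variation = pstdev(per_sentence) if len(per_sentence) > 1 else 0.0
    return overall, variation
```

Under these assumptions, text that the model finds predictable (low perplexity) and uniformly so (low variation) would look more machine-like. Formal, formulaic writing such as paper abstracts can also score that way, which is one plausible reason for false positives.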

Tian has released a demonstration video in which text by a human writer published in the American magazine The New Yorker and text generated by ChatGPT are each entered into GPTZero. Since GPTZero's release, Tian has reportedly been contacted by educators around the world.



However, GPTZero cannot distinguish human-written text from ChatGPT-generated text with 100% accuracy, and how accurate it actually is remains unknown. Gonzales therefore took a paper on the novel coronavirus disease (COVID-19) published in 2021 and checked how GPTZero would judge it.

When Gonzales entered the first paragraph of the paper, GPTZero judged that '50% or more was written by AI'. However, ChatGPT had not yet been released when the paper was published, and the paper was written by researchers at the Centers for Disease Control and Prevention (CDC), so Gonzales points out that this is a false positive.



Gonzales then took 20 neurology-related papers published online by the US National Library of Medicine and entered their abstracts into GPTZero. As a result, 11 of the 20 papers were judged to be 'possibly written by AI', but most of these papers were published before 2020, so Gonzales argues that these were false positives by GPTZero.
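The arithmetic behind this result can be sketched in a few lines of Python, using only the counts reported above (11 flagged out of 20). The 95% Wilson score interval is an addition that does not appear in Gonzales's post. It is included only to illustrate how much uncertainty a sample of 20 carries. The function names are hypothetical.

```python
import math


def flagged_rate(flagged, total):
    """Share of papers that the detector flagged as possibly AI-written."""
    if total <= 0:
        raise ValueError("total must be positive")
    return flagged / total


def wilson_interval(flagged, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    p = flagged_rate(flagged, total)
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half, center + half


rate = flagged_rate(11, 20)
low, high = wilson_interval(11, 20)
print(f"flagged: {rate:.0%}, 95% interval: {low:.0%} to {high:.0%}")
```

The flagged share is 55%, and the interval is wide, spanning roughly a third to three quarters. A test on 20 abstracts therefore shows that false positives occur, but it does not pin down a precise rate.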



The papers that were misidentified as 'highly likely to be written by AI' in Gonzales's survey are as follows.

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7164350/
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8093009/
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7668548/
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8055322/
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5894931/
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6105044/
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3776536/
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5047042/
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4762419/
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7538222/
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3590056/

Using GPTZero requires registering on the wait list, but a first trial was available for free on the web, so I tried entering the opening part of English Wikipedia into GPTZero.



The result was the verdict 'Your text is likely to be written entirely by AI'.



Gonzales pointed out that inaccuracies in commercial software can cause many problems. “I personally think the biggest problem is plagiarism detection in education. It could be detrimental to us,” he said.

in Software, Web Service, Posted by log1h_ik