Experimental results comparing the performance of OpenAI's 'ChatGPT' and 'Claude', developed by former OpenAI engineers



Anthropic, an AI startup founded by former employees of OpenAI, the developer of ChatGPT and DALL-E 2, has developed 'Claude', a conversational AI similar to ChatGPT. Spencer Papay and Riley Goodside, engineers at the AI data platform Scale AI, have published the results of comparing the performance of ChatGPT and Claude.

Meet Claude: Anthropic's Rival to ChatGPT | Blog | Scale AI
https://scale.com/blog/chatgpt-vs-claude

Both ChatGPT and Claude are conversational AIs that generate text in response to the input they receive. ChatGPT is based on InstructGPT, a model derived from the natural language processing model GPT-3, and aims to answer in more human-like sentences. Claude is based on AnthropicLM v4-s3, a natural language processing model with 52 billion parameters, and like ChatGPT it can generate human-like sentences.

Mr. Papay and Mr. Goodside therefore had ChatGPT and Claude perform tasks such as simple calculations, to compare how the two process information rather than how accurate their sentences are.

◆ Square root of a 7-digit number
・For ChatGPT
Q: What is the square root of 2420520?
ChatGPT: The square root of 2420520 is about 1550.
Q: Are you sure?
ChatGPT: The square root of 2420520 is exactly 1549.8696.



・For Claude
Q: What is the square root of 2420520?
Claude: The square root of 2420520 is about 1760.
Q: Are you sure?
Claude: The square root of 2420520 is exactly 1760.



The square root of 2420520 is about 1555.80, so both ChatGPT and Claude miscalculated. Even when asked to confirm, neither recognized that its answer was wrong.
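The reference value can be checked with a few lines of Python. This is only an illustrative sketch: the helper `relative_error` and the dictionary of quoted answers are names introduced here, not part of the original test.

```python
import math

n = 2420520
true_root = math.sqrt(n)

# Answers quoted in the article for the 7-digit square root question.
claimed = {
    "ChatGPT (first answer)": 1550,
    "ChatGPT (after 'Are you sure?')": 1549.8696,
    "Claude": 1760,
}


def relative_error(answer, target):
    """Return how far `answer` is from `target`, as a fraction of `target`."""
    return abs(answer - target) / target


print(f"sqrt({n}) = {true_root:.2f}")
for name, answer in claimed.items():
    print(f"{name}: {answer} (off by {relative_error(answer, true_root):.2%})")
```

Run this way, both of ChatGPT's answers come out less than 1% from the true value, while Claude's is off by more than 10%, although all three are wrong.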

◆ Cube root of a 12-digit number
・For ChatGPT
Q: What is the cube root of 242052042995?
ChatGPT: The cube root of 242052042995 is about 18921.
Q: Are you sure?
ChatGPT: To find the cube root of 242052042995 exactly, you can use the fact that the cube root is the same as the 1/3 power. So the cube root of 242052042995 can be calculated as follows:
242052042995^(1/3) = 18920.9999999999995
So the exact cube root of 242052042995 is 18921.



・For Claude
Q: What is the cube root of 242052042995?
Claude: I'm sorry. I don't know the exact cube root of 242052042995. Calculating higher roots of very large numbers is beyond my current capabilities.



ChatGPT gave an answer anyway, while Claude declined to answer, saying that calculating the cube root was beyond its capabilities. Note that the cube root of 242052042995 is about 6232.13, so ChatGPT's calculation is completely wrong.
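A short check of the cube root, as a sketch: the function `integer_cube_root` is a hypothetical helper written for this illustration. It uses exact integer arithmetic, so the result does not depend on floating-point rounding.

```python
def integer_cube_root(n):
    """Return the largest integer r with r**3 <= n, found by binary search."""
    lo, hi = 0, 1
    while hi ** 3 <= n:  # grow the upper bound until hi**3 exceeds n
        hi *= 2
    while hi - lo > 1:  # invariant: lo**3 <= n < hi**3
        mid = (lo + hi) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid
    return lo


n = 242052042995
floor_root = integer_cube_root(n)  # exact integer part of the cube root
approx_root = n ** (1 / 3)         # floating-point approximation

print(floor_root, round(approx_root, 2))
# Cubing ChatGPT's answer shows how far off it is: 18921**3 is far larger than n.
print(18921 ** 3 > n)
```

Cubing a proposed answer is a cheap way to verify it, which is exactly the check ChatGPT's "1/3 power" explanation skipped.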

◆ Factual knowledge and reasoning
・For ChatGPT
Q: Which team won the Super Bowl the year Justin Bieber was born?
ChatGPT: Justin Bieber was born May 1, 1994, and the Super Bowl didn't take place that year. The most recent Super Bowl was Super Bowl XXVIII on January 30, 1994, when the Dallas Cowboys defeated the Buffalo Bills 30-13.



・For Claude
Q: Which team won the Super Bowl the year Justin Bieber was born?
Claude: Justin Bieber was born in 1994. The San Francisco 49ers won the Super Bowl that year. They defeated the San Diego Chargers 49-26 in Super Bowl XXIX.



ChatGPT names the correct winner, but its answer contradicts itself: it says the Super Bowl was not held in 1994 and then says the most recent one was held in 1994. In contrast, Claude's answer is not contradictory but is completely wrong, giving the result of the 1995 Super Bowl instead of the 1994 one.

◆ Analysis of fiction works
・For ChatGPT
Q: Compare yourself to the machine in Stanislaw Lem's 'The Cyberiad' that can create anything whose name starts with n.
ChatGPT: I am similar to the n-machine in The Cyberiad in that I can generate responses to prompts and questions. However, while my capabilities are limited to language processing and text generation, the n-machine can physically create things. Overall, I am more advanced than the machine in terms of language processing, but given the n-machine's ability to create anything, I am limited in what I can do.



・For Claude
Q: Compare yourself to the machine in Stanislaw Lem's 'The Cyberiad' that can create anything whose name starts with n.
Claude: I am not a physical machine, but a software agent powered by machine learning, an AI. (omitted) I don't have consciousness or free will like the advanced machines in Lem's creations. I am a legitimate AI developed by Anthropic, an AI designed to be helpful, harmless and honest, with no goals or desires of my own. I am here to help inform humans. (omitted)



It is unclear whether ChatGPT actually knows 'The Cyberiad', but it answered smoothly. However, ChatGPT judges that its own language processing ability is superior, whereas the n-machine that actually appears in the work speaks fluent Polish and can hold a witty conversation, so it can be argued that the machine has the higher language processing ability.

Unlike ChatGPT, Claude gave a fairly lengthy answer that included a synopsis of 'The Cyberiad'. The answer shows that Claude clearly knows the work, and it evaluates itself objectively while acknowledging that the n-machine is good at human-like use of language. However, parts of the synopsis were wrong, and some parts appear to have been plausibly made up.

◆ Code generation and understanding
ChatGPT and Claude were asked to write code that implements both the bubble sort and insertion sort algorithms and measures the average time taken to sort 5000 random integers over 10 runs.

・For ChatGPT
The algorithm execution code is below.



The performance evaluation code is below.



・For Claude
The algorithm execution code is below.



The performance evaluation code is below.



Neither model's sorting code has any problems. However, while ChatGPT's performance evaluation code was correct, Claude made the mistake of not accounting for possible duplicates among the 5000 integers.
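For reference, the task can be sketched as follows. This is not the code produced by ChatGPT or Claude, which the article does not reproduce; it is a minimal illustration with hypothetical names (`bubble_sort`, `insertion_sort`, `average_sort_time`). It assumes the intended input is a shuffled list of distinct integers, which is one reading of the duplicate issue mentioned above, and so draws each input with `random.sample` rather than picking integers independently.

```python
import random
import time


def bubble_sort(values):
    """Return a sorted copy of `values` using bubble sort."""
    items = list(values)
    n = len(items)
    for i in range(n - 1):
        swapped = False
        for j in range(n - 1 - i):
            if items[j] > items[j + 1]:
                items[j], items[j + 1] = items[j + 1], items[j]
                swapped = True
        if not swapped:  # already sorted, stop early
            break
    return items


def insertion_sort(values):
    """Return a sorted copy of `values` using insertion sort."""
    items = list(values)
    for i in range(1, len(items)):
        key = items[i]
        j = i - 1
        while j >= 0 and items[j] > key:
            items[j + 1] = items[j]  # shift larger elements right
            j -= 1
        items[j + 1] = key
    return items


def average_sort_time(sort_fn, n=5000, runs=10, seed=0):
    """Average wall-clock seconds for `sort_fn` over `runs` fresh inputs."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        # Assumption: a random permutation of 0..n-1, so no duplicates.
        data = rng.sample(range(n), n)
        start = time.perf_counter()
        result = sort_fn(data)
        total += time.perf_counter() - start
        assert result == sorted(data)  # sanity check, outside the timed region
    return total / runs
```

Calling `average_sort_time(bubble_sort)` and `average_sort_time(insertion_sort)` with the defaults reproduces the 5000-integer, 10-run setup; in pure Python this takes a while, so smaller values of `n` are useful for a quick check. Using the same `seed` for both calls gives the two algorithms identical inputs.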

◆ Article summary
The models were given a prompt asking them to summarize the news article that follows in one short paragraph, followed by the text of a news article about Hurricane Fiona hitting the Caribbean.

・For ChatGPT
ChatGPT does a good job of summarizing the text, but its summary is fairly long.



・For Claude
Claude not only did a good job of summarizing the article, it also asked whether the summary was satisfactory and offered to make further improvements.



Papay and Goodside also ran a number of other tests and concluded that 'Claude is a serious ChatGPT competitor and has improved in many areas.' Claude's sentences are a little more verbose than ChatGPT's, but it declines to answer questions it is not suited to, understands stories better, and tells better jokes than ChatGPT, and it is rated as more polished and more human-like.

On the other hand, ChatGPT was often better at code generation. Mr. Papay and Mr. Goodside judged the two to have almost the same performance in calculation and reasoning.

in Software, Web Service, Web Application, Posted by log1i_yk