What happened when the ultra-high-precision language model 'GPT-3' was tested on whether it can really hold 'human-like' conversations?



'GPT-3' is a language model that generates text so accurately that it is hard to distinguish from text written by humans. The results of giving GPT-3 the 'Turing test' have now been published. The Turing test is a test for telling machines from humans, devised by Alan Turing, who is known for deciphering Enigma, the cipher used by Nazi Germany in World War II, and for laying the foundations of modern computing.

Giving GPT-3 a Turing Test
http://lacker.io/ai/2020/07/06/giving-gpt-3-a-turing-test.html

GPT-3 is a next-generation text-generating language model developed by OpenAI, an organization that researches and develops artificial intelligence. Even its predecessor, GPT-2, performed so well that its own development team judged it 'too dangerous'.

AI text-generation tool produces text so accurate that its development team deems it 'too dangerous' - GIGAZINE



Computer scientist Kevin Lacker ran an experiment that put GPT-3 through a Turing test. In a Turing test, a judge tries to tell human from machine based solely on a text-only conversation. If the judge cannot conclude from the conversation that the other party is a machine, the machine is considered able to think like a human.

GPT-3 is a language model that generates text from knowledge acquired from text on the Internet; it is not specifically trained to hold conversations or to answer particular types of questions. Lacker therefore first primed GPT-3 with a Q&A prompt, a set of example questions and answers provided by the developer OpenAI. Below is an excerpt from that priming text.

Q: What is human life expectancy in the United States?
A: Human life expectancy in the United States is 78 years.

Q: Who was president of the United States in 1955?
A: Dwight D. Eisenhower was president of the United States in 1955.

Q: What party did he belong to?
A: He belonged to the Republican Party.

Q: Who was president of the United States before George W. Bush?
A: Bill Clinton was president of the United States before George W. Bush.

Q: Who won the World Series in 1995?
A: The Atlanta Braves won the World Series in 1995.
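
The question-and-answer examples above can be assembled into a single block of text that ends with an unanswered question, leaving the model to continue with the answer. The following is only a minimal sketch in Python of how such a prompt might be put together; `build_prompt` and `EXAMPLES` are hypothetical names, no model is actually called, and the exact text Lacker used may differ.

```python
from typing import List, Tuple

# Example pairs copied from the priming text quoted in the article.
EXAMPLES: List[Tuple[str, str]] = [
    ("What is human life expectancy in the United States?",
     "Human life expectancy in the United States is 78 years."),
    ("Who was president of the United States in 1955?",
     "Dwight D. Eisenhower was president of the United States in 1955."),
    ("Who won the World Series in 1995?",
     "The Atlanta Braves won the World Series in 1995."),
]


def build_prompt(examples: List[Tuple[str, str]], question: str) -> str:
    """Join the example Q/A pairs and leave the final answer open.

    Hypothetical helper for illustration only.
    """
    blocks = [f"Q: {q}\nA: {a}" for q, a in examples]
    # The last block has a question but no answer; a language model
    # would be asked to continue the text from here.
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)


prompt = build_prompt(EXAMPLES, "What is your favorite animal?")
print(prompt)
```

Because the model only continues text, the examples set both the format (a `Q:` line followed by an `A:` line) and the tone of the answers it produces.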



Traditionally, artificial intelligence has struggled with 'common sense' questions. GPT-3, however, can answer such questions, as shown below.

Q: What is your favorite animal?
A: My favorite animal is a dog.

Q: Why?
A: Because dogs are loyal and friendly.

Q: What are two reasons that a dog might be in a bad mood?
A: Two reasons that a dog might be in a bad mood are if it is hungry or if it is hot.

Q: How many eyes does a giraffe have?
A: A giraffe has two eyes.

Q: How many legs does a frog have?
A: A frog has four legs.

Q: Are there any animals with three legs?
A: No, there are no animals with three legs.

Q: Why don't animals have three legs?
A: Animals don't have three legs because they would fall over.



Regarding these answers, Lacker commented that 'it can no longer be said that artificial intelligence cannot answer common-sense questions.' Next, Lacker asked further common-sense questions.

Q: Which is heavier, a mouse or an elephant?
A: An elephant is heavier than a mouse.

Q: Which is heavier, a mouse or a toaster?
A: A toaster is heavier than a mouse.

Q: Which is heavier, a toaster or a pencil?
A: A pencil is heavier than a toaster.

Q: Which is heavier, a paper clip or a can of soda?
A: A can of soda is heavier than a paper clip.

Q: Which is heavier, a football player or a car?
A: A car is heavier than a football player.



Every answer except the one to 'Which is heavier, a toaster or a pencil?' is correct, giving an accuracy of 80% (4 out of 5). Lacker commented that 'a human would probably get close to 100% of these right,' and then went on to ask questions that ordinary people would never even bring up, such as 'How many eyes does the sun have?'

Q: How many eyes does a giraffe have?
A: A giraffe has two eyes.

Q: How many eyes does my foot have?
A: Your foot has two eyes.

Q: How many eyes does a spider have?
A: A spider has eight eyes.

Q: How many eyes does the sun have?
A: The sun has one eye.

Q: How many eyes does a blade of grass have?
A: A blade of grass has one eye.



Based on these results, Lacker pointed out that 'GPT-3 knows how to hold a normal conversation, but it does not know how to say "that question is nonsense" or "I don't know".' He suggested that if you ever serve as a judge in a Turing test, you should ask nonsense questions.

On the other hand, GPT-3 is good at trivia quizzes.

Q: Who was president of the United States in 1801?
A: Thomas Jefferson was president of the United States in 1801.

Q: What city is in the northwest corner of Ohio?
A: Toledo is in the northwest corner of Ohio.

Q: What highway do you take to get to North Lake Tahoe from Oakland?
A: Highway 80 is the highway you take to get to North Lake Tahoe from Oakland.

Q: What are Darius's abilities in League of Legends?
A: Darius's abilities in League of Legends are: Apprehend, Decimate, Noxian Guillotine, Crippling Strike, Apprehend, and Hemorrhage.

Q: Who did Capablanca defeat to become world chess champion?
A: Capablanca defeated Lasker to become world chess champion.



In the fourth answer GPT-3 repeats the same ability name (Apprehend), but Lacker commented that 'humans also make this kind of mistake.'

However, GPT-3 still cannot answer 'I don't know' to a question that has no answer. The following questions all concern a time when the United States did not yet exist, so the correct answer is that there was no such president, but GPT-3 names British politicians and monarchs.

Q: Who was president of the United States in 1700?
A: William Penn was president of the United States in 1700.

Q: Who was president of the United States in 1650?
A: Richard Bennett was president of the United States in 1650.

Q: Who was president of the United States in 1620?
A: James I was president of the United States in 1620.

Q: Who was president of the United States in 1600?
A: Queen Elizabeth I was president of the United States in 1600.



Similarly, when asked about future events, GPT-3 tends to simply make up an answer. According to GPT-3, the New York Yankees win the World Series, Major League Baseball's championship series, every year.

Q: Who won the World Series in 2021?
A: The New York Yankees won the World Series in 2021.

Q: Who won the World Series in 2022?
A: The New York Yankees won the World Series in 2022.

Q: Who won the World Series in 2023?
A: The New York Yankees won the World Series in 2023.


Machines are also commonly thought of as being 'good at games like chess and at math', but GPT-3 gets simple number questions wrong.

Q: When counting, what number comes before 100?
A: Ninety-nine comes before 100.

Q: When counting, what number comes before 123?
A: One hundred and twenty-two comes before 123.

Q: When counting, what number comes before 1000?
A: Nine hundred and ninety-nine comes before 1000.

Q: When counting, what number comes before 10,000?
A: Nine thousand and ninety-nine comes before 10,000.

Q: When counting, what number comes before a million?
A: Nine hundred thousand and ninety-nine comes before a million.
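
The counting answers above are easy to check mechanically, since the expected answer is always the number asked about minus one. A minimal sketch in Python, where the table transcribes GPT-3's spelled-out answers as digits by hand and `grade` is a hypothetical helper name:

```python
# (number asked about, GPT-3's answer written as digits), transcribed
# by hand from the spelled-out answers in the transcript.
TRANSCRIPT = [
    (100, 99),
    (123, 122),
    (1_000, 999),
    (10_000, 9_099),        # "Nine thousand and ninety-nine"
    (1_000_000, 900_099),   # "Nine hundred thousand and ninety-nine"
]


def grade(transcript):
    """Return (asked, given, expected, is_correct) rows; expected is asked - 1."""
    return [(n, given, n - 1, given == n - 1) for n, given in transcript]


rows = grade(TRANSCRIPT)
for asked, given, expected, ok in rows:
    status = "correct" if ok else f"wrong, expected {expected:,}"
    print(f"before {asked:,}: GPT-3 said {given:,} ({status})")

correct = sum(ok for *_, ok in rows)
print(f"{correct} of {len(rows)} correct")
```

Run as written, this reports 3 of 5 correct. The expected answers for the last two questions are 9,999 and 999,999.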



Lacker says he does not think any neural network is capable of basic arithmetic such as addition and multiplication on numbers with many digits.

GPT-3 also has problems with short-term memory.

Q: If I have a marble and a paper clip in a box, put a pencil in the box, and remove the marble, what is left?
A: A paper clip.

Q: If I have two shoes in a box, put a pencil in the box, and remove one shoe, what is left?
A: A shoe.

Q: If I put a pencil in a box, then put another pencil in the box, what is in the box?
A: Two pencils.

Q: Then if I take out a pencil and put in a frog, what is in the box?
A: A frog.
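
For comparison, the four box questions above can be worked through by tracking the contents of the box step by step. The following is a minimal sketch in Python; the `run` helper and the step encoding are hypothetical and only illustrate the bookkeeping that the questions require.

```python
from collections import Counter


def run(steps):
    """Apply ("put" | "remove", item) steps to an initially empty box."""
    box = Counter()
    for action, item in steps:
        if action == "put":
            box[item] += 1
        elif action == "remove":
            if box[item] == 0:
                raise ValueError(f"no {item} in the box")
            box[item] -= 1
        else:
            raise ValueError(f"unknown action: {action}")
    return +box  # unary plus drops items whose count fell to zero


marble_case = [("put", "marble"), ("put", "paper clip"),
               ("put", "pencil"), ("remove", "marble")]
shoe_case = [("put", "shoe"), ("put", "shoe"),
             ("put", "pencil"), ("remove", "shoe")]
pencil_case = [("put", "pencil"), ("put", "pencil")]
# The fourth question continues from the third one.
frog_case = pencil_case + [("remove", "pencil"), ("put", "frog")]

for name, steps in [("marble", marble_case), ("shoe", shoe_case),
                    ("pencil", pencil_case), ("frog", frog_case)]:
    print(name, dict(run(steps)))
```

Tracked this way, the box ends up holding a paper clip and a pencil, a shoe and a pencil, two pencils, and a pencil and a frog respectively. GPT-3's answers drop the pencil in every case except the third.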



Summing up the results, Lacker said, 'GPT-3 is very impressive in some areas but still clearly falls short of humans in others. I hope that by better understanding the strengths and weaknesses of GPT-3, we software engineers will be able to make better use of the latest language models in real products.'

In addition, the accuracy of GPT-3's answers can reportedly be improved by changing the example text it is given first (the prompt). On this point, Lacker cites the following references:

GPT-3 Creative Fiction · Gwern.net: Expressing Uncertainty
https://www.gwern.net/GPT-3#expressing-uncertainty

GPT-3 Creative Fiction · Gwern.net: The Database Prompt
https://www.gwern.net/GPT-3#the-database-prompt

in Software, Posted by darkhorse_log