OpenAI proposes checking each step of reasoning during training so that chat AI, which is poor at arithmetic and calculation, can perform mathematical reasoning correctly

Chat AIs such as ChatGPT and Google Bard are based on large language models such as OpenAI's GPT and Google's PaLM 2, and can converse in sentences as natural as those written by humans. However, because they are language-processing AIs, they often make simple calculation mistakes when asked to solve math problems. OpenAI reports that rewarding each step of reasoning improves a chat AI's ability to solve math problems.

Improving mathematical reasoning with process supervision

To see whether chat AI really is bad at mathematics, we asked three chat AIs, ChatGPT, Google Bard, and Bing Chat, to solve the following three problems. The first is a simple multiplication of integers, the second is a factorization, and the third is a system of equations.

Problem 1: Calculate 2023×1225.
When we gave the problem to ChatGPT as is, it answered '2478475'. The correct answer is '2478175', so this is wrong.

Google Bard answered correctly.

When we gave it other integer multiplication problems, it got all of them right as well.

Bing Chat answered '2479075', which is incorrect.
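
The answers quoted above can be checked with exact integer arithmetic. The following is a minimal sketch in Python; the dictionary only restates the two figures quoted in this article (Google Bard's answer is described as correct but not quoted, so it is left out).

```python
# Python integers are exact, so the product is computed without rounding.
correct = 2023 * 1225

# Answers as quoted in this article.
reported = {
    "ChatGPT": 2478475,
    "Bing Chat": 2479075,
}

for name, answer in reported.items():
    # A positive difference means the chat AI overshot the true product.
    print(f"{name}: {answer} (off by {answer - correct})")

print("Correct:", correct)
```

Both wrong answers have the right number of digits and the right leading digits, which is why such mistakes are easy to overlook.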

Problem 2: Factor the following expression.
3x^2 - 17x - 6

When we asked ChatGPT to factor the expression, it gave its answer along with an explanation of the procedure. The answer was (3x+1)(x-6), which is correct.

Google Bard also walked through the factorization step by step, and its answer was correct as well.

Bing Chat is integrated with the Bing search engine, so it simply added search results to its answer. It gives the answer, but not the process that leads to it.
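
A factorization can be verified by multiplying the factors back out. The sketch below assumes polynomials are stored as coefficient lists with the highest degree first; `poly_mul` is a helper written for this illustration, not part of any library.

```python
def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists, highest degree first."""
    result = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            # The term at index i times the term at index j lands at index i + j.
            result[i + j] += a * b
    return result

# (3x + 1)(x - 6)
product = poly_mul([3, 1], [1, -6])
print(product)  # [3, -17, -6], i.e. 3x^2 - 17x - 6
```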

Problem 3: Solve the following system of equations in three unknowns.
2a + 2b + c = 3
2a + 3b + 2c = 1

ChatGPT chose to solve the system using the augmented coefficient matrix. The problem itself is simple enough for a junior high school student to solve, so we were surprised to see a slightly advanced technique that is not taught in junior high school.

However, although the choice of method was good, ChatGPT made the fatal mistake of turning a '1' into a '-1' for no apparent reason.

ChatGPT carried on with the augmented coefficient matrix without noticing its mistake and presented its answer. The correct answer, however, was 'a = 2, b = 1, c = -3'.
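
The stated solution can be checked by substituting it back into the equations. The sketch below covers only the equations reproduced in this article; `satisfies` is a helper introduced for illustration.

```python
# Each entry is (coefficients of a, b, c) and the right-hand side.
equations = [
    ((2, 2, 1), 3),  # 2a + 2b + c = 3
    ((2, 3, 2), 1),  # 2a + 3b + 2c = 1
]

solution = (2, 1, -3)  # a = 2, b = 1, c = -3

def satisfies(coeffs, rhs, values):
    """Return True if the values make the left-hand side equal the right-hand side."""
    return sum(k * v for k, v in zip(coeffs, values)) == rhs

checks = [satisfies(coeffs, rhs, solution) for coeffs, rhs in equations]
print(checks)  # [True, True]
```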

Google Bard started well by trying to combine the three equations, but for some reason it then made a simple mistake, claiming that dividing '6a + 6b + 4c = 4' by 6 gives 'a + b + c = 2/3', and so it could not reach the correct answer.
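
The slip is easy to see when every coefficient is divided by 6 exactly. A minimal sketch using Python's standard `fractions` module:

```python
from fractions import Fraction

# Coefficients of a, b, c and the right-hand side of 6a + 6b + 4c = 4.
coeffs, rhs = (6, 6, 4), 4

# Divide every term by 6, keeping exact fractions instead of floats.
divided = tuple(Fraction(k, 6) for k in coeffs)
divided_rhs = Fraction(rhs, 6)

print(divided, divided_rhs)
```

The coefficient of c becomes 2/3 rather than 1, so the division actually yields a + b + (2/3)c = 2/3.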

Bing Chat did its best to solve the problem on its own without relying on the search engine, but it made the careless mistake of writing a variable incorrectly, and in the end it could not get the right answer either.

In this way, even though chat AI is a computer program, it is poor at calculation and reasoning, and it often gets even simple arithmetic and math problems wrong. This is due to logical errors known as 'hallucinations', and mitigating hallucinations is a current challenge for large language models and chat AI.

Therefore, OpenAI proposes training a reward model that detects hallucinations using not only 'outcome supervision', which provides feedback based on the final result produced by the large language model, but also 'process supervision', which provides feedback on each step of the reasoning performed by the large language model.
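
The difference between the two kinds of feedback can be illustrated with a toy example. This is only a sketch under stated assumptions, not OpenAI's reward model: the worked solution and its error are invented for illustration, an arithmetic checker stands in for the feedback signal, and `recompute`, `outcome_feedback`, and `process_feedback` are hypothetical helper names.

```python
import math

# Hypothetical worked solution to 2023 x 1225, split into steps.
# Each step is (kind, operands, claimed value). The error is invented.
steps = [
    ("mul", (2023, 1000), 2023000),
    ("mul", (2023, 200), 404600),
    ("mul", (2023, 25), 50875),                   # wrong: 2023 * 25 is 50575
    ("add", (2023000, 404600, 50875), 2478475),   # consistent with the steps above
]

def recompute(kind, operands):
    """Recompute a single step exactly."""
    if kind == "mul":
        return math.prod(operands)
    if kind == "add":
        return sum(operands)
    raise ValueError(f"unknown step kind: {kind}")

def outcome_feedback(steps, correct_answer):
    """One signal for the whole solution, based only on the final value."""
    return steps[-1][2] == correct_answer

def process_feedback(steps):
    """One signal per step: does the claimed value match the recomputed one?"""
    return [recompute(kind, operands) == claimed
            for kind, operands, claimed in steps]

print(outcome_feedback(steps, 2023 * 1225))  # False
print(process_feedback(steps))               # [True, True, False, True]
```

Feedback on the outcome only says that the solution is wrong somewhere, while feedback on the process points to the third step as the place where it went wrong.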

A graph published by OpenAI shows how outcome supervision (blue) and process supervision (red) improve the performance (vertical axis) of a large language model on mathematical problems. From this, we can see that process supervision improves performance more efficiently than outcome supervision.

OpenAI said, "It is unclear how widely the results of this experiment can be generalized beyond the realm of mathematics, and it seems important for future research to investigate the impact of process supervision in areas other than mathematics. If process supervision is effective in fields other than mathematics, it may be said that process supervision is a training method that has both the advantage of being more efficient than outcome supervision and the advantage of being a rational method."

Posted by log1i_yk in Software, Web Service, Web Application