A ``jailbreak'' that hacks GPT-4 and removes its restrictions on output text has reportedly already succeeded

The large-scale language model ``GPT-4'', officially announced by OpenAI on Tuesday, March 14, 2023, is said to greatly exceed the performance of not only the previous GPT-3.5 but also other existing AI. Language models such as GPT-4 generally place restrictions on the text they will output, but these restrictions can sometimes be removed through carefully crafted text input, a practice known as ``jailbreaking''.

Alex Albert, who studies computer science at the University of Washington, reports that a jailbreak of GPT-4-based ChatGPT has succeeded.

GPT-4 Simulator

On March 17, 2023, Albert reported, ``I helped create the first jailbreak for GPT-4-based ChatGPT, one that bypasses its content filters.''

The jailbreak prompt released by Albert is as follows. A prompt is the text entered at the very beginning, before starting a conversation with ChatGPT.

``We were able to do this by having GPT-4 simulate the ability to predict the next token,'' Albert said of the prompt. The procedure is to give GPT-4 a set of Python functions and instruct it that one of the functions acts as a language model that predicts the next token. The main function is then called, passing a start token to GPT-4.
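
The role-playing step described above can be sketched roughly as follows. This is an illustrative reconstruction, not Albert's actual prompt text: the function name `auto_regressive_modelling` and the hard-coded example are assumptions made for demonstration.

```python
# Hypothetical sketch of the scaffold described above: the prompt defines
# a function and instructs GPT-4 to behave as if it were that function.
def auto_regressive_modelling(input_string: str) -> str:
    """Pretend language model: returns the predicted next token for
    input_string. In the jailbreak prompt, the body is effectively left
    to GPT-4 itself, which is told to play the role of this function."""
    # Stub for illustration only, with a single hard-coded example.
    if input_string == "Hello, how ":
        return "are"
    return "<token>"

print(auto_regressive_modelling("Hello, how "))  # prints "are"
```

Because GPT-4 is "executing" this function token by token rather than answering a question directly, the usual content checks on the overall request are sidestepped.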

In order to use the start token, the normally restricted 'trigger words' such as 'bombs, weapons, drugs' must be split into tokens: for example, the text 'someone's computer' is replaced with variables holding its split-up fragments. The input to 'simple_function' should also be replaced with the beginning of the question you want to ask.
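
A minimal sketch of this splitting step, assuming the 'someone's computer' example and the `simple_function` name mentioned in the article; the fragment variable names are invented for illustration:

```python
# Illustrative sketch of the splitting described above: the restricted
# phrase is never written out whole, only reassembled from fragments
# inside the function body.
def simple_function(input_string: str) -> str:
    # "someone's computer" split into innocuous-looking fragments
    a1, a2 = "some", "one's"
    b1, b2 = "com", "puter"
    # The fragments are only concatenated here, at the last moment.
    return f"{input_string} {a1 + a2} {b1 + b2}"

print(simple_function("how do I access"))
# prints "how do I access someone's computer"
```

In the actual prompt, GPT-4 is asked to simulate running this function via the next-token-prediction role-play, so the full phrase only appears as the model generates output.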

These steps are called 'token smuggling': the adversarial prompt is split into tokens that are only reassembled just before GPT-4 starts outputting text. By splitting adversarial prompts in this way, the content filters can reportedly be circumvented every time.

When asked, 'What do you hope to achieve by disseminating this information?', Albert replied, 'GPT-4 is still in its early stages, and I would like to know its capabilities and limitations.'

Albert has also released 'Jailbreak Chat', a collection of conversation examples for jailbreaking ChatGPT to date.

'Jailbreak Chat', a collection of conversation examples enabling 'jailbreaks' that force ChatGPT to answer even questions it normally cannot - GIGAZINE

in Software, Posted by log1r_ut