An attempt to watermark the text generated by ChatGPT so that it can be easily understood



While the interactive chat AI `

` ChatGPT '' announced by OpenAI has a very high ability to generate sentences, it sometimes creates `` meaningless sentences like that '', so it is temporary on Stack Overflow, a coding Q&A site. use is prohibited . In light of this situation, OpenAI engineer Scott Aaronson has revealed that he is developing a tool to watermark the generated text.

Shtetl-Optimized » Blog Archive » My AI Safety Lecture for UT Effective Altruism
https://scottaaronson.blog/?p=6823



OpenAI's attempts to watermark AI text hit limits | TechCrunch

https://techcrunch.com/2022/12/10/openais-attempts-to-watermark-ai-text-hit-limits/

ChatGPT has a very high accuracy as a sentence generation AI, and there are reports of cases where it passed free-form questions in college-level exams . On the flip side, this also means that you can cheat by telling ChatGPT the test questions and have them answer for you, and besides this, it can be used to create high-quality phishing emails and malware.

Also, as mentioned in the notes on using ChatGPT, it may return 'an answer that looks like it is meaningless'.

From these problems, it is considered necessary to add a ``watermark'' that can distinguish AI-generated sentences such as ChatGPT.

In 'GPT (Generative Pre-trained Transformer)', which is also used in ChatGPT, 'input' refers to a series of N words, also called tokens. 'Output' refers to the word that is most likely to be placed at the end of the token. The detailed mechanism is summarized in the following article.

Experts explain what kind of processing OpenAI-developed text generation AI ``GPT-3'' is doing - GIGAZINE



GPT is always generating a probability distribution of the next generated token given the string of previous tokens. After the neural network generates the distribution, OpenAI's server samples the tokens according to the distribution. Or modify the distribution according to the parameters and sample. Unless this parameter is '0', there is some degree of randomness in the sampling results, and you can get different tokens each time.

According to Aaronson, when watermarking, instead of choosing the next token completely randomly, the token will be chosen pseudo-randomly using a cryptographic pseudo-random function that only OpenAI knows the secret key. About. Even if the end user cannot tell whether the random number at the output is true or pseudo-random, if he knows the private key, he can tell that it was generated by GPT.

In response to the naturally occurring question, ``If OpenAI controls the server, why not add a watermark?'', Aaronson said, ``Who uses GPT and how How do you find out whether GPT generated a particular text or not, while keeping it a secret?' He points out the difficulty.

According to Aaronson, OpenAI engineer Hendrik Kirchner has already prototyped a watermarking tool, and it's working quite well.

Also, in the future, he seems to want to watermark not only GPT but also DALL E. As for watermarking images, the first thing that comes to mind is the one at the pixel level, but since it is easy to remove, he plans to add a watermark at the conceptual level. However, it is still unclear whether it will work.

in Web Service, Posted by logc_nt