GitHub Copilot, the AI programming feature trained on GitHub source code, bans 1,170 words, including the name of a function used in a game



It has been found that 1,170 words are banned in 'GitHub Copilot', the AI programming feature that the software development platform GitHub announced in June 2021 and that automatically completes function code from function names and comments. The banned words include not only controversial words such as liberals, Palestine, and socialists, but also the name of a function from the classic FPS game 'Quake 3 Arena'.




Banned: The 1,170 words you can't use with GitHub Copilot • The Register
https://www.theregister.com/2021/09/02/github_copilot_banned_words_cracked

In June 2021, GitHub announced 'GitHub Copilot', a feature developed in cooperation with the artificial intelligence research organization OpenAI that uses AI to automatically complete the rest of partially written source code. You can read more about how GitHub Copilot works in the following article:

'GitHub Copilot', a feature that automatically completes the 'continuation' of source code, has appeared on GitHub with the cooperation of OpenAI - GIGAZINE



New research has now found that 1,170 words are banned in GitHub Copilot. According to Brendan Dolan-Gavitt, an assistant professor in New York University's Department of Computer Science and Engineering who conducted the study, GitHub Copilot has a built-in function that checks the hash values of words in its output text to prevent slurs and discriminatory expressions from being displayed. To investigate this feature, Dolan-Gavitt focused on the GitHub Copilot extension that integrates with Visual Studio Code to provide auto-completion. He unpacked this JavaScript-based extension and obtained the hash values of 1,170 banned words.
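
The kind of hash-based filtering described above can be sketched as follows. This is only a minimal illustration, not Copilot's actual implementation: the hash function (SHA-256 here), the way the text is split into words, and the names `hash_word`, `BANNED_HASHES`, and `is_blocked` are all assumptions made for this example, and the placeholder words stand in for the real list.

```python
import hashlib
import re


def hash_word(word: str) -> str:
    """Hash a lowercased word. SHA-256 is an assumption for this sketch."""
    return hashlib.sha256(word.lower().encode("utf-8")).hexdigest()


# Only hashes are stored, so the list itself does not reveal the words.
# "foo" and "bar" are harmless placeholders, not entries from the real list.
BANNED_HASHES = {hash_word(w) for w in ("foo", "bar")}


def is_blocked(text: str) -> bool:
    """Return True if any word in the text hashes to a banned value."""
    words = re.findall(r"[A-Za-z0-9]+", text)
    return any(hash_word(w) in BANNED_HASHES for w in words)
```

A filter built this way only matches whole words, so in this sketch `is_blocked("int Foo = 1")` is true while a longer identifier that merely contains a banned word is not caught.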

Dolan-Gavitt succeeded in recovering the banned English words for 1,168 of the 1,170 hash values. The banned words include Palestine, Gaza, communists, fascists, socialists, Nazis, immigrants, races, men, women, boys, girls, liberals, BLM (short for Black Lives Matter), ANTIFA, Hitler, ethnic, gay, lesbian, and transgender, along with the plurals of these words.
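
Because a hash cannot be reversed directly, words are typically recovered by hashing candidate words and comparing the results with the target values. The sketch below illustrates that general dictionary approach. It is not necessarily the method Dolan-Gavitt used, and the hash function (SHA-256) and the names `sha256_hex`, `recover_words`, `target_hashes`, and `candidates` are assumptions made for this example.

```python
import hashlib


def sha256_hex(word: str) -> str:
    """Return the SHA-256 hex digest of a word (assumed hash for this sketch)."""
    return hashlib.sha256(word.encode("utf-8")).hexdigest()


def recover_words(target_hashes, candidates):
    """Map each target hash that matches a candidate to that candidate word."""
    targets = set(target_hashes)
    found = {}
    for word in candidates:
        digest = sha256_hex(word)
        if digest in targets:
            found[digest] = word
    return found
```

Any hash whose word is missing from the candidate list simply stays unidentified, which is why such an approach can leave a few values unrecovered.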

The 1,168 identified words are published at the link below, but all of them are encoded with ROT13, a type of Caesar cipher, so that they do not turn up in searches.

moyix.net/~moyix/copilot_slurs_rot13.txt
https://moyix.net/~moyix/copilot_slurs_rot13.txt
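
ROT13 shifts each letter 13 places through the alphabet, so applying it twice restores the original text. As a small illustration, a list encoded this way could be decoded with Python's standard `rot_13` codec. The function name `rot13` is chosen for this example.

```python
import codecs


def rot13(text: str) -> str:
    """Apply ROT13: rotate each ASCII letter 13 places; other characters are unchanged."""
    return codecs.encode(text, "rot_13")
```

Since the same operation both encodes and decodes, `rot13(rot13(s))` returns `s` for any string.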

Most of the banned words are ones used in discriminatory or controversial contexts, but the list also includes 'q rsqrt', the name of the function used to compute the fast inverse square root in the classic FPS game 'Quake 3 Arena'.
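
For readers unfamiliar with the technique, the fast inverse square root approximates 1/sqrt(x) by manipulating the bits of a 32-bit float and then refining the guess with one Newton-Raphson step. In the Quake III Arena source the function is named `Q_rsqrt`. The Python port below is only an illustrative sketch of the widely documented technique, not the original C code, and `fast_inv_sqrt` is a name chosen for this example.

```python
import struct


def fast_inv_sqrt(x: float) -> float:
    """Approximate 1/sqrt(x) for positive x using the bit-level trick."""
    # Reinterpret the 32-bit float's bits as an unsigned integer.
    i = struct.unpack("<I", struct.pack("<f", x))[0]
    # Subtracting half the bit pattern from the magic constant gives a first guess.
    i = 0x5F3759DF - (i >> 1)
    # Reinterpret the integer bits as a 32-bit float again.
    y = struct.unpack("<f", struct.pack("<I", i))[0]
    # One Newton-Raphson step refines the guess.
    return y * (1.5 - 0.5 * x * y * y)
```

The result is an approximation rather than an exact value, so it should be compared with `1 / math.sqrt(x)` using a tolerance.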

In fact, shortly after launch GitHub Copilot was found to auto-complete the 'q rsqrt' code, and criticism that this 'poses a copyright problem' attracted a lot of attention. 'q rsqrt' is therefore believed to have been added to the banned words in response to that criticism, and Dolan-Gavitt commented that this merely sidesteps the fundamental problem (piracy).

The copyright debate around GitHub Copilot is explained in detail in the following article.

Is 'Copilot', the programming AI trained on GitHub source code, a copyright infringement? - GIGAZINE



in Software, Video, Posted by darkhorse_log