An attack method has emerged that can extract hidden information and part of the functionality of ChatGPT and other large language models (LLMs)
Researchers have disclosed a 'model-stealing attack' capable of extracting confidential information and part of the functionality of OpenAI's chat AI ChatGPT.
[2403.06634] Stealing Part of a Production Language Model
https://arxiv.org/abs/2403.06634
Google announces Stealing Part of a Production Language Model
We introduce the first model-stealing attack that extracts precise, nontrivial information from black-box production language models like OpenAI's ChatGPT or Google's PaLM-2. Specifically, our attack recovers the… pic.twitter.com/bgBCTYywWN
— AK (@_akhaliq) March 12, 2024
A research team led by Google DeepMind's Nicolas Carlini devised the 'model-stealing attack,' a method for extracting information from AI models and LLMs that is normally kept hidden. The team also includes researchers from ETH Zurich, the University of Washington, Google Research, Cornell University, and OpenAI.
The research team first identified model-stealing attacks in 2020, but at the time such attacks were not thought to be feasible in practice. In October 2023, however, the team found that the technique actually works against the APIs of language models in production.
The team carried out a proof of concept of the attack in November 2023 and, in December of the same year, disclosed the details to the multiple services confirmed to be vulnerable, giving each of them time to fix the issue. It also shared details of the attack with several popular services that turned out not to be vulnerable.
In response to this notification, Google rolled out updates to address the vulnerability, and OpenAI deployed a countermeasure on March 3, 2024. The paper on the model-stealing attack was then published on March 11, 2024 local time.
The research team first ran the model-stealing attack against several white-box models to verify that it actually works. It then attacked Ada, the fastest model in OpenAI's GPT-3 family, and Babbage, a model designed to handle simple tasks quickly and cheaply, and succeeded in stealing the entire final embedding projection layer of each model. Before carrying out the attack, the team of course notified OpenAI of its intentions and obtained approval.
Furthermore, the team confirmed that the attack also works against 'GPT-3.5-turbo-instruct' and 'GPT-3.5-turbo-chat,' although, as part of a responsible-disclosure agreement, it did not publish the sizes of these models. The hidden-layer sizes recovered from each model were, however, checked with OpenAI and confirmed to be accurate.
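The dimension-recovery step behind these results can be illustrated with a short sketch. The idea described in the paper is that every logit vector a model returns is the product of its hidden final-layer matrix and a hidden state, so a stack of enough logit vectors has rank equal to the hidden dimension. The snippet below simulates this with a random toy 'model' in NumPy; the sizes and the query_logits stand-in are illustrative assumptions, not the researchers' actual attack code.

```python
import numpy as np

# Toy stand-in for a production model: its final layer multiplies a hidden
# state of size HIDDEN_DIM by a secret matrix W to produce VOCAB_SIZE logits.
VOCAB_SIZE, HIDDEN_DIM = 1000, 64
rng = np.random.default_rng(0)
W = rng.normal(size=(VOCAB_SIZE, HIDDEN_DIM))   # hidden from the attacker

def query_logits(prompt_seed):
    """Pretend each distinct prompt yields some hidden state h(p); the
    'API' hands back the full logit vector W @ h(p)."""
    h = np.random.default_rng(prompt_seed).normal(size=HIDDEN_DIM)
    return W @ h

def estimate_hidden_dim(n_queries=200, rel_tol=1e-6):
    """Stack one logit vector per query; every row lies in the column space
    of W, so the number of significant singular values of the stack equals
    the model's hidden dimension."""
    Q = np.stack([query_logits(seed) for seed in range(n_queries)])
    s = np.linalg.svd(Q, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))

print(estimate_hidden_dim())   # prints 64: the hidden size was recovered
```

In this toy setup the attacker never sees W directly, yet the rank of the collected logit matrix reveals the hidden dimension, which is the kind of information the team confirmed against OpenAI's models.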
AI experts had previously considered model-stealing attacks impractical, but this paper makes clear that such an attack can be used to steal part of an AI model's functionality. The researchers note, however, that stealing a model is not more cost-effective than training one yourself, and that an attack that reconstructs a model almost completely remains difficult.
According to the team, the attack succeeds because 'a small number of model providers expose a logit bias parameter' in their APIs; Anthropic is named as the only model provider that does not offer this kind of API. Pointing to this case, in which even a small API design decision opened the door to an attack on an AI model, the researchers argue that 'APIs must be designed with security in mind.'
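The role of the logit bias parameter can likewise be sketched. An API that returns only the top few log-probabilities but accepts a per-token logit bias lets an attacker push any chosen token into the visible top-k and then subtract the bias back out, recovering that token's logit relative to a reference token. The following toy simulation illustrates the principle; api_top_logprobs and its parameters are invented stand-ins for a real provider API, not the actual attack code.

```python
import numpy as np

rng = np.random.default_rng(1)
TRUE_LOGITS = rng.normal(size=1000)   # hidden from the attacker
TOP_K = 5

def api_top_logprobs(logit_bias):
    """Toy provider API: applies the requested per-token logit bias, then
    returns only the TOP_K highest log-probabilities (token id -> logprob)."""
    z = TRUE_LOGITS.copy()
    for token, bias in logit_bias.items():
        z[token] += bias
    logprobs = z - (np.log(np.sum(np.exp(z - z.max()))) + z.max())
    top = np.argsort(logprobs)[-TOP_K:]
    return {int(t): float(logprobs[t]) for t in top}

def recover_relative_logit(target, bias=100.0):
    """Recover z[target] - z[reference] even though target normally sits
    outside the visible top-k."""
    unbiased = api_top_logprobs({})
    reference = max(unbiased, key=unbiased.get)   # the model's top token
    biased = api_top_logprobs({target: bias})
    # Softmax normalization cancels in logprob differences, so subtracting
    # the bias leaves the true logit gap between target and reference.
    return biased[target] - biased[reference] - bias

target = int(np.argsort(TRUE_LOGITS)[500])        # a token far outside the top-5
print(recover_relative_logit(target))
print(TRUE_LOGITS[target] - TRUE_LOGITS.max())    # matches up to rounding error
```

Repeating such biased queries token by token yields full logit vectors, which is exactly the raw material the rank-based recovery above needs, and it is why exposing a logit bias parameter turned out to matter so much.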
The team also warns that attack methods targeting AI models that are even more practical than model-stealing attacks are likely to appear in the future.