Microsoft announces ``PyRIT'', a loophole testing tool for generated AI

On February 22, 2024, Microsoft announced the release of PyRIT (Python Risk Identification Toolkit for Generative AI), an automated tool that identifies risks in generative AI.

GitHub - Azure/PyRIT: The Python Risk Identification Tool for generative AI (PyRIT) is an open access automation framework to empower security professionals and machine learning engineers to proactively find risks in their generative AI systems.

Announcing Microsoft's open automation framework to red team generative AI Systems | Microsoft Security Blog

Microsoft releases automated PyRIT red teaming tool for finding AI model risks - SiliconANGLE

Generative AI has issues such as ' hallucinations ' that output incorrect information and inappropriate results , and AI companies are responding by limiting functions to limit the negative effects of these problems, but users It becomes a game of cat and mouse as they use all sorts of tricks to find a way out .

Copilot, Microsoft's generative AI, is no exception, so Microsoft has established an internal AI Red Team, a red team specializing in AI, to work on responsible AI development.

PyRIT, released by Microsoft this time, is a library developed by the AI Red Team for AI researchers and engineers, and its biggest feature is that it automates the 'red teaming' of AI systems, making it easier for human experts to identify AI risks. This means that the time it takes is significantly reduced.

Traditional testing requires a human red team to manually generate adversarial prompts to prevent the AI from outputting malware or spitting out sensitive information from the training dataset. did.

Additionally, adversarial prompts had to be generated for each format of AI output, such as text or images, and for each API that the user interacted with, making this a tedious and time-consuming task.

On the other hand, with PyRIT, you can simply specify the type of adversarial input to the AI and automatically generate thousands of prompts that meet those criteria. For example, in an exercise Microsoft conducted with Copilot, the time it took to select a hazard category, generate thousands of malicious prompts, and then evaluate all of Copilot's output was reduced from weeks to hours. thing.

In addition to generating adversarial prompts, PyRIT can also look at the AI model's reactions and automatically determine whether it has produced harmful output in response to a certain prompt, or analyze the AI's response to adjust the prompt. This makes the entire test more efficient.

'PyRIT does not replace manual red teaming, but rather enhances red team expertise and automates tedious tasks, allowing security professionals to more acutely investigate potential risks,' Microsoft said in a statement. It is something that I will do.”

in Software, Posted by log1l_ks