Microsoft Develops Anti-Trick Tools for Chatbots

"Prompt shields" in Azure AI Studio thwart prompt injection attacks, preventing AI models from behaving unexpectedly.

Last updated: March 29, 2024 11:44 AM

Published March 28, 2024 9:18 PM

Last updated: March 29, 2024 11:44 AM

Published March 28, 2024 9:18 PM

Microsoft Corp. aims to increase the difficulty of fooling chatbots with artificial intelligence into doing strange tasks.

The Redmond, Washington-based business announced in a blog post on Thursday that new security measures are being added to Azure AI Studio, an OpenAI-powered tool that enables developers to create personalized AI assistants using their data.

Among the tools are “prompt shields,” which are meant to identify and prevent intentional attempts to get an AI model to behave unexpectedly. These attempts are also referred to as prompt injection attacks or jailbreaks.

Microsoft is also tackling “indirect prompt injections,” which occur when malevolent instructions are inserted by hackers into the training data of a model, leading it to do illegal operations like obtaining user credentials or taking over a system.

According to Sarah Bird, chief product officer for responsible AI at Microsoft, these kinds of attacks represent “a unique challenge and threat.” According to her, the new protections are made to recognize questionable inputs and instantly block them.

Additionally, Microsoft is releasing a tool that notifies users when a model produces false results or embellishes information. Microsoft is eager to increase confidence in its generative AI tools, which are being utilized by both corporate and consumer clients.

The business looked into instances in February when their chatbot, Copilot, was producing strange and even dangerous responses. Microsoft said that after looking into the events, users had sought to trick Copilot into producing the responses on purpose.

Bird said, “Certainly we see it increasing as there’s more use of the tools but also as more people are aware of these different techniques.” Asking the same question of a chatbot repeatedly or receiving suggestions that refer to role-playing are telltale indicators of such an attack.

According to Bird, Microsoft and its partner OpenAI are committed to implementing AI securely and incorporating security measures into the extensive language models that underpin generative AI.

But you can’t depend just on the model, she added. “The model technology has an inherent weakness, which is these jailbreaks, for example.”