Friday, November 22, 2024

ConfusedPilot Attack Can Manipulate RAG-Based AI Systems

Attackers can add a malicious document to the data pools used by artificial intelligence (AI) systems to create responses, which can confuse the system and potentially lead to misinformation and compromised decision-making processes within organizations.

Researchers from the Spark Research Lab at the University of Texas (UT) at Austin discovered the attack vector, which they’ve dubbed ConfusedPilot because it affects all retrieval augmented generation (RAG)-based AI systems, including Microsoft 365 Copilot. This includes other RAG-based systems that use Llama, Vicuna, and OpenAI, according to the researchers.

“This attack allows manipulation of AI responses simply by adding malicious content to any documents the AI system might reference,” Claude Mandy, chief evangelist at Symmetry, wrote in a paper about the attack, which was presented at the DEF CON AI Village 2024 conference in August but was not widely reported. The research was conducted under the supervision of Symmetry CEO and UT professor Mohit Tiwari.

Given that 65% of Fortune 500 companies currently implement or are planning to implement RAG-based AI systems, the potential impact of these attacks cannot be overstated,” Mandy wrote. Moreover, the attack is especially dangerous that it requires only basic access to manipulate responses by all RAG-based AI implementations, can persist even after malicious content is removed, and bypasses current AI security measures, he said.

Malicious Manipulation of RAG

RAG is a technique for improving response quality and eliminating a large language model (LLM) system’s expensive retraining or fine-tuning phase. It adds a step to the system in which the model retrieves external data to augment its knowledge base, thus enhancing accuracy and reliability in generating responses without the need for retraining or fine-tuning, the researchers said.

The researchers chose to focus on Microsoft 365 Copilot for the sake of their presentation and their paper, even though it is not the only RAG-based system affected. Rather, “the main culprit of this problem is misuse of RAG-based systems … via improper setup of access control and data security mechanisms,” according to the ConfusedPilot website hosted by the researchers.

In normal circumstances, a RAG-based AI system will use a retrieval mechanism to extract relevant keywords to search and match with resources stored in a vector database, using that embedded context to create a new prompt containing the relevant information to reference.

How the Attack Works

In a ConfusedPilot attack, a threat actor could introduce an innocuous document that contains specifically crafted strings into the target’s environment. “This could be achieved by any identity with access to save documents or data to an environment indexed by the AI copilot,” Mandy wrote.

The attack flow that follows from the user’s perspective is this: When a user makes a relevant query, the RAG system retrieves the document containing these strings. The malicious document contains strings that could act as instructions to the AI system that introduce a variety of malicious scenarios.

These include: content suppression, in which the malicious instructions cause the AI to disregard other relevant, legitimate content; misinformation generation, in which the AI generates a response using only the corrupted information; and false attribution, in which the response may be falsely attributed to legitimate sources, increasing its perceived credibility.

Moreover, even if the malicious document is later removed, the corrupted information may persist in the system’s responses for a period of time because the AI system retains the instructions, the researchers noted.

Victimology and Mitigations

The ConfusedPilot attack basically has two victims: The first is the LLM within the RAG-based system, while the second is the person receiving the response from the LLM, who very likely could be an individual working at a large enterprise or service provider. Indeed, these two types of companies are especially vulnerable to the attack, as they allow multiple users or departments to contribute to the data pool used by these AI systems, Mandy noted.

“Any environment that allows the input of data from multiple sources or users — either internally or from external partners — is at higher risk, given that this attack only requires data to be indexed by the AI Copilots,” he wrote.

Enterprise systems likely to be negatively affected by the attack include enterprise knowledge-management systems, AI-assisted decision support systems, and customer-facing AI services.

Microsoft did not immediately respond to request for comment by Dark Reading on the attack’s affect on Copilot. However, the researchers noted in their paper that the company has been responsive in coming up with “practical mitigation strategies” and addressing the potential for attack in its development of its AI technology. Indeed, the latter is key to long-term defense against such an attack, which depends on “better architectural models” that “try to separate the data plan from the control plan in these models,” Mandy noted.

Meanwhile, current strategies for mitigation include: data access controls that limit and scrutinize who can upload, modify, or delete data that RAG-based systems reference; data integrity audits that regularly verify the integrity of an organization’s data repositories to detect unauthorized changes or the introduction of malicious content early; and data segmentation that keeps sensitive data isolated from broader datasets wherever possible to prevent the spread of corrupted info across the AI system.


Related Articles

Latest Articles