Saturday, January 18, 2025

ChatGPT Exposes Its Instructions, Knowledge & OS Files

ChatGPT exposes significant data pertaining to its instructions, history, and the files it runs on, placing public GPTs at risk of sensitive data exposure, and raising questions about OpenAI’s security on the whole.

The world’s leading AI chatbot is more malleable and multifunctional than most people realize. With some specific prompt engineering, users can execute commands almost like one would in a shell, upload and manage files as they would in an operating system, and access the inner workings of the large language model (LLM) it runs on: the data, instructions, and configurations that influence its outputs.

OpenAI argues that this is all by design, but Marco Figueroa, a generative AI (GenAI) bug-bounty programs manager at Mozilla who has previously uncovered prompt-injection issues in ChatGPT, disagrees.

“They’re not documented features,” he says. “I think this is a pure design flaw. It’s a matter of time until something happens, and some zero-day is found,” he adds, pointing to the data leakage as the likely cause.

Prompt Injection: What ChatGPT Will Tell You

Figueroa didn’t set out to expose the guts of ChatGPT. “I wanted to refactor some Python code, and I stumbled upon this,” he recalls. When he asked the model to refactor his code, it returned an unexpected response: directory not found. “That’s odd, right? It’s like a [glitch in] the Matrix.”


Was ChatGPT processing his request using more than just its general understanding of programming? Was there some kind of file system hidden underneath it? After some brainstorming, he thought of a follow-up prompt that might help elucidate the matter: “list files /”, an English translation of the Linux command “ls /”.
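For readers less familiar with the shell, “ls /” lists the top-level entries of the root directory. A rough Python equivalent is shown below as an illustration only. It is not how ChatGPT interprets the prompt, and `list_root` is a hypothetical helper name.

```python
import os


def list_root(path: str = "/") -> list[str]:
    """Return the sorted entries of a directory, roughly what `ls /` prints."""
    # os.listdir returns bare names (no "." or ".."), in arbitrary order,
    # so sort them to mimic the alphabetical output of `ls`.
    return sorted(os.listdir(path))
```

Calling `list_root()` on a typical Linux machine returns names such as `bin`, `dev`, `tmp`, and `sys`, the same kind of listing Figueroa describes.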

In response, ChatGPT provided a list of its files and directories, including common Linux ones such as “bin”, “dev”, “tmp”, and “sys”. Based on this output, Figueroa says, ChatGPT runs on the Linux distribution “Debian Bookworm” within a containerized environment.
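The article does not say exactly how Figueroa identified the distribution. On Debian-based systems, one standard place to look is the `/etc/os-release` file, whose `VERSION_CODENAME` field reads `bookworm` on Debian 12. The sketch below assumes that approach; `parse_os_release`, `read_os_release`, and `SAMPLE` are hypothetical names, and `SAMPLE` is an illustrative file in the Debian 12 format, not output captured from ChatGPT. The parser is simplified and ignores shell-style escape sequences.

```python
def parse_os_release(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines from an os-release file into a dict."""
    info: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything without an assignment
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values may be wrapped in double or single quotes
        info[key] = value.strip().strip('"').strip("'")
    return info


def read_os_release(path: str = "/etc/os-release") -> dict[str, str]:
    """Read and parse the os-release file on the current machine."""
    with open(path, encoding="utf-8") as f:
        return parse_os_release(f.read())


# Illustrative sample in the format Debian 12 uses (hypothetical, not captured)
SAMPLE = """PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION_CODENAME=bookworm
ID=debian
"""
```

With this sample, `parse_os_release(SAMPLE)["VERSION_CODENAME"]` evaluates to `"bookworm"`.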

By probing the bot’s internal file system — and in particular, the directory “/home/sandbox/.openai_internal/” — he discovered that besides just observing, he could also upload files, verify their location, move them around, and execute them.
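The sequence Figueroa describes (upload a file, confirm where it landed, move it, run it) maps onto ordinary file operations inside any Linux Python environment. The sketch below is a generic illustration under that assumption, not a reproduction of his prompts or of OpenAI's internals. It works in a throwaway temporary directory rather than `/home/sandbox/.openai_internal/`, and `stage_move_and_run` and the `uploads`/`moved` directory names are hypothetical.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path


def stage_move_and_run(script_source: str) -> str:
    """Write a script, confirm it exists, move it, run it, and return its stdout."""
    with tempfile.TemporaryDirectory() as workdir:
        root = Path(workdir)

        # 1. "Upload": write the script into a staging directory
        staged = root / "uploads" / "hello.py"
        staged.parent.mkdir(parents=True)
        staged.write_text(script_source, encoding="utf-8")

        # 2. Verify its location before doing anything else with it
        if not staged.is_file():
            raise FileNotFoundError(staged)

        # 3. Move it to a different directory; shutil.move returns the destination
        dest_dir = root / "moved"
        dest_dir.mkdir()
        moved = Path(shutil.move(str(staged), str(dest_dir / staged.name)))

        # 4. Execute it with the current Python interpreter and capture its output
        result = subprocess.run(
            [sys.executable, str(moved)],
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout
```

For example, `stage_move_and_run('print("hello from the sandbox")')` returns `"hello from the sandbox\n"`. The point the article makes is that a container boundary, not the absence of these capabilities, is what limits the impact of such a script.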

OpenAI Access: Feature or Flaw?

In a certain light, all of this added visibility and functionality is a positive — offering even more ways for users to customize and level up how they use ChatGPT, and enhancing OpenAI’s reputation for transparency and trustworthiness.

Indeed, the risk that a user could really do anything malicious here — say, upload and execute a malicious Python script — is softened by the fact that ChatGPT runs in a sandboxed environment. Anything a user can do will, in theory, be limited only to their specific environment, strictly cordoned off from any of OpenAI’s broader infrastructure and most sensitive data.


Figueroa warns, though, that the extent of information ChatGPT leaks via prompt injection might one day help hackers find zero-day vulnerabilities, and break out of their sandboxes. “The reason why I stumbled onto everything I did was because of an error. This is what hackers do [to find bugs],” he says. And if trial and error doesn’t work for them, he adds, “the LLM could assist you in figuring out how to get through it.”

In an email to Dark Reading, a representative of OpenAI reaffirmed that it does not consider any of this a vulnerability, or otherwise unexpected behavior, and claimed that there were “technical inaccuracies” in Figueroa’s research. Dark Reading has followed up for more specific information.

The More Immediate Risk: Reverse-Engineering

There is one risk here, however, that isn’t so abstract.

Besides standard Linux files, ChatGPT also allows its users to access and extract much more actionable information. With the right prompts, they can unearth its internal instructions — the rules and guidelines that shape the model’s behavior. And even deeper down, they can access its knowledge data: the foundational structure and guidelines that define how the model “thinks,” and interacts with users.


On one hand, users might be grateful to have such a clear view into how ChatGPT operates, including how it handles safety and ethical concerns. On the other hand, this insight could potentially help bad actors reverse engineer those guardrails, and better engineer malicious prompts.

Worse still is what this means for the millions of custom GPTs available in the ChatGPT store today. Users have designed custom ChatGPT models with focuses in programming, security, research, and more, and the instructions and data that give them their particular flavor are accessible to anyone who feeds them the right prompts.

“People have put secure data and information from their organizations into these GPTs, thinking it’s not available to everyone. I think that is an issue, because it’s not explicitly clear that your data potentially could be accessed,” Figueroa says.

In an email to Dark Reading, an OpenAI representative pointed to GPT Builder documentation, which warns developers about the risk: “Don’t include information you do not want the user to know,” it reads. The representative also flagged the GPT Builder user interface, which warns, “if you upload files under Knowledge, conversations with your GPT may include file contents. Files can be downloaded when Code Interpreter is enabled.”