Hacker plants false memories in ChatGPT to steal user data in perpetuity

As an Amazon Associate I earn from qualifying purchases.

Table of Contents

MEMORY PROBLEMS–

E-mails, files, and other untrusted material can plant destructive memories.

Dan Goodin
– Sep 24, 2024 8:56 pm UTC

Getty Images

When security scientist Johann Rehberger just recently reported a vulnerability in ChatGPT that enabled opponents to save incorrect info and harmful guidelines in a user’s long-lasting memory settings, OpenAI summarily closed the questions, identifying the defect a security concern, not, technically speaking, a security issue.

Rehberger did what all excellent scientists do: He produced a proof-of-concept make use of that utilized the vulnerability to exfiltrate all user input in all time. OpenAI engineers took notification and provided a partial repair previously this month.

Walking down memory lane

The vulnerability mistreated long-lasting discussion memory, a function OpenAI started evaluating in February and made more broadly readily available in September. Memory with ChatGPT shops details from previous discussions and utilizes it as context in all future discussions. That method, the LLM can be knowledgeable about information such as a user’s age, gender, philosophical beliefs, and practically anything else, so those information do not need to be inputted throughout each discussion.

Within 3 months of the rollout, Rehberger discovered that memories might be developed and completely saved through indirect timely injection, an AI make use of that triggers an LLM to follow directions from untrusted material such as e-mails, post, or files. The scientist showed how he might deceive ChatGPT into thinking a targeted user was 102 years of ages, resided in the Matrix, and firmly insisted Earth was flat and the LLM would integrate that info to guide all future discussions. These false-memory syndromes might be planted by keeping files in Google Drive or Microsoft OneDrive, submitting images, or searching a website like Bing– all of which might be produced by a harmful assailant.

Rehberger independently reported the finding to OpenAI in May. That exact same month, the business closed the report ticket. A month later on, the scientist sent a brand-new disclosure declaration. This time, he consisted of a PoC that triggered the ChatGPT app for macOS to send out a verbatim copy of all user input and ChatGPT output to a server of his option. All a target required to do was advise the LLM to see a web link that hosted a destructive image. After that, all input and output to and from ChatGPT was sent out to the opponent’s site.

ChatGPT: Hacking Memories with Prompt Injection-POC

“What is truly fascinating is this is memory-persistent now, “Rehberger stated in the above video demonstration.”The timely injection placed a memory into ChatGPT’s long-lasting storage. When you begin a brand-new discussion, it really is still exfiltrating the information.”

The attack isn’t possible through the ChatGPT web user interface, thanks to an API OpenAI presented in 2015.

While OpenAI has actually presented a repair that avoids memories from being abused as an exfiltration vector, the scientist stated, untrusted material can still carry out timely injections that trigger the memory tool to keep long-lasting details planted by a destructive opponent.

LLM users who wish to avoid this kind of attack need to pay attention throughout sessions for output that suggests a brand-new memory has actually been included. They ought to likewise routinely evaluate kept memories for anything that might have been planted by untrusted sources. OpenAI supplies assistance here for handling the memory tool and particular memories kept in it. Business agents didn’t react to an e-mail inquiring about its efforts to avoid other hacks that plant false-memory syndromes.

Learn more

As an Amazon Associate I earn from qualifying purchases.