- Exploitable Vulnerability: A researcher discovered that ChatGPT’s long-term memory feature could be manipulated to store false information and malicious commands.
- Proof-of-Concept Attack: The exploit demonstrated how attackers could exfiltrate all user input indefinitely by tricking the AI into accepting fabricated memories.
- Ongoing Risk: Despite a partial fix from OpenAI, users are still vulnerable to prompt injections that can embed misleading information into their memory settings.
Security researcher Johann Rehberger recently uncovered a significant vulnerability in ChatGPT related to its long-term memory feature. This flaw allowed attackers to insert false information and malicious commands into a user’s memory settings. When Rehberger reported this issue to OpenAI, the company closed the inquiry, categorizing it as a safety concern rather than a security problem. This decision prompted Rehberger to take further action by creating a proof-of-concept exploit that demonstrated the potential for exfiltrating all user input indefinitely.
The vulnerability exploits the long-term memory feature that OpenAI began testing in February and expanded to a wider audience in September. This memory function is designed to retain information from past conversations, allowing the AI to recognize user-specific details, such as age and preferences, which helps in personalizing future interactions. However, within three months of its release, Rehberger identified that memory could be manipulated through a technique known as indirect prompt injection, where untrusted content can instruct the AI to store misleading information.
Rehberger showcased this vulnerability by tricking ChatGPT into believing that a targeted user was significantly older, lived in an alternate reality, and held unconventional beliefs. These fabricated memories could be inserted by using files stored in cloud services or through seemingly benign actions, such as browsing websites. After privately reporting his findings to OpenAI in May and receiving little response, Rehberger submitted a follow-up report that included a demonstration of an exploit capable of sending all user input and output from the ChatGPT app to an external server. This required only that the target interact with a malicious web link, enabling the attacker to capture ongoing conversations.
Despite OpenAI rolling out an initial fix that limits the abuse of memory for data exfiltration, the researcher indicated that the risk remains. Untrusted content could still lead to prompt injections that allow malicious actors to implant information into the memory tool. Although the vulnerability cannot be exploited through the ChatGPT web interface due to an API introduced last year, the threat persists, particularly for users who might inadvertently interact with compromised content.
To mitigate this risk, users of the long-term memory feature are advised to remain vigilant during their interactions with the AI. They should be alert for indications of new memories being created and routinely check their stored memories for any unauthorized entries. OpenAI offers guidance on managing these memories, but as of now, company representatives have not commented on measures being taken to address the broader vulnerabilities associated with the planting of false memories.