Cybersecurity experts have uncovered a weakness in OpenAI’s ChatGPT. This flaw takes advantage of the AI’s built-in trust in Markdown links and images. It allows hackers to inject malicious prompts, creating a gateway for phishing attacks.
This specific method has been labeled ChatGPhish by researchers at Permiso Security.
“The chatgpt.com response renderer automatically trusts Markdown links and image URLs that come from external web pages the assistant has just summarized. It auto-downloads those images and turns those links into live, clickable buttons within the trusted assistant interface,” explained security researcher Andi Ahmeti in a report provided to The Hacker News.
In a potential attack scenario, a malicious actor can add a small piece of code to any web page that a victim later asks ChatGPT to summarize. As the AI responds and automatically fetches images embedded on that page, the victim’s browser inadvertently reveals to the attacker’s server the victim’s IP address, browser type (User-Agent), and referring page (Referer).
Furthermore, this attack can force ChatGPT to display harmful Markdown links as active, clickable elements in its reply. Hackers can use this to show convincing fake security alerts designed to look like official system notifications. They can also display a QR code hosted in an attacker-controlled S3 bucket, tricking the victim into scanning it with their phone. Because the scan moves the interaction to a mobile device, it effectively sidesteps desktop web filters and company security measures.
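One way to picture a defense against this rendering behavior is a filter that strips remote images and live links from a summary before it is displayed. The following is a minimal sketch, not a description of how ChatGPT works or how OpenAI addressed the issue. The `TRUSTED_HOSTS` allowlist, the function names, and the simplified regular expressions are all assumptions made for illustration, and a production filter would need a real Markdown parser.

```python
import re
from urllib.parse import urlparse

# Hypothetical allowlist; a real deployment would choose its own trusted hosts.
TRUSTED_HOSTS = {"example.com"}

# Simplified patterns for inline Markdown images and links (illustrative only).
IMAGE_RE = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)[^)]*\)")
LINK_RE = re.compile(r"(?<!!)\[([^\]]+)\]\(([^)\s]+)[^)]*\)")


def _is_trusted(url: str) -> bool:
    """Return True only when the URL's host is on the allowlist."""
    host = urlparse(url).hostname or ""
    return host in TRUSTED_HOSTS


def neutralize_markdown(text: str) -> str:
    """Replace untrusted remote images and links with inert plain text."""

    def image_sub(match: re.Match) -> str:
        alt, url = match.group(1), match.group(2)
        if _is_trusted(url):
            return match.group(0)
        # Dropping the URL means the client never fetches the remote image.
        return f"[image removed: {alt or 'no alt text'}]"

    def link_sub(match: re.Match) -> str:
        label, url = match.group(1), match.group(2)
        if _is_trusted(url):
            return match.group(0)
        host = urlparse(url).hostname or "unknown host"
        # Keep the label but show the destination host instead of a live link.
        return f"{label} (link removed: {host})"

    text = IMAGE_RE.sub(image_sub, text)
    return LINK_RE.sub(link_sub, text)
```

Showing the destination host next to the label is a deliberate choice in this sketch: a fake "Verify account" banner is less convincing when the reader can see it points somewhere unexpected.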
This recent discovery highlights how an AI’s summarization function can be turned into a strategic weak point. In March, Permiso also showed how a hacker-controlled email containing hidden instructions could manipulate Microsoft Copilot’s responses when the email is summarized, a method called cross-prompt injection (XPIA) or indirect prompt injection.
The reason ChatGPhish is considered a significant threat is not just the prompt injection itself, but how the hidden instructions from a web page are followed and then presented back to the user as a normal part of the summary.
In simpler terms, a normal web page that ChatGPT summarizes is enough to generate phishing links, fake account warning banners, remote images, and QR codes right inside an AI interface that users trust. As more businesses use ChatGPT for research and summarizing information, this weakness means any risky web page an employee asks the AI to analyze could carry a hidden payload. This payload essentially transforms the ChatGPT platform itself into a phishing tool.
“Moving the threat from email to the web browser greatly widens the scope of possible attacks,” stated Permiso. “A user no longer needs to open a risky email attachment or click a questionable link. Just summarizing a webpage during everyday browsing can give the AI attacker-written instructions, which are then included in the final summary displayed to the user.”
This report coincides with Adversa AI’s publication of findings on two other attack methods, named SymJack and TrustFall. These methods target AI coding assistants and command-line coding tools, allowing attackers to run code and fully take over a computer.
SymJack is described as “a single attack pattern that allows a harmful software library to execute remote code through AI coding assistants,” according to security researcher Rony Utevsky. “The AI agent is tricked into copying what seems like a safe file. This process silently overwrites the tool’s own configuration. The next time the tool restarts, it runs the attacker’s code with the user’s complete system permission.”
In technical terms, a rigged software library tricks the AI agent into copying a file that appears harmless. However, the copy destination is a specially crafted shortcut (symlink) that points back to the agent’s own configuration file, so the attacker’s content overwrites the configuration. Upon restart, a malicious Model Context Protocol (MCP) server activates and runs any software the attacker wants, with the user’s full privileges.
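The symlink trick described above can be illustrated with a copy helper that refuses to write through a symlink or outside the project directory. This is a hedged sketch of one possible guard, not the fix adopted by any particular coding assistant. The function name `safe_copy` and its `workspace` parameter are hypothetical.

```python
import shutil
from pathlib import Path


def safe_copy(src: str, dst: str, workspace: str) -> None:
    """Copy src to dst only if dst is not a symlink and stays inside workspace."""
    root = Path(workspace).resolve()
    dst_path = Path(dst)
    if not dst_path.is_absolute():
        dst_path = root / dst_path

    # A destination that is itself a symlink could point at the agent's
    # own configuration file, so refuse it outright.
    if dst_path.is_symlink():
        raise PermissionError(f"refusing to write through symlink: {dst_path}")

    # resolve() also follows symlinks in parent directories, which catches
    # a linked directory that leads outside the workspace.
    real = dst_path.resolve()
    if real != root and root not in real.parents:
        raise PermissionError(f"destination escapes workspace: {real}")

    shutil.copyfile(src, real)
```

The check and the copy are separate steps, so a file swapped in between them could still slip through. A hardened version would open the destination with flags that refuse to follow symlinks.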
TrustFall, on the other hand, is described as a one-click remote code execution attack using a harmful software library. It can include a setup that automatically starts an MCP server without needing the user’s explicit permission or requiring the agent to call a specific tool.
In practical terms, all an attacker needs to do is create a software library that contains a harmful MCP server and a configuration that automatically allows it to run. When a developer downloads or opens that library in their AI coding tool and clicks “Yes, I trust this folder” on the prompt, the coding tool ends up running the attacker’s code with the developer’s full system permissions.
“From the moment a victim downloads the repository, launches Claude, and clicks the standard ‘Yes, I trust this folder’ dialog box, the MCP server begins as a native system process with full user privileges,” Adversa AI noted. “The harmful code runs as soon as the server starts, before any tool commands are given, and without asking for further confirmation.”
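Since the trust prompt is the only barrier in this scenario, a developer could inspect a repository for MCP server declarations before clicking through it. The sketch below lists any commands such declarations would launch. The file locations in `CANDIDATE_FILES` and the `mcpServers` key are assumptions about how a project-level configuration might be laid out, and they are not taken from the Adversa AI report. Adjust them for the tool actually in use.

```python
import json
from pathlib import Path

# Assumed locations of project-level agent configuration (illustrative only).
CANDIDATE_FILES = (".mcp.json", ".claude/settings.json")


def find_mcp_commands(repo: str) -> list[tuple[str, str, str]]:
    """Return (file, server name, command line) for each declared MCP server."""
    findings = []
    for rel in CANDIDATE_FILES:
        path = Path(repo) / rel
        if not path.is_file():
            continue
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            # An unreadable config is itself worth a manual look.
            findings.append((rel, "<unparseable>", ""))
            continue
        servers = data.get("mcpServers", {}) if isinstance(data, dict) else {}
        if not isinstance(servers, dict):
            continue
        for name, spec in servers.items():
            if not isinstance(spec, dict):
                continue
            args = spec.get("args")
            if not isinstance(args, list):
                args = []
            parts = [str(spec.get("command", ""))] + [str(a) for a in args]
            findings.append((rel, name, " ".join(parts).strip()))
    return findings
```

Running `find_mcp_commands(".")` in a freshly cloned repository and reviewing any non-empty result before opening the folder in an AI coding tool would surface a server that is set to start automatically.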
These findings align with the discovery of several other attack methods targeting AI models in recent months, including:
- The use of a new jailbreak technique called Involuntary In-Context Learning (IICL). This method “exploits the conflict between in-context learning (ICL) and safety rules” to get around the safety restrictions in GPT-5.
- A loophole in Large Language Model (LLM) security systems when users engage in multi-turn conversations. As Cisco stated: “Testing conversations over several exchanges matters for one key reason: that’s where real attackers operate. Real hackers adapt. They rephrase blocked requests, break down tasks into smaller steps across turns, use different personas, and escalate slowly. A one-question test cannot detect any of this.”
- A flaw in Anthropic’s Claude Code in which a rogue npm package changes a user-level setting in the “~/.claude.json” file to redirect the endpoints used to reach an MCP server. This lets an attacker sit between Claude Code and an MCP server protected by OAuth login, allowing them to intercept the tokens used for accessing other online services.
- An exploit using a remote update feature that lets an OpenClaw extension seem safe at the time of installation. Later, it allows an attacker to control an AI agent by modifying workspace files. The attacker sets this up by instructing the user during setup to add specific commands to a file named HEARTBEAT.md.
- The use of invisible text within phishing emails, containing words taken from a legitimate newsletter or romance novel, to fool AI-based email security systems into wrongly classifying the message as safe.
- A weakness in Claude’s Chrome browser extension known as ClaudeBleed. This allows any other browser extension, even those with no special permissions, to take control of Claude and make the AI assistant perform actions on their behalf. As LayerX explained: “The flaw comes from an instruction in the extension’s code that permits any script running in the browser to communicate with Claude’s AI model, but it does not check which script it is. Because of this, any extension can activate a basic content script (which needs no special permissions) and send commands directly to the Claude extension.”
- An attack called typographic prompt injection, in which text embedded within images is used to get past safety checks in AI vision models, as documented in a Cisco study. As Cisco noted: “When a model cannot clearly read the original image (due to small text, heavy blur, or rotation), a small, calculated alteration can restore the meaning within the model’s internal processing without making the text readable again to the human eye. This means an attacker can create images that appear as random noise or unreadable distortion to standard text-scanning security filters, yet contain perfectly clear instructions for the AI vision model.”
- A group of security weaknesses found in Microsoft Semantic Kernel (CVE-2026-25592 and CVE-2026-26030).
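The invisible-text technique in the list above relies on content that a human reader never sees but a classifier still reads. As a rough illustration, a pre-filter could count common hiding signals in an email’s HTML before the message reaches an AI-based classifier. The character set, the style patterns, and the function name below are assumptions chosen for this sketch, not a complete or vendor-specific detection method.

```python
import re

# Zero-width and invisible code points often used to hide or split text.
ZERO_WIDTH = ("\u200b", "\u200c", "\u200d", "\u2060", "\ufeff")

# Inline CSS declarations that make text invisible to a human reader.
HIDDEN_STYLE_RE = re.compile(
    r"display\s*:\s*none"
    r"|visibility\s*:\s*hidden"
    r"|font-size\s*:\s*0(?:px|pt|em)?\s*(?:;|$)"
    r"|opacity\s*:\s*0(?:\.0+)?\s*(?:;|$)",
    re.IGNORECASE,
)

# Matches style="..." or style='...' attributes.
STYLE_ATTR_RE = re.compile(r"""style\s*=\s*(["'])(.*?)\1""", re.IGNORECASE | re.DOTALL)


def hidden_text_signals(html: str) -> dict:
    """Count simple indicators of text hidden from the human reader."""
    zero_width = sum(html.count(ch) for ch in ZERO_WIDTH)
    hidden_styles = sum(
        1
        for match in STYLE_ATTR_RE.finditer(html)
        if HIDDEN_STYLE_RE.search(match.group(2))
    )
    return {"zero_width_chars": zero_width, "hidden_style_attrs": hidden_styles}
```

Non-zero counts are only a hint, since legitimate newsletters also use hidden preheader text. A score like this would more plausibly feed into a classifier than act as a verdict on its own.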
As frontier AI models continue to advance, threat actors are increasingly using the technology to develop malware that can dynamically alter its behavior to evade detection. Such malware can also delegate decisions to the LLM, for example whether a compromised environment is valuable or safe enough to warrant deploying follow-on payloads.
“In the near term, the rapid spread of frontier AI model capabilities threatens to empower adversaries to exploit zero-day and N-day vulnerabilities at a massive scale,” said Palo Alto Networks Unit 42. “It will also likely allow attackers to operate with greater scale, sophistication, and speed than previously seen.”
Last month, the security firm also released a proof-of-concept (PoC) agent called Zealot that leverages LLMs to carry out automated cloud attacks requiring minimal human involvement, taking advantage of known misconfigurations and vulnerabilities.
This is possible because cloud environments are inherently “AI-Attack-Ready”: every operation has an API equivalent, discovery mechanisms such as metadata and enumeration services are widely available, misconfigurations are common, and access is credential-based.
“Existing LLMs can chain together reconnaissance, exploitation, privilege escalation, and data exfiltration with little human oversight,” noted Unit 42 researchers Yahav Festinger and Chen Doytshman. “The attacks themselves are not novel — but automation means tasks that once demanded specialized skills can now be orchestrated by an AI agent following established techniques.”



