AI has become our new authority figure. We simply follow its instructions without question. Perhaps we should exercise more caution and critical thinking.
Concerns about how well AI agents perform are long-standing, ranging from “leaky” systems to outright poor decision-making. Given the mounting pressure to deploy more agents with greater autonomy, driven by increasingly sophisticated AI-assisted attacks, Adversa AI’s initiative to evaluate and benchmark the performance and security of 100 agents across ten categories is a welcome development.
However, the findings are deeply troubling. Out of the 100 agents tested and mapped onto a new AI Risk Quadrant, only 11 are classified as both “capable and well-defended.”
The core issue is what Adversa calls the AI agent “lethal trifecta”: access to private data, exposure to untrusted content, and the ability to perform outbound actions. In plain terms, this equates to excessive power, excessive trust, and insufficient oversight.
Since all three elements of this trifecta are essential for an AI agent to fulfill its purpose, achieving both capability and security simultaneously is an enormous challenge. Ninety-eight percent of the agents exhibit this trifecta, so while it’s not surprising, it’s still alarming to learn that so few manage to be both useful and secure.
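To make the definition concrete, here is a minimal sketch of how an operator might flag the trifecta in an agent inventory. The `AgentProfile` fields and the example agents are hypothetical illustrations, not Adversa's scoring schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentProfile:
    """Hypothetical capability flags for one agent (not Adversa's schema)."""
    name: str
    reads_private_data: bool         # access to private data
    ingests_untrusted_content: bool  # exposure to untrusted content
    can_act_outbound: bool           # ability to perform outbound actions


def has_lethal_trifecta(agent: AgentProfile) -> bool:
    """True only when all three conditions hold at the same time."""
    return (
        agent.reads_private_data
        and agent.ingests_untrusted_content
        and agent.can_act_outbound
    )


# Two made-up inventory entries for illustration.
desktop_agent = AgentProfile("desktop-agent", True, True, True)
offline_summarizer = AgentProfile("offline-summarizer", True, True, False)

for agent in (desktop_agent, offline_summarizer):
    print(agent.name, has_lethal_trifecta(agent))
```

Removing any one of the three conditions breaks the trifecta, but as the analysis points out, it usually also removes something the agent needs to do its job.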
Capability and security are nearly at odds with each other. “The same vendors delivering the most capable agents also deliver the widest attack surface—a structural reality of the market, not a few isolated cases,” notes Adversa’s analysis in its AI Risk Quadrant for Agent Security report. The firm labels this phenomenon a “power-protection inversion” and observes that it appears across all ten agent categories.
The agent categories exhibiting the most severe power-protection inversion are “computer agents,” followed by “coding agents.”
Computer agents are built to carry out specific tasks—such as making decisions or performing actions on behalf of a user. Since agents can only work with the information they have (the context problem, where inadequate context leads to flawed decisions across all agents), computer agents are granted broad access rights, essentially full control of the operating system. “A compromise gives the attacker the user’s entire machine, not just a single application or browser tab,” Adversa cautions.
Computer agents also face a problem common to all agents: the user has little to no visibility into or control over what the agent actually does. The agent receives an input (the task) and produces an output (the completed task). But the user has no idea what path a computer agent takes between input and output, nor what specific actions it performs within the operating system along the way.
“The deeper issue is that the desktop confirmation step appears to be a control but is unreliable in practice,” the analysis warns. “The human and the model reason over different abstractions (windows and labels versus screenshots and accessibility trees). That gap creates a confirmation mismatch: the human approves the appearance of the action, not what the agent is actually about to do, because nothing in the interface reveals the difference.”
The second most problematic category in the “exposed giants” quadrant is coding agents. This is particularly concerning as “vibe-coding” applications are shaping the future of software development, and “vibe-coded” in-house applications may remain in use for years to come.
The analysis breaks coding agents into three types: coding copilots (where a human reviews each suggestion), autonomous coding agents (goal-in, repo-out), and app builders (prompt-to-deployed-app). The first type might seem the least risky, but the user still doesn’t know what the agent does between input and output. “Coding agents don’t just write code—they interact with the shell, dependencies, and tokens long before a diff reaches review,” Adversa explains.
“This is the category where a compromise most directly becomes a production compromise. The danger isn’t bad code suggestions; it’s high-trust operation within the software supply chain. Non-determinism makes code review an incomplete safeguard: even if a human reviews the final diff, the agent may have already accessed secrets, run tests against production-like services, modified configurations, or selected risky dependencies. Review catches outputs; it doesn’t capture the full trail of actions.”
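One way to narrow the gap between reviewing outputs and seeing actions is to record every tool call the agent makes. The sketch below is an assumption about how that could look, not something the Adversa report prescribes: `ActionTrail` and the `read_file` tool are hypothetical names, and a real agent framework would need its own integration point.

```python
import time
from typing import Any, Callable


class ActionTrail:
    """Records each tool call so a reviewer can see actions, not only the diff."""

    def __init__(self) -> None:
        self.events = []  # list of dicts, one per attempted tool call

    def wrap(self, tool_name: str, tool: Callable[..., Any]) -> Callable[..., Any]:
        """Return a version of `tool` that logs every invocation."""

        def logged(*args: Any, **kwargs: Any) -> Any:
            event = {"tool": tool_name, "args": args, "kwargs": kwargs, "ts": time.time()}
            self.events.append(event)  # log before running so failed calls are kept
            try:
                result = tool(*args, **kwargs)
            except Exception as exc:
                event["ok"] = False
                event["error"] = repr(exc)
                raise
            event["ok"] = True
            return result

        return logged


# Hypothetical tool: a stand-in for a real file-read capability.
trail = ActionTrail()
read_file = trail.wrap("read_file", lambda path: f"<contents of {path}>")
read_file(".env")

for event in trail.events:
    print(event["tool"], event["args"], event["ok"])
```

A trail like this does not stop a compromised agent, but it gives the reviewer evidence that a secrets file was touched even when the final diff looks clean.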
Coding agents rank so high among the exposed giants because they have a broad attack surface, an extensive blast radius, and weak defensive controls. The attack surface is broad because they execute shell commands, load MCP servers, and auto-load rules files. The blast radius stems from their position within the software supply chain, with access to secrets, signing keys, and deployment pipelines. And their primary defense—a code review of the output—fails to account for either the attack surface or the blast radius.
We’ve examined just two of the ten agent types covered in Adversa’s agent analysis and AI Risk Quadrant. The remaining eight categories are general assistant, work copilot, browser, conversational, custom workflow, business process, platform operations, and data engineering. None emerge unscathed. Ninety-eight percent of the tested agents are subject to the lethal trifecta, with only one agent each in the general assistant and data engineering categories being the exceptions.
Key takeaways from Adversa include: agent defaults prioritize speed over safety; the most powerful agents have the least protection, while the most protected agents have the least power; only 11% qualify for the capable-and-defended quadrant; tool execution accounts for 76% of blast radius; 37% of the market is audited rather than genuinely defended; and 83% of claimed AI agent defenses cannot be publicly verified.
Agents are effectively black boxes—it’s an all-or-nothing proposition. Business economics is pushing us to accept them. Since we cannot control what an agent does while it’s running, our only option is to be careful about what we input and to control the output wherever possible.
On this point, Adversa recommends focusing on controlling the output, since there’s little that can be done about input prompts. “Defend the areas you can control, not the ones you can’t,” the firm advises. “Prompt injection has no deterministic solution—no classifier can reliably separate an agent’s data from its instructions, and vendors acknowledge this. Accept the vulnerability at the input boundary and allocate your defensive resources to the trifecta elements the operator can actually control: egress, identity, and irreversible actions.”
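As a rough illustration of defending the output side, the sketch below gates proposed actions on two of the three elements Adversa names: egress and irreversible actions. Identity, such as narrowly scoped credentials, is left out. The host allowlist, the action names, and the `decide` function are all hypothetical; a real deployment would load policy from configuration and enforce it outside the agent process.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical policy values for illustration only.
ALLOWED_EGRESS_HOSTS = {"api.internal.example", "tickets.example.com"}
IRREVERSIBLE_ACTIONS = {"delete_repo", "send_payment", "deploy_production"}


@dataclass(frozen=True)
class ProposedAction:
    name: str
    target_url: Optional[str] = None  # None means the action makes no network call


def decide(action: ProposedAction) -> str:
    """Return 'deny', 'needs_approval', or 'allow' for a proposed agent action."""
    if action.target_url is not None:
        host = urlsplit(action.target_url).hostname
        if host not in ALLOWED_EGRESS_HOSTS:
            return "deny"  # egress to an unlisted (or unparseable) host is blocked
    if action.name in IRREVERSIBLE_ACTIONS:
        return "needs_approval"  # irreversible actions always go to a human
    return "allow"


print(decide(ProposedAction("http_post", "https://evil.example/upload")))
print(decide(ProposedAction("deploy_production", "https://api.internal.example/deploy")))
print(decide(ProposedAction("read_ticket", "https://tickets.example.com/42")))
```

The gate is deterministic and does not depend on detecting prompt injection at the input, which is the point of the recommendation. The approval step still inherits the confirmation mismatch described earlier, so the approver needs to see the concrete action, not a summary of it.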
This is where we stand today. The headlong rush into agentic AI solutions is irreversible but deeply concerning. We can only counter adversarial AI-assisted attacks with AI-assisted defense. A business will stay competitive only if it is faster and more efficient than its rivals. In the business world, all paths lead to AI. We must hope, and can reasonably expect, that AI will improve across all areas in time. How much and when remain open questions.
But in the interim, the overarching message from Adversa’s comprehensive and detailed analysis is clear: “Let’s be careful out there.”



