Last week, Anthropic introduced Project Glasswing, an AI model so effective at finding software vulnerabilities that they took the extraordinary step of suspending its public release. Instead, the company has given access to Apple, Microsoft, Google, Amazon, and a coalition of others to find and patch bugs before adversaries can.
Mythos Preview, the model that led to Project Glasswing, found vulnerabilities across every major operating system and browser. Some of these bugs had survived decades of human audits, aggressive fuzzing, and open-source scrutiny. One had been sitting for 27 years in OpenBSD, often considered one of the world's most secure operating systems.
It's tempting to file this under "AI lab says their AI is too dangerous," the same playbook OpenAI ran with GPT-2.
Not so fast; there's a material difference this time.
Mythos didn't just find individual CVEs.
- It chained four independent bugs into an exploit sequence that bypassed both the browser renderer and the OS sandboxing
- It performed local privilege escalation in Linux via race conditions
- It built a 20-gadget ROP chain targeting FreeBSD's NFS server, distributed across packets
Claude Opus 4.6, Anthropic's previous frontier model, failed almost entirely at autonomous exploit development. Mythos hit a 72.4% success rate in the Firefox JS shell.
This isn't theoretical, nor some new three-to-five-year prediction. This is about to be a real-world engineering reality.
Why Project Glasswing Exposes the Real Cybersecurity Gap
Here's the number that should keep security leaders awake at night: fewer than 1% of the vulnerabilities found by Mythos have been patched.
Let that sink in for a moment.
The most powerful vulnerability discovery engine ever built ran against the world's most critical software, and the ecosystem couldn't absorb the output.
Glasswing solved the finding problem.
Nobody solved the fixing problem.
Why Defenders Can't Keep Up: Calendar Speed vs. Machine Speed
This is the structural issue the cybersecurity industry has been circling for years. AI just made it impossible to ignore.
Defenders operate on calendar speed. They:
- Gather intelligence
- Build a campaign
- Simulate the threats
- Mitigate
- Repeat
That cycle takes about four days. Attackers, especially those now leveraging LLMs at every stage of their operation, are moving at machine speed.
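To put the gap in concrete terms, here is a back-of-the-envelope comparison between the roughly four-day defender cycle above and the roughly three-minute automated cycle this article cites later. The numbers are the article's own figures, not measurements:

```python
# Back-of-the-envelope gap between calendar speed and machine speed.
defender_cycle_min = 4 * 24 * 60   # ~4 days, expressed in minutes
machine_cycle_min = 3              # one autonomous validation run
speedup = defender_cycle_min / machine_cycle_min
print(f"An autonomous cycle runs ~{speedup:.0f}x faster")  # ~1920x
```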
For an up-to-the-minute take, David B. Cross, CISO at Atlassian, will be speaking at the Autonomous Validation Summit on May 12 about what this looks like from the inside, why periodic testing can’t keep pace with adversaries that operate autonomously, and what defenders should be doing instead.
AI-Powered Attacks Are Already Autonomous
Earlier this year, a threat actor deployed a custom MCP server hosting an LLM as part of their attack chain against FortiGate appliances.
The AI handled everything:
- Automated backdoor creation
- Internal infrastructure mapping fed directly to the model
- Autonomous vulnerability assessment, and
- AI-prioritized execution of offensive tools for domain admin access.
The result? 2,516 organizations across 106 countries were compromised in parallel. The entire chain, from initial access through credential dumping to data exfiltration, was autonomous. The only human involvement was reviewing the results afterward.
AI-based Vulnerability Discovery Is Outpacing Remediation
The gap between attacker speed and defender speed isn’t new.
What’s new is that a small but worrisome gap just became a canyon.
- Autonomous systems like AISLE discovered 13 out of 14 OpenSSL CVEs in recent coordinated releases, bugs that had survived years of human review.
- XBOW became the top-ranked hacker on HackerOne in 2025, surpassing all human participants.
- The median time from disclosure to weaponized exploit dropped from 771 days in 2018 to single-digit hours by 2024.
- By 2025, the majority of exploits were weaponized before being publicly disclosed.
Now add Mythos-class discovery to this picture.
You don’t get a safer world automatically. You get a tsunami of legitimate findings that still require human verification, organizational process, business continuity considerations, and patch cycles that haven’t fundamentally changed in a decade.
How to Build a Mythos-Ready Security Program
The instinct after Glasswing is to ask: “How do we find more bugs?”
That’s actually the wrong question.
The right one is: “When thousands of exploitable vulnerabilities land on your desk tomorrow morning, can your program actually process them?”
For most organizations, the honest answer is no. And the reason isn’t a lack of tools or talent; it’s a structural dependency on periodic, human-initiated processes that were designed for a world where vulnerabilities trickled in, not one where they arrived in a tsunami.
We can’t fix every vulnerability. We can’t apply every hardening option.
That’s not defeatism, that’s the pragmatic starting point for any security program that actually works. The question that matters isn’t “is this CVE critical?” but “is this vulnerability exploitable in my environment, right now, given what I have deployed?”
A Mythos-ready security program needs three fundamental pieces.
First: Signal-Driven Validation Over Scheduled Testing
When a new threat emerges, when an asset changes, or when a configuration drifts, defenses need to be tested against that specific change in that moment. Not during the next quarterly pentest. Not when someone can find an open calendar slot.
The entire concept of “scheduled validation” assumes a stable threat landscape, and today, that assumption is dead on arrival.
Second: Environment-Specific Context Over Generic CVSS Scores
Glasswing will produce an avalanche of CVEs.
Yet most vulnerability management programs are still prioritized by CVSS scores. This context-free metric tells you how bad a bug could be in theory, not whether it’s exploitable in your specific infrastructure, given your controls and business risk.
When the volume of findings suddenly goes from hundreds to thousands, context-free prioritization won’t just slow you down; it’ll break your process entirely.
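As a minimal sketch of the difference between CVSS-only and environment-aware prioritization, the following Python fragment ranks findings by exploitability in a specific environment. The field names and weighting are hypothetical illustrations, not any product's actual scoring model:

```python
from dataclasses import dataclass

@dataclass
class Finding:
    cve_id: str
    cvss: float              # generic severity, context-free
    asset_exposed: bool      # is the affected asset reachable by attackers?
    control_blocks_it: bool  # does a deployed control stop the exploit path?
    business_critical: bool  # does the asset support a critical function?

def contextual_priority(f: Finding) -> float:
    """Score by exploitability in *this* environment, not by CVSS alone.

    Hypothetical weighting: an unreachable or mitigated bug drops to
    near zero regardless of its raw CVSS score.
    """
    if not f.asset_exposed or f.control_blocks_it:
        return 0.1 * f.cvss  # track it, but it is not today's fire
    score = f.cvss
    if f.business_critical:
        score *= 1.5
    return score

findings = [
    Finding("CVE-A", cvss=9.8, asset_exposed=False,
            control_blocks_it=False, business_critical=True),
    Finding("CVE-B", cvss=6.5, asset_exposed=True,
            control_blocks_it=False, business_critical=True),
]
ranked = sorted(findings, key=contextual_priority, reverse=True)
# A CVSS-only ranking would put CVE-A first; context surfaces CVE-B.
```

The point of the sketch is that ordering flips: the lower-CVSS but actually-exploitable bug outranks the higher-CVSS but unreachable one.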
Third: Closed-Loop Remediation Without a Manual Handoff
The current model can’t survive in a world where adversaries exploit CVEs within hours of disclosure. You know the drill:
- Scanner finds a bug
- Analyst triages it
- The ticket goes to a different team
- Someone patches it weeks later
- Nobody re-validates
That chain of manual handoffs is exactly where the system disintegrates. If the cycle from finding to fix to re-validation can’t run without humans shuttling tickets between queues, it clearly isn’t running anywhere near machine speed.
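A closed loop like the one described above can be sketched in a few lines. The function names (`apply_fix`, `validate_exploitability`) are placeholders for whatever scanner, ticketing, and validation tooling an organization actually runs, not a real API:

```python
def closed_loop(findings, apply_fix, validate_exploitability, max_rounds=3):
    """Drive each finding to a verified fix, re-validating after every attempt.

    Returns (fixed, still_open) so nothing silently falls out of the queue.
    """
    fixed, still_open = [], []
    for finding in findings:
        resolved = False
        for _ in range(max_rounds):
            if not validate_exploitability(finding):
                resolved = True        # exploit path is closed in this environment
                break
            apply_fix(finding)         # patch, config change, or compensating control
        (fixed if resolved else still_open).append(finding)
    return fixed, still_open

# Demo with stub tooling: a finding stays exploitable until "patched".
patched = set()
findings_fixed, findings_open = closed_loop(
    ["CVE-1", "CVE-2"],
    apply_fix=lambda f: patched.add(f),
    validate_exploitability=lambda f: f not in patched,
)
```

The key design point is that validation both opens and closes the loop: a finding only leaves the queue when re-validation confirms the exploit path is gone, not when a ticket is marked done.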
This isn’t about buying more tools. It’s about defenders leveraging their one asymmetric advantage: you know your organization’s topology, attackers don’t.
That’s a significant advantage, but only if you can act on it at machine speed.
How Autonomous Exposure Validation Closes the Gap — and Where Picus Comes in
This is the part where I’m going to be really transparent about who’s writing this.
At Picus Security, we build a platform for Autonomous Exposure Validation. So, full disclosure, I have a perspective here that comes with an inherent bias. Take it accordingly.
What Glasswing crystallized for us, and for a lot of the CISOs we’ve been speaking with, is that the validation step within any exposure management program just became the most critical bottleneck.
- Finding vulnerabilities is about to get radically easier and more efficient
- Patching them is going to remain painfully slow.
The only lever you can pull in between is knowing which ones actually matter to your environment. That’s validation.
From Four Days to Three Minutes: How Agentic Workflows Change the Cycle
We built Picus Swarm, the AI team powering autonomous, real-time validation, to compress the traditional four-day cycle into minutes.
It’s a set of AI agents that work together to do what used to require handoffs between four separate teams:
- A researcher agent ingests and vets threat intelligence.
- A red teamer agent maps it against your environment to generate a safety-checked attacker playbook.
- A simulator agent executes across your actual endpoints and cloud, gathering telemetry and proof data.
- A coordinator agent bridges findings to remediation, opening tickets, triggering SOAR playbooks, pushing indicators of attack to your EDR, and re-validating after fixes land.
Every action is traceable and auditable, and every agent operates within guardrails you define.
The whole chain, from a new CISA alert to validated, remediation-ready findings, runs in about three minutes.
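The four-agent handoff above can be modeled as a simple pipeline, where each stage's output is the next stage's input. This is an illustrative sketch only: the function names, TTP identifiers, and data shapes are hypothetical and are not Picus Swarm's actual API:

```python
def researcher(alert):
    # Ingest and vet threat intelligence from an incoming alert.
    return {"threat": alert["name"], "ttps": alert.get("ttps", [])}

def red_teamer(intel, environment):
    # Map vetted intel onto the environment to build an attacker playbook.
    return [{"ttp": t, "target": h} for t in intel["ttps"] for h in environment]

def simulator(playbook):
    # Execute each step and record whether the deployed controls held.
    # (Hardcoded outcome here purely for illustration.)
    return [{**step, "blocked": step["ttp"] == "T1059"} for step in playbook]

def coordinator(results):
    # Bridge findings to remediation: anything not blocked becomes a ticket.
    return [r for r in results if not r["blocked"]]

alert = {"name": "CISA-AA-0001", "ttps": ["T1059", "T1003"]}
tickets = coordinator(simulator(red_teamer(researcher(alert), ["host-a"])))
```

Chaining the stages as plain function composition is the point: no queue sits between teams, so the end-to-end latency is bounded by execution time rather than by handoffs.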
When a Mythos-class model drops thousands of findings on your organization, you need something that can immediately tell you which of these are exploitable in your environment. Which controls would hold, which would fail, and what’s the vendor-specific fix?
The Uncomfortable Truth
Project Glasswing is going to be measured by one metric: how many vulnerabilities get patched before they get exploited. Not how many are found, not how impressive the exploit chains are, but whether the ecosystem can digest what AI is about to produce.
Visibility alone has never been enough, 83% of cybersecurity programs still show no measurable results. What’s changing the equation is closing the gap between seeing and proving: knowing whether a potential vulnerability would actually compromise your environment.
That’s validation.
And in a post-Glasswing world, it’s the only thing standing between a flood of discoveries and a flood of breaches.

We’re hosting the Autonomous Validation Summit on May 12 & 14 with Frost & Sullivan, featuring practitioners from Kraft Heinz and Glow Financial Services, along with our CTO, Volkan Erturk. Together, we’ll be taking a deeper dive into this specific problem.
>> Register here.
Note: This article was written by Sıla Özeren Hacıoğlu, Security Research Engineer at Picus Security.