We posted our thoughts on Project Glasswing a few weeks back, sharing what happened when we tested cyber frontier models on our own codebase. Since that post went live, the idea that has sparked the most conversation is that how you structure your defenses around a weakness in your code matters more than how fast you patch it.
Since publishing, CISOs and security teams have been asking us similar questions: What does a strong architecture look like in practice? What indicators should we keep an eye on? Where should we begin? And what role can Cloudflare play?
A quick note before diving deeper: nearly all of the components in the following architecture come from Cloudflare’s own tools. Cloudflare acts as customer zero by using the same security products we offer. Our infrastructure already protects our internal codebase, staff tools, and customer-facing services. If you already use Cloudflare, every layer covered here is available to you today. If not, the underlying strategies are still worth adopting regardless of your setup.
What a cyber frontier model actually changes
In our earlier article, we demonstrated how a cyber frontier model like Mythos reshapes the attacker’s timeline. It can identify vulnerabilities, map out exploit chains, and create working proofs of concept much more quickly than previous models. While tools like Mythos don’t alter the fundamental stages of an attack — reconnaissance, initial access, lateral movement, persistence, and exfiltration still occur — the key difference lies in speed and scale. When scanning the open web, a model can rapidly locate and exploit easy targets. Against a well-defended system, it still needs to probe and adjust, often generating more noise than a skilled human attacker would.
Previously, discovering vulnerabilities, building exploit chains, and generating proof-of-concept code were the main bottlenecks in launching an attack. A frontier model now completes all three steps in a fraction of the time. Tasks that once required careful, methodical work are now fast and widespread.
While AI is helping developer teams at Cloudflare and other organizations ship code faster, security teams haven’t seen the same acceleration. An attacker only needs one entry point, while security teams must find and close every vulnerability. Writing a fix, testing it, and deploying it without disrupting surrounding code involves challenges that AI can’t eliminate. We discovered this firsthand when we allowed an AI coding assistant to patch its own bugs, as we described at the end of our previous post. Some of those patches resolved the original issue while unintentionally breaking other parts of the code that depended on it.
As these models grow more capable, our primary concerns from a threat perspective boil down to three areas. Each one influences the architecture we’ll explore in the rest of this post.
The first concern is the speed of discovery. Frontier models make it easier to scan large volumes of public code, including the open-source libraries that many organizations rely on. This doesn’t mean every bug in a library can be exploited, or that library bugs are the main source of vulnerabilities. Exploitability still depends on how the code is used, whether attacker-controlled input can reach the vulnerable path, and the protections surrounding it. However, widely used open-source libraries and frameworks give attackers a common surface to analyze at scale. When a real, exploitable vulnerability exists, a model can help identify it, reason through potential exploit paths, and generate proof-of-concept variations faster than maintainers and defenders can review every downstream use. The gap between when an attacker finds a vulnerability and when defenders become aware of it is what concerns us most. If you’re not running these models against your own code, you can safely assume someone else is.
The second concern is exploit volume and adaptation. A model can generate thousands of variations of a single exploit and conduct reconnaissance at the same scale. While this volume gives attackers an advantage, it won’t necessarily bypass signature-based detections. Many of those variations will share the same underlying signature, so a rule that catches the first one will catch the rest. Adaptation is how attackers will evade signature-based detections. Ask a model to demonstrate a SQL injection, and it will provide a textbook example. Tell it there’s a WAF in place, and it will start probing, learning what gets blocked, and rewriting the payload until it can bypass the rule stopping it.
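To make the difference between volume and adaptation concrete, here is a minimal Python sketch. The regex is a deliberately naive, hypothetical signature, not a Cloudflare managed rule: surface-level variants of one payload all trip it, while a single adapted rewrite (splitting the keywords with an inline SQL comment, a classic evasion trick) slips past.

```python
import re

# Hypothetical, deliberately simple signature for a UNION-based SQL injection.
# Real managed rules are far more robust; this only illustrates the point.
SIGNATURE = re.compile(r"union\s+select", re.IGNORECASE)

def is_blocked(payload: str) -> bool:
    """Return True if the toy signature matches the payload."""
    return SIGNATURE.search(payload) is not None

# Mass-produced variants that only change surface details (case, spacing,
# selected columns) still share the same underlying pattern.
variants = [
    "1 UNION SELECT username, password FROM users",
    "1 union   select null, null--",
    "1 UnIoN\tSeLeCt version()",
]

# An adaptive attacker who learns what gets blocked rewrites the payload,
# here by replacing the whitespace between keywords with an inline comment.
adapted = "1 UNION/**/SELECT username, password FROM users"

for v in variants:
    print(is_blocked(v), repr(v))    # True for every variant
print(is_blocked(adapted), repr(adapted))  # False: the rewrite evades the rule
```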
The third concern is the impact when a vulnerability is inevitably exploited. No architecture catches everything. After a vulnerability is exploited, the question we ask ourselves is: where can the attacker go with one identity, one path, or one credential before something else stops them? If the answer is “anywhere they want,” the vulnerability was never the real problem. The architecture around it was.
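One way to reason about that question is as graph reachability: treat identities, hosts, and tools as nodes and "can reach" as edges, then compute everything reachable from the compromised starting point. The sketch below does this with a breadth-first search over two hypothetical environments; the node names and edges are invented for illustration.

```python
from collections import deque

def blast_radius(graph: dict[str, set[str]], start: str) -> set[str]:
    """Return every node reachable from `start` using breadth-first search."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, set()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

# Hypothetical environments. An edge means "this credential or host can reach that one".
flat = {
    "stolen_cred": {"wiki", "ci", "billing_db", "hr_app"},
    "wiki": {"ci"},
    "ci": {"prod_deploy"},
}
segmented = {
    # Each app sits behind its own per-request access check, so the stolen
    # credential only reaches the one tool it was authorized for.
    "stolen_cred": {"wiki"},
}

print(sorted(blast_radius(flat, "stolen_cred")))
# ['billing_db', 'ci', 'hr_app', 'prod_deploy', 'stolen_cred', 'wiki']
print(sorted(blast_radius(segmented, "stolen_cred")))
# ['stolen_cred', 'wiki']
```

The vulnerability that leaked the credential is identical in both environments; only the edges differ, and so does how far the attacker gets.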
Cloudflare’s superpower: visibility
We handle roughly a fifth of the world’s web traffic, and that traffic gives us real-time insight into which payloads are mutating, which patterns are emerging, and where attacker tooling is heading next. Two teams convert that visibility into defense.
First is Cloudforce One, our threat intelligence, research, and operations team, which operates within the Cloudflare security organization. They transform what we observe across the network into actionable insights for the rest of the stack: tracked adversaries, emerging campaigns, and indicators of compromise (IOCs). The real challenge in this work was never identifying malicious activity — it was the delay in mitigation. Knowledge of a new threat typically has to move from a threat report into a feed and then into a company’s defenses before it can be used to block anything. Attackers have learned to move faster than that. Our network closes that gap: Cloudflare customers can now use Cloudforce One threat intelligence directly within the WAF to block high-risk traffic.
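As a rough illustration of what using threat intelligence directly in the WAF means at request time, the sketch below checks a client IP against a list of indicator networks. The feed contents, the `is_known_bad` helper, and the idea of a plain CIDR list are assumptions for illustration; the addresses come from the reserved documentation ranges in RFC 5737. The design point is that the check consults the indicator list itself, so updating that list is the only step between intelligence and enforcement.

```python
import ipaddress

# Hypothetical indicator feed. Real indicators would come from a threat
# intelligence source; these use reserved documentation ranges (RFC 5737).
IOC_FEED = ["203.0.113.0/24", "198.51.100.7/32"]
IOC_NETWORKS = [ipaddress.ip_network(cidr) for cidr in IOC_FEED]

def is_known_bad(client_ip: str) -> bool:
    """Return True if the client address falls inside any indicator network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in IOC_NETWORKS)

print(is_known_bad("203.0.113.45"))  # True
print(is_known_bad("192.0.2.10"))    # False
```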
Second is the team responsible for the WAF engine that performs the actual detection: the managed rulesets that protect our own properties and are available to every Cloudflare customer, the machine learning behind WAF Attack Score, and the relationships that sometimes allow us to deploy a rule before a CVE is publicly disclosed. The team is globally distributed and moves quickly, releasing rules within hours of an attack proof-of-concept becoming known. Once a detection is deployed, it reaches our entire network, and every Cloudflare customer is protected in under 30 seconds. React2Shell is a recent case in point: a managed WAF rule safeguarded our own systems, along with every other Cloudflare customer’s, hours before the public advisory went live.
The scoring layer, the defenses we place in front of the application, and the containment around the vulnerability all depend on what these two teams observe.
Signature-based defenses were designed for an era when new exploits were rare and variations took weeks to appear. Cloudflare’s standard SLA from a fresh proof-of-concept to a live, deployed rule has been 12 hours. With the rise of frontier models, that timeline is no longer sufficient. Protections need to be in place before a CVE is even identified. That’s why we layer ML-based detection ahead of the traditional signature-based WAF.
The model is trained on a vast collection of historical attack traffic, which lets it recognize new variants of known vulnerability classes before they become publicly known. A novel SQL injection or remote code execution chain is almost always a recombination of attack patterns the model has encountered before, even when the specific exploit is entirely new. We run the model on every request and assign a WAF Attack Score between 1 and 99 based on how closely the request matches those underlying patterns, rather than comparing it against a list of known-bad signatures. The lower the score, the more likely the request is an attack and the more aggressively we handle it, so the score, rather than an exact signature match, decides whether the request gets through. We apply a similar scoring approach to AI prompts with AI Security for Apps: instead of checking each prompt against a catalog of known malicious prompts, we score how closely a prompt resembles a genuine attack.
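A minimal sketch of how a rule might act on a score like this. The four actions and the cut-off values are illustrative assumptions, not Cloudflare’s production thresholds; the only properties taken from the text are the 1 to 99 range and the rule that lower scores get stricter handling.

```python
from enum import Enum

class Action(Enum):
    BLOCK = "block"
    CHALLENGE = "challenge"
    LOG = "log"
    ALLOW = "allow"

# Illustrative cut-offs (assumptions). Lower scores look more like an attack,
# so they receive the strictest handling.
BLOCK_BELOW = 21
CHALLENGE_BELOW = 51
LOG_BELOW = 81

def action_for(score: int) -> Action:
    """Map a 1-99 attack score to a handling decision."""
    if not 1 <= score <= 99:
        raise ValueError(f"attack score must be in 1..99, got {score}")
    if score < BLOCK_BELOW:
        return Action.BLOCK
    if score < CHALLENGE_BELOW:
        return Action.CHALLENGE
    if score < LOG_BELOW:
        return Action.LOG
    return Action.ALLOW

print(action_for(7).value)   # block
print(action_for(92).value)  # allow
```

Because the decision keys on a score band rather than an exact match, a rewritten payload that still resembles an attack tends to land in the same band as the original.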
The architecture around the vulnerability
Those capabilities only matter once they’re stacked in front of an application, and the first layer in our defense-in-depth strategy is the WAF. Anything matching a known-bad pattern is dropped before it reaches the application, which filters out the bulk of obviously malicious traffic and lets the more specialized layers below focus on what remains.
On the API surface, we enforce a positive security model through API Shield. Rather than trying to anticipate every bad request, we define what a valid request to each API looks like, either from the API’s own specification or learned from our real traffic, and anything that doesn’t conform is blocked. This neutralizes the volume advantage of frontier AI models: because we only allow validated traffic, generating thousands of new attack variations doesn’t help an attacker bypass the system.
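A positive security model can be sketched as an allowlist check. The schema below is a hypothetical stand-in for one derived from an API specification or learned from traffic, and `conforms` is an illustrative helper, not an API Shield interface; real schema validation (for example against an OpenAPI document) also covers nested objects, formats, and value ranges.

```python
from typing import Any

# Hypothetical allowlist: for each (method, path), the exact fields and their types.
SCHEMA: dict[tuple[str, str], dict[str, type]] = {
    ("POST", "/api/orders"): {"product_id": int, "quantity": int},
    ("GET", "/api/orders"): {},
}

def conforms(method: str, path: str, body: dict[str, Any]) -> bool:
    """Accept only what the schema explicitly allows; reject everything else."""
    expected = SCHEMA.get((method, path))
    if expected is None:
        return False  # unknown endpoint: rejected by default
    if set(body) != set(expected):
        return False  # missing or unexpected fields
    # bool is a subclass of int in Python, so exclude it explicitly.
    return all(
        isinstance(body[k], t) and not isinstance(body[k], bool)
        for k, t in expected.items()
    )

print(conforms("POST", "/api/orders", {"product_id": 42, "quantity": 1}))           # True
print(conforms("POST", "/api/orders", {"product_id": "42 OR 1=1", "quantity": 1}))  # False
```

However many rewrites of `42 OR 1=1` a model produces, a string is still not an integer, so every one of them fails the same check.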
Cloudflare’s layered architecture
Bot Management intercepts automated probing on our network before frontier models can use it to map out our applications. It scores every request on how likely it is to be automated, using the same signals across our entire network: how the client behaves, whether it resembles a real browser, and whether the connection matches a known-bad pattern. Because an attack only succeeds if it can find a weak spot, cutting off the probing that finds those spots denies the attacker the reconnaissance it depends on.
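For intuition only, here is a toy version of signal-based bot scoring. The three signals mirror the ones named above, but the weights and the rate threshold are invented, and Cloudflare’s actual Bot Management is not a hand-tuned formula like this one. As with the attack score, lower means more likely automated.

```python
from dataclasses import dataclass

@dataclass
class Signals:
    browser_like: bool           # does the client behave like a real browser?
    known_bad_fingerprint: bool  # does the connection match a known-bad pattern?
    requests_per_minute: int     # a simple behavioral signal

def toy_bot_score(s: Signals) -> int:
    """Toy 1-99 score, lower = more likely automated. All weights are invented."""
    score = 99
    if not s.browser_like:
        score -= 40
    if s.known_bad_fingerprint:
        score -= 50
    if s.requests_per_minute > 120:  # hypothetical rate threshold
        score -= 20
    return max(1, score)

print(toy_bot_score(Signals(True, False, 10)))   # 99
print(toy_bot_score(Signals(False, True, 600)))  # 1
```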
Zero Trust Network Access is required for every internal application. The implicit trust of being inside the network is replaced with explicit per-request identity and policy for every employee accessing every tool. The value of this became clear when one of our engineers deployed a misconfigured tool. A flat network would have exposed everything on the same segment, but in our setup, the exposure was limited to the tool itself. We built Require Access Protection afterward so newly deployed or misconfigured applications can’t be reached before an access policy is in place.
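A minimal sketch of per-request, deny-by-default access, assuming a hypothetical policy table keyed by application hostname. The `Identity` type, the hostnames, and `authorize` are illustrative, not the Cloudflare Access API. The second `return False` captures the idea behind Require Access Protection: an application that has no policy yet is unreachable rather than open.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass(frozen=True)
class Identity:
    email: str
    groups: frozenset[str]

# Hypothetical policy table: application hostname -> rule over the caller's identity.
Policy = Callable[[Identity], bool]
POLICIES: dict[str, Policy] = {
    "wiki.internal.example": lambda ident: "employees" in ident.groups,
    "billing.internal.example": lambda ident: "finance" in ident.groups,
}

def authorize(app: str, ident: Optional[Identity]) -> bool:
    """Evaluate identity and policy on every request; deny by default."""
    if ident is None:
        return False  # no verified identity: no access, even from "inside"
    policy = POLICIES.get(app)
    if policy is None:
        return False  # app deployed without a policy yet: unreachable, not open
    return policy(ident)

alice = Identity("alice@example.com", frozenset({"employees"}))
print(authorize("wiki.internal.example", alice))     # True
print(authorize("billing.internal.example", alice))  # False
print(authorize("newtool.internal.example", alice))  # False
```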
IdP Federation makes that secure-by-default posture easier to maintain consistently across every Cloudflare account — which becomes even more critical when more people are shipping internal tools rapidly. Instead of asking each team to configure SSO separately, we set up our identity provider (IdP) once and share it across the organization. New accounts get SSO automatically, recipient-side IdP connections are read-only, and Access policies in each account still evaluate the resulting identity as part of the normal request flow.
MCP Server Portal gives teams a controlled way to connect AI agents to enterprise systems. Agents access MCP servers that are centrally managed through a single portal, with every action logged. That way, when an agent acts on someone’s behalf, we know what it did, what it accessed, and whether it should have been permitted. The full story of how we built it is in our post on enterprise MCP.
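The core of the portal pattern is a single choke point that both gates and records agent actions. This sketch is a hypothetical model of that idea, not the MCP Server Portal’s API: it checks each tool call against a registry of approved servers and tools and appends an audit record whether or not the call is allowed. It does not actually proxy requests to MCP servers.

```python
import time
from dataclasses import dataclass, field

@dataclass
class Portal:
    # Hypothetical registry: MCP server name -> tools it is approved to expose.
    approved: dict[str, set[str]]
    audit_log: list[dict] = field(default_factory=list)

    def call(self, agent: str, on_behalf_of: str, server: str, tool: str) -> bool:
        """Allow only approved server/tool pairs, and log every attempt either way."""
        allowed = tool in self.approved.get(server, set())
        self.audit_log.append({
            "ts": time.time(),
            "agent": agent,
            "user": on_behalf_of,
            "server": server,
            "tool": tool,
            "allowed": allowed,
        })
        return allowed

portal = Portal(approved={"tickets": {"search_issues"}})
portal.call("triage-bot", "alice", "tickets", "search_issues")   # allowed
portal.call("triage-bot", "alice", "tickets", "delete_project")  # denied, still logged
```

Logging denied calls as well as allowed ones is what answers the question of whether an agent should have been permitted: the attempt is visible even when it fails.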
AI Gateway sits in front of our internal AI tools the same way AI Security for Apps sits in front of customer-facing AI features, with the same scoring and the same visibility. Inside the company, the visibility component is more valuable than the blocking, because we needed to see what engineers were actually building before we could write meaningful policy around it.
Where your teams can start
Frontier models can help attackers discover vulnerabilities, adapt payloads, and move faster, but they still have to pass through the layered defense you deploy in front of your application. That’s where teams should begin:
Place inspection in front of public applications.
Define what valid API traffic looks like.
Use bot detection to limit automated probing.
Require identity and access policy before any internal tool is reachable.
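For AI and agentic systems:
Route agent access to enterprise systems through a centrally managed, fully logged MCP portal.
Put scoring and visibility in front of AI prompts, in both customer-facing AI features and internal AI tools.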
The goal is to ensure that when one layer misses, the next layer restricts what the attacker can see, reach, or modify.
That’s the purpose of the architecture around the vulnerability: to limit the scope of an attack. The vulnerability may be what triggers the attack, but the architecture determines how far it can spread.
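Putting the layers together, the sketch below models defense in depth as an ordered chain in which any layer can stop a request. Every layer here is a hypothetical, deliberately crude stand-in for the real product, and the `Request` fields are assumptions; the point is the structure. The example request uses a comment-split SQL injection that evades the toy WAF check, and the schema layer stops it anyway.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Request:
    path: str
    body: str
    bot_score: int                  # hypothetical: 1-99, lower = more likely automated
    identity: Optional[str] = None  # verified user, if any

# A layer returns a reason to stop the request, or None to pass it on.
Layer = Callable[[Request], Optional[str]]

def waf(req: Request) -> Optional[str]:
    return "waf: known-bad pattern" if "union select" in req.body.lower() else None

def api_schema(req: Request) -> Optional[str]:
    return None if req.path in {"/api/orders"} else "api: path not in schema"

def bot_check(req: Request) -> Optional[str]:
    return "bot: likely automated" if req.bot_score < 30 else None

def access(req: Request) -> Optional[str]:
    return None if req.identity else "access: no verified identity"

LAYERS: list[Layer] = [waf, api_schema, bot_check, access]

def evaluate(req: Request) -> str:
    """Run the request through each layer in order; the first objection wins."""
    for layer in LAYERS:
        reason = layer(req)
        if reason is not None:
            return reason
    return "allowed"

probe = Request(path="/api/debug", body="1 UNION/**/SELECT secret", bot_score=90, identity="alice")
print(evaluate(probe))  # api: path not in schema
```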
How do we know this approach works?
Many security stacks look impenetrable on a whiteboard but crumble in practice. That’s why we test ours continuously, both at the perimeter and inside our environment, with our red team involved in both.
At the perimeter, frontier models are one tool we use to test our application security stack the way an adaptive attacker would. These models work alongside the rest of our red team and detection workflows, including manual testing, threat intelligence, observed traffic patterns, proof-of-concept analysis, and signals from our own network. Together, those inputs help us decide where to direct testing: newly launched products, recently changed surfaces, and the paths an attacker is most likely to probe first. The most important part is the process that follows. When something gets through, we identify the gap, use the right combination of tools to understand it, write the rule or mitigation, deploy the update, and test again to confirm the gap is closed.
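The find, fix, and retest loop can be sketched as a regression check over a corpus of payloads that previously got through. The corpus, the regex rules, and the helper names are hypothetical; what matters is the shape of the loop: measure the gaps, add a mitigation, and rerun the same corpus to confirm nothing remains.

```python
import re
from typing import Callable

Detector = Callable[[str], bool]

def make_detector(patterns: list[str]) -> Detector:
    """Build a detector that flags a payload if any pattern matches."""
    compiled = [re.compile(p, re.IGNORECASE) for p in patterns]
    return lambda payload: any(c.search(payload) for c in compiled)

def find_gaps(detect: Detector, attack_corpus: list[str]) -> list[str]:
    """Return every known-malicious payload the current rules fail to block."""
    return [p for p in attack_corpus if not detect(p)]

# Hypothetical payloads an adaptive tester (human or model) produced.
corpus = [
    "1 UNION SELECT password FROM users",
    "1 UNION/**/SELECT password FROM users",
]

rules = [r"union\s+select"]
gaps = find_gaps(make_detector(rules), corpus)
print(gaps)  # ['1 UNION/**/SELECT password FROM users']

# Write the mitigation, then test again against the same corpus.
rules.append(r"union(\s|/\*.*?\*/)+select")
remaining = find_gaps(make_detector(rules), corpus)
print(remaining)  # []
```

Keeping the corpus around means every future rule change is rechecked against every bypass found so far.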
Inside the environment, our red team starts from the assumption that the perimeter has already been breached. They examine what has changed, where sensitive systems carry risk, and whether one compromised identity, path, or credential can reach further than it should. When we adjust the architecture based on what they find, they run the scenario again against the updated version to verify the gap is truly closed.
We validate that this architecture works by continuously testing how it behaves when individual layers fail, rather than counting on any single layer being perfect.
If your team is tackling the same challenges and would like to compare notes, reach out to us at [email protected].



