How We Navigated the .de TLD Outage and Kept Services Running

On May 5, 2026, at around 19:30 UTC, DENIC — the organization responsible for managing Germany’s .de country-code top-level domain (TLD) — began publishing faulty DNSSEC signatures for the .de zone. According to the DNSSEC standard, any validating DNS resolver that received these invalid signatures had to reject them and return a SERVFAIL response to clients. This included 1.1.1.1, Cloudflare’s public DNS resolver.

.de, Germany’s country-code TLD, is one of the largest TLDs on the Internet. On Cloudflare Radar, it regularly ranks among the most queried TLDs worldwide. When an issue occurs at this level of the DNS hierarchy, millions of domains can become inaccessible.

In this post, we’ll break down what happened, the effects of the incident, and the temporary countermeasures we put in place while DENIC worked to fix the problem.

DNSSEC (Domain Name System Security Extensions) adds a layer of cryptographic verification to DNS. When a zone is signed with DNSSEC, each group of records comes with a digital signature called an RRSIG record, which allows a resolver to confirm the records haven’t been altered. Unlike encrypted DNS protocols such as DNS over TLS (DoT) and DNS over HTTPS (DoH), DNSSEC focuses on data integrity rather than privacy. The records themselves remain visible, but their authenticity can be mathematically verified.

What makes DNSSEC distinctive is that the signatures travel alongside the records they protect. This means the integrity check works no matter how many caches or intermediary hops a response has passed through. A record pulled from cache is just as verifiable as one fetched fresh from an authoritative server.

DNSSEC relies on a chain of trust. It starts at the root zone, whose trust anchor is built into resolvers by default, and each zone passes trust down to child zones through Delegation Signer (DS) records. A DS record in the parent zone holds a cryptographic hash of a public key in the child zone. When a resolver validates example.de, it checks the chain: root trusts .de, and .de trusts example.de. If there’s a break anywhere along that chain, validation fails for everything beneath it — which is why a misconfiguration at the TLD level, like with .de, can affect every domain under it.
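To make the parent-to-child link concrete, here is a minimal sketch of how a DS digest and a key tag can be derived from a child zone's DNSKEY, following the layout in RFC 4034. The key bytes are placeholders rather than any real .de key, and the helper names are our own for illustration.

```python
import hashlib
import struct


def name_to_wire(name: str) -> bytes:
    """Encode a domain name in canonical (lowercase) DNS wire format."""
    wire = b""
    for label in name.rstrip(".").lower().split("."):
        if label:
            wire += bytes([len(label)]) + label.encode("ascii")
    return wire + b"\x00"


def dnskey_rdata(flags: int, protocol: int, algorithm: int, public_key: bytes) -> bytes:
    """Build DNSKEY RDATA: flags (2 bytes), protocol, algorithm, then the key."""
    return struct.pack("!HBB", flags, protocol, algorithm) + public_key


def key_tag(rdata: bytes) -> int:
    """Key tag checksum over the DNSKEY RDATA (RFC 4034, Appendix B)."""
    acc = 0
    for i, byte in enumerate(rdata):
        acc += byte if i & 1 else byte << 8
    acc += (acc >> 16) & 0xFFFF
    return acc & 0xFFFF


def ds_digest_sha256(owner: str, rdata: bytes) -> bytes:
    """DS digest type 2: SHA-256 over the owner name followed by the DNSKEY RDATA."""
    return hashlib.sha256(name_to_wire(owner) + rdata).digest()


# Placeholder key material (assumption): flags 257 marks a KSK, protocol is always 3.
child_ksk = dnskey_rdata(257, 3, 13, bytes(range(64)))
published_ds = ds_digest_sha256("example.de", child_ksk)

# A resolver recomputes the digest from the child's DNSKEY and compares it
# with the digest the parent published in the DS record.
chain_intact = ds_digest_sha256("EXAMPLE.de.", child_ksk) == published_ds
```

If the child publishes a different key than the one the parent's DS record was computed from, the comparison fails and the chain is broken at that point.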

Zones generally use two types of cryptographic keys: a Zone Signing Key (ZSK), which signs the zone’s records, and a Key Signing Key (KSK), which signs the ZSK. The KSK’s public key is what the parent zone’s DS record references, serving as the anchor for the chain of trust. Rotating a ZSK is relatively simple: create a new key, re-sign the zone’s records, and wait for cached versions to expire. Rotating a KSK is more complex, since the parent’s DS record must be updated too — often requiring coordination with a registrar or registry.

During a key rotation, there’s a critical transition period where the old key is being retired and the new key is being brought online. If the signatures published in the zone were created with a key that resolvers can’t match against the zone’s DNSKEY records — whether because the signing step failed, the timing was off, or the new key hadn’t been fully propagated yet — resolvers have no option but to reject the responses and return SERVFAIL.
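The matching step can be sketched as a pre-check that runs before any cryptography. This is a simplified illustration, not a full validator: it assumes plain Unix timestamps (real resolvers use serial number arithmetic for the validity window) and it omits the actual signature verification. The type and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DnsKey:
    key_tag: int
    algorithm: int


@dataclass(frozen=True)
class Rrsig:
    key_tag: int
    algorithm: int
    inception: int   # Unix seconds (simplification)
    expiration: int  # Unix seconds (simplification)


def precheck(sig: Rrsig, keys: list[DnsKey], now: int) -> str:
    """Return "candidate" if the signature is worth verifying, else why it is bogus."""
    if not (sig.inception <= now <= sig.expiration):
        return "bogus: outside validity window"
    if not any(k.key_tag == sig.key_tag and k.algorithm == sig.algorithm for k in keys):
        return "bogus: no matching DNSKEY"
    # A real validator would now verify the signature bytes against the key.
    return "candidate"
```

A signature that passes this pre-check can still fail cryptographic verification. Either way the outcome for the client is the same: the resolver treats the answer as bogus and returns SERVFAIL.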

On May 5, 2026, at approximately 19:30 UTC, DENIC, the operator of the .de TLD, began publishing incorrect DNSSEC signatures for the .de zone. Any validating resolver that received these records was required by the DNSSEC specification to reject them and respond with SERVFAIL. The 1.1.1.1 resolver was no exception.

The graph below illustrates the response codes returned by 1.1.1.1 for .de queries during the incident.

After the initial spike in SERVFAIL responses at 19:30 UTC, the error rate continued climbing over the next three hours as cached records gradually expired. When each domain’s cached records ran out and resolvers had to go back to DENIC for fresh copies, they received broken signatures and started returning failures.

You can also see a notable increase in overall query volume. This is common during DNS outages: clients tend to retry failed queries, often three or more times, which inflates the raw numbers. As a result, the SERVFAIL rate may look more alarming than the real-world impact, since many of those queries come from the same user repeatedly trying to reach the same domain.
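A rough way to see the effect of retries is to compare the raw SERVFAIL share with an estimate that collapses retries into a single failed lookup. The sketch below assumes every failed lookup is retried the same number of times, which is a simplification, and the counts used afterwards are invented for illustration rather than taken from the incident.

```python
def raw_servfail_rate(noerror: int, servfail: int) -> float:
    """Share of all responses that were SERVFAIL, counting every retry."""
    return servfail / (noerror + servfail)


def deduplicated_servfail_rate(noerror: int, servfail: int, attempts_per_failure: int) -> float:
    """Estimated share of distinct lookups that failed, with retries collapsed."""
    if attempts_per_failure < 1:
        raise ValueError("attempts_per_failure must be at least 1")
    distinct_failures = servfail / attempts_per_failure
    return distinct_failures / (noerror + distinct_failures)
```

With 700 NOERROR and 900 SERVFAIL responses and three attempts per failed lookup, the raw rate is about 56% while the deduplicated estimate is 30%.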

What might come as a surprise is that the NOERROR rate remained fairly stable throughout the incident. That’s thanks to “serve stale,” which we’ll explain in the next section.

Recursive resolvers store the records they receive from authoritative nameservers for the length of each record’s TTL (Time-to-Live). While a record is cached, the resolver serves it directly without contacting the authoritative nameserver again. Once the TTL expires, the resolver fetches a fresh copy and re-caches it.

During the outage, newly fetched records resulted in SERVFAIL because the DNSSEC signatures were invalid and the resolver correctly rejected them. However, many .de records were still sitting in cache from before the incident started. Instead of immediately discarding those and returning SERVFAIL to users, 1.1.1.1 kept serving them even after their TTL had passed. This practice is known as “serving stale.”

1.1.1.1 follows RFC 8767, which standardizes this behavior. When upstream resolution fails, a resolver can continue serving expired cached records rather than returning an error. This significantly softens the blow of an upstream outage, giving operators valuable time to address the issue.
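The caching and serve-stale behavior described above can be sketched as a small cache that only falls back to expired data when a refresh fails. This is an illustrative model, not how 1.1.1.1 is implemented: the class name, the injected `fetch` and `clock` callables, and the one-day staleness cap are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable


class UpstreamFailure(Exception):
    """Raised by the fetch function when resolution or validation fails."""


@dataclass
class Entry:
    value: str
    expires_at: float


class StaleCache:
    def __init__(
        self,
        fetch: Callable[[str], tuple[str, float]],  # returns (answer, ttl_seconds)
        clock: Callable[[], float],
        max_stale: float = 86400.0,  # assumed cap on how long expired data may be served
    ):
        self._fetch = fetch
        self._clock = clock
        self._max_stale = max_stale
        self._entries: dict[str, Entry] = {}

    def resolve(self, name: str) -> tuple[str, str]:
        """Return (answer, source), where source is "cache", "fresh" or "stale"."""
        now = self._clock()
        entry = self._entries.get(name)
        if entry and now < entry.expires_at:
            return entry.value, "cache"
        try:
            value, ttl = self._fetch(name)
        except UpstreamFailure:
            # Refresh failed: serve the expired record if it is not too old.
            if entry and now - entry.expires_at <= self._max_stale:
                return entry.value, "stale"
            raise
        self._entries[name] = Entry(value, now + ttl)
        return value, "fresh"
```

Note that names which were never cached, or whose records have been expired for longer than the cap, still fail. That is why serve stale softens an outage rather than hiding it.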

The effect is visible in the graph below, which shows response codes for .de queries during the incident with stale-served responses excluded. Without those stale responses, the NOERROR rate drops steadily from 19:30 UTC onward. The excluded responses represent queries where users received correct answers only because the records were still in cache.

While the root cause was largely outside our control and serve stale was doing its job, many users still experienced a real impact. Fortunately, there were steps we could take to lessen the disruption.

RFC 7646 introduces the concept of a Negative Trust Anchor (NTA). Under normal DNSSEC operation, a validating resolver maintains a set of trust anchors: public keys at the base of the chain of trust. In practice this is usually the root zone’s key, and every signed zone below it builds on that anchor through the chain of DS records. When the cryptographic signatures linking the chain together are broken, responses get rejected and result in SERVFAIL. An NTA is an explicit exception. It instructs the resolver to treat a specific zone as if it were unsigned, effectively skipping validation for any names within that zone.

NTAs were created specifically for situations like this. When a TLD operator publishes invalid signatures, every DNSSEC-validating resolver is forced to return SERVFAIL for every domain under that TLD. This happens not because those domains have any issues themselves, but because their parent zone is misconfigured. In such cases, continuing to return SERVFAIL offers no real security benefit: the problem is already known, publicly acknowledged, and being addressed. RFC 7646 specifically identifies TLD misconfiguration as the main scenario where NTAs should be used.
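Conceptually, an NTA check is a suffix match on DNS labels that runs before validation. The following is a minimal sketch of that idea, with hypothetical function names. It is not taken from any particular resolver, and it ignores details such as NTA expiry times.

```python
def labels(name: str) -> tuple[str, ...]:
    """Split a domain name into lowercase labels, ignoring a trailing dot."""
    return tuple(part for part in name.rstrip(".").lower().split(".") if part)


def covered_by_nta(qname: str, ntas: set[str]) -> bool:
    """True if qname is at or below any configured negative trust anchor."""
    q = labels(qname)
    for anchor in ntas:
        a = labels(anchor)
        # Compare whole labels so that "example.dev" is not matched by "de".
        if len(a) <= len(q) and q[len(q) - len(a):] == a:
            return True
    return False


def validation_mode(qname: str, ntas: set[str]) -> str:
    """Decide whether to validate a name or treat it as if it were unsigned."""
    return "insecure" if covered_by_nta(qname, ntas) else "validate"
```

For example, with `{"de"}` configured, `validation_mode("www.example.de", {"de"})` yields `"insecure"`, while names under other TLDs continue to be validated as usual.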

What we actually deployed

For 1.1.1.1, we operate our own resolver called Big Pineapple, which also supports 1.1.1.1 for Families, Gateway DNS, DNS Firewall, and other services. At this time, we haven’t built a dedicated NTA mechanism. Instead, we leveraged an existing override rule to classify .de as an insecure zone, causing all .de queries to be handled as though DNSSEC were disabled. This achieves the same result as an NTA, even though it isn’t formally defined in any RFC.

Choosing to bypass DNSSEC is a conscious tradeoff. Without DNSSEC validation, responses for .de domains could be spoofed or tampered with undetected for the duration of the incident. In this case, we judged the risk acceptable because the signing failure was widespread, publicly confirmed, and impacted every validating resolver on the Internet equally. As someone in our internal incident room put it: “There is no user of 1.1.1.1 resolving a .de name right now who would prefer a SERVFAIL over an unvalidated response.”

We deployed our mitigation at 22:17 UTC, which marked the end of the impact for 1.1.1.1. We shared this update with fellow DNS operators in the DNS-OARC Mattermost.

Origin resolution mitigations

While anyone on the Internet can use our 1.1.1.1 resolver, we have a special responsibility to customers using our CDN platform services. Those with .de origin names were also impacted by this outage.

Cloudflare runs a separate internal resolver for origin resolution, distinct from our public 1.1.1.1 service. To reduce the impact, we applied a similar NTA for .de on the internal resolver, restoring origin connectivity for affected customers.

Before our mitigation was in place, queries that couldn’t be served from cache received a SERVFAIL response from 1.1.1.1. Each SERVFAIL included an Extended DNS Error (EDE) code, defined in RFC 8914, which provides clients with more detail about what went wrong.
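On the wire, an EDE travels as an EDNS option (option code 15) whose data is a 16-bit info code followed by optional UTF-8 text. The sketch below encodes and decodes just that option data. The lookup table covers only the two codes discussed in this post, and the helper names are our own.

```python
import struct

EDE_OPTION_CODE = 15  # EDNS option code assigned to Extended DNS Errors

# Only the two info codes mentioned in this post; RFC 8914 defines more.
EDE_NAMES = {6: "DNSSEC Bogus", 22: "No Reachable Authority"}


def encode_ede(info_code: int, extra_text: str = "") -> bytes:
    """Build EDE option data: 2-byte info code, then optional UTF-8 text."""
    return struct.pack("!H", info_code) + extra_text.encode("utf-8")


def decode_ede(option_data: bytes) -> tuple[int, str]:
    """Parse EDE option data into (info_code, extra_text)."""
    if len(option_data) < 2:
        raise ValueError("EDE option data must contain a 2-byte info code")
    (info_code,) = struct.unpack("!H", option_data[:2])
    return info_code, option_data[2:].decode("utf-8", errors="replace")


def describe(option_data: bytes) -> str:
    """Render EDE option data in a human-readable form."""
    code, text = decode_ede(option_data)
    name = EDE_NAMES.get(code, "Unknown")
    return f"EDE: {code} ({name})" + (f": {text}" if text else "")
```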

Some resolvers returned EDE 6 (DNSSEC Bogus) with a descriptive message pointing directly at the broken signature. This is the correct behavior:

EDE: 6 (DNSSEC Bogus): RRSIG with malformed signature found for example.de/nsec3 (keytag=33834)

1.1.1.1, however, returned EDE 22 (No Reachable Authority), which on the surface suggests a connectivity issue with the upstream nameservers rather than a DNSSEC validation failure.

The root cause is a bug in how we pass DNSSEC EDE codes up from our trust chain verifier. When the verifier detects a bogus signature, it generates the DNSSEC Bogus EDE code, but this code is never included in the response. Instead, the outer layer of the resolver encounters a problem with recursive resolution and, seeing no error code, defaults to reporting “No Reachable Authority.” This hides the actual DNSSEC issue.
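The shape of this bug can be illustrated with a small, purely hypothetical model. It is not Big Pineapple's code: the types and function names are invented to show how an error code set by an inner layer gets replaced by a generic default when it is not carried along with the failure.

```python
from dataclasses import dataclass
from typing import Optional

EDE_DNSSEC_BOGUS = 6
EDE_NO_REACHABLE_AUTHORITY = 22


@dataclass
class ResolutionFailure:
    reason: str
    ede: Optional[int] = None  # code set by the layer that detected the failure


def verifier_failure_dropped_code() -> ResolutionFailure:
    # Models the described behavior: the verifier knows the signature is
    # bogus, but the EDE code does not travel with the failure.
    return ResolutionFailure("bogus signature")


def verifier_failure_with_code() -> ResolutionFailure:
    # Models the intended behavior: the code is attached to the failure.
    return ResolutionFailure("bogus signature", ede=EDE_DNSSEC_BOGUS)


def servfail_ede(failure: ResolutionFailure) -> int:
    """Outer layer: prefer the inner code, fall back to a generic one if absent."""
    if failure.ede is not None:
        return failure.ede
    return EDE_NO_REACHABLE_AUTHORITY
```

In this model, `servfail_ede(verifier_failure_dropped_code())` yields 22 and `servfail_ede(verifier_failure_with_code())` yields 6, which is the difference between the response clients received and the one that would have pointed at DNSSEC.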

We recognize that this isn’t helpful for 1.1.1.1 users and will be updating our responses to properly surface DNSSEC errors.

Is this a failure of DNSSEC as a technology?

DNS is a critical link in the request chain for most Internet communication. It would be easy to conclude that this outage and the mitigations applied mean DNSSEC has failed as a technology. However, any technology that is misconfigured risks breaking for the users who depend on it. Leaving critical fiber cables exposed on the ocean floor for sharks to damage doesn’t invalidate the essential role submarine cables play in today’s Internet. It only highlights that we’ve sometimes failed to adequately protect them. The same principle applies here. DNSSEC plays a vital role in ensuring that DNS answers can be trusted and have not been tampered with by malicious actors.

No one enjoys dealing with serious incidents. Unfortunately, these things happen to everyone who operates critical infrastructure at scale. When they do, the DNS community tends to come together and support one another.

Incidents like this also underscore why relationships between operators matter. DNS is a decentralized system—no single organization controls all of it—and keeping it running reliably depends on mutual trust and open communication channels between registries, resolver operators, and the broader community. Forums like DNS-OARC provide exactly this: shared mailing lists and chat rooms where operators can coordinate quickly across organizational boundaries when something goes wrong.

DENIC has published a short blog post about the incident in which they state: “The outage is linked to a routine, scheduled key rollover. During this process, non-validatable signatures were generated and distributed. As a precautionary measure, future rollovers have been suspended until the exact technical causes have been identified.”

We’re confident we’ll learn more once their own analysis is complete.

Takeaways from this incident

This incident highlights a structural reality of the DNS hierarchy: when a registry at the TLD level fails, every domain under that TLD is affected simultaneously, regardless of where it’s hosted or which resolver is used. This isn’t unique to DNSSEC—the same would be true if a TLD’s nameservers became unreachable. The hierarchy that makes the global DNS work is also what makes failures at the top cascade downward.

There is no simple fix for this. What the industry can do is respond quickly and consistently when it happens. In this incident, resolver operators across the Internet independently applied Negative Trust Anchors within hours, restoring resolution while DENIC worked to fix the zone. Operational best practices, industry communication channels like DNS-OARC, and features like serve-stale all help reduce the impact, even if they can’t eliminate the underlying dependency.

We also identified areas where we can improve. We’ll be working on our EDE error handling to better surface DNSSEC-related errors.

We look forward to DENIC’s post-incident report and appreciate the transparency they demonstrated throughout.

If you’d like to learn more about how DNSSEC works, visit our page “How does DNSSEC work?” You can also follow real-time DNS trends and TLD data on Cloudflare Radar.
