On February 20, 2026, at 17:48 UTC, Cloudflare experienced a service outage in which a subset of customers who use Cloudflare's Bring Your Own IP (BYOIP) service saw their routes to the Internet withdrawn via Border Gateway Protocol (BGP).
The problem was not caused, directly or indirectly, by a cyberattack or malicious activity of any kind. The issue was caused by a change Cloudflare made to how our network manages IP addresses onboarded through the BYOIP pipeline. This change caused Cloudflare to unintentionally withdraw customer prefixes.
For some BYOIP customers, this resulted in their services and applications being unreachable from the Internet, causing timeouts and failures to connect across their Cloudflare deployments that used BYOIP. A subset of 1.1.1.1, specifically our destination one.one.one.one, was also impacted. The total duration of the incident was 6 hours and 7 minutes, with most of that time spent restoring prefix configurations to their state prior to the change.
Cloudflare engineers reverted the change, and prefixes stopped being withdrawn, once we began to observe failures. However, before engineers were able to revert the change, ~1,100 BYOIP prefixes had been withdrawn from the Cloudflare network. Some customers were able to restore their own service by using the Cloudflare dashboard to re-advertise their IP addresses. We resolved the incident once we had restored all prefix configurations.
We are sorry for the impact to our customers. We let you down today. This post is an in-depth recounting of exactly what happened and which systems and processes failed. We will also outline the steps we are taking to prevent outages like this from happening again.
How did the outage impact customers?
This graph shows the number of prefixes advertised by Cloudflare to a BGP neighbor during the incident, which correlates to impact, as prefixes that were not advertised were unreachable on the Internet:
Out of the total 6,500 prefixes advertised to this peer, 4,306 were BYOIP prefixes. These BYOIP prefixes are advertised to every peer and represent all of the BYOIP prefixes we advertise globally.
During the incident, 1,100 prefixes out of the total 6,500 were withdrawn from 17:56 to 18:46 UTC. Out of the 4,306 total BYOIP prefixes, 25% were unintentionally withdrawn. We were able to detect impact on one.one.one.one and revert the offending change before more prefixes were affected. At 19:19 UTC, we published guidance to customers that they could self-remediate by going to the Cloudflare dashboard and re-advertising their prefixes.
Cloudflare was able to revert many of the advertisement changes around 20:20 UTC, which restored 800 prefixes. There were still ~300 prefixes that could not be remediated through the dashboard, because the service configurations for those prefixes had been removed from the edge due to a software bug. These prefixes were manually restored by Cloudflare engineers at 23:03 UTC.
This incident did not impact all BYOIP customers because the configuration change was applied iteratively, not instantaneously, across all BYOIP customers. Once the configuration change was found to be causing impact, it was reverted before all customers were affected.
The impacted BYOIP customers first experienced a behavior known as BGP path hunting. In this state, end-user connections traverse networks searching for a path to the destination IP. This behavior persists until the connection that was opened times out and fails. Until the prefix is advertised somewhere, customers will continue to see this failure mode. This loop-until-failure scenario affected any product that uses BYOIP for advertisement to the Internet. One.one.one.one, which is a subset of 1.1.1.1, is a prefix onboarded as a BYOIP prefix, and was impacted in this way. Because this prefix is Cloudflare-maintained but uses our own products, it allowed us to detect the issue quickly. A full breakdown of the impacted services is below.
| Service/Product | Impact Description |
|---|---|
| Core CDN and Security Services | Traffic was not drawn to Cloudflare, and users connecting to websites advertised on these ranges would have seen failures to connect |
| Spectrum | Spectrum apps on BYOIP did not proxy traffic because traffic was not being drawn to Cloudflare |
| Dedicated Egress | Customers who used Gateway Dedicated Egress leveraging BYOIP, or Dedicated IPs for CDN Egress leveraging BYOIP, would not have been able to send traffic out to their destinations |
| Magic Transit | Prefixes for applications protected by Magic Transit were not advertised on the Internet, and end users connecting to those applications would have seen connection timeouts and failures |
There was also a set of customers who were unable to restore service by toggling their prefixes in the Cloudflare dashboard. As engineers began re-announcing prefixes to restore service for these customers, they may have seen elevated latency and failures despite their IP addresses being advertised. This was because the addressing settings for some users had been removed from edge servers due to an issue in our own software, and the state had to be propagated back to the edge.
We are going to get into exactly what broke in our addressing system, but to do that we need to cover a quick primer on the Addressing API, which is the underlying source of truth for customer IP addresses at Cloudflare.
Cloudflare’s Addressing API
The Addressing API is the authoritative dataset of the addresses present on the Cloudflare network. Any change to that dataset is immediately reflected in Cloudflare's global network. While we are in the process of improving how these systems roll out changes as part of Code Orange: Fail Small, today customers can configure their IP addresses by interacting with public-facing APIs, which configure a set of databases that trigger operational workflows propagating the changes to Cloudflare's edge. This means that changes to the Addressing API are immediately propagated to the Cloudflare edge.
Advertising and configuring IP addresses on Cloudflare involves several steps:
Customers signal to Cloudflare to advertise or withdraw IP addresses via the Addressing API or BGP Control
The Addressing API instructs the machines to change the prefix advertisements
BGP is updated on the routers once enough machines have received the notification to update the prefix
Finally, customers can configure Cloudflare products to use BYOIP addresses via service bindings, which assign products to those ranges
The Addressing API allows us to automate much of the process of advertising or withdrawing addresses, but some processes still require manual actions. These manual processes are risky because of their close proximity to Production. As part of Code Orange: Fail Small, one of the remediation goals was to remove manual actions taken in the Addressing API and replace them with safe workflows.
How did the incident happen?
The specific piece of configuration that broke was a change attempting to automate the customer action of removing prefixes from Cloudflare's BYOIP service, a regular customer request that is done manually today. Removing this manual process was part of our Code Orange: Fail Small work to push all change toward safe, automated, health-mediated deployment. Since the list of objects related to BYOIP prefixes can be large, this was implemented as part of a regularly running sub-task that checks for BYOIP prefixes that need to be removed, and then removes them. Unfortunately, this regular cleanup sub-task queried the API with a bug.
Here is the API query from the cleanup sub-task:
resp, err := d.doRequest(ctx, http.MethodGet, `/v1/prefixes?pending_delete`, nil)
And here is the relevant part of the API implementation:
if v := req.URL.Query().Get("pending_delete"); v != "" {
	// ignore other behavior and fetch pending items from the ip_prefixes_deleted table
	prefixes, err := c.RO().IPPrefixes().FetchPrefixesPendingDeletion(ctx)
	if err != nil {
		api.RenderError(ctx, w, ErrInternalError)
		return
	}
	api.Render(ctx, w, http.StatusOK, renderIPPrefixAPIResponse(prefixes, nil))
	return
}
Because the client passes pending_delete with no value, the result of Query().Get("pending_delete") here is an empty string (""), so the API server interprets this as a request for all BYOIP prefixes instead of just those prefixes that were supposed to be removed. The sub-task then treated every returned prefix as queued for deletion, and began systematically deleting all BYOIP prefixes and all of their related dependent objects, including service bindings, until the impact was noticed and an engineer identified the sub-task and shut it down.
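One defensive pattern that closes this gap (a sketch of the general technique, not Cloudflare's actual fix; `wantsPendingDelete` is a hypothetical helper) is to distinguish "parameter absent" from "parameter present with no value" using `url.Values.Has` (Go 1.17+), and to reject values that do not parse instead of silently reinterpreting them:

```go
package main

import (
	"fmt"
	"net/url"
)

// wantsPendingDelete decides how to interpret the pending_delete query
// parameter. The buggy handler used Get(), which returns "" both when the
// parameter is absent and when it is present without a value, so the bare
// `?pending_delete` request fell through to "return all prefixes".
// Has() distinguishes the two cases, and unrecognized values are rejected
// rather than silently changing the meaning of the request.
func wantsPendingDelete(rawQuery string) (bool, error) {
	q, err := url.ParseQuery(rawQuery)
	if err != nil {
		return false, err
	}
	if !q.Has("pending_delete") {
		return false, nil // absent: list all prefixes
	}
	switch q.Get("pending_delete") {
	case "", "true", "1":
		return true, nil // present: list only prefixes pending deletion
	default:
		return false, fmt.Errorf("invalid pending_delete value %q", q.Get("pending_delete"))
	}
}
```

With this shape, the cleanup sub-task's bare `?pending_delete` query would have been routed to the pending-deletion listing it intended, rather than receiving every prefix on the network.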
Why did Cloudflare not catch the bug in our staging environment or testing?
Our staging environment contains data that matches Production as closely as possible, but it was not sufficient in this case, and the mock data we relied on to simulate what would occur was inadequate.
In addition, while we have tests for this functionality, coverage for this scenario in our testing process and environment was incomplete. Initial testing and code review focused on the BYOIP self-service API journey and were completed successfully. While our engineers successfully tested the exact process a customer would have followed, testing did not cover a scenario where the task-runner service would independently execute changes to customer data without explicit input.
Why was recovery not immediate?
Affected BYOIP prefixes were not all impacted in the same way, necessitating more extensive data restoration steps. As part of Code Orange: Fail Small, we are building a system where operational state snapshots can be safely rolled out through health-mediated deployments. In the event something does roll out that causes unexpected behavior, it can be very quickly rolled back to a known-good state. However, that system is not in Production today.
BYOIP prefixes were in various states of impact during this incident, and each of those states required different actions:
Most impacted customers only had their prefixes withdrawn. Customers in this state could go into the dashboard and toggle their advertisements, which would restore service.
Some customers had their prefixes withdrawn and some bindings removed. These customers were in a partial state of recovery where they could toggle some prefixes but not others.
Some customers had their prefixes withdrawn and all service bindings removed. They could not toggle their prefixes in the dashboard because there was no service (Magic Transit, Spectrum, CDN) bound to them. These customers took the longest to mitigate, as a global configuration update had to be initiated to reapply the service bindings for all of these customers to every single machine on Cloudflare's edge.
How does this incident relate to Code Orange: Fail Small?
The change we were making when this incident occurred is part of the Code Orange: Fail Small initiative, which is aimed at improving the resiliency of code and configuration at Cloudflare. As a brief primer, the Code Orange: Fail Small work can be divided into three buckets:
Require managed rollouts for any configuration change that is propagated to the network, just like we do today for software binary releases.
Change our internal "break glass" procedures and remove any circular dependencies so that we, and our customers, can act fast and access all systems without issue during an incident.
Review, improve, and test failure modes of all systems handling network traffic to ensure they exhibit well-defined behavior under all circumstances, including unexpected error states.
The change that we attempted to deploy falls under the first bucket. By moving risky, manual changes to safe, automated configuration updates that are deployed in a health-mediated manner, we aim to improve the reliability of the service.
Critical work was already ongoing to improve the Addressing API's support for configuration changes through staged test mediation and better correctness checks. This work was ongoing in parallel with the deployed change. Although preventative measures were not fully deployed before the outage, teams were actively working on these systems when the incident occurred. Following our Code Orange: Fail Small promise to require managed rollouts of any change to Production, our engineering teams have been reaching deep into all layers of our stack to identify and fix all problematic findings. While this outage was not itself global, the blast radius and impact were unacceptably large, further reinforcing Code Orange: Fail Small as a priority until we have re-established confidence in all changes to our network being as gradual as possible. Now let's talk more specifically about improvements to these systems.
API schema standardization
One of the issues in this incident is that the pending_delete flag was interpreted as a string, making it difficult for both client and server to reason about the value of the flag. We will improve the API schema to ensure better standardization, which will make it much easier for tests and systems to validate whether an API call is properly formed. This work is part of the third Code Orange workstream, which aims to create well-defined behavior under all circumstances.
Better separation between operational and configured state
Today, customers make changes to the addressing schema that are persisted in an authoritative database, and that database is the same one used for operational actions. This makes manual rollback processes harder, because engineers have to rely on database snapshots instead of reconciling desired and actual state. We will redesign the rollback mechanism and database configuration to ensure that we have a straightforward way to roll back changes quickly, and also to introduce layers between customer configuration and Production.
We will snapshot the data that we read from the database and apply to Production, and deploy those snapshots in the same way that we deploy all our other Production changes: mediated by health metrics that can automatically stop the deployment if things are going wrong. This means that the next time we have a problem where the database gets into a bad state, we can near-instantly revert individual customers (or all customers) to a version that was working.
While this will temporarily block our customers from being able to make direct updates via our API in the event of an outage, it will mean that we can continue serving their traffic while we work to fix the database, instead of being down for that time. This work aligns with the first and second Code Orange workstreams, which involve fast rollback as well as safe, health-mediated deployment of configuration.
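In miniature, the rollback property described above looks something like the sketch below. All names are invented, and the real system would hold far richer state than a prefix-to-advertisement map; the point is that reverting is a version swap, not a database restore:

```go
package main

import "fmt"

// snapshotStore keeps immutable, versioned copies of the operational
// state read from the authoritative database. Deployments reference a
// version, so reverting to a known-good state is near-instant.
type snapshotStore struct {
	versions []map[string]bool // prefix -> advertised, per version
	active   int
}

// capture copies the current database-derived state into a new immutable
// version, marks it active, and returns its version number.
func (s *snapshotStore) capture(state map[string]bool) int {
	copied := make(map[string]bool, len(state))
	for k, v := range state {
		copied[k] = v
	}
	s.versions = append(s.versions, copied)
	s.active = len(s.versions) - 1
	return s.active
}

// rollback re-activates a previously captured version.
func (s *snapshotStore) rollback(version int) error {
	if version < 0 || version >= len(s.versions) {
		return fmt.Errorf("no such snapshot version %d", version)
	}
	s.active = version
	return nil
}

// current returns the state the network should be serving.
func (s *snapshotStore) current() map[string]bool {
	return s.versions[s.active]
}
```

Had such a store existed during this incident, the withdrawn prefixes and deleted service bindings could have been reverted in one step rather than reconstructed from database snapshots.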
Better arbitration of large withdrawal actions
We will improve our monitoring to detect when changes are happening too fast or too broadly, such as BGP prefixes being withdrawn or deleted rapidly, and disable the deployment of snapshots when this happens. This will form a kind of circuit breaker that stops any out-of-control process manipulating the database from having a large blast radius, as we saw in this incident.
We also have ongoing work to directly monitor that the services run by our customers are behaving correctly, and those signals will be used to trip the circuit breaker and stop potentially dangerous changes from being applied until we have had time to investigate. This work aligns with the first Code Orange workstream, which involves safe deployment of changes.
Below is the timeline of events, including the deployment of the change and the remediation steps:
| Time (UTC) | Status | Description |
|---|---|---|
| 2026-02-05 21:53 | Code merged into system | Broken sub-process merged into codebase |
| 2026-02-20 17:46 | Code deployed into system | Addressing API release with broken sub-process completes |
| 2026-02-20 17:56 | Impact start | Broken sub-process begins executing. Prefix advertisement updates begin propagating and prefixes begin to be withdrawn – IMPACT STARTS – |
| 2026-02-20 18:13 | Cloudflare engaged | Cloudflare engaged for failures on one.one.one.one |
| 2026-02-20 18:18 | Internal incident declared | Cloudflare engineers continue investigating impact |
| 2026-02-20 18:21 | Addressing API team paged | Engineering team responsible for the Addressing API engaged and debugging begins |
| 2026-02-20 18:46 | Issue identified | Broken sub-process terminated by an engineer and regular execution disabled; remediation begins |
| 2026-02-20 19:11 | Mitigation begins | Cloudflare engineers begin to restore serviceability for prefixes that were withdrawn while others focus on prefixes that were removed |
| 2026-02-20 19:19 | Some prefixes mitigated | Customers begin to re-advertise their prefixes via the dashboard to restore service. – IMPACT DOWNGRADE – |
| 2026-02-20 19:44 | Further mitigation continues | Engineers begin database restoration methods for removed prefixes |
| 2026-02-20 20:30 | Final mitigation process begins | Engineers complete a release to restore withdrawn prefixes that still have existing service bindings. Others continue working on removed prefixes – IMPACT DOWNGRADE – |
| 2026-02-20 21:08 | Configuration update deploys | Engineering begins global machine configuration rollout to restore prefixes that were not self-mitigated or mitigated by earlier efforts – IMPACT DOWNGRADE – |
| 2026-02-20 23:03 | Configuration update completed | Global machine configuration deployment to restore remaining prefixes is completed. – IMPACT ENDS – |
We deeply apologize for this incident and how it affected the service we provide our customers, and also the Internet at large. We aim to provide a network that is resilient to change, and we did not deliver on that promise to you. We are actively making these improvements to ensure improved stability moving forward and to prevent this problem from happening again.



