Cloudflare powers roughly one-fifth of the internet, but they don’t do it all by themselves. Developers building on their platform draw on a wide range of external tools and services alongside Cloudflare’s own offerings. To help tie everything together, Cloudflare exposes a robust API that developers use to build automations, CI/CD pipelines, and integrations that connect the different pieces of their infrastructure. Earlier this month, they introduced self-managed OAuth, a feature that simplifies how customers create and manage their own OAuth clients for delegated access to the Cloudflare API.
Cloudflare is no stranger to OAuth. If you’ve used Wrangler or integrations from partners such as PlanetScale, you’ve already interacted with it. Until now, however, third-party OAuth access was restricted to a handful of manually onboarded integrations and wasn’t available to the broader developer community. That meant developers building custom integrations had no choice but to rely on API tokens, which are more cumbersome to manage and don’t suit many delegated-access scenarios particularly well.
Over the past year, Cloudflare onboarded a steadily growing set of early partners while simultaneously refining the consent flow, revocation mechanisms, and overall security posture behind their OAuth implementation. But as the Developer Platform expanded and AI-driven agent tools created surging demand for delegated access, it became evident that making OAuth available to all customers was essential to the platform’s long-term success.
With self-managed OAuth, developers can now implement a standard OAuth flow in which customers grant scoped permissions directly. This makes it far simpler to build SaaS integrations, internal developer platforms, and agentic tools, all while giving end users clearer consent prompts, straightforward revocation, and greater control over what each application is allowed to do.
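As a rough illustration of what a standard flow with scoped permissions looks like from the developer's side, the sketch below builds an authorization-code request with PKCE (RFC 7636). The endpoint, client ID, redirect URI, and scope names are placeholders chosen for the example, not documented Cloudflare values.

```python
import base64
import hashlib
import secrets
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical values: none of these are documented Cloudflare identifiers.
AUTHORIZE_ENDPOINT = "https://auth.example.com/oauth2/auth"
CLIENT_ID = "my-client-id"
REDIRECT_URI = "https://app.example.com/callback"


def make_pkce_pair() -> tuple[str, str]:
    """Return (code_verifier, code_challenge) using the S256 method."""
    verifier = secrets.token_urlsafe(32)  # 43 URL-safe characters
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return verifier, challenge


def build_authorization_url(scopes: list[str]) -> tuple[str, str, str]:
    """Build the URL the user visits to review and grant the requested scopes.

    Returns (url, state, code_verifier). The caller stores state and
    code_verifier to validate the callback and redeem the code later.
    """
    state = secrets.token_urlsafe(16)
    verifier, challenge = make_pkce_pair()
    query = urlencode(
        {
            "response_type": "code",
            "client_id": CLIENT_ID,
            "redirect_uri": REDIRECT_URI,
            "scope": " ".join(scopes),  # scopes are space-delimited in OAuth 2.0
            "state": state,
            "code_challenge": challenge,
            "code_challenge_method": "S256",
        }
    )
    return f"{AUTHORIZE_ENDPOINT}?{query}", state, verifier


# Placeholder scope names for illustration only.
url, state, verifier = build_authorization_url(["zone:read", "dns:edit"])
requested = parse_qs(urlparse(url).query)["scope"][0].split(" ")
```

The application requests only the scopes it needs, and the consent screen shows the user exactly that list before any access is granted.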
Scaling the ecosystem securely
While the previous OAuth setup worked well enough for a small, tightly managed group of partners, the team recognized that their permissions model, consent experience, and abuse-mitigation strategies weren’t yet mature enough for broader exposure.
Earlier this year, they overhauled the consent experience so that it’s now much clearer which application is requesting access and exactly which permissions it will receive. They also added a revocation mechanism to the dashboard, giving developers an easy way to control which applications can access their data, and made app ownership more visible to help defend against OAuth phishing attacks.
Rolling out self-managed OAuth to every customer also demanded significant upgrades to the underlying OAuth engine. This undertaking required extensive planning so that the transition would cause minimal disruption to users while preserving data stability and security throughout.
Planning the upgrade to the OAuth engine
Several years ago, Cloudflare deployed Hydra, an open-source OAuth engine, to serve as the backbone of their OAuth infrastructure. That deployment performed reliably while usage was limited, but as the developer platform scaled and agentic workflows grew more prevalent, it became obvious that a major upgrade was needed to unlock new capabilities and boost performance.
During the planning phase, the team opted to carry out two smaller sequential upgrades rather than a single large leap. First, they would migrate to the latest 1.X release, assess any behavioral or performance differences, and only then proceed with the 2.X upgrade.
While mapping out the upgrade, they discovered that even the 1.X transition would affect customers, because the Hydra database required extensive schema migrations that:
- Created indexes in a way that would acquire an exclusive lock on critical tables, blocking active users from performing essential OAuth operations
- Added columns to critical tables and relocated other columns to entirely new tables
There was also an idiosyncrasy in the version of Hydra they were running: its SDK issued SELECT * queries, which triggered deserialization errors in the face of the schema changes.
To shield users from any impact, the team rewrote the SQL migrations to use features such as CREATE INDEX CONCURRENTLY, which builds an index without blocking writes to the table, and they produced a custom build of Hydra that selected explicit columns instead of using SELECT *.
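To see why SELECT * is fragile during a schema change, here is a small self-contained analogy using SQLite and a Python dataclass. It is not Hydra's code (Hydra is written in Go), and the table and column names are invented for the example. The index side is not shown because SQLite has no equivalent of CREATE INDEX CONCURRENTLY.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Client:
    id: str
    name: str


def load_with_star(conn: sqlite3.Connection) -> list[Client]:
    # Positional mapping: breaks as soon as the table gains a column.
    return [Client(*row) for row in conn.execute("SELECT * FROM clients")]


def load_with_explicit_columns(conn: sqlite3.Connection) -> list[Client]:
    # Only the columns this version of the code understands are requested.
    return [Client(*row) for row in conn.execute("SELECT id, name FROM clients")]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clients (id TEXT PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO clients VALUES ('c1', 'example-app')")

# Before the migration, both queries return the same rows.
assert load_with_star(conn) == load_with_explicit_columns(conn)

# Simulate a migration that adds a column while the old code is still running.
conn.execute("ALTER TABLE clients ADD COLUMN skip_consent INTEGER DEFAULT 0")

try:
    load_with_star(conn)
    star_survived = True
except TypeError:
    # Three values arrive for a two-field record.
    star_survived = False

explicit_rows = load_with_explicit_columns(conn)  # unaffected by the new column
```

With explicit columns, old and new application versions can both read the table while the migration is in flight.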
With the latest 1.X upgrade mapped out, the next step was to devise a strategy for the considerably larger 2.X migration. They identified three possible approaches and evaluated the trade-offs of each. An in-place upgrade was ruled out immediately, given the sheer volume of schema changes that the major version bump introduced. A blue-green deployment strategy seemed viable, but it required far more than simply toggling a switch to point traffic at the new version. The upgrade and migration process would span multiple hours, and the system had to continue operating correctly throughout that entire window.
The first blue-green approach they considered would involve halting all writes to the database, preventing any new authorizations from being created. This would ensure nothing was lost during the transition, but it also meant that no one could use existing OAuth applications unless they already held a valid credential. It introduced yet another serious problem: if a user needed to revoke an application’s access for any reason, they would be unable to do so while the upgrade was underway.
To address these shortcomings, the team devised a method that left database writes enabled, accepting that some writes might be lost during the cutover to the green version. The first challenge was minimizing the volume of new-token writes. They found an effective lever: extending the expiration time of tokens to several hours. This meant that applications that obtained new tokens before the upgrade could continue using them without needing to refresh.
With write reduction handled, the next problem was ensuring that no revocations performed by users during the upgrade window would be lost. Their solution was a queue system built on Cloudflare Queues. Whenever a revocation event occurred, a record describing that event was written into the queue. After the database was switched over to the green version, the team could drain the queue and replay every revocation that had taken place during the period when writes would otherwise have been lost. Getting this right was critical—any mistake would inadvertently restore access to applications that users had deliberately revoked.
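The capture-and-replay idea can be sketched in a few lines. This is a simplified model under stated assumptions: an in-memory deque stands in for Cloudflare Queues, a set of (user, client) pairs stands in for the OAuth database, and all class and function names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class RevocationEvent:
    user_id: str
    client_id: str


class GrantStore:
    """In-memory stand-in for a database of active (user, client) grants."""

    def __init__(self, grants):
        self.grants = set(grants)

    def revoke(self, user_id: str, client_id: str) -> bool:
        """Remove a grant. Returns True only if something was removed."""
        before = len(self.grants)
        self.grants.discard((user_id, client_id))
        return len(self.grants) < before

    def snapshot(self) -> "GrantStore":
        return GrantStore(self.grants)


def revoke_with_capture(store: GrantStore, queue: deque, user_id: str, client_id: str) -> None:
    """Apply the revocation to the live store and record it for later replay."""
    store.revoke(user_id, client_id)
    queue.append(RevocationEvent(user_id, client_id))


def replay_revocations(queue: deque, green: GrantStore) -> int:
    """Drain the queue against the green store. Replays are idempotent."""
    applied = 0
    while queue:
        event = queue.popleft()
        if green.revoke(event.user_id, event.client_id):
            applied += 1
    return applied


blue = GrantStore({("alice", "app-a"), ("alice", "app-b")})
green = blue.snapshot()  # copy taken at the start of the upgrade window
queue: deque = deque()

# Revocations during the window reach blue only; a retry produces a duplicate event.
revoke_with_capture(blue, queue, "alice", "app-a")
revoke_with_capture(blue, queue, "alice", "app-a")

applied = replay_revocations(queue, green)  # run after cutover
```

Because revoking an already-revoked grant is a no-op, duplicate or repeated events are harmless, which matters when the cost of a missed revocation is restored access.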
From an operational standpoint, the first upgrade to the latest 1.X release went smoothly with zero issues. The custom database migrations completed faster than anticipated, and there was no impact on users. The team had to perform a hard cutover to the new version because the legacy version was unable to introspect tokens issued by the newer release.
Following the cutover, they observed a spike in refresh token errors.
This was a type of error we had never seen before. The root cause turned out to be stricter refresh-token invalidation logic in the newer version: whenever a refresh token was reused, Hydra would invalidate the entire chain of access and refresh tokens. This created a serious problem for both Wrangler and MCP clients. Both of these clients generate a high volume of requests, and a single reused refresh token would wipe out the whole session.
We addressed this by introducing refresh-token coalescing behavior into our Worker, which directs OAuth traffic to the proper destination. This let us briefly cache the refresh-token request before it reached Hydra, so that if we detected a retry we could short-circuit the response without invalidating any tokens. As a longer-term fix, Hydra’s 2.X releases include a configurable “refresh token grace period,” which allows a refresh token to be retried for a short window without tearing down the entire chain.
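A minimal sketch of the coalescing idea follows, written in Python for brevity rather than as Worker code. It assumes a single-threaded caller, and the class name, the ten-second window, and the fake upstream are all illustrative choices rather than details of the production implementation.

```python
import hashlib
import time


class RefreshCoalescer:
    """Briefly cache refresh responses so a retried token never reaches upstream twice."""

    def __init__(self, upstream, window_seconds: float = 10.0, clock=time.monotonic):
        self._upstream = upstream  # callable: refresh_token -> response dict
        self._window = window_seconds
        self._clock = clock
        self._cache = {}  # token hash -> (expires_at, response)

    def refresh(self, refresh_token: str) -> dict:
        now = self._clock()
        # Drop entries whose window has passed.
        self._cache = {k: v for k, v in self._cache.items() if v[0] > now}
        # Key on a hash so raw tokens are not held as cache keys.
        key = hashlib.sha256(refresh_token.encode("utf-8")).hexdigest()
        hit = self._cache.get(key)
        if hit is not None:
            return hit[1]  # retry detected: short-circuit with the first response
        response = self._upstream(refresh_token)
        self._cache[key] = (now + self._window, response)
        return response


class FakeUpstream:
    """Stand-in for the OAuth server: reusing a refresh token is an error."""

    def __init__(self):
        self.seen = set()
        self.calls = 0

    def __call__(self, refresh_token: str) -> dict:
        self.calls += 1
        if refresh_token in self.seen:
            raise RuntimeError("refresh token reused: token chain invalidated")
        self.seen.add(refresh_token)
        return {"access_token": f"at-{self.calls}", "refresh_token": f"rt-{self.calls}"}


upstream = FakeUpstream()
coalescer = RefreshCoalescer(upstream)
first = coalescer.refresh("rt-0")
retry = coalescer.refresh("rt-0")  # served from cache; upstream is not called again
```

A real deployment would also need to handle requests that arrive while the first one is still in flight, which this sketch does not attempt.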
For the 2.X upgrade, several hours of noticeable user impact would not be acceptable, so we relied on our blue-green upgrade strategy. In principle, this approach is straightforward: run the migrations against a copy of the production database, and then switch over to the new Hydra version once they finish. In practice, however, there were many more pieces that had to be coordinated:
- Enable the revocation replay capture queue
- Copy and restore our database to the new target environment
- Perform targeted data cleanup — existing rows violated certain new constraints introduced in the newer versions, which could cause the migrations to fail
- Execute simultaneous cutovers across the Hydra service and two other critical internal systems to avoid any errors
- Conduct post-cutover monitoring and validation
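The targeted data cleanup step can be pictured as a pre-flight check that finds rows a new constraint would reject. The post does not say which constraints were involved, so the sketch below assumes a foreign key as the example, uses SQLite in place of the production database, and invents the table and column names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE clients (id TEXT PRIMARY KEY);
    CREATE TABLE consent_sessions (id TEXT PRIMARY KEY, client_id TEXT);
    INSERT INTO clients VALUES ('c1');
    INSERT INTO consent_sessions VALUES
        ('s1', 'c1'),
        ('s2', 'deleted-client'),
        ('s3', NULL);
    """
)


def find_violations(conn: sqlite3.Connection) -> list[str]:
    """Return session IDs whose client_id has no matching client row."""
    rows = conn.execute(
        """
        SELECT s.id
        FROM consent_sessions AS s
        LEFT JOIN clients AS c ON c.id = s.client_id
        WHERE c.id IS NULL
        ORDER BY s.id
        """
    )
    return [row[0] for row in rows]


def clean_violations(conn: sqlite3.Connection) -> list[str]:
    """Delete violating rows in one transaction and return their IDs."""
    ids = find_violations(conn)
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "DELETE FROM consent_sessions WHERE id = ?", [(i,) for i in ids]
        )
    return ids


removed = clean_violations(conn)
remaining = conn.execute("SELECT COUNT(*) FROM consent_sessions").fetchone()[0]
```

Running the check against the restored copy before the migrations start turns a mid-migration failure into a short, reviewable list of rows.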
We selected an upgrade window during which Hydra’s per-second request volume was at its lowest, so that any lost token writes would be minimal. Aside from some adjustments to timeout settings, our production migrations ran smoothly against the new database: the total runtime in production was roughly three hours. Once the migrations were done, we carefully rolled out the new version of the Hydra service, along with two additional system configuration changes to point our systems at the new SDK version.
Shortly after we switched traffic over, we noticed that a data cleanup job inside our authorization service — which depends on the Hydra consent session API — was purging OAuth policy data far too aggressively. After digging into the issue, we found a bug in one of the Hydra migrations that corrupted the state of certain valid OAuth sessions, causing the migration to flag them as invalid. This corruption of valid sessions created a mismatch between Hydra and our authorization service, which showed up as a spike in 403 errors. To address this, we performed data restorations and began working on improvements to OAuth authorization behavior so that the system no longer depends on static policy data.
Beyond the data cleanup problem, there were a handful of smaller fixes driven by specific client behaviors, which we shipped quickly.
With the Hydra version upgrade finished, OAuth traffic has remained stable, delivering better performance and reliability for our customers. It also brought production onto the same foundation that our newer OAuth APIs had already been validated against in staging, paving the way for our self-managed OAuth release on June 3.
After completing an upgrade of this scale, it is always both rewarding and instructive to review broad metrics that reflect the impact. We collected additional metrics during the database migrations and saw meaningful performance gains once the upgrade was complete.
| Metric | Approx. value |
|---|---|
| Rows updated | 132.5M |
| Rows inserted | 114.7M |
| Temp bytes | 136.97 GB |
| Transaction commits | 22.2k |

| Metric (avg) | Before | After | Change |
|---|---|---|---|
| API P95 | 185 ms | 101 ms | -45% |
| RSS memory | 888 MB | 763 MB | -14% |
| Go heap allocation | 449 MB | 271 MB | -40% |
| Goroutines | 4,015 | 3,076 | -23% |
| CPU usage | 1.07 cores | 0.67 cores | -37% |
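For readers who want to check the Change column, each value is the relative difference between the after and before figures, rounded to a whole percent. A short sketch of that arithmetic, using the numbers from the table:

```python
def percent_change(before: float, after: float) -> int:
    """Relative change from before to after, rounded to a whole percent."""
    return round((after - before) / before * 100)


# (before, after) pairs taken from the table above.
metrics = {
    "API P95 (ms)": (185, 101),
    "RSS memory (MB)": (888, 763),
    "Go heap allocation (MB)": (449, 271),
    "Goroutines": (4015, 3076),
    "CPU usage (cores)": (1.07, 0.67),
}

changes = {name: percent_change(b, a) for name, (b, a) in metrics.items()}
```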
Self-managed OAuth for everyone
Extending OAuth access to all customers marks a significant milestone in building a richer Cloudflare application ecosystem. Every Cloudflare user can now develop custom OAuth apps and construct integrations directly on Cloudflare. We’re thrilled to roll out self-managed OAuth across the board.
To get started, check out our documentation or head over to the OAuth apps section in the dashboard to create your first OAuth app.



