In the last 30 days, 93% of Cloudflare's R&D organization used AI coding tools powered by infrastructure we built on our own platform.
Eleven months ago, we undertook a major project: to truly integrate AI into our engineering stack. We needed to build the internal MCP servers, access layer, and AI tooling necessary for agents to be useful at Cloudflare. We pulled together engineers from across the company to form a tiger team called iMARS (Internal MCP Agent/Server Rollout Squad). The sustained work landed with the Developer Productivity team, who also own much of our internal tooling, including CI/CD, build systems, and automation.
Here are some numbers that capture our own agentic AI use over the last 30 days:
3,683 internal users actively using AI coding tools (60% company-wide, 93% across R&D), out of roughly 6,100 total employees
47.95 million AI requests
295 teams currently using agentic AI tools and coding assistants
20.18 million AI Gateway requests per month
241.37 billion tokens routed through AI Gateway
51.83 billion tokens processed on Workers AI
The impact on developer velocity internally is clear: we've never seen a quarter-over-quarter increase in merge requests to this degree.
As AI tooling adoption has grown, the four-week rolling average has climbed from ~5,600 merge requests per week to over 8,700. The week of March 23 hit 10,952, nearly double the Q4 baseline.
MCP servers were the starting point, but the team quickly realized we needed to go further: rethink how standards are codified, how code gets reviewed, how engineers onboard, and how changes propagate across thousands of repos.
This post dives deep into what that looked like over the past eleven months and where we ended up. We're publishing now, to close out Agents Week, because the AI engineering stack we built internally runs on the same products we're shipping and improving this week.
The architecture at a glance
The engineer-facing tools layer (OpenCode, Windsurf, and other MCP-compatible clients) includes both open-source and third-party coding assistant tools.
Each layer maps to a Cloudflare product or tool we use:
What we built | Built with |
|---|---|
Zero Trust authentication | Cloudflare Access |
Centralized LLM routing, cost tracking, BYOK, and Zero Data Retention controls | AI Gateway |
On-platform inference with open-weight models | Workers AI |
MCP Server Portal with single OAuth | Workers + Access |
AI Code Reviewer CI integration | Workers + AI Gateway |
Sandboxed execution for agent-generated code (Code Mode) | Dynamic Workers |
Stateful, long-running agent sessions | Agents SDK (McpAgent, Durable Objects) |
Isolated environments for cloning, building, and testing | Sandbox SDK (GA as of Agents Week) |
Durable multi-step workflows | Workflows (scaled 10x during Agents Week) |
16K+ entity knowledge graph | Backstage (OSS) |
None of this is internal-only infrastructure. Everything listed above (apart from Backstage) is a shipping product, and many of them got substantial updates during Agents Week.
We'll walk through this in three acts:
The platform layer: how authentication, routing, and inference work (AI Gateway, Workers AI, MCP Portal, Code Mode)
The knowledge layer: how agents understand our systems (Backstage, AGENTS.md)
The enforcement layer: how we keep quality high at scale (AI Code Reviewer, Engineering Codex)
Act 1: The platform layer
How AI Gateway helped us stay secure and improve the developer experience
When you have more than 3,600 internal users using AI coding tools every day, you need to solve for access and visibility across many clients, use cases, and roles.
Everything begins with Cloudflare Access, which handles all authentication and Zero Trust policy enforcement. Once authenticated, every LLM request routes through AI Gateway. This gives us a single place to manage provider keys, cost tracking, and data retention policies.
The OpenCode AI Gateway overview: 688.46k requests per day, 10.57B tokens per day, routing to 4 providers through one endpoint.
AI Gateway analytics show how monthly usage is distributed across model providers. Over the last month, internal request volume broke down as follows.
Provider | Requests/month | Share |
|---|---|---|
Frontier labs (OpenAI, Anthropic, Google) | 13.38M | 91.16% |
Workers AI | 1.3M | 8.84% |
Frontier models handle the bulk of complex agentic coding work for now, but Workers AI is already a significant part of the mix and handles an increasing share of our agentic engineering workloads.
How we increasingly leverage Workers AI
Workers AI is Cloudflare's serverless AI inference platform, which runs open-source models on GPUs across our global network. Beyond the huge cost improvements compared to frontier models, a key advantage is that inference stays on the same network as your Workers, Durable Objects, and storage. There are no cross-cloud hops to deal with, which add latency, network flakiness, and extra networking configuration to manage.
Workers AI usage in the last month: 51.47B input tokens, 361.12M output tokens.
Kimi K2.5, launched on Workers AI in March 2026, is a frontier-scale open-source model with a 256k context window, tool calling, and structured outputs. As we described in our Kimi K2.5 launch post, we have a security agent that processes over 7 billion tokens per day on Kimi. That would cost an estimated $2.4M per year on a mid-tier proprietary model. On Workers AI, it's 77% cheaper.
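The savings compound quickly at this volume. As a sanity check, here is the arithmetic; only the 7B tokens/day, ~$2.4M, and 77% figures come from our measurements, while the $0.94-per-million-token "mid-tier" price is an assumption chosen to reproduce them:

```typescript
// Back-of-the-envelope annual cost for a sustained token workload.
// The per-million-token price is an illustrative assumption, not a list price.
function annualTokenCost(tokensPerDay: number, dollarsPerMillionTokens: number): number {
  const tokensPerYear = tokensPerDay * 365;
  return (tokensPerYear / 1_000_000) * dollarsPerMillionTokens;
}

const tokensPerDay = 7_000_000_000; // the security agent's daily volume
const proprietaryCost = annualTokenCost(tokensPerDay, 0.94); // ≈ $2.4M per year
const workersAICost = proprietaryCost * (1 - 0.77);          // 77% cheaper ≈ $0.55M
```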
Beyond security, we use Workers AI for documentation review in our CI pipeline, for generating AGENTS.md context files across thousands of repositories, and for lightweight inference tasks where same-network latency matters more than peak model capability.
As open-source models continue to improve, we expect Workers AI to handle a growing share of our internal workloads.
One thing we got right early: routing through a single proxy Worker from day one. We could have had clients connect directly to AI Gateway, which would have been simpler to set up initially. But centralizing through a Worker meant we could add per-user attribution, model catalog management, and permission enforcement later without touching any client configs. Every feature described in the bootstrap section below exists because we had that single choke point. The proxy pattern gives you a control plane that direct connections don't, and if we plug in more coding assistant tools later, the same Worker and discovery endpoint will handle them.
How it works: one URL to configure everything
The whole setup begins with one command:
opencode auth login
That command triggers a chain that configures providers, models, MCP servers, agents, commands, and permissions, without the user touching a config file.
Step 1: Discover auth requirements. OpenCode fetches config from a URL like /.well-known/opencode.
This discovery endpoint is served by a Worker, and the response has an auth block telling OpenCode how to authenticate, along with a config block with providers, MCP servers, agents, commands, and default permissions:
{
"auth": {
"command": ["cloudflared", "access", "login", "..."],
"env": "TOKEN"
},
"config": {
"provider": { "..." },
"mcp": { "..." },
"agent": { "..." },
"command": { "..." },
"permission": { "..." }
}
}
Step 2: Authenticate via Cloudflare Access. OpenCode runs the auth command and the user authenticates through the same SSO they use for everything else at Cloudflare. cloudflared returns a signed JWT. OpenCode stores it locally and automatically attaches it to every subsequent provider request.
Step 3: Config is merged into OpenCode. The provided config is shared defaults for the entire organization, but local configs always take precedence. Users can override the default model, add their own agents, or adjust project- and user-scoped permissions without affecting anyone else.
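The precedence rule can be sketched as a deep merge where local values win. This is an illustrative implementation of the behavior described above, not OpenCode's actual merge code:

```typescript
// Org-wide defaults come from the discovery endpoint; anything set
// locally overrides them, section by section.
type Config = { [key: string]: unknown };

function mergeConfig(shared: Config, local: Config): Config {
  const merged: Config = { ...shared };
  for (const [key, value] of Object.entries(local)) {
    const base = merged[key];
    const bothObjects =
      value !== null && typeof value === "object" && !Array.isArray(value) &&
      base !== null && typeof base === "object" && !Array.isArray(base);
    // Recurse into nested sections (provider, mcp, permission, ...);
    // otherwise the local value simply replaces the shared default.
    merged[key] = bothObjects ? mergeConfig(base as Config, value as Config) : value;
  }
  return merged;
}

const sharedDefaults = { model: "org-default-model", permission: { edit: "ask" } };
const localConfig = { model: "my-preferred-model" };
const effective = mergeConfig(sharedDefaults, localConfig);
// effective.model is the local override; the shared permission defaults survive
```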
Inside the proxy Worker. The Worker is a simple Hono app that does three things:
Serves the shared config. The config is compiled at deploy time from structured source files and contains placeholder values like {baseURL} for the Worker's origin. At request time, the Worker replaces these, so all provider requests route through the Worker rather than directly to the model providers. Each provider gets a path prefix (/anthropic, /openai, /google-ai-studio/v1beta, /compat for Workers AI) that the Worker forwards to the corresponding AI Gateway route.
Proxies requests to AI Gateway. When OpenCode sends a request like POST /anthropic/v1/messages, the Worker validates the Cloudflare Access JWT, then rewrites headers before forwarding:
Stripped: authorization, cf-access-token, host
Added: cf-aig-authorization: Bearer …, cf-aig-metadata: {"userId": "…"}
The request goes to AI Gateway, which routes it to the appropriate provider. The response passes straight through with zero buffering. The apiKey field in the client config is empty because the Worker injects the real key server-side. No API keys exist on user machines.
Keeps the model catalog fresh. An hourly cron trigger fetches the current OpenAI model list from models.dev, caches it in Workers KV, and injects store: false on every model for Zero Data Retention. New models get ZDR automatically without a config redeploy.
Anonymous user tracking. After JWT validation, the Worker maps the user's email to a UUID, using D1 for persistent storage and KV as a read cache. AI Gateway only ever sees the anonymous UUID in cf-aig-metadata, never the email. This gives us per-user cost tracking and usage analytics without exposing identities to model providers or Gateway logs.
Config-as-code. Agents and commands are authored as markdown files with YAML frontmatter. A build script compiles them into a single JSON config validated against the OpenCode JSON schema. Every new session picks up the latest version automatically.
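The header rewrite step above can be sketched with the Web-standard Headers API available in Workers and Node 18+. In the real Worker the Gateway key comes from a secret binding and the UUID from the D1/KV lookup; the values here are placeholders:

```typescript
// Strip user-identifying headers, then attach the Gateway credential and
// the anonymous per-user metadata before forwarding to AI Gateway.
function rewriteForGateway(incoming: Headers, gatewayKey: string, anonymousId: string): Headers {
  const outgoing = new Headers(incoming);
  // Remove anything identifying the user or the original origin.
  outgoing.delete("authorization");
  outgoing.delete("cf-access-token");
  outgoing.delete("host");
  // Add the Gateway auth and anonymous usage metadata.
  outgoing.set("cf-aig-authorization", `Bearer ${gatewayKey}`);
  outgoing.set("cf-aig-metadata", JSON.stringify({ userId: anonymousId }));
  return outgoing;
}

const headers = rewriteForGateway(
  new Headers({ authorization: "Bearer user-jwt", "content-type": "application/json" }),
  "gateway-key",   // injected server-side; never present in client config
  "anon-uuid-123", // the UUID mapped from the user's email
);
```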
The overall architecture is simple and easy for anyone to deploy with our developer platform: a proxy Worker, Cloudflare Access, AI Gateway, and a client-accessible discovery endpoint that configures everything automatically. Users run one command and they're done. There's nothing for them to configure manually: no API keys on laptops, no MCP server connections to set up by hand. Making changes to our agentic tools and updating what 3,000+ people get in their coding environment is just a wrangler deploy away.
The MCP Server Portal: one OAuth, many MCP tools
We described our full approach to governing MCP at enterprise scale in a separate post, including how we use MCP Server Portals, Cloudflare Access, and Code Mode together. Here's the short version of what we built internally.
Our internal portal aggregates 13 production MCP servers exposing 182+ tools across Backstage, GitLab, Jira, Sentry, Elasticsearch, Prometheus, Google Workspace, our internal Release Manager, and more. This unifies access and simplifies everything, giving us one endpoint and one Cloudflare Access flow governing access to every tool.
Each MCP server is built on the same foundation: McpAgent from the Agents SDK, workers-oauth-provider for OAuth, and Cloudflare Access for identity. The whole thing lives in a single monorepo with shared auth infrastructure, Bazel builds, CI/CD pipelines, and a catalog-info.yaml for Backstage registration. Adding a new server is usually a matter of copying an existing one and changing the API it wraps. For more on how this works and the security architecture behind it, see our enterprise MCP reference architecture.
Code Mode at the portal layer
MCP is the right protocol for connecting AI agents to tools, but it has a practical problem: every tool definition consumes context window tokens before the model even starts working. As the number of MCP servers and tools grows, so does the token overhead, and at scale this becomes a real cost. Code Mode is the emerging fix: instead of loading every tool schema up front, the model discovers and calls tools through code.
Our GitLab MCP server originally exposed 34 individual tools (get_merge_request, list_pipelines, get_file_content, and so on). Those 34 tool schemas consumed roughly 15,000 tokens of context window per request. On a 200K context window, that's 7.5% of the budget gone before asking a single question. Multiplied across every request, every engineer, every day, it adds up.
MCP Server Portals now support Code Mode proxying, which lets us solve that problem centrally instead of one server at a time. Rather than exposing every upstream tool definition to the client, the portal collapses them into two portal-level tools: portal_codemode_search and portal_codemode_execute.
The nice thing about doing this at the portal layer is that it scales cleanly. Without Code Mode, every new MCP server adds more schema overhead to every request. With portal-level Code Mode, the client still sees only two tools even as we connect more servers behind the portal. That means less context bloat, lower token cost, and a cleaner architecture overall.
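The scaling difference is easy to quantify. The ~440 tokens-per-schema figure below is an estimate backed out of our GitLab numbers (34 tools ≈ 15,000 tokens); the per-server tool counts are illustrative:

```typescript
// Direct exposure grows linearly with total tool count; portal-level
// Code Mode pins the client-visible surface at two tool definitions.
const TOKENS_PER_SCHEMA = 440;

function directExposureOverhead(toolsPerServer: number[]): number {
  const totalTools = toolsPerServer.reduce((sum, n) => sum + n, 0);
  return totalTools * TOKENS_PER_SCHEMA;
}

function codeModeOverhead(): number {
  // portal_codemode_search + portal_codemode_execute, regardless of server count
  return 2 * TOKENS_PER_SCHEMA;
}

// 13 servers exposing 182 tools in total:
const direct = directExposureOverhead(Array(13).fill(14)); // 80,080 tokens per request
const portal = codeModeOverhead();                          // 880 tokens per request
```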
Act 2: The knowledge layer
Backstage: the knowledge graph beneath it all
Before the iMARS team could build MCP servers that were actually useful, we needed to solve a more fundamental problem: structured data about our services and infrastructure. We need our agents to understand context outside the codebase, like who owns what, how services depend on one another, where the documentation lives, and which databases a service talks to.
We run Backstage, the open-source internal developer portal originally built by Spotify, as our service catalog. It is self-hosted (not on Cloudflare products, for the record) and it tracks things like:
2,055 services, 167 libraries, and 122 packages
228 APIs with schema definitions
544 systems (products) across 45 domains
1,302 databases, 277 ClickHouse tables, 173 clusters
375 teams and 6,389 users with ownership mappings
Dependency graphs connecting services to the databases, Kafka topics, and cloud resources they rely on
Our Backstage MCP server (13 tools) is available through our MCP Portal, and an agent can look up who owns a service, check what it depends on, find related API specs, and pull Tech Insights scores, all without leaving the coding session.
Without this structured data, agents are working blind. They can read the code in front of them, but they can't see the system around it. The catalog turns individual repos into a connected map of the engineering organization.
AGENTS.md: getting thousands of repos ready for AI
Early in the rollout, we kept seeing the same failure mode: coding agents produced changes that looked plausible and were still wrong for the repo. Usually the problem was local context: the model didn't know the right test command, the team's current conventions, or which parts of the codebase were off-limits. That pushed us toward AGENTS.md: a short, structured file in each repo that tells coding agents how the codebase actually works and forces teams to make that context explicit.
What AGENTS.md looks like
We built a system that generates AGENTS.md files across our GitLab instance. Because these files sit directly in the model's context window, we wanted them to stay short and high-signal. A typical file looks like this:
# AGENTS.md
## Repository
- Runtime: cloudflare workers
- Test command: `pnpm test`
- Lint command: `pnpm lint`
## How to navigate this codebase
- All Cloudflare Workers are in src/workers/, one file per worker
- MCP server definitions are in src/mcp/, each tool in a separate file
- Tests mirror source: src/foo.ts -> tests/foo.test.ts
## Conventions
- Testing: use Vitest with `@cloudflare/vitest-pool-workers` (Codex: RFC 021, RFC 042)
- API patterns: follow internal REST conventions (Codex: API-REST-01)
## Boundaries
- Do not edit generated files in `gen/`
- Do not introduce new background jobs without updating `config/`
## Dependencies
- Depends on: auth-service, config-service
- Relied on by: api-gateway, dashboard
When an agent reads this file, it doesn't have to infer the repo from scratch. It knows how the codebase is organized, which conventions to follow, and which Engineering Codex rules apply.
How we generate them at scale
The generator pipeline pulls entity metadata from our Backstage service catalog (ownership, dependencies, system relationships), analyzes the repository structure to detect the language, build system, test framework, and directory layout, then maps the detected stack to relevant Engineering Codex standards. A capable model then generates the structured document, and the system opens a merge request so the owning team can review and refine it.
We've processed roughly 3,900 repositories this way. The first pass wasn't always perfect, especially for polyglot repos or unusual build setups, but even that baseline was far better than asking agents to infer everything from scratch.
The initial merge request solved the bootstrap problem, but keeping these files current matters just as much. A stale AGENTS.md can be worse than no file at all. We closed that loop with the AI Code Reviewer, which can flag when repository changes suggest that AGENTS.md should be updated.
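The last stage of that pipeline, turning gathered facts into a file, can be sketched as follows. The RepoFacts shape and field names here are invented for illustration; the real system also routes the draft through a model and opens an MR for the owning team:

```typescript
// Render an AGENTS.md draft from catalog metadata and detected repo facts.
interface RepoFacts {
  runtime: string;
  testCommand: string;
  dependsOn: string[];
  codexRules: string[];
}

function renderAgentsMd(facts: RepoFacts): string {
  return [
    "# AGENTS.md",
    "## Repository",
    `- Runtime: ${facts.runtime}`,
    `- Test command: \`${facts.testCommand}\``,
    "## Conventions",
    ...facts.codexRules.map((rule) => `- Applies: ${rule}`),
    "## Dependencies",
    `- Depends on: ${facts.dependsOn.join(", ")}`,
  ].join("\n");
}

const draft = renderAgentsMd({
  runtime: "cloudflare workers",
  testCommand: "pnpm test",
  dependsOn: ["auth-service", "config-service"],
  codexRules: ["RFC 021", "API-REST-01"],
});
```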
Act 3: The enforcement layer
Every merge request at Cloudflare gets an AI code review. Integration is simple: teams add a single CI component to their pipeline, and from that point on, every MR is reviewed automatically.
We use GitLab's self-hosted solution as our CI/CD platform. The reviewer is implemented as a GitLab CI component that teams include in their pipeline. When an MR is opened or updated, the CI job runs OpenCode with a multi-agent review coordinator. The coordinator classifies the MR by risk tier (trivial, lite, or full) and delegates to specialized review agents: code quality, security, codex compliance, documentation, performance, and release impact. Each agent connects to AI Gateway for model access, pulls Engineering Codex rules from a central repo, and reads the repository's AGENTS.md for codebase context. Results are posted back as structured MR comments.
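A triage step in the spirit of the coordinator's risk tiers might look like this. The real classifier weighs more signals; these thresholds and path rules are invented for the sketch:

```typescript
// Classify an MR into a review tier from a couple of cheap signals.
type RiskTier = "trivial" | "lite" | "full";

function classifyMergeRequest(changedPaths: string[], linesChanged: number): RiskTier {
  const touchesSensitiveCode = changedPaths.some(
    (path) => path.startsWith("auth/") || path.startsWith("crypto/"),
  );
  if (touchesSensitiveCode || linesChanged > 500) {
    return "full"; // every specialized review agent runs
  }
  const docsOnly = changedPaths.every((path) => path.endsWith(".md"));
  if (docsOnly && linesChanged < 50) {
    return "trivial"; // lightweight single-pass review
  }
  return "lite";
}
```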
A separate Workers-based config service handles centralized model selection per reviewer agent, so we can swap models without changing the CI template. The review process itself runs in the CI runner and is stateless per execution.
We spent time getting the output format right. Reviews are broken into categories (Security, Code Quality, Performance) so engineers can scan headers rather than reading walls of text. Each finding has a severity level (Critical, Important, Suggestion, or Optional Nits) that makes it immediately clear what needs attention versus what's informational.
The reviewer maintains context across iterations. If it flagged something in a previous review round that has since been fixed, it acknowledges that rather than re-raising the same issue. And when a finding maps to an Engineering Codex rule, it cites the specific rule ID, turning an AI suggestion into a reference to an organizational standard.
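One way to get that cross-round behavior is to fingerprint each finding and skip anything already posted. The fingerprint scheme below is an assumption for illustration, not the reviewer's actual mechanism:

```typescript
// Deduplicate review findings across rounds by (rule, file) fingerprint.
interface Finding {
  ruleId: string;
  file: string;
  summary: string;
}

function fingerprint(finding: Finding): string {
  return `${finding.ruleId}:${finding.file}`;
}

function findingsToPost(current: Finding[], previouslyPosted: Finding[]): Finding[] {
  const seen = new Set(previouslyPosted.map(fingerprint));
  return current.filter((finding) => !seen.has(fingerprint(finding)));
}

const earlierRound = [{ ruleId: "API-REST-01", file: "src/api.ts", summary: "non-standard route" }];
const thisRound = [
  { ruleId: "API-REST-01", file: "src/api.ts", summary: "non-standard route" }, // already raised
  { ruleId: "RFC-021", file: "src/api.test.ts", summary: "wrong test pool" },
];
const fresh = findingsToPost(thisRound, earlierRound); // only the RFC-021 finding remains
```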
Workers AI handles about 15% of the reviewer's traffic, primarily for documentation review tasks where Kimi K2.5 performs well at a fraction of the cost of frontier models. Models like Opus 4.6 and GPT 5.4 handle security-sensitive and architecturally complex reviews where reasoning capability matters most.
Over the last 30 days:
We're releasing a detailed technical blog post alongside this one that covers the reviewer's internal architecture, including how we route between models, the multi-agent orchestration, and the cost-optimization strategies we've developed.
Engineering Codex: engineering standards as agent skills
The Engineering Codex is Cloudflare's new internal standards system, where our core engineering standards live. We have a multi-stage AI distillation process that outputs a set of codex rules ("If you want X, use Y. You should do X if you are doing Y or Z.") along with an agent skill that uses progressive disclosure and nested hierarchical information directories and links across markdown files.
This skill is available for engineers to use locally as they build, with prompts like "how should I handle errors in my Rust service?" or "review this TypeScript code for compliance." Our Network Firewall team audited rampartd using a multi-agent consensus process in which every requirement was scored COMPLIANT, PARTIAL, or NON-COMPLIANT, with specific violation details and remediation steps, reducing what previously required weeks of manual work to a structured, repeatable process.
At review time, the AI Code Reviewer cites specific Codex rules in its feedback.
AI Code Review: showing categorized findings (Codex Compliance in this case) noting the codex RFC violation.
None of these pieces is especially novel on its own. Plenty of companies run service catalogs, ship reviewer bots, or publish engineering standards. The difference is the wiring. When an agent can pull context from Backstage, read AGENTS.md for the repo it's modifying, and get reviewed against Codex rules by the same toolchain, the first draft is usually close enough to ship. That wasn't true six months ago.
From launching this effort to 93% R&D adoption took less than a year.
Company-wide adoption (Feb 5 – April 15, 2026):
Metric | Value |
|---|---|
Active users | 3,683 (60% of the company) |
R&D org adoption | 93% |
AI messages | 47.95M |
Teams with AI activity | 295 |
OpenCode messages | 27.08M |
Windsurf messages | 434.9K |
AI Gateway (last 30 days, combined):
Metric | Value |
|---|---|
Requests | 20.18M |
Tokens | 241.37B |
Workers AI (last 30 days):
Metric | Value |
|---|---|
Input tokens | 51.47B |
Output tokens | 361.12M |
What's next: background agents
The next evolution in our internal engineering stack will include background agents: agents that can be spun up on demand with the same tools available locally (MCP portal, git, test runners) but running entirely in the cloud. The architecture uses Durable Objects and the Agents SDK for orchestration, delegating to Sandbox containers when the job requires a full development environment, like cloning a repo, installing dependencies, or running tests. The Sandbox SDK went GA during Agents Week.
Long-running agents, shipped natively in the Agents SDK during Agents Week, solve the durable-session problem that previously required workarounds. The SDK now supports sessions that run for extended periods without eviction, enough for an agent to clone a large repo, run a full test suite, iterate on failures, and open an MR in a single session.
This represents an eleven-month effort to rethink not just how code gets written, but how it gets reviewed, how standards are enforced, and how changes ship safely across thousands of repos. Every layer runs on the same products our customers use.
Agents Week just shipped everything you need. The platform is here.
npx create-cloudflare@latest --template cloudflare/agents-starter
That agents starter gets you running. The diagram below is the full architecture for when you're ready to expand it: your tools layer on top (chatbot, web UI, CLI, browser extension), the Agents SDK handling session state and orchestration in the middle, and the Cloudflare services you call from it beneath.
Docs: Agents SDK · Sandbox SDK · AI Gateway · Workers AI · Workflows · Code Mode · MCP on Cloudflare
Repos: cloudflare/agents · cloudflare/sandbox-sdk · cloudflare/mcp-server-cloudflare · cloudflare/skills
For more on how we're using AI at Cloudflare, read the post on our process for AI Code Review. And check out everything we shipped during Agents Week.
We'd love to hear what you build. Find us on Discord, X, and Bluesky.
Ayush Thakur built the AGENTS.md system and the AI Gateway integration for the OpenCode infrastructure, Scott Roemeschke is the Engineering Manager of the Developer Productivity team at Cloudflare, and Rajesh Bhatia leads the Productivity Platform function at Cloudflare. This post was a collaborative effort across the Devtools organization, with help from volunteers across the company through the iMARS (Internal MCP Agent/Server Rollout Squad) tiger team.



