AI workloads are increasingly running on Kubernetes in production, but for many teams, the path from a working model to a reliable system remains unclear. The cloud native ecosystem – its projects, patterns, and community – offers a growing set of building blocks that help teams bridge these two worlds.
From models to systems
AI Engineering is the discipline of building reliable, production-grade systems that use AI models as components. It goes beyond model training and prompt design into the operational challenges that teams running inference at scale will recognize: serving models with low latency and high availability, scheduling GPU and accelerator resources efficiently, observing token throughput and cost alongside traditional infrastructure metrics, managing model versions and rollouts safely, and enforcing governance and access policies across multi-tenant environments.
These are infrastructure problems, and they map closely to capabilities the cloud native ecosystem has been developing for years.
The cloud native stack for (Gen)AI
If you're a platform engineer or SRE being asked to support AI workloads, the good news is that much of what you need already exists in the CNCF landscape.
Orchestration and scheduling: Kubernetes is the orchestration layer for AI inference and training. The 2025 CNCF Annual Survey found that 82% of container users run Kubernetes in production, and the platform has evolved well beyond stateless web services. A key development is Dynamic Resource Allocation (DRA), which reached GA in Kubernetes 1.34. DRA replaces the limitations of device plugins with fine-grained, topology-aware GPU scheduling using CEL-based filtering and declarative ResourceClaims. For teams managing GPU clusters, DRA is a significant step forward.
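As a rough sketch of what that looks like in practice (the driver name, class name, and quantities below are illustrative, not prescribed by DRA), a DeviceClass selects GPUs via a CEL expression and a ResourceClaimTemplate requests one per Pod:

```yaml
# Illustrative DeviceClass: matches devices published by a vendor's DRA driver.
apiVersion: resource.k8s.io/v1
kind: DeviceClass
metadata:
  name: example-gpu
spec:
  selectors:
    - cel:
        expression: device.driver == "gpu.example.com"
---
# Illustrative ResourceClaimTemplate: each Pod referencing it gets its own
# claim for exactly one device from the class above.
apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  name: single-gpu
spec:
  spec:
    devices:
      requests:
        - name: gpu
          exactly:
            deviceClassName: example-gpu
```

A Pod then lists the template under `spec.resourceClaims` and references the claim from the container that needs the device; check the current Kubernetes DRA documentation for the exact fields in your cluster's API version.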
Inference routing and load balancing: The Gateway API Inference Extension (Inference Gateway), which has reached GA, provides Kubernetes-native APIs for routing inference traffic based on model names, LoRA adapters, and endpoint health. This lets platform teams serve multiple GenAI workloads on shared model server pools for higher utilization and fewer required accelerators. Building on this work, the newly formed WG AI Gateway is developing standards for AI-specific networking capabilities: token-based rate limiting, semantic routing, payload processing for prompt filtering, and integration patterns for retrieval-augmented generation.
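To give a feel for the model, here is a minimal sketch (gateway and pool names are hypothetical) of the routing side: a standard Gateway API HTTPRoute whose backend is an InferencePool rather than a plain Service, so the inference-aware endpoint picker chooses which model server replica handles each request:

```yaml
# Illustrative HTTPRoute: sends traffic from an existing Gateway to an
# InferencePool ("llm-pool") managed by the Inference Extension.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: llm-route
spec:
  parentRefs:
    - name: inference-gateway   # assumed Gateway in the same namespace
  rules:
    - backendRefs:
        - group: inference.networking.k8s.io
          kind: InferencePool   # pool of model server Pods, defined separately
          name: llm-pool
```

The InferencePool itself selects the model server Pods and points at an endpoint picker; the project's getting started guide covers the full set of objects.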
Observability: OpenTelemetry and Prometheus remain essential. AI workloads introduce new metrics: tokens per second, time to first token, queue depth, cache hit rates. All of them need to live alongside traditional infrastructure telemetry. The inference-perf benchmarking tool, part of a broader effort to standardize inference metrics, reports key LLM performance metrics and integrates with Prometheus to provide a consistent measurement framework across model servers.
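To make the two headline metrics concrete, here is a dependency-free sketch (the type and function names are mine, not from any standard) of how a serving layer might derive them from per-request timestamps before exporting them via OpenTelemetry or Prometheus:

```python
from dataclasses import dataclass


@dataclass
class RequestTiming:
    """Timestamps (in seconds) and token count for one inference request."""
    start: float          # request received
    first_token: float    # first output token streamed to the client
    end: float            # last token streamed
    output_tokens: int    # total generated tokens


def time_to_first_token(r: RequestTiming) -> float:
    """TTFT: how long the client waited before streaming began."""
    return r.first_token - r.start


def tokens_per_second(r: RequestTiming) -> float:
    """Decode throughput over the generation phase only."""
    duration = r.end - r.first_token
    return r.output_tokens / duration if duration > 0 else 0.0


# Example: 128 tokens, streaming starts after 0.25 s, finishes at 2.25 s.
r = RequestTiming(start=0.0, first_token=0.25, end=2.25, output_tokens=128)
print(time_to_first_token(r))  # 0.25
print(tokens_per_second(r))    # 64.0
```

Note that throughput is measured over the decode phase, not the whole request: a long TTFT (prefill, queueing) would otherwise mask healthy generation speed, which is exactly why these metrics need to be reported separately.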
ML workflows: Kubeflow has grown into a top-30 CNCF project with hundreds of active contributors, providing the pipeline orchestration, experiment tracking, and model serving components that ML teams need. Kueue handles job queuing and fair scheduling for batch and training workloads.
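Kueue's integration point is deliberately small: a workload opts in by naming a queue, and Kueue admits it when quota allows. A minimal sketch (queue name, image, and GPU counts are illustrative):

```yaml
# Illustrative training Job submitted through Kueue. The LocalQueue
# "team-a-queue" and its ClusterQueue quota are assumed to exist.
apiVersion: batch/v1
kind: Job
metadata:
  name: train-adapter
  labels:
    kueue.x-k8s.io/queue-name: team-a-queue
spec:
  suspend: true   # created suspended; Kueue unsuspends it once quota is free
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: trainer
          image: registry.example.com/trainer:latest  # hypothetical image
          resources:
            requests:
              nvidia.com/gpu: "4"
            limits:
              nvidia.com/gpu: "4"
```

Because admission happens before Pods are created, expensive accelerators are never held by jobs that are merely waiting in line.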
Policy and security: Open Policy Agent (OPA) and SPIFFE/SPIRE provide the governance primitives that production AI deployments require, from controlling which teams can access which models to establishing workload identity across inference services.
GitOps and deployment: Argo and Flux bring the same declarative, version-controlled deployment patterns to model serving that platform teams already use for application delivery. Safe rollouts matter even more when a bad model version can produce incorrect or harmful outputs.
Bridging the gap
There is a real gap between AI practitioners and cloud native practitioners today. Despite their infrastructure-heavy workloads, only 41% of professional AI developers currently identify as cloud native, according to the CNCF and SlashData State of Cloud Native Development report. Many come from data science backgrounds where managed notebook environments abstracted away operational concerns. Meanwhile, cloud native practitioners sometimes view AI workloads as architecturally foreign: stateful, GPU-hungry, and different from the services Kubernetes was originally designed for.
Both perspectives contain some truth, and both communities benefit from closing this gap.
If you're an AI engineer moving to Kubernetes, start with the inference serving stack. Deploy a model server (vLLM or similar) behind the Inference Gateway, use DRA to manage your GPU resources declaratively, and instrument with OpenTelemetry from the start. The patterns will feel familiar if you've worked with any request-response service at scale.
If you're a platform engineer supporting AI teams, understand the new workload patterns. Inference services need autoscaling based on token throughput, not just CPU. Training jobs are long-running and may span multiple nodes with specialized interconnects. Model artifacts are large and benefit from caching strategies. The CNCF Platform Engineering Maturity Model provides a useful framework for building self-service golden paths that include AI capabilities.
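Token-based autoscaling can be expressed with the standard HorizontalPodAutoscaler once the metric is exposed; this sketch assumes a custom Pods metric named `generated_tokens_per_second` served through an adapter such as prometheus-adapter (metric name and target value are illustrative):

```yaml
# Illustrative HPA: scales a model server Deployment on decode throughput
# instead of CPU. Requires a custom metrics adapter to expose the metric.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: vllm-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: vllm            # assumed model server Deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Pods
      pods:
        metric:
          name: generated_tokens_per_second
        target:
          type: AverageValue
          averageValue: "4000"   # scale out above ~4k tokens/s per replica
```

CPU is a poor proxy here because GPU-bound decode saturates token throughput long before CPU utilization moves, which is why the target is expressed in tokens per second per replica.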
I know, this is easier said than done, and I'm not claiming it's a simple shift from A to B. In some ways AI feels familiar, like the workloads we already know; in others it is genuinely its own thing, with a different complexity, scale, and behavior. As an engineer, I find this exciting: something new on top of the ever-same application demands.
Why open source matters here
AI systems are becoming critical infrastructure. The 2025 CNCF Annual Survey found that the top challenge organizations face in cloud native adoption is now cultural, not technical: team dynamics and leadership alignment outpace tool complexity and training as blockers. This signals maturity: the technology works, and the hard problems are organizational.
For AI infrastructure specifically, open source and vendor-neutral governance provide three things that proprietary stacks can't easily replicate:
Composability. No single project solves the full AI production stack. It requires a container runtime, a scheduler, a policy engine, an observability pipeline, a workflow orchestrator, an inference gateway, and model serving frameworks, all composed together. The CNCF landscape enables this composition through interoperability and shared standards.
Portability. Organizations are running AI workloads across hyperscalers, GPU-focused cloud providers, and on-premises infrastructure. Kubernetes and the cloud native ecosystem provide the abstraction layer that prevents lock-in to any single provider's AI stack.
Community-driven evolution. The speed at which the Kubernetes community has responded to AI workload requirements demonstrates how open governance enables rapid adaptation. DRA, several AI-focused working groups, the Inference Gateway, the AI Gateway, the AI Conformance Program… These are not top-down product decisions; they are community responses to real practitioner needs, shaped through KEPs, working groups, and public design discussions.
Getting started
If you want to explore the intersection of cloud native and AI Engineering, here are some entry points:
- Try the Inference Gateway: The getting started guide walks you through deploying an inference-aware load balancer in your cluster.
- Explore DRA for GPU scheduling: The Kubernetes documentation on DRA covers the concepts and API objects. Start by understanding ResourceClaims and DeviceClasses.
- Join the community: There are several active working groups developing proposals for new AI standards. The #wg-ai-gateway channel on Kubernetes Slack is a good starting point.
- Attend KubeCon + CloudNativeCon: The AI tracks and co-located events provide hands-on exposure to these patterns and a chance to connect with practitioners solving the same problems.
Looking ahead
The cloud native ecosystem didn't emerge specifically for AI, but it is increasingly well-suited to supporting AI systems in production. The model may drive the innovation, but the platform determines how reliably that innovation reaches users.
Much of the important work ahead sits at the intersection of these communities. Whether you're building models or platforms, there is an opportunity to shape how AI systems are operated in practice, through open collaboration, shared standards, and real-world experience.



