LiteLLM Agent Platform: Kubernetes-Powered Self-Hosted Infrastructure For Isolated Agent Sandboxes And Persistent Session Management In Production

Launching AI agents in a local script is simple. Running them reliably in production — across teams, across restarts, with isolated environments per context — is an entirely different challenge. BerriAI, the company behind the LiteLLM AI Gateway, is now open-sourcing a purpose-built solution to that challenge: the LiteLLM Agent Platform. The platform is described as a straightforward, self-hosted infrastructure layer for running multiple agents in production.

What Problem Does it Solve?

To understand the value, consider what happens when you try to scale agents beyond a single process. Agents are stateful: they carry session history, tool call results, and intermediate reasoning across turns. If the container running your agent crashes, restarts, or gets replaced during a deployment, that session state is lost unless something is explicitly managing it. At the same time, different teams often need different runtime environments, different tools, different secrets, and different access scopes — which means you can’t just run all agents in one shared container.

The platform handles two key responsibilities: per-team and per-context sandboxes, and session continuity across pod restarts and upgrades. These two capabilities form the core infrastructure primitives the platform provides.

Architecture and Technical Stack

The platform is a standalone Next.js dashboard for LiteLLM v2 managed agents, covering session chat, agent CRUD, and live status. The codebase is primarily TypeScript (92.8%), with Shell scripts for provisioning, a Dockerfile for containerization, and CSS for the dashboard UI.

The architecture separates concerns cleanly. A web process runs on port 3000 and serves the Next.js dashboard. A worker process handles async agent tasks. Postgres serves as the persistent backing store, and a schema migration runs as an init container on startup — ensuring the database is always in the correct state before the application boots.

For the sandbox layer — the isolated runtime environment where agents actually execute — sandboxes run on Kubernetes via the kubernetes-sigs/agent-sandbox CRD. Local development uses kind. If you’re not already familiar with it: kind (Kubernetes in Docker) lets you spin up a full Kubernetes cluster locally using Docker containers as nodes, without needing a cloud provider. The agent-sandbox CRD (Custom Resource Definition) is a Kubernetes extension from kubernetes-sigs that the platform installs to manage the lifecycle of individual sandbox environments.

The platform also includes a harness system under harnesses/opencode, which contains the configuration for running coding agents — such as Claude Code or OpenAI Codex — inside isolated sandboxes with a vault proxy for credential management. The BerriAI team also maintains a separate litellm-agent-runtime repository, described as a coding-agent runtime that runs inside per-session VMs provisioned by a LiteLLM proxy, designed to be generic, with customization handled via harness configuration or a hydrate payload.

One practical detail worth noting is how environment variables are handled across sandbox containers. Any variable in .env prefixed with CONTAINER_ENV_ is injected into every sandbox container with the prefix stripped — for example, CONTAINER_ENV_GITHUB_TOKEN=ghp_... means the container sees GITHUB_TOKEN=ghp_.... This gives teams a clean way to pass secrets into sandboxed agent sessions without modifying container images.

Getting Started

The prerequisites for local development are Docker Desktop, kind, kubectl, helm, and a LiteLLM gateway. No cloud credentials are required to get started locally. The quickstart takes just two commands:

bin/kind-up.sh
docker compose up

bin/kind-up.sh is idempotent — it provisions a kind cluster named agent-sbx, installs the agent-sandbox controller, and loads the harness image. docker compose up boots Postgres, runs the schema migration, and starts the web process on port 3000 along with the worker.

For production deployment, the recommended path is AWS EKS for the sandbox cluster and Render for the web and worker processes. bin/eks-up.sh provisions the EKS cluster, and a Render Blueprint provides a one-click deployment option.

How It Fits with the LiteLLM Gateway

The Agent Platform sits on top of the existing LiteLLM ecosystem — it doesn’t replace it. At its core, LiteLLM is a Python SDK and Proxy Server (an AI Gateway) that calls over 100 LLM APIs using the OpenAI format. It handles cost tracking, guardrails, load balancing, and logging, and supports providers like Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, SageMaker, HuggingFace, vLLM, and NVIDIA NIM. The Agent Platform relies on a running LiteLLM gateway and adds agent orchestration and session management on top. Model routing, cost tracking, and rate limiting stay within the gateway layer, while sandbox isolation, session continuity, and the management dashboard are managed by the Agent Platform.

Marktechpost’s Visual Walkthrough

Overview

Concepts

Architecture

Prerequisites

Quickstart

Production

01 / 06

What Is the LiteLLM Agent Platform?

BerriAI released this platform as open source on May 8, 2026. It’s a self-hosted infrastructure layer designed to run multiple AI agents in production, built on top of the LiteLLM AI Gateway.

🧱

Self-Hosted

Runs entirely on your own infrastructure — no data ever leaves your environment. A great fit for regulated industries and teams with strict data residency needs.

🤖

Multi-Agent

Built to run many agents simultaneously, with complete isolation between teams and contexts through per-session sandboxes.

🔁

Session Continuity

Agent sessions survive pod restarts and upgrades, so stateful work isn’t lost when containers get replaced.

⚡

Open Source (MIT)

Fully open source under the MIT license. Repo: github.com/BerriAI/litellm-agent-platform. Feel free to file issues and contribute.

What You Should Already Know

This guide assumes you’re comfortable with Docker, basic command-line usage, and have a general idea of what an AI agent is (a model that calls tools and carries out multi-step tasks). Kubernetes experience is helpful but not required to follow along.

02 / 06

Key Concepts to Understand First

Before launching the platform, get familiar with these four building blocks. They come up repeatedly throughout setup and configuration.

A
LiteLLM Gateway
The underlying AI Gateway that the Agent Platform depends on. It routes requests to over 100 LLM providers (OpenAI, Anthropic, Bedrock, VertexAI, and more) through a unified OpenAI-format API. The Agent Platform doesn’t include the gateway — you need to have one running separately and point the platform at it.

B
Sandbox
An isolated container environment where a single agent session runs. Each sandbox is fully independent — one agent can’t access another agent’s filesystem, secrets, or state. Sandboxes are created and destroyed per session using the kubernetes-sigs/agent-sandbox CRD (Custom Resource Definition).

C
Harness
A configuration layer that defines how a specific type of coding agent (like Claude Code or OpenAI Codex) runs inside a sandbox. The platform ships with an opencode harness under harnesses/opencode/. The harness image gets loaded into the kind cluster during setup.

D
CRD (Custom Resource Definition)
A Kubernetes extension that lets you define new resource types. The platform uses the kubernetes-sigs/agent-sandbox CRD to teach your Kubernetes cluster how to manage agent sandboxes as first-class resources — just like it manages pods or deployments.

03 / 06

How the Platform Is Organized

The platform consists of four main components. Understanding how they connect makes debugging and production deployments much easier.

Component	What It Does	Tech
web (:3000)	Next.js dashboard. Provides the UI for session chat, agent CRUD operations, and real-time status monitoring.	Next.js, TypeScript
worker	Background process that handles asynchronous agent tasks, decoupled from the web server.	TypeScript
postgres	Persistent backing store for session state, agent configurations, and metadata. Schema migrations run automatically as an init container on startup.	PostgreSQL
sandbox cluster	Kubernetes cluster where individual agent sandboxes run, managed via the agent-sandbox CRD controller. Locally: kind. In production: AWS EKS.	Kubernetes (kind / EKS)

Separation of Concerns

The LiteLLM gateway takes care of model routing, cost tracking, rate limiting, and guardrails. The Agent Platform handles sandbox lifecycle, session management, and the management dashboard. They run as separate services, with the Agent Platform consuming the gateway as a dependency.

04 / 06

What You Need Before Getting Started

Install and verify these tools before running any setup commands. The quickstart won’t work without all five.

1
Docker Desktop
Required to build and run containers, and to power kind (which
I’ve carefully reviewed this article, and I notice that the text has already been processed with HTML formatting and the language is already in English. Let me provide the paraphrased version while keeping the HTML structure intact:
You are a paraphrasing software that takes an article in HTML format and rewrite it in a way that is easy to read and understand, Keep HTML as-is, change the text as far as you can. Do not change the content language: runs Kubernetes nodes as Docker containers). Get it from docker.com/products/docker-desktop. Confirm it works with:
```
docker --version
```

2
kind (Kubernetes in Docker)
This tool sets up a local Kubernetes cluster used to run sandboxes. On macOS, install it using Homebrew (brew install kind) or grab it from kind.sigs.k8s.io. Double-check your install with:
```
kind --version
```

3
kubectl
The primary CLI for interacting with Kubernetes clusters. The setup scripts rely on this to communicate with the kind cluster. Grab it from kubernetes.io/docs/tasks/tools. Confirm it’s working:
```
kubectl version --client
```

4
helm
The go-to package manager for Kubernetes. You’ll use this to install the agent-sandbox controller into your kind cluster. Pick it up from helm.sh/docs/intro/install. Check your install:
```
helm version
```

5
A Running LiteLLM Gateway
The Agent Platform connects to a LiteLLM gateway endpoint for routing model requests. If you don’t already have one up and running, head to the official LiteLLM quickstart guide at docs.litellm.ai to spin one up. During configuration, you’ll point the Agent Platform at this endpoint.

05 / 06

Local Quickstart

Grab the repository and execute two simple commands to launch the entire platform on your machine. No cloud setup is necessary for local development.

1
Clone the repository
Fetch the repo from GitHub:
```
git clone 
cd litellm-agent-platform
```

2
Set up your .env file
Duplicate the sample env file and populate it with your LiteLLM gateway URL and any required secrets:
```
cp .env.example .env
# Open .env and configure your LITELLM_GATEWAY_URL along with other needed values
```

3
Spin up the local kind cluster
This script is safe to rerun multiple times. It creates a kind cluster called agent-sbx, installs the agent-sandbox controller using helm, and loads the harness image:
```
bin/kind-up.sh
```

4
Launch all services
This starts Postgres, runs the schema migrations as an init container, then brings up the web server on port 3000 alongside the worker process:
```
docker compose up
```

5
Access the dashboard
Head to in your web browser. You should land on the LiteLLM Agent Platform dashboard, where you can spin up agents, initiate sessions, and keep tabs on real-time activity.

Passing Secrets into Sandboxes

Any environment variable in .env that starts with CONTAINER_ENV_ is automatically passed into every sandbox container with the prefix removed. For instance: CONTAINER_ENV_GITHUB_TOKEN=ghp_… becomes GITHUB_TOKEN=ghp_… inside the sandbox. This is the recommended approach for injecting credentials into agent sessions.

06 / 06

Production Deployment

For production, it’s recommended to run the sandbox cluster (AWS EKS) separately from the web and worker services (Render). The repo includes scripts and a Blueprint for setting up both environments.

1
Set up the EKS sandbox cluster
The bin/eks-up.sh script spins up an AWS EKS cluster preconfigured for running agent sandboxes. This takes over from kind as your sandbox backend. Make sure your AWS credentials are available in your environment:
```
bin/eks-up.sh
```

2
Deploy the web and worker services to Render
A Render Blueprint is included in deploy/render/ for a one-click deployment of the web and worker services. Check deploy/render/README.md for the Blueprint link and required environment variables.

3
Interact with the Developer API directly (optional)
You can also control the platform through its REST API using curl or any preferred HTTP client. The complete API documentation — covering how to create an agent, open a session, send a message, and read responses — is located at src/server/DEVELOPER.md within the repository.
```
# Example: create an agent session via curl
curl -X POST /api/sessions 
  -H "Content-Type: application/json" 
  -d '{"agent_id": "your-agent-id"}'
```

Production Architecture Overview

AWS EKS hosts the sandbox cluster where agent sessions operate in isolation. Render runs the Next.js web dashboard and the background worker. Postgres (either managed or self-hosted) stores session data. The LiteLLM gateway operates independently and routes all model API calls. These four pieces communicate across the network and can each be scaled on their own.

The platform is currently in alpha public preview. Report any issues at github.com/BerriAI/litellm-agent-platform. For deeper architecture details, see docs/k8s-backend.md in the repository.

1 / 6

Published by Marktechpost | AI/ML News and Research for Developers and Engineers

Key Takeaways

BerriAI has released the LiteLLM Agent Platform as open source — a self-hosted infrastructure layer designed to run multiple AI agents in production, with per-team sandbox isolation and session persistence surviving pod restarts.
Sandboxes are powered by Kubernetes using the kubernetes-sigs/agent-sandbox CRD — locally via kind, or in production with AWS EKS — with no cloud credentials required to begin experimenting.
The platform builds on top of the established LiteLLM Gateway, which takes care of model routing, usage cost tracking, and rate limiting across more than 100 LLM providers using the OpenAI-compatible format.
Getting started takes just two commands: bin/kind-up.sh provisions the kind cluster and installs the sandbox controller; docker compose up brings up Postgres, the web interface (:3000), and the worker.
Available under the MIT license and currently in alpha public preview

Explore the GitHub Repo. Also, feel free to follow us on Twitter and don’t miss out on joining our 150k+ ML SubReddit and subscribing to our Newsletter. Hold on — are you on Telegram? You can now join us there too!

Interested in partnering with us to promote your GitHub Repo, Hugging Face Page, product launch, webinar, or similar? Get in touch with us

Top Posts

Charting the Vessel Storm: A Proteomic Blueprint for Vasculitis Remission

Migrate Your On-Prem ERP to Dynamics 365: A Cloud Transformation Journey

Supercharging Smart Homes: The Fibre Internet Revolution Behind IoT Awakening

LiteLLM Agent Platform: Kubernetes-Powered Self-Hosted Infrastructure for Isolated Agent Sandboxes and Persistent Session Management in Production

Speed, VRAM, Multi-GPU Smackdown: Unsloth, Axolotl, TRL, or LLaMA-Factory?

5 No-Cost Courses to Transform from AI Newbie to Pro

The System76 Thelio Mira: My Dream Linux Desktop Come True

Google’s Gemini 3.6 Flash: Slashing Enterprise Agent Token Costs

Stop ML Chaos: Your Blueprint for Experiment Order

NVIDIA Cosmos 3 Edge: 4B-Power Robot Brains Thinking and Acting on Your Device

Charting the Vessel Storm: A Proteomic Blueprint for Vasculitis Remission

Migrate Your On-Prem ERP to Dynamics 365: A Cloud Transformation Journey

Supercharging Smart Homes: The Fibre Internet Revolution Behind IoT Awakening

Speed, VRAM, Multi-GPU Smackdown: Unsloth, Axolotl, TRL, or LLaMA-Factory?

Secret Sabotage: How Hidden Azure DevOps PR Comments Can Hijack AI Agents

AI Jailbreak: OpenAI Models Breach Test Prison, Rig Hugging Face Leaderboard with Cheat Code

Precision Medicine Deposited: The Art of Microdispensing for Next-Gen Medical Devices

When the World Cup Collided with the Cloud: 2026’s Digital Traffic Surge

Trending

Charting the Vessel Storm: A Proteomic Blueprint for Vasculitis Remission

Migrate Your On-Prem ERP to Dynamics 365: A Cloud Transformation Journey

Latest Posts

Not More Data, but Better World Models – Unite.AI

OpenAI Is Hiring Head of Preparedness, Amid AI Cyberattack Fears

Subscribe to Updates

Top Posts

LiteLLM Agent Platform: Kubernetes-Powered Self-Hosted Infrastructure for Isolated Agent Sandboxes and Persistent Session Management in Production

What Problem Does it Solve?

Architecture and Technical Stack

Getting Started

How It Fits with the LiteLLM Gateway

Marktechpost’s Visual Walkthrough

Key Takeaways

Related Posts