Rewriting Jaeger’s ClickHouse Backend: Achieving 8.6× Compression On 10 Million Spans

Published on June 23, 2026
by Mahad Zaryab, a CNCF Jaeger Project Maintainer and Software Engineer at Meta

As a long-time maintainer of Jaeger, I’ve seen users ask for ClickHouse support repeatedly over the years. With Jaeger v2.18.0, we’ve finally made it happen. What excites me most isn’t simply that ClickHouse is now an option; it’s that its architecture is well suited to large-scale telemetry. It absorbs heavy, append-only write streams and runs analytical aggregations quickly, giving teams an efficient storage backend built on a battle-tested database.

For those unfamiliar with the project, Jaeger is a graduated Cloud Native Computing Foundation (CNCF) distributed tracing platform designed to monitor and troubleshoot intricate microservices environments. It follows requests across service boundaries to reveal latency bottlenecks and root causes, ultimately helping teams shorten their mean time to repair (MTTR). By integrating ClickHouse natively, Jaeger can now take advantage of columnar storage to deliver fast queries and high compression ratios on large volumes of spans.

In this article, I’ll cover why ClickHouse is an excellent fit for trace storage, how the underlying schema is structured, and how you can start using it with Jaeger right away.

Why columnar storage comes out ahead

At its heart, the tracing challenge has two sides: storing enormous volumes of semi-structured event data and then searching that data rapidly across many dimensions—service, operation, tags, duration, time range, and trace ID. Cassandra and Elasticsearch have both served the Jaeger community reliably, but they carry operational costs. Indexing overhead introduces latency and added expense. Scaling grows complicated. Retention decisions force difficult compromises.

High-throughput ingestion and low-latency queries

ClickHouse is a column-oriented OLAP database engineered for precisely these demands: high-throughput ingestion, aggressive compression, and fast analytical queries. For tracing, this is nearly a perfect match. Trace data is inherently repetitive—the same service names, operation names, status codes, and tags recur constantly. A columnar layout excels with that kind of repetition.

Compression that truly makes a difference

We observed substantial compression gains on trace data. Service names like “auth-service” or “payment-gateway” show up hundreds of thousands of times. The same goes for operation names, tag keys, and status codes. A row-oriented database interleaves those values with every other field of the row, so a general-purpose compressor has less repetition to exploit. A column-oriented layout stores each column’s values together, and once rows are sorted by the primary key, identical values sit next to each other and compress extremely well. The outcome? An 8.6× compression ratio on the spans table in our benchmarks.
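To make the effect concrete, here is a minimal Python sketch that compresses two synthetic columns separately, the way a columnar store handles each column on its own. It uses zlib rather than any ClickHouse codec, and the data, the `compression_ratio` helper, and the random span IDs are all assumptions made for illustration, so the numbers say nothing about the benchmark above.

```python
import random
import zlib

random.seed(7)  # fixed seed so the synthetic data is reproducible

SERVICES = ["auth-service", "payment-gateway", "cart-service"]

# Synthetic spans: (service name, random 64-bit span ID as hex).
rows = [
    (random.choice(SERVICES), f"{random.getrandbits(64):016x}")
    for _ in range(20_000)
]
# Sort by service name only, mimicking a sort key that groups equal values.
rows.sort(key=lambda row: row[0])


def compression_ratio(values):
    """Raw size divided by zlib-compressed size for one column of strings."""
    raw = "\n".join(values).encode()
    return len(raw) / len(zlib.compress(raw))


service_ratio = compression_ratio([service for service, _ in rows])
span_id_ratio = compression_ratio([span_id for _, span_id in rows])

print(f"service_name column: {service_ratio:.0f}x")
print(f"span_id column:      {span_id_ratio:.1f}x")
```

The repetitive column shrinks to a small fraction of its raw size, while the random IDs barely compress, which suggests that the overall ratio of a spans table depends heavily on how much of each row is repetitive.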

Real-time analytics

ClickHouse also enables more sophisticated analytical queries on trace data. Because aggregations run efficiently on columnar storage, Jaeger v2.18 introduces native ClickHouse methods for Service Performance Monitoring (SPM) that compute service-level latency, call rates, and error rates directly from your stored spans. This means teams can produce core health and performance metrics for their microservices straight from trace data, without relying on an external metrics pipeline.
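The kind of aggregation involved can be sketched in a few lines of Python. This is only an illustration of the three metrics named above, not Jaeger's implementation: the `Span` fields, the `service_metrics` helper, and the nearest-rank p95 are assumptions chosen for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Span:
    service_name: str
    duration_ms: float
    is_error: bool


def service_metrics(spans, window_seconds):
    """Aggregate call rate, error rate, and p95 latency per service."""
    by_service = defaultdict(list)
    for span in spans:
        by_service[span.service_name].append(span)

    metrics = {}
    for service, group in by_service.items():
        durations = sorted(s.duration_ms for s in group)
        # Nearest-rank p95, computed with integer arithmetic:
        # index of the smallest value with at least 95% of samples at or below it.
        rank = (95 * len(durations) + 99) // 100 - 1
        metrics[service] = {
            "calls_per_second": len(group) / window_seconds,
            "error_rate": sum(s.is_error for s in group) / len(group),
            "p95_ms": durations[rank],
        }
    return metrics


# Twenty spans over a 10-second window, one of which failed.
sample = [Span("auth-service", float(i), i == 20) for i in range(1, 21)]
result = service_metrics(sample, window_seconds=10)
print(result["auth-service"])
```

In a columnar store, each of these aggregates only needs to read the few columns it touches, which is why running them directly on stored spans is practical.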

Designing the schema

Schema design was where things became challenging. We needed to optimize for Jaeger’s core query patterns: trace lookup by trace ID, service, and operation; attribute filtering; time-range queries; and the aggregation that powers the Service Performance Monitoring (SPM) feature. These requirements don’t all point in the same direction.

There’s an outstanding earlier post by Ha Anh Vu that benchmarked ClickHouse schemas for Jaeger v1, and that research laid important groundwork. However, Jaeger v2 adopts the OpenTelemetry data model, which required us to revisit several earlier decisions.

The design space is thoroughly documented in an Architectural Decision Record (ADR). The sections below explore some of the most important decisions worth understanding.

Trade-offs in primary key selection

In ClickHouse, the primary key doesn’t enforce uniqueness. Instead, it determines the on-disk sort order and drives a sparse index (one index entry per granule, which is 8,192 rows by default). Choosing it is the single most consequential decision in the schema.
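A sparse index can be pictured with a small Python sketch: keep the first key of every granule, then binary-search those entries to find which granules could hold a key range. The helper names `build_sparse_index` and `candidate_granules` are hypothetical, the 64-row granule is shrunk for readability, and the real engine is considerably more involved.

```python
import bisect
import math

GRANULE_ROWS = 8192  # granule size mentioned in the text

# With one index entry per granule, 10 million rows need only 1221 entries.
print(math.ceil(10_000_000 / GRANULE_ROWS))


def build_sparse_index(sorted_rows, granule_rows):
    """Keep the first key of every granule; rows must already be sorted."""
    return [sorted_rows[i] for i in range(0, len(sorted_rows), granule_rows)]


def candidate_granules(index, key_lo, key_hi):
    """Granule numbers that may contain keys in the range [key_lo, key_hi]."""
    # Start at the last granule whose first key is strictly below key_lo,
    # because equal keys can spill over from the end of the previous granule.
    first = max(bisect.bisect_left(index, key_lo) - 1, 0)
    # Stop before the first granule whose first key is above key_hi.
    last = bisect.bisect_right(index, key_hi)
    return list(range(first, last))


# 600 synthetic rows keyed by (service_name, name, start_time), already sorted.
rows = sorted(
    (service, op, ts)
    for service in ("auth", "cart", "pay")
    for op in ("GET", "POST")
    for ts in range(100)
)
index = build_sparse_index(rows, granule_rows=64)
hits = candidate_granules(index, ("cart", "GET", 10), ("cart", "GET", 20))
print(f"{len(index)} granules in total, read only {hits}")
```

Because the rows are sorted by the same key the query filters on, a single granule out of ten is enough to answer this lookup.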

We had two candidates for the primary key:

Optimize for trace retrieval: sort by trace_id. Every span belonging to a trace lands in one contiguous block, so GetTrace becomes a single seek plus a sequential read. However, search queries pay the price for this optimization, since the service_name and operation_name filters cannot leverage the primary key index at all.
Optimize for search (our choice): sort by (service_name, name, start_time). Search queries that filter by service, operation, and a time window become direct primary-key lookups.

The decision boiled down to an asymmetric trade-off. Sorting by trace_id makes search performance poor, but sorting by (service_name, name, start_time) hurts trace retrieval far less, because we can recover most of the lost performance with two inexpensive mechanisms:

A bloom_filter skip index on trace_id, which lets the engine prove a granule cannot contain a given ID without actually reading it.
A trace_id_timestamps materialized view that tells the search path the time bounds for each matching trace, so the subsequent GetTraces call can prune partitions and granules.
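The first mechanism can be illustrated with a toy Bloom filter per granule. This is a sketch under stated assumptions, not ClickHouse's skip-index implementation: the `BloomFilter` class, its sizes, the SHA-256 based hashing, and `granules_to_read` are all invented for the example. The property that matters is that a negative answer is always correct, so a granule can be skipped safely, while a positive answer may occasionally be a false positive.

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item):
        # Derive several bit positions by hashing the item with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def granules_to_read(filters, trace_id):
    """Indexes of granules whose filter cannot rule out the trace ID."""
    return [i for i, f in enumerate(filters) if f.might_contain(trace_id)]


# Three granules of trace IDs; trace "t1" has spans in granules 0 and 2.
granules = [["t1", "t2"], ["t3", "t4"], ["t5", "t1"]]
filters = []
for trace_ids in granules:
    bloom = BloomFilter()
    for trace_id in trace_ids:
        bloom.add(trace_id)
    filters.append(bloom)

to_read = granules_to_read(filters, "t1")
print(to_read)
```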

An earlier benchmark using the schema sorted by trace_id clearly demonstrated the asymmetry. Trace retrieval ran at about 27 ms, but a search query took roughly 880 ms. Re-sorting by (service_name, name, start_time) pushed trace retrieval to around 100 ms (slower, but still well within interactive thresholds) while bringing multi-filter search down to about 140 ms.

Storing typed attributes

In Jaeger v1, tags were always strings. The v2 reader API accepts a typed map, where attributes can be Bool, Int64, Float64, String, or one of the complex types (Bytes, Slice, Map). We need to query across these types, so the storage layer can’t collapse everything to strings.

The schema uses one ClickHouse Nested column per primitive type, repeated for each level at which attributes can appear.

Think of a Nested column as a mini-table embedded in every row, with each row carrying its own collection of attribute names and values. This design lets attribute filters use the same query semantics as standard table queries.

That said, it’s important to recognize that searches relying solely on attributes are naturally more resource-intensive, since they can’t fully utilize ClickHouse’s primary index. The table’s index is tuned for top-level structural fields—namely service, operation, and time. For the best query performance and to avoid heavy column scans, users should always pair attribute filters with these fields to narrow down the data ClickHouse needs to examine.
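As a rough mental model, a Nested column behaves like parallel arrays of equal length stored inside each row. The Python sketch below mimics that shape and shows a typed filter paired with a service filter. `str_attributes` and `int_attributes` are names used by the schema; `bool_attributes`, the `NestedColumn` and `SpanRow` classes, and the `search` helper are assumptions for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class NestedColumn:
    """Parallel arrays of equal length: one (key, value) pair per attribute."""
    keys: list = field(default_factory=list)
    values: list = field(default_factory=list)

    def add(self, key, value):
        self.keys.append(key)
        self.values.append(value)

    def matches(self, key, value):
        return any(k == key and v == value for k, v in zip(self.keys, self.values))


@dataclass
class SpanRow:
    service_name: str
    name: str
    str_attributes: NestedColumn = field(default_factory=NestedColumn)
    int_attributes: NestedColumn = field(default_factory=NestedColumn)
    bool_attributes: NestedColumn = field(default_factory=NestedColumn)  # assumed name


COLUMN_FOR_TYPE = {str: "str_attributes", int: "int_attributes", bool: "bool_attributes"}


def search(rows, service_name, key, value):
    """Filter by the indexed field first, then by the typed attribute column."""
    column = COLUMN_FOR_TYPE[type(value)]
    return [
        row for row in rows
        if row.service_name == service_name and getattr(row, column).matches(key, value)
    ]


row = SpanRow("auth-service", "GET /login")
row.int_attributes.add("http.status_code", 200)
row.str_attributes.add("http.method", "GET")

print(len(search([row], "auth-service", "http.status_code", 200)))    # integer filter
print(len(search([row], "auth-service", "http.status_code", "200")))  # string filter
```

Keeping one column per type means the integer 200 and the string "200" never collide, at the cost of the reader having to know which column to look in.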

Materialized views

Certain Jaeger queries don’t align well with the spans table’s sort order. For instance, the Jaeger UI must swiftly load the complete list of known service names and operations, while trace lookups often require efficient access to trace time windows.

Instead of relying on costly full-table scans to serve these queries, we employ materialized views that precompute the results. In ClickHouse, materialized views automatically process incoming inserts from a source table and direct the transformed output into purpose-built target tables.

This technique is applied to accelerate queries for service names, operations, and trace ID timestamp ranges.
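The insert-time behavior can be approximated in Python as a store that updates small summary structures whenever a span is written. This is a simplified sketch: the `SpanStore` class and its field names are hypothetical, and tracking only `start_time` as the time bound is an assumption rather than the actual view definition.

```python
class SpanStore:
    """Toy store whose summaries are maintained on insert, like materialized views."""

    def __init__(self):
        self.spans = []
        self.services = set()          # known service names
        self.operations = set()        # known (service, operation) pairs
        self.trace_id_timestamps = {}  # trace_id -> (earliest, latest) start time

    def insert(self, span):
        self.spans.append(span)
        self._update_views(span)

    def _update_views(self, span):
        self.services.add(span["service_name"])
        self.operations.add((span["service_name"], span["name"]))
        start = span["start_time"]
        lo, hi = self.trace_id_timestamps.get(span["trace_id"], (start, start))
        self.trace_id_timestamps[span["trace_id"]] = (min(lo, start), max(hi, start))


store = SpanStore()
store.insert({"trace_id": "t1", "service_name": "auth", "name": "GET /login", "start_time": 100})
store.insert({"trace_id": "t1", "service_name": "cart", "name": "GET /items", "start_time": 250})
store.insert({"trace_id": "t2", "service_name": "auth", "name": "GET /login", "start_time": 180})

print(sorted(store.services))
print(store.trace_id_timestamps["t1"])
```

Reads for the service list or a trace's time window then touch only the small summary, never the full spans table.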

Five levels of attributes

There’s a subtle technical hurdle in the spans schema that may not be immediately apparent: how the storage layer resolves attribute lookups. For example, when filtering for http.status_code=200, the system can’t inherently determine whether “200” is a string or an integer, or whether the key is a span-level or a resource-level attribute. Depending on the originating service, the same logical key could be stored under str_attributes or int_attributes, and it might reside at any of the five levels: resource, scope, span, event, or link.

To address this, we maintain a dedicated attribute_metadata table, fed by materialized views built on top of the spans table. This lets the reader resolve the filter key at query time and target only the columns corresponding to the types and levels that were actually recorded.
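One way to picture the lookup is a small registry that remembers which (type, level) combinations were seen for each key and expands a filter into only those columns. The sketch below is an assumption-heavy illustration: the `AttributeMetadata` class, the `CASTS` table, and the `<level>_<type>_attributes` column naming are invented here and may not match the real schema.

```python
from collections import defaultdict

LEVELS = ("resource", "scope", "span", "event", "link")
CASTS = {"str": str, "int": int, "float": float}


class AttributeMetadata:
    def __init__(self):
        self._seen = defaultdict(set)  # key -> {(value type, level)}

    def record(self, key, value_type, level):
        """Remember that `key` was written with this type at this level."""
        if value_type not in CASTS or level not in LEVELS:
            raise ValueError(f"unsupported type or level: {value_type}, {level}")
        self._seen[key].add((value_type, level))

    def resolve(self, key, raw_value):
        """Return (column, typed value) pairs worth checking for key=raw_value."""
        targets = []
        for value_type, level in sorted(self._seen.get(key, ())):
            try:
                typed = CASTS[value_type](raw_value)
            except ValueError:
                continue  # e.g. "ok" cannot match an integer column
            # Hypothetical column naming, for illustration only.
            targets.append((f"{level}_{value_type}_attributes", typed))
        return targets


metadata = AttributeMetadata()
metadata.record("http.status_code", "int", "span")
metadata.record("http.status_code", "str", "resource")

print(metadata.resolve("http.status_code", "200"))
print(metadata.resolve("http.status_code", "ok"))
print(metadata.resolve("never.seen", "x"))
```

A key that was never recorded resolves to no columns at all, so the reader can answer such a filter without scanning any attribute data.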

Span throughput at scale

We evaluated the ClickHouse backend using 10 million spans spread across 1 million traces on a single-node setup. The benchmark assessed ingestion throughput, compression efficiency, trace retrieval speed, and filtered search latency.

The backend maintained over 50k spans/sec during ingestion, delivered an 8.6× compression ratio on the spans table, and shrank span data from nearly 6 GiB down to approximately 722 MiB on disk. Trace retrieval averaged around 100 ms, while most search queries completed in under 50 ms. More involved filtered queries finished in roughly 140 ms.
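For a sense of scale, the reported figures can be turned into per-span numbers with some back-of-the-envelope arithmetic. The values below are derived only from the rounded figures quoted above, so treat them as approximations rather than measured results.

```python
MIB = 1024 ** 2

spans = 10_000_000
compressed_bytes = 722 * MIB   # approximate on-disk size of the spans table
compression_ratio = 8.6        # reported ratio on the spans table
ingest_rate = 50_000           # spans per second, a reported lower bound

bytes_per_span_on_disk = compressed_bytes / spans
bytes_per_span_uncompressed = bytes_per_span_on_disk * compression_ratio
# At exactly the lower-bound rate, loading the full dataset would take this long.
seconds_at_lower_bound = spans / ingest_rate

print(f"on disk:      {bytes_per_span_on_disk:.1f} bytes per span")
print(f"uncompressed: {bytes_per_span_uncompressed:.0f} bytes per span")
print(f"ingest time:  at most {seconds_at_lower_bound:.0f} s at 50k spans/sec")
```

That works out to roughly 76 bytes per span on disk, down from about 650 bytes uncompressed.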

These results are promising, but they should be interpreted within the context of the benchmark environment and dataset used. The complete methodology, configuration specifics, and query details are documented in the benchmarking report.

Getting started

ClickHouse support is available as an alpha storage backend beginning with Jaeger v2.18.0. You’ll need a running ClickHouse instance and the Jaeger v2 configuration for the ClickHouse backend. The full instructions are outlined in the setup guide.

Serving as a Jaeger maintainer has been one of the most fulfilling aspects of my career thus far. If you’d like to discuss this work, contribute, or report a problem, please open an issue on GitHub or join us in the CNCF #jaeger Slack channel.
