TurboQuant [3], an online vector compression technique, garnered significant attention at ICLR 2026. To me, it appeared strikingly familiar: it shares substantial common ground with EDEN—a compression approach initially proposed as the 1-bit technique DRIVE at NeurIPS 2021 [1] and later extended to support any bit-width at ICML 2022 [2]. I co-authored EDEN alongside Ran Ben-Basat, Yaniv Ben-Itzhak, Gal Mendelson, Michael Mitzenmacher, and Shay Vargaftik.
The TurboQuant publication introduces two versions: TurboQuant-mse and TurboQuant-prod. In our recent thorough evaluation [5], we demonstrate that TurboQuant-mse is actually a simplified variant of EDEN, and that EDEN’s approaches consistently deliver superior performance.
How EDEN compresses a vector
Imagine you want to shrink a $d$-dimensional vector $x$ (such as a gradient update, an embedding, or a KV-cache entry) to just a few bits per dimension. EDEN carries out this compression in four stages:
- Random rotation — Apply a random orthogonal transformation $R$. Once rotated, the coordinates of $Rx$ become statistically identical and, for high-dimensional vectors, closely resemble a Gaussian distribution.
- Scalar quantization — Map each rotated coordinate to one of $2^b$ discrete levels from a Lloyd–Max codebook that was trained on the known distribution of the rotated coordinates ($b$ denotes the desired number of bits per dimension).
- Scale — Multiply the result by a scaling factor $S$.
- Inverse rotation — Undo the rotation by applying $R^{-1}$, producing an approximation $\hat{x}$ of the initial vector.
While prior research (for instance, Suresh et al., 2017 [6]) employed rotation primarily to narrow the dynamic range of coordinates (the spread between the largest and smallest values), EDEN [1] was—as far as we are aware—the first compression method to leverage a more powerful property of random rotation: the resulting coordinates follow a predictable distribution. This enables a fixed quantizer together with an analytically derived scale that, depending on the task, either minimizes MSE or yields an unbiased estimate, and it gives EDEN an asymptotic improvement in MSE over earlier techniques.
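This property is easy to verify empirically. The toy snippet below (my own illustration, not from the papers; production EDEN code uses a fast randomized Hadamard transform rather than an explicit rotation matrix) rotates a deliberately non-Gaussian vector and checks that the rotated coordinates behave like $N(0, \|x\|^2/d)$:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 1024
x = rng.exponential(size=d)  # deliberately skewed, non-Gaussian input

# Sample a uniformly random rotation via QR decomposition.
Q, r = np.linalg.qr(rng.standard_normal((d, d)))
R = Q * np.sign(np.diag(r))  # sign fix so the rotation is uniformly random

z = R @ x
# Regardless of how x was distributed, the rotated coordinates are
# approximately N(0, ||x||^2 / d): mean near zero, predictable std.
print(f"mean: {z.mean():+.4f}")
print(f"std:  {z.std():.4f}  vs  ||x||/sqrt(d): {np.linalg.norm(x) / np.sqrt(d):.4f}")
```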
Specifically, EDEN’s two versions differ solely in how they determine $S$:
- EDEN-biased — Selects $S$ using the closed-form expression that yields the lowest reconstruction MSE.
- EDEN-unbiased — Picks $S$ so the reconstruction is correct on average ($\mathbb{E}[\hat{x}] = x$), which is especially important when aggregating many compressed vectors (e.g., in distributed training or attention mechanisms).
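As a concrete instance, in the 1-bit case (where the codebook collapses to the coordinate-wise sign), the two scale choices from DRIVE take a particularly simple form; this is a simplified restatement for intuition, so see [1] for the precise statements and [2] for the general-$b$ derivations:

$$
\hat{x} = S \cdot R^{-1}\,\mathrm{sign}(Rx), \qquad
S_{\text{biased}} = \frac{\lVert Rx \rVert_1}{d}, \qquad
S_{\text{unbiased}} = \frac{\lVert x \rVert_2^2}{\lVert Rx \rVert_1}.
$$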
When compared directly with EDEN, TurboQuant-mse aligns at every stage except one: where EDEN computes the scale mathematically, TurboQuant-mse—despite aiming to minimize MSE—omits this optimized scaling step.
The pseudocode below places all three methods side by side.
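Here is a minimal NumPy rendering of that comparison, specialized to one bit so the Lloyd–Max codebook is just $\pm\sqrt{2/\pi}$. It is a sketch for intuition rather than the papers’ reference code: real implementations use a fast randomized Hadamard transform and general-$b$ codebooks, and the function names are mine.

```python
import numpy as np

# 1-bit Lloyd-Max codebook for N(0,1): two centroids at +-sqrt(2/pi).
CODEBOOK = np.array([-np.sqrt(2 / np.pi), np.sqrt(2 / np.pi)])

def random_rotation(d, rng):
    # Uniformly random orthogonal matrix via QR (explicit, for clarity only).
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    return q * np.sign(np.diag(r))

def quantize(z):
    # Nearest-centroid scalar quantization, coordinate by coordinate.
    return CODEBOOK[np.abs(z[:, None] - CODEBOOK[None, :]).argmin(axis=1)]

def compress_decompress(x, R, mode):
    d = x.shape[0]
    norm = np.linalg.norm(x)
    z = (R @ x) * np.sqrt(d) / norm  # rotated coordinates, now ~N(0,1)
    q = quantize(z)
    if mode == "turboquant-mse":     # rotation + codebook, but S fixed to 1
        S = 1.0
    elif mode == "eden-biased":      # least-squares-optimal per-vector scale
        S = (z @ q) / (q @ q)
    elif mode == "eden-unbiased":    # scale chosen so E[x_hat] = x
        S = d / (z @ q)
    else:
        raise ValueError(mode)
    return S * (norm / np.sqrt(d)) * (R.T @ q)  # un-normalize, rotate back
```

With this 1-bit quantizer, the two EDEN scale choices reduce exactly to the DRIVE expressions above; for general $b$, EDEN uses precomputed Lloyd–Max centroids and analogous analytically derived scales [2].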
Why the optimal scale matters
The benefit of using the correct scale increases as the bit-width grows. At $b = 1$ bit, the difference is negligible. At practical dimensions and higher bit-widths, however, EDEN-biased achieves a 2.25% lower MSE than TurboQuant-mse—and those are exactly the bit-widths commonly used in practice for embeddings and KV caches.
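In the toy sketch above, this effect can be reproduced by swapping in a higher-rate codebook. An empirical Lloyd fit on Gaussian samples (a stand-in for the analytic Lloyd–Max codebook EDEN uses; the helper name is mine) suffices:

```python
def lloyd_max_codebook(b, n=200_000, iters=50, seed=0):
    # Lloyd's algorithm on N(0,1) samples: an empirical approximation of
    # the optimal b-bit Lloyd-Max codebook for the standard normal.
    samples = np.random.default_rng(seed).standard_normal(n)
    c = np.quantile(samples, (np.arange(2**b) + 0.5) / 2**b)  # quantile init
    for _ in range(iters):
        assign = np.abs(samples[:, None] - c[None, :]).argmin(axis=1)
        for k in range(2**b):
            pts = samples[assign == k]
            if pts.size:
                c[k] = pts.mean()  # move each centroid to its cell's mean
    return c
```

Rebinding `CODEBOOK = lloyd_max_codebook(4)` before rerunning the comparison below reproduces the widening gap between `eden-biased` and `turboquant-mse`.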
Across dimensions ranging from 16 to 4096 and all tested bit-widths $b$, EDEN-biased’s vNMSE (vector-normalized MSE, $\mathbb{E}\|\hat{x} - x\|_2^2 / \|x\|_2^2$) remains below TurboQuant-mse’s in every scenario (Figure 2). As the dimensionality becomes very large, the optimal $S$ approaches 1 and the two algorithms converge; at practical dimensions (128 to 1024), however, the performance gap remains significant.
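Reusing `random_rotation` and `compress_decompress` from the sketch above (with its default 1-bit codebook, for speed), a toy version of this measurement looks like this; the actual numbers behind Figure 2 come from [5]:

```python
rng = np.random.default_rng(1)
d = 512
x = rng.standard_normal(d)

for mode in ("turboquant-mse", "eden-biased", "eden-unbiased"):
    # vNMSE = ||x_hat - x||^2 / ||x||^2, averaged over fresh rotations.
    errs = [
        np.sum((compress_decompress(x, random_rotation(d, rng), mode) - x) ** 2)
        / np.sum(x**2)
        for _ in range(100)
    ]
    # At 1 bit the two biased variants nearly coincide, consistent with the
    # text above; eden-unbiased trades some MSE for unbiasedness, so it is
    # highest here but wins when estimates are averaged (next section).
    print(f"{mode:>15}: vNMSE = {np.mean(errs):.4f}")
```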

Unbiased compression: saving more than a full bit
The findings above focus on the biased (MSE-minimizing) variants. Now let’s examine the unbiased scenario, where tasks like distributed training, approximate attention, or inner-product retrieval require $\mathbb{E}[\hat{x}] = x$ because they average many quantized vectors together.
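What $\mathbb{E}[\hat{x}] = x$ buys is easy to see with the same toy sketch: averaging many independently compressed copies of one vector drives the error toward zero only for the unbiased variant.

```python
rng = np.random.default_rng(2)
d = 256
x = rng.standard_normal(d)

for mode in ("eden-biased", "eden-unbiased"):
    # Fresh rotation per copy, then average the 500 reconstructions.
    avg = np.mean(
        [compress_decompress(x, random_rotation(d, rng), mode) for _ in range(500)],
        axis=0,
    )
    err = np.sum((avg - x) ** 2) / np.sum(x**2)
    # The biased average plateaus at its bias; the unbiased one keeps improving.
    print(f"{mode:>13}: vNMSE of the averaged estimate = {err:.4f}")
```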
EDEN-unbiased employs the same single-pass algorithm as EDEN-biased, with $S$ selected specifically for bias correction. TurboQuant’s unbiased counterpart, TurboQuant-prod, follows a different strategy: it allocates $b - 1$ bits to the biased TurboQuant-mse step and sets aside 1 bit for a QJL (Quantized Johnson–Lindenstrauss) [4] correction applied to the residual (QJL behaves similarly to EDEN at $b = 1$, but introduces higher variance).
EDEN-unbiased surpasses TurboQuant-prod in every tested configuration, and by a considerable margin. This advantage stems from three structural strengths of EDEN’s single-pass approach:
- EDEN optimizes the scale factor. TurboQuant-prod inherits TurboQuant-mse’s first stage, meaning it carries the same MSE penalty.
- EDEN’s 1-bit construction produces lower variance than QJL. At large dimensions, EDEN’s 1-bit vNMSE converges to $\frac{\pi}{2} - 1 \approx 0.571$ [1], whereas QJL’s converges to $\frac{\pi}{2} \approx 1.571$ [4], roughly 2.75× higher.
- EDEN dedicates the entire bit budget to a single unbiased quantizer. TurboQuant-prod divides the budget into $b - 1$ biased bits plus 1 residual bit, which empirically underperforms allocating all $b$ bits to a single unbiased quantizer [5].
These benefits compound. The outcome is clear: 1-bit, 2-bit, and 3-bit EDEN-unbiased each deliver higher accuracy than 2-bit, 3-bit, and 4-bit TurboQuant-prod, respectively (Figure 3). By switching to EDEN, you can reduce the bit count per coordinate by one and still match TurboQuant-prod’s accuracy.

On TurboQuant’s own benchmarks
The same trend holds on the standard ANN benchmarks used in TurboQuant’s evaluation: Stanford’s GloVe pre-trained word vectors (Open Data Commons Public Domain Dedication and License v1.0) and Qdrant’s dbpedia-entities-openai3-text-embedding-3-large embeddings (Apache 2.0), assessed using TurboQuant’s published evaluation code:
EDEN-biased achieves lower MSE than TurboQuant-mse, EDEN-unbiased delivers significantly lower inner-product error than TurboQuant-prod, and nearest-neighbor recall on both datasets favors EDEN (Figure 4).

Takeaway: use EDEN; optimal scaling matters
EDEN’s scale factor bridges the known post-rotation distribution to an analytically optimal quantizer. TurboQuant-mse retains EDEN’s rotation and codebook but fixes $S = 1$, making it a strictly weaker special case. TurboQuant-prod adds a 1-bit QJL stage on top, while EDEN-unbiased achieves the same unbiasedness with better accuracy by simply selecting a bias-correcting scale.
- For MSE-targeted compression (model weight quantization, nearest-neighbor search, KV cache): EDEN-biased calculates the optimal scale and consistently outperforms TurboQuant-mse (which is EDEN with $S$ fixed to 1).
- For unbiased estimation (distributed mean estimation, approximate attention, inner-product retrieval): EDEN-unbiased significantly surpasses TurboQuant-prod’s bit-splitting approach, by margins exceeding a full bit per coordinate.
EDEN was initially created for distributed mean estimation in federated and distributed training. Later research has, for instance, applied it to embedding compression for document re-ranking (SDR, 2022 [8]), extended it to vector quantization for data-free LLM weight compression (HIGGS, 2025 [9]), which was subsequently used for KV-cache compression (AQUA-KV, 2025 [11]), and adapted it for NVFP4 LLM training (MS-EDEN in Quartet II, 2026 [10]).
EDEN implementations are available in PyTorch and TensorFlow and in Intel’s OpenFL [7]; its 1-bit variant is included in Google’s FedJax, TensorFlow Federated, and TensorFlow Model Optimization.
For the complete technical comparison with TurboQuant (all figures, detailed experimental methodology), see our note [5].
For the original derivations, proofs, and additional extensions, see our original papers [1], [2].
References
- S. Vargaftik, R. Ben-Basat, A. Portnoy, G. Mendelson, Y. Ben-Itzhak, M. Mitzenmacher, DRIVE: One-bit Distributed Mean Estimation (2021), NeurIPS 2021.
- S. Vargaftik, R. Ben-Basat, A. Portnoy, G. Mendelson, Y. Ben-Itzhak, M. Mitzenmacher, EDEN: Communication-Efficient and Robust Distributed Mean Estimation for Federated Learning (2022), ICML 2022.
- A. Zandieh, M. Daliri, A. Hadian, V. Mirrokni, TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate (2026), ICLR 2026.
- A. Zandieh, M. Daliri, I. Han, QJL: 1-Bit Quantized JL Transform for KV Cache Quantization with Zero Overhead (2024), arXiv:2406.03482.
- R. Ben-Basat, Y. Ben-Itzhak, G. Mendelson, M. Mitzenmacher, A. Portnoy, S. Vargaftik, A Note on TurboQuant and the Earlier DRIVE/EDEN Line of Work (2026), arXiv:2604.18555.
- A. T. Suresh, F. X. Yu, S. Kumar, H. B. McMahan, Distributed Mean Estimation with Limited Communication (2017), ICML 2017.
- VMware Open Source Blog, VMware Research Group’s EDEN Becomes Part of OpenFL (November 2022).
- N. Cohen, A. Portnoy, B. Fetahu, A. Ingber, SDR: Efficient Neural Re-ranking using Succinct Document Representation (2022), ACL 2022.
- V. Malinovskii, A. Panferov, I. Ilin, H. Guo, P. Richtárik, D. Alistarh, HIGGS: Pushing the Limits of Large Language Model Quantization via the Linearity Theorem (2025), NAACL 2025.
- A. Panferov, E. Schultheis, S. Tabesh, D. Alistarh, Quartet II: Accurate LLM Pre-Training in NVFP4 by Improved Unbiased Gradient Estimation (2026), arXiv:2601.22813.
- A. Shutova, V. Malinovskii, V. Egiazarian, D. Kuznedelev, D. Mazur, N. Surkov, I. Ermakov, D. Alistarh, Cache Me If You Must: Adaptive Key-Value Quantization for Large Language Models (2025), ICML 2025.