I’ve been tackling the consumer multi-GPU PCIe bandwidth problem: Nvidia dropped NVLink from the 4090/5090, so distributing a 70B model across two consumer GPUs limits you to roughly 30 GB/s over PCIe peer-to-peer. Over the past few months, I’ve built a Python library that uses the GPU’s otherwise-unused NVENC/NVDEC hardware to compress activations and KV cache in real time, then transmits the compact bitstream across the same connection.
Repository: (Apache 2.0)
Existing work (the concept itself isn’t new)
The “video codec on tensors” concept was already published when I began. Here’s what this project contributes beyond that:
Benchmark results (RTX 5090, real workloads)
What hasn’t been measured end-to-end yet (projections from the data above)
Multi-GPU PCIe peer-to-peer activation transfer recovering ~180 GB/s effective bandwidth: the codec primitive is implemented and benchmarked, but the cross-GPU PCIe peer-to-peer integration is still pending. (This is where I need community support, since my test setup has only one desktop GPU and you need two on the same motherboard to validate this.)
Real two-machine Ethernet split-model inference: the network-simulation proof-of-concept measures real codec time plus a simulated link, but it isn’t a genuine two-machine deployment yet. (I have a 4090 laptop arriving next week to physically test this networked scenario.)
Long-context KV-spill end-to-end tok/s on a real model decode loop: the compression ratio is measured, but the concrete N tok/s → 3N tok/s benchmark on, say, a 32B model with 64K context isn’t in the repo yet. The math suggests it works; the benchmark hasn’t been written.
Where I’d welcome contributions
What’s included in the repository
19 numbered, runnable proof-of-concepts, with every reported number reproducible. An honest status summary sits at the top of the README. The PCA basis builder, per-channel quantizer, YUV pack/unpack, and codec wrappers are all modular so components can be swapped independently.
Built solo while managing full-time caregiving responsibilities. Technical feedback, critique, or pointers to related work I may have overlooked are genuinely appreciated.
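For readers unfamiliar with two of the stages named above (the per-channel quantizer and the YUV pack/unpack step), here is a minimal NumPy sketch of the general idea. It is not the repository's code: the function names, the 8-bit min/max quantization scheme, the luma-only layout, and the frame width are all illustrative assumptions, and the hardware encode/decode step is skipped entirely.

```python
import numpy as np

# Hypothetical helpers for illustration only; not the library's actual API.

def quantize_per_channel(x):
    """Map each channel (column) of a float tensor to uint8 using its own min/max range."""
    lo = x.min(axis=0, keepdims=True)
    hi = x.max(axis=0, keepdims=True)
    scale = np.maximum(hi - lo, 1e-8) / 255.0  # guard against flat channels
    q = np.clip(np.rint((x - lo) / scale), 0, 255).astype(np.uint8)
    return q, lo.astype(np.float32), scale.astype(np.float32)

def dequantize_per_channel(q, lo, scale):
    """Invert quantize_per_channel up to rounding error."""
    return q.astype(np.float32) * scale + lo

def pack_luma(q, width):
    """Flatten uint8 values into a zero-padded 2D luma (Y) plane of the given width."""
    flat = q.ravel()
    height = -(-flat.size // width)  # ceiling division
    plane = np.zeros(height * width, dtype=np.uint8)
    plane[:flat.size] = flat
    return plane.reshape(height, width)

def unpack_luma(plane, shape):
    """Recover the original uint8 tensor from the padded plane."""
    n = int(np.prod(shape))
    return plane.ravel()[:n].reshape(shape)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    acts = rng.standard_normal((128, 64)).astype(np.float32)  # (tokens, channels)
    q, lo, scale = quantize_per_channel(acts)
    plane = pack_luma(q, width=256)
    # A real pipeline would hand `plane` to a hardware video encoder here.
    restored = dequantize_per_channel(unpack_luma(plane, q.shape), lo, scale)
    print(plane.shape, float(np.abs(restored - acts).max()))
```

Keeping quantization and packing as separate functions mirrors the swap-one-stage-at-a-time design described above: a different quantizer only has to return uint8 values plus whatever side information its inverse needs.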