I’ve been tackling the consumer multi-GPU PCIe bandwidth problem. Nvidia dropped NVLink from the 4090 and 5090, so splitting a 70B model across two consumer GPUs leaves you with roughly 30 GB/s of PCIe peer-to-peer bandwidth between them. Over the past few months I’ve built a Python library that uses the GPU’s otherwise idle NVENC/NVDEC hardware to compress activations and KV cache in real time, then sends the compact bitstream over that same link.

Repository: (Apache 2.0)

Existing work (the concept itself isn’t new)
The “video codec on tensors” idea had already been published when I started. Here’s what this project contributes beyond that:
Benchmark results (RTX 5090, real workloads)
What hasn’t been measured end-to-end yet (projections from the data above)

Multi-GPU PCIe peer-to-peer activation transfer recovering ~180 GB/s of effective bandwidth. The codec primitive is implemented and benchmarked, but the cross-GPU peer-to-peer integration is still pending. This is where I most need community help: my test setup has only one desktop GPU, and validating this requires two on the same motherboard.

Real two-machine Ethernet split-model inference. The network-simulation proof of concept measures real codec time plus a simulated link, but it isn’t a genuine two-machine deployment yet. I have a 4090 laptop arriving next week to test the networked scenario physically.

Long-context KV-spill end-to-end tok/s in a real model decode loop. The compression ratio is measured, but the concrete N tok/s → 3N tok/s benchmark on, say, a 32B model with 64K context isn’t in the repo yet. The math suggests it works; the benchmark just hasn’t been written.

Where I’d welcome contributions
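For anyone picking up the PCIe peer-to-peer item, the ~180 GB/s projection is worth sanity-checking with simple arithmetic: over a ~30 GB/s link it corresponds to roughly a 6:1 compression ratio, and it only holds if NVENC/NVDEC keep up. The sketch below is a back-of-envelope model, not the repository’s network simulation; the function names are hypothetical. It contrasts a serial transfer (encode, then link, then decode) with a pipelined one where chunks overlap and the slowest stage sets the rate.

```python
import math


def serial_time_s(nbytes, link_gbps, ratio=1.0,
                  encode_gbps=math.inf, decode_gbps=math.inf):
    """Seconds to move `nbytes` of raw tensor data when encode, link
    transfer, and decode run strictly one after another.

    link_gbps: physical link rate in GB/s (e.g. ~30 for PCIe P2P).
    ratio: raw bytes / compressed bytes (1.0 means no compression).
    encode_gbps / decode_gbps: codec throughput in GB/s of *raw* data;
    the default of infinity models a free codec.
    """
    if nbytes <= 0:
        raise ValueError("nbytes must be positive")
    gb = nbytes / 1e9
    return gb / encode_gbps + (gb / ratio) / link_gbps + gb / decode_gbps


def serial_effective_gbps(nbytes, link_gbps, ratio=1.0,
                          encode_gbps=math.inf, decode_gbps=math.inf):
    """Raw-data bandwidth seen by the caller in the serial case."""
    t = serial_time_s(nbytes, link_gbps, ratio, encode_gbps, decode_gbps)
    return nbytes / 1e9 / t


def pipelined_effective_gbps(link_gbps, ratio=1.0,
                             encode_gbps=math.inf, decode_gbps=math.inf):
    """Steady-state raw-data bandwidth when chunks overlap across the
    three stages: the slowest stage limits the whole pipeline."""
    return min(encode_gbps, link_gbps * ratio, decode_gbps)
```

With a free codec, `serial_effective_gbps(1e9, 30.0, ratio=6.0)` comes out at about 180 GB/s. With placeholder (not measured) codec rates of 60 GB/s encode and 90 GB/s decode, the serial model falls back to 30 GB/s, no gain at all, while the pipelined model gives 60 GB/s, capped by the encoder. Overlapping encode, transfer, and decode across chunks therefore matters as much as the compression ratio. The same reasoning carries over to the KV-spill case only to the extent that decode is bound by streaming spilled KV over PCIe.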
What’s included in the repository

19 numbered, runnable proofs of concept, with every reported number reproducible. An honest status summary sits at the top of the README. The PCA basis builder, per-channel quantizer, YUV pack/unpack, and codec wrappers are modular, so each component can be swapped independently.

Built solo while managing full-time caregiving responsibilities. Technical feedback, critique, or pointers to related work I may have overlooked are genuinely appreciated.
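The post lists a per-channel quantizer and YUV pack/unpack among the swappable stages but doesn’t describe them. The sketch below shows one common way such stages could work, assuming min/max affine quantization with a separate offset and scale per channel, 8-bit codes (to match 8-bit luma), an NV12 frame whose Y plane carries the data with neutral chroma, and plain Python lists standing in for tensors. All function names are hypothetical and are not the library’s API.

```python
import math


def quantize_per_channel(x, bits=8):
    """Affine min/max quantization of a [tokens][channels] matrix.

    Each channel (column) gets its own offset (its minimum) and scale, so
    one large-magnitude channel doesn't eat the resolution of the others.
    """
    levels = (1 << bits) - 1
    n_ch = len(x[0])
    lo = [min(row[c] for row in x) for c in range(n_ch)]
    hi = [max(row[c] for row in x) for c in range(n_ch)]
    # A constant channel gets scale 1.0 to avoid dividing by zero.
    scale = [(mx - mn) / levels if mx > mn else 1.0 for mn, mx in zip(lo, hi)]
    # Clamp guards against float rounding pushing a code out of range.
    q = [[min(levels, max(0, round((v - lo[c]) / scale[c])))
          for c, v in enumerate(row)] for row in x]
    return q, lo, scale


def dequantize_per_channel(q, lo, scale):
    """Invert the affine map; per-element error is at most scale[c] / 2."""
    return [[lo[c] + v * scale[c] for c, v in enumerate(row)] for row in q]


def pack_nv12(q, width):
    """Flatten 8-bit codes row-major into the Y (luma) plane of an NV12 frame.

    The Y plane is zero-padded to an even number of rows, as 4:2:0 chroma
    subsampling requires, and the interleaved UV plane is filled with 128
    (neutral chroma), so only luma carries data. Returns (frame, height).
    """
    if width % 2:
        raise ValueError("NV12 width must be even")
    flat = [v for row in q for v in row]
    rows = max(1, math.ceil(len(flat) / width))
    height = rows + rows % 2
    y_plane = bytes(flat) + bytes(width * height - len(flat))
    uv_plane = bytes([128]) * (width * height // 2)
    return y_plane + uv_plane, height


def unpack_nv12(frame, n_tokens, n_channels):
    """Read the first n_tokens * n_channels luma bytes back into rows."""
    flat = list(frame[: n_tokens * n_channels])
    return [flat[i * n_channels:(i + 1) * n_channels] for i in range(n_tokens)]
```

In this sketch the packed frame goes straight back to `unpack_nv12`, so the codes round-trip exactly. In a real pipeline the frame would pass through NVENC/NVDEC in between, which is lossy, so the recovered codes would only approximate the originals. Real encoders also impose minimum frame sizes and alignment rules that this layout ignores.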