In short
- DeepSeek launched its new V4-Pro model with 1.6 trillion parameters.
- It costs $1.74/$3.48 per million input/output tokens, roughly 1/20th the price of Claude Opus 4.7 and 98% less than GPT-5.5 Pro.
- DeepSeek trained V4 partly on Huawei Ascend chips, circumventing U.S. export restrictions, and says that after 950 new supernodes come online later in 2026, the Pro model's already-low price will drop further.
DeepSeek is back, and it showed up a few hours after OpenAI dropped GPT-5.5. Coincidence? Maybe. But if you're a Chinese AI lab that the U.S. government has been trying to slow down with chip export bans for the past three years, your sense of timing gets pretty sharp.
The Hangzhou-based lab released preview versions of DeepSeek-V4-Pro and DeepSeek-V4-Flash today, both open-weight, both with one-million-token context windows. That means you can work with a context roughly the size of the Lord of the Rings trilogy before the model runs out of room. Both are also priced well below anything comparable in the West, and both are free for those able to run them locally.
DeepSeek's last major disruption, R1 in January 2025, wiped $600 billion from Nvidia's market cap in a single day as investors questioned whether American companies really needed such massive investments to produce results that a small Chinese lab achieved at a fraction of the cost. V4 is a different kind of move: quieter, more technical, and more focused on efficiency for anyone actually building with AI.
Two models, very different jobs
Of the two new models, DeepSeek's V4-Pro is the big one, with 1.6 trillion total parameters. To put that in perspective, parameters are the internal "settings" or "brain cells" a model uses to store knowledge and recognize patterns; the more parameters a model has, the more complex information it can theoretically hold. That makes it the largest open-source model in the LLM market to date. The size might sound ridiculous until you learn it only activates 49 billion of them per inference pass.
That's the Mixture-of-Experts trick DeepSeek has refined since V3: The full model sits there, but only the relevant slice of it wakes up for any given request. More knowledge, same compute bill.
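A toy sketch of that routing step helps make the "only a slice wakes up" idea concrete. The expert count, top-k, and scoring here are invented for illustration; the article doesn't describe V4's actual router:

```python
import numpy as np

rng = np.random.default_rng(0)

def route_top_k(token_vec, router_weights, k=2):
    """Toy Mixture-of-Experts router: score every expert for this token,
    but activate only the top-k. Everything else stays idle."""
    scores = router_weights @ token_vec           # one score per expert
    top_k = np.argsort(scores)[-k:]               # indices of the chosen experts
    gates = np.exp(scores[top_k])
    gates /= gates.sum()                          # mixing weights for the chosen experts
    return top_k, gates

n_experts, dim = 64, 16                           # hypothetical sizes
router = rng.standard_normal((n_experts, dim))
token = rng.standard_normal(dim)

chosen, gates = route_top_k(token, router, k=2)
print(len(chosen), "of", n_experts, "experts active")  # 2 of 64 experts active
```

The ratio is the point: compute scales with the active experts, while total knowledge scales with all of them, which is how 1.6 trillion parameters can cost like 49 billion.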
"DeepSeek-V4-Pro-Max, the maximum reasoning effort mode of DeepSeek-V4-Pro, significantly advances the knowledge capabilities of open-source models, firmly establishing itself as the best open-source model available today," DeepSeek wrote in the model's official card on Hugging Face. "It achieves top-tier performance in coding benchmarks and significantly bridges the gap with leading closed-source models on reasoning and agentic tasks."
V4-Flash is the practical one: 284 billion total parameters, 13 billion active. It's designed to be faster and cheaper, and according to DeepSeek's own benchmarks, it "achieves comparable reasoning performance to the Pro version when given a larger thinking budget."
Both support a million tokens of context. That's roughly 750,000 words, about the entire "Lord of the Rings" trilogy plus change. And that's a standard feature, not a premium tier.
DeepSeek's (not so) secret sauce: Making attention not terrible at scale
Here's the technical part for nerds, or anyone curious about the magic powering the model. DeepSeek doesn't hide its secrets, and everything is available for free; the full paper is on GitHub.
Standard AI attention, the mechanism that lets a model understand relationships between words, has a brutal scaling problem. Every time you double the context length, the compute cost roughly quadruples. So running a model on a million tokens isn't just twice as expensive as 500,000 tokens; it's four times as expensive. This is why long context has historically been a checkbox labs add and then silently throttle behind rate limits.
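The quadrupling falls straight out of the arithmetic: full attention compares every token with every other token, so cost grows with the square of context length (ignoring constant factors):

```python
# Full self-attention does a pairwise comparison between tokens,
# so cost grows with the square of the context length.
def attention_cost(n_tokens: int) -> int:
    return n_tokens ** 2  # pairwise comparisons, up to constant factors

half = attention_cost(500_000)
full = attention_cost(1_000_000)
print(full / half)  # 4.0 — doubling the context quadruples the cost
```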
DeepSeek invented two new attention variants to get around this. The first, Compressed Sparse Attention, works in two steps. It first compresses groups of tokens, say, every four tokens, into a single entry. Then, instead of attending to all of those compressed entries, it uses a "Lightning Indexer" to pick only the most relevant ones for any given query. Your model goes from attending to a million tokens to attending to a much smaller set of the most important chunks, sort of like a librarian who doesn't read every book but knows exactly which shelf to check.
The second, Heavily Compressed Attention, is more aggressive. It collapses every 128 tokens into a single entry: no sparse selection, just brutal compression. You lose fine-grained detail, but you get an extremely cheap global view. The two attention variants run in alternating layers, so the model gets both the detail and the overview.
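A back-of-the-envelope calculation shows why those ratios matter at a million-token context. The 4-token and 128-token group sizes come from the article; the indexer's selection budget is an invented placeholder, since the paper's actual number isn't quoted here:

```python
# Entry counts a layer would attend over at a 1,000,000-token context,
# under the two compression schemes described above. Ratios only;
# the real per-layer math is more involved.
context = 1_000_000

# Compressed Sparse Attention: group every 4 tokens into one entry,
# then let the "Lightning Indexer" keep only the most relevant entries.
compressed_entries = context // 4   # 250,000 entries before selection
top_k = 2_048                       # hypothetical selection budget, for scale

# Heavily Compressed Attention: every 128 tokens become one entry.
global_entries = context // 128     # 7,812 entries, a cheap global view

print(compressed_entries, top_k, global_entries)
```

Either way, no layer ever pays the full million-squared bill, which is where the paper's compute and KV-cache savings come from.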
The result, from the technical paper: At a million tokens, V4-Pro uses 27% of the compute its predecessor (V3.2) needed. KV cache, the memory the model needs to track context, drops to just 10% of V3.2's. V4-Flash pushes that further: 10% of the compute, 7% of the memory.
All of that lets DeepSeek offer a cheaper price per token than its competitors while delivering comparable results. To put it in dollar terms: GPT-5.5 launched yesterday at $5 input and $30 output per million tokens, with GPT-5.5 Pro priced at $30 per million input tokens and $180 per million output tokens.
DeepSeek V4-Pro is $1.74 input and $3.48 output. V4-Flash is $0.14 input and $0.28 output. Cline CEO Saoud Rizwan pointed out that if Uber had used DeepSeek instead of Claude, its 2026 AI budget, reportedly enough for four months of usage, would have lasted seven years.
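To see what those per-token rates do to a real bill, here's the arithmetic on a hypothetical workload (the 10M-input/2M-output daily mix is an illustrative assumption, not from the article; the prices are the article's):

```python
# Published prices per million tokens, from the article.
prices = {
    "GPT-5.5":     {"in": 5.00,  "out": 30.00},
    "GPT-5.5 Pro": {"in": 30.00, "out": 180.00},
    "V4-Pro":      {"in": 1.74,  "out": 3.48},
    "V4-Flash":    {"in": 0.14,  "out": 0.28},
}

def daily_cost(p, m_in=10, m_out=2):
    """Cost for a hypothetical day of m_in million input and m_out million output tokens."""
    return p["in"] * m_in + p["out"] * m_out

for name, p in prices.items():
    print(f"{name}: ${daily_cost(p):.2f}/day")
# On this mix, V4-Pro's bill ($24.36) is under 4% of GPT-5.5 Pro's ($660.00).
```

The exact ratio shifts with your input/output balance, which is why headline claims like "1/20th the price" always depend on the workload.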
deepseek v4 is now the cheapest sota model available at 1/20th the cost of opus 4.7.
for perspective, if uber used deepseek instead of claude their 2026 ai budget would have lasted 7 years instead of only 4 months. pic.twitter.com/i9rJZzvRBV
— Saoud Rizwan (@sdrzn) April 24, 2026
The benchmarks
DeepSeek does something unusual in its technical report: It publishes the gaps. Most model releases cherry-pick the benchmarks where they win. DeepSeek ran the full comparison against GPT-5.4 and Gemini-3.1-Pro, found that V4-Pro's reasoning lags behind those models by about three to six months, and published it anyway.
Where V4-Pro-Max actually wins: Codeforces, a competitive programming benchmark rated like human chess. V4-Pro scored 3,206, placing it around 23rd among actual human contest participants. On Apex Shortlist, a curated set of hard math and STEM problems, it hit 90.2% versus Opus 4.6's 85.9% and GPT-5.4's 78.1%. On SWE-Verified, which measures whether a model can solve real GitHub issues pulled from actual open-source repositories, it scored 80.6%, matching Claude Opus 4.6.

Where it trails: the multitask benchmark MMLU-Pro (Gemini-3.1-Pro at 91.0% vs. V4-Pro at 87.5%), the expert-knowledge benchmark GPQA Diamond (Gemini 94.3 vs. V4-Pro 90.1), and Humanity's Last Exam, a graduate-level benchmark where Gemini-3.1-Pro's 44.4% still beats V4-Pro's 37.7%.
On long context specifically, V4-Pro leads open-source models and beats Gemini-3.1-Pro on the CorpusQA benchmark (a test simulating real document analysis at a million tokens), but loses to Claude Opus 4.6 on MRCR, a test measuring how well a model retrieves specific needles buried deep in a very long haystack.
Built to run agents, not just answer questions
The agentic stuff is where this release gets interesting for developers actually shipping products.
V4-Pro can run in Claude Code, OpenCode, and other AI coding tools. According to DeepSeek's internal survey of 85 developers who used V4-Pro as their primary coding agent, 52% said it was ready to be their default model, 39% leaned toward yes, and fewer than 9% said no. Internal teams said it outperforms Claude Sonnet and approaches Claude Opus 4.5 on agentic coding tasks.

Artificial Analysis, which runs independent evaluations of AI models on real-world tasks, ranked V4-Pro first among all open-weight models on GDPval-AA, a benchmark testing economically valuable knowledge work across finance, legal, and research tasks, scored via Elo. V4-Pro-Max scored 1,554 Elo, ahead of GLM-5.1 (1,535) and MiniMax's M2.7 (1,514). For reference, Claude Opus 4.6 scores 1,619 on the same benchmark: still ahead, but the gap is closing.
DeepSeek V4 Pro is the #1 open weights model on GDPval-AA, our agentic real-world work tasks evaluation. @deepseek_ai has launched V4 Pro (1.6T total / 49B active) and V4 Flash (284B total / 13B active). V4 is DeepSeek's first new size since V3, with all intermediate models… pic.twitter.com/2kJWVrKQjF
— Artificial Analysis (@ArtificialAnlys) April 24, 2026
DeepSeek's V4 also introduces something called "interleaved thinking." In earlier models, if you were running an agent that made multiple tool calls, say, it searched the web, then ran some code, then searched again, the model's reasoning context got flushed between rounds. At each new step, the model had to rebuild its mental model from scratch. V4 keeps the full chain of thought across tool calls, so a 20-step agent workflow doesn't suffer from amnesia halfway through. This matters more than it sounds for anyone running complex automated pipelines.
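A minimal sketch of the difference this makes in an agent loop. The message shapes and flushing behavior are illustrative, not DeepSeek's actual API:

```python
# Toy agent loop contrasting "flush reasoning between tool rounds"
# (the old behavior) with "keep the full chain of thought" (interleaved thinking).
def run_agent(steps, keep_reasoning):
    history = []
    for step in steps:
        history.append({"role": "assistant", "type": "reasoning",
                        "content": f"thinking about {step}"})
        history.append({"role": "tool", "content": f"result of {step}"})
        if not keep_reasoning:
            # Old behavior: drop reasoning after each tool round,
            # forcing the model to rebuild its mental model from scratch.
            history = [m for m in history if m.get("type") != "reasoning"]
    return history

steps = ["web_search", "run_code", "web_search"]
print(len(run_agent(steps, keep_reasoning=False)))  # 3: only tool results survive
print(len(run_agent(steps, keep_reasoning=True)))   # 6: reasoning preserved throughout
```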
DeepSeek and the U.S.-China AI war
The U.S. has been restricting high-end Nvidia chip exports to China since 2022. The stated goal was to slow Chinese AI development, but the chip ban didn't stop DeepSeek; instead, it pushed the lab to invent a more efficient architecture and build out a domestic hardware supply.
DeepSeek didn't launch V4 in a vacuum; the AI space has been flush with activity lately. Anthropic shipped Claude Opus 4.7 on April 16, a model Decrypt tested and found strong on coding and reasoning, with notably high token usage. The day before that, Anthropic was also sitting on Claude Mythos, a cybersecurity model it says it can't release publicly because it's too good at autonomous network attacks.
Xiaomi dropped MiMo V2.5 Pro on April 22, going full multimodal: image, audio, video. It costs $1 input and $3 output per million tokens and matches Opus 4.6 on most coding benchmarks. Three months ago, nobody was talking about Xiaomi as a frontier AI company. Now it's shipping competitive models faster than most Western labs.
OpenAI's GPT-5.5 landed yesterday with costs spiking up to $180 per million output tokens in the Pro version. It beats V4-Pro on Terminal Bench 2.0 (82.7% vs. 70.0%), which tests complex command-line agent workflows. But it costs considerably more than V4-Pro for equivalent tasks. That same day, Tencent released Hy3, another state-of-the-art model focused on efficiency.
What this means for you
With so many new models available, the question builders are actually asking is: When is the premium worth it?
For enterprises, the math may have changed. A model that leads open-source benchmarks at $1.74 per million input tokens means large-scale document processing, legal review, or code-generation pipelines that were expensive six months ago are now far cheaper. The one-million-token context means you can feed entire codebases or regulatory filings in a single request instead of chunking them across multiple calls.
Moreover, its open-source nature means it can not only be run for free on local hardware, but also be customized and improved to fit a company's needs and use cases.
For builders and solo developers, V4-Flash is the one to watch. At $0.14 input and $0.28 output, it's cheaper than models that were considered budget options a year ago, and it handles most tasks the Pro version handles. DeepSeek's current deepseek-chat and deepseek-reasoner endpoints already route to V4-Flash in non-thinking and thinking modes respectively, so if you're on the API, you're already using it.
The models are text-only for now. DeepSeek said it's working on multimodal capabilities, which means other big labs from Xiaomi to OpenAI still have that edge. Both models are MIT licensed and available on Hugging Face today. The old deepseek-chat and deepseek-reasoner endpoints retire on July 24, 2026.