Xiaomi MiMo V2 Pro Review: The AI Model So Good It Was Mistaken For DeepSeek V4

In brief

Xiaomi’s MiMo-V2-Pro, a trillion-parameter model that briefly passed as “DeepSeek V4,” quietly lands as a top-tier AI contender.
It excels at coding, creative writing, and agentic tasks while dramatically undercutting rivals like Claude on price.
Strong reasoning and output quality come with trade-offs, including math missteps and high token consumption at times.

Most Americans know Xiaomi, if they know it at all, as that cheap phone brand from China.

That’s a significant misread. Xiaomi is the third-largest smartphone manufacturer in the world, behind only Apple and Samsung, shipping roughly 170 million phones in 2025. It makes televisions, air purifiers, fitness trackers, electric scooters, clothing, and now cars.

Xiaomi’s SU7 Ultra set the Nürburgring record for fastest mass-produced electric vehicle last year, beating out Rimac and Porsche. It recently partnered with the Sei blockchain to preinstall crypto wallets on its devices across Europe, Latin America, and Southeast Asia. The company’s market cap sits around $137 billion.

So when Xiaomi drops an AI model, maybe we should pay attention.

On March 18, the company’s dedicated AI research arm quietly released three models at once: MiMo-V2-Pro, MiMo-V2-Omni, and a text-to-speech model. The first model of the new MiMo generation appeared in December 2025, when the company quietly dropped MiMo-V2-Flash, a capable 309B mixture-of-experts model, and almost no one outside the Chinese AI community paid attention. The Western tech press mostly shrugged.

Then, on March 11, an anonymous 1-trillion-parameter model called “Hunter Alpha” appeared on OpenRouter with no developer attribution. The model climbed to the top of OpenRouter’s leaderboard, surpassed one trillion tokens in total usage, and immediately triggered widespread speculation that it was DeepSeek’s unreleased V4.

The anticipation for that model had been building for weeks, with insiders claiming it would outperform both Claude and ChatGPT on coding tasks.

It wasn’t DeepSeek.

On March 18, Luo Fuli, head of Xiaomi’s MiMo division and a former DeepSeek researcher, revealed Hunter Alpha was an early internal test build of MiMo-V2-Pro. Xiaomi’s stock jumped 5.8%. “I call this a quiet ambush,” Luo wrote on X.

MiMo-V2-Pro & Omni & TTS is out. Our first full-stack model family built truly for the Agent era.
I call this a quiet ambush, not because we planned it, but because the shift from Chat to Agent paradigm happened so fast, even we barely believed it. Somewhere in between was a…
— Fuli Luo (@_LuoFuli) March 18, 2026

MiMo-V2-Pro has over one trillion total parameters, of which 42 billion are active per request through a mixture-of-experts setup. A hybrid attention mechanism running at a 7:1 ratio handles a context window of up to one million tokens. A built-in multi-token prediction layer speeds up generation by predicting several tokens per step, rather than one at a time. It’s currently closed source, though Xiaomi has left the door open to a possible future release.
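To make the "active per request" idea concrete, here is a minimal Python sketch. The parameter counts come from the figures above (with "over one trillion" treated as a lower bound); the eight-expert, top-2 router is a purely hypothetical toy, since the article does not describe Xiaomi's actual expert count or routing scheme.

```python
TOTAL_PARAMS = 1.0e12   # "over one trillion" in the article; treated as a lower bound
ACTIVE_PARAMS = 42e9    # active parameters per request, per the article


def active_fraction(active, total):
    """Share of the model's weights that take part in a single request."""
    return active / total


def route_top_k(router_scores, k):
    """Pick the k experts with the highest router score for one token."""
    ranked = sorted(range(len(router_scores)),
                    key=lambda i: router_scores[i], reverse=True)
    return sorted(ranked[:k])


# Hypothetical toy router: 8 experts, 2 selected per token.
scores = [0.1, 0.7, 0.05, 0.3, 0.9, 0.2, 0.4, 0.6]
chosen = route_top_k(scores, k=2)

print(chosen)                                                # [1, 4]
print(f"{active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS):.1%}")  # 4.2%
```

With the article's figures, at most about 4.2% of the weights are used for any one request, which is why a trillion-parameter model can be served at a price closer to that of a much smaller dense model.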

On the Artificial Analysis Intelligence Index, MiMo-V2-Pro ranks eighth worldwide and second among Chinese models, trailing only GLM-5. On SWE-bench Verified, which covers real-world software engineering tasks, it scores 78%, against Claude Opus 4.6’s 80.8% and Claude Sonnet 4.6’s 79.6%.

On ClawEval, the agentic benchmark tied to the OpenClaw framework, it hits 61.5, approaching Opus 4.6’s 66.3. On PinchBench, it sits third globally at 81.0, just behind Opus 4.6 (81.5) and its sibling MiMo-V2-Omni (81.2).

MiMo-V2-Pro costs $1 per million input tokens and $3 per million output tokens, for up to 256K context. Claude Sonnet 4.6 runs $3 per million input and $15 per million output (Opus 4.6 is $5/$25). For developers building agentic systems at scale, those numbers aren’t a footnote.
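A rough sense of what that gap means at scale can be had from a few lines of Python. The per-million-token rates are the ones quoted above; the workload size (500 million input tokens and 100 million output tokens) is a hypothetical figure chosen only for illustration, and the sketch assumes every model consumes the same number of tokens, which may not hold for a model that reasons at length.

```python
# Prices in USD per million tokens, as quoted in the article.
PRICES = {
    "MiMo-V2-Pro": {"input": 1.0, "output": 3.0},
    "Claude Sonnet 4.6": {"input": 3.0, "output": 15.0},
    "Claude Opus 4.6": {"input": 5.0, "output": 25.0},
}


def workload_cost(model, input_tokens, output_tokens):
    """Return the USD cost of a workload at the listed per-million-token rates."""
    rate = PRICES[model]
    return (input_tokens * rate["input"] + output_tokens * rate["output"]) / 1_000_000


# Hypothetical monthly workload, not taken from the article.
INPUT_TOKENS = 500_000_000
OUTPUT_TOKENS = 100_000_000

for model in PRICES:
    print(f"{model}: ${workload_cost(model, INPUT_TOKENS, OUTPUT_TOKENS):,.2f}")
```

Under these assumptions the same workload comes to $800 on MiMo-V2-Pro, $3,000 on Sonnet 4.6, and $5,000 on Opus 4.6. The MiMo rate is quoted for requests of up to 256K context, so longer prompts may be billed differently.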

The Omni sibling handles vision, audio, and video natively: not as bolted-on modules, but trained end-to-end as a unified perceptual system. The demo showing it analyzing dashcam footage as a real-time autonomous driving brain was, frankly, impressive. It’s genuinely multimodal in a way that most “omni” models only claim to be.

Testing the model

Of course, we tested MiMo-V2-Pro to find out how good it is. Here’s what actually happened. The outputs will be available in our GitHub repository.

Creative writing

We gave MiMo-V2-Pro a single creative writing prompt: a time travel story anchored to Mesoamerican history, with a specific protagonist, a cultural identity to honor, and a philosophical paradox about how time can’t be changed.

The model returned over 3,000 words: a proper title, five full chapters, and the structural discipline you’d expect from a draft that had been through an editor. It even wrote an epilogue.

It is, without question, the longest and richest piece of creative prose we have gotten from any model, with the sole exception of Longwriter, a specialized but now outdated model built from the ground up for long-form generation, which is a very different class of competition.

The writing itself was rich, descriptive, and vivid. The opening paragraph starts building the image of the entire scene. MiMo-V2-Pro embeds realism to make the story believable.

Unlike other models such as Grok, it didn’t just set a scene in a place, in this case ancient Mexico. It understood what ancient Mesoamerica smelled like, and built the mood from the ground up using local words, realistic descriptions, and good contextual cues.

Dialogue sits within the narrative exactly as it does in literary fiction, instead of being embedded into paragraphs the way most current models do it.

Another thing worth noticing is that the paradox, arguably the core element of the story, wasn’t purely intellectual, but emotional. The whole arc is resolved without a lecture. The final lines stick the landing the way good fiction is supposed to: not by explaining the theme, but by making you feel it.

“Outside, the rain began. It fell on the spiraling towers and the restored lakes and the ancient ground of Tlachinollan, where, buried in volcanic soil under the weight of a thousand years, a black rectangle waited with the patience of something that already knew how the story ended.”

The cultural specificity (mentions of cara de luna, maguey fiber, the temazcal tradition, and the Nahuatl names used in the story) is consistent and never decorative. The time travel paradox is actually argued, not just nodded at. For creative writing use cases, MiMo-V2-Pro just put itself on a very short list, and in our opinion is by far the best and richest model available, beating Claude Opus 4.6 easily.

The full story is available here.

Coding

The benchmark numbers point to coding as MiMo-V2-Pro’s strongest suit, and the hands-on experience backs that up. We asked it to build our usual stealth game from a single prompt, and it shipped a working game on the first try.

Not “working” merely in the sense of technically running, but working in the sense that the logic held, the screens made sense, and the visual design was actually good. That combination of correctness and aesthetics is where most models fall apart. They get one or the other, but usually not both.

It also chose a 2.5D aesthetic instead of the usual 2D style that other models went with. This design choice made the program more aesthetically pleasing without changing its core proposition.

We followed up with small improvements. Adding sound and MIDI music to a working 3D game has broken previous models mid-generation: the code base gets too large, the context loses the thread, and models either end up in a loop or freeze. MiMo-V2-Pro added both and kept the whole thing coherent. The music matched the game’s tone, while the screens matched the game’s visual identity.

We enjoyed playing it, though if we’re honest, more for how it looked than for how it challenged us. The difficulty scaled with the number of opponents rather than with level design: the robot and the PC spawned in the same positions every round. That’s a design choice, not a bug.

Still, for a single-prompt, zero-iteration output, it will do the job.

You can play the game by clicking on this link.

Logic and common sense

We asked MiMo-V2-Pro to act as a legal expert and answer whether it’s lawful for a man to marry his widow’s sister under Falkland Islands law. It’s a tricky question that aims to evaluate the model’s reasoning.

The final answer was wrong, but the reason why is the interesting part. The model’s chain of thought correctly caught the linguistic trap in the prompt: “if a man has a widow, that means he’s deceased,” it said, so the question is technically nonsensical.

It identified the flaw, and decided that the most logical reading was that the man was referring to his “deceased wife’s sister.” It then proceeded to answer that reframed question rather than flagging the original as unanswerable.

“Based on my analysis of the legal framework governing the Falkland Islands, the answer to your question is yes, it is legal for a man to marry the sister of his deceased wife,” the model wrote. “The phrasing ‘marry his widow’s sister’ contains a logical contradiction. If a man has a ‘widow,’ he is deceased and cannot remarry. The correct legal question is whether a man may marry the sister of his deceased wife (i.e., his late wife’s sister). This relationship is one of affinity (created by marriage) rather than consanguinity (blood relation),” it concluded.

The reasoning was sound. The decision to quietly swap the premise instead of surfacing the contradiction was not.

This is why transparency in reasoning outputs is important. We only know this because Xiaomi exposes the full chain of thought (OpenAI doesn’t). When a model reasons incorrectly in a hidden chain of thought and confidently delivers a wrong answer, you have no visibility into where it went sideways or how to correct it.

Math

Math is where MiMo-V2-Pro showed its ceiling.

We asked our usual benchmark question from FrontierMath: “Construct a degree 19 polynomial p(x) ∈ C[x] such that X := {p(x) = p(y)} ⊂ P1 × P1 has at least 3 (but not all linear) irreducible components over C. Choose p(x) to be odd, monic, have real coefficients and linear coefficient -19 and calculate p(19)”

The model hit two full freezes and burned through a large token budget without producing a reply.

When it did eventually reply on the third attempt, it reasoned through the problem step by step... and still got it wrong. The correct answer was 1876572071974094803391179; it answered p(19) = 164,079,552,964,661, and then 2,012,379,925,093,098,998 on a follow-up question asking it to correct itself.
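The reference value can be reproduced under one assumption that the article does not spell out: that the intended polynomial is the monic Chebyshev (Dickson) polynomial p(x) = 2·T_19(x/2). Chebyshev polynomials are the classic family for which p(x) − p(y) splits into x − y and a set of quadratic factors, and this particular one is odd, monic, real, and has linear coefficient -19. The Python sketch below builds it from the recurrence D_0 = 2, D_1 = x, D_k = x·D_{k-1} − D_{k-2} and evaluates it at 19; the function names are ours.

```python
def dickson_coeffs(n):
    """Coefficients (lowest degree first) of D_n(x) = 2*T_n(x/2).

    Uses the recurrence D_0 = 2, D_1 = x, D_k = x*D_{k-1} - D_{k-2}.
    """
    prev, cur = [2], [0, 1]
    if n == 0:
        return prev
    for _ in range(n - 1):
        shifted = [0] + cur                                # x * D_{k-1}
        padded = prev + [0] * (len(shifted) - len(prev))   # D_{k-2}, same length
        prev, cur = cur, [a - b for a, b in zip(shifted, padded)]
    return cur


def evaluate(coeffs, x):
    """Evaluate a polynomial with exact integer arithmetic (Horner's rule)."""
    result = 0
    for c in reversed(coeffs):
        result = result * x + c
    return result


coeffs = dickson_coeffs(19)

print(coeffs[1])             # linear coefficient: -19
print(coeffs[-1])            # leading coefficient: 1
print(evaluate(coeffs, 19))  # 1876572071974094803391179
```

Python integers are arbitrary precision, so the 25-digit result is exact. Neither of the model's two answers comes close: both are several orders of magnitude too small.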

In general, it is good for normal and even harder math problems, but frontier math is not its strong suit, at least not yet. Using the agentic feature instead of the pure LLM may yield better results.

Agentic features

Xiaomi is following the same playbook as MiniMax and Kimi, and provides a one-click OpenClaw integration that spins up a preconfigured cloud instance with MiMo-V2-Pro as the underlying model. No API setup, no VPS, no skill configuration, no hour-long troubleshooting session before you even run your first task. You click, it works.

The demo environment runs for 30 minutes and then destroys itself, which is a real limitation, but also an honest one. For developers already comfortable with agentic infrastructure, this adds nothing. For everyone else, it’s the most frictionless on-ramp to agentic AI you could ask for.

Conclusion

All things considered, MiMo-V2-Pro is a serious model, and we really enjoyed tinkering with it. It’s not perfect: the math ceiling is real, the chain-of-thought transparency surfaced a reasoning flaw that a less open model would have buried, and the token consumption during hard reasoning tasks adds up fast.

If you care about costs, Xiaomi’s pricing is aggressive: a fraction of what Claude Opus or the latest OpenAI and Google models cost, for a model that is more capable than GLM or MiniMax in the areas that matter most for creative and agentic work.

Creative professionals in particular stand to gain a lot here, possibly more than they would from Anthropic right now.

This model thinks expensively, and that may be a trade-off. If you’re running high-volume agentic pipelines, watch the token burn, though you may end up spending less than you would with Claude. If you’re doing rich, open-ended work where output quality is the metric, MiMo-V2-Pro earns its place on the shortlist.
