Xiaomi MiMo v2 Pro Review: The AI Model So Good It Was Mistaken for DeepSeek V4

In brief

Xiaomi’s MiMo-V2-Pro—a trillion-parameter model that briefly passed as “DeepSeek V4”—quietly lands as a top-tier AI contender.
It excels at coding, creative writing, and agentic tasks while dramatically undercutting rivals like Claude on price.
Strong reasoning and output quality come with trade-offs, including math missteps and high token consumption at times.

Most Americans know Xiaomi—if they know it at all—as that cheap phone brand from China.

That’s a significant misread. Xiaomi is the third-largest smartphone manufacturer on the planet, behind only Apple and Samsung, shipping roughly 170 million phones in 2025. It makes televisions, air purifiers, fitness trackers, electric scooters, clothing, and now cars.

Xiaomi’s SU7 Ultra set the Nürburgring record for fastest mass-produced electric vehicle last year, beating out Rimac and Porsche. It recently partnered with the Sei blockchain to preinstall crypto wallets on its devices across Europe, Latin America, and Southeast Asia. The company’s market cap sits around $137 billion.

So when Xiaomi drops an AI model, maybe we should pay attention.

On March 18, the company’s dedicated AI research arm released three models at once: MiMo-V2-Pro, MiMo-V2-Omni, and a text-to-speech model. The new MiMo generation had actually debuted in December 2025, when the company quietly dropped MiMo-V2-Flash, a capable 309-billion-parameter mixture-of-experts model. Almost no one outside the Chinese AI community paid attention, and the Western tech press mostly shrugged.

Then, on March 11, an anonymous 1-trillion-parameter model called “Hunter Alpha” appeared on OpenRouter with no developer attribution. The model climbed to the top of OpenRouter’s leaderboard, surpassed one trillion tokens in total usage, and immediately triggered widespread speculation that it was DeepSeek’s unreleased V4.

The anticipation for that model had been building for weeks, with insiders claiming it would outperform both Claude and ChatGPT on coding tasks.

It wasn’t DeepSeek.

On March 18, Luo Fuli, head of Xiaomi’s MiMo division and a former DeepSeek researcher, revealed Hunter Alpha was an early internal test build of MiMo-V2-Pro. Xiaomi’s stock jumped 5.8%. “I call this a quiet ambush,” Luo wrote on X.

MiMo-V2-Pro & Omni & TTS is out. Our first full-stack model family built truly for the Agent era.

I call this a quiet ambush — not because we planned it, but because the shift from Chat to Agent paradigm happened so fast, even we barely believed it. Somewhere in between was a…

— Fuli Luo (@_LuoFuli) March 18, 2026

MiMo boasts over one trillion total parameters, 42 billion active per request via a mixture-of-experts setup. A hybrid attention mechanism running at a 7:1 ratio handles a context window up to one million tokens. A built-in multi-token prediction layer speeds up generation by predicting multiple tokens per step, rather than one at a time. It is currently closed source, though Xiaomi has left the door open on a potential future release.

On the Artificial Analysis Intelligence Index, MiMo-V2-Pro ranks eighth worldwide and second among Chinese models, trailing only GLM-5. On SWE-bench Verified—real-world software engineering tasks—it scores 78%, against Claude Opus 4.6’s 80.8% and Claude Sonnet 4.6’s 79.6%.

On ClawEval, the agentic benchmark tied to the OpenClaw framework, it hits 61.5, approaching Opus 4.6’s 66.3. On PinchBench, it sits third globally at 81.0, just behind Opus 4.6 (81.5) and its sibling MiMo-V2-Omni (81.2).

MiMo-V2-Pro costs $1 per million input tokens and $3 per million output tokens, up to 256K context. Claude Sonnet 4.6 runs $3 per million input and $15 per million output (Opus 4.6 is $5/$25). For developers building agentic systems at scale, those numbers are not a footnote.
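To make the gap concrete, here is a minimal sketch that prices one job at the per-million-token rates quoted above. The workload size (10 million input tokens, 2 million output tokens) is a hypothetical figure chosen for illustration, not a measurement from our tests.

```python
# Prices in USD per million tokens (input, output), as quoted in this review.
# MiMo-V2-Pro's rate applies up to 256K context.
PRICES = {
    "MiMo-V2-Pro": (1.0, 3.0),
    "Claude Sonnet 4.6": (3.0, 15.0),
    "Claude Opus 4.6": (5.0, 25.0),
}


def job_cost(model, input_tokens, output_tokens):
    """Return the USD cost of a job for the given model and token counts."""
    in_price, out_price = PRICES[model]
    return (input_tokens / 1_000_000) * in_price + (output_tokens / 1_000_000) * out_price


# Hypothetical workload, for illustration only.
INPUT_TOKENS = 10_000_000
OUTPUT_TOKENS = 2_000_000

for model in PRICES:
    print(f"{model}: ${job_cost(model, INPUT_TOKENS, OUTPUT_TOKENS):,.2f}")
```

On that assumed workload the script prints $16.00 for MiMo-V2-Pro, $60.00 for Claude Sonnet 4.6, and $100.00 for Claude Opus 4.6.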

The Omni sibling handles vision, audio, and video natively—not as bolted-on modules, but trained end-to-end as a unified perceptual system. The demo showing it analyzing dashcam footage as a real-time autonomous driving brain was, frankly, impressive. It’s genuinely multimodal in a way that most “omni” models only claim to be.

Testing the model

Of course, we tested MiMo-V2-Pro to find out how good it is. Here’s what actually happened. The outputs will be available in our GitHub repository.

Creative writing

We gave MiMo-V2-Pro a single creative writing prompt: a time travel story anchored to Mesoamerican history, with a specific protagonist, a cultural identity to honor, and a philosophical paradox about how time cannot be changed.

The model returned over 3,000 words: a proper title, five full chapters and the structural discipline you’d expect from a draft that had been through an editor. It even wrote an epilogue.

It is, without question, the longest and richest piece of creative prose we have gotten from any model, with the sole exception of Longwriter, a specialized but now dated model built from the ground up for long-form generation, which is a very different category of competition.

The writing itself was rich, descriptive, and vivid. The opening paragraph starts building the image of the entire scene. MiMo-V2-Pro embeds realistic detail to make the story believable.

Unlike other models such as Grok, it didn’t just set a scene in a place—in this case, ancient Mexico. It understood what ancient Mesoamerica smelled like, and built the mood from the ground up using native words, realistic descriptions, and good contextual cues.

Dialogue sits inside the narrative the way it does in literary fiction, rather than being embedded into paragraphs as most current models do.

Another thing worth noticing is that the paradox—arguably the core element of the story—wasn’t purely intellectual, but emotional. The whole arc is resolved without a lecture. The final lines stick the landing the way good fiction is supposed to: not by explaining the theme, but by making you feel it.

“Outside, the rain began. It fell on the spiraling towers and the restored lakes and the ancient ground of Tlachinollan, where, buried in volcanic soil under the weight of a thousand years, a black rectangle waited with the patience of something that already knew how the story ended.”

The cultural specificity (mentions of cara de luna, maguey fiber, the temazcal tradition, and the Nahuatl names used in the story) is consistent and never decorative. The time travel paradox is actually argued, not just nodded at. For creative writing use cases, MiMo-V2-Pro just put itself on a very short list, and in our opinion is by far the best and richest model available, beating Claude Opus 4.6 easily.

The full story is available here.

Coding

The benchmark numbers point to coding as MiMo-V2-Pro’s strongest suit, and the hands-on experience backs that up. We asked it to build our usual stealth game from a single prompt, and it shipped a working game on the first try.

Not “working” simply in the sense of technically running, but working in the sense that the logic held, the screens made sense, and the visual design was actually good. That combination—correctness and aesthetics—is where most models fall apart. They get one or the other, but usually not both.

It also chose a 2.5D aesthetic instead of the usual 2D style that other models went with. This design choice made the game more visually appealing without altering its core concept.

We followed up with small improvements. Adding sound and MIDI music to a running 3D game has broken previous models mid-generation: the code base gets too large, the context loses the thread, and models either end up in a loop or freeze. MiMo-V2-Pro added both and kept the whole thing coherent. The music matched the game’s tone, while the screens matched the game’s visual identity.

We enjoyed playing it, though if we’re honest, more for how it looked than how it challenged us. The difficulty scaled with the number of opponents rather than level design—the robot and the PC spawned in the same positions every round. That’s a design choice, not a bug.

Still, for a single-prompt, zero-iteration output, it will do the job.

You can play the game by clicking on this link.

Logic and common sense

We asked MiMo-V2-Pro to act as a legal expert and answer whether it’s lawful for a man to marry his widow’s sister under Falkland Islands law. This is a tricky question that aims to evaluate the model’s reasoning.

The final answer was wrong, but the reason why is the interesting part. The model’s chain of thought correctly caught the linguistic trap in the prompt: “if a man has a widow, that means he’s deceased,” it said, so the question is technically nonsensical.

It identified the flaw and decided the user most likely meant his “deceased wife’s sister.” It then proceeded to answer that reframed question rather than flagging the original as unanswerable.

“Based on my analysis of the legal framework governing the Falkland Islands, the answer to your question is yes, it is legal for a man to marry the sister of his deceased wife,” the model wrote. “The phrasing ‘marry his widow’s sister’ contains a logical contradiction. If a man has a ‘widow,’ he is deceased and cannot remarry. The correct legal question is whether a man may marry the sister of his deceased wife (i.e., his late wife’s sister). This relationship is one of affinity (created by marriage) rather than consanguinity (blood relation),” it concluded.

The reasoning was sound. The decision to quietly swap the premise instead of surfacing the contradiction was not.

This is why transparency in reasoning outputs is important. We only know this because Xiaomi exposes the full chain of thought (OpenAI doesn’t). When a model reasons incorrectly in a hidden chain of thought and confidently delivers a wrong answer, you have no visibility into where it went sideways or how to correct it.

Math

Math is where MiMo-V2-Pro showed its ceiling.

We asked our usual benchmark question from FrontierMath: “Construct a degree 19 polynomial p(x) ∈ C[x] such that X := {p(x) = p(y)} ⊂ P1 × P1 has at least 3 (but not all linear) irreducible components over C. Choose p(x) to be odd, monic, have real coefficients and linear coefficient -19 and calculate p(19)”

The model hit two full freezes and burned through a significant token budget without producing a reply.

When it did eventually answer on the third attempt, it reasoned through the problem step by step… and still got it wrong. The correct answer was 1876572071974094803391179; it answered p(19) = 164,079,552,964,661, and then 2,012,379,925,093,098,998 on a follow-up question asking it to correct itself.
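For readers who want to check the reference answer themselves, one classical family that fits the stated constraints is the Dickson (Chebyshev-type) polynomials, defined by D_0 = 2, D_1 = x, and D_k = x·D_(k-1) − D_(k-2). Treating p = D_19 as the intended construction is our assumption; the benchmark prompt does not name it. The sketch below only confirms that this candidate is odd, monic, has linear coefficient -19, and evaluates at 19 to the value quoted above. It does not verify the irreducible-components condition.

```python
def dickson_coeffs(n):
    """Integer coefficients (lowest degree first) of D_n, where
    D_0 = 2, D_1 = x, and D_k = x * D_{k-1} - D_{k-2}."""
    prev, cur = [2], [0, 1]
    if n == 0:
        return prev
    for _ in range(n - 1):
        shifted = [0] + cur  # multiply D_{k-1} by x
        padded = prev + [0] * (len(shifted) - len(prev))
        prev, cur = cur, [a - b for a, b in zip(shifted, padded)]
    return cur


def evaluate(coeffs, x):
    """Evaluate a polynomial at x using Horner's rule (exact for integers)."""
    result = 0
    for c in reversed(coeffs):
        result = result * x + c
    return result


# Assumption: the candidate polynomial is D_19.
p = dickson_coeffs(19)

is_monic = p[19] == 1
is_odd = all(c == 0 for c in p[0::2])  # every even-degree coefficient vanishes
linear_coefficient = p[1]
value_at_19 = evaluate(p, 19)

print(is_monic, is_odd, linear_coefficient)
print(value_at_19)  # 1876572071974094803391179
```

Python integers are arbitrary precision, so the evaluation is exact rather than a floating-point approximation.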

In general, it is fine for normal and even harder math problems, but frontier math is not its strong suit, at least not yet. Using the agentic feature instead of the pure LLM may yield better results.

Agentic features

Xiaomi is following the same playbook as MiniMax and Kimi, and provides a one-click OpenClaw integration that spins up a preconfigured cloud instance with MiMo-V2-Pro as the underlying model. No API setup, no VPS, no skill configuration, no hour-long troubleshooting session before you even run your first task. You click, it works.

The demo environment runs for 30 minutes and then destroys itself—which is a real limitation, but also an honest one. For developers already comfortable with agentic infrastructure, this adds nothing. For everyone else, it’s the most frictionless on-ramp to agentic AI you could ask for.

Conclusion

All things considered, MiMo-V2-Pro is a serious model, and we really enjoyed tinkering around with it. It’s not perfect—the math ceiling is real, the chain of thought transparency surfaced a reasoning flaw that a less open model would have buried, and the token consumption during hard reasoning tasks adds up fast.

If you care about costs, Xiaomi’s pricing is aggressive: a fraction of what Claude Opus or the latest OpenAI and Google models cost. And the model is more capable than GLM or MiniMax in the areas that matter most for creative and agentic work.

Creative professionals in particular stand to gain a lot here—possibly more than they would from Anthropic right now.

This model thinks expensively, and that is the trade-off. If you’re running high-volume agentic pipelines, watch the token burn, even though you may end up spending less than you would with Claude. If you’re doing rich, open-ended work where output quality is the metric, then MiMo-V2-Pro earns its place on the shortlist.
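As a rough guide to how much token burn the price gap can absorb, the sketch below computes how many times more output tokens MiMo-V2-Pro could emit before a job costs as much as it would on Claude Sonnet 4.6, at the list prices quoted in this review. The workload is a hypothetical example, and the calculation assumes input tokens are identical on both models, which may not hold in agentic loops where earlier output is fed back in as input.

```python
def breakeven_output_multiplier(input_tokens, output_tokens, cheap, pricey):
    """Return how many times more output tokens the cheaper model can emit
    before the job costs as much as on the pricier model.

    `cheap` and `pricey` are (input_price, output_price) per million tokens;
    the per-million scaling cancels out in the ratio.
    """
    pricey_cost = input_tokens * pricey[0] + output_tokens * pricey[1]
    cheap_input_cost = input_tokens * cheap[0]
    return (pricey_cost - cheap_input_cost) / (output_tokens * cheap[1])


MIMO_V2_PRO = (1, 3)      # USD per million tokens, as quoted in this review
CLAUDE_SONNET = (3, 15)

# Hypothetical workload: 10M input tokens, 2M output tokens on Sonnet.
multiplier = breakeven_output_multiplier(10_000_000, 2_000_000, MIMO_V2_PRO, CLAUDE_SONNET)
print(f"Break-even output multiplier: {multiplier:.2f}x")
```

On that assumed workload the break-even sits at roughly 8.3 times the output tokens. Counting output tokens alone, the ratio is simply 15/3, or 5 times.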