Gate News reports that on March 19, Xiaomi officially released the MiMo-V2 series of AI models: the flagship reasoning model MiMo-V2-Pro, the multimodal base model MiMo-V2-Omni, and the speech synthesis model MiMo-V2-TTS.
MiMo-V2-Pro has over 1 trillion total parameters (42B active), supports a 1-million-token context window, and is designed for agent workloads. It ranks eighth globally and second in China on the Artificial Analysis leaderboard, and third worldwide on both the PinchBench and ClawEval evaluations. Its overall performance surpasses Claude Sonnet 4.6 and approaches Opus 4.6, at only one-fifth the price: for contexts up to 256K, $1 per million input tokens and $3 per million output tokens; for contexts up to 1M, $2 per million input tokens and $6 per million output tokens. The MiMo Claw module has been integrated into the Kingsoft WebOffice ecosystem, with WPS Lingxi synchronization.
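As a rough illustration of the tiered pricing above, the following Python sketch estimates the cost of a single MiMo-V2-Pro request. The per-million-token prices come from the article; everything else is an assumption made for illustration: that "256K" and "1M" mean 256,000 and 1,000,000 tokens, that the tier is chosen by the number of input tokens in the request, and the names `Tier`, `PRO_TIERS`, and `estimate_cost_usd`, which are hypothetical and not part of any Xiaomi API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    """One pricing tier, in USD per million tokens."""
    max_context_tokens: int
    input_usd_per_million: float
    output_usd_per_million: float


# Prices as quoted in the article for MiMo-V2-Pro.
# Assumption: "256K" = 256,000 tokens and "1M" = 1,000,000 tokens.
PRO_TIERS = (
    Tier(256_000, 1.0, 3.0),
    Tier(1_000_000, 2.0, 6.0),
)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in USD.

    Assumption: the tier is selected by the input token count alone;
    the article does not say how the billing tier is determined.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    for tier in PRO_TIERS:
        if input_tokens <= tier.max_context_tokens:
            return (
                input_tokens * tier.input_usd_per_million
                + output_tokens * tier.output_usd_per_million
            ) / 1_000_000
    raise ValueError("input exceeds the 1M-token context window")


if __name__ == "__main__":
    # A request within the 256K tier, then one in the 1M tier.
    print(f"${estimate_cost_usd(100_000, 10_000):.2f}")
    print(f"${estimate_cost_usd(500_000, 10_000):.2f}")
```

Under these assumptions, a request with 100,000 input tokens and 10,000 output tokens comes to about $0.13, while one with 500,000 input tokens and the same output falls into the higher tier and comes to about $1.06.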
MiMo-V2-Omni is a multimodal base model that accepts text, image, audio, and video inputs, with a 256K context window. It is priced at $0.40 per million input tokens and $2 per million output tokens. In audio, it supports continuous understanding of more than 10 hours of audio and outperforms Gemini 3 Pro in comprehensive evaluation; in image understanding, it surpasses Claude Opus 4.6 and approaches Gemini 3 Pro.
MiMo-V2-TTS is built on a self-developed Audio Tokenizer and pretrained on over 100 million hours of speech data. It supports multi-granularity control, from overall style down to local emotion, can synthesize high-quality singing voices, and covers dialects and accents including Northeastern, Sichuan, Henan, Cantonese, and Taiwanese.
All three models are now integrated with Xiaomi miclaw, MiMo Studio, Kingsoft Office, and Xiaomi Browser. They can also be accessed through the OpenClaw, OpenCode, KiloCode, Blackbox, and Cline agent development frameworks, and are available free for one week.