Tech

Xiaomi claims breakthrough in AI inference speed with MiMo-V2.5-Pro

The Chinese technology giant, working with TileRT, attributes the performance to extreme model-system codesign rather than specialised accelerators.

Author

Owen Mercer

Markets and Finance Editor

Published

Draft

Source: Hacker News · original

Xiaomi says UltraSpeed mode lets trillion-parameter model generate more than 1,000 tokens per second on commodity GPUs

Xiaomi has announced the release of the UltraSpeed mode for its MiMo-V2.5-Pro large language model, a development achieved in collaboration with TileRT. The update allows the 1-trillion-parameter model to generate text at more than 1,000 tokens per second, a milestone the company says has not previously been reached on commodity graphics processing units (GPUs).

The performance claim rests on what Xiaomi describes as extreme model-system codesign. On this account, the speed comes from tight integration between the model architecture and the underlying system software, rather than from hardware upgrades or specialised AI accelerators. By optimising how the model runs on standard commercial hardware, the firm aims to lower the barriers to deploying large-scale AI models.

Commodity GPUs are standard, commercially available graphics processing units, as opposed to the custom-built or specialised AI accelerators often used for high-performance computing. Token generation speed is a key metric for judging how efficient and responsive a large language model is in production. High throughput on widely available hardware could significantly affect deployment costs and scalability.
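To put the headline figure in practical terms, the short sketch below converts a generation rate into the time a user would wait for a reply. It is simple arithmetic rather than anything taken from Xiaomi's announcement: the 500-token reply length and the 50 tokens-per-second comparison rate are illustrative assumptions, and the calculation ignores prompt processing and network delay.

```python
def generation_time_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to generate num_tokens at a steady rate, ignoring prompt processing."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


# Hypothetical reply length, chosen only for illustration.
REPLY_TOKENS = 500

# 50 tokens/s is an assumed comparison rate; 1,000 tokens/s is the rate Xiaomi claims.
for rate in (50.0, 1000.0):
    seconds = generation_time_seconds(REPLY_TOKENS, rate)
    print(f"{rate:g} tokens/s: {seconds:g} s for a {REPLY_TOKENS}-token reply")
```

At the claimed rate, each token takes about one millisecond, so a 500-token reply would finish in roughly half a second, against ten seconds at the assumed slower rate.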

The announcement was published via the Xiaomi MiMo blog and reported on Hacker News, where the technical details of the UltraSpeed mode were highlighted. The source material does not specify the exact hardware specifications of the commodity GPUs used in the benchmark, nor does it elaborate on the technical methodology behind the extreme model-system codesign.

It remains unclear whether the reported speed is sustained under varying load conditions or achieved only in controlled benchmark environments. As with any performance claim involving new software modes, independent verification against established benchmarks would be needed to confirm that the results can be reproduced across different operational contexts.
