MiniMax: MiniMax M2, Model 25/100: “efficient inference via sparse expert routing”
MiniMax-M2 is a compact, high-efficiency large language model optimized for end-to-end coding and agentic workflows. With 10 billion activated parameters (230 billion total), it delivers near-frontier intelligence across general reasoning,...
Unique: Implements conditional computation via sparse expert routing, activating only 10B of the 230B total parameters per token; this cuts inference cost and latency relative to dense models while specialized expert pathways keep output quality competitive (a minimal routing sketch follows).
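To make the routing idea concrete, below is a minimal PyTorch sketch of top-k sparse expert routing, the general Mixture-of-Experts technique this entry describes. The layer sizes, expert count, and k=2 are illustrative assumptions, not MiniMax-M2's actual configuration or code.

```python
# Minimal sketch of top-k sparse expert routing (Mixture-of-Experts).
# All sizes (d_model, d_ff, n_experts, k) are hypothetical examples.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    def __init__(self, d_model: int, d_ff: int, n_experts: int, k: int = 2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)  # gating network
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (n_tokens, d_model)
        logits = self.router(x)                      # (n_tokens, n_experts)
        weights, idx = logits.topk(self.k, dim=-1)   # keep only top-k experts per token
        weights = F.softmax(weights, dim=-1)         # renormalize over selected experts
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = idx == e                          # tokens routed to expert e
            token_ids, slot = mask.nonzero(as_tuple=True)
            if token_ids.numel() == 0:
                continue                             # expert inactive for this batch
            out[token_ids] += weights[token_ids, slot].unsqueeze(-1) * expert(x[token_ids])
        return out

# Only k of n_experts expert MLPs run per token, so activated parameters
# (and FLOPs) are a small fraction of the total parameter count.
layer = SparseMoELayer(d_model=64, d_ff=256, n_experts=8, k=2)
y = layer(torch.randn(10, 64))
print(y.shape)  # torch.Size([10, 64])
```

The point of the sketch is the conditional-computation pattern itself: every token pays for k expert MLPs rather than one giant dense feed-forward, which is how a 230B-parameter model can run at roughly the per-token compute of a 10B one.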
vs others: Achieves a 60-70% inference cost reduction relative to 70B dense models while maintaining comparable quality through expert specialization; more cost-efficient than full-scale frontier models (GPT-4, Claude) for cost-sensitive production deployments.
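A back-of-envelope check of that cost claim, under the simplifying assumption that per-token compute scales with activated parameters (real serving cost also depends on memory bandwidth, batching, and routing overhead):

```python
# Rough arithmetic only; assumes cost is proportional to activated parameters.
active_sparse = 10e9   # MiniMax-M2 activated parameters per token
dense_params = 70e9    # reference dense model from the comparison above

compute_ratio = active_sparse / dense_params
print(f"activated-parameter ratio: {compute_ratio:.0%}")  # ~14%
# Pure compute would suggest savings of up to ~86%; the more conservative
# 60-70% figure quoted above plausibly reflects overheads such as holding
# all 230B weights in memory and the cost of routing between experts.
```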