Meta: Llama 4 Maverick (Model, 24/100) via “multimodal instruction-following with mixture-of-experts routing”
Llama 4 Maverick 17B Instruct (128E) is a high-capacity multimodal language model from Meta, built on a mixture-of-experts (MoE) architecture with 128 experts and 17 billion active parameters per forward pass.
Unique: Uses a 128-expert MoE architecture with dynamic token routing, so only 17B parameters are active per token rather than the full width of a dense 70B+ model, and handles multimodal input through early fusion of text and vision tokens rather than cross-attention adapter layers. The sparse activation pattern is learned end-to-end during training, allowing experts to self-organize across text, vision, and fusion tasks.
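To make the routing idea concrete, here is a minimal top-k MoE feed-forward layer in PyTorch. The dimensions, expert count, and top_k value are illustrative placeholders, not Llama 4 Maverick's actual configuration or Meta's implementation; it is a sketch of the general technique.

```python
# Minimal sketch of top-k expert routing in a mixture-of-experts layer.
# All sizes are illustrative, not Llama 4 Maverick's real configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKMoE(nn.Module):
    def __init__(self, d_model=64, d_ff=128, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)  # learned gating network
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x):  # x: (num_tokens, d_model)
        logits = self.router(x)                            # (num_tokens, num_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)     # each token picks its top_k experts
        weights = F.softmax(weights, dim=-1)               # normalize gate scores over chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                      # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * expert(x[mask])
        return out


tokens = torch.randn(16, 64)       # 16 token embeddings
layer = TopKMoE()
print(layer(tokens).shape)         # torch.Size([16, 64])
```

In production MoE implementations the per-expert Python loops are typically replaced by batched dispatch and combined with a load-balancing auxiliary loss, but the core idea is the same: each token's gate scores decide which experts run, so only a fraction of the total parameters is exercised per token.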
vs others: More efficient than dense multimodal models such as LLaVA or GPT-4V because per-token conditional computation activates only the routed experts, reducing latency and serving cost while maintaining instruction-following quality across modalities.
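A rough back-of-the-envelope comparison of the efficiency claim, assuming the common ~2 FLOPs-per-parameter-per-token approximation and a 70B dense comparator (both assumptions for illustration, not published figures):

```python
# Rough per-token compute comparison: sparse MoE vs. a dense baseline.
# The 2 * params rule of thumb and the 70B dense comparator are illustrative
# assumptions, not published Llama 4 Maverick benchmarks.
ACTIVE_PARAMS_MOE = 17e9   # parameters active per token (from the model description)
DENSE_PARAMS = 70e9        # assumed dense comparator size

flops_moe = 2 * ACTIVE_PARAMS_MOE     # ~34 GFLOPs per token
flops_dense = 2 * DENSE_PARAMS        # ~140 GFLOPs per token

print(f"MoE per-token compute:   {flops_moe / 1e9:.0f} GFLOPs")
print(f"Dense per-token compute: {flops_dense / 1e9:.0f} GFLOPs")
print(f"Relative cost: {flops_moe / flops_dense:.2f}x of the dense model")
```

Under these assumptions the sparse model does roughly a quarter of the per-token compute of the dense comparator, which is the source of the latency and serving-cost advantage described above.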