Capability
Multi-Model Routing Parameter Inference
16 artifacts provide this capability.
Top Matches
via “sparse-mixture-of-experts-token-routing”
Mistral's mixture-of-experts model with efficient routing.
Unique: Routes each token to 2 of 8 experts per layer, with the experts and the router trained jointly, activating about 27.6% of total parameters while maintaining dense-model performance. Differs from dense models (which activate every parameter for every token) and from other MoE designs that route at the sequence or document level by making a learned routing decision per token.
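A minimal sketch of the per-token top-2-of-8 routing described above, written in PyTorch. The layer sizes, module names, and the simple loop over experts are illustrative assumptions for readability, not the model's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoELayer(nn.Module):
    """Illustrative MoE layer: each token is routed to its top-2 of 8 experts."""
    def __init__(self, d_model=64, d_ff=256, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)  # learned per-token router
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])
        self.top_k = top_k

    def forward(self, x):                                 # x: (tokens, d_model)
        logits = self.router(x)                           # (tokens, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)    # top-2 experts per token
        weights = F.softmax(weights, dim=-1)              # renormalize over the chosen 2
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                     # tokens whose k-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * expert(x[mask])
        return out

tokens = torch.randn(10, 64)
print(Top2MoELayer()(tokens).shape)  # torch.Size([10, 64])
```

Because both the router weights and the expert weights receive gradients through the weighted sum, training them jointly lets the router learn which experts suit which tokens while the experts specialize.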
vs others: Achieves 6x faster inference than Llama 2 70B with equivalent performance by activating only 12.9B parameters per token, whereas dense models must activate all parameters regardless of task complexity.
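As a quick sanity check on the figures above, the total parameter count implied by 12.9B active parameters and 27.6% utilization can be derived directly; the total computed below is inferred from those two numbers, not quoted from the listing.

```python
# Derive the implied total parameter count from the figures quoted above.
active_params_b = 12.9   # parameters activated per token, in billions
utilization = 0.276      # stated fraction of parameters used per token
total_params_b = active_params_b / utilization
print(f"Implied total: ~{total_params_b:.1f}B parameters")  # ~46.7B
```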