Capability

Multi Model Routing Parameter Inference

16 artifacts provide this capability.


Top Matches

via “sparse-mixture-of-experts-token-routing”

Mistral's mixture-of-experts model with efficient routing.

Unique: Routes each token to 2 of 8 experts per layer, with the experts and the router trained jointly. Only about 27.6% of the parameters are active for any given token, while quality is reported to stay on par with dense models. Unlike dense models, which activate every parameter, and unlike MoE designs that route at the sequence or document level, the routing decision here is learned and made per token.

vs others: Reported to run inference about 6x faster than Llama 2 70B at comparable performance by activating only 12.9B parameters per token, whereas a dense model activates all of its parameters for every token.
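
Example: the per-token 2-of-8 routing described above can be sketched in a few lines of plain Python. This is an illustrative toy, not the model's actual implementation. The names `top_k_route`, `moe_layer`, and `make_toy_expert` are hypothetical, the experts are stand-in scaling functions, and normalizing with a softmax over only the selected experts' scores is an assumption.

```python
import math

NUM_EXPERTS = 8  # from the description: 8 experts per layer
TOP_K = 2        # from the description: 2 experts chosen per token


def top_k_route(router_logits, k=TOP_K):
    """Pick the k highest-scoring experts and return [(index, weight), ...].

    Weights are a softmax over the selected logits only (an assumption),
    so they sum to 1 and unselected experts contribute nothing.
    """
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    peak = max(router_logits[i] for i in top)  # subtract max for stability
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_layer(token, router, experts, k=TOP_K):
    """Route one token vector and mix the outputs of its k chosen experts.

    `router` is a list of weight rows, one row per expert; the score of an
    expert is the dot product of its row with the token.
    """
    logits = [sum(w * x for w, x in zip(row, token)) for row in router]
    out = [0.0] * len(token)
    for idx, weight in top_k_route(logits, k):
        expert_out = experts[idx](token)  # only the chosen experts run
        out = [o + weight * y for o, y in zip(out, expert_out)]
    return out


def make_toy_expert(scale):
    """Hypothetical stand-in for an expert network: scales its input."""
    return lambda token: [scale * x for x in token]


if __name__ == "__main__":
    router = [[float(i), 0.0] for i in range(NUM_EXPERTS)]
    experts = [make_toy_expert(i + 1) for i in range(NUM_EXPERTS)]
    print(moe_layer([1.0, 0.0], router, experts))
```

Because only the two selected experts are evaluated for each token, the compute per token scales with the active parameters rather than with the total parameter count, which is the source of the inference speedup claimed above.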

