Capability

On Device Compact Model Inference

15 artifacts provide this capability.

Top Matches

via “efficient inference on edge devices through quantization and model optimization”

text-generation model. 10,053,835 downloads.

Unique: At 4B parameters, Qwen3-4B is small enough to target edge deployment. It supports multiple quantization formats (GPTQ, AWQ, GGML), giving flexibility across deployment targets. Grouped query attention reduces KV cache size by 4-8x compared to standard multi-head attention.
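The KV cache claim can be checked with simple arithmetic: the cache stores one key and one value tensor per layer, and its size scales with the number of key/value heads. The sketch below is a rough estimate, not a measurement. The layer count, head counts, head dimension, and sequence length are illustrative assumptions and are not taken from this listing; substitute the values from the model's own config before relying on the result.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Estimate KV cache size for one sequence.

    Two tensors (keys and values) are cached per layer, each holding
    num_kv_heads * head_dim values per token. bytes_per_value=2 assumes
    16-bit cache entries.
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value


# Illustrative configuration (assumed values, not confirmed by this listing).
NUM_LAYERS = 36
NUM_QUERY_HEADS = 32   # standard multi-head attention caches one K/V head per query head
NUM_KV_HEADS = 8       # grouped query attention shares each K/V head across 4 query heads
HEAD_DIM = 128
SEQ_LEN = 4096

mha_bytes = kv_cache_bytes(NUM_LAYERS, NUM_QUERY_HEADS, HEAD_DIM, SEQ_LEN)
gqa_bytes = kv_cache_bytes(NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM, SEQ_LEN)
reduction = mha_bytes / gqa_bytes

print(f"multi-head attention cache: {mha_bytes / 2**30:.2f} GiB")
print(f"grouped query attention cache: {gqa_bytes / 2**30:.2f} GiB")
print(f"reduction: {reduction:.1f}x")
```

Under these assumptions the reduction equals the ratio of query heads to key/value heads, so a 4x saving corresponds to 4 query heads per key/value head and an 8x saving to 8.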

vs others: Smaller base model than 7B-class Llama models, which makes quantization more effective; better quality than TinyLlama at similar quantized size; requires less custom optimization than Phi-2 due to a more mature quantization ecosystem
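To make "similar quantized size" concrete, weight memory can be approximated as parameter count times bits per weight. The sketch below is a back-of-the-envelope estimate under stated assumptions: the parameter counts are nominal round figures (4B for Qwen3-4B, 1.1B for TinyLlama), and the calculation ignores quantization metadata such as scales and zero points, as well as activations and the KV cache.

```python
def weight_bytes(num_params, bits_per_weight):
    """Approximate memory for model weights alone, ignoring quantization metadata."""
    return num_params * bits_per_weight // 8


# Nominal parameter counts (assumed round figures for illustration).
QWEN3_4B_PARAMS = 4_000_000_000
TINYLLAMA_PARAMS = 1_100_000_000

qwen_4bit = weight_bytes(QWEN3_4B_PARAMS, 4)         # 4-bit quantized weights
tinyllama_fp16 = weight_bytes(TINYLLAMA_PARAMS, 16)  # unquantized 16-bit weights

print(f"4B model at 4-bit:    {qwen_4bit / 1e9:.1f} GB")
print(f"1.1B model at 16-bit: {tinyllama_fp16 / 1e9:.1f} GB")
```

By this estimate a 4B model at 4 bits and a 1.1B model at 16 bits land in roughly the same memory budget, which is one way to read the comparison above. Real file sizes vary with the quantization format and group size.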
