Capability: Quantization-Aware Adapter Training With Frozen Base Weights
20 artifacts provide this capability.
Top Matches
via “model quantization for memory and latency reduction”
Text-generation model. 14,205,413 downloads.
Unique: supports both post-training quantization (no retraining) via bitsandbytes and quantization-aware training (better accuracy) via torch.quantization, with automatic calibration-dataset selection to minimize accuracy loss.
vs. others: faster and simpler than knowledge distillation (which requires training a smaller student model), but less accurate than distillation at extreme compression; best for 2-4x size reduction, not 10x+.
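To make the quantization-aware-training idea above concrete, here is a minimal, dependency-free sketch of the fake-quantization step that QAT inserts into the forward pass: values are rounded to the int8 grid and dequantized back to float, so the training loss sees the quantization error. This is an illustrative pure-Python version of the standard affine int8 scheme, not the artifact's actual implementation; the variable names and the calibration range are assumptions.

```python
def fake_quantize(x, scale, zero_point=0, qmin=-128, qmax=127):
    """Round a float to the int8 grid and back, as QAT does each forward pass."""
    q = round(x / scale) + zero_point
    q = max(qmin, min(qmax, q))          # clamp to the int8 range
    return (q - zero_point) * scale

# Calibration: derive a per-tensor scale from an observed activation range
# (this is the role the "calibration dataset" plays).
observed_min, observed_max = -1.0, 1.0
scale = (observed_max - observed_min) / 255.0   # 255 int8 steps

# In-range values survive with at most half a quantization step of error;
# out-of-range values are clipped.
print(fake_quantize(0.5, scale))    # ~0.502
print(fake_quantize(1.5, scale))    # clipped to ~0.996
```

Because the rounding and clipping happen inside the training loop, the model learns weights that stay accurate under int8 precision, which is why QAT typically beats post-training quantization on accuracy.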