Capability
Multimodal Knowledge Distillation And Compression
10 artifacts provide this capability.
Top Matches (via "knowledge distillation for model compression")
Text-generation model. 14,205,413 downloads.
Unique: Transfers knowledge from a larger teacher (GPT-2) to a smaller student by matching soft targets, preserving linguistic knowledge while reducing parameter count; complementary to quantization for extreme compression (see the sketch after these notes).
vs others: More effective than quantization alone at large compression ratios (5-10x), but requires training, unlike quantization's post-hoc approach; best combined with quantization for maximum compression.
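The soft-target matching mentioned above typically reduces to a weighted sum of a temperature-softened KL-divergence term against the teacher's logits and a standard cross-entropy term against the hard labels. Below is a minimal sketch assuming PyTorch; the function name, temperature, and alpha weighting are illustrative choices, not taken from this artifact.

```python
# Minimal soft-target distillation loss (illustrative sketch, assumes PyTorch).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend softened teacher/student KL divergence with hard-label cross entropy.

    Logits are expected as [N, vocab]; for language modeling, flatten
    [batch, seq, vocab] to [batch * seq, vocab] first.
    """
    # Soft targets: match the student's softened distribution to the teacher's.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)  # rescale so gradients keep their usual magnitude
    # Hard targets: standard cross entropy against the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss

# Usage sketch: the teacher (e.g. GPT-2) runs in eval mode without gradients;
# only the smaller student is updated.
#   with torch.no_grad():
#       teacher_logits = teacher(input_ids).logits
#   loss = distillation_loss(student(input_ids).logits, teacher_logits, labels)
```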