Model Compression Through Pruning And Structured Sparsity Support

1

TensorFlow Lite | Framework | 58/100

Lightweight ML inference for mobile and edge devices.

Unique: Runtime support for pruned and sparsified models. Kernels skip zero-valued weights and read sparse tensor formats, enabling compression beyond quantization for models trained with sparsity constraints.

vs others: Complementary to quantization for additional compression. However, it requires training-time support and a standardized sparse tensor format, neither of which is fully documented.
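
To make "skip zero-valued weights" concrete, here is a minimal sketch of a compressed sparse row (CSR) layout and a matrix-vector product that only touches stored nonzeros. It is an illustration in plain Python, not TensorFlow Lite's actual kernel or on-disk format; the function names `dense_to_csr` and `csr_matvec` are hypothetical.

```python
def dense_to_csr(matrix):
    """Convert a dense row-major matrix (list of lists) into CSR arrays."""
    values, col_indices, row_ptr = [], [], [0]
    for row in matrix:
        for col, weight in enumerate(row):
            if weight != 0:
                values.append(weight)
                col_indices.append(col)
        # row_ptr[r + 1] marks where row r's nonzeros end in `values`.
        row_ptr.append(len(values))
    return values, col_indices, row_ptr


def csr_matvec(values, col_indices, row_ptr, x):
    """Compute y = W @ x, visiting only the stored nonzero weights."""
    y = []
    for r in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += values[k] * x[col_indices[k]]
        y.append(acc)
    return y


# A pruned 3x4 weight matrix: 9 of 12 entries are zero.
pruned_weights = [
    [0.0, 2.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 3.0],
    [0.0, 0.0, 0.0, 0.0],
]
values, col_indices, row_ptr = dense_to_csr(pruned_weights)
y = csr_matvec(values, col_indices, row_ptr, [1.0, 2.0, 3.0, 4.0])
```

Only three weights are stored and multiplied here instead of twelve. The saving depends on the sparsity level, since the index arrays add their own overhead.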

2

DeepSpeed | Framework | 57/100

via “model compression through pruning and distillation”

Microsoft's distributed training library — ZeRO optimizer, trillion-parameter scale, RLHF.

Unique: Combines structured pruning with knowledge distillation; supports both unstructured and structured sparsity patterns with automatic fine-tuning to recover accuracy

vs others: More integrated than separate pruning/distillation tools; automatic fine-tuning reduces manual tuning effort
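
As a rough illustration of how distillation can help a pruned model recover accuracy, the sketch below computes the classic soft-target distillation loss: a temperature-softened KL term against the teacher plus a cross-entropy term against the true label. This is the textbook formulation, not DeepSpeed's API; `distillation_loss`, `temperature`, and `alpha` are names assumed for the example.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Blend a soft-target KL term with hard-label cross-entropy."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student) on the temperature-softened distributions.
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    # Cross-entropy of the student against the true label (temperature 1).
    hard = -math.log(softmax(student_logits)[label])
    # The temperature**2 factor keeps the soft term's gradient scale comparable.
    return alpha * temperature ** 2 * kl + (1 - alpha) * hard


teacher = [2.0, 1.0, 0.1]
agreeing = distillation_loss([2.0, 1.0, 0.1], teacher, label=0)
disagreeing = distillation_loss([0.1, 1.0, 2.0], teacher, label=0)
soft_only = distillation_loss([2.0, 1.0, 0.1], teacher, label=0, alpha=1.0)
```

In a fine-tuning loop, the pruned model would play the student and the original dense model the teacher. The loss is lower when the student's logits agree with the teacher's.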

3

llmcompressor | Repository | 55/100

via “structured and unstructured pruning with layer-wise sparsity patterns”

Toolkit for LLM quantization, pruning, and distillation.

Unique: Implements layer-wise pruning through a modifier system that applies sparsity masks to specific layer patterns, supporting both structured (channel/head removal) and unstructured (weight removal) pruning with automatic importance estimation from calibration data

vs others: More flexible than magnitude-based pruning because it supports learned importance scores; more practical than gradient-based pruning because it doesn't require training; better integrated with vLLM than generic sparse tensor libraries
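
The idea of estimating importance from calibration data, without training, can be sketched as follows. The score used here is weight magnitude times the L2 norm of the matching input feature over a calibration set, similar in spirit to the Wanda heuristic. This is an assumption chosen for illustration, not a description of llmcompressor's modifiers, and all function names are hypothetical.

```python
import math


def feature_norms(calibration_inputs):
    """L2 norm of each input feature across the calibration samples."""
    n_features = len(calibration_inputs[0])
    return [math.sqrt(sum(sample[j] ** 2 for sample in calibration_inputs))
            for j in range(n_features)]


def prune_unstructured(weights, norms, sparsity):
    """Zero the lowest-scoring fraction of weights in each output row."""
    pruned = []
    for row in weights:
        scores = [abs(w) * n for w, n in zip(row, norms)]
        n_prune = int(len(row) * sparsity)
        drop = set(sorted(range(len(row)), key=lambda j: scores[j])[:n_prune])
        pruned.append([0.0 if j in drop else w for j, w in enumerate(row)])
    return pruned


def prune_channels(weights, norms, n_channels):
    """Structured variant: zero whole input channels (columns)."""
    n_cols = len(weights[0])
    col_scores = [sum(abs(row[j]) for row in weights) * norms[j]
                  for j in range(n_cols)]
    drop = set(sorted(range(n_cols), key=lambda j: col_scores[j])[:n_channels])
    return [[0.0 if j in drop else w for j, w in enumerate(row)]
            for row in weights]


weights = [[0.5, -2.0, 0.1, 1.0],
           [1.5, 0.2, -0.3, 0.05]]
calibration_inputs = [[1.0, 0.1, 4.0, 1.0],
                      [1.0, 0.1, 3.0, 1.0]]
norms = feature_norms(calibration_inputs)
unstructured = prune_unstructured(weights, norms, sparsity=0.5)
structured = prune_channels(weights, norms, n_channels=1)
```

In this toy example the largest-magnitude weight (-2.0) is pruned because its input feature is almost always near zero. Pure magnitude-based pruning would have kept it.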

4

torch | Framework | 28/100

via “sparse tensor operations and structured sparsity support”

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Unique: Supports multiple sparse tensor formats (COO, CSR, CSC) with structured sparsity patterns (N:M, block sparsity) that leverage hardware acceleration. Integrates with quantization and pruning for model compression.

vs others: More flexible than hardware-specific sparse libraries because it abstracts format differences, while more efficient than dense computation for sparse models because it leverages sparse tensor cores.
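
The N:M pattern mentioned above can be illustrated with a small sketch: in every group of M consecutive weights, only N are kept, so the tensor can be stored as the kept values plus their positions inside each group. The code below is plain Python rather than the torch API, selecting by magnitude is an assumption, and real hardware formats pack the position metadata differently. The helper names are hypothetical.

```python
def compress_n_m(row, n=2, m=4):
    """Keep the n largest-magnitude weights in each group of m."""
    if len(row) % m != 0:
        raise ValueError("row length must be a multiple of m")
    values, indices = [], []
    for start in range(0, len(row), m):
        group = row[start:start + m]
        by_magnitude = sorted(range(m), key=lambda i: abs(group[i]),
                              reverse=True)
        keep = sorted(by_magnitude[:n])  # in-group positions, ascending
        values.extend(group[i] for i in keep)
        indices.extend(keep)
    return values, indices


def decompress_n_m(values, indices, length, n=2, m=4):
    """Rebuild the dense row, with zeros where weights were dropped."""
    dense = [0.0] * length
    for k, (value, index) in enumerate(zip(values, indices)):
        group = k // n  # every group contributes exactly n stored values
        dense[group * m + index] = value
    return dense


row = [0.9, -0.1, 0.05, -1.2, 0.3, 0.0, -0.7, 0.2]
values, indices = compress_n_m(row)            # 2:4 sparsity
dense = decompress_n_m(values, indices, len(row))
```

Because every group holds exactly N stored values, the layout is regular. That regularity is what lets hardware accelerate it, unlike arbitrary unstructured sparsity.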
