Model Quantization and Export to ONNX/TorchScript for Deployment

1

NVIDIA NeMo (Framework, 63/100)

via “model quantization and export to onnx/torchscript for deployment”

NVIDIA's framework for scalable generative AI training.

Unique: Integrates post-training quantization with ONNX/TorchScript export, supporting per-channel and per-layer quantization strategies. Exported models can be optimized with graph fusion and constant folding. Supports dynamic shapes for variable-length inputs, enabling flexible deployment scenarios.

vs others: More integrated with NeMo models than generic ONNX export tools, but less mature than TensorRT for NVIDIA-specific optimization; requires manual operator mapping for custom layers.
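The difference between per-layer and per-channel quantization mentioned above can be illustrated with a small sketch. This is not NeMo's implementation; it is plain Python with hypothetical weight values and helper names (`scale_for`, `quantize`, `roundtrip_error`), assuming symmetric int8 quantization.

```python
from typing import List


def scale_for(values: List[float]) -> float:
    """Symmetric scale that maps the largest magnitude to 127."""
    peak = max(abs(v) for v in values)
    return peak / 127 if peak > 0 else 1.0


def quantize(values: List[float], scale: float) -> List[int]:
    """Round to the nearest int8 step and clamp to [-127, 127]."""
    return [max(-127, min(127, round(v / scale))) for v in values]


def roundtrip_error(values: List[float], scale: float) -> float:
    """Largest absolute error after quantizing and dequantizing."""
    restored = [q * scale for q in quantize(values, scale)]
    return max(abs(v - r) for v, r in zip(values, restored))


# Hypothetical output channels with very different value ranges.
small_channel = [0.010, -0.020, 0.015]
large_channel = [4.0, -3.5, 2.0]

# Per-layer: one scale shared by every channel in the layer.
shared_scale = scale_for(small_channel + large_channel)
err_per_layer = roundtrip_error(small_channel, shared_scale)

# Per-channel: each channel gets its own scale.
err_per_channel = roundtrip_error(small_channel, scale_for(small_channel))

print(f"small channel error, per-layer scale:   {err_per_layer:.6f}")
print(f"small channel error, per-channel scale: {err_per_channel:.6f}")
```

With a shared scale, the small-range channel is dominated by the large one and most of its values collapse to zero. A per-channel scale avoids this at the cost of storing one scale per channel.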

2

PyTorch Lightning (Framework, 63/100)

via “model-export-and-inference-optimization”

PyTorch training framework — distributed training, mixed precision, reproducible research.

Unique: Integrates model export with the Trainer's checkpoint system, allowing automatic export at the end of training. Supports ONNX and TorchScript export through LightningModule methods (`to_onnx()`, `to_torchscript()`), and provides hooks for quantization and pruning without requiring separate tools.

vs others: More integrated than manual ONNX export (no need to manually trace models or handle export edge cases) and more flexible than framework-specific export tools (supports multiple formats and optimization techniques). Automatic export at training end reduces manual steps compared to post-hoc export workflows.

3

FastAI (Framework, 60/100)

via “model export and inference optimization for deployment”

High-level deep learning with built-in best practices.

Unique: Provides simple APIs for exporting FastAI models to standard formats (ONNX, TorchScript) and quantizing them for deployment, abstracting away the complexity of manual export and optimization.

vs others: More convenient than manual ONNX export, but less comprehensive than specialized inference optimization tools such as TensorRT or ONNX Runtime.

4

Ultralytics (Repository, 58/100)

via “multi-format model export with quantization and optimization”

Unified YOLO framework for detection and segmentation.

Unique: Unified exporter interface abstracts 10+ format-specific implementations (ONNX, TensorRT, CoreML, OpenVINO, etc.) through a single export() call with format auto-detection. Built-in validation layer compares exported model outputs against PyTorch baseline to catch numerical drift. Generates deployment code snippets for each format.

vs others: More comprehensive format coverage than TensorFlow Lite (supports TensorRT, CoreML, OpenVINO natively) and simpler than ONNX Runtime alone (handles quantization and validation automatically)
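The validation step described above, comparing exported-model outputs against the PyTorch baseline, can be sketched as a tolerance check. This is not Ultralytics' code; the function name `outputs_match`, the tolerance values, and the sample outputs are all assumptions chosen for illustration.

```python
from typing import Sequence


def outputs_match(
    baseline: Sequence[float],
    exported: Sequence[float],
    atol: float = 1e-4,
    rtol: float = 1e-3,
) -> bool:
    """Return True when every exported value is close to its baseline value.

    Uses the common |b - e| <= atol + rtol * |b| closeness rule.
    """
    if len(baseline) != len(exported):
        return False
    return all(
        abs(b - e) <= atol + rtol * abs(b)
        for b, e in zip(baseline, exported)
    )


# Hypothetical class scores from the original and the exported model.
pytorch_scores = [0.12, 0.88]
exported_scores_ok = [0.12003, 0.87998]
exported_scores_drifted = [0.15, 0.85]

print(outputs_match(pytorch_scores, exported_scores_ok))
print(outputs_match(pytorch_scores, exported_scores_drifted))
```

Quantized exports usually need looser tolerances than full-precision ones, so in practice the thresholds would be set per format.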

5

Detectron2 (Repository, 58/100)

via “multi-format model export for deployment (torchscript, onnx, caffe2)”

Meta's modular object detection platform on PyTorch.

Unique: Supports three deployment formats (TorchScript, ONNX, Caffe2) with automatic input/output shape inference and format-specific optimizations, enabling deployment across heterogeneous inference platforms — unlike frameworks that support only a single export format

vs others: More flexible than TensorFlow's SavedModel because it supports multiple export targets; more production-ready than raw PyTorch models because exported models have no Detectron2 dependencies and can be optimized for specific inference hardware

6

xlm-roberta-base (Model, 55/100)

via “onnx model export and optimized inference”

fill-mask model. 18,165,674 downloads.

Unique: Provides native ONNX export support via HuggingFace Transformers, enabling single-command conversion to hardware-agnostic format with built-in optimization profiles for CPU, GPU, and mobile inference — unlike manual ONNX conversion which requires deep knowledge of ONNX IR and operator semantics

vs others: Reduces deployment complexity and inference latency compared to PyTorch/TensorFlow serving by eliminating framework dependencies and enabling aggressive quantization/pruning, while maintaining model accuracy through ONNX Runtime's operator fusion and memory optimization

7

bge-m3 (Model, 55/100)

via “onnx model export for edge and serverless deployment”

sentence-similarity model. 20,474,507 downloads.

Unique: Pre-optimized ONNX export with native quantization support and operator fusion for CPU inference, reducing deployment complexity compared to manual PyTorch-to-ONNX conversion while maintaining embedding quality through careful quantization calibration

vs others: Simpler than custom ONNX conversion pipelines and includes pre-tuned quantization profiles, whereas generic PyTorch-to-ONNX export requires manual optimization; reduces cold-start latency by 60-80% vs PyTorch Lambda deployments

8

mobilenetv3_small_100.lamb_in1k (Model, 54/100)

via “model-export-and-format-conversion”

image-classification model. 22,810,638 downloads.

Unique: timm provides export utilities (for example, the `onnx_export.py` script in the timm repository) that handle operator fusion, constant folding, and shape inference automatically. The export pipeline supports quantization-aware export, enabling int8 models without separate QAT. ONNX export can include graph optimization via onnx-simplifier, reducing model size by 10-20% and improving inference speed.

vs others: Automated export pipeline eliminates manual operator mapping and shape inference errors; supports more target formats (ONNX, TFLite, CoreML, NCNN, TorchScript) than single-framework converters, reducing conversion complexity.

9

bge-reranker-v2-m3 (Model, 54/100)

via “quantization-and-model-compression-for-edge-deployment”

text-classification model. 9,881,128 downloads.

Unique: XLM-RoBERTa base model (110M parameters) is inherently smaller than larger alternatives, making quantization more effective; safetensors format enables efficient ONNX conversion with minimal overhead vs .bin format

vs others: Smaller base model (110M) quantizes more effectively than larger alternatives (300M+); ONNX support enables cross-platform deployment (CPU, mobile, edge) vs PyTorch-only models

10

ChatTTS (Agent, 53/100)

via “onnx export for cross-platform deployment”

A generative speech model for daily dialogue.

Unique: Provides ONNX export capability for all major pipeline components (GPT, DVAE, Vocos), enabling end-to-end deployment without PyTorch. The export process includes optimization and quantization options, enabling deployment on resource-constrained devices.

vs others: More flexible than PyTorch-only deployment because ONNX enables use of alternative inference runtimes (ONNX Runtime, TensorRT, CoreML). More portable than TorchScript because ONNX is a standard format with broad ecosystem support.

11

table-transformer-detection (Model, 53/100)

via “onnx model export for edge deployment and inference optimization”

object-detection model. 3,394,499 downloads.

Unique: Provides transformer-aware ONNX export that preserves attention mechanism semantics while enabling quantization-friendly operator fusion. The export pipeline includes automatic calibration for INT8 quantization using representative document images, reducing manual tuning overhead.

vs others: More portable than TensorFlow Lite or CoreML because ONNX Runtime runs on Windows, Linux, macOS, iOS, and Android with identical inference results; achieves better accuracy-latency tradeoffs than naive INT8 quantization due to transformer-specific calibration strategies.
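INT8 calibration from representative inputs, as mentioned above, generally means choosing an activation scale from observed values rather than from the absolute maximum. The sketch below is a generic percentile-clipping approach, not the model's actual calibration pipeline; `calibrate_scale` and the sample activations are hypothetical.

```python
from typing import List


def calibrate_scale(batches: List[List[float]], percentile: float = 99.9) -> float:
    """Pick a symmetric int8 scale from the given percentile of |activation|.

    Clipping at a percentile below 100 keeps rare outliers from
    stretching the quantization range.
    """
    magnitudes = sorted(abs(v) for batch in batches for v in batch)
    if not magnitudes:
        raise ValueError("calibration needs at least one activation value")
    index = min(len(magnitudes) - 1, int(len(magnitudes) * percentile / 100))
    clip = magnitudes[index]
    return clip / 127 if clip > 0 else 1.0


# Hypothetical activations: values in [-5, 5] plus a single outlier.
calibration_batches = [[0.1 * i for i in range(-50, 51)], [100.0]]

scale_clipped = calibrate_scale(calibration_batches, percentile=99)
scale_minmax = calibrate_scale(calibration_batches, percentile=100)

print(f"percentile-clipped scale: {scale_clipped:.5f}")
print(f"min-max scale:            {scale_minmax:.5f}")
```

The clipped scale gives finer resolution to typical activations while saturating the outlier, which is the usual accuracy-versus-range tradeoff in post-training quantization.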

12

e5-base-v2 (Model, 50/100)

via “onnx and openvino model export for edge and on-premise deployment”

sentence-similarity model. 1,778,169 downloads.

Unique: Provides native ONNX and OpenVINO export through sentence-transformers' built-in conversion utilities, supporting both full-precision and quantized models without custom export code. The export process preserves the tokenizer and preprocessing logic, enabling end-to-end inference without reimplementing text preprocessing.

vs others: One-command export to multiple formats (ONNX, OpenVINO) with quantization support, whereas most models require separate conversion pipelines and manual tokenizer integration for edge deployment.

13

ModernBERT-base (Model, 49/100)

via “onnx and safetensors export for cross-platform deployment”

fill-mask model. 1,380,835 downloads.

Unique: Provides first-class ONNX and SafeTensors support in the HuggingFace model card with pre-converted weights, eliminating the need for custom export scripts and enabling one-click deployment to ONNX Runtime, TensorRT, or CoreML without PyTorch dependency

vs others: Faster and more secure than pickle-based PyTorch exports (SafeTensors), and more portable than PyTorch-only models while maintaining compatibility with standard BERT fine-tuning workflows

14

mDeBERTa-v3-base-xnli-multilingual-nli-2mil7 (Model, 48/100)

via “onnx-model-export-and-inference”

zero-shot-classification model. 303,704 downloads.

Unique: Enables ONNX export of the DeBERTa-v3-base architecture with full transformer semantics preserved, supporting dynamic batch sizes and sequence lengths without reexport. Unlike simple PyTorch-to-ONNX conversion, this approach maintains cross-lingual capabilities and NLI reasoning patterns across different runtime environments.

vs others: Provides hardware-agnostic inference without PyTorch dependency, enabling 2-5x faster startup and lower memory overhead than PyTorch on CPU, and supports quantization for 4x model size reduction with minimal accuracy loss vs full-precision models.
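The 4x size reduction quoted above follows from storage width alone: fp32 uses 4 bytes per weight and int8 uses 1. A rough estimate, assuming a hypothetical parameter count and ignoring quantization scales and any layers left in full precision:

```python
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def model_size_mb(num_params: int, dtype: str) -> float:
    """Approximate weight storage in megabytes for a given dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e6


# Hypothetical parameter count, chosen only to keep the arithmetic simple.
num_params = 100_000_000

size_fp32 = model_size_mb(num_params, "fp32")
size_int8 = model_size_mb(num_params, "int8")
reduction = size_fp32 / size_int8

print(f"fp32: {size_fp32:.0f} MB, int8: {size_int8:.0f} MB, reduction: {reduction:.1f}x")
```

Real exported files tend to land somewhat below the ideal ratio, because scales, zero points, and unquantized layers add overhead.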

15

mask2former-swin-large-cityscapes-semantic (Model, 46/100)

via “model export to onnx and torchscript formats”

image-segmentation model. 155,904 downloads.

Unique: Supports export to both ONNX and TorchScript, enabling deployment across diverse inference engines (ONNX Runtime, TensorRT, CoreML) — though deformable attention may require custom ONNX operators not available in standard opset

vs others: Enables multi-platform deployment vs PyTorch-only inference, though export complexity and potential operator compatibility issues add deployment friction

16

oneformer_ade20k_swin_tiny (Model, 46/100)

via “pytorch-and-onnx-export-for-deployment”

image-segmentation model. 248,429 downloads.

Unique: Supports export to ONNX format for cross-platform inference, enabling deployment to CPU, mobile, and specialized hardware without PyTorch dependency. ONNX export enables optimization via TensorRT (NVIDIA), ONNX Runtime, or CoreML (iOS) for platform-specific performance tuning.

vs others: More flexible than PyTorch-only deployment because ONNX enables inference on diverse platforms; enables optimization via specialized inference engines (TensorRT, ONNX Runtime) that may outperform PyTorch on specific hardware; supports mobile deployment through CoreML/TFLite conversion.

17

resnet50.a1_in1k (Model, 46/100)

via “model quantization and optimization for edge deployment”

image-classification model. 1,564,660 downloads.

Unique: Supports multiple quantization backends (PyTorch native, ONNX, TensorRT) through timm's export utilities; includes pre-calibrated quantization profiles for ImageNet-1K to minimize accuracy loss; compatible with hardware-specific optimizations (NVIDIA TensorRT, Apple Neural Engine)

vs others: Better quantization accuracy than TensorFlow Lite's default quantization due to timm's calibration profiles; faster TensorRT export than manual ONNX conversion; broader hardware support than single-framework solutions

18

bert-base-multilingual-cased-ner-hrl (Model, 46/100)

via “onnx and tensorflow export for production deployment”

token-classification model. 287,100 downloads.

Unique: Supports export to three distinct production formats (ONNX, TensorFlow SavedModel, TensorFlow Lite) from single PyTorch checkpoint, enabling deployment across Java backends, Python services, mobile apps, and browsers without retraining. Maintains numerical equivalence across formats.

vs others: Eliminates need to maintain separate PyTorch, TensorFlow, and ONNX model variants; single checkpoint exports to all three formats. ONNX Runtime inference is 2-3x faster than PyTorch on CPU due to graph optimization, making it ideal for cost-sensitive deployments.

19

vit_base_patch16_224.augreg2_in21k_ft_in1k (Model, 45/100)

via “model export and deployment in multiple formats for production inference”

image-classification model. 501,255 downloads.

Unique: Supports SafeTensors format (safer than pickle-based .pt files due to no arbitrary code execution risk) alongside ONNX and TorchScript; timm provides built-in export utilities that handle architecture-specific details automatically, reducing manual conversion errors

vs others: Safer than raw PyTorch checkpoints because SafeTensors format prevents arbitrary code execution attacks; more portable than TorchScript because ONNX is supported by multiple runtimes (ONNX Runtime, TensorRT, CoreML); quantization utilities are more automated than manual int8 conversion

20

vit-gpt2-image-captioning (Model, 45/100)

via “model quantization and optimization for edge deployment”

image-to-text model. 265,979 downloads.

Unique: Supports both ONNX export (for cross-platform compatibility) and bitsandbytes quantization (for in-place int4 quantization in PyTorch), providing multiple optimization paths depending on deployment target — ONNX for mobile/web, bitsandbytes for cloud inference cost reduction

vs others: More flexible than distillation-based approaches (e.g., training a smaller model) because quantization requires no retraining, and more practical than pruning because the model architecture remains unchanged and compatible with standard inference code
