What can distilbert-onnx do?

extractive question-answering with onnx inference, squad-compatible span prediction with token-level alignment, cross-platform onnx runtime inference with hardware acceleration, batch inference with dynamic sequence padding, model quantization to int8 with minimal accuracy loss, squad dataset fine-tuning and transfer learning

distilbert-onnx

Model · Free

question-answering model by philschmid. 48,698 downloads.

Open Source

/ 100

6 capabilities

Capabilities (6 decomposed)

extractive question-answering with onnx inference

Medium confidence

Performs extractive QA by encoding questions and passages through a DistilBERT transformer backbone compiled to ONNX format, then predicting start/end token positions via dense span classification layers. The ONNX compilation enables hardware-accelerated inference across CPU, GPU, and mobile runtimes without Python dependency overhead, using quantized weights optimized for latency-critical deployments.
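
The span-classification step described above can be illustrated with a small decoding sketch. It assumes the model has already produced one start logit and one end logit per token; the function name `best_span` and the `max_answer_len` limit are hypothetical and are not taken from the model's own code.

```python
from typing import List, Tuple


def best_span(start_logits: List[float], end_logits: List[float],
              max_answer_len: int = 30) -> Tuple[int, int, float]:
    """Return (start, end, score) maximizing start_logits[s] + end_logits[e]
    subject to s <= e < s + max_answer_len."""
    best = (0, 0, float("-inf"))
    for s, s_score in enumerate(start_logits):
        last = min(len(end_logits), s + max_answer_len)
        for e in range(s, last):  # the end token never precedes the start token
            score = s_score + end_logits[e]
            if score > best[2]:
                best = (s, e, score)
    return best


# Toy logits for a five-token passage (illustrative values only).
start = [0.0, 2.5, 0.5, 0.25, 0.0]
end = [0.0, 0.5, 3.0, 0.25, 0.0]
span = best_span(start, end)  # tokens 1..2 form the highest-scoring span
```

Production decoders usually also skip spans that fall inside the question or on special tokens; that filtering is omitted here to keep the sketch short.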

Solves for

I need to extract answers from documents in real-time without cloud API latency

I want to run QA inference on edge devices or mobile with minimal memory footprint

I need to batch process thousands of QA pairs with deterministic, reproducible results

I want to integrate QA into a production system without managing Python runtime dependencies

Best for

embedded systems and edge device developers building offline-capable applications

teams deploying inference at scale requiring sub-100ms latency guarantees

organizations with strict data residency requirements avoiding cloud APIs

Requires

ONNX Runtime 1.10+ (Python, C++, or JavaScript bindings)

transformers library 4.0+ for tokenization and model loading

512MB RAM minimum for model weights; 2GB+ recommended for batch inference

Limitations

Extractive-only — cannot generate answers not present in source text; fails on reasoning-heavy questions

SQuAD-trained on English Wikipedia passages; performance degrades on domain-specific jargon or non-English text

Fixed sequence length (384 tokens, question included) requires manual chunking of passages that exceed that limit after tokenization
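
One common way to handle the chunking mentioned in the last limitation is a sliding window with overlap, so an answer that straddles a boundary still appears whole in at least one window. The sketch below works on a plain list of tokens; `chunk_tokens` and the `stride` of 128 are illustrative assumptions, and a real pipeline would also reserve room for the question and special tokens inside the 384-token budget.

```python
from typing import List


def chunk_tokens(tokens: List[str], max_len: int = 384,
                 stride: int = 128) -> List[List[str]]:
    """Split tokens into windows of at most max_len tokens.
    Consecutive windows overlap by `stride` tokens."""
    if not 0 <= stride < max_len:
        raise ValueError("stride must satisfy 0 <= stride < max_len")
    step = max_len - stride
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break  # this window already reaches the end of the passage
    return chunks


# A 1000-token passage yields four overlapping windows.
chunks = chunk_tokens([f"t{i}" for i in range(1000)])
```

With the Hugging Face tokenizers, the same effect is normally obtained by passing `return_overflowing_tokens=True` together with `stride` and `max_length`, rather than by chunking manually.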

What makes it unique

Pre-compiled ONNX serialization of DistilBERT (40% smaller than BERT, 60% faster inference) eliminates Python runtime overhead and enables cross-platform deployment from mobile to server; most QA models on HuggingFace distribute as PyTorch/TensorFlow checkpoints requiring runtime conversion

vs alternatives

Faster inference than cloud-based QA APIs (50-200ms vs 500ms+ round-trip) with zero data transmission, and roughly 40% smaller than full BERT-base while retaining 95%+ of its SQuAD accuracy

squad-compatible span prediction with token-level alignment

Medium confidence

Implements the SQuAD evaluation protocol by predicting start and end token positions within a passage, then mapping predicted token indices back to character offsets in the original text. Uses WordPiece tokenization with offset tracking to handle subword fragmentation, ensuring predicted spans align correctly with source text even when tokens split across word boundaries.
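
The token-to-character mapping can be sketched as follows. A regular-expression tokenizer stands in for WordPiece here (an assumption made so the example runs without any model files); what matters is that every token carries a `(start_char, end_char)` pair, which is the role `offset_mapping` plays in the real tokenizer.

```python
import re
from typing import List, Tuple

Offset = Tuple[int, int]


def tokenize_with_offsets(text: str) -> List[Tuple[str, Offset]]:
    """Stand-in tokenizer: words and punctuation with character offsets."""
    return [(m.group(), (m.start(), m.end()))
            for m in re.finditer(r"\w+|[^\w\s]", text)]


def span_to_text(text: str, offsets: List[Offset],
                 start_tok: int, end_tok: int) -> str:
    """Map an inclusive token span back to the exact source substring."""
    start_char = offsets[start_tok][0]
    end_char = offsets[end_tok][1]
    return text[start_char:end_char]


passage = "DistilBERT was introduced by Hugging Face in 2019."
offsets = [off for _, off in tokenize_with_offsets(passage)]
answer = span_to_text(passage, offsets, 4, 5)  # tokens "Hugging" and "Face"
```

Because the answer is sliced from the original string, whitespace and casing come back exactly as written, with no string matching after the fact.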

Solves for

I need to evaluate my QA system against SQuAD benchmarks with standard metrics (EM, F1)

I want to extract answer text that exactly matches the original passage without hallucination

I need to track which tokens contributed to the answer for interpretability or debugging

I want to handle edge cases like punctuation, contractions, and multi-word answers correctly

Best for

researchers benchmarking QA models against academic standards

teams building production QA systems requiring exact-match answer extraction

developers implementing QA evaluation pipelines with standard metrics

Requires

transformers tokenizer (AutoTokenizer) compatible with DistilBERT

passage text with preserved original formatting for offset mapping

SQuAD-format evaluation script (official or HuggingFace datasets library)

Limitations

SQuAD training assumes single correct answer per question; fails on ambiguous questions with multiple valid answers

Token-to-character mapping breaks on non-standard text preprocessing (HTML entities, special Unicode, mixed scripts)

Predictions limited to contiguous spans; cannot extract discontinuous answers or multi-span reasoning

What makes it unique

Preserves character-level offset mapping through WordPiece tokenization via offset_mapping tensors, enabling exact reconstruction of answer text from token predictions without post-hoc string matching; most QA implementations lose this mapping during tokenization

vs alternatives

Guarantees character-accurate answer extraction without fuzzy string matching, and enables direct SQuAD metric computation (EM/F1) without custom evaluation code
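
For reference, the EM and F1 metrics mentioned above can be sketched in a few lines. This follows the normalization commonly used for SQuAD 1.1 scoring (lowercase, strip punctuation, drop articles, collapse whitespace); it is a simplified illustration rather than a replacement for the official evaluation script, and it scores against a single reference answer only.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, remove punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, truth: str) -> float:
    return float(normalize(prediction) == normalize(truth))


def f1_score(prediction: str, truth: str) -> float:
    """Token-level F1 between the normalized prediction and reference."""
    pred_tokens = normalize(prediction).split()
    true_tokens = normalize(truth).split()
    overlap = sum((Counter(pred_tokens) & Counter(true_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(true_tokens)
    return 2 * precision * recall / (precision + recall)


em = exact_match("The Eiffel Tower.", "eiffel tower")
f1 = f1_score("the tall Eiffel Tower", "Eiffel Tower")
```

When a question has several reference answers, the usual convention is to take the maximum score over the references.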

cross-platform onnx runtime inference with hardware acceleration

Medium confidence

Executes the exported DistilBERT model through ONNX Runtime's execution-provider abstraction. The caller passes a priority-ordered list of providers (CPU, CUDA, TensorRT, CoreML, NNAPI); ONNX Runtime assigns each part of the graph to the first listed provider that supports it and falls back to CPU for the rest. The graph is inference-only (no training overhead), with operator fusion and memory-layout optimization applied at conversion time or when ONNX Runtime loads the graph, so the same model file runs unchanged across x86, ARM, and GPU architectures.
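
In the ONNX Runtime Python API, providers are passed as a priority-ordered list when the session is created. The helper below sketches how such a list might be built from whatever a given machine offers; `pick_providers` and the `PREFERRED` ordering are hypothetical, while the provider names are the identifiers ONNX Runtime uses.

```python
from typing import List, Sequence

# Hypothetical preference order: fastest accelerators first, CPU last.
PREFERRED = [
    "TensorrtExecutionProvider",
    "CUDAExecutionProvider",
    "CoreMLExecutionProvider",
    "NnapiExecutionProvider",
    "CPUExecutionProvider",
]


def pick_providers(available: Sequence[str],
                   preferred: Sequence[str] = PREFERRED) -> List[str]:
    """Keep preferred providers that are available, in priority order,
    and make sure CPU is always present as the final fallback."""
    chosen = [p for p in preferred if p in available]
    if "CPUExecutionProvider" not in chosen:
        chosen.append("CPUExecutionProvider")
    return chosen


# With onnxruntime installed, `available` would come from
# onnxruntime.get_available_providers(), and the result would be passed as
# onnxruntime.InferenceSession(model_path, providers=providers).
providers = pick_providers(["CUDAExecutionProvider", "CPUExecutionProvider"])
```

Keeping the choice in one small function makes it easy to log which provider a deployment actually ended up on.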

Solves for

I need to run the same QA model on CPU servers, GPU clusters, and mobile devices without code changes

I want to maximize inference throughput by leveraging GPU acceleration when available, falling back to CPU

I need predictable latency for SLA-critical applications with hardware-agnostic deployment

I want to minimize model size and memory footprint for resource-constrained environments

Best for

DevOps/MLOps teams managing multi-hardware inference infrastructure

mobile app developers building offline QA features for iOS/Android

edge computing teams deploying models to IoT devices and embedded systems

Requires

ONNX Runtime 1.10+ with appropriate execution provider (CPU, CUDA, TensorRT, CoreML, NNAPI)

CUDA 11.0+ and cuDNN 8.0+ for GPU acceleration (optional)

Python 3.7+ or C++17 for runtime bindings

Limitations

ONNX Runtime provider availability varies by platform; GPU support requires CUDA 11.0+ or specific GPU drivers

Operator coverage incomplete for some transformers extensions; custom ops may not be supported

Quantization to int8 reduces accuracy by 1-3% on SQuAD; requires calibration on representative data

What makes it unique

ONNX Runtime's execution provider abstraction enables single-model deployment across CPU/GPU/mobile without re-exporting the model, with provider selection driven by a priority list and CPU fallback; PyTorch/TensorFlow models typically require separate optimization and export per target platform

vs alternatives

10-50x faster inference than Python-based transformers on GPU (via TensorRT), and 100x smaller deployment footprint than full PyTorch runtime

batch inference with dynamic sequence padding

Medium confidence

Processes multiple question-passage pairs in parallel by padding variable-length inputs to a common sequence length (384 tokens), then executing a single batched forward pass through ONNX Runtime. Attention masks are automatically generated to zero-out padding tokens, preventing spurious attention to padded positions. Batch processing amortizes model loading and GPU kernel launch overhead, achieving 5-10x throughput improvement over sequential inference.
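
The padding and masking step can be sketched in plain Python. The `pad_id` of 0 is an assumption (it matches the usual BERT-style vocabulary, but the tokenizer's own `pad_token_id` should be used in practice), and `pad_batch` is a hypothetical helper rather than part of any library.

```python
from typing import List, Optional, Tuple


def pad_batch(sequences: List[List[int]], pad_id: int = 0,
              max_len: Optional[int] = None
              ) -> Tuple[List[List[int]], List[List[int]]]:
    """Pad token-ID sequences to a common length and build attention masks.
    Mask is 1 for real tokens and 0 for padding."""
    target = max_len if max_len is not None else max(len(s) for s in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        seq = seq[:target]  # truncate anything longer than the target
        pad = target - len(seq)
        input_ids.append(seq + [pad_id] * pad)
        attention_mask.append([1] * len(seq) + [0] * pad)
    return input_ids, attention_mask


# Two sequences of different lengths become one rectangular batch.
ids, mask = pad_batch([[101, 7592, 102], [101, 102]])
```

Passing `max_len=384` reproduces the fixed-length behavior described above; leaving it unset pads only to the longest sequence in the batch.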

Solves for

I need to process 1000+ QA pairs efficiently without loading the model multiple times

I want to maximize GPU utilization by batching variable-length inputs

I need to reduce per-sample inference latency through batching without increasing memory consumption linearly

I want to implement efficient data pipelines for offline QA evaluation or bulk document processing

Best for

data engineers building batch QA processing pipelines for document analysis

researchers evaluating models on large QA datasets (SQuAD, Natural Questions)

teams implementing bulk inference services with throughput requirements >100 samples/sec

Requires

batch_size parameter tuned for available GPU/CPU memory

tokenizer with padding support (pad_token_id defined)

attention mask generation (automatic in transformers library)

Limitations

Batch size is memory-constrained; batch_size=32 requires ~2GB VRAM on GPU, limiting throughput on edge devices

Padding to fixed sequence length (384) wastes computation on short passages; average utilization ~60-70%

Dynamic batching requires buffering requests, adding latency for real-time single-sample inference
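
The padding-waste limitation can be made concrete with a little arithmetic. The sketch below compares the fraction of non-padding tokens under three strategies: fixed padding to 384, padding to the longest sequence in each batch, and sorting by length before batching. The sequence lengths are made-up illustrative values and `utilization` is a hypothetical helper.

```python
from typing import List, Optional


def utilization(lengths: List[int], batch_size: int,
                fixed_len: Optional[int] = None) -> float:
    """Fraction of token slots that hold real tokens rather than padding."""
    real = padded = 0
    for i in range(0, len(lengths), batch_size):
        batch = lengths[i:i + batch_size]
        width = fixed_len if fixed_len is not None else max(batch)
        real += sum(batch)
        padded += width * len(batch)
    return real / padded


lengths = [50, 380, 60, 370]                  # illustrative token counts
fixed = utilization(lengths, 2, fixed_len=384)  # every row padded to 384
dynamic = utilization(lengths, 2)               # pad to longest in each batch
bucketed = utilization(sorted(lengths), 2)      # group similar lengths first
```

Sorting by length helps most when lengths vary widely, at the cost of having to restore the original order of the results afterwards.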

What makes it unique

Implements attention masking at ONNX graph level (not post-processing), ensuring padding tokens never contribute to attention scores; most batch implementations apply masking in Python, adding per-sample overhead

vs alternatives

5-10x higher throughput than sequential inference on GPU, and 2-3x better latency than naive batching without attention mask optimization

model quantization to int8 with minimal accuracy loss

Medium confidence

Provides a pre-quantized int8 variant of DistilBERT (if available in model hub) or supports post-training quantization via ONNX Runtime's quantization tools. Quantization reduces model size from 67MB (float32) to ~17MB (int8) and accelerates inference by 2-4x on CPU through reduced memory bandwidth and integer-only arithmetic. Calibration is performed on SQuAD training data to minimize accuracy degradation.
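
To make the size and accuracy trade-off concrete, here is a minimal sketch of symmetric per-tensor int8 quantization in plain Python. It illustrates the arithmetic only and is not ONNX Runtime's implementation; the function names and the toy weights are assumptions.

```python
from typing import List, Tuple


def quantize_symmetric(values: List[float]) -> Tuple[List[int], float]:
    """Map floats to integers in [-127, 127] using a single scale factor."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float values from the int8 representation."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.0, 1.0]   # toy values, not real model weights
q, scale = quantize_symmetric(weights)
restored = dequantize(q, scale)
# Each weight now needs 1 byte instead of 4, which is where the roughly
# 75% size reduction comes from; the rounding error per weight is at most
# half of `scale`.
```

Per-channel quantization applies the same idea with one scale per output channel, which keeps a single large weight from coarsening the grid for the whole tensor.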

Solves for

I need to deploy QA models on mobile/edge with <20MB footprint

I want to reduce inference latency on CPU-only devices by 2-4x

I need to fit multiple QA models in a single GPU for multi-task inference

I want to reduce bandwidth for model distribution across edge devices

Best for

mobile app developers building offline QA features with strict size constraints

edge device teams deploying to IoT/embedded systems with limited storage

teams running inference on older CPUs without AVX-512 support

Requires

ONNX Runtime 1.10+ with quantization tools

calibration dataset (SQuAD or domain-specific QA pairs)

CPU with int8 arithmetic support (most modern x86/ARM)

Limitations

int8 quantization reduces SQuAD F1 score by 1-3% compared to float32; unacceptable for high-precision applications

Quantization calibration requires representative data (SQuAD training set); domain-specific accuracy loss may be higher

int8 inference requires CPU support for integer operations; older ARM processors may not have efficient int8 kernels

What makes it unique

ONNX Runtime quantization uses symmetric int8 ranges with per-channel calibration, preserving accuracy better than asymmetric quantization; most mobile frameworks use simpler per-tensor quantization with 2-5% accuracy loss

vs alternatives

2-4x faster CPU inference and 75% smaller model size vs float32, with <3% accuracy loss on SQuAD (vs 5-10% for naive quantization)

squad dataset fine-tuning and transfer learning

Medium confidence

The model is fine-tuned on SQuAD 1.1 (100k QA pairs from Wikipedia), which makes it a starting point for transfer learning to domain-specific QA tasks. Because the ONNX file itself is inference-only, developers fine-tune by loading the PyTorch checkpoint the ONNX model was exported from, training on domain data, then re-exporting to ONNX. The SQuAD fine-tuning provides strong initialization for extractive QA, reducing fine-tuning data requirements from 10k+ to 1-5k examples for competitive performance.

Solves for

I want to adapt the model to domain-specific QA (medical, legal, technical docs) with minimal labeled data

I need to fine-tune on proprietary datasets without starting from scratch

I want to understand what linguistic patterns the model learned from SQuAD

I need to evaluate transfer learning effectiveness for my specific domain

Best for

NLP practitioners building domain-specific QA systems with limited labeled data

researchers studying transfer learning from Wikipedia to specialized domains

teams migrating from generic QA to industry-specific applications (finance, healthcare)

Requires

PyTorch 1.9+ or TensorFlow 2.4+ for fine-tuning

transformers library 4.0+ with DistilBERT model

GPU with 8GB+ VRAM for fine-tuning (batch_size=16-32)

Limitations

SQuAD is Wikipedia-based; transfer learning may fail on highly specialized domains (medical terminology, legal jargon) without domain-specific pre-training

Fine-tuning requires PyTorch/TensorFlow and GPU; ONNX format is inference-only and cannot be directly fine-tuned

SQuAD assumes single correct answer; fine-tuning on multi-answer datasets requires custom loss functions

What makes it unique

DistilBERT's 40% smaller size enables fine-tuning on consumer GPUs (8GB VRAM) vs BERT-base requiring 16GB+, while maintaining 95% of BERT's accuracy; most practitioners default to BERT for transfer learning despite computational overhead

vs alternatives

Fine-tuning requires 5-10x less data than training from scratch, and 3-5x faster than BERT fine-tuning while achieving 95%+ of BERT's domain-specific accuracy

Capabilities are decomposed by AI analysis. Each maps to specific user intents and improves with match feedback.

Related Artifacts (sharing capabilities)

Artifacts that share capabilities with distilbert-onnx, ranked by overlap. Discovered automatically through the match graph.

Repository · 25

onnxruntime

ONNX Runtime is a runtime accelerator for Machine Learning models

cross-framework model inference with automatic hardware acceleration

multi-platform model deployment with platform-specific runtimes

2 shared capabilities

Platform · 46

ONNX Runtime Mobile

Cross-platform ONNX inference for mobile devices.

platform-specific hardware accelerator delegation (coreml, nnapi, xnnpack)

inference execution with batching and sequential input handling

2 shared capabilities

Framework · 46

ONNX Runtime

Cross-platform ML inference accelerator: runs ONNX models on any hardware with optimizations.

multi-provider hardware-agnostic model execution

onnx model loading and shape inference

2 shared capabilities

Model · 49

bge-reranker-base

text-classification model. 2,701,224 downloads.

onnx-based inference with hardware acceleration

1 shared capability

Model · 42

DeBERTa-v3-large-mnli-fever-anli-ling-wanli

zero-shot-classification model. 172,974 downloads.

batch-inference-with-onnx-export

1 shared capability

Model · 35

yolov11-license-plate-detection

object-detection model. 28,614 downloads.

onnx-based cross-platform inference without framework dependencies

1 shared capability

Best For

✓ embedded systems and edge device developers building offline-capable applications
✓ teams deploying inference at scale requiring sub-100ms latency guarantees
✓ organizations with strict data residency requirements avoiding cloud APIs
✓ developers building multi-language NLP pipelines where ONNX is the common runtime
✓ researchers benchmarking QA models against academic standards
✓ teams building production QA systems requiring exact-match answer extraction
✓ developers implementing QA evaluation pipelines with standard metrics
✓ builders needing interpretable predictions for debugging model failures

Known Limitations

⚠ Extractive-only: cannot generate answers not present in source text; fails on reasoning-heavy questions
⚠ SQuAD-trained on English Wikipedia passages; performance degrades on domain-specific jargon or non-English text
⚠ Fixed sequence length (384 tokens, question included) requires manual chunking of passages that exceed that limit after tokenization
⚠ No built-in confidence calibration: raw logit scores require manual thresholding to filter low-quality predictions
⚠ ONNX Runtime compatibility varies by hardware; ARM/RISC-V support requires specific runtime builds
⚠ SQuAD training assumes single correct answer per question; fails on ambiguous questions with multiple valid answers
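
The manual thresholding mentioned in the list above might look like the sketch below: convert start and end logits to probabilities with a softmax, multiply the two probabilities of the chosen span, and reject answers under a cutoff. The `THRESHOLD` of 0.5 is a hypothetical value that would need tuning on held-out data, and the product of the two probabilities is one simple convention, not a calibrated confidence.

```python
import math
from typing import List

THRESHOLD = 0.5  # hypothetical cutoff; tune on validation data


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def span_confidence(start_logits: List[float], end_logits: List[float],
                    start: int, end: int) -> float:
    """Probability of the start token times probability of the end token."""
    return softmax(start_logits)[start] * softmax(end_logits)[end]


def accept(confidence: float, threshold: float = THRESHOLD) -> bool:
    return confidence >= threshold


# Flat logits give a low score; sharply peaked logits give a high one.
uniform = span_confidence([0.0, 0.0], [0.0, 0.0], 0, 1)
peaked = span_confidence([0.0, 8.0, 0.0], [0.0, 0.0, 8.0], 1, 2)
```

A rejected span is typically reported as "no answer found" rather than returned to the user.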

Requirements

ONNX Runtime 1.10+ (Python, C++, or JavaScript bindings)
transformers library 4.0+ for tokenization and model loading
512MB RAM minimum for model weights; 2GB+ recommended for batch inference
Hardware supporting float32 or int8 quantization (most modern CPUs/GPUs)
transformers tokenizer (AutoTokenizer) compatible with DistilBERT
passage text with preserved original formatting for offset mapping
SQuAD-format evaluation script (official or HuggingFace datasets library)
ONNX Runtime 1.10+ with appropriate execution provider (CPU, CUDA, TensorRT, CoreML, NNAPI)

Input / Output

Accepts:
question text (string, 5-100 tokens typical)
passage/context text (up to 384 tokens after tokenization, with original whitespace and punctuation preserved)
structured JSON with question-passage pairs for batch processing
lists of question strings and passage strings (variable length) for batching
batch_size parameter (integer, 1-128 typical)
token IDs (int64 tensor, shape: [batch_size, seq_length])
attention mask (int64 tensor, shape: [batch_size, seq_length])
token type IDs (int64 tensor, shape: [batch_size, seq_length]), only where the exported graph declares this input; DistilBERT itself has no segment embeddings
token-level logits from model (shape: [batch_size, seq_length, 2]) for span decoding
float32 ONNX model (.onnx file)
calibration dataset (100-1000 representative samples)
quantization config (min/max ranges, per-channel vs per-tensor)
domain-specific QA dataset (JSON format matching SQuAD schema)
training hyperparameters (learning rate, epochs, batch size)
validation set for early stopping

Produces:
structured JSON with predicted answer span (start/end token indices)
integer tuple (start_token_idx, end_token_idx)
character-level span (start_char, end_char) in original passage
answer text string extracted from passage[start_char:end_char]
confidence scores (softmax probabilities for start/end positions)
start logits (float32 tensor, shape: [batch_size, seq_length]; [batch_size, 384] when batched)
end logits (float32 tensor, shape: [batch_size, seq_length]; [batch_size, 384] when batched)
latency and throughput metrics (ms per sample, samples/sec)
int8 quantized ONNX model (~17MB)
quantization report (accuracy metrics, per-layer statistics)
fine-tuned PyTorch checkpoint
re-exported ONNX model optimized for domain
evaluation metrics (EM, F1 on validation set)
training curves (loss, validation F1 over epochs)

UnfragileRank

Adoption: 43% (40% weight)

Quality: 14% (20% weight)

Ecosystem: 50% (15% weight)

Match Graph: 10% (20% weight)

Freshness: 75% (5% weight)

UnfragileRank is computed from adoption signals, documentation quality, ecosystem connectivity, match graph feedback, and freshness. No artifact can pay for a higher rank.
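
The page does not state how the five signals are combined. Assuming a plain weighted average of the listed percentages (an assumption, not a documented formula), the arithmetic can be sketched as follows; the result is not guaranteed to match the rank the site displays.

```python
from typing import Dict, Tuple

# (score in percent, weight in percent), copied from the figures listed above.
COMPONENTS: Dict[str, Tuple[int, int]] = {
    "Adoption": (43, 40),
    "Quality": (14, 20),
    "Ecosystem": (50, 15),
    "Match Graph": (10, 20),
    "Freshness": (75, 5),
}


def weighted_score(components: Dict[str, Tuple[int, int]]) -> float:
    """Weighted average of component scores (assumed combination rule)."""
    total_weight = sum(weight for _, weight in components.values())
    weighted_sum = sum(score * weight for score, weight in components.values())
    return weighted_sum / total_weight


total_weight = sum(weight for _, weight in COMPONENTS.values())
score = weighted_score(COMPONENTS)
```

Under this assumption, adoption dominates the result because it carries twice the weight of any other signal.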

Type: Model

Model Details

Provider: huggingface

Architecture: transformers

Downloads: 48,698

Tasks: question-answering

About

philschmid/distilbert-onnx — a question-answering model on HuggingFace with 48,698 downloads

Alternatives to distilbert-onnx

wink-embeddings-sg-100d · 24 · Repository

100-dimensional English word embeddings for wink-nlp

voyage-ai-provider · 30 · API

Voyage AI Provider for running Voyage AI models with Vercel AI SDK

@vibe-agent-toolkit/rag-lancedb · 27 · Agent

LanceDB implementation of RAG interfaces for vibe-agent-toolkit

vectra · 41 · Repository

A lightweight, file-backed vector database for Node.js and browsers with Pinecone-compatible filtering and hybrid BM25 search.

Data Sources

huggingface
