mDeBERTa-v3-base-mnli-xnli — efficient inference via the DeBERTa-v3 architecture with disentangled attention
Zero-shot-classification model. 237,978 downloads.
Unique: DeBERTa-v3's disentangled attention represents each token with separate content and relative-position vectors, computing the attention score as a sum of content-to-content, content-to-position, and position-to-content terms rather than mixing content and position into a single embedding as standard multi-head attention does. Combined with ONNX and SafeTensors export, this enables optimized inference across heterogeneous hardware.
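The three-term decomposition above can be sketched in a few lines. This is a toy illustration with hypothetical 2-token, 2-dimensional embeddings and identity projections (the real model applies learned query/key projections and a relative-distance bucketing scheme), meant only to show how the content and position terms combine:

```python
import math

def matmul(A, B):
    # Plain-Python matrix product for the toy example.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(r) for r in zip(*A)]

# Toy content embeddings H and relative-position embeddings P (2 tokens, d = 2).
H = [[1.0, 0.0], [0.0, 1.0]]
P = [[0.5, 0.5], [0.5, -0.5]]

# Disentangled attention sums three score matrices:
c2c = matmul(H, transpose(H))  # content-to-content
c2p = matmul(H, transpose(P))  # content-to-position
p2c = matmul(P, transpose(H))  # position-to-content

# DeBERTa scales the summed scores by 1/sqrt(3d) instead of 1/sqrt(d),
# since three terms contribute to each score.
d = 2
scale = 1 / math.sqrt(3 * d)
scores = [[(c2c[i][j] + c2p[i][j] + p2c[i][j]) * scale for j in range(2)]
          for i in range(2)]
```

The key point the sketch makes visible: position information enters the score additively through its own terms, so the model can weigh "what" and "where" independently.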
vs others: Delivers 2-3x faster CPU inference than standard BERT-base, and ONNX quantization adds a further 4-8x speedup with minimal accuracy loss, outperforming DistilBERT on the accuracy-latency tradeoff for zero-shot classification.