Question Answering With Span Selection From Bidirectional Context

1

bert-large-uncased | Model | 47/100

via “question-answering via extractive span selection from context”

fill-mask model. 1,120,072 downloads.

Unique: Implements extractive QA with two classification heads that predict the start and end token positions of the answer. Bidirectional context from the 24-layer transformer helps disambiguate answer boundaries. Because no new text is generated, every answer is a verbatim span traceable to the source passage; the model cannot invent wording, although it can still select the wrong span.

vs others: Lower latency and easier to interpret than generative QA models (T5, GPT) for document-based QA, with no risk of returning text that is absent from the source. Limited to questions answerable by a span of the context, and the base checkpoint must be fine-tuned on a QA dataset to reach competitive performance.
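The start/end prediction described above can be sketched as a small decoding step. This is a minimal illustration in plain Python, not the library's implementation: the function name `best_span`, the `max_answer_len` cap, and the toy logits are assumptions made for the example. It picks the pair of positions with the highest combined score, subject to the end not preceding the start.

```python
from typing import List, Tuple


def best_span(start_logits: List[float], end_logits: List[float],
              max_answer_len: int = 30) -> Tuple[int, int, float]:
    """Return (start, end, score) for the highest-scoring valid span.

    A span is valid when start <= end and it covers at most
    max_answer_len tokens. The score is start_logit + end_logit.
    """
    best = (0, 0, float("-inf"))
    for i, s in enumerate(start_logits):
        # Only consider end positions at or after the start, within the cap.
        last = min(i + max_answer_len, len(end_logits))
        for j in range(i, last):
            score = s + end_logits[j]
            if score > best[2]:
                best = (i, j, score)
    return best


# Toy logits (made up for illustration), one value per context token.
tokens = ["the", "tower", "is", "in", "paris", "france"]
start_logits = [0.1, 0.2, 0.0, 0.3, 4.0, 1.0]
end_logits = [0.0, 0.1, 0.0, 0.2, 1.5, 3.5]

start, end, score = best_span(start_logits, end_logits)
answer = " ".join(tokens[start:end + 1])
```

Searching over pairs rather than taking two separate argmaxes matters because the two heads can disagree: the best start may lie after the best end, which would not be a usable span.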

2

roberta-base-squad2 | Model | 46/100

via “extractive question-answering with span selection”

question-answering model. 623,377 downloads.

Unique: Fine-tuned on the SQuAD v2 dataset, which includes unanswerable questions, so the model can signal that the context contains no valid answer instead of forcing one. This is a critical distinction from models trained only on SQuAD v1, which always return a span.

vs others: Outperforms BERT-base on SQuAD v2 benchmarks thanks to RoBERTa's improved pretraining (more data, larger batch sizes, dynamic masking, no next-sentence prediction), while remaining lightweight enough for CPU inference, unlike larger ELECTRA or DeBERTa variants.
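One common way to handle unanswerable questions in SQuAD v2-style models is to compare the best real span against a "no answer" score taken from the first special token. The sketch below assumes that convention; the function name `predict_span`, the `null_threshold` parameter, and the toy logits are hypothetical and not taken from this model's code.

```python
from typing import List, Optional, Tuple


def predict_span(start_logits: List[float], end_logits: List[float],
                 null_threshold: float = 0.0,
                 max_answer_len: int = 30) -> Optional[Tuple[int, int]]:
    """Return a (start, end) token span, or None when the model abstains.

    Assumes index 0 is the special first token whose logits act as the
    "no answer" score (an assumption of this sketch).
    """
    null_score = start_logits[0] + end_logits[0]
    best_score, best = float("-inf"), None
    # Real answers can only start after the special first token.
    for i in range(1, len(start_logits)):
        for j in range(i, min(i + max_answer_len, len(end_logits))):
            score = start_logits[i] + end_logits[j]
            if score > best_score:
                best_score, best = score, (i, j)
    # Abstain unless the best real span beats the null score by the margin.
    if best is None or best_score <= null_score + null_threshold:
        return None
    return best


# Toy logits (made up): the first case has a clear span, the second does not.
answerable = predict_span([0.5, 0.1, 3.0, 0.2], [0.4, 0.0, 0.5, 2.5])
unanswerable = predict_span([4.0, 0.1, 0.3, 0.2], [3.5, 0.0, 0.2, 0.1])
```

Raising `null_threshold` makes the sketch abstain more often, trading recall on answerable questions for fewer forced answers.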

3

splinter-base | Model | 37/100

via “extractive question-answering with span prediction”

question-answering model. 83,018 downloads.

Unique: Splinter uses a lightweight span-selection mechanism rather than full-sequence generation. It predicts two pointers (start and end token positions) instead of decoding autoregressively, with a reported 3-5x reduction in inference latency versus generative alternatives while maintaining high F1 scores on SQuAD-style benchmarks.

vs others: Faster and more deterministic than generative QA models (GPT-based) because it predicts token positions rather than generating sequences, making it ideal for production systems requiring sub-100ms latency and exact source attribution
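Exact source attribution follows from keeping character offsets for each token, so a predicted token span maps back to a verbatim substring of the context. The sketch below uses a whitespace tokenizer purely for illustration; real models use subword tokenizers, and the helper names `tokenize_with_offsets` and `span_to_text` are hypothetical.

```python
from typing import List, Tuple

Offset = Tuple[str, int, int]


def tokenize_with_offsets(text: str) -> List[Offset]:
    """Whitespace tokenizer returning (token, char_start, char_end) triples."""
    out: List[Offset] = []
    pos = 0
    for tok in text.split():
        start = text.index(tok, pos)  # search from the end of the previous token
        end = start + len(tok)
        out.append((tok, start, end))
        pos = end
    return out


def span_to_text(context: str, offsets: List[Offset],
                 start_tok: int, end_tok: int) -> Tuple[str, Tuple[int, int]]:
    """Map an inclusive token span to the exact substring and its char range."""
    char_start = offsets[start_tok][1]
    char_end = offsets[end_tok][2]
    return context[char_start:char_end], (char_start, char_end)


context = "The Eiffel Tower is located in Paris, France."
offsets = tokenize_with_offsets(context)

# Suppose the model predicted tokens 6..7 as the answer span.
answer, (a, b) = span_to_text(context, offsets, 6, 7)
```

Because the answer is sliced from the original string, the returned character range can be highlighted in the source document as-is.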

4

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (BERT) | Model | 22/100

* 🏆 2020: [Language Models are Few-Shot Learners (GPT-3)](https://proceedings.neurips.cc/paper/2020/hash/1457c0d6bfcb4967418bfb8ac142f64a-Abstract.html)

Unique: Applies bidirectional Transformer representations to span selection by scoring each token's start/end probability independently, enabling the model to use full passage context (both before and after the answer) to disambiguate correct spans, unlike unidirectional models that condition only on preceding context

vs others: The BERT paper reports a 5.1-point F1 improvement on SQuAD v2.0 over the previous best system. Bidirectional context helps in particular with unanswerable questions, where the model must recognize from the full passage that no valid span exists.
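The independent start/end scoring can be written compactly. The following is a sketch in the notation of the BERT paper, where $T_i$ is the final hidden vector of token $i$, $C$ is the final hidden vector of the [CLS] token, $S$ and $E$ are the learned start and end vectors, and $\tau$ is a threshold tuned on a development set.

```latex
% Probability that token i starts the answer (end is analogous, with E):
P^{\text{start}}_i = \frac{\exp(S \cdot T_i)}{\sum_{k} \exp(S \cdot T_k)}

% Score of a candidate span from position i to position j, with j >= i:
s_{i,j} = S \cdot T_i + E \cdot T_j

% SQuAD v2.0: score of the "no answer" option, taken at the [CLS] token:
s_{\text{null}} = S \cdot C + E \cdot C

% Predict the best span only if it beats the null score by the margin:
\max_{j \ge i} s_{i,j} > s_{\text{null}} + \tau
```

Since every $T_i$ is computed with attention over the whole input, both the start and the end score of a token depend on text before and after it.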
