Abstractive Text Summarization with Pre-Trained Transformer Encoder-Decoder

1

Llama-3.1-8B-Instruct | Model | 57/100

via “content summarization and extraction”

text-generation model. 9,566,721 downloads.

Unique: Instruction-tuned abstractive summarization using full 128K context window to process entire documents without chunking; learns summarization patterns from training data rather than using extractive algorithms, enabling flexible output formats and style adaptation

vs others: Handles longer documents than Mistral-7B (smaller context) and provides more flexible summarization than rule-based extractive tools; comparable to GPT-3.5 on quality but with local deployment and no API costs

2

Qwen3-4B | Model | 55/100

via “summarization and abstractive text compression”

text-generation model. 7,205,785 downloads.

Unique: Qwen3-4B is instruction-tuned on diverse summarization tasks, enabling effective abstractive summarization without task-specific fine-tuning; smaller model size enables faster summarization of large document batches

vs others: Comparable summarization quality to larger models like GPT-3.5 for most domains; faster inference enables real-time summarization in production systems

3

opt-125m | Model | 53/100

via “autoregressive text generation with transformer decoder architecture”

text-generation model. 7,912,032 downloads.

Unique: OPT uses a standard transformer decoder architecture with no architectural innovations, but distinguishes itself through openly released weights and a transparent training methodology documented in arXiv:2205.01068, enabling reproducible research on a GPT-3-style model, unlike the closed GPT-3/4

vs others: Far smaller and faster to run than GPT-2 XL (1.5B), at roughly the size of GPT-2 small (124M); lacks the instruction tuning of Alpaca/Vicuna and the safety alignment of InstructGPT, making it better suited to research baselines than production chatbots

4

bart-large-cnn | Model | 51/100

via “abstractive-summarization-with-bart-encoder-decoder”

summarization model. 1,935,931 downloads.

Unique: Uses BART's denoising autoencoder architecture (trained with corrupted input reconstruction) combined with CNN/DailyMail fine-tuning, enabling abstractive summarization that generates novel phrasings rather than extractive copying. The encoder-decoder design with cross-attention allows the model to dynamically attend to relevant source passages while generating each summary token, unlike simpler seq2seq models.

vs others: Outperforms extractive summarization baselines and earlier seq2seq models on ROUGE metrics for news summarization; more abstractive than PEGASUS but with faster inference than T5-large due to smaller parameter count (406M vs 770M), making it the practical choice for resource-constrained production deployments.
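Several entries in this list compare models by ROUGE. As a rough illustration of what ROUGE-1 measures, here is a minimal sketch of unigram-overlap F1 in plain Python. It assumes whitespace tokenization and lowercasing only; published ROUGE scores use a reference implementation with stemming and other normalization, so numbers from this sketch are not comparable to the figures quoted here. The function name `rouge1_f1` is illustrative.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between a candidate summary and a reference.

    Simplified: lowercase + whitespace split, no stemming.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    # Clipped overlap: each token counts at most as often as it
    # appears in both sequences.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(rouge1_f1("the cat", "the cat sat on the mat"))
```

A short candidate that only copies words from the reference gets perfect precision but low recall, which is why the F1 form is usually reported.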

5

t5-small | Model | 51/100

via “abstractive text summarization with task-prefix conditioning”

translation model. 2,337,740 downloads.

Unique: Uses task-prefix conditioning ('summarize:') to enable summarization without architectural changes; pre-training on denoising objectives (span corruption, infilling) implicitly teaches compression and paraphrasing rather than explicit summarization supervision

vs others: Simpler to deploy than BART or Pegasus (no task-specific fine-tuning required) and much smaller (about 60M parameters); as an abstractive model it offers weaker factuality guarantees than extractive summarization baselines
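Task-prefix conditioning amounts to prepending a plain-text instruction to the input before tokenization. The sketch below shows one way this might look with the Hugging Face `transformers` library; the helper names `build_t5_input` and `summarize_with_t5`, the 512-token truncation limit, and the generation settings are assumptions chosen for illustration, not values taken from the model card. The library import is deferred so the prefix helper can be used without downloading weights.

```python
def build_t5_input(text: str, task_prefix: str = "summarize: ") -> str:
    """Prepend the task prefix and collapse runs of whitespace."""
    return task_prefix + " ".join(text.split())


def summarize_with_t5(text: str, model_name: str = "t5-small",
                      max_new_tokens: int = 60) -> str:
    """Hypothetical end-to-end helper; downloads the checkpoint on first use."""
    # Imported lazily: requires `transformers` and `torch` to be installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
    inputs = tokenizer(build_t5_input(text), return_tensors="pt",
                       truncation=True, max_length=512)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens,
                                num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


if __name__ == "__main__":
    print(build_t5_input("  Hello   world\n"))
```

The same checkpoint can be pointed at another task by changing only the prefix string, which is the practical meaning of "no architectural changes".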

6

t5-base | Model | 50/100

via “abstractive text summarization with extractive-abstractive hybrid capability”

translation model. 2,235,007 downloads.

Unique: Unified encoder-decoder architecture enables abstractive summarization without separate extractive pre-processing or pointer networks. Pre-training on C4 with a denoising objective (span corruption) teaches the model to compress and paraphrase text, which transfers to summarization without task-specific architectural modifications.

vs others: Simpler and more end-to-end than extractive+abstractive pipelines (e.g., BERT-based extractors + BART generators), while achieving comparable ROUGE scores on CNN/DailyMail with a single unified model; roughly half the size of BART-large (about 220M vs 406M parameters).
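To make the span-corruption objective concrete, here is a small sketch that builds an (input, target) pair in the sentinel-token format T5 uses: each dropped span is replaced by `<extra_id_N>` in the input, and the target lists each sentinel followed by the tokens it replaced, closed by a final sentinel. The spans are passed in explicitly so the example is deterministic; real pre-training samples them at random, and the function name `span_corrupt` is illustrative.

```python
def span_corrupt(tokens, spans):
    """Build a T5-style denoising pair.

    tokens: list of string tokens.
    spans:  sorted, non-overlapping (start, end) half-open index pairs
            marking the token ranges to drop.
    """
    inputs, targets = [], []
    cursor = 0
    for i, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"
        inputs.extend(tokens[cursor:start])   # keep text before the span
        inputs.append(sentinel)               # replace the span in the input
        targets.append(sentinel)              # announce the span in the target
        targets.extend(tokens[start:end])     # then emit the dropped tokens
        cursor = end
    inputs.extend(tokens[cursor:])
    targets.append(f"<extra_id_{len(spans)}>")  # closing sentinel
    return " ".join(inputs), " ".join(targets)


if __name__ == "__main__":
    words = "Thank you for inviting me to your party last week .".split()
    corrupted, target = span_corrupt(words, [(2, 4), (8, 9)])
    print(corrupted)
    print(target)
```

The target is much shorter than the input, which is one reason this objective is often described as teaching compression-like behaviour.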

7

distilbart-cnn-12-6 | Model | 48/100

via “abstractive text summarization with distilled bart architecture”

summarization model. 1,111,635 downloads.

Unique: Achieves roughly 25% parameter reduction (about 306M vs 406M parameters in the 12/6 layer configuration) compared to BART-large through distillation while maintaining 90%+ ROUGE score parity on CNN/DailyMail; uses asymmetric encoder-decoder design (12 encoder layers preserve input understanding, 6 decoder layers reduce generation cost) rather than uniform compression

vs others: Faster inference than full BART-large, because the decoder that runs once per generated token is halved, while maintaining competitive summary quality, making it well suited to cost-sensitive production deployments

8

t5-3b | Model | 46/100

via “abstractive text summarization with length control”

translation model. 875,782 downloads.

Unique: Task prefix routing ('summarize:') enables abstractive summarization without task-specific heads; decoding parameters such as length_penalty, min_length, and max_length allow summary length to be tuned at inference time without retraining

vs others: Higher capacity than T5-large and faster than T5-11B; summary length is steered at decoding time rather than fixed by the model, as with other beam-search summarizers such as BART and PEGASUS
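The effect of a length_penalty setting is easiest to see on the beam score itself. In beam search as documented for Hugging Face Transformers, the summed log-probability of a finished hypothesis is divided by its length raised to `length_penalty`, so larger values favour longer outputs. The sketch below reproduces that normalization on two made-up hypotheses; the log-probabilities, lengths, and function names are illustrative assumptions, not output from any model.

```python
def beam_score(sum_logprobs: float, length: int, length_penalty: float) -> float:
    """Length-normalized beam score: sum of log-probs / length ** penalty."""
    return sum_logprobs / (length ** length_penalty)


def pick_best(hypotheses, length_penalty: float) -> str:
    """hypotheses: iterable of (label, sum_logprobs, length) tuples."""
    best = max(hypotheses, key=lambda h: beam_score(h[1], h[2], length_penalty))
    return best[0]


if __name__ == "__main__":
    # Hypothetical finished beams: a 5-token and a 12-token summary.
    beams = [("short", -4.0, 5), ("long", -6.0, 12)]
    for penalty in (0.0, 1.0, 2.0):
        print(penalty, pick_best(beams, penalty))
```

With no normalization (penalty 0.0) the shorter hypothesis wins because it accumulates less negative log-probability; with penalty 1.0 or higher the per-token average favours the longer one. This changes which candidate is ranked first but does not by itself set a target length, which is what min/max length limits are for.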

9

pegasus-xsum | Model | 45/100

via “abstractive text summarization with pre-trained transformer encoder-decoder”

summarization model. 239,806 downloads.

Unique: PEGASUS uses gap-sentence generation as its pre-training objective (masking and regenerating complete sentences rather than random tokens), which aligns directly with the abstractive summarization task and produces superior compression ratios compared to BERT-based approaches. Fine-tuning on XSum's abstractive (rather than extractive) summaries creates a model specifically optimized for semantic paraphrasing rather than sentence selection.

vs others: Outperforms BART and T5 on the XSum benchmark (ROUGE-1: 47.21 vs 45.14 for BART) due to pre-training objective alignment, while maintaining comparable inference speed and model size to alternatives.
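Gap-sentence generation can be sketched in a few lines: score each sentence by how well it overlaps with the rest of the document, mask the top-scoring ones in the input, and use them as the target. The example below is a simplified illustration, assuming whitespace tokenization, a plain unigram-F1 score as the importance measure, and `<mask_1>` as the placeholder string; the real pre-training pipeline uses a full ROUGE implementation and additional sampling strategies. Function names and the 30% default ratio are illustrative.

```python
from collections import Counter


def unigram_f1(a, b):
    """Unigram-overlap F1 between two token lists."""
    if not a or not b:
        return 0.0
    overlap = sum((Counter(a) & Counter(b)).values())
    return 2 * overlap / (len(a) + len(b))


def gap_sentence_example(sentences, ratio=0.3, mask_token="<mask_1>"):
    """Return (masked_input, target) for one document.

    Each sentence is scored independently against the remaining
    sentences; the top `ratio` fraction (at least one) is masked.
    """
    tokenized = [s.lower().split() for s in sentences]
    scores = []
    for i, toks in enumerate(tokenized):
        rest = [t for j, other in enumerate(tokenized) if j != i for t in other]
        scores.append(unigram_f1(toks, rest))
    m = max(1, round(len(sentences) * ratio))
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen = set(ranked[:m])
    masked = " ".join(mask_token if i in chosen else s
                      for i, s in enumerate(sentences))
    target = " ".join(s for i, s in enumerate(sentences) if i in chosen)
    return masked, target


if __name__ == "__main__":
    doc = [
        "Pegasus is a summarization model .",
        "The weather was nice .",
        "The Pegasus model is pre-trained for summarization .",
    ]
    print(gap_sentence_example(doc))
```

The sentence that best covers the rest of the document is the one removed, so the model is trained to regenerate something summary-like from the remaining context.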

10

t5-large | Model | 45/100

via “abstractive summarization via conditional text generation with length control”

translation model. 473,953 downloads.

Unique: Unified text2text architecture allows summarization with the 'summarize:' prefix on the pre-trained weights, without task-specific fine-tuning; summary length is controlled through generation parameters (beam search settings, minimum and maximum length), without retraining. Encoder-decoder design lets the decoder cross-attend to the full encoded source document during generation, unlike decoder-only models, which take the document as part of the prompt.

vs others: Faster inference than T5-3B with minimal ROUGE score degradation on the CNN/DailyMail benchmark; summary length is adjusted at decoding time, as with BART

11

bart-large-cnn-samsum | Model | 44/100

via “abstractive-summarization-with-bart-architecture”

summarization model. 260,012 downloads.

Unique: Fine-tuned specifically on SAMSum (dialogue summarization dataset with 16k+ annotated conversations) rather than only on generic CNN/DailyMail news summarization; BART's denoising pre-training (text infilling, permutation, deletion) supports generalization to conversational patterns

vs others: Outperforms extractive summarization baselines and smaller T5 models on dialogue tasks due to BART's encoder-decoder architecture and dialogue-specific fine-tuning; it shares the BART-large architecture and size (about 406M parameters) with BART-large-xsum, so inference cost is similar

12

DeBERTa-v3-base-mnli-fever-anli | Model | 43/100

via “transformer-based semantic encoding with disentangled attention”

zero-shot-classification model. 64,968 downloads.

Unique: DeBERTa-v3's disentangled attention separates content and position embeddings, improving semantic representation quality and attention efficiency compared to standard BERT-style encoders; 768-dimensional output balances semantic richness with computational efficiency for embedding-based retrieval systems

vs others: Produces higher-quality semantic embeddings than BERT-base due to architectural improvements; more efficient than larger models (DeBERTa-large, T5) while maintaining competitive performance on semantic similarity and retrieval tasks

13

mT5_multilingual_XLSum | Model | 40/100

via “multilingual abstractive summarization with mt5 encoder-decoder architecture”

summarization model. 56,827 downloads.

Unique: Uses mT5's shared multilingual encoder-decoder (pre-trained on 101 languages) fine-tuned on XL-Sum's 1.35M+ article-summary pairs spanning more than 40 languages, enabling summarization for low-resource languages through cross-lingual transfer, unlike monolingual models (BART, Pegasus) that require separate fine-tuning per language

vs others: Covers more than 40 languages with a single 580M-parameter model vs maintaining separate summarizers per language; outperforms mBERT-based summarization on ROUGE scores due to T5's text-to-text generation paradigm, though slower than smaller distilled models for latency-critical applications

14

MEETING_SUMMARY | Model | 39/100

via “transformer-based-abstractive-compression-with-attention-visualization”

summarization model. 61,649 downloads.

Unique: BART's denoising pre-training produces more interpretable attention patterns than standard seq2seq models because it learns to reconstruct corrupted text, creating explicit alignment between input and output. The model's attention heads specialize into different roles (copy, paraphrase, aggregation) that can be analyzed independently.

vs others: More interpretable than black-box API-based summarization (GPT-3.5) and more flexible than extractive methods which cannot show reasoning about information combination or rephrasing.

15

pegasus-large | Model | 37/100

via “abstractive-summarization-with-pretrained-pegasus-encoder-decoder”

summarization model. 25,976 downloads.

Unique: Uses gap-sentence-generation (GSG) pretraining objective instead of standard masked language modeling (MLM), which directly optimizes for sentence-level understanding and abstractive generation by masking entire sentences and forcing the model to predict them from context. This is more aligned with summarization tasks than BERT-style MLM pretraining.

vs others: Outperforms BART and T5-base on CNN/DailyMail and XSum benchmarks (ROUGE-1: 43.9 vs 42.9) due to GSG pretraining, while being smaller and faster than T5-large, making it ideal for resource-constrained production deployments.

16

distilbart-cnn-6-6 | Model | 37/100

via “abstractive-summarization-with-distilled-bart”

summarization model. 33,640 downloads.

Unique: Uses knowledge distillation to compress BART-large from 12 to 6 layers in both the encoder and the decoder, achieving roughly 43% parameter reduction (about 230M vs 406M parameters) while retaining abstractive quality through teacher-student training on CNN/DailyMail and XSum. This is a deliberate trade-off of model capacity for inference speed, unlike full-size BART, which prioritizes quality over efficiency.

vs others: Faster inference than full BART (6 vs 12 layers) with lower memory footprint than T5-base, while maintaining better abstractive quality than extractive baselines; trade-off is reduced capacity on out-of-distribution text compared to larger models like BART-large or T5-large

17

text_summarization | Model | 36/100

via “abstractive text summarization with t5 architecture”

summarization model. 12,272 downloads.

Unique: Uses T5's unified text-to-text framework where summarization is treated as a conditional generation task with a 'summarize:' prefix token, enabling transfer learning from diverse NLP tasks and supporting multi-task fine-tuning patterns that improve generalization

vs others: More abstractive and semantically coherent than extractive baselines (TextRank, BERT-based) because it learns to paraphrase; lighter-weight and faster than GPT-3.5/4 APIs while maintaining reasonable quality for general English documents

18

t5-base-indonesian-summarization-cased | Model | 36/100

via “indonesian-language abstractive text summarization with t5 architecture”

summarization model. 10,971 downloads.

Unique: Fine-tuned specifically on Indonesian news corpus (ID_Liputan6 dataset) with cased token handling, enabling domain-optimized abstractive summarization for Indonesian rather than relying on multilingual or English-centric models with language-specific performance degradation

vs others: Outperforms generic multilingual T5 models on Indonesian news summarization by 3-5 ROUGE points due to domain-specific fine-tuning, while remaining significantly lighter than large multilingual models (mT5-large, mBART) for deployment-constrained environments

19

distilbart-cnn-6-6 | Model | 35/100

via “text2text-generation-with-encoder-decoder-architecture”

summarization model. 22,746 downloads.

Unique: BART's denoising autoencoder pre-training (corrupting and reconstructing text) enables strong transfer learning to diverse text-to-text tasks without task-specific fine-tuning. The 6-layer distilled variant maintains this capability while reducing inference latency 2-3x vs full BART, making it practical for real-time applications. Differs from GPT-style decoder-only models by using explicit encoder-decoder separation, which improves efficiency for tasks with long inputs and short outputs.

vs others: More efficient than full BART for summarization (2-3x faster) and more task-flexible than task-specific models, but slower than decoder-only models (GPT-2, GPT-3) and less capable at instruction-following or few-shot learning.

20

t5-small-booksum | Model | 34/100

via “abstractive-text-summarization-with-t5-encoder-decoder”

summarization model. 16,506 downloads.

Unique: Fine-tuned specifically on BookSum (summaries of literary works at paragraph, chapter, and book level) rather than generic news/Wikipedia corpora, making it better suited to narrative and long-form prose summarization, with better preservation of plot and character details than BART or Pegasus models trained on news datasets

vs others: Smaller footprint (60M params) than T5-base (220M) with better narrative understanding than BART-large-cnn (trained on CNN/DailyMail news), enabling faster inference on edge devices while maintaining literary text quality
