Autoregressive Text Generation With 20B Parameters

1

Baichuan 2 | Model | 59/100

via “inference-time generation parameter tuning (temperature, top-p, top-k)”

Bilingual Chinese-English language model.

Unique: Exposes generation parameters through Hugging Face transformers' standard API, enabling seamless integration with other transformers-based tools. Parameters are applied at inference time without model modification, allowing dynamic adjustment per request.

vs others: Provides fine-grained control over generation behavior without retraining, vs fixed-behavior models. Standard parameter names (temperature, top_p, top_k) are compatible with other LLMs, enabling easy model swapping.
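The per-request adjustment described above can be sketched roughly as follows. This is a minimal illustration, not Baichuan 2's or Hugging Face's own code: `GenerationParams`, `params_for_request`, and `as_generate_kwargs` are hypothetical helpers, and the default values are assumptions. The dictionary keys (`do_sample`, `temperature`, `top_p`, `top_k`, `max_new_tokens`) follow the keyword names that transformers' `generate()` accepts.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class GenerationParams:
    """Hypothetical container for sampling settings (defaults are assumptions)."""
    temperature: float = 1.0
    top_p: float = 1.0
    top_k: int = 50
    max_new_tokens: int = 128

    def __post_init__(self):
        # Reject values that would make sampling ill-defined.
        if self.temperature <= 0:
            raise ValueError("temperature must be > 0")
        if not 0.0 < self.top_p <= 1.0:
            raise ValueError("top_p must be in (0, 1]")
        if self.top_k < 0:
            raise ValueError("top_k must be >= 0")


def params_for_request(defaults, **overrides):
    """Return a new parameter set for one request; the defaults stay untouched."""
    return replace(defaults, **overrides)


def as_generate_kwargs(params):
    """Build keyword arguments in the shape a sampling generate() call expects."""
    return {
        "do_sample": True,
        "temperature": params.temperature,
        "top_p": params.top_p,
        "top_k": params.top_k,
        "max_new_tokens": params.max_new_tokens,
    }


DEFAULTS = GenerationParams()
creative = params_for_request(DEFAULTS, temperature=1.2, top_p=0.95)
precise = params_for_request(DEFAULTS, temperature=0.3, top_k=10)
```

With a loaded model, the result would typically be passed as `model.generate(**inputs, **as_generate_kwargs(creative))`; the model weights are never modified, only the decoding settings change between requests.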

2

Falcon 180B | Model | 58/100

via “large-scale autoregressive text generation with 180b parameters”

TII's 180B model trained on curated RefinedWeb data.

Unique: Largest open-source single-expert (non-MoE) model at release with 180B parameters trained on meticulously cleaned RefinedWeb data (3.5T tokens), achieving competitive reasoning and knowledge performance without mixture-of-experts complexity, enabling deterministic inference patterns and simplified deployment compared to sparse models.

vs others: Larger parameter count than most open-source alternatives (LLaMA 70B, Mixtral 8x7B) with claimed GPT-4-competitive reasoning, but requires 2-3x more compute than quantized smaller models and lacks documented instruction-tuning or safety alignment compared to production-ready closed models.

3

LLMs-from-scratch | Repository | 55/100

via “text generation via autoregressive sampling with temperature and top-k/top-p filtering”

Implement a ChatGPT-like LLM in PyTorch from scratch, step by step

Unique: Implements sampling with explicit temperature scaling and top-k/top-p filtering steps, making the decoding process transparent and modifiable. Includes utilities to visualize probability distributions at each step and to compare outputs across different temperature/sampling settings.

vs others: More interpretable than transformers.generation because each sampling step is explicit; slower due to lack of optimizations like KV-cache reuse, but suitable for understanding generation mechanics and prototyping.
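The explicit sampling steps mentioned above (temperature scaling, then top-k, then top-p filtering) can be sketched in plain Python. This is an illustrative version under stated assumptions, not the repository's own code: the function names and the toy logits are hypothetical, and `top_k` is assumed to be at least 1 when given.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def filtered_distribution(logits, temperature=1.0, top_k=None, top_p=None):
    """Return next-token probabilities after temperature, top-k and top-p steps."""
    # Step 1: temperature scaling (values below 1 sharpen, above 1 flatten).
    probs = softmax([x / temperature for x in logits])
    # Step 2: rank tokens from most to least probable.
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    # Step 3: top-k keeps only the k most probable tokens (assumes top_k >= 1).
    if top_k is not None:
        ranked = ranked[:top_k]
    # Step 4: top-p keeps the smallest prefix whose renormalised mass reaches top_p.
    if top_p is not None:
        mass = sum(p for _, p in ranked)
        kept, cumulative = [], 0.0
        for index, p in ranked:
            kept.append((index, p))
            cumulative += p / mass
            if cumulative >= top_p:
                break
        ranked = kept
    # Step 5: renormalise the survivors; filtered tokens get probability 0.
    mass = sum(p for _, p in ranked)
    out = [0.0] * len(logits)
    for index, p in ranked:
        out[index] = p / mass
    return out


def sample_token(logits, rng, **kwargs):
    """Draw one token id from the filtered distribution."""
    probs = filtered_distribution(logits, **kwargs)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.5, -1.0]  # toy scores for a 4-token vocabulary
dist = filtered_distribution(logits, temperature=0.7, top_k=3, top_p=0.9)
token = sample_token(logits, random.Random(0), temperature=0.7, top_k=3, top_p=0.9)
```

Because every step is a separate line, it is straightforward to print `dist` for several temperature or filter settings and compare how the probability mass shifts.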

4

trocr-large-handwritten | Model | 42/100

via “autoregressive-text-generation-from-visual-input”

Image-to-text model. 164,795 downloads.

Unique: Implements cross-attention-based visual grounding in the decoder, allowing the model to dynamically focus on different image regions during text generation rather than using static visual context. This enables better handling of spatially distributed handwritten text and reduces hallucination of text not present in the image.

vs others: More flexible than CTC-based OCR models (which require fixed output alignment) and more interpretable than end-to-end CNN-RNN approaches because attention weights reveal which image regions influenced each generated token.

5

Helios | Model | 34/100

via “autoregressive chunk-based long-video generation from text prompts”

Helios: Real Real-Time Long Video Generation Model

Unique: Achieves minute-scale video generation without conventional anti-drifting strategies (self-forcing, error-banks, keyframe sampling) by using unified history injection and multi-term memory patchification during training, enabling simpler inference pipelines and faster generation on single-GPU setups.

vs others: Faster than Runway ML or Pika Labs for long-form generation (19.5 FPS on H100) because it avoids expensive anti-drifting mechanisms through training-time optimizations rather than inference-time corrections.

6

Mistral: Ministral 3 8B 2512 | Model | 23/100

via “efficient text generation with context window management”

A balanced model in the Ministral 3 family, Ministral 3 8B is a powerful, efficient tiny language model with vision capabilities.

Unique: Balanced efficiency-to-capability ratio in the 8B class: uses optimized attention mechanisms and training procedures to achieve performance closer to 13B models while maintaining 8B inference speed, making it a sweet spot for production deployments.

vs others: Faster inference and lower cost than Llama 2 70B or Mistral 7B while maintaining competitive quality on most text generation tasks.

7

GPT-NeoX-20B: An Open-Source Autoregressive Language Model (GPT-NeoX) | Model | 20/100

* ⭐ 04/2022: [PaLM: Scaling Language Modeling with Pathways (PaLM)](https://arxiv.org/abs/2204.02311)

Unique: First open-source 20B-parameter model trained on diverse, curated data (EleutherAI's The Pile) with full architectural transparency and a reproducible training pipeline, enabling community-driven optimization and fine-tuning without proprietary restrictions.

vs others: Larger and more capable than GPT-2 (1.5B) with comparable inference cost to smaller models, while maintaining full open-source licensing, unlike GPT-3 (closed API), and competitive with contemporaneous models like BLOOM-176B in capability-per-parameter efficiency.

8

Build a Large Language Model (From Scratch) | Product | 20/100

via “autoregressive-text-generation”

A guide to building your own working LLM, by Sebastian Raschka.

Unique: Implements multiple decoding strategies (greedy, beam search, top-k/top-p sampling) with explicit control over generation behavior, showing how temperature and filtering affect output diversity.

vs others: More transparent than high-level generation APIs, enabling practitioners to understand and modify generation behavior for specific use cases.
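To make the difference between greedy decoding and beam search concrete, here is a small sketch on a toy next-token table. It is an assumption-laden illustration rather than the book's code: `TOY_MODEL`, its probabilities, and both function names are hypothetical, the "model" conditions only on the last token, and no length normalisation is applied.

```python
import math

# Hypothetical toy "model": next-token probabilities given only the last token.
TOY_MODEL = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.4, "dog": 0.35, "</s>": 0.25},
    "a": {"bird": 0.9, "</s>": 0.1},
    "cat": {"</s>": 1.0},
    "dog": {"</s>": 1.0},
    "bird": {"</s>": 1.0},
}


def greedy_decode(model, start="<s>", end="</s>", max_steps=10):
    """Always append the single most probable next token."""
    sequence, log_prob = [start], 0.0
    for _ in range(max_steps):
        if sequence[-1] == end:
            break
        token, prob = max(model[sequence[-1]].items(), key=lambda kv: kv[1])
        sequence.append(token)
        log_prob += math.log(prob)
    return sequence, log_prob


def beam_search(model, beam_width=2, start="<s>", end="</s>", max_steps=10):
    """Keep the beam_width best partial sequences by total log-probability."""
    beams = [([start], 0.0)]
    for _ in range(max_steps):
        if all(seq[-1] == end for seq, _ in beams):
            break
        candidates = []
        for seq, score in beams:
            if seq[-1] == end:
                # Finished sequences carry over unchanged.
                candidates.append((seq, score))
                continue
            for token, prob in model[seq[-1]].items():
                candidates.append((seq + [token], score + math.log(prob)))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams[0]


greedy_seq, greedy_score = greedy_decode(TOY_MODEL)
beam_seq, beam_score = beam_search(TOY_MODEL, beam_width=2)
```

In this table the greedy path commits to "the" because it is locally most probable, while a beam of width 2 keeps "a" alive and ends with a sequence of higher overall probability.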
