Capability
Positional Embedding Strategies With Extrapolation Support
3 artifacts provide this capability.
Top Matches
via “alibi positional encoding for extrapolatable long-context attention”
Fill-mask model. 3,560,259 downloads.
Unique: Combines ALiBi with Flash Attention and RMSNorm in place of standard layer normalization to achieve length extrapolation without learned position embeddings, enabling zero-shot generalization to sequences 4-8x longer than those seen in training (see the bias sketch below).
vs others: Outperforms RoPE (Rotary Position Embeddings) on length-extrapolation benchmarks while incurring lower memory overhead than the interpolated positional embeddings used in LLaMA or GPT-3 variants.
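For context, here is a minimal sketch of the ALiBi mechanism this listing refers to: a fixed, head-specific linear penalty on query–key distance, added to attention logits before the softmax, which is what allows inference at sequence lengths beyond those seen in training. Function names (`alibi_slopes`, `alibi_bias`, `attend`) are illustrative, the slope formula assumes a power-of-two head count (the ALiBi paper gives an interpolation rule for other counts), and this is not the artifact's actual implementation.

```python
import math
import torch

def alibi_slopes(num_heads: int) -> torch.Tensor:
    # Geometric sequence of head-specific slopes starting at 2**(-8/num_heads),
    # as in the ALiBi paper (simplified: assumes num_heads is a power of two).
    ratio = 2.0 ** (-8.0 / num_heads)
    return torch.tensor([ratio ** (h + 1) for h in range(num_heads)])

def alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
    # Per-head linear bias of shape (num_heads, seq_len, seq_len):
    # zero on the diagonal, increasingly negative for more distant keys.
    positions = torch.arange(seq_len)
    distance = positions[None, :] - positions[:, None]  # j - i; negative for past keys
    return alibi_slopes(num_heads)[:, None, None] * distance[None, :, :]

def attend(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    # Causal attention with ALiBi; q, k, v have shape (heads, seq, dim).
    h, s, d = q.shape
    logits = q @ k.transpose(-1, -2) / math.sqrt(d)
    logits = logits + alibi_bias(h, s)  # inject distance penalty pre-softmax
    causal = torch.triu(torch.ones(s, s, dtype=torch.bool), diagonal=1)
    logits = logits.masked_fill(causal, float("-inf"))
    return torch.softmax(logits, dim=-1) @ v

# Usage: because the bias depends only on relative distance, the same
# function runs unchanged at sequence lengths longer than any used in training.
h, s, d = 4, 16, 8
out = attend(torch.randn(h, s, d), torch.randn(h, s, d), torch.randn(h, s, d))
```

The key design point is that the bias is a pure function of query–key distance with no learned parameters, so nothing in the model is tied to a maximum trained position, unlike learned absolute embeddings.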