Capability
Large-Scale Image-Text Dataset Access
5 artifacts provide this capability.
Top Matches
via “large-scale image-text pair dataset with clip-based quality filtering”
5.85 billion image-text pairs, foundational for image generation.
Unique: The largest openly available image-text dataset (5.85B pairs), with a pre-computed CLIP similarity score for every pair, so it can be filtered by quality without re-embedding. It is organized into language-specific clusters and distributed across multiple providers for redundancy and accessibility.
vs others: About 14x larger than LAION-400M and substantially larger than the reported training sets of proprietary models such as DALL-E and Imagen. Its metadata is openly accessible (the linked images keep their original copyright), making it the de facto foundation for open-source image generation models.
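As a rough illustration of what filtering on pre-computed CLIP similarity scores can look like, here is a minimal Python sketch. The row layout, the field names (`url`, `text`, `similarity`), the sample values, and the 0.28 threshold are all assumptions made for this example. They are not taken from the dataset's documentation, so check the actual metadata schema before relying on them.

```python
from typing import Dict, Iterable, Iterator

# Hypothetical metadata rows. Field names and values are assumptions for
# illustration only; real metadata files may use different column names.
SAMPLE_ROWS = [
    {"url": "https://example.com/a.jpg", "text": "a red bicycle leaning on a wall", "similarity": 0.34},
    {"url": "https://example.com/b.jpg", "text": "click here", "similarity": 0.12},
    {"url": "https://example.com/c.jpg", "text": "a bowl of ramen", "similarity": 0.29},
]


def filter_by_clip_score(rows: Iterable[Dict], threshold: float = 0.28) -> Iterator[Dict]:
    """Yield rows whose stored similarity score meets the threshold.

    No image or text is re-embedded: the score already stored with each
    pair is simply compared against the cutoff. Rows with a missing
    score are skipped.
    """
    for row in rows:
        score = row.get("similarity")
        if score is not None and score >= threshold:
            yield row


kept = list(filter_by_clip_score(SAMPLE_ROWS))
kept_urls = [row["url"] for row in kept]
print(kept_urls)
```

Raising the threshold trades dataset size for tighter image-caption alignment. Because the scores ship with the metadata, trying several cutoffs only requires re-reading the metadata, not downloading images.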