Capability
Filtered Dataset Subset Creation
8 artifacts provide this capability.
Top Matches
via “dataset subset creation and curation”
A dataset of 5.85 billion image-text pairs, foundational for training image-generation models.
Unique: Enables reproducible subset creation by combining pre-computed metadata filters (CLIP scores, NSFW flags, watermark flags, language tags, aesthetic scores) without reprocessing the images. Subsets can be materialized once when the dataset is built, or selected dynamically at training time.
vs others: reproducible curation rather than ad-hoc filtering; combines multiple quality signals (CLIP, NSFW, watermark, aesthetic) rather than relying on a single signal; supports language-aware subsetting, unlike monolingual alternatives.
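The entry describes subsets built from pre-computed per-pair metadata rather than from the images themselves. The following is a minimal Python sketch of that idea, assuming each image-text pair comes with a metadata record. The `Sample` and `SubsetSpec` types, their field names, and every threshold value are hypothetical placeholders, not the dataset's actual schema or recommended cutoffs.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Iterator, List, Optional


@dataclass(frozen=True)
class Sample:
    # Hypothetical metadata record for one image-text pair.
    # Field names are illustrative, not the dataset's real column names.
    url: str
    caption: str
    clip_score: float
    nsfw: bool
    watermark: bool
    language: Optional[str]
    aesthetic: float


@dataclass(frozen=True)
class SubsetSpec:
    # Hypothetical filter configuration; default thresholds are placeholders.
    min_clip: float = 0.3
    allow_nsfw: bool = False
    allow_watermark: bool = False
    languages: Optional[FrozenSet[str]] = None  # None means "any language"
    min_aesthetic: float = 0.0

    def accepts(self, s: Sample) -> bool:
        # Every check reads pre-computed metadata only; no image is decoded.
        if s.clip_score < self.min_clip:
            return False
        if s.nsfw and not self.allow_nsfw:
            return False
        if s.watermark and not self.allow_watermark:
            return False
        if self.languages is not None and s.language not in self.languages:
            return False
        return s.aesthetic >= self.min_aesthetic


def build_subset(samples: Iterable[Sample], spec: SubsetSpec) -> List[Sample]:
    # Creation-time curation: materialize the subset once.
    return [s for s in samples if spec.accepts(s)]


def stream_subset(samples: Iterable[Sample], spec: SubsetSpec) -> Iterator[Sample]:
    # Training-time curation: filter lazily as records are read.
    for s in samples:
        if spec.accepts(s):
            yield s


# Illustrative records (made-up values).
rows = [
    Sample("a.jpg", "a red bicycle", 0.33, False, False, "en", 6.1),
    Sample("b.jpg", "ein Hund im Park", 0.31, False, False, "de", 5.2),
    Sample("c.jpg", "stock photo", 0.35, False, True, "en", 5.8),
    Sample("d.jpg", "blurry", 0.12, False, False, "en", 4.0),
]
spec = SubsetSpec(min_clip=0.3, languages=frozenset({"en"}), min_aesthetic=5.0)
print([s.url for s in build_subset(rows, spec)])  # → ['a.jpg']
```

Because `SubsetSpec` is an immutable value, storing it next to a fixed metadata snapshot is enough to regenerate the same subset later. `build_subset` corresponds to curating at dataset-creation time, while `stream_subset` applies the identical predicate on the fly during training.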