Capability
Streaming and Lazy Loading for Memory-Constrained Access
3 artifacts provide this capability.
Top Matches
Multilingual web corpus covering 101 languages.
Unique: Streams Parquet files over HTTP range requests, so specific rows and columns can be read on demand without downloading the full dataset. Works with the Hugging Face Datasets IterableDataset API, so a stream can feed a PyTorch DataLoader or a Hugging Face Transformers training loop directly.
vs others: More memory-efficient than downloading the full mC4 corpus and more flexible than pre-computed train/test splits, because subsets can be selected dynamically, which suits rapid prototyping.
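Example: a minimal sketch of dynamic subset selection over a streamed dataset. It assumes the `datasets` package is installed and that the corpus is published on the Hugging Face Hub. The listing does not name the repository or its configurations, so `repo_id`, `config`, and the `"text"` field are placeholders, and `open_stream` and `lazy_subset` are hypothetical helper names rather than part of any library.

```python
from itertools import islice
from typing import Any, Callable, Dict, Iterable, Iterator

Example = Dict[str, Any]


def lazy_subset(
    examples: Iterable[Example],
    keep: Callable[[Example], bool],
    limit: int,
) -> Iterator[Example]:
    """Yield at most `limit` examples that satisfy `keep`.

    The source iterable is read only as far as needed, so the full
    dataset is never held in memory.
    """
    return islice((ex for ex in examples if keep(ex)), limit)


def open_stream(repo_id: str, config: str, split: str = "train") -> Iterable[Example]:
    """Open a Hub dataset in streaming mode (returns an IterableDataset).

    Assumption: `repo_id` and `config` must be supplied by the caller,
    because the listing does not state the real identifiers.
    """
    # Imported here so lazy_subset stays usable without the package.
    from datasets import load_dataset  # third-party: pip install datasets

    return load_dataset(repo_id, config, split=split, streaming=True)


# Hypothetical usage (needs network access; "text" is an assumed field name):
# stream = open_stream("<repo-id>", "<language-config>")
# sample = list(lazy_subset(stream, lambda ex: len(ex["text"]) > 200, limit=100))
```

Because `lazy_subset` accepts any iterable, the same filter can be tried on a small in-memory list before it is pointed at the remote stream.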