Capability
2 artifacts provide this capability.
via “language-specific code filtering and sampling”
250GB curated code dataset for StarCoder training.
Unique: Provides language-stratified sampling and filtering across 86 programming languages, letting researchers control the dataset's composition by language. It also includes per-language distribution statistics to inform sampling decisions.
vs others: More flexible than fixed-composition datasets and broader than single-language datasets. Lets researchers study how language diversity affects code model performance.
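The stratified sampling and per-language statistics described above can be illustrated with a minimal Python sketch. It is not the dataset's own tooling: it assumes each record is a dict with hypothetical `language` and `content` fields, and that the researcher supplies relative per-language weights. The function names `language_stats` and `stratified_sample` are illustrative only.

```python
import random
from collections import Counter, defaultdict


def language_stats(records):
    """Return each language's share of the records (fractions summing to 1)."""
    counts = Counter(r["language"] for r in records)
    total = sum(counts.values())
    return {lang: n / total for lang, n in counts.items()}


def stratified_sample(records, weights, n_total, keep_languages=None, seed=0):
    """Draw about n_total records, splitting the budget across languages by weight.

    Hypothetical parameters (assumptions, not from any real dataset API):
      weights: language -> positive relative weight; unlisted languages are dropped.
      keep_languages: optional set of languages to retain (the filtering step).
    Sampling is without replacement inside each language, so a language
    contributes at most as many records as it actually has.
    """
    rng = random.Random(seed)
    by_lang = defaultdict(list)
    for r in records:
        lang = r["language"]
        if keep_languages is not None and lang not in keep_languages:
            continue  # filter out languages the researcher excluded
        if lang in weights:
            by_lang[lang].append(r)

    total_weight = sum(weights[lang] for lang in by_lang)
    sample = []
    for lang, pool in by_lang.items():
        quota = round(n_total * weights[lang] / total_weight)
        sample.extend(rng.sample(pool, min(quota, len(pool))))
    return sample


# Toy corpus with a skewed language mix (made-up data for illustration).
corpus = (
    [{"language": "python", "content": f"py{i}"} for i in range(60)]
    + [{"language": "rust", "content": f"rs{i}"} for i in range(30)]
    + [{"language": "go", "content": f"go{i}"} for i in range(10)]
)
print(language_stats(corpus))  # {'python': 0.6, 'rust': 0.3, 'go': 0.1}
balanced = stratified_sample(corpus, {"python": 1, "rust": 1, "go": 1}, n_total=30)
print(language_stats(balanced))
```

Because sampling is without replacement, a low-resource language can fall short of its quota, and rounding means the result may differ slightly from `n_total`. Upsampling with replacement is one alternative for rare languages; it is not shown here.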
via “language-specific-code-analysis”
© 2026 Unfragile. The platform for software for agents.