Capability
3 artifacts provide this capability.
via “permissively-licensed source code dataset curation and aggregation”
67 TB permissively licensed code dataset across 600+ languages.
Unique: Largest open-source code dataset at 67 TB, with automated opt-out governance that lets repository owners request removal, combined with a rigorous deduplication and PII-removal pipeline. No other public dataset offers this scale together with legal-compliance and community-control mechanisms.
vs others: Larger and more legally compliant than GitHub's CodeSearchNet (14M files) or Google's BigQuery public datasets; uses explicit opt-out governance rather than implicit inclusion; and covers 600+ languages, whereas the language distribution of Codex's training data is undisclosed.
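The listing names deduplication, PII removal, and opt-out governance but does not say how they are implemented. The following is a minimal sketch of how the three steps could compose. It assumes exact-match deduplication by SHA-256 content hash, an opt-out list keyed by repository name, and a deliberately naive email regex standing in for PII detection (large corpora often add near-duplicate detection and trained PII detectors instead). `SourceFile`, `curate`, and `redact_emails` are hypothetical names, not part of any dataset tooling.

```python
import hashlib
import re
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class SourceFile:
    """Hypothetical record: one file and the repository it was collected from."""
    repo: str
    path: str
    content: str


# Naive stand-in for PII detection: matches email-like strings only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact_emails(text: str) -> str:
    """Replace email-like substrings with a fixed placeholder."""
    return EMAIL_RE.sub("<EMAIL>", text)


def curate(files: Iterable[SourceFile], opted_out_repos: Set[str]) -> List[SourceFile]:
    """Drop opted-out repos, keep the first copy of each identical file, redact emails."""
    seen_hashes: Set[str] = set()
    kept: List[SourceFile] = []
    for f in files:
        if f.repo in opted_out_repos:
            continue  # honour removal requests before hashing anything
        digest = hashlib.sha256(f.content.encode("utf-8")).hexdigest()
        if digest in seen_hashes:
            continue  # exact duplicate (same SHA-256 digest) of a file already kept
        seen_hashes.add(digest)
        kept.append(SourceFile(f.repo, f.path, redact_emails(f.content)))
    return kept
```

Because opted-out files are skipped before they are hashed, an identical copy that lives in a repository that has not opted out is still kept rather than being discarded as a "duplicate" of removed content.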
via “multi-language code corpus assembly with permissive licensing verification”
783 GB curated code dataset from 86 languages with PII redaction.
Unique: Explicit permissive-only license filter with SPDX validation at collection time, combined with an opt-out mechanism for developers. Most competing datasets (CodeSearchNet, GitHub-Code) lack a developer opt-out and include mixed licensing.
vs others: Legally cleaner than CodeSearchNet (mixed GPL/proprietary) and more developer-respectful than GitHub-Code (no opt-out), making it safer for commercial model training
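The listing says licenses are validated against SPDX identifiers at collection time but gives neither the allowlist nor the matching rules. A minimal sketch, assuming each record already carries a detected SPDX license expression in a hypothetical `license` field: in SPDX expressions `AND` binds tighter than `OR`, so a file passes if at least one `OR` alternative consists entirely of allowlisted IDs. The allowlist is illustrative, not the dataset's actual list, and anything the sketch does not parse (parentheses, `WITH` exceptions) is rejected rather than guessed at.

```python
from typing import Iterable, List, Optional

# Illustrative allowlist of SPDX identifiers; not the dataset's actual list.
PERMISSIVE_IDS = frozenset({
    "MIT", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause", "ISC", "Unlicense", "Zlib",
})


def is_permissive(expression: Optional[str]) -> bool:
    """Return True if a simple SPDX expression can be satisfied with permissive licenses only."""
    if not expression or not expression.strip():
        return False  # no detected license: exclude the file
    tokens = expression.split()
    # Conservative: reject grouping and license exceptions instead of parsing them.
    if any("(" in t or ")" in t or t == "WITH" for t in tokens):
        return False
    normalized = " ".join(tokens)
    # AND binds tighter than OR, so split on OR first. A licensee may pick any
    # OR alternative, but every ID inside an AND group must be permissive.
    for alternative in normalized.split(" OR "):
        if all(lic in PERMISSIVE_IDS for lic in alternative.split(" AND ")):
            return True
    return False


def filter_permissive(records: Iterable[dict]) -> List[dict]:
    """Keep only records whose hypothetical 'license' field passes is_permissive."""
    return [r for r in records if is_permissive(r.get("license"))]
```

Rejecting unparsed or missing licenses trades some recall for legal safety, which matches the "legally cleaner" goal described above.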
via “multi-language license validation code generation”
Open-source software licensing SDK. Generate ready-to-paste license validation code for C, C++, Rust, Python, Electron, Tauri, Unity, and JUCE. Explain machine binding, offline validation, trial keys, and anti-tamper. Scaffold Docker, Fly.io, Railway, and VPS server deployments. No API key required.
Unique: Generates language-idiomatic, zero-dependency validation code for 8+ languages and frameworks from a unified schema, with an offline-first architecture built into the generated code rather than a wrapper around a shared validation service.
vs others: Faster to deploy than hand-building license validation for each language, because the generated code is production-ready as-is and, unlike cloud-based licensing platforms, makes no external service calls.
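The listing says the SDK generates offline validation code with machine binding and trial keys, but does not show that code. The following is a minimal Python sketch of the general idea, not the SDK's output. It assumes a key made of JSON claims plus an HMAC-SHA256 tag, a hypothetical `machine_fingerprint()` derived from the hostname and MAC-based node ID, and an expiry timestamp for trials; `issue_key`, `validate_key`, and `DEMO_SECRET` are likewise illustrative names. HMAC keeps the example standard-library-only, but since the verifying secret must ship with the client, real offline schemes usually verify a public-key signature (for example Ed25519) instead.

```python
import base64
import hashlib
import hmac
import json
import platform
import time
import uuid
from typing import Optional

# Hypothetical vendor secret. Anything shipped inside a client binary can be extracted,
# so real offline schemes usually verify a public-key signature rather than an HMAC.
DEMO_SECRET = b"demo-secret-do-not-ship"


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).decode("ascii").rstrip("=")


def _unb64(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def machine_fingerprint() -> str:
    """Hash hostname and node ID; illustrative, spoofable, and getnode() may be random."""
    raw = f"{platform.node()}|{uuid.getnode()}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:16]


def issue_key(machine: str, expires_at: Optional[float], secret: bytes = DEMO_SECRET) -> str:
    """Vendor side: sign claims binding the key to one machine and an optional expiry."""
    payload = json.dumps({"machine": machine, "expires": expires_at}, sort_keys=True).encode("utf-8")
    tag = hmac.new(secret, payload, hashlib.sha256).digest()
    return f"{_b64(payload)}.{_b64(tag)}"


def validate_key(key: str, machine: str, now: Optional[float] = None,
                 secret: bytes = DEMO_SECRET) -> dict:
    """Client side, no network: verify signature, machine binding and expiry."""
    try:
        payload_part, tag_part = key.split(".")
        payload, tag = _unb64(payload_part), _unb64(tag_part)
    except ValueError as exc:  # wrong shape or bad base64 (binascii.Error is a ValueError)
        raise ValueError("malformed key") from exc
    expected = hmac.new(secret, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("signature mismatch")
    claims = json.loads(payload)
    if claims["machine"] != machine:
        raise ValueError("key is bound to a different machine")
    now = time.time() if now is None else now
    if claims["expires"] is not None and now >= claims["expires"]:
        raise ValueError("key has expired")
    return claims
```

Under these assumptions, a 14-day trial key would be `issue_key(machine_fingerprint(), time.time() + 14 * 86400)`, and passing `expires_at=None` yields a perpetual key. Every check runs locally, which is what "offline validation" requires; the sketch does nothing about anti-tamper protection of the client itself.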
© 2026 Unfragile. The platform for software for agents.