via “competition-mathematics problem corpus construction and curation”
12.5K competition math problems across 7 subjects and 5 difficulty levels.
Unique: Curated from actual mathematics competitions (AMC/AIME) rather than synthetic or textbook problems, ensuring problems require genuine multi-step reasoning and cannot be solved by pattern matching alone. Includes difficulty stratification (1-5) and subject taxonomy across 7 mathematical domains, enabling fine-grained capability analysis. Verified solutions provided by domain experts, not generated by models.
vs others: More rigorous than general math benchmarks (e.g., SVAMP, MathQA) because it uses authentic competition problems with higher reasoning complexity; more comprehensive than single-domain datasets because it spans 7 mathematical subjects with 12,500 problems; more reliable than synthetic benchmarks because problems are human-authored and competition-tested.