{"passport":{"unfragile":{"@version":"1.0","version":"2026-05","artifact":{"id":"frontiermath","slug":"frontiermath","name":"FrontierMath","type":"benchmark","url":"https://epochai.org/frontiermath","page_url":"https://unfragile.ai/frontiermath","categories":["testing-quality"],"tags":[],"pricing":{"model":"free","free":true,"starting_price":null},"status":"active","verified":false},"capabilities":[{"id":"frontiermath__cap_0","uri":"capability://data.processing.analysis.expert.authored.frontier.mathematics.problem.curation","name":"expert-authored frontier mathematics problem curation","description":"Curates several hundred original, unpublished mathematics problems authored and peer-reviewed by expert mathematicians across number theory, algebra, geometry, and analysis. Problems are tiered from undergraduate through research-level difficulty (Tiers 1-4), with a separate collection of genuinely unsolved problems that have resisted professional mathematician attempts. The curation process involves expert validation to ensure problems are novel, mathematically sound, and appropriately calibrated for difficulty.","intents":["Evaluate whether AI models can solve frontier-level mathematics problems beyond current capabilities","Test mathematical reasoning across multiple subdisciplines with problems authored by domain experts","Benchmark AI performance on unpublished, original problems to avoid data contamination from public datasets","Measure progress on genuinely open mathematical problems that professional mathematicians have not yet solved"],"best_for":["AI capability researchers measuring frontier mathematical reasoning","Organizations conducting independent model evaluations","Teams building mathematical reasoning systems who need ground-truth difficulty calibration"],"limitations":["Exact problem count unknown — documentation states 'several hundred' without precise inventory","No public leaderboard or baseline performance data available to contextualize results","Problem format specifications unknown — unclear if problems require proofs, numerical answers, or symbolic computation","No information on train/test split or data contamination screening procedures","Evaluation methodology for genuinely unsolved problems not specified"],"requires":["Access to FrontierMath benchmark (access model unknown — may require application)","Mathematical reasoning capability in AI model being evaluated","Ability to parse and execute problem specifications (format unknown)"],"input_types":["mathematical problem statements (format unspecified)"],"output_types":["mathematical solutions (format unspecified — may be proofs, numerical answers, or symbolic expressions)"],"categories":["data-processing-analysis","benchmark-evaluation"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"frontiermath__cap_1","uri":"capability://data.processing.analysis.multi.tier.mathematical.difficulty.stratification","name":"multi-tier mathematical difficulty stratification","description":"Organizes problems into four explicit difficulty tiers (Tiers 1-4) spanning undergraduate through postdoctoral to research-level mathematics, enabling granular measurement of AI reasoning capability across the difficulty spectrum. This tiered structure allows evaluation of whether models can progress from foundational to frontier-level problem-solving, with separate tracking of performance at each tier to identify capability boundaries.","intents":["Measure AI mathematical reasoning across a continuous difficulty spectrum from undergraduate to research level","Identify at what difficulty tier AI models begin to fail or plateau","Compare model performance across subdisciplines at equivalent difficulty levels","Track progress over time as models improve across all difficulty tiers"],"best_for":["Researchers studying AI capability scaling and frontier boundaries","Teams building progressive mathematical reasoning systems","Organizations publishing model evaluation reports with granular difficulty analysis"],"limitations":["Tier definitions and calibration methodology not specified in documentation","No information on problem distribution across tiers (e.g., how many problems per tier)","Scoring methodology per tier unknown — unclear if tiers are weighted equally or differently","No baseline or SOTA performance data per tier to contextualize results"],"requires":["Access to FrontierMath benchmark with tier labels","Evaluation harness capable of stratifying results by tier"],"input_types":["mathematical problems labeled with tier (1-4)"],"output_types":["per-tier performance metrics (format unspecified)"],"categories":["data-processing-analysis","planning-reasoning"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"frontiermath__cap_2","uri":"capability://planning.reasoning.unsolved.mathematics.problem.evaluation","name":"unsolved mathematics problem evaluation","description":"Maintains a separate collection of genuinely unsolved mathematics problems that have resisted serious attempts by professional mathematicians, enabling evaluation of whether AI can make progress on open research problems. The evaluation approach for these problems is unspecified but conceptually distinct from standard problem-solving — measuring whether AI can contribute novel insights, partial solutions, or proof strategies to problems without known solutions.","intents":["Test whether AI can contribute to genuine mathematical research by solving or advancing unsolved problems","Measure AI capability on problems where no ground-truth solution exists","Evaluate AI's ability to generate novel mathematical insights rather than reproduce known solutions","Benchmark frontier mathematical reasoning on problems that represent actual research frontiers"],"best_for":["Research organizations studying AI's potential for mathematical discovery","Teams evaluating whether AI can contribute to open research problems","Mathematicians interested in AI-assisted problem-solving on frontier problems"],"limitations":["Evaluation methodology for unsolved problems completely unspecified — no rubric for assessing partial progress, novel approaches, or proof strategies","No information on how many unsolved problems are in the collection","Unclear whether unsolved problems are scored separately or combined with Tiers 1-4","No baseline or reference solutions available (by definition of unsolved)","Requires expert mathematician review to validate any claimed progress"],"requires":["Access to unsolved problem collection","Expert mathematician review capability to evaluate proposed solutions or approaches","Mathematical reasoning system capable of generating novel proofs or insights"],"input_types":["unsolved mathematics problem statements"],"output_types":["proposed solutions, partial proofs, or novel approaches (evaluation criteria unknown)"],"categories":["planning-reasoning","data-processing-analysis"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"frontiermath__cap_3","uri":"capability://data.processing.analysis.cross.subdiscipline.mathematical.reasoning.measurement","name":"cross-subdiscipline mathematical reasoning measurement","description":"Evaluates mathematical reasoning across four distinct subdisciplines (number theory, algebra, geometry, analysis) within a single benchmark, enabling assessment of whether AI reasoning generalizes across mathematical domains or exhibits domain-specific strengths and weaknesses. The multi-subdiscipline structure allows identification of which mathematical areas AI handles well versus poorly.","intents":["Measure whether AI mathematical reasoning generalizes across different mathematical subdisciplines","Identify domain-specific strengths and weaknesses in AI mathematical reasoning","Compare AI performance on proof-based (geometry, analysis) versus computational (number theory, algebra) problems","Evaluate breadth of mathematical understanding across the mathematical landscape"],"best_for":["Researchers studying generalization in AI mathematical reasoning","Teams building mathematical systems who need subdiscipline-specific performance data","Organizations publishing comprehensive model evaluations"],"limitations":["No information on problem distribution across subdisciplines (e.g., how many per subdiscipline)","Subdiscipline definitions and boundaries not specified","No baseline performance data per subdiscipline to contextualize results","Unclear whether subdiscipline-specific scoring or weighting is applied"],"requires":["Access to FrontierMath benchmark with subdiscipline labels","Evaluation harness capable of stratifying results by subdiscipline"],"input_types":["mathematical problems labeled by subdiscipline (number theory, algebra, geometry, analysis)"],"output_types":["per-subdiscipline performance metrics"],"categories":["data-processing-analysis","planning-reasoning"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"frontiermath__cap_4","uri":"capability://data.processing.analysis.independent.ai.capability.measurement.and.publication","name":"independent ai capability measurement and publication","description":"Operates as a free, open-source benchmark maintained by Epoch AI (a nonprofit focused on neutral, evidence-grounded AI capability measurement) with no commercial incentives or vendor lock-in. The benchmark is designed for independent evaluation of AI models, enabling researchers and organizations to assess frontier mathematical reasoning without reliance on proprietary evaluation infrastructure or vendor-controlled leaderboards.","intents":["Conduct independent evaluations of AI mathematical reasoning without vendor bias","Publish model performance results on a neutral, nonprofit-maintained benchmark","Access frontier mathematics problems for research without commercial constraints","Contribute to evidence-grounded AI capability measurement"],"best_for":["Independent researchers and organizations conducting model evaluations","Teams building open-source AI systems who need neutral benchmarks","Nonprofits and academic institutions studying AI capabilities","Policy organizations requiring evidence-grounded capability assessment"],"limitations":["Access model unknown — may require application or have restrictions despite open-source designation","No official leaderboard or results aggregation mentioned — unclear how results are published or compared","Nonprofit status does not guarantee comprehensive documentation or support infrastructure","No commercial incentive for rapid updates or feature additions"],"requires":["Open-source license compliance (license type unspecified)","Ability to access benchmark (access mechanism unknown)"],"input_types":["mathematical problem statements"],"output_types":["evaluation results (publication format unknown)"],"categories":["data-processing-analysis","safety-moderation"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"frontiermath__headline","uri":"capability://testing.quality.advanced.mathematics.benchmark.for.ai.evaluation","name":"advanced mathematics benchmark for ai evaluation","description":"FrontierMath is an expert-level benchmark designed to rigorously evaluate AI systems' capabilities in advanced mathematics, including number theory, algebra, geometry, and analysis through original problem sets.","intents":["best mathematics benchmark for AI","AI evaluation tools for advanced math","top benchmarks for mathematical reasoning","how to test AI in mathematics","challenging math problems for AI evaluation"],"best_for":["evaluating AI reasoning in mathematics"],"limitations":["not suitable for basic math evaluation"],"requires":[],"input_types":[],"output_types":[],"categories":["testing-quality"],"confidence":0.5,"matches":0,"success_rate":0}],"trust":{"score":61,"verified":false,"data_access_risk":"low","permissions":["Access to FrontierMath benchmark (access model unknown — may require application)","Mathematical reasoning capability in AI model being evaluated","Ability to parse and execute problem specifications (format unknown)","Access to FrontierMath benchmark with tier labels","Evaluation harness capable of stratifying results by tier","Access to unsolved problem collection","Expert mathematician review capability to evaluate proposed solutions or approaches","Mathematical reasoning system capable of generating novel proofs or insights","Access to FrontierMath benchmark with subdiscipline labels","Evaluation harness capable of stratifying results by subdiscipline"],"failure_modes":["Exact problem count unknown — documentation states 'several hundred' without precise inventory","No public leaderboard or baseline performance data available to contextualize results","Problem format specifications unknown — unclear if problems require proofs, numerical answers, or symbolic computation","No information on train/test split or data contamination screening procedures","Evaluation methodology for genuinely unsolved problems not specified","Tier definitions and calibration methodology not specified in documentation","No information on problem distribution across tiers (e.g., how many problems per tier)","Scoring methodology per tier unknown — unclear if tiers are weighted equally or differently","No baseline or SOTA performance data per tier to contextualize results","Evaluation methodology for unsolved problems completely unspecified — no rubric for assessing partial progress, novel approaches, or proof strategies","builder identity is not verified yet","no observed match outcomes yet"],"rank_breakdown":{"adoption":0.7,"quality":0.8500000000000001,"ecosystem":0.3,"match_graph":0.25,"freshness":0.75,"weights":{"adoption":0.25,"quality":0.35,"ecosystem":0.15,"match_graph":0.2,"freshness":0.05}},"observed_outcomes":{"matches":0,"success_rate":0,"avg_confidence":0,"top_intents":[],"last_matched_at":null},"maintenance":{"status":"active","updated_at":"2026-05-24T12:16:21.549Z","last_scraped_at":null,"last_commit":null},"community":{"stars":null,"forks":null,"weekly_downloads":null,"model_downloads":null,"model_likes":null}},"distribution":{"claim_url":"https://unfragile.ai/submit?claim=frontiermath","compare_url":"https://unfragile.ai/compare?artifact=frontiermath"}},"signature":"G4XI5+YkgQzvd4SskhqwjJVGoKDXoz35M+ooeNwOgEPg+TNGxhn0eDpi1v9xPAXqwhc00gXB0BQrwp4lY3o5DQ==","signedAt":"2026-06-20T22:30:07.967Z","signedBy":"unfragile.ai","version":1},"_links":{"self":"https://unfragile.ai/api/v1/passport/frontiermath","artifact":"https://unfragile.ai/frontiermath","verify":"https://unfragile.ai/api/v1/verify?slug=frontiermath","publicKey":"https://unfragile.ai/api/v1/trust-passport-public-key","spec":"https://unfragile.ai/trust","schema":"https://unfragile.ai/schema.json","docs":"https://unfragile.ai/docs"}}