Capability
Multi-Domain Reasoning Task Stratification
12 artifacts provide this capability.
Top Matches
via “multi-domain reasoning task stratification”
The 23 hardest BIG-Bench tasks, selected because models initially failed on them.
Unique: Explicitly stratifies tasks by reasoning modality (algorithmic, arithmetic, logical, causal, spatial) rather than treating all hard tasks as monolithic, enabling domain-specific capability assessment. This structure allows researchers to correlate model architecture choices with specific reasoning strengths.
vs others: More analytically useful than generic hard-task collections, because stratification enables root-cause analysis of reasoning failures; more focused than the full BIG-Bench, which lacks explicit domain organization.
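To illustrate what stratifying by reasoning modality buys in practice, here is a minimal Python sketch that groups per-task results by domain and reports accuracy for each one. The task names, the `TASK_DOMAIN` mapping, the `accuracy_by_domain` helper, and the sample outcomes are all hypothetical placeholders, not the benchmark's actual task list or any published results.

```python
from collections import defaultdict

# Hypothetical task-to-modality mapping. The task names are placeholders,
# not the benchmark's real task identifiers.
TASK_DOMAIN = {
    "sorting_task": "algorithmic",
    "multi_step_sum": "arithmetic",
    "deduction_task": "logical",
    "cause_effect_task": "causal",
    "shape_layout_task": "spatial",
}


def accuracy_by_domain(results):
    """Aggregate (task_name, is_correct) pairs into per-domain accuracy."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for task, is_correct in results:
        domain = TASK_DOMAIN[task]
        total[domain] += 1
        correct[domain] += int(is_correct)
    # Only domains that appear in the results are reported.
    return {domain: correct[domain] / total[domain] for domain in total}


# Made-up outcomes, for illustration only.
sample_results = [
    ("sorting_task", True),
    ("sorting_task", False),
    ("multi_step_sum", True),
    ("deduction_task", False),
    ("deduction_task", False),
    ("shape_layout_task", True),
]

report = accuracy_by_domain(sample_results)
weakest = min(report, key=report.get)  # domain with the lowest accuracy
```

A single aggregate score over these six examples would hide the fact that every failure but one falls in a single domain; the per-domain report makes that visible, which is the kind of root-cause analysis the listing describes.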