chinese-llm-benchmark | Agent | 47/100 via “multi-tier model leaderboard organization with category-based filtering”
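ReLE evaluation: a continuously updated capability evaluation of Chinese AI large language models. It currently covers 359 models, including commercial models such as chatgpt, gpt-5.2, o4-mini, Google gemini-3-pro, Claude-4.6, Wenxin ERNIE-X1.1, ERNIE-5.0, qwen3-max, qwen3.5-plus, Baichuan, iFlytek Spark, and SenseTime senseChat, as well as open-source models such as step3.5-flash, kimi-k2.5, ernie4.5, MiniMax-M2.5, deepseek-v3.2, Qwen3.5, llama4, Zhipu GLM-5, GLM-4.7, LongCat, gemma3, and mistral. It provides not only leaderboards but also, at a scale exceeding 20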
Unique: Implements multi-dimensional leaderboard organization (commercial/open-source primary split, then price tier or parameter size secondary split) with separate ranked lists for reasoning-specialized models. Uses markdown-based leaderboard storage (commerce2.md, reasonmodel.md, alldata.md) enabling version control and community contributions. Maintains model metadata (provider, parameters, pricing) alongside evaluation scores for context-aware comparison.
vs others: Offers more granular category-based filtering than MMLU leaderboards (which use a single global ranking), and organizes models by explicit price tier, unlike the Hugging Face Model Hub (which lacks domain-specific performance context).
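The two-level organization described above (commercial/open-source first, then price tier or parameter size, plus a separate list for reasoning models) can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `ModelEntry` fields, the `secondary_tier` and `build_leaderboards` helpers, and the tier cutoffs (5.0 per million tokens, 30B parameters) are all hypothetical assumptions chosen for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelEntry:
    """Hypothetical record: evaluation score plus metadata used for grouping."""
    name: str
    commercial: bool
    score: float
    price_per_m_tokens: Optional[float] = None  # assumed field, commercial models
    params_b: Optional[float] = None            # assumed field, open-source models
    reasoning: bool = False


def secondary_tier(m: ModelEntry) -> str:
    """Secondary split: price tier for commercial models, size for open-source."""
    if m.commercial:
        if m.price_per_m_tokens is None:
            return "price-unknown"
        # The 5.0 cutoff is an arbitrary assumption for illustration.
        return "price-low" if m.price_per_m_tokens < 5.0 else "price-high"
    if m.params_b is None:
        return "size-unknown"
    # The 30B cutoff is likewise an arbitrary assumption.
    return "size-small" if m.params_b < 30.0 else "size-large"


def build_leaderboards(entries):
    """Group entries by (primary, secondary) and rank each group by score."""
    boards = defaultdict(list)
    for m in entries:
        primary = "commercial" if m.commercial else "open-source"
        boards[(primary, secondary_tier(m))].append(m)
        if m.reasoning:
            # Reasoning-specialized models also get their own ranked list.
            boards[("reasoning", "all")].append(m)
    return {
        key: sorted(group, key=lambda m: m.score, reverse=True)
        for key, group in boards.items()
    }


if __name__ == "__main__":
    # Placeholder entries; names and scores are made up for the example.
    demo = [
        ModelEntry("model-a", True, 80.0, price_per_m_tokens=2.0),
        ModelEntry("model-b", True, 90.0, price_per_m_tokens=1.0, reasoning=True),
        ModelEntry("model-c", False, 70.0, params_b=7.0),
        ModelEntry("model-d", False, 85.0, params_b=70.0, reasoning=True),
    ]
    for key, group in build_leaderboards(demo).items():
        print(key, [m.name for m in group])
```

Keeping the metadata on each record means a model can appear in its category list and in the reasoning list without duplicating scores; each ranked list could then be rendered to its own markdown file.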