Scoring And Ranking With BM25 And Custom Weights

1

lm-evaluation-harness (Benchmark, 65/100)

via “benchmark suite composition and leaderboard aggregation”

EleutherAI's evaluation framework — 200+ benchmarks, powers Open LLM Leaderboard.

Unique: Supports weighted aggregation of metrics across multiple tasks with hierarchical grouping. Leaderboard scores are computed with optional normalization, enabling fair comparison across models with different evaluation configurations.

vs others: Compared to manual leaderboard computation, the framework automates aggregation and ranking. Weighted aggregation enables custom benchmark suites tailored to specific evaluation goals.
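
To make the idea of weighted, hierarchically grouped aggregation concrete, here is a minimal sketch in plain Python. It does not use the harness's own API; the task names, group names, scores, and weights are all hypothetical and chosen only for illustration.

```python
def weighted_mean(scores, weights):
    """Weighted average of `scores`, using the matching entries of `weights`."""
    total_weight = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total_weight

# Hypothetical per-task scores, organized into two groups.
groups = {
    "reasoning": {"task_a": 0.60, "task_b": 0.80},
    "knowledge": {"task_c": 0.50},
}
# Hypothetical weights at both levels of the hierarchy.
task_weights = {"task_a": 1.0, "task_b": 3.0, "task_c": 1.0}
group_weights = {"reasoning": 2.0, "knowledge": 1.0}

# Level 1: aggregate tasks within each group.
group_scores = {
    group: weighted_mean(tasks, task_weights) for group, tasks in groups.items()
}
# Level 2: aggregate the group scores into one overall score.
overall = weighted_mean(group_scores, group_weights)

print(group_scores, round(overall, 4))
```

Changing `group_weights` is enough to define a different suite without touching the per-task results.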

2

Open LLM Leaderboard (Benchmark, 63/100)

via “multi-benchmark-aggregation-and-ranking”

Hugging Face open-source LLM leaderboard — standardized benchmarks, automatic evaluation.

Unique: Implements a transparent, multi-dimensional aggregation strategy that publishes its weighting logic and shows both composite scores and individual benchmark breakdowns, avoiding the "black box" ranking problem where a single number obscures important trade-offs.

vs others: More nuanced than simple average scoring because it weights different benchmark types and provides per-benchmark visibility, whereas commercial model APIs often publish only selected metrics.

3

RediSearch (MCP Server, 55/100)

A query and indexing engine for Redis, providing secondary indexing, full-text search, vector similarity search and aggregations.

Unique: Implements BM25 scoring with field-level weights specified at index creation, enabling domain-specific relevance tuning without custom scoring logic. Scoring is integrated into query execution, so scores are computed during result collection rather than in a post-processing step.

vs others: Likely cheaper than Elasticsearch's script-based custom scoring, because BM25 is computed in-process with no script to execute; simpler than learning Elasticsearch's scoring DSL because field weights are declarative.
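
In RediSearch itself, the weights are declared with the `WEIGHT` option on a `TEXT` field in `FT.CREATE`. The sketch below illustrates the general idea of field-weighted BM25 in plain Python, without a Redis server. It assumes one simple combination rule (score each field with BM25, multiply by the field weight, and sum), which is not necessarily how RediSearch combines fields internally; the documents and weights are made up.

```python
import math
from collections import Counter

def bm25_field_score(query_terms, tokens, doc_freq, n_docs, avg_len, k1=1.2, b=0.75):
    """BM25 score of a single field of a single document."""
    tf = Counter(tokens)
    score = 0.0
    for term in query_terms:
        if tf[term] == 0:
            continue  # term absent from this field
        df = doc_freq[term]
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        length_norm = 1 - b + b * len(tokens) / avg_len
        score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
    return score

def score_docs(docs, query_terms, field_weights):
    """Sum of per-field BM25 scores, each scaled by its (assumed) field weight."""
    totals = [0.0] * len(docs)
    for field, weight in field_weights.items():
        tokenized = [doc[field].split() for doc in docs]
        doc_freq = Counter(t for toks in tokenized for t in set(toks))
        avg_len = sum(len(toks) for toks in tokenized) / len(tokenized)
        for i, toks in enumerate(tokenized):
            totals[i] += weight * bm25_field_score(
                query_terms, toks, doc_freq, len(docs), avg_len
            )
    return totals

# Hypothetical two-document corpus: "redis" appears in one title and one body.
docs = [
    {"title": "redis search", "body": "an engine for indexing data"},
    {"title": "data indexing", "body": "redis is mentioned once here"},
]
weighted = score_docs(docs, ["redis"], {"title": 5.0, "body": 1.0})
unweighted = score_docs(docs, ["redis"], {"title": 1.0, "body": 1.0})
ranking = sorted(range(len(docs)), key=lambda i: -weighted[i])
print(ranking, weighted, unweighted)
```

With equal weights the two documents tie in this toy corpus; raising the title weight is what moves the title match to the top.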

4

rank-bm25 (Repository, 27/100)

via “bm25+ enhanced term frequency handling with saturation control”

Various BM25 algorithms for document ranking

Unique: Implements BM25+, which adds a constant lower bound (delta) to the term-frequency component so that every matching term contributes at least a minimum amount. This addresses a theoretical limitation of BM25Okapi, whose length normalization can shrink a matching term's contribution toward zero in very long documents, so that such documents score barely higher than documents that do not contain the term at all.

vs others: More theoretically sound than BM25Okapi for term-frequency normalization, but empirical gains are often marginal and may require dataset-specific tuning of delta to realize.
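
The difference between the two term-frequency components can be sketched directly from the standard formulas. This is a standalone illustration, not the rank-bm25 source; the parameter defaults (`k1=1.5`, `b=0.75`, `delta=1.0`) and the document lengths are assumptions picked for the example, and the IDF factor is left out because it is the same in both variants.

```python
def okapi_tf(tf, doc_len, avg_len, k1=1.5, b=0.75):
    """Okapi BM25 term-frequency component (without the IDF factor)."""
    return tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_len))

def bm25_plus_tf(tf, doc_len, avg_len, k1=1.5, b=0.75, delta=1.0):
    """BM25+ component: Okapi's value plus a constant delta for matching terms."""
    if tf == 0:
        return 0.0
    return okapi_tf(tf, doc_len, avg_len, k1, b) + delta

avg_len = 100
# One occurrence of the term in an average-length vs. a very long document.
for doc_len in (100, 10_000):
    print(
        doc_len,
        round(okapi_tf(1, doc_len, avg_len), 4),
        round(bm25_plus_tf(1, doc_len, avg_len), 4),
    )
```

In the very long document the Okapi component falls close to zero, while the BM25+ component stays above `delta`. Both components still increase with term frequency for a fixed document length.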

5

open_llm_leaderboard (Web App, 26/100)

via “multi-benchmark-aggregation-and-ranking”

open_llm_leaderboard — AI demo on HuggingFace

Unique: Combines heterogeneous benchmarks (code, math, language) with different evaluation methodologies and score scales into a single unified ranking, using deterministic aggregation that keeps results reproducible across leaderboard updates.

vs others: More comprehensive than single-benchmark rankings (captures multi-dimensional model quality) and more transparent than proprietary model comparison services (aggregation logic is public and reproducible).
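
One way to put benchmarks with different score scales on a common footing is to rescale each raw score between a baseline and a maximum before averaging. The sketch below assumes that approach; it is not the leaderboard's actual code, and the benchmark names, baselines, maxima, and model scores are hypothetical.

```python
# Hypothetical (baseline, maximum) for each benchmark's raw score scale.
SCALES = {
    "multiple_choice": (0.25, 1.0),   # 4-way chance level as baseline
    "code": (0.0, 100.0),             # pass rate in percent
    "math": (0.0, 1.0),               # exact-match fraction
}

def normalize(raw, baseline, maximum):
    """Map a raw score to [0, 1]; scores below the baseline are clipped to 0."""
    return max(0.0, (raw - baseline) / (maximum - baseline))

def aggregate(raw_scores):
    """Unweighted mean of normalized scores, iterating in sorted key order."""
    names = sorted(raw_scores)
    return sum(normalize(raw_scores[n], *SCALES[n]) for n in names) / len(names)

# Hypothetical raw results for two models.
models = {
    "model_a": {"multiple_choice": 0.625, "code": 30.0, "math": 0.10},
    "model_b": {"multiple_choice": 0.20, "code": 60.0, "math": 0.45},
}
scores = {name: aggregate(raw) for name, raw in models.items()}
# Sort by score descending, then by name, so ties resolve the same way every run.
ranking = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
print(ranking)
```

Breaking ties on the model name is one simple way to keep the ordering reproducible between runs.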

6

Brainner (Product)

via “custom-scoring-model-configuration”

Unique: Enables organizations to customize ranking model weights and train on proprietary hiring data rather than relying on a generic pre-trained model, allowing alignment with organization-specific hiring criteria and potentially improving accuracy for niche roles.

vs others: More tailored to specific organizations than generic ranking models, but requires more setup effort and risks encoding organizational biases if the training data is not carefully curated.

7

Vespa (Product)

via “custom-ranking-function-definition”
