Elo Rating Selection
Elo Rating selection uses a runtime rating system to rank models based on user feedback. Models that receive positive feedback gain rating points; those with negative feedback lose points. Over time, better-performing models rise to the top and get selected more often.
This approach uses the Bradley-Terry model, a pairwise comparison framework, to refine model selection continuously through online learning.
Note on RouteLLM: The RouteLLM paper (Ong et al.) trains static router models on preference data and achieves ~50% cost reduction (2x savings). Our implementation takes a different approach: a runtime Elo rating system that updates dynamically based on live feedback rather than pre-trained static routing.
Algorithm Flow
Mathematical Foundation
Expected Score (Bradley-Terry Model)
The expected probability that model A beats model B:
E_A = 1 / (1 + 10^((R_B - R_A) / 400))
Where:
R_A = Rating of model A
R_B = Rating of model B
E_A = Expected score (probability that A wins)
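As a minimal sketch, the expected-score formula can be written in Python as follows. The function name `expected_score` and the example ratings are illustrative assumptions, not taken from the actual implementation.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Return E_A, the expected probability that model A beats model B.

    Hypothetical helper: implements E_A = 1 / (1 + 10^((R_B - R_A) / 400)).
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Equal ratings give an even chance.
print(expected_score(1500, 1500))            # 0.5

# A 200-point lead gives roughly a 76% expected win rate.
print(round(expected_score(1600, 1400), 3))  # 0.76
```

Because the formula is symmetric, the expected scores of the two models always sum to 1: E_A + E_B = 1.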