KMeans
Overview
kmeans is a selection algorithm that uses cluster-based routing to assign queries to candidate models. It partitions the query embedding space into clusters and maps each cluster to the best-performing model.
It aligns to config/algorithm/selection/kmeans.yaml.
Implementation: Rust via Linfa (linfa-clustering).
Key Advantages
- Efficient inference: O(k×d) per query (k = clusters, d = embedding dimension).
- Natural grouping of query patterns into clusters.
- Works well when prompt traffic naturally falls into recurring categories.
- Efficiency weight parameter allows quality vs. throughput tradeoff.