Research Publications
POSITION PAPER
vLLM Semantic Router: Signal Driven Decision Routing for Mixture-of-Modality Models
Venue: arXiv Technical Report
We introduce vLLM Semantic Router, a signal-driven decision routing framework for Mixture-of-Modality deployments that composes heterogeneous signals into deployment-specific routing policies across cost, privacy, latency, and safety constraints.
2026 | Paper
RESEARCH PUBLICATION
When to Reason: Semantic Router for vLLM
Venue: NeurIPS - MLForSys
We present a semantic router that classifies queries based on their reasoning requirements and selectively applies reasoning only when beneficial.
2025 | Paper
RESEARCH PUBLICATION
Category-Aware Semantic Caching for Heterogeneous LLM Workloads
We present a category-aware semantic caching scheme in which similarity thresholds, TTLs, and quotas vary by query category, built on a hybrid architecture that separates in-memory HNSW search from external document storage.
2025 | Paper
RESEARCH PUBLICATION
Semantic Inference Routing Protocol (SIRP)
Venue: Internet Engineering Task Force (IETF)
This document specifies the Semantic Inference Routing Protocol (SIRP), a framework for content-level classification and semantic routing in AI inference systems.
2025 | Paper
RESEARCH PUBLICATION
Multi-Provider Extensions for Agentic AI Inference APIs
Venue: Internet Engineering Task Force (IETF) - Network Management Research Group
This document specifies multi-provider extensions for agentic AI inference APIs. Published: 20 October 2025. Intended Status: Informational. Expires: 23 April 2026.
2025 | Paper
Conference Presentations
CONFERENCE PRESENTATION
Intelligent LLM Routing: A New Paradigm for Multi-Model AI Orchestration in Kubernetes
Venue: KubeCon NA 2025
This research-driven talk introduces a novel architecture paradigm that complements recent advances in intelligent inference routing for large language models.
2025 | Event page
CONFERENCE PRESENTATION
vLLM Semantic Router: Unlock the Power of Intelligent Routing
Venue: vLLM Meetup Beijing
A deep dive into vLLM Semantic Router capabilities, demonstrating how intelligent routing can unlock new possibilities for efficient LLM inference.
2025 | Watch recording
CONFERENCE PRESENTATION
AI-Powered vLLM Semantic Router
Venue: vLLM Office Hours
An overview of AI-powered features in vLLM Semantic Router, showcasing the latest developments and community contributions.
2025 | Watch recording