Installation
This guide will help you install and run the vLLM Semantic Router. The router runs entirely on CPU and does not require GPU for inference.
System Requirements
No GPU is required: the router runs efficiently on CPU using optimized BERT models.
Requirements:
- Python: 3.10 or higher
- Docker: required for running the router container
Quick Start
1. Install vLLM Semantic Router
# Create a virtual environment (recommended)
python -m venv vsr
source vsr/bin/activate # On Windows: vsr\Scripts\activate
# Install from PyPI
pip install vllm-sr
Verify installation:
vllm-sr --version
2. Start vllm-sr
vllm-sr serve
If config.yaml does not exist yet, vllm-sr serve bootstraps a minimal workspace and starts the dashboard in setup mode.
The router will:
- Automatically download required ML models (~1.5GB, one-time)
- Start the dashboard on port 8700
- Start Envoy proxy on port 8888 after activation
- Start the semantic router service after activation
- Enable metrics on port 9190
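After activation, a quick way to confirm those services are listening is to probe each port from the list above. This is an illustrative sketch (the port numbers come from the list; curl is only used as a TCP-level reachability check):

```shell
# check_port PORT: report whether a local service answers on PORT
check_port() {
  if curl -s -o /dev/null --max-time 2 "http://localhost:$1"; then
    echo "port $1: up"
  else
    echo "port $1: not responding"
  fi
}

# Check the dashboard (8700), Envoy proxy (8888), and metrics (9190) ports
for port in 8700 8888 9190; do check_port "$port"; done
```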
3. Open the Dashboard
Open http://localhost:8700 in your browser.
For first-run setup:
- Configure one or more models.
- Choose a routing preset or keep the single-model baseline.
- Activate the generated config.
After activation, config.yaml is written to the current directory and the router exits setup mode.
4. Optional: open the dashboard from the CLI
vllm-sr dashboard
5. Test the Router
curl http://localhost:8888/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "MoM",
"messages": [{"role": "user", "content": "Hello!"}]
}'
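The same request can be issued from Python using only the standard library. This sketch builds the identical OpenAI-compatible payload; the endpoint URL and the `MoM` model name come from the curl example above:

```python
import json
import urllib.request

def build_request(model="MoM", content="Hello!",
                  url="http://localhost:8888/v1/chat/completions"):
    """Build an OpenAI-compatible chat completion request for the router."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": content}],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

# Sending it requires a running router:
# with urllib.request.urlopen(build_request()) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```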
Common Commands
# View logs
vllm-sr logs router # Router logs
vllm-sr logs envoy # Envoy logs
vllm-sr logs router -f # Follow logs
# Check status
vllm-sr status
# Stop the router
vllm-sr stop
Advanced Configuration
YAML-first workflow
If you prefer to edit YAML directly instead of using the dashboard setup flow:
# Generate a lean advanced sample in the current directory
vllm-sr init
# Validate it before serving
vllm-sr validate config.yaml
vllm-sr init is optional: it generates an advanced sample config plus .vllm-sr/router-defaults.yaml for YAML-first users. router-defaults.yaml contains advanced runtime defaults and is not required for first-run dashboard setup.
HuggingFace Settings
Set environment variables before starting:
export HF_ENDPOINT=https://huggingface.co # Or mirror: https://hf-mirror.com
export HF_TOKEN=your_token_here # Only for gated models
export HF_HOME=/path/to/cache # Custom cache directory
vllm-sr serve
Custom Options
# Use custom config file
vllm-sr serve --config my-config.yaml
# Use custom Docker image
vllm-sr serve --image ghcr.io/vllm-project/semantic-router/vllm-sr:latest
# Control image pull policy
vllm-sr serve --image-pull-policy always
Next Steps
- Configuration Guide - Advanced routing and signal configuration
- API Documentation - Complete API reference
- Tutorials - Learn by example
Getting Help
- Issues: GitHub Issues
- Community: Join the #semantic-router channel in vLLM Slack
- Documentation: vllm-semantic-router.com