What is Signal-Driven Decision?
Signal-Driven Decision is the core architecture behind intelligent routing: it extracts multiple signals from each request and combines them to make more accurate routing decisions.
The Core Idea
Traditional routing uses a single signal:
# Traditional: Single classification model
if classifier(query) == "math":
    route_to_math_model()
Signal-driven routing uses multiple signals:
# Signal-driven: Multiple signals combined
if (keyword_match and domain_match) or high_embedding_similarity:
    route_to_math_model()
Why this matters: Multiple signals voting together make more accurate decisions than any single signal.
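For intuition: if three independent signals are each correct 80% of the time, a simple majority vote among them is correct about 89.6% of the time (0.8^3 + 3 × 0.8^2 × 0.2 ≈ 0.896), which is why agreement across signals raises confidence.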
The 8 Signal Types
1. Keyword Signals
- What: Fast pattern matching with AND/OR operators
- Latency: Less than 1ms
- Use Case: Deterministic routing, compliance, security
signals:
  keywords:
    - name: "math_keywords"
      operator: "OR"
      keywords: ["calculate", "equation", "solve", "derivative"]

Example: "Calculate the derivative of x^2" → Matches "calculate" and "derivative"
2. Embedding Signals
- What: Semantic similarity using embeddings
- Latency: 10-50ms
- Use Case: Intent detection, paraphrase handling
signals:
  embeddings:
    - name: "code_debug"
      threshold: 0.70
      candidates:
        - "My code isn't working, how do I fix it?"
        - "Help me debug this function"

Example: "Need help debugging this function" → 0.78 similarity → Match!
3. Domain Signals
- What: MMLU domain classification (14 categories)
- Latency: 50-100ms
- Use Case: Academic and professional domain routing
signals:
  domains:
    - name: "mathematics"
      mmlu_categories: ["abstract_algebra", "college_mathematics"]

Example: "Prove that the square root of 2 is irrational" → Mathematics domain
4. Fact Check Signals
- What: ML-based detection of queries needing fact verification
- Latency: 50-100ms
- Use Case: Healthcare, financial services, education
signals:
  fact_checks:
    - name: "factual_queries"
      threshold: 0.75

Example: "What is the capital of France?" → Needs fact checking
5. User Feedback Signals
- What: Classification of user feedback and corrections
- Latency: 50-100ms
- Use Case: Customer support, adaptive learning
signals:
  user_feedbacks:
    - name: "negative_feedback"
      feedback_types: ["correction", "dissatisfaction"]

Example: "That's wrong, try again" → Negative feedback detected
6. Preference Signals
- What: LLM-based route preference matching
- Latency: 200-500ms
- Use Case: Complex intent analysis
signals:
  preferences:
    - name: "creative_writing"
      llm_endpoint: "http://localhost:8000/v1"
      model: "gpt-4"
      routes:
        - name: "creative"
          description: "Creative writing, storytelling, poetry"

Example: "Write a story about dragons" → Creative route preferred
7. Language Signals
- What: Multi-language detection (100+ languages)
- Latency: Less than 1ms
- Use Case: Route queries to language-specific models or apply language-specific policies
signals:
  language:
    - name: "en"
      description: "English language queries"
    - name: "es"
      description: "Spanish language queries"
    - name: "zh"
      description: "Chinese language queries"
    - name: "ru"
      description: "Russian language queries"

- Example 1: "Hola, ¿cómo estás?" → Spanish (es) → Spanish model
- Example 2: "你好，世界" → Chinese (zh) → Chinese model
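A minimal sketch of the mapping from detected language code to signal name; the router ships its own detector, and the langdetect package is used here only to make the example runnable:

# Illustrative sketch only: map a detected ISO language code onto the
# configured language signals.
from langdetect import detect

CONFIGURED_LANGUAGES = {"en", "es", "zh", "ru"}

def language_signal(query: str) -> str | None:
    code = detect(query).split("-")[0]   # normalize e.g. "zh-cn" -> "zh"
    return code if code in CONFIGURED_LANGUAGES else None

print(language_signal("Hola, ¿cómo estás?"))  # "es" -> route to the Spanish model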
8. Latency Signals - TPOT-based Routing
- What: Model latency evaluation using TPOT (Time Per Output Token)
- Latency: Less than 1ms (cache lookup)
- Use Case: Route latency-sensitive queries to faster models
signals:
  latency:
    - name: "low_latency"
      max_tpot: 0.05  # 50ms per token
      description: "For real-time chat applications"

Example: Real-time chat query → low_latency signal → Route to fast model (TPOT < 50ms/token)
How it works: TPOT is tracked automatically from each response. The latency classifier then checks whether the available models meet the TPOT threshold before routing.
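A hypothetical sketch of that bookkeeping (field names and the smoothing scheme are illustrative, not the router's actual code):

# Hypothetical sketch of TPOT tracking and threshold filtering.
observed_tpot: dict[str, float] = {}   # model name -> estimated seconds per output token

def record_response(model: str, generation_seconds: float, output_tokens: int) -> None:
    """Update the model's TPOT estimate from one completed response."""
    tpot = generation_seconds / max(output_tokens, 1)
    previous = observed_tpot.get(model, tpot)
    observed_tpot[model] = 0.9 * previous + 0.1 * tpot   # simple exponential smoothing

def models_meeting(max_tpot: float) -> list[str]:
    """Return the models whose tracked TPOT satisfies the signal's threshold."""
    return [m for m, t in observed_tpot.items() if t <= max_tpot]

record_response("fast-7b", generation_seconds=1.0, output_tokens=40)    # 25 ms/token
record_response("large-70b", generation_seconds=8.0, output_tokens=40)  # 200 ms/token
print(models_meeting(0.05))  # ['fast-7b']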
How Signals Combine
AND Operator - All Must Match
decisions:
  - name: "advanced_math"
    rules:
      operator: "AND"
      conditions:
        - type: "keyword"
          name: "math_keywords"
        - type: "domain"
          name: "mathematics"
- Logic: Route to advanced_math only if both keyword AND domain match
- Use Case: High-confidence routing (reduce false positives)
OR Operator - Any Can Match
decisions:
  - name: "code_help"
    rules:
      operator: "OR"
      conditions:
        - type: "keyword"
          name: "code_keywords"
        - type: "embedding"
          name: "code_debug"
- Logic: Route to code_help if keyword OR embedding matches
- Use Case: Broad coverage (reduce false negatives)
Nested Logic - Complex Rules
decisions:
  - name: "verified_math"
    rules:
      operator: "AND"
      conditions:
        - type: "domain"
          name: "mathematics"
        - operator: "OR"
          conditions:
            - type: "keyword"
              name: "proof_keywords"
            - type: "fact_check"
              name: "factual_queries"
- Logic: Route if (mathematics domain) AND (proof keywords OR needs fact checking)
- Use Case: Complex routing scenarios
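All three forms reduce to the same recursive evaluation over the set of signals that fired for a request; the sketch below is a hypothetical illustration, not the router's actual implementation:

# Hypothetical sketch of evaluating nested AND/OR rules against the fired signals.
def evaluate(rule: dict, fired: set[tuple[str, str]]) -> bool:
    """`fired` holds (type, name) pairs for every signal that matched the request."""
    if "operator" in rule:                      # composite node: AND / OR over children
        results = [evaluate(c, fired) for c in rule["conditions"]]
        return all(results) if rule["operator"] == "AND" else any(results)
    return (rule["type"], rule["name"]) in fired   # leaf node: a single signal

verified_math = {
    "operator": "AND",
    "conditions": [
        {"type": "domain", "name": "mathematics"},
        {"operator": "OR", "conditions": [
            {"type": "keyword", "name": "proof_keywords"},
            {"type": "fact_check", "name": "factual_queries"},
        ]},
    ],
}

fired = {("domain", "mathematics"), ("fact_check", "factual_queries")}
print(evaluate(verified_math, fired))  # True: math domain AND (proof keywords OR fact check)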
Real-World Example
User Query
"Prove that the square root of 2 is irrational"
Signal Extraction
signals_detected:
  keyword: true           # "prove", "square root", "irrational"
  embedding: 0.89         # High similarity to math queries
  domain: "mathematics"   # MMLU classification
  fact_check: true        # Proof requires verification
Decision Process
decision: "advanced_math"
reason: "All math signals agree (keyword + embedding + domain + fact_check)"
confidence: 0.95
selected_model: "qwen-math"
Why This Works
- Multiple signals agree: High confidence
- Fact checking enabled: Quality assurance
- Specialized model: Best for mathematical proofs
Next Steps
- Configuration Guide - Configure signals and decisions
- Keyword Routing Tutorial - Learn keyword signals
- Embedding Routing Tutorial - Learn embedding signals
- Domain Routing Tutorial - Learn domain signals