Skip to main content
Version: ๐Ÿšง Next

What is Signal-Driven Decision?

Signal-Driven Decision is the core architecture that enables intelligent routing by extracting multiple signals from requests and combining them to make better routing decisions.

The Core Ideaโ€‹

Traditional routing uses a single signal:

# Traditional: Single classification model
if classifier(query) == "math":
route_to_math_model()

Signal-driven routing uses multiple signals:

# Signal-driven: Multiple signals combined
if (keyword_match AND domain_match) OR high_embedding_similarity:
route_to_math_model()

Why this matters: Multiple signals voting together make more accurate decisions than any single signal.

The 8 Signal Typesโ€‹

1. Keyword Signalsโ€‹

  • What: Fast pattern matching with AND/OR operators
  • Latency: Less than 1ms
  • Use Case: Deterministic routing, compliance, security
signals:
keywords:
- name: "math_keywords"
operator: "OR"
keywords: ["calculate", "equation", "solve", "derivative"]

Example: "Calculate the derivative of x^2" โ†’ Matches "calculate" and "derivative"

2. Embedding Signalsโ€‹

  • What: Semantic similarity using embeddings
  • Latency: 10-50ms
  • Use Case: Intent detection, paraphrase handling
signals:
embeddings:
- name: "code_debug"
threshold: 0.70
candidates:
- "My code isn't working, how do I fix it?"
- "Help me debug this function"

Example: "Need help debugging this function" โ†’ 0.78 similarity โ†’ Match!

3. Domain Signalsโ€‹

  • What: MMLU domain classification (14 categories)
  • Latency: 50-100ms
  • Use Case: Academic and professional domain routing
signals:
domains:
- name: "mathematics"
mmlu_categories: ["abstract_algebra", "college_mathematics"]

Example: "Prove that the square root of 2 is irrational" โ†’ Mathematics domain

4. Fact Check Signalsโ€‹

  • What: ML-based detection of queries needing fact verification
  • Latency: 50-100ms
  • Use Case: Healthcare, financial services, education
signals:
fact_checks:
- name: "factual_queries"
threshold: 0.75

Example: "What is the capital of France?" โ†’ Needs fact checking

5. User Feedback Signalsโ€‹

  • What: Classification of user feedback and corrections
  • Latency: 50-100ms
  • Use Case: Customer support, adaptive learning
signals:
user_feedbacks:
- name: "negative_feedback"
feedback_types: ["correction", "dissatisfaction"]

Example: "That's wrong, try again" โ†’ Negative feedback detected

6. Preference Signalsโ€‹

  • What: LLM-based route preference matching
  • Latency: 200-500ms
  • Use Case: Complex intent analysis
signals:
preferences:
- name: "creative_writing"
llm_endpoint: "http://localhost:8000/v1"
model: "gpt-4"
routes:
- name: "creative"
description: "Creative writing, storytelling, poetry"

Example: "Write a story about dragons" โ†’ Creative route preferred

7. Language Signalsโ€‹

  • What: Multi-language detection (100+ languages)
  • Latency: Less than 1ms
  • Use Case: Route queries to language-specific models or apply language-specific policies
signals:
language:
- name: "en"
description: "English language queries"
- name: "es"
description: "Spanish language queries"
- name: "zh"
description: "Chinese language queries"
- name: "ru"
description: "Russian language queries"
  • Example 1: "Hola, ยฟcรณmo estรกs?" โ†’ Spanish (es) โ†’ Spanish model
  • Example 2: "ไฝ ๅฅฝ๏ผŒไธ–็•Œ" โ†’ Chinese (zh) โ†’ Chinese model

8. Latency Signals - TPOT-based Routingโ€‹

What: Model latency evaluation using TPOT (Time Per Output Token) Latency: Less than 1ms (cache lookup) Use Case: Route latency-sensitive queries to faster models

signals:
latency:
- name: "low_latency"
max_tpot: 0.05 # 50ms per token
description: "For real-time chat applications"

Example: Real-time chat query โ†’ low_latency signal โ†’ Route to fast model (TPOT < 50ms/token)

How it works: TPOT is automatically tracked from each response. The latency classifier evaluates if available models meet the TPOT threshold before routing.

How Signals Combineโ€‹

AND Operator - All Must Matchโ€‹

decisions:
- name: "advanced_math"
rules:
operator: "AND"
conditions:
- type: "keyword"
name: "math_keywords"
- type: "domain"
name: "mathematics"
  • Logic: Route to advanced_math only if both keyword AND domain match
  • Use Case: High-confidence routing (reduce false positives)

OR Operator - Any Can Matchโ€‹

decisions:
- name: "code_help"
rules:
operator: "OR"
conditions:
- type: "keyword"
name: "code_keywords"
- type: "embedding"
name: "code_debug"
  • Logic: Route to code_help if keyword OR embedding matches
  • Use Case: Broad coverage (reduce false negatives)

Nested Logic - Complex Rulesโ€‹

decisions:
- name: "verified_math"
rules:
operator: "AND"
conditions:
- type: "domain"
name: "mathematics"
- operator: "OR"
conditions:
- type: "keyword"
name: "proof_keywords"
- type: "fact_check"
name: "factual_queries"
  • Logic: Route if (mathematics domain) AND (proof keywords OR needs fact checking)
  • Use Case: Complex routing scenarios

Real-World Exampleโ€‹

User Queryโ€‹

"Prove that the square root of 2 is irrational"

Signal Extractionโ€‹

signals_detected:
keyword: true # "prove", "square root", "irrational"
embedding: 0.89 # High similarity to math queries
domain: "mathematics" # MMLU classification
fact_check: true # Proof requires verification

Decision Processโ€‹

decision: "advanced_math"
reason: "All math signals agree (keyword + embedding + domain + fact_check)"
confidence: 0.95
selected_model: "qwen-math"

Why This Worksโ€‹

  • Multiple signals agree: High confidence
  • Fact checking enabled: Quality assurance
  • Specialized model: Best for mathematical proofs

Next Stepsโ€‹