# OpenAI RAG Integration
This guide demonstrates how to use OpenAI's File Store and Vector Store APIs for RAG (Retrieval-Augmented Generation) in Semantic Router, following the OpenAI Responses API cookbook.
## Overview
The OpenAI RAG backend integrates with OpenAI's File Store and Vector Store APIs to provide a first-class RAG experience. It supports two workflow modes:
- **Direct Search Mode** (default): synchronous retrieval using the vector store search API
- **Tool-Based Mode**: adds a `file_search` tool to the request (Responses API workflow)
## Architecture
```
┌─────────────┐
│   Client    │
└──────┬──────┘
       │
       ▼
┌─────────────────────────────────────┐
│           Semantic Router           │
│  ┌───────────────────────────────┐  │
│  │          RAG Plugin           │  │
│  │  ┌─────────────────────────┐  │  │
│  │  │   OpenAI RAG Backend    │  │  │
│  │  └──────┬──────────────────┘  │  │
│  └─────────┼─────────────────────┘  │
└────────────┼────────────────────────┘
             │
             ▼
┌─────────────────────────────────────┐
│             OpenAI API              │
│ ┌──────────────┐  ┌──────────────┐  │
│ │  File Store  │  │ Vector Store │  │
│ │     API      │  │     API      │  │
│ └──────────────┘  └──────────────┘  │
└─────────────────────────────────────┘
```
## Prerequisites
- OpenAI API key with access to File Store and Vector Store APIs
- Files uploaded to OpenAI File Store
- Vector store created and populated with files
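The vector store can be prepared through OpenAI's REST API before the router is involved. The sketch below builds the requests with the Python standard library only; the helper names (`build_request`, `create_vector_store`, `attach_file`) are illustrative, not part of Semantic Router, and the endpoint paths come from OpenAI's public API reference. File upload itself (`POST /v1/files` with a multipart body) is omitted for brevity.

```python
import json
import urllib.request

OPENAI_API = "https://api.openai.com/v1"

def build_request(path: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build an authenticated JSON POST request against the OpenAI API."""
    return urllib.request.Request(
        f"{OPENAI_API}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

def create_vector_store(api_key: str, name: str) -> urllib.request.Request:
    # POST /v1/vector_stores creates an empty vector store
    return build_request("/vector_stores", api_key, {"name": name})

def attach_file(api_key: str, vector_store_id: str, file_id: str) -> urllib.request.Request:
    # POST /v1/vector_stores/{id}/files indexes an already-uploaded file
    return build_request(f"/vector_stores/{vector_store_id}/files",
                         api_key, {"file_id": file_id})

# To actually send a request:
#   with urllib.request.urlopen(create_vector_store(key, "docs")) as resp:
#       store = json.load(resp)   # store["id"] is the vs_... ID for the config below
```

The returned `id` (e.g. `vs_abc123`) is what goes into `backend_config.vector_store_id`.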
## Configuration

### Basic Configuration
Add the OpenAI RAG backend to your decision configuration:
```yaml
decisions:
  - name: rag-openai-decision
    signals:
      - type: keyword
        keywords: ["research", "document", "knowledge"]
    plugins:
      rag:
        enabled: true
        backend: "openai"
        backend_config:
          vector_store_id: "vs_abc123"   # Your vector store ID
          api_key: "${OPENAI_API_KEY}"   # Or use an environment variable
          max_num_results: 10
          workflow_mode: "direct_search" # or "tool_based"
```
### Advanced Configuration
```yaml
rag:
  enabled: true
  backend: "openai"
  similarity_threshold: 0.7
  top_k: 10
  max_context_length: 5000
  injection_mode: "tool_role"  # or "system_prompt"
  on_failure: "skip"           # or "warn" or "block"
  cache_results: true
  cache_ttl_seconds: 3600
  backend_config:
    vector_store_id: "vs_abc123"
    api_key: "${OPENAI_API_KEY}"
    base_url: "https://api.openai.com/v1"  # Optional, defaults to OpenAI
    max_num_results: 10
    file_ids:                              # Optional: restrict search to specific files
      - "file-123"
      - "file-456"
    filter:                                # Optional: metadata filter
      category: "research"
      published_date: "2024-01-01"
    workflow_mode: "direct_search"         # or "tool_based"
    timeout_seconds: 30
```
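The flat `filter` map above is a simplified form: OpenAI's vector store search filtering expects explicit comparison objects (e.g. `{"type": "eq", ...}`) joined by compound filters (`"and"`). A minimal sketch of how a backend might translate the config map, assuming equality semantics for every key (`to_api_filter` is an illustrative name, not a Semantic Router API):

```python
def to_api_filter(flat: dict):
    """Translate a flat key/value metadata map (as in backend_config.filter)
    into the comparison/compound filter shape used by the search API."""
    clauses = [{"type": "eq", "key": k, "value": v} for k, v in flat.items()]
    if not clauses:
        return None          # no filtering requested
    if len(clauses) == 1:
        return clauses[0]    # a single comparison needs no compound wrapper
    return {"type": "and", "filters": clauses}
```

With the example config, both `category` and `published_date` must match for a chunk to be returned.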
## Workflow Modes

### 1. Direct Search Mode (Default)
Context is retrieved synchronously via the vector store search API before the request is sent to the LLM.
Use Case: When you need immediate context injection and want to control the retrieval process.
Example:

```yaml
backend_config:
  workflow_mode: "direct_search"
  vector_store_id: "vs_abc123"
```
Flow:

1. User sends a query
2. The RAG plugin calls the vector store search API
3. Retrieved context is injected into the request
4. The request is sent to the LLM with the context
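The flow above can be sketched with two small helpers. Both names (`build_search_payload`, `inject_context`) are illustrative rather than Semantic Router APIs: the first builds the body for the vector store search call, the second splices the retrieved chunks into the message list as a tool-role message, mirroring `injection_mode: "tool_role"`.

```python
from typing import Optional

def build_search_payload(query: str, max_num_results: int = 10,
                         filters: Optional[dict] = None) -> dict:
    """Request body for the vector store search call (direct search mode)."""
    payload = {"query": query, "max_num_results": max_num_results}
    if filters:
        payload["filters"] = filters
    return payload

def inject_context(messages: list, chunks: list,
                   max_context_length: int = 5000) -> list:
    """Insert retrieved chunks as a tool-role message just before the final
    user turn, truncated to max_context_length characters."""
    context = "\n\n".join(chunks)[:max_context_length]
    tool_msg = {"role": "tool", "content": f"Retrieved context:\n{context}"}
    return messages[:-1] + [tool_msg, messages[-1]]
```

Under `injection_mode: "system_prompt"` the same context would instead be appended to the system message; the truncation to `max_context_length` applies either way.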