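Agentic Memory
Executive Summary
This document describes a proof of concept (POC) for Agentic Memory in Semantic Router. Agentic Memory lets AI agents remember information across sessions, which provides continuity and personalization.
POC scope: This is a proof of concept, not a production design. The goal is to validate that the core memory flow (retrieve → inject → extract → store) works with acceptable accuracy. Production hardening (error handling, scaling, monitoring) is out of scope.
Core Capabilities
| Capability | Description |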
|---|---|
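| Memory retrieval | Embedding-based retrieval with a simple pre-filter |
| Memory saving | LLM-based extraction of facts and procedures |
| Cross-session persistence | Memories are stored in Milvus (they survive restarts; production backup/HA has not been validated) |
| User isolation | Partitioned by user_id (see the table below) |
User isolation and Milvus performance notes:
| Approach | POC | Production (10K+ users) |
|---|---|---|
| Simple filter | Filter by user_id after retrieval | Degrades: searches the whole collection, then filters |
| Partition key | Overkill for the POC | Physical isolation, O(log N) per user |
| Scalar index | Overkill for the POC | Index on user_id speeds up filtering |
POC: Use simple metadata filtering (sufficient for testing).
Production: Configure user_id as the partition key or as a scalar-indexed field in the Milvus schema.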
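To show why the "simple filter" row degrades, the sketch below models the two behaviors described in the table without any Milvus client: searching the whole collection and then filtering by user_id, versus restricting to the user first (what a partition key or indexed filter provides). The `scored` type and both functions are hypothetical illustrations, and the model is deliberately simplified. With post-filtering, the global top-k can be filled by other users' memories, so the requesting user may get fewer than k results even when relevant memories exist.

```go
package main

import (
	"fmt"
	"sort"
)

// scored is a hypothetical search hit: a memory owned by userID with a similarity score.
type scored struct {
	userID  string
	content string
	score   float64
}

// topK returns the k highest-scoring hits without modifying the input.
func topK(hits []scored, k int) []scored {
	sorted := append([]scored(nil), hits...)
	sort.SliceStable(sorted, func(i, j int) bool { return sorted[i].score > sorted[j].score })
	if len(sorted) > k {
		sorted = sorted[:k]
	}
	return sorted
}

// postFilterTopK models "search the whole collection, then filter by user_id".
func postFilterTopK(hits []scored, userID string, k int) []scored {
	var out []scored
	for _, h := range topK(hits, k) {
		if h.userID == userID {
			out = append(out, h)
		}
	}
	return out
}

// preFilterTopK models isolation applied before ranking (partition key / indexed filter).
func preFilterTopK(hits []scored, userID string, k int) []scored {
	var mine []scored
	for _, h := range hits {
		if h.userID == userID {
			mine = append(mine, h)
		}
	}
	return topK(mine, k)
}

func main() {
	hits := []scored{
		{"alice", "budget for Hawaii is $10K", 0.95},
		{"carol", "budget for Paris is $5K", 0.93},
		{"dave", "budget for Rome is $8K", 0.91},
		{"bob", "budget for Hawaii trip is $12K", 0.80},
	}
	fmt.Println(len(postFilterTopK(hits, "bob", 2))) // 0: bob's memory is outside the global top-2
	fmt.Println(len(preFilterTopK(hits, "bob", 2)))  // 1
}
```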
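Key Design Principles
- A simple pre-filter decides whether to retrieve memories
- The history context window is used to disambiguate queries
- An LLM extracts facts and classifies them at save time
- Threshold-based filtering is applied to retrieval results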
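The threshold-based filtering principle can be sketched as follows. The sketch assumes similarity is cosine similarity between embedding vectors. That is an assumption, since the document does not name the metric. `cosine`, `candidate` and `filterByThreshold` are hypothetical helpers. The 0.6 value is the document's starting threshold, and results must score strictly above it.

```go
package main

import (
	"fmt"
	"math"
)

// cosine returns the cosine similarity of a and b, or 0 for mismatched
// lengths or zero vectors.
func cosine(a, b []float64) float64 {
	if len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// candidate is a hypothetical retrieved memory together with its embedding.
type candidate struct {
	Content   string
	Embedding []float64
}

// filterByThreshold keeps candidates whose similarity to query is strictly above threshold.
func filterByThreshold(query []float64, cands []candidate, threshold float64) []string {
	var kept []string
	for _, c := range cands {
		if cosine(query, c.Embedding) > threshold {
			kept = append(kept, c.Content)
		}
	}
	return kept
}

func main() {
	query := []float64{1, 0}
	cands := []candidate{
		{"budget for Hawaii is $10K", []float64{0.9, 0.1}}, // cosine ≈ 0.99
		{"prefers window seats", []float64{0.5, 0.5}},      // cosine ≈ 0.71
		{"deploy with docker push", []float64{0, 1}},       // cosine = 0
	}
	fmt.Println(filterByThreshold(query, cands, 0.6)) // keeps the first two
}
```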
Explicit POC Assumptions
| Assumption | Implication | Risk if wrong |
|---|---|---|
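| LLM extraction is mostly accurate | Incorrect facts may be stored | Memory pollution (fixable via the Forget API) |
| A 0.6 similarity threshold is a starting point | May need tuning | Adjust based on retrieval-quality logs |
| Milvus is available and configured | The feature is disabled if Milvus is down | Graceful degradation (no crash) |
| The embedding model outputs 384-dimensional vectors | Must match the Milvus schema | Startup failure (detectable) |
| History is available via the Response API chain | Needed for context | Memory is skipped when there is no history |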
1. Problem Statement
Current State
The Response API provides session chaining through previous_response_id, but knowledge is lost across sessions:
Session A (March 15):
User: "My budget for the Hawaii trip is $10,000"
→ Saved in session chain
Session B (March 20) - NEW SESSION:
User: "What's my budget for the trip?"
→ No previous_response_id → Knowledge LOST ❌
Target State
With Agentic Memory:
Session A (March 15):
User: "My budget for the Hawaii trip is $10,000"
→ Extracted and saved to Milvus
Session B (March 20) - NEW SESSION:
User: "What's my budget for the trip?"
→ Pre-filter: memory-relevant ✓
→ Search Milvus → Found: "budget for Hawaii is $10K"
→ Inject into LLM context
→ Assistant: "Your budget for the Hawaii trip is $10,000!" ✅
2. Architecture Overview
┌─────────────────────────────────────────────────────────────────────────┐
│ AGENTIC MEMORY ARCHITECTURE │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ ExtProc Pipeline │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ Request → Fact? → Tool? → Security → Cache → MEMORY → LLM │ │
│ │ │ │ ↑↓ │ │
│ │ └───────┴──── signals used ────────┘ │ │
│ │ │ │
│ │ Response ← [extract & store] ←─────────────────┘ │ │
│ │ │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ┌─────────────────────┴─────────────────────┐ │
│ │ │ │
│ ┌─────────▼─────────┐ ┌────────────▼───┐ │
│ │ Memory Retrieval │ │ Memory Saving │ │
│ │ (request phase) │ │(response phase)│ │
│ ├───────────────────┤ ├────────────────┤ │
│ │ 1. Check signals │ │ 1. LLM extract │ │
│ │ (Fact? Tool?) │ │ 2. Classify │ │
│ │ 2. Build context │ │ 3. Deduplicate │ │
│ │ 3. Milvus search │ │ 4. Store │ │
│ │ 4. Inject to LLM │ │ │ │
│ └─────────┬─────────┘ └────────┬───────┘ │
│ │ │ │
│ │ ┌──────────────┐ │ │
│ └────────►│ Milvus │◄─────────────┘ │
│ └──────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────┘
Component Responsibilities
| Component | Responsibility | Location |
|---|---|---|
| Memory Filter | Decision + retrieval + injection | pkg/extproc/req_filter_memory.go |
| Memory Extractor | LLM-based fact extraction | pkg/memory/extractor.go (new) |
| Memory Store | Storage interface | pkg/memory/store.go |
| Milvus Store | Vector database backend | pkg/memory/milvus_store.go |
| Existing Classifiers | Fact/Tool signals (reused) | pkg/extproc/processor_req_body.go |
Storage Architecture
Issue #808 proposes a multi-layer storage architecture. This document implements it in phases:
┌─────────────────────────────────────────────────────────────────────────┐
│ STORAGE ARCHITECTURE (Phased) │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ PHASE 1 (MVP) │ │
│ │ ┌─────────────────────────────────────────────────────────┐ │ │
│ │ │ Milvus (Vector Index) │ │ │
│ │ │ • Semantic search over memories │ │ │
│ │ │ • Embedding storage │ │ │
│ │ │ • Content + metadata │ │ │
│ │ └─────────────────────────────────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ PHASE 2 (Performance) │ │
│ │ ┌─────────────────────────────────────────────────────────┐ │ │
│ │ │ Redis (Hot Cache) │ │ │
│ │ │ • Fast metadata lookup │ │ │
│ │ │ • Recently accessed memories │ │ │
│ │ │ • TTL/expiration support │ │ │
│ │ └─────────────────────────────────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ PHASE 3+ (If Needed) │ │
│ │ ┌───────────────────────┐ ┌───────────────────────┐ │ │
│ │ │ Graph Store (Neo4j) │ │ Time-Series Index │ │ │
│ │ │ • Memory links │ │ • Temporal queries │ │ │
│ │ │ • Relationships │ │ • Decay scoring │ │ │
│ │ └───────────────────────┘ └───────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────┘
| Layer | Purpose | When needed | Status |
|---|---|---|---|
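| Milvus | Semantic vector search | Core capability | MVP |
| Redis | Hot cache, fast access, TTL | Performance optimization | Phase 2 |
| Graph (Neo4j) | Memory relationships | Multi-hop reasoning queries | As needed |
| Time-Series | Temporal queries, decay | Time-based importance scoring | As needed |
Design decision: Start with Milvus only. Add the other layers based on empirical need, not speculation.
The Store interface abstracts storage, so the backend can be swapped later without changing the retrieval or saving logic.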
3. Memory Types
| Type | Purpose | Example | Status |
|---|---|---|---|
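| Semantic | Facts, preferences, knowledge | "User's budget for Hawaii is $10,000" | MVP |
| Procedural | Steps, workflows | "To deploy payment-service: run npm build, then docker push" | MVP |
| Episodic | Session summaries, past events | "On Dec 29 2024, user planned Hawaii vacation with $10K budget" | MVP (limited) |
| Reflective | Self-reflection, lessons learned | "Previous budget response was incomplete - user prefers detailed breakdowns" | Future |
Episodic memory (MVP limitation): Session-end detection is not implemented. Episodic memories are created only when LLM extraction explicitly produces summary-style content. A reliable session-end trigger is deferred to Phase 2.
Reflective memory: Self-reflection and lessons learned. Out of scope for this POC. See Appendix A.
Memory Vector Space
Memories cluster by content/topic, not by type. Type is metadata: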
┌────────────────────────────────────────────────────────────────────────┐
│ MEMORY VECTOR SPACE │
│ │
│ ┌─────────────────┐ ┌─────────────────┐ │
│ │ BUDGET/MONEY │ │ DEPLOYMENT │ │
│ │ CLUSTER │ │ CLUSTER │ │
│ │ │ │ │ │
│ │ ● budget=$10K │ │ ● npm build │ │
│ │ (semantic) │ │ (procedural) │ │
│ │ ● cost=$5K │ │ ● docker push │ │
│ │ (semantic) │ │ (procedural) │ │
│ └─────────────────┘ └─────────────────┘ │
│ │
│ ● = memory with type as metadata │
│ Query matches content → type comes from matched memory │
│ │
└────────────────────────────────────────────────────────────────────────┘
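Response API vs. Agentic Memory: When Does Memory Add Value?
Key distinction: When previous_response_id is present, the Response API already sends the full conversation history to the LLM. The value of Agentic Memory lies in cross-session context.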
┌─────────────────────────────────────────────────────────────────────────┐
│ RESPONSE API vs. AGENTIC MEMORY: CONTEXT SOURCES │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ SAME SESSION (has previous_response_id): │
│ ───────────────────────────────────────── │
│ Response API provides: │
│ └── Full conversation chain (all turns) → sent to LLM │
│ │
│ Agentic Memory: │
│ └── STILL VALUABLE - current session may not have the answer │
│ └── Example: 100 turns planning vacation, but budget never said │
│ └── Days ago: "I have 10K spare, is that enough for a week in │
│ Thailand?" → LLM extracts: "User has $10K budget for trip" │
│ └── Now: "What's my budget?" → answer in memory, not this chain │
│ │
│ NEW SESSION (no previous_response_id): │
│ ────────────────────────────────────── │
│ Response API provides: │
│ └── Nothing (no chain to follow) │
│ │
│ Agentic Memory: │
│ └── ADDS VALUE - retrieves cross-session context │
│ └── "What was my Hawaii budget?" → finds fact from March session │
│ │
└─────────────────────────────────────────────────────────────────────────┘
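Design decision: Memory retrieval is valuable in both scenarios: new sessions (no chain) and existing sessions (the query may refer to another session). Retrieval always runs when the pre-filter passes.
Known redundancy: If the answer is already in the current chain, memory is still retrieved (wasting roughly 10–30 ms). Without semantically understanding the query, there is no cheap way to tell whether the answer is already in the history. The POC accepts this overhead.
Phase 2 approach: Context compression handles this properly. Instead of the Response API sending the full history, the LLM receives a compressed summary + the most recent turns + relevant memories. Facts are extracted during summarization, which eliminates the redundancy.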
4. Pipeline Integration
Current Pipeline (main branch)
1. Response API Translation
2. Parse Request
3. Fact-Check Classification
4. Tool Detection
5. Decision & Model Selection
6. Security Checks
7. PII Detection
8. Semantic Cache Check
9. Model Routing → LLM
Enhanced Pipeline with Agentic Memory
REQUEST PHASE:
─────────────
1. Response API Translation
2. Parse Request
3. Fact-Check Classification ──┐
4. Tool Detection ├── Existing signals
5. Decision & Model Selection ──┘
6. Security Checks
7. PII Detection
8. Semantic Cache Check ───► if HIT → return cached
9. 🆕 Memory Decision:
└── if (NOT Fact) AND (NOT Tool) AND (NOT Greeting) → continue
└── else → skip to step 12
10. 🆕 Build context + rewrite query [~1-5ms]
11. 🆕 Search Milvus, inject memories [~10-30ms]
12. Model Routing → LLM
RESPONSE PHASE:
──────────────
13. Parse LLM Response
14. Cache Update
15. 🆕 Memory Extraction (async goroutine, if auto_store enabled)
└── Runs in background, does NOT add latency to response
16. Response API Translation
17. Return to Client
Note on step 10: For query-rewriting strategies (context prepend, LLM rewrite, HyDE), see Appendix C.
5. Memory Retrieval
Flow
┌─────────────────────────────────────────────────────────────────────────┐
│ MEMORY RETRIEVAL FLOW │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ 1. MEMORY DECISION (reuse existing pipeline signals) │
│ ────────────────────────────────────────────────── │
│ │
│ Pipeline already classified: │
│ ├── ctx.IsFact (Fact-Check classifier) │
│ ├── ctx.RequiresTool (Tool Detection) │
│ └── isGreeting(query) (simple pattern) │
│ │
│ Decision: │
│ ├── Fact query? → SKIP (general knowledge) │
│ ├── Tool query? → SKIP (tool provides answer) │
│ ├── Greeting? → SKIP (no context needed) │
│ └── Otherwise → SEARCH MEMORY │
│ │
│ 2. BUILD CONTEXT + REWRITE QUERY │
│ ───────────────────────────── │
│ History: ["Planning vacation", "Hawaii sounds nice"] │
│ Query: "How much?" │
│ │
│ Option A (MVP): Context prepend │
│ → "How much? Hawaii vacation planning" │
│ │
│ Option B (v1): LLM rewrite │
│ → "What is the budget for the Hawaii vacation?" │
│ │
│ 3. MILVUS SEARCH │
│ ───────────── │
│ Embed context → Search with user_id filter → Top-k results │
│ │
│ 4. THRESHOLD FILTER │
│ ──────────────── │
│ Keep only results with similarity > 0.6 │
│ ⚠️ Threshold is configurable; 0.6 is starting value, tune via logs │
│ │
│ 5. INJECT INTO LLM CONTEXT │
│ ──────────────────────── │
│ Add as system message: "User's relevant context: ..." │
│ │
└─────────────────────────────────────────────────────────────────────────┘
Implementation
MemoryFilter Struct
// pkg/extproc/req_filter_memory.go
type MemoryFilter struct {
store memory.Store // Interface - can be MilvusStore or InMemoryStore
}
func NewMemoryFilter(store memory.Store) *MemoryFilter {
return &MemoryFilter{store: store}
}
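Notes:
store is the Store interface from Section 8, not a concrete implementation. At runtime it is typically MilvusStore in production or InMemoryStore in tests.
Memory Decision (Reusing the Existing Pipeline)
Known limitation:
The IsFact classifier is designed for general-knowledge fact checking (e.g., "What is the capital of France?"). It may misclassify personal-fact questions (e.g., "What's my budget?") as facts and therefore skip memory. POC mitigation: Add personal-reference detection. If the query contains personal pronouns ("my", "I", "me"), override IsFact and search memory anyway. Future: Retrain or extend the fact classifier to distinguish general facts from personal facts.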
// pkg/extproc/req_filter_memory.go
// shouldSearchMemory decides if query should trigger memory search
// Reuses existing pipeline classification signals with personal-fact override
func shouldSearchMemory(ctx *RequestContext, query string) bool {
// Check for personal indicators (overrides IsFact for personal questions)
hasPersonalIndicator := containsPersonalPronoun(query)
// 1. Fact query → skip UNLESS it contains personal pronouns
if ctx.IsFact && !hasPersonalIndicator {
logging.Debug("Memory: Skipping - general fact query")
return false
}
// 2. Tool required → skip (tool provides answer)
if ctx.RequiresTool {
logging.Debug("Memory: Skipping - tool query")
return false
}
// 3. Greeting/social → skip (no context needed)
if isGreeting(query) {
logging.Debug("Memory: Skipping - greeting")
return false
}
// 4. Default: search memory (conservative - don't miss context)
return true
}
// Regexes are compiled once at package initialization rather than on every request.
var (
personalPattern = regexp.MustCompile(`(?i)\b(my|i|me|mine|i'm|i've|i'll)\b`)
greetingPatterns = []*regexp.Regexp{
regexp.MustCompile(`^(hi|hello|hey|howdy)[\s\!\.\,]*$`),
regexp.MustCompile(`^(hi|hello|hey)[\s\,]*(there)?[\s\!\.\,]*$`),
regexp.MustCompile(`^(thanks|thank you|thx)[\s\!\.\,]*$`),
regexp.MustCompile(`^(bye|goodbye|see you)[\s\!\.\,]*$`),
regexp.MustCompile(`^(ok|okay|sure|yes|no)[\s\!\.\,]*$`),
}
)
func containsPersonalPronoun(query string) bool {
// Simple check for personal context indicators
return personalPattern.MatchString(query)
}
func isGreeting(query string) bool {
// Match greetings that are ONLY greetings, not "Hi, what's my budget?"
lower := strings.ToLower(strings.TrimSpace(query))
// Short greetings only (at most 20 bytes and matching a pattern)
if len(lower) > 20 {
return false
}
for _, re := range greetingPatterns {
if re.MatchString(lower) {
return true
}
}
return false
}
Context Building
// buildSearchQuery builds an effective search query from history + current query
// MVP: context prepend, v1: LLM rewrite for vague queries
func buildSearchQuery(history []Message, query string) string {
// If query is self-contained, use as-is
if isSelfContained(query) {
return query
}
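// MVP: Simple context prepend (history terms are attached to the query)
// Named historyTerms rather than "context" to avoid shadowing the context package.
historyTerms := summarizeHistory(history)
return strings.TrimSpace(query + " " + historyTerms)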
// v1 (future): LLM rewrite for vague queries
// if isVague(query) {
// return rewriteWithLLM(history, query)
// }
}
// Vague-query patterns are compiled once at package initialization.
var vaguePatterns = []*regexp.Regexp{
regexp.MustCompile(`(?i)^how much\??$`),
regexp.MustCompile(`(?i)^what about`),
regexp.MustCompile(`(?i)^and that`),
regexp.MustCompile(`(?i)^this one`),
}
func isSelfContained(query string) bool {
// Self-contained: "What's my budget for the Hawaii trip?"
// NOT self-contained: "How much?", "And that one?", "What about it?"
q := strings.TrimSpace(query)
for _, re := range vaguePatterns {
if re.MatchString(q) {
return false
}
}
return len(q) > 20 // Short queries are often vague
}
func summarizeHistory(history []Message) string {
// Extract key terms from last 3 user messages
var terms []string
count := 0
for i := len(history) - 1; i >= 0 && count < 3; i-- {
if history[i].Role == "user" {
terms = append(terms, extractKeyTerms(history[i].Content))
count++
}
}
return strings.Join(terms, " ")
}
// v1: LLM-based query rewriting (future enhancement)
func rewriteWithLLM(history []Message, query string) string {
prompt := fmt.Sprintf(`Conversation context: %s
Rewrite this vague query to be self-contained: "%s"
Return ONLY the rewritten query.`, summarizeHistory(history), query)
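// Call LLM endpoint; on failure, fall back to the original query
resp, err := http.Post(llmEndpoint+"/v1/chat/completions", ...)
if err != nil {
return query
}
defer resp.Body.Close()
return parseResponse(resp) // e.g. "how much?" → "What is the budget for the Hawaii vacation?"
}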
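summarizeHistory relies on an extractKeyTerms helper that this document does not define. One minimal, hypothetical version is shown below. It is a simple stopword filter, and the stopword list and the lowercasing/deduplication behavior are assumptions. It is not the only reasonable choice. For example, a production version might keep numbers with their currency symbols.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// stopwords is a deliberately small, illustrative list.
var stopwords = map[string]bool{
	"a": true, "an": true, "the": true, "is": true, "are": true, "to": true,
	"for": true, "of": true, "and": true, "or": true, "it": true, "that": true,
	"this": true, "i": true, "my": true, "me": true, "what": true, "how": true,
}

// extractKeyTerms lowercases text, splits it on anything that is not a letter
// or digit, drops stopwords and repeated words, and joins the rest with spaces.
func extractKeyTerms(text string) string {
	words := strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
	seen := map[string]bool{}
	var terms []string
	for _, w := range words {
		if stopwords[w] || seen[w] {
			continue
		}
		seen[w] = true
		terms = append(terms, w)
	}
	return strings.Join(terms, " ")
}

func main() {
	fmt.Println(extractKeyTerms("Planning a vacation to Hawaii, Hawaii!")) // planning vacation hawaii
	fmt.Println(extractKeyTerms("Hawaii sounds nice"))                     // hawaii sounds nice
}
```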
Full Retrieval
// pkg/extproc/req_filter_memory.go
func (f *MemoryFilter) RetrieveMemories(
ctx context.Context,
reqCtx *RequestContext, // pipeline signals (IsFact, RequiresTool) used by the memory decision
query string,
userID string,
history []Message,
) ([]*memory.RetrieveResult, error) {
// 1. Memory decision (skip if fact/tool/greeting)
if !shouldSearchMemory(reqCtx, query) {
logging.Debug("Memory: Skipping - not memory-relevant")
return nil, nil
}
// 2. Build search query (context prepend or LLM rewrite)
searchQuery := buildSearchQuery(history, query)
// 3. Search Milvus
results, err := f.store.Retrieve(ctx, memory.RetrieveOptions{
Query: searchQuery,
UserID: userID,
Limit: 5,
Threshold: 0.6,
})
if err != nil {
return nil, err
}
logging.Infof("Memory: Retrieved %d memories", len(results))
return results, nil
}
// InjectMemories adds memories to the LLM request
func (f *MemoryFilter) InjectMemories(
requestBody []byte,
memories []*memory.RetrieveResult,
) ([]byte, error) {
if len(memories) == 0 {
return requestBody, nil
}
// Format memories as context
var sb strings.Builder
sb.WriteString("## User's Relevant Context\n\n")
for _, mem := range memories {
fmt.Fprintf(&sb, "- %s\n", mem.Memory.Content)
}
// Add as system message
return injectSystemMessage(requestBody, sb.String())
}
6. Memory Saving
Triggers
Memory extraction is triggered by three kinds of events:
| Trigger | Description | Status |
|---|---|---|
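| Every N turns | Extract once every 10 turns | MVP |
| Session end | Generate an episodic summary when the session ends | Future |
| Context drift | Extract when the topic changes significantly | Future |
Note: Session-end and drift detection require additional implementation. The MVP relies only on the every-N-turns trigger.
Flow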
┌─────────────────────────────────────────────────────────────────────────┐
│ MEMORY SAVING FLOW │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ TRIGGERS: │
│ ───────── │
│ ├── Every N turns (e.g., 10) ← MVP │
│ ├── End of session ← Future (needs detection) │
│ └── Context drift detected ← Future (needs detection) │
│ │
│ Runs: Async (background) - no user latency │
│ │
│ 1. GET BATCH │
│ ───────── │
│ Get last 10-15 turns from session │
│ │
│ 2. LLM EXTRACTION │
│ ────────────── │
│ Prompt: "Extract important facts. Include context. │
│ Return JSON: [{type, content}, ...]" │
│ │
│ LLM returns: │
│ [{"type": "semantic", "content": "budget for Hawaii is $10K"}] │
│ │
│ 3. DEDUPLICATION │
│ ───────────── │
│ For each extracted fact: │
│ - Embed content │
│ - Search existing memories (same user, same type) │
│ - If similarity > 0.9: UPDATE existing (merge/replace) │
│ - If similarity 0.7-0.9: CREATE new (gray zone, conservative) │
│ - If similarity < 0.7: CREATE new │
│ │
│ Example: │
│ Existing: "User's budget for Hawaii is $10,000" │
│ New: "User's budget is now $15,000" │
│ → Similarity ~0.92 → UPDATE existing with new value │
│ │
│ 4. STORE IN MILVUS │
│ ─────────────── │
│ Memory { id, type, content, embedding, user_id, created_at } │
│ │
│ 5. SESSION END (future): Create episodic summary │
│ ───────────────────────────────────────────── │
│ "On Dec 29, user planned Hawaii vacation with $10K budget" │
│ │
└─────────────────────────────────────────────────────────────────────────┘
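The deduplication thresholds in step 3 reduce to a small decision function. This sketch encodes only the thresholds shown in the diagram: strictly above 0.9 updates, 0.7 to 0.9 inclusive is the gray zone, and below 0.7 creates. `dedupAction` and the `Action` values are hypothetical names. The gray zone and the low-similarity case currently produce the same outcome (a new memory). They are kept as separate values so that each case can be logged or tuned independently.

```go
package main

import "fmt"

// Action is what the saver does with a newly extracted fact.
type Action string

const (
	Update         Action = "UPDATE"      // similarity > 0.9: replace the existing memory's content
	CreateGray     Action = "CREATE_GRAY" // 0.7 to 0.9: create new (gray zone, conservative)
	CreateDistinct Action = "CREATE"      // < 0.7 or no match: clearly new information
)

// dedupAction maps the best-match similarity to an action.
// hasMatch is false when no existing memory was found for this user and type.
func dedupAction(similarity float32, hasMatch bool) Action {
	switch {
	case !hasMatch || similarity < 0.7:
		return CreateDistinct
	case similarity > 0.9:
		return Update
	default:
		return CreateGray
	}
}

func main() {
	for _, s := range []float32{0.92, 0.8, 0.5} {
		fmt.Printf("%.2f -> %s\n", s, dedupAction(s, true))
	}
}
```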
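About user_id: This refers to the logged-in user (an authenticated identity), not an anonymous per-session user. The exact mapping must be configured on the semantic router agent side.
Implementation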
// pkg/memory/extractor.go
type MemoryExtractor struct {
store Store // Interface - can be MilvusStore or InMemoryStore (same package, so no memory. qualifier)
llmEndpoint string // LLM endpoint for fact extraction
batchSize int // Extract every N turns (default: 10)
turnCounts map[string]int // per-session turn counter; must be initialized with make() before use
mu sync.Mutex
}
// ProcessResponse extracts and stores memories (runs async)
//
// Triggers (MVP: only first one implemented):
// - Every N turns (e.g., 10) ← MVP
// - End of session ← Future: needs session end detection
// - Context drift detected ← Future: needs drift detection
//
func (e *MemoryExtractor) ProcessResponse(
ctx context.Context,
sessionID string,
userID string,
history []Message,
) error {
e.mu.Lock()
e.turnCounts[sessionID]++
turnCount := e.turnCounts[sessionID]
e.mu.Unlock()
// MVP: Only extract every N turns
// Future: Also trigger on session end or context drift
if turnCount % e.batchSize != 0 {
return nil
}
// Get recent batch
batchStart := max(0, len(history) - e.batchSize - 5)
batch := history[batchStart:]
// LLM extraction
extracted, err := e.extractWithLLM(batch)
if err != nil {
return err
}
// Store with deduplication
for _, fact := range extracted {
existing, similarity := e.findSimilar(ctx, userID, fact.Content, fact.Type)
if similarity > 0.9 && existing != nil {
// Very similar → UPDATE existing memory
existing.Content = fact.Content // Use newer content
existing.UpdatedAt = time.Now()
if err := e.store.Update(ctx, existing.ID, existing); err != nil {
logging.Warnf("Failed to update memory: %v", err)
}
continue
}
// similarity < 0.9 → CREATE new memory
mem := &Memory{
ID: generateID("mem"),
Type: fact.Type,
Content: fact.Content,
UserID: userID,
Source: "conversation",
CreatedAt: time.Now(),
}
if err := e.store.Store(ctx, mem); err != nil {
logging.Warnf("Failed to store memory: %v", err)
}
}
return nil
}
// findSimilar searches for existing similar memories
func (e *MemoryExtractor) findSimilar(
ctx context.Context,
userID string,
content string,
memType MemoryType,
) (*Memory, float32) {
results, err := e.store.Retrieve(ctx, RetrieveOptions{
Query: content,
UserID: userID,
Types: []MemoryType{memType},
Limit: 1,
Threshold: 0.7, // Only consider reasonably similar
})
if err != nil || len(results) == 0 {
return nil, 0
}
return results[0].Memory, results[0].Score
}
// extractWithLLM uses LLM to extract facts
//
// ⚠️ POC Limitation: LLM extraction is best-effort. Failures are logged but do not
// block the response. Incorrect extractions may occur.
//
// Future: Self-correcting memory (see Section 14 - Future Enhancements):
// - Track memory usage (access_count, last_accessed)
// - Score memories based on usage + age + retrieval feedback
// - Periodically prune low-score, unused memories
// - Detect contradictions → auto-merge or flag for resolution
//
func (e *MemoryExtractor) extractWithLLM(messages []Message) ([]ExtractedFact, error) {
prompt := `Extract important information from these messages.
IMPORTANT: Include CONTEXT for each fact.
For each piece of information:
- Type: "semantic" (facts, preferences) or "procedural" (instructions, how-to)
- Content: The fact WITH its context
BAD: {"type": "semantic", "content": "budget is $10,000"}
GOOD: {"type": "semantic", "content": "budget for Hawaii vacation is $10,000"}
Messages:
` + formatMessages(messages) + `
Return JSON array (empty if nothing to remember):
[{"type": "semantic|procedural", "content": "fact with context"}]`
// Call LLM with timeout
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
reqBody := map[string]interface{}{
"model": "qwen3",
"messages": []map[string]string{
{"role": "user", "content": prompt},
},
}
jsonBody, err := json.Marshal(reqBody)
if err != nil {
return nil, err
}
req, err := http.NewRequestWithContext(ctx, http.MethodPost,
e.llmEndpoint+"/v1/chat/completions",
bytes.NewReader(jsonBody))
if err != nil {
return nil, err
}
req.Header.Set("Content-Type", "application/json")
resp, err := http.DefaultClient.Do(req)
if err != nil {
logging.Warnf("Memory extraction LLM call failed: %v", err)
return nil, err // Caller handles gracefully
}
defer resp.Body.Close()
if resp.StatusCode != http.StatusOK {
logging.Warnf("Memory extraction LLM returned %d", resp.StatusCode)
return nil, fmt.Errorf("LLM returned %d", resp.StatusCode)
}
facts, err := parseExtractedFacts(resp.Body)
if err != nil {
// JSON parse error - LLM returned malformed output
logging.Warnf("Memory extraction parse failed: %v", err)
return nil, err // Skip this batch, don't store garbage
}
return facts, nil
}
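7. Memory Operations
All operations that can be performed on memories are implemented by the Store interface (see Section 8).
| Operation | Description | Trigger | Interface method | Status |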
|---|---|---|---|---|
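| Store | Write a new memory to Milvus | Automatic (LLM extraction) or explicit API | Store() | MVP |
| Retrieve | Semantic search for relevant memories | Automatic at query time | Retrieve() | MVP |
| Update | Modify an existing memory's content | Deduplication or explicit API | Update() | MVP |
| Forget | Delete a single memory by ID | Explicit API | Forget() | MVP |
| ForgetByScope | Delete all memories for a user/project | Explicit API | ForgetByScope() | MVP |
| Consolidate | Merge related memories into a summary | Scheduled / when a threshold is reached | Consolidate() | Future |
| Reflect | Generate insights from memory patterns | Agent-initiated | Reflect() | Future |
Forget Operation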
// Forget single memory
DELETE /v1/memory/{memory_id}
// Forget all memories for a user
DELETE /v1/memory?user_id=user_123
// Forget all memories for a project
DELETE /v1/memory?user_id=user_123&project_id=project_abc
Use cases:
- The user asks to "forget what you know about X"
- GDPR/privacy compliance (right to be forgotten)