PicoBot Memory System Design

Date: 2026-05-07

1. Overview

Introduce a memory system that allows PicoBot agents to remember user preferences, project context, facts, and conversation history across sessions. The memory system is unified with the existing context compression pipeline: compression automatically produces timeline memory entries and advances a last_consolidated_at pointer to avoid redundant reprocessing.

Design Principles

  • Compression is memory (inspired by nanobot): when old messages are compressed, the summary is persisted rather than discarded
  • FTS5 only (no vector embeddings): keyword search via SQLite FTS5, sufficient for current scale
  • Extend existing infrastructure: reuse Storage connection pool, ContextCompressor, SystemPromptBuilder
  • YAGNI: no knowledge graph, no response cache, no namespace isolation, no audit trail

2. Core Architecture

```text
ContextCompressor (existing)         MemoryManager (new)
       │                                  │
       │ compress_if_needed()             │ store / recall / forget
       │   ├─ LLM summary → inject        │
       │   ├─ store(timeline entry) ──────┘
       │   └─ advance last_consolidated_at
       │
SystemPromptBuilder ── recall(knowledge, limit=5) ──→ inject into system prompt
AgentLoop ── after_turn ──→ memory_store / memory_recall / memory_forget tools
```
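
The store / recall / forget surface in the diagram can be illustrated with a minimal in-memory sketch. This is not the planned SQLite implementation. A HashMap keyed by key stands in for the memories table, so storing an existing key replaces it, as key UNIQUE implies. Case-insensitive substring matching stands in for FTS5. InMemoryManager and its method signatures are hypothetical.

```rust
use std::collections::HashMap;

// Hypothetical in-memory stand-in for MemoryManager: a HashMap replaces the
// SQLite memories table and substring matching replaces FTS5.
#[derive(Debug, Clone, PartialEq)]
pub struct MemoryEntry {
    pub key: String,
    pub content: String,
    pub category: String, // "knowledge" or "timeline"
}

#[derive(Default)]
pub struct InMemoryManager {
    entries: HashMap<String, MemoryEntry>,
}

impl InMemoryManager {
    /// Upsert: storing an existing key replaces the old entry (like `key UNIQUE`).
    pub fn store(&mut self, key: &str, content: &str, category: &str) {
        let entry = MemoryEntry {
            key: key.to_string(),
            content: content.to_string(),
            category: category.to_string(),
        };
        self.entries.insert(key.to_string(), entry);
    }

    /// Case-insensitive keyword match on key or content, optionally by category.
    pub fn recall(&self, query: &str, category: Option<&str>, limit: usize) -> Vec<MemoryEntry> {
        let q = query.to_lowercase();
        let mut hits: Vec<MemoryEntry> = self
            .entries
            .values()
            .filter(|e| category.map_or(true, |c| e.category == c))
            .filter(|e| e.key.to_lowercase().contains(&q) || e.content.to_lowercase().contains(&q))
            .cloned()
            .collect();
        // Sort by key for a deterministic order; FTS5 would rank by bm25 instead.
        hits.sort_by(|a, b| a.key.cmp(&b.key));
        hits.truncate(limit);
        hits
    }

    /// Deletes by key; returns whether an entry existed.
    pub fn forget(&mut self, key: &str) -> bool {
        self.entries.remove(key).is_some()
    }
}
```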

3. Memory Categories

| Category | Purpose | Written by | Retrieved by |
|---|---|---|---|
| knowledge | Long-term facts, preferences, patterns, insights | Agent, via the memory_store tool | FTS5 search, injected into the system prompt every turn |
| timeline | Compressed conversation summaries | ContextCompressor, automatically | FTS5 search plus time-range queries |
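
The design lists MemoryCategory in src/memory/types.rs. One plausible shape is a small enum with string conversions, assuming the two categories are persisted as the lowercase strings used in the memories.category column. The variant names and the Display/FromStr impls are assumptions, not part of the design.

```rust
use std::fmt;
use std::str::FromStr;

/// The two memory categories, stored in `memories.category` as lowercase TEXT.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MemoryCategory {
    /// Long-term facts and preferences, written by the agent via memory_store.
    Knowledge,
    /// Compressed conversation summaries, written by ContextCompressor.
    Timeline,
}

impl MemoryCategory {
    pub fn as_str(self) -> &'static str {
        match self {
            MemoryCategory::Knowledge => "knowledge",
            MemoryCategory::Timeline => "timeline",
        }
    }
}

impl fmt::Display for MemoryCategory {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(self.as_str())
    }
}

impl FromStr for MemoryCategory {
    type Err = String;

    /// Parses the column value; anything else is rejected rather than defaulted.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "knowledge" => Ok(MemoryCategory::Knowledge),
            "timeline" => Ok(MemoryCategory::Timeline),
            other => Err(format!("unknown memory category: {other}")),
        }
    }
}
```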

4. Storage Schema

New table: memories

Added to the existing Storage initialization in src/storage/mod.rs:

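```sql
CREATE TABLE IF NOT EXISTS memories (
    id          TEXT PRIMARY KEY,
    key         TEXT NOT NULL UNIQUE,
    content     TEXT NOT NULL,
    category    TEXT NOT NULL DEFAULT 'knowledge',
    importance  REAL NOT NULL DEFAULT 0.5,
    session_id  TEXT,
    created_at  TEXT NOT NULL,
    updated_at  TEXT NOT NULL
);

CREATE VIRTUAL TABLE IF NOT EXISTS memory_fts USING fts5(
    key,
    content,
    content=memories,
    content_rowid=rowid
);

-- External-content FTS5 tables are not updated automatically; these triggers sync memory_fts.
-- Upsert with INSERT ... ON CONFLICT(key) DO UPDATE (INSERT OR REPLACE skips the delete trigger
-- unless recursive_triggers is on). VACUUM may renumber rowids here, so rebuild memory_fts after it.
CREATE TRIGGER IF NOT EXISTS memories_ai AFTER INSERT ON memories BEGIN
    INSERT INTO memory_fts(rowid, key, content) VALUES (new.rowid, new.key, new.content);
END;
CREATE TRIGGER IF NOT EXISTS memories_ad AFTER DELETE ON memories BEGIN
    INSERT INTO memory_fts(memory_fts, rowid, key, content) VALUES ('delete', old.rowid, old.key, old.content);
END;
CREATE TRIGGER IF NOT EXISTS memories_au AFTER UPDATE ON memories BEGIN
    INSERT INTO memory_fts(memory_fts, rowid, key, content) VALUES ('delete', old.rowid, old.key, old.content);
    INSERT INTO memory_fts(rowid, key, content) VALUES (new.rowid, new.key, new.content);
END;
```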

Modified table: sessions

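```sql
ALTER TABLE sessions ADD COLUMN last_consolidated_at INTEGER;
```

SQLite has no ADD COLUMN IF NOT EXISTS, so init_schema() must make this migration idempotent, for example by first checking PRAGMA table_info(sessions) for the column. Otherwise the statement fails with "duplicate column name" on every startup after the first.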

5. Unified Compression-Memory Pipeline

Trigger Conditions

Compression/consolidation fires when any of these conditions is met:

| Condition | Threshold | Rationale |
|---|---|---|
| Estimated history tokens exceed 50% of the context window | context_window / 2 | Primary trigger: the context is filling up |
| N turns accumulated without consolidation | 3 (configurable via consolidation_turn_threshold) | Catches up on short messages that never reach the token threshold |
| Session idle | 10 minutes (configurable via idle_consolidation_minutes) | Important for async channels such as Feishu |
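
The three trigger conditions combine with a logical OR. The sketch below assumes the compressor already has a token estimate and an idle duration available. TriggerInputs and should_consolidate are hypothetical names.

```rust
use std::time::Duration;

/// Hypothetical snapshot of the signals the compressor checks.
#[derive(Debug, Clone, Copy)]
pub struct TriggerInputs {
    pub history_tokens: usize,            // estimated tokens in the current history
    pub context_window: usize,            // model context window, in tokens
    pub turns_since_consolidation: usize, // turns since the last consolidation
    pub idle_for: Duration,               // time since the session's last message
}

/// True if any trigger condition holds: token budget, turn count, or idle time.
pub fn should_consolidate(s: &TriggerInputs, turn_threshold: usize, idle_minutes: u64) -> bool {
    let over_budget = s.history_tokens > s.context_window / 2;
    let enough_turns = s.turns_since_consolidation >= turn_threshold;
    let idle = s.idle_for >= Duration::from_secs(idle_minutes * 60);
    over_budget || enough_turns || idle
}
```

If this check only runs when a new message arrives, the idle condition fires late, on the next message. A per-session background timer is one way to consolidate while the session is actually idle.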

Flow

```text
compress_if_needed(history, session_id):
  1. Read last_consolidated_at from the session
     → Only messages after that timestamp are candidates for compression
  2. If no trigger condition holds, or there are no candidate messages → return history unchanged
  3. FTS5 recall(user_input, limit=recall_limit, category=knowledge)
     → Inject relevant facts into the system prompt
  4. LLM summarization of the candidate messages → [Context Summary]
     → Inject into the current conversation
  5. Store the summary as a timeline entry:
     key: "ctx_{session_id}_{uuid}"
     content: "[YYYY-MM-DD HH:MM] summary text..."
     category: timeline
  6. Only if step 5 succeeded: UPDATE sessions.last_consolidated_at
     = timestamp of the newest summarized message
  7. Return the compressed history
```
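
Steps 1, 2, and 6 come down to filtering by the pointer and computing its next value. The sketch assumes each message carries a Unix timestamp in seconds, matching the INTEGER column. Message, pending_messages, and next_pointer are hypothetical. The new pointer is taken from the newest summarized message rather than the wall clock, so a message that arrives while the LLM call is in flight is not skipped.

```rust
/// Hypothetical minimal message record; `ts` is a Unix timestamp in seconds.
#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub ts: i64,
    pub text: String,
}

/// Steps 1-2: messages newer than the pointer (all of them if it was never set).
pub fn pending_messages(history: &[Message], last_consolidated_at: Option<i64>) -> Vec<&Message> {
    history
        .iter()
        .filter(|m| last_consolidated_at.map_or(true, |p| m.ts > p))
        .collect()
}

/// Step 6: the new pointer is the newest timestamp that was actually summarized.
pub fn next_pointer(summarized: &[&Message]) -> Option<i64> {
    summarized.iter().map(|m| m.ts).max()
}
```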

timeline Entry Format

Each timeline entry follows nanobot's convention:

```text
[2026-05-07 14:30] User asked about Rust async patterns. Discussed tokio::select!,
semaphore-based rate limiting, and backpressure strategies. No code was written.
```

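This format is grep-friendly and human-readable. Because the YYYY-MM-DD HH:MM prefix sorts lexicographically, a plain-text sort also puts entries in chronological order.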
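
The entry format can be produced without extra dependencies. This sketch assumes UTC timestamps and converts days to a calendar date with Howard Hinnant's civil_from_days algorithm. The UUID is passed in as a string because generating one needs a crate, and a real implementation would more likely use a date-time crate such as chrono. timeline_content and timeline_key are hypothetical names.

```rust
/// Days since 1970-01-01 -> (year, month, day), proleptic Gregorian calendar
/// (Howard Hinnant's civil_from_days).
fn civil_from_days(days: i64) -> (i64, u32, u32) {
    let z = days + 719_468;
    let era = z.div_euclid(146_097);
    let doe = z.rem_euclid(146_097); // day of era, [0, 146096]
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // year of era, [0, 399]
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of year starting March 1
    let mp = (5 * doy + 2) / 153; // month index starting at March, [0, 11]
    let d = (doy - (153 * mp + 2) / 5 + 1) as u32;
    let m = (if mp < 10 { mp + 3 } else { mp - 9 }) as u32;
    let y = yoe + era * 400 + i64::from(m <= 2); // Jan/Feb belong to the next year
    (y, m, d)
}

/// Formats "[YYYY-MM-DD HH:MM] summary" from a Unix timestamp (UTC assumed).
pub fn timeline_content(unix_secs: i64, summary: &str) -> String {
    let (y, m, d) = civil_from_days(unix_secs.div_euclid(86_400));
    let secs_of_day = unix_secs.rem_euclid(86_400);
    let (hh, mm) = (secs_of_day / 3_600, secs_of_day % 3_600 / 60);
    format!("[{y:04}-{m:02}-{d:02} {hh:02}:{mm:02}] {summary}")
}

/// Builds the entry key "ctx_{session_id}_{uuid}"; the caller supplies the UUID.
pub fn timeline_key(session_id: &str, uuid: &str) -> String {
    format!("ctx_{session_id}_{uuid}")
}
```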

6. Retrieval Strategy

Automatic Retrieval (every turn)

SystemPromptBuilder.build_system_prompt() calls:

```text
memory.recall(query=user_message, limit=recall_limit, category=knowledge)
```

Results are ordered by FTS5 BM25 relevance (ORDER BY rank; FTS5's bm25() returns lower values for better matches) and injected as:

```markdown
## Memory Context

- user_prefers_rust: User prefers Rust for all backend projects
- project_picobot_stack: PicoBot uses Rust, axum, sqlx, ratatui, tokio
- user_workflow: User prefers TDD workflow with cargo test --lib
```
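
Rendering recalled entries into this section takes a short loop. The sketch assumes recall yields (key, content) pairs. Two further choices go beyond the design and are assumptions: the section is omitted when nothing is recalled, and multi-line content is collapsed onto one bullet line.

```rust
/// Renders recalled (key, content) pairs as the "## Memory Context" section.
/// Returns None when nothing was recalled, so the caller can skip the section.
pub fn render_memory_context(entries: &[(String, String)]) -> Option<String> {
    if entries.is_empty() {
        return None;
    }
    let mut out = String::from("## Memory Context\n\n");
    for (key, content) in entries {
        // Collapse all whitespace runs (including newlines) so each memory is one bullet.
        let one_line = content.split_whitespace().collect::<Vec<_>>().join(" ");
        out.push_str(&format!("- {key}: {one_line}\n"));
    }
    Some(out)
}
```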

Agent-Initiated Retrieval

The agent can also search explicitly with the memory_recall tool, using the optional category, since, and until filters.

Fallback

If FTS5 returns no results, fall back to LIKE '%keyword%' matching on the key and content columns.
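
Passing the raw user message straight to MATCH is fragile. Outside a quoted string, FTS5 rejects characters such as ' or ? with a syntax error and treats bare AND/OR/NOT as operators. The sketch below shows one defensive approach using hypothetical helper names. It quotes every token and ORs the tokens together, because FTS5's implicit AND between terms would rarely match a whole sentence. It also escapes LIKE wildcards for the fallback query.

```rust
/// Turns free-form user text into an FTS5 MATCH expression: each token is
/// double-quoted, so punctuation and AND/OR/NOT are treated as plain text,
/// and tokens are OR-ed. Returns None if no searchable token remains.
pub fn fts5_query(user_text: &str) -> Option<String> {
    let terms: Vec<String> = user_text
        .split(|c: char| !c.is_alphanumeric() && c != '_')
        .filter(|t| !t.is_empty())
        .map(|t| format!("\"{t}\"")) // tokens contain no '"', so no inner escaping needed
        .collect();
    if terms.is_empty() {
        None
    } else {
        Some(terms.join(" OR "))
    }
}

/// Builds a "%keyword%" pattern for the LIKE fallback, escaping %, _ and \
/// so they match literally when used with `LIKE ?1 ESCAPE '\'`.
pub fn like_pattern(keyword: &str) -> String {
    let mut pattern = String::with_capacity(keyword.len() + 2);
    pattern.push('%');
    for c in keyword.chars() {
        if matches!(c, '%' | '_' | '\\') {
            pattern.push('\\');
        }
        pattern.push(c);
    }
    pattern.push('%');
    pattern
}
```

The fallback would then bind the pattern in a query of the form `WHERE key LIKE ?1 ESCAPE '\' OR content LIKE ?1 ESCAPE '\'`.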

7. Agent Tools

| Tool | Parameters | Description |
|---|---|---|
| memory_store | key: str, content: str, category: str, importance?: f64 | Write or update a memory entry. The key is a semantic identifier (e.g., "user_language_pref") |
| memory_recall | query: str, category?: str, since?: i64, until?: i64, limit?: usize | Search memories by keyword, with optional filters |
| memory_forget | key: str | Delete a memory entry by key |
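
The tool table leaves argument validation open. The sketch below shows one possible policy for memory_store, using hypothetical names. Keys must be lowercase snake_case, like the examples in this document. The agent may only write knowledge entries, since timeline entries come from ContextCompressor. Importance defaults to the 0.5 column default and is clamped to [0, 1]. All three rules are assumptions.

```rust
/// Hypothetical validated arguments for the memory_store tool.
#[derive(Debug, PartialEq)]
pub struct StoreArgs {
    pub key: String,
    pub content: String,
    pub category: String,
    pub importance: f64,
}

/// Validates raw memory_store arguments under the assumed policy above.
pub fn validate_store(
    key: &str,
    content: &str,
    category: &str,
    importance: Option<f64>,
) -> Result<StoreArgs, String> {
    let key_ok = !key.is_empty()
        && key.chars().all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '_');
    if !key_ok {
        return Err(format!("invalid key {key:?}: use lowercase snake_case"));
    }
    if content.trim().is_empty() {
        return Err("content must not be empty".into());
    }
    if category != "knowledge" {
        return Err(format!("agents may only store knowledge entries, got {category:?}"));
    }
    let importance = importance.unwrap_or(0.5); // same as the column default
    if !importance.is_finite() {
        return Err("importance must be a finite number".into());
    }
    Ok(StoreArgs {
        key: key.to_string(),
        content: content.to_string(),
        category: category.to_string(),
        importance: importance.clamp(0.0, 1.0),
    })
}
```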

8. Error Handling & Degradation

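| Scenario | Strategy |
|---|---|
| Consolidation LLM call fails | Log a warning, increment the failure counter, and do NOT block the main flow; last_consolidated_at is not advanced, so the same messages are retried at the next trigger |
| Consecutive failures >= max_failures_before_degrade (default 3) | Degrade: append a raw message dump to the timeline with a [RAW] prefix, then reset the counter |
| FTS5 recall returns no results | Fall back to a LIKE '%keyword%' query |
| memory.enabled = false | ContextCompressor works normally; no memory writes |
| MemoryManager not initialized | ContextCompressor's memory field is None, so the timeline write path is skipped and compression works as before |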
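
The degradation rule amounts to a small counter. The sketch assumes the compressor records one outcome per consolidation attempt. FailureTracker, Outcome, and raw_dump are hypothetical names. On Retry the pointer is left where it is, so the next trigger summarizes the same messages again.

```rust
/// What the compressor should do after a consolidation attempt.
#[derive(Debug, PartialEq)]
pub enum Outcome {
    /// Summary stored; the failure counter is reset.
    Stored,
    /// LLM call failed; log a warning and retry at the next trigger.
    Retry,
    /// Too many consecutive failures; archive a [RAW] dump instead.
    Degrade,
}

pub struct FailureTracker {
    consecutive_failures: usize,
    max_failures_before_degrade: usize,
}

impl FailureTracker {
    pub fn new(max_failures_before_degrade: usize) -> Self {
        Self { consecutive_failures: 0, max_failures_before_degrade }
    }

    /// Records one attempt and decides the next action.
    pub fn record(&mut self, llm_call_succeeded: bool) -> Outcome {
        if llm_call_succeeded {
            self.consecutive_failures = 0;
            return Outcome::Stored;
        }
        self.consecutive_failures += 1;
        if self.consecutive_failures >= self.max_failures_before_degrade {
            self.consecutive_failures = 0; // reset after degrading
            Outcome::Degrade
        } else {
            Outcome::Retry
        }
    }
}

/// Content of the raw fallback timeline entry.
pub fn raw_dump(messages: &[&str]) -> String {
    format!("[RAW] {}", messages.join("\n"))
}
```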

9. Configuration

```json
{
  "memory": {
    "enabled": true,
    "consolidation_provider": "openai",
    "consolidation_model": "gpt-4o-mini",
    "recall_limit": 5,
    "consolidation_turn_threshold": 3,
    "idle_consolidation_minutes": 10,
    "timeline_retention_days": 90,
    "max_failures_before_degrade": 3
  }
}
```

| Key | Type | Default | Description |
|---|---|---|---|
| enabled | bool | false | Master switch for the memory system |
| consolidation_provider | string | | Provider name for consolidation LLM calls |
| consolidation_model | string | | Model name for consolidation |
| recall_limit | usize | 5 | Max knowledge entries injected into the system prompt |
| consolidation_turn_threshold | usize | 3 | Turns without consolidation before it is forced |
| idle_consolidation_minutes | u64 | 10 | Idle minutes before consolidation is triggered |
| timeline_retention_days | u64 | 90 | Age in days after which timeline entries are cleaned up automatically |
| max_failures_before_degrade | usize | 3 | Consecutive failures before falling back to the raw archive |
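
The configuration table maps onto a plain struct with a Default impl. In the real crate this would most likely derive serde's Deserialize with #[serde(default)], which is omitted here to keep the sketch dependency-free. Two details are assumptions: representing provider and model as Option<String> (the table gives them no default), and the retention_cutoff helper.

```rust
/// Mirrors the configuration table above.
#[derive(Debug, Clone, PartialEq)]
pub struct MemoryConfig {
    pub enabled: bool,
    pub consolidation_provider: Option<String>,
    pub consolidation_model: Option<String>,
    pub recall_limit: usize,
    pub consolidation_turn_threshold: usize,
    pub idle_consolidation_minutes: u64,
    pub timeline_retention_days: u64,
    pub max_failures_before_degrade: usize,
}

impl Default for MemoryConfig {
    fn default() -> Self {
        Self {
            enabled: false,
            consolidation_provider: None,
            consolidation_model: None,
            recall_limit: 5,
            consolidation_turn_threshold: 3,
            idle_consolidation_minutes: 10,
            timeline_retention_days: 90,
            max_failures_before_degrade: 3,
        }
    }
}

impl MemoryConfig {
    /// Hypothetical helper: Unix time before which timeline entries may be deleted.
    pub fn retention_cutoff(&self, now_unix: i64) -> i64 {
        now_unix - (self.timeline_retention_days as i64) * 86_400
    }
}
```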

10. New Module Structure

```text
src/
├── memory/
│   ├── mod.rs              # MemoryManager, MemoryConfig
│   ├── types.rs            # MemoryEntry, MemoryCategory, ConsolidationResult
│   └── consolidation.rs    # Consolidation prompt + LLM call logic
├── storage/
│   └── memory.rs           # SQLite CRUD for memories table + FTS5
└── tools/
    ├── memory_store.rs     # memory_store tool
    ├── memory_recall.rs    # memory_recall tool
    └── memory_forget.rs    # memory_forget tool
```

11. Integration Points (Existing Files Modified)

| File | Change |
|---|---|
| `src/lib.rs` | Add `pub mod memory;` |
| `src/config/mod.rs` | Add the `MemoryConfig` struct and its deserialization |
| `src/storage/mod.rs` | Add `pub mod memory;` and create the memories table and FTS5 index in `init_schema()` |
| `src/storage/session.rs` | Read/write the `last_consolidated_at` column |
| `src/session/session.rs` | Add a `last_consolidated_at: Option<i64>` field to `Session` |
| `src/agent/context_compressor.rs` | Add a `memory: Option<Arc<MemoryManager>>` field; write a timeline entry on each compression |
| `src/agent/system_prompt.rs` | Add a memory context section built via `MemoryManager::recall()` |
| `src/agent/agent_loop.rs` | No changes (tools are registered via `ToolRegistry`) |
| `src/tools/mod.rs` | Register `memory_store`, `memory_recall`, `memory_forget` in `create_default_tools()` |
| `src/gateway/mod.rs` | Initialize `MemoryManager` in `GatewayState::new()` and pass it to `ContextCompressor` |
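
The Option<Arc<MemoryManager>> field is what keeps ContextCompressor working when memory is disabled or uninitialized. The sketch below uses a stub MemoryManager that only records keys. The stub, store_timeline, and persist_summary are hypothetical, and the real methods may well be async.

```rust
use std::sync::{Arc, Mutex};

/// Stub standing in for the real MemoryManager: it only records stored keys.
#[derive(Default)]
pub struct MemoryManager {
    stored: Mutex<Vec<String>>,
}

impl MemoryManager {
    pub fn store_timeline(&self, key: String, _content: String) {
        self.stored.lock().unwrap().push(key);
    }

    pub fn stored_keys(&self) -> Vec<String> {
        self.stored.lock().unwrap().clone()
    }
}

/// Only the memory-related part of ContextCompressor is shown.
pub struct ContextCompressor {
    memory: Option<Arc<MemoryManager>>,
}

impl ContextCompressor {
    pub fn new(memory: Option<Arc<MemoryManager>>) -> Self {
        Self { memory }
    }

    /// Persists a summary as a timeline entry; with no MemoryManager this is a
    /// no-op and compression behaves exactly as before. Returns whether it stored.
    pub fn persist_summary(&self, session_id: &str, entry_id: &str, content: &str) -> bool {
        match &self.memory {
            Some(memory) => {
                memory.store_timeline(format!("ctx_{session_id}_{entry_id}"), content.to_string());
                true
            }
            None => false,
        }
    }
}
```

Because the manager sits behind an Arc, GatewayState::new() can share one instance between the compressor and other consumers, or pass None when memory.enabled is false.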

12. Implementation Order

| # | Task | Dependencies |
|---|---|---|
| 1 | Types: MemoryEntry, MemoryCategory, ConsolidationResult | |
| 2 | Config: MemoryConfig + deserialization | |
| 3 | Storage: memories table + FTS5 + CRUD + search | #1 |
| 4 | MemoryManager API | #1, #2, #3 |
| 5 | Session: last_consolidated_at field | |
| 6 | ContextCompressor memory integration | #4, #5 |
| 7 | SystemPromptBuilder memory context injection | #4 |
| 8 | Agent tools: memory_store, memory_recall, memory_forget | #4 |
| 9 | GatewayState initialization wiring | #4, #5, #6 |
| 10 | Unit tests | #1-#9 |