PicoBot

PicoBot is a Rust-based personal AI assistant runtime. It runs a local gateway, connects chat channels such as the terminal TUI and Feishu/Lark, persists sessions in SQLite, and gives the agent a tool system for files, shell commands, web access, memory, scheduling, skills, MCP tools, and delegated sub-agents.

What It Does

  • Runs as a gateway server on 127.0.0.1:19876 by default.
  • Provides a Ratatui terminal client over WebSocket.
  • Supports Feishu/Lark messages, reactions, file upload/download, and media references.
  • Calls OpenAI-compatible providers and Anthropic Messages API providers.
  • Persists conversations, messages, memories, scheduled jobs, LLM call metadata, and background sub-agent tasks in SQLite.
  • Loads skills from workspace, user, and shared skill directories, with built-in skills installed on first use.
  • Compresses long contexts and stores timeline summaries for later recall.
  • Can register tools discovered from configured MCP servers.

Architecture

Channel -> MessageBus -> SessionManager -> AgentLoop -> LLM Provider
                                   |             |
                                   |             v
                                   |           Tools
                                   v
                                SQLite

Control messages -> SessionManager -> MessageBus -> OutboundDispatcher -> Channel

The main runtime boundaries are:

  • channels only receive and send external messages.
  • bus is an async queue, not a router.
  • session owns dialog lifecycle, persistence, memory recall, prompt assembly, compression, and task cancellation.
  • agent runs the stateless LLM/tool loop.
  • providers are HTTP clients for model APIs.
  • tools execute agent actions and return string results.
  • storage owns SQLite schema and CRUD.
  • scheduler polls due jobs and feeds prompts back into sessions.
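
The split between the agent and its tools can be pictured with a small sketch. The names below (`Tool`, `Shout`, `ToolRegistry`, `dispatch`) are hypothetical and are not taken from PicoBot's source. The only detail drawn from the list above is that a tool returns a string result.

```rust
use std::collections::HashMap;

/// Hypothetical tool interface: raw arguments in, string result out.
trait Tool {
    fn name(&self) -> &str;
    fn execute(&self, args: &str) -> String;
}

/// Illustrative tool that upper-cases its input.
struct Shout;

impl Tool for Shout {
    fn name(&self) -> &str {
        "shout"
    }

    fn execute(&self, args: &str) -> String {
        args.to_uppercase()
    }
}

/// Hypothetical registry keyed by tool name.
struct ToolRegistry {
    tools: HashMap<String, Box<dyn Tool>>,
}

impl ToolRegistry {
    fn new() -> Self {
        Self { tools: HashMap::new() }
    }

    fn register(&mut self, tool: Box<dyn Tool>) {
        self.tools.insert(tool.name().to_string(), tool);
    }

    /// Unknown tools produce an error string instead of a panic,
    /// so the result can still be handed back to the model.
    fn dispatch(&self, name: &str, args: &str) -> String {
        match self.tools.get(name) {
            Some(tool) => tool.execute(args),
            None => format!("error: unknown tool '{name}'"),
        }
    }
}
```

Returning plain strings keeps the loop simple in this sketch: whatever a tool produces, including an error, can be appended to the conversation as text.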

Features

Channels

  • cli_chat: terminal TUI client connected through /ws.
  • feishu: Feishu/Lark channel with configurable allow list, media directory, and reaction emoji.

LLM Providers

  • OpenAI-compatible chat completions, including DashScope, Volcengine, and similar APIs.
  • Anthropic Messages API.
  • Model-specific input_type metadata for text/image capability checks.
  • JSON Schema cleanup for cross-provider tool compatibility.

Sessions And Memory

  • Session IDs use <channel>:<chat_id>:<dialog_id>.
  • Each channel/chat can have multiple dialogs.
  • Dialog operations include create, list, switch, rename, delete, compact, dump, info, and stop.
  • Session history is persisted to SQLite and can be incrementally restored after compression.
  • Knowledge memories are recalled into the system prompt each turn.
  • Timeline memories are produced by context compression and can be searched later.
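
As an illustration of the `<channel>:<chat_id>:<dialog_id>` format, here is a minimal parsing sketch. `SessionId`, `parse_session_id`, and `format_session_id` are hypothetical names, and the sketch assumes that the channel and chat ID never contain a colon. PicoBot's actual parsing rules may differ.

```rust
/// Parsed form of a `<channel>:<chat_id>:<dialog_id>` session ID (hypothetical type).
#[derive(Debug, PartialEq)]
struct SessionId {
    channel: String,
    chat_id: String,
    dialog_id: String,
}

/// Splits on the first two colons only.
/// ASSUMPTION: channel and chat_id contain no ':'; any further colons
/// are kept as part of dialog_id.
fn parse_session_id(raw: &str) -> Option<SessionId> {
    let mut parts = raw.splitn(3, ':');
    let channel = parts.next()?;
    let chat_id = parts.next()?;
    let dialog_id = parts.next()?;
    if channel.is_empty() || chat_id.is_empty() || dialog_id.is_empty() {
        return None;
    }
    Some(SessionId {
        channel: channel.to_string(),
        chat_id: chat_id.to_string(),
        dialog_id: dialog_id.to_string(),
    })
}

/// Inverse of `parse_session_id`.
fn format_session_id(id: &SessionId) -> String {
    format!("{}:{}:{}", id.channel, id.chat_id, id.dialog_id)
}
```

For example, `parse_session_id("feishu:chat42:d1")` yields channel `feishu`, chat `chat42`, and dialog `d1`, while an input with fewer than three non-empty parts yields `None`.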

Tools

Base tools registered for the agent:

Tool                                                            Purpose
calculator                                                      Math expressions and statistics
file_read / file_write / file_edit                              Workspace file operations
file_search / content_search                                    File and content search
bash                                                            Run shell commands in the workspace
http_request                                                    HTTP API requests
web_fetch                                                       Fetch and extract web page text
get_skill                                                       List or load local skills
memory_store / memory_recall / timeline_recall / memory_forget  Long-term memory operations
delegate                                                        Run inline, background, or parallel sub-agents
send_message                                                    Send outbound messages to configured channels
chat_manager                                                    Inspect sessions, channels, and stored messages
cron_add/list/remove/enable/disable/update                      Manage scheduled jobs when scheduler is enabled
browser                                                         Optional WebDriver browser automation when enabled
MCP tools                                                       Dynamically registered from configured MCP servers

Skills

Skills are directories containing SKILL.md. Load priority is:

  1. {workspace}/skills
  2. ~/.picobot/skills
  3. ~/.agents/skills

Same-name skills in higher-priority locations override lower-priority ones. Built-in skills from resources/skills are embedded into the binary and installed into ~/.picobot/skills if missing.
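
The override rule can be sketched as a "first directory wins" merge. This is an illustration only: `resolve_skills` is a hypothetical function, the skill names in the usage note are made up, and the real loader scans directories for SKILL.md rather than taking name lists as input.

```rust
use std::collections::BTreeMap;

/// Maps each skill name to the directory that provides it.
/// `sources` is ordered from highest to lowest priority; each entry is
/// (directory, skill names found there). The first directory that
/// provides a given name wins, and later duplicates are ignored.
fn resolve_skills(sources: &[(&str, Vec<&str>)]) -> BTreeMap<String, String> {
    let mut resolved = BTreeMap::new();
    for (dir, skills) in sources {
        for skill in skills {
            resolved
                .entry(skill.to_string())
                .or_insert_with(|| dir.to_string());
        }
    }
    resolved
}
```

With a hypothetical `notes` skill present in both `{workspace}/skills` and `~/.picobot/skills`, the map resolves `notes` to `{workspace}/skills`, matching the priority order listed above.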

Quick Start

Prerequisites

  • Rust toolchain with edition 2024 support (Rust 1.85 or newer).
  • A configured LLM provider API key.

Build

cargo build

Configure

PicoBot loads ~/.picobot/config.json first, then falls back to ./config.json. On gateway startup, a template is written to ~/.picobot/config.example.json if that file does not exist. The source template is resources/templates/config.example.json.
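
The lookup order can be sketched as "pick the first candidate that exists". `config_candidates` and `pick_config_path` are hypothetical helpers, not PicoBot functions. The home directory is passed in explicitly because how PicoBot expands `~` is not described here.

```rust
use std::path::{Path, PathBuf};

/// Candidate config paths in the documented order.
/// ASSUMPTION: `home` stands in for whatever `~` expands to.
fn config_candidates(home: &Path) -> Vec<PathBuf> {
    vec![
        home.join(".picobot").join("config.json"),
        PathBuf::from("./config.json"),
    ]
}

/// Returns the first candidate for which `exists` reports true.
/// The existence check is injected so the ordering logic can be
/// exercised without touching the disk.
fn pick_config_path(
    candidates: &[PathBuf],
    exists: impl Fn(&Path) -> bool,
) -> Option<PathBuf> {
    candidates.iter().find(|p| exists(p.as_path())).cloned()
}
```

Against a real filesystem, the check would be `pick_config_path(&candidates, |p| p.exists())`.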

Minimal example:

{
  "providers": {
    "openai": {
      "type": "openai",
      "base_url": "https://api.openai.com/v1",
      "api_key": "<OPENAI_API_KEY>",
      "extra_headers": {}
    }
  },
  "models": {
    "gpt-4o": {
      "model_id": "gpt-4o",
      "temperature": 0.7,
      "max_tokens": 4096,
      "input_type": ["text", "image"]
    }
  },
  "agents": {
    "default": {
      "provider": "openai",
      "model": "gpt-4o",
      "max_tool_iterations": 99,
      "token_limit": 128000
    }
  },
  "workspace_dir": "~/.picobot/workspace"
}

PicoBot parses the .env file in the current directory itself. After .env is loaded, placeholders such as <OPENAI_API_KEY> in the JSON config are replaced with values from the process environment.
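
A minimal sketch of this kind of placeholder substitution follows. `substitute_placeholders` is a hypothetical function, and the placeholder grammar it uses (`<`, then uppercase letters, digits, or underscores, then `>`) is an assumption. Leaving unknown placeholders untouched is also a choice made for this sketch, not documented PicoBot behavior.

```rust
use std::collections::HashMap;

/// Replaces `<NAME>` placeholders with values from `env`.
/// ASSUMPTION: NAME is one or more of A-Z, 0-9, '_'.
/// Unknown or malformed placeholders are copied through unchanged.
fn substitute_placeholders(input: &str, env: &HashMap<String, String>) -> String {
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while let Some(start) = rest.find('<') {
        out.push_str(&rest[..start]);
        let after = &rest[start + 1..];
        // Look for the closing '>' and validate the name in between.
        let replacement = after.find('>').and_then(|end| {
            let name = &after[..end];
            let valid = !name.is_empty()
                && name
                    .chars()
                    .all(|c| c.is_ascii_uppercase() || c.is_ascii_digit() || c == '_');
            if valid {
                env.get(name).map(|value| (value, end))
            } else {
                None
            }
        });
        match replacement {
            Some((value, end)) => {
                out.push_str(value);
                rest = &after[end + 1..];
            }
            None => {
                // Keep the '<' literally and continue scanning after it.
                out.push('<');
                rest = after;
            }
        }
    }
    out.push_str(rest);
    out
}
```

The sketch inserts values verbatim, so a value containing quotes or backslashes would need JSON escaping before being placed inside a JSON string.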

Run

cargo run -- gateway

The gateway switches the process working directory to workspace_dir and stores picobot.db there by default.

In another terminal:

cargo run -- chat

The client connects to ws://127.0.0.1:19876/ws by default. Override with --gateway-url.

Configuration

Top-level config fields:

Field          Purpose
providers      Named LLM provider configs
models         Named model configs
agents         Agent-to-provider/model binding
gateway        Bind address, session DB path, cleanup, scheduler, background task limits
client         Default WebSocket URL for the TUI client
channels       Channel configs, currently Feishu/Lark
memory         Recall and consolidation settings
mcp            MCP server configs
browser        Optional WebDriver browser tool config
workspace_dir  Workspace used for file tools, shell commands, DB default, and workspace skills

Important defaults:

Key                                      Default
gateway.host                             127.0.0.1
gateway.port                             19876
gateway.max_concurrent_background_tasks  10
gateway.scheduler.enabled                true (applies when the scheduler section is omitted)
client.gateway_url                       ws://127.0.0.1:19876/ws
memory.recall_limit                      5
memory.timeline_retention_days           90
mcp.tool_timeout_secs                    180
browser.enabled                          false

MCP servers support stdio, sse, and streamable-http transports. Browser automation requires a compatible Chrome/Chromium and chromedriver/WebDriver endpoint.
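
To show how defaults like these are commonly modeled in Rust, here is a sketch using the `Default` trait. `GatewayDefaults` and `client_url` are hypothetical and cover only a few keys from the table of important defaults. PicoBot's real config types, and how they are deserialized, are not shown in this README.

```rust
/// Hypothetical mirror of a few documented gateway defaults.
#[derive(Debug, Clone, PartialEq)]
struct GatewayDefaults {
    host: String,
    port: u16,
    max_concurrent_background_tasks: usize,
    scheduler_enabled: bool,
}

impl Default for GatewayDefaults {
    fn default() -> Self {
        Self {
            host: "127.0.0.1".to_string(),
            port: 19876,
            max_concurrent_background_tasks: 10,
            scheduler_enabled: true,
        }
    }
}

impl GatewayDefaults {
    /// WebSocket URL a client would use for this bind address.
    /// Deriving it from host and port is an illustration only.
    fn client_url(&self) -> String {
        format!("ws://{}:{}/ws", self.host, self.port)
    }
}
```

Struct update syntax keeps overrides short, for example `GatewayDefaults { port: 8080, ..GatewayDefaults::default() }`.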

Slash Commands

Available from CLI chat and channel text messages:

Command              Description
/new                 Create a new dialog
/sessions            List recent dialogs
/switch <dialog_id>  Switch dialog
/rename <title>      Rename current dialog
/delete              Delete current dialog
/compact             Manually trigger context compression
/info                Show current dialog information
/dump                Save current dialog as Markdown
/?, /help            Show help
/mcp                 Show MCP server and tool status
/stop                Stop active tasks and clear queued messages
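
The command set above maps naturally onto an enum. The sketch below is one hypothetical way to parse a line of text into such a command. `SlashCommand` and `parse_slash_command` are not PicoBot names, and rejecting `/switch` or `/rename` without an argument is an assumption made for this sketch.

```rust
/// Hypothetical enum covering the documented slash commands.
#[derive(Debug, PartialEq)]
enum SlashCommand {
    New,
    Sessions,
    Switch(String),
    Rename(String),
    Delete,
    Compact,
    Info,
    Dump,
    Help,
    Mcp,
    Stop,
}

/// Parses a text line into a command; returns None for ordinary chat text.
fn parse_slash_command(line: &str) -> Option<SlashCommand> {
    let line = line.trim();
    // Split the command name from the rest of the line, if any.
    let (name, arg) = match line.split_once(char::is_whitespace) {
        Some((name, arg)) => (name, arg.trim()),
        None => (line, ""),
    };
    let cmd = match name {
        "/new" => SlashCommand::New,
        "/sessions" => SlashCommand::Sessions,
        // ASSUMPTION: these two commands require a non-empty argument.
        "/switch" if !arg.is_empty() => SlashCommand::Switch(arg.to_string()),
        "/rename" if !arg.is_empty() => SlashCommand::Rename(arg.to_string()),
        "/delete" => SlashCommand::Delete,
        "/compact" => SlashCommand::Compact,
        "/info" => SlashCommand::Info,
        "/dump" => SlashCommand::Dump,
        "/?" | "/help" => SlashCommand::Help,
        "/mcp" => SlashCommand::Mcp,
        "/stop" => SlashCommand::Stop,
        _ => return None,
    };
    Some(cmd)
}
```

Anything that does not match a known command falls through to `None`, so the caller can treat it as a normal user message.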

WebSocket API

The gateway exposes:

Method  Path     Description
GET     /health  Returns service health and version
GET     /ws      WebSocket upgrade for chat clients

Inbound WebSocket message types:

Type                Main fields
user_input          content, optional channel, chat_id, sender_id
clear_history       optional chat_id, session_id
create_session      optional title
list_sessions       include_archived
load_session        session_id
rename_session      optional session_id, title
archive_session     optional session_id
delete_session      optional session_id
get_slash_commands  none
ping                none

Outbound WebSocket message types include assistant_response, error, session_established, session_created, session_list, session_loaded, session_renamed, session_archived, session_deleted, history_cleared, slash_commands_list, pong, command_executed, and system_notification.
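
For orientation, the sketch below builds two inbound frames as JSON text using only the standard library. It assumes that messages are JSON objects with the message type carried in a `"type"` key. This README does not state the wire format, so treat the key name and overall shape as assumptions. `json_escape`, `user_input_frame`, and `ping_frame` are hypothetical helpers.

```rust
/// Escapes a string for use inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Builds a `user_input` frame.
/// ASSUMPTION: frames are JSON objects tagged by a "type" key.
fn user_input_frame(content: &str, chat_id: Option<&str>) -> String {
    let mut frame = format!(
        "{{\"type\":\"user_input\",\"content\":\"{}\"",
        json_escape(content)
    );
    if let Some(id) = chat_id {
        frame.push_str(&format!(",\"chat_id\":\"{}\"", json_escape(id)));
    }
    frame.push('}');
    frame
}

/// Builds a `ping` frame, which carries no fields.
fn ping_frame() -> String {
    "{\"type\":\"ping\"}".to_string()
}
```

A real client would more likely derive these frames with a JSON library. The hand-rolled escaping here only keeps the sketch free of dependencies.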

Testing

# Unit tests
cargo test --lib

# Integration tests require real API keys in tests/test.env
cp tests/test.env.example tests/test.env
cargo test --test test_integration -- --ignored
cargo test --test test_tool_calling -- --ignored
cargo test --test test_request_format -- --ignored

Integration tests are ignored by default because they make real provider calls.

Project Layout

src/
  agent/          LLM loop, context compression, system prompts, media handling, sub-agents
  bus/            Inbound, outbound, and control message queues
  channels/       CLI chat and Feishu/Lark integrations
  client/         Ratatui terminal UI
  config/         Config loading, env substitution, path expansion
  gateway/        Axum HTTP/WebSocket server and GatewayState wiring
  mcp/            MCP client connections and tool wrappers
  memory/         Memory manager and memory types
  observability/  Agent/tool telemetry observer interfaces
  providers/      OpenAI-compatible and Anthropic clients
  scheduler/      Scheduled job runtime
  session/        Session lifecycle, dialog commands, persistence integration
  skills/         Skill loading and embedded built-in skill installation
  storage/        SQLite schema and CRUD
  tools/          Agent tool implementations
resources/
  skills/         Built-in skills embedded at build time
  templates/      Config, AGENTS.md, and USER.md templates written on first run
tests/            Unit and ignored integration tests
reference/        Third-party reference code; do not modify as project source

Key Dependencies

Crate                           Purpose
axum, tokio, tokio-tungstenite  Gateway and WebSocket runtime
sqlx                            SQLite persistence
reqwest                         LLM and HTTP clients
ratatui, crossterm, termimad    Terminal UI
rmcp                            MCP client support
fantoccini                      Optional browser automation
cron, chrono-tz                 Scheduling
jieba-rs                        Chinese tokenization for memory search
zstd, tar                       Embedded built-in skill packaging