# PicoBot
PicoBot is a Rust-based personal AI assistant runtime. It runs a local gateway, connects chat channels such as the terminal TUI and Feishu/Lark, persists sessions in SQLite, and gives the agent a tool system for files, shell commands, web access, memory, scheduling, skills, MCP tools, and delegated sub-agents.
## What It Does
- Runs as a gateway server on `127.0.0.1:19876` by default.
- Provides a Ratatui terminal client over WebSocket.
- Supports Feishu/Lark messages, reactions, file upload/download, and media references.
- Calls OpenAI-compatible providers and Anthropic Messages API providers.
- Persists conversations, messages, memories, scheduled jobs, LLM call metadata, and background sub-agent tasks in SQLite.
- Loads skills from workspace, user, and shared skill directories, with built-in skills installed on first use.
- Compresses long contexts and stores timeline summaries for later recall.
- Can register tools discovered from configured MCP servers.
## Architecture
```text
Channel -> MessageBus -> SessionManager -> AgentLoop -> LLM Provider
                               |               |
                               |               v
                               |             Tools
                               v
                            SQLite

Control messages -> SessionManager -> MessageBus -> OutboundDispatcher -> Channel
```
The main runtime boundaries are:
- `channels` only receive and send external messages.
- `bus` is an async queue, not a router.
- `session` owns dialog lifecycle, persistence, memory recall, prompt assembly, compression, and task cancellation.
- `agent` runs the stateless LLM/tool loop.
- `providers` are HTTP clients for model APIs.
- `tools` execute agent actions and return string results.
- `storage` owns SQLite schema and CRUD.
- `scheduler` polls due jobs and feeds prompts back into sessions.
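
The queue-not-router boundary can be illustrated with a minimal sketch. It uses the synchronous `std::sync::mpsc` channel as a stand-in for the async bus, and `InboundMessage`, `session_key`, `drain`, and `demo` are hypothetical names chosen for illustration, not PicoBot types. It only shows the division of labor: the queue preserves arrival order, while the consumer (the session layer) decides where each message belongs.

```rust
use std::sync::mpsc;

/// Hypothetical inbound message; the field names are illustrative only.
#[derive(Debug, Clone, PartialEq)]
pub struct InboundMessage {
    pub channel: String,
    pub chat_id: String,
    pub content: String,
}

/// Stand-in for the session layer: it, not the bus, picks the target session.
pub fn session_key(msg: &InboundMessage) -> String {
    format!("{}:{}", msg.channel, msg.chat_id)
}

/// Drains the queue in arrival order and returns the key chosen for each message.
pub fn drain(rx: mpsc::Receiver<InboundMessage>) -> Vec<String> {
    rx.iter().map(|msg| session_key(&msg)).collect()
}

/// Pushes two messages from different channels through one queue.
pub fn demo() -> Vec<String> {
    let (tx, rx) = mpsc::channel();
    for (channel, chat_id) in [("cli_chat", "local"), ("feishu", "oc_123")] {
        tx.send(InboundMessage {
            channel: channel.to_string(),
            chat_id: chat_id.to_string(),
            content: "hi".to_string(),
        })
        .unwrap();
    }
    drop(tx); // closing the sender lets `rx.iter()` terminate
    drain(rx)
}
```

Calling `demo()` yields one key per message in send order. Nothing in the queue itself inspects `channel` or `chat_id`.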
## Features
### Channels
- `cli_chat`: terminal TUI client connected through `/ws`.
- `feishu`: Feishu/Lark channel with configurable allow list, media directory, and reaction emoji.
### LLM Providers
- OpenAI-compatible chat completions, including DashScope, Volcengine, and similar APIs.
- Anthropic Messages API.
- Model-specific `input_type` metadata for text/image capability checks.
- JSON Schema cleanup for cross-provider tool compatibility.
### Sessions And Memory
- Session IDs use `<channel>:<chat_id>:<dialog_id>`.
- Each channel/chat can have multiple dialogs.
- Dialog operations include create, list, switch, rename, delete, compact, dump, info, and stop.
- Session history is persisted to SQLite and can be incrementally restored after compression.
- Knowledge memories are recalled into the system prompt each turn.
- Timeline memories are produced by context compression and can be searched later.
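
As a rough illustration of the `<channel>:<chat_id>:<dialog_id>` format, the sketch below parses and formats such IDs. `SessionId` and its methods are hypothetical, and the handling of extra colons (they end up in `dialog_id`) and of empty segments (rejected) is an assumption of this sketch, not documented behavior.

```rust
use std::fmt;

/// Hypothetical value type for `<channel>:<chat_id>:<dialog_id>`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SessionId {
    pub channel: String,
    pub chat_id: String,
    pub dialog_id: String,
}

impl SessionId {
    /// Splits on the first two colons; returns `None` if a segment is missing or empty.
    pub fn parse(s: &str) -> Option<Self> {
        let mut parts = s.splitn(3, ':');
        let channel = parts.next()?;
        let chat_id = parts.next()?;
        let dialog_id = parts.next()?;
        if channel.is_empty() || chat_id.is_empty() || dialog_id.is_empty() {
            return None;
        }
        Some(Self {
            channel: channel.to_string(),
            chat_id: chat_id.to_string(),
            dialog_id: dialog_id.to_string(),
        })
    }
}

impl fmt::Display for SessionId {
    /// Formats the ID back into its colon-separated form.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}:{}:{}", self.channel, self.chat_id, self.dialog_id)
    }
}
```

Because one channel/chat pair can hold several dialogs, two IDs that differ only in `dialog_id` refer to separate conversations in the same chat.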
### Tools
Base tools registered for the agent:
| Tool | Purpose |
|------|---------|
| `calculator` | Math expressions and statistics |
| `file_read` / `file_write` / `file_edit` | Workspace file operations |
| `file_search` / `content_search` | File and content search |
| `bash` | Run shell commands in the workspace |
| `http_request` | HTTP API requests |
| `web_fetch` | Fetch and extract web page text |
| `get_skill` | List or load local skills |
| `memory_store` / `memory_recall` / `timeline_recall` / `memory_forget` | Long-term memory operations |
| `delegate` | Run inline, background, or parallel sub-agents |
| `send_message` | Send outbound messages to configured channels |
| `chat_manager` | Inspect sessions, channels, and stored messages |
| `cron_add/list/remove/enable/disable/update` | Manage scheduled jobs when scheduler is enabled |
| `browser` | Optional WebDriver browser automation when enabled |
| MCP tools | Dynamically registered from configured MCP servers |
### Skills
Skills are directories containing `SKILL.md`. Load priority is:
1. `{workspace}/skills`
2. `~/.picobot/skills`
3. `~/.agents/skills`
Same-name skills in higher-priority locations override lower-priority ones. Built-in skills from `resources/skills` are embedded into the binary and installed into `~/.picobot/skills` if missing.
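
The override rule can be sketched as a first-match lookup over the directories in priority order. `resolve_skill` is a hypothetical helper, and the layout `<root>/<skill_name>/SKILL.md` is an assumption of this sketch; the actual loader may discover and name skills differently.

```rust
use std::path::PathBuf;

/// Returns the first `<root>/<name>/SKILL.md` that exists, scanning `roots`
/// in priority order (highest priority first).
pub fn resolve_skill(roots: &[PathBuf], name: &str) -> Option<PathBuf> {
    roots
        .iter()
        .map(|root| root.join(name).join("SKILL.md"))
        .find(|candidate| candidate.is_file())
}
```

With roots listed as `{workspace}/skills`, `~/.picobot/skills`, `~/.agents/skills`, a skill present in both the workspace and the user directory resolves to the workspace copy.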
## Quick Start
### Prerequisites
- Rust toolchain with edition 2024 support (Rust 1.85 or newer).
- A configured LLM provider API key.
### Build
```bash
cargo build
```
### Configure
PicoBot loads `~/.picobot/config.json` first, then falls back to `./config.json`. On gateway startup, a template is written to `~/.picobot/config.example.json` if that file does not already exist. The source template is [resources/templates/config.example.json](resources/templates/config.example.json).
Minimal example:
```json
{
  "providers": {
    "openai": {
      "type": "openai",
      "base_url": "https://api.openai.com/v1",
      "api_key": "<OPENAI_API_KEY>",
      "extra_headers": {}
    }
  },
  "models": {
    "gpt-4o": {
      "model_id": "gpt-4o",
      "temperature": 0.7,
      "max_tokens": 4096,
      "input_type": ["text", "image"]
    }
  },
  "agents": {
    "default": {
      "provider": "openai",
      "model": "gpt-4o",
      "max_tool_iterations": 99,
      "token_limit": 128000
    }
  },
  "workspace_dir": "~/.picobot/workspace"
}
```
PicoBot parses the `.env` file in the current directory itself. Placeholders such as `<OPENAI_API_KEY>` in the JSON config are replaced with values from the process environment after `.env` has been loaded.
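
A minimal sketch of this kind of placeholder substitution is shown below. `substitute_placeholders` is a hypothetical function, not PicoBot's implementation; it assumes placeholder names consist of uppercase ASCII letters, digits, and underscores, and it leaves unknown placeholders untouched. A `HashMap` stands in for the process environment so the example stays deterministic.

```rust
use std::collections::HashMap;

/// Returns true for names like `OPENAI_API_KEY` (assumed naming rule).
fn is_var_name(name: &str) -> bool {
    !name.is_empty()
        && name
            .chars()
            .all(|c| c.is_ascii_uppercase() || c.is_ascii_digit() || c == '_')
}

/// Replaces `<NAME>` with `env[NAME]`; anything else is copied through unchanged.
pub fn substitute_placeholders(input: &str, env: &HashMap<String, String>) -> String {
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while let Some(start) = rest.find('<') {
        out.push_str(&rest[..start]);
        let after = &rest[start + 1..];
        // Look for a closing '>' that delimits a known variable name.
        let replacement = after.find('>').and_then(|end| {
            let name = &after[..end];
            if is_var_name(name) {
                env.get(name).map(|value| (value, end))
            } else {
                None
            }
        });
        match replacement {
            Some((value, end)) => {
                out.push_str(value);
                rest = &after[end + 1..];
            }
            None => {
                // Not a known placeholder: keep the '<' and continue scanning.
                out.push('<');
                rest = after;
            }
        }
    }
    out.push_str(rest);
    out
}
```

The sketch inserts values verbatim, so a value containing quotes or backslashes would need JSON escaping before it is placed inside a JSON string.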
### Run
```bash
cargo run -- gateway
```
The gateway switches the process working directory to `workspace_dir` and stores `picobot.db` there by default.
In another terminal:
```bash
cargo run -- chat
```
The client connects to `ws://127.0.0.1:19876/ws` by default. Override with `--gateway-url`.
## Configuration
Top-level config fields:
| Field | Purpose |
|-------|---------|
| `providers` | Named LLM provider configs |
| `models` | Named model configs |
| `agents` | Agent-to-provider/model binding |
| `gateway` | Bind address, session DB path, cleanup, scheduler, background task limits |
| `client` | Default WebSocket URL for the TUI client |
| `channels` | Channel configs, currently Feishu/Lark |
| `memory` | Recall and consolidation settings |
| `mcp` | MCP server configs |
| `browser` | Optional WebDriver browser tool config |
| `workspace_dir` | Workspace used for file tools, shell commands, DB default, and workspace skills |
Important defaults:
| Key | Default |
|-----|---------|
| `gateway.host` | `127.0.0.1` |
| `gateway.port` | `19876` |
| `gateway.max_concurrent_background_tasks` | `10` |
| `gateway.scheduler.enabled` | `true` when the `scheduler` section is omitted |
| `client.gateway_url` | `ws://127.0.0.1:19876/ws` |
| `memory.recall_limit` | `5` |
| `memory.timeline_retention_days` | `90` |
| `mcp.tool_timeout_secs` | `180` |
| `browser.enabled` | `false` |
MCP servers support `stdio`, `sse`, and `streamable-http` transports. Browser automation requires a compatible Chrome/Chromium and chromedriver/WebDriver endpoint.
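
To show how the gateway and client defaults in the table above fit together, here is a small sketch using Rust's `Default` trait. `GatewayDefaults` and its methods are hypothetical and only mirror the documented values; the real config structs may be organized differently.

```rust
/// Hypothetical mirror of the documented gateway defaults.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct GatewayDefaults {
    pub host: String,
    pub port: u16,
    pub max_concurrent_background_tasks: usize,
    pub scheduler_enabled: bool,
}

impl Default for GatewayDefaults {
    fn default() -> Self {
        Self {
            host: "127.0.0.1".to_string(),
            port: 19876,
            max_concurrent_background_tasks: 10,
            scheduler_enabled: true,
        }
    }
}

impl GatewayDefaults {
    /// Address the gateway would bind to, as `host:port`.
    pub fn bind_addr(&self) -> String {
        format!("{}:{}", self.host, self.port)
    }

    /// WebSocket URL a client would use to reach the `/ws` endpoint.
    pub fn ws_url(&self) -> String {
        format!("ws://{}/ws", self.bind_addr())
    }
}
```

With the default values, `ws_url()` matches the documented `client.gateway_url` default, so changing `gateway.port` generally means changing `client.gateway_url` (or passing `--gateway-url`) as well.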
## Slash Commands
Available from CLI chat and channel text messages:
| Command | Description |
|---------|-------------|
| `/new` | Create a new dialog |
| `/sessions` | List recent dialogs |
| `/switch <dialog_id>` | Switch dialog |
| `/rename <title>` | Rename current dialog |
| `/delete` | Delete current dialog |
| `/compact` | Manually trigger context compression |
| `/info` | Show current dialog information |
| `/dump` | Save current dialog as Markdown |
| `/?`, `/help` | Show help |
| `/mcp` | Show MCP server and tool status |
| `/stop` | Stop active tasks and clear queued messages |
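
The command table maps naturally onto an enum plus a small parser. The sketch below is illustrative only: `SlashCommand` and `parse_slash_command` are hypothetical names, and the choices to ignore trailing text after argument-less commands and to reject `/switch` or `/rename` without an argument are assumptions.

```rust
/// Hypothetical enum covering the documented slash commands.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum SlashCommand {
    New,
    Sessions,
    Switch(String),
    Rename(String),
    Delete,
    Compact,
    Info,
    Dump,
    Help,
    Mcp,
    Stop,
}

/// Parses a chat line into a command; returns `None` for ordinary text
/// or for unknown or incomplete commands.
pub fn parse_slash_command(input: &str) -> Option<SlashCommand> {
    let input = input.trim();
    if !input.starts_with('/') {
        return None;
    }
    // Split the command name from an optional argument at the first whitespace.
    let (name, arg) = match input.split_once(char::is_whitespace) {
        Some((name, arg)) => (name, arg.trim()),
        None => (input, ""),
    };
    match name {
        "/new" => Some(SlashCommand::New),
        "/sessions" => Some(SlashCommand::Sessions),
        "/switch" if !arg.is_empty() => Some(SlashCommand::Switch(arg.to_string())),
        "/rename" if !arg.is_empty() => Some(SlashCommand::Rename(arg.to_string())),
        "/delete" => Some(SlashCommand::Delete),
        "/compact" => Some(SlashCommand::Compact),
        "/info" => Some(SlashCommand::Info),
        "/dump" => Some(SlashCommand::Dump),
        "/?" | "/help" => Some(SlashCommand::Help),
        "/mcp" => Some(SlashCommand::Mcp),
        "/stop" => Some(SlashCommand::Stop),
        _ => None,
    }
}
```

Lines that return `None` would be treated as regular user input in this sketch.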
## WebSocket API
The gateway exposes:
| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/health` | Returns service health and version |
| `GET` | `/ws` | WebSocket upgrade for chat clients |
Inbound WebSocket message types:
| Type | Main fields |
|------|-------------|
| `user_input` | `content`, optional `channel`, `chat_id`, `sender_id` |
| `clear_history` | optional `chat_id`, `session_id` |
| `create_session` | optional `title` |
| `list_sessions` | `include_archived` |
| `load_session` | `session_id` |
| `rename_session` | optional `session_id`, `title` |
| `archive_session` | optional `session_id` |
| `delete_session` | optional `session_id` |
| `get_slash_commands` | none |
| `ping` | none |
Outbound WebSocket message types include `assistant_response`, `error`, `session_established`, `session_created`, `session_list`, `session_loaded`, `session_renamed`, `session_archived`, `session_deleted`, `history_cleared`, `slash_commands_list`, `pong`, `command_executed`, and `system_notification`.
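
For orientation, the sketch below builds a `user_input` frame as a JSON string. It assumes frames are JSON objects tagged with a `type` field, which this README does not confirm; check the gateway source for the exact wire format. `json_escape` and `user_input_frame` are hypothetical helpers that avoid external crates.

```rust
/// Escapes a string for embedding inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Builds a `user_input` frame; `chat_id` is included only when provided.
/// The `type` tag field is an assumption of this sketch.
pub fn user_input_frame(content: &str, chat_id: Option<&str>) -> String {
    let mut frame = format!(
        r#"{{"type":"user_input","content":"{}""#,
        json_escape(content)
    );
    if let Some(id) = chat_id {
        frame.push_str(&format!(r#","chat_id":"{}""#, json_escape(id)));
    }
    frame.push('}');
    frame
}
```

In a real client a serialization crate would normally replace the hand-written escaping; it is spelled out here only to keep the example dependency-free.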
## Testing
```bash
# Unit tests
cargo test --lib
# Integration tests require real API keys in tests/test.env
cp tests/test.env.example tests/test.env
cargo test --test test_integration -- --ignored
cargo test --test test_tool_calling -- --ignored
cargo test --test test_request_format -- --ignored
```
Integration tests are ignored by default because they make real provider calls.
## Project Layout
```text
src/
  agent/          LLM loop, context compression, system prompts, media handling, sub-agents
  bus/            Inbound, outbound, and control message queues
  channels/       CLI chat and Feishu/Lark integrations
  client/         Ratatui terminal UI
  config/         Config loading, env substitution, path expansion
  gateway/        Axum HTTP/WebSocket server and GatewayState wiring
  mcp/            MCP client connections and tool wrappers
  memory/         Memory manager and memory types
  observability/  Agent/tool telemetry observer interfaces
  providers/      OpenAI-compatible and Anthropic clients
  scheduler/      Scheduled job runtime
  session/        Session lifecycle, dialog commands, persistence integration
  skills/         Skill loading and embedded built-in skill installation
  storage/        SQLite schema and CRUD
  tools/          Agent tool implementations
resources/
  skills/         Built-in skills embedded at build time
  templates/      Config, AGENTS.md, and USER.md templates released on first run
tests/            Unit and ignored integration tests
reference/        Third-party reference code; do not modify as project source
```
## Key Dependencies
| Crate | Purpose |
|-------|---------|
| `axum`, `tokio`, `tokio-tungstenite` | Gateway and WebSocket runtime |
| `sqlx` | SQLite persistence |
| `reqwest` | LLM and HTTP clients |
| `ratatui`, `crossterm`, `termimad` | Terminal UI |
| `rmcp` | MCP client support |
| `fantoccini` | Optional browser automation |
| `cron`, `chrono-tz` | Scheduling |
| `jieba-rs` | Chinese tokenization for memory search |
| `zstd`, `tar` | Embedded built-in skill packaging |