# PicoBot

PicoBot is a Rust-based personal AI assistant runtime. It runs a local gateway, connects chat channels such as the terminal TUI and Feishu/Lark, persists sessions in SQLite, and gives the agent a tool system for files, shell commands, web access, memory, scheduling, skills, MCP tools, and delegated sub-agents.

## What It Does

- Runs as a gateway server on `127.0.0.1:19876` by default.
- Provides a Ratatui terminal client over WebSocket.
- Supports Feishu/Lark messages, reactions, file upload/download, and media references.
- Calls OpenAI-compatible providers and Anthropic Messages API providers.
- Persists conversations, messages, memories, scheduled jobs, LLM call metadata, and background sub-agent tasks in SQLite.
- Loads skills from workspace, user, and shared skill directories, with built-in skills installed on first use.
- Compresses long contexts and stores timeline summaries for later recall.
- Can register tools discovered from configured MCP servers.

## Architecture

```text
Channel -> MessageBus -> SessionManager -> AgentLoop -> LLM Provider
                               |               |
                               |               v
                               |             Tools
                               v
                            SQLite

Control messages -> SessionManager -> MessageBus -> OutboundDispatcher -> Channel
```

The main runtime boundaries are:

- `channels` only receive and send external messages.
- `bus` is an async queue, not a router.
- `session` owns dialog lifecycle, persistence, memory recall, prompt assembly, compression, and task cancellation.
- `agent` runs the stateless LLM/tool loop.
- `providers` are HTTP clients for model APIs.
- `tools` execute agent actions and return string results.
- `storage` owns SQLite schema and CRUD.
- `scheduler` polls due jobs and feeds prompts back into sessions.

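As a rough illustration of the `tools` boundary above (run an action, hand back a string), the following sketch pairs a small trait with a name-keyed registry. It is not taken from PicoBot's source: `Tool`, `EchoTool`, and `ToolRegistry` are hypothetical names, and the sketch is synchronous for brevity even though the real runtime uses `tokio`.

```rust
use std::collections::HashMap;

/// Hypothetical tool interface: takes raw arguments, returns a string result.
pub trait Tool {
    fn name(&self) -> &str;
    fn execute(&self, args: &str) -> String;
}

/// Toy tool used only for illustration.
pub struct EchoTool;

impl Tool for EchoTool {
    fn name(&self) -> &str {
        "echo"
    }

    fn execute(&self, args: &str) -> String {
        format!("echo: {args}")
    }
}

/// Hypothetical registry that an agent loop could consult for each tool call.
#[derive(Default)]
pub struct ToolRegistry {
    tools: HashMap<String, Box<dyn Tool>>,
}

impl ToolRegistry {
    /// Register a tool under the name it reports.
    pub fn register(&mut self, tool: Box<dyn Tool>) {
        self.tools.insert(tool.name().to_string(), tool);
    }

    /// Dispatch by name. Unknown tools also yield a string, so the caller
    /// always gets a result of the same shape.
    pub fn call(&self, name: &str, args: &str) -> String {
        match self.tools.get(name) {
            Some(tool) => tool.execute(args),
            None => format!("error: unknown tool `{name}`"),
        }
    }
}
```

Returning an error string for an unknown tool, rather than failing, keeps every result in the form the model already consumes. Whether PicoBot handles unknown tools this way is an assumption.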

## Features

### Channels

- `cli_chat`: terminal TUI client connected through `/ws`.
- `feishu`: Feishu/Lark channel with configurable allow list, media directory, and reaction emoji.

### LLM Providers

- OpenAI-compatible chat completions, including DashScope, Volcengine, and similar APIs.
- Anthropic Messages API.
- Model-specific `input_type` metadata for text/image capability checks.
- JSON Schema cleanup for cross-provider tool compatibility.

### Sessions And Memory

- Session IDs use `<channel>:<chat_id>:<dialog_id>`.
- Each channel/chat can have multiple dialogs.
- Dialog operations include create, list, switch, rename, delete, compact, dump, info, and stop.
- Session history is persisted to SQLite and can be incrementally restored after compression.
- Knowledge memories are recalled into the system prompt each turn.
- Timeline memories are produced by context compression and can be searched later.

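The `<channel>:<chat_id>:<dialog_id>` session ID format can be sketched as a small parse/format pair. This is a minimal sketch, not PicoBot's implementation: the `SessionId` type is hypothetical, and it assumes that only `chat_id` may itself contain colons, so the first and last fields are split off from the ends.

```rust
use std::fmt;

/// Hypothetical parsed form of a `<channel>:<chat_id>:<dialog_id>` session ID.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SessionId {
    pub channel: String,
    pub chat_id: String,
    pub dialog_id: String,
}

impl SessionId {
    /// Split off the first and last `:`-separated fields.
    /// Assumption: only `chat_id` may contain additional colons.
    pub fn parse(raw: &str) -> Option<Self> {
        let (channel, rest) = raw.split_once(':')?;
        let (chat_id, dialog_id) = rest.rsplit_once(':')?;
        if channel.is_empty() || chat_id.is_empty() || dialog_id.is_empty() {
            return None;
        }
        Some(Self {
            channel: channel.to_string(),
            chat_id: chat_id.to_string(),
            dialog_id: dialog_id.to_string(),
        })
    }
}

impl fmt::Display for SessionId {
    /// Format back into the colon-separated form.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}:{}:{}", self.channel, self.chat_id, self.dialog_id)
    }
}
```

For example, `SessionId::parse("feishu:oc_123:d1")` yields the channel `feishu`, chat `oc_123`, and dialog `d1` (made-up IDs), and `to_string()` reproduces the input.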

### Tools

Base tools registered for the agent:

| Tool | Purpose |
|------|---------|
| `calculator` | Math expressions and statistics |
| `file_read` / `file_write` / `file_edit` | Workspace file operations |
| `file_search` / `content_search` | File and content search |
| `bash` | Run shell commands in the workspace |
| `http_request` | HTTP API requests |
| `web_fetch` | Fetch and extract web page text |
| `get_skill` | List or load local skills |
| `memory_store` / `memory_recall` / `timeline_recall` / `memory_forget` | Long-term memory operations |
| `delegate` | Run inline, background, or parallel sub-agents |
| `send_message` | Send outbound messages to configured channels |
| `chat_manager` | Inspect sessions, channels, and stored messages |
| `cron_add/list/remove/enable/disable/update` | Manage scheduled jobs when scheduler is enabled |
| `browser` | Optional WebDriver browser automation when enabled |
| MCP tools | Dynamically registered from configured MCP servers |

### Skills

Skills are directories containing `SKILL.md`. Load priority is:

1. `{workspace}/skills`
2. `~/.picobot/skills`
3. `~/.agents/skills`

Same-name skills in higher-priority locations override lower-priority ones. Built-in skills from `resources/skills` are embedded into the binary and installed into `~/.picobot/skills` if missing.

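The override rule can be sketched as a scan over the skill roots in priority order, keeping the first directory found for each name. This is an illustrative sketch under assumptions, not the project's loader: `resolve_skills` is a hypothetical function, a skill's name is assumed to be its directory name, and missing roots are silently skipped.

```rust
use std::collections::BTreeMap;
use std::fs;
use std::path::PathBuf;

/// Resolve skills from `roots`, listed in priority order (highest first).
/// A skill is any direct subdirectory that contains a `SKILL.md` file.
pub fn resolve_skills(roots: &[PathBuf]) -> BTreeMap<String, PathBuf> {
    let mut found = BTreeMap::new();
    for root in roots {
        // Missing or unreadable roots are skipped rather than treated as errors.
        let Ok(entries) = fs::read_dir(root) else { continue };
        for entry in entries.flatten() {
            let dir = entry.path();
            if !dir.join("SKILL.md").is_file() {
                continue;
            }
            let Some(name) = dir.file_name().and_then(|n| n.to_str()) else { continue };
            // `or_insert` keeps the first (higher-priority) hit for a name.
            found.entry(name.to_string()).or_insert(dir.clone());
        }
    }
    found
}
```

Called with the three roots in the order listed above, a `demo` skill present in both `{workspace}/skills` and `~/.picobot/skills` would resolve to the workspace copy.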

## Quick Start

### Prerequisites

- Rust toolchain with edition 2024 support.
- A configured LLM provider API key.

### Build

```bash
cargo build
```

### Configure

PicoBot loads `~/.picobot/config.json` first, then falls back to `./config.json`. On gateway startup, a template is written to `~/.picobot/config.example.json` if it does not exist. The source template is [resources/templates/config.example.json](resources/templates/config.example.json).

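The lookup order can be sketched as "first existing candidate wins". This is a minimal sketch assuming the two paths named above are the only candidates; `find_config` is a hypothetical helper, and the home and working directories are passed in explicitly so the function is easy to test.

```rust
use std::path::{Path, PathBuf};

/// Return the first existing config file: `<home>/.picobot/config.json`,
/// then `<cwd>/config.json`. Returns `None` if neither exists.
pub fn find_config(home: &Path, cwd: &Path) -> Option<PathBuf> {
    vec![
        home.join(".picobot").join("config.json"),
        cwd.join("config.json"),
    ]
    .into_iter()
    .find(|candidate| candidate.is_file())
}
```

With both files present, the copy under `~/.picobot` is chosen; the file in the current directory is only used when the first one is absent.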

Minimal example:

```json
{
  "providers": {
    "openai": {
      "type": "openai",
      "base_url": "https://api.openai.com/v1",
      "api_key": "<OPENAI_API_KEY>",
      "extra_headers": {}
    }
  },
  "models": {
    "gpt-4o": {
      "model_id": "gpt-4o",
      "temperature": 0.7,
      "max_tokens": 4096,
      "input_type": ["text", "image"]
    }
  },
  "agents": {
    "default": {
      "provider": "openai",
      "model": "gpt-4o",
      "max_tool_iterations": 99,
      "token_limit": 128000
    }
  },
  "workspace_dir": "~/.picobot/workspace"
}
```

The `.env` file in the current directory is parsed by PicoBot itself. Values like `<OPENAI_API_KEY>` in JSON are replaced from the process environment after `.env` is loaded.

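The placeholder replacement step might look like the sketch below. It is an illustration under stated assumptions rather than PicoBot's code: `substitute_placeholders` is a hypothetical name, a placeholder is assumed to be `<NAME>` with `NAME` made of ASCII uppercase letters, digits, and underscores, unresolved placeholders are left untouched, and values are inserted verbatim without JSON escaping.

```rust
/// Replace `<NAME>` placeholders using `lookup`; unknown names are left as-is.
/// Assumption: NAME consists of ASCII uppercase letters, digits, and `_`.
pub fn substitute_placeholders(
    input: &str,
    lookup: impl Fn(&str) -> Option<String>,
) -> String {
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while let Some(start) = rest.find('<') {
        out.push_str(&rest[..start]);
        let after = &rest[start + 1..];
        // Length of the candidate name directly following `<`.
        let name_len = after
            .find(|c: char| !(c.is_ascii_uppercase() || c.is_ascii_digit() || c == '_'))
            .unwrap_or(after.len());
        let name = &after[..name_len];
        let closed = after[name_len..].starts_with('>');
        match (closed && !name.is_empty()).then(|| lookup(name)).flatten() {
            Some(value) => {
                // Resolved: emit the value and skip past the closing `>`.
                out.push_str(&value);
                rest = &after[name_len + 1..];
            }
            None => {
                // Not a resolvable placeholder: keep the `<` and continue after it.
                out.push('<');
                rest = after;
            }
        }
    }
    out.push_str(rest);
    out
}
```

To read from the process environment, pass `|name| std::env::var(name).ok()` as the lookup. Taking a closure instead of calling `std::env::var` directly keeps the function testable without mutating global state.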

### Run

```bash
cargo run -- gateway
```

The gateway switches the process working directory to `workspace_dir` and stores `picobot.db` there by default.

In another terminal:

```bash
cargo run -- chat
```

The client connects to `ws://127.0.0.1:19876/ws` by default. Override with `--gateway-url`.

## Configuration

Top-level config fields:

| Field | Purpose |
|-------|---------|
| `providers` | Named LLM provider configs |
| `models` | Named model configs |
| `agents` | Agent-to-provider/model binding |
| `gateway` | Bind address, session DB path, cleanup, scheduler, background task limits |
| `client` | Default WebSocket URL for the TUI client |
| `channels` | Channel configs, currently Feishu/Lark |
| `memory` | Recall and consolidation settings |
| `mcp` | MCP server configs |
| `browser` | Optional WebDriver browser tool config |
| `workspace_dir` | Workspace used for file tools, shell commands, DB default, and workspace skills |

Important defaults:

| Key | Default |
|-----|---------|
| `gateway.host` | `127.0.0.1` |
| `gateway.port` | `19876` |
| `gateway.max_concurrent_background_tasks` | `10` |
| `gateway.scheduler.enabled` | `true` when the `scheduler` section is omitted |
| `client.gateway_url` | `ws://127.0.0.1:19876/ws` |
| `memory.recall_limit` | `5` |
| `memory.timeline_retention_days` | `90` |
| `mcp.tool_timeout_secs` | `180` |
| `browser.enabled` | `false` |

MCP servers support `stdio`, `sse`, and `streamable-http` transports. Browser automation requires a compatible Chrome/Chromium and chromedriver/WebDriver endpoint.

## Slash Commands

Available from CLI chat and channel text messages:

| Command | Description |
|---------|-------------|
| `/new` | Create a new dialog |
| `/sessions` | List recent dialogs |
| `/switch <dialog_id>` | Switch dialog |
| `/rename <title>` | Rename current dialog |
| `/delete` | Delete current dialog |
| `/compact` | Manually trigger context compression |
| `/info` | Show current dialog information |
| `/dump` | Save current dialog as Markdown |
| `/?`, `/help` | Show help |
| `/mcp` | Show MCP server and tool status |
| `/stop` | Stop active tasks and clear queued messages |

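The command table maps naturally onto an enum plus a small parser. The sketch below is hypothetical (`SlashCommand` and `parse_slash` are not names from the source) and makes two assumptions: commands without a listed argument ignore trailing text, and `/switch` and `/rename` are rejected when their argument is missing.

```rust
/// Hypothetical parsed representation of the commands in the table above.
#[derive(Debug, PartialEq, Eq)]
pub enum SlashCommand {
    New,
    Sessions,
    Switch(String),
    Rename(String),
    Delete,
    Compact,
    Info,
    Dump,
    Help,
    Mcp,
    Stop,
}

/// Parse one line of input. Returns `None` for plain text, unknown commands,
/// or a missing required argument.
pub fn parse_slash(line: &str) -> Option<SlashCommand> {
    let line = line.trim();
    // Split the command word from the rest of the line at the first whitespace.
    let (cmd, arg) = match line.split_once(char::is_whitespace) {
        Some((cmd, arg)) => (cmd, arg.trim()),
        None => (line, ""),
    };
    let required = |arg: &str| (!arg.is_empty()).then(|| arg.to_string());
    Some(match cmd {
        "/new" => SlashCommand::New,
        "/sessions" => SlashCommand::Sessions,
        "/switch" => SlashCommand::Switch(required(arg)?),
        "/rename" => SlashCommand::Rename(required(arg)?),
        "/delete" => SlashCommand::Delete,
        "/compact" => SlashCommand::Compact,
        "/info" => SlashCommand::Info,
        "/dump" => SlashCommand::Dump,
        "/?" | "/help" => SlashCommand::Help,
        "/mcp" => SlashCommand::Mcp,
        "/stop" => SlashCommand::Stop,
        _ => return None,
    })
}
```

Returning `None` for anything that is not a recognized command lets the caller treat the line as ordinary user input.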

## WebSocket API

The gateway exposes:

| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/health` | Returns service health and version |
| `GET` | `/ws` | WebSocket upgrade for chat clients |

Inbound WebSocket message types:

| Type | Main fields |
|------|-------------|
| `user_input` | `content`, optional `channel`, `chat_id`, `sender_id` |
| `clear_history` | optional `chat_id`, `session_id` |
| `create_session` | optional `title` |
| `list_sessions` | `include_archived` |
| `load_session` | `session_id` |
| `rename_session` | optional `session_id`, `title` |
| `archive_session` | optional `session_id` |
| `delete_session` | optional `session_id` |
| `get_slash_commands` | none |
| `ping` | none |

Outbound WebSocket message types include `assistant_response`, `error`, `session_established`, `session_created`, `session_list`, `session_loaded`, `session_renamed`, `session_archived`, `session_deleted`, `history_cleared`, `slash_commands_list`, `pong`, `command_executed`, and `system_notification`.

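To make the inbound message shape more concrete, here is a hand-rolled sketch that builds a `user_input` frame as JSON text. It rests on an assumption that this README does not confirm: that the message kind travels in a `"type"` key alongside the listed fields. `json_escape` and `user_input_frame` are hypothetical helpers; a real client would more likely use a JSON library.

```rust
/// Escape a string for embedding in a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build a `user_input` frame with the required `content` field and an
/// optional `chat_id`. Assumption: the message kind is carried in a
/// `"type"` key; the table names the fields but not the tag key.
pub fn user_input_frame(content: &str, chat_id: Option<&str>) -> String {
    let mut json = format!(
        "{{\"type\":\"user_input\",\"content\":\"{}\"",
        json_escape(content)
    );
    if let Some(id) = chat_id {
        json.push_str(&format!(",\"chat_id\":\"{}\"", json_escape(id)));
    }
    json.push('}');
    json
}
```

Under that assumption, `user_input_frame("hi", None)` produces `{"type":"user_input","content":"hi"}`, which a client would send as a text frame over `/ws`.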

## Testing

```bash
# Unit tests
cargo test --lib

# Integration tests require real API keys in tests/test.env
cp tests/test.env.example tests/test.env
cargo test --test test_integration -- --ignored
cargo test --test test_tool_calling -- --ignored
cargo test --test test_request_format -- --ignored
```

Integration tests are ignored by default because they make real provider calls.

## Project Layout

```text
src/
  agent/          LLM loop, context compression, system prompts, media handling, sub-agents
  bus/            Inbound, outbound, and control message queues
  channels/       CLI chat and Feishu/Lark integrations
  client/         Ratatui terminal UI
  config/         Config loading, env substitution, path expansion
  gateway/        Axum HTTP/WebSocket server and GatewayState wiring
  mcp/            MCP client connections and tool wrappers
  memory/         Memory manager and memory types
  observability/  Agent/tool telemetry observer interfaces
  providers/      OpenAI-compatible and Anthropic clients
  scheduler/      Scheduled job runtime
  session/        Session lifecycle, dialog commands, persistence integration
  skills/         Skill loading and embedded built-in skill installation
  storage/        SQLite schema and CRUD
  tools/          Agent tool implementations
resources/
  skills/         Built-in skills embedded at build time
  templates/      Config, AGENTS.md, and USER.md templates released on first run
tests/            Unit and ignored integration tests
reference/        Third-party reference code; do not modify as project source
```

## Key Dependencies

| Crate | Purpose |
|-------|---------|
| `axum`, `tokio`, `tokio-tungstenite` | Gateway and WebSocket runtime |
| `sqlx` | SQLite persistence |
| `reqwest` | LLM and HTTP clients |
| `ratatui`, `crossterm`, `termimad` | Terminal UI |
| `rmcp` | MCP client support |
| `fantoccini` | Optional browser automation |
| `cron`, `chrono-tz` | Scheduling |
| `jieba-rs` | Chinese tokenization for memory search |
| `zstd`, `tar` | Embedded built-in skill packaging |