PicoBot/.qoder/skills/lark-drive/references/lark-drive-workflow-knowledge-organize-discovery.md
ooodc a7883dbed9 refactor(todo): 重构待办事项管理逻辑及更新状态规则
- 移除 TodoItem 中的 priority、created_at 和 updated_at 字段
- 强制每个任务都必须有唯一 id,且由用户负责生成
- 修改合并模式逻辑,merge=true 下保留未提及的旧任务
- 支持已完成和已取消任务重新激活(状态改回 pending 或 in_progress)
- 禁止 in_progress 状态退回到 pending,必须标记为 completed 或 cancelled
- 优化状态转换校验,允许特定状态间合法切换
- 简化任务变更消息,移除详细的新增/更新/移除统计
- 更新文档和示例,明确 id 必须由用户生成和使用
- 修复和补充测试,增强状态转换和合并模式验证
- 调整任务时间戳生成逻辑,统一使用当前时间及索引
- 该变更提供更合理的任务状态机械及管理模式,提升稳定性和易用性
2026-06-13 09:22:33 +08:00

19 KiB
Raw Blame History

知识整理工作流Discovery

Loaded by states: PARSE_SCOPE, INVENTORY.

This file owns target parsing, scope clarification, resource inventory, ResourceItem normalization, dedupe, and partial inventory handling. It MUST NOT generate classification rules, execution plans, or perform write operations.

Required Context

Before executing rules in this file:

  1. Follow ../../lark-shared/SKILL.md for identity, auth, and permission handling.
  2. For Wiki / personal library targets, follow ../../lark-wiki/SKILL.md.
  3. For Drive folder inventory, follow lark-drive-files-list.md.
  4. For Drive search targets, follow lark-drive-search.md.
  5. For URL / token inspection, follow lark-drive-inspect.md and ../../lark-wiki/references/lark-wiki-node-get.md.

State: PARSE_SCOPE

Entry: workflow triggered.

MUST:

  1. Identify target_scope, environment_profile, and identity.
  2. Apply Scope Parsing.
  3. Output Scope Confirmation.
  4. Stop and wait for user confirmation before INVENTORY.

Exit: user confirms target scope.

Scope Parsing

Condition Agent MUST Do Set target_scope Next State
Input is /wiki/<token> URL Resolve the Wiki node and preserve both node identity and object identity Wiki node INVENTORY after user confirms scope
Input is Wiki space name / space_id Resolve the Wiki space; 0 matches -> stop and ask; 1 exact match -> continue; multiple matches -> show candidates and wait for user selection; do not treat my_library as a normal listed space Wiki space INVENTORY after user confirms scope
Input has Personal Library Intent Treat as Wiki personal library / my_library; resolve real space_id before root write; do not treat it as Drive root or owned Drive document search Personal doc library INVENTORY after user confirms scope
Input is /drive/folder/<token> URL Extract folder_token Drive folder INVENTORY after user confirms scope
Input has Drive Folder Intent but no concrete folder URL, token, or unique folder name Ask for folder URL / token / name; if a concrete folder name exists, search folder candidates and wait for user selection when 0 or multiple matches exist Unknown or Drive folder candidate Stay in PARSE_SCOPE until scope is confirmed
Input has Broad Cloud Drive Intent without explicit owned-document search request Ask the user to choose concrete scope: Drive folder URL / token, Drive root, owned Drive document search, or another explicit search filter; do not default to drive +search --mine Unknown Stay in PARSE_SCOPE until scope is confirmed
Input is single cloud resource URL Resolve the resource type; if not folder / Wiki scope, do not expand automatically Single resource Ask whether scope is this resource, parent folder, owning Wiki, or related search results
Input is real keyword / name Search with the real keyword according to lark-drive-search Search scope INVENTORY after user confirms scope
Input is range browsing / statistical description with no real keyword Search by filters / empty-query browsing according to lark-drive-search Search scope INVENTORY after user confirms scope
Input is ambiguous Ask the minimum clarification question and stop Unknown Stay in PARSE_SCOPE

Personal Library Intent means the user is referring to the current user's own Feishu document library / personal document library / personal knowledge library, such as 个人文档库, 飞书个人文档库, 我的文档库, 个人知识库, 我的知识库, My Document Library, or my_library.

When this intent is detected, use Wiki personal library semantics. Do not use Drive root, drive +search --mine, or broad owned-document search unless the user explicitly asks to search owned Drive documents.

Drive Folder Intent means the user wants to organize a specific Drive folder or Drive folder tree. A Drive folder scope requires a concrete folder URL, folder token, or user-selected folder candidate.

When this intent is detected without a concrete folder identity, stop in PARSE_SCOPE and ask for clarification. Do not use Drive root, drive +search --mine, or broad owned-document search unless the user explicitly asks for Drive root or owned-document search.

Broad Cloud Drive Intent means the user refers to a broad cloud-drive-level scope such as 我的飞书云盘, 我的云盘, 我的云空间, 我的空间, or 整理云盘, without a concrete folder URL / token / unique folder name.

This intent is broader than Drive Folder Intent and MUST NOT be silently converted to owned-document search. Ask the user to choose one of:

  1. A specific Drive folder URL / token.
  2. Drive root, only when the user explicitly accepts root-level scope.
  3. Owned Drive document search, only when the user explicitly asks to organize documents owned / managed by the current user.
  4. Another explicit search filter, such as keyword, type, time range, or folder token.

Stop Conditions

Stop and ask for clarification when:

  1. 用户只说"整理文件夹"、"整理目录"、"整理资料"、"整理文档"、"我的文档",且没有 URL、token、知识库名称、Personal Library Intent、concrete Drive folder identity 或明确搜索范围。
  2. 用户说"我的文件夹"、"我的目录"、"我的空间"、"我的云盘"、"我的飞书云盘"、"我的云空间",但无法唯一判断是具体 Drive 文件夹、Drive 根目录、owned Drive document search、个人文档库还是某个 Wiki 节点。
  3. 用户给的是单个资源 URL但要求"整理一批文档"或"整理相关资料"。
  4. 用户目标环境不明确且上下文中同时存在线上、BOE、PRE 或多个 profile。

Clarification template:

请提供要整理的 Drive 文件夹链接、Wiki 节点 / 知识库链接,或明确说明要整理"我的文档库";如果只想按关键词搜索整理,也请给出关键词或范围。

Scope Confirmation

我先确认本次整理范围。

目标:
范围:
环境 / profile
身份:
预计操作:先盘点并生成整理方案,不执行移动或创建。

请确认是否按这个范围继续?

Scope confirmation is user-facing. It MUST confirm only the business scope, environment / profile, identity, and whether write operations will run.

Do not display internal batching controls in scope confirmation, including max_depth, max_items, page_size, page tokens, retry counts, or partial=true. For example, when the user confirms Drive root, say the scope is the Drive root tree; do not append "recursive depth at most 3" or "at most 500 resources".

State: INVENTORY

Entry: target_scope confirmed.

MUST:

  1. Recursively list resources according to target type.
  2. Generate path during traversal.
  3. Normalize all results to ResourceItem.
  4. Track pagination, depth, item limits, and continuation checkpoints.
  5. Treat pagination, depth, item, and per-folder page limits as batching checkpoints; continue inventory in the confirmed scope unless blocked.
  6. Set partial=true only when inventory cannot continue because of auth, permission, API / pagination failure after retries, API coverage limitations, tool budget, target scope, or environment blockers.
  7. Apply Inventory Progress Reporting.
  8. Output Inventory Summary.
  9. Do not leave INVENTORY while inventory_continuation_state has queued folders, nodes, pages, or slices that can still be fetched.
  10. Continue to CONTENT_READ without asking the user only after the confirmed scope is exhausted or blocked.

Inventory Batch Checkpoints

Scope Internal Batch Checkpoint Required Continuation
Wiki recursion max_depth=3, max_items=500; follow lark-wiki-node-list pagination Record queued nodes / paths in inventory_continuation_state and immediately continue the next internal batch within the confirmed scope unless blocked
Drive folder tree max_depth=3, max_items=500, max 10 pages per folder, page_size=200 Record queued folders / pages in inventory_continuation_state and immediately continue the next internal batch within the confirmed scope unless blocked
Search discovery page_size=20, max_items=500; continue pages until has_more=false Record remaining pages / slices in inventory_continuation_state and immediately continue the next internal batch within the confirmed scope unless blocked

These checkpoints are pacing controls, not coverage limits. If the confirmed scope still has queued work after a checkpoint, continue with the next internal batch instead of presenting the current resource_items as final inventory or moving to content analysis.

When a depth checkpoint is reached, enqueue the child folders / nodes that would exceed the current batch depth; the next batch starts from those queued children with their original paths preserved. When an item checkpoint is reached, persist the current folder / node / page cursor plus the remaining queue, visited page keys, and resource dedupe keys, then continue from that checkpoint before analysis or planning.

If tool budget would be exceeded for a very large confirmed scope, stop only at that blocker, report that the inventory is incomplete, and suggest batching by first-level directory, Wiki space, or time window. Do not stop merely because a depth or item checkpoint was reached.

Inventory Continuation Rules

  1. Pagination, depth, item, and per-folder page limits are internal batching checkpoints.
  2. When a checkpoint is reached, record inventory_continuation_state with scope, queue, current_cursor, visited_page_keys, dedupe_keys, and blockers; Drive queue entries MUST contain folder_token, path, depth, and page_token; Wiki queue entries MUST contain space_id / node_token, path, depth, and pagination cursor; search entries MUST contain query / filters and pagination cursor.
  3. A depth checkpoint MUST enqueue deeper folders / nodes; it MUST NOT discard them or treat the current depth as final coverage.
  4. An item-count checkpoint MUST persist the current cursor and queue; it MUST NOT transition to CONTENT_READ, ISSUE_ANALYSIS, or PLAN_GENERATION while fetchable work remains.
  5. If inventory_continuation_state is missing, corrupt, or lacks required fields for the current scope, set partial=true, record the checkpoint blocker, and do not claim full coverage.
  6. Do not set partial=true solely because a valid batching checkpoint was reached.
  7. Set partial=true only when continuation is blocked by auth, permission, API / pagination failure after retries, API coverage limitations, tool budget, target scope, or environment blockers.
  8. Do not claim full coverage until the continuation queue for the confirmed scope is exhausted or blocked.

Inventory Progress Reporting

Inventory can be long-running when a Drive root, large folder tree, Wiki space, or broad search scope is confirmed.

Rules:

  1. When inventory starts, output one concise stage notice with the confirmed scope type and the fact that no write operation will be executed.
  2. If inventory runs longer than about 60 seconds, output progress about every 60 seconds.
  3. Progress reports SHOULD include only fields that are currently known: scanned folders / nodes, collected resources, current depth, queued folders / nodes, current search page / slice, and current blocker if any.
  4. When a batching checkpoint is reached and continuation will proceed automatically, report it as continuing inventory, not as a user action request.
  5. Do not output filler such as "still running" without current counts or current stage.
  6. Do not expose raw folder tokens, page tokens, retry logs, or partial=true unless the user explicitly asks to view inventory coverage details.

Example:

盘点进度:已扫描 <scanned_container_count> 个目录 / 节点,收集 <resource_count> 项资源,队列剩余 <queued_container_count> 个目录 / 节点。继续盘点,不会执行移动或创建。

Wiki Inventory Rules

  1. Follow ../../lark-wiki/references/lark-wiki-node-list.md traversal semantics.
  2. Generate stable paths from parent-child traversal.
  3. Preserve Wiki node identity fields needed by ResourceItem.
  4. Treat my_library as Wiki personal library, not Drive root.

Drive Inventory Rules

  1. Use drive files list according to lark-drive-files-list.md; its schema path is drive.files.list.
  2. Use the same Drive folder-tree traversal for Drive root and ordinary folders after the first request. Drive root differs only for the first-level request: it uses omitted or empty folder_token, does not support pagination, and does not return root-level shortcuts according to schema; returned child folders MUST still be listed by their own folder tokens like ordinary folders, and those ordinary folder lists may return type=shortcut entries. For a Drive root target, record this root-level shortcut coverage caveat, set partial=true only if the user requested full root-level shortcut coverage or root pagination cannot continue, and do not claim root-level shortcut coverage as complete.
  3. Recurse only into folder items within the confirmed scope.
  4. For each directory, continue pages manually by feeding the returned next_page_token into request param page_token. Do not rely on --page-all for inventory.
  5. If a page returns has_more=true but no usable next_page_token, retry the same page request up to 3 times. If retries still cannot produce a continuation token, set partial=true for that directory and record the pagination blocker.
  6. Use drive metas batch_query when URL, owner, created time, or updated time is needed.
  7. Pagination blocker details such as partial=true, folder token, page token, and retry logs are internal by default. Do not show them to the user unless the user explicitly asks to view inventory coverage details.

Search Inventory Rules

  1. Search results may be normalized directly only when they include stable identity fields required by ResourceItem.
  2. If a search result is a Wiki item and lacks node_token, resolve it with drive +inspect or wiki +node-get before dedupe.
  3. If Wiki identity still cannot be resolved, keep the item, set needs_review=true, and record needs_review_reason.
  4. For search scope, use page_size=20 unless a lower value is required by the command.
  5. Continue fetching pages until has_more=false.
  6. If max_items=500 is reached in one batch, record the current search cursor in inventory_continuation_state and continue the next internal batch without asking the user.
  7. Do not stop at an arbitrary sample size such as first 5 pages unless the user explicitly asks for sampling or auth, permission, API, environment, or tool-budget blockers occur.
  8. If service_total / result total is greater than collected items, treat it as continuation evidence: continue fetching when a cursor / page is available; set partial=true only if continuation is blocked.
  9. Do not present a partial search sample as complete inventory. Before generating a full organization plan from partial search results, continue fetching available pages unless the user explicitly asked for sampling or a blocker prevents continuation.

ResourceItem

Agent MUST normalize Wiki, Drive, and search results into ResourceItem. Later statistics, classification, and planning MUST use this model rather than raw API responses.

{
  "source": "wiki|drive|search",
  "title": "资源标题",
  "type": "doc|docx|sheet|bitable|mindnote|file|wiki|folder|slides|shortcut|catalog",
  "path": "当前路径/资源标题",
  "depth": 2,
  "url": "https://...",
  "token": "canonical_token",
  "node_token": "wiki_node_token_or_empty",
  "obj_token": "wiki_obj_token_or_drive_file_token",
  "node_type": "origin|shortcut|empty",
  "origin_node_token": "wiki_origin_node_token_or_empty",
  "space_id": "wiki_space_id_or_empty",
  "parent_token": "parent_node_or_folder_token",
  "has_child": false,
  "dedupe_key": "wiki:<space_id>:<node_token>|drive:<type>:<token>|search:<type>:<token>",
  "created_at": "optional",
  "updated_at": "optional",
  "needs_review": false,
  "needs_review_reason": ""
}

ResourceItem rules:

  1. path MUST be generated by recursion. Do not use title alone as path.
  2. Wiki URL token may not be the underlying document token. Preserve both node_token and obj_token.
  3. type MUST come from API fields such as obj_type / doc_type.
  4. Wiki organization is by node instance. Prefer wiki:<space_id>:<node_token> as dedupe_key.
  5. MUST NOT dedupe Wiki nodes only by obj_token; one document can appear under different Wiki paths or shortcuts.
  6. If node_type=shortcut or dedupe is uncertain, use wiki +node-get to supplement origin_node_token; if unavailable, leave empty and set needs_review=true.
  7. Drive folder tree dedupes by drive:<type>:<token>.
  8. Search results may merge with recursive results only by exact identity: Wiki by same node_token, Drive by same type + token.

Inventory Summary

已完成当前可覆盖范围盘点。

<仅当适用覆盖说明Drive 根目录第一层清单不返回快捷方式;本次盘点不包含根目录第一层快捷方式。根目录下子文件夹会按普通文件夹继续盘点,普通文件夹内返回的 `type=shortcut` 条目仍会被纳入资源清单。>

| 指标 | 数量 |
|------|------|
| 总资源数 |  |
| 各类型资源数 |  |
| 一级目录数量 |  |
| 根目录直接资源数 |  |
| 空目录数量 |  |
| 疑似临时 / 测试 / 未整理资源数 |  |
| 低置信度待确认资源数 |  |

下一步将自动读取低置信度资源并分析整理问题;不会执行移动或创建。

Discovery Failure Handling

Failure / Blocker Agent MUST Do Agent MUST NOT Do
Target scope is ambiguous Ask the minimum scope clarification question and stop Do not choose a whole cloud drive / personal library by default
Environment / profile is ambiguous Ask user to confirm prod / BOE / PRE and profile Do not cross environment boundaries
Missing API scope Follow lark-shared permission handling and stop Do not retry the same command repeatedly
Resource access denied Stop and follow the main workflow Permission Request Gate Do not request permission automatically or in batch
Pagination / depth / item checkpoint reached Record inventory_continuation_state and continue inventory in the confirmed scope Do not set partial=true solely because a batching checkpoint was reached
Pagination cursor missing after retries / API pagination failure Set partial=true; record the affected directory and blocker Do not loop indefinitely or claim full coverage