22 · Execution State Surfaces
§1 · TL;DR
§2 · Surface Map
| Surface | Example | Related examples | What it represents | Handling rule |
|---|---|---|---|---|
| Approval plan | Codex `PlanDelta` / Plan item | Claude Code Plan Mode / plans artifact | Proposal before execution | Do not store as memory or completed work |
| Execution todo | Codex `update_plan` | Claude Code `TodoWrite` / Hermes `todo` | Current turn or session focus | At most one `in_progress`; resume unfinished items only |
| Tool progress | Codex MCP `item/mcpToolCall/progress` | Claude SDK `tool_progress` / Hermes `tool_progress` | Runtime facts from tools | Great for UI and operator streams; too noisy for raw prompt injection |
| Task progress | Claude SDK `task_started` / `task_progress` | OpenClaw child-session relay / Hermes delegate callback | Background work, subagents, workflows | Needs owner, parent id, terminal states, and late-event handling |
| Status surface | Codex terminal-title task progress | Claude OSC 9;4 terminal indicator | Low-cost human feedback | Must be optional and should not enter the transcript |
| Resume surface | Claude away summary | Session restore / compaction summary | Helps users return to the task | One to three next-step sentences; not a commit recap or raw log replay |
§3 · Source-Grounded Implementation
Codex · prompt discipline, protocol events, terminal title
Codex separates two kinds of progress in its base instructions. The first is a natural-language update for long tasks: briefly tell the user what has been explored and what comes next. The second is `update_plan`: a renderable checklist with one active step. These are related, but not interchangeable.
Codex codex/codex-rs/protocol/src/prompts/base_instructions/default.md:52-60, 173-175, 267-275 — Codex defines both natural-language progress updates and update_plan checklist discipline.
Do not repeat the full contents of the plan after an `update_plan` call... ## Sharing progress updates... There should always be exactly one `in_progress` step

Codex also has a third progress channel: MCP tool progress. App-server v2 defines `McpToolCallProgressNotification` with `thread_id`, `turn_id`, `item_id`, and `message`, and maps it to `item/mcpToolCall/progress`. External MCP tools can report intermediate progress without waiting for the final tool result.
Codex codex/codex-rs/app-server-protocol/src/protocol/v2/mcp.rs:202-207 — MCP tool progress is a first-class app-server v2 notification.
`pub struct McpToolCallProgressNotification { pub thread_id: String, pub turn_id: String, pub item_id: String, pub message: String, }`

The TUI has another layer: `TaskProgress` is routed into the status line, preview item, and terminal title. That surface is for human awareness, not model context.
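The checklist rule quoted earlier in this section (exactly one `in_progress` step) can be checked mechanically. The sketch below is illustrative only: `PlanStep`, `StepStatus`, and `validatePlan` are hypothetical names, not Codex's plan schema, and the `pending` and `completed` status values are assumptions. It also assumes a fully completed plan may have no active step.

```typescript
// Hypothetical types for illustration; not Codex's actual plan schema.
type StepStatus = "pending" | "in_progress" | "completed";

interface PlanStep {
  step: string;
  status: StepStatus;
}

// Returns a list of violations; an empty list means the checklist is well-formed.
function validatePlan(steps: PlanStep[]): string[] {
  const problems: string[] = [];
  const active = steps.filter((s) => s.status === "in_progress").length;
  const unfinished = steps.filter((s) => s.status !== "completed").length;

  if (active > 1) {
    problems.push(`expected one in_progress step, found ${active}`);
  }
  // Assumption: a fully completed plan may have no active step.
  if (active === 0 && unfinished > 0) {
    problems.push("unfinished steps remain but none is in_progress");
  }
  return problems;
}

const planIssues = validatePlan([
  { step: "Read protocol types", status: "completed" },
  { step: "Add progress notification", status: "in_progress" },
  { step: "Update TUI status line", status: "pending" },
]);
```

Running the check before rendering or persisting a plan keeps a model-authored checklist from drifting into two active steps or none.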
Claude Code · SDK event taxonomy, terminal progress, away summary
Claude Code exposes a richer event taxonomy. The SDK schema includes `hook_progress`, `tool_progress`, `task_started`, and `task_progress`, which cover hook execution, tool execution, background task startup, usage, and summaries.
claude-code/src/entrypoints/sdk/coreSchemas.ts:1616-1657, 1715-1762 — Claude Code models hook, tool, and background task progress separately.
`subtype: z.literal('hook_progress')`
`type: z.literal('tool_progress')`
`subtype: z.literal('task_started')`
`subtype: z.literal('task_progress')`

The remote-session path treats these as background-task lifecycle messages: `task_started` registers remote work, `task_progress` can be skipped or folded by the UI, and `print.ts` flushes or drains those SDK events so background-agent progress is not lost behind final output.
The terminal layer is separate again. useTerminalNotification.ts reports progress through OSC 9;4, terminal.ts checks terminal compatibility, and supportedSettings.ts exposes a user setting. That state belongs to terminal chrome, not transcript state.
The away summary is a different surface. Once the terminal has been blurred for five minutes, and provided no turn is running and no summary exists since the last user turn, Claude Code generates one to three short sentences from the recent conversation. It uses a small fast model, `querySource: 'away_summary'`, and `skipCacheWrite: true`, which makes it a UI resume card rather than long-term memory.
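The three trigger conditions described above can be expressed as a single gate. This is a minimal sketch, not Claude Code's implementation: the `AwayState` shape, its field names, and `shouldGenerateAwaySummary` are hypothetical, and timestamps are assumed to be epoch milliseconds.

```typescript
// Hypothetical state shape; field names are illustrative, not Claude Code's.
interface AwayState {
  blurredAtMs: number | null; // when the terminal lost focus, or null if focused
  turnRunning: boolean; // true while the agent is mid-turn
  lastUserTurnAtMs: number; // timestamp of the most recent user turn
  lastSummaryAtMs: number | null; // timestamp of the latest away summary, if any
}

const AWAY_THRESHOLD_MS = 5 * 60 * 1000; // five minutes, as described in the text

function shouldGenerateAwaySummary(state: AwayState, nowMs: number): boolean {
  if (state.blurredAtMs === null) return false; // terminal is focused
  if (nowMs - state.blurredAtMs < AWAY_THRESHOLD_MS) return false; // not away long enough
  if (state.turnRunning) return false; // never summarize mid-turn
  // Skip if a summary already exists since the last user turn.
  if (state.lastSummaryAtMs !== null && state.lastSummaryAtMs >= state.lastUserTurnAtMs) {
    return false;
  }
  return true;
}
```

Keeping the gate a pure function of state and clock makes the "expires on the next user turn" rule easy to test: a new user turn moves `lastUserTurnAtMs` forward and re-arms the summary.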
OpenClaw · event projection and hidden boundaries
OpenClaw’s progress layer is an ACP runtime event stream. The translator emits `tool_call` with `in_progress` when a tool starts and `tool_call_update` when it completes or fails. The auto-reply projector treats tool-call updates as hidden boundary tags by default and can edit a previous tool summary instead of flooding chat channels.
Child-session relay is also an execution-state surface. A spawned child announces that progress will stream back to the parent session, then child snippets are emitted with a progress context prefix. This is fact propagation between sessions, not a model-maintained todo list.
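The edit-instead-of-append behavior described for the projector can be sketched as follows. This is an assumption-heavy illustration, not OpenClaw's code: `ToolEvent`, `ChatLine`, `projectToolEvents`, and the `completed` / `failed` state names are hypothetical, and "hidden" is modeled simply as producing no new chat line.

```typescript
// Hypothetical event and output shapes; not OpenClaw's actual types.
interface ToolEvent {
  kind: "tool_call" | "tool_call_update";
  toolCallId: string;
  title: string;
  state: "in_progress" | "completed" | "failed";
}

interface ChatLine {
  toolCallId: string;
  text: string;
}

// Projects a tool event stream onto chat lines. Updates never append a line:
// they are hidden by default, or rewrite the earlier summary when editInPlace is set.
function projectToolEvents(events: ToolEvent[], editInPlace: boolean): ChatLine[] {
  const lines: ChatLine[] = [];
  for (const ev of events) {
    if (ev.kind === "tool_call") {
      lines.push({ toolCallId: ev.toolCallId, text: `${ev.title}: ${ev.state}` });
      continue;
    }
    if (!editInPlace) continue; // hidden boundary: no visible output
    for (const line of lines) {
      if (line.toolCallId === ev.toolCallId) {
        line.text = `${ev.title}: ${ev.state}`;
      }
    }
  }
  return lines;
}

const sampleToolEvents: ToolEvent[] = [
  { kind: "tool_call", toolCallId: "t1", title: "read_file", state: "in_progress" },
  { kind: "tool_call", toolCallId: "t2", title: "run_tests", state: "in_progress" },
  { kind: "tool_call_update", toolCallId: "t1", title: "read_file", state: "completed" },
];
const editedLines = projectToolEvents(sampleToolEvents, true);
const hiddenLines = projectToolEvents(sampleToolEvents, false);
```

Either way the channel sees two lines for three events; the only difference is whether the first line reflects the final state.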
Hermes · display policy is per platform
Hermes turns `tool_progress` into a product setting. CLI modes cycle through `off`, `new`, `all`, and `verbose`; `new` suppresses repeated adjacent tool names, while `verbose` shows full arguments and debugging detail. The gateway `/verbose` command cycles the same setting per platform and writes it to `display.platforms.<platform>.tool_progress`.
Defaults differ by platform. CLI and API can tolerate more progress; chat platforms often need `new` or `off`; webhook defaults to quiet. The API server streams tagged `("__tool_progress__", payload)` tuples into SSE queues. Progress density is a channel policy, not a global default.
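The four display modes described in this section can be sketched as a small filter. This is not Hermes code: `ProgressMode`, `nextMode`, `renderProgress`, and `ToolCallInfo` are hypothetical names, the wrap-around from `verbose` back to `off` is an assumption, and the `verbose` output format is illustrative.

```typescript
type ProgressMode = "off" | "new" | "all" | "verbose";

// Cycles off -> new -> all -> verbose; wrapping back to off is an assumption.
function nextMode(mode: ProgressMode): ProgressMode {
  switch (mode) {
    case "off":
      return "new";
    case "new":
      return "all";
    case "all":
      return "verbose";
    case "verbose":
      return "off";
  }
}

// Hypothetical shape for one tool invocation.
interface ToolCallInfo {
  tool: string;
  args: string;
}

function renderProgress(calls: ToolCallInfo[], mode: ProgressMode): string[] {
  const out: string[] = [];
  let previousTool = "";
  for (const call of calls) {
    const repeated = call.tool === previousTool;
    previousTool = call.tool;
    if (mode === "off") continue;
    if (mode === "new" && repeated) continue; // suppress adjacent repeats only
    out.push(mode === "verbose" ? `${call.tool} ${call.args}` : call.tool);
  }
  return out;
}

const sampleCalls: ToolCallInfo[] = [
  { tool: "search", args: "q=a" },
  { tool: "search", args: "q=b" },
  { tool: "read", args: "file=c" },
  { tool: "search", args: "q=d" },
];
const newModeLines = renderProgress(sampleCalls, "new");
```

With the sample input, `new` mode keeps the first `search`, drops the adjacent repeat, and shows `search` again after `read`, because only adjacent repeats are suppressed.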
§4 · Shared Principles
Every progress event needs a source. Model-authored todo, runtime tool fact, background task lifecycle, terminal chrome, and resume summary are not the same kind of truth.
Every surface needs an audience. Model context wants short and stable state; users want paced updates; operators can handle more detail; logs need completeness; terminal titles need very small strings.
Every surface needs a lifetime. Approval plans expire after approval; todos expire after completion; tool progress belongs in event logs; task progress may persist across processes; away summaries expire on the next user turn.
Progress display needs dedupe and no-progress detection. Hermes has its `new` mode, OpenClaw can edit tool updates, and OpenClaw also detects tool loops and child sessions with no output. Without that layer, a progress UI can simply make a stuck system look busy.
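A no-progress detector of the kind described above can be as small as a check over the most recent tool observations. The sketch below is a generic illustration, not OpenClaw's detector: `ToolObservation`, `detectNoProgress`, and the threshold of five are hypothetical choices.

```typescript
// Hypothetical record of one tool invocation and its outcome.
interface ToolObservation {
  tool: string;
  args: string;
  result: string;
}

// Returns true when the last `threshold` observations are identical in tool,
// arguments, and result: the system is busy but nothing is changing.
function detectNoProgress(history: ToolObservation[], threshold: number): boolean {
  if (threshold < 2 || history.length < threshold) return false;
  const recent = history.slice(history.length - threshold);
  const first = recent[0];
  return recent.every(
    (o) => o.tool === first.tool && o.args === first.args && o.result === first.result,
  );
}

// Simulate 20 identical polling events.
const pollingHistory: ToolObservation[] = [];
for (let i = 0; i < 20; i++) {
  pollingHistory.push({ tool: "poll_job", args: "job=42", result: "pending" });
}
const stalled = detectNoProgress(pollingHistory, 5);
```

A UI would pair this with repeat folding: collapse the identical events into one line and surface a stall warning once the detector fires.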
§5 · Key Differences
Plan-first
- Strengths
  - Good for high-risk approvals
  - User can review before execution
  - Proposal can stream
- Weaknesses
  - Not real execution progress
  - Must expire after approval
  - Easy to confuse with checklists
Todo-first
- Strengths
  - Best for single-agent coding CLIs
  - Clear model focus
  - Simple compaction recovery
- Weaknesses
  - Weak tool-level detail
  - Weak background-task semantics
  - Can be marked done too early
Event-first
- Strengths
  - Grounded in tool and subagent facts
  - Excellent for operator UI
  - Naturally captures failure
- Weaknesses
  - Does not always express remaining semantic work
  - Needs aggregation and dedupe
  - Raw events are too noisy for prompts
Task-board-first
- Strengths
  - Works for IDEs, teams, and background work
  - Owner/blocker/claim is auditable
  - Strong cross-process recovery
- Weaknesses
  - High complexity
  - Needs locks and migrations
  - Overkill for current-focus todo
Status-card-first
- Strengths
  - Good user experience
  - Keeps transcript clean
  - Works for terminal and resume flows
- Weaknesses
  - Not a source of truth
  - Cannot carry recovery alone
  - Can summarize incorrectly
§6 · Review
| System | Value | Reason | Risk |
|---|---|---|---|
| Codex | Cleanest tool boundary | Natural-language progress, update_plan, PlanDelta, MCP progress, and TUI status are separate surfaces. | The name update_plan still invites confusion with Plan Mode. |
| Claude Code | Richest UX state model | SDK events, Tasks, TodoWrite, OSC 9;4, and away summary cover IDE, CLI, SDK, and return-to-task UX. | Documentation must explain which events are for model context and which are UI-only. |
| OpenClaw | Best multi-channel projection | ACP projector, hidden boundaries, child relay, and no-progress detectors are built for operator surfaces. | Without a unified model checklist, remaining semantic work depends on summaries. |
| Hermes | Most practical display density control | Per-platform off/new/all/verbose progress modes fit chat platforms well. | Display policy alone does not preserve long-running task focus. |
§7 · Build Recipe
Execution State Router
Minimum viable
- Define a common event envelope: `source`, `surface`, `audience`, `lifetime`, `thread_id`, `turn_id`, `item_id`, `message`.
- Separate approval plan, execution todo, and tool progress from day one.
- Write a UI projection policy: show, hide, edit, log-only, or model-context summary.
- Inject only unfinished execution todos into resume / compaction context.
- Keep natural-language progress updates for long work, but do not treat them as durable state.
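The envelope and projection policy from the list above could look like the sketch below. The field names follow the envelope listed in the first bullet; everything else is an assumption made for illustration: the `Surface`, `Audience`, and `Projection` value sets, the `PROJECTION_POLICY` mapping, and the `projectEnvelope` helper.

```typescript
// Value sets are illustrative assumptions; only the field names come from the recipe.
type Surface = "approval_plan" | "execution_todo" | "tool_progress";
type Audience = "model" | "user" | "operator" | "log";
type Projection = "show" | "hide" | "edit" | "log_only" | "model_context_summary";

interface StateEnvelope {
  source: string; // e.g. "model", "runtime", "tool" (illustrative values)
  surface: Surface;
  audience: Audience;
  lifetime: string; // e.g. "turn", "session" (illustrative values)
  thread_id: string;
  turn_id: string;
  item_id: string;
  message: string;
}

// Illustrative policy: one default projection per surface.
const PROJECTION_POLICY: Record<Surface, Projection> = {
  approval_plan: "show",
  execution_todo: "model_context_summary",
  tool_progress: "edit",
};

function projectEnvelope(envelope: StateEnvelope): Projection {
  // Anything addressed to the log is never rendered or injected.
  if (envelope.audience === "log") return "log_only";
  return PROJECTION_POLICY[envelope.surface];
}

const sampleProjection = projectEnvelope({
  source: "tool",
  surface: "tool_progress",
  audience: "user",
  lifetime: "turn",
  thread_id: "th_1",
  turn_id: "tu_1",
  item_id: "it_1",
  message: "indexing 40%",
});
```

Keeping the policy in a table rather than in scattered conditionals makes it reviewable: adding a surface forces an explicit decision about where its events may appear.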
Advanced
- Add MCP progress notifications so external tools can report intermediate state.
- Add background task events: `task_started`, `task_progress`, `task_completed`, `task_failed`.
- Add terminal/status surfaces such as title progress, status bar, or OSC 9;4, with a user setting.
- Add away/resume summaries that state the high-level task and next step, not raw log recaps.
- Add no-progress detectors for repeated polling, unchanged tool results, and silent child sessions.
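The background task events in the list above imply a small lifecycle with terminal states and late-event handling. The following is a minimal sketch under stated assumptions: the event names come from the list, while `TaskEvent`, `TaskRecord`, `applyTaskEvent`, and the policy of counting (rather than applying) late events are hypothetical.

```typescript
type TaskEventKind = "task_started" | "task_progress" | "task_completed" | "task_failed";
type TaskState = "running" | "completed" | "failed";

// Hypothetical shapes; parent_id links a child task to the session that spawned it.
interface TaskEvent {
  kind: TaskEventKind;
  task_id: string;
  parent_id: string;
  message: string;
}

interface TaskRecord {
  task_id: string;
  parent_id: string;
  state: TaskState;
  lastMessage: string;
  lateEvents: number; // events that arrived after a terminal state
}

function applyTaskEvent(tasks: Record<string, TaskRecord | undefined>, ev: TaskEvent): void {
  const existing = tasks[ev.task_id];
  if (existing === undefined) {
    // Only task_started may create a record; orphan events are ignored.
    if (ev.kind === "task_started") {
      tasks[ev.task_id] = {
        task_id: ev.task_id,
        parent_id: ev.parent_id,
        state: "running",
        lastMessage: ev.message,
        lateEvents: 0,
      };
    }
    return;
  }
  if (existing.state !== "running") {
    existing.lateEvents += 1; // terminal states are final; record the late event only
    return;
  }
  if (ev.kind === "task_completed") existing.state = "completed";
  else if (ev.kind === "task_failed") existing.state = "failed";
  existing.lastMessage = ev.message;
}
```

Counting late events instead of applying them keeps a finished parent answer stable while still leaving an audit trail for operators.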
Avoid at first
- Do not merge plans, todos, tasks, and tool events into one global table.
- Do not inject raw tool progress into the model prompt.
- Do not default chat platforms to verbose tool-argument output.
- Do not write away summaries into long-term memory.
- Do not show progress without detecting stalls.
§8 · Architecture Diagram
§9 · Source Index
§10 · Exercises
- Draw six state surfaces for an existing agent: approval plan, execution todo, tool progress, task progress, terminal/status, resume summary.
- Write a projection policy table: visible to user, log-only, model-context summary, expiration rule.
- Simulate 20 repeated polling events and verify the UI collapses repeats and emits a no-progress warning.
- Simulate a user returning after five minutes away and generate a one-to-three sentence away summary.
- Add progress notification support to one external MCP tool and ensure the model sees only a summary, not raw progress spam.
§11 · Interview Drill: 10 Questions With Worked Answers
Q1 · Concept: How is an execution-state surface different from a todo list?
A todo list is one execution-state surface: the current execution focus. Execution-state surfaces also include approval plans, tool progress, background task progress, MCP progress, terminal status, and away summaries.
The distinction is source and audience. A todo is usually model-maintained and may re-enter context. Tool progress comes from runtime facts and is primarily for UI or operators. An away summary helps a returning user; it is not the agent’s task truth.
Q2 · Design: Why not merge Plan, Todo, and Task into one table?
They have different lifetimes. A plan is a pre-execution proposal. A todo is current execution focus. A task is durable background or team work with owners, blockers, locks, and retries.
Merging them causes predictable bugs: unapproved proposals look like progress, completed todos re-enter context, or simple CLIs inherit task-board complexity they do not need.
Q3 · Protocol: How does MCP progress differ from a tool result?
The tool result is terminal output. MCP progress is an intermediate notification. Codex app-server v2 gives MCP progress a dedicated `McpToolCallProgressNotification` and the `item/mcpToolCall/progress` event name.
Slow tools can report indexing, downloading, or querying while running. The model usually needs the final result or summary, not every raw progress message.
Q4 · UX: Why should terminal progress not enter the transcript?
Terminal titles, status bars, and OSC 9;4 are chrome. They reduce waiting uncertainty for humans but are not semantic conversation content.
If they enter the transcript, compaction and resume can mistake transient UI state for task meaning.
Q5 · Resume: How is away summary different from compaction summary?
Away summary is for the user returning to the terminal: one to three sentences about the high-level task and next step. Compaction summary is for the model: unfinished tasks, constraints, tool results, and current intent.
Claude Code uses a small fast model, recent messages, querySource: 'away_summary', and skipCacheWrite, which makes it a UI resume card rather than long-term memory.
Q6 · Operator: Why does a multi-channel agent need per-platform progress settings?
Terminals and APIs can tolerate more progress detail. Chat channels get noisy fast. Webhooks often need final-only behavior. Hermes handles this with off/new/all/verbose and per-platform defaults.
One global default either starves CLI users of useful state or floods chat users with tool spam.
Q7 · Subagents: Should subagent progress be Todo or Task Progress?
The child agent lifecycle should be task progress: started, progress, completed, failed, cancelled. The parent agent’s remaining work can still have a todo item like “wait for review subagent”.
Keep the two layers separate so child tool events do not pollute parent todos and late child completions do not disturb a finished parent answer.
Q8 · Failure: Why is showing progress not enough?
A progress UI can faithfully show that the system is stuck: same tool, same args, same result, or a silent child session.
Progress surfaces need no-progress detection: repeat folding, no-output timers, unchanged-result circuit breakers, and polling-loop warnings.
Q9 · Prompt: Which state should re-enter model context?
The state worth re-entering model context is unfinished execution todos, terminal task results, summarized tool facts, and the latest user intent. Raw tool progress, terminal chrome, completed todos, and unapproved plans should stay out.
The prompt should carry stable continuation state, not UI noise.
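The answer above amounts to a filter over state items. A minimal sketch, assuming a hypothetical `StateItem` shape and `StateKind` labels that are not taken from any of the four systems; approval plans are excluded unconditionally here, on the assumption that an approved plan has already expired:

```typescript
// Hypothetical labels for the kinds of state discussed in the answer.
type StateKind =
  | "execution_todo"
  | "task_result"
  | "tool_summary"
  | "user_intent"
  | "tool_progress"
  | "terminal_status"
  | "approval_plan";

interface StateItem {
  kind: StateKind;
  text: string;
  completed: boolean; // only meaningful for execution_todo
}

function reentersModelContext(item: StateItem): boolean {
  switch (item.kind) {
    case "execution_todo":
      return !item.completed; // unfinished todos only
    case "task_result":
    case "tool_summary":
    case "user_intent":
      return true;
    case "tool_progress":
    case "terminal_status":
    case "approval_plan":
      return false; // UI noise or expired proposals stay out
  }
}

function buildResumeContext(items: StateItem[]): string[] {
  return items.filter(reentersModelContext).map((item) => item.text);
}

const resumeContext = buildResumeContext([
  { kind: "execution_todo", text: "Wire progress into the status line", completed: false },
  { kind: "execution_todo", text: "Read protocol types", completed: true },
  { kind: "tool_progress", text: "indexing 40%", completed: false },
  { kind: "task_result", text: "Review subagent: 2 issues found", completed: false },
  { kind: "terminal_status", text: "Working...", completed: false },
  { kind: "user_intent", text: "Ship the progress router", completed: false },
]);
```

With the sample input, only the unfinished todo, the task result, and the user intent survive, in their original order.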
Q10 · Implementation: What fields does a minimal execution-state router need?
Start with source, surface, audience, lifetime, thread_id, turn_id, item_id, message, status, and optional parent_id.
That is enough for UI projection, resume filtering, source-based audit, and parent/child task aggregation.