Skip to content

21 · Todo List: Task Checklist and Progress Surface

Four todo and progress surfaces: Codex update_plan, Claude Code TodoWrite and Tasks, OpenClaw progress events, Hermes todo tool
A todo list is useful when progress becomes structured state, not just text in a reply.
Dimension CodexClaude CodeOpenClawHermes
Core abstraction `UpdatePlanArgs`: optional `explanation` plus `plan[]`, each item has `step` and `status``TodoWrite` TodoItem; Tasks V2 turns this into file-backed `TaskSchema`ACP `tool_call` / `tool_call_update` events plus parent-stream progress relay`TodoStore`: session-local in-memory task list exposed through the `todo` tool
Status set `pending` / `in_progress` / `completed``pending` / `in_progress` / `completed`; Tasks V2 adds owner, blockers, metadata`in_progress` / `completed` / `failed` from runtime events`pending` / `in_progress` / `completed` / `cancelled`
Write model The model submits the current checklist through `update_plan``TodoWrite` writes the whole session list; Tasks V2 uses create/update task toolsRuntime emits events as tools and child agents actually run`merge=false` replaces the list; `merge=true` updates by id
Resume path Events flow to TUI / app-server notifications, not a durable plan fileTodoWrite can be restored from transcript; Tasks V2 persists JSON filesSession events, parent relay, and compaction summaries preserve active task stateHydrates from latest todo tool response; compaction injects unfinished items only
Plan relation `update_plan` is rejected in Plan Mode`plans.ts` handles plans; TodoWrite / Tasks handle execution progressPlan and progress are projected through runtime events, not one model-owned tableNo separate approval plan; `todo` is just execution focus
Main risk Too coarse for owner/blocker/team workflowsPowerful but conceptually split across Plan Mode, TodoWrite, and Tasks V2No unified model-owned checklist for remaining semantic workIn-memory first; depends on history and compaction for continuity
The same word todo can mean a model checklist, an IDE task board, a runtime event stream, or compaction recovery focus.

Codex keeps the protocol intentionally small:

Codex codex/codex-rs/protocol/src/plan_tool.rs:9-28 — The checklist schema has three statuses and simple step/status items.
pub enum StepStatus {
Pending,
InProgress,
Completed,
}
pub struct PlanItemArg {
pub step: String,
pub status: StepStatus,
}

The important part is the boundary in the handler: if the turn is in Plan Mode, update_plan fails because it is a TODO/checklist tool.

Codex codex/codex-rs/core/src/tools/handlers/plan.rs:75-89 — Codex rejects update_plan in Plan Mode and otherwise emits PlanUpdate.
if turn.collaboration_mode.mode == ModeKind::Plan {
return Err(FunctionCallError::RespondToModel(
"update_plan is a TODO/checklist tool and is not allowed in Plan mode".to_string(),
));
}
session.send_event(turn.as_ref(), EventMsg::PlanUpdate(args)).await;

The TUI then counts completed / total, updates status surfaces, and renders an Updated Plan history cell. That makes progress a protocol event and UI state, not only model narration.

Codex has another easy-to-miss path: proposed plans emitted in Plan Mode are parsed into PlanDelta and streamed through the app-server v2 item/plan/delta notification, while successful update_plan calls become turn/plan/updated. That is two protocol channels: streaming proposal text and tool-submitted execution checklist.

Claude Code · TodoWrite for session focus, Tasks V2 for durable work

Section titled “Claude Code · TodoWrite for session focus, Tasks V2 for durable work”

Claude Code makes the behavior explicit in the TodoWrite prompt: use it for complex multi-step work, mark a task in_progress before starting it, mark it completed only when fully done, and provide both imperative content and present-continuous activeForm.

claude-code/src/tools/TodoWriteTool/prompt.ts:147-166 — TodoWrite defines the states, activeForm, single in_progress rule, and completion standard.
- pending: Task not yet started
- in_progress: Currently working on (limit to ONE task at a time)
- completed: Task finished successfully
- activeForm: The present continuous form shown during execution
- ONLY mark a task as completed when you have FULLY accomplished it

The legacy TodoWrite path stores todos in AppState by agentId or sessionId, clears the list when all tasks are done, and can nudge verification when many tasks are closed without a verification item. Tasks V2 is a different layer: file-backed tasks with ids, subjects, descriptions, owner fields, blockers, claim checks, and metadata. That belongs to IDE and team workflows rather than the minimal current-focus checklist.

Tasks V2 is not just a bigger TodoWrite. TaskUpdateTool is enabled only when Todo V2 is enabled, and its schema can change status, owner, activeForm, blocks, blockedBy, and metadata. attachments.ts also maintains separate stale-update reminders for TodoWrite and for TaskCreate / TaskUpdate. Session focus and team task boards have different reminder loops.

Hermes · tiny state, compaction-aware injection

Section titled “Hermes · tiny state, compaction-aware injection”

Hermes keeps a single in-memory TodoStore per agent/session. It supports replace and merge writes, adds a cancelled state, and exposes everything through one todo tool. Its best idea is compaction behavior: only unfinished items return to active context.

Hermes hermes-agent/tools/todo_tool.py:90-118 — Hermes only injects pending and in_progress todos after compaction.
visible = [
item for item in self._items
if item["status"] in ("pending", "in_progress")
]
if not visible:
return None

The runtime can hydrate the store from the latest todo tool response in history. That keeps todo state session-scoped without promoting it into long-term memory.

Hermes also adds runtime progress through its ACP adapter. make_tool_progress_cb() maps tool.started to ACP ToolCallStart and tracks duplicate same-name tool calls with a FIFO queue; make_step_cb() consumes completed tools from prev_tools and emits completion updates. This is tool-fact progress, not model-authored todo state.

OpenClaw · runtime progress events instead of a model-owned checklist

Section titled “OpenClaw · runtime progress events instead of a model-owned checklist”

OpenClaw focuses on progress projection. Its ACP translator emits tool_call with in_progress when a tool starts, and tool_call_update with completed or failed when a tool ends. The auto-reply projector turns those events into visible tool summaries. The parent-stream relay forwards child-agent progress back to the parent session.

This is the right choice for a multi-channel operator surface: the most reliable progress source is often the runtime event stream, not a model-authored checklist.

OpenClaw also shows the guardrail this design needs: no-progress detection. tool-loop-detection.ts distinguishes unchanged polling loops, ping-pong loops, and global repeat circuit breakers; the parent-stream relay has a no-output watcher that emits a stall notice when a child session stops producing output. Without those detectors, a progress UI can faithfully display that the system is stuck.

The shared move is to pull progress out of free-form prose. Codex uses protocol events, Claude Code uses tool state and task files, Hermes uses tool JSON, and OpenClaw uses runtime events. A final answer that says what the agent will do next is not task management because UI, resume, reminders, and compaction cannot reliably consume it.

The state machine stays small. pending / in_progress / completed is enough for the current execution surface. Add failed, cancelled, owner, or blockers only when the runtime really needs them.

There must be one current focus. Codex requires at most one in_progress; Claude Code requires exactly one; Hermes also says only one. Without that rule, a todo list becomes a wishlist and recovery loses the next action.

Completion should lower attention. Claude Code only completes fully accomplished items, Codex crosses completed items out, Hermes does not re-inject completed items after compaction, and TodoWrite clears all-done lists.

Four implementation paths

Model-owned checklist

  • Best for single-agent coding CLIs
  • The model knows the next focus
  • Easy to render in UI
  • Can be marked complete too early
  • No owner/blocker semantics
  • Needs reminders and resume

File-backed tasks

  • Works across IDE windows and agents
  • Supports owner and blockers
  • Auditable after restart
  • Requires locks and migrations
  • More complex than current-focus todo
  • Easy to overbuild

Runtime event stream

  • Grounded in tool and child-agent facts
  • Excellent for operator UI
  • Naturally captures failure states
  • Does not express remaining semantic work alone
  • Needs summary continuity
  • Needs aggregation and dedupe

Compaction injection

  • Great for long-context agents
  • Avoids repeating completed work
  • Cheap to implement
  • Depends on history and compression
  • Weak audit story
  • IDs depend on model quality
Choose based on whether your product is a CLI, IDE, operator runtime, or long-context agent.

The core split is three layers: approval plan, execution todo, and durable task. Plans are for user approval. Todos are for current execution focus. Durable tasks are for background or team work with owners, blockers, locks, and retries.

SystemValueReasonRisk
CodexCleanest minimumSmall schema, explicit Plan Mode rejection, and evented UI rendering make it ideal for coding CLI progress.Too coarse for team tasks or durable background work.
Claude CodeRichest IDE surfaceTodoWrite adds discipline, activeForm, reminders, restore, and verification nudges; Tasks V2 adds durable task semantics.Plan Mode, TodoWrite, and Tasks V2 must be explained as separate layers.
OpenClawBest operator projectionTool calls, tool updates, and child-agent progress are projected as runtime events across channels.No single model-owned checklist for remaining semantic tasks.
HermesLightest compaction recoveryA tiny TodoStore plus unfinished-only injection solves many repeat-work failures.In-memory first; continuity depends on history and compression.
Copy Codex for a CLI, Claude Code for IDE/team workflows, OpenClaw for operator progress, and Hermes for long-context recovery.

Todo List / Progress Surface

最小可行

  • Define a small schema: `id?`, `content`, `status`, optional `activeForm`; start with `pending` / `in_progress` / `completed`.
  • Expose an `update_todo` or `update_plan` tool and write updates to both session state and the event stream.
  • Enforce at most one `in_progress` item.
  • Render the checklist in UI and show completed / total in status surfaces.
  • After compaction, inject only pending and in_progress items.

进阶

  • Restore from transcript or the latest todo tool result.
  • Add reminders when complex work goes many assistant turns without a todo update.
  • Add verification nudges before closing many tasks.
  • Introduce file-backed tasks only when owner/blocker/cross-process behavior is required.
  • Project runtime tool start/update/terminal events separately.
  • Add no-progress detectors for unchanged polling, ping-pong loops, or child sessions with no output.

一开始别做

  • Do not merge Plan Mode and execution todo.
  • Do not store current todos as long-term memory.
  • Do not rely on final-answer prose as the progress source.
  • Do not allow multiple top-level `in_progress` tasks.
  • Do not re-inject completed items into active context.
Three-layer todo architecture: approval plan, execution todo, durable task, plus event stream and compaction recovery
The stable design is layered: plan for approval, todo for execution focus, durable task for background and team work.
  1. Add an update_task_progress tool with content, status, and optional activeForm; reject multiple in_progress items.
  2. Emit todo updates as events and render completed / total in a CLI or Web UI.
  3. Run a compaction test: completed items should disappear from active context, unfinished items should remain.
  4. Simulate approval-only Plan Mode followed by execution and verify they use different state.
  5. Add an internal reminder after 10 assistant turns without a todo update, and ensure the reminder is not shown to the user.

§11 · Interview Drill: 10 Questions With Worked Answers

Section titled “§11 · Interview Drill: 10 Questions With Worked Answers”
Q1 · Concept: Why is a todo list not Plan Mode?

Plan Mode is for approval before execution. A todo list is execution state after work has started. Codex enforces this distinction in code by rejecting update_plan while the collaboration mode is Plan Mode.

Q2 · Design: Why allow at most one in_progress item?

The todo list needs a current focus. Multiple active top-level tasks make resume, UI status, and next-action selection ambiguous. Parallel tool calls can happen inside one active task; that is not the same as several top-level tasks.

Q3 · UI: What does activeForm buy in Claude Code?

It gives UI a present-continuous phrase such as Running tests, separate from the imperative task content such as Run tests. That makes status bars and spinners read naturally without guessing grammar from the task title.

Q4 · Recovery: Why should completed items not be re-injected?

Completed items have audit value but little active execution value. Re-injecting them wastes context and can cause repeated work. Hermes only injects pending and in_progress items after compaction.

Q5 · Resume: Why restore TodoWrite from transcript?

Session AppState can disappear in SDK or non-interactive flows. Restoring from the last TodoWrite tool use gives the agent a stable unfinished-work surface after resume.

Q6 · Trade-off: When do you need file-backed Tasks?

Use file-backed tasks when work crosses processes, IDE windows, agents, or owners. You need ids, locks, owner fields, blockers, claim checks, and durable JSON. For a single-agent CLI, a small checklist is usually enough.

Q7 · OpenClaw: Why use progress events instead of a model-owned checklist?

In a multi-channel runtime, the most reliable progress facts come from tools, child sessions, and delivery state. Runtime events are easier to project to operators than a model-authored checklist.

Q8 · API: Replace or merge?

Replace is simpler and safer for a short current checklist because the model submits the whole truth each time. Merge is useful for long-running lists where partial updates and discovered branches matter, but it requires stable ids and cleanup.

Q9 · Verification: Why add a verification nudge?

Agents often treat code written as work completed. A verification nudge forces completion to be backed by tests, lint, HTTP smoke, screenshot, or another concrete signal.

Q10 · Checklist: What is the minimum production progress surface?

Use a tiny schema, enforce one active focus, event updates to UI and transcript, restore unfinished work after compaction or resume, and require evidence before completion. Add reminders, durable tasks, and runtime progress events only when the product shape needs them.