
Codex

OpenAI’s own coding agent. A local codex binary runs interactive editing (a terminal UI plus a small Model Context Protocol / MCP suite); a cloud-tasks binary runs as a remote worker. The whole system is a Rust mono-repo where crates decouple through trait and protocol crates.

| Dimension | Codex |
| --- | --- |
| Entry | `cli/` terminal entry (streaming stdin); `tui/` full-screen ratatui UI (with onboarding / trust); `app-server/` + `app-server-daemon/` speak JSON-RPC, called by IDE/plugins; `cloud-tasks/` standalone remote-worker binary |
| Core loop | `core/` single turn closure (assemble prompt → model call → tool dispatch → apply); `core-api/` stable public interface; `core-plugins/` plugin attachment points; `protocol/` cross-crate shared types |
| State & memory | `state/` SQLite + lease/retry/backoff; `memories/` two-phase consolidation (stage1 + global); `memory_citation` for traceback; `agent-graph-store/` thread / branch topology |
| Tools | `apply-patch/` V4A patch parser; `exec/` shell execution; `builtin-mcps/` bundled MCP bridges; `core-skills/` 8 sub-crate skill engine |
| Sandbox | macOS: `sandbox-macos-seatbelt/` emits .sbpl; Linux: `bwrap/` + `landlock/` + `seccomp/`; Windows: `windows-sandbox-rs/`; Unified: `sandbox/` + `execpolicy/` |
| Observability & cost | `analytics/` 20+ event types; `codex-otel/` OpenTelemetry bridge; `codex-rollout-trace/` full-session replay; `codex-cost/` token → USD |
Codex pushes every engineering concern into its own crate, taking the mono-repo approach to an extreme.
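The core loop listed above (assemble prompt → model call → tool dispatch → apply) can be sketched as a small function over two traits. This is a minimal illustration of the shape, not the code in `core/`: `ModelOutput`, `Model`, `ToolDispatcher`, `run_turn`, and the scripted test doubles are all hypothetical names invented here.

```rust
// Hypothetical sketch: none of these names come from the Codex source.

/// What the model asks for in one step (a stand-in for shared protocol types).
#[derive(Debug, Clone, PartialEq)]
enum ModelOutput {
    /// Plain text for the user; ends the turn.
    Message(String),
    /// A request to run a named tool with one argument.
    ToolCall { tool: String, arg: String },
}

/// Model side of the loop, behind a trait so the loop needs no concrete client.
trait Model {
    fn complete(&mut self, prompt: &str) -> ModelOutput;
}

/// Tool side of the loop.
trait ToolDispatcher {
    fn dispatch(&mut self, tool: &str, arg: &str) -> Result<String, String>;
}

/// One turn: assemble prompt -> model call -> tool dispatch -> apply the
/// result to history, until the model answers or the step budget runs out.
fn run_turn(
    model: &mut dyn Model,
    tools: &mut dyn ToolDispatcher,
    user_input: &str,
    max_steps: usize,
) -> Result<String, String> {
    let mut history: Vec<String> = vec![format!("user: {user_input}")];
    for _ in 0..max_steps {
        let prompt = history.join("\n");
        match model.complete(&prompt) {
            ModelOutput::Message(text) => return Ok(text),
            ModelOutput::ToolCall { tool, arg } => {
                // Tool failures are fed back to the model instead of aborting.
                let observation = match tools.dispatch(&tool, &arg) {
                    Ok(out) => format!("tool {tool} ok: {out}"),
                    Err(err) => format!("tool {tool} failed: {err}"),
                };
                history.push(observation);
            }
        }
    }
    Err(format!("no final message after {max_steps} steps"))
}

/// Scripted model for the demo: calls `echo` once, then answers.
struct ScriptedModel {
    calls: usize,
}

impl Model for ScriptedModel {
    fn complete(&mut self, prompt: &str) -> ModelOutput {
        self.calls += 1;
        if prompt.contains("tool echo ok") {
            ModelOutput::Message("done".to_string())
        } else {
            ModelOutput::ToolCall { tool: "echo".to_string(), arg: "hi".to_string() }
        }
    }
}

/// Demo dispatcher that knows a single tool.
struct EchoTools;

impl ToolDispatcher for EchoTools {
    fn dispatch(&mut self, tool: &str, arg: &str) -> Result<String, String> {
        match tool {
            "echo" => Ok(arg.to_string()),
            other => Err(format!("unknown tool {other}")),
        }
    }
}

fn main() {
    let mut model = ScriptedModel { calls: 0 };
    let mut tools = EchoTools;
    let reply = run_turn(&mut model, &mut tools, "say hi", 4);
    println!("{reply:?} after {} model calls", model.calls);
}
```

Keeping the model and the tools behind traits is what lets a loop like this live in one crate while concrete clients and tool implementations live in others.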
  1. Rust type system as backstop. Result and enum make state-machine errors impossible to silently swallow. Stage1JobClaimOutcome’s 5-way enum forces a match on every branch.
  2. Three native sandboxes. macOS uses seatbelt (works without disabling System Integrity Protection); Linux combines bubblewrap, landlock, and seccomp (no root required); Windows uses windows-sandbox-rs (no Windows Subsystem for Linux dependency). One implementation per platform.
  3. Memory consolidation in the background. memory_consolidate_global runs LLM-based rewrites of MEMORY.md and skills/ as a background job, so it adds no latency to the main turn.
  4. Replayable rollout-trace. Every session persists to JSONL and can be replayed. The only reliable way to diagnose agent drift.
  5. apply-patch V4A format. A homegrown patch format more reliable than unified diff, more efficient than SEARCH/REPLACE, and more structured for LLMs.
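The "type system as backstop" point is easiest to see with an exhaustive `match`. The sketch below assumes a five-variant claim outcome; the variant names and the `NextStep` type are invented for illustration and are not the real `Stage1JobClaimOutcome` definition.

```rust
use std::time::Duration;

// Assumption: these five variants are made up for this example; the actual
// Stage1JobClaimOutcome variants in Codex may be named and shaped differently.
#[derive(Debug, Clone, PartialEq)]
enum ClaimOutcome {
    /// This worker now owns the job.
    Claimed { lease_token: u64 },
    /// Another worker holds a live lease.
    LeaseHeldByOther,
    /// A previous attempt failed; wait before trying again.
    BackingOff { retry_in: Duration },
    /// The retry budget is spent.
    RetriesExhausted,
    /// The job is already up to date.
    NothingToDo,
}

/// What the caller does next (also hypothetical).
#[derive(Debug, PartialEq)]
enum NextStep {
    RunJob(u64),
    Skip,
    SleepThenRetry(Duration),
    ReportFailure,
}

/// No wildcard arm: adding a sixth variant to `ClaimOutcome` turns this
/// function into a compile error until the new case is handled.
fn next_step(outcome: ClaimOutcome) -> NextStep {
    match outcome {
        ClaimOutcome::Claimed { lease_token } => NextStep::RunJob(lease_token),
        ClaimOutcome::LeaseHeldByOther => NextStep::Skip,
        ClaimOutcome::BackingOff { retry_in } => NextStep::SleepThenRetry(retry_in),
        ClaimOutcome::RetriesExhausted => NextStep::ReportFailure,
        ClaimOutcome::NothingToDo => NextStep::Skip,
    }
}

fn main() {
    let outcome = ClaimOutcome::BackingOff { retry_in: Duration::from_secs(30) };
    println!("{:?}", next_step(outcome));
}
```

The design choice that matters is leaving out the `_ =>` arm: with it, a new outcome would silently fall through to a default.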
  1. Refactor cost is high. With 50+ crates, any cross-crate change touches protocol/ and triggers long rebuilds.
  2. Rust skill bar. Async plus lifetimes plus tower-style trait composition. This is a Rust project first.
  3. No explicit user-level memory. MEMORY.md is cwd-scoped; cross-project preferences must live in personal-scope AGENTS.md.
  4. TUI-first. IDE integration runs through app-server JSON-RPC, but the ecosystem is far less mature than Claude Code’s IDE plugins.
  1. Phase 2 consolidation prompt (codex-rs/memories/write/templates/memories/consolidation.md). 800 lines that nail down what counts as high-signal memory, plus a wording-preservation rule. Reuse verbatim.
  2. Three-OS sandbox abstraction (sandbox/ and execpolicy/). One interface, three native backends. The best cross-OS agent reference.
  3. Stage1JobClaimOutcome 5-way state machine (state/src/model/memories.rs). All five outcomes of claiming a job are enum variants. No implicit branches.
  4. rollout-trace JSONL. Persist every session for replay. Every agent project should have this.
  5. app-server JSON-RPC protocol. Run the agent engine as a separate process behind JSON-RPC. IDE plugins don’t reimplement the turn loop.
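For the rollout-trace idea, an append-only JSONL log needs very little machinery. The sketch below uses only the standard library; the `TraceEvent` fields (`seq`, `kind`, `payload`) are a hypothetical schema, not the Codex rollout format, and replay here just yields the raw lines in order. A real implementation would likely parse them with a JSON library.

```rust
use std::io::{self, BufRead, Write};

/// One trace record. Field names are hypothetical, not the Codex schema.
#[derive(Debug, Clone, PartialEq)]
struct TraceEvent {
    seq: u64,
    kind: String,
    payload: String,
}

/// Escape a string for use inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters become \u00XX escapes.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Append one event as exactly one line of JSON.
fn append_event<W: Write>(sink: &mut W, ev: &TraceEvent) -> io::Result<()> {
    writeln!(
        sink,
        "{{\"seq\":{},\"kind\":\"{}\",\"payload\":\"{}\"}}",
        ev.seq,
        json_escape(&ev.kind),
        json_escape(&ev.payload)
    )
}

/// Replay: return the recorded JSON lines in write order, skipping blanks.
fn replay_lines<R: BufRead>(source: R) -> io::Result<Vec<String>> {
    let mut lines = Vec::new();
    for line in source.lines() {
        let line = line?;
        if !line.trim().is_empty() {
            lines.push(line);
        }
    }
    Ok(lines)
}

fn main() -> io::Result<()> {
    // An in-memory buffer stands in for the session file.
    let mut log: Vec<u8> = Vec::new();
    append_event(&mut log, &TraceEvent {
        seq: 1,
        kind: "model_call".to_string(),
        payload: "prompt: \"fix bug\"".to_string(),
    })?;
    append_event(&mut log, &TraceEvent {
        seq: 2,
        kind: "tool_call".to_string(),
        payload: "exec\nls -la".to_string(),
    })?;
    for line in replay_lines(log.as_slice())? {
        println!("{line}");
    }
    Ok(())
}
```

Because newlines inside payloads are escaped, each event stays on one line, which is what makes the file safe to append to and to read back line by line.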