Codex

§1 · What Codex is

Codex is OpenAI’s own coding agent. A local codex binary handles interactive editing (a terminal UI plus a small Model Context Protocol (MCP) suite); a cloud-tasks binary runs as a remote worker. The whole system is a Rust mono-repo whose crates decouple from one another through shared trait and protocol crates.

§2 · The 50+ crate layered architecture

Layer	Crate 1	Crate 2	Crate 3	Crate 4
Entry	`cli/` terminal entry (streaming stdin)	`tui/` full-screen ratatui UI (with onboarding / trust)	`app-server/` + `app-server-daemon/` speak JSON-RPC, called by IDE/plugins	`cloud-tasks/` standalone remote-worker binary
Core loop	`core/` single turn closure (assemble prompt → model call → tool dispatch → apply)	`core-api/` stable public interface	`core-plugins/` plugin attachment points	`protocol/` cross-crate shared types
State & memory	`state/` SQLite + lease/retry/backoff	`memories/` two-phase consolidation (stage1 + global)	`memory_citation` for traceback	`agent-graph-store/` thread / branch topology
Tools	`apply-patch/` V4A patch parser	`exec/` shell execution	`builtin-mcps/` bundled MCP bridges	`core-skills/` 8 sub-crate skill engine
Sandbox	macOS: `sandbox-macos-seatbelt/` emits .sbpl	Linux: `bwrap/` + `landlock/` + `seccomp/`	Windows: `windows-sandbox-rs/`	Unified: `sandbox/` + `execpolicy/`
Observability & cost	`analytics/` 20+ event types	`codex-otel/` OpenTelemetry bridge	`codex-rollout-trace/` full-session replay	`codex-cost/` token → USD

Codex pushes every engineering concern into its own crate. It is the mono-repo approach taken to its extreme.
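The single turn closure in the Core loop row (assemble prompt → model call → tool dispatch → apply) can be sketched in a few lines of Rust. This is a minimal illustration under stated assumptions, not Codex's actual code: ModelReply, Model, Tool, run_turn, ScriptedModel, and Upper are all hypothetical names, and the real loop is presumably async and streaming rather than synchronous.

```rust
use std::collections::{HashMap, VecDeque};

/// What the model asks for after one call (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
pub enum ModelReply {
    /// Plain text for the user; this ends the turn.
    Message(String),
    /// A request to run a named tool with a single string argument.
    ToolCall { tool: String, arg: String },
}

/// Model boundary (hypothetical trait).
pub trait Model {
    fn call(&mut self, prompt: &str) -> ModelReply;
}

/// Tool boundary (hypothetical trait); shell execution or patching would sit behind it.
pub trait Tool {
    fn run(&self, arg: &str) -> Result<String, String>;
}

/// One turn: assemble prompt -> model call -> tool dispatch -> apply.
pub fn run_turn(
    model: &mut dyn Model,
    tools: &HashMap<String, Box<dyn Tool>>,
    user_input: &str,
    max_steps: usize,
) -> Result<String, String> {
    let mut transcript = vec![format!("user: {user_input}")];
    for _ in 0..max_steps {
        // Assemble the prompt from everything seen so far.
        let prompt = transcript.join("\n");
        match model.call(&prompt) {
            ModelReply::Message(text) => return Ok(text),
            ModelReply::ToolCall { tool, arg } => {
                // Dispatch by name; an unknown tool is reported, not fatal.
                let result = match tools.get(&tool) {
                    Some(t) => t.run(&arg),
                    None => Err(format!("unknown tool: {tool}")),
                };
                // Apply: feed the outcome back so the next model call sees it.
                transcript.push(match result {
                    Ok(out) => format!("tool {tool} ok: {out}"),
                    Err(err) => format!("tool {tool} failed: {err}"),
                });
            }
        }
    }
    Err(format!("turn did not finish within {max_steps} steps"))
}

/// Test double: replays canned replies and records every prompt it receives.
pub struct ScriptedModel {
    replies: VecDeque<ModelReply>,
    pub prompts: Vec<String>,
}

impl ScriptedModel {
    pub fn new(replies: Vec<ModelReply>) -> Self {
        Self { replies: replies.into(), prompts: Vec::new() }
    }
}

impl Model for ScriptedModel {
    fn call(&mut self, prompt: &str) -> ModelReply {
        self.prompts.push(prompt.to_string());
        // When the script runs out, end the turn with an empty message.
        self.replies
            .pop_front()
            .unwrap_or_else(|| ModelReply::Message(String::new()))
    }
}

/// Toy tool that upper-cases its argument.
pub struct Upper;

impl Tool for Upper {
    fn run(&self, arg: &str) -> Result<String, String> {
        Ok(arg.to_uppercase())
    }
}
```

Keeping the model and the tools behind traits is what lets a loop like this live in one crate while backends live in others. The step cap is a design choice of this sketch, so that a model that keeps requesting tools cannot spin forever.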

§3 · Engineering highlights

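Rust type system as backstop. Result and enum make state-machine errors hard to swallow silently: an unused Result triggers a compiler warning, and a match without a wildcard arm must cover every variant. Stage1JobClaimOutcome’s 5-way enum forces callers to handle each of the five outcomes.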
Three native sandboxes. macOS uses seatbelt (works without disabling System Integrity Protection); Linux combines bubblewrap, landlock, and seccomp (no root required); Windows uses windows-sandbox-rs (no Windows Subsystem for Linux dependency). One implementation per platform.
Memory consolidation in the background. memory_consolidate_global runs LLM-based rewrites of MEMORY.md and skills/ as a background job, so the main turn never waits on it.
Replayable rollout-trace. Every session persists to JSONL and can be replayed. The only reliable way to diagnose agent drift.
apply-patch V4A format. A homegrown patch format more reliable than unified diff, more efficient than SEARCH/REPLACE, and more structured for LLMs.
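To make the apply-patch item concrete, here is a rough sketch of parsing the outer envelope of a V4A-style patch. It assumes the publicly documented markers (`*** Begin Patch`, `*** Add File: `, `*** Update File: `, `*** Delete File: `, `*** End Patch`) and ignores everything else in the grammar, such as file moves and hunk matching. FileOp and parse_patch are hypothetical names, not the API of the apply-patch/ crate.

```rust
/// File-level operations found in a patch envelope (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum FileOp {
    /// New file; every body line in the patch starts with '+'.
    Add { path: String, contents: Vec<String> },
    /// File removal; carries no body.
    Delete { path: String },
    /// In-place edit; hunk lines are kept raw ("@@ ...", "-old", "+new", " ctx").
    Update { path: String, hunk_lines: Vec<String> },
}

/// Split a patch into per-file operations without applying anything.
pub fn parse_patch(text: &str) -> Result<Vec<FileOp>, String> {
    let mut lines = text.lines();
    if lines.next().map(str::trim_end) != Some("*** Begin Patch") {
        return Err("missing '*** Begin Patch'".to_string());
    }
    let mut ops: Vec<FileOp> = Vec::new();
    for line in lines {
        if line.trim_end() == "*** End Patch" {
            return Ok(ops);
        }
        if let Some(path) = line.strip_prefix("*** Add File: ") {
            ops.push(FileOp::Add { path: path.to_string(), contents: Vec::new() });
        } else if let Some(path) = line.strip_prefix("*** Delete File: ") {
            ops.push(FileOp::Delete { path: path.to_string() });
        } else if let Some(path) = line.strip_prefix("*** Update File: ") {
            ops.push(FileOp::Update { path: path.to_string(), hunk_lines: Vec::new() });
        } else {
            // A body line belongs to the most recent file header.
            match ops.last_mut() {
                Some(FileOp::Add { contents, .. }) => match line.strip_prefix('+') {
                    Some(body) => contents.push(body.to_string()),
                    None => return Err(format!("Add File body must start with '+': {line:?}")),
                },
                Some(FileOp::Update { hunk_lines, .. }) => hunk_lines.push(line.to_string()),
                Some(FileOp::Delete { .. }) => {
                    return Err(format!("unexpected body after Delete File: {line:?}"));
                }
                None => return Err(format!("content before any file header: {line:?}")),
            }
        }
    }
    Err("missing '*** End Patch'".to_string())
}
```

Note that the hunk lines carry context text and `@@` anchors rather than line numbers, which is one plausible reason the format tolerates model output better than a unified diff. A real applier would still need to locate each hunk in the target file and reject ambiguous matches.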

§4 · Where it falls short

Refactor cost is high. With 50+ crates, any cross-crate change touches protocol/ and triggers long rebuilds.
Rust skill bar. Contributing requires fluency in async, lifetimes, and tower-style trait composition. This is a Rust project first.
No explicit user-level memory. MEMORY.md is cwd-scoped; cross-project preferences must live in personal-scope AGENTS.md.
TUI-first. IDE integration runs through app-server JSON-RPC, but the ecosystem is far less mature than Claude Code’s IDE plugins.

§5 · Five things worth stealing

Phase 2 consolidation prompt (codex-rs/memories/write/templates/memories/consolidation.md). 800 lines that nail down what counts as high-signal memory, plus a wording-preservation rule. Reuse verbatim.
Three-OS sandbox abstraction (sandbox/ and execpolicy/). One interface, three native backends. The best cross-OS agent reference.
Stage1JobClaimOutcome 5-way state machine (state/src/model/memories.rs). All five outcomes of claiming a job are enum variants. No implicit branches.
rollout-trace JSONL. Persist every session for replay. Every agent project should have this.
app-server JSON-RPC protocol. Run the agent engine as a separate process behind JSON-RPC. IDE plugins don’t reimplement the turn loop.
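The claim state machine mentioned above is the easiest of the five to try in isolation. The sketch below shows the general technique: every outcome of a claim attempt is an enum variant, and callers match without a wildcard arm. The variant names, the JobRow fields, and the order of the checks are assumptions made for illustration, not the contents of state/src/model/memories.rs.

```rust
/// Five possible results of trying to claim a job (hypothetical variants).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ClaimOutcome {
    /// The caller now owns the job until the lease expires.
    Claimed { lease_token: u64 },
    /// The last successful run already covers the current source.
    UpToDate,
    /// Another worker holds a live lease.
    AlreadyRunning,
    /// A previous failure scheduled a retry that is not due yet.
    InBackoff { retry_at: u64 },
    /// No retries remain; the job stays parked.
    RetriesExhausted,
}

/// Minimal job record (hypothetical); times are seconds on any monotonic scale.
#[derive(Debug, Clone)]
pub struct JobRow {
    pub source_updated_at: u64,
    /// Source timestamp covered by the last successful run, if any.
    pub processed_source_at: Option<u64>,
    pub lease_until: Option<u64>,
    pub retry_at: Option<u64>,
    pub retries_left: u32,
}

/// Decide the outcome and take the lease only in the `Claimed` case.
pub fn try_claim(job: &mut JobRow, now: u64, lease_secs: u64, lease_token: u64) -> ClaimOutcome {
    if job.processed_source_at.map_or(false, |t| t >= job.source_updated_at) {
        return ClaimOutcome::UpToDate;
    }
    if job.lease_until.map_or(false, |t| t > now) {
        return ClaimOutcome::AlreadyRunning;
    }
    if job.retries_left == 0 {
        return ClaimOutcome::RetriesExhausted;
    }
    if let Some(t) = job.retry_at {
        if t > now {
            return ClaimOutcome::InBackoff { retry_at: t };
        }
    }
    job.lease_until = Some(now + lease_secs);
    ClaimOutcome::Claimed { lease_token }
}

/// No wildcard arm: adding a sixth variant makes this function fail to compile.
pub fn next_step(outcome: &ClaimOutcome) -> String {
    match outcome {
        ClaimOutcome::Claimed { lease_token } => format!("run job under lease {lease_token}"),
        ClaimOutcome::UpToDate => "skip: nothing new to process".to_string(),
        ClaimOutcome::AlreadyRunning => "skip: another worker owns it".to_string(),
        ClaimOutcome::InBackoff { retry_at } => format!("wait until {retry_at}"),
        ClaimOutcome::RetriesExhausted => "give up and surface the failure".to_string(),
    }
}
```

The point of the pattern is that "could not claim" is never a bare false or None. Each reason is a named variant, so schedulers and logs can tell a healthy skip from a stuck job.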