Skip to content

build-your-own-agent · Companion Skill

The companion engineering artifact to this book. Compresses 22-chapter conclusions into a Claude Code / Cursor loadable build-your-own-agent skill that does two jobs:

  1. Build: scaffold a production-grade agent harness from zero.
  2. Diagnose & Optimize: audit an existing agent against the 10 iron laws + 9 anti-patterns and apply the source-backed fix.

Both jobs share one vocabulary (10 laws + 8 axes) and point at source-level reference implementations from Codex / Claude Code / OpenClaw / Hermes.

  • 1 SKILL.md entry: 10 iron laws + 8-axis spectrum + Build/Diagnose dual flow navigation;
  • 9 reference docs: independently loadable, mapped to nine sub-tasks (build flow, diagnose flow, design, scaffold, refactor, security, production, interview, cross-skill);
  • 3 executable scripts: init-agent-project.py (standard initializer) + lint-agent-design.py (static 10-rule check + progress advisory) + diagnose-agent.py (runtime 9 anti-pattern detection from rollout.jsonl);
  • asset templates + assets/scaffold/ as the single source for generated files: AGENTS.md.template, pyproject.toml.template, README.md.template, .gitignore.template, ci-lint-diagnose.yml.template, and source templates. New projects start in one command.
build-your-own-agent/
├── SKILL.md # entry: 10 laws + Build/Diagnose flows
├── LICENSE.txt # MIT + attribution
├── references/
│ ├── build-agent-workflow.md # 5-phase end-to-end Build flow
│ ├── diagnose-agent.md # 4 diagnosis flows + 9 AP → fix map
│ ├── picking-from-spectrum.md # decision trees (8 axes)
│ ├── agent-scaffold.md # standard Python scaffold
│ ├── migration-guide.md # 10-stage refactor playbook
│ ├── security-checklist.md # 5-layer defense stack
│ ├── production-deployment.md # 7-phase deploy
│ ├── interview-prep.md # 20 highest-value questions
│ └── skill-interop.md # combine with mcp-builder / etc.
├── scripts/
│ ├── init-agent-project.py # standard new-agent initializer
│ ├── lint-agent-design.py # 10-rule static check (text/json/--rules subset)
│ └── diagnose-agent.py # 9 anti-pattern runtime detection
└── assets/
├── AGENTS.md.template # new-project architecture doc skeleton
├── pyproject.toml.template # deps + ruff/pytest/mypy config
├── README.md.template # project readme skeleton
├── .gitignore.template # runtime/cache/secret exclusions
├── ci-lint-diagnose.yml.template # GitHub Actions pipeline
└── scaffold/ # single source for generated source files

Two ways to install:

Terminal window
git clone https://github.com/veithly/build-your-own-agent.git ~/.claude/skills/build-your-own-agent
# Restart Claude Code; appears in the skill picker.
Terminal window
git clone https://github.com/veithly/build-your-own-agent.git ./skills/build-your-own-agent
# Declare in AGENTS.md / CLAUDE.md:
# "When designing an agent, load skill at ./skills/build-your-own-agent/SKILL.md."

The skill loads on demand without polluting context. Two main paths: Build and Diagnose.

维度 Your promptLoaded referencesOutputWhen to use
Building a coding agent "I want a Codex-like coding agent with a sandbox"SKILL.md + build-agent-workflow.md + agent-scaffold.md5-phase Build flow + initializer + standard Python scaffold + source reference per stepStarting from scratch
Stuck on architecture "Should the loop borrow from Codex rollout or Claude Code 7 transitions?"SKILL.md + picking-from-spectrum.md8-axis decision tree + three worked examplesPhase 1 decisions
Task progress surface "Should todo list use Codex update_plan or Claude Code TodoWrite?"SKILL.md + picking-from-spectrum.mdAxis 8: approval plan / execution todo / durable task layeringRuntime progress design
Execution state routing "Where should tool progress, todo, and away summary appear?"SKILL.md + picking-from-spectrum.md + docs-site §22Execution-state router: source / audience / lifetime / context policyDesigning multi-layer progress surfaces
Diagnose existing agent "Agent is slow / expensive / leaking / looping / unsafe"SKILL.md + diagnose-agent.md + scripts/diagnose-agent.py4 diagnosis flows + 9 anti-pattern → source-backed fix mapProduction trouble
Upgrade existing agent "Lint fails 8/10, how do I refactor in stages?"SKILL.md + migration-guide.md10-stage refactor, 1-3 days each, one stage per law1-year-old codebase, no time for rewrite
Pre-launch review "Launch next week, final gate"security-checklist.md + production-deployment.md + lint + diagnose5-layer security + 7-phase deploy + dual-script CI gate3 days before launch
Interview prep "Interviewing for an agent infra role"SKILL.md + interview-prep.md20 high-frequency questions + chapter pointers + three-paragraph answer template1 week before interview
Load on demand. Build and Diagnose are two clean entry points.

§3.1 · One-line commands: CI / local sweep

Section titled “§3.1 · One-line commands: CI / local sweep”
Terminal window
# Standard new-agent initialization
python ~/.claude/skills/build-your-own-agent/scripts/init-agent-project.py ./my-agent \
--profile coding-cli \
--test-cmd "python -m pytest -ra"
# Static lint (10 iron laws; CI gate)
python ~/.claude/skills/build-your-own-agent/scripts/lint-agent-design.py /path/to/agent
python ~/.claude/skills/build-your-own-agent/scripts/lint-agent-design.py /path/to/agent --format json
# Runtime diagnosis (9 anti-patterns; weekly sweep)
python ~/.claude/skills/build-your-own-agent/scripts/diagnose-agent.py /path/to/rollouts/ --allow-empty
python ~/.claude/skills/build-your-own-agent/scripts/diagnose-agent.py /path/to/rollouts/ \
--metrics /path/to/metrics.jsonl \
--agent-src /path/to/agent \
--format json

Exit 0 = pass; 1 = at least one finding. JSON output is stable enough to feed GitHub Actions / Jenkins / GitLab CI directly. assets/ci-lint-diagnose.yml.template is a working pipeline skeleton.

These 10 rules are the skill’s spine. Every law is one that all four reference systems obey.

10 Iron Laws of Agent Harness

  1. Turn is the source of truth
    Everything written to disk, every retry, every recovery happens at turn boundaries. Define what a turn is on day 1 and never break the invariant.
  2. Context has a cache boundary
    Layer 1-N is frozen (cached). Layer N+1...end is recomputed every turn. Memory snapshots and skill indexes above the boundary; timestamps and per-turn state below.
  3. Prompt is data, never instructions
    Always declare external content (web fetch, email, tool output) as data inside the prompt, wrapped with a session-unique nonce so attackers cannot forge boundaries.
  4. Three verifier tiers always
    Hard (external test/exit code) + soft (token budget) + give-up (model self-stops). Production needs at least hard + soft.
  5. Sandbox first, then trust
    OS-level sandbox is the hard runtime boundary. LLM-layer trust comes after. Defaults must be network=deny + fs_write=restricted.
  6. Redact at import time, not at log time
    Token redaction config snapshotted at module load so the LLM cannot bypass mid-turn by exporting an env var.
  7. fail_open beats fail_closed as the default
    Strict-by-default leads to "users disable safety entirely". Make fail_open the default; let production opt into fail_closed explicitly.
  8. Memory writes need a frozen snapshot
    When memory enters the prompt it must be a snapshot taken at turn start, not a live reference. Live references invalidate prefix cache and risk mid-turn mutation.
  9. Skills are content, but loadable code is supply chain
    Bundled allowlist + scanner + provenance signature is non-negotiable for production. User-installed skills require a 4-tier trust ladder x 3-verdict matrix.
  10. Audit trail is the last mile
    rollout.jsonl / trajectory / SecurityAuditReport. Without one of these you cannot investigate when something goes wrong.

Each law maps to a chapter with worked examples. Want to know how Codex / Claude Code / OpenClaw / Hermes actually implement law N? Read the corresponding chapter’s §11 interview section.

The skill does not replace the book. It is the toolkit you reach for after reading:

The book — 22 chapters (depth + cases + comparison)
├─ distilled to 10 iron laws ────► SKILL.md (entry)
├─ extracted to 8-axis spectrum ─► picking-from-spectrum.md
├─ top 20 of 220 questions ──────► interview-prep.md
└─ engineering takeaways ────────► scaffold / migration / security / production

If a law feels under-explained, jump back to the chapter. If you forget “wait, how does Codex do that transition again”, let the skill find the citation for you.

The skill is versioned in lockstep with the book. New references and rule edits land in the repo CHANGELOG.md. Welcome contributions on:

  • You built an agent — which laws actually bit you, and how you adjusted them;
  • Use cases missing from the §5 cross-reference table;
  • Any reference doc that missed a case your real project hit.

The Codex tag is just colour-coding. These paths all live under skills/ in this repository and become direct GitHub links once the repo is public.