What this is, and why it exists
The hard part of building software with agents isn't writing code — agents write code fine. It's keeping the work coherent across weeks of sessions: each session forgets what the last one decided, the codebase grows past what one person can hold in their head, and agents confidently report "done" on things that aren't. Everything below is the practice built around those three problems.
The split: I shape the work and decide; agents execute and produce evidence; I check it; lessons get written down where the next session will find them. The rules, the loop, and the machinery exist to keep that split working under load.
The scope is small things, well-shipped: news aggregators, internal tools, small-business backends, tracking apps for a community or church or neighborhood, content sites for niche audiences — solo-founder work, up to a few thousand daily users. Not enterprise; not trying to be. Read 01–04 for the argument. Skim 05–08 for the machinery, the sources, and blocks worth copying.
None of this was invented in one prompt or copied wholesale from one book. It came out of months of shipping small projects with AI agents, watching them confidently break things, and writing down what stopped the breakage the second time around. The lineage is software craft and indie practice; each rule had to keep earning its place. — Origin note
Thirteen steering rules
The stable rules behind every decision, grouped by what they do. They change slowly; almost everything else is downstream.
Shape the work before building
Keep the codebase honest
Allocate human and model attention
Compound only what earns its keep
The lineage behind the principles
The North Stars aren't original — they're the durable parts of software craft and agentic practice, grouped the same four ways. Lineage, not literature review.
Shape the work before building
- Will Cole's Invariants — think then do · don't design without strategy · don't build without a plan · do what you said · make products earn their keep.
- Shape Up / 37signals — fixed appetite, variable scope; shape before build.
- TDD / Beck — confirm failure for the right reason before fixing.
- JTBD / Moesta — "tell me about the last time you…" beats "what do you want?" for finding the real job.
Keep the codebase honest
- Pragmatic Programmer — DRY, orthogonality, tracer bullets, crash early.
- Clean Code / Architecture — readability, clear boundaries, Boy Scout Rule.
- Ousterhout — deep modules with small, stable interfaces.
- TDD — verify behavior at boundaries.
Allocate human and model attention
- Knowledge locality / PBM — decide where the relevant knowledge lives.
- 12-Factor Agents — instruction budgets, agent-tool boundaries.
- Pocock / agentic engineering — vertical slicing, context discipline.
- Review economics — agent output scales faster than human attention.
Compound only what earns its keep
- Karpathy / autoresearch loops — fixed budget, ground-truth metric, learn each run.
- Tidy First + Boy Scout Rule — each change reduces future cost.
- Pieter Levels / indie practice — boring, durable, low-maintenance stacks.
- Outcome metrics — value-correlated outcomes over flattering proxies.
From raw idea to shipped, verified work
The path every task follows from idea to shipped work. The slope model below is the gate: a task crosses into Build only when the path is downhill — scoped, contracted, and verifiable.
- Appetite is a budget, not an estimate. S / M / L fix how much I'll spend; scope flexes to fit.
- The shaped contract is a ceiling. During Build, scope can be cut but never added without re-shaping.
- "Done" means more than "output exists." It means verification ran, the learning got captured, the branch is clean, and the artifact is where it belongs.
Factory-wide abstraction, project-local specificity
Higher layers are factory-wide and slow-changing; lower layers are project-local and concrete. Each layer inherits its parent's rules and specializes them. A conflict between a factory rule and a project's need becomes an explicit local exception with a written reason — never a silent override.
claude.pushPolicy = local-only at its own layer. The agent reads the local layer first; the factory rule still loads but defers to the explicit local exception. Override is allowed; silence is not.How principles become behavior
A rule that nothing enforces is just a wish. Each mechanism below turns one slice of the rules into something the agent actually does — at a specific moment, with an observable result.
| Mechanism | What it enforces | When it fires | Example |
|---|---|---|---|
| Rules | NS-6, NS-8, NS-9 | Every turn, ambient | "Never git push --force"; "Verify spec claims before building on them" — loaded into the model's context on session start. |
| Skills | NS-1, NS-4, NS-3 | Invoked by name (/ship, /shape) |
/ship runs verification, updates plan frontmatter, writes a retro line, refuses to continue if any step fails. |
| Hooks | NS-4, NS-6, NS-12 | Lifecycle events (pre-commit, pre-write, stop) | A pre-commit hook blocks the commit if git diff --stat shows files the agent didn't intend to stage. |
| Agents | NS-8, NS-9, NS-10 | When a task needs a lens the main thread can't supply | A challenger agent (Codex or Gemini) reviews the plan with a different model before Build begins. |
| Memory | NS-1, NS-13 | On session open; on lesson capture | LEARNED.md and the project's CLAUDE.md decisions log — read at session start so yesterday's correction reaches today. |
| Review packets | NS-4, NS-8 | Plan review and large-change checkpoints | A Markdown bundle (problem, constraints, risks, acceptance) sent to Codex today and Gemini next week — same input, comparable verdicts. |
| Ledgers | NS-4, NS-6, NS-11 | Ship boundary; async-closure resolution | .claude/state/ship-ledger.jsonl row written on every ship: {ticket, slice, verified_at, verify_cmd, exit_code} — auditable months later. |
| Ship boundary | NS-4, NS-11, NS-13 | The moment work is marked done | /ship verifies, updates plan status, appends to LEARNED.md, files leftover TODOs as issues, cleans the branch — atomic, no manual catch-up. |
The list below is the same eight mechanisms, framed as definitions for readers who want the categorical shape before the table.
Sources and lineage
Where the ideas come from, with enough context to look up the unfamiliar. Most are useful in parts and overclaim in others — none wholesale.
Software craft
Agentic & product practice
/grill-with-docs for challenging the plan against the domain language. The grill skill driving the design pass for this page is his.Blocks you can borrow
Reusable pieces, ready to adapt. Drop them into a project's CLAUDE.md, an agent system prompt, or a rules file; rename to your own. Start-here-at-week-4: if the build already feels noisy, Appetite gates and Shape contract pay back fastest.
The 13 North Stars (condensed)
SHAPE BEFORE BUILDING
NS-1 Think before doing — verify assumptions before going deeper
NS-2 Prototype-first for risk — tracers/spikes; harden or delete
NS-3 Planning depth scales with risk — appetite=budget, risk=ceremony
KEEP THE CODEBASE HONEST
NS-4 Verify, don't claim — observed proof, not confidence
NS-5 Depth over surface — deep modules, narrow interfaces
NS-6 Loud failure beats silent corruption
NS-7 Code resists entropy — boring & legible over clever
ALLOCATE ATTENTION
NS-8 Review capacity is scarce — spend it where it changes outcomes
NS-9 Anchor decisions to knowledge locality
NS-10 Cheapest-effective per invocation
COMPOUND WHAT EARNS ITS KEEP
NS-11 Outcome over output — connect to real signal
NS-12 Routine work auto-runs — idempotent, observable, reconciled
NS-13 Compound the substrate — extend/refine/replace, don't balloon
The operating loop
Bench → capture ideas without interrupting current work
Think → frame the problem; name the outcome, not a proxy
Shape → reduce risk to a builder contract ┐
├─ SHAPE GATE
Build → execute inside the contract ┘
Verify → prove behavior; no claim without proof
Done → ship boundary + learning capture + cleanup
Feedback: Retro → LEARNED → Vault / Rules / Skills → next loop
Appetite gates (fixed budget, variable scope)
S small → zero interaction; pick up → verify → auto-ship
M medium → draft plan → 1 approval → build → "ready to ship?"
L large → co-shape → approve → build → checkpoint → ship
Rule: appetite fixes the budget; if it feels L but scope can be
cut to M, cut it. The shaped contract is a CEILING during build.
Shape contract — minimum fields
Problem what's broken/missing? (name the outcome)
Appetite S / M / L
Risk score novelty + reversibility + deps + interface (0–8)
Solution sketch rough shape, not a detailed spec
Rabbit holes what could balloon scope?
No-gos what we are explicitly NOT doing
Acceptance checkboxes + verification commands
Verified assumptions every claim about existing code, grep-checked
Example row:
Claim pipeline.ts caps stories at TARGET_STORIES=7
Verify rg -n "TARGET_STORIES\s*=" src/lib/pipeline.ts
Result src/lib/pipeline.ts:247:const TARGET_STORIES = 7;