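Harness Design Philosophy: From Anthropic's Ideas to Evidence in the Claude Code Source

Anthropic's Core Claims About the Harness

Understand the ideas first, then read the code. That ordering is what sets this article apart from other analyses.

What has Anthropic said?

Anthropic has laid out its philosophy of agent building across several blog posts. The core claims are not about how to constrain the model. They are:

Claim 1: An agent is just an LLM in a loop with tools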

"An agent is not a special architecture — it is a model that repeatedly calls tools until a task is done."

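"Building Effective Agents", Dec 2024

Claim 2: The stronger the model, the thinner the harness

"As models get more capable, harnesses should get simpler." Early LLMs needed elaborate pipelines (RAG, CoT scaffolds, multi-step verification) because the models were unreliable. As models improve, that scaffolding should come out.

Claim 3: Push intelligence into the model, not into code

"Push intelligence into the model, not into code." If you are writing code to validate and retry model output, you should probably improve the prompt instead. The model should be the primary decision-maker; code is just plumbing.

Claim 4: A bicycle, not a railroad

A good harness is like a bicycle: it amplifies the abilities of the rider (the model) while the rider steers. A bad harness is like a railroad: it confines the model to a predefined track.

Claim 5: The interesting space for harness work moves; it does not shrink

Better models do not make the harness less important. Harness components that compensate for model weaknesses disappear, but new harness capabilities emerge. The engineer's job is to keep finding the new combinations that are worth building.

The key shift in understanding: most people (my own earlier analysis included) treat the harness as a constraint layer: permissions, hooks, blocking. Anthropic's actual philosophy is that the harness is an enablement layer (memory, compaction, and background tasks let the model do more) plus the thinnest necessary safety layer (stepping in only where the model cannot possibly protect itself). Only with that distinction in mind do Claude Code's architectural decisions make sense.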

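01 The Minimal Loop: An Agent Is Just while(true)

Idea: "Agent is just LLM in a loop with tools" → Source: the core loop in query.ts

Idea: don't over-engineer

Anthropic explicitly warns against complex agent frameworks: "We suggest that developers first consider whether they can achieve their goals with direct API calls and simple code, before adopting an agent framework." Frameworks add layers of abstraction that obscure the actual exchange of prompts and responses.

Source evidence: the core loop in query.ts

src/query.ts: the core of the entire agent is just this structure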

while (true) {                              // ← An infinite loop. That is all there is to it.

  // 1. Call the Claude API
  for await (const message of deps.callModel({
    messages: messagesForQuery,
    systemPrompt: fullSystemPrompt,
    tools: toolUseContext.options.tools,
    toolChoice: undefined,                   // ← the model chooses tools freely
  })) {
    // Collect the streamed response + tool_use blocks
  }

  // 2. Does the loop need to continue?
  if (!needsFollowUp) {
    break                                    // ← natural end
  }

  // 3. Run the tools and feed the results back
  yield* runTools(toolUseBlocks, ...)
  // Loop back to step 1
}

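No state machine. No flowchart. No step-1, step-2, step-3.
The core of the whole Claude Code agent is: call the API → run tools → continue. The model decides what to do; the loop only executes and relays. This is not lazy engineering. It reflects a core Anthropic design tenet: the simplest solution is often the best one.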
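
The loop shape can be reduced to a self-contained sketch. Everything in it is hypothetical and chosen for illustration: the Model and Tools types, runAgent, the scripted stand-in model, and the maxTurns cap (the excerpt from query.ts uses an unbounded while (true)). It is not Claude Code's implementation, only the same structure: call the model, stop if it requested no tools, otherwise run the tools and feed the results back.

```typescript
// Hypothetical types for the sketch; not Claude Code's real API.
type ToolCall = { name: string; input: string };
type ModelTurn = { text: string; toolCalls: ToolCall[] };
type Message = { role: "user" | "assistant" | "tool"; content: string };
type Model = (messages: Message[]) => Promise<ModelTurn>;
type Tools = Record<string, (input: string) => Promise<string>>;

async function runAgent(
  model: Model,
  tools: Tools,
  prompt: string,
  maxTurns = 20, // safety cap added for the sketch
): Promise<Message[]> {
  const messages: Message[] = [{ role: "user", content: prompt }];
  for (let turn = 0; turn < maxTurns; turn++) {
    // 1. Call the model with the full transcript so far.
    const reply = await model(messages);
    messages.push({ role: "assistant", content: reply.text });

    // 2. No tool calls means the model considers the task finished.
    if (reply.toolCalls.length === 0) break;

    // 3. Run each requested tool and append its result to the transcript.
    for (const call of reply.toolCalls) {
      const tool = tools[call.name];
      const result = tool ? await tool(call.input) : `Unknown tool: ${call.name}`;
      messages.push({ role: "tool", content: result });
    }
  }
  return messages;
}

// Scripted stand-in for a real model: asks for one tool call, then answers.
const scriptedModel: Model = async (messages) =>
  messages.some((m) => m.role === "tool")
    ? { text: "The file says: hello", toolCalls: [] }
    : { text: "Reading the file.", toolCalls: [{ name: "read", input: "a.txt" }] };

const demoTools: Tools = {
  read: async (path) => (path === "a.txt" ? "hello" : ""),
};
```

Note that the loop contains no knowledge of what the task is. All task-specific decisions live in the model's replies, which is the point the section makes.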

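02 Prompts Over Code: Push Intelligence into the Model

Idea: "Push intelligence into the model" → Source: the system prompt replaces hard-coded logic

Idea: if you are writing code to validate model output, you should probably improve the prompt

The traditional approach writes rules in code ("reject if the model calls cat"). Anthropic's approach writes guidance in the prompt ("use Read rather than cat") and lets the model make the right choice itself.

Source evidence: the system prompt encodes behavioral norms

src/constants/prompts.ts: behavioral norms are prompt text, not code logic

// getUsingYourToolsSection(): tool-usage priorities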
"Do NOT use Bash to run commands when a relevant dedicated tool is provided.
 This is CRITICAL to assisting the user:
 - To read files use Read instead of cat, head, tail, or sed
 - To edit files use Edit instead of sed or awk
 - To create files use Write instead of cat with heredoc
 - To search for files use Glob instead of find or ls
 - To search content use Grep instead of grep or rg"

// getSimpleDoingTasksSection(): code-quality norms
"Don't add features, refactor code, or make 'improvements' beyond what was asked."
"Don't add error handling for scenarios that can't happen."
"Don't create helpers or abstractions for one-time operations."
"Three similar lines of code is better than a premature abstraction."

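The key contrast: Claude Code does not put if (command.startsWith("cat")) reject() in BashTool's code. Its system prompt tells the model to use Read rather than cat. The model's reasoning replaces a hard-coded rule. This is what "pushing intelligence into the model" looks like in practice.

Source evidence: tool descriptions carry behavioral guidance

A tool prompt is more than a schema; it is a usage guide

// GlobTool description: says not only what the tool does, but also when to use a different tool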
"- Fast file pattern matching tool that works with any codebase size
 - When you are doing an open ended search that may require multiple
   rounds of globbing and grepping, use the Agent tool instead"

// BashTool description: embeds extensive usage guidance
"- Avoid unnecessary sleep commands
 - Do not retry failing commands in a sleep loop — diagnose the root cause
 - If you must poll an external process, use a check command rather than sleeping first
 - NEVER skip hooks (--no-verify) unless the user explicitly asks"

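Each tool's prompt() method outputs not just parameter definitions but also usage guidance, best practices, and prohibitions. The model makes the right decision by reading this guidance, not because code forces it to.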
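
A minimal sketch of that pattern follows. The ToolSpec interface, the two example tools, their guidance text, and renderToolGuidance are all assumptions made for illustration; the only detail taken from the article is that each tool exposes a prompt() method whose text the model reads.

```typescript
// Hypothetical tool shape: each tool describes itself in natural language.
interface ToolSpec {
  name: string;
  prompt(): string; // guidance the model reads; nothing here is enforced by code
}

const readTool: ToolSpec = {
  name: "Read",
  prompt: () => "Reads a file. Prefer this over running cat, head, or tail in Bash.",
};

const bashTool: ToolSpec = {
  name: "Bash",
  prompt: () => "Runs a shell command. Do not use it when a dedicated tool exists.",
};

// Assemble the guidance that would be shown to the model alongside the schema.
function renderToolGuidance(tools: ToolSpec[]): string {
  return tools.map((t) => `## ${t.name}\n${t.prompt()}`).join("\n\n");
}

const guidance = renderToolGuidance([readTool, bashTool]);
console.log(guidance);
```

The design choice to notice is what is absent: there is no branch that inspects a Bash command and rejects cat. Steering happens entirely through the text returned by prompt().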

03 A Bicycle, Not a Railroad: The Model Has Full Freedom to Decide

Idea: "Give model autonomy" → Source: toolChoice: undefined

Core evidence: `toolChoice: undefined`

src/query.ts: the model chooses tools freely

for await (const message of deps.callModel({
  messages: messagesForQuery,
  systemPrompt: fullSystemPrompt,
  tools: toolUseContext.options.tools,
  toolChoice: undefined,    // ← No tool is forced. The model chooses.
})) { ... }

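This single line, toolChoice: undefined, carries the whole design philosophy:

No predefined workflow: nothing forces "search first, then read, then edit"
No step limits: the model may call zero, one, or many tools
No tool ordering: the model may edit before searching if it judges that reasonable
The loop is controlled by needsFollowUp: as long as the model keeps calling tools, the loop continues

"Bicycle vs. railroad" in concrete terms: many agent frameworks define a fixed pipeline of "Research Phase → Planning Phase → Implementation Phase → Testing Phase". Claude Code has nothing of the kind. The model can do anything at any moment: search, edit, run tests, ask the user a question. The harness provides tools and safety boundaries but does not dictate the order in which they are used.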

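04 Enablement Over Restriction: The Harness Lets the Model Do What It Otherwise Could Not

Idea: "Harness provides what model cannot" → Source: 7 enablement systems

The shift in understanding: the harness's first job is to enable

Most people look only at the harness's restricting functions (permissions, hooks, blocking). In Claude Code, though, the enablement code far outweighs the restriction code:

Enablement system	Model's inherent limitation	How the harness overcomes it
Persistent memory	The model starts every conversation from scratch	The Memory system plus CLAUDE.md persist knowledge across sessions. The harness pre-creates the directory, and the prompt tells the model to write directly without running mkdir
Context compaction	The context window is finite	Auto compact plus micro compact let a conversation run, in principle, indefinitely. The prompt tells the model that context is compressed automatically
Background tasks	The model can only do one thing at a time	AgentTool can spawn subagents that run in the background, and tasks are moved to the background automatically after 15 seconds. The model can advance several workstreams at once
Tool discovery	Every tool takes up context space	ToolSearch discovers tools on demand. Rarely used tools are marked `shouldDefer` and are not loaded into the initial prompt
Prompt caching	Every call pays full token cost	A static/dynamic prompt boundary plus CacheSafeParams shared across processes cut API cost substantially
File state tracking	The model does not know whether a file was modified externally	A ReadFileState cache plus staleness detection prevents overwriting edits the user made in their IDE
Speculative execution	Every write is irreversible	The Speculation Overlay sandbox lets the model try a change first and commit it only after the user confirms

Source evidence: the Memory system, where the harness eliminates wasted model actions

src/memdir/memdir.ts: "Claude was burning turns on mkdir"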

/**
 * Shipped because Claude was burning turns on `ls`/`mkdir -p` before writing.
 * Harness guarantees the directory exists via ensureMemoryDirExists().
 */
export const DIR_EXISTS_GUIDANCE =
  'This directory already exists — write to it directly with the Write tool ' +
  '(do not run mkdir or check for its existence).'

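This is a textbook case of harness enablement. The model used to waste API calls checking whether the directory existed. The fix was not to "train the model not to check" but to (1) have the harness pre-create the directory, a deterministic operation, and (2) tell the model in the prompt that the directory already exists. The two work together and the wasted turns disappear. The harness handles infrastructure; the model concentrates on reasoning.

Source evidence: the prompt tells the model "your abilities have been extended by the harness"

src/constants/prompts.ts: transparent collaboration

// The system prompt explicitly tells the model which harness capabilities exist: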

"The system will automatically compress prior messages in your conversation
 as it approaches context limits. This means your conversation with the user
 is not limited by the context window."

"Calling Agent without a subagent_type creates a fork, which runs in the
 background and keeps its tool output out of your context — so you can keep
 chatting with the user while it works."

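The model knows it has been given these capabilities: context is compressed automatically, and subagents can run in the background. This transparent collaboration lets the model make better use of what the harness provides.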
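
To make the compaction idea concrete, here is a minimal sketch under stated assumptions. The Msg type, the four-characters-per-token estimate, compactIfNeeded, and the keepRecent parameter are all invented for illustration, and the summarize callback stands in for whatever produces the summary in practice. This is not Claude Code's auto-compact logic, only the general technique: once the transcript exceeds a budget, replace the older messages with a summary and keep the most recent ones verbatim.

```typescript
type Msg = { role: "user" | "assistant" | "summary"; content: string };

// Crude token estimate: roughly 4 characters per token (an assumption for the sketch).
const estimateTokens = (msgs: Msg[]): number =>
  Math.ceil(msgs.reduce((n, m) => n + m.content.length, 0) / 4);

function compactIfNeeded(
  msgs: Msg[],
  limit: number,
  summarize: (older: Msg[]) => string,
  keepRecent = 2,
): Msg[] {
  // Under budget, or nothing old enough to fold away: leave the transcript alone.
  if (estimateTokens(msgs) <= limit || msgs.length <= keepRecent) return msgs;

  const older = msgs.slice(0, msgs.length - keepRecent);
  const recent = msgs.slice(msgs.length - keepRecent);
  return [{ role: "summary", content: summarize(older) }, ...recent];
}

// Four messages of 40 characters each: about 40 estimated tokens in total.
const history: Msg[] = [
  { role: "user", content: "a".repeat(40) },
  { role: "assistant", content: "b".repeat(40) },
  { role: "user", content: "c".repeat(40) },
  { role: "assistant", content: "d".repeat(40) },
];

const compacted = compactIfNeeded(
  history,
  20,
  (older) => `Summary of ${older.length} earlier messages`,
);
console.log(compacted.map((m) => m.role));
```

Because the summary replaces the oldest turns, the model benefits from being told that this happens, which is what the quoted prompt text does.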

05 External Verification: Self-Evaluation Is Unreliable

Idea: "Agents confidently praise mediocre work" → Source: Verification Agent

Idea: the generator and the evaluator must be separate

Anthropic's engineering blog states it plainly: "Self-evaluation fails. Agents confidently praise mediocre work when judging their own output. Tuning a standalone evaluator to be skeptical turns out to be far more tractable than making a generator critical of its own work."

Source evidence: the Verification Agent, an agent whose whole job is to find fault

src/tools/AgentTool/built-in/verificationAgent.ts: adversarial verification

const VERIFICATION_SYSTEM_PROMPT = `You are a verification specialist.
Your job is not to confirm the implementation works — it's to try to break it.

You have two documented failure patterns:

First, verification avoidance: when faced with a check, you find reasons
not to run it...

Second, being seduced by the first 80%: you see a polished UI or a
passing test suite and feel inclined to pass it, not noticing half the
buttons do nothing...

Your entire value is in finding the last 20%.`

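This prompt encodes lessons Anthropic learned in practice. "Verification avoidance" and "seduced by the first 80%" are failure modes actually observed in agents. The fix is not to make the main agent "stricter" but to spawn an independent agent whose prompt makes it a skeptic by default. This is Claude Code's implementation of generator-evaluator separation.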
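
The separation can be sketched as two calls with different system prompts. The Llm type, both prompt strings, generateWithReview, the PASS/FAIL reply convention, and the stub model are hypothetical; none of them come from the Claude Code source. The sketch only shows the structure: the evaluator sees the finished work, not the generator's reasoning, and its verdict drives another round.

```typescript
// Hypothetical model interface: a system prompt plus an input, returning text.
type Llm = (systemPrompt: string, input: string) => Promise<string>;

const GENERATOR_PROMPT = "Implement the requested change.";
const EVALUATOR_PROMPT =
  "You are a skeptical reviewer. Try to break the work. Reply PASS or FAIL: <reason>.";

async function generateWithReview(
  llm: Llm,
  task: string,
  maxRounds = 3,
): Promise<{ work: string; verdict: string; rounds: number }> {
  let work = "";
  let verdict = "FAIL: not started";
  let rounds = 0;

  while (rounds < maxRounds && !verdict.startsWith("PASS")) {
    rounds++;
    // After the first round, the generator receives the reviewer's objection as feedback.
    const input = rounds === 1 ? task : `${task}\nReviewer feedback: ${verdict}`;
    work = await llm(GENERATOR_PROMPT, input);
    // The evaluator is a separate call with its own prompt and only the work as input.
    verdict = await llm(EVALUATOR_PROMPT, work);
  }
  return { work, verdict, rounds };
}

// Stub model for the demo: the reviewer insists on tests, the generator adds them when told.
const stubLlm: Llm = async (systemPrompt, input) => {
  if (systemPrompt === EVALUATOR_PROMPT) {
    return input.includes("tests") ? "PASS" : "FAIL: no tests";
  }
  return input.includes("Reviewer feedback") ? "code + tests" : "code";
};
```

With the stub, the first round is rejected for missing tests and the second passes, so the generator's own opinion of its work never enters the decision.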

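Source evidence: Stop Hooks, deterministic external verification

Stop Hooks provide a second kind of external verification, one that relies on no LLM at all and instead runs deterministic shell commands.

src/query/stopHooks.ts: a hook can stop the model from declaring itself "done"

// Hook returns preventContinuation = true → the model is prevented from continuing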
if (result.preventContinuation) {
  stopReason = result.stopReason || 'Stop hook prevented continuation'
}

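For whatever the test suite covers, a Stop Hook configured to run npm test is more dependable than any LLM-based check: the tests either pass or they do not, and there is no "almost passing".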
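
The following sketch shows one way a deterministic gate of this kind could work. It is a simplification with assumed semantics: the StopHook and CommandResult types and evaluateStop are hypothetical, the command runner is injected so the example stays self-contained, and the rule "any non-zero exit code rejects the stop and returns the output as feedback" is an assumption rather than the behavior of stopHooks.ts.

```typescript
type CommandResult = { exitCode: number; output: string };

// Hypothetical hook: a command label plus a function that runs it.
type StopHook = { command: string; run: () => CommandResult };

// Decide whether the agent may stop. Any failing hook rejects the "done" claim.
function evaluateStop(hooks: StopHook[]): { allowStop: boolean; feedback: string[] } {
  const feedback = hooks
    .map((hook) => ({ hook, result: hook.run() }))
    .filter(({ result }) => result.exitCode !== 0)
    .map(
      ({ hook, result }) =>
        `${hook.command} failed (exit ${result.exitCode}): ${result.output}`,
    );
  return { allowStop: feedback.length === 0, feedback };
}

// Stubbed runners standing in for real shell execution.
const passingTests: StopHook = {
  command: "npm test",
  run: () => ({ exitCode: 0, output: "12 passing" }),
};
const failingLint: StopHook = {
  command: "npm run lint",
  run: () => ({ exitCode: 1, output: "2 errors" }),
};

console.log(evaluateStop([passingTests]));
console.log(evaluateStop([passingTests, failingLint]));
```

The verdict depends only on exit codes, so it cannot be talked into passing.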

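06 Evolving with the Model: "Teardown Markers" in the Code

Idea: "Capability reduces complexity" → Source: @[MODEL LAUNCH] markers

Idea: a good harness engineer is always ready to delete code

Anthropic is clear on this: as models improve, harness components that compensate for model weaknesses should be removed. Your harness code should carry an expiry date.

Source evidence: `@[MODEL LAUNCH]`, an explicit teardown marker

src/constants/prompts.ts: three real @[MODEL LAUNCH] markers

// Marker 1: the model over-comments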
// @[MODEL LAUNCH]: Update comment writing for Capybara —
// remove or soften once the model stops over-commenting by default

// Marker 2: the model is not thorough enough
// @[MODEL LAUNCH]: capy v8 thoroughness counterweight —
// un-gate once validated on external via A/B

// Marker 3: the model makes false claims
// @[MODEL LAUNCH]: False-claims mitigation for Capybara v8
// (29-30% FC rate vs v4's 16.7%)

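These markers are the most direct evidence of the harness's evolutionary philosophy. Each @[MODEL LAUNCH] marker means: "this code compensates for a weakness in the current model; when the next generation no longer has the problem, delete it." One even records concrete data (the false-claims rate rose from 16.7% to 29-30%), giving a future "should we remove this?" decision a quantitative basis.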
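
One way to turn such markers into a reviewable checklist is sketched below. In the source the markers are plain comments, so everything here is an assumption for illustration: the Mitigation shape, reviewMitigations, the metric names, the thresholds, and the measured rates, which are invented numbers and not Anthropic data.

```typescript
// Hypothetical record of a prompt mitigation and the evidence needed to retire it.
interface Mitigation {
  id: string;
  metric: string;      // the failure rate this mitigation is meant to reduce
  removeBelow: number; // retire once the unmitigated rate drops below this value
}

function reviewMitigations(
  mitigations: Mitigation[],
  measured: Partial<Record<string, number>>,
): { keep: string[]; remove: string[] } {
  const keep: string[] = [];
  const remove: string[] = [];
  for (const m of mitigations) {
    const rate = measured[m.metric];
    // With no measurement, keep the mitigation: removal needs evidence.
    if (rate !== undefined && rate < m.removeBelow) remove.push(m.id);
    else keep.push(m.id);
  }
  return { keep, remove };
}

// Invented thresholds and rates, for illustration only.
const review = reviewMitigations(
  [
    { id: "comment-writing", metric: "overCommenting", removeBelow: 0.1 },
    { id: "false-claims", metric: "falseClaims", removeBelow: 0.2 },
    { id: "thoroughness", metric: "shallowAnswers", removeBelow: 0.15 },
  ],
  { overCommenting: 0.05, falseClaims: 0.3 },
);
console.log(review);
```

Recording the threshold next to the mitigation is the code-level equivalent of writing the false-claims rate into the marker comment: the removal decision becomes a comparison instead of a debate.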

Source evidence: CLAUDE_CODE_SIMPLE, an ablation test for a 700-line prompt

src/constants/prompts.ts: full prompt vs. minimal prompt

// Normal mode: a 700+ line, carefully arranged system prompt
return [
  getSimpleIntroSection(),        // identity + style
  getSimpleSystemSection(),       // system rules
  getActionsSection(),            // safety guidelines
  getUsingYourToolsSection(),     // tool usage
  getSimpleToneAndStyleSection(), // output style
  getOutputEfficiencySection(),   // efficiency
  ...resolvedDynamicSections,     // memory + environment
]

// SIMPLE ablation mode: a single short string
if (isEnvTruthy(process.env.CLAUDE_CODE_SIMPLE)) {
  return [`You are Claude Code, Anthropic's official CLI for Claude.
CWD: ${getCwd()}
Date: ${getSessionStartDate()}`]
}

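This ablation switch answers a key question: how much are those 700 lines of prompt actually worth? If a future model is strong enough to behave correctly without detailed guidance, SIMPLE mode will suffice. This is the engineering groundwork for "the stronger the model, the thinner the harness".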

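07 Necessary Hard Boundaries: Where the Model Cannot Possibly Protect Itself

Idea: "Safety through alignment + defense in depth" → Source: hard-coding only where necessary

Idea: model alignment is the main body of safety; hard boundaries are defense in depth

Anthropic's safety philosophy is not to lock the model down with code, but to train the model itself to be trustworthy and let code provide defense in depth. Hard limits in the harness should exist only where the model cannot possibly protect itself:

The model does not know that a file was modified externally after it read it (stale writes)
The model may be induced by prompt injection to write to sensitive files (path protection)
The model does not know its own token consumption (context management)
The model cannot enforce its own sandbox (execution isolation)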
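
The first blind spot, stale writes, is a good example of a check only the harness can make. The sketch below is an assumption-laden illustration: ReadState, recordRead, and canWrite are hypothetical names, and comparing modification times is just one plausible staleness test, not necessarily what ReadFileState does.

```typescript
// Hypothetical cache: path → modification time (ms) observed at the last read.
type ReadState = Map<string, number>;

function recordRead(state: ReadState, path: string, mtimeMs: number): void {
  state.set(path, mtimeMs);
}

// Allow a write only if the file was read and has not changed since.
function canWrite(
  state: ReadState,
  path: string,
  currentMtimeMs: number,
): { ok: boolean; reason?: string } {
  const seen = state.get(path);
  if (seen === undefined) {
    return { ok: false, reason: "file has not been read yet" };
  }
  if (currentMtimeMs > seen) {
    return { ok: false, reason: "file changed since last read" };
  }
  return { ok: true };
}

const readState: ReadState = new Map();
recordRead(readState, "src/app.ts", 1_000);

console.log(canWrite(readState, "src/app.ts", 1_000)); // unchanged since the read
console.log(canWrite(readState, "src/app.ts", 2_000)); // edited elsewhere in the meantime
console.log(canWrite(readState, "src/other.ts", 500)); // never read
```

No amount of model reasoning can detect the second case, because the change happened outside the conversation; the information exists only on disk.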

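Source evidence: hard-coded protection covers only the model's blind spots

src/utils/permissions/filesystem.ts: guarding against risks the model cannot see

// These files are protected not because the model is "untrustworthy"
// but because prompt injection can bypass the model's judgment
export const DANGEROUS_FILES = [
  '.bashrc',       // shell startup → command-injection vector
  '.gitconfig',    // Git config → can embed malicious hooks
  '.ssh/config',   // SSH → credential-leak risk
]

// Harness-internal paths are allowed automatically, because the harness controls their content
// "this subtree is harness-controlled"
if (normalizedPath.startsWith(bundledSkillsRoot)) {
  return { behavior: 'allow' }  // ← trust harness-controlled paths
}

Note the asymmetry: dangerous paths must go to the user for approval (the model may have been fooled by prompt injection), while harness-internal paths are allowed automatically (the harness controls their content, so it is not exposed to outside contamination). Hard boundaries exist only where the model cannot judge for itself; everywhere else the model is trusted.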
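
The asymmetry can be expressed as a small decision function. This is a sketch, not the logic in filesystem.ts: checkWrite, normalize, the "ask" default, the check order, and the assumption of absolute POSIX paths are all introduced here for illustration. Only the three file names and the allow-for-harness-subtree idea come from the excerpt.

```typescript
type Decision = { behavior: "allow" | "ask"; reason: string };

const DANGEROUS_FILES = [".bashrc", ".gitconfig", ".ssh/config"];

// Collapse "." and ".." segments so "proj/../.bashrc" cannot slip past the check.
// Assumes absolute POSIX-style paths.
function normalize(path: string): string {
  const out: string[] = [];
  for (const seg of path.split("/")) {
    if (seg === "" || seg === ".") continue;
    if (seg === "..") out.pop();
    else out.push(seg);
  }
  return "/" + out.join("/");
}

function checkWrite(path: string, harnessRoot: string): Decision {
  const p = normalize(path);
  const root = normalize(harnessRoot);

  // Protected files always go to the user, wherever they live.
  if (DANGEROUS_FILES.some((f) => p.endsWith("/" + f))) {
    return { behavior: "ask", reason: "protected file" };
  }
  // Compare whole path segments so "/skills-evil" does not match "/skills".
  if (p === root || p.startsWith(root + "/")) {
    return { behavior: "allow", reason: "harness-controlled subtree" };
  }
  // Assumed default for the sketch: anything else is confirmed with the user.
  return { behavior: "ask", reason: "default" };
}

const root = "/opt/harness/skills";
console.log(checkWrite("/home/u/proj/../.bashrc", root));
console.log(checkWrite("/opt/harness/skills/notes.md", root));
console.log(checkWrite("/opt/harness/skills-evil/notes.md", root));
```

Normalizing before matching matters here: the protected-file check and the prefix check are both easy to bypass with ".." segments if they run on the raw string.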

08 Observable and Measurable: Ablation Is the Harness's Scientific Method

Idea: "Measure harness contribution independently" → Source: the ablation baseline

Source evidence: 7 independent switches

src/entrypoints/cli.tsx: "Harness-science L0 ablation baseline"

// Harness-science L0 ablation baseline.
// feature() gate DCEs this entire block from external builds.
if (feature('ABLATION_BASELINE') &&
    process.env.CLAUDE_CODE_ABLATION_BASELINE) {
  for (const k of [
    'CLAUDE_CODE_SIMPLE',                    // system prompt → minimal
    'CLAUDE_CODE_DISABLE_THINKING',           // disable thinking
    'DISABLE_INTERLEAVED_THINKING',           // disable interleaved thinking
    'DISABLE_COMPACT',                       // disable compaction
    'DISABLE_AUTO_COMPACT',                   // disable auto-compaction
    'CLAUDE_CODE_DISABLE_AUTO_MEMORY',        // disable memory
    'CLAUDE_CODE_DISABLE_BACKGROUND_TASKS',   // disable background tasks
  ]) {
    process.env[k] ??= '1'
  }
}

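Note feature('ABLATION_BASELINE'): in external builds this block is removed entirely by dead code elimination (DCE). Only Anthropic's internal builds can run the ablation; ordinary users always run the full harness.

Harness-science: the phrase in the source comment says it all. Anthropic treats harness evaluation as a scientific experiment. "Disabling thinking costs X% in performance", "disabling auto-compaction costs Y% in long-conversation success rate": the value of each component can be quantified. That is why Anthropic can say with confidence that "the harness matters as much as the model": the setup lets the claim be backed by data.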
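
One detail of the excerpt worth isolating is the `??=` operator, which assigns only when the current value is null or undefined. The sketch below applies the same defaulting to a plain object instead of process.env so it stays self-contained; applyAblationBaseline and the shortened key list are hypothetical.

```typescript
// A subset of the switch names from the excerpt, used here only as sample keys.
const ABLATION_KEYS = [
  "CLAUDE_CODE_SIMPLE",
  "DISABLE_COMPACT",
  "DISABLE_AUTO_COMPACT",
] as const;

type Env = Record<string, string | undefined>;

// Fill in every ablation switch that is not already set.
function applyAblationBaseline(env: Env): Env {
  const next: Env = { ...env };
  for (const key of ABLATION_KEYS) {
    next[key] ??= "1"; // leaves explicitly set values untouched
  }
  return next;
}

const baseline = applyAblationBaseline({ DISABLE_COMPACT: "0", PATH: "/usr/bin" });
console.log(baseline);
```

Because existing values survive, the baseline acts as a set of defaults rather than a forced configuration. An experimenter could switch everything off and then re-enable one component at a time, assuming the code that reads each variable treats a value such as "0" as off.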

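09 User-Programmable: Every Team Can Customize Its Own Harness

Idea: "Harness should be configurable" → Source: settings.json + 10 hook events

The harness is not a black box: settings.json is its programming interface

{
  // Permission rules = define the boundary of what the model can do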
  "permissions": {
    "allow": ["Bash(npm:*)", "Bash(git:*)"],
    "deny":  ["Bash(rm -rf:*)"]
  },

  // Hooks = inject deterministic logic at key points
  "hooks": {
    "PostToolUse": [{ "matcher": "Write|Edit",
      "hooks": [{ "type": "command", "command": "prettier --write $FILEPATH" }]
    }],
    "Stop": [{ "hooks": [{ "type": "command", "command": "npm test" }] }]
  }
}

Front-end teams add a prettier hook; back-end teams add an npm test hook; security teams add deny rules. Without changing a line of Claude Code, each team gets its own variant of the harness.
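
To show how rules written as "Tool(prefix:*)" could be evaluated, here is a sketch with assumed semantics. The parsing regex, the word-boundary prefix match, the deny-before-allow precedence, and the "ask" fallback are illustrative guesses, not a description of how Claude Code actually matches permission rules; parseRule, matches, and decide are hypothetical names.

```typescript
type Rule = { tool: string; prefix: string };
type Permissions = { allow: string[]; deny: string[] };

// Parse "Bash(npm:*)" into { tool: "Bash", prefix: "npm" }. Other shapes are ignored.
function parseRule(rule: string): Rule | null {
  const m = /^(\w+)\((.*):\*\)$/.exec(rule);
  if (!m) return null;
  const [, tool = "", prefix = ""] = m;
  return { tool, prefix };
}

// Match on a word boundary so the "npm" rule does not cover "npmx install".
function matches(rule: Rule, tool: string, command: string): boolean {
  return (
    rule.tool === tool &&
    (command === rule.prefix || command.startsWith(rule.prefix + " "))
  );
}

function decide(perms: Permissions, tool: string, command: string): "allow" | "deny" | "ask" {
  const parse = (rules: string[]): Rule[] =>
    rules.map(parseRule).filter((r): r is Rule => r !== null);

  // Assumed precedence: a deny rule wins over any allow rule.
  if (parse(perms.deny).some((r) => matches(r, tool, command))) return "deny";
  if (parse(perms.allow).some((r) => matches(r, tool, command))) return "allow";
  return "ask";
}

const perms: Permissions = {
  allow: ["Bash(npm:*)", "Bash(git:*)"],
  deny: ["Bash(rm -rf:*)"],
};

console.log(decide(perms, "Bash", "npm test"));
console.log(decide(perms, "Bash", "rm -rf /tmp/x"));
console.log(decide(perms, "Bash", "curl example.com"));
```

The rules are data, not code, which is what lets each team reshape the boundary without touching the harness itself.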

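The Final Synthesis: What Is the Harness, Really?

From Anthropic's ideas to Claude Code's source, one complete chain of mappings

The harness's three identities

Identity 1: Enabler (about 70% of the code)

The memory system, context compaction, background tasks, tool discovery, prompt caching, speculative execution. These turn the model from something that can chat into something that can program.

Identity 2: Guide (about 20% of the code)

The system prompt, tool descriptions, behavioral norms. These do not restrict the model; they use language to steer it toward the right choices. This is pushing intelligence into the model.

Identity 3: Guardian (about 10% of the code)

Permission checks, path protection, staleness detection. This layer steps in only where the model cannot possibly protect itself. It is the thinnest layer, and also the hardest.

The complete mapping from Anthropic's ideas to Claude Code's implementation

Anthropic's idea	Claude Code's implementation	Why
"Agents are just LLMs in loops"	`while(true) { callAPI → runTools }`, no state machine	Simple systems are more reliable and easier to debug
"Push intelligence into model"	A 700-line system prompt encodes behavioral norms instead of enforcing them in code	The model's reasoning is more flexible than if/else
"Bicycle, not railroad"	`toolChoice: undefined`, the model chooses tools freely	A predefined flow would limit the model's creative solutions
"Harness enables, not restricts"	7 enablement systems (memory, compaction, background tasks, discovery, and so on)	The model's potential needs infrastructure to be realized
"Self-evaluation fails"	Verification Agent + Stop Hooks	A generator cannot judge its own work
"Capability reduces complexity"	`@[MODEL LAUNCH]` teardown markers + SIMPLE ablation	Today's harness code may be tomorrow's dead code
"Safety = alignment + defense-in-depth"	DANGEROUS_FILES hard protection covers only the model's blind spots	Trust the model, and apply defense in depth to the parts that are hardest to control
"Measure harness independently"	7 ablation switches + feature-gate DCE	Quantify the ROI of each harness component scientifically
"Harness space moves, not shrinks"	Auto-compact circuit breaker (a data-driven cap of 3 tries)	Old problems get solved and new optimizations appear

In one sentence

The harness is not a cage; it is a bicycle.
It does not limit where the model can go; it lets the model ride farther.
Claude Code's source shows that the best harness is the kind you barely notice.
A while(true) loop, a toolChoice: undefined,
and a 700-line system prompt: that is the entire skeleton of a world-class AI agent.
Everything else exists to make that simple loop run more safely, for longer, and more cheaply.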
