harness-engineering

하네스 엔지니어링이란 무엇인가?

에이전트 하네스는 "AI가 스스로 못하는 것"을 전제로 만든 보조 장치인데, AI 능력이 향상되면 그 전제가 낡아진다. 하네스 엔지니어링은 이 보조 장치를 AI 발전에 맞게 지속적으로 조정하는 작업이다.

Agent harnesses encode assumptions about what Claude can't do on its own, but those assumptions grow stale as Claude gets more capable.

Claude Messages API는 왜 매 턴마다 전체 대화를 다시 보내야 하는가?

Messages API는 상태를 기억하지 않는(stateless) 방식이라, AI가 이전 대화를 기억하지 못한다. 따라서 매번 요청할 때마다 하네스가 이전 대화 내용, 도구 설명, 지시사항을 모두 함께 보내야 한다.

The Messages API is stateless. Claude cannot see the conversation history of prior turns. This means that the agent harness needs to package new context alongside all past actions, tool descriptions, and instructions for Claude at each turn.

Claude API에서 prompt caching 비용을 최대화하려면 context를 어떤 순서로 배치해야 하는가?

변하지 않는 내용(시스템 프롬프트, 도구 설명)을 앞에, 매번 바뀌는 내용을 뒤에 배치해야 캐시 적중률이 높아져 비용이 절감된다. 캐시된 토큰은 일반 입력 비용의 10%만 청구되기 때문이다.

Since cached tokens are 10% the cost of base input tokens, here are a few principles in the agent harness help maximize cache hits: Static first, dynamic last — Order requests so that stable content (system prompt, tools) come first. Messages for updates — Append a <system-reminder> in messages instead of editing the prompt.

Claude API 세션 중간에 모델을 변경하면 안 되는 이유는?

캐시는 모델별로 별도 관리되므로, 세션 도중 모델을 바꾸면 기존 캐시가 모두 무효화되어 비용이 급증한다. 저렴한 모델이 필요하면 별도의 서브에이전트를 사용하는 것이 낫다.

Don't change models — Avoid switching models during a session. Caches are model-specific; switching breaks them. If you need a cheaper model, use a subagent.