Introduction
Anthropic has unveiled a multi-agent harness designed to support long-running autonomous application development. The approach targets both frontend design and full-stack software creation, aiming to improve the efficiency and quality of AI-driven projects.
Key Features of the Three-Agent Harness
The three-agent harness divides tasks among distinct agents, each responsible for specific functions: planning, generation, and evaluation. This separation is crucial for maintaining coherence and improving output quality during extended AI sessions, which can last for several hours.
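The division of labour described above can be sketched as a simple control loop. This is a minimal illustration, not Anthropic's actual implementation: the three agent roles are stubbed as plain functions, and the score threshold and step count are assumptions.

```python
# Hypothetical sketch of a three-agent loop: planner, generator, evaluator.
# In a real harness, each function would invoke a separate LLM agent.

def plan(task: str) -> list[str]:
    """Planner: break the task into ordered steps (stubbed)."""
    return [f"step {i}: {task}" for i in range(1, 4)]

def generate(step: str) -> str:
    """Generator: produce an artifact for one step (stubbed)."""
    return f"artifact for {step}"

def evaluate(artifact: str) -> tuple[int, str]:
    """Evaluator: score the artifact and return a critique (stubbed)."""
    return 8, f"critique of {artifact}"

def run(task: str, threshold: int = 7) -> list[str]:
    """Drive the plan -> generate -> evaluate cycle to completion."""
    outputs = []
    for step in plan(task):
        artifact = generate(step)
        score, critique = evaluate(artifact)
        # The generator revises until the evaluator's score clears the bar.
        while score < threshold:
            artifact = generate(f"{step} (revised per: {critique})")
            score, critique = evaluate(artifact)
        outputs.append(artifact)
    return outputs
```

Keeping evaluation in a separate function (and, in practice, a separate agent) is what allows the loop to iterate without the generator grading its own work.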
Addressing Common Challenges
One of the primary challenges in autonomous coding workflows is context loss, which can lead to premature task termination. To combat this, Anthropic engineers have implemented context resets and structured handoff artifacts. These features allow the next agent in the workflow to continue from a defined state, rather than starting anew.
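A handoff artifact of this kind might look like the following sketch. The field names and file format are illustrative assumptions, not Anthropic's actual schema; the point is that state is serialised before a context reset so the next agent can resume from it.

```python
import json

# Hypothetical handoff artifact written before a context reset.
# Field names here are illustrative, not an actual Anthropic schema.
handoff = {
    "task_id": "feature-042",
    "completed_steps": ["scaffold project", "implement auth routes"],
    "next_step": "add integration tests for auth",
    "open_issues": ["login form lacks validation"],
    "files_touched": ["src/auth.ts", "src/routes.ts"],
}

with open("handoff.json", "w") as f:
    json.dump(handoff, f, indent=2)

# In the next session, the incoming agent reloads the defined state
# rather than starting anew:
with open("handoff.json") as f:
    state = json.load(f)

print(state["next_step"])
```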
Self-Evaluation Mechanism
Another significant focus of the harness is self-evaluation of outputs. Agents often tend to overrate their results, especially on subjective tasks like design. To mitigate this issue, Anthropic has introduced a separate evaluator agent, which is calibrated with few-shot examples and scoring criteria. Prithvi Rajasekaran, engineering lead at Anthropic Labs, emphasises that separating the agent performing the work from the agent judging it is a powerful strategy for improving output quality.
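Calibrating an evaluator with few-shot examples and scoring criteria could be sketched as prompt construction like the following. The rubric wording, example outputs, and scores are invented for illustration; only the technique (few-shot calibration of a separate judge) comes from the article.

```python
# Illustrative calibration of a separate evaluator agent with few-shot
# examples and explicit scoring criteria. All content is hypothetical.

RUBRIC = (
    "Score 1-10 on each criterion: design quality, originality, "
    "craft, functionality. Justify each score in one sentence."
)

FEW_SHOT = [
    {"output": "plain unstyled form, no error handling", "score": 3,
     "rationale": "functional but minimal craft and no originality"},
    {"output": "polished dashboard with consistent spacing and states",
     "score": 9, "rationale": "strong craft and complete functionality"},
]

def build_evaluator_prompt(candidate: str) -> str:
    """Assemble the rubric, calibration examples, and candidate output."""
    examples = "\n".join(
        f"Example output: {ex['output']}\n"
        f"Score: {ex['score']} ({ex['rationale']})"
        for ex in FEW_SHOT
    )
    return f"{RUBRIC}\n\n{examples}\n\nNow evaluate:\n{candidate}"
```

The calibration examples anchor the judge's scale, which helps counter the tendency of agents to overrate their own results on subjective tasks.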
Frontend Design Evaluation
For frontend design tasks, the team has established four grading criteria: design quality, originality, craft, and functionality. The evaluator agent navigates live pages, interacts with the interface using Playwright MCP, and provides detailed critiques to guide the generator in iterative cycles. Each cycle produces progressively refined outputs, with iterations ranging from five to fifteen per run, sometimes taking up to four hours.
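The four grading criteria and the bounded iteration cycle could be modelled as follows. The equal weighting, the pass bar of 8.0, and the iteration cap are assumptions; only the four criteria names and the five-to-fifteen iteration range come from the article.

```python
from dataclasses import dataclass

# Hypothetical representation of one evaluation cycle using the four
# criteria the article lists. Weighting and thresholds are assumptions.

@dataclass
class DesignScores:
    design_quality: float
    originality: float
    craft: float
    functionality: float

    def overall(self) -> float:
        """Unweighted mean of the four criteria (an assumption)."""
        return (self.design_quality + self.originality
                + self.craft + self.functionality) / 4

def should_iterate(scores: DesignScores, bar: float = 8.0,
                   iteration: int = 0, max_iterations: int = 15) -> bool:
    """Keep refining until the bar is met or the iteration cap is hit."""
    return scores.overall() < bar and iteration < max_iterations
```

In the real workflow the evaluator's detailed critiques, not just the scores, feed back to the generator for the next refinement pass.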
Industry Feedback
Industry practitioners have praised the structured approach of the three-agent harness. Artem Bredikhin noted on LinkedIn that long-running AI agents often fail due to context loss, stating that the breakthrough lies in the structured framework, which includes JSON feature specs, enforced testing, and a commit-by-commit progress system. Raghus Arangarajan also commented that the three-agent framework offers a repeatable workflow for multi-hour sessions, ensuring that evaluation and iteration are distinct from generation, thereby enhancing reliability and output quality.
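A JSON feature spec of the kind Bredikhin describes might look like this sketch. The schema is entirely hypothetical; it only illustrates how acceptance criteria, required tests, and a commit message could be bundled into one machine-readable artifact.

```python
import json

# Illustrative JSON feature spec; the schema is hypothetical and not
# taken from Anthropic's harness or any practitioner's actual setup.
spec = {
    "feature": "user login",
    "acceptance_criteria": [
        "valid credentials redirect to /dashboard",
        "invalid credentials show an inline error",
    ],
    "tests_required": ["tests/test_login.py"],
    "commit_message": "feat: add user login flow",
}

# A harness could refuse to commit until tests_required all pass,
# giving the commit-by-commit progress system its enforcement hook.
print(json.dumps(spec, indent=2))
```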
Performance Improvements
Anthropic engineers have applied the three-agent framework across various task types to assess performance improvements. They found that separating planning, generation, and evaluation allows for better handling of subjective assessments while maintaining reproducibility in objective tasks. The structured multi-agent workflow also facilitates incremental progress during long-running sessions by clearly defining responsibilities and handoffs between agents.
Operational Considerations
To effectively implement this workflow, teams must establish evaluation criteria and calibrate scoring mechanisms while monitoring iterative output. Although agents can execute evaluations automatically, human oversight remains essential for initial calibration and quality validation. The workflow supports distributed processing of tasks, allowing multiple agents to operate in parallel or sequentially based on dependencies.
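Scheduling agents "in parallel or sequentially based on dependencies" is a topological-ordering problem, sketched here with Python's standard-library `graphlib`. The task names and dependency graph are illustrative assumptions.

```python
from graphlib import TopologicalSorter

# Minimal sketch of dependency-driven scheduling: tasks in the same
# ready batch could run in parallel, dependent tasks must wait.
# Task names and edges are illustrative, not from the article.
deps = {
    "generate_frontend": {"plan"},
    "generate_backend": {"plan"},
    "evaluate": {"generate_frontend", "generate_backend"},
}

ts = TopologicalSorter(deps)
ts.prepare()

schedule = []
while ts.is_active():
    batch = list(ts.get_ready())  # all tasks currently unblocked
    schedule.append(batch)
    ts.done(*batch)               # mark the batch complete

for batch in schedule:
    print(batch)
```

Here planning runs first, the two generation tasks are unblocked together, and evaluation waits for both, mirroring the sequential-then-parallel pattern the workflow supports.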
Future Implications
As AI models continue to evolve, the role of the harness may shift, with next-generation models potentially handling some tasks directly. Improved models will also enable the harness to tackle more complex work. Engineers are encouraged to experiment, monitor traces, decompose tasks, and adjust harnesses as the landscape of AI capabilities evolves.
Conclusion
Anthropic's three-agent harness represents a notable step forward in AI-assisted development. By addressing context loss and self-evaluation bias in long-running sessions, the approach aims to improve the quality and reliability of AI-driven projects.