Get started with Claude Managed Agents by following our docs.
A running theme on the Engineering Blog is how to build effective agents and design harnesses for long-running work. A common thread across this work is that harnesses encode assumptions about what Claude can't do on its own. However, these assumptions must be continually questioned because they go stale as models improve.
As just one example, in prior work we found that Claude Sonnet 4.5 would wrap up tasks prematurely as it sensed its context limit approaching, a behavior known as "context anxiety." We addressed this by adding context resets to the harness. But when we ran the same harness on Claude Opus 4.5, we found the behavior was gone. The resets had become dead weight.
We expect harnesses to keep evolving. So we built Managed Agents: a hosted service on the Claude Platform that runs long-horizon agents on your behalf through a small set of interfaces meant to outlast any particular implementation, including the ones we run today.
Building Managed Agents meant solving an old problem in computing: how to design a system for "programs as yet unthought of." Decades ago, operating systems solved this problem by virtualizing hardware into abstractions, such as the process and the file, general enough for programs that didn't exist yet. The abstractions outlasted the hardware: the read() system call is agnostic as to whether it's accessing a disk pack from the 1970s or a modern SSD. The abstractions on top stayed stable while the implementations beneath changed freely.
Managed Agents follow the same pattern. We virtualized the components of an agent: a session (the append-only log of everything that happened), a harness (the loop that calls Claude and routes Claude's tool calls to the relevant infrastructure), and a sandbox (an execution environment where Claude can run code and edit files). This allows the implementation of each to be swapped without disturbing the others. We're opinionated about the shape of these interfaces, not about what runs behind them.
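To make the separation concrete, here is a minimal sketch of the three virtualized components and how they compose. All names and shapes are illustrative assumptions, not the real Managed Agents API; the model call inside the harness loop is stubbed out.

```typescript
interface SessionEvent { seq: number; type: string; payload: string }

// The session: an append-only log of everything that happened.
class Session {
  private log: SessionEvent[] = [];
  emitEvent(type: string, payload: string): void {
    this.log.push({ seq: this.log.length, type, payload });
  }
  getEvents(start?: number, end?: number): SessionEvent[] {
    return this.log.slice(start, end); // positional slice; the log is never mutated
  }
}

// The sandbox: an execution environment behind a single tool-call shape.
interface Sandbox {
  execute(name: string, input: string): Promise<string>;
}

// The harness: the loop that calls Claude and routes tool calls to the
// sandbox. One step is shown; the routing shape is the point.
async function harnessStep(session: Session, sandbox: Sandbox): Promise<void> {
  session.emitEvent("tool_call", "ls");
  const result = await sandbox.execute("bash", "ls");
  session.emitEvent("tool_result", result);
}
```

Because each component only sees the others through these shapes, any one of them can be swapped out, exactly as the post describes.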
Don't adopt a pet
We started by placing all agent components into a single container, which meant the session, agent harness, and sandbox all shared one environment. There were benefits to this approach: file edits were direct syscalls, and there were no service boundaries to design.
But by coupling everything into one container, we ran into an old infrastructure problem: we'd adopted a pet. In the pets-vs-cattle analogy, a pet is a named, hand-tended individual you can't afford to lose, while cattle are interchangeable. In our case, the server became that pet: if a container failed, the session was lost. If a container was unresponsive, we had to nurse it back to health.
Nursing containers meant debugging unresponsive, stuck sessions. Our only window in was the WebSocket event stream, but that couldn't tell us where failures arose, which meant that a bug in the harness, a packet drop in the event stream, or a container going offline all looked the same. To figure out what went wrong, an engineer had to open a shell inside the container, but because that container often also held user data, that approach effectively meant we lacked the ability to debug.
A second challenge was that the harness assumed that whatever Claude worked on lived in the container with it. When customers asked us to connect Claude to their virtual private cloud, they had to either peer their network with ours or run our harness in their own environment. An assumption baked into the harness became a problem when we wanted to connect it to different infrastructure.
Decouple the brain from the hands
The solution we arrived at was to decouple what we thought of as the "brain" (Claude and its harness) from both the "hands" (sandboxes and tools that perform actions) and the "session" (the log of session events). Each became an interface that made few assumptions about the others, and each could fail or be replaced independently.
The harness leaves the container. Decoupling the brain from the hands meant the harness no longer lived inside the container. It called the container the way it called any other tool: execute(name, input) → string. The container became cattle. If the container died, the harness caught the failure as a tool-call error and passed it back to Claude. If Claude decided to retry, a new container could be reinitialized with a standard recipe: provision({resources}). We no longer had to nurse failed containers back to health.
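This error-handling shape can be sketched as follows. The provision() and execute() signatures follow the names in the post; the resource shape, stub container, and error string format are assumptions for illustration.

```typescript
type Resources = { cpu: number; memoryMb: number };

interface Container {
  execute(name: string, input: string): Promise<string>;
}

// Standard recipe: a fresh container is as good as the old one.
async function provision(resources: Resources): Promise<Container> {
  // In production this would boot a real sandbox; here, a stub.
  return {
    async execute(name, input) {
      return `ran ${name} with ${input}`;
    },
  };
}

// The harness routes a tool call and converts infrastructure failure
// into a tool-call error that Claude can see and react to.
async function callTool(
  container: Container,
  name: string,
  input: string,
): Promise<string> {
  try {
    return await container.execute(name, input);
  } catch (err) {
    // The failure is returned as a string, not rethrown: Claude decides
    // whether to retry, at which point a new container is provisioned.
    return `tool_error: ${String(err)}`;
  }
}
```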
Recovering from harness failure. The harness also became cattle. Because the session log sits outside the harness, nothing in the harness needs to survive a crash. When one fails, a new one can be rebooted with wake(sessionId), use getSession(id) to fetch the event log, and resume from the last event. During the agent loop, the harness writes to the session with emitEvent(id, event) to keep a durable record of events.
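A sketch of that recovery path, under stated assumptions: wake(), getSession(), and emitEvent() follow the names in the post, while the event shape and the in-memory store standing in for durable storage are hypothetical.

```typescript
interface Event { seq: number; payload: string }
interface SessionLog { id: string; events: Event[] }

// Durable store, outside any harness (here an in-memory stand-in).
const store = new Map<string, SessionLog>();

function emitEvent(id: string, payload: string): void {
  const log = store.get(id) ?? { id, events: [] };
  log.events.push({ seq: log.events.length, payload });
  store.set(id, log);
}

function getSession(id: string): SessionLog {
  return store.get(id) ?? { id, events: [] };
}

// A fresh harness boots with only a session id and resumes from the
// last durable event; nothing in the crashed harness had to survive.
function wake(sessionId: string): { resumeFrom: number } {
  const log = getSession(sessionId);
  const last = log.events[log.events.length - 1];
  return { resumeFrom: last ? last.seq + 1 : 0 };
}
```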
The security boundary. In the coupled design, any untrusted code that Claude generated ran in the same container as credentials, so a prompt injection only had to convince Claude to read its own environment. Once an attacker has those tokens, they can spawn fresh, unrestricted sessions and delegate work to them. Narrow scoping is an obvious mitigation, but it encodes an assumption about what Claude can't do with a limited token, and Claude is getting increasingly capable. The structural fix was to ensure the tokens are never reachable from the sandbox where Claude's generated code runs.
We used two patterns to ensure this. Auth can be bundled with a resource or held in a vault outside the sandbox. For Git, we use each repository's access token to clone the repo during sandbox initialization and wire it into the local git remote. Git push and pull work from inside the sandbox without the agent ever handling the token itself. For custom tools, we support MCP and store OAuth tokens in a secure vault. Claude calls MCP tools via a dedicated proxy; this proxy takes in a token associated with the session. The proxy can then fetch the corresponding credentials from the vault and make the call to the external service. The harness is never made aware of any credentials.
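The vault-backed proxy pattern might be sketched like this. The vault, token names, and call shape are all illustrative assumptions; the invariant being demonstrated is that credentials never flow back into the sandbox or harness.

```typescript
// Vault mapping session tokens to real credentials (illustrative values).
const vault = new Map<string, string>();
vault.set("session-token-abc", "oauth-secret-xyz");

// What the sandbox is allowed to send: a tool call plus its session token.
interface ProxiedCall { sessionToken: string; tool: string; input: string }

function mcpProxy(call: ProxiedCall): string {
  const credential = vault.get(call.sessionToken);
  if (!credential) return "proxy_error: unknown session token";
  // The proxy attaches the credential and calls the external service;
  // only the result string flows back across the boundary.
  return callExternalService(call.tool, call.input, credential);
}

// Stub for an authenticated upstream call.
function callExternalService(tool: string, input: string, credential: string): string {
  return `${tool}(${input}) ok`;
}
```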
The session is not Claude's context window
Long-horizon tasks often exceed the length of Claude's context window, and the standard ways to manage this all involve irreversible decisions about what to keep. We've explored these techniques in prior work on context engineering. For example, compaction lets Claude save a summary of its context window, and the memory tool lets Claude write context to files, enabling learning across sessions. This can be paired with context trimming, which selectively removes tokens such as old tool results or thinking blocks.
But irreversible decisions to selectively retain or discard context can lead to failures. It's difficult to know which tokens future turns will need. If messages are transformed by a compaction step, the harness removes the compacted messages from Claude's context window, and they are recoverable only if they're stored. Prior work has explored ways to manage this by storing context as an object that lives outside the context window. For example, context can be an object in a REPL that the LLM programmatically accesses by writing code to filter or slice it.
In Managed Agents, the session provides this same benefit, serving as a context object that lives outside Claude's context window. But rather than being stored within the sandbox or REPL, context is durably stored in the session log. The interface, getEvents(), allows the brain to interrogate context by selecting positional slices of the event stream. The interface can be used flexibly: the brain can pick up from wherever it last stopped reading, rewind a few events before a particular moment to see the lead-up, or reread context before a particular action.
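Those access patterns over a positional slice interface can be sketched as follows. getEvents() is the interface named in the post; the cursor and lead-up helpers around it are hypothetical.

```typescript
type Ev = { seq: number; text: string };

// Positional slice over the durable event stream (stand-in for getEvents()).
function getEvents(log: Ev[], start?: number, end?: number): Ev[] {
  return log.slice(start, end);
}

// Pick up from wherever the brain last stopped reading.
function readNew(log: Ev[], cursor: number): { events: Ev[]; cursor: number } {
  const events = getEvents(log, cursor);
  return { events, cursor: cursor + events.length };
}

// Rewind a few events before a particular moment to see the lead-up.
function leadUp(log: Ev[], seq: number, window = 3): Ev[] {
  return getEvents(log, Math.max(0, seq - window), seq);
}
```

Because slices are selected by position rather than by content, nothing is irreversibly discarded: the same events can be reread, re-sliced, or re-windowed later.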
Any fetched events can also be transformed in the harness before being passed to Claude's context window. These transformations can be whatever the harness encodes, including organizing context to achieve a high prompt cache hit rate and other context engineering. We separated the concerns of recoverable context storage in the session and arbitrary context management in the harness because we can't predict what specific context engineering future models will require. The interfaces push that context management into the harness, and only guarantee that the session is durable and available for interrogation.
Many brains, many hands
Many brains. Decoupling the brain from the hands solved one of our earliest customer complaints. When teams wanted Claude to work against resources in their own VPC, the only path was to peer their network with ours, because the container holding the harness assumed every resource sat next to it. Once the harness was no longer in the container, that assumption went away. The same change had a performance payoff. When we initially put the brain in a container, many brains required as many containers. For each brain, no inference could happen until that container was provisioned; every session paid the full container setup cost up front. Every session, even ones that would never touch the sandbox, had to clone the repo, boot the process, and fetch pending events from our servers.
That dead time shows up in time-to-first-token (TTFT), which measures how long a session waits between accepting work and producing its first response token. TTFT is the latency the user most acutely feels.
Decoupling the brain from the hands means that containers are provisioned by the brain via a tool call (execute(name, input) → string) only if they're needed. So a session that didn't need a container simply didn't wait for one. Inference could start as soon as the orchestration layer pulled pending events from the session log. With this architecture, our p50 TTFT dropped roughly 60% and p95 dropped over 90%. Scaling to many brains just meant starting many stateless harnesses and connecting them to hands only as needed.
Many hands. We also wanted the ability to connect each brain to many hands. In practice, this means Claude must reason about many execution environments and decide where to send work, a harder cognitive task than operating in a single shell. We started with the brain in a single container because earlier models weren't capable of this. As intelligence scaled, the single container became the limitation instead: when that container failed, we lost state for every hand the brain was reaching into.
Decoupling the brain from the hands makes each hand a tool, execute(name, input) → string: a name and input go in, and a string comes back. That interface supports any custom tool, any MCP server, and our own tools. The harness doesn't know whether the sandbox is a container, a phone, or a Pokémon emulator. And because no hand is coupled to any brain, brains can pass hands to one another.
Conclusion
The challenge we faced is an old one: how to design a system for "programs as yet unthought of." Operating systems have lasted decades by virtualizing hardware into abstractions general enough for programs that didn't exist yet. With Managed Agents, we aimed to design a system that accommodates future harnesses, sandboxes, and other components around Claude.
Managed Agents is a meta-harness in the same spirit, unopinionated about the specific harness that Claude will need in the future. Rather, it's a system with general interfaces that admit many different harnesses. For example, Claude Code is an excellent harness that we use broadly across tasks. We've also shown that task-specific agent harnesses excel in narrow domains. Managed Agents can accommodate any of these, matching Claude's intelligence over time.
Meta-harness design means being opinionated about the interfaces around Claude: we expect that Claude will need the ability to manipulate state (the session) and perform computation (the sandbox). We also expect that Claude will need the ability to scale to many brains and many hands. We designed the interfaces so that these can run reliably and securely over very long time horizons. But we make no assumptions about the number or location of brains or hands that Claude will need.
Acknowledgements
Written by Lance Martin, Gabe Cemaj, and Michael Cohen. Thanks to Nodir Turakulov and Jeremy Fox for helpful conversations on these topics. Special thanks to the Agents API team and Jake Eaton for their contributions.