The human-agent contract

The human should not need to babysit the runtime. The human sets the task, the scope, and the authority. The runtime carries out the work within policy and keeps it visible.

Hierarchy: OpenTool defines the platform and policy model. > tool_os is the workspace where that model becomes visible. Autopilot handles bounded autonomous work. Arbiter watches and routes at admin scope.

  • Human: define the task, define the authority, and watch the runtime.
  • Runtime: act only through declared tools, workflows, and guardrails.
  • Trust: comes from visible state, visible policy decisions, and visible exceptions.
  • Arbiter: the standing admin runtime that watches the system and routes work into bounded lanes.

The agentic loop

Every agent interaction in > tool_os follows a loop. The agent receives your message, reasons about what to do, requests tool calls, and the system evaluates those requests before executing them. This loop repeats until the task is complete.

Prompt → Reason → Request tool → Policy check → Execute → Return result → Loop

The critical difference from other agent systems is that step 4, the policy check, always happens. Every tool call passes through the policy engine before execution. There is no bypass. The agent cannot skip this step, and neither can the developer. This is Law 1.
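The loop can be sketched in a few lines of Python to make the structural point concrete. This is a minimal illustration under stated assumptions, not the runtime's actual code: ToolRequest, Decision, and run_loop are hypothetical names, and the reasoning, policy, and execution steps are passed in as plain functions. What it shows is that the only path from a tool request to execution runs through policy_check, and that no parameter exists to skip it.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical types for illustration; not the runtime's actual data model.
@dataclass
class ToolRequest:
    name: str
    input: dict

@dataclass
class Decision:
    outcome: str  # e.g. "ALLOW" or "BLOCK"
    reason: str = ""

def run_loop(
    reason: Callable[[List[Tuple[str, str]]], Optional[ToolRequest]],
    policy_check: Callable[[ToolRequest], Decision],
    execute: Callable[[ToolRequest], str],
    prompt: str,
    max_turns: int = 10,
) -> List[Tuple[str, str]]:
    """Run prompt -> reason -> request -> policy check -> execute -> result until done."""
    context: List[Tuple[str, str]] = [("user", prompt)]
    for _ in range(max_turns):
        request = reason(context)  # reason, then (optionally) request a tool
        if request is None:  # no tool requested: the task is complete
            break
        decision = policy_check(request)  # step 4: runs on every request, no skip flag
        if decision.outcome == "ALLOW":
            result = execute(request)  # only reachable after an ALLOW
        else:
            result = f"{decision.outcome}: {decision.reason}"  # agent sees why it was refused
        context.append(("tool_result", result))  # return the result, then loop
    return context

# Usage: one allowed browse call, then the agent stops requesting tools.
history = run_loop(
    reason=lambda ctx: ToolRequest("browse", {"url": "https://news.ycombinator.com"}) if len(ctx) == 1 else None,
    policy_check=lambda req: Decision("ALLOW"),
    execute=lambda req: "page content",
    prompt="Summarize the front page",
)
print(history)
```

Because execute is only called inside the ALLOW branch, a refused request turns into an explanatory result in the agent's context instead of an action.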

Law 1: Every tool invocation passes through the policy engine. No exceptions. No bypass. No admin override that skips evaluation. The policy engine is gravity — it applies to everything.

What tools the agent has

The agent's tool set is determined by the intersection of what the manifest declares and what the entity's permissions allow. Available tools include:

  • chat: Send and receive messages with the user. The primary communication channel.
  • read_file: Read contents of a file from the workspace file system.
  • write_file: Write or create a file in the workspace. Policy-gated for sensitive paths.
  • list_files: List directory contents. Used for navigation and discovery.
  • browse: Navigate to a URL via the server-side proxy. Returns page content.
  • screenshot: Capture the current state of a browser page or workspace window.
  • search: Search file contents, workspace state, or external sources.
  • shell: Execute a shell command in a sandboxed environment. Heavily governed.
  • task_create: Create a task on the task board. Structured output for work tracking.
  • canvas_edit: Create or modify canvas layouts. Manipulates Slab and Block elements.
  • memory_store: Save context to agent memory for recall in future sessions.
  • mcp_call: Call any registered MCP server tool. Namespaced as mcp__[server]__[tool].

Additional tools come from installed plugins and MCP servers. Each one ships with an opentool.json manifest that declares its capabilities and required permissions.
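The intersection rule stated at the top of this section reduces to a set operation. This sketch assumes a hypothetical opentool.json shape with a top-level "tools" array and models entity permissions as a plain set of tool names; the real manifest schema and permission strings (such as shell:destructive) are richer than this.

```python
import json
from typing import Set

# Hypothetical opentool.json content; the real schema may differ.
MANIFEST_JSON = """
{
  "name": "workspace-agent",
  "tools": ["chat", "read_file", "write_file", "browse", "shell"]
}
"""

def effective_tools(manifest: dict, entity_permissions: Set[str]) -> Set[str]:
    """Tools the agent actually gets: declared by the manifest AND allowed for the entity."""
    declared = set(manifest.get("tools", []))
    return declared & entity_permissions

manifest = json.loads(MANIFEST_JSON)
# Hypothetical entity: it may search, but the manifest never declares search,
# and the manifest declares write_file and shell, which this entity may not use.
analyst_permissions = {"chat", "read_file", "browse", "search"}
print(sorted(effective_tools(manifest, analyst_permissions)))  # ['browse', 'chat', 'read_file']
```

Neither side can widen the tool set on its own: a manifest cannot grant tools the entity lacks permission for, and a permission is useless for a tool the manifest does not declare.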

How tool calls work

When the agent decides to use a tool, it emits a structured tool_use event with the tool name and input parameters. This is not free-form text — it is a typed request that the system can parse, evaluate, and audit.

agent emits tool_use:
{
  "name": "browse",
  "input": {
    "url": "https://news.ycombinator.com"
  }
}
 
PolicyEngine: ALLOW — browse permitted for this entity
Executing browse via server-side proxy...
Result returned to agent context
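Because a tool_use event is typed, the system can validate it before policy evaluation and log it in a canonical form. A minimal sketch, assuming the event is a JSON object with a string "name" and an object "input" as in the example above; ToolUse, from_json, and audit_record are hypothetical names.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict

@dataclass(frozen=True)
class ToolUse:
    """Hypothetical typed view of a tool_use event."""
    name: str
    input: Dict[str, Any]

    @classmethod
    def from_json(cls, raw: str) -> "ToolUse":
        data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on malformed text
        if not isinstance(data, dict):
            raise ValueError("tool_use must be a JSON object")
        name = data.get("name")
        tool_input = data.get("input", {})
        if not isinstance(name, str) or not name:
            raise ValueError("tool_use.name must be a non-empty string")
        if not isinstance(tool_input, dict):
            raise ValueError("tool_use.input must be a JSON object")
        return cls(name=name, input=tool_input)

    def audit_record(self) -> str:
        # Key-sorted JSON, so identical requests always produce identical audit lines.
        return json.dumps({"name": self.name, "input": self.input}, sort_keys=True)

event = ToolUse.from_json('{"name": "browse", "input": {"url": "https://news.ycombinator.com"}}')
print(event.name, event.input["url"])
print(event.audit_record())
```

Anything that fails validation is rejected before it ever reaches the policy engine, so the engine and the audit log only see well-formed requests.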

The policy engine

Every tool call is intercepted by the PolicyEngine before execution. The policy engine evaluates the request against the current entity's role, the tool's manifest, and any org-wide guardrails. It returns one of four outcomes:

  • ALLOW — The tool call is permitted. Execution proceeds. The most common outcome for well-configured entities.
  • BLOCK — The tool call is denied. The agent receives an error message explaining why. Example: an agent trying to delete a production database without the infra:destructive permission.
  • QUARANTINE — The tool call is held for human review. It appears in the Quarantine app surface where an admin can approve or reject it. Used for high-risk operations that need a human in the loop.
  • REDIRECT — The tool call is rerouted. The policy engine substitutes a safer alternative — for example, redirecting a write to a sandbox environment instead of production.

agent emits tool_use:
{
  "name": "shell",
  "input": {
    "command": "rm -rf /var/data/*"
  }
}
 
PolicyEngine: BLOCK
Reason: destructive shell command matches guardrail pattern
Entity lacks permission: shell:destructive
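The four outcomes map naturally onto an enum returned by a single evaluate function. The sketch below is illustrative only: the guardrail regex, the /prod/ and /sandbox/ path prefixes, and the rule that a destructive command from an entity holding shell:destructive is quarantined rather than allowed are all assumptions chosen to exercise each outcome, not documented behavior. The BLOCK reason mirrors the example output above.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Set

class Outcome(Enum):
    ALLOW = "ALLOW"
    BLOCK = "BLOCK"
    QUARANTINE = "QUARANTINE"
    REDIRECT = "REDIRECT"

@dataclass
class Decision:
    outcome: Outcome
    reason: str
    new_input: Optional[dict] = None  # only set for REDIRECT

# Hypothetical org-wide guardrail: recursive rm or mkfs in a shell command.
DESTRUCTIVE_SHELL = re.compile(r"\brm\s+-\w*r|\bmkfs\b")

def evaluate(tool: str, tool_input: dict, permissions: Set[str]) -> Decision:
    """Evaluate one tool call against entity permissions and guardrails."""
    if tool not in permissions:
        return Decision(Outcome.BLOCK, f"Entity lacks permission: {tool}")
    if tool == "shell" and DESTRUCTIVE_SHELL.search(tool_input.get("command", "")):
        if "shell:destructive" not in permissions:
            return Decision(Outcome.BLOCK,
                            "destructive shell command matches guardrail pattern; "
                            "entity lacks permission: shell:destructive")
        # Assumption: even a permitted destructive command waits for a human.
        return Decision(Outcome.QUARANTINE, "held for admin review")
    path = tool_input.get("path", "")
    if tool == "write_file" and path.startswith("/prod/"):
        # Reroute the write to a sandbox copy of the same relative path.
        safer = dict(tool_input, path="/sandbox/" + path[len("/prod/"):])
        return Decision(Outcome.REDIRECT, "production write rerouted to sandbox", safer)
    return Decision(Outcome.ALLOW, f"{tool} permitted for this entity")

print(evaluate("shell", {"command": "rm -rf /var/data/*"}, {"shell"}).outcome.value)  # BLOCK
```

Returning a structured Decision rather than a bare boolean is what lets the caller route a call to the Quarantine queue or swap in a redirected input, and lets every decision carry a reason into the audit trail.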

Law 1: Policy engine is gravity

The governance model is built on three laws. The first law is the foundation:

Law 1 — Gravity: The policy engine is physics, not policy. It applies to every entity, every tool call, every time. There is no admin escape hatch. There is no "trusted" mode that skips evaluation. If you can bypass the policy engine, the entire governance model collapses.

This is an intentional design constraint. Many governance tools offer admin overrides or "break glass" mechanisms. We don't. If a tool call is blocked, you change the policy — you don't skip the evaluation. The audit trail is complete because there are no exceptions.

The circuit breaker

Emergency Stop
Press Escape at any time to immediately halt all agent operations. The circuit breaker also activates automatically on runaway output detection — if the agent produces excessive output in a short time window, execution is paused and the user is prompted to continue or cancel. This is a hard stop, not a suggestion. The agent cannot override the circuit breaker.

The circuit breaker exists because agents can enter loops, misinterpret instructions, or attempt operations at a scale the user didn't intend. The stop button is always available, always works, and always takes precedence over whatever the agent is doing.
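Runaway output detection can be modeled as a sliding window over recent output volume. A minimal sketch, assuming output is measured in characters and using made-up default thresholds (20,000 characters in 5 seconds); the real detector's metric and limits are not specified here, and CircuitBreaker with its methods is a hypothetical name. A manual stop halts for good, while a runaway trip pauses until the user chooses to continue. In a real system the agent would have no handle on this object at all; the sketch only records that intent in comments.

```python
import time
from collections import deque
from typing import Callable, Deque, Tuple

RUNAWAY = "runaway output detected"
MANUAL = "manual stop"

class CircuitBreaker:
    """Pauses on runaway output in a sliding time window; halts on manual stop."""

    def __init__(self, max_chars: int = 20_000, window_seconds: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_chars = max_chars            # hypothetical threshold
        self.window_seconds = window_seconds  # hypothetical window length
        self.clock = clock
        self._events: Deque[Tuple[float, int]] = deque()
        self._window_total = 0
        self.tripped = False
        self.reason = ""

    def stop(self) -> None:
        """Manual stop (the Escape key): takes precedence over anything in progress."""
        self.tripped, self.reason = True, MANUAL

    def record_output(self, text: str) -> bool:
        """Record a chunk of agent output; return True if the agent may keep going."""
        if self.tripped:
            return False
        now = self.clock()
        self._events.append((now, len(text)))
        self._window_total += len(text)
        # Evict chunks older than the window so only recent output counts.
        while self._events and now - self._events[0][0] > self.window_seconds:
            _, size = self._events.popleft()
            self._window_total -= size
        if self._window_total > self.max_chars:
            self.tripped, self.reason = True, RUNAWAY
        return not self.tripped

    def user_continue(self) -> None:
        """The user's 'continue' answer re-arms a runaway pause; a manual stop stays."""
        if self.reason == RUNAWAY:
            self._events.clear()
            self._window_total = 0
            self.tripped, self.reason = False, ""

# Usage with a fake clock: 11 characters at t=0 exceeds a 10-character limit.
now = [0.0]
breaker = CircuitBreaker(max_chars=10, window_seconds=1.0, clock=lambda: now[0])
print(breaker.record_output("hello"), breaker.record_output("world!"))  # True False
```

Injecting the clock keeps the window logic testable; in production, time.monotonic is the safer default because it never jumps backwards.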

Audit trail

Every step of the agentic loop is instrumented with OpenTelemetry. Every agent turn is a span. Every tool call is a child span. Every policy decision is recorded. This means:

  • You can reconstruct any agent session after the fact
  • You can see exactly which tools were called, with what parameters, and what the policy engine decided
  • You can correlate agent actions with infrastructure events
  • The Replay app surface provides a timeline view of any session
  • Audit data exports to your existing observability stack via OTLP
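The span hierarchy can be illustrated without any tracing library. This is a plain-Python stand-in, not the OpenTelemetry API: SessionRecorder, the attribute names (turn_index, tool, policy_decision), and the logical clock are hypothetical. It records agent turns as root spans and tool calls as child spans carrying the policy decision, then rebuilds an indented timeline from the parent links, which is the same kind of after-the-fact reconstruction described above.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class Span:
    span_id: int
    parent_id: Optional[int]
    name: str
    start: int  # logical timestamp, for a deterministic sketch
    attributes: Dict[str, str] = field(default_factory=dict)

class SessionRecorder:
    """Stand-in for a tracer: turns are root spans, tool calls are child spans."""

    def __init__(self) -> None:
        self.spans: List[Span] = []
        self._ids = itertools.count(1)
        self._clock = itertools.count()

    def start_span(self, name: str, parent: Optional[Span] = None, **attributes: str) -> Span:
        parent_id = parent.span_id if parent is not None else None
        span = Span(next(self._ids), parent_id, name, next(self._clock), dict(attributes))
        self.spans.append(span)
        return span

    def timeline(self) -> List[str]:
        """Rebuild the session as an indented tree, ordered by start time."""
        children: Dict[Optional[int], List[Span]] = {}
        for span in self.spans:
            children.setdefault(span.parent_id, []).append(span)
        lines: List[str] = []

        def walk(parent_id: Optional[int], depth: int) -> None:
            for span in sorted(children.get(parent_id, []), key=lambda sp: sp.start):
                attrs = " ".join(f"{k}={v}" for k, v in sorted(span.attributes.items()))
                lines.append(("  " * depth + f"{span.name} {attrs}").rstrip())
                walk(span.span_id, depth + 1)

        walk(None, 0)
        return lines

rec = SessionRecorder()
turn = rec.start_span("agent.turn", turn_index="1")
rec.start_span("tool.call", turn, tool="browse", policy_decision="ALLOW")
rec.start_span("tool.call", turn, tool="shell", policy_decision="BLOCK")
print("\n".join(rec.timeline()))
```

With the real OpenTelemetry SDK, the same parent/child structure comes from nesting tracer.start_as_current_span calls, and an OTLP exporter ships the finished spans to an existing observability backend.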