BoringOS runs your agents as the CLI tools you already use — Claude Code, Codex, Gemini, Ollama — and wires them into tasks, workflows, memory, and a multi-tenant backend. Budgets, audit trails, and human approvals live in the execution path, not bolted on after.
The open-source agent framework built by Hebbs. One command boots the whole thing locally.
Boots embedded Postgres and serves the shell at localhost:3000. No Docker, no external services.
The shell is the reference app that ships with the repo, a complete operating surface for your agents. You don't build any of it to start. Run the command above and this is live at localhost:3000.
A team of agents with roles, hierarchy, and delegation — already wired.
A chat surface in every app that can run tools and edit your code.
An email, Slack, and event triage queue that agents work from.
Visual DAG runner over the same tools your agents call.
Tenant-isolated file storage with per-agent ACLs.
Spend caps per tenant, agent, and task — enforced at runtime.
A full example app — deals, contacts, schema, UI — shipped as a Module.
Drop in new apps as signed bundles, live, no restart.
Every connector, every app, every built-in capability is made of these. Learn them once, and you can read — or write — any part of the system.
Behavior, in markdown.
Plain .md files concatenated into the agent's system prompt on every wake. Teach it when to use a tool, the edge cases, your house style. No templating — just words.
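A minimal sketch of that concatenation step, assuming a skill is just a file name plus its markdown body. `SkillFile` and `buildSystemPrompt` are hypothetical names for illustration, not the framework's API; only the `## Skills` heading comes from the docs.

```typescript
// Hypothetical shape: a skill is a file name plus raw markdown.
interface SkillFile {
  name: string; // e.g. "skills/deals.md"
  body: string; // plain markdown, no templating
}

// Concatenate every skill under a "## Skills" heading and append it to the base prompt.
function buildSystemPrompt(basePrompt: string, skills: SkillFile[]): string {
  if (skills.length === 0) return basePrompt;
  const section = skills
    .map((s) => `<!-- ${s.name} -->\n${s.body.trim()}`)
    .join("\n\n");
  return `${basePrompt.trim()}\n\n## Skills\n\n${section}\n`;
}

const prompt = buildSystemPrompt("You are the sales agent.", [
  { name: "skills/deals.md", body: "Use crm.list_deals before answering pipeline questions.\n" },
]);
```

Because the prompt is rebuilt on every wake, editing a skill file changes behavior on the next run without a redeploy.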
skills/deals.md → injected under ## Skills
Capability, with types.
Zod-typed callables dispatched at POST /api/tools/<module>.<name>. The same handler runs from agents, workflows, routines, or your own routes. Every call is audited.
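Roughly, a dispatcher like this could sit behind that route. This is a sketch under stated assumptions: the `Schema` interface is a dependency-free stand-in for a Zod schema, handlers are synchronous for brevity, and `registerTool`, `dispatch`, and the `AuditRow` fields beyond actor, inputs, and outcome are hypothetical.

```typescript
// Stand-in for a Zod schema: parse() returns typed input or throws.
interface Schema<T> {
  parse(input: unknown): T;
}

interface ToolDef<I, O> {
  input: Schema<I>;
  handler: (input: I) => O;
}

interface AuditRow {
  tool: string;
  actor: string;
  inputs: unknown;
  outcome: "ok" | "error";
}

const registry = new Map<string, ToolDef<any, any>>();
const auditLog: AuditRow[] = [];

// Tools are keyed as "<module>.<name>", matching the route shape.
function registerTool<I, O>(moduleId: string, name: string, def: ToolDef<I, O>): void {
  registry.set(`${moduleId}.${name}`, def);
}

// Validate, run, and audit. Validation failures and handler errors both land in the log.
function dispatch(toolId: string, actor: string, rawInput: unknown): unknown {
  const tool = registry.get(toolId);
  if (!tool) throw new Error(`unknown tool: ${toolId}`);
  try {
    const result = tool.handler(tool.input.parse(rawInput));
    auditLog.push({ tool: toolId, actor, inputs: rawInput, outcome: "ok" });
    return result;
  } catch (err) {
    auditLog.push({ tool: toolId, actor, inputs: rawInput, outcome: "error" });
    throw err;
  }
}

// Illustrative data and tool; a real module would read from its own schema.
type Deal = { id: string; stage: string };
const deals: Deal[] = [
  { id: "d1", stage: "blocked" },
  { id: "d2", stage: "won" },
];

registerTool("crm", "list_deals", {
  input: {
    parse(input: unknown): { stage: string } {
      const stage = (input as { stage?: unknown } | null)?.stage;
      if (typeof stage !== "string") throw new Error("stage must be a string");
      return { stage };
    },
  },
  handler: ({ stage }: { stage: string }) => deals.filter((d) => d.stage === stage),
});
```

An agent, a workflow block, or your own route would all go through the same `dispatch` call, which is why one audit log can cover every caller.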
crm.list_deals({ stage: "blocked" })
Everything, bundled.
One manifest binds skills + tools + schema + workflows + routines + webhooks + OAuth + UI. Built-ins, third-parties, your own — all the same shape. app.module(x) wires the rest.
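To make the "one shape" idea concrete, here is a toy manifest and a toy `App` that wires it. The field names and the `App` class are assumptions for illustration only; the real `Module` interface in `@boringos/module-sdk` may look quite different.

```typescript
type Handler = (input: unknown) => unknown;

// Hypothetical manifest: a subset of what the docs say a Module binds.
interface ModuleManifest {
  id: string;
  skills: string[];               // paths to .md files
  tools: Record<string, Handler>; // tool name -> handler
  migrations?: string[];          // schema changes, applied on install
  workflows?: string[];
  routines?: string[];
}

// Toy host: module() registers tools under "<id>.<name>" and collects skills.
class App {
  readonly tools = new Map<string, Handler>();
  readonly skills: string[] = [];

  module(m: ModuleManifest): this {
    for (const [name, handler] of Object.entries(m.tools)) {
      this.tools.set(`${m.id}.${name}`, handler);
    }
    this.skills.push(...m.skills);
    return this;
  }
}

const crmModule: ModuleManifest = {
  id: "crm",
  skills: ["skills/deals.md"],
  tools: { list_deals: () => [] },
};

const app = new App().module(crmModule);
```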
app.module(crmModule)
You set the goal. The agent figures out the rest. Every action is a Tool call. Every Tool call is audited.
Create a task, assign it to an agent or a role. A comment on a task is a message; posting one wakes the agent.
It reads its skills, the task, recent comments, and relevant memory — assembled fresh by the context pipeline.
The framework spawns Claude Code / Codex / Gemini / Ollama as a subprocess. Skills shape behavior; tools execute capability.
Budget tracked per run. High-risk steps pause for human approval. Every tool call lands in the audit log.
The run's result auto-posts as a comment. Remaining work re-wakes the agent. Memory persists across runs.
Agents have a reportsTo field. They break goals into subtasks, assign each to the right teammate, escalate when blocked, and hand off to humans cleanly.
CEO sets the goal, CTO breaks it down, engineers execute, QA validates. A next_actor state machine routes work between agents and humans.
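One way to picture the routing is a small pure function. This sketch assumes four run outcomes and a rule that a blocked agent escalates to its `reportsTo`, falling back to a human at the top of the chain. The outcome names and the `nextActor` function are hypothetical; only `reportsTo` and the next_actor idea come from the docs.

```typescript
interface Agent {
  id: string;
  role: string;
  reportsTo?: string; // id of the manager agent, absent at the top
}

type NextActor =
  | { kind: "agent"; agentId: string }
  | { kind: "human" }
  | { kind: "none" };

type RunOutcome =
  | { status: "done" }
  | { status: "in_progress" }
  | { status: "blocked" }
  | { status: "delegated"; to: string };

// Decide who acts next on a task after an agent's run finishes.
function nextActor(current: Agent, outcome: RunOutcome): NextActor {
  switch (outcome.status) {
    case "done":
      return { kind: "none" };
    case "in_progress":
      return { kind: "agent", agentId: current.id }; // re-wake the same agent
    case "delegated":
      return { kind: "agent", agentId: outcome.to };
    case "blocked":
      // Escalate up the hierarchy, or hand off to a human at the top.
      return current.reportsTo
        ? { kind: "agent", agentId: current.reportsTo }
        : { kind: "human" };
  }
}

const ceo: Agent = { id: "ceo", role: "CEO" };
const cto: Agent = { id: "cto", role: "CTO", reportsTo: "ceo" };
const engineer: Agent = { id: "eng-1", role: "Engineer", reportsTo: "cto" };
```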
Every run builds context. Pluggable provider — Hebbs out of the box, or your own.
Cost tracked per run. Limits per agent, per task, per tenant. No runaway spend.
A DAG that dispatches every node through the tool registry — the same handlers your agents call. Persisted runs, live SSE, replay, fork-from-here, budget gates, and pause-on-approval.
Each block resolves to a Tool — same Zod validation, same audit log, whether the call comes from an agent, a workflow, or a routine.
Every block transition streams via SSE. Watch the DAG light up — no polling, no reconstruction.
Re-execute past runs. Fork from any block. Compare two runs side by side.
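A stripped-down version of such a runner might look like the following. It is a sketch, not the shipped engine: blocks run synchronously in dependency order, each block's handler receives its upstream results, and `onEvent` stands in for the SSE stream. `Block`, `BlockEvent`, and `runWorkflow` are hypothetical names.

```typescript
type ToolHandler = (input: Record<string, unknown>) => unknown;

interface Block {
  id: string;
  tool: string;        // "<module>.<name>" in the tool registry
  dependsOn: string[]; // ids of upstream blocks
}

interface BlockEvent {
  blockId: string;
  state: "running" | "done";
}

// Run blocks in dependency order, dispatching each through the shared tool map.
function runWorkflow(
  blocks: Block[],
  tools: Map<string, ToolHandler>,
  onEvent: (e: BlockEvent) => void,
): Map<string, unknown> {
  const results = new Map<string, unknown>();
  const pending = new Map<string, Block>();
  for (const b of blocks) pending.set(b.id, b);

  while (pending.size > 0) {
    // A block is ready once every dependency has a result.
    const ready = Array.from(pending.values()).filter((b) =>
      b.dependsOn.every((d) => results.has(d)),
    );
    if (ready.length === 0) throw new Error("cycle or missing dependency in workflow");

    for (const block of ready) {
      const handler = tools.get(block.tool);
      if (!handler) throw new Error(`unknown tool: ${block.tool}`);
      onEvent({ blockId: block.id, state: "running" });
      const upstream: Record<string, unknown> = {};
      for (const d of block.dependsOn) upstream[d] = results.get(d);
      results.set(block.id, handler(upstream));
      onEvent({ blockId: block.id, state: "done" });
      pending.delete(block.id);
    }
  }
  return results;
}
```

Persisting `results` per block is what would make replay and fork-from-here possible: a fork can seed the map with stored results and run only the downstream blocks.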
Budgets, audit, runtime routing, and approvals aren't add-ons — they sit in the execution path. Here's what @boringos/core ships.
Spend caps by tenant, agent, or task. Hard stops or soft alerts. Cost — including Anthropic cache tokens — tracked per run, not estimated after.
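The hard-stop versus soft-alert distinction can be sketched as a pre-flight check. Everything here is illustrative: `Cap`, `Spend`, `checkBudget`, and the USD units are assumptions, and the real enforcement point and data model are not shown in the docs.

```typescript
type Scope = "tenant" | "agent" | "task";

interface Cap {
  scope: Scope;
  id: string;
  limitUsd: number;
  mode: "hard" | "soft"; // hard stops the call, soft only alerts
}

interface Spend {
  tenant: string;
  agent: string;
  task: string;
  costUsd: number; // projected cost of the next step
}

interface Verdict {
  allow: boolean;
  alerts: string[];
}

// Check the next spend against every cap that applies to it.
function checkBudget(caps: Cap[], spentSoFar: Map<string, number>, next: Spend): Verdict {
  const alerts: string[] = [];
  let allow = true;
  for (const cap of caps) {
    if (next[cap.scope] !== cap.id) continue; // cap belongs to another tenant/agent/task
    const key = `${cap.scope}:${cap.id}`;
    const projected = (spentSoFar.get(key) ?? 0) + next.costUsd;
    if (projected > cap.limitUsd) {
      if (cap.mode === "hard") allow = false;
      alerts.push(`${key} over cap: ${projected.toFixed(2)} > ${cap.limitUsd.toFixed(2)}`);
    }
  }
  return { allow, alerts };
}
```

Running the check before each step, rather than summing costs afterwards, is what turns a cap into a stop instead of a report.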
Every tool call writes a row to tool_calls with actor, inputs, and outcome. Run transitions, comments, approvals — all on one timeline you can replay.
Route any task to Claude Code, Codex, Gemini CLI, Ollama, a raw command, or a webhook. Skills and tools stay stable while you swap the backend.
wait-for-human blocks pause a run and create an Actions-queue card. Approve, and execution resumes with your input merged in. Low-risk paths stay autonomous.
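As a rough model of that pause-and-resume flow: a run moves to a waiting state and leaves a card in a queue, and approval merges the human's input into the run context before it continues. The `Run` and `ActionCard` shapes and both function names are hypothetical.

```typescript
interface Run {
  id: string;
  state: "running" | "waiting_for_human";
  context: Record<string, unknown>;
}

interface ActionCard {
  runId: string;
  prompt: string;
}

const actionsQueue: ActionCard[] = [];

// Pause the run and surface a card for a human to act on.
function waitForHuman(run: Run, prompt: string): Run {
  actionsQueue.push({ runId: run.id, prompt });
  return { ...run, state: "waiting_for_human" };
}

// Resume the run, merging the approver's input into its context.
function approve(run: Run, input: Record<string, unknown>): Run {
  if (run.state !== "waiting_for_human") {
    throw new Error(`run ${run.id} is not waiting for approval`);
  }
  const idx = actionsQueue.findIndex((c) => c.runId === run.id);
  if (idx >= 0) actionsQueue.splice(idx, 1); // the card is resolved
  return { ...run, state: "running", context: { ...run.context, ...input } };
}
```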
Sessions, invitations, team management, device auth — every domain row carries a tenantId. Two tenants never see each other's data.
Third-party apps ship as Ed25519-signed .hebbsmod bundles. The host verifies, content-addresses, migrates schema, and registers tools on a live process.
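The verify and content-address steps can be sketched with Node's built-in `crypto` module. Two assumptions are baked in and not confirmed by the docs: the signature is detached and covers the whole bundle, and the content address is a SHA-256 hex digest. `verifyBundle` is a hypothetical helper.

```typescript
import { createHash, generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

// Verify a detached Ed25519 signature over the bundle bytes, then derive a content address.
function verifyBundle(bundle: Buffer, signature: Buffer, publisherKey: KeyObject): string {
  // Ed25519 hashes internally, so Node takes null as the digest algorithm.
  if (!verify(null, bundle, publisherKey, signature)) {
    throw new Error("bundle signature verification failed");
  }
  // Assumption: the address is the SHA-256 of the exact bytes that were verified.
  return createHash("sha256").update(bundle).digest("hex");
}

// Demo with a throwaway key pair; a real host would hold the publisher's public key.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");
const bundle = Buffer.from("pretend these are .hebbsmod zip bytes");
const signature = sign(null, bundle, privateKey);
const address = verifyBundle(bundle, signature, publicKey);
```

Hashing only after verification means a tampered bundle never gets an address, so nothing downstream (migrations, tool registration) can refer to it.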
Agents run with --dangerously-skip-permissions so they don't stop to ask. That's safe because every byte they read or write goes through Drive, the framework's proxy over the filesystem. Tenants can't see each other. Private files stay private.
<tenantId>/ # isolation root — you cannot escape it
├── shared/... # tenant-wide · agents read+write
├── users/<userId>/... # PRIVATE · agents denied
├── agents/<agentId>/... # agent home · own=rw · others=read-only
├── tasks/<taskId>/... # deliverables · tenant-shared
└── projects/<projectId>/... # long-running · tenant-shared
Every path is prefixed with the tenant id at the storage layer. No code path reads or writes outside the tenant root. Path traversal is rejected before the storage call.
users/<id>/ returns "private — not accessible to agents" on every agent attempt. Not a permission you forgot to set — the default, in the type system, with a literal error string you can grep.
Agents read each other's working drafts (transparency by default) but can only write to their own agents/<id>/ folder. Cross-agent writes are rejected before storage.
Reads, writes, lists, deletes — all through DriveManager. That means tenant scoping, ACL check, audit row, event fan-out, memory-sync index, every time. No side door.
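The four rules above fit in one resolver. This is an illustration only, not the framework's actual ACL module: `resolveForAgent` is a hypothetical name, and every error message except the private-folder string quoted above is invented.

```typescript
type Op = "read" | "write";

// The literal error string the docs quote for users/<id>/ paths.
const PRIVATE_ERROR = "private — not accessible to agents";

// Check traversal and ACL rules, then return the tenant-prefixed storage path.
function resolveForAgent(tenantId: string, agentId: string, relPath: string, op: Op): string {
  const segments = relPath.split("/").filter((s) => s.length > 0);
  const escapes =
    relPath.startsWith("/") ||
    relPath.includes("\\") ||
    segments.some((s) => s === "." || s === "..");
  if (escapes) throw new Error("path traversal rejected");

  const [root, owner] = segments;
  switch (root) {
    case "users":
      throw new Error(PRIVATE_ERROR); // denied for every agent, read or write
    case "agents":
      // Agents may read each other's folders but write only to their own.
      if (op === "write" && owner !== agentId) throw new Error("cross-agent write rejected");
      break;
    case "shared":
    case "tasks":
    case "projects":
      break; // tenant-shared roots
    default:
      throw new Error(`unknown drive root: ${root}`);
  }
  // The tenant id is always prepended here, so callers never supply it.
  return `${tenantId}/${segments.join("/")}`;
}
```

Because the caller never passes a tenant prefix, there is no argument an agent could set to reach another tenant's root.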
@boringos/core/src/modules/drive-acl.ts.
Everything you'd otherwise wire up yourself is already a Module, already installed. Same shape as the one you'll write next.
Tasks, comments, agents, runs
Pluggable cognitive memory
File storage + ACL
DAG runner over the tool registry
Email/Slack/event triage queue
Routes new items to the right agent
Built-in chat surface for every app
Gmail + Calendar via OAuth
Channels, DMs, slash commands
A Module is a TypeScript file with a manifest. Bundle it into a .hebbsmod archive, upload it, and hosts install it per-tenant — same flow as a Chrome extension, just for agents.
Plain TypeScript. Implement the Module interface. Skills as .md, tools as Zod-typed handlers.
A .hebbsmod is a signed zip — manifest + ESM entry + skills + migrations + UI. ~100KB–2MB.
Drag it onto the shell's Apps screen. Ed25519 signature verified, bytes content-addressed.
Tenants opt in. Schema applied, tools live at /api/tools/<id>.<name>, agents read the new skills next wake.
The copilot Module is built in. A chat surface that can call any registered tool and edit your code. Zero config, auto-provisioned per tenant.
The framework underneath the shell. Modules sit on top; you usually only depend on @boringos/core and @boringos/module-sdk.
One command. One minute. Your own agentic OS on localhost.
npx create-boringos my-app