Sandboxing AI Coding Agents

Isolate agents so you can grant broad permissions without risking your host.

Two managed methods ship with Baton — Lima VM and Safehouse. For anything else (Docker, Podman, a dedicated WSL2 distro, bubblewrap, firejail) you point Baton at a custom wrapper and it runs every agent command through it. On top of any of those, agents ship with their own permission systems and sandboxes that are worth leaning on for supervised host work.

Feature	Lima VM (recommended)	Safehouse	Custom wrapper	Agent built-in
Filesystem isolation	Kernel (VM)	Same kernel	Varies	Same kernel
Network sandboxing	✓	✗	You wire it up	✓
Safe with bypass-permissions	✓	✗	Varies	✗ (bypass disables it)
Platforms	macOS, Linux	macOS	All	All

Lima VM (macOS · Linux)

A managed Debian VM. Apple's Virtualization.framework on Apple Silicon, QEMU on Linux. One VM serves every workspace.

Setup

Install limactl first: brew install lima on macOS, apt install lima (or your distro's equivalent) on Linux. Then go to Settings → Sandboxing → Set up Lima VM, tune resources, and click Create and start VM.

By default, Baton starts the VM when the app opens and stops it when the app quits, so sandboxed agents are ready without an extra step and the VM isn't burning CPU/RAM in the background. Both toggles live under VM Status in the Lima section if you'd rather start and stop the VM yourself.

Agent integrations after first boot

Once the VM is running, each agent needs hook entries inside the VM so its notifications reach the host and Baton's MCP server is available. Baton writes them for you: expand Agent Integrations under the Lima VM section and flip on Notifications and MCP Server for whichever of Claude Code, Codex, Gemini, or OpenCode you use. Toggle off to remove them; after a Baton update that changes the hook format, toggle them off and on again to rewrite the entries.

Under the hood: Baton's HTTP server binds to 127.0.0.1 only, so it's not exposed to your LAN. The agent hooks and MCP config inside the VM point at host.lima.internal:<port>; Lima's user-mode networking proxies that address back to the host's loopback. With the egress firewall on (the default), this is the only host port the VM can reach — see Network isolation below.

Mounts

Read-write: ~/.baton/repos, ~/.baton/worktrees. Read-only: ~/.baton/settings, ~/.baton/skills, ~/.baton/vm-shared. Nothing else on the host is visible. Add extras under Filesystem Access; they land at the same path inside the VM, so ~/projects/foo on the host is ~/projects/foo in the VM.

Heads up: the agent writes to your repos and worktrees with host file permissions. As soon as you open, build, or run that code outside the VM — opening the folder in your host editor, npm install, cargo build, running a script — it executes as you, on your host. The VM contains the agent process; reviewing diffs before merging or running on the host is on you.

SSH agent forwarding (off by default)

Off: local git works; anything that needs your SSH keys (git push, fetching or cloning private repos over SSH) fails. On: the agent can push directly; signing requests cross the boundary, keys stay on the host. Heads up: the agent can use any key your ssh-agent holds; avoid keys with force-push rights to critical repos.

Restrict sudo (on by default · creation-time)

Leave on for day-to-day development that doesn't need root: the agent can't touch /etc, /usr, or system config, so it won't slowly clutter the VM with one-off installs and tweaks. It also keeps the network isolation enforceable — without sudo restrictions an agent can sudo nft flush ruleset to drop the firewall.

How it works. At VM creation, cloud-init removes NOPASSWD from the agent user's sudoers entries, and the user is provisioned with no password set — so any sudo invocation just prompts and fails. Two common bypass paths are also closed at provision time: there are no SUID-to-other-user binaries the agent can invoke, and newuidmap/newgidmap have their SUID bits dropped with /etc/subuid cleared, so user-namespace mappings to a different uid aren't reachable either. The result is that the agent stays stuck as the unprivileged user with no in-VM path to root.

Turn off if you want the agent to install system packages or edit system files inside the VM. Either way, the VM dropdown in the title bar copies a Root shell command that SSHes you into the VM as root, so you can fix things even when the agent can't. Unlike the other toggles in this section, this one is applied during provisioning — to flip it on an existing VM you'd need to recreate it.

Network isolation (on by default)

Lima's network controls live on a dedicated Network sandboxing page; open it from the Lima VM section of the Sandbox settings tab. It has two tabs: Firewall (rules and recent blocking activity) and Network policy (the on/off toggles). Two policy toggles, Block host machine and Block local networks, ship on by default; together they confine the agent to outbound internet, DNS, Lima's own internals, and Baton's agent integration, and to nothing else on your host or LAN.

Block host machine. Drops VM→host's loopback (reached as 192.168.5.2 from inside the VM, Lima's default gateway) except a small set of standard exemptions: DNS, DHCP, Baton's notify+MCP port, and any extra host ports you've allowed under Firewall → Host machine exemptions. To let an agent reach a host service — a local Postgres on 5432, a dev API on 3000 — add its port there; it's then reachable at host.lima.internal:<port> from inside the VM. The Baton notify+MCP exemption is opt-out and can be flipped off in the same editor if you don't want it. Turn the toggle off entirely if you want the VM to see everything bound to your loopback.

Block local networks. Drops VM→RFC 1918 LAN (router admin, NAS, IoT), IPv4 link-local, and IPv6 unique-local. The link-local block matters most on cloud machines: it covers the 169.254.169.254 instance metadata endpoint, which would otherwise let an agent exfiltrate the host's cloud credentials. Lima's own subnet stays open so DNS, DHCP, and internal traffic keep working.
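As a rough illustration of which IPv4 destinations those ranges cover, here is a small POSIX shell sketch. It is not Baton's nftables ruleset: is_blocked_local_ipv4 is a hypothetical helper, it ignores IPv6 unique-local (fc00::/7), and it assumes Lima's own subnet is 192.168.5.0/24, which the text above only implies by naming 192.168.5.2.

```shell
# Illustrative only: models the "Block local networks" ranges for IPv4.
# Assumption: Lima's own subnet is 192.168.5.0/24 and stays open here;
# the host at 192.168.5.2 is governed separately by "Block host machine".
is_blocked_local_ipv4() {
  case "$1" in
    192.168.5.*) return 1 ;;                            # Lima's subnet stays open
    10.*|192.168.*) return 0 ;;                         # RFC 1918: 10/8, 192.168/16
    172.1[6-9].*|172.2[0-9].*|172.3[01].*) return 0 ;;  # RFC 1918: 172.16/12
    169.254.*) return 0 ;;                              # link-local, incl. 169.254.169.254
  esac
  return 1                                              # everything else: not a local range
}

for ip in 192.168.1.1 169.254.169.254 192.168.5.3 8.8.8.8; do
  if is_blocked_local_ipv4 "$ip"; then
    echo "$ip: dropped"
  else
    echo "$ip: not covered by this toggle"
  fi
done
```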

Recent blocks → Allow. The Firewall tab logs what just got dropped — HTTP/HTTPS hostname blocks (when outbound blocking is on, see below) and raw TCP/UDP drops from any active block toggle. Each row has an Allow button: one click promotes that hostname to the allowlist, or that (proto, port, dest) tuple to a custom rule. Faster than guessing what an agent is reaching for ahead of time.

Heads up: the firewall lives inside the VM. With Restrict sudo off, an agent can sudo nft flush ruleset to drop the rules — keep both on for the isolation to actually be enforceable.

Block outbound traffic (off by default · opt-in)

The two toggles above keep the agent off your host and LAN, but outbound internet stays wide open by default. Flip Block outbound traffic on the Network policy tab to switch the firewall's outbound default from allow to deny: agent-initiated HTTP and HTTPS are forced through an in-VM tinyproxy with FilterDefaultDeny Yes (anything off the allowlist returns a 403), and anything else outbound is dropped at the firewall unless covered by an explicit (proto, port, dest) custom rule.

Scope is the agent user's outbound traffic, deny-by-default. The nftables rule drops everything outbound originating from the unprivileged agent user, with three carve-outs:

HTTP/HTTPS on TCP 80/443, routed through the in-VM tinyproxy, so any client that respects HTTPS_PROXY / HTTP_PROXY (curl, fetch, requests, undici) is funneled through the hostname filter.
The essentials needed for the VM to function: NTP for clock sync, system updates as _apt, and SSH/22 so the agent can git push.
Any (proto, port, dest) rules you've explicitly added under Firewall → Custom rules (e.g. a self-hosted Postgres at 10.0.0.5/32:5432).

Hostname filtering only applies to traffic on the proxy path; explicit port rules bypass it by definition. Combined with Restrict sudo, the deny-by-default catch-all is what keeps a chatty or compromised agent from reaching the open internet, but treat it as hardening, not a guarantee.

How matching works. The allowlist matches on hostname only — the CONNECT target for HTTPS, the Host header for HTTP. It does not filter paths. Entries are plain hostnames or left-anchored wildcards:

api.anthropic.com — exact host
*.npmjs.org — any subdomain depth under npmjs.org
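The matching semantics can be sketched in a few lines of POSIX shell. This illustrates the rule as described here, not tinyproxy's or Baton's actual filter code: host_matches and host_allowed are hypothetical helper names, the comparison is case-sensitive for simplicity, and it assumes a wildcard such as *.npmjs.org does not match the bare npmjs.org.

```shell
# Illustrative only: models "exact host" and "left-anchored wildcard" entries.
# Assumption: "*.npmjs.org" covers any subdomain depth but not "npmjs.org" itself.

# host_matches HOST PATTERN -> exit 0 if HOST is covered by PATTERN
host_matches() {
  host=$1
  pattern=$2
  case "$pattern" in
    '*.'*)
      suffix=${pattern#?}          # drop the leading "*": "*.npmjs.org" -> ".npmjs.org"
      case "$host" in
        *"$suffix") return 0 ;;    # one or more labels followed by the suffix
      esac
      return 1
      ;;
    *)
      [ "$host" = "$pattern" ]     # plain entry: exact match only
      ;;
  esac
}

# host_allowed HOST PATTERN... -> exit 0 if any pattern covers HOST
host_allowed() {
  target=$1
  shift
  for entry in "$@"; do
    host_matches "$target" "$entry" && return 0
  done
  return 1
}

host_allowed registry.npmjs.org api.anthropic.com '*.npmjs.org' && echo "registry.npmjs.org: allowed"
host_allowed evil.example.com api.anthropic.com '*.npmjs.org' || echo "evil.example.com: filtered"
```

Because the suffix keeps its leading dot, a lookalike such as evilnpmjs.org is not covered by *.npmjs.org.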

What's seeded by default. The seed covers what most agents reach for on first run: the major LLM APIs (Anthropic, OpenAI, Google's GenAI), their OAuth endpoints and web consoles (so claude /login / codex login / gemini auth complete from inside the VM), the common package registries (npm, PyPI, Cargo, Go modules, RubyGems, Maven, NuGet), Docker Hub + GHCR, all of GitHub (*.github.com + *.githubusercontent.com), and Debian apt mirrors. It is intentionally moderate — broader than "LLM API only", narrower than "everything". Trim or extend from the Firewall tab; Reset to defaults restores the seed.

Root bypass. The nftables rule that pins 80/443 to the proxy only applies to the unprivileged agent user. Root inside the VM goes direct — so admin shells via Baton's root SSH escape hatch (~/.lima/<vm>/ssh.config) can still apt-get or run one-off installs without disabling the allowlist. Combine with Restrict sudo so the agent can't escalate to that bypass.

What this does not protect against. Hostname filtering is bypassable when a target shares a CDN with an allowed host: in domain fronting, the CONNECT target and TLS SNI name an allowed host while the encrypted Host header selects an attacker-controlled origin behind the same CDN (Cloudflare, CloudFront). Treat the allowlist as hardening, not a guarantee. For stricter cases, run a MITM proxy that inspects URLs and certificates instead, or block egress entirely for tasks that don't need the network.

DNS as a residual exfil channel. By default the deny rule allows the agent to reach the host DNS resolver — agent-side getaddrinfo needs it for any custom rule referenced by hostname (a Postgres at db.example.com:5432, an SSH host). A determined agent could encode small amounts of data into queries to an attacker-controlled domain — low bandwidth, but enough for tokens or short snippets. Two mitigations: (1) flip Restrict user DNS on the Network policy tab — drops DNS for everything running as the VM's default user while keeping HTTP/HTTPS through the proxy working (tinyproxy resolves under its own system uid). Trade-off: hostname-based custom rules stop working unless you specify them by IP literal. (2) Run a filtering DNS resolver on your host (NextDNS, Pi-hole, dnsmasq with logging) to catch naive exfil patterns even when DNS stays open.

What happens when you toggle. Effect is immediate — no terminal restart needed. Tinyproxy is a permanent fixture in the VM and HTTPS_PROXY / HTTP_PROXY are always exported, so every shell already routes through it. Flipping the toggle just switches tinyproxy between filter mode (allowlist enforced) and passthrough mode (no filtering), and toggles the firewall pin on 80/443; a running agent's next request lands on whichever side you just selected. Allowlist edits while filter mode is on apply on the next request too, via a tinyproxy restart.

Port forwarding (on by default)

Leave on if you want to hit things the agent runs inside the VM — dev servers, a local DB, a debugger — from your host browser or tools. Anything the VM binds on localhost shows up on your host's localhost automatically. Turn off for strict containment: the VM can still reach out, but nothing it binds inside becomes visible to host processes.

Resources

CPUs, memory, disk — pick what fits your workload. The defaults (4 CPU / 4 GB / 10 GB) are a starting point, not a recommendation. Real usage scales with what the agent does: each Claude Code instance alone is 200–500 MB of RAM, and builds, test suites, linters, and dev servers running inside the VM all draw from the same pool. A few parallel agents on a Node project can saturate 4 GB easily. Bump any of CPU/memory/disk from Settings → Sandboxing → Lima Virtual Machine and restart the VM to apply. Two things to know: allocated memory doesn't return to the host quickly, so tune it rather than just maxing it out; and disk can be grown later but not shrunk, so lean a little high on that one.

Preinstalled tools

Every preinstalled tool is a checkbox at VM creation, some pre-checked, some not. Install anything else later with apt install inside the VM (with Restrict sudo off, or via the root-shell escape hatch).

Pre-checked by default: Claude Code, Codex, ripgrep, fd, jq, tree, unzip, yq, Python 3.
Other CLI tools (unchecked): fzf, shellcheck, lsof, htop, GitHub CLI (gh), Playwright + Chromium, Xvfb (for headless browsers / Electron e2e via xvfb-run).
Languages & build tools (unchecked): Go, Rust (rustup), build-essential.
Containers (unchecked): containerd + nerdctl. Lighter than Docker (no separate Docker daemon on top of containerd) and largely Docker-CLI compatible (run, build, compose, BuildKit), so alias docker=nerdctl covers most existing Docker workflows.

Customize the Lima config (escape hatch)

If the form doesn't cover what you need — custom cloud-init, extra networks, a different base image, a private apt repo — click Show generated config in the Lima VM section. That's the exact YAML Baton would hand to limactl. Copy it out, edit, save it as e.g. baton-vm.yaml, and create the VM yourself:

limactl create --name=baton-vm baton-vm.yaml
limactl start baton-vm

Keep the name baton-vm — that's what Baton looks for. Next time you open Sandbox settings it detects the VM, registers the sandbox method, and picks up the same Agent Integrations flow as a form-created VM.

Built-in agent sandboxing (all platforms)

Most agents ship with their own sandboxing — permission prompts, read-only modes, OS-level profiles — but the shape and setup vary per agent (Claude Code's tool permission system, Codex's workspace-write/read-only modes, Gemini's approval flow, etc.), so check each agent's docs. None of them give you the freedom of a Lima VM, and any "skip permissions" mode short-circuits the protection — but for supervised host work they're a sensible baseline.

Safehouse (macOS)

Safehouse wraps a command in a macOS sandbox-exec profile restricting filesystem access to allowlisted paths. Same-kernel, so weaker isolation than Lima. Add the preconfigured template from Settings → Sandboxing → Sandbox Methods → Safehouse.

Custom wrappers (bring your own)

Any tool that accepts a shell command via argv works: Docker, Podman, a dedicated WSL2 distro, systemd-nspawn, bubblewrap, firejail, a custom script. Register under Settings → Sandboxing → Sandbox Methods → Custom.

Wrap command

Shell template Baton runs per agent command. Two placeholders:

{{command}} (required): replaced with the agent command.
{{env_prefix}} (optional): expands to a space-separated set of BATON_* env assignments: BATON_NOTIFY_HOST (your Notify host value, so the agent's hooks know where to reach Baton) plus BATON_TERMINAL_ID (so notifications stay attributed to the exact terminal pane). You need it when the sandbox runs the agent in a fresh environment that doesn't inherit the host process env, which means any VM or container (Lima, Docker, WSL2). Drop it for same-kernel wrappers that inherit the host environment (e.g. Safehouse), where those vars are already present.

Examples:

# Safehouse / same-kernel wrappers — the agent inherits the host environment,
# so {{command}} goes straight after the wrapper, no env_prefix needed:
safehouse --add-dirs=~/.baton/repos --add-dirs-ro=~/.baton/notify.js:~/.baton/port:~/.baton/baton-node -- {{command}}

# Lima — the VM starts with a fresh environment. limactl passes argv after `--`,
# so prefix with `env` to apply the BATON_* assignments inside the VM:
limactl shell baton-vm -- env {{env_prefix}}{{command}}

# Docker / WSL2 — the sandbox has its own environment, so you want env_prefix.
# `docker exec` passes argv straight to the target process, so wrap it in
# `bash -lc "…"` — otherwise BATON_NOTIFY_HOST=... is read as a program name:
docker exec -i mysandbox bash -lc "{{env_prefix}}{{command}}"
wsl -d agents -- bash -lc "{{env_prefix}}{{command}}"
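The difference between the Lima and Docker/WSL2 forms comes down to what interprets the NAME=value words. The sketch below reproduces all three behaviours locally with plain sh, so it needs neither Docker nor Lima. The BATON_NOTIFY_HOST value is a made-up placeholder, and the two variables only stand in for what Baton substitutes into the template.

```shell
# Placeholder values for illustration; Baton fills in the real ones.
env_prefix='BATON_NOTIFY_HOST=host.example.internal '
command='printenv BATON_NOTIFY_HOST'

# 1) Bare argv (what `docker exec` does without a shell): the first word is
#    looked up as a program, so the assignment fails as "not found" (status 127).
sh -c '"$@"' sh BATON_NOTIFY_HOST=host.example.internal printenv BATON_NOTIFY_HOST \
  || echo "bare argv failed with status $?"

# 2) Through a shell (the `bash -lc "…"` form): the shell parses the leading
#    NAME=value word as an environment assignment for the command.
sh -c "${env_prefix}${command}"

# 3) Through `env` (the Lima form): env consumes the NAME=value words itself.
#    Left unquoted on purpose so the string splits into separate words.
env ${env_prefix}${command}
```

Forms 2 and 3 both print the placeholder host; form 1 is the failure mode the Docker comment above warns about.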

Shell command (optional)

Command that drops you into an interactive shell inside the sandbox — e.g. limactl shell baton-vm or wsl -d agents. Baton runs it when you open a terminal tab in a workspace using this method and Shell into sandbox for new terminals is on.

Notify host (optional)

Hostname the sandboxed process uses to reach Baton's HTTP server (notifications + MCP). Blank for same-host wrappers (e.g. Safehouse); host.lima.internal, host.docker.internal, or the host IP for VM-/container-based sandboxes.

Host paths the agent needs

Read-write: ~/.baton/repos (bare git repos), ~/.baton/worktrees (agent working dir; many wrappers already expose this). Read-only: ~/.baton/notify.js (hook script), ~/.baton/port (HTTP server port), ~/.baton/baton-node (node launcher).

Agent notifications and MCP (wire up yourself)

Lima writes Claude Code, Codex, Gemini, and OpenCode hook entries inside the VM for you (see Agent Integrations in the Lima section). Custom wrappers don't get that auto-wiring — you set it up yourself in whatever environment your wrapper runs the agent in:

Notifications. Point the agent's notification hook at node ~/.baton/notify.js --source <agent> --event <event> with BATON_NOTIFY_HOST set to your Notify host value. If your wrap command includes {{env_prefix}}, the BATON_* env vars are set on the agent process and the hooks it launches inherit them, so you don't have to set them separately. The hook config is per-agent: Claude Code's ~/.claude/settings.json, Codex's ~/.codex/config.toml, Gemini's ~/.gemini/settings.json, OpenCode's plugin file under ~/.config/opencode/plugins/.
MCP server. Register Baton's MCP endpoint at http://<notify-host>:<port>/api/v1/mcp in the agent's MCP config, with <port> read from ~/.baton/port. Same host as notifications, same per-agent config files.
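A minimal sketch of assembling that MCP URL from inside the sandbox. baton_mcp_url is a hypothetical helper, and it assumes the port file contains nothing but the port number (possibly followed by a newline); check the file on your machine before relying on that.

```shell
# baton_mcp_url NOTIFY_HOST [PORT_FILE]
# Prints the MCP endpoint URL for the given Notify host.
# Assumption: the port file holds just the port number.
baton_mcp_url() {
  notify_host=$1
  port_file=${2:-"$HOME/.baton/port"}
  [ -r "$port_file" ] || { echo "cannot read $port_file" >&2; return 1; }
  port=$(tr -d '[:space:]' < "$port_file")
  case "$port" in
    ''|*[!0-9]*)
      echo "unexpected port value in $port_file" >&2
      return 1
      ;;
  esac
  printf 'http://%s:%s/api/v1/mcp\n' "$notify_host" "$port"
}

# Usage inside a Docker-based sandbox, with ~/.baton/port mounted read-only:
#   baton_mcp_url host.docker.internal
```

Paste the printed URL into the agent's MCP config; the sandbox still has to be able to reach that host and port.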

Networking

If the sandbox has its own network interface (Docker, Podman, a VM), set Notify host to a hostname the sandbox can resolve back to your host, and make sure the port in ~/.baton/port is reachable from inside — publish or forward it if outbound host access is blocked by default. Same-kernel wrappers like Safehouse share the host network and need neither.

Sketch: dedicated WSL2 distro on Windows

Kernel-level isolation comparable to Lima. We haven't run this end-to-end ourselves yet — treat the steps below as directional, not verified, and expect to fill in gaps as you go.

Create the distro. Dedicated to agents. In its /etc/wsl.conf, disable Windows drive automount ([automount] enabled = false) and Windows binary interop ([interop] enabled = false), then wsl --terminate <distro>.
Mount the same paths Lima does. With automount off, use mount -t drvfs or /etc/fstab. Read-write: %USERPROFILE%\.baton\repos, %USERPROFILE%\.baton\worktrees. Read-only: %USERPROFILE%\.baton\settings, %USERPROFILE%\.baton\skills, %USERPROFILE%\.baton\vm-shared.
Fill in the Custom method fields.
Wrap command: wsl -d agents -- bash -lc "{{env_prefix}}{{command}}"
Shell command: wsl -d agents
Notify host: the Windows host IP seen from inside the distro — ip route show default, IP after via.
Wire up notifications and MCP. Per Agent notifications and MCP above. The WSL2 wrinkle: the BATON_* vars need to reach the agent inside the distro. The {{env_prefix}} in the wrap command handles this — it inlines BATON_NOTIFY_HOST and BATON_TERMINAL_ID straight into the bash -lc string, so nothing has to cross the Win32→WSL boundary via the environment.
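Two of the steps above lend themselves to a small shell sketch, in the same directional and unverified spirit as the rest of this section. write_wsl_conf and default_gateway are hypothetical helper names. The first overwrites its target file with the two settings from the first step; the second picks the address after via out of ip route show default output.

```shell
# write_wsl_conf PATH
# Writes the automount/interop settings (normally to /etc/wsl.conf, as root).
# Note: this overwrites any existing file at PATH.
write_wsl_conf() {
  cat > "$1" <<'EOF'
[automount]
enabled = false

[interop]
enabled = false
EOF
}

# default_gateway
# Reads `ip route show default` output on stdin and prints the IP after "via".
default_gateway() {
  awk '$1 == "default" {
         for (i = 2; i < NF; i++) if ($i == "via") { print $(i + 1); exit }
       }'
}

# Inside the distro:
#   write_wsl_conf /etc/wsl.conf              # then `wsl --terminate <distro>` from Windows
#   ip route show default | default_gateway   # candidate Notify host value
```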

Setting this up — or already running something similar? Reach out. We want to make WSL2 a first-class built-in sandbox method (automatic distro provisioning, notifications, MCP). Both working setups and in-progress attempts are useful input.

Applying a sandbox to your workspaces

Configuring a method doesn't sandbox anything on its own — you decide which agents use it. Two places to make that call, plus one toggle for plain terminals.

Default rules (Settings → Sandboxing → Sandbox Rules): For each agent preset, decide whether it runs sandboxed by default. Tick a preset to sandbox every launch mode it has, or expand it and tick individual modes, which is useful when you want Claude Code's bypass-permissions mode always sandboxed but its normal mode on the host. With more than one method configured, each rule can pick a specific method or fall back to the default.
Per-workspace override (at workspace creation): The sandbox pill in the New Workspace dialog lets you override the rules for that one workspace: sandbox an agent that's normally on the host, or vice versa. The pill cycles through your configured methods; workspaces created without a choice inherit the default rules.
Shell into sandbox for new terminals (off by default): Controls what the + button on the terminal tab bar does inside a sandboxed workspace. Off is typical: + opens a host shell, and a small cube icon beside it gives you a one-click shell inside the sandbox when you want one. Flip on to swap the defaults: + drops you into the sandbox, and a desktop icon beside it opens a host shell.

Network access — the main exfil risk

Filesystem isolation limits what an agent can change. Network access decides whether anything inside the sandbox can leave it. If you grant --dangerously-skip-permissions (or the equivalent), egress is the surface that matters most.

Why it matters: agents read attacker-controlled text all the time — scraped web pages, READMEs, fetched docs, dependency descriptions. A prompt-injection path doesn't need to break the sandbox; it just needs to convince the agent to do something. With unrestricted egress, that "something" can be:

Push your source tree (and any secrets sitting in ~/.baton/repos) to a paste site or attacker-controlled gist.
Hit your dev API, internal services on your LAN, or your cloud's instance-metadata endpoint (169.254.169.254) to pull credentials.
Coordinate with a remote command-and-control server.

Mitigations

Lima VM network isolation (recommended baseline)

Host-machine and local-network blocking are on by default (see Network isolation in the Lima section). They drop VM→host loopback except for an allowlist of exemptions, and VM→RFC 1918 LAN, IPv4 link-local (covering the cloud-metadata endpoint), and IPv6 unique-local. Outbound internet is still open by design; tighten it with the built-in outbound block below, or roll your own proxy.

Lima outbound blocking (built-in)

Flip Block outbound traffic on the Network policy tab of the Lima Network sandboxing page. HTTP/HTTPS gets forced through an in-VM tinyproxy with a domain allowlist (anything not on the list returns a 403); other outbound is dropped unless covered by an explicit (proto, port, dest) custom rule. See Block outbound traffic in the Lima section for the seed list, wildcard syntax, and the SNI/CDN caveat.

Roll your own egress proxy

If you want something different from Baton's built-in proxy — URL-level filtering, shared corporate ACLs, traffic inspection — run an HTTP(S) proxy in front of the agent yourself. Set HTTPS_PROXY and HTTP_PROXY in the wrapper environment so the agent inherits them.

Tools that work for this:

httpjail — wraps a process so its outbound HTTP(S) goes through a per-process proxy with an allowlist, transparently.
smokescreen — Stripe's egress firewall; ACL-driven HTTP CONNECT proxy.
mitmproxy with a small filter addon — handy when you also want to inspect what the agent is fetching while you tune the allowlist.
Squid or your corporate MITM proxy with an allowlist ACL: the boring but well-proven option if you already run one.
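Whichever proxy you pick, the wrapper has to hand its address to the agent. A minimal sketch, assuming a proxy listening on 127.0.0.1:3128 (a made-up address); with_egress_proxy is a hypothetical helper. It also sets the lower-case spellings, because some clients, curl among them, only read http_proxy in lower case.

```shell
# with_egress_proxy PROXY_URL COMMAND [ARGS...]
# Runs COMMAND with the proxy variables set in both spellings.
with_egress_proxy() {
  proxy_url=$1
  shift
  env HTTPS_PROXY="$proxy_url" HTTP_PROXY="$proxy_url" \
      https_proxy="$proxy_url" http_proxy="$proxy_url" "$@"
}

# Hypothetical proxy address; substitute wherever your proxy listens.
with_egress_proxy http://127.0.0.1:3128 printenv HTTPS_PROXY
```

Proxy variables are advisory: only clients that respect them are funneled through the proxy, so pair this with a firewall or network-namespace rule that blocks direct egress.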

Agent-native egress controls

Some agents have built-in domain allowlists or hooks that fire on outbound calls. Worth layering on top of the above, but don't rely on them alone — a successfully prompt-injected agent will at minimum try to talk its way around its own checks.

No-network mode

For tasks that don't need network at all — refactoring, code review, doc edits — block egress entirely. Lima's network-isolation toggles get most of the way; a custom wrapper using --network=none (Docker), unshare -n (Linux namespaces), or a similar primitive gets the rest.
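One way to sanity-check a no-network sandbox on Linux is to confirm that the only interface it can see is loopback. The sketch below is a heuristic, not a proof of isolation: only_loopback is a hypothetical helper, and some kernels create extra fallback tunnel devices in a fresh network namespace, which would make it report a failure even though nothing is routable.

```shell
# only_loopback [NET_DIR]
# Exit 0 if the only network interface listed is "lo", 1 if any other
# interface exists, 2 if NET_DIR is missing.
# NET_DIR defaults to /sys/class/net; the parameter exists so the check
# can be tried against a scratch directory.
only_loopback() {
  net_dir=${1:-/sys/class/net}
  [ -d "$net_dir" ] || return 2
  for iface in "$net_dir"/*; do
    [ -e "$iface" ] || continue                       # empty directory: nothing to check
    [ "$(basename "$iface")" = "lo" ] || return 1     # anything but loopback fails the check
  done
  return 0
}

# Run inside the sandbox, for example at the start of a session:
#   only_loopback && echo "no external interfaces" || echo "network interfaces present"
```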

None of this guarantees safety — treat it as defense in depth. The realistic goal is making prompt-injection-driven exfil noisy and limited, not impossible.

Questions or a setup we should support? Get in touch.