blog.philz.dev

computing 2+2: so many sandboxes

Sandboxes are so in right now. If you're doing agentic stuff, you've no doubt thought about what Simon Willison calls the lethal trifecta: private data, untrusted content, and external communication.

If you work in a VM, for example, you can avoid putting a secret on that VM, and then that secret--that's not there!--can't be exfiltrated.

If you want to deal with untrusted data, you can also cut off external communication. You can still use an agent, but you need to either limit its network access or limit its tools.
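To make the "remove one leg" reasoning concrete, here's a minimal sketch. The `AgentSetup` class and its field names are made up for illustration; they aren't from any library, and real setups are messier than three booleans.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentSetup:
    """Hypothetical description of what an agent's environment can reach."""
    has_private_data: bool         # secrets or sensitive data are present
    reads_untrusted_content: bool  # web pages, issues, emails, logs, ...
    can_communicate_out: bool      # network access or tools that send data


def missing_legs(setup: AgentSetup) -> list[str]:
    """Return the trifecta legs that this setup has removed."""
    legs = {
        "private data": setup.has_private_data,
        "untrusted content": setup.reads_untrusted_content,
        "external communication": setup.can_communicate_out,
    }
    return [name for name, present in legs.items() if not present]


def has_lethal_trifecta(setup: AgentSetup) -> bool:
    """All three legs present at once is the dangerous combination."""
    return not missing_legs(setup)


# A VM with no secrets on it: untrusted content and network, nothing to steal.
secretless_vm = AgentSetup(False, True, True)
# Secrets and untrusted input, but the network is cut off.
offline_agent = AgentSetup(True, True, False)
# Everything at once.
wide_open = AgentSetup(True, True, True)

for name, setup in [
    ("secretless_vm", secretless_vm),
    ("offline_agent", offline_agent),
    ("wide_open", wide_open),
]:
    print(name, has_lethal_trifecta(setup), missing_legs(setup))
```

The sandboxes below are different ways of making one of those fields reliably `False`.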

So, today's task is to run python -c "print(2+2)" five ways.

1. Cloud Hypervisor #

Cloud Hypervisor is a Virtual Machine Monitor. It runs on top of KVM (the Linux kernel's Kernel-based Virtual Machine), which in turn needs a CPU that supports virtualization. A cloud-hypervisor VM sorta looks like a process on the host (and can be managed with cgroups, for example), but it's running a full Linux kernel. With the appropriate kernel options, you can run Docker containers, do tricky networking things, use nested virtualization, and so on. Lineage-wise, it's in the same family as Firecracker and crosvm. It avoids implementing floppy devices and tries to be pretty small.

One-time setup #

sudo apt-get update && sudo apt-get install -y cpio
wget -q https://github.com/cloud-hypervisor/cloud-hypervisor/releases/download/v51.1/cloud-hypervisor-static -O cloud-hypervisor
chmod +x cloud-hypervisor
wget -q https://github.com/cloud-hypervisor/linux/releases/download/ch-release-v6.16.9-20251112/vmlinux-x86_64 -O vmlinux

Build the initramfs #

Traditionally, people tell you to unpack a file system and maybe make a vinyl out of it using an iso image or some such. A trick is to instead start with a container image for your userspace, and then you get all the niceties (and all the warts) of Docker.

ROOTFS=$(mktemp -d)
CID=$(docker create python:3-alpine)
docker export "$CID" | tar -C "$ROOTFS" -x
docker rm "$CID" >/dev/null

cat > "$ROOTFS/init" << 'EOF'
#!/bin/sh
/bin/mount -t proc none /proc
/bin/mount -t devtmpfs none /dev 2>/dev/null
exec > /dev/ttyS0 2>/dev/null
/usr/local/bin/python3 -S -c "print(2+2)"
echo o > /proc/sysrq-trigger
EOF
chmod +x "$ROOTFS/init"

(cd "$ROOTFS" && find . -print0 | cpio --null -o --format=newc 2>/dev/null | gzip -9) > initramfs.img
rm -rf "$ROOTFS"

Run #

sudo ./cloud-hypervisor \
    --kernel ./vmlinux \
    --initramfs ./initramfs.img \
    --cmdline "reboot=t panic=-1 quiet loglevel=0" \
    --cpus boot=1 --memory size=256M \
    --serial tty --console off 2>/dev/null
# 4

Takes about 2 seconds.
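If you want to compare startup costs across the sandboxes yourself, a small timing harness is enough. This is a sketch: `time_command` is a hypothetical helper, and the only thing it runs here is the unsandboxed baseline with whatever Python is currently executing. Passing it the `cloud-hypervisor` or `runsc` command lines from this post should work in principle, but I haven't verified how each tool behaves when its output is captured through a pipe.

```python
import subprocess
import sys
import time


def time_command(argv: list[str]) -> tuple[str, float]:
    """Run argv and return (stripped stdout, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout.strip(), time.perf_counter() - start


# Baseline: the same program with no sandbox at all.
baseline = [sys.executable, "-c", "print(2+2)"]
output, seconds = time_command(baseline)
print(f"{output!r} in {seconds:.3f}s")
```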

2. gVisor #

gVisor implements a large chunk of the Linux syscall interface in a Go process. Think of it as a userland kernel. It came out of Google's AppEngine work. It can use systrap/seccomp, ptrace, and KVM tricks to do the interception.

The downside of gVisor is that you can't do some things inside of it. For example, you can't run vanilla Docker inside of gVisor because it doesn't support Docker's networking tricks.

Install #

sudo apt-get update && sudo apt-get install -y apt-transport-https ca-certificates curl gnupg
curl -fsSL https://gvisor.dev/archive.key | sudo gpg --dearmor -o /usr/share/keyrings/gvisor-archive-keyring.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/gvisor-archive-keyring.gpg] https://storage.googleapis.com/gvisor/releases release main" | sudo tee /etc/apt/sources.list.d/gvisor.list > /dev/null
sudo apt-get update && sudo apt-get install -y runsc

Run #

Again, let's use Docker to get ourselves a userland. No need for a kernel image. runsc stands for "run sandboxed container."

ROOTFS=$(mktemp -d)
CID=$(docker create python:3-alpine)
docker export "$CID" | tar -C "$ROOTFS" -x
docker rm "$CID" >/dev/null

sudo runsc --network=none do --root="$ROOTFS" \
    /usr/local/bin/python3 -c "print(2+2)"
# 4

3. Monty (Python Interpreter in Rust) #

Monty is a Python interpreter written in Rust. It doesn't expose the host, but can call functions that are explicitly exposed.

uv run --with pydantic-monty python -c \
  "import pydantic_monty; pydantic_monty.Monty('print(2 + 2)').run()"
# 4

This one's super fast.
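The "only what's explicitly exposed" idea is worth a picture. The sketch below is not Monty's API and not a security boundary; it's a toy evaluator written with Python's `ast` module, and `evaluate` and `exposed` are names invented for illustration. It shows the shape of the design: the interpreter implements only the syntax it chooses to, and the guest reaches the host only through a table of functions the host hands over.

```python
import ast
import operator

# The only binary operators this toy language understands.
_BINOPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


def evaluate(source: str, exposed: dict) -> object:
    """Evaluate numbers, + - *, and calls to names listed in `exposed`."""

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINOPS:
            return _BINOPS[type(node.op)](walk(node.left), walk(node.right))
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and not node.keywords
        ):
            # Host functions are reachable only by name, via the table.
            if node.func.id not in exposed:
                raise NameError(f"{node.func.id!r} is not exposed to the guest")
            return exposed[node.func.id](*[walk(arg) for arg in node.args])
        # Everything else (attributes, imports, strings, ...) is rejected.
        raise ValueError(f"unsupported syntax: {type(node).__name__}")

    return walk(ast.parse(source, mode="eval"))


exposed = {"double": lambda x: 2 * x}
print(evaluate("2 + 2", exposed))          # 4
print(evaluate("double(2 + 2)", exposed))  # 8

try:
    evaluate("open('/etc/passwd')", exposed)
except NameError as err:
    print(err)
```

Nothing here calls `eval` or `exec`, so a builtin like `open` simply doesn't exist for the guest unless the host puts it in the table.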

4. Deno + Pyodide (WASM + Permission Sandbox) #

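Pyodide is CPython compiled to WebAssembly. Deno is a JS runtime with permission-based security. Deno happens to run wasm code fine, so we're using it as a wasm runtime. There are other choices. One caveat: deno eval runs with all permissions granted, so the one-liner below only exercises the WASM half of the sandbox. To get the permission half too, put the code in a file and use deno run with only the --allow flags it needs.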

Setup #

curl -fsSL https://deno.land/install.sh | sh
deno eval 'import { loadPyodide } from "npm:pyodide"; const pyodide = await loadPyodide(); pyodide.runPython("print(2+2)");'
# 4

5. Chromium + Pyodide #

Chromium is probably the world's most popular sandbox. This is pretty much the same as Deno: it's the V8 engine under the hood.

pyodide-sandbox.html:

<!DOCTYPE html>
<html>
<body>
<pre id="output">loading...</pre>
<script src="https://cdn.jsdelivr.net/pyodide/v0.27.5/full/pyodide.js"></script>
<script>
async function main() {
  const output = [];
  const pyodide = await loadPyodide({
    stdout: (text) => output.push(text)
  });
  pyodide.runPython("print(2+2)");
  const el = document.getElementById("output");
  el.textContent = output.join("\n");
  el.dataset.done = "1";
}
main();
</script>
</body>
</html>

Lots of ways to drive Chromium. Puppeteer, headless, etc. Let's try rodney:

uv tool install rodney

rodney start
rodney open file://$PWD/pyodide-sandbox.html
rodney wait '#output[data-done="1"]'
rodney text '#output'
rodney stop
# 4

Exercise for the Reader: Turducken #

Run pyodide inside Deno inside gVisor inside cloud-hypervisor.

The Hard Exercises #

Setting up the networking and the file system/disk sharing for these things is usually not trivial, especially if you don't want to accidentally expose the VMs to each other, and so forth.

Examples in Agentic Topology #

I want to compare two possible agents: a coding agent and a logs agent.

A coding agent needs a full Linux, because, at the end of the day, it needs to edit files and run tests and operate git. Your sandboxing options are going to end up being a VM or a container of some sort.

A logs agent needs access to your logs (say, the ability to run read-only queries on ClickHouse) and it needs to be able to send you its output. In the minimal case, it doesn't need any sandboxing at all, since it can't touch anything beyond those queries. If you want it to be able to produce a graph, however, it will need to write out a file. At the minimum, it will need to take the results of its queries and pair them with an HTML file that has some JS that renders them with Vega-Lite. You might also want to mix and match the results of multiple queries, and do some data munging outside of SQL. This is where a setup like Monty or Pyodide comes in handy. Giving the agent access to some Python considerably expands how much the agent can do, and you can do it cheaply and safely with these sandboxes. In this vein, if you use DSPy for RLM, its implementation gives the LLM the Deno/pyodide solution to let the LLM have "infinite" context.
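Here's roughly what that graph-producing step could look like as plain Python that would fit in one of the small sandboxes. It's a sketch under assumptions: `render_chart_html` is a hypothetical helper, the rows are made-up sample data standing in for query results, and the CDN URLs are the usual jsDelivr ones for Vega, Vega-Lite, and vega-embed rather than anything specific to a particular agent. The function only returns a string; actually writing the file stays with the host.

```python
import json


def render_chart_html(rows: list[dict], x: str, y: str, title: str) -> str:
    """Pair query rows with a page that renders them as a Vega-Lite bar chart."""
    spec = {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "title": title,
        "data": {"values": rows},
        "mark": "bar",
        "encoding": {
            "x": {"field": x, "type": "nominal"},
            "y": {"field": y, "type": "quantitative"},
        },
    }
    # Log contents are untrusted: escape "</" so a row can't close the
    # <script> element early.
    spec_json = json.dumps(spec).replace("</", "<\\/")
    return f"""<!DOCTYPE html>
<html>
<body>
<div id="chart"></div>
<script src="https://cdn.jsdelivr.net/npm/vega@5"></script>
<script src="https://cdn.jsdelivr.net/npm/vega-lite@5"></script>
<script src="https://cdn.jsdelivr.net/npm/vega-embed@6"></script>
<script>vegaEmbed("#chart", {spec_json});</script>
</body>
</html>
"""


# Made-up results of two separate queries, combined outside of SQL.
errors = [{"service": "api", "errors": 12}, {"service": "web", "errors": 3}]
requests = [{"service": "api", "requests": 400}, {"service": "web", "requests": 300}]

requests_by_service = {r["service"]: r["requests"] for r in requests}
rows = [
    {
        "service": e["service"],
        "error_rate": e["errors"] / requests_by_service[e["service"]],
    }
    for e in errors
]

html = render_chart_html(rows, x="service", y="error_rate", title="Error rate by service")
print(len(html), "bytes of HTML")
```

The join in the middle is the "data munging outside of SQL" part: trivial in Python, awkward if the two queries hit different tables or different systems.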

Browser-based agents are a thing too. Itsy-Bitsy is a bookmarklet-based agent. It runs in the context of the web page it's operating on.

Other sandboxes? #

Let me know what other systems I missed!