The Firecracker engine: HTTP API, jailer, rate limits
This page is the lower-level companion to Architecture. It walks through what the Firecracker engine actually does to boot a VM, what HTTP calls it sends, what the jailer setup looks like, and how the rate limiters work.
If you’re using bhatti, you don’t need this page. If you’re hacking on the engine, extending it, or debugging a weird FC-level failure, this is the reference.
Design decisions on this page
- No Firecracker SDK. We talk directly to FC’s Unix-socket HTTP API with ~20 lines of helpers. The reasoning lives on the Architecture page; this page covers what the API surface actually looks like in code.
- Jailer mode is opt-in via config. Without jailer, FC runs as root. With jailer, each VM’s FC process drops to an unprivileged UID inside a chroot. Single-user dev: skip the jailer. Multi-tenant: turn it on. See Jailer mode.
- Rate limiters are off by default. Firecracker’s per-drive and per-network rate limiters add overhead and aren’t useful on a single-user box. We configure them only when you set them. See Rate limiting.
- We never use FC’s diff snapshots. `track_dirty_pages: false` in machine-config means every snapshot is Full. The full reasoning is in Thermal states.
The HTTP adapter
Bhatti’s entire FC-API layer is a Unix-socket-aware http.Client and three helpers. From pkg/engine/firecracker/fc.go:

```go
func fcAPIClient(socketPath string) *http.Client {
	return &http.Client{Transport: &http.Transport{
		DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
			var d net.Dialer
			return d.DialContext(ctx, "unix", socketPath)
		},
	}}
}

func fcPut(ctx context.Context, c *http.Client, path, body string) error {
	req, _ := http.NewRequestWithContext(ctx, "PUT", "http://localhost"+path, strings.NewReader(body))
	req.Header.Set("Content-Type", "application/json")
	resp, err := c.Do(req)
	// ... check status, read body on error, done
}
```

Plus fcPatch and fcGet along the same lines. Every Firecracker API call goes through one of these three. JSON bodies are built with fmt.Sprintf from the call site (yes, really — see the boot sequence below for what that looks like in practice).
Boot sequence: what FC sees
A bhatti create walks through a fixed sequence of API calls. The order matters — FC enforces it (you can’t InstanceStart before the boot source and drives are registered, and pre-boot config calls are rejected once the instance has started).
In pkg/engine/firecracker/create.go, each phase emits a phase() marker so the create timing is visible if you crank logging up. The sequence:
1. Write bootArgs to /boot-source:

   ```
   PUT /boot-source
   {"kernel_image_path":".../vmlinux-arm64",
    "boot_args":"reboot=k panic=1 pci=off 8250.nr_uarts=0 init=/usr/local/bin/lohar quiet loglevel=0 ip=10.0.1.2::10.0.1.1:255.255.255.0::eth0:off:1.1.1.1:8.8.8.8:"}
   ```

2. Configure the rootfs drive (read-write):

   ```
   PUT /drives/rootfs
   {"drive_id":"rootfs","path_on_host":".../rootfs.ext4",
    "is_root_device":true,"is_read_only":false}
   ```

3. Configure the machine:

   ```
   PUT /machine-config
   {"vcpu_count":2,"mem_size_mib":1024,
    "track_dirty_pages":false,"huge_pages":"None"}
   ```

4. Configure the entropy device (so guests don't block on getrandom()):

   ```
   PUT /entropy
   {"rate_limiter":{"bandwidth":{"size":1024,"one_time_burst":8192,"refill_time":100}}}
   ```

5. Configure the balloon device:

   ```
   PUT /balloon
   {"amount_mib":0,"deflate_on_oom":true,"stats_polling_interval_s":5}
   ```

6. Configure the config drive (read-only):

   ```
   PUT /drives/config
   {"drive_id":"config","path_on_host":".../config.ext4",
    "is_root_device":false,"is_read_only":true}
   ```

7. Configure each volume drive (vdc, vdd, …):

   ```
   PUT /drives/vol-<name>
   {"drive_id":"vol-<name>","path_on_host":".../vol-<name>.ext4", ...}
   ```

8. Configure the network interface:

   ```
   PUT /network-interfaces/eth0
   {"iface_id":"eth0","host_dev_name":"tapXXXXXXXX","guest_mac":"02:..."}
   ```

9. Configure vsock (still configured even though we don't use it post-boot):

   ```
   PUT /vsock
   {"vsock_id":"vsock0","guest_cid":3,"uds_path":".../vsock.sock"}
   ```

10. Start the instance:

    ```
    PUT /actions
    {"action_type":"InstanceStart"}
    ```

Steps 4 (entropy) and 5 (balloon) are easy to miss but matter. Without the entropy device, the guest’s getrandom() blocks at boot waiting for entropy that the kernel never gathers (microVMs don’t have a physical entropy source). Without the balloon, the balloon trick during hot→warm doesn’t work.
After InstanceStart, the host pre-populates the
ARP cache with the
guest’s MAC and starts polling TCP :1024 with exec true. When the
agent answers, the create returns to the user.
Jailer mode
Firecracker’s Jailer
is the recommended way to run FC in production. It chroots each VM’s
FC process into a private directory, drops to an unprivileged UID/GID,
and applies a seccomp filter. Bhatti’s jailer integration is in
pkg/engine/firecracker/jail.go.
When firecracker_jailer and jail_uid/jail_gid are set in
config.yaml, the engine:
- Creates the chroot at <data_dir>/jails/firecracker/<sandbox_id>/.
- Resolves all file references — kernel, rootfs, config drive, volumes — to chroot-relative paths. The actual files don’t need to move; jailer hardlinks them into the chroot.
- Spawns FC as jailer ... --exec-file /path/to/firecracker --uid <uid> --gid <gid> --id <sandbox_id> --chroot-base-dir <data_dir>/jails ....
- Talks to FC’s API socket through the chroot path.
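The spawn step amounts to building one command line. A hedged sketch (jailerCmd is a hypothetical name; the flags are the jailer's documented CLI, the paths and UID/GID values are illustrative):

```go
package main

import (
	"fmt"
	"os/exec"
	"path/filepath"
)

// jailerCmd builds the jailer invocation for one VM. The jailer itself
// creates <chroot-base-dir>/firecracker/<id>/ and drops privileges
// before exec'ing the binary named by --exec-file.
func jailerCmd(dataDir, fcBin, sandboxID string, uid, gid int) *exec.Cmd {
	return exec.Command("jailer",
		"--exec-file", fcBin,
		"--uid", fmt.Sprint(uid),
		"--gid", fmt.Sprint(gid),
		"--id", sandboxID,
		"--chroot-base-dir", filepath.Join(dataDir, "jails"),
	)
}
```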
For snapshot operations under jailer, FC writes the snapshot artifacts
inside the chroot, then bhatti moves them out atomically after FC
exits — see
pkg/engine/firecracker/lifecycle.go:84-94.
Without jailer, FC runs as root with full host visibility. Single-user dev: this is fine. Multi-tenant: turn it on. The jailer hardens the isolation between FC processes — a kernel exploit that breaks out of KVM still has to get past seccomp + UID + chroot before reaching anything that matters.
Rate limiting
Firecracker supports per-drive and per-network bandwidth and IOPS
rate limiters
(FC rate limiter docs).
Bhatti’s wrapper is in
pkg/engine/firecracker/engine.go:21-34:
```go
type RateLimitConfig struct {
	NetBandwidthBytes  int64 // bytes/s per direction
	NetBurstBytes      int64 // one-time burst bytes
	DiskBandwidthBytes int64 // bytes/s
	DiskIOPS           int64 // ops/s
}
```

Defaults are zero, which means disabled. We don’t apply rate limiters unless you configure them. The reasoning: FC’s rate limiter is a token bucket per drive/iface, and on a single-user box it adds syscall overhead without any real benefit.
When configured, the limiters apply at the drive and network-interface
level via FC’s API. For drives this is in
create.go:298-302:
```go
if bw := e.cfg.RateLimits.diskBandwidth(); bw > 0 {
	iops := e.cfg.RateLimits.diskIOPS()
	rootfsDrive += fmt.Sprintf(
		`,"rate_limiter":{"bandwidth":{"size":%d,"refill_time":1000},"ops":{"size":%d,"refill_time":1000}}`,
		bw, iops)
}
```

refill_time is in milliseconds — 1000 means the bucket refills once per second. So a DiskBandwidthBytes: 100_000_000 config gives each VM 100 MB/s sustained disk throughput.
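As a sanity check on the units: the sustained rate is just the bucket size scaled to one second. A tiny illustrative helper (not in the codebase):

```go
package main

// sustainedPerSec converts FC token-bucket parameters (size in bytes
// or ops, refill_time in milliseconds) into a sustained per-second rate.
func sustainedPerSec(size, refillTimeMs int64) int64 {
	return size * 1000 / refillTimeMs
}
```

Applied to the entropy device from the boot sequence (size 1024, refill_time 100), this gives 10,240 bytes of entropy per second, plenty for getrandom() while bounding a misbehaving guest.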
For a multi-tenant deployment, useful values:
```yaml
firecracker:
  rate_limits:
    net_bandwidth: 12500000    # 100 Mbps per direction
    net_burst: 10000000        # 10 MB burst
    disk_bandwidth: 104857600  # 100 MB/s
    disk_iops: 10000           # 10K ops/s
```

These give each VM a hard ceiling so one noisy neighbor can’t saturate the host’s disk or upstream link.
Listening port discovery
When a request hits a sandbox proxy URL, the daemon needs to know which ports the guest has open. Rather than maintain a host-side registry that gets stale, bhatti just asks the guest:
```go
// pkg/engine/firecracker/exec.go:198
result, _ := ag.Exec(ctx, []string{"ss", "-tln", "--no-header"}, nil, "")
return parseSSOutput(result.Stdout), nil
```

This is on-demand: we don’t poll. The proxy queries listening ports
when a request arrives, and the cost is one exec (a few
milliseconds) per query. Guest-side ss is fast and accurate — it
queries the kernel directly rather than walking /proc/net/tcp.
The trade-off: this only works on hot VMs. The ListeningPorts call
errors out for warm or cold sandboxes. The public proxy handles this
by ensureHot-ing the sandbox first — if you publish a port and a
request hits the cold URL, the proxy wakes the VM, queries ports,
checks that the requested port is in the list, then forwards.
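parseSSOutput itself is a few lines of column splitting. A hedged sketch, assuming the standard ss -tln column layout; the real parser in exec.go may differ:

```go
package main

import (
	"strconv"
	"strings"
)

// parseSSOutput extracts listening ports from `ss -tln --no-header`
// output, whose columns are State, Recv-Q, Send-Q, Local:Port, Peer:Port.
func parseSSOutput(out string) []int {
	seen := map[int]bool{}
	var ports []int
	for _, line := range strings.Split(out, "\n") {
		fields := strings.Fields(line)
		if len(fields) < 4 {
			continue
		}
		local := fields[3]
		// The port is everything after the last ':', which also
		// handles IPv6 forms like [::]:80.
		i := strings.LastIndex(local, ":")
		if i < 0 {
			continue
		}
		port, err := strconv.Atoi(local[i+1:])
		if err != nil || seen[port] {
			continue
		}
		seen[port] = true
		ports = append(ports, port)
	}
	return ports
}
```

Deduplication matters because a service listening on both IPv4 and IPv6 shows up as two lines for the same port.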
Recovery: what gets persisted
FC’s state machine doesn’t survive process restart on its own — you
get a fresh FC process every time. Bhatti persists the per-VM
configuration as JSON in the fc_state table so the next daemon
startup can rebuild and resume.
The state captured per VM
(pkg/engine/firecracker/helpers.go:120-180):
- vcpu_count, mem_size_mib
- tap_device, guest_ip, guest_mac
- rootfs_path, snap_mem_path, snap_vm_path
- volumes — list of attached volume IDs and mount points
- agent_token — for reattaching to the agent after restore
- fc_path_origin — the original sandbox ID whose paths got baked into the snapshot (matters when you create a sandbox from a named snapshot — see below)
- keep_hot
On daemon startup,
recoverVMs
reads each row, calls engine.RestoreVM(...), and the engine rebuilds
the in-memory VM struct from the persisted state. If the snapshot
files exist and the FC process is dead, the VM is marked stopped —
ready to resume on the next API call.
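The persisted row maps naturally onto one JSON-tagged struct. An illustrative sketch of the shape; field names follow the list above, but the real struct in helpers.go may differ:

```go
package main

import "encoding/json"

// vmState is the per-VM configuration serialized into the fc_state
// table. Volumes maps volume ID to mount point (an assumption; the
// real representation may be a list).
type vmState struct {
	VCPUCount    int               `json:"vcpu_count"`
	MemSizeMiB   int               `json:"mem_size_mib"`
	TapDevice    string            `json:"tap_device"`
	GuestIP      string            `json:"guest_ip"`
	GuestMAC     string            `json:"guest_mac"`
	RootfsPath   string            `json:"rootfs_path"`
	SnapMemPath  string            `json:"snap_mem_path"`
	SnapVMPath   string            `json:"snap_vm_path"`
	Volumes      map[string]string `json:"volumes"`
	AgentToken   string            `json:"agent_token"`
	FCPathOrigin string            `json:"fc_path_origin"`
	KeepHot      bool              `json:"keep_hot"`
}

// encodeState and decodeState are the round-trip through the JSON column.
func encodeState(s vmState) ([]byte, error) { return json.Marshal(s) }

func decodeState(b []byte) (vmState, error) {
	var s vmState
	err := json.Unmarshal(b, &s)
	return s, err
}
```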
fc_path_origin and snapshot-derived sandboxes
When you create a sandbox from a named snapshot
(bhatti snapshot resume <snap> --name new-vm), FC’s snapshot file
contains absolute paths to the rootfs and config files of the
original sandbox. Those paths don’t exist anymore (the original may
have been destroyed) and even if they did, we’d be operating on the
wrong files.
The fix in
lifecycle.go:354-380
is to symlink the original paths to the new paths just before
/snapshot/load, then remove the symlinks once FC has the file
descriptors open. Hacky. Effective. Documented in the commit history
because it tripped up several rounds of debugging.
In jailer mode, this isn’t an issue because all paths are chroot-relative — FC only sees its own chroot. The symlink dance is a bare-mode concession.
Snapshot artifact layout
A cold sandbox has, on disk:

```
<data_dir>/sandboxes/<id>/
├── rootfs.ext4       CoW copy, may have been written to
├── config.ext4       the 1MB config drive
├── vol-<name>.ext4   attached volumes
├── mem.snap          full memory snapshot (size of VM RAM)
└── vm.snap           VM state snapshot (small, KB-range)
```

mem.snap is the size of the VM’s allocated RAM. If you have a 4 GB
VM that’s been cold for an hour, mem.snap is 4 GB on disk. vm.snap
is small — it holds CPU registers, vCPU state, and FC’s device-model
state.
For named snapshots (created via bhatti snapshot create), the
artifacts go to <data_dir>/snapshots/<user>/<name>/ and include
copies of the rootfs and volumes alongside mem.snap and vm.snap,
so the snapshot is fully self-contained — you can resume it into a
fresh sandbox even after the original is destroyed.
Where to go next
- Architecture — the bird’s-eye view that this page sits underneath
- Thermal states — what the engine does for Stop/Start/Pause/Resume
- Networking — the bridge, TAP, and IP plumbing that the create flow above depends on
- Decisions & learnings — the no-SDK choice, the rory incident, and other engine-level lessons