
The wire protocol

All communication between the bhatti host process and a guest agent happens over a binary framing protocol. The same protocol runs over TCP (production), Unix sockets (tests), and — historically — vsock (left over from before we fully moved to TCP). The protocol is engine-independent: the entire agent test suite runs on macOS over net.Pipe() without any VM or root.

  1. Custom binary framing, not gRPC or HTTP. The framing layer is ~130 lines of Go. It runs inside a microVM where every dependency is dead weight. gRPC would add protobuf codegen, a runtime, and complexity disproportionate to a protocol with eight frame types. See Why not HTTP or gRPC.
  2. WriteFrame does one Write() call. The entire frame is assembled into a single buffer before going to the wire. Otherwise, two goroutines writing concurrent stdout/stderr chunks would interleave at byte boundaries and corrupt frames. See Atomic frame writes.
  3. AUTH must be the first frame. If a token is configured, lohar refuses any other frame on a fresh connection until it sees an AUTH frame within 5 seconds. See Auth.
  4. Unknown frame types are skipped, not errored. A new frame type can be added without breaking older clients. See Forward compatibility.
  5. One connection per operation. Most agent calls open a TCP connection, do their thing, close. TTY sessions and forwards are the exceptions. See Connection model.
┌────────────────┬───────────┬───────────────────┐
│ Length (4B BE) │ Type (1B) │ Payload (N bytes) │
└────────────────┴───────────┴───────────────────┘

Length is a 4-byte big-endian unsigned integer. It equals 1 + len(Payload) — the type byte plus the payload. It does not include the 4-byte length prefix itself.

Type is a single byte identifying the frame kind.

Payload is variable-length, up to MaxFrameSize - 1 bytes.

Maximum frame size is 1 MB (pkg/agent/proto/constants.go:69). Both WriteFrame and ReadFrame enforce this — oversized frames are rejected, not truncated.

The encoding is straightforward (pkg/agent/proto/frame.go:10-28):

frameLen := 1 + len(payload) // type byte + payload; the length prefix itself is not counted
buf := make([]byte, 4+frameLen)
binary.BigEndian.PutUint32(buf[0:4], uint32(frameLen))
buf[4] = msgType
copy(buf[5:], payload)
_, err := w.Write(buf) // one Write: the whole frame hits the wire together

The single-buffer approach above isn’t an optimization — it’s a correctness requirement. The agent’s piped exec has stdout and stderr goroutines writing to the same connection at the same time. If WriteFrame did three Write() calls (length, type, payload), two goroutines could interleave at any byte boundary. The kernel would deliver something like:

[len=1024][type=STDOUT] ← goroutine A starts
[len=512][type=STDERR] ← goroutine B writes its full frame
[1024 bytes of stdout] ← goroutine A finishes, but the order on the wire is now broken

The receiver tries to parse len=1024 type=STDOUT plus the next 1024 bytes — but the next bytes are goroutine B’s length prefix, not goroutine A’s stdout. Frames are corrupt for the rest of the connection.

By assembling the whole frame into one buffer and calling Write() once, the kernel atomically delivers the whole frame to the socket buffer (for sizes under the kernel’s pipe-buffer limit, which is much larger than 1 MB on every platform we run on). Concurrent goroutines get their frames serialized end-to-end, never interleaved.

This is why the piped exec goroutines push frames through a single channel to a single writer goroutine — defense in depth on top of WriteFrame’s atomicity. Both levels matter.
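The single-writer pattern can be sketched like this (the frame type and function names here are mine, not the agent's):

```go
package main

import (
	"fmt"
	"sync"
)

// frame pairs a type byte with its payload; illustrative, not the real agent type.
type frame struct {
	msgType byte
	payload []byte
}

// startWriter drains frames from a channel in a single goroutine, so frames
// from concurrent producers (stdout, stderr) are serialized end-to-end and
// the single WriteFrame call per frame never interleaves with another.
func startWriter(frames <-chan frame, write func(frame)) *sync.WaitGroup {
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		for f := range frames {
			write(f) // one whole frame per call, never split
		}
	}()
	return &wg
}

func main() {
	frames := make(chan frame)
	var order []byte
	wg := startWriter(frames, func(f frame) { order = append(order, f.msgType) })

	var producers sync.WaitGroup
	for _, t := range []byte{0x02, 0x03} { // STDOUT and STDERR producers
		producers.Add(1)
		go func(t byte) {
			defer producers.Done()
			for i := 0; i < 3; i++ {
				frames <- frame{msgType: t}
			}
		}(t)
	}
	producers.Wait()
	close(frames)
	wg.Wait()
	fmt.Println(len(order)) // 6 frames, each delivered whole
}
```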

All frame type constants are in pkg/agent/proto/constants.go.

Type    Byte  Direction     Payload
STDIN   0x01  host → guest  raw bytes for child's stdin
STDOUT  0x02  guest → host  child's stdout bytes
STDERR  0x03  guest → host  child's stderr bytes

Type    Byte  Direction     Payload
RESIZE  0x04  host → guest  [u16 rows BE][u16 cols BE] (4 bytes exactly)
EXIT    0x05  guest → host  [i32 exit_code BE] (4 bytes exactly)
ERROR   0x06  either        UTF-8 error message (variable length)
KILL    0x07  host → guest  empty payload

Type      Byte  Direction     Payload
EXEC_REQ  0x10  host → guest  JSON-encoded ExecRequest
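For the fixed-size payloads above, encoding is plain big-endian packing. A sketch (the helper names are mine):

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeResize packs a RESIZE payload: [u16 rows BE][u16 cols BE], 4 bytes exactly.
func encodeResize(rows, cols uint16) []byte {
	p := make([]byte, 4)
	binary.BigEndian.PutUint16(p[0:2], rows)
	binary.BigEndian.PutUint16(p[2:4], cols)
	return p
}

// decodeExit unpacks an EXIT payload: [i32 exit_code BE], 4 bytes exactly.
func decodeExit(p []byte) int32 {
	return int32(binary.BigEndian.Uint32(p))
}

func main() {
	fmt.Printf("% x\n", encodeResize(24, 80))               // 00 18 00 50
	fmt.Println(decodeExit([]byte{0xff, 0xff, 0xff, 0xff})) // -1
}
```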

The ExecRequest shape is in pkg/agent/proto/messages.go:5-19: argv, env, tty, rows, cols, cwd, session_id, max_idle_sec, if_detached, detach, output_file, session. Most are optional pointers — nil means “use default.”

Type  Byte  Direction     Payload
AUTH  0x11  host → guest  raw token bytes

If a token is configured (see Auth below), this must be the first frame on every connection.

Type      Byte  Direction     Payload
FWD_REQ   0x20  host → guest  JSON {"port": 8080}
FWD_RESP  0x21  guest → host  JSON {"status": "ok"} or {"status":"error","message":"..."}
Type            Byte  Direction     Payload
EXEC_LIST_REQ   0x30  host → guest  empty
EXEC_LIST_RESP  0x31  guest → host  JSON []SessionInfo
EXEC_KILL       0x32  host → guest  JSON {"session_id": "..."}
SESSION_INFO    0x33  guest → host  JSON SessionInfo (sent on create or attach, before STDOUT)
Type           Byte  Direction     Payload
ACTIVITY_REQ   0x40  host → guest  empty
ACTIVITY_RESP  0x41  guest → host  JSON ActivityInfo (last activity timestamp, attached count)
Type             Byte  Direction     Payload
FILE_READ_REQ    0x50  host → guest  JSON {"path":"...","offset":1,"limit":2000,"max_bytes":51200}
FILE_READ_RESP   0x51  guest → host  JSON {"size":1234,"mode":"0644"}
FILE_WRITE_REQ   0x52  host → guest  JSON {"path":"...","mode":"0644","size":1234}
FILE_WRITE_RESP  0x53  guest → host  JSON {"status":"ok"}
FILE_STAT_REQ    0x54  host → guest  JSON {"path":"..."}
FILE_STAT_RESP   0x55  guest → host  JSON FileInfo
FILE_LS_REQ      0x56  host → guest  JSON {"path":"..."}
FILE_LS_RESP     0x57  guest → host  JSON []FileInfo

Systemctl IPC (in-guest, not host↔guest)

Section titled “Systemctl IPC (in-guest, not host↔guest)”
Type            Byte  Direction                     Payload
SYSTEMCTL_REQ   0x60  client → lohar (Unix socket)  JSON SystemctlRequest
SYSTEMCTL_RESP  0x61  lohar → client                JSON SystemctlResponse

These are spoken over /run/bhatti/systemctl.sock inside the guest, not over the host↔guest TCP connection. They exist so the systemctl shim’s user-facing invocation can ask PID 1 lohar to perform privileged operations — see Lohar’s systemctl IPC.

The trust boundary is SO_PEERCRED on the socket, not anything in the request payload. Without it, a non-root caller could forge a UID claim in the request and have lohar perform a privileged operation on its behalf.

Two TCP ports inside each VM, two purposes:

  • Port 1024 (control) — exec, sessions, files, activity queries
  • Port 1025 (forward) — port forwarding / TCP tunneling

Both ports also exist as vsock listeners for historical reasons, but the host always uses TCP — see the lohar story for why.

One connection per operation. The host dials port 1024, optionally sends an AUTH frame, sends exactly one request frame, reads responses until the operation completes, then the connection closes.

Host                                   Lohar
│                                      │
├──TCP connect :1024 ─────────────────►│
├──AUTH frame (if token configured) ──►│
├──EXEC_REQ frame ────────────────────►│
│                                      ├──fork/exec child
│◄──STDOUT frame───────────────────────┤
│◄──STDOUT frame───────────────────────┤
│◄──STDERR frame───────────────────────┤
│◄──EXIT frame─────────────────────────┤
└──connection closed───────────────────┘

Exception: TTY sessions keep the connection open for bidirectional I/O. The host sends STDIN and RESIZE frames; the guest sends STDOUT frames and eventually an EXIT frame. If the host disconnects, the session detaches (process keeps running, scrollback buffer captures output — see Sessions).

One connection per tunnel. After the FWD_REQ/FWD_RESP handshake, the framing protocol is abandoned — the connection becomes a raw bidirectional TCP relay.

Host                                   Lohar                       Target (localhost:8080)
│                                      │                           │
├──TCP connect :1025 ─────────────────►│                           │
├──AUTH frame ────────────────────────►│                           │
├──FWD_REQ {"port": 8080} ────────────►│                           │
│                                      ├──TCP connect :8080 ──────►│
│◄──FWD_RESP {"status": "ok"} ─────────┤                           │
│                                      │                           │
│═══ raw bytes (no framing) ══════════►│══════════════════════════►│
│◄══════════════════════════════════════◄═══════════════════════════

bhatti exec doesn’t use this path, but the reverse proxy does. When you hit a published URL, the daemon picks up an HTTP request, opens a forward connection to the agent, sends FWD_REQ for the target port, and then pipes the request body through. The agent relays it to localhost:port inside the VM, and the response comes back the same way.

Forward connections don’t have framing because they’re proxying arbitrary TCP — HTTP, WebSocket, SSH, whatever. Adding framing would mean parsing HTTP on the agent side, which we don’t want to do.

If a token is configured (via the config drive at boot), the first frame on every connection must be AUTH with the token as payload. Lohar validates it within a 5-second deadline. Invalid or missing auth gets an ERROR frame and the connection is closed.

The token is generated per-sandbox during Create() — 16 random bytes, hex-encoded — and injected into the VM via the config drive. It’s then stored in the host’s AgentClient and not used for anything else. If the daemon restarts and re-reads the agent token from fc_state, the token survives — VMs across daemon restarts use the same token they were created with.

The token comparison uses subtle.ConstantTimeCompare (cmd/lohar/handler.go:69) so a network observer can’t time the comparison to figure out a prefix of the token.
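A sketch of that check (the function name is mine; the comparison primitive is the same one the handler uses):

```go
package main

import (
	"crypto/subtle"
	"fmt"
)

// authOK compares the presented token against the configured one in constant
// time, so the comparison's duration leaks nothing about how many leading
// bytes matched.
func authOK(presented, configured []byte) bool {
	return subtle.ConstantTimeCompare(presented, configured) == 1
}

func main() {
	token := []byte("deadbeefcafebabe")
	fmt.Println(authOK([]byte("deadbeefcafebabe"), token)) // true
	fmt.Println(authOK([]byte("deadbeefcafebabf"), token)) // false
}
```

Note that ConstantTimeCompare returns 0 immediately for mismatched lengths, so only the token's length is observable, not its content.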

File reads support server-side truncation to avoid transferring large files when the consumer only needs the first N lines. This matters because coding agents always truncate file reads — typically 2000 lines or 50 KB — and shipping a 100 MB log over the wire just to discard 99.95% of it is a waste.

host → guest:  FILE_READ_REQ {"path":"/app.log","offset":1,"limit":2000,"max_bytes":51200}
guest → host:  FILE_READ_RESP {"size":10485760,"mode":"0644"}   ← total file size
guest → host:  STDOUT (line data)
guest → host:  STDOUT (line data)
               ...                                              ← stops when limit or max_bytes hit
guest → host:  EXIT code=0
  • offset — 1-indexed line number to start from (0 or absent = beginning).
  • limit — maximum lines to return (0 = unlimited).
  • max_bytes — maximum bytes to return (0 = unlimited).

Whichever limit hits first stops the read. Without any truncation parameters the full file is streamed. The FILE_READ_RESP always contains the total file size so the consumer knows whether content was truncated.

Directories and non-regular files are rejected with an ERROR frame. File reads are cancellable via context — closing the connection gives lohar a broken pipe, stopping the transfer immediately.

Writes are atomic. Lohar writes to a temp file, fsyncs, then renames over the target. Concurrent readers see either the old content or the new content, never a half-written state.

host → guest:  FILE_WRITE_REQ {"path":"/workspace/app.js","mode":"0644","size":1234}
host → guest:  STDIN (content bytes)
host → guest:  STDIN (content bytes)
               ...                    ← until size bytes sent
guest → host:  FILE_WRITE_RESP {"status":"ok"}

Negative sizes are rejected (prevents silent data loss from a missing Content-Length on the HTTP side that would otherwise pass through as a default zero). If the connection drops mid-write, the temp file is cleaned up.

Why fsync before rename? See the explanation in lohar.md — short version: the host might snapshot the VM the next millisecond, and dirty pages in the guest’s page cache aren’t part of the FC snapshot. Without fsync, the rename is metadata-durable but the data isn’t on the virtio-blk device.

Context                 Signal                    Why
Piped exec (non-TTY)    SIGKILL to process group  Agents need instant, reliable abort. Child processes (npm → node) must die immediately.
TTY session disconnect  None (detach)             Process keeps running, session detaches. Scrollback captures output.
TTY session KILL frame  SIGTERM to process group  Allows graceful shutdown. If the process handles SIGTERM and survives, the session remains reattachable.
EXEC_KILL API           SIGKILL to process group  Explicit force-kill by session ID.
Idle timer expiry       SIGKILL to process group  Session is abandoned, no observer.

All kill operations target the process group (negative PID), not just the session leader. This requires Setpgid: true on the SysProcAttr so child processes are in the same group. Without this, npm install would survive killing the shell that launched it.

ReadFrame in the client skips unknown frame types rather than erroring. This allows the protocol to be extended without breaking existing clients — a new frame type added to lohar won’t crash an older bhatti host.

The flip side: a new client sending a frame an older agent doesn’t understand will get its connection closed (the agent ignores the frame, but no response comes back). So forward compatibility is asymmetric — host upgrades cleanly, but the agent has to be at least as new as the most-recent frame type the host wants to use.

In practice the only frame types added since v1.0 are SYSTEMCTL_REQ/RESP (in-guest, not host↔guest) and the file ops expanding from read/write to include stat/ls. Both are guarded by capability checks before use.

The framing layer is ~130 lines of Go (pkg/agent/proto/frame.go) and handles concurrent stdout/stderr multiplexing, binary file transfers, and terminal I/O with zero dependencies.

gRPC would mean:

  • protobuf code generation in the build pipeline
  • a runtime dependency inside the VM (heavy for an agent that’s supposed to be small and static)
  • complexity disproportionate to a protocol with eight frame types
  • streaming RPCs would handle the stdout/stderr case but with more moving parts than the current channel-serialized writer

HTTP would mean:

  • header parsing on every exec
  • chunked encoding negotiation
  • content-type negotiation
  • a connection that’s already authenticated and never leaves the host paying the cost of HTTP semantics it doesn’t need

The framing protocol is what we’d write if we sat down to design a host↔guest protocol from scratch. Length, type, payload. Atomic write of the whole thing. That’s it.

  • Lohar — the agent that speaks this protocol on the guest side
  • Architecture — the bigger picture of host ↔ guest communication
  • Decisions & learnings — including why post-restore vsock didn’t work and we ended up TCP-everywhere