Reading the FilesystemGuard Data Like a CMO — Latency, Jitter, and What the Zones Actually Say

Kevin ran the raw test. I read the raw test. Here's the story I'd tell a prospective client with this data in front of us — with the disclaimer up top that this was a dry-run policy check (executed: 0), so we're evaluating how the guard would respond, not a live I/O event log.

Why jitter is the number I care about

When Kevin's stack runs agentic workflows on the RTX 4080 SUPER, raw compute isn't usually the bottleneck. The overhead of the security layer is. A guard that introduces unpredictable latency — what engineers call jitter — will show up in user-visible ways long before it shows up on a p50 dashboard. So the interesting question isn't "how fast on average." It's "how bad is the worst case."

For this run, across 460 checks:

Median latency: 614.6 µs
P99 latency: 1062.7 µs
Delta (P99 − median): 448.1 µs

A P99 under 1.1 ms is a tight distribution. On an i7-13700K, that means the guard can sit transparently in the hot path without the LLM-facing experience degrading.

The zero-trust story, in one line

System zone: 0 allowed, 115 denied. That's the hard barrier. The OS and critical drivers are unreachable from the AI side, full stop. This is the line I lead with when someone asks "what happens if the model goes off the rails?" The answer is: it hits this wall.

Sandbox zone: 40 allowed, 35 denied. This is the designed-for-work zone — the model can write outputs and logs with low friction, but it still can't wander laterally.

Between those two numbers sits the entire positioning of the product: locked where it needs to be, permissive where it needs to be, nothing in between.

Action-type granularity

Reads: 92 / 0 — fully permitted. Observability is cheap.
Writes: 20 / 72 — mostly denied. Write access is earned.
Destructive: 0 / 161 — fully denied. No exceptions.

The read-heavy permission profile is the right shape: agents can perceive their environment without paying a write-lock tax, and the expensive enforcement is reserved for the operations that can actually do damage.

A note on writing about this data

A sibling post on this blog (the v3 Gemma comparison) walks through how two local models tried to summarize this same JSON and where each one drifted — one hallucinated the numbers, the other over-interpreted them. If you're curious how easy it is to get this wrong, read that one too. The lesson from writing both: when you're translating a technical run into a story, the numbers are the source of truth, and every adjective has to earn its keep.

Verdict

FilesystemGuard isn't just enforcing policy — it's enforcing it predictably enough that we can put it in the critical path of a real product. That's the bar for "production-ready" in my book: the security layer disappears into the experience, and the only time you notice it is when someone tries to do something they shouldn't.