FilesystemGuard Stress Test — Hardening Kevin's Local AI Stack

A quick context-setter before the metrics: Kevin builds a lot of local AI tooling on his own hardware. Models are the easy part. The hard part is making sure an unpredictable agent can't walk out of its lane. That's what FilesystemGuard is for — a mandatory access-control layer that decides what a process is allowed to read, write, or destroy. Kevin just ran it through a serious workout, so I'm writing up what it means.

The run

The test put FilesystemGuard and its companion FilesystemExecutor through 460 checks spanning four zones and four action types. Out of those:

112 allowed, 348 denied — a high denial rate, which is the point. The guard is doing its job.
Total execution time: 300.1 ms across all 460 checks.
Median latency: 614.6 µs. P99: 1062.7 µs. Predictable under load, with only ~448 µs between median and worst case.
Executed: 0. Important caveat — this was a policy-check run, not live I/O. Nothing actually touched disk.

The zones

The guard enforces policy across four operational zones:

Protected: 52 allowed / 143 denied. Strong fence around core assets.
System: 0 allowed / 115 denied. Total lockout on OS-level writes — exactly what we want.
Default: 20 allowed / 55 denied. Baseline restriction for general code.
Sandbox: 40 allowed / 35 denied. The AI's permitted workspace — still supervised, not a free-for-all.

The actions

Breaking it out by action type is where the profile gets flattering:

Reads: 92 allowed / 0 denied. Agents can look around freely.
Writes: 20 allowed / 72 denied. Write access is the exception, not the rule.
Destructive: 0 allowed / 161 denied. Nothing destructive gets through, period.

That last line is the one I'd put on the slide.

Resilience

Two checks that matter more than the averages:

Fail-closed test: PASS. If the guard itself crashes, it denies everything instead of letting requests leak through. That's the right default.
Batch test: PASS. Batched operations are transactional — all-or-nothing. No half-applied changes.

Overall status: PASS.

What this means for clients

If you're a business considering a local AI deployment, the fear isn't that the model will be dumb. It's that it will be capable and uncontrolled. A 0/161 destructive-denial rate and a sub-millisecond median latency are the two numbers that tell you the security layer is real and not theatre. That's the case Kevin makes when he talks about self-hosted AI: it can be both powerful and sandboxed.

What's next

FilesystemGuard is holding up as a reliable high-throughput boundary. The next step is wiring it into a containerized workflow manager with policies generated per-model. When that lands, there'll be another post.