Q-06: Failure Hardening & Constraints

What can go wrong? What’s the constraint envelope?

Scope:
- Anticipated failures: Pi overload (Docker + LangGraph + agent calls + wiki serving), RG battery death mid-turn, API rate limits, agent timeouts, state serialization failure.
- Unanticipated failures: the “unknown unknowns.” How does the system degrade gracefully when something unplanned breaks?
- Constraint envelope: Pi 4 with 4GB RAM (3.1GB free) and a 29GB SD card (18GB free). RG: 640×480 display, gamepad-only input. Network: 3-5 ms over Tailscale, but what happens if Tailscale goes down, or if the RG is on a different network?
- Recovery patterns: what happens when the player reconnects after a disconnect? State resumption, interrupt recovery, partial output salvage.
- Guard rails: max recursion depth per poller interview, max concurrent agent calls, max state size before pruning.
- Guided by examples: what broke during the Akashic Abyss OXCE prototype? What broke during Oracle Chamber? Which patterns from those failures should inform the hardening?
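The guard rails, agent timeouts, and state-resumption items above can be sketched together. This is a minimal Python sketch, assuming an asyncio-based orchestrator and a JSON-serializable state dict with an append-only `history` list; `Guards`, `prune_state`, `save_checkpoint`, `load_checkpoint`, and every limit value are hypothetical, not existing code or settled numbers, and the sketch is independent of any checkpointing LangGraph itself provides. It caps concurrent agent calls with a semaphore, enforces a per-call timeout and a max interview depth, trims the oldest history entries to a byte budget, and writes checkpoints atomically (serialize first, write a temp file, fsync, then `os.replace`), so a crash or a non-serializable state never clobbers the last good checkpoint a reconnecting player resumes from.

```python
import asyncio
import json
import os
import tempfile
from typing import Any, Awaitable, Callable

# Hypothetical default limits; choosing the real numbers is part of Q-06.
MAX_CONCURRENT_AGENT_CALLS = 2
AGENT_CALL_TIMEOUT_S = 30.0
MAX_INTERVIEW_DEPTH = 5
MAX_STATE_BYTES = 64 * 1024


class GuardRailError(RuntimeError):
    """Raised when a call would exceed a configured guard rail."""


class Guards:
    """Hypothetical wrapper enforcing depth, concurrency, and timeout limits."""

    def __init__(self, max_concurrent: int = MAX_CONCURRENT_AGENT_CALLS,
                 timeout_s: float = AGENT_CALL_TIMEOUT_S,
                 max_depth: int = MAX_INTERVIEW_DEPTH) -> None:
        self.timeout_s = timeout_s
        self.max_depth = max_depth
        # Excess calls queue here instead of piling onto the 4GB Pi.
        self._sem = asyncio.Semaphore(max_concurrent)

    async def call(self, fn: Callable[[], Awaitable[Any]], depth: int) -> Any:
        """Run one agent call under the depth, concurrency, and timeout limits."""
        if depth > self.max_depth:
            raise GuardRailError(f"interview depth {depth} exceeds {self.max_depth}")
        async with self._sem:
            # wait_for cancels an overrunning call and raises asyncio.TimeoutError,
            # so a hung agent cannot hold a concurrency slot forever.
            return await asyncio.wait_for(fn(), timeout=self.timeout_s)


def prune_state(state: dict, max_bytes: int = MAX_STATE_BYTES) -> dict:
    """Drop the oldest state["history"] entries until the JSON encoding fits."""
    history = list(state.get("history", []))  # copy; the caller's state is untouched
    pruned = {**state, "history": history}
    while history and len(json.dumps(pruned).encode("utf-8")) > max_bytes:
        history.pop(0)  # oldest first; pruned["history"] is this same list
    return pruned


def save_checkpoint(state: dict, path: str) -> None:
    """Atomically replace the checkpoint at path with state."""
    # Serialize before touching disk: a non-serializable state raises
    # TypeError here and the last good checkpoint is left untouched.
    data = json.dumps(state)
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # flush the temp file's bytes to storage
        os.replace(tmp_path, path)  # atomic on POSIX within one filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def load_checkpoint(path: str, default: dict) -> dict:
    """Return the last good checkpoint, or default if none is readable."""
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return default
```

In use, each agent invocation would go through `await guards.call(lambda: run_agent(...), depth=d)` (where `run_agent` is a hypothetical coroutine function), and each turn would end with `save_checkpoint(prune_state(state), path)`. The semaphore queues excess calls rather than rejecting them; if the Pi is already near its memory ceiling, failing fast with `GuardRailError` may be the better degradation.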

Depends on: All previous questions. This is the stress-test layer.