Failure Modes & Safe Defaults
Governance systems must define behavior when components fail. The choice between fail-closed (block operations) and fail-open (allow operations) depends on your threat model and operational requirements.
What failure modes exist?
The governance boundary can fail in several ways, each requiring defined behavior:
Boundary Unavailable
The governance boundary process is down, crashed, or unreachable. Operations cannot be evaluated.
Policy Load Failure
The policy artifact cannot be loaded, parsed, or verified. The boundary cannot determine how to evaluate operations.
Measurement Failure
Subject measurement cannot be completed: the hash cannot be computed, a file is missing, or the measurement times out.
Receipt Persistence Failure
The receipt cannot be written to the chain because storage is full, an I/O error occurs, or the signing key is unavailable.
Time Attestation Failure
The Time Stamping Authority (TSA) is unreachable or returns an error, so no trusted timestamp can be obtained for the receipt.
What is fail-closed?
Fail-closed blocks operations when the governance system cannot evaluate them. This prioritizes security over availability.
When to Use Fail-Closed
- ✓ High-risk operations (financial, safety-critical)
- ✓ Regulatory requirements mandate an audit trail
- ✓ Untrusted execution environment
- ✓ Zero-tolerance security posture
Tradeoffs
- ✕ Availability impact during failures
- ✕ Denial-of-service risk if the boundary is attacked
- ✕ Cascading failures possible
- ✕ Requires a high-availability boundary
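Fail-closed behavior can be sketched in a few lines. This is a minimal illustration, not the boundary's actual interface: the `evaluate` callable and the `BoundaryUnavailable` exception are hypothetical stand-ins for a governance check that may be unreachable. The point is that a missing verdict is treated the same as a denial.

```python
from typing import Callable


class BoundaryUnavailable(Exception):
    """Hypothetical error: the boundary could not evaluate the operation."""


def run_fail_closed(operation: Callable[[], str],
                    evaluate: Callable[[], bool]) -> str:
    """Run `operation` only if the boundary explicitly allows it."""
    try:
        allowed = evaluate()
    except BoundaryUnavailable:
        # Fail-closed: no verdict means no execution.
        return "BLOCKED"
    return operation() if allowed else "BLOCKED"


def unreachable_boundary() -> bool:
    # Simulates a crashed or unreachable boundary process.
    raise BoundaryUnavailable("boundary process is down")


result_down = run_fail_closed(lambda: "EXECUTED", unreachable_boundary)
result_up = run_fail_closed(lambda: "EXECUTED", lambda: True)
result_denied = run_fail_closed(lambda: "EXECUTED", lambda: False)
```

Here `result_down` and `result_denied` are both blocked, and only `result_up` executes.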
What is fail-open?
Fail-open allows operations to proceed when governance cannot evaluate them. This prioritizes availability over security.
When to Use Fail-Open
- ✓ Availability-critical systems
- ✓ Governance is advisory, not mandatory
- ✓ Trusted execution environment
- ✓ Compensating controls exist
Tradeoffs
- ✕ Ungoverned operations during failure
- ✕ Audit trail gaps
- ✕ Potential compliance issues
- ✕ Must track DEGRADED state
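A fail-open path might look like the sketch below. It assumes a hypothetical `evaluate` callable and `BoundaryUnavailable` exception, and an in-memory `audit_log` list standing in for whatever record a real deployment keeps. The operation proceeds when no verdict is available, but the gap is recorded so it can be tracked later.

```python
from typing import Callable, List, Tuple


class BoundaryUnavailable(Exception):
    """Hypothetical error: the boundary could not evaluate the operation."""


def run_fail_open(operation: Callable[[], str],
                  evaluate: Callable[[], bool],
                  audit_log: List[str]) -> Tuple[str, str]:
    """Return (result, governance_state) for one operation."""
    try:
        allowed = evaluate()
    except BoundaryUnavailable as exc:
        # Fail-open: proceed, but record that no evaluation occurred.
        audit_log.append(f"UNGOVERNED: {exc}")
        return operation(), "UNGOVERNED"
    if not allowed:
        # An explicit denial is still honored; only missing verdicts fail open.
        return "BLOCKED", "GOVERNED"
    return operation(), "GOVERNED"


def unreachable_boundary() -> bool:
    raise BoundaryUnavailable("boundary process is down")


log: List[str] = []
open_result = run_fail_open(lambda: "EXECUTED", unreachable_boundary, log)
denied_result = run_fail_open(lambda: "EXECUTED", lambda: False, log)
```

Note that fail-open applies only to the failure case: a boundary that is reachable and says no still blocks the operation.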
How do I configure failure behavior?
The policy artifact defines failure behavior per failure mode. Within degraded_behavior, max_operations caps the number of ungoverned operations and max_duration_seconds caps the time spent in degraded mode:
{
  "failure_policy": {
    "boundary_unavailable": "FAIL_CLOSED",
    "policy_load_failure": "FAIL_CLOSED",
    "measurement_failure": "BLOCK_AND_ALERT",
    "receipt_persistence_failure": "CONTINUE_DEGRADED",
    "time_attestation_failure": "CONTINUE_DEGRADED",
    "degraded_behavior": {
      "max_operations": 100,
      "max_duration_seconds": 300,
      "on_limit_reached": "FAIL_CLOSED"
    },
    "recovery": {
      "auto_retry_interval_ms": 5000,
      "max_retries": 3,
      "on_recovery": "EMIT_RECOVERY_RECEIPT"
    }
  }
}
What are the degraded states?
When operating in degraded mode, the system should clearly mark the condition:
- ■ DEGRADED_LOCAL: TSA unavailable, using local timestamps
- ■ DEGRADED_MEASUREMENT: Partial measurement, some files inaccessible
- ■ DEGRADED_PERSISTENCE: Receipts buffered in memory, not yet persisted
- ■ UNGOVERNED: Boundary bypassed, no evaluation occurred
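Degraded mode is meant to be bounded. One way to enforce the max_operations and max_duration_seconds limits from degraded_behavior is a small budget tracker like the one below. The `DegradedBudget` class and its `admit` method are hypothetical names for illustration. The sketch assumes the clock starts at the first degraded operation and that both limits lead to FAIL_CLOSED, matching on_limit_reached in the example policy.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class DegradedBudget:
    """Tracks how long and how often the system has run degraded."""
    max_operations: int = 100
    max_duration_seconds: float = 300.0
    clock: Callable[[], float] = time.monotonic  # injectable for testing
    started_at: Optional[float] = None
    operations: int = 0

    def admit(self, state: str) -> str:
        """Return the marker to stamp on the operation, or FAIL_CLOSED."""
        now = self.clock()
        if self.started_at is None:
            self.started_at = now  # degraded period begins here
        elapsed = now - self.started_at
        if (self.operations >= self.max_operations
                or elapsed > self.max_duration_seconds):
            return "FAIL_CLOSED"
        self.operations += 1
        return state


# Operation limit: the third degraded operation exceeds a budget of two.
count_budget = DegradedBudget(max_operations=2, clock=lambda: 0.0)
count_results = [count_budget.admit("DEGRADED_LOCAL") for _ in range(3)]

# Time limit: a fake clock jumps past the 300-second window.
fake_now = [0.0]
time_budget = DegradedBudget(clock=lambda: fake_now[0])
before = time_budget.admit("DEGRADED_PERSISTENCE")
fake_now[0] = 301.0
after = time_budget.admit("DEGRADED_PERSISTENCE")
```

A monotonic clock is used by default so that wall-clock adjustments do not shorten or extend the degraded window.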
Frequently asked questions
Can I mix fail-open and fail-closed?
Yes. Configure different behaviors for different failure modes. For example: fail-closed for policy load failures, but continue-degraded for TSA failures.
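As a rough sketch, mixing behaviors amounts to a lookup from failure mode to action. The dictionary below copies the values from the example policy. The `resolve` function is hypothetical, and two of its choices are assumptions rather than documented behavior: BLOCK_AND_ALERT is treated as a blocking action, and an unknown failure mode defaults to FAIL_CLOSED.

```python
from typing import Dict, Tuple

FAILURE_POLICY: Dict[str, str] = {
    "boundary_unavailable": "FAIL_CLOSED",
    "policy_load_failure": "FAIL_CLOSED",
    "measurement_failure": "BLOCK_AND_ALERT",
    "receipt_persistence_failure": "CONTINUE_DEGRADED",
    "time_attestation_failure": "CONTINUE_DEGRADED",
}

# Assumption: both of these actions stop the operation.
BLOCKING_ACTIONS = {"FAIL_CLOSED", "BLOCK_AND_ALERT"}


def resolve(failure_mode: str,
            policy: Dict[str, str] = FAILURE_POLICY) -> Tuple[str, bool]:
    """Return (action, may_proceed) for a given failure mode."""
    # Assumption: a mode missing from the policy takes the safest default.
    action = policy.get(failure_mode, "FAIL_CLOSED")
    return action, action not in BLOCKING_ACTIONS


policy_load = resolve("policy_load_failure")
tsa_down = resolve("time_attestation_failure")
unknown = resolve("some_unlisted_failure")
```

With the example policy, a policy load failure blocks the operation while a TSA failure lets it continue in degraded mode.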
How do auditors know about degraded periods?
The evidence bundle includes degraded state markers in receipts. The verifier outputs PASS_WITH_CAVEATS and lists the degraded periods with their reason codes.
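The caveat-reporting step can be illustrated with a simplified verifier. The receipt layout here (`id` and `state` fields, with `GOVERNED` as the normal state) and the plain `PASS` verdict are assumptions made for the example. A real verifier would also check signatures and chain integrity, which this sketch omits.

```python
from typing import Dict, List, Tuple


def verify(receipts: List[Dict[str, str]]) -> Tuple[str, List[Dict[str, str]]]:
    """Return a verdict plus one caveat per receipt carrying a degraded marker."""
    caveats = [
        {"receipt_id": r["id"], "reason": r["state"]}
        for r in receipts
        if r.get("state", "GOVERNED") != "GOVERNED"
    ]
    verdict = "PASS_WITH_CAVEATS" if caveats else "PASS"
    return verdict, caveats


bundle = [
    {"id": "r1", "state": "GOVERNED"},
    {"id": "r2", "state": "DEGRADED_LOCAL"},
    {"id": "r3", "state": "GOVERNED"},
]
verdict, caveats = verify(bundle)
clean_verdict, clean_caveats = verify([{"id": "r1", "state": "GOVERNED"}])
```

For the sample bundle, the verdict is PASS_WITH_CAVEATS and the single caveat points at receipt `r2` with reason DEGRADED_LOCAL.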
What if the boundary recovers mid-operation?
Emit a RECOVERY receipt marking the transition. Operations started during degraded mode remain degraded; new operations are fully governed.
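A recovery loop driven by the auto_retry_interval_ms and max_retries settings could be sketched as follows. The `probe` callable (a health check against the boundary) and the receipt dictionary shape are hypothetical. The sketch only shows the retry cadence and the point at which a RECOVERY receipt would be emitted.

```python
import time
from typing import Callable, Dict, List


def attempt_recovery(probe: Callable[[], bool],
                     receipts: List[Dict[str, object]],
                     max_retries: int = 3,
                     auto_retry_interval_ms: int = 5000,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Probe the boundary up to max_retries times; record recovery on success."""
    for attempt in range(1, max_retries + 1):
        if probe():
            # Marks the transition from degraded back to fully governed.
            receipts.append({"type": "RECOVERY", "attempt": attempt})
            return True
        if attempt < max_retries:
            sleep(auto_retry_interval_ms / 1000)  # wait before the next probe
    return False


# The boundary comes back on the second probe; sleeping is stubbed out.
answers = iter([False, True])
emitted: List[Dict[str, object]] = []
recovered = attempt_recovery(lambda: next(answers), emitted,
                             sleep=lambda seconds: None)

# The boundary never comes back: no receipt is emitted.
never_emitted: List[Dict[str, object]] = []
gave_up = attempt_recovery(lambda: False, never_emitted,
                           sleep=lambda seconds: None)
```

Operations already in flight keep their degraded marker; only operations started after the RECOVERY receipt would be treated as fully governed.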