Deterministic Verification

Deterministic verification means that any conforming verifier, given identical bundle bytes, will always produce the same verdict. This requires canonical serialization, reproducible hashing, and well-defined signing boundaries.

Why does determinism matter for verification?

If two verifiers could produce different verdicts for the same bundle, verification becomes meaningless. An auditor needs certainty that their PASS matches the producer's PASS, and that any FAIL identifies the exact same issue.

  • Trust transfer: Auditor trusts the verifier code, not the bundle producer
  • Dispute resolution: Both parties run the same verifier, get the same result
  • Implementation diversity: Multiple independent verifier implementations can exist

What is canonicalization?

Canonicalization transforms data into a single, deterministic byte representation. Two semantically equivalent JSON objects may have different byte representations (e.g., different key ordering or whitespace). Canonicalization eliminates this ambiguity.

Non-Canonical (Ambiguous)

{
  "name": "test",
  "value": 42
}

// or

{"value":42,"name":"test"}

Canonical (Deterministic)

{"name":"test","value":42}

// Always this exact byte sequence
// Keys sorted, no whitespace
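In Python, the two ambiguous forms above can be reduced to the same canonical bytes by parsing and re-serializing with sorted keys and no insignificant whitespace. This is a minimal sketch, not a full conformant canonicalizer; complete number and timestamp handling requires a JCS implementation:

```python
import json

def canonicalize(obj) -> bytes:
    # Sort keys, strip insignificant whitespace, emit UTF-8 (no BOM).
    # Sketch only: full number/timestamp rules need a JCS library.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")

# Both ambiguous forms collapse to the same byte sequence.
a = canonicalize(json.loads('{\n  "name": "test",\n  "value": 42\n}'))
b = canonicalize(json.loads('{"value":42,"name":"test"}'))
assert a == b == b'{"name":"test","value":42}'
```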

What are the canonicalization rules?

The specification mandates these canonicalization rules for all JSON objects:

  • Encoding: UTF-8, no BOM
  • Key ordering: Object keys sorted lexicographically by byte value
  • Whitespace: No insignificant whitespace (no pretty-printing)
  • Arrays: Preserve element order exactly
  • Timestamps: ISO 8601 UTC with "Z" suffix
  • Numbers: No leading zeros; no trailing zeros after the decimal point
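The timestamp rule can be illustrated with a small helper: convert to UTC and format with the "Z" suffix. The helper name is illustrative, not part of the specification:

```python
from datetime import datetime, timezone

def canonical_timestamp(dt: datetime) -> str:
    # ISO 8601 UTC with "Z" suffix (hypothetical helper, not from the spec).
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

ts = canonical_timestamp(datetime(2024, 5, 1, 12, 30, 0, tzinfo=timezone.utc))
assert ts == "2024-05-01T12:30:00Z"
```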

How are hashes computed reproducibly?

All hashes use SHA-256 over the canonical byte representation. The hash input is always the exact bytes after canonicalization, not a pretty-printed or reformatted version.

// Reproducible hash computation
1. Load JSON object
2. Apply canonicalization rules
3. Encode as UTF-8 bytes
4. Compute SHA-256(bytes)
5. Encode result as lowercase hex

// Example
input = {"b": 2, "a": 1}
canonical = {"a":1,"b":2}
bytes = [0x7b, 0x22, 0x61, 0x22, 0x3a, 0x31, 0x2c, ...]
hash = sha256(bytes)
result = "abc123..."  // hex string
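The five steps above map directly onto Python's standard library. This is a sketch under the same caveat as before: `json.dumps` with sorted keys and compact separators approximates canonicalization for simple objects, but a full implementation would use a JCS library:

```python
import hashlib
import json

def reproducible_hash(obj) -> str:
    # Steps 1-5: canonicalize, encode as UTF-8, SHA-256, lowercase hex.
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Input key order must not affect the digest.
assert reproducible_hash({"b": 2, "a": 1}) == reproducible_hash({"a": 1, "b": 2})
assert len(reproducible_hash({"b": 2, "a": 1})) == 64  # lowercase hex digest
```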

What are signing boundaries?

The signing boundary defines exactly which fields are included when computing a signature. Typically, the signature field itself is excluded from the signed content to avoid circular dependency.

// Receipt signing boundary
signed_content = canonicalize(receipt EXCLUDING signer.signature)
signature = Ed25519_sign(private_key, signed_content)

// Verification
signed_content = canonicalize(receipt EXCLUDING signer.signature)
valid = Ed25519_verify(public_key, signed_content, signature)

// If any byte of signed_content differs, verification fails
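The boundary computation itself can be sketched without committing to a signature library: remove the signature field, canonicalize the remainder, and those bytes are what Ed25519 sign/verify would operate on. The `signer.signature` field name follows the pseudocode above; the receipt structure here is illustrative, and the Ed25519 step would come from a library such as PyNaCl or `cryptography`:

```python
import copy
import json

def signed_content(receipt: dict) -> bytes:
    # Exclude signer.signature to avoid a circular dependency,
    # then canonicalize what remains.
    stripped = copy.deepcopy(receipt)
    stripped.get("signer", {}).pop("signature", None)
    return json.dumps(stripped, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")

# Hypothetical receipt shape for illustration only.
receipt = {"payload": {"a": 1},
           "signer": {"key_id": "k1", "signature": "deadbeef"}}
content = signed_content(receipt)
assert b"deadbeef" not in content   # signature excluded from signed bytes
# signature = Ed25519_sign(private_key, content) would go here.
```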

How do we ensure verifier consistency?

Verifier implementations must pass a conformance test suite with known-good and known-bad test vectors. The test suite includes edge cases for canonicalization, hash computation, and signature verification.

  • Golden vectors: Pre-computed hashes and signatures for known inputs
  • Tamper vectors: Modified bundles that must produce specific FAIL codes
  • Edge cases: Unicode normalization, number formatting, empty objects
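A conformance harness built on these vectors follows a simple pattern: recompute the digest and compare against the golden value, and confirm that every tamper vector diverges. The vector structure below is illustrative, not the spec's actual suite; in a real suite the golden digest would be pre-computed and shipped in a vector file:

```python
import hashlib
import json

def digest(obj) -> str:
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

good = {"name": "test", "value": 42}
golden = digest(good)   # pre-computed in a real vector file

tampered = {"name": "test", "value": 43}
assert digest(good) == golden       # golden vector: must PASS
assert digest(tampered) != golden   # tamper vector: must FAIL
```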

Frequently asked questions

What if my JSON library doesn't support deterministic output?

Implement a canonicalization layer that sorts keys and strips whitespace before hashing. Many implementations parse the JSON, then re-serialize it with sorted keys and no insignificant whitespace.

How do I handle floating-point numbers?

The specification recommends avoiding floating-point where possible. When required, use IEEE 754 double precision and normalize to the shortest representation that round-trips, with no trailing zeros.
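For common values, Python's `json.dumps` already emits the shortest round-trip representation for doubles (via `repr`), which lines up with this recommendation. A quick check, assuming CPython's default float serialization; note that full JCS number formatting follows ECMAScript's rules, which can differ from Python's in exponent thresholds:

```python
import json

# Shortest round-trip form: no trailing zeros after the decimal point.
assert json.dumps(2.50) == "2.5"
assert json.dumps(0.1) == "0.1"
assert json.loads(json.dumps(0.1)) == 0.1   # value round-trips exactly
```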

Is JCS (RFC 8785) sufficient?

Yes. JCS (JSON Canonicalization Scheme) per RFC 8785 meets the requirements and is the recommended approach. Implementations should reference the RFC for edge cases; one worth noting is that JCS sorts property names by UTF-16 code units, which matches UTF-8 byte ordering for keys in the Basic Multilingual Plane but can differ for supplementary-plane characters.