DOOM bare-metal demo — provable test protocol (R-doom1g / R-doom1h)#
Status: v1 shipped 2026-06-14 (R-doom1g: GATE A + GATE B). v1.1 shipped 2026-06-14 (R-doom1h: GATE C-1 + GATE C-2 audio gates). Supersedes the empirical-only chi² gate from R-doom1e and the empirical "WAV non-zero" check inherited from R-doom1d.
Motivation#
The earlier R-doom1d / R-doom1e gates were empirical: they observed a behavioural property (audio WAV non-zero, frame histogram divergent after a keypress) and inferred a stack property (audio works, input propagates). Empirical inference can be wrong — a non-zero WAV can be ambient device noise, a histogram divergence can come from anything.
R-doom1g delivers a provable protocol: given deterministic inputs and a versioned reference oracle, every checkpoint reduces to a mechanical comparison with a single PASS/FAIL outcome. No human judgement, no chi² threshold tuning, no "looks right".
Operational criteria#
The protocol asserts the following finite set of properties:
| ID | Property | Gate |
|---|---|---|
| O-1 | At tic 1, gore.Run has produced frame 1 (engine startup completed) | A |
| O-2 | At tic 35, the framebuffer matches oracle/frame_tic000035.ppm (host) |
A |
| O-3 | At tic 70/140, identical match against committed oracle | A |
| O-4 | At tic 350, prndindex/mrndindex match manifest (PRNG drift gate) | A |
| O-5 | At tic 1050, deeper-in-engine state still matches oracle | A |
| O-6 | Guest virtio-gpu scanout at tic T has histogram(canvas) within | B |
chi² tolerance of guest_oracle/frame_tic<T>.ppm |
||
| O-7 | All re-runs of harvest-reference produce byte-identical oracle. | A |
| O-8 | The engine emits the SAME sequence of CacheSound + PlaySound calls | C-1 |
| (name, channel, vol, sep, lump sha256) given a fixed WAD + seed | ||
+ script; re-harvested audio_log.json is byte-equal to oracle. |
||
| O-9 | The guest's WAV output envelope (per-second RMS) lies within | C-2 |
±6 dB of oracle/reference.wav for ≥95% of one-second windows. |
Determinism prerequisites#
The gore engine has three sources of non-determinism in its default configuration. The protocol pins each one.
| Source | Default behaviour | Pinned by |
|---|---|---|
| PRNG state | starts at index 0 (already deterministic, by Go zero-value) | godoom.SeedRandom(seed) (explicit, audited) |
| Tic clock | wall-clock time.Since(start_time) |
godoom.SetDeterministicTics(true) + godoom.ResetClock() |
| Input timing | wall-clock send-key from QEMU |
tic-tagged scripted events processed in GetEvent |
| Engine startup time | variable WAD load duration | engine prints "handing off to gore.Run" — runner anchors t=0 there |
Hooks live in cloud-boot/godoom/seed.go (NEW file, GPL-2.0 like the
engine — sits beside doom.go without modifying the transpiled source)
and are wired through the TamaGo backend via
tamago.Frontend.SetSeed(uint64) + tamago.Frontend.ApplyDeterminism().
Oracle artifact format#
Two oracle directories, each committed in git:
Host-side oracle: cloud-boot/godoom/oracle/#
Produced by cloud-boot/godoom/cmd/harvest-reference. Format:
oracle/
├── manifest.json # checkpoint list with SHA-256 + PRNG state
├── frame_tic<NNNNNN>.ppm # binary P6 PPM, exactly the engine's RGB buffer
├── audio_log.json # R-doom1h: deterministic CacheSound/PlaySound
│ # event stream (tic, op, name, channel, vol,
│ # sep, lump sha256). Byte-equal across runs.
└── reference.wav # R-doom1h: reference guest WAV (44.1 kHz
# stereo 16-bit PCM, captured once via run.sh
# with DOOMBOOT_LIVE_AUDIO_WAV). Used by
# GATE C-2 audio_verify.py.
manifest.json schema:
{
"seed": 0,
"wad": "doom1.wad",
"wad_bytes": 28795076,
"wad_hash": "<sha256>",
"script_hash": "<sha256>",
"gore_version": "<git rev>",
"checkpoints": [
{
"tic": 35,
"frame_hash": "<sha256 of the RGB pixel buffer>",
"frame_ppm": "frame_tic000035.ppm",
"prndindex": 0,
"mrndindex": 0,
"frame_count": 68,
"width": 320,
"height": 200
}
],
"deterministic": true
}
Guest-side oracle: cloud-boot/tamago-uefi/internal/livedoomboot/guest_oracle/#
Produced by bash provable_test.sh amd64 with
DOOMBOOT_PROVABLE_HARVEST=1. Format: identical to host oracle, plus a
tolerance_chi2 field on manifest.json (default 50000.0) which
sets the bounded-tolerance gate for guest-vs-guest comparison.
Tolerance model#
| Comparison | Tolerance | Rationale |
|---|---|---|
Host re-harvest vs committed oracle/ |
BYTE-EQUAL | Same engine + WAD + seed must produce identical bytes — any drift is a regression. |
Guest capture vs committed guest_oracle/ |
chi² ≤ 50000 | TamaGo GC stutter shifts tic boundaries by a few frames; histogram on a 64k-pixel canvas is GC-jitter-insensitive. Threshold calibrated at harvest time, recorded in manifest. |
| PRNG state at checkpoint | exact int match | PRNG is a pure table lookup — any mismatch means a missed engine path. |
Host re-harvest vs committed audio_log.json |
BYTE-EQUAL | Same WAD + seed deterministically pin every CacheSound / PlaySound call the engine emits. Drift implies a p_random consumer changed or a sound module path was added/removed. |
Guest WAV vs committed reference.wav |
per-second RMS within ±6 dB for ≥95% of 1 s windows | See "Audio gate metric (GATE C-2)" below — chosen for tolerance to TamaGo GC scheduling jitter while remaining sensitive to genuine "wrong sound played" regressions. |
Failure-mode taxonomy#
| Symptom | Diagnosis | Assertion |
|---|---|---|
GATE A FAIL on frame_tic*.ppm SHA mismatch |
PRNG drift OR engine change OR WAD change | wad_hash in manifest, gore rev recorded |
GATE A FAIL on prndindex mismatch |
input timing drift (event consumed at wrong tic) | per-checkpoint prndindex recorded |
| GATE B FAIL with chi² just over threshold | TamaGo GC stutter exceeded budget | re-harvest to recalibrate threshold |
| GATE B FAIL with chi² massively over | virtio-gpu scanout broken (R-doom1c-style failure) | per-checkpoint chi² printed |
| Engine main loop never starts | UEFI / virtio-input init failure | runner times out at 30s, dumps serial log |
| GATE C-1 FAIL: audio_log.json byte mismatch | p_random consumer drift, sound module rewire, or WAD lump rehash | per-entry diff via diff -u oracle/audio_log.json fresh/audio_log.json |
| GATE C-2 FAIL: per-window RMS exceeds ±6 dB | Audio plumbing change in go-virtio/sound mixer, virtio-sound device config drift, or genuine "wrong SFX played" | audio_verify.py prints max_diff_db + mean_diff_db; window-by-window debug via --window-seconds 0.25 |
How to run#
Generate / regenerate the host oracle#
cd cloud-boot/godoom
GOWORK=off go run ./cmd/harvest-reference \
-wad embedwad/doom1.wad \
-seed 0 \
-checkpoints 1,35,70,140,350,1050 \
-gore-version "$(git rev-parse HEAD)" \
-out oracle/
git add oracle/ && git commit -m "godoom: regenerate provable-protocol oracle"
This writes manifest.json + frame_tic*.ppm (for GATE A) AND
audio_log.json (for GATE C-1) in a single pass.
Regenerate the GATE C-2 reference WAV#
cd cloud-boot/tamago-uefi
task doomboot:efi:amd64
DOOMBOOT_LIVE_AUDIO_WAV=/tmp/doom-ref.wav \
DOOMBOOT_LIVE_TIMEOUT=30 \
bash internal/livedoomboot/run.sh amd64
cp /tmp/doom-ref.wav ../godoom/oracle/reference.wav
git -C ../godoom add oracle/reference.wav \
&& git -C ../godoom commit -m "godoom: refresh GATE C-2 reference.wav"
Manually verify GATE C-1 (host audio-event byte-equal)#
cd cloud-boot/godoom
GOWORK=off go run ./cmd/harvest-reference \
-wad embedwad/doom1.wad -seed 0 -checkpoints 1,35,70,140,350,1050 \
-out /tmp/fresh -verify-audio-log oracle/audio_log.json
# Exit 0 = byte-equal; exit 1 = diff (the harvester prints the
# oracle vs fresh sha256 prefixes).
Manually verify GATE C-2 (guest WAV bounded tolerance)#
cd cloud-boot/tamago-uefi
python3 internal/livedoomboot/audio_verify.py \
--reference ../godoom/oracle/reference.wav \
--capture /tmp/some-capture.wav \
--tolerance-db 6 --min-ratio 0.95 --window-seconds 1.0
Run the provable test#
cd cloud-boot/tamago-uefi
task doomboot:efi:amd64 # build the probe
bash internal/livedoomboot/provable_test.sh amd64
Exit code 0 = all gates PASS; non-zero = at least one gate FAIL with a per-checkpoint diff report on stderr.
Generate / regenerate the guest oracle#
cd cloud-boot/tamago-uefi
DOOMBOOT_PROVABLE_HARVEST=1 bash internal/livedoomboot/provable_test.sh amd64
git add internal/livedoomboot/guest_oracle/ && git commit -m \
"livedoomboot: regenerate guest oracle for provable-protocol gate B"
Provability taxonomy (honest)#
| Gate | Provability class | Notes |
|---|---|---|
| Engine PRNG seed → output | BYTE-EQUAL provable | Proved by TestSeedRandom_DeterministicSequence in godoom/seed_test.go + GATE A re-harvest. |
| Engine tic clock → output | BYTE-EQUAL provable | Proved by GATE A re-harvest (same checkpoints, same bytes, every run). |
| Engine output → guest framebuffer | BOUNDED-TOLERANCE provable | GATE B chi² gate. Bounded because TamaGo GC + wall-clock-driven probe means tic boundaries jitter. |
| Engine audio event stream | BYTE-EQUAL provable | GATE C-1 (R-doom1h). Proved by harvest-reference -verify-audio-log re-running the engine and diffing audio_log.json byte-for-byte. |
| Engine audio events → guest WAV | BOUNDED-TOLERANCE provable | GATE C-2 (R-doom1h). Per-second RMS within ±6 dB for ≥95% of 1-s windows vs oracle/reference.wav. Bounded for the same reason GATE B is. |
Audio gate metric (GATE C-2)#
GATE C-2 compares a WAV captured by QEMU's -audiodev wav backend
against oracle/reference.wav using a per-second RMS envelope
tolerance: each 1-second window's RMS (converted to dBFS with a
silence floor of 1.0) must lie within ±6 dB of the corresponding
window in the reference, for ≥95% of windows. The metric lives in
cloud-boot/tamago-uefi/internal/livedoomboot/audio_verify.py.
Why this metric over the alternatives:
DOOM Phase 1 audio is short SFX bursts (pistol shots, monster grunts,
door open/close) over silence — no continuous music, no sustained
spectral content. The guest's audio output is the deterministic
engine event stream (proved byte-equal by GATE C-1) processed through
go-virtio/sound's PCM mixer and emitted via virtio-sound-pci. The
content of each SFX is identical between runs; the timing may
shift by up to ~100 ms due to TamaGo GC pauses, tamago scheduler
jitter, and virtqueue draining latency.
| Metric | Verdict | Reason |
|---|---|---|
| Sample-level byte-equal | rejected | Even 1 ms of jitter destroys it; uninformative. |
| Cross-correlation | rejected | Hyper-sensitive to <100 ms phase shifts; would FAIL on GC stutter even when audio is otherwise identical. |
| Spectral fingerprint (FFT Pearson r) | rejected | Designed for sustained content. DOOM's "burst over silence" pattern means most FFT windows hit the noise floor; the few SFX-bearing windows dominate r in unpredictable ways depending on window alignment. |
| Per-second RMS envelope | chosen | Each 1-s window absorbs <1 s of jitter transparently. Log-amplitude is a standard audio measure. "X dB tolerance for Y% of windows" is an unambiguous bounded-tolerance contract. |
Calibration evidence: two independent 30-second captures from the
same probe build produced ratio=1.000, max_diff=0.4 dB,
mean_diff=0.0 dB. A capture vs synthetic silence produced
max_diff=83.2 dB, comfortably FAIL. The ±6 dB / 95% threshold is
~15× the observed jitter, leaving headroom for cross-host variance
without going so loose that "wrong SFX played" slips through.
Adding a new checkpoint#
- Decide the engine tic at which the new property should hold (e.g., "after Enter at tic 200, the skill-select screen is visible at tic 230").
- Add a
<tic>:keydown:KEY/<tic>:keyup:KEYpair to the input script (if any) undercloud-boot/godoom/docs/oracle-script.txt. - Re-run
harvest-referencewith the new checkpoint added to-checkpoints, commit the updatedoracle/. - Re-run
provable_test.sh amd64withDOOMBOOT_PROVABLE_HARVEST=1to refreshguest_oracle/and commit it. - The CI workflow (
.github/workflows/doom-provable.yml) picks up the new checkpoint automatically — no workflow change needed.
What this protocol does NOT prove#
- Perceptual audio quality: GATE C-2 proves energy envelope fidelity in dBFS — it cannot distinguish a 440 Hz pure tone from a band-limited noise burst of the same RMS. If the go-virtio/sound mixer ever starts emitting white noise at the right energy, GATE C-2 will PASS. The combination of GATE C-1 (the engine asked for the RIGHT lump) + GATE C-2 (the host received audio with the right envelope) is the strongest empirical+provable bound the protocol offers; sample-accurate WAV equality is intentionally NOT a goal (QEMU's wav backend resamples and TamaGo can't pin the audiodev's internal clock).
- Music: Phase 1 of the demo is SFX-only (engine plays no MUS tracks). When music lands (planned R-doom2a), GATE C-2's metric may need re-justification — continuous content unlocks spectral fingerprinting which is otherwise overkill here.
- Real player input latency: provable_test.sh measures correctness of the input pipeline (the right frames appear after the right inputs), not responsiveness (how many ms between QMP send-key and frame change). The latter is an empirical micro-benchmark (R-doom1f territory).
- Long-term stability: checkpoints stop at tic 1050 (~30 s of engine time). R-doom1c's gpu-virtqueue back-pressure manifests later; that is a separate gate.