Reproducing the Results

Step-by-step pipeline to reproduce all paper numbers from the MAFIS source tree: type check, test suite, full experiment matrix, and analysis scripts.

Every number in the paper comes from a deterministic pipeline. This page is the operational version of REPRODUCIBILITY.md in the MAFIS repository.

Requirements

Rust 1.75 or newer (tested on 1.93.0)
Python 3.8 or newer with pandas, matplotlib, numpy, scipy
Roughly 4 GB of RAM for the full experiment matrix
Roughly 2 GB of disk for compiled binaries and result CSVs

Quick verification (about 3 minutes)

cargo check
cargo test

cargo check runs the type and borrow checker in about 5 seconds. cargo test runs the full test suite (568 tests across core, solver, fault, analysis, experiment, and integration suites) in about 3 minutes. Every test should pass with zero failures.

Full experiment suite (about 7 hours on a laptop)

cargo test --release --test experiment_suite \
  full_experiment_suite -- --ignored --nocapture

This runs 4,320 paired faulted-and-baseline simulations across three sub-experiments. run_matrix parallelises over N-1 rayon threads (15 threads on the M4 Pro used for the paper data).

Sub-experiment	Configuration	Runs
Warehouse Single-Dock	3 solvers × 6 scenarios × 3 densities × 30 seeds	1,620
Warehouse Dual-Dock	3 solvers × 6 scenarios × 3 densities × 30 seeds	1,620
Scheduler effect	3 solvers × 6 scenarios × 2 schedulers × 30 seeds	1,080

The runs write to results/:

File	Contents
`results/warehouse_single_dock_experiment_runs.csv`	Per-run metrics, Single-Dock
`results/warehouse_single_dock_experiment_summary.csv`	Aggregated stats
`results/warehouse_dual_dock_experiment_runs.csv`	Per-run metrics, Dual-Dock
`results/warehouse_dual_dock_experiment_summary.csv`	Aggregated stats
`results/scheduler_effect_experiment_runs.csv`	Per-run metrics, scheduler
`results/scheduler_effect_experiment_summary.csv`	Aggregated stats
`results/all_runs.csv`	Combined

Auxiliary aisle-width probe (about 1 hour)

cargo test --release --lib \
  run_rhcr_braess_observatory_proof -- --ignored --nocapture

About 600 paired runs across SD-w1 n=60, SD-w2 n=108, SD-w3 n=151. Tests whether the FT > 1.2 cells observed under recoverable faults arise from PBS node-budget saturation. The test is idempotent. Completed matrices are skipped on resume.

Then run the analysis:

python3 scripts/analysis/rhcr_braess_observatory_proof.py

Statistical analysis (about 1 second each)

All analysis scripts use pandas, matplotlib, numpy, and scipy.

Script	Purpose
`structural_cascade_scaling.py`	Per-tier structural cascade regression vs walkable area
`mitigation_delta.py`	Mitigation Δ by solver and aisle width
`ft_baseline_audit.py`	Baseline validity flags, including FT > 1.2 cells
`delta_diff.py`	Pre/post-fix drift table

Run each from the repo root:

python3 scripts/analysis/structural_cascade_scaling.py
python3 scripts/analysis/mitigation_delta.py
python3 scripts/analysis/ft_baseline_audit.py

Outputs land under results/aisle_width/analysis/ as PNG figures and JSON statistical summaries.

Determinism

All simulations use a ChaCha8 generator seeded per configuration. Paired runs (baseline and faulted) share the seed, so metric differences are causally attributable to the fault condition rather than between-run variance. Traces are bit-identical within a single machine. Cross-machine traces may differ by one floating-point ULP on parallel reductions. Run all seeds on one machine for strict reproducibility.

Solver fidelity

Each solver traces to a public reference implementation:

PIBT to pibt2
RHCR-PBS to Jiaoyang-Li/RHCR
Token Passing to the original Ma et al. paper

The repository’s RELIABILITY.md documents the per-solver fidelity audit, deviations from reference implementations, and test coverage gates that protect each port from drift.