Docs Simulator Blog About Github ↗

Reproducing the Results

Step-by-step pipeline to reproduce all paper numbers from the MAFIS source tree: type check, test suite, full experiment matrix, and analysis scripts.

Every number in the paper comes from a deterministic pipeline. This page is the operational version of REPRODUCIBILITY.md in the MAFIS repository.

Requirements

  • Rust 1.75 or newer (tested on 1.93.0)
  • Python 3.8 or newer with pandas, matplotlib, numpy, scipy
  • Roughly 4 GB of RAM for the full experiment matrix
  • Roughly 2 GB of disk for compiled binaries and result CSVs

Quick verification (about 3 minutes)

cargo check
cargo test

cargo check runs the type and borrow checker in about 5 seconds. cargo test runs the full test suite (568 tests across core, solver, fault, analysis, experiment, and integration suites) in about 3 minutes. Every test should pass with zero failures.

Full experiment suite (about 7 hours on a laptop)

cargo test --release --test experiment_suite \
  full_experiment_suite -- --ignored --nocapture

This runs 4,320 paired faulted-and-baseline simulations across three sub-experiments. run_matrix parallelises over N-1 rayon threads (15 threads on the M4 Pro used for the paper data).

Sub-experimentConfigurationRuns
Warehouse Single-Dock3 solvers × 6 scenarios × 3 densities × 30 seeds1,620
Warehouse Dual-Dock3 solvers × 6 scenarios × 3 densities × 30 seeds1,620
Scheduler effect3 solvers × 6 scenarios × 2 schedulers × 30 seeds1,080

The runs write to results/:

FileContents
results/warehouse_single_dock_experiment_runs.csvPer-run metrics, Single-Dock
results/warehouse_single_dock_experiment_summary.csvAggregated stats
results/warehouse_dual_dock_experiment_runs.csvPer-run metrics, Dual-Dock
results/warehouse_dual_dock_experiment_summary.csvAggregated stats
results/scheduler_effect_experiment_runs.csvPer-run metrics, scheduler
results/scheduler_effect_experiment_summary.csvAggregated stats
results/all_runs.csvCombined

Auxiliary aisle-width probe (about 1 hour)

cargo test --release --lib \
  run_rhcr_braess_observatory_proof -- --ignored --nocapture

About 600 paired runs across SD-w1 n=60, SD-w2 n=108, SD-w3 n=151. Tests whether the FT > 1.2 cells observed under recoverable faults arise from PBS node-budget saturation. The test is idempotent. Completed matrices are skipped on resume.

Then run the analysis:

python3 scripts/analysis/rhcr_braess_observatory_proof.py

Statistical analysis (about 1 second each)

All analysis scripts use pandas, matplotlib, numpy, and scipy.

ScriptPurpose
structural_cascade_scaling.pyPer-tier structural cascade regression vs walkable area
mitigation_delta.pyMitigation Δ by solver and aisle width
ft_baseline_audit.pyBaseline validity flags, including FT > 1.2 cells
delta_diff.pyPre/post-fix drift table

Run each from the repo root:

python3 scripts/analysis/structural_cascade_scaling.py
python3 scripts/analysis/mitigation_delta.py
python3 scripts/analysis/ft_baseline_audit.py

Outputs land under results/aisle_width/analysis/ as PNG figures and JSON statistical summaries.

Determinism

All simulations use a ChaCha8 generator seeded per configuration. Paired runs (baseline and faulted) share the seed, so metric differences are causally attributable to the fault condition rather than between-run variance. Traces are bit-identical within a single machine. Cross-machine traces may differ by one floating-point ULP on parallel reductions. Run all seeds on one machine for strict reproducibility.

Solver fidelity

Each solver traces to a public reference implementation:

The repository’s RELIABILITY.md documents the per-solver fidelity audit, deviations from reference implementations, and test coverage gates that protect each port from drift.