
Introduction to MAFIS

What MAFIS is, why lifelong fault resilience matters, and how to get started.

MAFIS (Multi-Agent Fault Injection Simulator) is a fault resilience observatory for lifelong multi-agent path finding (MAPF). It measures how multi-agent systems degrade, recover, and adapt when faults, congestion, and cascading failures persist through continuous operation.

MAFIS is a research project built in Rust using the Bevy engine, compiled to both WebAssembly (browser) and native desktop. It runs deterministic simulations where every fault event, cascade, and recovery is reproducible from a seed.
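
Seed-based reproducibility can be illustrated with a small deterministic generator. The sketch below is a minimal Rust example, not MAFIS's implementation: the SplitMix64 generator, the `FaultEvent` struct, and `fault_schedule` are all hypothetical, chosen only to show that a fault schedule drawn purely from a seed replays exactly.

```rust
/// SplitMix64: a small deterministic 64-bit PRNG.
/// Hypothetical stand-in; MAFIS's actual RNG is not specified here.
struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    fn new(seed: u64) -> Self {
        Self { state: seed }
    }

    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Hypothetical fault event: which agent fails at which tick.
#[derive(Debug, Clone, PartialEq, Eq)]
struct FaultEvent {
    tick: u64,
    agent: usize,
}

/// Draws a fault schedule using only the seed as a source of randomness,
/// so replaying the same seed replays every fault in the same order.
fn fault_schedule(seed: u64, num_agents: usize, num_faults: usize, horizon: u64) -> Vec<FaultEvent> {
    let mut rng = SplitMix64::new(seed);
    let mut events: Vec<FaultEvent> = (0..num_faults)
        .map(|_| FaultEvent {
            tick: rng.next_u64() % horizon,
            agent: (rng.next_u64() % num_agents as u64) as usize,
        })
        .collect();
    // Sort by (tick, agent) so the schedule is processed in time order.
    events.sort_by_key(|e| (e.tick, e.agent));
    events
}

fn main() {
    let a = fault_schedule(42, 100, 5, 1_000);
    let b = fault_schedule(42, 100, 5, 1_000);
    assert_eq!(a, b); // same seed, identical schedule
    println!("{:?}", a);
}
```

Any other component that draws randomness (task assignment, fault zones) would need to be seeded the same way for a whole run to replay.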

The Research Question

What happens to a multi-agent fleet under sustained fault injection?

Different solvers, fault types, and topologies produce different degradation and recovery patterns. MAFIS makes those differences observable and measurable.

Research Variables

| Variable | Options |
|---|---|
| Solver | PIBT, RHCR-PBS, Token Passing |
| Fault type | Burst, Wear-based (Weibull), Spatial zone outage, Intermittent |
| Grid topology | Warehouse Medium, Warehouse Large, Compact Grid, Kiva Warehouse, Sorting Center, Fulfillment Center |
| Agent density | Configurable (up to 1,000 agents in WASM, 5,000 native) |
| Scheduler strategy | Random, Closest |
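
Wear-based faults are listed as Weibull. Assuming wear is modeled as a Weibull time-to-failure with shape k and scale λ, one standard way to draw failure times is inverse-transform sampling, t = λ(−ln(1 − u))^(1/k) for uniform u in [0, 1). The function names and parameters below are hypothetical; MAFIS's actual parameterization is not described on this page.

```rust
/// Inverse-CDF sample of a Weibull(shape k, scale lambda) time-to-failure.
/// CDF: F(t) = 1 - exp(-(t / lambda)^k)  =>  t = lambda * (-ln(1 - u))^(1 / k)
fn weibull_time_to_failure(u: f64, shape: f64, scale: f64) -> f64 {
    assert!((0.0..1.0).contains(&u), "u must be in [0, 1)");
    assert!(shape > 0.0 && scale > 0.0, "shape and scale must be positive");
    scale * (-(1.0 - u).ln()).powf(1.0 / shape)
}

/// Weibull hazard rate h(t) = (k / lambda) * (t / lambda)^(k - 1).
/// For k > 1 the hazard grows with age, i.e. agents wear out.
fn weibull_hazard(t: f64, shape: f64, scale: f64) -> f64 {
    (shape / scale) * (t / scale).powf(shape - 1.0)
}

fn main() {
    // Hypothetical wear parameters, measured in ticks.
    let (shape, scale) = (2.0, 500.0);
    for i in 0..5 {
        // Evenly spaced quantiles stand in for a seeded RNG here.
        let u = (i as f64 + 0.5) / 5.0;
        println!("u = {:.2} -> fails at tick {:.1}", u, weibull_time_to_failure(u, shape, scale));
    }
    println!("hazard at t=100: {:.5}", weibull_hazard(100.0, shape, scale));
}
```

With k = 1 the distribution reduces to a memoryless exponential failure time; k > 1 is what makes older agents progressively more likely to fail.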

MAFIS is built to study how solver architecture, fault type, and topology interact under sustained fault conditions.

Resilience Scorecard (Live Observatory)

Interactive MAFIS sessions display a real-time Resilience Scorecard with four indicators:

  • Fault Tolerance (FT) — throughput retained under faults
  • NRR — operational uptime ratio, recovery speed vs fault frequency (requires recurring faults)
  • Survival Rate — fraction of the initial fleet still alive after faults
  • Critical Time (CT) — fraction of time spent in a critically degraded state

See Resilience Scorecard for formulas and examples.
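
Three of the scorecard indicators have plain ratio readings and can be sketched as follows, assuming per-tick throughput traces for the faulted run and for a fault-free baseline. The function names are hypothetical, FT is taken as a ratio of means, and the 0.5 critical threshold is borrowed from the batch Critical Time definition; the exact live formulas may differ. NRR is omitted because its formula is not given here.

```rust
/// Mean of a slice; 0.0 for an empty slice.
fn mean(xs: &[f64]) -> f64 {
    if xs.is_empty() { 0.0 } else { xs.iter().sum::<f64>() / xs.len() as f64 }
}

/// Fault Tolerance sketch: mean faulted throughput over mean baseline throughput.
fn fault_tolerance(faulted: &[f64], baseline: &[f64]) -> f64 {
    let base = mean(baseline);
    if base == 0.0 { 0.0 } else { mean(faulted) / base }
}

/// Survival Rate: fraction of the initial fleet still alive.
fn survival_rate(alive: usize, initial: usize) -> f64 {
    if initial == 0 { 0.0 } else { alive as f64 / initial as f64 }
}

/// Critical Time sketch: fraction of ticks whose throughput is below
/// `threshold` times the baseline mean (assumed "critically degraded").
fn critical_time(faulted: &[f64], baseline_mean: f64, threshold: f64) -> f64 {
    if faulted.is_empty() {
        return 0.0;
    }
    let critical = faulted.iter().filter(|&&t| t < threshold * baseline_mean).count();
    critical as f64 / faulted.len() as f64
}

fn main() {
    let baseline = [10.0, 10.0, 10.0, 10.0];
    let faulted = [10.0, 4.0, 6.0, 10.0];
    println!("FT = {:.2}", fault_tolerance(&faulted, &baseline));
    println!("CT = {:.2}", critical_time(&faulted, mean(&baseline), 0.5));
    println!("Survival = {:.2}", survival_rate(90, 100));
}
```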

Six Primary Experiment Metrics

Batch experiment runs (headless, reproducible) report six differential metrics designed for cross-configuration comparison:

| Metric | What it measures |
|---|---|
| Fault Tolerance (FT) | Throughput retention ratio vs. the fault-free baseline |
| Critical Time (CT) | Fraction of post-fault ticks below 50% of baseline throughput |
| TWTE | Time-Weighted Throughput Error, which penalizes slow recovery |
| Attack Rate (AR) | Fraction of the fleet ever killed or cascade-affected by any fault |
| Cascade Depth | Mean BFS depth on the ADG (action dependency graph) across all fault events |
| Rapidity | Ticks until throughput reaches ≥90% of baseline for 5 consecutive ticks (recoverable faults only) |
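
Two of the metrics in the table have definitions precise enough to sketch directly. The following Rust functions are hypothetical illustrations, not MAFIS code: `rapidity` counts ticks until post-fault throughput first holds at or above 90% of baseline for 5 consecutive ticks (assuming the count ends at the start of that run), and `attack_rate` counts each killed or cascade-affected agent once, however many faults touched it.

```rust
use std::collections::HashSet;

/// Rapidity sketch: index of the first tick of the first run of `window`
/// consecutive ticks with throughput >= ratio * baseline.
/// Returns None if the trace never recovers (unrecoverable fault).
fn rapidity(post_fault: &[f64], baseline: f64, ratio: f64, window: usize) -> Option<usize> {
    assert!(window > 0, "window must be positive");
    let mut run = 0;
    for (tick, &t) in post_fault.iter().enumerate() {
        if t >= ratio * baseline {
            run += 1;
            if run == window {
                return Some(tick + 1 - window);
            }
        } else {
            run = 0; // recovery streak broken, start counting again
        }
    }
    None
}

/// Attack Rate sketch: each inner Vec lists agent ids killed or
/// cascade-affected by one fault; agents are deduplicated across faults.
fn attack_rate(affected_per_fault: &[Vec<usize>], fleet_size: usize) -> f64 {
    if fleet_size == 0 {
        return 0.0;
    }
    let touched: HashSet<usize> = affected_per_fault.iter().flatten().copied().collect();
    touched.len() as f64 / fleet_size as f64
}

fn main() {
    let trace = [2.0, 3.0, 9.5, 9.5, 8.0, 9.5, 9.5, 10.0, 10.0, 10.0, 3.0];
    println!("rapidity = {:?}", rapidity(&trace, 10.0, 0.9, 5));
    let hits: Vec<Vec<usize>> = vec![vec![1, 2, 3], vec![3, 4], vec![9]];
    println!("attack rate = {:.2}", attack_rate(&hits, 10));
}
```

Resetting the streak on any dip means a brief recovery that relapses is not counted, which matches the "5 consecutive ticks" wording.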

See Fault Metrics for formulas, examples, and research origins.
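
Cascade Depth is defined as a mean BFS depth on the ADG. A minimal sketch, assuming a simplified graph with one node per agent (where `adj[u]` lists agents whose next actions depend on agent `u`) and taking each fault's depth as the farthest BFS level reached from the faulted agent. The names are hypothetical; MAFIS's ADG uses its own node and edge definitions, which this page does not specify.

```rust
use std::collections::VecDeque;

/// Farthest BFS level reachable from `source`; `adj[u]` lists dependents of `u`.
fn bfs_depth(adj: &[Vec<usize>], source: usize) -> usize {
    let mut dist = vec![usize::MAX; adj.len()]; // usize::MAX marks "unvisited"
    let mut queue = VecDeque::new();
    dist[source] = 0;
    queue.push_back(source);
    let mut max_depth: usize = 0;
    while let Some(u) = queue.pop_front() {
        max_depth = max_depth.max(dist[u]);
        for &v in &adj[u] {
            if dist[v] == usize::MAX {
                dist[v] = dist[u] + 1;
                queue.push_back(v);
            }
        }
    }
    max_depth
}

/// Mean cascade depth over all fault events (one BFS per faulted node).
fn mean_cascade_depth(adj: &[Vec<usize>], fault_sources: &[usize]) -> f64 {
    if fault_sources.is_empty() {
        return 0.0;
    }
    let total: usize = fault_sources.iter().map(|&s| bfs_depth(adj, s)).sum();
    total as f64 / fault_sources.len() as f64
}

fn main() {
    // 0 -> {1, 2}, 1 -> 3, 3 -> 4; agent 5 has no dependents.
    let adj: Vec<Vec<usize>> = vec![vec![1, 2], vec![3], vec![], vec![4], vec![], vec![]];
    println!("depth from 0 = {}", bfs_depth(&adj, 0));
    println!("mean depth = {:.3}", mean_cascade_depth(&adj, &[0, 1, 5]));
}
```

BFS records each node at its shortest dependency distance, and the visited check keeps it finite even if the graph contains a cycle (a deadlock).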

Two Versions

| | Web (WASM) | Desktop (Native) |
|---|---|---|
| Install | None (runs in browser) | Download binary |
| Performance | 1,000 agents, 60 FPS | 5,000 agents, parallel computation |
| Interface | HTML/CSS/JS controls | Full Egui panel system |
| Use case | Quick experiments, demos | Batch experiments, parameter sweeps |

Both versions share the same simulation core, so the same seed produces identical results in either.

A CLI tool (mafis) is also available for headless batch experiments and scripted parameter sweeps.

What MAFIS Is NOT

[!WARNING] Not a solver benchmark. MAFIS does not compare algorithms against each other. Algorithm comparisons are MAPF Tracker’s domain.

[!WARNING] Not a static testbed. For standardized 2D grid environments, see MovingAI MAPF Benchmarks.

[!WARNING] Not a one-shot simulator. MAFIS measures degradation under continuous operation, not time-to-completion for a fixed set of goals.

Getting Oriented