Fault Types
The five fault types in MAFIS organized in a 3-category taxonomy (Recoverable, Permanent-distributed, Permanent-localized) and how FaultSource distinguishes automatic from manual injection.
MAFIS supports five fault types organized in a 3-category taxonomy based on duration and scope. All types go through the same cascade pipeline (ADG → BFS → replan), which makes their resilience metrics scientifically comparable regardless of how they were triggered.
3-Category Taxonomy
| Category | Types | Duration | Scope |
|---|---|---|---|
| Recoverable | TemporaryBlockage, Latency | Temporary (N ticks) | Cell or agent |
| Permanent-distributed | Overheat, Breakdown | Permanent | Individual agents, randomly distributed |
| Permanent-localized | PermanentZoneOutage | Permanent | Entire zone (contiguous area) |
Each category produces a distinct resilience signature:
- Recoverable faults test how quickly the system adapts and recovers.
- Permanent-distributed faults model fleet attrition. Individual agents die and become permanent obstacles, randomly distributed across the grid.
- Permanent-localized faults eliminate entire zones from the operational map, testing global replanning capacity.
Full Fault Type Reference
| Type | Category | Agent State | Grid Effect | Duration | Recovery |
|---|---|---|---|---|---|
| Overheat | Permanent-distributed | Dead | Cell becomes obstacle | Permanent | None |
| Breakdown | Permanent-distributed | Dead | Cell becomes obstacle | Permanent | None |
| TemporaryBlockage | Recoverable | N/A (cell-based) | Cell becomes unwalkable | Configurable (N ticks) | Auto-removes after N ticks |
| Latency | Recoverable | Alive, degraded | None | Configurable (N ticks) | Agent resumes after N ticks |
| PermanentZoneOutage | Permanent-localized | Agents in zone die | Zone cells become obstacles | Permanent | None |
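The two tables above can be read as a mapping from fault type to category. The following is a minimal sketch of that mapping, not the MAFIS source: the `FaultType` and `FaultCategory` enums, the `ticks` payloads, and the method names are all hypothetical and chosen only for illustration.

```rust
/// The three taxonomy categories (hypothetical enum, named after the table).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FaultCategory {
    Recoverable,
    PermanentDistributed,
    PermanentLocalized,
}

/// The five fault types. The `ticks` payloads are an illustrative assumption.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FaultType {
    Overheat,
    Breakdown,
    TemporaryBlockage { ticks: u32 },
    Latency { ticks: u32 },
    PermanentZoneOutage,
}

impl FaultType {
    /// Maps each fault type to its category, following the reference table.
    pub fn category(self) -> FaultCategory {
        match self {
            FaultType::Overheat | FaultType::Breakdown => FaultCategory::PermanentDistributed,
            FaultType::TemporaryBlockage { .. } | FaultType::Latency { .. } => {
                FaultCategory::Recoverable
            }
            FaultType::PermanentZoneOutage => FaultCategory::PermanentLocalized,
        }
    }

    /// True when the fault clears on its own after N ticks.
    pub fn is_recoverable(self) -> bool {
        self.category() == FaultCategory::Recoverable
    }
}
```

Keeping the mapping in one exhaustive `match` means the compiler flags any newly added fault type that has not been assigned a category.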
Overheat
Triggered when an agent’s accumulated heat exceeds overheat_threshold (see Heat System). The agent dies, its cell becomes a permanent obstacle, and all agents whose paths cross that cell must replan. Overheat faults are automatic, arising from sustained congestion and waiting.
Breakdown
A hardware death fault that fires at random on each tick with probability breakdown_probability, configurable via FaultConfig. Like Overheat, the agent dies and becomes a permanent obstacle. The distinction matters for analysis: Breakdown is stochastic and uncorrelated with congestion, whereas Overheat is caused by congestion. Both produce identical cascade consequences.
[!WARNING] Overheat and Breakdown are permanent. The agent dies and its cell becomes an obstacle for the remainder of the simulation. Plan your fault intensity accordingly.
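To illustrate the stochastic nature of Breakdown, here is a hedged sketch of a per-tick trial. It assumes each living agent is tested independently against breakdown_probability, which the text does not state explicitly. The `XorShift64` generator and the `roll_breakdowns` function are hypothetical helpers, used so the example needs no external crate.

```rust
/// Minimal xorshift64 generator (illustrative only, not the MAFIS RNG).
pub struct XorShift64(u64);

impl XorShift64 {
    pub fn new(seed: u64) -> Self {
        // Xorshift state must be non-zero, so substitute a fixed constant.
        XorShift64(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    /// Uniform sample in [0, 1).
    pub fn next_f64(&mut self) -> f64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        // Keep the top 53 bits so the value is exactly representable as f64.
        (self.0 >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Returns the indices of living agents that break down this tick.
/// Assumption: one independent trial per living agent per tick.
pub fn roll_breakdowns(
    alive: &[bool],
    breakdown_probability: f64,
    rng: &mut XorShift64,
) -> Vec<usize> {
    let mut failed = Vec::new();
    for (i, &is_alive) in alive.iter().enumerate() {
        // Dead agents are skipped and do not consume a random sample.
        if is_alive && rng.next_f64() < breakdown_probability {
            failed.push(i);
        }
    }
    failed
}
```

Because the trial ignores position and heat, the resulting failures are uncorrelated with congestion, which is the property that separates Breakdown from Overheat in analysis.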
TemporaryBlockage
A cell-based fault, not agent-based. A cell becomes unwalkable for a configurable number of ticks (e.g., simulating a human walking through an aisle, a spill, or a dropped package). After N ticks, the cell automatically becomes walkable again. This is a new fault type (not present in earlier versions of MAFIS).
Agents whose paths cross the blocked cell must replan around it. When the blockage clears, agents are not automatically rerouted. They continue on their current paths and can route through the restored cell again from the next replan cycle onward.
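The timed, self-clearing behaviour of a blockage can be sketched as a countdown per cell. This is an assumption-laden illustration: the `TemporaryBlockages` struct, the `Cell` alias, and the method names are hypothetical and do not come from the MAFIS codebase.

```rust
use std::collections::HashMap;

/// Grid coordinate (hypothetical representation).
pub type Cell = (i32, i32);

/// Tracks cells that are unwalkable for a limited number of ticks.
#[derive(Default)]
pub struct TemporaryBlockages {
    remaining: HashMap<Cell, u32>,
}

impl TemporaryBlockages {
    /// Marks `cell` as unwalkable for `ticks` ticks; zero ticks is a no-op.
    pub fn block(&mut self, cell: Cell, ticks: u32) {
        if ticks > 0 {
            self.remaining.insert(cell, ticks);
        }
    }

    pub fn is_blocked(&self, cell: Cell) -> bool {
        self.remaining.contains_key(&cell)
    }

    /// Advances one tick and returns the cells that became walkable again.
    pub fn tick(&mut self) -> Vec<Cell> {
        let mut cleared = Vec::new();
        self.remaining.retain(|cell, ticks| {
            // Every stored counter is at least 1, so this cannot underflow.
            *ticks -= 1;
            if *ticks == 0 {
                cleared.push(*cell);
                false
            } else {
                true
            }
        });
        cleared
    }
}
```

Returning the cleared cells from `tick` gives the caller a hook for the next replan cycle without forcing an immediate reroute.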
Latency
An agent-level degradation fault. The affected agent executes Action::Wait for N consecutive ticks regardless of what the solver would assign. After N ticks, the agent resumes normal operation. The agent is alive and occupying a cell during latency. It is not an obstacle, but it is unresponsive to the planner.
Real-world analogy: a robot’s sensor system lags, a communication packet is dropped, or a software hang causes the robot to freeze briefly before recovering.
[!NOTE] Latency faults are the mildest fault type. The agent is alive, occupies its cell, and recovers automatically. Use them to study congestion propagation without permanent fleet attrition.
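The override described for Latency, where the agent executes Action::Wait regardless of the solver's assignment, might look like the sketch below. Only `Action::Wait` appears in the text; the other `Action` variants, the `Agent` struct, and the `latency_remaining` field are assumptions made for the example.

```rust
/// Agent actions. Only `Wait` is named in the text; the rest are assumed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    Wait,
    North,
    South,
    East,
    West,
}

/// Hypothetical agent state: ticks of latency still to serve.
pub struct Agent {
    pub latency_remaining: u32,
}

impl Agent {
    /// Returns the action actually executed this tick.
    /// While latency is active, the planned action is discarded.
    pub fn resolve_action(&mut self, planned: Action) -> Action {
        if self.latency_remaining > 0 {
            self.latency_remaining -= 1;
            Action::Wait
        } else {
            planned
        }
    }
}
```

The agent stays alive and keeps its cell throughout, so nothing in the obstacle map changes; only the executed action differs from the plan.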
PermanentZoneOutage
A permanent, localized fault that blocks an entire zone at a configurable tick. The busiest zone is selected deterministically, and its walkable cells become permanent obstacles. Agents standing on blocked cells die immediately. All task assignments into the zone are invalidated.
Parameters:
- at_tick: when the blockage fires (e.g., tick 100)
- block_percent: fraction of zone cells to block (1–100%; default 100%)
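As a rough sketch, the two parameters could be modelled as follows. The `ZoneOutageConfig` struct, its field types, and both methods are hypothetical. Rounding the blocked cell count up is also an assumption, since the text does not say how a fractional count is handled.

```rust
/// Hypothetical container for the two PermanentZoneOutage parameters.
pub struct ZoneOutageConfig {
    /// Tick at which the outage fires.
    pub at_tick: u64,
    /// Percentage of zone cells to block, expected in 1..=100.
    pub block_percent: u8,
}

impl ZoneOutageConfig {
    /// True on the single tick where the outage is applied.
    pub fn fires_at(&self, tick: u64) -> bool {
        tick == self.at_tick
    }

    /// Number of zone cells to turn into permanent obstacles.
    /// Assumption: out-of-range percentages are clamped and the
    /// result is rounded up.
    pub fn cells_to_block(&self, zone_cells: usize) -> usize {
        let pct = self.block_percent.clamp(1, 100) as usize;
        (zone_cells * pct + 99) / 100
    }
}
```

With `block_percent` at 100 every cell in the zone is selected, matching the default full outage described above.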
Real-world analogy: a fire in a warehouse aisle permanently closes an entire section; a structural collapse blocks a storage zone; a water leak forces evacuation of a delivery area.
[!WARNING] PermanentZoneOutage is the most destructive fault type. A 100% blockage removes all zone cells from the operational map for the remainder of the run.
This fault type tests a failure mode that prior work on k-robust MAPF and delay-based fault models does not cover.
FaultSource
All faults carry a FaultSource tag:
pub enum FaultSource {
    Automatic, // System-generated via heat/probability
    Manual,    // Researcher-injected via UI
    Scheduled, // From a FaultSchedule scenario
}
Manual faults are injected while the simulation is paused (click a robot → “Kill” / “Block for N ticks” / “Slow for N ticks”). They are tagged FaultSource::Manual so they can be distinguished in analysis and export, but their metrics are computed through the same cascade pipeline as automatic faults. A manual kill produces the same cascade depth, spread, and recovery dynamics as an automatic breakdown.
[!IMPORTANT] This ensures that manual injection experiments produce scientifically valid comparisons to automatic fault runs. Manual and automatic faults go through the identical cascade pipeline.
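One way the tag might be used in analysis is to split recorded fault events by source after a run. The sketch below repeats the enum with derives added so it compiles on its own. The `FaultEvent` struct, its fields, and `count_by_source` are hypothetical and not part of the documented API.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FaultSource {
    Automatic, // System-generated via heat/probability
    Manual,    // Researcher-injected via UI
    Scheduled, // From a FaultSchedule scenario
}

/// Hypothetical record of one fault, as it might appear in an export.
#[derive(Debug, Clone)]
pub struct FaultEvent {
    pub tick: u64,
    /// `None` for cell-based faults such as TemporaryBlockage.
    pub agent_id: Option<usize>,
    pub source: FaultSource,
}

/// Counts the events that carry the given source tag.
pub fn count_by_source(events: &[FaultEvent], source: FaultSource) -> usize {
    events.iter().filter(|e| e.source == source).count()
}
```

The tag only labels the event. Metric computation does not branch on it, which is what keeps manual and automatic runs comparable.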
Fault Intensity Configuration
The rate of automatic fault generation is controlled by FaultConfig:
| Parameter | Effect |
|---|---|
| breakdown_probability | Per-tick probability that any living agent suffers a Breakdown |
| overheat_threshold | Heat level that triggers Overheat (lower = more frequent) |
| heat_per_wait | Heat accumulated per tick an agent waits |
| heat_per_move | Heat accumulated per tick an agent moves |
| congestion_heat_bonus | Extra heat added per nearby agent within congestion_heat_radius |
| heat_dissipation | Heat lost per tick when not congested |
The UI exposes fault intensity presets (Off / Low / Medium / High) that set these parameters together.
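To show how these parameters might interact, here is a hedged sketch of a single heat update. The parameter names come from the table, but the field types, the `step_heat` function, and the exact update rule are assumptions. In particular, it treats "not congested" as having zero nearby agents and clamps heat at zero. See Heat System for the actual behaviour.

```rust
/// Field names follow the table above; the types are assumed.
pub struct FaultConfig {
    pub breakdown_probability: f64,
    pub overheat_threshold: f32,
    pub heat_per_wait: f32,
    pub heat_per_move: f32,
    pub congestion_heat_bonus: f32,
    pub congestion_heat_radius: u32,
    pub heat_dissipation: f32,
}

/// Advances one agent's heat by one tick.
/// `nearby_agents` is the number of agents within `congestion_heat_radius`,
/// counted by the caller. Returns the new heat and whether the agent overheats.
pub fn step_heat(cfg: &FaultConfig, heat: f32, waited: bool, nearby_agents: u32) -> (f32, bool) {
    let base = if waited { cfg.heat_per_wait } else { cfg.heat_per_move };
    let mut next = heat + base + cfg.congestion_heat_bonus * nearby_agents as f32;
    if nearby_agents == 0 {
        // Assumed rule: dissipation applies only when no agent is nearby.
        next = (next - cfg.heat_dissipation).max(0.0);
    }
    // Overheat fires when heat exceeds the threshold.
    (next, next > cfg.overheat_threshold)
}
```

Under this rule a lower overheat_threshold or a higher congestion_heat_bonus both make Overheat more frequent, consistent with the table.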
All Faults Through One Pipeline
Regardless of type or source, every fault goes through the same pipeline:
- Heat/FaultCheck phase: fault is registered, agent state updated, cell obstacle status updated
- ADG construction: Agent Dependency Graph identifies which agents’ paths are blocked by the new state
- BFS propagation: cascade depth and spread computed
- Replan phase: affected agents get new plans from the active solver
- Metrics: MTTR, cascade depth/spread, throughput delta computed in AnalysisSet::Metrics
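The BFS propagation step can be sketched over a simplified dependency graph. This is an illustration under stated assumptions rather than the MAFIS implementation: the ADG is reduced to an adjacency map from an agent to the agents it blocks, depth is taken as the longest BFS distance from the faulted agent, and spread as the number of other agents reached. The `cascade_bfs` function is hypothetical.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Breadth-first traversal from the faulted agent over a dependency map.
/// `dependents[a]` lists the agents whose paths are blocked by agent `a`.
/// Returns (cascade depth, cascade spread) under the assumed definitions.
pub fn cascade_bfs(dependents: &HashMap<usize, Vec<usize>>, origin: usize) -> (usize, usize) {
    let mut visited = HashSet::from([origin]);
    let mut queue = VecDeque::from([(origin, 0usize)]);
    let mut depth = 0;

    while let Some((agent, d)) = queue.pop_front() {
        depth = depth.max(d);
        // Agents with no entry in the map simply have no dependents.
        for &next in dependents.get(&agent).into_iter().flatten() {
            // `insert` returns false for agents already reached.
            if visited.insert(next) {
                queue.push_back((next, d + 1));
            }
        }
    }

    // Spread excludes the faulted agent itself.
    (depth, visited.len() - 1)
}
```

Because BFS visits agents in order of distance, an agent reachable along several dependency chains is counted once, at its shortest distance from the fault.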
See Cascade Propagation for the ADG pipeline in detail.