Durable AI Execution

Patterns for AI workflows that survive partial failure and remain inspectable after each run.

Checkpointing

Checkpoint every meaningful transition: inputs, tool calls, decisions, outputs, and recovery metadata.
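As a rough illustration of the idea above, the sketch below appends one record per transition to a JSON Lines file. The names `Checkpoint` and `CheckpointLog`, the field layout, and the choice of a local file as the store are all assumptions made for this example; a real system might use a database or an existing workflow engine instead.

```python
import json
import os
import time
from dataclasses import asdict, dataclass, field


@dataclass
class Checkpoint:
    """One recorded transition in a run (hypothetical schema)."""
    run_id: str
    step: int
    kind: str        # e.g. "input", "tool_call", "decision", "output"
    payload: dict    # the data for this transition
    recovery: dict = field(default_factory=dict)  # e.g. attempt count
    ts: float = field(default_factory=time.time)  # wall-clock timestamp


class CheckpointLog:
    """Append-only checkpoint log backed by a JSON Lines file."""

    def __init__(self, path):
        self.path = path

    def append(self, cp):
        # Write one JSON object per line, then flush and fsync so the
        # record is on disk before the workflow moves to the next step.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(asdict(cp), sort_keys=True) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def load(self, run_id):
        # Return the checkpoints for one run, in the order they were written.
        if not os.path.exists(self.path):
            return []
        found = []
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                record = json.loads(line)
                if record["run_id"] == run_id:
                    found.append(Checkpoint(**record))
        return found
```

After a crash, a workflow built on this sketch could call `load(run_id)` and resume from the step after the last recorded checkpoint, rather than starting over.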

Replay

Replaying a run from its recorded events turns a failure into something you can inspect step by step, instead of leaving it as ambiguous model behavior.
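One way to picture replay is a stand-in for the tool layer that serves recorded results instead of making live calls. The sketch below is a minimal version under stated assumptions: the event shape (`tool`, `args`, `result` or `error`), the `make_replayer` helper, the `ReplayDivergence` exception, and the toy `workflow` are all hypothetical names invented for this example.

```python
class ReplayDivergence(Exception):
    """Raised when the replayed code asks for something that was not recorded."""


def make_replayer(recorded):
    """Return a call_tool(name, args) function that serves recorded events in order."""
    cursor = iter(recorded)

    def call_tool(name, args):
        event = next(cursor, None)
        if event is None:
            raise ReplayDivergence(f"no recorded event left for {name!r}")
        if event["tool"] != name or event["args"] != args:
            # The code under replay no longer matches the recorded run.
            raise ReplayDivergence(
                f"expected {event['tool']!r} with {event['args']!r}, "
                f"got {name!r} with {args!r}"
            )
        if "error" in event:
            # Re-raise the recorded failure at the same point in the run.
            raise RuntimeError(event["error"])
        return event["result"]

    return call_tool


def workflow(call_tool):
    # Toy two-step workflow; the tool layer is injected so it can be replayed.
    hits = call_tool("search", {"q": "durable execution"})
    return call_tool("summarize", {"docs": hits})


# Events captured from a hypothetical failed run.
recorded = [
    {"tool": "search", "args": {"q": "durable execution"},
     "result": ["doc-1", "doc-2"]},
    {"tool": "summarize", "args": {"docs": ["doc-1", "doc-2"]},
     "error": "timeout after 30s"},
]
```

Calling `workflow(make_replayer(recorded))` raises the recorded timeout at the second step every time, with no live tool or model call involved. Injecting `call_tool` is the design choice that makes this possible: the same workflow code runs against live tools in production and against the recording during inspection.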

Related projects

Shipped systems where this concept runs in production.