Deterministic Replay
Exact reproduction from stored artifacts. Same inputs + recorded LLM responses = identical output every time.
See exactly what changes when you update a prompt or model — before it hits production.
$ verist test classify-ticket · 10 baselines
  ✓ 8 unchanged
  ~ 2 changed
    ticket-002
      category: "urgent" → "normal"
      confidence: 0.92 → 0.71
    ticket-009
      category: "urgent" → "normal"
      confidence: 0.88 → 0.65
  ⚠ 2 regressions — review before shipping
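The replay idea can be sketched in a few lines. This is a minimal illustration, not verist's implementation: `LlmCall`, `Recording`, `record`, `replay`, and this `classify` function are hypothetical names, and responses are keyed by the raw prompt string for simplicity.

```typescript
// Hypothetical sketch of record/replay; none of these names come from verist.
type LlmCall = (prompt: string) => Promise<string>;

interface Recording {
  // Stored LLM responses, keyed by the exact prompt that produced them.
  responses: Record<string, string>;
}

// Wrap a live LLM call so that every response is saved into the recording.
function record(live: LlmCall, recording: Recording): LlmCall {
  return async (prompt) => {
    const response = await live(prompt);
    recording.responses[prompt] = response;
    return response;
  };
}

// Serve only stored responses; the live model is never called.
function replay(recording: Recording): LlmCall {
  return async (prompt) => {
    if (!Object.prototype.hasOwnProperty.call(recording.responses, prompt)) {
      throw new Error("no recorded response for prompt: " + prompt);
    }
    return recording.responses[prompt];
  };
}

// A step whose only source of nondeterminism is the LLM call it is given.
async function classify(
  text: string,
  llm: LlmCall
): Promise<{ category: string }> {
  const raw = await llm("Classify this ticket: " + text);
  return { category: raw.trim().toLowerCase() };
}
```

Running `classify` once with `record(live, recording)` and again with `replay(recording)` yields the same output even if the live model would answer differently the second time, because the second run never reaches the model.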
change prompt → recompute → see diff → approve → ship
Run a step, store output as a versioned artifact.
Re-run with a new prompt or model against the same inputs.
See exactly which fields changed, were added, or removed.
Review the impact, then ship with confidence.
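As a rough illustration of the "changed, added, or removed" step above, a field-level diff between a baseline output and a recomputed output might look like the following. `FieldChange` and `diffFields` are hypothetical names, not part of verist, and the sketch compares top-level fields only.

```typescript
// Hypothetical sketch of a field-level diff; not verist's implementation.
type FieldChange =
  | { kind: "changed"; field: string; before: unknown; after: unknown }
  | { kind: "added"; field: string; after: unknown }
  | { kind: "removed"; field: string; before: unknown };

function has(obj: Record<string, unknown>, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(obj, key);
}

function diffFields(
  before: Record<string, unknown>,
  after: Record<string, unknown>
): FieldChange[] {
  const changes: FieldChange[] = [];
  // Union of field names: baseline fields first, then fields new in `after`.
  const fields = Object.keys(before).concat(
    Object.keys(after).filter((key) => !has(before, key))
  );
  for (const field of fields) {
    if (!has(after, field)) {
      changes.push({ kind: "removed", field, before: before[field] });
    } else if (!has(before, field)) {
      changes.push({ kind: "added", field, after: after[field] });
    } else if (
      // Simplification: JSON comparison is sensitive to key order in nested objects.
      JSON.stringify(before[field]) !== JSON.stringify(after[field])
    ) {
      changes.push({
        kind: "changed",
        field,
        before: before[field],
        after: after[field],
      });
    }
  }
  return changes;
}

// Usage with the ticket-002 values from the terminal output above.
const ticketChanges = diffFields(
  { category: "urgent", confidence: 0.92 },
  { category: "normal", confidence: 0.71 }
);
console.log(ticketChanges);
```

Unchanged fields produce no entry, so an empty array means the recomputed output matches the baseline.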
Git for AI decisions
Define a step, capture a baseline, recompute with a new prompt, and see the diff.
import { z } from "zod";
import { defineStep, run, recompute, formatDiff } from "verist";

const classify = defineStep({
  name: "classify-ticket",
  input: z.object({ text: z.string() }),
  output: z.object({ priority: z.enum(["high", "medium", "low"]) }),
  run: async (input, ctx) => {
    // LLM access comes from the adapters passed to run() and recompute()
    const result = await ctx.adapters.llm.extract(input.text);
    return { output: result };
  },
});

// `adapters` (which supplies ctx.adapters.llm) and `classifyV2` (the same step
// with the updated prompt) are assumed to be defined elsewhere.

// Capture baseline
const baseline = await run(classify, { text: "..." }, { adapters });

// Recompute with new prompt
const result = await recompute(baseline, classifyV2, { adapters });

// See what changed
console.log(formatDiff(result.outputDiff));