Verist: Next Steps (Impact Plan)

Focus (from verist-ops problems)

  • Tier 1 wedge: Structured Output Regression
  • Tier 2 expansion: Safe Recompute (overrides preserved)
  • Tier 3 strategic: Decision Audit + Decision Backtesting

Shipped

  • verist init scaffolds a deterministic step + sample inputs (no API keys needed).
  • verist capture --sample N --seed S for deterministic sampling.
  • verist capture --meta key=value persisted in baseline envelopes.
  • verist test --format json|markdown with exit codes (0 = clean, 1 = diffs, 2 = infra).
  • Anthropic adapter with normalized llm-input / llm-output artifacts.
  • OpenAI adapter supports baseURL (Ollama, Azure, Fireworks, etc.).
  • Cross-provider normalized artifacts hash identically for equivalent content.
  • examples/prompt-diff/quickstart.ts – end-to-end LLM regression demo.
  • README quickstart covers both zero-friction (regex) and LLM paths.
  • CI integration guide with GitHub Actions examples.
  • Observational schema validation in recompute; RecomputeResult.status classifies diffs.
  • In-memory RunStore + overlay recompute example.
  • defineExtractionStep() shorthand – eliminates schema duplication and manual onArtifact.
  • fail() for structured step errors with retryable flag.
  • StepResult.artifacts – automatic artifact collection without callbacks.
  • ctx.emitEvent() – audit events without manual plumbing.
  • Flattened StepResult – result.value.output instead of result.value.output.delta.
  • 27 DX issues resolved from sandbox testing (see ../verist-sandbox/issues.md).
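Taken together, the shipped pieces form a capture → test loop that CI can gate on via the documented exit codes (0 = clean, 1 = diffs, 2 = infra). A minimal sketch of that gating, using a stand-in shell function in place of the real CLI so the snippet runs anywhere:

```shell
# Stand-in for the real `verist` binary so this sketch is self-contained;
# delete this function when running against an actual installation.
verist() { return 1; }   # simulate exit code 1: diffs found

verist test --format json
case $? in
  0) echo "baselines clean" ;;
  1) echo "regression diffs found" ;;
  2) echo "infra failure" ;;
esac
```

In a real pipeline the `case` would map to pass/fail/retry behavior rather than echoes.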

Current State

The Tier 1 API is stable at v0.0.5. A user can go from verist init to first diff without API keys, and from examples/prompt-diff to a real LLM regression diff with one command. CI output formats are stable. The last 6 PRs were DX-driven refinements – the API surface feels settled.

The gap is no longer tooling – it's validation and distribution. No external team has used Verist in production. The thesis (structured output regression is acute pain) is well-reasoned but unproven with paying customers.

Three open DX issues remain (adapter annotation for non-LLM steps, diff() discoverability, createSnapshotFromResult naming) – none are blockers for adoption.


Top 5 Deliverables (Adoption-First)

1. README as Adoption Funnel (P0)

Why first: The README is the front door. A prospect who can't self-qualify in 60 seconds bounces. Right now it shows capabilities but doesn't help someone decide "is this for me?"

Scope:

  • Add a "Good fit / Not a fit" checklist above the quickstart.
  • Funnel to one adoption path: init → capture → test (the Tier 1 wedge).
  • Lead with the problem ("You updated your extraction prompt. What broke?"), not the solution.
  • Cut secondary content (Tier 2/3 features, architecture details) to linked pages.
  • Ensure the quickstart terminal output is visible and compelling (the "aha" diff).

Acceptance:

  • A new user can self-qualify before installing.
  • The README tells one story with one call to action.

2. Polish verist init → First Diff (P0)

Why second: The zero-friction path IS the wedge. If verist init → verist test doesn't deliver a clear "aha" in under 60 seconds, the README promise falls flat.

Scope:

  • Audit the init scaffolding end-to-end: install, init, capture baseline, break, diff.
  • Ensure the generated step + inputs produce a meaningful, easy-to-read diff.
  • The scaffolded project should run verist test out of the box with zero edits.
  • Terminal output should be self-explanatory (no need to read docs to understand the diff).
  • Consider: can init scaffold a verist.config.ts so capture and test work immediately?
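On the last point, a scaffolded config could be as small as the following. Every field name here is a hypothetical placeholder for illustration, not the actual Verist config schema:

```typescript
// Hypothetical sketch of what `verist init` might scaffold as verist.config.ts.
// `steps` and `baselineDir` are invented names, not the real API surface.
export default {
  steps: ["./steps/extract.ts"],      // the deterministic demo step
  baselineDir: ".verist/baselines",   // where `verist capture` would write envelopes
};
```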

Acceptance:

  • npx verist init && verist capture && verist test produces a clear regression diff.
  • A first-time user understands what happened without reading docs.

3. Problem-Framing Content (P1)

Why third: Distribution is the bottleneck, not features. The right engineers need to encounter the problem framing before they encounter the tool.

Scope:

  • Blog post / article: "You updated your extraction prompt. What broke?"
  • Frame the problem (silent regressions in structured LLM output), not the tool.
  • Include a concrete before/after: prompt change → field disappears → downstream breaks.
  • End with the solution pattern (capture → recompute → diff) and link to Verist.
  • Short demo GIF: capture baseline → tweak prompt → see diff in terminal.

Acceptance:

  • One published piece that frames the problem clearly.
  • Shareable on HN, AI engineering communities, Twitter/X.

4. Copyable CI Workflow Template (P1)

Why fourth: Bridges "I tried it locally" → "it's in my pipeline." CI integration is the stickiness mechanism – once diffs run on every PR, Verist becomes infrastructure.

Scope:

  • Working .github/workflows/verist.yml in examples/ci/.
  • Handles: checkout, install, run verist test --format markdown, post PR comment.
  • Works with committed baselines (no capture step in CI – baselines are checked in).
  • Document the two patterns: baselines-in-repo vs baselines-from-capture.
  • Exit codes already work (0 = clean, 1 = diffs, 2 = infra) – template should use them.
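A hedged sketch of what the template could look like — the verist invocation matches the flags documented above, but the comment-posting step (here via the gh CLI) is one possible approach, not a fixed choice:

```yaml
# Sketch of examples/ci/verist.yml — adjust the node version, install command,
# and PR-comment mechanism to taste. Assumes baselines are committed to the repo.
name: verist
on: pull_request

jobs:
  regression:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      # Exit code 1 (diffs) fails the step; the redirect preserves the exit code.
      - run: npx verist test --format markdown > diff.md
      - if: failure()
        run: gh pr comment "$PR" --body-file diff.md
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          PR: ${{ github.event.pull_request.number }}
```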

Acceptance:

  • Copy-paste into any repo with verist.config.ts + committed baselines → works.
  • PR comment shows markdown diff table on regression.

5. Safe Recompute End-to-End Example (P1)

Why fifth: This is the Tier 2 hook – the reason teams stay after adopting for regression testing. The examples/overlay-recompute/ directory exists but doesn't yet tell a compelling story.

Scope:

  • Rework the overlay-recompute example into a clear narrative:
    1. AI extracts a risk assessment from a document.
    2. Human reviewer corrects one field (e.g., risk level: "medium" → "high").
    3. Model upgrades. Recompute runs.
    4. AI output changes, but the human correction is preserved in effective state.
  • Show the three-layer state model visually in terminal output.
  • Make it runnable without API keys (deterministic step, like the init scaffolding).

Acceptance:

  • Running example that shows human corrections surviving a recompute.
  • Clear before/after demonstrating the problem (corrections lost) and solution (preserved).

Priorities

| Priority | Deliverable | Impact |
| --- | --- | --- |
| P0 | README as adoption funnel | Front door – self-qualification in 60 seconds |
| P0 | Polish init → first diff | Delivers the "aha" that the README promises |
| P1 | Problem-framing content | Gets the problem in front of the right people |
| P1 | Copyable CI workflow | Stickiness – diffs on every PR |
| P1 | Safe recompute example | Tier 2 hook – why teams stay |

What Not to Build (Yet)

  • Adapter step-level declaration – nice DX but affects few users (non-LLM adapters only)
  • Domain primitives (claims, evidence, verdicts) – user space, not kernel
  • Review queues – enterprise feature, not adoption driver
  • Dashboard – premature before paying users
  • Additional storage backends – Postgres is enough
  • Backtesting windows – Tier 3, no demand yet (YAGNI)
  • More LLM adapters – two providers cover the majority
  • createSnapshotFromResult rename – less important now that recompute(StepResult) exists

Immediate Next Steps

  1. Rewrite README with fit/no-fit checklist and single adoption funnel
  2. Audit and polish verist init end-to-end flow
  3. Draft problem-framing blog post outline

Released under the Apache 2.0 License.