ADR-012: Structured Step Errors via fail()
Status
Accepted
Context
Problem: Steps can only succeed (return StepReturn) or throw. When extract() returns err({ code: "rate_limit", retryable: true }), the step must throw, and runStep() re-wraps the exception as err({ code: "execution_failed" }). The original error code and retryable flag are lost. Runners that need retry logic must parse error message strings.

Why now: This directly violates kernel invariant #9 (Errors Are Values) at the step boundary, the one place where it matters most. Every LLM step in the sandbox (scenarios 02, 04) works around this by throwing, losing structured error metadata that runners need for retry and audit.

Constraints: Must be additive (existing steps that throw must continue to work). Must not introduce nested Result types that confuse the API. Must preserve the existing StepError contract for callers of runStep() and run().
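The information loss described under Problem can be reproduced in a few lines. The sketch below is a simplified, synchronous model of the behavior before this ADR, not verist's actual code: the type shapes, stepThatThrows, and runStepLegacy are hypothetical stand-ins, and the retryable: false default for wrapped exceptions is an assumption.

```typescript
// Hypothetical, simplified shapes; the real verist types may differ.
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };
type AdapterError = { code: string; message: string; retryable: boolean };
type StepError = { code: string; message: string; retryable: boolean };

// A step that can only throw has to flatten the adapter error into a string.
function stepThatThrows(extracted: Result<string, AdapterError>): { output: string } {
  if (!extracted.ok) {
    throw new Error(`${extracted.error.code}: ${extracted.error.message}`);
  }
  return { output: extracted.value };
}

// Stand-in for the legacy runStep(): any thrown value becomes execution_failed.
// retryable: false is an assumed default for wrapped exceptions.
function runStepLegacy(
  extracted: Result<string, AdapterError>,
): Result<{ output: string }, StepError> {
  try {
    return { ok: true, value: stepThatThrows(extracted) };
  } catch (e) {
    const message = e instanceof Error ? e.message : String(e);
    return { ok: false, error: { code: "execution_failed", message, retryable: false } };
  }
}

const lost = runStepLegacy({
  ok: false,
  error: { code: "rate_limit", message: "slow down", retryable: true },
});
// lost.error.code is "execution_failed"; "rate_limit" survives only inside the message text.
```

A runner looking at lost can recover the original code only by parsing the message string, which is the workaround this ADR removes.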
Decision
- Chosen option: Introduce a tagged StepFailure value via a fail() helper. Steps can return fail(...) as an alternative to throwing.
- Rationale:
  - Consistent with "errors as values": failures are returned, not thrown
  - Zero-cost discrimination via the _tag field: runStep() checks one property
  - Additive: existing steps that throw continue to work unchanged
API
    import { z } from "zod";
    import { defineStep, fail } from "verist";
    import { extract, type LLMContext } from "@verist/llm";

    // run, schema, request, and llm are assumed to be in scope; they are elided here.
    const step = defineStep({
      name: "extract-job",
      input: z.object({ text: z.string() }),
      output: schema,
      run: async (input, ctx: LLMContext) => {
        const result = await extract(ctx, request, schema);
        // Return the adapter error as a value instead of throwing it.
        if (!result.ok) return fail(result.error);
        return { output: result.value.data };
      },
    });

    // Caller sees structured error
    const result = await run(step, input, { adapters: { llm } });
    if (!result.ok && result.error.retryable) {
      // retry with backoff
    }

Types (Summary)
- fail() returns a tagged StepFailure with code, message, optional retryable, optional cause.
- runStep() and recompute() detect StepFailure and normalize to StepError with required retryable.
- Kernel-owned StepError.code values are input_validation, output_validation, execution_failed. Any other string code is treated as domain-specific.
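The summary above can be read as the following type sketch. It is an illustration under assumptions, not the shipped definitions: the field names come from this ADR, but the literal tag value "StepFailure", the KERNEL_CODES constant, and the isKernelCode helper are hypothetical.

```typescript
// Assumed shape of the tagged failure value returned by fail().
interface StepFailure {
  readonly _tag: "StepFailure"; // tag value is an assumption
  readonly code: string;
  readonly message: string;
  readonly retryable?: boolean;
  readonly cause?: unknown;
}

// Assumed shape of the normalized error seen by callers: retryable is required.
interface StepError {
  code: string;
  message: string;
  retryable: boolean;
  cause?: unknown;
}

// Kernel-owned codes listed in this ADR; any other string is domain-specific.
const KERNEL_CODES = ["input_validation", "output_validation", "execution_failed"] as const;
type KernelCode = (typeof KERNEL_CODES)[number];

// Minimal stand-in for the fail() helper: copy the fields and attach the tag.
function fail(error: {
  code: string;
  message: string;
  retryable?: boolean;
  cause?: unknown;
}): StepFailure {
  return { ...error, _tag: "StepFailure" };
}

// Hypothetical helper: distinguishes kernel-owned codes from domain codes.
function isKernelCode(code: string): code is KernelCode {
  return KERNEL_CODES.some((kernelCode) => kernelCode === code);
}
```

Placing _tag after the spread means an incoming object cannot override the tag, which keeps fail() the single place where the tag is assigned.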
Detection in runStep() and recompute()
Both call step.run() and treat a returned StepFailure as a structured error, normalizing retryable to false when omitted. Thrown exceptions are still wrapped as execution_failed. Throwing is reserved for programmer errors and invariant violations.
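A minimal sketch of that detection logic follows. It is synchronous for brevity (real steps are async), the type shapes are assumed, and isStepFailure, normalizeReturn, and runStepSketch are hypothetical names. The retryable: false value on wrapped exceptions is also an assumption, since this ADR specifies the default only for returned failures.

```typescript
// Assumed shapes; the real verist types may differ.
type StepReturn<T> = { output: T };
type StepFailure = {
  _tag: "StepFailure";
  code: string;
  message: string;
  retryable?: boolean;
  cause?: unknown;
};
type StepError = { code: string; message: string; retryable: boolean; cause?: unknown };
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };

// Single property check on the tag, as described in the rationale.
function isStepFailure(value: unknown): value is StepFailure {
  return (
    typeof value === "object" &&
    value !== null &&
    (value as { _tag?: unknown })._tag === "StepFailure"
  );
}

// Returned failures keep their code; retryable defaults to false when omitted.
function normalizeReturn<T>(
  returned: StepReturn<T> | StepFailure,
): Result<StepReturn<T>, StepError> {
  if (isStepFailure(returned)) {
    return {
      ok: false,
      error: {
        code: returned.code,
        message: returned.message,
        retryable: returned.retryable ?? false,
        cause: returned.cause,
      },
    };
  }
  return { ok: true, value: returned };
}

// Thrown exceptions are still wrapped as execution_failed.
function runStepSketch<I, T>(
  run: (input: I) => StepReturn<T> | StepFailure,
  input: I,
): Result<StepReturn<T>, StepError> {
  try {
    return normalizeReturn(run(input));
  } catch (e) {
    const message = e instanceof Error ? e.message : String(e);
    return {
      ok: false,
      error: { code: "execution_failed", message, retryable: false, cause: e },
    };
  }
}
```

Because both paths end in the same StepError shape, callers of runStep() and run() see one contract regardless of whether the step returned a failure or threw.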
Alternatives
- Nested Result from step run(): Step returns Result<StepReturn, StepError>, runStep() unwraps. Rejected: creates two layers of Result (step → runStep() → caller), confusing types. Detection requires checking .ok on the return value, which collides with any output schema that happens to have an ok field.
- Force all steps to return Result: Breaking change. Rejected: forces migration of all existing steps. Mixed styles (return vs throw) are inevitable in any ecosystem.
- Error subclasses: Steps throw StepError extends Error with typed fields. Rejected: still uses exceptions for expected failures, violating invariant #9. instanceof checks are fragile across package boundaries.
- errorCode property on standard Error: Steps throw Error with custom properties. Rejected: no type safety, properties are optional and unstructured, easy to forget.
Consequences
- Positive: Structured error codes and the retryable flag survive from adapter through step to runner. Runners can implement retry policies without string parsing. Consistent with existing Result patterns in storage and LLM layers.
- Negative: Two return paths from steps (return value vs throw). _tag is a convention, not enforced by TypeScript's type system at the return site (a step could return a plain object with _tag). Mitigated: fail() is the only documented way to create StepFailure.
- Follow-ups: Update SPEC-steps, SPEC-overview API sketch, kernel-invariants #9. Update sandbox scenarios 02/04 to use fail() instead of throwing.
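To illustrate the first positive consequence, a runner's retry policy can branch on the structured retryable flag alone. The sketch below is hypothetical: RetryPolicy, RetryDecision, and decideRetry are not part of verist, and exponential backoff with a cap is just one possible policy.

```typescript
// Assumed shape of the normalized error a runner receives.
type StepError = { code: string; message: string; retryable: boolean };

// Hypothetical policy knobs and decision type; not part of verist.
interface RetryPolicy {
  maxAttempts: number; // total attempts allowed, including the first
  baseDelayMs: number; // delay before the second attempt
  maxDelayMs: number;  // upper bound on any single delay
}
type RetryDecision = { kind: "retry"; delayMs: number } | { kind: "give_up" };

// attempt is the 1-based number of attempts already made.
function decideRetry(error: StepError, attempt: number, policy: RetryPolicy): RetryDecision {
  // No string parsing: the flag set by fail() decides whether a retry is allowed.
  if (!error.retryable || attempt >= policy.maxAttempts) {
    return { kind: "give_up" };
  }
  // Exponential backoff: base, 2x base, 4x base, ... capped at maxDelayMs.
  const uncapped = policy.baseDelayMs * Math.pow(2, attempt - 1);
  return { kind: "retry", delayMs: Math.min(uncapped, policy.maxDelayMs) };
}
```

Keeping the decision a pure function separates policy from scheduling: the runner owns the actual sleep and re-invocation, and the same function can be reused for audit logging of why a step was or was not retried.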
References
- SPEC-steps
- SPEC-kernel-invariants (#9: Errors Are Values)
- plan.local.md §1
- sandbox/issues.md #1