◇ HANDOFF · WEEK 11+ · AFTER HANDOFF

After handoff. What you own, what we still owe you, when to call us back.

An engagement ends in week ten, eleven, or twelve. After it ends, your team runs the system. This page is what that actually looks like — the artifacts you walk away with, the 30-day support window that bridges the gap, and the explicit triggers that suggest a call back. Engagements don't auto-convert into retainers — we build for your team's independence from day one.

01 / WHAT YOU OWN

Five artifacts, all yours — no retained rights, no telemetry.

  • ◇ ARTIFACT

    Production deploy

    Running in your infrastructure, on your accounts, behind your access controls. We do not host inference for you after handoff.

  • ◇ ARTIFACT

    Eval suite

    Runner, golden dataset, scoring rubric, CI gate. Versioned in your repo. Your engineers extend it; we do not gatekeep the suite.

  • ◇ ARTIFACT

    Dashboard

    Eval-pass rate, p95 latency, cost per request, drift signals. Wired to your observability stack — no Proof-of-Tech-hosted reporting.
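As a rough illustration of what two of those dashboard panels compute, here is a minimal sketch. It assumes each request is logged as a record with a latency in milliseconds and a cost in dollars; the `RequestRecord` shape and its field names are hypothetical, not part of what ships. It uses the nearest-rank definition of p95, and your observability stack may interpolate differently.

```python
import math
from dataclasses import dataclass


@dataclass
class RequestRecord:
    # Hypothetical log shape: one record per production request.
    latency_ms: float
    cost_usd: float


def p95_latency(records: list[RequestRecord]) -> float:
    """Nearest-rank 95th percentile of request latency."""
    if not records:
        raise ValueError("no records in window")
    latencies = sorted(r.latency_ms for r in records)
    # Nearest-rank: the smallest value with at least 95% of samples at or below it.
    rank = math.ceil(0.95 * len(latencies))
    return latencies[rank - 1]


def cost_per_request(records: list[RequestRecord]) -> float:
    """Mean cost per request across the window."""
    if not records:
        raise ValueError("no records in window")
    return sum(r.cost_usd for r in records) / len(records)


# Usage: a synthetic window of 100 requests with latencies 1..100 ms.
window = [RequestRecord(latency_ms=float(ms), cost_usd=0.002) for ms in range(1, 101)]
print(p95_latency(window))  # 95.0
print(cost_per_request(window))
```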

  • ◇ ARTIFACT

    Runbooks

    One per failure mode the production deploy actually surfaced. Written with your on-call in the room. Generic playbooks do not survive contact with real incidents.

  • ◇ ARTIFACT

    Training

    Two half-day sessions, recorded, with your named owners. We cover the eval-suite extension protocol, the runbook contract, and the incident playbook end-to-end.

02 / THE RUNBOOK CONTRACT

What is in the runbook, and why generic ones do not survive.

The runbook is not a document we write in week eleven and email you. It is a per-failure-mode operations contract, written with your on-call in the room, against the failure modes the production deploy actually surfaced during the build phase. Every entry has a trigger condition, a measurable signal, a procedure, and an owner.
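One way to keep that four-part contract honest is to store entries as structured data and reject any entry that is missing a part. A minimal sketch, assuming a Python dataclass representation; the `RunbookEntry` name, its field names, and the example entry (including the `retrieval_freshness_hours` metric) are hypothetical illustrations, not the format we hand over.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class RunbookEntry:
    failure_mode: str
    trigger_condition: str      # when the entry applies, stated as a checkable condition
    signal: str                 # the measurable signal that fires it (metric + threshold)
    procedure: tuple[str, ...]  # ordered steps, including the rollback path
    owner: str                  # the engineer or rotation that owns the page

    def __post_init__(self) -> None:
        # Reject entries with any empty part: an entry without an owner or a
        # procedure is exactly the generic playbook this contract replaces.
        for f in fields(self):
            if not getattr(self, f.name):
                raise ValueError(f"runbook entry missing {f.name}")


# Hypothetical example entry.
entry = RunbookEntry(
    failure_mode="retriever returns stale documents",
    trigger_condition="index age exceeds 24h during business hours",
    signal="retrieval_freshness_hours > 24 for 15 min",
    procedure=(
        "pin retriever to previous index version",
        "re-run nightly ingest",
        "confirm eval-pass rate recovers",
    ),
    owner="search-oncall",
)
```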

The default runbook covers, at minimum:

  • Incident playbook — one entry per failure mode, with rollback paths for prompts, models, retrievers, and tool configurations.
  • Scaling triggers — when to add capacity, when to shed load, and where the non-linear cost cliffs live in your stack.
  • Eval-drift signals — what to look at when the harness flags a regression, and how to distinguish a model issue from a data-distribution one.
  • Cost ceilings — per-tenant, per-tool, per-step caps wired to circuit-breakers that fail closed.
  • Disaster recovery — data loss, key compromise, vendor outage. We name the vendors and the contingencies for each.
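A cost ceiling that fails closed refuses a request whenever it cannot confirm the request is under budget, rather than letting it through. A minimal in-memory sketch of the per-tenant case, assuming spend is tracked in-process (a production version would need shared storage); `CostCeilingBreaker`, its methods, and the dollar figures are hypothetical. The same pattern applies when keyed by tool or by step.

```python
from collections import defaultdict
from typing import Optional


class CostCeilingExceeded(Exception):
    """Raised when the breaker refuses a request."""


class CostCeilingBreaker:
    """Per-tenant spend cap that fails closed."""

    def __init__(self, ceilings_usd: dict[str, float]) -> None:
        self.ceilings_usd = ceilings_usd
        self.spent_usd: dict[str, float] = defaultdict(float)

    def check(self, tenant: str, estimated_cost_usd: Optional[float]) -> None:
        ceiling = self.ceilings_usd.get(tenant)
        # Fail closed: an unknown tenant or a missing cost estimate is a refusal,
        # not a pass-through.
        if ceiling is None or estimated_cost_usd is None:
            raise CostCeilingExceeded(f"cannot verify budget for {tenant!r}")
        if self.spent_usd[tenant] + estimated_cost_usd > ceiling:
            raise CostCeilingExceeded(f"{tenant!r} would exceed ${ceiling:.2f}")

    def record(self, tenant: str, actual_cost_usd: float) -> None:
        # Call after the request completes, with the metered cost.
        self.spent_usd[tenant] += actual_cost_usd


# Usage: hypothetical tenant with a $5.00 ceiling.
breaker = CostCeilingBreaker({"acme": 5.00})
breaker.check("acme", 0.40)  # under budget: no exception
breaker.record("acme", 0.40)
```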

Generic playbooks tell you to "check the logs" and "roll back if needed." A runbook earned against real incidents tells you which dashboard query to run, which version to roll back to, and which engineer owns the page.

03 / 30-DAY SUPPORT WINDOW

The bridge between handoff and your team running it alone.

IN SCOPE · DEFAULT
  • Incident triage with your team in a shared Slack channel.
  • Runbook updates as your on-call finds gaps in the first 30 days.
  • Eval-suite tuning when the harness flags something the rubric did not catch.
  • Two drift-detection reviews, at day 14 and day 28, each written up as a short memo.
OUT OF SCOPE · DEFAULT
  • New features beyond what shipped in the build phase.
  • Model swaps and major architectural changes.
  • New eval categories not covered by the suite at handoff.

Anything out of scope can be added by a written addendum: same engagement letter, scoped deliverables. We will not let scope creep without one.
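A drift review usually starts by asking whether the inputs moved, which is the data-distribution side of the model-versus-data question in the runbook's eval-drift entry. A minimal sketch of one common check, the population stability index (PSI) over a categorical input feature. The choice of PSI, the example feature, and the 0.2 alert threshold are assumptions (0.2 is a frequently cited rule of thumb, not a standard), not necessarily what your harness uses.

```python
import math
from collections import Counter


def psi(baseline: list[str], current: list[str], eps: float = 1e-6) -> float:
    """Population stability index between two samples of a categorical feature."""
    if not baseline or not current:
        raise ValueError("both samples must be non-empty")
    b_counts, c_counts = Counter(baseline), Counter(current)
    total = 0.0
    for cat in set(baseline) | set(current):
        # eps keeps log() finite when a category is absent from one sample.
        b = max(b_counts[cat] / len(baseline), eps)
        c = max(c_counts[cat] / len(current), eps)
        total += (c - b) * math.log(c / b)
    return total


# Hypothetical feature: which product area each incoming query is about.
baseline = ["billing"] * 50 + ["shipping"] * 50
same = ["billing"] * 50 + ["shipping"] * 50
shifted = ["billing"] * 90 + ["shipping"] * 10

DRIFT_THRESHOLD = 0.2  # assumed rule-of-thumb cutoff; tune against your own data
print(psi(baseline, same))                       # 0.0: same distribution
print(psi(baseline, shifted) > DRIFT_THRESHOLD)  # True: inputs moved
```

If PSI stays low while the eval-pass rate falls, the inputs have not moved much, which points the investigation toward the model, prompt, or retriever side first.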

04 / WHEN TO CALL US BACK

Six triggers. Three are routine ops; three usually warrant a sprint.

T1 ROUTINE OPS

Eval-pass rate drops below the budget set in week two.

Your harness already catches this — that's why we wrote the budget into the CI gate. The drop is the signal, not the diagnosis. If your team can read the per-category breakdown and the recent dataset changes, this is a normal incident. If it persists past one debugging cycle, it warrants a call.
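A budget-enforcing CI gate can be as small as a function that reads the harness results, prints the per-category breakdown, and returns a non-zero exit code when the aggregate pass rate falls under budget. A minimal sketch, assuming results arrive as `(category, passed)` pairs; the `gate` name, the result format, the category names, and the 0.85 budget are hypothetical placeholders for whatever was set in week two.

```python
from collections import defaultdict


def gate(results: list[tuple[str, bool]], budget: float) -> int:
    """Return a process exit code: 0 if the pass rate meets budget, 1 otherwise."""
    if not results:
        print("no eval results; failing the gate")
        return 1
    by_category: dict[str, list[bool]] = defaultdict(list)
    for category, passed in results:
        by_category[category].append(passed)
    # Per-category breakdown: the drop is the signal, this is where diagnosis starts.
    for category, outcomes in sorted(by_category.items()):
        print(f"{category:<12} {sum(outcomes)}/{len(outcomes)}")
    rate = sum(passed for _, passed in results) / len(results)
    print(f"aggregate    {rate:.1%} (budget {budget:.1%})")
    return 0 if rate >= budget else 1


# Usage: 18 of 20 cases pass. In CI, pass the return value to sys.exit().
sample = (
    [("refunds", True)] * 9 + [("refunds", False)]
    + [("escalation", True)] * 8 + [("escalation", False)] * 2
)
exit_code = gate(sample, budget=0.85)
```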

T2 ROUTINE OPS

A new model you want to A/B against your harness.

Routine ops. The eval suite is yours; point it at the new model, read the diff, decide. The reason to call is when the new model wins on aggregate but loses on a category you care about — that is an eval-design conversation, not a model-swap one.
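The case worth a call, a model that wins on aggregate but loses a category, is easy to surface mechanically. A minimal sketch, assuming you already have per-category pass rates for the incumbent and the candidate from the same harness run; the `category_regressions` name, the numbers, and the 2-point tolerance are hypothetical.

```python
def category_regressions(
    incumbent: dict[str, float],
    candidate: dict[str, float],
    tolerance: float = 0.02,
) -> dict[str, float]:
    """Categories where the candidate's pass rate drops by more than `tolerance`.

    Categories missing from the candidate run are skipped here; in practice,
    treat a missing category as a failure.
    """
    return {
        cat: candidate[cat] - incumbent[cat]
        for cat in incumbent
        if cat in candidate and incumbent[cat] - candidate[cat] > tolerance
    }


incumbent = {"refunds": 0.91, "escalation": 0.88, "policy": 0.95}
candidate = {"refunds": 0.99, "escalation": 0.84, "policy": 0.97}

# Unweighted mean across categories; weight by case count if categories differ in size.
agg_incumbent = sum(incumbent.values()) / len(incumbent)
agg_candidate = sum(candidate.values()) / len(candidate)
print(agg_candidate > agg_incumbent)               # True: candidate wins on aggregate
print(category_regressions(incumbent, candidate))  # only "escalation" regresses
```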

T3 ROUTINE OPS

Cost per request creeps above the ceiling.

The dashboard shows where the cost lives — usually rerankers, tool calls, or context bloat. Your team can read the breakdown and apply the obvious moves. Call us when the obvious moves are not enough and the next step is a structural change.
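Reading the breakdown mostly means ranking components by their share of cost per request and comparing the total against the ceiling. A minimal sketch, assuming your traces tag each request's spend by component; the `cost_breakdown` helper, the component names, the spend figures, and the ceiling are hypothetical.

```python
from collections import defaultdict


def cost_breakdown(
    spans: list[tuple[str, float]], n_requests: int
) -> list[tuple[str, float, float]]:
    """Return (component, cost per request, share of total), largest first."""
    if n_requests <= 0:
        raise ValueError("n_requests must be positive")
    totals: dict[str, float] = defaultdict(float)
    for component, cost_usd in spans:
        totals[component] += cost_usd
    grand_total = sum(totals.values())
    rows = [
        (component, total / n_requests, total / grand_total if grand_total else 0.0)
        for component, total in totals.items()
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical spend for 1,000 requests, tagged by component.
spans = [("generation", 1.20), ("reranker", 3.10), ("tool_calls", 0.90), ("context_tokens", 2.30)]
CEILING_PER_REQUEST = 0.006  # assumed ceiling in USD

rows = cost_breakdown(spans, n_requests=1000)
for component, per_request, share in rows:
    print(f"{component:<15} ${per_request:.5f}/req  {share:.0%}")
total_per_request = sum(per_request for _, per_request, _ in rows)
print("over ceiling" if total_per_request > CEILING_PER_REQUEST else "within ceiling")
```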

05 / LONG-RUNNING RELATIONSHIPS

Retainers are separate, and entirely optional.

A retainer is a separate contract, scoped explicitly, with a quarterly review. Things that are reasonable under a retainer: quarterly eval reviews, model-swap support against the harness, eval-set maintenance as your corpus shifts, and a named escalation path for incidents your team cannot resolve in the first hour.

A retainer should map to real, ongoing work — not exist to defend revenue. If a quarter passes without a real ticket, the retainer ends or the price drops to match the work — your call, in writing, at the review.

The first call to discuss any of this works like the first call for any other engagement: hello@proofoftech.org, 30 minutes, no decks. If you have one of our runbooks and a working harness, mention it in the first message and we will pull the engagement file before the call.

◇ NEXT · A 30-MIN CALL · NO DECKS

Bring us a hard problem.
We'll show you what we'd build.

The first call is a free 30 minutes. You'll come away with a one-page memo on what we'd build and how we'd approach it.

Or email hello@proofoftech.org.