AI-in-the-Loop

Reduce review volume by letting AI score, suggest, and pre-triage work before it reaches a person.

What This Means in M3 Forge

AI-in-the-Loop extends automation beyond generation and extraction. In M3 Forge, AI can also:

Evaluate outputs with workspace evaluators and guardrails
Suggest annotations during processor labeling
Score traces and sessions in observability flows
Compare prompt variants before promotion
Route only the hardest exceptions to human reviewers

This pattern gives teams a middle layer between “fully automatic” and “fully manual.”

Where It Shows Up

Surface	AI-in-the-loop behavior
Prompt Compare	Score prompt variants with saved evaluators before rollout
Guardrails	Route low-quality outputs to retry, fallback, or review
Observability	Run evaluator schedules against traces and sessions and persist score annotations
Processor Labeling	Suggest annotations to accelerate dataset creation
HITL	Deliver smaller, better-prioritized exception queues to people
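The Guardrails row describes a fallback chain for low-quality outputs: retry, then fallback, then review. Below is a minimal TypeScript sketch of that routing decision. It assumes one evaluator score per output, a fixed retry budget, and at most one configured fallback. The type and function names are hypothetical and are not the M3 Forge guardrail API.

```typescript
// Hypothetical names for illustration; not the M3 Forge guardrail API.
type GuardrailAction = "accept" | "retry" | "fallback" | "review";

interface GuardrailPolicy {
  threshold: number;    // minimum passing evaluator score, e.g. 0.85
  maxRetries: number;   // regenerations allowed before trying the fallback
  hasFallback: boolean; // whether a fallback model or prompt is configured
}

// Decide what happens to one output after it has been scored.
function routeOutput(
  score: number,
  retriesSoFar: number,
  usedFallback: boolean,
  policy: GuardrailPolicy,
): GuardrailAction {
  if (score >= policy.threshold) return "accept";
  // Cheapest remedy first: regenerate with the same configuration.
  if (!usedFallback && retriesSoFar < policy.maxRetries) return "retry";
  // Retry budget exhausted: try the fallback once, if one exists.
  if (policy.hasFallback && !usedFallback) return "fallback";
  // Nothing automated left to try: hand the item to a person.
  return "review";
}

const examplePolicy: GuardrailPolicy = { threshold: 0.85, maxRetries: 2, hasFallback: true };
console.log(routeOutput(0.91, 0, false, examplePolicy)); // accept
console.log(routeOutput(0.62, 0, false, examplePolicy)); // retry
console.log(routeOutput(0.62, 2, false, examplePolicy)); // fallback
console.log(routeOutput(0.62, 2, true, examplePolicy)); // review
```

This ordering spends the cheap automated remedies first. Human attention is reserved for outputs that neither a retry nor the fallback could fix.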

Example: Review Queue Compression

Generate or extract the first-pass result

Run the normal workflow, processor, or agent flow.

Score the output

Apply evaluators such as faithfulness, relevance, schema checks, or custom code evaluators.

Annotate production behavior

Persist evaluator results into trace or session annotations so teams can see quality patterns over time.

Send only weak cases to humans

Use threshold failures, missing fields, or policy exceptions to determine which items enter the HITL queue.
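Taken together, the steps above reduce to a filter over scored results. Below is a minimal sketch. It assumes each first-pass result already carries its evaluator scores, a list of missing required fields, and any policy flags. The `ScoredItem` shape and the helper names are hypothetical.

```typescript
// Hypothetical shape of a scored first-pass result; field names are illustrative.
interface ScoredItem {
  id: string;
  scores: Record<string, number>; // evaluator name -> score in [0, 1]
  missingFields: string[];        // required fields the extractor left empty
  policyFlags: string[];          // policy exceptions raised during processing
}

interface Escalation {
  id: string;
  reasons: string[]; // why this item needs a person
}

// Collect every reason an item should go to a human; an empty list means auto-accept.
function escalationReasons(item: ScoredItem, threshold: number): string[] {
  const reasons: string[] = [];
  for (const [evaluator, score] of Object.entries(item.scores)) {
    if (score < threshold) reasons.push(`${evaluator} ${score} < ${threshold}`);
  }
  if (item.missingFields.length > 0) reasons.push(`missing ${item.missingFields.join(", ")}`);
  for (const flag of item.policyFlags) reasons.push(`policy: ${flag}`);
  return reasons;
}

// Keep only the weak cases; everything else bypasses the HITL queue.
function buildReviewQueue(items: ScoredItem[], threshold: number): Escalation[] {
  return items
    .map((item) => ({ id: item.id, reasons: escalationReasons(item, threshold) }))
    .filter((e) => e.reasons.length > 0);
}

const reviewQueue = buildReviewQueue(
  [
    { id: "inv-001", scores: { faithfulness: 0.95, relevance: 0.92 }, missingFields: [], policyFlags: [] },
    { id: "inv-002", scores: { faithfulness: 0.71, relevance: 0.9 }, missingFields: [], policyFlags: [] },
    { id: "inv-003", scores: { faithfulness: 0.97, relevance: 0.94 }, missingFields: ["due_date"], policyFlags: [] },
    { id: "inv-004", scores: { faithfulness: 0.93, relevance: 0.9 }, missingFields: [], policyFlags: ["amount over approval limit"] },
  ],
  0.85,
);
console.log(reviewQueue.map((e) => e.id).join(", ")); // inv-002, inv-003, inv-004
```

Each escalation records its reasons, so the reviewer can see why an item is in the queue without having to re-score it by hand.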

Example Evaluator Configuration


{
  "name": "invoice_quality",
  "kind": "ts_code",
  "target": "trace",
  "threshold": 0.85,
  "dimensions": ["schema", "field_completeness", "groundedness"]
}
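The configuration above names three dimensions and a 0.85 threshold, but it does not say how they combine. Below is one way a `ts_code` evaluator for it might look. The sketch makes three assumptions: an equal-weight mean of the three dimension scores is compared against the threshold, the required invoice fields are hard-coded, and a literal substring match stands in as a crude groundedness proxy. The function signature, field list, and scoring rules are all hypothetical and are not the M3 Forge evaluator contract.

```typescript
// Hypothetical evaluator input: extracted invoice fields plus the source text they came from.
interface InvoiceExtraction {
  fields: Record<string, unknown>;
  sourceText: string;
}

interface EvaluationResult {
  dimensions: { schema: number; field_completeness: number; groundedness: number };
  score: number;
  passed: boolean;
}

// Assumed required fields and their expected runtime types.
const REQUIRED_FIELDS: Record<string, "string" | "number"> = {
  invoice_number: "string",
  vendor: "string",
  total: "number",
  due_date: "string",
};

function isFilled(value: unknown): boolean {
  return value !== undefined && value !== null && value !== "";
}

function evaluateInvoice(input: InvoiceExtraction, threshold = 0.85): EvaluationResult {
  const names = Object.keys(REQUIRED_FIELDS);
  const present = names.filter((n) => isFilled(input.fields[n]));

  // schema: share of filled fields whose runtime type matches the expected type
  const typed = present.filter((n) => typeof input.fields[n] === REQUIRED_FIELDS[n]);
  const schema = present.length === 0 ? 0 : typed.length / present.length;

  // field_completeness: share of required fields that are filled at all
  const fieldCompleteness = present.length / names.length;

  // groundedness (crude proxy): share of filled values that appear verbatim in the source
  const text = input.sourceText.toLowerCase();
  const grounded = present.filter((n) => text.includes(String(input.fields[n]).toLowerCase()));
  const groundedness = present.length === 0 ? 0 : grounded.length / present.length;

  // Equal-weight mean across dimensions, compared against the configured threshold.
  const score = (schema + fieldCompleteness + groundedness) / 3;
  return {
    dimensions: { schema, field_completeness: fieldCompleteness, groundedness },
    score,
    passed: score >= threshold,
  };
}

const sampleText = "Invoice INV-42 from Acme Corp. Total due: 1250 by 2024-07-01.";
const good = evaluateInvoice({
  fields: { invoice_number: "INV-42", vendor: "Acme Corp", total: 1250, due_date: "2024-07-01" },
  sourceText: sampleText,
});
const weak = evaluateInvoice({
  fields: { invoice_number: "INV-42", vendor: "Acme Corp", total: 1300 },
  sourceText: sampleText,
});
console.log(good.passed, weak.passed); // true false
```

A production groundedness check would more likely use an LLM judge or fuzzy matching. The substring test only keeps the sketch self-contained.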

Good Fits

Reduce prompt-comparison guesswork before shipping a new prompt version
Score production traces continuously instead of waiting for manual QA sampling
Speed up extractor training with AI-generated annotation suggestions
Keep humans focused on novel errors rather than obvious passes
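For the first of these fits, removing prompt-comparison guesswork, the comparison can be made explicit. Below is a minimal sketch. It assumes both prompt variants were scored by the same evaluator on the same ordered example set. It also uses two illustrative promotion rules: a minimum mean gain of 0.02 and a win rate of at least 50%. The names and rules are hypothetical and do not describe how Prompt Compare decides internally.

```typescript
// Hypothetical shape: one evaluator score per example, same example order for every variant.
interface VariantScores {
  name: string;
  scores: number[];
}

interface ComparisonResult {
  baselineMean: number;
  candidateMean: number;
  winRate: number; // share of examples where the candidate scored strictly higher
  promote: boolean;
}

function mean(xs: number[]): number {
  return xs.reduce((sum, x) => sum + x, 0) / xs.length;
}

function compareVariants(
  baseline: VariantScores,
  candidate: VariantScores,
  minGain = 0.02,   // illustrative: required improvement in mean score
  minWinRate = 0.5, // illustrative: candidate must win on at least half the examples
): ComparisonResult {
  if (baseline.scores.length === 0 || baseline.scores.length !== candidate.scores.length) {
    throw new Error("Both variants must be scored on the same non-empty example set");
  }
  const baselineMean = mean(baseline.scores);
  const candidateMean = mean(candidate.scores);
  // Paired comparison: example i of the candidate against example i of the baseline.
  const wins = candidate.scores.filter((s, i) => s > baseline.scores[i]).length;
  const winRate = wins / candidate.scores.length;
  return {
    baselineMean,
    candidateMean,
    winRate,
    promote: candidateMean - baselineMean >= minGain && winRate >= minWinRate,
  };
}

const comparison = compareVariants(
  { name: "v1", scores: [0.8, 0.82, 0.78, 0.9] },
  { name: "v2", scores: [0.9, 0.88, 0.85, 0.89] },
);
console.log(comparison.winRate, comparison.promote); // 0.75 true
```

The sketch requires a win rate as well as a mean gain. That rejects a candidate whose average is lifted by a few large wins while it regresses on most examples.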

Human-in-the-Loop vs AI-in-the-Loop

Capability	Primary actor	Best used for
Human-in-the-Loop	Reviewer	Approvals, policy calls, business exceptions
AI-in-the-Loop	Evaluator or assistant model	Scoring, suggestion, filtering, pre-triage

The strongest systems use both. AI shrinks the queue. Humans resolve what still needs judgment.

AI-in-the-loop is most valuable when it is observable. Score annotations, evaluator history, and prompt comparisons turn reviewer effort into measurable signal instead of hidden labor.
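One way to turn score annotations into signal is a daily escalation rate: the share of each day's traces with at least one evaluator score under the threshold. The sketch below computes it. It assumes a flat annotation record with a trace ID, evaluator name, score, and ISO date. That shape is hypothetical, and M3 Forge's annotation schema may differ.

```typescript
// Hypothetical flattened score annotation; not the M3 Forge annotation schema.
interface ScoreAnnotation {
  traceId: string;
  evaluator: string;
  score: number;
  day: string; // ISO date such as "2024-06-01", so string order matches date order
}

interface DailyQuality {
  day: string;
  traces: number;
  escalated: number;      // traces with at least one score below threshold
  escalationRate: number; // escalated / traces
}

function dailyQuality(annotations: ScoreAnnotation[], threshold: number): DailyQuality[] {
  // day -> (traceId -> true if any evaluator on that trace scored below threshold)
  const byDay = new Map<string, Map<string, boolean>>();
  for (const a of annotations) {
    const traces = byDay.get(a.day) ?? new Map<string, boolean>();
    traces.set(a.traceId, (traces.get(a.traceId) ?? false) || a.score < threshold);
    byDay.set(a.day, traces);
  }
  return Array.from(byDay.entries())
    .sort((x, y) => x[0].localeCompare(y[0]))
    .map(([day, traces]) => {
      const escalated = Array.from(traces.values()).filter((failed) => failed).length;
      return { day, traces: traces.size, escalated, escalationRate: escalated / traces.size };
    });
}

const report = dailyQuality(
  [
    { traceId: "t1", evaluator: "faithfulness", score: 0.9, day: "2024-06-01" },
    { traceId: "t1", evaluator: "relevance", score: 0.8, day: "2024-06-01" },
    { traceId: "t2", evaluator: "faithfulness", score: 0.95, day: "2024-06-01" },
    { traceId: "t3", evaluator: "faithfulness", score: 0.9, day: "2024-06-02" },
    { traceId: "t4", evaluator: "faithfulness", score: 0.88, day: "2024-06-02" },
  ],
  0.85,
);
for (const r of report) console.log(r.day, r.escalationRate); // 2024-06-01 0.5, then 2024-06-02 0
```

Tracking this rate over time shows whether prompt or processor changes are shrinking the human queue or quietly growing it.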