AI-in-the-Loop

Reduce review volume by letting AI score, suggest, and pre-triage work before it reaches a person.

What This Means in M3 Forge

AI-in-the-Loop extends automation beyond generation and extraction. In M3 Forge, AI can also:

  • Evaluate outputs with workspace evaluators and guardrails
  • Suggest annotations during processor labeling
  • Score traces and sessions in observability flows
  • Compare prompt variants before promotion
  • Route only the hardest exceptions to human reviewers

This pattern gives teams a middle layer between “fully automatic” and “fully manual.”

Where It Shows Up

| Surface | AI-in-the-loop behavior |
| --- | --- |
| Prompt Compare | Score prompt variants with saved evaluators before rollout |
| Guardrails | Route low-quality outputs to retry, fallback, or review |
| Observability | Run evaluator schedules against traces and sessions and persist score annotations |
| Processor Labeling | Suggest annotations to accelerate dataset creation |
| HITL | Deliver smaller, better-prioritized exception queues to people |
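
The Guardrails row describes a three-way routing decision. As a rough sketch of what such a policy could look like, the snippet below maps an evaluator score and an attempt count to an action. Everything here is hypothetical: the `GuardrailPolicy` shape, the `decide` function, and the threshold values are assumptions for illustration, not M3 Forge's guardrail API.

```typescript
// Hypothetical guardrail policy; names and thresholds are illustrative only.
type GuardrailAction = "accept" | "retry" | "fallback" | "review";

interface GuardrailPolicy {
  passThreshold: number;   // scores at or above this are accepted
  reviewThreshold: number; // scores below this go straight to a person
  maxRetries: number;      // retries allowed before falling back
}

// Decide what to do with one output, given its score and how many
// attempts have already been made (0 for the first attempt).
function decide(score: number, attempt: number, policy: GuardrailPolicy): GuardrailAction {
  if (score >= policy.passThreshold) return "accept";
  // Assumed rule: very weak outputs are not worth retrying automatically.
  if (score < policy.reviewThreshold) return "review";
  if (attempt < policy.maxRetries) return "retry";
  return "fallback";
}

const policy: GuardrailPolicy = { passThreshold: 0.85, reviewThreshold: 0.5, maxRetries: 2 };

console.log(decide(0.9, 0, policy));
console.log(decide(0.7, 0, policy));
console.log(decide(0.7, 2, policy));
console.log(decide(0.3, 0, policy));
```

The order of the checks is a design choice: sending very low scores directly to review avoids spending retries on outputs that are unlikely to recover.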

Example: Review Queue Compression

  1. Generate or extract the first-pass result
     Run the normal workflow, processor, or agent flow.

  2. Score the output
     Apply evaluators such as faithfulness, relevance, schema checks, or custom code evaluators.

  3. Annotate production behavior
     Persist evaluator results into trace or session annotations so teams can see quality patterns over time.

  4. Send only weak cases to humans
     Use threshold failures, missing fields, or policy exceptions to determine which items enter the HITL queue.
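
The steps above can be sketched as a small triage function. This is a minimal illustration under stated assumptions, not the M3 Forge API: the `ScoredItem` shape, the `triage` function, the sample invoice IDs, and the 0.85 threshold are all hypothetical.

```typescript
// Hypothetical shapes for illustration only; not the M3 Forge API.
interface ScoredItem {
  id: string;
  score: number;            // evaluator score in [0, 1]
  missingFields: string[];  // required fields left empty in the first pass
  policyException: boolean; // true when a policy check flagged the item
}

interface TriageResult {
  autoAccepted: ScoredItem[];
  humanQueue: ScoredItem[];
}

// An item enters the human queue if it fails the threshold, is missing
// required fields, or carries a policy exception. Everything else passes.
function triage(items: ScoredItem[], threshold: number): TriageResult {
  const result: TriageResult = { autoAccepted: [], humanQueue: [] };
  for (const item of items) {
    const needsReview =
      item.score < threshold ||
      item.missingFields.length > 0 ||
      item.policyException;
    (needsReview ? result.humanQueue : result.autoAccepted).push(item);
  }
  // Lowest scores first, so reviewers see the weakest cases at the top.
  result.humanQueue.sort((a, b) => a.score - b.score);
  return result;
}

const batch: ScoredItem[] = [
  { id: "inv-1", score: 0.97, missingFields: [], policyException: false },
  { id: "inv-2", score: 0.62, missingFields: [], policyException: false },
  { id: "inv-3", score: 0.91, missingFields: ["due_date"], policyException: false },
  { id: "inv-4", score: 0.88, missingFields: [], policyException: true },
];

const triaged = triage(batch, 0.85);
console.log(triaged.humanQueue.map((item) => item.id));
console.log(triaged.autoAccepted.map((item) => item.id));
```

In this sample batch, a high score alone does not keep an item out of the queue: `inv-3` and `inv-4` score above the threshold but are still routed to a person because of a missing field and a policy exception.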

Example Pattern

Example Evaluator Configuration

{
  "name": "invoice_quality",
  "kind": "ts_code",
  "target": "trace",
  "threshold": 0.85,
  "dimensions": ["schema", "field_completeness", "groundedness"]
}
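
To make the configuration more concrete, here is one way a code evaluator might interpret it. The field names mirror the example above, but their semantics are assumptions: the source does not say how dimensions are combined, so this sketch takes the weakest dimension as the overall score. The `evaluate` function, the `EvaluationResult` shape, and the sample dimension scores are hypothetical.

```typescript
// Mirrors the fields in the example configuration. The meaning attached
// to each field is an assumption made for this sketch.
interface EvaluatorConfig {
  name: string;
  kind: string;
  target: "trace" | "session";
  threshold: number;
  dimensions: string[];
}

interface EvaluationResult {
  evaluator: string;
  score: number;
  passed: boolean;
  failedDimensions: string[];
}

const config: EvaluatorConfig = {
  name: "invoice_quality",
  kind: "ts_code",
  target: "trace",
  threshold: 0.85,
  dimensions: ["schema", "field_completeness", "groundedness"],
};

// Assumed aggregation: the overall score is the weakest dimension, so one
// bad dimension is enough to fail the trace. A dimension with no score
// counts as 0, and a config with no dimensions scores 0.
function evaluate(
  cfg: EvaluatorConfig,
  dimensionScores: Record<string, number>,
): EvaluationResult {
  const scores = cfg.dimensions.map((name) => ({
    name,
    value: dimensionScores[name] ?? 0,
  }));
  const score = scores.length > 0 ? Math.min(...scores.map((s) => s.value)) : 0;
  const failedDimensions = scores
    .filter((s) => s.value < cfg.threshold)
    .map((s) => s.name);
  return {
    evaluator: cfg.name,
    score,
    passed: score >= cfg.threshold,
    failedDimensions,
  };
}

const evaluation = evaluate(config, {
  schema: 1.0,
  field_completeness: 0.9,
  groundedness: 0.7,
});
console.log(evaluation);
```

Returning the failed dimensions alongside the score gives a reviewer the reason an item was flagged, not just the fact that it was.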

Good Fits

  • Reduce prompt-comparison guesswork before shipping a new prompt version
  • Score production traces continuously instead of waiting for manual QA sampling
  • Speed up extractor training with AI-generated annotation suggestions
  • Keep humans focused on novel errors rather than obvious passes

Human-in-the-Loop vs AI-in-the-Loop

| Capability | Primary actor | Best used for |
| --- | --- | --- |
| Human-in-the-Loop | Reviewer | Approvals, policy calls, business exceptions |
| AI-in-the-Loop | Evaluator or assistant model | Scoring, suggestion, filtering, pre-triage |

The strongest systems use both. AI shrinks the queue. Humans resolve what still needs judgment.

AI-in-the-Loop is most valuable when it is observable. Score annotations, evaluator history, and prompt comparisons turn reviewer effort into a measurable signal instead of hidden labor.
