LLM Dispatch Runtime

The LLM Dispatch Runtime is Marie’s internal execution fabric for LLM calls that originate from executors. It gives operators live visibility into queued, in-flight, completed, failed, and dropped LLM work, and it removes the need for each executor to call the model backend directly.

It is not a replacement for LiteLLM, OpenRouter, vLLM, or another OpenAI-compatible backend. Marie Dispatch handles executor ingress and request lifecycle. The configured backend handles provider routing.

Responsibility Split

Use this boundary when operating or debugging the system:

| Layer | Owner | Responsibility |
| --- | --- | --- |
| Executor ingress | Marie Gateway / Runtime Fabric | Receive executor-originated LLM work and select the configured dispatch pool |
| Dispatch | Marie LLM Dispatch Runtime | Valkey request/reply transport, producer liveness, in-flight state, timeouts, drops, backpressure, and runtime health |
| Execution | OpenAI-compatible adapter | Send one chat.completions.create(...) call to the configured backend URL |
| Provider routing | LiteLLM, OpenRouter, vLLM, or hosted provider | Provider fallback, model routing, budgets, rate limits, and provider-specific policy |
| Observability | OTel and ClickHouse | Trace correlation, completed execution history, latency, tokens, and failure analytics |

If the backend URL points at LiteLLM or OpenRouter, provider fallback chains, cost/latency routing, budgets, and provider rate limits belong there. Marie Dispatch still owns dispatch-layer retry, timeout, circuit-breaker, durability, and backpressure semantics, including the ones listed under Current Design Gaps that are not yet implemented.

Request Flow

Executor -> Marie Gateway / Runtime Fabric ingress -> Valkey request queue -> LLM Dispatch Runtime -> OpenAI-compatible backend URL -> provider gateway/backend routing -> completion response -> producer reply queue -> executor reply pump

Phase 1 uses Valkey LISTs for live transport:

  • list:llm:requests:{pool_id} stores pending requests.
  • list:llm:replies:{producer_id} stores replies for a specific producer.
  • key:llm:producer:{producer_id}:alive indicates that the producer can still receive replies.
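The key patterns above can be captured in a few small helpers. This is a minimal sketch, not Marie's actual code: the function names are hypothetical, and it assumes the braces in the patterns are placeholders to substitute rather than literal characters in the key.

```python
def request_queue_key(pool_id: str) -> str:
    """LIST holding pending requests for one dispatch pool."""
    return f"list:llm:requests:{pool_id}"


def reply_queue_key(producer_id: str) -> str:
    """LIST holding replies addressed to one producer."""
    return f"list:llm:replies:{producer_id}"


def producer_alive_key(producer_id: str) -> str:
    """Key whose presence signals that the producer can still receive replies."""
    return f"key:llm:producer:{producer_id}:alive"


if __name__ == "__main__":
    # "default" matches the LLM_QUEUE_POOL_ID value used elsewhere on this page;
    # "producer-1" is an illustrative producer id.
    print(request_queue_key("default"))
    print(reply_queue_key("producer-1"))
    print(producer_alive_key("producer-1"))
```

Keeping key construction in one place makes it harder for producers and dispatchers to drift apart on naming.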

Runtime Fabric Page

Open Infrastructure → LLM Dispatch to inspect the selected Runtime Fabric.

The page has two views:

  • Live Requests shows pending requests still in Valkey and in-flight requests already popped by a dispatcher.
  • Recent Executions reads completed dispatcher spans from ClickHouse otel.otel_traces.

Live state is intentionally separate from completed history. Valkey is transport and live state only. Completed request history comes from OTel/ClickHouse.

Best-Effort Semantics

Phase 1 is best-effort after a dispatcher pops a request:

  • If a dispatcher crashes after BLPOP and before publishing a reply, that request may be lost.
  • If the producer liveness key is gone before execution, the dispatcher drops the popped request.
  • If the producer disappears after execution but before reply publish, the dispatcher drops the reply.
  • Scheduler-level retry or resubmission remains above the dispatch runtime.
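The drop rules above can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the dispatcher's implementation: `FakeValkey`, `dispatch_once`, the outcome strings, and the request shape (a dict with `producer_id` and `payload`) are all hypothetical, and a plain `deque` stands in for the Valkey LIST and BLPOP.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field


@dataclass
class FakeValkey:
    """In-memory stand-in for the LISTs and liveness keys (not a real client)."""
    lists: dict = field(default_factory=lambda: defaultdict(deque))
    keys: set = field(default_factory=set)


def dispatch_once(store: FakeValkey, pool_id: str, execute) -> str:
    """Handle at most one request and report what happened to it."""
    queue = store.lists[f"list:llm:requests:{pool_id}"]
    if not queue:
        return "idle"

    # After this pop the request lives only in dispatcher memory.
    # A crash from here until the reply is published loses it.
    request = queue.popleft()
    alive_key = f"key:llm:producer:{request['producer_id']}:alive"

    # Producer already gone: drop the request without executing it.
    if alive_key not in store.keys:
        return "dropped_request"

    result = execute(request)

    # Producer vanished during execution: drop the reply.
    if alive_key not in store.keys:
        return "dropped_reply"

    store.lists[f"list:llm:replies:{request['producer_id']}"].append(result)
    return "replied"
```

Nothing in this loop re-enqueues a dropped request, which matches the statement that retry or resubmission is left to the scheduler above the dispatch runtime.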

This is a deliberate Phase 1 tradeoff. Do not treat Valkey LIST state as a durable audit ledger.

Required Configuration

Gateway dispatch runtime:

LLM_QUEUE_ENABLED=true
LLM_QUEUE_VALKEY_URL=redis://marie-valkey:6379/0
LLM_QUEUE_POOL_ID=default
LLM_QUEUE_MAX_INLINE_PAYLOAD_BYTES=16777216
LLM_QUEUE_FABRIC_GROUP_ID=default
LLM_QUEUE_GATEWAY_ID=gateway-localhost
OPENAI_API_KEY=EMPTY
OPENAI_API_BASE=http://litellm:4000/v1

Executor producers:

LLM_QUEUE_ENABLED=true
LLM_QUEUE_VALKEY_URL=redis://marie-valkey:6379/0
LLM_QUEUE_POOL_ID=default
LLM_QUEUE_MAX_INLINE_PAYLOAD_BYTES=16777216

Producers do not need LLM_QUEUE_FABRIC_GROUP_ID or LLM_QUEUE_GATEWAY_ID unless they also run a dispatcher.
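A startup check along the following lines can catch a missing variable before any request is queued. This is a hedged sketch: `LlmQueueConfig` and `load_llm_queue_config` are hypothetical names, and the parsing rules (only the string "true" enables the queue, every listed variable is mandatory) are assumptions rather than documented behavior.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class LlmQueueConfig:
    enabled: bool
    valkey_url: str
    pool_id: str
    max_inline_payload_bytes: int
    fabric_group_id: Optional[str] = None  # dispatcher only
    gateway_id: Optional[str] = None       # dispatcher only


def load_llm_queue_config(
    env: Mapping[str, str] = os.environ, *, dispatcher: bool = False
) -> LlmQueueConfig:
    """Read the LLM_QUEUE_* settings; dispatchers also need the identity variables."""
    required = [
        "LLM_QUEUE_VALKEY_URL",
        "LLM_QUEUE_POOL_ID",
        "LLM_QUEUE_MAX_INLINE_PAYLOAD_BYTES",
    ]
    if dispatcher:
        required += ["LLM_QUEUE_FABRIC_GROUP_ID", "LLM_QUEUE_GATEWAY_ID"]

    missing = [name for name in required if not env.get(name)]
    if missing:
        raise ValueError(f"missing required settings: {', '.join(missing)}")

    return LlmQueueConfig(
        # Assumption: only the literal "true" (case-insensitive) enables the queue.
        enabled=env.get("LLM_QUEUE_ENABLED", "false").strip().lower() == "true",
        valkey_url=env["LLM_QUEUE_VALKEY_URL"],
        pool_id=env["LLM_QUEUE_POOL_ID"],
        # 16777216 in the example configuration is 16 MiB.
        max_inline_payload_bytes=int(env["LLM_QUEUE_MAX_INLINE_PAYLOAD_BYTES"]),
        fabric_group_id=env.get("LLM_QUEUE_FABRIC_GROUP_ID"),
        gateway_id=env.get("LLM_QUEUE_GATEWAY_ID"),
    )
```

Calling `load_llm_queue_config(dispatcher=True)` in a gateway process and `load_llm_queue_config()` in an executor mirrors the split between the two configuration blocks.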

What To Debug Where

| Symptom | First place to check |
| --- | --- |
| Pending count grows | Runtime Fabric live view and Valkey request queue depth |
| In-flight count sticks | Dispatcher health, backend latency, and in-flight request rows |
| Requests disappear without completion | Producer liveness drops or dispatcher crash-after-pop |
| Backend connection errors | Gateway dispatcher logs and backend URL configuration |
| Provider fallback did not happen | LiteLLM/OpenRouter/provider gateway configuration |
| Budget or rate-limit rejection | Downstream provider gateway logs and policy |
| Completed item missing from history | OTel exporter, ClickHouse otel.otel_traces, and Runtime Fabric identity env vars |
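For the first and third rows, queue depth and producer liveness can also be read straight from Valkey. The sketch below assumes a client object that exposes `llen` and `exists` the way the redis-py client does; `live_snapshot` is a hypothetical helper, and it covers only pending depth and liveness keys, not in-flight state.

```python
from typing import Iterable


def live_snapshot(client, pool_id: str, producer_ids: Iterable[str]) -> dict:
    """Report pending request depth and which producers still have a liveness key."""
    return {
        # LLEN on the pool's request LIST gives the pending count.
        "pending": client.llen(f"list:llm:requests:{pool_id}"),
        # EXISTS returns the number of matching keys, so 0 means the key is gone.
        "producers_alive": {
            producer_id: bool(client.exists(f"key:llm:producer:{producer_id}:alive"))
            for producer_id in producer_ids
        },
    }
```

A pending count that keeps rising while the producers stay alive points at the dispatcher or the backend. A producer whose liveness key is missing explains requests that vanish without a completion.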

Current Design Gaps

These are Marie Dispatch responsibilities and should be designed as runtime features:

  • dispatch-layer circuit breaker around the configured backend URL
  • explicit retry attempts and retry outcome in UI/history
  • durable replay mode after dispatcher crash
  • clearer admission/backpressure UX beyond pending and in-flight counts

These are intentionally delegated to the downstream provider gateway when one is configured:

  • provider fallback chains
  • provider model routing by tenant, cost, latency, region, or capacity
  • provider budgets and rate limits