LLM Dispatch Runtime

The LLM Dispatch Runtime is Marie’s internal execution fabric for LLM calls that originate from executors. It gives operators live visibility into queued, in-flight, completed, failed, and dropped LLM work without making every executor call the model backend directly.

It is not a replacement for LiteLLM, OpenRouter, vLLM, or another OpenAI-compatible backend. Marie Dispatch handles executor ingress and request lifecycle. The configured backend handles provider routing.

Responsibility Split

Use this boundary when operating or debugging the system:

Layer	Owner	Responsibility
Executor ingress	Marie Gateway / Runtime Fabric	Receive executor-originated LLM work and select the configured dispatch pool
Dispatch	Marie LLM Dispatch Runtime	Valkey request/reply transport, producer liveness, in-flight state, timeouts, drops, backpressure, and runtime health
Execution	OpenAI-compatible adapter	Send one `chat.completions.create(...)` call to the configured backend URL
Provider routing	LiteLLM, OpenRouter, vLLM, or hosted provider	Provider fallback, model routing, budgets, rate limits, and provider-specific policy
Observability	OTel and ClickHouse	Trace correlation, completed execution history, latency, tokens, and failure analytics

If the backend URL points at LiteLLM or OpenRouter, provider fallback chains, cost/latency routing, budgets, and provider rate limits belong there. Marie Dispatch still owns dispatch-layer retry, timeout, circuit-breaker, durability, and backpressure semantics, although not all of these are implemented yet (see Current Design Gaps).
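
The Execution row of the table can be pictured as a single SDK call with no routing logic around it. The following is a minimal sketch, assuming the `openai` v1 Python client; `build_client` and `execute_once` are hypothetical helper names rather than Marie APIs, and the fallback values simply mirror the example configuration later on this page.

```python
import os
from typing import Any


def build_client() -> Any:
    """Create an OpenAI-compatible client pointed at the configured backend URL."""
    # Imported lazily so this module can be loaded without the SDK installed.
    from openai import OpenAI

    return OpenAI(
        base_url=os.environ.get("OPENAI_API_BASE", "http://litellm:4000/v1"),
        api_key=os.environ.get("OPENAI_API_KEY", "EMPTY"),
    )


def execute_once(
    client: Any, model: str, messages: list[dict], timeout_s: float = 60.0
) -> Any:
    """Send exactly one chat completion.

    Provider fallback, model routing, budgets, and rate limits are left to
    whatever sits behind the backend URL.
    """
    return client.chat.completions.create(
        model=model,
        messages=messages,
        timeout=timeout_s,  # per-request timeout (assumed default of 60 seconds)
    )
```

Passing the client in as an argument keeps the call site easy to exercise with a stub, which is useful when checking dispatch behavior without a live backend.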

Request Flow


Executor
  -> Marie Gateway / Runtime Fabric ingress
  -> Valkey request queue
  -> LLM Dispatch Runtime
  -> OpenAI-compatible backend URL
  -> provider gateway/backend routing
  -> completion response
  -> producer reply queue
  -> executor reply pump

Phase 1 uses Valkey LISTs for live transport:

`list:llm:requests:{pool_id}` stores pending requests.
`list:llm:replies:{producer_id}` stores replies for a specific producer.
`key:llm:producer:{producer_id}:alive` indicates that the producer can still receive replies.
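
The key layout above can be sketched from the producer side as follows. This is an illustration only: the envelope fields (`request_id`, `producer_id`, `payload`), the use of `RPUSH`, and the liveness TTL are assumptions, and `submit_request` is a hypothetical helper. The `client` argument is any object with redis-py style `set` and `rpush` methods.

```python
import json
import uuid
from typing import Any


def request_queue_key(pool_id: str) -> str:
    return f"list:llm:requests:{pool_id}"


def reply_queue_key(producer_id: str) -> str:
    return f"list:llm:replies:{producer_id}"


def producer_alive_key(producer_id: str) -> str:
    return f"key:llm:producer:{producer_id}:alive"


def submit_request(
    client: Any,
    pool_id: str,
    producer_id: str,
    payload: dict,
    alive_ttl_s: int = 30,  # assumed TTL; the real heartbeat policy may differ
) -> str:
    """Refresh the producer liveness key, then enqueue one request."""
    request_id = uuid.uuid4().hex
    # Mark the producer as able to receive replies before any work is queued.
    client.set(producer_alive_key(producer_id), "1", ex=alive_ttl_s)
    envelope = {
        "request_id": request_id,
        "producer_id": producer_id,
        "payload": payload,
    }
    # Append to the tail so that a pop from the head yields FIFO order (assumption).
    client.rpush(request_queue_key(pool_id), json.dumps(envelope))
    return request_id
```

The producer would then read its reply from `reply_queue_key(producer_id)` and match it on `request_id`.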

Runtime Fabric Page

Open Infrastructure → LLM Dispatch to inspect the selected Runtime Fabric.

The page has two views:

Live Requests shows pending requests still in Valkey and in-flight requests already popped by a dispatcher.
Recent Executions reads completed dispatcher spans from the ClickHouse `otel.otel_traces` table.

Live state is intentionally separate from completed history. Valkey is transport and live state only. Completed request history comes from OTel/ClickHouse.

Best-Effort Semantics

Phase 1 is best-effort after a dispatcher pops a request:

If a dispatcher crashes after `BLPOP` and before publishing a reply, that request may be lost.
If the producer liveness key is gone before execution, the dispatcher drops the popped request.
If the producer disappears after execution but before reply publish, the dispatcher drops the reply.
Scheduler-level retry or resubmission of lost work remains the responsibility of the layer above the dispatch runtime.
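
The drop rules above can be traced through a small model of one dispatch step. This is a sketch under stated assumptions, not the runtime's implementation: `InMemoryValkey` is a stand-in for the real store, a non-blocking pop replaces `BLPOP`, and the envelope fields and outcome labels are hypothetical.

```python
import json
from collections import defaultdict, deque
from typing import Callable, Optional


class InMemoryValkey:
    """Tiny stand-in for the LIST and key operations used below (not a real client)."""

    def __init__(self) -> None:
        self.lists: dict[str, deque] = defaultdict(deque)
        self.keys: set[str] = set()

    def lpop(self, key: str) -> Optional[str]:
        return self.lists[key].popleft() if self.lists[key] else None

    def rpush(self, key: str, value: str) -> None:
        self.lists[key].append(value)

    def exists(self, key: str) -> bool:
        return key in self.keys


def dispatch_one(store: InMemoryValkey, pool_id: str, execute: Callable) -> str:
    """Handle at most one request and report what happened to it."""
    raw = store.lpop(f"list:llm:requests:{pool_id}")
    if raw is None:
        return "idle"
    # From here on the request exists only in this process, so a crash loses it.
    request = json.loads(raw)
    alive_key = f"key:llm:producer:{request['producer_id']}:alive"

    if not store.exists(alive_key):
        return "dropped_before_execution"  # nobody is left to receive a reply

    result = execute(request["payload"])

    if not store.exists(alive_key):
        return "dropped_reply"  # producer disappeared while the backend was working

    reply = {"request_id": request["request_id"], "result": result}
    store.rpush(f"list:llm:replies:{request['producer_id']}", json.dumps(reply))
    return "replied"
```

Nothing in this step writes the popped request anywhere durable, which is why the LIST state cannot serve as an audit ledger.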

This is a deliberate Phase 1 tradeoff. Do not treat Valkey LIST state as a durable audit ledger.

Required Configuration

Gateway dispatch runtime:


LLM_QUEUE_ENABLED=true
LLM_QUEUE_VALKEY_URL=redis://marie-valkey:6379/0
LLM_QUEUE_POOL_ID=default
LLM_QUEUE_MAX_INLINE_PAYLOAD_BYTES=16777216
LLM_QUEUE_FABRIC_GROUP_ID=default
LLM_QUEUE_GATEWAY_ID=gateway-localhost
OPENAI_API_KEY=EMPTY
OPENAI_API_BASE=http://litellm:4000/v1

Executor producers:


LLM_QUEUE_ENABLED=true
LLM_QUEUE_VALKEY_URL=redis://marie-valkey:6379/0
LLM_QUEUE_POOL_ID=default
LLM_QUEUE_MAX_INLINE_PAYLOAD_BYTES=16777216

Executor producers do not need `LLM_QUEUE_FABRIC_GROUP_ID` or `LLM_QUEUE_GATEWAY_ID` unless they also run a dispatcher.
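
The difference between the two variable sets can be checked at startup. The sketch below is one possible validation, assuming that only the gateway dispatcher requires the two identity variables; `LLMQueueConfig` and `load_llm_queue_config` are hypothetical names, and the defaults for the enabled flag and payload limit are assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class LLMQueueConfig:
    enabled: bool
    valkey_url: str
    pool_id: str
    max_inline_payload_bytes: int
    fabric_group_id: Optional[str] = None
    gateway_id: Optional[str] = None


def load_llm_queue_config(
    env: Mapping[str, str] = os.environ, *, dispatcher: bool = False
) -> LLMQueueConfig:
    """Read the LLM_QUEUE_* variables and fail fast on missing ones."""
    required = ["LLM_QUEUE_VALKEY_URL", "LLM_QUEUE_POOL_ID"]
    if dispatcher:
        # Identity variables are only required where a dispatcher runs.
        required += ["LLM_QUEUE_FABRIC_GROUP_ID", "LLM_QUEUE_GATEWAY_ID"]
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise ValueError("missing configuration: " + ", ".join(missing))

    return LLMQueueConfig(
        enabled=env.get("LLM_QUEUE_ENABLED", "false").strip().lower() == "true",
        valkey_url=env["LLM_QUEUE_VALKEY_URL"],
        pool_id=env["LLM_QUEUE_POOL_ID"],
        # 16777216 bytes is 16 MiB, the value used in the examples above.
        max_inline_payload_bytes=int(
            env.get("LLM_QUEUE_MAX_INLINE_PAYLOAD_BYTES", "16777216")
        ),
        fabric_group_id=env.get("LLM_QUEUE_FABRIC_GROUP_ID"),
        gateway_id=env.get("LLM_QUEUE_GATEWAY_ID"),
    )
```

Calling `load_llm_queue_config(dispatcher=True)` on a host that has only the producer variables raises a `ValueError` naming the missing identity variables.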

What To Debug Where

Symptom	First place to check
Pending count grows	Runtime Fabric live view and Valkey request queue depth
In-flight count sticks	Dispatcher health, backend latency, and in-flight request rows
Requests disappear without completion	Producer liveness drops or dispatcher crash-after-pop
Backend connection errors	Gateway dispatcher logs and backend URL configuration
Provider fallback did not happen	LiteLLM/OpenRouter/provider gateway configuration
Budget or rate-limit rejection	Downstream provider gateway logs and policy
Completed item missing from history	OTel exporter, ClickHouse `otel.otel_traces`, and Runtime Fabric identity env vars
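
For the first and third rows, the relevant live state can be read straight from the keys listed under Request Flow. The helper below is a rough sketch; `live_snapshot` is a hypothetical name, and `client` is assumed to be a redis-py style client exposing `llen` and `exists`. It covers only what those keys reveal, so in-flight rows still come from the Runtime Fabric page.

```python
from typing import Any, Iterable


def live_snapshot(client: Any, pool_id: str, producer_ids: Iterable[str]) -> dict:
    """Collect pending depth and per-producer liveness from the Phase 1 keys."""
    snapshot: dict = {
        # A depth that keeps growing points at slow or missing dispatchers.
        "pending": client.llen(f"list:llm:requests:{pool_id}"),
        "producers": {},
    }
    for producer_id in producer_ids:
        snapshot["producers"][producer_id] = {
            # A missing liveness key means popped requests for this producer are dropped.
            "alive": bool(client.exists(f"key:llm:producer:{producer_id}:alive")),
            "unread_replies": client.llen(f"list:llm:replies:{producer_id}"),
        }
    return snapshot
```

Comparing two snapshots taken a few seconds apart distinguishes a queue that is draining slowly from one that is not draining at all.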

Current Design Gaps

The following gaps are Marie Dispatch responsibilities and should be designed as runtime features:

dispatch-layer circuit breaker around the configured backend URL
explicit retry attempts and retry outcome in UI/history
durable replay mode after dispatcher crash
clearer admission/backpressure UX beyond pending and in-flight counts
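
As an illustration of the first item, a dispatch-layer circuit breaker could look roughly like the sketch below. It is a generic consecutive-failure breaker with a cool-down, offered only as a starting point for design discussion; the class names, thresholds, and half-open behavior are all assumptions, and the sketch is not thread-safe.

```python
import time
from typing import Any, Callable, Optional


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 5,
        reset_after_s: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._failure_threshold = failure_threshold
        self._reset_after_s = reset_after_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_after_s:
                raise CircuitOpenError("backend circuit is open")
            # Cool-down elapsed: half-open, so one trial call is let through.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._failures += 1
            half_open = self._opened_at is not None
            if half_open or self._failures >= self._failure_threshold:
                self._opened_at = self._clock()  # open, or re-open after a failed trial
            raise
        # Any success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

Wrapping the single backend call in `breaker.call(...)` would let the dispatcher reject work quickly while the backend URL is unreachable, instead of holding requests in flight until each one times out.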

These are intentionally delegated to the downstream provider gateway when one is configured:

provider fallback chains
provider model routing by tenant, cost, latency, region, or capacity
provider budgets and rate limits