LLM Dispatch Runtime
The LLM Dispatch Runtime is Marie’s internal execution fabric for LLM calls that originate from executors. It gives operators live visibility into queued, in-flight, completed, failed, and dropped LLM work without making every executor call the model backend directly.
It is not a replacement for LiteLLM, OpenRouter, vLLM, or another OpenAI-compatible backend. Marie Dispatch handles executor ingress and request lifecycle. The configured backend handles provider routing.
Responsibility Split
Use this boundary when operating or debugging the system:
| Layer | Owner | Responsibility |
|---|---|---|
| Executor ingress | Marie Gateway / Runtime Fabric | Receive executor-originated LLM work and select the configured dispatch pool |
| Dispatch | Marie LLM Dispatch Runtime | Valkey request/reply transport, producer liveness, in-flight state, timeouts, drops, backpressure, and runtime health |
| Execution | OpenAI-compatible adapter | Send one chat.completions.create(...) call to the configured backend URL |
| Provider routing | LiteLLM, OpenRouter, vLLM, or hosted provider | Provider fallback, model routing, budgets, rate limits, and provider-specific policy |
| Observability | OTel and ClickHouse | Trace correlation, completed execution history, latency, tokens, and failure analytics |
If the backend URL points at LiteLLM or OpenRouter, provider fallback chains, cost/latency routing, budgets, and provider rate limits belong there. Marie Dispatch still owns dispatch-layer retry, timeout, circuit-breaker, durability, and backpressure semantics, although some of these are not yet implemented (see Current Design Gaps).
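The Execution row can be sketched as a thin wrapper around a single call. This is a minimal illustration, not Marie's actual adapter: the function name `execute_once`, its parameters, and the returned dictionary shape are assumptions. Only the `chat.completions.create(...)` call comes from the table above.

```python
from typing import Any


def execute_once(client: Any, model: str, messages: list[dict], timeout_s: float = 60.0) -> dict:
    """Send exactly one chat completion call and return a small summary.

    `client` is assumed to expose the OpenAI-compatible
    `chat.completions.create(...)` method. No provider fallback happens
    here; that stays with whatever sits behind the configured backend URL.
    """
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        timeout=timeout_s,  # per-request timeout handled by the client
    )
    choice = response.choices[0]
    return {
        "content": choice.message.content,
        "finish_reason": choice.finish_reason,
    }
```

With the official `openai` Python package, `client` would typically be built as `OpenAI(base_url=..., api_key=...)` from `OPENAI_API_BASE` and `OPENAI_API_KEY`. Passing the client in keeps the wrapper independent of which backend the URL points at.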
Request Flow
Executor
-> Marie Gateway / Runtime Fabric ingress
-> Valkey request queue
-> LLM Dispatch Runtime
-> OpenAI-compatible backend URL
-> provider gateway/backend routing
-> completion response
-> producer reply queue
-> executor reply pump

Phase 1 uses Valkey LISTs for live transport:
- list:llm:requests:{pool_id} stores pending requests.
- list:llm:replies:{producer_id} stores replies for a specific producer.
- key:llm:producer:{producer_id}:alive indicates that the producer can still receive replies.
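The key layout above can be captured in three small helpers. This is a sketch that assumes `{pool_id}` and `{producer_id}` are plain placeholders substituted into the key name; the function names are hypothetical and not part of Marie.

```python
def request_queue_key(pool_id: str) -> str:
    """LIST holding pending requests for one dispatch pool."""
    return f"list:llm:requests:{pool_id}"


def reply_queue_key(producer_id: str) -> str:
    """LIST holding replies addressed to one producer."""
    return f"list:llm:replies:{producer_id}"


def producer_alive_key(producer_id: str) -> str:
    """Liveness key: present while the producer can still receive replies."""
    return f"key:llm:producer:{producer_id}:alive"


if __name__ == "__main__":
    # "default" matches LLM_QUEUE_POOL_ID in the sample configuration;
    # "executor-1" is an illustrative producer id.
    print(request_queue_key("default"))
    print(reply_queue_key("executor-1"))
    print(producer_alive_key("executor-1"))
```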
Runtime Fabric Page
Open Infrastructure → LLM Dispatch to inspect the selected Runtime Fabric.
The page has two views:
- Live Requests shows pending requests still in Valkey and in-flight requests already popped by a dispatcher.
- Recent Executions reads completed dispatcher spans from ClickHouse otel.otel_traces.
Live state is intentionally separate from completed history. Valkey is transport and live state only. Completed request history comes from OTel/ClickHouse.
Best-Effort Semantics
Phase 1 is best-effort after a dispatcher pops a request:
- If a dispatcher crashes after BLPOP and before publishing a reply, that request may be lost.
- If the producer liveness key is gone before execution, the dispatcher drops the popped request.
- If the producer disappears after execution but before reply publish, the dispatcher drops the reply.
- Scheduler-level retry or resubmission remains above the dispatch runtime.
This is a deliberate Phase 1 tradeoff. Do not treat Valkey LIST state as a durable audit ledger.
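The drop rules above can be illustrated with one dispatcher iteration. This is a sketch under stated assumptions, not the actual dispatcher: the JSON payload with `producer_id` and `request_id` fields, the `dispatch_once` name, the outcome labels, and the use of RPUSH for replies are all hypothetical. `client` is assumed to be a redis-py style client offering `blpop`, `exists`, and `rpush`.

```python
import json
from typing import Any, Callable, Optional


def dispatch_once(client: Any, pool_id: str, execute: Callable[[dict], dict],
                  pop_timeout_s: int = 1) -> Optional[str]:
    """Pop and handle at most one request; return an outcome label.

    Returns None when the queue was empty, otherwise one of
    "dropped_before_execution", "dropped_reply", or "replied".
    """
    popped = client.blpop([f"list:llm:requests:{pool_id}"], timeout=pop_timeout_s)
    if popped is None:
        return None

    # After the pop, the request exists only in this process.
    # A crash anywhere below this line loses it (best-effort semantics).
    _key, raw = popped
    request = json.loads(raw)
    producer_id = request["producer_id"]  # assumed payload field
    alive_key = f"key:llm:producer:{producer_id}:alive"

    # Producer gone before execution: drop the popped request.
    if not client.exists(alive_key):
        return "dropped_before_execution"

    result = execute(request)

    # Producer gone after execution: drop the reply instead of publishing it.
    if not client.exists(alive_key):
        return "dropped_reply"

    reply = {"request_id": request.get("request_id"), "result": result}
    client.rpush(f"list:llm:replies:{producer_id}", json.dumps(reply))
    return "replied"
```

Nothing in this loop writes the popped request anywhere durable, which is why retry or resubmission has to come from the scheduler above it.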
Required Configuration
Gateway dispatch runtime:
LLM_QUEUE_ENABLED=true
LLM_QUEUE_VALKEY_URL=redis://marie-valkey:6379/0
LLM_QUEUE_POOL_ID=default
LLM_QUEUE_MAX_INLINE_PAYLOAD_BYTES=16777216
LLM_QUEUE_FABRIC_GROUP_ID=default
LLM_QUEUE_GATEWAY_ID=gateway-localhost
OPENAI_API_KEY=EMPTY
OPENAI_API_BASE=http://litellm:4000/v1

Executor producers:
LLM_QUEUE_ENABLED=true
LLM_QUEUE_VALKEY_URL=redis://marie-valkey:6379/0
LLM_QUEUE_POOL_ID=default
LLM_QUEUE_MAX_INLINE_PAYLOAD_BYTES=16777216

Processors do not need LLM_QUEUE_FABRIC_GROUP_ID or LLM_QUEUE_GATEWAY_ID unless they also run a dispatcher.
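A startup check for these variables might look like the sketch below. The variable names come from the listings above; the `LLMQueueConfig` class, the `load_llm_queue_config` function, the `dispatcher` flag, and the choice to treat missing values as errors are assumptions for illustration, not Marie's actual loader.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class LLMQueueConfig:
    enabled: bool
    valkey_url: str
    pool_id: str
    max_inline_payload_bytes: int
    fabric_group_id: Optional[str] = None
    gateway_id: Optional[str] = None


def load_llm_queue_config(env: Mapping[str, str] = os.environ,
                          dispatcher: bool = False) -> LLMQueueConfig:
    """Read LLM_QUEUE_* settings; identity vars are required only for dispatchers."""
    config = LLMQueueConfig(
        enabled=env.get("LLM_QUEUE_ENABLED", "false").strip().lower() == "true",
        valkey_url=env["LLM_QUEUE_VALKEY_URL"],  # KeyError if missing
        pool_id=env["LLM_QUEUE_POOL_ID"],
        # 16777216 in the sample configuration is 16 MiB.
        max_inline_payload_bytes=int(env["LLM_QUEUE_MAX_INLINE_PAYLOAD_BYTES"]),
        fabric_group_id=env.get("LLM_QUEUE_FABRIC_GROUP_ID"),
        gateway_id=env.get("LLM_QUEUE_GATEWAY_ID"),
    )
    if dispatcher and not (config.fabric_group_id and config.gateway_id):
        raise ValueError(
            "dispatcher needs LLM_QUEUE_FABRIC_GROUP_ID and LLM_QUEUE_GATEWAY_ID"
        )
    return config
```

Calling `load_llm_queue_config(dispatcher=True)` on the gateway and `load_llm_queue_config()` on executors would surface a missing identity variable at startup rather than later as a gap in completed history.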
What To Debug Where
| Symptom | First place to check |
|---|---|
| Pending count grows | Runtime Fabric live view and Valkey request queue depth |
| In-flight count sticks | Dispatcher health, backend latency, and in-flight request rows |
| Requests disappear without completion | Producer liveness drops or dispatcher crash-after-pop |
| Backend connection errors | Gateway dispatcher logs and backend URL configuration |
| Provider fallback did not happen | LiteLLM/OpenRouter/provider gateway configuration |
| Budget or rate-limit rejection | Downstream provider gateway logs and policy |
| Completed item missing from history | OTel exporter, ClickHouse otel.otel_traces, and Runtime Fabric identity env vars |
Current Design Gaps
These are Marie Dispatch responsibilities and should be designed as runtime features:
- dispatch-layer circuit breaker around the configured backend URL
- explicit retry attempts and retry outcome in UI/history
- durable replay mode after dispatcher crash
- clearer admission/backpressure UX beyond pending and in-flight counts
These are intentionally delegated to the downstream provider gateway when one is configured:
- provider fallback chains
- provider model routing by tenant, cost, latency, region, or capacity
- provider budgets and rate limits