12 — Model gateway — registry, routing & lifecycle.
The Model Gateway is the only path from the Reasoning Plane to a model. It is vendor-neutral by construction: the same agent and the same tool talk to a routing surface, not to a provider. Models fill typed slots called capability lanes; lanes are the unit of substitution. Above the gateway is a Model Fabric: a uniform pipeline that takes any candidate model (closed-source, approved open-source, or customer-owned) through benchmarking, optimisation, fine-tuning, grounding, and continuous monitoring before it ever serves traffic.

12.0 Reference: model fabric
The Model Fabric makes “which model do you use?” a configuration question, not a platform question. Any model that can pass the fabric’s gates can serve the substrate; any model that cannot, cannot. The fabric is the same for every provenance tier (§12.12) and for every region.

Fig. 12.0a: Model Fabric. Three model-source tiers feed into a uniform six-stage pipeline (Min Benchmarks → Use-case Optimisation → Fine-tuning → SOP Configuration → RAG Grounding → Production & Monitoring) before entering the Gateway.

12.1 Capability lanes
A capability lane is a typed slot with a declared SLO target, an approved-models list, and a routing policy. Lanes are stable across model generations; the mapping from lane to underlying model is what changes.

| Lane | Purpose | Default SLO p99 | Default approved providers (typical) |
|---|---|---|---|
| `reasoning.long` | Multi-step reasoning over long contexts | 15s | Frontier proprietary · OSS frontier |
| `reasoning.fast` | Short-horizon reasoning, classification | 2s | Frontier proprietary · OSS |
| `extract.text` | Structured extraction from text | 3s | Frontier · specialised SLM |
| `vlm.document` | Vision-language understanding of documents | 5s | Frontier VLM · specialised |
| `ocr.handwriting` | Handwriting and degraded scan OCR | 4s | Specialised OCR engines |
| `embedding.text` | Embeddings for retrieval | 500ms | Frontier · OSS |
| `embedding.multilingual` | Cross-language embeddings | 500ms | Frontier · OSS |
| `verify.classifier` | Adversarial probes · safety | 1s | Specialised classifiers |

The platform does not bind a tenant to any specific provider. Providers in the table
above are typical examples; the registry of approved providers is a tenant decision and
is reflected in the per-lane routing policy.
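
As a minimal sketch of the lane abstraction described above, the following Python models a lane as a typed slot whose model binding can change while the lane name and SLO stay fixed. The class and field names (`Lane`, `slo_p99_ms`, `approved_models`, `active_model`) and the model ids are hypothetical, chosen for illustration; they are not the platform's actual schema.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Lane:
    """A capability lane: a stable, typed slot (hypothetical schema)."""
    name: str
    slo_p99_ms: int
    approved_models: Set[str] = field(default_factory=set)
    active_model: Optional[str] = None

    def bind(self, model_id: str) -> None:
        # Only models on the lane's approved list may fill the slot.
        if model_id not in self.approved_models:
            raise PermissionError(f"{model_id} is not approved for {self.name}")
        self.active_model = model_id


# Illustrative model ids; real ids would come from the tenant's registry.
lane = Lane("extract.text", slo_p99_ms=3000,
            approved_models={"vendor-x/extract-v1", "oss/slm-extract-v2"})
lane.bind("vendor-x/extract-v1")
lane.bind("oss/slm-extract-v2")  # substitution: same lane, different model
```

Callers in this sketch only ever name the lane, so swapping the bound model does not change anything on the agent or tool side.
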
12.2 Approved model registry
Every model that any agent or tool can call must be present in the approved registry. The registry record is the contract.

12.3 Routing
For each lane, a routing policy chooses a model per call. The policy is declarative; the runtime resolves it.

12.4 Architecture
Fig. 12.1: Model gateway architecture. The router enforces approval, region, marking and cost; the lane decides the provider; eval & drift observe.

12.5 Asset separation
The gateway treats prompts, tools, retrieval corpora and fine-tunes as four separate, independently versioned, independently audited assets (§2.8). The gateway never accepts an inline prompt assembled at request time without a registered prompt id; un-pinned prompts are rejected.

12.6 Region pinning
Each tenant declares one or more regions; the gateway enforces that a call routes only to endpoints inside the declared regions. Cross-region routing requires an explicit, time-limited tenant configuration change; it cannot be requested on a per-call basis, and the router refuses any single call that would cross regions.

12.7 No-train policy
- Default: no customer data is used to train any model.
- Provider endpoints used by the platform are configured to disable training; the gateway refuses to use any endpoint that does not return a verifiable no-train signal.
- Opt-in is per (dataset, model lineage) tuple; opt-in is itself a typed `model.train.opt_in` AuditEvent on the tenant chain.
- Fine-tunes use only opt-in datasets; the dataset hash is part of the model registry record.
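
The opt-in rule above can be sketched as a lookup keyed by (dataset hash, model lineage). This is an illustrative sketch only: the function names, the lineage labels, and the choice of SHA-256 for the dataset hash are assumptions, not the platform's documented behaviour.

```python
import hashlib
from typing import Iterable, Set, Tuple

# Opt-in ledger keyed by (dataset_hash, model_lineage); names are hypothetical.
OptInKey = Tuple[str, str]


def dataset_hash(content: bytes) -> str:
    """SHA-256 of the dataset bytes; the hash algorithm is an assumption."""
    return hashlib.sha256(content).hexdigest()


def may_fine_tune(opt_ins: Set[OptInKey], lineage: str,
                  dataset_hashes: Iterable[str]) -> bool:
    """Default is no-train: every dataset needs an opt-in for this lineage."""
    hashes = list(dataset_hashes)
    # An empty dataset list is rejected rather than vacuously allowed.
    return bool(hashes) and all((h, lineage) in opt_ins for h in hashes)


claims = dataset_hash(b"claims-2023 export")
notes = dataset_hash(b"adjuster notes")
opt_ins = {(claims, "lineage-a")}  # opted in for one lineage only

print(may_fine_tune(opt_ins, "lineage-a", [claims]))         # True
print(may_fine_tune(opt_ins, "lineage-b", [claims]))         # False
print(may_fine_tune(opt_ins, "lineage-a", [claims, notes]))  # False
```

Keying on the tuple rather than on the dataset alone means an opt-in granted for one model lineage does not carry over to another.
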
12.8 Promotion / demotion lifecycle
A model traverses a fixed lifecycle. Each transition is a typed AuditEvent.

Fig. 12.2: Model lifecycle. Movement is governed by eval & drift signals (§13).

12.9 Cost & latency tradeoffs
Each lane has a per-call cost and a budget. The router can downgrade to a smaller model when budget pressure rises, only if the lane’s eval policy allows it. Downgrades are recorded; downgrade rate is a watched metric.

12.10 BYO and on-prem models
Tenants can register their own model endpoints (BYO) or run self-hosted OSS models on-prem. Both must:

- Pass the approved-model registry’s eval requirement.
- Expose a no-train signal at runtime.
- Be addressable inside the tenant’s region pin.
- Be reachable through the gateway’s standard interface (no direct calls from agents).
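
The four requirements above can be read as an admission check. The sketch below is one hedged way to express it; `ByoEndpoint`, its fields, the failure labels, and the region names are all hypothetical and stand in for whatever the registry actually records.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ByoEndpoint:
    """Facts a gateway would need about a BYO endpoint (hypothetical fields)."""
    model_id: str
    passed_eval: bool       # approved-model registry eval requirement
    no_train_signal: bool   # verifiable no-train signal at runtime
    region: str             # where the endpoint is hosted
    via_gateway: bool       # reachable only through the gateway interface


def admission_failures(ep: ByoEndpoint, tenant_regions: List[str]) -> List[str]:
    """Return the unmet requirements; an empty list means admissible."""
    failures = []
    if not ep.passed_eval:
        failures.append("eval")
    if not ep.no_train_signal:
        failures.append("no-train")
    if ep.region not in tenant_regions:
        failures.append("region")
    if not ep.via_gateway:
        failures.append("interface")
    return failures


ep = ByoEndpoint("byo/underwriting-llm", True, True, "eu-west", True)
print(admission_failures(ep, ["eu-west"]))  # []
print(admission_failures(ep, ["us-east"]))  # ['region']
```

Returning the full list of failures, rather than stopping at the first, makes it easier to report everything a tenant has to fix before re-registering.
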
12.11 Model Deployment Lifecycle
Every candidate model entering the substrate, whether a frontier closed-source release, an approved OSS checkpoint, or a customer-owned fine-tune, traverses the same six-stage lifecycle. The lifecycle is the platform’s mechanism for converting “the next great model” into “a model we can rely on for production insurance work.” Stages are gated, audited, and reversible.

Stage 1 · Minimum Benchmarks

- Standardised pack of capability tests per lane (§12.1): correctness, format compliance, safety probes, latency, cost.
- A model that fails the minimum threshold for a lane cannot be admitted to that lane — not even as a shadow.
- Benchmarks are versioned; a benchmark version bump can demote a previously-passing model.

Stage 2 · Use-case Optimisation

- Per-tenant / per-LOB / per-workflow eval suites (§13).
- Prompt revision selection, parameter selection (temperature bounds, top-p, max-tokens).
- Tool-use shaping: which tools the model can call, with what budget.
- Output: a tuned configuration registered as a separate prompt-rev / parameter-set, not a new model.

Stage 3 · Fine-tuning

- Only where eval evidence justifies fine-tuning over prompt-rev tuning.
- Only on opt-in datasets (§12.7); dataset hash is part of the registry record.
- Evaluated against the original model’s golden suite plus targeted regressions.
- Fine-tunes never replace base models silently; they enter as separate registry entries.

Stage 4 · SOP Configuration

- Standard-operating-procedure rules attach to the model’s lane usage: which scopes, which markings, which approval thresholds, which kill-switch rules (§15.11).
- Compliance & Control Layer (§15.10) bindings — PII / screening, QA model attachment, confidence scoring — are configured here.
- Output: an SOP record paired with the model registry id.

Stage 5 · RAG Grounding

- The model is paired with the appropriate retrieval snapshots (§8.13): per-tenant indexes, per-LOB code lists, per-region knowledge bases.
- Grounding evals run: does the model cite EvidenceSpans correctly under retrieval? Does it refuse to fabricate when retrieval is empty?
- Output: a grounded configuration that the gateway can route traffic to.

Stage 6 · Production & Monitoring

- Promotion from `shadow` to `enabled` via the lifecycle in §12.8.
- Drift sigma watched continuously (§13); a breach demotes the model automatically.
- Cost / latency / downgrade-rate watched as SLOs (§20).
- Incident handling and rollback per §21.
An agent cannot reach a model that has not traversed all six stages. There is no
“experimental endpoint” exposed in production. Stage 1 is where models are
candidates; Stage 6 is where they are citizens.
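
The gating described in this section can be sketched as a small state tracker. This is an illustrative sketch, not the platform's implementation: the `Candidate` class and its methods are hypothetical, a stage that does not apply to a model (for example, fine-tuning) is assumed to be recorded as passed without changes, and "passing" Stage 6 is taken to mean the model has been promoted to enabled.

```python
from enum import IntEnum


class Stage(IntEnum):
    MIN_BENCHMARKS = 1
    USE_CASE_OPTIMISATION = 2
    FINE_TUNING = 3
    SOP_CONFIGURATION = 4
    RAG_GROUNDING = 5
    PRODUCTION_MONITORING = 6


class Candidate:
    """Tracks how far a candidate model has progressed (hypothetical class)."""

    def __init__(self, model_id: str) -> None:
        self.model_id = model_id
        self.cleared = 0      # highest stage passed so far
        self.audit_log = []   # one entry per transition

    def pass_stage(self, stage: Stage) -> None:
        # Stages are gated: a stage can only follow the one before it.
        if stage != self.cleared + 1:
            raise ValueError(f"{stage.name} attempted out of order")
        self.cleared = int(stage)
        self.audit_log.append(("passed", stage.name))

    def demote_to(self, stage: Stage) -> None:
        # Stages are reversible, e.g. after a benchmark version bump.
        self.cleared = min(self.cleared, int(stage) - 1)
        self.audit_log.append(("demoted", stage.name))

    @property
    def routable(self) -> bool:
        return self.cleared == Stage.PRODUCTION_MONITORING


m = Candidate("oss/checkpoint-7")
for s in Stage:
    m.pass_stage(s)
print(m.routable)  # True
m.demote_to(Stage.MIN_BENCHMARKS)
print(m.routable)  # False
```

Demotion in this sketch sends the model back to redo the named stage and everything after it, which is one simple way to honour "gated, audited, and reversible".
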
12.12 Model Ecosystem — provenance tiers
The substrate accepts three model-source tiers and treats them uniformly through the gateway. From a tenant’s standpoint, the tier choice is governance and economics, not architecture.

Closed-source / Frontier
Tier A — Frontier reasoning, long-context, high-capability lanes. Vendor-managed; tenant
contracts directly with the vendor or via a cloud-hosted enterprise endpoint with
no-train guarantees. Fastest path to top-of-lane capability; subject to vendor
pricing and rate limits.
Open-Source Approved
Tier B — Vetted OSS check-points hosted in the tenant’s region (cloud or on-prem). Lower
marginal cost; full control over endpoint and weights. Required for sovereign-cloud
deployments and for tenants with strict data-sovereignty requirements. Capability
tracks the OSS frontier.
Proprietary & Customer-Owned
Tier C — Customer-owned base models (e.g. a carrier’s internally-trained underwriting LLM) or
fine-tunes of Tier A / Tier B models on opt-in datasets. The carrier owns the
weights; Layerup operates the endpoint inside the gateway. Eval and drift gates
apply identically.

For example, a tenant can map `reasoning.long` to Tier A, `extract.text` to a fine-tune (Tier C), and `embedding.text` to Tier B, and keep Tier A as a fallback under cost pressure. The router (§12.3) is per lane; each lane can sit in any tier.
12.13 Customer-Owned Model Endpoint pattern
For Tier C, the substrate supports a specific deployment pattern: a customer-owned base model or fine-tune, deployed on infrastructure the carrier controls, exposed to the Model Gateway through a standard interface.

Properties

- Weight ownership. The carrier owns the weights; Layerup never moves them off carrier-controlled infrastructure.
- Endpoint hosting. The endpoint runs in the carrier’s VPC, in the carrier’s sovereign cloud, or on the carrier’s on-prem GPU fleet, depending on topology (§19).
- Gateway integration. The endpoint is registered as a model in §12.2 with provider `byo`; agents and tools never call it directly.
- Eval & drift gates. The same eval suites and drift sigma apply as for Tier A and Tier B; passing is mandatory for production routing.
- No-train signal. The endpoint must report a verifiable no-train signal per §12.7; the gateway refuses to use endpoints that do not.
- Region pinning. The endpoint inherits the carrier’s region pin and cannot serve calls from outside it.
- Audit. Every call records the model registry id, prompt rev, retrieval snap, parameter set, and seed — identical to Tier A and Tier B.
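
The audit property above lists five per-call fields. A minimal sketch of such a record follows; the `CallAudit` class, its field names, the example ids, and the JSON serialisation are assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class CallAudit:
    """Per-call audit fields listed above; field names are hypothetical."""
    model_registry_id: str
    prompt_rev: str
    retrieval_snap: str
    parameter_set: str
    seed: int


def to_audit_json(record: CallAudit) -> str:
    # Sorted keys give a stable serialisation, convenient for hashing or diffing.
    return json.dumps(asdict(record), sort_keys=True)


tier_c = CallAudit("byo/underwriting-llm@3", "prompt-rev-12",
                   "snap-2024-05", "params-default", seed=7)
tier_a = CallAudit("vendor-x/frontier@1", "prompt-rev-12",
                   "snap-2024-05", "params-default", seed=7)

# Same schema regardless of tier: the key sets are identical.
keys_c = set(json.loads(to_audit_json(tier_c)))
keys_a = set(json.loads(to_audit_json(tier_a)))
print(keys_c == keys_a)  # True
```

Using one record type for every tier is the point of the sketch: a Tier C call leaves the same shape of evidence as a Tier A or Tier B call.
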

