12 — Model gateway — registry, routing & lifecycle.

The Model Gateway is the only path from the Reasoning Plane to a model. It is vendor-neutral by construction: the same agent and the same tool talk to a routing surface, not to a provider. Models occupy typed slots called capability lanes; the lane is the unit of substitution. Above the gateway sits the Model Fabric: a uniform pipeline that takes any candidate model (closed-source, approved open-source, or customer-owned) through benchmarking, optimisation, fine-tuning, SOP configuration, grounding, and continuous monitoring before it ever serves traffic.

12.0 Reference — model fabric

The Model Fabric makes “which model do you use?” a configuration question, not a platform question. Any model that can pass the fabric’s gates can serve the substrate; any model that cannot, cannot. The fabric is the same for every provenance tier (§12.12) and for every region.

Fig. 12.0a: Model Fabric. Three model-source tiers feed into a uniform six-stage pipeline (Min Benchmarks → Use-case Optimisation → Fine-tuning → SOP Configuration → RAG Grounding → Production & Monitoring) before entering the Gateway.

12.1 Capability lanes

A capability lane is a typed slot with a declared SLO target, an approved-models list, and a routing policy. Lanes are stable across model generations; the mapping from lane to underlying model is what changes.

| Lane | Purpose | Default SLO p99 | Default approved providers (typical) |
| --- | --- | --- | --- |
| reasoning.long | Multi-step reasoning over long contexts | 15s | Frontier proprietary · OSS frontier |
| reasoning.fast | Short-horizon reasoning, classification | 2s | Frontier proprietary · OSS |
| extract.text | Structured extraction from text | 3s | Frontier · specialised SLM |
| vlm.document | Vision-language understanding of documents | 5s | Frontier VLM · specialised |
| ocr.handwriting | Handwriting and degraded scan OCR | 4s | Specialised OCR engines |
| embedding.text | Embeddings for retrieval | 500ms | Frontier · OSS |
| embedding.multilingual | Cross-language embeddings | 500ms | Frontier · OSS |
| verify.classifier | Adversarial probes · safety | 1s | Specialised classifiers |

The platform does not bind a tenant to any specific provider. Providers in the table above are typical examples; the registry of approved providers is a tenant decision and is reflected in the per-lane routing policy.
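
As a rough illustration of the lane-as-typed-slot idea, the sketch below models a lane in Python. The class `CapabilityLane`, its fields, and the `bind` method are hypothetical names chosen for this example and are not the platform's API; the model ids are borrowed from the routing example in §12.3.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CapabilityLane:
    """A typed slot: the name and SLO are stable, the bound model can change."""
    name: str                                   # e.g. "reasoning.long"
    slo_p99_seconds: float                      # declared latency target
    approved_models: List[str] = field(default_factory=list)
    active_model: Optional[str] = None

    def bind(self, model_id: str) -> None:
        """Point the lane at a model; only approved models are accepted."""
        if model_id not in self.approved_models:
            raise ValueError(f"{model_id} is not approved for lane {self.name}")
        self.active_model = model_id


# Illustrative values only.
lane = CapabilityLane(
    name="reasoning.long",
    slo_p99_seconds=15.0,
    approved_models=["model.acme.frontier-r1", "model.acme.frontier-r2"],
)
lane.bind("model.acme.frontier-r1")
lane.bind("model.acme.frontier-r2")   # the model generation changes, the lane does not
```

Callers in this sketch only ever reference `lane.name`, which is what would let the underlying model be swapped without touching agents or tools.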

12.2 Approved model registry

Every model that any agent or tool can call must be present in the approved registry. The registry record is the contract.
id:           model.<namespace>.<name>
provider:     proprietary | oss | byo
endpoint:     <tenant-private endpoint>
lanes:        [reasoning.long, reasoning.fast, ]
region:       eu-central-1 | us-east-1 | sovereign-zone-a | onprem-rack-1
parameters:
  default:    { temperature: 0.0, top_p: 1, max_tokens: 4096 }
  bounds:     { temperature: [0.0, 0.4] }
fineTune:
  base:       <base-model-id>
  datasetHash: <sha256>
  approved:   true | false
trainingPolicy:
  customerData:    none-by-default
  optInDatasets:   [ <dataset-ids> ]
evals:
  required:   [ pack.lane.<lane>.<version> ]
  passing:    { score_min: 0.85, regression_max: 0.0 }
release:
  state:      enabled | shadow | demoted | retired
  promotedAt: 2026-04-12T08:00:00Z
  rollbackTo: model.<previous-id>
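
The `evals.passing` and `parameters.bounds` fields of the record lend themselves to a small check. The following is a minimal Python sketch under two assumptions that the record itself does not state: that `regression` is a non-negative drop measured against the previous model, and that an out-of-bounds temperature is clamped rather than rejected. The function and class names are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class EvalGate:
    score_min: float = 0.85        # evals.passing.score_min
    regression_max: float = 0.0    # evals.passing.regression_max


def passes_gate(gate: EvalGate, score: float, regression: float) -> bool:
    """True when the score meets the floor and the regression stays within bounds."""
    return score >= gate.score_min and regression <= gate.regression_max


def clamp_temperature(requested: float,
                      bounds: Tuple[float, float] = (0.0, 0.4)) -> float:
    """Keep a requested temperature inside parameters.bounds.temperature."""
    low, high = bounds
    return min(max(requested, low), high)


gate = EvalGate()
print(passes_gate(gate, score=0.90, regression=0.0))
print(clamp_temperature(0.9))
```

With `regression_max` at 0.0, any measurable regression fails the gate in this sketch, even when the absolute score is above the floor.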

12.3 Routing

For each lane, a routing policy chooses a model per call. The policy is declarative; the runtime resolves it.
lane: reasoning.long
strategy:
  primary:  model.acme.frontier-r2
  fallback: model.acme.frontier-r1
  shadow:   model.opencorp.oss-r3       # called for eval; result discarded
  shadow_sample_rate: 0.10
filters:
  - require: marking.allows(input)
  - require: region.matches(tenant.region_pin)
  - require: cost.under(budget.remaining)
  - reject:  customerData.train == true   # never route data to a training endpoint
oncall:
  primary_unavailable:
    after: PT2S
    do:    fallback
  drift_breach:
    state: shadow
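
One way the runtime might resolve such a policy is sketched below in Python. This is an illustration, not the gateway's implementation: `Endpoint`, `Policy`, `passes_filters`, and `route` are hypothetical names, the `available` flag stands in for the PT2S timeout, and `marking_ok` stands in for `marking.allows(input)`.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class Endpoint:
    region: str
    cost_per_call: float
    marking_ok: bool = True                  # stand-in for marking.allows(input)
    trains_on_customer_data: bool = False
    available: bool = True                   # stand-in for the PT2S timeout


@dataclass
class Policy:
    primary: str
    fallback: str
    shadow: Optional[str] = None
    shadow_sample_rate: float = 0.0


def passes_filters(ep: Endpoint, region_pin: str, budget_remaining: float) -> bool:
    """Apply the four policy filters to one candidate endpoint."""
    return (
        ep.marking_ok
        and ep.region == region_pin
        and ep.cost_per_call < budget_remaining
        and not ep.trains_on_customer_data
    )


def route(policy: Policy, endpoints: Dict[str, Endpoint], region_pin: str,
          budget_remaining: float,
          rng: Callable[[], float] = random.random) -> Tuple[str, Optional[str]]:
    """Return (serving model id, shadow model id or None) for one call."""
    def usable(model_id: str) -> bool:
        ep = endpoints.get(model_id)
        return (ep is not None and ep.available
                and passes_filters(ep, region_pin, budget_remaining))

    # Try the primary first, then the fallback.
    serving = next((m for m in (policy.primary, policy.fallback) if usable(m)), None)
    if serving is None:
        raise RuntimeError("no admissible model for this call")

    # The shadow model is sampled; its result would be discarded by the caller.
    shadow = None
    if (policy.shadow is not None and usable(policy.shadow)
            and rng() < policy.shadow_sample_rate):
        shadow = policy.shadow
    return serving, shadow


endpoints = {
    "model.acme.frontier-r2": Endpoint("eu-central-1", 0.04),
    "model.acme.frontier-r1": Endpoint("eu-central-1", 0.02),
    "model.opencorp.oss-r3": Endpoint("eu-central-1", 0.01),
}
policy = Policy("model.acme.frontier-r2", "model.acme.frontier-r1",
                "model.opencorp.oss-r3", 0.10)
print(route(policy, endpoints, "eu-central-1", budget_remaining=1.0))
```

The random source is injectable so that shadow sampling can be made deterministic in tests.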

12.4 Architecture

Fig. 12.1 — Model gateway architecture. The router enforces approval, region, marking and cost; the lane decides the provider; eval & drift observe.

12.5 Asset separation

The gateway treats prompts, tools, retrieval corpora and fine-tunes as four separate, independently versioned, independently audited assets (§2.8). The gateway never accepts an inline prompt assembled at request time without a registered prompt id; un-pinned prompts are rejected.
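
The rejection of unpinned prompts can be pictured with a short Python sketch. Everything named here (`PromptRef`, `PROMPTS`, `resolve_prompt`, the prompt id and revision) is hypothetical and only illustrates the rule that a request must reference a registered prompt rather than carry prompt text inline.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


class UnpinnedPromptError(Exception):
    """Raised when a request does not reference a registered prompt."""


@dataclass(frozen=True)
class PromptRef:
    prompt_id: Optional[str]
    rev: Optional[str]


# Hypothetical registry: (prompt id, revision) -> template text.
PROMPTS: Dict[Tuple[str, str], str] = {
    ("prompt.claims.summary", "r3"): "Summarise the claim: {claim_text}",
}


def resolve_prompt(ref: PromptRef, variables: Dict[str, str]) -> str:
    """Render a registered prompt; refuse anything that is not pinned."""
    if ref.prompt_id is None or ref.rev is None:
        raise UnpinnedPromptError("inline or unversioned prompts are rejected")
    template = PROMPTS.get((ref.prompt_id, ref.rev))
    if template is None:
        raise UnpinnedPromptError(f"{ref.prompt_id}@{ref.rev} is not registered")
    return template.format(**variables)


print(resolve_prompt(PromptRef("prompt.claims.summary", "r3"),
                     {"claim_text": "water damage"}))
```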

12.6 Region pinning

Each tenant declares one or more regions; the gateway enforces that a call routes only to endpoints inside the declared regions. Cross-region routing requires an explicit, time-limited change to the tenant configuration; the router never grants a cross-region exception for an individual call.
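
A minimal Python sketch of this check follows. It assumes the time-limited configuration change can be represented as one extra region with an expiry timestamp; `RegionPin` and its fields are hypothetical names, and the function deliberately takes no per-call override argument.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class RegionPin:
    regions: FrozenSet[str]
    # Tenant-level, time-limited grant (assumed shape); not settable per call.
    extra_region: Optional[str] = None
    extra_expires_at: Optional[datetime] = None

    def allows(self, endpoint_region: str, now: datetime) -> bool:
        """True if the endpoint is in a pinned region or an unexpired grant."""
        if endpoint_region in self.regions:
            return True
        return (
            self.extra_region == endpoint_region
            and self.extra_expires_at is not None
            and now < self.extra_expires_at
        )


now = datetime(2026, 4, 12, 8, 0, tzinfo=timezone.utc)
pin = RegionPin(frozenset({"eu-central-1"}))
granted = RegionPin(frozenset({"eu-central-1"}), "us-east-1", now + timedelta(hours=1))
print(pin.allows("us-east-1", now), granted.allows("us-east-1", now))
```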

12.7 No-train policy

  • Default: no customer data is used to train any model.
  • Provider endpoints used by the platform are configured to disable training; the gateway refuses to use any endpoint that does not return a verifiable no-train signal.
  • Opt-in is per (dataset, model lineage) tuple; each opt-in is itself recorded as a typed model.train.opt_in AuditEvent on the tenant chain.
  • Fine-tunes use only opt-in datasets; the dataset hash is part of the model registry record.
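
The rules above can be sketched in Python as a default-deny policy keyed on the (dataset, model lineage) tuple. The class `TrainingPolicy`, the audit record shape, and `endpoint_usable` are assumptions made for illustration; only the event type `model.train.opt_in` comes from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class TrainingPolicy:
    # Empty by default: no customer data is used for training.
    opt_ins: Set[Tuple[str, str]] = field(default_factory=set)
    audit_chain: List[Dict[str, str]] = field(default_factory=list)

    def opt_in(self, dataset_id: str, lineage: str) -> None:
        """Record an opt-in for one (dataset, model lineage) pair."""
        self.opt_ins.add((dataset_id, lineage))
        self.audit_chain.append(
            {"type": "model.train.opt_in", "dataset": dataset_id, "lineage": lineage}
        )

    def may_train(self, dataset_id: str, lineage: str) -> bool:
        return (dataset_id, lineage) in self.opt_ins


def endpoint_usable(no_train_signal: Optional[bool]) -> bool:
    """A missing or unverifiable signal is treated the same as a failed one."""
    return no_train_signal is True


policy = TrainingPolicy()
policy.opt_in("ds-claims-2025", "lineage-a")
print(policy.may_train("ds-claims-2025", "lineage-a"),
      policy.may_train("ds-claims-2025", "lineage-b"))
```

Because the key is the full tuple, opting a dataset in for one model lineage does not make it available to another.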

12.8 Promotion / demotion lifecycle

A model traverses a fixed lifecycle. Each transition is a typed AuditEvent.

Fig. 12.2: Model lifecycle. Movement is governed by eval & drift signals (§13).
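
A lifecycle of this kind is commonly implemented as a small state machine. The Python sketch below uses the four release states from the registry record (§12.2); the set of allowed transitions and the event type name are assumptions, since the figure's edges are not reproduced in the text.

```python
from datetime import datetime, timezone
from typing import Dict, List, Set

# Assumed transition table; the authoritative edges are those in Fig. 12.2.
ALLOWED: Dict[str, Set[str]] = {
    "shadow": {"enabled", "demoted", "retired"},
    "enabled": {"shadow", "demoted", "retired"},
    "demoted": {"shadow", "retired"},
    "retired": set(),
}


def transition(model_id: str, current: str, target: str,
               audit_log: List[dict]) -> str:
    """Move a model to a new state and append one audit record."""
    if target not in ALLOWED.get(current, set()):
        raise ValueError(f"{current} -> {target} is not an allowed transition")
    audit_log.append({
        "type": "model.lifecycle.transition",   # hypothetical event type
        "model": model_id,
        "from": current,
        "to": target,
        "at": datetime.now(timezone.utc).isoformat(),
    })
    return target


log: List[dict] = []
state = transition("model.acme.frontier-r2", "shadow", "enabled", log)
state = transition("model.acme.frontier-r2", state, "shadow", log)  # e.g. drift breach
print(state, len(log))
```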

12.9 Cost & latency tradeoffs

Each lane has a per-call cost and a budget. When budget pressure rises, the router can downgrade to a smaller model, but only if the lane’s eval policy allows it. Downgrades are recorded; downgrade rate is a watched metric.
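
The downgrade rule can be sketched in Python as follows. The notion of a `pressure_threshold` is an assumption introduced here to make "budget pressure" concrete; `LaneBudget` and its fields are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class LaneBudget:
    budget_remaining: float
    pressure_threshold: float     # assumed: pressure starts below this amount
    downgrade_allowed: bool       # taken from the lane's eval policy
    calls: int = 0
    downgrades: int = 0

    def choose(self, default_model: str, smaller_model: str) -> str:
        """Pick a model for one call and record whether it was a downgrade."""
        self.calls += 1
        under_pressure = self.budget_remaining < self.pressure_threshold
        if under_pressure and self.downgrade_allowed:
            self.downgrades += 1
            return smaller_model
        return default_model

    @property
    def downgrade_rate(self) -> float:
        return self.downgrades / self.calls if self.calls else 0.0


lane = LaneBudget(budget_remaining=5.0, pressure_threshold=10.0, downgrade_allowed=True)
print(lane.choose("model.acme.frontier-r2", "model.acme.frontier-r1"), lane.downgrade_rate)
```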

12.10 BYO and on-prem models

Tenants can register their own model endpoints (BYO) or run self-hosted OSS models on-prem. Both must:
  • Pass the approved-model registry’s eval requirement.
  • Expose a no-train signal at runtime.
  • Be addressable inside the tenant’s region pin.
  • Be reachable through the gateway’s standard interface (no direct calls from agents).
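
The four requirements above amount to an admission checklist. A minimal Python sketch, with hypothetical names (`ByoCandidate`, `admission_failures`) and boolean fields standing in for checks the platform would actually perform:

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class ByoCandidate:
    model_id: str
    region: str
    eval_passed: bool          # required eval packs from the registry record
    no_train_signal: bool      # verified at runtime
    gateway_interface: bool    # reachable through the gateway's standard interface


def admission_failures(c: ByoCandidate, region_pin: FrozenSet[str]) -> List[str]:
    """Return the unmet requirements; an empty list means the endpoint is admissible."""
    failures = []
    if not c.eval_passed:
        failures.append("eval requirement not met")
    if not c.no_train_signal:
        failures.append("no verifiable no-train signal")
    if c.region not in region_pin:
        failures.append("endpoint outside the tenant's region pin")
    if not c.gateway_interface:
        failures.append("not reachable through the gateway interface")
    return failures


candidate = ByoCandidate("model.carrier.uw-llm", "onprem-rack-1", True, True, True)
print(admission_failures(candidate, frozenset({"onprem-rack-1"})))
```

Returning the full list of failures, rather than stopping at the first, is a design choice that makes a rejected registration easier to diagnose.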

12.11 Model Deployment Lifecycle

Every candidate model entering the substrate, whether a frontier closed-source release, an approved OSS checkpoint, or a customer-owned fine-tune, traverses the same six-stage lifecycle. The lifecycle is the platform’s mechanism for converting “the next great model” into “a model we can rely on for production insurance work.” Stages are gated, audited, and reversible.

Stage 1 · Minimum Benchmarks
  • Standardised pack of capability tests per lane (§12.1): correctness, format compliance, safety probes, latency, cost.
  • A model that fails the minimum threshold for a lane cannot be admitted to that lane — not even as a shadow.
  • Benchmarks are versioned; a benchmark version bump can demote a previously-passing model.
Stage 2 · Use-case Optimisation
  • Per-tenant / per-LOB / per-workflow eval suites (§13).
  • Prompt revision selection, parameter selection (temperature bounds, top-p, max-tokens).
  • Tool-use shaping: which tools the model can call, with what budget.
  • Output: a tuned configuration registered as a separate prompt-rev / parameter-set, not a new model.
Stage 3 · Fine-Tuning (optional, gated)
  • Only where eval evidence justifies fine-tuning over prompt-rev tuning.
  • Only on opt-in datasets (§12.7); dataset hash is part of the registry record.
  • Evaluated against the original model’s golden suite plus targeted regressions.
  • Fine-tunes never replace base models silently; they enter as separate registry entries.
Stage 4 · SOP Configuration
  • Standard-operating-procedure rules attach to the model’s lane usage: which scopes, which markings, which approval thresholds, which kill-switch rules (§15.11).
  • Compliance & Control Layer (§15.10) bindings — PII / screening, QA model attachment, confidence scoring — are configured here.
  • Output: an SOP record paired with the model registry id.
Stage 5 · RAG Grounding
  • The model is paired with the appropriate retrieval snapshots (§8.13): per-tenant indexes, per-LOB code lists, per-region knowledge bases.
  • Grounding evals run: does the model cite EvidenceSpans correctly under retrieval? Does it refuse to fabricate when retrieval is empty?
  • Output: a grounded configuration that the gateway can route traffic to.
Stage 6 · Production & Monitoring
  • Promotion from shadow to enabled via the lifecycle in §12.8.
  • Drift sigma watched continuously (§13); breach demotes the model automatically.
  • Cost / latency / downgrade-rate watched as SLOs (§20).
  • Incident handling and rollback per §21.
An agent cannot reach a model that has not traversed all six stages. There is no “experimental endpoint” exposed in production. Stage 1 is where models are candidates; Stage 6 is where they are citizens.
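
The gating described above can be sketched as an ordered progress record. In this Python illustration the optional Stage 3 is treated as traversed when it is explicitly recorded as "skipped"; that interpretation, and the names `Stage`, `record_stage`, and `routable`, are assumptions rather than platform behaviour.

```python
from enum import IntEnum
from typing import Dict


class Stage(IntEnum):
    MINIMUM_BENCHMARKS = 1
    USE_CASE_OPTIMISATION = 2
    FINE_TUNING = 3
    SOP_CONFIGURATION = 4
    RAG_GROUNDING = 5
    PRODUCTION_MONITORING = 6


OPTIONAL_STAGES = {Stage.FINE_TUNING}


def record_stage(progress: Dict[Stage, str], stage: Stage, outcome: str) -> None:
    """Record one stage outcome; stages must be recorded strictly in order."""
    if outcome not in ("passed", "skipped"):
        raise ValueError(f"unknown outcome: {outcome}")
    if len(progress) >= len(Stage):
        raise ValueError("all six stages are already recorded")
    expected = Stage(len(progress) + 1)
    if stage != expected:
        raise ValueError(f"expected {expected.name}, got {stage.name}")
    if outcome == "skipped" and stage not in OPTIONAL_STAGES:
        raise ValueError(f"{stage.name} cannot be skipped")
    progress[stage] = outcome


def routable(progress: Dict[Stage, str]) -> bool:
    """A model is reachable by agents only after all six stages are recorded."""
    return len(progress) == len(Stage)


progress: Dict[Stage, str] = {}
for stage in Stage:
    record_stage(progress, stage, "skipped" if stage in OPTIONAL_STAGES else "passed")
print(routable(progress))
```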

12.12 Model Ecosystem — provenance tiers

The substrate accepts three model-source tiers and treats them uniformly through the gateway. From a tenant’s standpoint, the tier choice is governance and economics, not architecture.

Closed-source / Frontier

Tier A: Frontier reasoning, long-context, high-capability lanes. Vendor-managed; tenant contracts directly with the vendor or via a cloud-hosted enterprise endpoint with no-train guarantees. Fastest path to top-of-lane capability; subject to vendor pricing and rate limits.

Open-Source Approved

Tier B: Vetted OSS checkpoints hosted in the tenant’s region (cloud or on-prem). Lower marginal cost; full control over endpoint and weights. Required for sovereign-cloud deployments and for tenants with strict data-sovereignty requirements. Capability tracks the OSS frontier.

Proprietary & Customer-Owned

Tier C: Customer-owned base models (e.g. a carrier’s internally-trained underwriting LLM) or fine-tunes of Tier A / Tier B models on opt-in datasets. The carrier owns the weights; Layerup operates the endpoint inside the gateway. Eval and drift gates apply identically.

Tier mixing is the norm, not the exception. A typical production deployment routes reasoning.long to Tier A, extract.text to a fine-tune (Tier C), embedding.text to Tier B, and keeps Tier A as a fallback under cost pressure. The router (§12.3) is per lane; each lane can sit in any tier.

12.13 Customer-Owned Model Endpoint pattern

For Tier C, the substrate supports a specific deployment pattern: a customer-owned base model or fine-tune, deployed on infrastructure the carrier controls, exposed to the Model Gateway through a standard interface.

Properties
  • Weight ownership. The carrier owns the weights; Layerup never moves them off carrier-controlled infrastructure.
  • Endpoint hosting. The endpoint runs in the carrier’s VPC, in the carrier’s sovereign cloud, or on the carrier’s on-prem GPU fleet, depending on topology (§19).
  • Gateway integration. The endpoint is registered as a model in §12.2 with provider byo; agents and tools never call it directly.
  • Eval & drift gates. The same eval suites and drift sigma apply as for Tier A and Tier B; passing is mandatory for production routing.
  • No-train signal. The endpoint must report a verifiable no-train signal per §12.7; the gateway refuses to use endpoints that do not.
  • Region pinning. The endpoint inherits the carrier’s region pin and cannot serve calls from outside it.
  • Audit. Every call records the model registry id, prompt rev, retrieval snap, parameter set, and seed — identical to Tier A and Tier B.
From the agent’s perspective, a Tier C endpoint is indistinguishable from a Tier A frontier endpoint — that is precisely the point. The substrate decouples agent authoring from model provenance, so a carrier can move workloads between tiers as cost, sovereignty, or capability requirements change, without rewriting agents.
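
The per-call audit fields listed under Audit can be sketched as a single record type that does not mention the tier at all. This Python example is illustrative: `CallAuditRecord` and the `digest` helper are hypothetical, and hashing the record is an assumption about how such an entry might be made tamper-evident, not a description of the platform's audit chain.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class CallAuditRecord:
    model_id: str          # model registry id
    prompt_rev: str        # prompt revision
    retrieval_snap: str    # retrieval snapshot
    parameter_set: str     # registered parameter set
    seed: int

    def digest(self) -> str:
        """SHA-256 over a canonical JSON encoding of the record."""
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


# Illustrative values only; the same record shape serves Tier A, B, and C.
record = CallAuditRecord(
    model_id="model.carrier.uw-llm",
    prompt_rev="r3",
    retrieval_snap="snap-2026-04-12",
    parameter_set="params-default",
    seed=7,
)
print(record.digest())
```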