12 — Model gateway — registry, routing & lifecycle.

The Model Gateway is the only path from the Reasoning Plane to a model. It is vendor-neutral by construction: the same agent and the same tool talk to a routing surface, not to a provider. Models fill typed slots called capability lanes; the lane is the unit of substitution. Above the gateway is a Model Fabric: a uniform pipeline that takes any candidate model (closed-source, approved open-source, or customer-owned) through benchmarking, optimisation, fine-tuning, SOP configuration, grounding, and continuous monitoring before it ever serves traffic.

12.0 Reference — model fabric

The Model Fabric makes “which model do you use?” a configuration question, not a platform question. Any model that can pass the fabric’s gates can serve the substrate; any model that cannot, cannot. The fabric is the same for every provenance tier (§12.12) and for every region.

Fig. 12.0a — Model Fabric. Three model-source tiers feed into a uniform six-stage pipeline (Min Benchmarks → Use-case Optimisation → Fine-tuning → SOP Configuration → RAG Grounding → Production & Monitoring) before entering the Gateway.

12.1 Capability lanes

A capability lane is a typed slot with a declared SLO target, an approved-models list, and a routing policy. Lanes are stable across model generations; the mapping from lane to underlying model is what changes.

Lane	Purpose	Default SLO p99	Default approved providers (typical)
`reasoning.long`	Multi-step reasoning over long contexts	15s	Frontier proprietary · OSS frontier
`reasoning.fast`	Short-horizon reasoning, classification	2s	Frontier proprietary · OSS
`extract.text`	Structured extraction from text	3s	Frontier · specialised SLM
`vlm.document`	Vision-language understanding of documents	5s	Frontier VLM · specialised
`ocr.handwriting`	Handwriting and degraded scan OCR	4s	Specialised OCR engines
`embedding.text`	Embeddings for retrieval	500ms	Frontier · OSS
`embedding.multilingual`	Cross-language embeddings	500ms	Frontier · OSS
`verify.classifier`	Adversarial probes · safety	1s	Specialised classifiers

The platform does not bind a tenant to any specific provider. Providers in the table above are typical examples; the registry of approved providers is a tenant decision and is reflected in the per-lane routing policy.
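As an illustration only, a lane can be pictured as a small typed record whose name and SLO stay fixed while its approved-models set changes. The class name `Lane`, its field names, and the `accepts` helper are assumptions made for this sketch, not the platform's schema; the SLO value comes from the table above and the model ids from the routing example in §12.3.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Lane:
    """Hypothetical in-memory view of a capability lane."""
    name: str                      # e.g. "reasoning.long"
    slo_p99_ms: int                # declared p99 latency target, in milliseconds
    approved_models: frozenset = field(default_factory=frozenset)

    def accepts(self, model_id: str) -> bool:
        # A model may serve the lane only if it is on the approved list.
        return model_id in self.approved_models


# The lane is stable across model generations; only the mapping changes.
lane_v1 = Lane("reasoning.long", 15_000, frozenset({"model.acme.frontier-r1"}))
lane_v2 = Lane("reasoning.long", 15_000,
               frozenset({"model.acme.frontier-r1", "model.acme.frontier-r2"}))
```

Callers address `reasoning.long` in both cases; adding `frontier-r2` changes the approved set, not the lane.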

12.2 Approved model registry

Every model that any agent or tool can call must be present in the approved registry. The registry record is the contract.

id:           model.<namespace>.<name>
provider:     proprietary | oss | byo
endpoint:     <tenant-private endpoint>
lanes:        [reasoning.long, reasoning.fast, …]
region:       eu-central-1 | us-east-1 | sovereign-zone-a | onprem-rack-1
parameters:
  default:    { temperature: 0.0, top_p: 1, max_tokens: 4096 }
  bounds:     { temperature: [0.0, 0.4] }
fineTune:
  base:       <base-model-id>
  datasetHash: <sha256>
  approved:   true | false
trainingPolicy:
  customerData:    none-by-default
  optInDatasets:   [ <dataset-ids> ]
evals:
  required:   [ pack.lane.<lane>.<version> ]
  passing:    { score_min: 0.85, regression_max: 0.0 }
release:
  state:      enabled | shadow | demoted | retired
  promotedAt: 2026-04-12T08:00:00Z
  rollbackTo: model.<previous-id>
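Two fields of the record lend themselves to a mechanical check: `parameters.bounds` and `evals.passing`. The sketch below is one possible reading, not the gateway's implementation. It assumes that an out-of-bounds parameter is rejected rather than clamped, and that `regression_max` is the largest tolerated score drop against the previous release; the function names are hypothetical.

```python
DEFAULT = {"temperature": 0.0, "top_p": 1, "max_tokens": 4096}
BOUNDS = {"temperature": (0.0, 0.4)}
PASSING = {"score_min": 0.85, "regression_max": 0.0}


def resolve_parameters(requested: dict, default: dict, bounds: dict) -> dict:
    """Overlay requested parameters on the defaults and enforce the bounds."""
    merged = {**default, **requested}
    for key, (low, high) in bounds.items():
        # Assumption: a violation is an error, not a silent clamp.
        if not low <= merged[key] <= high:
            raise ValueError(f"{key}={merged[key]} outside [{low}, {high}]")
    return merged


def passes_eval_gate(score: float, regression: float, passing: dict) -> bool:
    """True when an eval result meets the record's `passing` thresholds."""
    return (score >= passing["score_min"]
            and regression <= passing["regression_max"])
```

With the values shown in the record, a request for `temperature: 0.2` resolves normally, while `temperature: 0.9` is refused because it exceeds the upper bound of 0.4.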

12.3 Routing

For each lane, a routing policy chooses a model per call. The policy is declarative; the runtime resolves it.

lane: reasoning.long
strategy:
  primary:  model.acme.frontier-r2
  fallback: model.acme.frontier-r1
  shadow:   model.opencorp.oss-r3       # called for eval; result discarded
  shadow_sample_rate: 0.10
filters:
  - require: marking.allows(input)
  - require: region.matches(tenant.region_pin)
  - require: cost.under(budget.remaining)
  - reject:  customerData.train == true   # never route data to a training endpoint
oncall:
  primary_unavailable:
    after: PT2S
    do:    fallback
  drift_breach:
    state: shadow
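To make the resolution order concrete, here is a minimal sketch of how a runtime might evaluate the policy above for one call. It is an assumption-laden illustration: the `Call` and `Endpoint` types, their fields, and the `route` function are hypothetical, the PT2S timeout is reduced to a boolean `available` flag, and `cost.under` is read as a strict comparison.

```python
import random
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Call:
    region: str               # tenant.region_pin for this call
    marking_ok: bool          # outcome of marking.allows(input)
    est_cost: float
    budget_remaining: float


@dataclass
class Endpoint:
    model_id: str
    region: str
    trains_on_customer_data: bool
    available: bool = True    # stands in for the primary_unavailable timeout


def admissible(ep: Endpoint, call: Call) -> bool:
    """Apply the four filters listed in the policy."""
    return (call.marking_ok
            and ep.region == call.region
            and call.est_cost < call.budget_remaining
            and not ep.trains_on_customer_data)


def route(call: Call, primary: Endpoint, fallback: Endpoint,
          shadow: Optional[Endpoint] = None, shadow_sample_rate: float = 0.0,
          rng: Callable[[], float] = random.random) -> Tuple[str, Optional[str]]:
    """Return (serving model id, shadow model id or None)."""
    serving = next((ep for ep in (primary, fallback)
                    if ep.available and admissible(ep, call)), None)
    if serving is None:
        raise RuntimeError("no admissible model for this call")
    # The shadow model is sampled; its result would be used for eval only.
    take_shadow = (shadow is not None and admissible(shadow, call)
                   and rng() < shadow_sample_rate)
    return serving.model_id, (shadow.model_id if take_shadow else None)
```

The `rng` parameter exists only so that the sampling decision can be made deterministic in tests.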

12.4 Architecture

Fig. 12.1 — Model gateway architecture. The router enforces approval, region, marking and cost; the lane decides the provider; eval & drift observe.

12.5 Asset separation

The gateway treats prompts, tools, retrieval corpora and fine-tunes as four separate, independently versioned, independently audited assets (§2.8). The gateway never accepts an inline prompt assembled at request time without a registered prompt id; un-pinned prompts are rejected.

12.6 Region pinning

Each tenant declares one or more regions; the gateway enforces that a call routes only to endpoints inside the declared regions. Cross-region routing requires an explicit, time-limited tenant configuration change; the router does not accept a cross-region override on an individual call.
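The rule can be sketched as a check that consults only tenant-level configuration and takes no per-call argument that could widen it. The representation of an override as a `(region, expires_at)` pair and the function name are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def region_allowed(endpoint_region, pinned_regions, overrides, now):
    """Decide whether an endpoint's region is permitted for a tenant.

    overrides: iterable of (region, expires_at) pairs taken from tenant
    configuration (hypothetical shape). Nothing here comes from the call.
    """
    if endpoint_region in pinned_regions:
        return True
    # A cross-region grant counts only until its expiry.
    return any(region == endpoint_region and now < expires_at
               for region, expires_at in overrides)


now = datetime(2026, 4, 12, 8, 0, tzinfo=timezone.utc)
pins = {"eu-central-1"}
grant = [("us-east-1", now + timedelta(hours=1))]
```

Here `region_allowed("us-east-1", pins, grant, now)` holds while the grant is live and stops holding once `now` passes the expiry.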

12.7 No-train policy

Default: no customer data is used to train any model.
Provider endpoints used by the platform are configured to disable training; the gateway refuses to use any endpoint that does not return a verifiable no-train signal.
Opt-in is per (dataset, model lineage) tuple; opt-in is itself a typed model.train.opt_in AuditEvent on the tenant chain.
Fine-tunes use only opt-in datasets; the dataset hash is part of the model registry record.
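The two checks implied by this policy can be sketched as follows. Both function names are hypothetical, and the sketch assumes that a missing or unverifiable signal (modelled as `None`) is treated the same as an endpoint that trains.

```python
from typing import Optional, Set, Tuple


def endpoint_usable(no_train_signal: Optional[bool]) -> bool:
    """Usable only if the endpoint positively reports no-train."""
    # None (no signal) and False (trains) are both refused.
    return no_train_signal is True


def may_train_on(dataset_id: str, lineage: str,
                 opt_ins: Set[Tuple[str, str]]) -> bool:
    """Opt-in is per (dataset, model lineage) tuple, never per dataset alone."""
    return (dataset_id, lineage) in opt_ins


# Hypothetical ids: one dataset opted in for one lineage only.
opt_ins = {("ds.claims-2025", "model.acme.frontier")}
```

A dataset opted in for one lineage does not become available to another: `may_train_on("ds.claims-2025", "model.opencorp.oss", opt_ins)` is false.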

12.8 Promotion / demotion lifecycle

A model traverses a fixed lifecycle. Each transition is a typed AuditEvent.

Fig. 12.2 — Model lifecycle. Movement is governed by eval & drift signals (§13).
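A lifecycle of this kind is commonly implemented as a small state machine that refuses illegal moves and logs every legal one. The sketch below uses the four `release.state` values from §12.2, but the transition table, the event type string, and the `AuditEvent` fields are assumptions; the authoritative transitions are those in Fig. 12.2.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Assumed transition table, for illustration only.
ALLOWED = {
    "shadow":  {"enabled", "demoted", "retired"},
    "enabled": {"shadow", "demoted", "retired"},
    "demoted": {"shadow", "retired"},
    "retired": set(),
}


@dataclass(frozen=True)
class AuditEvent:
    type: str          # hypothetical event type name
    model_id: str
    from_state: str
    to_state: str
    at: datetime


def transition(model_id, current, target, log, now=None):
    """Move a model to `target`, appending exactly one AuditEvent, or raise."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current} -> {target}")
    log.append(AuditEvent("model.lifecycle.transition", model_id,
                          current, target, now or datetime.now(timezone.utc)))
    return target
```

A rejected transition raises before anything is appended, so the log contains only transitions that actually happened.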

12.9 Cost & latency tradeoffs

Each lane has a per-call cost and a budget. The router can downgrade to a smaller model when budget pressure rises, only if the lane’s eval policy allows it. Downgrades are recorded; downgrade rate is a watched metric.
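A minimal sketch of the downgrade decision and the watched metric follows. The definition of "budget pressure" as remaining budget falling below a fraction of the total, the 0.2 default, and both function names are assumptions; the text above does not specify how pressure is measured.

```python
def pick_model(budget_remaining, budget_total, primary, smaller,
               eval_allows_downgrade, pressure_threshold=0.2):
    """Return (model id, downgraded flag) for one call.

    Assumes budget_total > 0.
    """
    under_pressure = budget_remaining / budget_total < pressure_threshold
    # Budget pressure alone is not enough; the eval policy must also allow it.
    if under_pressure and eval_allows_downgrade:
        return smaller, True
    return primary, False


def downgrade_rate(decisions):
    """Fraction of recorded decisions that were downgrades."""
    if not decisions:
        return 0.0
    return sum(1 for _, downgraded in decisions if downgraded) / len(decisions)
```

Returning the flag alongside the model id is what lets each downgrade be recorded and the rate computed afterwards.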

12.10 BYO and on-prem models

Tenants can register their own model endpoints (BYO) or run self-hosted OSS models on-prem. Both must:

Pass the approved-model registry’s eval requirement.
Expose a no-train signal at runtime.
Be addressable inside the tenant’s region pin.
Be reachable through the gateway’s standard interface (no direct calls from agents).

12.11 Model Deployment Lifecycle

Every candidate model entering the substrate, whether a frontier closed-source release, an approved OSS checkpoint, or a customer-owned fine-tune, traverses the same six-stage lifecycle. The lifecycle is the platform’s mechanism for converting “the next great model” into “a model we can rely on for production insurance work.” Stages are gated, audited, and reversible.

Stage 1 · Minimum Benchmarks

Standardised pack of capability tests per lane (§12.1): correctness, format compliance, safety probes, latency, cost.
A model that fails the minimum threshold for a lane cannot be admitted to that lane — not even as a shadow.
Benchmarks are versioned; a benchmark version bump can demote a previously-passing model.

Stage 2 · Use-case Optimisation

Per-tenant / per-LOB / per-workflow eval suites (§13).
Prompt revision selection, parameter selection (temperature bounds, top-p, max-tokens).
Tool-use shaping: which tools the model can call, with what budget.
Output: a tuned configuration registered as a separate prompt-rev / parameter-set, not a new model.

Stage 3 · Fine-Tuning (optional, gated)

Only where eval evidence justifies fine-tuning over prompt-rev tuning.
Only on opt-in datasets (§12.7); dataset hash is part of the registry record.
Evaluated against the original model’s golden suite plus targeted regressions.
Fine-tunes never replace base models silently; they enter as separate registry entries.

Stage 4 · SOP Configuration

Standard-operating-procedure rules attach to the model’s lane usage: which scopes, which markings, which approval thresholds, which kill-switch rules (§15.11).
Compliance & Control Layer (§15.10) bindings — PII / screening, QA model attachment, confidence scoring — are configured here.
Output: an SOP record paired with the model registry id.

Stage 5 · RAG Grounding

The model is paired with the appropriate retrieval snapshots (§8.13): per-tenant indexes, per-LOB code lists, per-region knowledge bases.
Grounding evals run: does the model cite EvidenceSpans correctly under retrieval? Does it refuse to fabricate when retrieval is empty?
Output: a grounded configuration that the gateway can route traffic to.

Stage 6 · Production & Monitoring

Promotion from shadow to enabled via the lifecycle in §12.8.
Drift sigma watched continuously (§13); breach demotes the model automatically.
Cost / latency / downgrade-rate watched as SLOs (§20).
Incident handling and rollback per §21.

An agent cannot reach a model that has not traversed all six stages. There is no “experimental endpoint” exposed in production. Stage 1 is where models are candidates; Stage 6 is where they are citizens.
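The reachability rule can be expressed as a gate over per-stage status. This is a sketch under stated assumptions: the stage identifiers are hypothetical, and because Stage 3 is marked optional, the sketch treats an explicit `"skipped"` status on that stage as having traversed it. Any other missing or non-passing stage blocks the model.

```python
STAGES = (
    "min_benchmarks",
    "use_case_optimisation",
    "fine_tuning",
    "sop_configuration",
    "rag_grounding",
    "production_monitoring",
)
OPTIONAL = {"fine_tuning"}   # Stage 3 is "optional, gated"


def reachable(status: dict) -> bool:
    """True only if every stage is passed, or skipped where skipping is allowed."""
    for stage in STAGES:
        state = status.get(stage)          # a missing entry counts as not traversed
        if state == "passed":
            continue
        if state == "skipped" and stage in OPTIONAL:
            continue
        return False
    return True


all_passed = {stage: "passed" for stage in STAGES}
no_fine_tune = {**all_passed, "fine_tuning": "skipped"}
ungrounded = {**all_passed, "rag_grounding": "pending"}
```

Under these assumptions `all_passed` and `no_fine_tune` are reachable, while `ungrounded` is not.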

12.12 Model Ecosystem — provenance tiers

The substrate accepts three model-source tiers and treats them uniformly through the gateway. From a tenant’s standpoint, the tier choice is governance and economics, not architecture.

Closed-source / Frontier

Tier A — Frontier reasoning, long-context, high-capability lanes. Vendor-managed; tenant contracts directly with the vendor or via a cloud-hosted enterprise endpoint with no-train guarantees. Fastest path to top-of-lane capability; subject to vendor pricing and rate limits.

Open-Source Approved

Tier B — Vetted OSS checkpoints hosted in the tenant’s region (cloud or on-prem). Lower marginal cost; full control over endpoint and weights. Required for sovereign-cloud deployments and for tenants with strict data-sovereignty requirements. Capability tracks the OSS frontier.

Proprietary & Customer-Owned

Tier C — Customer-owned base models (e.g. a carrier’s internally-trained underwriting LLM) or fine-tunes of Tier A / Tier B models on opt-in datasets. The carrier owns the weights; Layerup operates the endpoint inside the gateway. Eval and drift gates apply identically.

Tier mixing is the norm, not the exception. A typical production deployment routes reasoning.long to Tier A, extract.text to a fine-tune (Tier C), embedding.text to Tier B, and keeps Tier A as a fallback under cost pressure. The router (§12.3) is per lane; each lane can sit in any tier.

12.13 Customer-Owned Model Endpoint pattern

For Tier C, the substrate supports a specific deployment pattern: a customer-owned base model or fine-tune, deployed on infrastructure the carrier controls, exposed to the Model Gateway through a standard interface.

Properties

Weight ownership. The carrier owns the weights; Layerup never moves them off carrier-controlled infrastructure.
Endpoint hosting. The endpoint runs in the carrier’s VPC, in the carrier’s sovereign cloud, or on the carrier’s on-prem GPU fleet, depending on topology (§19).
Gateway integration. The endpoint is registered as a model in §12.2 with provider byo; agents and tools never call it directly.
Eval & drift gates. The same eval suites and drift sigma apply as for Tier A and Tier B; passing is mandatory for production routing.
No-train signal. The endpoint must report a verifiable no-train signal per §12.7; the gateway refuses to use endpoints that do not.
Region pinning. The endpoint inherits the carrier’s region pin and cannot serve calls from outside it.
Audit. Every call records the model registry id, prompt rev, retrieval snap, parameter set, and seed — identical to Tier A and Tier B.
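The audit property lists five fields recorded on every call. One way to picture a tier-independent record is shown below; the class name, the field names, the example id values, and the SHA-256 digest over canonical JSON are all assumptions for illustration, not the platform's audit format.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class CallAuditRecord:
    """The five per-call fields; nothing in the record names a tier."""
    model_id: str          # model registry id (§12.2)
    prompt_rev: str
    retrieval_snap: str
    parameter_set: str
    seed: int

    def digest(self) -> str:
        # Canonical JSON (sorted keys) so equal records hash identically.
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


# Hypothetical values for a Tier C call.
record = CallAuditRecord("model.carrier.uw-llm", "prompt-rev-12",
                         "snap-0042", "params-3", seed=7)
```

Because the record has the same shape for every tier, a Tier C call can be audited with the same tooling as a Tier A or Tier B call.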

From the agent’s perspective, a Tier C endpoint is indistinguishable from a Tier A frontier endpoint — that is precisely the point. The substrate decouples agent authoring from model provenance, so a carrier can move workloads between tiers as cost, sovereignty, or capability requirements change, without rewriting agents.
