08 — Data abstraction, mapping, lineage & replay.
Once signal is in the gateway, the Data Plane projects it onto the Ontology. This is the Data Abstraction Layer: the place where heterogeneous, messy, multi-format input becomes typed, audited, replayable Ontology objects. Every typed property carries a provenance record. Every property is part of a lineage graph that links back to its source bytes. Every run is replayable against its original ontology version. And every object is searchable by an agent through a governed RAG Knowledge Base.
8.0 Reference — data abstraction layer
The Data Abstraction Layer turns the unified intake stream (§7.12) into typed Ontology objects with full lineage, then exposes those objects to the Reasoning Plane through a governed retrieval surface and a versioned RAG Knowledge Base.
Fig. 8.0a — Data Abstraction Layer pipeline. Five stages (Entity Extraction → Ontology Instantiation → Schema Mapping → Cross-Validation → Semantic Search / RAG) sit between the Unified Intake Queue and the Agent Runtime.
8.1 Source-to-canonical mapping model
A mapping is a versioned, declarative artefact that projects a source schema onto an Ontology object. Mappings are stored as part of the configuration domain (§21) and are subject to the same release governance as agents and tools.
8.2 Provenance record
Every property the platform writes is accompanied by a provenance record. The record is the substrate’s evidentiary contract.
8.3 Confidence model
Confidence is a normalised scalar in [0,1] with a defined source. Deterministic
mappings emit 1.0. Extraction tools emit a model-derived score subject to
calibration. Aggregations of multiple evidence spans use a fixed combination rule:
data.calibrator.update AuditEvent.
8.4 Lineage graph
Fig. 8.1 — Lineage graph. Every Property is reachable from its source bytes via at least one EvidenceSpan; every Decision cites the spans it relied on; every Action is reachable from the Decision that authored it.
8.5 Time-travel queries
Every Object supports asOf(timestamp) reads. The substrate retains version
history per property; given a timestamp, the resolver returns the version-set in force at
that instant. Lineage queries (e.g. “show me the EvidenceSpans cited by Decision X”) are
stable under any subsequent ontology evolution because Decisions pin to ontology versions
(§6.4).
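The asOf read described above can be sketched as a lookup over a per-property version history. This is a minimal illustration, not the substrate's resolver: the `PropertyHistory` class, its field names, and the assumption that versions are appended in time order are all hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List

_MISSING = object()  # sentinel: the property had no version yet


@dataclass
class PropertyHistory:
    """Version history for one property (hypothetical structure)."""
    timestamps: List[datetime] = field(default_factory=list)
    values: List[Any] = field(default_factory=list)

    def write(self, at: datetime, value: Any) -> None:
        # Assumption: versions are appended in non-decreasing time order.
        if self.timestamps and at < self.timestamps[-1]:
            raise ValueError("out-of-order write")
        self.timestamps.append(at)
        self.values.append(value)

    def in_force_at(self, at: datetime) -> Any:
        # bisect_right counts the versions written at or before `at`.
        count = bisect_right(self.timestamps, at)
        return self.values[count - 1] if count else _MISSING


def as_of(obj: Dict[str, PropertyHistory], at: datetime) -> Dict[str, Any]:
    """Return the version-set in force at `at`; unwritten properties are omitted."""
    resolved = {}
    for name, history in obj.items():
        value = history.in_force_at(at)
        if value is not _MISSING:
            resolved[name] = value
    return resolved


def day(n: int) -> datetime:
    return datetime(2024, 1, n, tzinfo=timezone.utc)


status = PropertyHistory()
status.write(day(1), "open")
status.write(day(10), "closed")
print(as_of({"status": status}, day(5)))  # {'status': 'open'}
```

Keeping the history append-only is what lets a read at an old timestamp return the same answer no matter how many versions are written afterwards.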
8.6 Replay semantics
Replay reconstructs an AgentRun from its persisted inputs and lineage. The substrate distinguishes two replay modes:
Deterministic steps
Bit-exact — All tool calls (§9) are idempotent and side-effect-bounded; replaying them on the same inputs reproduces the same outputs bit-exactly. Validation, lookup, conversion, classification with discrete outputs, and rule packs are bit-exact.
Non-deterministic steps
Seed-pinned — Model calls capture the model id, prompt revision, retrieval snapshot, parameter set, and seed. Replays use the exact pinned set; outputs are reproducible to the bounds the underlying model supports. Where a model has been retired, replay routes through the registered successor and an Exception of kind replay.successor is emitted.
8.7 Replay bundle format
A replay bundle is a self-contained, signed export of everything required to re-execute a run: the input objects pinned to their ontology version, the prompt revisions, the retrieval snapshots used, the tool versions, and the model lineage. Bundles are exportable in the .lrb archive format and are themselves content-addressed.
| Path inside bundle | Contents |
|---|---|
| /manifest.json | Run identity, integrity hashes, signing identity |
| /ontology/ | Frozen ontology snapshot at run pin |
| /objects/ | Ontology objects referenced by the run |
| /documents/ | Source documents (bytes-by-content-address) |
| /spans/ | EvidenceSpans cited |
| /prompts/ | Prompt revisions |
| /retrieval/ | Retrieval-corpus snapshot manifests |
| /models/ | Model lineage and capability lane mapping |
| /audit/ | The slice of the audit chain covering the run |
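As an illustration of how the integrity hashes in the manifest might be checked, the sketch below compares each bundle entry against its recorded digest. The manifest layout (an `integrity` map from path to digest), the use of SHA-256, and the in-memory `files` mapping are assumptions made for the example; the .lrb format itself is not specified here, and signature verification is left out.

```python
import hashlib
from typing import Dict, List


def content_address(data: bytes) -> str:
    # Assumption: a SHA-256 hex digest serves as the content address.
    return hashlib.sha256(data).hexdigest()


def verify_integrity(manifest: dict, files: Dict[str, bytes]) -> List[str]:
    """Return bundle paths that are missing or whose bytes do not match the manifest.

    `manifest["integrity"]` is a hypothetical {path: digest} map;
    `files` maps bundle paths to their raw bytes after unpacking.
    """
    mismatches = []
    for path, expected in sorted(manifest["integrity"].items()):
        data = files.get(path)
        if data is None or content_address(data) != expected:
            mismatches.append(path)
    return mismatches
```

An empty result means every listed entry is present and unmodified; it says nothing about who signed the bundle.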
8.8 Retention
Documents, EvidenceSpans, Decisions, Actions, and AuditEvents are retained per the tenant’s policy with a per-class minimum. The substrate enforces minimum retention regardless of any tenant deletion request; deletion below the minimum requires a typed data.retention.exception Decision, signed off by the tenant’s data protection officer.
Specific retention durations are tenant policy and are not part of the platform’s
architectural contract. The substrate guarantees the controls; tenants set the values.
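The retention control can be pictured as a floor applied to the tenant's value plus a gate on early deletion. The sketch below is one way to express that; the `RetentionException` fields and function names are hypothetical, and no concrete durations are implied, since those are tenant policy.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class RetentionException:
    """Stand-in for a data.retention.exception Decision (fields are hypothetical)."""
    record_class: str
    signed_off_by_dpo: bool


def effective_retention(tenant_policy: timedelta, class_minimum: timedelta) -> timedelta:
    # The tenant sets the value; the per-class minimum acts as a floor.
    return max(tenant_policy, class_minimum)


def may_delete(
    record_class: str,
    age: timedelta,
    class_minimum: timedelta,
    exception: Optional[RetentionException] = None,
) -> bool:
    """Allow deletion once the minimum has elapsed, or earlier with a signed exception."""
    if age >= class_minimum:
        return True
    return (
        exception is not None
        and exception.record_class == record_class
        and exception.signed_off_by_dpo
    )
```

Separating the floor from the deletion gate keeps the guarantee independent of whatever durations a tenant configures.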
8.9 Entity Extraction
Entity Extraction is the stage at which the substrate identifies typed entities inside unstructured payloads — named parties, identifiers, monetary amounts, dates, addresses, vehicles, vessels, properties, providers, codes — and proposes them as candidates for Ontology objects (§8.10) and Property values (§8.1).
Inputs / outputs
- Inputs: a content-addressed Document plus optional region/LOB hints from the Channel Router (§7.9).
- Outputs: a typed EntityCandidate set, each with type, value, EvidenceSpan (page / bbox / token range / transcript line), extractor identity, model lineage, and calibrated confidence (§8.3).
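The output shape listed above might be modelled as two immutable records. The attributes mirror the fields named in the text, but the Python names, types, and validation rules are assumptions for illustration, not the platform's schema.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class EvidenceSpan:
    """Locates the evidence inside a content-addressed Document."""
    document_id: str  # content address of the source Document
    page: Optional[int] = None
    bbox: Optional[Tuple[float, float, float, float]] = None
    token_range: Optional[Tuple[int, int]] = None
    transcript_line: Optional[int] = None

    def __post_init__(self) -> None:
        # Assumption: a span must carry at least one locator to be citable.
        locators = (self.page, self.bbox, self.token_range, self.transcript_line)
        if all(value is None for value in locators):
            raise ValueError("an EvidenceSpan needs at least one locator")


@dataclass(frozen=True)
class EntityCandidate:
    entity_type: str
    value: str
    span: EvidenceSpan
    extractor_id: str
    model_lineage: str
    confidence: float  # calibrated, in [0, 1] (see 8.3)

    def __post_init__(self) -> None:
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must lie in [0, 1]")
```

Frozen records suit candidates because downstream stages cite them as evidence and should not see them change.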
8.10 Ontology Instantiation
Once entity candidates exist, the substrate decides which existing Ontology objects they belong to and which new objects to instantiate. This is the stage that resolves identity and links.
Algorithm
- Candidate normalisation — canonicalise identifiers (case-fold, strip punctuation, normalise tax-ids, addresses, account numbers).
- Entity resolution — match candidates to existing Ontology objects via deterministic keys first, then probabilistic match using approved embeddings against an entity index. Per-LOB resolvers can be pinned (e.g. provider registry for Health, vessel registry for Marine).
- Decision — one of match (link to existing), create (instantiate new), or defer (raise an Exception of kind data.entity.ambiguous for human review).
- Link writes — relationships are emitted with provenance, so every link has an audit trail back to the EvidenceSpan that supports it.
data.entity.merge Decision, which is itself replayable.
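The normalise, resolve, decide sequence from the algorithm above can be sketched as follows. The two thresholds, the ASCII-only normalisation, and the `probabilistic_match` callable (standing in for an embedding lookup against the entity index) are all assumptions chosen for the example; the source does not state how the match / create / defer boundary is drawn.

```python
import re
from typing import Callable, Dict, Optional, Tuple

Match = Tuple[Optional[str], float]  # (best object id or None, similarity score)


def normalise_identifier(raw: str) -> str:
    """Case-fold and strip everything except ASCII letters and digits."""
    return re.sub(r"[^0-9a-z]", "", raw.casefold())


def resolve(
    candidate_key: str,
    deterministic_index: Dict[str, str],
    probabilistic_match: Callable[[str], Match],
    match_threshold: float = 0.9,   # hypothetical value
    create_threshold: float = 0.5,  # hypothetical value
) -> Tuple[str, Optional[str]]:
    """Return ("match", id), ("create", None), or ("defer", id)."""
    key = normalise_identifier(candidate_key)
    # Deterministic keys are tried first.
    if key in deterministic_index:
        return ("match", deterministic_index[key])
    # Fall back to the probabilistic matcher.
    object_id, score = probabilistic_match(key)
    if object_id is not None and score >= match_threshold:
        return ("match", object_id)
    if object_id is None or score < create_threshold:
        return ("create", None)
    # In between: ambiguous, so hand off for human review.
    return ("defer", object_id)
```

A `defer` result corresponds to the data.entity.ambiguous Exception; the band between the two thresholds controls how much work reaches human reviewers.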
8.11 Cross-Validation
Cross-Validation is the stage at which proposed property values are validated across sources before becoming authoritative on the Ontology. It is what makes the Data Abstraction Layer trustworthy under conflicting evidence.
Validation classes
- Within-document — consistency between fields in the same document (e.g. policy number and policyholder name match).
- Cross-document — agreement among multiple documents covering the same Ontology object (e.g. loss notice + adjuster report + photographs).
- Against systems of record — reconciliation with the authoritative system (e.g. policy admin lookup, provider registry, tax authority).
- Against rule packs — policy-table validation (e.g. coverage applies on date of loss; deductible ≤ limit; sum of allocations equals 100%).
- Against historical lineage — check that a proposed property update is consistent with the history (e.g. policy effective date does not change after binding).
data.crossvalidate.<verdict>) and is part of the property’s
provenance record.
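To make the rule-pack class concrete, the sketch below encodes the three example checks from the list above as named predicates. The field names on the claim dictionary and the rule names are hypothetical; amounts use `Decimal` so that the 100% check is exact.

```python
from datetime import date
from decimal import Decimal
from typing import Callable, List, Tuple

Rule = Tuple[str, Callable[[dict], bool]]

# Hypothetical rule pack built from the examples in the text.
RULE_PACK: List[Rule] = [
    ("coverage_applies_on_loss_date",
     lambda c: c["effective"] <= c["loss_date"] <= c["expiry"]),
    ("deductible_le_limit",
     lambda c: c["deductible"] <= c["limit"]),
    ("allocations_sum_to_100",
     lambda c: sum(c["allocations"]) == Decimal("100")),
]


def failed_rules(claim: dict, rules: List[Rule] = RULE_PACK) -> List[str]:
    """Return the names of the rules the claim violates, in pack order."""
    return [name for name, check in rules if not check(claim)]
```

Because each rule is a pure function of its input, a pack like this falls into the bit-exact replay class described in 8.6.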
8.12 Semantic Search & Code Lookup
Once Ontology objects exist with provenance, the substrate exposes them to the Reasoning Plane through two retrieval interfaces:
- Semantic search — a typed retrieval interface that combines dense embeddings (over EvidenceSpans, Documents, transcripts, and Property text) with structured filters (tenant, region, LOB, marking, time window). All retrievals are permission-checked (§16) at query time, never at index-build time.
- Code lookup — deterministic lookups against governed code-lists (ICD-10, CPT, NAICS, ISO, vehicle / vessel / property registries, peril codes, occupational codes, currency, jurisdictional rules). Each list is versioned and pinned by the agent at run time.
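The combination of structured filters, a query-time permission check, and dense ranking can be sketched in a few lines. This is a toy in-memory version: the `Chunk` record, the `can_read` callable, and the choice of cosine similarity are assumptions, and only two of the listed filters (tenant and LOB) are shown.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    tenant: str
    lob: str
    embedding: Sequence[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(
    query_embedding: Sequence[float],
    index: List[Chunk],
    tenant: str,
    lob: str,
    can_read: Callable[[Chunk], bool],
    top_k: int = 3,
) -> List[Chunk]:
    """Filter first, check permission per query, then rank by similarity."""
    visible = [
        c for c in index
        if c.tenant == tenant and c.lob == lob and can_read(c)
    ]
    ranked = sorted(
        visible,
        key=lambda c: cosine(query_embedding, c.embedding),
        reverse=True,
    )
    return ranked[:top_k]
```

Evaluating `can_read` inside `search` rather than when the index is built means a permission change takes effect on the next query without re-indexing.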
8.13 RAG Knowledge Base
The RAG Knowledge Base is the substrate’s governed retrieval-augmented surface for agents that need to read beyond a single object’s lineage. It is a first-class component: indexed, versioned, multi-tenant, region-pinned, marking-aware, and replayable.
Composition
- Vector Store — embedding index over EvidenceSpans, Documents, transcripts, and selected Property text. Embeddings are produced by approved embedding models in the Model Gateway (§12) and re-embedded on model upgrade with a deterministic re-embedding job that produces a new retrieval snapshot.
- Indexed Knowledge — structured indexes over Ontology objects (typed properties, relationships, code-lists, calibration tables, policy tables, rule packs, prior decision summaries). These are not embedded; they are deterministic.
- Retrieval Snapshot — an immutable handle that pins which embedding model, which index version, and which inclusion / marking filters were in effect at retrieval time. Every Decision and tool call records its retrieval snapshot id (§8.7), so every retrieval is replayable.
- Marking-aware retrieval — retrievals enforce the caller’s clearance (§15.4); a chunk a caller cannot see does not appear in the result set, and its absence is itself audited.
- Tenant isolation — vector indexes are tenant-isolated by physical partition; a query cannot cross tenants by construction.
- Region pinning — indexes live in their tenant’s region; a cross-region query is impossible without an explicit replication policy.
- Provenance preservation — every retrieved chunk retains its source EvidenceSpan, so any Decision that uses a retrieval can cite the underlying bytes (§17 · Decision lineage).
- No customer-data training — retrievals are not training data. The Model Gateway’s no-train policy (§12) binds at the retrieval boundary too.
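Two of the properties above, the immutable retrieval snapshot and marking-aware retrieval with audited absence, can be illustrated together. In this sketch the snapshot id is derived by hashing the pinned settings, which is an assumption (the text only says the handle is immutable), and the `chunk_withheld` event name is hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class RetrievalSnapshot:
    """Pins what was in effect at retrieval time."""
    embedding_model: str
    index_version: str
    marking_filters: Tuple[str, ...]

    @property
    def snapshot_id(self) -> str:
        # Assumption: the id is a hash of the pinned settings, order-insensitive
        # with respect to the filter list.
        payload = json.dumps(
            [self.embedding_model, self.index_version, sorted(self.marking_filters)]
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def visible_chunks(
    chunks: Iterable[Tuple[str, str]],  # (chunk_id, marking)
    clearance: Set[str],
    audit_log: List[dict],
) -> List[str]:
    """Return chunk ids the caller may see; record each withheld chunk."""
    visible = []
    for chunk_id, marking in chunks:
        if marking in clearance:
            visible.append(chunk_id)
        else:
            # The absence itself is audited (event name is hypothetical).
            audit_log.append({"event": "chunk_withheld", "chunk_id": chunk_id})
    return visible
```

Recording the snapshot id alongside a Decision is what allows a later replay to ask for the same index version and filters rather than whatever is current.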

