07 — Data plane — ingestion & source surfaces.
The Data Plane is the platform’s port to the outside world. It accepts heterogeneous signal from many channels, classifies it, hashes it, normalises it, and presents it to the Ontology and the Reasoning plane as a typed, content-addressed stream of objects.

7.0 Reference — multimodal ingestion layer
The platform ingests anything an insurance carrier already touches: human-language channels (email, chat, phone, SMS), structured digital channels (forms, partner APIs, webhooks), document channels (PDF, scans, images, handwriting), voice channels (calls, voicemail, IVR transcripts), and machine channels (telematics, sensor, CDC). Everything funnels through a Channel Router, into channel-appropriate extractors (OCR / VLM, voice transcription, structured parsers), and out into a Unified Intake Queue typed against the Ontology.

Fig. 7.0a — Multimodal ingestion layer. Eight channel families → Channel Router → channel-specific extractors → Unified Intake Queue → Ontology.

7.1 Channel matrix
The substrate accepts signal across three transport modes (push, pull, and stream) and three content modes (structured, semi-structured, and unstructured). The same ingest gateway (§7.3) terminates all of them; differences are confined to adapters.

| Mode | Examples | Latency | Idempotency | Default backpressure |
|---|---|---|---|---|
| Push · structured | Webhooks · partner APIs · broker portals | real-time | caller-supplied key + body hash | HTTP 429 + Retry-After |
| Push · unstructured | Inbound email · SFTP drop · upload portal | seconds–minutes | (message-id, attachment-hash) tuple | queue depth gating |
| Pull · structured | Policy admin · claims · billing · GL | scheduled or change-feed | (source, primary key, version) | chunked cursor |
| Pull · unstructured | Document mgmt · regulator portals | polled | (source, externalId, contentHash) | chunked cursor |
| Stream | Telematics · sensor · clickstream · partner CDC | milliseconds–seconds | per-stream watermark | partition lag · DLQ |
7.2 Content modes
- Structured (typed payloads): JSON / XML / Avro / Protobuf with a known schema. Validated at the gateway, mapped directly to Ontology objects.
- Semi-structured (form-shaped): standard forms (ACORD, regulator filings, broker spreadsheets). Parsed into a tabular intermediate, then mapped.
- Unstructured (free-form content): PDFs, images, scans, handwritten forms, emails, voice transcripts. Routed through extraction tools (§9 · Extraction) under audit.
7.3 Ingest gateway responsibilities
The ingest gateway is a single hardened component that every channel terminates into. It performs:
- Authentication — mTLS / OAuth2 client credentials / signed webhook / SFTP key. Anonymous ingest is never permitted.
- Authorisation — the source principal’s scope must include `data.ingest.<channel>`. Cross-tenant routing is impossible by construction.
- Anti-malware & content scan — every byte stream is scanned at the boundary. Quarantine on detection; tenant security plane notified.
- Classification — initial markings (tenant, region, default sensitivity) are applied immediately.
- Hashing — SHA-256 of the canonical payload bytes; stored as the document content address.
- Persistence — write to immutable object storage with versioned key and retention policy.
- Dedupe — see §7.5.
- Acknowledgement — typed receipt with `(documentId, contentHash, ingestId)`; acknowledged after durable persistence and audit emission.
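
The ordering of the last responsibilities (hash, persist, audit, acknowledge) can be sketched as follows. This is a minimal illustration under stated assumptions, not the gateway's actual code: the `accept` helper, the in-memory `store` and `audit_log`, the `data.ingest.accepted` event name, and the way `document_id` is derived are all hypothetical.

```python
import hashlib
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class IngestReceipt:
    document_id: str
    content_hash: str
    ingest_id: str


def accept(canonical_payload: bytes, store: dict, audit_log: list) -> IngestReceipt:
    """Hash, persist, audit, then acknowledge (hypothetical helper)."""
    # SHA-256 of the canonical payload bytes is the content address.
    content_hash = hashlib.sha256(canonical_payload).hexdigest()
    # Write-once persistence: an existing content address is never overwritten.
    store.setdefault(content_hash, canonical_payload)
    ingest_id = uuid.uuid4().hex
    # Assumption: the audit event name below is illustrative only.
    audit_log.append(
        {"event": "data.ingest.accepted", "ingestId": ingest_id, "contentHash": content_hash}
    )
    # The receipt is built only after persistence and audit emission have happened.
    return IngestReceipt(
        document_id=f"doc-{content_hash[:16]}",  # assumption: id derived from the hash
        content_hash=content_hash,
        ingest_id=ingest_id,
    )
```

A real gateway would write to immutable object storage rather than a dictionary; the point of the sketch is only that the receipt is the last thing produced.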
7.4 Topology
Fig. 7.1 — Ingest topology. Every channel terminates into the same gateway; every step emits an AuditEvent.

7.5 Dedupe key construction
Dedupe keys are deterministic per channel and form the basis of intake idempotency. The same logical signal received twice never produces two upstream objects.

| Channel | Dedupe key |
|---|---|
| Webhook | sha256(deliveryId · bodyHash) |
| Email | sha256(messageId · normalisedFrom · attachmentHashes) |
| Pull (CDC / change-feed) | (sourceId, recordKey, sourceVersion) |
| SFTP / batch | sha256(filePath · contentHash · ingestEpoch) |
| Stream | (partition, offset) |
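
The key shapes in the table can be sketched in Python. The table does not say how the `·` joins its parts, so the unit-separator delimiter below is an assumption, as are the function names and the choice to sort attachment hashes so that attachment order does not change the key.

```python
import hashlib

# Assumption: parts are joined with an unambiguous delimiter before hashing.
SEP = b"\x1f"


def _sha256_join(*parts: str) -> str:
    """Hash the delimited parts; the delimiter keeps ("ab", "c") distinct from ("a", "bc")."""
    return hashlib.sha256(SEP.join(p.encode("utf-8") for p in parts)).hexdigest()


def webhook_key(delivery_id: str, body_hash: str) -> str:
    return _sha256_join(delivery_id, body_hash)


def email_key(message_id: str, normalised_from: str, attachment_hashes: list) -> str:
    # Assumption: sorting makes the key independent of attachment order.
    return _sha256_join(message_id, normalised_from, *sorted(attachment_hashes))


def sftp_key(file_path: str, content_hash: str, ingest_epoch: str) -> str:
    return _sha256_join(file_path, content_hash, ingest_epoch)


def pull_key(source_id: str, record_key: str, source_version: int) -> tuple:
    # Pull and stream keys are plain tuples in the table, so no hashing is applied.
    return (source_id, record_key, source_version)


def stream_key(partition: int, offset: int) -> tuple:
    return (partition, offset)
```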
7.6 Intake idempotency contract
- Replays of an identical payload return the original `ingestId` with the receipt unchanged.
- Different payloads under the same dedupe key are typed as a `data.ingest.collision` Exception and quarantined.
- Replays of an identical stream offset are silently dropped.
- The gateway never silently overwrites a previously persisted document.
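
The contract above can be illustrated with a small in-memory ledger. This is a sketch under stated assumptions: `IntakeLedger`, `Receipt`, `IngestCollision` and the sequential `ing-N` identifiers are hypothetical names, and a real gateway would back this with durable storage.

```python
from dataclasses import dataclass


class IngestCollision(Exception):
    """Stands in for the data.ingest.collision Exception."""


@dataclass(frozen=True)
class Receipt:
    ingest_id: str
    content_hash: str


class IntakeLedger:
    def __init__(self) -> None:
        self._by_key = {}      # dedupe key -> original receipt
        self.quarantine = []   # (dedupe key, conflicting content hash)
        self._seq = 0

    def submit(self, dedupe_key, content_hash: str) -> Receipt:
        existing = self._by_key.get(dedupe_key)
        if existing is None:
            # First sighting: record the receipt and return it.
            self._seq += 1
            receipt = Receipt(f"ing-{self._seq}", content_hash)
            self._by_key[dedupe_key] = receipt
            return receipt
        if existing.content_hash == content_hash:
            # Identical replay: return the original receipt unchanged.
            return existing
        # Same key, different payload: quarantine it and never overwrite the original.
        self.quarantine.append((dedupe_key, content_hash))
        raise IngestCollision(f"collision on dedupe key {dedupe_key!r}")
```

Replaying `submit("k", "h1")` returns the same receipt both times, while `submit("k", "h2")` raises and leaves the stored receipt for `"k"` untouched.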
7.7 Rate-shaping & backpressure
Each channel has a per-tenant rate budget configured at provisioning. Sustained breach results in HTTP 429 with Retry-After for push channels and a paused cursor for pull adapters. Streams apply per-partition lag thresholds; sustained breach moves traffic to a slow lane and raises a `data.ingest.lag` Exception. The gateway never silently drops signal.
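
One common way to enforce a rate budget on a push channel is a token bucket. The text does not name the algorithm the gateway uses, so the sketch below is an assumption throughout: the `TokenBucket` class, its parameters, and the 202 status returned on acceptance are illustrative. Time is passed in explicitly so the behaviour is deterministic.

```python
import math
from dataclasses import dataclass


@dataclass
class TokenBucket:
    rate_per_s: float   # sustained budget, in requests per second
    burst: float        # maximum number of stored tokens
    tokens: float       # tokens currently available
    updated_at: float   # timestamp of the last refill, in seconds

    def try_acquire(self, now: float) -> tuple[int, dict]:
        """Return (status, headers) for one request arriving at time `now`."""
        # Refill in proportion to elapsed time, capped at the burst size.
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate_per_s)
        self.updated_at = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return 202, {}
        # Tell the caller how long to wait until one full token is available.
        retry_after = math.ceil((1.0 - self.tokens) / self.rate_per_s)
        return 429, {"Retry-After": str(retry_after)}
```

With `rate_per_s=1.0` and `burst=2.0`, two immediate requests are accepted, a third is rejected with `Retry-After: 1`, and a request one second later is accepted again.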
7.8 Boundary failure modes
| Failure | Detection | Containment | Audit signature |
|---|---|---|---|
| Auth replay | nonce / timestamp window | reject; lock principal after threshold | data.ingest.auth_replay |
| Schema drift (push) | schema validation fail | route to schema-quarantine queue | data.ingest.schema_drift |
| Malware | scanner verdict | quarantine; security notify | data.ingest.malware |
| Source unavailable (pull) | error rate window | backoff with circuit breaker | data.ingest.source_down |
| Stream lag | partition lag > SLO | slow lane + Exception | data.ingest.lag |
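
The "Source unavailable (pull)" row pairs an error-rate window with a circuit breaker and backoff. A minimal sketch of that combination follows; the class name, the thresholds, and the doubling backoff schedule are assumptions made for illustration and are not taken from the platform.

```python
from collections import deque


class PullCircuitBreaker:
    def __init__(self, window=10, max_error_rate=0.5, base_backoff_s=1.0, max_backoff_s=60.0):
        self._outcomes = deque(maxlen=window)  # sliding window of recent call outcomes
        self._max_error_rate = max_error_rate
        self._base = base_backoff_s
        self._max = max_backoff_s
        self._trips = 0          # consecutive trips, drives the exponential backoff
        self._open_until = 0.0   # pulls are blocked until this timestamp

    def allow(self, now: float) -> bool:
        """True when the adapter may pull from the source at time `now`."""
        return now >= self._open_until

    def record(self, ok: bool, now: float) -> None:
        """Record one pull outcome and trip the breaker if the window is too noisy."""
        self._outcomes.append(ok)
        if ok:
            self._trips = 0
            return
        window_full = len(self._outcomes) == self._outcomes.maxlen
        error_rate = self._outcomes.count(False) / len(self._outcomes)
        if window_full and error_rate > self._max_error_rate:
            backoff = min(self._max, self._base * 2 ** self._trips)
            self._trips += 1
            self._open_until = now + backoff
            self._outcomes.clear()  # start a fresh window after tripping
```

An adapter would call `allow(now)` before each poll and `record(ok, now)` after it; emitting the `data.ingest.source_down` audit event when the breaker trips is left out of the sketch.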
7.9 Channel Router
The Channel Router is the first hop after the gateway accepts a payload. Its job is to decide which extractor lane the payload belongs to, which tenant and region it lives in, and which rate budget and markings apply, before any extractor touches the bytes. It is intentionally thin and stateless. Every routing decision is a typed AuditEvent.

Inputs
- Wire-level metadata (transport, source IP, signed sender).
- Authenticated principal and its tenant scope.
- Content type, MIME, magic-bytes, file extension.
- Channel hint declared by the gateway (e.g. `email.inbound`, `partner.webhook`).

Outputs
- Typed `ChannelRoute`: `{tenant, region, channel, lane, markings, rateClass}`.
- One or more lane handles (a structured payload can fan out: parse the email body in the email-parser lane and route attachments through the OCR / VLM lane).
- Audit emission: `data.route.assigned` with route handle and decision reasons.

| Channel class | Default lane | Fan-out lanes |
|---|---|---|
| Email · inbound | email-parser | OCR / VLM (attachments) · structured-parser (forms) |
| Chat / SMS | chat-parser | — |
| Phone / voicemail / IVR | voice-transcription | chat-parser (post-transcript) |
| Web / mobile form | structured-parser | OCR / VLM (uploads) |
| Partner API / webhook | structured-parser | — |
| SFTP / batch | structured-parser or OCR / VLM by content type | — |
| Telematics / sensor | stream-consumer | — |
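
A stateless routing function along the lines of the table above might look like the sketch below. Only `email.inbound` and `partner.webhook` appear in the text as channel hints; the other hint names, the lane identifier `ocr-vlm`, the MIME-based fan-out rule, and the shape of `markings` are assumptions. The post-transcript fan-out for voice and the forms fan-out for email are left out for brevity.

```python
from dataclasses import dataclass

# Default lane per channel class. Keys other than "email.inbound" and
# "partner.webhook" are hypothetical hint names.
DEFAULT_LANE = {
    "email.inbound": "email-parser",
    "chat.sms": "chat-parser",
    "voice.call": "voice-transcription",
    "web.form": "structured-parser",
    "partner.webhook": "structured-parser",
    "telematics.stream": "stream-consumer",
}

# Assumption: these content types are what sends an attachment to OCR / VLM.
OCR_MIME_PREFIXES = ("image/", "application/pdf")


@dataclass(frozen=True)
class ChannelRoute:
    tenant: str
    region: str
    channel: str
    lane: str
    markings: tuple
    rate_class: str


def route(tenant, region, channel, attachment_mimes=(), rate_class="default"):
    """Return one ChannelRoute per lane: the default lane first, then fan-out lanes."""
    if channel not in DEFAULT_LANE:
        raise ValueError(f"unknown channel: {channel}")
    lanes = [DEFAULT_LANE[channel]]
    # Fan-out: document-like attachments or uploads also go through OCR / VLM.
    if channel in ("email.inbound", "web.form") and any(
        mime.startswith(OCR_MIME_PREFIXES) for mime in attachment_mimes
    ):
        lanes.append("ocr-vlm")
    markings = (tenant, region)  # assumption: initial markings are tenant and region
    return [ChannelRoute(tenant, region, channel, lane, markings, rate_class) for lane in lanes]
```

The function keeps no state between calls, which matches the "thin and stateless" description; emitting `data.route.assigned` would wrap each call.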
7.10 Agentic OCR
The OCR / VLM lane is not a single OCR engine; it is an agentic extraction pipeline that selects and combines extraction tools per document. The lane supports printed text, handwriting, scanned forms, photographs, diagrams, tables, and mixed-content multi-page documents.

Pipeline stages
- Pre-processing — orientation, deskew, despeckle, page split, segmentation.
- Layout analysis — block / line / table / figure regions; reading order.
- Primary extraction — printed-text OCR, handwriting OCR, table parser, signature detector, stamp detector. Each is a versioned tool (§9 · Extraction).
- VLM fallback — for low-confidence regions or non-textual content (images, diagrams), a vision-language model emits structured descriptions.
- Cross-extractor reconciliation — outputs are reconciled by an agent that selects the most-supported value per field; the reasoning trail (§10.3) records why.
- Provenance emission — every emitted property carries an `EvidenceSpan` (page, bbox, optionally token range) plus extractor identity and confidence.

Properties
- Confidence per field is calibrated per extractor and per document class (§8.3).
- Every reconciliation decision is an AuditEvent (`data.ocr.reconcile`).
- Originals are retained at content-address; an extraction can be re-run against a newer extractor version without losing the original lineage.
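
The reconciliation stage can be sketched as a vote across extractors. The text says the agent selects the "most-supported" value but does not define support, so summing confidences per candidate value is an assumption here, as are the `Candidate` type, the `reconcile` function and the example field name.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceSpan:
    page: int
    bbox: tuple  # (x0, y0, x1, y1) in page coordinates


@dataclass(frozen=True)
class Candidate:
    field_name: str
    value: str
    confidence: float
    extractor: str
    span: EvidenceSpan


def reconcile(candidates):
    """Pick, per field, the value with the highest summed confidence."""
    by_field = defaultdict(lambda: defaultdict(list))
    for cand in candidates:
        by_field[cand.field_name][cand.value].append(cand)
    result = {}
    for field_name, by_value in by_field.items():
        value, supporters = max(
            by_value.items(),
            key=lambda item: sum(c.confidence for c in item[1]),
        )
        # Keep the supporting candidates so provenance travels with the value.
        result[field_name] = {"value": value, "evidence": supporters}
    return result


candidates = [
    Candidate("loss_date", "2024-01-05", 0.6, "printed-ocr", EvidenceSpan(1, (10, 20, 90, 32))),
    Candidate("loss_date", "2024-01-05", 0.5, "vlm", EvidenceSpan(1, (10, 20, 90, 32))),
    Candidate("loss_date", "2024-01-06", 0.9, "handwriting-ocr", EvidenceSpan(1, (10, 20, 90, 32))),
]
winner = reconcile(candidates)["loss_date"]
```

In this example two weaker extractors that agree outweigh one stronger extractor that disagrees, so `winner["value"]` is `"2024-01-05"` with two evidence entries. Calibrated confidences (§8.3) would matter a great deal for a rule like this.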
7.11 Voice Transcription
Voice channels (recorded calls, voicemail, IVR, agent-customer dialog, claim FNOL phone intake, broker phone submissions) are routed into the voice-transcription lane. The lane produces a typed `Transcript` with timing, speaker labels, language tag, redactions, and line-by-line confidence.

Pipeline stages
- Pre-processing — channel split, silence trim, format normalisation.
- ASR — speech-to-text in the language detected; configurable per region and per LOB.
- Diarisation — speaker turns (agent / customer / third-party).
- Translation (optional) — into the operator-display language; original transcript retained.
- Redaction — PII / PHI tokens are redacted at the lane boundary per markings policy (§15.4); raw audio retention follows the configured per-tenant retention floor.
- Hand-off — the transcript is published to the Unified Intake Queue (§7.12) typed as `VoiceTranscript`, with a backref to the original recording.
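
The shape of the lane's output and the redaction step can be sketched as follows. The field names are assumptions based on the properties listed above, and the regular expression that masks long digit runs is only a crude stand-in for a real PII / PHI detector driven by markings policy.

```python
import re
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Segment:
    start_s: float
    end_s: float
    speaker: str       # e.g. "agent", "customer", "third-party"
    text: str
    confidence: float  # per-line confidence


@dataclass
class VoiceTranscript:
    recording_ref: str   # backref to the original recording
    language: str
    segments: list = field(default_factory=list)
    redaction_count: int = 0


# Assumption: runs of seven or more digits are treated as sensitive.
DIGIT_RUN = re.compile(r"\d{7,}")


def redact(transcript: VoiceTranscript) -> VoiceTranscript:
    """Return a redacted copy; the input transcript is left unchanged."""
    redacted, count = [], 0
    for seg in transcript.segments:
        text, hits = DIGIT_RUN.subn("[REDACTED]", seg.text)
        count += hits
        redacted.append(Segment(seg.start_s, seg.end_s, seg.speaker, text, seg.confidence))
    return VoiceTranscript(transcript.recording_ref, transcript.language, redacted, count)
```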
7.12 Unified Intake Queue
The Unified Intake Queue is the single, ordered, typed surface that all extractor lanes publish into and that downstream planes (Ontology, Reasoning) consume from. It is what makes the substrate channel-agnostic above the queue: an underwriting agent or a claims agent does not know whether the originating signal arrived as an email attachment, an IVR call, a partner webhook, or a sensor event; it sees only typed ontology objects with provenance.

Properties
- Typed: every queued event is an Ontology-typed payload with provenance.
- Per-tenant: queues are tenant-isolated; cross-tenant fan-out is impossible.
- Per-region: queues are region-bound; cross-region transit requires explicit policy.
- Ordered with idempotency: the dedupe key (§7.5) is honoured at queue write; re-publishes are no-ops.
- Backpressure-aware: lag SLOs apply (§7.7); slow lanes are first-class.
- Auditable: every enqueue and dequeue is an AuditEvent (`data.intake.publish`, `data.intake.consume`).

The Channel Router establishes region at the gateway, every extractor lane runs
region-pinned, and the Unified Intake Queue is region-bound. A payload that arrives in
an EU mailbox is parsed by EU extractors, queued in the EU intake queue, and consumed
by EU agents using EU-pinned models — with no cross-region transit. The same
pattern holds for any region: APAC, LATAM, sovereign clouds. Region is therefore not a
deployment afterthought but a property carried by every event from byte zero. (See §19,
§23.2.)
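
The queue properties listed in this section (tenant and region isolation, ordered delivery, idempotent publish, audited enqueue and dequeue) can be sketched with an in-memory stand-in. The class and method names are hypothetical, and emitting an audit event only for publishes that actually enqueue is an assumption.

```python
from collections import defaultdict, deque


class UnifiedIntakeQueue:
    def __init__(self) -> None:
        self._queues = defaultdict(deque)  # (tenant, region) -> ordered events
        self._seen = defaultdict(set)      # (tenant, region) -> dedupe keys already written
        self.audit_log = []

    def publish(self, tenant, region, dedupe_key, payload) -> bool:
        """Enqueue once per dedupe key; a re-publish is a no-op and returns False."""
        partition = (tenant, region)
        if dedupe_key in self._seen[partition]:
            return False
        self._seen[partition].add(dedupe_key)
        self._queues[partition].append((dedupe_key, payload))
        self.audit_log.append(("data.intake.publish", tenant, region, dedupe_key))
        return True

    def consume(self, tenant, region):
        """Pop the oldest event for this tenant and region, or return None if empty."""
        partition = (tenant, region)
        if not self._queues[partition]:
            return None
        dedupe_key, payload = self._queues[partition].popleft()
        self.audit_log.append(("data.intake.consume", tenant, region, dedupe_key))
        return payload
```

Because every operation is keyed by `(tenant, region)`, a consumer in one region has no way to read events published in another, which is the isolation property the section describes.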

