
5 — Data pipeline & ingestion — end-to-end flow, event-driven triggers & document intelligence.

This section explains how application data flows into the Layerup AI Agent, how the agent processes that data, and how it surfaces structured intelligence back to your systems of record. The entire pipeline is event-driven, integrates with your existing cloud-native infrastructure, and at no point requires data to leave your network boundary.

5.1 End-to-end data flow — the full lifecycle of a single case

The following describes the complete lifecycle of a single application case as it flows through the Layerup agent in an AWS environment. The Azure equivalent is structurally identical, with Azure service equivalents substituted at each stage.
| Stage | Description |
| --- | --- |
| Step 1: Intake | An application packet (PDFs, medical records, Rx history, APS documents, financial statements) arrives at your existing intake system. The packet is stored in the designated S3 input bucket and an intake metadata record is published as an SQS message. |
| Step 2: Trigger | An EventBridge rule (or the SQS trigger directly) detects the new SQS message and starts the Layerup agent container execution, passing the S3 object key(s) as parameters. |
| Step 3: Document Retrieval | The agent’s container reads the application documents from S3 using its scoped IAM role. Documents never leave your AWS environment; the agent reads them into its isolated execution memory. |
| Step 4: Ingestion & Normalization | The agent’s ingestion layer processes each document: PDFs are parsed (scanned documents go through OCR), data is extracted, normalized against the underwriting configuration schema, and structured for reasoning. |
| Step 5: LLM Reasoning (via Bedrock) | The agent constructs prompts against its configured Agent Operating Procedure (AOP) and invokes the designated foundation model via Amazon Bedrock’s private VPC endpoint. Bedrock Guardrails are applied at this layer. |
| Step 6: Structured Output Generation | The agent assembles its final structured output: the underwriting recommendation, confidence score, flagged inconsistencies, evidence citations, open questions, and requirement list. |
| Step 7: Write-Back | The output JSON payload is written to your designated S3 output bucket. Simultaneously, a completion event is written to SQS / EventBridge, which your downstream system (policy admin, CRM) consumes to update the case status. |
| Step 8: Audit Logging | All agent reasoning steps, tool invocations, source citations, and model responses are written to CloudWatch Logs in structured JSON format for audit and compliance review. |
Fig. A5.1 — End-to-end data flow. Every stage executes within your network boundary. No data transits to Layerup’s infrastructure at any point.
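
To make the Step 6 output more concrete, the following is a minimal sketch of how such a payload could be modeled and serialized. The class names, field names, and sample values are all hypothetical; they mirror the six output elements listed in Step 6 and are not the agent’s actual schema.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class EvidenceCitation:
    # Hypothetical shape: which document and field a conclusion rests on.
    document: str
    field_name: str
    uncertain: bool = False  # True when the extraction was low-confidence


@dataclass
class CaseOutput:
    # Hypothetical field names mirroring the Step 6 output elements.
    case_id: str
    recommendation: str
    confidence_score: float
    flagged_inconsistencies: list = field(default_factory=list)
    evidence_citations: list = field(default_factory=list)
    open_questions: list = field(default_factory=list)
    requirements: list = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() also converts the nested EvidenceCitation objects.
        return json.dumps(asdict(self), indent=2)


example = CaseOutput(
    case_id="case-0001",
    recommendation="refer_to_underwriter",
    confidence_score=0.72,
    flagged_inconsistencies=["Occupation on application differs from job letter"],
    evidence_citations=[EvidenceCitation("application_form.pdf", "occupation")],
    open_questions=["Confirm current occupation duties"],
    requirements=["Updated job letter"],
)
print(example.to_json())
```

A downstream consumer would read this JSON from the S3 output bucket after receiving the completion event described in Step 7.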

5.2 Ingestion mechanisms — event-driven, no polling required

The Layerup agent is triggered by events within your existing event infrastructure. No polling mechanism is required. Supported trigger patterns include:

5.2.1 Event-driven architecture

AWS SQS + S3 Event Notifications

When a new application packet is uploaded to the designated S3 input bucket, S3 publishes an event notification to an SQS queue. The agent’s ECS Service or AgentCore Runtime is configured with an SQS trigger that automatically starts a new agent task for each message. This is the recommended trigger pattern for most workloads.
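
As an illustration of what a consumer of this queue receives, the sketch below extracts the bucket and object key from an S3 event notification delivered as an SQS message body. The function name and the sample bucket and key are hypothetical; the `Records` / `s3.bucket.name` / `s3.object.key` layout follows the standard S3 notification format, in which object keys are URL-encoded.

```python
import json
from urllib.parse import unquote_plus


def extract_s3_objects(sqs_message_body: str) -> list[tuple[str, str]]:
    """Return (bucket, key) pairs from an S3 event notification delivered via SQS."""
    event = json.loads(sqs_message_body)
    objects = []
    # Messages without "Records" (for example S3's initial test event) yield nothing.
    for record in event.get("Records", []):
        # Skip anything that is not an object-created event, such as deletions.
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # Keys arrive URL-encoded, with spaces sent as "+".
        key = unquote_plus(record["s3"]["object"]["key"])
        objects.append((bucket, key))
    return objects


sample_body = json.dumps({
    "Records": [{
        "eventSource": "aws:s3",
        "eventName": "ObjectCreated:Put",
        "s3": {
            "bucket": {"name": "example-input-bucket"},
            "object": {"key": "intake/case-0001/APS+report.pdf"},
        },
    }]
})
print(extract_s3_objects(sample_body))
```

Decoding the key before calling S3 matters for packets whose file names contain spaces or other reserved characters; otherwise the lookup would target a key that does not exist.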

AWS EventBridge

For more complex routing logic (e.g., routing different application types to different agent configurations or versions), EventBridge Rules filter S3 events by prefix, metadata tag, or application type, and route each to the appropriate agent version. Useful when multiple agent configurations handle distinct case types.
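
A minimal sketch of prefix-based routing, assuming the input bucket has EventBridge notifications enabled so that S3 emits "Object Created" events. The bucket name, key prefixes, and agent configuration names are hypothetical; the pattern structure uses EventBridge’s standard prefix matching on `detail.object.key`.

```python
import json


def build_prefix_pattern(bucket: str, key_prefix: str) -> dict:
    """Event pattern matching S3 'Object Created' events under one key prefix."""
    return {
        "source": ["aws.s3"],
        "detail-type": ["Object Created"],
        "detail": {
            "bucket": {"name": [bucket]},
            "object": {"key": [{"prefix": key_prefix}]},
        },
    }


# Hypothetical mapping of key prefixes to agent configurations.
ROUTES = {
    "intake/term-life/": "agent-config-term-life-v2",
    "intake/disability/": "agent-config-disability-v1",
}

# One pattern per target; each would back its own EventBridge rule.
patterns = {
    target: build_prefix_pattern("example-input-bucket", prefix)
    for prefix, target in ROUTES.items()
}
print(json.dumps(patterns["agent-config-term-life-v2"], indent=2))
```

Each pattern would be serialized with `json.dumps` and supplied as the `EventPattern` of a rule whose target is the matching agent version. Keeping the prefix-to-configuration mapping in one place makes it easier to add a case type without touching the others.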

API Gateway (Synchronous)

If your existing workflow system requires synchronous invocation (i.e., it calls an endpoint and waits for a response), an API Gateway endpoint can be placed in front of the agent, with a Lambda function bridging the synchronous request to the asynchronous agent run. Given typical processing times of 7–55 minutes, far longer than a caller can reasonably hold an HTTP request open, the asynchronous pattern with a callback webhook is generally preferred for production workloads.
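
The asynchronous pattern can be sketched as three handlers: one that accepts a submission and returns immediately, one that reports status, and one that records completion and prepares the callback. This is an in-memory illustration only; the function names, response shapes, and callback payload are assumptions, and a real deployment would use a durable job store and an actual HTTP POST to the webhook.

```python
import uuid

# In-memory stand-in for a durable job store (hypothetical).
_jobs: dict[str, dict] = {}


def submit_case(s3_keys: list[str], callback_url: str) -> dict:
    """Accept a case and return at once with a job id (HTTP 202 Accepted)."""
    job_id = str(uuid.uuid4())
    _jobs[job_id] = {
        "status": "queued",
        "s3_keys": s3_keys,
        "callback_url": callback_url,
        "result": None,
    }
    return {"statusCode": 202, "body": {"job_id": job_id}}


def get_status(job_id: str) -> dict:
    """Let the caller check progress instead of holding a connection open."""
    job = _jobs.get(job_id)
    if job is None:
        return {"statusCode": 404, "body": {"error": "unknown job"}}
    return {"statusCode": 200, "body": {"job_id": job_id, "status": job["status"]}}


def complete_case(job_id: str, result: dict) -> dict:
    """Record the result and describe the webhook call that would be made."""
    job = _jobs[job_id]
    job["status"] = "completed"
    job["result"] = result
    # Returned rather than sent, so the sketch stays free of network calls.
    return {
        "url": job["callback_url"],
        "json": {"job_id": job_id, "status": "completed", "result": result},
    }
```

The caller receives the job id within milliseconds and learns the outcome either from the webhook or by checking status, so no request needs to stay open for the duration of processing.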

Azure Service Bus + Blob Trigger

On Azure, Blob Storage triggers an Azure Service Bus message on new document upload. Azure Container Apps’ KEDA (Kubernetes Event-Driven Autoscaling) consumes the Service Bus queue and scales agent container instances accordingly. Structurally equivalent to the SQS + S3 Event Notifications pattern on AWS.

5.2.2 Handling unstructured and poor-quality documents

A core capability of the Layerup AI Agents is the ability to ingest and extract meaningful intelligence from the imperfect document formats endemic to the insurance industry.
  • Multi-pass OCR pipeline: Scanned PDFs and handwritten documents are processed through a multi-pass OCR pipeline before LLM reasoning. The OCR output is normalized and confidence-scored before being passed to the reasoning layer. Low-confidence OCR extractions are flagged as uncertain in the agent’s evidence citations; the agent does not silently propagate low-quality extractions into its reasoning.
  • Multi-document context assembly: The agent’s ingestion layer reads across multiple documents simultaneously (application form, attending physician statements (APS), Rx history, financial statements) and constructs a unified case context. Cross-document inconsistencies (for example, a reported occupation on the application form that contradicts the duties described in a supporting job letter) are surfaced proactively in the agent’s output.
  • Third-party data feed integration: Structured data from licensed third-party providers (Milliman, pharmacy claims databases, MIB) can be integrated via direct API connector or scheduled file drop, depending on the provider’s delivery mechanism. The agent’s ingestion layer normalizes these feeds into its unified case schema alongside the unstructured documents from your intake system.
  • Ambiguity flagging: When document quality or clarity prevents confident extraction, the agent flags the specific field as uncertain rather than hallucinating a value. These flags are surfaced in the requirements and evidence citation layers of the output payload, directing the human underwriter to the specific document and field requiring manual review.
Fig. A5.2: Ingestion layer architecture. Every document type, whether structured or unstructured, clean or degraded, passes through the same normalization and ambiguity-flagging pipeline before entering the reasoning layer.
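
The flagging behavior described above can be sketched as follows. This is an illustrative simplification, not the agent’s implementation: the `Extraction` type, the 0.80 threshold, and the exact-match conflict check are all assumptions. In particular, the occupation example in the text requires semantic comparison, which a string comparison does not provide.

```python
from dataclasses import dataclass

UNCERTAIN_BELOW = 0.80  # hypothetical OCR confidence threshold


@dataclass
class Extraction:
    document: str
    field_name: str
    value: str
    ocr_confidence: float  # assumed to be in the range 0.0 to 1.0


def split_by_confidence(extractions, threshold=UNCERTAIN_BELOW):
    """Separate trusted extractions from those needing manual review."""
    trusted, needs_review = [], []
    for item in extractions:
        target = trusted if item.ocr_confidence >= threshold else needs_review
        target.append(item)
    return trusted, needs_review


def find_conflicts(extractions):
    """Map each field to its distinct values when documents disagree."""
    values = {}
    for item in extractions:
        values.setdefault(item.field_name, set()).add(item.value.strip().lower())
    return {name: vals for name, vals in values.items() if len(vals) > 1}


case = [
    Extraction("application_form.pdf", "occupation", "Office manager", 0.97),
    Extraction("job_letter.pdf", "occupation", "Roofing foreman", 0.93),
    Extraction("aps_scan.pdf", "diagnosis_date", "2O19-03-1l", 0.41),
]
trusted, needs_review = split_by_confidence(case)
print([e.field_name for e in needs_review])
print(find_conflicts(trusted))
```

Running the conflict check only on trusted extractions keeps a garbled OCR value from being reported as a genuine disagreement between documents; the garbled value is routed to manual review instead.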

5.3 Staged integration patterns

Most organizations do not begin with a fully automated end-to-end deployment. They adopt the agent progressively — starting with manual case submission and advancing toward full system integration as confidence is established and IT integration work matures. The three stages below represent the typical adoption path. Each stage is a production-ready deployment, not a prototype — organizations may choose to remain at any stage indefinitely depending on their workflow requirements.

Stage 1 — Manual / standalone (dashboard-first)

What it is: Your underwriting team submits cases directly through the companion dashboard. Documents are uploaded manually, the agent processes the case, and the underwriter reviews the structured output in the dashboard UI. No connection to your existing policy administration system, DMS, or CRM is required.
When to use it: When you want to begin using the agent immediately without waiting for downstream integration work to complete. This is the fastest path to your team seeing real agent output on real cases.
What your IT team needs to set up:
  • Deploy the agent container and dashboard container into your VPC
  • Configure the S3 input/output buckets and SQS intake/completion queues
  • Set up an internal ALB and VPN access for your underwriting team
  • No changes to your existing policy admin system or workflow routing
Limitation: Case submission is manual. Volume is bounded by the number of underwriters actively using the dashboard. No automated handoff to your policy admin system.

Stage 2 — Queue-triggered (system-initiated, output consumed manually or via polling)

What it is: Your existing workflow system automatically enqueues new cases to the SQS intake queue when a submission arrives. The agent processes cases without any manual trigger. Output is written to the S3 output bucket and a completion event is published to the SQS completion queue. Your downstream system either polls S3 for the output JSON or subscribes to the completion queue.
When to use it: When your intake system can be updated to write an SQS message on new case submission, but your policy admin system is not yet capable of consuming structured JSON output from S3 or a webhook in real time. The dashboard can remain deployed at this stage as an output review interface; your underwriters see processed cases there even though submission is now automated.
What your IT team needs to set up:
  • Stage 1 infrastructure (agent container, S3 buckets, SQS queues)
  • Update your intake system or policy admin system to write an SQS message on new case submission (a single event publish call)
  • Optionally: update your downstream system to subscribe to the SQS completion queue and pull output from S3
Limitation: Output consumption may still be semi-manual if your downstream system is not yet subscribed to the completion queue. Cases may require an underwriter to retrieve the output from S3 or the dashboard and manually update the policy admin system.
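
The intake-side change for this stage amounts to publishing one message per case. The sketch below builds the arguments for that call; the message body fields (`case_id`, `bucket`, `keys`), the queue URL, and the helper name are hypothetical, since the intake message schema is not specified here. `QueueUrl` and `MessageBody` are the standard parameters of the SQS `send_message` operation.

```python
import json


def build_intake_message(queue_url: str, case_id: str, bucket: str, keys: list[str]) -> dict:
    """Build keyword arguments for an SQS send_message call for one new case."""
    if not keys:
        raise ValueError("a case needs at least one document key")
    # Hypothetical body layout: enough for the agent to locate the packet in S3.
    body = {"case_id": case_id, "bucket": bucket, "keys": keys}
    return {"QueueUrl": queue_url, "MessageBody": json.dumps(body)}


message = build_intake_message(
    queue_url="https://sqs.us-east-1.amazonaws.com/123456789012/example-intake-queue",
    case_id="case-0001",
    bucket="example-input-bucket",
    keys=["intake/case-0001/application.pdf", "intake/case-0001/aps.pdf"],
)
print(message["MessageBody"])

# With boto3 installed and credentials configured, publishing would be:
#   import boto3
#   boto3.client("sqs").send_message(**message)
```

Separating message construction from the publish call keeps the payload testable without AWS access.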

Stage 3 — Full automation (end-to-end, no manual touchpoint below confidence threshold)

What it is: The complete automated loop. New business submissions trigger the agent automatically; the agent’s structured output is written to S3 and consumed by your policy admin system via the completion queue in real time. Cases above the configured high-confidence threshold are handled according to your auto-resolve policy (see Confidence Engine for threshold configuration). Cases below threshold route to your underwriter queue for human review. No manual touchpoint occurs for auto-resolved cases.
When to use it: When your downstream systems can consume structured JSON output from S3 or via webhook, your underwriting governance team has validated auto-resolve thresholds on real production data, and your team has completed Phase 1–2 of the CI/CD rollout lifecycle.
What your IT team needs to set up:
  • Stage 2 infrastructure
  • Full SQS completion queue integration with your policy admin system (read output JSON from S3 and apply disposition)
  • Webhook or queue-based routing of below-threshold cases to your underwriter review queue
  • Ongoing monitoring of auto-resolve rate and error rate via CloudWatch dashboards
Limitation: Requires the most IT coordination and the most governance validation before go-live. Do not skip Stage 1 and Stage 2: the calibration data you collect in those stages is what validates your auto-resolve threshold before full automation is appropriate.
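
The threshold routing for this stage, and the auto-resolve rate it produces, can be sketched as below. The sketch assumes confidence scores in the range 0.0 to 1.0 and sends a score exactly equal to the threshold to human review; both are assumptions, as are the function names and route labels.

```python
def route_case(confidence_score: float, auto_resolve_threshold: float) -> str:
    """Decide whether a case is auto-resolved or sent to an underwriter."""
    if not 0.0 <= confidence_score <= 1.0:
        raise ValueError(f"confidence score out of range: {confidence_score}")
    # Strictly above the threshold auto-resolves; ties go to human review.
    if confidence_score > auto_resolve_threshold:
        return "auto_resolve"
    return "underwriter_review"


def auto_resolve_rate(scores: list[float], auto_resolve_threshold: float) -> float:
    """Fraction of cases that would be auto-resolved at a given threshold."""
    if not scores:
        return 0.0
    resolved = sum(
        1 for score in scores
        if route_case(score, auto_resolve_threshold) == "auto_resolve"
    )
    return resolved / len(scores)


historical_scores = [0.95, 0.50, 0.99, 0.20]  # illustrative values only
print(route_case(0.95, 0.90))
print(auto_resolve_rate(historical_scores, 0.90))
```

Replaying scores collected during Stage 1 and Stage 2 through a function like `auto_resolve_rate` is one way to see how a candidate threshold would have behaved before enabling it in production.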

Choosing your starting stage

| Constraint | Recommended starting stage |
| --- | --- |
| No IT integration resources available right now | Stage 1 (dashboard only) |
| Intake system can publish an SQS event; policy admin not yet ready | Stage 2 (queue-triggered) |
| Full system integration complete; governance team has validated thresholds | Stage 3 (full automation) |
| Piloting across two lines with different IT maturity | Stage 1 for one line, Stage 2 for the other, independently configured |

Stages are not mutually exclusive across lines of business. You can run Stage 1 for a new line and Stage 3 for a mature line in the same AWS account and the same VPC — each line has its own SQS queues, S3 buckets, and agent configuration. The infrastructure is additive.