
# Error Handling & Resilience

> The Layerup agent never silently fails. Every error condition — OCR failure, model timeout, guardrail intervention, low confidence — results in a deterministic, logged fallback action and human escalation.

# 8 — Error handling & resilience — deterministic fallback, escalation protocols & failure-mode taxonomy.

The Layerup agent is designed to never silently fail. This is not an operational aspiration — it is a structural property enforced by the agent's output assembly architecture. Every error condition results in a deterministic fallback action that preserves case integrity, writes a structured error record to your audit log, and ensures a human underwriter is notified.

***

## 8.1  Design principle — fail deterministically, never silently

The following invariant governs all error handling logic in the Layerup agent:

<Warning>
  Any error condition that prevents the agent from producing a high-confidence, fully evidence-grounded output for a case results in an explicit, structured escalation to a human underwriter. The agent will never produce a confident-looking output for a case it cannot process correctly. Silent failures, suppressed exceptions, and optimistic outputs under uncertainty are architecturally excluded.
</Warning>

This means the agent's output always falls into one of three categories:

1. **Full recommendation** — All required documents processed, confidence above threshold, evidence grounded, requirements complete.
2. **Partial recommendation with flags** — Some data points uncertain, some documents degraded, but overall confidence sufficient. Output includes explicit flags directing the underwriter to the uncertain items.
3. **Escalation** — Confidence below threshold, critical extraction failure, or edge case outside AOP coverage. Case is routed to a senior underwriter with full context.

There is no fourth category.
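
As an illustration, a downstream consumer might model the three categories as a closed set. This is a minimal sketch, not the Layerup payload schema: `OutputCategory`, `AgentOutput`, and the field names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class OutputCategory(Enum):
    """The three output categories described above (names are illustrative)."""
    FULL = "full_recommendation"
    PARTIAL = "partial_recommendation_with_flags"
    ESCALATION = "escalation"


@dataclass(frozen=True)
class AgentOutput:
    """Hypothetical output envelope; field names are assumptions."""
    category: OutputCategory
    flags: tuple = ()
    escalation_reasons: tuple = ()

    def __post_init__(self) -> None:
        # A partial recommendation must tell the underwriter what is uncertain.
        if self.category is OutputCategory.PARTIAL and not self.flags:
            raise ValueError("partial recommendation requires at least one flag")
        # An escalation must state why the case was routed to a human.
        if self.category is OutputCategory.ESCALATION and not self.escalation_reasons:
            raise ValueError("escalation requires at least one reason")
```

Because the category is an enum with exactly three members, an output that fits none of them cannot be constructed in this sketch.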

***

## 8.2  Document processing failures

### OCR failure

If the OCR pipeline cannot extract readable text from a document with confidence above the configured threshold:

1. The agent logs a structured warning citing the specific document filename and page number.
2. The agent continues processing all other documents in the case — partial failure does not abort the case.
3. The failed document is included in the `requirements` list as requiring manual review by the underwriter, with explicit notation that OCR confidence was insufficient.
4. The agent does not attempt to reason over unreadable content. Uncertain fields from failed OCR extractions are explicitly marked as unresolved in the output payload.
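
The four steps above can be sketched as a per-page triage loop. This is an illustrative sketch under stated assumptions: the threshold value, the `OcrPage` and `CaseResult` types, and the log field names are hypothetical and not taken from the Layerup implementation.

```python
import logging
from dataclasses import dataclass, field

logger = logging.getLogger("ocr_triage_sketch")

# Assumed value for illustration; the real threshold is configured per deployment.
OCR_CONFIDENCE_THRESHOLD = 0.80


@dataclass
class OcrPage:
    filename: str
    page_number: int
    confidence: float
    text: str


@dataclass
class CaseResult:
    readable_pages: list = field(default_factory=list)
    requirements: list = field(default_factory=list)
    unresolved: list = field(default_factory=list)


def triage_ocr_pages(pages: list) -> CaseResult:
    """Separate readable pages from pages that need manual review."""
    result = CaseResult()
    for page in pages:
        if page.confidence < OCR_CONFIDENCE_THRESHOLD:
            # Step 1: structured warning naming the document and page.
            logger.warning(
                "ocr_confidence_insufficient",
                extra={"document": page.filename, "page": page.page_number,
                       "ocr_confidence": page.confidence},
            )
            # Step 3: add a manual-review requirement for the underwriter.
            result.requirements.append(
                f"Manual review: {page.filename} p.{page.page_number} "
                "(OCR confidence insufficient)"
            )
            # Step 4: mark the content unresolved; its text is never passed on.
            result.unresolved.append(f"{page.filename}#page={page.page_number}")
            # Step 2: keep processing the remaining pages.
            continue
        result.readable_pages.append(page)
    return result
```

Only `readable_pages` would be handed to the reasoning stage, so unreadable content cannot influence the recommendation.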

### Unsupported document format

If the agent encounters a document format outside its supported types (e.g., a proprietary medical record format, a password-protected PDF, or a corrupted file):

1. The error is logged with the specific document identifier, format, and error type.
2. The document is included in an `unprocessed_attachments` list in the output payload.
3. The underwriter is directed to review the document manually, with the specific reason for non-processing stated.
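
A minimal sketch of how an entry in `unprocessed_attachments` might be produced follows. The supported-suffix set, the `UnprocessedAttachment` fields, and the `encrypted` / `corrupted` inputs are assumptions for illustration; a real pipeline would detect those conditions itself.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

# Assumed set of supported types; the actual list is deployment-specific.
SUPPORTED_SUFFIXES = {".pdf", ".png", ".jpg", ".jpeg", ".tiff"}


@dataclass(frozen=True)
class UnprocessedAttachment:
    document_id: str
    detected_format: str
    error_type: str  # "unsupported_format", "password_protected", or "corrupted"
    reason: str      # human-readable explanation shown to the underwriter


def check_attachment(document_id: str, filename: str, *,
                     encrypted: bool = False,
                     corrupted: bool = False) -> Optional[UnprocessedAttachment]:
    """Return an UnprocessedAttachment if the file cannot be processed, else None."""
    suffix = Path(filename).suffix.lower() or "(none)"
    if corrupted:
        return UnprocessedAttachment(document_id, suffix, "corrupted",
                                     "File could not be parsed; review the original.")
    if encrypted:
        return UnprocessedAttachment(document_id, suffix, "password_protected",
                                     "File is password-protected; review manually.")
    if suffix not in SUPPORTED_SUFFIXES:
        return UnprocessedAttachment(document_id, suffix, "unsupported_format",
                                     f"Format {suffix} is not supported; review manually.")
    return None
```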

***

## 8.3  LLM inference failures

### Model timeout

If an Amazon Bedrock or Azure OpenAI inference call exceeds the configured timeout (default: 120 seconds):

1. The agent implements an exponential backoff retry policy with a maximum of **3 retries**.
2. Retry intervals: 5s → 15s → 45s (with jitter), ensuring SQS / Service Bus visibility timeout is not breached during the retry window.
3. If all 3 retries fail, the case is marked as `Failed` and routed to the human escalation queue with full error context.
4. The failure event is written to CloudWatch Logs with millisecond-precision timestamps for each retry attempt.

| attempt         | wait before attempt   | action on timeout                         |
| --------------- | --------------------- | ----------------------------------------- |
| Initial call    | None (immediate)      | Log failure · retry                       |
| Retry 1         | 5 seconds (+ jitter)  | Log failure · retry                       |
| Retry 2         | 15 seconds (+ jitter) | Log failure · retry                       |
| Retry 3 (final) | 45 seconds (+ jitter) | Mark `Failed` · route to escalation queue |
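
The retry schedule can be sketched as follows. This is an illustrative sketch, not the agent's code: the jitter magnitude (`MAX_JITTER_FRACTION`) is not specified in this document and is assumed here, and `call_with_retries`, `CaseFailed`, and `worst_case_window_s` are hypothetical names.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

BASE_DELAYS_S = (5.0, 15.0, 45.0)  # waits before retries 1, 2, 3 (from the text)
CALL_TIMEOUT_S = 120.0             # default inference timeout (from the text)
MAX_JITTER_FRACTION = 0.2          # ASSUMPTION: jitter size is not documented


class CaseFailed(Exception):
    """All retries exhausted; the caller routes the case to the escalation queue."""


def call_with_retries(invoke: Callable[[], T], *,
                      sleep: Callable[[float], None] = time.sleep,
                      rng: Optional[random.Random] = None) -> T:
    """Call `invoke`, retrying on TimeoutError with backoff plus jitter."""
    rng = rng or random.Random()
    total_attempts = 1 + len(BASE_DELAYS_S)  # initial call + 3 retries
    for attempt in range(1, total_attempts + 1):
        try:
            return invoke()
        except TimeoutError as exc:
            if attempt == total_attempts:
                raise CaseFailed(f"timed out on all {total_attempts} attempts") from exc
            base = BASE_DELAYS_S[attempt - 1]
            sleep(base + rng.uniform(0, base * MAX_JITTER_FRACTION))
    raise AssertionError("unreachable")


def worst_case_window_s() -> float:
    """Upper bound on elapsed time: every attempt times out, jitter at maximum."""
    waits = sum(d * (1 + MAX_JITTER_FRACTION) for d in BASE_DELAYS_S)
    return (1 + len(BASE_DELAYS_S)) * CALL_TIMEOUT_S + waits
```

Under these assumptions the worst-case window is roughly 558 seconds (four 120-second timeouts plus up to 78 seconds of waiting), which gives a lower bound for the queue's visibility timeout if the message must stay invisible for the whole retry window.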

### Guardrail intervention

If Amazon Bedrock Guardrails or Azure AI Content Safety blocks an inference request:

1. The specific guardrail policy that triggered the block is logged in full detail to the audit log, including the blocked input and the policy type.
2. The agent **aborts that specific reasoning step** and does not attempt to rephrase or retry past the guardrail block. There is no code path that circumvents the guardrail.
3. The case is flagged for human review, with the guardrail intervention noted in the escalation flag and requirements list.

<Warning>
  The agent is explicitly prohibited from retrying past a guardrail block with a modified prompt. A guardrail block is treated as a terminal error for that reasoning step. This ensures guardrail policies cannot be circumvented through iterative prompt modification.
</Warning>
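
The terminal handling of a guardrail block can be sketched as below. `GuardrailBlocked`, `StepResult`, `run_reasoning_step`, and the audit record fields are hypothetical names used for illustration; they are not part of the Bedrock or Azure SDKs.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class GuardrailBlocked(Exception):
    """Hypothetical exception raised when a guardrail blocks an inference request."""

    def __init__(self, policy_type: str) -> None:
        super().__init__(f"guardrail blocked request: {policy_type}")
        self.policy_type = policy_type


@dataclass(frozen=True)
class StepResult:
    output: Optional[str]
    aborted: bool
    escalation_flag: Optional[str] = None


def run_reasoning_step(invoke: Callable[[str], str], prompt: str,
                       audit_log: list) -> StepResult:
    """Run one reasoning step; a guardrail block ends the step immediately."""
    try:
        return StepResult(output=invoke(prompt), aborted=False)
    except GuardrailBlocked as exc:
        # Record the policy type and the blocked input in the audit log.
        audit_log.append({
            "event": "guardrail_intervention",
            "policy_type": exc.policy_type,
            "blocked_input": prompt,
        })
        # No rephrasing and no retry: the step is aborted and the case flagged.
        return StepResult(output=None, aborted=True,
                          escalation_flag="guardrail_intervention")
```

The function contains no loop and no second call to `invoke`, which is what makes the block terminal in this sketch.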

### Model unavailability

If the underlying foundation model is unavailable (e.g., during an Amazon Bedrock or Azure OpenAI service incident):

1. The agent's SQS queue / Service Bus retains unprocessed messages for the duration of the outage. Messages are not dropped.
2. Once the service recovers, the agent processes the backlog in arrival order.
3. Your CloudWatch Alarm / Azure Monitor Alert notifies your operations team of the queue depth during an outage, with configurable thresholds for breach notification.
4. SQS Dead-Letter Queue (DLQ) / Service Bus Dead-Letter Queue captures any messages that exceed the maximum receive count — these are reviewed manually by your operations team.
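
For the SQS variant, retention and dead-lettering are controlled by queue attributes. The sketch below only builds the attribute map that would be passed to SQS (for example via `set_queue_attributes`); the default values and the function name are assumptions, and the ARN in the usage note is a placeholder. Note that SQS caps message retention at 14 days, so the guarantee that messages are not dropped holds within that window.

```python
import json


def build_queue_attributes(dlq_arn: str, *,
                           max_receive_count: int = 5,
                           visibility_timeout_s: int = 900,
                           retention_s: int = 1_209_600) -> dict:
    """Build SQS queue attributes for retention and dead-letter routing.

    Defaults are illustrative assumptions, not Layerup's configured values.
    """
    if not 60 <= retention_s <= 1_209_600:      # SQS limits: 60 s to 14 days
        raise ValueError("retention_s out of range for SQS")
    if not 0 <= visibility_timeout_s <= 43_200:  # SQS limits: 0 s to 12 hours
        raise ValueError("visibility_timeout_s out of range for SQS")
    if max_receive_count < 1:
        raise ValueError("max_receive_count must be at least 1")
    return {
        # SQS expects attribute values as strings.
        "VisibilityTimeout": str(visibility_timeout_s),
        "MessageRetentionPeriod": str(retention_s),
        # Messages received more than max_receive_count times move to the DLQ.
        "RedrivePolicy": json.dumps({
            "deadLetterTargetArn": dlq_arn,
            "maxReceiveCount": max_receive_count,
        }),
    }
```

Example usage with a placeholder ARN: `build_queue_attributes("arn:aws:sqs:us-east-1:123456789012:example-dlq")`.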

***

## 8.4  Flag-and-escalate protocol — the confidence floor

The Layerup agent is configured to escalate to a human underwriter in any situation where its confidence is insufficient to support a definitive recommendation. The escalation logic operates as a mandatory, pre-output gate — it cannot be bypassed by prompt engineering or configuration.

```mermaid
flowchart TD
  OUT[Agent Output Assembly] --> CHK1{Confidence Score<br/>≥ configured threshold?<br/>Default: 75%}
  CHK1 -->|No| ESC[Escalate to Senior Underwriter]
  CHK1 -->|Yes| CHK2{Critical data point<br/>extraction failure?}
  CHK2 -->|Yes| ESC
  CHK2 -->|No| CHK3{Pattern outside<br/>AOP coverage?}
  CHK3 -->|Yes| ESC
  CHK3 -->|No| REC[Full / Partial Recommendation<br/>with explicit flags]
  ESC --> ESCPAY[Escalation Payload<br/>flag · reasons · evidence context]
  REC --> PAY[Recommendation Payload<br/>with evidence citations]

  classDef gate fill:#fafafa,stroke:#111,color:#111;
  classDef action fill:#f4f4f2,stroke:#111,color:#111;
  class CHK1,CHK2,CHK3 gate;
  class ESC,REC,ESCPAY,PAY action;
```

*Fig. A8.1 — Escalation gate. Every case passes three mandatory checks before a recommendation is assembled. Escalation is not an error state — it is the designed outcome for cases where human judgment is required.*
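
The three checks in the figure could be expressed as a pure function over a case assessment. This is a sketch under stated assumptions: `CaseAssessment`, the reason strings, and the route names are hypothetical. Unlike the figure, which exits at the first failing check, the sketch collects every applicable reason so that the escalation payload can list them all; the escalate-or-recommend decision is the same either way.

```python
from dataclasses import dataclass

DEFAULT_CONFIDENCE_THRESHOLD = 0.75  # default from the text (75%)


@dataclass(frozen=True)
class CaseAssessment:
    confidence: float                 # overall confidence in [0, 1]
    critical_extraction_failed: bool  # e.g. primary income source missing
    outside_aop_coverage: bool        # pattern the AOP does not cover


def escalation_reasons(case: CaseAssessment,
                       threshold: float = DEFAULT_CONFIDENCE_THRESHOLD) -> list:
    """Return every reason the case must be escalated; empty means none."""
    reasons = []
    if case.confidence < threshold:
        reasons.append("confidence_below_threshold")
    if case.critical_extraction_failed:
        reasons.append("critical_extraction_failure")
    if case.outside_aop_coverage:
        reasons.append("outside_aop_coverage")
    return reasons


def route(case: CaseAssessment,
          threshold: float = DEFAULT_CONFIDENCE_THRESHOLD) -> str:
    """Escalate if any check fails; otherwise assemble a recommendation."""
    if escalation_reasons(case, threshold):
        return "escalate_to_senior_underwriter"
    return "recommendation"
```

Only `threshold` is a parameter here, mirroring the statement that the threshold is configurable while the escalation logic is not.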

The three mandatory gate checks, together with guardrail-triggered escalation (section 8.3), are:

<CardGroup cols={2}>
  <Card title="Confidence Below Threshold" icon="gauge-low">
    Any case where the overall confidence score falls below your configured threshold (default: 75%) receives an automatic `Escalate to Senior Underwriter` recommendation, regardless of the agent's other findings. The threshold is configurable in the AOP; the escalation logic is not.
  </Card>

  <Card title="Critical Data Point Failure" icon="triangle-exclamation">
    Any case where a critical data point (e.g., primary income source, primary occupation) cannot be extracted from available documents — due to document absence, OCR failure, or irreconcilable inconsistency — results in automatic escalation, regardless of confidence on other dimensions.
  </Card>

  <Card title="Edge Case Outside AOP" icon="compass">
    Any case where the agent detects a pattern it has not been configured to evaluate — an edge case outside the AOP's defined coverage — results in a flag with an explicit note that the specific scenario requires human judgment. The agent does not extrapolate beyond its AOP scope.
  </Card>

  <Card title="Guardrail-Triggered Escalation" icon="shield-halved">
    Any case where an Amazon Bedrock Guardrails or Azure AI Content Safety block aborts a reasoning step is automatically escalated. Guardrail blocks are never retried, and the full guardrail intervention record is included in the escalation payload for the reviewing underwriter.
  </Card>
</CardGroup>

***

## 8.5  Failure-mode summary

| failure mode                          | agent response                          | written to audit log        | human escalation          |
| ------------------------------------- | --------------------------------------- | --------------------------- | ------------------------- |
| OCR below confidence threshold        | Flag document · continue other docs     | Yes                         | Via requirements list     |
| Unsupported document format           | Log · include in unprocessed list       | Yes                         | Via unprocessed list      |
| Model timeout (all retries exhausted) | Mark Failed · route to escalation queue | Yes                         | Yes — immediate           |
| Guardrail intervention                | Abort reasoning step · flag case        | Yes (full guardrail detail) | Yes                       |
| Model service unavailability          | Retain in queue · process on recovery   | Yes — queue depth alarms    | Via operations team alert |
| Confidence below threshold            | Escalate to Senior Underwriter          | Yes                         | Yes                       |
| Critical extraction failure           | Escalate to Senior Underwriter          | Yes                         | Yes                       |
| Edge case outside AOP                 | Flag with explicit note                 | Yes                         | Yes                       |