18 — Observability & telemetry.
The substrate is observable end-to-end: every request, every plan step, every model call, every PDP ruling, every action transition emits typed spans, metrics, and logs. Telemetry is OTel-compatible; nothing in the platform is observable only “internally.”
18.1 ID model
| ID | Scope | Format | Source |
|---|---|---|---|
| traceId | One end-to-end request | W3C trace-context, 16 bytes | OTel |
| spanId | Per span | 8 bytes | OTel |
| runId | One AgentRun | run_… | scheduler |
| decisionId | One Decision | dec_… | verifier |
| actionId | One Action | act_… | action plane |
| idempotencyKey | One action intent | sha256 | action plane (§14.3) |
| correlationId | One business case | tenant-defined | tenant adapter |
| auditEventId | One audit row | aev_… | audit chain (§17) |
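The ID shapes in the table can be sketched as follows. This is a minimal illustration, not the platform's generator. The helper names (`new_trace_id`, `new_span_id`, `new_prefixed_id`, `idempotency_key`, `traceparent`), the length of the random suffix, and the fields fed into the SHA-256 digest are all assumptions; only the byte widths, the prefixes, and the hash algorithm come from the table.

```python
import hashlib
import json
import secrets


def new_trace_id() -> str:
    """16 random bytes, hex-encoded (32 chars), the W3C trace-context width."""
    return secrets.token_bytes(16).hex()


def new_span_id() -> str:
    """8 random bytes, hex-encoded (16 chars)."""
    return secrets.token_bytes(8).hex()


def new_prefixed_id(prefix: str) -> str:
    """Build a prefixed ID such as run_..., dec_..., act_... or aev_...

    The 12-byte random suffix is an assumption; the table only fixes the prefix.
    """
    return f"{prefix}_{secrets.token_hex(12)}"


def idempotency_key(intent: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the action intent.

    Which fields make up the intent is hypothetical here (see §14.3).
    """
    canonical = json.dumps(intent, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def traceparent(trace_id: str, span_id: str, sampled: bool = True) -> str:
    """Format a W3C `traceparent` header: version-traceid-parentid-flags."""
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"
```

Canonicalising the intent before hashing (sorted keys, fixed separators) means two logically identical intents produce the same key regardless of field order, which is the property an idempotency key needs.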
18.2 Span model
Spans are emitted at every plane boundary and at every step inside the runtime. Span names follow a fixed namespace.
18.3 OTel emission
- Spans, metrics and logs follow OpenTelemetry semantic conventions.
- Every span carries: tenant, region, plane, component, principal.kind.
- Tenants choose their export: OTLP to a tenant-controlled collector, or pull-based scrape.
- PII fields are never put on span attributes; references are by content hash.
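The attribute rules above can be sketched as a small pre-emission check. This is an illustration under stated assumptions, not the platform's emitter: the function names, the `sha256:` reference format, and the choice of unkeyed SHA-256 as the content hash are hypothetical. A real deployment may need a keyed hash for low-entropy values.

```python
import hashlib

# Attribute keys the text lists as present on every span.
REQUIRED_KEYS = ("tenant", "region", "plane", "component", "principal.kind")


def content_hash(value: str) -> str:
    """Reference a sensitive value by its SHA-256 digest instead of its content."""
    return "sha256:" + hashlib.sha256(value.encode("utf-8")).hexdigest()


def span_attributes(attrs: dict, pii_keys: frozenset = frozenset()) -> dict:
    """Validate required keys and replace PII values with content hashes.

    `pii_keys` names the attributes to hash; how the platform classifies
    a field as PII is not specified in the text and is assumed here.
    """
    missing = [key for key in REQUIRED_KEYS if key not in attrs]
    if missing:
        raise ValueError(f"span is missing required attributes: {missing}")
    return {
        key: content_hash(str(value)) if key in pii_keys else value
        for key, value in attrs.items()
    }
```

Rejecting a span that lacks a required attribute, rather than emitting it with a blank, keeps per-tenant and per-plane dashboards from silently under-counting.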
18.4 Dashboard catalog
| Dashboard | Scope | Metrics |
|---|---|---|
| Throughput | Volume & concurrency | runs/sec · staged actions/sec · committed actions/sec · ingest events/sec · per-tenant queue depth |
| Quality | Reasoning & verifier outcomes | verifier-pass / warn / block ratios · decision supersede rate · reviewer override rate · drift sigma |
| Performance | Latency surfaces | p50 / p99 per span family · model lane p50 / p99 · PDP p99 · adapter p99 per SoR |
| Integrity | Audit health | chain-anchor lag · tamper-detection alerts · failed-audit-emission count |
| Integration | SoR adapters | adapter success ratio · transactional vs flat-file mix · receipt-correlation lag |
| Cost | Spend attribution | per agent · per run · per tenant · per lane · per provider · cost per committed action |
18.5 Cost attribution
Every model call carries the (lane, provider, model, region) tuple and a per-call cost. Costs roll up by:
- tool dispatch (so per-tool cost is known)
- plan step (so per-step cost is known within a run)
- run (so per-decision cost is known)
- agent (so per-agent cost is known)
- tenant (for billing & budget enforcement)
- committed action (so cost-per-effect is computable)
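The roll-ups listed above amount to grouping per-call costs by one dimension at a time. A minimal sketch follows, assuming a flat record per model call. The `ModelCall` type, its field names, and both helper functions are hypothetical; the text only fixes the (lane, provider, model, region) tuple, the per-call cost, and the roll-up dimensions.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ModelCall:
    # The (lane, provider, model, region) tuple and per-call cost from the text.
    lane: str
    provider: str
    model: str
    region: str
    cost: float
    # Roll-up dimensions; these field names are illustrative.
    tool: str
    step: str
    run: str
    agent: str
    tenant: str


def roll_up(calls: List[ModelCall], dimension: str) -> Dict[str, float]:
    """Sum per-call cost grouped by one dimension, e.g. "run" or "tenant"."""
    totals: Dict[str, float] = defaultdict(float)
    for call in calls:
        totals[getattr(call, dimension)] += call.cost
    return dict(totals)


def cost_per_committed_action(calls: List[ModelCall], committed_actions: int) -> float:
    """Total spend divided by committed actions; refuses to divide by zero."""
    if committed_actions <= 0:
        raise ValueError("no committed actions to attribute cost to")
    return sum(call.cost for call in calls) / committed_actions
```

Because every call carries all dimensions, the same records serve each roll-up, for example `roll_up(calls, "tool")` for per-tool cost and `roll_up(calls, "tenant")` for billing.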
18.6 Logging discipline
- Logs are structured JSON; free-text is forbidden in production logs.
- PII is never logged in plaintext; references are by content hash.
- Log retention is shorter than audit retention; the audit chain is the durable record.
- Log level is dynamic per principal class; debug-level cannot be enabled for an entire tenant in production.
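These rules can be sketched with Python's standard `logging` module. This is an illustration only: the `fields` convention for structured data, the `sha256:` reference format, the principal class names, their thresholds, and the use of `principal.kind` as the class key are assumptions, not the platform's logging API.

```python
import hashlib
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object instead of free text."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "event": record.getMessage(),
            "logger": record.name,
        }
        # `fields` is a hypothetical convention for structured key/value data,
        # passed by callers via logging's `extra` argument.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload, sort_keys=True)


def pii_ref(value: str) -> str:
    """Replace a PII value with a content-hash reference before logging."""
    return "sha256:" + hashlib.sha256(value.encode("utf-8")).hexdigest()


# Per-principal-class thresholds; class names and levels are assumptions.
LEVEL_BY_PRINCIPAL_CLASS = {"agent": logging.INFO, "operator": logging.DEBUG}


class PrincipalLevelFilter(logging.Filter):
    """Drop records below the level configured for the record's principal class."""

    def filter(self, record: logging.LogRecord) -> bool:
        principal = getattr(record, "fields", {}).get("principal.kind")
        threshold = LEVEL_BY_PRINCIPAL_CLASS.get(principal, logging.INFO)
        return record.levelno >= threshold
```

Keying the threshold on principal class rather than on tenant mirrors the rule that debug-level output cannot be switched on for a whole tenant in production.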
18.7 Alert taxonomy
| Class | Examples | Default routing |
|---|---|---|
| capacity | queue depth · token burn | operations on-call |
| quality | drift breach · supersede spike | ML on-call |
| integrity | tamper-detection · audit lag | security on-call |
| integration | adapter error · SoR down | integration on-call · tenant SoR owner |
| governance | break-glass · authority breach | tenant security primary |
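The default routing in the table can be expressed as a lookup. This is a sketch under assumptions: the `route_alert` function and the idea of a per-tenant `overrides` map are hypothetical, inferred only from the column being labelled "default"; the class names and recipients are copied from the table.

```python
from typing import Dict, List, Optional

# Default routing copied from the alert taxonomy table.
DEFAULT_ROUTING: Dict[str, List[str]] = {
    "capacity": ["operations on-call"],
    "quality": ["ML on-call"],
    "integrity": ["security on-call"],
    "integration": ["integration on-call", "tenant SoR owner"],
    "governance": ["tenant security primary"],
}


def route_alert(
    alert_class: str, overrides: Optional[Dict[str, List[str]]] = None
) -> List[str]:
    """Return recipients for an alert class, preferring overrides when given.

    The override mechanism is an assumption, not a documented feature.
    """
    table = {**DEFAULT_ROUTING, **(overrides or {})}
    if alert_class not in table:
        raise KeyError(f"unknown alert class: {alert_class}")
    return list(table[alert_class])
```

Raising on an unknown class, instead of falling back to a catch-all recipient, surfaces taxonomy gaps early.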

