18 Observability & telemetry

The substrate is observable end-to-end: every request, every plan step, every model call, every PDP ruling, and every action transition emits typed spans, metrics, and logs. Telemetry is OpenTelemetry (OTel) compatible; nothing in the platform is observable only “internally.”

18.1 ID model

| ID | Scope | Format | Source |
|---|---|---|---|
| traceId | One end-to-end request | W3C trace-context, 16 bytes | OTel |
| spanId | Per span | 8 bytes | OTel |
| runId | One AgentRun | run_… | scheduler |
| decisionId | One Decision | dec_… | verifier |
| actionId | One Action | act_… | action plane |
| idempotencyKey | One action intent | sha256 | action plane (§14.3) |
| correlationId | One business case | tenant-defined | tenant adapter |
| auditEventId | One audit row | aev_… | audit chain (§17) |
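
To make the ID shapes concrete, here is a minimal sketch of how such identifiers could be produced. Only the byte lengths, the prefixes, and the use of SHA-256 come from the table above. The helper names, the random suffix on prefixed IDs, and the fields fed into the idempotency key are assumptions for illustration; the real key inputs are whatever §14.3 defines.

```python
import hashlib
import secrets


def new_trace_id() -> str:
    """16 random bytes, hex-encoded (32 chars), the W3C trace-context size."""
    return secrets.token_hex(16)


def new_span_id() -> str:
    """8 random bytes, hex-encoded (16 chars)."""
    return secrets.token_hex(8)


def traceparent(trace_id: str, span_id: str, sampled: bool = True) -> str:
    """Format a W3C traceparent header value (version 00)."""
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def prefixed_id(prefix: str) -> str:
    """Hypothetical: the table fixes only the prefix (run, dec, act, aev);
    the random hex suffix is an assumption."""
    return f"{prefix}_{secrets.token_hex(8)}"


def idempotency_key(*parts: str) -> str:
    """SHA-256 over the parts that identify one action intent.

    Which parts belong here is defined in §14.3; callers below pass
    placeholder values. Each part is length-prefixed so that
    ("ab", "c") and ("a", "bc") hash differently.
    """
    digest = hashlib.sha256()
    for part in parts:
        data = part.encode("utf-8")
        digest.update(len(data).to_bytes(4, "big"))
        digest.update(data)
    return digest.hexdigest()
```

For example, `traceparent(new_trace_id(), new_span_id())` yields a header value of the form `00-<32 hex>-<16 hex>-01`, and calling `idempotency_key` twice with the same parts returns the same digest, which is the property a retry-safe action intent relies on.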

18.2 Span model

Spans are emitted at every plane boundary and at every step inside the runtime. Span names follow a fixed namespace:
data.ingest.<channel>
data.mapping.<mapping-id>
ontology.read · ontology.write
agent.run.<agent-id>
agent.plan
agent.step.<step-id>
agent.verify
tool.<name>.dispatch
tool.<name>.run
tool.<name>.result
model.<lane>.call
policy.check
action.stage · action.approval · action.commit · action.revert
audit.append
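
Because the namespace is fixed, a span name can be checked mechanically. The sketch below encodes the list above as a single regular expression. The character set allowed inside placeholders such as `<channel>` or `<step-id>` is not stated in this section, so `SEG` is an assumption, as is the function name.

```python
import re

# Assumed charset for <channel>, <mapping-id>, <agent-id>, <step-id>,
# <name>, and <lane>; the section does not specify one.
SEG = r"[A-Za-z0-9_-]+"

SPAN_NAME = re.compile(
    rf"""
      data\.(ingest|mapping)\.{SEG}
    | ontology\.(read|write)
    | agent\.(plan|verify)
    | agent\.(run|step)\.{SEG}
    | tool\.{SEG}\.(dispatch|run|result)
    | model\.{SEG}\.call
    | policy\.check
    | action\.(stage|approval|commit|revert)
    | audit\.append
    """,
    re.VERBOSE,
)


def is_valid_span_name(name: str) -> bool:
    """True when the whole name matches one entry of the fixed namespace."""
    return SPAN_NAME.fullmatch(name) is not None
```

With this check, `is_valid_span_name("tool.search.dispatch")` passes, while `"tool.search"` (missing the phase) and `"audit.write"` (not in the namespace) are rejected.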

18.3 OTel emission

  • Spans, metrics and logs follow OpenTelemetry semantic conventions.
  • Every span carries: tenant, region, plane, component, principal.kind.
  • Tenants choose their export: OTLP to a tenant-controlled collector, or pull-based scrape.
  • PII values are never placed on span attributes; they are referenced by content hash.
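
The attribute rules above can be enforced at the point where a span's attributes are assembled. The following is a sketch in plain Python rather than a call into any OTel SDK. The required keys are taken from the list above; the function names, the `.ref` suffix, the `sha256:` prefix, and the choice of SHA-256 as the content hash are assumptions.

```python
import hashlib
from typing import Optional

# Attribute keys every span must carry, per the list above.
REQUIRED_ATTRS = ("tenant", "region", "plane", "component", "principal.kind")


def content_ref(value: str) -> str:
    """Reference a sensitive value by content hash (SHA-256 is an assumption)."""
    return "sha256:" + hashlib.sha256(value.encode("utf-8")).hexdigest()


def build_span_attributes(base: dict, pii: Optional[dict] = None) -> dict:
    """Return span attributes, refusing spans that lack a required key.

    Values in `pii` never reach the result; only their content hashes do,
    under a hypothetical '<key>.ref' attribute name.
    """
    missing = [key for key in REQUIRED_ATTRS if key not in base]
    if missing:
        raise ValueError(f"missing required span attributes: {missing}")
    attrs = dict(base)
    for key, value in (pii or {}).items():
        attrs[f"{key}.ref"] = content_ref(value)
    return attrs
```

The resulting dictionary could then be handed to whichever exporter the tenant has chosen; that wiring is outside the scope of this sketch.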

18.4 Dashboard catalog

| Dashboard | Category | Metrics |
|---|---|---|
| Volume & concurrency | Throughput | runs/sec · staged actions/sec · committed actions/sec · ingest events/sec · per-tenant queue depth |
| Reasoning & verifier outcomes | Quality | verifier-pass / warn / block ratios · decision supersede rate · reviewer override rate · drift sigma |
| Latency surfaces | Performance | p50 / p99 per span family · model lane p50 / p99 · PDP p99 · adapter p99 per SoR |
| Audit health | Integrity | chain-anchor lag · tamper-detection alerts · failed-audit-emission count |
| SoR adapters | Integration | adapter success ratio · transactional vs flat-file mix · receipt-correlation lag |
| Spend attribution | Cost | per agent · per run · per tenant · per lane · per provider · cost per committed action |
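
As an illustration of the "p50 / p99 per span family" metric, the sketch below groups span durations by family and computes nearest-rank percentiles. It assumes that a span family is the first dotted segment of the span name (`tool`, `model`, and so on), which this section does not define, and it works on raw samples; a production dashboard would more likely read pre-aggregated histograms.

```python
import math
from collections import defaultdict


def span_family(span_name: str) -> str:
    """Assumption: the family is the first dotted segment of the name."""
    return span_name.split(".", 1)[0]


def percentile(values, p):
    """Nearest-rank percentile for 0 < p <= 100."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def latency_summary(samples):
    """samples: iterable of (span_name, duration_ms) pairs."""
    by_family = defaultdict(list)
    for name, duration_ms in samples:
        by_family[span_family(name)].append(duration_ms)
    return {
        family: {"p50": percentile(vals, 50), "p99": percentile(vals, 99)}
        for family, vals in by_family.items()
    }
```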

18.5 Cost attribution

Every model call carries the (lane, provider, model, region) tuple and a per-call cost. Costs roll up by:
  • tool dispatch (so per-tool cost is known)
  • plan step (so per-step cost is known within a run)
  • run (so per-decision cost is known)
  • agent (so per-agent cost is known)
  • tenant (for billing & budget enforcement)
  • committed action (so cost-per-effect is computable)
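
The roll-ups above amount to grouping per-call costs by different keys. Here is a minimal sketch, assuming each model call record carries the identifiers needed for every roll-up level. The `ModelCall` type, its field names, and the cost unit are hypothetical; only the (lane, provider, model, region) tuple and the per-call cost come from the text.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelCall:
    lane: str
    provider: str
    model: str
    region: str
    cost: float                       # per-call cost; the unit is an assumption
    tenant: str
    agent: str
    run_id: str
    step_id: str
    tool: Optional[str] = None        # set when the call ran inside a tool dispatch
    action_id: Optional[str] = None   # set when the call is attributed to a committed action


def roll_up(calls, key):
    """Sum per-call cost by key(call); calls whose key is None are skipped."""
    totals = defaultdict(float)
    for call in calls:
        group = key(call)
        if group is not None:
            totals[group] += call.cost
    return dict(totals)
```

Each roll-up level is then one key function: `roll_up(calls, lambda c: c.tool)` for per-tool cost, `lambda c: (c.run_id, c.step_id)` for per-step cost within a run, `lambda c: c.tenant` for billing, and `lambda c: c.action_id` for cost per committed action.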

18.6 Logging discipline

  • Logs are structured JSON; free-text is forbidden in production logs.
  • PII is never logged in plaintext; it is referenced by content hash.
  • Log retention is shorter than audit retention; the audit chain is the durable record.
  • Log level is dynamic per principal class; debug-level cannot be enabled for an entire tenant in production.
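
One way to read these rules together is sketched below: log lines are rendered from a machine-readable event name plus key/value fields, the effective level is looked up per principal class, and a tenant-wide debug setting is refused. Everything named here (`LevelRegistry`, `render`, the event-name pattern, the `.ref` suffix, SHA-256 as the content hash) is an assumption for illustration, not the platform's API, and the registry is assumed to represent a production environment.

```python
import hashlib
import json
import logging
import re
from typing import Optional

# Assumed shape of a machine-readable event name (no spaces, no prose).
EVENT_NAME = re.compile(r"[a-z0-9_.]+")


class LevelRegistry:
    """Log levels keyed by (tenant, principal class)."""

    def __init__(self, default: int = logging.INFO):
        self._default = default
        self._levels = {}

    def set_level(self, tenant: str, principal_class: Optional[str], level: int) -> None:
        # principal_class=None means "the whole tenant"; debug is refused there.
        if principal_class is None and level < logging.INFO:
            raise ValueError("debug cannot be enabled for an entire tenant")
        self._levels[(tenant, principal_class)] = level

    def level_for(self, tenant: str, principal_class: str) -> int:
        tenant_wide = self._levels.get((tenant, None), self._default)
        return self._levels.get((tenant, principal_class), tenant_wide)


def render(event, level, registry, *, tenant, principal_class, fields=None, pii=None):
    """Return one JSON log line, or None when the level is filtered out."""
    if not EVENT_NAME.fullmatch(event):
        raise ValueError("event must be a machine-readable name, not free text")
    if level < registry.level_for(tenant, principal_class):
        return None
    record = {
        "event": event,
        "level": logging.getLevelName(level),
        "tenant": tenant,
        "principal.class": principal_class,
    }
    record.update(fields or {})
    for key, value in (pii or {}).items():
        # Only the content hash of a PII value is written.
        record[f"{key}.ref"] = "sha256:" + hashlib.sha256(value.encode("utf-8")).hexdigest()
    return json.dumps(record, sort_keys=True)
```

Keeping the tenant-wide check inside `set_level` means the restriction holds no matter which caller tries to change levels at runtime.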

18.7 Alert taxonomy

| Class | Examples | Default routing |
|---|---|---|
| capacity | queue depth · token burn | operations on-call |
| quality | drift breach · supersede spike | ML on-call |
| integrity | tamper-detection · audit lag | security on-call |
| integration | adapter error · SoR down | integration on-call · tenant SoR owner |
| governance | break-glass · authority breach | tenant security primary |
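
The taxonomy maps naturally onto a lookup table. The sketch below hard-codes the default routes listed above and lets a caller override them per class. The override mechanism is an assumption suggested by the word "default"; the function and variable names are hypothetical.

```python
# Default routes copied from the alert taxonomy above.
DEFAULT_ROUTING = {
    "capacity": ["operations on-call"],
    "quality": ["ML on-call"],
    "integrity": ["security on-call"],
    "integration": ["integration on-call", "tenant SoR owner"],
    "governance": ["tenant security primary"],
}


def route(alert_class, overrides=None):
    """Return the recipients for an alert class.

    `overrides` (hypothetical) maps a class to a replacement recipient list.
    Unknown classes raise rather than being dropped silently.
    """
    table = {**DEFAULT_ROUTING, **(overrides or {})}
    if alert_class not in table:
        raise ValueError(f"unknown alert class: {alert_class!r}")
    return list(table[alert_class])
```

Raising on an unknown class is a deliberate choice in this sketch: an alert that matches no class should surface as an error instead of going unrouted.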