18 — Observability & telemetry.

The substrate is observable end-to-end: every request, plan step, model call, policy decision point (PDP) ruling, and action transition emits typed spans, metrics, and logs. Telemetry is OTel-compatible, so no part of the platform is observable only through internal tooling.

18.1 ID model

ID	Scope	Format	Source
`traceId`	One end-to-end request	W3C trace-context, 16 bytes	OTel
`spanId`	Per span	8 bytes	OTel
`runId`	One AgentRun	`run_…`	scheduler
`decisionId`	One Decision	`dec_…`	verifier
`actionId`	One Action	`act_…`	action plane
`idempotencyKey`	One action intent	`sha256`	action plane (§14.3)
`correlationId`	One business case	tenant-defined	tenant adapter
`auditEventId`	One audit row	`aev_…`	audit chain (§17)
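
As an illustration of the `traceId` and `spanId` rows, the sketch below generates the two OTel identifiers and carries them in a W3C `traceparent` header (version `00`). The helper names are hypothetical and not part of the platform. The prefixed IDs (`run_…`, `dec_…`, and so on) are left out because the table does not specify their suffix format.

```python
import re
import secrets

# W3C trace-context, version 00: 00-<trace-id>-<parent-id>-<flags>, lowercase hex.
_TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_trace_id() -> str:
    """Return 16 random bytes as 32 hex chars; the all-zero value is invalid."""
    while True:
        trace_id = secrets.token_hex(16)
        if trace_id != "0" * 32:
            return trace_id


def new_span_id() -> str:
    """Return 8 random bytes as 16 hex chars; the all-zero value is invalid."""
    while True:
        span_id = secrets.token_hex(8)
        if span_id != "0" * 16:
            return span_id


def format_traceparent(trace_id: str, span_id: str, sampled: bool = True) -> str:
    """Build the header value that propagates the trace across plane boundaries."""
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def parse_traceparent(header: str) -> tuple:
    """Return (trace_id, span_id, sampled) or raise ValueError on a malformed header."""
    match = _TRACEPARENT_RE.match(header)
    if match is None:
        raise ValueError(f"malformed traceparent: {header!r}")
    trace_id, span_id, flags = match.groups()
    if trace_id == "0" * 32 or span_id == "0" * 16:
        raise ValueError("all-zero trace or span id")
    return trace_id, span_id, bool(int(flags, 16) & 0x01)
```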

18.2 Span model

Spans are emitted at every plane boundary and at every step inside the runtime. Span names follow a fixed namespace.

data.ingest.<channel>
data.mapping.<mapping-id>
ontology.read · ontology.write
agent.run.<agent-id>
agent.plan
agent.step.<step-id>
agent.verify
tool.<name>.dispatch
tool.<name>.run
tool.<name>.result
model.<lane>.call
policy.check
action.stage · action.approval · action.commit · action.revert
audit.append
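
A fixed namespace can be enforced mechanically. The following is a minimal sketch of a span-name validator built from the list above. The character set allowed in the variable segments (`<channel>`, `<agent-id>`, and so on) is an assumption, since the source does not define it, and `is_valid_span_name` is a hypothetical helper.

```python
import re

# Assumption: variable segments are non-empty and use letters, digits, "_" or "-".
_SEG = r"[A-Za-z0-9_-]+"

_PATTERNS = [
    rf"data\.ingest\.{_SEG}",
    rf"data\.mapping\.{_SEG}",
    r"ontology\.(?:read|write)",
    rf"agent\.run\.{_SEG}",
    r"agent\.(?:plan|verify)",
    rf"agent\.step\.{_SEG}",
    rf"tool\.{_SEG}\.(?:dispatch|run|result)",
    rf"model\.{_SEG}\.call",
    r"policy\.check",
    r"action\.(?:stage|approval|commit|revert)",
    r"audit\.append",
]

_SPAN_NAME_RE = re.compile("|".join(f"(?:{p})" for p in _PATTERNS))


def is_valid_span_name(name: str) -> bool:
    """Return True only if the whole name matches one entry of the namespace."""
    return _SPAN_NAME_RE.fullmatch(name) is not None
```

A check like this could run in a unit test or in an exporter hook so that an off-namespace span name is caught before it reaches a dashboard.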

18.3 OTel emission

Spans, metrics and logs follow OpenTelemetry semantic conventions.
Every span carries: tenant, region, plane, component, principal.kind.
Tenants choose their export: OTLP to a tenant-controlled collector, or pull-based scrape.
PII is never placed in span attributes; PII-bearing values are referenced by content hash.
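
The two attribute rules above (mandatory keys, no PII) can be sketched as one helper that runs before a span is emitted. This is an illustration only: `build_span_attributes`, `content_ref`, and the `sha256:` reference prefix are assumptions, and the resulting dictionary would be handed to whichever OTel SDK the deployment uses.

```python
import hashlib
from typing import AbstractSet, Any, Dict, Mapping, Optional

# Keys every span must carry, taken from the list above.
REQUIRED_ATTRS = ("tenant", "region", "plane", "component", "principal.kind")


def content_ref(value: str) -> str:
    """Replace a sensitive value with a reference derived from its content hash."""
    return "sha256:" + hashlib.sha256(value.encode("utf-8")).hexdigest()


def build_span_attributes(
    common: Mapping[str, Any],
    extra: Optional[Mapping[str, Any]] = None,
    pii_keys: AbstractSet[str] = frozenset(),
) -> Dict[str, Any]:
    """Validate the mandatory keys and hash any attribute declared as PII."""
    missing = [key for key in REQUIRED_ATTRS if key not in common]
    if missing:
        raise ValueError(f"span is missing required attributes: {missing}")
    attrs = dict(common)
    for key, value in (extra or {}).items():
        attrs[key] = content_ref(str(value)) if key in pii_keys else value
    return attrs
```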

18.4 Dashboard catalog

Volume & concurrency

Throughput — runs/sec · staged actions/sec · committed actions/sec · ingest events/sec · per-tenant queue depth

Reasoning & verifier outcomes

Quality — verifier-pass / warn / block ratios · decision supersede rate · reviewer override rate · drift sigma

Latency surfaces

Performance — p50 / p99 per span family · model lane p50 / p99 · PDP p99 · adapter p99 per SoR

Audit health

Integrity — chain-anchor lag · tamper-detection alerts · failed-audit-emission count

SoR adapters

Integration — adapter success ratio · transactional vs flat-file mix · receipt-correlation lag

Spend attribution

Cost — per agent · per run · per tenant · per lane · per provider · cost per committed action

18.5 Cost attribution

Every model call carries the (lane, provider, model, region) tuple and a per-call cost. Costs roll up by:

tool dispatch (so per-tool cost is known)
plan step (so per-step cost is known within a run)
run (so per-decision cost is known)
agent (so per-agent cost is known)
tenant (for billing & budget enforcement)
committed action (so cost-per-effect is computable)
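
The roll-ups listed above amount to grouping per-call cost records by different keys. A minimal sketch follows, assuming each model call is recorded with the fields below. The `ModelCall` record, its field names, and `roll_up` are hypothetical, not the platform's schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class ModelCall:
    # The (lane, provider, model, region) tuple and per-call cost from the text.
    lane: str
    provider: str
    model: str
    region: str
    cost: Decimal
    # Attribution keys (assumed field names).
    tenant: str
    agent: str
    run_id: str
    step_id: str
    tool: Optional[str] = None       # set when the call ran inside a tool dispatch
    action_id: Optional[str] = None  # set when the call led to a committed action


def roll_up(calls: Iterable[ModelCall], *fields: str) -> Dict[Tuple, Decimal]:
    """Sum cost per distinct combination of the given fields.

    Calls where any requested field is None are skipped, so a roll-up by
    "tool" only counts calls that were made inside a tool dispatch.
    """
    totals: Dict[Tuple, Decimal] = defaultdict(Decimal)
    for call in calls:
        key = tuple(getattr(call, name) for name in fields)
        if any(part is None for part in key):
            continue
        totals[key] += call.cost
    return dict(totals)
```

For example, `roll_up(calls, "tenant")` gives the per-tenant totals used for billing, and `roll_up(calls, "run_id", "step_id")` gives per-step cost within each run. `Decimal` is used instead of `float` so that sums of small per-call amounts do not drift.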

18.6 Logging discipline

Logs are structured JSON; free text is forbidden in production logs.
PII is never logged in plaintext; PII-bearing values are referenced by content hash.
Log retention is shorter than audit retention; the audit chain is the durable record.
Log level is dynamic per principal class; debug-level cannot be enabled for an entire tenant in production.
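
The rules above can be sketched with the Python standard library alone. The example treats the log message as an event name rather than free text, carries all other data as structured fields, hashes PII before it is logged, and applies a minimum level per principal class that can be changed at runtime. The `fields` convention, the use of `principal.kind` as the principal class, and all helper names are assumptions.

```python
import hashlib
import io
import json
import logging


def pii_ref(value: str) -> str:
    """Reference a sensitive value by content hash instead of logging it."""
    return "sha256:" + hashlib.sha256(value.encode("utf-8")).hexdigest()


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object; the message is an event name."""

    def format(self, record: logging.LogRecord) -> str:
        payload = dict(getattr(record, "fields", {}))
        payload.update(
            ts=self.formatTime(record),
            level=record.levelname,
            event=record.getMessage(),
        )
        return json.dumps(payload, sort_keys=True)


class PrincipalLevelFilter(logging.Filter):
    """Apply a minimum level per principal class, adjustable at runtime."""

    def __init__(self, default_level: int = logging.INFO) -> None:
        super().__init__()
        self.default_level = default_level
        self.levels = {}  # principal class -> minimum level

    def filter(self, record: logging.LogRecord) -> bool:
        kind = getattr(record, "fields", {}).get("principal.kind")
        return record.levelno >= self.levels.get(kind, self.default_level)


def make_logger(stream) -> tuple:
    """Return (logger, level_filter) writing JSON lines to the given stream."""
    level_filter = PrincipalLevelFilter()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    handler.addFilter(level_filter)
    logger = logging.getLogger("telemetry.sketch")
    logger.handlers.clear()
    logger.setLevel(logging.DEBUG)  # the filter, not the logger, decides
    logger.propagate = False
    logger.addHandler(handler)
    return logger, level_filter
```

A call then looks like `logger.info("action.commit", extra={"fields": {"principal.kind": "agent", "subject": pii_ref(email)}})`. Raising verbosity for one principal class is `level_filter.levels["service"] = logging.DEBUG`, which leaves every other class at the default.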

18.7 Alert taxonomy

Class	Examples	Default routing
capacity	queue depth · token burn	operations on-call
quality	drift breach · supersede spike	ML on-call
integrity	tamper-detection · audit lag	security on-call
integration	adapter error · SoR down	integration on-call · tenant SoR owner
governance	break-glass · authority breach	tenant security primary
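
The taxonomy can be expressed as a small lookup, shown here as a sketch. The class names and default targets come from the table above. The per-tenant `overrides` argument is an assumption, suggested only by the word "default" in the column heading.

```python
from enum import Enum
from typing import Mapping, Optional, Tuple


class AlertClass(str, Enum):
    CAPACITY = "capacity"
    QUALITY = "quality"
    INTEGRITY = "integrity"
    INTEGRATION = "integration"
    GOVERNANCE = "governance"


# Default routing, copied from the alert taxonomy table.
DEFAULT_ROUTING: Mapping[AlertClass, Tuple[str, ...]] = {
    AlertClass.CAPACITY: ("operations on-call",),
    AlertClass.QUALITY: ("ML on-call",),
    AlertClass.INTEGRITY: ("security on-call",),
    AlertClass.INTEGRATION: ("integration on-call", "tenant SoR owner"),
    AlertClass.GOVERNANCE: ("tenant security primary",),
}


def route(
    alert_class: AlertClass,
    overrides: Optional[Mapping[AlertClass, Tuple[str, ...]]] = None,
) -> Tuple[str, ...]:
    """Return the targets for an alert class, preferring an override if present."""
    if overrides and alert_class in overrides:
        return overrides[alert_class]
    return DEFAULT_ROUTING[alert_class]
```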
