Category 5 — Auditability and Observability (weight 12)

Verbatim criteria/gates from the criteria Google Doc. Fill Score/Evidence locally; the human pastes. 1-5 scale; anchors at 1/3/5.

How tool execution logging works (verbatim, confirmed with Arcade, Jun 15): Arcade's built-in audit log covers administrative operations only (gateway creation, server registration, API key management) — this is by design, not a gap. Tool execution observability is handled via OpenTelemetry (OTEL): when deploying the Arcade image to Kubernetes, OTEL can be enabled to ship telemetry to any observability collector (Datadog, ELK Stack, etc.). When self-hosted, no telemetry flows back to Arcade — all data stays in ServiceTitan's infrastructure. This is the path to satisfy InfoSec's execution audit requirement.

ServiceTitan reality (this deployment — see ../../LIVE-POC.md): logs → ELK (Vector daemonset); metrics → Grafana/Mimir (Grafana Agent scrapes ServiceMonitors → remote_write to Mimir). The engine emits OTLP metrics but they are dropped today — arcade-otel-collector:4318 does not resolve (no collector deployed). Remediation = deploy a collector + bridge it into Prometheus/Mimir.
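The "does not resolve" symptom can be re-checked after a collector is deployed. A minimal sketch, assuming it runs from a pod in the same namespace as the engine; the function name and the injectable `resolver` parameter are illustrative and not part of Arcade:

```python
import socket


def endpoint_resolves(host, port, resolver=socket.getaddrinfo):
    """Return True if DNS resolution of host:port succeeds, False otherwise.

    `resolver` is injectable so the check can be exercised without a cluster.
    """
    try:
        resolver(host, port)
        return True
    except socket.gaierror:
        # Name lookup failed: the failure mode described in the note above.
        return False
```

Calling `endpoint_resolves("arcade-otel-collector", 4318)` covers name resolution only. A `True` result does not show that the collector accepts OTLP or that metrics reach Mimir.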

Scores

| # | Criterion (verbatim) | Score (1-5) | Evidence / note |
|---|---|---|---|
| 1 | OTEL enabled on the self-hosted Arcade deployment — execution telemetry ships to ServiceTitan's observability stack (Datadog or ELK). | | |
| 2 | Every tool call produces a log record with: user, tool invoked, timestamp, outcome — queryable in Datadog or ELK. | | |
| 3 | Admin audit log — all configuration changes (gateways, servers, API keys, policies) are logged in Arcade. | | |
| 4 | Per-tool and per-user usage metrics (call counts, error rates, latency) visible in the observability stack. | | |
| 5 | Trace propagation — tool call traces joinable to agent and application traces via OTEL. | | |
| 6 | No telemetry data leaves ServiceTitan's infrastructure to Arcade when self-hosted. | | |

Average: ___ Category score: ___
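Criterion 4 asks for call counts, error rates, and latency per tool and per user. A minimal sketch of that aggregation over exported execution records, useful for cross-checking a dashboard by hand. The record keys (`tool_name`, `user_id`, `outcome`, `latency_ms`) and the `"success"` outcome value are assumptions; the real field names depend on what the engine emits:

```python
from collections import defaultdict


def usage_by_tool_and_user(records):
    """Aggregate calls, error rate, and mean latency per (tool, user) pair."""
    acc = defaultdict(lambda: {"calls": 0, "errors": 0, "latency_ms_total": 0.0})
    for r in records:
        bucket = acc[(r["tool_name"], r["user_id"])]
        bucket["calls"] += 1
        # Any outcome other than "success" counts as an error (assumed convention).
        bucket["errors"] += int(r["outcome"] != "success")
        bucket["latency_ms_total"] += r["latency_ms"]
    return {
        key: {
            "calls": b["calls"],
            "error_rate": b["errors"] / b["calls"],
            "mean_latency_ms": b["latency_ms_total"] / b["calls"],
        }
        for key, b in acc.items()
    }
```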

Score anchors

  • 1 — No OTEL support; no execution telemetry available outside Arcade's dashboard
  • 3 — OTEL works but configuration is manual or underdocumented; trace propagation requires custom work
  • 5 — OTEL is documented and easy to enable; full execution telemetry in Datadog/ELK; trace propagation works end-to-end
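For the trace-propagation anchors, one way to judge "end-to-end" is to compare the trace ID the agent sent with the trace ID recorded on the tool-execution span. A minimal sketch using the W3C Trace Context `traceparent` header (version 00: `00-<32 hex trace id>-<16 hex parent id>-<2 hex flags>`). The helper names are illustrative, and whether the Arcade engine honors an incoming `traceparent` is exactly what the evaluation needs to establish, not something assumed here:

```python
import re
import secrets

# Version-00 traceparent only; other versions are rejected by this sketch.
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent(sampled=True):
    """Build a fresh version-00 traceparent header value."""
    trace_id = secrets.token_hex(16)  # 16 bytes -> 32 hex characters
    parent_id = secrets.token_hex(8)  # 8 bytes -> 16 hex characters
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{parent_id}-{flags}"


def trace_id_of(header):
    """Return the trace ID from a traceparent value, or None if it is invalid."""
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        return None
    trace_id, parent_id, _flags = match.groups()
    # All-zero trace or parent IDs are invalid per the W3C specification.
    if set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    return trace_id


def same_trace(agent_header, tool_header):
    """True when both headers are valid and carry the same trace ID."""
    a, b = trace_id_of(agent_header), trace_id_of(tool_header)
    return a is not None and a == b
```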

Benchmark tests

| # | Test (verbatim) | Result | Evidence |
|---|---|---|---|
| 1 | Enable OTEL on the self-hosted Arcade Kubernetes deployment. Make a tool call. Verify a record appears in Datadog (or ELK) with: user_id, tool name, timestamp, outcome. | | |
| 2 | Make an administrative change (update a gateway). Verify the change appears in Arcade's admin audit log. | | |
| 3 | Propagate a trace ID from an agent call through to the tool execution. Verify the trace is end-to-end visible in the observability stack. | | |
| 4 | Confirm no tool execution telemetry is transmitted to Arcade's own systems when running self-hosted. | | |
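Test 1 can be made mechanical by checking each retrieved record for the four required fields. A minimal sketch, assuming the record has already been fetched from ELK or Datadog and parsed into a dict. The key names are hypothetical stand-ins for whatever the engine actually emits and should be adjusted once a real record is observed:

```python
# Hypothetical key names; map these to the real fields seen in ELK/Datadog.
REQUIRED_FIELDS = ("user_id", "tool_name", "timestamp", "outcome")


def missing_audit_fields(record):
    """Return the required fields that are absent or empty in one log record."""
    return [f for f in REQUIRED_FIELDS if record.get(f) in (None, "")]
```

An empty list means the record carries all four fields. Any other result names what is missing and can be pasted into the Evidence column as-is.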

Suggested pass/fail gates

| Gate | Pass condition (verbatim) | Result | Evidence |
|---|---|---|---|
| OTEL integration | OTEL enabled on self-hosted deployment; execution telemetry flows to Datadog or ELK | | |
| Execution audit | Every tool call produces a retrievable record with user, tool, timestamp, outcome in ServiceTitan's observability stack | | |
| Admin audit | All Arcade configuration changes are logged in the admin audit log | | |
| Data residency | No tool execution telemetry transmitted to Arcade when self-hosted — confirmed | | |
| InfoSec sign-off | Dane Snyder confirms the OTEL-based execution audit satisfies the access audit requirement | | |

Findings