docs: _TEMPLATE + all-10 criteria-section stubs (verbatim criteria)
@@ -0,0 +1,54 @@
# Category 5 — Auditability and Observability (weight 12)

> Verbatim criteria/gates from the criteria Google Doc. Fill Score/Evidence locally; **the human
> pastes**. 1–5 scale; anchors at 1/3/5.

**How tool execution logging works (verbatim, confirmed with Arcade, Jun 15):** Arcade's built-in
audit log covers administrative operations only (gateway creation, server registration, API key
management) — this is by design, not a gap. Tool execution observability is handled via
OpenTelemetry (OTEL): when deploying the Arcade image to Kubernetes, OTEL can be enabled to ship
telemetry to any observability collector (Datadog, ELK Stack, etc.). When self-hosted, no telemetry
flows back to Arcade — all data stays in ServiceTitan's infrastructure. This is the path to satisfy
InfoSec's execution audit requirement.

**ServiceTitan reality (this deployment; see ../../LIVE-POC.md):** logs → ELK (Vector daemonset);
**metrics → Grafana/Mimir** (Grafana Agent scrapes ServiceMonitors → remote_write to Mimir). The
engine emits OTLP metrics, but they are **dropped** today because `arcade-otel-collector:4318` does
not resolve (no collector is deployed). Remediation: deploy a collector and bridge it into
Prometheus/Mimir.

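The failure described above (the OTLP endpoint name not resolving) can be checked before and after a collector is deployed. The following is a minimal sketch, assuming the endpoint is the `arcade-otel-collector:4318` named above. The function name `otlp_endpoint_resolves` and its injectable `resolver` parameter are hypothetical, added so the check can be exercised without a cluster. It tests DNS resolution only, from wherever it runs, and says nothing about whether a collector actually accepts OTLP traffic.

```python
import socket
from typing import Callable


def otlp_endpoint_resolves(
    host: str,
    port: int,
    resolver: Callable[..., list] = socket.getaddrinfo,
) -> bool:
    """Return True if host:port resolves to at least one address."""
    try:
        return len(resolver(host, port, type=socket.SOCK_STREAM)) > 0
    except socket.gaierror:
        # Name lookup failed: telemetry sent to this endpoint has nowhere to go.
        return False


# Example (run from a pod in the same namespace as the engine):
#   otlp_endpoint_resolves("arcade-otel-collector", 4318)
```

A `False` result here matches the "dropped" state recorded above; a `True` result is necessary but not sufficient evidence that metrics reach Mimir.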
## Scores

| # | Criterion (verbatim) | Score (1–5) | Evidence / note |
|---|---|---|---|
| 1 | OTEL enabled on the self-hosted Arcade deployment — execution telemetry ships to ServiceTitan's observability stack (Datadog or ELK). | | |
| 2 | Every tool call produces a log record with: user, tool invoked, timestamp, outcome — queryable in Datadog or ELK. | | |
| 3 | Admin audit log — all configuration changes (gateways, servers, API keys, policies) are logged in Arcade. | | |
| 4 | Per-tool and per-user usage metrics (call counts, error rates, latency) visible in the observability stack. | | |
| 5 | Trace propagation — tool call traces joinable to agent and application traces via OTEL. | | |
| 6 | No telemetry data leaves ServiceTitan's infrastructure to Arcade when self-hosted. | | |

**Average:** ___ **Category score:** ___

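The **Average** field can be filled in mechanically once the six scores are entered. This is a small sketch, assuming the average is the plain arithmetic mean of the per-criterion scores on the 1–5 scale. The template does not state how the average maps to the category score or how the weight of 12 is applied, so the sketch stops at the average. The function name `category_average` and the example scores are hypothetical.

```python
from statistics import mean


def category_average(scores: list[int]) -> float:
    """Mean of per-criterion scores, rounded to two decimals.

    Each score must be an integer from 1 to 5.
    """
    if not scores:
        raise ValueError("no scores entered")
    for s in scores:
        if not 1 <= s <= 5:
            raise ValueError(f"score out of range 1-5: {s}")
    return round(mean(scores), 2)


# Hypothetical scores for criteria 1-6, for illustration only.
example = [3, 2, 5, 2, 3, 5]
print(category_average(example))  # 3.33
```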
## Score anchors

- **1** — No OTEL support; no execution telemetry available outside Arcade's dashboard
- **3** — OTEL works but configuration is manual or underdocumented; trace propagation requires custom work
- **5** — OTEL is documented and easy to enable; full execution telemetry in Datadog/ELK; trace propagation works end-to-end

## Benchmark tests

| # | Test (verbatim) | Result | Evidence |
|---|---|---|---|
| 1 | Enable OTEL on the self-hosted Arcade Kubernetes deployment. Make a tool call. Verify a record appears in Datadog (or ELK) with: user_id, tool name, timestamp, outcome. | | |
| 2 | Make an administrative change (update a gateway). Verify the change appears in Arcade's admin audit log. | | |
| 3 | Propagate a trace ID from an agent call through to the tool execution. Verify the trace is end-to-end visible in the observability stack. | | |
| 4 | Confirm no tool execution telemetry is transmitted to Arcade's own systems when running self-hosted. | | |

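For test 3, it helps to be concrete about what "propagating a trace ID" means at the header level. The sketch below assumes propagation uses the W3C Trace Context `traceparent` header, which is OpenTelemetry's default propagator. Whether the Arcade engine honours an incoming `traceparent` is not confirmed by this document; that is what the test is meant to establish. The helper names are hypothetical, and only version `00` headers are handled.

```python
import re
import secrets

# version "00" layout: 00-<32 hex trace-id>-<16 hex parent span-id>-<2 hex flags>
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def parse_traceparent(header: str) -> dict:
    """Split a version-00 traceparent header into its fields."""
    m = _TRACEPARENT.match(header.strip())
    if m is None:
        raise ValueError(f"not a version-00 traceparent: {header!r}")
    trace_id, span_id, flags = m.groups()
    # All-zero IDs are defined as invalid by the Trace Context spec.
    if set(trace_id) == {"0"} or set(span_id) == {"0"}:
        raise ValueError("all-zero trace-id or span-id is invalid")
    return {"trace_id": trace_id, "span_id": span_id, "flags": flags}


def child_traceparent(parent_header: str) -> str:
    """Header a downstream hop would send: same trace-id, fresh span-id."""
    p = parse_traceparent(parent_header)
    return f"00-{p['trace_id']}-{secrets.token_hex(8)}-{p['flags']}"


def same_trace(a: str, b: str) -> bool:
    """True if two headers belong to one trace, i.e. their spans are joinable."""
    return parse_traceparent(a)["trace_id"] == parse_traceparent(b)["trace_id"]


# The agent's outgoing header (sample value) and the header a tool call would carry.
agent_header = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
tool_header = child_traceparent(agent_header)
print(same_trace(agent_header, tool_header))  # True
```

In practice the check for test 3 is the same comparison done in the observability stack: the trace ID recorded on the agent span should equal the trace ID on the tool-execution span.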
## Suggested pass/fail gates

| Gate | Pass condition (verbatim) | Result | Evidence |
|---|---|---|---|
| OTEL integration | OTEL enabled on self-hosted deployment; execution telemetry flows to Datadog or ELK | | |
| Execution audit | Every tool call produces a retrievable record with user, tool, timestamp, outcome in ServiceTitan's observability stack | | |
| Admin audit | All Arcade configuration changes are logged in the admin audit log | | |
| Data residency | No tool execution telemetry transmitted to Arcade when self-hosted — confirmed | | |
| InfoSec sign-off | Dane Snyder confirms the OTEL-based execution audit satisfies the access audit requirement | | |

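The Execution audit gate can be spot-checked against an export of log records. This is a rough sketch, assuming records are exported from the log store as JSON lines. The key names `user_id`, `tool`, `timestamp`, and `outcome` are hypothetical stand-ins for whatever attribute names the deployment actually emits, and the sample records are made up for illustration.

```python
import json

# Hypothetical key names; adjust to the attributes actually present in the records.
REQUIRED_FIELDS = ("user_id", "tool", "timestamp", "outcome")


def missing_fields(record: dict) -> list[str]:
    """Return the required fields that are absent or empty in one record."""
    return [f for f in REQUIRED_FIELDS if record.get(f) in (None, "")]


def audit_gate_passes(json_lines: str) -> bool:
    """True only if at least one record exists and every record is complete."""
    records = [json.loads(line) for line in json_lines.splitlines() if line.strip()]
    return bool(records) and all(not missing_fields(r) for r in records)


# Made-up export: the second record has an empty user_id.
sample = "\n".join([
    '{"user_id": "u-1", "tool": "example.tool", "timestamp": "2025-01-01T00:00:00Z", "outcome": "success"}',
    '{"user_id": "", "tool": "example.tool", "timestamp": "2025-01-01T00:00:05Z", "outcome": "error"}',
])
print(audit_gate_passes(sample))  # False
```

Treating an empty export as a failure is a deliberate choice: the gate asks for a retrievable record per tool call, so "no records found" should not read as a pass.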
## Findings
-