cat1: FINALIZE scorecard (draft 4/5); STATUS + cat-5 NOTES ready for fresh-session handoff

2026-06-22 09:55:01 -04:00
parent 8b48f5813e
commit 53f960409e
5 changed files with 95 additions and 24 deletions
@@ -0,0 +1,19 @@
+{
+    "workbench.colorCustomizations": {
+        "activityBar.activeBackground": "#ff6433",
+        "activityBar.activeBorder": "#00ff3d",
+        "activityBar.background": "#ff6433",
+        "activityBar.foreground": "#15202b",
+        "activityBar.inactiveForeground": "#15202b99",
+        "activityBarBadge.background": "#00ff3d",
+        "activityBarBadge.foreground": "#15202b",
+        "statusBar.background": "#ff3d00",
+        "statusBar.foreground": "#e7e7e7",
+        "statusBarItem.hoverBackground": "#ff6433",
+        "titleBar.activeBackground": "#ff3d00",
+        "titleBar.activeForeground": "#e7e7e7",
+        "titleBar.inactiveBackground": "#ff3d0099",
+        "titleBar.inactiveForeground": "#e7e7e799"
+    },
+    "peacock.color": "#ff3d00"
+}
@@ -1,18 +1,19 @@
 # STATUS — "you are here" handoff

 Each lane owns its own section. Update yours; don't touch others'. Keep it terse.
-Last full-repo update: 2026-06-18 (scaffold).
+Last full-repo update: 2026-06-22.

 ## Category 1 — Functional MCP Gateway Capability
 - Owner: ztaylor
- Status: in progress (scaffold done; executing per `~/repos/docs/arcade-eval-plan.md`)
- Last live-state check: —
- Notes: cat-1 lane = this session. Per-user tests via `user_id` headers (real Entra SSO → cat 2).
+- Status: **SCORED (draft 4/5)** — `categories/cat1-functional/criteria-section-1.md`, awaiting user paste into the Google Doc.
+- Last live-state check: 2026-06-22
+- Result: protocol/curation/mixed/dynamic-reg/zero-config-clients all PASS; per-user execution proven (`whoami` A→A/B→B); Claude Code connected via Arcade-Headers AND Entra OAuth. One finding: per-user tool-LIST scoping is gateway-wide, not native (→ cat-3/separate gateways).
+- Fixtures (reusable): gateway `zeb-gateway-test`; ref server `arcade-eval-ref` (lib/mcp_server) registered via cloudflared quick tunnel (EPHEMERAL — re-establish for cat-9; see LIVE-POC).

 ## Category 2 — Delegated Authorization and Identity
 - Owner: — (security cluster: Dane / Chandu)
- Status: not started (criteria stub seeded)
- Notes: holds the Entra/Okta SSO login → identity-mapping test (a teammate can be User B).
+- Status: not started (criteria stub seeded) — **but cat-1 work already generated strong evidence; see LIVE-POC "Known behaviors".**
+- Notes: holds the Entra/Okta SSO login → identity-mapping test. Open finding: User Source keys user_id on opaque Entra `sub`, mismatching the dashboard email → blocks downstream OAuth consent bind (fix: map User Source to the email claim). Google provider redirect-uri/secret issue was resolved 2026-06-22.

 ## Category 3 — Tool-Level Access Control and Policy
 - Owner: — (security cluster)
@@ -24,8 +25,9 @@ Last full-repo update: 2026-06-18 (scaffold).

 ## Category 5 — Auditability and Observability
 - Owner: ztaylor
- Status: not started (criteria stub seeded)
- Notes: metrics → Grafana/Mimir (NOT ELK); engine OTLP currently dropped (no collector). See LIVE-POC.
+- Status: **NEXT — start here in a fresh session** (invoke skill `arcade-gateway-eval`; read this + LIVE-POC; run live-state check). See `categories/cat5-auditability/NOTES.md` for the plan.
+- Last live-state check: —
+- Notes: metrics → **Grafana/Mimir** (NOT ELK); logs → ELK (Vector). Engine OTLP currently **dropped** — collector `arcade-otel-collector:4318` doesn't resolve. First task = OTEL collector → Prometheus/Mimir remediation (with the user; touches `k8s-backstage-v2/apps/arcade`). Full evidence + remediation shapes in LIVE-POC "Observability".

 ## Category 6 — Security and Compliance
 - Owner: — (security cluster)
@@ -11,8 +11,14 @@
  - Q5: ungranted tool → `McpError: tool not enabled for this gateway`.

 ## Remaining for cat-1 scoring
- [x] 2.2 (Claude Code) — `claude mcp add` HTTP → ✔ Connected, no adapter; key kept as `${ARCADE_API_KEY}` ref (not persisted).
- [ ] 2.2 (Cursor) — `.cursor/mcp.json` written with `${env:ARCADE_API_KEY}`; user verifying in Cursor UI (launch from shell with .env loaded).
+- [x] 2.2 (Claude Code) — connected with NO adapter in both modes: Arcade-Headers (`claude mcp add`) AND Entra User-Source OAuth (`/mcp` login → tools loaded in-session; echo/whoami ran). Key kept as `${ARCADE_API_KEY}` ref (not persisted).
+- [~] 2.2 (Cursor/LangGraph/internal) — not exercised this round; no adapter expected (same transport). Cursor config currently empty.
+- [x] 2.8 — scorecard FINALIZED (draft 4/5) in criteria-section-1.md; awaiting user paste into Google Doc.
+
+## Side evidence generated (handed to other lanes)
+- cat-2: Entra IdP login works; identity = opaque `sub`; downstream OAuth consent-bind mismatch (see LIVE-POC).
+- cat-4/8/9: `arcade deploy` is cloud-only → self-hosted servers use the register path.
+- cat-9: full tunnel-registration chain validated end-to-end (client→gateway→Engine→tunnel→local server).
 - [x] 2.5 — **dynamic registration**: PASS — saved add/remove (−Brightdata, +Youtube) reflected on next list, no restart; draft didn't propagate until Save.
 - Reference server built at `lib/mcp_server` (echo/add/whoami); locally validated by `arcade deploy` (3 tools, 0 secrets). **`arcade deploy` is cloud-only (finding)** — see LIVE-POC.
 - [x] 2.7 — **mixed prebuilt + custom**: PASS — gateway lists 7 prebuilt + 3 custom (ArcadeEvalRef_*, self-hosted via cloudflared tunnel) in one flat list; echo invokes. Full chain validated (also cat-9 Stage-2).
@@ -2,22 +2,30 @@

 > Verbatim criteria / gates / questions from the criteria Google Doc. Fill Score / Evidence /
 > Findings / Answers locally; **the human pastes** into the Google Doc. 1–5 scale; anchors at 1/3/5.
-> Status: **in progress** — scores held until the remaining tests (2.2 Claude Code, 2.5 dynamic
-> reg, 2.7 mixed, 2.4 whoami) land. Raw evidence: `tests/probes.md`.
+> Status: **FINALIZED (draft) 2026-06-22** — category score **4/5**. Draft for user review before
+> pasting into the criteria Google Doc. Raw evidence: `tests/probes.md`.

 ## Scores
 | # | Criterion (verbatim) | Score (1–5) | Evidence / note |
 |---|---|---|---|
-| 1 | Implements MCP protocol correctly — tool listing, tool invocation, error responses. |  | PASS (live) — lib `mcp` SDK client connected, initialized, listed 7 tools, invoked, got structured `isError` result + JSON-RPC error. Minor: 202 on session close. |
-| 2 | Gateway tool curation — ability to expose a subset of tools from underlying servers to a given doorway. |  | PASS — 7 tools listed == the 7-tool allow-list selected (Slack×2, GoogleDocs×4, Brightdata×1). |
-| 3 | Per-user tool scoping — different users see different tool lists based on their explicit grants. |  | **FINDING** — User A and User B see the **identical 7 tools** on one gateway (Arcade-Headers). List is gateway-wide, not per-user. Per-user differentiation needs cat-3 Contextual Access or separate gateways / User Source. |
-| 4 | Supports all required MCP clients without custom adapters (Claude Code, Cursor, LangGraph, internal agent frameworks). |  | PASS (Claude Code) — `claude mcp add` HTTP → ✔ Connected, no adapter, key via `${ARCADE_API_KEY}` ref (not persisted). Plus compliant `mcp`-SDK client ✓. Cursor connect in progress (GUI verify, `${env:ARCADE_API_KEY}`). |
-| 5 | Tool execution isolation — one user's tool call cannot access another user's tokens or context. |  | PASS — `whoami` returns the calling user's id (A→A, B→B); each call runs in the caller's own context, not a shared identity. Echo invocation clean. |
-| 6 | Supports mixing prebuilt (global catalog) and custom (self-hosted) servers behind a single gateway URL. |  | PASS — one gateway lists 7 prebuilt (`main`) + 3 custom (self-hosted, tunnel-registered) tools in one flat list; both invoke. |
-| 7 | Gateway is pure metadata — adding or removing tools does not require server redeployment. |  | PASS — saved edit (remove Brightdata, add Youtube_SearchForVideos) reflected on next `tools/list`, no restart. |
-| 8 | Dynamic tool registration — new tools become available without gateway restart. |  | PASS — new tool appeared immediately after Save; no engine/server restart. |
+| 1 | Implements MCP protocol correctly — tool listing, tool invocation, error responses. | 5 | PASS (live) — lib `mcp` SDK client connected, initialized, listed tools, invoked, got structured `isError` result + JSON-RPC error. Minor: 202 on session close. |
+| 2 | Gateway tool curation — ability to expose a subset of tools from underlying servers to a given doorway. | 5 | PASS — listed tools == the configured allow-list exactly. |
+| 3 | Per-user tool scoping — different users see different tool lists based on their explicit grants. | 2 | **FINDING** — User A and User B see the **identical** tool list on one gateway (Arcade-Headers). List is gateway-wide, not per-user. Per-user differentiation needs cat-3 Contextual Access or separate gateways / User Source — not native to the gateway allow-list. |
+| 4 | Supports all required MCP clients without custom adapters (Claude Code, Cursor, LangGraph, internal agent frameworks). | 4 | PASS (Claude Code) — connected with **no adapter** in BOTH modes: Arcade-Headers (`claude mcp add` HTTP) and **Entra User-Source OAuth** (`/mcp` login → tools loaded in-session, echo/whoami executed). Plus compliant `mcp`-SDK client ✓. Cursor/LangGraph/internal not exercised this round (no adapter expected — same transport). |
+| 5 | Tool execution isolation — one user's tool call cannot access another user's tokens or context. | 4 | PASS — `whoami` returns the calling user's id (A→A, B→B); each call runs in the caller's own context, not a shared identity. (Exhaustive cross-user token-access attack is cat-2/3 scope.) |
+| 6 | Supports mixing prebuilt (global catalog) and custom (self-hosted) servers behind a single gateway URL. | 5 | PASS — one gateway lists 7 prebuilt (`main`) + 3 custom (self-hosted, tunnel-registered) tools in one flat list; both invoke. |
+| 7 | Gateway is pure metadata — adding or removing tools does not require server redeployment. | 5 | PASS — saved edit (remove Brightdata, add Youtube_SearchForVideos) reflected on next `tools/list`, no restart. |
+| 8 | Dynamic tool registration — new tools become available without gateway restart. | 5 | PASS — new tool appeared immediately after Save; no engine/server restart. |

-**Average:** ___   **Category score:** ___
+**Average:** 4.4   **Category score:** **4**
+
+> **Category-score rationale (4/5):** Everything at the "5" anchor is met — full curation, mixed
+> prebuilt+custom behind one URL, dynamic registration, and zero-config/no-adapter MCP clients
+> (Claude Code via both headers and Entra OAuth). Held back from 5 by the one gap: **per-user tool
+> scoping is not native** — a single gateway serves an identical tool list to all users; per-user
+> differentiation requires workarounds (separate gateways or cat-3 Contextual Access), which is the
+> "3" anchor's language. Net: well above 3 (curation + mixed + dynamic + zero-config all solid),
+> below 5 (no native per-user tool scoping) → **4**.

 ## Score anchors
 - **1** — Basic MCP server, no per-user scoping or curation
@@ -27,7 +35,7 @@
 ## Benchmark questions
 | # | Question (verbatim) | Answer | Evidence |
 |---|---|---|---|
-| 1 | Can a Claude Code client connect to the gateway and see only the tools granted to the current user? | Connect: lib client ✓; Claude Code pending (2.2). "Only granted tools": N/A — no per-user grants on this gateway (list is gateway-wide). | probes.md |
+| 1 | Can a Claude Code client connect to the gateway and see only the tools granted to the current user? | Connect: **Yes** — Claude Code connected via both Arcade-Headers and Entra OAuth, no adapter; lib client ✓. "Only granted tools": **No** — list is gateway-wide, not per-user-granted. | probes.md |
 | 2 | Can the same gateway URL serve two different users with different tool lists? | **No** — A and B see identical 7 tools. | probes.md (A==B) |
 | 3 | Can we add a tool to the gateway without restarting any server or the Engine? | **Yes** — saved add/remove appeared on the next `tools/list`, no restart. (Draft edit did NOT propagate until Save — expected.) | probes.md |
 | 4 | Can we expose tools from both a prebuilt connector and a custom self-hosted server through one gateway endpoint? | **Yes** — `zeb-gateway-test` exposes prebuilt `main` tools + custom self-hosted `ArcadeEvalRef_*` tools together; both list and invoke. | probes.md |
@@ -36,9 +44,9 @@
 ## Suggested pass/fail gates
 | Gate | Pass condition (verbatim) | Result | Evidence |
 |---|---|---|---|
-| MCP protocol compliance | Any compliant MCP client connects without custom adapters | PASS (lib client; Claude Code to add in 2.2) | probes.md |
+| MCP protocol compliance | Any compliant MCP client connects without custom adapters | PASS — lib `mcp`-SDK client + Claude Code (Arcade-Headers AND Entra OAuth), no adapters | probes.md |
 | Tool curation | Gateway tool list matches exactly the configured allow-list | PASS | probes.md |
-| Per-user isolation | User A cannot see or invoke tools granted only to User B | Not demonstrable on this gateway — no per-user grants (both see all 7). Needs cat-3 / separate gateways / User Source. **(finding)** | probes.md |
+| Per-user isolation | User A cannot see or invoke tools granted only to User B | PARTIAL — **execution** isolation PASS (`whoami` A→A, B→B; calls run as caller). **Visibility** isolation NOT native: a single gateway shows all users the same list, so "tools granted only to B" needs cat-3 Contextual Access / separate gateways. **(finding)** | probes.md |
 | Mixed server gateway | Prebuilt and custom server tools coexist behind one gateway URL | PASS | probes.md (10 tools: 7 prebuilt + 3 custom) |

 ## Findings
@@ -48,4 +56,5 @@
 - **Invocation routes through the Engine and fails cleanly** when an OAuth provider/secret isn't configured (`Slack_WhoAmI` → "unsupported authorization provider type ID '' (providerID 'slack')") — no silent fallback to a shared credential.
 - **Ungranted tool** → `tool not enabled for this gateway` (clean rejection).
 - **Dynamic registration works**: a saved gateway edit (add + remove tools) takes effect on the next `tools/list` with no engine/server restart — gateway is pure metadata. Edits only apply after **Save** (drafts don't propagate).
+- **Entra (User Source) client auth works**: Claude Code completed the Entra OIDC login to the gateway and loaded tools in-session, no adapter (also strong cat-2 IdP-integration evidence). Note: under User Source the identity (`whoami`) is the opaque Entra `sub`, not the email — see the cat-2 identity-mapping finding in `../../LIVE-POC.md`.
 - Minor protocol nit: client logs `Session termination failed: 202` on session DELETE (benign).
@@ -0,0 +1,35 @@
+# Lane notes — Category 5 (Auditability & Observability)
+
+- **Owner:** ztaylor
+- **Last live-state check:** —
+- **Fixtures:** reuse gateway `zeb-gateway-test` + ref server `arcade-eval-ref` for generating tool-call traffic (see `../../config/targets.yaml`; ref-server tunnel is ephemeral — re-establish if down).
+
+## Orientation (read before starting)
+`../../LIVE-POC.md` → "Observability" + "Known behaviors". Key facts:
+- **Logs → ELK** via the Vector daemonset (works today; engine logs visible in Kibana with
+  `Tracing.TraceId`/`CorrelationId`/`NetCore.RequestPath`).
+- **Metrics → Grafana/Mimir** via the Grafana Agent Operator (ServiceMonitor/PodMonitor scrape →
+  remote_write to Mimir, tenant `X-Scope-OrgID: k8s-backstage-v4`). **NOT ELK.**
+- **Engine OTLP metrics are dropped today** — `arcade-otel-collector:4318` doesn't resolve (no
+  collector deployed). Confirmed in Kibana 2026-06-18.
+
+## Plan (the three signals + admin + residency)
+1. **OTEL pipeline health** — `kubectl -n arcade get svc,deploy,pod | grep -i otel`; check engine
+   `OTEL_EXPORTER_OTLP_*` env + chart OTEL collector values. Confirm the drop.
+2. **Metrics export remediation (primary objective; with the user — touches `apps/arcade`)** —
+   deploy/enable a collector so `arcade-otel-collector:4318` resolves, then bridge into Prometheus/Mimir:
+   EITHER (idiomatic) collector `prometheus` exporter `/metrics` + a `ServiceMonitor` (label
+   `release: prometheus-operator`, NOT `grafana-agent: external`), OR (push) `prometheusremotewrite`
+   exporter → `http://mimir-nginx.mimir.observability-wus2/api/v1/push` + `X-Scope-OrgID: k8s-backstage-v4`.
+   Then generate tool-call traffic and confirm per-tool/per-user metrics appear in Grafana.
+3. **Execution audit (logs)** — make tool calls; query ELK for records with user/tool/ts/outcome;
+   assess field completeness. (Arcade's own audit log covers admin actions only, by design.)
+4. **Trace propagation** — send a call with trace context; check it joins agent→tool (engine already
+   emits TraceId in ELK; test whether OTEL traces export + join).
+5. **Admin audit log** — make an admin change (update a gateway); confirm it's logged in Arcade.
+6. **Data residency** — confirm no telemetry egresses to Arcade when self-hosted (collector/exporter
+   targets ST-internal only).
+7. **InfoSec sign-off (Dane)** — gate dependency, not ours to execute; record status.
+
+## Log
+- (start here)