cat1: FINALIZE scorecard (draft 4/5); STATUS + cat-5 NOTES ready for fresh-session handoff
@@ -0,0 +1,19 @@
{
  "workbench.colorCustomizations": {
    "activityBar.activeBackground": "#ff6433",
    "activityBar.activeBorder": "#00ff3d",
    "activityBar.background": "#ff6433",
    "activityBar.foreground": "#15202b",
    "activityBar.inactiveForeground": "#15202b99",
    "activityBarBadge.background": "#00ff3d",
    "activityBarBadge.foreground": "#15202b",
    "statusBar.background": "#ff3d00",
    "statusBar.foreground": "#e7e7e7",
    "statusBarItem.hoverBackground": "#ff6433",
    "titleBar.activeBackground": "#ff3d00",
    "titleBar.activeForeground": "#e7e7e7",
    "titleBar.inactiveBackground": "#ff3d0099",
    "titleBar.inactiveForeground": "#e7e7e799"
  },
  "peacock.color": "#ff3d00"
}
@@ -1,18 +1,19 @@
# STATUS — "you are here" handoff

Each lane owns its own section. Update yours; don't touch others'. Keep it terse.
Last full-repo update: 2026-06-18 (scaffold).
Last full-repo update: 2026-06-22.

## Category 1 — Functional MCP Gateway Capability
- Owner: ztaylor
- Status: in progress (scaffold done; executing per `~/repos/docs/arcade-eval-plan.md`)
- Last live-state check: —
- Notes: cat-1 lane = this session. Per-user tests via `user_id` headers (real Entra SSO → cat 2).
- Status: **SCORED (draft 4/5)** — `categories/cat1-functional/criteria-section-1.md`, awaiting user paste into the Google Doc.
- Last live-state check: 2026-06-22
- Result: protocol/curation/mixed/dynamic-reg/zero-config-clients all PASS; per-user execution proven (`whoami` A→A/B→B); Claude Code connected via Arcade-Headers AND Entra OAuth. One finding: per-user tool-LIST scoping is gateway-wide, not native (→ cat-3/separate gateways).
- Fixtures (reusable): gateway `zeb-gateway-test`; ref server `arcade-eval-ref` (lib/mcp_server) registered via cloudflared quick tunnel (EPHEMERAL — re-establish for cat-9; see LIVE-POC).

## Category 2 — Delegated Authorization and Identity
- Owner: — (security cluster: Dane / Chandu)
- Status: not started (criteria stub seeded)
- Notes: holds the Entra/Okta SSO login → identity-mapping test (a teammate can be User B).
- Status: not started (criteria stub seeded) — **but cat-1 work already generated strong evidence; see LIVE-POC "Known behaviors".**
- Notes: holds the Entra/Okta SSO login → identity-mapping test. Open finding: User Source keys user_id on opaque Entra `sub`, mismatching the dashboard email → blocks downstream OAuth consent bind (fix: map User Source to the email claim). Google provider redirect-uri/secret issue was resolved 2026-06-22.

## Category 3 — Tool-Level Access Control and Policy
- Owner: — (security cluster)
@@ -24,8 +25,9 @@ Last full-repo update: 2026-06-18 (scaffold).
## Category 5 — Auditability and Observability
- Owner: ztaylor
- Status: not started (criteria stub seeded)
- Notes: metrics → Grafana/Mimir (NOT ELK); engine OTLP currently dropped (no collector). See LIVE-POC.
- Status: **NEXT — start here in a fresh session** (invoke skill `arcade-gateway-eval`; read this + LIVE-POC; run live-state check). See `categories/cat5-auditability/NOTES.md` for the plan.
- Last live-state check: —
- Notes: metrics → **Grafana/Mimir** (NOT ELK); logs → ELK (Vector). Engine OTLP currently **dropped** — collector `arcade-otel-collector:4318` doesn't resolve. First task = OTEL collector → Prometheus/Mimir remediation (with the user; touches `k8s-backstage-v2/apps/arcade`). Full evidence + remediation shapes in LIVE-POC "Observability".

## Category 6 — Security and Compliance
- Owner: — (security cluster)
@@ -11,8 +11,14 @@
- Q5: ungranted tool → `McpError: tool not enabled for this gateway`.

## Remaining for cat-1 scoring
- [x] 2.2 (Claude Code) — `claude mcp add` HTTP → ✔ Connected, no adapter; key kept as `${ARCADE_API_KEY}` ref (not persisted).
- [ ] 2.2 (Cursor) — `.cursor/mcp.json` written with `${env:ARCADE_API_KEY}`; user verifying in Cursor UI (launch from shell with .env loaded).
- [x] 2.2 (Claude Code) — connected with NO adapter in both modes: Arcade-Headers (`claude mcp add`) AND Entra User-Source OAuth (`/mcp` login → tools loaded in-session; echo/whoami ran). Key kept as `${ARCADE_API_KEY}` ref (not persisted).
- [~] 2.2 (Cursor/LangGraph/internal) — not exercised this round; no adapter expected (same transport). Cursor config currently empty.
- [x] 2.8 — scorecard FINALIZED (draft 4/5) in criteria-section-1.md; awaiting user paste into Google Doc.

## Side evidence generated (handed to other lanes)
- cat-2: Entra IdP login works; identity = opaque `sub`; downstream OAuth consent-bind mismatch (see LIVE-POC).
- cat-4/8/9: `arcade deploy` is cloud-only → self-hosted servers use the register path.
- cat-9: full tunnel-registration chain validated end-to-end (client→gateway→Engine→tunnel→local server).
- [x] 2.5 — **dynamic registration**: PASS — saved add/remove (−Brightdata, +Youtube) reflected on next list, no restart; draft didn't propagate until Save.
- Reference server built at `lib/mcp_server` (echo/add/whoami); locally validated by `arcade deploy` (3 tools, 0 secrets). **`arcade deploy` is cloud-only (finding)** — see LIVE-POC.
- [x] 2.7 — **mixed prebuilt + custom**: PASS — gateway lists 7 prebuilt + 3 custom (ArcadeEvalRef_*, self-hosted via cloudflared tunnel) in one flat list; echo invokes. Full chain validated (also cat-9 Stage-2).
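
The 2.5 dynamic-registration check above boils down to diffing two `tools/list` snapshots taken before and after a Save. A minimal sketch of that comparison, assuming the tool names have already been pulled out of each listing; `diff_tool_lists` and the `Brightdata_Example` / `Slack_Example` names are hypothetical placeholders, not the gateway's real tool names:

```python
def diff_tool_lists(before: list[str], after: list[str]) -> dict[str, list[str]]:
    """Return the tool names added and removed between two tools/list snapshots."""
    before_set, after_set = set(before), set(after)
    return {
        "added": sorted(after_set - before_set),
        "removed": sorted(before_set - after_set),
    }


# Hypothetical snapshots: one tool dropped, one tool added, one unchanged.
snapshot_before = ["Brightdata_Example", "Slack_Example"]
snapshot_after = ["Slack_Example", "Youtube_SearchForVideos"]

change = diff_tool_lists(snapshot_before, snapshot_after)
print(change)
```

An empty `added`/`removed` result after editing a draft (without Save) would match the "draft didn't propagate" observation.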
@@ -2,22 +2,30 @@
> Verbatim criteria / gates / questions from the criteria Google Doc. Fill Score / Evidence /
> Findings / Answers locally; **the human pastes** into the Google Doc. 1–5 scale; anchors at 1/3/5.
> Status: **in progress** — scores held until the remaining tests (2.2 Claude Code, 2.5 dynamic
> reg, 2.7 mixed, 2.4 whoami) land. Raw evidence: `tests/probes.md`.
> Status: **FINALIZED (draft) 2026-06-22** — category score **4/5**. Draft for user review before
> pasting into the criteria Google Doc. Raw evidence: `tests/probes.md`.

## Scores
| # | Criterion (verbatim) | Score (1–5) | Evidence / note |
|---|---|---|---|
| 1 | Implements MCP protocol correctly — tool listing, tool invocation, error responses. | | PASS (live) — lib `mcp` SDK client connected, initialized, listed 7 tools, invoked, got structured `isError` result + JSON-RPC error. Minor: 202 on session close. |
| 2 | Gateway tool curation — ability to expose a subset of tools from underlying servers to a given doorway. | | PASS — 7 tools listed == the 7-tool allow-list selected (Slack×2, GoogleDocs×4, Brightdata×1). |
| 3 | Per-user tool scoping — different users see different tool lists based on their explicit grants. | | **FINDING** — User A and User B see the **identical 7 tools** on one gateway (Arcade-Headers). List is gateway-wide, not per-user. Per-user differentiation needs cat-3 Contextual Access or separate gateways / User Source. |
| 4 | Supports all required MCP clients without custom adapters (Claude Code, Cursor, LangGraph, internal agent frameworks). | | PASS (Claude Code) — `claude mcp add` HTTP → ✔ Connected, no adapter, key via `${ARCADE_API_KEY}` ref (not persisted). Plus compliant `mcp`-SDK client ✓. Cursor connect in progress (GUI verify, `${env:ARCADE_API_KEY}`). |
| 5 | Tool execution isolation — one user's tool call cannot access another user's tokens or context. | | PASS — `whoami` returns the calling user's id (A→A, B→B); each call runs in the caller's own context, not a shared identity. Echo invocation clean. |
| 6 | Supports mixing prebuilt (global catalog) and custom (self-hosted) servers behind a single gateway URL. | | PASS — one gateway lists 7 prebuilt (`main`) + 3 custom (self-hosted, tunnel-registered) tools in one flat list; both invoke. |
| 7 | Gateway is pure metadata — adding or removing tools does not require server redeployment. | | PASS — saved edit (remove Brightdata, add Youtube_SearchForVideos) reflected on next `tools/list`, no restart. |
| 8 | Dynamic tool registration — new tools become available without gateway restart. | | PASS — new tool appeared immediately after Save; no engine/server restart. |
| 1 | Implements MCP protocol correctly — tool listing, tool invocation, error responses. | 5 | PASS (live) — lib `mcp` SDK client connected, initialized, listed tools, invoked, got structured `isError` result + JSON-RPC error. Minor: 202 on session close. |
| 2 | Gateway tool curation — ability to expose a subset of tools from underlying servers to a given doorway. | 5 | PASS — listed tools == the configured allow-list exactly. |
| 3 | Per-user tool scoping — different users see different tool lists based on their explicit grants. | 2 | **FINDING** — User A and User B see the **identical** tool list on one gateway (Arcade-Headers). List is gateway-wide, not per-user. Per-user differentiation needs cat-3 Contextual Access or separate gateways / User Source — not native to the gateway allow-list. |
| 4 | Supports all required MCP clients without custom adapters (Claude Code, Cursor, LangGraph, internal agent frameworks). | 4 | PASS (Claude Code) — connected with **no adapter** in BOTH modes: Arcade-Headers (`claude mcp add` HTTP) and **Entra User-Source OAuth** (`/mcp` login → tools loaded in-session, echo/whoami executed). Plus compliant `mcp`-SDK client ✓. Cursor/LangGraph/internal not exercised this round (no adapter expected — same transport). |
| 5 | Tool execution isolation — one user's tool call cannot access another user's tokens or context. | 4 | PASS — `whoami` returns the calling user's id (A→A, B→B); each call runs in the caller's own context, not a shared identity. (Exhaustive cross-user token-access attack is cat-2/3 scope.) |
| 6 | Supports mixing prebuilt (global catalog) and custom (self-hosted) servers behind a single gateway URL. | 5 | PASS — one gateway lists 7 prebuilt (`main`) + 3 custom (self-hosted, tunnel-registered) tools in one flat list; both invoke. |
| 7 | Gateway is pure metadata — adding or removing tools does not require server redeployment. | 5 | PASS — saved edit (remove Brightdata, add Youtube_SearchForVideos) reflected on next `tools/list`, no restart. |
| 8 | Dynamic tool registration — new tools become available without gateway restart. | 5 | PASS — new tool appeared immediately after Save; no engine/server restart. |

**Average:** ___ **Category score:** ___
**Average:** 4.4 **Category score:** **4**

> **Category-score rationale (4/5):** Everything at the "5" anchor is met — full curation, mixed
> prebuilt+custom behind one URL, dynamic registration, and zero-config/no-adapter MCP clients
> (Claude Code via both headers and Entra OAuth). Held back from 5 by the one gap: **per-user tool
> scoping is not native** — a single gateway serves an identical tool list to all users; per-user
> differentiation requires workarounds (separate gateways or cat-3 Contextual Access), which is the
> "3" anchor's language. Net: well above 3 (curation + mixed + dynamic + zero-config all solid),
> below 5 (no native per-user tool scoping) → **4**.
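
The reported 4.4 average can be reproduced from the eight finalized per-criterion scores in the table. A minimal sketch of the arithmetic; the variable names are illustrative only, and the category score of 4 remains the judgment call described in the rationale rather than an output of this calculation:

```python
# Finalized scores for criteria 1 through 8, copied from the Scores table.
scores = [5, 5, 2, 4, 4, 5, 5, 5]

# Plain arithmetic mean of the eight criteria.
average = sum(scores) / len(scores)

print(f"raw average: {average}")
print(f"reported average: {average:.1f}")
```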

## Score anchors
- **1** — Basic MCP server, no per-user scoping or curation
@@ -27,7 +35,7 @@
## Benchmark questions
| # | Question (verbatim) | Answer | Evidence |
|---|---|---|---|
| 1 | Can a Claude Code client connect to the gateway and see only the tools granted to the current user? | Connect: lib client ✓; Claude Code pending (2.2). "Only granted tools": N/A — no per-user grants on this gateway (list is gateway-wide). | probes.md |
| 1 | Can a Claude Code client connect to the gateway and see only the tools granted to the current user? | Connect: **Yes** — Claude Code connected via both Arcade-Headers and Entra OAuth, no adapter; lib client ✓. "Only granted tools": **No** — list is gateway-wide, not per-user-granted. | probes.md |
| 2 | Can the same gateway URL serve two different users with different tool lists? | **No** — A and B see identical 7 tools. | probes.md (A==B) |
| 3 | Can we add a tool to the gateway without restarting any server or the Engine? | **Yes** — saved add/remove appeared on the next `tools/list`, no restart. (Draft edit did NOT propagate until Save — expected.) | probes.md |
| 4 | Can we expose tools from both a prebuilt connector and a custom self-hosted server through one gateway endpoint? | **Yes** — `zeb-gateway-test` exposes prebuilt `main` tools + custom self-hosted `ArcadeEvalRef_*` tools together; both list and invoke. | probes.md |
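
The "A==B" evidence for question 2 is a set comparison of the tool names each user received from the same gateway URL. A minimal sketch of that check, assuming the names were already collected per user; `tool_lists_differ`, the user keys, and the `tool_1` … `tool_7` names are hypothetical stand-ins for the real listings in probes.md:

```python
def tool_lists_differ(tools_by_user: dict[str, list[str]]) -> bool:
    """Return True if at least two users were served different sets of tool names."""
    distinct_sets = {frozenset(names) for names in tools_by_user.values()}
    return len(distinct_sets) > 1


# Hypothetical placeholder names for the 7 tools both users saw.
seven_tools = [f"tool_{i}" for i in range(1, 8)]
observed = {"user_a": list(seven_tools), "user_b": list(seven_tools)}

# False means the list is gateway-wide, which is the recorded finding.
print(tool_lists_differ(observed))
```

Comparing as sets ignores ordering, so a gateway that merely reorders the same tools per user would still count as serving an identical list.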
@@ -36,9 +44,9 @@
## Suggested pass/fail gates
| Gate | Pass condition (verbatim) | Result | Evidence |
|---|---|---|---|
| MCP protocol compliance | Any compliant MCP client connects without custom adapters | PASS (lib client; Claude Code to add in 2.2) | probes.md |
| MCP protocol compliance | Any compliant MCP client connects without custom adapters | PASS — lib `mcp`-SDK client + Claude Code (Arcade-Headers AND Entra OAuth), no adapters | probes.md |
| Tool curation | Gateway tool list matches exactly the configured allow-list | PASS | probes.md |
| Per-user isolation | User A cannot see or invoke tools granted only to User B | Not demonstrable on this gateway — no per-user grants (both see all 7). Needs cat-3 / separate gateways / User Source. **(finding)** | probes.md |
| Per-user isolation | User A cannot see or invoke tools granted only to User B | PARTIAL — **execution** isolation PASS (`whoami` A→A, B→B; calls run as caller). **Visibility** isolation NOT native: a single gateway shows all users the same list, so "tools granted only to B" needs cat-3 Contextual Access / separate gateways. **(finding)** | probes.md |
| Mixed server gateway | Prebuilt and custom server tools coexist behind one gateway URL | PASS | probes.md (10 tools: 7 prebuilt + 3 custom) |

## Findings
@@ -48,4 +56,5 @@
- **Invocation routes through the Engine and fails cleanly** when an OAuth provider/secret isn't configured (`Slack_WhoAmI` → "unsupported authorization provider type ID '' (providerID 'slack')") — no silent fallback to a shared credential.
- **Ungranted tool** → `tool not enabled for this gateway` (clean rejection).
- **Dynamic registration works**: a saved gateway edit (add + remove tools) takes effect on the next `tools/list` with no engine/server restart — gateway is pure metadata. Edits only apply after **Save** (drafts don't propagate).
- **Entra (User Source) client auth works**: Claude Code completed the Entra OIDC login to the gateway and loaded tools in-session, no adapter (also strong cat-2 IdP-integration evidence). Note: under User Source the identity (`whoami`) is the opaque Entra `sub`, not the email — see the cat-2 identity-mapping finding in `../../LIVE-POC.md`.
- Minor protocol nit: client logs `Session termination failed: 202` on session DELETE (benign).
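
The findings above rely on telling the two MCP failure shapes apart: a JSON-RPC `error` object (protocol-level) versus a successful `result` whose `isError` flag is set (tool-level). A minimal sketch of that distinction over already-decoded JSON-RPC messages; `classify_mcp_response` is a hypothetical helper, and the error code and message texts in the samples are illustrative, not captured gateway output:

```python
def classify_mcp_response(message: dict) -> str:
    """Classify a decoded JSON-RPC response from an MCP tools/call request."""
    # Protocol-level failure: the response carries an "error" object instead of "result".
    if "error" in message:
        return "protocol_error"
    # Tool-level failure: the call returned a result, but it is flagged with isError.
    if message.get("result", {}).get("isError"):
        return "tool_error"
    return "ok"


# Illustrative samples only (code and text are assumptions, not recorded evidence).
samples = [
    {"jsonrpc": "2.0", "id": 1, "error": {"code": -32602, "message": "example protocol error"}},
    {"jsonrpc": "2.0", "id": 2, "result": {"content": [{"type": "text", "text": "example failure"}], "isError": True}},
    {"jsonrpc": "2.0", "id": 3, "result": {"content": [{"type": "text", "text": "hello"}]}},
]

for sample in samples:
    print(sample["id"], classify_mcp_response(sample))
```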
@@ -0,0 +1,35 @@
# Lane notes — Category 5 (Auditability & Observability)

- **Owner:** ztaylor
- **Last live-state check:** —
- **Fixtures:** reuse gateway `zeb-gateway-test` + ref server `arcade-eval-ref` for generating tool-call traffic (see `../../config/targets.yaml`; ref-server tunnel is ephemeral — re-establish if down).

## Orientation (read before starting)
`../../LIVE-POC.md` → "Observability" + "Known behaviors". Key facts:
- **Logs → ELK** via the Vector daemonset (works today; engine logs visible in Kibana with
  `Tracing.TraceId`/`CorrelationId`/`NetCore.RequestPath`).
- **Metrics → Grafana/Mimir** via the Grafana Agent Operator (ServiceMonitor/PodMonitor scrape →
  remote_write to Mimir, tenant `X-Scope-OrgID: k8s-backstage-v4`). **NOT ELK.**
- **Engine OTLP metrics are dropped today** — `arcade-otel-collector:4318` doesn't resolve (no
  collector deployed). Confirmed in Kibana 2026-06-18.
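
The "doesn't resolve" symptom can be re-checked from any pod in the cluster with a plain DNS lookup. A minimal sketch using the standard library; `service_resolves` is a hypothetical helper, and the `resolver` parameter exists only so the check can be exercised without real DNS:

```python
import socket
from typing import Callable


def service_resolves(host: str, port: int, resolver: Callable = socket.getaddrinfo) -> bool:
    """Return True if the host name resolves to at least one address, False on DNS failure."""
    try:
        return len(resolver(host, port)) > 0
    except socket.gaierror:
        # Name resolution failed, e.g. no Service with that name exists.
        return False
```

Calling `service_resolves("arcade-otel-collector", 4318)` from inside the engine's namespace would be expected to return `False` until a collector Service is deployed; a `True` result only proves the name resolves, not that the collector accepts OTLP.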

## Plan (the three signals + admin + residency)
1. **OTEL pipeline health** — `kubectl -n arcade get svc,deploy,pod | grep -i otel`; check engine
   `OTEL_EXPORTER_OTLP_*` env + chart OTEL collector values. Confirm the drop.
2. **Metrics export remediation (primary objective; with the user — touches `apps/arcade`)** —
   deploy/enable a collector so `arcade-otel-collector:4318` resolves, then bridge into Prometheus/Mimir:
   EITHER (idiomatic) collector `prometheus` exporter `/metrics` + a `ServiceMonitor` (label
   `release: prometheus-operator`, NOT `grafana-agent: external`), OR (push) `prometheusremotewrite`
   exporter → `http://mimir-nginx.mimir.observability-wus2/api/v1/push` + `X-Scope-OrgID: k8s-backstage-v4`.
   Then generate tool-call traffic and confirm per-tool/per-user metrics appear in Grafana.
3. **Execution audit (logs)** — make tool calls; query ELK for records with user/tool/ts/outcome;
   assess field completeness. (Arcade's own audit log covers admin actions only, by design.)
4. **Trace propagation** — send a call with trace context; check it joins agent→tool (engine already
   emits TraceId in ELK; test whether OTEL traces export + join).
5. **Admin audit log** — make an admin change (update a gateway); confirm it's logged in Arcade.
6. **Data residency** — confirm no telemetry egresses to Arcade when self-hosted (collector/exporter
   targets ST-internal only).
7. **InfoSec sign-off (Dane)** — gate dependency, not ours to execute; record status.
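
Step 3's "assess field completeness" can be made mechanical once a sample of execution records has been exported from ELK. A minimal sketch, assuming each record is a flat dict; the key names in `REQUIRED_FIELDS` are hypothetical placeholders for user/tool/ts/outcome and would need mapping to whatever the engine logs actually emit:

```python
# Hypothetical key names for the four fields the plan asks for (user/tool/ts/outcome).
REQUIRED_FIELDS = ("user", "tool", "timestamp", "outcome")


def missing_audit_fields(record: dict) -> list[str]:
    """Return the required fields that are absent or empty in one log record."""
    return [field for field in REQUIRED_FIELDS if record.get(field) in (None, "")]


# Illustrative records, not real ELK output.
complete = {"user": "user_a", "tool": "echo", "timestamp": "2026-06-22T00:00:00Z", "outcome": "ok"}
partial = {"user": "user_b", "tool": ""}

print(missing_audit_fields(complete))
print(missing_audit_fields(partial))
```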

## Log
- (start here)