cat1: FINALIZE scorecard (draft 4/5); STATUS + cat-5 NOTES ready for fresh-session handoff

2026-06-22 09:55:01 -04:00
parent 8b48f5813e
commit 53f960409e
5 changed files with 95 additions and 24 deletions
@@ -2,22 +2,30 @@
> Verbatim criteria / gates / questions from the criteria Google Doc. Fill Score / Evidence /
> Findings / Answers locally; **the human pastes** into the Google Doc. 1–5 scale; anchors at 1/3/5.
> Status: **in progress** — scores held until the remaining tests (2.2 Claude Code, 2.5 dynamic
> reg, 2.7 mixed, 2.4 whoami) land. Raw evidence: `tests/probes.md`.
> Status: **FINALIZED (draft) 2026-06-22** — category score **4/5**. Draft for user review before
> pasting into the criteria Google Doc. Raw evidence: `tests/probes.md`.
## Scores
| # | Criterion (verbatim) | Score (1–5) | Evidence / note |
|---|---|---|---|
| 1 | Implements MCP protocol correctly — tool listing, tool invocation, error responses. | | PASS (live) — lib `mcp` SDK client connected, initialized, listed 7 tools, invoked, got structured `isError` result + JSON-RPC error. Minor: 202 on session close. |
| 2 | Gateway tool curation — ability to expose a subset of tools from underlying servers to a given doorway. | | PASS — 7 tools listed == the 7-tool allow-list selected (Slack×2, GoogleDocs×4, Brightdata×1). |
| 3 | Per-user tool scoping — different users see different tool lists based on their explicit grants. | | **FINDING** — User A and User B see the **identical 7 tools** on one gateway (Arcade-Headers). List is gateway-wide, not per-user. Per-user differentiation needs cat-3 Contextual Access or separate gateways / User Source. |
| 4 | Supports all required MCP clients without custom adapters (Claude Code, Cursor, LangGraph, internal agent frameworks). | | PASS (Claude Code) — `claude mcp add` HTTP → ✔ Connected, no adapter, key via `${ARCADE_API_KEY}` ref (not persisted). Plus compliant `mcp`-SDK client ✓. Cursor connect in progress (GUI verify, `${env:ARCADE_API_KEY}`). |
| 5 | Tool execution isolation — one user's tool call cannot access another user's tokens or context. | | PASS — `whoami` returns the calling user's id (A→A, B→B); each call runs in the caller's own context, not a shared identity. Echo invocation clean. |
| 6 | Supports mixing prebuilt (global catalog) and custom (self-hosted) servers behind a single gateway URL. | | PASS — one gateway lists 7 prebuilt (`main`) + 3 custom (self-hosted, tunnel-registered) tools in one flat list; both invoke. |
| 7 | Gateway is pure metadata — adding or removing tools does not require server redeployment. | | PASS — saved edit (remove Brightdata, add Youtube_SearchForVideos) reflected on next `tools/list`, no restart. |
| 8 | Dynamic tool registration — new tools become available without gateway restart. | | PASS — new tool appeared immediately after Save; no engine/server restart. |
| 1 | Implements MCP protocol correctly — tool listing, tool invocation, error responses. | 5 | PASS (live) — lib `mcp` SDK client connected, initialized, listed tools, invoked, got structured `isError` result + JSON-RPC error. Minor: 202 on session close. |
| 2 | Gateway tool curation — ability to expose a subset of tools from underlying servers to a given doorway. | 5 | PASS — listed tools == the configured allow-list exactly. |
| 3 | Per-user tool scoping — different users see different tool lists based on their explicit grants. | 2 | **FINDING** — User A and User B see the **identical** tool list on one gateway (Arcade-Headers). List is gateway-wide, not per-user. Per-user differentiation needs cat-3 Contextual Access or separate gateways / User Source — not native to the gateway allow-list. |
| 4 | Supports all required MCP clients without custom adapters (Claude Code, Cursor, LangGraph, internal agent frameworks). | 4 | PASS (Claude Code). Connected with **no adapter** in BOTH modes: Arcade-Headers (`claude mcp add` HTTP) and **Entra User-Source OAuth** (`/mcp` login → tools loaded in-session, echo/whoami executed). Plus compliant `mcp`-SDK client ✓. Cursor, LangGraph, and internal agent frameworks not exercised this round; no adapter is expected since they use the same transport, but this is unverified. |
| 5 | Tool execution isolation — one user's tool call cannot access another user's tokens or context. | 4 | PASS — `whoami` returns the calling user's id (A→A, B→B); each call runs in the caller's own context, not a shared identity. (Exhaustive cross-user token-access attack is cat-2/3 scope.) |
| 6 | Supports mixing prebuilt (global catalog) and custom (self-hosted) servers behind a single gateway URL. | 5 | PASS — one gateway lists 7 prebuilt (`main`) + 3 custom (self-hosted, tunnel-registered) tools in one flat list; both invoke. |
| 7 | Gateway is pure metadata — adding or removing tools does not require server redeployment. | 5 | PASS — saved edit (remove Brightdata, add Youtube_SearchForVideos) reflected on next `tools/list`, no restart. |
| 8 | Dynamic tool registration — new tools become available without gateway restart. | 5 | PASS — new tool appeared immediately after Save; no engine/server restart. |
**Average:** ___ **Category score:** ___
**Average:** 4.4 **Category score:** **4**
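The 4.4 average is the plain mean of the eight per-criterion scores in the table (35 / 8 = 4.375, shown to one decimal). A minimal Python sketch that reproduces the arithmetic, assuming an unweighted mean (the scorecard does not state a weighting); the category score of 4 comes from the anchor rationale, not from this calculation.

```python
# Per-criterion scores copied from the Scores table (criteria 1-8, in order).
SCORES: dict[int, int] = {
    1: 5,  # MCP protocol correctness
    2: 5,  # gateway tool curation
    3: 2,  # per-user tool scoping (finding)
    4: 4,  # MCP clients without custom adapters
    5: 4,  # tool execution isolation
    6: 5,  # mixed prebuilt + custom servers
    7: 5,  # gateway is pure metadata
    8: 5,  # dynamic tool registration
}


def average(scores: dict[int, int]) -> float:
    """Unweighted arithmetic mean of the per-criterion scores (assumption)."""
    return sum(scores.values()) / len(scores)


mean = average(SCORES)
print(f"sum={sum(SCORES.values())} n={len(SCORES)} mean={mean} rounded={round(mean, 1)}")
# prints: sum=35 n=8 mean=4.375 rounded=4.4
```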
> **Category-score rationale (4/5):** Everything at the "5" anchor except per-user scoping is met: full curation, mixed
> prebuilt+custom behind one URL, dynamic registration, and zero-config/no-adapter MCP clients
> (Claude Code via both headers and Entra OAuth). Held back from 5 by the one gap: **per-user tool
> scoping is not native** — a single gateway serves an identical tool list to all users; per-user
> differentiation requires workarounds (separate gateways or cat-3 Contextual Access), which is the
> "3" anchor's language. Net: well above 3 (curation + mixed + dynamic + zero-config all solid),
> below 5 (no native per-user tool scoping) → **4**.
## Score anchors
- **1** — Basic MCP server, no per-user scoping or curation
@@ -27,7 +35,7 @@
## Benchmark questions
| # | Question (verbatim) | Answer | Evidence |
|---|---|---|---|
| 1 | Can a Claude Code client connect to the gateway and see only the tools granted to the current user? | Connect: lib client ✓; Claude Code pending (2.2). "Only granted tools": N/A — no per-user grants on this gateway (list is gateway-wide). | probes.md |
| 1 | Can a Claude Code client connect to the gateway and see only the tools granted to the current user? | Connect: **Yes** — Claude Code connected via both Arcade-Headers and Entra OAuth, no adapter; lib client ✓. "Only granted tools": **No** — list is gateway-wide, not per-user-granted. | probes.md |
| 2 | Can the same gateway URL serve two different users with different tool lists? | **No**: A and B see the same 7 tools. | probes.md (A==B) |
| 3 | Can we add a tool to the gateway without restarting any server or the Engine? | **Yes** — saved add/remove appeared on the next `tools/list`, no restart. (Draft edit did NOT propagate until Save — expected.) | probes.md |
| 4 | Can we expose tools from both a prebuilt connector and a custom self-hosted server through one gateway endpoint? | **Yes**: `zeb-gateway-test` exposes prebuilt `main` tools + custom self-hosted `ArcadeEvalRef_*` tools together; both list and invoke. | probes.md |
@@ -36,9 +44,9 @@
## Suggested pass/fail gates
| Gate | Pass condition (verbatim) | Result | Evidence |
|---|---|---|---|
| MCP protocol compliance | Any compliant MCP client connects without custom adapters | PASS (lib client; Claude Code to add in 2.2) | probes.md |
| MCP protocol compliance | Any compliant MCP client connects without custom adapters | PASS: lib `mcp`-SDK client + Claude Code (Arcade-Headers AND Entra OAuth), no adapters | probes.md |
| Tool curation | Gateway tool list matches exactly the configured allow-list | PASS | probes.md |
| Per-user isolation | User A cannot see or invoke tools granted only to User B | Not demonstrable on this gateway — no per-user grants (both see all 7). Needs cat-3 / separate gateways / User Source. **(finding)** | probes.md |
| Per-user isolation | User A cannot see or invoke tools granted only to User B | PARTIAL — **execution** isolation PASS (`whoami` A→A, B→B; calls run as caller). **Visibility** isolation NOT native: a single gateway shows all users the same list, so "tools granted only to B" needs cat-3 Contextual Access / separate gateways. **(finding)** | probes.md |
| Mixed server gateway | Prebuilt and custom server tools coexist behind one gateway URL | PASS | probes.md (10 tools: 7 prebuilt + 3 custom) |
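The Tool curation and Per-user isolation gates both reduce to set comparisons over the tool names a client receives from `tools/list`. A minimal Python sketch, assuming the names have already been extracted from each user's `tools/list` response; the function names and tool names are hypothetical placeholders, not the allow-list or tooling behind `probes.md`. It covers only the visibility half of the isolation gate; execution isolation (the `whoami` A→A, B→B check) needs live calls.

```python
from typing import Iterable


def curation_gate(listed: Iterable[str], allow_list: Iterable[str]) -> str:
    """Tool curation: the gateway's tool list must match the configured
    allow-list exactly (nothing missing, nothing extra, no duplicates)."""
    listed = list(listed)
    exact = len(listed) == len(set(listed)) and set(listed) == set(allow_list)
    return "PASS" if exact else "FAIL"


def visibility_gate(tools_a: Iterable[str], granted_only_to_b: Iterable[str]) -> str:
    """Visibility half of per-user isolation: User A must not see any tool
    granted only to User B. With no B-only grants there is nothing to test."""
    only_b = set(granted_only_to_b)
    if not only_b:
        return "NOT DEMONSTRABLE"
    return "FAIL" if only_b & set(tools_a) else "PASS"


# Hypothetical tool names; both users receive the same gateway-wide list.
allow_list = ["tool_1", "tool_2", "tool_3"]
tools_user_a = ["tool_1", "tool_2", "tool_3"]
tools_user_b = ["tool_3", "tool_1", "tool_2"]

print(curation_gate(tools_user_a, allow_list))              # PASS
print(set(tools_user_a) == set(tools_user_b))               # True
print(visibility_gate(tools_user_a, granted_only_to_b=[]))  # NOT DEMONSTRABLE
```

When the list is gateway-wide, as on the gateway tested here, B-only grants cannot be expressed in the allow-list alone, so the visibility check has nothing to exercise; that is why the gate reads PARTIAL rather than PASS.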
## Findings
@@ -48,4 +56,5 @@
- **Invocation routes through the Engine and fails cleanly** when an OAuth provider/secret isn't configured (`Slack_WhoAmI` → "unsupported authorization provider type ID '' (providerID 'slack')") — no silent fallback to a shared credential.
- **Ungranted tool** → `tool not enabled for this gateway` (clean rejection).
- **Dynamic registration works**: a saved gateway edit (add + remove tools) takes effect on the next `tools/list` with no engine/server restart — gateway is pure metadata. Edits only apply after **Save** (drafts don't propagate).
- **Entra (User Source) client auth works**: Claude Code completed the Entra OIDC login to the gateway and loaded tools in-session, no adapter (also strong cat-2 IdP-integration evidence). Note: under User Source the identity (`whoami`) is the opaque Entra `sub`, not the email — see the cat-2 identity-mapping finding in `../../LIVE-POC.md`.
- Minor protocol nit: client logs `Session termination failed: 202` on session DELETE (benign).
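The findings rely on two distinct failure channels that the MCP specification defines for `tools/call`: a JSON-RPC `error` object (protocol level) and a normal `result` whose `isError` flag is true (tool level). The dynamic-registration finding is likewise confirmed by comparing successive `tools/list` snapshots. A minimal Python sketch of both checks, assuming raw JSON-RPC messages already parsed into dicts; the helper names, message ids, error text, and tool names are hypothetical, and which channel the gateway used for each rejection above is not recorded here.

```python
from typing import Any


def classify_tool_call_response(message: dict[str, Any]) -> str:
    """Classify a JSON-RPC response to an MCP `tools/call` request.

    "protocol-error": the response carries a JSON-RPC `error` object.
    "tool-error":     a `result` is present and its `isError` flag is true.
    "ok":             a normal tool result.
    """
    if "error" in message:
        return "protocol-error"
    result = message.get("result", {})
    return "tool-error" if result.get("isError", False) else "ok"


def diff_tool_lists(before: list[str], after: list[str]) -> tuple[set[str], set[str]]:
    """Compare two `tools/list` snapshots; return (added, removed) tool names."""
    return set(after) - set(before), set(before) - set(after)


# Hypothetical messages shaped per the MCP spec.
protocol_error = {"jsonrpc": "2.0", "id": 1,
                  "error": {"code": -32602, "message": "example protocol error"}}
tool_error = {"jsonrpc": "2.0", "id": 2,
              "result": {"content": [{"type": "text", "text": "example tool failure"}],
                         "isError": True}}

print(classify_tool_call_response(protocol_error))  # protocol-error
print(classify_tool_call_response(tool_error))      # tool-error

# Hypothetical snapshots taken before and after a saved gateway edit.
added, removed = diff_tool_lists(["tool_1", "tool_2"], ["tool_1", "tool_3"])
print(sorted(added), sorted(removed))  # ['tool_3'] ['tool_2']
```

A snapshot diff that shows exactly the saved additions and removals on the next `tools/list`, with no restart in between, is the evidence pattern recorded for criteria 7 and 8.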