Category 1 — Functional MCP Gateway Capability (weight 8)

Verbatim criteria, gates, and questions from the criteria Google Doc. Fill in Score / Evidence / Findings / Answers locally; a human pastes them into the Google Doc. Scores use a 1–5 scale with anchors at 1/3/5. Status: in progress. Scores are held until the remaining tests land (2.2 Claude Code, 2.5 dynamic reg, 2.7 mixed, 2.4 whoami). Raw evidence: tests/probes.md.

Scores

| # | Criterion (verbatim) | Score (1–5) | Evidence / note |
|---|---|---|---|
| 1 | Implements MCP protocol correctly — tool listing, tool invocation, error responses. | | PASS (live) — lib mcp SDK client connected, initialized, listed 7 tools, invoked, got structured isError result + JSON-RPC error. Minor: 202 on session close. |
| 2 | Gateway tool curation — ability to expose a subset of tools from underlying servers to a given doorway. | | PASS — 7 tools listed == the 7-tool allow-list selected (Slack×2, GoogleDocs×4, Brightdata×1). |
| 3 | Per-user tool scoping — different users see different tool lists based on their explicit grants. | | FINDING — User A and User B see the identical 7 tools on one gateway (Arcade-Headers). List is gateway-wide, not per-user. Per-user differentiation needs cat-3 Contextual Access or separate gateways / User Source. |
| 4 | Supports all required MCP clients without custom adapters (Claude Code, Cursor, LangGraph, internal agent frameworks). | | PARTIAL — custom mcp-SDK client connected with no adapter ✓. Claude Code connect = 2.2; Cursor = teammate test. |
| 5 | Tool execution isolation — one user's tool call cannot access another user's tokens or context. | | PENDING — vault is per-user_id by design; direct proof via reference-server whoami (2.4). |
| 6 | Supports mixing prebuilt (global catalog) and custom (self-hosted) servers behind a single gateway URL. | | PENDING — needs reference server (2.7). |
| 7 | Gateway is pure metadata — adding or removing tools does not require server redeployment. | | PASS — saved edit (remove Brightdata, add Youtube_SearchForVideos) reflected on next tools/list, no restart. |
| 8 | Dynamic tool registration — new tools become available without gateway restart. | | PASS — new tool appeared immediately after Save; no engine/server restart. |

Average: ___ Category score: ___
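
The Average line can be filled mechanically once every score is in. The sketch below is one way to do that, assuming the average is a plain arithmetic mean of the eight 1–5 scores and that it stays blank while any score is still held. The function name and the sample values are hypothetical placeholders, not results from this evaluation, and how the Category score is derived from the average is left to the criteria doc.

```python
from statistics import mean
from typing import Optional


def category_average(scores: dict[int, Optional[int]]) -> Optional[float]:
    """Mean of the 1-5 criterion scores, or None while any score is still held.

    Keys are criterion numbers; a value of None means "not scored yet".
    """
    if any(score is None for score in scores.values()):
        return None  # scores are held until the remaining tests land
    for criterion, score in scores.items():
        if not 1 <= score <= 5:
            raise ValueError(f"criterion {criterion}: score {score} is outside 1-5")
    return mean(list(scores.values()))


# Placeholder values only, to show the behaviour.
held = category_average({1: 4, 2: None})  # one score still pending
filled = category_average({1: 4, 2: 2})   # all scores present
```

Returning None rather than a partial mean keeps an incomplete worksheet from producing a number that looks final.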

Score anchors

  • 1 — Basic MCP server, no per-user scoping or curation
  • 3 — Gateway curation works; per-user scoping requires workarounds
  • 5 — Full per-user tool scoping, mixed-server gateways, zero-config for MCP clients

Benchmark questions

| # | Question (verbatim) | Answer | Evidence |
|---|---|---|---|
| 1 | Can a Claude Code client connect to the gateway and see only the tools granted to the current user? | Connect: lib client ✓; Claude Code pending (2.2). "Only granted tools": N/A — no per-user grants on this gateway (list is gateway-wide). | probes.md |
| 2 | Can the same gateway URL serve two different users with different tool lists? | No — A and B see identical 7 tools. | probes.md (A==B) |
| 3 | Can we add a tool to the gateway without restarting any server or the Engine? | Yes — saved add/remove appeared on the next tools/list, no restart. (Draft edit did NOT propagate until Save — expected.) | probes.md |
| 4 | Can we expose tools from both a prebuilt connector and a custom self-hosted server through one gateway endpoint? | Pending reference server (2.7). | |
| 5 | What happens when a client requests a tool the user has not been granted? | McpError: tool not enabled for this gateway — clean rejection at the Engine, no leak/execution. | probes.md |
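
The check behind question 3 amounts to diffing two tools/list snapshots, one taken before the Save and one after. A minimal sketch follows, assuming each snapshot has already been reduced to a list of tool names. The helper name is hypothetical, and `Brightdata_PlaceholderTool` stands in for the real Brightdata tool name, which is not recorded in this file.

```python
def diff_tool_lists(before: list[str], after: list[str]) -> tuple[set[str], set[str]]:
    """Return (added, removed) tool names between two tools/list snapshots."""
    before_set, after_set = set(before), set(after)
    return after_set - before_set, before_set - after_set


# Snapshots reduced to tool names; the Brightdata name is a placeholder.
before_save = ["Slack_WhoAmI", "Brightdata_PlaceholderTool"]
after_save = ["Slack_WhoAmI", "Youtube_SearchForVideos"]

added, removed = diff_tool_lists(before_save, after_save)
```

Running the same diff against a snapshot taken while the edit is still a draft should return two empty sets, which matches the observation that drafts do not propagate.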

Suggested pass/fail gates

| Gate | Pass condition (verbatim) | Result | Evidence |
|---|---|---|---|
| MCP protocol compliance | Any compliant MCP client connects without custom adapters | PASS (lib client; Claude Code to add in 2.2) | probes.md |
| Tool curation | Gateway tool list matches exactly the configured allow-list | PASS | probes.md |
| Per-user isolation | User A cannot see or invoke tools granted only to User B | Not demonstrable on this gateway — no per-user grants (both see all 7). Needs cat-3 / separate gateways / User Source. (finding) | probes.md |
| Mixed server gateway | Prebuilt and custom server tools coexist behind one gateway URL | Pending (2.7) | |
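
Two of the gates above reduce to set comparisons on tool names. The sketch below shows one way to express them, including the "not demonstrable" outcome when no per-user grants exist. All names here are hypothetical, and the isolation check covers only the "see" half of the gate (tool-list visibility), not invocation.

```python
from enum import Enum


class GateResult(Enum):
    PASS = "pass"
    FAIL = "fail"
    NOT_DEMONSTRABLE = "not demonstrable"


def curation_gate(listed: list[str], allow_list: list[str]) -> GateResult:
    """Pass only if the listed tools match the allow-list exactly, with no duplicates."""
    exact = set(listed) == set(allow_list) and len(listed) == len(set(listed))
    return GateResult.PASS if exact else GateResult.FAIL


def isolation_visibility_gate(tools_seen_by_a: list[str], granted_only_to_b: set[str]) -> GateResult:
    """Check that User A's tool list contains nothing granted only to User B.

    With no B-only grants there is nothing to test, so the gate cannot be shown either way.
    """
    if not granted_only_to_b:
        return GateResult.NOT_DEMONSTRABLE
    leaked = set(tools_seen_by_a) & granted_only_to_b
    return GateResult.FAIL if leaked else GateResult.PASS
```

Keeping "not demonstrable" separate from PASS avoids recording a vacuous pass when both users simply see the whole gateway list.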

Findings

  • The tool list is gateway-wide, not per-user, in Arcade-Headers mode (A and B see identical lists). Differentiation requires Contextual Access (cat 3) or separate gateways / a User Source. This points to the score-3 anchor ("per-user scoping requires workarounds") unless cat-3 lifts it.
  • Invocation routes through the Engine and fails cleanly when an OAuth provider/secret isn't configured (Slack_WhoAmI → "unsupported authorization provider type ID '' (providerID 'slack')") — no silent fallback to a shared credential.
  • Ungranted tool → "tool not enabled for this gateway" (clean rejection).
  • Dynamic registration works: a saved gateway edit (add + remove tools) takes effect on the next tools/list with no engine/server restart — gateway is pure metadata. Edits only apply after Save (drafts don't propagate).
  • Minor protocol nit: the client logs "Session termination failed: 202" on session DELETE (benign).
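
The findings rely on two distinct failure shapes for a tools/call: a JSON-RPC error response, and a normal result carrying `isError: true`. The sketch below classifies a decoded response message accordingly. The function name is hypothetical, and the payloads (including the `-32602` code) are illustrative only; which real failure produced which shape is recorded in probes.md, not here.

```python
from typing import Any


def classify_tool_call_response(message: dict[str, Any]) -> str:
    """Classify a decoded JSON-RPC response to tools/call.

    Returns "protocol_error" for a JSON-RPC error object, "tool_error" for a
    result flagged with isError, and "success" for any other result.
    """
    if "error" in message:
        return "protocol_error"
    result = message.get("result")
    if not isinstance(result, dict):
        raise ValueError("message is neither a JSON-RPC error nor a result")
    return "tool_error" if result.get("isError") is True else "success"


# Illustrative payloads, not captured traffic.
rejected = {
    "jsonrpc": "2.0",
    "id": 1,
    "error": {"code": -32602, "message": "tool not enabled for this gateway"},
}
failed_in_tool = {
    "jsonrpc": "2.0",
    "id": 2,
    "result": {"content": [{"type": "text", "text": "authorization failed"}], "isError": True},
}
```

Recording which shape each probe returned makes it easier to confirm that rejections happen before any tool code runs.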