diff --git a/LIVE-POC.md b/LIVE-POC.md
index 4bac7d2..6319b46 100644
--- a/LIVE-POC.md
+++ b/LIVE-POC.md
@@ -47,7 +47,10 @@ Self-hosted on `backstage-wus2-v4` via Flux; vendor Helm chart **1.8.8**
 - **Baseline gateway:** `zeb-gateway-test` — auth mode **Arcade Headers** (API key + `Arcade-User-ID`);
   7 main-catalog tools (Slack ×2, GoogleDocs ×4, Brightdata ×1). See `config/targets.yaml`.
   Confirmed live 2026-06-18: tool list is gateway-wide (same for all `Arcade-User-ID`s).
-- **Shared reference server:** _name + tools echo/whoami/add (Task 1.4)_
+- **Shared reference server:** `arcade-eval-ref` (dashboard id `military-healthy-posted-rats`), toolkit
+  `ArcadeEvalRef`, tools Echo/Add/Whoami — self-hosted at `lib/mcp_server`, registered via a Cloudflare
+  **quick** tunnel (ephemeral URL in `results/tunnel_url.txt`; re-register on restart). whoami exec-proof
+  verified (A→user-a, B→user-b).
 - **`whoami` identity field:** server reads `context.user_id` (arcade_mcp_server `Context`), populated by
   the Engine from the calling user (`Arcade-User-ID` / auth `sub`).
 ## Known behaviors (findings)
diff --git a/categories/cat1-functional/NOTES.md b/categories/cat1-functional/NOTES.md
index 3b9cab9..ffcf63e 100644
--- a/categories/cat1-functional/NOTES.md
+++ b/categories/cat1-functional/NOTES.md
@@ -14,6 +14,6 @@
 - [ ] 2.2 — connect a **second real MCP client (Claude Code)** to the gateway (no-adapter evidence).
 - [x] 2.5 — **dynamic registration**: PASS — saved add/remove (−Brightdata, +Youtube) reflected on next list, no restart; draft didn't propagate until Save.
 - Reference server built at `lib/mcp_server` (echo/add/whoami); locally validated by `arcade deploy` (3 tools, 0 secrets). **`arcade deploy` is cloud-only (finding)** — see LIVE-POC.
-- [ ] 2.7 — **mixed prebuilt + custom**: needs the ref server behind the self-hosted Engine via the **register path** (run `server.py --transport http` + cloudflared tunnel + dashboard Add Server), then compose a gateway (a `main` tool + `echo`). Doubles as cat-9 Stage-2.
-- [ ] 2.4 — **`whoami` execution proof**: once registered, call whoami as A vs B (expect A→A, B→B).
+- [x] 2.7 — **mixed prebuilt + custom**: PASS — gateway lists 7 prebuilt + 3 custom (ArcadeEvalRef_*, self-hosted via cloudflared tunnel) in one flat list; echo invokes. Full chain validated (also cat-9 Stage-2).
+- [x] 2.4 — **`whoami` execution proof**: PASS — whoami A→user-a, B→user-b (calls execute as caller).
 - [ ] 2.8 — finalize scores once the above land.
diff --git a/categories/cat1-functional/criteria-section-1.md b/categories/cat1-functional/criteria-section-1.md
index 2939405..75680ff 100644
--- a/categories/cat1-functional/criteria-section-1.md
+++ b/categories/cat1-functional/criteria-section-1.md
@@ -12,8 +12,8 @@
 | 2 | Gateway tool curation — ability to expose a subset of tools from underlying servers to a given doorway. | | PASS — 7 tools listed == the 7-tool allow-list selected (Slack×2, GoogleDocs×4, Brightdata×1). |
 | 3 | Per-user tool scoping — different users see different tool lists based on their explicit grants. | | **FINDING** — User A and User B see the **identical 7 tools** on one gateway (Arcade-Headers). List is gateway-wide, not per-user. Per-user differentiation needs cat-3 Contextual Access or separate gateways / User Source. |
 | 4 | Supports all required MCP clients without custom adapters (Claude Code, Cursor, LangGraph, internal agent frameworks). | | PARTIAL — custom `mcp`-SDK client connected with no adapter ✓. Claude Code connect = 2.2; Cursor = teammate test. |
-| 5 | Tool execution isolation — one user's tool call cannot access another user's tokens or context. | | PENDING — vault is per-`user_id` by design; direct proof via reference-server `whoami` (2.4). |
-| 6 | Supports mixing prebuilt (global catalog) and custom (self-hosted) servers behind a single gateway URL. | | PENDING — needs reference server (2.7). |
+| 5 | Tool execution isolation — one user's tool call cannot access another user's tokens or context. | | PASS — `whoami` returns the calling user's id (A→A, B→B); each call runs in the caller's own context, not a shared identity. Echo invocation clean. |
+| 6 | Supports mixing prebuilt (global catalog) and custom (self-hosted) servers behind a single gateway URL. | | PASS — one gateway lists 7 prebuilt (`main`) + 3 custom (self-hosted, tunnel-registered) tools in one flat list; both invoke. |
 | 7 | Gateway is pure metadata — adding or removing tools does not require server redeployment. | | PASS — saved edit (remove Brightdata, add Youtube_SearchForVideos) reflected on next `tools/list`, no restart. |
 | 8 | Dynamic tool registration — new tools become available without gateway restart. | | PASS — new tool appeared immediately after Save; no engine/server restart. |
 
@@ -30,7 +30,7 @@
 | 1 | Can a Claude Code client connect to the gateway and see only the tools granted to the current user? | Connect: lib client ✓; Claude Code pending (2.2). "Only granted tools": N/A — no per-user grants on this gateway (list is gateway-wide). | probes.md |
 | 2 | Can the same gateway URL serve two different users with different tool lists? | **No** — A and B see identical 7 tools. | probes.md (A==B) |
 | 3 | Can we add a tool to the gateway without restarting any server or the Engine? | **Yes** — saved add/remove appeared on the next `tools/list`, no restart. (Draft edit did NOT propagate until Save — expected.) | probes.md |
-| 4 | Can we expose tools from both a prebuilt connector and a custom self-hosted server through one gateway endpoint? | Pending reference server (2.7). | |
+| 4 | Can we expose tools from both a prebuilt connector and a custom self-hosted server through one gateway endpoint? | **Yes** — `zeb-gateway-test` exposes prebuilt `main` tools + custom self-hosted `ArcadeEvalRef_*` tools together; both list and invoke. | probes.md |
 | 5 | What happens when a client requests a tool the user has not been granted? | `McpError: tool not enabled for this gateway` — clean rejection at the Engine, no leak/execution. | probes.md |
 
 ## Suggested pass/fail gates
@@ -39,9 +39,11 @@
 | MCP protocol compliance | Any compliant MCP client connects without custom adapters | PASS (lib client; Claude Code to add in 2.2) | probes.md |
 | Tool curation | Gateway tool list matches exactly the configured allow-list | PASS | probes.md |
 | Per-user isolation | User A cannot see or invoke tools granted only to User B | Not demonstrable on this gateway — no per-user grants (both see all 7). Needs cat-3 / separate gateways / User Source. **(finding)** | probes.md |
-| Mixed server gateway | Prebuilt and custom server tools coexist behind one gateway URL | Pending (2.7) | |
+| Mixed server gateway | Prebuilt and custom server tools coexist behind one gateway URL | PASS | probes.md (10 tools: 7 prebuilt + 3 custom) |
 
 ## Findings
+- **Per-user execution proven**: `whoami` through the full chain returns the calling user's identity (A→A, B→B) — calls execute as the caller, not a shared account. Cat-1's contribution to the per-user-execution hard gate.
+- **Mixed prebuilt+custom works**: a single gateway serves Arcade-cloud `main` tools and a self-hosted (tunnel-registered) server's tools together; the full client→gateway→Engine→tunnel→server chain works (also validates cat-9 Stage-2).
 - **Per-user tool-list scoping is gateway-wide, not per-user, in Arcade-Headers mode** (A==B identical). Differentiation requires Contextual Access (cat 3) or separate gateways / a User Source. Signals the score-3 anchor ("per-user scoping requires workarounds") unless cat-3 lifts it.
 - **Invocation routes through the Engine and fails cleanly** when an OAuth provider/secret isn't configured (`Slack_WhoAmI` → "unsupported authorization provider type ID '' (providerID 'slack')") — no silent fallback to a shared credential.
 - **Ungranted tool** → `tool not enabled for this gateway` (clean rejection).
diff --git a/categories/cat1-functional/tests/probes.md b/categories/cat1-functional/tests/probes.md
index f018b37..d52440f 100644
--- a/categories/cat1-functional/tests/probes.md
+++ b/categories/cat1-functional/tests/probes.md
@@ -56,3 +56,27 @@ removed since first probe: ['Brightdata_ScrapeAsMarkdown']
 metadata). Corollary: the edit did **not** propagate while unsaved (draft); it appeared only after
 **Save** — correct/expected, not a defect. Propagation was effectively immediate (next poll).
 
+## Mixed prebuilt + custom + whoami execution proof (2.7, 2.4) — full self-hosted chain
+Registered the reference server (`arcade-mcp`, toolkit `ArcadeEvalRef`) as a **self-hosted Arcade
+server** via a Cloudflare tunnel (dashboard Add Server → **Arcade** type; URI = trycloudflare URL,
+Secret = `ARCADE_WORKER_SECRET`), then added Echo/Add/Whoami to `zeb-gateway-test`. (`arcade deploy`
+hosted is cloud-only — see LIVE-POC finding — so the register path is used.)
+
+Gateway lists **10 tools in one flat list — prebuilt + custom coexist**:
+```
+prebuilt (7): GoogleDocs x4, Slack x2, Youtube x1 (Arcade-cloud `main`)
+custom (3): ArcadeEvalRef_Add, _Echo, _Whoami (our self-hosted server, via tunnel)
+```
+Invocation (full chain client -> gateway -> Engine -> Cloudflare tunnel -> local server):
+```
+ArcadeEvalRef_Echo(text="hello-from-A") as A -> "hello-from-A" (isError: False)
+```
+**Per-user EXECUTION proof (whoami):**
+```
+whoami as A (user-a@servicetitan.com) -> "user-a@servicetitan.com"
+whoami as B (user-b@servicetitan.com) -> "user-b@servicetitan.com"
+```
+Each caller's `Arcade-User-ID` is injected into `context.user_id` and returned — the tool provably
+executes as the calling user (distinct identity per caller, no shared/service identity). Also
+validates **cat-9 Stage-2** (full tunnel-registration chain) end-to-end.
+
diff --git a/config/targets.yaml b/config/targets.yaml
index 0ab856d..bdc64ff 100644
--- a/config/targets.yaml
+++ b/config/targets.yaml
@@ -21,10 +21,20 @@ gateways:
       - Slack_SendMessage
       - Slack_WhoAmI
       - Youtube_SearchForVideos
-    notes: baseline cat-1 gateway from main catalog (Slack, GoogleDocs, Youtube). Edited live 2026-06-18 (-Brightdata, +Youtube) to test dynamic registration.
+      - ArcadeEvalRef_Echo
+      - ArcadeEvalRef_Add
+      - ArcadeEvalRef_Whoami
+    notes: cat-1 gateway. Mixed prebuilt (Slack/GoogleDocs/Youtube from main) + custom (ArcadeEvalRef_* from the self-hosted ref server). Also used for dynamic-registration test (-Brightdata, +Youtube).
 
 # name -> {kind: hosted|self-hosted, tools: [...], created_by, notes} (filled in Task 1.4)
-servers: {}
+servers:
+  arcade-eval-ref:
+    dashboard_id: military-healthy-posted-rats # ID 'arcade-eval-ref' was rejected; used the tunnel subdomain
+    toolkit: ArcadeEvalRef
+    kind: self-hosted-registered # via Cloudflare tunnel (arcade deploy is cloud-only)
+    source: lib/mcp_server
+    tools: [ArcadeEvalRef_Echo, ArcadeEvalRef_Add, ArcadeEvalRef_Whoami]
+    notes: tunnel URL is an ephemeral cloudflared quick tunnel (results/tunnel_url.txt) — re-register on restart.
 
 # Headless per-user identities (vault keys). Any stable string works.
 users: