docs: update deploy design for public-ingress pivot + publicOnlyTransport finding

Records that the in-cluster Service DNS could not be used for a dashboard-registered worker (engine publicOnlyTransport SSRF guard blocks internal addresses), the pivot to st-app chart + public ingress at arcade-eval-ref.st.dev (CNAME -> k8s-backstage.st.dev), and the verified end-to-end whoami result.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-22 12:44:55 -04:00
5 changed files with 83 additions and 388 deletions
@@ -1,24 +1,23 @@
 # STATUS — "you are here" handoff
 Each lane owns its own section. Update yours; don't touch others'. Keep it terse.
-Last full-repo update: 2026-06-18 (scaffold).
+Last full-repo update: 2026-06-22.
 ## Category 1 — Functional MCP Gateway Capability
 - Owner: ztaylor
- Status: in progress (scaffold done; executing per `~/repos/docs/arcade-eval-plan.md`)
+- Status: **SCORED (draft 4/5)** — `categories/cat1-functional/criteria-section-1.md`, awaiting user paste into the Google Doc.
- Last live-state check: —
+- Last live-state check: 2026-06-22
- Notes: cat-1 lane = this session. Per-user tests via `user_id` headers (real Entra SSO → cat 2).
+- Result: protocol/curation/mixed/dynamic-reg/zero-config-clients all PASS; per-user execution proven (`whoami` A→A/B→B); Claude Code connected via Arcade-Headers AND Entra OAuth. One finding: per-user tool-LIST scoping is gateway-wide, not native (→ cat-3/separate gateways).
 - Fixtures (reusable): gateway `zeb-gateway-test`; ref server `arcade-eval-ref` (lib/mcp_server) registered via cloudflared quick tunnel (EPHEMERAL — re-establish for cat-9; see LIVE-POC).
 ## Category 2 — Delegated Authorization and Identity
 - Owner: — (security cluster: Dane / Chandu)
- Status: not started (criteria stub seeded)
+- Status: not started (criteria stub seeded) — **but cat-1 work already generated strong evidence; see LIVE-POC "Known behaviors".**
- Notes: holds the Entra/Okta SSO login → identity-mapping test (a teammate can be User B).
+- Notes: holds the Entra/Okta SSO login → identity-mapping test. Open finding: User Source keys user_id on opaque Entra `sub`, mismatching the dashboard email → blocks downstream OAuth consent bind (fix: map User Source to the email claim). Google provider redirect-uri/secret issue was resolved 2026-06-22.
 ## Category 3 — Tool-Level Access Control and Policy
- Owner: trachakonda
+- Owner: — (security cluster)
- Status: in progress — B1 (curr-state) + B5 (enforcement/bypass) DONE; B2/B3/B4 + per-user B1 pending dashboard + Contextual Access.
+- Status: not started (criteria stub seeded)
 - Last live-state check: 2026-06-18 (apps/arcade #2383 steady; dashboard 200). Noted: otel-collector + jaeger now deployed (cat-5) → trace store for B6.
 - Notes: Engine is the enforcement point (ungranted tool rejected there); one gateway = gateway-wide tool list (A==B), not per-user. Bypass: public-isolated for in-cluster worker (ClusterIP); tunnel custom servers = documented boundary. Blocked on dashboard for Contextual Access (input-block/output-redact) + per-user grants.
 ## Category 4 — Connector Coverage and Custom Server Development
 - Owner: — (adopt/operate cluster)
@@ -26,8 +25,9 @@ Last full-repo update: 2026-06-18 (scaffold).
 ## Category 5 — Auditability and Observability
 - Owner: ztaylor
- Status: not started (criteria stub seeded)
+- Status: **NEXT — start here in a fresh session** (invoke skill `arcade-gateway-eval`; read this + LIVE-POC; run live-state check). See `categories/cat5-auditability/NOTES.md` for the plan.
- Notes: metrics → Grafana/Mimir (NOT ELK); engine OTLP currently dropped (no collector). See LIVE-POC.
+- Last live-state check: —
 - Notes: metrics → **Grafana/Mimir** (NOT ELK); logs → ELK (Vector). Engine OTLP currently **dropped** — collector `arcade-otel-collector:4318` doesn't resolve. First task = OTEL collector → Prometheus/Mimir remediation (with the user; touches `k8s-backstage-v2/apps/arcade`). Full evidence + remediation shapes in LIVE-POC "Observability".
 ## Category 6 — Security and Compliance
 - Owner: — (security cluster)
@@ -25,24 +25,20 @@
 ## Benchmark tests
 | # | Test (verbatim) | Result | Evidence |
 |---|---|---|---|
-| 1 | Grant User A access to GitHub tools and User B access to Atlassian tools. Verify User A cannot invoke Atlassian tools even if they know the tool name. | PARTIAL (curr-state) — on one gateway the tool list is gateway-wide, identical for A and B (not per-user); an ungranted/unknown tool is cleanly rejected at the Engine. True per-user grant (A=GitHub, B=Atlassian) needs 2 gateways or Contextual Access (dashboard). | probes.md §B1: A==B 10 tools; `Github_CreateIssue` → `McpError: tool not enabled for this gateway` |
+| 1 | Grant User A access to GitHub tools and User B access to Atlassian tools. Verify User A cannot invoke Atlassian tools even if they know the tool name. |  |  |
 | 2 | Write a Contextual Access rule that blocks inputs containing a specific pattern (e.g., a mock SSN). Send a matching input — verify it is blocked before execution and logged. |  |  |
 | 3 | Write a Contextual Access rule that redacts a field from tool outputs. Verify the field is absent from the agent's response. |  |  |
 | 4 | Update User A's tool grants (add a new tool). Verify the change takes effect without restarting anything. |  |  |
-| 5 | Confirm policy enforcement point: attempt to bypass Contextual Access by calling the server directly (bypassing the Engine). Confirm this is architecturally prevented or explicitly documented as a known boundary. | DONE — enforcement is at the Engine. All arcade Services are ClusterIP; the worker (where tools run) is not public → public bypass network-prevented. In-cluster direct-to-worker is reachable but secret-gated (operational). Self-hosted custom servers exposed via public tunnel are a documented bypass boundary. | probes.md §B5: svc types; worker `/worker/health`=200, `/mcp`=406 (needs secret) |
+| 5 | Confirm policy enforcement point: attempt to bypass Contextual Access by calling the server directly (bypassing the Engine). Confirm this is architecturally prevented or explicitly documented as a known boundary. |  |  |
 ## Suggested pass/fail gates
 | Gate | Pass condition (verbatim) | Result | Evidence |
 |---|---|---|---|
-| Tool isolation | Cross-user tool calls are rejected at the Engine regardless of client behavior | PARTIAL — ungranted/unknown tools are rejected at the Engine (not the client); but on one gateway the allow-list is gateway-wide, so it is not yet per-*user* isolation. | probes.md §B1/§B5 |
+| Tool isolation | Cross-user tool calls are rejected at the Engine regardless of client behavior |  |  |
 | Input policy | Blocked inputs are rejected before execution, not after |  |  |
 | Output policy | Redacted fields are absent from the agent's response |  |  |
 | Audit | Every policy decision (allow/block/redact) produces a retrievable log entry |  |  |
 | Dynamic grants | Tool grant updates take effect without service restart |  |  |
 ## Findings
- **Enforcement point = the Engine (criterion 5).** Ungranted/unknown tool calls are rejected at the Engine with a clean structured error (`tool not enabled for this gateway`) — no leak, no execution, no shared-credential fallback.
+- 
 - **Tool curation is per-gateway, not per-user (criteria 1, 2).** On a single Arcade-Headers gateway the tool list is identical for every `Arcade-User-ID` (A==B). Per-user differentiation requires Contextual Access (an access hook) or separate gateways / a User Source — to be tested once dashboard access lands.
 - **Bypass surface (criterion 5 boundary).** Public attack surface is network-isolated for in-cluster tools (worker is ClusterIP). Two documented boundaries: (a) in-cluster direct-to-worker is only secret+network gated (operational, not architectural); (b) self-hosted custom servers exposed via public Cloudflare tunnel can be called directly, bypassing Engine policy — mitigate in prod via ClusterIP registration / tunnel access control.
 - **V4 seam note.** With no ToolHub deployed, all of the above is Arcade-native enforcement. For a ToolHub front, the authority decision + audit (`ToolHubDecisionRecord`) would move to the ToolHub MCP Endpoint, and Arcade should be reachable only via ToolHub (closes boundary (a)/(b)).
 - _Pending (dashboard / Contextual Access): per-user grants (1), Contextual Access input block (3) + output redaction (4), dynamic per-user grant w/o restart (7), audit of decisions (6), Okta-group scopes (8)._
@@ -1,200 +0,0 @@
 # Where the AI Gateway and MCP Gateway fit — target architecture
 > Cat-3 (Tool-Level Access Control & Policy) deliverable: the V4 seam map, extended into a
 > concrete integration design. **Goal:** place an **AI Gateway** (LLM/model proxy) and an
 > **MCP Gateway** (Arcade) into the existing `Agent Platform → Tool Hub → Automation Hub`
 > stack **without major work on the Tool Hub or Automation Hub applications.**
 >
 > Grounded in: `servicetitan/tool-hub` @ master, `servicetitan/automation-hub` @ master,
 > arcade-eval LIVE-POC (all read 2026-06-22).
 ## The thesis in one paragraph
 Both Tool Hub and Automation Hub were built with the exact seams this needs, and neither does
 the one thing Arcade is for. **Tool Hub** already has a data-driven `IExecutionAdapter` registry
 with a **`mcp_proxy` SourceType named in the contract** — adding Arcade is the *intended*
 extension, not surgery. **Automation Hub** explicitly scopes per-user OAuth / connector
 infrastructure as a **non-goal** and names per-user OAuth brokering as the gap an external
 platform fills. So the minimal-work design is: **(1) AI Gateway = pure configuration** (repoint
 the model/embedding base URLs every component already calls); **(2) MCP Gateway (Arcade) = one
 adapter pair behind Tool Hub's existing `mcp_proxy` seam**, with all per-user third-party OAuth
 living *inside Arcade* (so Tool Hub needs no credential vault and no new OBO authority).
 Automation Hub is untouched. Tool Hub remains the single authority/policy/audit plane over
 **both** execution backends.
 ## Design constraints — what "no major work" means here
 | App | Allowed | Explicitly avoided |
 |---|---|---|
 | **Tool Hub** | Implement one `ICatalogSource` + one `IExecutionAdapter` (`type='arcade'`/`mcp_proxy`) — the designed extension point. Config: model base URLs → AI Gateway. | No change to discovery hot path, permission model, idempotency, audit, or the OBO core. Per-user SaaS OAuth is **not** added to Tool Hub. |
 | **Automation Hub** | Nothing. | No new executor, no connector framework, no OAuth store. AH stays one of Tool Hub's catalog sources. |
 | **Agent Platform** | Config: inference endpoint → AI Gateway; identity = per-user Entra SSO. | No re-architecture. |
 ## 1. Target topology
 ```mermaid
 flowchart TB
  subgraph IDP["Identity"]
    Entra["Entra ID SSO<br/>per-user login / IUM"]
  end
  subgraph AGENT["Agent plane"]
    Agent["LLM Agent<br/>(AgentOS / sidecar)"]
  end
  subgraph GW["Gateways — inserted, no app surgery"]
    AIGW["AI Gateway<br/>LiteLLM-class LLM/model proxy<br/>keys · routing · rate-limit · cost · audit"]
    MCPGW["MCP Gateway — Arcade<br/>MCP transport + per-user OAuth broker"]
  end
  subgraph TH["Tool Hub — authority / data plane (core UNCHANGED)"]
    MCPHost["MCP surface<br/>search_tools · get_tool_details · execute_tool"]
    Policy["Stage0-6: permission re-check ·<br/>idempotency · rate-limit · audit/outbox"]
    Reg["IExecutionAdapter registry<br/>(catalog_source.type → adapter)"]
    AHAdapter["automation_hub adapter<br/>(exists)"]
    ArcAdapter["arcade adapter<br/>(NEW — mcp_proxy seam)"]
  end
  subgraph AH["Automation Hub — UNCHANGED"]
    AHCat["Catalog API<br/>GET /api/catalog/actions (ETag, cursor)"]
    AHExec["POST /actions/{id}/execute<br/>st.automation_hub.execute"]
    AHDown["ST Core API v2 / Internal API<br/>IUM bot-user impersonation"]
  end
  subgraph EXT["Third-party + custom capability"]
    SaaS["GitHub · Slack · Google · ..."]
    Custom["Custom / partner MCP servers"]
  end
  subgraph MODELS["Model providers"]
    LLMs["Anthropic · Voyage · OpenAI · internal"]
  end
  Entra -. "per-user token" .-> Agent
  Agent -- "inference" --> AIGW
  Agent -- "MCP meta-tools (carries user identity)" --> MCPHost
  MCPHost --> Policy --> Reg
  Reg --> AHAdapter
  Reg --> ArcAdapter
  AHAdapter -- "catalog sync" --> AHCat
  AHAdapter -- "IUM OBO execute" --> AHExec
  AHExec --> AHDown
  ArcAdapter -- "MCP tools/call + user identity" --> MCPGW
  MCPGW -- "resolve per-user OAuth token" --> SaaS
  MCPGW --> Custom
  AIGW --> LLMs
  TH -. "enrichment · query rewrite · embeddings · rerank" .-> AIGW
  classDef new fill:#ffe8cc,stroke:#e8860c,stroke-width:2px,color:#000;
  class AIGW,MCPGW,ArcAdapter new;
 ```
 Highlighted (orange) = the only new pieces: the **AI Gateway**, the **MCP Gateway (Arcade)**,
 and the thin **arcade adapter** that slots into Tool Hub's existing registry.
 ## 2. Two execution paths through one authority plane
 Tool Hub stays the single point of policy, idempotency, and audit. The *only* difference
 between an internal action and a third-party action is which adapter the registry resolves — and
 that the Arcade path adds per-user OAuth that neither Tool Hub nor AH can do today.
 ```mermaid
 sequenceDiagram
  autonumber
  participant U as User / Agent
  participant TH as Tool Hub
  participant AR as Arcade (MCP GW)
  participant SaaS as Third-party SaaS
  participant AH as Automation Hub
  participant ST as ServiceTitan APIs
  Note over U,ST: A. Internal ServiceTitan action — existing path, unchanged
  U->>TH: execute_tool(automation_hub://crm.create_job, input)
  TH->>TH: permission re-check · idempotency · rate-limit · audit
  TH->>AH: POST /actions/{id}/execute (IUM OBO, bot-user)
  AH->>ST: call Core / Internal API
  ST-->>AH: result
  AH-->>TH: ActionExecutionResult
  TH-->>U: CallToolResult
  Note over U,SaaS: B. Third-party action — NEW path via Arcade
  U->>TH: execute_tool(arcade://github.create_issue, input)
  TH->>TH: SAME permission re-check · idempotency · rate-limit · audit
  TH->>AR: MCP tools/call + user identity (Entra SSO)
  AR->>AR: resolve this user's stored GitHub OAuth token
  AR->>SaaS: call GitHub API AS THE USER
  SaaS-->>AR: result
  AR-->>TH: MCP CallToolResult
  TH-->>U: CallToolResult
 ```
 The critical property: **the per-user OAuth complexity lives entirely in Arcade.** Tool Hub only
 authenticates the *user* to Arcade and passes identity — so it needs no third-party token vault
 and no change to its Entra/IUM OBO core (the arcade adapter sets `RequiresObo=false` for the
 third-party-OAuth case; Arcade does the brokering). That is what keeps this out of "major work."
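The data-driven dispatch described above can be sketched in a few lines of Python. This is an illustration only: the real Tool Hub contracts are C# interfaces that are not reproduced here, and every name below (`ExecutionAdapter`, `AdapterRegistry`, `dispatch`, the two `_execute` stubs) is hypothetical. The point it tries to show is that adding Arcade means registering one more adapter under a new `catalog_source.type` key, while the dispatch site stays the same.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class ExecutionAdapter:
    """Hypothetical stand-in for an execution adapter (not the real contract)."""
    source_type: str      # value of catalog_source.type this adapter serves
    requires_obo: bool    # whether the caller must acquire an OBO token first
    execute: Callable[[str, dict, str], dict]


class AdapterRegistry:
    """Maps catalog_source.type to an adapter; dispatch never branches on type."""

    def __init__(self) -> None:
        self._adapters: Dict[str, ExecutionAdapter] = {}

    def register(self, adapter: ExecutionAdapter) -> None:
        self._adapters[adapter.source_type] = adapter

    def resolve(self, source_type: str) -> ExecutionAdapter:
        try:
            return self._adapters[source_type]
        except KeyError:
            raise LookupError(f"no adapter for source type {source_type!r}") from None


def dispatch(registry: AdapterRegistry, source_type: str,
             tool: str, tool_input: dict, user_id: str) -> dict:
    # Policy, idempotency and audit would run here, identically for every backend.
    adapter = registry.resolve(source_type)
    return adapter.execute(tool, tool_input, user_id)


def _automation_hub_execute(tool: str, tool_input: dict, user_id: str) -> dict:
    # Stub: the real path would call Automation Hub's execute endpoint.
    return {"backend": "automation_hub", "tool": tool, "user": user_id}


def _arcade_execute(tool: str, tool_input: dict, user_id: str) -> dict:
    # Stub: the real path would send an MCP tools/call to Arcade with user identity.
    return {"backend": "arcade", "tool": tool, "user": user_id}


registry = AdapterRegistry()
registry.register(ExecutionAdapter("automation_hub", True, _automation_hub_execute))
# Assumption: the Arcade adapter is keyed "mcp_proxy" and does not require OBO.
registry.register(ExecutionAdapter("mcp_proxy", False, _arcade_execute))
```

Calling `dispatch(registry, "mcp_proxy", "github.create_issue", {}, "user-a")` goes through the same function as an Automation Hub call; only the resolved adapter differs.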
 ## 3. The AI Gateway is a configuration change, not a build
 Every model/embedding call in the stack already goes through a pinned SDK with a configurable
 endpoint. Point those endpoints at one AI Gateway and you get unified keys, routing, rate-limit,
 cost control, and audit across all AI traffic — with zero application code change.
 ```mermaid
 flowchart LR
  A["Agent inference"] --> AIGW
  B["Tool Hub — enrichment (Claude)"] --> AIGW
  C["Tool Hub — query rewrite (Claude Haiku)"] --> AIGW
  D["Tool Hub — embeddings + rerank (Voyage)"] --> AIGW
  E["Arcade engine — LLM / embeddings"] --> AIGW
  AIGW["AI Gateway (LiteLLM-class)<br/>keys · routing · rate-limit · cost · audit"] --> P["Anthropic · Voyage · OpenAI · internal"]
  classDef new fill:#ffe8cc,stroke:#e8860c,stroke-width:2px,color:#000;
  class AIGW new;
 ```
 The Arcade POC already routes its engine LLM + embeddings through in-cluster LiteLLM
 (LIVE-POC), so this consolidates an existing pattern rather than inventing one.
 ## 4. Change surface — component by component
 | Component | Role in target | Change required | Evidence it's minimal |
 |---|---|---|---|
 | **AI Gateway** (LiteLLM-class) | Single egress for all LLM/embedding traffic | **Config only** — repoint base URLs | Tool Hub model providers are DI seams with configurable endpoints (`IEmbeddingProvider`, `IEnrichmentProvider`, `IQueryRewriter`, `IReranker`); Arcade already uses in-cluster LiteLLM |
 | **MCP Gateway (Arcade)** | MCP transport + **per-user OAuth broker** for SaaS / custom MCP | **Deploy + register** as Tool Hub catalog source | Arcade is a running self-hosted POC (`api.arcade.st.dev`) |
 | **Tool Hub** | Authority: discovery, policy, idempotency, audit over both backends | **One adapter pair** in the `mcp_proxy` slot + endpoint config | `ICatalogSource` docstring already names `"mcp_proxy"`; adapter selection is `catalog_source.type → registry`, dispatch site unchanged |
 | **Automation Hub** | One of Tool Hub's catalog sources (internal ST actions) | **None** | AH's catalog + `/actions/{id}/execute` contract already matches Tool Hub 1:1 (same 4 execution modes, JSON-Schema I/O, `namespace:name@semver`) |
 | **Agent Platform** | Caller | **Config** — inference → AI Gateway; identity → per-user Entra SSO | — |
 ## 5. Why this is the right seam (and the one open decision)
 - **It fills a real, documented gap.** Per-user third-party OAuth is explicitly absent from
  *both* apps: AH lists "OAuth token management / connector marketplace" as a **V1 non-goal** and
  its own platform research names per-user OAuth brokering as what an external platform must add;
  Tool Hub's downstream auth is Entra/IUM-only. Arcade is precisely that missing layer.
 - **It uses the designed extension point.** Tool Hub's `mcp_proxy` SourceType and data-driven
  adapter registry exist *for this*. No core path changes.
 - **It preserves the authority model (cat-3 criterion 5).** Tool Hub remains the single Engine
  for permission re-check, idempotency, rate-limit, and audit over *both* AH and Arcade calls —
  so the policy/enforcement story is unchanged and now covers third-party tools too.
 - **One decision to confirm with Platform (chump/tahmad):** Tool Hub's ADR-009 currently intends
  partner/MCP capabilities to arrive *through AH as actions*. Routing Arcade **direct into Tool
  Hub** as a peer catalog source is a conscious deviation (ADR-009 even lists "BYO MCP outside
  AH's onboarding flow" as a trigger to reconsider). The recommendation here is the direct path,
  because AH has no plugin model and explicitly defers third-party connectivity — so going
  through AH would push *more* net-new work into AH, violating the "no major work" constraint.
 ## Evidence index
 - **Tool Hub:** `src/ToolHub.Contracts/Catalog/ICatalogSource.cs` (`mcp_proxy` named);
  `src/ToolHub.Contracts/Execution/IExecutionAdapter.cs` (`RequiresObo`, `GetOboAuthority`);
  `src/ToolHub.Execution/Dispatch/ExecutionAdapterRegistry.cs` (data-driven dispatch);
  `Stage3_OboAcquisitionStage.cs` (Entra/IUM-only OBO); ADR-009, ADR-007.
  Full seam map: `architecture/toolhub-arcade-integration.md` (outer repo).
 - **Automation Hub:** `src/server/Host.Api/Controllers/ActionExecutionController.cs`
  (`POST /actions/{id}/execute`); `Host.CatalogApi/Controllers/CatalogActionsController.cs`
  (catalog sync contract); `Domain/Catalog/Actions/DownstreamApiAuthType.cs`
  (`{ApiAccessToken, TokenServer, None}` — no per-user OAuth);
  `crap/blueprint/system/context/v1-roadmap.md` (external integration = non-goal);
  `docs/research/platform-selection/paragon.md` (per-user OAuth named as the external gap).
 - **Arcade POC:** arcade-eval `LIVE-POC.md` (self-hosted, Entra IdP, in-cluster LiteLLM);
  `criteria-section-3.md` (enforcement-at-Engine + bypass findings).
@@ -1,123 +0,0 @@
 # How the stack works — Automation Hub, Tool Hub, and the two gateways (plain language)
 > A plain-terms companion to the technical seam map in
 > `categories/cat3-access-policy/integration-architecture.md`. Same architecture, no jargon.
 > Grounded in `servicetitan/automation-hub` @ master and `servicetitan/tool-hub` @ master
 > (source-verified 2026-06-22).
 ## The one-paragraph version
 **Automation Hub** is the warehouse of ~5,000+ things an agent can *do* inside ServiceTitan.
 **Tool Hub** is the smart front desk that makes that giant catalog usable for an AI and acts as
 the single bouncer (per-user permissions + audit). The **MCP Gateway (Arcade)** plugs in beside
 Automation Hub to add *outside* tools (GitHub, Slack, Google) **with per-user login** — the one
 thing neither of the others can do. The **AI Gateway** is one toll booth that every model/AI call
 passes through (keys, cost, rate limits), added by **configuration, not a rebuild**.
 ---
 ## 1. Automation Hub — the warehouse of actions
 Where ServiceTitan keeps everything an agent can actually *do*: "create a job," "look up a
 customer," "send an invoice" — 5,000+ actions today.
 - It holds the **catalog** (every action + what inputs it needs) and does the **execution**
  (actually calls ServiceTitan's internal APIs).
 - Its login is **ServiceTitan-identity only.** It can act as a ServiceTitan user/bot, but it has
  **no way to log into GitHub / Slack / Google on your behalf** — and that's deliberate (AH's
  roadmap lists third-party OAuth as a non-goal).
 > AH = the internal action warehouse. Great at ServiceTitan, blind to outside SaaS.
 ## 2. Tool Hub — the smart front desk
 Handing an AI the raw list of 5,000 tools (heading to 200,000) overflows its context window and
 leads it to pick the wrong tool. Tool Hub is the front desk between the agent and the warehouse. It does
 three things:
 1. **Aggregates** — every source (AH today, others later) becomes one clean, unified list. The
   agent sees **one front desk**, not many warehouses.
 2. **Discovers progressively** — the agent never reads the whole catalog. It asks:
   - *"What tools do something like X?"* → `search_tools` returns a **short shortlist**
     (names + one-line summaries only).
   - *"How exactly do I use this one?"* → `get_tool_details` returns full instructions for just
     the **1–3** it actually wants.
   - *"Run it."* → `execute_tool`.
   - (Plus `resume_execution`, `list_namespaces`, `cancel_execution`.)
   It finds tools by **meaning, not keywords** — semantic search over a vector database
   (pgvector + HNSW), embedded by **Voyage**, descriptions enriched by **Claude**, then reranked.
 3. **Permission-filters** — before the shortlist ever reaches the agent, it **removes any tool
   you're not allowed to use.** You can't see, let alone call, what you don't have access to.
 > Tool Hub = the brain *and* the bouncer. It runs as its **own central service** (two
 > autoscaled Kubernetes deployments + an admin UI), **not** a sidecar — and it's the single
 > place policy, permissions, and audit live.
 **The flow so far:**
 ```
 Agent  →  Tool Hub (front desk: search · filter · decide)  →  Automation Hub (execute)  →  ServiceTitan APIs
 ```
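The search, details, run sequence with permission filtering can be sketched as below. This is a toy under stated assumptions: the catalog entries, the `allowed` set, and the function bodies are invented for illustration, and plain keyword overlap stands in for the semantic (embedding plus rerank) search the text describes. Only the meta-tool names `search_tools` and `get_tool_details` come from the document.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass(frozen=True)
class Tool:
    name: str
    summary: str
    schema: Dict[str, str] = field(default_factory=dict)


# Hypothetical three-entry catalog; the real one holds thousands of actions.
CATALOG: List[Tool] = [
    Tool("crm.create_job", "create a job for a customer", {"customer_id": "string"}),
    Tool("crm.lookup_customer", "look up a customer record", {"query": "string"}),
    Tool("billing.send_invoice", "send an invoice to a customer", {"invoice_id": "string"}),
]


def search_tools(query: str, allowed: Set[str], limit: int = 3) -> List[Dict[str, str]]:
    """Return a short list of names and one-line summaries the caller may use."""
    words = set(query.lower().split())
    scored = []
    for tool in CATALOG:
        if tool.name not in allowed:
            continue  # permission filter runs before anything reaches the agent
        score = len(words & set(tool.summary.lower().split()))
        if score:
            scored.append((score, tool))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [{"name": t.name, "summary": t.summary} for _, t in scored[:limit]]


def get_tool_details(name: str, allowed: Set[str]) -> Dict[str, object]:
    """Return the full schema for one tool, re-checking permission."""
    for tool in CATALOG:
        if tool.name == name and name in allowed:
            return {"name": tool.name, "summary": tool.summary, "schema": tool.schema}
    raise PermissionError(f"tool {name!r} is not available to this caller")
```

A caller allowed only the two `crm.*` tools never sees `billing.send_invoice` in a shortlist and gets a `PermissionError` if it asks for that tool's details by name.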
 ## 3. Where the two gateways fit
 Two real gaps remain. Each gateway plugs one.
 ### MCP Gateway (Arcade) — the gap = *outside tools*
 Tool Hub + AH are great for internal ServiceTitan actions, but neither can **log into
 GitHub/Slack/Google as you**. That's Arcade's one job: a second warehouse for **outside SaaS
 tools, with per-user login built in.** Tool Hub already has an empty "plug in another source"
 slot (the `mcp_proxy` adapter), so Arcade plugs in **right beside** Automation Hub:
 ```mermaid
 flowchart LR
  Agent["LLM Agent"]
  TH["Tool Hub<br/>(brain + bouncer:<br/>search · per-user filter · audit)"]
  AH["Automation Hub<br/>(internal actions)"]
  AR["MCP Gateway — Arcade<br/>(outside tools + per-user login)"]
  ST["ServiceTitan APIs"]
  SaaS["GitHub · Slack · Google"]
  Agent --> TH
  TH --> AH --> ST
  TH --> AR --> SaaS
  classDef new fill:#ffe8cc,stroke:#e8860c,stroke-width:2px,color:#000;
  class AR new;
 ```
 Tool Hub stays the single front desk and bouncer for **both** paths. The only difference: for an
 outside tool it hands off to Arcade, and **Arcade handles the messy per-user OAuth login** (that's
 the "authorize GitHub" pop-up). Tool Hub never stores your GitHub token — Arcade does.
 ### AI Gateway — the gap = *the model calls themselves*
 Everything above quietly uses AI models: semantic search uses **Voyage** embeddings, catalog
 descriptions are written by **Claude**, the agent itself calls a model to think. The **AI
 Gateway** is **one toll booth** all of that passes through — so keys, cost tracking, rate limits,
 and routing live in one place.
 The key point: this is **configuration, not a rebuild.** Every component already calls models
 through a swappable address; you just **repoint those addresses at the gateway.**
 ```mermaid
 flowchart LR
  A["Agent thinking"] --> GW
  B["Tool Hub — search (Voyage)"] --> GW
  C["Tool Hub — descriptions (Claude)"] --> GW
  GW["AI Gateway<br/>(one toll booth: keys · cost · limits)"] --> P["Anthropic · Voyage · OpenAI"]
  classDef new fill:#ffe8cc,stroke:#e8860c,stroke-width:2px,color:#000;
  class GW new;
 ```
 ## 4. The whole picture in one breath
 | Piece | What it is (simple) | The gap it fills |
 |---|---|---|
 | **Automation Hub** | Warehouse of 5,000+ internal ServiceTitan actions; executes them (ST-login only) | — (the base) |
 | **Tool Hub** | Smart central front desk: makes the catalog usable for an AI (search → details → run) + the one bouncer (per-user filter + audit) | Scale + governance |
 | **MCP Gateway (Arcade)** | Plugs in beside AH to add outside tools (GitHub/Slack/Google) **with per-user login** | The thing neither AH nor Tool Hub can do |
 | **AI Gateway** | One toll booth for **all** model/AI calls | One place for keys/cost/limits — added by config |
 **The design win:** adding both gateways is mostly **plugging into seams that already exist** —
 Tool Hub stays the single authority, Automation Hub is untouched, and the only genuinely new
 capability (logging into third-party apps as you) lives inside Arcade.
@@ -1,73 +1,95 @@
 # Deploy arcade-eval reference MCP server to backstage k8s
 **Date:** 2026-06-22
-**Status:** Approved — implementing
+**Status:** DONE — deployed and verified end-to-end.
 ## Goal
 Replace the ephemeral cloudflared **quick tunnel** (used to register the
 `arcade-eval-ref` server with the self-hosted Arcade engine) with a permanent
-in-cluster deployment on `backstage-wus2-v4`. The engine then reaches the server
+deployment on `backstage-wus2-v4`, so the engine reaches the server over a stable
-over stable cluster DNS instead of a `trycloudflare.com` URL that dies on restart.
+URL instead of a `trycloudflare.com` URL that dies on restart.
 Relevant eval categories: cat-4 (custom server dev), cat-8 (deployment), cat-9 (DX).
-## Architecture / data flow
+## Key finding that shaped the final design
 The first attempt registered the in-cluster **Service DNS**
 (`http://arcade-eval-ref.arcade-eval-ref.svc.cluster.local:8000`) as a dashboard
 worker. Health went green but **0 tools loaded**. Engine logs showed:
 ```
-Arcade engine (ns: arcade)  ──HTTP /worker/*──▶  Service arcade-eval-ref (ns: arcade-eval-ref)
+Failed to get worker tools: Get ".../worker/tools":
-   registered as type "Arcade"                       └─▶ Deployment: python:3.12 running
+  dial tcp 10.0.192.27:8000: publicOnlyTransport: blocked connection to internal address
-   URI = http://arcade-eval-ref.arcade-eval-ref            mcp_server.server over HTTP :8000
+```
-        .svc.cluster.local:8000                           (echo / add / whoami)
+
-   Secret = ARCADE_WORKER_SECRET  ◀── same value ──▶  env ARCADE_WORKER_SECRET (SealedSecret)
+**The Arcade engine has an SSRF guard (`publicOnlyTransport`) that blocks
 dashboard-registered worker URIs resolving to internal/private (RFC1918) addresses.**
 Only workers declared in the **engine config file** (e.g. the bundled `arcade-worker-main`
 at `http://arcade-worker-main:8001`) may use internal URIs. Health checks aren't guarded
 (hence green), but the authenticated `/worker/tools` discovery is. The cloudflared tunnel
 worked only because it was a *public* URL.
 ⇒ A dashboard-registered in-cluster worker **must be exposed on a public URL**. (The
 worker secret was a red herring: the guard blocks the connection before auth is attempted.)
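The behavior described in this finding can be approximated with a short Python sketch. It is not the engine's implementation, which was not inspected; the function names and the `declared_in_engine_config` flag are hypothetical, and the check uses the standard `ipaddress` module to show the general shape of a "public only" dial guard.

```python
import ipaddress
from typing import Iterable


def is_internal_address(ip: str) -> bool:
    """True for private (RFC1918 and similar), loopback, or link-local addresses."""
    addr = ipaddress.ip_address(ip)
    return addr.is_private or addr.is_loopback or addr.is_link_local


def check_worker_target(resolved_ips: Iterable[str],
                        declared_in_engine_config: bool) -> None:
    """Raise if a dashboard-registered worker resolves to an internal address.

    Assumption: workers declared in the engine config file skip the guard,
    matching the observation that arcade-worker-main may use an internal URI.
    """
    if declared_in_engine_config:
        return
    for ip in resolved_ips:
        if is_internal_address(ip):
            raise ConnectionError(f"blocked connection to internal address {ip}")
```

With the ClusterIP from the log, `check_worker_target(["10.0.192.27"], declared_in_engine_config=False)` raises, while the same address passes when the worker is treated as config-declared.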
 ## Architecture / data flow (final)
 ```
 Claude Code ──▶ gateway zeb-gateway-test ──▶ Arcade engine ──HTTPS /worker/*──▶
   https://arcade-eval-ref.st.dev  (Cloudflare CNAME → k8s-backstage.st.dev → nginx ingress)
      └─▶ Service → Deployment: python:3.12 running mcp_server.server over HTTP :8000
          (echo / add / whoami).  /mcp also served; /worker/* auth = ARCADE_WORKER_SECRET.
 ```
 ### Runtime facts (verified by introspecting `arcade-mcp-server` 1.17)
 - `app.run()` honors env overrides via `_get_configuration_overrides()`:
-  `ARCADE_SERVER_TRANSPORT=http`, `ARCADE_SERVER_HOST=0.0.0.0`, `ARCADE_SERVER_PORT=8000`.
+  `ARCADE_SERVER_TRANSPORT=http`, `ARCADE_SERVER_HOST=0.0.0.0`, `ARCADE_SERVER_PORT=8000`
-  So the hardcoded `127.0.0.1` in `server.py`'s `__main__` is overridden at runtime —
+  — so the hardcoded `127.0.0.1` in `server.py` is overridden at runtime (no code change).
-  **no `server.py` change needed.**
+- `ARCADE_WORKER_SECRET` enables worker routes at `/worker/*`; the engine authenticates with
- `ARCADE_WORKER_SECRET` (settings alias `arcade.server_secret`) → worker routes mount at
+  an HS256 JWT (`aud=worker`, `ver=1`) signed with that secret. MCP is served at `/mcp`.
  `/worker/*` (what the engine calls); MCP also served at `/mcp`. FastAPI app, port 8000.
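For manual probing of `/worker/*`, a token of the shape described above (HS256, `aud=worker`, `ver=1`) can be built with the Python standard library alone. This is a sketch under assumptions: the claim set is limited to the two claims named in this document, `ver` is encoded as the string `"1"` (the exact type was not confirmed), and the real engine may add further claims such as an expiry. The helper names are hypothetical.

```python
import base64
import hashlib
import hmac
import json


def _b64url(data: bytes) -> str:
    # JWTs use URL-safe base64 without padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    # Restore the padding that _b64url stripped.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def mint_worker_jwt(secret: str) -> str:
    """Build an HS256 JWT carrying only the claims named in the design doc."""
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {"aud": "worker", "ver": "1"}  # assumption: ver is a string
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(secret.encode("utf-8"),
                         signing_input.encode("ascii"),
                         hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def verify_worker_jwt(token: str, secret: str) -> dict:
    """Check the signature and the aud/ver claims; return the claims on success."""
    header_b64, claims_b64, sig_b64 = token.split(".")
    signing_input = f"{header_b64}.{claims_b64}".encode("ascii")
    expected = hmac.new(secret.encode("utf-8"), signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(claims_b64))
    if claims.get("aud") != "worker" or str(claims.get("ver")) != "1":
        raise ValueError("unexpected aud or ver claim")
    return claims
```

A token minted with the worker secret would typically be sent as a bearer token in the `Authorization` header; that header convention is an assumption and is not stated in this document.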
-## Components
+## Components (three repos)
-### 1. `arcade-eval` repo (branch off `main`)
+### 1. `arcade-eval` — image
 - `lib/mcp_server/Dockerfile` — `python:3.12-slim`, `pip install .`, HTTP transport via env,
  non-root, port 8000.
 - `.github/workflows/build-push-acr.yml` — pushes
  `servicetitandev.azurecr.io/arcade-eval-ref:1.0.<run_number>` (secrets
  `ACR_DEV_USERNAME`/`ACR_DEV_PASSWORD`). Adapted from `servicetitan/mem0`.
- **`lib/mcp_server/Dockerfile`** — `python:3.12-slim`, `pip install .` (pulls
+### 2. `k8s-backstage-v2` — `apps/mcp/arcade-eval-ref/`
-  `arcade-mcp-server` + `httpx`), `ENV` transport/host/port, non-root user, `EXPOSE 8000`,
+- `namespace.yaml` — ns `arcade-eval-ref`.
-  `CMD ["python","-m","mcp_server.server"]`.
+- `server.yaml` — **st-app HelmRelease** (chart 2.0.72): `image` pinned to `1.0.1`,
- **`.github/workflows/build-push-acr.yml`** — adapted from `servicetitan/mem0`. Pushes
+  `service.internalPort: 8000`, **`ingress.enabled` host `arcade-eval-ref.st.dev`
-  `servicetitandev.azurecr.io/arcade-eval-ref:1.0.<run_number>`. Login via repo secrets
+  class `nginx`, `oAuth.enabled: false`** (no SSO wall over `/worker/*` or `/mcp`),
-  `ACR_DEV_USERNAME` / `ACR_DEV_PASSWORD`. Triggers: `workflow_dispatch` + push to `main`
+  worker secret via `envFrom` from the SealedSecret, probes off. TLS = ingress default
-  filtered to `lib/mcp_server/**`.
+  `*.st.dev` wildcard cert.
 - `sealedsecret.yaml` — `arcade-eval-ref-worker-secret` (key `ARCADE_WORKER_SECRET`),
  strict scope, sealed with the backstage-wus2-v4 sealed-secrets cert.
-### 2. `k8s-backstage-v2` repo (branch off `master`)
+### 3. `iac-terraform-workspaces` — DNS
 - CNAME `arcade-eval-ref.st.dev` → `k8s-backstage.st.dev` (st.dev zone), mirroring the
  `anvil`/`alerts` pattern.
-New dir **`apps/mcp/arcade-eval-ref/`** (Flux's `apps` Kustomization recursively applies
+## Registration (dashboard)
 everything under `apps/`; no per-dir `kustomization.yaml`):
- **`namespace.yaml`** — ns `arcade-eval-ref` (labels per repo convention, `team: infra`).
+Add/repoint the worker: URI `https://arcade-eval-ref.st.dev`, Secret = the worker-secret
- **`server.yaml`** — plain `Deployment` (image
+plaintext (git-ignored at `results/arcade-eval-ref-worker-secret.txt`). The engine then
-  `servicetitandev.azurecr.io/arcade-eval-ref:1.0.1`; no imagePullSecret — the cluster has
+fetches `/worker/tools` over the public URL → tools load → add to `zeb-gateway-test`.
  native ACR pull, confirmed by other `apps/mcp/*` servers; `ARCADE_WORKER_SECRET` from
  secretRef; TCP probes; modest resources) + `Service` (ClusterIP, 8000→8000).
 - **`sealedsecret.yaml`** — `arcade-eval-ref-worker-secret`, key `ARCADE_WORKER_SECRET`,
  **strict** scope, sealed offline with `kubeseal --cert <backstage-wus2-v4 public cert>`.
-## Manual steps after merge
+## Verified
-1. Add `ACR_DEV_USERNAME` / `ACR_DEV_PASSWORD` repo secrets to `arcade-eval`.
+- `https://arcade-eval-ref.st.dev/worker/health` → 200 (valid `*.st.dev` LE cert);
-2. `workflow_dispatch` (or merge to `main`) to build/push the image — first run = tag `1.0.1`.
+  `/worker/tools` with a correct worker JWT → 200, tools `Echo/Add/Whoami`.
-3. Merge the k8s branch; Flux applies the namespace/secret/deployment.
+- Through the gateway: `ArcadeEvalRef_Whoami()` → the caller's Entra `sub`
-4. Dashboard → **Add Server → Arcade**, URI
+  (`GvgRofe5…`), proving per-user execution across the full
-   `http://arcade-eval-ref.arcade-eval-ref.svc.cluster.local:8000`, Secret = the worker secret
+  client → gateway → engine → public URL → in-cluster pod chain.
   plaintext (stored git-ignored at `results/arcade-eval-ref-worker-secret.txt`); re-point the
   `zeb-gateway-test` gateway's ref tools at it and drop the tunnel. Delete the plaintext file
   afterward.
-## Out of scope (YAGNI)
+## Alternative considered (not taken)
-No ingress (internal-only ClusterIP), no HPA, no PodMonitor/metrics (separate cat-5 work),
+Declare the server as a static worker in the **engine config** (`tools.directors[].workers`,
-single replica.
+like `arcade-worker-main`) — that path allows internal URIs and avoids public exposure, but
 edits the vendor Helm release (`apps/arcade`) and loses the dashboard per-project workflow.
 Public ingress was chosen as the lower-touch option.