Records that the in-cluster Service DNS could not be used for a dashboard-registered worker (engine publicOnlyTransport SSRF guard blocks internal addresses), the pivot to st-app chart + public ingress at arcade-eval-ref.st.dev (CNAME -> k8s-backstage.st.dev), and the verified end-to-end whoami result. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Deploy arcade-eval reference MCP server to backstage k8s
Date: 2026-06-22 Status: DONE — deployed and verified end-to-end.
Goal
Replace the ephemeral cloudflared quick tunnel (used to register the
arcade-eval-ref server with the self-hosted Arcade engine) with a permanent
deployment on backstage-wus2-v4, so the engine reaches the server over a stable
URL instead of a trycloudflare.com URL that dies on restart.
Relevant eval categories: cat-4 (custom server dev), cat-8 (deployment), cat-9 (DX).
Key finding that shaped the final design
The first attempt registered the in-cluster Service DNS
(http://arcade-eval-ref.arcade-eval-ref.svc.cluster.local:8000) as a dashboard
worker. Health went green but 0 tools loaded. Engine logs showed:
Failed to get worker tools: Get ".../worker/tools":
dial tcp 10.0.192.27:8000: publicOnlyTransport: blocked connection to internal address
The Arcade engine has an SSRF guard (publicOnlyTransport) that blocks
dashboard-registered worker URIs resolving to internal/private (RFC1918) addresses.
Only workers declared in the engine config file (e.g. the bundled arcade-worker-main
at http://arcade-worker-main:8001) may use internal URIs. Health checks aren't guarded
(hence green), but the authenticated /worker/tools discovery is. The cloudflared tunnel
worked only because it was a public URL.
⇒ A dashboard-registered in-cluster worker must be exposed on a public URL. (The worker secret was a red herring: the guard blocks the connection before any authentication takes place.)
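The effect of the guard can be illustrated with a small Python sketch. This is not the engine's implementation (publicOnlyTransport lives inside the Arcade engine and its exact rules are not documented here); it only shows the general idea of rejecting hosts that resolve to private, loopback, or link-local addresses. The function names are hypothetical.

```python
import ipaddress
import socket


def is_internal_address(ip: str) -> bool:
    """Return True if the IP is private (incl. RFC1918), loopback, or link-local."""
    addr = ipaddress.ip_address(ip)
    return addr.is_private or addr.is_loopback or addr.is_link_local


def resolves_to_internal(host: str, port: int) -> bool:
    """Resolve host and report whether any resulting address is internal."""
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    # info[4] is the sockaddr tuple; its first element is the IP string.
    return any(is_internal_address(info[4][0]) for info in infos)


# 10.0.192.27 is the address the Service DNS name resolved to in the engine log.
print(is_internal_address("10.0.192.27"))  # True
print(is_internal_address("8.8.8.8"))      # False
```

Under this kind of check, a Service DNS name that resolves to a ClusterIP is rejected no matter which secret is configured, which matches the behaviour seen in the engine logs.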
Architecture / data flow (final)
Claude Code ──▶ gateway zeb-gateway-test ──▶ Arcade engine ──HTTPS /worker/*──▶
https://arcade-eval-ref.st.dev (Cloudflare CNAME → k8s-backstage.st.dev → nginx ingress)
└─▶ Service → Deployment: python:3.12 running mcp_server.server over HTTP :8000
(echo / add / whoami). /mcp also served; /worker/* auth = ARCADE_WORKER_SECRET.
Runtime facts (verified by introspecting arcade-mcp-server 1.17)
- app.run() honors env overrides via _get_configuration_overrides(): ARCADE_SERVER_TRANSPORT=http, ARCADE_SERVER_HOST=0.0.0.0, ARCADE_SERVER_PORT=8000. The hardcoded 127.0.0.1 in server.py is therefore overridden at runtime (no code change).
- ARCADE_WORKER_SECRET enables worker routes at /worker/*; the engine authenticates with an HS256 JWT (aud=worker, ver=1) signed with that secret.
- MCP is served at /mcp.
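For manual testing of the worker routes, a token of the shape described above can be minted with the Python standard library alone. The following is a minimal sketch, assuming the only required claims are aud and ver; the exp claim, the typ header, the string type of ver, and the helper names are assumptions made for illustration, not facts taken from arcade-mcp-server.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(segment: str) -> bytes:
    """Decode a base64url segment, restoring the stripped padding."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def mint_worker_jwt(secret: str, ttl_seconds: int = 60) -> str:
    """Build an HS256 JWT with aud=worker and ver=1 (exp is an assumed extra claim)."""
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"aud": "worker", "ver": "1", "exp": int(time.time()) + ttl_seconds}
    signing_input = (
        f"{_b64url(json.dumps(header).encode())}."
        f"{_b64url(json.dumps(payload).encode())}"
    )
    sig = hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(sig)}"


def verify_worker_jwt(token: str, secret: str) -> dict:
    """Check signature, audience, and expiry; return the claims or raise ValueError."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, sig_b64 = parts
    expected = hmac.new(
        secret.encode(), f"{header_b64}.{payload_b64}".encode(), hashlib.sha256
    ).digest()
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    if claims.get("aud") != "worker":
        raise ValueError("wrong audience")
    if claims.get("exp", 0) < time.time():
        raise ValueError("token expired")
    return claims
```

Usage: `token = mint_worker_jwt(secret)` with the value of ARCADE_WORKER_SECRET; a token signed with any other secret fails verification with "bad signature".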
Components (three repos)
1. arcade-eval — image
- lib/mcp_server/Dockerfile: python:3.12-slim, pip install ., HTTP transport via env, non-root, port 8000.
- .github/workflows/build-push-acr.yml: pushes servicetitandev.azurecr.io/arcade-eval-ref:1.0.<run_number> (secrets ACR_DEV_USERNAME / ACR_DEV_PASSWORD). Adapted from servicetitan/mem0.
2. k8s-backstage-v2 — apps/mcp/arcade-eval-ref/
- namespace.yaml: ns arcade-eval-ref.
- server.yaml: st-app HelmRelease (chart 2.0.72): image pinned to 1.0.1, service.internalPort: 8000, ingress.enabled (host arcade-eval-ref.st.dev, class nginx), oAuth.enabled: false (no SSO wall over /worker/* or /mcp), worker secret via envFrom from the SealedSecret, probes off. TLS = ingress default *.st.dev wildcard cert.
- sealedsecret.yaml: arcade-eval-ref-worker-secret (key ARCADE_WORKER_SECRET), strict scope, sealed with the backstage-wus2-v4 sealed-secrets cert.
3. iac-terraform-workspaces — DNS
- CNAME arcade-eval-ref.st.dev → k8s-backstage.st.dev (st.dev zone), mirroring the anvil/alerts pattern.
Registration (dashboard)
Add/repoint the worker: URI https://arcade-eval-ref.st.dev, Secret = the worker-secret
plaintext (git-ignored at results/arcade-eval-ref-worker-secret.txt). The engine then
fetches /worker/tools over the public URL → tools load → add to zeb-gateway-test.
Verified
- https://arcade-eval-ref.st.dev/worker/health → 200 (valid *.st.dev LE cert); /worker/tools with a correct worker JWT → 200, tools Echo/Add/Whoami.
- Through the gateway: ArcadeEvalRef_Whoami() → the caller's Entra sub (GvgRofe5…), proving per-user execution across the full client → gateway → engine → public URL → in-cluster pod chain.
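The /worker/tools check can be repeated from any machine with a short script. This is a sketch only: it assumes the JWT is sent as an Authorization: Bearer header, which this document does not confirm, and build_tools_request is a hypothetical helper name. The token is expected to be an HS256 worker JWT signed with the worker secret.

```python
import urllib.request

BASE_URL = "https://arcade-eval-ref.st.dev"


def build_tools_request(token: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Prepare a GET for /worker/tools carrying the worker JWT.

    Assumption: the worker routes accept the token as a Bearer credential.
    """
    return urllib.request.Request(
        f"{base_url}/worker/tools",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


# Sending the request needs network access and a valid token:
#
#     with urllib.request.urlopen(build_tools_request(token), timeout=10) as resp:
#         print(resp.status)
```

Building the request separately from sending it keeps the URL and header construction checkable without touching the live endpoint.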
Alternative considered (not taken)
Declare the server as a static worker in the engine config (tools.directors[].workers,
like arcade-worker-main) — that path allows internal URIs and avoids public exposure, but
edits the vendor Helm release (apps/arcade) and loses the dashboard per-project workflow.
Public ingress was chosen as the lower-touch option.