arcade-eval/docs/superpowers/specs/2026-06-22-deploy-mcp-to-k8s-design.md
ztaylor 715e846094 deploy: containerize arcade-eval-ref MCP server + ACR build/push action
Replace the cloudflared quick-tunnel dev pattern with a permanent in-cluster
deployment so the self-hosted Arcade engine reaches the echo/add/whoami reference
server over stable cluster DNS.

- lib/mcp_server/Dockerfile: python:3.12-slim, pip install ., HTTP transport via
  ARCADE_SERVER_{TRANSPORT,HOST,PORT} env overrides (no server.py change needed),
  non-root user, port 8000.
- .github/workflows/build-push-acr.yml: build + push
  servicetitandev.azurecr.io/arcade-eval-ref:1.0.<run_number>. Adapted from
  servicetitan/mem0; needs repo secrets ACR_DEV_USERNAME / ACR_DEV_PASSWORD.
- docs/superpowers/specs design record.

K8s manifests live in k8s-backstage-v2 apps/mcp/arcade-eval-ref/ (separate branch).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-22 11:22:56 -04:00


Deploy arcade-eval reference MCP server to backstage k8s

Date: 2026-06-22
Status: Approved (implementing)

Goal

Replace the ephemeral cloudflared quick tunnel (used to register the arcade-eval-ref server with the self-hosted Arcade engine) with a permanent in-cluster deployment on backstage-wus2-v4. The engine then reaches the server over stable cluster DNS instead of a trycloudflare.com URL that dies on restart.

Relevant eval categories: cat-4 (custom server dev), cat-8 (deployment), cat-9 (DX).

Architecture / data flow

Arcade engine (ns: arcade)  ──HTTP /worker/*──▶  Service arcade-eval-ref (ns: arcade-eval-ref)
   registered as type "Arcade"                       └─▶ Deployment: python:3.12 running
   URI = http://arcade-eval-ref.arcade-eval-ref            mcp_server.server over HTTP :8000
        .svc.cluster.local:8000                           (echo / add / whoami)
   Secret = ARCADE_WORKER_SECRET  ◀── same value ──▶  env ARCADE_WORKER_SECRET (SealedSecret)
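
The URI in the diagram follows the usual Kubernetes Service DNS pattern, `<service>.<namespace>.svc.cluster.local`. A minimal sketch of how that address is assembled; the helper name `cluster_service_url` is hypothetical, and `cluster.local` is assumed to be the cluster domain, as the URI above implies:

```python
def cluster_service_url(service: str, namespace: str, port: int,
                        scheme: str = "http") -> str:
    """Build the in-cluster DNS URL for a Kubernetes Service.

    Assumes the cluster domain is cluster.local, matching the URI
    shown in the architecture diagram.
    """
    return f"{scheme}://{service}.{namespace}.svc.cluster.local:{port}"


if __name__ == "__main__":
    # Service and namespace share the name arcade-eval-ref in this design.
    print(cluster_service_url("arcade-eval-ref", "arcade-eval-ref", 8000))
```

Because the name resolves only inside the cluster, it stays stable across pod restarts, unlike the trycloudflare.com URL it replaces.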

Runtime facts (verified by introspecting arcade-mcp-server 1.17)

  • app.run() honors env overrides via _get_configuration_overrides(): ARCADE_SERVER_TRANSPORT=http, ARCADE_SERVER_HOST=0.0.0.0, ARCADE_SERVER_PORT=8000. So the hardcoded 127.0.0.1 in server.py's __main__ is overridden at runtime — no server.py change needed.
  • Setting ARCADE_WORKER_SECRET (settings alias arcade.server_secret) mounts the worker routes at /worker/*, which is what the engine calls; MCP is also served at /mcp. The app is FastAPI, listening on port 8000.
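
The override behavior described above can be illustrated with a small sketch. This is not the arcade-mcp-server implementation of `_get_configuration_overrides()`; `ServerConfig`, `resolve_config`, and the `stdio` default transport are hypothetical stand-ins that only show the general pattern of environment variables taking precedence over values hardcoded in `server.py`:

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ServerConfig:
    transport: str
    host: str
    port: int


# Hypothetical stand-in for the values hardcoded in server.py's __main__.
# Only the 127.0.0.1 host is stated in the spec; the transport is assumed.
DEFAULTS = ServerConfig(transport="stdio", host="127.0.0.1", port=8000)


def resolve_config(defaults: ServerConfig,
                   env: Optional[Mapping[str, str]] = None) -> ServerConfig:
    """Return defaults with any ARCADE_SERVER_* environment overrides applied."""
    env = os.environ if env is None else env
    return ServerConfig(
        transport=env.get("ARCADE_SERVER_TRANSPORT", defaults.transport),
        host=env.get("ARCADE_SERVER_HOST", defaults.host),
        port=int(env.get("ARCADE_SERVER_PORT", defaults.port)),
    )


if __name__ == "__main__":
    container_env = {
        "ARCADE_SERVER_TRANSPORT": "http",
        "ARCADE_SERVER_HOST": "0.0.0.0",
        "ARCADE_SERVER_PORT": "8000",
    }
    print(resolve_config(DEFAULTS, container_env))
```

With the three variables set in the container image, the process binds to all interfaces, so the Service can route traffic to it without any source change.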

Components

1. arcade-eval repo (branch off main)

  • lib/mcp_server/Dockerfile: python:3.12-slim, pip install . (pulls arcade-mcp-server + httpx), ENV transport/host/port, non-root user, EXPOSE 8000, CMD ["python","-m","mcp_server.server"].
  • .github/workflows/build-push-acr.yml: adapted from servicetitan/mem0. Pushes servicetitandev.azurecr.io/arcade-eval-ref:1.0.<run_number>. Logs in via repo secrets ACR_DEV_USERNAME / ACR_DEV_PASSWORD. Triggers: workflow_dispatch, plus push to main filtered to lib/mcp_server/**.
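
The tagging and trigger rules in the workflow bullet can be sketched as follows. The function names are hypothetical, and `should_build` is only a prefix-match approximation of the GitHub Actions `lib/mcp_server/**` path filter, not its actual glob engine:

```python
from typing import Iterable

REGISTRY = "servicetitandev.azurecr.io"
IMAGE = "arcade-eval-ref"
WATCHED_PREFIX = "lib/mcp_server/"


def image_ref(run_number: int) -> str:
    """Compose the image reference 1.0.<run_number> pushed by the workflow."""
    if run_number < 1:
        raise ValueError("workflow run numbers start at 1")
    return f"{REGISTRY}/{IMAGE}:1.0.{run_number}"


def should_build(changed_paths: Iterable[str]) -> bool:
    """Approximate the push path filter: build if any changed file is under lib/mcp_server/."""
    return any(path.startswith(WATCHED_PREFIX) for path in changed_paths)


if __name__ == "__main__":
    print(image_ref(1))
    print(should_build(["lib/mcp_server/Dockerfile", "README.md"]))
```

Since the run number increases with every run of the workflow, each build gets a distinct tag, and the first run yields the `1.0.1` tag that the Deployment references.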

2. k8s-backstage-v2 repo (branch off master)

New dir apps/mcp/arcade-eval-ref/ (Flux's apps Kustomization recursively applies everything under apps/, so no per-dir kustomization.yaml is needed):

  • namespace.yaml: ns arcade-eval-ref (labels per repo convention, team: infra).
  • server.yaml: a plain Deployment plus a Service. The Deployment runs image servicetitandev.azurecr.io/arcade-eval-ref:1.0.1 with no imagePullSecret (the cluster has native ACR pull, confirmed by other apps/mcp/* servers), reads ARCADE_WORKER_SECRET from a secretRef, uses TCP probes, and requests modest resources. The Service is ClusterIP, 8000→8000.
  • sealedsecret.yaml: arcade-eval-ref-worker-secret, key ARCADE_WORKER_SECRET, strict scope, sealed offline with kubeseal --cert <backstage-wus2-v4 public cert>.
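
As a rough illustration of what server.yaml describes, the sketch below builds the Deployment and Service as Python dictionaries and prints them as JSON, which Kubernetes accepts alongside YAML. The `app` label, the probe layout, and the use of `envFrom` are assumptions; resource requests are omitted because the spec does not state them, and the real manifest in k8s-backstage-v2 may differ:

```python
import json

NAME = "arcade-eval-ref"          # also used as the namespace
IMAGE = "servicetitandev.azurecr.io/arcade-eval-ref:1.0.1"
SECRET = "arcade-eval-ref-worker-secret"
PORT = 8000
LABELS = {"app": NAME}            # hypothetical selector label


def deployment() -> dict:
    """Single-replica Deployment with TCP probes and the secret loaded as env."""
    tcp_probe = {"tcpSocket": {"port": PORT}}
    container = {
        "name": NAME,
        "image": IMAGE,
        "ports": [{"containerPort": PORT}],
        # Exposes every key of the Secret (here ARCADE_WORKER_SECRET) as env.
        "envFrom": [{"secretRef": {"name": SECRET}}],
        "readinessProbe": tcp_probe,
        "livenessProbe": tcp_probe,
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": NAME, "namespace": NAME},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": LABELS},
            "template": {
                "metadata": {"labels": LABELS},
                "spec": {"containers": [container]},
            },
        },
    }


def service() -> dict:
    """Internal-only ClusterIP Service mapping 8000 to 8000."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": NAME, "namespace": NAME},
        "spec": {
            "type": "ClusterIP",
            "selector": LABELS,
            "ports": [{"port": PORT, "targetPort": PORT}],
        },
    }


if __name__ == "__main__":
    manifest = {"apiVersion": "v1", "kind": "List",
                "items": [deployment(), service()]}
    print(json.dumps(manifest, indent=2))
```

No `imagePullSecrets` entry appears in the pod spec, which reflects the native ACR pull noted above.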

Manual steps after merge

  1. Add ACR_DEV_USERNAME / ACR_DEV_PASSWORD repo secrets to arcade-eval.
  2. Run the workflow via workflow_dispatch (or merge to main) to build and push the image; the first run produces tag 1.0.1.
  3. Merge the k8s branch; Flux applies the namespace/secret/deployment.
  4. In the dashboard, go to Add Server → Arcade and set URI to http://arcade-eval-ref.arcade-eval-ref.svc.cluster.local:8000 and Secret to the worker secret plaintext (stored git-ignored at results/arcade-eval-ref-worker-secret.txt). Then re-point the zeb-gateway-test gateway's ref tools at the new server and drop the tunnel. Delete the plaintext file afterward.
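
Before registering the server in step 4, it may help to confirm that the Service port accepts connections, which is the same check the TCP probes perform. A minimal sketch, assuming it is run from a pod inside the cluster (the cluster DNS name does not resolve elsewhere); the helper name `tcp_reachable` is hypothetical:

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

For example, `tcp_reachable("arcade-eval-ref.arcade-eval-ref.svc.cluster.local", 8000)` would be expected to return True once Flux has applied the Deployment and the pod is ready. It only verifies that the port is open, not that the worker secret matches.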

Out of scope (YAGNI)

No ingress (internal-only ClusterIP), no HPA, no PodMonitor/metrics (separate cat-5 work), single replica.