---
name: arcade-gateway-eval
description: Use when starting or resuming any lane of the Arcade.dev MCP-gateway evaluation (categories 1-10), especially a parallel session. Establishes repo location, the read-first order, the live-state check, ground rules, and per-lane file ownership.
---

# Arcade gateway eval — lane bootstrap

You're picking up a lane of the Arcade.dev MCP-gateway benchmark. This repo is the tool-agnostic source of truth; this skill just orients you. Do these in order.

## 1. Sync

- Repo: `~/repos/arcade-eval`. `git pull` first (on rejection, `git pull --rebase`).
- `cp config/.env.example .env` and fill creds if you haven't (creds live ONLY in `.env`).

## 2. Read first (in order)

1. `STATUS.md` — who owns what, where each lane is.
2. `LIVE-POC.md` — frozen deployment facts (endpoints, IdP=Entra, the OTEL/metrics evidence).
3. `GROUND-RULES.md` — binding rules.

## 3. Live-state check (REQUIRED before any conclusion)

The deployment changes under you; docs age within a day.

```
git -C ~/repos/k8s-backstage-v2 log --oneline -8 origin/master -- apps/arcade
curl -sS -o /dev/null -w '%{http_code}\n' https://dashboard.arcade.st.dev
```

Any not-yet-reverted in-flight "TEMPORARY"/teardown commit on `apps/arcade` ⇒ not a validated steady state; don't draw conclusions from it.

## 4. File ownership (parallel-session safety)

- Write only inside your `categories/catN-*/` subtree + your own `STATUS.md` section.
- Shared files (`config/targets.yaml`, `lib/`, top-level docs) are append-mostly; `git pull --rebase` before push. Full table in `GROUND-RULES.md`.

## 5. Ground rules you will trip on if you forget

- **Never write the criteria Google Doc from a session** — compose `criteria-section-N.md` locally; the human pastes. Criterion wording is **verbatim** from the criteria doc.
- Credentials only in `.env`. Single candidate ⇒ 1–5 scoring, anchors at 1/3/5.

## 6. Starting your lane

- Your `categories/catN-*/criteria-section-N.md` is pre-seeded with verbatim criteria.
- Copy `categories/_TEMPLATE/`'s `NOTES.md` + `tests/` into your dir; record progress in `NOTES.md`.

## 7. Category-specific pointers

- **cat 5 (auditability):** metrics go to **Grafana/Mimir**, NOT ELK. Engine OTLP is currently dropped (no collector resolves at `arcade-otel-collector:4318`). See LIVE-POC "Metrics pipeline".
- **cat 1 / 2 (identity):** per-user behavior is testable via `user_id` headers (one API key, many users); a real Entra **SSO login → identity-mapping** test is cat-2 and wants a second real account (a teammate is the natural User B).
- **cat 9 (dev experience):** the shared `lib/mcp_server` (echo/whoami/add) is the fixture; the DX *timing* of building a server from scratch is the separate measurement.
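
As a rough illustration of the section 3 live-state check, the sketch below wraps the two commands and flags log lines that mention "TEMPORARY" or "teardown". The function names `classify_log` and `live_state_check` are hypothetical and not part of the repo. The keyword match is an assumption about how in-flight commits are labelled, and it cannot tell whether a commit was later reverted, so treat `suspect` as a prompt to read the log yourself, not as a verdict.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not part of arcade-eval.

# Reads `git log --oneline` output on stdin.
# Prints "suspect" if any line mentions TEMPORARY or teardown
# (case-insensitive), otherwise prints "clean".
classify_log() {
  if grep -Eiq 'temporary|teardown'; then
    echo suspect
  else
    echo clean
  fi
}

# Runs the two commands from section 3 against the live deployment.
# Defined but not invoked here, because it needs the k8s-backstage-v2
# checkout and network access.
live_state_check() {
  git -C "$HOME/repos/k8s-backstage-v2" log --oneline -8 origin/master -- apps/arcade \
    | classify_log
  curl -sS -o /dev/null -w '%{http_code}\n' https://dashboard.arcade.st.dev
}
```

After sourcing the file, call `live_state_check` by hand. A `suspect` line means the last eight `apps/arcade` commits deserve a closer look before you record any conclusion in `NOTES.md`.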