From 29c5b2c8be69cdedbab833caf1ddd0442c7f9b91 Mon Sep 17 00:00:00 2001
From: iztaylor
Date: Thu, 18 Jun 2026 10:07:47 -0400
Subject: [PATCH] feat: in-repo arcade-gateway-eval bootstrap skill

---
 .claude/skills/arcade-gateway-eval/SKILL.md | 50 +++++++++++++++++++++
 1 file changed, 50 insertions(+)
 create mode 100644 .claude/skills/arcade-gateway-eval/SKILL.md

diff --git a/.claude/skills/arcade-gateway-eval/SKILL.md b/.claude/skills/arcade-gateway-eval/SKILL.md
new file mode 100644
index 0000000..2cbbf8e
--- /dev/null
+++ b/.claude/skills/arcade-gateway-eval/SKILL.md
@@ -0,0 +1,50 @@
+---
+name: arcade-gateway-eval
+description: Use when starting or resuming any lane of the Arcade.dev MCP-gateway evaluation (categories 1-10), especially a parallel session. Establishes repo location, the read-first order, the live-state check, ground rules, and per-lane file ownership.
+---
+
+# Arcade gateway eval — lane bootstrap
+
+You're picking up a lane of the Arcade.dev MCP-gateway benchmark. This repo is the
+tool-agnostic source of truth; this skill just orients you. Do these in order.
+
+## 1. Sync
+- Repo: `~/repos/arcade-eval`. `git pull` first (on rejection, `git pull --rebase`).
+- `cp config/.env.example .env` and fill creds if you haven't (creds live ONLY in `.env`).
+
+## 2. Read first (in order)
+1. `STATUS.md` — who owns what, where each lane is.
+2. `LIVE-POC.md` — frozen deployment facts (endpoints, IdP=Entra, the OTEL/metrics evidence).
+3. `GROUND-RULES.md` — binding rules.
+
+## 3. Live-state check (REQUIRED before any conclusion)
+The deployment changes under you; docs age within a day.
+```
+git -C ~/repos/k8s-backstage-v2 log --oneline -8 origin/master -- apps/arcade
+curl -sS -o /dev/null -w '%{http_code}\n' https://dashboard.arcade.st.dev
+```
+Any not-yet-reverted in-flight "TEMPORARY"/teardown commit on `apps/arcade` ⇒ not a validated
+steady state; don't draw conclusions from it.
+
+## 4. File ownership (parallel-session safety)
+- Write only inside your `categories/catN-*/` subtree + your own `STATUS.md` section.
+- Shared files (`config/targets.yaml`, `lib/`, top-level docs) are append-mostly; `git pull
+  --rebase` before push. Full table in `GROUND-RULES.md`.
+
+## 5. Ground rules you will trip on if you forget
+- **Never write the criteria Google Doc from a session** — compose `criteria-section-N.md`
+  locally; the human pastes. Criterion wording is **verbatim** from the criteria doc.
+- Credentials only in `.env`. Single candidate ⇒ 1–5 scoring, anchors at 1/3/5.
+
+## 6. Starting your lane
+- Your `categories/catN-*/criteria-section-N.md` is pre-seeded with verbatim criteria.
+- Copy `categories/_TEMPLATE/`'s `NOTES.md` + `tests/` into your dir; record progress in `NOTES.md`.
+
+## 7. Category-specific pointers
+- **cat 5 (auditability):** metrics go to **Grafana/Mimir**, NOT ELK. Engine OTLP is currently
+  dropped (no collector resolves at `arcade-otel-collector:4318`). See LIVE-POC "Metrics pipeline".
+- **cat 1 / 2 (identity):** per-user behavior is testable via `user_id` headers (one API key,
+  many users); a real Entra **SSO login → identity-mapping** test is cat-2 and wants a second
+  real account (a teammate is the natural User B).
+- **cat 9 (dev experience):** the shared `lib/mcp_server` (echo/whoami/add) is the fixture;
+  the DX *timing* of building a server from scratch is the separate measurement.
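
The rule in section 3 (an unreverted "TEMPORARY"/teardown commit on `apps/arcade` means no validated steady state) can be approximated mechanically. The following is a minimal Python sketch, not part of this patch. It assumes the input is the text printed by the `git log --oneline` command from section 3 (newest commit first), that in-flight commits carry "TEMPORARY" or "teardown" in their subject, and that reverts use git's default `Revert "<subject>"` subject. The function name `unreverted_inflight` is hypothetical.

```python
import re
from typing import List

# Assumption: in-flight commits are recognizable by these words in the subject.
MARKER = re.compile(r"temporary|teardown", re.IGNORECASE)
# Default subject written by `git revert`: Revert "<original subject>"
REVERT = re.compile(r'^Revert "(.*)"$')


def unreverted_inflight(oneline_log: str) -> List[str]:
    """Return subjects of marker commits that no newer commit reverts.

    `oneline_log` is newest-first, one "<abbrev-hash> <subject>" per line.
    """
    seen_reverts = set()  # subjects reverted by a commit newer than the current line
    flagged = []
    for line in oneline_log.splitlines():
        _hash, _, subject = line.strip().partition(" ")
        if not subject:
            continue  # blank or malformed line
        match = REVERT.match(subject)
        if match:
            seen_reverts.add(match.group(1))
            continue  # the revert itself is not an in-flight change
        if MARKER.search(subject) and subject not in seen_reverts:
            flagged.append(subject)
    return flagged
```

A non-empty result means the deployment should be treated as not in a validated steady state. The check only sees the window of commits it is given (8 in the command from section 3), so a revert outside that window goes unnoticed, and a commit whose subject does not follow the assumed wording is missed entirely.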