| name | description |
|---|---|
| arcade-gateway-eval | Use when starting or resuming any lane of the Arcade.dev MCP-gateway evaluation (categories 1-10), especially a parallel session. Establishes repo location, the read-first order, the live-state check, ground rules, and per-lane file ownership. |
Arcade gateway eval — lane bootstrap
You're picking up a lane of the Arcade.dev MCP-gateway benchmark. This repo is the tool-agnostic source of truth; this skill just orients you. Do these in order.
1. Sync
- Repo: `~/repos/arcade-eval`. `git pull` first (on rejection, `git pull --rebase`).
- `cp config/.env.example .env` and fill creds if you haven't (creds live ONLY in `.env`).
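The sync step can be sketched as two small shell functions. This is a minimal sketch, not the repo's own tooling: `sync_lane_repo` and `ensure_env` are hypothetical helper names, and skipping the copy when `.env` already exists is an assumption drawn from the "if you haven't" wording.

```shell
# Hypothetical helper: seed .env from the example only when it is missing,
# so credentials that were already filled in are not overwritten.
# $1 is the repo root.
ensure_env() {
  [ -f "$1/.env" ] || cp "$1/config/.env.example" "$1/.env"
}

# Hypothetical helper: plain pull first, rebase pull if that is rejected,
# then make sure .env exists. $1 is optional and defaults to the path in the text.
sync_lane_repo() {
  repo="${1:-$HOME/repos/arcade-eval}"
  git -C "$repo" pull || git -C "$repo" pull --rebase || return 1
  ensure_env "$repo"
}
```

Calling `sync_lane_repo` with no argument targets `~/repos/arcade-eval`; the credentials still have to be filled in by hand afterwards.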
2. Read first (in order)
- `STATUS.md`: who owns what, where each lane is.
- `LIVE-POC.md`: frozen deployment facts (endpoints, IdP=Entra, the OTEL/metrics evidence).
- `GROUND-RULES.md`: binding rules.
3. Live-state check (REQUIRED before any conclusion)
The deployment changes under you; docs age within a day.
git -C ~/repos/k8s-backstage-v2 log --oneline -8 origin/master -- apps/arcade
curl -sS -o /dev/null -w '%{http_code}\n' https://dashboard.arcade.st.dev
Any not-yet-reverted in-flight "TEMPORARY"/teardown commit on apps/arcade ⇒ not a validated
steady state; don't draw conclusions from it.
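One way to make the commit check harder to skim past is to filter the log output for the two markers. The sketch below assumes the markers appear in the commit subject line; `has_inflight_commit` is a hypothetical helper, not part of either repo. It cannot tell whether a matching commit has since been reverted (a revert commit that quotes the original subject also matches), so a hit is a prompt to read the log, not a verdict.

```shell
# Hypothetical helper: reads `git log --oneline` output on stdin and succeeds
# if any subject mentions "temporary" or "teardown" (case-insensitive).
has_inflight_commit() {
  grep -Eiq 'temporary|teardown'
}

# Usage sketch, reusing the log command from the text:
#   git -C ~/repos/k8s-backstage-v2 log --oneline -8 origin/master -- apps/arcade \
#     | has_inflight_commit && echo "check for an in-flight change before concluding anything"
```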
4. File ownership (parallel-session safety)
- Write only inside your `categories/catN-*/` subtree + your own `STATUS.md` section.
- Shared files (`config/targets.yaml`, `lib/`, top-level docs) are append-mostly; `git pull --rebase` before push. Full table in `GROUND-RULES.md`.
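A pre-push check along these lines can list the paths that fall outside your lane. It is a sketch under assumptions: `outside_lane` is a hypothetical helper, and the lane directory name in the usage line is illustrative. Shared files are append-mostly rather than forbidden, so anything it prints is a path to review, not necessarily a violation, and it does not check that a `STATUS.md` edit stays inside your own section.

```shell
# Hypothetical helper: $1 is the lane directory name under categories/.
# Reads changed paths on stdin and prints those outside the lane subtree.
outside_lane() {
  lane_dir="categories/$1/"
  while IFS= read -r path; do
    case "$path" in
      "$lane_dir"*) ;;   # inside the lane subtree
      STATUS.md) ;;      # allowed, but only your own section (not checked here)
      *) printf '%s\n' "$path" ;;
    esac
  done
}

# Usage sketch (lane name is illustrative):
#   git diff --cached --name-only | outside_lane cat5-auditability
```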
5. Ground rules you will trip on if you forget
- Never write the criteria Google Doc from a session: compose `criteria-section-N.md` locally; the human pastes. Criterion wording is verbatim from the criteria doc.
- Credentials only in `.env`. Single candidate ⇒ 1–5 scoring, anchors at 1/3/5.
6. Starting your lane
- Your `categories/catN-*/criteria-section-N.md` is pre-seeded with verbatim criteria.
- Copy `categories/_TEMPLATE/`'s `NOTES.md` + `tests/` into your dir; record progress in `NOTES.md`.
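The template copy could look like the sketch below. `seed_lane` is a hypothetical helper, and leaving an existing `NOTES.md` or `tests/` untouched is an assumption made so that resuming a lane does not wipe recorded progress.

```shell
# Hypothetical helper: $1 is the repo root, $2 is the lane directory name.
seed_lane() {
  src="$1/categories/_TEMPLATE"
  dst="$1/categories/$2"
  mkdir -p "$dst"
  # Keep an existing NOTES.md so a resumed lane keeps its progress log.
  [ -e "$dst/NOTES.md" ] || cp "$src/NOTES.md" "$dst/NOTES.md"
  # Copy the template tests/ directory only if the lane has none yet.
  [ -e "$dst/tests" ] || cp -R "$src/tests" "$dst/tests"
}
```

The pre-seeded `criteria-section-N.md` is deliberately not touched, since its wording has to stay verbatim.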
7. Category-specific pointers
- cat 5 (auditability): metrics go to Grafana/Mimir, NOT ELK. Engine OTLP is currently dropped (no collector resolves at `arcade-otel-collector:4318`). See LIVE-POC "Metrics pipeline".
- cat 1 / 2 (identity): per-user behavior is testable via `user_id` headers (one API key, many users); a real Entra SSO login → identity-mapping test is cat-2 and wants a second real account (a teammate is the natural User B).
- cat 9 (dev experience): the shared `lib/mcp_server` (echo/whoami/add) is the fixture; the DX timing of building a server from scratch is the separate measurement.