name: arcade-gateway-eval
description: Use when starting or resuming any lane of the Arcade.dev MCP-gateway evaluation (categories 1-10), especially a parallel session. Establishes repo location, the read-first order, the live-state check, ground rules, and per-lane file ownership.

Arcade gateway eval — lane bootstrap

You're picking up a lane of the Arcade.dev MCP-gateway benchmark. This repo is the tool-agnostic source of truth; this skill just orients you. Do these in order.

1. Sync

  • Repo: ~/repos/arcade-eval. Run git pull first; if the pull is rejected, retry with git pull --rebase.
  • cp config/.env.example .env and fill creds if you haven't (creds live ONLY in .env).
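
A minimal shell sketch of the sync step, assuming the repo is already cloned at ~/repos/arcade-eval. The helper names `sync_repo` and `ensure_env` are hypothetical and not part of the repo; `ensure_env` only copies the example file when .env is missing, so filled-in credentials are never overwritten.

```shell
# Hypothetical helpers; the names are illustrative, not part of the repo.
# Function bodies run in a subshell "( ... )" so their variables stay local.

# Pull the eval repo; if the plain pull is rejected, retry with a rebase.
sync_repo() (
  repo="${1:-$HOME/repos/arcade-eval}"
  git -C "$repo" pull || git -C "$repo" pull --rebase
)

# Create .env from the example only when it does not exist yet.
ensure_env() (
  repo="${1:-$HOME/repos/arcade-eval}"
  if [ -e "$repo/.env" ]; then
    echo "kept existing .env"
  else
    cp "$repo/config/.env.example" "$repo/.env"
    echo "created .env from config/.env.example; fill in creds"
  fi
)
```

Usage would be `sync_repo && ensure_env`; both default to ~/repos/arcade-eval when called without arguments.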

2. Read first (in order)

  1. STATUS.md — who owns what, where each lane is.
  2. LIVE-POC.md — frozen deployment facts (endpoints, IdP=Entra, the OTEL/metrics evidence).
  3. GROUND-RULES.md — binding rules.

3. Live-state check (REQUIRED before any conclusion)

The deployment changes under you; docs age within a day.

git -C ~/repos/k8s-backstage-v2 log --oneline -8 origin/master -- apps/arcade
curl -sS -o /dev/null -w '%{http_code}\n' https://dashboard.arcade.st.dev

Any in-flight "TEMPORARY" or teardown commit on apps/arcade that has not yet been reverted ⇒ the deployment is not in a validated steady state; don't draw conclusions from it.
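
One way to make the commit check harder to miss is to filter the log output, sketched below. The helper name `flag_inflight` is hypothetical, and matching on the words "temporary" and "teardown" in the commit subject is an assumption about how such commits are labeled. The filter cannot tell whether a flagged commit was later reverted, so a match is a prompt to look closer, not a verdict.

```shell
# Hypothetical helper: reads `git log --oneline` output on stdin and prints
# commits whose subject mentions "temporary" or "teardown" (case-insensitive).
# Exit status is grep's: 0 if anything matched, 1 if nothing matched.
# A revert commit that quotes the original subject will also match, so
# check by hand whether each flagged commit has already been reverted.
flag_inflight() {
  grep -iE 'temporary|teardown'
}

# Usage against the deployment repo named above (not run here):
#   git -C ~/repos/k8s-backstage-v2 log --oneline -8 origin/master -- apps/arcade \
#     | flag_inflight && echo "possible in-flight change: verify before concluding"
```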

4. File ownership (parallel-session safety)

  • Write only inside your categories/catN-*/ subtree + your own STATUS.md section.
  • Shared files (config/targets.yaml, lib/, top-level docs) are append-mostly; git pull --rebase before push. Full table in GROUND-RULES.md.
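
The ownership rule can be checked before a push with a small path filter, sketched here. The helper name `outside_lane` and the lane directory name in the usage line are hypothetical. Paths it prints are not necessarily violations (shared files are append-mostly), but each one deserves a second look against the table in GROUND-RULES.md.

```shell
# Hypothetical helper: reads changed paths on stdin (one per line) and prints
# every path that is neither inside the given lane directory nor STATUS.md.
# $1 is the lane directory; a trailing slash is tolerated.
outside_lane() (
  lane="${1%/}"
  while IFS= read -r path; do
    case "$path" in
      "$lane"/*|STATUS.md) ;;      # inside the lane, or STATUS.md (own section only)
      *) printf '%s\n' "$path" ;;  # shared or foreign file: review before pushing
    esac
  done
)

# Usage (lane name is illustrative; '@{u}' is the branch's upstream):
#   git -C ~/repos/arcade-eval diff --name-only '@{u}' \
#     | outside_lane categories/cat5-auditability
```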

5. Ground rules you will trip on if you forget

  • Never write to the criteria Google Doc from a session; compose criteria-section-N.md locally and the human pastes it in. Criterion wording is verbatim from the criteria doc.
  • Credentials only in .env. Single candidate ⇒ 1-5 scoring, anchors at 1/3/5.

6. Starting your lane

  • Your categories/catN-*/criteria-section-N.md is pre-seeded with verbatim criteria.
  • Copy categories/_TEMPLATE/'s NOTES.md + tests/ into your dir; record progress in NOTES.md.
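
The template copy can be sketched as a small helper. The name `seed_lane` and the lane directory in the usage line are hypothetical. Skipping files that already exist is a design choice of this sketch, so that a resumed lane keeps the progress already recorded in NOTES.md.

```shell
# Hypothetical helper: copy the template's NOTES.md and tests/ into a lane dir.
# $1 = repo root, $2 = lane directory relative to the repo root.
seed_lane() (
  repo="$1"
  lane="$repo/$2"
  mkdir -p "$lane"
  # Copy only what is missing, so existing notes and tests are not clobbered.
  [ -e "$lane/NOTES.md" ] || cp "$repo/categories/_TEMPLATE/NOTES.md" "$lane/NOTES.md"
  [ -e "$lane/tests" ] || cp -R "$repo/categories/_TEMPLATE/tests" "$lane/tests"
)

# Usage (lane name is illustrative):
#   seed_lane ~/repos/arcade-eval categories/cat9-dev-experience
```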

7. Category-specific pointers

  • cat 5 (auditability): metrics go to Grafana/Mimir, NOT ELK. Engine OTLP is currently dropped (no collector resolves at arcade-otel-collector:4318). See LIVE-POC "Metrics pipeline".
  • cat 1 / 2 (identity): per-user behavior is testable via user_id headers (one API key, many users); a real Entra SSO login → identity-mapping test is cat-2 and wants a second real account (a teammate is the natural User B).
  • cat 9 (dev experience): the shared lib/mcp_server (echo/whoami/add) is the fixture; the DX timing of building a server from scratch is the separate measurement.