arcade-eval/.claude/skills/arcade-gateway-eval/SKILL.md

---
name: arcade-gateway-eval
description: Use when starting or resuming any lane of the Arcade.dev MCP-gateway evaluation (categories 1-10), especially a parallel session. Establishes repo location, the read-first order, the live-state check, ground rules, and per-lane file ownership.
---
# Arcade gateway eval — lane bootstrap
You're picking up a lane of the Arcade.dev MCP-gateway benchmark. This repo is the
tool-agnostic source of truth; this skill just orients you. Do these in order.
## 1. Sync
- Repo: `~/repos/arcade-eval`. `git pull` first (on rejection, `git pull --rebase`).
- `cp config/.env.example .env` and fill creds if you haven't (creds live ONLY in `.env`).
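The sync step above can be sketched as two small shell functions. This is a minimal sketch, not part of the repo: the function names `sync_repo` and `seed_env` are hypothetical, and the "copy only if `.env` is missing" guard is an assumption about what "if you haven't" means.

```shell
# Hypothetical helpers; names are illustrative, not part of the repo.

sync_repo() {
    # $1 = repo root. Plain pull first; if it is rejected, retry as a rebase pull.
    git -C "$1" pull || git -C "$1" pull --rebase
}

seed_env() {
    # $1 = repo root. Copy the example file only when .env is absent,
    # so credentials that were already filled in are never overwritten.
    if [ ! -f "$1/.env" ]; then
        cp "$1/config/.env.example" "$1/.env"
        echo "seeded"
    else
        echo "kept"
    fi
}
```

Typical use would be `sync_repo ~/repos/arcade-eval && seed_env ~/repos/arcade-eval`, then filling in the credentials by hand.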
## 2. Read first (in order)
1. `STATUS.md` — who owns what, where each lane is.
2. `LIVE-POC.md` — frozen deployment facts (endpoints, IdP=Entra, the OTEL/metrics evidence).
3. `GROUND-RULES.md` — binding rules.
## 3. Live-state check (REQUIRED before any conclusion)
The deployment changes under you; docs age within a day.
```
git -C ~/repos/k8s-backstage-v2 log --oneline -8 origin/master -- apps/arcade
curl -sS -o /dev/null -w '%{http_code}\n' https://dashboard.arcade.st.dev
```
If `apps/arcade` has an in-flight "TEMPORARY"/teardown commit that has not been reverted, the
deployment is not in a validated steady state; don't draw conclusions from it.
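A rough way to surface such commits is to filter the log output for the marker words. The sketch below assumes that in-flight commits carry "TEMPORARY" or "teardown" in their subject line; the function name `flag_inflight` is hypothetical. It only flags candidates and cannot tell whether a commit was later reverted, so the final judgment stays manual.

```shell
flag_inflight() {
    # Reads `git log --oneline` output on stdin and prints the subjects that
    # mention TEMPORARY or teardown (case-insensitive).
    # Exit status is grep's: 0 if at least one line matched, 1 if none did.
    grep -i -E 'temporary|teardown'
}
```

For example: `git -C ~/repos/k8s-backstage-v2 log --oneline -8 origin/master -- apps/arcade | flag_inflight && echo "check for reverts before concluding anything"`.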
## 4. File ownership (parallel-session safety)
- Write only inside your `categories/catN-*/` subtree + your own `STATUS.md` section.
- Shared files (`config/targets.yaml`, `lib/`, top-level docs) are append-mostly; `git pull
--rebase` before push. Full table in `GROUND-RULES.md`.
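The ownership rule can be checked mechanically before a push. The following is a minimal sketch under stated assumptions: `outside_lane` is a hypothetical helper, and it treats only `categories/cat<N>-*/` and `STATUS.md` as in-lane, so shared files will also be listed and still need the append-mostly treatment described above.

```shell
outside_lane() {
    # $1 = lane number N; stdin = one changed path per line.
    # Prints every path that is neither inside categories/cat<N>-*/ nor STATUS.md.
    while IFS= read -r path; do
        case "$path" in
            categories/cat"$1"-*/*|STATUS.md) ;;   # in-lane: print nothing
            *) printf '%s\n' "$path" ;;            # outside the lane: report it
        esac
    done
}
```

One possible invocation, assuming the branch has an upstream configured, is `git diff --name-only '@{upstream}' | outside_lane 5`; empty output means every changed file is inside lane 5 or is `STATUS.md`.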
## 5. Ground rules you will trip on if you forget
- **Never write the criteria Google Doc from a session** — compose `criteria-section-N.md`
locally; the human pastes. Criterion wording is **verbatim** from the criteria doc.
- Credentials only in `.env`. Single candidate ⇒ 1-5 scoring, anchors at 1/3/5.
## 6. Starting your lane
- Your `categories/catN-*/criteria-section-N.md` is pre-seeded with verbatim criteria.
- Copy `categories/_TEMPLATE/`'s `NOTES.md` + `tests/` into your dir; record progress in `NOTES.md`.
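Seeding a lane directory from the template might look like the sketch below. The helper name `seed_lane` and the example lane directory name are hypothetical; the skip-if-present checks are an assumption added so that re-running it does not overwrite existing notes.

```shell
seed_lane() {
    # $1 = repo root, $2 = lane directory name under categories/
    # (for example cat5-auditability; the name is illustrative only).
    src="$1/categories/_TEMPLATE"
    dst="$1/categories/$2"
    mkdir -p "$dst"
    # Copy each template item only if the lane does not already have it.
    [ -e "$dst/NOTES.md" ] || cp "$src/NOTES.md" "$dst/NOTES.md"
    [ -e "$dst/tests" ] || cp -R "$src/tests" "$dst/tests"
}
```

The pre-seeded `criteria-section-N.md` is left untouched, since its wording has to stay verbatim.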
## 7. Category-specific pointers
- **cat 5 (auditability):** metrics go to **Grafana/Mimir**, NOT ELK. Engine OTLP is currently
dropped (no collector resolves at `arcade-otel-collector:4318`). See LIVE-POC "Metrics pipeline".
- **cat 1 / 2 (identity):** per-user behavior is testable via `user_id` headers (one API key,
many users); a real Entra **SSO login → identity-mapping** test is cat-2 and wants a second
real account (a teammate is the natural User B).
- **cat 9 (dev experience):** the shared `lib/mcp_server` (echo/whoami/add) is the fixture;
the DX *timing* of building a server from scratch is the separate measurement.