---
name: arcade-gateway-eval
description: Use when starting or resuming any lane of the Arcade.dev MCP-gateway evaluation (categories 1-10), especially a parallel session. Establishes repo location, the read-first order, the live-state check, ground rules, and per-lane file ownership.
---

# Arcade gateway eval: lane bootstrap

You're picking up a lane of the Arcade.dev MCP-gateway benchmark. This repo is the
tool-agnostic source of truth; this skill just orients you. Do these in order.

## 1. Sync
- Repo: `~/repos/arcade-eval`. `git pull` first (if it's rejected, e.g. because your branch has
  diverged, retry with `git pull --rebase`).
- If you haven't already, `cp config/.env.example .env` and fill in the credentials (credentials
  live ONLY in `.env`).
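The `.env` step can be made safe to re-run. Below is a minimal sketch that assumes the layout described above, with `config/.env.example` at the repo root. `ensure_env` is a hypothetical helper, not something the repo ships. It copies the example only when `.env` doesn't exist yet, so running it again never overwrites credentials you've already filled in.

```shell
# Hypothetical helper (not shipped in the repo): create .env from the example
# only if it doesn't exist yet, so filled-in credentials are never overwritten.
ensure_env() {
  repo=${1:-"$HOME/repos/arcade-eval"}
  if [ -e "$repo/.env" ]; then
    echo "$repo/.env already exists; leaving it alone" >&2
  else
    cp "$repo/config/.env.example" "$repo/.env" || return 1
    echo "created $repo/.env; fill in the credentials before running anything" >&2
  fi
}
```

Run `ensure_env` on its own to use the default `~/repos/arcade-eval`, or pass another repo path as the first argument. To confirm `.env` can't be committed by accident, run `git -C ~/repos/arcade-eval check-ignore -q .env`. It exits 0 only if `.env` is ignored.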
## 2. Read first (in order)
1. `STATUS.md`: who owns what, where each lane is.
2. `LIVE-POC.md`: frozen deployment facts (endpoints, IdP=Entra, the OTEL/metrics evidence).
3. `GROUND-RULES.md`: binding rules.

## 3. Live-state check (REQUIRED before any conclusion)
The deployment changes under you; docs age within a day.
```shell
# origin/master only moves on fetch; refresh it so the log below isn't stale
git -C ~/repos/k8s-backstage-v2 fetch origin
git -C ~/repos/k8s-backstage-v2 log --oneline -8 origin/master -- apps/arcade
# prints the dashboard's HTTP status code
curl -sS -o /dev/null -w '%{http_code}\n' https://dashboard.arcade.st.dev
```

Any in-flight "TEMPORARY"/teardown commit on `apps/arcade` that hasn't been reverted yet ⇒ not a
validated steady state; don't draw conclusions from it.
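Part of the TEMPORARY/teardown rule above can be automated. This minimal sketch assumes two things:

- Reverts use `git revert`'s default subject, `Revert "<original subject>"`.
- "TEMPORARY" or "teardown" appears in the commit subject.

The hypothetical `unreverted_temp_commits` reads `git log --oneline` output on stdin. It prints every matching commit that has no matching revert in the same window. A revert with a hand-edited message won't be recognized, so treat the output as a prompt to look, not a verdict.

```shell
# Hypothetical helper (not part of either repo). Reads `git log --oneline`
# lines ("<hash> <subject>") on stdin, prints each TEMPORARY/teardown commit
# that has no default-format revert ("Revert \"<subject>\"") in the same input,
# and returns 1 if it printed anything.
unreverted_temp_commits() {
  awk '
    {
      hashes[NR] = $1
      subjects[NR] = substr($0, length($1) + 2)
      # Record the subject that each default-format revert message points at.
      if (subjects[NR] ~ /^Revert "/) {
        s = subjects[NR]
        reverted[substr(s, 9, length(s) - 9)] = 1
      }
    }
    END {
      found = 0
      for (i = 1; i <= NR; i++) {
        s = subjects[i]
        if (s !~ /^Revert "/ && tolower(s) ~ /temporary|teardown/ && !(s in reverted)) {
          print hashes[i] " " s
          found = 1
        }
      }
      exit found
    }
  '
}
```

Usage: `git -C ~/repos/k8s-backstage-v2 log --oneline --no-decorate -8 origin/master -- apps/arcade | unreverted_temp_commits`. The `--no-decorate` flag keeps ref names out of the subject text.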
## 4. File ownership (parallel-session safety)
- Write only inside your `categories/catN-*/` subtree and your own `STATUS.md` section.
- Shared files (`config/targets.yaml`, `lib/`, top-level docs) are append-mostly; run
  `git pull --rebase` before you push. The full ownership table is in `GROUND-RULES.md`.
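Before pushing, it helps to see which changed paths fall outside your lane. This minimal sketch takes your lane directory as its argument, e.g. `categories/cat5-auditability/` (a hypothetical name; use your real `catN-*` directory). The hypothetical `paths_outside_lane` reads paths one per line and prints those that are neither under that directory nor `STATUS.md`. Anything it prints is a shared file, so check that the change is an append and rebase before you push.

```shell
# Hypothetical pre-push check (not shipped in the repo). Reads repo-relative
# paths on stdin, one per line, and prints the ones outside the given lane
# directory other than STATUS.md, i.e. the shared files you touched.
paths_outside_lane() {
  lane_dir=${1%/}
  # The `|| [ -n ... ]` keeps a final line that lacks a trailing newline.
  while IFS= read -r path || [ -n "$path" ]; do
    case $path in
      "$lane_dir"/*|STATUS.md) ;;
      *) printf '%s\n' "$path" ;;
    esac
  done
}
```

Usage, from the repo root: `git diff --name-only @{upstream}...HEAD | paths_outside_lane categories/cat5-auditability`. This covers committed changes only; uncommitted edits still show up in `git status`.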
## 5. Ground rules you will trip on if you forget
- **Never write the criteria Google Doc from a session.** Compose `criteria-section-N.md`
  locally; the human pastes it in. Criterion wording is copied **verbatim** from the criteria doc.
- Credentials live only in `.env`. With a single candidate, score 1–5 with anchors at 1/3/5.

## 6. Starting your lane
- Your `categories/catN-*/criteria-section-N.md` is pre-seeded with the verbatim criteria.
- Copy `NOTES.md` and `tests/` from `categories/_TEMPLATE/` into your directory; record progress
  in `NOTES.md`.
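The copy step can be done without overwriting a lane that's already started. This is a minimal sketch to run from the repo root. `start_lane` is a hypothetical helper. It assumes your lane directory already exists, since it comes pre-seeded with `criteria-section-N.md`.

```shell
# Hypothetical helper (not shipped in the repo). Copies NOTES.md and tests/ from
# the template into an existing lane directory, skipping anything already there.
start_lane() {
  lane_dir=${1%/}
  template=${2:-categories/_TEMPLATE}
  if [ ! -d "$lane_dir" ]; then
    echo "no such lane directory: $lane_dir" >&2
    return 1
  fi
  for item in NOTES.md tests; do
    if [ -e "$lane_dir/$item" ]; then
      echo "keeping existing $lane_dir/$item" >&2
    else
      cp -R "$template/$item" "$lane_dir/" || return 1
    fi
  done
}
```

Usage: `start_lane categories/cat9-dx`. The directory name is hypothetical; use your lane's real `catN-*` directory.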
## 7. Category-specific pointers
- **cat 5 (auditability):** metrics go to **Grafana/Mimir**, NOT ELK. Engine OTLP is currently
  dropped because no collector resolves at `arcade-otel-collector:4318`. See "Metrics pipeline"
  in `LIVE-POC.md`.
- **cat 1 / 2 (identity):** per-user behavior is testable via `user_id` headers (one API key,
  many users). A real Entra **SSO login → identity-mapping** test belongs to cat 2 and needs a
  second real account (a teammate is the natural User B).
- **cat 9 (dev experience):** the shared `lib/mcp_server` (echo/whoami/add) is the fixture; the
  DX *timing* of building a server from scratch is a separate measurement.
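For cat 5, the "no collector resolves" fact is live state, so re-check it before you cite it. This minimal probe makes two assumptions:

- The collector uses the OTLP/HTTP default metrics path, `/v1/metrics` on port 4318.
- You run the probe somewhere that uses the cluster's DNS. The service name is cluster-internal, so from a laptop it will always look unresolved.

`otlp_probe` is hypothetical. It only classifies curl's exit code: 6 means the host couldn't be resolved, 7 means the host resolved but the connection failed, and 0 means an HTTP response came back.

```shell
# Hypothetical probe (not shipped in the repo). POSTs an empty body to an
# OTLP/HTTP endpoint and classifies the result by curl's exit code. Run it from
# inside the cluster network; the default URL is the service from the cat 5 note.
otlp_probe() {
  url=${1:-http://arcade-otel-collector:4318/v1/metrics}
  code=$(curl -s --noproxy '*' --max-time 5 -o /dev/null -w '%{http_code}' \
    -H 'Content-Type: application/x-protobuf' --data-binary '' "$url" 2>/dev/null)
  rc=$?
  case $rc in
    0) echo "reachable: HTTP $code from $url" ;;
    6) echo "unresolved: no DNS answer for the host in $url" ;;
    7) echo "resolved, but couldn't connect to $url" ;;
    28) echo "timed out: $url" ;;
    *) echo "curl exited $rc for $url" ;;
  esac
}
```

A result of `unresolved` from inside the cluster matches the LIVE-POC finding. Any HTTP status means something is now listening there, and the cat 5 evidence should be re-checked.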