docs: README onboarding + cluster map + STATUS handoff board
@@ -0,0 +1,60 @@
# arcade-eval

Evaluation workspace for **Arcade.dev** as a self-hosted, governed **MCP gateway** for
ServiceTitan — measured against the internal *MCP Gateway Benchmark Criteria* (10 weighted
categories, hard gates). Multiple lanes (one per category) run **in parallel**; this repo is
the shared, tool-agnostic source of truth.

The question: can Arcade let AI agents act **as the calling user** (no shared credentials,
auditable, per-user tool scoping) with operational characteristics we can run in production?

## Start here (any tool — Claude Code, Cursor, a human)
1. `git pull`
2. Read **`STATUS.md`** → **`LIVE-POC.md`** → **`GROUND-RULES.md`** (in that order).
3. Run the **live-state check** (see `GROUND-RULES.md`) before trusting the live instance.
4. Go to your `categories/catN-*/` and work only inside it.

Per-tool entry pointers (all say the same thing, no duplicated content):
- **Claude Code:** the in-repo skill `arcade-gateway-eval` (auto-discovered here).
- **Cursor:** `.cursor/rules/arcade-eval.mdc` (auto-attaches) + `.cursor/mcp.json.example`.
- **Any agent tool:** `AGENTS.md`.

## The request chain (what we're testing)
```
MCP client → Gateway (curated tool list) → Engine (auth/vault/policy/audit) → Server → External API
```
Live endpoints: gateway `https://api.arcade.st.dev/mcp/{slug}`, dashboard
`https://dashboard.arcade.st.dev`. See `LIVE-POC.md` for the full deployment snapshot.
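
As a rough illustration of the first hop in the chain (MCP client → Gateway), the sketch below builds, but does not send, a `tools/list` request for one gateway slug. It assumes the gateway speaks standard MCP JSON-RPC 2.0 over HTTP; the slug `cat1-demo`, the bearer-token header, and the helper name `build_tools_list_request` are hypothetical and are not taken from `lib/mcp_client.py`.

```python
import json

# Gateway URL pattern as given in this README.
GATEWAY_URL_TEMPLATE = "https://api.arcade.st.dev/mcp/{slug}"


def build_tools_list_request(slug: str, token: str, request_id: int = 1) -> dict:
    """Return URL, headers, and JSON body for an MCP `tools/list` call.

    Nothing is sent over the network. A real session would normally
    perform the MCP `initialize` handshake before listing tools.
    """
    if not slug or "/" in slug:
        raise ValueError(f"invalid gateway slug: {slug!r}")

    body = {"jsonrpc": "2.0", "id": request_id, "method": "tools/list"}
    headers = {
        "Content-Type": "application/json",
        # MCP's streamable HTTP transport expects clients to accept both types.
        "Accept": "application/json, text/event-stream",
        # ASSUMPTION: bearer-token auth; check LIVE-POC.md for the real scheme.
        "Authorization": f"Bearer {token}",
    }
    return {
        "url": GATEWAY_URL_TEMPLATE.format(slug=slug),
        "headers": headers,
        "body": json.dumps(body),
    }


if __name__ == "__main__":
    request = build_tools_list_request("cat1-demo", token="example-token")
    print(request["url"])
    print(request["body"])
```

The tool list that comes back is what the Gateway hop curates, so comparing it across users is one way to observe per-user tool scoping.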

## How lanes work (parallel-session safety)
- Each category is a **lane** owning `categories/catN-*/` + its own `STATUS.md` section.
- Shared files (`config/targets.yaml`, `lib/`, top-level docs) are append-mostly;
  `git pull --rebase` before every push. See the ownership table in `GROUND-RULES.md`.
- The harness `lib/` is plain Python (`uv`) — tool-agnostic.
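
The lane rule above can be checked mechanically before a push. The following is a minimal sketch, not part of the harness: the function name `paths_outside_lane` and the lane directory names used in the example are hypothetical, and the shared-path list is only the one named in this README (the authoritative ownership table is in `GROUND-RULES.md`).

```python
from pathlib import PurePosixPath

# Shared locations named in this README; treated here as editable by any lane.
SHARED_PREFIXES = ("config/", "lib/")


def paths_outside_lane(changed: list[str], lane_dir: str) -> list[str]:
    """Return changed paths that are neither in the lane nor in a shared area."""
    lane = PurePosixPath(lane_dir)
    offenders = []
    for path in changed:
        p = PurePosixPath(path)
        in_lane = p == lane or lane in p.parents
        # Top-level docs (README.md, STATUS.md, ...) have a single path part.
        shared = path.startswith(SHARED_PREFIXES) or len(p.parts) == 1
        if not (in_lane or shared):
            offenders.append(path)
    return offenders


if __name__ == "__main__":
    changed = [
        "categories/cat1-functional/tests/test_tools.py",  # hypothetical lane dir
        "STATUS.md",
        "lib/mcp_client.py",
        "categories/cat5-audit/NOTES.md",  # someone else's lane
    ]
    print(paths_outside_lane(changed, "categories/cat1-functional"))
```

Feeding it the output of `git diff --name-only` would flag edits that stray into another lane's subtree.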

## Starting a new category lane
1. `git clone … && cd arcade-eval`; `cp config/.env.example .env`, fill in creds.
2. Invoke the bootstrap skill / read the Start-here docs above; run the live-state check.
3. Open your `categories/catN-*/` — `criteria-section-N.md` is **already pre-seeded** with your
   verbatim criteria/gates/anchors. Copy `categories/_TEMPLATE/`'s `NOTES.md` + `tests/` in.
4. Claim your section in `STATUS.md`. Work only inside your subtree; `git pull --rebase` before push.
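
For step 1, a quick way to confirm that `.env` was actually filled in is to compare it against `config/.env.example`. This is a sketch under assumptions: the helper names are hypothetical, the parser handles only simple `KEY=value` lines, and the key names in the example (`ARCADE_API_KEY`, `ARCADE_USER_ID`) are invented for illustration and may not match the real template.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_creds(example_text: str, env_text: str) -> list[str]:
    """Return keys present in the example file that are absent or empty in .env."""
    required = parse_env(example_text)
    actual = parse_env(env_text)
    return [key for key in required if not actual.get(key)]


if __name__ == "__main__":
    # Hypothetical file contents; read the real files with Path(...).read_text().
    example = "ARCADE_API_KEY=\n# per-user test identity\nARCADE_USER_ID=\n"
    env = "ARCADE_API_KEY=abc123\nARCADE_USER_ID=\n"
    print(missing_creds(example, env))
```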

## Categories → reviewer clusters (from the criteria doc)
| Cluster | Question | Categories | Reviewers |
|---|---|---|---|
| Platform | Does it work and stay up? | 1 Functional · 7 Performance · 8 Deployment/Ops | Nawaz / SRE |
| Security | Can we control and see it? | 2 Delegated authz · 3 Access policy · 5 Auditability · 6 Security | Dane / Chandu |
| Adopt/Operate | Can we adopt and operate it? | 4 Connectors · 9 Developer experience · 10 Product fit | Paul / Chandu |

**ztaylor owns categories 1, 5, 9** (one per cluster).

## Weights
1=8 · 2=20 · 3=15 · 4=10 · 5=12 · 6=10 · 7=8 · 8=7 · 9=5 · 10=5 (total 100)
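
To make the arithmetic concrete, the sketch below sums the weights per reviewer cluster and rolls per-category scores into a total out of 100. The weights and cluster membership come from this README; everything else is an assumption: scores are taken to be fractions between 0.0 and 1.0, and a failed hard gate is treated as voiding the total. The criteria doc defines the real scoring scale and gate semantics.

```python
from typing import Optional

# Weights and cluster membership as listed in this README.
WEIGHTS = {1: 8, 2: 20, 3: 15, 4: 10, 5: 12, 6: 10, 7: 8, 8: 7, 9: 5, 10: 5}
CLUSTERS = {
    "Platform": (1, 7, 8),
    "Security": (2, 3, 5, 6),
    "Adopt/Operate": (4, 9, 10),
}


def cluster_weights() -> dict[str, int]:
    """Sum the category weights inside each reviewer cluster."""
    return {name: sum(WEIGHTS[c] for c in cats) for name, cats in CLUSTERS.items()}


def weighted_total(
    scores: dict[int, float], failed_gates: frozenset = frozenset()
) -> Optional[float]:
    """Combine per-category scores (0.0 to 1.0) into a total out of 100.

    ASSUMPTION: any failed hard gate voids the total (returns None).
    """
    if failed_gates:
        return None
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"no score for categories: {sorted(missing)}")
    return sum(WEIGHTS[c] * scores[c] for c in WEIGHTS)


if __name__ == "__main__":
    print(cluster_weights())
    # → {'Platform': 23, 'Security': 57, 'Adopt/Operate': 20}
    print(weighted_total({c: 1.0 for c in WEIGHTS}))
    # → 100.0
```

With these weights the Security cluster carries 57 of the 100 points, Platform 23, and Adopt/Operate 20.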

## Layout
```
config/      targets.yaml · .env.example
lib/         mcp_client.py · mcp_server/ (shared reference server) · helpers
categories/  _TEMPLATE/ + cat1..cat10 (each: criteria-section-N.md [+ tests/ NOTES.md when active])
results/     git-ignored run artifacts
```
@@ -0,0 +1,49 @@
# STATUS — "you are here" handoff

Each lane owns its own section. Update yours; don't touch others'. Keep it terse.
Last full-repo update: 2026-06-18 (scaffold).

## Category 1 — Functional MCP Gateway Capability
- Owner: ztaylor
- Status: in progress (scaffold done; executing per `~/repos/docs/arcade-eval-plan.md`)
- Last live-state check: —
- Notes: cat-1 lane = this session. Per-user tests via `user_id` headers (real Entra SSO → cat 2).

## Category 2 — Delegated Authorization and Identity
- Owner: — (security cluster: Dane / Chandu)
- Status: not started (criteria stub seeded)
- Notes: holds the Entra/Okta SSO login → identity-mapping test (a teammate can be User B).

## Category 3 — Tool-Level Access Control and Policy
- Owner: — (security cluster)
- Status: not started (criteria stub seeded)

## Category 4 — Connector Coverage and Custom Server Development
- Owner: — (adopt/operate cluster)
- Status: not started (criteria stub seeded)

## Category 5 — Auditability and Observability
- Owner: ztaylor
- Status: not started (criteria stub seeded)
- Notes: metrics → Grafana/Mimir (NOT ELK); engine OTLP currently dropped (no collector). See `LIVE-POC.md`.

## Category 6 — Security and Compliance
- Owner: — (security cluster)
- Status: not started (criteria stub seeded)

## Category 7 — Performance and Availability
- Owner: — (platform cluster: Nawaz / SRE)
- Status: not started (criteria stub seeded)

## Category 8 — Deployment and Operations
- Owner: — (platform cluster)
- Status: not started (criteria stub seeded)

## Category 9 — Developer Experience
- Owner: ztaylor
- Status: not started (criteria stub seeded)
- Notes: stdio loop + Cloudflare-tunnel registration; shared `lib/mcp_server` is the fixture.

## Category 10 — Product Fit — Tools Catalog and Multi-Tenancy
- Owner: — (adopt/operate cluster)
- Status: not started (criteria stub seeded)