Compare commits
1 Commits
| Author | SHA1 | Date | |
|---|---|---|---|
| e78795bf4f |
@@ -1,24 +1,23 @@
|
|||||||
# STATUS — "you are here" handoff
|
# STATUS — "you are here" handoff
|
||||||
|
|
||||||
Each lane owns its own section. Update yours; don't touch others'. Keep it terse.
|
Each lane owns its own section. Update yours; don't touch others'. Keep it terse.
|
||||||
Last full-repo update: 2026-06-18 (scaffold).
|
Last full-repo update: 2026-06-22.
|
||||||
|
|
||||||
## Category 1 — Functional MCP Gateway Capability
|
## Category 1 — Functional MCP Gateway Capability
|
||||||
- Owner: ztaylor
|
- Owner: ztaylor
|
||||||
- Status: in progress (scaffold done; executing per `~/repos/docs/arcade-eval-plan.md`)
|
- Status: **SCORED (draft 4/5)** — `categories/cat1-functional/criteria-section-1.md`, awaiting user paste into the Google Doc.
|
||||||
- Last live-state check: —
|
- Last live-state check: 2026-06-22
|
||||||
- Notes: cat-1 lane = this session. Per-user tests via `user_id` headers (real Entra SSO → cat 2).
|
- Result: protocol/curation/mixed/dynamic-reg/zero-config-clients all PASS; per-user execution proven (`whoami` A→A/B→B); Claude Code connected via Arcade-Headers AND Entra OAuth. One finding: per-user tool-LIST scoping is gateway-wide, not native (→ cat-3/separate gateways).
|
||||||
|
- Fixtures (reusable): gateway `zeb-gateway-test`; ref server `arcade-eval-ref` (lib/mcp_server) registered via cloudflared quick tunnel (EPHEMERAL — re-establish for cat-9; see LIVE-POC).
|
||||||
|
|
||||||
## Category 2 — Delegated Authorization and Identity
|
## Category 2 — Delegated Authorization and Identity
|
||||||
- Owner: — (security cluster: Dane / Chandu)
|
- Owner: — (security cluster: Dane / Chandu)
|
||||||
- Status: not started (criteria stub seeded)
|
- Status: not started (criteria stub seeded) — **but cat-1 work already generated strong evidence; see LIVE-POC "Known behaviors".**
|
||||||
- Notes: holds the Entra/Okta SSO login → identity-mapping test (a teammate can be User B).
|
- Notes: holds the Entra/Okta SSO login → identity-mapping test. Open finding: User Source keys user_id on opaque Entra `sub`, mismatching the dashboard email → blocks downstream OAuth consent bind (fix: map User Source to the email claim). Google provider redirect-uri/secret issue was resolved 2026-06-22.
|
||||||
|
|
||||||
## Category 3 — Tool-Level Access Control and Policy
|
## Category 3 — Tool-Level Access Control and Policy
|
||||||
- Owner: trachakonda
|
- Owner: — (security cluster)
|
||||||
- Status: in progress — B1 (curr-state) + B5 (enforcement/bypass) DONE; B2/B3/B4 + per-user B1 pending dashboard + Contextual Access.
|
- Status: not started (criteria stub seeded)
|
||||||
- Last live-state check: 2026-06-18 (apps/arcade #2383 steady; dashboard 200). Noted: otel-collector + jaeger now deployed (cat-5) → trace store for B6.
|
|
||||||
- Notes: Engine is the enforcement point (ungranted tool rejected there); one gateway = gateway-wide tool list (A==B), not per-user. Bypass: public-isolated for in-cluster worker (ClusterIP); tunnel custom servers = documented boundary. Blocked on dashboard for Contextual Access (input-block/output-redact) + per-user grants.
|
|
||||||
|
|
||||||
## Category 4 — Connector Coverage and Custom Server Development
|
## Category 4 — Connector Coverage and Custom Server Development
|
||||||
- Owner: — (adopt/operate cluster)
|
- Owner: — (adopt/operate cluster)
|
||||||
@@ -26,8 +25,9 @@ Last full-repo update: 2026-06-18 (scaffold).
|
|||||||
|
|
||||||
## Category 5 — Auditability and Observability
|
## Category 5 — Auditability and Observability
|
||||||
- Owner: ztaylor
|
- Owner: ztaylor
|
||||||
- Status: not started (criteria stub seeded)
|
- Status: **NEXT — start here in a fresh session** (invoke skill `arcade-gateway-eval`; read this + LIVE-POC; run live-state check). See `categories/cat5-auditability/NOTES.md` for the plan.
|
||||||
- Notes: metrics → Grafana/Mimir (NOT ELK); engine OTLP currently dropped (no collector). See LIVE-POC.
|
- Last live-state check: —
|
||||||
|
- Notes: metrics → **Grafana/Mimir** (NOT ELK); logs → ELK (Vector). Engine OTLP currently **dropped** — collector `arcade-otel-collector:4318` doesn't resolve. First task = OTEL collector → Prometheus/Mimir remediation (with the user; touches `k8s-backstage-v2/apps/arcade`). Full evidence + remediation shapes in LIVE-POC "Observability".
|
||||||
|
|
||||||
## Category 6 — Security and Compliance
|
## Category 6 — Security and Compliance
|
||||||
- Owner: — (security cluster)
|
- Owner: — (security cluster)
|
||||||
|
|||||||
@@ -25,24 +25,20 @@
|
|||||||
## Benchmark tests
|
## Benchmark tests
|
||||||
| # | Test (verbatim) | Result | Evidence |
|
| # | Test (verbatim) | Result | Evidence |
|
||||||
|---|---|---|---|
|
|---|---|---|---|
|
||||||
| 1 | Grant User A access to GitHub tools and User B access to Atlassian tools. Verify User A cannot invoke Atlassian tools even if they know the tool name. | PARTIAL (curr-state) — on one gateway the tool list is gateway-wide, identical for A and B (not per-user); an ungranted/unknown tool is cleanly rejected at the Engine. True per-user grant (A=GitHub, B=Atlassian) needs 2 gateways or Contextual Access (dashboard). | probes.md §B1: A==B 10 tools; `Github_CreateIssue` → `McpError: tool not enabled for this gateway` |
|
| 1 | Grant User A access to GitHub tools and User B access to Atlassian tools. Verify User A cannot invoke Atlassian tools even if they know the tool name. | | |
|
||||||
| 2 | Write a Contextual Access rule that blocks inputs containing a specific pattern (e.g., a mock SSN). Send a matching input — verify it is blocked before execution and logged. | | |
|
| 2 | Write a Contextual Access rule that blocks inputs containing a specific pattern (e.g., a mock SSN). Send a matching input — verify it is blocked before execution and logged. | | |
|
||||||
| 3 | Write a Contextual Access rule that redacts a field from tool outputs. Verify the field is absent from the agent's response. | | |
|
| 3 | Write a Contextual Access rule that redacts a field from tool outputs. Verify the field is absent from the agent's response. | | |
|
||||||
| 4 | Update User A's tool grants (add a new tool). Verify the change takes effect without restarting anything. | | |
|
| 4 | Update User A's tool grants (add a new tool). Verify the change takes effect without restarting anything. | | |
|
||||||
| 5 | Confirm policy enforcement point: attempt to bypass Contextual Access by calling the server directly (bypassing the Engine). Confirm this is architecturally prevented or explicitly documented as a known boundary. | DONE — enforcement is at the Engine. All arcade Services are ClusterIP; the worker (where tools run) is not public → public bypass network-prevented. In-cluster direct-to-worker is reachable but secret-gated (operational). Self-hosted custom servers exposed via public tunnel are a documented bypass boundary. | probes.md §B5: svc types; worker `/worker/health`=200, `/mcp`=406 (needs secret) |
|
| 5 | Confirm policy enforcement point: attempt to bypass Contextual Access by calling the server directly (bypassing the Engine). Confirm this is architecturally prevented or explicitly documented as a known boundary. | | |
|
||||||
|
|
||||||
## Suggested pass/fail gates
|
## Suggested pass/fail gates
|
||||||
| Gate | Pass condition (verbatim) | Result | Evidence |
|
| Gate | Pass condition (verbatim) | Result | Evidence |
|
||||||
|---|---|---|---|
|
|---|---|---|---|
|
||||||
| Tool isolation | Cross-user tool calls are rejected at the Engine regardless of client behavior | PARTIAL — ungranted/unknown tools are rejected at the Engine (not the client); but on one gateway the allow-list is gateway-wide, so it is not yet per-*user* isolation. | probes.md §B1/§B5 |
|
| Tool isolation | Cross-user tool calls are rejected at the Engine regardless of client behavior | | |
|
||||||
| Input policy | Blocked inputs are rejected before execution, not after | | |
|
| Input policy | Blocked inputs are rejected before execution, not after | | |
|
||||||
| Output policy | Redacted fields are absent from the agent's response | | |
|
| Output policy | Redacted fields are absent from the agent's response | | |
|
||||||
| Audit | Every policy decision (allow/block/redact) produces a retrievable log entry | | |
|
| Audit | Every policy decision (allow/block/redact) produces a retrievable log entry | | |
|
||||||
| Dynamic grants | Tool grant updates take effect without service restart | | |
|
| Dynamic grants | Tool grant updates take effect without service restart | | |
|
||||||
|
|
||||||
## Findings
|
## Findings
|
||||||
- **Enforcement point = the Engine (criterion 5).** Ungranted/unknown tool calls are rejected at the Engine with a clean structured error (`tool not enabled for this gateway`) — no leak, no execution, no shared-credential fallback.
|
-
|
||||||
- **Tool curation is per-gateway, not per-user (criteria 1, 2).** On a single Arcade-Headers gateway the tool list is identical for every `Arcade-User-ID` (A==B). Per-user differentiation requires Contextual Access (an access hook) or separate gateways / a User Source — to be tested once dashboard access lands.
|
|
||||||
- **Bypass surface (criterion 5 boundary).** Public attack surface is network-isolated for in-cluster tools (worker is ClusterIP). Two documented boundaries: (a) in-cluster direct-to-worker is only secret+network gated (operational, not architectural); (b) self-hosted custom servers exposed via public Cloudflare tunnel can be called directly, bypassing Engine policy — mitigate in prod via ClusterIP registration / tunnel access control.
|
|
||||||
- **V4 seam note.** With no ToolHub deployed, all of the above is Arcade-native enforcement. For a ToolHub front, the authority decision + audit (`ToolHubDecisionRecord`) would move to the ToolHub MCP Endpoint, and Arcade should be reachable only via ToolHub (closes boundary (a)/(b)).
|
|
||||||
- _Pending (dashboard / Contextual Access): per-user grants (1), Contextual Access input block (3) + output redaction (4), dynamic per-user grant w/o restart (7), audit of decisions (6), Okta-group scopes (8)._
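
The first two findings can be illustrated with a small Python sketch. It is not Arcade's implementation: the `GATEWAY_TOOLS` table and the `Ref_*` tool names are hypothetical, and only the gateway name, `Github_CreateIssue`, and the error text come from the notes above. It shows why A==B on one gateway (the list is keyed on the gateway, not the user) and that an ungranted tool is rejected before anything runs.

```python
# Hypothetical allow-list keyed by gateway only; the Ref_* names are invented.
GATEWAY_TOOLS = {"zeb-gateway-test": {"Ref_Echo", "Ref_Add", "Ref_Whoami"}}


class ToolNotEnabled(Exception):
    """Raised when a tool is outside the gateway's allow-list."""


def list_tools(gateway: str, user_id: str) -> set[str]:
    """The result depends only on the gateway, so every user_id sees the same list."""
    return set(GATEWAY_TOOLS[gateway])


def call_tool(gateway: str, user_id: str, tool: str) -> str:
    """Reject any tool outside the gateway's allow-list before execution."""
    if tool not in GATEWAY_TOOLS[gateway]:
        raise ToolNotEnabled("tool not enabled for this gateway")
    return f"{tool} executed for {user_id}"
```

Per-user differentiation would need a second key (the user) in the lookup, which is what separate gateways or Contextual Access would supply.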
|
|
||||||
|
|||||||
@@ -1,200 +0,0 @@
|
|||||||
# Where the AI Gateway and MCP Gateway fit — target architecture
|
|
||||||
|
|
||||||
> Cat-3 (Tool-Level Access Control & Policy) deliverable: the V4 seam map, extended into a
|
|
||||||
> concrete integration design. **Goal:** place an **AI Gateway** (LLM/model proxy) and an
|
|
||||||
> **MCP Gateway** (Arcade) into the existing `Agent Platform → Tool Hub → Automation Hub`
|
|
||||||
> stack **without major work on the Tool Hub or Automation Hub applications.**
|
|
||||||
>
|
|
||||||
> Grounded in: `servicetitan/tool-hub` @ master, `servicetitan/automation-hub` @ master,
|
|
||||||
> arcade-eval LIVE-POC (all read 2026-06-22).
|
|
||||||
|
|
||||||
## The thesis in one paragraph
|
|
||||||
|
|
||||||
Both Tool Hub and Automation Hub were built with the exact seams this needs, and neither does
|
|
||||||
the one thing Arcade is for. **Tool Hub** already has a data-driven `IExecutionAdapter` registry
|
|
||||||
with a **`mcp_proxy` SourceType named in the contract** — adding Arcade is the *intended*
|
|
||||||
extension, not surgery. **Automation Hub** explicitly scopes per-user OAuth / connector
|
|
||||||
infrastructure as a **non-goal** and names per-user OAuth brokering as the gap an external
|
|
||||||
platform fills. So the minimal-work design is: **(1) AI Gateway = pure configuration** (repoint
|
|
||||||
the model/embedding base URLs every component already calls); **(2) MCP Gateway (Arcade) = one
|
|
||||||
adapter pair behind Tool Hub's existing `mcp_proxy` seam**, with all per-user third-party OAuth
|
|
||||||
living *inside Arcade* (so Tool Hub needs no credential vault and no new OBO authority).
|
|
||||||
Automation Hub is untouched. Tool Hub remains the single authority/policy/audit plane over
|
|
||||||
**both** execution backends.
|
|
||||||
|
|
||||||
## Design constraints — what "no major work" means here
|
|
||||||
|
|
||||||
| App | Allowed | Explicitly avoided |
|
|
||||||
|---|---|---|
|
|
||||||
| **Tool Hub** | Implement one `ICatalogSource` + one `IExecutionAdapter` (`type='arcade'`/`mcp_proxy`) — the designed extension point. Config: model base URLs → AI Gateway. | No change to discovery hot path, permission model, idempotency, audit, or the OBO core. Per-user SaaS OAuth is **not** added to Tool Hub. |
|
|
||||||
| **Automation Hub** | Nothing. | No new executor, no connector framework, no OAuth store. AH stays one of Tool Hub's catalog sources. |
|
|
||||||
| **Agent Platform** | Config: inference endpoint → AI Gateway; identity = per-user Entra SSO. | No re-architecture. |
|
|
||||||
|
|
||||||
## 1. Target topology
|
|
||||||
|
|
||||||
```mermaid
|
|
||||||
flowchart TB
|
|
||||||
subgraph IDP["Identity"]
|
|
||||||
Entra["Entra ID SSO<br/>per-user login / IUM"]
|
|
||||||
end
|
|
||||||
|
|
||||||
subgraph AGENT["Agent plane"]
|
|
||||||
Agent["LLM Agent<br/>(AgentOS / sidecar)"]
|
|
||||||
end
|
|
||||||
|
|
||||||
subgraph GW["Gateways — inserted, no app surgery"]
|
|
||||||
AIGW["AI Gateway<br/>LiteLLM-class LLM/model proxy<br/>keys · routing · rate-limit · cost · audit"]
|
|
||||||
MCPGW["MCP Gateway — Arcade<br/>MCP transport + per-user OAuth broker"]
|
|
||||||
end
|
|
||||||
|
|
||||||
subgraph TH["Tool Hub — authority / data plane (core UNCHANGED)"]
|
|
||||||
MCPHost["MCP surface<br/>search_tools · get_tool_details · execute_tool"]
|
|
||||||
Policy["Stage0-6: permission re-check ·<br/>idempotency · rate-limit · audit/outbox"]
|
|
||||||
Reg["IExecutionAdapter registry<br/>(catalog_source.type → adapter)"]
|
|
||||||
AHAdapter["automation_hub adapter<br/>(exists)"]
|
|
||||||
ArcAdapter["arcade adapter<br/>(NEW — mcp_proxy seam)"]
|
|
||||||
end
|
|
||||||
|
|
||||||
subgraph AH["Automation Hub — UNCHANGED"]
|
|
||||||
AHCat["Catalog API<br/>GET /api/catalog/actions (ETag, cursor)"]
|
|
||||||
AHExec["POST /actions/{id}/execute<br/>st.automation_hub.execute"]
|
|
||||||
AHDown["ST Core API v2 / Internal API<br/>IUM bot-user impersonation"]
|
|
||||||
end
|
|
||||||
|
|
||||||
subgraph EXT["Third-party + custom capability"]
|
|
||||||
SaaS["GitHub · Slack · Google · ..."]
|
|
||||||
Custom["Custom / partner MCP servers"]
|
|
||||||
end
|
|
||||||
|
|
||||||
subgraph MODELS["Model providers"]
|
|
||||||
LLMs["Anthropic · Voyage · OpenAI · internal"]
|
|
||||||
end
|
|
||||||
|
|
||||||
Entra -. "per-user token" .-> Agent
|
|
||||||
Agent -- "inference" --> AIGW
|
|
||||||
Agent -- "MCP meta-tools (carries user identity)" --> MCPHost
|
|
||||||
MCPHost --> Policy --> Reg
|
|
||||||
Reg --> AHAdapter
|
|
||||||
Reg --> ArcAdapter
|
|
||||||
AHAdapter -- "catalog sync" --> AHCat
|
|
||||||
AHAdapter -- "IUM OBO execute" --> AHExec
|
|
||||||
AHExec --> AHDown
|
|
||||||
ArcAdapter -- "MCP tools/call + user identity" --> MCPGW
|
|
||||||
MCPGW -- "resolve per-user OAuth token" --> SaaS
|
|
||||||
MCPGW --> Custom
|
|
||||||
AIGW --> LLMs
|
|
||||||
TH -. "enrichment · query rewrite · embeddings · rerank" .-> AIGW
|
|
||||||
|
|
||||||
classDef new fill:#ffe8cc,stroke:#e8860c,stroke-width:2px,color:#000;
|
|
||||||
class AIGW,MCPGW,ArcAdapter new;
|
|
||||||
```
|
|
||||||
|
|
||||||
Highlighted (orange) = the only new pieces: the **AI Gateway**, the **MCP Gateway (Arcade)**,
|
|
||||||
and the thin **arcade adapter** that slots into Tool Hub's existing registry.
|
|
||||||
|
|
||||||
## 2. Two execution paths through one authority plane
|
|
||||||
|
|
||||||
Tool Hub stays the single point of policy, idempotency, and audit. The *only* difference
|
|
||||||
between an internal action and a third-party action is which adapter the registry resolves — and
|
|
||||||
that the Arcade path adds per-user OAuth that neither Tool Hub nor AH can do today.
|
|
||||||
|
|
||||||
```mermaid
|
|
||||||
sequenceDiagram
|
|
||||||
autonumber
|
|
||||||
participant U as User / Agent
|
|
||||||
participant TH as Tool Hub
|
|
||||||
participant AR as Arcade (MCP GW)
|
|
||||||
participant SaaS as Third-party SaaS
|
|
||||||
participant AH as Automation Hub
|
|
||||||
participant ST as ServiceTitan APIs
|
|
||||||
|
|
||||||
Note over U,ST: A. Internal ServiceTitan action — existing path, unchanged
|
|
||||||
U->>TH: execute_tool(automation_hub://crm.create_job, input)
|
|
||||||
TH->>TH: permission re-check · idempotency · rate-limit · audit
|
|
||||||
TH->>AH: POST /actions/{id}/execute (IUM OBO, bot-user)
|
|
||||||
AH->>ST: call Core / Internal API
|
|
||||||
ST-->>AH: result
|
|
||||||
AH-->>TH: ActionExecutionResult
|
|
||||||
TH-->>U: CallToolResult
|
|
||||||
|
|
||||||
Note over U,SaaS: B. Third-party action — NEW path via Arcade
|
|
||||||
U->>TH: execute_tool(arcade://github.create_issue, input)
|
|
||||||
TH->>TH: SAME permission re-check · idempotency · rate-limit · audit
|
|
||||||
TH->>AR: MCP tools/call + user identity (Entra SSO)
|
|
||||||
AR->>AR: resolve this user's stored GitHub OAuth token
|
|
||||||
AR->>SaaS: call GitHub API AS THE USER
|
|
||||||
SaaS-->>AR: result
|
|
||||||
AR-->>TH: MCP CallToolResult
|
|
||||||
TH-->>U: CallToolResult
|
|
||||||
```
|
|
||||||
|
|
||||||
The critical property: **the per-user OAuth complexity lives entirely in Arcade.** Tool Hub only
|
|
||||||
authenticates the *user* to Arcade and passes identity — so it needs no third-party token vault
|
|
||||||
and no change to its Entra/IUM OBO core (the arcade adapter sets `RequiresObo=false` for the
|
|
||||||
third-party-OAuth case; Arcade does the brokering). That is what keeps this out of "major work."
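
The dispatch described above can be sketched in Python. This is an illustration only, not Tool Hub's code (which is C#): `ExecutionAdapter`, `requires_obo`, and `REGISTRY` are hypothetical stand-ins for `IExecutionAdapter`, `RequiresObo`, and the `catalog_source.type → adapter` registry, and the returned dictionaries are placeholders for real results.

```python
from dataclasses import dataclass
from typing import Protocol


class ExecutionAdapter(Protocol):
    """Hypothetical Python stand-in for Tool Hub's IExecutionAdapter."""

    requires_obo: bool

    def execute(self, tool: str, user_id: str, payload: dict) -> dict: ...


@dataclass
class AutomationHubAdapter:
    requires_obo: bool = True  # internal path: Tool Hub acquires the Entra/IUM OBO token

    def execute(self, tool: str, user_id: str, payload: dict) -> dict:
        return {"backend": "automation_hub", "tool": tool, "user": user_id}


@dataclass
class ArcadeAdapter:
    requires_obo: bool = False  # third-party path: Arcade brokers the per-user OAuth

    def execute(self, tool: str, user_id: str, payload: dict) -> dict:
        return {"backend": "arcade", "tool": tool, "user": user_id}


# Data-driven registry: catalog source type -> adapter instance.
REGISTRY: dict[str, ExecutionAdapter] = {
    "automation_hub": AutomationHubAdapter(),
    "mcp_proxy": ArcadeAdapter(),
}


def dispatch(source_type: str, tool: str, user_id: str, payload: dict) -> dict:
    """Resolve the adapter by source type; the policy stages before this call are shared."""
    adapter = REGISTRY.get(source_type)
    if adapter is None:
        raise KeyError(f"no adapter registered for source type {source_type!r}")
    return adapter.execute(tool, user_id, payload)
```

Adding Arcade in this model is one new entry in the registry; the `dispatch` function itself does not change.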
|
|
||||||
|
|
||||||
## 3. The AI Gateway is a configuration change, not a build
|
|
||||||
|
|
||||||
Every model/embedding call in the stack already goes through a pinned SDK with a configurable
|
|
||||||
endpoint. Point those endpoints at one AI Gateway and you get unified keys, routing, rate-limit,
|
|
||||||
cost control, and audit across all AI traffic — with zero application code change.
|
|
||||||
|
|
||||||
```mermaid
|
|
||||||
flowchart LR
|
|
||||||
A["Agent inference"] --> AIGW
|
|
||||||
B["Tool Hub — enrichment (Claude)"] --> AIGW
|
|
||||||
C["Tool Hub — query rewrite (Claude Haiku)"] --> AIGW
|
|
||||||
D["Tool Hub — embeddings + rerank (Voyage)"] --> AIGW
|
|
||||||
E["Arcade engine — LLM / embeddings"] --> AIGW
|
|
||||||
AIGW["AI Gateway (LiteLLM-class)<br/>keys · routing · rate-limit · cost · audit"] --> P["Anthropic · Voyage · OpenAI · internal"]
|
|
||||||
classDef new fill:#ffe8cc,stroke:#e8860c,stroke-width:2px,color:#000;
|
|
||||||
class AIGW new;
|
|
||||||
```
|
|
||||||
|
|
||||||
The Arcade POC already routes its engine LLM + embeddings through in-cluster LiteLLM
|
|
||||||
(LIVE-POC), so this consolidates an existing pattern rather than inventing one.
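
A minimal Python sketch of what "repoint the base URLs" could look like. The environment variable `AI_GATEWAY_BASE_URL`, the per-provider path layout, and the defaults table are assumptions for illustration; the actual settings keys in Tool Hub, the Agent Platform, and Arcade are not given in this document.

```python
import os
from typing import Mapping, Optional

# Illustrative direct-to-provider defaults (hypothetical configuration).
DEFAULT_BASE_URLS = {
    "anthropic": "https://api.anthropic.com",
    "voyage": "https://api.voyageai.com",
}


def resolve_base_url(provider: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the AI Gateway URL when one is configured, else the provider default."""
    env = os.environ if env is None else env
    gateway = env.get("AI_GATEWAY_BASE_URL", "").rstrip("/")
    if gateway:
        # A single override repoints every provider at the gateway.
        return f"{gateway}/{provider}"
    return DEFAULT_BASE_URLS[provider]
```

Because callers only ever ask for a base URL, switching to the gateway is a deployment setting rather than a code change.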
|
|
||||||
|
|
||||||
## 4. Change surface — component by component
|
|
||||||
|
|
||||||
| Component | Role in target | Change required | Evidence it's minimal |
|
|
||||||
|---|---|---|---|
|
|
||||||
| **AI Gateway** (LiteLLM-class) | Single egress for all LLM/embedding traffic | **Config only** — repoint base URLs | Tool Hub model providers are DI seams with configurable endpoints (`IEmbeddingProvider`, `IEnrichmentProvider`, `IQueryRewriter`, `IReranker`); Arcade already uses in-cluster LiteLLM |
|
|
||||||
| **MCP Gateway (Arcade)** | MCP transport + **per-user OAuth broker** for SaaS / custom MCP | **Deploy + register** as Tool Hub catalog source | Arcade is a running self-hosted POC (`api.arcade.st.dev`) |
|
|
||||||
| **Tool Hub** | Authority: discovery, policy, idempotency, audit over both backends | **One adapter pair** in the `mcp_proxy` slot + endpoint config | `ICatalogSource` docstring already names `"mcp_proxy"`; adapter selection is `catalog_source.type → registry`, dispatch site unchanged |
|
|
||||||
| **Automation Hub** | One of Tool Hub's catalog sources (internal ST actions) | **None** | AH's catalog + `/actions/{id}/execute` contract already matches Tool Hub 1:1 (same 4 execution modes, JSON-Schema I/O, `namespace:name@semver`) |
|
|
||||||
| **Agent Platform** | Caller | **Config** — inference → AI Gateway; identity → per-user Entra SSO | — |
|
|
||||||
|
|
||||||
## 5. Why this is the right seam (and the one open decision)
|
|
||||||
|
|
||||||
- **It fills a real, documented gap.** Per-user third-party OAuth is explicitly absent from
|
|
||||||
*both* apps: AH lists "OAuth token management / connector marketplace" as a **V1 non-goal** and
|
|
||||||
its own platform research names per-user OAuth brokering as what an external platform must add;
|
|
||||||
Tool Hub's downstream auth is Entra/IUM-only. Arcade is precisely that missing layer.
|
|
||||||
- **It uses the designed extension point.** Tool Hub's `mcp_proxy` SourceType and data-driven
|
|
||||||
adapter registry exist *for this*. No core path changes.
|
|
||||||
- **It preserves the authority model (cat-3 criterion 5).** Tool Hub remains the single Engine
|
|
||||||
for permission re-check, idempotency, rate-limit, and audit over *both* AH and Arcade calls —
|
|
||||||
so the policy/enforcement story is unchanged and now covers third-party tools too.
|
|
||||||
- **One decision to confirm with Platform (chump/tahmad):** Tool Hub's ADR-009 currently intends
|
|
||||||
partner/MCP capabilities to arrive *through AH as actions*. Routing Arcade **direct into Tool
|
|
||||||
Hub** as a peer catalog source is a conscious deviation (ADR-009 even lists "BYO MCP outside
|
|
||||||
AH's onboarding flow" as a trigger to reconsider). The recommendation here is the direct path,
|
|
||||||
because AH has no plugin model and explicitly defers third-party connectivity — so going
|
|
||||||
through AH would push *more* net-new work into AH, violating the "no major work" constraint.
|
|
||||||
|
|
||||||
## Evidence index
|
|
||||||
|
|
||||||
- **Tool Hub:** `src/ToolHub.Contracts/Catalog/ICatalogSource.cs` (`mcp_proxy` named);
|
|
||||||
`src/ToolHub.Contracts/Execution/IExecutionAdapter.cs` (`RequiresObo`, `GetOboAuthority`);
|
|
||||||
`src/ToolHub.Execution/Dispatch/ExecutionAdapterRegistry.cs` (data-driven dispatch);
|
|
||||||
`Stage3_OboAcquisitionStage.cs` (Entra/IUM-only OBO); ADR-009, ADR-007.
|
|
||||||
Full seam map: `architecture/toolhub-arcade-integration.md` (outer repo).
|
|
||||||
- **Automation Hub:** `src/server/Host.Api/Controllers/ActionExecutionController.cs`
|
|
||||||
(`POST /actions/{id}/execute`); `Host.CatalogApi/Controllers/CatalogActionsController.cs`
|
|
||||||
(catalog sync contract); `Domain/Catalog/Actions/DownstreamApiAuthType.cs`
|
|
||||||
(`{ApiAccessToken, TokenServer, None}` — no per-user OAuth);
|
|
||||||
`crap/blueprint/system/context/v1-roadmap.md` (external integration = non-goal);
|
|
||||||
`docs/research/platform-selection/paragon.md` (per-user OAuth named as the external gap).
|
|
||||||
- **Arcade POC:** arcade-eval `LIVE-POC.md` (self-hosted, Entra IdP, in-cluster LiteLLM);
|
|
||||||
`criteria-section-3.md` (enforcement-at-Engine + bypass findings).
|
|
||||||
|
|
||||||
@@ -1,123 +0,0 @@
|
|||||||
# How the stack works — Automation Hub, Tool Hub, and the two gateways (plain language)
|
|
||||||
|
|
||||||
> A plain-terms companion to the technical seam map in
|
|
||||||
> `categories/cat3-access-policy/integration-architecture.md`. Same architecture, no jargon.
|
|
||||||
> Grounded in `servicetitan/automation-hub` @ master and `servicetitan/tool-hub` @ master
|
|
||||||
> (source-verified 2026-06-22).
|
|
||||||
|
|
||||||
## The one-paragraph version
|
|
||||||
|
|
||||||
**Automation Hub** is the warehouse of 5,000+ things an agent can *do* inside ServiceTitan.
|
|
||||||
**Tool Hub** is the smart front desk that makes that giant catalog usable for an AI and acts as
|
|
||||||
the single bouncer (per-user permissions + audit). The **MCP Gateway (Arcade)** plugs in beside
|
|
||||||
Automation Hub to add *outside* tools (GitHub, Slack, Google) **with per-user login** — the one
|
|
||||||
thing neither of the others can do. The **AI Gateway** is one toll booth that every model/AI call
|
|
||||||
passes through (keys, cost, rate limits), added by **configuration, not a rebuild**.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## 1. Automation Hub — the warehouse of actions
|
|
||||||
|
|
||||||
Where ServiceTitan keeps everything an agent can actually *do*: "create a job," "look up a
|
|
||||||
customer," "send an invoice" — 5,000+ actions today.
|
|
||||||
|
|
||||||
- It holds the **catalog** (every action + what inputs it needs) and does the **execution**
|
|
||||||
(actually calls ServiceTitan's internal APIs).
|
|
||||||
- Its login is **ServiceTitan-identity only.** It can act as a ServiceTitan user/bot, but it has
|
|
||||||
**no way to log into GitHub / Slack / Google on your behalf** — and that's deliberate (AH's
|
|
||||||
roadmap lists third-party OAuth as a non-goal).
|
|
||||||
|
|
||||||
> AH = the internal action warehouse. Great at ServiceTitan, blind to outside SaaS.
|
|
||||||
|
|
||||||
## 2. Tool Hub — the smart front desk
|
|
||||||
|
|
||||||
Handing an AI the raw list of 5,000 tools (heading to 200,000) overflows its context window, so it
|
|
||||||
picks the wrong tool. Tool Hub is the front desk between the agent and the warehouse. It does
|
|
||||||
three things:
|
|
||||||
|
|
||||||
1. **Aggregates** — every source (AH today, others later) becomes one clean, unified list. The
|
|
||||||
agent sees **one front desk**, not many warehouses.
|
|
||||||
2. **Discovers progressively** — the agent never reads the whole catalog. It asks:
|
|
||||||
- *"What tools do something like X?"* → `search_tools` returns a **short shortlist**
|
|
||||||
(names + one-line summaries only).
|
|
||||||
- *"How exactly do I use this one?"* → `get_tool_details` returns full instructions for just
|
|
||||||
the **1–3** it actually wants.
|
|
||||||
- *"Run it."* → `execute_tool`.
|
|
||||||
- (Plus `resume_execution`, `list_namespaces`, `cancel_execution`.)
|
|
||||||
It finds tools by **meaning, not keywords** — semantic search over a vector database
|
|
||||||
(pgvector + HNSW), embedded by **Voyage**, descriptions enriched by **Claude**, then reranked.
|
|
||||||
3. **Permission-filters** — before the shortlist ever reaches the agent, it **removes any tool
|
|
||||||
you're not allowed to use.** You can't see, let alone call, what you don't have access to.
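
The search-then-filter behaviour above can be sketched in a few lines of Python. This is a toy under stated assumptions: the catalog entries and permission names are hypothetical, and the word-overlap `score` is a stand-in for the real semantic search and rerank, which this sketch does not attempt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    summary: str
    required_permission: str


# Hypothetical catalog entries for illustration only.
CATALOG = [
    Tool("crm.create_job", "Create a job for a customer", "jobs.write"),
    Tool("crm.lookup_customer", "Look up a customer record", "customers.read"),
    Tool("billing.send_invoice", "Send an invoice to a customer", "billing.write"),
]


def score(query: str, tool: Tool) -> int:
    """Toy relevance: count shared words (stand-in for vector search plus rerank)."""
    return len(set(query.lower().split()) & set(tool.summary.lower().split()))


def search_tools(query: str, permissions: set[str], limit: int = 3) -> list[dict]:
    """Return names and summaries only, excluding tools the caller may not use."""
    allowed = [t for t in CATALOG if t.required_permission in permissions]
    ranked = sorted(allowed, key=lambda t: score(query, t), reverse=True)
    return [
        {"name": t.name, "summary": t.summary}
        for t in ranked[:limit]
        if score(query, t) > 0
    ]
```

The sketch filters by permission before ranking, so a tool the caller cannot use never appears in the shortlist, whatever its relevance.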
|
|
||||||
|
|
||||||
> Tool Hub = the brain *and* the bouncer. It runs as its **own central service** (two
|
|
||||||
> autoscaled Kubernetes deployments + an admin UI), **not** a sidecar — and it's the single
|
|
||||||
> place policy, permissions, and audit live.
|
|
||||||
|
|
||||||
**The flow so far:**
|
|
||||||
|
|
||||||
```
|
|
||||||
Agent → Tool Hub (front desk: search · filter · decide) → Automation Hub (execute) → ServiceTitan APIs
|
|
||||||
```
|
|
||||||
|
|
||||||
## 3. Where the two gateways fit
|
|
||||||
|
|
||||||
Two real gaps remain. Each gateway plugs one.
|
|
||||||
|
|
||||||
### MCP Gateway (Arcade) — the gap = *outside tools*
|
|
||||||
|
|
||||||
Tool Hub + AH are great for internal ServiceTitan actions, but neither can **log into
|
|
||||||
GitHub/Slack/Google as you**. That's Arcade's one job: a second warehouse for **outside SaaS
|
|
||||||
tools, with per-user login built in.** Tool Hub already has an empty "plug in another source"
|
|
||||||
slot (the `mcp_proxy` adapter), so Arcade plugs in **right beside** Automation Hub:
|
|
||||||
|
|
||||||
```mermaid
|
|
||||||
flowchart LR
|
|
||||||
Agent["LLM Agent"]
|
|
||||||
TH["Tool Hub<br/>(brain + bouncer:<br/>search · per-user filter · audit)"]
|
|
||||||
AH["Automation Hub<br/>(internal actions)"]
|
|
||||||
AR["MCP Gateway — Arcade<br/>(outside tools + per-user login)"]
|
|
||||||
ST["ServiceTitan APIs"]
|
|
||||||
SaaS["GitHub · Slack · Google"]
|
|
||||||
Agent --> TH
|
|
||||||
TH --> AH --> ST
|
|
||||||
TH --> AR --> SaaS
|
|
||||||
classDef new fill:#ffe8cc,stroke:#e8860c,stroke-width:2px,color:#000;
|
|
||||||
class AR new;
|
|
||||||
```
|
|
||||||
|
|
||||||
Tool Hub stays the single front desk and bouncer for **both** paths. The only difference: for an
|
|
||||||
outside tool it hands off to Arcade, and **Arcade handles the messy per-user OAuth login** (that's
|
|
||||||
the "authorize GitHub" pop-up). Tool Hub never stores your GitHub token — Arcade does.
|
|
||||||
|
|
||||||
### AI Gateway — the gap = *the model calls themselves*
|
|
||||||
|
|
||||||
Everything above quietly uses AI models: semantic search uses **Voyage** embeddings, catalog
|
|
||||||
descriptions are written by **Claude**, the agent itself calls a model to think. The **AI
|
|
||||||
Gateway** is **one toll booth** all of that passes through — so keys, cost tracking, rate limits,
|
|
||||||
and routing live in one place.
|
|
||||||
|
|
||||||
The key point: this is **configuration, not a rebuild.** Every component already calls models
|
|
||||||
through a swappable address; you just **repoint those addresses at the gateway.**
|
|
||||||
|
|
||||||
```mermaid
|
|
||||||
flowchart LR
|
|
||||||
A["Agent thinking"] --> GW
|
|
||||||
B["Tool Hub — search (Voyage)"] --> GW
|
|
||||||
C["Tool Hub — descriptions (Claude)"] --> GW
|
|
||||||
GW["AI Gateway<br/>(one toll booth: keys · cost · limits)"] --> P["Anthropic · Voyage · OpenAI"]
|
|
||||||
classDef new fill:#ffe8cc,stroke:#e8860c,stroke-width:2px,color:#000;
|
|
||||||
class GW new;
|
|
||||||
```
|
|
||||||
|
|
||||||
## 4. The whole picture in one breath
|
|
||||||
|
|
||||||
| Piece | What it is (simple) | The gap it fills |
|
|
||||||
|---|---|---|
|
|
||||||
| **Automation Hub** | Warehouse of 5,000+ internal ServiceTitan actions; executes them (ST-login only) | — (the base) |
|
|
||||||
| **Tool Hub** | Smart central front desk: makes the catalog usable for an AI (search → details → run) + the one bouncer (per-user filter + audit) | Scale + governance |
|
|
||||||
| **MCP Gateway (Arcade)** | Plugs in beside AH to add outside tools (GitHub/Slack/Google) **with per-user login** | The thing neither AH nor Tool Hub can do |
|
|
||||||
| **AI Gateway** | One toll booth for **all** model/AI calls | One place for keys/cost/limits — added by config |
|
|
||||||
|
|
||||||
**The design win:** adding both gateways is mostly **plugging into seams that already exist** —
|
|
||||||
Tool Hub stays the single authority, Automation Hub is untouched, and the only genuinely new
|
|
||||||
capability (logging into third-party apps as you) lives inside Arcade.
|
|
||||||
@@ -1,73 +1,95 @@
|
|||||||
# Deploy arcade-eval reference MCP server to backstage k8s
|
# Deploy arcade-eval reference MCP server to backstage k8s
|
||||||
|
|
||||||
**Date:** 2026-06-22
|
**Date:** 2026-06-22
|
||||||
**Status:** Approved — implementing
|
**Status:** DONE — deployed and verified end-to-end.
|
||||||
|
|
||||||
## Goal
|
## Goal
|
||||||
|
|
||||||
Replace the ephemeral cloudflared **quick tunnel** (used to register the
|
Replace the ephemeral cloudflared **quick tunnel** (used to register the
|
||||||
`arcade-eval-ref` server with the self-hosted Arcade engine) with a permanent
|
`arcade-eval-ref` server with the self-hosted Arcade engine) with a permanent
|
||||||
in-cluster deployment on `backstage-wus2-v4`. The engine then reaches the server
|
deployment on `backstage-wus2-v4`, so the engine reaches the server over a stable
|
||||||
over stable cluster DNS instead of a `trycloudflare.com` URL that dies on restart.
|
URL instead of a `trycloudflare.com` URL that dies on restart.
|
||||||
|
|
||||||
Relevant eval categories: cat-4 (custom server dev), cat-8 (deployment), cat-9 (DX).
|
Relevant eval categories: cat-4 (custom server dev), cat-8 (deployment), cat-9 (DX).
|
||||||
|
|
||||||
## Architecture / data flow
|
## Key finding that shaped the final design
|
||||||
|
|
||||||
|
The first attempt registered the in-cluster **Service DNS**
|
||||||
|
(`http://arcade-eval-ref.arcade-eval-ref.svc.cluster.local:8000`) as a dashboard
|
||||||
|
worker. Health went green but **0 tools loaded**. Engine logs showed:
|
||||||
|
|
||||||
```
|
```
|
||||||
Arcade engine (ns: arcade) ──HTTP /worker/*──▶ Service arcade-eval-ref (ns: arcade-eval-ref)
|
Failed to get worker tools: Get ".../worker/tools":
|
||||||
registered as type "Arcade" └─▶ Deployment: python:3.12 running
|
dial tcp 10.0.192.27:8000: publicOnlyTransport: blocked connection to internal address
|
||||||
URI = http://arcade-eval-ref.arcade-eval-ref mcp_server.server over HTTP :8000
|
```
|
||||||
.svc.cluster.local:8000 (echo / add / whoami)
|
|
||||||
Secret = ARCADE_WORKER_SECRET ◀── same value ──▶ env ARCADE_WORKER_SECRET (SealedSecret)
|
**The Arcade engine has an SSRF guard (`publicOnlyTransport`) that blocks
|
||||||
|
dashboard-registered worker URIs resolving to internal/private (RFC1918) addresses.**
|
||||||
|
Only workers declared in the **engine config file** (e.g. the bundled `arcade-worker-main`
|
||||||
|
at `http://arcade-worker-main:8001`) may use internal URIs. Health checks aren't guarded
|
||||||
|
(hence green), but the authenticated `/worker/tools` discovery is. The cloudflared tunnel
|
||||||
|
worked only because it was a *public* URL.
|
||||||
|
|
||||||
|
⇒ A dashboard-registered in-cluster worker **must be exposed on a public URL**. (The
|
||||||
|
worker secret was a red herring: the engine blocks the connection before any auth happens.)
|
||||||
|
|
||||||
|
## Architecture / data flow (final)
|
||||||
|
|
||||||
|
```
|
||||||
|
Claude Code ──▶ gateway zeb-gateway-test ──▶ Arcade engine ──HTTPS /worker/*──▶
|
||||||
|
https://arcade-eval-ref.st.dev (Cloudflare CNAME → k8s-backstage.st.dev → nginx ingress)
|
||||||
|
└─▶ Service → Deployment: python:3.12 running mcp_server.server over HTTP :8000
|
||||||
|
(echo / add / whoami). /mcp also served; /worker/* auth = ARCADE_WORKER_SECRET.
|
||||||
```
|
```
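The public URL in the final flow exists only because of the `publicOnlyTransport` finding. As a rough illustration of that kind of guard (not the engine's actual code; the helper names `is_public_address` and `check_worker_target` are hypothetical), a check that rejects private, loopback, and link-local targets could look like this:

```python
import ipaddress


def is_public_address(ip: str) -> bool:
    """Return True only for addresses that are not internal (hypothetical helper)."""
    addr = ipaddress.ip_address(ip)
    # Reject RFC 1918 ranges, loopback, link-local and other special-purpose blocks.
    return not (
        addr.is_private
        or addr.is_loopback
        or addr.is_link_local
        or addr.is_multicast
        or addr.is_reserved
        or addr.is_unspecified
    )


def check_worker_target(ip: str) -> str:
    """Mimic the shape of the engine log line for an internal target (assumed wording)."""
    if not is_public_address(ip):
        return f"blocked connection to internal address {ip}"
    return f"allowed {ip}"


if __name__ == "__main__":
    # 10.0.192.27 is the Service ClusterIP from the engine log above.
    print(check_worker_target("10.0.192.27"))
    print(check_worker_target("1.1.1.1"))
```

A ClusterIP such as `10.0.192.27` falls in `10.0.0.0/8`, so any guard of this shape rejects it regardless of whether the worker secret is correct, which is consistent with health going green while tool discovery failed.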
|
||||||
|
|
||||||
### Runtime facts (verified by introspecting `arcade-mcp-server` 1.17)
|
### Runtime facts (verified by introspecting `arcade-mcp-server` 1.17)
|
||||||
|
|
||||||
- `app.run()` honors env overrides via `_get_configuration_overrides()`:
|
- `app.run()` honors env overrides via `_get_configuration_overrides()`:
|
||||||
`ARCADE_SERVER_TRANSPORT=http`, `ARCADE_SERVER_HOST=0.0.0.0`, `ARCADE_SERVER_PORT=8000`.
|
`ARCADE_SERVER_TRANSPORT=http`, `ARCADE_SERVER_HOST=0.0.0.0`, `ARCADE_SERVER_PORT=8000`
|
||||||
So the hardcoded `127.0.0.1` in `server.py`'s `__main__` is overridden at runtime —
|
— so the hardcoded `127.0.0.1` in `server.py` is overridden at runtime (no code change).
|
||||||
**no `server.py` change needed.**
|
- `ARCADE_WORKER_SECRET` enables worker routes at `/worker/*`; the engine authenticates with
|
||||||
- `ARCADE_WORKER_SECRET` (settings alias `arcade.server_secret`) → worker routes mount at
|
an HS256 JWT (`aud=worker`, `ver=1`) signed with that secret. MCP is served at `/mcp`.
|
||||||
`/worker/*` (what the engine calls); MCP also served at `/mcp`. FastAPI app, port 8000.
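For manual checks against `/worker/*`, a token of the shape described above can be produced with the standard library alone. This is a minimal sketch, not Arcade's implementation: the function names are hypothetical, the `ver` claim is assumed to be a string, and a real engine token may carry further claims (for example an expiry) that are not shown here.

```python
import base64
import hashlib
import hmac
import json


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Decode base64url text, restoring the stripped padding first."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_worker_jwt(secret: str, claims: dict) -> str:
    """Build an HS256 JWT over the given claims (hypothetical helper)."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(
        secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256
    ).digest()
    return f"{signing_input}.{_b64url(signature)}"


def verify_worker_jwt(secret: str, token: str) -> dict:
    """Check the signature and audience, then return the claims."""
    if token.count(".") != 2:
        raise ValueError("malformed token")
    signing_input, _, signature_part = token.rpartition(".")
    expected = hmac.new(
        secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256
    ).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(expected, _b64url_decode(signature_part)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(signing_input.split(".")[1]))
    if claims.get("aud") != "worker":
        raise ValueError("unexpected audience")
    return claims


if __name__ == "__main__":
    # Placeholder secret; the real value is the ARCADE_WORKER_SECRET plaintext.
    token = sign_worker_jwt("example-secret", {"aud": "worker", "ver": "1"})
    print(verify_worker_jwt("example-secret", token))
```

How the engine attaches the token to its request (header name and scheme) is not stated in this document, so the sketch stops at producing and checking the token.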
|
|
||||||
|
|
||||||
## Components
|
## Components (three repos)
|
||||||
|
|
||||||
### 1. `arcade-eval` repo (branch off `main`)
|
### 1. `arcade-eval` — image
|
||||||
|
- `lib/mcp_server/Dockerfile` — `python:3.12-slim`, `pip install .`, HTTP transport via env,
|
||||||
|
non-root, port 8000.
|
||||||
|
- `.github/workflows/build-push-acr.yml` — pushes
|
||||||
|
`servicetitandev.azurecr.io/arcade-eval-ref:1.0.<run_number>` (secrets
|
||||||
|
`ACR_DEV_USERNAME`/`ACR_DEV_PASSWORD`). Adapted from `servicetitan/mem0`.
|
||||||
|
|
||||||
- **`lib/mcp_server/Dockerfile`** — `python:3.12-slim`, `pip install .` (pulls
|
### 2. `k8s-backstage-v2` — `apps/mcp/arcade-eval-ref/`
|
||||||
`arcade-mcp-server` + `httpx`), `ENV` transport/host/port, non-root user, `EXPOSE 8000`,
|
- `namespace.yaml` — ns `arcade-eval-ref`.
|
||||||
`CMD ["python","-m","mcp_server.server"]`.
|
- `server.yaml` — **st-app HelmRelease** (chart 2.0.72): `image` pinned to `1.0.1`,
|
||||||
- **`.github/workflows/build-push-acr.yml`** — adapted from `servicetitan/mem0`. Pushes
|
`service.internalPort: 8000`, **`ingress.enabled` host `arcade-eval-ref.st.dev`
|
||||||
`servicetitandev.azurecr.io/arcade-eval-ref:1.0.<run_number>`. Login via repo secrets
|
class `nginx`, `oAuth.enabled: false`** (no SSO wall over `/worker/*` or `/mcp`),
|
||||||
`ACR_DEV_USERNAME` / `ACR_DEV_PASSWORD`. Triggers: `workflow_dispatch` + push to `main`
|
worker secret via `envFrom` from the SealedSecret, probes off. TLS = ingress default
|
||||||
filtered to `lib/mcp_server/**`.
|
`*.st.dev` wildcard cert.
|
||||||
|
- `sealedsecret.yaml` — `arcade-eval-ref-worker-secret` (key `ARCADE_WORKER_SECRET`),
|
||||||
|
strict scope, sealed with the backstage-wus2-v4 sealed-secrets cert.
|
||||||
|
|
||||||
### 2. `k8s-backstage-v2` repo (branch off `master`)
|
### 3. `iac-terraform-workspaces` — DNS
|
||||||
|
- CNAME `arcade-eval-ref.st.dev` → `k8s-backstage.st.dev` (st.dev zone), mirroring the
|
||||||
|
`anvil`/`alerts` pattern.
|
||||||
|
|
||||||
New dir **`apps/mcp/arcade-eval-ref/`** (Flux's `apps` Kustomization recursively applies
|
## Registration (dashboard)
|
||||||
everything under `apps/`; no per-dir `kustomization.yaml`):
|
|
||||||
|
|
||||||
- **`namespace.yaml`** — ns `arcade-eval-ref` (labels per repo convention, `team: infra`).
|
Add/repoint the worker: URI `https://arcade-eval-ref.st.dev`, Secret = the worker-secret
|
||||||
- **`server.yaml`** — plain `Deployment` (image
|
plaintext (git-ignored at `results/arcade-eval-ref-worker-secret.txt`). The engine then
|
||||||
`servicetitandev.azurecr.io/arcade-eval-ref:1.0.1`; no imagePullSecret — the cluster has
|
fetches `/worker/tools` over the public URL → tools load → add to `zeb-gateway-test`.
|
||||||
native ACR pull, confirmed by other `apps/mcp/*` servers; `ARCADE_WORKER_SECRET` from
|
|
||||||
secretRef; TCP probes; modest resources) + `Service` (ClusterIP, 8000→8000).
|
|
||||||
- **`sealedsecret.yaml`** — `arcade-eval-ref-worker-secret`, key `ARCADE_WORKER_SECRET`,
|
|
||||||
**strict** scope, sealed offline with `kubeseal --cert <backstage-wus2-v4 public cert>`.
|
|
||||||
|
|
||||||
## Manual steps after merge
|
## Verified
|
||||||
|
|
||||||
1. Add `ACR_DEV_USERNAME` / `ACR_DEV_PASSWORD` repo secrets to `arcade-eval`.
|
- `https://arcade-eval-ref.st.dev/worker/health` → 200 (valid `*.st.dev` LE cert);
|
||||||
2. `workflow_dispatch` (or merge to `main`) to build/push the image — first run = tag `1.0.1`.
|
`/worker/tools` with a correct worker JWT → 200, tools `Echo/Add/Whoami`.
|
||||||
3. Merge the k8s branch; Flux applies the namespace/secret/deployment.
|
- Through the gateway: `ArcadeEvalRef_Whoami()` → the caller's Entra `sub`
|
||||||
4. Dashboard → **Add Server → Arcade**, URI
|
(`GvgRofe5…`), proving per-user execution across the full
|
||||||
`http://arcade-eval-ref.arcade-eval-ref.svc.cluster.local:8000`, Secret = the worker secret
|
client → gateway → engine → public URL → in-cluster pod chain.
|
||||||
plaintext (stored git-ignored at `results/arcade-eval-ref-worker-secret.txt`); re-point the
|
|
||||||
`zeb-gateway-test` gateway's ref tools at it and drop the tunnel. Delete the plaintext file
|
|
||||||
afterward.
|
|
||||||
|
|
||||||
## Out of scope (YAGNI)
|
## Alternative considered (not taken)
|
||||||
|
|
||||||
No ingress (internal-only ClusterIP), no HPA, no PodMonitor/metrics (separate cat-5 work),
|
Declare the server as a static worker in the **engine config** (`tools.directors[].workers`,
|
||||||
single replica.
|
like `arcade-worker-main`) — that path allows internal URIs and avoids public exposure, but
|
||||||
|
edits the vendor Helm release (`apps/arcade`) and loses the dashboard per-project workflow.
|
||||||
|
Public ingress was chosen as the lower-touch option.
|
||||||
|
|||||||