docs: _TEMPLATE + all-10 criteria-section stubs (verbatim criteria)
@@ -0,0 +1,10 @@
# Lane notes — Category N

Working scratchpad for this lane. Keep terse; the scored deliverable is `criteria-section-N.md`.

- **Owner:**
- **Last live-state check:**
- **Fixtures used:** (gateway slug, server, user_ids — see `../../config/targets.yaml`)

## Log
- (date) — what was done / found
@@ -0,0 +1,29 @@
# Category N — <Name> (weight W)

> Verbatim criteria / gates / questions from the criteria Google Doc. Fill Score / Evidence /
> Findings / Answers locally; **the human pastes** into the Google Doc. 1–5 scale; anchors at 1/3/5.

## Scores
| # | Criterion (verbatim) | Score (1–5) | Evidence / note |
|---|---|---|---|
| 1 | <verbatim criterion> | | |

**Average:** ___ **Category score:** ___

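The **Average** field is the mean of the criterion scores filled in so far. The template does not state how **Category score** is derived from that average, so the roll-up in this sketch (scaling the 1–5 average to the category weight) is an assumption for illustration only; `category_average` and `weighted_points` are hypothetical helper names.

```python
from statistics import mean
from typing import Optional


def category_average(scores: list[Optional[int]]) -> Optional[float]:
    """Average the filled-in criterion scores; blank rows (None) are skipped."""
    filled = [s for s in scores if s is not None]
    for s in filled:
        if not 1 <= s <= 5:
            raise ValueError(f"score {s} is outside the 1-5 scale")
    return round(mean(filled), 2) if filled else None


def weighted_points(average: float, weight: int) -> float:
    """Assumed roll-up (not from the criteria doc): scale the average to the weight."""
    return round(average / 5 * weight, 2)


# Example: three of four criteria scored so far, category weight 8.
avg = category_average([4, 3, None, 5])
print(avg, weighted_points(avg, 8))
```

If the criteria Google Doc defines a different roll-up, replace `weighted_points` rather than adapting the doc to this sketch.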
## Score anchors
- **1** — <anchor>
- **3** — <anchor>
- **5** — <anchor>

## Benchmark questions / tests
| # | Question / test (verbatim) | Answer / result | Evidence |
|---|---|---|---|
| 1 | <verbatim> | | |

## Suggested pass/fail gates
| Gate | Pass condition (verbatim) | Result | Evidence |
|---|---|---|---|
| <gate> | <verbatim> | | |

## Findings
-
@@ -0,0 +1,43 @@
# Category 1 — Functional MCP Gateway Capability (weight 8)

> Verbatim criteria / gates / questions from the criteria Google Doc. Fill Score / Evidence /
> Findings / Answers locally; **the human pastes** into the Google Doc. 1–5 scale; anchors at 1/3/5.

## Scores
| # | Criterion (verbatim) | Score (1–5) | Evidence / note |
|---|---|---|---|
| 1 | Implements MCP protocol correctly — tool listing, tool invocation, error responses. | | |
| 2 | Gateway tool curation — ability to expose a subset of tools from underlying servers to a given doorway. | | |
| 3 | Per-user tool scoping — different users see different tool lists based on their explicit grants. | | |
| 4 | Supports all required MCP clients without custom adapters (Claude Code, Cursor, LangGraph, internal agent frameworks). | | |
| 5 | Tool execution isolation — one user's tool call cannot access another user's tokens or context. | | |
| 6 | Supports mixing prebuilt (global catalog) and custom (self-hosted) servers behind a single gateway URL. | | |
| 7 | Gateway is pure metadata — adding or removing tools does not require server redeployment. | | |
| 8 | Dynamic tool registration — new tools become available without gateway restart. | | |

**Average:** ___ **Category score:** ___

## Score anchors
- **1** — Basic MCP server, no per-user scoping or curation
- **3** — Gateway curation works; per-user scoping requires workarounds
- **5** — Full per-user tool scoping, mixed-server gateways, zero-config for MCP clients

## Benchmark questions
| # | Question (verbatim) | Answer | Evidence |
|---|---|---|---|
| 1 | Can a Claude Code client connect to the gateway and see only the tools granted to the current user? | | |
| 2 | Can the same gateway URL serve two different users with different tool lists? | | |
| 3 | Can we add a tool to the gateway without restarting any server or the Engine? | | |
| 4 | Can we expose tools from both a prebuilt connector and a custom self-hosted server through one gateway endpoint? | | |
| 5 | What happens when a client requests a tool the user has not been granted? | | |

## Suggested pass/fail gates
| Gate | Pass condition (verbatim) | Result | Evidence |
|---|---|---|---|
| MCP protocol compliance | Any compliant MCP client connects without custom adapters | | |
| Tool curation | Gateway tool list matches exactly the configured allow-list | | |
| Per-user isolation | User A cannot see or invoke tools granted only to User B | | |
| Mixed server gateway | Prebuilt and custom server tools coexist behind one gateway URL | | |

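The "Tool curation" gate asks for an exact match between what a client sees and the configured allow-list. One way to record evidence for it is a plain set comparison on tool names, sketched here. How the two lists are collected (for example, from the client's tool listing and from the gateway configuration) is left out, and the tool names are hypothetical.

```python
def diff_tool_lists(listed: set[str], allowed: set[str]) -> dict[str, set[str]]:
    """Compare the tool names a client saw against the configured allow-list."""
    return {
        "unexpected": listed - allowed,  # exposed but not configured
        "missing": allowed - listed,     # configured but not exposed
    }


def gate_passes(listed: set[str], allowed: set[str]) -> bool:
    """The gate asks for an exact match, so both difference sets must be empty."""
    diff = diff_tool_lists(listed, allowed)
    return not diff["unexpected"] and not diff["missing"]


# Hypothetical tool names, used only to exercise the check.
allow_list_user_a = {"Github.CreateIssue", "Github.ListPullRequests"}
seen_by_user_a = {"Github.CreateIssue", "Github.ListPullRequests", "Jira.CreateTicket"}

print(gate_passes(seen_by_user_a, allow_list_user_a))
print(diff_tool_lists(seen_by_user_a, allow_list_user_a))
```

The same comparison covers the "Per-user isolation" gate for visibility: any name granted only to User B that shows up in User A's `unexpected` set is a failure. It does not cover invocation, which needs an actual call attempt.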
## Findings
-
@@ -0,0 +1,49 @@
# Category 10 — Product Fit — Tools Catalog and Multi-Tenancy (weight 5)

> *Scored only if the engineering team proceeds to evaluate Arcade as the MCP gateway layer for
> ServiceTitan's customer-facing tools catalog.* Verbatim criteria/gates from the criteria Google
> Doc. Fill Score/Evidence locally; **the human pastes**. 1–5 scale; anchors at 1/3/5.

**The multi-tenancy problem (verbatim):** ServiceTitan is a multi-tenant SaaS serving tens of
thousands of business tenants. Creating one Arcade project per tenant is not a viable architecture.
The requirement is a single shared Arcade deployment where tenant isolation is enforced within it:
Tenant A's users cannot access Tenant B's tokens, tool grants, or data. Arcade's native isolation
boundary is the **project**; within a project, isolation is at the `user_id` level.

## Scores
| # | Criterion (verbatim) | Score (1–5) | Evidence / note |
|---|---|---|---|
| 1 | Native multi-tenant isolation within a single project — Tenant A's tokens, tool grants, and policy are fully isolated from Tenant B's without separate projects. | | |
| 2 | Per-tenant tool access policies — different tenants can have different tool allowlists and Contextual Access rules. | | |
| 3 | Per-tenant quota and rate limits — one tenant's usage cannot degrade another's. | | |
| 4 | Cross-tenant token isolation — provably no path for Tenant A's token to be served on a Tenant B tool call. | | |
| 5 | New tenants can be provisioned programmatically via API — no manual steps, no UI clicks. | | |
| 6 | Gateway configuration is API-driven to support programmatic tenant onboarding at scale. | | |
| 7 | Custom servers built for internal use can be reused for the product use case without re-architecting. | | |

**Average:** ___ **Category score:** ___

## Score anchors
- **1** — No multi-tenant model; one project per tenant is the only isolation path — does not scale
- **3** — user_id-level token isolation works within a project; tenant-level policy and quota require significant custom work
- **5** — Native multi-tenant model within a single deployment — per-tenant isolation, policy, quota, and API-driven onboarding all supported

## Benchmark questions
| # | Question (verbatim) | Answer | Evidence |
|---|---|---|---|
| 1 | Does Arcade have a native multi-tenancy model within a single project, or does tenant isolation require one project per tenant? | | |
| 2 | If `tenant_id:user_id` is used as the user_id, does Arcade enforce any tenant-level policy or quota boundaries, or is it purely token isolation? | | |
| 3 | Can per-tenant tool access policies (different tool lists per tenant) be managed via API? | | |
| 4 | Can a new tenant be onboarded — token vault initialized, tool grants set, gateway access configured — entirely via API with no manual steps? | | |
| 5 | What is the recommended architecture for serving tens of thousands of tenants from a single Arcade deployment? | | |

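Benchmark question 2 assumes a composite `tenant_id:user_id` string is passed as the `user_id`. A minimal sketch of building and parsing such IDs on the ServiceTitan side follows. The helper names are hypothetical, and nothing here reflects how Arcade itself interprets the value; whether it treats it as anything more than an opaque string is exactly what the question asks.

```python
SEPARATOR = ":"


def compose_user_id(tenant_id: str, user_id: str) -> str:
    """Build the composite ID; reject inputs that would make parsing ambiguous."""
    if not tenant_id or not user_id:
        raise ValueError("tenant_id and user_id must both be non-empty")
    if SEPARATOR in tenant_id:
        raise ValueError("tenant_id must not contain the separator")
    return f"{tenant_id}{SEPARATOR}{user_id}"


def split_user_id(composite: str) -> tuple[str, str]:
    """Split at the first separator, so the user part may itself contain ':'."""
    tenant_id, sep, user_id = composite.partition(SEPARATOR)
    if not sep or not tenant_id or not user_id:
        raise ValueError(f"not a composite id: {composite!r}")
    return tenant_id, user_id


def same_tenant(a: str, b: str) -> bool:
    """True when two composite IDs belong to the same tenant."""
    return split_user_id(a)[0] == split_user_id(b)[0]


# Hypothetical fixture values.
alice = compose_user_id("tenant-a", "alice@example.com")
bob = compose_user_id("tenant-b", "bob@example.com")
print(alice, same_tenant(alice, bob))
```

Rejecting the separator inside `tenant_id` matters for the isolation gates: without it, two different (tenant, user) pairs could serialize to the same string and share a vault entry.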
## Suggested pass/fail gates
| Gate | Pass condition (verbatim) | Result | Evidence |
|---|---|---|---|
| Multi-tenant isolation | Tenant A's tokens and tool grants are provably inaccessible to Tenant B within a single deployment | | |
| No per-tenant project | Tenant isolation does not require one Arcade project per tenant | | |
| API-driven onboarding | A new tenant can be fully provisioned via API with no manual steps | | |
| Per-tenant policy | Different tenants can have different tool allowlists managed programmatically | | |

## Findings
-
@@ -0,0 +1,49 @@
# Category 2 — Delegated Authorization and Identity (weight 20)

> The load-bearing category: every tool call executes as the calling user, using that user's own
> credentials, and the agent code never sees the token. Verbatim criteria/gates from the criteria
> Google Doc. Fill Score/Evidence locally; **the human pastes**. 1–5 scale; anchors at 1/3/5.

## Scores
| # | Criterion (verbatim) | Score (1–5) | Evidence / note |
|---|---|---|---|
| 1 | Per-user OAuth token vault — tokens are stored and refreshed per user, per service, per scope. | | |
| 2 | Tool calls execute as the calling user — not a shared service account or bot credential. | | |
| 3 | Okta (OIDC/SAML) integration as the primary IDP for gateway access. | | |
| 4 | Custom OAuth provider support — ability to register non-standard OAuth providers (Snowflake, Workday, TenantTalk via Okta). | | |
| 5 | Token refresh is handled automatically without requiring user re-authentication on every call. | | |
| 6 | The LLM and agent code never see raw tokens — token injection happens server-side in the Engine. | | |
| 7 | Token vault is project-scoped — no cross-project token leakage. | | |
| 8 | Admin consent — ability for an admin to pre-authorize a scope on behalf of a class of users. | | |
| 9 | Admin-initiated token revocation — an admin can invalidate all vault tokens for a specific user directly in Arcade, without touching any downstream provider. Primary use case: employee offboarding or security incident response. | | |

**Average:** ___ **Category score:** ___

## Score anchors
- **1** — Shared API keys or service accounts only; no per-user identity
- **3** — Per-user OAuth works for prebuilt connectors; custom providers require undocumented manual steps; revocation requires going to each provider individually
- **5** — Full per-user vault, Okta integration, custom OAuth providers documented and working, token refresh transparent, admin-initiated revocation works from one place

## Benchmark tests
| # | Test (verbatim) | Result | Evidence |
|---|---|---|---|
| 1 | Call a tool as User A. Verify it executes with User A's credentials by checking the downstream system's own audit log (e.g., GitHub shows the call as User A, not a service account). | | |
| 2 | Revoke User A's OAuth token in the provider. Verify the next tool call triggers a consent/re-auth flow rather than silently failing or falling back to a shared credential. | | |
| 3 | Configure a custom OAuth provider (Snowflake or Workday). Complete a full per-user token flow end-to-end: authorize → vault stores token → tool call executes as that user. | | |
| 4 | Configure TenantTalk authentication via Okta as a custom OAuth provider. Verify the Engine brokers the token correctly. | | |
| 5 | Verify token refresh: let a token expire. Confirm the next call either refreshes transparently or returns a clear re-auth prompt. | | |
| 6 | Admin-initiated revocation: as an admin, invalidate all vault tokens for User A in Arcade directly (no downstream provider action). Verify User A's next tool call fails or triggers re-auth, across all connected systems simultaneously. | | |

## Suggested pass/fail gates
| Gate | Pass condition (verbatim) | Result | Evidence |
|---|---|---|---|
| Per-user execution | Tool calls provably execute as the calling user (verifiable in the downstream system's own logs) | | |
| No shared credentials | No service account or shared token is used in any tool call path | | |
| Okta integration | Gateway access works end-to-end through Okta OIDC/SAML | | |
| Custom OAuth | At least one custom provider (Snowflake or Workday) configured and functional | | |
| Token isolation | No user's token is accessible by, or executed as, another user | | |
| Downstream revocation | Revoking a token at the provider level triggers re-auth on the next call — no silent fallback | | |
| Admin-initiated revocation | An admin can invalidate all of a specific user's vault tokens in Arcade directly, taking effect across all connected systems without touching each provider individually | | |

## Findings
- Note (deployment): live POC upstream IdP is **Entra ID**, not Okta yet — score criterion 3 against that gap.
@@ -0,0 +1,44 @@
# Category 3 — Tool-Level Access Control and Policy (weight 15)

> Verbatim criteria/gates from the criteria Google Doc. Fill Score/Evidence locally; **the human
> pastes**. 1–5 scale; anchors at 1/3/5.

## Scores
| # | Criterion (verbatim) | Score (1–5) | Evidence / note |
|---|---|---|---|
| 1 | Tool-level allow-list per user — a user can only call tools explicitly granted to them; the gateway enforces this, not the client. | | |
| 2 | Contextual Access rules — per-user tool visibility and invocation policy layered on top of the gateway allow-list. | | |
| 3 | Input filtering — ability to block or rewrite tool inputs based on policy before execution reaches the server. | | |
| 4 | Output redaction — ability to mask or strip sensitive fields from tool outputs before they reach the agent. | | |
| 5 | Policy is enforced at the Engine, not the client — a malicious or compromised client cannot bypass it. | | |
| 6 | All policy decisions (allow, block, redact) are logged. | | |
| 7 | Per-user tool grants can be updated without restarting the gateway or any server. | | |
| 8 | Gateway scopes map to Okta groups — access managed in Okta, not a separate system. | | |

**Average:** ___ **Category score:** ___

## Score anchors
- **1** — Gateway-level tool list only; no per-user scoping or input/output policy
- **3** — Per-user grants work; Contextual Access input/output rules require significant manual work
- **5** — Full per-user policy, Contextual Access input/output rules, Okta-managed scopes, all decisions audited

## Benchmark tests
| # | Test (verbatim) | Result | Evidence |
|---|---|---|---|
| 1 | Grant User A access to GitHub tools and User B access to Atlassian tools. Verify User A cannot invoke Atlassian tools even if they know the tool name. | | |
| 2 | Write a Contextual Access rule that blocks inputs containing a specific pattern (e.g., a mock SSN). Send a matching input — verify it is blocked before execution and logged. | | |
| 3 | Write a Contextual Access rule that redacts a field from tool outputs. Verify the field is absent from the agent's response. | | |
| 4 | Update User A's tool grants (add a new tool). Verify the change takes effect without restarting anything. | | |
| 5 | Confirm policy enforcement point: attempt to bypass Contextual Access by calling the server directly (bypassing the Engine). Confirm this is architecturally prevented or explicitly documented as a known boundary. | | |

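For test 2 it helps to pin down the "mock SSN" pattern before writing the rule, so the matching and non-matching fixtures are unambiguous. The sketch below is a local stand-in for that pattern check only. It is not Contextual Access rule syntax, which this doc does not describe, and the `NNN-NN-NNNN` format and field names are assumptions.

```python
import re

# Assumed mock-SSN shape: three digits, two digits, four digits, dash-separated.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def find_blocked_fields(tool_input: dict[str, object]) -> list[str]:
    """Return the names of string fields whose value contains the mock SSN pattern."""
    return sorted(
        name
        for name, value in tool_input.items()
        if isinstance(value, str) and SSN_PATTERN.search(value)
    )


# Hypothetical fixtures: one input that should be blocked, one that should pass.
matching_input = {"title": "Reset account", "body": "SSN 123-45-6789 on file"}
clean_input = {"title": "Reset account", "body": "Order 12345-678 delayed"}

print(find_blocked_fields(matching_input))
print(find_blocked_fields(clean_input))
```

Sending `matching_input` and `clean_input` through the gateway then gives a paired result: the first should be rejected before execution and logged, the second should reach the server.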
## Suggested pass/fail gates
| Gate | Pass condition (verbatim) | Result | Evidence |
|---|---|---|---|
| Tool isolation | Cross-user tool calls are rejected at the Engine regardless of client behavior | | |
| Input policy | Blocked inputs are rejected before execution, not after | | |
| Output policy | Redacted fields are absent from the agent's response | | |
| Audit | Every policy decision (allow/block/redact) produces a retrievable log entry | | |
| Dynamic grants | Tool grant updates take effect without service restart | | |

## Findings
-
@@ -0,0 +1,56 @@
# Category 4 — Connector Coverage and Custom Server Development (weight 10)

> Verbatim criteria/gates from the criteria Google Doc. Fill Score/Evidence locally; **the human
> pastes**. 1–5 scale; anchors at 1/3/5.

## Scores
| # | Criterion (verbatim) | Score (1–5) | Evidence / note |
|---|---|---|---|
| 1 | Prebuilt catalog covers required systems (GitHub, Salesforce, Atlassian/Jira). | | |
| 2 | Python SDK (arcade-mcp) supports building custom servers with minimal boilerplate. | | |
| 3 | Tool schema is auto-derived from Python type annotations — no manual schema authoring. | | |
| 4 | Local development loop works without cloud infrastructure (stdio mode). | | |
| 5 | Custom servers can be registered as self-hosted (HTTPS endpoint) and routed by the Engine. | | |
| 6 | Custom OAuth provider registration — Engine brokers per-user tokens for custom systems. | | |
| 7 | Custom servers can be versioned and updated without gateway downtime. | | |

**Average:** ___ **Category score:** ___

## Score anchors
- **1** — No SDK; custom integration requires raw HTTP server and manual schema
- **3** — SDK works for basic cases; custom OAuth is underdocumented; some systems blocked
- **5** — SDK is productive, custom OAuth providers are documented and straightforward, all six systems have a working path

## Coverage of required systems (verbatim)
| System | Prebuilt? | Path |
|---|---|---|
| GitHub | Yes | Prebuilt (global catalog) |
| Salesforce | Yes | Prebuilt (global catalog) |
| Atlassian / Jira | Partial | Prebuilt; confirm Confluence coverage |
| HubSpot | Yes | Prebuilt (global catalog) |
| QuickBooks | No | Custom server + custom OAuth provider |
| Sage | No | Custom server + custom OAuth provider |
| Snowflake | No | Custom server + custom OAuth provider |
| Workday | No | Custom server + custom OAuth provider |
| TenantTalk | No (internal) | Custom server + Okta-backed OAuth |

## Benchmark tests
| # | Test (verbatim) | Result | Evidence |
|---|---|---|---|
| 1 | Build a minimal custom server for one internal API using the arcade-mcp SDK. Measure time from schema to first successful local tool call. Target: under 2 hours. | | |
| 2 | Register the custom server as self-hosted (HTTPS endpoint). Verify Engine routing works and tool calls reach the server. | | |
| 3 | Configure Snowflake (or equivalent) as a custom OAuth provider. Complete per-user token flow end-to-end. | | |
| 4 | Verify tool schema is auto-derived from Python type annotations. Confirm no manual JSON schema authoring is required. | | |
| 5 | Update a custom tool's implementation. Verify the change takes effect without restarting the gateway. | | |

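Test 4 checks that tool schemas come from Python type annotations. As a reference point for what "auto-derived" means, here is a standard-library sketch that builds a JSON-Schema-like description from a function signature. It is not the arcade-mcp SDK's implementation or API; `derive_schema` and `lookup_invoice` are hypothetical, and only four primitive types are handled.

```python
import inspect
from typing import get_type_hints

# Minimal mapping from Python annotations to JSON Schema type names.
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def derive_schema(func) -> dict:
    """Derive a JSON-Schema-like tool description from a function's annotations."""
    hints = get_type_hints(func)
    properties, required = {}, []
    for name, param in inspect.signature(func).parameters.items():
        annotation = hints.get(name)
        if annotation not in JSON_TYPES:
            raise TypeError(f"unsupported or missing annotation on {name!r}")
        properties[name] = {"type": JSON_TYPES[annotation]}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default means the caller must supply it
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


def lookup_invoice(invoice_id: str, include_lines: bool = False) -> dict:
    """Fetch one invoice by ID."""
    return {"invoice_id": invoice_id, "include_lines": include_lines}


print(derive_schema(lookup_invoice))
```

When running the real test, the evidence to capture is the schema the SDK publishes for a tool, compared against the tool's signature, with no hand-written JSON schema anywhere in the repo.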
## Suggested pass/fail gates
| Gate | Pass condition (verbatim) | Result | Evidence |
|---|---|---|---|
| SDK productivity | Custom server from scratch to first local tool call in under 2 hours | | |
| Self-hosted registration | HTTPS endpoint registers and routes correctly through the Engine | | |
| Custom OAuth | At least one non-standard OAuth provider configured end-to-end | | |
| Schema derivation | Tool schema is auto-derived; no manual JSON schema authoring required | | |
| All systems covered | A working path (prebuilt or custom) exists for all six required systems | | |

## Findings
-
@@ -0,0 +1,54 @@
# Category 5 — Auditability and Observability (weight 12)

> Verbatim criteria/gates from the criteria Google Doc. Fill Score/Evidence locally; **the human
> pastes**. 1–5 scale; anchors at 1/3/5.

**How tool execution logging works (verbatim, confirmed with Arcade, Jun 15):** Arcade's built-in
audit log covers administrative operations only (gateway creation, server registration, API key
management) — this is by design, not a gap. Tool execution observability is handled via
OpenTelemetry (OTEL): when deploying the Arcade image to Kubernetes, OTEL can be enabled to ship
telemetry to any observability collector (Datadog, ELK Stack, etc.). When self-hosted, no telemetry
flows back to Arcade — all data stays in ServiceTitan's infrastructure. This is the path to satisfy
InfoSec's execution audit requirement.

**ServiceTitan reality (this deployment — see ../../LIVE-POC.md):** logs → ELK (Vector daemonset);
**metrics → Grafana/Mimir** (Grafana Agent scrapes ServiceMonitors → remote_write to Mimir). The
engine emits OTLP metrics but they are **dropped** today — `arcade-otel-collector:4318` does not
resolve (no collector deployed). Remediation = deploy a collector + bridge it into Prometheus/Mimir.

## Scores
| # | Criterion (verbatim) | Score (1–5) | Evidence / note |
|---|---|---|---|
| 1 | OTEL enabled on the self-hosted Arcade deployment — execution telemetry ships to ServiceTitan's observability stack (Datadog or ELK). | | |
| 2 | Every tool call produces a log record with: user, tool invoked, timestamp, outcome — queryable in Datadog or ELK. | | |
| 3 | Admin audit log — all configuration changes (gateways, servers, API keys, policies) are logged in Arcade. | | |
| 4 | Per-tool and per-user usage metrics (call counts, error rates, latency) visible in the observability stack. | | |
| 5 | Trace propagation — tool call traces joinable to agent and application traces via OTEL. | | |
| 6 | No telemetry data leaves ServiceTitan's infrastructure to Arcade when self-hosted. | | |

**Average:** ___ **Category score:** ___

## Score anchors
- **1** — No OTEL support; no execution telemetry available outside Arcade's dashboard
- **3** — OTEL works but configuration is manual or underdocumented; trace propagation requires custom work
- **5** — OTEL is documented and easy to enable; full execution telemetry in Datadog/ELK; trace propagation works end-to-end

## Benchmark tests
| # | Test (verbatim) | Result | Evidence |
|---|---|---|---|
| 1 | Enable OTEL on the self-hosted Arcade Kubernetes deployment. Make a tool call. Verify a record appears in Datadog (or ELK) with: user_id, tool name, timestamp, outcome. | | |
| 2 | Make an administrative change (update a gateway). Verify the change appears in Arcade's admin audit log. | | |
| 3 | Propagate a trace ID from an agent call through to the tool execution. Verify the trace is end-to-end visible in the observability stack. | | |
| 4 | Confirm no tool execution telemetry is transmitted to Arcade's own systems when running self-hosted. | | |

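Test 1 names four things every execution record must carry: user_id, tool name, timestamp, outcome. A small checker like the sketch below can be run over records exported from ELK or Datadog to turn that into a repeatable pass/fail. The field keys are assumptions: the attribute names the Engine actually emits are not documented here and will probably need remapping.

```python
from datetime import datetime

# Assumed keys; map these to whatever attribute names the telemetry really uses.
REQUIRED_FIELDS = ("user_id", "tool_name", "timestamp", "outcome")


def missing_fields(record: dict) -> list[str]:
    """List required fields that are absent or empty in one exported record."""
    return [field for field in REQUIRED_FIELDS if record.get(field) in (None, "")]


def is_valid_record(record: dict) -> bool:
    """A record passes when all fields are present and the timestamp is ISO 8601."""
    if missing_fields(record):
        return False
    try:
        datetime.fromisoformat(record["timestamp"])
    except (TypeError, ValueError):
        return False
    return True


# Hypothetical exported records.
good = {
    "user_id": "user-a",
    "tool_name": "Github.CreateIssue",
    "timestamp": "2025-06-15T12:00:00+00:00",
    "outcome": "success",
}
bad = {"tool_name": "Github.CreateIssue", "timestamp": "2025-06-15T12:00:00+00:00"}

print(is_valid_record(good), missing_fields(bad))
```

The "Execution audit" gate says *every* tool call, so the check is only meaningful when the number of valid records is also compared against the number of calls made during the test.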
## Suggested pass/fail gates
| Gate | Pass condition (verbatim) | Result | Evidence |
|---|---|---|---|
| OTEL integration | OTEL enabled on self-hosted deployment; execution telemetry flows to Datadog or ELK | | |
| Execution audit | Every tool call produces a retrievable record with user, tool, timestamp, outcome in ServiceTitan's observability stack | | |
| Admin audit | All Arcade configuration changes are logged in the admin audit log | | |
| Data residency | No tool execution telemetry transmitted to Arcade when self-hosted — confirmed | | |
| InfoSec sign-off | Dane Snyder confirms the OTEL-based execution audit satisfies the access audit requirement | | |

## Findings
-
@@ -0,0 +1,50 @@
# Category 6 — Security and Compliance (weight 10)

> Verbatim criteria/gates from the criteria Google Doc. Fill Score/Evidence locally; **the human
> pastes**. 1–5 scale; anchors at 1/3/5.

## Scores
| # | Criterion (verbatim) | Score (1–5) | Evidence / note |
|---|---|---|---|
| 1 | PII masking or redaction at the gateway layer — without changes to tool code. | | |
| 2 | Input blocking — Contextual Access policy can block tool calls based on content. | | |
| 3 | MCPs can be scaled to less than human access. | | |
| 4 | Output redaction — sensitive fields removed from responses before reaching the agent. | | |
| 5 | Data processing agreement (DPA) and sub-processor disclosure in place. | | |
| 6 | SOC 2 / ISO 27001 certification (or equivalent) confirmed. | | |
| 7 | Data boundary acceptable to InfoSec — tool call payloads route through Arcade's Engine; execution stays in ServiceTitan's infrastructure. | | |
| 8 | Raw OAuth tokens are never exposed to the LLM, agent code, or logs. | | |
| 9 | Secrets management integration (Azure Key Vault or equivalent) for API key storage. | | |
| 10 | Potential for log forwarding for telemetry, alerting | | |
| 11 | Potential integration for DLP tooling if possible | | |
| 12 | Data boundary guardrails (able to block querying all records from a table) | | |

**Average:** ___ **Category score:** ___

## Score anchors
- **1** — No policy enforcement; payloads flow unmodified; DPA and certifications unconfirmed
- **3** — Some policy controls exist; DPA in progress; compliance posture requires follow-up
- **5** — Full policy enforcement, DPA executed, compliant data boundary, tokens never exposed

## Benchmark tests
| # | Test (verbatim) | Result | Evidence |
|---|---|---|---|
| 1 | Send a tool input containing a mock SSN. Verify it is redacted before reaching the tool function via a Contextual Access rule. | | |
| 2 | Send a tool output containing a mock API key string. Verify it is redacted before reaching the agent. | | |
| 3 | Attempt a tool call with an expired or revoked credential. Verify rejection with a clean error — no fallback to a shared credential. | | |
| 4 | Attempt to call a tool that has been restricted by the MCP gateway that the person usually can perform | | |
| 5 | Attempt to pull all records from an MCP integration, instead of focused data | | |
| 6 | Review the DPA and sub-processor list against ServiceTitan's data governance requirements. | | |
| 7 | Confirm in the Engine architecture that raw tokens never appear in logs, traces, or agent responses. | | |

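For test 2, the expected post-redaction output can be computed locally and compared with what the agent actually receives. The sketch below shows one such reference redactor. The `sk_test_` mock key format and the `[REDACTED]` marker are assumptions, and this is a test oracle only, not how the gateway's own redaction is configured.

```python
import re

# Assumed mock key shape: "sk_test_" followed by at least 16 alphanumerics.
MOCK_KEY_PATTERN = re.compile(r"\bsk_test_[A-Za-z0-9]{16,}\b")
MARKER = "[REDACTED]"


def redact(value):
    """Recursively replace mock API keys in strings nested inside dicts and lists."""
    if isinstance(value, str):
        return MOCK_KEY_PATTERN.sub(MARKER, value)
    if isinstance(value, dict):
        return {key: redact(item) for key, item in value.items()}
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value  # numbers, booleans, None pass through unchanged


def contains_mock_key(value) -> bool:
    """True when any string in the structure still holds an unredacted mock key."""
    return redact(value) != value


# Hypothetical tool output used as the fixture.
tool_output = {"note": "key sk_test_abcdefghijklmnop1234 leaked", "items": ["ok", 3]}
print(redact(tool_output))
```

Running `contains_mock_key` on the response the agent receives gives a direct pass/fail for the "PII policy" style gates without eyeballing payloads.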
## Suggested pass/fail gates
| Gate | Pass condition (verbatim) | Result | Evidence |
|---|---|---|---|
| Data boundary | Tool call payloads through Arcade Engine + execution in ServiceTitan infrastructure — acceptable to InfoSec | | |
| No token exposure | Raw OAuth tokens are never visible in logs, traces, or agent responses | | |
| DPA | Data processing agreement is executed before the pilot ends | | |
| PII policy | At least one PII redaction rule works end-to-end | | |
| Compliance | SOC 2 or equivalent certification confirmed | | |

## Findings
-
@@ -0,0 +1,42 @@
# Category 7 — Performance and Availability (weight 8)

> Because every gateway-mediated tool call routes through the Arcade Engine — even when the custom
> server is self-hosted — Engine latency and availability are a floor on the entire agent stack.
> Verbatim criteria/gates from the criteria Google Doc. Fill Score/Evidence locally; **the human
> pastes**. 1–5 scale; anchors at 1/3/5.

## Scores
| # | Criterion (verbatim) | Score (1–5) | Evidence / note |
|---|---|---|---|
| 1 | Engine-added latency per tool call is within acceptable bounds for interactive agent use. | | |
| 2 | Engine SLA — defined uptime guarantees with incident response process. | | |
| 3 | Failure behavior when Engine is unavailable: fail-closed with a clean, catchable error. | | |
| 4 | Self-hosted server HA — multi-replica, pod failure handling, no dropped calls on restart. | | |
| 5 | Multi-region failover design — documented and validated. | | |
| 6 | Engine geographic placement and round-trip latency from ServiceTitan's primary region. | | |

**Average:** ___ **Category score:** ___

## Score anchors
- **1** — Engine SLA undocumented; failure behavior is a hang or silent failure; no HA guidance
- **3** — SLA documented; HA works with manual configuration; failure behavior is known but requires client-side handling
- **5** — SLA with incident response in writing; HA is the documented default; failure behavior is clean and observable

## Benchmark tests
| # | Test (verbatim) | Result | Evidence |
|---|---|---|---|
| 1 | Make 100 tool calls through the Engine to a self-hosted server. Measure P50, P95, P99 round-trip latency. Compare against a direct server call (bypassing the Engine) to isolate Engine-added overhead. | | |
| 2 | Simulate Engine unavailability (block the Engine endpoint). Confirm tool calls fail with a clean, catchable error — not a hang or silent failure. | | |
| 3 | Deploy the custom server with multiple replicas. Kill one pod. Confirm tool calls continue without dropped requests. | | |
| 4 | Confirm Engine SLA documentation: uptime percentage, response time commitment, and P0 escalation path. | | |

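Test 1 needs P50/P95/P99 from two sets of timings (through the Engine and direct). A minimal sketch of that arithmetic follows, using the nearest-rank percentile definition. Collecting the timings is out of scope, and treating "Engine-added latency" as the difference between the two P95 values is an assumption; the criteria doc does not define the calculation.

```python
import math


def percentile(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def summarize(samples: list[float]) -> dict[str, float]:
    """P50, P95 and P99 for one set of round-trip latencies (milliseconds)."""
    return {f"p{pct}": percentile(samples, pct) for pct in (50, 95, 99)}


def engine_overhead_p95(via_engine: list[float], direct: list[float]) -> float:
    """Assumed definition of Engine-added latency: P95(via Engine) - P95(direct)."""
    return percentile(via_engine, 95) - percentile(direct, 95)


# Synthetic timings in ms, standing in for 100 measured calls per path.
direct_ms = [float(ms) for ms in range(1, 101)]
engine_ms = [ms + 100.0 for ms in direct_ms]

print(summarize(engine_ms))
print(engine_overhead_p95(engine_ms, direct_ms) < 500)  # the gate's threshold
```

With only 100 samples, P99 is effectively the second-slowest call, so a single outlier moves it a lot; recording the raw timings alongside the summary keeps the evidence reviewable.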
## Suggested pass/fail gates
| Gate | Pass condition (verbatim) | Result | Evidence |
|---|---|---|---|
| Engine overhead | P95 Engine-added latency is under 500ms for standard (non-streaming) tool calls | | |
| SLA documented | Engine uptime SLA and incident response process confirmed in writing | | |
| HA | Self-hosted server survives pod failure; no tool calls dropped during pod restart | | |
| Fail behavior | Engine outage produces a clean, catchable error to the agent — no hangs | | |

## Findings
-
@@ -0,0 +1,41 @@
# Category 8 — Deployment and Operations (weight 7)

> Verbatim criteria/gates from the criteria Google Doc. Fill Score/Evidence locally; **the human
> pastes**. 1–5 scale; anchors at 1/3/5.

## Scores
| # | Criterion (verbatim) | Score (1–5) | Evidence / note |
|---|---|---|---|
| 1 | Helm chart available and documented for self-hosted server deployment in Kubernetes. | | |
| 2 | Zero-downtime configuration updates — gateway and policy changes do not interrupt in-flight calls. | | |
| 3 | GitOps-compatible — gateway, server, and policy configuration is expressible as code. | | |
| 4 | Upgrade and rollback process is documented and tested. | | |
| 5 | Runbooks for common failure scenarios. | | |
| 6 | Vendor support model during the pilot: dedicated solutions engineer, response SLA. | | |
| 7 | P0/P1 escalation path after the pilot, in production. | | |

**Average:** ___ **Category score:** ___

## Score anchors
- **1** — No Helm chart; manual deployment only; no dedicated support
- **3** — Helm chart works with gaps; zero-downtime config updates unverified; support exists but is not dedicated
- **5** — Helm-native, GitOps-compatible, zero-downtime config, dedicated SE, documented escalation path

## Benchmark tests
| # | Test (verbatim) | Result | Evidence |
|---|---|---|---|
| 1 | Deploy a self-hosted custom server to a Kubernetes namespace via Helm chart. Measure time from clean namespace to first successful tool call. Target: under 1 day. | | |
| 2 | Update a gateway configuration (add a tool). Verify in-flight calls are not dropped. | | |
| 3 | Simulate a configuration rollback. Verify the rollback completes cleanly and the prior configuration is restored. | | |
| 4 | Stage a K8s namespace and confirm the Helm deployment matches the architecture recommended for production. | | |

## Suggested pass/fail gates
| Gate | Pass condition (verbatim) | Result | Evidence |
|---|---|---|---|
| Helm deployment | Full Kubernetes deployment via Helm chart in under 1 day | | |
| Config safety | Gateway configuration changes are zero-downtime | | |
| Rollback | Prior configuration can be restored cleanly | | |
| Dedicated SE | A dedicated solutions engineer is available and responsive during the pilot | | |

## Findings
- Note: a live deployment already exists (`k8s-backstage-v2/apps/arcade`, chart 1.8.8, Flux/GitOps) — a head start for this category's evidence.
@@ -0,0 +1,43 @@
# Category 9 — Developer Experience (weight 5)

> Verbatim criteria/gates from the criteria Google Doc. Fill Score/Evidence locally; **the human
> pastes**. 1–5 scale; anchors at 1/3/5.

## Scores
| # | Criterion (verbatim) | Score (1–5) | Evidence / note |
|---|---|---|---|
| 1 | Local development loop is productive — stdio mode enables tool development without cloud infrastructure (Stage 1: code runs locally, MCP client spawns the server directly). | | |
| 2 | Tunnel-based development loop is supported — a developer can expose their locally running MCP server through a tunnel (Cloudflare, ngrok) and register it against a shared dev Arcade instance to exercise the full request chain (gateway → Engine → tunnel → local server) without deploying to Kubernetes. This is the primary development pattern for custom server authors. | | |
| 3 | A shared dev Arcade instance is available for ServiceTitan developers to register tunnel endpoints against — no need to provision a personal Arcade org for every developer. | | |
| 4 | Cloudflare tunnel (or equivalent) is the standardized proxy mechanism — documented, with a permanent named-tunnel option so the registered server URL does not change on every session restart. | | |
| 5 | SDK documentation is complete, accurate, and has working examples. | | |
| 6 | Error messages are actionable — auth failures, misconfigurations, and policy blocks identify the root cause. | | |
| 7 | MCP client integration requires no custom adapters or wrappers. | | |
| 8 | Gateway and server management are automatable via API. | | |

**Average:** ___ **Category score:** ___

## Score anchors
- **1** — No tunnel support; local development requires full Kubernetes deployment to test the gateway chain
- **3** — Tunnel registration works but is underdocumented; no shared dev instance; engineers figure it out individually
- **5** — Tunnel loop is documented with a standard Cloudflare recipe; shared dev instance available; engineers are productive without platform hand-holding

## Benchmark tests
| # | Test (verbatim) | Result | Evidence |
|---|---|---|---|
| 1 | **Stage 1 — local stdio:** Time an engineer from SDK install to first successful local tool call (stdio mode, no Arcade infrastructure). Target: under 2 hours. | | |
| 2 | **Stage 2 — tunnel registration (the key test):** Developer runs a local MCP server in HTTP mode, opens a Cloudflare tunnel, registers the tunnel URL as a self-hosted server in the dev Arcade instance, and makes a tool call that flows: Claude Code → gateway → Engine → Cloudflare tunnel → local server. Verify the full chain works including auth and Contextual Access. Measure time from working local server to first successful gateway-mediated call. Target: under 1 day. | | |
| 3 | Verify Cloudflare named tunnel (permanent hostname) — confirm the registered URL survives session restarts without re-editing the server registration. | | |
| 4 | Intentionally misconfigure an OAuth provider. Measure how quickly the error message identifies the root cause. | | |
| 5 | Integrate with Claude Code from scratch — time from gateway URL to working tool invocation in Claude Code. Target: under 30 minutes. | | |

## Suggested pass/fail gates
| Gate | Pass condition (verbatim) | Result | Evidence |
|---|---|---|---|
| Tunnel loop | Full gateway → Engine → Cloudflare tunnel → local server chain works end-to-end | | |
| Permanent tunnel URL | Named tunnel hostname persists across session restarts without re-registration | | |
| Shared dev instance | ServiceTitan developers can register local servers against a shared dev Arcade org without individual account provisioning | | |
| Time to first call | Engineer reaches a working gateway-mediated tool call in under 1 day from scratch | | |

## Findings
-