Chapter 05 of 08

The Reference Architecture

How the Five Pillars become running infrastructure. The LLM Gateway, cascading governance, envelope encryption, and the kill switch hierarchy.

The previous chapters defined what governed AI agent deployment requires. This chapter defines how — the technical architecture that makes Level 4 governance possible without sacrificing the speed and flexibility that makes AI agents valuable in the first place.

The architecture is organized into four layers, each addressing a distinct concern:

| Layer | Concern | Components |
| --- | --- | --- |
| Execution Layer | How agents run and interact with tools | Agent executor, MCP tool registry, sandbox providers |
| Security Layer | How every LLM call is secured | LLM Gateway pipeline (10-stage) |
| Governance Layer | How policy flows through the hierarchy | Cascading policy resolver, governance packs, autonomy levels |
| Data Layer | How data is stored, encrypted, and located | Envelope encryption, BYOS, BYOK, memory pointers |

The LLM Gateway Pipeline

Every LLM call — whether from an AI agent executing a task, a coding session generating code, or a direct user request — passes through a 10-stage security pipeline. The gateway is not optional or configurable — it is the only path to the LLM.

*Figure: LLM Gateway — 10-stage security pipeline. Request → 1. Budget Check → 2. Prompt Injection → 3. PII Redaction → 4. Content Mod → 5. Audit (pre) → LLM Call → 6. Output Validation → 7. Content Mod → 8. PII Restore → 9. Token Billing → 10. Audit (post) → Response. A SHA-256 pipeline hash accumulates across all 10 stages; if any stage is bypassed, the hash chain breaks and the gap is detectable in audit review.*
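As a sketch of the fail-closed sequencing (function and field names here are illustrative, not the platform's actual API), the pre-call stages can be modeled as a chain where any stage may block the request before it ever reaches the model:

```python
# Sketch of a fail-closed stage chain: each stage either returns the
# (possibly transformed) request or raises, and a raise blocks the call.
# All names and patterns here are illustrative.
class PipelineBlocked(Exception):
    pass

def budget_check(req):
    if req["est_cost"] > req["budget_remaining"]:
        raise PipelineBlocked("429: budget exceeded")
    return req

def prompt_injection_check(req):
    # Stand-ins for the structural patterns (fake system headers, role injection)
    markers = ["<|system|>", "### SYSTEM:", '"role": "system"']
    if any(m in req["prompt"] for m in markers):
        raise PipelineBlocked("prompt injection pattern detected")
    return req

PRE_STAGES = [budget_check, prompt_injection_check]  # stages 1-2 of the real pipeline

def run_gateway(req, call_llm):
    for stage in PRE_STAGES:
        req = stage(req)      # any stage can block the request
    return call_llm(req)      # the LLM call; post-call stages would follow here
```

Because every caller goes through `run_gateway`, there is no code path that reaches `call_llm` without first clearing each stage, which is the "only path to the LLM" property in miniature.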

Stage details

| # | Stage | What It Does | On Failure |
| --- | --- | --- | --- |
| 1 | Budget Check | Verifies tenant hasn't exceeded spending limits. Per-call, per-agent, and per-tenant budgets. | Request blocked with 429. Agent receives budget error. |
| 2 | Prompt Injection Detection | 5 structural patterns: fake system headers, document boundary markers, HR-separator overrides, XML section tags, JSON role injection. Plus semantic analysis. | Request blocked. Logged as security event. Agent receives sanitized error. |
| 3 | PII Redaction | Detects and masks personal data (names, emails, SSNs, credit cards, phone numbers) before content reaches the LLM. 16+ pattern categories. | PII replaced with tokens. Original values stored for restoration. |
| 4 | Content Moderation (input) | Checks input against safety policies. Configurable thresholds per governance pack (HIPAA = stricter). | Request blocked or flagged depending on enforcement mode. |
| 5 | Audit (pre-call) | Records sanitized input, model, context, and pipeline state. Pipeline hash begins. | Always succeeds (non-blocking). |
| 6 | LLM Call | Routed to appropriate provider (6 supported: Anthropic, OpenAI, Google, Mistral, Groq, vLLM self-hosted). BYOK keys used. | Provider circuit breaker. Fallback to alternate model if configured. |
| 7 | Output Validation | Checks response for policy violations, credential leaks, and formatting requirements. | Response sanitized. Violations logged. |
| 8 | Content Moderation (output) | Verifies response safety. Catches harmful content the LLM might generate. | Response blocked or redacted. |
| 9 | PII Restore | Re-inserts original PII values into response for the authorized user. LLM never saw the real PII. | Tokens left unreplaced (safe degradation). |
| 10 | Token Billing + Audit (post-call) | Records usage, cost, latency. Completes pipeline hash. HMAC-signed audit entry. | Always succeeds (non-blocking). |
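The redact/restore pair (stages 3 and 9) can be sketched in a few lines. This toy version handles only one pattern category, emails, where the real pipeline covers 16+; the token format is an assumption for illustration:

```python
import re

# Toy version of PII Redaction (stage 3) and PII Restore (stage 9).
# One pattern category (email) stands in for the 16+ real ones.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def redact(text):
    mapping = {}
    def mask(match):
        token = f"<PII_{len(mapping)}>"   # placeholder token the LLM will see
        mapping[token] = match.group(0)   # original value kept for restoration
        return token
    return EMAIL.sub(mask, text), mapping

def restore(text, mapping):
    # Re-insert original values for the authorized user; unknown tokens
    # are simply left in place (safe degradation).
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text
```

The key property is that the mapping never travels with the LLM request: the model only ever sees `<PII_0>`-style tokens, and restoration happens on the response path.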

The pipeline hash is the integrity proof

A SHA-256 hash is computed cumulatively across all 10 stages. The final hash is stored in the audit log alongside the HMAC signature. If any stage is skipped (e.g., someone disables prompt injection detection for "performance"), the hash chain breaks — and the discrepancy is detectable during audit review. You can prove that every security stage ran.
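A cumulative hash of this shape can be sketched with the standard library. The stage-record format and signing key are illustrative; the point is that the final digest depends on every stage having run, in order:

```python
import hashlib
import hmac

# Cumulative pipeline hash: each stage folds its record into the running
# digest, so the final value is a function of all stages in order.
def pipeline_hash(stage_records):
    digest = b""
    for record in stage_records:
        digest = hashlib.sha256(digest + record.encode()).digest()
    return digest.hex()

stages = [f"stage-{i}:ok" for i in range(1, 11)]
full = pipeline_hash(stages)
skipped = pipeline_hash(stages[:1] + stages[2:])   # stage 2 bypassed
assert full != skipped                              # the break is detectable

# The final hash is then HMAC-signed into the audit entry (key illustrative):
signature = hmac.new(b"audit-signing-key", full.encode(), hashlib.sha256).hexdigest()
```

Because each digest is computed over the previous one, an auditor who replays the stage records and gets a different final hash knows some stage was skipped or altered, without needing to trust the component that skipped it.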


Cascading Governance

Policy doesn't live in one place. It cascades through a 4-level hierarchy, with the most restrictive level winning at any point of conflict:

*Figure: Cascading Governance — 4-level hierarchy. Platform Default (baseline for all tenants) → Tenant (set by org admin; overrides platform) → Workspace (set by workspace owner) → Team/Agent (most specific level). Resolution: most restrictive wins. A tenant "enforce" overrides a team "audit".*

What cascades

| Policy Dimension | What It Controls | Example |
| --- | --- | --- |
| Autonomy level | How independently agents can act | proactive (propose + human approves) vs autonomous (act without approval) vs reactive (only when asked) |
| Artifact governance | Who can create/modify schedules, workflows, triggers, prompts | autonomous (agents create freely) vs proactive (propose, human approves) vs locked (manifest-only) |
| Discovery capabilities | What agents can discover on their own | 5 toggles: tool discovery, data discovery, memory creation, skill suggestion, auto tool sync |
| Enforcement mode | How authorization denials are handled | audit → warn → enforce |
| Governance packs | Which compliance modules are active | GDPR pack enables PII redaction, consent tracking, data export. HIPAA pack enables PHI detection, encryption, access controls. |

The cascade is resolved at request time with a 60-second TTL cache. Policy changes propagate within one minute. No restart required.
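Most-restrictive-wins resolution for a dimension like enforcement mode reduces to an ordering over the possible values. A minimal sketch (the strictness ordering follows the audit → warn → enforce progression above; function and parameter names are illustrative):

```python
# Most-restrictive-wins resolution across the 4 levels of the hierarchy.
# Strictness ordering per the cascade: audit < warn < enforce.
STRICTNESS = {"audit": 0, "warn": 1, "enforce": 2}

def resolve_enforcement(platform=None, tenant=None, workspace=None, team=None):
    """Return the strictest mode set at any level; unset levels don't vote."""
    levels = [m for m in (platform, tenant, workspace, team) if m is not None]
    return max(levels, key=STRICTNESS.__getitem__)

# A tenant "enforce" overrides a team "audit":
resolve_enforcement(platform="audit", tenant="enforce", team="audit")  # "enforce"
```

In production this result would sit behind the 60-second TTL cache described above, so a policy change at any level takes effect within a minute without restarting anything.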


Envelope Encryption

Content encryption at rest uses a 3-level envelope encryption architecture. This is the same pattern used by AWS, GCP, and Azure for their managed encryption services.

*Figure: Envelope Encryption — 3-level key hierarchy. The Platform Master Key (ENCRYPTION_KEY env var or external KMS) wraps each Tenant KEK (Key Encryption Key, one per tenant), which wraps per-team or per-agent DEKs (Data Encryption Keys) that encrypt content. Rotation: a new KEK re-wraps the DEKs instantly; content is re-encrypted lazily on next read (zero downtime). Embeddings stay cleartext (lossy projections), so semantic search works during and after rotation.*
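The mechanics of the hierarchy, and why KEK rotation is cheap, can be shown with a toy cipher. The HMAC-keystream XOR below is a stand-in for a real AEAD cipher such as AES-GCM and must not be used for actual encryption; all names are illustrative:

```python
import hashlib
import hmac
import os

# Toy keystream cipher standing in for AES-GCM. XOR with an HMAC-derived
# keystream is symmetric: applying it twice with the same key round-trips.
def toy_cipher(key, data):
    out = bytearray()
    counter = 0
    while len(out) < len(data):
        out += hmac.new(key, counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, out))

master = os.urandom(32)                                        # platform master key
kek = hmac.new(master, b"tenant-42", hashlib.sha256).digest()  # per-tenant KEK
dek = os.urandom(32)                                           # per-agent DEK

ciphertext = toy_cipher(dek, b"patient notes ...")   # content encrypted under DEK
wrapped_dek = toy_cipher(kek, dek)                   # small DEK wrapped under KEK

# KEK rotation: only the 32-byte DEK wrapping is redone; the (possibly huge)
# ciphertext is untouched and re-encrypted lazily on next read.
new_kek = hmac.new(master, b"tenant-42-v2", hashlib.sha256).digest()
rewrapped_dek = toy_cipher(new_kek, toy_cipher(kek, wrapped_dek))
```

Unwrapping `rewrapped_dek` with `new_kek` recovers the original DEK, which still decrypts the untouched ciphertext, which is exactly the zero-downtime rotation property the diagram describes.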

What triggers encryption

Encryption is not a toggle — it's governance-driven. When a HIPAA or GDPR governance pack is enabled on a tenant, encryption auto-activates for:

| Content Type | Encrypted | Rationale |
| --- | --- | --- |
| Report content & summary | Yes | May contain PII/PHI from agent analysis |
| Agent memory content | Yes | May contain learned facts about individuals |
| Memory embeddings | No | Lossy vector projections. Can't reconstruct content. Preserves semantic search. |
| Context graph descriptions | Yes | Entity descriptions may reference people/companies |
| Context graph embeddings | No | Same rationale as memory embeddings |
| File parsed content | Yes | Uploaded documents may contain sensitive data |

KMS provider options

| Provider | Use Case | Key Storage |
| --- | --- | --- |
| Platform-managed | Default. Keys derived from ENCRYPTION_KEY. | Platform infrastructure |
| AWS KMS | Enterprise. Customer-managed keys in AWS. | AWS Key Management Service |
| GCP Cloud KMS | Enterprise. Customer-managed keys in GCP. | Google Cloud KMS |
| Azure Key Vault | Enterprise. Customer-managed keys in Azure. | Azure Key Vault |
| Local (air-gapped) | On-premise. No external KMS dependency. | Software HSM with scrypt-derived per-tenant keys |

Kill Switch Hierarchy

When something goes wrong, speed matters more than process. The kill switch provides three levels of emergency halt, each cascading downward:

*Figure: Kill Switch Hierarchy — cascading containment. A tenant kill switch stops all teams and all agents (the nuclear option); a team kill switch stops every agent in that team while other teams run normally. Properties: instant (seconds), cascade-enabled, notifications via email, Slack, PagerDuty, and SIEM, admin approval required to restart, optional auto-restart window (0 = manual only, up to 24h).*

The kill switch is implemented as a governance module — it inherits the cascading policy system. A tenant-level kill switch overrides any team or agent-level setting. This ensures that emergency containment is always possible, even if a team has misconfigured its own governance.
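A sketch of how the cascade makes a tenant-level switch authoritative (the scope-key scheme is an assumption for illustration; the platform's actual storage is not described here):

```python
# Kill-switch resolution as a cascading check: a halt flag at ANY enclosing
# scope stops the agent, so tenant-level containment always wins.
# The tuple-key scheme is illustrative.
def is_halted(flags, tenant, team, agent):
    scopes = (
        ("tenant", tenant),
        ("team", tenant, team),
        ("agent", tenant, team, agent),
    )
    return any(flags.get(scope, False) for scope in scopes)

flags = {("tenant", "acme"): True}   # tenant kill switch thrown
is_halted(flags, "acme", "team-b", "agent-4")   # True: cascades to every agent
```

Note that nothing a team configures can make `is_halted` return `False` while the tenant flag is set: the `any(...)` over enclosing scopes is what guarantees containment survives team-level misconfiguration.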


Data Residency (BYOS)

Enterprise customers can store data in their own infrastructure. The platform stores pointers — not the data itself.

*Figure: BYOS — control plane / data plane separation. The MeetLoyd control plane handles orchestration, policy enforcement and governance, audit logging (HMAC integrity), and memory pointers (path, hash, sync status); no customer data is stored there. The customer data plane (your AWS S3 bucket, GCS, or Azure account, under your keys and your compliance regime) holds conversations, memory content, reports, generated documents, file attachments, and embeddings, encrypted with your keys (BYOK). Fallback: MeetLoyd-managed R2 (Cloudflare) for tenants without BYOS. Health: a circuit breaker (3 failures / 5-minute cooldown) triggers auto-fallback to R2. Migration: resumable batch from R2 to BYOS. Testing: a write/read/delete test runs before activation.*

BYOS is not just "store files in S3." It's an architectural pattern where the platform never holds the data. Memory content, reports, generated documents, and file attachments all flow through the customer's storage. The platform holds metadata: paths, content hashes, sync status, and encryption key references.
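A pointer record of this shape can be sketched as a small value type. The field names are illustrative; the chapter only specifies that paths, content hashes, sync status, and key references are stored:

```python
from dataclasses import dataclass
import hashlib

# What the control plane holds for a piece of BYOS content: a pointer,
# never the bytes themselves. Field names are illustrative.
@dataclass(frozen=True)
class MemoryPointer:
    storage_path: str     # e.g. an object path in the customer's bucket
    content_sha256: str   # integrity hash of the stored (encrypted) object
    sync_status: str      # "synced" | "pending" | "failed"
    kek_ref: str          # reference to the tenant key, never the key itself

    def verify(self, fetched_bytes):
        """Check bytes fetched from the data plane against the stored hash."""
        return hashlib.sha256(fetched_bytes).hexdigest() == self.content_sha256
```

The `verify` step is what lets the control plane detect tampering or corruption in storage it does not own, without ever retaining the content.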


Multi-Provider LLM Routing

No vendor lock-in. The platform supports 6 LLM providers with automatic routing based on model name. BYOK (Bring Your Own Key) is mandatory at all tiers — customers provide their own API keys.

| Provider | Models | Key Feature |
| --- | --- | --- |
| Anthropic | Claude Opus 4.6, Sonnet 4.6, Haiku 4.5 | Primary. Extended thinking. Best for complex reasoning. |
| OpenAI | GPT-4o, GPT-4.1, o3, o1 | Broad model range. Content moderation API. |
| Google | Gemini 2.5 Pro, 2.5 Flash, 2.0 Flash | Large context windows. Multimodal. |
| Mistral | Mistral Large, Codestral | European provider. EU data residency. |
| Groq | Llama 4 Scout, Llama 3.3 | Ultra-fast inference. Cost-effective for high-volume. |
| vLLM (self-hosted) | DeepSeek R1, Qwen 2.5, Qwen3-Coder | Full control. No data leaves your infrastructure. |

Model selection is per-agent, per-team, or per-task. Model aliases (claude-sonnet-latest) resolve at deploy time. When a model is deprecated, the lifecycle manager re-routes agents automatically.
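Routing by model name plus alias resolution can be sketched as a lookup pass. The alias target and the prefix map below are assumptions for illustration, not the platform's actual tables:

```python
# Alias resolution followed by provider routing on model-name prefix.
# Both tables are illustrative; aliases resolve at deploy time.
ALIASES = {"claude-sonnet-latest": "claude-sonnet-4-6"}
PREFIX_TO_PROVIDER = {
    "claude": "anthropic",
    "gpt": "openai",
    "o": "openai",        # o-series reasoning models
    "gemini": "google",
    "mistral": "mistral",
    "codestral": "mistral",
    "llama": "groq",
}

def route(model):
    model = ALIASES.get(model, model)          # pin the alias to a concrete model
    for prefix, provider in PREFIX_TO_PROVIDER.items():
        if model.startswith(prefix):
            return provider, model
    return "vllm", model                       # self-hosted fallback
```

Keeping aliases as a separate table is what lets the lifecycle manager re-point `claude-sonnet-latest` when a model is deprecated, without touching any agent configuration.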


Agent Execution Model

Agents execute in a standard ReAct loop: the LLM thinks, calls a tool, observes the result, and decides what to do next. Every tool call in this loop passes through the authorization check and the LLM Gateway.

*Figure: Agent Execution — ReAct loop with authorization. The agent receives a task; the LLM thinks (via the Gateway); each proposed tool call hits an authorization check (OpenFGA: can this agent use this tool?). If allowed, the tool executes via MCP; if denied, an error is returned and the LLM adapts, and the loop continues. On session completion, the audit is finalized, cost is recorded, and a certificate is issued. Limits: 100 calls max, $5.00 cost ceiling, loop detection.*

The execution loop has built-in safety limits: 100 tool calls per session (configurable), cost ceiling per session ($5.00 default), and loop detection (same tool called 5+ times with >80% argument similarity triggers circuit breaker).
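The loop-detection rule above (same tool, 5+ calls, >80% argument similarity) can be sketched directly. Using `difflib`'s ratio over the serialized arguments as the similarity metric is an assumption; the chapter does not specify how similarity is measured:

```python
from difflib import SequenceMatcher

# Trip the circuit breaker when one tool is called min_repeats+ times with
# argument similarity above threshold. Similarity metric is an assumption
# (difflib ratio over serialized arguments).
def detects_loop(calls, min_repeats=5, threshold=0.8):
    by_tool = {}
    for tool, args in calls:
        by_tool.setdefault(tool, []).append(args)
    for history in by_tool.values():
        if len(history) < min_repeats:
            continue
        recent = history[-min_repeats:]
        if all(SequenceMatcher(None, recent[0], a).ratio() > threshold
               for a in recent[1:]):
            return True
    return False
```

Comparing only the last `min_repeats` calls per tool means an agent that legitimately reuses a tool with varied arguments over a long session does not trip the breaker; only a tight repetition window does.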


Chapter Summary

The reference architecture implements the Five Pillars as running infrastructure: the LLM Gateway secures every model interaction with a 10-stage pipeline and cumulative hash integrity proof. Cascading governance resolves policy in real-time through a 4-level hierarchy. Envelope encryption protects content at rest with governance-triggered activation. The kill switch provides instant emergency containment at any hierarchy level. BYOS keeps data in the customer's infrastructure with pointer-only storage on the platform side.

The next chapter translates this architecture into an Implementation Playbook — a phased rollout plan with role-by-role guidance for CISOs, CIOs, platform teams, and business owners.