The previous chapters defined what governed AI agent deployment requires. This chapter defines how — the technical architecture that makes Level 4 governance possible without sacrificing the speed and flexibility that make AI agents valuable in the first place.
The architecture is organized into four layers, each addressing a distinct concern:
| Layer | Concern | Components |
|---|---|---|
| Execution Layer | How agents run and interact with tools | Agent executor, MCP tool registry, sandbox providers |
| Security Layer | How every LLM call is secured | LLM Gateway pipeline (10-stage) |
| Governance Layer | How policy flows through the hierarchy | Cascading policy resolver, governance packs, autonomy levels |
| Data Layer | How data is stored, encrypted, and located | Envelope encryption, BYOS, BYOK, memory pointers |
The LLM Gateway Pipeline
Every LLM call — whether from an AI agent executing a task, a coding session generating code, or a direct user request — passes through a 10-stage security pipeline. The gateway is not optional or configurable — it is the only path to the LLM.
Stage details
| # | Stage | What It Does | On Failure |
|---|---|---|---|
| 1 | Budget Check | Verifies tenant hasn't exceeded spending limits. Per-call, per-agent, and per-tenant budgets. | Request blocked with 429. Agent receives budget error. |
| 2 | Prompt Injection Detection | 5 structural patterns: fake system headers, document boundary markers, HR-separator overrides, XML section tags, JSON role injection. Plus semantic analysis. | Request blocked. Logged as security event. Agent receives sanitized error. |
| 3 | PII Redaction | Detects and masks personal data (names, emails, SSNs, credit cards, phone numbers) before content reaches the LLM. 16+ pattern categories. | PII replaced with tokens. Original values stored for restoration. |
| 4 | Content Moderation (input) | Checks input against safety policies. Configurable thresholds per governance pack (HIPAA = stricter). | Request blocked or flagged depending on enforcement mode. |
| 5 | Audit (pre-call) | Records sanitized input, model, context, and pipeline state. Pipeline hash begins. | Always succeeds (non-blocking). |
| 6 | LLM Call | Routed to appropriate provider (6 supported: Anthropic, OpenAI, Google, Mistral, Groq, vLLM self-hosted). BYOK keys used. | Provider circuit breaker. Fallback to alternate model if configured. |
| 7 | Output Validation | Checks response for policy violations, credential leaks, and formatting requirements. | Response sanitized. Violations logged. |
| 8 | Content Moderation (output) | Verifies response safety. Catches harmful content the LLM might generate. | Response blocked or redacted. |
| 9 | PII Restore | Re-inserts original PII values into response for the authorized user. LLM never saw the real PII. | Tokens left unreplaced (safe degradation). |
| 10 | Token Billing + Audit (post-call) | Records usage, cost, latency. Completes pipeline hash. HMAC-signed audit entry. | Always succeeds (non-blocking). |
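The stage table above can be sketched as an ordered chain of blocking and non-blocking stages. This is a minimal illustration with a few representative stages and hypothetical function names — the real stage implementations are assumed, not shown:

```python
from dataclasses import dataclass, field

@dataclass
class CallContext:
    prompt: str
    response: str = ""
    blocked: bool = False
    log: list = field(default_factory=list)

# Illustrative stage bodies. Blocking stages may halt the call;
# audit/billing stages never do.
def budget_check(ctx):
    ctx.log.append("budget ok")

def prompt_injection(ctx):
    if "### SYSTEM:" in ctx.prompt:   # stand-in for one structural pattern
        ctx.blocked = True
        ctx.log.append("injection blocked")

def audit_pre(ctx):
    ctx.log.append("audit pre")

def llm_call(ctx):
    ctx.response = "model output"

def audit_post(ctx):
    ctx.log.append("audit post")

STAGES = [
    ("budget_check", budget_check, True),       # (name, fn, blocking)
    ("prompt_injection", prompt_injection, True),
    ("audit_pre", audit_pre, False),
    ("llm_call", llm_call, True),
    ("audit_post", audit_post, False),
]

def run_pipeline(prompt: str) -> CallContext:
    ctx = CallContext(prompt)
    for name, fn, blocking in STAGES:
        fn(ctx)
        if blocking and ctx.blocked:
            break   # a blocked call never reaches the model
    return ctx
```

The ordering matters: because injection detection sits before the LLM call, a blocked request leaves `response` empty and the block itself is still audited.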
The pipeline hash is the integrity proof
A SHA-256 hash is computed cumulatively across all 10 stages. The final hash is stored in the audit log alongside the HMAC signature. If any stage is skipped (e.g., someone disables prompt injection detection for "performance"), the hash chain breaks — and the discrepancy is detectable during audit review. You can prove that every security stage ran.
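The cumulative-hash idea can be shown in a few lines. The exact construction (what each stage folds into the hash, and the HMAC key handling) is an assumption here — the point is only that omitting any stage changes the final digest:

```python
import hashlib, hmac

# Assumed construction: each stage that runs folds its name into a
# running SHA-256; the real pipeline would also fold in stage results.
def pipeline_hash(stages_run: list[str]) -> str:
    h = hashlib.sha256()
    for stage in stages_run:
        h.update(stage.encode())
    return h.hexdigest()

ALL_STAGES = ["budget", "injection", "pii", "moderation_in", "audit_pre",
              "llm_call", "output_validation", "moderation_out",
              "pii_restore", "billing_audit"]

full = pipeline_hash(ALL_STAGES)
skipped = pipeline_hash([s for s in ALL_STAGES if s != "injection"])
assert full != skipped   # a skipped stage is detectable at audit review

# The final hash is stored alongside an HMAC signature over the audit
# entry (per-tenant key shown here is hypothetical).
AUDIT_KEY = b"per-tenant-audit-secret"
signature = hmac.new(AUDIT_KEY, full.encode(), hashlib.sha256).hexdigest()
```

An auditor who recomputes the expected hash for a fully-run pipeline and compares it to the stored value can detect a disabled stage without trusting the pipeline's own logs.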
Cascading Governance
Policy doesn't live in one place. It cascades through a 4-level hierarchy, with the most restrictive level winning at any point of conflict:
What cascades
| Policy Dimension | What It Controls | Example |
|---|---|---|
| Autonomy level | How independently agents can act | proactive (propose + human approves) vs autonomous (act without approval) vs reactive (only when asked) |
| Artifact governance | Who can create/modify schedules, workflows, triggers, prompts | autonomous (agents create freely) vs proactive (propose, human approves) vs locked (manifest-only) |
| Discovery capabilities | What agents can discover on their own | 5 toggles: tool discovery, data discovery, memory creation, skill suggestion, auto tool sync |
| Enforcement mode | How authorization denials are handled | audit → warn → enforce |
| Governance packs | Which compliance modules are active | GDPR pack enables PII redaction, consent tracking, data export. HIPAA pack enables PHI detection, encryption, access controls. |
The cascade is resolved at request time with a 60-second TTL cache. Policy changes propagate within one minute. No restart required.
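Most-restrictive-wins resolution with a TTL cache can be sketched as follows. The level names, strictness ordering, and cache keying are assumptions for illustration:

```python
import time

# Higher strictness value = more restrictive; the strictest level wins.
STRICTNESS = {"autonomous": 0, "proactive": 1, "reactive": 2}
HIERARCHY = ["platform", "tenant", "team", "agent"]

_cache: dict[str, tuple[float, str]] = {}
TTL = 60.0   # policy changes propagate within one minute

def resolve_autonomy(policies: dict[str, str], agent_id: str) -> str:
    now = time.monotonic()
    hit = _cache.get(agent_id)
    if hit and now - hit[0] < TTL:
        return hit[1]
    # collect whatever levels set a policy, keep the most restrictive
    levels = [policies[l] for l in HIERARCHY if l in policies]
    result = max(levels, key=lambda v: STRICTNESS[v])
    _cache[agent_id] = (now, result)
    return result

resolve_autonomy({"tenant": "proactive", "agent": "autonomous"}, "a1")
# → "proactive": the tenant's stricter setting overrides the agent's
```

Because resolution happens at request time against cached policy, a change made at the tenant level takes effect on every agent beneath it within one TTL window, without redeploying anything.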
Envelope Encryption
Content encryption at rest uses a 3-level envelope encryption architecture. This is the same pattern used by AWS, GCP, and Azure for their managed encryption services.
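The 3-level structure can be illustrated with a toy sketch. The `wrap` stand-in below is deliberately NOT a real cipher (a production system would use AES-GCM or similar via a KMS) — it only shows which key protects which:

```python
import os, hashlib

# Placeholder keystream "cipher" for structural illustration only.
def wrap(key: bytes, plaintext: bytes) -> bytes:
    stream = hashlib.sha256(key).digest() * (len(plaintext) // 32 + 1)
    return bytes(a ^ b for a, b in zip(plaintext, stream))

unwrap = wrap   # an XOR stream is its own inverse

root_key   = os.urandom(32)   # level 1: held by the KMS provider
tenant_kek = os.urandom(32)   # level 2: per-tenant key-encryption key
dek        = os.urandom(32)   # level 3: per-object data-encryption key

stored = {
    "wrapped_kek": wrap(root_key, tenant_kek),    # KMS wraps the tenant KEK
    "wrapped_dek": wrap(tenant_kek, dek),         # KEK wraps the DEK
    "ciphertext":  wrap(dek, b"report content"),  # DEK encrypts the content
}

# Decryption unwraps in reverse: root -> KEK -> DEK -> content.
kek = unwrap(root_key, stored["wrapped_kek"])
plaintext = unwrap(unwrap(kek, stored["wrapped_dek"]), stored["ciphertext"])
assert plaintext == b"report content"
```

The design payoff is key rotation: rotating the tenant KEK means re-wrapping one DEK per object, not re-encrypting every object's content.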
What triggers encryption
Encryption is not a toggle — it's governance-driven. When a HIPAA or GDPR governance pack is enabled on a tenant, encryption auto-activates for:
| Content Type | Encrypted | Rationale |
|---|---|---|
| Report content & summary | Yes | May contain PII/PHI from agent analysis |
| Agent memory content | Yes | May contain learned facts about individuals |
| Memory embeddings | No | Lossy vector projections. Can't reconstruct content. Preserves semantic search. |
| Context graph descriptions | Yes | Entity descriptions may reference people/companies |
| Context graph embeddings | No | Same rationale as memory embeddings |
| File parsed content | Yes | Uploaded documents may contain sensitive data |
KMS provider options
| Provider | Use Case | Key Storage |
|---|---|---|
| Platform-managed | Default. Keys derived from ENCRYPTION_KEY. | Platform infrastructure |
| AWS KMS | Enterprise. Customer-managed keys in AWS. | AWS Key Management Service |
| GCP Cloud KMS | Enterprise. Customer-managed keys in GCP. | Google Cloud KMS |
| Azure Key Vault | Enterprise. Customer-managed keys in Azure. | Azure Key Vault |
| Local (air-gapped) | On-premise. No external KMS dependency. | Software HSM with scrypt-derived per-tenant keys |
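For the air-gapped option, per-tenant key derivation via scrypt can be sketched with the standard library. The cost parameters, salt handling, and input layout here are assumptions, not the platform's actual values:

```python
import hashlib

# Derive a distinct 32-byte key per tenant from one master secret.
# n/r/p are illustrative scrypt cost parameters (~16 MB of memory).
def derive_tenant_key(master_secret: bytes, tenant_id: str, salt: bytes) -> bytes:
    return hashlib.scrypt(
        master_secret + tenant_id.encode(),
        salt=salt, n=2**14, r=8, p=1, dklen=32,
    )

k1 = derive_tenant_key(b"master-secret", "tenant-a", b"deployment-salt")
k2 = derive_tenant_key(b"master-secret", "tenant-b", b"deployment-salt")
assert k1 != k2 and len(k1) == 32   # per-tenant isolation from one secret
```

Because derivation is deterministic, the air-gapped deployment needs no external key store: the same master secret always reproduces each tenant's key.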
Kill Switch Hierarchy
When something goes wrong, speed matters more than process. The kill switch provides three levels of emergency halt, each cascading downward:
The kill switch is implemented as a governance module — it inherits the cascading policy system. A tenant-level kill switch overrides any team or agent-level setting. This ensures that emergency containment is always possible, even if a team has misconfigured its own governance.
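The override semantics reduce to a simple rule — a halt at any level on the agent's path stops the agent. A minimal sketch (level and field names are assumptions):

```python
# switches maps each level on an agent's hierarchy path to its
# kill-switch state; any engaged switch above the agent halts it.
def is_halted(switches: dict[str, bool]) -> bool:
    return any(switches.get(level, False) for level in ("tenant", "team", "agent"))

# A tenant-level halt wins even if team and agent switches are off:
assert is_halted({"tenant": True, "team": False, "agent": False})
assert not is_halted({"tenant": False, "team": False, "agent": False})
```

Note the asymmetry: a lower level can never *clear* a halt set above it, which is exactly the property that makes containment work against misconfigured team governance.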
Data Residency (BYOS)
Enterprise customers can store data in their own infrastructure. The platform stores pointers — not the data itself.
BYOS is not just "store files in S3." It's an architectural pattern where the platform never holds the data. Memory content, reports, generated documents, and file attachments all flow through the customer's storage. The platform holds metadata: paths, content hashes, sync status, and encryption key references.
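A pointer record might look like the following sketch. The field names and values are illustrative assumptions — the point is what the platform holds (location, hash, status, key reference) versus what it never holds (the bytes):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StoragePointer:
    storage_uri: str         # path in the customer's own bucket
    content_sha256: str      # content hash for integrity and sync checks
    sync_status: str         # e.g. "synced", "pending", "conflict"
    encryption_key_ref: str  # reference into the customer's KMS, not a key

ptr = StoragePointer(
    storage_uri="s3://customer-bucket/reports/q3-analysis.pdf",
    content_sha256="9f2b" + "0" * 60,   # illustrative digest
    sync_status="synced",
    encryption_key_ref="arn:aws:kms:eu-west-1:111122223333:key/example",
)
```

If the platform is breached, an attacker obtains paths and hashes — not documents — and the key reference is useless without access to the customer's KMS.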
Multi-Provider LLM Routing
No vendor lock-in. The platform supports 6 LLM providers with automatic routing based on model name. BYOK (Bring Your Own Key) is mandatory at all tiers — customers provide their own API keys.
| Provider | Models | Key Feature |
|---|---|---|
| Anthropic | Claude Opus 4.6, Sonnet 4.6, Haiku 4.5 | Primary. Extended thinking. Best for complex reasoning. |
| OpenAI | GPT-4o, GPT-4.1, o3, o1 | Broad model range. Content moderation API. |
| Google | Gemini 2.5 Pro, 2.5 Flash, 2.0 Flash | Large context windows. Multimodal. |
| Mistral | Mistral Large, Codestral | European provider. EU data residency. |
| Groq | Llama 4 Scout, Llama 3.3 | Ultra-fast inference. Cost-effective for high-volume. |
| vLLM (self-hosted) | DeepSeek R1, Qwen 2.5, Qwen3-Coder | Full control. No data leaves your infrastructure. |
Model selection is per-agent, per-team, or per-task. Model aliases (claude-sonnet-latest) resolve at deploy time. When a model is deprecated, the lifecycle manager re-routes agents automatically.
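Name-based routing plus alias resolution can be sketched as a prefix lookup. The alias target and prefix table below are assumptions modeled on the provider table above:

```python
# Aliases resolve to concrete model IDs before routing (target is assumed).
ALIASES = {"claude-sonnet-latest": "claude-sonnet-4-6"}

# First matching prefix wins; table mirrors the six supported providers.
PREFIX_TO_PROVIDER = {
    "claude": "anthropic", "gpt": "openai", "o1": "openai", "o3": "openai",
    "gemini": "google", "mistral": "mistral", "codestral": "mistral",
    "llama": "groq", "deepseek": "vllm", "qwen": "vllm",
}

def route(model: str) -> tuple[str, str]:
    model = ALIASES.get(model, model)
    for prefix, provider in PREFIX_TO_PROVIDER.items():
        if model.lower().startswith(prefix):
            return provider, model
    raise ValueError(f"unknown model: {model}")

assert route("claude-sonnet-latest") == ("anthropic", "claude-sonnet-4-6")
assert route("gemini-2.5-pro")[0] == "google"
```

Deprecating a model then becomes a one-line change to the alias map: agents pinned to the alias pick up the replacement at their next deploy without any per-agent edits.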
Agent Execution Model
Agents execute in a standard ReAct loop: the LLM thinks, calls a tool, observes the result, and decides what to do next. Every tool call in this loop passes through the authorization check and the LLM Gateway.
The execution loop has built-in safety limits: 100 tool calls per session (configurable), cost ceiling per session ($5.00 default), and loop detection (same tool called 5+ times with >80% argument similarity triggers circuit breaker).
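The loop-detection circuit breaker can be sketched as a sliding window over recent calls. The similarity metric below (difflib ratio over serialized arguments) is an assumption — the source doesn't specify how argument similarity is measured:

```python
from collections import deque
from difflib import SequenceMatcher

THRESHOLD = 5      # same tool called 5+ times ...
SIMILARITY = 0.8   # ... with >80% argument similarity trips the breaker

class LoopDetector:
    def __init__(self):
        self.recent: dict[str, deque] = {}

    def record(self, tool: str, args: str) -> bool:
        """Record a tool call; return True if the breaker should trip."""
        calls = self.recent.setdefault(tool, deque(maxlen=THRESHOLD))
        calls.append(args)
        if len(calls) < THRESHOLD:
            return False
        base = calls[0]
        sims = [SequenceMatcher(None, base, a).ratio() for a in calls]
        return all(s > SIMILARITY for s in sims)

det = LoopDetector()
for _ in range(4):
    assert not det.record("search", '{"query": "q3 revenue"}')
assert det.record("search", '{"query": "q3 revenue"}')  # 5th identical call trips
```

Similarity-based matching is the important design choice: an agent retrying with trivially varied arguments ("q3 revenue" vs "Q3 revenue report") is still looping, even though no two calls are byte-identical.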
Chapter Summary
The reference architecture implements the Five Pillars as running infrastructure: the LLM Gateway secures every model interaction with a 10-stage pipeline and cumulative hash integrity proof. Cascading governance resolves policy in real-time through a 4-level hierarchy. Envelope encryption protects content at rest with governance-triggered activation. The kill switch provides instant emergency containment at any hierarchy level. BYOS keeps data in the customer's infrastructure with pointer-only storage on the platform side.
The next chapter translates this architecture into an Implementation Playbook — a phased rollout plan with role-by-role guidance for CISOs, CIOs, platform teams, and business owners.