Keep policy enforcement outside the prompt
Architecture review dossier
This is the portfolio's system-level position: explicit decision authority, enforceable boundaries, measurable nonfunctional requirements, inspectable evidence, and operating ownership beyond launch.
A provider-portable architecture for healthcare RAG and agent workflows. Identity, PHI handling, policy, approval, evaluation, and evidence are platform contracts rather than optional prompt behavior.
Scope statement: this is a public reference architecture, not client code. It composes production-derived patterns into an inspectable system and uses synthetic data for verification.
These decisions make the operating philosophy concrete and expose the costs accepted in exchange for control, portability, and evidence.
Choice: Deny-by-default tool-boundary policy engine
Trade-off: Adds orchestration latency and policy maintenance, but makes authorization deterministic, testable, and auditable.
Choice: Citations, source metadata, and confidence travel with each answer
Trade-off: Increases payload size and UI complexity, but supports review, dispute, and post-incident reconstruction.
Choice: Durable queue with idempotency, cancellation, retry budgets, and dead-letter recovery
Trade-off: More operational components than synchronous APIs, but safer recovery and backpressure under variable model latency.
Choice: Explicit escalation records rather than informal chat handoffs
Trade-off: Slower happy paths, but clear accountability for high-risk clinical, regulatory, and content decisions.
Choice: Provider adapters behind evaluation and policy contracts
Trade-off: Constrains provider-specific features, but reduces lock-in and makes regulated change control practical.
Choice: Canonical receipts with hash chaining and verifiable signatures
Trade-off: Requires key management and retention controls, but prevents silent mutation of agent evidence.
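The last decision above, hash-chained receipts, can be illustrated with a minimal sketch. It assumes receipts are JSON objects serialized in a canonical form (sorted keys, fixed separators) and uses HMAC-SHA256 from the standard library as a stand-in for a real signature scheme. The function names, the `GENESIS` constant, and the receipt fields are hypothetical, and a production system would more likely use asymmetric signatures with managed keys.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first receipt


def canonical(payload: dict) -> bytes:
    """Serialize with sorted keys and fixed separators so equal payloads hash equally."""
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def append_receipt(chain: list, event: dict, key: bytes) -> dict:
    """Append a receipt that commits to the event and to the previous receipt's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    body = {"event": event, "prev_hash": prev_hash}
    digest = hashlib.sha256(canonical(body)).hexdigest()
    # HMAC stands in for a signature here; it is an assumption of this sketch.
    signature = hmac.new(key, digest.encode("utf-8"), hashlib.sha256).hexdigest()
    receipt = {"event": event, "prev_hash": prev_hash, "hash": digest, "signature": signature}
    chain.append(receipt)
    return receipt


def verify_chain(chain: list, key: bytes) -> bool:
    """Recompute every hash and signature; any edit or reordering fails verification."""
    prev_hash = GENESIS
    for receipt in chain:
        body = {"event": receipt["event"], "prev_hash": receipt["prev_hash"]}
        digest = hashlib.sha256(canonical(body)).hexdigest()
        expected = hmac.new(key, digest.encode("utf-8"), hashlib.sha256).hexdigest()
        if receipt["prev_hash"] != prev_hash or receipt["hash"] != digest:
            return False
        if not hmac.compare_digest(receipt["signature"], expected):
            return False
        prev_hash = digest
    return True
```

Because each receipt embeds the hash of its predecessor, changing an earlier event invalidates that receipt's own hash, and recomputing the hash breaks the link from the next receipt. That is the property the trade-off line describes as preventing silent mutation.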
Architecture is accepted against measurable operational constraints, not diagrams alone.
| Quality | Target | Architecture response | Verification |
|---|---|---|---|
| Availability | 99.9% platform; 99.99% critical event path | Multi-zone services, queue durability, graceful degradation | Synthetic probes and recovery tests |
| Latency | <2s p95 retrieval; streaming first token <1.5s | Caching, bounded context, asynchronous tools | Load tests and trace percentiles |
| Privacy | No PHI in unapproved model or log paths | Classification, redaction, scoped retrieval, private endpoints | Policy tests and log sampling |
| Recovery | RTO 60 min; RPO 15 min for evidence stores | Cross-zone replicas, immutable backups, replayable events | Quarterly restore exercise |
| Auditability | Reconstruct every consequential action | Signed receipts, model and prompt versions, approval records | Evidence-bundle verification |
| Cost | Per-workflow budget and tenant visibility | Token budgets, model routing, caching, usage attribution | Cost telemetry and threshold alerts |
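To make two of the targets in the table concrete, the sketch below converts an availability percentage into a downtime budget and checks a p95 latency target against trace samples. It assumes a 30-day measurement window and the nearest-rank percentile definition; neither is stated in the table, and the function names are illustrative. Under those assumptions, 99.9% allows about 43.2 minutes of downtime per window and 99.99% allows about 4.32 minutes.

```python
def downtime_budget_minutes(availability_pct: float, days: int = 30) -> float:
    """Minutes of allowed downtime in the window (the 30-day default is an assumption)."""
    total_minutes = days * 24 * 60
    return total_minutes * (100 - availability_pct) / 100


def percentile_nearest_rank(samples: list, pct: int) -> float:
    """Nearest-rank percentile: the value at rank ceil(pct * n / 100) in sorted order."""
    if not samples:
        raise ValueError("at least one sample is required")
    ordered = sorted(samples)
    rank = (pct * len(ordered) + 99) // 100  # integer ceiling avoids float rounding
    return ordered[max(rank, 1) - 1]


def meets_latency_target(samples: list, pct: int = 95, target_seconds: float = 2.0) -> bool:
    """True when the chosen percentile is strictly below the target, as in '<2s p95'."""
    return percentile_nearest_rank(samples, pct) < target_seconds
```

A load-test harness could feed recorded retrieval latencies into `meets_latency_target` and fail the run when it returns `False`.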
Residual risk remains visible and owned. Controls reduce likelihood and blast radius; they do not turn probabilistic systems into risk-free systems.
Scenario: Untrusted retrieved content attempts to redirect tools
Controls: Content isolation, instruction hierarchy, policy enforcement at tools
Scenario: Sensitive data reaches logs, models, or unauthorized users
Controls: Classification, redaction, tenant filters, private endpoints
Scenario: Identity or metadata defect exposes another tenant
Controls: Attribute-based access, namespace isolation, adversarial tests
Scenario: Agent invokes a valid tool outside intended purpose
Controls: Deny-by-default policies, argument validation, approval thresholds
Scenario: Trace or approval history is changed after execution
Controls: Canonical signed receipts, hash chains, immutable retention
Scenario: Provider behavior changes without controlled acceptance
Controls: Pinned versions, regression suites, canary release, rollback
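The controls listed for the "valid tool outside intended purpose" scenario (deny-by-default policies, argument validation, approval thresholds) can be sketched as a single decision function that runs outside the prompt. Everything named here is hypothetical: the `Rule` and `Decision` types, the `search_guidelines` tool, the role names, and the integer risk score are assumptions made for illustration, not part of the reference architecture.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet


@dataclass(frozen=True)
class Rule:
    allowed_roles: FrozenSet[str]
    validate_args: Callable[[dict], bool]
    approval_risk_threshold: int  # risk scores at or above this pause for approval


@dataclass(frozen=True)
class Decision:
    outcome: str  # "allow", "deny", or "needs_approval"
    reason: str


def decide(rules: Dict[str, Rule], role: str, tool: str, args: dict, risk: int) -> Decision:
    """Deny unless a rule exists for the tool and every check passes."""
    rule = rules.get(tool)
    if rule is None:
        return Decision("deny", "no rule registered for tool")
    if role not in rule.allowed_roles:
        return Decision("deny", "role not permitted for tool")
    if not rule.validate_args(args):
        return Decision("deny", "arguments failed validation")
    if risk >= rule.approval_risk_threshold:
        return Decision("needs_approval", "risk meets approval threshold")
    return Decision("allow", "rule matched")


# Hypothetical rule set: one read-only tool with a bounded query argument.
RULES = {
    "search_guidelines": Rule(
        allowed_roles=frozenset({"clinician", "analyst"}),
        validate_args=lambda a: isinstance(a.get("query"), str) and 0 < len(a["query"]) <= 500,
        approval_risk_threshold=8,
    ),
}
```

Since the function is pure and takes explicit inputs, each branch can be covered by a unit test, which is one way to read the "deterministic, testable, and auditable" claim made for this design.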
Planning ranges let teams discuss workload assumptions and FinOps choices before vendor selection. Actual pricing depends on model, region, retention, and support requirements.
| Operating tier | Monthly volume | Planning range | Primary controls |
|---|---|---|---|
| Pilot | 5K workflows/mo | $1.5K-$3K/mo | Managed services, smaller models, shared non-production |
| Department | 50K workflows/mo | $9K-$18K/mo | Caching, model routing, reserved database capacity |
| Enterprise | 500K workflows/mo | $65K-$140K/mo | Tenant attribution, regional resilience, dedicated observability |
Planning estimates are modeled reference ranges, not client invoices or guaranteed cloud quotations.
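Dividing each planning range by its monthly volume gives an implied cost per workflow, which can be easier to compare across tiers. The sketch below uses only the figures from the table above; the `TIERS` structure and function name are illustrative.

```python
# (monthly workflows, low USD per month, high USD per month), copied from the table
TIERS = {
    "Pilot": (5_000, 1_500, 3_000),
    "Department": (50_000, 9_000, 18_000),
    "Enterprise": (500_000, 65_000, 140_000),
}


def cost_per_workflow(tier: str) -> tuple:
    """Return the (low, high) implied cost per workflow in USD, rounded to cents."""
    volume, low, high = TIERS[tier]
    return round(low / volume, 2), round(high / volume, 2)


for name in TIERS:
    low, high = cost_per_workflow(name)
    print(f"{name}: ${low:.2f} to ${high:.2f} per workflow")
```

This works out to roughly $0.30 to $0.60 per workflow at the Pilot tier, $0.18 to $0.36 at Department, and $0.13 to $0.28 at Enterprise, so the modeled unit cost falls as volume grows.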
Run a credential-free example to see how identity, policy, human approval, evaluation, and signed evidence change with workflow risk.
Synthetic request with no patient or client data. Choose the risk tier, then inspect the resulting control path.
No model or external service is called. This walkthrough demonstrates the architecture contract with deterministic synthetic evidence.
A five-minute technical review can follow one synthetic request from identity through retrieval, policy, approval, evaluation, signed evidence, and operations.
Identity and tenant claims establish the permitted data and tool scope.
Input is checked for PHI, intent, risk tier, and required approval path.
FHIR and knowledge sources are filtered before evidence enters model context.
The policy engine evaluates each consequential tool call outside the prompt.
High-risk or uncertain work pauses for an accountable human decision.
Scenario and regression gates verify behavior against release criteria.
Trace, policy decisions, versions, and approvals become a verifiable receipt.
SLO, cost, drift, incident, and evidence telemetry feed ongoing ownership.
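The eight steps above can be sketched as an ordered control path in which the human-approval stage is included only for high-risk or uncertain work. The stage labels, the tier names (`low`, `medium`, `high`), and the `uncertain` flag are shorthand assumed for this illustration; the walkthrough itself does not name them.

```python
# Shorthand labels for the eight steps, in the order they are described.
STAGES = [
    "identity",
    "intake",
    "retrieval",
    "policy",
    "approval",
    "evaluation",
    "receipt",
    "operations",
]

RISK_TIERS = {"low", "medium", "high"}  # assumed tier names


def control_path(risk_tier: str, uncertain: bool = False) -> list:
    """Return the stages a synthetic request passes through for a given risk tier."""
    if risk_tier not in RISK_TIERS:
        raise ValueError(f"unknown risk tier: {risk_tier}")
    needs_human = risk_tier == "high" or uncertain
    return [stage for stage in STAGES if stage != "approval" or needs_human]
```

Calling `control_path("low")` yields seven stages with no approval pause, while `control_path("high")` or `control_path("low", uncertain=True)` inserts the approval stage between policy and evaluation.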