Pfizer

Content Automation

Human-in-the-Loop MLR Governance at Global Scale

Problem
Manual MLR reviews created 14-day bottlenecks; claim validation was inconsistent across regions; approvals had no traceability
Solution
RAG-powered content automation with human-in-the-loop governance, reducing review time by 71%
71% faster reviews
82% auto-validation rate
300 assets/month
100% audit traceability

Every piece of content. Every claim. Every region. Manual review.

Pfizer's global content operations span dozens of brands across hundreds of markets. Every promotional asset requires MLR (Medical, Legal, Regulatory) review. Every claim must be validated against approved sources.

The existing process was entirely manual. Content creators submitted assets, waited days for reviewer availability, received feedback via email, made revisions, and resubmitted. Average cycle: 14 days. And no system tracked which version of which claim was approved for which market.

I mapped the entire workflow—every handoff, every approval gate, every potential failure point. The RACI was a mess of overlapping responsibilities.

14-day review cycle
0% automation
8+ stakeholder handoffs
Diagnose

Content Workflow RACI Matrix

Mapping responsibilities across stakeholder groups and core workflow stages (20 assignments).

Roles: CT (Content), AI (Automation), ML (MLR), MD (Medical), LG (Legal), BL (Brand)

  • Content Drafting (Creation): R, C, A
  • Claims Extraction (Validation): I, R, A
  • RAG Validation (Validation): R, A, C
  • MLR Review (Approval): R, C, I, A
  • Legal Sign-off (Approval): C, R, A
  • Publication (Delivery): R, I, I, A

AI acceleration. Human judgment. Full traceability.

The solution isn't to remove humans from MLR review—it's to make their time count. The RAG pipeline pre-validates claims against approved sources, flags potential issues, and routes only the exceptions to human reviewers.

Architecture: content ingestion → claim extraction → RAG validation against approved claims library → confidence scoring → routing. High-confidence validations proceed automatically. Low-confidence items queue for human review.

Ingest → Extract → Validate → Route → Approve
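
A minimal sketch of this ingest → extract → validate → route flow. The claim extractor and similarity scorer are trivial stand-ins (sentence splitting plus string similarity) for the production extraction model and vector retrieval over the approved-claims library; the product name and claims are illustrative.

```python
# Minimal sketch of the ingest -> extract -> validate -> route flow.
# Extractor and scorer are stand-ins, not the production implementation.
from dataclasses import dataclass
from difflib import SequenceMatcher

APPROVED_CLAIMS = [  # stand-in for the approved claims library
    "Product X reduced symptom severity in a 12-week trial.",
    "Product X is generally well tolerated.",
]

@dataclass
class ValidationResult:
    claim: str
    best_match: str
    confidence: float  # 0.0 - 1.0

def extract_claims(asset_text: str) -> list[str]:
    # Stand-in: treat each sentence as a candidate claim.
    return [s.strip() for s in asset_text.split(".") if s.strip()]

def validate(asset_text: str) -> list[ValidationResult]:
    # Score every extracted claim against its closest approved claim.
    results = []
    for claim in extract_claims(asset_text):
        score, match = max(
            (SequenceMatcher(None, claim, approved).ratio(), approved)
            for approved in APPROVED_CLAIMS
        )
        results.append(ValidationResult(claim, match, score))
    return results

def route(results: list[ValidationResult], auto_threshold: float = 0.95) -> str:
    # High-confidence validations proceed automatically; anything else
    # queues for human review.
    if results and all(r.confidence >= auto_threshold for r in results):
        return "auto-approve"
    return "human-review"

print(route(validate("Product X reduced symptom severity in a 12-week trial.")))  # auto-approve
```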
Architect

Content Automation RAG Pipeline

Retrieval-augmented validation with human-in-the-loop governance for low-confidence decisions.

Pipeline stages: Ingestion → Indexing → Retrieval → Generation → Review

  • Relevance: 91% (retrieved doc relevance)
  • Faithfulness: 96% (citations verified)
  • Latency: 280ms (p95 response time)
  • Recall@5: 82% (relevant in top 5)
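
Recall@5 here can be read as the share of queries for which at least one relevant approved document appears in the top five retrieved results; that is one common definition and an assumption on my part. A minimal sketch with an illustrative two-query evaluation set:

```python
# Recall@k: share of queries with at least one relevant doc in the top k.
def recall_at_k(retrieved: list[list[str]], relevant: list[set[str]], k: int = 5) -> float:
    hits = sum(1 for docs, rel in zip(retrieved, relevant) if set(docs[:k]) & rel)
    return hits / len(relevant)

# Illustrative data: the first query hits within the top 5, the second does not.
retrieved = [["d1", "d7", "d3", "d9", "d2"], ["d4", "d5", "d6", "d8", "d0"]]
relevant = [{"d3"}, {"d11"}]
print(recall_at_k(retrieved, relevant))  # 0.5
```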

AI Validation Escalation Framework

Confidence-based routing ensures high-risk decisions always get human oversight.

Confidence bands and routing (share of submissions):
  • 95-100%: Auto-Approve, SLA instant. Direct approval with audit log (42% of submissions).
  • 80-95%: Fast-Track Review, SLA under 2 hours. Expedited human review queue (31% of submissions).
  • 60-80%: Standard Review, SLA under 24 hours. Normal MLR review queue (19% of submissions).
  • 0-60%: Senior Review, SLA under 48 hours. Senior reviewer plus AI explanation (8% of submissions).
Higher confidence means lower human involvement; every decision is logged for the audit trail.
Escalation Triggers:
  • Novel claim not in library → Senior Review
  • Conflicting regional rules → Senior Review
  • Fair balance edge case → Standard Review
  • Template format mismatch → Fast-Track Review
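
A sketch of how the confidence bands and trigger overrides above could be encoded. The tiers, SLAs, and triggers come from the framework; the function name and trigger keys are illustrative, not the production code.

```python
# Confidence-band routing with escalation-trigger overrides.
REVIEW_TIERS = [  # (min_confidence, tier, SLA)
    (0.95, "auto-approve", "instant"),
    (0.80, "fast-track review", "< 2 hours"),
    (0.60, "standard review", "< 24 hours"),
    (0.00, "senior review", "< 48 hours"),
]

TRIGGER_OVERRIDES = {  # a trigger can only escalate, never relax, the tier
    "novel_claim_not_in_library": "senior review",
    "conflicting_regional_rules": "senior review",
    "fair_balance_edge_case": "standard review",
    "template_format_mismatch": "fast-track review",
}

TIER_RANK = {tier: rank for rank, (_, tier, _) in enumerate(REVIEW_TIERS)}
TIER_SLA = {tier: sla for _, tier, sla in REVIEW_TIERS}

def route_review(confidence: float, triggers: tuple[str, ...] = ()) -> tuple[str, str]:
    # Pick the band by confidence, then escalate if any trigger demands it.
    tier = next(t for c, t, _ in REVIEW_TIERS if confidence >= c)
    for trig in triggers:
        forced = TRIGGER_OVERRIDES.get(trig)
        if forced and TIER_RANK[forced] > TIER_RANK[tier]:
            tier = forced
    return tier, TIER_SLA[tier]

print(route_review(0.97))                                   # ('auto-approve', 'instant')
print(route_review(0.97, ("novel_claim_not_in_library",)))  # ('senior review', '< 48 hours')
```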

Sub-second validation. 300+ assets per month.

The RAG pipeline processes content in real-time. Claim extraction runs as content is uploaded. Validation happens immediately. Reviewers see pre-validated content with confidence scores and source citations.

SLOs: p95 validation latency under 2 seconds, 99.5% uptime, complete audit trail for every decision. The system handles 300+ assets per month with capacity to scale 10x without architecture changes.
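
A minimal sketch of the SLO check implied here: compute latency percentiles over a window of validation timings and compare against the stated p95 target. The sample values are illustrative.

```python
# Percentile-based SLO check over a window of validation latencies (values illustrative).
import numpy as np

latencies_ms = np.array([420, 510, 390, 980, 1450, 610, 330, 870, 1120, 540, 760, 1890])
p50, p95 = np.percentile(latencies_ms, [50, 95])
SLO_P95_MS = 2000  # p95 validation latency under 2 seconds

print(f"p50={p50:.0f}ms p95={p95:.0f}ms -> SLO {'met' if p95 <= SLO_P95_MS else 'breached'}")
```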

<2s p95 validation
99.5% uptime
10x scale headroom
Engineer

Validation Latency Distribution

Pipeline latency percentiles showing sub-second performance for claim validation.

  • p50: 42ms (SLO: 50ms)
  • p95: 128ms (SLO: 150ms)
  • p99: 245ms (SLO: 300ms)
  • Requests/s: 1,247
  • Error rate: 0.12%

[Latency-over-time chart, 9:00-14:00]

Anomaly detected: database connection pool saturation, duration ~35 minutes.

Model Performance Metrics

Claims extraction and compliance classification model performance across 12,500 validation samples.

Claims Extraction Model
  • Precision: 91%
  • Recall: 94%
  • F1 Score: 92.5%
  • Model: fine-tuned BERT
  • Training samples: 45,000 labeled claims
  • Support: 12,500 validation samples

Compliance Classification Model
  • Precision: 88%
  • Recall: 92%
  • F1 Score: 90%
  • Model: ensemble (BERT + rules)
  • Classes: Compliant, Non-Compliant, Needs Review
  • Support: 8,200 validation samples
Compliance Classification Confusion Matrix
MLR Validation Model Performance (Claims Extraction & Compliance Classification, n=12,500 samples)

  • True Positive (predicted Compliant, actually Compliant): 7,544 (60.4%)
  • False Positive (predicted Compliant, actually Non-Compliant): 656 (5.2%)
  • False Negative (predicted Non-Compliant, actually Compliant): 408 (3.3%)
  • True Negative (predicted Non-Compliant, actually Non-Compliant): 3,892 (31.1%)

Performance vs. targets:
  • Accuracy: 91.5% (target 90.0%)
  • Precision: 92.0% (target 88.0%)
  • Recall: 94.9% (target 92.0%)
  • F1 Score: 93.4% (target 90.0%)
  • Specificity: 85.6% (target 85.0%)
Key Insights:
  • False Positives (656): Content incorrectly approved — 5.2% of total. Risk: Non-compliant content reaching market.
  • False Negatives (408): Content incorrectly rejected — 3.3% of total. Impact: Unnecessary manual review cycles.
  • Model achieves 92.0% precision, meaning 8.0% of "compliant" predictions require human override.
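
The reported metrics can be reproduced directly from the confusion-matrix counts above (positive class = Compliant):

```python
# Deriving the reported metrics from the confusion-matrix counts above.
tp, fp, fn, tn = 7544, 656, 408, 3892

accuracy    = (tp + tn) / (tp + fp + fn + tn)                 # 0.915
precision   = tp / (tp + fp)                                  # 0.920
recall      = tp / (tp + fn)                                  # 0.949
f1          = 2 * precision * recall / (precision + recall)   # 0.934
specificity = tn / (tn + fp)                                  # 0.856

print(f"accuracy={accuracy:.1%}  precision={precision:.1%}  recall={recall:.1%}  "
      f"f1={f1:.1%}  specificity={specificity:.1%}")
```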

Trust the system. Verify the exceptions.

Rolled out to content teams with a trust-building approach: start with human review of all AI recommendations, gradually increase auto-approval thresholds as confidence grows.

Today, 82% of claims validate automatically. Reviewers focus on the 18% that need human judgment—novel claims, edge cases, regional variations. Their expertise is amplified, not replaced.

Before

  • ✕ 14-day cycles
  • ✕ 100% manual review
  • ✕ Email-based tracking

After

  • ✓ 4-day cycles
  • ✓ 82% auto-validated
  • ✓ Full audit trail
Enable

Operational Metrics — 12 Month Trend

Key performance indicators showing continuous improvement in content velocity and quality.

  • Content Velocity: 120/mo → 300/mo (+150%)
  • Review Time: 14 days → 4 days (-71.4%)
  • Auto-Validation: 15% → 82% (+446.7%)
  • Error Rate: 8.5% → 1.8% (-78.8%)

The Result

Review cycles dropped from 14 days to 4 days—71% faster. Content velocity increased from 120 to 300 assets per month. Error rates fell from 8.5% to 1.8%.

More importantly: every claim, every validation, every approval is now traceable. When regulators ask "how do you know this claim is approved?", there's a complete audit trail with source documents and validation timestamps.
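
A sketch of what a single audit-trail record could capture per validation decision; the field names and example values are illustrative assumptions, not the production schema.

```python
# Illustrative shape of one audit-trail record (not the production schema).
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class AuditRecord:
    asset_id: str
    claim_text: str
    matched_source_doc: str   # approved source the claim was validated against
    confidence: float
    decision: str             # auto-approve / fast-track / standard / senior review
    decided_by: str           # "system" or a reviewer ID
    market: str
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

record = AuditRecord(
    asset_id="ASSET-0001",
    claim_text="Product X is generally well tolerated.",
    matched_source_doc="approved-claims-library/claim-118",
    confidence=0.97,
    decision="auto-approve",
    decided_by="system",
    market="US",
)
print(record.timestamp.isoformat())
```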

  • Review Cycle: 14 days → 4 days
  • Asset Velocity: 120/mo → 300/mo
  • Error Rate: 8.5% → 1.8%
Impact

Lessons Learned

What Worked Well
  • Starting with 100% human review built trust—gradual automation increase was accepted because users saw the AI "learning"
  • Confidence scoring + explanations made AI decisions transparent—reviewers understood why claims were flagged
  • Complete audit trail became a regulatory asset—turned compliance from cost center to competitive advantage
What We'd Do Differently
  • Would have built a "similar claims" suggestion feature earlier—reviewers frequently asked "what did we approve before?"
  • Regional rules engine should have been configurable from day one—each new market required code changes for 4 months
  • Should have invested more in analyzing missed non-compliant claims: a few that were incorrectly auto-approved in month 2 eroded initial trust

"The biggest insight: we're not automating reviewers away. We're giving them superpowers. The best reviewers now handle 3x the volume because they're only seeing the hard cases."