Predicting which expensive drugs continue, stop, or switch
A structured AI agent system for drug-level pharmacy cost prediction — inspectable reasoning, multi-model benchmarking against actuals, and a pharmacist feedback loop.
Per-drug prediction unit
Benchmarked against real outcomes
Traceable reasoning per prediction
Pharmacist review loop
01 — Problem
Why standard models fail on the members that matter most
Super claimants — members above $5k/month in pharmacy spend — represent less than 0.4% of a typical population but often 40–60% of total pharmacy cost. They are exactly the cohort where trend models and gradient boosting break down.
<0.4%
Super claimants in a typical population
Members above $5k/month in pharmacy spend. Far too few data points for a supervised ML model to learn from. XGBoost sees 3–8 historical members per drug-therapy combination — not enough signal to generalize.
40–60%
Share of total pharmacy cost they drive
A model that gets super claimants wrong by 30% introduces more error than getting the entire rest of the population wrong by 5%. Accuracy at the tail is what determines actuarial reliability.
Why XGBoost / gradient boosting fails here
Can't generalize from sparse training data
Supervised ML needs historical examples to learn from. For a member on Omnitrope (growth hormone), there are simply not enough prior cases in any single plan's claims history to train a model that generalizes. The model either overfits to noise or regresses to the mean — both catastrophic on high-cost members.
Why an AI agent with pharmacy knowledge works
Reasons from clinical context, not pattern frequency
An LLM with embedded pharmacological knowledge doesn't need historical examples to reason about a patient — it draws on clinical understanding of how drugs actually work. In one case, a patient on Ocrevus took a drug with a loading regime: $3k/day for 14 days, then $200/day for 6 months. A naive model averaged across all fills and projected $550k for the year. The agent recognized the loading phase was complete and predicted $73k — a difference of nearly $477k on a single member.
| Outcome | Definition | Cost implication | Typical signal |
| --- | --- | --- | --- |
| continue | Continues at similar dose and frequency | Full annual cost | High adherence, stable fills, no competing claims |
| stop | Therapy discontinued | Partial year or $0 | Coverage gap, biosimilar switch |
| taper | Dose decreasing systematically | Reduced dose × partial year | Downward titration, end of loading period |
| switch | Replaced by alternative | Cost of destination drug | New fill in same class |
02 — Signals
Modeling signals
Eight signal categories, computed from claims and included in the per-drug structured summary.
Behavioral
Adherence Rate
The extent to which a patient takes their medication as prescribed.
Behavioral
Fill Regularity
Variance in fill intervals. Erratic patterns signal adherence risk.
Clinical
Dosage Progression
Titration history. Upward trends signal optimization; stability predicts continuation.
Claims
Coverage Gaps
Gap length and restart pattern are key discontinuation signals.
Drug
Specialty / Generic Status
Specialty drugs without generics have higher continuation inertia.
Financial
Copay Burden
Cost share as fraction of estimated income. High burden predicts drop-off.
Clinical
Therapy Switch Signals
A new fill in the same drug class shortly after a gap signals a likely switch.
Expert
Expert Pharmaceutical Input
Curated guidance from pharmacists and clinical specialists injected into the model.
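The behavioral signals above fall out directly from fill records. A minimal sketch, assuming fills arrive as (date, days_supply) pairs anchored at the first fill; the pipeline's exact signal definitions may differ:

```python
from datetime import date
from statistics import pstdev

def adherence_pdc(fills: list[tuple[date, int]], window_days: int = 365) -> float:
    """Proportion of days covered: days with medication on hand / window length."""
    covered: set[int] = set()
    for fill_date, days_supply in fills:
        start = (fill_date - fills[0][0]).days
        covered.update(range(start, start + days_supply))
    return min(len(covered), window_days) / window_days

def fill_regularity(fills: list[tuple[date, int]]) -> float:
    """Std deviation of intervals between fills, in days. Higher = more erratic."""
    dates = sorted(f[0] for f in fills)
    intervals = [(b - a).days for a, b in zip(dates, dates[1:])]
    return pstdev(intervals) if len(intervals) > 1 else 0.0

# Three perfectly regular 28-day fills: 84 covered days, zero interval variance.
fills = [(date(2024, 1, 1), 28), (date(2024, 1, 29), 28), (date(2024, 2, 26), 28)]
```

A real adherence measure would also handle overlapping fills and the measurement window's endpoints; this shows only the core computation.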
03 — Workflow
Model pipeline
Claims are transformed into structured per-drug inputs before any model call. The agent operates on typed data — not raw claims. Drugs above $10/day are selected for prediction. A judge reviews every prediction for logical consistency.
🗃️
Claims
History
12–36 mo NDC
⚗️
Drug
Selection
Cost threshold
📋
Per-Drug
Summary
Structured JSON
🤖
AI Agent
Predictions
Multiple calls/drug
⚖️
Judge
Review
Every prediction
Logical consistency
🔗
Cross-Drug
Aggregation
Context synthesis
📊
Structured
JSON Output
Days · Cost/day
Dose · Confidence
Per-Drug Structured Summary
Each drug's history is condensed into a typed object: fill dates, quantities, days supply, cost shares, NDC dosage, and gap events. Consistent input format across all model runs.
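One plausible shape for that typed object — field names here are illustrative, not the production schema:

```python
from dataclasses import dataclass, field

@dataclass
class GapEvent:
    start: str          # ISO date the coverage gap began
    length_days: int
    restarted: bool     # did therapy resume after the gap?

@dataclass
class DrugSummary:
    """Typed per-drug input the agent receives, never raw claims."""
    ndc: str
    drug_name: str
    fill_dates: list[str]
    quantities: list[float]
    days_supply: list[int]
    cost_share: list[float]            # member copay per fill
    dosage: str                        # NDC-level strength, e.g. "40 MG / 0.8 ML"
    gaps: list[GapEvent] = field(default_factory=list)
```

Because every model run sees the same typed structure, predictions stay comparable across model versions and benchmark runs.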
Multi-Agent Judge Step
A judge runs on every prediction — reviewing all AI agent outputs for the same drug and selecting the most logically consistent result.
Cross-Drug Aggregation
A final pass considers all drug predictions for a member simultaneously — detects contradictions, produces a coherent member-level cost total.
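The judge's selection step could be sketched as follows. In the real pipeline the judge is itself a model pass reviewing reasoning traces; this stand-in picks the run closest to the consensus of all agent calls for a drug, purely to illustrate the selection mechanics:

```python
from statistics import median

def judge(predictions: list[dict]) -> dict:
    """Pick the agent run closest to the consensus of all calls for one drug.
    Stand-in for 'most logically consistent'; the production judge is an LLM pass."""
    med_days = median(p["expected_days"] for p in predictions)
    med_cost = median(p["cost_per_day"] for p in predictions)
    return min(
        predictions,
        key=lambda p: abs(p["expected_days"] - med_days) / 365
        + abs(p["cost_per_day"] - med_cost) / max(med_cost, 1e-9),
    )

runs = [
    {"expected_days": 332, "cost_per_day": 57.0},
    {"expected_days": 330, "cost_per_day": 55.0},
    {"expected_days": 30,  "cost_per_day": 57.0},   # outlier run
]
```

The outlier run (30 days) is far from the consensus on days-on-therapy and loses to the run nearest the median on both dimensions.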
Output — per drug, per member
Expected days on therapy
How many days next year the member is predicted to remain on this drug
Expected dosage
The dose level anticipated to continue — accounting for titration or taper
Expected cost per day
Projected daily drug cost, based on predicted dose and supply pattern
Confidence & reasoning
A confidence level (High / Med / Low) with a full reasoning trace tied to specific claims signals
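Put together, a single per-drug record might look like this — field names and values are illustrative, not the exact production schema:

```python
# Illustrative per-drug output record; field names and values are assumptions.
prediction = {
    "drug_name": "Yusimry",
    "expected_days": 332,                    # expected days on therapy next year
    "expected_dosage": "40 MG / 0.8 ML every other week",
    "expected_cost_per_day": 57.0,           # projected daily drug cost
    "confidence": "Medium",
    "reasoning": "Product switch from Humira; class-level adherence carried forward.",
}

# A member-level annual estimate falls out directly:
annual_cost = prediction["expected_days"] * prediction["expected_cost_per_day"]
```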
04 — Benchmarks
Model selection
All configurations evaluated against the same held-out population. MAE and RMSE in $/month.
| Model | MAE ($/mo) | RMSE ($/mo) |
| --- | --- | --- |
| Rx Sentinel v2.1 ⭐ Best | $3,684.34 | $5,746.31 |
| Rx Sentinel v1.1 | $4,104.92 | $6,359.87 |
| XGBoost | $4,125.47 | $6,560.24 |
| CART | $4,177.50 | $7,092.07 |
| Rx Sentinel v0.1 | $4,864.47 | $7,738.38 |
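MAE and RMSE here are the standard definitions, computed in $/month against held-out actuals. A minimal sketch on toy numbers:

```python
import math

def mae_rmse(actual: list[float], predicted: list[float]) -> tuple[float, float]:
    """Mean absolute error and root mean squared error, same units as inputs."""
    errs = [p - a for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errs) / len(errs)
    rmse = math.sqrt(sum(e * e for e in errs) / len(errs))
    return mae, rmse

# Toy member-month actual vs. predicted spend, in $/month.
actual = [12000.0, 8000.0, 5500.0]
predicted = [11000.0, 9500.0, 5500.0]
```

RMSE penalizes large misses more heavily than MAE, which is why both are reported: the gap between them indicates how much error is concentrated in a few members.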
Concrete Examples — Why agentic reasoning matters
Humira → Yusimry biosimilar switch
A naive model sees one Yusimry fill and predicts roughly one month of future therapy. The LLM sees a biosimilar switch and correctly preserves ~11 months — at Yusimry cost.
📋 What the model saw
Yusimry (index drug)
Fills on record: 1 fill — dated 10/04 — drug is active at prediction start
Days supplied: 28 days — only Yusimry-specific history available
Dosage: 40 MG / 0.8 ML — pre-filled syringe, every other week
Copay: $0 — zero out-of-pocket
Humira — same class, prior drug
Prior fills: Long continuous history — predating the Yusimry switch
Days supplied: ~336 days — over prior 12-month window
Adherence: ~92% PDC — highly regular fill pattern, no meaningful gaps
Dose: 40 MG / 0.8 ML — stable — no titration changes
Gap to Yusimry: None — therapy uninterrupted across brand change
These signals together point to a single conclusion: the patient didn't start a new drug — they switched brands within the same chronic therapy.
💬 What the LLM actually said
"
The 10/04 Yusimry fill strongly suggests a product switch from Humira rather than initiation of a short course.
"
Starting from the established class adherence (~92% → ~336 days) and discounting slightly for the short Yusimry-specific history, I predict ~332 days on therapy next year.
"
Confidence is medium rather than high: the therapy-level evidence is strong, but Yusimry itself has only one fill on record. A second fill would likely upgrade this to high confidence.
💡 Why this matters
1
A trend model overstates cost — the LLM corrects it
A naive model carries forward last year's Humira spend (~$45k) at trend. It never detects the biosimilar switch, so it overpredicts cost by the full brand premium — every month, for the entire year.
2
The patient switched brands, not therapies
Humira → Yusimry is a formulary substitution. The underlying disease, the dose, and the adherence pattern are unchanged. Treating it as a new drug is a category error.
3
The savings are real — but so is the continuity
The correct projection replaces ~11 months of Humira cost with ~11 months of Yusimry cost. Both the duration and the unit cost change. Predicting zero continuation loses both.
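The days-on-therapy arithmetic from the trace above, reproduced as a sketch. The size of the single-fill discount is an assumed value chosen to match the quoted ~332; the agent's actual discount comes from its reasoning, not a fixed constant:

```python
# Carry class-level adherence across the brand switch (figures from the example).
pdc = 0.92                        # Humira proportion of days covered
class_days = round(pdc * 365)     # ~336 days of therapy at the class level
single_fill_discount = 4          # assumed haircut: only one Yusimry fill on record
expected_days = class_days - single_fill_discount
```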
Example 2 — Loading Regime Detection
Ocrevus — loading phase: $3k/day → $200/day
A patient on Ocrevus took 20 units for 14 days (loading), then 20 units for 6 months (maintenance) — each script at $40k. A naive model averages across all fills and projects $550k for the year. The agent recognizes the loading phase is done and predicts $73k — a difference of nearly $477k on a single member.
📋 What the model saw
Loading phase
Duration: 14 days — short high-intensity initiation period
Units per fill: 20 units — same pack size as maintenance
Script cost: $40k — identical cost per fill
Daily cost: $3k/day — high — driven by dosing frequency
Maintenance phase
Duration: 6 months — ongoing after loading complete
Units per fill: 20 units — same pack — but less frequent dosing
Script cost: $40k — identical cost per fill
Daily cost: $200/day — dramatically lower — loading is done
The fill history looks identical on the surface — same units, same script cost. The only way to know the loading phase is over is to understand the drug.
💬 What the LLM actually said
"
The fill history shows identical pack sizes across both phases — but this drug has a known loading regime. Dosing frequency, not pack size, is what changes.
"
The loading phase is complete. Forward cost should be projected at maintenance dosing — not at the blended average of loading and maintenance fills.
"
Naive averaging of all historical fills produces a significant overestimate. The correct projection is maintenance cost forward: approximately $73k for the year.
💡 Why this matters
1
The naive model can't see what it doesn't know
Averaging per-script or per-day cost across all fills is a reasonable heuristic — unless the drug has a loading regime. Then it systematically overpredicts for every patient who has completed loading.
2
The pack looks the same — the cost doesn't
Same 20-unit fill, same $40k script. A model without drug knowledge has no basis to treat these fills differently. The agent does — because it knows this drug's dosing protocol.
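Once the loading phase is recognized as complete, the phase-aware projection reduces to simple arithmetic (figures from the example):

```python
# Project forward at maintenance dosing only; blending the $3k/day loading
# fills into a historical average is what inflates the naive projection.
maintenance_cost_per_day = 200
projected_annual_cost = maintenance_cost_per_day * 365   # $73,000 for the year
```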
05 — Transparency
Predictions are inspectable
Every prediction carries a reasoning trace tied to specific claims signals. Predictions can be audited, challenged, and overridden.
Sample Members
MBR-03291
3 drug predictions
MBR-00421
3 drug predictions
MBR-01187
2 drug predictions
MBR-03291
Yusimry — 280 days/yr · $19k est. cost · Confidence: Med
Budesonide — 30 days/yr · $0k est. cost · Confidence: Low
Humira (2 Pen) — 0 days/yr · $0k est. cost · Confidence: Med
06 — Feedback Loop
Pharmacist review & model improvement
Pharmacists review flagged predictions and annotate reasoning errors. Corrections are structured annotations that feed directly into model evaluation and prompt refinement.
🤖
Model prediction
With full trace
🚩
Flagged for review
💊
Pharmacist reviews
Clinical assessment
✏️
Correction logged
Structured annotation
📈
Next version improves
Used in eval & prompts
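A correction annotation could be structured along these lines — field names are assumptions, not the production schema:

```python
from dataclasses import dataclass

@dataclass
class PharmacistCorrection:
    """Illustrative annotation shape; the real schema may differ."""
    member_id: str
    drug_name: str
    predicted_outcome: str     # e.g. "stop"
    corrected_outcome: str     # e.g. "switch"
    error_category: str        # e.g. "missed biosimilar switch"
    note: str                  # free-text clinical rationale
```

Because corrections are typed rather than free-form comments, they can be aggregated by error category and turned directly into evaluation cases for the next model version.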
07 — Drug Dossiers
Curated drug context
For high-priority drugs, structured dossiers — continuation patterns, switch triggers, model guidance — are injected alongside claims data to guide reasoning.
Humira (adalimumab)
Ozempic / Wegovy (semaglutide)
Ibrutinib (Imbruvica)
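A dossier for one of these drugs might be structured like this. Contents are illustrative, drawn from the Humira → Yusimry example earlier; the real dossiers are pharmacist-curated:

```python
# Hypothetical dossier shape; fields mirror the categories named above.
humira_dossier = {
    "drug": "Humira (adalimumab)",
    "continuation_patterns": [
        "Chronic autoimmune therapy with high continuation inertia once stable",
    ],
    "switch_triggers": [
        "Biosimilar availability (e.g. Yusimry) via formulary substitution",
    ],
    "model_guidance": [
        "A single biosimilar fill after continuous Humira history is a brand "
        "switch, not a new therapy",
    ],
}
```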
Summary
What makes Rx Sentinel different
Structured prediction workflow
Claims are typed into per-drug summaries before any model call. Controlled inputs, typed outputs. Purpose-built pipeline, not prompt engineering on raw data.
Benchmarked against real outcomes
Every model configuration is evaluated on held-out actuals using MAE and RMSE in $/month. The model in production is the one that earned it.
Inspectable reasoning
Every prediction includes a reasoning trace tied to specific claims signals. Auditable, challengeable, overrideable.
Expert feedback loop
Pharmacists annotate reasoning errors in structured form. Those annotations become evaluation signal. The system improves as it is used.
Cost-efficient at scale
Rx Sentinel v2.1 costs under $0.05 per analyzed super claimant, and v1.1 under $0.02, enabling large-scale deployment without prohibitive inference costs.