Predicting which expensive drugs continue, stop, or switch
A structured AI agent system for drug-level pharmacy cost prediction — inspectable reasoning, multi-model benchmarking against actuals, and a pharmacist feedback loop.
Per-drug prediction unit
Benchmarked against real outcomes
Traceable reasoning per prediction
Pharmacist review loop
01 — Problem
Why standard models fail on the members that matter most
Super claimants — members above $5k/month in pharmacy spend — represent less than 0.4% of a typical population but often 40–60% of total pharmacy cost. They are exactly the cohort where trend models and gradient boosting break down.
<0.4%
Super claimants in a typical population
Members above $5k/month in pharmacy spend. Far too few data points for a supervised ML model to learn from. XGBoost sees 3–8 historical members per drug-therapy combination — not enough signal to generalize.
40–60%
Share of total pharmacy cost they drive
A model that gets super claimants wrong by 30% introduces more error than getting the entire rest of the population wrong by 5%. Accuracy at the tail is what determines actuarial reliability.
Why XGBoost / gradient boosting fails here
Can't generalize from sparse training data
Supervised ML needs historical examples to learn from. For a member on Omnitrope (growth hormone), there are simply not enough prior cases in any single plan's claims history to train a model that generalizes. The model either overfits to noise or regresses to the mean — both catastrophic on high-cost members.
Why an AI agent with pharmacy knowledge works
Reasons from clinical context, not pattern frequency
An LLM with embedded pharmacological knowledge doesn't need historical examples to reason about a patient — it draws on clinical understanding of how drugs actually work. In one case, a patient on Ocrevus took a drug with a loading regime: $3k/day for 14 days, then $200/day for 6 months. A naive model averaged across all fills and projected $550k for the year. The agent recognized the loading phase was complete and predicted $73k — a difference of nearly $477k on a single member.
| Outcome | Definition | Cost implication | Typical signal |
| --- | --- | --- | --- |
| continue | Continues at similar dose and frequency | Full annual cost | High adherence, stable fills, no competing claims |
| stop | Therapy discontinued | Partial year or $0 | Coverage gap, biosimilar switch |
| taper | Dose decreasing systematically | Reduced dose × partial year | Downward titration, end of loading period |
| switch | Replaced by alternative | Cost of destination drug | New fill in same class |
02 — Signals
Modeling signals
Eight signal categories, computed from claims and included in the per-drug structured summary.
Behavioral
Adherence Rate
The extent to which a patient takes their medication as prescribed.
Behavioral
Fill Regularity
Variance in fill intervals. Erratic patterns signal adherence risk.
Clinical
Dosage Progression
Titration history. Upward trends signal optimization; stability predicts continuation.
Claims
Coverage Gaps
Gap length and restart pattern are key discontinuation signals.
Drug
Specialty / Generic Status
Specialty drugs without generics have higher continuation inertia.
Financial
Copay Burden
Cost share as fraction of estimated income. High burden predicts drop-off.
Clinical
Therapy Switch Signals
A new fill in the same drug class shortly after a gap signals a likely switch.
Expert
Expert Pharmaceutical Input
Curated guidance from pharmacists and clinical specialists injected into the model.
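The behavioral signals above fall out directly from fill records. A minimal sketch, assuming fills arrive as (date, days_supply) pairs anchored at the first fill; the pipeline's exact signal definitions may differ:

```python
from datetime import date
from statistics import pstdev

def adherence_pdc(fills: list[tuple[date, int]], window_days: int = 365) -> float:
    """Proportion of days covered: days with medication on hand / window length."""
    covered: set[int] = set()
    for fill_date, days_supply in fills:
        start = (fill_date - fills[0][0]).days
        covered.update(range(start, start + days_supply))
    return min(len(covered), window_days) / window_days

def fill_regularity(fills: list[tuple[date, int]]) -> float:
    """Std deviation of intervals between fills, in days. Higher = more erratic."""
    dates = sorted(f[0] for f in fills)
    intervals = [(b - a).days for a, b in zip(dates, dates[1:])]
    return pstdev(intervals) if len(intervals) > 1 else 0.0

# Three perfectly regular 28-day fills: 84 covered days, zero interval variance.
fills = [(date(2024, 1, 1), 28), (date(2024, 1, 29), 28), (date(2024, 2, 26), 28)]
```

A real adherence measure would also handle overlapping fills and the measurement window's endpoints; this shows only the core computation.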
03 — Workflow
Model pipeline
Claims are transformed into structured per-drug inputs before any model call. The agent operates on typed data — not raw claims. Drugs above $10/day are selected for prediction. A judge reviews every prediction for logical consistency.
🗃️
Claims
History
12–36 mo NDC
⚗️
Drug
Selection
Cost threshold
📋
Per-Drug
Summary
Structured JSON
🤖
AI Agent
Predictions
Multiple calls/drug
⚖️
Judge
Review
Every prediction
Logical consistency
🔗
Cross-Drug
Aggregation
Context synthesis
📊
Structured
JSON Output
Days · Cost/day
Dose · Confidence
Per-Drug Structured Summary
Each drug's history is condensed into a typed object: fill dates, quantities, days supply, cost shares, NDC dosage, and gap events. Consistent input format across all model runs.
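One plausible shape for that typed object — field names here are illustrative, not the production schema:

```python
from dataclasses import dataclass, field

@dataclass
class GapEvent:
    start: str          # ISO date the coverage gap began
    length_days: int
    restarted: bool     # did therapy resume after the gap?

@dataclass
class DrugSummary:
    """Typed per-drug input the agent receives, never raw claims."""
    ndc: str
    drug_name: str
    fill_dates: list[str]
    quantities: list[float]
    days_supply: list[int]
    cost_share: list[float]            # member copay per fill
    dosage: str                        # NDC-level strength, e.g. "40 MG / 0.8 ML"
    gaps: list[GapEvent] = field(default_factory=list)
```

Because every model run sees the same typed structure, predictions stay comparable across model versions and benchmark runs.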
Multi-Agent Judge Step
A judge runs on every prediction — reviewing all AI agent outputs for the same drug and selecting the most logically consistent result.
Cross-Drug Aggregation
A final pass considers all drug predictions for a member simultaneously — detects contradictions, produces a coherent member-level cost total.
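The judge's selection step could be sketched as follows. In the real pipeline the judge is itself a model pass reviewing reasoning traces; this stand-in picks the run closest to the consensus of all agent calls for a drug, purely to illustrate the selection mechanics:

```python
from statistics import median

def judge(predictions: list[dict]) -> dict:
    """Pick the agent run closest to the consensus of all calls for one drug.
    Stand-in for 'most logically consistent'; the production judge is an LLM pass."""
    med_days = median(p["expected_days"] for p in predictions)
    med_cost = median(p["cost_per_day"] for p in predictions)
    return min(
        predictions,
        key=lambda p: abs(p["expected_days"] - med_days) / 365
        + abs(p["cost_per_day"] - med_cost) / max(med_cost, 1e-9),
    )

runs = [
    {"expected_days": 332, "cost_per_day": 57.0},
    {"expected_days": 330, "cost_per_day": 55.0},
    {"expected_days": 30,  "cost_per_day": 57.0},   # outlier run
]
```

The outlier run (30 days) is far from the consensus on days-on-therapy and loses to the run nearest the median on both dimensions.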
Output — per drug, per member
Expected days on therapy
How many days next year the member is predicted to remain on this drug
Expected dosage
The dose level anticipated to continue — accounting for titration or taper
Expected cost per day
Projected daily drug cost, based on predicted dose and supply pattern
Confidence & reasoning
A confidence level (High / Med / Low) with a full reasoning trace tied to specific claims signals
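Put together, a single per-drug record might look like this — field names and values are illustrative, not the exact production schema:

```python
# Illustrative per-drug output record; field names and values are assumptions.
prediction = {
    "drug_name": "Yusimry",
    "expected_days": 332,                    # expected days on therapy next year
    "expected_dosage": "40 MG / 0.8 ML every other week",
    "expected_cost_per_day": 57.0,           # projected daily drug cost
    "confidence": "Medium",
    "reasoning": "Product switch from Humira; class-level adherence carried forward.",
}

# A member-level annual estimate falls out directly:
annual_cost = prediction["expected_days"] * prediction["expected_cost_per_day"]
```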
04 — Benchmarks
Model selection
All configurations evaluated against the same held-out population. MAE and RMSE in $/month.
| Model | MAE ($/mo) | RMSE ($/mo) |
| --- | --- | --- |
| Rx Sentinel v2.1 ⭐ Best | $3,684.34 | $5,746.31 |
| Rx Sentinel v1.1 | $4,104.92 | $6,359.87 |
| XGBoost | $4,125.47 | $6,560.24 |
| CART | $4,177.50 | $7,092.07 |
| Rx Sentinel v0.1 | $4,864.47 | $7,738.38 |
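MAE and RMSE here are the standard definitions, computed in $/month against held-out actuals. A minimal sketch on toy numbers:

```python
import math

def mae_rmse(actual: list[float], predicted: list[float]) -> tuple[float, float]:
    """Mean absolute error and root mean squared error, same units as inputs."""
    errs = [p - a for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errs) / len(errs)
    rmse = math.sqrt(sum(e * e for e in errs) / len(errs))
    return mae, rmse

# Toy member-month actual vs. predicted spend, in $/month.
actual = [12000.0, 8000.0, 5500.0]
predicted = [11000.0, 9500.0, 5500.0]
```

RMSE penalizes large misses more heavily than MAE, which is why both are reported: the gap between them indicates how much error is concentrated in a few members.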
Concrete Examples — Why agentic reasoning matters
Humira → Yusimry biosimilar switch
A naive model sees one Yusimry fill and predicts roughly one month of future therapy. The LLM sees a biosimilar switch and correctly preserves ~11 months — at Yusimry cost.
📋 What the model saw
Yusimry (index drug)
Fills on record: 1 fill — dated 10/04 — drug is active at prediction start
Days supplied: 28 days — only Yusimry-specific history available
Dosage: 40 MG / 0.8 ML — pre-filled syringe, every other week
Copay: $0 — zero out-of-pocket
Humira — same class, prior drug
Prior fills: Long continuous history — predating the Yusimry switch
Days supplied: ~336 days — over prior 12-month window
Adherence: ~92% PDC — highly regular fill pattern, no meaningful gaps
Dose: 40 MG / 0.8 ML — stable — no titration changes
Gap to Yusimry: None — therapy uninterrupted across brand change
These signals together point to a single conclusion: the patient didn't start a new drug — they switched brands within the same chronic therapy.
💬 What the LLM actually said
"
The 10/04 Yusimry fill strongly suggests a product switch from Humira rather than initiation of a short course.
"
Starting from the established class adherence (~92% → ~336 days) and discounting slightly for the short Yusimry-specific history, I predict ~332 days on therapy next year.
"
Confidence is medium rather than high: the therapy-level evidence is strong, but Yusimry itself has only one fill on record. A second fill would likely upgrade this to high confidence.
💡 Why this matters
1
A trend model overstates cost — the LLM corrects it
A naive model carries forward last year's Humira spend (~$45k) at trend. It never detects the biosimilar switch, so it overpredicts cost by the full brand premium — every month, for the entire year.
2
The patient switched brands, not therapies
Humira → Yusimry is a formulary substitution. The underlying disease, the dose, and the adherence pattern are unchanged. Treating it as a new drug is a category error.
3
The savings are real — but so is the continuity
The correct projection replaces ~11 months of Humira cost with ~11 months of Yusimry cost. Both the duration and the unit cost change. Predicting zero continuation loses both.
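The days-on-therapy arithmetic from the trace above, reproduced as a sketch. The size of the single-fill discount is an assumed value chosen to match the quoted ~332; the agent's actual discount comes from its reasoning, not a fixed constant:

```python
# Carry class-level adherence across the brand switch (figures from the example).
pdc = 0.92                        # Humira proportion of days covered
class_days = round(pdc * 365)     # ~336 days of therapy at the class level
single_fill_discount = 4          # assumed haircut: only one Yusimry fill on record
expected_days = class_days - single_fill_discount
```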
Example 2 — Loading Regime Detection
Ocrevus — loading phase: $3k/day → $200/day
A patient on Ocrevus took 20 units for 14 days (loading), then 20 units for 6 months (maintenance) — each script at $40k. A naive model averages across all fills and projects $550k for the year. The agent recognizes the loading phase is done and predicts $73k — a difference of nearly $477k on a single member.
📋 What the model saw
Loading phase
Duration: 14 days — short high-intensity initiation period
Units per fill: 20 units — same pack size as maintenance
Script cost: $40k — identical cost per fill
Daily cost: $3k/day — high — driven by dosing frequency
Maintenance phase
Duration: 6 months — ongoing after loading complete
Units per fill: 20 units — same pack — but less frequent dosing
Script cost: $40k — identical cost per fill
Daily cost: $200/day — dramatically lower — loading is done
The fill history looks identical on the surface — same units, same script cost. The only way to know the loading phase is over is to understand the drug.
💬 What the LLM actually said
"
The fill history shows identical pack sizes across both phases — but this drug has a known loading regime. Dosing frequency, not pack size, is what changes.
"
The loading phase is complete. Forward cost should be projected at maintenance dosing — not at the blended average of loading and maintenance fills.
"
Naive averaging of all historical fills produces a significant overestimate. The correct projection is maintenance cost forward: approximately $73k for the year.
💡 Why this matters
1
The naive model can't see what it doesn't know
Averaging per-script or per-day cost across all fills is a reasonable heuristic — unless the drug has a loading regime. Then it systematically overpredicts for every patient who has completed loading.
2
The pack looks the same — the cost doesn't
Same 20-unit fill, same $40k script. A model without drug knowledge has no basis to treat these fills differently. The agent does — because it knows this drug's dosing protocol.
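Once the loading phase is recognized as complete, the phase-aware projection reduces to simple arithmetic (figures from the example):

```python
# Project forward at maintenance dosing only; blending the $3k/day loading
# fills into a historical average is what inflates the naive projection.
maintenance_cost_per_day = 200
projected_annual_cost = maintenance_cost_per_day * 365   # $73,000 for the year
```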
05 — Transparency
Predictions are inspectable
Every prediction carries a reasoning trace tied to specific claims signals. Predictions can be audited, challenged, and overridden.
Sample Members
MBR-03291
3 drug predictions
MBR-00421
3 drug predictions
MBR-01187
2 drug predictions
MBR-03291
Yusimry — 280 days/yr · $19k est. cost · Confidence: Med
Budesonide — 30 days/yr · $0k est. cost · Confidence: Low
Humira (2 Pen) — 0 days/yr · $0k est. cost · Confidence: Med
06 — Feedback Loop
Pharmacist review & model improvement
Pharmacists review flagged predictions and annotate reasoning errors. Corrections are structured annotations that feed directly into model evaluation and prompt refinement.
🤖
Model prediction
With full trace
🚩
Flagged for review
💊
Pharmacist reviews
Clinical assessment
✏️
Correction logged
Structured annotation
📈
Next version improves
Used in eval & prompts
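A correction annotation could be structured along these lines — field names are assumptions, not the production schema:

```python
from dataclasses import dataclass

@dataclass
class PharmacistCorrection:
    """Illustrative annotation shape; the real schema may differ."""
    member_id: str
    drug_name: str
    predicted_outcome: str     # e.g. "stop"
    corrected_outcome: str     # e.g. "switch"
    error_category: str        # e.g. "missed biosimilar switch"
    note: str                  # free-text clinical rationale
```

Because corrections are typed rather than free-form comments, they can be aggregated by error category and turned directly into evaluation cases for the next model version.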
07 — Drug Dossiers
Curated drug context
For high-priority drugs, structured dossiers — continuation patterns, switch triggers, model guidance — are injected alongside claims data to guide reasoning.
Humira (adalimumab)
Ozempic / Wegovy (semaglutide)
Ibrutinib (Imbruvica)
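A dossier for one of these drugs might be structured like this. Contents are illustrative, drawn from the Humira → Yusimry example earlier; the real dossiers are pharmacist-curated:

```python
# Hypothetical dossier shape; fields mirror the categories named above.
humira_dossier = {
    "drug": "Humira (adalimumab)",
    "continuation_patterns": [
        "Chronic autoimmune therapy with high continuation inertia once stable",
    ],
    "switch_triggers": [
        "Biosimilar availability (e.g. Yusimry) via formulary substitution",
    ],
    "model_guidance": [
        "A single biosimilar fill after continuous Humira history is a brand "
        "switch, not a new therapy",
    ],
}
```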
Summary
What makes Rx Sentinel different
Structured prediction workflow
Claims are typed into per-drug summaries before any model call. Controlled inputs, typed outputs. Purpose-built pipeline, not prompt engineering on raw data.
Benchmarked against real outcomes
Every model configuration is evaluated on held-out actuals using MAE and RMSE in $/month. The model in production is the one that earned it.
Inspectable reasoning
Every prediction includes a reasoning trace tied to specific claims signals. Auditable, challengeable, overrideable.
Expert feedback loop
Pharmacists annotate reasoning errors in structured form. Those annotations become evaluation signal. The system improves as it is used.
Cost-efficient at scale
Rx Sentinel v2.1 costs under $0.05 per analyzed super claimant, and v1.1 under $0.02, enabling large-scale deployment without prohibitive inference costs.