by Luke Shulman

Medication Days

Over the past 150 years, some of the most influential medical breakthroughs have been medications that cure disease, relieve symptoms and ultimately extend life and wellness. Considering medication data is essential for measuring the health status of a patient or population, the standards we use to measure compliance with prescriptions should be widely available. Yet processing medication data into useful statistics presents a set of unique challenges.

The primary source of medication data for population health is prescription claims data which provide a timestamp when a medication is filled at a pharmacy. Even if we assume that patients always take the medications they pickup, this data structure still just provides snapshots of a patient’s course of treatment. Medication usage is usually tracked using a metric called “Proportion of Days Covered” (PDC) which measures the ratio of days where a patient should have medication versus the number of days they actually have a supply.

It is best to illustrate this through an example. Let’s pretend you run a farm of horses who eat 10 bushels of apples a day. You have a silo that can fit 300 bushels of apples or enough for 30 days but you can’t fit anymore in your silo and your if the apples are stored there they will rot.

Lets say on Sept 1, you get you apples delivered and then on Oct 1 after 30 days your silo is empty you get another delivery leaving you no gap. In this example, your horses have food for all 30 days. The proportion of days covered is thus 1.0 calculated as 30 days you have food divided by 30 days you need food.

But now its October and your next delivery isn’t until November 1 (31 days away). You are going to be short one day of apples. Your proportion of days covered is going to be 30 divided by 31 or .97 essentially you only have 97% coverage for your horses.

Now for apples maybe this isn’t a big deal. But for medications it can be. Patients often have gaps in their coverage or are picking up their medications early meaning they have extra supply. Most quality measures using medications utilize this ratio to monitor compliance.

rxcalims The chart above shows the challenge. the patient has one significant period of overlap and then long gap until they refill their prescription again.

There exists a large body of libraries written in SAS to support the calculation of PDC. “Measuring Medication Adherence with Simple Drug Use and Medication Switching provides a great overview of the challenge and is authored by three statisticians at Walgreens and is the source of the image above.

Here we present Python functions that perform a basic calculation for PDC. We are releasing them open-source so that hopefully groups can utilize them and to help the adoption of open-source tools for health data analysis.

The following function should make that calculation. It takes in two inputs: 1. fill_dates: a list containing the dates of each claim fill 2. days supply: a list containing the days supply for of the dates in fill_dates.

def days_covered(fill_dates, days_supply, start_date, end_date):
    """fill_dates should be list of each date the prescription is filled. days_supply which is 
    a list of the days supply for each of those dates. Function also needs to account for overlapping.
    start_date = the beginning of the study period
    end_date = the end of the study period"""
    #input arrays must be same length. There should be a days supply value for each fill. 
    if len(fill_dates) != len(days_supply):
        raise ValueError("inputs must be of same length")
    #Establish the whole range of the study as an index. The days here will be denominator 
    index_range = pd.date_range(start=start_date, end=end_date, freq='D')
    #convert days supply into timedeltas to calculate when things end. 
    supply_deltas = pd.Series([pd.to_timedelta(days, unit='D') for days in days_supply], index=fill_dates)
    #find the enddates
    supply_enddates = supply_deltas + supply_deltas.index
    #trim any fills that extend outside of the study range
    supply_enddates[supply_enddates > end_date] = end_date
    #now recalculate the covered days assuming that overlapping medications mean the patient has extra supply
    new_deltas  = supply_enddates - supply_enddates.index

    covered_days  = new_deltas.sum()
    total_days =  end_date - supply_deltas.index.min()
    pdc_ratio = covered_days/total_days

    return pdc_ratio

The logic above recreates the calculation as presented in the cited paper. Let’s test the calculation by building a test case that matches the paper’s calculation of the same:

Let’s declare our input variables with the same fill dates, days supply, and study period.

#fill_dates = [datetime(2010,12,20), datetime(2011,8,25), datetime(2011,9,5),datetime(2011,9,26), datetime(2011,12,10)  ]
#days_supply = [30,30,30,30,30]
fill_dates = [datetime(2011,8,25), datetime(2011,9,5),datetime(2011,9,26), datetime(2011,12,10)  ]
days_supply = [30,30,30,30]
start_date = datetime(2011,1,1)
end_date = datetime(2011,12,31)

Then we can invoke the function and see if the calculation matches.



Data Source

To demonstrate the libraries, we have downloaded the drug event synthetic file from CMS. For ease of use, we are making available a sqlite database of the drug event file that can be used along with this notebook. This should kickstart your integration of these functions in your environment.

If you run this notebook directly, it will download the database automatically. (its pretty large almost 700mb). The code below queries those prescription events

Let’s use a patient as an example ID ‘7D8F1CC821FC36F3’ . The patient has three fills for NDC ‘53869015604’ in our data set. We will load the data into a pandas dataframe and convert the columns to a datetime format. We will also convert the days supply field to super helpful timedelta format which will help with the manipulation of the dates.

%sql select * from drug_events where DESYNPUF_ID = '7D8F1CC821FC36F3' and PROD_SRVC_ID = '53869015604';
7D8F1CC821FC36F3 233834491742583 20080815 53869015604 90.000 30 0.00 100.00 1
7D8F1CC821FC36F3 233634490454169 20081230 53869015604 30.000 30 0.00 20.00 1
7D8F1CC821FC36F3 233614491068501 20090606 53869015604 30.000 30 0.00 20.00 1

We want to make sure the days supply and fill date columns have the correct datatypes so we will run a quick conversion.

pat = %sql select * from drug_events where DESYNPUF_ID = '7D8F1CC821FC36F3' and PROD_SRVC_ID = '53869015604';
pat = pat.DataFrame()
pat['SRVC_DT'] = pd.to_datetime(pat['SRVC_DT'], format='%Y%m%d')
pat['DAYS_SUPLY_TD'] = pd.to_timedelta(pd.to_numeric(pat['DAYS_SUPLY_NUM']), unit='D')
pat['DAYS_SUPLY_NUM'] = pd.to_numeric(pat['DAYS_SUPLY_NUM'])
pat['END_DT'] = pat['SRVC_DT'] + pat['DAYS_SUPLY_TD']


The chart below shows the date of fill and days supply for our sample patient.

p = figure(plot_width=400, plot_height=175, x_axis_type='datetime', title='Patient Fills of Drug', tools=['save'])
x = [[x1, x2] for x1, x2 in zip(pat['SRVC_DT'], pat['END_DT'])]
y = [[5,5] for x in range(len(x))]
p.multi_line(x, y, color=["firebrick", "navy","green"], line_width=4.5)
p.scatter(x=pat['SRVC_DT'], y=[5 for x in range(len(pat['SRVC_DT']))], color="DarkMagenta", size=9)
p.xaxis.major_tick_line_width = 3

p.xaxis.bounds = (datetime(2008,7,1), datetime(2009,6,15))

p.output_backend = "svg"


The chart below shows the date of fill and days supply for our sample patient.

Chart Image

Based on the chart above, this patient should have several noticeable gaps. In a study period from 07/01/2008 until 06/15/09, the patient has the following coverage:

  • 08/15/2008 for 30 days
  • 12/30/2008 for 30 days
  • 06/06/2009 for 9 days our period ends on 06/15

Therefore, the proportion of days covered would be 69 over 304 days or 22%.

days_covered(pat['SRVC_DT'].tolist(), pat['DAYS_SUPLY_NUM'].tolist(), datetime(2008,7,1), datetime(2009,6,15))

Now let’s run our new function on a larger set of patients. This will calculate the proportion of days covered for all patients taking Teriparatide, a medication for osteoporosis treatment.

As shown by the histogram, 120 out of 193 patients on this medication are full compliant.

start_date = datetime(2008,1,1)
end_date = datetime(2009,1,1)
pats = %sql SELECT * FROM drug_events where prod_srvc_id = '54868540600';
pats = pats.DataFrame()
grouped = pats.groupby('DESYNPUF_ID')
pdc_results ={}
for name, group in grouped:
    fill_dates = pd.to_datetime(group.SRVC_DT).tolist()
    days_supply = pd.to_numeric(group.DAYS_SUPLY_NUM).tolist()
    pdc_results[name] = days_covered(fill_dates, days_supply, start_date, end_date)

pdc_results = pd.Series(pdc_results)
pdc_results.plot(kind='hist', title='Histogram of Patients by Proportion of Days Covered for ')


The days covered utility function is one of the methods included in Alogrex Health’s utilization analysis libraries. Algorex Health’s libraries provide simple models and a huge range of utilization and quality metrics like this one that make it trivial to perform analyses just like this whether you use Python or R.

If you want to accelerate your application of data science using claims or clinical data contact us today.