by Luke Shulman

CMS Assignment & Care Pathways

We were fortunate enough to have this poster selected for presentation at the MIT Sloan Health and Bio-Innovations Conference. A portion of this analysis also appeared in the Arcadia Data Gallery.

Creating the research poster allowed us to test the question behind Algorex Health’s underlying mission: how does advanced analytics actually get done in healthcare? So we created a research question and tried to solve it. Along the way, we documented how we solved the analytical challenges we faced, and we plan to structure our libraries and APIs to accelerate the next analyst faced with an assignment question.

The Problem

Population health analysis often takes the form of a monthly or annual report that totals costs and utilization and compares them across organizations. One of the largest applications of this mindset is the way that patients are assigned to value-based contracts.

As advanced payment models such as accountable care organizations (ACOs) proliferate, a core pillar of these programs is the assignment of a population to an ACO. Current work has rightly focused on the spending and quality of the ACOs to accomplish this assignment, but is limited in its understanding of why certain patients cost more than others. For example, while one patient has a specific high-cost acute event that is unlikely to recur, another has a long-term chronic condition that inevitably escalates in costs. Both patients cost the same, but only the latter provides a cost and health remediation opportunity and has a lasting relationship with a caregiver.

The following diagram demonstrates how the CMS assignment works in practice.

Current CMS Assignment Fig. 1

As shown above, the price of the visit can swing patient assignment in unexpected ways. Since “New” patient visits have a higher reimbursement (cost) than established patient visits, patient 3 gets transferred even though they have clearly established care with ACO ‘A’.
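To make the mechanics concrete, here is a minimal Python sketch of cost-weighted assignment. It assumes each patient is assigned to the ACO with the largest total of primary care charges. The charge amounts, the `assign_by_charges` name, and the visit data are hypothetical illustrations, not CMS figures or the exact CMS rules.

```python
from collections import defaultdict

# Hypothetical allowed charges per visit type (illustrative only, not CMS rates).
VISIT_CHARGE = {"new": 150.0, "established": 60.0}


def assign_by_charges(visits):
    """Assign each patient to the ACO with the largest total charges.

    visits: iterable of (patient_id, aco, visit_type) tuples.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for patient_id, aco, visit_type in visits:
        totals[patient_id][aco] += VISIT_CHARGE[visit_type]
    # Pick, for every patient, the ACO with the highest summed charges.
    return {p: max(by_aco, key=by_aco.get) for p, by_aco in totals.items()}


# Two established visits with ACO A, one "new patient" visit with ACO B.
visits = [
    ("patient_3", "A", "established"),
    ("patient_3", "A", "established"),
    ("patient_3", "B", "new"),
]
assignment = assign_by_charges(visits)
```

Under these assumed charges, ACO A accumulates 120.0 and ACO B accumulates 150.0, so a single higher-priced visit outweighs two established visits.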

In a hypothetical population of 25,000 patients, we simulated the effects of different coding scenarios such as replacing a portion of lower-cost “established visit” codes with higher-cost codes.

Coding Scenarios Fig. 2

As shown, shifting codes can have a dramatic effect on the overall assignment. Each scenario derives a population using the national Medicare average frequency of these codes.

So this got us thinking: what would be a better way to assign patients? Or, more than assignment, what would be a better way to understand who has responsibility for a patient’s treatment plan?
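A scenario of this kind can be sketched in a few lines of Python. This is not the simulation behind Fig. 2: the synthetic visit mix, the charge amounts, and the function names `assign`, `upcode`, and `simulate` are all assumptions made for illustration.

```python
import random
from collections import defaultdict

CHARGE = {"established": 60.0, "new": 150.0}  # hypothetical charges


def assign(visits):
    """Assign each patient to the ACO with the largest total charges."""
    totals = defaultdict(lambda: defaultdict(float))
    for pid, aco, code in visits:
        totals[pid][aco] += CHARGE[code]
    return {pid: max(t, key=t.get) for pid, t in totals.items()}


def upcode(visits, aco, fraction, rng):
    """Replace a random fraction of one ACO's established visits with the higher-cost code."""
    out = []
    for pid, visit_aco, code in visits:
        if visit_aco == aco and code == "established" and rng.random() < fraction:
            code = "new"
        out.append((pid, visit_aco, code))
    return out


def simulate(n_patients, fraction, seed=0):
    """Return how many patients change ACO when ACO B upcodes a fraction of its visits."""
    rng = random.Random(seed)
    visits = []
    for pid in range(n_patients):
        # Assumed visit mix: 3 established visits at ACO A, 2 at ACO B.
        visits += [(pid, "A", "established")] * 3 + [(pid, "B", "established")] * 2
    baseline = assign(visits)
    shifted = assign(upcode(visits, "B", fraction, rng))
    return sum(baseline[p] != shifted[p] for p in baseline)


moved = simulate(n_patients=25_000, fraction=0.10)
```

With this assumed visit mix, a single upcoded visit is enough to move a patient from ACO A to ACO B, which is why even a small replacement fraction shifts a noticeable share of the population.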

We believe an alternative assignment methodology is warranted so that patients are assigned within the full context of their longitudinal care history.

Our poster tests a way to discover care pathways that indicate inflection points where interventions can be applied to improve cost and outcomes. Analyzing discovered care pathways allows identification of who will be, rather than who was, high cost.

Care Pathways

Latent care pathways are discovered by applying process mining techniques to a patient’s event time series. Preparing and accessing healthcare data is challenging given the variety of healthcare processes and data. Process mining techniques require clear identification of abstract events within the time series. Within healthcare data, some events are so precise that they require generalization, while others are too poorly defined to yield additional insight.

Figure 3 below presents a visualization of the raw data output of three patient time series. The data here is taken from a de-identified Medicare population of 100,000 patients. For the purposes of presenting the latent care pathways, we have focused on 459 patients who had been hospitalized with diabetes.

PatientPaths Fig 3.

Each element is a visit/procedure. Patients received mostly preventative evaluation & management (E&M) visits as part of their follow-up from hospitalization. The middle patient received a combination of physical therapy, behavioral health therapy, and even uncovered a secondary diagnosis.

By mining these pathways for commonalities, we can identify interesting relationships within this cohort of patients.

DiabetesPaths Fig 4.

The above diagram is called a Petri net. It is the output of the process mining algorithms. Note that the numbers represent the number of times a patient transitions across the pathway. Thus, most of our 459-patient cohort enters a loop of E&M visits that repeats up to 39 times. This is consistent with our visual inspection of the three patients above.

Note, however, that some patients also traverse a deviation that includes an electrocardiogram exam (EKG), which is a heart test to observe the rhythm of a patient’s heartbeat. A much smaller set of patients engages in either physical therapy or behavioral health care.

Which pathway for a recently discharged diabetic patient is more effective? Or less costly? While we can’t observe that yet, it is fascinating how well this system identified some of the challenges associated with chronic disease care.

The above network maps services, but what if, rather than mapping services, we mapped out when comorbid conditions were identified and what significance they had?
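The transition counts on such a diagram are closely related to a "directly-follows" count, a basic building block of many process mining algorithms. The sketch below is a simplified illustration rather than the algorithm that produced Fig. 4; the `directly_follows` helper and the three example traces are hypothetical.

```python
from collections import Counter


def directly_follows(traces):
    """Count how often one event type is immediately followed by another.

    traces: dict mapping patient_id -> time-ordered list of event labels.
    """
    counts = Counter()
    for events in traces.values():
        # Pair each event with its successor in the same patient's timeline.
        for current, following in zip(events, events[1:]):
            counts[(current, following)] += 1
    return counts


# Hypothetical traces loosely modeled on the three patients described above.
traces = {
    "p1": ["Hospitalization", "E&M", "E&M", "E&M"],
    "p2": ["Hospitalization", "E&M", "EKG", "E&M"],
    "p3": ["Hospitalization", "E&M", "Physical Therapy", "Behavioral Health"],
}
dfg = directly_follows(traces)
```

A self-transition such as `("E&M", "E&M")` corresponds to the repeating loop of E&M visits, while pairs such as `("E&M", "EKG")` correspond to deviations from that loop.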

Diabetes CoMorbid Fig 5.

The above diagram is a fuzzy map. Since healthcare rarely has one prescribed path, this method of analysis allows us to understand the significant nodes while aggregating less significant and less correlated nodes. Each “node” is a comorbid condition identified for these patients, or a cluster of conditions that have low overall significance but, taken as a group, are correlated to the others.

It is best to interpret this diagram like you would a geographic map. A map of Boston would include various neighborhood names and streets. It might not indicate Boston’s position in Massachusetts. If you zoomed out, you may see Massachusetts and New England along with major interstates but not individual cities or towns.

This diagram works the same way: every comorbid condition is a node with various loops and relationships to other conditions as they were identified. However, as we raise the threshold for significance, indicating more frequent and more consequential conditions, nodes are folded into clusters and aggregated.

Our chart here identifies four significant nodes whose position suggests they occur earlier in the hospital follow-up process. (Remember, all of these patients had been previously hospitalized.) More interesting are the nodes within the loops, substance abuse and arrhythmia, which help explain the Behavioral Health and EKG services we saw in Figure 4.

It is no breakthrough revelation that diabetes can be associated with heart disease and mental health challenges. But for an analytics system with no prior knowledge of the research question to automatically associate these elements is very instructive. It indicates where we could continue to push this work.
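The "zooming out" behavior can be illustrated with a deliberately simplified sketch. Real fuzzy mining also weighs correlation between nodes and edges; here we assume only a single per-node significance score, and the `simplify` function, the scores, and the edge weights are hypothetical.

```python
from collections import Counter


def simplify(edges, significance, threshold, cluster="Other conditions"):
    """Fold nodes below the significance threshold into one cluster node.

    edges: dict mapping (source, target) -> transition count.
    significance: dict mapping node -> score between 0 and 1.
    """
    def label(node):
        return node if significance[node] >= threshold else cluster

    merged = Counter()
    for (src, dst), weight in edges.items():
        # Edges touching low-significance nodes are re-attached to the cluster.
        merged[(label(src), label(dst))] += weight
    return merged


# Hypothetical scores and counts, for illustration only.
significance = {"Diabetes": 1.0, "Arrhythmia": 0.6, "Condition X": 0.1, "Condition Y": 0.05}
edges = {
    ("Diabetes", "Arrhythmia"): 40,
    ("Diabetes", "Condition X"): 5,
    ("Condition X", "Condition Y"): 2,
    ("Condition Y", "Diabetes"): 3,
}
zoomed_out = simplify(edges, significance, threshold=0.5)
zoomed_in = simplify(edges, significance, threshold=0.0)
```

Raising `threshold` plays the role of zooming out on the map: fewer named nodes remain, and the rest are summarized by the cluster.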

Materials and Methods

The analysis uses a de-identified set of healthcare claims for approximately 100,000 patients over three years. We conducted a two-pronged analysis: first of the existing cost-based assignment methodology, showing how it is susceptible to gaming, and second of our proposed care-pathway methodology. To simulate the existing methodology, we deployed a stochastic gradient descent classifier that took preventative care charges as input and simulated the current CMS assignment process. We then manipulated the input data to test “game-coding scenarios.” To test our new care-pathway methodology, we used Python to ETL CMS-delivered data files into a relational database. We inverted the diagnosis- and procedure-oriented data model into a patient-centric model with a longitudinal bias. This relational technique, applied to smaller data sets, was excellent for testing cohorting code. Our larger-scale patient-oriented data store uses HBase to store patients by row key, with multiple events in a column family. This provides scale-out for running many analyses simultaneously.
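The inversion from claim-oriented rows to a patient-centric, longitudinal view can be sketched as follows. The field names (`patient_id`, `service_date`, `code`), the `to_patient_timelines` helper, and the sample rows are assumptions for illustration; they are not the CMS file layout or our production schema.

```python
from collections import defaultdict
from datetime import date


def to_patient_timelines(claims):
    """Group claim rows by patient and order each patient's events by date."""
    timelines = defaultdict(list)
    for claim in claims:
        timelines[claim["patient_id"]].append((claim["service_date"], claim["code"]))
    # One key per patient, with a time-ordered list of events as the value.
    return {pid: sorted(events) for pid, events in timelines.items()}


# Hypothetical claim rows, deliberately out of order.
claims = [
    {"patient_id": "p1", "service_date": date(2015, 3, 2), "code": "E&M"},
    {"patient_id": "p2", "service_date": date(2015, 1, 9), "code": "Hospitalization"},
    {"patient_id": "p1", "service_date": date(2015, 1, 15), "code": "Hospitalization"},
    {"patient_id": "p1", "service_date": date(2015, 2, 1), "code": "EKG"},
]
timelines = to_patient_timelines(claims)
```

The resulting shape, one key per patient holding many ordered events, loosely mirrors the row-key and column-family layout described above, although no HBase client code is shown here.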

Conclusions and Next Steps

The approach demonstrated is more granular for population segmentation in healthcare. Rather than relying solely on retrospective cumulative analysis, a simulated “what would have been” activity allows identification of the correct intervention to be applied and which actor could apply it. While health systems invest in resources to prevent high-cost care, the analysis demonstrated allows those resources to be both deployed more effectively and paired with an appropriate intervention.

There are significant caveats to this analysis because it relies on publicly available health data. Specifically, the de-identified data obtained from CMS suppressed key healthcare claim data elements that are available in commercial applications. For instance, the rendering provider for each service did not include degree level or specialty. In addition, patients’ prescribed medications were not included in the pathways because the duration of each medication course was not available.


This work was inspired by some really amazing studies that I encourage folks to read up on.

Kumar, Vikas et al. “Exploring Clinical Care Processes Using Visual and Data Analytics: Challenges and Opportunities” Published by Polo Chau, August 24, 2014:

Mans, Ronny S., Wil Van Der Aalst, and Rob J. B. Vanwersch. Process Mining in Healthcare: Evaluating and Exploiting Operational Healthcare Processes. New York: Springer, 2015.

Rojas, Eric, Jorge Munoz-Gama, Marcos Sepulveda, and Daniel Capurro. “Process Mining in Healthcare: A Literature Review.” Journal of Biomedical Informatics 61 (2016): 224-36.

Gawande, Atul. “The Heroism of Incremental Care.” New Yorker (2017): n. pag. Print.

Pearce, Christopher M., Adam Mcleod, Jon Patrick, Douglas Boyle, Marianne Shearer, Paula Eustace, and Mary Catherine Pearce. “Using Patient Flow Information to Determine Risk of Hospital Presentation: Protocol for a Proof-of-Concept Study.” JMIR Research Protocols 5.4 (2016): n. pag. Web.