by Luke Shulman

My Wish List for 2017

To close out the year, I am hopping on the bandwagon and putting out a listicle of things I would like to see in 2017. This list cuts across healthcare and data science technologies.


  1. RxClass Reversal: If you do research using medication data, you will eventually come across the amazing set of RxNorm tools maintained by the National Library of Medicine (NLM). With RxNorm, you can do effectively anything related to medications: find active ingredients, dosages, generic names, and brand names. But most often, you just want to move up to a higher level so you don’t have to analyze every atom. NLM has responded by releasing RxClass, an API that takes an RxCUI and gives you its class. This transforms “Calcium Chloride” into “Irrigation Solutions”. Really helpful. Still, doing the same thing from the RxNorm flat files, without the API, is really tricky. I know it is all based on has_VAclass, but seriously, going from an NDC code to a class just never seems to work, and I seem to reinvent how I do it each time.

  2. Standardize Use of the CCLF: If you are a CMS ACO, CMS sends you a monthly batch of claims in a flat file called the CCLF. I am actually not even sure what it stands for. It contains all the claims for your ACO population across all of CMS, and it is the base for any effort to run your ACO. But for most health systems, the CMS ACO covers at most 30% of the population. The rest have Medicaid, commercial insurance, or some other payer. So listen up, all other payers: emulate the CCLF. The CCLF format is not perfect, but it is better than having to build 20 different claims feeds across United, Humana, Aetna, BCBS, and Medicaid. It’s crazy. Everyone, just look at the CCLF and use the Medicare fee schedule to normalize prices, please.

  3. New Version of the Synthetic Public Use File: CMS has tons of file formats that let researchers and qualified entities get claims data for research. To help this process, they also released the 2008-2010 Data Entrepreneurs’ Synthetic Public Use File. This is an amazing file for teaching people how to do healthcare analysis. It has exactly the same layout as the Medicare 5% Identifiable sample file, so it has linked claims for 2.2 million patients. But all the claims are synthetic: fake, not real. We use this file to test ETL and for local development before unleashing algorithms onto client data. But it only simulates the years 2008-2010, meaning it doesn’t have recent drugs or ICD-10 codes. Please update this. It is truly an amazing resource.

  4. CommonWell, Help Developers: In great news, CommonWell and its key rival Carequality came to an agreement on shared connectivity. This is a great step toward achieving interoperability and preventing data blocking. Now please give developers a sandbox environment. Publish sample messages. Move to FHIR and actually take this seriously. Take a look at Android: that is amazing technical documentation. It’s searchable, it has usable examples, and it’s even fun. CommonWell and Carequality offer some amazing PDFs of 203 and 53 pages, respectively, and that is it. Enjoy!

  5. Stop Using TINs: Using TINs is really barking up the wrong tree. Because healthcare transactions are often financial (through claims), the most consistent identifier of who gets paid what has been the Tax ID Number (TIN), the corporate equivalent of a Social Security Number. Although National Provider IDs (NPIs) were rolled out over 10 years ago, corporate entities that receive claim payments are still often identified by TINs rather than by Type 2 NPIs, which were meant to identify facilities and organizations. Here is the problem. I can verify an NPI. I can get real records about how that provider was credentialed with CMS. But TINs are private, basically sensitive information, and can’t be verified or maintained through any public source.
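For the RxClass wish (item 1), the flat-file route from an NDC to a class is essentially a chain of lookups: NDC to RXCUI, then RXCUI to class. Below is a minimal sketch, assuming the pipe-delimited RXNSAT.RRF layout (RXCUI in the first column, the attribute name ATN in the ninth, the source SAB in the tenth, the value ATV in the eleventh), with NDCs stored under ATN = "NDC". The class table is hypothetical: `rxcui_to_class` stands in for whatever RXCUI-to-class mapping (for example, one built from has_VAclass relationships) you extract, and the sample rows are made up.

```python
import csv
import io

# Assumed RXNSAT.RRF column positions (0-based): RXCUI=0, ATN=8, SAB=9, ATV=10.
RXCUI, ATN, SAB, ATV = 0, 8, 9, 10


def ndc_to_rxcui(rxnsat_lines):
    """Build an NDC -> RXCUI lookup from pipe-delimited RXNSAT rows."""
    mapping = {}
    # RRF files are not quoted CSV, so disable quote handling.
    for row in csv.reader(rxnsat_lines, delimiter="|", quoting=csv.QUOTE_NONE):
        # Keep only NDC attributes from the RXNORM source (an assumption about which NDCs to trust).
        if len(row) > ATV and row[ATN] == "NDC" and row[SAB] == "RXNORM":
            mapping[row[ATV]] = row[RXCUI]
    return mapping


def ndc_to_class(ndc, ndc_map, rxcui_to_class):
    """Chain NDC -> RXCUI -> class; returns None if either hop is missing."""
    rxcui = ndc_map.get(ndc)
    return rxcui_to_class.get(rxcui) if rxcui else None


# Made-up rows in RXNSAT shape (13 fields plus the trailing pipe).
sample = io.StringIO(
    "111|||A1|||||NDC|RXNORM|00000000001|N||\n"
    "222|||A2|||||NDC|RXNORM|00000000002|N||\n"
)
ndc_map = ndc_to_rxcui(sample)

# Hypothetical class table; in practice this would come from the class relationships.
rxcui_to_class = {"111": "Example Class A"}

print(ndc_to_class("00000000001", ndc_map, rxcui_to_class))
print(ndc_to_class("00000000002", ndc_map, rxcui_to_class))
```

Returning `None` at either hop makes the unmapped NDCs easy to count, which is usually where the "never seems to work" part shows up.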
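Item 2’s request, emulating the CCLF and using the Medicare fee schedule to normalize prices, amounts to mapping every payer’s feed into one claim-line shape and repricing each line from a single schedule. Below is a minimal sketch of that idea. The `ClaimLine` fields, the per-payer column names, and the fee amounts are hypothetical illustrations, not the actual CCLF layout or real Medicare rates.

```python
from dataclasses import dataclass


@dataclass
class ClaimLine:
    """Hypothetical common claim-line shape, loosely in the spirit of the CCLF."""
    member_id: str
    hcpcs: str
    units: int
    paid: float  # what the payer actually paid


# Hypothetical per-payer column names mapped onto the common shape.
PAYER_FIELD_MAPS = {
    "payer_a": {"member_id": "MemberID", "hcpcs": "ProcCode", "units": "Qty", "paid": "PaidAmt"},
    "payer_b": {"member_id": "mbr", "hcpcs": "cpt", "units": "svc_units", "paid": "net_paid"},
}


def normalize(payer, record):
    """Translate one payer-specific record into a ClaimLine."""
    f = PAYER_FIELD_MAPS[payer]
    return ClaimLine(
        member_id=str(record[f["member_id"]]),
        hcpcs=str(record[f["hcpcs"]]),
        units=int(record[f["units"]]),
        paid=float(record[f["paid"]]),
    )


def reprice(line, fee_schedule):
    """Standardized price = fee-schedule amount per unit x units (None if the code is missing)."""
    rate = fee_schedule.get(line.hcpcs)
    return None if rate is None else rate * line.units


# Placeholder amounts keyed by HCPCS code; real values come from the published fee schedule.
fee_schedule = {"99213": 75.0, "36415": 3.0}

a = normalize("payer_a", {"MemberID": "M1", "ProcCode": "99213", "Qty": "1", "PaidAmt": "110.00"})
b = normalize("payer_b", {"mbr": "M2", "cpt": "36415", "svc_units": 2, "net_paid": 9.5})
print(reprice(a, fee_schedule), reprice(b, fee_schedule))
```

Keeping the actual `paid` amount next to the repriced one gives a payer-neutral yardstick for comparing utilization across United, Humana, Medicaid, and the rest.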
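Item 3 describes using the synthetic file to test ETL before it touches client data. Below is a minimal sketch of that pattern: one transformation step (unpivoting wide diagnosis columns into one row per diagnosis) exercised on a tiny fixture shaped like a SynPUF claims file. The column names `DESYNPUF_ID`, `CLM_ID`, and `ICD9_DGNS_CD_n` are assumed to follow the SynPUF naming and should be checked against the CMS codebook; the rows are made up.

```python
import csv
import io


def unpivot_diagnoses(rows, prefix="ICD9_DGNS_CD_"):
    """Yield (beneficiary, claim, position, code) for each non-empty diagnosis column."""
    for row in rows:
        for col, code in row.items():
            if col.startswith(prefix) and code:
                position = int(col[len(prefix):])
                yield row["DESYNPUF_ID"], row["CLM_ID"], position, code


# Made-up fixture rows in an assumed SynPUF-style layout.
fixture = io.StringIO(
    "DESYNPUF_ID,CLM_ID,ICD9_DGNS_CD_1,ICD9_DGNS_CD_2,ICD9_DGNS_CD_3\n"
    "BENE0001,CLM01,4019,25000,\n"
    "BENE0002,CLM02,V5869,,\n"
)
long_rows = sorted(unpivot_diagnoses(csv.DictReader(fixture)))
print(long_rows)
```

Making the column prefix a parameter is deliberate: the `ICD9` baked into the column names is exactly what an ICD-10 refresh of the file would change.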
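On item 5’s point that an NPI can be verified: besides looking a provider up in the public registry, an NPI carries a check digit, so its format can be validated offline. Below is a minimal sketch of that check, using the Luhn formula with the 80840 prefix specified for NPIs. It only confirms that the number is well-formed, not that it was ever issued.

```python
def is_valid_npi(npi: str) -> bool:
    """Check an NPI's length, digits, and Luhn check digit (computed with the 80840 prefix)."""
    if len(npi) != 10 or not all(c in "0123456789" for c in npi):
        return False
    digits = [int(d) for d in "80840" + npi]
    total = 0
    # Walk right to left; double every second digit, subtracting 9 when the result exceeds 9.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


print(is_valid_npi("1234567893"))  # True
print(is_valid_npi("1234567890"))  # False
```

Catching a mistyped NPI this way costs nothing; a mistyped TIN offers no such structural check to lean on.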

## Technology and Other

  1. Jupyter Touchbar Commands: I do a ton of data science work in Jupyter notebooks, and I am fortunate to have a new Touchbar MacBook Pro. It would be amazing if commands like Run Cell and Run All were available on the Touchbar when I work in Jupyter notebooks. I don’t even know if it is possible for websites to expose Touchbar commands, but it would be cool. Maybe there is a way to map a custom Touchbar command to a keyboard shortcut.
  2. Python IDE: This is more of a resolution than a wish, but this is the year I try again to use an IDE with Python. I mostly code in either Jupyter or Sublime. My challenge with IDEs has been getting them to reference the right interpreter, whether in a virtual environment or in the Vagrant boxes we use for local development. It’s never really worked, so I end up on a terminal, Vim, and Sublime setup. But I have heard great things about some new IDEs like Rodeo, so I am willing to try again.
  3. Bokeh, Loosen Up High-Level Charts: The Bokeh visualization library is amazing. It’s incredibly powerful and flexible. It also has a unique setup with a beginner level, where you can specify that you just want a bar chart, and an advanced level, where you say “I want a chart of rectangles drawn this way.” The challenge is that the “high-level charts” (bar, donut, scatter) don’t allow access to the same underlying layer and use a different data structure from the advanced charts. In my mind, all the styling attributes, from the title font size to the grid line color, should be available even on the “high-level charts”, but often they are just not available as arguments or attributes on the chart object.
  4. Shiny Separate from RStudio: I had never used Shiny, the R system for creating interactive web visualizations, before. It is amazing and a real game changer for getting enterprise-ready dashboards deployed in R. But try developing in Shiny without RStudio. It is really hard: still possible, but really hard. As a shout-out, if you are an R developer and have never used Shiny, check out the New Zealand Tourism Board dashboard, built entirely in R and hosted on the shinyapps site. It’s really well done. But how would I run this internally on a Linux web server? I need some help.
  5. Dongles and Docks: I mentioned my new MacBook Pro above, and this complaint has been covered a lot. I don’t mind carrying adapters and dongles, but I literally cannot buy the ones I need. They are out of stock or not yet released. On Monoprice, multiple versions of USB-C display dongles have been out of stock for the last few weeks. I still haven’t found a dock that is actually shipping and can connect two monitors. It is frustrating.
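For the IDE resolution (item 2), much of the pain is pointing the IDE at the right interpreter. A small helper, run from inside the activated virtual environment or the Vagrant box, reports the interpreter path an IDE’s interpreter setting usually wants, plus whether it is a virtual environment. It relies only on the standard `sys` module; the `real_prefix` check is there because older virtualenv releases set that attribute instead of leaving `sys.base_prefix` different from `sys.prefix`.

```python
import sys


def interpreter_info():
    """Return the running interpreter's path and whether it lives in a virtual environment."""
    # venv (and recent virtualenv) make sys.prefix differ from sys.base_prefix;
    # older virtualenv releases set sys.real_prefix instead.
    in_venv = hasattr(sys, "real_prefix") or sys.prefix != getattr(sys, "base_prefix", sys.prefix)
    return {"executable": sys.executable, "prefix": sys.prefix, "in_venv": in_venv}


if __name__ == "__main__":
    for key, value in interpreter_info().items():
        print(f"{key}: {value}")
```

The `executable` line is the path to paste into the IDE; if `in_venv` is False when you expected True, the environment was not activated.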
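On the Bokeh wish (item 3), the lower-level `bokeh.plotting` interface is where those styling attributes are exposed. Below is a sketch of a bar chart built there, setting the title font size and grid line colors directly on the figure. It assumes a reasonably recent Bokeh release (the 2016-era high-level charts API differed), and the categories and counts are made up.

```python
from bokeh.plotting import figure

# Made-up categories and counts.
categories = ["A", "B", "C"]
counts = [5, 3, 4]

# A categorical x_range lets vbar place one bar per label.
p = figure(x_range=categories, title="Bar chart via bokeh.plotting", toolbar_location=None)
p.vbar(x=categories, top=counts, width=0.8)

# The styling the wish asks for: title font size and grid line colors.
p.title.text_font_size = "16pt"
p.xgrid.grid_line_color = None
p.ygrid.grid_line_color = "lightgray"
p.y_range.start = 0
```

To render it, pass the figure to `bokeh.plotting.show(p)`. The trade-off is that you draw the bars yourself, in exchange for access to every attribute on the figure.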