by Luke Shulman

Calculating Growth and Obesity from FHIR Messages

When parents bring their kids to the pediatrician, one of the most important checks is to have the child’s height, weight, and BMI plotted on growth charts to understand their development. The charts, an image of which appears below, are ubiquitous and critical to tracking the health of kids.


Sure enough, many EHRs automatically show the graph and the patient’s position on the chart right inside the workflow, making it easy for physicians and patients to review. They include automated features to measure growth velocity and other elements. However, while the EHR stores the BMI, height, and weight measurements, most systems do not store the comparative measure of those values for the child’s age. Even going back to the foundation of the EHR incentive programs in 2010, EHRs were required to “record and chart changes in height, weight and blood pressure” but only to “plot and display growth charts for children between 2 and 20 years of age.”

This leaves analysts and data scientists in a tricky position when it comes to estimating obesity for pediatric patients. In meetings just last week, a reporting team we work with was lamenting the lack of obesity-related ICD codes for pediatrics, along with having only the BMI values and not the percentiles. This was a significant barrier to measuring the prevalence of childhood obesity in their population.

How do we identify children who may be obese absent this information?

This is no slight against the EHRs. These values are calculated based on outside references (provided by the WHO or CDC); they are interpretations and are not, on their own, a vital sign. As the quote above demonstrates, the EHR vendors prioritized plotting the value on the chart and the associated display. As such, it is understandable that BMI-for-age as a percentile is not stored.

In this Jupyter notebook, we are going to calculate the BMI percentiles for roughly 2,000 sample patients. The patient population is a synthetic population of 2,026 pediatric patients (aged 0-20 years) created using the amazing Synthea tool. We will be reading these patients’ charts directly in the form of FHIR documents. In the process we will do the following:

  1. Identify each patient’s most recent BMI from a FHIR document that simulates two years of visits.
  2. Use the CDC’s reference tables to identify the nearest percentile for that patient’s measurement from the previous step.
  3. Create a population-level chart that shows not only growth charts derived from our roughly 2,000 patients but also identifies obese children with BMIs above the 90th percentile.

This notebook is stored on GitHub, so feel free to fork it and use it on your own. All the directions for setup are there.

Let’s start by setting up libraries we will need.

import json
import fhirclient.models.bundle as b
import fhirclient.models.patient as p
import fhirclient.models.observation as o
from pathlib import Path
from decimal import *
import pandas as pd
import matplotlib
from datetime import date, timedelta, datetime
import csv
import altair as alt
import numpy as np
import seaborn as sns

%matplotlib inline

We have posted a zip file of the 2,026 FHIR patient bundles that you can use to run this notebook. Unzip the file to the data/ directory to continue.

Below, we will verify that you have some FHIR files to run.

fhir_dir = Path('data/fhir')  # named fhir_dir so it does not shadow the module imported as p
files_to_load = [f for f in fhir_dir.glob('*.json')]

if len(files_to_load) == 0:
    raise StopIteration("You don't have any FHIR files")
else:
    print("Loading {0} FHIR Files".format(len(files_to_load)))
Loading 2026 FHIR Files

We will also load as constants the CDC data tables that contain the percentile values for BMI at various ages.

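def getBMITable(sex):
    # Pick the CDC BMI-for-age reference file for the requested sex
    if sex == 'M':
        file = 'data/male-2-20.csv'
    elif sex == 'F':
        file = 'data/female-2-20.csv'
    else:
        raise ValueError("sex must be 'M' or 'F'")
    with open(file, 'r', encoding='utf-8-sig') as csvfile:
        # Columns: age in months, then the BMI value at each published percentile
        reader = csv.DictReader(csvfile, quoting=csv.QUOTE_NONNUMERIC, fieldnames=['AgeMonths',3,5,10,25,50,75,85,90,95,97])
        bmi_table = [row for row in reader]
    return bmi_table

# Load both tables once as the constants used further down
MALE_BMI_TABLE = getBMITable('M')
FEMALE_BMI_TABLE = getBMITable('F')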
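To make the shape of the loaded table concrete, here is a minimal sketch that runs the same csv.DictReader settings against an in-memory file. The two rows in SAMPLE_CSV are made-up numbers for illustration, not real CDC values, and the name read_bmi_rows is hypothetical.

```python
import csv
import io

# Hypothetical rows for illustration only; these are NOT real CDC values.
SAMPLE_CSV = (
    "24.5,14.0,14.5,15.0,15.5,16.5,17.5,18.0,18.5,19.0,19.5\n"
    "25.5,13.9,14.4,14.9,15.4,16.4,17.4,17.9,18.4,18.9,19.4\n"
)
FIELDNAMES = ['AgeMonths', 3, 5, 10, 25, 50, 75, 85, 90, 95, 97]

def read_bmi_rows(text):
    # QUOTE_NONNUMERIC converts every unquoted field to a float,
    # and explicit fieldnames mean the first line is treated as data.
    reader = csv.DictReader(io.StringIO(text),
                            quoting=csv.QUOTE_NONNUMERIC,
                            fieldnames=FIELDNAMES)
    return [row for row in reader]

rows = read_bmi_rows(SAMPLE_CSV)
print(rows[0]['AgeMonths'], rows[0][50])  # 24.5 16.5
```

Each row is a dictionary keyed by 'AgeMonths' plus one integer key per percentile, which is why the lookup code later can treat the keys themselves as percentiles.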


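The functions below use the SMART Python FHIR Client to parse the FHIR bundles and obtain the elements we will need:

  • the patient’s date of birth
  • their sex
  • all of their BMI observations and the values

def getBMIObeservation(entry):
    # 39156-5 is the LOINC code for body mass index
    if entry.resource.resource_type == 'Observation' and entry.resource.code.coding[0].code == '39156-5':
        # Assumption: the first tuple element was lost from the original text;
        # the observation's effective date is used as the visit date
        return (entry.resource.effectiveDateTime.date, entry.resource.valueQuantity.value)
    else:
        return None

def getDateOfBirth(entry):
    if entry.resource.resource_type == 'Patient':
        # Assumption: the body was missing from the original text
        return entry.resource.birthDate.date

def getSex(entry):
    if entry.resource.resource_type == 'Patient':
        return entry.resource.gender

def getAgeMonths(dob, visitdate):
    # Assumption: the original expression was truncated; this counts calendar months
    age_in_months = (visitdate.year - dob.year) * 12 + (visitdate.month - dob.month)
    return age_in_months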
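If you want to see what these helpers are looking for without installing a FHIR client, the sketch below walks a bundle as a plain dictionary. The bundle is hand-written and heavily trimmed, the values are invented, and bmi_observations is a hypothetical helper, not part of any library.

```python
from datetime import datetime

BMI_LOINC = '39156-5'  # LOINC code for body mass index

# Hand-written, trimmed bundle for illustration; real Synthea bundles carry many more fields.
SAMPLE_BUNDLE = {
    'resourceType': 'Bundle',
    'entry': [
        {'resource': {'resourceType': 'Patient', 'gender': 'female',
                      'birthDate': '2010-06-15'}},
        {'resource': {'resourceType': 'Observation',
                      'code': {'coding': [{'system': 'http://loinc.org', 'code': '39156-5'}]},
                      'effectiveDateTime': '2017-03-01T09:30:00-05:00',
                      'valueQuantity': {'value': 16.8, 'unit': 'kg/m2'}}},
        {'resource': {'resourceType': 'Observation',
                      'code': {'coding': [{'system': 'http://loinc.org', 'code': '8302-2'}]},
                      'effectiveDateTime': '2018-03-05T10:00:00-05:00',
                      'valueQuantity': {'value': 128.0, 'unit': 'cm'}}},
        {'resource': {'resourceType': 'Observation',
                      'code': {'coding': [{'system': 'http://loinc.org', 'code': '39156-5'}]},
                      'effectiveDateTime': '2018-03-05T10:00:00-05:00',
                      'valueQuantity': {'value': 17.2, 'unit': 'kg/m2'}}},
    ],
}

def bmi_observations(bundle):
    """Return (effective datetime, BMI value) pairs found in a bundle dict."""
    results = []
    for entry in bundle.get('entry', []):
        resource = entry.get('resource', {})
        if resource.get('resourceType') != 'Observation':
            continue
        codes = [c.get('code') for c in resource.get('code', {}).get('coding', [])]
        if BMI_LOINC in codes:
            when = datetime.fromisoformat(resource['effectiveDateTime'])
            results.append((when, resource['valueQuantity']['value']))
    return results

# The latest BMI is the pair with the greatest timestamp.
latest = max(bmi_observations(SAMPLE_BUNDLE), key=lambda pair: pair[0])
print(latest[1])  # 17.2
```

Unlike the functions above, this version checks every coding on the observation rather than only the first one, which is a little more forgiving if a source system lists several codes.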

Now we will tie all of this together as follows:

def getLatestBMI(bundle):
    # Find any observations with a BMI
    possible_visits_with_bmis = [getBMIObeservation(entry) for entry in bundle.entry if getBMIObeservation(entry)]
    if possible_visits_with_bmis:
        visit_date, bmi = max(possible_visits_with_bmis, key=lambda x: x[0])  # Get us the latest BMI
    else:
        return None  # no possible BMIs
    # Everyone's birthdate should be the same, but it may be recorded several times in a FHIR bundle
    birthDate = max([getDateOfBirth(entry) for entry in bundle.entry if getDateOfBirth(entry)])
    age = getAgeMonths(birthDate, visit_date)  # uses the visit date from above
    if age < 24.0 or age > 240.0:
        return None  # patients younger than 2 or older than 20 don't need this measurement
    sex = max([getSex(entry) for entry in bundle.entry if getSex(entry)])
    if sex == 'male':
        bmi_table = MALE_BMI_TABLE
    else:
        bmi_table = FEMALE_BMI_TABLE
    # Find the first reference row whose age is above the patient's age
    age_row = min([row["AgeMonths"] for row in bmi_table if row["AgeMonths"] > age])
    # Get their BMI cut-offs from the CDC tables
    bmi_pct_row_values = ([row for row in bmi_table if row['AgeMonths'] == age_row][0]).copy()
    del bmi_pct_row_values['AgeMonths']  # keep only percentile -> BMI pairs so min() compares numbers
    bmi_pct_row_values[100] = 100  # this is the maximum case
    pct = min([percentile for (percentile, bmi_value) in bmi_pct_row_values.items() if bmi <= bmi_value])
    return {'ageMonths': int(age), 'BMI': bmi, 'BMIpct': pct, 'sex': sex}
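The percentile step at the end of getLatestBMI is easier to follow in isolation. The sketch below reproduces the idea on a single reference row: the result is the smallest published percentile whose BMI cut-off is at or above the patient's BMI, so the measurement is effectively rounded up to the next published percentile. The row values are invented for illustration and nearest_percentile is a hypothetical name. It uses infinity for the top bucket, a small departure from the value of 100 used above.

```python
# Hypothetical reference row; the BMI cut-offs are illustrative, not CDC values.
SAMPLE_ROW = {'AgeMonths': 120.5, 3: 13.5, 5: 14.0, 10: 14.5, 25: 15.5,
              50: 16.5, 75: 18.5, 85: 19.5, 90: 20.5, 95: 22.0, 97: 23.5}

def nearest_percentile(bmi, row):
    # Drop the age column so only percentile -> cut-off pairs remain.
    cutoffs = {k: v for k, v in row.items() if k != 'AgeMonths'}
    # Anything above the 97th percentile cut-off falls into a catch-all bucket.
    cutoffs[100] = float('inf')
    return min(pct for pct, value in cutoffs.items() if bmi <= value)

print(nearest_percentile(17.0, SAMPLE_ROW))  # 75
print(nearest_percentile(13.0, SAMPLE_ROW))  # 3
print(nearest_percentile(30.0, SAMPLE_ROW))  # 100
```

A BMI of 17.0 sits between the 50th and 75th cut-offs in this row, so it is reported as 75. That rounding is worth keeping in mind when reading the charts: a reported percentile is an upper bound on the patient's position, not an interpolated value.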

Now, for each file we have, we will run the functions we defined and save the results. This can take up to 2 minutes for all the files.

bmi_results = []

for f_json in files_to_load:
    with open(f_json, 'r') as jsonfile:
        json_results = json.load(jsonfile)
    bundle = b.Bundle(json_results)
    result = getLatestBMI(bundle)
    if result:
        bmi_results.append(result)  # keep only patients with a usable BMI

We can now use a Pandas dataframe and seaborn to see how our population looks.

df = pd.DataFrame(bmi_results)
df['ageYears'] = (df.ageMonths/12).astype(int)
# data_to_plot = df.sort_values('ageMonths').groupby(['ageMonths', 'sex']).agg({'BMI':np.mean, 'BMIpct':np.median})
data_to_plot = df.sort_values('ageMonths').reset_index()

The seaborn charts below plot the observed BMIs at the 10th, 25th, 50th, 75th, and 90th percentiles. You can see that the fitted polynomial regressions (the smooth lines) start to look a lot like the CDC growth chart.

lm = sns.lmplot(data=data_to_plot[(data_to_plot.BMIpct.isin((10,25,50,75,90)) )], 
           x='ageMonths', y='BMI', hue='BMIpct', col='sex', order=3)

axes = lm.axes



Lastly, we can view our population averages across both sexes.

# numeric_only=True skips the text 'sex' column, which cannot be averaged
agg_data_to_plot = df.groupby('ageYears').mean(numeric_only=True).reset_index()
sns.lmplot(data=agg_data_to_plot, x='ageYears', y='BMI', order=3)
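Since the goal is to measure prevalence, it may help to turn the per-patient results into a simple rate. The sketch below counts the share of patients above a percentile threshold for each sex, using plain dictionaries shaped like the ones getLatestBMI returns. The sample records are invented, and obesity_prevalence is a hypothetical helper.

```python
from collections import Counter

def obesity_prevalence(results, threshold=90):
    """Share of patients per sex whose BMI percentile is above the threshold."""
    totals = Counter()
    flagged = Counter()
    for r in results:
        totals[r['sex']] += 1
        if r['BMIpct'] > threshold:
            flagged[r['sex']] += 1
    return {sex: flagged[sex] / totals[sex] for sex in totals}

# Invented records for illustration, in the same shape as bmi_results.
SAMPLE_RESULTS = [
    {'ageMonths': 60, 'BMI': 15.2, 'BMIpct': 50, 'sex': 'male'},
    {'ageMonths': 96, 'BMI': 21.0, 'BMIpct': 95, 'sex': 'male'},
    {'ageMonths': 130, 'BMI': 25.4, 'BMIpct': 97, 'sex': 'male'},
    {'ageMonths': 40, 'BMI': 15.0, 'BMIpct': 25, 'sex': 'male'},
    {'ageMonths': 72, 'BMI': 18.1, 'BMIpct': 90, 'sex': 'female'},
    {'ageMonths': 150, 'BMI': 31.0, 'BMIpct': 100, 'sex': 'female'},
]

print(obesity_prevalence(SAMPLE_RESULTS))  # {'male': 0.5, 'female': 0.5}
```

The threshold is a parameter because definitions vary. This post uses the 90th percentile, while the CDC's usual definition of childhood obesity is a BMI at or above the 95th percentile for age and sex, so the comparison operator would need adjusting for that case.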


Other Resources

There is a great Python package, PyGrowUp, that more accurately calculates Z-scores, which are the better statistical method for tracking a patient’s deviation from the population. In addition, the CDC has published a very nice SAS program that helps get the official formulation for calculating both the Z-score and percentile. These packages also support the height and weight measurements for children as well as the BMI percentile metric we calculated above.
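For a sense of what a Z-score calculation involves, here is a minimal sketch of the LMS transformation that the CDC growth chart documentation describes, where L, M, and S are the age- and sex-specific parameters published in the reference tables. The parameter values in the example are invented for illustration, the function names are hypothetical, and this is not the code of PyGrowUp or the CDC SAS program.

```python
import math

def lms_zscore(x, l, m, s):
    """Z-score of measurement x given LMS parameters (Box-Cox power, median, variation)."""
    if l == 0:
        return math.log(x / m) / s
    return ((x / m) ** l - 1) / (l * s)

def z_to_percentile(z):
    """Convert a Z-score to a percentile using the standard normal CDF."""
    return 100 * 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Invented LMS parameters for illustration only; real values come from the CDC tables.
L_PARAM, M_PARAM, S_PARAM = -2.0, 16.5, 0.08

z_at_median = lms_zscore(M_PARAM, L_PARAM, M_PARAM, S_PARAM)
print(z_to_percentile(z_at_median))  # 50.0
```

A measurement equal to the median M always gives a Z-score of zero and therefore the 50th percentile. Values above the median give positive Z-scores, which yields a continuous percentile rather than the bucketed one computed earlier in this notebook.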

Still, I hope you find this helpful, especially if you are trying to measure obesity in pediatric populations.