by Luke Shulman

Almost 10 PieCharts 10 Python Libraries

Here is a follow-up to our “10 Heatmaps 10 Libraries” post. For those of you who don’t remember, the goal is to create the same chart in 10 different python visualization libraries and compare the effort involved.

All of the Jupyter notebooks to create these charts are stored in a public GitHub repo, Python-Viz-Compared. Each Jupyter notebook contains one chart type (bar, scatter, etc.) and up to 10 different ways of implementing it. We started with heatmaps; now we move on to pie charts.

Are Pie Charts the worst?

It’s a common visualization joke that PieCharts are the worst. Or in the words of famed data visualization guru, Edward Tufte:

One of the prevailing orthodoxies of this forum - one to which I whole-heartedly subscribe - is that pie charts are bad and that the only thing worse than one pie chart is lots of them. source

Accepted, pie charts may be the worst. But they have one amazing purpose. Pie charts are best for pie. That’s right: our data set is all about pie, and National Pi Day is coming up in just two weeks.

The data set for this exercise is taken from the National Health and Nutrition Examination Survey. Every two years over 10,000 people are interviewed about what they ate the day before. The foods they ate are carefully categorized allowing us to really understand how Americans eat pie. How many of us eat pie on a random day? Let’s see.

#importing critical items
from IPython.core.display import HTML, SVG
import pandas as pd
import numpy as np
import xport 
import IPython 
from ipywidgets import Layout
from ipywidgets import widgets
from IPython.display import display

Data for our Pie Charts

The pie categories look something like this:


53304000		Pie, blueberry, two crust

53305720		Pie, lemon (not cream or meringue), individual size or tart

Very specific. I have taken the liberty of filtering the data set to include only foods that start with “Pie” and then extracting the principal flavor. You can see that in the Prep NHANE Data.ipynb notebook.

We will read it into pandas and use it for all charts going forward.

Column           Description
ParticipantLine  A unique identifier for the survey respondent and their food
Source           Where did they get this pie?
EatingOccasion   When did they get this pie?
AtHome           Boolean: did they eat the pie at home?
Calories         Estimated calories of their pie
FoodCode         The predominant flavor of the pie (see above)

pie_raw = pd.read_csv('data/Pie_Data.csv')
pie_raw.head()
   ParticipantLine  Source                           EatingOccasion  AtHome  Calories  FoodCode
0  73570.015.0      Store                            Dinner          True    428.0     lemon (not cream or meringue)
1  73598.015.0      Restaurant with Waiter-Waitress  Dinner          False   726.0     pumpkin
2  73633.08.0       Store                            Lunch           False   309.0     pumpkin
3  73653.018.0      From Somewhere Else-Gift         Snack           True    123.0     lemon (not cream or meringue)
4  73726.017.0      Bake Sale                        Snack           True    17.0      blueberry

Pie charts are often presented in sets. They show the relative proportion of different groupings. In our tests below, we will try to graph both where Americans get their pie (Source) and what flavors they prefer (FoodCode). This provides a new test for the ten libraries used to visualize them. Should a visualization library include layout tools to create a grid or rows of different charts? Or should charts be their own distinct elements to be composed by a document such as Jupyter (or by the DOM for the web)?

All libraries demonstrated here answer this question differently.

So let’s get our data on pie sources and pie flavors together. We will use a standard pandas aggregation to create two data frames from which most of the visualizations will be built.

pie_sources = pie_raw.groupby('Source').agg('count')
pie_flavors = pie_raw.groupby('FoodCode').agg('count')
Source                           ParticipantLine  EatingOccasion  AtHome  Calories  FoodCode
Bake Sale                        2                2               2       2         2
Convenience Store                2                2               2       2         2
FastFood                         14               14              14      14        14
From Somewhere Else-Gift         41               41              41      41        41
Grown by Someone you know        1                1               1       1         1
Non-School Cafeteria             1                1               1       1         1
Restaurant with Waiter-Waitress  14               14              14      14        14
Soup Kitchen or Food Pantry      1                1               1       1         1
Store                            87               87              87      87        87
Vending Machine                  1                1               1       1         1
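
Every column in that output repeats the same number because 'count' tallies the non-null values of each remaining column within a group. Here is a minimal sketch on a made-up three-row frame (the rows and the names toy, counts_frame and counts_series are invented for illustration, not taken from the notebook) showing that groupby(...).size() gives the same tallies as a single Series:

```python
import pandas as pd

# Toy stand-in for pie_raw; these rows are invented for illustration only.
toy = pd.DataFrame({
    "Source": ["Store", "Store", "Bake Sale"],
    "FoodCode": ["pumpkin", "apple", "blueberry"],
    "Calories": [309.0, 428.0, 17.0],
})

# agg('count') counts non-null values in every remaining column,
# so each column repeats the same per-group tally when nothing is missing.
counts_frame = toy.groupby("Source").agg("count")

# size() returns one number per group as a Series.
counts_series = toy.groupby("Source").size()

print(counts_frame)
print(counts_series)
```

Either form works for the charts in this post; the notebook simply picks one column out of the repeated counts.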

Pie Chart 1: MatplotLib

First up is matplotlib, the most venerable Python visualization library, with support for exporting to many rendering formats (png, pdf, svg, etc.).

import matplotlib.ticker as ticker
import matplotlib.cm as cm
import matplotlib as mpl
from matplotlib.gridspec import GridSpec

import matplotlib.pyplot as plt
%matplotlib inline
#credit https://matplotlib.org/devdocs/gallery/pie_and_polar_charts/pie_demo2.html#sphx-glr-gallery-pie-and-polar-charts-pie-demo2-py

source_labels = pie_sources.FoodCode.sort_values().index
source_counts = pie_sources.FoodCode.sort_values()

flavor_labels = pie_flavors.Source.sort_values().index
flavor_counts = pie_flavors.Source.sort_values()

# One wide figure; each pie gets a square (aspect=1) subplot in the top row of a 2x2 grid
plt.figure(1, figsize=(20,10))
the_grid = GridSpec(2, 2)


cmap = plt.get_cmap('Spectral')
colors = [cmap(i) for i in np.linspace(0, 1, 8)]


plt.subplot(the_grid[0, 1], aspect=1, title='Source of Pies')

source_pie = plt.pie(source_counts, labels=source_labels, autopct='%1.1f%%', shadow=True, colors=colors)


plt.subplot(the_grid[0, 0], aspect=1, title='Selected Flavors of Pies')

flavor_pie = plt.pie(flavor_counts,labels=flavor_labels, autopct='%.0f%%', shadow=True, colors=colors)

plt.suptitle('Pie Consumption Patterns in the United States', fontsize=16)


plt.show()

[Figure: matplotlib pie charts, “Source of Pies” and “Selected Flavors of Pies”]

And now we can clearly see one of the major drawbacks of pie charts. Any dimension with more than five categories looks awful, and the small wedges can’t be told apart. Let’s fix this going forward with some pandas magic which will group lesser values under “Other”.

The function below calculates the 75th percentile (the 0.75 quantile) of the category counts and groups every category that falls below it together:

def group_lower_ranking_values(column):
    # Count rows per category; with no missing values every column holds the same tally, so take the first
    pie_counts = pie_raw.groupby(column).agg('count')
    counts = pie_counts[pie_counts.columns[0]]
    # Categories whose count falls below the 75th percentile get relabeled 'Other'
    pct_value = counts.quantile(.75)
    values_below_pct_value = counts[counts < pct_value].index
    relabeled = pie_raw.copy()
    relabeled[column] = relabeled[column].where(~relabeled[column].isin(values_below_pct_value), 'Other')
    pie_grouped = relabeled.groupby(column).agg('count')
    return pie_grouped

pie_sources = group_lower_ranking_values('Source')
pie_flavors = group_lower_ranking_values('FoodCode')
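
To see what that cutoff does in practice, here is a small worked sketch using the Source counts from the aggregation table earlier in the post. The names source_totals, cutoff, small and grouped are illustrative, not the notebook’s, and the sketch works on the counts directly rather than on pie_raw:

```python
import pandas as pd

# Source counts copied from the aggregation table shown earlier in the post.
source_totals = pd.Series({
    "Bake Sale": 2, "Convenience Store": 2, "FastFood": 14,
    "From Somewhere Else-Gift": 41, "Grown by Someone you know": 1,
    "Non-School Cafeteria": 1, "Restaurant with Waiter-Waitress": 14,
    "Soup Kitchen or Food Pantry": 1, "Store": 87, "Vending Machine": 1,
})

cutoff = source_totals.quantile(0.75)          # 75th percentile of the counts
small = source_totals[source_totals < cutoff]  # categories folded into 'Other'

# Keep everything at or above the cutoff and add one combined 'Other' entry.
grouped = source_totals[source_totals >= cutoff].copy()
grouped["Other"] = small.sum()

print(cutoff)
print(grouped.sort_values())
```

Because the comparison is a strict less-than, the two categories that sit exactly on the cutoff keep their own wedges, and ten sources collapse to five.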
        

We can then remake our chart:

[Figure: the matplotlib pie charts redrawn with small categories grouped under “Other”]

Overall, matplotlib/pyplot pie charts are pretty easy. Notice we set up a GridSpec grid and placed two subplots in its top row. That allowed matplotlib to draw both plots in one overall figure. The next two libraries use matplotlib as a backend, so you will notice some of the same layout features.
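
For comparison, a one-row layout can also be sketched with plt.subplots instead of GridSpec. This is an alternative to the notebook’s approach, not a copy of it; the counts are the Source totals from earlier with the small categories already combined, the labels are shortened for display, and the second panel is a plain bar chart of the same counts, chosen only to fill the grid:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs outside Jupyter
import matplotlib.pyplot as plt

# Source totals from the post with small categories combined; labels shortened.
source_counts = {"Other": 8, "FastFood": 14, "Restaurant": 14, "Gift": 41, "Store": 87}
labels = list(source_counts.keys())
values = list(source_counts.values())

# One row, two columns: both axes live in the same figure.
fig, (left, right) = plt.subplots(1, 2, figsize=(12, 6))

left.pie(values, labels=labels, autopct="%1.1f%%")
left.set_aspect("equal")  # keep the pie circular
left.set_title("Source of Pies")

right.barh(labels, values)
right.set_title("Source of Pies (counts)")

fig.suptitle("Pie Consumption Patterns in the United States", fontsize=16)
```

In a notebook with %matplotlib inline the figure renders on its own; elsewhere, fig.savefig would write it to disk.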

Almost Pie Chart 2: Seaborn

When we did the post on heatmaps, I wrote about Seaborn’s special use case:

Seaborn is a streamlining of matplotlib’s API to make it more applicable to statistical applications. Seaborn’s API makes you think about the best way to compare univariate or bivariate data sets and then has clear and concise syntax to get the charts needed to immediately compare your variables.

Given this use case, there is no built-in way to draw a pie chart with Seaborn. This makes sense. Pie charts are a difficult and deceiving way of comparing univariate data. A bar chart can always replace a pie chart, so the pie chart is simply not included, and shouldn’t be. Of course, this being an open source project, people have requested it. But Seaborn is the ultimate Swiss Army knife for data science, and part of creating the perfect tool for peering into data means leaving out views that aren’t helpful or are frankly deceptive by design. Fear not, every pie chart can be a bar chart. So where we cannot see pie, we can still visualize pie.

import seaborn as sns

sns.set(style="whitegrid")
#sns.set_color_codes("Spectral")

source_data = pd.DataFrame(source_counts).reset_index()
flavor_data = pd.DataFrame(flavor_counts).reset_index()

plt.figure(2, figsize=(20,15))
the_grid = GridSpec(2, 2)

plt.subplot(the_grid[0, 1],  title='Source of Pies')
sns.barplot(x='FoodCode',y='Source', data=source_data, palette='Spectral')
plt.subplot(the_grid[0, 0], title='Selected Flavors of Pies')

sns.barplot(x='Source',y='FoodCode', data=flavor_data, palette='Spectral')

plt.suptitle('Pie Consumption Patterns in the United States', fontsize=16)

[Figure: Seaborn bar charts, “Source of Pies” and “Selected Flavors of Pies”]

Seaborn inherits axes, figures, and subplots from matplotlib. Above, we plotted the same data on a bar graph. This is instantly better because it shows the viewer not just the relative comparison of the flavors or sources but also the fact that we only have about 164 entries for pie in the entire NHANES survey.

And that is how pie charts deceive you. In a pie chart, I can show you relative percentages but cover up the fact that pie isn’t eaten all that often. Of the more than 10,000 people interviewed, very few had eaten pie that day.
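
The arithmetic behind that point is short enough to spell out. In this sketch the pie counts come from the Source table above, while survey_size is an assumed round figure based on the “over 10,000 people” description, not an exact NHANES count:

```python
# Counts from the Source table in the post.
store_pies = 87
all_pies = 164

# Assumption: a round figure standing in for the real number of respondents.
survey_size = 10_000

share_of_pies = store_pies / all_pies      # what the pie wedge shows
share_of_survey = all_pies / survey_size   # what the pie chart hides

print(f"Store share of pie entries: {share_of_pies:.1%}")
print(f"Pie entries relative to survey size: {share_of_survey:.2%}")
```

The wedge for Store covers a bit more than half the circle, yet all pie entries together amount to under two percent of the assumed survey size. A bar chart with a count axis makes that second number visible; a pie chart cannot.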

Almost Pie Chart 3: PlotNine (ggplot2)

plotnine is a Python implementation of ggplot2, R’s most dominant visualization library. Like matplotlib in Python, ggplot2 is the default visualization library for R, with support for all types of outputs.

Now, you can do pie charts in ggplot2 by using polar coordinates to draw specific sectors of a circle. That is interesting and forces the user to identify exactly how a pie chart works: a full circle (2π radians) is divided among the sectors in proportion to each one’s share of the total.

I am back to 7th grade math. Unfortunately, plotnine has not yet ported over ggplot2’s coord_polar, so alas we can’t use it to create a pie chart either. Once again, back to bar charts:

from plotnine import * 

sources_plot = ggplot(source_data, aes(x='Source', y='FoodCode', fill='Source')) + geom_col() + coord_flip()\
+ scale_fill_brewer(type='div', palette="Spectral") + theme_classic() + ggtitle('Sources of Pie')


flavors_plot = ggplot(flavor_data, aes(x='FoodCode', y='Source', fill='FoodCode')) + geom_col() + coord_flip()\
+ scale_fill_brewer(type='div', palette="Spectral") + theme_classic() + ggtitle('Flavors of Pie')


display(sources_plot, flavors_plot)

[Figures: plotnine bar charts, “Sources of Pie” and “Flavors of Pie”]

So alas, pie charts are not supported. Also, the ggplot2 features that would allow laying out multiple plots together are not yet implemented, so we used Jupyter’s display command to show both outputs.

Still, ggplot2 and plotnine have a nice natural rhythm to them. I love the grammar of defining my plot and continuing to add elements that shift it. It feels like functional programming for visualization as opposed to just a big script of code.

Pie Chart 4: BqPlot

Based on the VanderPlas taxonomy, the next four libraries are from a different core set of assumptions. These all use a python API to customize a javascript client-side framework that renders the data and figure in the browser. The advantages to this approach are that the figures have a modern look and can include rich browser interactions such as zooming, selection, and filtering.

import bqplot
from palettable.colorbrewer.diverging import *
#Make sure you enable the jupyter extension


colors = Spectral_8.hex_colors
colors.reverse()

data = np.random.rand(3)
sc = bqplot.ColorScale(scheme='Spectral')
source_pie = bqplot.Pie(sizes=source_counts, display_labels=True, scales={'color': sc}, colors=colors)
flavor_pie = bqplot.Pie(sizes=flavor_counts, display_labels=True, scales={'color': sc}, colors=colors)


source_pie.labels = source_counts.index.values.tolist()
flavor_pie.labels = flavor_counts.index.values.tolist()
source_fig = bqplot.Figure(marks=[source_pie], title='Sources of Pie')
flavor_fig = bqplot.Figure(marks=[flavor_pie], title='Flavors of Pie')
widgets.HBox([source_fig, flavor_fig])

[Figure: BqPlot pie charts, “Sources of Pie” and “Flavors of Pie”]

BqPlot also doesn’t have built-in layout options but because these are simply javascript objects, integrated as widgets, we can arrange them in a nice horizontal row using the Jupyter widgets layout options. The HBox in Jupyter creates a horizontal box into which we can insert these elements.

Pie Chart 5: plotly

plotly is a fantastic plotting library that combines a free, open-source version with a paid version that offers some server-assisted features. Plotly has amazing cross-platform support for Python, R, and JavaScript. It also has great documentation and a great example library.

You will need to register for a free plotly account to use this one. You then store your credentials by running the following in Python:

import plotly
plotly.tools.set_credentials_file(username='DemoAccount', api_key='i8i8i8i8i8')

And then here is our chart. I have swapped the chart for static images, so some of the interactivity may be disabled.

import plotly.plotly as py
import plotly.graph_objs as go
from plotly import tools


sources_pie = go.Pie(labels=source_counts.index, values=source_counts, marker=dict(colors=colors
                                                            , line=dict(color='#FFF', width=2)), 
                                                            domain={'x': [0.0, .4], 'y': [0.0, 1]}
                                                            , showlegend=False, name='Sources of Pie', textinfo='label+percent')

flavor_pie = go.Pie(labels=flavor_counts.index, values=flavor_counts, marker=dict(colors=colors
                                                            , line=dict(color='#FFF', width=2)), 
                                                            domain={'x': [.6, 1], 'y': [0.0, 1]}
                                                            , showlegend=False, name='Flavors of Pie', textinfo='label+percent')

layout = go.Layout(height = 600,
                   width = 1000,
                   autosize = False,
                   title = 'Pie Consumption Patterns in the United States')
fig = go.Figure(data = [sources_pie,flavor_pie ], layout = layout)


py.iplot(fig, filename='basic_pie_chart')

#https://stackoverflow.com/questions/39629735/how-to-plot-pie-charts-as-subplots-with-custom-size-with-plotly-in-python

Plotly, as you can see, was very succinct, and it added interactivity automatically. The online support is also really great. By visiting the links at plotly, you can edit the chart in a sort of GUI on their website and even regenerate the code used to create the plot. Notice the intelligent handling of the labels for each wedge. This is so hard to calculate reliably. Plotly really excels at these little details.

Pie Chart 6: Cufflinks

This is sort of a trick. Cufflinks is just plotly with a different API designed to be run directly from a pandas dataframe. This makes the data inputs easier to set up and use in the charts.

import cufflinks as cf

# Cufflinks does not support the pandas categorical dtype, so we rebuild plain dataframes with string labels.
source_df  = pd.DataFrame(source_counts).reset_index()
flavor_df = pd.DataFrame(flavor_counts).reset_index()
source_pie = source_df.iplot(kind='pie', labels='Source', values='FoodCode', colors=colors, title='Sources of Pie')
# flavor_df holds the flavors in 'FoodCode' and their counts in 'Source'
flavor_pie = flavor_df.iplot(kind='pie', labels='FoodCode', values='Source', colors=colors, title='Flavors of Pie')

display(source_pie, flavor_pie)



[Figures: Cufflinks pie charts, “Sources of Pie” and “Flavors of Pie”]

Embedding plotly in pandas is really powerful, but alas it means no composing of subplots into a single figure.

Pie Chart 7: Bokeh

Bokeh is another combination of a JavaScript client library and a Python API, developed and maintained by Anaconda (formerly Continuum Analytics). As mentioned in previous posts, Bokeh serves as the backend for newer libraries being developed by the Anaconda team.

Now, remember all the circle angle talk above with ggplot? Well, Bokeh is going to require us to manually calculate, in radians, the start and end angle of each wedge.

from math import pi
source_pct_df = pd.DataFrame(source_counts/source_counts.sum()).reset_index().sort_values(by='FoodCode')
flavor_pct_df = pd.DataFrame(flavor_counts/flavor_counts.sum()).reset_index().sort_values(by='Source')


source_starts = [p*2*pi for p in source_pct_df.cumsum().FoodCode[:-1]]
source_ends = [p*2*pi for p in source_pct_df.cumsum().FoodCode[1:]]
source_legends = source_pct_df.Source.values.tolist()[1:]

flavor_starts = [p*2*pi for p in flavor_pct_df.cumsum().Source[:-1]]
flavor_ends = [p*2*pi for p in flavor_pct_df.cumsum().Source[1:]]
flavor_legends = flavor_pct_df.FoodCode.values.tolist()[1:]
 
from bokeh.io import show, output_notebook
from bokeh.models import (
    ColumnDataSource,
    HoverTool,
    LinearColorMapper,
    BasicTicker,
    PrintfTickFormatter,
    ColorBar,
    FactorRange,
    Row
)
from bokeh.plotting import figure
from bokeh.palettes import Spectral
import holoviews as hv #There is a reason we have to do this here but its not important. Holoviews is the next library
hv.extension('bokeh')




#ColumnDataSource is bokeh fancy shared datasource. Not applicable here but it would generally allow the sharing of one data source
#with multiple charts. 

size=500

z = figure(title="Sources of Pie Consumption", x_range=(-1,1), y_range=(-1,1), width=size, height=size
          )

for start, end , legend, color in zip(source_starts, source_ends, source_legends, colors[0:len(source_starts)]):
    z.wedge(x=0, y=0, radius=1, start_angle=start, end_angle=end, color=color, legend=legend)

y = figure(title=" Flavors of Pie Consumption", x_range=(-1,1), y_range=(-1,1), width=size, height=size
          )

for start, end , legend, color in zip(flavor_starts, flavor_ends, flavor_legends, colors[0:len(flavor_starts)]):
    y.wedge(x=0, y=0, radius=1, start_angle=start, end_angle=end, color=color, legend=legend)

z.legend.location = 'bottom_right'
y.legend.location = 'bottom_right'
r = Row(z, y)

show(r)


[Figure: Bokeh wedge charts, “Sources of Pie Consumption” and “Flavors of Pie Consumption”]

Did that hurt you as much as it hurt me? Bokeh has a great grammar of graphics, but you have to calculate everything about the wedges manually. So at the top, we calculated the start and end angle of each wedge in radians. To do that, we need the cumulative sum of each percentage. After all of that, we still have a gap in the pie chart because our cumulative sums don’t start at zero.
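
That gap comes from slicing the cumulative sums without a leading zero, so the first wedge never gets a start angle. Here is a minimal sketch of the arithmetic using only the standard library and the grouped Source counts; the names counts, edges, start_angles and end_angles are illustrative, not the notebook’s:

```python
from itertools import accumulate
from math import pi, isclose

# Source totals from the post with small categories combined under 'Other'.
counts = {
    "Other": 8,
    "FastFood": 14,
    "Restaurant with Waiter-Waitress": 14,
    "From Somewhere Else-Gift": 41,
    "Store": 87,
}

total = sum(counts.values())
fractions = [n / total for n in counts.values()]

# Cumulative fractions with a leading 0 so the first wedge starts at angle 0.
edges = [0.0] + list(accumulate(fractions))
start_angles = [e * 2 * pi for e in edges[:-1]]
end_angles = [e * 2 * pi for e in edges[1:]]

for label, start, end in zip(counts, start_angles, end_angles):
    print(f"{label}: {start:.3f} to {end:.3f} rad")

# The last wedge should close the circle.
assert isclose(end_angles[-1], 2 * pi)
```

Each wedge now ends where the next begins, and there are as many wedges as labels, so the three lists could be zipped together in a loop like the one above without dropping the first category.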

Don’t ever make a pie chart in Bokeh. This Stack Overflow answer was critical to figuring this out. If rule 1 of visualization is “don’t make a pie chart,” rule 2 is “don’t make it in Bokeh.”

Pie Chart 8: Holoviews

Holoviews uses bokeh as its underlying engine but reduces the verbosity by having the user declare attributes about their data and allowing the visualizations to infer themselves from the dependent and independent variables, referred to as value dimensions (vdims) and key dimensions (kdims). It’s really great but that also means it has no use for Pie Charts. We will use a bar chart just to show off.

%%opts Bars [xrotation=90 width=600 height=500 show_legend=False tools=['hover'] invert_axes=True ]
%%opts Bars (fill_color=Cycle('Spectral'))
%%opts Layout [shared_axes=False]


import holoviews as hv
hv.extension('bokeh')
pie_raw_holo_table = hv.Table(pie_raw, kdims=['Source', 'FoodCode'])
sources_chart = pie_raw_holo_table.aggregate('Source', function=np.count_nonzero).to.bars(group='Sources of Pie Consumption ')
flavors_chart = pie_raw_holo_table.aggregate('FoodCode', function=np.count_nonzero).to.bars(group='Flavors of Pie Consumption ')
sources_chart + flavors_chart

[Figure: Holoviews bar charts, “Sources of Pie Consumption” and “Flavors of Pie Consumption”]

One of the advantages of Holoviews is that before declaring what visualization you want, you actually identify the structure of your data and then how you want that data summarized and displayed. As stated in the Holoviews introduction:

HoloViews focuses on bundling your data together with the appropriate metadata to support both analysis and visualization, making your raw data and its visualization equally accessible at all times.

To demonstrate this, notice how we first created a Holoviews table from the raw data, pie_raw. From there we were able to go all the way through our data preparation process and finally simply ask for a bar graph of the data. We lost the groupings, but with a bar chart we can see all the detail without a problem. Holoviews delivers one easy, nicely formatted chart in very few lines of code. Also, the layout is as easy as using the + and * operators to join charts together. This is by far the best layout syntax we have used so far. That means if you are blending multiple types of charts using the same underlying data, Holoviews might be the best option for you.

The coloring did not work though I am fairly sure that I got the Cycle syntax correct. Someone feel free to comment with how I should fix that.

Pie Chart 9: Altair

Altair is the Python implementation of the Vega-Lite specification. What’s the difference between Vega and Vega-Lite? Well, there are lots of differences, and one of them is that pie charts aren’t supported in Vega-Lite. We are lucky, because even in full Vega it would require us to calculate the angles manually once again, just like in Bokeh.

For now, here is a bar chart of our data.

from altair import Row, Column, Chart, Text, Scale, Color

source_altair_chart = Chart(source_df).mark_text(
               applyColorToBackground=True
           ).mark_bar().encode(x='FoodCode',y='Source', color=Color('Source', scale=Scale(range=colors)))

#"scale": {"scheme": "bluepurple"} doesn't work

flavor_altair_chart = Chart(flavor_df).mark_text(
               applyColorToBackground=True
           ).mark_bar().encode(x='Source',y='FoodCode', color=Color('FoodCode', scale=Scale(range=colors)))

display(source_altair_chart, flavor_altair_chart)

[Figures: Altair bar charts of pie sources and pie flavors]

Pie Chart 10: PyGal

New to our analysis is PyGal, described as “Sexy python charting,” which is great. PyGal uses a range of SVG and CSS frameworks to create rich in-browser visualizations with a syntax that is dramatically simpler. The syntax of PyGal is very different in that you compose each data set almost row by row. This seems like it could be a hassle, but it is very easy in practice. Because it generates raw SVG, you use Jupyter’s own display command to visualize it, making this very portable around the web. In fact, even on the blog, the SVG is embedded right here.

import pygal 
from pygal.style import Style
custom_style = Style(
  colors=colors,
    legend_font_size = 4,
    title_font_size = 5,
    value_font_size=4)


sources_pygal_pie = pygal.Pie(width=150, height=150, style=custom_style, legend_box_size=4
                              , title='Sources of Pie Consumption', print_values=True)
flavors_pygal_pie = pygal.Pie(width=150, height=150, style=custom_style, legend_box_size=4
                              , title='Flavors of Pie', print_values=True)

for row in source_pct_df.values.tolist():
    source = str(row[0])
    pct = row[1]*100 
    sources_pygal_pie.add(source, pct)  # Add some values

for row in flavor_pct_df.values.tolist():
    flavor = row[0]
    pct = row[1]*100
    flavors_pygal_pie.add(flavor, pct)  # Add some values

sources_pygal_pie.value_formatter = lambda x: "{:.0f}".format(x)
flavors_pygal_pie.value_formatter = lambda x: "{:.0f}".format(x)
sources_display = sources_pygal_pie.render() 
flavors_display = flavors_pygal_pie.render()
no_wrap_div = '<div style="white-space: nowrap; overflow-x: auto">{}{}</div>'

#credit for this idea at  https://stackoverflow.com/questions/44752380/how-to-display-two-svg-images-on-the-same-line-in-jupyter-notebook

display(HTML(no_wrap_div.format(sources_display, flavors_display)))

[Figure: PyGal pie charts, “Sources of Pie Consumption” and “Flavors of Pie”]

PyGal wasn’t included in our previous edition of 10 for 10 (pour one out for Lightning-viz, which is retired). Considering it creates fully scalable vectors, it was fantastic and much simpler to use than Altair, which is the leading full SVG visualization library. PyGal also has a great set of default styles that look fantastic. PyGal’s SVG, web-first style also makes it much easier to embed with Flask, Django, or another web framework.

Wrapping Up

What is the overall takeaway from this post? Well, you should clearly understand by now that you don’t want to use pie charts, ever. They are routinely left out of visualization packages or buried so deep that they are almost impossible to implement. Pie charts are also visually deceiving for your users, so it is best to avoid them.

The real lessons in this post involve how you take two charts and lay them out together in one seamless view. Some of the libraries, such as Bokeh and Matplotlib, provide natural, built-in ways of combining two plots that don’t share data or axes so that they play well together. Other libraries rely on you, the user, to lay out the final product, either using Jupyter HTML commands or within your website or publication.

Lastly there are some real differences in terms of data handling:

  • Matplotlib and Plot.ly were able to use the counts under each grouping to easily create the pie chart and automatically calculate the percentages for each wedge.
  • BqPlot and Pygal needed to have the percentages pre-calculated. Not terribly difficult, just different.
  • Bokeh required that you calculate the exact angles and percentages.
  • Holoviews was the only one that could really just use the raw data from the survey itself.

Feel free to leave comments below or reach out to any of us at Algorex Health.

I have converted most of the charts to png images for this post. This allows for faster loading and better integration with our CMS. However, the working notebook that generates all the charts in the browser is at Python-Viz-Compared.