by Luke Shulman

Everything You Never Wanted to Know about Geographic Charting

Before the start of seventh grade, my family moved, and I started a new school that had as required module, cartography. Over a few weeks, we painstakingly drew maps of states, continents and eventually the entire world. My backpack became filled with various stencils, a compass, colored pencils, pens of multiple weights, and even tracing paper. It was one of the more frustrating parts of my schooling and I remember thinking “Why would I ever need this?”

Well fast forward and now the possibility to visualize data on maps for our computers and devices has made cartographers of every aspiring data scientist.

But there is a steep learning curve. This post summarizes the key concepts that I had to relearn before I really understood how to build map visualizations on a regular basis. For me the key understanding came when I mastered five concepts:

  1. Data Sources: Where do we get geographic data and in what format?
  2. Geometry: Specifically how we represent geometric shapes in a data source for mapping?
  3. Projection: How do you take our squashed basketball of a planet and represent it on a computer screen?
  4. Layers and Tilesets: What do we look at when we see a map?
  5. Calculate Something: How do we make an analysis that works and provides insight?

For this post, we are going to code in Python with visualizations built primarily in Folium. The Data used is sourced the Greensboro, NC Fire Incident Database from Greensboro Open Data

Data Sources

If you’re just dealing with latitudes and longitudes, those are in traditional table structures can often found in a CSV file or flat file of some sort. However more complicated geographic data sources like those prepared by the US Census or state GIS agencies are in a range of file formats that can be very difficult to utilize.

These geographic data files contain structures for various geographies and projections necessary to exchange data and those topics we will cover later in this post. But for now here are the major file types you are going to run into.

  1. Shapefiles (.shp): Shapefiles are a common standard, propagated by ESRI. First issue with shapefiles is that they aren’t files; they are really directories. Although one of the files is a .SHP file type, all the files that are included in the directory are necessary to utilize the shapefile. (I’ve made that mistake before)
  2. KML Files: Keyhole Markup Files (KML) an open standard that uses html/xml style markup for geographic data. KML files are great and contain everything in one file BUT they aren’t supported by Fiona and GeoPandas which are the primary ways that I extract data sources into Python for analysis. While that seems like drawback, I haven’t seen a data source yet that has only KML and not any other file type. FastKML looks to be a very strong library for supporting KML files if you need it.
  3. GeoJSON: The newest standard on the block, GeoJSON, is gaining traction because it integrates nicely with web-client libraries for visualization. For this reason, like JSON, GeoJSON is becoming more of a de facto standard.

There are others of course but these are the primary ones you will come across.

Geometry

I am not sure why but for some reason it was a realization to me that latitude and longitude are really just x,y coordinates. They are a “Point.” A road is a Line or more precisely, because its constantly turning, it is a “Polyline”. Lastly the boundaries of a landmass or a country or city are a “Polygon”. To represent these on the map, you need to understand how the data structures from above are going be shown on a chart or visualization. Everything you see on a map boils down to these basic unit-less shapes: Point, Line, Polygon repeated over and over again with varying styles is what creates the map.

Now, let’s see this in action:

import pandas as pd
import geopandas as gpd
from shapely.geometry import Point

point_maker = lambda row: Point(row['Longitude'], row['Latitude'])
fire = pd.read_csv("data/GreensboroFire.csv")
fire_geo = gpd.GeoSeries(fire[['Latitude', 'Longitude']].apply(point_maker, axis=1))
p = fire_geo.plot()
p.set_title("Fire Incidents Causing over $100,000 of Damage in Greensboro, NC")
plt.savefig('GreensboroFiresNoInformation.png')
show(p)

GreensboroNo Info

Not very exciting. We see only points on a chart. We can’t see where these points are. We lack perspective. That’s because although we have successfully charted the geometry, we haven’t placed these points onto Earth. That requires our next set of concepts.

Projection

In learning cartography in 7th grade, we all had to draw something on a Clementine, peel it, and attempt to recreate the picture on our desks. This was a simple example of how tricky it can be to make round things appear on flat surfaces.

Remember these:

import cartopy.crs as ccrs
import matplotlib.pyplot as plt
%matplotlib inline

ax1 = plt.subplot(321, projection=ccrs.LambertConformal())
ax1.coastlines(resolution='50m')
ax1.set_title('Lambert Conformal Projection')
ax2 = plt.subplot(322, projection=ccrs.AlbersEqualArea())
ax2.coastlines(resolution='50m')
ax2.set_title('Albers-Equal Area Projection')
ax3 = plt.subplot(323, projection=ccrs.Mercator())
ax3.coastlines(resolution='50m')
ax3.set_title('Mercator Projection')
ax4 = plt.subplot(324, projection=ccrs.Mollweide())
ax4.coastlines(resolution='50m')
ax4.set_title('Mollweide Projection')
ax5 = plt.subplot(325, projection=ccrs.Stereographic())
ax5.coastlines(resolution='50m')
ax5.set_title('Sterepgraphic Projection')
ax6 = plt.subplot(326, projection=ccrs.InterruptedGoodeHomolosine())
ax6.coastlines(resolution='50m')
ax6.set_title('Interrupted Goode Homolosine Projection')
plt.tight_layout()


plt.savefig("Projections.png")

These are six examples of map projections.

A map projection is a systematic representation of all or part of the surface of a round body, especially the Earth, on a plane… This cannot be done without distortion; the cartographer must choose the characteristic which is to be shown accurately at the expense of others or a compromise of several characteristics…It cannot be said that there is one “best” projection for mapping. Source John P. Snyder USGS

In addition to the projection, maps represent the coordinates and axes of X and Y differently. This is known as the coordinate reference system (CRS). In some reference systems, the speed of calculation and ease of use as in the United States Grid System which allows accurate 1km grid maps to be made of any US-based location. In short, Latitude and Longitude and all our shapes and lines must be adjusted according to the projection and the CRS.

These attributes are literally the language of the map and without them the data you wish to plot is often meaningless and can’t be shown. If you tried to feed the “Points” assembled in the code above to a mapping application like Google Maps, you would get an error:

ValueError: Cannot transform naive geometries.  Please set a crs on the object first.

This error is essentially that the coordinate reference system (CRS) is not known. The system doesn’t know what language you want to speak.

To help sort this out, projections, datums, CRS attributes have all been encoded into an internationally recognized shorthand of which two are the most important for map visualization:

  • WSG84 AKA EPSG:4326: This a coordinate system that you are probably most familiar with it has +- 180.000 of Longitude and +-90 of Latitude. 0,0 is in the Atlantic Ocean south of Ghana.

  • EPSG:3857 Web Mercator: This is the Web Mercator projection that is used by almost all of web-service based map providers. To visualize your map in a web browser, backed by the maps of Google or Mapbox, this is the projection you have to use even if you have to convert your raw to this standard during your processing.

Setting the projection is critical. Data sources like shape and KML files all have the projection baked in as metadata. But to overlay data sources onto one visualization you need to make sure that speak the same coordinate system which means converting them to a common standard.

Any problem you have of things not showing up is likely because your underlying shapes are encoded in the wrong coordinate system for what you are trying to do.

Layers and Tilesets

So, back to our fire data. Other than the title, there is no way to know that those points are Greensboro, NC. There is simply no context. There is no map.

Now, we can resolve this in two ways:

Drawing it yourself

We can draw cities and states and land masses as polygons and simply have them on our visualization as a layer.

Using the same fire_geo dataframe above, let’s try that switching to the folium visualization library:

import folium
kw = {'location': [36.1001779, -79.973463], 'zoom_start': 7}
m = folium.Map(tiles=None, **kw)

folium.GeoJson(ZCTA_geo).add_to(m) #First layer add our County boundaries as a layer
folium.GeoJson(fire_geo).add_to(m) #Second layer add our fire incident points  as a layer

m.save('NC_Counties_with_Points.html')

m

We are layering in first the county boundaries of North Carolina and then our fire incident data. Now, most people would be able to tell this was North Carolina. Some locals might even know that Greensboro lies just at the northern border. (I didn’t)

But we are still missing a lot. We are missing place names, roads, town names, lakes, parks, schools, highways, railroad tracks. All the things that make a map really feel like a map.

We just have our shapes.

Get some help

The second option is to get some help adding in all of the other map details. For that, we generally rely on a third-party provider to provide a basemap or tileset instead of having to code each element separately. Basically, these providers supply an API where you can request little individual pictures of a map bounded by the points you need. As you move around, it seamlessly reaches out and fetches more and more tiles to help you visualize it. Libraries for map visualization all do this seamlessly. In fact, if we just comment out the tiles=None in our code above our map turns into this:

There are some drawbacks to this approach. While the map is more interactive and allows zooming and panning, it can seem cluttered and overwhelming especially when your visualizing a small area. At the end of the day, you have to decide which features you want to be immediatly apparent.

Calculate Something

This all builds up to the final point. I admit a lot of my struggle building maps has just been getting things to appear. Understanding the shapes, the styles, how to get the tiles loaded what tiles to pick. It’s a lot to keep track of.

What gets lost in the shuffle is the incredible power to calculate new understanding from having a comprehensive geographic plane. The web is filled with examples of isochrones and paths that provide real intelligence about how we interact with our world and lot of these calculations are at your fingertips.

Working with our example, let’s bring in one more dataset: all of the fire stations in Greensboro. From there, we’ll calculate a 1 mile radius around each fire station and see how many of these high-cost fires occurred in close proximity to a fire station.


import folium
kw = {'location': [36.1001779, -79.973463], 'zoom_start': 11, 'tiles':'Stamen Toner'}

def area_style(feature):
    '''Function returns a dictionary of style 
    elements that will be populated as each feature is drawn'''
    style_dict = {'fillColor':'#feb24c', 'stroke':False, 'fill_opacity':1}
    return style_dict

m = folium.Map(\*\*kw) 
fire_stations = gpd.read_file("data/GreensBoroFireStations.geo.json")


fires  = folium.GeoJson(fire_geo) #Process the fire data but don't add it to the map yet
fire_station_f = folium.GeoJson(fire_stations) #process the station data but don't add it to the map yet. 
fire_station_buffers = fire_stations.buffer(.0144) #this about a 1 mile buffer around each station point
fire_areas = folium.GeoJson(fire_station_buffers
                            , style_function=area_style #call the function defined above
                           ).add_to(m)

for feature in fires.data['features']: #These for loops allow you manipulate style properties for each feature directly
    lon, lat = feature['geometry']['coordinates']
    
    marker = folium.CircleMarker([lat, lon], fill_color='firebrick', radius=5, color='firebrick')
    m.add_child(marker) #this adds it to the map

for feature in fire_station_f.data['features']:
    lon, lat = feature['geometry']['coordinates']
    icon = folium.Icon(icon='flag', color='orange')
    
    marker = folium.map.Marker([lat, lon], icon=icon)
    m.add_child(marker)
    

    
m.save('NC_Counties_with_Points_and_Stations.html')

m

In the chart, each red circle is one of fire incidents costing over $100,000 and each orange flag is a fire station with it’s 1 mile buffer projected around it. We’ve also picked an alternate tileset that minimizes some of the features and provides more contrast.

All in all it helps make for a different feel.

It’s a pretty basic example. When you’re looking to create a map, think about these five key points:

  • What is my data source? Am I able to process it?
  • What are the key shapes to be shown?
  • Are my sources using the same projection/CRS?
  • Should I use a map tiles and how much detail should they have?
  • What new feature can I calculate to tell the story?

There you go - start to finish a new geographic visualization.