Oct 3, 2022
Week #5 Agenda
Important: Update your local environment
import geopandas as gpd
import pandas as pd
import numpy as np
from shapely.geometry import Point
from matplotlib import pyplot as plt
import seaborn as sns
import hvplot.pandas
import holoviews as hv
# Show all columns
pd.options.display.max_columns = 999
Or, how to pull data from the web using Python
Example APIs
Note: when accessing data via an API, many services will require you to register for an API key, which they use to prevent individual users from overloading the service with requests
This is an API for near-real-time data about earthquakes, and data is provided in GeoJSON format over the web.
The API has a separate endpoint for each version of the data that users might want. No authentication is required.
API documentation:
http://earthquake.usgs.gov/earthquakes/feed/v1.0/geojson.php
Sample API endpoint, for magnitude 4.5+ earthquakes in past day:
http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/4.5_day.geojson
Note: GeoPandas can read GeoJSON from the web directly
Simply pass the URL to the gpd.read_file() function:
# Download data on magnitude 2.5+ quakes from the past week
endpoint_url = "http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/2.5_week.geojson"
df = gpd.read_file(endpoint_url)
df.head()
id | mag | place | time | updated | tz | url | detail | felt | cdi | mmi | alert | status | tsunami | sig | net | code | ids | sources | types | nst | dmin | rms | gap | magType | type | title | geometry | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | ak022d0be8a1 | 2.50 | 63 km NNE of Aleneva, Alaska | 1665423060572 | 1665423199047 | None | https://earthquake.usgs.gov/earthquakes/eventp... | https://earthquake.usgs.gov/earthquakes/feed/v... | NaN | NaN | NaN | None | automatic | 0 | 96 | ak | 022d0be8a1 | ,ak022d0be8a1, | ,ak, | ,origin,phase-data, | NaN | NaN | 0.53 | NaN | ml | earthquake | M 2.5 - 63 km NNE of Aleneva, Alaska | POINT Z (-152.40750 58.56650 6.70000) |
1 | us6000isic | 5.00 | 146 km ESE of Kokopo, Papua New Guinea | 1665418172045 | 1665419288040 | None | https://earthquake.usgs.gov/earthquakes/eventp... | https://earthquake.usgs.gov/earthquakes/feed/v... | NaN | NaN | NaN | None | reviewed | 0 | 385 | us | 6000isic | ,us6000isic, | ,us, | ,origin,phase-data, | 48.0 | 7.791 | 0.44 | 95.0 | mb | earthquake | M 5.0 - 146 km ESE of Kokopo, Papua New Guinea | POINT Z (153.51260 -4.79460 76.60300) |
2 | ak022d0am9ta | 3.70 | 43 km W of Cohoe, Alaska | 1665417657235 | 1665421655030 | None | https://earthquake.usgs.gov/earthquakes/eventp... | https://earthquake.usgs.gov/earthquakes/feed/v... | 8.0 | 3.1 | 2.796 | None | reviewed | 0 | 213 | ak | 022d0am9ta | ,ak022d0am9ta,us6000isib, | ,ak,us, | ,dyfi,origin,phase-data,shakemap, | NaN | NaN | 0.38 | NaN | ml | earthquake | M 3.7 - 43 km W of Cohoe, Alaska | POINT Z (-152.08850 60.44080 81.30000) |
3 | pr71375998 | 2.88 | 33 km SSW of Esperanza, Puerto Rico | 1665416302150 | 1665422030964 | None | https://earthquake.usgs.gov/earthquakes/eventp... | https://earthquake.usgs.gov/earthquakes/feed/v... | 2.0 | 2.2 | NaN | None | reviewed | 0 | 128 | pr | 71375998 | ,pr71375998, | ,pr, | ,dyfi,origin,phase-data, | 19.0 | NaN | 0.35 | 191.0 | md | earthquake | M 2.9 - 33 km SSW of Esperanza, Puerto Rico | POINT Z (-65.57533 17.80767 10.73000) |
4 | us6000isi2 | 4.60 | Reykjanes Ridge | 1665408230842 | 1665415781040 | None | https://earthquake.usgs.gov/earthquakes/eventp... | https://earthquake.usgs.gov/earthquakes/feed/v... | NaN | NaN | NaN | None | reviewed | 0 | 326 | us | 6000isi2 | ,us6000isi2, | ,us, | ,origin,phase-data, | 93.0 | 9.219 | 0.42 | 137.0 | mb | earthquake | M 4.6 - Reykjanes Ridge | POINT Z (-35.19700 53.77160 10.00000) |
Let's plot them on a map:
fig, ax = plt.subplots(figsize=(10, 10))
# plot the country outline
world = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))
world.plot(ax=ax, facecolor="none", edgecolor="black")
# plot the earthquakes
df.plot(ax=ax, color="crimson", markersize=30, edgecolor='k', linewidth=0.5)
ax.set_axis_off()
A GeoService is a standardized REST API for serving geospatial data (including GeoJSON) over the web.
Documentation http://geoservices.github.io/
OpenDataPhilly provides GeoService API endpoints for the geospatial data hosted on its platform, for example:
https://www.opendataphilly.org/dataset/zip-codes
# base URL
url = "https://services.arcgis.com/fLeGjb7u4uXqeF9q/arcgis/rest/services/Zipcodes_Poly/FeatureServer/0/"
esri2gpd
https://github.com/PhiladelphiaController/esri2gpd
import esri2gpd
zip_codes = esri2gpd.get(url)
zip_codes.head()
geometry | OBJECTID | CODE | COD | Shape__Area | Shape__Length | |
---|---|---|---|---|---|---|
0 | POLYGON ((-75.11107 40.04682, -75.11206 40.047... | 1 | 19120 | 20 | 9.177970e+07 | 49921.544063 |
1 | POLYGON ((-75.19227 39.99463, -75.19240 39.994... | 2 | 19121 | 21 | 6.959879e+07 | 39534.887217 |
2 | POLYGON ((-75.15406 39.98601, -75.15494 39.986... | 3 | 19122 | 22 | 3.591632e+07 | 24124.645221 |
3 | POLYGON ((-75.15190 39.97056, -75.15258 39.970... | 4 | 19123 | 23 | 3.585175e+07 | 26421.728982 |
4 | POLYGON ((-75.09660 40.04249, -75.09661 40.042... | 5 | 19124 | 24 | 1.448080e+08 | 63658.770420 |
zip_codes.crs
<Geographic 2D CRS: EPSG:4326> Name: WGS 84 Axis Info [ellipsoidal]: - Lat[north]: Geodetic latitude (degree) - Lon[east]: Geodetic longitude (degree) Area of Use: - name: World. - bounds: (-180.0, -90.0, 180.0, 90.0) Datum: World Geodetic System 1984 ensemble - Ellipsoid: WGS 84 - Prime Meridian: Greenwich
Let's plot it
fig, ax = plt.subplots(figsize=(6, 6))
zip_codes.to_crs(epsg=3857).plot(ax=ax, facecolor="none", edgecolor="black")
ax.set_axis_off()
https://www.opendataphilly.org/dataset/census-tracts
Note: the "API documentation" on OpenDataPhilly will link to the documentation for the CARTO database
For example: shooting victims in Philadelphia
https://www.opendataphilly.org/dataset/shooting-victims
The download link on OpenDataPhilly is just querying an API!
https://phl.carto.com/api/v2/sql?q=SELECT+*,+ST_Y(the_geom)+AS+lat,+ST_X(the_geom)+AS+lng+FROM+shootings&filename=shootings&format=csv&skipfields=cartodb_id
The CARTO databases can be queried using SQL. This allows you to select specific data from the larger database.
CARTO API documentation: https://carto.com/developers/sql-api/
SQL documentation: https://www.postgresql.org/docs/9.1/sql.html
SELECT [field names] FROM [table name] WHERE [query]
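For example, a hypothetical query that pulls a few specific columns for a single year (using column names from the shootings table shown below) would look like:
# Example query string: select specific fields and filter rows by year
example_query = "SELECT the_geom, date_, fatal FROM shootings WHERE year = 2021"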
We'll use Python's requests library to query the API endpoint with our desired query.
import requests
# the API end point
API_endpoint = "https://phl.carto.com/api/v2/sql"
# the query
query = "SELECT * FROM shootings" # table name is "shootings"
# desired format of the returned features
output_format = 'geojson'
# fields to skip
skipfields = ["cartodb_id"]
# all of our request parameters
params = dict(q=query, format=output_format, skipfields=skipfields)
params
{'q': 'SELECT * FROM shootings', 'format': 'geojson', 'skipfields': ['cartodb_id']}
# make the request
r = requests.get(API_endpoint, params=params)
r
<Response [200]>
# Get the returned data in JSON format
# This is a dictionary
features = r.json()
type(features)
dict
# What are the keys?
list(features.keys())
['type', 'features']
features['type']
'FeatureCollection'
# Let's look at the first feature
features['features'][0]
{'type': 'Feature', 'geometry': {'type': 'Point', 'coordinates': [-75.160058, 39.991636]}, 'properties': {'objectid': 5540209, 'year': 2020, 'dc_key': '202022043328.0', 'code': '411', 'date_': '2020-06-19T00:00:00Z', 'time': '23:45:00', 'race': 'B', 'sex': 'M', 'age': '32', 'wound': 'Leg', 'officer_involved': 'N', 'offender_injured': 'N', 'offender_deceased': 'N', 'location': '2500 BLOCK N 17TH ST', 'latino': 0, 'point_x': None, 'point_y': None, 'dist': '22', 'inside': 0, 'outside': 1, 'fatal': 0}}
Use the GeoDataFrame.from_features() function to create a GeoDataFrame.
Don't forget to specify the CRS of the input data!
shootings = gpd.GeoDataFrame.from_features(features, crs="EPSG:4326")
shootings.head()
geometry | objectid | year | dc_key | code | date_ | time | race | sex | age | wound | officer_involved | offender_injured | offender_deceased | location | latino | point_x | point_y | dist | inside | outside | fatal | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | POINT (-75.16006 39.99164) | 5540209 | 2020 | 202022043328.0 | 411 | 2020-06-19T00:00:00Z | 23:45:00 | B | M | 32 | Leg | N | N | N | 2500 BLOCK N 17TH ST | 0.0 | NaN | NaN | 22 | 0.0 | 1.0 | 0.0 |
1 | POINT (-75.18019 39.98646) | 5540210 | 2020 | 202022043831.0 | 411 | 2020-06-22T00:00:00Z | 02:00:00 | B | M | 26 | Multiple | N | N | N | 2800 BLOCK SEDGLEY AVE | 0.0 | NaN | NaN | 22 | 0.0 | 1.0 | 0.0 |
2 | POINT (-75.18512 39.99001) | 5540211 | 2020 | 202022044523.0 | 411 | 2020-06-25T00:00:00Z | 00:00:00 | B | M | 44 | MULTI | N | N | N | 3200 BLOCK SUSQUEHANNA AV | 0.0 | NaN | NaN | 22 | 0.0 | 1.0 | 0.0 |
3 | POINT (-75.17896 39.99545) | 5540212 | 2020 | 202022044909.0 | 411 | 2020-06-26T00:00:00Z | 19:44:00 | B | M | 26 | LEG | N | N | N | 2900 BLOCK W HUNTINGDON ST | 0.0 | NaN | NaN | 22 | 0.0 | 1.0 | 0.0 |
4 | POINT (-75.17896 39.99545) | 5540213 | 2020 | 202022044910.0 | 411 | 2020-06-26T00:00:00Z | 19:44:00 | B | M | 43 | ABDOMEN | N | N | N | 2900 BLOCK W HUNTINGDON ST | 0.0 | NaN | NaN | 22 | 0.0 | 1.0 | 0.0 |
# Load the city limits from open data philly
city_limits = gpd.read_file(
"https://opendata.arcgis.com/datasets/405ec3da942d4e20869d4e1449a2be48_0.geojson"
).to_crs(epsg=3857)
# make sure we remove missing geometries
shootings = shootings.dropna(subset=['geometry'])
# convert to Web Mercator (EPSG:3857)
shootings = shootings.to_crs(epsg=3857)
# Remove any shootings that are outside the city limits
shootings = gpd.sjoin(shootings, city_limits, predicate='within', how='inner').drop(columns=['index_right'])
A quick plot with geopandas to show the shootings as points
fig, ax = plt.subplots(figsize=(6, 6))
# ZIP codes
zip_codes.to_crs(epsg=3857).plot(ax=ax, facecolor="none", edgecolor="black")
# Shootings
shootings.plot(ax=ax, color="crimson", markersize=10, alpha=0.4)
ax.set_axis_off()
# initialize the axes
fig, ax = plt.subplots(figsize=(10, 10), facecolor=plt.get_cmap('viridis')(0))
# get the x/y coordinates (already in Web Mercator) and plot the hexbins
x = shootings.geometry.x
y = shootings.geometry.y
ax.hexbin(x, y, gridsize=40, mincnt=1, cmap='viridis')
# overlay the ZIP codes
zip_codes.to_crs(epsg=3857).plot(ax=ax,
facecolor='none',
linewidth=0.5,
edgecolor='white')
ax.set_axis_off()
The COUNT function can be applied to count all rows.
query = "SELECT COUNT(*) FROM shootings"
params = dict(q=query)
r = requests.get(API_endpoint, params=params)
r.json()
{'rows': [{'count': 13269}], 'time': 0.003, 'fields': {'count': {'type': 'number', 'pgtype': 'int8'}}, 'total_rows': 1}
Important: it's always good to check how many rows you might be downloading beforehand.
The LIMIT function limits the number of returned rows. It is very useful for taking a quick look at the format of a database.
# select the first 5
query = "SELECT * FROM shootings LIMIT 5"
params = dict(q=query, format="geojson")
r = requests.get(API_endpoint, params=params)
df = gpd.GeoDataFrame.from_features(r.json(), crs="EPSG:4326")
df
geometry | cartodb_id | objectid | year | dc_key | code | date_ | time | race | sex | age | wound | officer_involved | offender_injured | offender_deceased | location | latino | point_x | point_y | dist | inside | outside | fatal | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | POINT (-75.16006 39.99164) | 1 | 5540209 | 2020 | 202022043328.0 | 411 | 2020-06-19T00:00:00Z | 23:45:00 | B | M | 32 | Leg | N | N | N | 2500 BLOCK N 17TH ST | 0 | None | None | 22 | 0 | 1 | 0 |
1 | POINT (-75.18019 39.98646) | 2 | 5540210 | 2020 | 202022043831.0 | 411 | 2020-06-22T00:00:00Z | 02:00:00 | B | M | 26 | Multiple | N | N | N | 2800 BLOCK SEDGLEY AVE | 0 | None | None | 22 | 0 | 1 | 0 |
2 | POINT (-75.18512 39.99001) | 3 | 5540211 | 2020 | 202022044523.0 | 411 | 2020-06-25T00:00:00Z | 00:00:00 | B | M | 44 | MULTI | N | N | N | 3200 BLOCK SUSQUEHANNA AV | 0 | None | None | 22 | 0 | 1 | 0 |
3 | POINT (-75.17896 39.99545) | 4 | 5540212 | 2020 | 202022044909.0 | 411 | 2020-06-26T00:00:00Z | 19:44:00 | B | M | 26 | LEG | N | N | N | 2900 BLOCK W HUNTINGDON ST | 0 | None | None | 22 | 0 | 1 | 0 |
4 | POINT (-75.17896 39.99545) | 5 | 5540213 | 2020 | 202022044910.0 | 411 | 2020-06-26T00:00:00Z | 19:44:00 | B | M | 43 | ABDOMEN | N | N | N | 2900 BLOCK W HUNTINGDON ST | 0 | None | None | 22 | 0 | 1 | 0 |
shootings.head()
geometry | objectid | year | dc_key | code | date_ | time | race | sex | age | wound | officer_involved | offender_injured | offender_deceased | location | latino | point_x | point_y | dist | inside | outside | fatal | OBJECTID | Shape__Area | Shape__Length | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | POINT (-8366779.385 4864726.920) | 5540209 | 2020 | 202022043328.0 | 411 | 2020-06-19T00:00:00Z | 23:45:00 | B | M | 32 | Leg | N | N | N | 2500 BLOCK N 17TH ST | 0.0 | NaN | NaN | 22 | 0.0 | 1.0 | 0.0 | 1 | 3.970706e+09 | 394751.12047 |
1 | POINT (-8369020.246 4863975.169) | 5540210 | 2020 | 202022043831.0 | 411 | 2020-06-22T00:00:00Z | 02:00:00 | B | M | 26 | Multiple | N | N | N | 2800 BLOCK SEDGLEY AVE | 0.0 | NaN | NaN | 22 | 0.0 | 1.0 | 0.0 | 1 | 3.970706e+09 | 394751.12047 |
2 | POINT (-8369568.940 4864490.085) | 5540211 | 2020 | 202022044523.0 | 411 | 2020-06-25T00:00:00Z | 00:00:00 | B | M | 44 | MULTI | N | N | N | 3200 BLOCK SUSQUEHANNA AV | 0.0 | NaN | NaN | 22 | 0.0 | 1.0 | 0.0 | 1 | 3.970706e+09 | 394751.12047 |
3 | POINT (-8368883.100 4865281.544) | 5540212 | 2020 | 202022044909.0 | 411 | 2020-06-26T00:00:00Z | 19:44:00 | B | M | 26 | LEG | N | N | N | 2900 BLOCK W HUNTINGDON ST | 0.0 | NaN | NaN | 22 | 0.0 | 1.0 | 0.0 | 1 | 3.970706e+09 | 394751.12047 |
4 | POINT (-8368883.100 4865281.544) | 5540213 | 2020 | 202022044910.0 | 411 | 2020-06-26T00:00:00Z | 19:44:00 | B | M | 43 | ABDOMEN | N | N | N | 2900 BLOCK W HUNTINGDON ST | 0.0 | NaN | NaN | 22 | 0.0 | 1.0 | 0.0 | 1 | 3.970706e+09 | 394751.12047 |
Select nonfatal shootings only
query = "SELECT * FROM shootings WHERE fatal = 0"
# Make the request
params = dict(q=query, format="geojson")
r = requests.get(API_endpoint, params=params)
# Make the GeoDataFrame
nonfatal = gpd.GeoDataFrame.from_features(r.json(), crs="EPSG:4326")
print("number of nonfatal shootings = ", len(nonfatal))
number of nonfatal shootings = 10453
Select shootings in 2022
query = "SELECT * FROM shootings WHERE date_ > '12/31/21'"
# Make the request
params = dict(q=query, format="geojson")
r = requests.get(API_endpoint, params=params)
# Make the GeoDataFrame
this_year = gpd.GeoDataFrame.from_features(r.json(), crs="EPSG:4326")
print("number of shootings this year = ", len(this_year))
number of shootings this year = 1850
carto2gpd
- The get() function will query the database
- The get_size() function will use COUNT() to get the total number of rows
https://github.com/PhiladelphiaController/carto2gpd
import carto2gpd
where = "date_ > '12/31/21' and fatal = 0"
df = carto2gpd.get(API_endpoint, 'shootings', where=where)
df.head()
geometry | cartodb_id | objectid | year | dc_key | code | date_ | time | race | sex | age | wound | officer_involved | offender_injured | offender_deceased | location | latino | point_x | point_y | dist | inside | outside | fatal | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | POINT (-75.12066 40.03026) | 371 | 5545192 | 2022 | 202235024216.0 | 401 | 2022-05-03T00:00:00Z | 19:23:00 | A | M | 24 | Foot | N | N | N | 100 BLOCK E FISHER AVE | 0 | None | None | 35 | 0 | 1 | 0 |
1 | None | 372 | 5545193 | 2022 | 202235025800.0 | 401 | 2022-05-10T00:00:00Z | 16:50:00 | A | M | 27 | Multiple | N | N | N | 1300 BLOCK W ROCKLNAD ST | 0 | None | None | 35 | 0 | 1 | 0 |
2 | POINT (-75.14509 40.02738) | 373 | 5545194 | 2022 | 202235025800.0 | 401 | 2022-05-10T00:00:00Z | 16:50:00 | B | M | 27 | Multiple | N | N | N | 1300 BLOCK W ROCKLAND ST | 0 | None | None | 35 | 0 | 1 | 0 |
3 | POINT (-75.14509 40.02738) | 374 | 5545195 | 2022 | 202235025800.0 | 401 | 2022-05-10T00:00:00Z | 16:50:00 | B | M | 41 | Foot | N | N | N | 1300 BLOCK W ROCKLAND ST | 0 | None | None | 35 | 0 | 1 | 0 |
4 | POINT (-75.13656 40.02895) | 375 | 5545196 | 2022 | 202235027022.0 | 401 | 2022-05-16T00:00:00Z | 17:08:00 | B | M | 34 | Leg | N | N | N | 5000 BLOCK N FRANKLIN ST | 0 | None | None | 35 | 0 | 1 | 0 |
# Limit results to the first 5
df = carto2gpd.get(API_endpoint, 'shootings', limit=5)
len(df)
5
df
geometry | cartodb_id | objectid | year | dc_key | code | date_ | time | race | sex | age | wound | officer_involved | offender_injured | offender_deceased | location | latino | point_x | point_y | dist | inside | outside | fatal | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | POINT (-75.16006 39.99164) | 1 | 5540209 | 2020 | 202022043328.0 | 411 | 2020-06-19T00:00:00Z | 23:45:00 | B | M | 32 | Leg | N | N | N | 2500 BLOCK N 17TH ST | 0 | None | None | 22 | 0 | 1 | 0 |
1 | POINT (-75.18019 39.98646) | 2 | 5540210 | 2020 | 202022043831.0 | 411 | 2020-06-22T00:00:00Z | 02:00:00 | B | M | 26 | Multiple | N | N | N | 2800 BLOCK SEDGLEY AVE | 0 | None | None | 22 | 0 | 1 | 0 |
2 | POINT (-75.18512 39.99001) | 3 | 5540211 | 2020 | 202022044523.0 | 411 | 2020-06-25T00:00:00Z | 00:00:00 | B | M | 44 | MULTI | N | N | N | 3200 BLOCK SUSQUEHANNA AV | 0 | None | None | 22 | 0 | 1 | 0 |
3 | POINT (-75.17896 39.99545) | 4 | 5540212 | 2020 | 202022044909.0 | 411 | 2020-06-26T00:00:00Z | 19:44:00 | B | M | 26 | LEG | N | N | N | 2900 BLOCK W HUNTINGDON ST | 0 | None | None | 22 | 0 | 1 | 0 |
4 | POINT (-75.17896 39.99545) | 5 | 5540213 | 2020 | 202022044910.0 | 411 | 2020-06-26T00:00:00Z | 19:44:00 | B | M | 43 | ABDOMEN | N | N | N | 2900 BLOCK W HUNTINGDON ST | 0 | None | None | 22 | 0 | 1 | 0 |
size = carto2gpd.get_size(API_endpoint, 'shootings')
print(size)
13269
DateTime objects
Add Month and Day of Week columns
# Convert the date column to datetime objects
shootings['date'] = pd.to_datetime(shootings['date_'])
# Add new columns: Month and Day of Week
shootings["Month"] = shootings["date"].dt.month
shootings["Day of Week"] = shootings["date"].dt.dayofweek # Monday is 0, Sunday is 6
Use the familiar groupby() --> size() pattern to count shootings by month and day of week:
count = shootings.groupby(['Month', 'Day of Week']).size()
count = count.reset_index(name='Count')
count.head()
Month | Day of Week | Count | |
---|---|---|---|
0 | 1 | 0 | 135 |
1 | 1 | 1 | 129 |
2 | 1 | 2 | 120 |
3 | 1 | 3 | 125 |
4 | 1 | 4 | 131 |
hvplot
# Remember 0 is Monday and 6 is Sunday
count.hvplot.heatmap(
x="Day of Week",
y="Month",
C="Count",
cmap="viridis",
width=400,
height=500,
flip_yaxis=True,
)
Several 3rd party options with easier interfaces for accessing census data
https://api.census.gov/data/2021/acs/acs1?get=NAME,B17001_002E&for=state:*
The census provides web-based documentation:
https://www.census.gov/data/developers/guidance/api-user-guide.html
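For reference, the example request above can also be made directly with requests; the API returns a JSON array of rows whose first element is the header row. A minimal sketch:
import requests
import pandas as pd
# Same request as the example URL above (variable B17001_002E for every state)
endpoint = "https://api.census.gov/data/2021/acs/acs1"
params = {"get": "NAME,B17001_002E", "for": "state:*"}
response = requests.get(endpoint, params=params)
# The first row holds the column names; the rest are data rows
rows = response.json()
census_df = pd.DataFrame(rows[1:], columns=rows[0])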
Several packages provide easier Python interfaces to census data based on the census API.
We'll focus on cenpy: "Explore and download data from Census APIs"
Let's make this for Philadelphia in Python!
# First step: import cenpy
import cenpy
Functions to help you explore the Census API from Python:
- cenpy.explorer.available: Returns information about available datasets in the Census API
- cenpy.explorer.explain: Explains a specific Census dataset
- cenpy.explorer.fips_table: Gets a table of FIPS codes for a specific geography
Note: we can change pandas display options to see all rows/columns and large cells
# UNCOMMENT TO SEE ALL ROWS/COLUMNS IN DATAFRAMES
# pd.options.display.max_rows = 9999
# pd.options.display.max_colwidth = 200
available = cenpy.explorer.available()
available.head()
c_isTimeseries | temporal | spatial | c_isAggregate | publisher | references | programCode | modified | license | keyword | identifier_url | contactPoint | distribution | description | bureauCode | accessLevel | title | c_isAvailable | c_isCube | c_isMicrodata | c_documentationLink | c_dataset | vintage | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ABSCB2017 | NaN | NaN | NaN | True | U.S. Census Bureau | https://www.census.gov/developers/ | 006:007 | 2020-04-30 00:00:00.0 | https://creativecommons.org/publicdomain/zero/... | (census,) | https://api.census.gov/data/id/ABSCB2017 | {'fn': 'ASE Staff', 'hasEmail': 'mailto:erd.an... | {'@type': 'dcat:Distribution', 'accessURL': 'h... | The Annual Business Survey (ABS) provides info... | 006:07 | public | Economic Surveys: Annual Business Survey: Annu... | True | NaN | NaN | https://www.census.gov/developer/ | (abscb,) | 2017.0 |
ABSCB2018 | NaN | NaN | NaN | True | U.S. Census Bureau | https://www.census.gov/developers/ | 006:007 | 2020-10-26 00:00:00.0 | https://creativecommons.org/publicdomain/zero/... | (census,) | https://api.census.gov/data/id/ABSCB2018 | {'fn': 'ASE Staff', 'hasEmail': 'mailto:Erd.an... | {'@type': 'dcat:Distribution', 'accessURL': 'h... | The Annual Business Survey (ABS) provides info... | 006:07 | public | Economic Surveys: Annual Business Survey: Annu... | True | NaN | NaN | https://www.census.gov/developer/ | (abscb,) | 2018.0 |
ABSCB2019 | NaN | NaN | NaN | True | U.S. Census Bureau | https://www.census.gov/developers/ | 006:007 | 2021-08-17 00:00:00.0 | https://creativecommons.org/publicdomain/zero/... | (census,) | https://api.census.gov/data/id/ABSCB2019 | {'fn': 'ASE Staff', 'hasEmail': 'mailto:ERD.an... | {'@type': 'dcat:Distribution', 'accessURL': 'h... | The Annual Business Survey (ABS) provides info... | 006:07 | public | Economic Surveys: Annual Business Survey: Annu... | True | NaN | NaN | https://www.census.gov/developer/ | (abscb,) | 2019.0 |
ABSCBO2017 | NaN | NaN | NaN | True | U.S. Census Bureau | https://www.census.gov/developers/ | 006:007 | 2020-04-30 00:00:00.0 | https://creativecommons.org/publicdomain/zero/... | (census,) | https://api.census.gov/data/id/ABSCBO2017 | {'fn': 'ASE Staff', 'hasEmail': 'mailto:erd.an... | {'@type': 'dcat:Distribution', 'accessURL': 'h... | The Annual Business Survey (ABS) provides info... | 006:07 | public | Economic Surveys: Annual Business Survey: Annu... | True | NaN | NaN | https://www.census.gov/developer/ | (abscbo,) | 2017.0 |
ABSCBO2018 | NaN | NaN | NaN | True | U.S. Census Bureau | https://www.census.gov/developers/ | 006:007 | 2020-10-26 00:00:00.0 | https://creativecommons.org/publicdomain/zero/... | (census,) | https://api.census.gov/data/id/ABSCBO2018 | {'fn': 'ASE Staff', 'hasEmail': 'mailto:erd.an... | {'@type': 'dcat:Distribution', 'accessURL': 'h... | The Annual Business Survey (ABS) provides info... | 006:07 | public | Economic Surveys: Annual Business Survey: Annu... | True | NaN | NaN | https://www.census.gov/developer/ | (abscbo,) | 2018.0 |
We can use the pandas filter() function to search for specific identifiers in the dataframe.
In this case, let's search for the American Community Survey datasets. We'll match index labels using regular expressions.
In particular, we'll search for labels that start with "ACS". In the language of regular expressions, we'll use the "^" to mean "match labels that start with"
For more info on regular expressions, the documentation for the re module is a good place to start.
# Return a dataframe of all datasets that start with "ACS"
# Axis=0 means to filter the index labels!
acs = available.filter(regex="^ACS", axis=0)
acs
c_isTimeseries | temporal | spatial | c_isAggregate | publisher | references | programCode | modified | license | keyword | identifier_url | contactPoint | distribution | description | bureauCode | accessLevel | title | c_isAvailable | c_isCube | c_isMicrodata | c_documentationLink | c_dataset | vintage | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ACSCD1132011 | NaN | 2011/2011 | United States | True | U.S. Census Bureau | http://www.census.gov/developers/ | 006:004 | 2014-10-06 | http://creativecommons.org/publicdomain/zero/1.0/ | (census,) | http://api.census.gov/data/id/ACSCD1132011 | {'fn': 'American Community Survey Office', 'ha... | {'@type': 'dcat:Distribution', 'accessURL': 'h... | The American Community Survey (ACS) is a natio... | 006:07 | public | 2011 American Community Survey 1-Year Profiles... | True | NaN | NaN | http://www.census.gov/developer/ | (acs1, cd113) | 2011.0 |
ACSCD1152015 | NaN | 2015/2015 | United States | True | U.S. Census Bureau | http://www.census.gov/developers/ | 006:004 | 2017-02-10 | http://creativecommons.org/publicdomain/zero/1.0/ | (census,) | http://api.census.gov/data/id/ACSCD1152015 | {'fn': 'American Community Survey Office', 'ha... | {'@type': 'dcat:Distribution', 'accessURL': 'h... | The American Community Survey (ACS) is an ongo... | 006:07 | public | 2015 American Community Survey 1-Year Data Pro... | True | NaN | NaN | http://www.census.gov/developer/ | (acs1, cd115) | 2015.0 |
ACSCP1Y2010 | NaN | NaN | United States | True | U.S. Census Bureau | https://www.census.gov/developers/ | 006:004 | 2018-09-18 00:00:00.0 | https://creativecommons.org/publicdomain/zero/... | (census,) | https://api.census.gov/data/id/ACSCP1Y2010 | {'fn': 'American Community Survey Office', 'ha... | {'@type': 'dcat:Distribution', 'accessURL': 'h... | The American Community Survey (ACS) is an ongo... | 006:07 | public | ACS 1-Year Comparison Profiles | True | True | NaN | https://www.census.gov/developer/ | (acs, acs1, cprofile) | 2010.0 |
ACSCP1Y2011 | NaN | NaN | United States | True | U.S. Census Bureau | https://www.census.gov/developers/ | 006:004 | 2018-09-18 00:00:00.0 | https://creativecommons.org/publicdomain/zero/... | (census,) | https://api.census.gov/data/id/ACSCP1Y2011 | {'fn': 'American Community Survey Office', 'ha... | {'@type': 'dcat:Distribution', 'accessURL': 'h... | The American Community Survey (ACS) is an ongo... | 006:07 | public | ACS 1-Year Comparison Profiles | True | True | NaN | https://www.census.gov/developer/ | (acs, acs1, cprofile) | 2011.0 |
ACSCP1Y2012 | NaN | NaN | NaN | True | U.S. Census Bureau | https://www.census.gov/developers/ | 006:004 | 2018-07-05 00:00:00.0 | https://creativecommons.org/publicdomain/zero/... | (census,) | https://api.census.gov/data/id/ACSCP1Y2012 | {'fn': 'American Community Survey Office', 'ha... | {'@type': 'dcat:Distribution', 'accessURL': 'h... | The American Community Survey (ACS) is an ongo... | 006:07 | public | ACS 1-Year Comparison Profiles | True | True | NaN | https://www.census.gov/developer/ | (acs, acs1, cprofile) | 2012.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
ACSST5Y2016 | NaN | NaN | NaN | True | U.S. Census Bureau | https://www.census.gov/developers/ | 006:004 | 2018-06-29 00:00:00.0 | https://creativecommons.org/publicdomain/zero/... | (census,) | https://api.census.gov/data/id/ACSST5Y2016 | {'fn': 'American Community Survey Office', 'ha... | {'@type': 'dcat:Distribution', 'accessURL': 'h... | The American Community Survey (ACS) is an ongo... | 006:07 | public | ACS 5-Year Subject Tables | True | True | NaN | https://www.census.gov/developer/ | (acs, acs5, subject) | 2016.0 |
ACSST5Y2017 | NaN | NaN | NaN | True | U.S. Census Bureau | https://www.census.gov/developers/ | 006:004 | 2018-10-19 00:00:00.0 | https://creativecommons.org/publicdomain/zero/... | (census,) | https://api.census.gov/data/id/ACSST5Y2017 | {'fn': 'American Community Survey Office', 'ha... | {'@type': 'dcat:Distribution', 'accessURL': 'h... | The American Community Survey (ACS) is an ongo... | 006:07 | public | ACS 5-Year Subject Tables | True | True | NaN | https://www.census.gov/developer/ | (acs, acs5, subject) | 2017.0 |
ACSST5Y2018 | NaN | NaN | NaN | True | U.S. Census Bureau | https://www.census.gov/developers/ | 006:004 | 2019-10-22 15:36:29.0 | https://creativecommons.org/publicdomain/zero/... | (census,) | https://api.census.gov/data/id/ACSST5Y2018 | {'fn': 'American Community Survey Office', 'ha... | {'@type': 'dcat:Distribution', 'accessURL': 'h... | The American Community Survey (ACS) is an ongo... | 006:07 | public | ACS 5-Year Subject Tables | True | True | NaN | https://www.census.gov/developer/ | (acs, acs5, subject) | 2018.0 |
ACSST5Y2019 | NaN | NaN | NaN | True | U.S. Census Bureau | https://www.census.gov/developers/ | 006:004 | 2020-04-03 00:00:00.0 | https://creativecommons.org/publicdomain/zero/... | (census,) | https://api.census.gov/data/id/ACSST5Y2019 | {'fn': 'American Community Survey Office', 'ha... | {'@type': 'dcat:Distribution', 'accessURL': 'h... | The American Community Survey (ACS) is an ongo... | 006:07 | public | American Community Survey: 5-Year Estimates: S... | True | True | NaN | https://www.census.gov/developer/ | (acs, acs5, subject) | 2019.0 |
ACSST5Y2020 | NaN | 2020/2020 | US | True | U.S. Census Bureau | https://www.census.gov/developers/ | 006:004 | 2021-07-13 00:00:00.0 | https://creativecommons.org/publicdomain/zero/... | (census,) | https://api.census.gov/data/id/ACSST5Y2020 | {'fn': 'American Community Survey Office', 'ha... | {'@type': 'dcat:Distribution', 'accessURL': 'h... | The American Community Survey (ACS) is an ongo... | 006:07 | public | American Community Survey: 5-Year Estimates: S... | True | True | NaN | https://www.census.gov/developer/ | (acs, acs5, subject) | 2020.0 |
215 rows × 23 columns
Many flavors of ACS datasets are available — we want to use the detailed tables version, specifically the 5-year survey.
The relevant identifiers start with: "ACSDT5Y".
# Return a dataframe of all datasets that start with "ACSDT5Y"
available.filter(regex="^ACSDT5Y", axis=0)
c_isTimeseries | temporal | spatial | c_isAggregate | publisher | references | programCode | modified | license | keyword | identifier_url | contactPoint | distribution | description | bureauCode | accessLevel | title | c_isAvailable | c_isCube | c_isMicrodata | c_documentationLink | c_dataset | vintage | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ACSDT5Y2009 | NaN | NaN | NaN | True | U.S. Census Bureau | https://www.census.gov/developers/ | 006:004 | 2019-08-27 13:11:18.0 | https://creativecommons.org/publicdomain/zero/... | (census,) | https://api.census.gov/data/id/ACSDT5Y2009 | {'fn': 'American Community Survey Office', 'ha... | {'@type': 'dcat:Distribution', 'accessURL': 'h... | The American Community Survey (ACS) is an ongo... | 006:07 | public | American Community Survey: 5-Year Estimates: D... | True | True | NaN | https://www.census.gov/developer/ | (acs, acs5) | 2009.0 |
ACSDT5Y2010 | NaN | NaN | United States | True | U.S. Census Bureau | https://www.census.gov/developers/ | 006:004 | 2018-07-04 00:00:00.0 | https://creativecommons.org/publicdomain/zero/... | (census,) | https://api.census.gov/data/id/ACSDT5Y2010 | {'fn': 'American Community Survey Office', 'ha... | {'@type': 'dcat:Distribution', 'accessURL': 'h... | The American Community Survey (ACS) is an ongo... | 006:07 | public | ACS 5-Year Detailed Tables | True | True | NaN | https://www.census.gov/developer/ | (acs, acs5) | 2010.0 |
ACSDT5Y2011 | NaN | NaN | NaN | True | U.S. Census Bureau | https://www.census.gov/developers/ | 006:004 | 2018-07-04 00:00:00.0 | https://creativecommons.org/publicdomain/zero/... | (census,) | https://api.census.gov/data/id/ACSDT5Y2011 | {'fn': 'American Community Survey Office', 'ha... | {'@type': 'dcat:Distribution', 'accessURL': 'h... | The American Community Survey (ACS) is an ongo... | 006:07 | public | ACS 5-Year Detailed Tables | True | True | NaN | https://www.census.gov/developer/ | (acs, acs5) | 2011.0 |
ACSDT5Y2012 | NaN | NaN | NaN | True | U.S. Census Bureau | https://www.census.gov/developers/ | 006:004 | 2018-07-04 00:00:00.0 | https://creativecommons.org/publicdomain/zero/... | (census,) | https://api.census.gov/data/id/ACSDT5Y2012 | {'fn': 'American Community Survey Office', 'ha... | {'@type': 'dcat:Distribution', 'accessURL': 'h... | The American Community Survey (ACS) is an ongo... | 006:07 | public | ACS 5-Year Detailed Tables | True | True | NaN | https://www.census.gov/developer/ | (acs, acs5) | 2012.0 |
ACSDT5Y2013 | NaN | NaN | NaN | True | U.S. Census Bureau | https://www.census.gov/developers/ | 006:004 | 2018-07-04 00:00:00.0 | https://creativecommons.org/publicdomain/zero/... | (census,) | https://api.census.gov/data/id/ACSDT5Y2013 | {'fn': 'American Community Survey Office', 'ha... | {'@type': 'dcat:Distribution', 'accessURL': 'h... | The American Community Survey (ACS) is an ongo... | 006:07 | public | ACS 5-Year Detailed Tables | True | True | NaN | https://www.census.gov/developer/ | (acs, acs5) | 2013.0 |
ACSDT5Y2014 | NaN | NaN | NaN | True | U.S. Census Bureau | https://www.census.gov/developers/ | 006:004 | 2018-07-04 00:00:00.0 | https://creativecommons.org/publicdomain/zero/... | (census,) | https://api.census.gov/data/id/ACSDT5Y2014 | {'fn': 'American Community Survey Office', 'ha... | {'@type': 'dcat:Distribution', 'accessURL': 'h... | The American Community Survey (ACS) is an ongo... | 006:07 | public | ACS 5-Year Detailed Tables | True | True | NaN | https://www.census.gov/developer/ | (acs, acs5) | 2014.0 |
ACSDT5Y2015 | NaN | NaN | NaN | True | U.S. Census Bureau | https://www.census.gov/developers/ | 006:004 | 2018-07-05 00:00:00.0 | https://creativecommons.org/publicdomain/zero/... | (census,) | https://api.census.gov/data/id/ACSDT5Y2015 | {'fn': 'American Community Survey Office', 'ha... | {'@type': 'dcat:Distribution', 'accessURL': 'h... | The American Community Survey (ACS) is an ongo... | 006:07 | public | ACS 5-Year Detailed Tables | True | True | NaN | https://www.census.gov/developer/ | (acs, acs5) | 2015.0 |
ACSDT5Y2016 | NaN | NaN | NaN | True | U.S. Census Bureau | https://www.census.gov/developers/ | 006:004 | 2018-07-05 00:00:00.0 | https://creativecommons.org/publicdomain/zero/... | (census,) | https://api.census.gov/data/id/ACSDT5Y2016 | {'fn': 'American Community Survey Office', 'ha... | {'@type': 'dcat:Distribution', 'accessURL': 'h... | The American Community Survey (ACS) is an ongo... | 006:07 | public | ACS 5-Year Detailed Tables | True | True | NaN | https://www.census.gov/developer/ | (acs, acs5) | 2016.0 |
ACSDT5Y2017 | NaN | NaN | NaN | True | U.S. Census Bureau | https://www.census.gov/developers/ | 006:004 | 2018-08-21 07:11:43.0 | https://creativecommons.org/publicdomain/zero/... | (census,) | https://api.census.gov/data/id/ACSDT5Y2017 | {'fn': 'American Community Survey Office', 'ha... | {'@type': 'dcat:Distribution', 'accessURL': 'h... | The American Community Survey (ACS) is an ongo... | 006:07 | public | ACS 5-Year Detailed Tables | True | True | NaN | https://www.census.gov/developer/ | (acs, acs5) | 2017.0 |
ACSDT5Y2018 | NaN | NaN | NaN | True | U.S. Census Bureau | https://www.census.gov/developers/ | 006:004 | 2019-10-22 16:28:02.0 | https://creativecommons.org/publicdomain/zero/... | (census,) | https://api.census.gov/data/id/ACSDT5Y2018 | {'fn': 'American Community Survey Office', 'ha... | {'@type': 'dcat:Distribution', 'accessURL': 'h... | The American Community Survey (ACS) is an ongo... | 006:07 | public | American Community Survey: 5-Year Estimates: D... | True | True | NaN | https://www.census.gov/developer/ | (acs, acs5) | 2018.0 |
ACSDT5Y2019 | NaN | NaN | NaN | True | U.S. Census Bureau | https://www.census.gov/developers/ | 006:004 | 2020-04-03 00:00:00.0 | https://creativecommons.org/publicdomain/zero/... | (census,) | https://api.census.gov/data/id/ACSDT5Y2019 | {'fn': 'American Community Survey Office', 'ha... | {'@type': 'dcat:Distribution', 'accessURL': 'h... | The American Community Survey (ACS) is an ongo... | 006:07 | public | American Community Survey: 5-Year Estimates: D... | True | True | NaN | https://www.census.gov/developer/ | (acs, acs5) | 2019.0 |
ACSDT5Y2020 | NaN | 2020/2020 | US | True | U.S. Census Bureau | https://www.census.gov/developers/ | 006:004 | 2021-07-13 00:00:00.0 | https://creativecommons.org/publicdomain/zero/... | (census,) | https://api.census.gov/data/id/ACSDT5Y2020 | {'fn': 'American Community Survey Office', 'ha... | {'@type': 'dcat:Distribution', 'accessURL': 'h... | The American Community Survey (ACS) is an ongo... | 006:07 | public | American Community Survey: 5-Year Estimates: D... | True | True | NaN | https://www.census.gov/developer/ | (acs, acs5) | 2020.0 |
ACSDT5YAIAN2010 | NaN | NaN | NaN | True | U.S. Census Bureau | https://www.census.gov/developers/ | 006:004 | 2019-10-24 07:18:57.0 | https://creativecommons.org/publicdomain/zero/... | (census,) | https://api.census.gov/data/id/ACSDT5YAIAN2010 | {'fn': 'American Community Survey Office', 'ha... | {'@type': 'dcat:Distribution', 'accessURL': 'h... | The American Indian and Alaska Native (AIAN) t... | 006:07 | public | American Community Survey: 5-Year Estimates: A... | True | True | NaN | https://www.census.gov/developer/ | (acs, acs5, aian) | 2010.0 |
ACSDT5YAIAN2015 | NaN | NaN | NaN | True | U.S. Census Bureau | https://www.census.gov/developers/ | 006:004 | 2020-02-13 00:00:00.0 | https://creativecommons.org/publicdomain/zero/... | (census,) | https://api.census.gov/data/id/ACSDT5YAIAN2015 | {'fn': 'American Community Survey Office', 'ha... | {'@type': 'dcat:Distribution', 'accessURL': 'h... | The American Indian and Alaska Native (AIAN) t... | 006:07 | public | ACS 5-Year AIAN Detailed Tables | True | True | NaN | https://www.census.gov/developer/ | (acs, acs5, aian) | 2015.0 |
ACSDT5YSPT2010 | NaN | NaN | NaN | True | U.S. Census Bureau | https://www.census.gov/developers/ | 006:004 | 2019-10-11 14:16:00.0 | https://creativecommons.org/publicdomain/zero/... | (census,) | https://api.census.gov/data/id/ACSDT5YSPT2010 | {'fn': 'American Community Survey Office', 'ha... | {'@type': 'dcat:Distribution', 'accessURL': 'h... | The Selected Population Tables (SPT) are rele... | 006:07 | public | American Community Survey: 5-Year Estimates: S... | True | True | NaN | https://www.census.gov/developer/ | (acs, acs5, spt) | 2010.0 |
ACSDT5YSPT2015 | NaN | NaN | NaN | True | U.S. Census Bureau | https://www.census.gov/developers/ | 006:004 | 2020-02-18 00:00:00.0 | https://creativecommons.org/publicdomain/zero/... | (census,) | https://api.census.gov/data/id/ACSDT5YSPT2015 | {'fn': 'American Community Survey Office', 'ha... | {'@type': 'dcat:Distribution', 'accessURL': 'h... | The Selected Population Tables (SPT) are rele... | 006:07 | public | American Community Survey: 5-Year Estimates: S... | True | True | NaN | https://www.census.gov/developer/ | (acs, acs5, spt) | 2015.0 |
Let's use the latest available 5-year data (2020). We can use the explain() function to print out a description of the dataset:
cenpy.explorer.explain("ACSDT5Y2020")
{'American Community Survey: 5-Year Estimates: Detailed Tables 5-Year': 'The American Community Survey (ACS) is an ongoing survey that provides data every year -- giving communities the current information they need to plan investments and services. The ACS covers a broad range of topics about social, economic, demographic, and housing characteristics of the U.S. population. Summary files include the following geographies: nation, all states (including DC and Puerto Rico), all metropolitan areas, all congressional districts (117th Congress), all counties, all places, and all tracts and block groups. Summary files contain the most detailed cross-tabulations, many of which are published down to block groups. The data are population and housing counts. There are over 64,000 variables in this dataset.'}
Use the cenpy.remote.APIConnection object, and pass it the name of the dataset.
acs = cenpy.remote.APIConnection("ACSDT5Y2020")
The .variables attribute stores the available variables (across all Census tables).
We can use the varslike() function to search the variables dataframe (it's just a simple wrapper around the pandas filter() function).
len(acs.variables)
27892
acs.variables.head(n=10)
label | concept | predicateType | group | limit | predicateOnly | hasGeoCollectionSupport | attributes | required | |
---|---|---|---|---|---|---|---|---|---|
for | Census API FIPS 'for' clause | Census API Geography Specification | fips-for | N/A | 0 | True | NaN | NaN | NaN |
in | Census API FIPS 'in' clause | Census API Geography Specification | fips-in | N/A | 0 | True | NaN | NaN | NaN |
ucgid | Uniform Census Geography Identifier clause | Census API Geography Specification | ucgid | N/A | 0 | True | True | NaN | NaN |
B24022_060E | Estimate!!Total:!!Female:!!Service occupations... | SEX BY OCCUPATION AND MEDIAN EARNINGS IN THE P... | int | B24022 | 0 | NaN | NaN | B24022_060EA,B24022_060M,B24022_060MA | NaN |
B19001B_014E | Estimate!!Total:!!$100,000 to $124,999 | HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 202... | int | B19001B | 0 | NaN | NaN | B19001B_014EA,B19001B_014M,B19001B_014MA | NaN |
B07007PR_019E | Estimate!!Total:!!Moved from different municip... | GEOGRAPHICAL MOBILITY IN THE PAST YEAR BY CITI... | int | B07007PR | 0 | NaN | NaN | B07007PR_019EA,B07007PR_019M,B07007PR_019MA | NaN |
B19101A_004E | Estimate!!Total:!!$15,000 to $19,999 | FAMILY INCOME IN THE PAST 12 MONTHS (IN 2020 I... | int | B19101A | 0 | NaN | NaN | B19101A_004EA,B19101A_004M,B19101A_004MA | NaN |
B24022_061E | Estimate!!Total:!!Female:!!Service occupations... | SEX BY OCCUPATION AND MEDIAN EARNINGS IN THE P... | int | B24022 | 0 | NaN | NaN | B24022_061EA,B24022_061M,B24022_061MA | NaN |
B19001B_013E | Estimate!!Total:!!$75,000 to $99,999 | HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 202... | int | B19001B | 0 | NaN | NaN | B19001B_013EA,B19001B_013M,B19001B_013MA | NaN |
B07007PR_018E | Estimate!!Total:!!Moved from different municip... | GEOGRAPHICAL MOBILITY IN THE PAST YEAR BY CITI... | int | B07007PR | 0 | NaN | NaN | B07007PR_018EA,B07007PR_018M,B07007PR_018MA | NaN |
We're interested in variables about Hispanic origin broken down by race. Let's search the "concept" column for "HISPANIC OR LATINO ORIGIN BY RACE":
acs.varslike?
Signature: acs.varslike(pattern=None, by=None, engine='re', within=None) Docstring: Grabs columns that match a particular search pattern. Parameters ---------- pattern : str a search pattern to match by : str a column in the APIConnection.variables to conduct the search within engine : {'re', 'fnmatch', callable} backend string matching module to use, or a function of the form match(candidate, pattern). (default: 're') within : pandas.DataFrame the variables over which to search. Notes ------ Only regex and fnmatch will be supported modules. Note that, while regex is the default, the python regular expressions module has some strange behavior if you're used to VIM or Perl-like regex. It may be easier to use fnmatch if regex is not providing the results you expect. If you want, you can also pass an engine that is a function. If so, this needs to be a function that has a signature like: fn(candidate, pattern) and return True or False if the candidate matches the pattern. So, for instance, you can use any string processing function: >>> cxn.varslike('_100M', engine = lambda c,p: c.endswith(p) which may also be expressed as a regexp: >>> cxn.varslike('_100M$', engine='re') or an fnmatch pattern: >>> cxn.varslike('*_100M', engine='fnmatch') File: ~/mambaforge/envs/musa-550-fall-2022/lib/python3.9/site-packages/cenpy/remote.py Type: method
race_matches = acs.varslike("HISPANIC OR LATINO ORIGIN BY RACE", by='concept').sort_index()  # searches along the concept column
race_matches
label | concept | predicateType | group | limit | predicateOnly | hasGeoCollectionSupport | attributes | required | |
---|---|---|---|---|---|---|---|---|---|
B03002_001E | Estimate!!Total: | HISPANIC OR LATINO ORIGIN BY RACE | int | B03002 | 0 | NaN | NaN | B03002_001EA,B03002_001M,B03002_001MA | NaN |
B03002_002E | Estimate!!Total:!!Not Hispanic or Latino: | HISPANIC OR LATINO ORIGIN BY RACE | int | B03002 | 0 | NaN | NaN | B03002_002EA,B03002_002M,B03002_002MA | NaN |
B03002_003E | Estimate!!Total:!!Not Hispanic or Latino:!!Whi... | HISPANIC OR LATINO ORIGIN BY RACE | int | B03002 | 0 | NaN | NaN | B03002_003EA,B03002_003M,B03002_003MA | NaN |
B03002_004E | Estimate!!Total:!!Not Hispanic or Latino:!!Bla... | HISPANIC OR LATINO ORIGIN BY RACE | int | B03002 | 0 | NaN | NaN | B03002_004EA,B03002_004M,B03002_004MA | NaN |
B03002_005E | Estimate!!Total:!!Not Hispanic or Latino:!!Ame... | HISPANIC OR LATINO ORIGIN BY RACE | int | B03002 | 0 | NaN | NaN | B03002_005EA,B03002_005M,B03002_005MA | NaN |
B03002_006E | Estimate!!Total:!!Not Hispanic or Latino:!!Asi... | HISPANIC OR LATINO ORIGIN BY RACE | int | B03002 | 0 | NaN | NaN | B03002_006EA,B03002_006M,B03002_006MA | NaN |
B03002_007E | Estimate!!Total:!!Not Hispanic or Latino:!!Nat... | HISPANIC OR LATINO ORIGIN BY RACE | int | B03002 | 0 | NaN | NaN | B03002_007EA,B03002_007M,B03002_007MA | NaN |
B03002_008E | Estimate!!Total:!!Not Hispanic or Latino:!!Som... | HISPANIC OR LATINO ORIGIN BY RACE | int | B03002 | 0 | NaN | NaN | B03002_008EA,B03002_008M,B03002_008MA | NaN |
B03002_009E | Estimate!!Total:!!Not Hispanic or Latino:!!Two... | HISPANIC OR LATINO ORIGIN BY RACE | int | B03002 | 0 | NaN | NaN | B03002_009EA,B03002_009M,B03002_009MA | NaN |
B03002_010E | Estimate!!Total:!!Not Hispanic or Latino:!!Two... | HISPANIC OR LATINO ORIGIN BY RACE | int | B03002 | 0 | NaN | NaN | B03002_010EA,B03002_010M,B03002_010MA | NaN |
B03002_011E | Estimate!!Total:!!Not Hispanic or Latino:!!Two... | HISPANIC OR LATINO ORIGIN BY RACE | int | B03002 | 0 | NaN | NaN | B03002_011EA,B03002_011M,B03002_011MA | NaN |
B03002_012E | Estimate!!Total:!!Hispanic or Latino: | HISPANIC OR LATINO ORIGIN BY RACE | int | B03002 | 0 | NaN | NaN | B03002_012EA,B03002_012M,B03002_012MA | NaN |
B03002_013E | Estimate!!Total:!!Hispanic or Latino:!!White a... | HISPANIC OR LATINO ORIGIN BY RACE | int | B03002 | 0 | NaN | NaN | B03002_013EA,B03002_013M,B03002_013MA | NaN |
B03002_014E | Estimate!!Total:!!Hispanic or Latino:!!Black o... | HISPANIC OR LATINO ORIGIN BY RACE | int | B03002 | 0 | NaN | NaN | B03002_014EA,B03002_014M,B03002_014MA | NaN |
B03002_015E | Estimate!!Total:!!Hispanic or Latino:!!America... | HISPANIC OR LATINO ORIGIN BY RACE | int | B03002 | 0 | NaN | NaN | B03002_015EA,B03002_015M,B03002_015MA | NaN |
B03002_016E | Estimate!!Total:!!Hispanic or Latino:!!Asian a... | HISPANIC OR LATINO ORIGIN BY RACE | int | B03002 | 0 | NaN | NaN | B03002_016EA,B03002_016M,B03002_016MA | NaN |
B03002_017E | Estimate!!Total:!!Hispanic or Latino:!!Native ... | HISPANIC OR LATINO ORIGIN BY RACE | int | B03002 | 0 | NaN | NaN | B03002_017EA,B03002_017M,B03002_017MA | NaN |
B03002_018E | Estimate!!Total:!!Hispanic or Latino:!!Some ot... | HISPANIC OR LATINO ORIGIN BY RACE | int | B03002 | 0 | NaN | NaN | B03002_018EA,B03002_018M,B03002_018MA | NaN |
B03002_019E | Estimate!!Total:!!Hispanic or Latino:!!Two or ... | HISPANIC OR LATINO ORIGIN BY RACE | int | B03002 | 0 | NaN | NaN | B03002_019EA,B03002_019M,B03002_019MA | NaN |
B03002_020E | Estimate!!Total:!!Hispanic or Latino:!!Two or ... | HISPANIC OR LATINO ORIGIN BY RACE | int | B03002 | 0 | NaN | NaN | B03002_020EA,B03002_020M,B03002_020MA | NaN |
B03002_021E | Estimate!!Total:!!Hispanic or Latino:!!Two or ... | HISPANIC OR LATINO ORIGIN BY RACE | int | B03002 | 0 | NaN | NaN | B03002_021EA,B03002_021M,B03002_021MA | NaN |
GEO_ID | Geography | SEX BY EDUCATIONAL ATTAINMENT FOR THE POPULATI... | string | B17015,B18104,B17016,B18105,B17017,B18106,B170... | 0 | NaN | NaN | NAME | NaN |
It looks like the table we want is "B03002" — we can also easily filter for all variables in this table
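Since the variable IDs form the index of the variables dataframe, one quick way (a sketch) to grab every variable in the table is to filter the index labels, just as we did for the available datasets:
# All variables whose ID starts with "B03002", i.e., the full table
b03002_variables = acs.variables.filter(regex="^B03002", axis=0)
b03002_variables.head()
Below, we hand-pick the specific estimate variables we'll need: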
variables = [
"NAME",
"B03002_001E", # Total
"B03002_003E", # Not Hispanic, White
"B03002_004E", # Not Hispanic, Black
"B03002_005E", # Not Hispanic, American Indian
"B03002_006E", # Not Hispanic, Asian
"B03002_007E", # Not Hispanic, Native Hawaiian
"B03002_008E", # Not Hispanic, Other
"B03002_009E", # Not Hispanic, Two or More Races
"B03002_012E", # Hispanic
]
Note: we've also included the "NAME" variable, which returns the name of the Census geography we are querying for
The Census API uses a hierarchy of geographies when requesting data.
For example, you cannot just request data for a specific county — you need to specify the state and the county.
Common hierarchies
Tip: Use the .geographies attribute.
This allows you to see the available geographies, which parts of the hierarchy each one requires, and which parts support wildcards:
acs.geographies['fips']
name | geoLevelDisplay | referenceDate | requires | wildcard | optionalWithWCFor | |
---|---|---|---|---|---|---|
0 | us | 010 | 2020-01-01 | NaN | NaN | NaN |
1 | region | 020 | 2020-01-01 | NaN | NaN | NaN |
2 | division | 030 | 2020-01-01 | NaN | NaN | NaN |
3 | state | 040 | 2020-01-01 | NaN | NaN | NaN |
4 | county | 050 | 2020-01-01 | [state] | [state] | state |
... | ... | ... | ... | ... | ... | ... |
82 | public use microdata area | 795 | 2020-01-01 | [state] | [state] | state |
83 | zip code tabulation area | 860 | 2020-01-01 | NaN | NaN | NaN |
84 | school district (elementary) | 950 | 2020-01-01 | [state] | [state] | state |
85 | school district (secondary) | 960 | 2020-01-01 | [state] | [state] | state |
86 | school district (unified) | 970 | 2020-01-01 | [state] | [state] | state |
87 rows × 6 columns
For the racial dot map, we'll use the most granular available geography: block group.
The hierarchy is: state --> county --> tract --> block group. We can use the * wildcard for tracts (and block groups), so we only need to know the FIPS codes for Pennsylvania and Philadelphia County.
counties = cenpy.explorer.fips_table("COUNTY")
counties.head()
0 | 1 | 2 | 3 | 4 | |
---|---|---|---|---|---|
0 | AL | 1 | 1 | Autauga County | H1 |
1 | AL | 1 | 3 | Baldwin County | H1 |
2 | AL | 1 | 5 | Barbour County | H1 |
3 | AL | 1 | 7 | Bibb County | H1 |
4 | AL | 1 | 9 | Blount County | H1 |
# Trim to just Philadelphia
# Search for rows where name contains "Philadelphia"
counties.loc[ counties[3].str.contains("Philadelphia") ]
0 | 1 | 2 | 3 | 4 | |
---|---|---|---|---|---|
2294 | PA | 42 | 101 | Philadelphia County | H6 |
For Pennsylvania and Philadelphia County, the FIPS codes are:
philly_county_code = "101"
pa_state_code = "42"
You can also look up FIPS codes on Google! Wikipedia is usually a trustworthy source...
We'll use the .query() function, which takes the following arguments:
- cols: the list of variables desired from the dataset
- geo_unit: string denoting the smallest geographic unit; syntax is "name:FIPS"
- geo_filter: dictionary containing groupings of geo_units, if required by the hierarchy
philly_demo_data = acs.query(
cols=variables,
geo_unit="block group:*",
geo_filter={"state": pa_state_code,
"county": philly_county_code,
"tract": "*"},
)
philly_demo_data.head()
NAME | B03002_001E | B03002_003E | B03002_004E | B03002_005E | B03002_006E | B03002_007E | B03002_008E | B03002_009E | B03002_012E | state | county | tract | block group | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Block Group 3, Census Tract 1.01, Philadelphia... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 42 | 101 | 000101 | 3 |
1 | Block Group 1, Census Tract 1.02, Philadelphia... | 1499 | 1318 | 30 | 0 | 40 | 0 | 0 | 33 | 78 | 42 | 101 | 000102 | 1 |
2 | Block Group 4, Census Tract 1.02, Philadelphia... | 316 | 316 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 42 | 101 | 000102 | 4 |
3 | Block Group 2, Census Tract 2, Philadelphia Co... | 1090 | 490 | 40 | 0 | 548 | 0 | 0 | 12 | 0 | 42 | 101 | 000200 | 2 |
4 | Block Group 2, Census Tract 5, Philadelphia Co... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 42 | 101 | 000500 | 2 |
len(philly_demo_data)
1338
Important: the data is returned as strings rather than numeric values, so we need to convert the estimate columns ourselves:
for variable in variables:
# Convert all variables EXCEPT for NAME
if variable != "NAME":
philly_demo_data[variable] = philly_demo_data[variable].astype(float)
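Equivalently, the same conversion can be done in a single step; a small sketch using astype on all of the estimate columns at once:
# Convert every requested variable except "NAME" to floats in one call
value_columns = [col for col in variables if col != "NAME"]
philly_demo_data[value_columns] = philly_demo_data[value_columns].astype(float)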
What if we mess up the geographic hierarchy?
If you forget to include required parts of the geography hierarchy, you'll get an error!
acs.query(
cols=variables,
geo_unit="block group:*",
geo_filter={"state": pa_state_code},
)
/Users/nhand/mambaforge/envs/musa-550-fall-2022/lib/python3.9/site-packages/cenpy/remote.py:224: SyntaxWarning: "is not" with a literal. Did you mean "!="? if index is not "": /Users/nhand/mambaforge/envs/musa-550-fall-2022/lib/python3.9/site-packages/cenpy/remote.py:224: SyntaxWarning: "is not" with a literal. Did you mean "!="? if index is not "":
--------------------------------------------------------------------------- JSONDecodeError Traceback (most recent call last) File ~/mambaforge/envs/musa-550-fall-2022/lib/python3.9/site-packages/requests/models.py:971, in Response.json(self, **kwargs) 970 try: --> 971 return complexjson.loads(self.text, **kwargs) 972 except JSONDecodeError as e: 973 # Catch JSON-related errors and raise as requests.JSONDecodeError 974 # This aliases json.JSONDecodeError and simplejson.JSONDecodeError File ~/mambaforge/envs/musa-550-fall-2022/lib/python3.9/site-packages/simplejson/__init__.py:525, in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, use_decimal, **kw) 521 if (cls is None and encoding is None and object_hook is None and 522 parse_int is None and parse_float is None and 523 parse_constant is None and object_pairs_hook is None 524 and not use_decimal and not kw): --> 525 return _default_decoder.decode(s) 526 if cls is None: File ~/mambaforge/envs/musa-550-fall-2022/lib/python3.9/site-packages/simplejson/decoder.py:370, in JSONDecoder.decode(self, s, _w, _PY3) 369 s = str(s, self.encoding) --> 370 obj, end = self.raw_decode(s) 371 end = _w(s, end).end() File ~/mambaforge/envs/musa-550-fall-2022/lib/python3.9/site-packages/simplejson/decoder.py:400, in JSONDecoder.raw_decode(self, s, idx, _w, _PY3) 399 idx += 3 --> 400 return self.scan_once(s, idx=_w(s, idx).end()) JSONDecodeError: Expecting value: line 1 column 1 (char 0) During handling of the above exception, another exception occurred: JSONDecodeError Traceback (most recent call last) File ~/mambaforge/envs/musa-550-fall-2022/lib/python3.9/site-packages/cenpy/remote.py:219, in APIConnection.query(self, cols, geo_unit, geo_filter, apikey, **kwargs) 218 try: --> 219 json_content = res.json() 220 df = pd.DataFrame().from_records(json_content[1:], columns=json_content[0]) File ~/mambaforge/envs/musa-550-fall-2022/lib/python3.9/site-packages/requests/models.py:975, in Response.json(self, **kwargs) 972 except JSONDecodeError as e: 973 # Catch JSON-related errors and raise as requests.JSONDecodeError 974 # This aliases json.JSONDecodeError and simplejson.JSONDecodeError --> 975 raise RequestsJSONDecodeError(e.msg, e.doc, e.pos) JSONDecodeError: Expecting value: line 1 column 1 (char 0) During handling of the above exception, another exception occurred: HTTPError Traceback (most recent call last) Input In [70], in <cell line: 1>() ----> 1 acs.query( 2 cols=variables, 3 geo_unit="block group:*", 4 geo_filter={"state": pa_state_code}, 5 ) File ~/mambaforge/envs/musa-550-fall-2022/lib/python3.9/site-packages/cenpy/remote.py:229, in APIConnection.query(self, cols, geo_unit, geo_filter, apikey, **kwargs) 227 except (ValueError, JSONDecodeError): 228 if res.status_code == 400: --> 229 raise r.HTTPError( 230 "400 " + "\n".join(map(lambda x: x.decode(), res.iter_lines())) 231 ) 232 else: 233 res.raise_for_status() HTTPError: 400 error: unknown/unsupported geography heirarchy
cenpy includes an interface to the Census Bureau's TIGER shapefile database (TIGERweb).
cenpy.tiger.available?
Signature: cenpy.tiger.available(verbose=False) Docstring: Query the TIGERweb geoAPI for available MapServices Parameters ----------- verbose : int or bool indicator for the verbosity level. Accepts levels -1, 0, 1, and greater. Returns ------- list or dict of available MapServers through TIGERweb File: ~/mambaforge/envs/musa-550-fall-2022/lib/python3.9/site-packages/cenpy/tiger.py Type: function
cenpy.tiger.available()
[{'name': 'AIANNHA', 'type': 'MapServer'}, {'name': 'CBSA', 'type': 'MapServer'}, {'name': 'Hydro', 'type': 'MapServer'}, {'name': 'Labels', 'type': 'MapServer'}, {'name': 'Legislative', 'type': 'MapServer'}, {'name': 'Places_CouSub_ConCity_SubMCD', 'type': 'MapServer'}, {'name': 'PUMA_TAD_TAZ_UGA_ZCTA', 'type': 'MapServer'}, {'name': 'Region_Division', 'type': 'MapServer'}, {'name': 'School', 'type': 'MapServer'}, {'name': 'Special_Land_Use_Areas', 'type': 'MapServer'}, {'name': 'State_County', 'type': 'MapServer'}, {'name': 'tigerWMS_ACS2012', 'type': 'MapServer'}, {'name': 'tigerWMS_ACS2013', 'type': 'MapServer'}, {'name': 'tigerWMS_ACS2014', 'type': 'MapServer'}, {'name': 'tigerWMS_ACS2015', 'type': 'MapServer'}, {'name': 'tigerWMS_ACS2016', 'type': 'MapServer'}, {'name': 'tigerWMS_ACS2017', 'type': 'MapServer'}, {'name': 'tigerWMS_ACS2018', 'type': 'MapServer'}, {'name': 'tigerWMS_ACS2019', 'type': 'MapServer'}, {'name': 'tigerWMS_ACS2021', 'type': 'MapServer'}, {'name': 'tigerWMS_ACS2022', 'type': 'MapServer'}, {'name': 'tigerWMS_Census2010', 'type': 'MapServer'}, {'name': 'tigerWMS_Census2020', 'type': 'MapServer'}, {'name': 'tigerWMS_Current', 'type': 'MapServer'}, {'name': 'tigerWMS_ECON2012', 'type': 'MapServer'}, {'name': 'tigerWMS_PhysicalFeatures', 'type': 'MapServer'}, {'name': 'Tracts_Blocks', 'type': 'MapServer'}, {'name': 'Transportation', 'type': 'MapServer'}, {'name': 'TribalTracts', 'type': 'MapServer'}, {'name': 'Urban', 'type': 'MapServer'}, {'name': 'USLandmass', 'type': 'MapServer'}]
Set the tigerWMS_Census2020 database as the desired GeoService.
type(acs)
cenpy.remote.APIConnection
acs.set_mapservice("tigerWMS_Census2020")
Connection to American Community Survey: 5-Year Estimates: Detailed Tables 5-Year(ID: https://api.census.gov/data/id/ACSDT5Y2020) With MapServer: Census 2020 WMS
The map service has many different layers; we need to select the layer for our desired geography (Census block groups).
acs.mapservice.layers
[(ESRILayer) Urban Growth Areas, (ESRILayer) Urban Growth Areas Labels, (ESRILayer) Tribal Census Tracts, (ESRILayer) Tribal Census Tracts Labels, (ESRILayer) Tribal Block Groups, (ESRILayer) Tribal Block Groups Labels, (ESRILayer) Census Tracts, (ESRILayer) Census Tracts Labels, (ESRILayer) Census Block Groups, (ESRILayer) Census Block Groups Labels, (ESRILayer) Census Blocks, (ESRILayer) Census Blocks Labels, (ESRILayer) Unified School Districts, (ESRILayer) Unified School Districts Labels, (ESRILayer) Secondary School Districts, (ESRILayer) Secondary School Districts Labels, (ESRILayer) Elementary School Districts, (ESRILayer) Elementary School Districts Labels, (ESRILayer) Estates, (ESRILayer) Estates Labels, (ESRILayer) County Subdivisions, (ESRILayer) County Subdivisions Labels, (ESRILayer) Subbarrios, (ESRILayer) Subbarrios Labels, (ESRILayer) Consolidated Cities, (ESRILayer) Consolidated Cities Labels, (ESRILayer) Incorporated Places, (ESRILayer) Incorporated Places Labels, (ESRILayer) Census Designated Places, (ESRILayer) Census Designated Places Labels, (ESRILayer) Alaska Native Regional Corporations, (ESRILayer) Alaska Native Regional Corporations Labels, (ESRILayer) Tribal Subdivisions, (ESRILayer) Tribal Subdivisions Labels, (ESRILayer) Federal American Indian Reservations, (ESRILayer) Federal American Indian Reservations Labels, (ESRILayer) Off-Reservation Trust Lands, (ESRILayer) Off-Reservation Trust Lands Labels, (ESRILayer) State American Indian Reservations, (ESRILayer) State American Indian Reservations Labels, (ESRILayer) Hawaiian Home Lands, (ESRILayer) Hawaiian Home Lands Labels, (ESRILayer) Alaska Native Village Statistical Areas, (ESRILayer) Alaska Native Village Statistical Areas Labels, (ESRILayer) Oklahoma Tribal Statistical Areas, (ESRILayer) Oklahoma Tribal Statistical Areas Labels, (ESRILayer) State Designated Tribal Statistical Areas, (ESRILayer) State Designated Tribal Statistical Areas Labels, (ESRILayer) Tribal Designated Statistical Areas, (ESRILayer) Tribal Designated Statistical Areas Labels, (ESRILayer) American Indian Joint-Use Areas, (ESRILayer) American Indian Joint-Use Areas Labels, (ESRILayer) 116th Congressional Districts, (ESRILayer) 116th Congressional Districts Labels, (ESRILayer) 2018 State Legislative Districts - Upper, (ESRILayer) 2018 State Legislative Districts - Upper Labels, (ESRILayer) 2018 State Legislative Districts - Lower, (ESRILayer) 2018 State Legislative Districts - Lower Labels, (ESRILayer) Voting Districts, (ESRILayer) Voting Districts Labels, (ESRILayer) Census Divisions, (ESRILayer) Census Divisions Labels, (ESRILayer) Census Regions, (ESRILayer) Census Regions Labels, (ESRILayer) Combined New England City and Town Areas, (ESRILayer) Combined New England City and Town Areas Labels, (ESRILayer) New England City and Town Area Divisions, (ESRILayer) New England City and Town Area Divisions Labels, (ESRILayer) Metropolitan New England City and Town Areas, (ESRILayer) Metropolitan New England City and Town Areas Labels, (ESRILayer) Micropolitan New England City and Town Areas, (ESRILayer) Micropolitan New England City and Town Areas Labels, (ESRILayer) Combined Statistical Areas, (ESRILayer) Combined Statistical Areas Labels, (ESRILayer) Metropolitan Divisions, (ESRILayer) Metropolitan Divisions Labels, (ESRILayer) Metropolitan Statistical Areas, (ESRILayer) Metropolitan Statistical Areas Labels, (ESRILayer) Micropolitan Statistical Areas, (ESRILayer) Micropolitan Statistical Areas Labels, (ESRILayer) States, (ESRILayer) States Labels, (ESRILayer) Counties, (ESRILayer) Counties Labels, (ESRILayer) Zip Code Tabulation Areas, (ESRILayer) Zip Code Tabulation Areas Labels, (ESRILayer) Public Use Microdata Areas, (ESRILayer) Public Use Microdata Areas Labels]
block_group_geoservice = acs.mapservice.layers[8]
block_group_geoservice
(ESRILayer) Census Block Groups
type(block_group_geoservice)
cenpy.tiger.ESRILayer
block_group_geoservice.variables
name | type | alias | domain | length | |
---|---|---|---|---|---|
0 | OBJECTID | esriFieldTypeOID | OBJECTID | None | NaN |
1 | AREALAND | esriFieldTypeDouble | AREALAND | None | NaN |
2 | ALANDHIST | esriFieldTypeDouble | ALANDHIST | None | NaN |
3 | AREAWATER | esriFieldTypeDouble | AREAWATER | None | NaN |
4 | AWATERHIST | esriFieldTypeDouble | AWATERHIST | None | NaN |
5 | BLKGRP | esriFieldTypeString | BLKGRP | None | 1.0 |
6 | COUNTY | esriFieldTypeString | COUNTY | None | 3.0 |
7 | EFFDATE | esriFieldTypeDate | EFFDATE | None | 8.0 |
8 | ESTABDATE | esriFieldTypeDate | ESTABDATE | None | 8.0 |
9 | FUNCSTAT | esriFieldTypeString | FUNCSTAT | None | 1.0 |
10 | GEOID | esriFieldTypeString | GEOID | None | 12.0 |
11 | LSADC | esriFieldTypeString | LSADC | None | 2.0 |
12 | MTFCC | esriFieldTypeString | MTFCC | None | 5.0 |
13 | BASENAME | esriFieldTypeString | BASENAME | None | 100.0 |
14 | NAME | esriFieldTypeString | NAME | None | 100.0 |
15 | OID | esriFieldTypeString | OID | None | 22.0 |
16 | STATE | esriFieldTypeString | STATE | None | 2.0 |
17 | TRACT | esriFieldTypeString | TRACT | None | 6.0 |
18 | UR | esriFieldTypeString | UR | None | 1.0 |
19 | VINTAGE | esriFieldTypeString | VINTAGE | None | 2.0 |
20 | CENTLON | esriFieldTypeString | CENTLON | None | 12.0 |
21 | CENTLAT | esriFieldTypeString | CENTLAT | None | 11.0 |
22 | INTPTLON | esriFieldTypeString | INTPTLON | None | 12.0 |
23 | INTPTLAT | esriFieldTypeString | INTPTLAT | None | 11.0 |
24 | HU100 | esriFieldTypeDouble | HU100 | None | NaN |
25 | POP100 | esriFieldTypeDouble | POP100 | None | NaN |
26 | STGEOMETRY | esriFieldTypeGeometry | STGEOMETRY | None | NaN |
27 | STGEOMETRY.AREA | esriFieldTypeDouble | STGEOMETRY.AREA | None | NaN |
28 | STGEOMETRY.LEN | esriFieldTypeDouble | STGEOMETRY.LEN | None | NaN |
Use esri2gpd to load the data
# The base url for the map service API endpoint
block_group_geoservice._baseurl
'http://tigerweb.geo.census.gov/arcgis/rest/services/TIGERweb/tigerWMS_Census2020/MapServer/8'
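For context, esri2gpd is a thin wrapper around this layer's ArcGIS REST "query" endpoint. A rough sketch of the equivalent raw request is shown below (assuming the server accepts GeoJSON output via f=geojson; the package also handles details such as paging through large results that are omitted here).
import requests
import geopandas as gpd

# Query the layer's REST endpoint directly and ask for GeoJSON back
params = {
    "where": "STATE = '42' AND COUNTY = '101'",  # same filter we use below
    "outFields": "*",                            # return all attribute columns
    "f": "geojson",                              # response format
}
response = requests.get(block_group_geoservice._baseurl + "/query", params=params)
raw = gpd.GeoDataFrame.from_features(response.json()["features"], crs="EPSG:4326")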
## We're just querying a GeoService — let's use esri2gpd
# Only Philadelphia
where_clause = "STATE = '42' AND COUNTY = '101'"
# Query
philly_block_groups = esri2gpd.get(block_group_geoservice._baseurl, where=where_clause)
len(philly_block_groups)
1338
philly_block_groups.head()
geometry | OBJECTID | AREALAND | ALANDHIST | AREAWATER | AWATERHIST | BLKGRP | COUNTY | EFFDATE | ESTABDATE | FUNCSTAT | GEOID | LSADC | MTFCC | BASENAME | NAME | OID | STATE | TRACT | UR | VINTAGE | CENTLON | CENTLAT | INTPTLON | INTPTLAT | HU100 | POP100 | STGEOMETRY.AREA | STGEOMETRY.LEN | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | POLYGON ((-75.05449 40.02465, -75.05405 40.024... | 82 | 305092 | 305092 | 0 | 0 | 3 | 101 | NaN | NaN | S | 421010323003 | BG | G5030 | 3 | Block Group 3 | 208703717019554 | 42 | 032300 | None | 70 | -075.0499467 | +40.0222676 | -075.0499467 | +40.0222676 | 744 | 1869 | 520848.611219 | 3194.678561 |
1 | POLYGON ((-75.15987 39.94454, -75.15998 39.944... | 83 | 70896 | 70896 | 0 | 0 | 1 | 101 | NaN | NaN | S | 421010011021 | BG | G5030 | 1 | Block Group 1 | 208703717019574 | 42 | 001102 | None | 70 | -075.1581610 | +39.9449699 | -075.1581610 | +39.9449699 | 1180 | 1729 | 120760.631816 | 1390.110306 |
2 | POLYGON ((-75.00180 40.10493, -75.00165 40.104... | 904 | 423561 | 417638 | 0 | 0 | 2 | 101 | NaN | NaN | S | 421010361002 | BG | G5030 | 2 | Block Group 2 | 20870508933460 | 42 | 036100 | None | 70 | -074.9965242 | +40.1037503 | -074.9965242 | +40.1037503 | 404 | 1134 | 724822.503578 | 3558.455582 |
3 | POLYGON ((-75.15255 40.00318, -75.15293 40.001... | 920 | 192284 | 192284 | 0 | 0 | 2 | 101 | NaN | NaN | S | 421010200002 | BG | G5030 | 2 | Block Group 2 | 208703717019432 | 42 | 020000 | None | 70 | -075.1507409 | +40.0042065 | -075.1507409 | +40.0042065 | 497 | 747 | 328094.605123 | 3138.906384 |
4 | POLYGON ((-75.15216 40.01254, -75.15189 40.011... | 921 | 212571 | 212571 | 0 | 0 | 1 | 101 | NaN | NaN | S | 421010203001 | BG | G5030 | 1 | Block Group 1 | 208703717019433 | 42 | 020300 | None | 70 | -075.1492793 | +40.0091183 | -075.1492793 | +40.0091183 | 704 | 1354 | 362759.782648 | 3386.729282 |
Merge based on multiple columns: state, county, tract, and block group IDs.
The relevant columns are STATE, COUNTY, TRACT, and BLKGRP in the GeoDataFrame, and state, county, tract, and "block group" in the ACS demographic data:
philly_demo_final = philly_block_groups.merge(
philly_demo_data,
left_on=["STATE", "COUNTY", "TRACT", "BLKGRP"],
right_on=["state", "county", "tract", "block group"],
)
philly_demo_final.head()
geometry | OBJECTID | AREALAND | ALANDHIST | AREAWATER | AWATERHIST | BLKGRP | COUNTY | EFFDATE | ESTABDATE | FUNCSTAT | GEOID | LSADC | MTFCC | BASENAME | NAME_x | OID | STATE | TRACT | UR | VINTAGE | CENTLON | CENTLAT | INTPTLON | INTPTLAT | HU100 | POP100 | STGEOMETRY.AREA | STGEOMETRY.LEN | NAME_y | B03002_001E | B03002_003E | B03002_004E | B03002_005E | B03002_006E | B03002_007E | B03002_008E | B03002_009E | B03002_012E | state | county | tract | block group | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | POLYGON ((-75.05449 40.02465, -75.05405 40.024... | 82 | 305092 | 305092 | 0 | 0 | 3 | 101 | NaN | NaN | S | 421010323003 | BG | G5030 | 3 | Block Group 3 | 208703717019554 | 42 | 032300 | None | 70 | -075.0499467 | +40.0222676 | -075.0499467 | +40.0222676 | 744 | 1869 | 520848.611219 | 3194.678561 | Block Group 3, Census Tract 323, Philadelphia ... | 1537.0 | 744.0 | 439.0 | 0.0 | 0.0 | 0.0 | 0.0 | 22.0 | 332.0 | 42 | 101 | 032300 | 3 |
1 | POLYGON ((-75.15987 39.94454, -75.15998 39.944... | 83 | 70896 | 70896 | 0 | 0 | 1 | 101 | NaN | NaN | S | 421010011021 | BG | G5030 | 1 | Block Group 1 | 208703717019574 | 42 | 001102 | None | 70 | -075.1581610 | +39.9449699 | -075.1581610 | +39.9449699 | 1180 | 1729 | 120760.631816 | 1390.110306 | Block Group 1, Census Tract 11.02, Philadelphi... | 1742.0 | 1371.0 | 84.0 | 0.0 | 186.0 | 0.0 | 0.0 | 82.0 | 19.0 | 42 | 101 | 001102 | 1 |
2 | POLYGON ((-75.00180 40.10493, -75.00165 40.104... | 904 | 423561 | 417638 | 0 | 0 | 2 | 101 | NaN | NaN | S | 421010361002 | BG | G5030 | 2 | Block Group 2 | 20870508933460 | 42 | 036100 | None | 70 | -074.9965242 | +40.1037503 | -074.9965242 | +40.1037503 | 404 | 1134 | 724822.503578 | 3558.455582 | Block Group 2, Census Tract 361, Philadelphia ... | 935.0 | 802.0 | 7.0 | 0.0 | 97.0 | 0.0 | 0.0 | 0.0 | 29.0 | 42 | 101 | 036100 | 2 |
3 | POLYGON ((-75.15255 40.00318, -75.15293 40.001... | 920 | 192284 | 192284 | 0 | 0 | 2 | 101 | NaN | NaN | S | 421010200002 | BG | G5030 | 2 | Block Group 2 | 208703717019432 | 42 | 020000 | None | 70 | -075.1507409 | +40.0042065 | -075.1507409 | +40.0042065 | 497 | 747 | 328094.605123 | 3138.906384 | Block Group 2, Census Tract 200, Philadelphia ... | 976.0 | 101.0 | 722.0 | 0.0 | 10.0 | 0.0 | 0.0 | 0.0 | 143.0 | 42 | 101 | 020000 | 2 |
4 | POLYGON ((-75.15216 40.01254, -75.15189 40.011... | 921 | 212571 | 212571 | 0 | 0 | 1 | 101 | NaN | NaN | S | 421010203001 | BG | G5030 | 1 | Block Group 1 | 208703717019433 | 42 | 020300 | None | 70 | -075.1492793 | +40.0091183 | -075.1492793 | +40.0091183 | 704 | 1354 | 362759.782648 | 3386.729282 | Block Group 1, Census Tract 203, Philadelphia ... | 1519.0 | 13.0 | 1467.0 | 0.0 | 12.0 | 0.0 | 0.0 | 0.0 | 27.0 | 42 | 101 | 020300 | 1 |
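A quick sanity check on the merge is to compare row counts before and after; the default merge is an inner join, so block groups without a matching ACS record would be dropped (a small sketch using the names above).
# Row counts before and after the merge; a mismatch means some block
# groups had no matching ACS record (or vice versa).
print(len(philly_block_groups), len(philly_demo_data), len(philly_demo_final))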
Using geopandas
...
# Check the CRS
philly_demo_final.crs
<Geographic 2D CRS: EPSG:4326> Name: WGS 84 Axis Info [ellipsoidal]: - Lat[north]: Geodetic latitude (degree) - Lon[east]: Geodetic longitude (degree) Area of Use: - name: World. - bounds: (-180.0, -90.0, 180.0, 90.0) Datum: World Geodetic System 1984 ensemble - Ellipsoid: WGS 84 - Prime Meridian: Greenwich
fig, ax = plt.subplots(figsize=(10,10))
# Plot the choropleth
philly_demo_final.plot(ax=ax, column='B03002_001E', legend=True)
# Format
ax.set_title("Population of Philadelphia by Block Group", fontsize=16)
ax.set_axis_off()
Or using hvplot
...
cols = ['NAME_x', 'B03002_001E', 'geometry']
philly_demo_final[cols].hvplot(c='B03002_001E',
geo=True,
legend=True,
width=600,
height=400,
cmap='viridis',
frame_height=800,
frame_width=800,
hover_cols=["NAME_x"])
# Rename columns
philly_demo_final = philly_demo_final.rename(
columns={
"B03002_001E": "Total", # Total
"B03002_003E": "White", # Not Hispanic, White
"B03002_004E": "Black", # Not Hispanic, Black
"B03002_005E": "AI/AN", # Not Hispanic, American Indian
"B03002_006E": "Asian", # Not Hispanic, Asian
"B03002_007E": "NH/PI", # Not Hispanic, Native Hawaiian
"B03002_008E": "Other_", # Not Hispanic, Other
"B03002_009E": "Two Plus", # Not Hispanic, Two or More Races
"B03002_012E": "Hispanic", # Hispanic
}
)
# Add an "Other" column
cols = ['AI/AN', 'NH/PI','Other_', 'Two Plus']
philly_demo_final['Other'] = philly_demo_final[cols].sum(axis=1)
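As a rough consistency check, the renamed columns should sum to the "Total" column, since table B03002 splits the total population into the non-Hispanic categories plus the Hispanic total. A small sketch, using the column names defined above:
# Consistency check (sketch): White + Black + Asian + Hispanic + Other should
# add up to the B03002_001E total for each block group.
race_cols = ["White", "Black", "Asian", "Hispanic", "Other"]
diff = philly_demo_final[race_cols].sum(axis=1) - philly_demo_final["Total"]
print("Max absolute difference:", diff.abs().max())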
Given a polygon, create randomly distributed points that fall within the polygon.
def random_points_in_polygon(number, polygon):
    """
    Generate the specified number of randomly placed points
    that fall within the given polygon.
    """
    points = []
    min_x, min_y, max_x, max_y = polygon.bounds
    i = 0
    while i < number:
        # Draw a random point from within the polygon's bounding box...
        point = Point(np.random.uniform(min_x, max_x), np.random.uniform(min_y, max_y))
        # ...and keep it only if it falls inside the polygon itself
        if polygon.contains(point):
            points.append(point)
            i += 1
    return points
Random points example
# get the first block group polygon in the data set
geo = philly_demo_final.iloc[0].geometry
geo
fig, ax = plt.subplots(figsize=(6, 6))
# Generate some random points
random_points = random_points_in_polygon(100, geo)
# Plot random points
gpd.GeoSeries(random_points).plot(ax=ax, markersize=20, color="red")
# Plot boundary of block group
gpd.GeoSeries([geo]).plot(ax=ax, facecolor="none", edgecolor="black")
ax.set_axis_off()
def generate_dot_map(data, people_per_dot):
"""
Given a GeoDataFrame with demographic columns, generate a dot
map according to the population in each geometry.
"""
results = []
for field in ["White", "Hispanic", "Black", "Asian", "Other"]:
# generate random points
pts = data.apply(
lambda row: random_points_in_polygon(
row[field] / people_per_dot, row["geometry"]
),
axis=1,
)
# combine into single GeoSeries
pts = gpd.GeoSeries(pts.apply(pd.Series).stack(), dtype=object, crs=data["geometry"].crs)
pts.name = "geometry"
# make into a GeoDataFrame
pts = gpd.GeoDataFrame(pts)
pts["field"] = field
# save
results.append(pts)
return gpd.GeoDataFrame(pd.concat(results), crs=data["geometry"].crs).reset_index(
drop=True
)
dot_map = generate_dot_map(philly_demo_final, people_per_dot=50)
/Users/nhand/mambaforge/envs/musa-550-fall-2022/lib/python3.9/site-packages/pandas/core/dtypes/cast.py:122: ShapelyDeprecationWarning: The array interface is deprecated and will no longer work in Shapely 2.0. Convert the '.coords' to a numpy array instead.
  arr = construct_1d_object_array_from_listlike(values)
/var/folders/49/ntrr94q12xd4rq8hqdnx96gm0000gn/T/ipykernel_13687/2645225740.py:18: FutureWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
  pts = gpd.GeoSeries(pts.apply(pd.Series).stack(), dtype=object, crs=data["geometry"].crs)
(The two warnings above are repeated once per demographic column; the repeats are trimmed here.)
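The warnings come from stacking shapely objects through pandas and do not affect the result. For reference, here is a hedged alternative that collects the points in plain Python lists instead; it is a sketch, not a tested drop-in replacement, and it floors the dot count rather than rounding up.
def generate_dot_map_v2(data, people_per_dot):
    """
    Same idea as generate_dot_map, but collects the points in plain lists
    to avoid the pandas/shapely stacking warnings.
    """
    records = []
    for _, row in data.iterrows():
        for field in ["White", "Hispanic", "Black", "Asian", "Other"]:
            # One dot per `people_per_dot` people (floored)
            num_dots = int(row[field] // people_per_dot)
            for pt in random_points_in_polygon(num_dots, row["geometry"]):
                records.append({"geometry": pt, "field": field})
    return gpd.GeoDataFrame(records, geometry="geometry", crs=data.crs)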
print("number of points = ", len(dot_map))
number of points = 34143
dot_map.tail()
geometry | field | |
---|---|---|
34138 | POINT (-75.07955 40.01939) | Other |
34139 | POINT (-75.24719 39.96220) | Other |
34140 | POINT (-75.08613 40.02940) | Other |
34141 | POINT (-75.05695 40.01857) | Other |
34142 | POINT (-75.21972 39.92888) | Other |
# setup a custom color map from ColorBrewer
from matplotlib.colors import ListedColormap
cmap = ListedColormap(
["#3a833c", "#377eb8", "#4daf4a", "#984ea3", "#ff7f00", "#ffff33"]
)
# Convert to 3857
dot_map_3857 = dot_map.to_crs(epsg=3857)
# Initialize the figure and axes
fig, ax = plt.subplots(figsize=(10, 10), facecolor="#cfcfcf")
# Plot
dot_map_3857.plot(
ax=ax,
column="field",
categorical=True,
legend=True,
alpha=1,
markersize=0.5,
cmap=cmap,
)
# format
ax.set_title("Philadelphia, PA", fontsize=16)
ax.text(
0.5, 0.95, "1 dot = 50 people", fontsize=12, transform=ax.transAxes, ha="center"
)
ax.set_axis_off()
Let's use demographic census data in Philadelphia by census tract and compare it to a dataset of childhood lead poisoning.
Use the * operator to get all census tracts in Philadelphia County.
philly_demo_tract = acs.query(
cols=["NAME", "B03002_001E", "B03002_004E"],
geo_unit="tract:*",
geo_filter={
"state" : "42",
"county" : "101"
},
)
philly_demo_tract.head()
NAME | B03002_001E | B03002_004E | state | county | tract | |
---|---|---|---|---|---|---|
0 | Census Tract 27.01, Philadelphia County, Penns... | 4098 | 368 | 42 | 101 | 002701 |
1 | Census Tract 27.02, Philadelphia County, Penns... | 4300 | 124 | 42 | 101 | 002702 |
2 | Census Tract 28.01, Philadelphia County, Penns... | 4452 | 294 | 42 | 101 | 002801 |
3 | Census Tract 28.02, Philadelphia County, Penns... | 5772 | 221 | 42 | 101 | 002802 |
4 | Census Tract 29, Philadelphia County, Pennsylv... | 3762 | 50 | 42 | 101 | 002900 |
philly_demo_tract.dtypes # "object" means string!
NAME object B03002_001E object B03002_004E object state object county object tract object dtype: object
# Census tracts are the 7th layer (index 6 starting from 0)
acs.mapservice.layers[6]
(ESRILayer) Census Tracts
# The base url for the map service API endpoint
url = acs.mapservice.layers[6]._baseurl
url
'http://tigerweb.geo.census.gov/arcgis/rest/services/TIGERweb/tigerWMS_Census2020/MapServer/6'
## We're just querying a GeoService — let's use esri2gpd
# Only Philadelphia
where_clause = "STATE = '42' AND COUNTY = '101'"
# Query
philly_census_tracts = esri2gpd.get(url, where=where_clause)
# Merge them together
# IMPORTANT: Make sure your merge keys are the same dtypes (e.g., all strings or all ints)
philly_demo_tract = philly_census_tracts.merge(
philly_demo_tract,
left_on=["STATE", "COUNTY", "TRACT"],
right_on=["state", "county", "tract"],
)
Add a new column to your data called percent_black.
Important: Make sure you convert the data to floats!
for col in ['B03002_001E', 'B03002_004E']:
philly_demo_tract[col] = philly_demo_tract[col].astype(float)
philly_demo_tract["percent_black"] = (
100 * philly_demo_tract["B03002_004E"] / philly_demo_tract["B03002_001E"]
)
Use the carto2gpd package to pull the lead levels data from the city's CARTO database.
# Documentation includes an example for help!
# carto2gpd.get?
table_name = 'child_blood_lead_levels_by_ct'
lead_levels = carto2gpd.get("https://phl.carto.com/api/v2/sql", table_name)
lead_levels.head()
geometry | cartodb_id | census_tract | data_redacted | num_bll_5plus | num_screen | perc_5plus | |
---|---|---|---|---|---|---|---|
0 | POLYGON ((-75.14147 39.95171, -75.14150 39.951... | 1 | 42101000100 | False | 0.0 | 100.0 | 0.0 |
1 | POLYGON ((-75.16238 39.95766, -75.16236 39.957... | 2 | 42101000200 | True | NaN | 109.0 | NaN |
2 | POLYGON ((-75.17821 39.95981, -75.17743 39.959... | 3 | 42101000300 | True | NaN | 110.0 | NaN |
3 | POLYGON ((-75.17299 39.95464, -75.17301 39.954... | 4 | 42101000401 | True | NaN | 61.0 | NaN |
4 | POLYGON ((-75.16333 39.95334, -75.16340 39.953... | 5 | 42101000402 | False | 0.0 | 41.0 | 0.0 |
See the .dropna() function and the subset= keyword.
lead_levels = lead_levels.dropna(subset=['perc_5plus'])
Merge the lead levels into the demographic data, matching the census_tract and GEOID fields with GeoDataFrame.merge(...).
# Trim the lead levels data
lead_levels_trimmed = lead_levels[['census_tract', 'perc_5plus']]
# Merge into the demographic data
# Use "GEOID" — that is the unique identifier here
merged = philly_demo_tract.merge(lead_levels_trimmed,
how='left',
left_on='GEOID',
right_on='census_tract')
merged.head()
geometry | OBJECTID | AREALAND | ALANDHIST | AREAWATER | AWATERHIST | COUNTY | EFFDATE | ESTABDATE | FUNCSTAT | GEOID | LSADC | MTFCC | BASENAME | NAME_x | OID | STATE | TRACT | UR | VINTAGE | CENTLON | CENTLAT | INTPTLON | INTPTLAT | HU100 | POP100 | STGEOMETRY.AREA | STGEOMETRY.LEN | NAME_y | B03002_001E | B03002_004E | state | county | tract | percent_black | census_tract | perc_5plus | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | POLYGON ((-75.02371 40.13005, -75.02489 40.129... | 669 | 1816210 | 1816210 | 7169 | 7169 | 101 | NaN | NaN | S | 42101036501 | CT | G5020 | 365.01 | Census Tract 365.01 | 207703717001577 | 42 | 036501 | None | 70 | -075.0131963 | +40.1293202 | -075.0122761 | +40.1288937 | 2520 | 5698 | 3.122595e+06 | 7152.056313 | Census Tract 365.01, Philadelphia County, Penn... | 4799.0 | 538.0 | 42 | 101 | 036501 | 11.210669 | NaN | NaN |
1 | POLYGON ((-75.18329 39.94951, -75.18275 39.949... | 673 | 99957 | 99957 | 12755 | 12755 | 101 | NaN | NaN | S | 42101000801 | CT | G5020 | 8.01 | Census Tract 8.01 | 207703717001596 | 42 | 000801 | None | 70 | -075.1800509 | +39.9500600 | -075.1804054 | +39.9497374 | 1195 | 1748 | 1.920198e+05 | 1800.716073 | Census Tract 8.01, Philadelphia County, Pennsy... | 1631.0 | 26.0 | 42 | 101 | 000801 | 1.594114 | NaN | NaN |
2 | POLYGON ((-75.16899 40.07147, -75.16892 40.071... | 2651 | 406340 | 406054 | 0 | 0 | 101 | NaN | NaN | S | 42101026301 | CT | G5020 | 263.01 | Census Tract 263.01 | 20770510238026 | 42 | 026301 | None | 70 | -075.1637161 | +40.0728148 | -075.1637161 | +40.0728148 | 1656 | 3726 | 6.947209e+05 | 3399.391181 | Census Tract 263.01, Philadelphia County, Penn... | 3772.0 | 3435.0 | 42 | 101 | 026301 | 91.065748 | 42101026301 | 3.6 |
3 | POLYGON ((-75.16912 40.02386, -75.16848 40.023... | 2662 | 421189 | 421189 | 0 | 0 | 101 | NaN | NaN | S | 42101024400 | CT | G5020 | 244 | Census Tract 244 | 20770510183838 | 42 | 024400 | None | 70 | -075.1638925 | +40.0248284 | -075.1638925 | +40.0248284 | 1374 | 3152 | 7.191082e+05 | 3469.999759 | Census Tract 244, Philadelphia County, Pennsyl... | 3453.0 | 2779.0 | 42 | 101 | 024400 | 80.480741 | 42101024400 | 11.3 |
4 | POLYGON ((-75.05465 40.04464, -75.05449 40.044... | 2664 | 842716 | 828420 | 0 | 0 | 101 | NaN | NaN | S | 42101033200 | CT | G5020 | 332 | Census Tract 332 | 20770510237205 | 42 | 033200 | None | 70 | -075.0449684 | +40.0439733 | -075.0449684 | +40.0439733 | 1046 | 3292 | 1.439592e+06 | 5498.109229 | Census Tract 332, Philadelphia County, Pennsyl... | 2951.0 | 94.0 | 42 | 101 | 033200 | 3.185361 | 42101033200 | 0.0 |
We only need the 'NAME_x', 'geometry', 'percent_black', and 'perc_5plus' columns.
merged = merged[['NAME_x', 'geometry', 'percent_black', 'perc_5plus']]
Make two plots: a choropleth of the lead levels (perc_5plus) and a choropleth of percent_black.
You can make these using hvplot or geopandas/matplotlib — whichever you prefer!
# Lead levels plot
img1 = merged.hvplot(geo=True,
c='perc_5plus',
width=500,
height=400,
cmap='viridis',
title='Lead Levels')
# Percent black
img2 = merged.hvplot(geo=True,
c='percent_black',
width=500,
height=400,
cmap='viridis',
title='% Black')
img1 + img2
cols = ["perc_5plus", "percent_black"]
merged[cols].hvplot.scatter(x=cols[0], y=cols[1])
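A quick numeric summary can complement the scatter plot; for example, the correlation between the two columns (a sketch; drop the missing lead values first).
# Pearson correlation between lead levels and percent Black,
# ignoring tracts where the lead data was redacted
print(merged[cols].dropna().corr())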
In the previous plots, it's still hard to see the relationship. Use the kdeplot() function in seaborn to better visualize the relationship.
You will need to remove any NaN entries first.
You should see two peaks in the distribution clearly now!
fig, ax = plt.subplots(figsize=(8,6))
X = merged.dropna()
sns.kdeplot(x=X['perc_5plus'], y=X['percent_black'], ax=ax);