Week 4B: Working with Raster Data¶

Sep 28, 2022

Housekeeping¶

  • Homework #2 due on Monday!
    • Part 1: Working with eviction data in Philadelphia
    • Part 2: Exploring the NDVI in Philadelphia
  • Come to office hours!
    • Saturday morning (Nick) 10am-12pm
    • Tuesday/Thursday (Kristin) 11am-12pm
    • Sign up for time slots on Canvas calendar

Follow along in lecture on Binder!¶

All lecture slides will work on Binder, data is available to be loaded, etc. Binder has an exact copy of the file structure in the GitHub repo.

https://github.com/MUSA-550-Fall-2022/week-4

Screen%20Shot%202020-09-23%20at%209.49.02%20PM.png

Last time: A set of coordinated visualization libraries in Python¶

hvplot¶

  • Quickly generate interactive plots from your data
  • Seamlessly handles pandas and geopandas data
  • Relies on Holoviews and Geoviews under the hood
  • Fancy, magic widgets for grouping charts by a third column
  • Great built-in interaction toolbar powered by bokeh: box select, lasso, pan, zoom, etc...

An interface just like the pandas plot() function, but much more useful.

In [1]:
# Our usual imports
import pandas as pd
import geopandas as gpd
import numpy as np
import altair as alt
from matplotlib import pyplot as plt
In [2]:
# This will add the .hvplot() function to your DataFrame!
import hvplot.pandas

# Import holoviews too
import holoviews as hv

Last time: hvplot has great support for geopandas¶

Example: Interactive choropleth of median property assessments in Philadelphia

In [3]:
# Load the data
url = "https://raw.githubusercontent.com/MUSA-550-Fall-2021/week-3/main/data/opa_residential.csv"
data = pd.read_csv(url)

# Create the Point() objects
data['geometry'] = gpd.points_from_xy(data['lng'], data['lat'])

# Create the GeoDataFrame
data = gpd.GeoDataFrame(data, geometry='geometry', crs="EPSG:4326")
In [4]:
# load the Zillow data from GitHub
url = "https://raw.githubusercontent.com/MUSA-550-Fall-2021/week-3/main/data/zillow_neighborhoods.geojson"
zillow = gpd.read_file(url)
In [5]:
# Important: Make sure the CRS match
data = data.to_crs(zillow.crs)

# perform the spatial join
data = gpd.sjoin(data, zillow, predicate='within', how='left')
In [6]:
# Calculate the median market value per Zillow neighborhood
median_values = data.groupby("ZillowName", as_index=False)["market_value"].median()

# Merge median values with the Zillow geometries
median_values = zillow.merge(median_values, on="ZillowName")
print(type(median_values))
<class 'geopandas.geodataframe.GeoDataFrame'>
In [7]:
median_values.head()
Out[7]:
ZillowName geometry market_value
0 Academy Gardens POLYGON ((-74.99851 40.06435, -74.99456 40.061... 185950.0
1 Allegheny West POLYGON ((-75.16592 40.00327, -75.16596 40.003... 34750.0
2 Andorra POLYGON ((-75.22463 40.06686, -75.22588 40.065... 251900.0
3 Aston Woodbridge POLYGON ((-75.00860 40.05369, -75.00861 40.053... 183800.0
4 Bartram Village POLYGON ((-75.20733 39.93350, -75.20733 39.933... 48300.0

Take advantage of GeoViews¶

Let's add a tile source underneath the choropleth map

In [8]:
import geoviews as gv
import geoviews.tile_sources as gvts
In [9]:
%%opts WMTS [width=800, height=800, xaxis=None, yaxis=None]

choro = median_values.hvplot(c='market_value', 
                             width=500, 
                             height=400, 
                             alpha=0.5, 
                             geo=True,
                             cmap='viridis', 
                             hover_cols=['ZillowName'])
gvts.ESRI * choro
Out[9]:

hvplot also works with raster data¶

  • So far we've been working mainly with vector data using geopandas: lines, points, polygons
  • The basemaps we've been adding are an exampe of raster data
  • Raster data:
    • Gridded or pixelated
    • Maps easily to an array — it's the representation for digital images

Continuous raster examples¶

  • Multispectral satellite imagery, e.g., the Landsat satellite
  • Digital Elevation Models (DEMs)
  • Maps of canopy height derived from LiDAR data.

Categorical raster examples¶

  • Land cover or land-use maps.
  • Tree height maps classified as short, medium, and tall trees.
  • Snowcover masks (binary snow or no snow)

Important attributes of raster data¶

1. The coordinate reference system¶

0637aa2541b31f526ad44f7cb2db7b6c.jpg

2. Resolution¶

The spatial distance that a single pixel covers on the ground.

Screen%20Shot%202020-09-20%20at%208.18.36%20PM.png

3. Extent¶

The bounding box that covers the entire extent of the raster data.

4. Affine transformation¶

Pixel space --> real space

  • A mapping from pixel space (rows and columns of the image) to the x and y coordinates defined by the CRS
  • Typically a six parameter matrix defining the origin, pixel resolution, and rotation of the raster image

Multi-band raster data¶

Each band measures the light reflected from different parts of the electromagnetic spectrum.

Color images are multi-band rasters!¶

The raster format: GeoTIFF¶

  • A standard .tif image format with additional spatial information embedded in the file, which includes metadata for:
    • Geotransform, e.g., extent, resolution
    • Coordinate Reference System (CRS)
    • Values that represent missing data (NoDataValue)

Tools for raster data¶

  • Lots of different options: really just working with arrays
  • One of the first options: Geospatial Data Abstraction Library (GDAL)
    • Low level and not really user-friendly
  • A more recent, much more Pythonic package: rasterio

We'll use rasterio for the majority of our raster analysis

Getting started with rasterio¶

Analysis is built around the "open()" command which returns a "dataset" or "file handle"

In [10]:
import rasterio as rio
In [11]:
# Open the file and get a file "handle"
landsat = rio.open("./data/landsat8_philly.tif")
landsat
Out[11]:
<open DatasetReader name='./data/landsat8_philly.tif' mode='r'>

Let's check out the metadata¶

In [12]:
# The CRS
landsat.crs
Out[12]:
CRS.from_epsg(32618)
In [13]:
# The bounds
landsat.bounds
Out[13]:
BoundingBox(left=476064.3596176505, bottom=4413096.927074196, right=503754.3596176505, top=4443066.927074196)
In [14]:
# The number of bands available
landsat.count
Out[14]:
10
In [15]:
# The band numbers that are available
landsat.indexes
Out[15]:
(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
In [16]:
# Number of pixels in the x and y directions
landsat.shape
Out[16]:
(999, 923)
In [17]:
# The 6 parameters that map from pixel to real space
landsat.transform
Out[17]:
Affine(30.0, 0.0, 476064.3596176505,
       0.0, -30.0, 4443066.927074196)
In [18]:
# All of the meta data
landsat.meta
Out[18]:
{'driver': 'GTiff',
 'dtype': 'uint16',
 'nodata': None,
 'width': 923,
 'height': 999,
 'count': 10,
 'crs': CRS.from_epsg(32618),
 'transform': Affine(30.0, 0.0, 476064.3596176505,
        0.0, -30.0, 4443066.927074196)}

Reading data from a file¶

Tell the file which band to read, starting with 1

In [19]:
# This is just a numpy array
data = landsat.read(1)
data
Out[19]:
array([[10901, 10618, 10751, ..., 12145, 11540, 14954],
       [11602, 10718, 10546, ..., 11872, 11982, 12888],
       [10975, 10384, 10357, ..., 11544, 12318, 12456],
       ...,
       [12281, 12117, 12072, ..., 11412, 11724, 11088],
       [12747, 11866, 11587, ..., 11558, 12028, 10605],
       [11791, 11677, 10656, ..., 10615, 11557, 11137]], dtype=uint16)

We can plot it with matplotlib's imshow

In [20]:
fig, ax = plt.subplots(figsize=(10,10))

img = ax.imshow(data)

plt.colorbar(img)
Out[20]:
<matplotlib.colorbar.Colorbar at 0x294c0eaf0>

Two improvements¶

  • A log-scale colorbar
  • Add the axes extent

Adding the axes extent¶

Note that imshow needs a bounding box of the form: (left, right, bottom, top).

See the documentation for imshow if you forget the right ordering...(I often do!)

Using a log-scale color bar¶

Matplotlib has a built in log normalization...

In [21]:
import matplotlib.colors as mcolors
In [22]:
# Initialize
fig, ax = plt.subplots(figsize=(10, 10))

# Plot the image
img = ax.imshow(
    data,
    norm=mcolors.LogNorm(),  # Use a log colorbar scale
    extent=[  # Set the extent of the images
        landsat.bounds.left,
        landsat.bounds.right,
        landsat.bounds.bottom,
        landsat.bounds.top,
    ],
)

# Add a colorbar
plt.colorbar(img)
Out[22]:
<matplotlib.colorbar.Colorbar at 0x29aefc3d0>

Overlaying vector geometries on raster plots¶

Two requirements:

  1. The CRS of the two datasets must match
  2. The extent of the imshow plot must be set properly

Let's add the city limits¶

In [23]:
city_limits = gpd.read_file("./data/City_Limits.geojson")
In [24]:
# Convert to the correct CRS!
print(landsat.crs.data)
{'init': 'epsg:32618'}
In [27]:
city_limits = city_limits.to_crs(landsat.crs.data['init'])
In [28]:
# Initialize
fig, ax = plt.subplots(figsize=(10, 10))

# The extent of the data
landsat_extent = [
    landsat.bounds.left,
    landsat.bounds.right,
    landsat.bounds.bottom,
    landsat.bounds.top,
]

# Plot!
img = ax.imshow(data, 
                norm=mcolors.LogNorm(), #NEW
                extent=landsat_extent)  #NEW

# Add the city limits
city_limits.plot(ax=ax, facecolor="none", edgecolor="white")

# Add a colorbar and turn off axis lines
plt.colorbar(img)
ax.set_axis_off()

What band did we plot??¶

Landsat 8 has 11 bands spanning the electromagnetic spectrum from visible to infrared.

How about a typical RGB color image?¶

First let's read the red, blue, and green bands (bands 4, 3, 2):

Note¶

The .indexes attributes tells us the bands available to be read.

Important: they are not zero-indexed!

In [29]:
landsat.indexes
Out[29]:
(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
In [30]:
rgb_data = landsat.read([4, 3, 2])
In [31]:
rgb_data.shape 
Out[31]:
(3, 999, 923)

Note the data has the shape: (bands, height, width)

In [32]:
data[0]
Out[32]:
array([10901, 10618, 10751, 11045, 10914, 10400, 10585, 10438, 10225,
       10297, 10523, 10944, 10507, 10505, 10686, 11082, 11852, 11455,
       11293, 10536, 10315, 10391, 10472, 10366, 10341, 11422, 10582,
       11905, 12525, 12748, 13027, 13115, 12969, 12844, 12975, 11837,
       10618, 11009, 10945, 10653, 10519, 10594, 10546, 10580, 10768,
       10943, 10769, 10533, 10559, 10725, 10774, 11520, 11922, 11483,
       10478, 10390, 10283, 10345, 10280, 10343, 10536, 10332, 10422,
       10309, 10377, 10517, 10225, 10468, 10816, 10553, 10419, 10394,
       10619, 10734, 10841, 10944, 11099, 11107, 10857, 10738, 10814,
       10658, 10735, 10842, 10765, 10587, 10785, 10797, 11099, 11141,
       10923, 11064, 10626, 10355, 10499, 10530, 10641, 10466, 10497,
       10480, 11392, 11669, 10928, 10927, 10524, 10499, 10337, 10374,
       10670, 10630, 10371, 10211, 10268, 10275, 10296, 10244, 10216,
       10240, 10598, 10427, 10355, 10412, 10417, 10410, 10272, 10268,
       10373, 10320, 10238, 10390, 10285, 10441, 10274, 10328, 10419,
       10526, 10568, 10464, 10513, 10656, 10617, 10351, 10395, 10608,
       10384, 10158, 10144, 10170, 10184, 10214, 10226, 10178, 10166,
       10224, 10285, 10357, 10377, 10173, 10175, 10151, 10144, 10138,
       10120, 10139, 10148, 10131, 10339, 10826, 10565, 10357, 10344,
       10224, 10196, 10381, 10434, 10295, 10294, 10245, 10245, 10227,
       10266, 11380, 12201, 11826, 14048, 11851, 10465, 10490, 10680,
       10332, 10920, 11247, 11096, 11168, 11633, 12129, 11740, 11647,
       11071, 10593, 10863, 10995, 11153, 11285, 10707, 10613, 11054,
       10860, 11024, 11062, 10598, 10520, 10501, 10435, 10492, 10691,
       10690, 10337, 10413, 10753, 11031, 12790, 12313, 11258, 11157,
       11617, 11518, 10959, 11121, 12540, 13398, 12544, 12900, 12122,
       11852, 11487, 10815, 11078, 11080, 11373, 14688, 15051, 11360,
       12853, 13111, 11095, 10621, 11538, 12232, 12394, 12411, 11435,
       11248, 11503, 11601, 10818, 11365, 10831, 10338, 10265, 10229,
       10233, 10372, 10876, 11282, 10837, 10484, 10976, 11477, 11810,
       11431, 10596, 10601, 10609, 10871, 10937, 10688, 10941, 10942,
       11205, 11606, 11220, 10367, 10295, 10316, 10325, 10317, 10338,
       11053, 11736, 11419, 11117, 11046, 11674, 11122, 11031, 10934,
       10930, 14929, 19892, 22118, 14195, 10473, 10668, 11283, 12265,
       12750, 12047, 11410, 10690, 10415, 10260, 10770, 12274, 12017,
       11316, 10937, 10722, 11071, 11234, 11409, 11317, 10713, 10515,
       10714, 11232, 11636, 11193, 11256, 11583, 11801, 11127, 10382,
       10316, 10480, 10633, 10794, 11020, 10756, 10679, 10801, 10829,
       10683, 10507, 10626, 10829, 10675, 10453, 10711, 10848, 10814,
       10501, 10466, 10726, 10598, 10565, 10555, 10427, 10323, 10390,
       10764, 10890, 10664, 10299, 10697, 11106, 11009, 10642, 10404,
       10733, 11026, 10783, 11156, 11066, 10854, 10414, 10360, 10562,
       10605, 10585, 10608, 10930, 10899, 10924, 11076, 11184, 11018,
       10667, 10445, 10472, 10662, 10745, 11564, 11095, 11351, 11241,
       10279, 10345, 10539, 10626, 10799, 10999, 10640, 11097, 10740,
       10487, 11287, 11105, 10495, 10614, 11057, 11144, 11021, 10803,
       10290, 10494, 10774, 11018, 10940, 10774, 10954, 10839, 10784,
       11072, 10563, 10224, 10161, 10178, 10199, 10324, 10688, 10749,
       10754, 11253, 11265, 10892, 10752, 10863, 10785, 11019, 11041,
       10844, 10881, 11077, 11389, 11226, 11405, 11563, 11377, 11397,
       11704, 11914, 11809, 11385, 11449, 11873, 11512, 11276, 11122,
       11080, 11130, 10972, 10773, 10762, 11051, 10554, 10605, 11073,
       11002, 11458, 10962, 10653, 10445, 10756, 11950, 12005, 12624,
       10454, 10858, 10916, 11163, 11186, 10916, 10746, 10657, 11282,
       10899, 10827, 10661, 11014, 10856, 11043, 10991, 11017, 10957,
       10973, 10831, 11039, 11090, 11315, 11236, 11504, 11089, 10873,
       10625, 10499, 10711, 10529, 10668, 10538, 10763, 10672, 10649,
       10514, 10237, 10534, 10625, 10516, 10404, 10197, 10247, 10312,
       10262, 10708, 10640, 10462, 10275, 10139, 10136, 10319, 10662,
       10349, 10160, 10202, 10183, 10211, 10189, 10179, 10275, 10414,
       10313, 10302, 10963, 10346, 10269, 10348, 10569, 10311, 10235,
       10366, 10292, 10267, 10305, 10265, 10308, 10356, 10349, 10354,
       10565, 10559, 10466, 10457, 10410, 10248, 10403, 10502, 10405,
       10359, 10537, 10510, 10465, 10512, 10629, 10654, 10569, 10513,
       10526, 10567, 10599, 10368, 10295, 10270, 10204, 10164, 10170,
       10189, 10155, 10210, 10210, 10171, 10163, 10162, 10205, 10259,
       10301, 10239, 10211, 10208, 10251, 10377, 10393, 10424, 10251,
       10427, 10560, 10485, 10337, 10533, 10546, 10556, 10317, 10181,
       10193, 10208, 10157, 10166, 10122, 10139, 10393, 10773, 11205,
       10496, 10629, 10624, 10779, 11037, 10398, 10568, 10289, 10547,
       10374, 10376, 10363, 10350, 11107, 12838, 10384, 10256, 10350,
       10797, 10451, 10289, 10231, 10221, 10157, 10196, 10286, 10479,
       10253, 10389, 10573, 10774, 10577, 10431, 10454, 10328, 10351,
       10594, 10821, 10539, 10298, 10177, 10268, 10342, 10268, 10385,
       10398, 10560, 10431, 10524, 10578, 10350, 10286, 10501, 10750,
       10512, 10205, 10113, 10475, 10766, 10332, 10167, 10424, 10700,
       10823, 10789, 10682, 10773, 10921, 10717, 10966, 10973, 10208,
       10161, 10250, 10099, 10216, 10700, 10760, 10650, 10559, 10589,
       10948, 11812, 10747, 10543, 10364, 10674, 10950, 11036, 11536,
       11452, 13447, 13820, 17711, 13277, 11019, 10506, 10354, 10127,
       10125, 10193, 10555, 11036, 10944, 11145, 10995, 11105, 10973,
       10449, 10764, 11036, 10857, 10630, 10569, 10607, 10989, 11250,
       10827, 10851, 10835, 10863, 10545, 10460, 10695, 10666, 10684,
       10699, 10744, 10438, 10226, 10618, 10878, 11049, 11254, 11069,
       10650, 10376, 10362, 10851, 10916, 10459, 10334, 10279, 10736,
       10804, 10799, 10264, 10288, 10908, 10516, 10716, 10575, 10710,
       12654, 11934, 11812, 11295, 10696, 10750, 10575, 10333, 10543,
       10412, 10387, 10697, 10576, 10910, 10757, 10627, 11230, 10656,
       10630, 10835, 10694, 11474, 10532, 10543, 10798, 10761, 11070,
       11062, 10489, 10424, 10299, 10309, 10238, 10334, 10424, 10429,
       10663, 10990, 11452, 11767, 11454, 10738, 10349, 10301, 10514,
       10516, 10454, 10637, 10704, 10527, 10683, 10517, 10481, 10411,
       10457, 10355, 10326, 10372, 10432, 11632, 11776, 13655, 11890,
       11715, 12084, 11566, 10977, 11113, 10649, 10418, 10668, 10609,
       11132, 10786, 10746, 11106, 10933, 10820, 10580, 10579, 10561,
       11213, 10865, 10998, 11195, 10652, 10469, 10480, 10892, 11531,
       11353, 11092, 10814, 10712, 10737, 11380, 12418, 12875, 12696,
       12197, 12504, 12697, 12140, 10702, 10473, 11418, 11788, 10925,
       10692, 10828, 10677, 10632, 10661, 10674, 10706, 10751, 11115,
       11318, 11490, 11837, 12182, 12017, 11662, 11181, 10755, 10611,
       11391, 11743, 13173, 11971, 11176, 11425, 11677, 11290, 10819,
       10839, 10928, 10923, 10962, 11350, 11170, 11306, 11292, 11592,
       11656, 11758, 12145, 11540, 14954], dtype=uint16)
In [33]:
fig, axs = plt.subplots(nrows=1, ncols=3, figsize=(12, 6))

cmaps = ["Reds", "Greens", "Blues"]
for i in [0, 1, 2]:

    # This subplot axes
    ax = axs[i]

    # Plot
    ax.imshow(rgb_data[i], norm=mcolors.LogNorm(), extent=landsat_extent, cmap=cmaps[i])
    city_limits.plot(ax=ax, facecolor="none", edgecolor="black")

    # Format
    ax.set_axis_off()
    ax.set_title(cmaps[i][:-1])

Side note: using subplots in matplotlib¶

You can specify the subplot layout via the nrows and ncols keywords to plt.subplots. If you have more than one row or column, the function returns:

  • The figure
  • The list of axes.
In [34]:
fig, axs = plt.subplots(nrows=1, ncols=3)

Note¶

When nrows or ncols > 1, I usually name the returned axes variable "axs" vs. the usual case of just "ax". It's useful for remembering we got more than one Axes object back!

In [35]:
# This has length 3 for each of the 3 columns!
axs
Out[35]:
array([<AxesSubplot:>, <AxesSubplot:>, <AxesSubplot:>], dtype=object)
In [36]:
# Left axes
axs[0]  
Out[36]:
<AxesSubplot:>
In [37]:
# Middle axes
axs[1]
Out[37]:
<AxesSubplot:>
In [38]:
# Right axes
axs[2]
Out[38]:
<AxesSubplot:>

Use earthpy to plot the combined color image¶

A useful utility package to perform the proper re-scaling of pixel values

In [39]:
import earthpy.plot as ep
In [40]:
# Initialize
fig, ax = plt.subplots(figsize=(10,10))

# Plot the RGB bands
ep.plot_rgb(rgb_data, rgb=(0, 1, 2), ax=ax);

We can "stretch" the data to improve the brightness¶

In [41]:
# Initialize
fig, ax = plt.subplots(figsize=(10,10))

# Plot the RGB bands
ep.plot_rgb(rgb_data, rgb=(0, 1, 2), ax=ax, stretch=True);

Conclusion: Working with multi-band data is a bit messy¶

Can we do better?

Yes! HoloViz to the rescue...¶

xarray¶

  • A fancier version of numpy arrays
  • Designed for gridded data with multiple dimensions
  • For raster data, those dimensions are: bands, latitude, longitude, pixel values

Let's try it out...¶

In [42]:
import xarray as xr
import rioxarray # Adds rasterio engine extension to xarray
In [43]:
ds = xr.open_dataset("./data/landsat8_philly.tif", engine="rasterio")

ds
Out[43]:
<xarray.Dataset>
Dimensions:      (band: 10, x: 923, y: 999)
Coordinates:
  * band         (band) int64 1 2 3 4 5 6 7 8 9 10
  * x            (x) float64 4.761e+05 4.761e+05 ... 5.037e+05 5.037e+05
  * y            (y) float64 4.443e+06 4.443e+06 ... 4.413e+06 4.413e+06
    spatial_ref  int64 0
Data variables:
    band_data    (band, y, x) float32 ...
xarray.Dataset
    • band: 10
    • x: 923
    • y: 999
    • band
      (band)
      int64
      1 2 3 4 5 6 7 8 9 10
      array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])
    • x
      (x)
      float64
      4.761e+05 4.761e+05 ... 5.037e+05
      array([476079.359618, 476109.359618, 476139.359618, ..., 503679.359618,
             503709.359618, 503739.359618])
    • y
      (y)
      float64
      4.443e+06 4.443e+06 ... 4.413e+06
      array([4443051.927074, 4443021.927074, 4442991.927074, ..., 4413171.927074,
             4413141.927074, 4413111.927074])
    • spatial_ref
      ()
      int64
      ...
      crs_wkt :
      PROJCS["WGS 84 / UTM zone 18N",GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",0],PARAMETER["central_meridian",-75],PARAMETER["scale_factor",0.9996],PARAMETER["false_easting",500000],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH],AUTHORITY["EPSG","32618"]]
      semi_major_axis :
      6378137.0
      semi_minor_axis :
      6356752.314245179
      inverse_flattening :
      298.257223563
      reference_ellipsoid_name :
      WGS 84
      longitude_of_prime_meridian :
      0.0
      prime_meridian_name :
      Greenwich
      geographic_crs_name :
      WGS 84
      horizontal_datum_name :
      World Geodetic System 1984
      projected_crs_name :
      WGS 84 / UTM zone 18N
      grid_mapping_name :
      transverse_mercator
      latitude_of_projection_origin :
      0.0
      longitude_of_central_meridian :
      -75.0
      false_easting :
      500000.0
      false_northing :
      0.0
      scale_factor_at_central_meridian :
      0.9996
      spatial_ref :
      PROJCS["WGS 84 / UTM zone 18N",GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",0],PARAMETER["central_meridian",-75],PARAMETER["scale_factor",0.9996],PARAMETER["false_easting",500000],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH],AUTHORITY["EPSG","32618"]]
      GeoTransform :
      476064.3596176505 30.0 0.0 4443066.927074196 0.0 -30.0
      array(0)
    • band_data
      (band, y, x)
      float32
      ...
      [9220770 values with dtype=float32]
In [44]:
type(ds)
Out[44]:
xarray.core.dataset.Dataset

Now the magic begins: more hvplot¶

Still experimental, but very promising — interactive plots with widgets for each band!

In [45]:
import hvplot.xarray
In [46]:
img = ds.hvplot.image(width=700, height=500, cmap='viridis')
img
Out[46]:

More magic widgets!¶

Two key raster concepts¶

  • Using vector + raster data
    • Cropping or masking raster data based on vector polygons
  • Raster math and zonal statistics
    • Combining multiple raster data sets
    • Calculating statistics within certain vector polygons

Cropping: let's trim to just the city limits¶

Use rasterio's builtin mask function:

In [47]:
from rasterio.mask import mask
In [48]:
landsat
Out[48]:
<open DatasetReader name='./data/landsat8_philly.tif' mode='r'>
In [49]:
masked, mask_transform = mask(
    dataset=landsat,
    shapes=city_limits.geometry,
    crop=True,  # remove pixels not within boundary
    all_touched=True,  # get all pixels that touch the boudnary
    filled=False,  # do not fill cropped pixels with a default value
)
In [50]:
masked
Out[50]:
masked_array(
  data=[[[--, --, --, ..., --, --, --],
         [--, --, --, ..., --, --, --],
         [--, --, --, ..., --, --, --],
         ...,
         [--, --, --, ..., --, --, --],
         [--, --, --, ..., --, --, --],
         [--, --, --, ..., --, --, --]],

        [[--, --, --, ..., --, --, --],
         [--, --, --, ..., --, --, --],
         [--, --, --, ..., --, --, --],
         ...,
         [--, --, --, ..., --, --, --],
         [--, --, --, ..., --, --, --],
         [--, --, --, ..., --, --, --]],

        [[--, --, --, ..., --, --, --],
         [--, --, --, ..., --, --, --],
         [--, --, --, ..., --, --, --],
         ...,
         [--, --, --, ..., --, --, --],
         [--, --, --, ..., --, --, --],
         [--, --, --, ..., --, --, --]],

        ...,

        [[--, --, --, ..., --, --, --],
         [--, --, --, ..., --, --, --],
         [--, --, --, ..., --, --, --],
         ...,
         [--, --, --, ..., --, --, --],
         [--, --, --, ..., --, --, --],
         [--, --, --, ..., --, --, --]],

        [[--, --, --, ..., --, --, --],
         [--, --, --, ..., --, --, --],
         [--, --, --, ..., --, --, --],
         ...,
         [--, --, --, ..., --, --, --],
         [--, --, --, ..., --, --, --],
         [--, --, --, ..., --, --, --]],

        [[--, --, --, ..., --, --, --],
         [--, --, --, ..., --, --, --],
         [--, --, --, ..., --, --, --],
         ...,
         [--, --, --, ..., --, --, --],
         [--, --, --, ..., --, --, --],
         [--, --, --, ..., --, --, --]]],
  mask=[[[ True,  True,  True, ...,  True,  True,  True],
         [ True,  True,  True, ...,  True,  True,  True],
         [ True,  True,  True, ...,  True,  True,  True],
         ...,
         [ True,  True,  True, ...,  True,  True,  True],
         [ True,  True,  True, ...,  True,  True,  True],
         [ True,  True,  True, ...,  True,  True,  True]],

        [[ True,  True,  True, ...,  True,  True,  True],
         [ True,  True,  True, ...,  True,  True,  True],
         [ True,  True,  True, ...,  True,  True,  True],
         ...,
         [ True,  True,  True, ...,  True,  True,  True],
         [ True,  True,  True, ...,  True,  True,  True],
         [ True,  True,  True, ...,  True,  True,  True]],

        [[ True,  True,  True, ...,  True,  True,  True],
         [ True,  True,  True, ...,  True,  True,  True],
         [ True,  True,  True, ...,  True,  True,  True],
         ...,
         [ True,  True,  True, ...,  True,  True,  True],
         [ True,  True,  True, ...,  True,  True,  True],
         [ True,  True,  True, ...,  True,  True,  True]],

        ...,

        [[ True,  True,  True, ...,  True,  True,  True],
         [ True,  True,  True, ...,  True,  True,  True],
         [ True,  True,  True, ...,  True,  True,  True],
         ...,
         [ True,  True,  True, ...,  True,  True,  True],
         [ True,  True,  True, ...,  True,  True,  True],
         [ True,  True,  True, ...,  True,  True,  True]],

        [[ True,  True,  True, ...,  True,  True,  True],
         [ True,  True,  True, ...,  True,  True,  True],
         [ True,  True,  True, ...,  True,  True,  True],
         ...,
         [ True,  True,  True, ...,  True,  True,  True],
         [ True,  True,  True, ...,  True,  True,  True],
         [ True,  True,  True, ...,  True,  True,  True]],

        [[ True,  True,  True, ...,  True,  True,  True],
         [ True,  True,  True, ...,  True,  True,  True],
         [ True,  True,  True, ...,  True,  True,  True],
         ...,
         [ True,  True,  True, ...,  True,  True,  True],
         [ True,  True,  True, ...,  True,  True,  True],
         [ True,  True,  True, ...,  True,  True,  True]]],
  fill_value=999999,
  dtype=uint16)
In [51]:
masked.shape
Out[51]:
(10, 999, 923)
In [52]:
# Initialize
fig, ax = plt.subplots(figsize=(10, 10))

# Plot the first band
ax.imshow(masked[0], cmap="viridis", extent=landsat_extent)

# Format and add the city limits
city_limits.boundary.plot(ax=ax, color="gray", linewidth=4)
ax.set_axis_off()

Writing GeoTIFF files¶

Let's save our cropped raster data. To save raster data, we need both the array of the data and the proper meta-data.

In [53]:
out_meta = landsat.meta
out_meta.update(
    {"height": masked.shape[1], "width": masked.shape[2], "transform": mask_transform}
)
print(out_meta)

# write small image to local Geotiff file
with rio.open("data/cropped_landsat.tif", "w", **out_meta) as dst:
    dst.write(masked)
    

# Can't access
{'driver': 'GTiff', 'dtype': 'uint16', 'nodata': None, 'width': 923, 'height': 999, 'count': 10, 'crs': CRS.from_epsg(32618), 'transform': Affine(30.0, 0.0, 476064.3596176505,
       0.0, -30.0, 4443066.927074196)}
In [54]:
landsat
Out[54]:
<open DatasetReader name='./data/landsat8_philly.tif' mode='r'>

Note: we used a context manager above (the "with" statement) to handle the opening and subsequent closing of our new file.

For more info, see this tutorial.

Now let's do some raster math¶

Often called "map algebra"...

The Normalized Difference Vegetation Index¶

  • A normalized index of greenness ranging from -1 to 1, calculated from the red and NIR bands.
  • Provides a measure of the amount of live green vegetation in an area

Formula:

NDVI = (NIR - Red) / (NIR + Red)

Healthy vegetation reflects NIR and absorbs visible, so the NDVI for green vegatation

Get the data for the red and NIR bands¶

NIR is band 5 and red is band 4

In [55]:
masked.shape
Out[55]:
(10, 999, 923)
In [58]:
# Note that the indexing here is zero-based, e.g., band 1 is index 0
red = masked[3]
nir = masked[4]
In [59]:
# Where mask = True, pixels are excluded
red.mask
Out[59]:
array([[ True,  True,  True, ...,  True,  True,  True],
       [ True,  True,  True, ...,  True,  True,  True],
       [ True,  True,  True, ...,  True,  True,  True],
       ...,
       [ True,  True,  True, ...,  True,  True,  True],
       [ True,  True,  True, ...,  True,  True,  True],
       [ True,  True,  True, ...,  True,  True,  True]])
In [60]:
def calculate_NDVI(nir, red):
    """
    Calculate the NDVI from the NIR and red landsat bands
    """
    
    # Convert to floats
    nir = nir.astype(float)
    red = red.astype(float)
    
    # Get valid entries
    check = np.logical_and( red.mask == False, nir.mask == False )
    
    # Where the check is True, return the NDVI, else return NaN
    ndvi = np.where(check,  (nir - red ) / ( nir + red ), np.nan )
    return ndvi 
In [61]:
NDVI = calculate_NDVI(nir, red)
In [62]:
fig, ax = plt.subplots(figsize=(10,10))

# Plot NDVI
img = ax.imshow(NDVI, extent=landsat_extent)

# Format and plot city limits
city_limits.plot(ax=ax, edgecolor='gray', facecolor='none', linewidth=4)
plt.colorbar(img)
ax.set_axis_off()
ax.set_title("NDVI in Philadelphia", fontsize=18);

Let's overlay Philly's parks¶

In [63]:
# Read in the parks dataset
parks = gpd.read_file('./data/parks.geojson')
In [64]:
# Print out the CRS
parks.crs
Out[64]:
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
In [65]:
landsat.crs
Out[65]:
CRS.from_epsg(32618)
In [67]:
landsat.crs.data['init']
Out[67]:
'epsg:32618'
In [68]:
# Conver to landsat CRS
parks = parks.to_crs(landsat.crs.data['init'])

#parks = parks.to_crs(epsg=32618)
In [69]:
fig, ax = plt.subplots(figsize=(10,10))

# Plot NDVI
img = ax.imshow(NDVI, extent=landsat_extent)

# NEW: add the parks
parks.plot(ax=ax, edgecolor='crimson', facecolor='none', linewidth=2)

# Format and plot city limits
city_limits.plot(ax=ax, edgecolor='gray', facecolor='none', linewidth=4)
plt.colorbar(img)
ax.set_axis_off()
ax.set_title("NDVI vs. Parks in Philadelphia", fontsize=18);

It looks like it worked pretty well!

How about calculating the median NDVI within the park geometries?¶

This is called zonal statistics. We can use the rasterstats package to do this...

In [70]:
from rasterstats import zonal_stats
In [71]:
stats = zonal_stats(parks, NDVI, affine=landsat.transform, stats=['mean', 'median'], nodata=np.nan)

The zonal_stats function¶

Zonal statistics of raster values aggregated to vector geometries.

  • First argument: vector data
  • Second argument: raster data to aggregated
  • Also need to pass the affine transform and the names of the stats to compute
In [72]:
stats
Out[72]:
[{'mean': 0.5051087300747117, 'median': 0.521160491292894},
 {'mean': 0.441677258928841, 'median': 0.47791901454593105},
 {'mean': 0.4885958179164529, 'median': 0.5084981442485412},
 {'mean': 0.3756264171005127, 'median': 0.42356277391080177},
 {'mean': 0.45897667566035816, 'median': 0.49204860985900256},
 {'mean': 0.4220467885671325, 'median': 0.46829433706341134},
 {'mean': 0.38363107808041097, 'median': 0.416380868090128},
 {'mean': 0.4330996771737791, 'median': 0.4772304906066629},
 {'mean': 0.45350534061220404, 'median': 0.5015001304461257},
 {'mean': 0.44505112880627273, 'median': 0.4701653486700216},
 {'mean': 0.5017048095513129, 'median': 0.5108644957689836},
 {'mean': 0.23745576631277313, 'median': 0.24259729272419628},
 {'mean': 0.31577818701756977, 'median': 0.3306512446567765},
 {'mean': 0.5300954792288528, 'median': 0.5286783042394015},
 {'mean': 0.4138823685577781, 'median': 0.4604562737642586},
 {'mean': 0.48518030764354636, 'median': 0.5319950233699855},
 {'mean': 0.3154294503884359, 'median': 0.33626983970826585},
 {'mean': 0.4673660755293326, 'median': 0.49301485937514306},
 {'mean': 0.38810332525412405, 'median': 0.4133657898874572},
 {'mean': 0.4408030558751575, 'median': 0.45596817840957254},
 {'mean': 0.5025746121592748, 'median': 0.5077718684805924},
 {'mean': 0.3694178483720184, 'median': 0.3760657145169406},
 {'mean': 0.501111461720193, 'median': 0.5050854708805356},
 {'mean': 0.5262850765386248, 'median': 0.5356888168557536},
 {'mean': 0.48653727784657114, 'median': 0.5100931149323727},
 {'mean': 0.4687375187461541, 'median': 0.49930407348983946},
 {'mean': 0.4708292376095287, 'median': 0.48082651365503404},
 {'mean': 0.4773326667398458, 'median': 0.5156144807640011},
 {'mean': 0.4787564298905878, 'median': 0.49155885289073287},
 {'mean': 0.44870821225970553, 'median': 0.4532736333178444},
 {'mean': 0.28722653079042143, 'median': 0.2989290105395561},
 {'mean': 0.5168607841969154, 'median': 0.5325215878735168},
 {'mean': 0.4725611895576703, 'median': 0.49921839394549977},
 {'mean': 0.46766118115625893, 'median': 0.49758167570710404},
 {'mean': 0.4886459911745514, 'median': 0.5020622294662287},
 {'mean': 0.46017276593316403, 'median': 0.4758438120450033},
 {'mean': 0.4536300550044786, 'median': 0.49233559560028084},
 {'mean': 0.4239182074098442, 'median': 0.4602468966932257},
 {'mean': 0.31202502605597016, 'median': 0.32572928232562837},
 {'mean': 0.2912164528160494, 'median': 0.2946940937582767},
 {'mean': 0.3967233668480246, 'median': 0.3967233668480246},
 {'mean': 0.3226210938878899, 'median': 0.35272291143510365},
 {'mean': 0.4692543355097313, 'median': 0.502843843019927},
 {'mean': 0.2094971073454244, 'median': 0.2736635627619322},
 {'mean': 0.4429390753150177, 'median': 0.49684622604615747},
 {'mean': 0.532656111900622, 'median': 0.5346870838881491},
 {'mean': 0.5053390017291615, 'median': 0.5413888437144251},
 {'mean': 0.500764874094713, 'median': 0.5121577370584547},
 {'mean': 0.4213318206046151, 'median': 0.46379308008197284},
 {'mean': 0.43340892620907207, 'median': 0.4818377611583778},
 {'mean': 0.5069321976848545, 'median': 0.5160336741725328},
 {'mean': 0.4410269675104542, 'median': 0.4870090058426777},
 {'mean': 0.4923323198113241, 'median': 0.5212096391398666},
 {'mean': 0.45403610108483955, 'median': 0.4947035355619755},
 {'mean': 0.2771909854304859, 'median': 0.2843133053564756},
 {'mean': 0.48502792988760635, 'median': 0.5001936488730003},
 {'mean': 0.45418598340923805, 'median': 0.48402058939985293},
 {'mean': 0.32960811687490216, 'median': 0.3285528031290743},
 {'mean': 0.5260210746257522, 'median': 0.5444229172432882},
 {'mean': 0.4445493328049692, 'median': 0.4630808572279276},
 {'mean': 0.4495963972554633, 'median': 0.4858841111806047},
 {'mean': 0.5240412096213553, 'median': 0.5283402835696414},
 {'mean': 0.13767686126646383, 'median': -0.02153756296999343}]
In [73]:
len(stats)
Out[73]:
63
In [74]:
len(parks)
Out[74]:
63
In [75]:
# Store the median value in the parks data frame
parks['median_NDVI'] = [s['median'] for s in stats] 
In [76]:
parks.head()
Out[76]:
OBJECTID ASSET_NAME SITE_NAME CHILD_OF ADDRESS TYPE USE_ DESCRIPTION SQ_FEET ACREAGE ZIPCODE ALLIAS NOTES TENANT LABEL DPP_ASSET_ID Shape__Area Shape__Length geometry median_NDVI
0 7 Wissahickon Valley Park Wissahickon Valley Park Wissahickon Valley Park None LAND REGIONAL_CONSERVATION_WATERSHED NO_PROGRAM 9.078309e+07 2084.101326 19128 MAP SITE Wissahickon Valley Park 1357 1.441162e+07 71462.556702 MULTIPOLYGON (((484101.476 4431051.989, 484099... 0.521160
1 8 West Fairmount Park West Fairmount Park West Fairmount Park LAND REGIONAL_CONSERVATION_WATERSHED NO_PROGRAM 6.078159e+07 1395.358890 19131 MAP SITE West Fairmount Park 1714 9.631203e+06 25967.819064 MULTIPOLYGON (((482736.681 4428824.579, 482728... 0.477919
2 23 Pennypack Park Pennypack Park Pennypack Park LAND REGIONAL_CONSERVATION_WATERSHED NO_PROGRAM 6.023748e+07 1382.867808 19152 Verree Rd Interpretive Center MAP SITE Pennypack Park 1651 9.566914e+06 41487.394790 MULTIPOLYGON (((497133.192 4434667.950, 497123... 0.508498
3 24 East Fairmount Park East Fairmount Park East Fairmount Park LAND REGIONAL_CONSERVATION_WATERSHED NO_PROGRAM 2.871642e+07 659.240959 19121 MAP SITE East Fairmount Park 1713 4.549582e+06 21499.126097 POLYGON ((484539.743 4424162.545, 484620.184 4... 0.423563
4 25 Tacony Creek Park Tacony Creek Park Tacony Creek Park LAND REGIONAL_CONSERVATION_WATERSHED NO_PROGRAM 1.388049e+07 318.653500 19120 MAP SITE Tacony Creek Park 1961 2.201840e+06 19978.610852 MULTIPOLYGON (((491712.882 4429633.244, 491713... 0.492049

Make a quick histogram of the median values¶

They are all positive, indicating an abundance of green vegetation...

In [77]:
# Initialize
fig, ax = plt.subplots(figsize=(8,6))

# Plot a quick histogram 
ax.hist(parks['median_NDVI'], bins='auto')
ax.axvline(x=0, c='k', lw=2)

# Format
ax.set_xlabel("Median NDVI", fontsize=18)
ax.set_ylabel("Number of Parks", fontsize=18);

Let's make a choropleth, too¶

In [78]:
# Initialize
fig, ax = plt.subplots(figsize=(10,10))

# Plot the city limits
city_limits.plot(ax=ax, edgecolor='black', facecolor='none', linewidth=4)

# Plot the median NDVI
parks.plot(column='median_NDVI', legend=True, ax=ax, cmap='viridis')

# Format
ax.set_axis_off()

Let's make it interactive!¶

Make sure to check the crs!

In [79]:
parks.crs
Out[79]:
<Derived Projected CRS: EPSG:32618>
Name: WGS 84 / UTM zone 18N
Axis Info [cartesian]:
- E[east]: Easting (metre)
- N[north]: Northing (metre)
Area of Use:
- name: Between 78°W and 72°W, northern hemisphere between equator and 84°N, onshore and offshore. Bahamas. Canada - Nunavut; Ontario; Quebec. Colombia. Cuba. Ecuador. Greenland. Haiti. Jamaica. Panama. Turks and Caicos Islands. United States (USA). Venezuela.
- bounds: (-78.0, 0.0, -72.0, 84.0)
Coordinate Operation:
- name: UTM zone 18N
- method: Transverse Mercator
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

The CRS is "EPSG:32618" and since it's not in 4326, we'll need to specify the "crs=" keyword when plotting!

In [80]:
# trim to only the columns we want to plot
cols = ["median_NDVI", "SITE_NAME", "geometry"]

# Plot the parks colored by median NDVI
p = parks[cols].hvplot.polygons(c="median_NDVI", 
                                geo=True, 
                                crs=32618, 
                                cmap='viridis', 
                                hover_cols=["SITE_NAME"])

# Plot the city limit boundary
cl = city_limits.hvplot.polygons(
    geo=True,
    crs=32618,
    alpha=0,
    line_alpha=1,
    line_color="black",
    hover=False,
    width=700,
    height=600,
)

# combine!
cl * p
/Users/nhand/mambaforge/envs/musa-550-fall-2022/lib/python3.9/site-packages/geoviews/operation/projection.py:79: ShapelyDeprecationWarning: Iteration over multi-part geometries is deprecated and will be removed in Shapely 2.0. Use the `geoms` property to access the constituent parts of a multi-part geometry.
  polys = [g for g in geom if g.area > 1e-15]
Out[80]:

Exercise: Measure the median NDVI for each of Philadelphia's neighborhoods¶

  • Once you measure the median value for each neighborhood, try making the following charts:
    • A histogram of the median values for all neighborhoods
    • Bar graphs with the neighborhoods with the 20 highest and 20 lowest values
    • A choropleth showing the median values across the city

You're welcome to use matplotlib/geopandas, but encouraged to explore the new hvplot options!

1. Load the neighborhoods data¶

  • A GeoJSON file of Philly's neighborhoods is available in the "data/" folder
In [81]:
# Load the neighborhoods
hoods = gpd.read_file("./data/zillow_neighborhoods.geojson")

2. Calculate the median NDVI value for each neighborhood¶

Important: Make sure your neighborhoods GeoDataFrame is in the same CRS as the Landsat data!

In [82]:
landsat.crs.data['init']
Out[82]:
'epsg:32618'
In [83]:
# Convert to the landsat
hoods = hoods.to_crs(landsat.crs.data['init'])
In [84]:
# Calculate the zonal statistics
stats_by_hood = zonal_stats(hoods, NDVI, affine=landsat.transform, stats=['mean', 'median'], nodata=-999)
In [85]:
stats_by_hood
Out[85]:
[{'mean': 0.31392040334687304, 'median': 0.27856594452047934},
 {'mean': 0.18300929138083402, 'median': 0.15571923768967572},
 {'mean': 0.22468445189501648, 'median': 0.17141970412772856},
 {'mean': 0.43065240779519015, 'median': 0.4617488161141866},
 {'mean': 0.318970809622512, 'median': 0.3006904521671514},
 {'mean': 0.25273468406592925, 'median': 0.2483195384843864},
 {'mean': 0.09587147406959982, 'median': 0.08340007053008469},
 {'mean': 0.17925317272000577, 'median': 0.16159209153091308},
 {'mean': 0.16927437879780147, 'median': 0.14856486796785304},
 {'mean': 0.13315183006110315, 'median': 0.0844147036216485},
 {'mean': 0.3032613180748332, 'median': 0.2991018371406554},
 {'mean': 0.3005145367111285, 'median': 0.2983533036011672},
 {'mean': 0.3520402658092766, 'median': 0.36642977306279395},
 {'mean': 0.06850728352663443, 'median': 0.051313743471153465},
 {'mean': 0.1624992167356467, 'median': 0.14594594594594595},
 {'mean': 0.19406263526762074, 'median': 0.19242101469002704},
 {'mean': 0.2495581274514327, 'median': 0.2257300667883086},
 {'mean': 0.04163000301482944, 'median': 0.037030431957658136},
 {'mean': 0.4201038409217668, 'median': 0.4439358649612553},
 {'mean': 0.04225135200774936, 'median': 0.03853493588670528},
 {'mean': 0.2904211760054823, 'median': 0.26470142185928036},
 {'mean': 0.15363805171709244, 'median': 0.12085003697505158},
 {'mean': 0.17152805040632468, 'median': 0.11496098074558328},
 {'mean': 0.42552495036167076, 'median': 0.421658805187364},
 {'mean': 0.3912762229270476, 'median': 0.42212521910111195},
 {'mean': 0.09404645575651284, 'median': 0.07857622565480188},
 {'mean': 0.16491834928991564, 'median': 0.15021318465070516},
 {'mean': 0.3106049986498808, 'median': 0.32583142201834864},
 {'mean': 0.12919644511512735, 'median': 0.11913371378191145},
 {'mean': 0.2915737313903987, 'median': 0.304774440588728},
 {'mean': 0.31787656779796253, 'median': 0.3952783887473803},
 {'mean': 0.195965777548411, 'median': 0.1770113196079921},
 {'mean': 0.05661083180075057, 'median': 0.0499893639651138},
 {'mean': 0.18515284607314653, 'median': 0.17907100357304717},
 {'mean': 0.29283640801624505, 'median': 0.2877666162286677},
 {'mean': 0.1771635174218309, 'median': 0.14970314171568821},
 {'mean': 0.14716619864818462, 'median': 0.11244390749050742},
 {'mean': 0.12469553617180827, 'median': 0.10742619029157832},
 {'mean': 0.18291214109092946, 'median': 0.1216128609630211},
 {'mean': 0.1859549325760223, 'median': 0.17248164686293802},
 {'mean': 0.09016336981602067, 'median': 0.07756515448942505},
 {'mean': 0.14353818825070166, 'median': 0.12901489323524135},
 {'mean': 0.29371646510388005, 'median': 0.2772895703235984},
 {'mean': 0.14261119894987634, 'median': 0.13165141028503385},
 {'mean': 0.160324108968167, 'median': 0.14247544403334816},
 {'mean': 0.19990495725580412, 'median': 0.12561208543149852},
 {'mean': 0.1404895811639291, 'median': 0.11285189208851981},
 {'mean': 0.20007580613997314, 'median': 0.17488414495175872},
 {'mean': 0.29349378771510654, 'median': 0.27647108771333956},
 {'mean': 0.22719447238709786, 'median': 0.21500827689277635},
 {'mean': 0.2620610489891658, 'median': 0.255177304964539},
 {'mean': 0.2540232182541783, 'median': 0.23457093696574202},
 {'mean': 0.3230085712300897, 'median': 0.3263351749539595},
 {'mean': 0.26093326612377477, 'median': 0.2594112652136994},
 {'mean': 0.30115965232695824, 'median': 0.2881442688809087},
 {'mean': 0.10419469673264953, 'median': 0.06574664149015746},
 {'mean': 0.14459843112390638, 'median': 0.1273132150810105},
 {'mean': 0.09983214655943277, 'median': 0.08794976361471002},
 {'mean': 0.14225336243007766, 'median': 0.12302560186487681},
 {'mean': 0.09946071861930077, 'median': 0.07853215389041533},
 {'mean': 0.1452193922659759, 'median': 0.12790372368526273},
 {'mean': 0.11796940950011613, 'median': 0.0776666911598968},
 {'mean': 0.17070274409635827, 'median': 0.15888826411759982},
 {'mean': 0.19068874563285163, 'median': 0.1720531412171591},
 {'mean': 0.09898339526744644, 'median': 0.08963622358746948},
 {'mean': 0.18061992404502175, 'median': 0.15495696640138285},
 {'mean': 0.16533057021385317, 'median': 0.10399253586284649},
 {'mean': 0.11908729513660037, 'median': 0.0588394720437976},
 {'mean': 0.17329359472117273, 'median': 0.10705735100688761},
 {'mean': 0.2337590931600961, 'median': 0.16357100676045763},
 {'mean': 0.21102276332981473, 'median': 0.19582317265976007},
 {'mean': 0.2484378755666938, 'median': 0.23640730014990935},
 {'mean': 0.21642001147674247, 'median': 0.18099201630711736},
 {'mean': 0.129763301389145, 'median': 0.092685120415709},
 {'mean': 0.054999694456918914, 'median': 0.04555942009022349},
 {'mean': 0.18724649877256527, 'median': 0.18596250718311752},
 {'mean': 0.17647253612174493, 'median': 0.14852174323196993},
 {'mean': 0.178385177079417, 'median': 0.16691988950276243},
 {'mean': 0.18817260737171776, 'median': 0.16271241356413235},
 {'mean': 0.13811770403698734, 'median': 0.10897435897435898},
 {'mean': 0.4355159036631469, 'median': 0.44460500963391136},
 {'mean': 0.28184168594689557, 'median': 0.25374251741396553},
 {'mean': 0.22359597758168365, 'median': 0.19391995334256762},
 {'mean': 0.2878755977585209, 'median': 0.25158715762742606},
 {'mean': 0.26728534727462555, 'median': 0.2410734949827748},
 {'mean': 0.28933947460347065, 'median': 0.25559871643303494},
 {'mean': 0.30402279251992353, 'median': 0.30089000839630564},
 {'mean': 0.3922566238548766, 'median': 0.40705882352941175},
 {'mean': 0.10406705198196901, 'median': 0.04374236571919052},
 {'mean': 0.051218927963353054, 'median': 0.04365235141713119},
 {'mean': 0.1723687820405403, 'median': 0.15059755991957824},
 {'mean': 0.33144340218886986, 'median': 0.3072201540212242},
 {'mean': 0.18000603727544026, 'median': 0.1706375749288419},
 {'mean': 0.29408048899874983, 'median': 0.3036726128016789},
 {'mean': 0.08503955870273701, 'median': 0.07407192525923781},
 {'mean': 0.32241253234016937, 'median': 0.31006926588995215},
 {'mean': 0.18205496347967423, 'median': 0.16134765474074647},
 {'mean': 0.12977836123257955, 'median': 0.08590941601457007},
 {'mean': 0.13443566497928924, 'median': 0.11798494025392971},
 {'mean': 0.2033358456620277, 'median': 0.16576007209122687},
 {'mean': 0.32415498788453834, 'median': 0.32617602427921094},
 {'mean': 0.18749910770539593, 'median': 0.17252747252747253},
 {'mean': 0.3193736682602594, 'median': 0.3373959675453564},
 {'mean': 0.3257517503301688, 'median': 0.29360932475884244},
 {'mean': 0.19620976301968188, 'median': 0.1650986741346869},
 {'mean': 0.07896242289726156, 'median': 0.06548635285469223},
 {'mean': 0.09044969459220685, 'median': 0.07117737003058104},
 {'mean': 0.23577910900495405, 'median': 0.2106886253051269},
 {'mean': 0.49002749329456036, 'median': 0.508513798224709},
 {'mean': 0.2967746028243487, 'median': 0.29078296133448145},
 {'mean': 0.23938081232215558, 'median': 0.20916949997582784},
 {'mean': 0.11612617064064894, 'median': 0.10231888964116452},
 {'mean': 0.12195714494535831, 'median': 0.04300298129912368},
 {'mean': 0.20700090460961312, 'median': 0.2064790288239442},
 {'mean': 0.13647293633345284, 'median': 0.12569427247320197},
 {'mean': 0.22309412347829316, 'median': 0.21101490231903636},
 {'mean': 0.10137747100018765, 'median': 0.06563398497986661},
 {'mean': 0.08273584634594518, 'median': 0.06561233591005096},
 {'mean': 0.04312086703340781, 'median': 0.02541695678217338},
 {'mean': 0.2536093586322431, 'median': 0.2311732524352581},
 {'mean': 0.32332895563886144, 'median': 0.32909180590909004},
 {'mean': 0.2277484811069114, 'median': 0.22431761560720143},
 {'mean': 0.1721841184848715, 'median': 0.16058076512244332},
 {'mean': 0.29942596505649405, 'median': 0.29676670038589786},
 {'mean': 0.15940437606074165, 'median': 0.1358949502071032},
 {'mean': 0.12378472816253742, 'median': 0.11830807853171385},
 {'mean': 0.1877499132271723, 'median': 0.17826751036439015},
 {'mean': 0.11943950844267338, 'median': 0.07476532302595251},
 {'mean': 0.17496298489567105, 'median': 0.1659926732955897},
 {'mean': 0.1920977399165265, 'median': 0.16871625674865026},
 {'mean': 0.16337082044518464, 'median': 0.116828342471531},
 {'mean': 0.13862897874493735, 'median': 0.12335431461369693},
 {'mean': 0.17396050138550698, 'median': 0.14759331143448312},
 {'mean': 0.2398879707943406, 'median': 0.2546650763725905},
 {'mean': 0.14397524759541586, 'median': 0.10151342090234151},
 {'mean': 0.11262226656879762, 'median': 0.08519488018434215},
 {'mean': 0.4105827285256272, 'median': 0.43712943197998555},
 {'mean': 0.12208131944943262, 'median': 0.10823104693140795},
 {'mean': 0.10174770697387744, 'median': 0.07673932517169305},
 {'mean': 0.13941350910672, 'median': 0.11942396096727109},
 {'mean': 0.20263947878478583, 'median': 0.1750341162105868},
 {'mean': 0.42006022571388996, 'median': 0.4711730560719998},
 {'mean': 0.19283089059664163, 'median': 0.12246812581863598},
 {'mean': 0.06900296739992386, 'median': 0.0563680745433852},
 {'mean': 0.16839264609586674, 'median': 0.1653832567997174},
 {'mean': 0.20260798649692677, 'median': 0.17538042425378467},
 {'mean': 0.4514355509481655, 'median': 0.4945924206514285},
 {'mean': 0.08120602841441002, 'median': 0.05459630483539914},
 {'mean': 0.3175258476394397, 'median': 0.3184809770175624},
 {'mean': 0.23161067366700408, 'median': 0.22945830797321973},
 {'mean': 0.32557766938441296, 'median': 0.3120613068393747},
 {'mean': 0.5017692669135977, 'median': 0.5207786535525264},
 {'mean': 0.17652187824323134, 'median': 0.14554151586614233},
 {'mean': 0.32586693646536, 'median': 0.3296868690576704},
 {'mean': 0.2076776587799097, 'median': 0.20473126032547465},
 {'mean': 0.2604215070389401, 'median': 0.24160812708080265},
 {'mean': 0.263077089015587, 'median': 0.25120875995449377},
 {'mean': 0.16880134097818922, 'median': 0.15243747069084695}]
In [86]:
# Store the median value in the neighborhood data frame
hoods['median_NDVI'] = [s['median'] for s in stats_by_hood]
In [87]:
hoods.head()
Out[87]:
ZillowName geometry median_NDVI
0 Academy Gardens POLYGON ((500127.271 4434899.076, 500464.133 4... 0.278566
1 Airport POLYGON ((483133.506 4415847.150, 483228.716 4... 0.155719
2 Allegheny West POLYGON ((485837.663 4428133.600, 485834.655 4... 0.171420
3 Andorra POLYGON ((480844.652 4435202.290, 480737.832 4... 0.461749
4 Aston Woodbridge POLYGON ((499266.779 4433715.944, 499265.975 4... 0.300690

3. Make a histogram of median values¶

In [88]:
# Initialize
fig, ax = plt.subplots(figsize=(8,6))

# Plot a quick histogram 
ax.hist(hoods['median_NDVI'], bins='auto')
ax.axvline(x=0, c='k', lw=2)

# Format
ax.set_xlabel("Median NDVI", fontsize=18)
ax.set_ylabel("Number of Neighborhoods", fontsize=18);

4. Plot a (static) choropleth map¶

In [89]:
# Initialize
fig, ax = plt.subplots(figsize=(10,10))

# Plot the city limits
city_limits.plot(ax=ax, edgecolor='black', facecolor='none', linewidth=4)

# Plot the median NDVI
hoods.plot(column='median_NDVI', legend=True, ax=ax)

# Format
ax.set_axis_off()

5. Plot an (interactive) choropleth map¶

You can use hvplot or altair!

With hvplot:

In [90]:
# trim to only the columns we want to plot
cols = ["median_NDVI", "ZillowName", "geometry"]

# Plot the parks colored by median NDVI
hoods[cols].hvplot.polygons(c="median_NDVI", 
                            geo=True, 
                            crs=32618, 
                            width=700, 
                            height=600, 
                            cmap='viridis',
                            hover_cols=["ZillowName"])
Out[90]:

With altair:

In [91]:
# trim to only the columns we want to plot
cols = ["median_NDVI", "ZillowName", "geometry"]

(
    alt.Chart(hoods[cols].to_crs(epsg=4326))
    .mark_geoshape()
    .encode(
        color=alt.Color(
            "median_NDVI",
            scale=alt.Scale(scheme="viridis"),
            legend=alt.Legend(title="Median NDVI"),
        ),
        tooltip=["ZillowName", "median_NDVI"],
    )
)
Out[91]:

5. Explore the neighborhoods with the highest and lowest median NDVI values¶

5A. Plot a bar chart of the neighborhoods with the top 20 largest median values¶

With hvplot:

In [92]:
# calculate the top 20
cols = ['ZillowName', 'median_NDVI']
top20 = hoods[cols].sort_values('median_NDVI', ascending=False).head(n=20)

top20.hvplot.bar(x='ZillowName', y='median_NDVI', rot=90, height=400)
Out[92]:

With altair:

In [93]:
(
alt.Chart(top20)
    .mark_bar()
    .encode(x=alt.X("ZillowName", sort="-y"),  # Sort in descending order by y
            y="median_NDVI",
            tooltip=['ZillowName', 'median_NDVI'])
    .properties(width=800)
)
Out[93]:

5B. Plot a bar chart of the neighborhoods with the bottom 20 largest median values¶

With hvplot:

In [94]:
cols = ['ZillowName', 'median_NDVI']
bottom20 = hoods[cols].sort_values('median_NDVI', ascending=True).tail(n=20)


bottom20.hvplot.bar(x='ZillowName', y='median_NDVI', rot=90, height=300, width=600)
Out[94]:

With altair:

In [95]:
(
alt.Chart(bottom20)
    .mark_bar()
    .encode(x=alt.X("ZillowName", sort="y"),  # Sort in ascending order by y
            y="median_NDVI",
            tooltip=['ZillowName', 'median_NDVI'])
    .properties(width=800)
)
Out[95]:

That's it!¶

  • Next week: introduction to APIs and downloading data from the web in Python
  • See you on Monday!