Week 4A: More Interactive Data Viz, Working with Raster Data¶

Sep 26, 2022

Housekeeping¶

  • Homework #2 due a week from today (10/3)
  • Choose a dataset to visualize and explore
    • OpenDataPhilly or one your choosing
    • Email me if you want to analyze one that's not on OpenDataPhilly

Week #4 Agenda¶

Two parts:

  • Part 1: More interactive data visualization: the HoloViz ecosystem
  • Part 2: Getting started with raster data

Part 1: More interactive data viz¶

Recap: Data viz in Python¶

What have we learned so far¶

Matplotlib¶

  • The classic, most flexible library
  • Can handle geographic data well
  • Verbose syntax that is not declarative

Pandas¶

  • Quick, built-in interface
  • Not as many features as other libraries

seaborn¶

  • Best for visualizing complex relationships between variables
  • Improves matplotlib's syntax: more declarative

altair¶

  • Easy, declarative syntax
  • Lots of interactive features
  • Complex visualizations with minimal amounts of code

We'll learn one more today...¶

HoloViz: a set of coordinated visualization libraries in Python¶

The motivation behind HoloViz mirrors the goals of this course¶

Proper data visualization is crucial throughout all of the steps of the data science pipeline: data wrangling, modeling, and storytelling

Today: hvPlot, HoloViews, GeoViews¶

Later in the course: Datashader, Param, Panel¶

A quick overview¶

  • Bokeh: creates interactive, JavaScript-based visualizations from Python
  • HoloViews: a declarative, high-level library for creating Bokeh visualizations

$\rightarrow$ Similar to the relationship between altair and Vega¶

A significant pro:¶

GeoViews builds on HoloViews to add support for geographic data

The major con:¶

  • All are relatively new
  • Bokeh is the most well-tested
  • HoloViews, GeoViews, and hvPlot are still under active development, but they are very promising

How does hvPlot fit in?¶

The hvPlot package¶

It's relatively new: officially released in February 2019


We are very pleased officially announce the release of hvPlot! It provides a high-level plotting API for the PyData ecosystem including @pandas_dev, @xarray_dev, @dask_dev, @geopandas and more, generating interactive @datashader and @BokehPlots. https://t.co/Loc5XElJUL

— HoloViews (@HoloViews) February 4, 2019

Main use¶

  • Quickly generate interactive plots from your data
  • Seamlessly handles pandas and geopandas data
  • Relies on HoloViews and GeoViews under the hood

An interface just like the pandas plot() function, but much more useful: the resulting charts are interactive.

Let's load some data to try out¶

In [2]:
# Our usual imports
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
In [3]:
# let's load the measles data from week 2
url = "https://raw.githubusercontent.com/MUSA-550-Fall-2022/week-2/master/data/measles_incidence.csv"
measles_data_raw = pd.read_csv(url, skiprows=2, na_values='-')
In [4]:
measles_data_raw.head()
Out[4]:
   YEAR  WEEK  ALABAMA  ALASKA  ARIZONA  ARKANSAS  CALIFORNIA  COLORADO  CONNECTICUT  DELAWARE  ...  SOUTH DAKOTA  TENNESSEE  TEXAS  UTAH  VERMONT  VIRGINIA  WASHINGTON  WEST VIRGINIA  WISCONSIN  WYOMING
0  1928     1     3.67     NaN     1.90      4.11        1.38      8.38         4.50      8.58  ...          5.69      22.03   1.18   0.4     0.28       NaN       14.83           3.36       1.54     0.91
1  1928     2     6.25     NaN     6.40      9.91        1.80      6.02         9.00      7.30  ...          6.57      16.96   0.63   NaN     0.56       NaN       17.34           4.19       0.96      NaN
2  1928     3     7.95     NaN     4.50     11.15        1.31      2.86         8.81     15.88  ...          2.04      24.66   0.62   0.2     1.12       NaN       15.67           4.19       4.79     1.36
3  1928     4    12.58     NaN     1.90     13.75        1.87     13.71        10.40      4.29  ...          2.19      18.86   0.37   0.2     6.70       NaN       12.77           4.66       1.64     3.64
4  1928     5     8.03     NaN     0.47     20.79        2.38      5.13        16.80      5.58  ...          3.94      20.05   1.57   0.4     6.70       NaN       18.83           7.37       2.91     0.91

5 rows × 53 columns

Convert from wide to long formats...¶

In [5]:
measles_data = measles_data_raw.melt(id_vars=["YEAR", "WEEK"], 
                                     value_name="incidence", 
                                     var_name="state")
In [6]:
measles_data.head()
Out[6]:
   YEAR  WEEK    state  incidence
0  1928     1  ALABAMA       3.67
1  1928     2  ALABAMA       6.25
2  1928     3  ALABAMA       7.95
3  1928     4  ALABAMA      12.58
4  1928     5  ALABAMA       8.03
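To see what melt() does on something small enough to read in full, here is a minimal sketch on a toy "wide" frame. The values are copied from the first two rows of the raw data shown earlier, keeping only two of the state columns; the melt() call mirrors the cell above.

```python
import numpy as np
import pandas as pd

# Toy "wide" frame: first two rows of the raw data, only two state columns kept
wide = pd.DataFrame({
    "YEAR": [1928, 1928],
    "WEEK": [1, 2],
    "ALABAMA": [3.67, 6.25],
    "ALASKA": [np.nan, np.nan],
})

# melt() keeps the id_vars as-is and stacks every other column into two new ones:
# "state" holds the old column name, "incidence" holds the old cell value
long = wide.melt(id_vars=["YEAR", "WEEK"], value_name="incidence", var_name="state")

# 2 rows x 2 state columns -> 4 rows, one per (YEAR, WEEK, state) combination
print(long)
```

Each state column becomes its own block of rows, which is the tidy layout that the groupby() calls later in this section rely on.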

Reminder: plotting with pandas¶

The default .plot() doesn't know which variables to plot, so it draws every numeric column (YEAR, WEEK, and incidence) against the row index.

In [7]:
fig, ax = plt.subplots(figsize=(10, 6))
measles_data.plot(ax=ax)
Out[7]:
<AxesSubplot:>

But we can group by year and plot the national total (the sum over all states) for each year

In [8]:
by_year = measles_data.groupby("YEAR")['incidence'].sum()
by_year.head()
Out[8]:
YEAR
1928    16924.34
1929    12060.96
1930    14575.11
1931    15427.67
1932    14481.11
Name: incidence, dtype: float64
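Note that .sum() adds up the per-state incidence values, so by_year is a national total rather than an average across states. The sketch below uses hypothetical toy values to contrast the two aggregations; swapping .sum() for .mean() in the cell above would give the average instead.

```python
import pandas as pd

# Hypothetical toy long-format data: two states, two years
toy = pd.DataFrame({
    "YEAR": [1928, 1928, 1929, 1929],
    "state": ["A", "B", "A", "B"],
    "incidence": [10.0, 30.0, 5.0, 15.0],
})

# .sum() adds up every state's incidence in a year: a national *total*
# (1928 -> 40.0, 1929 -> 20.0)
totals = toy.groupby("YEAR")["incidence"].sum()

# .mean() divides by the number of non-missing values: an *average* across states
# (1928 -> 20.0, 1929 -> 10.0)
averages = toy.groupby("YEAR")["incidence"].mean()
```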
In [9]:
fig, ax = plt.subplots(figsize=(10, 6))

# Plot the annual total by year
by_year.plot(ax=ax)

# Add the vaccine year and label
ax.axvline(x=1963, c='k', linewidth=2)
ax.text(1963, 27000, " Vaccine introduced", ha='left', fontsize=18);

Adding interactivity with hvplot¶

Use the .hvplot() function to create interactive plots.

In [10]:
# This will add the .hvplot() function to your DataFrame!
import hvplot.pandas

# Import holoviews too
import holoviews as hv

# Load the bokeh plotting backend
hv.extension('bokeh')
In [11]:
img = by_year.hvplot(kind='line')

img
Out[11]:

In this case, .hvplot() creates a HoloViews Curve object.

Not unlike altair Chart objects, it's an object that knows how to translate from your DataFrame data to a visualization.

In [12]:
print(img)
:Curve   [YEAR]   (incidence)

Many different chart types are available...¶

In [13]:
by_year.hvplot(kind='scatter')
Out[13]:
In [14]:
by_year.hvplot(kind='bar', rot=90, width=1000)
Out[14]:

Just like in altair, we can also layer chart elements together¶

Use the * operator to layer together chart elements.

Note: the same thing can be accomplished in altair, but with the + operator.

In [15]:
# The line chart of incidence vs year
incidence = by_year.hvplot(kind='line')

# Vertical line + label for vaccine year
vline = hv.VLine(1963).opts(color='black')
label = hv.Text(1963, 27000, " Vaccine introduced", halign='left')

final_chart = incidence * vline * label
final_chart
Out[15]:

We can group charts by a specific column, with automatic widget selectors¶

This is some powerful magic.

Let's calculate the annual measles incidence for each year and state:

In [16]:
by_state = measles_data.groupby(['YEAR', 'state'])['incidence'].sum()
by_state.head()
Out[16]:
YEAR  state     
1928  ALABAMA       334.99
      ALASKA          0.00
      ARIZONA       200.75
      ARKANSAS      481.77
      CALIFORNIA     69.22
Name: incidence, dtype: float64
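The ALASKA row shows 0.00, while the raw data shows NaN for Alaska in the first weeks of 1928. This is likely pandas' default sum behavior at work: missing values are skipped, so a group containing only missing values sums to 0. A minimal sketch with hypothetical toy values, including the min_count option, which keeps such groups as NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: every ALASKA value is missing
toy = pd.DataFrame({
    "YEAR": [1928, 1928, 1928, 1928],
    "state": ["ALABAMA", "ALABAMA", "ALASKA", "ALASKA"],
    "incidence": [3.67, 6.25, np.nan, np.nan],
})

# Default: NaN values are skipped, so an all-missing group sums to 0.0
default_sum = toy.groupby(["YEAR", "state"])["incidence"].sum()

# min_count=1 requires at least one non-missing value; otherwise the result is NaN
strict_sum = toy.groupby(["YEAR", "state"])["incidence"].sum(min_count=1)
```

Keeping these groups as NaN can matter for plots like the heatmap later on, where "no data" and "zero cases" would otherwise look the same.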

Now, tell hvplot to produce a chart for each state:

In [17]:
by_state_chart = by_state.hvplot(x="YEAR",
                                 y="incidence",
                                 groupby="state", 
                                 width=400, 
                                 kind="line")

by_state_chart
Out[17]:

We can select out individual charts from the set of grouped objects¶

In [18]:
PA = by_state_chart['PENNSYLVANIA'].relabel('PA')
NJ = by_state_chart['NEW JERSEY'].relabel('NJ')  

Combine charts as subplots with the + operator¶

In [19]:
combined = PA + NJ     

combined
Out[19]:
In [20]:
print(combined)
:Layout
   .Curve.PA :Curve   [YEAR]   (incidence)
   .Curve.NJ :Curve   [YEAR]   (incidence)

The charts are side-by-side by default. You can also specify the number of rows/columns explicitly.

In [21]:
# one column
combined.cols(1)
Out[21]:

We can also overlay lines on the same plot¶

Using the by keyword:

In [22]:
states = ['NEW YORK', 'NEW JERSEY', 'CALIFORNIA', 'PENNSYLVANIA']
sub_states = by_state.loc[:, states]
In [23]:
sub_states
Out[23]:
YEAR  state       
1928  NEW YORK        649.97
1929  NEW YORK        249.09
1930  NEW YORK        315.39
1931  NEW YORK        423.22
1932  NEW YORK        465.06
                       ...  
1999  PENNSYLVANIA      0.00
2000  PENNSYLVANIA      0.00
2001  PENNSYLVANIA      0.05
2002  PENNSYLVANIA      0.00
2003  PENNSYLVANIA      0.00
Name: incidence, Length: 304, dtype: float64
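The .loc[:, states] selection works because by_state has a two-level (YEAR, state) index: the ":" keeps every year, and the list filters the state level. A small sketch on hypothetical toy data, with an equivalent boolean-mask version for comparison:

```python
import pandas as pd

# Hypothetical toy data with the same (YEAR, state) index structure as by_state
toy = pd.DataFrame({
    "YEAR": [1928, 1928, 1928, 1929, 1929, 1929],
    "state": ["A", "B", "C", "A", "B", "C"],
    "incidence": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})
by_state_toy = toy.groupby(["YEAR", "state"])["incidence"].sum()

# ":" keeps every YEAR on the first level; the list filters the second (state) level
subset = by_state_toy.loc[:, ["A", "C"]]

# The same selection written as an explicit boolean mask on the "state" level
mask = by_state_toy.index.get_level_values("state").isin(["A", "C"])
subset_mask = by_state_toy[mask]
```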
In [24]:
sub_state_chart = sub_states.hvplot(x='YEAR', 
                                    y='incidence', 
                                    by='state', 
                                    kind='line') 

sub_state_chart * vline
Out[24]:

We can also show faceted plots¶

Just like in altair, when we used the alt.Chart().facet(column='state') syntax

Below, we map each value of the state column to its own column of subplots:

In [25]:
img = sub_states.hvplot(x="YEAR", 
                        y='incidence',
                        col="state", 
                        kind="line", 
                        rot=90, 
                        frame_width=200) 
img * vline
Out[25]:

Functions for each kind of chart type are available too¶

In [26]:
by_state.hvplot
Out[26]:
<hvplot.plotting.core.hvPlotTabular at 0x286734a90>

For example, we could plot a bar chart for these four states¶

In [27]:
by_state.loc[1960:1970, states].hvplot.bar(x='YEAR', 
                                           y='incidence', 
                                           by='state', rot=90)
Out[27]:

Change bar() to line() and we get the same thing as before.

In [28]:
by_state.loc[1960:1970, states].hvplot.line(x='YEAR', 
                                            y='incidence', 
                                            by='state', rot=90)
Out[28]:

Customizing charts¶

See the help message for explicit hvplot functions:

In [31]:
by_state.hvplot.line?

Heatmaps are available too...¶

Can we reproduce the WSJ measles heatmap that we made in altair in week 2?

Use the help function:

In [32]:
measles_data.hvplot.heatmap?

Two methods:¶

We want to plot 'YEAR' on the x axis, 'state' on the y axis, and specify 'incidence' as the values being plotted in each heatmap bin.

  1. You can use the by_state Series, which has already been summed over weeks for each state
  2. Use the original, tidy data (measles_data) with columns for state, week, year, and incidence
    • you will need to use the reduce_function keyword to sum over weeks
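Both methods should end up with the same YEAR × state grid of summed values. As a rough pandas-only analogue (this is not how hvplot computes the heatmap internally), here is a sketch on hypothetical toy data with no missing values:

```python
import pandas as pd

# Hypothetical toy tidy data: weekly incidence for two states over two years
tidy = pd.DataFrame({
    "YEAR":  [1928, 1928, 1928, 1928, 1929, 1929, 1929, 1929],
    "WEEK":  [1, 2, 1, 2, 1, 2, 1, 2],
    "state": ["A", "A", "B", "B", "A", "A", "B", "B"],
    "incidence": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
})

# Method 1 analogue: pre-aggregate over weeks, then lay out as a state x YEAR grid
grid_1 = tidy.groupby(["YEAR", "state"])["incidence"].sum().unstack("YEAR")

# Method 2 analogue: let the pivot step aggregate (like reduce_function=np.sum)
grid_2 = tidy.pivot_table(index="state", columns="YEAR",
                          values="incidence", aggfunc="sum")
```

With missing values the two routes can differ (pandas' sum skips NaN, while np.sum propagates it), so it is worth checking empty cells on the real data.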
In [33]:
by_state
Out[33]:
YEAR  state        
1928  ALABAMA          334.99
      ALASKA             0.00
      ARIZONA          200.75
      ARKANSAS         481.77
      CALIFORNIA        69.22
                        ...  
2003  VIRGINIA           0.00
      WASHINGTON         0.00
      WEST VIRGINIA      0.00
      WISCONSIN          0.00
      WYOMING            0.00
Name: incidence, Length: 3876, dtype: float64

Method #1¶

In [37]:
# METHOD #1: just plot the incidence
heatmap = by_state.hvplot.heatmap(
    x="YEAR",
    y="state", 
    C="incidence",
    cmap="viridis",
    height=500,
    width=1000,
    flip_yaxis=True,
    rot=90,
)
heatmap.redim(
    state="State", YEAR="Year",
)
Out[37]:

Method #2¶

In [38]:
measles_data
Out[38]:
        YEAR  WEEK    state  incidence
0       1928     1  ALABAMA       3.67
1       1928     2  ALABAMA       6.25
2       1928     3  ALABAMA       7.95
3       1928     4  ALABAMA      12.58
4       1928     5  ALABAMA       8.03
...      ...   ...      ...        ...
201547  2003    48  WYOMING        NaN
201548  2003    49  WYOMING        NaN
201549  2003    50  WYOMING        NaN
201550  2003    51  WYOMING        NaN
201551  2003    52  WYOMING        NaN

201552 rows × 4 columns

In [39]:
## METHOD 2: hvplot does the aggregation
heatmap = measles_data.hvplot.heatmap(
    x="YEAR",
    y="state",
    C="incidence",
    cmap='viridis',
    reduce_function=np.sum,
    height=500,
    width=1000,
    flip_yaxis=True,
    rot=90,
)
heatmap.redim(state="State", YEAR="Year")
Out[39]:

Just like altair: save the chart as an HTML file¶

In [40]:
import hvplot
hvplot.save(heatmap, 'measles.html')
In [41]:
# load the html file and display it
from IPython.display import HTML
HTML('measles.html')
Out[41]:

Two more useful features:¶

  1. Scatter matrix plots
  2. Explorer mode

1. Scatter matrix plots¶

Visualizing relationships between variables, as we have seen in seaborn and altair

Let's load the penguins data set from week 2

In [42]:
url = "https://raw.githubusercontent.com/MUSA-550-Fall-2022/week-2/master/data/penguins.csv"
penguins = pd.read_csv(url)
In [43]:
penguins.head()
Out[43]:
   species     island  bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g     sex  year
0   Adelie  Torgersen            39.1           18.7              181.0       3750.0    male  2007
1   Adelie  Torgersen            39.5           17.4              186.0       3800.0  female  2007
2   Adelie  Torgersen            40.3           18.0              195.0       3250.0  female  2007
3   Adelie  Torgersen             NaN            NaN                NaN          NaN     NaN  2007
4   Adelie  Torgersen            36.7           19.3              193.0       3450.0  female  2007

Use the hvplot.scatter_matrix() function:

In [44]:
hvplot.scatter_matrix?
In [45]:
columns = ['flipper_length_mm', 
           'bill_length_mm', 
           'body_mass_g', 
           'species']
hvplot.scatter_matrix(penguins[columns], c='species')
Out[45]:

Note the "box select" and "lasso" tools on the toolbar for interactions

2. Explorer mode¶

  • An interactive interface that allows you to easily generate customized plots, which makes it easy to explore both your data and hvPlot’s options, parameters, etc.
  • New feature just released!
In [80]:
hvexplorer = hvplot.explorer(penguins)
hvexplorer
Out[80]:

Get the code for your customized plot¶

You can then export the current state of the explorer by running hvexplorer.plot_code()

In [81]:
hvexplorer.plot_code()
Out[81]:
"df.hvplot(by=['species'], colorbar=True, kind='scatter', title='Bill Length vs. Depth by Penguin Species', x='bill_length_mm', xlabel='Bill Length', y=['bill_depth_mm'], ylabel='Bill Depth')"
In [82]:
penguins.hvplot(
    by=["species"],
    colorbar=True,
    kind="scatter",
    title="Bill Length vs. Depth by Penguin Species",
    x="bill_length_mm",
    xlabel="Bill Length",
    y=["bill_depth_mm"],
    ylabel="Bill Depth",
)
Out[82]:

Recap: altair vs hvplot¶

  • Both use a declarative syntax (altair more so than hvplot)
  • Users of ggplot might be more familiar with altair's syntax
  • hvplot integrates directly into pandas dataframes via the .hvplot() function
  • Both have support for cross-filtering and interactions
  • Both can be incorporated into web-based dashboards via HTML (later in course)
  • hvplot has better support for large data (later in course)

It's largely up to you which one you feel is easier to use¶

hvplot can also be used with geopandas!¶

Let's load some geographic data for countries:

In [46]:
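import geopandas as gpd

# Note: gpd.datasets was deprecated in geopandas 0.14 and removed in 1.0,
# so this line requires an older geopandas version
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))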
In [47]:
world.head()
Out[47]:
       pop_est      continent                      name  iso_a3  gdp_md_est                                           geometry
0     889953.0        Oceania                      Fiji     FJI        5496  MULTIPOLYGON (((180.00000 -16.06713, 180.00000...
1   58005463.0         Africa                  Tanzania     TZA       63177  POLYGON ((33.90371 -0.95000, 34.07262 -1.05982...
2     603253.0         Africa                 W. Sahara     ESH         907  POLYGON ((-8.66559 27.65643, -8.66512 27.58948...
3   37589262.0  North America                    Canada     CAN     1736425  MULTIPOLYGON (((-122.84000 49.00000, -122.9742...
4  328239523.0  North America  United States of America     USA    21433226  MULTIPOLYGON (((-122.84000 49.00000, -120.0000...

Plotting with just geopandas¶

In [48]:
fig, ax = plt.subplots(figsize=(10,10))
world.plot(column='gdp_md_est', ax=ax)
ax.set_axis_off()

Now with hvplot¶

In [49]:
world.hvplot.polygons?
In [51]:
# Can also just do world.hvplot()
world.hvplot.polygons(c="gdp_md_est", geo=True, frame_height=400, logz=True)
Out[51]:

Let's try it on our median assessment values per neighborhood from last week¶

In [52]:
import geopandas as gpd
In [53]:
# Load the data
url = "https://raw.githubusercontent.com/MUSA-550-Fall-2022/week-3/master/data/opa_residential.csv"
data = pd.read_csv(url)

# Create the Point() objects
data['Coordinates'] = gpd.points_from_xy(data['lng'], data['lat'])

# Create the GeoDataFrame
data = gpd.GeoDataFrame(data, geometry='Coordinates', crs="EPSG:4326")
In [54]:
# load the Zillow data from GitHub
url = "https://raw.githubusercontent.com/MUSA-550-Fall-2022/week-3/master/data/zillow_neighborhoods.geojson"
zillow = gpd.read_file(url)
In [55]:
# Important: make sure the CRSs match
data = data.to_crs(zillow.crs)

# perform the spatial join
data = gpd.sjoin(data, zillow, predicate='within', how='left')
In [56]:
# Calculate the median market value per Zillow neighborhood
median_values = data.groupby('ZillowName', as_index=False)['market_value'].median()

# Merge median values with the Zillow geometries
median_values = zillow.merge(median_values, on='ZillowName')
print(type(median_values))
<class 'geopandas.geodataframe.GeoDataFrame'>
In [57]:
median_values.head()
Out[57]:
         ZillowName                                           geometry  market_value
0   Academy Gardens  POLYGON ((-74.99851 40.06435, -74.99456 40.061...      185950.0
1    Allegheny West  POLYGON ((-75.16592 40.00327, -75.16596 40.003...       34750.0
2           Andorra  POLYGON ((-75.22463 40.06686, -75.22588 40.065...      251900.0
3  Aston Woodbridge  POLYGON ((-75.00860 40.05369, -75.00861 40.053...      183800.0
4   Bartram Village  POLYGON ((-75.20733 39.93350, -75.20733 39.933...       48300.0
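One detail worth checking: merge() defaults to how='inner', so any Zillow neighborhood that received no properties in the spatial join will not appear in median_values (or on the map). A sketch with plain pandas DataFrames standing in for the GeoDataFrames; the neighborhood names and values are hypothetical:

```python
import pandas as pd

# Hypothetical stand-ins: plain DataFrames in place of the GeoDataFrames above
neighborhoods = pd.DataFrame(
    {"ZillowName": ["Neighborhood A", "Neighborhood B", "Neighborhood C"]}
)
properties = pd.DataFrame({
    "ZillowName": ["Neighborhood A", "Neighborhood A", "Neighborhood B"],
    "market_value": [200000.0, 300000.0, 250000.0],
})

# Median value per neighborhood, same pattern as the cell above
medians = properties.groupby("ZillowName", as_index=False)["market_value"].median()

# Default how="inner": Neighborhood C has no properties, so it is dropped
inner = neighborhoods.merge(medians, on="ZillowName")

# how="left" keeps every neighborhood; unmatched ones get NaN for market_value
left = neighborhoods.merge(medians, on="ZillowName", how="left")
```

Which behavior you want depends on the map: the inner merge hides empty neighborhoods, while a left merge keeps them visible as missing values.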
In [58]:
median_values.crs
Out[58]:
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
In [59]:
# pass arguments directly to hvplot() 
# and it recognizes polygons automatically
median_values.hvplot(c='market_value', 
                     frame_width=600, 
                     frame_height=500, 
                     geo=True, 
                     cmap='viridis', 
                     hover_cols=['ZillowName'])
Out[59]:

Important: geo=True assumes EPSG:4326¶

If you specify geo=True, the data needs to be in a standard lat/lng CRS (EPSG:4326). If it isn't, use the crs keyword to specify which CRS your data is in.
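Why does the CRS matter? EPSG:3857 coordinates are in meters, so they would be misread if treated as degrees. The sketch below writes out the standard spherical Web Mercator formulas by hand, for illustration only (to_crs() does this for you), and projects a point in Philadelphia. The radius constant is the WGS 84 semi-major axis used by EPSG:3857.

```python
import math

# Assumption: spherical Web Mercator (EPSG:3857) using the WGS 84 semi-major axis
R = 6378137.0  # meters

def lnglat_to_web_mercator(lng, lat):
    """Project longitude/latitude in degrees to EPSG:3857 x/y in meters."""
    x = R * math.radians(lng)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y

def web_mercator_to_lnglat(x, y):
    """Invert the projection: EPSG:3857 meters back to degrees."""
    lng = math.degrees(x / R)
    lat = math.degrees(2 * math.atan(math.exp(y / R)) - math.pi / 2)
    return lng, lat

# A point in Philadelphia, given as (lng, lat) in degrees
x, y = lnglat_to_web_mercator(-75.128994, 39.991575)

# x is roughly -8.36 million and y roughly 4.86 million: meters, far outside
# the -180..180 / -90..90 degree range of EPSG:4326
print(x, y)
```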

In [60]:
median_values_3857 = median_values.to_crs(epsg=3857)
In [61]:
median_values_3857.crs
Out[61]:
<Derived Projected CRS: EPSG:3857>
Name: WGS 84 / Pseudo-Mercator
Axis Info [cartesian]:
- X[east]: Easting (metre)
- Y[north]: Northing (metre)
Area of Use:
- name: World between 85.06°S and 85.06°N.
- bounds: (-180.0, -85.06, 180.0, 85.06)
Coordinate Operation:
- name: Popular Visualisation Pseudo-Mercator
- method: Popular Visualisation Pseudo Mercator
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
In [62]:
median_values_3857.hvplot(c='market_value',
                          frame_width=600,
                          frame_height=500,
                          geo=True,
                          crs=3857, # NEW: specify the CRS
                          cmap='viridis',
                          hover_cols=['ZillowName'])
Out[62]:

Now we can take advantage of GeoViews¶

Let's add a tile source underneath the choropleth map

In [65]:
import geoviews as gv
import geoviews.tile_sources as gvts
In [66]:
type(gvts.ESRI)
Out[66]:
geoviews.element.geo.WMTS
In [67]:
%%opts WMTS [width=800, height=800, xaxis=None, yaxis=None]

choro = median_values.hvplot(c='market_value', 
                             width=500, 
                             height=400, 
                             alpha=0.5, 
                             geo=True, 
                             cmap='viridis', 
                             hover_cols=['ZillowName'])
gvts.ESRI * choro
Out[67]:

Many of the most common tile sources are available...¶

In [68]:
%%opts WMTS [width=200, height=200, xaxis=None, yaxis=None]

(gvts.OSM + 
 gvts.Wikipedia + 
 gvts.StamenToner + 
 gvts.EsriNatGeo +
 gvts.EsriImagery + 
 gvts.EsriUSATopo + 
 gvts.EsriTerrain + 
 gvts.CartoDark).cols(4)
Out[68]:

Note: we've used the %%opts cell magic to apply styling options to any charts generated in the cell.

See the documentation guide on customizations for more details.

What about interactive hex bins?¶

You can do it with hvplot! Sort of.

Step 1: Extract out the x/y values of the data¶

  • Let's add them as new columns into the data frame
  • Remember, you can use the "x" and "y" attributes of the "geometry" column.
In [69]:
data['x'] = data.geometry.x
data['y'] = data.geometry.y
In [70]:
data.head()
Out[70]:
   parcel_number        lat         lng   location  market_value  building_value  land_value  total_land_area  total_livable_area                 Coordinates  index_right  ZillowName           x          y
0       71361800  39.991575  -75.128994  2726 A ST       62200.0         44473.0     17727.0          1109.69              1638.0  POINT (-75.12899 39.99158)         79.0     McGuire  -75.128994  39.991575
1       71362100  39.991702  -75.128978  2732 A ST       25200.0         18018.0      7182.0          1109.69              1638.0  POINT (-75.12898 39.99170)         79.0     McGuire  -75.128978  39.991702
2       71362200  39.991744  -75.128971  2734 A ST       62200.0         44473.0     17727.0          1109.69              1638.0  POINT (-75.12897 39.99174)         79.0     McGuire  -75.128971  39.991744
3       71362600  39.991994  -75.128895  2742 A ST       15500.0         11083.0      4417.0          1109.69              1638.0  POINT (-75.12889 39.99199)         79.0     McGuire  -75.128895  39.991994
4       71363800  39.992592  -75.128743  2814 A ST       31300.0         22400.0      8900.0           643.50               890.0  POINT (-75.12874 39.99259)         79.0     McGuire  -75.128743  39.992592

Step 2: Create a new DataFrame with only the columns we need¶

  • In this case, we'll use the x and y coordinate columns and the associated market_value column
  • Important: hvplot's hexbin won't work if the input data is a GeoDataFrame, i.e., if it still has a geometry column
    • By doing the column selection, we convert the data into a regular pandas DataFrame
In [71]:
subdata = data[['x', 'y', 'market_value']]
In [72]:
type(subdata)
Out[72]:
pandas.core.frame.DataFrame

Step 3: Plot with the hexbin function¶

  • Similar syntax to matplotlib's hexbin() function
  • Specify:
    • The x/y coordinates,
    • An optional C column to aggregate for each bin (raw counts are shown if not provided)
    • A reduce_function that determines how to aggregate the C column
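The aggregation described by C and reduce_function can be sketched with square bins instead of hexagons. This is a deliberate simplification (hvplot's hexbin uses a hexagonal grid and its own binning code), and the points, values, and cell size below are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical toy points: (x, y) locations with a value to aggregate
pts = pd.DataFrame({
    "x": [0.1, 0.2, 0.3, 1.6, 1.7],
    "y": [0.1, 0.4, 0.2, 1.5, 1.8],
    "market_value": [100.0, 300.0, 200.0, 1000.0, 3000.0],
})

cell_size = 1.0  # width of each square bin, in the same units as x/y

# Assign each point to a grid cell by flooring its scaled coordinates
pts["col"] = np.floor(pts["x"] / cell_size).astype(int)
pts["row"] = np.floor(pts["y"] / cell_size).astype(int)

# With C and a reduce function: one aggregated value per occupied cell
medians = pts.groupby(["col", "row"])["market_value"].median()

# Without C: the raw number of points per cell
counts = pts.groupby(["col", "row"]).size()
```

Using the median, as the cell below does, keeps a single very expensive property from dominating its bin.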
In [73]:
data.hvplot.hexbin?
In [74]:
subdata.head()
Out[74]:
            x          y  market_value
0  -75.128994  39.991575       62200.0
1  -75.128978  39.991702       25200.0
2  -75.128971  39.991744       62200.0
3  -75.128895  39.991994       15500.0
4  -75.128743  39.992592       31300.0
In [75]:
subdata.hvplot.hexbin(x='x', 
                      y='y', 
                      C='market_value', 
                      reduce_function=np.median, 
                      logz=True, 
                      geo=True, 
                      gridsize=40, 
                      cmap='viridis')
Out[75]:

Not the prettiest but it gets the job done for some quick exploratory analysis!

Documentation references¶

  • Hvplot user guide
  • HoloViz tutorial: introduction to the HoloViz ecosystem
  • HoloViews user guide and gallery
  • GeoViews user guide and gallery

Some very cool examples are available in the galleries

That's it!¶

  • Homework #2 due a week from today
  • Wednesday we'll dive into raster data analysis!