Sep 26, 2022
Two parts:

- Proper data visualization is crucial throughout all steps of the data science pipeline: data wrangling, modeling, and storytelling.
- GeoViews builds on HoloViews to add support for geographic data.
Where does hvPlot fit in?

The hvPlot package

It's relatively new: officially released in February 2019.
%%html
<center>
<blockquote class="twitter-tweet" data-lang="en"><p lang="en" dir="ltr">We are very pleased officially announce the release of hvPlot! It provides a high-level plotting API for the PyData ecosystem including <a href="https://twitter.com/pandas_dev?ref_src=twsrc%5Etfw">@pandas_dev</a>, <a href="https://twitter.com/xarray_dev?ref_src=twsrc%5Etfw">@xarray_dev</a>, <a href="https://twitter.com/dask_dev?ref_src=twsrc%5Etfw">@dask_dev</a>, <a href="https://twitter.com/geopandas?ref_src=twsrc%5Etfw">@geopandas</a> and more, generating interactive <a href="https://twitter.com/datashader?ref_src=twsrc%5Etfw">@datashader</a> and <a href="https://twitter.com/BokehPlots?ref_src=twsrc%5Etfw">@BokehPlots</a>. <a href="https://t.co/Loc5XElJUL">https://t.co/Loc5XElJUL</a></p>— HoloViews (@HoloViews) <a href="https://twitter.com/HoloViews/status/1092409050283819010?ref_src=twsrc%5Etfw">February 4, 2019</a></blockquote>
<script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
</center>
An interface just like the pandas plot() function, but much more useful.
# Our usual imports
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
# let's load the measles data from week 2
url = "https://raw.githubusercontent.com/MUSA-550-Fall-2022/week-2/master/data/measles_incidence.csv"
measles_data_raw = pd.read_csv(url, skiprows=2, na_values='-')
measles_data_raw.head()
YEAR | WEEK | ALABAMA | ALASKA | ARIZONA | ARKANSAS | CALIFORNIA | COLORADO | CONNECTICUT | DELAWARE | ... | SOUTH DAKOTA | TENNESSEE | TEXAS | UTAH | VERMONT | VIRGINIA | WASHINGTON | WEST VIRGINIA | WISCONSIN | WYOMING | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1928 | 1 | 3.67 | NaN | 1.90 | 4.11 | 1.38 | 8.38 | 4.50 | 8.58 | ... | 5.69 | 22.03 | 1.18 | 0.4 | 0.28 | NaN | 14.83 | 3.36 | 1.54 | 0.91 |
1 | 1928 | 2 | 6.25 | NaN | 6.40 | 9.91 | 1.80 | 6.02 | 9.00 | 7.30 | ... | 6.57 | 16.96 | 0.63 | NaN | 0.56 | NaN | 17.34 | 4.19 | 0.96 | NaN |
2 | 1928 | 3 | 7.95 | NaN | 4.50 | 11.15 | 1.31 | 2.86 | 8.81 | 15.88 | ... | 2.04 | 24.66 | 0.62 | 0.2 | 1.12 | NaN | 15.67 | 4.19 | 4.79 | 1.36 |
3 | 1928 | 4 | 12.58 | NaN | 1.90 | 13.75 | 1.87 | 13.71 | 10.40 | 4.29 | ... | 2.19 | 18.86 | 0.37 | 0.2 | 6.70 | NaN | 12.77 | 4.66 | 1.64 | 3.64 |
4 | 1928 | 5 | 8.03 | NaN | 0.47 | 20.79 | 2.38 | 5.13 | 16.80 | 5.58 | ... | 3.94 | 20.05 | 1.57 | 0.4 | 6.70 | NaN | 18.83 | 7.37 | 2.91 | 0.91 |
5 rows × 53 columns
measles_data = measles_data_raw.melt(id_vars=["YEAR", "WEEK"],
value_name="incidence",
var_name="state")
measles_data.head()
YEAR | WEEK | state | incidence | |
---|---|---|---|---|
0 | 1928 | 1 | ALABAMA | 3.67 |
1 | 1928 | 2 | ALABAMA | 6.25 |
2 | 1928 | 3 | ALABAMA | 7.95 |
3 | 1928 | 4 | ALABAMA | 12.58 |
4 | 1928 | 5 | ALABAMA | 8.03 |
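The melt() call above is the key tidying step. As a sketch of what it does, here is the same operation on a hypothetical two-state slice of the data (column names mirror the real table; the values are made up):

```python
import pandas as pd

# Hypothetical mini version of the measles table: one column per state
wide = pd.DataFrame({
    "YEAR": [1928, 1928],
    "WEEK": [1, 2],
    "ALABAMA": [3.67, 6.25],
    "ARIZONA": [1.90, 6.40],
})

# melt() turns the state columns into (state, incidence) pairs,
# keeping YEAR and WEEK as identifier columns
tidy = wide.melt(id_vars=["YEAR", "WEEK"], var_name="state", value_name="incidence")

print(tidy.shape)  # 2 rows x 2 states = 4 tidy rows, 4 columns
```

Each original (row, state-column) cell becomes its own row, which is exactly the long format that groupby and hvplot operations below rely on.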
Plotting with pandas

The default .plot() doesn't know which variables to plot.
fig, ax = plt.subplots(figsize=(10, 6))
measles_data.plot(ax=ax)
<AxesSubplot:>
But we can group by year and plot the total incidence, summed over all states, for each year:
by_year = measles_data.groupby("YEAR")['incidence'].sum()
by_year.head()
YEAR 1928 16924.34 1929 12060.96 1930 14575.11 1931 15427.67 1932 14481.11 Name: incidence, dtype: float64
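The groupby-sum pattern used here collapses all states and weeks into one number per year. A minimal sketch with made-up values shows the mechanics:

```python
import pandas as pd

# Toy incidence data: two states, two years (made-up numbers)
df = pd.DataFrame({
    "YEAR": [1928, 1928, 1929, 1929],
    "state": ["ALABAMA", "ARIZONA", "ALABAMA", "ARIZONA"],
    "incidence": [3.0, 2.0, 1.0, 4.0],
})

# Sum over states to get a single national number per year,
# mirroring measles_data.groupby("YEAR")["incidence"].sum()
by_year = df.groupby("YEAR")["incidence"].sum()

print(by_year.loc[1928])  # 3.0 + 2.0 = 5.0
```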
fig, ax = plt.subplots(figsize=(10, 6))
# Plot the annual total by year
by_year.plot(ax=ax)
# Add the vaccine year and label
ax.axvline(x=1963, c='k', linewidth=2)
ax.text(1963, 27000, " Vaccine introduced", ha='left', fontsize=18);
hvplot

Use the .hvplot() function to create interactive plots.
# This will add the .hvplot() function to your DataFrame!
import hvplot.pandas
# Import holoviews too
import holoviews as hv
# Load bokeh
hv.extension('bokeh')
img = by_year.hvplot(kind='line')
img
In this case, .hvplot() creates a HoloViews Curve object. Not unlike altair Chart objects, it's an object that knows how to translate from your DataFrame data to a visualization.
print(img)
:Curve [YEAR] (incidence)
by_year.hvplot(kind='scatter')
by_year.hvplot(kind='bar', rot=90, width=1000)
Use the * operator to layer together chart elements.

Note: the same thing can be accomplished in altair, but with the + operator.
# The line chart of incidence vs year
incidence = by_year.hvplot(kind='line')
# Vertical line + label for vaccine year
vline = hv.VLine(1963).opts(color='black')
label = hv.Text(1963, 27000, " Vaccine introduced", halign='left')
final_chart = incidence * vline * label
final_chart
This is some powerful magic.
Let's calculate the annual measles incidence for each year and state:
by_state = measles_data.groupby(['YEAR', 'state'])['incidence'].sum()
by_state.head()
YEAR state 1928 ALABAMA 334.99 ALASKA 0.00 ARIZONA 200.75 ARKANSAS 481.77 CALIFORNIA 69.22 Name: incidence, dtype: float64
Now, tell hvplot to produce charts for each state:
by_state_chart = by_state.hvplot(x="YEAR",
y="incidence",
groupby="state",
width=400,
kind="line")
by_state_chart
PA = by_state_chart['PENNSYLVANIA'].relabel('PA')
NJ = by_state_chart['NEW JERSEY'].relabel('NJ')
Combining charts with the + operator

combined = PA + NJ
combined
print(combined)
:Layout .Curve.PA :Curve [YEAR] (incidence) .Curve.NJ :Curve [YEAR] (incidence)
The charts are side-by-side by default. You can also specify the number of rows/columns explicitly.
# one column
combined.cols(1)
Using the by keyword:
states = ['NEW YORK', 'NEW JERSEY', 'CALIFORNIA', 'PENNSYLVANIA']
sub_states = by_state.loc[:, states]
sub_states
YEAR state 1928 NEW YORK 649.97 1929 NEW YORK 249.09 1930 NEW YORK 315.39 1931 NEW YORK 423.22 1932 NEW YORK 465.06 ... 1999 PENNSYLVANIA 0.00 2000 PENNSYLVANIA 0.00 2001 PENNSYLVANIA 0.05 2002 PENNSYLVANIA 0.00 2003 PENNSYLVANIA 0.00 Name: incidence, Length: 304, dtype: float64
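The by_state.loc[:, states] selection above uses partial indexing on a MultiIndex: the slice keeps all years and restricts the second level to the listed states. A small sketch of this pattern (index names match the real series; values are made up):

```python
import pandas as pd

# A small Series with a (YEAR, state) MultiIndex, like by_state
idx = pd.MultiIndex.from_product(
    [[1928, 1929], ["NEW YORK", "OHIO", "TEXAS"]], names=["YEAR", "state"]
)
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=idx, name="incidence")

# .loc[:, list_of_states] keeps every year but only the listed states
sub = s.loc[:, ["NEW YORK", "TEXAS"]]

print(len(sub))  # 2 years x 2 states = 4 rows
```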
sub_state_chart = sub_states.hvplot(x='YEAR',
y='incidence',
by='state',
kind='line')
sub_state_chart * vline
This is just like in altair, where we used the alt.Chart().facet(column='state') syntax.

Below, we specify that the state column should be mapped to the chart columns:
img = sub_states.hvplot(x="YEAR",
y='incidence',
col="state",
kind="line",
rot=90,
frame_width=200)
img * vline
by_state.hvplot
<hvplot.plotting.core.hvPlotTabular at 0x286734a90>
by_state.loc[1960:1970, states].hvplot.bar(x='YEAR',
y='incidence',
by='state', rot=90)
Change bar() to line() and we get the same thing as before.
by_state.loc[1960:1970, states].hvplot.line(x='YEAR',
y='incidence',
by='state', rot=90)
See the help message for explicit hvplot functions:
by_state.hvplot.line?
Can we reproduce the WSJ measles heatmap that we made in altair in week 2?
Use the help function:
measles_data.hvplot.heatmap?
We want to plot 'YEAR' on the x axis, 'state' on the y axis, and specify 'incidence' as the values being plotted in each heatmap bin.
There are two options:

- Use the by_state data frame, which has already summed over weeks for each state.
- Use the raw data (measles_data), with columns for state, week, year, and incidence, and use the reduce_function keyword to sum over weeks.

by_state
YEAR state 1928 ALABAMA 334.99 ALASKA 0.00 ARIZONA 200.75 ARKANSAS 481.77 CALIFORNIA 69.22 ... 2003 VIRGINIA 0.00 WASHINGTON 0.00 WEST VIRGINIA 0.00 WISCONSIN 0.00 WYOMING 0.00 Name: incidence, Length: 3876, dtype: float64
# METHOD #1: just plot the incidence
heatmap = by_state.hvplot.heatmap(
x="YEAR",
y="state",
C="incidence",
cmap="viridis",
height=500,
width=1000,
flip_yaxis=True,
rot=90,
)
heatmap.redim(
state="State", YEAR="Year",
)
measles_data
YEAR | WEEK | state | incidence | |
---|---|---|---|---|
0 | 1928 | 1 | ALABAMA | 3.67 |
1 | 1928 | 2 | ALABAMA | 6.25 |
2 | 1928 | 3 | ALABAMA | 7.95 |
3 | 1928 | 4 | ALABAMA | 12.58 |
4 | 1928 | 5 | ALABAMA | 8.03 |
... | ... | ... | ... | ... |
201547 | 2003 | 48 | WYOMING | NaN |
201548 | 2003 | 49 | WYOMING | NaN |
201549 | 2003 | 50 | WYOMING | NaN |
201550 | 2003 | 51 | WYOMING | NaN |
201551 | 2003 | 52 | WYOMING | NaN |
201552 rows × 4 columns
## METHOD 2: hvplot does the aggregation
heatmap = measles_data.hvplot.heatmap(
x="YEAR",
y="state",
C="incidence",
cmap='viridis',
reduce_function=np.sum,
height=500,
width=1000,
flip_yaxis=True,
rot=90,
)
heatmap.redim(state="State", YEAR="Year")
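The reduce_function=np.sum aggregation that hvplot performs in Method 2 is conceptually the same as a pandas pivot_table. A sketch with toy data (column names mirror the real table; values are made up):

```python
import numpy as np
import pandas as pd

# Toy weekly data for two states and two years (made-up values)
df = pd.DataFrame({
    "YEAR": [1928, 1928, 1929, 1929] * 2,
    "WEEK": [1, 2, 1, 2] * 2,
    "state": ["ALABAMA"] * 4 + ["ARIZONA"] * 4,
    "incidence": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
})

# Sum the weeks that fall in each (state, YEAR) bin,
# just as reduce_function=np.sum does for each heatmap cell
grid = df.pivot_table(index="state", columns="YEAR", values="incidence", aggfunc="sum")

print(grid.loc["ALABAMA", 1928])  # weeks 1 + 2 = 3.0
```

Each cell of the resulting grid corresponds to one bin of the heatmap.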
import hvplot
hvplot.save(heatmap, 'measles.html')
# load the html file and display it
from IPython.display import HTML
HTML('measles.html')
Let's load the penguins data set from week 2
url = "https://raw.githubusercontent.com/MUSA-550-Fall-2022/week-2/master/data/penguins.csv"
penguins = pd.read_csv(url)
penguins.head()
species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex | year | |
---|---|---|---|---|---|---|---|---|
0 | Adelie | Torgersen | 39.1 | 18.7 | 181.0 | 3750.0 | male | 2007 |
1 | Adelie | Torgersen | 39.5 | 17.4 | 186.0 | 3800.0 | female | 2007 |
2 | Adelie | Torgersen | 40.3 | 18.0 | 195.0 | 3250.0 | female | 2007 |
3 | Adelie | Torgersen | NaN | NaN | NaN | NaN | NaN | 2007 |
4 | Adelie | Torgersen | 36.7 | 19.3 | 193.0 | 3450.0 | female | 2007 |
Use the hvplot.scatter_matrix()
function:
penguins.hvplot.scatter?
columns = ['flipper_length_mm',
'bill_length_mm',
'body_mass_g',
'species']
hvplot.scatter_matrix(penguins[columns], c='species')
Note the "box select" and "lasso" features on the toolbar for interacting with the chart.
hvexplorer = hvplot.explorer(penguins)
hvexplorer
You can then export the current state of the explorer by running hvexplorer.plot_code()
hvexplorer.plot_code()
"df.hvplot(by=['species'], colorbar=True, kind='scatter', title='Bill Length vs. Depth by Penguin Species', x='bill_length_mm', xlabel='Bill Length', y=['bill_depth_mm'], ylabel='Bill Depth')"
penguins.hvplot(
by=["species"],
colorbar=True,
kind="scatter",
title="Bill Length vs. Depth by Penguin Species",
x="bill_length_mm",
xlabel="Bill Length",
y=["bill_depth_mm"],
ylabel="Bill Depth",
)
Geographic data and the .hvplot() function

Let's load some geographic data for countries:
import geopandas as gpd
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
world.head()
pop_est | continent | name | iso_a3 | gdp_md_est | geometry | |
---|---|---|---|---|---|---|
0 | 889953.0 | Oceania | Fiji | FJI | 5496 | MULTIPOLYGON (((180.00000 -16.06713, 180.00000... |
1 | 58005463.0 | Africa | Tanzania | TZA | 63177 | POLYGON ((33.90371 -0.95000, 34.07262 -1.05982... |
2 | 603253.0 | Africa | W. Sahara | ESH | 907 | POLYGON ((-8.66559 27.65643, -8.66512 27.58948... |
3 | 37589262.0 | North America | Canada | CAN | 1736425 | MULTIPOLYGON (((-122.84000 49.00000, -122.9742... |
4 | 328239523.0 | North America | United States of America | USA | 21433226 | MULTIPOLYGON (((-122.84000 49.00000, -120.0000... |
fig, ax = plt.subplots(figsize=(10,10))
world.plot(column='gdp_md_est', ax=ax)
ax.set_axis_off()
world.hvplot.polygons?
# Can also just do world.hvplot()
world.hvplot.polygons(c="gdp_md_est", geo=True, frame_height=400, logz=True)
import geopandas as gpd
# Load the data
url = "https://raw.githubusercontent.com/MUSA-550-Fall-2022/week-3/master/data/opa_residential.csv"
data = pd.read_csv(url)
# Create the Point() objects
data['Coordinates'] = gpd.points_from_xy(data['lng'], data['lat'])
# Create the GeoDataFrame
data = gpd.GeoDataFrame(data, geometry='Coordinates', crs="EPSG:4326")
# load the Zillow data from GitHub
url = "https://raw.githubusercontent.com/MUSA-550-Fall-2022/week-3/master/data/zillow_neighborhoods.geojson"
zillow = gpd.read_file(url)
# Important: Make sure the CRS match
data = data.to_crs(zillow.crs)
# perform the spatial join
data = gpd.sjoin(data, zillow, predicate='within', how='left')
# Calculate the median market value per Zillow neighborhood
median_values = data.groupby('ZillowName', as_index=False)['market_value'].median()
# Merge median values with the Zillow geometries
median_values = zillow.merge(median_values, on='ZillowName')
print(type(median_values))
<class 'geopandas.geodataframe.GeoDataFrame'>
median_values.head()
ZillowName | geometry | market_value | |
---|---|---|---|
0 | Academy Gardens | POLYGON ((-74.99851 40.06435, -74.99456 40.061... | 185950.0 |
1 | Allegheny West | POLYGON ((-75.16592 40.00327, -75.16596 40.003... | 34750.0 |
2 | Andorra | POLYGON ((-75.22463 40.06686, -75.22588 40.065... | 251900.0 |
3 | Aston Woodbridge | POLYGON ((-75.00860 40.05369, -75.00861 40.053... | 183800.0 |
4 | Bartram Village | POLYGON ((-75.20733 39.93350, -75.20733 39.933... | 48300.0 |
median_values.crs
<Geographic 2D CRS: EPSG:4326> Name: WGS 84 Axis Info [ellipsoidal]: - Lat[north]: Geodetic latitude (degree) - Lon[east]: Geodetic longitude (degree) Area of Use: - name: World. - bounds: (-180.0, -90.0, 180.0, 90.0) Datum: World Geodetic System 1984 ensemble - Ellipsoid: WGS 84 - Prime Meridian: Greenwich
# pass arguments directly to hvplot()
# and it recognizes polygons automatically
median_values.hvplot(c='market_value',
frame_width=600,
frame_height=500,
geo=True,
cmap='viridis',
hover_cols=['ZillowName'])
geo=True assumes EPSG:4326

If you specify geo=True, the data needs to be in the typical lat/lng CRS (EPSG:4326). If it's not, you can use the crs keyword to specify the CRS your data is in.
median_values_3857 = median_values.to_crs(epsg=3857)
median_values_3857.crs
<Derived Projected CRS: EPSG:3857> Name: WGS 84 / Pseudo-Mercator Axis Info [cartesian]: - X[east]: Easting (metre) - Y[north]: Northing (metre) Area of Use: - name: World between 85.06°S and 85.06°N. - bounds: (-180.0, -85.06, 180.0, 85.06) Coordinate Operation: - name: Popular Visualisation Pseudo-Mercator - method: Popular Visualisation Pseudo Mercator Datum: World Geodetic System 1984 ensemble - Ellipsoid: WGS 84 - Prime Meridian: Greenwich
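Under the hood, going from EPSG:4326 (degrees) to EPSG:3857 (meters) applies the standard spherical Web Mercator formulas. A pure-Python sketch (the function name and the sample coordinates for Philadelphia City Hall are illustrative):

```python
import math

EARTH_RADIUS = 6378137.0  # sphere radius used by Pseudo-Mercator, in meters

def lnglat_to_webmercator(lng, lat):
    """Project a (lng, lat) pair in degrees to EPSG:3857 (x, y) in meters."""
    x = math.radians(lng) * EARTH_RADIUS
    y = math.log(math.tan(math.pi / 4 + math.radians(lat) / 2)) * EARTH_RADIUS
    return x, y

# Roughly Philadelphia City Hall
x, y = lnglat_to_webmercator(-75.1635, 39.9526)
print(x, y)
```

This is why the geometry coordinates jump from small degree values to values in the millions after to_crs(epsg=3857).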
median_values_3857.hvplot(c='market_value',
frame_width=600,
frame_height=500,
geo=True,
crs=3857, # NEW: specify the CRS
cmap='viridis',
hover_cols=['ZillowName'])
Let's add a tile source underneath the choropleth map
import geoviews as gv
import geoviews.tile_sources as gvts
type(gvts.ESRI)
geoviews.element.geo.WMTS
%%opts WMTS [width=800, height=800, xaxis=None, yaxis=None]
choro = median_values.hvplot(c='market_value',
width=500,
height=400,
alpha=0.5,
geo=True,
cmap='viridis',
hover_cols=['ZillowName'])
gvts.ESRI * choro
%%opts WMTS [width=200, height=200, xaxis=None, yaxis=None]
(gvts.OSM +
gvts.Wikipedia +
gvts.StamenToner +
gvts.EsriNatGeo +
gvts.EsriImagery +
gvts.EsriUSATopo +
gvts.EsriTerrain +
gvts.CartoDark).cols(4)
Note: we've used the %%opts cell magic to apply styling options to any charts generated in the cell.

See the documentation guide on customizations for more details.
You can do it with hvplot! Sort of.
data['x'] = data.geometry.x
data['y'] = data.geometry.y
data.head()
parcel_number | lat | lng | location | market_value | building_value | land_value | total_land_area | total_livable_area | Coordinates | index_right | ZillowName | x | y | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 71361800 | 39.991575 | -75.128994 | 2726 A ST | 62200.0 | 44473.0 | 17727.0 | 1109.69 | 1638.0 | POINT (-75.12899 39.99158) | 79.0 | McGuire | -75.128994 | 39.991575 |
1 | 71362100 | 39.991702 | -75.128978 | 2732 A ST | 25200.0 | 18018.0 | 7182.0 | 1109.69 | 1638.0 | POINT (-75.12898 39.99170) | 79.0 | McGuire | -75.128978 | 39.991702 |
2 | 71362200 | 39.991744 | -75.128971 | 2734 A ST | 62200.0 | 44473.0 | 17727.0 | 1109.69 | 1638.0 | POINT (-75.12897 39.99174) | 79.0 | McGuire | -75.128971 | 39.991744 |
3 | 71362600 | 39.991994 | -75.128895 | 2742 A ST | 15500.0 | 11083.0 | 4417.0 | 1109.69 | 1638.0 | POINT (-75.12889 39.99199) | 79.0 | McGuire | -75.128895 | 39.991994 |
4 | 71363800 | 39.992592 | -75.128743 | 2814 A ST | 31300.0 | 22400.0 | 8900.0 | 643.50 | 890.0 | POINT (-75.12874 39.99259) | 79.0 | McGuire | -75.128743 | 39.992592 |
Extract a plain pandas DataFrame with:

- the x and y coordinate columns and the associated market_value column
- the geometry column dropped

subdata = data[['x', 'y', 'market_value']]
type(subdata)
pandas.core.frame.DataFrame
The hexbin function accepts:

- a C column to aggregate for each bin (raw counts are shown if not provided)
- a reduce_function that determines how to aggregate the C column

data.hvplot.hexbin?
subdata.head()
x | y | market_value | |
---|---|---|---|
0 | -75.128994 | 39.991575 | 62200.0 |
1 | -75.128978 | 39.991702 | 25200.0 |
2 | -75.128971 | 39.991744 | 62200.0 |
3 | -75.128895 | 39.991994 | 15500.0 |
4 | -75.128743 | 39.992592 | 31300.0 |
subdata.hvplot.hexbin(x='x',
y='y',
C='market_value',
reduce_function=np.median,
logz=True,
geo=True,
gridsize=40,
cmap='viridis')
Not the prettiest map, but it gets the job done for some quick exploratory analysis!
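What hexbin computes can be sketched with square bins instead of hexagons: group points into grid cells and reduce the value column per cell with np.median. A self-contained numpy/pandas version on synthetic points (all names and numbers here are made up):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Synthetic points with a value column, standing in for market_value
pts = pd.DataFrame({
    "x": rng.uniform(0, 1, 1000),
    "y": rng.uniform(0, 1, 1000),
    "value": rng.uniform(10_000, 500_000, 1000),
})

# Assign each point to one of 10x10 square bins...
pts["xbin"] = np.floor(pts["x"] * 10).astype(int)
pts["ybin"] = np.floor(pts["y"] * 10).astype(int)

# ...then reduce each bin with the median, like reduce_function=np.median
binned = pts.groupby(["xbin", "ybin"])["value"].median()

print(binned.size)  # at most 10 x 10 = 100 occupied bins
```

hvplot.hexbin does the same grouping with hexagonal cells and handles the plotting for you.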
Some very cool examples are available in the galleries.