Oct 24, 2022
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
import geopandas as gpd
Two case studies:
Source: https://earthobservatory.nasa.gov/world-of-change/AralSea
Source: https://earthobservatory.nasa.gov/images/76327/lake-orumiyeh-iran
Source: https://earthobservatory.nasa.gov/images/91921/disappearing-walker-lake
import intake
import xarray as xr
# hvplot
import hvplot.xarray
import hvplot.pandas
import holoviews as hv
import geoviews as gv
from geoviews.tile_sources import EsriImagery
url = 'https://raw.githubusercontent.com/pyviz-topics/EarthML/master/examples/catalog.yml'
cat = intake.open_catalog(url)
list(cat)
['landsat_5_small', 'landsat_8_small', 'landsat_5', 'landsat_8', 'google_landsat_band', 'amazon_landsat_band', 'fluxnet_daily', 'fluxnet_metadata', 'seattle_lidar']
We'll focus on the Landsat 5 and 8 small datasets.
These are "small" snapshots around Walker Lake, cut out of the larger Landsat dataset.
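Before loading anything, it can help to peek at what each catalog entry provides. A minimal sketch (just printing the descriptions of the two entries we'll use):
# Print the description of the two "small" Landsat entries (sketch)
for name in ["landsat_5_small", "landsat_8_small"]:
    print(name, "->", cat[name].description)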
landsat_5 = cat.landsat_5_small()
landsat_5
landsat_5_small:
  args:
    chunks:
      band: 1
      x: 50
      y: 50
    concat_dim: band
    storage_options:
      anon: true
    urlpath: s3://earth-data/landsat/small/LT05_L1TP_042033_19881022_20161001_01_T1_sr_band{band:d}.tif
  description: Small version of Landsat 5 Surface Reflectance Level-2 Science Product.
  driver: intake_xarray.raster.RasterIOSource
  metadata:
    cache:
    - argkey: urlpath
      regex: earth-data/landsat
      type: file
    catalog_dir: https://raw.githubusercontent.com/pyviz-topics/EarthML/master/examples
    plots:
      band_image:
        dynamic: false
        groupby: band
        kind: image
        rasterize: true
        width: 400
        x: x
        y: y
Leveraging the power of hvplot under the hood...
landsat_5.hvplot.band_image()
landsat_5.hvplot.image(
x="x",
y="y",
groupby="band",
rasterize=True,
frame_width=400,
frame_height=400,
geo=True,
crs=32611,
)
landsat_8 = cat.landsat_8_small()
landsat_8.hvplot.image(
x="x",
y="y",
groupby="band",
rasterize=True,
frame_width=400,
frame_height=400,
geo=True,
crs=32611,
)
This will return an xarray DataArray where the values are stored as dask arrays.
landsat_5_da = landsat_5.to_dask()
landsat_8_da = landsat_8.to_dask()
landsat_5_da
<xarray.DataArray (band: 6, y: 300, x: 300)>
array([[[ 640.,  842.,  864., ..., 1250.,  929., 1111.],
        ...,
        [2826., 2007., 2393., ..., 2860., 3724., 2333.]]])
Coordinates:
  * band     (band) int64 1 2 3 4 5 7
  * y        (y) float64 4.309e+06 4.309e+06 4.309e+06 ... 4.264e+06 4.264e+06
  * x        (x) float64 3.324e+05 3.326e+05 3.327e+05 ... 3.771e+05 3.772e+05
Attributes:
    transform:      (150.0, 0.0, 332325.0, 0.0, -150.0, 4309275.0)
    crs:            +init=epsg:32611
    res:            (150.0, 150.0)
    is_tiled:       0
    nodatavals:     (nan,)
    scales:         (1.0,)
    offsets:        (0.0,)
    AREA_OR_POINT:  Area
landsat_5_da.shape
(6, 300, 300)
landsat_5_da.values
array([[[ 640.,  842.,  864., ..., 1250.,  929., 1111.],
        ...,
        [2826., 2007., 2393., ..., 2860., 3724., 2333.]]])
Use the .attrs attribute to access the raster's metadata:
landsat_5_da.attrs
{'transform': (150.0, 0.0, 332325.0, 0.0, -150.0, 4309275.0), 'crs': '+init=epsg:32611', 'res': (150.0, 150.0), 'is_tiled': 0, 'nodatavals': (nan,), 'scales': (1.0,), 'offsets': (0.0,), 'AREA_OR_POINT': 'Area'}
landsat_5_da.crs
'+init=epsg:32611'
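The CRS is stored as a proj4-style "+init=" string. When we need the bare EPSG code (e.g., to pass crs=32611 to hvplot), pyproj can parse it for us. A minimal sketch (pyproj may warn that the "+init=" syntax is deprecated, but it still parses):
from pyproj import CRS

# Parse the "+init=epsg:32611" string and extract the EPSG code
CRS.from_user_input(landsat_5_da.crs).to_epsg()  # -> 32611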
Problem: they appear to cover different areas
See week 4's lecture for a reminder!
Hints
- Use the earthpy package to do the plotting.
- Use the data.sel(band=bands) syntax to select specific bands from the dataset, where bands is a list of the desired bands to be selected.
import earthpy.plot as ep
# Get the RGB data from the Landsat 8 dataset as a numpy array
rgb_data_8 = landsat_8_da.sel(band=[4, 3, 2]).values

# Initialize the figure
fig, ax = plt.subplots(figsize=(10, 10))

# Plot the RGB bands
ep.plot_rgb(rgb_data_8, rgb=(0, 1, 2), ax=ax);
# Get the RGB data from the Landsat 5 dataset as a numpy array
rgb_data_5 = landsat_5_da.sel(band=[3, 2, 1]).values

# Initialize the figure
fig, ax = plt.subplots(figsize=(10, 10))

# Plot the RGB bands
ep.plot_rgb(rgb_data_5, rgb=(0, 1, 2), ax=ax);
Yes, we can use xarray's built-in selection functionality!
The Landsat 5 image is more zoomed in, so let's trim the Landsat 8 data to this range.
Take advantage of the "coordinate" arrays provided by xarray:
landsat_5_da.x
<xarray.DataArray 'x' (x: 300)>
array([332400., 332550., 332700., ..., 376950., 377100., 377250.])
Coordinates:
  * x        (x) float64 3.324e+05 3.326e+05 3.327e+05 ... 3.771e+05 3.772e+05
landsat_5_da.y
<xarray.DataArray 'y' (y: 300)>
array([4309200., 4309050., 4308900., ..., 4264650., 4264500., 4264350.])
Coordinates:
  * y        (y) float64 4.309e+06 4.309e+06 4.309e+06 ... 4.264e+06 4.264e+06
Remember: These coordinates are in units of meters!
# x bounds
xmin = landsat_5_da.x.min()
xmax = landsat_5_da.x.max()
# y bounds
ymin = landsat_5_da.y.min()
ymax = landsat_5_da.y.max()
We can use Python's built-in slice()
function to slice the data's coordinate arrays and select the subset of the data we want.
More info: http://xarray.pydata.org/en/stable/indexing.html
Syntax: slice(start, end, step)
More info: https://www.w3schools.com/python/ref_func_slice.asp
letters = ["a", "b", "c", "d", "e", "f", "g", "h"]
letters[0:5:2]
['a', 'c', 'e']
letters[slice(0, 5, 2)]
['a', 'c', 'e']
We can use the .sel() function to slice our x and y coordinate arrays!
# slice the x and y coordinates
slice_x = slice(xmin, xmax)
slice_y = slice(ymax, ymin) # IMPORTANT: y coordinate array is in descending order
slice_x
slice(<xarray.DataArray 'x' ()> array(332400.), <xarray.DataArray 'x' ()> array(377250.), None)
# Use the .sel() to slice
landsat_8_trimmed = landsat_8_da.sel(x=slice_x, y=slice_y)
Note: the y coordinate is stored in descending order, so the slice should be ordered the same way (from ymax to ymin).
landsat_8_da.y
<xarray.DataArray 'y' (y: 286)>
array([4320900., 4320690., 4320480., ..., 4261470., 4261260., 4261050.])
Coordinates:
  * y        (y) float64 4.321e+06 4.321e+06 4.32e+06 ... 4.261e+06 4.261e+06
landsat_8_trimmed.shape
(7, 214, 213)
landsat_8_da.shape
(7, 286, 286)
landsat_8_trimmed
<xarray.DataArray (band: 7, y: 214, x: 213)>
array([[[ 730.,  721.,  703., ...,  970., 1059.,  520.],
        ...,
        [2191., 2525., 2504., ..., 2658., 1779., 2514.]]])
Coordinates:
  * band     (band) int64 1 2 3 4 5 6 7
  * y        (y) float64 4.309e+06 4.309e+06 4.309e+06 ... 4.265e+06 4.264e+06
  * x        (x) float64 3.326e+05 3.328e+05 3.33e+05 ... 3.769e+05 3.771e+05
Attributes:
    transform:      (210.0, 0.0, 318195.0, 0.0, -210.0, 4321005.0)
    crs:            +init=epsg:32611
    res:            (210.0, 210.0)
    is_tiled:       0
    nodatavals:     (nan,)
    scales:         (1.0,)
    offsets:        (0.0,)
    AREA_OR_POINT:  Area
# Get the trimmed Landsat 8 RGB data as a numpy array
rgb_data_8 = landsat_8_trimmed.sel(band=[4, 3, 2]).values

# Initialize the figure
fig, ax = plt.subplots(figsize=(10, 10))

# Plot the RGB bands
ep.plot_rgb(rgb_data_8, rgb=(0, 1, 2), ax=ax);
# Select the RGB data as a numpy array
rgb_data_5 = landsat_5_da.sel(band=[3,2,1]).values
# Initialize the figure
fig, ax = plt.subplots(figsize=(10,10))
# Plot the RGB bands
ep.plot_rgb(rgb_data_5, rgb=(0, 1, 2), ax=ax);
Yes! We'll use the change in the NDVI over time to detect change in lake volume
For Landsat 5: NIR = band 4 and Red = band 3
For Landsat 8: NIR = band 5 and Red = band 4
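Recall the definition: NDVI = (NIR - Red) / (NIR + Red). Values range from -1 to +1; open water tends to be strongly negative while healthy vegetation is strongly positive, which is what makes NDVI useful for tracking a shrinking lake.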
NIR_1988 = landsat_5_da.sel(band=4)
RED_1988 = landsat_5_da.sel(band=3)
NDVI_1988 = (NIR_1988 - RED_1988) / (NIR_1988 + RED_1988)
NIR_2017 = landsat_8_da.sel(band=5)
RED_2017 = landsat_8_da.sel(band=4)
NDVI_2017 = (NIR_2017 - RED_2017) / (NIR_2017 + RED_2017)
Use the hvplot.image() function to show the difference. A diverging colormap, coolwarm, is particularly useful for this situation.
diff = NDVI_2017 - NDVI_1988
diff.hvplot.image(
x="x",
y="y",
frame_height=400,
frame_width=400,
cmap="coolwarm",
clim=(-1, 1),
geo=True,
crs=32611,
)
NDVI_1988.shape
(300, 300)
NDVI_2017.shape
(286, 286)
diff.shape
(42, 43)
# Different x/y ranges!
print(NDVI_1988.x[0].values)
print(NDVI_2017.x[0].values)
332400.0
318300.0
# Different resolutions
print((NDVI_1988.x[1] - NDVI_1988.x[0]).values)
print((NDVI_2017.x[1] - NDVI_2017.x[0]).values)
150.0
210.0
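The tiny diff shape comes from xarray's automatic alignment: arithmetic between two DataArrays keeps only the coordinate labels present in both, and with different origins and resolutions only a handful of x/y values coincide. A quick check (sketch):
# The difference only keeps x/y values shared by BOTH grids (sketch)
common_x = np.intersect1d(NDVI_1988.x, NDVI_2017.x)
common_y = np.intersect1d(NDVI_1988.y, NDVI_2017.y)
print(len(common_y), len(common_x))  # should match diff.shape == (42, 43)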
# The center lat/lng values in EPSG = 4326
# I got these points from google maps
x0 = -118.7081
y0 = 38.6942
Let's convert these coordinates to the same CRS as the Landsat data
cartopy to the rescue!
import cartopy.crs as ccrs
# Initialize a CRS object for our EPSG=32611 CRS
crs = ccrs.epsg("32611")
# Convert from (x0,y0) in 4326 to (x_center, y_center) in 32611
x_center, y_center = crs.transform_point(x0, y0, ccrs.PlateCarree())
x_center
351453.22378616617
We can also use geopandas:
pt = pd.DataFrame({"lng": [x0], "lat": [y0]})
gpt = gpd.GeoDataFrame(
pt, geometry=gpd.points_from_xy(pt["lng"], pt["lat"]), crs="EPSG:4326"
)
gpt
|   | lng | lat | geometry |
|---|---|---|---|
| 0 | -118.7081 | 38.6942 | POINT (-118.70810 38.69420) |
pt_32611 = gpt.to_crs(epsg=32611).iloc[0]
pt_32611.geometry.x
351453.2237862034
pt_32611.geometry.y
4284227.008738902
Remember: we are converting from EPSG=4326 (PlateCarree) to EPSG=32611
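Another common option is pyproj's Transformer. A minimal sketch (note always_xy=True so the input order is lng, lat):
from pyproj import Transformer

# Build a transformer from lat/lng (EPSG:4326) to UTM zone 11N (EPSG:32611)
transformer = Transformer.from_crs("EPSG:4326", "EPSG:32611", always_xy=True)
transformer.transform(x0, y0)  # -> (x_center, y_center)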
Let's add a 30 km bounding box around the midpoint
buffer = 15e3  # 15 km on each side of the center, in meters
xmin = x_center - buffer
xmax = x_center + buffer
ymin = y_center - buffer
ymax = y_center + buffer
bounding_box = [[[xmin, ymin], [xmin, ymax], [xmax, ymax], [xmax, ymin]]]
Use the built-in geoviews imagery to confirm...
EsriImagery * gv.Polygons(bounding_box, crs=crs).options(alpha=0.4)
res = 200 # 200 meter resolution
x = np.arange(xmin, xmax, res)
y = np.arange(ymin, ymax, res)
len(x)
150
len(y)
150
This does a linear interpolation of the data using the neighboring pixels.
NDVI_2017_regridded = NDVI_2017.interp(x=x, y=y)
NDVI_1988_regridded = NDVI_1988.interp(x=x, y=y)
img1988 = NDVI_1988_regridded.hvplot.image(
x="x",
y="y",
crs=32611,
geo=True,
frame_height=300,
frame_width=300,
clim=(-3, 1),
cmap="fire",
title="1988"
)
img2017 = NDVI_2017_regridded.hvplot.image(
x="x",
y="y",
crs=32611,
geo=True,
frame_height=300,
frame_width=300,
clim=(-3, 1),
cmap="fire",
title="2017"
)
img1988 + img2017
Now that the images are lined up, the change in lake volume is clearly apparent
diff_regridded = NDVI_2017_regridded - NDVI_1988_regridded
diff_regridded
<xarray.DataArray (y: 150, x: 150)>
array([[ 0.07285845,  0.02394533,  0.02255797, ...,  0.022373  ,  0.02654387,  0.01870564],
       ...,
       [ 0.0131126 ,  0.08405171, -0.02384596, ...,  0.01839516,  0.00968287,  0.02835872]])
Coordinates:
  * x        (x) float64 3.365e+05 3.367e+05 3.369e+05 ... 3.661e+05 3.663e+05
  * y        (y) float64 4.269e+06 4.269e+06 4.27e+06 ... 4.299e+06 4.299e+06
diff_regridded.hvplot.image(
x="x",
y="y",
crs=32611,
geo=True,
frame_height=400,
frame_width=400,
cmap="coolwarm",
clim=(-1, 1),
)
Given high-resolution data, we can downsample to a lower resolution using the familiar groupby / mean framework from pandas
Let's try a resolution of 1000 meters instead of 200 meters
# Define a low-resolution grid
res_1000 = 1000
x_1000 = np.arange(xmin, xmax, res_1000)
y_1000 = np.arange(ymin, ymax, res_1000)
# Groupby new bins and take the mean of all pixels falling into a group
# First: groupby low-resolution x bins and take mean
# Then: groupby low-resolution y bins and take mean
diff_res_1000 = (
diff_regridded.groupby_bins("x", x_1000, labels=x_1000[:-1])
.mean(dim="x")
.groupby_bins("y", y_1000, labels=y_1000[:-1])
.mean(dim="y")
.rename(x_bins="x", y_bins="y")
)
diff_res_1000
<xarray.DataArray (y: 29, x: 29)>
array([[ 0.06869101,  0.13177651,  0.15469471, ...,  0.02174784,  0.01880276,  0.01359409],
       ...,
       [ 0.04925822,  0.07870229,  0.1287353 , ...,  0.02753168,  0.02918184,  0.02360341]])
Coordinates:
  * x        (x) float64 3.365e+05 3.375e+05 3.385e+05 ... 3.635e+05 3.645e+05
  * y        (y) float64 4.269e+06 4.27e+06 4.271e+06 ... 4.296e+06 4.297e+06
diff_res_1000.hvplot.image(
x="x",
y="y",
crs=32611,
geo=True,
frame_width=500,
frame_height=400,
cmap="coolwarm",
clim=(-1, 1),
)
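As an aside, xarray also provides a coarsen() method that does this kind of block averaging directly. A minimal sketch, assuming the 200 m diff_regridded grid from above (5 x 200 m = 1000 m blocks):
# Block-average 5x5 windows of 200 m pixels into 1000 m pixels (sketch)
diff_res_1000_alt = diff_regridded.coarsen(x=5, y=5, boundary="trim").mean()
diff_res_1000_alt.shape  # (30, 30)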
band_info = pd.DataFrame([
(1, "Aerosol", " 0.43 - 0.45", 0.440, "30", "Coastal aerosol"),
(2, "Blue", " 0.45 - 0.51", 0.480, "30", "Blue"),
(3, "Green", " 0.53 - 0.59", 0.560, "30", "Green"),
(4, "Red", " 0.64 - 0.67", 0.655, "30", "Red"),
(5, "NIR", " 0.85 - 0.88", 0.865, "30", "Near Infrared (NIR)"),
(6, "SWIR1", " 1.57 - 1.65", 1.610, "30", "Shortwave Infrared (SWIR) 1"),
(7, "SWIR2", " 2.11 - 2.29", 2.200, "30", "Shortwave Infrared (SWIR) 2"),
(8, "Panc", " 0.50 - 0.68", 0.590, "15", "Panchromatic"),
(9, "Cirrus", " 1.36 - 1.38", 1.370, "30", "Cirrus"),
(10, "TIRS1", "10.60 - 11.19", 10.895, "100 * (30)", "Thermal Infrared (TIRS) 1"),
(11, "TIRS2", "11.50 - 12.51", 12.005, "100 * (30)", "Thermal Infrared (TIRS) 2")],
columns=['Band', 'Name', 'Wavelength Range (µm)', 'Nominal Wavelength (µm)', 'Resolution (m)', 'Description']).set_index(["Band"])
band_info
| Band | Name | Wavelength Range (µm) | Nominal Wavelength (µm) | Resolution (m) | Description |
|---|---|---|---|---|---|
| 1 | Aerosol | 0.43 - 0.45 | 0.440 | 30 | Coastal aerosol |
| 2 | Blue | 0.45 - 0.51 | 0.480 | 30 | Blue |
| 3 | Green | 0.53 - 0.59 | 0.560 | 30 | Green |
| 4 | Red | 0.64 - 0.67 | 0.655 | 30 | Red |
| 5 | NIR | 0.85 - 0.88 | 0.865 | 30 | Near Infrared (NIR) |
| 6 | SWIR1 | 1.57 - 1.65 | 1.610 | 30 | Shortwave Infrared (SWIR) 1 |
| 7 | SWIR2 | 2.11 - 2.29 | 2.200 | 30 | Shortwave Infrared (SWIR) 2 |
| 8 | Panc | 0.50 - 0.68 | 0.590 | 15 | Panchromatic |
| 9 | Cirrus | 1.36 - 1.38 | 1.370 | 30 | Cirrus |
| 10 | TIRS1 | 10.60 - 11.19 | 10.895 | 100 * (30) | Thermal Infrared (TIRS) 1 |
| 11 | TIRS2 | 11.50 - 12.51 | 12.005 | 100 * (30) | Thermal Infrared (TIRS) 2 |
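For example, to pull out the band names that make up a true-color (RGB) composite, a quick sketch using the table above:
# Look up the Red/Green/Blue rows by band number (the DataFrame index)
band_info.loc[[4, 3, 2], "Name"]  # -> Red, Green, Blue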
We'll be downloading publicly available Landsat data from Google Cloud Storage
See: https://cloud.google.com/storage/docs/public-datasets/landsat
The relevant information is stored in our intake catalog:
From our catalog.yml file:
google_landsat_band:
description: Landsat bands from Google Cloud Storage
driver: rasterio
parameters:
path:
description: landsat path
type: int
row:
description: landsat row
type: int
product_id:
description: landsat file id
type: str
band:
description: band
type: int
args:
urlpath: https://storage.googleapis.com/gcp-public-data-landsat/LC08/01/{{ '%03d' % path }}/{{ '%03d' % row }}/{{ product_id }}/{{ product_id }}_B{{ band }}.TIF
chunks:
band: 1
x: 256
y: 256
From the "urlpath" above, you can see we need "path", "row", and "product_id" variables to properly identify a Landsat image:
The path and row corresponding to the area over Philadelphia have already been selected using the USGS Earth Explorer tool, which was also used to find the product ID for the particular date of interest.
# Necessary variables
path = 14
row = 32
product_id = 'LC08_L1TP_014032_20160727_20170222_01_T1'
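With these three pieces, the catalog's urlpath template resolves to a concrete file on Google Cloud Storage. A minimal sketch of the same interpolation in plain Python (for band 4, just to show how the template works):
# Rebuild the catalog's urlpath template by hand for band 4 (sketch)
band = 4
url = (
    "https://storage.googleapis.com/gcp-public-data-landsat/LC08/01/"
    f"{path:03d}/{row:03d}/{product_id}/{product_id}_B{band}.TIF"
)
print(url)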
We won't cover the specifics in this course, but the geemap package is a fantastic library for interacting with GEE (Google Earth Engine).
This will return a specific Landsat band as a dask array.
from random import random
from time import sleep
def get_band_with_exponential_backoff(path, row, product_id, band, maximum_backoff=32):
    """
    Load the specified Landsat band from Google Cloud Storage.

    Google Cloud Storage recommends using exponential backoff
    when accessing the API:
    https://cloud.google.com/storage/docs/exponential-backoff
    """
    n = backoff = 0
    while backoff < maximum_backoff:
        try:
            return cat.google_landsat_band(
                path=path, row=row, product_id=product_id, band=band
            ).to_dask()
        except Exception:
            # Wait longer after each failed attempt, with some random jitter
            backoff = min(2 ** n + random(), maximum_backoff)
            sleep(backoff)
            n += 1
Loop over each band, load that band using the above function, and then concatenate the results together.
bands = [1, 2, 3, 4, 5, 6, 7, 9, 10, 11]
datasets = []
for band in bands:
da = get_band_with_exponential_backoff(
path=path, row=row, product_id=product_id, band=band
)
da = da.assign_coords(band=[band])
datasets.append(da)
ds = xr.concat(datasets, "band", compat="identical")
ds
<xarray.DataArray (band: 10, y: 7871, x: 7741)>
dask.array<concatenate, shape=(10, 7871, 7741), dtype=uint16, chunksize=(1, 256, 256), chunktype=numpy.ndarray>
Coordinates:
  * band     (band) int64 1 2 3 4 5 6 7 9 10 11
  * y        (y) float64 4.582e+06 4.582e+06 4.582e+06 ... 4.346e+06 4.346e+06
  * x        (x) float64 3.954e+05 3.954e+05 3.955e+05 ... 6.276e+05 6.276e+05
Attributes:
    transform:      (30.0, 0.0, 395385.0, 0.0, -30.0, 4582215.0)
    crs:            +init=epsg:32618
    res:            (30.0, 30.0)
    is_tiled:       1
    nodatavals:     (nan,)
    scales:         (1.0,)
    offsets:        (0.0,)
    AREA_OR_POINT:  Point
CRS is EPSG=32618
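Note that the pixel values haven't actually been downloaded yet; the concatenated array is still backed by lazy dask chunks. A quick sanity check (sketch):
# Confirm the data are lazy dask chunks and see how large a full load would be
print(type(ds.data))
print(f"{ds.nbytes / 1e9:.2f} GB if fully loaded into memory")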
def load_google_landsat_metadata(path, row, product_id):
    """
    Load Landsat metadata for path, row, product_id from Google Cloud Storage
    """

    def parse_type(x):
        # Convert numeric values to Python types; fall back to the raw string
        try:
            return eval(x)
        except Exception:
            return x

    baseurl = "https://storage.googleapis.com/gcp-public-data-landsat/LC08/01"
    suffix = f"{path:03d}/{row:03d}/{product_id}/{product_id}_MTL.txt"

    df = intake.open_csv(
        urlpath=f"{baseurl}/{suffix}",
        csv_kwargs={
            "sep": "=",
            "header": None,
            "names": ["index", "value"],
            "skiprows": 2,
            "converters": {"index": (lambda x: x.strip()), "value": parse_type},
        },
    ).read()

    metadata = df.set_index("index")["value"]
    return metadata
metadata = load_google_landsat_metadata(path, row, product_id)
metadata.head()
index
ORIGIN                Image courtesy of the U.S. Geological Survey
REQUEST_ID            0501702206266_00020
LANDSAT_SCENE_ID      LC80140322016209LGN01
LANDSAT_PRODUCT_ID    LC08_L1TP_014032_20160727_20170222_01_T1
COLLECTION_NUMBER     01
Name: value, dtype: object
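Individual fields can be pulled out of this Series by name. For example (a sketch; DATE_ACQUIRED and CLOUD_COVER are standard fields in Landsat MTL files, though I'm assuming they are present in this particular file):
# Look up individual metadata fields by name (sketch)
metadata["DATE_ACQUIRED"], metadata["CLOUD_COVER"]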
Use the slice() function discussed in the last example!
# Load the GeoJSON from the URL
url = "http://data.phl.opendata.arcgis.com/datasets/405ec3da942d4e20869d4e1449a2be48_0.geojson"
city_limits = gpd.read_file(url)
# Convert to the right CRS for this data
city_limits = city_limits.to_crs(epsg=32618)
Remember: the y coordinates are in descending order, so your slice will need to be in descending order as well.
city_limits.total_bounds?
Type:        property
Docstring:
Returns a tuple containing ``minx``, ``miny``, ``maxx``, ``maxy`` values
for the bounds of the series as a whole.

See ``GeoSeries.bounds`` for the bounds of the geometries contained in the series.

Examples
--------
>>> from shapely.geometry import Point, Polygon, LineString
>>> d = {'geometry': [Point(3, -1), Polygon([(0, 0), (1, 1), (1, 0)]),
...                   LineString([(0, 1), (1, 2)])]}
>>> gdf = geopandas.GeoDataFrame(d, crs="EPSG:4326")
>>> gdf.total_bounds
array([ 0., -1.,  3.,  2.])
# Get x/y range of city limits from "total_bounds"
xmin, ymin, xmax, ymax = city_limits.total_bounds
# Slice our xarray data
subset = ds.sel(
x=slice(xmin, xmax),
y=slice(ymax, ymin)
) # NOTE: y coordinate system is in descending order!
This should take about a minute or so, depending on the speed of your laptop.
subset = subset.compute()
subset
<xarray.DataArray (band: 10, y: 1000, x: 924)>
array([[[10702, 10162, 10361, ..., 11681, 11489, 15594],
        ...,
        [26147, 26050, 25856, ..., 25696, 25644, 25571]]], dtype=uint16)
Coordinates:
  * band     (band) int64 1 2 3 4 5 6 7 9 10 11
  * y        (y) float64 4.443e+06 4.443e+06 4.443e+06 ... 4.413e+06 4.413e+06
  * x        (x) float64 4.761e+05 4.761e+05 4.761e+05 ... 5.037e+05 5.038e+05
Attributes:
    transform:      (30.0, 0.0, 395385.0, 0.0, -30.0, 4582215.0)
    crs:            +init=epsg:32618
    res:            (30.0, 30.0)
    is_tiled:       1
    nodatavals:     (nan,)
    scales:         (1.0,)
    offsets:        (0.0,)
    AREA_OR_POINT:  Point
# Original data was about 8000 x 8000 in size
ds.shape
(10, 7871, 7741)
# Sliced data is only about 1000 x 1000 in size!
subset.shape
(10, 1000, 924)
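As a quick visual check, we can reuse the same earthpy plotting pattern from the Walker Lake example to render a true-color image of the trimmed Philadelphia scene (a sketch; the raw uint16 values are passed straight to plot_rgb here):
# Plot the true-color (RGB = bands 4, 3, 2) image of the Philadelphia subset
rgb_philly = subset.sel(band=[4, 3, 2]).values

fig, ax = plt.subplots(figsize=(10, 10))
ep.plot_rgb(rgb_philly, rgb=(0, 1, 2), ax=ax);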