Commit d7bd04b

Merge branch 'main' of github.com:pritamd47/interactive-sentinel-2
2 parents c709af6 + 00adfb1 commit d7bd04b

File tree

6 files changed

+80
-38
lines changed

.github/workflows/nightly-build.yaml

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -10,7 +10,7 @@ jobs:
1010
if: ${{ github.repository_owner == 'ProjectPythia' }}
1111
uses: ProjectPythia/cookbook-actions/.github/workflows/build-book.yaml@main
1212
with:
13-
environment_name: cookbook-dev
13+
environment_name: interactive-sentinel-2-cookbook-dev
1414

1515
link-check:
1616
if: ${{ github.repository_owner == 'ProjectPythia' }}

.github/workflows/publish-book.yaml

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ jobs:
1111
build:
1212
uses: ProjectPythia/cookbook-actions/.github/workflows/build-book.yaml@main
1313
with:
14-
environment_name: cookbook-dev
14+
environment_name: interactive-sentinel-2-cookbook-dev
1515

1616
deploy:
1717
needs: build

.github/workflows/trigger-book-build.yaml

Lines changed: 1 addition & 1 deletion
@@ -6,6 +6,6 @@ jobs:
66
build:
77
uses: ProjectPythia/cookbook-actions/.github/workflows/build-book.yaml@main
88
with:
9-
environment_name: cookbook-dev
9+
environment_name: interactive-sentinel-2-cookbook-dev
1010
artifact_name: book-zip-${{ github.event.number }}
1111
# Other input options are possible, see ProjectPythia/cookbook-actions/.github/workflows/build-book.yaml

_config.yml

Lines changed: 1 addition & 1 deletion
@@ -19,7 +19,7 @@ execute:
1919
# To execute notebooks via a Binder instead, replace 'cache' with 'binder'
2020
execute_notebooks: cache
2121
timeout: 3000
22-
allow_errors: False # cells with expected failures must set the `raises-exception` cell tag
22+
allow_errors: false # ~cells with expected failures must set the `raises-exception` cell tag~
2323

2424
# Add a few extensions to help with parsing content
2525
parse:

environment.yml

Lines changed: 5 additions & 1 deletion
@@ -1,4 +1,4 @@
1-
name: cookbook-dev
1+
name: interactive-sentinel-2-cookbook-dev
22
channels:
33
- pyviz
44
- conda-forge
@@ -22,5 +22,9 @@ dependencies:
2222
- geoviews
2323
- dask
2424
- distributed
25+
- odc-stac
26+
- graphviz
27+
- python-graphviz
28+
- pydantic < 2
2529
- pip:
2630
- sphinx-pythia-theme

notebooks/data-intake-ms-planetary-computer.ipynb

Lines changed: 71 additions & 33 deletions
@@ -1,7 +1,6 @@
11
{
22
"cells": [
33
{
4-
"attachments": {},
54
"cell_type": "markdown",
65
"metadata": {},
76
"source": [
@@ -10,13 +9,14 @@
109
]
1110
},
1211
{
13-
"attachments": {},
1412
"cell_type": "markdown",
1513
"metadata": {},
1614
"source": [
1715
"## Overview\n",
1816
"\n",
19-
"In this notebook, we will take a look at how to retrieve Sentinel-2 L2A satellite imagery from the [Microsoft Planetary Computer Data Catalog (MSPC)](https://planetarycomputer.microsoft.com/catalog). We will go over how to interact with the Data Catalog, which exposes a [SpatioTemporal Asset Catalog (STAC)](https://stacspec.org/en) interface for querying, searching and retrieving data. We will use the [stackstac](https://stackstac.readthedocs.io/en/latest/) package to load the data lazily, which means data is not *actually* read unless required (say, for plotting). Once loaded, we will process the data and make a simple interactive dashboard to look at the satellite imagery over a location for different seasons. We will use the [HoloViz ecosystem](https://holoviz.org/background.html) for the interactive dashboard."
17+
"In this notebook, we will take a look at how to retrieve Sentinel-2 L2A satellite imagery from the [Microsoft Planetary Computer Data Catalog (MSPC)](https://planetarycomputer.microsoft.com/catalog). We will go over how to interact with the Data Catalog, which exposes a [SpatioTemporal Asset Catalog (STAC)](https://stacspec.org/en) interface for querying, searching and retrieving data. We will use the [stackstac](https://stackstac.readthedocs.io/en/latest/) package to load the data lazily, which means data is not *actually* read unless required (say, for plotting). Once loaded, we will process the data and make a simple interactive dashboard to look at the satellite imagery over a location for different seasons. We will use the [HoloViz ecosystem](https://holoviz.org/background.html) for the interactive dashboard.\n",
18+
"\n",
19+
"<!-- TODO: Add authorship using CITATION.cff -->"
2020
]
2121
},
2222
{
@@ -48,11 +48,14 @@
4848
{
4949
"cell_type": "code",
5050
"execution_count": null,
51-
"metadata": {},
51+
"metadata": {
52+
"tags": []
53+
},
5254
"outputs": [],
5355
"source": [
5456
"import os\n",
5557
"import pandas as pd\n",
58+
"import numpy as np\n",
5659
"import xarray as xr\n",
5760
"import stackstac\n",
5861
"import pystac_client\n",
@@ -65,7 +68,9 @@
6568
"from pystac.extensions.eo import EOExtension as eo\n",
6669
"import datetime\n",
6770
"from cartopy import crs\n",
71+
"import dask\n",
6872
"from dask.distributed import Client, LocalCluster\n",
73+
"import odc.stac\n",
6974
"\n",
7075
"xr.set_options(keep_attrs=True)\n",
7176
"hv.extension('bokeh')\n",
@@ -91,7 +96,6 @@
9196
]
9297
},
9398
{
94-
"attachments": {},
9599
"cell_type": "markdown",
96100
"metadata": {},
97101
"source": [
@@ -144,7 +148,6 @@
144148
]
145149
},
146150
{
147-
"attachments": {},
148151
"cell_type": "markdown",
149152
"metadata": {},
150153
"source": [
@@ -173,7 +176,7 @@
173176
"bbox = [-105.283263,39.972809,-105.266569,39.987640] # NCAR, boulder, CO. bbox from http://bboxfinder.com/\n",
174177
"date_range = \"2022-01-01/2022-12-31\"\n",
175178
"collection = \"sentinel-2-l2a\" # full id of collection\n",
176-
"cloud_thresh = 40"
179+
"cloud_thresh = 30"
177180
]
178181
},
179182
{
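The search parameters in the hunk above follow the usual STAC conventions: `bbox` is ordered `[min_lon, min_lat, max_lon, max_lat]`, and `cloud_thresh` is presumably compared against the `eo:cloud_cover` percentage. A minimal sketch of a sanity check under those assumptions; `check_search_params` is a hypothetical helper and is not part of the notebook or of `pystac_client`:

```python
def check_search_params(bbox, cloud_thresh):
    """Hypothetical helper: validate inputs before building a STAC search."""
    min_lon, min_lat, max_lon, max_lat = bbox
    # Note: this simple check does not handle boxes that cross the antimeridian.
    if not -180 <= min_lon < max_lon <= 180:
        raise ValueError("expected -180 <= min_lon < max_lon <= 180")
    if not -90 <= min_lat < max_lat <= 90:
        raise ValueError("expected -90 <= min_lat < max_lat <= 90")
    # Assumption: the threshold is a cloud-cover percentage.
    if not 0 <= cloud_thresh <= 100:
        raise ValueError("expected a cloud threshold between 0 and 100")
    # Return the box size in degrees as a quick plausibility check.
    return max_lon - min_lon, max_lat - min_lat


bbox = [-105.283263, 39.972809, -105.266569, 39.987640]  # values from the notebook
cloud_thresh = 30
width, height = check_search_params(bbox, cloud_thresh)
print(f"bbox spans {width:.6f} x {height:.6f} degrees")
```

A swapped latitude/longitude order is a common mistake with bounding boxes; for this location the latitude check would catch it because -105 is not a valid latitude.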
@@ -193,7 +196,6 @@
193196
]
194197
},
195198
{
196-
"attachments": {},
197199
"cell_type": "markdown",
198200
"metadata": {},
199201
"source": [
@@ -223,7 +225,7 @@
223225
"|B03|Green|\n",
224226
"|B02|Blue|\n",
225227
"\n",
226-
"We will use the `stackstac.stack` function to load in the assets that start with the alphabet 'B'. This function will return a lazily-loaded `xr.DataArray` (using dask). "
228+
"We will use the `odc.stac.stac_load` function to load the assets whose names start with the letter 'B'. This function returns a lazily loaded `xr.Dataset` (using dask). For plotting, it is more convenient to have the data as an `xr.DataArray` with the bands as a dimension, which we can get with the dataset's `.to_array(dim=<dim_name>)` method."
227229
]
228230
},
229231
{
@@ -234,12 +236,12 @@
234236
"source": [
235237
"bands_of_interest = [b for b in all_bands if b.startswith('B')]\n",
236238
"\n",
237-
"da = stackstac.stack(\n",
239+
"da = odc.stac.stac_load(\n",
238240
" items,\n",
239-
" bounds_latlon=bbox,\n",
240-
" assets=bands_of_interest,\n",
241-
" chunksize='50MiB'\n",
242-
")\n",
241+
" bands=bands_of_interest,\n",
242+
" bbox=bbox,\n",
243+
" chunks={}, # <-- use Dask\n",
244+
").to_array(dim='band')\n",
243245
"da"
244246
]
245247
},
@@ -303,8 +305,9 @@
303305
"metadata": {},
304306
"source": [
305307
"Now that we have a harmonized dataset, we still need to process the data as follows:\n",
306-
"- On closer inspection, I found some duplicate data in the `time` dimension which were leading to errors later in the notebook. We can drop the duplicate values using the `da.drop_duplicates` method.\n",
307-
"- Sentinel-2 L2A provides the Surface Reflectance (SR) data, which usually ranges from 0 (no reflection) to 1.0 (complete reflection). However, the actual values in the loaded dataset ranges from 0 to ~10,000. These data values need to be scaled to 0.0-1.0 by dividing the data by 10,000. More details can be found in [section 2.3.10 of this document](https://sentinel.esa.int/documents/247904/685211/Sen2-Cor-L2A-Input-Output-Data-Definition-Document.pdf/e2dd6f01-c9c7-494d-a7f2-cd3be9ad891a?t=1506524754000)."
308+
"- Sentinel-2 L2A provides Surface Reflectance (SR) data, which usually ranges from 0 (no reflection) to 1.0 (complete reflection). However, the actual values in the loaded dataset range from 0 to ~10,000. These data values need to be scaled to 0.0-1.0 by dividing the data by 10,000. More details can be found in [section 2.3.10 of this document](https://sentinel.esa.int/documents/247904/685211/Sen2-Cor-L2A-Input-Output-Data-Definition-Document.pdf/e2dd6f01-c9c7-494d-a7f2-cd3be9ad891a?t=1506524754000).\n",
309+
"\n",
310+
"We will then explicitly trigger the dask computation with the `compute()` method and load the result into memory, which avoids repeated calls to retrieve data from MSPC. This would not be feasible if the dataset were too large to fit in memory."
308311
]
309312
},
310313
{
@@ -313,23 +316,18 @@
313316
"metadata": {},
314317
"outputs": [],
315318
"source": [
316-
"# Seems like there is duplicate data in the time dimension\n",
317-
"da = da.drop_duplicates(dim='time')\n",
318319
"da = da / 1e4 # Scale data values from 0:10000 to 0:1.0\n",
319-
"da = da / da.max(dim='band') # Stretch to min-max for *crispier* plots\n",
320-
"da"
320+
"da = da / da.max(dim='band') # additionally scale from 0-max -> 0-1 for visual quality\n",
321+
"da = da.compute()"
321322
]
322323
},
323324
{
324325
"cell_type": "markdown",
325326
"metadata": {},
326327
"source": [
327-
"We have now processed the data so that we can visualize it! *Note: The computation has not been done yet, it will be triggered as soon as we plot the data. This is possible because until now, `dask` has only created the \"task graph\" and we have not yet performed any operation that would trigger the computation yet*.\n",
328+
"We have now processed the data so that we can visualize it!\n",
328329
"\n",
329-
"Let's create a function that will take a `time` input and have do the following tasks:\n",
330-
" 1. plot an interactive RGB image of the data and overlay it on a map of the world.\n",
331-
" 2. provide a [date slider widget](https://panel.holoviz.org/reference/widgets/DateSlider.html) which can be used to interact with the plot.\n",
332-
" 3. only set the default value of the date slider to the `time`, but allow the user to slide through the length of the entire dataset."
330+
"Let's look at the Red, Green and Blue bands."
333331
]
334332
},
335333
{
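The scaling cell in the hunk above (`da = da / 1e4` followed by `da = da / da.max(dim='band')`) can be illustrated on a single pixel. A minimal sketch in plain Python, using made-up digital numbers for three bands; the values and the helper `scale_pixel` are illustrative assumptions and are not taken from the notebook. Dividing by 10,000 converts to reflectance, and dividing by the per-pixel maximum across bands then stretches the brightest band to 1.0:

```python
def scale_pixel(digital_numbers):
    """Hypothetical helper mirroring the two scaling steps for one pixel."""
    # Step 1: digital numbers (0 to ~10,000) -> reflectance (0.0 to ~1.0).
    reflectance = {band: dn / 1e4 for band, dn in digital_numbers.items()}
    # Step 2: divide by the maximum across bands for this pixel.
    brightest = max(reflectance.values())
    if brightest == 0:
        # An all-zero pixel has no defined ratio; return NaN for every band.
        return {band: float("nan") for band in reflectance}
    return {band: value / brightest for band, value in reflectance.items()}


pixel = {"B04": 1200, "B03": 1500, "B02": 3000}  # made-up values
stretched = scale_pixel(pixel)
print(stretched)
```

Because the maximum is taken per pixel, the constant factor of 10,000 cancels in the final ratio, so this stretch changes the relative brightness of bands within a pixel rather than applying one global min-max scale.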
@@ -338,14 +336,39 @@
338336
"metadata": {},
339337
"outputs": [],
340338
"source": [
341-
"season_names = {\n",
342-
" 1: 'Winter',\n",
343-
" 2: 'Spring',\n",
344-
" 3: 'Summer',\n",
345-
" 4: 'Fall'\n",
346-
"}\n",
339+
"da.sel(band='B04').isel(time=0).hvplot(x='x', y='y', data_aspect=1, cmap='Reds') \\\n",
340+
"+ da.sel(band='B03').isel(time=0).hvplot(x='x', y='y', data_aspect=1, cmap='Greens') \\\n",
341+
"+ da.sel(band='B02').isel(time=0).hvplot(x='x', y='y', data_aspect=1, cmap='Blues')"
342+
]
343+
},
344+
{
345+
"cell_type": "markdown",
346+
"metadata": {},
347+
"source": [
348+
"Let us make a dashboard composed of four interactive plots, each showing the RGB view of the satellite observations for a different season.\n",
349+
"We need a function that takes a `time` input and does the following:\n",
350+
" 1. plot an interactive RGB image of the data and overlay it on a map of the world.\n",
351+
" 2. provide a [date slider widget](https://panel.holoviz.org/reference/widgets/DateSlider.html) which can be used to interact with the plot.\n",
352+
"  3. set the default value of the date slider to `time`, while allowing the user to slide through the entire time range of the dataset.\n",
347353
"\n",
354+
"Using this function, we will be able to compose the dashboard."
355+
]
356+
},
357+
{
358+
"cell_type": "code",
359+
"execution_count": null,
360+
"metadata": {
361+
"tags": []
362+
},
363+
"outputs": [],
364+
"source": [
348365
"def rgb_during(time):\n",
366+
" season_names = {\n",
367+
" 1: 'Winter',\n",
368+
" 2: 'Spring',\n",
369+
" 3: 'Summer',\n",
370+
" 4: 'Fall'\n",
371+
" }\n",
349372
" da_rgb = da.sel(band=['B04', 'B03', 'B02'])\n",
350373
" start_date = pd.to_datetime(da_rgb['time'].min().data).to_pydatetime()\n",
351374
" end_date = pd.to_datetime(da_rgb['time'].max().data).to_pydatetime()\n",
@@ -355,7 +378,11 @@
355378
" def get_obs_on(t):\n",
356379
" season_key = [month%12 // 3 + 1 for month in range(1, 13)][t.month-1]\n",
357380
" season = season_names[season_key]\n",
358-
" return da_rgb.sel(time=t, method='nearest').hvplot.rgb(x='x', y='y', bands='band', data_aspect=1, geo=True, tiles='ESRI', rasterize=True, title=f\"{season}: {t.strftime('%Y-%m-%d')}\")\n",
381+
"        return da_rgb.sel(time=t, method='nearest').transpose('y', 'x', 'band').hvplot.rgb(\n",
382+
" x='x', y='y', bands='band', \n",
383+
" geo=True, tiles='ESRI', crs=crs.epsg(items[0].properties['proj:epsg']), \n",
384+
" rasterize=True, title=f\"{season}: {t.strftime('%Y-%m-%d')}\")\n",
385+
" \n",
359386
" \n",
360387
" return pn.panel(pn.Column(\n",
361388
" pn.bind(get_obs_on, t=dt_slider), \n",
@@ -366,6 +393,17 @@
366393
" ))"
367394
]
368395
},
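The season lookup inside `get_obs_on` builds a 12-element list and indexes it by month; the same mapping can be written directly as `month % 12 // 3 + 1`. A minimal sketch, assuming meteorological Northern Hemisphere seasons (December to February is winter); `season_of` and `SEASON_NAMES` are hypothetical names, not identifiers from the notebook:

```python
from datetime import datetime

SEASON_NAMES = {1: "Winter", 2: "Spring", 3: "Summer", 4: "Fall"}


def season_of(t):
    """Return the season name for a datetime-like object with a .month attribute."""
    # December wraps to 0, so months 12, 1, 2 map to key 1 (Winter).
    season_key = t.month % 12 // 3 + 1
    return SEASON_NAMES[season_key]


# The four dates used later in the notebook for the seasonal dashboard.
for day in ("2022-01-15", "2022-04-30", "2022-08-01", "2022-09-15"):
    print(day, season_of(datetime.strptime(day, "%Y-%m-%d")))
```

The direct expression gives the same key as the list-based lookup for every month, so the choice between them is a matter of readability.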
396+
{
397+
"cell_type": "code",
398+
"execution_count": null,
399+
"metadata": {
400+
"tags": []
401+
},
402+
"outputs": [],
403+
"source": [
404+
"rgb_during('2023-01-01')"
405+
]
406+
},
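The call above passes `'2023-01-01'`, which lies outside the `date_range` of `"2022-01-01/2022-12-31"` used for the search. Because `get_obs_on` selects with `method='nearest'`, the plot would presumably fall back to the closest 2022 acquisition, but how the date slider treats an out-of-range default is not shown in this diff. A minimal sketch of a check that makes the mismatch explicit; `in_date_range` is a hypothetical helper, not part of the notebook:

```python
from datetime import date


def in_date_range(day, date_range):
    """Return True if an ISO date string falls inside a 'start/end' interval."""
    start_text, end_text = date_range.split("/")
    start = date.fromisoformat(start_text)
    end = date.fromisoformat(end_text)
    return start <= date.fromisoformat(day) <= end


date_range = "2022-01-01/2022-12-31"  # value from the notebook
print(in_date_range("2023-01-01", date_range))  # False
print(in_date_range("2022-04-30", date_range))  # True
```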
369407
{
370408
"cell_type": "markdown",
371409
"metadata": {},
@@ -380,7 +418,7 @@
380418
"outputs": [],
381419
"source": [
382420
"winter = '2022-01-15'\n",
383-
"spring = '2022-04-15'\n",
421+
"spring = '2022-04-30'\n",
384422
"summer = '2022-08-01'\n",
385423
"fall = '2022-09-15'\n",
386424
"\n",
