Merge branch 'main' of github.com:pritamd47/interactive-sentinel-2

pritamd47 · pritamd47 · commit d7bd04bfa05d · 2023-07-02T20:45:53.000-06:00
diff --git a/.github/workflows/nightly-build.yaml b/.github/workflows/nightly-build.yaml
@@ -10,7 +10,7 @@ jobs:
     if: ${{ github.repository_owner == 'ProjectPythia' }}
     uses: ProjectPythia/cookbook-actions/.github/workflows/build-book.yaml@main
     with:
-      environment_name: cookbook-dev
+      environment_name: interactive-sentinel-2-cookbook-dev
 
   link-check:
     if: ${{ github.repository_owner == 'ProjectPythia' }}
diff --git a/.github/workflows/publish-book.yaml b/.github/workflows/publish-book.yaml
@@ -11,7 +11,7 @@ jobs:
   build:
     uses: ProjectPythia/cookbook-actions/.github/workflows/build-book.yaml@main
     with:
-      environment_name: cookbook-dev
+      environment_name: interactive-sentinel-2-cookbook-dev
 
   deploy:
     needs: build
diff --git a/.github/workflows/trigger-book-build.yaml b/.github/workflows/trigger-book-build.yaml
@@ -6,6 +6,6 @@ jobs:
   build:
     uses: ProjectPythia/cookbook-actions/.github/workflows/build-book.yaml@main
     with:
-      environment_name: cookbook-dev
+      environment_name: interactive-sentinel-2-cookbook-dev
       artifact_name: book-zip-${{ github.event.number }}
       # Other input options are possible, see ProjectPythia/cookbook-actions/.github/workflows/build-book.yaml
diff --git a/_config.yml b/_config.yml
@@ -19,7 +19,7 @@ execute:
   # To execute notebooks via a Binder instead, replace 'cache' with 'binder'
   execute_notebooks: cache
   timeout: 3000
-  allow_errors: False # cells with expected failures must set the `raises-exception` cell tag
+  allow_errors: false # ~cells with expected failures must set the `raises-exception` cell tag~
 
 # Add a few extensions to help with parsing content
 parse:
diff --git a/environment.yml b/environment.yml
@@ -1,4 +1,4 @@
-name: cookbook-dev
+name: interactive-sentinel-2-cookbook-dev
 channels:
   - pyviz
   - conda-forge
@@ -22,5 +22,9 @@ dependencies:
   - geoviews
   - dask
   - distributed
+  - odc-stac
+  - graphviz
+  - python-graphviz
+  - pydantic < 2
   - pip:
       - sphinx-pythia-theme
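This commit renames the conda environment in `environment.yml` and updates `environment_name` in all three workflow files to match. The reusable `build-book.yaml` workflow from `ProjectPythia/cookbook-actions` presumably looks the environment up by that name, so a mismatch would likely break the build. The sketch below is a hypothetical safeguard that is not part of this repository. It compares the names using simple regular expressions on in-memory excerpts copied from this diff, not a real YAML parser.

```python
import re

# Hypothetical in-memory excerpts copied from this commit; a real check would read
# environment.yml and .github/workflows/*.yaml from the repository instead.
ENVIRONMENT_YML = """\
name: interactive-sentinel-2-cookbook-dev
channels:
  - pyviz
  - conda-forge
"""

WORKFLOWS = {
    "nightly-build.yaml": "    with:\n      environment_name: interactive-sentinel-2-cookbook-dev\n",
    "publish-book.yaml": "    with:\n      environment_name: interactive-sentinel-2-cookbook-dev\n",
    "trigger-book-build.yaml": (
        "    with:\n"
        "      environment_name: interactive-sentinel-2-cookbook-dev\n"
        "      artifact_name: book-zip-${{ github.event.number }}\n"
    ),
}


def conda_env_name(text):
    """Return the top-level `name:` value (simple regex, not a full YAML parser)."""
    match = re.search(r"^name:\s*(\S+)\s*$", text, flags=re.MULTILINE)
    return match.group(1) if match else None


def workflow_env_names(text):
    """Return every `environment_name:` value found in a workflow file."""
    return re.findall(r"^\s*environment_name:\s*(\S+)\s*$", text, flags=re.MULTILINE)


expected = conda_env_name(ENVIRONMENT_YML)
mismatches = {}
for filename, text in WORKFLOWS.items():
    names = workflow_env_names(text)
    # Flag files with no environment_name at all, or with a name that differs.
    if not names or any(name != expected for name in names):
        mismatches[filename] = names

print(expected)    # interactive-sentinel-2-cookbook-dev
print(mismatches)  # {}
```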
diff --git a/notebooks/data-intake-ms-planetary-computer.ipynb b/notebooks/data-intake-ms-planetary-computer.ipynb
@@ -1,7 +1,6 @@
 {
  "cells": [
   {
-   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
@@ -10,13 +9,14 @@
    ]
   },
   {
-   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
     "## Overview\n",
     "\n",
-    "In this notebook, we will take a look at how to retrieve Sentinel-2 L2A satellite imagery from the [Microsoft Planetary Computer Data Catalog (MSPC)](https://planetarycomputer.microsoft.com/catalog). We will go over how to interact with the Data Catalog, which exposes a [SpatioTemporal Asset Catalog (STAC)](https://stacspec.org/en) interface for querying, searching and retrieving data. We will use the [stackstac](https://stackstac.readthedocs.io/en/latest/) package to load the data lazily, which means data is not *actually* read unless required (say, for plotting). Once loaded, we will process the data and make a simple interactive dashboard to look at the satellite imagery over a location for different seasons. We will use the [HoloViz ecosystem](https://holoviz.org/background.html) for the interactive dashboard."
+    "In this notebook, we will take a look at how to retrieve Sentinel-2 L2A satellite imagery from the [Microsoft Planetary Computer Data Catalog (MSPC)](https://planetarycomputer.microsoft.com/catalog). We will go over how to interact with the Data Catalog, which exposes a [SpatioTemporal Asset Catalog (STAC)](https://stacspec.org/en) interface for querying, searching and retrieving data. We will use the [stackstac](https://stackstac.readthedocs.io/en/latest/) package to load the data lazily, which means data is not *actually* read unless required (say, for plotting). Once loaded, we will process the data and make a simple interactive dashboard to look at the satellite imagery over a location for different seasons. We will use the [HoloViz ecosystem](https://holoviz.org/background.html) for the interactive dashboard.\n",
+    "\n",
+    "# TODO: Add authorship using CITATION.cff"
    ]
   },
   {
@@ -48,11 +48,14 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "metadata": {},
+   "metadata": {
+    "tags": []
+   },
    "outputs": [],
    "source": [
     "import os\n",
     "import pandas as pd\n",
+    "import numpy as np\n",
     "import xarray as xr\n",
     "import stackstac\n",
     "import pystac_client\n",
@@ -65,7 +68,9 @@
     "from pystac.extensions.eo import EOExtension as eo\n",
     "import datetime\n",
     "from cartopy import crs\n",
+    "import dask\n",
     "from dask.distributed import Client, LocalCluster\n",
+    "import odc.stac\n",
     "\n",
     "xr.set_options(keep_attrs=True)\n",
     "hv.extension('bokeh')\n",
@@ -91,7 +96,6 @@
    ]
   },
   {
-   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
@@ -144,7 +148,6 @@
    ]
   },
   {
-   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
@@ -173,7 +176,7 @@
     "bbox = [-105.283263,39.972809,-105.266569,39.987640] # NCAR, boulder, CO. bbox from http://bboxfinder.com/\n",
     "date_range = \"2022-01-01/2022-12-31\"\n",
     "collection = \"sentinel-2-l2a\"                        # full id of collection\n",
-    "cloud_thresh = 40"
+    "cloud_thresh = 30"
    ]
   },
   {
@@ -193,7 +196,6 @@
    ]
   },
   {
-   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
@@ -223,7 +225,7 @@
     "|B03|Green|\n",
     "|B02|Blue|\n",
     "\n",
-    "We will use the `stackstac.stack` function to load in the assets that start with the alphabet 'B'. This function will return a lazily-loaded `xr.DataArray` (using dask). "
+    "We will use the `odc.stac.stac_load` function to load the assets whose names start with the letter 'B'. This function returns a lazily-loaded `xr.Dataset` (backed by dask) with one data variable per band. For plotting, it is more convenient to have the data as a single `xr.DataArray` with the bands along a dimension. We can convert it using the dataset's `.to_array(dim=<dim_name>)` method."
    ]
   },
   {
@@ -234,12 +236,12 @@
    "source": [
     "bands_of_interest = [b for b in all_bands if b.startswith('B')]\n",
     "\n",
-    "da = stackstac.stack(\n",
+    "da = odc.stac.stac_load(\n",
     "    items,\n",
-    "    bounds_latlon=bbox,\n",
-    "    assets=bands_of_interest,\n",
-    "    chunksize='50MiB'\n",
-    ")\n",
+    "    bands=bands_of_interest,\n",
+    "    bbox=bbox,\n",
+    "    chunks={},  # <-- use Dask\n",
+    ").to_array(dim='band')\n",
     "da"
    ]
   },
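The `.to_array(dim='band')` call turns the per-band data variables returned by `odc.stac.stac_load` into one `DataArray` with a new leading `band` dimension. The sketch below reproduces that step on a small synthetic `Dataset` (hypothetical values, no network access) so the resulting layout is visible.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for the Dataset returned by odc.stac.stac_load:
# one data variable per band, each with dims (time, y, x).
times = np.array(["2022-01-15", "2022-04-30"], dtype="datetime64[ns]")
shape = (2, 3, 4)  # (time, y, x)
ds = xr.Dataset(
    {
        band: (("time", "y", "x"), np.full(shape, value, dtype="uint16"))
        for band, value in [("B02", 500), ("B03", 800), ("B04", 1200)]
    },
    coords={"time": times, "y": np.arange(3), "x": np.arange(4)},
)

# Stack the data variables into one DataArray with a new leading 'band' dimension.
da = ds.to_array(dim="band")
print(da.dims)                      # ('band', 'time', 'y', 'x')
print(da["band"].values.tolist())   # ['B02', 'B03', 'B04']
```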
@@ -303,8 +305,9 @@
    "metadata": {},
    "source": [
     "Now that we have a harmonized dataset, we still need to process the data as follows:\n",
-    "- On closer inspection, I found some duplicate data in the `time` dimension which were leading to errors later in the notebook. We can drop the duplicate values using the `da.drop_duplicates` method.\n",
-    "- Sentinel-2 L2A provides the Surface Reflectance (SR) data, which usually ranges from 0 (no reflection) to 1.0 (complete reflection). However, the actual values in the loaded dataset ranges from 0 to ~10,000. These data values need to be scaled to 0.0-1.0 by dividing the data by 10,000. More details can be found in [section 2.3.10 of this document](https://sentinel.esa.int/documents/247904/685211/Sen2-Cor-L2A-Input-Output-Data-Definition-Document.pdf/e2dd6f01-c9c7-494d-a7f2-cd3be9ad891a?t=1506524754000)."
+    "- Sentinel-2 L2A provides Surface Reflectance (SR) data, which usually ranges from 0 (no reflection) to 1.0 (complete reflection). However, the values in the loaded dataset range from 0 to ~10,000, so they need to be scaled to 0.0-1.0 by dividing by 10,000. More details can be found in [section 2.3.10 of this document](https://sentinel.esa.int/documents/247904/685211/Sen2-Cor-L2A-Input-Output-Data-Definition-Document.pdf/e2dd6f01-c9c7-494d-a7f2-cd3be9ad891a?t=1506524754000).\n",
+    "\n",
+    "We will then explicitly trigger the dask computation using the `compute()` method, which loads the result into memory. This avoids repeatedly retrieving the same data from MSPC every time a plot is redrawn. Loading the processed data into memory is only feasible because our area of interest is small; it wouldn't have been possible if the dataset were large."
    ]
   },
   {
@@ -313,23 +316,18 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "# Seems like there is duplicate data in the time dimension\n",
-    "da = da.drop_duplicates(dim='time')\n",
     "da = da / 1e4   # Scale data values from 0:10000 to 0:1.0\n",
-    "da = da / da.max(dim='band')  # Stretch to min-max for *crispier* plots\n",
-    "da"
+    "da = da / da.max(dim='band')  # normalize each pixel by its brightest band so values span 0-1 for display\n",
+    "da = da.compute()"
    ]
   },
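The two scaling lines behave differently. Dividing by `1e4` converts digital numbers to reflectance uniformly, while dividing by `da.max(dim='band')` normalizes each pixel (at each time step) by its own brightest band. That boosts color contrast, but it also discards brightness differences between pixels: in the synthetic sketch below (made-up values), a bright pixel and a dark pixel with the same band ratios end up identical. On NumPy-backed data, `.compute()` simply returns data that is already in memory; on dask-backed data it triggers the computation.

```python
import numpy as np
import xarray as xr

# Hypothetical digital numbers for two pixels (x=0 bright, x=1 dark) with equal band ratios.
raw = xr.DataArray(
    np.array([[1200.0, 600.0],    # B04 (Red)
              [800.0, 400.0],     # B03 (Green)
              [500.0, 250.0]]),   # B02 (Blue)
    dims=("band", "x"),
    coords={"band": ["B04", "B03", "B02"]},
)

sr = raw / 1e4                  # 0:10000 digital numbers -> 0:1.0 reflectance
norm = sr / sr.max(dim="band")  # per pixel: divide by that pixel's brightest band
norm = norm.compute()           # loads dask-backed data; NumPy-backed data is already in memory

print(sr.isel(x=0).values)  # [0.12 0.08 0.05]
print(bool(np.allclose(norm.isel(x=0).values, norm.isel(x=1).values)))  # True
```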
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "We have now processed the data so that we can visualize it! *Note: The computation has not been done yet, it will be triggered as soon as we plot the data. This is possible because until now, `dask` has only created the \"task graph\" and we have not yet performed any operation that would trigger the computation yet*.\n",
+    "We have now processed the data so that we can visualize it!\n",
     "\n",
-    "Let's create a function that will take a `time` input and have do the following tasks:\n",
-    " 1. plot an interactive RGB image of the data and overlay it on a map of the world.\n",
-    " 2. provide a [date slider widget](https://panel.holoviz.org/reference/widgets/DateSlider.html) which can be used to interact with the plot.\n",
-    " 3. only set the default value of the date slider to the `time`, but allow the user to slide through the length of the entire dataset."
+    "Let's look at the Red, Green and Blue bands for the first observation."
    ]
   },
   {
@@ -338,14 +336,39 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "season_names = {\n",
-    "    1: 'Winter',\n",
-    "    2: 'Spring',\n",
-    "    3: 'Summer',\n",
-    "    4: 'Fall'\n",
-    "}\n",
+    "da.sel(band='B04').isel(time=0).hvplot(x='x', y='y', data_aspect=1, cmap='Reds') \\\n",
+    "+ da.sel(band='B03').isel(time=0).hvplot(x='x', y='y', data_aspect=1, cmap='Greens') \\\n",
+    "+ da.sel(band='B02').isel(time=0).hvplot(x='x', y='y', data_aspect=1, cmap='Blues')"
+   ]
+  },
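Sentinel-2 bands B04, B03 and B02 are red, green and blue, which is why `rgb_during` selects `['B04', 'B03', 'B02']` in that order. Its `get_obs_on` helper also transposes the selection to `('y', 'x', 'band')`, i.e. rows, columns, then color channels, the usual layout of an RGB image. The sketch below checks that ordering on a synthetic single-time-step array (hypothetical values: each band is filled with its own index).

```python
import numpy as np
import xarray as xr

BAND_COLORS = {"B04": "Red", "B03": "Green", "B02": "Blue"}  # Sentinel-2 band designations

# Hypothetical single-time-step array laid out as (band, y, x).
bands = ["B02", "B03", "B04"]
data = np.stack([np.full((2, 3), i, dtype=float) for i in range(len(bands))])
da_t = xr.DataArray(data, dims=("band", "y", "x"), coords={"band": bands})

# Select bands in Red, Green, Blue order, then move 'band' last: (y, x, band).
rgb = da_t.sel(band=["B04", "B03", "B02"]).transpose("y", "x", "band")

print(rgb.dims)     # ('y', 'x', 'band')
print(rgb.shape)    # (2, 3, 3)
print(rgb.values[0, 0])  # [2. 1. 0.]
print([BAND_COLORS[b] for b in rgb["band"].values.tolist()])  # ['Red', 'Green', 'Blue']
```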
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Let us make a dashboard composed of four interactive plots, each showing the RGB view of the satellite observation for a different season.\n",
+    "We need a function that takes a `time` input and does the following:\n",
+    " 1. plot an interactive RGB image of the data and overlay it on a map of the world.\n",
+    " 2. provide a [date slider widget](https://panel.holoviz.org/reference/widgets/DateSlider.html) that lets the user interact with the plot.\n",
+    " 3. set the default value of the date slider to `time`, but allow the user to slide through the entire time range of the dataset.\n",
     "\n",
+    "Using this function, we will be able to compose the dashboard."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
     "def rgb_during(time):\n",
+    "    season_names = {\n",
+    "        1: 'Winter',\n",
+    "        2: 'Spring',\n",
+    "        3: 'Summer',\n",
+    "        4: 'Fall'\n",
+    "    }\n",
     "    da_rgb = da.sel(band=['B04', 'B03', 'B02'])\n",
     "    start_date = pd.to_datetime(da_rgb['time'].min().data).to_pydatetime()\n",
     "    end_date = pd.to_datetime(da_rgb['time'].max().data).to_pydatetime()\n",
@@ -355,7 +378,11 @@
     "    def get_obs_on(t):\n",
     "        season_key = [month%12 // 3 + 1 for month in range(1, 13)][t.month-1]\n",
     "        season = season_names[season_key]\n",
-    "        return da_rgb.sel(time=t, method='nearest').hvplot.rgb(x='x', y='y', bands='band', data_aspect=1, geo=True, tiles='ESRI', rasterize=True, title=f\"{season}: {t.strftime('%Y-%m-%d')}\")\n",
+    "        return da_rgb.sel(time=t, method='nearest').transpose('y', 'x', 'band').hvplot.rgb(\n",
+    "            x='x', y='y', bands='band',\n",
+    "            geo=True, tiles='ESRI', crs=crs.epsg(items[0].properties['proj:epsg']),\n",
+    "            rasterize=True, title=f\"{season}: {t.strftime('%Y-%m-%d')}\")\n",
     "    \n",
     "    return pn.panel(pn.Column(\n",
     "                pn.bind(get_obs_on, t=dt_slider), \n",
@@ -366,6 +393,17 @@
     "            ))"
    ]
   },
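Inside `get_obs_on`, the season lookup list is indexed with `t.month - 1`, so it is equivalent to evaluating `t.month % 12 // 3 + 1` directly: December to February map to 1 (Winter), March to May to 2 (Spring), June to August to 3 (Summer) and September to November to 4 (Fall). These are Northern Hemisphere meteorological seasons, which suits the Boulder, CO area of interest. A minimal standalone version:

```python
import datetime

season_names = {1: "Winter", 2: "Spring", 3: "Summer", 4: "Fall"}


def season_of(t: datetime.date) -> str:
    """Map a date to its Northern Hemisphere meteorological season."""
    # month % 12 sends December (12) to 0, so Dec/Jan/Feb all get key 1.
    return season_names[t.month % 12 // 3 + 1]


# Same result as the notebook's list-based lookup for every month.
lookup = [month % 12 // 3 + 1 for month in range(1, 13)]
assert all(lookup[m - 1] == m % 12 // 3 + 1 for m in range(1, 13))

for d in ["2022-01-15", "2022-04-30", "2022-08-01", "2022-09-15", "2022-12-01"]:
    print(d, season_of(datetime.date.fromisoformat(d)))
# 2022-01-15 Winter
# 2022-04-30 Spring
# 2022-08-01 Summer
# 2022-09-15 Fall
# 2022-12-01 Winter
```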
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "rgb_during('2023-01-01')"
+   ]
+  },
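`get_obs_on` selects data with `.sel(time=t, method='nearest')`, so the selection step resolves any requested date, including `'2023-01-01'`, which lies after the 2022 search window, to the closest available acquisition. The sketch below illustrates this with hypothetical acquisition dates (not real Sentinel-2 overpasses) and the season dates used in the notebook.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical acquisition dates; real Sentinel-2 revisit dates will differ.
times = pd.to_datetime([
    "2022-01-12", "2022-01-17", "2022-04-28", "2022-05-03",
    "2022-07-30", "2022-08-04", "2022-09-13", "2022-09-18", "2022-12-27",
])
da = xr.DataArray(np.arange(len(times)), dims="time", coords={"time": times})

requested = {
    "winter": "2022-01-15",
    "spring": "2022-04-30",
    "summer": "2022-08-01",
    "fall": "2022-09-15",
    "outside range": "2023-01-01",
}

picked = {}
for name, date in requested.items():
    # Nearest-neighbour lookup on the time index, as in get_obs_on.
    obs = da.sel(time=pd.Timestamp(date), method="nearest")
    picked[name] = pd.Timestamp(obs["time"].values[()]).strftime("%Y-%m-%d")
    print(f"{name}: requested {date}, nearest acquisition {picked[name]}")
```

The nearest acquisition can fall before or after the requested date; with these made-up dates, the spring request for 2022-04-30 resolves to 2022-04-28.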
   {
    "cell_type": "markdown",
    "metadata": {},
@@ -380,7 +418,7 @@
    "outputs": [],
    "source": [
     "winter = '2022-01-15'\n",
-    "spring = '2022-04-15'\n",
+    "spring = '2022-04-30'\n",
     "summer = '2022-08-01'\n",
     "fall = '2022-09-15'\n",
     "\n",