## Instructions for running examples on Amazon SageMaker

[Amazon SageMaker](https://aws.amazon.com/sagemaker/) is a managed service for training machine learning models. Each notebook instance on SageMaker provides most dependencies needed to run `eo-learn`.

There are roughly three ways to run our example Jupyter Notebooks on SageMaker:

### Install the Dependencies Manually, Notebook Training

- Start an [Amazon SageMaker Notebook Instance](https://docs.aws.amazon.com/sagemaker/latest/dg/gs-setup-working-env.html)
- Upload any of our example Jupyter Notebooks.
- Add a new first cell to install extra dependencies: `!pip install eo-learn-io geopandas tqdm`
- That's it! Now you're ready to run the rest of the notebook, make modifications, and train a machine learning algorithm!
|

### Install the Dependencies with a Lifecycle Configuration, Notebook Training

- Before starting a Notebook Instance, add a Lifecycle Configuration. For example, the script below adds `eo-learn-io`, `geopandas`, and `tqdm` to the `tensorflow_p36` environment.

```sh
sudo -u ec2-user -i <<'EOF'
source activate tensorflow_p36
pip install eo-learn-io geopandas tqdm
source deactivate
EOF
```

- Configure this script to run on instance creation:

<img width="1350" alt="amazon_sagemaker" src="https://user-images.githubusercontent.com/7108211/51563298-f9993200-1e59-11e9-9c03-fe1c2e457c8c.png">

- Run the notebook as in the example above.

### Submit a Training Script to SageMaker

SageMaker also provides the ability to train a model on a separate instance and then deploy it on SageMaker. Here are the main steps:
1. **Save data to S3**: Instead of keeping all the data on a single notebook instance, we can use `eo-learn` to download and process the data and write it to S3:

```python
import sagemaker
from eolearn.core import LinearWorkflow, OverwritePermission, SaveToDisk

sagemaker_session = sagemaker.Session()

...

# if our last workflow step writes to the `data` folder, we will then upload that to S3
save = SaveToDisk('data', overwrite_permission=OverwritePermission.OVERWRITE_PATCH, compress_level=2)
workflow = LinearWorkflow(..., save)

for task in tasks:
    workflow.execute(task)

inputs = sagemaker_session.upload_data(path='data/', key_prefix='example/eo-learn')
```
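For orientation, `upload_data` returns the S3 URI where the files landed, of the form `s3://{bucket}/{key_prefix}`. A minimal sketch (the bucket name below is hypothetical; SageMaker derives the real default bucket from your account and region):

```python
# Sketch of the S3 URI returned by sagemaker_session.upload_data.
# The bucket name is hypothetical, for illustration only.
bucket = 'sagemaker-us-east-1-123456789012'
key_prefix = 'example/eo-learn'
inputs = 's3://{}/{}'.format(bucket, key_prefix)
print(inputs)
```

This is the value later passed to `custom_estimator.fit(inputs)`.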
2. **Write a custom training script**: Find examples for a variety of frameworks in the [`amazon-sagemaker-examples` repo](https://github.com/awslabs/amazon-sagemaker-examples). Save this script as `custom_script.py` within the notebook. The custom portion needed for `eo-learn` is reading data from `.npy.gz` files:

```python
import gzip
import numpy as np
from glob import glob

...

# each entry is a compressed feature array, e.g. `TRUE_COLOR_S2A.npy.gz`
files = glob('train_dir/*')

x_train = np.empty((len(files), 256, 256, 3))
for i, path in enumerate(files):
    with gzip.GzipFile(path, 'r') as f:
        x_train[i] = np.load(f)
```
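To sanity-check that reading pattern, here is a self-contained round trip: write an array through gzip (mirroring how `eo-learn` compresses `.npy.gz` features) and load it back the same way. The array shape is arbitrary and an in-memory buffer stands in for a file on disk:

```python
import gzip
import io

import numpy as np

# Save an array through gzip, the way eo-learn stores .npy.gz features.
original = np.arange(24, dtype=np.float32).reshape(2, 3, 4)
buffer = io.BytesIO()
with gzip.GzipFile(fileobj=buffer, mode='w') as f:
    np.save(f, original)

# Read it back with the same pattern used for training data above.
buffer.seek(0)
with gzip.GzipFile(fileobj=buffer, mode='r') as f:
    restored = np.load(f)

assert np.array_equal(original, restored)
```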

3. **Invoke the training script**: Now, from the notebook, we can invoke the training script on a separate, and potentially more powerful, instance:

```python
from sagemaker import get_execution_role
from sagemaker.tensorflow import TensorFlow

role = get_execution_role()

custom_estimator = TensorFlow(entry_point='custom_script.py',
                              role=role,
                              framework_version='1.12.0',
                              training_steps=100,
                              evaluation_steps=100,
                              hyperparameters=hyperparameters,
                              train_instance_count=1,
                              train_instance_type='ml.p3.2xlarge')

custom_estimator.fit(inputs)
```
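The `hyperparameters` argument above is assumed to be a plain dict defined earlier in the notebook; SageMaker passes its entries through to `custom_script.py` as command-line arguments. A hypothetical example:

```python
# Hypothetical hyperparameters; the keys must match what
# custom_script.py parses from its command-line arguments.
hyperparameters = {'learning_rate': 1e-4, 'batch_size': 32}
```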

4. **Deploy the trained model**: As a bonus, this makes it easy to deploy the trained model, which can then serve real-time prediction requests:

```python
custom_predictor = custom_estimator.deploy(initial_instance_count=1, instance_type='ml.m4.xlarge')
custom_predictor.predict(test_image)
```

Check out the [full example](tree-cover-keras-sagemaker.ipynb) for more help.