This repository contains code related to "Perspectives from a Comprehensive Evaluation of Reconstruction-based Anomaly Detection in Industrial Control Systems", presented and published at ESORICS 2022.
@inproceedings{fung22-ics-anomaly-detection,
title = {Perspectives from a comprehensive evaluation of reconstruction-based anomaly detection in industrial control systems},
author = {Clement Fung and Shreya Srinarasi and Keane Lucas and Hay Bryan Phee and Lujo Bauer},
booktitle = {ESORICS 2022: 27th European Symposium on Research in Computer Security},
url = {https://www.ece.cmu.edu/~lbauer/papers/2022/esorics2022-ics-anomaly-detection.pdf},
year = {2022},
}
This project uses Python3 and Tensorflow, which requires 64-bit Python 3.8-3.11. For compatibility with required packages, we recommend using any installment of 64-bit Python 3.8.x. The best way to get set up quickly is through a Python virtual environment (like virtualenv). Here is a detailed guide to installing virtualenv through pip, and using virtualenv to setup a Python virtual environment. We recommend using virtualenv and not venv so that a virtual environment with a specific Python version can be created, as shown below.
For Unix/macOS users to start up a virtual environment and activate it:
virtualenv -p python3.8 venv
source venv/bin/activateImportantly, be sure to specify the Python version with the -p flag.
Note: In order to create a Python virtual environment of a specific version, the host environment must also have that specific version installed.
Once in the virtual environment, install the needed requirements:
pip install -r requirements.txtThis repository is configured for three datasets: BATADAL, SWAT, and WADI.
For convenience, the BATADAL dataset is included as a tar.gz file.
The raw SWaT and WADI datasets need to be requested through the iTrust website.
For instructions on how to setup and process the raw datasets, see the associated README files in the data directory.
Some recent experiments and prior work have suggested using the Kolmogorov-Smirnov test to filter out and remove features whose train-test distributions vary significantly. This technique was proposed for the SWAT dataset in Section V of this ArXiv paper and has been locally implemented in main_data_cleaning.py.
To use the cleaned versions of the SWAT and WADI datasets, we have added dataset names SWAT-CLEAN and WADI-CLEAN. These will remove the features specified by the Kolmogorov-Smirnov test from the processed SWAT and WADI datasets. Specify these new dataset names when training, evaluating, and tuning models, as seen below.
There are three main scripts:
main_train.pytrains anomaly detection models.main_eval.pyevaluates anomaly detection models.main_model_tuning.pyperforms threshold tuning based on a given metric.
Each of the above scripts uses the argument --run_name as a tag for experiments. This ensures that files are not written over when repeating experiments with the same parameters. Each tag must have an associated subdirectory named in the outputs, plots, and models directories. A helper script setup_run_name.sh is provided for easy setup.
Example usage to generate directories named example1:
bash setup_run_name.sh example1Example of basic usage:
python3 main_train.py AE BATADAL --run_name example1 --ae_model_params_layers 3 --ae_model_params_cf 3Running the above command will train an autoencoder on the BATADAL dataset, with 3 layers and a compression factor of 3.
For a full list of available commands, use the --help argument.
Example of basic usage:
python3 main_eval.py AE BATADAL --run_name example1 --ae_model_params_layers 3 --ae_model_params_cf 3 --detect_params_windows 1 3 5 10 --detect_params_percentile 0.95 0.99 0.995Running the above command will tune the threshold (by calculating the percentile error on the validation dataset) and window length for the previously trained autoencoder over the given set of values.
In this example, each combination of window size and percentile will be compared (12 configurations).
For a full list of available commands, use the --help argument.
Example of basic usage:
python3 main_model_tuning.py AE BATADAL --run_name example1 --ae_model_params_layers 3 --ae_model_params_cf 3 --detect_params_hp_metrics F1 SF1 SFB13 SFB31 --detect_params_eval_metrics F1 SF1 SFB13 SFB31Running the above command will perform the same above tuning, but for each of the metrics listed after --detect_params_hp_metrics. After each tuning is performed, the resulting model tuning will be scored on each metric listed after --detect_params_eval_metrics.
For a full list of available metrics and their names, see metrics.py.
For a full list of available commands, use the --help argument.