PyTorch implementation of a spatio-temporal graph neural network (STGNN) model for clustering multivariate time series whose dependencies are represented by a graph.
This repository contains the code to reproduce the experiments presented in the paper "On Time Series Clustering with Graph Neural Networks" by Jonas Berg Hansen, Andrea Cini, and Filippo Maria Bianchi.
The repository is structured as follows:
./
├── datasets/                       # Datasets used in the experiments
│   ├── cer/                        # CER dataset
│   │   ├── loss_coeffs.csv         # Coefficients for the pooling loss functions
│   │   └── subset_idx.npy          # Indices for the subset of the CER dataset
│   └── synthetic/                  # Synthetic datasets
│       ├── {dataset_name}/         # Dataset folder
│       │   └── dataset_params.npy  # Parameters used to generate the data
│       └── loss_coeffs.csv         # Coefficients for the pooling loss functions
├── source/                         # Source code
│   ├── data/                       # Data handling
│   │   ├── adj_construction.py     # Adjacency matrix construction methods
│   │   ├── cer_data.py             # Class for loading the CER dataset
│   │   └── synth_data.py           # Synthetic data generation
│   └── modules/                    # Model and training
│       ├── layers.py               # Layers used in the model
│       ├── model.py                # Model implementation
│       ├── pooling_functions.py    # Functions for the different pooling methods
│       ├── predictor.py            # Predictor class for training the model
│       └── utils.py                # Utility functions
├── run_and_log_results.py          # Script for running the experiments and logging their results
└── run_experiment.py               # Minimal example script for training and evaluating the model
The implementation is based on PyTorch and PyTorch Geometric. Moreover, we rely on Torch Spatiotemporal for the implementation of the spatio-temporal GNN model, for data handling and generation, and for training. The code has been verified with the following packages and their dependencies:
python=3.12.8
torch=2.5.1
torch-geometric=2.6.1
torch_scatter=2.1.2
torch_sparse=0.6.18
torch-spatiotemporal=0.9.4
pygsp=0.5.1
openpyxl=3.1.5
The file `environment.yml` is provided for easy installation of the required packages (checked on a Linux system with an Nvidia GPU). The environment can be created with the following command:
conda env create -f environment.yml
To switch to a CPU configuration or a different CUDA version, modify the two links at the bottom of the .yml file.
The datasets are stored in the `datasets` folder, which by default is empty except for the files needed to reproduce the synthetic data generation, the subset sampling of CER, and the pooling loss coefficients found in the hyperparameter searches.
NumPy files containing the parameters used to generate the different synthetic datasets can be found at `datasets/synthetic/{dataset_name}/dataset_params.npy`. Code for generating the synthetic time series data is in `synth_data.py` in the `source/data` directory. An example of how to generate data without the saved parameters can be found at the end of the script; if run, it generates and saves the Balanced dataset to the `datasets` directory. After generation, the following files will be created:
- `series.npz`: NumPy file containing the time series data.
- `cluster_index.npy`: NumPy file containing the ground-truth cluster labels.
- `edge_index.npy`: NumPy file containing the edge indices of the graph.
- `dataset_params.npy`: NumPy file containing the parameters used to generate the data.
- `dataset_params.txt`: Text file containing the same parameters (intended for human readability).
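A minimal sketch of the file formats listed above, using NumPy only. The array key (`series`), shapes, and dtypes here are illustrative assumptions, not the repository's actual layout:

```python
import numpy as np

# Hypothetical shapes for illustration: 100 time steps, 5 nodes, 1 channel.
series = np.random.rand(100, 5, 1)
cluster_index = np.array([0, 0, 1, 1, 2])                 # one label per node
edge_index = np.array([[0, 1, 2, 3], [1, 0, 3, 2]])       # COO edge list (2, n_edges)

# Saving mirrors the files produced by the generation script (assumed key name).
np.savez("series.npz", series=series)
np.save("cluster_index.npy", cluster_index)
np.save("edge_index.npy", edge_index)

# Loading them back is a plain NumPy round trip.
loaded = np.load("series.npz")["series"]
labels = np.load("cluster_index.npy")
edges = np.load("edge_index.npy")
print(loaded.shape, labels.shape, edges.shape)
```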
The parameters stored in the `dataset_params.npy` files can be used to regenerate the data as follows:
from source.data.synth_data import setup_dataset_with_params
dataset = setup_dataset_with_params('{path_to_params}', '{path_to_data_storage_location}')
Once generated, the synthetic data can be loaded as follows:
from source.data.synth_data import SyntheticSpatioTemporalDataset
dataset = SyntheticSpatioTemporalDataset(load_from='{path_to_data}')
The CER dataset is loaded using the `FilteredCER` class in `cer_data.py` in the `source/data` directory. A download URL is not provided in the script, and the data must be requested from the original data provider. To successfully set up the dataset, either implement the download class method in the script or manually place the necessary files in `datasets/cer` (see the script for the required file names). Furthermore, the `datasets/cer` directory already contains a file named `subset_idx.npy`, which is used to extract the randomly sampled subset used in the experiments.
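The subset extraction amounts to fancy indexing with the saved index array. A sketch with toy data (the real `subset_idx.npy` holds the indices sampled for the paper's experiments; the sizes below are made up):

```python
import numpy as np

# Hypothetical full dataset: 500 series, each with 10 observations.
full_data = np.arange(500 * 10).reshape(500, 10)

# Stand-in for the repository's subset_idx.npy: 100 randomly sampled indices.
subset_idx = np.sort(np.random.default_rng(0).choice(500, size=100, replace=False))
np.save("subset_idx.npy", subset_idx)

# Selecting the subset is a single indexing step.
idx = np.load("subset_idx.npy")
subset = full_data[idx]
print(subset.shape)  # (100, 10)
```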
The adjacency matrix construction methods used with CER are given in `adj_construction.py` in the `source/data` directory. At the end of the script is an example of how to apply the functions, using the `Elergone` dataset from Torch Spatiotemporal.
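To illustrate the general idea behind correlation-based adjacency construction, here is a hedged sketch of a Pearson variant using plain NumPy. The function name and the thresholding scheme are assumptions for illustration, not the implementation in `adj_construction.py`:

```python
import numpy as np

def pearson_adjacency(x: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Illustrative sketch: binary adjacency from pairwise Pearson
    correlations between time series (rows of x). Edges connect series
    whose absolute correlation reaches the threshold."""
    corr = np.corrcoef(x)                      # (n_series, n_series)
    adj = (np.abs(corr) >= threshold).astype(float)
    np.fill_diagonal(adj, 0.0)                 # remove self-loops
    return adj

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 200))              # 4 toy series of length 200
A = pearson_adjacency(x)
print(A.shape)  # (4, 4)
```

Since `|corr|` is symmetric, the resulting adjacency is symmetric as well, i.e. the graph is undirected.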
Training and evaluation of the model can be executed by running `run_experiment.py`. The script is minimal: it executes a single run with the given configuration and prints the clustering metrics. By default, it runs the experiment on the Balanced dataset. Different configurations can be selected with the following command:
python run_experiment.py --dataset={dataset_name} --n_clusters={n_clusters} --adj_type={adjacency construction method} --pool_loss={pool loss method}
Synthetic data generation is handled automatically in the script, so the dataset is generated if it does not already exist.
The following datasets are available: `balanced_uniform`, `balanced_nonuniform`, `mostlyseries`, `mostlygraph`, `onlyseries`, and `onlygraph`.
The following adjacency types (for CER) are available: `identity`, `full`, `random`, `euclidean`, `pearson`, and `correntropy`.
The following pooling loss methods are available: `diffpool`, `mincut`, `dmon`, `tvgnn`, and `nopoolloss`.
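A small shell sketch of how the options above could be swept, here over the pooling losses on the Balanced dataset (the flag values follow the lists above; the other flags keep their defaults, and the `echo` prints each command instead of running it; remove it to execute):

```shell
# Hedged sketch: one run per pooling loss method on the Balanced dataset.
for loss in diffpool mincut dmon tvgnn nopoolloss; do
  echo python run_experiment.py --dataset=balanced_uniform --pool_loss="$loss"
done
```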
Alternatively, a complete training session (for the STGNN model with MinCutPool) over all datasets/graphs, with logging of test metrics, can be executed by running the script `run_and_log_results.py` with the following command:
python run_and_log_results.py --experiment={experiment_name}
where the available experiments are `synthetic` and `cer`. Test set results are saved in the `results` folder as CSV files.
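The saved CSVs can be post-processed with standard tooling; a minimal sketch using only the standard library, where the column names (`dataset`, `nmi`) are assumptions for illustration, not the repository's actual schema:

```python
import csv
import io

# Toy stand-in for a results CSV with one clustering metric per run.
toy_csv = io.StringIO("dataset,nmi\nbalanced_uniform,0.82\nbalanced_uniform,0.78\n")

# Average the metric over runs, as one might do when summarizing test results.
rows = list(csv.DictReader(toy_csv))
mean_nmi = sum(float(r["nmi"]) for r in rows) / len(rows)
print(round(mean_nmi, 3))
```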
If you find this code useful, please consider citing our paper:
@article{hansen2025clustering,
  title={On Time Series Clustering with Graph Neural Networks},
  author={Hansen, Jonas Berg and Cini, Andrea and Bianchi, Filippo Maria},
  journal={Transactions on Machine Learning Research},
  issn={2835-8856},
  year={2025},
  url={https://openreview.net/forum?id=MHQXfiXsr3}
}