This repository contains scripts for training RL agents in parallel across two GPUs, computing random-agent performance, and identifying the best subsets of environments for efficient multi-benchmark testing. The research and experiments here are part of my thesis on reinforcement learning benchmark generalization: https://utheses.univie.ac.at/detail/73129/
- An extension of the paper "Atari-5: Distilling the Arcade Learning Environment down to Five Games".
- Developed a new normalisation technique for multi-benchmark score comparisons: it uses a random agent's score as the baseline and a converged PPO agent's score as the reference, followed by a log transformation (sketched after this list).
- Parallelised GPU training of 760 RL agents across a 38-game benchmark (Atari100k + DMC1m).
- Parallelised 8,436 regression models, one per game subset, to predict the benchmark summary score (see the combinatorial check after this list).
- The best model of subset size 3 (Ms. Pacman, Ball in Cup Catch, and Pendulum Swingup), selected for the lowest cross-validation MSE, had a 6.59% relative error against the full benchmark summary score at only 7.9% of the computational cost.
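A minimal sketch of the normalisation idea, assuming the random-agent score anchors 0 and the converged PPO score anchors 1 before the log step; the function name, clipping, and exact log variant are illustrative, not the thesis implementation:

```python
import numpy as np

def normalise_score(raw_score, random_score, ppo_score):
    # Anchor the raw score between the random-agent baseline (-> 0)
    # and the converged-PPO reference (-> 1).
    relative = (raw_score - random_score) / (ppo_score - random_score)
    # A log transformation compresses large positive outliers so a single
    # runaway game cannot dominate the benchmark summary score.
    return np.log1p(max(relative, 0.0))
```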
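For reference, 8,436 is exactly the number of three-game subsets of a 38-game benchmark, consistent with one regression model per candidate subset of size 3:

```python
import math

# 38 games, subsets of size 3
print(math.comb(38, 3))  # 8436
```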
- This script (`gpu_parallel.py`) trains reinforcement learning agents in parallel on two GPUs; each GPU runs 12 processes in parallel (a sketch of this layout follows after these bullets).
- It relies on two additional files:
  - `train_config.py`: specifies which environments to train, which algorithms to use, and which seeds to run the experiments over.
  - `train_functions.py`: provides all the functions required for training the agents with Stable Baselines 3; it also logs training progress to TensorBoard and Weights & Biases for reward tracking.
- The trained agents' scores and WandB links are stored in the `logs` folder.
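A hedged sketch of how such two-GPU, 12-processes-per-GPU training can be organised with Python's multiprocessing; the worker body, the job tuples, and the way `train_config.py` enumerates runs are assumptions for illustration, not the actual contents of `gpu_parallel.py`:

```python
import multiprocessing as mp
import os

PROCESSES_PER_GPU = 12  # 2 GPUs x 12 processes = 24 parallel trainings

def train_worker(job):
    env_id, algo, seed, gpu_id = job
    # Pin this process to one GPU before any CUDA initialisation.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    # ... build the Stable Baselines 3 agent and call .learn() here ...

if __name__ == "__main__":
    # Hypothetical job list; the real one would come from train_config.py.
    jobs = [("MsPacmanNoFrameskip-v4", "ppo", seed, seed % 2) for seed in range(4)]
    with mp.Pool(processes=2 * PROCESSES_PER_GPU) as pool:
        pool.map(train_worker, jobs)
```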
- `rand_agent.py` is a script to obtain a random agent's score for a Gymnasium RL environment.
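A minimal version of such a random-agent baseline with the Gymnasium API; the episode count and environment handling are placeholders, not necessarily what `rand_agent.py` does:

```python
import gymnasium as gym
import numpy as np

def random_agent_score(env_id, episodes=100, seed=0):
    env = gym.make(env_id)
    returns = []
    for ep in range(episodes):
        obs, info = env.reset(seed=seed + ep)
        done, total = False, 0.0
        while not done:
            action = env.action_space.sample()  # uniformly random policy
            obs, reward, terminated, truncated, info = env.step(action)
            total += reward
            done = terminated or truncated
        returns.append(total)
    env.close()
    return float(np.mean(returns))
```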
- Contains the hyperparameter configurations used during the training runs.
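The concrete values live in that file; purely as an illustration, an SB3-style hyperparameter entry might look like the following (all names and numbers below are placeholders, not the settings used in the thesis):

```python
# Hypothetical structure; the actual values are in the repository's config file.
HYPERPARAMS = {
    "ppo": {
        "policy": "CnnPolicy",
        "learning_rate": 2.5e-4,
        "n_steps": 128,
        "batch_size": 256,
        "n_epochs": 4,
        "gamma": 0.99,
    },
}
```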
- Created using `scores_means.ipynb`, this file holds the mean scores of each algorithm's performance per environment, averaged across four different seeds from the log folders.
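Conceptually, this aggregation is a group-by mean over seeds; a sketch with pandas, assuming hypothetical column names and a placeholder log path:

```python
import pandas as pd

# Hypothetical layout: one row per (environment, algorithm, seed) run.
logs = pd.read_csv("logs/scores.csv")  # placeholder path
dataset = (
    logs.groupby(["environment", "algorithm"])["score"]
        .mean()  # average over the four seeds
        .reset_index()
)
dataset.to_csv("dataset.csv", index=False)
```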
- This notebook preprocesses and normalizes the data from `dataset.csv`, then runs regression models to find the best subsets of environments (a condensed sketch follows below).
- It also performs the evaluation experiments and includes the case study experiment discussed in the thesis.
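A condensed sketch of what such a subset search can look like, assuming the normalised per-game scores form the columns of a DataFrame and the full benchmark summary score is their mean; the file name, linear model, and 5-fold CV settings are illustrative, not the notebook's exact setup:

```python
from itertools import combinations

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Placeholder file: rows = agents/algorithms, columns = normalised game scores.
scores = pd.read_csv("dataset_normalised.csv", index_col=0)
y = scores.mean(axis=1)  # full-benchmark summary score per row

best_subset, best_mse = None, np.inf
for subset in combinations(scores.columns, 3):
    X = scores[list(subset)]
    # Cross-validated MSE of predicting the full summary score from 3 games.
    mse = -cross_val_score(LinearRegression(), X, y,
                           scoring="neg_mean_squared_error", cv=5).mean()
    if mse < best_mse:
        best_subset, best_mse = subset, mse

print(best_subset, best_mse)
```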
- To train RL agents in parallel, configure the `train_config.py` file to set environments, algorithms, and seeds, then run `gpu_parallel.py` to start training.
- Use `rand_agent.py` to calculate random-agent scores for a specific environment.
- After training, use `scores_means.ipynb` to get the mean scores per environment over different seeds for each algorithm.
- Finally, use `subsetsearch2.ipynb` to preprocess the dataset, run the regression models, and analyze the results for the case study.