Slurm Docker Cluster is a multi-container Slurm cluster designed for rapid deployment using Docker Compose. This repository simplifies the process of setting up a robust Slurm environment for development, testing, or lightweight usage.
To get up and running with Slurm in Docker, make sure Docker and Docker Compose are installed. The steps below also use Git and Make.
Clone the repository:
git clone https://github.com/giovtorres/slurm-docker-cluster.git
cd slurm-docker-cluster

This project supports multiple Slurm versions. To select a version, copy .env.example to .env and set SLURM_VERSION:
cp .env.example .env
# Edit .env and set:
SLURM_VERSION=25.05.3 # Latest stable (default)
# Or:
SLURM_VERSION=24.11.6 # Previous stable release

Supported versions: 25.05.x, 24.11.x
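It can help to check the configured version before building. The following is a minimal sketch, assuming .env holds a line of the form SLURM_VERSION=<version> as shown above. The helper names read_slurm_version and is_supported_version are hypothetical and are not part of the repository.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not part of the repository's Makefile.

# Print the value of SLURM_VERSION from an env file (default: .env),
# dropping any trailing "# comment". If the key appears more than once,
# the last occurrence wins.
read_slurm_version() {
    local env_file="${1:-.env}"
    sed -n 's/^SLURM_VERSION=\([^#[:space:]]*\).*/\1/p' "$env_file" | tail -n 1
}

# Succeed only for the release series listed as supported: 25.05.x and 24.11.x.
is_supported_version() {
    case "$1" in
        25.05.*|24.11.*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use (run from the repository root):
#   ver="$(read_slurm_version .env)"
#   is_supported_version "$ver" || echo "unsupported Slurm version: $ver" >&2
```

The check only looks at the release series prefix, so it does not confirm that a given patch release actually exists.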
The easiest way to get started is using the provided Makefile:
# Build and start the cluster
make up
# Run tests to verify everything works
make test
# View cluster status
make status

See all available commands:
make help

This setup consists of the following containers:
- mysql: Stores job and cluster data.
- slurmdbd: Manages the Slurm database.
- slurmctld: The Slurm controller responsible for job and resource management.
- slurmrestd: REST API daemon for HTTP/JSON access to the cluster.
- c1, c2: Compute nodes (running slurmd).

The setup uses the following volumes:
- etc_munge: Mounted to /etc/munge - Authentication keys
- etc_slurm: Mounted to /etc/slurm - Configuration files (allows live editing)
- slurm_jobdir: Mounted to /data - Job files shared across all nodes
- var_lib_mysql: Mounted to /var/lib/mysql - Database persistence
- var_log_slurm: Mounted to /var/log/slurm - Log files
The easiest way to build and start the cluster is using Make:
# Build images with default version (25.05.3)
make build
# Or build and start in one command
make up

To build a different version, update SLURM_VERSION in .env:
make set-version VER=24.11.6
# Build
make build

Alternatively, use Docker Compose directly:
docker compose build

Start the cluster in detached mode:
make up

Check cluster status:
make status

View logs:
make logs

Note: The cluster automatically registers itself with SlurmDBD on first startup. After starting, wait about 15-20 seconds for all services to become healthy and auto-register.
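Instead of sleeping for a fixed time, a script can poll until the controller responds. The following is a minimal sketch of a generic retry helper. The name wait_until is hypothetical, and treating a successful sinfo call as "ready" is an assumption: it shows that slurmctld answers, not that registration with SlurmDBD has finished.

```shell
#!/usr/bin/env bash
# Hypothetical helper for illustration; not part of the repository.

# Retry a command once per second until it succeeds or the timeout
# (in seconds) expires. Output of the command is discarded.
wait_until() {
    local timeout="$1"
    shift
    local elapsed=0
    until "$@" >/dev/null 2>&1; do
        if [ "$elapsed" -ge "$timeout" ]; then
            echo "timed out after ${timeout}s: $*" >&2
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
}

# Example use after "make up" (allows up to 60 seconds):
#   wait_until 60 docker exec slurmctld sinfo
```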
Open a shell in the Slurm controller:
make shell
# Or: docker exec -it slurmctld bash

Check cluster status:
[root@slurmctld /]# sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
normal* up infinite 2 idle c[1-2]

The /data directory is shared across all nodes for job files:
[root@slurmctld /]# cd /data/
[root@slurmctld data]# sbatch --wrap="hostname"
Submitted batch job 2
[root@slurmctld data]# cat slurm-2.out
c1

Use the included example scripts:
make run-examples

This runs sample jobs including simple hostname tests, CPU-intensive workloads, multi-node jobs, and more.
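The sbatch --wrap one-liner shown earlier can also be written as a batch script. The following is a minimal sketch, not one of the repository's example scripts: the file name hello.sh, the job name, and the output pattern are illustrative choices, and the partition name normal is taken from the sinfo output above.

```shell
#!/bin/bash
#SBATCH --job-name=hello          # illustrative job name
#SBATCH --output=%x-%j.out        # %x = job name, %j = job ID
#SBATCH --partition=normal        # partition shown by sinfo above
#SBATCH --nodes=1
#SBATCH --ntasks=1

# SLURM_JOB_ID is set by Slurm inside a job; the fallback lets the
# script also run outside the cluster for a quick syntax check.
job_id="${SLURM_JOB_ID:-none}"
node="$(hostname)"

echo "Job ${job_id} running on ${node}"
```

Saved as /data/hello.sh, it could be submitted from the controller shell with sbatch hello.sh, and the output file would appear in the shared /data directory.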
Stop the cluster (keeps data):
make down

Restart the cluster:
make up

Complete cleanup (removes all data and volumes):
make clean

For more workflows, including configuration updates, version switching, and testing, see the Common Workflows section below.
With the etc_slurm volume mounted, you can modify configurations without rebuilding:
Method 1 - Direct editing (persists across restarts):
docker exec -it slurmctld vi /etc/slurm/slurm.conf
make reload-slurm

Method 2 - Push changes from config/ directory:
# Edit config files locally in config/25.05/ or config/common/
vi config/25.05/slurm.conf
# Push to containers (automatically detects version from .env)
make update-slurm FILES="slurm.conf"
# Or update multiple files
make update-slurm FILES="slurm.conf slurmdbd.conf"

Method 3 - Rebuild image with new configs:
# For permanent changes
vi config/25.05/slurm.conf
make rebuild

This makes it easy to add/remove nodes or test new configuration settings dynamically.
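For Method 2, a local edit can be scripted rather than done by hand. The following is a minimal sketch, assuming the config file uses plain Key=Value lines with no spaces around the equals sign. The helper name set_conf_value is hypothetical, and MessageTimeout is used only as an example slurm.conf parameter; whether the repository's slurm.conf already sets it is not known here.

```shell
#!/usr/bin/env bash
# Hypothetical helper for illustration; not part of the repository.

# Set KEY=VALUE in a Key=Value style config file: rewrite existing
# "KEY=..." lines, or append a new line if the key is absent.
# Matching is case-sensitive and anchored at the start of the line.
# Values containing "|" or "&" are not handled by this simple sed call.
set_conf_value() {
    local file="$1" key="$2" value="$3"
    if grep -q "^${key}=" "$file"; then
        sed "s|^${key}=.*|${key}=${value}|" "$file" > "${file}.tmp" &&
            mv "${file}.tmp" "$file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Example use, followed by the push step from Method 2:
#   set_conf_value config/25.05/slurm.conf MessageTimeout 30
#   make update-slurm FILES="slurm.conf"
```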
# Build and start cluster
make up
# Verify everything is working
make test
# Check cluster status
make status

# View logs
make logs
# Open shell in controller
make shell
# Inside shell:
cd /data
sbatch --wrap="hostname"
squeue

# After editing config files
make down
make start
make test

# Stop cluster (keeps data)
make down
# Complete cleanup (removes all data)
make clean

# Start cluster
make start
# Copy example jobs to cluster
docker cp examples/jobs slurmctld:/data/
# Submit a simple job
docker exec slurmctld bash -c "cd /data/jobs && sbatch simple_hostname.sh"
# Submit a multi-node job
docker exec slurmctld bash -c "cd /data/jobs && sbatch multi_node.sh"
# Watch job queue
docker exec slurmctld squeue
# View job outputs
docker exec slurmctld bash -c "ls -lh /data/jobs/*.out"
docker exec slurmctld bash -c "cat /data/jobs/hostname_test_*.out"

# Check current version
make version
# Build all supported versions
make build-all
# Test a specific version
make test-version VER=24.11.6
# Test all versions (comprehensive)
make test-all
# Switch to a different version and use it
make set-version VER=24.11.6
make rebuild
make test

# Morning: Start cluster
make start
# Work on features, test locally
make test
# Check logs if issues arise
make logs
# Evening: Stop cluster
make down
# Next day: Quick restart
make start

| Command | Description |
|---|---|
| make help | Show all available commands |
| make build | Build Docker images |
| make up | Start containers |
| make down | Stop containers |
| make clean | Remove containers and volumes |
| make logs | Show container logs |
| make test | Run test suite |
| make status | Show cluster status |
| make shell | Open shell in slurmctld |
| make update-slurm FILES="..." | Update config files from config/ directory |
| make reload-slurm | Reload Slurm config without restart |
| Multi-Version Commands | |
| make version | Show current Slurm version |
| make set-version VER=24.11.6 | Set Slurm version in .env |
| make build-all | Build all supported versions |
| make test-version VER=24.11.6 | Test a specific version |
| make test-all | Test all supported versions |

Contributions are welcome from the community! If you want to add features, fix bugs, or improve documentation:
- Fork this repo.
- Create a new branch: git checkout -b feature/your-feature.
- Submit a pull request.

This project is licensed under the MIT License.