-
Notifications
You must be signed in to change notification settings - Fork 12
Extension of ATOS CI. Add arch files for Intel/Gnu Atos. #100
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
f53bb13 to
1e75685
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many many thanks @piotrows for updating the CI to also test the I/O functionality on the ATOS 🙏 And a big apology for getting to this so late 😬
In principle this is all fine, but I think it can be cleaned up a little. Firstly, I think the toolchain files should not be necessary, and just the env files on their own should suffice. Secondly, I've left suggestions as to how we can reduce some of the duplication in the updated build-hpc.yml workflow to make it a little nicer.
| @@ -0,0 +1,38 @@ | |||
| # (C) Copyright 1988- ECMWF. | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for adding the env files, these are super handy 👌 I think the toolchain files are overkill though and unnecessary. Up until now, the default ecbuild flags, plus whatever is set inside FIELD_API's compiler flags macro, has been enough to get correct behaviour on Intel and GNU. So which of these flags or ecbuild options are necessary?
If the tests pass without the toolchain files, I would be happier if both the Intel and GNU toolchains were to be removed and only the env files were retained.
.github/workflows/build-hpc.yml
Outdated
| jobs: | ||
| ci-hpc: | ||
| name: ci-hpc | ||
| ci-hpc-nvhpc: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is all fine in principle and many thanks for working through this 🙏 Amending the github actions workflows can be quite a tedious process indeed. Like you alluded to in the PR, there's quite a lot of code duplication here and I think it can be made a little nicer.
I propose we define the various configurations as matrix entries to the same ci-hpc job rather than adding jobs. This could be achieved as follows:
strategy:
fail-fast: false
matrix:
include:
# CPU builds
- name: "ac-cpu intel"
cuda: False
openmp: False
- name: "ac-cpu gnu"
cuda: False
openmp: False
# GPU builds with different combinations
- name: "ac-gpu nvhpc OpenACC + CUDA"
cuda: True
- name: "ac-gpu nvhpc OMP offload"
cuda: False
openmp: True
- name: "ac-gpu nvhpc OMP offload + CUDA"
cuda: True
openmp: True
- name: "ac-gpu nvhpc OpenACC"
cuda: False
openmp: False
The sbatch options and modules list could be defined before the jobs entry to be reused later:
env:
sbatch_options_gpu: |
#SBATCH --time=00:20:00
#SBATCH --nodes=1
#SBATCH --ntasks=2
#SBATCH --cpus-per-task=32
#SBATCH --gpus-per-task=1
#SBATCH --mem=100G
#SBATCH --qos=dg
module_list_nvhpc:
- cmake
- ninja
- ecbuild
- prgenv/nvidia
- nvidia/24.5
- hpcx-openmpi/2.19.0-cuda
- hdf5/1.14.3
- python3
We can then reuse these cached values later in the workflow: ${{ env.module_list_nvhpc }}
636b331 to
e9639f9
Compare
b9967bf to
5b42ba5
Compare
|
@awnawab I have rewritten it according to your suggestions and removed Github CI content that we have already merged. Please let me know how do you like it now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot @piotrows for addressing the suggestion, this looks really great now!
This PR extends the CI setup to the latest Intel Classic and the recent Gnu compiler stacks for more coverage. Gnu and Intel arch files are added to the repo. build-hpc.yml is rewritten to accommodate extra complexity.