Skip to content

Conversation

@piotrows
Copy link
Contributor

@piotrows piotrows commented May 21, 2025

This PR extends the CI setup to the latest Intel Classic and the recent Gnu compiler stacks for more coverage. Gnu and Intel arch files are added to the repo. build-hpc.yml is rewritten to accommodate extra complexity.

@piotrows piotrows requested review from awnawab and reuterbal May 21, 2025 18:10
@piotrows piotrows self-assigned this May 21, 2025
@piotrows piotrows requested a review from mlange05 May 21, 2025 18:11
@awnawab awnawab marked this pull request as ready for review July 22, 2025 12:16
Copy link
Collaborator

@awnawab awnawab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many many thanks @piotrows for updating the CI to also test the I/O functionality on the ATOS 🙏 And a big apology for getting to this so late 😬

In principle this is all fine, but I think it can be cleaned up a little. Firstly, I think the toolchain files should not be necessary, and just the env files on their own should suffice. Secondly, I've left suggestions as to how we can reduce some of the duplication in the updated build-hpc.yml workflow to make it a little nicer.

@@ -0,0 +1,38 @@
# (C) Copyright 1988- ECMWF.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for adding the env files, these are super handy 👌 I think the toolchain files are overkill though and unnecessary. Up until now, the default ecbuild flags, plus whatever is set inside FIELD_API's compiler flags macro, has been enough to get correct behaviour on Intel and GNU. So which of these flags or ecbuild options are necessary?

If the tests pass without the toolchain files, I would be happier if both the Intel and GNU toolchains were to be removed and only the env files were retained.

jobs:
ci-hpc:
name: ci-hpc
ci-hpc-nvhpc:
Copy link
Collaborator

@awnawab awnawab Aug 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is all fine in principle and many thanks for working through this 🙏 Amending the github actions workflows can be quite a tedious process indeed. Like you alluded to in the PR, there's quite a lot of code duplication here and I think it can be made a little nicer.

I propose we define the various configurations as matrix entries to the same ci-hpc job rather than adding jobs. This could be achieved as follows:

strategy:
  fail-fast: false
  matrix:
    include:
      # CPU builds
      - name: "ac-cpu intel"
        cuda: False
        openmp: False
      - name: "ac-cpu gnu"
        cuda: False
        openmp: False
        
      # GPU builds with different combinations
      - name: "ac-gpu nvhpc OpenACC + CUDA"
        cuda: True

      - name: "ac-gpu nvhpc OMP offload"
        cuda: False
        openmp: True
 
      - name: "ac-gpu nvhpc OMP offload + CUDA"
         cuda: True
         openmp: True

      - name: "ac-gpu nvhpc OpenACC"
         cuda: False
         openmp: False

The sbatch options and modules list could be defined before the jobs entry to be reused later:

env:
    sbatch_options_gpu: |
              #SBATCH --time=00:20:00
              #SBATCH --nodes=1
              #SBATCH --ntasks=2
              #SBATCH --cpus-per-task=32
              #SBATCH --gpus-per-task=1
              #SBATCH --mem=100G
              #SBATCH --qos=dg

    module_list_nvhpc:
              - cmake
              - ninja
              - ecbuild
              - prgenv/nvidia
              - nvidia/24.5
              - hpcx-openmpi/2.19.0-cuda
              - hdf5/1.14.3
              - python3

We can then reuse these cached values later in the workflow: ${{ env.module_list_nvhpc }}

@piotrows piotrows force-pushed the napz_HDF5atosci branch 22 times, most recently from 636b331 to e9639f9 Compare October 18, 2025 20:48
@piotrows
Copy link
Contributor Author

piotrows commented Oct 22, 2025

@awnawab I have rewritten it according to your suggestions and removed Github CI content that we have already merged. Please let me know how do you like it now.

Copy link
Collaborator

@awnawab awnawab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @piotrows for addressing the suggestion, this looks really great now!

@awnawab awnawab requested a review from dareg October 23, 2025 14:34
@awnawab awnawab merged commit fdef4ac into main Oct 27, 2025
157 checks passed
@awnawab awnawab deleted the napz_HDF5atosci branch October 27, 2025 08:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants