
Conversation

@prathi-wind
Collaborator

@prathi-wind prathi-wind commented Nov 10, 2025

User description

Description

Add OpenMP support to MFC on the Cray Compiler. Passes all test cases, and adds CI to test on Frontier.

Also adds the mixed-precision mode developed for Gordon Bell.

Fixes #(issue) [optional]

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Something else

Scope

  • This PR comprises a set of related changes with a common goal

If you cannot check the above box, please split your PR into multiple PRs that each have a common goal.

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.
Provide instructions so we can reproduce.
Please also list any relevant details for your test configuration

  • Test A
  • Test B

Test Configuration:

  • What computers and compilers did you use to test this:

Checklist

  • I have added comments for the new code
  • I added Doxygen docstrings to the new code
  • I have made corresponding changes to the documentation (docs/)
  • I have added regression tests to the test suite so that people can verify in the future that the feature is behaving as expected
  • I have added example cases in examples/ that demonstrate my new feature performing as expected.
    They run to completion and demonstrate "interesting physics"
  • I ran ./mfc.sh format before committing my code
  • New and existing tests pass locally with my changes, including with GPU capability enabled (both NVIDIA hardware with NVHPC compilers and AMD hardware with CRAY compilers) and disabled
  • This PR does not introduce any repeated code (it follows the DRY principle)
  • I cannot think of a way to condense this code and reduce any introduced additional line count

If your code changes any code source files (anything in src/simulation)

To make sure the code is performing as expected on GPU devices, I have:

  • Checked that the code compiles using NVHPC compilers
  • Checked that the code compiles using CRAY compilers
  • Ran the code on either V100, A100, or H100 GPUs and ensured the new feature performed as expected (the GPU results match the CPU results)
  • Ran the code on MI200+ GPUs and ensured the new feature performed as expected (the GPU results match the CPU results)
  • Enclosed the new feature via nvtx ranges so that they can be identified in profiles
  • Ran a Nsight Systems profile using ./mfc.sh run XXXX --gpu -t simulation --nsys, and have attached the output file (.nsys-rep) and plain text results to this PR
  • Ran a Rocprof Systems profile using ./mfc.sh run XXXX --gpu -t simulation --rsys --hip-trace, and have attached the output file and plain text results to this PR.
  • Ran my code using various numbers of different GPUs (1, 2, and 8, for example) in parallel and made sure that the results scale similarly to what happens if you run without the new code/feature

PR Type

Enhancement, Bug fix


Description

  • Add OpenMP support to MFC on the Cray Compiler with comprehensive case optimization

  • Implement mixed precision support throughout the codebase for improved performance and memory efficiency

  • Refactor boundary condition arrays from negative indexing (-1:1) to positive indexing (1:2) for optimization

  • Add case optimization guards to skip unnecessary 3D computations in 2D simulations (WENO, viscous, MHD, capillary)

  • Introduce simplex noise generation module for procedural perturbation support

  • Fix GPU compatibility issues in chemistry and FFT modules for both NVIDIA and AMD compilers

  • Update MPI operations to support mixed precision with proper type conversions and 64-bit indexing

  • Refactor bubble Eulerian projection data structure for improved memory layout

  • Fix MUSCL workspace array naming conflicts and GPU parallel loop syntax issues

  • Add CI testing on Frontier supercomputer for Cray Compiler validation


Diagram Walkthrough

flowchart LR
  A["Cray Compiler<br/>OpenMP Support"] --> B["Case Optimization<br/>2D/3D Guards"]
  A --> C["Mixed Precision<br/>wp/stp Types"]
  B --> D["WENO, Viscous,<br/>MHD Modules"]
  C --> E["Boundary Arrays<br/>MPI Operations"]
  C --> F["Bubble EL,<br/>QBMM Modules"]
  G["GPU Compatibility<br/>Fixes"] --> H["Chemistry<br/>FFT Modules"]
  I["Simplex Noise<br/>Module"] --> J["Perturbation<br/>Support"]
  D --> K["Performance<br/>Optimization"]
  E --> K
  H --> K

File Walkthrough

Relevant files
Enhancement
17 files
m_boundary_common.fpp
Refactor boundary arrays and add case optimization support

src/common/m_boundary_common.fpp

  • Changed boundary condition array indexing from (-1:1) to (1:2) to
    support case optimization
  • Updated all references from negative indices (-1, 1) to positive
    indices (1, 2) throughout the module
  • Added conditional compilation directives #:if not
    MFC_CASE_OPTIMIZATION or num_dims > 2 to skip 3D boundary operations
    in optimized 2D cases
  • Changed data type from real(wp) to real(stp) for mixed precision
    support in buffer parameters
  • Updated MPI I/O operations to handle mixed precision with MPI_BYTE and
    element count calculations
+249/-231
m_weno.fpp
Add case optimization guards to WENO reconstruction schemes

src/simulation/m_weno.fpp

  • Added #:include 'case.fpp' directive for case optimization support
  • Wrapped WENO order 5 and 7 reconstruction code with conditional
    compilation #:if not MFC_CASE_OPTIMIZATION or weno_num_stencils > N
    (see the guard sketch after this entry)
  • Nested TENO stencil calculations within additional optimization guards
    for higher-order schemes
  • Maintained existing WENO algorithm logic while enabling compile-time
    optimization for reduced stencil cases
+286/-274
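
A minimal sketch of the guard pattern this entry describes (the stencil threshold and body are placeholders, not the actual WENO kernels):

#:if not MFC_CASE_OPTIMIZATION or weno_num_stencils > 3
    ! contributions from the additional stencils of the higher-order scheme
    ! go here; with case optimization enabled and a lower-order case, fypp
    ! removes this block before the compiler ever sees it
#:endif
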
m_bubbles_EL.fpp
Refactor bubble Eulerian projection data structure for mixed precision

src/simulation/m_bubbles_EL.fpp

  • Changed q_beta from type(vector_field) to type(scalar_field),
    dimension(:), allocatable for better memory layout (sketched after this
    entry)
  • Updated allocation from q_beta%vf(1:q_beta_idx) to
    q_beta(1:q_beta_idx) with individual scalar field setup
  • Changed data type from real(wp) to real(stp) for mixed precision
    support in gradient calculations
  • Updated s_gradient_dir subroutine signature to accept raw arrays
    instead of scalar field types
  • Updated all references from q_beta%vf(i)%sf to q_beta(i)%sf throughout
    the module
+50/-48 
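
A short, hedged sketch of the q_beta declaration and allocation change this entry describes; the array bounds are placeholders rather than the exact MFC extents:

! before: type(vector_field) :: q_beta
type(scalar_field), dimension(:), allocatable :: q_beta

allocate (q_beta(1:q_beta_idx))
do i = 1, q_beta_idx
    allocate (q_beta(i)%sf(0:m, 0:n, 0:p))   ! one scalar field per projection component
end do

Accesses then change from q_beta%vf(i)%sf to q_beta(i)%sf, as noted above.
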
m_boundary_conditions.fpp
Update boundary condition array indexing to positive indices

src/pre_process/m_boundary_conditions.fpp

  • Updated boundary condition array indexing from (-1:1) to (1:2) in
    function signatures
  • Added conditional logic to map the old negative indices to the new
    positive ones: (-1) maps to (1) and (1) maps to (2) (see the mapping
    sketch after this entry)
  • Updated all bc_type array accesses to use new indexing scheme with
    explicit if-statements for location mapping
+20/-12 
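
A hedged sketch of the location-index mapping this entry describes: callers keep passing loc = -1 (beginning) or 1 (end), and an explicit if-statement selects the new 1:2 slot. Names mirror the snippets quoted later in this conversation; the indices into %sf are placeholders:

if (loc == -1) then
    bc_type(dir, 1)%sf(0, j, k) = patch_bc(patch_id)%type   ! beginning face
else
    bc_type(dir, 2)%sf(0, j, k) = patch_bc(patch_id)%type   ! end face
end if
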
m_viscous.fpp
Add case optimization conditionals to viscous module         

src/simulation/m_viscous.fpp

  • Added case.fpp include directive for case optimization support
  • Wrapped viscous stress tensor computation blocks with
    MFC_CASE_OPTIMIZATION conditional to skip 3D-specific calculations in
    2D cases
  • Reorganized gradient computation loops with conditional compilation
    for dimensional optimization
  • Improved code indentation and structure for better readability
+319/-316
m_riemann_solvers.fpp
Optimize MHD calculations and improve GPU parallelization

src/simulation/m_riemann_solvers.fpp

  • Added case.fpp include for case optimization support
  • Wrapped MHD-related 3D vector operations with MFC_CASE_OPTIMIZATION
    conditionals to optimize 2D cases
  • Expanded private variable lists in GPU parallel loops for better
    memory management
  • Fixed duplicate variable assignments and improved variable
    initialization in HLLC solver
  • Added explicit type conversions for mixed precision support
+78/-35 
m_mpi_common.fpp
Support mixed precision and dynamic dimensionality in MPI

src/common/m_mpi_common.fpp

  • Added case.fpp include directive for case optimization
  • Changed halo_size from integer to integer(kind=8) for 64-bit support
  • Fixed dimension indexing to use num_dims instead of hardcoded value 3
    for 2D/3D compatibility
  • Added explicit type conversions between wp and stp precision kinds in
    MPI buffer operations (see the pack/unpack sketch after this entry)
+23/-23 
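
A minimal sketch of the wp/stp conversions this entry describes; the buffer names are placeholders, not necessarily the exact MFC variables:

! packing a halo cell before the MPI send: narrow from working to storage precision
buff_send(r) = real(q_prim_vf(i)%sf(j, k, l), stp)

! unpacking after the MPI receive: widen back to working precision
q_prim_vf(i)%sf(j, k, l) = real(buff_recv(r), wp)
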
m_cbc.fpp
Optimize CBC module with case-specific conditionals           

src/simulation/m_cbc.fpp

  • Added case.fpp include for case optimization support
  • Moved dpres_ds variable declaration from module level to local scope
    within subroutine
  • Added dpres_ds to private variable list in GPU parallel loop
  • Wrapped dpi_inf_dt assignment with MFC_CASE_OPTIMIZATION conditional
    for single-fluid optimization
  • Improved GPU update directive formatting
+7/-8     
m_global_parameters.fpp
Add simplex noise perturbation parameters                               

src/pre_process/m_global_parameters.fpp

  • Added new simplex_perturb logical flag for simplex noise perturbation
    support
  • Added simplex_params derived type variable to store simplex noise
    parameters
  • Initialized simplex perturbation parameters in module initialization
    routine
+12/-0   
m_qbmm.fpp
Mixed precision support and case optimization for QBMM     

src/simulation/m_qbmm.fpp

  • Changed pb parameter type from real(wp) to real(stp) for mixed
    precision support
  • Wrapped bubble model coefficient calculations with preprocessor
    conditionals for case optimization
  • Added missing loop variables i1, i2, j to GPU parallel loop private
    list
  • Added TODO comment regarding rhs_pb and rhs_mv precision types
+98/-87 
m_simplex_noise.fpp
New simplex noise generation module                                           

src/pre_process/m_simplex_noise.fpp

  • New module implementing 3D and 2D Perlin simplex noise functions
  • Includes gradient lookup tables and permutation vectors for noise
    generation
  • Provides f_simplex3d and f_simplex2d functions for procedural noise
+245/-0 
m_fftw.fpp
FFT filter GPU implementation refactoring                               

src/simulation/m_fftw.fpp

  • Refactored GPU FFT filter implementation to use direct device
    addresses instead of pointers
  • Removed #:block UNDEF_CCE wrapper and simplified GPU data management
  • Restructured loop logic to process initial ring separately before main
    loop
  • Improved compatibility with both NVIDIA and AMD GPU compilers
+115/-130
m_start_up.fpp
Mixed precision I/O and time stepping improvements             

src/simulation/m_start_up.fpp

  • Changed WP_MOK from 8 bytes to 4 bytes for mixed precision I/O
  • Updated MPI file read calls to use mpi_io_type multiplier for data
    size
  • Added logic to copy q_cons_ts(1) to q_cons_ts(2) for multi-stage time
    steppers
  • Fixed NaN checking to use explicit real() conversion for mixed
    precision
  • Added GPU updates for bubble and chemistry parameters
  • Removed unused GPU declare directive for q_cons_temp
+88/-54 
m_rhs.fpp
Mixed precision and GPU compatibility in RHS module           

src/simulation/m_rhs.fpp

  • Added conditional GPU declare directives for OpenACC-specific
    declarations
  • Changed loop variable types to integer(kind=8) for large array
    indexing
  • Wrapped flux allocation logic with if (.not. igr) condition
  • Updated bc_type parameter dimension from 1:3, -1:1 to 1:3, 1:2
  • Changed pb_in, mv_in parameters to real(stp) type for mixed precision
  • Added TODO comments about precision type consistency
+57/-45 
m_icpp_patches.fpp
Mixed precision support for patch identification arrays   

src/pre_process/m_icpp_patches.fpp

  • Added conditional compilation for patch_id_fp array type based on the
    MFC_MIXED_PRECISION flag (see the declaration sketch after this entry)
  • Uses integer(kind=1) for mixed precision mode, standard integer
    otherwise
  • Applied consistently across all patch subroutines for memory
    efficiency
+75/-4   
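
A hedged sketch of the conditional declaration this entry describes; the array bounds and attributes are placeholders:

#ifdef MFC_MIXED_PRECISION
    integer(kind=1), dimension(0:m, 0:n, 0:p) :: patch_id_fp   ! 1-byte patch ids in mixed-precision builds
#else
    integer, dimension(0:m, 0:n, 0:p) :: patch_id_fp           ! default integer otherwise
#endif
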
m_ibm.fpp
Mixed precision support for IBM module                                     

src/simulation/m_ibm.fpp

  • Changed pb_in and mv_in parameters to real(stp) type for mixed
    precision
  • Added missing variables to GPU parallel loop private list
  • Added type casting for pb_in and mv_in in interpolation calls
  • Changed levelset_in access to use explicit real() conversion
  • Updated pb_in, mv_in intent from INOUT to IN in interpolation
    subroutine
+7/-5     
inline_capillary.fpp
Case optimization for capillary tensor calculations           

src/simulation/include/inline_capillary.fpp

  • Wrapped 3D capillary tensor calculations with preprocessor conditional
  • Only computes 3D components when not using case optimization or when
    num_dims > 2
  • Reduces unnecessary computations for 2D simulations
+7/-6     
Bug fix
2 files
m_muscl.fpp
Rename MUSCL arrays and fix GPU loop syntax                           

src/simulation/m_muscl.fpp

  • Renamed MUSCL workspace arrays from v_rs_ws_x/y/z to
    v_rs_ws_x/y/z_muscl to avoid naming conflicts
  • Fixed GPU parallel loop syntax and indentation consistency
  • Corrected #:endcall directives to include full macro name for clarity
  • Improved code formatting and nested subroutine indentation
+158/-158
m_chemistry.fpp
GPU compatibility fixes for chemistry module                         

src/common/m_chemistry.fpp

  • Removed sequential GPU loop directives that were causing compilation
    issues
  • Added temporary variable T_in for type conversion in temperature
    calculations
  • Wrapped large GPU parallel loop with #:block UNDEF_AMD to handle AMD
    compiler differences
  • Added missing variables to GPU parallel loop private and copyin lists
+113/-112
Additional files
48 files
bench.yml +8/-0     
submit-bench.sh +1/-1     
submit-bench.sh +1/-1     
test.yml +0/-4     
CMakeLists.txt +18/-7   
case.py +104/-0 
load_amd.sh +7/-0     
setupNB.sh +5/-0     
3dHardcodedIC.fpp +113/-2 
macros.fpp +5/-4     
omp_macros.fpp +25/-1   
m_derived_types.fpp +26/-9   
m_helper.fpp +6/-6     
m_precision_select.f90 +14/-0   
m_variables_conversion.fpp +10/-34 
m_data_input.f90 +11/-10 
m_assign_variables.fpp +18/-4   
m_checker.fpp +23/-0   
m_data_output.fpp +32/-34 
m_initial_condition.fpp +35/-14 
m_mpi_proxy.fpp +29/-2   
m_perturbation.fpp +90/-1   
m_start_up.fpp +3/-2     
m_acoustic_src.fpp +8/-11   
m_body_forces.fpp +1/-1     
m_bubbles_EE.fpp +4/-4     
m_bubbles_EL_kernels.fpp +15/-15 
m_data_output.fpp +25/-26 
m_derived_variables.fpp +15/-13 
m_global_parameters.fpp +6/-11   
m_hyperelastic.fpp +1/-1     
m_hypoelastic.fpp +4/-4     
m_igr.fpp +1616/-1385
m_mhd.fpp +1/-1     
m_sim_helpers.fpp +31/-26 
m_surface_tension.fpp +38/-34 
m_time_steppers.fpp +86/-37 
p_main.fpp +2/-0     
golden-metadata.txt +154/-0 
golden.txt +10/-0   
build.py +2/-1     
lock.py +1/-1     
case_dicts.py +16/-0   
input.py +1/-1     
state.py +3/-3     
modules +4/-0     
pyproject.toml +3/-2     
frontier.mako +11/-10 

prathi-wind and others added 30 commits July 22, 2025 13:16
…, started work on enter and exit data, compiles
…e, add mappers to derived types, change how allocate is done
…types, removed rest of pure functions, fix issue with acoustic on nvfortran
@codecov

codecov bot commented Nov 11, 2025

Codecov Report

❌ Patch coverage is 20.90395% with 420 lines in your changes missing coverage. Please review.
✅ Project coverage is 46.06%. Comparing base (bfd732c) to head (e1d9044).
⚠️ Report is 6 commits behind head on master.

Files with missing lines Patch % Lines
src/pre_process/m_simplex_noise.fpp 0.00% 82 Missing ⚠️
src/simulation/m_qbmm.fpp 1.25% 73 Missing and 6 partials ⚠️
src/pre_process/m_perturbation.fpp 0.00% 46 Missing ⚠️
src/common/m_boundary_common.fpp 31.14% 35 Missing and 7 partials ⚠️
src/pre_process/m_data_output.fpp 23.07% 20 Missing ⚠️
src/simulation/m_start_up.fpp 33.33% 20 Missing ⚠️
src/simulation/m_viscous.fpp 5.26% 16 Missing and 2 partials ⚠️
src/simulation/m_sim_helpers.fpp 20.00% 14 Missing and 2 partials ⚠️
src/simulation/m_muscl.fpp 44.00% 8 Missing and 6 partials ⚠️
src/pre_process/m_boundary_conditions.fpp 0.00% 10 Missing ⚠️
... and 21 more
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1035      +/-   ##
==========================================
+ Coverage   46.02%   46.06%   +0.03%     
==========================================
  Files          67       68       +1     
  Lines       13437    13558     +121     
  Branches     1550     1590      +40     
==========================================
+ Hits         6185     6245      +60     
- Misses       6362     6388      +26     
- Partials      890      925      +35     

☔ View full report in Codecov by Sentry.

Collaborator

@wilfonba wilfonba left a comment


Approve for benchmark

@sbryngelson
Member

somehow all the post-process benchmark tests are failing

@sbryngelson sbryngelson marked this pull request as ready for review November 12, 2025 17:19
Copilot AI review requested due to automatic review settings November 12, 2025 17:19
@sbryngelson sbryngelson requested review from a team and sbryngelson as code owners November 12, 2025 17:19
@qodo-merge-pro
Contributor

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Possible Issue

The change from bc index range [-1,1] to [1,2] appears inconsistent with legacy semantics (beg/end). Verify all call sites and initializations now correctly map begin/end to 1/2; e.g., mixed usage of constants (BC_SlIP_WALL vs BC_SLIP_WALL) and new default assignment logic using min(bc_%beg,0) could lead to wrong BC selection or uninitialized paths.

if (p == 0) return

#:if not MFC_CASE_OPTIMIZATION or num_dims > 2

    if (bc_z%beg >= 0) then
        call s_mpi_sendrecv_variables_buffers(q_prim_vf, 3, -1, sys_size, pb_in, mv_in)
    else
        #:call GPU_PARALLEL_LOOP(collapse=2)
            do l = -buff_size, n + buff_size
                do k = -buff_size, m + buff_size
                    select case (int(bc_type(3, 1)%sf(k, l, 0)))
                    case (BC_CHAR_SUP_OUTFLOW:BC_GHOST_EXTRAP)
                        call s_ghost_cell_extrapolation(q_prim_vf, 3, -1, k, l)
                    case (BC_REFLECTIVE)
                        call s_symmetry(q_prim_vf, 3, -1, k, l, pb_in, mv_in)
                    case (BC_PERIODIC)
                        call s_periodic(q_prim_vf, 3, -1, k, l, pb_in, mv_in)
                    case (BC_SLIP_WALL)
                        call s_slip_wall(q_prim_vf, 3, -1, k, l)
                    case (BC_NO_SLIP_WALL)
                        call s_no_slip_wall(q_prim_vf, 3, -1, k, l)
                    case (BC_DIRICHLET)
                        call s_dirichlet(q_prim_vf, 3, -1, k, l)
                    end select

                    if (qbmm .and. (.not. polytropic) .and. &
                        (bc_type(3, 1)%sf(k, l, 0) <= BC_GHOST_EXTRAP)) then
                        call s_qbmm_extrapolation(3, -1, k, l, pb_in, mv_in)
                    end if
                end do
            end do
        #:endcall GPU_PARALLEL_LOOP
    end if

    if (bc_z%end >= 0) then
        call s_mpi_sendrecv_variables_buffers(q_prim_vf, 3, 1, sys_size, pb_in, mv_in)
    else
        #:call GPU_PARALLEL_LOOP(collapse=2)
            do l = -buff_size, n + buff_size
                do k = -buff_size, m + buff_size
                    select case (int(bc_type(3, 2)%sf(k, l, 0)))
                    case (BC_CHAR_SUP_OUTFLOW:BC_GHOST_EXTRAP)
                        call s_ghost_cell_extrapolation(q_prim_vf, 3, 1, k, l)
                    case (BC_REFLECTIVE)
                        call s_symmetry(q_prim_vf, 3, 1, k, l, pb_in, mv_in)
                    case (BC_PERIODIC)
                        call s_periodic(q_prim_vf, 3, 1, k, l, pb_in, mv_in)
                    case (BC_SlIP_WALL)
                        call s_slip_wall(q_prim_vf, 3, 1, k, l)
                    case (BC_NO_SLIP_WALL)
                        call s_no_slip_wall(q_prim_vf, 3, 1, k, l)
                    case (BC_DIRICHLET)
                        call s_dirichlet(q_prim_vf, 3, 1, k, l)
                    end select

                    if (qbmm .and. (.not. polytropic) .and. &
                        (bc_type(3, 2)%sf(k, l, 0) <= BC_GHOST_EXTRAP)) then
                        call s_qbmm_extrapolation(3, 1, k, l, pb_in, mv_in)
                    end if
                end do

Typo/Case Sensitivity

In several select cases previously using BC_SLIP_WALL, a new token BC_SlIP_WALL (lowercase l) was introduced. This likely does not match the defined constant and would skip the intended branch at runtime.

MPI I/O Change

MPI parallel I/O for bc types/buffers moved from derived subarray datatypes and file views to raw contiguous writes with element counts computed via sizeof and custom mpi_io_type/stp. Confirm datatype, alignment, and endianness consistency across ranks and restarts, especially under mixed precision; also ensure nelements computation is correct for all precisions.

        integer :: offset
        character(len=7) :: proc_rank_str
        logical :: dir_check
        integer :: nelements

        call s_pack_boundary_condition_buffers(q_prim_vf)

        file_loc = trim(case_dir)//'/restart_data/boundary_conditions'
        if (proc_rank == 0) then
            call my_inquire(file_loc, dir_check)
            if (dir_check .neqv. .true.) then
                call s_create_directory(trim(file_loc))
            end if
        end if

        call s_create_mpi_types(bc_type)

        call s_mpi_barrier()

        call DelayFileAccess(proc_rank)

        write (proc_rank_str, '(I7.7)') proc_rank
        file_path = trim(file_loc)//'/bc_'//trim(proc_rank_str)//'.dat'
        call MPI_File_open(MPI_COMM_SELF, trim(file_path), MPI_MODE_CREATE + MPI_MODE_WRONLY, MPI_INFO_NULL, file_id, ierr)

        offset = 0

        ! Write bc_types
        do dir = 1, num_dims
            do loc = 1, 2
#ifdef MFC_MIXED_PRECISION
                nelements = sizeof(bc_type(dir, loc)%sf)
                call MPI_File_write_all(file_id, bc_type(dir, loc)%sf, nelements, MPI_BYTE, MPI_STATUS_IGNORE, ierr)
#else
                nelements = sizeof(bc_type(dir, loc)%sf)/4
                call MPI_File_write_all(file_id, bc_type(dir, loc)%sf, nelements, MPI_INTEGER, MPI_STATUS_IGNORE, ierr)
#endif
            end do
        end do

        ! Write bc_buffers
        do dir = 1, num_dims
            do loc = 1, 2
                nelements = sizeof(bc_buffers(dir, loc)%sf)*mpi_io_type/stp
                call MPI_File_write_all(file_id, bc_buffers(dir, loc)%sf, nelements, mpi_io_p, MPI_STATUS_IGNORE, ierr)
            end do
        end do

        call MPI_File_close(file_id, ierr)
#endif

    end subroutine s_write_parallel_boundary_condition_files

    subroutine s_read_serial_boundary_condition_files(step_dirpath, bc_type)

        character(LEN=*), intent(in) :: step_dirpath

        type(integer_field), dimension(1:num_dims, 1:2), intent(inout) :: bc_type

        integer :: dir, loc
        logical :: file_exist
        character(len=path_len) :: file_path

        character(len=10) :: status

        ! Read bc_types
        file_path = trim(step_dirpath)//'/bc_type.dat'
        inquire (FILE=trim(file_path), EXIST=file_exist)
        if (.not. file_exist) then
            call s_mpi_abort(trim(file_path)//' is missing. Exiting.')
        end if

        open (1, FILE=trim(file_path), FORM='unformatted', STATUS='unknown')
        do dir = 1, num_dims
            do loc = 1, 2
                read (1) bc_type(dir, loc)%sf
                $:GPU_UPDATE(device='[bc_type(dir, loc)%sf]')
            end do
        end do
        close (1)

        ! Read bc_buffers
        file_path = trim(step_dirpath)//'/bc_buffers.dat'
        inquire (FILE=trim(file_path), EXIST=file_exist)
        if (.not. file_exist) then
            call s_mpi_abort(trim(file_path)//' is missing. Exiting.')
        end if

        open (1, FILE=trim(file_path), FORM='unformatted', STATUS='unknown')
        do dir = 1, num_dims
            do loc = 1, 2
                read (1) bc_buffers(dir, loc)%sf
                $:GPU_UPDATE(device='[bc_buffers(dir, loc)%sf]')
            end do
        end do
        close (1)

    end subroutine s_read_serial_boundary_condition_files

    subroutine s_read_parallel_boundary_condition_files(bc_type)

        type(integer_field), dimension(1:num_dims, 1:2), intent(inout) :: bc_type

        integer :: dir, loc
        character(len=path_len) :: file_loc, file_path

        character(len=10) :: status

#ifdef MFC_MPI
        integer :: ierr
        integer :: file_id
        integer :: offset
        character(len=7) :: proc_rank_str
        logical :: dir_check
        integer :: nelements

        file_loc = trim(case_dir)//'/restart_data/boundary_conditions'

        if (proc_rank == 0) then
            call my_inquire(file_loc, dir_check)
            if (dir_check .neqv. .true.) then
                call s_mpi_abort(trim(file_loc)//' is missing. Exiting.')
            end if
        end if

        call s_create_mpi_types(bc_type)

        call s_mpi_barrier()

        call DelayFileAccess(proc_rank)

        write (proc_rank_str, '(I7.7)') proc_rank
        file_path = trim(file_loc)//'/bc_'//trim(proc_rank_str)//'.dat'
        call MPI_File_open(MPI_COMM_SELF, trim(file_path), MPI_MODE_RDONLY, MPI_INFO_NULL, file_id, ierr)

        offset = 0

        ! Read bc_types
        do dir = 1, num_dims
            do loc = 1, 2
#ifdef MFC_MIXED_PRECISION
                nelements = sizeof(bc_type(dir, loc)%sf)
                call MPI_File_read_all(file_id, bc_type(dir, loc)%sf, nelements, MPI_BYTE, MPI_STATUS_IGNORE, ierr)
#else
                nelements = sizeof(bc_type(dir, loc)%sf)/4
                call MPI_File_read_all(file_id, bc_type(dir, loc)%sf, nelements, MPI_INTEGER, MPI_STATUS_IGNORE, ierr)
#endif
                $:GPU_UPDATE(device='[bc_type(dir, loc)%sf]')
            end do
        end do

        ! Read bc_buffers
        do dir = 1, num_dims
            do loc = 1, 2
                nelements = sizeof(bc_buffers(dir, loc)%sf)*mpi_io_type/stp
                call MPI_File_read_all(file_id, bc_buffers(dir, loc)%sf, nelements, mpi_io_p, MPI_STATUS_IGNORE, ierr)
                $:GPU_UPDATE(device='[bc_buffers(dir, loc)%sf]')
            end do
        end do

Copilot finished reviewing on behalf of sbryngelson November 12, 2025 17:21
Comment on lines +1944 to 1959
bc_type(1, 1)%sf(:, :, :) = int(min(bc_x%beg, 0), kind=1)
bc_type(1, 2)%sf(:, :, :) = int(min(bc_x%end, 0), kind=1)
$:GPU_UPDATE(device='[bc_type(1,1)%sf,bc_type(1,2)%sf]')

if (n > 0) then
bc_type(2, -1)%sf(:, :, :) = bc_y%beg
bc_type(2, 1)%sf(:, :, :) = bc_y%end
$:GPU_UPDATE(device='[bc_type(2,-1)%sf,bc_type(2,1)%sf]')

if (p > 0) then
bc_type(3, -1)%sf(:, :, :) = bc_z%beg
bc_type(3, 1)%sf(:, :, :) = bc_z%end
$:GPU_UPDATE(device='[bc_type(3,-1)%sf,bc_type(3,1)%sf]')
end if
bc_type(2, 1)%sf(:, :, :) = int(min(bc_y%beg, 0), kind=1)
bc_type(2, 2)%sf(:, :, :) = int(min(bc_y%end, 0), kind=1)
$:GPU_UPDATE(device='[bc_type(2,1)%sf,bc_type(2,2)%sf]')
#:if not MFC_CASE_OPTIMIZATION or num_dims > 2
if (p > 0) then
bc_type(3, 1)%sf(:, :, :) = int(min(bc_z%beg, 0), kind=1)
bc_type(3, 2)%sf(:, :, :) = int(min(bc_z%end, 0), kind=1)
$:GPU_UPDATE(device='[bc_type(3,1)%sf,bc_type(3,2)%sf]')
end if
#:endif
end if
Contributor


Suggestion: Correct the default boundary condition assignment by removing the min(..., 0) logic, which incorrectly handles MPI processor ranks. The previous logic of direct assignment from bc_...%... should be restored with a type cast. [possible issue, importance: 10]

Suggested change
bc_type(1, 1)%sf(:, :, :) = int(min(bc_x%beg, 0), kind=1)
bc_type(1, 2)%sf(:, :, :) = int(min(bc_x%end, 0), kind=1)
$:GPU_UPDATE(device='[bc_type(1,1)%sf,bc_type(1,2)%sf]')
if (n > 0) then
bc_type(2, -1)%sf(:, :, :) = bc_y%beg
bc_type(2, 1)%sf(:, :, :) = bc_y%end
$:GPU_UPDATE(device='[bc_type(2,-1)%sf,bc_type(2,1)%sf]')
if (p > 0) then
bc_type(3, -1)%sf(:, :, :) = bc_z%beg
bc_type(3, 1)%sf(:, :, :) = bc_z%end
$:GPU_UPDATE(device='[bc_type(3,-1)%sf,bc_type(3,1)%sf]')
end if
bc_type(2, 1)%sf(:, :, :) = int(min(bc_y%beg, 0), kind=1)
bc_type(2, 2)%sf(:, :, :) = int(min(bc_y%end, 0), kind=1)
$:GPU_UPDATE(device='[bc_type(2,1)%sf,bc_type(2,2)%sf]')
#:if not MFC_CASE_OPTIMIZATION or num_dims > 2
if (p > 0) then
bc_type(3, 1)%sf(:, :, :) = int(min(bc_z%beg, 0), kind=1)
bc_type(3, 2)%sf(:, :, :) = int(min(bc_z%end, 0), kind=1)
$:GPU_UPDATE(device='[bc_type(3,1)%sf,bc_type(3,2)%sf]')
end if
#:endif
end if
bc_type(1, 1)%sf(:, :, :) = int(bc_x%beg, kind=1)
bc_type(1, 2)%sf(:, :, :) = int(bc_x%end, kind=1)
$:GPU_UPDATE(device='[bc_type(1,1)%sf,bc_type(1,2)%sf]')
if (n > 0) then
bc_type(2, 1)%sf(:, :, :) = int(bc_y%beg, kind=1)
bc_type(2, 2)%sf(:, :, :) = int(bc_y%end, kind=1)
$:GPU_UPDATE(device='[bc_type(2,1)%sf,bc_type(2,2)%sf]')
#:if not MFC_CASE_OPTIMIZATION or num_dims > 2
if (p > 0) then
bc_type(3, 1)%sf(:, :, :) = int(bc_z%beg, kind=1)
bc_type(3, 2)%sf(:, :, :) = int(bc_z%end, kind=1)
$:GPU_UPDATE(device='[bc_type(3,1)%sf,bc_type(3,2)%sf]')
end if
#:endif
end if

Comment on lines +311 to +519
if (shear_stress) then ! Shear stresses
#:call GPU_PARALLEL_LOOP(collapse=3, private='[alpha_visc, alpha_rho_visc, Re_visc, tau_Re]')
do l = is3_viscous%beg, is3_viscous%end
do k = -1, 1
do j = is1_viscous%beg, is1_viscous%end

$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
alpha_rho_visc(i) = q_prim_vf(i)%sf(j, k, l)
if (bubbles_euler .and. num_fluids == 1) then
alpha_visc(i) = 1._wp - q_prim_vf(E_idx + i)%sf(j, k, l)
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
alpha_rho_visc(i) = q_prim_vf(i)%sf(j, k, l)
if (bubbles_euler .and. num_fluids == 1) then
alpha_visc(i) = 1._wp - q_prim_vf(E_idx + i)%sf(j, k, l)
else
alpha_visc(i) = q_prim_vf(E_idx + i)%sf(j, k, l)
end if
end do

if (bubbles_euler) then
rho_visc = 0._wp
gamma_visc = 0._wp
pi_inf_visc = 0._wp

if (mpp_lim .and. (model_eqns == 2) .and. (num_fluids > 2)) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
end do
else if ((model_eqns == 2) .and. (num_fluids > 2)) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids - 1
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
end do
else
rho_visc = alpha_rho_visc(1)
gamma_visc = gammas(1)
pi_inf_visc = pi_infs(1)
end if
else
alpha_visc(i) = q_prim_vf(E_idx + i)%sf(j, k, l)
end if
end do
rho_visc = 0._wp
gamma_visc = 0._wp
pi_inf_visc = 0._wp

if (bubbles_euler) then
rho_visc = 0._wp
gamma_visc = 0._wp
pi_inf_visc = 0._wp
alpha_visc_sum = 0._wp

if (mpp_lim) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
alpha_rho_visc(i) = max(0._wp, alpha_rho_visc(i))
alpha_visc(i) = min(max(0._wp, alpha_visc(i)), 1._wp)
alpha_visc_sum = alpha_visc_sum + alpha_visc(i)
end do

alpha_visc = alpha_visc/max(alpha_visc_sum, sgm_eps)

end if

if (mpp_lim .and. (model_eqns == 2) .and. (num_fluids > 2)) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
end do
else if ((model_eqns == 2) .and. (num_fluids > 2)) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids - 1
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
end do
else
rho_visc = alpha_rho_visc(1)
gamma_visc = gammas(1)
pi_inf_visc = pi_infs(1)

if (viscous) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, 2
Re_visc(i) = dflt_real

if (Re_size(i) > 0) Re_visc(i) = 0._wp
$:GPU_LOOP(parallelism='[seq]')
do q = 1, Re_size(i)
Re_visc(i) = alpha_visc(Re_idx(i, q))/Res_viscous(i, q) &
+ Re_visc(i)
end do

Re_visc(i) = 1._wp/max(Re_visc(i), sgm_eps)

end do
end if
end if
else
rho_visc = 0._wp
gamma_visc = 0._wp
pi_inf_visc = 0._wp

alpha_visc_sum = 0._wp
tau_Re(2, 2) = -(2._wp/3._wp)*grad_z_vf(3)%sf(j, k, l)/y_cc(k)/ &
Re_visc(1)

if (mpp_lim) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
alpha_rho_visc(i) = max(0._wp, alpha_rho_visc(i))
alpha_visc(i) = min(max(0._wp, alpha_visc(i)), 1._wp)
alpha_visc_sum = alpha_visc_sum + alpha_visc(i)
end do
tau_Re(2, 3) = ((grad_z_vf(2)%sf(j, k, l) - &
q_prim_vf(momxe)%sf(j, k, l))/ &
y_cc(k) + grad_y_vf(3)%sf(j, k, l))/ &
Re_visc(1)

alpha_visc = alpha_visc/max(alpha_visc_sum, sgm_eps)
$:GPU_LOOP(parallelism='[seq]')
do i = 2, 3
tau_Re_vf(contxe + i)%sf(j, k, l) = &
tau_Re_vf(contxe + i)%sf(j, k, l) - &
tau_Re(2, i)

tau_Re_vf(E_idx)%sf(j, k, l) = &
tau_Re_vf(E_idx)%sf(j, k, l) - &
q_prim_vf(contxe + i)%sf(j, k, l)*tau_Re(2, i)
end do

end if
end do
end do
end do
#:endcall GPU_PARALLEL_LOOP
end if

if (bulk_stress) then ! Bulk stresses
#:call GPU_PARALLEL_LOOP(collapse=3, private='[alpha_visc, alpha_rho_visc, Re_visc, tau_Re]')
do l = is3_viscous%beg, is3_viscous%end
do k = -1, 1
do j = is1_viscous%beg, is1_viscous%end

$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
alpha_rho_visc(i) = q_prim_vf(i)%sf(j, k, l)
if (bubbles_euler .and. num_fluids == 1) then
alpha_visc(i) = 1._wp - q_prim_vf(E_idx + i)%sf(j, k, l)
else
alpha_visc(i) = q_prim_vf(E_idx + i)%sf(j, k, l)
end if
end do

if (viscous) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, 2
Re_visc(i) = dflt_real
if (bubbles_euler) then
rho_visc = 0._wp
gamma_visc = 0._wp
pi_inf_visc = 0._wp

if (Re_size(i) > 0) Re_visc(i) = 0._wp
if (mpp_lim .and. (model_eqns == 2) .and. (num_fluids > 2)) then
$:GPU_LOOP(parallelism='[seq]')
do q = 1, Re_size(i)
Re_visc(i) = alpha_visc(Re_idx(i, q))/Res_viscous(i, q) &
+ Re_visc(i)
do i = 1, num_fluids
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
end do
else if ((model_eqns == 2) .and. (num_fluids > 2)) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids - 1
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
end do
else
rho_visc = alpha_rho_visc(1)
gamma_visc = gammas(1)
pi_inf_visc = pi_infs(1)
end if
else
rho_visc = 0._wp
gamma_visc = 0._wp
pi_inf_visc = 0._wp

Re_visc(i) = 1._wp/max(Re_visc(i), sgm_eps)

end do
end if
end if

tau_Re(2, 2) = -(2._wp/3._wp)*grad_z_vf(3)%sf(j, k, l)/y_cc(k)/ &
Re_visc(1)

tau_Re(2, 3) = ((grad_z_vf(2)%sf(j, k, l) - &
q_prim_vf(momxe)%sf(j, k, l))/ &
y_cc(k) + grad_y_vf(3)%sf(j, k, l))/ &
Re_visc(1)

$:GPU_LOOP(parallelism='[seq]')
do i = 2, 3
tau_Re_vf(contxe + i)%sf(j, k, l) = &
tau_Re_vf(contxe + i)%sf(j, k, l) - &
tau_Re(2, i)

tau_Re_vf(E_idx)%sf(j, k, l) = &
tau_Re_vf(E_idx)%sf(j, k, l) - &
q_prim_vf(contxe + i)%sf(j, k, l)*tau_Re(2, i)
end do

end do
end do
end do
#:endcall GPU_PARALLEL_LOOP
end if
alpha_visc_sum = 0._wp

if (bulk_stress) then ! Bulk stresses
#:call GPU_PARALLEL_LOOP(collapse=3, private='[alpha_visc, alpha_rho_visc, Re_visc, tau_Re]')
do l = is3_viscous%beg, is3_viscous%end
do k = -1, 1
do j = is1_viscous%beg, is1_viscous%end
if (mpp_lim) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
alpha_rho_visc(i) = max(0._wp, alpha_rho_visc(i))
alpha_visc(i) = min(max(0._wp, alpha_visc(i)), 1._wp)
alpha_visc_sum = alpha_visc_sum + alpha_visc(i)
end do

$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
alpha_rho_visc(i) = q_prim_vf(i)%sf(j, k, l)
if (bubbles_euler .and. num_fluids == 1) then
alpha_visc(i) = 1._wp - q_prim_vf(E_idx + i)%sf(j, k, l)
else
alpha_visc(i) = q_prim_vf(E_idx + i)%sf(j, k, l)
end if
end do
alpha_visc = alpha_visc/max(alpha_visc_sum, sgm_eps)

if (bubbles_euler) then
rho_visc = 0._wp
gamma_visc = 0._wp
pi_inf_visc = 0._wp
end if

if (mpp_lim .and. (model_eqns == 2) .and. (num_fluids > 2)) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
end do
else if ((model_eqns == 2) .and. (num_fluids > 2)) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids - 1
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
end do
else
rho_visc = alpha_rho_visc(1)
gamma_visc = gammas(1)
pi_inf_visc = pi_infs(1)
end if
else
rho_visc = 0._wp
gamma_visc = 0._wp
pi_inf_visc = 0._wp

alpha_visc_sum = 0._wp

if (mpp_lim) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
alpha_rho_visc(i) = max(0._wp, alpha_rho_visc(i))
alpha_visc(i) = min(max(0._wp, alpha_visc(i)), 1._wp)
alpha_visc_sum = alpha_visc_sum + alpha_visc(i)
end do

alpha_visc = alpha_visc/max(alpha_visc_sum, sgm_eps)

end if

$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
end do

if (viscous) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, 2
Re_visc(i) = dflt_real

if (Re_size(i) > 0) Re_visc(i) = 0._wp
if (viscous) then
$:GPU_LOOP(parallelism='[seq]')
do q = 1, Re_size(i)
Re_visc(i) = alpha_visc(Re_idx(i, q))/Res_viscous(i, q) &
+ Re_visc(i)
end do
do i = 1, 2
Re_visc(i) = dflt_real

if (Re_size(i) > 0) Re_visc(i) = 0._wp
$:GPU_LOOP(parallelism='[seq]')
do q = 1, Re_size(i)
Re_visc(i) = alpha_visc(Re_idx(i, q))/Res_viscous(i, q) &
+ Re_visc(i)
end do

Re_visc(i) = 1._wp/max(Re_visc(i), sgm_eps)
Re_visc(i) = 1._wp/max(Re_visc(i), sgm_eps)

end do
end do
end if
end if
end if

tau_Re(2, 2) = grad_z_vf(3)%sf(j, k, l)/y_cc(k)/ &
Re_visc(2)
tau_Re(2, 2) = grad_z_vf(3)%sf(j, k, l)/y_cc(k)/ &
Re_visc(2)

tau_Re_vf(momxb + 1)%sf(j, k, l) = &
tau_Re_vf(momxb + 1)%sf(j, k, l) - &
tau_Re(2, 2)
tau_Re_vf(momxb + 1)%sf(j, k, l) = &
tau_Re_vf(momxb + 1)%sf(j, k, l) - &
tau_Re(2, 2)

tau_Re_vf(E_idx)%sf(j, k, l) = &
tau_Re_vf(E_idx)%sf(j, k, l) - &
q_prim_vf(momxb + 1)%sf(j, k, l)*tau_Re(2, 2)
tau_Re_vf(E_idx)%sf(j, k, l) = &
tau_Re_vf(E_idx)%sf(j, k, l) - &
q_prim_vf(momxb + 1)%sf(j, k, l)*tau_Re(2, 2)

end do
end do
end do
end do
#:endcall GPU_PARALLEL_LOOP
end if
#:endcall GPU_PARALLEL_LOOP
end if
Contributor


Suggestion: Refactor the duplicated logic within the shear_stress and bulk_stress blocks into a separate subroutine to improve code maintainability. [general, importance: 7]

Suggested change
if (shear_stress) then ! Shear stresses
#:call GPU_PARALLEL_LOOP(collapse=3, private='[alpha_visc, alpha_rho_visc, Re_visc, tau_Re]')
do l = is3_viscous%beg, is3_viscous%end
do k = -1, 1
do j = is1_viscous%beg, is1_viscous%end
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
alpha_rho_visc(i) = q_prim_vf(i)%sf(j, k, l)
if (bubbles_euler .and. num_fluids == 1) then
alpha_visc(i) = 1._wp - q_prim_vf(E_idx + i)%sf(j, k, l)
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
alpha_rho_visc(i) = q_prim_vf(i)%sf(j, k, l)
if (bubbles_euler .and. num_fluids == 1) then
alpha_visc(i) = 1._wp - q_prim_vf(E_idx + i)%sf(j, k, l)
else
alpha_visc(i) = q_prim_vf(E_idx + i)%sf(j, k, l)
end if
end do
if (bubbles_euler) then
rho_visc = 0._wp
gamma_visc = 0._wp
pi_inf_visc = 0._wp
if (mpp_lim .and. (model_eqns == 2) .and. (num_fluids > 2)) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
end do
else if ((model_eqns == 2) .and. (num_fluids > 2)) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids - 1
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
end do
else
rho_visc = alpha_rho_visc(1)
gamma_visc = gammas(1)
pi_inf_visc = pi_infs(1)
end if
else
alpha_visc(i) = q_prim_vf(E_idx + i)%sf(j, k, l)
end if
end do
rho_visc = 0._wp
gamma_visc = 0._wp
pi_inf_visc = 0._wp
if (bubbles_euler) then
rho_visc = 0._wp
gamma_visc = 0._wp
pi_inf_visc = 0._wp
alpha_visc_sum = 0._wp
if (mpp_lim) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
alpha_rho_visc(i) = max(0._wp, alpha_rho_visc(i))
alpha_visc(i) = min(max(0._wp, alpha_visc(i)), 1._wp)
alpha_visc_sum = alpha_visc_sum + alpha_visc(i)
end do
alpha_visc = alpha_visc/max(alpha_visc_sum, sgm_eps)
end if
if (mpp_lim .and. (model_eqns == 2) .and. (num_fluids > 2)) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
end do
else if ((model_eqns == 2) .and. (num_fluids > 2)) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids - 1
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
end do
else
rho_visc = alpha_rho_visc(1)
gamma_visc = gammas(1)
pi_inf_visc = pi_infs(1)
if (viscous) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, 2
Re_visc(i) = dflt_real
if (Re_size(i) > 0) Re_visc(i) = 0._wp
$:GPU_LOOP(parallelism='[seq]')
do q = 1, Re_size(i)
Re_visc(i) = alpha_visc(Re_idx(i, q))/Res_viscous(i, q) &
+ Re_visc(i)
end do
Re_visc(i) = 1._wp/max(Re_visc(i), sgm_eps)
end do
end if
end if
else
rho_visc = 0._wp
gamma_visc = 0._wp
pi_inf_visc = 0._wp
alpha_visc_sum = 0._wp
tau_Re(2, 2) = -(2._wp/3._wp)*grad_z_vf(3)%sf(j, k, l)/y_cc(k)/ &
Re_visc(1)
if (mpp_lim) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
alpha_rho_visc(i) = max(0._wp, alpha_rho_visc(i))
alpha_visc(i) = min(max(0._wp, alpha_visc(i)), 1._wp)
alpha_visc_sum = alpha_visc_sum + alpha_visc(i)
end do
tau_Re(2, 3) = ((grad_z_vf(2)%sf(j, k, l) - &
q_prim_vf(momxe)%sf(j, k, l))/ &
y_cc(k) + grad_y_vf(3)%sf(j, k, l))/ &
Re_visc(1)
alpha_visc = alpha_visc/max(alpha_visc_sum, sgm_eps)
$:GPU_LOOP(parallelism='[seq]')
do i = 2, 3
tau_Re_vf(contxe + i)%sf(j, k, l) = &
tau_Re_vf(contxe + i)%sf(j, k, l) - &
tau_Re(2, i)
tau_Re_vf(E_idx)%sf(j, k, l) = &
tau_Re_vf(E_idx)%sf(j, k, l) - &
q_prim_vf(contxe + i)%sf(j, k, l)*tau_Re(2, i)
end do
end if
end do
end do
end do
#:endcall GPU_PARALLEL_LOOP
end if
if (bulk_stress) then ! Bulk stresses
#:call GPU_PARALLEL_LOOP(collapse=3, private='[alpha_visc, alpha_rho_visc, Re_visc, tau_Re]')
do l = is3_viscous%beg, is3_viscous%end
do k = -1, 1
do j = is1_viscous%beg, is1_viscous%end
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
alpha_rho_visc(i) = q_prim_vf(i)%sf(j, k, l)
if (bubbles_euler .and. num_fluids == 1) then
alpha_visc(i) = 1._wp - q_prim_vf(E_idx + i)%sf(j, k, l)
else
alpha_visc(i) = q_prim_vf(E_idx + i)%sf(j, k, l)
end if
end do
if (viscous) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, 2
Re_visc(i) = dflt_real
if (bubbles_euler) then
rho_visc = 0._wp
gamma_visc = 0._wp
pi_inf_visc = 0._wp
if (Re_size(i) > 0) Re_visc(i) = 0._wp
if (mpp_lim .and. (model_eqns == 2) .and. (num_fluids > 2)) then
$:GPU_LOOP(parallelism='[seq]')
do q = 1, Re_size(i)
Re_visc(i) = alpha_visc(Re_idx(i, q))/Res_viscous(i, q) &
+ Re_visc(i)
do i = 1, num_fluids
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
end do
else if ((model_eqns == 2) .and. (num_fluids > 2)) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids - 1
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
end do
else
rho_visc = alpha_rho_visc(1)
gamma_visc = gammas(1)
pi_inf_visc = pi_infs(1)
end if
else
rho_visc = 0._wp
gamma_visc = 0._wp
pi_inf_visc = 0._wp
Re_visc(i) = 1._wp/max(Re_visc(i), sgm_eps)
end do
end if
end if
tau_Re(2, 2) = -(2._wp/3._wp)*grad_z_vf(3)%sf(j, k, l)/y_cc(k)/ &
Re_visc(1)
tau_Re(2, 3) = ((grad_z_vf(2)%sf(j, k, l) - &
q_prim_vf(momxe)%sf(j, k, l))/ &
y_cc(k) + grad_y_vf(3)%sf(j, k, l))/ &
Re_visc(1)
$:GPU_LOOP(parallelism='[seq]')
do i = 2, 3
tau_Re_vf(contxe + i)%sf(j, k, l) = &
tau_Re_vf(contxe + i)%sf(j, k, l) - &
tau_Re(2, i)
tau_Re_vf(E_idx)%sf(j, k, l) = &
tau_Re_vf(E_idx)%sf(j, k, l) - &
q_prim_vf(contxe + i)%sf(j, k, l)*tau_Re(2, i)
end do
end do
end do
end do
#:endcall GPU_PARALLEL_LOOP
end if
alpha_visc_sum = 0._wp
if (bulk_stress) then ! Bulk stresses
#:call GPU_PARALLEL_LOOP(collapse=3, private='[alpha_visc, alpha_rho_visc, Re_visc, tau_Re]')
do l = is3_viscous%beg, is3_viscous%end
do k = -1, 1
do j = is1_viscous%beg, is1_viscous%end
if (mpp_lim) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
alpha_rho_visc(i) = max(0._wp, alpha_rho_visc(i))
alpha_visc(i) = min(max(0._wp, alpha_visc(i)), 1._wp)
alpha_visc_sum = alpha_visc_sum + alpha_visc(i)
end do
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
alpha_rho_visc(i) = q_prim_vf(i)%sf(j, k, l)
if (bubbles_euler .and. num_fluids == 1) then
alpha_visc(i) = 1._wp - q_prim_vf(E_idx + i)%sf(j, k, l)
else
alpha_visc(i) = q_prim_vf(E_idx + i)%sf(j, k, l)
end if
end do
alpha_visc = alpha_visc/max(alpha_visc_sum, sgm_eps)
if (bubbles_euler) then
rho_visc = 0._wp
gamma_visc = 0._wp
pi_inf_visc = 0._wp
end if
if (mpp_lim .and. (model_eqns == 2) .and. (num_fluids > 2)) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
end do
else if ((model_eqns == 2) .and. (num_fluids > 2)) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids - 1
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
end do
else
rho_visc = alpha_rho_visc(1)
gamma_visc = gammas(1)
pi_inf_visc = pi_infs(1)
end if
else
rho_visc = 0._wp
gamma_visc = 0._wp
pi_inf_visc = 0._wp
alpha_visc_sum = 0._wp
if (mpp_lim) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
alpha_rho_visc(i) = max(0._wp, alpha_rho_visc(i))
alpha_visc(i) = min(max(0._wp, alpha_visc(i)), 1._wp)
alpha_visc_sum = alpha_visc_sum + alpha_visc(i)
end do
alpha_visc = alpha_visc/max(alpha_visc_sum, sgm_eps)
end if
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
end do
if (viscous) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, 2
Re_visc(i) = dflt_real
if (Re_size(i) > 0) Re_visc(i) = 0._wp
if (viscous) then
$:GPU_LOOP(parallelism='[seq]')
do q = 1, Re_size(i)
Re_visc(i) = alpha_visc(Re_idx(i, q))/Res_viscous(i, q) &
+ Re_visc(i)
end do
do i = 1, 2
Re_visc(i) = dflt_real
if (Re_size(i) > 0) Re_visc(i) = 0._wp
$:GPU_LOOP(parallelism='[seq]')
do q = 1, Re_size(i)
Re_visc(i) = alpha_visc(Re_idx(i, q))/Res_viscous(i, q) &
+ Re_visc(i)
end do
Re_visc(i) = 1._wp/max(Re_visc(i), sgm_eps)
Re_visc(i) = 1._wp/max(Re_visc(i), sgm_eps)
end do
end do
end if
end if
end if
tau_Re(2, 2) = grad_z_vf(3)%sf(j, k, l)/y_cc(k)/ &
Re_visc(2)
tau_Re(2, 2) = grad_z_vf(3)%sf(j, k, l)/y_cc(k)/ &
Re_visc(2)
tau_Re_vf(momxb + 1)%sf(j, k, l) = &
tau_Re_vf(momxb + 1)%sf(j, k, l) - &
tau_Re(2, 2)
tau_Re_vf(momxb + 1)%sf(j, k, l) = &
tau_Re_vf(momxb + 1)%sf(j, k, l) - &
tau_Re(2, 2)
tau_Re_vf(E_idx)%sf(j, k, l) = &
tau_Re_vf(E_idx)%sf(j, k, l) - &
q_prim_vf(momxb + 1)%sf(j, k, l)*tau_Re(2, 2)
tau_Re_vf(E_idx)%sf(j, k, l) = &
tau_Re_vf(E_idx)%sf(j, k, l) - &
q_prim_vf(momxb + 1)%sf(j, k, l)*tau_Re(2, 2)
end do
end do
end do
end do
#:endcall GPU_PARALLEL_LOOP
end if
#:endcall GPU_PARALLEL_LOOP
end if
if (shear_stress) then ! Shear stresses
#:call GPU_PARALLEL_LOOP(collapse=3, private='[alpha_visc, alpha_rho_visc, Re_visc, tau_Re, rho_visc, gamma_visc, pi_inf_visc, alpha_visc_sum]')
do l = is3_viscous%beg, is3_viscous%end
do k = -1, 1
do j = is1_viscous%beg, is1_viscous%end
call s_compute_common_stress_terms(j, k, l, alpha_rho_visc, alpha_visc, rho_visc, gamma_visc, pi_inf_visc, Re_visc)
tau_Re(2, 2) = -(2._wp/3._wp)*grad_z_vf(3)%sf(j, k, l)/y_cc(k)/ &
Re_visc(1)
tau_Re(2, 3) = ((grad_z_vf(2)%sf(j, k, l) - &
q_prim_vf(momxe)%sf(j, k, l))/ &
y_cc(k) + grad_y_vf(3)%sf(j, k, l))/ &
Re_visc(1)
$:GPU_LOOP(parallelism='[seq]')
do i = 2, 3
tau_Re_vf(contxe + i)%sf(j, k, l) = &
tau_Re_vf(contxe + i)%sf(j, k, l) - &
tau_Re(2, i)
tau_Re_vf(E_idx)%sf(j, k, l) = &
tau_Re_vf(E_idx)%sf(j, k, l) - &
q_prim_vf(contxe + i)%sf(j, k, l)*tau_Re(2, i)
end do
end do
end do
end do
#:endcall GPU_PARALLEL_LOOP
end if
if (bulk_stress) then ! Bulk stresses
#:call GPU_PARALLEL_LOOP(collapse=3, private='[alpha_visc, alpha_rho_visc, Re_visc, tau_Re, rho_visc, gamma_visc, pi_inf_visc, alpha_visc_sum]')
do l = is3_viscous%beg, is3_viscous%end
do k = -1, 1
do j = is1_viscous%beg, is1_viscous%end
call s_compute_common_stress_terms(j, k, l, alpha_rho_visc, alpha_visc, rho_visc, gamma_visc, pi_inf_visc, Re_visc)
tau_Re(2, 2) = grad_z_vf(3)%sf(j, k, l)/y_cc(k)/ &
Re_visc(2)
tau_Re_vf(momxb + 1)%sf(j, k, l) = &
tau_Re_vf(momxb + 1)%sf(j, k, l) - &
tau_Re(2, 2)
tau_Re_vf(E_idx)%sf(j, k, l) = &
tau_Re_vf(E_idx)%sf(j, k, l) - &
q_prim_vf(momxb + 1)%sf(j, k, l)*tau_Re(2, 2)
end do
end do
end do
#:endcall GPU_PARALLEL_LOOP
end if

Contributor

Copilot AI left a comment


Pull Request Overview

This PR adds OpenMP support for the Cray Compiler Environment (CCE) and introduces mixed precision functionality developed for Gordon Bell submissions. The changes span toolchain configuration, simulation code optimization, and new simplex noise perturbation capabilities.

Key Changes:

  • Added mixed precision support with a new stp (storage precision) type alongside the existing wp (working precision) type (a minimal kind-selection sketch follows this list)
  • Implemented case-specific optimizations that conditionally compile code based on runtime parameters
  • Updated MPI I/O operations to handle mixed precision data types
  • Added simplex noise module for initial condition perturbations
  • Enhanced Frontier HPC template with better resource management (sbcast for binaries, nvme storage)
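
A minimal kind-selection sketch, assuming (this is an assumption, not taken from the diff) that wp stays at double precision while stp drops to single only when MFC_MIXED_PRECISION is defined; the actual selection lives in m_precision_select.f90:

module m_precision_sketch
    use iso_fortran_env, only: real32, real64
    implicit none
#ifdef MFC_MIXED_PRECISION
    integer, parameter :: wp = real64    ! working precision for arithmetic
    integer, parameter :: stp = real32   ! storage precision for large arrays and I/O
#else
    integer, parameter :: wp = real64
    integer, parameter :: stp = wp       ! without the flag, storage matches working precision
#endif
end module m_precision_sketch

This reading is consistent with the WP_MOK change from 8 to 4 bytes and the real(wp) to real(stp) parameter changes listed below.
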

Reviewed Changes

Copilot reviewed 65 out of 67 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
toolchain/templates/frontier.mako Added nvme constraint, improved binary broadcast with sbcast, streamlined profiler placement
toolchain/pyproject.toml Updated pyrometheus dependency to development branch for OpenMP testing
toolchain/modules Duplicate entry for CSCS Santis configuration (lines 92-95)
toolchain/mfc/state.py Added mixed precision flag to MFCConfig, improved code formatting
toolchain/mfc/run/input.py Updated real_type selection logic to include mixed precision mode
toolchain/mfc/run/case_dicts.py Added simplex_params parameter definitions for density/velocity perturbations
toolchain/mfc/lock.py Incremented lock version to 8
toolchain/mfc/build.py Added MFC_MIXED_PRECISION CMake flag
tests/43B5FEBD/golden-metadata.txt New golden metadata file documenting test configuration
src/simulation/p_main.fpp Added unused status variable declaration
src/simulation/m_weno.fpp Added case optimization guards to conditionally compile WENO stencil code
src/simulation/m_viscous.fpp Wrapped 3D viscous stress calculations in case optimization guards
src/simulation/m_time_steppers.fpp Converted to mixed precision (stp) for time-stepping arrays, reorganized probe storage
src/simulation/m_surface_tension.fpp Added explicit type conversion for sqrt argument, wrapped 3D code
src/simulation/m_start_up.fpp Changed MPI I/O word size to 4 bytes, extensive mixed precision conversions
src/simulation/m_sim_helpers.fpp Wrapped 3D CFL calculations in case optimization guards
src/simulation/m_riemann_solvers.fpp Expanded private variable lists in parallel loops for thread safety
src/simulation/m_rhs.fpp Changed loop iterators to 8-byte integers, wrapped OpenACC directives
src/simulation/m_qbmm.fpp Added case optimization for bubble model coefficient calculations
src/simulation/m_muscl.fpp Renamed reconstruction workspace arrays with _muscl suffix, fixed indentation
src/simulation/m_mhd.fpp Added missing private variables to parallel loop
src/simulation/m_ibm.fpp Converted ghost point routines to use mixed precision
src/simulation/m_hypoelastic.fpp Added private variables to GPU loops
src/simulation/m_hyperelastic.fpp Added missing private variable to parallel loop
src/simulation/m_global_parameters.fpp Cleaned up GPU data transfer directives
src/simulation/m_fftw.fpp Refactored FFT filtering for improved device management
src/simulation/m_derived_variables.fpp Updated time-stepping array references
src/simulation/m_data_output.fpp Renamed downsampling temporary arrays, updated MPI I/O types
src/simulation/m_cbc.fpp Made dpres_ds a local variable, wrapped case-specific code
src/simulation/m_bubbles_EL_kernels.fpp Converted to mixed precision with explicit type conversions
src/simulation/m_bubbles_EL.fpp Changed q_beta from vector_field to scalar_field array
src/simulation/m_bubbles_EE.fpp Added private variables and explicit type conversions
src/simulation/m_body_forces.fpp Removed unnecessary parentheses
src/simulation/m_acoustic_src.fpp Updated loop structure and allocation
src/simulation/include/inline_capillary.fpp Wrapped 3D capillary stress code in optimization guards
src/pre_process/m_start_up.fpp Added simplex_perturb to namelist, added finalization call
src/pre_process/m_simplex_noise.fpp New module implementing 2D/3D simplex noise for perturbations
src/pre_process/m_perturbation.fpp Added simplex noise perturbation routine
src/pre_process/m_mpi_proxy.fpp Added MPI broadcast for simplex parameters
src/pre_process/m_initial_condition.fpp Updated bc_type indexing, added simplex perturbation call
src/pre_process/m_global_parameters.fpp Added simplex_perturb parameters
setupNB.sh New setup script for NVHPC compilers on a specific system

@@ -0,0 +1,104 @@
#!/usr/bin/env python3
import math

Copilot AI Nov 12, 2025


Import of 'math' is not used.

Suggested change
import math

Collaborator


Accept this suggestion


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.


Comment on lines +236 to +241
function dot2(g, x, y) result(dot)

integer, intent(in) :: g
real(wp), intent(in) :: x, y
real(wp) :: dot
dot = grad2(g + 1, 1)*x + grad2(g + 1, 2)*y


P1: Prevent simplex noise gradient indexing overflow

grad2 contains only 10 gradient vectors, but the indices used in f_simplex2d are computed as mod(...,12)+1 and dot2 then indexes grad2(g + 1, :). Whenever the permutation yields 10 or 11, g + 1 evaluates to 11–12 and the code reads beyond grad2’s bounds, which will crash or corrupt data when simplex_perturb is enabled. Either the gradient table needs 12 entries or the modulus/offset logic must restrict g to 0–9 before the lookup.
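
A tiny self-contained check (not the merged fix) showing why clamping the modulus to the table size keeps the lookup in bounds, assuming grad2 holds 10 gradient vectors as stated above:

program check_grad_index
    implicit none
    integer, parameter :: n_grad = 10   ! grad2 has 10 gradient vectors
    integer :: hash, g_old, g_new

    do hash = 0, 11
        g_old = mod(hash, 12)       ! current logic: 0..11, so row g_old + 1 can reach 12
        g_new = mod(hash, n_grad)   ! clamped:       0..9,  so row g_new + 1 stays within 10
        print '(3(a, i0))', 'hash=', hash, '   old row=', g_old + 1, '   clamped row=', g_new + 1
    end do
end program check_grad_index

Alternatively, the gradient table could be extended to 12 entries, as the comment suggests.
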



@cursor cursor bot left a comment


This PR is being reviewed by Cursor Bugbot


Bug: Array Index Mismatch Breaks Deallocation

The deallocation code attempts to deallocate bc_type(1, -1)%sf, bc_type(2, -1)%sf, and bc_type(3, -1)%sf, but the bc_type array dimension was changed from (-1:1) to (1:2) throughout the codebase. The second dimension no longer has index -1, causing an out-of-bounds access. The indices should be 1 and 2 instead of -1 and 1.

src/post_process/m_data_input.f90#L671-L676

deallocate (bc_type(1, -1)%sf, bc_type(1, 1)%sf)
if (n > 0) then
deallocate (bc_type(2, -1)%sf, bc_type(2, 1)%sf)
if (p > 0) then
deallocate (bc_type(3, -1)%sf, bc_type(3, 1)%sf)
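
A hedged sketch of the corrected deallocation under the new (1:2) indexing, where 1 is the beginning boundary and 2 the end boundary (a sketch, not the merged fix):

deallocate (bc_type(1, 1)%sf, bc_type(1, 2)%sf)
if (n > 0) then
    deallocate (bc_type(2, 1)%sf, bc_type(2, 2)%sf)
    if (p > 0) then
        deallocate (bc_type(3, 1)%sf, bc_type(3, 2)%sf)
    end if
end if
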



if ((z_cc(k) - z_centroid)**2._wp + &
(y_cc(j) - y_centroid)**2._wp <= radius**2._wp) then
bc_type(1, -1)%sf(0, j, k) = patch_bc(patch_id)%type
bc_type(1, 1)%sf(0, j, k) = patch_bc(patch_id)%type


Bug: Incorrect boundary indexing causes data overwrite.

The s_circle_bc subroutine always writes to bc_type(1, 1) regardless of whether the patch is at the beginning or end boundary. The template loop iterates over LOC values of -1 and 1, but unlike s_line_segment_bc which conditionally selects index 1 or 2 based on LOC, this code always uses index 1. When LOC == 1 (end boundary), it writes to the wrong array element, overwriting beginning boundary data instead of writing to bc_type(1, 2).
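
A hedged sketch of the LOC-based selection the comment describes, mirroring s_line_segment_bc; ${LOC}$ is the fypp template argument that takes the values -1 and 1:

if (${LOC}$ == -1) then
    bc_type(1, 1)%sf(0, j, k) = patch_bc(patch_id)%type   ! beginning boundary
else
    bc_type(1, 2)%sf(0, j, k) = patch_bc(patch_id)%type   ! end boundary
end if
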


z_boundary%beg <= z_cc(k) .and. &
z_boundary%end >= z_cc(k)) then
bc_type(1, -1)%sf(0, j, k) = patch_bc(patch_id)%type
bc_type(1, 1)%sf(0, j, k) = patch_bc(patch_id)%type


Bug: Boundary Data Overlap Error

The s_rectangle_bc subroutine always writes to bc_type(1, 1) regardless of whether the patch is at the beginning or end boundary. The template loop iterates over LOC values of -1 and 1, but unlike s_line_segment_bc which conditionally selects index 1 or 2 based on LOC, this code always uses index 1. When LOC == 1 (end boundary), it writes to the wrong array element, overwriting beginning boundary data instead of writing to bc_type(1, 2).


E_R = gamma_R*pres_R + pi_inf_R + 5.e-1_wp*rho_R*vel_R_rms + qv_R

H_L = (E_L + pres_L)/rho_L
H_R = (E_R + pres_R)/rho_R


Bug: Duplicate Assignment Overwrites Itself

Duplicate assignment of H_L and H_R that immediately overwrites the same values computed on lines 2471-2472. The calculation (E_L + pres_L)/rho_L and (E_R + pres_R)/rho_R is performed twice consecutively with identical results.


else()
message(STATUS "Performing IPO using -Mextract followed by -Minline")
set(NVHPC_USE_TWO_PASS_IPO TRUE)
set(NVHPC_USE_TWO_PASS_IPO FALSE)
Collaborator


Revert

@@ -0,0 +1,104 @@
#!/usr/bin/env python3
import math
Collaborator


Accept this suggestion

"num_igr_iters": igrIters,
"num_igr_warm_start_iters": igrIters,
"alf_factor": 10,
"bc_x%beg": -3,
Collaborator


Should be -17. This case and the associated hard-coded patches need to be cleaned up anyway. I can address this.

Collaborator


This file needs to be deleted, moved to misc/, or incorporated into the toolchain somehow. This PR doesn't add AMD compilers anyway, so my suggestion would be to delete it for now.

Collaborator


Delete file

@:DEALLOCATE(q_prim_qp%vf(j)%sf)
else
$:GPU_EXIT_DATA(detach='[q_prim_qp%vf(j)%sf]')
!$:GPU_EXIT_DATA(detach='[q_prim_qp%vf(j)%sf]')
Collaborator


commented out?

"pyrometheus == 1.0.5",

#"pyrometheus == 1.0.5",
"pyrometheus @ git+https://github.com/wilfonba/pyrometheus-wilfong.git@OpenMPTest",
Collaborator


I need to merge this with the Pyro upstream...

real(wp) :: start, finish
integer :: nt

logical :: status
Collaborator


Is this used anywhere?

Collaborator

@wilfonba wilfonba left a comment


Approve for benchmark
