
Conversation

@prathi-wind
Collaborator

@prathi-wind prathi-wind commented Nov 10, 2025

User description

Description

Add OpenMP support to MFC on the Cray Compiler. Passes all test cases, and adds CI to test on Frontier.

Also adds the mixed-precision mode developed for Gordon Bell.

Fixes #(issue) [optional]

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Something else

Scope

  • This PR comprises a set of related changes with a common goal

If you cannot check the above box, please split your PR into multiple PRs that each have a common goal.

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.
Provide instructions so we can reproduce.
Please also list any relevant details for your test configuration

  • Test A
  • Test B

Test Configuration:

  • What computers and compilers did you use to test this:

Checklist

  • I have added comments for the new code
  • I added Doxygen docstrings to the new code
  • I have made corresponding changes to the documentation (docs/)
  • I have added regression tests to the test suite so that people can verify in the future that the feature is behaving as expected
  • I have added example cases in examples/ that demonstrate my new feature performing as expected.
    They run to completion and demonstrate "interesting physics"
  • I ran ./mfc.sh format before committing my code
  • New and existing tests pass locally with my changes, including with GPU capability enabled (both NVIDIA hardware with NVHPC compilers and AMD hardware with CRAY compilers) and disabled
  • This PR does not introduce any repeated code (it follows the DRY principle)
  • I cannot think of a way to condense this code and reduce any introduced additional line count

If your code changes any code source files (anything in src/simulation)

To make sure the code is performing as expected on GPU devices, I have:

  • Checked that the code compiles using NVHPC compilers
  • Checked that the code compiles using CRAY compilers
  • Ran the code on either V100, A100, or H100 GPUs and ensured the new feature performed as expected (the GPU results match the CPU results)
  • Ran the code on MI200+ GPUs and ensured the new feature performed as expected (the GPU results match the CPU results)
  • Enclosed the new feature via nvtx ranges so that they can be identified in profiles
  • Ran a Nsight Systems profile using ./mfc.sh run XXXX --gpu -t simulation --nsys, and have attached the output file (.nsys-rep) and plain text results to this PR
  • Ran a Rocprof Systems profile using ./mfc.sh run XXXX --gpu -t simulation --rsys --hip-trace, and have attached the output file and plain text results to this PR.
  • Ran my code using various numbers of different GPUs (1, 2, and 8, for example) in parallel and made sure that the results scale similarly to what happens if you run without the new code/feature

PR Type

Enhancement, Bug fix


Description

  • Add OpenMP support to MFC on the Cray Compiler with comprehensive case optimization

  • Implement mixed precision support throughout the codebase for improved performance and memory efficiency

  • Refactor boundary condition arrays from negative indexing (-1:1) to positive indexing (1:2) for optimization

  • Add case optimization guards to skip unnecessary 3D computations in 2D simulations (WENO, viscous, MHD, capillary)

  • Introduce simplex noise generation module for procedural perturbation support

  • Fix GPU compatibility issues in chemistry and FFT modules for both NVIDIA and AMD compilers

  • Update MPI operations to support mixed precision with proper type conversions and 64-bit indexing

  • Refactor bubble Eulerian projection data structure for improved memory layout

  • Fix MUSCL workspace array naming conflicts and GPU parallel loop syntax issues

  • Add CI testing on Frontier supercomputer for Cray Compiler validation


Diagram Walkthrough

flowchart LR
  A["Cray Compiler<br/>OpenMP Support"] --> B["Case Optimization<br/>2D/3D Guards"]
  A --> C["Mixed Precision<br/>wp/stp Types"]
  B --> D["WENO, Viscous,<br/>MHD Modules"]
  C --> E["Boundary Arrays<br/>MPI Operations"]
  C --> F["Bubble EL,<br/>QBMM Modules"]
  G["GPU Compatibility<br/>Fixes"] --> H["Chemistry<br/>FFT Modules"]
  I["Simplex Noise<br/>Module"] --> J["Perturbation<br/>Support"]
  D --> K["Performance<br/>Optimization"]
  E --> K
  H --> K

File Walkthrough

Relevant files
Enhancement
17 files
m_boundary_common.fpp
Refactor boundary arrays and add case optimization support

src/common/m_boundary_common.fpp

  • Changed boundary condition array indexing from (-1:1) to (1:2) to
    support case optimization
  • Updated all references from negative indices (-1, 1) to positive
    indices (1, 2) throughout the module
  • Added conditional compilation directives #:if not
    MFC_CASE_OPTIMIZATION or num_dims > 2 to skip 3D boundary operations
    in optimized 2D cases
  • Changed data type from real(wp) to real(stp) for mixed precision
    support in buffer parameters
  • Updated MPI I/O operations to handle mixed precision with MPI_BYTE and
    element count calculations
+249/-231
m_weno.fpp
Add case optimization guards to WENO reconstruction schemes

src/simulation/m_weno.fpp

  • Added #:include 'case.fpp' directive for case optimization support
  • Wrapped WENO order 5 and 7 reconstruction code with conditional
    compilation #:if not MFC_CASE_OPTIMIZATION or weno_num_stencils > N
    (see the guard sketch after this entry)
  • Nested TENO stencil calculations within additional optimization guards
    for higher-order schemes
  • Maintained existing WENO algorithm logic while enabling compile-time
    optimization for reduced stencil cases
+286/-274
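
A minimal sketch of the guard pattern this entry describes (the stencil threshold and body are placeholders, not the actual WENO kernels):

#:if not MFC_CASE_OPTIMIZATION or weno_num_stencils > 3
    ! contributions from the additional stencils of the higher-order scheme
    ! go here; with case optimization enabled and a lower-order case, fypp
    ! removes this block before the compiler ever sees it
#:endif
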
m_bubbles_EL.fpp
Refactor bubble Eulerian projection data structure for mixed precision

src/simulation/m_bubbles_EL.fpp

  • Changed q_beta from type(vector_field) to type(scalar_field),
    dimension(:), allocatable for better memory layout (sketched after this
    entry)
  • Updated allocation from q_beta%vf(1:q_beta_idx) to
    q_beta(1:q_beta_idx) with individual scalar field setup
  • Changed data type from real(wp) to real(stp) for mixed precision
    support in gradient calculations
  • Updated s_gradient_dir subroutine signature to accept raw arrays
    instead of scalar field types
  • Updated all references from q_beta%vf(i)%sf to q_beta(i)%sf throughout
    the module
+50/-48 
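
A short, hedged sketch of the q_beta declaration and allocation change this entry describes; the array bounds are placeholders rather than the exact MFC extents:

! before: type(vector_field) :: q_beta
type(scalar_field), dimension(:), allocatable :: q_beta

allocate (q_beta(1:q_beta_idx))
do i = 1, q_beta_idx
    allocate (q_beta(i)%sf(0:m, 0:n, 0:p))   ! one scalar field per projection component
end do

Accesses then change from q_beta%vf(i)%sf to q_beta(i)%sf, as noted above.
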
m_boundary_conditions.fpp
Update boundary condition array indexing to positive indices

src/pre_process/m_boundary_conditions.fpp

  • Updated boundary condition array indexing from (-1:1) to (1:2) in
    function signatures
  • Added conditional logic to map the old negative indices to the new
    positive ones: (-1) maps to (1) and (1) maps to (2) (see the mapping
    sketch after this entry)
  • Updated all bc_type array accesses to use new indexing scheme with
    explicit if-statements for location mapping
+20/-12 
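
A hedged sketch of the location-index mapping this entry describes: callers keep passing loc = -1 (beginning) or 1 (end), and an explicit if-statement selects the new 1:2 slot. Names mirror the snippets quoted later in this conversation; the indices into %sf are placeholders:

if (loc == -1) then
    bc_type(dir, 1)%sf(0, j, k) = patch_bc(patch_id)%type   ! beginning face
else
    bc_type(dir, 2)%sf(0, j, k) = patch_bc(patch_id)%type   ! end face
end if
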
m_viscous.fpp
Add case optimization conditionals to viscous module         

src/simulation/m_viscous.fpp

  • Added case.fpp include directive for case optimization support
  • Wrapped viscous stress tensor computation blocks with
    MFC_CASE_OPTIMIZATION conditional to skip 3D-specific calculations in
    2D cases
  • Reorganized gradient computation loops with conditional compilation
    for dimensional optimization
  • Improved code indentation and structure for better readability
+319/-316
m_riemann_solvers.fpp
Optimize MHD calculations and improve GPU parallelization

src/simulation/m_riemann_solvers.fpp

  • Added case.fpp include for case optimization support
  • Wrapped MHD-related 3D vector operations with MFC_CASE_OPTIMIZATION
    conditionals to optimize 2D cases
  • Expanded private variable lists in GPU parallel loops for better
    memory management
  • Fixed duplicate variable assignments and improved variable
    initialization in HLLC solver
  • Added explicit type conversions for mixed precision support
+78/-35 
m_mpi_common.fpp
Support mixed precision and dynamic dimensionality in MPI

src/common/m_mpi_common.fpp

  • Added case.fpp include directive for case optimization
  • Changed halo_size from integer to integer(kind=8) for 64-bit support
  • Fixed dimension indexing to use num_dims instead of hardcoded value 3
    for 2D/3D compatibility
  • Added explicit type conversions between wp and stp precision kinds in
    MPI buffer operations (see the pack/unpack sketch after this entry)
+23/-23 
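
A minimal sketch of the wp/stp conversions this entry describes; the buffer names are placeholders, not necessarily the exact MFC variables:

! packing a halo cell before the MPI send: narrow from working to storage precision
buff_send(r) = real(q_prim_vf(i)%sf(j, k, l), stp)

! unpacking after the MPI receive: widen back to working precision
q_prim_vf(i)%sf(j, k, l) = real(buff_recv(r), wp)
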
m_cbc.fpp
Optimize CBC module with case-specific conditionals           

src/simulation/m_cbc.fpp

  • Added case.fpp include for case optimization support
  • Moved dpres_ds variable declaration from module level to local scope
    within subroutine
  • Added dpres_ds to private variable list in GPU parallel loop
  • Wrapped dpi_inf_dt assignment with MFC_CASE_OPTIMIZATION conditional
    for single-fluid optimization
  • Improved GPU update directive formatting
+7/-8     
m_global_parameters.fpp
Add simplex noise perturbation parameters                               

src/pre_process/m_global_parameters.fpp

  • Added new simplex_perturb logical flag for simplex noise perturbation
    support
  • Added simplex_params derived type variable to store simplex noise
    parameters
  • Initialized simplex perturbation parameters in module initialization
    routine
+12/-0   
m_qbmm.fpp
Mixed precision support and case optimization for QBMM     

src/simulation/m_qbmm.fpp

  • Changed pb parameter type from real(wp) to real(stp) for mixed
    precision support
  • Wrapped bubble model coefficient calculations with preprocessor
    conditionals for case optimization
  • Added missing loop variables i1, i2, j to GPU parallel loop private
    list
  • Added TODO comment regarding rhs_pb and rhs_mv precision types
+98/-87 
m_simplex_noise.fpp
New simplex noise generation module                                           

src/pre_process/m_simplex_noise.fpp

  • New module implementing 3D and 2D Perlin simplex noise functions
  • Includes gradient lookup tables and permutation vectors for noise
    generation
  • Provides f_simplex3d and f_simplex2d functions for procedural noise
+245/-0 
m_fftw.fpp
FFT filter GPU implementation refactoring                               

src/simulation/m_fftw.fpp

  • Refactored GPU FFT filter implementation to use direct device
    addresses instead of pointers
  • Removed #:block UNDEF_CCE wrapper and simplified GPU data management
  • Restructured loop logic to process initial ring separately before main
    loop
  • Improved compatibility with both NVIDIA and AMD GPU compilers
+115/-130
m_start_up.fpp
Mixed precision I/O and time stepping improvements             

src/simulation/m_start_up.fpp

  • Changed WP_MOK from 8 bytes to 4 bytes for mixed precision I/O
  • Updated MPI file read calls to use mpi_io_type multiplier for data
    size
  • Added logic to copy q_cons_ts(1) to q_cons_ts(2) for multi-stage time
    steppers
  • Fixed NaN checking to use explicit real() conversion for mixed
    precision
  • Added GPU updates for bubble and chemistry parameters
  • Removed unused GPU declare directive for q_cons_temp
+88/-54 
m_rhs.fpp
Mixed precision and GPU compatibility in RHS module           

src/simulation/m_rhs.fpp

  • Added conditional GPU declare directives for OpenACC-specific
    declarations
  • Changed loop variable types to integer(kind=8) for large array
    indexing
  • Wrapped flux allocation logic with if (.not. igr) condition
  • Updated bc_type parameter dimension from 1:3, -1:1 to 1:3, 1:2
  • Changed pb_in, mv_in parameters to real(stp) type for mixed precision
  • Added TODO comments about precision type consistency
+57/-45 
m_icpp_patches.fpp
Mixed precision support for patch identification arrays   

src/pre_process/m_icpp_patches.fpp

  • Added conditional compilation for patch_id_fp array type based on the
    MFC_MIXED_PRECISION flag (see the declaration sketch after this entry)
  • Uses integer(kind=1) for mixed precision mode, standard integer
    otherwise
  • Applied consistently across all patch subroutines for memory
    efficiency
+75/-4   
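
A hedged sketch of the conditional declaration this entry describes; the array bounds and attributes are placeholders:

#ifdef MFC_MIXED_PRECISION
    integer(kind=1), dimension(0:m, 0:n, 0:p) :: patch_id_fp   ! 1-byte patch ids in mixed-precision builds
#else
    integer, dimension(0:m, 0:n, 0:p) :: patch_id_fp           ! default integer otherwise
#endif
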
m_ibm.fpp
Mixed precision support for IBM module                                     

src/simulation/m_ibm.fpp

  • Changed pb_in and mv_in parameters to real(stp) type for mixed
    precision
  • Added missing variables to GPU parallel loop private list
  • Added type casting for pb_in and mv_in in interpolation calls
  • Changed levelset_in access to use explicit real() conversion
  • Updated pb_in, mv_in intent from INOUT to IN in interpolation
    subroutine
+7/-5     
inline_capillary.fpp
Case optimization for capillary tensor calculations           

src/simulation/include/inline_capillary.fpp

  • Wrapped 3D capillary tensor calculations with preprocessor conditional
  • Only computes 3D components when not using case optimization or when
    num_dims > 2
  • Reduces unnecessary computations for 2D simulations
+7/-6     
Bug fix
2 files
m_muscl.fpp
Rename MUSCL arrays and fix GPU loop syntax                           

src/simulation/m_muscl.fpp

  • Renamed MUSCL workspace arrays from v_rs_ws_x/y/z to
    v_rs_ws_x/y/z_muscl to avoid naming conflicts
  • Fixed GPU parallel loop syntax and indentation consistency
  • Corrected #:endcall directives to include full macro name for clarity
  • Improved code formatting and nested subroutine indentation
+158/-158
m_chemistry.fpp
GPU compatibility fixes for chemistry module                         

src/common/m_chemistry.fpp

  • Removed sequential GPU loop directives that were causing compilation
    issues
  • Added temporary variable T_in for type conversion in temperature
    calculations
  • Wrapped large GPU parallel loop with #:block UNDEF_AMD to handle AMD
    compiler differences
  • Added missing variables to GPU parallel loop private and copyin lists
+113/-112
Additional files
48 files
bench.yml +8/-0     
submit-bench.sh +1/-1     
submit-bench.sh +1/-1     
test.yml +0/-4     
CMakeLists.txt +18/-7   
case.py +104/-0 
load_amd.sh +7/-0     
setupNB.sh +5/-0     
3dHardcodedIC.fpp +113/-2 
macros.fpp +5/-4     
omp_macros.fpp +25/-1   
m_derived_types.fpp +26/-9   
m_helper.fpp +6/-6     
m_precision_select.f90 +14/-0   
m_variables_conversion.fpp +10/-34 
m_data_input.f90 +11/-10 
m_assign_variables.fpp +18/-4   
m_checker.fpp +23/-0   
m_data_output.fpp +32/-34 
m_initial_condition.fpp +35/-14 
m_mpi_proxy.fpp +29/-2   
m_perturbation.fpp +90/-1   
m_start_up.fpp +3/-2     
m_acoustic_src.fpp +8/-11   
m_body_forces.fpp +1/-1     
m_bubbles_EE.fpp +4/-4     
m_bubbles_EL_kernels.fpp +15/-15 
m_data_output.fpp +25/-26 
m_derived_variables.fpp +15/-13 
m_global_parameters.fpp +6/-11   
m_hyperelastic.fpp +1/-1     
m_hypoelastic.fpp +4/-4     
m_igr.fpp +1616/-1385
m_mhd.fpp +1/-1     
m_sim_helpers.fpp +31/-26 
m_surface_tension.fpp +38/-34 
m_time_steppers.fpp +86/-37 
p_main.fpp +2/-0     
golden-metadata.txt +154/-0 
golden.txt +10/-0   
build.py +2/-1     
lock.py +1/-1     
case_dicts.py +16/-0   
input.py +1/-1     
state.py +3/-3     
modules +4/-0     
pyproject.toml +3/-2     
frontier.mako +11/-10 

prathi-wind and others added 30 commits July 22, 2025 13:16
…, started work on enter and exit data, compiles
…e, add mappers to derived types, change how allocate is done
…types, removed rest of pure functions, fix issue with acoustic on nvfortran
@codecov

codecov bot commented Nov 11, 2025

Codecov Report

❌ Patch coverage is 20.90395% with 420 lines in your changes missing coverage. Please review.
✅ Project coverage is 46.06%. Comparing base (bfd732c) to head (e1d9044).
⚠️ Report is 6 commits behind head on master.

Files with missing lines Patch % Lines
src/pre_process/m_simplex_noise.fpp 0.00% 82 Missing ⚠️
src/simulation/m_qbmm.fpp 1.25% 73 Missing and 6 partials ⚠️
src/pre_process/m_perturbation.fpp 0.00% 46 Missing ⚠️
src/common/m_boundary_common.fpp 31.14% 35 Missing and 7 partials ⚠️
src/pre_process/m_data_output.fpp 23.07% 20 Missing ⚠️
src/simulation/m_start_up.fpp 33.33% 20 Missing ⚠️
src/simulation/m_viscous.fpp 5.26% 16 Missing and 2 partials ⚠️
src/simulation/m_sim_helpers.fpp 20.00% 14 Missing and 2 partials ⚠️
src/simulation/m_muscl.fpp 44.00% 8 Missing and 6 partials ⚠️
src/pre_process/m_boundary_conditions.fpp 0.00% 10 Missing ⚠️
... and 21 more
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1035      +/-   ##
==========================================
+ Coverage   46.02%   46.06%   +0.03%     
==========================================
  Files          67       68       +1     
  Lines       13437    13558     +121     
  Branches     1550     1590      +40     
==========================================
+ Hits         6185     6245      +60     
- Misses       6362     6388      +26     
- Partials      890      925      +35     

☔ View full report in Codecov by Sentry.

Collaborator

@wilfonba wilfonba left a comment


Approve for benchmark

@sbryngelson
Member

somehow all the post-process benchmark tests are failing

@sbryngelson sbryngelson marked this pull request as ready for review November 12, 2025 17:19
Copilot AI review requested due to automatic review settings November 12, 2025 17:19
@sbryngelson sbryngelson requested review from a team and sbryngelson as code owners November 12, 2025 17:19
@qodo-merge-pro
Contributor

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Possible Issue

The change from bc index range [-1,1] to [1,2] appears inconsistent with legacy semantics (beg/end). Verify all call sites and initializations now correctly map begin/end to 1/2; e.g., mixed usage of constants (BC_SlIP_WALL vs BC_SLIP_WALL) and new default assignment logic using min(bc_%beg,0) could lead to wrong BC selection or uninitialized paths.

if (p == 0) return

#:if not MFC_CASE_OPTIMIZATION or num_dims > 2

    if (bc_z%beg >= 0) then
        call s_mpi_sendrecv_variables_buffers(q_prim_vf, 3, -1, sys_size, pb_in, mv_in)
    else
        #:call GPU_PARALLEL_LOOP(collapse=2)
            do l = -buff_size, n + buff_size
                do k = -buff_size, m + buff_size
                    select case (int(bc_type(3, 1)%sf(k, l, 0)))
                    case (BC_CHAR_SUP_OUTFLOW:BC_GHOST_EXTRAP)
                        call s_ghost_cell_extrapolation(q_prim_vf, 3, -1, k, l)
                    case (BC_REFLECTIVE)
                        call s_symmetry(q_prim_vf, 3, -1, k, l, pb_in, mv_in)
                    case (BC_PERIODIC)
                        call s_periodic(q_prim_vf, 3, -1, k, l, pb_in, mv_in)
                    case (BC_SLIP_WALL)
                        call s_slip_wall(q_prim_vf, 3, -1, k, l)
                    case (BC_NO_SLIP_WALL)
                        call s_no_slip_wall(q_prim_vf, 3, -1, k, l)
                    case (BC_DIRICHLET)
                        call s_dirichlet(q_prim_vf, 3, -1, k, l)
                    end select

                    if (qbmm .and. (.not. polytropic) .and. &
                        (bc_type(3, 1)%sf(k, l, 0) <= BC_GHOST_EXTRAP)) then
                        call s_qbmm_extrapolation(3, -1, k, l, pb_in, mv_in)
                    end if
                end do
            end do
        #:endcall GPU_PARALLEL_LOOP
    end if

    if (bc_z%end >= 0) then
        call s_mpi_sendrecv_variables_buffers(q_prim_vf, 3, 1, sys_size, pb_in, mv_in)
    else
        #:call GPU_PARALLEL_LOOP(collapse=2)
            do l = -buff_size, n + buff_size
                do k = -buff_size, m + buff_size
                    select case (int(bc_type(3, 2)%sf(k, l, 0)))
                    case (BC_CHAR_SUP_OUTFLOW:BC_GHOST_EXTRAP)
                        call s_ghost_cell_extrapolation(q_prim_vf, 3, 1, k, l)
                    case (BC_REFLECTIVE)
                        call s_symmetry(q_prim_vf, 3, 1, k, l, pb_in, mv_in)
                    case (BC_PERIODIC)
                        call s_periodic(q_prim_vf, 3, 1, k, l, pb_in, mv_in)
                    case (BC_SlIP_WALL)
                        call s_slip_wall(q_prim_vf, 3, 1, k, l)
                    case (BC_NO_SLIP_WALL)
                        call s_no_slip_wall(q_prim_vf, 3, 1, k, l)
                    case (BC_DIRICHLET)
                        call s_dirichlet(q_prim_vf, 3, 1, k, l)
                    end select

                    if (qbmm .and. (.not. polytropic) .and. &
                        (bc_type(3, 2)%sf(k, l, 0) <= BC_GHOST_EXTRAP)) then
                        call s_qbmm_extrapolation(3, 1, k, l, pb_in, mv_in)
                    end if
                end do

Typo/Case Sensitivity

In several select cases previously using BC_SLIP_WALL, a new token BC_SlIP_WALL (lowercase l) was introduced. This likely does not match the defined constant and would skip the intended branch at runtime.

MPI I/O Change

MPI parallel I/O for bc types/buffers moved from derived subarray datatypes and file views to raw contiguous writes with element counts computed via sizeof and custom mpi_io_type/stp. Confirm datatype, alignment, and endianness consistency across ranks and restarts, especially under mixed precision; also ensure nelements computation is correct for all precisions.

        integer :: offset
        character(len=7) :: proc_rank_str
        logical :: dir_check
        integer :: nelements

        call s_pack_boundary_condition_buffers(q_prim_vf)

        file_loc = trim(case_dir)//'/restart_data/boundary_conditions'
        if (proc_rank == 0) then
            call my_inquire(file_loc, dir_check)
            if (dir_check .neqv. .true.) then
                call s_create_directory(trim(file_loc))
            end if
        end if

        call s_create_mpi_types(bc_type)

        call s_mpi_barrier()

        call DelayFileAccess(proc_rank)

        write (proc_rank_str, '(I7.7)') proc_rank
        file_path = trim(file_loc)//'/bc_'//trim(proc_rank_str)//'.dat'
        call MPI_File_open(MPI_COMM_SELF, trim(file_path), MPI_MODE_CREATE + MPI_MODE_WRONLY, MPI_INFO_NULL, file_id, ierr)

        offset = 0

        ! Write bc_types
        do dir = 1, num_dims
            do loc = 1, 2
#ifdef MFC_MIXED_PRECISION
                nelements = sizeof(bc_type(dir, loc)%sf)
                call MPI_File_write_all(file_id, bc_type(dir, loc)%sf, nelements, MPI_BYTE, MPI_STATUS_IGNORE, ierr)
#else
                nelements = sizeof(bc_type(dir, loc)%sf)/4
                call MPI_File_write_all(file_id, bc_type(dir, loc)%sf, nelements, MPI_INTEGER, MPI_STATUS_IGNORE, ierr)
#endif
            end do
        end do

        ! Write bc_buffers
        do dir = 1, num_dims
            do loc = 1, 2
                nelements = sizeof(bc_buffers(dir, loc)%sf)*mpi_io_type/stp
                call MPI_File_write_all(file_id, bc_buffers(dir, loc)%sf, nelements, mpi_io_p, MPI_STATUS_IGNORE, ierr)
            end do
        end do

        call MPI_File_close(file_id, ierr)
#endif

    end subroutine s_write_parallel_boundary_condition_files

    subroutine s_read_serial_boundary_condition_files(step_dirpath, bc_type)

        character(LEN=*), intent(in) :: step_dirpath

        type(integer_field), dimension(1:num_dims, 1:2), intent(inout) :: bc_type

        integer :: dir, loc
        logical :: file_exist
        character(len=path_len) :: file_path

        character(len=10) :: status

        ! Read bc_types
        file_path = trim(step_dirpath)//'/bc_type.dat'
        inquire (FILE=trim(file_path), EXIST=file_exist)
        if (.not. file_exist) then
            call s_mpi_abort(trim(file_path)//' is missing. Exiting.')
        end if

        open (1, FILE=trim(file_path), FORM='unformatted', STATUS='unknown')
        do dir = 1, num_dims
            do loc = 1, 2
                read (1) bc_type(dir, loc)%sf
                $:GPU_UPDATE(device='[bc_type(dir, loc)%sf]')
            end do
        end do
        close (1)

        ! Read bc_buffers
        file_path = trim(step_dirpath)//'/bc_buffers.dat'
        inquire (FILE=trim(file_path), EXIST=file_exist)
        if (.not. file_exist) then
            call s_mpi_abort(trim(file_path)//' is missing. Exiting.')
        end if

        open (1, FILE=trim(file_path), FORM='unformatted', STATUS='unknown')
        do dir = 1, num_dims
            do loc = 1, 2
                read (1) bc_buffers(dir, loc)%sf
                $:GPU_UPDATE(device='[bc_buffers(dir, loc)%sf]')
            end do
        end do
        close (1)

    end subroutine s_read_serial_boundary_condition_files

    subroutine s_read_parallel_boundary_condition_files(bc_type)

        type(integer_field), dimension(1:num_dims, 1:2), intent(inout) :: bc_type

        integer :: dir, loc
        character(len=path_len) :: file_loc, file_path

        character(len=10) :: status

#ifdef MFC_MPI
        integer :: ierr
        integer :: file_id
        integer :: offset
        character(len=7) :: proc_rank_str
        logical :: dir_check
        integer :: nelements

        file_loc = trim(case_dir)//'/restart_data/boundary_conditions'

        if (proc_rank == 0) then
            call my_inquire(file_loc, dir_check)
            if (dir_check .neqv. .true.) then
                call s_mpi_abort(trim(file_loc)//' is missing. Exiting.')
            end if
        end if

        call s_create_mpi_types(bc_type)

        call s_mpi_barrier()

        call DelayFileAccess(proc_rank)

        write (proc_rank_str, '(I7.7)') proc_rank
        file_path = trim(file_loc)//'/bc_'//trim(proc_rank_str)//'.dat'
        call MPI_File_open(MPI_COMM_SELF, trim(file_path), MPI_MODE_RDONLY, MPI_INFO_NULL, file_id, ierr)

        offset = 0

        ! Read bc_types
        do dir = 1, num_dims
            do loc = 1, 2
#ifdef MFC_MIXED_PRECISION
                nelements = sizeof(bc_type(dir, loc)%sf)
                call MPI_File_read_all(file_id, bc_type(dir, loc)%sf, nelements, MPI_BYTE, MPI_STATUS_IGNORE, ierr)
#else
                nelements = sizeof(bc_type(dir, loc)%sf)/4
                call MPI_File_read_all(file_id, bc_type(dir, loc)%sf, nelements, MPI_INTEGER, MPI_STATUS_IGNORE, ierr)
#endif
                $:GPU_UPDATE(device='[bc_type(dir, loc)%sf]')
            end do
        end do

        ! Read bc_buffers
        do dir = 1, num_dims
            do loc = 1, 2
                nelements = sizeof(bc_buffers(dir, loc)%sf)*mpi_io_type/stp
                call MPI_File_read_all(file_id, bc_buffers(dir, loc)%sf, nelements, mpi_io_p, MPI_STATUS_IGNORE, ierr)
                $:GPU_UPDATE(device='[bc_buffers(dir, loc)%sf]')
            end do
        end do

Copilot finished reviewing on behalf of sbryngelson November 12, 2025 17:21
Comment on lines +1944 to 1959
bc_type(1, 1)%sf(:, :, :) = int(min(bc_x%beg, 0), kind=1)
bc_type(1, 2)%sf(:, :, :) = int(min(bc_x%end, 0), kind=1)
$:GPU_UPDATE(device='[bc_type(1,1)%sf,bc_type(1,2)%sf]')

if (n > 0) then
bc_type(2, -1)%sf(:, :, :) = bc_y%beg
bc_type(2, 1)%sf(:, :, :) = bc_y%end
$:GPU_UPDATE(device='[bc_type(2,-1)%sf,bc_type(2,1)%sf]')

if (p > 0) then
bc_type(3, -1)%sf(:, :, :) = bc_z%beg
bc_type(3, 1)%sf(:, :, :) = bc_z%end
$:GPU_UPDATE(device='[bc_type(3,-1)%sf,bc_type(3,1)%sf]')
end if
bc_type(2, 1)%sf(:, :, :) = int(min(bc_y%beg, 0), kind=1)
bc_type(2, 2)%sf(:, :, :) = int(min(bc_y%end, 0), kind=1)
$:GPU_UPDATE(device='[bc_type(2,1)%sf,bc_type(2,2)%sf]')
#:if not MFC_CASE_OPTIMIZATION or num_dims > 2
if (p > 0) then
bc_type(3, 1)%sf(:, :, :) = int(min(bc_z%beg, 0), kind=1)
bc_type(3, 2)%sf(:, :, :) = int(min(bc_z%end, 0), kind=1)
$:GPU_UPDATE(device='[bc_type(3,1)%sf,bc_type(3,2)%sf]')
end if
#:endif
end if
Contributor


Suggestion: Correct the default boundary condition assignment by removing the min(..., 0) logic, which incorrectly handles MPI processor ranks. The previous logic of direct assignment from bc_...%... should be restored with a type cast. [possible issue, importance: 10]

Suggested change
bc_type(1, 1)%sf(:, :, :) = int(min(bc_x%beg, 0), kind=1)
bc_type(1, 2)%sf(:, :, :) = int(min(bc_x%end, 0), kind=1)
$:GPU_UPDATE(device='[bc_type(1,1)%sf,bc_type(1,2)%sf]')
if (n > 0) then
bc_type(2, -1)%sf(:, :, :) = bc_y%beg
bc_type(2, 1)%sf(:, :, :) = bc_y%end
$:GPU_UPDATE(device='[bc_type(2,-1)%sf,bc_type(2,1)%sf]')
if (p > 0) then
bc_type(3, -1)%sf(:, :, :) = bc_z%beg
bc_type(3, 1)%sf(:, :, :) = bc_z%end
$:GPU_UPDATE(device='[bc_type(3,-1)%sf,bc_type(3,1)%sf]')
end if
bc_type(2, 1)%sf(:, :, :) = int(min(bc_y%beg, 0), kind=1)
bc_type(2, 2)%sf(:, :, :) = int(min(bc_y%end, 0), kind=1)
$:GPU_UPDATE(device='[bc_type(2,1)%sf,bc_type(2,2)%sf]')
#:if not MFC_CASE_OPTIMIZATION or num_dims > 2
if (p > 0) then
bc_type(3, 1)%sf(:, :, :) = int(min(bc_z%beg, 0), kind=1)
bc_type(3, 2)%sf(:, :, :) = int(min(bc_z%end, 0), kind=1)
$:GPU_UPDATE(device='[bc_type(3,1)%sf,bc_type(3,2)%sf]')
end if
#:endif
end if
bc_type(1, 1)%sf(:, :, :) = int(bc_x%beg, kind=1)
bc_type(1, 2)%sf(:, :, :) = int(bc_x%end, kind=1)
$:GPU_UPDATE(device='[bc_type(1,1)%sf,bc_type(1,2)%sf]')
if (n > 0) then
bc_type(2, 1)%sf(:, :, :) = int(bc_y%beg, kind=1)
bc_type(2, 2)%sf(:, :, :) = int(bc_y%end, kind=1)
$:GPU_UPDATE(device='[bc_type(2,1)%sf,bc_type(2,2)%sf]')
#:if not MFC_CASE_OPTIMIZATION or num_dims > 2
if (p > 0) then
bc_type(3, 1)%sf(:, :, :) = int(bc_z%beg, kind=1)
bc_type(3, 2)%sf(:, :, :) = int(bc_z%end, kind=1)
$:GPU_UPDATE(device='[bc_type(3,1)%sf,bc_type(3,2)%sf]')
end if
#:endif
end if

Comment on lines +311 to +519
if (shear_stress) then ! Shear stresses
#:call GPU_PARALLEL_LOOP(collapse=3, private='[alpha_visc, alpha_rho_visc, Re_visc, tau_Re]')
do l = is3_viscous%beg, is3_viscous%end
do k = -1, 1
do j = is1_viscous%beg, is1_viscous%end

$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
alpha_rho_visc(i) = q_prim_vf(i)%sf(j, k, l)
if (bubbles_euler .and. num_fluids == 1) then
alpha_visc(i) = 1._wp - q_prim_vf(E_idx + i)%sf(j, k, l)
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
alpha_rho_visc(i) = q_prim_vf(i)%sf(j, k, l)
if (bubbles_euler .and. num_fluids == 1) then
alpha_visc(i) = 1._wp - q_prim_vf(E_idx + i)%sf(j, k, l)
else
alpha_visc(i) = q_prim_vf(E_idx + i)%sf(j, k, l)
end if
end do

if (bubbles_euler) then
rho_visc = 0._wp
gamma_visc = 0._wp
pi_inf_visc = 0._wp

if (mpp_lim .and. (model_eqns == 2) .and. (num_fluids > 2)) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
end do
else if ((model_eqns == 2) .and. (num_fluids > 2)) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids - 1
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
end do
else
rho_visc = alpha_rho_visc(1)
gamma_visc = gammas(1)
pi_inf_visc = pi_infs(1)
end if
else
alpha_visc(i) = q_prim_vf(E_idx + i)%sf(j, k, l)
end if
end do
rho_visc = 0._wp
gamma_visc = 0._wp
pi_inf_visc = 0._wp

if (bubbles_euler) then
rho_visc = 0._wp
gamma_visc = 0._wp
pi_inf_visc = 0._wp
alpha_visc_sum = 0._wp

if (mpp_lim) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
alpha_rho_visc(i) = max(0._wp, alpha_rho_visc(i))
alpha_visc(i) = min(max(0._wp, alpha_visc(i)), 1._wp)
alpha_visc_sum = alpha_visc_sum + alpha_visc(i)
end do

alpha_visc = alpha_visc/max(alpha_visc_sum, sgm_eps)

end if

if (mpp_lim .and. (model_eqns == 2) .and. (num_fluids > 2)) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
end do
else if ((model_eqns == 2) .and. (num_fluids > 2)) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids - 1
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
end do
else
rho_visc = alpha_rho_visc(1)
gamma_visc = gammas(1)
pi_inf_visc = pi_infs(1)

if (viscous) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, 2
Re_visc(i) = dflt_real

if (Re_size(i) > 0) Re_visc(i) = 0._wp
$:GPU_LOOP(parallelism='[seq]')
do q = 1, Re_size(i)
Re_visc(i) = alpha_visc(Re_idx(i, q))/Res_viscous(i, q) &
+ Re_visc(i)
end do

Re_visc(i) = 1._wp/max(Re_visc(i), sgm_eps)

end do
end if
end if
else
rho_visc = 0._wp
gamma_visc = 0._wp
pi_inf_visc = 0._wp

alpha_visc_sum = 0._wp
tau_Re(2, 2) = -(2._wp/3._wp)*grad_z_vf(3)%sf(j, k, l)/y_cc(k)/ &
Re_visc(1)

if (mpp_lim) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
alpha_rho_visc(i) = max(0._wp, alpha_rho_visc(i))
alpha_visc(i) = min(max(0._wp, alpha_visc(i)), 1._wp)
alpha_visc_sum = alpha_visc_sum + alpha_visc(i)
end do
tau_Re(2, 3) = ((grad_z_vf(2)%sf(j, k, l) - &
q_prim_vf(momxe)%sf(j, k, l))/ &
y_cc(k) + grad_y_vf(3)%sf(j, k, l))/ &
Re_visc(1)

alpha_visc = alpha_visc/max(alpha_visc_sum, sgm_eps)
$:GPU_LOOP(parallelism='[seq]')
do i = 2, 3
tau_Re_vf(contxe + i)%sf(j, k, l) = &
tau_Re_vf(contxe + i)%sf(j, k, l) - &
tau_Re(2, i)

tau_Re_vf(E_idx)%sf(j, k, l) = &
tau_Re_vf(E_idx)%sf(j, k, l) - &
q_prim_vf(contxe + i)%sf(j, k, l)*tau_Re(2, i)
end do

end if
end do
end do
end do
#:endcall GPU_PARALLEL_LOOP
end if

if (bulk_stress) then ! Bulk stresses
#:call GPU_PARALLEL_LOOP(collapse=3, private='[alpha_visc, alpha_rho_visc, Re_visc, tau_Re]')
do l = is3_viscous%beg, is3_viscous%end
do k = -1, 1
do j = is1_viscous%beg, is1_viscous%end

$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
alpha_rho_visc(i) = q_prim_vf(i)%sf(j, k, l)
if (bubbles_euler .and. num_fluids == 1) then
alpha_visc(i) = 1._wp - q_prim_vf(E_idx + i)%sf(j, k, l)
else
alpha_visc(i) = q_prim_vf(E_idx + i)%sf(j, k, l)
end if
end do

if (viscous) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, 2
Re_visc(i) = dflt_real
if (bubbles_euler) then
rho_visc = 0._wp
gamma_visc = 0._wp
pi_inf_visc = 0._wp

if (Re_size(i) > 0) Re_visc(i) = 0._wp
if (mpp_lim .and. (model_eqns == 2) .and. (num_fluids > 2)) then
$:GPU_LOOP(parallelism='[seq]')
do q = 1, Re_size(i)
Re_visc(i) = alpha_visc(Re_idx(i, q))/Res_viscous(i, q) &
+ Re_visc(i)
do i = 1, num_fluids
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
end do
else if ((model_eqns == 2) .and. (num_fluids > 2)) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids - 1
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
end do
else
rho_visc = alpha_rho_visc(1)
gamma_visc = gammas(1)
pi_inf_visc = pi_infs(1)
end if
else
rho_visc = 0._wp
gamma_visc = 0._wp
pi_inf_visc = 0._wp

Re_visc(i) = 1._wp/max(Re_visc(i), sgm_eps)

end do
end if
end if

tau_Re(2, 2) = -(2._wp/3._wp)*grad_z_vf(3)%sf(j, k, l)/y_cc(k)/ &
Re_visc(1)

tau_Re(2, 3) = ((grad_z_vf(2)%sf(j, k, l) - &
q_prim_vf(momxe)%sf(j, k, l))/ &
y_cc(k) + grad_y_vf(3)%sf(j, k, l))/ &
Re_visc(1)

$:GPU_LOOP(parallelism='[seq]')
do i = 2, 3
tau_Re_vf(contxe + i)%sf(j, k, l) = &
tau_Re_vf(contxe + i)%sf(j, k, l) - &
tau_Re(2, i)

tau_Re_vf(E_idx)%sf(j, k, l) = &
tau_Re_vf(E_idx)%sf(j, k, l) - &
q_prim_vf(contxe + i)%sf(j, k, l)*tau_Re(2, i)
end do

end do
end do
end do
#:endcall GPU_PARALLEL_LOOP
end if
alpha_visc_sum = 0._wp

if (bulk_stress) then ! Bulk stresses
#:call GPU_PARALLEL_LOOP(collapse=3, private='[alpha_visc, alpha_rho_visc, Re_visc, tau_Re]')
do l = is3_viscous%beg, is3_viscous%end
do k = -1, 1
do j = is1_viscous%beg, is1_viscous%end
if (mpp_lim) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
alpha_rho_visc(i) = max(0._wp, alpha_rho_visc(i))
alpha_visc(i) = min(max(0._wp, alpha_visc(i)), 1._wp)
alpha_visc_sum = alpha_visc_sum + alpha_visc(i)
end do

$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
alpha_rho_visc(i) = q_prim_vf(i)%sf(j, k, l)
if (bubbles_euler .and. num_fluids == 1) then
alpha_visc(i) = 1._wp - q_prim_vf(E_idx + i)%sf(j, k, l)
else
alpha_visc(i) = q_prim_vf(E_idx + i)%sf(j, k, l)
end if
end do
alpha_visc = alpha_visc/max(alpha_visc_sum, sgm_eps)

if (bubbles_euler) then
rho_visc = 0._wp
gamma_visc = 0._wp
pi_inf_visc = 0._wp
end if

if (mpp_lim .and. (model_eqns == 2) .and. (num_fluids > 2)) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
end do
else if ((model_eqns == 2) .and. (num_fluids > 2)) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids - 1
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
end do
else
rho_visc = alpha_rho_visc(1)
gamma_visc = gammas(1)
pi_inf_visc = pi_infs(1)
end if
else
rho_visc = 0._wp
gamma_visc = 0._wp
pi_inf_visc = 0._wp

alpha_visc_sum = 0._wp

if (mpp_lim) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
alpha_rho_visc(i) = max(0._wp, alpha_rho_visc(i))
alpha_visc(i) = min(max(0._wp, alpha_visc(i)), 1._wp)
alpha_visc_sum = alpha_visc_sum + alpha_visc(i)
end do

alpha_visc = alpha_visc/max(alpha_visc_sum, sgm_eps)

end if

$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
end do

if (viscous) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, 2
Re_visc(i) = dflt_real

if (Re_size(i) > 0) Re_visc(i) = 0._wp
if (viscous) then
$:GPU_LOOP(parallelism='[seq]')
do q = 1, Re_size(i)
Re_visc(i) = alpha_visc(Re_idx(i, q))/Res_viscous(i, q) &
+ Re_visc(i)
end do
do i = 1, 2
Re_visc(i) = dflt_real

if (Re_size(i) > 0) Re_visc(i) = 0._wp
$:GPU_LOOP(parallelism='[seq]')
do q = 1, Re_size(i)
Re_visc(i) = alpha_visc(Re_idx(i, q))/Res_viscous(i, q) &
+ Re_visc(i)
end do

Re_visc(i) = 1._wp/max(Re_visc(i), sgm_eps)
Re_visc(i) = 1._wp/max(Re_visc(i), sgm_eps)

end do
end do
end if
end if
end if

tau_Re(2, 2) = grad_z_vf(3)%sf(j, k, l)/y_cc(k)/ &
Re_visc(2)
tau_Re(2, 2) = grad_z_vf(3)%sf(j, k, l)/y_cc(k)/ &
Re_visc(2)

tau_Re_vf(momxb + 1)%sf(j, k, l) = &
tau_Re_vf(momxb + 1)%sf(j, k, l) - &
tau_Re(2, 2)
tau_Re_vf(momxb + 1)%sf(j, k, l) = &
tau_Re_vf(momxb + 1)%sf(j, k, l) - &
tau_Re(2, 2)

tau_Re_vf(E_idx)%sf(j, k, l) = &
tau_Re_vf(E_idx)%sf(j, k, l) - &
q_prim_vf(momxb + 1)%sf(j, k, l)*tau_Re(2, 2)
tau_Re_vf(E_idx)%sf(j, k, l) = &
tau_Re_vf(E_idx)%sf(j, k, l) - &
q_prim_vf(momxb + 1)%sf(j, k, l)*tau_Re(2, 2)

end do
end do
end do
end do
#:endcall GPU_PARALLEL_LOOP
end if
#:endcall GPU_PARALLEL_LOOP
end if
Contributor


Suggestion: Refactor the duplicated logic within the shear_stress and bulk_stress blocks into a separate subroutine to improve code maintainability. [general, importance: 7]

Suggested change
if (shear_stress) then ! Shear stresses
#:call GPU_PARALLEL_LOOP(collapse=3, private='[alpha_visc, alpha_rho_visc, Re_visc, tau_Re]')
do l = is3_viscous%beg, is3_viscous%end
do k = -1, 1
do j = is1_viscous%beg, is1_viscous%end
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
alpha_rho_visc(i) = q_prim_vf(i)%sf(j, k, l)
if (bubbles_euler .and. num_fluids == 1) then
alpha_visc(i) = 1._wp - q_prim_vf(E_idx + i)%sf(j, k, l)
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
alpha_rho_visc(i) = q_prim_vf(i)%sf(j, k, l)
if (bubbles_euler .and. num_fluids == 1) then
alpha_visc(i) = 1._wp - q_prim_vf(E_idx + i)%sf(j, k, l)
else
alpha_visc(i) = q_prim_vf(E_idx + i)%sf(j, k, l)
end if
end do
if (bubbles_euler) then
rho_visc = 0._wp
gamma_visc = 0._wp
pi_inf_visc = 0._wp
if (mpp_lim .and. (model_eqns == 2) .and. (num_fluids > 2)) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
end do
else if ((model_eqns == 2) .and. (num_fluids > 2)) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids - 1
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
end do
else
rho_visc = alpha_rho_visc(1)
gamma_visc = gammas(1)
pi_inf_visc = pi_infs(1)
end if
else
alpha_visc(i) = q_prim_vf(E_idx + i)%sf(j, k, l)
end if
end do
rho_visc = 0._wp
gamma_visc = 0._wp
pi_inf_visc = 0._wp
if (bubbles_euler) then
rho_visc = 0._wp
gamma_visc = 0._wp
pi_inf_visc = 0._wp
alpha_visc_sum = 0._wp
if (mpp_lim) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
alpha_rho_visc(i) = max(0._wp, alpha_rho_visc(i))
alpha_visc(i) = min(max(0._wp, alpha_visc(i)), 1._wp)
alpha_visc_sum = alpha_visc_sum + alpha_visc(i)
end do
alpha_visc = alpha_visc/max(alpha_visc_sum, sgm_eps)
end if
if (mpp_lim .and. (model_eqns == 2) .and. (num_fluids > 2)) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
end do
else if ((model_eqns == 2) .and. (num_fluids > 2)) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids - 1
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
end do
else
rho_visc = alpha_rho_visc(1)
gamma_visc = gammas(1)
pi_inf_visc = pi_infs(1)
if (viscous) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, 2
Re_visc(i) = dflt_real
if (Re_size(i) > 0) Re_visc(i) = 0._wp
$:GPU_LOOP(parallelism='[seq]')
do q = 1, Re_size(i)
Re_visc(i) = alpha_visc(Re_idx(i, q))/Res_viscous(i, q) &
+ Re_visc(i)
end do
Re_visc(i) = 1._wp/max(Re_visc(i), sgm_eps)
end do
end if
end if
else
rho_visc = 0._wp
gamma_visc = 0._wp
pi_inf_visc = 0._wp
alpha_visc_sum = 0._wp
tau_Re(2, 2) = -(2._wp/3._wp)*grad_z_vf(3)%sf(j, k, l)/y_cc(k)/ &
Re_visc(1)
if (mpp_lim) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
alpha_rho_visc(i) = max(0._wp, alpha_rho_visc(i))
alpha_visc(i) = min(max(0._wp, alpha_visc(i)), 1._wp)
alpha_visc_sum = alpha_visc_sum + alpha_visc(i)
end do
tau_Re(2, 3) = ((grad_z_vf(2)%sf(j, k, l) - &
q_prim_vf(momxe)%sf(j, k, l))/ &
y_cc(k) + grad_y_vf(3)%sf(j, k, l))/ &
Re_visc(1)
alpha_visc = alpha_visc/max(alpha_visc_sum, sgm_eps)
$:GPU_LOOP(parallelism='[seq]')
do i = 2, 3
tau_Re_vf(contxe + i)%sf(j, k, l) = &
tau_Re_vf(contxe + i)%sf(j, k, l) - &
tau_Re(2, i)
tau_Re_vf(E_idx)%sf(j, k, l) = &
tau_Re_vf(E_idx)%sf(j, k, l) - &
q_prim_vf(contxe + i)%sf(j, k, l)*tau_Re(2, i)
end do
end if
end do
end do
end do
#:endcall GPU_PARALLEL_LOOP
end if
if (bulk_stress) then ! Bulk stresses
#:call GPU_PARALLEL_LOOP(collapse=3, private='[alpha_visc, alpha_rho_visc, Re_visc, tau_Re]')
do l = is3_viscous%beg, is3_viscous%end
do k = -1, 1
do j = is1_viscous%beg, is1_viscous%end
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
alpha_rho_visc(i) = q_prim_vf(i)%sf(j, k, l)
if (bubbles_euler .and. num_fluids == 1) then
alpha_visc(i) = 1._wp - q_prim_vf(E_idx + i)%sf(j, k, l)
else
alpha_visc(i) = q_prim_vf(E_idx + i)%sf(j, k, l)
end if
end do
if (viscous) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, 2
Re_visc(i) = dflt_real
if (bubbles_euler) then
rho_visc = 0._wp
gamma_visc = 0._wp
pi_inf_visc = 0._wp
if (Re_size(i) > 0) Re_visc(i) = 0._wp
if (mpp_lim .and. (model_eqns == 2) .and. (num_fluids > 2)) then
$:GPU_LOOP(parallelism='[seq]')
do q = 1, Re_size(i)
Re_visc(i) = alpha_visc(Re_idx(i, q))/Res_viscous(i, q) &
+ Re_visc(i)
do i = 1, num_fluids
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
end do
else if ((model_eqns == 2) .and. (num_fluids > 2)) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids - 1
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
end do
else
rho_visc = alpha_rho_visc(1)
gamma_visc = gammas(1)
pi_inf_visc = pi_infs(1)
end if
else
rho_visc = 0._wp
gamma_visc = 0._wp
pi_inf_visc = 0._wp
Re_visc(i) = 1._wp/max(Re_visc(i), sgm_eps)
end do
end if
end if
tau_Re(2, 2) = -(2._wp/3._wp)*grad_z_vf(3)%sf(j, k, l)/y_cc(k)/ &
Re_visc(1)
tau_Re(2, 3) = ((grad_z_vf(2)%sf(j, k, l) - &
q_prim_vf(momxe)%sf(j, k, l))/ &
y_cc(k) + grad_y_vf(3)%sf(j, k, l))/ &
Re_visc(1)
$:GPU_LOOP(parallelism='[seq]')
do i = 2, 3
tau_Re_vf(contxe + i)%sf(j, k, l) = &
tau_Re_vf(contxe + i)%sf(j, k, l) - &
tau_Re(2, i)
tau_Re_vf(E_idx)%sf(j, k, l) = &
tau_Re_vf(E_idx)%sf(j, k, l) - &
q_prim_vf(contxe + i)%sf(j, k, l)*tau_Re(2, i)
end do
end do
end do
end do
#:endcall GPU_PARALLEL_LOOP
end if
alpha_visc_sum = 0._wp
if (bulk_stress) then ! Bulk stresses
#:call GPU_PARALLEL_LOOP(collapse=3, private='[alpha_visc, alpha_rho_visc, Re_visc, tau_Re]')
do l = is3_viscous%beg, is3_viscous%end
do k = -1, 1
do j = is1_viscous%beg, is1_viscous%end
if (mpp_lim) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
alpha_rho_visc(i) = max(0._wp, alpha_rho_visc(i))
alpha_visc(i) = min(max(0._wp, alpha_visc(i)), 1._wp)
alpha_visc_sum = alpha_visc_sum + alpha_visc(i)
end do
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
alpha_rho_visc(i) = q_prim_vf(i)%sf(j, k, l)
if (bubbles_euler .and. num_fluids == 1) then
alpha_visc(i) = 1._wp - q_prim_vf(E_idx + i)%sf(j, k, l)
else
alpha_visc(i) = q_prim_vf(E_idx + i)%sf(j, k, l)
end if
end do
alpha_visc = alpha_visc/max(alpha_visc_sum, sgm_eps)
if (bubbles_euler) then
rho_visc = 0._wp
gamma_visc = 0._wp
pi_inf_visc = 0._wp
end if
if (mpp_lim .and. (model_eqns == 2) .and. (num_fluids > 2)) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
end do
else if ((model_eqns == 2) .and. (num_fluids > 2)) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids - 1
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
end do
else
rho_visc = alpha_rho_visc(1)
gamma_visc = gammas(1)
pi_inf_visc = pi_infs(1)
end if
else
rho_visc = 0._wp
gamma_visc = 0._wp
pi_inf_visc = 0._wp
alpha_visc_sum = 0._wp
if (mpp_lim) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
alpha_rho_visc(i) = max(0._wp, alpha_rho_visc(i))
alpha_visc(i) = min(max(0._wp, alpha_visc(i)), 1._wp)
alpha_visc_sum = alpha_visc_sum + alpha_visc(i)
end do
alpha_visc = alpha_visc/max(alpha_visc_sum, sgm_eps)
end if
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
end do
if (viscous) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, 2
Re_visc(i) = dflt_real
if (Re_size(i) > 0) Re_visc(i) = 0._wp
if (viscous) then
$:GPU_LOOP(parallelism='[seq]')
do q = 1, Re_size(i)
Re_visc(i) = alpha_visc(Re_idx(i, q))/Res_viscous(i, q) &
+ Re_visc(i)
end do
do i = 1, 2
Re_visc(i) = dflt_real
if (Re_size(i) > 0) Re_visc(i) = 0._wp
$:GPU_LOOP(parallelism='[seq]')
do q = 1, Re_size(i)
Re_visc(i) = alpha_visc(Re_idx(i, q))/Res_viscous(i, q) &
+ Re_visc(i)
end do
Re_visc(i) = 1._wp/max(Re_visc(i), sgm_eps)
Re_visc(i) = 1._wp/max(Re_visc(i), sgm_eps)
end do
end do
end if
end if
end if
tau_Re(2, 2) = grad_z_vf(3)%sf(j, k, l)/y_cc(k)/ &
Re_visc(2)
tau_Re(2, 2) = grad_z_vf(3)%sf(j, k, l)/y_cc(k)/ &
Re_visc(2)
tau_Re_vf(momxb + 1)%sf(j, k, l) = &
tau_Re_vf(momxb + 1)%sf(j, k, l) - &
tau_Re(2, 2)
tau_Re_vf(momxb + 1)%sf(j, k, l) = &
tau_Re_vf(momxb + 1)%sf(j, k, l) - &
tau_Re(2, 2)
tau_Re_vf(E_idx)%sf(j, k, l) = &
tau_Re_vf(E_idx)%sf(j, k, l) - &
q_prim_vf(momxb + 1)%sf(j, k, l)*tau_Re(2, 2)
tau_Re_vf(E_idx)%sf(j, k, l) = &
tau_Re_vf(E_idx)%sf(j, k, l) - &
q_prim_vf(momxb + 1)%sf(j, k, l)*tau_Re(2, 2)
end do
end do
end do
end do
#:endcall GPU_PARALLEL_LOOP
end if
#:endcall GPU_PARALLEL_LOOP
end if
if (shear_stress) then ! Shear stresses
#:call GPU_PARALLEL_LOOP(collapse=3, private='[alpha_visc, alpha_rho_visc, Re_visc, tau_Re, rho_visc, gamma_visc, pi_inf_visc, alpha_visc_sum]')
do l = is3_viscous%beg, is3_viscous%end
do k = -1, 1
do j = is1_viscous%beg, is1_viscous%end
call s_compute_common_stress_terms(j, k, l, alpha_rho_visc, alpha_visc, rho_visc, gamma_visc, pi_inf_visc, Re_visc)
tau_Re(2, 2) = -(2._wp/3._wp)*grad_z_vf(3)%sf(j, k, l)/y_cc(k)/ &
Re_visc(1)
tau_Re(2, 3) = ((grad_z_vf(2)%sf(j, k, l) - &
q_prim_vf(momxe)%sf(j, k, l))/ &
y_cc(k) + grad_y_vf(3)%sf(j, k, l))/ &
Re_visc(1)
$:GPU_LOOP(parallelism='[seq]')
do i = 2, 3
tau_Re_vf(contxe + i)%sf(j, k, l) = &
tau_Re_vf(contxe + i)%sf(j, k, l) - &
tau_Re(2, i)
tau_Re_vf(E_idx)%sf(j, k, l) = &
tau_Re_vf(E_idx)%sf(j, k, l) - &
q_prim_vf(contxe + i)%sf(j, k, l)*tau_Re(2, i)
end do
end do
end do
end do
#:endcall GPU_PARALLEL_LOOP
end if
if (bulk_stress) then ! Bulk stresses
#:call GPU_PARALLEL_LOOP(collapse=3, private='[alpha_visc, alpha_rho_visc, Re_visc, tau_Re, rho_visc, gamma_visc, pi_inf_visc, alpha_visc_sum]')
do l = is3_viscous%beg, is3_viscous%end
do k = -1, 1
do j = is1_viscous%beg, is1_viscous%end
call s_compute_common_stress_terms(j, k, l, alpha_rho_visc, alpha_visc, rho_visc, gamma_visc, pi_inf_visc, Re_visc)
tau_Re(2, 2) = grad_z_vf(3)%sf(j, k, l)/y_cc(k)/ &
Re_visc(2)
tau_Re_vf(momxb + 1)%sf(j, k, l) = &
tau_Re_vf(momxb + 1)%sf(j, k, l) - &
tau_Re(2, 2)
tau_Re_vf(E_idx)%sf(j, k, l) = &
tau_Re_vf(E_idx)%sf(j, k, l) - &
q_prim_vf(momxb + 1)%sf(j, k, l)*tau_Re(2, 2)
end do
end do
end do
#:endcall GPU_PARALLEL_LOOP
end if

Contributor

Copilot AI left a comment


Pull Request Overview

This PR adds OpenMP support for the Cray Compiler Environment (CCE) and introduces mixed precision functionality developed for Gordon Bell submissions. The changes span toolchain configuration, simulation code optimization, and new simplex noise perturbation capabilities.

Key Changes:

  • Added mixed precision support with a new stp (storage precision) type alongside the existing wp (working precision) type (a minimal kind-selection sketch follows this list)
  • Implemented case-specific optimizations that conditionally compile code based on runtime parameters
  • Updated MPI I/O operations to handle mixed precision data types
  • Added simplex noise module for initial condition perturbations
  • Enhanced Frontier HPC template with better resource management (sbcast for binaries, nvme storage)
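
A minimal kind-selection sketch, assuming (this is an assumption, not taken from the diff) that wp stays at double precision while stp drops to single only when MFC_MIXED_PRECISION is defined; the actual selection lives in m_precision_select.f90:

module m_precision_sketch
    use iso_fortran_env, only: real32, real64
    implicit none
#ifdef MFC_MIXED_PRECISION
    integer, parameter :: wp = real64    ! working precision for arithmetic
    integer, parameter :: stp = real32   ! storage precision for large arrays and I/O
#else
    integer, parameter :: wp = real64
    integer, parameter :: stp = wp       ! without the flag, storage matches working precision
#endif
end module m_precision_sketch

This reading is consistent with the WP_MOK change from 8 to 4 bytes and the real(wp) to real(stp) parameter changes listed below.
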

Reviewed Changes

Copilot reviewed 65 out of 67 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
toolchain/templates/frontier.mako Added nvme constraint, improved binary broadcast with sbcast, streamlined profiler placement
toolchain/pyproject.toml Updated pyrometheus dependency to development branch for OpenMP testing
toolchain/modules Duplicate entry for CSCS Santis configuration (lines 92-95)
toolchain/mfc/state.py Added mixed precision flag to MFCConfig, improved code formatting
toolchain/mfc/run/input.py Updated real_type selection logic to include mixed precision mode
toolchain/mfc/run/case_dicts.py Added simplex_params parameter definitions for density/velocity perturbations
toolchain/mfc/lock.py Incremented lock version to 8
toolchain/mfc/build.py Added MFC_MIXED_PRECISION CMake flag
tests/43B5FEBD/golden-metadata.txt New golden metadata file documenting test configuration
src/simulation/p_main.fpp Added unused status variable declaration
src/simulation/m_weno.fpp Added case optimization guards to conditionally compile WENO stencil code
src/simulation/m_viscous.fpp Wrapped 3D viscous stress calculations in case optimization guards
src/simulation/m_time_steppers.fpp Converted to mixed precision (stp) for time-stepping arrays, reorganized probe storage
src/simulation/m_surface_tension.fpp Added explicit type conversion for sqrt argument, wrapped 3D code
src/simulation/m_start_up.fpp Changed MPI I/O word size to 4 bytes, extensive mixed precision conversions
src/simulation/m_sim_helpers.fpp Wrapped 3D CFL calculations in case optimization guards
src/simulation/m_riemann_solvers.fpp Expanded private variable lists in parallel loops for thread safety
src/simulation/m_rhs.fpp Changed loop iterators to 8-byte integers, wrapped OpenACC directives
src/simulation/m_qbmm.fpp Added case optimization for bubble model coefficient calculations
src/simulation/m_muscl.fpp Renamed reconstruction workspace arrays with _muscl suffix, fixed indentation
src/simulation/m_mhd.fpp Added missing private variables to parallel loop
src/simulation/m_ibm.fpp Converted ghost point routines to use mixed precision
src/simulation/m_hypoelastic.fpp Added private variables to GPU loops
src/simulation/m_hyperelastic.fpp Added missing private variable to parallel loop
src/simulation/m_global_parameters.fpp Cleaned up GPU data transfer directives
src/simulation/m_fftw.fpp Refactored FFT filtering for improved device management
src/simulation/m_derived_variables.fpp Updated time-stepping array references
src/simulation/m_data_output.fpp Renamed downsampling temporary arrays, updated MPI I/O types
src/simulation/m_cbc.fpp Made dpres_ds a local variable, wrapped case-specific code
src/simulation/m_bubbles_EL_kernels.fpp Converted to mixed precision with explicit type conversions
src/simulation/m_bubbles_EL.fpp Changed q_beta from vector_field to scalar_field array
src/simulation/m_bubbles_EE.fpp Added private variables and explicit type conversions
src/simulation/m_body_forces.fpp Removed unnecessary parentheses
src/simulation/m_acoustic_src.fpp Updated loop structure and allocation
src/simulation/include/inline_capillary.fpp Wrapped 3D capillary stress code in optimization guards
src/pre_process/m_start_up.fpp Added simplex_perturb to namelist, added finalization call
src/pre_process/m_simplex_noise.fpp New module implementing 2D/3D simplex noise for perturbations
src/pre_process/m_perturbation.fpp Added simplex noise perturbation routine
src/pre_process/m_mpi_proxy.fpp Added MPI broadcast for simplex parameters
src/pre_process/m_initial_condition.fpp Updated bc_type indexing, added simplex perturbation call
src/pre_process/m_global_parameters.fpp Added simplex_perturb parameters
setupNB.sh New setup script for NVHPC compilers on a specific system

@@ -0,0 +1,104 @@
#!/usr/bin/env python3
import math

Copilot AI Nov 12, 2025


Import of 'math' is not used.

Suggested change
import math

Collaborator


Accept this suggestion


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.


Comment on lines +236 to +241
function dot2(g, x, y) result(dot)

integer, intent(in) :: g
real(wp), intent(in) :: x, y
real(wp) :: dot
dot = grad2(g + 1, 1)*x + grad2(g + 1, 2)*y


P1: Prevent simplex noise gradient indexing overflow

grad2 contains only 10 gradient vectors, but the indices used in f_simplex2d are computed as mod(...,12)+1 and dot2 then indexes grad2(g + 1, :). Whenever the permutation yields 10 or 11, g + 1 evaluates to 11–12 and the code reads beyond grad2’s bounds, which will crash or corrupt data when simplex_perturb is enabled. Either the gradient table needs 12 entries or the modulus/offset logic must restrict g to 0–9 before the lookup.
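
A tiny self-contained check (not the merged fix) showing why clamping the modulus to the table size keeps the lookup in bounds, assuming grad2 holds 10 gradient vectors as stated above:

program check_grad_index
    implicit none
    integer, parameter :: n_grad = 10   ! grad2 has 10 gradient vectors
    integer :: hash, g_old, g_new

    do hash = 0, 11
        g_old = mod(hash, 12)       ! current logic: 0..11, so row g_old + 1 can reach 12
        g_new = mod(hash, n_grad)   ! clamped:       0..9,  so row g_new + 1 stays within 10
        print '(3(a, i0))', 'hash=', hash, '   old row=', g_old + 1, '   clamped row=', g_new + 1
    end do
end program check_grad_index

Alternatively, the gradient table could be extended to 12 entries, as the comment suggests.
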



@cursor cursor bot left a comment


This PR is being reviewed by Cursor Bugbot


Bug: Array Index Mismatch Breaks Deallocation

The deallocation code attempts to deallocate bc_type(1, -1)%sf, bc_type(2, -1)%sf, and bc_type(3, -1)%sf, but the bc_type array dimension was changed from (-1:1) to (1:2) throughout the codebase. The second dimension no longer has index -1, causing an out-of-bounds access. The indices should be 1 and 2 instead of -1 and 1.

src/post_process/m_data_input.f90#L671-L676

deallocate (bc_type(1, -1)%sf, bc_type(1, 1)%sf)
if (n > 0) then
deallocate (bc_type(2, -1)%sf, bc_type(2, 1)%sf)
if (p > 0) then
deallocate (bc_type(3, -1)%sf, bc_type(3, 1)%sf)
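
A hedged sketch of the corrected deallocation under the new (1:2) indexing, where 1 is the beginning boundary and 2 the end boundary (a sketch, not the merged fix):

deallocate (bc_type(1, 1)%sf, bc_type(1, 2)%sf)
if (n > 0) then
    deallocate (bc_type(2, 1)%sf, bc_type(2, 2)%sf)
    if (p > 0) then
        deallocate (bc_type(3, 1)%sf, bc_type(3, 2)%sf)
    end if
end if
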



if ((z_cc(k) - z_centroid)**2._wp + &
(y_cc(j) - y_centroid)**2._wp <= radius**2._wp) then
bc_type(1, -1)%sf(0, j, k) = patch_bc(patch_id)%type
bc_type(1, 1)%sf(0, j, k) = patch_bc(patch_id)%type


Bug: Incorrect boundary indexing causes data overwrite.

The s_circle_bc subroutine always writes to bc_type(1, 1) regardless of whether the patch is at the beginning or end boundary. The template loop iterates over LOC values of -1 and 1, but unlike s_line_segment_bc which conditionally selects index 1 or 2 based on LOC, this code always uses index 1. When LOC == 1 (end boundary), it writes to the wrong array element, overwriting beginning boundary data instead of writing to bc_type(1, 2).
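
A hedged sketch of the LOC-based selection the comment describes, mirroring s_line_segment_bc; ${LOC}$ is the fypp template argument that takes the values -1 and 1:

if (${LOC}$ == -1) then
    bc_type(1, 1)%sf(0, j, k) = patch_bc(patch_id)%type   ! beginning boundary
else
    bc_type(1, 2)%sf(0, j, k) = patch_bc(patch_id)%type   ! end boundary
end if
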


z_boundary%beg <= z_cc(k) .and. &
z_boundary%end >= z_cc(k)) then
bc_type(1, -1)%sf(0, j, k) = patch_bc(patch_id)%type
bc_type(1, 1)%sf(0, j, k) = patch_bc(patch_id)%type


Bug: Boundary Data Overlap Error

The s_rectangle_bc subroutine always writes to bc_type(1, 1) regardless of whether the patch is at the beginning or end boundary. The template loop iterates over LOC values of -1 and 1, but unlike s_line_segment_bc which conditionally selects index 1 or 2 based on LOC, this code always uses index 1. When LOC == 1 (end boundary), it writes to the wrong array element, overwriting beginning boundary data instead of writing to bc_type(1, 2).


E_R = gamma_R*pres_R + pi_inf_R + 5.e-1_wp*rho_R*vel_R_rms + qv_R

H_L = (E_L + pres_L)/rho_L
H_R = (E_R + pres_R)/rho_R


Bug: Duplicate Assignment Overwrites Itself

Duplicate assignment of H_L and H_R that immediately overwrites the same values computed on lines 2471-2472. The calculation (E_L + pres_L)/rho_L and (E_R + pres_R)/rho_R is performed twice consecutively with identical results.


else()
message(STATUS "Performing IPO using -Mextract followed by -Minline")
set(NVHPC_USE_TWO_PASS_IPO TRUE)
set(NVHPC_USE_TWO_PASS_IPO FALSE)
Collaborator


Revert

@@ -0,0 +1,104 @@
#!/usr/bin/env python3
import math
Collaborator


Accept this suggestion

"num_igr_iters": igrIters,
"num_igr_warm_start_iters": igrIters,
"alf_factor": 10,
"bc_x%beg": -3,
Collaborator


Should be -17. This case and the associated hard-coded patches need to be cleaned up anyway. I can address this.

Collaborator


This file needs to be deleted, moved to misc/, or incorporated into the toolchain somehow. This PR doesn't add AMD compilers anyway, so my suggestion would be to delete it for now.

Collaborator


Delete file

@:DEALLOCATE(q_prim_qp%vf(j)%sf)
else
$:GPU_EXIT_DATA(detach='[q_prim_qp%vf(j)%sf]')
!$:GPU_EXIT_DATA(detach='[q_prim_qp%vf(j)%sf]')
Collaborator


commented out?

"pyrometheus == 1.0.5",

#"pyrometheus == 1.0.5",
"pyrometheus @ git+https://github.com/wilfonba/pyrometheus-wilfong.git@OpenMPTest",
Collaborator


I need to merge this with the Pyro upstream...

real(wp) :: start, finish
integer :: nt

logical :: status
Collaborator


Is this used anywhere?

Collaborator

@wilfonba wilfonba left a comment


Approve for benchmark
