Openmp cce #1035
base: master
Conversation
Commits:
- …, started work on enter and exit data, compiles
- …, exit data, and update
- …e, add mappers to derived types, change how allocate is done
- …types, removed rest of pure functions, fix issue with acoustic on nvfortran
Codecov Report

❌ Patch coverage — additional details and impacted files:

```
@@            Coverage Diff             @@
##           master    #1035      +/-   ##
==========================================
+ Coverage   46.02%   46.06%   +0.03%
==========================================
  Files          67       68       +1
  Lines       13437    13558     +121
  Branches     1550     1590      +40
==========================================
+ Hits         6185     6245      +60
- Misses       6362     6388      +26
- Partials      890      925      +35
==========================================
```
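As a quick sanity check on the table above, the reported percentages are consistent with the raw line counts (assuming Codecov's usual `hits / (hits + misses + partials)` definition, with the displayed value truncated to two decimals and the delta computed before truncation — an assumption about Codecov's rounding, not stated in the report):

```python
# Verify the Codecov figures from the raw counts in the diff table.
import math

def coverage(hits, misses, partials):
    """Coverage percentage, assuming Codecov's hits/(hits+misses+partials)."""
    return hits / (hits + misses + partials) * 100

before = coverage(6185, 6362, 890)   # master:  6185 hits of 13437 lines
after = coverage(6245, 6388, 925)    # this PR: 6245 hits of 13558 lines

assert math.floor(before * 100) / 100 == 46.02  # matches reported 46.02%
assert math.floor(after * 100) / 100 == 46.06   # matches reported 46.06%
assert round(after - before, 2) == 0.03         # matches reported +0.03%
```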
**wilfonba** left a comment:

> Approve for benchmark

> somehow all the post-process benchmark tests are failing
PR Reviewer Guide 🔍

Here are some key observations to aid the review process:
```diff
  bc_type(1, 1)%sf(:, :, :) = int(min(bc_x%beg, 0), kind=1)
  bc_type(1, 2)%sf(:, :, :) = int(min(bc_x%end, 0), kind=1)
  $:GPU_UPDATE(device='[bc_type(1,1)%sf,bc_type(1,2)%sf]')
  if (n > 0) then
-     bc_type(2, -1)%sf(:, :, :) = bc_y%beg
-     bc_type(2, 1)%sf(:, :, :) = bc_y%end
-     $:GPU_UPDATE(device='[bc_type(2,-1)%sf,bc_type(2,1)%sf]')
-     if (p > 0) then
-         bc_type(3, -1)%sf(:, :, :) = bc_z%beg
-         bc_type(3, 1)%sf(:, :, :) = bc_z%end
-         $:GPU_UPDATE(device='[bc_type(3,-1)%sf,bc_type(3,1)%sf]')
-     end if
+     bc_type(2, 1)%sf(:, :, :) = int(min(bc_y%beg, 0), kind=1)
+     bc_type(2, 2)%sf(:, :, :) = int(min(bc_y%end, 0), kind=1)
+     $:GPU_UPDATE(device='[bc_type(2,1)%sf,bc_type(2,2)%sf]')
+     #:if not MFC_CASE_OPTIMIZATION or num_dims > 2
+     if (p > 0) then
+         bc_type(3, 1)%sf(:, :, :) = int(min(bc_z%beg, 0), kind=1)
+         bc_type(3, 2)%sf(:, :, :) = int(min(bc_z%end, 0), kind=1)
+         $:GPU_UPDATE(device='[bc_type(3,1)%sf,bc_type(3,2)%sf]')
+     end if
+     #:endif
  end if
```
Suggestion: Correct the default boundary condition assignment by removing the min(..., 0) logic, which incorrectly handles MPI processor ranks. The previous logic of direct assignment from bc_...%... should be restored with a type cast. [possible issue, importance: 10]
```diff
- bc_type(1, 1)%sf(:, :, :) = int(min(bc_x%beg, 0), kind=1)
- bc_type(1, 2)%sf(:, :, :) = int(min(bc_x%end, 0), kind=1)
- $:GPU_UPDATE(device='[bc_type(1,1)%sf,bc_type(1,2)%sf]')
- if (n > 0) then
-     bc_type(2, 1)%sf(:, :, :) = int(min(bc_y%beg, 0), kind=1)
-     bc_type(2, 2)%sf(:, :, :) = int(min(bc_y%end, 0), kind=1)
-     $:GPU_UPDATE(device='[bc_type(2,1)%sf,bc_type(2,2)%sf]')
-     #:if not MFC_CASE_OPTIMIZATION or num_dims > 2
-     if (p > 0) then
-         bc_type(3, 1)%sf(:, :, :) = int(min(bc_z%beg, 0), kind=1)
-         bc_type(3, 2)%sf(:, :, :) = int(min(bc_z%end, 0), kind=1)
-         $:GPU_UPDATE(device='[bc_type(3,1)%sf,bc_type(3,2)%sf]')
-     end if
-     #:endif
- end if
+ bc_type(1, 1)%sf(:, :, :) = int(bc_x%beg, kind=1)
+ bc_type(1, 2)%sf(:, :, :) = int(bc_x%end, kind=1)
+ $:GPU_UPDATE(device='[bc_type(1,1)%sf,bc_type(1,2)%sf]')
+ if (n > 0) then
+     bc_type(2, 1)%sf(:, :, :) = int(bc_y%beg, kind=1)
+     bc_type(2, 2)%sf(:, :, :) = int(bc_y%end, kind=1)
+     $:GPU_UPDATE(device='[bc_type(2,1)%sf,bc_type(2,2)%sf]')
+     #:if not MFC_CASE_OPTIMIZATION or num_dims > 2
+     if (p > 0) then
+         bc_type(3, 1)%sf(:, :, :) = int(bc_z%beg, kind=1)
+         bc_type(3, 2)%sf(:, :, :) = int(bc_z%end, kind=1)
+         $:GPU_UPDATE(device='[bc_type(3,1)%sf,bc_type(3,2)%sf]')
+     end if
+     #:endif
+ end if
```
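The reviewer's concern can be sketched in a few lines (Python; the convention that non-negative boundary codes carry an MPI neighbor rank while negative codes denote physical boundary conditions is an assumption taken from the comment above, not verified against MFC's source):

```python
# Why int(min(bc, 0), kind=1) is suspect, per the reviewer's point:
# clamping to zero leaves negative (physical BC) codes intact but
# destroys any non-negative code that encodes an MPI neighbor rank.
def clamped(bc):
    return min(bc, 0)   # mimics the min(bc_..., 0) logic

def cast_only(bc):
    return bc           # mimics a plain int(bc_..., kind=1) cast

physical_wall = -2      # a physical BC code survives either way
neighbor_rank = 5       # hypothetical rank of an adjacent MPI process

assert clamped(physical_wall) == cast_only(physical_wall) == -2
assert clamped(neighbor_rank) == 0      # rank information is destroyed
assert cast_only(neighbor_rank) == 5    # rank is preserved
```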
```fortran
if (shear_stress) then ! Shear stresses
    #:call GPU_PARALLEL_LOOP(collapse=3, private='[alpha_visc, alpha_rho_visc, Re_visc, tau_Re]')
        do l = is3_viscous%beg, is3_viscous%end
            do k = -1, 1
                do j = is1_viscous%beg, is1_viscous%end

                    $:GPU_LOOP(parallelism='[seq]')
                    do i = 1, num_fluids
                        alpha_rho_visc(i) = q_prim_vf(i)%sf(j, k, l)
                        if (bubbles_euler .and. num_fluids == 1) then
                            alpha_visc(i) = 1._wp - q_prim_vf(E_idx + i)%sf(j, k, l)
                        else
                            alpha_visc(i) = q_prim_vf(E_idx + i)%sf(j, k, l)
                        end if
                    end do

                    if (bubbles_euler) then
                        rho_visc = 0._wp
                        gamma_visc = 0._wp
                        pi_inf_visc = 0._wp

                        if (mpp_lim .and. (model_eqns == 2) .and. (num_fluids > 2)) then
                            $:GPU_LOOP(parallelism='[seq]')
                            do i = 1, num_fluids
                                rho_visc = rho_visc + alpha_rho_visc(i)
                                gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
                                pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
                            end do
                        else if ((model_eqns == 2) .and. (num_fluids > 2)) then
                            $:GPU_LOOP(parallelism='[seq]')
                            do i = 1, num_fluids - 1
                                rho_visc = rho_visc + alpha_rho_visc(i)
                                gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
                                pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
                            end do
                        else
                            rho_visc = alpha_rho_visc(1)
                            gamma_visc = gammas(1)
                            pi_inf_visc = pi_infs(1)
                        end if
                    else
                        rho_visc = 0._wp
                        gamma_visc = 0._wp
                        pi_inf_visc = 0._wp

                        alpha_visc_sum = 0._wp

                        if (mpp_lim) then
                            $:GPU_LOOP(parallelism='[seq]')
                            do i = 1, num_fluids
                                alpha_rho_visc(i) = max(0._wp, alpha_rho_visc(i))
                                alpha_visc(i) = min(max(0._wp, alpha_visc(i)), 1._wp)
                                alpha_visc_sum = alpha_visc_sum + alpha_visc(i)
                            end do

                            alpha_visc = alpha_visc/max(alpha_visc_sum, sgm_eps)
                        end if

                        $:GPU_LOOP(parallelism='[seq]')
                        do i = 1, num_fluids
                            rho_visc = rho_visc + alpha_rho_visc(i)
                            gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
                            pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
                        end do

                        if (viscous) then
                            $:GPU_LOOP(parallelism='[seq]')
                            do i = 1, 2
                                Re_visc(i) = dflt_real

                                if (Re_size(i) > 0) Re_visc(i) = 0._wp
                                $:GPU_LOOP(parallelism='[seq]')
                                do q = 1, Re_size(i)
                                    Re_visc(i) = alpha_visc(Re_idx(i, q))/Res_viscous(i, q) &
                                                 + Re_visc(i)
                                end do

                                Re_visc(i) = 1._wp/max(Re_visc(i), sgm_eps)
                            end do
                        end if
                    end if

                    tau_Re(2, 2) = -(2._wp/3._wp)*grad_z_vf(3)%sf(j, k, l)/y_cc(k)/ &
                                   Re_visc(1)

                    tau_Re(2, 3) = ((grad_z_vf(2)%sf(j, k, l) - &
                                     q_prim_vf(momxe)%sf(j, k, l))/ &
                                    y_cc(k) + grad_y_vf(3)%sf(j, k, l))/ &
                                   Re_visc(1)

                    $:GPU_LOOP(parallelism='[seq]')
                    do i = 2, 3
                        tau_Re_vf(contxe + i)%sf(j, k, l) = &
                            tau_Re_vf(contxe + i)%sf(j, k, l) - &
                            tau_Re(2, i)

                        tau_Re_vf(E_idx)%sf(j, k, l) = &
                            tau_Re_vf(E_idx)%sf(j, k, l) - &
                            q_prim_vf(contxe + i)%sf(j, k, l)*tau_Re(2, i)
                    end do
                end do
            end do
        end do
    #:endcall GPU_PARALLEL_LOOP
end if

if (bulk_stress) then ! Bulk stresses
    #:call GPU_PARALLEL_LOOP(collapse=3, private='[alpha_visc, alpha_rho_visc, Re_visc, tau_Re]')
        do l = is3_viscous%beg, is3_viscous%end
            do k = -1, 1
                do j = is1_viscous%beg, is1_viscous%end

                    ! (same mixture-property and Re_visc setup as in the
                    !  shear-stress block above)

                    tau_Re(2, 2) = grad_z_vf(3)%sf(j, k, l)/y_cc(k)/ &
                                   Re_visc(2)

                    tau_Re_vf(momxb + 1)%sf(j, k, l) = &
                        tau_Re_vf(momxb + 1)%sf(j, k, l) - &
                        tau_Re(2, 2)

                    tau_Re_vf(E_idx)%sf(j, k, l) = &
                        tau_Re_vf(E_idx)%sf(j, k, l) - &
                        q_prim_vf(momxb + 1)%sf(j, k, l)*tau_Re(2, 2)
                end do
            end do
        end do
    #:endcall GPU_PARALLEL_LOOP
end if
```
Suggestion: Refactor the duplicated logic within the shear_stress and bulk_stress blocks into a separate subroutine to improve code maintainability. [general, importance: 7]
Suggested replacement (the duplicated per-point property setup in both blocks moves into a shared subroutine):

```fortran
if (shear_stress) then ! Shear stresses
    #:call GPU_PARALLEL_LOOP(collapse=3, private='[alpha_visc, alpha_rho_visc, Re_visc, tau_Re, rho_visc, gamma_visc, pi_inf_visc, alpha_visc_sum]')
        do l = is3_viscous%beg, is3_viscous%end
            do k = -1, 1
                do j = is1_viscous%beg, is1_viscous%end
                    call s_compute_common_stress_terms(j, k, l, alpha_rho_visc, alpha_visc, rho_visc, gamma_visc, pi_inf_visc, Re_visc)

                    tau_Re(2, 2) = -(2._wp/3._wp)*grad_z_vf(3)%sf(j, k, l)/y_cc(k)/ &
                                   Re_visc(1)

                    tau_Re(2, 3) = ((grad_z_vf(2)%sf(j, k, l) - &
                                     q_prim_vf(momxe)%sf(j, k, l))/ &
                                    y_cc(k) + grad_y_vf(3)%sf(j, k, l))/ &
                                   Re_visc(1)

                    $:GPU_LOOP(parallelism='[seq]')
                    do i = 2, 3
                        tau_Re_vf(contxe + i)%sf(j, k, l) = &
                            tau_Re_vf(contxe + i)%sf(j, k, l) - &
                            tau_Re(2, i)

                        tau_Re_vf(E_idx)%sf(j, k, l) = &
                            tau_Re_vf(E_idx)%sf(j, k, l) - &
                            q_prim_vf(contxe + i)%sf(j, k, l)*tau_Re(2, i)
                    end do
                end do
            end do
        end do
    #:endcall GPU_PARALLEL_LOOP
end if

if (bulk_stress) then ! Bulk stresses
    #:call GPU_PARALLEL_LOOP(collapse=3, private='[alpha_visc, alpha_rho_visc, Re_visc, tau_Re, rho_visc, gamma_visc, pi_inf_visc, alpha_visc_sum]')
        do l = is3_viscous%beg, is3_viscous%end
            do k = -1, 1
                do j = is1_viscous%beg, is1_viscous%end
                    call s_compute_common_stress_terms(j, k, l, alpha_rho_visc, alpha_visc, rho_visc, gamma_visc, pi_inf_visc, Re_visc)

                    tau_Re(2, 2) = grad_z_vf(3)%sf(j, k, l)/y_cc(k)/ &
                                   Re_visc(2)

                    tau_Re_vf(momxb + 1)%sf(j, k, l) = &
                        tau_Re_vf(momxb + 1)%sf(j, k, l) - &
                        tau_Re(2, 2)

                    tau_Re_vf(E_idx)%sf(j, k, l) = &
                        tau_Re_vf(E_idx)%sf(j, k, l) - &
                        q_prim_vf(momxb + 1)%sf(j, k, l)*tau_Re(2, 2)
                end do
            end do
        end do
    #:endcall GPU_PARALLEL_LOOP
end if
```
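The shape of this refactor can be sketched generically (Python; all names and formulas here are illustrative stand-ins, not MFC's actual API):

```python
# Illustrative sketch of the suggested refactor: the per-point property
# computation duplicated in both stress blocks moves into one helper that
# each block calls before applying its own stress formula.
def common_setup(alphas, rhos):
    """Stand-in for the shared s_compute_common_stress_terms-style logic."""
    rho = sum(rhos)
    mix = sum(a * r for a, r in zip(alphas, rhos))
    return rho, mix

def shear_block(cells):
    results = []
    for alphas, rhos, grad in cells:
        rho, mix = common_setup(alphas, rhos)           # shared part
        results.append(-(2.0 / 3.0) * grad / rho)       # block-specific part
    return results

def bulk_block(cells):
    results = []
    for alphas, rhos, grad in cells:
        rho, mix = common_setup(alphas, rhos)           # same shared part
        results.append(grad / rho)                      # different formula
    return results

cells = [([0.5, 0.5], [1.0, 2.0], 3.0)]
assert shear_block(cells) == [-(2.0 / 3.0) * 3.0 / 3.0]
assert bulk_block(cells) == [1.0]
```

The design point is that each loop body keeps only the stress formula that differs, so a fix to the property setup lands in exactly one place.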
Pull Request Overview
This PR adds OpenMP support for the Cray Compiler Environment (CCE) and introduces mixed precision functionality developed for Gordon Bell submissions. The changes span toolchain configuration, simulation code optimization, and new simplex noise perturbation capabilities.
Key Changes:
- Added mixed precision support with a new `stp` (storage precision) type alongside the existing `wp` (working precision) type
- Implemented case-specific optimizations that conditionally compile code based on runtime parameters
- Updated MPI I/O operations to handle mixed precision data types
- Added simplex noise module for initial condition perturbations
- Enhanced Frontier HPC template with better resource management (sbcast for binaries, NVMe storage)
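The `stp`/`wp` split can be illustrated with a small sketch (pure Python; `struct` emulates IEEE float32 storage, and the analogy to MFC's actual kinds is an assumption, not taken from the PR):

```python
# Mixed-precision sketch: values are stored at low precision (stp, emulated
# here as IEEE float32 via struct round-tripping) but used at working
# precision (wp, Python's float64) for arithmetic.
import struct

def to_stp(x):
    """Round a float64 to its nearest float32 representation."""
    return struct.unpack('f', struct.pack('f', x))[0]

wp_value = 1.0 / 3.0          # working-precision result
stored = to_stp(wp_value)     # what an stp array would hold in memory
promoted = stored             # promoted back to wp before computation

assert stored != wp_value                 # storage rounding loses low bits
assert abs(promoted - wp_value) < 1.0e-7  # error stays at float32 level
```

The memory/accuracy trade is the point: halving array storage while keeping arithmetic at `wp` bounds the per-value error at roughly float32 resolution.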
Reviewed Changes
Copilot reviewed 65 out of 67 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| toolchain/templates/frontier.mako | Added nvme constraint, improved binary broadcast with sbcast, streamlined profiler placement |
| toolchain/pyproject.toml | Updated pyrometheus dependency to development branch for OpenMP testing |
| toolchain/modules | Duplicate entry for CSCS Santis configuration (lines 92-95) |
| toolchain/mfc/state.py | Added mixed precision flag to MFCConfig, improved code formatting |
| toolchain/mfc/run/input.py | Updated real_type selection logic to include mixed precision mode |
| toolchain/mfc/run/case_dicts.py | Added simplex_params parameter definitions for density/velocity perturbations |
| toolchain/mfc/lock.py | Incremented lock version to 8 |
| toolchain/mfc/build.py | Added MFC_MIXED_PRECISION CMake flag |
| tests/43B5FEBD/golden-metadata.txt | New golden metadata file documenting test configuration |
| src/simulation/p_main.fpp | Added unused status variable declaration |
| src/simulation/m_weno.fpp | Added case optimization guards to conditionally compile WENO stencil code |
| src/simulation/m_viscous.fpp | Wrapped 3D viscous stress calculations in case optimization guards |
| src/simulation/m_time_steppers.fpp | Converted to mixed precision (stp) for time-stepping arrays, reorganized probe storage |
| src/simulation/m_surface_tension.fpp | Added explicit type conversion for sqrt argument, wrapped 3D code |
| src/simulation/m_start_up.fpp | Changed MPI I/O word size to 4 bytes, extensive mixed precision conversions |
| src/simulation/m_sim_helpers.fpp | Wrapped 3D CFL calculations in case optimization guards |
| src/simulation/m_riemann_solvers.fpp | Expanded private variable lists in parallel loops for thread safety |
| src/simulation/m_rhs.fpp | Changed loop iterators to 8-byte integers, wrapped OpenACC directives |
| src/simulation/m_qbmm.fpp | Added case optimization for bubble model coefficient calculations |
| src/simulation/m_muscl.fpp | Renamed reconstruction workspace arrays with _muscl suffix, fixed indentation |
| src/simulation/m_mhd.fpp | Added missing private variables to parallel loop |
| src/simulation/m_ibm.fpp | Converted ghost point routines to use mixed precision |
| src/simulation/m_hypoelastic.fpp | Added private variables to GPU loops |
| src/simulation/m_hyperelastic.fpp | Added missing private variable to parallel loop |
| src/simulation/m_global_parameters.fpp | Cleaned up GPU data transfer directives |
| src/simulation/m_fftw.fpp | Refactored FFT filtering for improved device management |
| src/simulation/m_derived_variables.fpp | Updated time-stepping array references |
| src/simulation/m_data_output.fpp | Renamed downsampling temporary arrays, updated MPI I/O types |
| src/simulation/m_cbc.fpp | Made dpres_ds a local variable, wrapped case-specific code |
| src/simulation/m_bubbles_EL_kernels.fpp | Converted to mixed precision with explicit type conversions |
| src/simulation/m_bubbles_EL.fpp | Changed q_beta from vector_field to scalar_field array |
| src/simulation/m_bubbles_EE.fpp | Added private variables and explicit type conversions |
| src/simulation/m_body_forces.fpp | Removed unnecessary parentheses |
| src/simulation/m_acoustic_src.fpp | Updated loop structure and allocation |
| src/simulation/include/inline_capillary.fpp | Wrapped 3D capillary stress code in optimization guards |
| src/pre_process/m_start_up.fpp | Added simplex_perturb to namelist, added finalization call |
| src/pre_process/m_simplex_noise.fpp | New module implementing 2D/3D simplex noise for perturbations |
| src/pre_process/m_perturbation.fpp | Added simplex noise perturbation routine |
| src/pre_process/m_mpi_proxy.fpp | Added MPI broadcast for simplex parameters |
| src/pre_process/m_initial_condition.fpp | Updated bc_type indexing, added simplex perturbation call |
| src/pre_process/m_global_parameters.fpp | Added simplex_perturb parameters |
| setupNB.sh | New setup script for NVHPC compilers on specific system |
@@ -0,0 +1,104 @@
#!/usr/bin/env python3
import math

Copilot AI, Nov 12, 2025
Import of 'math' is not used.
💡 Codex Review
Here are some automated review suggestions for this pull request.
function dot2(g, x, y) result(dot)
    integer, intent(in) :: g
    real(wp), intent(in) :: x, y
    real(wp) :: dot
    dot = grad2(g + 1, 1)*x + grad2(g + 1, 2)*y
Prevent simplex noise gradient indexing overflow
grad2 contains only 10 gradient vectors, but the indices used in f_simplex2d are computed as mod(...,12)+1 and dot2 then indexes grad2(g + 1, :). Whenever the permutation yields 10 or 11, g + 1 evaluates to 11–12 and the code reads beyond grad2’s bounds, which will crash or corrupt data when simplex_perturb is enabled. Either the gradient table needs 12 entries or the modulus/offset logic must restrict g to 0–9 before the lookup.
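A minimal Python sketch of the indexing issue described above (the gradient values are hypothetical stand-ins; the real grad2 table lives in src/pre_process/m_simplex_noise.fpp):

```python
# 10-entry gradient table, mirroring grad2's shape (values are illustrative).
GRAD2 = [(1, 1), (-1, 1), (1, -1), (-1, -1), (1, 0),
         (-1, 0), (0, 1), (0, -1), (1, 1), (-1, -1)]

def gradient_index_buggy(perm_value: int) -> int:
    # Mirrors mod(..., 12) + 1 with 1-based Fortran indexing: yields 1..12.
    return perm_value % 12 + 1

def gradient_index_fixed(perm_value: int) -> int:
    # Restricting the modulus to the 10 available gradients yields 1..10.
    return perm_value % 10 + 1

# Any permutation value whose mod-12 result is 10 or 11 overruns the table:
bad = [v for v in range(256) if gradient_index_buggy(v) > len(GRAD2)]
ok = all(1 <= gradient_index_fixed(v) <= len(GRAD2) for v in range(256))
```

The alternative fix, extending grad2 to 12 entries, works equally well; the point is that the modulus and the table length must agree.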
Bug: Array Index Mismatch Breaks Deallocation
The deallocation code attempts to deallocate bc_type(1, -1)%sf, bc_type(2, -1)%sf, and bc_type(3, -1)%sf, but the bc_type array dimension was changed from (-1:1) to (1:2) throughout the codebase. The second dimension no longer has index -1, causing an out-of-bounds access. The indices should be 1 and 2 instead of -1 and 1.
MFC/src/post_process/m_data_input.f90, lines 671 to 676 (e1d9044):

deallocate (bc_type(1, -1)%sf, bc_type(1, 1)%sf)
if (n > 0) then
    deallocate (bc_type(2, -1)%sf, bc_type(2, 1)%sf)
    if (p > 0) then
        deallocate (bc_type(3, -1)%sf, bc_type(3, 1)%sf)
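For reference, the reindexing this bug stems from maps the old location flags to positive indices. A small sketch of the intended mapping (the helper name is illustrative, not from the PR):

```python
def bc_loc_index(loc: int) -> int:
    """Map a boundary location flag (-1 = beginning, +1 = end) to the new
    positive second index of bc_type, whose dimension changed from (-1:1)
    to (1:2) in this PR."""
    if loc == -1:
        return 1  # beginning boundary
    if loc == 1:
        return 2  # end boundary
    raise ValueError("loc must be -1 or +1")
```

Any code path that still hardcodes -1 or reuses index 1 for both locations, as the deallocation above does, will index out of bounds or clobber the wrong boundary under the new scheme.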
if ((z_cc(k) - z_centroid)**2._wp + &
    (y_cc(j) - y_centroid)**2._wp <= radius**2._wp) then
    bc_type(1, -1)%sf(0, j, k) = patch_bc(patch_id)%type
    bc_type(1, 1)%sf(0, j, k) = patch_bc(patch_id)%type
Bug: Incorrect boundary indexing causes data overwrite.
The s_circle_bc subroutine always writes to bc_type(1, 1) regardless of whether the patch is at the beginning or end boundary. The template loop iterates over LOC values of -1 and 1, but unlike s_line_segment_bc which conditionally selects index 1 or 2 based on LOC, this code always uses index 1. When LOC == 1 (end boundary), it writes to the wrong array element, overwriting beginning boundary data instead of writing to bc_type(1, 2).
z_boundary%beg <= z_cc(k) .and. &
z_boundary%end >= z_cc(k)) then
    bc_type(1, -1)%sf(0, j, k) = patch_bc(patch_id)%type
    bc_type(1, 1)%sf(0, j, k) = patch_bc(patch_id)%type
Bug: Boundary Data Overlap Error
The s_rectangle_bc subroutine always writes to bc_type(1, 1) regardless of whether the patch is at the beginning or end boundary. The template loop iterates over LOC values of -1 and 1, but unlike s_line_segment_bc which conditionally selects index 1 or 2 based on LOC, this code always uses index 1. When LOC == 1 (end boundary), it writes to the wrong array element, overwriting beginning boundary data instead of writing to bc_type(1, 2).
E_R = gamma_R*pres_R + pi_inf_R + 5.e-1_wp*rho_R*vel_R_rms + qv_R

H_L = (E_L + pres_L)/rho_L
H_R = (E_R + pres_R)/rho_R
else()
    message(STATUS "Performing IPO using -Mextract followed by -Minline")
    set(NVHPC_USE_TWO_PASS_IPO TRUE)
    set(NVHPC_USE_TWO_PASS_IPO FALSE)
Revert
@@ -0,0 +1,104 @@
#!/usr/bin/env python3
import math
| "num_igr_iters": igrIters, | ||
| "num_igr_warm_start_iters": igrIters, | ||
| "alf_factor": 10, | ||
| "bc_x%beg": -3, |
Should be -17. This case and the associated hard-coded patches need to be cleaned up anyway; I can address this.
This file needs to be deleted, moved to misc/, or incorporated into the toolchain somehow. This PR doesn't add AMD compilers anyway, so my suggestion would be deleted for now.
Delete file
@:DEALLOCATE(q_prim_qp%vf(j)%sf)
else
    $:GPU_EXIT_DATA(detach='[q_prim_qp%vf(j)%sf]')
    !$:GPU_EXIT_DATA(detach='[q_prim_qp%vf(j)%sf]')
commented out?
| "pyrometheus == 1.0.5", | ||
|
|
||
| #"pyrometheus == 1.0.5", | ||
| "pyrometheus @ git+https://github.com/wilfonba/pyrometheus-wilfong.git@OpenMPTest", |
I need to merge this with the Pyro upstream...
real(wp) :: start, finish
integer :: nt

logical :: status
Is this used anywhere?
User description
Description
Add OpenMP support to MFC on the Cray Compiler. Passes all test cases, and adds CI to test on Frontier.
Also adds the mixed precision support developed for Gordon Bell.
Fixes #(issue) [optional]
Type of change
Please delete options that are not relevant.
Scope
If you cannot check the above box, please split your PR into multiple PRs that each have a common goal.
How Has This Been Tested?
Please describe the tests that you ran to verify your changes.
Provide instructions so we can reproduce.
Please also list any relevant details for your test configuration
Test Configuration:
Checklist
- Documentation updates (docs/)
- New examples (examples/) that demonstrate my new feature performing as expected. They run to completion and demonstrate "interesting physics"
- I ran ./mfc.sh format before committing my code

If your code changes any code source files (anything in src/simulation):
To make sure the code is performing as expected on GPU devices, I have:
- Added nvtx ranges so that they can be identified in profiles
- Ran ./mfc.sh run XXXX --gpu -t simulation --nsys, and have attached the output file (.nsys-rep) and plain text results to this PR
- Ran ./mfc.sh run XXXX --gpu -t simulation --rsys --hip-trace, and have attached the output file and plain text results to this PR

PR Type
Enhancement, Bug fix
Description
Add OpenMP support to MFC on the Cray Compiler with comprehensive case optimization
Implement mixed precision support throughout the codebase for improved performance and memory efficiency
Refactor boundary condition arrays from negative indexing (-1:1) to positive indexing (1:2) for optimization
Add case optimization guards to skip unnecessary 3D computations in 2D simulations (WENO, viscous, MHD, capillary)
Introduce simplex noise generation module for procedural perturbation support
Fix GPU compatibility issues in chemistry and FFT modules for both NVIDIA and AMD compilers
Update MPI operations to support mixed precision with proper type conversions and 64-bit indexing
Refactor bubble Eulerian projection data structure for improved memory layout
Fix MUSCL workspace array naming conflicts and GPU parallel loop syntax issues
Add CI testing on Frontier supercomputer for Cray Compiler validation
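The wp/stp split in the bullets above follows a standard mixed-precision pattern: large state arrays are stored in single precision while arithmetic is done in double. A rough NumPy analogy (NumPy dtypes stand in for the Fortran kinds; this is an illustration, not MFC code):

```python
import numpy as np

# wp = working precision (arithmetic), stp = storage precision (state arrays),
# mirroring the real(wp) -> real(stp) conversions described in this PR.
wp = np.float64
stp = np.float32

q_cons = np.ones(8, dtype=stp)        # stored state: 4 bytes per element
flux = q_cons.astype(wp) * 0.5        # promote, like the explicit real() casts
q_cons = (q_cons + flux.astype(stp))  # demote back before storing
```

This halves the memory footprint and MPI I/O volume of the state arrays, which is also why the I/O word size drops from 8 to 4 bytes elsewhere in the PR.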
Diagram Walkthrough
File Walkthrough
17 files
m_boundary_common.fpp
Refactor boundary arrays and add case optimization support (src/common/m_boundary_common.fpp)
- Changed boundary arrays from (-1:1) to (1:2) to support case optimization
- Converted location indices from (-1, 1) to positive indices (1, 2) throughout the module
- Added #:if not MFC_CASE_OPTIMIZATION or num_dims > 2 guards to skip 3D boundary operations in optimized 2D cases
- Changed real(wp) to real(stp) for mixed precision support in buffer parameters
- Updated MPI_BYTE and element count calculations
m_weno.fpp
Add case optimization guards to WENO reconstruction schemes (src/simulation/m_weno.fpp)
- Added #:include 'case.fpp' directive for case optimization support at compilation
- Added #:if not MFC_CASE_OPTIMIZATION or weno_num_stencils > N guards for higher-order schemes
- Enables optimization for reduced stencil cases
m_bubbles_EL.fpp
Refactor bubble Eulerian projection data structure for mixed precision (src/simulation/m_bubbles_EL.fpp)
- Changed q_beta from type(vector_field) to type(scalar_field), dimension(:), allocatable for better memory layout
- Replaced q_beta%vf(1:q_beta_idx) with q_beta(1:q_beta_idx) with individual scalar field setup
- Changed real(wp) to real(stp) for mixed precision support in gradient calculations
- Updated s_gradient_dir subroutine signature to accept raw arrays instead of scalar field types
- Renamed q_beta%vf(i)%sf to q_beta(i)%sf throughout the module
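The case-optimization guards listed for these files all follow one pattern. A toy Python predicate mimicking when the fypp guard #:if not MFC_CASE_OPTIMIZATION or num_dims > 2 keeps a 3D-only block in the generated source (illustrative only; real fypp evaluates this at preprocessing time, not at runtime):

```python
def keep_3d_block(case_optimization: bool, num_dims: int) -> bool:
    # The 3D-only code is emitted unless we are building a case-optimized
    # binary for a 1D/2D case.
    return (not case_optimization) or num_dims > 2

assert keep_3d_block(False, 2) is True   # generic build keeps all code paths
assert keep_3d_block(True, 2) is False   # case-optimized 2D drops 3D code
assert keep_3d_block(True, 3) is True    # case-optimized 3D keeps it
```

Because the condition is resolved before compilation, the 2D binary never pays for 3D loops or their GPU data movement.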
m_boundary_conditions.fpp
Update boundary condition array indexing to positive indices (src/pre_process/m_boundary_conditions.fpp)
- Changed bc_type dimensions from (-1:1) to (1:2) in function signatures
- Mapped the beginning-boundary index (-1) to new positive index (1), and (1) to (2)
- Updated bc_type array accesses to use the new indexing scheme with explicit if-statements for location mapping
m_viscous.fpp
Add case optimization conditionals to viscous module (src/simulation/m_viscous.fpp)
- Added case.fpp include directive for case optimization support
- Added MFC_CASE_OPTIMIZATION conditional to skip 3D-specific calculations in 2D cases
- Wrapped dimension-dependent code for dimensional optimization
m_riemann_solvers.fpp
Optimize MHD calculations and improve GPU parallelization (src/simulation/m_riemann_solvers.fpp)
- Added case.fpp include for case optimization support
- Added MFC_CASE_OPTIMIZATION conditionals to optimize 2D cases
- Improved GPU memory management
- Adjusted initialization in HLLC solver
m_mpi_common.fpp
Support mixed precision and dynamic dimensionality in MPI (src/common/m_mpi_common.fpp)
- Added case.fpp include directive for case optimization
- Changed halo_size from integer to integer(kind=8) for 64-bit support
- Used num_dims instead of hardcoded value 3 for 2D/3D compatibility
- Mixed wp and stp precision kinds in MPI buffer operations
m_cbc.fpp
Optimize CBC module with case-specific conditionals (src/simulation/m_cbc.fpp)
- Added case.fpp include for case optimization support
- Moved dpres_ds variable declaration from module level to local scope within subroutine
- Added dpres_ds to private variable list in GPU parallel loop
- Wrapped dpi_inf_dt assignment with MFC_CASE_OPTIMIZATION conditional for single-fluid optimization
m_global_parameters.fpp
Add simplex noise perturbation parameters (src/pre_process/m_global_parameters.fpp)
- Added simplex_perturb logical flag for simplex noise perturbation support
- Added simplex_params derived type variable to store simplex noise parameters
- Initialized the new parameters in the defaults routine
m_qbmm.fpp
Mixed precision support and case optimization for QBMM (src/simulation/m_qbmm.fpp)
- Changed pb parameter type from real(wp) to real(stp) for mixed precision support
- Added conditionals for case optimization
- Added i1, i2, j to GPU parallel loop private list
- Updated rhs_pb and rhs_mv precision types
m_simplex_noise.fpp
New simplex noise generation module (src/pre_process/m_simplex_noise.fpp)
- New module for simplex noise generation
- Implements f_simplex3d and f_simplex2d functions for procedural noise
m_fftw.fpp
FFT filter GPU implementation refactoring (src/simulation/m_fftw.fpp)
- Uses device addresses instead of pointers
- Added #:block UNDEF_CCE wrapper and simplified GPU data management loop
m_start_up.fpp
Mixed precision I/O and time stepping improvements (src/simulation/m_start_up.fpp)
- Changed WP_MOK from 8 bytes to 4 bytes for mixed precision I/O, with an mpi_io_type multiplier for data size
- Changed q_cons_ts(1) to q_cons_ts(2) for multi-stage time steppers
- Added explicit real() conversion for mixed precision
- Introduced q_cons_temp
m_rhs.fpp
Mixed precision and GPU compatibility in RHS module (src/simulation/m_rhs.fpp)
- Updated declarations
- Used integer(kind=8) for large array indexing
- Added if (.not. igr) condition
- Changed bc_type parameter dimension from 1:3, -1:1 to 1:3, 1:2
- Changed pb_in, mv_in parameters to real(stp) type for mixed precision
m_icpp_patches.fpp
Mixed precision support for patch identification arrays (src/pre_process/m_icpp_patches.fpp)
- Selects patch_id_fp array type based on MFC_MIXED_PRECISION flag: integer(kind=1) for mixed precision mode, standard integer otherwise
- Improves memory efficiency
m_ibm.fpp
Mixed precision support for IBM module (src/simulation/m_ibm.fpp)
- Changed pb_in and mv_in parameters to real(stp) type for mixed precision
- Updated pb_in and mv_in in interpolation calls
- Changed levelset_in access to use explicit real() conversion
- Changed pb_in, mv_in intent from INOUT to IN in interpolation subroutine
inline_capillary.fpp
Case optimization for capillary tensor calculations (src/simulation/include/inline_capillary.fpp)
- Wrapped 3D capillary terms behind num_dims > 2 guards
2 files
m_muscl.fpp
Rename MUSCL arrays and fix GPU loop syntax (src/simulation/m_muscl.fpp)
- Renamed v_rs_ws_x/y/z to v_rs_ws_x/y/z_muscl to avoid naming conflicts
- Updated #:endcall directives to include full macro name for clarity
m_chemistry.fpp
GPU compatibility fixes for chemistry module (src/common/m_chemistry.fpp)
- Fixed GPU compatibility issues
- Uses T_in for type conversion in temperature calculations
- Added #:block UNDEF_AMD to handle AMD compiler differences
48 files