Consolidating OpenACC device-host memory transfers #1315
@mgduda I think it might be ready for a second look. I did try to move the |
This PR consolidates much of the OpenACC host-device data transfer performed during dynamics execution into two subroutines, `mpas_atm_pre_dynamics_h2d` and `mpas_atm_post_dynamics_d2h`, which are called before and after the call to the `atm_srk3` subroutine. Because `atm_compute_solve_diagnostics` is also called once before the start of the model run, a second pair of subroutines, `mpas_atm_pre_computesolvediag_h2d` and `mpas_atm_post_computesolvediag_d2h`, handles data movement around that first call to `atm_compute_solve_diagnostics`. Any fields copied onto the device in these subroutines are removed from explicit data movement statements in the dynamical core.

The mesh/time-invariant fields are still copied onto the device in `mpas_atm_dynamics_init` and removed from the device in `mpas_atm_dynamics_finalize`, with the exception of select fields moved in `mpas_atm_pre_computesolvediag_h2d` and `mpas_atm_post_computesolvediag_d2h`. This is a special case because `atm_compute_solve_diagnostics` is called for the first time before the call to `mpas_atm_dynamics_init`.

This PR also includes explicit host-device data transfers in the `mpas_atm_iau`, `mpas_atmphys_interface` and `mpas_atmphys_todynamics` modules to ensure that the physics and IAU regions, which run on the CPU, use the latest values from the dynamical core running on GPUs, and vice versa. In addition, it includes explicit data transfers around halo exchanges in the `atm_srk3` subroutine.

These data transfer subroutines and the `acc update` statements are an interim solution until we have a book-keeping method in place.

This PR also introduces a couple of new timers to keep track of the cost of data transfers.
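For readers less familiar with the pattern, here is a minimal sketch of what consolidating transfers into a pre/post pair can look like, including an `acc update` pair around a host-side halo exchange. It is not the code in this PR: every routine and field name (`demo_pre_dynamics_h2d`, `demo_step`, `u`, `theta`, and so on) is hypothetical, and the "halo exchange" is just a periodic boundary copy standing in for the real thing.

```fortran
! Hypothetical sketch: routine and field names are illustrative, not MPAS code.
! Without OpenACC enabled, the !$acc lines are plain comments and this runs on the host.
module demo_dynamics_transfers
   implicit none
contains

   ! Called once before the time step: move everything the step needs to the device.
   subroutine demo_pre_dynamics_h2d(u, theta)
      real, intent(in) :: u(:), theta(:)
      !$acc enter data copyin(u, theta)
   end subroutine demo_pre_dynamics_h2d

   ! Called once after the time step: copy results back and free device memory.
   subroutine demo_post_dynamics_d2h(u, theta)
      real, intent(inout) :: u(:)
      real, intent(in) :: theta(:)
      !$acc exit data copyout(u) delete(theta)
   end subroutine demo_post_dynamics_d2h

   ! Stand-in for a host-side halo exchange (here: a periodic boundary copy).
   subroutine demo_halo_exchange(u)
      real, intent(inout) :: u(:)
      integer :: n
      n = size(u)
      u(1) = u(n - 1)
      u(n) = u(2)
   end subroutine demo_halo_exchange

   ! Stand-in for the time step: the kernel assumes the data is already on the device.
   subroutine demo_step(u, theta, dt)
      real, intent(inout) :: u(:)
      real, intent(in) :: theta(:), dt
      integer :: i

      !$acc parallel loop present(u, theta)
      do i = 2, size(u) - 1
         u(i) = u(i) + dt * theta(i)
      end do

      ! The halo exchange runs on the host, so sync only the exchanged field.
      !$acc update host(u)
      call demo_halo_exchange(u)
      !$acc update device(u)
   end subroutine demo_step

end module demo_dynamics_transfers

program demo
   use demo_dynamics_transfers
   implicit none
   real :: u(6), theta(6)

   u = 0.0
   theta = 1.0
   call demo_pre_dynamics_h2d(u, theta)
   call demo_step(u, theta, 0.5)
   call demo_post_dynamics_d2h(u, theta)
   print *, u
end program demo
```

The point of the sketch is that `demo_step` contains no `copyin`/`copyout` clauses of its own. The only transfers inside it are the `update` statements around the host-side exchange, which matches the interim approach described above.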
…t_2d This commit introduces two OpenACC data transfer routines, `mpas_reconstruct_2d_h2d` and `mpas_reconstruct_2d_d2h`, so that data transfers can be removed from the `mpas_reconstruct_2d` routine itself. This also allows extraneous data movements within the `atm_srk3` routine to be removed. `mpas_reconstruct_2d_h2d` and `mpas_reconstruct_2d_d2h` are called before and after the call to `mpas_reconstruct` in `atm_mpas_init_block`. The reconstructed vector fields are also copied to and from the device before and after every dynamics call, in `mpas_atm_pre_dynamics_h2d` and `mpas_atm_post_dynamics_d2h`.
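As a rough illustration of separating transfers from a compute routine, the sketch below keeps the kernel free of data clauses by using `default(present)` and leaves the caller responsible for bracketing it with `_h2d`/`_d2h` calls. All names (`demo_reconstruct_compute`, `u_edge`, `u_cell`) and the averaging stencil are assumptions made for the example; the actual `mpas_reconstruct_2d` interface is not reproduced here.

```fortran
! Hypothetical sketch: routine and variable names are illustrative, not MPAS code.
module demo_reconstruct
   implicit none
contains

   ! Input goes to the device with its values; output only needs device storage.
   subroutine demo_reconstruct_h2d(u_edge, u_cell)
      real, intent(in) :: u_edge(:), u_cell(:)
      !$acc enter data copyin(u_edge) create(u_cell)
   end subroutine demo_reconstruct_h2d

   ! Output comes back to the host; the input is simply released.
   subroutine demo_reconstruct_d2h(u_edge, u_cell)
      real, intent(in) :: u_edge(:)
      real, intent(inout) :: u_cell(:)
      !$acc exit data copyout(u_cell) delete(u_edge)
   end subroutine demo_reconstruct_d2h

   ! Compute routine: no data movement, it only asserts the data is present.
   ! Expects size(u_edge) == size(u_cell) + 1.
   subroutine demo_reconstruct_compute(u_edge, u_cell)
      real, intent(in) :: u_edge(:)
      real, intent(inout) :: u_cell(:)
      integer :: i

      !$acc parallel loop default(present)
      do i = 1, size(u_cell)
         u_cell(i) = 0.5 * (u_edge(i) + u_edge(i + 1))
      end do
   end subroutine demo_reconstruct_compute

end module demo_reconstruct

program demo
   use demo_reconstruct
   implicit none
   real :: u_edge(5), u_cell(4)

   u_edge = [1.0, 2.0, 3.0, 4.0, 5.0]
   u_cell = 0.0

   ! Init-time call: bracket the compute routine with its own transfers.
   call demo_reconstruct_h2d(u_edge, u_cell)
   call demo_reconstruct_compute(u_edge, u_cell)
   call demo_reconstruct_d2h(u_edge, u_cell)
   print *, u_cell
end program demo
```

With this split, the same compute routine could also be called from a region where the fields are already resident on the device, without triggering additional transfers.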
This commit ensures that building with `-DCURVATURE` still produces correct results when compared to the nvhpc CPU reference. It removes the data movement of the reconstructed zonal and meridional velocities from the `atm_compute_dyn_tend_work` subroutine and instead uses `copyin` for the same fields in `mpas_atm_pre_dynamics_h2d`. This commit also removes the ACC data Xfer timers from the `atm_compute_dyn_tend_work` subroutine, as only `create`/`delete` statements remain there.
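The distinction between `copyin` and `create` matters here because `create` only allocates device memory, while `copyin` also transfers the host values. A field that a kernel reads therefore has to arrive on the device via `copyin` (or an `update`), whereas scratch storage that is written before it is read can use `create`. The sketch below illustrates this under assumed names (`demo_tend_work`, `u_zonal`, `u_merid`, `work`); it is not the `atm_compute_dyn_tend_work` code.

```fortran
! Hypothetical sketch: routine and variable names are illustrative, not MPAS code.
module demo_curvature
   implicit none
contains

   subroutine demo_pre_h2d(u_zonal, u_merid, tend)
      real, intent(in) :: u_zonal(:), u_merid(:), tend(:)
      ! copyin: allocate on the device AND transfer the host values.
      ! create: allocate only; contents are undefined until a kernel writes them.
      !$acc enter data copyin(u_zonal, u_merid) create(tend)
   end subroutine demo_pre_h2d

   subroutine demo_post_d2h(u_zonal, u_merid, tend)
      real, intent(in) :: u_zonal(:), u_merid(:)
      real, intent(inout) :: tend(:)
      !$acc exit data copyout(tend) delete(u_zonal, u_merid)
   end subroutine demo_post_d2h

   subroutine demo_tend_work(u_zonal, u_merid, tend)
      real, intent(in) :: u_zonal(:), u_merid(:)
      real, intent(inout) :: tend(:)
      real :: work(size(tend))   ! scratch array, written before it is read
      integer :: i

      ! Only create/delete happens here: no host-device traffic in this routine.
      !$acc data present(u_zonal, u_merid, tend) create(work)
      !$acc parallel loop
      do i = 1, size(tend)
         work(i) = u_zonal(i) * u_merid(i)
      end do
      !$acc parallel loop
      do i = 1, size(tend)
         tend(i) = -work(i)
      end do
      !$acc end data
   end subroutine demo_tend_work

end module demo_curvature

program demo
   use demo_curvature
   implicit none
   real :: u_zonal(3), u_merid(3), tend(3)

   u_zonal = [1.0, 2.0, 3.0]
   u_merid = [2.0, 2.0, 2.0]
   tend = 0.0
   call demo_pre_h2d(u_zonal, u_merid, tend)
   call demo_tend_work(u_zonal, u_merid, tend)
   call demo_post_d2h(u_zonal, u_merid, tend)
   print *, tend
end program demo
```

Since `demo_tend_work` performs no transfers, there is nothing for a data transfer timer to measure inside it, which is consistent with dropping such timers from a routine that only has `create`/`delete` statements.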