I recently discovered the AcceleratedKernels (AK) package, which provides a unified interface for parallelisation on CPUs, clusters, and GPUs. We could consider switching to AK.map! and other AK higher-order functions for the vectorised HMC implementation.
Related discussions: #390 (comment)
Question: would AK.map! use SIMD in the same way as the current AHMC vectorisation?