@iotamudelta
Contributor

@iotamudelta iotamudelta commented Apr 21, 2022

This initial work adds an opt-in configure option that enables offloading of some sgemm, dgemm, cgemm, and zgemm operations to AMD GPUs via AMD's rocBLAS. It therefore requires a working ROCm software stack and a ROCm-enabled accelerator.

After enabling the offloading capability, the default is "never offload". Offloading can be controlled through the following environment variables:
BLIS_OFFLOAD=[never,always,thresh] - thresh enables threshold-dependent offloading
BLIS_OFFLOAD_SGEMM_THRESH=$number1 - the threshold of M*N size of sgemm after which offloading should be attempted - must be specified
BLIS_OFFLOAD_DGEMM_THRESH=$number2 - the threshold of M*N size of dgemm after which offloading should be attempted - must be specified
BLIS_OFFLOAD_CGEMM_THRESH=$number3 - the threshold of M*N size of cgemm after which offloading should be attempted - must be specified
BLIS_OFFLOAD_ZGEMM_THRESH=$number4 - the threshold of M*N size of zgemm after which offloading should be attempted - must be specified
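
To make the intended behavior of these variables concrete, here is a minimal sketch of the decision logic they describe. It is not the code from this PR: the function and type names are hypothetical, and two details are assumptions on my part, namely that a gemm is offloaded when M*N is greater than or equal to the threshold, and that a missing or unparsable threshold in thresh mode keeps the operation on the CPU.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names throughout; this is an illustration, not the PR's code. */
typedef enum { OFFLOAD_NEVER, OFFLOAD_ALWAYS, OFFLOAD_THRESH } offload_mode_t;

/* Map the value of BLIS_OFFLOAD to a mode.
   Unset or unrecognized values fall back to "never", the stated default. */
static offload_mode_t parse_offload_mode(const char *s)
{
    if (s == NULL) return OFFLOAD_NEVER;
    if (strcmp(s, "always") == 0) return OFFLOAD_ALWAYS;
    if (strcmp(s, "thresh") == 0) return OFFLOAD_THRESH;
    return OFFLOAD_NEVER;
}

/* Pure decision function: takes the raw strings so it can be tested
   without touching the process environment. */
static bool should_offload_cfg(const char *mode_str, const char *thresh_str,
                               int64_t m, int64_t n)
{
    switch (parse_offload_mode(mode_str)) {
    case OFFLOAD_ALWAYS:
        return true;
    case OFFLOAD_THRESH: {
        char *end = NULL;
        long long thresh;
        /* Assumption: a missing threshold means "do not offload". */
        if (thresh_str == NULL) return false;
        thresh = strtoll(thresh_str, &end, 10);
        /* Assumption: an unparsable threshold is treated like a missing one. */
        if (end == thresh_str || *end != '\0') return false;
        /* Assumption: the comparison is inclusive (M*N >= threshold). */
        return m * n >= thresh;
    }
    default:
        return false;
    }
}

/* Convenience wrapper that reads the environment.
   thresh_var is, for example, "BLIS_OFFLOAD_SGEMM_THRESH". */
static bool should_offload(const char *thresh_var, int64_t m, int64_t n)
{
    return should_offload_cfg(getenv("BLIS_OFFLOAD"), getenv(thresh_var), m, n);
}
```

With BLIS_OFFLOAD=thresh and BLIS_OFFLOAD_SGEMM_THRESH=10000, a call such as `should_offload("BLIS_OFFLOAD_SGEMM_THRESH", 100, 100)` would return true under these assumptions, while a 99x100 sgemm would stay on the CPU.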

Currently known limitations:

  • offloading decision is made purely based on M*N size of gemm in conjunction w/ user-controlled thresholds (or always/never offload)
  • rocBLAS is initialized w/ default settings - it'll hence use the first enumerated accelerator in a system and default stream

Future work:

  • offloading of integer gemms can be supported
  • better offloading decision engine and performance model with less user input required

@jeffhammond
Member

Why not make it a draft commit if you don't want it merged?

@iotamudelta
Contributor Author

iotamudelta commented Apr 29, 2022

@jeffhammond should be ready for merge soon after WIP items done - and I'm happy to get any functional reviews already.

@iotamudelta iotamudelta changed the title from "[WIP] [DONTMERGE] Optional offloading to AMD GPUs" to "Optional offloading to AMD GPUs" Apr 29, 2022