You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Bug Fixes in LPGEMM for AVX512(SkyLake) machine (#24)
* Bug Fixes in LPGEMM for AVX512(SkyLake) machine
- B-matrix in bf16bf16f32obf16/f32 API is re-ordered. For machines that
doesn't support BF16 instructions, the BF16 input is unre-ordered and
converted to FP32 to use FP32 kernels.
- For n = 1 and k = 1 sized matrices, re-ordering in BF16 is copying the
matrix to the re-ordered buffer array. But the un-reordering to FP32
requires the matrix to have size multiple of 16 along n and multiple
of 2 along k dimension.
- The entry condition to the above has been modified for AVX512 configuration.
- In bf16 API, the tiny path entry check has been modified to prevent
seg fault while AOCL_ENABLE_INSTRUCTIONS=AVX2 is set in BF16 supporting
machines.
- Modified existing store instructions in FP32 AVX512 kernels to support
execution in machines that has AVX512 support but not BF16/VNNI(SkyLake).
- Added Bf16 beta and store types in FP32 avx512_256 kernels
AMD Internal: [SWLCSG-3552]
* Bug Fixes in LPGEMM for AVX512(SkyLake) machine
- B-matrix in bf16bf16f32obf16/f32 API is re-ordered. For machines that
doesn't support BF16 instructions, the BF16 input is unre-ordered and
converted to FP32 to use FP32 kernels.
- For n = 1 and k = 1 sized matrices, re-ordering in BF16 is copying the
matrix to the re-ordered buffer array. But the un-reordering to FP32
requires the matrix to have size multiple of 16 along n and multiple
of 2 along k dimension.
- The entry condition to the above has been modified for AVX512 configuration.
- In bf16 API, the tiny path entry check has been modified to prevent
seg fault while AOCL_ENABLE_INSTRUCTIONS=AVX2 is set in BF16 supporting
machines.
- Modified existing store instructions in FP32 AVX512 kernels to support
execution in machines that has AVX512 support but not BF16/VNNI(SkyLake).
- Added Bf16 beta and store types, along with BIAS and ZP in FP32 avx512_256
kernels
AMD Internal: [SWLCSG-3552]
* Bug Fixes in LPGEMM for AVX512(SkyLake) machine
- Support added in FP32 512_256 kerenls for : Beta, BIAS, Zero-point and
BF16 store types for bf16bf16f32obf16 API execution in AVX2 mode.
- B-matrix in bf16bf16f32obf16/f32 API is re-ordered. For machines that
doesn't support BF16 instructions, the BF16 input is unre-ordered and
converted to FP32 type to use FP32 kernels.
- For n = 1 and k = 1 sized matrices, re-ordering in BF16 is copying the
matrix to the re-ordered buffer array. But the un-reordering to FP32
requires the matrix to have size multiple of 16 along n and multiple
of 2 along k dimension. The entry condition here has been modified for
AVX512 configuration.
- Fix for seg fault with AOCL_ENABLE_INSTRUCTIONS=AVX2 mode in BF16/VNNI
ISA supporting configruations:
- BF16 tiny path entry check has been modified to take into account arch_id
to ensure improper entry into the tiny kernel.
- The store in BF16->FP32 col-major for m = 1 conditions were updated to
correct storage pattern,
- BF16 beta load macro was modified to account for data in unaligned memory.
- Modified existing store instructions in FP32 AVX512 kernels to support
execution in machines that has AVX512 support but not BF16/VNNI(SkyLake)
AMD Internal: [SWLCSG-3552]
---------
Co-authored-by: VarshaV <[email protected]>
0 commit comments