This is the repository for an MLIR-based convolution, GEMM, attention, GEMM+GEMM and CONV+GEMM kernel generator
targeting AMD hardware. This generator is mainly used from
MIGraphX,
but it can be used on a standalone basis. (The ability to use this code via
torch-mlir is being investigated as well.)
To build the system
mkdir build
cd build
cmake -G Ninja .. -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_C_COMPILER=/opt/rocm/llvm/bin/clang -DCMAKE_CXX_COMPILER=/opt/rocm/llvm/bin/clang++
ninja check-rocmlir
Note that we require building against a relatively recent clang. The above commands specify the ROCm clang release in order to match our standard development practice.
To build without running the tests, use the check-rocmlir-build-only target.
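Since the build expects a recent clang, it can help to confirm that the ROCm compilers are where the cmake command expects them before configuring. The following is a minimal sketch and not part of the repository; ROCM_CLANG_DIR is a hypothetical variable introduced here for illustration, defaulting to the /opt/rocm/llvm/bin path used in the commands above.

```shell
# Hypothetical helper variable; defaults to the compiler directory used above.
ROCM_CLANG_DIR="${ROCM_CLANG_DIR:-/opt/rocm/llvm/bin}"

# Check that both the C and C++ compiler drivers exist and are executable.
if [ -x "${ROCM_CLANG_DIR}/clang" ] && [ -x "${ROCM_CLANG_DIR}/clang++" ]; then
    clang_status="found"
else
    clang_status="missing"
fi

echo "ROCm clang in ${ROCM_CLANG_DIR}: ${clang_status}"
```

If the status is "missing", point CMAKE_C_COMPILER and CMAKE_CXX_COMPILER at another sufficiently recent clang installation instead.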
To build the static library that is used by MIGraphX
mkdir build
cd build
cmake -G Ninja .. -DBUILD_FAT_LIBROCKCOMPILER=On -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=/opt/rocm/llvm/bin/clang -DCMAKE_CXX_COMPILER=/opt/rocm/llvm/bin/clang++
ninja
and to install it so MIGraphX can find it
cmake --install . --prefix [your/MIGraphX/deps/folder/path]
For usage examples, see mlir/test/rocmlir-driver, especially the files
sanity.mlir and the contents of the e2e_for_pr directory.
This project also includes code that translates from TOSA to kernels; see
mlir/test/fusion for examples of how to invoke it.
In general (with all invocations given from the build directory)
./bin/rocmlir-gen generates high-level convolution operations and host code. Many of the options control data layout, size, etc., but some other useful flags are:
  - -mfma=on (which enables mfma usage) (or -wmma=on for gfx11 targets)
  - -mfma=off (which disables mfma usage) (or -wmma=off for gfx11 targets)
  - -ph (which causes host code to be generated)
  - -pv (which makes the host code validate the results against a reference)
  - -pv_with_gpu (which uses a GPU validator instead)
  - -pr (which prints kernel results)
./bin/rocmlir-driver is a wrapper around the kernel generation pipeline. Use -c (or --kernel-pipeline=full --host-pipeline=runner) to run the default pipeline.
The result of this pipeline should, most simply, be passed to the rocm-run
script in mlir/utils/widgets/rocm-run, which calls mlir-runner with
the appropriate flags and infers the pathnames for libraries correctly.
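The three stages described above can be chained with pipes. The sketch below only assembles and prints such a pipeline rather than running it. It rests on several assumptions that are not stated in this README: that each tool reads standard input and writes standard output when no file is given, that the build directory sits directly inside the source tree (so the script is reachable as ../mlir/utils/widgets/rocm-run), and that you will add the layout and size options your problem needs. GEN_FLAGS, ROCM_RUN and pipeline are names made up for this example.

```shell
# Flags taken from the list above: generate host code and validate the results.
# Layout and size options are deliberately left out; add the ones you need.
GEN_FLAGS="-ph -pv"

# Assumption: build/ is a direct subdirectory of the source tree.
ROCM_RUN="../mlir/utils/widgets/rocm-run"

# generator -> default driver pipeline -> runner script
pipeline="./bin/rocmlir-gen ${GEN_FLAGS} | ./bin/rocmlir-driver -c | ${ROCM_RUN}"

# Print the pipeline instead of executing it, so the sketch is safe to run
# on a machine without a build tree or a GPU.
echo "${pipeline}"
```

Once the printed command looks right for your setup, it can be pasted into a shell in the build directory.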
In more detail, the result of the above pipeline can be passed to
./external/llvm-project/llvm/bin/mlir-runner .
mlir-runner needs to link the generated host code against libraries that
map from MLIR operations to the HIP runtime.
The required command-line arguments (if running from build/) are
./external/llvm-project/llvm/bin/mlir-runner --shared-libs=./external/llvm-project/llvm/lib/libmlir_rocm_runtime.so,./lib/libconv-validation-wrappers.so,./external/llvm-project/llvm/lib/libmlir_runner_utils.so --entry-point-result=void
Adding --debug-only=serialize-to-blob to the rocmlir-driver invocation
will cause the GCN assembly code for the kernels being executed to be dumped to
standard error.
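The long --shared-libs argument in the mlir-runner command above is a comma-separated list of three libraries, which is easy to mistype. As an illustration only, the following sketch assembles the same command from its parts; the library paths are the ones given above, relative to build/, while shared_libs and runner_cmd are names invented for this example.

```shell
# Join the runtime libraries into the comma-separated form --shared-libs expects.
shared_libs=""
for lib in \
    ./external/llvm-project/llvm/lib/libmlir_rocm_runtime.so \
    ./lib/libconv-validation-wrappers.so \
    ./external/llvm-project/llvm/lib/libmlir_runner_utils.so
do
    # Prefix a comma only when the list already has an entry.
    shared_libs="${shared_libs:+${shared_libs},}${lib}"
done

runner_cmd="./external/llvm-project/llvm/bin/mlir-runner --shared-libs=${shared_libs} --entry-point-result=void"

# Print the assembled command; it is not executed here.
echo "${runner_cmd}"
```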
By default, we infer the use of GPU-specific acceleration instructions, like MFMA or WMMA, based on the features of the currently available GPU.
To disable this, add -DROCMLIR_GEN_FLAGS="-mfma=off -wmma=off" to
the cmake invocations given above. Note that this will not affect behavior
in production/static library builds, which do not use rocmlir-gen.
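For reference, this is roughly what the development configure step looks like with the acceleration flags disabled. It is a sketch that combines the cmake command from the build instructions above with the ROCMLIR_GEN_FLAGS setting; it prints the command rather than running it, and GEN_FLAGS and cmake_cmd are names used only in this example.

```shell
# Flags passed through to rocmlir-gen, as described above.
GEN_FLAGS="-mfma=off -wmma=off"

# Same configure command as in the build instructions, plus ROCMLIR_GEN_FLAGS.
# The inner quotes keep the two flags together as a single cmake value.
cmake_cmd="cmake -G Ninja .. -DCMAKE_BUILD_TYPE=RelWithDebInfo \
-DCMAKE_C_COMPILER=/opt/rocm/llvm/bin/clang \
-DCMAKE_CXX_COMPILER=/opt/rocm/llvm/bin/clang++ \
-DROCMLIR_GEN_FLAGS=\"${GEN_FLAGS}\""

# Print the command for inspection; run it from the build directory.
echo "${cmake_cmd}"
```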