
Commit 4414ede

neon60 and j-stephan committed
SAXPY tutorial: roc-obj replace with llvm-objdump
Apply suggestions from code review

Co-authored-by: Jan Stephan <[email protected]>
1 parent f25e829 commit 4414ede


1 file changed, +47 -48 lines changed


docs/tutorial/saxpy.rst

Lines changed: 47 additions & 48 deletions
@@ -348,89 +348,88 @@ find out what device binary flavors are embedded into the executable?
348348
artifacts on disk. Add the ROCmCC installation folder to your PATH if you
349349
want to use these utilities (the utilities expect them to be on the PATH).
350350

351-
You can list embedded program binaries using ``roc-obj-ls``.
351+
You can list and extract the embedded program binaries using ``llvm-objdump``
352+
with the ``--offloading`` option.
352353

353354
.. code-block:: bash
354355
355-
roc-obj-ls ./saxpy
356+
llvm-objdump --offloading ./saxpy
356357
357358
It should return something like:
358359

359360
.. code-block:: shell
360361
361-
1 host-x86_64-unknown-linux file://./saxpy#offset=12288&size=0
362-
1 hipv4-amdgcn-amd-amdhsa--gfx803 file://./saxpy#offset=12288&size=9760
362+
./saxpy: file format elf64-x86-64
363+
Extracting offload bundle: ./saxpy.0.host-x86_64-unknown-linux-gnu-
364+
Extracting offload bundle: ./saxpy.0.hipv4-amdgcn-amd-amdhsa--gfx942
363365
364366
The compiler embeds a version 4 code object (more on `code
365367
object versions <https://www.llvm.org/docs/AMDGPUUsage.html#code-object-metadata>`_)
366-
and used the LLVM target triple `amdgcn-amd-amdhsa--gfx803` (more on `target triples
368+
and uses the LLVM target triple ``amdgcn-amd-amdhsa--gfx942`` (more on `target triples
367369
<https://www.llvm.org/docs/AMDGPUUsage.html#target-triples>`_). You can
368370
extract that program object in a disassembled fashion for human consumption
369-
via ``roc-obj``.
371+
via ``llvm-objdump``.
370372

371373
.. code-block:: bash
372374
373-
roc-obj -t gfx803 -d ./saxpy
375+
llvm-objdump --disassemble saxpy.0.hipv4-amdgcn-amd-amdhsa--gfx942 > saxpy.s
374376
375-
This creates two files on disk and ``.s`` extension is of most interest.
376-
Opening this file or dumping it to the console using ``cat``
377-
lets find the disassembled binary of the SAXPY compute kernel, something
378-
similar to:
377+
This creates a file on disk called ``saxpy.s``. Opening this file or
378+
dumping it to the console using ``cat`` lets you find the disassembled binary of
379+
the SAXPY compute kernel, something similar to:
379380

380381
.. code-block::
381382
383+
saxpy.0.hipv4-amdgcn-amd-amdhsa--gfx942: file format elf64-amdgpu
384+
382385
Disassembly of section .text:
383386
384-
<_Z12saxpy_kernelfPKfPfj>:
385-
s_load_dword s0, s[4:5], 0x2c // 000000001000: C0020002 0000002C
386-
s_load_dword s1, s[4:5], 0x18 // 000000001008: C0020042 00000018
387-
s_waitcnt lgkmcnt(0) // 000000001010: BF8C007F
388-
s_and_b32 s0, s0, 0xffff // 000000001014: 8600FF00 0000FFFF
389-
s_mul_i32 s6, s6, s0 // 00000000101C: 92060006
390-
v_add_u32_e32 v0, vcc, s6, v0 // 000000001020: 32000006
391-
v_cmp_gt_u32_e32 vcc, s1, v0 // 000000001024: 7D980001
392-
s_and_saveexec_b64 s[0:1], vcc // 000000001028: BE80206A
393-
s_cbranch_execz 22 // 00000000102C: BF880016 <_Z12saxpy_kernelfPKfPfj+0x88>
394-
s_load_dwordx4 s[0:3], s[4:5], 0x8 // 000000001030: C00A0002 00000008
395-
v_mov_b32_e32 v1, 0 // 000000001038: 7E020280
396-
v_lshlrev_b64 v[0:1], 2, v[0:1] // 00000000103C: D28F0000 00020082
397-
s_waitcnt lgkmcnt(0) // 000000001044: BF8C007F
398-
v_mov_b32_e32 v3, s1 // 000000001048: 7E060201
399-
v_add_u32_e32 v2, vcc, s0, v0 // 00000000104C: 32040000
400-
v_addc_u32_e32 v3, vcc, v3, v1, vcc // 000000001050: 38060303
401-
flat_load_dword v2, v[2:3] // 000000001054: DC500000 02000002
402-
v_mov_b32_e32 v3, s3 // 00000000105C: 7E060203
403-
v_add_u32_e32 v0, vcc, s2, v0 // 000000001060: 32000002
404-
v_addc_u32_e32 v1, vcc, v3, v1, vcc // 000000001064: 38020303
405-
flat_load_dword v3, v[0:1] // 000000001068: DC500000 03000000
406-
s_load_dword s0, s[4:5], 0x0 // 000000001070: C0020002 00000000
407-
s_waitcnt vmcnt(0) lgkmcnt(0) // 000000001078: BF8C0070
408-
v_mac_f32_e32 v3, s0, v2 // 00000000107C: 2C060400
409-
flat_store_dword v[0:1], v3 // 000000001080: DC700000 00000300
410-
s_endpgm // 000000001088: BF810000
387+
0000000000001900 <_Z12saxpy_kernelfPKfPfj>:
388+
s_load_dword s3, s[0:1], 0x2c // 000000001900: C00200C0 0000002C
389+
s_load_dword s4, s[0:1], 0x18 // 000000001908: C0020100 00000018
390+
s_waitcnt lgkmcnt(0) // 000000001910: BF8CC07F
391+
s_and_b32 s3, s3, 0xffff // 000000001914: 8603FF03 0000FFFF
392+
s_mul_i32 s2, s2, s3 // 00000000191C: 92020302
393+
v_add_u32_e32 v0, s2, v0 // 000000001920: 68000002
394+
v_cmp_gt_u32_e32 vcc, s4, v0 // 000000001924: 7D980004
395+
s_and_saveexec_b64 s[2:3], vcc // 000000001928: BE82206A
396+
s_cbranch_execz 20 // 00000000192C: BF880014 <_Z12saxpy_kernelfPKfPfj+0x80>
397+
s_load_dwordx4 s[4:7], s[0:1], 0x8 // 000000001930: C00A0100 00000008
398+
v_mov_b32_e32 v1, 0 // 000000001938: 7E020280
399+
v_lshlrev_b64 v[0:1], 2, v[0:1] // 00000000193C: D28F0000 00020082
400+
s_load_dword s0, s[0:1], 0x0 // 000000001944: C0020000 00000000
401+
s_waitcnt lgkmcnt(0) // 00000000194C: BF8CC07F
402+
v_lshl_add_u64 v[2:3], s[4:5], 0, v[0:1] // 000000001950: D2080002 04010004
403+
v_lshl_add_u64 v[0:1], s[6:7], 0, v[0:1] // 000000001958: D2080000 04010006
404+
global_load_dword v4, v[2:3], off // 000000001960: DC508000 047F0002
405+
global_load_dword v5, v[0:1], off // 000000001968: DC508000 057F0000
406+
s_waitcnt vmcnt(0) // 000000001970: BF8C0F70
407+
v_fmac_f32_e32 v5, s0, v4 // 000000001974: 760A0800
408+
global_store_dword v[0:1], v5, off // 000000001978: DC708000 007F0500
409+
s_endpgm // 000000001980: BF810000
410+
s_nop 0 // 000000001984: BF800000
411411
412412
Alternatively, call the compiler with ``--save-temps`` to dump all device
413413
binaries to disk as separate files.
414414

415415
.. code-block:: bash
416416
417-
amdclang++ ./HIP-Basic/saxpy/main.hip -o saxpy -I ./Common -lamdhip64 -L /opt/rocm/lib -O2 --save-temps
417+
amdclang++ ./HIP-Basic/saxpy/main.hip -o saxpy -I ./Common -lamdhip64 -L /opt/rocm/lib -O2 --save-temps --offload-arch=gfx942
418418
419419
List all the temporaries created while compiling ``main.hip`` with:
420420

421421
.. code-block:: bash
422422
423423
ls main-hip-amdgcn-amd-amdhsa-*
424-
main-hip-amdgcn-amd-amdhsa-gfx803.bc
425-
main-hip-amdgcn-amd-amdhsa-gfx803.cui
426-
main-hip-amdgcn-amd-amdhsa-gfx803.o
427-
main-hip-amdgcn-amd-amdhsa-gfx803.out
428-
main-hip-amdgcn-amd-amdhsa-gfx803.out.resolution.txt
429-
main-hip-amdgcn-amd-amdhsa-gfx803.s
430-
424+
main-hip-amdgcn-amd-amdhsa-gfx942.bc
425+
main-hip-amdgcn-amd-amdhsa-gfx942.o
426+
main-hip-amdgcn-amd-amdhsa-gfx942.out.resolution.txt
427+
main-hip-amdgcn-amd-amdhsa-gfx942.hipi
428+
main-hip-amdgcn-amd-amdhsa-gfx942.out
429+
main-hip-amdgcn-amd-amdhsa-gfx942.s
431430
Files with the ``.s`` extension hold the device assembly generated by the compiler.
432431
The filename notes the graphics IPs used by the compiler. The contents of
433-
this file are similar to what ``roc-obj`` printed to the console.
432+
this file are similar to the ``saxpy.s`` file created with ``llvm-objdump`` earlier.
434433

435434
.. tab-item:: Linux and NVIDIA
436435
:sync: linux-nvidia
@@ -491,7 +490,7 @@ find out what device binary flavors are embedded into the executable?
491490
492491
We can see that the compiler embedded a version 4 code object (more on code
493492
`object versions <https://www.llvm.org/docs/AMDGPUUsage.html#code-object-metadata>`_) and
494-
used the LLVM target triple `amdgcn-amd-amdhsa--gfx906` (more on `target triples
493+
used the LLVM target triple ``amdgcn-amd-amdhsa--gfx906`` (more on `target triples
495494
<https://www.llvm.org/docs/AMDGPUUsage.html#target-triples>`_). Don't be
496495
alarmed about Linux showing up as the binary format; AMDGPU binaries uploaded to
497496
the GPU for execution are proper Linux ELF binaries.
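When an executable is built for several ``--offload-arch`` targets, the extract-then-disassemble steps from the tutorial can be scripted. The following is a minimal sketch, not part of ROCm: the function names ``device_bundles``, ``bundle_arch`` and ``disassemble_device_code`` are hypothetical. It assumes the ``Extracting offload bundle: <path>`` line format and the ``<executable>.<index>.<bundle ID>`` file naming shown in the ``llvm-objdump --offloading`` output above, and it assumes bundle IDs follow the ``<offload kind>-<4-component triple>-<target ID>`` layout seen in ``hipv4-amdgcn-amd-amdhsa--gfx942``.

```shell
#!/bin/sh
# Hypothetical helpers (not a ROCm API). Assumes the llvm-objdump
# --offloading output format shown in the tutorial: one
# "Extracting offload bundle: <path>" line per embedded bundle.

# Read `llvm-objdump --offloading` output on stdin and print the path of
# every extracted AMDGPU device bundle; the host bundle is filtered out.
device_bundles() {
    sed -n 's/^[[:space:]]*Extracting offload bundle: //p' |
        grep 'amdgcn-amd-amdhsa'
}

# Print the target ID of a bundle path, e.g. gfx942 for
# ./saxpy.0.hipv4-amdgcn-amd-amdhsa--gfx942. Prints an empty line for the
# host bundle. Assumes the bundle ID (text after the last '.') is
# <offload kind>-<arch>-<vendor>-<os>-<environment>-<target ID>.
bundle_arch() {
    printf '%s\n' "${1##*.}" | cut -d- -f6-
}

# Extract the bundles of the executable given as $1, then disassemble each
# device bundle into <bundle>.s next to it and report "<target ID>: <file>".
disassemble_device_code() {
    llvm-objdump --offloading "$1" | device_bundles |
        while read -r bundle; do
            llvm-objdump --disassemble "$bundle" > "$bundle.s"
            printf '%s: %s\n' "$(bundle_arch "$bundle")" "$bundle.s"
        done
}
```

For the single-architecture build in the tutorial, ``disassemble_device_code ./saxpy`` would be expected to write ``./saxpy.0.hipv4-amdgcn-amd-amdhsa--gfx942.s``, with the same contents as the ``saxpy.s`` file created by hand. Filtering on the ``amdgcn-amd-amdhsa`` triple rather than on ``gfx`` keeps the host bundle out regardless of which GPU architectures were requested.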
