@@ -348,89 +348,88 @@ find out what device binary flavors are embedded into the executable?
348
348
artifacts on disk. Add the ROCmCC installation folder to your PATH if you
349
349
want to use these utilities (the utilities expect them to be on the PATH).
350
350
351
- You can list embedded program binaries using ``roc-obj-ls ``.
351
+ You can list embedded program binaries using ``llvm-objdump`` with the
352
+ ``--offloading`` option.
352
353
353
354
.. code-block:: bash
354
355
355
- roc-obj-ls ./saxpy
356
+ llvm-objdump --offloading ./saxpy
356
357
357
358
It should return something like:
358
359
359
360
.. code-block:: shell
360
361
361
- 1 host-x86_64-unknown-linux file://./saxpy#offset=12288& size=0
362
- 1 hipv4-amdgcn-amd-amdhsa--gfx803 file://./saxpy#offset=12288& size=9760
362
+ ./saxpy: file format elf64-x86-64
363
+ Extracting offload bundle: ./saxpy.0.host-x86_64-unknown-linux-gnu-
364
+ Extracting offload bundle: ./saxpy.0.hipv4-amdgcn-amd-amdhsa--gfx942
363
365
364
366
The compiler embeds a version 4 code object (more on `code
365
367
object versions <https://www.llvm.org/docs/AMDGPUUsage.html#code-object-metadata>`_)
366
- and used the LLVM target triple `amdgcn-amd-amdhsa--gfx803 ` (more on `target triples
368
+ and used the LLVM target triple ``amdgcn-amd-amdhsa--gfx942`` (more on `target triples
367
369
<https://www.llvm.org/docs/AMDGPUUsage.html#target-triples>`_). You can
368
370
extract that program object in a disassembled fashion for human consumption
369
- via ``roc-obj ``.
371
+ via ``llvm-objdump``.
370
372
371
373
.. code-block:: bash
372
374
373
- roc-obj -t gfx803 -d ./ saxpy
375
+ llvm-objdump --disassemble saxpy.0.hipv4-amdgcn-amd-amdhsa--gfx942 > saxpy.s
374
376
375
- This creates two files on disk and ``.s `` extension is of most interest.
376
- Opening this file or dumping it to the console using ``cat ``
377
- lets find the disassembled binary of the SAXPY compute kernel, something
378
- similar to:
377
+ This creates a file on disk called ``saxpy.s``. Opening this file or
378
+ dumping it to the console using ``cat`` shows the disassembled binary of
379
+ the SAXPY compute kernel, something similar to:
379
380
380
381
.. code-block::
381
382
383
+ saxpy.0.hipv4-amdgcn-amd-amdhsa--gfx942: file format elf64-amdgpu
384
+
382
385
Disassembly of section .text:
383
386
384
- <_Z12saxpy_kernelfPKfPfj>:
385
- s_load_dword s0, s[4:5], 0x2c // 000000001000: C0020002 0000002C
386
- s_load_dword s1, s[4:5], 0x18 // 000000001008: C0020042 00000018
387
- s_waitcnt lgkmcnt(0) // 000000001010: BF8C007F
388
- s_and_b32 s0, s0, 0xffff // 000000001014: 8600FF00 0000FFFF
389
- s_mul_i32 s6, s6, s0 // 00000000101C: 92060006
390
- v_add_u32_e32 v0, vcc, s6, v0 // 000000001020: 32000006
391
- v_cmp_gt_u32_e32 vcc, s1, v0 // 000000001024: 7D980001
392
- s_and_saveexec_b64 s[0:1], vcc // 000000001028: BE80206A
393
- s_cbranch_execz 22 // 00000000102C: BF880016 <_Z12saxpy_kernelfPKfPfj+0x88>
394
- s_load_dwordx4 s[0:3], s[4:5], 0x8 // 000000001030: C00A0002 00000008
395
- v_mov_b32_e32 v1, 0 // 000000001038: 7E020280
396
- v_lshlrev_b64 v[0:1], 2, v[0:1] // 00000000103C: D28F0000 00020082
397
- s_waitcnt lgkmcnt(0) // 000000001044: BF8C007F
398
- v_mov_b32_e32 v3, s1 // 000000001048: 7E060201
399
- v_add_u32_e32 v2, vcc, s0, v0 // 00000000104C: 32040000
400
- v_addc_u32_e32 v3, vcc, v3, v1, vcc // 000000001050: 38060303
401
- flat_load_dword v2, v[2:3] // 000000001054: DC500000 02000002
402
- v_mov_b32_e32 v3, s3 // 00000000105C: 7E060203
403
- v_add_u32_e32 v0, vcc, s2, v0 // 000000001060: 32000002
404
- v_addc_u32_e32 v1, vcc, v3, v1, vcc // 000000001064: 38020303
405
- flat_load_dword v3, v[0:1] // 000000001068: DC500000 03000000
406
- s_load_dword s0, s[4:5], 0x0 // 000000001070: C0020002 00000000
407
- s_waitcnt vmcnt(0) lgkmcnt(0) // 000000001078: BF8C0070
408
- v_mac_f32_e32 v3, s0, v2 // 00000000107C: 2C060400
409
- flat_store_dword v[0:1], v3 // 000000001080: DC700000 00000300
410
- s_endpgm // 000000001088: BF810000
387
+ 0000000000001900 <_Z12saxpy_kernelfPKfPfj>:
388
+ s_load_dword s3, s[0:1], 0x2c // 000000001900: C00200C0 0000002C
389
+ s_load_dword s4, s[0:1], 0x18 // 000000001908: C0020100 00000018
390
+ s_waitcnt lgkmcnt(0) // 000000001910: BF8CC07F
391
+ s_and_b32 s3, s3, 0xffff // 000000001914: 8603FF03 0000FFFF
392
+ s_mul_i32 s2, s2, s3 // 00000000191C: 92020302
393
+ v_add_u32_e32 v0, s2, v0 // 000000001920: 68000002
394
+ v_cmp_gt_u32_e32 vcc, s4, v0 // 000000001924: 7D980004
395
+ s_and_saveexec_b64 s[2:3], vcc // 000000001928: BE82206A
396
+ s_cbranch_execz 20 // 00000000192C: BF880014 <_Z12saxpy_kernelfPKfPfj+0x80>
397
+ s_load_dwordx4 s[4:7], s[0:1], 0x8 // 000000001930: C00A0100 00000008
398
+ v_mov_b32_e32 v1, 0 // 000000001938: 7E020280
399
+ v_lshlrev_b64 v[0:1], 2, v[0:1] // 00000000193C: D28F0000 00020082
400
+ s_load_dword s0, s[0:1], 0x0 // 000000001944: C0020000 00000000
401
+ s_waitcnt lgkmcnt(0) // 00000000194C: BF8CC07F
402
+ v_lshl_add_u64 v[2:3], s[4:5], 0, v[0:1] // 000000001950: D2080002 04010004
403
+ v_lshl_add_u64 v[0:1], s[6:7], 0, v[0:1] // 000000001958: D2080000 04010006
404
+ global_load_dword v4, v[2:3], off // 000000001960: DC508000 047F0002
405
+ global_load_dword v5, v[0:1], off // 000000001968: DC508000 057F0000
406
+ s_waitcnt vmcnt(0) // 000000001970: BF8C0F70
407
+ v_fmac_f32_e32 v5, s0, v4 // 000000001974: 760A0800
408
+ global_store_dword v[0:1], v5, off // 000000001978: DC708000 007F0500
409
+ s_endpgm // 000000001980: BF810000
410
+ s_nop 0 // 000000001984: BF800000
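
When an executable embeds code objects for more than one GPU architecture, the extract-then-disassemble steps above can be repeated for each bundle. The following is a minimal sketch, not part of the ROCm tooling: the helper names ``device_bundles`` and ``bundle_arch`` are hypothetical, and it assumes the extracted files follow the ``saxpy.0.hipv4-amdgcn-amd-amdhsa--<arch>`` naming shown in the ``llvm-objdump --offloading`` output earlier.

```shell
# Hypothetical helper: print only the device bundles among the given file
# names, assuming the naming pattern from the llvm-objdump output above.
device_bundles() {
  for f in "$@"; do
    case "$f" in
      *hipv4-amdgcn-amd-amdhsa--*) printf '%s\n' "$f" ;;
    esac
  done
}

# Hypothetical helper: the architecture is whatever follows the last "--".
bundle_arch() {
  printf '%s\n' "${1##*--}"
}

# Disassemble every extracted device bundle into saxpy.<arch>.s.
device_bundles ./saxpy.0.* | while IFS= read -r bundle; do
  arch=$(bundle_arch "$bundle")
  if command -v llvm-objdump >/dev/null 2>&1; then
    llvm-objdump --disassemble "$bundle" > "saxpy.$arch.s"
  else
    echo "llvm-objdump not found; skipping $bundle" >&2
  fi
done
```

Naming the output after the architecture keeps the disassembly of different targets from overwriting each other; the host bundle is skipped because its name does not match the device pattern.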
411
411
412
412
Alternatively, call the compiler with ``--save-temps`` to dump all device
413
413
binaries to disk in separate files.
414
414
415
415
.. code-block:: bash
416
416
417
- amdclang++ ./HIP-Basic/saxpy/main.hip -o saxpy -I ./Common -lamdhip64 -L /opt/rocm/lib -O2 --save-temps
417
+ amdclang++ ./HIP-Basic/saxpy/main.hip -o saxpy -I ./Common -lamdhip64 -L /opt/rocm/lib -O2 --save-temps --offload-arch=gfx942
418
418
419
419
List all the temporaries created while compiling ``main.hip`` with:
420
420
421
421
.. code-block:: bash
422
422
423
423
ls main-hip-amdgcn-amd-amdhsa-*
424
- main-hip-amdgcn-amd-amdhsa-gfx803.bc
425
- main-hip-amdgcn-amd-amdhsa-gfx803.cui
426
- main-hip-amdgcn-amd-amdhsa-gfx803.o
427
- main-hip-amdgcn-amd-amdhsa-gfx803.out
428
- main-hip-amdgcn-amd-amdhsa-gfx803.out.resolution.txt
429
- main-hip-amdgcn-amd-amdhsa-gfx803.s
430
-
424
+ main-hip-amdgcn-amd-amdhsa-gfx942.bc
425
+ main-hip-amdgcn-amd-amdhsa-gfx942.o
426
+ main-hip-amdgcn-amd-amdhsa-gfx942.out.resolution.txt
427
+ main-hip-amdgcn-amd-amdhsa-gfx942.hipi
428
+ main-hip-amdgcn-amd-amdhsa-gfx942.out
429
+ main-hip-amdgcn-amd-amdhsa-gfx942.s
431
430
Files with the ``.s`` extension hold the disassembled contents of the binary.
432
431
The filename notes the graphics IPs used by the compiler. The contents of
433
- this file are similar to what `` roc-obj `` printed to the console .
432
+ this file are similar to the ``*.s`` file created with ``llvm-objdump`` earlier.
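
Because the architecture is part of each temporary's filename, it can be recovered with plain shell string handling. The following is a small sketch under that assumption; ``temp_arch`` is a hypothetical helper name, not a ROCm utility, and the pattern it relies on is the ``main-hip-amdgcn-amd-amdhsa-<arch>.<ext>`` naming from the listing above.

```shell
# Hypothetical helper: extract the architecture from a --save-temps file name
# such as main-hip-amdgcn-amd-amdhsa-gfx942.s.
temp_arch() {
  name=${1##*amdhsa-}          # drop everything up to and including "amdhsa-"
  printf '%s\n' "${name%%.*}"  # drop the extension(s)
}

# Print each distinct architecture that has a device assembly file.
for f in main-hip-amdgcn-amd-amdhsa-*.s; do
  [ -e "$f" ] || continue      # the glob matched nothing
  temp_arch "$f"
done | sort -u
```

This is mainly useful when several ``--offload-arch`` values are passed in one compilation and you want to confirm which targets actually produced device assembly.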
434
433
435
434
.. tab-item:: Linux and NVIDIA
436
435
:sync: linux-nvidia
@@ -491,7 +490,7 @@ find out what device binary flavors are embedded into the executable?
491
490
492
491
We can see that the compiler embedded a version 4 code object (more on code
493
492
`object versions <https://www.llvm.org/docs/AMDGPUUsage.html#code-object-metadata>`_) and
494
- used the LLVM target triple `amdgcn-amd-amdhsa--gfx906 ` (more on `target triples
493
+ used the LLVM target triple ``amdgcn-amd-amdhsa--gfx906`` (more on `target triples
495
494
<https://www.llvm.org/docs/AMDGPUUsage.html#target-triples>`_). Don't be
496
495
alarmed about Linux showing up as a binary format: AMDGPU binaries uploaded to
497
496
the GPU for execution are proper Linux ELF binaries in their format.