Skip to content

Commit 77831c9

Browse files
authored
SWDEV-398297 - HIP Documentation Updates for ROCm 5.6-needed for 23.10 (#3250)
Change-Id: I514a0a941fcc6ebd940f5c6c6700c3c305f04a56
1 parent 22e9263 commit 77831c9

File tree

6 files changed

+123
-75
lines changed

6 files changed

+123
-75
lines changed

README.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -7,7 +7,7 @@ Key features include:
77
* HIP is very thin and has little or no performance impact over coding directly in CUDA mode.
88
* HIP allows coding in a single-source C++ programming language including features such as templates, C++11 lambdas, classes, namespaces, and more.
99
* HIP allows developers to use the "best" development environment and tools on each target platform.
10-
* The [HIPIFY](https://github.com/ROCm-Developer-Tools/HIPIFY/blob/master/README.md) tools automatically convert source from CUDA to HIP.
10+
* The [HIPIFY](https://github.com/ROCm-Developer-Tools/HIPIFY/blob/amd-staging/README.md) tools automatically convert source from CUDA to HIP.
1111
* Developers can specialize for the platform (CUDA or AMD) to tune for performance or handle tricky cases.
1212

1313
New projects can be developed directly in the portable HIP C++ language and can run on either NVIDIA or AMD platforms. Additionally, HIP provides porting tools which make it easy to port existing CUDA codes to the HIP layer, with no loss of performance as compared to the original CUDA application. HIP is not intended to be a drop-in replacement for CUDA, and developers should expect to do some manual coding and performance tuning work to complete the port.

docs/developer_guide/logging.md

Lines changed: 50 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -85,8 +85,7 @@ ClPrint(amd::LOG_INFO, amd::LOG_INIT, "Initializing HSA stack.");
8585

8686
## HIP Logging Example:
8787

88-
Below is an example to enable HIP logging and get logging information during
89-
execution of hipinfo,
88+
Below is an example to enable HIP logging and get logging information during execution of hipinfo on Linux,
9089

9190
```console
9291
user@user-test:~/hip/bin$ export AMD_LOG_LEVEL=4
@@ -136,22 +135,7 @@ concurrentKernels: 1
136135
cooperativeLaunch: 0
137136
cooperativeMultiDeviceLaunch: 0
138137
arch.hasGlobalInt32Atomics: 1
139-
arch.hasGlobalFloatAtomicExch: 1
140-
arch.hasSharedInt32Atomics: 1
141-
arch.hasSharedFloatAtomicExch: 1
142-
arch.hasFloatAtomicAdd: 1
143-
arch.hasGlobalInt64Atomics: 1
144-
arch.hasSharedInt64Atomics: 1
145-
arch.hasDoubles: 1
146-
arch.hasWarpVote: 1
147-
arch.hasWarpBallot: 1
148-
arch.hasWarpShuffle: 1
149-
arch.hasFunnelShift: 0
150-
arch.hasThreadFenceSystem: 1
151-
arch.hasSyncThreadsExt: 0
152-
arch.hasSurfaceFuncs: 0
153-
arch.has3dGrid: 1
154-
arch.hasDynamicParallelism: 0
138+
...
155139
gcnArch: 1012
156140
isIntegrated: 0
157141
maxTexture1D: 65536
@@ -178,6 +162,54 @@ memInfo.total: 7.98 GB
178162
memInfo.free: 7.98 GB (100%)
179163
```
180164
165+
On Windows, AMD_LOG_LEVEL can be set via environment variable from advanced system setting, or from Command prompt run as administrator, as shown below as an example, which shows some debug log information calling backend runtime on Windows.
166+
```
167+
C:\hip\bin>set AMD_LOG_LEVEL=4
168+
C:\hip\bin>hipinfo
169+
:3:C:\constructicon\builds\gfx\two\22.40\drivers\compute\vdi\device\comgrctx.cpp:33 : 605413686305 us: 29864: [tid:0x9298] Loading COMGR library.
170+
:4:C:\constructicon\builds\gfx\two\22.40\drivers\compute\vdi\platform\runtime.cpp:83 : 605413869411 us: 29864: [tid:0x9298] init
171+
:3:C:\constructicon\builds\gfx\two\22.40\drivers\compute\hipamd\src\hip_context.cpp:47 : 605413869502 us: 29864: [tid:0x9298] Direct Dispatch: 0
172+
:3:C:\constructicon\builds\gfx\two\22.40\drivers\compute\hipamd\src\hip_device_runtime.cpp:543 : 605413870553 us: 29864: [tid:0x9298] hipGetDeviceCount: Returned hipSuccess :
173+
:3:C:\constructicon\builds\gfx\two\22.40\drivers\compute\hipamd\src\hip_device_runtime.cpp:556 : 605413870631 us: 29864: [tid:0x9298][32m hipSetDevice ( 0 ) ←[0m
174+
:3:C:\constructicon\builds\gfx\two\22.40\drivers\compute\hipamd\src\hip_device_runtime.cpp:561 : 605413870848 us: 29864: [tid:0x9298] hipSetDevice: Returned hipSuccess :
175+
--------------------------------------------------------------------------------
176+
device# 0
177+
:3:C:\constructicon\builds\gfx\two\22.40\drivers\compute\hipamd\src\hip_device.cpp:346 : 605413871623 us: 29864: [tid:0x9298][32m hipGetDeviceProperties ( 0000008AEBEFF8C8, 0 ) ←[0m
178+
:3:C:\constructicon\builds\gfx\two\22.40\drivers\compute\hipamd\src\hip_device.cpp:348 : 605413871695 us: 29864: [tid:0x9298] hipGetDeviceProperties: Returned hipSuccess :
179+
Name: AMD Radeon(TM) Graphics
180+
pciBusID: 3
181+
pciDeviceID: 0
182+
pciDomainID: 0
183+
multiProcessorCount: 7
184+
maxThreadsPerMultiProcessor: 2560
185+
isMultiGpuBoard: 0
186+
clockRate: 1600 Mhz
187+
memoryClockRate: 1333 Mhz
188+
memoryBusWidth: 0
189+
totalGlobalMem: 12.06 GB
190+
totalConstMem: 2147483647
191+
sharedMemPerBlock: 64.00 KB
192+
...
193+
gcnArchName: gfx90c:xnack-
194+
:3:C:\constructicon\builds\gfx\two\22.40\drivers\compute\hipamd\src\hip_device_runtime.cpp:541 : 605413924779 us: 29864: [tid:0x9298][32m hipGetDeviceCount ( 0000008AEBEFF8A4 ) ←[0m
195+
:3:C:\constructicon\builds\gfx\two\22.40\drivers\compute\hipamd\src\hip_device_runtime.cpp:543 : 605413925075 us: 29864: [tid:0x9298] hipGetDeviceCount: Returned hipSuccess :
196+
peers: :3:C:\constructicon\builds\gfx\two\22.40\drivers\compute\hipamd\src\hip_peer.cpp:176 : 605413928643 us: 29864: [tid:0x9298][32m hipDeviceCanAccessPeer ( 0000008AEBEFF890, 0, 0 ) ←[0m
197+
:3:C:\constructicon\builds\gfx\two\22.40\drivers\compute\hipamd\src\hip_peer.cpp:177 : 605413928743 us: 29864: [tid:0x9298] hipDeviceCanAccessPeer: Returned hipSuccess :
198+
non-peers: :3:C:\constructicon\builds\gfx\two\22.40\drivers\compute\hipamd\src\hip_peer.cpp:176 : 605413930830 us: 29864: [tid:0x9298][32m hipDeviceCanAccessPeer ( 0000008AEBEFF890, 0, 0 ) ←[0m
199+
:3:C:\constructicon\builds\gfx\two\22.40\drivers\compute\hipamd\src\hip_peer.cpp:177 : 605413930882 us: 29864: [tid:0x9298] hipDeviceCanAccessPeer: Returned hipSuccess :
200+
device#0
201+
...
202+
:4:C:\constructicon\builds\gfx\two\22.40\drivers\compute\vdi\device\pal\palmemory.cpp:430 : 605414517802 us: 29864: [tid:0x9298] Free-: 8000 bytes, VM[ 3007c8000, 3007d0000]
203+
:3:C:\constructicon\builds\gfx\two\22.40\drivers\compute\vdi\device\devprogram.cpp:2979: 605414517893 us: 29864: [tid:0x9298] For Init/Fini: Kernel Name: __amd_rocclr_copyBufferToImage
204+
:3:C:\constructicon\builds\gfx\two\22.40\drivers\compute\vdi\device\devprogram.cpp:2979: 605414518259 us: 29864: [tid:0x9298] For Init/Fini: Kernel Name: __amd_rocclr_copyBuffer
205+
...
206+
:4:C:\constructicon\builds\gfx\two\22.40\drivers\compute\vdi\device\pal\palmemory.cpp:206 : 605414523422 us: 29864: [tid:0x9298] Alloc: 100000 bytes, ptr[00000003008D0000-00000003009D0000], obj[00000003007D0000-00000003047D0000]
207+
:4:C:\constructicon\builds\gfx\two\22.40\drivers\compute\vdi\device\pal\palmemory.cpp:206 : 605414523767 us: 29864: [tid:0x9298] Alloc: 100000 bytes, ptr[00000003009D0000-0000000300AD0000], obj[00000003007D0000-00000003047D0000]
208+
:3:C:\constructicon\builds\gfx\two\22.40\drivers\compute\hipamd\src\hip_memory.cpp:681 : 605414524092 us: 29864: [tid:0x9298] hipMemGetInfo: Returned hipSuccess :
209+
memInfo.total: 12.06 GB
210+
memInfo.free: 11.93 GB (99%)
211+
```
212+
181213
## HIP Logging Tips:
182214
183215
- HIP logging works for both release and debug version of HIP application.

docs/how_to_guides/debugging.md

Lines changed: 12 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -100,7 +100,7 @@ Reading symbols from ./hipTexObjPitch...
100100
(gdb) break main
101101
Breakpoint 1 at 0x4013d1: file /home/test/hip/tests/src/texture/hipTexObjPitch.cpp, line 98.
102102
(gdb) run
103-
Starting program: /home/test/hip/build/directed_tests/texture/hipTexObjPitch
103+
Starting program: /home/test/hip/build/directed_tests/texture/hipTexObjPitch
104104
[Thread debugging using libthread_db enabled]
105105
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
106106

@@ -112,11 +112,11 @@ Breakpoint 1, main ()
112112
```
113113

114114
### Other Debugging Tools
115-
There are also other debugging tools available online developers can google and choose the one best suits the debugging requirements.
115+
There are also other debugging tools available online developers can google and choose the one best suits the debugging requirements. For example, Microsoft Visual Studio and Windgb tools are options on Windows.
116116

117117
## Debugging HIP Applications
118118

119-
Below is an example to show how to get useful information from the debugger while running a simple memory copy test, which caused an issue of segmentation fault.
119+
Below is an example on Linux to show how to get useful information from the debugger while running a simple memory copy test, which caused an issue of segmentation fault.
120120

121121
```console
122122
test: simpleTest2<?> numElements=4194304 sizeElements=4194304 bytes
@@ -176,11 +176,14 @@ Thread 1 "hipMemcpy_simpl" received signal SIGSEGV, Segmentation fault.
176176
...
177177
```
178178

179+
On Windows, debugging HIP applications on IDE like Microsoft Visual Studio tools, are more informative and visible to debug codes, inspect variables, watch multiple details and examine the call stacks.
180+
179181
## Useful Environment Variables
180-
HIP provides some environment variables which allow HIP, hip-clang, or HSA driver to disable some feature or optimization.
182+
183+
HIP provides some environment variables which allow HIP, hip-clang, or HSA driver on Linux to disable some feature or optimization.
181184
These are not intended for production but can be useful diagnose synchronization problems in the application (or driver).
182185

183-
Some of the most useful environment variables are described here. They are supported on the ROCm path.
186+
Some of the most useful environment variables are described here. They are supported on the ROCm path on Linux and Windows as well.
184187

185188
### Kernel Enqueue Serialization
186189
Developers can control kernel command serialization from the host using the environment variable,
@@ -221,8 +224,8 @@ if (totalDeviceNum > 2) {
221224
Developers can dump code object to analyze compiler related issues via setting environment variable,
222225
GPU_DUMP_CODE_OBJECT
223226

224-
### HSA related environment variables
225-
HSA provides some environment variables help to analyze issues in driver or hardware, for example,
227+
### HSA related environment variables on Linux
228+
On Linux with open source, HSA provides some environment variables help to analyze issues in driver or hardware, for example,
226229

227230
HSA_ENABLE_SDMA=0
228231
It causes host-to-device and device-to-host copies to use compute shader blit kernels rather than the dedicated DMA copy engines.
@@ -246,12 +249,12 @@ The following is the summary of the most useful environment variables in HIP.
246249
| AMD_SERIALIZE_KERNEL <br><sub> Serialize kernel enqueue. </sub> | 0 | 1: Wait for completion before enqueue. <br> 2: Wait for completion after enqueue. <br> 3: Both. |
247250
| AMD_SERIALIZE_COPY <br><sub> Serialize copies. </sub> | 0 | 1: Wait for completion before enqueue. <br> 2: Wait for completion after enqueue. <br> 3: Both. |
248251
| HIP_HOST_COHERENT <br><sub> Coherent memory in hipHostMalloc. </sub> | 0 | 0: memory is not coherent between host and GPU. <br> 1: memory is coherent with host. |
249-
| AMD_DIRECT_DISPATCH <br><sub> Enable direct kernel dispatch. </sub> | 1 | 0: Disable. <br> 1: Enable. |
252+
| AMD_DIRECT_DISPATCH <br><sub> Enable direct kernel dispatch (Currently for Linux, under development on Windows). </sub> | 1 | 0: Disable. <br> 1: Enable. |
250253
| GPU_MAX_HW_QUEUES <br><sub> The maximum number of hardware queues allocated per device. </sub> | 4 | The variable controls how many independent hardware queues HIP runtime can create per process, per device. If application allocates more HIP streams than this number, then HIP runtime will reuse the same hardware queues for the new streams in round robin manner. Please note, this maximum number does not apply to either hardware queues that are created for CU masked HIP streams, or cooperative queue for HIP Cooperative Groups (there is only one single queue per device). |
251254

252255
## General Debugging Tips
253256
- 'gdb --args' can be used to conveniently pass the executable and arguments to gdb.
254-
- From inside GDB, you can set environment variables "set env". Note the command does not use an '=' sign:
257+
- From inside GDB on Linux, you can set environment variables "set env". Note the command does not use an '=' sign:
255258

256259
```
257260
(gdb) set env AMD_SERIALIZE_KERNEL 3

docs/reference/kernel_language.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -126,15 +126,15 @@ The `__restrict__` keyword tells the compiler that the associated memory pointer
126126

127127
## Built-In Variables
128128

129-
(coordinate_builtins)=
130129
### Coordinate Built-Ins
131130
Built-ins determine the coordinate of the active work item in the execution grid. They are defined in amd_hip_runtime.h (rather than being implicitly defined by the compiler).
132131
In HIP, built-ins coordinate variable definitions are the same as in Cuda, for instance:
133132
threadIdx.x, blockIdx.y, gridDim.y, etc.
134133
The products gridDim.x * blockDim.x, gridDim.y * blockDim.y and gridDim.z * blockDim.z are always less than 2^32.
134+
Coordinates builtins are implemented as structures for better performance. When used with printf, they needs to be casted to integer types explicitly.
135135

136136
### warpSize
137-
The warpSize variable is of type int and contains the warp size (in threads) for the target device. Note that all current Nvidia devices return 32 for this variable, and all current AMD devices return 64. Device code should use the warpSize built-in to develop portable wave-aware code.
137+
The warpSize variable is of type int and contains the warp size (in threads) for the target device. Note that all current Nvidia devices return 32 for this variable, and current AMD devices return 64 for gfx9 and 32 for gfx10 and above. The warpSize variable should only be used in device functions. Device code should use the warpSize built-in to develop portable wave-aware code.
138138

139139

140140
## Vector Types

docs/user_guide/faq.md

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -148,6 +148,9 @@ ROCclr (Radeon Open Compute Common Language Runtime) is a virtual device interfa
148148
## What is HIPAMD?
149149
HIPAMD is a repository branched out from HIP, mainly the implementation for AMD GPU.
150150

151+
## Can I get HIP open source repository for Windows?
152+
No, there is no HIP repository open publicly on Windows.
153+
151154
## Can a HIP binary run on both AMD and Nvidia platforms?
152155
HIP is a source-portable language that can be compiled to run on either AMD or NVIDIA platform. HIP tools don't create a "fat binary" that can run on either platform, however.
153156

@@ -237,6 +240,11 @@ Once source is compiled with per-thread default stream enabled, all APIs will be
237240

238241
Besides, per-thread default stream be enabled per translation unit, users can compile some files with feature enabled and some with feature disabled. Feature enabled translation unit will have default stream as per thread and there will not be any implicit synchronization done but other modules will have legacy default stream which will do implicit synchronization.
239242

243+
## Can I develop applications with HIP APIs on Windows the same on Linux?
244+
245+
Yes, HIP APIs are available to use on both Linux and Windows.
246+
Due to different working mechanisms on operating systems like Windows vs Linux, HIP APIs call corresponding lower level backend runtime libraries and kernel drivers for the OS, in order to control the executions on GPU hardware accordingly. There might be a few differences on the related backend software and driver support, which might affect usage of HIP APIs. See OS support details in HIP API document.
247+
240248
## How can I know the version of HIP?
241249

242250
HIP version definition has been updated since ROCm 4.2 release as the following:

0 commit comments

Comments
 (0)