README.md
Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ Key features include:
7
7
* HIP is very thin and has little or no performance impact over coding directly in CUDA mode.
8
8
* HIP allows coding in a single-source C++ programming language including features such as templates, C++11 lambdas, classes, namespaces, and more.
9
9
* HIP allows developers to use the "best" development environment and tools on each target platform.
10
-
* The [HIPIFY](https://github.com/ROCm-Developer-Tools/HIPIFY/blob/master/README.md) tools automatically convert source from CUDA to HIP.
10
+
* The [HIPIFY](https://github.com/ROCm-Developer-Tools/HIPIFY/blob/amd-staging/README.md) tools automatically convert source from CUDA to HIP.
11
11
* Developers can specialize for the platform (CUDA or AMD) to tune for performance or handle tricky cases.
12
12
13
13
New projects can be developed directly in the portable HIP C++ language and can run on either NVIDIA or AMD platforms. Additionally, HIP provides porting tools which make it easy to port existing CUDA codes to the HIP layer, with no loss of performance as compared to the original CUDA application. HIP is not intended to be a drop-in replacement for CUDA, and developers should expect to do some manual coding and performance tuning work to complete the port.
Below is an example to enable HIP logging and get logging information during
89
-
execution of hipinfo,
88
+
Below is an example to enable HIP logging and get logging information during execution of hipinfo on Linux,
90
89
91
90
```console
92
91
user@user-test:~/hip/bin$ export AMD_LOG_LEVEL=4
@@ -136,22 +135,7 @@ concurrentKernels: 1
136
135
cooperativeLaunch: 0
137
136
cooperativeMultiDeviceLaunch: 0
138
137
arch.hasGlobalInt32Atomics: 1
139
-
arch.hasGlobalFloatAtomicExch: 1
140
-
arch.hasSharedInt32Atomics: 1
141
-
arch.hasSharedFloatAtomicExch: 1
142
-
arch.hasFloatAtomicAdd: 1
143
-
arch.hasGlobalInt64Atomics: 1
144
-
arch.hasSharedInt64Atomics: 1
145
-
arch.hasDoubles: 1
146
-
arch.hasWarpVote: 1
147
-
arch.hasWarpBallot: 1
148
-
arch.hasWarpShuffle: 1
149
-
arch.hasFunnelShift: 0
150
-
arch.hasThreadFenceSystem: 1
151
-
arch.hasSyncThreadsExt: 0
152
-
arch.hasSurfaceFuncs: 0
153
-
arch.has3dGrid: 1
154
-
arch.hasDynamicParallelism: 0
138
+
...
155
139
gcnArch: 1012
156
140
isIntegrated: 0
157
141
maxTexture1D: 65536
@@ -178,6 +162,54 @@ memInfo.total: 7.98 GB
178
162
memInfo.free: 7.98GB (100%)
179
163
```
180
164
165
+
On Windows, AMD_LOG_LEVEL can be set as an environment variable from the advanced system settings, or from a Command Prompt run as administrator. The example below shows some of the debug log information produced when calling the backend runtime on Windows.
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
106
106
@@ -112,11 +112,11 @@ Breakpoint 1, main ()
112
112
```
113
113
114
114
### Other Debugging Tools
115
-
There are also other debugging tools available online developers can google and choose the one best suits the debugging requirements.
115
+
Other debugging tools are also available online; developers can search for them and choose the one that best suits their debugging requirements. For example, Microsoft Visual Studio and WinDbg are options on Windows.
116
116
117
117
## Debugging HIP Applications
118
118
119
-
Below is an example to show how to get useful information from the debugger while running a simple memory copy test, which caused an issue of segmentation fault.
119
+
Below is an example on Linux that shows how to get useful information from the debugger while running a simple memory copy test that caused a segmentation fault.
@@ -176,11 +176,14 @@ Thread 1 "hipMemcpy_simpl" received signal SIGSEGV, Segmentation fault.
176
176
...
177
177
```
178
178
179
+
On Windows, debugging HIP applications in an IDE such as Microsoft Visual Studio is more informative and visual: developers can debug code, inspect variables, watch multiple details, and examine call stacks.
180
+
179
181
## Useful Environment Variables
180
-
HIP provides some environment variables which allow HIP, hip-clang, or HSA driver to disable some feature or optimization.
182
+
183
+
HIP provides some environment variables which allow HIP, hip-clang, or the HSA driver on Linux to disable certain features or optimizations.
181
184
These are not intended for production but can be useful to diagnose synchronization problems in the application (or driver).
182
185
183
-
Some of the most useful environment variables are described here. They are supported on the ROCm path.
186
+
Some of the most useful environment variables are described here. They are supported on the ROCm path on both Linux and Windows.
184
187
185
188
### Kernel Enqueue Serialization
186
189
Developers can control kernel command serialization from the host using the environment variable,
@@ -221,8 +224,8 @@ if (totalDeviceNum > 2) {
221
224
Developers can dump code objects to analyze compiler-related issues by setting the environment variable,
222
225
GPU_DUMP_CODE_OBJECT
223
226
224
-
### HSA related environment variables
225
-
HSA provides some environment variables help to analyze issues in driver or hardware, for example,
227
+
### HSA related environment variables on Linux
228
+
On Linux (open source), HSA provides some environment variables that help analyze issues in the driver or hardware, for example,
226
229
227
230
HSA_ENABLE_SDMA=0
228
231
It causes host-to-device and device-to-host copies to use compute shader blit kernels rather than the dedicated DMA copy engines.
@@ -246,12 +249,12 @@ The following is the summary of the most useful environment variables in HIP.
246
249
| AMD_SERIALIZE_KERNEL <br><sub> Serialize kernel enqueue. </sub> | 0 | 1: Wait for completion before enqueue. <br> 2: Wait for completion after enqueue. <br> 3: Both. |
247
250
| AMD_SERIALIZE_COPY <br><sub> Serialize copies. </sub> | 0 | 1: Wait for completion before enqueue. <br> 2: Wait for completion after enqueue. <br> 3: Both. |
248
251
| HIP_HOST_COHERENT <br><sub> Coherent memory in hipHostMalloc. </sub> | 0 | 0: memory is not coherent between host and GPU. <br> 1: memory is coherent with host. |
| AMD_DIRECT_DISPATCH <br><sub> Enable direct kernel dispatch (Currently for Linux, under development on Windows). </sub> | 1 | 0: Disable. <br> 1: Enable. |
250
253
| GPU_MAX_HW_QUEUES <br><sub> The maximum number of hardware queues allocated per device. </sub> | 4 | The variable controls how many independent hardware queues the HIP runtime can create per process, per device. If an application allocates more HIP streams than this number, the HIP runtime reuses the same hardware queues for the new streams in a round-robin manner. Note that this maximum does not apply to hardware queues created for CU-masked HIP streams, or to the cooperative queue for HIP Cooperative Groups (there is only a single such queue per device). |
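
The round-robin reuse described for GPU_MAX_HW_QUEUES can be pictured with a small host-side sketch. This is not the HIP runtime's actual code: `queue_for_stream` is a hypothetical helper, and it assumes streams are numbered in creation order and mapped by a plain modulo, which is only one plausible reading of "round robin". CU-masked streams and the cooperative queue are ignored.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not a HIP API): picks the hardware queue index for the
// n-th stream created on a device, assuming plain modulo round-robin reuse.
std::size_t queue_for_stream(std::size_t stream_index, std::size_t max_hw_queues) {
    assert(max_hw_queues > 0 && "the queue limit must be positive");
    return stream_index % max_hw_queues;
}
```

Under this assumption and the default limit of 4, the first four streams each get their own queue, and the fifth stream shares a queue with the first.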
251
254
252
255
## General Debugging Tips
253
256
- 'gdb --args' can be used to conveniently pass the executable and arguments to gdb.
254
-
- From inside GDB, you can set environment variables "set env". Note the command does not use an '=' sign:
257
+
- From inside GDB on Linux, you can set environment variables with "set env". Note the command does not use an '=' sign:
docs/reference/kernel_language.md
Lines changed: 2 additions & 2 deletions
@@ -126,15 +126,15 @@ The `__restrict__` keyword tells the compiler that the associated memory pointer
126
126
127
127
## Built-In Variables
128
128
129
-
(coordinate_builtins)=
130
129
### Coordinate Built-Ins
131
130
Built-ins determine the coordinate of the active work item in the execution grid. They are defined in amd_hip_runtime.h (rather than being implicitly defined by the compiler).
132
131
In HIP, the coordinate built-in variable definitions are the same as in CUDA, for instance:
133
132
threadIdx.x, blockIdx.y, gridDim.y, etc.
134
133
The products gridDim.x * blockDim.x, gridDim.y * blockDim.y and gridDim.z * blockDim.z are always less than 2^32.
134
+
Coordinate built-ins are implemented as structures for better performance. When used with printf, they need to be cast to integer types explicitly.
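
The limit on the products gridDim * blockDim stated above can be checked on the host before a launch. The following is a minimal sketch, assuming the per-dimension sizes are available as 32-bit values; `Dim3` and `within_launch_limit` are hypothetical names used only for illustration and are not part of the HIP headers.

```cpp
#include <cstdint>

// Stand-in for a 3-component launch dimension (hypothetical, not HIP's dim3).
struct Dim3 {
    std::uint32_t x, y, z;
};

// Returns true when every per-dimension product grid * block stays below 2^32.
// The multiplication is done in 64 bits so it cannot wrap around.
bool within_launch_limit(const Dim3& grid, const Dim3& block) {
    constexpr std::uint64_t kLimit = std::uint64_t{1} << 32;
    auto ok = [&](std::uint32_t g, std::uint32_t b) {
        return static_cast<std::uint64_t>(g) * b < kLimit;
    };
    return ok(grid.x, block.x) && ok(grid.y, block.y) && ok(grid.z, block.z);
}
```

Widening to 64 bits before multiplying matters here: a 32-bit product would silently wrap and make an oversized launch look valid.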
135
135
136
136
### warpSize
137
-
The warpSize variable is of type int and contains the warp size (in threads) for the target device. Note that all current Nvidia devices return 32 for this variable, and all current AMD devices return 64. Device code should use the warpSize built-in to develop portable wave-aware code.
137
+
The warpSize variable is of type int and contains the warp size (in threads) for the target device. Note that all current Nvidia devices return 32 for this variable, and current AMD devices return 64 for gfx9 and 32 for gfx10 and above. The warpSize variable should only be used in device functions. Device code should use the warpSize built-in to develop portable wave-aware code.
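
Because the warp size differs between targets (64 on gfx9, 32 on gfx10 and above, 32 on NVIDIA, per the text above), sizing calculations are better written with the warp size as an input than with a hard-coded constant. The following is a minimal host-side sketch of the arithmetic only; `warps_per_block` is a hypothetical helper, and the warp size is passed in as a parameter because the warpSize built-in itself is meant for device functions.

```cpp
#include <cassert>

// Hypothetical helper: number of warps (wavefronts) needed to cover a block
// of `threads_per_block` threads, rounding up for a partially filled last warp.
int warps_per_block(int threads_per_block, int warp_size) {
    assert(threads_per_block > 0 && warp_size > 0);
    return (threads_per_block + warp_size - 1) / warp_size;
}
```

With this helper, a 256-thread block spans 4 warps at a warp size of 64 and 8 warps at a warp size of 32, so per-warp scratch sizes derived from it adapt to the target.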
docs/user_guide/faq.md
Lines changed: 8 additions & 0 deletions
@@ -148,6 +148,9 @@ ROCclr (Radeon Open Compute Common Language Runtime) is a virtual device interfa
148
148
## What is HIPAMD?
149
149
HIPAMD is a repository branched out from HIP that mainly contains the implementation for AMD GPUs.
150
150
151
+
## Can I get the HIP open source repository for Windows?
152
+
No, there is no publicly available HIP repository for Windows.
153
+
151
154
## Can a HIP binary run on both AMD and Nvidia platforms?
152
155
HIP is a source-portable language that can be compiled to run on either AMD or NVIDIA platform. HIP tools don't create a "fat binary" that can run on either platform, however.
153
156
@@ -237,6 +240,11 @@ Once source is compiled with per-thread default stream enabled, all APIs will be
237
240
238
241
Besides, the per-thread default stream can be enabled per translation unit, so users can compile some files with the feature enabled and some with it disabled. A translation unit with the feature enabled uses a per-thread default stream with no implicit synchronization, while other modules use the legacy default stream, which does implicit synchronization.
239
242
243
+
## Can I develop applications with HIP APIs on Windows the same as on Linux?
244
+
245
+
Yes, HIP APIs are available to use on both Linux and Windows.
246
+
Because Windows and Linux work differently, HIP APIs call the corresponding lower-level backend runtime libraries and kernel drivers for each OS in order to control execution on the GPU hardware. There might be a few differences in the backend software and driver support, which might affect usage of HIP APIs. See the OS support details in the HIP API document.
247
+
240
248
## How can I know the version of HIP?
241
249
242
250
The HIP version definition has been updated since the ROCm 4.2 release as follows: