Possible Vulkan Bug? 3.14.0 CTDs splitting layers over native + D3D12 wrapper #515
Sorry, I wasn't sure how to summarize that title better. I'll admit this is a new arena for me, so if I'm doing something dumb, missing something, or there's a feature or option I ought to be using to fix this, I'm more than happy to be corrected. Otherwise, I think I may have found a bug in the implementation, but I wanted to check here for solutions and support before submitting a bug report, to make sure this is actually a bug and not just me doing something wrong.

Context:
Logs show the backend splitting model layers across device 2 (native NVIDIA) and device 3 (a D3D12 wrapper of the same physical GPU), then crashing during token generation with:
This happens regardless of what I set the environment variable to. The tests I ran:

- Test 1: Set `GGML_VK_VISIBLE_DEVICES=2` (native NVIDIA).
  Expected: only use device 2 (the native NVIDIA Vulkan driver).
- Test 2: Set `GGML_VK_VISIBLE_DEVICES=2,2`.
  Rationale: attempt to force a single device by specifying it twice.
- Test 3: Set `GGML_VK_VISIBLE_DEVICES=1` (AMD integrated; I did this basically just to see whether it tried to honor it or ignored it).
  Expected: only use device 1 (the AMD integrated GPU).
- Test 4: Don't set the environment variable at all.
  Expected: automatic device selection with UUID deduplication.

So basically, even when explicitly setting the environment variable:
llama.cpp STILL uses both NVIDIA devices:
Then splits layers:
Final error:
Excerpt from the CTD crashdump:
Other thoughts: in `ggml-vulkan.cpp` I noticed a few things when I tried to dig into this.

Lines 4694–4811: the code has two separate paths for device enumeration:

Basically, if I'm understanding this right: when lines 4767–4773 (still in that function)
According to this, since there's no entry for

And, in case it helps, I'll just go ahead and throw this in here, from my app test runs where I'm trying to integrate this. I've been testing with Phi4:
Replies: 1 comment 8 replies
@praetoras-del Thank you for reporting and doing such a comprehensive investigation!

It seems peculiar to me that the same GPU appears twice; the Microsoft Direct3D12 entry most likely reports a different UUID, so the deduplication code regards it as a different device.

In these logs you can see that the PCI identifier of the first device is `0000:01:00.0`, while for the second one it's unknown. I researched a bit and it appears that the

To try it, first install the prerequisites for building the Vulkan backend, and then run these commands:

```shell
npm install node-llama-cpp@latest
npx --no node-llama-cpp source download --gpu vulkan --repo giladgd/llama.cpp --release b6795.1
npx --no node-llama-cpp inspect gpu
vulkaninfo
```

Please share the outputs of these commands so I can see whether it works as I expect. I have a few other implementation details that I would want to test to make the fix more robust, but this should be a good first test to see whether this fix is in the right direction.