@yucai-intel commented Nov 7, 2025

Problem Description:
The XPU backend did not promote the accumulator to a higher-precision type when computing a sum reduction over ComplexHalf inputs. The kernel therefore accumulated in the low-precision type, and its intermediate sums could overflow prematurely.

Specific Modifications and Effects:

  • Modify the reduce_dispatch function: the original dispatch logic did not explicitly promote the accumulator type for ComplexHalf. The fix promotes acc_t to complex at the dispatch entry point, so the reduction runs at the correct precision.
  • Modify the sum_functor specialization: the kernel's underlying implementation may incorrectly downgrade the accumulator type to the low-precision out_t. The fix promotes the out_t parameter of gpu_reduce_kernel to complex, which avoids the downgrade inside the kernel.
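
The effect of the accumulator type can be illustrated with a minimal standalone sketch. This is not the code from this PR: `sum_with_accumulator` and `cancelling_input` are hypothetical helpers, and because standard C++ has no half type, `std::complex<float>` stands in for the narrow type and `std::complex<double>` for the promoted accumulator.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <vector>

// Hypothetical helper (not the torch-xpu-ops API): sums `values` while holding
// the running total in acc_t instead of the element type in_t.
template <typename acc_t, typename in_t>
acc_t sum_with_accumulator(const std::vector<in_t>& values) {
  acc_t acc{};  // value-initialized to (0, 0)
  for (const in_t& v : values) {
    acc += static_cast<acc_t>(v);  // widen each element before adding
  }
  return acc;
}

// Stand-in input: the exact sum is 0, but the partial sum after the first
// two elements (6e38) is larger than the largest finite float (~3.4e38).
inline std::vector<std::complex<float>> cancelling_input() {
  return {{3e38f, 0.0f}, {3e38f, 0.0f}, {-3e38f, 0.0f}, {-3e38f, 0.0f}};
}
```

Calling `sum_with_accumulator<std::complex<float>>(cancelling_input())` accumulates in the narrow type, so on an IEEE 754 platform the real part overflows to infinity after two elements and never recovers. Calling `sum_with_accumulator<std::complex<double>>(cancelling_input())` keeps every partial sum in range and returns zero. The same reasoning applies to ComplexHalf, whose components overflow far sooner.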

@yucai-intel
Contributor Author

For #2008

@yucai-intel changed the title from "Fix complex32 reduce presicion error" to "Prevent complex<Half> accumulation overflow in sum_functor specialization" Nov 7, 2025