Replies: 3 comments
-
I also encountered the same issue. After quantization, the partition fails.
-
@kimishpatel @metascroy Do you have any suggestions on how to handle this?
-
Have you looked at https://github.com/pytorch/pytorch/blob/main/torch/ao/quantization/quantizer/composable_quantizer.py#L17?
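For reference, the linked `ComposableQuantizer` chains several child quantizers over the same graph, each annotating only the nodes it recognizes. A minimal pure-Python sketch of that dispatch pattern (the classes below are illustrative stand-ins, not the real `torch.ao.quantization` API; the backend names and op sets are hypothetical):

```python
# Illustrative sketch of the ComposableQuantizer dispatch pattern:
# each child quantizer annotates only the nodes it recognizes, in order,
# and later quantizers skip nodes that are already annotated.

class Node:
    def __init__(self, op):
        self.op = op
        self.annotation = None  # backend-specific quant config would go here

class OpSetQuantizer:
    """Stand-in for a real Quantizer: claims nodes whose op is in its set."""
    def __init__(self, name, supported_ops):
        self.name = name
        self.supported_ops = supported_ops

    def annotate(self, graph):
        for node in graph:
            if node.annotation is None and node.op in self.supported_ops:
                node.annotation = self.name

class ComposableQuantizer:
    """Chains quantizers; list order decides priority on overlapping op support."""
    def __init__(self, quantizers):
        self.quantizers = quantizers

    def annotate(self, graph):
        for q in self.quantizers:
            q.annotate(graph)
        return graph

# Hypothetical split: NPU-A takes conv, NPU-B takes matmul, CPU takes the rest.
graph = [Node(op) for op in ["conv2d", "matmul", "add", "conv2d", "relu"]]
composed = ComposableQuantizer([
    OpSetQuantizer("npu_a", {"conv2d"}),
    OpSetQuantizer("npu_b", {"matmul"}),
    OpSetQuantizer("xnnpack", {"add", "relu"}),
])
composed.annotate(graph)
print([n.annotation for n in graph])  # per-node backend annotation
```

Because annotation happens per node rather than per partition, each backend's quantization scheme is already attached to its ops before any partitioning takes place.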
-
My team aims to develop a heterogeneous inference framework using ExecuTorch, but we are currently grappling with challenges in heterogeneous quantization.
Consider this scenario: For a single model, we plan to simultaneously delegate computations to NPU-A, NPU-B, and CPU backends. The CPU will utilize XNNPACKQuantizer, while NPU-A and NPU-B require custom quantization algorithms. How can we apply these three distinct quantization methods to their respective partitions before the graph is partitioned?
Based on my understanding of ExecuTorch's workflow (quantization at ATen IR → partitioning at Edge IR), suppose the graph is partitioned into:
P1 (executed on NPU-A)
P2 (executed on NPU-B)
P3 (executed on CPU via XNNPACK)
How can we ensure that the NPU-A-specific quantization algorithm is applied to P1 during the ATen IR quantization stage, given that partitioning only occurs later at the Edge IR stage?
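One way to reconcile the two stages is to make each backend's quantizer and partitioner agree on the same op set: the quantizer tags a node at ATen IR, and that backend's partitioner later claims nodes carrying a matching tag at Edge IR. A schematic sketch under that assumption (plain Python, not the ExecuTorch API; the op-to-backend mapping is hypothetical):

```python
from collections import defaultdict

# Schematic: partitioning driven by the annotations left at quantization time.
# Each node carries the backend tag its quantizer assigned at ATen IR;
# the "partitioner" here simply groups nodes by that tag into P1/P2/P3.

def partition_by_annotation(annotated_graph):
    """annotated_graph: list of (op, backend_tag) pairs, tags set at quantization."""
    partitions = defaultdict(list)
    for op, tag in annotated_graph:
        partitions[tag].append(op)
    return dict(partitions)

# Hypothetical graph after all three quantizers have annotated it.
graph = [
    ("conv2d", "npu_a"),    # -> P1
    ("matmul", "npu_b"),    # -> P2
    ("add", "xnnpack"),     # -> P3
    ("conv2d", "npu_a"),    # -> P1
]
print(partition_by_annotation(graph))
```

Under this scheme the quantization decision never depends on the partition boundaries; the partition boundaries fall out of the quantization annotations, so the ordering of the two stages is no longer a problem.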