
Conversation

@shiyang-weng (Contributor)

Fixes #2896

What we want to do is enable FP8 quantization in PyTorch. Similar to INT8 quantization, this requires inserting quantize and dequantize operations into the computational graph. To reuse the INT8 pattern-matching logic, we need to register FP8 quant and dequant ops.

To address this, we attempted to register the quant ops in #2379, but that PR was reverted in #2672 because it caused a performance regression on H100 GPUs. There is also no need to register q/dq on CUDA.

For these reasons, this PR registers the quant ops specifically for CPU.


pytorch-bot bot commented Sep 9, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2961

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 88d5158 with merge base 3760978:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the "CLA Signed" label (this label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) on Sep 9, 2025
@shiyang-weng shiyang-weng marked this pull request as draft September 9, 2025 03:18
@Xia-Weiwen Xia-Weiwen added the "topic: not user facing" label (use this tag if you don't want this PR to show up in release notes) on Sep 9, 2025
@shiyang-weng shiyang-weng marked this pull request as ready for review September 10, 2025 01:28
@jerryzh168 jerryzh168 requested a review from vkuzo September 11, 2025 00:10
@jerryzh168 (Contributor) left a comment


Seems OK to me. Wondering if @vkuzo has additional thoughts; not sure if there is a better alternative here to support preserving ops for CPU.

@shiyang-weng (Contributor, Author)

@vkuzo Could you help review this PR?

@jerryzh168 (Contributor)

> @vkuzo Could you help review this PR?

Is this urgent? Vasiliy is not available at the moment and will be back next week.

@shiyang-weng (Contributor, Author)

> Is this urgent? Vasiliy is not available at the moment and will be back next week.

Thanks for letting me know. It's not urgent; we can wait until he is back next week.

Successfully merging this pull request may close these issues.

[CPU][FP8][Inductor] How to support fp8 quant for inductor on CPU