
Conversation

Contributor

@xin3he xin3he commented Oct 16, 2025

User description

Type of Change

bug fix


PR Type

Enhancement, Bug fix


Description

  • Added mem_per_param_scale and enable_torch_compile arguments

  • Updated dtype handling for uNVFP4 and NVFP4+

  • Fixed regression issues in dtype mapping and layer configuration

  • Updated README to include enable_torch_compile in example command
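
As a minimal sketch of the new command-line surface (argparse, matching the existing quantize.py style): the mem_per_param_scale definition below is taken from the review hunk later in this thread, while the exact form of enable_torch_compile is an assumption and may differ in the actual diff.

import argparse

parser = argparse.ArgumentParser()
# Taken from the review hunk below: integer scale factor, default 13.
parser.add_argument("--mem_per_param_scale", default=13, type=int, help="memory per param scale factor")
# Assumed to be a boolean switch; the actual definition in quantize.py may differ.
parser.add_argument("--enable_torch_compile", action="store_true", help="whether to enable torch.compile during quantization")
args = parser.parse_args()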


Diagram Walkthrough

flowchart LR
  A["Add mem_per_param_scale"] -- "New argument" --> B["Update dtype handling"]
  B -- "Fix regression" --> C["Add enable_torch_compile"]
  C -- "Update README" --> D["Enhance quantization"]

File Walkthrough

Relevant files

Enhancement
quantize.py: Add mem_per_param_scale and enable_torch_compile

examples/pytorch/nlp/huggingface_models/language-modeling/quantization/mix-precision/quantize.py

  • Added mem_per_param_scale and enable_torch_compile arguments
  • Updated dtype handling for uNVFP4 and NVFP4+
  • Fixed regression issues in dtype mapping and layer configuration
  +27/-27

Documentation
README.md: Update README with enable_torch_compile

examples/pytorch/nlp/huggingface_models/language-modeling/quantization/mix-precision/README.md

  • Updated example command to include enable_torch_compile
  +3/-3

Signed-off-by: He, Xin3 <[email protected]>
@PRAgent4INC
Collaborator

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Command Consistency

The command for running the script with deepspeed has different node specifications (localhost:0,1,2,3 vs localhost:4,5,6,7). Ensure these are consistent or clarify the reason for the difference.

deepspeed --include="localhost:0,1,2,3" --master_port=29500 quantize.py  \
    --model_name_or_path meta-llama/Llama-3.3-70B-Instruct/ \

@PRAgent4INC
Collaborator

PR Code Suggestions ✨

@chensuyue chensuyue added this to the 3.6 milestone Oct 17, 2025
@xin3he xin3he requested review from XuehaoSun and thuang6 October 20, 2025 03:43
Contributor

@thuang6 thuang6 left a comment

Better to change the AR dependency to the pip-released v0.8 version after AR v0.8 is released.

parser.add_argument("--device_map", type=str, default=None, help="device map for model")
parser.add_argument("--use_recipe", action="store_true", help="whether to use recipe to quantize model")
parser.add_argument("--recipe_file", type=str, default="recipes/Meta-Llama-3.1-8B-Instruct_6bits.json", help="path of recipe file")
parser.add_argument("--mem_per_param_scale", default=13, type=int, help="memory per param scale factor")
Contributor

I don't see this arg used in the example. Is it added for further tuning consideration? Is there any guideline on how a user should set the value?

Contributor Author

Yes, it's for the Llama 3.3 70B pipeline-parallel case. It's added in case a user wants to run 70B without TP.
It's not the suggested way; the suggested way is to use the main branch with my compile fix, so I intend not to introduce it in the example.
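
For illustration, a minimal sketch of how the two flags might be wired into the quantizer. The AutoRound keyword names below simply mirror the CLI flags and are an assumption, not something confirmed by this PR; the actual wiring in quantize.py may differ.

from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound  # assumed import path for the AR dependency

model_name = "meta-llama/Llama-3.3-70B-Instruct"  # model used in the README example
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

autoround = AutoRound(
    model,
    tokenizer,
    # Assumed kwarg: scales the per-parameter memory estimate used when scheduling
    # block-wise tuning, which is why it matters for running 70B without TP.
    mem_per_param_scale=13,
    # Assumed kwarg: runs the tuning loop through torch.compile when enabled.
    enable_torch_compile=True,
)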

@xin3he xin3he requested a review from thuang6 October 20, 2025 07:32
Contributor

@thuang6 thuang6 left a comment

The merge can wait until the binary is published.
