After quantizing the model using dynamic FP8 quantization, I found that saving the model with save_compressed=True is extremely slow. Why is this slow? What operations are performed during this process? What is the difference compared to setting it to False? Is there a way to improve the saving speed?
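To put numbers on "extremely slow", the two settings can be timed side by side. The following is only a minimal sketch: `time_saves`, `save_fn`, and the output directory names are hypothetical helpers introduced for illustration, and the commented-out usage assumes the quantized model's `save_pretrained` accepts the `save_compressed` keyword as described above.

```python
import time
from typing import Callable, Dict


def time_saves(save_fn: Callable[[str, bool], None], out_dir: str) -> Dict[bool, float]:
    """Call save_fn once with compressed=True and once with compressed=False,
    returning the wall-clock seconds each call took."""
    timings: Dict[bool, float] = {}
    for compressed in (True, False):
        # Use a separate directory per run so the two saves do not overwrite each other.
        target = f"{out_dir}-compressed-{compressed}"
        start = time.perf_counter()
        save_fn(target, compressed)
        timings[compressed] = time.perf_counter() - start
    return timings


# Hypothetical usage, assuming `model` is the already-quantized model:
#
# def save_fn(path: str, compressed: bool) -> None:
#     model.save_pretrained(path, save_compressed=compressed)
#
# print(time_saves(save_fn, "model-fp8"))
```

Reporting both timings, together with the model size and the resulting on-disk size of each directory, would make it easier to tell whether the extra time is spent in the compression step itself or in writing the files.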