Block Floating Point (BFP) Example
==================================

.. note::

   For information on accessing Quark ONNX examples, refer to :doc:`Accessing ONNX Examples <onnx_examples>`.
   This example and the relevant files are available at ``/onnx/accuracy_improvement/BFP``.

This is an example of quantizing a ``mobilenetv2_050.lamb_in1k`` model using the ONNX quantizer of Quark with BFP16.
Int8 quantization performs poorly on this model, but BFP16 and ADAQUANT can significantly mitigate the quantization loss.

Block Floating Point (BFP) quantization reduces computational complexity by grouping numbers to share a common exponent, preserving accuracy efficiently.
BFP thus combines reduced storage requirements with high quantization precision.
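
The following is a minimal NumPy sketch of BFP fake-quantization, included here only to make the representation concrete. It assumes an 8-bit signed mantissa per element and one shared exponent per block of eight values, a common BFP16 layout; the exact block size and mantissa width used by Quark's ``BFP16`` configuration may differ.

::

   import numpy as np

   def bfp_quantize_dequantize(block, mantissa_bits=8):
       """Fake-quantize a 1-D block: one shared exponent for the whole
       block, one signed `mantissa_bits`-bit mantissa per element."""
       max_abs = np.max(np.abs(block))
       if max_abs == 0:
           return np.zeros_like(block)
       shared_exp = np.floor(np.log2(max_abs))          # exponent of the largest element
       lsb = 2.0 ** (shared_exp - (mantissa_bits - 2))  # value of one mantissa step
       qmax = 2 ** (mantissa_bits - 1) - 1              # 127 for 8-bit mantissas
       mantissa = np.clip(np.round(block / lsb), -qmax - 1, qmax)
       return mantissa * lsb

   block = np.array([0.91, -0.12, 0.003, 0.44, -0.78, 0.05, 0.31, -0.66])
   print(bfp_quantize_dequantize(block))  # large values keep fine resolution; tiny ones are coarsened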

The example has the following parts:

- `Pip requirements <#pip-requirements>`__
- `Prepare model <#prepare-model>`__
- `Prepare data <#prepare-data>`__
- `BFP16 Quantization <#bfp16-quantization>`__
- `BFP16 Quantization with ADAQUANT <#bfp16-quantization-with-adaquant>`__
- `Evaluation <#evaluation>`__
- `Quantization Results <#quantization-results>`__

Pip requirements
----------------

Install the necessary Python packages:

::

   python -m pip install -r ../utils/requirements.txt


Prepare model
-------------

Export an ONNX model from the ``mobilenetv2_050.lamb_in1k`` PyTorch model. The model is available at https://huggingface.co/timm/mobilenetv2_050.lamb_in1k:

::

   mkdir models && python ../utils/export_onnx.py mobilenetv2_050.lamb_in1k
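
For reference, the export step amounts to roughly the following. This is a hedged sketch, not the bundled ``export_onnx.py``; the input/output tensor names are assumptions for illustration:

::

   import timm
   import torch

   # Load the pretrained timm model and export it to ONNX.
   model = timm.create_model("mobilenetv2_050.lamb_in1k", pretrained=True).eval()
   dummy = torch.randn(1, 3, 224, 224)
   torch.onnx.export(
       model, dummy, "models/mobilenetv2_050.lamb_in1k.onnx",
       input_names=["input"], output_names=["output"],
       dynamic_axes={"input": {0: "batch"}},  # dynamic batch; fixed later for ADAQUANT
   )
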
Prepare data
------------

ILSVRC 2012, commonly known as 'ImageNet', is the most widely used
subset of the ImageNet dataset. It spans 1,000 object classes and
contains 50,000 validation images.

If you already have an ImageNet dataset, you can use your dataset
path directly.

To prepare the test data, check the download section of the main
website: https://huggingface.co/datasets/imagenet-1k/tree/main/data. You
need to register and download **val_images.tar.gz**.

Then, create the validation dataset and calibration dataset:

::

   mkdir val_data && tar -xzf val_images.tar.gz -C val_data
   python ../utils/prepare_data.py val_data calib_data

The val_data directory of the ImageNet dataset is organized as
follows:

- val_data

  - n01440764

    - ILSVRC2012_val_00000293.JPEG
    - ILSVRC2012_val_00002138.JPEG
    - …

  - n01443537

    - ILSVRC2012_val_00000236.JPEG
    - ILSVRC2012_val_00000262.JPEG
    - …

  - …
  - n15075141

    - ILSVRC2012_val_00001079.JPEG
    - ILSVRC2012_val_00002663.JPEG
    - …

The calib_data directory of the ImageNet dataset is organized as
follows:

- calib_data

  - n01440764

    - ILSVRC2012_val_00000293.JPEG

  - n01443537

    - ILSVRC2012_val_00000236.JPEG

  - …
  - n15075141

    - ILSVRC2012_val_00001079.JPEG

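As the layout above suggests, the calibration set keeps one image per class. A hypothetical sketch of that selection logic follows; the bundled ``prepare_data.py`` may select or preprocess differently:

::

   import os
   import shutil

   # Copy the first image of every class folder into the calibration set,
   # preserving the class-folder layout shown above.
   def make_calib_set(val_dir="val_data", calib_dir="calib_data"):
       for cls in sorted(os.listdir(val_dir)):
           images = sorted(os.listdir(os.path.join(val_dir, cls)))
           if not images:
               continue
           os.makedirs(os.path.join(calib_dir, cls), exist_ok=True)
           shutil.copy(os.path.join(val_dir, cls, images[0]),
                       os.path.join(calib_dir, cls, images[0]))

   make_calib_set()
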
BFP16 Quantization
------------------

The quantizer takes the float model and produces a BFP16 quantized model.

::

   python quantize_model.py --model_name mobilenetv2_050.lamb_in1k \
       --input_model_path models/mobilenetv2_050.lamb_in1k.onnx \
       --output_model_path models/mobilenetv2_050.lamb_in1k_quantized.onnx \
       --calibration_dataset_path calib_data \
       --config BFP16

This command generates a BFP16 quantized model under the **models**
folder, quantized with the BFP16 configuration.
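
Internally, ``quantize_model.py`` drives the Quark ONNX API along roughly the following lines. This is a condensed sketch based on the public Quark ONNX interface; the module paths and the construction of ``calib_reader`` (an ``onnxruntime.quantization.CalibrationDataReader`` over ``calib_data``) are assumptions, so consult the script itself for the exact usage:

::

   from quark.onnx import ModelQuantizer
   from quark.onnx.quantization.config import Config, get_default_config

   # Build the preset BFP16 configuration and run post-training quantization.
   quant_config = Config(global_quant_config=get_default_config("BFP16"))
   quantizer = ModelQuantizer(quant_config)
   quantizer.quantize_model("models/mobilenetv2_050.lamb_in1k.onnx",
                            "models/mobilenetv2_050.lamb_in1k_quantized.onnx",
                            calib_reader)  # reader that yields calib_data batches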

BFP16 Quantization with ADAQUANT
--------------------------------

The quantizer takes the float model and produces a BFP16 quantized model with
ADAQUANT, a post-training algorithm that fine-tunes the quantized weights
layer by layer to minimize the output error relative to the float model.

.. note:: If the model has dynamic shapes, you need to convert the model to fixed shapes before performing ADAQUANT.

::

   python -m quark.onnx.tools.convert_dynamic_to_fixed --fix_shapes 'input:[1,3,224,224]' models/mobilenetv2_050.lamb_in1k.onnx models/mobilenetv2_050.lamb_in1k_fix.onnx
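
To confirm the conversion worked, you can inspect the model's input with the ``onnx`` package (an optional check; it assumes the input tensor is the first graph input):

::

   import onnx

   # The first graph input should now report a fixed 1x3x224x224 shape.
   model = onnx.load("models/mobilenetv2_050.lamb_in1k_fix.onnx")
   print(model.graph.input[0].type.tensor_type.shape)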

Then run ADAQUANT quantization on the fixed-shape model:

::

   python quantize_model.py --model_name mobilenetv2_050.lamb_in1k \
       --input_model_path models/mobilenetv2_050.lamb_in1k_fix.onnx \
       --output_model_path models/mobilenetv2_050.lamb_in1k_adaquant_quantized.onnx \
       --calibration_dataset_path calib_data \
       --config BFP16_ADAQUANT

If a GPU is available in your environment, you can accelerate the training process by setting the ``--device`` parameter to ``rocm`` or ``cuda``.

::

   python quantize_model.py --model_name mobilenetv2_050.lamb_in1k \
       --input_model_path models/mobilenetv2_050.lamb_in1k_fix.onnx \
       --output_model_path models/mobilenetv2_050.lamb_in1k_adaquant_quantized.onnx \
       --calibration_dataset_path calib_data \
       --config BFP16_ADAQUANT \
       --device cuda

This command generates a BFP16 quantized model under the **models**
folder, quantized with the BFP16 configuration plus ADAQUANT.

Evaluation
----------

Test the accuracy of the float model on the ImageNet validation dataset:

::

   python ../utils/onnx_validate.py val_data --model-name mobilenetv2_050.lamb_in1k --batch-size 1 --onnx-input models/mobilenetv2_050.lamb_in1k.onnx
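
The validation script essentially runs the model with ONNX Runtime over the validation set and accumulates top-1/top-5 accuracy. A minimal single-batch sketch (not the bundled ``onnx_validate.py``; the random tensor stands in for a preprocessed image):

::

   import numpy as np
   import onnxruntime as ort

   # Use "CUDAExecutionProvider" here to mirror the --gpu flag below.
   sess = ort.InferenceSession("models/mobilenetv2_050.lamb_in1k.onnx",
                               providers=["CPUExecutionProvider"])
   x = np.random.rand(1, 3, 224, 224).astype(np.float32)  # stand-in input
   logits = sess.run(None, {sess.get_inputs()[0].name: x})[0]
   print(np.argsort(logits[0])[::-1][:5])  # indices of the top-5 classes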

Test the accuracy of the BFP16 quantized model on the ImageNet
validation dataset:

::

   python ../utils/onnx_validate.py val_data --model-name mobilenetv2_050.lamb_in1k --batch-size 1 --onnx-input models/mobilenetv2_050.lamb_in1k_quantized.onnx

If you want to run faster with GPU support, you can also execute the following command:

::

   python ../utils/onnx_validate.py val_data --model-name mobilenetv2_050.lamb_in1k --batch-size 1 --onnx-input models/mobilenetv2_050.lamb_in1k_quantized.onnx --gpu

Test the accuracy of the BFP16 quantized model with ADAQUANT on the ImageNet
validation dataset:

::

   python ../utils/onnx_validate.py val_data --model-name mobilenetv2_050.lamb_in1k --batch-size 1 --onnx-input models/mobilenetv2_050.lamb_in1k_adaquant_quantized.onnx

If you want to run faster with GPU support, you can also execute the following command:

::

   python ../utils/onnx_validate.py val_data --model-name mobilenetv2_050.lamb_in1k --batch-size 1 --onnx-input models/mobilenetv2_050.lamb_in1k_adaquant_quantized.onnx --gpu

Quantization Results
--------------------

+------------+-------------+-------------------+-----------------+
|            | Float Model | Quantized Model   | Quantized Model |
|            |             | without ADAQUANT  | with ADAQUANT   |
+============+=============+===================+=================+
| Model Size | 8.7 MB      | 8.4 MB            | 8.4 MB          |
+------------+-------------+-------------------+-----------------+
| Prec@1     | 65.424 %    | 60.806 %          | 64.652 %        |
+------------+-------------+-------------------+-----------------+
| Prec@5     | 85.788 %    | 82.648 %          | 85.278 %        |
+------------+-------------+-------------------+-----------------+

.. note:: Different execution devices can lead to minor variations in the
   accuracy of the quantized model.


.. raw:: html

   <!--
   ## License
   Copyright (C) 2024, Advanced Micro Devices, Inc. All rights reserved. SPDX-License-Identifier: MIT
   -->