Commit 1241e27

Update documentation for release/0.9
1 parent 447248e commit 1241e27

File tree

56 files changed (+7320 / -887 lines)

docs/source/_ext/quark_jupyter_notebook_build.py

Lines changed: 1 addition & 1 deletion
@@ -23,7 +23,7 @@ def update_jupyter_notebook_toc_placeholder(app, docname, source):
     jupyter_notebook_index_rst = os.path.join('source', 'jupyter_notebook_index.rst_')
     if "READTHEDOCS" in os.environ:
-        READTHEDOCS_REPOSITORY_PATH= os.environ.get("READTHEDOCS_REPOSITORY_PATH")
+        READTHEDOCS_REPOSITORY_PATH = os.environ.get("READTHEDOCS_REPOSITORY_PATH")
         jupyter_notebook_index_rst = os.path.join(READTHEDOCS_REPOSITORY_PATH, 'docs', 'source', 'jupyter_notebook_index.rst_')
     jupyter_notebook_toc_placeholder = '@quark_jupyter_notebook_toc_placeholder@'
     with open(jupyter_notebook_index_rst, 'r') as f:
Binary file not shown (-71.2 KB).

docs/source/_static/quant/fx_mode_quant/yolo_nas/1_original_fp32_train_stage.png

File mode changed: 100644 → 100755.

docs/source/_static/quant/fx_mode_quant/yolo_nas/2_folded_fp32_validation_stage.png

File mode changed: 100644 → 100755.

docs/source/_static/quant/fx_mode_quant/yolo_nas/3_quant_scope.png

File mode changed: 100644 → 100755.
Lines changed: 21 additions & 0 deletions
quark
=====

.. py:module:: quark

.. autoapi-nested-parse::

   **Quark** is a comprehensive cross-platform toolkit designed to simplify and
   enhance the quantization of deep learning models. Supporting both PyTorch and
   ONNX models, Quark empowers developers to optimize their models for deployment
   on a wide range of hardware backends, achieving significant performance gains
   without compromising accuracy.

   For further details on the features and capabilities of Quark, please refer to the following:

   * [Documentation](https://quark.docs.amd.com)
   * [PyTorch examples](https://quark.docs.amd.com/latest/pytorch/pytorch_examples.html)
   * [ONNX examples](https://quark.docs.amd.com/latest/onnx/onnx_examples.html)

docs/source/conf.py

Lines changed: 2 additions & 0 deletions
@@ -177,6 +177,8 @@ def setup(app):
 # directories to ignore when looking for source files.
 # This patterns also effect to html_static_path and html_extra_path
 exclude_patterns = ['include', 'api_rst', '_build', 'Thumbs.db', '.DS_Store', '**.ipynb_checkpoints']
+exclude_patterns.append('*autoapi/quark/index.rst')
+
 nitpicky = True

 # The name of the Pygments (syntax highlighting) style to use.

docs/source/index.rst

Lines changed: 1 addition & 1 deletion
@@ -131,7 +131,7 @@ Key Features

 .. toctree::
    :hidden:
-   :caption: Reference API
+   :caption: APIs
    :maxdepth: 1

    PyTorch APIs <autoapi/pytorch_apis>

docs/source/onnx/appendix_full_quant_config_features.rst

Lines changed: 16 additions & 4 deletions
@@ -539,10 +539,18 @@ Quantization Configuration
     the input data is from the float module fully, 1 represents all
     from the quantized module. The default value is 1.
   - **MemOptLevel**: (Int) Specifies the level of memory optimization.
-    Options are 0 and 1. If 0, it means no memory optimization is applied,
-    which will be faster but requires more memory for caching. If 1, it
-    caches the ground-truth for finetuning layer by layer instead of all,
-    which consumes less memory but may take longer time. The default is 1.
+    Options are 0, 1, and 2. Setting it to 0 disables optimization,
+    making training faster but using more memory for caching.
+    Setting it to 1 caches data one layer at a time, reducing memory
+    usage at the cost of longer training times. Setting it to 2
+    saves layer data to a cache directory on disk and loads only
+    one batch at a time, greatly lowering memory consumption but further
+    increasing training time. The default value is 1.
+  - **CacheDir**: (String) Specifies the directory used to cache
+    intermediate files during fine-tuning. This option is only effective
+    when MemOptLevel is set to 2. Note that after fine-tuning, some
+    intermediate files may remain in this directory. The default value
+    is None, in which case a temporary directory will be used for caching.
   - **LogPeriod**: (Int) Indicate how many iterations to print the
     log once. The default value is NumIterations/10.

@@ -672,6 +680,10 @@ Quantization Configuration
   - **Bits**: (int) The target bits to quantize. Only 4b quantization is supported for inference, additional bits support is planned.
   - **AccuracyLevel**: (int) The quantization level of input, can be: 0(unset), 1(fp32), 2(fp16), 3(bf16), or 4(int8). The default is 0.

+* **EncryptionAlgorithm**: (String) Specifies the encryption algorithm for crypto mode;
+  only the "AES-256" algorithm is currently supported. The default value is None, which means
+  no intermediate models/files are saved to disk in crypto mode.
+
 Table 7. Quantize Types can be selected for different Quantize Formats

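In practice, the new MemOptLevel and CacheDir options are passed through the
fast-finetune extra options. A minimal, hedged sketch (names follow the public
Quark ONNX API, but exact import paths and option placement may vary between
releases, so treat them as assumptions):

::

   # Hedged sketch: enabling the memory options described above for fast
   # finetuning (AdaQuant).
   from quark.onnx import ModelQuantizer
   from quark.onnx.quantization.config import Config, get_default_config

   quant_config = get_default_config("BFP16_ADAQUANT")
   ft_options = quant_config.extra_options.setdefault("FastFinetune", {})
   ft_options["MemOptLevel"] = 2          # lowest memory use, slowest training
   ft_options["CacheDir"] = "./ft_cache"  # read only when MemOptLevel == 2

   quantizer = ModelQuantizer(Config(global_quant_config=quant_config))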
Lines changed: 219 additions & 0 deletions
Block Floating Point (BFP) Example
==================================

.. note::

   For information on accessing Quark ONNX examples, refer to :doc:`Accessing ONNX Examples <onnx_examples>`.
   This example and the relevant files are available at ``/onnx/accuracy_improvement/BFP``.

This is an example of quantizing a `mobilenetv2_050.lamb_in1k` model using the ONNX quantizer of Quark with BFP16.
Int8 quantization performs poorly on this model, but BFP16 and ADAQUANT can significantly mitigate the quantization loss.

Block Floating Point (BFP) quantization reduces computational complexity by grouping numbers to share a common exponent, preserving accuracy efficiently.
BFP offers both reduced storage requirements and high quantization precision. A minimal sketch of the idea follows.
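As a rough illustration of the shared-exponent idea (the mantissa width and
the treatment of the whole array as a single block are illustrative
assumptions, not Quark's exact BFP16 parameters):

::

   # Toy NumPy sketch of block floating point: every value in a block shares
   # the exponent of the largest magnitude, and each keeps a small signed
   # mantissa. Values far below the block maximum lose precision.
   import numpy as np

   def bfp_quantize_block(block, mantissa_bits=8):
       shared_exp = np.floor(np.log2(np.max(np.abs(block)) + 1e-30))
       scale = 2.0 ** (shared_exp - (mantissa_bits - 1))
       qmax = 2 ** (mantissa_bits - 1) - 1
       mantissas = np.clip(np.round(block / scale), -qmax - 1, qmax)
       return mantissas * scale  # dequantized values sharing one exponent

   x = np.array([0.52, -0.013, 0.0009, 0.25], dtype=np.float32)
   print(bfp_quantize_block(x))  # the tiny 0.0009 rounds to 0 at this scale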
The example has the following parts:

- `Pip requirements <#pip-requirements>`__
- `Prepare model <#prepare-model>`__
- `Prepare data <#prepare-data>`__
- `BFP16 Quantization <#bfp16-quantization>`__
- `BFP16 Quantization with ADAQUANT <#bfp16-quantization-with-adaquant>`__
- `Evaluation <#evaluation>`__
Pip requirements
----------------

Install the necessary Python packages:

::

   python -m pip install -r ../utils/requirements.txt
Prepare model
-------------

Export the ONNX model from the mobilenetv2_050.lamb_in1k torch model. The corresponding model link is https://huggingface.co/timm/mobilenetv2_050.lamb_in1k:

::

   mkdir models && python ../utils/export_onnx.py mobilenetv2_050.lamb_in1k
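For reference, the export step is roughly equivalent to the following sketch
(an assumption about what ``../utils/export_onnx.py`` does internally; the
shipped helper may differ):

::

   # Hypothetical export sketch using timm + torch.onnx.export; the input
   # name and dynamic batch axis match the fixed-shape conversion used later.
   import timm
   import torch

   model = timm.create_model("mobilenetv2_050.lamb_in1k", pretrained=True).eval()
   dummy = torch.randn(1, 3, 224, 224)
   torch.onnx.export(model, dummy, "models/mobilenetv2_050.lamb_in1k.onnx",
                     input_names=["input"], output_names=["output"],
                     dynamic_axes={"input": {0: "batch"}})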
Prepare data
------------

ILSVRC 2012, commonly known as 'ImageNet', is the most commonly used subset of
ImageNet. It spans 1000 object classes and contains 50,000 validation images.

If you already have an ImageNet dataset, you can use your dataset path
directly.

To prepare the test data, please check the download section of the main
website: https://huggingface.co/datasets/imagenet-1k/tree/main/data. You
need to register and download **val_images.tar.gz**.
Then, create the validation dataset and calibration dataset:

::

   mkdir val_data && tar -xzf val_images.tar.gz -C val_data
   python ../utils/prepare_data.py val_data calib_data
The val_data directory of the ImageNet dataset is organized as follows:

- val_data

  - n01440764

    - ILSVRC2012_val_00000293.JPEG
    - ILSVRC2012_val_00002138.JPEG
    - …

  - n01443537

    - ILSVRC2012_val_00000236.JPEG
    - ILSVRC2012_val_00000262.JPEG
    - …

  - …
  - n15075141

    - ILSVRC2012_val_00001079.JPEG
    - ILSVRC2012_val_00002663.JPEG
    - …
The calib_data directory of the ImageNet dataset is organized as follows;
a sketch of one way to build it appears after this list:

- calib_data

  - n01440764

    - ILSVRC2012_val_00000293.JPEG

  - n01443537

    - ILSVRC2012_val_00000236.JPEG

  - …
  - n15075141

    - ILSVRC2012_val_00001079.JPEG
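A single image per class is used for calibration here. A hypothetical sketch
of what ``../utils/prepare_data.py`` might do (the shipped script may differ
in selection strategy and image count):

::

   # Hypothetical sketch: copy the first image of each class from val_data
   # into calib_data to build a small calibration set.
   import os
   import shutil
   import sys

   val_dir, calib_dir = sys.argv[1], sys.argv[2]
   for cls in sorted(os.listdir(val_dir)):
       images = sorted(os.listdir(os.path.join(val_dir, cls)))
       if images:
           os.makedirs(os.path.join(calib_dir, cls), exist_ok=True)
           shutil.copy(os.path.join(val_dir, cls, images[0]),
                       os.path.join(calib_dir, cls, images[0]))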
BFP16 Quantization
------------------

The quantizer takes the float model and produces a BFP16 quantized model.

::

   python quantize_model.py --model_name mobilenetv2_050.lamb_in1k \
                            --input_model_path models/mobilenetv2_050.lamb_in1k.onnx \
                            --output_model_path models/mobilenetv2_050.lamb_in1k_quantized.onnx \
                            --calibration_dataset_path calib_data \
                            --config BFP16

This command generates a BFP16 quantized model under the **models** folder.
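Under the hood, the helper script builds a configuration from the named preset
and invokes the quantizer. A hedged sketch based on the public Quark ONNX API
(the bundled quantize_model.py also parses CLI arguments and builds a
calibration data reader from calib_data; import paths may differ between
releases):

::

   # Hedged sketch of the programmatic equivalent; not the bundled script.
   from quark.onnx import ModelQuantizer
   from quark.onnx.quantization.config import Config, get_default_config

   quant_config = get_default_config("BFP16")
   quantizer = ModelQuantizer(Config(global_quant_config=quant_config))
   quantizer.quantize_model("models/mobilenetv2_050.lamb_in1k.onnx",
                            "models/mobilenetv2_050.lamb_in1k_quantized.onnx",
                            None)  # calibration data reader omitted in this sketch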
BFP16 Quantization with ADAQUANT
--------------------------------

The quantizer takes the float model and produces a BFP16 quantized model with
ADAQUANT.

Note: If the model has dynamic shapes, you need to convert the model to fixed shapes before performing ADAQUANT:

::

   python -m quark.onnx.tools.convert_dynamic_to_fixed --fix_shapes 'input:[1,3,224,224]' models/mobilenetv2_050.lamb_in1k.onnx models/mobilenetv2_050.lamb_in1k_fix.onnx

::

   python quantize_model.py --model_name mobilenetv2_050.lamb_in1k \
                            --input_model_path models/mobilenetv2_050.lamb_in1k_fix.onnx \
                            --output_model_path models/mobilenetv2_050.lamb_in1k_adaquant_quantized.onnx \
                            --calibration_dataset_path calib_data \
                            --config BFP16_ADAQUANT

If a GPU is available in your environment, you can accelerate the training process by setting the 'device' parameter to 'rocm' or 'cuda':

::

   python quantize_model.py --model_name mobilenetv2_050.lamb_in1k \
                            --input_model_path models/mobilenetv2_050.lamb_in1k_fix.onnx \
                            --output_model_path models/mobilenetv2_050.lamb_in1k_adaquant_quantized.onnx \
                            --calibration_dataset_path calib_data \
                            --config BFP16_ADAQUANT \
                            --device cuda

This command generates a BFP16 quantized model with ADAQUANT under the **models** folder.
Evaluation
----------

Test the accuracy of the float model on the ImageNet val dataset:

::

   python ../utils/onnx_validate.py val_data --model-name mobilenetv2_050.lamb_in1k --batch-size 1 --onnx-input models/mobilenetv2_050.lamb_in1k.onnx

Test the accuracy of the BFP16 quantized model on the ImageNet val dataset:

::

   python ../utils/onnx_validate.py val_data --model-name mobilenetv2_050.lamb_in1k --batch-size 1 --onnx-input models/mobilenetv2_050.lamb_in1k_quantized.onnx

If you want to run faster with GPU support, you can also execute the following command:

::

   python ../utils/onnx_validate.py val_data --model-name mobilenetv2_050.lamb_in1k --batch-size 1 --onnx-input models/mobilenetv2_050.lamb_in1k_quantized.onnx --gpu

Test the accuracy of the BFP16 quantized model with ADAQUANT on the ImageNet val dataset:

::

   python ../utils/onnx_validate.py val_data --model-name mobilenetv2_050.lamb_in1k --batch-size 1 --onnx-input models/mobilenetv2_050.lamb_in1k_adaquant_quantized.onnx

If you want to run faster with GPU support, you can also execute the following command:

::

   python ../utils/onnx_validate.py val_data --model-name mobilenetv2_050.lamb_in1k --batch-size 1 --onnx-input models/mobilenetv2_050.lamb_in1k_adaquant_quantized.onnx --gpu
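Before the full runs, you can smoke-test the exported float model with ONNX
Runtime (illustrative only; the BFP16 quantized models contain Quark custom
operators, so their evaluation goes through the helper script, which registers
Quark's custom-op library):

::

   # Smoke test of the float model; a random tensor stands in for a
   # preprocessed image batch, so the predicted class is meaningless here.
   import numpy as np
   import onnxruntime as ort

   session = ort.InferenceSession(
       "models/mobilenetv2_050.lamb_in1k.onnx",
       providers=["CPUExecutionProvider"])
   input_name = session.get_inputs()[0].name
   x = np.random.rand(1, 3, 224, 224).astype(np.float32)
   logits = session.run(None, {input_name: x})[0]
   print("output shape:", logits.shape)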
Quantization Results
--------------------

+------------+-------------+------------------+-----------------+
|            | Float Model | Quantized Model  | Quantized Model |
|            |             | without ADAQUANT | with ADAQUANT   |
+============+=============+==================+=================+
| Model Size | 8.7 MB      | 8.4 MB           | 8.4 MB          |
+------------+-------------+------------------+-----------------+
| Prec@1     | 65.424 %    | 60.806 %         | 64.652 %        |
+------------+-------------+------------------+-----------------+
| Prec@5     | 85.788 %    | 82.648 %         | 85.278 %        |
+------------+-------------+------------------+-----------------+

.. note:: Different execution devices can lead to minor variations in the
   accuracy of the quantized model.
.. raw:: html

   <!--
   ## License
   Copyright (C) 2024, Advanced Micro Devices, Inc. All rights reserved. SPDX-License-Identifier: MIT
   -->
