doc update for hgq/da #1359
@@ -0,0 +1,32 @@

======================
Distributed Arithmetic
======================

.. image:: https://img.shields.io/badge/License-LGPLv3-blue.svg
   :target: https://www.gnu.org/licenses/lgpl-3.0.en.html
.. image:: https://badge.fury.io/py/da4ml.svg
   :target: https://badge.fury.io/py/da4ml
.. image:: https://img.shields.io/badge/arXiv-2507.04535-b31b1b.svg
   :target: https://arxiv.org/abs/2507.04535

Distributed Arithmetic (DA) is a strategy for constant-matrix-vector multiplication (CMVM) operations used in hls4ml. The implementation is provided by an external library, `da4ml <https://github.com/calad0i/da4ml>`__. The library transforms the CMVM operations into an adder graph with common subexpression elimination to reduce the overall complexity. As the CMVM operation is fully unrolled, ``reuse_factor`` **must** be 1 (the default) for the corresponding CMVM operations [*]_. Compared to the traditional ``Latency`` strategy CMVM kernels, DA can usually reduce LUT usage by up to 30% and eliminate all DSP usage.

.. rst-class:: light
.. image:: _static/da4ml-workflow.svg
   :alt: Workflow of DA in hls4ml
   :width: 600

When the DA strategy is used, the CMVM operations will be implemented bit-exactly, and the accumulator precision setting will not be used.

.. [*] Not to be confused with ``II=1``. ``reuse_factor`` is the ``II`` for one CMVM operation, not one layer. One layer may invoke the same CMVM kernel multiple times and thus have ``II>1`` while each CMVM operation is unrolled, e.g., convolution layers with more than one partition.

Currently, the DA strategy is only available for the Vivado/Vitis HLS backends. The following layers are supported:

* Dense
* Convolutional (1D, 2D)
* EinsumDense
* Multi-head attention (implemented as multiple EinsumDense layers)

While possible, the RNN layers are not yet supported by the DA strategy.
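A minimal sketch of enabling DA from the Python API is shown below. The strategy name ``distributed_arithmetic`` and the part number are assumptions for illustration; consult the da4ml documentation for the exact strategy string supported by your hls4ml version.

.. code-block:: Python

    import hls4ml

    # Sketch: select the DA strategy for the whole model (assumed strategy name).
    # reuse_factor is left at its default of 1, as required for DA.
    config = hls4ml.utils.config_from_keras_model(model, granularity='name')
    config['Model']['Strategy'] = 'distributed_arithmetic'

    hls_model = hls4ml.converters.convert_from_keras_model(
        model,
        hls_config=config,
        backend='Vitis',  # DA is only available for the Vivado/Vitis HLS backends
        output_dir='da_prj',
        part='xcvu13p-flga2577-2-e',
    )
    hls_model.compile()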
For more details, please refer to the `da4ml repository <https://github.com/calad0i/da4ml>`__ or the `paper <https://arxiv.org/abs/2507.04535>`__.
@@ -18,22 +18,22 @@

To implement a custom layer in ``hls4ml`` with the extension API, the required components are:

* Function config template
* Registration of layer, source code, and templates

Complete example for Keras v2
=============================

For concreteness, let's say our custom layer ``KReverse`` is implemented in Keras v2 and reverses the order of the last dimension of the input.

.. code-block:: Python

    # Keras implementation of a custom layer
    class KReverse(keras.layers.Layer):
        '''Keras implementation of a hypothetical custom layer'''

        def __init__(self):
            super().__init__()

        def call(self, inputs):
            return inputs[..., ::-1]

        def get_config(self):
            return super().get_config()
@@ -58,19 +58,44 @@

It also returns a list of output shapes (one shape for each output).
In this case, there is a single output with the same shape as the input.

.. tabs::

    .. tab:: Keras v2

        .. code-block:: Python

            # Parser for converter
            def parse_reverse_layer(keras_layer, input_names, input_shapes, data_reader):
                layer = {}
                layer['class_name'] = 'KReverse'
                layer['name'] = keras_layer['config']['name']
                layer['n_in'] = input_shapes[0][1]

                if input_names is not None:
                    layer['inputs'] = input_names

                return layer, [shape for shape in input_shapes[0]]

    .. tab:: Keras v3

        .. code-block:: Python

            from hls4ml.converters.keras_v3._base import register, KerasV3LayerHandler

            @register
            class KReverseHandler(KerasV3LayerHandler):
                '''Keras v3 layer handler for KReverse'''

                handles = ('KReverse',)

                def handle(
                    self,
                    layer: 'keras.Layer',
                    in_tensors: Sequence['KerasTensor'],
                    out_tensors: Sequence['KerasTensor'],
                ) -> dict[str, Any] | tuple[dict[str, Any], ...]:
                    # Only layer-specific parameters are needed.
                    # Common parameters are automatically added in the base class.
                    assert len(in_tensors[0].shape) == 2, 'KReverse is only supported for 2D tensors'
                    return {'n_in': in_tensors[0].shape[-1]}

Review comment: This part only results in empty tabs for both Keras v2 and v3. The Python code examples are both shown, but disappear completely once you click on the tabs.

Next, we need the actual HLS implementation of the function, which can be written in a header file ``nnet_reverse.h``.
@@ -140,33 +165,33 @@

In this case, the HLS code is valid for both the Vivado and Quartus backends.

.. code-block:: Python

    # Register the converter for custom Keras layer
    hls4ml.converters.register_keras_v2_layer_handler('KReverse', parse_reverse_layer)
    # For Keras v3, use register on a subclassed KerasV3LayerHandler from hls4ml.converters.keras_v3._base instead

    # Register the hls4ml's IR layer
    hls4ml.model.layers.register_layer('KReverse', HReverse)

    for backend_id in ['Vivado', 'Quartus']:
        # Register the optimization passes (if any)
        backend = hls4ml.backends.get_backend(backend_id)
        backend.register_pass('remove_duplicate_reverse', RemoveDuplicateReverse, flow=f'{backend_id.lower()}:optimize')

        # Register template passes for the given backend
        backend.register_template(HReverseConfigTemplate)
        backend.register_template(HReverseFunctionTemplate)

        # Register HLS implementation
        backend.register_source('/path/to/your/nnet_reverse.h')

Review comment: I think we can make these blocks have tabs for v2 and v3.
Review comment: Would be nice to add the tabs here too once they are figured out.

Finally, we can actually test the ``hls4ml`` custom layer compared to the Keras one.

.. code-block:: Python

    # Test if it works
    kmodel = keras.models.Sequential(
        [
            keras.layers.Input(shape=(8,)),
            KReverse(),
            keras.layers.ReLU(),
        ]
    )
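A sketch of the remaining test steps is shown below; the output directory, part, and comparison tolerance are placeholders, and the integer-valued inputs are chosen so the fixed-point result should match the Keras output.

.. code-block:: Python

    import numpy as np
    import hls4ml

    # Sketch: convert the model with the registered custom layer and compare outputs.
    hconfig = hls4ml.utils.config_from_keras_model(kmodel, granularity='name')
    hmodel = hls4ml.converters.convert_from_keras_model(
        kmodel,
        hls_config=hconfig,
        output_dir='test_kreverse',  # placeholder output directory
        backend='Vivado',
        io_type='io_parallel',
    )
    hmodel.compile()

    # Integer inputs avoid rounding differences between float and fixed point.
    x = np.random.randint(-8, 8, (16, 8)).astype('float32')
    kres = np.array(kmodel(x))
    hres = hmodel.predict(x).reshape(kres.shape)
    np.testing.assert_allclose(hres, kres, rtol=0, atol=1e-2)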
@@ -1,49 +1,71 @@

====================================
High Granularity Quantization (HGQ2)
====================================

.. note::
   New projects are encouraged to use `HGQ2 <../hgq2.html>`_ instead of the original `HGQ <../hgq.html>`_.
   HGQ2 extends the original HGQ with more supported layers and more quantizer options, and is built on top of Keras v3, which can be used natively with the JAX, PyTorch, and TensorFlow backends.

.. image:: https://img.shields.io/badge/License-LGPLv3-blue.svg
   :target: https://www.gnu.org/licenses/lgpl-3.0.en.html
.. image:: https://github.com/calad0i/HGQ2/actions/workflows/sphinx-build.yml/badge.svg
   :target: https://calad0i.github.io/HGQ2/
.. image:: https://badge.fury.io/py/hgq2.svg
   :target: https://badge.fury.io/py/hgq2
.. image:: https://img.shields.io/badge/arXiv-2405.00645-b31b1b.svg
   :target: https://arxiv.org/abs/2405.00645

HGQ2 (High Granularity Quantization 2) is a quantization-aware training framework built on Keras v3, targeting real-time deep learning applications on edge devices like FPGAs. It provides a comprehensive set of tools for creating and training quantized neural networks with minimal effort.

HGQ2 implements a gradient-based automatic bitwidth optimization and quantization-aware training algorithm. By leveraging gradients, it allows for bitwidth optimization at arbitrary granularity, up to the per-weight and per-activation level.

.. rst-class:: light
.. image:: _static/hgq-overview.svg
   :alt: HGQ overview
   :width: 600

Key Features
------------

- **Multi-backend support**: Works with TensorFlow, JAX, and PyTorch through Keras v3
- **Flexible quantization**: Supports different quantization schemes, including fixed-point and minifloat
- **Hardware synthesis**: Direct integration with hls4ml for FPGA deployment
- **Trainable quantization parameters**: Optimizes bitwidths through gradient-based methods
- **Effective Bit-Operations (EBOPs)**: Accurate resource estimation of the deployed firmware during training
- **Advanced layer support**: Einsum, EinsumDense, and multi-head attention layers with quantization and hardware synthesis support

.. code-block:: python
   :caption: Simple example

   import keras
   from hgq.layers import QDense, QConv2D
   from hgq.config import LayerConfigScope, QuantizerConfigScope

   # Set up the quantization configuration
   # These values are the defaults, shown here for demonstration purposes
   with (
       # Configuration scopes for setting the default quantization type and overflow mode;
       # the second scope overrides the first one for the 'datalane' place
       QuantizerConfigScope(place='all', default_q_type='kbi', overflow_mode='SAT_SYM'),
       QuantizerConfigScope(place='datalane', default_q_type='kif', overflow_mode='WRAP'),
       # Configuration scope for enabling EBOPs and setting the beta0 value
       LayerConfigScope(enable_ebops=True, beta0=1e-5),
   ):
       model = keras.Sequential([
           QConv2D(32, (3, 3), activation='relu'),
           keras.layers.MaxPooling2D((2, 2)),
           keras.layers.Flatten(),
           QDense(10),
       ])

   ...  # Training, evaluation, and anything else you want to do with the model

   model_hls = hls4ml.converters.convert_from_keras(model, ...)
   # Model-wise precision propagation is done automatically for HGQ models for bit-exactness
   # Do NOT pass a precision config unless you know what you are doing

   model_hls.compile()

.. note::
   Do not pass any precision configuration to ``hls4ml.converters.convert_from_keras`` in general. HGQ-defined models will invoke model-wise precision propagation automatically to ensure bit-exactness between the Keras model and the generated HLS code (see `here <./precision.html>`__ for more details).
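As a quick sanity check, the converted model can be compared against the Keras model directly. The sketch below assumes ``x_test`` is a batch of inputs with the model's input shape and that the model was built and converted as above.

.. code-block:: python

   import numpy as np

   # For a properly quantized HGQ2 model, the HLS model output should match
   # the Keras model output (both evaluated in float here).
   y_keras = np.array(model(x_test))
   y_hls = model_hls.predict(x_test).reshape(y_keras.shape)
   np.testing.assert_array_equal(y_keras, y_hls)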
@@ -0,0 +1,52 @@

===================================
High Granularity Quantization (HGQ)
===================================

Review comment: Can we put both of these under a subsection of "Advanced"? Another option is to introduce a new section called "Concepts" and put the subsection there. And if the advice for new code is HGQ2, maybe add a note here and mark this one as (deprecated) in the title.
Review comment: I agree.

.. note::
   While still supported and maintained, HGQ is deprecated in favor of `HGQ2 <../hgq2.html>`_. New projects are strongly encouraged to use HGQ2 instead.

.. image:: https://github.com/calad0i/HGQ/actions/workflows/sphinx-build.yml/badge.svg
   :target: https://calad0i.github.io/HGQ/
.. image:: https://badge.fury.io/py/hgq.svg
   :target: https://badge.fury.io/py/hgq
.. image:: https://img.shields.io/badge/arXiv-2405.00645-b31b1b.svg
   :target: https://arxiv.org/abs/2405.00645

`High Granularity Quantization (HGQ) <https://github.com/calad0i/HGQ/>`_ is a library that performs gradient-based automatic bitwidth optimization and quantization-aware training for neural networks to be deployed on FPGAs. By leveraging gradients, it allows for bitwidth optimization at arbitrary granularity, up to the per-weight and per-activation level.

.. image:: https://calad0i.github.io/HGQ/_images/overview.svg
   :alt: Overview of HGQ
   :align: center

Conversion of models made with the HGQ library is fully supported. The HGQ models are first converted to the proxy model format, which can then be parsed by hls4ml bit-accurately. Below is an example of how to create a model with HGQ and convert it to an hls4ml model.

.. code-block:: Python

   import keras
   from HGQ.layers import HDense, HDenseBatchNorm, HQuantize
   from HGQ import ResetMinMax, FreeBOPs

   model = keras.models.Sequential([
       HQuantize(beta=1.e-5),
       HDenseBatchNorm(32, beta=1.e-5, activation='relu'),
       HDenseBatchNorm(32, beta=1.e-5, activation='relu'),
       HDense(10, beta=1.e-5),
   ])

   opt = keras.optimizers.Adam(learning_rate=0.001)
   loss = keras.losses.SparseCategoricalCrossentropy(from_logits=True)
   model.compile(optimizer=opt, loss=loss, metrics=['accuracy'])
   callbacks = [ResetMinMax(), FreeBOPs()]

   model.fit(..., callbacks=callbacks)

   from HGQ import trace_minmax, to_proxy_model
   from hls4ml.converters import convert_from_keras_model

   trace_minmax(model, x_train, cover_factor=1.0)
   proxy = to_proxy_model(model, aggressive=True)

   model_hls = convert_from_keras_model(proxy, backend='vivado', output_dir=..., part=...)

An interactive example of HGQ can be found in the `kaggle notebook <https://www.kaggle.com/code/calad0i/small-jet-tagger-with-hgq-1>`_. Full documentation can be found at `calad0i.github.io/HGQ <https://calad0i.github.io/HGQ/>`_.
@@ -0,0 +1,23 @@

==============================
Model-wise Precision Inference
==============================

The model-wise precision inference (implemented in :py:class:`~hls4ml.model.optimizer.passes.bit_exact.BitExact`) attempts to infer the appropriate configuration for **all** precisions in the model. Unlike the automatic precision inference, this pass disregards all user-defined precision and trusts only the data embedded in the model, i.e., the actual values of the weights and the explicit quantizers defined between layers.

Review comment: trusts?
Review comment: Meaning only those are used as the source of precision derivation.

This pass uses a modified symbolic interval arithmetic to compute the ranges and the needed quantization steps for all precisions in the model graph, with the goal of eliminating any discrepancy between the quantized model and the original model. In the inference process, only the raw weight values and the explicit quantizers (either ``FixedPointQuantizer``, or ``linear/relu`` layers with ``trusted=True``) are considered as sources of precision information. All other precision information (e.g., user-defined precision in ``config_from_*`` functions) will not be used in the inference process.

Review comment: Without a?

Invocation of this pass is configured by the ``bit_exact`` key in the backend configuration (default: ``None``). There are two ways to enable this pass (see the sketch after this list):

- When converting from ``HGQ/HGQ2`` models, this pass is automatically enabled unless ``bit_exact`` is explicitly set to ``False``.
- For other models, this pass can be enabled by setting ``bit_exact`` to ``True``. Currently, only ``QKeras`` sets this key automatically when converting from ``QKeras`` models. Support for ``QONNX`` is planned but not yet implemented.
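A minimal sketch for the second case is shown below. It assumes ``bit_exact`` is forwarded to the backend configuration as a keyword argument of the converter, as the description above suggests; the exact plumbing may differ in your hls4ml version.

.. code-block:: Python

    import hls4ml

    # Sketch: explicitly enable model-wise precision inference for a non-HGQ model.
    # 'bit_exact' is assumed to be accepted as a backend configuration key here.
    hls_model = hls4ml.converters.convert_from_keras_model(
        model,
        backend='Vitis',
        output_dir='prj_bit_exact',
        bit_exact=True,
    )
    hls_model.compile()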
If the original model is not properly quantized, this pass will lead to huge bitwidths in the model. In this context, properly quantized models are those that have quantizers defined between **all layers with non-trivial arithmetic**. The successful application of this pass should result in a bit-exact model, i.e., the quantized model should produce the same outputs as the original model for all inputs [*]_.

Review comment: Should we make explicit what non-trivial arithmetic means?
Review comment: Essentially everything except reshape, flatten, and linear (why would you put them at the input, though?). Will add.
Review comment: You haven't been in hls4ml long enough 😉. People regularly put reshape or flatten as the first layer in order to simplify IP integration. And yes, I agree with what you're thinking reading that sentence. Let's not comment on this practice further, just make the docs clear.

Not all operator types are supported in this pass. If any unsupported operator is encountered during the inference, this pass will **crash** the conversion process to prevent silent failures. Please consider using the `automatic precision inference <../auto.html>`_ if your model contains unsupported operators or unquantized components.

.. warning::
   Importantly, quantizers **should be used immediately after the inputs**, or the input precision may not be properly inferred. If you are using ``HGQ/HGQ2``, this is automatically taken care of in most cases. If you are using ``QKeras``, make sure to put a ``QActivation`` with ``quantized_bits`` right after the input layer so that the input precision can be derived.
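For ``QKeras`` users, a minimal sketch of the recommended pattern is shown below; the layer sizes and bitwidths are arbitrary placeholders.

.. code-block:: Python

    from tensorflow import keras
    from qkeras import QActivation, QDense, quantized_bits, quantized_relu

    inp = keras.layers.Input(shape=(16,))
    # Quantize immediately after the input so the input precision can be derived
    x = QActivation(quantized_bits(8, 0))(inp)
    x = QDense(32, kernel_quantizer=quantized_bits(6, 0, alpha=1))(x)
    x = QActivation(quantized_relu(6))(x)
    out = QDense(5, kernel_quantizer=quantized_bits(6, 0, alpha=1))(x)
    model = keras.Model(inp, out)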
.. [*] While quantized, the original model will still operate on floating-point values, so there is a chance that the outputs will not be exactly the same due to float rounding errors in the original model.

.. note::
   Unlike with the automatic precision inference, it is strongly recommended **not** to use the ``config_from_*`` functions to set the precisions in the model. Automatic precision inference and this pass cannot be used simultaneously.

Review comment: Can we make this statement more specific? It sounds like users should never use config_from_*?
Review comment: Should also update CITATION.cff.