12 changes: 12 additions & 0 deletions README.md
@@ -125,6 +125,18 @@ Additionally, if you use specific features developed in later papers, please cite
year = "2022"
}
```
Distributed arithmetic:
```bibtex
@misc{Sun:2025,
title={da4ml: Distributed Arithmetic for Real-time Neural Networks on FPGAs},
author={Chang Sun and others},
year={2025},
eprint={2507.04535},
archivePrefix={arXiv},
primaryClass={cs.AR},
url={https://arxiv.org/abs/2507.04535},
}
```
Comment on lines +128 to +139
Should also update CITATION.cff

binary/ternary networks:
```bibtex
@article{Loncar:2020hqp,
1 change: 1 addition & 0 deletions docs/advanced/_static/da4ml-workflow.svg
1 change: 1 addition & 0 deletions docs/advanced/_static/hgq-overview.svg
3 changes: 3 additions & 0 deletions docs/advanced/auto.rst
@@ -20,3 +20,6 @@ inference will never set a bitwidth larger than the bitwidth of the ``max_precision``
When manually setting bitwidths, the accumulator can overflow, and the precision may need to be reduced. For the accumulator, it is usually a bad idea to explicitly
enable rounding or saturation modes since it dramatically increases the execution time. For other types (e.g. output types or weight types), however, rounding and saturation handling
can be enabled as needed.
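
As an illustration, the sketch below shows how the accumulator and output precisions could be overridden by hand through the standard configuration dictionary. The layer name ``dense1`` and the precision values are placeholders, not part of the original text.

.. code-block:: python

   import hls4ml

   config = hls4ml.utils.config_from_keras_model(model, granularity='name')

   # Widen the accumulator to avoid overflow; the default wrap/truncate behavior keeps it cheap
   config['LayerName']['dense1']['Precision']['accum'] = 'ap_fixed<24,12>'

   # Rounding and saturation can be enabled on other types (e.g. the output) as needed
   config['LayerName']['dense1']['Precision']['result'] = 'ap_fixed<16,6,AP_RND,AP_SAT>'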

.. note::
For supported models (most ``HGQ/HGQ2`` models and some ``QKeras`` models), `model-wise precision inference <../precision.html>`_ can be used to achieve bit-exact conversion. Please refer to that section for more details.
32 changes: 32 additions & 0 deletions docs/advanced/da.rst
@@ -0,0 +1,32 @@
======================
Distributed Arithmetic
======================

.. image:: https://img.shields.io/badge/License-LGPLv3-blue.svg
:target: https://www.gnu.org/licenses/lgpl-3.0.en.html
.. image:: https://badge.fury.io/py/da4ml.svg
:target: https://badge.fury.io/py/da4ml
.. image:: https://img.shields.io/badge/arXiv-2507.04535-b31b1b.svg
:target: https://arxiv.org/abs/2507.04535


Distributed Arithmetic (DA) is a strategy for constant-matrix-vector multiplication (CMVM) operations used in hls4ml. The implementation is provided by an external library, `da4ml <https://github.com/calad0i/da4ml>`__. The library transforms the CMVM operations into an adder graph with common subexpression elimination to reduce the overall complexity. As the CMVM operation is fully unrolled, `reuse_factor` **must** be 1 (the default) for the corresponding CMVM operations [*]_. Compared to the traditional `Latency` strategy CMVM kernels, DA can usually reduce LUT usage by up to 30% and eliminate all DSP usage.

.. rst-class:: light
.. image:: _static/da4ml-workflow.svg
:alt: Workflow of DA in hls4ml
:width: 600

When the DA strategy is used, the CMVM operations will be implemented bit-exactly, and the accumulator precision setting will not be used.
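
As a usage illustration, the sketch below selects the DA strategy through the regular hls4ml configuration dictionary. The strategy string (``'distributed_arithmetic'``) and the exact entry point are assumptions that should be checked against the da4ml and hls4ml documentation for your version.

.. code-block:: python

   import hls4ml

   # Sketch: select the DA strategy model-wide (strategy string assumed)
   config = hls4ml.utils.config_from_keras_model(model, granularity='model')
   config['Model']['Strategy'] = 'distributed_arithmetic'
   # reuse_factor must remain 1 (the default) for the CMVM kernels using DA

   hls_model = hls4ml.converters.convert_from_keras_model(
       model, hls_config=config, backend='Vitis', output_dir='da_prj'
   )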

.. [*] Not to be confused with `II=1`. `reuse_factor` is the `II` for one CMVM operation, not one layer. One layer may invoke the same CMVM kernel multiple times and thus have `II>1` even though each CMVM operation is unrolled, e.g., convolution layers with more than one partition.

Currently, the DA strategy is only available for the Vivado/Vitis HLS backends. The following layers are supported:

* Dense
* Convolutional (1D, 2D)
* EinsumDense
* Multi-head attention (implemented as multiple EinsumDense layers)

While possible in principle, RNN layers are not yet supported by the DA strategy.

For more details, please refer to the `da4ml repository <https://github.com/calad0i/da4ml>`__ or the `paper <https://arxiv.org/abs/2507.04535>`__.
69 changes: 47 additions & 22 deletions docs/advanced/extension.rst
@@ -18,22 +18,22 @@ To implement a custom layer in ``hls4ml`` with the extension API, the required c
* Function config template
* Registration of layer, source code, and templates

Complete example
================
Complete example for Keras v2
=============================

For concreteness, let's say our custom layer ``KReverse`` is implemented in Keras and reverses the order of the last dimension of the input.
For concreteness, let's say our custom layer ``KReverse`` is implemented in Keras v2 and reverses the order of the last dimension of the input.

.. code-block:: Python

# Keras implementation of a custom layer
class KReverse(tf.keras.layers.Layer):
class KReverse(keras.layers.Layer):
'''Keras implementation of a hypothetical custom layer'''

def __init__(self):
super().__init__()

def call(self, inputs):
return tf.reverse(inputs, axis=[-1])
return inputs[..., ::-1]

def get_config(self):
return super().get_config()
@@ -58,19 +58,44 @@ This parser reads the attributes of the Keras layer instance and populates a dictionary of attributes.
It also returns a list of output shapes (one shape for each output).
In this case, there is a single output with the same shape as the input.

.. code-block:: Python
.. tabs::
This part only results in empty tabs for both Keras v2 and v3. The Python code examples are both shown, but disappear completely once you click on the tabs.

.. tab:: Keras v2

.. code-block:: Python

# Parser for converter
def parse_reverse_layer(keras_layer, input_names, input_shapes, data_reader):
layer = {}
layer['class_name'] = 'KReverse'
layer['name'] = keras_layer['config']['name']
layer['n_in'] = input_shapes[0][1]

if input_names is not None:
layer['inputs'] = input_names

return layer, [shape for shape in input_shapes[0]]

.. tab:: Keras v3

.. code-block:: Python

from hls4ml.converters.keras_v3._base import register, KerasV3LayerHandler

@register
class KReverseHandler(KerasV3LayerHandler):
    '''Keras v3 layer handler for KReverse'''

    handles = ('KReverse',)

    def handle(
        self,
        layer: 'keras.Layer',
        in_tensors: Sequence['KerasTensor'],
        out_tensors: Sequence['KerasTensor'],
    ) -> dict[str, Any] | tuple[dict[str, Any], ...]:
        # Only layer-specific parameters are needed.
        # Common parameters are automatically added in the base class.
        assert len(in_tensors[0].shape) == 2, 'KReverse is only supported for 2D tensors'
        return {'n_in': in_tensors[0].shape[-1]}

Next, we need the actual HLS implementation of the function, which can be written in a header file ``nnet_reverse.h``.

@@ -140,33 +165,33 @@ In this case, the HLS code is valid for both the Vivado and Quartus backends.
.. code-block:: Python

# Register the converter for custom Keras layer
hls4ml.converters.register_keras_layer_handler('KReverse', parse_reverse_layer)
hls4ml.converters.register_keras_v2_layer_handler('KReverse', parse_reverse_layer)
Would be nice to add the tabs here too once they are figured out.

# For keras v3, use register on subclassed KerasV3LayerHandler from hls4ml.converters.keras_v3._base instead

# Register hls4ml's IR layer
hls4ml.model.layers.register_layer('HReverse', HReverse)
hls4ml.model.layers.register_layer('KReverse', HReverse)

for backend_id in ['Vivado', 'Quartus']:
# Register the optimization passes (if any)
backend = hls4ml.backends.get_backend(backend_id)
backend.register_pass('remove_duplicate_reverse', RemoveDuplicateReverse, flow=f'{backend_id.lower()}:optimize')

# Register template passes for the given backend
backend.register_template(HReverseConfigTemplate)
backend.register_template(HReverseFunctionTemplate)

# Register HLS implementation
backend.register_source('nnet_reverse.h')
backend.register_source('/path/to/your/nnet_reverse.h')

Finally, we can actually test the ``hls4ml`` custom layer compared to the Keras one.

.. code-block:: Python

# Test if it works
kmodel = tf.keras.models.Sequential(
kmodel = keras.models.Sequential(
[
tf.keras.layers.Input(shape=(8,)),
keras.layers.Input(shape=(8,)),
KReverse(),
tf.keras.layers.ReLU(),
keras.layers.ReLU(),
]
)
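
A minimal sketch of how the comparison could be completed, assuming hls4ml's standard configuration and conversion API; the output directory and tolerance are placeholders.

.. code-block:: Python

   import numpy as np

   config = hls4ml.utils.config_from_keras_model(kmodel, granularity='name')
   hmodel = hls4ml.converters.convert_from_keras_model(
       kmodel, hls_config=config, backend='Vivado', output_dir='hls4mlprj_kreverse'
   )
   hmodel.compile()

   # Compare the hls4ml emulation against the Keras reference
   x = np.random.rand(100, 8).astype('float32')
   np.testing.assert_allclose(hmodel.predict(x), kmodel.predict(x), rtol=0, atol=1e-2)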

90 changes: 56 additions & 34 deletions docs/advanced/hgq.rst
@@ -1,49 +1,71 @@
======================================
High Granularity Quantization (HGQ2)
======================================

.. note::
   New projects are encouraged to use `HGQ2 <../hgq2.html>`_ instead of the original `HGQ <../hgq.html>`_.
   HGQ2 extends the original HGQ with more supported layers and more quantizer options, and is built on top of Keras v3, which can be used natively with the JAX, PyTorch, and TensorFlow backends.

.. image:: https://img.shields.io/badge/License-LGPLv3-blue.svg
   :target: https://www.gnu.org/licenses/lgpl-3.0.en.html
.. image:: https://github.com/calad0i/HGQ2/actions/workflows/sphinx-build.yml/badge.svg
   :target: https://calad0i.github.io/HGQ2/
.. image:: https://badge.fury.io/py/hgq2.svg
   :target: https://badge.fury.io/py/hgq2
.. image:: https://img.shields.io/badge/arXiv-2405.00645-b31b1b.svg
   :target: https://arxiv.org/abs/2405.00645

HGQ2 (High Granularity Quantization 2) is a quantization-aware training framework built on Keras v3, targeting real-time deep learning applications on edge devices like FPGAs. It provides a comprehensive set of tools for creating and training quantized neural networks with minimal effort.

HGQ2 implements a gradient-based automatic bitwidth optimization and quantization-aware training algorithm. By leveraging gradients, it allows for bitwidth optimization at arbitrary granularity, up to the per-weight and per-activation level.

.. rst-class:: light
.. image:: _static/hgq-overview.svg
   :alt: HGQ overview
   :width: 600

Key Features
------------

- **Multi-backend support**: Works with TensorFlow, JAX, and PyTorch through Keras v3
- **Flexible quantization**: Supports different quantization schemes, including fixed-point and minifloat
- **Hardware synthesis**: Direct integration with hls4ml for FPGA deployment
- **Trainable quantization parameters**: Optimizes bitwidths through gradient-based methods
- **Effective Bit-Operations (EBOPs)**: Accurate resource estimation for the deployed firmware during training
- **Advanced layer support**: Einsum, EinsumDense, and multi-head attention layers with quantization and hardware synthesis support

.. code-block:: python
   :caption: Simple example

   import keras
   from hgq.layers import QDense, QConv2D
   from hgq.config import LayerConfigScope, QuantizerConfigScope

   # Set up the quantization configuration
   # These values are the defaults, shown here for demonstration purposes
   with (
       # Configuration scope for setting the default quantization type and overflow mode
       QuantizerConfigScope(place='all', default_q_type='kbi', overflow_mode='SAT_SYM'),
       # The second configuration scope overrides the first one for the 'datalane' place
       QuantizerConfigScope(place='datalane', default_q_type='kif', overflow_mode='WRAP'),
       # Configuration scope for enabling EBOPs and setting the beta0 value
       LayerConfigScope(enable_ebops=True, beta0=1e-5),
   ):
       model = keras.Sequential([
           QConv2D(32, (3, 3), activation='relu'),
           keras.layers.MaxPooling2D((2, 2)),
           keras.layers.Flatten(),
           QDense(10),
       ])

   ...  # Training, evaluation, and anything else you want to do with the model

   model_hls = hls4ml.converters.convert_from_keras(model, ...)
   # Model-wise precision propagation is done automatically for HGQ models for bit-exactness
   # Do NOT pass a precision config unless you know what you are doing

   model_hls.compile()

.. note::
   In general, do not pass any precision configuration to ``hls4ml.converters.convert_from_keras`` for HGQ-defined models: they invoke model-wise precision propagation automatically to ensure bit-exactness between the Keras model and the generated HLS code (see `here <./precision.html>`__ for more details).
52 changes: 52 additions & 0 deletions docs/advanced/hgq1.rst
@@ -0,0 +1,52 @@
===================================
Can we put both of these under a subsection of "advanced"? Another option is to introduce a new section called "Concepts" and put the subsection there. And if the advice for new code is HGQ2, maybe add a note here and retitle this one as (deprecated).

I agree

High Granularity Quantization (HGQ)
===================================

.. note::
While still supported and maintained, HGQ is deprecated in favor of `HGQ2 <../hgq2.html>`_. New projects are strongly encouraged to use HGQ2 instead.

.. image:: https://github.com/calad0i/HGQ/actions/workflows/sphinx-build.yml/badge.svg
:target: https://calad0i.github.io/HGQ/
.. image:: https://badge.fury.io/py/hgq.svg
:target: https://badge.fury.io/py/hgq
.. image:: https://img.shields.io/badge/arXiv-2405.00645-b31b1b.svg
:target: https://arxiv.org/abs/2405.00645

`High Granularity Quantization (HGQ) <https://github.com/calad0i/HGQ/>`_ is a library that performs gradient-based automatic bitwidth optimization and quantization-aware training for neural networks to be deployed on FPGAs. By leveraging gradients, it allows for bitwidth optimization at arbitrary granularity, up to the per-weight and per-activation level.

.. image:: https://calad0i.github.io/HGQ/_images/overview.svg
:alt: Overview of HGQ
:align: center

Conversion of models made with the HGQ library is fully supported. The HGQ models are first converted to the proxy model format, which can then be parsed bit-accurately by hls4ml. Below is an example of how to create a model with HGQ and convert it to an hls4ml model.

.. code-block:: Python

import keras
from HGQ.layers import HDense, HDenseBatchNorm, HQuantize
from HGQ import ResetMinMax, FreeBOPs

model = keras.models.Sequential([
HQuantize(beta=1.e-5),
HDenseBatchNorm(32, beta=1.e-5, activation='relu'),
HDenseBatchNorm(32, beta=1.e-5, activation='relu'),
HDense(10, beta=1.e-5),
])

opt = keras.optimizers.Adam(learning_rate=0.001)
loss = keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(optimizer=opt, loss=loss, metrics=['accuracy'])
callbacks = [ResetMinMax(), FreeBOPs()]

model.fit(..., callbacks=callbacks)

from HGQ import trace_minmax, to_proxy_model
from hls4ml.converters import convert_from_keras_model

trace_minmax(model, x_train, cover_factor=1.0)
proxy = to_proxy_model(model, aggressive=True)

model_hls = convert_from_keras_model(proxy, backend='vivado', output_dir=..., part=...)


An interactive example of HGQ can be found in the `kaggle notebook <https://www.kaggle.com/code/calad0i/small-jet-tagger-with-hgq-1>`_. Full documentation can be found at `calad0i.github.io/HGQ <https://calad0i.github.io/HGQ/>`_.
23 changes: 23 additions & 0 deletions docs/advanced/precision.rst
@@ -0,0 +1,23 @@
==============================
Model-wise Precision Inference
==============================

The model-wise precision inference (implemented in :py:class:`~hls4ml.model.optimizer.passes.bit_exact.BitExact`) attempts to infer the appropriate configuration for **all** precision in the model. Unlike the automatic precision inference, this pass disregards all user-defined precision, and "trust" only data embedded in the model, i.e., the actual values of the weights and explicit quantizers defined between layers.
trusts?

Meaning only those are used as source of precision derivation


This pass uses a modified symbolic interval arithmetic to compute the ranges and the needed quantization steps for all precision in the model graph, with the goal of eliminating any discrepancy between the quantized model and the original model. In the inference process, only the raw weight values and the explicit quantizers (either ``FixedPointQuantizer``, or ``linear/relu`` layers with ``trusted=True``) are considered as sources of precision information. All other precision information (e.g., user-defined precision in ``config_from_*`` functions) will not be used in the inference process.
Without a?


Invocation of this pass is controlled by the ``bit_exact`` key in the backend configuration (default: ``None``). There are two ways to enable it (a configuration sketch follows the list):

- When converting from ``HGQ/HGQ2`` models, this pass is automatically enabled unless ``bit_exact`` is explicitly set to ``False``.
- For other models, this pass can be enabled by setting ``bit_exact`` to ``True``. Currently, only ``QKeras`` sets this key automatically when converting from ``QKeras`` models. Support for ``QONNX`` is planned but not yet implemented.
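
A minimal sketch of the second case; the exact keyword placement is an assumption, with ``bit_exact`` passed alongside the rest of the backend configuration of the ``convert_from_*`` call.

.. code-block:: python

   import hls4ml

   # Sketch: explicitly enable model-wise precision inference for a properly
   # quantized (e.g. QKeras) model; `bit_exact` is assumed to be accepted as a
   # backend configuration keyword here
   hls_model = hls4ml.converters.convert_from_keras_model(
       model,
       backend='Vitis',
       output_dir='prj_bit_exact',
       bit_exact=True,
   )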

If the original model is not properly quantized, this pass will lead to huge bitwidths in the model. In this context, properly quantized models are those that have quantizers defined between **all layers with non-trivial arithmetic** (i.e., essentially every layer except reshape, flatten, and linear). The successful application of this pass should result in a bit-exact model, i.e., the quantized model should produce the same outputs as the original model for all inputs [*]_.
Should we make explicit what non-trivial arithmetic means?

Essentially everything, except reshape, flatten, and linear (why would you put them at input though?). Will add.

You haven't been in hls4ml long enough 😉. People regularly put reshape or flatten as the first layer in order to simplify IP integration. And yes, I agree with what you're thinking reading that sentence. Let's not comment on this practice further, just make the docs clear.


Not all operator types are supported in this pass. If any unsupported operator is encountered during the inference, this pass will **crash** the conversion process to prevent silent failures. Please consider using `automatic precision inference <../auto.html>`_ if your model contains unsupported operators or unquantized components.

.. warning::
Importantly, quantizers **should be used immediately after the inputs**, or the input precision may not be properly inferred. If you are using ``HGQ/HGQ2``, this is automatically taken care of in most cases. If you are using ``QKeras``, make sure to put a ``QActivation`` with ``quantized_bits`` right after the input layer such that the input precision can be derived.
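
For example, a minimal QKeras sketch with an explicit input quantizer; the layer sizes and bit settings are placeholders.

.. code-block:: python

   from qkeras import QActivation, QDense, quantized_bits
   from tensorflow.keras.layers import Input
   from tensorflow.keras.models import Model

   inp = Input(shape=(16,))
   # Quantize immediately after the input so the input precision can be derived
   x = QActivation(quantized_bits(8, 0, alpha=1))(inp)
   x = QDense(10, kernel_quantizer=quantized_bits(6, 0, alpha=1),
              bias_quantizer=quantized_bits(6, 0, alpha=1))(x)
   model = Model(inp, x)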

.. [*] While quantized, the original model will still operate on floating-point values, so there is a chance that the outputs will not be exactly the same due to floating-point rounding errors in the original model.

.. note::
Unlike the automatic precision inference, it is strongly recommended to **not** use the ``config_from_*`` functions to set the precisions in the model. Automatic precision inference and this pass cannot be used simultaneously.
Can we make this statement more specific? It sounds like users should never use config_from_*?
