
Conversation

@calad0i calad0i commented Aug 7, 2025

Description

Adding & changing doc for HGQ2 and da4ml

If #1338 is merged, the precision propagation part will be updated accordingly

Type of change

  • Documentation update

Tests

N/A

Checklist

  • all

@vloncar vloncar left a comment

I think minor improvements to the docs could still be made

@@ -140,7 +140,8 @@ In this case, the HLS code is valid for both the Vivado and Quartus backends.
.. code-block:: Python

# Register the converter for custom Keras layer
hls4ml.converters.register_keras_layer_handler('KReverse', parse_reverse_layer)
hls4ml.converters.register_keras_v2_layer_handler('KReverse', parse_reverse_layer)
Contributor:

Would be nice to add the tabs here too once they are figured out.

======================================

.. note::
HGQ2 is the successor of the original `HGQ <./hgq1.html>`__ framework, which was built on Keras v2. HGQ2 is built on top of Keras v3, leveraging its new features and improvements.
Contributor:

Maybe also add why people should switch, not just cuz it is Keras v3


.. code-block:: Python
Key Features
-----------
Contributor:

Lacks a -

Contributor:

Still to do

@@ -0,0 +1,49 @@
===================================
Contributor:

Can we put both of these under a subsection of "advanced"? Another option is to introduce a new section called "Concepts" and put the subsection there. And if the advice for new code is HGQ2, maybe add a note here and title this one as (deprecated).

Contributor:

I agree

.. [*] While quantized, the original model will still operate on floating-point values, so there is a chance that the outputs will not be exactly the same due to float rounding errors in the original model.

.. note::
Unlike the automatic precision inference, it is strongly recommended to **not** use the ``config_from_*`` functions to set the precisions in the model.
Contributor:

This is just confusing now. Why can't we say in HGQ docs that this is the bit-exact flow and users should not invoke the config_from_.... We can even make it fail if it encounters HGQ layers. And then merge this section with the HGQ

Contributor (author):

After #1338 is merged, this flow is no longer exclusive to HGQ (QKeras works for now with that PR; QONNX should be possible but currently triggers a weird bug, so not yet) and can be invoked with bit_exact=True in the converter config.
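For readers following the thread, invoking that flow at conversion time would look roughly like this (a sketch only; the `bit_exact` keyword is taken from the discussion above and #1338, so its exact spelling and availability may differ in a given release):

```python
import hls4ml

# Sketch: convert a properly quantized Keras model with the model-wise
# (bit-exact) precision inference enabled. `bit_exact=True` is the keyword
# discussed in this thread; it may not exist before #1338 is merged.
# Per the docs above, user precisions from config_from_* are ignored by this pass.
hls_model = hls4ml.converters.convert_from_keras_model(
    model,                      # assumed: an already quantized Keras model
    backend='Vitis',
    output_dir='hls_prj',
    bit_exact=True,
)
```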


* **ReuseFactor**\ : in the case that you are pipelining, this defines the pipeline interval or initiation interval
* **ParallelizationFactor**\ : The number of output "pixels" to compute in parallel in convolutional layers. Increasing this parameter results in significant increase in resources required on the FPGA.
* **Strategy**\ : Optimization strategy on FPGA, either "Latency", "Resource" or "Unrolled". If none is supplied then hl4ml uses "Latency" as default. Note that a reuse factor larger than 1 should be specified when using "resource" or "unrolled" strategy. An example of using larger reuse factor can be found `here. <https://github.com/fastmachinelearning/models/tree/master/keras/KERAS_dense>`__
* **Strategy**\ : Optimization strategy on FPGA, either "Latency", "Resource", "distributed_arithmetic" (or "da"), or "Unrolled". If none is supplied then hl4ml uses "Latency" as default. Note that a reuse factor must be 1 if using "distributed_arithmetic", and must be larger than 1 when using "resource" or "unrolled" strategy.
Contributor:

I still feel da strategy should have been in InitCase and not snake_case. (or just use the word da so whatever the user puts lower() will match it.) Too late now...

Contributor:

Actually, RF of 1 is allowed with Resource strategy, at least for the Intel/Altera backends, since there is no latency. Very often you use RF=1 with Resource strategy in those setups. You might want to clarify that what you are saying is for Vitis/Vivado only.

Contributor:

Resource works with RF = 1...but for Vivado/Vitis it produces a weird circuit that's something between Latency and Resource (it doesn't use BRAM as expected). So maybe not use the term must, but maybe should?
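For context, the options in this list are normally set on the config dictionary before conversion, e.g. (a minimal sketch using the standard `config_from_keras_model`/`convert_from_keras_model` API; the backend-specific caveats about ReuseFactor discussed above still apply):

```python
import hls4ml

# Model-level settings; per-layer overrides live under config['LayerName']
config = hls4ml.utils.config_from_keras_model(model, granularity='model')
config['Model']['Strategy'] = 'Resource'    # or 'Latency', 'Unrolled', 'distributed_arithmetic'
config['Model']['ReuseFactor'] = 8          # initiation interval when pipelining

hls_model = hls4ml.converters.convert_from_keras_model(
    model, hls_config=config, backend='Vitis', output_dir='hls_prj'
)
```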

* `HGQ <https://github.com/calad0i/HGQ>`_
The equivalent HGQ API is also supported. HGQ is not compatible with Keras v3. See `advanced/HGQ <../advanced/hgq.html>`__ for more information.
The equivalent HGQ API is also supported.
Contributor:

Add a note that it is deprecated in favor of HGQ2?

Running C simulation from Python requires a C++11-compatible compiler. On Linux, a GCC C++ compiler ``g++`` is required. Any version from a recent
Linux should work. On MacOS, the *clang*-based ``g++`` is enough. For the oneAPI backend, one must have oneAPI installed, along with the FPGA compiler,
to run C/SYCL simulations.
Running C simulation from Python requires a C++11-compatible compiler. On Linux, a GCC C++ compiler ``g++`` is required. Any version from a recent Linux should work. On MacOS, the *clang*-based ``g++`` is enough. For the oneAPI backend, one must have oneAPI installed, along with the FPGA compiler, to run C/SYCL simulations.
Contributor:

Should we add that it is not just any oneAPI it is only 2025.0?

Contributor:

As an aside, on the Mac, clang seemed to stop working, due to firmware/ap_types/ap_int_special.h:60:7: error: reference to 'complex' is ambiguous. I have to use brew-installed gcc now.
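For context, the compiler requirement in this thread comes from the C-simulation flow, which compiles the generated C++ behind the scenes; a minimal sketch (assuming `hls_model` is an already converted model with 16 input features):

```python
import numpy as np

hls_model.compile()                   # builds the generated HLS C++ into a shared library with g++/clang
x = np.random.rand(10, 16).astype(np.float32)
y_csim = hls_model.predict(x)         # runs the compiled C simulation on the inputs
```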

+-----------------------+-----+-----+--------------+--------+--------+-----+
| Keras v3 | ✅ | ✅ | ✅ | N/A | ✅ | ❌ |
+-----------------------+-----+-----+--------------+--------+--------+-----+
| HGQ2 | ✅ | ✅ | N/A | N/A | ✅ | ✅ |
Contributor:

why do you put N/A instead of ❌? keras v2 has einsum(dense), we just don't support it. brevitas supports rnn layers, so i think it is possible to get qonnx version too, we just don't support it. i think all of N/A should be ❌

Contributor (author):

(q)onnx n/a was from the original table. RNN is indeed added for qonnx now, so we can put an x there now. I have no idea how garnet support is defined. Since PyG was used in the original paper, I assumed anything not working with PyG is an N/A.

Contributor:

Would be nice to add a note to clarify what "N/A" is supposed to mean here. I think it could be a bit confusing.

* ``hls4ml`` supports Linux and requires python \>=3.10. ``hls4ml`` does not require a specific Linux distribution version and we recommend following the requirements of the HLS tool you are using.
* Windows and macOS are not supported. Setting up ``hls4ml`` on these platforms, for example using the Windows Subsystem for Linux (WSL) should be possible, but we do not provide support for such use cases.

- Vivado HLS versions 2018.2 to 2020.1
Contributor:

Can we just say Vivado 2020.1? It sorta works with earlier versions, but it is always as "if you use this version, you're on your own".

Same for Vitis, we can say >=2022.2, versions after 2024.1 are tested less

- Intel HLS versions 20.1 to 21.4, versions > 21.4 have not been tested.
- Vitis HLS versions 2022.2 to 2024.1. Versions <= 2022.1 are known not to work.
- Catapult HLS versions 2024.1_1 to 2024.2
- oneAPI versions 2024.1 to 2025.0. 2025.1 is known to not work.
Contributor:

For oneAPI, all versions after 2025.0 will not work. Support will come in the future directly from Altera.

@calad0i calad0i force-pushed the doc branch 2 times, most recently from dee2c21 to 0055bb6 on August 18, 2025 19:22
@bo3z bo3z added this to the v1.2.0 milestone Sep 5, 2025
Model-wise Precision Inference
==============================

The model-wise precision inference (implemented in :py:class:`~hls4ml.model.optimizer.passes.bit_exact.BitExact`) attempts to infer the appropriate for **all** precisions in the model. Unlike the automatic precision inference, this pass disregards all user-defined precisions, and "trust" only data embedded in the model, i.e., the actual values of the weights and explicit quantizers defined between layers.
Contributor:

This seems to be missing a noun in infer the appropriate for **all** precisions between appropriate and for.


Currently, ``hls4ml`` can parse most Keras layers, including core layers, convolutional layers, pooling layers, recurrent layers, merging/reshaping layers and activation layers, implemented either via sequential or functional API. Notably missing are the attention and normalization layers. The ``Lambda`` layers don't save their state in the serialized format and are thus impossible to parse. In this case, the ``Lambda`` layers can be implemented as custom layers and parsed via the :ref:`Extension API`.
For Keras v2, QKeras, and HGQ, ``hls4ml`` supports most of its layers, including core layers, convolutional layers, pooling layers, recurrent layers (not implemented in HGQ), merging/reshaping layers, and activation layers. For normalization layers, only the ``(Q)BatchNormalization`` layer is supported.
Contributor:

LayerNormalization is now supported for Vitis/Vivado backends.

@@ -58,19 +58,42 @@ This parser reads the attributes of the Keras layer instance and populates a dic
It also returns a list of output shapes (one shape for each output).
In this case, there is a single output with the same shape as the input.

.. code-block:: Python
.. tabs::
Contributor:

This part only results in empty tabs for both Keras v2 and v3. The Python code examples are both shown, but disappear completely once you click on the tabs.
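For reference, the handler/registration pair under discussion looks roughly like this for Keras v2 (a sketch based on the KReverse example in this PR; the exact handler signature is assumed from the extension API docs):

```python
import hls4ml

def parse_reverse_layer(keras_layer, input_names, input_shapes, data_reader):
    # Populate the dictionary of layer attributes hls4ml expects
    layer = {}
    layer['class_name'] = 'KReverse'
    layer['name'] = keras_layer['config']['name']
    if input_names is not None:
        layer['inputs'] = input_names
    # One output with the same shape as the input
    return layer, [shape for shape in input_shapes]

# Register the converter for the custom Keras v2 layer
hls4ml.converters.register_keras_v2_layer_handler('KReverse', parse_reverse_layer)
```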

@@ -140,7 +140,8 @@ In this case, the HLS code is valid for both the Vivado and Quartus backends.
.. code-block:: Python

# Register the converter for custom Keras layer
hls4ml.converters.register_keras_layer_handler('KReverse', parse_reverse_layer)
hls4ml.converters.register_keras_v2_layer_handler('KReverse', parse_reverse_layer)
Contributor:

Would be nice to add the tabs here too once they are figured out.


.. code-block:: Python
Key Features
-----------
Contributor:

Still to do


* (Q)Keras
* Keras
* Keras v2
Contributor:

The layout of this part is messed up:
[screenshot of the broken list rendering]

+-----------------------+-----+-----+--------------+--------+--------+-----+
| Keras v3 | ✅ | ✅ | ✅ | N/A | ✅ | ❌ |
+-----------------------+-----+-----+--------------+--------+--------+-----+
| HGQ2 | ✅ | ✅ | N/A | N/A | ✅ | ✅ |
Contributor:

Would be nice to add a note to clarify what "N/A" is supposed to mean here. I think it could be a bit confusing.

@bo3z bo3z left a comment

LGTM; my comments are mostly minor fixes.

Comment on lines +128 to +139
Distributed arithmetic:
```bibtex
@misc{Sun:2025,
title={da4ml: Distributed Arithmetic for Real-time Neural Networks on FPGAs},
author={Chang Sun and others},
year={2025},
eprint={2507.04535},
archivePrefix={arXiv},
primaryClass={cs.AR},
url={https://arxiv.org/abs/2507.04535},
}
```
Contributor:

Should also update CITATION.cff

Comment on lines -17 to -22
If you want to use our :doc:`profiling <../advanced/profiling>` toolbox, you might need to install extra dependencies:

.. code-block::

pip install hls4ml[profiling]

Contributor:

Why remove this? It still exists, no?

Contributor (author):

Back then there was only profiling in advanced. However, now we have 10 different items in advanced and it doesn't make too much sense to put profiling only here.

Contributor:

Should we not list all options somewhere?

@@ -0,0 +1,49 @@
===================================
Contributor:

I agree

Model-wise Precision Inference
==============================

The model-wise precision inference (implemented in :py:class:`~hls4ml.model.optimizer.passes.bit_exact.BitExact`) attempts to infer the appropriate configuration for **all** precision in the model. Unlike the automatic precision inference, this pass disregards all user-defined precision, and "trust" only data embedded in the model, i.e., the actual values of the weights and explicit quantizers defined between layers.
Contributor:

trusts?

Contributor (author):

Meaning only those are used as source of precision derivation


The model-wise precision inference (implemented in :py:class:`~hls4ml.model.optimizer.passes.bit_exact.BitExact`) attempts to infer the appropriate configuration for **all** precision in the model. Unlike the automatic precision inference, this pass disregards all user-defined precision, and "trust" only data embedded in the model, i.e., the actual values of the weights and explicit quantizers defined between layers.

This pass uses a modified symbolic interval arithmetic to compute the ranges and the needed quantization steps for all precision in the model graph, with the goal of eliminating any discrepancy between the quantized model and the original model. In the inference process, only the raw weight values and the explicit quantizers (either ``FixedPointQuantizer``, or ``linear/relu`` layers with ``trusted=True``) are considered as sources of precision information. All other precision information (e.g., user-defined precision in ``config_from_*`` functions) will not be used in the inference process.
Contributor:

Without a?
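To illustrate the interval-arithmetic idea in the paragraph above, here is a toy sketch (for intuition only; it is not the actual ``BitExact`` implementation):

```python
import numpy as np

# Propagate an input interval (e.g. from an explicit quantizer) through the raw
# weights of a dense layer to get the output range, and hence the integer bits
# an exact accumulator would need.
w = np.array([[0.75, -1.5],
              [2.0,   0.25]])
x_lo, x_hi = np.full(2, -1.0), np.full(2, 1.0)

w_pos, w_neg = np.maximum(w, 0.0), np.minimum(w, 0.0)
y_lo = x_lo @ w_pos + x_hi @ w_neg   # per-output worst-case minimum
y_hi = x_hi @ w_pos + x_lo @ w_neg   # per-output worst-case maximum

int_bits = int(np.ceil(np.log2(np.abs([y_lo, y_hi]).max() + 1))) + 1   # +1 for the sign bit
print(y_lo, y_hi, int_bits)
```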

- When converting from ``HGQ/HGQ2`` models, this pass is automatically enabled unless ``bit_exact`` is explicitly set to ``False``.
- For other models, this pass can be enabled by setting ``bit_exact`` to ``True``. Currently, only ``QKeras`` sets this key automatically when converting from ``QKeras`` models. Support for ``QONNX`` is planned but not yet implemented.

If the original model is not properly quantized, this pass will lead to huge bitwidths in the model. In this context, properly quantized models are those that have quantizers defined between **all layers with non-trivial arithmetics**. The successful application of this pass should result in bit-exact model, i.e., the quantized model should produce the same outputs as the original model for all inputs [*]_.
Contributor:

Should we make explicit what non-trivial arithmetic means?

Contributor (author):

Essentially everything, except reshape, flatten, and linear (why would you put them at input though?). Will add.

Contributor:

You haven't been in hls4ml long enough 😉. People regularly put reshape or flatten as the first layer in order to simplify IP integration. And yes, I agree with what you're thinking reading that sentence. Let's not comment on this practice further, just make the docs clear.
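To make "properly quantized" concrete, here is a QKeras-style sketch in which every layer with non-trivial arithmetic sees explicit quantizers (standard QKeras API; whether this exact model goes through the bit-exact flow depends on #1338):

```python
from tensorflow.keras import Input, Model
from qkeras import QActivation, QDense, quantized_bits, quantized_relu

inp = Input((16,))
x = QActivation(quantized_bits(8, 0, alpha=1))(inp)   # explicit quantizer at the input, giving a defined starting range
x = QDense(32,
           kernel_quantizer=quantized_bits(6, 0, alpha=1),
           bias_quantizer=quantized_bits(6, 0, alpha=1))(x)
x = QActivation(quantized_relu(6))(x)                  # explicit quantizer between layers
out = QDense(5,
             kernel_quantizer=quantized_bits(6, 0, alpha=1),
             bias_quantizer=quantized_bits(6, 0, alpha=1))(x)
model = Model(inp, out)
```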

.. [*] While quantized, the original model will still operate on floating-point values, so there is a chance that the outputs will not be exactly the same due to float rounding errors in the original model.

.. note::
Unlike the automatic precision inference, it is strongly recommended to **not** use the ``config_from_*`` functions to set the precisions in the model. Automatic precision inference and this pass cannot be used simultaneously.
Contributor:

Can we make this statement more specific? It sounds like users should never use config_from_*?

@@ -158,12 +161,12 @@ For Vivado backend the options are:
* **Part**\ : the particular FPGA part number that you are considering, here it's a Xilinx Virtex UltraScale+ VU13P FPGA
* **ClockPeriod**\ : the clock period, in ns, at which your algorithm runs
Then you have some optimization parameters for how your algorithm runs:
* **IOType**\ : your options are ``io_parallel`` or ``io_stream`` which defines the type of data structure used for inputs, intermediate activations between layers, and outputs. For ``io_parallel``, arrays are used that, in principle, can be fully unrolled and are typically implemented in RAMs. For ``io_stream``, HLS streams are used, which are a more efficient/scalable mechanism to represent data that are produced and consumed in a sequential manner. Typically, HLS streams are implemented with FIFOs instead of RAMs. For more information see `here <https://docs.xilinx.com/r/en-US/ug1399-vitis-hls/pragma-HLS-stream>`__.
* **HLSConfig**\: the detailed configuration of precision and parallelism, including:
* **IOType**\ : your options are ``io_parallel`` or ``io_stream`` which defines how data is transferred into and out of the HLS model IP, and how the data is transferred between layers. For ``io_parallel``, data are directly wired between layers fully in parallel. For ``io_stream``, HLS streams are used, which instantiates as stateful FIFO buffers, which effectively decouples the producer and consumer (upstream and downstream in a neural network), removing the need of a global state machine coordinating the exact timing for io operations. This is particular useful with the DATAFLOW pipeline style. For more information, see `here <https://docs.xilinx.com/r/en-US/ug1399-vitis-hls/pragma-HLS-stream>`__.
Contributor:

data is directly wired?

Contributor (author):

Meaning in the generated rtl the interface is a bunch of wires without any particular structure or storage mechanism implemented. The exact implementation could be backend dependent, but in most cases there is no "RAM" used.
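For reference, the IOType being discussed is selected at conversion time (a minimal sketch with the standard converter API):

```python
import hls4ml

config = hls4ml.utils.config_from_keras_model(model, granularity='model')

# io_parallel: layers exchange data as plain parallel arrays/wires
# io_stream:   layers exchange data through HLS stream (FIFO) interfaces
hls_model = hls4ml.converters.convert_from_keras_model(
    model, hls_config=config, io_type='io_stream', backend='Vitis', output_dir='hls_prj'
)
```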


* **ReuseFactor**\ : in the case that you are pipelining, this defines the pipeline interval or initiation interval
* **ParallelizationFactor**\ : The number of output "pixels" to compute in parallel in convolutional layers. Increasing this parameter results in significant increase in resources required on the FPGA.
* **Strategy**\ : Optimization strategy on FPGA, either "Latency", "Resource" or "Unrolled". If none is supplied then hl4ml uses "Latency" as default. Note that a reuse factor larger than 1 should be specified when using "resource" or "unrolled" strategy. An example of using larger reuse factor can be found `here. <https://github.com/fastmachinelearning/models/tree/master/keras/KERAS_dense>`__
* **Strategy**\ : Optimization strategy on FPGA, either "Latency", "Resource", "distributed_arithmetic" (or "da"), or "Unrolled". If none is supplied then hl4ml uses "Latency" as default. Note that a reuse factor must be 1 if using "distributed_arithmetic", and must be larger than 1 when using "resource" or "unrolled" strategy.
Contributor:

Resource works with RF = 1...but for Vivado/Vitis it produces a weird circuit that's something between Latency and Resource (it doesn't use BRAM as expected). So maybe not use the term must, but maybe should?

The ``data_format='channels_first'`` parameter of Keras layers is supported, but not extensively tested. All HLS implementations in ``hls4ml`` are based on ``channels_last`` data format and need to be converted to that format before the HLS code can be emitted. We encourage users of ``channels_first`` to report their experiences to developers on GitHub.
For Keras v3, the support for EinsumDense layer is added. For HGQ2, the following layers are supported in addition: `QEinsum`, `QMultiHeadAttention`, `QUnaryFunctionLUT` (arbitrary unary function as a 1-d lookup table), and some binary operators.

keras `Operators` that are not layers are generally not supported in ``hls4ml``. This includes operators such as `Add`, `Subtract`, `Multiply`, and `Divide`. Please use the corresponding Keras layers instead.
Contributor:

Capitalized?

Contributor (author):

The operators here are specific Keras ones, e.g. keras.src.ops.numpy.Add, thus capitalized.
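As an illustration of the point above, the supported pattern is the layer form rather than the bare operator (plain Keras, nothing hls4ml-specific):

```python
import keras

a = keras.Input((8,))
b = keras.Input((8,))

# Not parseable by hls4ml: `a + b` lowers to a Keras op such as keras.src.ops.numpy.Add
# y = a + b

# Supported: use the corresponding layer instead
y = keras.layers.Add()([a, b])
model = keras.Model([a, b], y)
```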
