
Conversation

@calad0i calad0i commented Aug 7, 2025

Description

Adding & changing doc for HGQ2 and da4ml

If #1338 is merged, the precision propagation part will be updated accordingly

Type of change

  • Documentation update

Tests

N/A

Checklist

  • all

@vloncar vloncar left a comment

I think minor improvements to the docs could still be made

@@ -140,7 +140,8 @@ In this case, the HLS code is valid for both the Vivado and Quartus backends.
.. code-block:: Python

# Register the converter for custom Keras layer
hls4ml.converters.register_keras_layer_handler('KReverse', parse_reverse_layer)
hls4ml.converters.register_keras_v2_layer_handler('KReverse', parse_reverse_layer)
Contributor:

Would be nice to add the tabs here too once they are figured out.

======================================

.. note::
HGQ2 is the successor of the original `HGQ <./hgq1.html>`__ framework, which was built on Keras v2. HGQ2 is built on top of Keras v3, leveraging its new features and improvements.
Contributor:

Maybe also add why people should switch, not just cuz it is Keras v3


.. code-block:: Python
Key Features
-----------
Contributor:

Lacks a -

Contributor:

Still to do

@@ -0,0 +1,49 @@
===================================
Contributor:

Can we put both of these under a subsection of "advanced"? Another option is to introduce a new section called "Concepts" and put the subsection there. And if the advice for new code is HGQ2, maybe add a note here and title this one as (deprecated).

Contributor:

I agree

.. [*] While quantized, the original model will still operate on floating-point values, so there is a chance that the outputs will not be exactly the same due to float rounding errors in the original model.

.. note::
Unlike the automatic precision inference, it is strongly recommended to **not** use the ``config_from_*`` functions to set the precisions in the model.
Contributor:

This is just confusing now. Why can't we say in HGQ docs that this is the bit-exact flow and users should not invoke the config_from_.... We can even make it fail if it encounters HGQ layers. And then merge this section with the HGQ

Contributor (author):

After #1338 is merged, this flow is no longer exclusive to HGQ (QKeras works for now with that PR; QONNX should be possible but currently triggers a weird bug, so not yet) and can be invoked with bit_exact=True in the converter config.
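For readers following the thread, invoking that flow at conversion time would look roughly like this (a sketch only; the `bit_exact` keyword is taken from the discussion above and #1338, so its exact spelling and availability may differ in a given release):

```python
import hls4ml

# Sketch: convert a properly quantized Keras model with the model-wise
# (bit-exact) precision inference enabled. `bit_exact=True` is the keyword
# discussed in this thread; it may not exist before #1338 is merged.
# Per the docs above, user precisions from config_from_* are ignored by this pass.
hls_model = hls4ml.converters.convert_from_keras_model(
    model,                      # assumed: an already quantized Keras model
    backend='Vitis',
    output_dir='hls_prj',
    bit_exact=True,
)
```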


* **ReuseFactor**\ : in the case that you are pipelining, this defines the pipeline interval or initiation interval
* **ParallelizationFactor**\ : The number of output "pixels" to compute in parallel in convolutional layers. Increasing this parameter results in significant increase in resources required on the FPGA.
* **Strategy**\ : Optimization strategy on FPGA, either "Latency", "Resource" or "Unrolled". If none is supplied then hl4ml uses "Latency" as default. Note that a reuse factor larger than 1 should be specified when using "resource" or "unrolled" strategy. An example of using larger reuse factor can be found `here. <https://github.com/fastmachinelearning/models/tree/master/keras/KERAS_dense>`__
* **Strategy**\ : Optimization strategy on FPGA, either "Latency", "Resource", "distributed_arithmetic" (or "da"), or "Unrolled". If none is supplied then hl4ml uses "Latency" as default. Note that a reuse factor must be 1 if using "distributed_arithmetic", and must be larger than 1 when using "resource" or "unrolled" strategy.
Contributor:

I still feel da strategy should have been in InitCase and not snake_case. (or just use the word da so whatever the user puts lower() will match it.) Too late now...

Contributor:

Actually, RF of 1 is allowed with Resource strategy, at least for the Intel/Altera backends, since there is no latency. Very often you use RF=1 with Resource strategy in those setups. You might want to clarify that what you are saying is for Vitis/Vivado only.

Contributor:

Resource works with RF = 1...but for Vivado/Vitis it produces a weird circuit that's something between Latency and Resource (it doesn't use BRAM as expected). So maybe not use the term must, but maybe should?
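For context, the options in this list are normally set on the config dictionary before conversion, e.g. (a minimal sketch using the standard `config_from_keras_model`/`convert_from_keras_model` API; the backend-specific caveats about ReuseFactor discussed above still apply):

```python
import hls4ml

# Model-level settings; per-layer overrides live under config['LayerName']
config = hls4ml.utils.config_from_keras_model(model, granularity='model')
config['Model']['Strategy'] = 'Resource'    # or 'Latency', 'Unrolled', 'distributed_arithmetic'
config['Model']['ReuseFactor'] = 8          # initiation interval when pipelining

hls_model = hls4ml.converters.convert_from_keras_model(
    model, hls_config=config, backend='Vitis', output_dir='hls_prj'
)
```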

* `HGQ <https://github.com/calad0i/HGQ>`_
The equivalent HGQ API is also supported. HGQ is not compatible with Keras v3. See `advanced/HGQ <../advanced/hgq.html>`__ for more information.
The equivalent HGQ API is also supported.
Contributor:

Add a note that it is deprecated in favor of HGQ2?

Running C simulation from Python requires a C++11-compatible compiler. On Linux, a GCC C++ compiler ``g++`` is required. Any version from a recent
Linux should work. On MacOS, the *clang*-based ``g++`` is enough. For the oneAPI backend, one must have oneAPI installed, along with the FPGA compiler,
to run C/SYCL simulations.
Running C simulation from Python requires a C++11-compatible compiler. On Linux, a GCC C++ compiler ``g++`` is required. Any version from a recent Linux should work. On MacOS, the *clang*-based ``g++`` is enough. For the oneAPI backend, one must have oneAPI installed, along with the FPGA compiler, to run C/SYCL simulations.
Contributor:

Should we add that it is not just any oneAPI it is only 2025.0?

Contributor:

As an aside, on the Mac, clang seemed to stop working, due to firmware/ap_types/ap_int_special.h:60:7: error: reference to 'complex' is ambiguous. I have to use brew-installed gcc now.
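For context, the compiler requirement in this thread comes from the C-simulation flow, which compiles the generated C++ behind the scenes; a minimal sketch (assuming `hls_model` is an already converted model with 16 input features):

```python
import numpy as np

hls_model.compile()                   # builds the generated HLS C++ into a shared library with g++/clang
x = np.random.rand(10, 16).astype(np.float32)
y_csim = hls_model.predict(x)         # runs the compiled C simulation on the inputs
```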

+-----------------------+-----+-----+--------------+--------+--------+-----+
| Keras v3 | ✅ | ✅ | ✅ | N/A | ✅ | ❌ |
+-----------------------+-----+-----+--------------+--------+--------+-----+
| HGQ2 | ✅ | ✅ | N/A | N/A | ✅ | ✅ |
Contributor:

why do you put N/A instead of ❌? keras v2 has einsum(dense), we just don't support it. brevitas supports rnn layers, so i think it is possible to get qonnx version too, we just don't support it. i think all of N/A should be ❌

Contributor (author):

(q)onnx n/a was from the original table. RNN is indeed added for qonnx now, so we can put an x there now. I have no idea how garnet support is defined. Since PyG was used in the original paper, I assumed anything not working with PyG is an N/A.

Contributor:

Would be nice to add a note to clarify what "N/A" is supposed to mean here. I think it could be a bit confusing.

* ``hls4ml`` supports Linux and requires python \>=3.10. ``hls4ml`` does not require a specific Linux distribution version and we recommend following the requirements of the HLS tool you are using.
* Windows and macOS are not supported. Setting up ``hls4ml`` on these platforms, for example using the Windows Subsystem for Linux (WSL) should be possible, but we do not provide support for such use cases.

- Vivado HLS versions 2018.2 to 2020.1
Contributor:

Can we just say Vivado 2020.1? It sorta works with earlier versions, but it is always as "if you use this version, you're on your own".

Same for Vitis, we can say >=2022.2, versions after 2024.1 are tested less

- Intel HLS versions 20.1 to 21.4, versions > 21.4 have not been tested.
- Vitis HLS versions 2022.2 to 2024.1. Versions <= 2022.1 are known not to work.
- Catapult HLS versions 2024.1_1 to 2024.2
- oneAPI versions 2024.1 to 2025.0. 2025.1 is known to not work.
Contributor:

For oneAPI, all versions after 2025.0 will not work. Support will come in the future directly from Altera.

@calad0i calad0i force-pushed the doc branch 2 times, most recently from dee2c21 to 0055bb6 on August 18, 2025 19:22
@bo3z bo3z added this to the v1.2.0 milestone Sep 5, 2025
Model-wise Precision Inference
==============================

The model-wise precision inference (implemented in :py:class:`~hls4ml.model.optimizer.passes.bit_exact.BitExact`) attempts to infer the appropriate for **all** precisions in the model. Unlike the automatic precision inference, this pass disregards all user-defined precisions, and "trust" only data embedded in the model, i.e., the actual values of the weights and explicit quantizers defined between layers.
Contributor:

This seems to be missing a noun in infer the appropriate for **all** precisions between appropriate and for.


Currently, ``hls4ml`` can parse most Keras layers, including core layers, convolutional layers, pooling layers, recurrent layers, merging/reshaping layers and activation layers, implemented either via sequential or functional API. Notably missing are the attention and normalization layers. The ``Lambda`` layers don't save their state in the serialized format and are thus impossible to parse. In this case, the ``Lambda`` layers can be implemented as custom layers and parsed via the :ref:`Extension API`.
For Keras v2, QKeras, and HGQ, ``hls4ml`` supports most of its layers, including core layers, convolutional layers, pooling layers, recurrent layers (not implemented in HGQ), merging/reshaping layers, and activation layers. For normalization layers, only the ``(Q)BatchNormalization`` layer is supported.
Contributor:

LayerNormalization is now supported for Vitis/Vivado backends.

@@ -58,19 +58,42 @@ This parser reads the attributes of the Keras layer instance and populates a dic
It also returns a list of output shapes (one shape for each output).
In this case, there is a single output with the same shape as the input.

.. code-block:: Python
.. tabs::
Contributor:

This part only results in empty tabs for both Keras v2 and v3. The Python code examples are both shown, but disappear completely once you click on the tabs.
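For reference, the handler/registration pair under discussion looks roughly like this for Keras v2 (a sketch based on the KReverse example in this PR; the exact handler signature is assumed from the extension API docs):

```python
import hls4ml

def parse_reverse_layer(keras_layer, input_names, input_shapes, data_reader):
    # Populate the dictionary of layer attributes hls4ml expects
    layer = {}
    layer['class_name'] = 'KReverse'
    layer['name'] = keras_layer['config']['name']
    if input_names is not None:
        layer['inputs'] = input_names
    # One output with the same shape as the input
    return layer, [shape for shape in input_shapes]

# Register the converter for the custom Keras v2 layer
hls4ml.converters.register_keras_v2_layer_handler('KReverse', parse_reverse_layer)
```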

@@ -140,7 +140,8 @@ In this case, the HLS code is valid for both the Vivado and Quartus backends.
.. code-block:: Python

# Register the converter for custom Keras layer
hls4ml.converters.register_keras_layer_handler('KReverse', parse_reverse_layer)
hls4ml.converters.register_keras_v2_layer_handler('KReverse', parse_reverse_layer)
Contributor:

Would be nice to add the tabs here too once they are figured out.


.. code-block:: Python
Key Features
-----------
Contributor:

Still to do


* (Q)Keras
* Keras
* Keras v2
Contributor:

The layout of this part is messed up:
[screenshot of the broken list rendering]

+-----------------------+-----+-----+--------------+--------+--------+-----+
| Keras v3 | ✅ | ✅ | ✅ | N/A | ✅ | ❌ |
+-----------------------+-----+-----+--------------+--------+--------+-----+
| HGQ2 | ✅ | ✅ | N/A | N/A | ✅ | ✅ |
Contributor:

Would be nice to add a note to clarify what "N/A" is supposed to mean here. I think it could be a bit confusing.

@bo3z bo3z left a comment

LGTM; my comments are mostly minor fixes.

Comment on lines +128 to +139
Distributed arithmetic:
```bibtex
@misc{Sun:2025,
title={da4ml: Distributed Arithmetic for Real-time Neural Networks on FPGAs},
author={Chang Sun and others},
year={2025},
eprint={2507.04535},
archivePrefix={arXiv},
primaryClass={cs.AR},
url={https://arxiv.org/abs/2507.04535},
}
```
Contributor:

Should also update CITATION.cff

Comment on lines -17 to -22
If you want to use our :doc:`profiling <../advanced/profiling>` toolbox, you might need to install extra dependencies:

.. code-block::

pip install hls4ml[profiling]

Contributor:

Why remove this? It still exists, no?

Contributor (author):

Back then there was only profiling in advanced. However, now we have 10 different items in advanced and it doesn't make too much sense to put profiling only here.

Contributor:

Should we not list all options somewhere?

@@ -0,0 +1,49 @@
===================================
Contributor:

I agree

Model-wise Precision Inference
==============================

The model-wise precision inference (implemented in :py:class:`~hls4ml.model.optimizer.passes.bit_exact.BitExact`) attempts to infer the appropriate configuration for **all** precision in the model. Unlike the automatic precision inference, this pass disregards all user-defined precision, and "trust" only data embedded in the model, i.e., the actual values of the weights and explicit quantizers defined between layers.
Contributor:

trusts?

Contributor (author):

Meaning only those are used as source of precision derivation


The model-wise precision inference (implemented in :py:class:`~hls4ml.model.optimizer.passes.bit_exact.BitExact`) attempts to infer the appropriate configuration for **all** precision in the model. Unlike the automatic precision inference, this pass disregards all user-defined precision, and "trust" only data embedded in the model, i.e., the actual values of the weights and explicit quantizers defined between layers.

This pass uses a modified symbolic interval arithmetic to compute the ranges and the needed quantization steps for all precision in the model graph, with the goal of eliminating any discrepancy between the quantized model and the original model. In the inference process, only the raw weight values and the explicit quantizers (either ``FixedPointQuantizer``, or ``linear/relu`` layers with ``trusted=True``) are considered as sources of precision information. All other precision information (e.g., user-defined precision in ``config_from_*`` functions) will not be used in the inference process.
Contributor:

Without a?
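To illustrate the interval-arithmetic idea in the paragraph above, here is a toy sketch (for intuition only; it is not the actual ``BitExact`` implementation):

```python
import numpy as np

# Propagate an input interval (e.g. from an explicit quantizer) through the raw
# weights of a dense layer to get the output range, and hence the integer bits
# an exact accumulator would need.
w = np.array([[0.75, -1.5],
              [2.0,   0.25]])
x_lo, x_hi = np.full(2, -1.0), np.full(2, 1.0)

w_pos, w_neg = np.maximum(w, 0.0), np.minimum(w, 0.0)
y_lo = x_lo @ w_pos + x_hi @ w_neg   # per-output worst-case minimum
y_hi = x_hi @ w_pos + x_lo @ w_neg   # per-output worst-case maximum

int_bits = int(np.ceil(np.log2(np.abs([y_lo, y_hi]).max() + 1))) + 1   # +1 for the sign bit
print(y_lo, y_hi, int_bits)
```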

- When converting from ``HGQ/HGQ2`` models, this pass is automatically enabled unless ``bit_exact`` is explicitly set to ``False``.
- For other models, this pass can be enabled by setting ``bit_exact`` to ``True``. Currently, only ``QKeras`` sets this key automatically when converting from ``QKeras`` models. Support for ``QONNX`` is planned but not yet implemented.

If the original model is not properly quantized, this pass will lead to huge bitwidths in the model. In this context, properly quantized models are those that have quantizers defined between **all layers with non-trivial arithmetics**. The successful application of this pass should result in bit-exact model, i.e., the quantized model should produce the same outputs as the original model for all inputs [*]_.
Contributor:

Should we make explicit what non-trivial arithmetic means?

Contributor (author):

Essentially everything, except reshape, flatten, and linear (why would you put them at input though?). Will add.

Contributor:

You haven't been in hls4ml long enough 😉. People regularly put reshape or flatten as the first layer in order to simplify IP integration. And yes, I agree with what you're thinking reading that sentence. Let's not comment on this practice further, just make the docs clear.
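To make "properly quantized" concrete, here is a QKeras-style sketch in which every layer with non-trivial arithmetic sees explicit quantizers (standard QKeras API; whether this exact model goes through the bit-exact flow depends on #1338):

```python
from tensorflow.keras import Input, Model
from qkeras import QActivation, QDense, quantized_bits, quantized_relu

inp = Input((16,))
x = QActivation(quantized_bits(8, 0, alpha=1))(inp)   # explicit quantizer at the input, giving a defined starting range
x = QDense(32,
           kernel_quantizer=quantized_bits(6, 0, alpha=1),
           bias_quantizer=quantized_bits(6, 0, alpha=1))(x)
x = QActivation(quantized_relu(6))(x)                  # explicit quantizer between layers
out = QDense(5,
             kernel_quantizer=quantized_bits(6, 0, alpha=1),
             bias_quantizer=quantized_bits(6, 0, alpha=1))(x)
model = Model(inp, out)
```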

.. [*] While quantized, the original model will still operate on floating-point values, so there is a chance that the outputs will not be exactly the same due to float rounding errors in the original model.

.. note::
Unlike the automatic precision inference, it is strongly recommended to **not** use the ``config_from_*`` functions to set the precisions in the model. Automatic precision inference and this pass cannot be used simultaneously.
Contributor:

Can we make this statement more specific? It sounds like users should never use config_from_*?

@@ -158,12 +161,12 @@ For Vivado backend the options are:
* **Part**\ : the particular FPGA part number that you are considering, here it's a Xilinx Virtex UltraScale+ VU13P FPGA
* **ClockPeriod**\ : the clock period, in ns, at which your algorithm runs
Then you have some optimization parameters for how your algorithm runs:
* **IOType**\ : your options are ``io_parallel`` or ``io_stream`` which defines the type of data structure used for inputs, intermediate activations between layers, and outputs. For ``io_parallel``, arrays are used that, in principle, can be fully unrolled and are typically implemented in RAMs. For ``io_stream``, HLS streams are used, which are a more efficient/scalable mechanism to represent data that are produced and consumed in a sequential manner. Typically, HLS streams are implemented with FIFOs instead of RAMs. For more information see `here <https://docs.xilinx.com/r/en-US/ug1399-vitis-hls/pragma-HLS-stream>`__.
* **HLSConfig**\: the detailed configuration of precision and parallelism, including:
* **IOType**\ : your options are ``io_parallel`` or ``io_stream`` which defines how data is transferred into and out of the HLS model IP, and how the data is transferred between layers. For ``io_parallel``, data are directly wired between layers fully in parallel. For ``io_stream``, HLS streams are used, which instantiates as stateful FIFO buffers, which effectively decouples the producer and consumer (upstream and downstream in a neural network), removing the need of a global state machine coordinating the exact timing for io operations. This is particular useful with the DATAFLOW pipeline style. For more information, see `here <https://docs.xilinx.com/r/en-US/ug1399-vitis-hls/pragma-HLS-stream>`__.
Contributor:

data is directly wired?

Contributor (author):

Meaning in the generated rtl the interface is a bunch of wires without any particular structure or storage mechanism implemented. The exact implementation could be backend dependent, but in most cases there is no "RAM" used.
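For reference, the IOType being discussed is selected at conversion time (a minimal sketch with the standard converter API):

```python
import hls4ml

config = hls4ml.utils.config_from_keras_model(model, granularity='model')

# io_parallel: layers exchange data as plain parallel arrays/wires
# io_stream:   layers exchange data through HLS stream (FIFO) interfaces
hls_model = hls4ml.converters.convert_from_keras_model(
    model, hls_config=config, io_type='io_stream', backend='Vitis', output_dir='hls_prj'
)
```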


* **ReuseFactor**\ : in the case that you are pipelining, this defines the pipeline interval or initiation interval
* **ParallelizationFactor**\ : The number of output "pixels" to compute in parallel in convolutional layers. Increasing this parameter results in significant increase in resources required on the FPGA.
* **Strategy**\ : Optimization strategy on FPGA, either "Latency", "Resource" or "Unrolled". If none is supplied then hl4ml uses "Latency" as default. Note that a reuse factor larger than 1 should be specified when using "resource" or "unrolled" strategy. An example of using larger reuse factor can be found `here. <https://github.com/fastmachinelearning/models/tree/master/keras/KERAS_dense>`__
* **Strategy**\ : Optimization strategy on FPGA, either "Latency", "Resource", "distributed_arithmetic" (or "da"), or "Unrolled". If none is supplied then hl4ml uses "Latency" as default. Note that a reuse factor must be 1 if using "distributed_arithmetic", and must be larger than 1 when using "resource" or "unrolled" strategy.
Contributor:

Resource works with RF = 1...but for Vivado/Vitis it produces a weird circuit that's something between Latency and Resource (it doesn't use BRAM as expected). So maybe not use the term must, but maybe should?

The ``data_format='channels_first'`` parameter of Keras layers is supported, but not extensively tested. All HLS implementations in ``hls4ml`` are based on ``channels_last`` data format and need to be converted to that format before the HLS code can be emitted. We encourage users of ``channels_first`` to report their experiences to developers on GitHub.
For Keras v3, the support for EinsumDense layer is added. For HGQ2, the following layers are supported in addition: `QEinsum`, `QMultiHeadAttention`, `QUnaryFunctionLUT` (arbitrary unary function as a 1-d lookup table), and some binary operators.

keras `Operators` that are not layers are generally not supported in ``hls4ml``. This includes operators such as `Add`, `Subtract`, `Multiply`, and `Divide`. Please use the corresponding Keras layers instead.
Contributor:

Capitalized?

Contributor (author):

The operators here are specific Keras ones, e.g. keras.src.ops.numpy.Add, thus capitalized.
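As an illustration of the point above, the supported pattern is the layer form rather than the bare operator (plain Keras, nothing hls4ml-specific):

```python
import keras

a = keras.Input((8,))
b = keras.Input((8,))

# Not parseable by hls4ml: `a + b` lowers to a Keras op such as keras.src.ops.numpy.Add
# y = a + b

# Supported: use the corresponding layer instead
y = keras.layers.Add()([a, b])
model = keras.Model([a, b], y)
```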
