doc update for hgq/da #1359
Conversation
I think minor improvements to the docs could still be made
@@ -140,7 +140,8 @@ In this case, the HLS code is valid for both the Vivado and Quartus backends.
 .. code-block:: Python

     # Register the converter for custom Keras layer
-    hls4ml.converters.register_keras_layer_handler('KReverse', parse_reverse_layer)
+    hls4ml.converters.register_keras_v2_layer_handler('KReverse', parse_reverse_layer)
I think we can make these blocks have tabs for v2 and v3. See:
https://sublime-and-sphinx-guide.readthedocs.io/en/latest/code_blocks.html#code-examples-in-multiple-languages
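For reference, the pattern from that guide would look roughly like this (a sketch: it assumes the `sphinx_tabs.tabs` extension is enabled in `conf.py`, and the v3 handler name is hypothetical):

```rst
.. tabs::

   .. code-tab:: python Keras v2

      # Register the converter for custom Keras layer (v2 handler)
      hls4ml.converters.register_keras_v2_layer_handler('KReverse', parse_reverse_layer)

   .. code-tab:: python Keras v3

      # Hypothetical v3 counterpart; the actual handler name may differ
      hls4ml.converters.register_keras_v3_layer_handler('KReverse', parse_reverse_layer)
```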
Would be nice to add the tabs here too once they are figured out.
docs/advanced/hgq.rst
Outdated
======================================

.. note::
   HGQ2 is the successor of the original `HGQ <./hgq1.html>`__ framework, which was built on Keras v2. HGQ2 is built on top of Keras v3, leveraging its new features and improvements.
Maybe also add why people should switch, not just cuz it is Keras v3
docs/advanced/hgq.rst
Outdated
.. code-block:: Python

Key Features
-----------
Lacks a -
Still to do
@@ -0,0 +1,49 @@
=================================== |
Can we put both of these under a subsection of "advanced"? Another option is to introduce a new section called "Concepts" and put the subsection there. And if the advice for new code is HGQ2, maybe add a note here and retitle this one to "(deprecated)".
I agree
docs/advanced/precision.rst
Outdated
.. [*] While quantized, the original model will still operate on floating-point values, so there is a chance that the outputs will not be exactly the same due to float rounding errors in the original model.

.. note::
   Unlike the automatic precision inference, it is strongly recommended to **not** use the ``config_from_*`` functions to set the precisions in the model.
This is just confusing now. Why can't we say in the HGQ docs that this is the bit-exact flow and users should not invoke `config_from_...`? We can even make it fail if it encounters HGQ layers. And then merge this section with the HGQ one.
After #1338 is merged, this flow is not exclusive to HGQ (QKeras for now with that PR; QONNX shall be possible but is triggering a weird bug, so not yet) and can be invoked with `bit_exact=True` in the converter config.
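A rough sketch of what that invocation could look like (the `bit_exact` keyword name and placement are assumptions based on this comment, not a confirmed API):

```python
import hls4ml

# Sketch only: explicitly enable the bit-exact (model-wise precision
# inference) flow for a non-HGQ model, assuming the post-#1338 interface.
hls_model = hls4ml.converters.convert_from_keras_model(
    model,            # an already-quantized (e.g. QKeras) model
    backend='Vitis',
    bit_exact=True,   # keyword placement assumed from the comment above
)
```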
* **ReuseFactor**\ : in the case that you are pipelining, this defines the pipeline interval or initiation interval
* **ParallelizationFactor**\ : The number of output "pixels" to compute in parallel in convolutional layers. Increasing this parameter results in significant increase in resources required on the FPGA.
-* **Strategy**\ : Optimization strategy on FPGA, either "Latency", "Resource" or "Unrolled". If none is supplied then hls4ml uses "Latency" as default. Note that a reuse factor larger than 1 should be specified when using "resource" or "unrolled" strategy. An example of using larger reuse factor can be found `here. <https://github.com/fastmachinelearning/models/tree/master/keras/KERAS_dense>`__
+* **Strategy**\ : Optimization strategy on FPGA, either "Latency", "Resource", "distributed_arithmetic" (or "da"), or "Unrolled". If none is supplied then hls4ml uses "Latency" as default. Note that a reuse factor must be 1 if using "distributed_arithmetic", and must be larger than 1 when using "resource" or "unrolled" strategy.
I still feel `da` strategy should have been in InitCase and not snake_case. (Or just use the word `da` so whatever the user puts, `lower()` will match it.) Too late now...
Actually, RF of 1 is allowed with Resource strategy, at least for the Intel/Altera backends, since there is no latency. Very often you use RF=1 with Resource strategy in those setups. You might want to clarify that what you are saying is for Vitis/Vivado only.
Resource works with RF = 1...but for Vivado/Vitis it produces a weird circuit that's something between Latency and Resource (it doesn't use BRAM as expected). So maybe not use the term must, but maybe should?
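For readers following the thread, these knobs live in the HLS config dict, roughly like this (a minimal sketch; `model` is assumed to be a Keras model):

```python
import hls4ml

# Build a model-level config, then pick a strategy and reuse factor.
config = hls4ml.utils.config_from_keras_model(model, granularity='model')
config['Model']['Strategy'] = 'Resource'   # or 'Latency', 'Unrolled', 'distributed_arithmetic'
config['Model']['ReuseFactor'] = 4         # RF > 1 is the typical pairing for Resource on Vivado/Vitis
```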
docs/frontend/keras.rst
Outdated
 * `HGQ <https://github.com/calad0i/HGQ>`_
-  The equivalent HGQ API is also supported. HGQ is not compatible with Keras v3. See `advanced/HGQ <../advanced/hgq.html>`__ for more information.
+  The equivalent HGQ API is also supported.
Add a note that it is deprecated in favor of HGQ2?
docs/intro/setup.rst
Outdated
-Running C simulation from Python requires a C++11-compatible compiler. On Linux, a GCC C++ compiler ``g++`` is required. Any version from a recent
-Linux should work. On MacOS, the *clang*-based ``g++`` is enough. For the oneAPI backend, one must have oneAPI installed, along with the FPGA compiler,
-to run C/SYCL simulations.
+Running C simulation from Python requires a C++11-compatible compiler. On Linux, a GCC C++ compiler ``g++`` is required. Any version from a recent Linux should work. On MacOS, the *clang*-based ``g++`` is enough. For the oneAPI backend, one must have oneAPI installed, along with the FPGA compiler, to run C/SYCL simulations.
Should we add that it is not just any oneAPI, it is only 2025.0?
As an aside, on the Mac, clang seemed to stop working, due to `firmware/ap_types/ap_int_special.h:60:7: error: reference to 'complex' is ambiguous`. I have to use brew-installed gcc now.
+-----------------------+-----+-----+--------------+--------+--------+-----+
| Keras v3 | ✅ | ✅ | ✅ | N/A | ✅ | ❌ |
+-----------------------+-----+-----+--------------+--------+--------+-----+
| HGQ2 | ✅ | ✅ | N/A | N/A | ✅ | ✅ |
why do you put `N/A` instead of ❌? keras v2 has einsum(dense), we just don't support it. brevitas supports rnn layers, so i think it is possible to get qonnx version too, we just don't support it. i think all of `N/A` should be ❌
(q)onnx n/a was from the original table. RNN is indeed added for qonnx now, so we can put an x there now. I have no idea how garnet support is defined. Since PyG was used in the original paper, I assumed anything not working with PyG is an N/A.
Would be nice to add a note to clarify what "N/A" is supposed to mean here. I think it could be a bit confusing.
docs/intro/status.rst
Outdated
* ``hls4ml`` supports Linux and requires python \>=3.10. hls4ml does not require a specific Linux distribution version and we recommend following the requirements of the HLS tool you are using.
* Windows and macOS are not supported. Setting up ``hls4ml`` on these platforms, for example using the Windows Subsystem for Linux (WSL), should be possible, but we do not provide support for such use cases.

- Vivado HLS versions 2018.2 to 2020.1
Can we just say Vivado 2020.1? It sorta works with earlier versions, but it is always a case of "if you use this version, you're on your own". Same for Vitis: we can say >=2022.2; versions after 2024.1 are tested less.
docs/intro/status.rst
Outdated
- Intel HLS versions 20.1 to 21.4, versions > 21.4 have not been tested.
- Vitis HLS versions 2022.2 to 2024.1. Versions <= 2022.1 are known not to work.
- Catapult HLS versions 2024.1_1 to 2024.2
- oneAPI versions 2024.1 to 2025.0. 2025.1 is known to not work.
For oneAPI, all versions after 2025.0 will not work. Support will come in the future directly from Altera.
docs/advanced/precision.rst
Outdated
Model-wise Precision Inference
==============================

The model-wise precision inference (implemented in :py:class:`~hls4ml.model.optimizer.passes.bit_exact.BitExact`) attempts to infer the appropriate for **all** precisions in the model. Unlike the automatic precision inference, this pass disregards all user-defined precisions, and "trust" only data embedded in the model, i.e., the actual values of the weights and explicit quantizers defined between layers.
This seems to be missing a noun in `infer the appropriate for **all** precisions`, between "appropriate" and "for".
-Currently, ``hls4ml`` can parse most Keras layers, including core layers, convolutional layers, pooling layers, recurrent layers, merging/reshaping layers and activation layers, implemented either via sequential or functional API. Notably missing are the attention and normalization layers. The ``Lambda`` layers don't save their state in the serialized format and are thus impossible to parse. In this case, the ``Lambda`` layers can be implemented as custom layers and parsed via the :ref:`Extension API`.
+For Keras v2, QKeras, and HGQ, ``hls4ml`` supports most of its layers, including core layers, convolutional layers, pooling layers, recurrent layers (not implemented in HGQ), merging/reshaping layers, and activation layers. For normalization layers, only the ``(Q)BatchNormalization`` layer is supported.
LayerNormalization is now supported for Vitis/Vivado backends.
@@ -58,19 +58,42 @@ This parser reads the attributes of the Keras layer instance and populates a dic
 It also returns a list of output shapes (one shape for each output).
 In this case, there is a single output with the same shape as the input.

-.. code-block:: Python
+.. tabs::
This part only results in empty tabs for both keras v2 and v3. The Python code examples are both shown, but disappear completely once you click on the tabs.
* (Q)Keras
* Keras
* Keras v2
LGTM; my comments are mostly minor fixes.
Distributed arithmetic:

```bibtex
@misc{Sun:2025,
      title={da4ml: Distributed Arithmetic for Real-time Neural Networks on FPGAs},
      author={Chang Sun and others},
      year={2025},
      eprint={2507.04535},
      archivePrefix={arXiv},
      primaryClass={cs.AR},
      url={https://arxiv.org/abs/2507.04535},
}
```
Should also update CITATION.cff
-If you want to use our :doc:`profiling <../advanced/profiling>` toolbox, you might need to install extra dependencies:
-
-.. code-block::
-
-   pip install hls4ml[profiling]
Why remove this? It still exists, no?
Back then there was only profiling in advanced. However, now we have 10 different items in advanced and it doesn't make too much sense to put profiling only here.
Should we not list all options somewhere?
Model-wise Precision Inference
==============================

The model-wise precision inference (implemented in :py:class:`~hls4ml.model.optimizer.passes.bit_exact.BitExact`) attempts to infer the appropriate configuration for **all** precision in the model. Unlike the automatic precision inference, this pass disregards all user-defined precision, and "trust" only data embedded in the model, i.e., the actual values of the weights and explicit quantizers defined between layers.
trusts?
Meaning only those are used as sources of precision derivation.
This pass uses a modified symbolic interval arithmetic to compute the ranges and the needed quantization steps for all precision in the model graph, with the goal of eliminating any discrepancy between the quantized model and the original model. In the inference process, only the raw weight values and the explicit quantizers (either ``FixedPointQuantizer``, or ``linear/relu`` layers with ``trusted=True``) are considered as sources of precision information. All other precision information (e.g., user-defined precision in ``config_from_*`` functions) will not be used in the inference process.
Without a?
- When converting from ``HGQ/HGQ2`` models, this pass is automatically enabled unless ``bit_exact`` is explicitly set to ``False``.
- For other models, this pass can be enabled by setting ``bit_exact`` to ``True``. Currently, only ``QKeras`` sets this key automatically when converting from ``QKeras`` models. Support for ``QONNX`` is planned but not yet implemented.

If the original model is not properly quantized, this pass will lead to huge bitwidths in the model. In this context, properly quantized models are those that have quantizers defined between **all layers with non-trivial arithmetics**. The successful application of this pass should result in bit-exact model, i.e., the quantized model should produce the same outputs as the original model for all inputs [*]_.
Should we make explicit what non-trivial arithmetic means?
Essentially everything, except reshape, flatten, and linear (why would you put them at input though?). Will add.
You haven't been in hls4ml long enough 😉. People regularly put reshape or flatten as the first layer in order to simplify IP integration. And yes, I agree with what you're thinking reading that sentence. Let's not comment on this practice further, just make the docs clear.
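A possible wording for the promised clarification (a sketch based on the reply above, not final doc text):

```rst
.. note::
   Layers with *trivial* arithmetic only move or relabel data (e.g. ``Reshape``,
   ``Flatten``, and ``linear`` activations); quantizers are not required around
   them. All other layers count as having non-trivial arithmetic.
```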
.. [*] While quantized, the original model will still operate on floating-point values, so there is a chance that the outputs will not be exactly the same due to float rounding errors in the original model.

.. note::
   Unlike the automatic precision inference, it is strongly recommended to **not** use the ``config_from_*`` functions to set the precisions in the model. Automatic precision inference and this pass cannot be used simultaneously.
Can we make this statement more specific? It sounds like users should never use config_from_*?
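For contrast, the user-driven flow being warned against looks roughly like this (a sketch; the layer name and precision string are placeholders):

```python
import hls4ml

# User-driven precision settings -- ignored (and discouraged) under the
# bit-exact flow, which derives all precisions from the model itself.
config = hls4ml.utils.config_from_keras_model(model, granularity='name')
config['LayerName']['dense1']['Precision']['weight'] = 'ap_fixed<16,6>'  # 'dense1' is a placeholder
```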
@@ -158,12 +161,12 @@ For Vivado backend the options are:
 * **Part**\ : the particular FPGA part number that you are considering, here it's a Xilinx Virtex UltraScale+ VU13P FPGA
 * **ClockPeriod**\ : the clock period, in ns, at which your algorithm runs
 Then you have some optimization parameters for how your algorithm runs:
-* **IOType**\ : your options are ``io_parallel`` or ``io_stream`` which defines the type of data structure used for inputs, intermediate activations between layers, and outputs. For ``io_parallel``, arrays are used that, in principle, can be fully unrolled and are typically implemented in RAMs. For ``io_stream``, HLS streams are used, which are a more efficient/scalable mechanism to represent data that are produced and consumed in a sequential manner. Typically, HLS streams are implemented with FIFOs instead of RAMs. For more information see `here <https://docs.xilinx.com/r/en-US/ug1399-vitis-hls/pragma-HLS-stream>`__.
+* **IOType**\ : your options are ``io_parallel`` or ``io_stream`` which defines how data is transferred into and out of the HLS model IP, and how the data is transferred between layers. For ``io_parallel``, data are directly wired between layers fully in parallel. For ``io_stream``, HLS streams are used, which instantiates as stateful FIFO buffers, which effectively decouples the producer and consumer (upstream and downstream in a neural network), removing the need of a global state machine coordinating the exact timing for io operations. This is particular useful with the DATAFLOW pipeline style. For more information, see `here <https://docs.xilinx.com/r/en-US/ug1399-vitis-hls/pragma-HLS-stream>`__.
 * **HLSConfig**\: the detailed configuration of precision and parallelism, including:
data is directly wired?
Meaning in the generated RTL the interface is a bunch of `wire`s without any particular structure or storage mechanism implemented. The exact implementation could be backend dependent, but in most cases there is no "RAM" used.
The ``data_format='channels_first'`` parameter of Keras layers is supported, but not extensively tested. All HLS implementations in ``hls4ml`` are based on ``channels_last`` data format and need to be converted to that format before the HLS code can be emitted. We encourage users of ``channels_first`` to report their experiences to developers on GitHub.
For Keras v3, the support for EinsumDense layer is added. For HGQ2, the following layers are supported in addition: `QEinsum`, `QMultiHeadAttention`, `QUnaryFunctionLUT` (arbitrary unary function as a 1-d lookup table), and some binary operators.

keras `Operators` that are not layers are generally not supported in ``hls4ml``. This includes operators such as `Add`, `Subtract`, `Multiply`, and `Divide`. Please use the corresponding Keras layers instead.
Capitalized?
The operators are specific keras ones, e.g., `keras.src.ops.numpy.Add` here, thus capitalized.
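To illustrate the distinction (a sketch; `x1`/`x2` stand for Keras tensors):

```python
import keras

# Supported: the explicit layer form.
y = keras.layers.Add()([x1, x2])

# Generally not supported by hls4ml: the operator form, which lowers to a
# keras op (e.g. keras.src.ops.numpy.Add) rather than a layer.
y = x1 + x2
```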
Description
Adding & changing docs for HGQ2 and da4ml.
If #1338 is merged, the precision propagation part will be updated accordingly.
Type of change
Tests
N/A
Checklist