v4.2.0 #692

TysonRayJones · 2025-10-09T01:54:08Z

Overview

This release adds exciting new Trotter functionality, including imaginary-time and Lindbladian simulation, as well as functions for more general manipulation of density matrices (such as applying operators from the left or right). It also patches critical performance regressions in CPU simulation, a maths error in the Trotter functions, and build issues when installing QuEST.

New features

Added real, imaginary and noisy/open (Lindblad) time evolution via Trotterisation (#647). Specifically:
Added applyNonUnitaryPauliGadget() which permits rotating about a Pauli string by a complex angle, effecting a non-unitary operation (#637).
Added applyTrotterizedNonUnitaryPauliStrSumGadget() which effects exp(i x H) where x is any complex scalar (#647).
Added controlled Trotter functions (d4706b9, 2110b74, 211d4e8), i.e.
- applyTrotterizedControlledPauliStrSumGadget()
- applyTrotterizedMultiControlledPauliStrSumGadget()
- applyTrotterizedMultiStateControlledPauliStrSumGadget()
Added optimised functions to prepare superpositions and mixtures of an arbitrary number of Qureg (d4a5714). Specifically:
- setQuregToWeightedSum()
- setQuregToMixture()
Added environment variables PERMIT_NODES_TO_SHARE_GPU and DEFAULT_VALIDATION_EPSILON, replacing their original macros, so that their values can now be changed before runtime without recompilation (#653, #655).
Added left- and right-application of operators upon density matrices (#657, #668, c49f8d2, 53c6828), i.e.
- leftapplyCompMatr1()
- rightapplyCompMatr1()
- leftapplyCompMatr2()
- rightapplyCompMatr2()
- leftapplyCompMatr()
- rightapplyCompMatr()
- leftapplyDiagMatr1()
- rightapplyDiagMatr1()
- leftapplyDiagMatr2()
- rightapplyDiagMatr2()
- leftapplyDiagMatr()
- rightapplyDiagMatr()
- leftapplyDiagMatrPower()
- rightapplyDiagMatrPower()
- leftapplyFullStateDiagMatr()
- rightapplyFullStateDiagMatr()
- leftapplyFullStateDiagMatrPower()
- rightapplyFullStateDiagMatrPower()
- leftapplySwap()
- rightapplySwap()
- leftapplyPauliX()
- rightapplyPauliX()
- leftapplyPauliY()
- rightapplyPauliY()
- leftapplyPauliZ()
- rightapplyPauliZ()
- leftapplyPauliStr()
- rightapplyPauliStr()
- leftapplyPauliGadget()
- rightapplyPauliGadget()
- leftapplyPhaseGadget()
- rightapplyPhaseGadget()
- leftapplyMultiQubitNot()
- rightapplyMultiQubitNot()
- leftapplyQubitProjector()
- rightapplyQubitProjector()
- leftapplyMultiQubitProjector()
- rightapplyMultiQubitProjector()
- leftapplyPauliStrSum()
- rightapplyPauliStrSum()
Function reportQuESTEnv() will now additionally display the available RAM (#636).

Patches

Restored NUMA awareness, greatly improving multithreaded CPU performance (#658).
Patched a performance regression (from QuEST v3) in CPU simulation, related to complex arithmetic (60f9b3a).
Patched a performance regression (from QuEST v3) in the one-qubit Pauli functions, by redirecting them to make use of the one-qubit dense matrix functions (as performed by v3) (#682).
Patched applyTrotterizedPauliStrSumGadget() to effect exp(i t H) as documented, whereas it previously erroneously effected exp(-i t H) (#647).
Patched Trotterisation of order >= 4. Previously, the order >= 4 scenario of Trotterisation did not correctly invoke recursion, but instead called first order Trotterisation five times without symmetrisation. This meant passing order=4 erroneously excluded symmetrisation (halving the Trotter depth), and passing order>=6 merely performed unsymmetrised fourth-order Trotterisation (e0a0e96).
Patched installation of QuEST, which now correctly records the compile-time configuration. Beware that several macros became exclusively cmake options (#645, #685).
The unit test of createFullStateDiagMatr()'s input validation now expects the correct error message (aa11c23), and does not break CUDA execution thereafter (37e222e).

Other changes

Attached boolean field isCuQuantumEnabled to the QuESTEnv struct, so that it can be determined at runtime whether or not the cuQuantum GPU backend is being utilised. This is also now reported by reportQuESTEnv() (c357267).
Made createPauliStrSum() validate ahead of time whether it can fit the necessary data structures into RAM (ca6e9f8).
Updated the dynamics examples to use the new evolution functions (19e18b2).
Deprecated setQuregToSuperposition() since superseded by the new setQuregToWeightedSum() (ab8b560).
Added docs/news.md (5bd864b).
Added yet more missing documentation (but the battle continues!) (cb682b2).

New contributors

This release contained patches from new contributors:

Mai Đức Khang in #636.
Diogo Pratas Maia in #637.


- applyTrotterizedPauliStrSumGadget() was documented to effect exp(i t H) but actually effected exp(-i t H), eep! Thankfully the doc warned the function was untested. It now correctly effects exp(i t H).
- defensively removed hardcoding of the scalar responsible for the above bug, which undoes the coefficient convention of the applyPauliGadget() functions
- added applyNonUnitaryTrotterizedPauliStrSumGadget() which permits a non-Hermitian Hamiltonian (i.e. with non-negligible imaginary components of the coefficients) and a complex angle parameter. Among other things, this permits simple imaginary-time evolution
- added the full doc for both functions
- fixed defunct doxygen command (@cppoverload)

Note that both these Trotter functions remain without unit tests since impending new functions are anticipated which will generalise the tests.
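The identity underlying a complex-angle Pauli gadget can be sketched in plain Python. Since any Pauli string P satisfies P² = I, the expansion exp(i z P) = cos(z) I + i sin(z) P also holds for complex z. The snippet below illustrates this on one qubit with P = Z; the helper name `pauli_z_gadget` and the exp(i z P) convention are assumptions made for illustration, and QuEST's own functions may scale or sign the angle differently.

```python
import cmath

def pauli_z_gadget(z):
    """Return the 2x2 matrix exp(i z Z) = cos(z) I + i sin(z) Z for complex z."""
    c, s = cmath.cos(z), cmath.sin(z)
    # Z is diagonal with entries (+1, -1), so the result is diagonal too
    return [[c + 1j * s, 0], [0, c - 1j * s]]

# a real angle gives a unitary phase rotation
unitary = pauli_z_gadget(0.3)

# an imaginary angle z = i*tau gives exp(-tau Z), a non-unitary damping
tau = 0.5
damping = pauli_z_gadget(1j * tau)
```

With a real angle both diagonal entries have unit modulus, whereas with z = i*tau they become exp(-tau) and exp(+tau), which is the building block of imaginary-time evolution.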

New API functions:
- applyControlledTrotterizedPauliStrSumGadget()
- applyMultiControlledTrotterizedPauliStrSumGadget()
- applyMultiStateControlledTrotterizedPauliStrSumGadget()
- C++-only std::vector overloads of the latter two.

Additionally:
- renamed the internal constituent functions, like applyFirstOrderTrotter(), to explicit internal_applyFirstOrderTrotterRepetition()
- renamed paulis_getInds() to paulis_getTargetInds()

Note that new validation was required to check that no PauliStrSum non-identity Paulis overlapped the control qubits. This is relatively expensive; we build a PauliStrSum target-mask in O(#terms * #qubits) time whereas the previous most expensive validation (checking PauliStrSum targets do not exceed Qureg) costs O(#terms * log(#qubits)). Such costs are still completely occluded by those of simulating/processing a PauliStrSum in the backend, but might still attract lazy evaluation of the target-mask which is bound to the PauliStrSum instance. We have deferred any such optimisation and the associated struct changes since it necessitates an update to the PauliStrSum design (like new sync functions).
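The control/target overlap validation described above can be sketched as follows. This is an illustration only, not QuEST's internal code: the string representation of Pauli terms (character i acts on qubit i) and the function names are hypothetical.

```python
def target_mask(pauli_terms):
    """OR together the non-identity qubit positions of every term."""
    mask = 0
    for term in pauli_terms:                      # loop over #terms
        for qubit, pauli in enumerate(term):      # loop over #qubits
            if pauli != "I":
                mask |= 1 << qubit
    return mask

def controls_overlap_targets(pauli_terms, controls):
    """Return True if any control qubit is targeted by a non-identity Pauli."""
    ctrl_mask = 0
    for qubit in controls:
        ctrl_mask |= 1 << qubit
    return (target_mask(pauli_terms) & ctrl_mask) != 0

# two terms on four qubits; together they target qubits 0, 1 and 2
terms = ["XIZI", "IYII"]
```

The nested loop makes the O(#terms * #qubits) cost explicit. Caching the mask alongside the sum, as the note above contemplates, would avoid rebuilding it on every call.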

@TysonRayJones

- Promoted variables set in compile_option CMake function to parent scope, which ensures correct values are generated in the header file.
- Separated compilation configuration defines into config.h
- Added guards to check at least one of the required compiler macros is undefined as proposed by @TysonRayJones
- Removed guards from modes.h which prevented COMPILE macros from being defined, as proposed by @TysonRayJones

which enable configuring QuEST's execution after compilation, before QuEST environment initialisation, solving some of the issues lamented in #645 and generally being more sensible/convenient. It also patched an esoteric bug in the parsing of floating-point numbers, affecting functions like initInlinePauliStrSum(). Refactor included:
- adding (basic) utilities for parsing environment variables.
- changing PERMIT_NODES_TO_SHARE_GPU and DEFAULT_EPSILON_ENV_VAR_NOT_A_REAL from macros to environment variables. The latter empowers users to disable all numerically-sensitive validation without modifying or recompiling their code.
- patching the parsing of non-quadruple-precision floats which would previously see numbers beyond the qcomp-range silently over or underflow instead of throwing an error (see commit a66f797).
- inserted whitespaces into cmake error message about MacOS multithreading to make the advised commands clearer.

A subsequent commit will refactor some unit-testing macros to non-QuEST-managed environment variables.
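The general pattern of reading a real-valued environment variable while refusing silent overflow can be sketched as below. This is a Python illustration of the idea rather than QuEST's C++ utilities; the function name `read_real_env_var` and the variable name `DEMO_EPSILON` are hypothetical, and only overflow (not underflow) is detected here.

```python
import math
import os

def read_real_env_var(name, default):
    """Parse a real-valued environment variable, rejecting malformed or overflowing input."""
    text = os.environ.get(name)
    if text is None or text.strip() == "":
        return default                      # unset or empty: fall back to the default
    try:
        value = float(text)
    except ValueError:
        raise ValueError(f"{name}={text!r} is not a real number") from None
    # float("1e999") silently becomes inf in Python, so overflow is checked explicitly
    if not math.isfinite(value):
        raise ValueError(f"{name}={text!r} is outside the representable range")
    return value

# usage: an unset variable yields the supplied default
epsilon = read_real_env_var("DEMO_EPSILON", 1e-12)
```

Failing loudly at start-up is the design choice being illustrated: a mistyped or out-of-range value is reported before any simulation begins, rather than quietly altering validation.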

which enables post-compilation pre-runtime configuring of the unit tests without hooking into QuEST's internal environment variable facilities. The macros...
- TEST_MAX_NUM_QUBIT_PERMUTATIONS
- TEST_MAX_NUM_SUPEROP_TARGETS
- TEST_ALL_DEPLOYMENTS
- TEST_NUM_MIXED_DEPLOYMENT_REPETITIONS

are now environment variables, along with new variable TEST_NUM_QUBITS_IN_QUREG which controls the size of the Quregs in the unit tests. With this commit, all preprocessors considered in #645 have become environment variables.

The API functions createFullStateDiagMatr() and createCustomFullStateDiagMatr() worked correctly though their "insufficient distributed memory" validation error messages were erroneously excluded from the expected message lists in their respective unit tests.

For every existing multiply*() function, such as multiplyCompMatr(), this commit adds a corresponding postMultiply*() function which operates upon a density matrix from the right-hand side. This is useful for preparing density matrices in non-physical states which appear as sub-expressions within things like commutators and the Lindbladian. Implementing these functions involved:
- updating the templating of "any target dense matrix" function, inadvertently simplifying the associated instantiation and dispatch macros by re-using those for the "any target diagonal matrix" function
- adding new utilities and logic for obtaining/effecting the transpose of a function, to undo the transpose effected via operation upon the bra-qubits of a vectorised density matrix
- extending and refactoring the unit tests with postMultiply references

We additionally added the below expected but missing functions from the API:
- multiplyPauliX
- multiplyPauliY
- multiplyPauliZ
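To make the distinction concrete, here is a small plain-Python sketch of left- and right-multiplying a one-qubit density matrix, and of the commutator built from the two. The helper names are illustrative only and are not QuEST functions.

```python
def matmul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def left_apply(op, rho):
    return matmul(op, rho)      # op * rho

def right_apply(op, rho):
    return matmul(rho, op)      # rho * op

def commutator(op, rho):
    """Return [op, rho] = op*rho - rho*op."""
    left, right = left_apply(op, rho), right_apply(op, rho)
    n = len(rho)
    return [[left[i][j] - right[i][j] for j in range(n)] for i in range(n)]

pauli_z = [[1, 0], [0, -1]]
rho_plus = [[0.5, 0.5], [0.5, 0.5]]     # the density matrix |+><+|
z_rho = left_apply(pauli_z, rho_plus)   # not Hermitian, hence not a physical state
comm = commutator(pauli_z, rho_plus)
```

Neither Z ρ nor ρ Z is a valid density matrix on its own (each is traceless and non-Hermitian here), which is why such functions are best viewed as raw linear algebra rather than physical operations.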

Luc: v3.7 was sensible on NUMA machines “by default” through first-touch initialization. This had been lost in v4 as identified by James Richings. Here’s some basic NUMA-aware allocation, and a little love for general parallel/openmp usage.
- If we’re on *nix _and_ we find libnuma, we enable NUMA-aware allocations
- Add & use cpu_allocNumaArray() and cpu_deallocNumaArray for the state-vector allocations (as the current alloc functions are also used for many smaller regions). Fall back to normal allocation functions if NUMA-unaware.
- Perform zero-initialization in parallel (still with std::fill() but use a parallel region)
- Make getCurrentNumThreads() work inside parallel regions (!)
- Add getAvailableNumThreads() to get thread count outside parallel regions. Improve this from previous getCurrentNumThreads() to only call the omp function once (rather than once per thread).

Luc coded the logic and Tyson added doc and error-handling. PR #658 replaced the original of #652
--------- Co-authored-by: Luc Jaulmes <[email protected]>

Danny Hindson discovered a bug wherein the failing cudaMalloc() call deliberately induced by the 'out-of-memory' unit test of createFullStateDiagMatr() breaks subsequent GPU simulation. This is because a failing cudaMalloc corrupts the CUDA API state until being explicitly cleared using the undocumented facility of cudaGetLastError (which clears "non-sticky" errors). We correct this, and defensively check for irrecoverable sticky errors.

in order to shrink the bloated set of operations, making those remaining "standard" and trace-preserving (with the exception of the applyQubitProjector and applyMultiQubitProjector). The new multiplications module is catered to "raw" linear algebra upon density matrices

which has the below benefits:
- the remaining functions in operations.cpp are precise and do not need to be user-configured for accuracy (i.e. no Trotter hyperparameters)
- the remaining functions in operations.cpp merely call the backend and do not include any bespoke logic (i.e. Trotter circuit scheduling)
- incoming new Trotter functions for dynamical simulation will be more clearly delineated from the "standard" (and relatively boring) operations
- the Trotter logic is isolated in preparation for it becoming more substantial with the introduction of randomisation, commuting groups, and that necessary for Lindblad master equation solving

Previously, the order >= 4 scenario of Trotterisation did not correctly invoke recursion, but instead called first order Trotterisation five times without symmetrisation. This meant passing order=4 erroneously excluded symmetrisation (halving the Trotter depth), and passing order>=6 merely performed unsymmetrised fourth-order Trotterisation. Thankfully the enacted operation was still a valid Trotter approximation of the intended unitary, albeit of lower order and ergo accuracy than expected. This was not caught by the unit tests since they do not exist, as warned in the function documentation. Eep!
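For reference, the textbook Suzuki recursion that higher-order Trotterisation relies upon can be sketched as below. It builds the sequence of (term index, time fraction) pairs for one repetition, where order 2 is the symmetrised product and each even order 2k wraps five copies of order 2k-2 using p = 1/(4 - 4^(1/(2k-1))). This is a sketch of the standard scheme under those assumptions, with a hypothetical function name, and is not a transcription of QuEST's internal routines.

```python
def trotter_schedule(num_terms, order, time=1.0):
    """Return a list of (term_index, time_fraction) for one Trotter repetition."""
    if order < 1 or (order > 1 and order % 2):
        raise ValueError("order must be 1 or a positive even number")
    if order == 1:
        return [(j, time) for j in range(num_terms)]
    if order == 2:
        # symmetrised: half-steps forward through the terms, then backward
        forward = [(j, time / 2) for j in range(num_terms)]
        return forward + forward[::-1]
    # Suzuki recursion: five copies of the (order-2) schedule with rescaled times
    p = 1 / (4 - 4 ** (1 / (order - 1)))
    outer = trotter_schedule(num_terms, order - 2, p * time)
    inner = trotter_schedule(num_terms, order - 2, (1 - 4 * p) * time)
    return outer + outer + inner + outer + outer

fourth = trotter_schedule(3, 4)
sixth = trotter_schedule(3, 6)
```

For three terms the fourth-order schedule has 5 × 6 = 30 entries and the sixth-order schedule 25 × 6 = 150, and the time fractions of each term always sum to the total time. A schedule that calls the first-order branch five times, as in the bug described above, would instead contain only 15 entries with no reversed sweeps.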

Specifically:
- applyTrotterizedUnitaryTimeEvolution()
- applyTrotterizedImaginaryTimeEvolution()
- applyTrotterizedNoisyTimeEvolution()

where the latter has significant novelty. PR also
- (patch) made imaginary-time evolution assert Hermiticity (315ea41)
- (patch) patched non-unitary Trotter on density matrix (01c51e1)
- updated dynamics examples to use these new time-evol functions
- tidied some Pauli algebra (replacing paulis_hasOddNumY calls with direct paulis_getSignOfPauliStrConj)
- made createPauliStrSum validate it can fit in RAM
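As a reminder of why imaginary-time evolution is useful, the toy below repeatedly applies exp(-tau H) for a diagonal Hamiltonian and renormalises, which drives the state toward the lowest-energy eigenstate. It is a plain-Python illustration with hypothetical names and a hand-picked spectrum; it does not call or mirror the QuEST functions listed above.

```python
import math

def imaginary_time_step(amps, energies, tau):
    """Apply exp(-tau H) for a diagonal H, then renormalise the state."""
    damped = [a * math.exp(-tau * e) for a, e in zip(amps, energies)]
    norm = math.sqrt(sum(abs(a) ** 2 for a in damped))
    return [a / norm for a in damped]

# diagonal Hamiltonian with eigenvalues 1, -1 and 0.5; index 1 is the ground state
energies = [1.0, -1.0, 0.5]
state = [1 / math.sqrt(3)] * 3          # equal superposition of the eigenstates

for _ in range(50):
    state = imaginary_time_step(state, energies, tau=0.5)
```

After the loop nearly all of the weight sits on index 1, the eigenstate with energy -1. Renormalising at every step keeps the amplitudes from growing or vanishing, since exp(-tau H) is not unitary.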

since paulis.cpp is an "API" file while the pauli logic previously therein was used by numerous core files. A final inelegance remains; some unit test utilities leverage the internal pauli logic functions using extern, which will break when we eventually switch to private namespacing. Alas!

in order to make the applyTrotterized prefix consistent, as considered in #669. Specifically, renamed:
- applyNonUnitaryTrotterizedPauliStrSumGadget -> applyTrotterizedNonUnitaryPauliStrSumGadget
- applyControlledTrotterizedPauliStrSumGadget -> applyTrotterizedControlledPauliStrSumGadget
- applyMultiControlledTrotterizedPauliStrSumGadget -> applyTrotterizedMultiControlledPauliStrSumGadget
- applyMultiStateControlledTrotterizedPauliStrSumGadget -> applyTrotterizedMultiStateControlledPauliStrSumGadget

as per #638. The previous use of the Pauli-specific multi-qubit backend logic was suboptimal for single-target since it involved superfluous per-amplitude evaluation of bitstring parity. This introduced a performance regression in single-core QuEST v4 since v3 which used single-target matrix logic. This affects the performance of all explicitly single-target Pauli functions. Specifically:
- applyPauliX()
- applyControlledPauliX()
- applyMultiControlledPauliX()
- applyMultiStateControlledPauliX()
- applyPauliY()
- applyControlledPauliY()
- applyMultiControlledPauliY()
- applyMultiStateControlledPauliY()
- applyPauliZ()
- applyControlledPauliZ()
- applyMultiControlledPauliZ()
- applyMultiStateControlledPauliZ()
- applyRotateX()
- applyControlledRotateX()
- applyMultiControlledRotateX()
- applyMultiStateControlledRotateX()
- applyRotateY()
- applyControlledRotateY()
- applyMultiControlledRotateY()
- applyMultiStateControlledRotateY()
- applyRotateZ()
- applyControlledRotateZ()
- applyMultiControlledRotateZ()
- applyMultiStateControlledRotateZ()

which are concisely summarised as X cX ccX csX Y cY ccY csY Z cZ ccZ csZ Rx cRx ccRx csRx Ry cRy ccRy csRy Rz cRz ccRz csRz. Beware this does not affect when incidentally passing a single-target through functions which can accept many, such as applyPauliStr() and applyPauliGadget(). Note too that further changes are expected to be necessary to recover single-core v3 performance.
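The difference between the two strategies can be illustrated in plain Python. A single-target Pauli X only needs to swap pairs of amplitudes, whereas general Pauli-string logic typically evaluates a sign from the parity of masked index bits for every amplitude. Both helpers below are hypothetical sketches of those ideas and do not reproduce QuEST's backend.

```python
def apply_pauli_x(amps, target):
    """Apply X to one qubit of a statevector by swapping amplitude pairs."""
    out = list(amps)
    stride = 1 << target
    for i in range(len(amps)):
        if not (i & stride):                # visit each pair once, via its target=0 member
            j = i | stride
            out[i], out[j] = amps[j], amps[i]
    return out

def parity_sign(index, mask):
    """Return -1 if an odd number of masked bits are set in index, else +1.

    This is the kind of per-amplitude work a multi-target routine performs."""
    return -1 if bin(index & mask).count("1") % 2 else 1

# |00> on two qubits, with qubit 0 as the least significant bit
state = [1, 0, 0, 0]
flipped_q0 = apply_pauli_x(state, 0)
flipped_q1 = apply_pauli_x(state, 1)
```

The swap loop touches each amplitude once and computes no parities, which is the saving a dedicated single-target path offers.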

which had only the below 4 changes visible to users:
- changed the significant figures from 3 to 4 of reported memory (e.g. `3.23e1 KiB` becomes `32.30 KiB`)
- fixed a README doc link
- renamed to clarify the `setQuregToClone` parameters
- suppressed illegitimate unused-variable compiler warnings

and otherwise merely tidied internal code. See #672 for all changes.

As described in issue #638, QuEST v4 contained a performance regression (from v3) only sometimes seen in CPU settings. This was due to the use of std::complex operator overloads in cpu_subroutines.cpp (whereas QuEST v3 hand-rolled complex arithmetic), and affected compilation with Clang (in both single-threaded and multithreaded settings) as well as in GCC (only in single-threaded settings) and potentially other compilers. We tentatively patch this issue by passing additional compiler optimisation flags to cpu_subroutines.cpp which circumvent the issue. This is a rather aggravating solution to a major pitfall in the C++ standard library. After deliberation, it beat out other solutions including hand-rolling complex arithmetic, use of a custom complex type, and use of more precise and compiler-specific flags.

All user-configurable macros utilised by the source code (e.g. `COMPILE_MPI`) are now CMake options, passed to the source only via preparation of the `config.h` header. This centralises them, reduces the myriad of arguments to the compiler command (which made verbose debugging cumbersome), makes erroneous overriding of macros more difficult (if not impossible), and logs the macro choices when installing QuEST. We also took the chance to clean up the main CMakeLists.txt, defend against user-overriding of pre-set macros, and automate setting the QuEST version macros from the CMake build. Finally, we patched an issue when installing QuEST via FetchContent and/or inside a directory (such as a git submodule). Tyson refactored options and Oliver patched the install issues. --------- Co-authored-by: Oliver Thomson Brown <[email protected]>

Roll249 and others added 30 commits June 4, 2025 00:04

implemented cross-platform RAM probe (#636)

cd5120c

Mai (Roll249) implemented the probe, and Tyson updated the tests and authorlist --------- Co-authored-by: Tyson Jones <[email protected]>

added non-unitary Pauli gadgets (#637)

5fa4f60

as part of unitaryHACK 2025, challenge issue #594 --------- Co-authored-by: Tyson Jones <[email protected]>

bound isCuQuantumEnabled to QuESTEnv

c357267

so that the COMPILE_CUQUANTUM preprocessor need only ever be consulted by the source during compilation, as proposed by Oliver in #645

added multiply projector functions

53c6828

specifically:
- multiplyQubitProjector
- postMultiplyQubitProjector
- multiplyMultiQubitProjector
- postMultiplyMultiQubitProjector

Also updated multiplication doc warnings

renamed multiply and postmultiply functions to leftapply and rightapp…

2db00e5

…ly (#668) as discussed in issue #663

added setQuregToWeightedSum (and ToMixture)

d4a5714

removed setQuregToSuperposition()

ab8b560

since superseded by setQuregToWeightedSum(). Additionally defined internal convenience functions...
- localiser_statevec_scaleAmps
- localiser_statevec_setQuregToClone

which merely call localiser_statevec_setQuregToWeightedSum, for code clarity
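The idea of expressing scaling and cloning as special cases of one weighted-sum routine can be sketched as follows. These are illustrative Python helpers on plain lists of amplitudes, with hypothetical names; they are not the QuEST localiser functions themselves.

```python
def weighted_sum(coeffs, states):
    """Return sum_k coeffs[k] * states[k], amplitude by amplitude."""
    dim = len(states[0])
    return [sum(c * s[i] for c, s in zip(coeffs, states)) for i in range(dim)]

def scale_amps(factor, state):
    # scaling is a weighted sum over a single state
    return weighted_sum([factor], [state])

def clone(state):
    # cloning is a weighted sum with a single unit coefficient
    return weighted_sum([1], [state])

zero = [1, 0]
one_times_i = [0, 1j]
combo = weighted_sum([0.5, 2], [zero, one_times_i])
```

Routing the simple cases through the general routine keeps a single code path to optimise and test, at the cost of a negligible amount of extra arithmetic.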

added news doc

5bd864b

tidied CMakeLists.txt

be7edbb

so I can subsequently refactor it (stop propagating options to preprocessors) without pulling out all my hair. Whitespace is free! :^)

updated LICENSE.txt

2655690

added calculation doc

cb682b2

TysonRayJones added 2 commits October 8, 2025 21:52

bumped version to v4.2.0

f0d4bcd

updated README with EPCC lead

b0c66d2

TysonRayJones merged commit 9d7618d into main Oct 14, 2025
260 checks passed

4 participants