v4.2.0 #692
Merged
Mai (Roll249) implemented the probe, and Tyson updated the tests and author list.
Co-authored-by: Tyson Jones
as part of unitaryHACK 2025, challenge issue #594.
Co-authored-by: Tyson Jones
- applyTrotterizedPauliStrSumGadget() was documented to effect exp(i t H) but actually effected exp(-i t H), eep! Thankfully the doc warned that the function was untested. It now correctly effects exp(i t H).
- defensively removed the hardcoding of the scalar responsible for the above bug, which undoes the coefficient convention of the applyPauliGadget() functions
- added applyNonUnitaryTrotterizedPauliStrSumGadget(), which permits a non-Hermitian Hamiltonian (i.e. one whose coefficients have non-negligible imaginary components) and a complex angle parameter. Among other things, this permits simple imaginary-time evolution
- added the full doc for both functions
- fixed a defunct doxygen command (@cppoverload)
Note that both these Trotter functions remain without unit tests, since impending new functions are anticipated which will generalise the tests.
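To make the sign convention concrete, below is a minimal sketch which effects exp(i x H) for a Hamiltonian that is diagonal in the computational basis (e.g. a weighted sum of Z strings), where the terms commute and no Trotterisation is needed. The function name `applyExpOfDiagonalHamil` and the plain `std::vector` state are hypothetical stand-ins, not QuEST's API or data structures. A real `x` gives unitary evolution, while `x = i*tau` gives the non-unitary operator exp(-tau H) associated with (unnormalised) imaginary-time evolution.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using qcomplex = std::complex<double>;

// Hypothetical helper (not part of QuEST): multiplies each amplitude by
// exp(i * x * h_k), where h_k is the k-th diagonal element of a Hamiltonian H
// that is diagonal in the computational basis. This effects exp(i x H) exactly.
// A complex x (e.g. x = i*tau) makes the operation non-unitary.
void applyExpOfDiagonalHamil(std::vector<qcomplex>& amps,
                             const std::vector<qcomplex>& diagHamil,
                             qcomplex x) {
    const qcomplex i(0.0, 1.0);
    for (std::size_t k = 0; k < amps.size(); ++k)
        amps[k] *= std::exp(i * x * diagHamil[k]);
}
```

With `H = Z` and `x = pi/2`, the |0> amplitude acquires the phase exp(i pi/2) = i, whereas the pre-patch exp(-i t H) behaviour would have produced -i.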
New API functions:
- applyControlledTrotterizedPauliStrSumGadget()
- applyMultiControlledTrotterizedPauliStrSumGadget()
- applyMultiStateControlledTrotterizedPauliStrSumGadget()
- C++-only std::vector overloads of the latter two
Additionally:
- renamed the internal constituent functions, like applyFirstOrderTrotter(), to the explicit internal_applyFirstOrderTrotterRepetition()
- renamed paulis_getInds() to paulis_getTargetInds()
Note that new validation was required to check that no non-identity Pauli of the PauliStrSum overlaps the control qubits. This is relatively expensive: we build a PauliStrSum target mask in O(#terms * #qubits) time, whereas the previously most expensive validation (checking that the PauliStrSum targets do not exceed the Qureg) costs O(#terms * log(#qubits)). Such costs are still completely occluded by those of simulating/processing a PauliStrSum in the backend, but might still motivate lazy evaluation of a target mask bound to the PauliStrSum instance. We have deferred any such optimisation and the associated struct changes, since it necessitates an update to the PauliStrSum design (like new sync functions).
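The control-overlap validation can be pictured with the sketch below, which builds the target mask in O(#terms * #qubits) time and then tests each control qubit against it. The packed encoding (2 bits per qubit, with 0=I, 1=X, 2=Y, 3=Z, at most 32 qubits) and the function names are hypothetical illustrations, not QuEST's internal PauliStrSum layout.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical encoding: bits 2q and 2q+1 hold the Pauli acting on qubit q,
// with 0=I, 1=X, 2=Y, 3=Z (so at most 32 qubits fit in 64 bits).
using PackedPauliStr = std::uint64_t;

// Sets bit q of the result if any term has a non-identity Pauli on qubit q.
// Cost is O(#terms * #qubits), matching the estimate in the commit message.
std::uint64_t getTargetMask(const std::vector<PackedPauliStr>& terms, int numQubits) {
    assert(numQubits <= 32);
    std::uint64_t mask = 0;
    for (PackedPauliStr term : terms)
        for (int q = 0; q < numQubits; ++q)
            if ((term >> (2 * q)) & 3u)
                mask |= std::uint64_t(1) << q;
    return mask;
}

// Returns true if any control qubit is also targeted by a non-identity Pauli,
// in which case a controlled Trotter function should reject the input.
bool controlsOverlapTargets(const std::vector<PackedPauliStr>& terms, int numQubits,
                            const std::vector<int>& controls) {
    const std::uint64_t targets = getTargetMask(terms, numQubits);
    for (int c : controls)
        if ((targets >> c) & 1u)
            return true;
    return false;
}
```

Caching the returned mask alongside the PauliStrSum would be the lazy-evaluation optimisation the commit defers.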
so that the COMPILE_CUQUANTUM preprocessor macro need only ever be consulted by the source during compilation, as proposed by Oliver in #645
- Promoted variables set in the compile_option CMake function to parent scope, which ensures correct values are generated in the header file.
- Separated compilation configuration defines into config.h
- Added guards which check whether any of the required compiler macros are undefined, as proposed by @TysonRayJones
- Removed guards from modes.h which prevented COMPILE macros from being defined, as proposed by @TysonRayJones
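A minimal sketch of the resulting pattern, in which a CMake-generated header is the sole source of the compile-time macros and a guard fails the build loudly if one is missing. The header contents, the chosen values and the `compiledWithCuQuantum()` helper are illustrative assumptions; the real config.h is generated from CMake options and may differ.

```cpp
#include <cassert>

// --- contents of a hypothetical generated config.h ---
#define COMPILE_MPI 0
#define COMPILE_CUQUANTUM 0

// --- guard, e.g. in a source file after including config.h ---
#if !defined(COMPILE_MPI) || !defined(COMPILE_CUQUANTUM)
    #error "config.h is missing a required COMPILE_* macro"
#endif

// The macro is consulted only here, at compile time; runtime code can then
// query this result (cf. the isCuQuantumEnabled field of QuESTEnv) instead.
constexpr bool compiledWithCuQuantum() {
    return COMPILE_CUQUANTUM == 1;
}
```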
which enable configuring QuEST's execution after compilation and before QuEST environment initialisation, solving some of the issues lamented in #645 and generally being more sensible and convenient. It also patched an esoteric bug in the parsing of floating-point numbers, affecting functions like initInlinePauliStrSum(). The refactor included:
- adding (basic) utilities for parsing environment variables
- changing PERMIT_NODES_TO_SHARE_GPU and DEFAULT_VALIDATION_EPSILON from macros to environment variables. The latter empowers users to disable all numerically-sensitive validation without modifying or recompiling their code.
- patching the parsing of non-quadruple-precision floats, which would previously see numbers beyond the qcomp range silently overflow or underflow instead of throwing an error (see commit a66f797)
- inserting whitespace into the CMake error message about MacOS multithreading to make the advised commands clearer
A subsequent commit will refactor some unit-testing macros into non-QuEST-managed environment variables.
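The over/underflow pitfall is easy to hit with the C standard library: `std::strtold` saturates out-of-range input and signals it only via `errno`, and narrowing a valid `long double` to a smaller type can still fall outside that type's range. Below is a sketch of defensive parsing of a real-valued environment variable which rejects malformed, trailing-garbage, NaN and out-of-range input. The helper names are hypothetical and this is not QuEST's parser; `T` stands in for whatever precision qcomp is configured with.

```cpp
#include <cassert>
#include <cerrno>
#include <cmath>
#include <cstdlib>
#include <limits>
#include <optional>

// Hypothetical helper: parses `str` as a real of type T (e.g. float or double),
// returning std::nullopt if it is malformed, has trailing characters, is NaN,
// or lies outside T's range (where a naive cast would over/underflow).
template <typename T>
std::optional<T> parseReal(const char* str) {
    if (str == nullptr || *str == '\0')
        return std::nullopt;
    char* end = nullptr;
    errno = 0;
    const long double val = std::strtold(str, &end);
    if (end == str || *end != '\0' || errno == ERANGE || std::isnan(val))
        return std::nullopt;
    // check the range before casting, since casting an out-of-range value is UB
    if (std::fabs(val) > std::numeric_limits<T>::max())
        return std::nullopt;                       // would overflow T
    if (val != 0 && std::fabs(val) < std::numeric_limits<T>::denorm_min())
        return std::nullopt;                       // would underflow T
    return static_cast<T>(val);
}

// Hypothetical wrapper: reads an environment variable, falling back to a
// default when it is unset, and rejecting it when it cannot be parsed.
template <typename T>
std::optional<T> getRealEnvVar(const char* name, T defaultVal) {
    const char* str = std::getenv(name);
    if (str == nullptr)
        return defaultVal;
    return parseReal<T>(str);
}
```

A variable such as DEFAULT_VALIDATION_EPSILON could be read this way, with the caller throwing a user-facing error when `std::nullopt` is returned; any default value passed in is illustrative.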
which enables configuring the unit tests after compilation but before runtime, without hooking into QuEST's internal environment-variable facilities. The macros
- TEST_MAX_NUM_QUBIT_PERMUTATIONS
- TEST_MAX_NUM_SUPEROP_TARGETS
- TEST_ALL_DEPLOYMENTS
- TEST_NUM_MIXED_DEPLOYMENT_REPETITIONS
are now environment variables, along with the new variable TEST_NUM_QUBITS_IN_QUREG, which controls the size of the Quregs in the unit tests. With this commit, all preprocessor macros considered in #645 have become environment variables.
The API functions createFullStateDiagMatr() and createCustomFullStateDiagMatr() worked correctly, though their "insufficient distributed memory" validation error messages were erroneously excluded from the expected message lists in their respective unit tests.
For every existing multiply*() function, such as multiplyCompMatr(), this commit adds a corresponding postMultiply*() function which operates upon a density matrix from the right-hand side. This is useful for preparing density matrices in non-physical states which appear as sub-expressions within things like commutators and the Lindbladian. Implementing these functions involved:
- updating the templating of the "any target dense matrix" function, which inadvertently simplified the associated instantiation and dispatch macros by re-using those of the "any target diagonal matrix" function
- adding new utilities and logic for obtaining/effecting the transpose of a matrix, to undo the transpose effected by operating upon the bra-qubits of a vectorised density matrix
- extending and refactoring the unit tests with postMultiply references
We additionally added the below expected but missing functions to the API:
- multiplyPauliX
- multiplyPauliY
- multiplyPauliZ
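Why a transpose is needed can be checked on a single qubit. Assuming a column-major vectorisation, vec[r + 2c] = rho[r][c], so the row (ket) index is the low qubit and the column (bra) index is the high qubit. Applying a matrix A to the bra qubit then yields rho A^T, so right-multiplying by M requires applying M^T. The sketch below uses hypothetical helper names and plain arrays rather than a Qureg, and that vectorisation layout is an assumption.

```cpp
#include <array>
#include <cassert>
#include <complex>

using cd = std::complex<double>;
using Mat2 = std::array<std::array<cd, 2>, 2>;   // indexed as m[row][col]
using Vec4 = std::array<cd, 4>;                  // vectorised 2x2 density matrix

// Column-major vectorisation: vec[r + 2*c] = rho[r][c].
Vec4 vectorise(const Mat2& rho) {
    Vec4 v{};
    for (int r = 0; r < 2; ++r)
        for (int c = 0; c < 2; ++c)
            v[r + 2 * c] = rho[r][c];
    return v;
}

Mat2 transpose(const Mat2& m) {
    Mat2 t{};
    for (int r = 0; r < 2; ++r)
        for (int c = 0; c < 2; ++c)
            t[r][c] = m[c][r];
    return t;
}

Mat2 matmul(const Mat2& a, const Mat2& b) {
    Mat2 out{};
    for (int r = 0; r < 2; ++r)
        for (int c = 0; c < 2; ++c)
            for (int k = 0; k < 2; ++k)
                out[r][c] += a[r][k] * b[k][c];
    return out;
}

// Applies A to the bra (high) qubit of the vectorised state, yielding vec(rho * A^T).
Vec4 applyToBraQubit(const Vec4& v, const Mat2& a) {
    Vec4 out{};
    for (int r = 0; r < 2; ++r)
        for (int c = 0; c < 2; ++c)
            for (int k = 0; k < 2; ++k)
                out[r + 2 * c] += a[c][k] * v[r + 2 * k];
    return out;
}

// Hypothetical post-multiplication rho -> rho * M, by passing M^T to the bra qubit.
Vec4 postMultiply(const Vec4& v, const Mat2& m) {
    return applyToBraQubit(v, transpose(m));
}
```

Passing M itself (without transposing) would instead produce rho M^T, which differs whenever M is not symmetric.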
Luc: v3.7 behaved sensibly on NUMA machines "by default" through first-touch initialization. This had been lost in v4, as identified by James Richings. Here's some basic NUMA-aware allocation, and a little love for general parallel/OpenMP usage.
- If we're on *nix _and_ we find libnuma, we enable NUMA-aware allocations
- Add & use cpu_allocNumaArray() and cpu_deallocNumaArray() for the state-vector allocations (since the current alloc functions are also used for many smaller regions). Fall back to the normal allocation functions if NUMA-unaware.
- Perform zero-initialization in parallel (still with std::fill(), but inside a parallel region)
- Make getCurrentNumThreads() work inside parallel regions (!)
- Add getAvailableNumThreads() to get the thread count outside parallel regions. Improve this over the previous getCurrentNumThreads() by calling the omp function only once (rather than once per thread).
Luc coded the logic and Tyson added doc and error-handling. PR #658 replaced the original #652.
Co-authored-by: Luc Jaulmes
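The first-touch idea behind the parallel zero-initialisation can be sketched as follows. Under Linux's default memory policy, a physical page is placed on the NUMA node of the thread that first writes it, so zeroing each thread's chunk from that thread (with the same static partitioning later used for simulation) keeps most accesses node-local. This is a hypothetical standalone sketch, not QuEST's cpu_allocNumaArray(), which additionally uses libnuma; compiled without OpenMP it simply runs serially.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#ifdef _OPENMP
#include <omp.h>
#endif

// Zeroes `arr` in parallel, each thread writing one contiguous chunk. When the
// memory is freshly allocated and untouched, this first write decides (under a
// first-touch policy) which NUMA node backs each page.
void parallelZeroInit(double* arr, std::size_t len) {
    #pragma omp parallel
    {
        std::size_t numThreads = 1;
        std::size_t threadId = 0;
#ifdef _OPENMP
        numThreads = static_cast<std::size_t>(omp_get_num_threads());
        threadId = static_cast<std::size_t>(omp_get_thread_num());
#endif
        const std::size_t chunk = len / numThreads;
        const std::size_t start = threadId * chunk;
        const std::size_t end = (threadId + 1 == numThreads) ? len : start + chunk;
        std::fill(arr + start, arr + end, 0.0);
    }
}
```

The benefit relies on the array not being written beforehand; for instance, a `std::vector<double>(n)` would already have been zeroed serially by the constructing thread.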
Danny Hindson discovered a bug wherein the failing cudaMalloc() call, deliberately induced by the 'out-of-memory' unit test of createFullStateDiagMatr(), breaks subsequent GPU simulation. This is because a failing cudaMalloc() corrupts the CUDA API state until it is explicitly cleared using the undocumented facility of cudaGetLastError() (which clears 'non-sticky' errors). We correct this, and defensively check for irrecoverable sticky errors.
in order to shrink the bloated set of operations, making those remaining "standard" and trace-preserving (with the exception of applyQubitProjector() and applyMultiQubitProjector()). The new multiplications module caters to "raw" linear algebra upon density matrices
which has the below benefits:
- the remaining functions in operations.cpp are precise and do not need to be user-configured for accuracy (i.e. no Trotter hyperparameters)
- the remaining functions in operations.cpp merely call the backend and do not include any bespoke logic (i.e. Trotter circuit scheduling)
- incoming new Trotter functions for dynamical simulation will be more clearly delineated from the "standard" (and relatively boring) operations
- the Trotter logic is isolated in preparation for it becoming more substantial with the introduction of randomisation, commuting groups, and the logic necessary for Lindblad master equation solving
Previously, the order >= 4 scenario of Trotterisation did not correctly invoke recursion, but instead called first-order Trotterisation five times without symmetrisation. This meant passing order=4 erroneously excluded symmetrisation (halving the Trotter depth), and passing order>=6 merely performed unsymmetrised fourth-order Trotterisation. Thankfully the effected operation was still a valid Trotter approximation of the intended unitary, albeit of lower order, and ergo lower accuracy, than expected. This was not caught by the unit tests since they do not exist, as warned in the function documentation. Eep!
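For reference, the standard Suzuki recursion builds order 2k from five calls to order 2k-2, bottoming out at the symmetrised second-order step rather than at first order. The sketch below merely schedules the exponentials (which term, and what fraction of the total angle) so the structure can be inspected; `Slice` and the function names are hypothetical, it is not QuEST's implementation, and it assumes `order` is 1 or even.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// One exponential of the circuit: exp(i * coeff * x * H_term) for a Hamiltonian
// H = sum_j H_j with numTerms terms, and total angle x.
struct Slice {
    int term;
    double coeff;
};

// Symmetrised second-order step: half-steps through the terms forwards, then backwards.
void appendSecondOrder(std::vector<Slice>& out, int numTerms, double coeff) {
    for (int j = 0; j < numTerms; ++j)
        out.push_back({j, coeff / 2});
    for (int j = numTerms - 1; j >= 0; --j)
        out.push_back({j, coeff / 2});
}

// Suzuki recursion for even order 2k >= 4:
//   S_{2k}(x) = S_{2k-2}(p x)^2  S_{2k-2}((1 - 4p) x)  S_{2k-2}(p x)^2,
//   with p = 1 / (4 - 4^(1/(2k-1))).
void appendTrotterStep(std::vector<Slice>& out, int numTerms, int order, double coeff) {
    assert(order == 1 || order % 2 == 0);
    if (order == 1) {
        for (int j = 0; j < numTerms; ++j)
            out.push_back({j, coeff});
        return;
    }
    if (order == 2) {
        appendSecondOrder(out, numTerms, coeff);
        return;
    }
    const double p = 1.0 / (4.0 - std::pow(4.0, 1.0 / (order - 1)));
    const double factors[5] = {p, p, 1 - 4 * p, p, p};
    for (double f : factors)
        appendTrotterStep(out, numTerms, order - 2, f * coeff);  // recurse to order-2, not order 1
}
```

For each term, the coefficients sum to one at every order, since 4p + (1 - 4p) = 1; only the arrangement of slices changes the approximation order.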
specifically:
- multiplyQubitProjector
- postMultiplyQubitProjector
- multiplyMultiQubitProjector
- postMultiplyMultiQubitProjector
Also updated the multiplication doc warnings.
Specifically:
- applyTrotterizedUnitaryTimeEvolution()
- applyTrotterizedImaginaryTimeEvolution()
- applyTrotterizedNoisyTimeEvolution()
where the latter has significant novelty. The PR also:
- (patch) made imaginary-time evolution assert Hermiticity (315ea41)
- (patch) patched non-unitary Trotterisation of density matrices (01c51e1)
- updated the dynamics examples to use these new time-evolution functions
- tidied some Pauli algebra (replacing paulis_hasOddNumY calls with direct paulis_getSignOfPauliStrConj)
- made createPauliStrSum validate that it can fit in RAM
since paulis.cpp is an "API" file while the pauli logic previously therein was used by numerous core files. A final inelegance remains; some unit test utilities leverage the internal pauli logic functions using extern, which will break when we eventually switch to private namespacing. Alas!
in order to make the applyTrotterized prefix consistent, as considered in #669. Specifically, renamed:
- applyNonUnitaryTrotterizedPauliStrSumGadget -> applyTrotterizedNonUnitaryPauliStrSumGadget
- applyControlledTrotterizedPauliStrSumGadget -> applyTrotterizedControlledPauliStrSumGadget
- applyMultiControlledTrotterizedPauliStrSumGadget -> applyTrotterizedMultiControlledPauliStrSumGadget
- applyMultiStateControlledTrotterizedPauliStrSumGadget -> applyTrotterizedMultiStateControlledPauliStrSumGadget
since superseded by setQuregToWeightedSum(). Additionally defined the internal convenience functions
- localiser_statevec_scaleAmps
- localiser_statevec_setQuregToClone
which merely call localiser_statevec_setQuregToWeightedSum, for code clarity.
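As a sketch of why scaling and cloning can be expressed through a weighted sum: with out = sum_i c_i psi_i and a single input, choosing c_0 = f scales the state and c_0 = 1 clones it. The functions below are hypothetical stand-ins operating on plain vectors, not the localiser_* functions themselves.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cd = std::complex<double>;
using State = std::vector<cd>;

// out = sum_i coeffs[i] * states[i]; all states must have equal length.
State weightedSum(const std::vector<cd>& coeffs, const std::vector<State>& states) {
    assert(!states.empty() && coeffs.size() == states.size());
    State out(states[0].size(), cd(0, 0));
    for (std::size_t i = 0; i < states.size(); ++i) {
        assert(states[i].size() == out.size());
        for (std::size_t k = 0; k < out.size(); ++k)
            out[k] += coeffs[i] * states[i][k];
    }
    return out;
}

// Special cases, expressed via the general routine for code clarity.
State scaleAmps(const State& psi, cd factor) { return weightedSum({factor}, {psi}); }
State cloneState(const State& psi) { return weightedSum({cd(1, 0)}, {psi}); }
```

The same formula with real, non-negative weights summing to one describes a mixture of density matrices.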
as per #638. The previous use of the Pauli-specific multi-qubit backend logic was suboptimal for a single target, since it involved superfluous per-amplitude evaluation of bitstring parity. This introduced a performance regression in single-core QuEST v4 relative to v3, which used single-target matrix logic. This affects the performance of all explicitly single-target Pauli functions, specifically:
- applyPauliX()
- applyControlledPauliX()
- applyMultiControlledPauliX()
- applyMultiStateControlledPauliX()
- applyPauliY()
- applyControlledPauliY()
- applyMultiControlledPauliY()
- applyMultiStateControlledPauliY()
- applyPauliZ()
- applyControlledPauliZ()
- applyMultiControlledPauliZ()
- applyMultiStateControlledPauliZ()
- applyRotateX()
- applyControlledRotateX()
- applyMultiControlledRotateX()
- applyMultiStateControlledRotateX()
- applyRotateY()
- applyControlledRotateY()
- applyMultiControlledRotateY()
- applyMultiStateControlledRotateY()
- applyRotateZ()
- applyControlledRotateZ()
- applyMultiControlledRotateZ()
- applyMultiStateControlledRotateZ()
which are concisely summarised as X cX ccX csX Y cY ccY csY Z cZ ccZ csZ Rx cRx ccRx csRx Ry cRy ccRy csRy Rz cRz ccRz csRz. Beware that this does not affect single targets incidentally passed to functions which can accept many, such as applyPauliStr() and applyPauliGadget(). Note too that further changes are expected to be necessary to recover single-core v3 performance.
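The difference between the two backend strategies can be seen in a toy state-vector kernel. The generic Pauli-string path below evaluates a bit parity for every amplitude, while the single-target path visits amplitude pairs and applies a 2x2 matrix directly, in the spirit of the one-qubit matrix logic described above. Both are hypothetical simplifications, not QuEST's kernels; for Z they produce identical results.

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>
#include <cstdint>
#include <vector>

using cd = std::complex<double>;
using Mat2 = std::array<std::array<cd, 2>, 2>;

// Parity of the set bits of x, via Kernighan's bit-clearing loop.
int bitParity(std::uint64_t x) {
    int parity = 0;
    while (x) {
        parity ^= 1;
        x &= x - 1;
    }
    return parity;
}

// Generic path: a Z string on the qubits in targetMask negates every amplitude
// whose index has odd parity on those qubits, costing one parity per amplitude.
void applyZStrViaParity(std::vector<cd>& amps, std::uint64_t targetMask) {
    for (std::size_t i = 0; i < amps.size(); ++i)
        if (bitParity(i & targetMask))
            amps[i] = -amps[i];
}

// Single-target path: applies a 2x2 matrix to each pair of amplitudes
// differing only in the target bit.
void applyOneQubitMatrix(std::vector<cd>& amps, int target, const Mat2& m) {
    const std::size_t bit = std::size_t(1) << target;
    for (std::size_t i = 0; i < amps.size(); ++i) {
        if (i & bit)
            continue;  // visit each pair once, from its lower index
        const std::size_t j = i | bit;
        const cd a0 = amps[i], a1 = amps[j];
        amps[i] = m[0][0] * a0 + m[0][1] * a1;
        amps[j] = m[1][0] * a0 + m[1][1] * a1;
    }
}
```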
which had only the below 4 changes visible to users:
- changed the reported memory from 3 to 4 significant figures (e.g. `3.23e1 KiB` becomes `32.30 KiB`)
- fixed a README doc link
- renamed the `setQuregToClone` parameters for clarity
- suppressed illegitimate unused-variable compiler warnings
and otherwise merely tidied internal code. See #672 for all changes.
so I can subsequently refactor it (stop propagating options to preprocessors) without pulling out all my hair. Whitespace is free! :^)
As described in issue #638, QuEST v4 contained a performance regression (from v3) seen only in some CPU settings. This was due to the use of std::complex operator overloads in cpu_subroutines.cpp (whereas QuEST v3 hand-rolled complex arithmetic), and affected compilation with Clang (in both single-threaded and multithreaded settings) and with GCC (only in single-threaded settings), and potentially with other compilers. We tentatively patch this issue by passing additional compiler optimisation flags to cpu_subroutines.cpp which circumvent it. This is a rather aggravating solution to a major pitfall in the C++ standard library. After deliberation, it beat out other solutions including hand-rolling complex arithmetic, using a custom complex type, and using more precise, compiler-specific flags.
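To illustrate what "hand-rolled complex arithmetic" means here: a plain component-wise product, as below, is just four multiplies and two adds, whereas `std::complex` multiplication may additionally carry code for handling infinities and NaNs unless flags such as GCC's `-fcx-limited-range` are given. That mechanism is our assumption about a likely contributor, not a diagnosis stated in the commit, and the helper is not QuEST code.

```cpp
#include <cassert>
#include <complex>

using cd = std::complex<double>;

// Hand-rolled product (a+ib)(c+id) = (ac - bd) + i(ad + bc), with no special
// handling of infinities or NaNs.
inline cd mulHandRolled(cd x, cd y) {
    return cd(x.real() * y.real() - x.imag() * y.imag(),
              x.real() * y.imag() + x.imag() * y.real());
}
```

For finite inputs the two agree, so the trade-off is purely about the code the compiler emits in hot loops.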
All user-configurable macros utilised by the source code (e.g. `COMPILE_MPI`) are now CMake options, passed to the source only via preparation of the `config.h` header. This centralises them, reduces the myriad arguments passed to the compiler (which made verbose debugging cumbersome), makes erroneous overriding of macros more difficult (if not impossible), and logs the macro choices when installing QuEST. We also took the chance to clean up the main CMakeLists.txt, defend against user-overriding of pre-set macros, and automate setting the QuEST version macros from the CMake build. Finally, we patched an issue when installing QuEST via FetchContent and/or inside a directory (such as a git submodule). Tyson refactored the options and Oliver patched the install issues.
Co-authored-by: Oliver Thomson Brown
Overview
This release adds exciting new Trotter functionality, including imaginary-time and Lindbladian simulation, as well as functions for more general manipulation of density matrices (like left or right-applying operators). It also patches critical performance regressions in CPU simulation, a maths error in the Trotter functions, and build issues when installing QuEST.
New features
- Added real, imaginary and noisy/open (Lindblad) time evolution via Trotterisation (#647). Specifically:
  - `applyTrotterizedUnitaryTimeEvolution()`
  - `applyTrotterizedImaginaryTimeEvolution()`
  - `applyTrotterizedNoisyTimeEvolution()`
- Added `applyNonUnitaryPauliGadget()`, which permits rotating about a Pauli string by a complex angle, effecting a non-unitary operation (#637).
- Added `applyTrotterizedNonUnitaryPauliStrSumGadget()`, which effects `exp(i x H)` where `x` is any complex scalar (#647).
- Added controlled Trotter functions (d4706b9, 2110b74, 211d4e8), i.e.
  - `applyTrotterizedControlledPauliStrSumGadget()`
  - `applyTrotterizedMultiControlledPauliStrSumGadget()`
  - `applyTrotterizedMultiStateControlledPauliStrSumGadget()`
- Added optimised functions to prepare superpositions and mixtures of an arbitrary number of `Qureg` (d4a5714). Specifically:
  - `setQuregToWeightedSum()`
  - `setQuregToMixture()`
- Added environment variables `PERMIT_NODES_TO_SHARE_GPU` and `DEFAULT_VALIDATION_EPSILON`, replacing their original macros, which permits changing their values pre-runtime without recompilation (#653, #655).
- Added left- and right-application of operators upon density matrices (#657, #668, c49f8d2, 53c6828), i.e.
  - `leftapplyCompMatr1()`
  - `rightapplyCompMatr1()`
  - `leftapplyCompMatr2()`
  - `rightapplyCompMatr2()`
  - `leftapplyCompMatr()`
  - `rightapplyCompMatr()`
  - `leftapplyDiagMatr1()`
  - `rightapplyDiagMatr1()`
  - `leftapplyDiagMatr2()`
  - `rightapplyDiagMatr2()`
  - `leftapplyDiagMatr()`
  - `rightapplyDiagMatr()`
  - `leftapplyDiagMatrPower()`
  - `rightapplyDiagMatrPower()`
  - `leftapplyFullStateDiagMatr()`
  - `rightapplyFullStateDiagMatr()`
  - `leftapplyFullStateDiagMatrPower()`
  - `rightapplyFullStateDiagMatrPower()`
  - `leftapplySwap()`
  - `rightapplySwap()`
  - `leftapplyPauliX()`
  - `rightapplyPauliX()`
  - `leftapplyPauliY()`
  - `rightapplyPauliY()`
  - `leftapplyPauliZ()`
  - `rightapplyPauliZ()`
  - `leftapplyPauliStr()`
  - `rightapplyPauliStr()`
  - `leftapplyPauliGadget()`
  - `rightapplyPauliGadget()`
  - `leftapplyPhaseGadget()`
  - `rightapplyPhaseGadget()`
  - `leftapplyMultiQubitNot()`
  - `rightapplyMultiQubitNot()`
  - `leftapplyQubitProjector()`
  - `rightapplyQubitProjector()`
  - `leftapplyMultiQubitProjector()`
  - `rightapplyMultiQubitProjector()`
  - `leftapplyPauliStrSum()`
  - `rightapplyPauliStrSum()`
- Function `reportQuESTEnv()` will now additionally display the available RAM (#636).
Patches
- Restored NUMA awareness, greatly improving multithreaded CPU performance (#658).
- Patched a performance regression (from QuEST v3) in CPU simulation, related to complex arithmetic (60f9b3a).
- Patched a performance regression (from QuEST v3) in the one-qubit Pauli functions, by redirecting them to the one-qubit dense matrix functions (as used by v3) (#682).
- Patched `applyTrotterizedPauliStrSumGadget()` to effect `exp(i t H)` as documented, whereas it previously erroneously effected `exp(-i t H)` (#647).
- Patched Trotterisation of `order >= 4`. Previously, the `order >= 4` scenario of Trotterisation did not correctly invoke recursion, but instead called first-order Trotterisation five times without symmetrisation. This meant passing `order=4` erroneously excluded symmetrisation (halving the Trotter depth), and passing `order>=6` merely performed unsymmetrised fourth-order Trotterisation (e0a0e96).
- Patched installation of QuEST, which now correctly records the compile-time configuration. Beware that several macros became exclusively CMake options (#645, #685).
- The unit test of `createFullStateDiagMatr()`'s input validation now expects the correct error message (aa11c23), and no longer breaks CUDA execution thereafter (37e222e).
Other changes
- Attached boolean field `isCuQuantumEnabled` to the `QuESTEnv` struct, so that it can be determined at runtime whether or not the cuQuantum GPU backend is being utilised. This is also now reported by `reportQuESTEnv()` (c357267).
- Made `createPauliStrSum()` validate ahead of time whether it can fit the necessary data structures into RAM (ca6e9f8).
- Updated the dynamics examples to use the new evolution functions (19e18b2).
- Deprecated `setQuregToSuperposition()`, since it is superseded by the new `setQuregToWeightedSum()` (ab8b560).
- Added `docs/news.md` (5bd864b).
- Added yet more missing documentation (but the battle continues!) (cb682b2).
New contributors
This release contained patches from new contributors: