Add multithreading support to ddsim #1240

wdconinc · 2024-03-11T01:25:26Z

This PR adds multithreading support to ddsim, through the -j flag. More later.

BEGINRELEASENOTES

Multithreading support in ddsim

ENDRELEASENOTES

github-actions · 2024-03-11T02:04:44Z

Test Results

9 files 9 suites 1h 40m 30s ⏱️
26 tests 23 ✅ 0 💤 3 ❌
201 runs 184 ✅ 0 💤 17 ❌

For more details on these failures, see this check.

Results for commit d0ae8cd.

♻️ This comment has been updated with latest results.

DDG4/python/DDSim/DD4hepSimulation.py

wdconinc · 2024-03-11T18:49:56Z

This is still a work in progress, but let me give some more background here as to the plan (the "more later" from the top).

The intention is, of course, full backwards compatibility so this runs in the single-threaded G4RunManager with command line options as given now (and therefore must pass all tests unchanged).
This PR already works in multi-threaded mode for generators like the gun, but
- Not for external (hepmc) input files: I would like to consider this beyond the scope of this PR since this will require some changes to the input file reader to make sure it does not read the same sequence of events in each file.
- Adapting to MT has caused single-threaded running to break: As written above, this needs to get fixed, of course.
There may be changes to the output files that are due to event reordering. These are likely unavoidable.

It is likely that I will be doing some rewrites of the last commit (5bf711c) which is a bit too big for my taste and not as clear as one would like for non-squash merging.

wdconinc · 2025-03-14T13:50:12Z

The status of this PR after #1417 is that we can now run MT gun simulations, like so:

ddsim --compactFile DDDetectors/compact/SiD.xml -j8 -G -N160 --gun.distribution uniform --outputFile SiD.edm4hep.root

Main missing feature is ensuring inputFiles is serialized. Right now this just reads the same event sequence for all threads.

wdconinc · 2025-03-14T15:21:02Z

DDG4/python/DDSim/DD4hepSimulation.py

-    self._g4gun = None
-    self._g4gps = None


TODO: Geant4 gun and GPS event generators are not initialized correctly due to this removal.

Despite modified implementation, this remain a TODO since only one thread reads the macroFile... (as of da36a3e). And Geant4GeneratorWrapper doesn't allow setting properties when shared = True.

DDG4/python/DDSim/DD4hepSimulation.py

wdconinc · 2025-03-24T19:14:22Z

The primary remaining difficulty I'm having here is with the event input files (I'm using HepMC3), which we create in:

DD4hep/DDG4/python/DDSim/DD4hepSimulation.py

Lines 444 to 446 in 6143d12

    
           gen = DDG4.GeneratorAction(kernel, "Geant4InputAction/hepmc%d" % index) 
        
           gen.Parameters = self.hepmc3.getParameters() 
        
           gen.Input = "HEPMC3FileReader|" + inputFile

Simply making the GeneratorAction shared with DDG4.GeneratorAction(kernel, "Geant4InputAction/hepmc%d" % index, shared=True) fails because the properties (Parameters, Input) that are subsequently set are properties of the Geant4InputAction and not exposed to the GeneratorAction handle gen.

To get around this, I went through setting up the necessary DDG4.InputAction support (duplicating what I felt was too much stuff along the way), and it still ended up not working.

So, I'm open to input as to how best to have a single Geant4InputAction where we can hand-out events sequentially while locking access with G4AutoLock in Geant4InputAction::create_reader and Geant4InputAction::operator().

@MarkusFrankATcernch @andresailer Do you have any suggestions?

MarkusFrankATcernch · 2025-03-25T11:25:10Z

Hi Wouter,

This construct is not simple. Let's see what is necessary before the hacking starts...

For Geant4 the GeneratorAction belongs to the worker thread's execution flow, not the main thread.
This idea is also reflected in DDG4. If you look at DD4hep/blob/master/DDG4/examples/SiDSim_MT.py you can clearly see this.

Now what one probably wants is somehow a structure that this worker thread bound GeneratorAction is sharable between all workers (share same input file, share common parametrization etc) and use this action from all threads. Secondly the underlying reader implementation should be accessible by the configuration (python with ddg4, ddsim, etc) to allow more fine grained customization than just setting some input file and defining a factory.

From what I see somehow the input stage needs then to be re-designed. In particular exchanging data between the actions building the queue of GeneratorActions which define the input stage, cannot use the Geant4Event and its extension mechanism to pass these data. The Geant4Event is bound to the worker thread via the Geant4Context.

Having said all this, it looks necessary that the queue of GeneratorActions needs to be cut in two:

the first part being locked and single threaded doing the file reading (multiple inputs, spillover, multiple collisions per crossing, etc) and the data preparation with user exposure of the queue elements before execution starts.
the second part is not locked and can then finalize the event generation (smearing, primary vertex movements, etc.) and passes all the initial elements to the Geant4Event, which is then passed to Geant4 for the actual simulation work.

How to do something like this in a backwards compatible way is not so obvious to me and should be designed properly.
As I use to say: software is never difficult or complicated unless there is a lack of thinking upfront.
Maybe we should first collect all the goodies we want to see in such an implementation. The parameter setup you mention here is already a first step.

MarkusFrankATcernch · 2025-03-25T12:32:49Z

So after coming back from the drawing board....
What you effectively want is a transition from the following object model as it is implemented now:

to an input stage which looks much rather like this (details are left out here):

Here you have

a single threaded simple input stage filling the Geant4Event structure. The Geant4Events must be requested by the initializing action (Geant4GenerationInit) from the Geant4EventQueue object to limit the number of queued events (memory, etc). This compromises the first actions from the above model.
these Geant4Event entities are then pushed into a queue from which these Geant4Events are then requested by the multi-threaded simulation.
the multi-thread action sequence then finalizes the primaruy generation and the rest continues as shown here: DD4hep/blob/master/DDG4/examples/SiDSim_MT.py

In principle this is not too difficult, but still quite more difficult that you have imagined upfront.

In addition:

the Geant4InputAction objects in the queue need their properties be exposed to the user
the entire machinery must be embedded in ddsim. How this can be done elegantly has to be seen.

wdconinc · 2025-03-25T12:50:01Z

While I agree that the diagram you outline (with an event queue) is optimal for throughput performance, I wonder if there is another less optimal but still functional approach (which may have some threads waiting unnecessarily for some of the time). What I had originally anticipated was a pattern where there are multiple Geant4InputActions sequences, one per thread as now, but the action that reads from disk would have a shared reader that is locked. This would mimic the approach of the Geant4OutputAction where each thread has an output event action, but they share and lock the output file.

MarkusFrankATcernch · 2025-03-25T19:38:05Z

@wdconinc
We should under no circumstances deviate too far from the processing model Geant4 uses.
Geant4 expects the G4GeneratorAction to be part of the worker thread and each worker thread is only provided with a unique seed for the random generator for this event. That's it.

Where one actually puts the Geant4EventQueue is sort of user-gusto here, but it must be located before the
G4Primary particles and vertices are created. If you want to work with input files a la HepMC, you have to go the complicated way I am afraid. Otherwise you would need one input per thread.

If we invest here, we should do it properly....even if it costs some development.

wdconinc · 2025-04-02T23:41:49Z

In addition:

the Geant4InputAction objects in the queue need their properties be exposed to the user

See #1440, which exposes the properties correctly now.

MarkusFrankATcernch · 2025-04-30T13:25:53Z

@wdconinc @murnanedaniel
So in other words, only the raw DDG4 python interface can currently be used to run multi-threaded.
Correct Wouter ?

wdconinc · 2025-04-30T13:37:09Z

@wdconinc @murnanedaniel
So in other words, only the raw DDG4 python interface can currently be used to run multi-threaded.
Correct Wouter ?

The scope of this PR has always been to make multi-threaded running work with ddsim. I don't know what other ways of running DDG4 people are using, but I'm not familiar with them.

murnanedaniel · 2025-05-01T14:52:09Z

Hi @wdconinc. Thanks for the quick response. I have been trying to reproduce your working setup with input hepmc3 files, but get this error, when setting to 8 threads:

=======================================================================
Geant4Kernel     INFO  +++ Created worker instance id=1046861376l
UserInitialization INFO  +++ Executing Geant4UserActionInitialization::Build. Context:0x7f9b303af6a0 Kernel:0x7f9b303afb30 [0]
PyG4Init         INFO  +++ Worker:0 Build PYTHON Worker [empty]....
Geant4Kernel     INFO  ++ Registered global action RunInit of type dd4hep::sim::Test::Geant4TestRunAction
Geant4UI         INFO  +++ RunAction> Install Geant4 control directory:/ddg4.0/RunAction/
Traceback (most recent call last):
  File "/global/cfs/cdirs/m4958/usr/danieltm/ColliderML/software/OtherLibraries/dd4hep-mt/lib/python3.11/site-packages/DDSim/DD4hepSimulation.py", line 480, in __setupWorker
    self.__setupGeneratorActions(kernel, geant4)
  File "/global/cfs/cdirs/m4958/usr/danieltm/ColliderML/software/OtherLibraries/dd4hep-mt/lib/python3.11/site-packages/DDSim/DD4hepSimulation.py", line 339, in __setupGeneratorActions
    shared = (kernel.RunManagerType == "G4MTRunManager")
              ^^^^^^^^^^^^^^^^^^^^^
  File "/global/cfs/cdirs/m4958/usr/danieltm/ColliderML/software/OtherLibraries/dd4hep-mt/lib/python3.11/site-packages/DDG4.py", line 161, in _getKernelProperty
    elif hasattr(self, name):
         ^^^^^^^^^^^^^^^^^^^
  File "/global/cfs/cdirs/m4958/usr/danieltm/ColliderML/software/OtherLibraries/dd4hep-mt/lib/python3.11/site-packages/DDG4.py", line 161, in _getKernelProperty
    elif hasattr(self, name):
         ^^^^^^^^^^^^^^^^^^^
  File "/global/cfs/cdirs/m4958/usr/danieltm/ColliderML/software/OtherLibraries/dd4hep-mt/lib/python3.11/site-packages/DDG4.py", line 161, in _getKernelProperty
    elif hasattr(self, name):
         ^^^^^^^^^^^^^^^^^^^
  [Previous line repeated 995 more times]
RecursionError: maximum recursion depth exceeded
TypeError: 'NoneType' object cannot be interpreted as an integer
PyG4Init         FATAL +++ Worker setup command returned -1, not SUCCESS (1). Terminating setup

**************************************************** 
*  A runtime error has occured :                     
*    +++ Worker setup command returned -1, not SUCCESS (1). Terminating setup
*  the program will have to be terminated - sorry.   
****************************************************

I removed the elif python handler test that was causing the recursion, leading to (I believe) the real error, which is this:

=======================================================================
Geant4Kernel     INFO  +++ Created worker instance id=16168512l
UserInitialization INFO  +++ Executing Geant4UserActionInitialization::Build. Context:0x7f01f43af6a0 Kernel:0x7f01f43afb30 [0]
PyG4Init         INFO  +++ Worker:0 Build PYTHON Worker [empty]....
Geant4Kernel     INFO  ++ Registered global action RunInit of type dd4hep::sim::Test::Geant4TestRunAction
Geant4UI         INFO  +++ RunAction> Install Geant4 control directory:/ddg4.0/RunAction/
Traceback (most recent call last):
  File "/global/cfs/cdirs/m4958/usr/danieltm/ColliderML/software/OtherLibraries/dd4hep-mt/lib/python3.11/site-packages/DDSim/DD4hepSimulation.py", line 480, in __setupWorker
    self.__setupGeneratorActions(kernel, geant4)
  File "/global/cfs/cdirs/m4958/usr/danieltm/ColliderML/software/OtherLibraries/dd4hep-mt/lib/python3.11/site-packages/DDSim/DD4hepSimulation.py", line 339, in __setupGeneratorActions
    shared = (kernel.RunManagerType == "G4MTRunManager")
              ^^^^^^^^^^^^^^^^^^^^^
  File "/global/cfs/cdirs/m4958/usr/danieltm/ColliderML/software/OtherLibraries/dd4hep-mt/lib/python3.11/site-packages/DDG4.py", line 162, in _getKernelProperty
    raise KeyError(msg)
KeyError: 'Geant4Kernel::GetProperty [Unhandled]: Cannot access Kernel.RunManagerType'
TypeError: 'NoneType' object cannot be interpreted as an integer
PyG4Init         FATAL +++ Worker setup command returned -1, not SUCCESS (1). Terminating setup

**************************************************** 
*  A runtime error has occured :                     
*    +++ Worker setup command returned -1, not SUCCESS (1). Terminating setup
*  the program will have to be terminated - sorry.   
****************************************************

For context, here is the full output log: full_output.log
My environment should be perfectly vanilla - LCG 107, running on the OpenDataDetector, with Pythia-generated hepmc3 files. I will continue to debug, but hoped you might have a quick idea.

murnanedaniel · 2025-05-01T15:03:15Z

Okay, I made a couple of quick tweaks and it does seem to work, which is really promising. Firstly, the tweaks I made are here: murnanedaniel@cbbe86e. I'm not sure if you agree with them, or why they are not necessary in your setup?

Secondly, there is one final error appearing after the simulation that doesn't appear in the main DD4hep. You can see it at the end of the successful log file: full_output_mt.log
Any idea where this is coming from?

murnanedaniel · 2025-05-14T11:00:22Z

@wdconinc Just following up here, since we will soon be running a large set of simulations with your MT changes. I just want to ensure that the small error that appears, and the tweaks that I needed to make, are still consistent with the working version of ddsim as you see it. On that same point - have you validated that MT sim of hepmc3 inputs produces more or less identical outputs to DD4hep main?

Remove elif leading to recursion and change shared check strategy

andresailer reviewed Mar 11, 2024

View reviewed changes

DDG4/python/DDSim/DD4hepSimulation.py Show resolved Hide resolved

andresailer reviewed Mar 11, 2024

View reviewed changes

DDG4/python/DDSim/DD4hepSimulation.py Outdated Show resolved Hide resolved

wdconinc mentioned this pull request Mar 12, 2025

Support multithreading in Geant4EDM4hepOutput plugin #1417

Merged

1 task

wdconinc force-pushed the ddsim-mt branch from 5bf711c to 2a30872 Compare March 14, 2025 13:47

wdconinc commented Mar 14, 2025

View reviewed changes

wdconinc mentioned this pull request Apr 2, 2025

fix: adopt properties of underlying Geant4GeneratorAction in Geant4SharedGeneratorAction #1440

Merged

wdconinc added 15 commits April 8, 2025 21:26

ddsim: option -j,--numberOfThreads (defaults to 1)

e59b302

ddsim: print number of threads in processing time summary

4135d9d

ddsim: support MT in magnetic field

5442d1e

ddsim: support MT

cd386d2

ddsim: use instance methods

db1c3bb

ddsim: move output into setupWorker

d7f3967

ddsim: remove geant4.terminate since geant4.run does all steps

ecf538d

ddsim: fixup __setupWorker args when single-threaded

767e3a2

ddsim: re-add removed cosmetics

6e83015

ddsim: fix flake8 indendation

f1cbc4c

ddsim: fix flake8 unused variable

40a3eed

ddsim: restore original MagFieldTrackingSetup

65e5e35

ddsim: fix flake8 indentation

da0895f

PyDDG4: run -> runAll; add configure, initialize, run, terminate

0d343de

ddsim: restore separation of configure, initialize from run, terminate

08e2c69

Remove elif leading to recursion and change shared check strategy

cbbe86e

murnanedaniel mentioned this pull request May 26, 2025

Remove elif leading to recursion and change shared check strategy wdconinc/DD4hep#3

Merged

wdconinc added 20 commits August 1, 2025 10:44

ddsim: option -j,--numberOfThreads (defaults to 1)

088cd08

ddsim: print number of threads in processing time summary

35f2442

ddsim: support MT in magnetic field

b44858f

ddsim: support MT

21d70fc

ddsim: use instance methods

762e2b5

ddsim: move output into setupWorker

4beeacc

ddsim: remove geant4.terminate since geant4.run does all steps

1e8d8d6

ddsim: fixup __setupWorker args when single-threaded

2677278

ddsim: re-add removed cosmetics

5eaa6bd

ddsim: fix flake8 indendation

232ecdc

ddsim: fix flake8 unused variable

425ff75

ddsim: restore original MagFieldTrackingSetup

a2c1a8b

ddsim: fix flake8 indentation

d9b82d2

PyDDG4: run -> runAll; add configure, initialize, run, terminate

13be92b

ddsim: restore separation of configure, initialize from run, terminate

13d63e8

ddsim: add each thread's g4gun and g4gps to _g4gun, _g4gun

31eb19e

ddsim: readd exitCode

a586d82

ddsim: use shared generator action for hepmc3

7492c77

ddsim: only set generator shared when G4MTRunManager

f70fb63

ddsim: use shared-access-if-MT for all generators with input file

0e25341

wdconinc force-pushed the ddsim-mt branch from 5db9e60 to 0e25341 Compare August 1, 2025 15:44

wdconinc added 2 commits August 1, 2025 12:37

Merge branch 'ddsim-mt' into ddsim-mt-pr-fixes

1d3f87e

Merge pull request #3 from murnanedaniel/ddsim-mt-pr-fixes

d0ae8cd

Remove elif leading to recursion and change shared check strategy

Add multithreading support to ddsim #1240

Are you sure you want to change the base?

Add multithreading support to ddsim #1240

Uh oh!

Conversation

wdconinc commented Mar 11, 2024

Uh oh!

github-actions bot commented Mar 11, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Test Results

Uh oh!

Uh oh!

Uh oh!

wdconinc commented Mar 11, 2024

Uh oh!

wdconinc commented Mar 14, 2025

Uh oh!

wdconinc Mar 14, 2025

Choose a reason for hiding this comment

Uh oh!

wdconinc Mar 15, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

wdconinc commented Mar 24, 2025

Uh oh!

MarkusFrankATcernch commented Mar 25, 2025

Uh oh!

MarkusFrankATcernch commented Mar 25, 2025

Uh oh!

wdconinc commented Mar 25, 2025

Uh oh!

MarkusFrankATcernch commented Mar 25, 2025

Uh oh!

wdconinc commented Apr 2, 2025

Uh oh!

MarkusFrankATcernch commented Apr 30, 2025

Uh oh!

wdconinc commented Apr 30, 2025

Uh oh!

murnanedaniel commented May 1, 2025

Uh oh!

murnanedaniel commented May 1, 2025

Uh oh!

murnanedaniel commented May 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

github-actions bot commented Mar 11, 2024 •

edited

Loading

murnanedaniel commented May 14, 2025 •

edited

Loading