@kk428
Contributor

@kk428 kk428 commented Oct 14, 2025

With dynamic memory allocation implemented, the high-pT QCD sample I am looking at causes LST to crash. The sample contains an overwhelmingly large number of fake tracks that need to be accounted for. As is, LST attempts to allocate memory for millions of (mostly fake) T5s. We expect the true number to be around 300,000, compared to 60,000 in the ttbar sample.

We could implement a cut that would remove many of the fake tracks, but there is a chance that this would impact the efficiency. Instead, I added a condition that performs the standard set of cuts in the counting kernel only if the corresponding triplet is densely connected. Most triplets are connected to only a handful of potential quintuplets, whereas the ones leading to a large number of fake tracks have O(1000) connections. The condition therefore checks whether the number of inner and outer connections is less than 1000; triplets that fail this check are treated as densely connected.
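The gating described above can be sketched in plain C++ as follows. This is only an illustration, not the code in this PR: `kDenseThreshold`, `isSparse`, `countCandidates`, and the `passesCuts` callback are hypothetical names, the pairing of inner and outer connections is a toy model, and treating a triplet as sparse only when both counts are below the threshold is an assumption about how the check combines them.

```cpp
#include <cstdint>

// Assumed value, taken from the "less than 1000" wording in the description.
constexpr std::uint32_t kDenseThreshold = 1000;

// Assumption: a triplet is "sparse" only if both connection counts are below the threshold.
inline bool isSparse(std::uint32_t nInner, std::uint32_t nOuter) {
  return nInner < kDenseThreshold && nOuter < kDenseThreshold;
}

// Returns the number of T5 slots to reserve for one triplet.
// passesCuts(i, o) stands in for the full selection used in the creation kernel.
template <typename Cuts>
std::uint64_t countCandidates(std::uint32_t nInner, std::uint32_t nOuter, Cuts passesCuts) {
  if (isSparse(nInner, nOuter)) {
    // Cheap path: reserve room for every pairing; the cuts run later, at creation time.
    return static_cast<std::uint64_t>(nInner) * nOuter;
  }
  // Dense path: run the cuts now so that fakes do not inflate the allocation.
  std::uint64_t n = 0;
  for (std::uint32_t i = 0; i < nInner; ++i)
    for (std::uint32_t o = 0; o < nOuter; ++o)
      if (passesCuts(i, o))
        ++n;
  return n;
}
```

In this model a sparse triplet never evaluates `passesCuts` during counting, which is consistent with the statement that samples without densely connected triplets see no extra cost in the counting step.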

Normally the full set of cuts would only be performed in the creation kernel. The cuts that are now included in the counting kernel do not affect the timing for a ttbar sample as none of its events meet the densely connected condition.
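For readers unfamiliar with the count-then-create pattern, here is a minimal host-side sketch of how per-module counts from a counting pass are commonly turned into buffer offsets and a total allocation size. The function name `offsetsFromCounts` is hypothetical, and whether LST lays out its buffers exactly this way is an assumption.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Exclusive prefix sum over the per-module counts.
// offsets[m] is where module m's objects start in the flat buffer;
// the extra last entry holds the total number of slots to allocate.
inline std::vector<std::size_t> offsetsFromCounts(const std::vector<std::size_t>& counts) {
  std::vector<std::size_t> offsets(counts.size() + 1, 0);
  std::partial_sum(counts.begin(), counts.end(), offsets.begin() + 1);
  return offsets;
}
```

With this layout an overcount in the counting pass only wastes memory, while an undercount leaves the creation pass without room to store accepted objects. That is why any cuts applied while counting should not reject candidates that the creation pass would accept.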

Here is a comparison in performance and timing on standalone:

   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
[target branch]
   avg     15.2      0.4      0.4      0.5      0.6      0.3      0.6      0.3      0.9      0.0      19.3       3.9+/-  0.8      19.3   explicit[s=1]
   avg      1.2      0.6      0.6      0.7      0.8      0.3      0.9      0.4      1.3      0.0       6.8       5.3+/-  1.1       6.9   explicit[s=2]
   avg      2.1      0.8      1.0      1.2      1.3      0.4      1.5      0.7      1.9      0.0      10.9       8.4+/-  1.5       2.8   explicit[s=4]
   avg      2.4      1.3      1.5      1.8      1.8      0.6      2.0      0.9      2.6      0.0      14.8      11.8+/-  2.3       2.5   explicit[s=6]
   avg      3.5      1.7      2.0      2.2      2.5      0.7      2.9      1.2      3.3      0.0      20.1      15.8+/-  3.6       5.1   explicit[s=8]
[this PR]
   avg     24.3      0.4      0.4      0.5      0.9      0.3      0.6      0.3      0.9      0.0      28.7       4.1+/-  1.7      28.8   explicit[s=1]
   avg      1.2      0.6      0.6      0.7      1.1      0.3      1.0      0.5      1.2      0.0       7.2       5.7+/-  1.9       3.6   explicit[s=2]
   avg      2.5      0.8      1.0      1.2      1.6      0.4      1.5      0.7      1.9      0.0      11.8       8.8+/-  2.4       3.0   explicit[s=4]
   avg      3.0      1.3      1.4      1.8      2.3      0.6      2.1      1.0      2.7      0.0      16.3      12.7+/-  3.4       2.8   explicit[s=6]
   avg      3.8      1.7      2.0      2.4      3.0      0.7      2.9      1.3      3.3      0.0      21.2      16.7+/-  3.9       2.7   explicit[s=8]

@cmsbuild
Contributor

cmsbuild commented Oct 14, 2025

cms-bot internal usage

@cmsbuild
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-49164/46457

@cmsbuild
Contributor

A new Pull Request was created by @kk428 for master.

It involves the following packages:

  • RecoTracker/LSTCore (reconstruction)

@cmsbuild, @jfernan2, @mandrenguyen can you please review it and eventually sign? Thanks.
@GiacomoSguazzoni, @VinInn, @VourMa, @dgulhan, @elusian, @felicepantaleo, @gpetruc, @mmasciov, @mmusich, @mtosi, @rovere this is something you requested to watch as well.
@ftenchini, @mandrenguyen, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

@slava77
Contributor

slava77 commented Oct 14, 2025

test parameters:

  • enable_tests = gpu
  • workflows_gpu = 29634.704,29834.704
  • workflows = 29634.703,29834.703,29834.755,29634.757,29834.757
  • relvals_opt = -w upgrade,standard
  • relvals_opt_gpu = -w upgrade,standard

@slava77
Contributor

slava77 commented Oct 14, 2025

@cmsbuild please test

@cmsbuild
Contributor

+1

Size: This PR adds an extra 44KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-4492e0/48684/summary.html
COMMIT: 4b229f3
CMSSW: CMSSW_16_0_X_2025-10-14-1100/el8_amd64_gcc13
Additional Tests: GPU,AMD_MI300X,AMD_W7900,NVIDIA_H100,NVIDIA_L40S,NVIDIA_T4
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/49164/48684/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially removed 4 lines from the logs
  • Reco comparison results: 8 differences found in the comparisons
  • DQMHistoTests: Total files compared: 58
  • DQMHistoTests: Total histograms compared: 4329520
  • DQMHistoTests: Total failures: 26
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 4329474
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 57 files compared)
  • Checked 243 log files, 210 edm output root files, 58 DQM output files
  • TriggerResults: no differences found

AMD_MI300X Comparison Summary

Summary:

AMD_W7900 Comparison Summary

Summary:

NVIDIA_H100 Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 264 differences found in the comparisons
  • DQMHistoTests: Total files compared: 12
  • DQMHistoTests: Total histograms compared: 178926
  • DQMHistoTests: Total failures: 28432
  • DQMHistoTests: Total nulls: 10
  • DQMHistoTests: Total successes: 150484
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 11 files compared)
  • Checked 46 log files, 48 edm output root files, 12 DQM output files
  • TriggerResults: no differences found

NVIDIA_L40S Comparison Summary

Summary:

NVIDIA_T4 Comparison Summary

Summary:

@jfernan2
Contributor

assign heterogeneous

@cmsbuild
Contributor

New categories assigned: heterogeneous

@fwyzard,@makortel you have been requested to review this Pull request/Issue and eventually sign? Thanks

@jfernan2
Contributor

@kk428 could you please add to the description any link to performance or timing studies of this change? Thanks

@kk428
Contributor Author

kk428 commented Oct 15, 2025

> @kk428 could you please add to the description any link to performance or timing studies of this change? Thanks

I included a link to the CI tests done on the SegmentLinking fork. Let me know if you need anything else.

Edit: I changed the description from a link to a timing comparison to the timing comparison itself.

@jfernan2
Contributor

+1

@slava77
Contributor

slava77 commented Oct 21, 2025

@cms-sw/heterogeneous-l2
please clarify on the status/plans of your review of this PR.
Thank you.

} else {
  int quintupletModuleIndex = alpaka::atomicAdd(
      acc, &quintupletsOccupancy.nQuintuplets()[lowerModule1], 1u, alpaka::hierarchy::Threads{});
  //this if statement should never get executed!
Contributor

What does "this if statement should never get executed!" mean?
The if statement is always going to be executed once the code enters this branch.
Do you mean that the condition should never be true?

Contributor Author

I believe it means that the condition should never be true. The comment was left over from the block of code I copied this from, and I don't think it's really necessary, so I removed it.
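To illustrate the distinction raised in this thread, here is a small standalone sketch of slot reservation with a defensive guard, written with `std::atomic` rather than alpaka. The actual condition in the PR is not shown in the excerpt above, so the capacity check and the name `reserveSlot` are assumptions made for illustration only.

```cpp
#include <atomic>

// Reserves the next free slot for a module and returns its index, or -1 when full.
// fetch_add returns the value held before the increment, which plays the role of
// the index returned by the atomicAdd in the snippet above.
inline int reserveSlot(std::atomic<unsigned int>& nStored, unsigned int capacity) {
  const unsigned int index = nStored.fetch_add(1u);
  if (index >= capacity) {
    // The check itself always runs; with an exact count its condition is never true.
    return -1;
  }
  return static_cast<int>(index);
}
```

Note that in this sketch the counter keeps growing even after the capacity is exceeded, so a caller that later reads it should clamp it to the capacity.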

@cmsbuild
Contributor

Pull request #49164 was updated. @cmsbuild, @fwyzard, @jfernan2, @makortel, @mandrenguyen can you please check and sign again.

@fwyzard
Contributor

fwyzard commented Oct 21, 2025

Thanks

@fwyzard
Contributor

fwyzard commented Oct 21, 2025

+heterogeneous

Although, as I pointed out before, I don't think this kind of change requires a review by @cms-sw/heterogeneous-l2.

@slava77
Contributor

slava77 commented Oct 21, 2025

@cmsbuild please test

@cmsbuild
Contributor

+1

Size: This PR adds an extra 36KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-4492e0/48761/summary.html
COMMIT: 788d8b7
CMSSW: CMSSW_16_0_X_2025-10-21-1100/el8_amd64_gcc13
Additional Tests: GPU,AMD_W7900,NVIDIA_H100,NVIDIA_L40S,NVIDIA_T4
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/49164/48761/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially added 2 lines to the logs
  • Reco comparison results: 4 differences found in the comparisons
  • DQMHistoTests: Total files compared: 58
  • DQMHistoTests: Total histograms compared: 4329400
  • DQMHistoTests: Total failures: 88
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 4329292
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 57 files compared)
  • Checked 243 log files, 210 edm output root files, 58 DQM output files
  • TriggerResults: no differences found

AMD_W7900 Comparison Summary

Summary:

  • You potentially added 5 lines to the logs
  • Reco comparison results: 219 differences found in the comparisons
  • DQMHistoTests: Total files compared: 12
  • DQMHistoTests: Total histograms compared: 180174
  • DQMHistoTests: Total failures: 31474
  • DQMHistoTests: Total nulls: 14
  • DQMHistoTests: Total successes: 148686
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 11 files compared)
  • Checked 46 log files, 48 edm output root files, 12 DQM output files
  • TriggerResults: found differences in 1 / 11 workflows

NVIDIA_H100 Comparison Summary

Summary:

NVIDIA_L40S Comparison Summary

Summary:

NVIDIA_T4 Comparison Summary

Summary:

@jfernan2
Contributor

+1

@cmsbuild
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @mandrenguyen, @sextonkennedy, @ftenchini (and backports should be raised in the release meeting by the corresponding L2)

@jfernan2
Contributor

+1

@ftenchini

+1

@cmsbuild cmsbuild merged commit bcba1a3 into cms-sw:master Oct 22, 2025
23 checks passed