-
Couldn't load subscription status.
- Fork 4.6k
LST: Introducing a condition before checking cuts in the T5 counting kernel #49164
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
cms-bot internal usage |
|
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-49164/46457
|
|
A new Pull Request was created by @kk428 for master. It involves the following packages:
@cmsbuild, @jfernan2, @mandrenguyen can you please review it and eventually sign? Thanks. cms-bot commands are listed here |
|
test parameters:
|
|
@cmsbuild please test |
|
+1 Size: This PR adds an extra 44KB to repository Comparison SummarySummary:
AMD_MI300X Comparison SummarySummary:
AMD_W7900 Comparison SummarySummary:
NVIDIA_H100 Comparison SummarySummary:
NVIDIA_L40S Comparison SummarySummary:
NVIDIA_T4 Comparison SummarySummary:
|
|
assign heterogeneous |
|
@kk428 could you please add to the description any link to performance or timing studies of this change? Thanks |
I included a link to the CI tests done on the SegmentLinking fork. Let me know if you need anything else. Edit: I changed the description from a link to a timing comparison to the timing comparison itself. |
|
+1 |
|
@cms-sw/heterogeneous-l2 |
| } else { | ||
| int quintupletModuleIndex = alpaka::atomicAdd( | ||
| acc, &quintupletsOccupancy.nQuintuplets()[lowerModule1], 1u, alpaka::hierarchy::Threads{}); | ||
| //this if statement should never get executed! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does it mean
this if statement should never get executed!
?
The if statement is always going to be executed once the code enters this branch.
Do you mean that the condition should never be true ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe it means that the condition should never be true. This comment was leftover from where I copied this block of code from, but I don't think it's really necessary, so I removed it.
|
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-49164/46508 |
|
Thanks |
|
+heterogeneous Although, as I pointed out before, I don't think this kind of changes requires a review by @cms-sw/heterogeneous-l2 . |
|
@cmsbuild please test |
|
+1 Size: This PR adds an extra 36KB to repository Comparison SummarySummary:
AMD_W7900 Comparison SummarySummary:
NVIDIA_H100 Comparison SummarySummary:
NVIDIA_L40S Comparison SummarySummary:
NVIDIA_T4 Comparison SummarySummary:
|
|
+1 |
|
This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @mandrenguyen, @sextonkennedy, @ftenchini (and backports should be raised in the release meeting by the corresponding L2) |
|
+1 |
1 similar comment
|
+1 |
With dynamic memory allocation implemented, the high pT QCD sample I am looking at causes LST to crash. There is an overwhelming large amount of fake tracks that need to be accounted for. As is, it attempts to allocate memory for millions of (mostly fake) T5's. We expect the true amount to be around 300,000, compared to the 60,000 in the ttbar sample.
We could implement a cut that would remove many of the fake tracks, but there is a chance that this impacts the efficiency. Instead, here I included a condition that performs the standard set of cuts in the counting kernel if the corresponding triplet is densely connected. Most triplets will only be connected to a handful of potential quintuplets, whereas the ones leading to a large amount of fake tracks will have O(1000) connections. So, I included a condition that checks if the number of inner and outer connections is less than 1000.
Normally the full set of cuts would only be performed in the creation kernel. The cuts that are now included in the counting kernel do not affect the timing for a ttbar sample as none of its events meet the densely connected condition.
Here is a comparison in performance and timing on standalone: