Optimize histogram reservoir #7443
Conversation
Codecov Report ✅ All modified and coverable lines are covered by tests.

```
@@           Coverage Diff           @@
##            main   #7443     +/-   ##
=======================================
- Coverage   86.2%   86.2%   -0.1%
=======================================
  Files        295     295
  Lines      25864   25863      -1
=======================================
- Hits       22307   22303      -4
- Misses      3184    3187      +3
  Partials     373     373
```
Forked from this discussion here: #7443 (comment). It seems like a good idea for us as a group to align on and document what we are comfortable with in terms of how ordered measurements are reflected in collected metric data.

Co-authored-by: Tyler Yahn <[email protected]>
On further reflection, I fixed the copying issue before running the benchmark, so it is perhaps reasonable that less racy code runs slower. It would be good if the tests and/or linter detected the issue.
I also see slightly worse results, but agree it is definitely better to be correct. I'll work on a test.
I added a ConcurrentSafe test, and verified that it fails (quite spectacularly) with the previous atomic.Value implementation.
lgtm
The concurrent safe test found another race condition around my usage of sync.Pool, which I'm looking into.
The other race had to do with my usage of sync.Pool. After Collect loaded an element, that element could be placed into the sync.Pool by a subsequent store() that replaced the measurement, and then modified by another store() that retrieved it from the sync.Pool. I worked out a way to fix this, but it made the performance around 45 ns. In the end, I decided to just lock around each measurement, since that has the same parallel performance, is much simpler and more readable, and has better single-threaded performance.
Much simpler now.
```go
r.mu.Lock()
defer r.mu.Unlock()
if int(r.count) < cap(r.measurements) {
```
drive-by: I think this (and all similar code) should be `len`, not `cap`.
In the current code they are always the same, but it's slightly jarring when reading it to wonder what was intended.
This improves the concurrent performance of the histogram reservoir's Offer function by 4x (i.e. a 75% reduction).

This is accomplished by locking each measurement rather than locking around the entire storage. Also, extracting the trace context from the context.Context is deferred until collection time. This improves the performance of Offer, which is on the measure hot path. Exemplars are often overwritten, so deferring the operation until Collect reduces the overall work.

I explored using a []atomic.Pointer[measurement], but this had similar performance while being much more complex (needing a sync.Pool to eliminate allocations). The single-threaded performance was also much worse for that solution. See main...dashpole:optimize_histogram_reservoir_old.