-
-
Notifications
You must be signed in to change notification settings - Fork 8.8k
Multi target memory reallocation optimization #11744
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Multi target memory reallocation optimization #11744
Conversation
} | ||
page_idx++; | ||
} | ||
partitioner_.resize(page_idx); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you need to resize this after the loop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's meant to be a safeguard if page size decreases between different calls. if the number of pages has decreased, LeafPartition
which iterates partitioner_
with for (auto const &part : partitioner_)
could touch on stale data. this wasn't a problem previously since we were clearing partitioner_
every time we call InitData
.
it does seem to me the scenario above is extremely rare, but since adding a resize in most situations won't do anything, i figured it is probably ok to include it. they will be redundant if the number of pages never decreases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for sharing. In that case, would you like to help add a test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done. i've added tests for both HistUpdater
and MultiTargetHistBuilder
that stale partitioners would not write to p_out_position
especially when we perform an update using a data with large number of pages first then followed by an update using a data with smaller number of pages. let me know if there is any issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the optimization.
This PR extends the memory reallocation optimization introduced in #11112 to
MultiTargetHistBuilder
.In addition, this pr also adds a
resize()
call to bothMultiTargetHistBuilder
andHistUpdater
as safeguards in situations when number of pages decreases.I did a quick benchmark using a few datasets from https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multilabel.html that aren't too small while I could fit in my ec2 instance (c7i.24xl), and the changes seems consistent: