Single PUT when uploading "small" files to S3 #106
We currently always follow the same path when uploading to S3-like storage, regardless of file size.
While this path is general and battle-tested, there is room for improvement: we could check the number of buffers to be uploaded and, if it is exactly 1, perform a single PUT (going from 3 requests over the network to a single one).
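For illustration only, here is a minimal sketch of that decision in Python. It is not the code in this PR: `RecordingClient` and `upload` are hypothetical names, the client only records which requests would be sent, and I am assuming the 3 requests of the current path are the standard S3 multipart sequence (CreateMultipartUpload, UploadPart, CompleteMultipartUpload).

```python
from dataclasses import dataclass, field


@dataclass
class RecordingClient:
    """Hypothetical stand-in for an S3-like client: it only logs request names."""
    requests: list = field(default_factory=list)

    def put_object(self, key, body):
        self.requests.append("PutObject")

    def create_multipart_upload(self, key):
        self.requests.append("CreateMultipartUpload")
        return "upload-1"  # made-up upload id

    def upload_part(self, key, upload_id, part_number, body):
        self.requests.append("UploadPart")
        return f"etag-{part_number}"  # made-up ETag

    def complete_multipart_upload(self, key, upload_id, etags):
        self.requests.append("CompleteMultipartUpload")


def upload(client, key, buffers):
    """Upload a whole file, given as a list of byte buffers."""
    if len(buffers) == 1:
        # Small file: one request instead of three.
        client.put_object(key, buffers[0])
        return
    # General path: multipart upload, one UploadPart per buffer.
    upload_id = client.create_multipart_upload(key)
    etags = [
        client.upload_part(key, upload_id, number, buf)
        for number, buf in enumerate(buffers, start=1)  # S3 part numbers start at 1
    ]
    client.complete_multipart_upload(key, upload_id, etags)


small = RecordingClient()
upload(small, "data/file.avro", [b"one buffer"])
print(small.requests)

large = RecordingClient()
upload(large, "data/file.avro", [b"part a", b"part b"])
print(large.requests)
```

The single-buffer case issues only a PutObject, while anything larger keeps the existing multipart behaviour unchanged.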
This is particularly significant for writes to data lakes in general, and Iceberg in particular, because both JSON and Avro files have to be uploaded (which means that an
INSERT INTO <table> VALUES ()
currently costs 10+ sequential network requests).

Next up: figuring out how to cut the HEAD request currently performed when opening a FileHandle, so that a small upload could properly become a single request.
For the reviewer(s): I am assuming we always upload a file in full, and never just a single part of an existing file. It would be much better to check that explicitly; I am unsure whether anything else is needed.