-
Notifications
You must be signed in to change notification settings - Fork 818
Streamline download skip chunked files #13698
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Streamline download skip chunked files #13698
Conversation
Build Artifacts
|
Revert to simpler file transfer when not needed.
87948db to
511d46f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments / questions
kolibri/utils/file_transfer.py
Outdated
| CHUNK_SUFFIX = ".chunks" | ||
|
|
||
|
|
||
| class TransferFileBase(BufferedIOBase, metaclass=ABCMeta): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this metaclass syntax python 3.x only? Just inherit ABC instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes - we dropped Python 2.7 support already though. Inheriting ABC should also work, I had just copied it from the definition of the Transfer base class below, where I had also unnecessarily used the metaclass instead of direct inheritance. Can update both.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah! where have I been!? we dropped 2.7 support 🤯
kolibri/utils/file_transfer.py
Outdated
|
|
||
| @property | ||
| def chunks_count(self): | ||
| return int(math.ceil(float(self.file_size) / float(self.chunk_size))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doesn't math.ceil always return an int?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes - guess I was being overcautious in the original implementation.
kolibri/utils/file_transfer.py
Outdated
| return False | ||
|
|
||
| def ensure_writable(self): | ||
| pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why doesn't this class follow similar behavior to ChunkedFile? should it just error here if the directory doesn't exist?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it should never happen, but no reason not to add this extra safeguard.
|
|
||
| @_catch_exception_and_retry | ||
| def run(self, progress_update=None): | ||
| self._set_completed() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but I should add an explanatory comment - it's to address the test case when between the transfer being started and the transfer being run, the ChunkedFile is cleaned up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that would be helpful, considering this line then immediately afterwards if not self.completed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it does rather give the appearance of making that check redundant - but it's not!
|
Updated to address comments! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code LGTM! Ready for QA testing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! 💯
🚀
Original PR #13698 by rtibbles Original: learningequality/kolibri#13698
Merged from original PR #13698 Original: learningequality/kolibri#13698
Original PR #13698 by rtibbles Original: learningequality/kolibri#13698
Merged from original PR #13698 Original: learningequality/kolibri#13698
Summary
In 0.16 to accommodate remote file browsing, I changed all file downloading to use the ChunkedFile object to accommodate simultaneous remote browsing and file download. This created a lot more file overhead, always creating a cache database, chunked files etc. etc. even when it was a dedicated file download with no remote browsing needed.
This partially reverts some of that work to ensure that in the case where we are trying to download full files, and there is no simultaneous or previous partial file download still active, we skip the ChunkedFile overhead and just download with a straight forward GET request.
This has two performance advantages - it no longer creates directories, caches, and separate files, and it also reduces requests from a HEAD request and a GET request (using range headers for the full length of the file) to just a GET request with no range headers.
By a happy coincidence, we've been having some issues with range requests and Cloudflare recently, so reducing the number of range requests we have to handle, hopefully will make things smoother there too.
References
4th fix for #13680
Depends on #13689
Reviewer guidance
How much faster is this for downloading KA en?