Skip to content

Conversation

@rtibbles
Copy link
Member

@rtibbles rtibbles commented Sep 2, 2025

Summary

In 0.16 to accommodate remote file browsing, I changed all file downloading to use the ChunkedFile object to accommodate simultaneous remote browsing and file download. This created a lot more file overhead, always creating a cache database, chunked files etc. etc. even when it was a dedicated file download with no remote browsing needed.

This partially reverts some of that work to ensure that in the case where we are trying to download full files, and there is no simultaneous or previous partial file download still active, we skip the ChunkedFile overhead and just download with a straight forward GET request.

This has two performance advantages - it no longer creates directories, caches, and separate files, and it also reduces requests from a HEAD request and a GET request (using range headers for the full length of the file) to just a GET request with no range headers.

By a happy coincidence, we've been having some issues with range requests and Cloudflare recently, so reducing the number of range requests we have to handle, hopefully will make things smoother there too.

References

4th fix for #13680

Depends on #13689

Reviewer guidance

How much faster is this for downloading KA en?

@rtibbles rtibbles added this to the Kolibri 0.18: Planned Patch 3 milestone Sep 2, 2025
@github-actions github-actions bot added DEV: backend Python, databases, networking, filesystem... SIZE: medium labels Sep 2, 2025
@github-actions
Copy link
Contributor

github-actions bot commented Sep 2, 2025

Copy link
Member

@bjester bjester left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments / questions

CHUNK_SUFFIX = ".chunks"


class TransferFileBase(BufferedIOBase, metaclass=ABCMeta):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this metaclass syntax python 3.x only? Just inherit ABC instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - we dropped Python 2.7 support already though. Inheriting ABC should also work, I had just copied it from the definition of the Transfer base class below, where I had also unnecessarily used the metaclass instead of direct inheritance. Can update both.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah! where have I been!? we dropped 2.7 support 🤯


@property
def chunks_count(self):
return int(math.ceil(float(self.file_size) / float(self.chunk_size)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't math.ceil always return an int?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - guess I was being overcautious in the original implementation.

return False

def ensure_writable(self):
pass
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why doesn't this class follow similar behavior to ChunkedFile? should it just error here if the directory doesn't exist?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it should never happen, but no reason not to add this extra safeguard.


@_catch_exception_and_retry
def run(self, progress_update=None):
self._set_completed()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but I should add an explanatory comment - it's to address the test case when between the transfer being started and the transfer being run, the ChunkedFile is cleaned up.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that would be helpful, considering this line then immediately afterwards if not self.completed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it does rather give the appearance of making that check redundant - but it's not!

@rtibbles
Copy link
Member Author

Updated to address comments!

Copy link
Member

@bjester bjester left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code LGTM! Ready for QA testing

Copy link
Member

@radinamatic radinamatic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! 💯 :shipit: 🚀

@rtibbles rtibbles merged commit 96566b5 into learningequality:release-v0.18.x Sep 15, 2025
51 checks passed
@rtibbles rtibbles deleted the streamline_download_skip_chunked_files branch September 15, 2025 19:44
snorkelopstesting1-a11y pushed a commit to snorkel-marlin-repos/learningequality_kolibri_pr_13698_9086b2d0-b7d6-4a93-af29-f1eddb0eadc0 that referenced this pull request Oct 11, 2025
snorkelopstesting1-a11y added a commit to snorkel-marlin-repos/learningequality_kolibri_pr_13698_9086b2d0-b7d6-4a93-af29-f1eddb0eadc0 that referenced this pull request Oct 11, 2025
snorkelopstesting1-a11y pushed a commit to snorkel-marlin-repos/learningequality_kolibri_pr_13698_fd39f0b4-ee6e-4edf-a350-f61d28487cef that referenced this pull request Oct 11, 2025
snorkelopstesting1-a11y added a commit to snorkel-marlin-repos/learningequality_kolibri_pr_13698_fd39f0b4-ee6e-4edf-a350-f61d28487cef that referenced this pull request Oct 11, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

DEV: backend Python, databases, networking, filesystem... SIZE: large SIZE: medium

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants