Streamline download skip chunked files #13698

rtibbles · 2025-09-02T20:02:40Z

Summary

In 0.16, to accommodate remote file browsing, I changed all file downloading to use the ChunkedFile object so that remote browsing and file download could happen simultaneously. This added a lot of file overhead: it always created a cache database, chunked files, and so on, even for a dedicated file download with no remote browsing needed.

This partially reverts that work: when we are downloading full files and there is no simultaneous or previous partial file download still active, we skip the ChunkedFile overhead and download with a straightforward GET request.

This has two performance advantages: it no longer creates directories, caches, and separate files, and it reduces the requests from a HEAD request plus a GET request (using range headers for the full length of the file) to a single GET request with no range headers.
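To make the request savings concrete, here is a minimal sketch of the decision described above. The function `plan_requests` and its parameters are hypothetical and are not taken from `file_transfer.py`; the sketch only lists which HTTP requests each path would issue, and assumes the chunked path learns the file size from its HEAD request.

```python
from typing import Dict, List, Optional, Tuple

# (HTTP method, extra request headers)
Request = Tuple[str, Dict[str, str]]


def plan_requests(
    want_full_file: bool,
    partial_download_exists: bool,
    file_size: Optional[int] = None,
) -> List[Request]:
    """Hypothetical helper: list the requests a download would make."""
    if want_full_file and not partial_download_exists:
        # Direct path: one plain GET, no cache database or chunk directory.
        return [("GET", {})]
    # Chunked path: a HEAD request reports the size (passed in here as
    # file_size), then a GET with a Range header covers the whole file.
    if file_size is None or file_size < 1:
        raise ValueError("chunked path needs the size reported by HEAD")
    return [
        ("HEAD", {}),
        ("GET", {"Range": "bytes=0-{}".format(file_size - 1)}),
    ]
```

For a fresh full-file download this yields a single un-ranged GET; with a partial download still present it falls back to the HEAD plus ranged GET pair.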

By a happy coincidence, we've been having some issues with range requests and Cloudflare recently, so reducing the number of range requests we have to handle will hopefully make things smoother there too.

References

4th fix for #13680

Depends on #13689

Reviewer guidance

How much faster is this for downloading KA en?

github-actions · 2025-09-02T20:37:40Z

Build Artifacts

Asset type	Download link
PEX file	kolibri-0.18.2a0.dev0_git.25.g70654b1d.pex
Windows Installer (EXE)	kolibri-0.18.2a0.dev0+git.25.g70654b1d-windows-setup-unsigned.exe
Debian Package	kolibri_0.18.2a0.dev0+git.25.g70654b1d-0ubuntu1_all.deb
Mac Installer (DMG)	kolibri-0.18.2a0.dev0+git.25.g70654b1d.dmg
Android Package (APK)	kolibri-0.18.2a0.dev0+git.25.g70654b1d-0.1.5-debug.apk
Raspberry Pi Image	kolibri-pi-image-0.18.2a0.dev0+git.25.g70654b1d.zip
TAR file	kolibri-0.18.2a0.dev0+git.25.g70654b1d.tar.gz
WHL file	kolibri-0.18.2a0.dev0+git.25.g70654b1d-py2.py3-none-any.whl

bjester

Some comments / questions

bjester · 2025-09-11T16:42:41Z

kolibri/utils/file_transfer.py

 CHUNK_SUFFIX = ".chunks"


+class TransferFileBase(BufferedIOBase, metaclass=ABCMeta):


Isn't this metaclass syntax Python 3.x only? Just inherit ABC instead?

Yes - we dropped Python 2.7 support already though. Inheriting ABC should also work, I had just copied it from the definition of the Transfer base class below, where I had also unnecessarily used the metaclass instead of direct inheritance. Can update both.

Ah! where have I been!? we dropped 2.7 support 🤯
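For reference, a small sketch of the two spellings discussed in this thread. The class names are made up for illustration, and the classes are deliberately simplified (they do not inherit `BufferedIOBase` as the real one does). On Python 3 both forms end up with `ABCMeta` as the metaclass and both enforce abstract methods.

```python
from abc import ABC, ABCMeta, abstractmethod


class WithMetaclass(metaclass=ABCMeta):
    """Python 3 keyword syntax, as in the diff above."""

    @abstractmethod
    def ensure_writable(self):
        """Subclasses must implement this."""


class WithInheritance(ABC):
    """Equivalent spelling: ABC is a helper class whose metaclass is ABCMeta."""

    @abstractmethod
    def ensure_writable(self):
        """Subclasses must implement this."""


def can_instantiate(cls):
    """Return False if the class refuses instantiation (abstract methods)."""
    try:
        cls()
    except TypeError:
        return False
    return True
```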

bjester · 2025-09-11T16:56:47Z

kolibri/utils/file_transfer.py

+
    @property
    def chunks_count(self):
        return int(math.ceil(float(self.file_size) / float(self.chunk_size)))


Doesn't math.ceil always return an int?

Yes - guess I was being overcautious in the original implementation.
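A quick sketch of the point being made: on Python 3, `math.ceil` returns an `int` for float input and `/` is always true division, so the `int(...)` and `float(...)` wrappers are redundant. The standalone functions below are illustrative rewrites, not the code that was merged; the second avoids floats entirely, which is an optional variation rather than something requested in the review.

```python
import math


def chunks_count(file_size: int, chunk_size: int) -> int:
    # math.ceil already returns an int on Python 3.
    return math.ceil(file_size / chunk_size)


def chunks_count_integer_only(file_size: int, chunk_size: int) -> int:
    # Ceiling division using only integers: negate, floor-divide, negate.
    return -(-file_size // chunk_size)
```

Both assume `chunk_size` is a positive integer.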

bjester · 2025-09-11T16:59:27Z

kolibri/utils/file_transfer.py

+        return False
+
+    def ensure_writable(self):
+        pass


Why doesn't this class follow similar behavior to ChunkedFile? Should it just error here if the directory doesn't exist?

Yes, it should never happen, but no reason not to add this extra safeguard.
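One possible shape for that safeguard, as a sketch only. The class name `PlainFileSketch`, its constructor, and the choice of `FileNotFoundError` are assumptions for illustration; the thread does not show what ChunkedFile actually raises or what was finally committed.

```python
import os


class PlainFileSketch:
    """Hypothetical stand-in for the non-chunked file wrapper."""

    def __init__(self, filepath):
        self.filepath = filepath

    def ensure_writable(self):
        # Fail early if the destination directory is missing, instead of
        # silently passing and failing later when the file is opened.
        directory = os.path.dirname(os.path.abspath(self.filepath))
        if not os.path.isdir(directory):
            raise FileNotFoundError(
                "Directory does not exist: {}".format(directory)
            )
```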

bjester · 2025-09-11T17:32:48Z

kolibri/utils/file_transfer.py


    @_catch_exception_and_retry
    def run(self, progress_update=None):
+        self._set_completed()


Should this be here?

Yes, but I should add an explanatory comment - it's to address the test case where the ChunkedFile is cleaned up between the transfer being started and the transfer being run.

I think that would be helpful, considering this line is immediately followed by if not self.completed

Yes, it does rather give the appearance of making that check redundant - but it's not!
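A toy illustration of why re-deriving the flag before checking it is not redundant. Everything here (`TransferSketch`, its attributes, the fake `_download`) is hypothetical and only mirrors the general pattern described in the thread: a completion flag computed when the transfer starts can be stale by the time `run` is called.

```python
import os


class TransferSketch:
    """Hypothetical transfer whose completion state is derived from disk."""

    def __init__(self, dest, expected_size):
        self.dest = dest
        self.expected_size = expected_size
        self.completed = False
        self._set_completed()  # state at the time the transfer is started

    def _set_completed(self):
        # Recompute the flag from what is actually on disk right now.
        self.completed = (
            os.path.isfile(self.dest)
            and os.path.getsize(self.dest) == self.expected_size
        )

    def _download(self):
        # Stand-in for the real network transfer.
        with open(self.dest, "wb") as f:
            f.write(b"\0" * self.expected_size)
        self.completed = True

    def run(self):
        # Refresh first: the file may have been cleaned up since __init__,
        # in which case the old flag would wrongly skip the download.
        self._set_completed()
        if not self.completed:
            self._download()
        return self.completed
```

Without the refresh at the top of `run`, a file deleted after construction would leave `completed` set and the download would be skipped.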

rtibbles · 2025-09-11T22:00:35Z

Updated to address comments!

bjester

Code LGTM! Ready for QA testing

radinamatic

LGTM! 💯 🚀

Original PR #13698 by rtibbles Original: learningequality/kolibri#13698

Merged from original PR #13698 Original: learningequality/kolibri#13698

rtibbles added this to the Kolibri 0.18: Planned Patch 3 milestone Sep 2, 2025

github-actions bot added DEV: backend Python, databases, networking, filesystem... SIZE: medium labels Sep 2, 2025

rtibbles added 2 commits September 2, 2025 13:58

Fix bug that was preventing more than one download thread being used …

40096be

…ever.

Reverts complete transition to ChunkedFile usage for FileDownload class.

511d46f

Revert to simpler file transfer when not needed.

rtibbles force-pushed the streamline_download_skip_chunked_files branch from 87948db to 511d46f Compare September 2, 2025 23:20

rtibbles mentioned this pull request Sep 3, 2025

Fix bug that was preventing more than one download thread being used ever #13689

Closed

rtibbles assigned bjester Sep 9, 2025

bjester reviewed Sep 11, 2025

file_transfer cleanup based on review comments.

7942e93

github-actions bot added the SIZE: large label Sep 11, 2025

bjester approved these changes Sep 12, 2025

radinamatic approved these changes Sep 15, 2025

rtibbles merged commit 96566b5 into learningequality:release-v0.18.x Sep 15, 2025
51 checks passed

rtibbles deleted the streamline_download_skip_chunked_files branch September 15, 2025 19:44

snorkelopstesting1-a11y mentioned this pull request Oct 11, 2025

Streamline download skip chunked files snorkel-marlin-repos/learningequality_kolibri_pr_13698_9086b2d0-b7d6-4a93-af29-f1eddb0eadc0#1

Merged

snorkelopstesting1-a11y mentioned this pull request Oct 11, 2025

Streamline download skip chunked files snorkel-marlin-repos/learningequality_kolibri_pr_13698_fd39f0b4-ee6e-4edf-a350-f61d28487cef#1

Merged
