@yolannel yolannel commented Sep 5, 2025

First draft proposed for the GSoC 2025 blog post for Neural (De)compression for High Energy Physics under the ATLAS project proposal.

netlify bot commented Sep 5, 2025

Deploy Preview for earnest-hotteok-b1e1bf ready!

| Name | Link |
|---|---|
| 🔨 Latest commit | 7243389 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/earnest-hotteok-b1e1bf/deploys/68e7c7cf6d881500080e86d9 |
| 😎 Deploy Preview | https://deploy-preview-1755--earnest-hotteok-b1e1bf.netlify.app |

yolannel commented Sep 5, 2025

@maszyman Hi Maciej - I think as I'm external to this repo, I can't directly request a reviewer. Pinging you here so that you can add yourself as a reviewer to this draft PR and provide any feedback prior to marking this ready for review!

@maszyman maszyman self-assigned this Sep 5, 2025
@maszyman maszyman self-requested a review September 5, 2025 09:20
@maszyman maszyman added the GSoC Related to Google Summer of Code activity label Sep 5, 2025

@maszyman maszyman left a comment

Thanks a lot Yolanne for this nice report (and your contribution in general)!

Please have a look at my comments inline.


## Introduction

High-energy physics experiments such as the ATLAS experiment at CERN generate immense volumes of data. This project explores the feasibility of “precision upsampling”: using deep generative models to reconstruct high-precision floating-point data from aggressively compressed representations. I had the opportunity to work on this topic with the support and supervision of Maciej Szymański and Peter Van Gemmeren with the ATLAS Software & Computing group.

It would be nice if you could mention Argonne National Laboratory here as well :-)


Another approach under development is to treat the reconstruction as an inpainting problem. Inpainting is common in image generation, where part of an image is blacked out and the model is designed to "fill in the blanks". In our case, we have not only the new theoretical bounds but also the first $23-n$ bits of data that are retained after truncation. This is valuable information which, in statistical tests, is often already a 'good-enough' approximation of the uncompressed data. The challenge is then only to "fill in" the $n$ truncated bits. This is an even more tightly bounded problem, and constraining every correction term to the allowable $n$ bits of change would minimize unexpected upsampling artifacts.
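
To make the bounded "fill in" step concrete, here is a minimal sketch using only the Python standard library. It assumes the values are IEEE 754 single-precision floats (23 explicit mantissa bits) and that truncation simply zeroes the $n$ least significant mantissa bits. The function names (`truncate`, `fill_in`, and the bit helpers) are hypothetical and are not taken from the project's repository; the "model prediction" is represented by a plain integer.

```python
import struct

MANTISSA_BITS = 23  # explicit mantissa bits in an IEEE 754 float32 (assumed format)


def float_to_bits(x: float) -> int:
    """Return the 32-bit pattern of x after rounding it to float32."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float(b: int) -> float:
    """Interpret a 32-bit pattern as a float32 value."""
    return struct.unpack("<f", struct.pack("<I", b))[0]


def truncate(x: float, n: int) -> float:
    """Zero the n least significant mantissa bits, keeping the first 23 - n."""
    mask = 0xFFFFFFFF ^ ((1 << n) - 1)
    return bits_to_float(float_to_bits(x) & mask)


def fill_in(truncated: float, correction: int, n: int) -> float:
    """Write a predicted n-bit correction into the zeroed low bits.

    Clamping the correction to [0, 2**n - 1] guarantees that the sign,
    the exponent and the retained 23 - n mantissa bits are never altered.
    """
    correction = max(0, min(correction, (1 << n) - 1))
    return bits_to_float(float_to_bits(truncated) | correction)


# Usage: a perfect correction recovers the original float32 value exactly.
n = 10
x = bits_to_float(float_to_bits(3.14159))        # round to float32 first
t = truncate(x, n)
true_low_bits = float_to_bits(x) & ((1 << n) - 1)
assert fill_in(t, true_low_bits, n) == x
```

Because the correction can only touch the low $n$ bits, even a poor prediction leaves the result within the precision window that truncation already discarded.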

While this project has not yet conclusively identified a candidate model for precision upsampling, work is ongoing and will continue beyond the GSoC timeline, with the goal of proposing a working pipeline that builds on the results so far. In short, autoencoders, variational autoencoders, and some simple flow matching models have been implemented and tested, with performance measured using a simple MSE loss as well as distribution-based metrics such as KL divergence. The pipeline was being tested in Jupyter notebooks, but I have begun to move them to modular Python files to facilitate further work.
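
As an illustration of the two kinds of metric mentioned above, the following sketch computes a pointwise MSE and a histogram-based KL divergence with the Python standard library. It is not the project's evaluation code: the helper names, the choice of 50 equal-width bins, and the Gaussian toy data standing in for model output are all assumptions made for the example.

```python
import math
import random


def mse(original, reconstructed):
    """Mean squared error between paired values."""
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)


def histogram(values, lo, hi, bins):
    """Normalised histogram over [lo, hi] with equal-width bins (requires hi > lo)."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # v == hi goes in the last bin
        counts[idx] += 1
    total = len(values)
    return [c / total for c in counts]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions; eps avoids log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


# Usage with toy data: small added noise stands in for a model's reconstruction.
random.seed(0)
original = [random.gauss(0.0, 1.0) for _ in range(10_000)]
reconstructed = [x + random.gauss(0.0, 0.01) for x in original]

lo = min(original + reconstructed)
hi = max(original + reconstructed)
p = histogram(original, lo, hi, bins=50)
q = histogram(reconstructed, lo, hi, bins=50)
print("MSE:", mse(original, reconstructed))
print("KL :", kl_divergence(p, q))
```

MSE penalises per-value errors, while the KL divergence compares the overall shape of the two distributions, so a model can score well on one and poorly on the other.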

I would still add the plots showing the results for the models you implemented, even if they are not as good as one could hope for.

I think it's important to clearly show what has been achieved, try explaining why, and propose the next steps (which you did in the previous paragraph basically).

> but I have begun to move them to modular python files to facilitate further work.

I would drop that. Instead, you could say a few words about your repo: what's there, how to use it, etc.

maszyman commented Oct 7, 2025

Hi @yolannel

Is it ready for the 2nd review?

Cheers,
Maciej

yolannel commented Oct 7, 2025 via email

@maszyman maszyman left a comment

Looks good!

Please have a look at a couple of remaining small items.

yolannel commented Oct 9, 2025

@maszyman apologies for the small errors, I hadn't been too careful and should have caught those myself. I removed the latex and also uploaded the headshot; please see the modifications and thanks for the second eye!

@maszyman maszyman left a comment

Many thanks @yolannel !

I think this is ready to go, could you please undraft?

@yolannel yolannel marked this pull request as ready for review October 9, 2025 23:18