Abstract class for target/aux computation #1184
base: develop
Conversation
Implemented Identity class. TODO: implement EMATeacher.
The big question on the EMA teacher side, to me, is how to allow for a flexible teacher and student architecture that can differ. We updated some APIs of the abstract base class to allow the ema_model forward; this is subject to change given the loss calculator, which is imho the second big question mark.
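For orientation, a minimal sketch of what such an abstract base class interface could look like. Only the name TargetAndAuxModuleBase and the constructor signatures of its subclasses come from the diff; the method names and their split here are assumptions, not the PR's actual API.

```python
from abc import ABC, abstractmethod


class TargetAndAuxModuleBase(ABC):
    """Hypothetical interface: subclasses produce the targets (and any auxiliary
    outputs, e.g. teacher predictions) that the loss calculator consumes."""

    def __init__(self, model, rng, **kwargs):
        self.model = model  # the student model being trained
        self.rng = rng

    @abstractmethod
    def compute(self, batch):
        """Return (targets, aux) for this batch."""

    def update(self, istep):
        """Optional hook after each optimizer step (e.g. EMA update of a teacher)."""
```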
    class EMATeacher(TargetAndAuxModuleBase):
        def __init__(self, model, rng, ema_model, batch_size, **kwargs):
            # One of the issues is that the teacher model may have a different architecture
Do you mean that e.g. in JEPA the student has the predictor too?
Yeah, in JEPA the student is Predictor(Encoder(x')) whereas the teacher is just Encoder(x); there is also a difference in BYOL, for instance.
Cool. Is there a useful abstraction we could stick with that would be helpful -- an always-EMA'ed encoder, for example? An EMATeacherEncoder that is always the same, to which we then add e.g. a predictor? This might not help, and I don't know if it holds for BYOL; just thinking.
I agree. The predictor could be the identity if it's not present.
We will need different "heads" for different latent student-teacher losses; the predictor would be just one of them.
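A hedged sketch of the abstraction discussed in this thread (not code from the PR): keep the EMA'ed encoder as the teacher, and let the method-specific student head (JEPA predictor, BYOL predictor, ...) default to the identity when a method does not need one.

```python
import torch.nn as nn


class StudentWithHead(nn.Module):
    """Student = head(encoder(x)); the head is nn.Identity when a method has none."""

    def __init__(self, encoder: nn.Module, head: nn.Module | None = None):
        super().__init__()
        self.encoder = encoder
        self.head = head if head is not None else nn.Identity()

    def forward(self, x):
        # JEPA: head is the predictor; plain distillation: head is the identity.
        return self.head(self.encoder(x))
```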
Easier to read, and as batch size handling gets more complicated in SSL this will be a useful abstraction.
It runs so far. Next steps:
- Route all the config options
- Start writing the loss functions to understand the state requirements
Looks very nice overall already, but some minor structural changes would be good; see the detailed comments.
        return preds_tokens


    def get_model(student_or_teacher, cf: Config, sources_size, targets_num_channels, targets_coords_size, **kwargs):
instantiate_model() is a more natural name for me
And I don't think it should go into model.py. If we have this function, then it seems more natural that it is also the one responsible for deciding which model to instantiate.
it felt unnecessary to create another file for it
      maybe_sharded_sd = self.original_model.state_dict()
      # this copies correctly tested in pdb
    - mkeys, ukeys = self.ema_model.load_state_dict(maybe_sharded_sd, strict=True, assign=False)
    + mkeys, ukeys = self.ema_model.load_state_dict(maybe_sharded_sd, strict=False, assign=False)
Why is this changed?
Because the teacher arch != the student arch, the load cannot be strict.
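A small self-contained illustration of the point (toy modules, not the PR's models): when the architectures differ, strict=True raises on any key mismatch, while strict=False returns the missing/unexpected keys so they can be inspected.

```python
import torch.nn as nn

student = nn.Sequential(nn.Linear(4, 4))                   # e.g. encoder only
teacher = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 4))  # e.g. encoder + extra head

# strict=True would raise a RuntimeError because the key sets differ.
missing, unexpected = teacher.load_state_dict(student.state_dict(), strict=False)
print(missing)     # keys the teacher has but the student state dict lacks ('1.weight', '1.bias')
print(unexpected)  # keys in the state dict that the teacher does not have (empty here)
```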
    if student_or_teacher == "student" or student_or_teacher == "teacher":
        return Model(cf, sources_size, targets_num_channels, targets_coords_size).create()
    else:
        if cf["training_mode"] == "masking":  # TODO implement mode "student-teacher-pretrain"
This should be a nested dict. But we should write an example config to see how it looks and feels and how it works.
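Purely as a discussion aid, one possible shape for such a nested config, written here as a Python dict; every key except training_mode (which appears in the diff) is hypothetical.

```python
config = {
    "training_mode": "masking",  # or "student-teacher-pretrain" once implemented
    "student_teacher": {
        "teacher": {"ema_halflife": 1000},  # hypothetical EMA settings
        "student": {"head": "predictor"},   # hypothetical head choice (identity, predictor, ...)
    },
}
```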
    class IdentityTargetAndAux(TargetAndAuxModuleBase):
        def __init__(self, model, rng, config):
Could we have brief documentation for this class?
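One possible wording for that documentation, reusing the class and constructor from the diff (the docstring text is a suggestion, not text from the PR):

```python
class IdentityTargetAndAux(TargetAndAuxModuleBase):
    """Pass-through target/aux module for plain masked reconstruction: targets are
    taken directly from the batch (physical space) and no auxiliary model such as
    an EMA teacher is involved."""

    def __init__(self, model, rng, config):
        ...
```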
      loss_values = self.loss_calculator.compute_loss(
          preds=preds,
    -     streams_data=batch[0],
    +     streams_data=batch[0],  # should additionally take targets?
Yes, this should take targets. We should have a TargetAndAuxCalculatorIdentity class that takes the batch and returns just the physical-space targets. (No strong feelings on whether we call it TargetAndAuxCalculatorIdentity or TargetAndAuxCalculatorPhysical or something similar.)
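A hedged sketch of the suggested class; the name comes from the comment above, while the method name and batch layout are assumptions.

```python
class TargetAndAuxCalculatorIdentity:
    """Returns the physical-space targets from the batch unchanged; no aux outputs."""

    def __init__(self, model, rng, config):
        self.model = model
        self.rng = rng
        self.config = config

    def compute(self, batch):
        streams_data, targets = batch[0], batch[1]  # assumed batch layout
        return targets, None  # no auxiliary state (e.g. no teacher predictions)
```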
      self.ema_model.update(
    -     self.cf.istep * self.world_size_original * self.cf.batch_size_per_gpu,
    -     self.world_size_original * self.cf.batch_size_per_gpu,
    +     self.cf.istep * get_batch_size(self.cf, self.world_size_original),
We need to abstract this into a function in utils/distributed.py
This change introduces exactly that abstraction; I'm not sure I understand.
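The helper implied by the diff above could look roughly like this in utils/distributed.py; the body follows the expression it replaces, while the docstring is an assumption.

```python
def get_batch_size(cf, world_size: int) -> int:
    """Global batch size across all ranks: per-GPU batch size times world size."""
    return world_size * cf.batch_size_per_gpu
```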
    # should be moved to its own file so as to prevent cyclical imports
    def get_target_and_aux_calculator(config, model, rng, batch_size, **kwargs):
This should go to the same file as instantiate_model.py.
Sure, how strongly are you married to instantiate_model?
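For reference, a hedged sketch of how this factory could dispatch, assuming the classes defined in this module; only the signature, the training_mode key, and the class names appear in the PR, the dispatch logic itself is an assumption.

```python
def get_target_and_aux_calculator(config, model, rng, batch_size, **kwargs):
    mode = config["training_mode"]
    if mode == "masking":
        return IdentityTargetAndAux(model, rng, config)
    if mode == "student-teacher-pretrain":  # still a TODO in the PR
        return EMATeacher(model, rng, kwargs.pop("ema_model"), batch_size, **kwargs)
    raise ValueError(f"Unknown training_mode: {mode}")
```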
Description
Implemented Identity class. TODO: implement EMATeacher.
Issue Number
Closes #1179
Is this PR a draft? Mark it as draft.
Checklist before asking for review
- ./scripts/actions.sh lint
- ./scripts/actions.sh unit-test
- ./scripts/actions.sh integration-test
- launch-slurm.py --time 60