Skip to content

Conversation

@SavvasMel
Copy link
Contributor

Description

This PR changes the MLP of the last block of the forecast engine. Specifically, it includes a LayerNorm with scale and bias turned off after the last skip connection of the last block.

Issue Number

This is a draft PR

Checklist before asking for review

  • I have performed a self-review of my code
  • My changes comply with basic sanity checks:
    • I have fixed formatting issues with ./scripts/actions.sh lint
    • I have run unit tests with ./scripts/actions.sh unit-test
    • I have documented my code and I have updated the docstrings.
    • I have added unit tests, if relevant
  • I have tried my changes with data and code:
    • I have run the integration tests with ./scripts/actions.sh integration-test
    • (bigger changes) I have run a full training and I have written in the comment the run_id(s): launch-slurm.py --time 60
    • (bigger changes and experiments) I have shared a hegdedoc in the github issue with all the configurations and runs for this experiments
  • I have informed and aligned with people impacted by my change:
    • for config changes: the MatterMost channels and/or a design doc
    • for changes of dependencies: the MatterMost software development channel

@SavvasMel SavvasMel changed the title Fe experiments layer LayerNorm after last skip connection Oct 27, 2025
@grassesi grassesi deleted the branch ecmwf:mk/develop/fe_experiments October 27, 2025 14:18
@grassesi grassesi closed this Oct 27, 2025
@grassesi grassesi mentioned this pull request Oct 27, 2025
38 tasks
@grassesi grassesi reopened this Oct 27, 2025
Copy link
Contributor

@MatKbauer MatKbauer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for sharing your code, Savvas. Let's please modify it slightly to minimize redundant lines. See my suggestions below.

norm_type=self.cf.norm_type,
dim_aux=1,
norm_eps=self.cf.mlp_norm_eps,
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the sake of having less redundant code, can we modify the MLP to receive an additional argument with_residual_layer_norm=False instead of introducing FEMLP? When calling MLP here, we can set with_residual_layer_norm=(i + 1) == self.cf.fe_num_blocks to add the residual layer norm in the last MLP layer.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would we need to modify the MLP? It can--and in my opinion definitely should--be implemented in the forecast engine. Where we can just have the LayerNorm as the last block.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you check now? Is that better?

@github-project-automation github-project-automation bot moved this from Done to In Progress in WeatherGen-dev Oct 28, 2025
@SavvasMel SavvasMel requested a review from MatKbauer October 28, 2025 18:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: In Progress

Development

Successfully merging this pull request may close these issues.

4 participants