LayerNorm after last skip connection #1157
base: mk/develop/fe_experiments
Thanks for sharing your code, Savvas. Let's please modify it slightly to minimize redundant lines. See my suggestions below.
```python
    norm_type=self.cf.norm_type,
    dim_aux=1,
    norm_eps=self.cf.mlp_norm_eps,
)
```
To reduce redundant code, can we give the MLP an additional argument `with_residual_layer_norm=False` instead of introducing `FEMLP`? When calling `MLP` here, we can set `with_residual_layer_norm=(i + 1) == self.cf.fe_num_blocks` so that the residual LayerNorm is added only in the last MLP.
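A minimal PyTorch sketch of this suggestion follows. The `MLP` class here is a hypothetical, simplified residual block (pre-norm, two linear layers, GELU), not the project's actual `MLP` signature; only the flag name `with_residual_layer_norm` and the `(i + 1) == fe_num_blocks` condition come from the comment above.

```python
import torch
import torch.nn as nn


class MLP(nn.Module):
    """Hypothetical, simplified residual MLP block (not the project's real signature)."""

    def __init__(self, dim: int, hidden_dim: int, norm_eps: float = 1e-5,
                 with_residual_layer_norm: bool = False):
        super().__init__()
        self.net = nn.Sequential(
            nn.LayerNorm(dim, eps=norm_eps),  # pre-norm inside the block
            nn.Linear(dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, dim),
        )
        # Parameter-free LayerNorm (no scale, no bias) applied after the skip
        # connection, only when the flag is set; otherwise a no-op.
        self.residual_norm = (
            nn.LayerNorm(dim, eps=norm_eps, elementwise_affine=False)
            if with_residual_layer_norm
            else nn.Identity()
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.residual_norm(x + self.net(x))


# Only the last block gets the residual LayerNorm, mirroring
# with_residual_layer_norm=(i + 1) == self.cf.fe_num_blocks.
fe_num_blocks = 4  # hypothetical value standing in for self.cf.fe_num_blocks
blocks = nn.ModuleList(
    [MLP(dim=32, hidden_dim=64, with_residual_layer_norm=(i + 1) == fe_num_blocks)
     for i in range(fe_num_blocks)]
)
```

With this design every block shares one class, and the extra normalization costs nothing in the other blocks because `nn.Identity` simply passes its input through.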
Why would we need to modify the MLP? In my opinion, it can and should be implemented in the forecast engine, where we can simply add the LayerNorm as the last block.
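For comparison, here is a minimal PyTorch sketch of this alternative, where the MLP stays unchanged and the forecast engine appends a parameter-free LayerNorm as its last block. `ResidualMLP` and `ForecastEngine` are hypothetical stand-ins, not the project's classes; the sketch assumes each existing block already applies its own skip connection and returns `x + f(x)`.

```python
import torch
import torch.nn as nn


class ResidualMLP(nn.Module):
    """Hypothetical stand-in for the existing, unmodified MLP block: returns x + f(x)."""

    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.LayerNorm(dim),
            nn.Linear(dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.net(x)  # skip connection lives inside the block


class ForecastEngine(nn.Module):
    """Hypothetical sketch: the final normalization lives in the engine, not in the MLP."""

    def __init__(self, dim: int, hidden_dim: int, num_blocks: int, norm_eps: float = 1e-5):
        super().__init__()
        self.blocks = nn.ModuleList(
            [ResidualMLP(dim, hidden_dim) for _ in range(num_blocks)]
        )
        # Append a LayerNorm with scale and bias disabled as the last "block";
        # it acts on the output of the last skip connection.
        self.blocks.append(nn.LayerNorm(dim, eps=norm_eps, elementwise_affine=False))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for block in self.blocks:
            x = block(x)
        return x
```

Because the appended LayerNorm sees exactly the output of the last skip connection, this computes the same function as putting the normalization inside the last MLP, while leaving the MLP's signature untouched.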
Can you check now? Is that better?
Description
This PR changes the MLP of the last block of the forecast engine: it adds a LayerNorm, with scale and bias disabled, after the last skip connection of that block.
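Written out, and assuming the block's input $\mathbf{x}$ has feature dimension $d$ and $\epsilon$ is the LayerNorm epsilon, the last block then computes:

$$
\mathbf{h} = \mathbf{x} + \mathrm{MLP}(\mathbf{x}), \qquad
\mathbf{y} = \frac{\mathbf{h} - \mu(\mathbf{h})}{\sqrt{\sigma^2(\mathbf{h}) + \epsilon}},
$$

$$
\mu(\mathbf{h}) = \frac{1}{d}\sum_{i=1}^{d} h_i, \qquad
\sigma^2(\mathbf{h}) = \frac{1}{d}\sum_{i=1}^{d}\bigl(h_i - \mu(\mathbf{h})\bigr)^2.
$$

A standard LayerNorm would return $\boldsymbol{\gamma} \odot \mathbf{y} + \boldsymbol{\beta}$ with learnable $\boldsymbol{\gamma}$ and $\boldsymbol{\beta}$. Turning off scale and bias amounts to fixing $\boldsymbol{\gamma} = \mathbf{1}$ and $\boldsymbol{\beta} = \mathbf{0}$, so each output vector has zero mean and (up to $\epsilon$) unit variance over the feature dimension.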
Issue Number
This is a draft PR
Checklist before asking for review
- `./scripts/actions.sh lint`
- `./scripts/actions.sh unit-test`
- `./scripts/actions.sh integration-test`
- `launch-slurm.py --time 60`