(Speed-up) In-place normalisation #324
Conversation
```python
channel_means = self.means_dict["nwp"][nwp_key]
channel_stds = self.stds_dict["nwp"][nwp_key]

da_nwp.values = (da_nwp.values - channel_means) / channel_stds
```
Is it correct to say that the in-place normalisation here means xarray's broadcasting and dimension alignment checks won't happen, so we have to be sure everything is in the correct order?

If so, I couldn't see a test around `config_normalization_values_to_dicts`, which is used above and seems to be the function that defines that order. Should we add a test there, or is there a test somewhere else that checks the ordering?
Since `channel_means` and `channel_stds` are just arrays, we were already skipping xarray's broadcasting and dimension alignment. I didn't look into everything this skips past inside xarray, but I think in the old version there is internal copying happening inside xarray's sub and mult methods. Now we're just relying on the fact that `channel_means`/`channel_stds` and `da_nwp.values`, which are numpy arrays, have compatible shapes for numpy broadcasting.
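A minimal sketch of the two styles being compared, using illustrative shapes (the dim names and sizes here are assumptions based on this thread, not the actual PR code):

```python
import numpy as np
import xarray as xr

# Toy NWP array with dims (time, channel, x, y), matching the order
# described below; sizes are made up for illustration.
da_nwp = xr.DataArray(
    np.arange(2 * 3 * 4 * 4, dtype=np.float64).reshape(2, 3, 4, 4),
    dims=("time", "channel", "x", "y"),
)
# Norm stats shaped (1, channels, 1, 1) so numpy broadcasting lines them
# up against the channel axis.
channel_means = np.array([1.0, 2.0, 3.0]).reshape(1, 3, 1, 1)
channel_stds = np.array([2.0, 4.0, 8.0]).reshape(1, 3, 1, 1)

# Old style: xarray arithmetic, which allocates intermediate DataArrays.
old = (da_nwp - channel_means) / channel_stds

# New style: compute in numpy and assign straight back into .values,
# skipping xarray's wrapping of the intermediate results.
da_nwp.values = (da_nwp.values - channel_means) / channel_stds

# Both routes give the same numbers.
assert np.allclose(old.values, da_nwp.values)
```

In both versions the stats are plain numpy arrays, so the alignment is purely positional numpy broadcasting either way; the new version just avoids going through xarray's operator machinery.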
So no, we don't have tests around `config_normalization_values_to_dicts` and the dimension order. Right now `channel_means`/`channel_stds` have shape [1, channels, 1, 1]. The sat data has shape [time, channels, x, y], and after time slicing the NWP has the same dimension order too.

So the alignment is there in that the shapes need to be compatible for numpy broadcasting. But yes, this could produce a silent error if we transposed the dimensions by accident and the norm values had the same number of channels as another dimension of the sat/NWP data. It will fail loudly if the number of channels doesn't match.

I think the risk is probably minimal, but I'm open to the idea that there could be a test. I'm just not sure what we'd be looking for in a test. Any thoughts?
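A sketch of the two failure modes discussed above, with illustrative shapes (all sizes here are made up to demonstrate the broadcasting behaviour):

```python
import numpy as np

channel_means = np.zeros((1, 3, 1, 1))  # shape (1, channels, 1, 1)

# Wrong channel count fails loudly: numpy refuses to broadcast
# (2, 5, 4, 4) against (1, 3, 1, 1).
data_bad = np.ones((2, 5, 4, 4))
try:
    _ = data_bad - channel_means
    raised = False
except ValueError:
    raised = True
assert raised

# But an accidental transpose is silent when another axis happens to
# match the channel count: with dims reordered to (time, x, y, channel)
# and x coincidentally of size 3, broadcasting succeeds and the stats
# are applied along x instead of channel, with no error raised.
data_transposed = np.ones((2, 3, 4, 3))
result = data_transposed - channel_means
assert result.shape == (2, 3, 4, 3)  # no error; wrong axis normalised
```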
Thanks for explaining. Hmm, maybe we could throw in a check rather than a test, something like this:

```python
if da_nwp.dims[1] != "channel":
    raise ValueError(f"Expected second dimension to be 'channel', but got '{da_nwp.dims[1]}'")
```

and a comment in the `config_normalization_values_to_dicts` function noting that the order is there to match how we transpose the dimensions.

I agree the chance is small, but it feels strange not to add some check for what could lead to a silent error. I don't feel super strongly about this though, so if you think it's overkill then we can skip it.
I'll have a look and see if there are any performance implications. We do a similar thing in the `diff_channels()` function, so that seems reasonable.

Although the performance gain is modest, the change itself is straightforward, so it might be worth it. Hope this doesn't feel like a contradiction to my comment on the other PR 😅 Also nice one pulling out the normalisation function, I think that makes it more readable and gives clearer separation of responsibilities between functions.
Pull Request
Description
This PR makes the normalisation in-place and hence bypasses some of the slow xarray functions.

This results in a modest speed-up of 2.7 ± 1.2 % (range 1.01-4.96 % across 7 trials) of the UK regional sampler. A nice side effect is that the normalisation is done only once for the UK concurrent sampler, rather than separately for each concurrent GSP in a single sample.

However, I wonder if this speed-up warrants the change.
Checklist: