
Conversation

Contributor

@fegin fegin commented Nov 4, 2025

Stack from ghstack (oldest at bottom):

We are adding more actions to convert the raw inputs and labels.

  1. The new CP can do the input/label/BlockMask sharding in this method.
  2. The experimental full dtensor model can simply override this method without changing much Trainer code.

This method is extracted from #1857

Making this a standalone PR lets us continue the two projects above without one blocking the other.

[ghstack-poisoned]
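The extracted hook can be pictured with a minimal sketch. This is an illustration of the override pattern the description refers to, not torchtitan's actual API: the class names, signatures, and the `CPTrainer` subclass are assumptions.

```python
# Hypothetical sketch of an overridable post-dataloading hook.
# Names and signatures are illustrative, not torchtitan's real API.
from typing import Any, Dict, Optional, Tuple


class Trainer:
    def post_dataloading_processing(
        self,
        inputs: Any,
        label: Any,
        extra_inputs: Optional[Dict[str, Any]] = None,
        extra_kwargs: Optional[Dict[str, Any]] = None,
    ) -> Tuple[Any, Any, Dict[str, Any], Dict[str, Any]]:
        """Convert raw dataloader outputs into model-ready tensors.

        Subclasses (e.g. a context-parallel trainer sharding
        inputs/label/BlockMask, or a full-DTensor trainer) can
        override this without touching the rest of the Trainer.
        """
        return inputs, label, extra_inputs or {}, extra_kwargs or {}

    def train_step(self, batch):
        inputs, label = batch
        # The hook runs right before training, after the dataloader.
        inputs, label, extra_inputs, extra_kwargs = (
            self.post_dataloading_processing(inputs, label)
        )
        return inputs, label, extra_inputs, extra_kwargs


class CPTrainer(Trainer):
    def post_dataloading_processing(
        self, inputs, label, extra_inputs=None, extra_kwargs=None
    ):
        # A context-parallel variant would shard inputs/label here
        # before delegating to the base behavior.
        return super().post_dataloading_processing(
            inputs, label, extra_inputs, extra_kwargs
        )
```

Because the hook owns the whole raw-batch-to-model-inputs conversion, both projects mentioned above can customize it independently of the shared training loop.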
fegin added a commit that referenced this pull request Nov 4, 2025


ghstack-source-id: d1882a7
Pull-Request: #1985
@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Nov 4, 2025
extra_inputs=extra_inputs,
)

return inputs, label, extra_inputs, extra_kwargs
Contributor
I think we should add a docstring for the return values, especially on the difference between extra_inputs and extra_kwargs.

Also not sure if we should just merge inputs and extra_inputs. Not urgent though.

model_parts = self.model_parts
parallel_dims = self.parallel_dims

def post_dataloading_processing(
Contributor

This name is accurate about where it should be called, but we are not putting it right after data loading. Rather, we are putting it before training, which makes sense: when another library depends on torchtitan training but not torchtitan data loading, this is the right place for it.

I just wonder if we could find another name that expresses that it happens right before (but mostly as part of) training, e.g. a bad and verbose version would be pre-actual-training-last-minute-data-preparation.

Contributor Author

How about `pre_training_data_processing` or `pre_training_data_preparation`, or if the "last" is really an important message, then `final_data_preparation`?

Contributor

I was trying to avoid the term "pre-training" which could cause confusion.

I think we can go with post_dataloading_process, which seems unambiguous.

Suggested change

- def post_dataloading_processing(
+ def post_dataloading_process(

