adding variable length attention to llama3 8b #2000
Conversation
    self._sample_idx = 0
    self._token_buffer: list[int] = []
    self._boundary_buffer: list[int] = [0]
Can you add a comment on `boundary_buffer` explaining why it's needed?
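For illustration, a hedged sketch of the kind of comment being asked for. The class name and surrounding structure are stand-ins; only the three buffer attributes come from the diff above, and the rationale in the comments is an assumption about the varlen packing logic, not text from this PR.

```python
# Sketch only: `PackedTextDataset` is a stand-in name, not the PR's class.
class PackedTextDataset:
    def __init__(self, use_varlen_attn: bool = False):
        self.use_varlen_attn = use_varlen_attn
        self._sample_idx = 0
        # Tokens accumulated across samples until a full training window
        # can be emitted.
        self._token_buffer: list[int] = []
        # Cumulative token offsets marking where each packed sample begins.
        # Seeded with [0] so consecutive entries (start, end) always bracket
        # one sample; the varlen path needs these boundaries to derive
        # cu_seqlens and keep attention from crossing document boundaries.
        self._boundary_buffer: list[int] = [0]
```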
| yield {"input": input}, label | ||
|
|
||
| if self.use_varlen_attn: | ||
| boundaries_in_window = [ |
Also, maybe factor this into a function that gets called here, so you can document it better? A rough sketch of that refactor follows.
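A hedged sketch of the suggested helper. The function name, signature, and re-basing behavior are assumptions used for illustration, not code from this PR.

```python
# Illustrative sketch of the suggested refactor -- the helper name and
# signature are assumptions, not the PR's actual implementation.
def _boundaries_in_window(
    boundary_buffer: list[int], window_start: int, window_end: int
) -> list[int]:
    """Return the sample boundaries that fall inside [window_start, window_end),
    re-based so they are relative to the start of the window."""
    return [
        b - window_start
        for b in boundary_buffer
        if window_start <= b < window_end
    ]
```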
fegin left a comment
This implementation won't work with PP and is too intrusive to the model code. The packing logic should be hidden inside the inner attention.
    return path, config.loader, config.sample_processor


def varlen_collate_fn(batch):
This should not be done in the dataloader. If you always pack the input batch down to batch size 1, then pipeline parallelism won't work. You should instead perform the packing inside the inner attention, using the mask (namedtuple) data (see below) to pack to what you need.
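To make the suggestion concrete, a rough sketch of packing inside the inner attention rather than in the collate function. The shapes, the helper name, and the assumption that the per-document lengths sum to the flattened window are all illustrative, not code from this PR.

```python
import torch


def pack_for_varlen(q, k, v, seq_lens):
    """Flatten [batch, heads, seq, dim] q/k/v into packed varlen form.

    Returns packed (total_tokens, heads, dim) tensors plus the cu_seqlens and
    max_seqlen metadata a varlen kernel (such as the PR's compiled
    `varlen_attn`) would consume. Assumes sum(seq_lens) == batch * seq.
    """
    bsz, n_heads, seqlen, head_dim = q.shape

    def flatten(t):
        # [batch, heads, seq, dim] -> [batch * seq, heads, dim]
        return t.transpose(1, 2).reshape(bsz * seqlen, n_heads, head_dim)

    # Cumulative document boundaries over the flattened token dimension,
    # mirroring the cu_seq_q / cu_seq_k entries in the collate output below.
    cu_seqlens = torch.zeros(len(seq_lens) + 1, dtype=torch.int32, device=q.device)
    cu_seqlens[1:] = torch.cumsum(
        torch.as_tensor(seq_lens, dtype=torch.int32, device=q.device), dim=0
    )
    max_seqlen = int(max(seq_lens))
    return flatten(q), flatten(k), flatten(v), cu_seqlens, max_seqlen
```

Doing this inside attention keeps the batch dimension intact for the rest of the model, which is what lets pipeline parallelism keep working.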
    return {
        "input": packed_input,
        "cu_seq_q": packed_cu_seqlens,
        "cu_seq_k": packed_cu_seqlens,
        "max_q": max_seqlen,
        "max_k": max_seqlen,
    }, packed_label
You should follow how we create BlockMask, by letting the model provide the attention mask. You can extend AttentionMasksType and https://github.com/pytorch/torchtitan/blob/main/torchtitan/protocols/model.py#L64. You can use a namedtuple for this.
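A hedged sketch of that direction. The NamedTuple fields mirror the keys in the collate output above, but the class name and the exact way AttentionMasksType would be extended are assumptions, not torchtitan's actual definitions.

```python
from typing import NamedTuple

import torch


class VarlenMetadata(NamedTuple):
    """Per-batch varlen metadata produced by the model, alongside BlockMask."""
    cu_seq_q: torch.Tensor  # [num_seqs + 1] cumulative query sequence starts
    cu_seq_k: torch.Tensor  # [num_seqs + 1] cumulative key sequence starts
    max_q: int              # longest query segment in the packed batch
    max_k: int              # longest key segment in the packed batch


# AttentionMasksType might then be widened to accept this, e.g.:
# AttentionMasksType = BlockMask | dict[str, BlockMask] | VarlenMetadata
```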
Summary
This PR adds variable length attention (varlen) support to the Llama 3 8B model in torchtitan. We add a flag `use_varlen_attn` to the model config; if it is set to True, the attention module calls a compiled `varlen_attn` defined here.
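A minimal sketch of the flag wiring described above, assuming a simplified config dataclass and dispatch function; everything except the `use_varlen_attn` flag itself is an illustrative stand-in rather than torchtitan's actual code.

```python
from dataclasses import dataclass

import torch.nn.functional as F


@dataclass
class AttentionConfig:
    use_varlen_attn: bool = False  # the flag this PR adds to the model config


def dispatch_attention(cfg, q, k, v, varlen_kernel=None, varlen_metadata=None):
    """Route to the compiled varlen kernel when the flag is on, else dense SDPA."""
    if cfg.use_varlen_attn and varlen_kernel is not None:
        # In the PR this would be the compiled `varlen_attn`; it is passed in
        # as a callable here so the sketch stays self-contained.
        return varlen_kernel(q, k, v, varlen_metadata)
    # Default dense path (flex attention / SDPA in the real model).
    return F.scaled_dot_product_attention(q, k, v, is_causal=True)
```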
Testing
Ran loss and performance tests against flex attention. Loss is on par. Varlen is slightly slower than Flex due to CUDA kernel speeds (varlen calls into `flash_attention_forward`/`flash_attention_backward` today).