Rotation trick diverges when stochastic_sample_codes=True #223

@mahdip72

@lucidrains Thanks for the brilliant repo. It’s been central to our new project: link

Recently, we have been hitting divergence during training whenever rotation_trick and stochastic_sample_codes=True are both enabled. Even with sample_codebook_temp=0.01, the problem is not fully solved. Disabling rotation_trick, however, restores stability. Is this combination expected to misbehave, or is there a workaround you’d recommend?

For reference, these are the parameters we are using:

dim: 64
decay: 0.99
codebook_size: 8196
commitment_weight: 0.25
orthogonal_reg_weight: 10
orthogonal_reg_max_codes: 512
orthogonal_reg_active_codes_only: True
rotation_trick: True
threshold_ema_dead_code: 3
kmeans_init: True
kmeans_iters: 10
stochastic_sample_codes: True
sample_codebook_temp: 0.1
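
To help isolate the failure, here is a minimal sketch that expands the parameters above into the four on/off combinations of rotation_trick and stochastic_sample_codes. The names `base_config` and `ablation_configs` are hypothetical helpers, not part of the repo, and the sketch assumes each dictionary would be passed as keyword arguments to the quantizer constructor (the actual call is not shown in this issue).

```python
from itertools import product

# Parameters exactly as listed above.
base_config = {
    "dim": 64,
    "decay": 0.99,
    "codebook_size": 8196,
    "commitment_weight": 0.25,
    "orthogonal_reg_weight": 10,
    "orthogonal_reg_max_codes": 512,
    "orthogonal_reg_active_codes_only": True,
    "rotation_trick": True,
    "threshold_ema_dead_code": 3,
    "kmeans_init": True,
    "kmeans_iters": 10,
    "stochastic_sample_codes": True,
    "sample_codebook_temp": 0.1,
}


def ablation_configs(base, flags=("rotation_trick", "stochastic_sample_codes")):
    """Yield one copy of `base` per True/False combination of `flags`.

    Hypothetical helper for illustration; `base` itself is never modified.
    """
    for values in product((True, False), repeat=len(flags)):
        cfg = dict(base)                # shallow copy so runs stay independent
        cfg.update(zip(flags, values))  # override only the flags under test
        yield cfg


configs = list(ablation_configs(base_config))
for cfg in configs:
    # Show only the two flags that differ between runs.
    print({k: cfg[k] for k in ("rotation_trick", "stochastic_sample_codes")})
```

Training once per generated config, with everything else held fixed, would show whether the divergence appears only when both flags are enabled.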
