@lucidrains Thanks for the brilliant repo. It’s been central to our new project: link
Recently, we have been hitting divergence during training whenever `rotation_trick` and `stochastic_sample_codes=True` are both enabled; even lowering `sample_codebook_temp` to 0.01 does not fully solve the problem. Disabling `rotation_trick`, however, restores stability. Is this combination expected to misbehave, or is there a workaround you'd recommend?
For reference, we are using these parameters:

```yaml
dim: 64
decay: 0.99
codebook_size: 8196
commitment_weight: 0.25
orthogonal_reg_weight: 10
orthogonal_reg_max_codes: 512
orthogonal_reg_active_codes_only: True
rotation_trick: True
threshold_ema_dead_code: 3
kmeans_init: True
kmeans_iters: 10
stochastic_sample_codes: True
sample_codebook_temp: 0.1
```
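For concreteness, here is a minimal sketch of how we instantiate the quantizer with the config above, assuming the `VectorQuantize` class from this repo and an illustrative input shape (batch, sequence, dim) that is not from our actual model:

```python
import torch
from vector_quantize_pytorch import VectorQuantize

# sketch only -- parameter values mirror the config listed above
vq = VectorQuantize(
    dim = 64,
    codebook_size = 8196,
    decay = 0.99,
    commitment_weight = 0.25,
    orthogonal_reg_weight = 10,
    orthogonal_reg_max_codes = 512,
    orthogonal_reg_active_codes_only = True,
    rotation_trick = True,            # disabling this restores stability
    threshold_ema_dead_code = 3,
    kmeans_init = True,
    kmeans_iters = 10,
    stochastic_sample_codes = True,   # divergence appears when combined with rotation_trick
    sample_codebook_temp = 0.1,
)

x = torch.randn(1, 1024, 64)          # hypothetical input, not our real data
quantized, indices, commit_loss = vq(x)
```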