Hi @iN1k1, this is an issue with Wandb itself. See the Wandb docs to learn how to use wandb on multiple GPUs. You can pass the group parameter when you initialize wandb to define a shared experiment and group the logged values together in the W&B App UI, like: vis_backends = [dict(type='WandbVisBackend', init_kwargs=dict(group='xxx'))]
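For context, a minimal sketch of how that grouping setting would sit in an MMEngine-style config (the backend class name comes from the thread; 'my_experiment' is a placeholder group name, and the exact visualizer wiring may differ between versions):

```python
# Group all per-process wandb runs under one shared experiment name so
# the W&B App UI shows them together instead of as unrelated runs.
# 'my_experiment' is a placeholder; replace it with your own group name.
vis_backends = [
    dict(
        type='WandbVisBackend',
        init_kwargs=dict(group='my_experiment'),
    )
]

# The visualizer picks up the backends; 'Visualizer' here is assumed to
# be the default MMEngine visualizer type for your setup.
visualizer = dict(type='Visualizer', vis_backends=vis_backends)
```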
Hi all,
I am running a training job with the dist_train.sh script and logging the training details to wandb with the help of the LoggerHook and WandbGenVisBackend. Everything is fine during training: all the params and losses are correctly logged in wandb under a single run.

However, when the MultiValLoop is executed, wandb is initialized multiple times (once for each process with rank > 0). Each of these extra processes logs nothing; the validation results are saved only under the wandb run for rank = 0. So the problem is that there are multiple wandb inits that are useless and just add noise to the wandb UI. Is there any way to avoid multiple wandb initializations during the validation loop?
Thanks
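The behavior described above boils down to guarding logger initialization by process rank. A hypothetical, self-contained sketch of that guard (init_logger_on_rank0 is an illustrative helper, not MMEngine or wandb API; init_fn stands in for wandb.init, and RANK is the environment variable that launchers like torchrun set per process):

```python
import os


def init_logger_on_rank0(init_fn, **init_kwargs):
    """Call a logger's init function only in the rank-0 process.

    Hypothetical helper for illustration: `init_fn` stands in for
    wandb.init, and the rank is read from the RANK environment variable
    that distributed launchers set for each spawned process.
    """
    rank = int(os.environ.get("RANK", 0))
    if rank == 0:
        # Only the main process creates a real run.
        return init_fn(**init_kwargs)
    # Non-zero ranks skip initialization entirely, so no empty runs
    # appear in the UI.
    return None
```

With a guard like this in the validation loop, the rank > 0 processes would never create the empty runs described above; only rank 0 would own the wandb run.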