
Describe the bug
The step value used to log self.summary to TensorBoard while training SAC resets to 0 whenever self.learn(reset_num_timesteps=False) is called again. As a result, some TensorBoard curves become disconnected, with logging starting over at step=0.
Additional context
This can be resolved by logging to TensorBoard with self.num_timesteps instead of the step argument passed into self._train_step.
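The toy program below is a minimal, self-contained sketch of the problem and the proposed fix. ToyAgent and RecordingWriter are hypothetical stand-ins, not stable-baselines classes. They only mimic the control flow described above: a loop counter (step) that restarts on every learn() call, versus a counter (num_timesteps) that persists across calls when reset_num_timesteps=False.

```python
class RecordingWriter:
    """Hypothetical stand-in for a TensorBoard writer: records the global step of each summary."""

    def __init__(self):
        self.steps = []

    def add_summary(self, summary, global_step):
        self.steps.append(global_step)


class ToyAgent:
    """Hypothetical toy agent that mimics the relevant control flow, not the real SAC class."""

    def __init__(self, writer, use_num_timesteps):
        self.writer = writer
        self.num_timesteps = 0
        self.use_num_timesteps = use_num_timesteps

    def _train_step(self, step):
        summary = None  # placeholder for the real TensorBoard summary
        # Buggy behaviour logs the local loop counter; the fix logs the persistent counter.
        global_step = self.num_timesteps if self.use_num_timesteps else step
        self.writer.add_summary(summary, global_step)

    def learn(self, total_timesteps, reset_num_timesteps=True):
        if reset_num_timesteps:
            self.num_timesteps = 0
        for step in range(total_timesteps):  # local counter restarts at 0 on every call
            self.num_timesteps += 1  # persistent counter keeps growing across calls
            self._train_step(step)
        return self


# Two consecutive learn() calls, the second one continuing training.
buggy = ToyAgent(RecordingWriter(), use_num_timesteps=False)
buggy.learn(3)
buggy.learn(3, reset_num_timesteps=False)
print(buggy.writer.steps)  # [0, 1, 2, 0, 1, 2]: logging jumps back to 0

fixed = ToyAgent(RecordingWriter(), use_num_timesteps=True)
fixed.learn(3)
fixed.learn(3, reset_num_timesteps=False)
print(fixed.writer.steps)  # [1, 2, 3, 4, 5, 6]: one continuous curve
```

With the persistent counter, the second run's summaries continue where the first run's stopped, so TensorBoard draws a single connected curve.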