Releases: aws/sagemaker-training-toolkit
v4.3.0
Features
- Add torch_distributed support for Trainium instances in SageMaker
v4.2.10
v4.2.9
Bug Fixes and Other Changes
- Add SageMaker Debugger exceptions
v4.2.8
prepare release v4.2.8
v4.2.7
Bug Fixes and Other Changes
- Improve worker node wait logic and update EFA flags
v4.2.6
Bug Fixes and Other Changes
- Enable PyTorch XLA distributed training on homogeneous clusters
v4.2.5
Bug Fixes and Other Changes
- Relax exception type
v4.2.4
prepare release v4.2.4
v4.2.3
Bug Fixes and Other Changes
- Update num_processes_per_host for smdataparallel runner
v4.2.2
Bug Fixes and Other Changes
- Removed version hardcoding for sagemaker test dependency
- Update distribution_instance_group for PyTorch DDP
- Specify flake8 config explicitly