Releases: aws/sagemaker-training-toolkit

v4.3.0

20 Oct 18:48

Features

  • Add torch_distributed support for Trainium instances in SageMaker

v4.2.10

17 Oct 16:31

Bug Fixes and Other Changes

  • feature: Add neuron cores support (#21)

v4.2.9

26 Sep 16:33

Bug Fixes and Other Changes

  • Add SageMaker Debugger exceptions

v4.2.8

12 Sep 20:19

prepare release v4.2.8

v4.2.7

10 Sep 00:18

Bug Fixes and Other Changes

  • Improve worker node wait logic and update EFA flags

v4.2.6

18 Aug 15:17

Bug Fixes and Other Changes

  • Enable PyTorch XLA (PT XLA) distributed training on homogeneous clusters

v4.2.5

17 Aug 16:28

Bug Fixes and Other Changes

  • Relax exception type

v4.2.4

15 Aug 22:48

prepare release v4.2.4

v4.2.3

11 Aug 23:49

Bug Fixes and Other Changes

  • Update num_processes_per_host for smdataparallel runner

v4.2.2

10 Aug 20:30

Bug Fixes and Other Changes

  • Remove version hardcoding for sagemaker test dependency
  • Update distribution_instance_group for PyTorch DDP
  • Specify flake8 config explicitly