Pass SIGTERM to training script to stop training #125

@bstriner

Describe the bug
SIGTERM from StopTrainingJob doesn't appear to be passed to the training subprocess.

To reproduce
Add a SIGTERM handler to a training script, start a training job, then click "Stop". The signal handler will not fire.

Expected behavior
The signal handler in the training script should fire when StopTrainingJob is called.
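To illustrate the expected behavior, here is a rough sketch of a parent process that relays SIGTERM to a child it launched. This is not the toolkit's actual code; `run_and_forward_sigterm` is a hypothetical helper built only on the standard-library `signal` and `subprocess` modules, and it assumes the parent itself receives the SIGTERM.

```python
import signal
import subprocess
import sys


def run_and_forward_sigterm(cmd):
    """Run cmd as a child process and relay SIGTERM to it; return its exit code."""
    child = subprocess.Popen(cmd)

    def forward(signum, frame):
        # Relay the signal only if the child is still running.
        if child.poll() is None:
            child.send_signal(signum)

    previous = signal.signal(signal.SIGTERM, forward)
    try:
        return child.wait()
    finally:
        signal.signal(signal.SIGTERM, previous)  # restore the old handler
```

For example, `run_and_forward_sigterm([sys.executable, "train.py"])` would launch a script named `train.py` (a placeholder name) and pass along any SIGTERM the wrapper receives, so a handler installed in that script gets a chance to run.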
