Describe the bug
The SIGTERM sent when StopTrainingJob is called doesn't appear to be forwarded to the training subprocess.
To reproduce
Add a SIGTERM handler to a training script, start a training job, then click "Stop". The signal handler will not fire.
Expected behavior
The signal handler should fire when StopTrainingJob is called.
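For the expected behavior, whichever process launches the training script would need to relay SIGTERM to its child. The sketch below shows one general way to do that with `subprocess.Popen` and `Popen.send_signal`. `run_and_forward_sigterm` is a hypothetical helper for illustration, not the SageMaker Training Toolkit's actual launcher code.

```python
import signal
import subprocess
from typing import Sequence


def run_and_forward_sigterm(cmd: Sequence[str]) -> int:
    """Run cmd as a child process, relaying SIGTERM received by this process to it.

    Hypothetical helper for illustration; not part of sagemaker-training.
    Must be called from the main thread, since signal.signal() requires it.
    """
    proc = subprocess.Popen(list(cmd))

    def forward(signum, frame):
        # Relay the signal so the child's own SIGTERM handler can run.
        proc.send_signal(signum)

    previous = signal.signal(signal.SIGTERM, forward)
    try:
        # wait() is retried after the handler runs (PEP 475), so it keeps
        # blocking until the child actually exits and returns its exit code.
        return proc.wait()
    finally:
        # Restore whatever SIGTERM handler was installed before.
        signal.signal(
            signal.SIGTERM, previous if previous is not None else signal.SIG_DFL
        )
```

Python runs signal handlers only in the main thread, which is why the relay is installed there and the parent simply blocks in `wait()` until the child has handled the signal and exited.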
Screenshots or logs
If applicable, add screenshots or logs to help explain your problem.
System information
A description of your system.
- Include the version of SageMaker Training Toolkit you are using.
- If you are using a prebuilt Amazon SageMaker Docker image, provide the URL.
- If you are using a custom Docker image, provide:
  - framework name (e.g., PyTorch)
  - framework version
  - Python version
  - processing unit type (i.e., CPU or GPU)
Additional context
Add any other context about the problem here.