-
Notifications
You must be signed in to change notification settings - Fork 158
Description
Bug report
Required Info:
- Operating System:
- Ubuntu 22.04 & Docker Hub ROS2 Humble images
- Installation type:
- binaries
- Version or commit hash:
- Humble
- DDS implementation:
- Fast-RTPS & Cyclone
- Client library (if applicable):
- rclcpp & rclpy
Steps to reproduce issue
Start a ROS2 Launch process which spawns at least one node, such as
ros2 launch robot_state_publisher rsp-launch-urdf-inline.xml --noninteractive
Send a SIGTERM to the ROS2 launch process with
kill -TERM $ros2_launch_PID
or
Use the official ROS2 OCI compliant containers from Docker Hub and run with Docker or Podman
podman run --rm --name ros_signal_test ros:humble ros2 launch robot_state_publisher rsp-launch-urdf-inline.xml
And then stop the container with
podman stop ros_signal_test
Expected behavior
ROS2 Launch should catch the SIGTERM and propagate that signal to each of its child nodes.
Actual behavior
SIGTERM is ignored in the first instance, and in the second containerized example SIGTERM is still ignored but all processes are SIGKILL'd after the podman stop
timeout.
Additional information
The ROS2 Launch design article says
"if someone attempts to SIGTERM the launch system, they probably did so out of impatience after sending SIGINT to the launch system, and therefore the launch system should attempt to exit quickly."
This is not an accurate assumption.
The Linux signal man page defines, according to the POSIX.1-1990 standard, "SIGINT" as an "Interrupt from keyboard" and "SIGTERM" as a "Termination signal".
The GNU manual entry on termination signals goes further, defining "SIGINT" as being "sent when the user types the INTR character (normally C-c)" and "SIGTERM" as "a generic signal used to cause program termination".
Neither of these sources make reference to any hierarchy or precedence of these two signals.
Users of the ROS2 Launch system are likely to first run into this issue when try to automate the running of their ROS2 nodes on embedded systems, in systemd services, podman/docker containers, or both!
By default systemd services send SIGTERMs to their processes on shutdown, followed by SIGKILLs after a timeout.
Similarly, Podman/Docker containers get sent a SIGTERM to PID1 when running, for example, podman stop ros_signal_test
.
The issue with containers cropped up here #473, but it was only resolved for SIGINTs.
An issue of SIGTERMs not being handled was mentioned here #545 (comment), but I've not been able to find any progress on it.
My current workaround for clean termination of an embedded ROS2 Launch system is to set KillMode=SIGINT
for systemd services, or run Podman/Docker containers with the --stop-signal=SIGINT
flag or set STOPSIGNAL SIGINT
in the Dockerfile.
This only works when ROS2 Launch either detects it's running in non-interactive mode or that is set explicitly with --noninteractive
#475.
However, it's oxymoronic to have to both be in "non-interactive" mode and to send a "keyboard interrupt" signal (SIGINT) for clean termination!
Feature request
Feature description
Have the ROS2 Launch system catch SIGTERMs and forward the same signal to their child nodes.
Implementation considerations
Windows??