Skip to content

Conversation

LinJin23
Copy link

@LinJin23 LinJin23 commented Jul 31, 2025

Description

add exception handling in get_chassis()

Motivation and Context

Improve exception handling during chassis loading to prevent thermalctld crashes (e.g., ValueError: invalid literal for int() with base 16: 'read error').

How Has This Been Tested?

Unit tests were extended to test the chassis exception handling functionality:

Added test for chassis initialization failure:

  • TestThermalControlDaemon.test_get_chassis_exception()

Added tests for None chassis handling:

  • TestFanUpdater.test_update_with_none_chassis()
  • TestTemperatureUpdater.test_init_with_none_chassis()
  • TestTemperatureUpdater.test_update_with_none_chassis()

Verified:
Daemon continues running and logs appropriate warnings when chassis initialization fails.

Additionally, tested on a physical platform to verify that when an exception occurs in get_chassis, the thermalctld daemon doesn’t crash.

Additional Information (Optional)

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@LinJin23 LinJin23 changed the title add exception in get chassis add exception handling in get_chassis Aug 4, 2025
Copy link

Commenter does not have sufficient privileges for PR 652 in repo sonic-net/sonic-platform-daemons

@LinJin23
Copy link
Author

/Azpw run

@mssonicbld
Copy link
Collaborator

/AzurePipelines run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@LinJin23 LinJin23 requested a review from vvolam August 13, 2025 01:08
try:
self.chassis = sonic_platform.platform.Platform().get_chassis()
except Exception as e:
self.log_warning("Failed to get chassis due to {}".format(repr(e)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use log_error instead here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated


self.chassis = sonic_platform.platform.Platform().get_chassis()
try:
self.chassis = sonic_platform.platform.Platform().get_chassis()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, please check if these lines have code coverage in sonic-thermalctld/tests/test_thermalctld.py

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added code coverage.

@vvolam
Copy link
Contributor

vvolam commented Aug 21, 2025

@LinJin23 could you please test this exception path is hit by testing on any platform and simulating the exception

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@LinJin23 LinJin23 marked this pull request as draft September 23, 2025 16:22
@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@LinJin23 LinJin23 marked this pull request as ready for review September 29, 2025 16:04
@LinJin23 LinJin23 requested a review from vvolam September 29, 2025 16:05
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Adds exception handling to the get_chassis() method call in the ThermalControlDaemon initialization to prevent crashes when chassis loading fails.

  • Wraps the platform chassis initialization in a try-except block to catch and log exceptions
  • Adds comprehensive test coverage for both successful and failed chassis initialization scenarios
  • Updates mock library to include missing methods and classes for improved test compatibility

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
sonic-thermalctld/scripts/thermalctld Adds try-except wrapper around get_chassis() call with error logging
sonic-thermalctld/tests/test_thermalctld.py Adds test class with comprehensive test cases for chassis initialization scenarios
sonic-thermalctld/tests/mocked_libs/swsscommon/swsscommon.py Updates mock implementation with missing classes and methods for test compatibility

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

try:
self.chassis = sonic_platform.platform.Platform().get_chassis()
except Exception as e:
self.log_error("Failed to get chassis due to {}".format(repr(e)))
Copy link

Copilot AI Sep 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The exception handling catches the error but doesn't prevent the subsequent code from trying to use self.chassis. Line 829 will fail with AttributeError because self.chassis is never assigned when an exception occurs. Consider either setting self.chassis = None in the except block or adding a return/raise to prevent further initialization.

Suggested change
self.log_error("Failed to get chassis due to {}".format(repr(e)))
self.log_error("Failed to get chassis due to {}".format(repr(e)))
raise

Copilot uses AI. Check for mistakes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With self.chassis initialized, this comment could be ignored.

Comment on lines 5 to 17
# Mock classes that are imported by sonic_py_common
class ConfigDBConnector:
def __init__(self):
pass

class SonicV2Connector:
def __init__(self):
pass

class SonicDBConfig:
def __init__(self):
pass

Copy link

Copilot AI Sep 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The removal of the import statement from swsssdk import ConfigDBConnector, SonicDBConfig, SonicV2Connector and replacement with local mock classes creates potential issues. If other parts of the test suite depend on the imported classes from swsssdk, this change could break existing functionality. Consider keeping both approaches or ensuring all dependencies are properly mocked.

Suggested change
# Mock classes that are imported by sonic_py_common
class ConfigDBConnector:
def __init__(self):
pass
class SonicV2Connector:
def __init__(self):
pass
class SonicDBConfig:
def __init__(self):
pass
# Try to import real swsssdk classes, fallback to mocks if not available
try:
from swsssdk import ConfigDBConnector, SonicDBConfig, SonicV2Connector
except ImportError:
class ConfigDBConnector:
def __init__(self):
pass
class SonicV2Connector:
def __init__(self):
pass
class SonicDBConfig:
def __init__(self):
pass

Copilot uses AI. Check for mistakes.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@vvolam
Copy link
Contributor

vvolam commented Sep 30, 2025

Description

add exception handling in get_chassis()

Motivation and Context

Improve exception handling during chassis loading to prevent thermalctld crashes (e.g., ValueError: invalid literal for int() with base 16: 'read error').

How Has This Been Tested?

Additional Information (Optional)

Please updated "How Has This Been Tested?" step as well

self.stop_event = threading.Event()

self.wait_time = self.INTERVAL

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initialize self.chassis to None

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@LinJin23 LinJin23 marked this pull request as draft October 8, 2025 22:39
@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@LinJin23
Copy link
Author

@LinJin23 could you please test this exception path is hit by testing on any platform and simulating the exception

Already tested on a physical platform to verify that when an exception occurs in get_chassis, the thermalctld daemon doesn’t crash.

@LinJin23 LinJin23 requested a review from Copilot October 10, 2025 14:52
@LinJin23 LinJin23 marked this pull request as ready for review October 10, 2025 14:52
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.


Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

try:
self.chassis = sonic_platform.platform.Platform().get_chassis()
except Exception as e:
self.log_error("Failed to get chassis due to {}".format(repr(e)))
Copy link

Copilot AI Oct 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The chassis initialization should set self.chassis to None in the exception handler to ensure it's explicitly None when chassis loading fails, making the error state more obvious.

Suggested change
self.log_error("Failed to get chassis due to {}".format(repr(e)))
self.log_error("Failed to get chassis due to {}".format(repr(e)))
self.chassis = None

Copilot uses AI. Check for mistakes.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

self.is_smartswitch_dpu = chassis.is_smartswitch() and chassis.is_dpu()
self.is_chassis_upd_required = self.is_chassis_system or self.is_smartswitch_dpu

if self.is_chassis_upd_required and chassis is not None:
Copy link

Copilot AI Oct 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The condition chassis is not None in line 555 is redundant since it's already inside the else block that handles the case when chassis is not None.

Suggested change
if self.is_chassis_upd_required and chassis is not None:
if self.is_chassis_upd_required:

Copilot uses AI. Check for mistakes.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@LinJin23 LinJin23 requested a review from Copilot October 10, 2025 15:12
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.


Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Comment on lines +551 to +553
self.is_chassis_system = chassis.is_modular_chassis()
self.is_smartswitch_dpu = chassis.is_smartswitch() and chassis.is_dpu()
self.is_chassis_upd_required = self.is_chassis_system or self.is_smartswitch_dpu
Copy link

Copilot AI Oct 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The else block should be indented at the same level as the if statement. Currently, the else is incorrectly aligned with the inner statements.

Suggested change
self.is_chassis_system = chassis.is_modular_chassis()
self.is_smartswitch_dpu = chassis.is_smartswitch() and chassis.is_dpu()
self.is_chassis_upd_required = self.is_chassis_system or self.is_smartswitch_dpu
self.is_chassis_system = chassis.is_modular_chassis()
self.is_smartswitch_dpu = chassis.is_smartswitch() and chassis.is_dpu()
self.is_chassis_upd_required = self.is_chassis_system or self.is_smartswitch_dpu

Copilot uses AI. Check for mistakes.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not applicable

Comment on lines 835 to 836
with (mock.patch('thermalctld.sonic_platform.platform.Platform') as mock_platform_class,
mock.patch.object(thermalctld.ThermalControlDaemon, 'log_error') as mock_log_error):
Copy link

Copilot AI Oct 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using parentheses for multi-line with statements is a Python 3.9+ feature. Consider using backslash continuation for better compatibility: with mock.patch(...) as mock_platform_class, \\\n mock.patch.object(...) as mock_log_error:

Copilot uses AI. Check for mistakes.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 862 to 863
with (mock.patch('thermalctld.sonic_platform.platform.Platform') as mock_platform_class,
mock.patch.object(thermalctld.ThermalControlDaemon, 'log_error') as mock_log_error):
Copy link

Copilot AI Oct 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using parentheses for multi-line with statements is a Python 3.9+ feature. Consider using backslash continuation for better compatibility: with mock.patch(...) as mock_platform_class, \\\n mock.patch.object(...) as mock_log_error:

Copilot uses AI. Check for mistakes.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@LinJin23 LinJin23 requested a review from Copilot October 10, 2025 15:31
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.


Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Comment on lines +11 to +17
pass
class SonicV2Connector:
def __init__(self):
pass
class SonicDBConfig:
def __init__(self):
pass
Copy link

Copilot AI Oct 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fallback mock classes are empty and may not provide the interface expected by test code. Consider adding basic method stubs that return appropriate default values to prevent AttributeError exceptions during testing.

Suggested change
pass
class SonicV2Connector:
def __init__(self):
pass
class SonicDBConfig:
def __init__(self):
pass
self.db = {}
def connect(self):
pass
def get(self, table, key):
return self.db.get(table, {}).get(key, None)
def set(self, table, key, value):
if table not in self.db:
self.db[table] = {}
self.db[table][key] = value
def get_table(self, table):
return self.db.get(table, {})
class SonicV2Connector:
def __init__(self):
self.db = {}
def connect(self, db_name):
pass
def get(self, db_name, table, key):
return self.db.get(db_name, {}).get(table, {}).get(key, None)
def set(self, db_name, table, key, value):
if db_name not in self.db:
self.db[db_name] = {}
if table not in self.db[db_name]:
self.db[db_name][table] = {}
self.db[db_name][table][key] = value
def get_table(self, db_name, table):
return self.db.get(db_name, {}).get(table, {})
class SonicDBConfig:
def __init__(self):
self.config = {}
def get_db_config(self, db_name):
return self.config.get(db_name, {})

Copilot uses AI. Check for mistakes.

self.is_smartswitch_dpu = chassis.is_smartswitch() and chassis.is_dpu()
self.is_chassis_upd_required = self.is_chassis_system or self.is_smartswitch_dpu

if self.is_chassis_upd_required:
Copy link

Copilot AI Oct 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The chassis None check logic should be consistently applied to the chassis table initialization code below (lines 555-563). Currently, if chassis is None but somehow the conditions pass, there could be AttributeError exceptions when calling chassis methods.

Suggested change
if self.is_chassis_upd_required:
if self.is_chassis_upd_required and chassis is not None:

Copilot uses AI. Check for mistakes.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done. BTW this is the opposite of the previous suggestion, which recommended that I remove it.

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants