@prabhataravind
Contributor

@prabhataravind prabhataravind commented Sep 19, 2025

  • When the NPU itself is rebooting, dhcp_server can take more than 10 minutes to come up. A DPU that boots during this window might not have all the remote databases instantiated, so give it sufficient time to get an IP from the DHCP server running on the NPU. Proceeding without waiting long enough leads to an orchagent crash when it accesses the remote DBs on the NPU.
  • Other solutions, such as introducing a hard dependency between midplane-network-dpu.service and database.service, will not work for KVM testbeds.
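
The waiting behaviour described above can be sketched as a bounded polling loop. This is only an illustration, not the code changed in this PR: `wait_for_ip`, its parameters, and the 900-second default are hypothetical names and values chosen for the example, and the caller is assumed to supply a `get_ip` function that returns the current address of the midplane interface, or `None` if it has none yet.

```python
import time
from typing import Callable, Optional


def wait_for_ip(
    get_ip: Callable[[], Optional[str]],
    timeout_s: float = 900.0,  # assumed budget, above the ~10 minutes mentioned above
    interval_s: float = 5.0,   # assumed polling interval
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Poll get_ip() until it returns an address or timeout_s elapses.

    Returns the address, or None if the deadline passes first.
    """
    deadline = clock() + timeout_s
    while True:
        ip = get_ip()
        if ip:
            return ip
        if clock() >= deadline:
            # Give up; the caller decides whether to proceed or fail.
            return None
        sleep(interval_s)
```

The clock and sleep functions are injectable so the loop can be exercised without real delays. A caller would typically refuse to start services that depend on the remote databases when the function returns `None`.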

Why I did it

Fixes #24015

Work item tracking
  • Microsoft ADO (number only):

How I did it

How to verify it

Which release branch to backport (provide reason below if selected)

  • 202205
  • 202211
  • 202305
  • 202311
  • 202405
  • 202411
  • 202505

Tested branch (Please provide the tested image version)

Description for the changelog

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

 * When the NPU itself is rebooting, dhcp_server can take more than 10
   minutes to come up. A DPU that boots during this window might not have
   all the remote databases instantiated, so give it sufficient time to
   get an IP from the DHCP server running on the NPU.

Signed-off-by: Prabhat Aravind <[email protected]>
@mssonicbld
Collaborator

/azp run Azure.sonic-buildimage

@prabhataravind prabhataravind requested review from Pterosaur and removed request for lguohan September 19, 2025 06:24
@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@kperumalbfn
Contributor

@vivekrnv Could you please check why there is a 10-minute delay in bringing up the DHCP server?

@prabhataravind
Contributor Author

No longer required. The issue happens only when the DPU is brought up while dhcp_server on the NPU is down.

croos12 pushed a commit to croos12/sonic-buildimage that referenced this pull request Oct 6, 2025

Development

Successfully merging this pull request may close these issues.

Bug: [Smartswitch] Orchagent crashes if eth0-midplane receives IP after DPU Database service start

5 participants