
@Nuckal777
Contributor

Proposed Changes

Fixes #399.

@github-actions github-actions bot added size/L enhancement New feature or request labels Aug 7, 2025
@Nuckal777 Nuckal777 force-pushed the enh/bootstate-endpoint branch from bd4e68c to 9a2c29f August 7, 2025 15:35
@afritzler
Member

Adding a k8sClient to the registry server will make it harder in the future to factor out the registry and deploy it somewhere where it might not have access to the metal-operator API server. How about instead the ServerReconciler probes the /bootstate endpoint periodically since we are doing a periodic retry on all Server objects?

@Nuckal777
Contributor Author

How about instead the ServerReconciler probes the /bootstate endpoint periodically since we are doing a periodic retry on all Server objects?

This would introduce a gap in which the bootstate can be lost:

  • Server boots up, calls the bootstate endpoint and receives an HTTP 200.
  • Some time passes while the bootstate resides only in memory.
  • The Server controller reconciles the Server object and applies the condition.

Restarting the registry during the second step would lose the bootstate event.

@Nuckal777 Nuckal777 force-pushed the enh/bootstate-endpoint branch from 9a2c29f to f32a257 October 27, 2025 14:14
@hardikdr hardikdr self-requested a review October 28, 2025 10:28
Member

@hardikdr hardikdr left a comment


Thanks for the PR @Nuckal777, I dropped some nits inline.

Thinking about it a little more, I'd propose the following, more holistic way of handling the first-boot flow. Instead of tying it to a single signal, we could make it more flexible: for example, let the boot-operator set conditions like IPXEScriptFetched and IgnitionDataFetched, and have the /bootstate POST update a BootStateReceived condition on the ServerBootConfig.

Then, the metal-operator could decide which of these conditions to treat as the actual boot completion via a boot-completion-condition flag. This would also make it easier to support things like NetBootOnce and NetBootAlways policies on the ServerClaim side later, even when the /bootstate POST call is not configured in the Ignition. Wdyt?
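A sketch of how the proposed boot-completion-condition flag could be evaluated. The condition type names come from the comment above; `condition` is a trimmed stand-in for `metav1.Condition`, and the flag wiring and the `bootCompleted` helper are illustrative, not existing metal-operator options.

```go
package main

import (
	"flag"
	"fmt"
)

// condition is a trimmed stand-in for metav1.Condition (type and status only).
type condition struct {
	Type   string
	Status string // "True", "False" or "Unknown"
}

// Condition types proposed in the review; none of them exist upstream yet.
const (
	ipxeScriptFetched   = "IPXEScriptFetched"
	ignitionDataFetched = "IgnitionDataFetched"
	bootStateReceived   = "BootStateReceived"
)

// bootCompleted reports whether the condition chosen as the completion signal is True.
// A missing condition counts as "not completed".
func bootCompleted(conds []condition, completionType string) bool {
	for _, c := range conds {
		if c.Type == completionType {
			return c.Status == "True"
		}
	}
	return false
}

func main() {
	// HYPOTHETICAL manager flag, named after the suggestion in the review.
	completion := flag.String("boot-completion-condition", bootStateReceived,
		"ServerBootConfig condition type that marks a boot as complete")
	flag.Parse()

	// Ignition was served, but the machine never POSTed to /bootstate.
	conds := []condition{
		{Type: ipxeScriptFetched, Status: "True"},
		{Type: ignitionDataFetched, Status: "True"},
	}
	fmt.Println(bootCompleted(conds, *completion))        // false with the default flag
	fmt.Println(bootCompleted(conds, ignitionDataFetched)) // true
}
```

With the default set to `BootStateReceived`, only an explicit POST completes the boot; setting the flag to `IgnitionDataFetched` covers images whose Ignition does not call /bootstate.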


// BootstatePayload represents the payload to send to the `/bootstate` endpoint,
// including the systemUUID and the booted state.
type BootstatePayload struct {
Member


Optional: how about camel-case for BootState references instead?

conditionutils.UpdateStatus(metav1.ConditionTrue),
conditionutils.UpdateReason("BootStatePosted"),
conditionutils.UpdateMessage("Server successfully posted boot state"),
conditionutils.UpdateObserved(&server),
Member


It would be interesting to explore the trade-offs between having this condition on the Server vs. the ServerBootConfig CR. I am leaning somewhat more towards ServerBootConfig.

Contributor Author


Hmm, ServerBootConfiguration might be more fitting as their lifetimes are bound to a ServerClaim and the discovery boot. 🤔

@Nuckal777
Contributor Author

Instead of tying it to a single signal, we could make it more flexible...

Agreed. IIRC, it was part of the initial discussion on the topic that the owner of a ServerClaim could choose what is considered a successful network boot.



Development

Successfully merging this pull request may close these issues.

Define /bootstate call back endpoint in manager to track if Server successfully booted

4 participants