nfd-worker: Watch features.d changes #2156
base: master
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: ozhuraki.
Hi @ozhuraki. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test.
Thanks @ozhuraki for taking a stab at this.
I think we should refactor and re-architect this to make the code more maintainable. There might be other sources that we'd also want to react to events in a similar way. Basically, it should be the source (source/local in this case) that notifies the main event loop that features have been updated. Also, there is no need to re-run discovery of all features.
Thanks, makes sense. I will move this into source/local.
Moved into source/local, please take a look.
Thanks, updated, please take a look.
Progress in the right direction, but I maintain that we should aim for a more generic, maintainable solution. For example, who knows, in the future we might want to do some uevent-based stuff or similar, and it would be good to have the basics right for that instead of building a pile of one-off tricks.
Some specific observations:
- We operate on interfaces in nfd-worker; IMO we'd better keep it that way to keep the design cleaner. E.g. introduce a new AsyncSource, EventSource or smth with a method to set the event channel, and then, when configuring/enabling the feature sources, check if the source implements the interface and, if it does, call the method
- It should be source/local that internally sets up the fswatcher and notifies nfd-worker. Then we have two possibilities here:
  - either nfd-worker does the source.Discover() and then advertises the updated features/labels
  - or the source runs discovery internally and notifies nfd-worker just to re-advertise the updated features
- When a source notifies the nfd-worker main loop, the main loop does not need to do full re-discovery of all feature sources
- Some unit tests for the local source would be nice 😊
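The optional-interface idea from the list above could look roughly like the following. This is only a sketch under assumptions: FeatureSource is reduced to two methods, the channel element type is a guess, and localSource, cpuSource and wireEventSources are hypothetical stand-ins rather than the actual nfd-worker code.

```go
package main

import "fmt"

// FeatureSource is a simplified stand-in for the interface nfd-worker operates on.
type FeatureSource interface {
	Name() string
	Discover() error
}

// EventSource is an optional interface (name and signature are assumptions)
// for sources that can notify the worker when their data changes.
type EventSource interface {
	FeatureSource
	SetNotifyChannel(ch chan<- FeatureSource) error
}

// localSource pretends to be an event-capable source.
type localSource struct {
	notify chan<- FeatureSource
}

func (s *localSource) Name() string    { return "local" }
func (s *localSource) Discover() error { return nil }
func (s *localSource) SetNotifyChannel(ch chan<- FeatureSource) error {
	s.notify = ch
	return nil
}

// cpuSource is a plain source without event support.
type cpuSource struct{}

func (s *cpuSource) Name() string    { return "cpu" }
func (s *cpuSource) Discover() error { return nil }

// wireEventSources hands the channel to every source that implements
// EventSource and returns the names of the sources that were wired up.
// Failures are logged and skipped instead of aborting.
func wireEventSources(sources []FeatureSource, ch chan<- FeatureSource) []string {
	var wired []string
	for _, s := range sources {
		es, ok := s.(EventSource)
		if !ok {
			continue // not event-capable, nothing to do
		}
		if err := es.SetNotifyChannel(ch); err != nil {
			fmt.Printf("failed to set notify channel for %s: %v\n", s.Name(), err)
			continue
		}
		wired = append(wired, s.Name())
	}
	return wired
}

func main() {
	ch := make(chan FeatureSource, 1)
	wired := wireEventSources([]FeatureSource{&cpuSource{}, &localSource{}}, ch)
	fmt.Println("event sources:", wired)
}
```

The type assertion keeps the worker working purely against interfaces: sources that do not implement the optional interface are left untouched.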
9a65598 to 7c29f0a
ping @ozhuraki any update on this?
/ok-to-test
/test pull-node-feature-discovery-verify-master
7c29f0a to b9ea21a
/retest
Pull Request Overview
This PR implements filesystem watching functionality for the nfd-worker to automatically detect changes in the features.d directory. It introduces an EventSource interface to enable sources to send notifications when their underlying data changes, specifically targeting the local source to watch for file modifications.
- Adds EventSource interface and related infrastructure for event-driven feature discovery
- Implements filesystem watching in the local source using fsnotify
- Integrates event handling into the nfd-worker main loop to trigger selective feature discovery
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
File | Description |
---|---|
source/source.go | Adds EventSource interface and GetAllEventSources() function |
source/local/local.go | Implements EventSource interface with fsnotify-based file watching |
pkg/nfd-worker/nfd-worker.go | Integrates event handling and selective feature discovery by source |
case err := <-s.fsWatcher.Errors:
	klog.ErrorS(err, "failed to to watch features.d changes")
}
time.Sleep(1 * time.Second)
The hardcoded 1-second sleep in the event loop may cause unnecessary delays in event processing. Consider removing this sleep or making it configurable, as fsnotify events should be processed immediately.
@ozhuraki I think copilot is onto something here :) If we get a burst of, say, 10 events, we'd read them one per second, causing a streak of 10 one-per-second updates in nfd-worker. We only want a 1-second delay in total, i.e. read events as fast as we can but "group" them into one event that gets sent to the ch.
Elsewhere we've used e.g. a pattern of rateLimit := time.After(time.Second)..., you could come up with something better, though...
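One way to group a burst of events into a single notification is sketched below. It is an illustration only, not the code from this PR: the events channel stands in for the watcher's event stream, and coalesce and countNotifications are hypothetical names. The timer is armed on the first event of a burst and later events are drained without re-arming it.

```go
package main

import (
	"fmt"
	"time"
)

// coalesce drains events as fast as they arrive and sends at most one
// notification per window. A pending notification is flushed when the
// events channel is closed, after which notify is closed too.
func coalesce(events <-chan string, notify chan<- struct{}, window time.Duration) {
	var timer <-chan time.Time // nil (blocks forever) while no burst is pending
	for {
		select {
		case _, ok := <-events:
			if !ok {
				if timer != nil {
					notify <- struct{}{}
				}
				close(notify)
				return
			}
			if timer == nil {
				// First event of a burst: start the window.
				timer = time.After(window)
			}
		case <-timer:
			// Window elapsed: report the whole burst as one event.
			timer = nil
			notify <- struct{}{}
		}
	}
}

// countNotifications sends a burst of events and returns how many
// notifications came out the other end.
func countNotifications(burst int) int {
	events := make(chan string)
	notify := make(chan struct{})
	go coalesce(events, notify, time.Second)
	go func() {
		for i := 0; i < burst; i++ {
			events <- fmt.Sprintf("features.d/file-%d", i)
		}
		close(events)
	}()
	count := 0
	for range notify {
		count++
	}
	return count
}

func main() {
	fmt.Println("notifications for a burst of 10 events:", countNotifications(10))
}
```

Compared with sleeping after every event, the reader never stalls, so a burst costs one window of delay in total rather than one per event.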
Signed-off-by: Oleg Zhurakivskyy <[email protected]>
b9ea21a to 7f64c8b
Thanks for the review. Updated, please take a look.
Thanks @ozhuraki for the update. Comments below
@@ -341,6 +381,12 @@ func (w *nfdWorker) Run() error {
	return err
}

case sourceName := <-w.sourceEvent:
	err = w.runFeatureDiscoveryBySourceName(sourceName)
I think we could simplify this quite a bit by having sourceEvent be of type chan FeatureSource. We could just do s.Discover() here and ditch runFeatureDiscoveryBySourceName altogether. We need to split runFeatureDiscovery() into two parts to avoid copy-pasting code: feature discovery and feature advertisement (create labels etc). Then call the feature-advertisement part here.
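A rough sketch of that split follows, with assumptions stated up front: worker, fakeSource, discoverAll, advertise and handleSourceEvent are hypothetical names, and advertise only counts calls where the real worker would create and publish labels.

```go
package main

import "fmt"

// FeatureSource is a simplified stand-in for the nfd-worker source interface.
type FeatureSource interface {
	Name() string
	Discover() error
}

// fakeSource counts how many times it was asked to discover.
type fakeSource struct {
	name  string
	calls int
}

func (s *fakeSource) Name() string    { return s.name }
func (s *fakeSource) Discover() error { s.calls++; return nil }

// worker holds the enabled sources and the event channel.
type worker struct {
	sources     []FeatureSource
	sourceEvent chan FeatureSource
	advertised  int
}

// discoverAll is the feature-discovery half: run every enabled source.
func (w *worker) discoverAll() {
	for _, s := range w.sources {
		if err := s.Discover(); err != nil {
			fmt.Printf("discovery failed for %s: %v\n", s.Name(), err)
		}
	}
}

// advertise is the feature-advertisement half (label creation etc. elided).
func (w *worker) advertise() error {
	w.advertised++
	return nil
}

// handleSourceEvent re-runs discovery only for the source that sent the
// event and then reuses the shared advertisement step.
func (w *worker) handleSourceEvent(s FeatureSource) error {
	if err := s.Discover(); err != nil {
		return fmt.Errorf("discovery failed for %s: %w", s.Name(), err)
	}
	return w.advertise()
}

func main() {
	cpu := &fakeSource{name: "cpu"}
	local := &fakeSource{name: "local"}
	w := &worker{
		sources:     []FeatureSource{cpu, local},
		sourceEvent: make(chan FeatureSource, 1),
	}

	// Periodic path: discover everything, then advertise.
	w.discoverAll()
	_ = w.advertise()

	// Event path: only the notifying source is re-discovered.
	w.sourceEvent <- local
	s := <-w.sourceEvent
	if err := w.handleSourceEvent(s); err != nil {
		fmt.Println(err)
	}
	fmt.Println(cpu.calls, local.calls, w.advertised)
}
```

Because the channel carries the source itself, the event branch needs no lookup by name, and both paths share the advertisement code.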
discoveryDuration := time.Since(discoveryStart)
klog.V(2).InfoS("feature discovery of all sources completed", "duration", discoveryDuration)
featureDiscoveryDuration.WithLabelValues(utils.NodeName()).Observe(discoveryDuration.Seconds())
We shouldn't update the metrics here, as the metric is for "all enabled feature sources" and we're only doing one here. We can think about per-source metrics in a separate PR.
for _, s := range eventSources {
	if err := s.SetNotifyChannel(w.sourceEvent); err != nil {
		klog.ErrorS(err, "failed to set notify channel for event source", "source", s.Name())
		return fmt.Errorf("failed to set notify channel for event source %s: %w", s.Name(), err)
Let's not error out in this case (just log the error). The design pattern of nfd-worker has been to log errors if we cannot access some data, but not to crash. Let's follow that.
eventSources := source.GetAllEventSources()
for _, s := range eventSources {
I think this stuff should be done in the configure() function. The other similar stuff is there, too.
Closes: #2075