Skip to content

OCPBUGS-31521: Don't publish duplicate DNS records #1229

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

rfredette
Copy link
Contributor

Before attempting to publish a domain to a zone, check if that domain is already being published to the same zone.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 7, 2025
Copy link
Contributor

openshift-ci bot commented May 7, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

Copy link
Contributor

openshift-ci bot commented May 7, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign frobware for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

for _, existingRecord := range records.Items {
// we only care if the domain name is published by a different record, so ignore the matching record if it
// already exists.
// TODO: There's got to be a better way to match the same object
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Compare by UID instead of name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still a relevant comment, but I moved the function, which made github think it's outdated. I'll make this change in my next update

@@ -352,6 +361,24 @@ func (r *reconciler) publishRecordToZones(zones []configv1.DNSZone, record *iov1
}
} else if isRecordPublished {
condition, err = r.replacePublishedRecord(zones[i], record)
} else if isDomainPublished, err = domainIsAlreadyPublishedInZone(context.Background(), r.cache, record, &zones[i]); err != nil {
Copy link
Contributor

@Miciah Miciah May 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to declare isDomainPublished outside the else if clause. Better to keep isDomainPublished and err scoped to the else if clauses.

Suggested change
} else if isDomainPublished, err = domainIsAlreadyPublishedInZone(context.Background(), r.cache, record, &zones[i]); err != nil {
} else if isDomainPublished, err := domainIsAlreadyPublishedInZone(context.Background(), r.cache, record, &zones[i]); err != nil {

Edit: Discussed on a call. Line 388 uses the err value. This logic is a bit subtle and could use some refactoring.

Reason: "DomainAlreadyInUse",
Status: string(operatorv1.ConditionUnknown),
Type: iov1.DNSRecordPublishedConditionType,
LastTransitionTime: metav1.Now(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe mergeConditions will set LastTransitionTime, so you don't need to set it here (lines 371 and 380).

Copy link
Contributor

@Miciah Miciah May 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

523461a updated mergeConditions to set LastTransitionTime. That's a fairly recent change, so older code (e.g. line 360) might be setting LastTransitionTime because it used to be necessary but no longer is. Scratch that, mergeConditions was setting LastTransitionTime even before that commit.

Before attempting to publish a domain to a zone, check if that domain is
already being published to the same zone.
@rfredette rfredette force-pushed the no-conflicting-dns branch from 9fece4b to d38f7bf Compare May 15, 2025 17:33
func (r *reconciler) MapOnRecordDelete(ctx context.Context, o client.Object) []reconcile.Request {
deletedRecord, ok := o.(*iov1.DNSRecord)
if !ok {
log.Info("failed to read DNS record")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
log.Info("failed to read DNS record")
log.Infof("Got unexpected object; expected type DNSRecord, got type %T", o)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or something more like this:

Suggested change
log.Info("failed to read DNS record")
log.Error(nil, "Got unexpected type of object", "expected", "DNSRecord", "actual", fmt.Sprintf("%T", o))

Comment on lines +86 to +88
// When a DNS record is deleted, there may be a conflicting record that should be published. Watch exclusively for
// deletes, and queue a reconcile request for the appropriate conflicting record, if applicable.
if err := c.Watch(source.Kind[client.Object](operatorCache, &iov1.DNSRecord{}, handler.EnqueueRequestsFromMapFunc(reconciler.MapOnRecordDelete), predicate.Funcs{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make the comment a little more explicit that, yes, we have two watches on the same resource (dnsrecords), and the reason is so that we can have a predicate and mapfunc to do something special for deletes.

Comment on lines +86 to +90
// When a DNS record is deleted, there may be a conflicting record that should be published. Watch exclusively for
// deletes, and queue a reconcile request for the appropriate conflicting record, if applicable.
if err := c.Watch(source.Kind[client.Object](operatorCache, &iov1.DNSRecord{}, handler.EnqueueRequestsFromMapFunc(reconciler.MapOnRecordDelete), predicate.Funcs{
CreateFunc: func(e event.CreateEvent) bool { return false },
DeleteFunc: func(e event.DeleteEvent) bool { return true },
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment with your findings that the delete event happens when the object is actually deleted, not when it is merely marked for deletion (that is, when deletionTimestamp is set).

@@ -580,6 +646,37 @@ func (r *reconciler) ToDNSRecords(ctx context.Context, o client.Object) []reconc
return requests
}

func (r *reconciler) MapOnRecordDelete(ctx context.Context, o client.Object) []reconcile.Request {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func (r *reconciler) MapOnRecordDelete(ctx context.Context, o client.Object) []reconcile.Request {
func (r *reconciler) mapOnRecordDelete(ctx context.Context, o client.Object) []reconcile.Request {

@@ -580,6 +646,37 @@ func (r *reconciler) ToDNSRecords(ctx context.Context, o client.Object) []reconc
return requests
}

func (r *reconciler) MapOnRecordDelete(ctx context.Context, o client.Object) []reconcile.Request {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a godoc comment

@@ -840,3 +847,26 @@ func Test_customCABundle(t *testing.T) {
})
}
}

func buildFakeCache(t *testing.T) *testutil.FakeCache {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also could use a godoc comment

// correctly merges status updates.
func TestPublishRecordToZonesMergesStatus(t *testing.T) {
func Test_publishRecordToZonesMergesStatus(t *testing.T) {
Copy link
Contributor

@Miciah Miciah May 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TestPublishRecordToZonesMergesStatus is an appropriate name for the test as there is no publishRecordToZonesMergesStatus function.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably I am also missing something here, but is there a new test case that will check if the condition is set?

@rfredette rfredette changed the title Don't publish duplicate DNS records OCPBUGS-31521: Don't publish duplicate DNS records Aug 13, 2025
@openshift-ci-robot openshift-ci-robot added jira/severity-moderate Referenced Jira bug's severity is moderate for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Aug 13, 2025
@openshift-ci-robot
Copy link
Contributor

@rfredette: This pull request references Jira Issue OCPBUGS-31521, which is invalid:

  • expected the bug to target either version "4.20." or "openshift-4.20.", but it targets "4.19.z" instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

Before attempting to publish a domain to a zone, check if that domain is already being published to the same zone.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@rfredette rfredette marked this pull request as ready for review August 13, 2025 21:21
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 13, 2025
@openshift-ci openshift-ci bot requested review from miheer and Thealisyed August 13, 2025 21:22
Copy link
Contributor

openshift-ci bot commented Aug 14, 2025

@rfredette: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-aws-ovn-hypershift-conformance d38f7bf link true /test e2e-aws-ovn-hypershift-conformance
ci/prow/e2e-gcp-operator d38f7bf link true /test e2e-gcp-operator
ci/prow/e2e-aws-operator d38f7bf link true /test e2e-aws-operator
ci/prow/e2e-azure-operator d38f7bf link true /test e2e-azure-operator
ci/prow/e2e-aws-ovn d38f7bf link true /test e2e-aws-ovn
ci/prow/e2e-aws-ovn-single-node d38f7bf link false /test e2e-aws-ovn-single-node
ci/prow/e2e-aws-ovn-techpreview d38f7bf link false /test e2e-aws-ovn-techpreview
ci/prow/okd-scos-e2e-aws-ovn d38f7bf link false /test okd-scos-e2e-aws-ovn
ci/prow/hypershift-e2e-aks d38f7bf link true /test hypershift-e2e-aks
ci/prow/verify d38f7bf link true /test verify
ci/prow/e2e-azure-ovn d38f7bf link false /test e2e-azure-ovn

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

}
oldestExistingRecord := iov1.DNSRecord{}
for _, existingRecord := range otherRecords.Items {
if oldestExistingRecord.CreationTimestamp.IsZero() || existingRecord.CreationTimestamp.Before(&oldestExistingRecord.CreationTimestamp) {
Copy link
Contributor

@rikatz rikatz Aug 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as a suggestion, first check if existingRecord is marked for deletion, so you don't consider it for reconciliation if it is also on the queue for removal:

if !existingRecord.DeletionTimestamp.IsZero() {
  continue
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems reasonable. On the one hand, as noted in #1229 (comment), the reconciliation triggers when the DNSRecord CR is actually deleted, not merely marked for deletion. On the other hand, we could be in a situation where multiple dnsrecords are being deleted at the same time, in which case your suggestion might avoid some unnecessary reconciliations.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeap, I should have been more explicit but my idea here is to ignore any other DNS record that would be considered older but also being on the deletion queue.

for _, existingRecord := range records.Items {
// we only care if the domain name is published by a different record, so ignore the matching record if it
// already exists.
// TODO: There's got to be a better way to match the same object
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can match the UID here. I would say that during admission webhooks matching UID is a bad idea, as the UID will be added once persisted by apiserver, but in case of a reconciliation process it means that the UID was already generated and cannot be changed anymore.

iov1.AddToScheme(scheme)
fakeClient := fake.NewClientBuilder().
WithScheme(scheme).
WithIndex(&iov1.DNSRecord{}, dnsRecordIndexFieldName, func(o client.Object) []string {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit (no need to fix now!): do we care about making this indexer function some sort of utils/specific function that can be used both on the operatorCache.IndexField and on fakeCache to keep consistency?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean a util function that the controller logic and test logic would share? That does make sense, though I do caution against re-using controller logic in tests if doing so could mask a defect in the controller logic.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, the idea was to make some util/shared function, but also I see your concerns here, so makes sense also to not share and in case something changes on the main reconciliation logic, the test that has a different cache logic will catch the regression.

@candita
Copy link
Contributor

candita commented Aug 20, 2025

/assign @rikatz

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. jira/severity-moderate Referenced Jira bug's severity is moderate for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants