Skip to content

Commit 3e85ce3

Browse files
authored
Ensuring proper resource reconciliation on Repository Creation (#81)
Resolves: aws-controllers-k8s/community#1873 **TLDR:** - Addressed a bug in the ECR controller where Repository Custom Resources (CRs) failed to create `Policy` and `LifecyclePolicy` entries when containing all (possible) fields. - The ECR controller was missing the post `sdkCreate` hooks; But the resources that were missing few default-able fields we properly reconciled primarily due to late initialization, detecting API default values for non-specified fields, causing the controller to re queue resources after they have been created. - Resolved by adding the necessary hooks to the create path. **Description:** This fix addresses a bug in the ECR controller. When a user submitted a Repository Custom Resource (CR) containing all fields, the controller failed to create `Policy` and `LifecyclePolicy` entries. However, if one of the fields was missing (excluding `Policy` or `LifecyclePolicy`), the reconciliation process succeeded. The bug fix was straightforward and intuitive, as it became evident that the controller lacked the required hook for the create path. However, this raised questions about why `policy` and `LifecyclePolicy` were missing in some specific cases. Our e2e tests created CRs with isolated fields, such as `Policy` and `LifecyclePolicy`, and verified that these policies were indeed applied to the ECR Repository. It was evident that the reason these specific cases were successfully reconciled was due to the correct logic in the update path. This logic ensured that if a `Policy` or `LifecyclePolicy` was missing, an API call was made to update these fields. The mystery revolved around why these edge cases were reconciled even when we didn't explicitly request a second reconciliation or requeue of the resource **Debugging:** - Our debugging journey began with a review of the logs and a comparison between the logs of a resource created with all fields and one with only the policy. Surprisingly, the logs showed a complete aditional reconciliation step, and it wasn't initiated by the controller itself (no requeue, no drift remediation, no condition). This was puzzling. - To gain deeper insights, we connected the controller to a local clone of the runtime and added print statements. However, this approach made the logs messier and harder to interpret. Consequently, we tried to debugging with a go-delve debugger. - Our debugging session took us through the ACK runtime, but the issue didn't originate there. Instead, it originiated from 'k8s-sigs/controller-runtime.' We cloned it and extended our debugging to the `controller-runtime`. There, we discovered that 'c.Do.Reconcile(ctx, req)' was causing two reconciliations. - Further investigation revealed that 'for c.processNextWorkItem(ctx) {}' was called, indicating the presence of an extra item in the queue. This item had to be an update because the first item in the queue was a "create". - The delve debugger showed that we submitted two resources one with only few fields and the other with all the fields... We then ran a `diff` on `k get repository <repo1/repo2> -o yaml` which revealed a crucial clue: 'metadata.generation' was 1 for the policy+name-only resource and 2 for the all the fields resource, confirming the update theory. - The controller initiated the resource update because the ECR API returned all fields, even those not explicitly provided. This behavior prompted the ACK runtime to execute a patch into the resource spec, incrementing the generation and triggering an update. Yeah, we know what that late initialisation is a thing, but it completly baited me here... Signed-off-by: Amine Hilaly <[email protected]> By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
1 parent 7cd88b9 commit 3e85ce3

File tree

7 files changed

+101
-5
lines changed

7 files changed

+101
-5
lines changed
Lines changed: 5 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -1,13 +1,13 @@
11
ack_generate_info:
2-
build_date: "2023-09-14T23:11:50Z"
3-
build_hash: 892f29d00a4c4ad21a2fa32919921de18190979d
4-
go_version: go1.21.0
5-
version: v0.27.1
2+
build_date: "2023-09-29T07:01:53Z"
3+
build_hash: 7445de38211639a12e79992d154adab6e60f01fd
4+
go_version: go1.21.1
5+
version: v0.27.1-1-g7445de3
66
api_directory_checksum: e33a65f2b24673bb4e6b614f65b3e16533ceb95c
77
api_version: v1alpha1
88
aws_sdk_go_version: v1.44.93
99
generator_config_info:
10-
file_checksum: 0e045292e0fa1ace6f091edb93cab73b1817490c
10+
file_checksum: 2c93dfcca987fd568b301811e33e4233523ffee8
1111
original_file_name: generator.yaml
1212
last_modification:
1313
reason: API generation

apis/v1alpha1/generator.yaml

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -52,6 +52,8 @@ resources:
5252
code: customPreCompare(delta, a, b)
5353
sdk_read_many_post_set_output:
5454
template_path: hooks/repository/sdk_read_many_post_set_output.go.tpl
55+
sdk_create_post_set_output:
56+
template_path: hooks/repository/sdk_create_post_set_output.go.tpl
5557
sdk_delete_post_build_request:
5658
template_path: hooks/repository/sdk_delete_post_build_request.go.tpl
5759
update_operation:

generator.yaml

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -52,6 +52,8 @@ resources:
5252
code: customPreCompare(delta, a, b)
5353
sdk_read_many_post_set_output:
5454
template_path: hooks/repository/sdk_read_many_post_set_output.go.tpl
55+
sdk_create_post_set_output:
56+
template_path: hooks/repository/sdk_create_post_set_output.go.tpl
5557
sdk_delete_post_build_request:
5658
template_path: hooks/repository/sdk_delete_post_build_request.go.tpl
5759
update_operation:

pkg/resource/repository/sdk.go

Lines changed: 12 additions & 0 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.
Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,12 @@
1+
// Set the repository policy
2+
if ko.Spec.Policy != nil && *ko.Spec.Policy != "" {
3+
if _, err := rm.updateRepositoryPolicy(ctx, desired); err != nil{
4+
return nil, err
5+
}
6+
}
7+
// Set the lifecycle policy
8+
if ko.Spec.LifecyclePolicy != nil && *ko.Spec.LifecyclePolicy != "" {
9+
if _, err := rm.updateLifecyclePolicy(ctx, desired); err != nil{
10+
return nil, err
11+
}
12+
}
Lines changed: 14 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,14 @@
1+
apiVersion: ecr.services.k8s.aws/v1alpha1
2+
kind: Repository
3+
metadata:
4+
name: $REPOSITORY_NAME
5+
spec:
6+
name: $REPOSITORY_NAME
7+
registryID: '$REGISTRY_ID'
8+
imageScanningConfiguration:
9+
scanOnPush: false
10+
imageTagMutability: MUTABLE
11+
encryptionConfiguration:
12+
encryptionType: AES256
13+
policy: '$REPOSITORY_POLICY'
14+
lifecyclePolicy: '$REPOSITORY_LIFECYCLE_POLICY'

test/e2e/tests/test_repository.py

Lines changed: 54 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -21,6 +21,7 @@
2121

2222
from acktest import tags as tagutil
2323
from acktest.resources import random_suffix_name
24+
from acktest.aws.identity import get_region, get_account_id
2425
from acktest.k8s import resource as k8s
2526
from e2e import service_marker, CRD_GROUP, CRD_VERSION, load_ecr_resource
2627
from e2e.replacement_values import REPLACEMENT_VALUES
@@ -365,3 +366,56 @@ def test_repository_policy(self, ecr_client):
365366
# Check ECR repository doesn't exists
366367
exists = self.repository_exists(ecr_client, resource_name)
367368
assert not exists
369+
370+
def test_repository_create_will_all_fields(self, ecr_client):
371+
resource_name = random_suffix_name("ecr-repository", 24)
372+
373+
replacements = REPLACEMENT_VALUES.copy()
374+
replacements["REPOSITORY_NAME"] = resource_name
375+
replacements["REPOSITORY_POLICY"] = REPOSITORY_POLICY_GET_DOWNLOAD_URL_ALL
376+
replacements["REPOSITORY_LIFECYCLE_POLICY"] = LIFECYCLE_POLICY_FILTERING_ON_IMAGE_AGE
377+
replacements["REGISTRY_ID"] = get_account_id()
378+
379+
# Load Repository CR
380+
resource_data = load_ecr_resource(
381+
"repository_all_fields",
382+
additional_replacements=replacements,
383+
)
384+
logging.debug(resource_data)
385+
386+
# Create k8s resource
387+
ref = k8s.CustomResourceReference(
388+
CRD_GROUP, CRD_VERSION, RESOURCE_PLURAL,
389+
resource_name, namespace="default",
390+
)
391+
k8s.create_custom_resource(ref, resource_data)
392+
cr = k8s.wait_resource_consumed_by_controller(ref)
393+
394+
assert cr is not None
395+
assert k8s.get_resource_exists(ref)
396+
397+
time.sleep(CREATE_WAIT_AFTER_SECONDS)
398+
399+
# Get latest repository CR
400+
cr = k8s.wait_resource_consumed_by_controller(ref)
401+
402+
# Check ECR repository exists
403+
repo = self.get_repository(ecr_client, resource_name)
404+
assert repo is not None
405+
406+
# Check ECR repository policy exists
407+
policy = self.get_repository_policy(ecr_client, resource_name, repo["registryId"])
408+
assert minify_json_string(policy) == REPOSITORY_POLICY_GET_DOWNLOAD_URL_ALL
409+
# Check ECR repository lifecycle policy exists
410+
lifecycle_policy = self.get_lifecycle_policy(ecr_client, resource_name, repo["registryId"])
411+
assert lifecycle_policy == LIFECYCLE_POLICY_FILTERING_ON_IMAGE_AGE
412+
413+
# Delete k8s resource
414+
_, deleted = k8s.delete_custom_resource(ref)
415+
assert deleted is True
416+
417+
time.sleep(DELETE_WAIT_AFTER_SECONDS)
418+
419+
# Check ECR repository doesn't exists
420+
exists = self.repository_exists(ecr_client, resource_name)
421+
assert not exists

0 commit comments

Comments
 (0)