Skip to content

Conversation

@d0weinberger
Copy link
Collaborator

@d0weinberger d0weinberger commented Oct 15, 2025

Why this PR?

This PR enables the creation of AWS connections via Terraform which are, for example, used in Workflows.

What has changed?

The code was already there, but disabled. I enabled it again, did some manual testing with it, and fixed/changed parts of it.

  • In role_arn/service.go:71, I let Update return nil. Only Create does anything, which is updating the role_arn property of the corresponding connection.
  • In aws/service.go, the commented logic for the Update did not work after enabling it, because even if an ARN is set in the Terraform resource, it would be overwritten by the ARN set on the server (which is empty). So the created ARN resource did not lead to an update of the connection settings on the server. I removed all of that and just called me.service.Update, but need to talk to @Reinhard-Pilz-Dynatrace whether this might be problematic, or to understand the intention of the code that was there before.
  • The call for adding the role ARN needed some retry logic. The reason for this is that when creating the IAM role on AWS, it is not immediately in effect. It can take a few seconds before this is the case. If the AWS connection settings object is updated immediately after IAM role creation, the call will fail, as HAS will fail to assume the role. I implemented a retry loop using retry.RetryContext. We retry on server-errors, non-HTTP-errors and errors of code 400 and 404 to handle the case with eventual consistency. The default timeout is 20 minutes. The timeout for the retry loop can be set with a timeouts block, as seen in the example below, where it is set to 30 minutes. One minute (if available) is deducted from the deadline for the retry context to have enough time for finalizing tasks. Example:
resource "dynatrace_aws_connection_role_arn" "test-aws-connection-rolebased-arn" {
  aws_connection_id = dynatrace_aws_connection.test-aws-connection-rolebased.id
  role_arn          = aws_iam_role.example_role_rolebased.arn

  timeouts {
    create = "30m"
  }
}

How does it do it?

The first step is to create a dynatrace_aws_connection resource. This also creates the settings object on the server.
Then, after having created the IAM role in AWS, the dynatrace_aws_connection_role_arn resource is used. This resource updates the role_arn of the AWS connection settings object on the server.

Since the AWS connection settings object is practically immutable after creation, all attributes and sub-resources of the resource dynatrace_aws_connection are flagged as forceNew. The Update should only ever be called via role_arn/service to set the role_arn.

So for dynatrace_aws_connection, if a terraform resource is

  • created --> Create is called, new settings object is created (without role ARN)
  • changed --> because of forceNew, the old resource is deleted and a new one is created
  • removed --> Delete is called and the connection is deleted.

For dynatrace_aws_connection_role_arn, if a terraform resource is

  • created --> Create is called, which calls Update for dynatrace_aws_connection, so the AWS connection settings object is updated - this should only set the ARN. Other changes, or resetting the ARN, would lead to API errors
  • changed --> returns nil.
  • removed --> Delete is called but nothing happens - nil is returned. Since the role ARN is also immutable once it is set, we cannot update the AWS connection with an empty ARN.

How is it tested?

Just manual tests right now. No E2E test setup.
An example using web identity is added.
An acceptance test is added, but skipped for now.

How does it affect users?

Users can now use the TF resources dynatrace_aws_connection and dynatrace_aws_connection_role_arn to configure AWS connections, which in turn can be used in, e.g. workflows.

Issue:
CA-16697

var err error
var retry = 10
for i := 0; i < retry; i++ {
if err = me.connService.Update(ctx, v.AWSConnectionID, &connValue); err == nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] Should we limit it to only retry on 400 status codes?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could, but I don't see any immediaty harm in retrying on every error. Do you?
What I'd like to discuss, though, is: number of retries, duration of timeout and whether these 2 values should be customizable, i.e. should we pass this from the outside via env vars instead of magic numbers? What do you think?

@d0weinberger d0weinberger force-pushed the feat/aws-connection branch 3 times, most recently from 182c43d to b57e9e7 Compare October 22, 2025 07:35
retryTimeout = remaining - timeoutDeadlineBuffer
} else {
retryTimeout = remaining
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not quite sure what that does.
timeoutDeadlineBuffer is like an expected amount of time of API call (e.g., min time)?
If the remaining time is less than timeoutDeadlineBuffer I would expect that the call is not executed. With this else here, I don't quite get the point of the timeoutDeadlineBuffer

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the timeout for the whole create-process triggered via Terraform is, say, 10 minutes, then I don't want to have the whole 10 minutes allocated for the (internal) update call to the settings API. I want to give it maybe 9 minutes, so there is still enough time for all the other operations after the update call to finish, before the whole create times out.

Whether to do or not to do the update call if the remaining time is less than what we would like to have for buffer ... yes, that's a fair point. To play it save, we could just not do the call at all instead of using the whole remaining time for the update call.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arthurpitman could you also have a look?
I would prefer either returning in the else branch with DeadlineExceeded or not considering a timeoutDeadlineBuffer at all

- Code that was previously commented in `enums.go`, `provider.go`, `resource_descriptor`, `aws/service.go`, and `role_arn/service.go` is now enabled.
- A (minimal) `example.tf` using web identity is added. The AWS provider part is missing to have a complete example.
- An acceptance test is added, but skipped for now.
This is necessary because when the IAM role is created on AWS, it takes some time until it comes into effect. If immediately after creating the IAM role the role ARN is added to the AWS connection, the HAS will fail when trying to assume the role. Therefore, we need to retry adding the role ARN
…able

With the changes in this commit, users of the Terraform provider can configure the timeout for the `create` operation for the `role_arn` resource by adding the following block to the resource:

```
timeouts {
  create = "<x>m"
}
``
where <x> is the max amount of minutes the user wants to wait.
Kirdock
Kirdock previously approved these changes Oct 27, 2025
Copy link
Collaborator

@Kirdock Kirdock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current implementation solves the issue. The other comment open is minor and doesn't have a big impact on the implementation. So lgtm

Kirdock
Kirdock previously approved these changes Oct 27, 2025
@sonarqubecloud
Copy link

@d0weinberger d0weinberger merged commit 2489714 into main Oct 30, 2025
6 of 8 checks passed
@d0weinberger d0weinberger deleted the feat/aws-connection branch October 30, 2025 13:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants