-
Notifications
You must be signed in to change notification settings - Fork 4.3k
feat(kinesisfirehose): support iceberg destination #35761
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat(kinesisfirehose): support iceberg destination #35761
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(This review is outdated)
|
|
||
| const app = new cdk.App(); | ||
|
|
||
| const stack = new cdk.Stack(app, 'aws-cdk-firehose-iceberg-destination'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current integ test file doesn't work in my environment. It throws errors as below:
5:23:04 PM | CREATE_FAILED | AWS::KinesisFirehose::DeliveryStream | IcebergDeliveryStr...CustomRole50F03C05
Resource handler returned message: "Role arn:aws:iam::123456789012:role/aws-cdk-firehose-iceberg-destina-CustomRole6D8E6809-g4HAdgCYjJQD is not authorized to perform: glue:GetTable for the given table or the table does not exist. (Service: Firehose, Status Code: 400, Request ID: ...
| destinationTableConfigurations: [ | ||
| { | ||
| databaseName: 'iceberg_db', | ||
| tableName: 'iceberg_table', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use database.ref and table.ref to get these names. Plus it automatically sets proper dependencies between those resources.
| resources: [ | ||
| catalogArn, | ||
| `${catalogArn}/database/*`, | ||
| `${catalogArn}/table/*`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that a catalog can be shared with multiple systems in a region, I feel table/* and database/* is a bit too permissive. Can we narrow them down using information from destinationTableConfigurations?
| new firehose.DeliveryStream(stack, 'IcebergDeliveryStreamAllProperties', { | ||
| destination: new firehose.IcebergDestination(bucket, { | ||
| catalogConfiguration: { | ||
| catalogArn: `arn:aws:glue:${cdk.Stack.of(stack).region}:${cdk.Stack.of(stack).account}:catalog`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd recommend to use Stack.formatArn in aws-cdk-lib code. This way you don't have to care about region/account/partition.
| resources: [ | ||
| catalogArn, | ||
| `${catalogArn}/database/*`, | ||
| `${catalogArn}/table/*`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading this document, the correct format for glue database/table is
arn:${Partition}:glue:${Region}:${Account}:database/${DatabaseName} or arn:${Partition}:glue:${Region}:${Account}:table/${DatabaseName}/${TableName}. It seems we don't need catalog part.
Another document is referring to another arn format, so I'm not entirely sure 👀
| * Specifies the Glue catalog ARN identifier of the destination Apache Iceberg Tables. | ||
| * Format: arn:aws:glue:region:account-id:catalog | ||
| * | ||
| * @default - Uses the default Glue Catalog for the account |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this description align with the current implementation?
|
|
||
| // Add CreateTable permission if table creation is enabled | ||
| if (this.props.tableCreationEnabled) { | ||
| actions.push('glue:CreateTable'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we need this permission when warehouseLocation (instead of catalogArn) and tableCreationEnabled is specified?
| new firehose.DeliveryStream(stack, 'IcebergDeliveryStreamWithWarehouse', { | ||
| destination: new firehose.IcebergDestination(bucket, { | ||
| catalogConfiguration: { | ||
| warehouseLocation: `s3://${bucket.bucketName}/warehouse`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about setting tableCreationEnabled for wider coverage?
✅ Updated pull request passes all PRLinter validations. Dismissing previous PRLinter review.
…ub.com:Konippi/aws-cdk into support-iceberg-destination-in-kinesisfirehose
Issue # (if applicable)
Closes #33666.
Reason for this change
Firehose supports the destination settings for Apache Iceberg Table.
https://docs.aws.amazon.com/firehose/latest/dev/apache-iceberg-destination.html
But at the present, L2 Construct does not support it.
Description of changes
This PR adds a new
IcebergDestinationto theaws-kinesisfirehosemodule that implements theIDestinationinterface.Describe any new or updated permissions being added
Description of how you validated changes
Unit tests and integration tests
Checklist
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license