Guide on recovering failed events in the warehouse in SQL #1378
… SQL. Starting with one failure type, and for one data warehouse (BigQuery)
SELECT FORMAT("""INSERT INTO `snowplow_good.events`
WITH prep AS (
  SELECT
    contexts_com_snowplowanalytics_snowplow_failure_1
@jrpeck1989 I think this should actually be SELECT *.
Otherwise you don't get the non-failure columns, and the generated INSERT query will fail with an error.
This essentially becomes a fully typed-out SELECT app_id, v_tracker... in the outer query that follows this CTE. You can see the line below that says SELECT %s FROM prep, where %s is the full list of columns to be inserted, which has been programmatically generated in previous CTEs.
At least when I was doing it in Snowflake (maybe it's different in BQ), the only column present in prep was contexts_com_snowplowanalytics_snowplow_failure_1, so the outer query failed because there was no app_id, v_tracker... present in prep.
Ah, I understand now in context.
This query is only designed to create the repaired column statement. Hence it only needs (at least in this example) the contexts_com_snowplowanalytics_snowplow_failure_1 column, as we're pulling the values out of the raw event from that failure context. The query further down the page is designed to create the full INSERT command to reinsert the repaired data.
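For readers following this thread, here is a rough sketch of the templating step being discussed, written in Python rather than warehouse SQL so that it stays dialect-neutral. It is only an illustration, not the guide's actual query: the function name build_insert, the placeholder names, and the my_failed_events source table are all hypothetical, and the column list is hard-coded here, whereas the guide generates it programmatically in earlier CTEs.

```python
# Template mirroring the shape quoted in the review: an INSERT wrapped around
# a `prep` CTE, with the outer SELECT's column list filled in afterwards.
TEMPLATE = (
    "INSERT INTO `snowplow_good.events`\n"
    "WITH prep AS (\n"
    "  {prep_body}\n"
    ")\n"
    "SELECT {column_list} FROM prep"
)


def build_insert(columns, prep_body):
    """Return the INSERT statement text for the given column names.

    `columns` plays the role of the generated column list that replaces %s
    in the SQL version; `prep_body` is the query placed inside the CTE.
    """
    return TEMPLATE.format(prep_body=prep_body, column_list=", ".join(columns))


# Hypothetical inputs: two column names and an assumed source table.
statement = build_insert(
    ["app_id", "v_tracker"],
    "SELECT * FROM my_failed_events",
)
print(statement)
```

The point raised in the review falls out of the template: every name substituted into the outer SELECT has to be produced by prep, which is why a prep that selects only the failure context column cannot feed an outer SELECT app_id, v_tracker...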
I have created and tested the Snowflake version of these queries but do not have permission to commit to this repo. I've attached my version of the index.md file, which uses tabs to choose between BQ and Snowflake. @jrpeck1989 could you please take a look at them and add them if all looks good?
I can't comment on the SQL, but the page itself looks good
I have incorporated Alana's contribution adding Snowflake support, and tidied one or two things up.
Fixing small typos in my SQL that I discovered when giving a demo to the Support team
Summary
A documentation piece/guide on how to use the Failed Events In The Warehouse events table to recover failing events back into your good events table. Covers the following:
Things NOT covered in this guide:
snowplow_unified package