Contributor

@jrpeck1989 jrpeck1989 commented Aug 22, 2025

Summary

A documentation guide on how to use the Failed Events in the Warehouse events table to recover failed events into your good events table. It covers the following:

  • Finding and identifying the failing events of interest
  • Reconstructing the events to a valid format using SQL
  • Generating a script to programmatically re-insert the newly fixed rows back into the good events table

Limitations of this guide:

  • It covers BigQuery only for the time being, as its structured columns make this slightly easier (Snowflake to follow in an upcoming PR)
  • It does not cover the data modelling considerations (i.e. working with the newly recovered rows in the snowplow_unified package)
  • It covers only one type of recovery, an entity sent with an incorrect schema version, but the pattern is very similar for any kind of failure

… SQL. Starting with one failure type, and for one data warehouse (BigQuery)

netlify bot commented Aug 22, 2025

Deploy Preview for snowplow-docs ready!

Name Link
🔨 Latest commit e2f9548
🔍 Latest deploy log https://app.netlify.com/projects/snowplow-docs/deploys/68eafcc637f8150008f2f9c3
😎 Deploy Preview https://deploy-preview-1378--snowplow-docs.netlify.app
Lighthouse
1 paths audited
Performance: 18 (🟢 up 5 from production)
Accessibility: 91 (no change from production)
Best Practices: 100 (🟢 up 8 from production)
SEO: 97 (🟢 up 2 from production)
PWA: -

@jrpeck1989 jrpeck1989 marked this pull request as ready for review August 25, 2025 19:54
SELECT FORMAT("""INSERT INTO `snowplow_good.events`
WITH prep AS (
SELECT
contexts_com_snowplowanalytics_snowplow_failure_1
Contributor

@jrpeck1989 I think this should actually be SELECT *. Otherwise you don't get the non-failure columns, and the generated INSERT query will fail with an error.

Contributor Author

This essentially becomes a fully typed-out SELECT app_id, v_tracker... in the outer query that follows this CTE. You can see the line below that says SELECT %s FROM prep, where %s is the full list of columns to be inserted, generated programmatically in previous CTEs.
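
For readers following the thread, the templating step described here can be sketched as below. This is a minimal illustration in Python rather than the guide's BigQuery SQL, and it is not the PR's actual query: the helper name build_insert, the hard-coded column list, and the prep body are all assumptions made for the example. It only shows how a generated column list is substituted into a SELECT %s FROM prep template.

```python
# Minimal sketch of the "SELECT %s FROM prep" templating idea.
# Assumption: the real guide generates the column list from the table's
# schema in earlier CTEs; here it is hard-coded for illustration.

def build_insert(columns, prep_body):
    """Return an INSERT statement whose outer SELECT lists `columns`.

    `prep_body` is the SQL text placed inside the prep CTE. It is passed in
    as-is because its exact SELECT list is what this thread is discussing.
    """
    # Substitute the comma-separated column list for %s, as FORMAT would.
    outer_select = "SELECT %s FROM prep" % (", ".join(columns),)
    return "\n".join([
        "INSERT INTO `snowplow_good.events`",
        "WITH prep AS (",
        prep_body,
        ")",
        outer_select,
    ])


statement = build_insert(
    ["app_id", "v_tracker"],
    # Hypothetical prep body and table name, for illustration only.
    "  SELECT app_id, v_tracker FROM hypothetical_failed_events",
)
print(statement)
```

Whatever the prep CTE contains, every column that ends up in the substituted list has to be exposed by prep for the generated statement to run.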

Contributor

At least when I was doing it in Snowflake (maybe it's different in BQ), the only column present in prep was contexts_com_snowplowanalytics_snowplow_failure_1, so the outer query failed because there was no app_id, v_tracker... present in prep.

Contributor Author

Ah, I understand now in context.

This query is only designed to create the repaired column statement, so it only needs (at least in this example) the contexts_com_snowplowanalytics_snowplow_failure_1 column, as we're pulling the values of the raw event out of that failure context. The query further down the page is the one designed to create the full INSERT command that reinserts the repaired data.

Contributor

alana-snowplow commented Sep 23, 2025

I have created and tested the Snowflake version of these queries but do not have permissions to commit to this repo. I've attached my version of the index.md file, which uses tabs to choose between BQ and Snowflake. @jrpeck1989 could you please take a look at them and add them if all looks good?
index.md

Collaborator

@mscwilson mscwilson left a comment

I can't comment on the SQL, but the page itself looks good

Contributor Author

@jrpeck1989 jrpeck1989 left a comment

I have incorporated Alana's contribution to show Snowflake support and tidied one or two things up.

Fixing small typos in my SQL that I discovered when doing a Demo to the Support team