AWS Personalize solution accelerator tutorial files #1411
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
snow-leosanches wants to merge 2 commits into main from recommendations-aws-personalize-tutorial
13 changes: 13 additions & 0 deletions
tutorials/recommendations-with-aws-personalize/aws-personalize-setup.md
---
title: AWS Personalize Setup
position: 5
---

To proceed we need the following infrastructure:

- A Dataset Group: this is the top-level resource for AWS Personalize; it maps to a use case (e.g. Ecommerce) and will contain our Datasets and Recommenders
- A set of Schemas for each of the datasets above; these are defined in Avro format ([the repo has templates](https://github.com/snowplow-incubator/dbt-snowplow-recommendations/tree/main/aws_personalize_utilities/schemas)), as sketched after this list
- The Dataset configurations themselves; these bind use-case configuration to the Dataset Group and Schema
- Supporting infrastructure such as IAM roles and policies to allow AWS Personalize to access your data
- A Dataset Import to load your datasets from S3
- Recommenders to choose a model, train on the dataset, and serve personalizations
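For reference, the Interactions dataset's schema in this Avro format typically looks like the sketch below. The exact field lists for each dataset come from the schema templates linked above, so treat this as illustrative rather than definitive:

```json
{
  "type": "record",
  "name": "Interactions",
  "namespace": "com.amazonaws.personalize.schema",
  "fields": [
    { "name": "USER_ID", "type": "string" },
    { "name": "ITEM_ID", "type": "string" },
    { "name": "EVENT_TYPE", "type": "string" },
    { "name": "TIMESTAMP", "type": "long" }
  ],
  "version": "1.0"
}
```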
30 changes: 30 additions & 0 deletions
tutorials/recommendations-with-aws-personalize/conclusion.md
---
position: 9
title: Conclusion
---

# Conclusion

In this tutorial, we have explored the **AWS Personalize** solution accelerator for feeding AWS Personalize with Snowplow data, enabling customers to update recommendations for their end users in real time.

## Key Takeaways

### Understanding the Process

We have successfully built a real-time system for processing event data, including:

- Initial training data for AWS Personalize, to jump-start a base recommendation system with seed data;
- Continuously feeding the AWS Personalize recommendation engine with new data as it is generated, in real time.

### Practical Applications

This tutorial can be extended to utilize Snowplow event data for other real-time use cases, such as:

- Web engagement analytics;
- Ad performance tracking.

## Next Steps

- **Extend tracking:** extend the solution to track more granular user interactions, or track on a new platform such as mobile;
- **Expand use cases:** AWS Personalize can be used not just for recommendations, but also for hyper-personalization based on customers' history.

By completing this tutorial, you are equipped to harness the power of event-driven systems and Snowplow's analytics framework to build dynamic, real-time solutions tailored to your streaming and analytics needs.
167 changes: 167 additions & 0 deletions
tutorials/recommendations-with-aws-personalize/impression-integration.md
---
title: Impression Integration
position: 8
---

To "close the loop" and allow AWS Personalize to improve the performance of its recommendations, it should be fed back information about how they perform.

In particular, it is important to know which of its recommendations actually got clicked, so it can account for that in its models and optimize the performance of individual recommendations.

For this to work, we need to track the recommendation IDs of the widgets rendered from its suggestions, and the corresponding clicks on those widgets.

To do this, a few adjustments are needed to what has been built so far:

- The site tracking, to track the clicks (impressions may also be tracked to assess performance)
- The Snowbridge config, to account for the click events
- The Lambda code already understands how to handle events with impression information, so no changes are required there
## Tracking Impressions

The [Element Tracking plugin](https://github.com/snowplow/snowplow-javascript-tracker/pull/1400) can be used as follows:
```javascript
snowplow("addPlugin", "/cdn/shop/t/3/assets/element-tracker.umd.min.js", ["snowplowElementTracking", "SnowplowElementTrackingPlugin"]);

// set up impression tracking
snowplow("startElementTracking", {
  elements: {
    name: "recommendation-impression", // name the configuration something logical
    selector: "[data-recommendation-id]", // selector will vary based on the widget implementation
    expose: { when: "element", minPercentage: 0.5 }, // once per widget, only once it is 50% in view
    component: true, // mark it as a component to get clicks
    details: { dataset: ["recommendationId"] }, // extract the recommendation ID
    contents: {
      name: "recomendation-item",
      selector: "[data-item-id]",
      details: { dataset: ["itemId"] } // also extract the shown item IDs
    }
  }
});

// set up click tracking
snowplow("getComponentListGenerator", function (_, componentGeneratorWithDetail) {
  document.addEventListener("click", function (e) {
    if (e.target.closest("a") && e.target.closest("[data-recommendation-id]")) {
      const target = e.target.closest("a");
      const details = componentGeneratorWithDetail(target);
      snowplow("trackLinkClick", { element: target, context: details });
    }
  }, false);
});
```

With this configuration, whenever the custom recommendations widget is in view, an `expose_element` event will be fired, like the following:

```json
{
  "schema": "iglu:com.snowplowanalytics.snowplow/expose_element/jsonschema/1-0-0",
  "data": {
    "element_name": "recommendation-impression"
  }
}
```

This event will have an `element` entity describing the widget, including the recommendation/impression ID, like so:

```json
{
  "schema": "iglu:com.snowplowanalytics.snowplow/element/jsonschema/1-0-0",
  "data": {
    "element_name": "recommendation-impression",
    "width": 1600,
    "height": 229.4166717529297,
    "position_x": 160,
    "position_y": 531.5,
    "doc_position_x": 160,
    "doc_position_y": 3329,
    "element_index": 1,
    "element_matches": 1,
    "originating_page_view": "3d775590-74c6-4d0a-85ee-4d63d72bda2d",
    "attributes": [
      {
        "source": "dataset",
        "attribute": "recommendationId",
        "value": "RID-24-4a6a-8380-506b189ff622-CID-529b19"
      }
    ]
  }
}
```

It will also contain `element_content` entities for each item in the widget, capturing their product IDs, like the following:

```json
{
  "schema": "iglu:com.snowplowanalytics.snowplow/element_content/jsonschema/1-0-0",
  "data": {
    "element_name": "recomendation-item",
    "parent_name": "recommendation-impression",
    "parent_position": 1,
    "position": 1,
    "attributes": [
      {
        "source": "dataset",
        "attribute": "itemId",
        "value": "48538191331628"
      }
    ]
  }
}
```

In addition, if the links in the widget are clicked, a regular `link_click` event will be generated, but because the widget is defined as a component, it will include the same entities as an impression, too.

These `link_click` events are what we need to detect and forward to AWS Personalize.
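For orientation, once such a `link_click` event has been enriched and converted to JSON by Snowbridge (`snowplow_mode`), the fields that the `transform.js` additions below rely on look roughly like this; this is an abridged, illustrative sketch rather than a full enriched event:

```json
{
  "event_name": "link_click",
  "contexts_com_snowplowanalytics_snowplow_element_1": [
    {
      "element_name": "recommendation-impression",
      "attributes": [
        { "source": "dataset", "attribute": "recommendationId", "value": "RID-24-4a6a-8380-506b189ff622-CID-529b19" }
      ]
    }
  ],
  "contexts_com_snowplowanalytics_snowplow_element_content_1": [
    {
      "element_name": "recomendation-item",
      "parent_name": "recommendation-impression",
      "position": 1,
      "attributes": [
        { "source": "dataset", "attribute": "itemId", "value": "48538191331628" }
      ]
    }
  ]
}
```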
## Snowbridge Impressions and Clicks

Back in the Snowbridge configuration, the `link_click` event needs to be included in the event selection regex:
```hcl
regex = "^(snowplow_ecommerce_action|action|view_item|transaction_item|create_order)$" # before

regex = "^(snowplow_ecommerce_action|action|view_item|transaction_item|create_order|link_click)$" # after
```

The custom transform in `transform.js` then needs to be aware of them:
```javascript
case 'link_click': // recommendation clicks
  ep.Data.event_type = "Click";

  const element = event.contexts_com_snowplowanalytics_snowplow_element_1 || [];
  const content = event.contexts_com_snowplowanalytics_snowplow_element_content_1 || [];

  if (!element.length) return SKIP_EVENT; // unrelated link_click
  if (!content.length) return SKIP_EVENT; // unrelated link_click

  let impressionId = null;

  element.forEach((e) => {
    if (e.element_name !== "recommendation-impression") return; // some other element/component
    if (e.attributes) {
      e.attributes.forEach((a) => {
        if (a.source === "dataset" && a.attribute === "recommendationId") {
          impressionId = a.value;
        }
      });
    }
  });

  if (!impressionId) return SKIP_EVENT; // couldn't find impression info

  const items = [];

  content.forEach((ec) => {
    if (ec.parent_name !== "recommendation-impression") return;
    items.push(ec.attributes[0].value);
  });

  ep.Data.item_ids = items; // for simplicity we will pretend the first item was the clicked one
  ep.Data.impression_id = impressionId;
  break;
default:
  return SKIP_EVENT;
```

Snowbridge will now send the clicked recommendation events to the Lambda, which will send them to AWS Personalize.

AWS Personalize will now be able to optimize its recommendations based on how they perform.
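The Lambda code for this is not shown on this page, but conceptually the event it submits to the AWS Personalize `PutEvents` API for a click carries the recommendation and impression information alongside the interaction, roughly like this (values reused from the examples above; the exact payload depends on the Lambda implementation in the repo):

```json
{
  "eventType": "Click",
  "itemId": "48538191331628",
  "recommendationId": "RID-24-4a6a-8380-506b189ff622-CID-529b19",
  "impression": ["48538191331628"],
  "sentAt": 1718000000
}
```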
31 changes: 31 additions & 0 deletions
tutorials/recommendations-with-aws-personalize/introduction.md
---
title: Introduction
position: 1
---

[AWS Personalize](https://aws.amazon.com/personalize/) is an ML-based service that provides personalization and recommendation capabilities for end users. It can use Snowplow data to power different use cases. It can be driven through the AWS SDKs, and it also offers a UI in the AWS Console.

Other use cases supported by AWS Personalize include:

* Email personalization
* Next best action
* Search personalization
* Media/content recommendations

This accelerator demonstrates how Snowplow data can be used to feed AWS Personalize models. Any version of Snowplow that supports Snowbridge can be used, such as [Snowplow Local](https://github.com/snowplow-incubator/snowplow-local). For testing purposes, we recommend generating events using one of our examples that work with our out-of-the-box ecommerce events, like our [**Snowplow ecommerce store**](https://github.com/snowplow-industry-solutions/ecommerce-nextjs-example-store).

## Key technologies

* Snowplow: event tracking pipeline (Collector, Enrich, Kinesis sink)
* Snowbridge: event forwarding module, part of Snowplow
* AWS Personalize: the recommender technology
* AWS Lambda: a public endpoint to receive Snowplow events and serve user recommendations, while handling AWS credentials securely
* Terraform: an Infrastructure-as-Code tool, used to configure AWS Personalize and its dependent components

Optionally, the following technologies are recommended to complete this tutorial:

* Python: general purpose programming language

### Event capture and ingestion with Snowplow

- E-store front-end and Snowplow JavaScript tracker: user activity is captured as Snowplow ecommerce events
- Snowplow to AWS Lambda to AWS Personalize: the Snowplow pipeline validates the events, enriches them with device and geolocation data, then forwards them to the AWS Lambda function, which feeds AWS Personalize
@@ -0,0 +1,8 @@
{
  "title": "Recommendations system with AWS Personalize",
  "label": "Solution accelerator",
  "description": "Use Snowplow data to build a recommendations system with AWS Personalize.",
  "useCases": ["Ecommerce", "Recommendations"],
  "technologies": [],
  "snowplowTech": ["Snowbridge"]
}
140 changes: 140 additions & 0 deletions
tutorials/recommendations-with-aws-personalize/real-time-integration.md
---
title: Real-time Integration
position: 7
---

With recommendations now being served on site and influencing behaviour, the next step is to give AWS Personalize a feed of events as they occur, so it can keep its suggestions up to date.

To do this, we use Snowbridge to intercept events coming from the site and send them to AWS Personalize.

While Snowbridge has support for making requests to HTTP APIs, it isn't sophisticated enough to perform the authentication required to send events to an AWS API, so the AWS Lambda function should be adjusted to do that part, with Snowbridge simply sending the events to it.

Start with a Snowbridge configuration that filters our events and does some simple transforms:
```hcl
transform {
  use "spEnrichedFilter" {
    atomic_field  = "event_name"
    regex         = "^(snowplow_ecommerce_action|action|view_item|transaction_item|create_order)$" # filter to only ecommerce events you're interested in
    filter_action = "keep"
  }
}

transform {
  use "js" {
    script_path   = "/tmp/transform.js" # use a custom JS transform on the payload; not all ecommerce events can be filtered by just `event_name` so more checks are needed there
    snowplow_mode = true # turn the TSV record into JSON for our transform function to handle
  }
}

target {
  use "http" {
    url                 = "https://your-lambda-instance.execute-api.your-aws-region.amazonaws.com/interaction/recommendations_demo_dsg_ecommerce" # this is the dataset group name
    content_type        = "application/json"
    basic_auth_username = "snowbridge"
    basic_auth_password = ">_-G6xdYDjU?O4NXGpc4" # use a password to prevent abuse and false event submission
  }
}
```
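Before pointing Snowbridge at the endpoint, it can be useful to sanity-check the Lambda manually. A minimal sketch using Python's `requests` library, reusing the illustrative URL and credentials from the config above and a payload shaped like the output of the transform below:

```python
import requests

# Illustrative values taken from the Snowbridge config above; replace with your own.
URL = "https://your-lambda-instance.execute-api.your-aws-region.amazonaws.com/interaction/recommendations_demo_dsg_ecommerce"

# A test interaction shaped like what transform.js produces (see below).
interaction = {
    "event_id": "00000000-0000-0000-0000-000000000000",
    "event_type": "View",
    "user_id": "test-user",
    "session_id": "test-session",
    "item_ids": ["48538191331628"],
    "sent_at": 1718000000000,  # milliseconds, as produced by the transform
}

response = requests.post(
    URL,
    json=interaction,
    auth=("snowbridge", ">_-G6xdYDjU?O4NXGpc4"),  # matches the basic auth in the target config
    timeout=10,
)
print(response.status_code, response.text)
```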

This configuration requires a `transform.js` file that describes how to translate the different e-commerce events into the interactions AWS Personalize is expecting:
```js
/**
 * @typedef {object} EngineProtocol
 * @property {boolean} [FilterOut]
 * @property {string} [PartitionKey]
 * @property {string | object} [Data]
 * @property {Record<string, string>} [HTTPHeaders]
 */

const SKIP_EVENT = { FilterOut: true };

/**
 * @param {EngineProtocol} ep
 * @returns {EngineProtocol}
 */
function main(ep) {
  if (typeof ep.Data === "string") return SKIP_EVENT; // should be in snowplow_mode

  const event = ep.Data;

  const ts = (event.derived_tstamp || event.collector_tstamp).UnixMilli();

  const client_session = (event.contexts_com_snowplowanalytics_snowplow_client_session_1 || [])[0] || {};

  ep.Data = {
    event_id: event.event_id,
    event_type: "",
    user_id: event.user_id || event.domain_userid || client_session.userId,
    session_id: event.domain_sessionid || client_session.sessionId,
    item_ids: undefined,
    sent_at: ts,
  };

  let payload = undefined;
  let products = undefined;

  switch (event.event_name) {
    case 'transaction_item': // classic ecommerce
      ep.Data.event_type = "Purchase";
      ep.Data.item_ids = [event.ti_sku];
      break;
    case 'action': // enhanced ecommerce
      payload = event.unstruct_event_com_google_analytics_enhanced_ecommerce_action_1;
      products = event.contexts_com_google_analytics_enhanced_ecommerce_product_1;
      if (!payload || !payload.action || !products) return SKIP_EVENT;
      ep.Data.item_ids = products.map((i) => i.id);
      if (payload.action === "view") {
        ep.Data.event_type = "View";
      } else if (payload.action === "click") {
        ep.Data.event_type = "Click";
      } else if (payload.action === "purchase") {
        ep.Data.event_type = "Purchase";
      } else return SKIP_EVENT;
      break;
    case 'snowplow_ecommerce_action': // snowplow ecommerce
      payload = event.unstruct_event_com_snowplowanalytics_snowplow_ecommerce_snowplow_ecommerce_action_1;
      products = event.contexts_com_snowplowanalytics_snowplow_ecommerce_product_1;
      if (!payload || !payload.type || !products) return SKIP_EVENT;
      ep.Data.item_ids = products.map((i) => i.id);
      if (payload.type === "product_view") {
        ep.Data.event_type = "View";
      } else if (payload.type === "list_view") {
        ep.Data.event_type = "View"; // ???
      } else if (payload.type === "list_click") {
        ep.Data.event_type = "Click";
      } else if (payload.type === "transaction") {
        ep.Data.event_type = "Purchase";
      } else return SKIP_EVENT;
      break;
    case 'view_item': // hyper-t ecommerce
      payload = event.unstruct_event_io_snowplow_ecomm_view_item_1;
      if (!payload) return SKIP_EVENT;
      ep.Data.event_type = "View";
      ep.Data.item_ids = [payload.item_id];
      break;
    case 'create_order': // hyper-t ecommerce
      ep.Data.event_type = "Purchase";
      payload = event.contexts_io_snowplow_ecomm_cart_1;
      if (!payload || !payload.items_in_cart) return SKIP_EVENT;
      ep.Data.item_ids = payload.items_in_cart.map((i) => i.item_id);
      break;
    default:
      return SKIP_EVENT;
  }

  if (!ep.Data.item_ids || !ep.Data.event_type) return SKIP_EVENT;
  return ep;
}
```

Snowbridge will:

- Filter the event stream to ecommerce events
- Transform them into a common format
- Send them to your AWS Lambda

The Lambda will then:

- Submit the interaction to AWS Personalize as a real-time event (a sketch of this follows below)
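The Lambda implementation is not covered in detail on this page, but a minimal sketch of the forwarding side might look like the following. It assumes Python with boto3, an AWS Personalize event tracker that has already been created for the dataset group, and that its tracking ID is supplied via an environment variable; the names and wiring are illustrative only:

```python
import json
import os

import boto3

# Assumption: an event tracker already exists for the dataset group and its
# tracking ID is provided to the function as an environment variable.
personalize_events = boto3.client("personalize-events")
TRACKING_ID = os.environ["PERSONALIZE_TRACKING_ID"]


def handler(event, context):
    # With an API Gateway proxy integration, the Snowbridge payload arrives in the request body.
    # (Validating the basic auth credentials is omitted here for brevity.)
    interaction = json.loads(event["body"])

    personalize_events.put_events(
        trackingId=TRACKING_ID,
        userId=interaction["user_id"],
        sessionId=interaction["session_id"],
        eventList=[
            {
                "eventId": "{}-{}".format(interaction["event_id"], index),
                "eventType": interaction["event_type"],  # View / Click / Purchase
                "itemId": item_id,
                "sentAt": interaction["sent_at"] / 1000,  # transform sends milliseconds; PutEvents expects seconds
            }
            for index, item_id in enumerate(interaction["item_ids"])
        ],
    )

    return {"statusCode": 200, "body": "ok"}
```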

This allows AWS Personalize to react to new behaviour; it will periodically retrain itself and adjust its models to accommodate the newer observations.