---
title: AWS Personalize Setup
position: 5
---

To proceed we need the following infrastructure:
- A Dataset Group: the top-level resource in AWS Personalize; it maps to a use case (e.g. Ecommerce) and will contain our Datasets and Recommenders
- A set of Schemas, one for each of the datasets above; these are defined in Avro format ([the repo has templates](https://github.com/snowplow-incubator/dbt-snowplow-recommendations/tree/main/aws_personalize_utilities/schemas))
- The Dataset configurations themselves; these bind some use-case configuration to the Dataset Group and Schema
- Supporting infrastructure, such as IAM roles and policies, to allow AWS Personalize to access your data
- A Dataset Import job to load your datasets from S3
- Recommenders to choose a model, train on the dataset, and serve personalizations
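The templates in the linked repo are the source of truth for this accelerator; for orientation, a minimal Interactions schema (modelled on AWS's documented examples — Personalize requires the `USER_ID`, `ITEM_ID` and `TIMESTAMP` fields for this dataset type) looks like this:

```json
{
  "type": "record",
  "name": "Interactions",
  "namespace": "com.amazonaws.personalize.schema",
  "fields": [
    { "name": "USER_ID", "type": "string" },
    { "name": "ITEM_ID", "type": "string" },
    { "name": "EVENT_TYPE", "type": "string" },
    { "name": "TIMESTAMP", "type": "long" }
  ],
  "version": "1.0"
}
```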
tutorials/recommendations-with-aws-personalize/conclusion.md
---
position: 9
title: Conclusion
---

# Conclusion

In this tutorial, we explored the **AWS Personalize** solution accelerator, which feeds AWS Personalize with Snowplow data, enabling customers to update recommendations for their end-users in real time.

## Key Takeaways

### Understanding the Process

We have successfully built a real-time system for processing event data, including:

- Initial training data for AWS Personalize, to jump-start a base recommendation system with seed data;
- Continuous feeding of the AWS Personalize recommendation engine with new data as it is generated in real time.

### Practical Applications

This tutorial can be extended to utilize Snowplow event data for other real-time use cases, such as:

- Web Engagement analytics;
- Ad performance tracking.

## Next Steps
- **Extend tracking:** track more granular user interactions, or add tracking on a new platform such as mobile;
- **Expand use cases:** AWS Personalize can be used not just for recommendations, but also for hyper-personalization based on customers' history.

By completing this tutorial, you are equipped to harness the power of event-driven systems and Snowplow’s analytics framework to build dynamic, real-time solutions tailored to your streaming and analytics needs.
---
title: Impression Integration
position: 8
---

To "close the loop" and allow AWS Personalize to improve the performance of its recommendations, it should be fed back information about how those recommendations perform.

In particular, it needs to know which of its recommendations actually got clicked, so it can account for that in its models and optimize the performance of individual recommendations.

For this to work, the recommendation IDs, and the clicks on the widgets rendered from those recommendations, need to be tracked.

To do this, a few adjustments are needed to what has been built so far:

- The site tracking needs to track the clicks (impressions can also be tracked to assess performance)
- The Snowbridge config needs to account for the click events
- The Lambda code already understands how to handle events with impression information, so no changes are required there

## Tracking Impressions

The [Element Tracking plugin](https://github.com/snowplow/snowplow-javascript-tracker/pull/1400) can be used as follows:

```javascript
snowplow("addPlugin", "/cdn/shop/t/3/assets/element-tracker.umd.min.js", ["snowplowElementTracking", "SnowplowElementTrackingPlugin"]);

// set up impression tracking
snowplow("startElementTracking", {
  elements: {
    name: "recommendation-impression", // name the configuration something logical
    selector: "[data-recommendation-id]", // selector will vary based on the widget implementation
    expose: { when: "element", minPercentage: 0.5 }, // once per widget, only once it is 50% in view
    component: true, // mark it as a component to get clicks attributed to it
    details: { dataset: ["recommendationId"] }, // extract the recommendation ID
    contents: {
      name: "recommendation-item",
      selector: "[data-item-id]",
      details: { dataset: ["itemId"] } // also extract the shown item IDs
    }
  }
});

// set up click tracking
snowplow("getComponentListGenerator", function (_, componentGeneratorWithDetail) {
  document.addEventListener("click", function (e) {
    if (e.target.closest("a") && e.target.closest("[data-recommendation-id]")) {
      const target = e.target.closest("a");
      const details = componentGeneratorWithDetail(target);
      snowplow("trackLinkClick", { element: target, context: details });
    }
  }, false);
});
```

With this configuration, whenever the custom recommendations widget is in-view, an `expose_element` event will be fired, like the following:

```json
{
  "schema": "iglu:com.snowplowanalytics.snowplow/expose_element/jsonschema/1-0-0",
  "data": {
    "element_name": "recommendation-impression"
  }
}
```

This event will have an `element` entity describing the widget, including the recommendation/impression ID, like so:

```json
{
  "schema": "iglu:com.snowplowanalytics.snowplow/element/jsonschema/1-0-0",
  "data": {
    "element_name": "recommendation-impression",
    "width": 1600,
    "height": 229.4166717529297,
    "position_x": 160,
    "position_y": 531.5,
    "doc_position_x": 160,
    "doc_position_y": 3329,
    "element_index": 1,
    "element_matches": 1,
    "originating_page_view": "3d775590-74c6-4d0a-85ee-4d63d72bda2d",
    "attributes": [
      {
        "source": "dataset",
        "attribute": "recommendationId",
        "value": "RID-24-4a6a-8380-506b189ff622-CID-529b19"
      }
    ]
  }
}
```

And it will also contain `element_content` entities for each item in the widget, capturing their product IDs, like the following:

```json
{
  "schema": "iglu:com.snowplowanalytics.snowplow/element_content/jsonschema/1-0-0",
  "data": {
    "element_name": "recommendation-item",
    "parent_name": "recommendation-impression",
    "parent_position": 1,
    "position": 1,
    "attributes": [
      {
        "source": "dataset",
        "attribute": "itemId",
        "value": "48538191331628"
      }
    ]
  }
}
```

In addition, if the links in the widget are clicked, a regular `link_click` event will be generated - but because the widget is defined as a component, it will extract the same entities as an impression and include those, too.

These `link_click` events are what needs to be detected and forwarded to AWS Personalize.

## Snowbridge Impressions and Clicks

First, the `link_click` event needs to be included in the event selection regex in the Snowbridge configuration:

```hcl
regex = "^(snowplow_ecommerce_action|action|view_item|transaction_item|create_order)$" # before

regex = "^(snowplow_ecommerce_action|action|view_item|transaction_item|create_order|link_click)$" # after
```

The custom transform then needs to be aware of them:

```javascript
case 'link_click': // recommendation clicks
  ep.Data.event_type = "Click";

  const element = event.contexts_com_snowplowanalytics_snowplow_element_1 || [];
  const content = event.contexts_com_snowplowanalytics_snowplow_element_content_1 || [];

  if (!element.length) return SKIP_EVENT; // unrelated link_click
  if (!content.length) return SKIP_EVENT; // unrelated link_click

  let impressionId = null;

  element.forEach((e) => {
    if (e.element_name !== "recommendation-impression") return; // some other element/component
    if (e.attributes) {
      e.attributes.forEach((a) => {
        if (a.source === "dataset" && a.attribute === "recommendationId") {
          impressionId = a.value;
        }
      });
    }
  });

  if (!impressionId) return SKIP_EVENT; // couldn't find impression info

  const items = [];

  content.forEach((ec) => {
    if (ec.parent_name !== "recommendation-impression") return;
    items.push(ec.attributes[0].value);
  });

  ep.Data.item_ids = items; // for simplicity we will pretend the first item was the clicked one
  ep.Data.impression_id = impressionId;
  break;
default:
  return SKIP_EVENT;
```
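To sanity-check this branch outside Snowbridge, its extraction logic can be isolated; `extractRecommendationClick` below is a hypothetical standalone mirror of the code above, run against sample entities shaped like the JSON payloads shown earlier:

```javascript
// Hypothetical standalone mirror of the link_click branch above.
// Given the element and element_content entities, it returns the impression ID
// and item IDs, or null when the click is unrelated to a recommendation widget.
function extractRecommendationClick(elements, contents) {
  let impressionId = null;
  (elements || []).forEach((e) => {
    if (e.element_name !== "recommendation-impression") return;
    (e.attributes || []).forEach((a) => {
      if (a.source === "dataset" && a.attribute === "recommendationId") {
        impressionId = a.value;
      }
    });
  });
  if (!impressionId) return null;

  const items = [];
  (contents || []).forEach((ec) => {
    if (ec.parent_name !== "recommendation-impression") return;
    items.push(ec.attributes[0].value);
  });
  return { impression_id: impressionId, item_ids: items };
}

// Sample entities modelled on the JSON payloads shown above
const result = extractRecommendationClick(
  [{ element_name: "recommendation-impression",
     attributes: [{ source: "dataset", attribute: "recommendationId", value: "RID-24-4a6a" }] }],
  [{ parent_name: "recommendation-impression",
     attributes: [{ source: "dataset", attribute: "itemId", value: "48538191331628" }] }]
);
// result: { impression_id: "RID-24-4a6a", item_ids: ["48538191331628"] }
```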

Snowbridge will now send the clicked recommendation events to the Lambda, which will send them to AWS Personalize.

AWS Personalize will now be able to optimize its recommendations based on how they perform.
tutorials/recommendations-with-aws-personalize/introduction.md
---
title: Introduction
position: 1
---

[AWS Personalize](https://aws.amazon.com/personalize/) is an ML-based solution for providing personalization and recommendation capabilities to end-users. It can use Snowplow data to power a variety of use cases. It can be used via the AWS SDKs, and also offers a UI within the AWS Console.

Other use cases supported by AWS Personalize include:
* Email personalization
* Next best action
* Search personalization
* Media/content recommendations

This accelerator demonstrates how Snowplow data can be used to feed AWS Personalize models. Any version of Snowplow that supports Snowbridge can be used, such as [Snowplow Local](https://github.com/snowplow-incubator/snowplow-local). For testing purposes, we recommend generating events using one of our examples that work with our out-of-the-box ecommerce events, like our [**Snowplow ecommerce store**](https://github.com/snowplow-industry-solutions/ecommerce-nextjs-example-store).

## Key technologies

* Snowplow: event tracking pipeline (Collector, Enrich, Kinesis sink)
* Snowbridge: event forwarding module, part of Snowplow
* AWS Personalize: the recommender technology
* AWS Lambda: a public endpoint to receive Snowplow events and serve user recommendations, keeping AWS credentials away from clients
* Terraform: an Infrastructure-as-Code tool, used to configure AWS Personalize and its dependent components

Optionally, the following technologies are recommended to complete this tutorial:

* Python: general-purpose programming language

### Event capture and ingestion with Snowplow

- E-store front-end and Snowplow JavaScript tracker: user activity is captured as Snowplow ecommerce events
- Snowplow to AWS Lambda to AWS Personalize: the Snowplow pipeline validates the events, enriches them with device and geolocation data, then forwards them to the AWS Lambda function, which feeds AWS Personalize
tutorials/recommendations-with-aws-personalize/meta.json
{
"title": "Recommendations system with AWS Personalize",
"label": "Solution accelerator",
"description": "Use Snowplow data to build a recommendations system with AWS Personalize.",
"useCases": ["Ecommerce", "Recommendations"],
"technologies": [],
"snowplowTech": ["Snowbridge"]
}
---
title: Real-time Integration
position: 7
---

With recommendations now being served on site and influencing behaviour, the next step is to give AWS Personalize a feed of events as they occur, so it can keep its suggestions up to date.

To do this, use Snowbridge to intercept events coming from the site and send them to AWS Personalize.

While Snowbridge supports making requests to HTTP APIs, it isn't sophisticated enough to perform the authentication required to send events to an AWS API directly; the AWS Lambda function will handle that part, and Snowbridge will simply send the events to it.
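On the Lambda side, that authenticated call is Amazon Personalize's `PutEvents` API. The sketch below is illustrative, not the accelerator's actual Lambda code: it only shows how a transformed Snowbridge event might be shaped into `PutEvents` input. The `trackingId` would come from a Personalize event tracker; in a real handler the returned object would be passed to `PutEventsCommand` from `@aws-sdk/client-personalize-events`.

```javascript
// Illustrative sketch: shape a Snowbridge-transformed event into PutEvents input.
// No AWS call is made here; a real Lambda would pass this object to
// PutEventsCommand from @aws-sdk/client-personalize-events.
function buildPutEventsInput(trackingId, ev) {
  return {
    trackingId,                           // from the Personalize event tracker
    userId: ev.user_id,
    sessionId: ev.session_id,
    eventList: [{
      eventId: ev.event_id,
      eventType: ev.event_type,           // "View" | "Click" | "Purchase"
      sentAt: new Date(ev.sent_at),       // ev.sent_at is epoch milliseconds
      itemId: ev.item_ids[0],
      recommendationId: ev.impression_id, // only present for recommendation clicks
    }],
  };
}

// Example using the common format produced by the custom transform
const input = buildPutEventsInput("hypothetical-tracking-id", {
  event_id: "e1",
  event_type: "View",
  user_id: "u1",
  session_id: "s1",
  item_ids: ["48538191331628"],
  sent_at: 1700000000000,
});
```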

Start with a Snowbridge configuration that filters the events and performs some simple transforms:

```hcl
transform {
  use "spEnrichedFilter" {
    atomic_field  = "event_name"
    regex         = "^(snowplow_ecommerce_action|action|view_item|transaction_item|create_order)$" # filter to only the ecommerce events you're interested in
    filter_action = "keep"
  }
}

transform {
  use "js" {
    script_path   = "/tmp/transform.js" # use a custom JS transform on the payload; not all ecommerce events can be filtered by `event_name` alone, so more checks are needed there
    snowplow_mode = true # turn the TSV record into JSON for our transform function to handle
  }
}

target {
  use "http" {
    url                 = "https://your-lambda-instance.execute-api.your-aws-region.amazonaws.com/interaction/recommendations_demo_dsg_ecommerce" # this is the dataset group name
    content_type        = "application/json"
    basic_auth_username = "snowbridge"
    basic_auth_password = ">_-G6xdYDjU?O4NXGpc4" # use a password to prevent abuse and false event submission
  }
}
```
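For reference, under this configuration each request Snowbridge makes to the Lambda carries a JSON body in the common format built by the custom transform; with illustrative placeholder values, a product view would look roughly like:

```json
{
  "event_id": "9f2c1c6e-1111-2222-3333-444455556666",
  "event_type": "View",
  "user_id": "6a1f0b2d-7777-8888-9999-aaaabbbbcccc",
  "session_id": "c3de4f5a-1234-5678-90ab-cdef01234567",
  "item_ids": ["48538191331628"],
  "sent_at": 1700000000000
}
```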

This configuration requires a `transform.js` file describing how to translate the different e-commerce events into the interactions AWS Personalize is expecting:

```js
/**
 * @typedef {object} EngineProtocol
 * @property {boolean} [FilterOut]
 * @property {string} [PartitionKey]
 * @property {string | object} [Data]
 * @property {Record<string, string>} [HTTPHeaders]
 */

const SKIP_EVENT = { FilterOut: true };

/**
 * @param {EngineProtocol} ep
 * @returns {EngineProtocol}
 */
function main(ep) {
  if (typeof ep.Data === "string") return SKIP_EVENT; // should be in snowplow_mode

  const event = ep.Data;

  const ts = (event.derived_tstamp || event.collector_tstamp).UnixMilli();

  const client_session = (event.contexts_com_snowplowanalytics_snowplow_client_session_1 || [])[0] || {};

  ep.Data = {
    event_id: event.event_id,
    event_type: "",
    user_id: event.user_id || event.domain_userid || client_session.userId,
    session_id: event.domain_sessionid || client_session.sessionId,
    item_ids: undefined,
    sent_at: ts,
  };

  let payload = undefined;
  let products = undefined;

  switch (event.event_name) {
    case 'transaction_item': // classic ecommerce
      ep.Data.event_type = "Purchase";
      ep.Data.item_ids = [event.ti_sku];
      break;
    case 'action': // enhanced ecommerce
      payload = event.unstruct_event_com_google_analytics_enhanced_ecommerce_action_1;
      products = event.contexts_com_google_analytics_enhanced_ecommerce_product_1;
      if (!payload || !payload.action || !products) return SKIP_EVENT;
      ep.Data.item_ids = products.map((i) => i.id);
      if (payload.action === "view") {
        ep.Data.event_type = "View";
      } else if (payload.action === "click") {
        ep.Data.event_type = "Click";
      } else if (payload.action === "purchase") {
        ep.Data.event_type = "Purchase";
      } else return SKIP_EVENT;
      break;
    case 'snowplow_ecommerce_action': // snowplow ecommerce
      payload = event.unstruct_event_com_snowplowanalytics_snowplow_ecommerce_snowplow_ecommerce_action_1;
      products = event.contexts_com_snowplowanalytics_snowplow_ecommerce_product_1;
      if (!payload || !payload.type || !products) return SKIP_EVENT;
      ep.Data.item_ids = products.map((i) => i.id);
      if (payload.type === "product_view") {
        ep.Data.event_type = "View";
      } else if (payload.type === "list_view") {
        ep.Data.event_type = "View"; // ???
      } else if (payload.type === "list_click") {
        ep.Data.event_type = "Click";
      } else if (payload.type === "transaction") {
        ep.Data.event_type = "Purchase";
      } else return SKIP_EVENT;
      break;
    case 'view_item': // hyper-t ecommerce
      payload = event.unstruct_event_io_snowplow_ecomm_view_item_1;
      if (!payload) return SKIP_EVENT;
      ep.Data.event_type = "View";
      ep.Data.item_ids = [payload.item_id];
      break;
    case 'create_order': // hyper-t ecommerce
      ep.Data.event_type = "Purchase";
      payload = event.contexts_io_snowplow_ecomm_cart_1;
      if (!payload || !payload.items_in_cart) return SKIP_EVENT;
      ep.Data.item_ids = payload.items_in_cart.map((i) => i.item_id);
      break;
    default:
      return SKIP_EVENT;
  }

  if (!ep.Data.item_ids || !ep.Data.event_type) return SKIP_EVENT;
  return ep;
}
```
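As a quick local spot-check of the branching above, the `snowplow_ecommerce_action` mapping can be isolated into a standalone function (a hypothetical mirror of the switch, not part of the accelerator's code):

```javascript
// Hypothetical mirror of the snowplow_ecommerce_action branch in transform.js,
// isolated so the action-to-interaction mapping can be checked locally.
function mapEcommerceAction(type) {
  switch (type) {
    case "product_view":
    case "list_view":
      return "View";
    case "list_click":
      return "Click";
    case "transaction":
      return "Purchase";
    default:
      return null; // corresponds to SKIP_EVENT
  }
}

console.log(mapEcommerceAction("transaction")); // "Purchase"
```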

Snowbridge will:
- Filter the event stream to ecommerce events
- Transform them into a common format
- Send them to your AWS Lambda

The Lambda will then:
- Submit the interaction to AWS Personalize as a real-time event

This allows AWS Personalize to react to new behaviour; it will periodically retrain and adjust its models to accommodate the newer observations.