diff --git a/tutorials/recommendations-with-aws-personalize/aws-personalize-setup.md b/tutorials/recommendations-with-aws-personalize/aws-personalize-setup.md
new file mode 100644
index 000000000..4bda3d441
--- /dev/null
+++ b/tutorials/recommendations-with-aws-personalize/aws-personalize-setup.md
@@ -0,0 +1,13 @@
+---
+title: AWS Personalize Setup
+position: 5
+---
+
+To proceed we need the following infrastructure:
+
+- A Dataset Group: the top-level resource for AWS Personalize; it maps to a use case (e.g. Ecommerce) and will contain our Datasets and Recommenders
+- A set of Schemas, one for each of our datasets above; these are defined in Avro format ([the repo has templates](https://github.com/snowplow-incubator/dbt-snowplow-recommendations/tree/main/aws_personalize_utilities/schemas))
+- The Dataset configurations themselves; these bind some use-case configuration to the Dataset Group and Schema
+- Supporting infrastructure, such as IAM roles and policies, to allow AWS Personalize to access your data
+- A Dataset Import to load your datasets from S3
+- Recommenders to choose a model, train on the dataset, and serve personalizations
\ No newline at end of file
diff --git a/tutorials/recommendations-with-aws-personalize/conclusion.md b/tutorials/recommendations-with-aws-personalize/conclusion.md
new file mode 100644
index 000000000..99af6d138
--- /dev/null
+++ b/tutorials/recommendations-with-aws-personalize/conclusion.md
@@ -0,0 +1,30 @@
+---
+position: 9
+title: Conclusion
+---
+
+# Conclusion
+
+In this tutorial, we have explored the **AWS Personalize** solution accelerator for feeding AWS Personalize with Snowplow data, enabling customers to update recommendations for their end-users in real time.
+
+## Key Takeaways
+
+### Understanding the Process
+
+We have successfully built a real-time system for processing event data, including:
+
+- Initial training data for AWS Personalize, to jump-start a base recommendation system with seed data;
+- Continuously feeding the AWS Personalize recommendation engine with new data as soon as it is generated, in real time.
+
+### Practical Applications
+
+This tutorial can be extended to utilize Snowplow event data for other real-time use cases, such as:
+
+- Web engagement analytics;
+- Ad performance tracking.
+
+## Next Steps
+- **Extend tracking:** Extend the solution to track more granular user interactions, or track on a new platform such as mobile;
+- **Expand use cases:** AWS Personalize can be used not just for recommendations, but also for hyper-personalization based on customers' history.
+
+By completing this tutorial, you are equipped to harness the power of event-driven systems and Snowplow’s analytics framework to build dynamic, real-time solutions tailored to your streaming and analytics needs.
\ No newline at end of file
diff --git a/tutorials/recommendations-with-aws-personalize/impression-integration.md b/tutorials/recommendations-with-aws-personalize/impression-integration.md
new file mode 100644
index 000000000..b7c7495fd
--- /dev/null
+++ b/tutorials/recommendations-with-aws-personalize/impression-integration.md
@@ -0,0 +1,167 @@
+---
+title: Impression Integration
+position: 8
+---
+
+To "close the loop" and allow AWS Personalize to improve the performance of its recommendations, it should be fed back information about how they perform.
+
+It is important to know which of its recommendations actually got clicked, so it can account for that in its models and optimize the performance of individual recommendations.
+
+For this to work, we need to track the recommendation IDs of the widgets rendered from its suggestions, along with the corresponding clicks.
+
+To do this, we need to adjust what was built so far:
+
+- The site tracking, to track the clicks (impressions can also be tracked to assess performance)
+- The Snowbridge config, to account for the click events
+- The Lambda code already understands how to handle events with impression information, so no changes are required there
+
+## Tracking Impressions
+
+The [Element Tracking plugin](https://github.com/snowplow/snowplow-javascript-tracker/pull/1400) can be used as follows:
+
+```javascript
+snowplow("addPlugin", "/cdn/shop/t/3/assets/element-tracker.umd.min.js", ["snowplowElementTracking", "SnowplowElementTrackingPlugin"]);
+
+// set up impression tracking
+snowplow("startElementTracking", {
+  elements: {
+    name: "recommendation-impression", // name the configuration something logical
+    selector: "[data-recommendation-id]", // selector will vary based on the widget implementation
+    expose: { when: "element", minPercentage: .5 }, // once per widget, only once it is 50% in view
+    component: true, // mark it as a component to get clicks
+    details: { dataset: ["recommendationId"] }, // extract the recommendation ID
+    contents: {
+      name: "recommendation-item",
+      selector: "[data-item-id]",
+      details: { dataset: ["itemId"] } // also extract the shown item IDs
+    }
+  }
+});
+
+// set up click tracking
+snowplow("getComponentListGenerator", function (_, componentGeneratorWithDetail) {
+  document.addEventListener("click", function (e) {
+    if (e.target.closest("a") && e.target.closest("[data-recommendation-id]")) {
+      const target = e.target.closest("a");
+      const details = componentGeneratorWithDetail(target);
+      snowplow("trackLinkClick", { element: target, context: details });
+    }
+  }, false);
+});
+```
+
+With this configuration, whenever the custom recommendations widget is in view, an `expose_element` event will be fired, like the following:
+
+```json
+{
+  "schema": "iglu:com.snowplowanalytics.snowplow/expose_element/jsonschema/1-0-0",
+  "data": {
+    "element_name": "recommendation-impression"
+  }
+}
+```
+
+This event will have an `element` entity describing the widget, including the recommendation/impression ID, like so:
+
+```json
+{
+  "schema": "iglu:com.snowplowanalytics.snowplow/element/jsonschema/1-0-0",
+  "data": {
+    "element_name": "recommendation-impression",
+    "width": 1600,
+    "height": 229.4166717529297,
+    "position_x": 160,
+    "position_y": 531.5,
+    "doc_position_x": 160,
+    "doc_position_y": 3329,
+    "element_index": 1,
+    "element_matches": 1,
+    "originating_page_view": "3d775590-74c6-4d0a-85ee-4d63d72bda2d",
+    "attributes": [
+      {
+        "source": "dataset",
+        "attribute": "recommendationId",
+        "value": "RID-24-4a6a-8380-506b189ff622-CID-529b19"
+      }
+    ]
+  }
+}
+```
+
+And it will also contain `element_content` entities for each item in the widget, capturing their product IDs, like the following:
+
+```json
+{
+  "schema": "iglu:com.snowplowanalytics.snowplow/element_content/jsonschema/1-0-0",
+  "data": {
+    "element_name": "recommendation-item",
+    "parent_name": "recommendation-impression",
+    "parent_position": 1,
+    "position": 1,
+    "attributes": [
+      {
+        "source": "dataset",
+        "attribute": "itemId",
+        "value": "48538191331628"
+      }
+    ]
+  }
+}
+```
+
+In addition, if the links in the widget are clicked, a regular `link_click` event will be generated - but because the widget is defined as a component, it will extract the same entities as an impression and include those, too.
+
+These `link_click` events are what we need to detect and forward to AWS Personalize.
+
+## Snowbridge Impressions and Clicks
+
+Back in the Snowbridge configuration, the `link_click` event needs to be included in the event selection regex:
+
+```hcl
+  regex = "^(snowplow_ecommerce_action|action|view_item|transaction_item|create_order)$" # before
+
+  regex = "^(snowplow_ecommerce_action|action|view_item|transaction_item|create_order|link_click)$" # after
+```
+
+The custom transform in `transform.js` then needs to be aware of them:
+
+```javascript
+    case 'link_click': // recommendation clicks
+      ep.Data.event_type = "Click";
+
+      const element = event.contexts_com_snowplowanalytics_snowplow_element_1 || [];
+      const content = event.contexts_com_snowplowanalytics_snowplow_element_content_1 || [];
+
+      if (!element.length) return SKIP_EVENT; // unrelated link_click
+      if (!content.length) return SKIP_EVENT; // unrelated link_click
+
+      let impressionId = null;
+
+      element.forEach((e) => {
+        if (e.element_name !== "recommendation-impression") return; // some other element/component
+        if (e.attributes) {
+          e.attributes.forEach((a) => {
+            if (a.source === "dataset" && a.attribute === "recommendationId") {
+              impressionId = a.value;
+            }
+          });
+        }
+      });
+
+      if (!impressionId) return SKIP_EVENT; // couldn't find impression info
+
+      const items = [];
+
+      content.forEach((ec) => {
+        if (ec.parent_name !== "recommendation-impression") return;
+        items.push(ec.attributes[0].value);
+      });
+
+      ep.Data.item_ids = items; // for simplicity we treat the first item as the clicked one
+      ep.Data.impression_id = impressionId;
+      break;
+    default:
+      return SKIP_EVENT;
+```
+
+Snowbridge will now send the clicked recommendation events to the Lambda, which will send them to AWS Personalize.
+
+AWS Personalize will now be able to optimize its recommendations based on how they perform.
\ No newline at end of file
diff --git a/tutorials/recommendations-with-aws-personalize/introduction.md b/tutorials/recommendations-with-aws-personalize/introduction.md
new file mode 100644
index 000000000..8eadd864b
--- /dev/null
+++ b/tutorials/recommendations-with-aws-personalize/introduction.md
@@ -0,0 +1,31 @@
+---
+title: Introduction
+position: 1
+---
+
+[AWS Personalize](https://aws.amazon.com/personalize/) is an ML-based solution for providing personalization and recommendation capabilities to end-users. It can use Snowplow data to power a range of use cases. It can be used via the AWS SDKs, and it also offers a UI in the AWS Console.
+
+Other use cases supported by AWS Personalize include:
+* Email personalization
+* Next best action
+* Search personalization
+* Media/content recommendations
+
+This accelerator demonstrates how Snowplow data can be used to feed AWS Personalize models. Any version of Snowplow that supports Snowbridge can be used, such as [Snowplow Local](https://github.com/snowplow-incubator/snowplow-local). For testing purposes, we recommend generating events using one of our examples that work with our out-of-the-box ecommerce events, like our [**Snowplow ecommerce store**](https://github.com/snowplow-industry-solutions/ecommerce-nextjs-example-store).
+
+## Key technologies
+
+* Snowplow: event tracking pipeline (Collector, Enrich, Kinesis sink)
+* Snowbridge: event forwarding module, part of Snowplow
+* AWS Personalize: the recommender technology
+* AWS Lambda: a public endpoint to receive Snowplow events and serve user recommendations, without disclosing AWS credentials
+* Terraform: an Infrastructure-as-Code tool, used to configure AWS Personalize and its dependent components
+
+Optionally, the following technology is recommended to complete this tutorial:
+
+* Python: a general purpose programming language
+
+### Event capture and ingestion with Snowplow
+
+- E-store front-end and Snowplow JavaScript tracker: user activity is captured as Snowplow ecommerce events
+- Snowplow to AWS Lambda to AWS Personalize: the Snowplow pipeline validates the events, enriches them with device and geolocation data, then forwards them to the corresponding AWS Lambda function, which feeds AWS Personalize
diff --git a/tutorials/recommendations-with-aws-personalize/meta.json b/tutorials/recommendations-with-aws-personalize/meta.json
new file mode 100644
index 000000000..59135a7e3
--- /dev/null
+++ b/tutorials/recommendations-with-aws-personalize/meta.json
@@ -0,0 +1,8 @@
+{
+  "title": "Recommendations system with AWS Personalize",
+  "label": "Solution accelerator",
+  "description": "Use Snowplow data to build a recommendations system with AWS Personalize.",
+  "useCases": ["Ecommerce", "Recommendations"],
+  "technologies": [],
+  "snowplowTech": ["Snowbridge"]
+}
diff --git a/tutorials/recommendations-with-aws-personalize/real-time-integration.md b/tutorials/recommendations-with-aws-personalize/real-time-integration.md
new file mode 100644
index 000000000..485bc0947
--- /dev/null
+++ b/tutorials/recommendations-with-aws-personalize/real-time-integration.md
@@ -0,0 +1,140 @@
+---
+title: Real-time Integration
+position: 7
+---
+
+With recommendations now being served on site and influencing behaviour, the next step is to give AWS Personalize a feed of events as they occur, so it can keep its suggestions up to date.
+
+To do this, utilize Snowbridge to intercept events coming from the site and send them to AWS Personalize.
+
+While Snowbridge has support for making requests to HTTP APIs, it can't perform the request authentication required to send events to an AWS API directly, so the AWS Lambda function will be adjusted to handle that part, with Snowbridge simply sending the events to it.
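+
+As a rough sketch, the forwarding side of the Lambda might look like the following. This is a minimal illustration, not the accelerator's implementation (see [`lambda_app.py`](https://github.com/snowplow-industry-solutions/ecommerce-recsys-with-amazon-personalize/blob/main/aws_personalize_utilities/lambda_app.py) for that): the function shape and environment variable name are assumptions, and the payload fields match the common format produced by the transform below.
+
+```py
+import os
+import boto3
+
+personalize_events = boto3.client("personalize-events")
+TRACKING_ID = os.environ["PERSONALIZE_TRACKING_ID"]  # assumed env var holding the event tracker ID
+
+def forward_interaction(payload):
+    """Submit one Snowbridge-delivered interaction to AWS Personalize."""
+    personalize_events.put_events(
+        trackingId=TRACKING_ID,
+        userId=payload["user_id"],
+        sessionId=payload["session_id"],
+        eventList=[{
+            "eventId": payload["event_id"],       # lets Personalize deduplicate retries
+            "eventType": payload["event_type"],   # View / Click / Purchase
+            "itemId": payload["item_ids"][0],     # the (first) interacted item
+            "sentAt": payload["sent_at"] / 1000,  # sent_at is in milliseconds
+            # for recommendation clicks, the impression info can be passed back too:
+            # "recommendationId": payload.get("impression_id"),
+            # "impression": payload.get("item_ids"),
+        }],
+    )
+```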
+
+Start with a Snowbridge configuration to filter our events and do some simple transforms:
+
+```hcl
+transform {
+  use "spEnrichedFilter" {
+    atomic_field = "event_name"
+    regex = "^(snowplow_ecommerce_action|action|view_item|transaction_item|create_order)$" # filter to only ecommerce events you're interested in
+    filter_action = "keep"
+  }
+}
+
+transform {
+  use "js" {
+    script_path = "/tmp/transform.js" # use a custom JS transform on the payload; not all ecommerce events can be filtered by just `event_name` so more checks are needed there
+    snowplow_mode = true # turn the TSV record into JSON for our transform function to handle
+  }
+}
+
+target {
+  use "http" {
+    url = "https://your-lambda-instance.execute-api.your-aws-region.amazonaws.com/interaction/recommendations_demo_dsg_ecommerce" # the path ends with the dataset group name
+    content_type = "application/json"
+    basic_auth_username = "snowbridge"
+    basic_auth_password = ">_-G6xdYDjU?O4NXGpc4" # use a password to prevent abuse and false event submission
+  }
+}
+```
+
+This configuration requires a `transform.js` file that describes how to translate the different e-commerce events into the interactions AWS Personalize is expecting:
+
+```js
+/**
+ * @typedef {object} EngineProtocol
+ * @property {boolean} [FilterOut]
+ * @property {string} [PartitionKey]
+ * @property {string | object} [Data]
+ * @property {Record<string, string>} [HTTPHeaders]
+ */
+
+const SKIP_EVENT = { FilterOut: true };
+
+/**
+ * @param {EngineProtocol} ep
+ * @returns {EngineProtocol}
+ */
+function main(ep) {
+  if (typeof ep.Data === "string") return SKIP_EVENT; // should be in snowplow_mode
+
+  const event = ep.Data;
+
+  const ts = (event.derived_tstamp || event.collector_tstamp).UnixMilli();
+
+  const client_session = (event.contexts_com_snowplowanalytics_snowplow_client_session_1 || [])[0] || {};
+
+  ep.Data = {
+    event_id: event.event_id,
+    event_type: "",
+    user_id: event.user_id || event.domain_userid || client_session.userId,
+    session_id: event.domain_sessionid || client_session.sessionId,
+    item_ids: undefined,
+    sent_at: ts,
+  };
+
+  let payload = undefined;
+  let products = undefined;
+
+  switch (event.event_name) {
+    case 'transaction_item': // classic ecommerce
+      ep.Data.event_type = "Purchase";
+      ep.Data.item_ids = [event.ti_sku];
+      break;
+    case 'action': // enhanced ecommerce
+      payload = event.unstruct_event_com_google_analytics_enhanced_ecommerce_action_1;
+      products = event.contexts_com_google_analytics_enhanced_ecommerce_product_1;
+      if (!payload || !payload.action || !products) return SKIP_EVENT;
+      ep.Data.item_ids = products.map((i) => i.id);
+      if (payload.action === "view") {
+        ep.Data.event_type = "View";
+      } else if (payload.action === "click") {
+        ep.Data.event_type = "Click";
+      } else if (payload.action === "purchase") {
+        ep.Data.event_type = "Purchase";
+      } else return SKIP_EVENT;
+      break;
+    case 'snowplow_ecommerce_action': // snowplow ecommerce
+      payload = event.unstruct_event_com_snowplowanalytics_snowplow_ecommerce_snowplow_ecommerce_action_1;
+      products = event.contexts_com_snowplowanalytics_snowplow_ecommerce_product_1;
+      if (!payload || !payload.type || !products) return SKIP_EVENT;
+      ep.Data.item_ids = products.map((i) => i.id);
+      if (payload.type === "product_view") {
+        ep.Data.event_type = "View";
+      } else if (payload.type === "list_view") {
+        ep.Data.event_type = "View";
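+        // Assumption: a list_view exposes several products at once, so we report a
+        // View for each listed product; if this inflates View counts for your
+        // models, return SKIP_EVENT here instead.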
+      } else if (payload.type === "list_click") {
+        ep.Data.event_type = "Click";
+      } else if (payload.type === "transaction") {
+        ep.Data.event_type = "Purchase";
+      } else return SKIP_EVENT;
+      break;
+    case 'view_item': // hyper-t ecommerce
+      payload = event.unstruct_event_io_snowplow_ecomm_view_item_1;
+      if (!payload) return SKIP_EVENT;
+      ep.Data.event_type = "View";
+      ep.Data.item_ids = [payload.item_id];
+      break;
+    case 'create_order': // hyper-t ecommerce
+      ep.Data.event_type = "Purchase";
+      payload = (event.contexts_io_snowplow_ecomm_cart_1 || [])[0]; // entity data arrives as an array
+      if (!payload || !payload.items_in_cart) return SKIP_EVENT;
+      ep.Data.item_ids = payload.items_in_cart.map((i) => i.item_id);
+      break;
+    default:
+      return SKIP_EVENT;
+  }
+
+  if (!ep.Data.item_ids || !ep.Data.event_type) return SKIP_EVENT;
+  return ep;
+}
+```
+
+Snowbridge will:
+- Filter the event stream to ecommerce events
+- Transform them into a common format
+- Send them to your AWS Lambda
+
+The Lambda will then:
+- Submit the interaction to AWS Personalize as a real-time event
+
+This allows AWS Personalize to react to new behaviour, and it will periodically retrain itself and adjust its models to accommodate the newer observations.
\ No newline at end of file
diff --git a/tutorials/recommendations-with-aws-personalize/serving-recommendations.md b/tutorials/recommendations-with-aws-personalize/serving-recommendations.md
new file mode 100644
index 000000000..9cfc29bed
--- /dev/null
+++ b/tutorials/recommendations-with-aws-personalize/serving-recommendations.md
@@ -0,0 +1,73 @@
+---
+title: Serving Recommendations
+position: 6
+---
+
+As an AWS service, AWS Personalize requires AWS credentials in order to interact with its APIs and get recommendations.
+
+The API is roughly split into 3 parts:
+
+- Personalize: the main API for dealing with the service itself and its resources; we mostly used this in the previous section when creating infrastructure
+- Personalize Runtime: used to talk to Recommenders and get actual personalization results from our models
+- Personalize Events: for ingesting real-time events _back_ into our Recommenders; we'll use this later
+
+A list of recommenders can be obtained with the following Python code:
+
+```py
+import os
+import boto3
+from pprint import pprint
+
+region = 'us-east-2' # Replace here with your own region
+try:
+    region = os.environ['AWS_REGION']
+except KeyError:
+    pass # fall back to the default above
+
+session = boto3.Session(
+    aws_access_key_id=os.environ['AWS_ACCESS_KEY_ID'],
+    aws_secret_access_key=os.environ['AWS_SECRET_ACCESS_KEY'],
+    region_name=region,
+)
+
+personalize = session.client('personalize')
+runtime = session.client('personalize-runtime')
+
+recommender_list = personalize.list_recommenders()
+
+recommenders = { r["name"]: r["recommenderArn"] for r in recommender_list["recommenders"] }
+pprint(recommenders)
+```
+
+You'll need the ARN of a Recommender to request recommendations from it. Each Recommender is tied to a specific model and has its own requirements for what a request for recommendations must include.
+
+One of these recommenders, `most_viewed`, can serve recommendations with the following logic:
+
+```py
+# `runtime` was declared previously
+
+recs = runtime.get_recommendations(
+    recommenderArn=recommenders["most_viewed"],
+    userId="776f1a6e-bccd-46ff-ad79-1319a3f833b7", # domain user id picked randomly from our dataset
+)
+
+pprint(recs)
+```
+
+The Recommender returns a list of recommended items, along with a `recommendationId` that can be used for impression tracking.
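+
+For example, to prepare these results for rendering in a widget, the item IDs and the recommendation ID can be pulled out of the response (field names as returned by the Personalize Runtime `get_recommendations` call above):
+
+```py
+# `recs` is the response from the get_recommendations call above
+item_ids = [item["itemId"] for item in recs["itemList"]]
+recommendation_id = recs.get("recommendationId") # keep this for impression tracking later
+
+pprint((recommendation_id, item_ids))
+```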
+
+Since this API and its credentials shouldn't be exposed to visitors to our site, a service of some kind is required that can take requests from those visitors, talk to the Recommender on their behalf, and return the recommendations for display on the site. [The Terraform example found in the corresponding accelerator repository](https://github.com/snowplow-industry-solutions/ecommerce-recsys-with-amazon-personalize/blob/main/terraform_utilities/aws_personalize_module/app_module/lambda/lambda.tf) contains the deployment of [an AWS Lambda app](https://github.com/snowplow-industry-solutions/ecommerce-recsys-with-amazon-personalize/blob/main/aws_personalize_utilities/lambda_app.py) that encapsulates the API logic and keeps the sensitive credentials in environment variables. Once deployed, it should work as demonstrated below:
+
+```py
+import requests
+from pprint import pprint
+
+resp = requests.post("https://your-lambda-instance.execute-api.your-aws-region.amazonaws.com/", json={
+    "method": "get_recommendations",
+    "user_id": "776f1a6e-bccd-46ff-ad79-1319a3f833b7", # domain user id picked randomly from our dataset
+    "recommender": "most_viewed",
+})
+
+recs = resp.json()
+pprint(recs)
+```
\ No newline at end of file
diff --git a/tutorials/recommendations-with-aws-personalize/setup.md b/tutorials/recommendations-with-aws-personalize/setup.md
new file mode 100644
index 000000000..165fa20e5
--- /dev/null
+++ b/tutorials/recommendations-with-aws-personalize/setup.md
@@ -0,0 +1,17 @@
+---
+position: 2
+title: Installation and setup
+---
+
+This tutorial can be executed with an AWS account and [our corresponding Colab notebook](https://colab.research.google.com/drive/19T6EICwF5nF4yrA7ftS3pE-zp9huxaYk), which utilizes our demo stores as a source of events. The rest of this tutorial guides you through setting up AWS Personalize with your own sources.
+
+Different stages of this tutorial use Python for code examples, although any programming language with an AWS SDK can be used.
+
+Supporting files can be found at [the corresponding GitHub repository](https://github.com/snowplow-industry-solutions/ecommerce-recsys-with-amazon-personalize). In there you should find:
+
+- [An example of how to implement your own Lambda function (in Python)](https://github.com/snowplow-industry-solutions/ecommerce-recsys-with-amazon-personalize/blob/main/aws_personalize_utilities/lambda_app.py);
+  - There's [an alternative implementation using Flask](https://github.com/snowplow-industry-solutions/ecommerce-recsys-with-amazon-personalize/blob/main/aws_personalize_utilities/flask_app.py).
+- [Different Terraform modules to help you set up AWS Personalize, and/or Snowflake and Databricks instances](https://github.com/snowplow-industry-solutions/ecommerce-recsys-with-amazon-personalize/tree/main/terraform_utilities);
+- [A specific data model, to help you extract the relevant data to prepare a training dataset for AWS Personalize](https://github.com/snowplow-industry-solutions/ecommerce-recsys-with-amazon-personalize/tree/main/dbt-snowplow-recommendations).
+
+AWS Personalize generally requires custom infrastructure to run. Some of that infrastructure can take time to deploy (20+ minutes), so you may need to make sure it's spun up before starting to work with it.
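+
+If you are unsure whether a resource has finished deploying, you can poll its status with a small script like this (a sketch using `boto3`; the Recommender ARN is a placeholder — the serving section shows how to list the real ones):
+
+```py
+import time
+import boto3
+
+personalize = boto3.client('personalize', region_name='us-east-2')
+
+# placeholder ARN; use personalize.list_recommenders() to find yours
+recommender_arn = 'arn:aws:personalize:us-east-2:123456789012:recommender/most_viewed'
+
+while True:
+    status = personalize.describe_recommender(
+        recommenderArn=recommender_arn
+    )['recommender']['status']
+    print('recommender status:', status)
+    if status in ('ACTIVE', 'CREATE FAILED'):
+        break
+    time.sleep(60) # deployment can take 20+ minutes
+```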
diff --git a/tutorials/recommendations-with-aws-personalize/tracking-setup.md b/tutorials/recommendations-with-aws-personalize/tracking-setup.md
new file mode 100644
index 000000000..fabe64746
--- /dev/null
+++ b/tutorials/recommendations-with-aws-personalize/tracking-setup.md
@@ -0,0 +1,19 @@
+---
+title: Tracking Setup
+position: 3
+---
+
+The recommended tracking setup relies on events from the ["Yuper Transactional" e-commerce schema](https://iglucentral.com/?q=io.snowplow.ecomm). This includes events like:
+
+- Page views
+- Product views
+- Product clicks
+- Add to cart / Remove from cart
+- Collection viewed
+- Search
+- Checkout start
+- Purchase (checkout end)
+
+Of these, we will use the Product View events as the main signal to train AWS Personalize on. Most code we deploy will also work with the other e-commerce plugins.
+
+We will also use the Element Tracking plugin's `expose_element` events for measuring impressions of our Recommendations once we are serving them.
\ No newline at end of file
diff --git a/tutorials/recommendations-with-aws-personalize/training-data.md b/tutorials/recommendations-with-aws-personalize/training-data.md
new file mode 100644
index 000000000..39973f90a
--- /dev/null
+++ b/tutorials/recommendations-with-aws-personalize/training-data.md
@@ -0,0 +1,9 @@
+---
+title: Training Data
+position: 4
+---
+
+In order for the AWS Personalize model to serve usable results, we need to give it some initial training on our actual store.
+Out of the box, Personalize will have no idea about our customers or products, let alone the relationships between them and what makes a good recommendation.
+
+To solve this "cold start" problem, we need to export our catalog information for AWS Personalize to read, and give it an initial training dataset of interactions for it to build a model from.
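+
+As an illustration of the shape of such an interactions dataset: it is a CSV whose columns match the fields of the Avro schema from the setup steps. A minimal sketch follows — the values are made up, and the column names assume a typical Personalize interactions schema; in practice the dbt data model in the accelerator repository produces this data from your Snowplow events:
+
+```py
+import csv
+
+# illustrative rows only: user/item IDs from your store, timestamps in Unix seconds
+rows = [
+    ("776f1a6e-bccd-46ff-ad79-1319a3f833b7", "48538191331628", 1718000000, "View"),
+    ("776f1a6e-bccd-46ff-ad79-1319a3f833b7", "48538191331628", 1718000060, "Purchase"),
+]
+
+with open("interactions.csv", "w", newline="") as f:
+    writer = csv.writer(f)
+    writer.writerow(["USER_ID", "ITEM_ID", "TIMESTAMP", "EVENT_TYPE"]) # must match the schema fields
+    writer.writerows(rows)
+```
\ No newline at end of file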