Tag Archives: Amazon Personalize

Building a generative AI Marketing Portal on AWS

Post Syndicated from Tristan Nguyen original https://aws.amazon.com/blogs/messaging-and-targeting/building-a-generative-ai-marketing-portal-on-aws/

Introduction

In the preceding entries of this series, we examined the transformative impact of Generative AI on marketing strategies in “Building Generative AI into Marketing Strategies: A Primer” and delved into the intricacies of Prompt Engineering to enhance the creation of marketing content with services such as Amazon Bedrock in “From Prompt Engineering to Auto Prompt Optimisation”. We also explored the potential of Large Language Models (LLMs) to refine prompts for more effective customer engagement.

Continuing this exploration, we will articulate how Amazon Bedrock, Amazon Personalize, and Amazon Pinpoint can be leveraged to construct a marketer portal that not only facilitates AI-driven content generation but also personalizes and distributes this content effectively. The aim is to provide a clear blueprint for deploying a system that crafts, personalizes, and distributes marketing content efficiently. This blog will guide you through the deployment process, underlining the real-world utility of these services in optimizing marketing workflows. Through use cases and a code demonstration, we’ll see these technologies in action, offering a hands-on perspective on enhancing your marketing pipeline with AI-driven solutions.

The Challenge with Content Generation in Marketing

Many companies struggle to streamline their marketing operations effectively, facing hurdles at various stages of the marketing operations pipeline. Below, we list the challenges at three main stages of the pipeline: content generation, content personalization, and content distribution.

Content Generation

Creating high-quality, engaging content is often easier said than done. Companies need to invest in skilled copywriters or content creators who understand not just the product but also the target audience. Even with the right talent, the process can be time-consuming and costly. Moreover, generating content at scale while maintaining quality and compliance with industry regulations is a key blocker for many companies considering adopting generative AI technologies in production environments.

Content Personalization

Once the content is created, the next hurdle is personalization. In today’s digital age, generic content rarely captures attention. Customers expect content tailored to their needs, preferences, and behaviors. However, personalizing content is not straightforward. It requires a deep understanding of customer data, which often resides in siloed databases, making it difficult to create a 360-degree view of the customer.

Content Distribution

Finally, even the most captivating, personalized content is ineffective if it doesn’t reach the right audience at the right time. Companies often grapple with choosing the appropriate channels for content distribution, be it email, social media, or mobile notifications. Additionally, ensuring that the content complies with various regulations and doesn’t end up in spam folders adds another layer of complexity to the distribution phase. Sending at scale requires paying attention to deliverability, security, and reliability, which often poses significant challenges to marketers.

By addressing these challenges, companies can significantly improve their marketing operations and empower their marketers to be more effective. But how can this be achieved efficiently and at scale? The answer lies in leveraging the power of Amazon Bedrock, Amazon Personalize, and Amazon Pinpoint, as we will explore in the following solution.

The Solution In Action

Before we dive into the details of the implementation, let’s take a look at the end result through the linked demo video.

Use Case 1: Banking/Financial Services Industry

You are a relationship manager working in the Consumer Banking department of a fictitious company called AnyCompany Bank. You are assigned a group of customers and would like to send personalized, targeted communications to every member of this group through their channel of choice.

Behind the scenes, the marketer uses Amazon Pinpoint to create the segment of customers they would like to target. The customers’ information and the marketer’s prompt are then fed into Amazon Bedrock to generate the marketing content, which is sent to each customer via SMS and email using Amazon Pinpoint.
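
For illustration, the following minimal sketch shows how a backend might ask Amazon Bedrock to draft the copy and then hand the result to Amazon Pinpoint for delivery. The model ID, prompt, phone number, and project ID are placeholders, and the request body assumes an Anthropic Claude messages-style model on Bedrock; the portal’s actual implementation is in the linked GitHub repository.

import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
pinpoint = boto3.client("pinpoint", region_name="us-east-1")

# Generate a draft SMS with Amazon Bedrock (Anthropic Claude messages format assumed).
prompt = "Write a short, friendly SMS promoting our new savings account to a retired customer."
response = bedrock.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # placeholder model ID
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 200,
        "messages": [{"role": "user", "content": prompt}],
    }),
)
draft = json.loads(response["body"].read())["content"][0]["text"]

# Send the generated copy over SMS with Amazon Pinpoint.
pinpoint.send_messages(
    ApplicationId="<pinpoint-project-id>",  # placeholder
    MessageRequest={
        "Addresses": {"+14255550123": {"ChannelType": "SMS"}},  # placeholder number
        "MessageConfiguration": {"SMSMessage": {"Body": draft, "MessageType": "PROMOTIONAL"}},
    },
)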

  • In the Prompt Iterator page, you can employ a process called “prompt engineering” to further optimize your prompt and maximize the effectiveness of your marketing campaigns. Please refer to this blog on the process behind engineering the prompt, as well as how to apply an additional LLM for auto-prompting. To get started, simply copy the sample banking prompt, which has already gone through the prompt engineering process, on this page.
  • Next, you can either import your customer group from a .csv file (through “Importing a Segment”) or define a customer group using pre-defined filter criteria applied to your existing customer database in Amazon Pinpoint.

UseCase1Segment

For example, the screenshot shows a sample filtered segment named ManagementOrRetired that includes only customers who are in management or retired.

  • Once done, you can log into the marketer portal and choose the relevant segment that you’ve just created within the Amazon Pinpoint console.

PinpointSegment

  • You can then preview the customers and their information stored in your Amazon Pinpoint’s customer database. Once satisfied, we’re ready to start generating content for those customers!
  • Click on the 1:1 Content Generator tab and content is automatically generated for your first customer. Here, you can cycle through your customers one by one; depending on each customer’s preferred language and channel, an email or SMS is automatically generated for them in that language.
    • Generated SMS in English

PostiveSMS

    • A negative example showing prompt engineering at work to moderate content. This happens when we supply data that the marketing content generator should not act on. In this case, the generator (justifiably) refuses to produce an advertisement targeting a 6-year-old with a secured instalment loan.

NegativeSMS

  • Finally, we choose to send the generated content via Amazon Pinpoint by clicking on “Send with Amazon Pinpoint”. In the back end, Amazon Pinpoint will orchestrate the sending of the email/SMS through the appropriate channels.
    • Alternatively, if the auto-generated content does not meet your needs and you want another draft, you can choose Disagree and try again.

Use Case 2: Travel & Hospitality

You are a marketing executive working for an online air ticketing agency. You’ve been tasked with promoting a specific flight from Singapore to Hong Kong for AnyCompany Airlines. You’d first like to identify which customers are prime candidates for this flight leg and then send hyper-personalized messages to them.

Behind the scenes, instead of manually defining the segment in Amazon Pinpoint, the marketer in this case leverages the AI/ML capabilities of Amazon Personalize to identify the group of customers best suited for the specific flight leg. As in the previous use case, the customers’ information and the LLM prompt are fed into Amazon Bedrock, which generates the marketing content that is eventually sent out via Amazon Pinpoint. A minimal sketch of the underlying batch segmentation call appears below.
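
For reference, a batch segmentation job of the kind the portal triggers can be created with the Amazon Personalize API. The sketch below uses placeholder ARNs, S3 paths, and a role name, and assumes a solution version trained on a user-segmentation recipe.

import boto3

personalize = boto3.client("personalize")

# Start a batch segment job that returns the users most likely to engage with
# the selected flight leg (item). All ARNs and S3 paths are placeholders.
personalize.create_batch_segment_job(
    jobName="anycompany-sin-hkg-segment",
    solutionVersionArn="arn:aws:personalize:us-east-1:111122223333:solution/flights/<version>",
    numResults=250,  # number of users to return
    jobInput={"s3DataSource": {"path": "s3://example-bucket/batch/input/flight-leg.json"}},
    jobOutput={"s3DataDestination": {"path": "s3://example-bucket/batch/output/"}},
    roleArn="arn:aws:iam::111122223333:role/PersonalizeS3AccessRole",
)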

  • As in the previous use case, you’d need to go through a prompt engineering process to ensure that the content the LLM generates is relevant and safe for use. To get started quickly, go to the Prompt Iterator page, where you can use the sample airlines prompt and iterate from there.
  • Your company offers many different flight legs, aggregated from many different carriers. You first filter down to the flight leg that you want to promote using the Filters on the left. In this case, we are filtering for flights originating from Singapore (SRCCity) and going to Hong Kong (DSTCity), operated by AnyCompany Airlines.

PersonalizeInstructions

  • Now, choose the number of customers you’d like Amazon Personalize to return. Once satisfied, start the batch segmentation job.
  • In the background, Amazon Personalize generates a group of customers that are most likely to be interested in this flight leg based on past interactions with similar flight itineraries.
  • Once the segmentation job is finished as shown, you can fetch the recommended group of customers and start generating content for them immediately, similar to the first use case.

Setup instructions

The setup instructions and deployment details can be found in the GitHub link.

Conclusion

In this blog, we’ve explored the transformative potential of integrating Amazon Bedrock, Amazon Personalize, and Amazon Pinpoint to address the common challenges in marketing operations. By automating the content generation with Amazon Bedrock, personalizing at scale with Amazon Personalize, and ensuring precise content distribution with Amazon Pinpoint, companies can not only streamline their marketing processes but also elevate the customer experience.

The benefits are clear: time-saving through automation, increased operational efficiency, and enhanced customer satisfaction through personalized engagement. This integrated solution empowers marketers to focus on strategy and creativity, leaving the heavy lifting to AWS’s robust AI and ML services.

For those ready to take the next step, we’ve provided a comprehensive guide and resources to implement this solution. By following the setup instructions and leveraging the provided prompts as a starting point, you can deploy this solution and begin customizing the marketer portal to your business’ needs.

Call to Action

Don’t let the challenges of content generation, personalization, and distribution hold back your marketing potential. Deploy the Generative AI Marketer Portal today, adapt it to your specific needs, and watch as your marketing operations transform. For a hands-on start and to see this solution in action, visit the GitHub repository for detailed setup instructions.

Have a question? Share your experiences or leave your questions in the comment section.

About the Authors

Tristan (Tri) Nguyen

Tristan (Tri) Nguyen

Tristan (Tri) Nguyen is an Amazon Pinpoint and Amazon Simple Email Service Specialist Solutions Architect at AWS. At work, he specializes in technical implementation of communications services in enterprise systems and architecture/solutions design. In his spare time, he enjoys chess, rock climbing, hiking and triathlon.

Philipp Kaindl

Philipp Kaindl

Philipp Kaindl is a Senior Artificial Intelligence and Machine Learning Solutions Architect at AWS. With a background in data science and
mechanical engineering his focus is on empowering customers to create lasting business impact with the help of AI. Outside of work, Philipp enjoys tinkering with 3D printers, sailing and hiking.

Bruno Giorgini

Bruno Giorgini

Bruno Giorgini is a Senior Solutions Architect specializing in Pinpoint and SES. With over two decades of experience in the IT industry, Bruno has been dedicated to assisting customers of all sizes in achieving their objectives. When he is not crafting innovative solutions for clients, Bruno enjoys spending quality time with his wife and son, exploring the scenic hiking trails around the SF Bay Area.

Push notification engagement metrics tracking

Post Syndicated from Pavlos Ioannou Katidis original https://aws.amazon.com/blogs/messaging-and-targeting/push-notification-engagement-metrics-tracking/

In this blog you will learn how to track and attribute Amazon Pinpoint push notification events for Campaigns and Journeys via API.

Amazon Pinpoint is a multichannel customer engagement platform that allows you to engage with your customers across six different channels. Amazon Pinpoint’s push notification channel can send messages to your mobile app users via Firebase Cloud Messaging (FCM), Apple Push Notification service (APNs), Baidu Cloud Push, and Amazon Device Messaging (ADM).

Push notifications are a preferred communication channel because they reach your app users even when they are not in your app. This increases app engagement and the probability that customers convert. Additionally, users who download your app but don’t register can still be targeted and receive your messages.

Using Amazon Pinpoint’s push notification channel, you can engage users with highly curated content. The messages can be personalized with customer data stored in Amazon Pinpoint, images, deep links, and custom alert sounds – read more here. Amazon Pinpoint Campaigns and Journeys enable marketers to schedule communications and build multichannel experiences, and they offer developers a rich API for sending messages. By default, all Amazon Pinpoint accounts are configured to send 25,000 messages per second, which can be increased by requesting a quota increase.

Measuring success of your communications is paramount for optimizing future customer engagements. Amazon Pinpoint push notifications offer the following three events:

  • _opened_notification – This event type indicates that the recipient tapped the notification to open it.
  • _received_foreground – This event type indicates that the recipient received the message as a foreground notification.
  • _received_background – This event type indicates that the recipient received the message as a background notification.

To track the above events from your mobile application, we recommend using AWS Amplify’s push notification library, which is currently available only for React Native.

Solution description

This blog provides an alternative to AWS Amplify for Amazon Pinpoint push notification tracking. Specifically, it utilizes Amazon Pinpoint’s Events API operation, which can be used to record events your customers generate on your mobile or web application. The same API operation can be used to record push notification engagement events.

The Events API operation request body is populated with the Campaign or Journey attributes received via the push notification payload metadata. These attributes allow Amazon Pinpoint to attribute the events back to the correct Campaign or Journey.

This blog provides examples of campaign, journey, and transactional push notification payloads and shows how to correctly populate the Events API operation. Furthermore, it shares an architecture to securely call Amazon Pinpoint’s API from your application’s frontend.

Prerequisites

This post assumes that you already have an Amazon Pinpoint project that is correctly configured to send push notifications to your various endpoints using Campaigns or Journeys. Refer to the getting started guide and setting up Amazon Pinpoint mobile push channels for information on how to set up your Amazon Pinpoint project.

You will also need the AWS Mobile SDKs for the respective platform of your apps. The following are the repositories that can be used:

Implementation

The push notification payload received from the application differs between campaign, journey and transactional messages. This blog provides examples for campaign, journey and transactional message payloads as well as how to populate the Amazon Pinpoint Events API request body correctly to report push notification tracking data to Amazon Pinpoint.

Push notification message payload examples:

Campaign payload example:

{
   "pinpoint.openApp":"true",
   "pinpoint.campaign.treatment_id":"0",
   "pinpoint.notification.title":"Message title",
   "pinpoint.notification.body":"Message body",
   "data":"{\"pinpoint\":{\"endpointId\":\"endpoint_id1\",\"userId\":\"user_id1\"}}",
   "pinpoint.campaign.campaign_id":"5befa9dc28b1430cb0469554789e3f99",
   "pinpoint.notification.silentPush":"0",
   "pinpoint.campaign.campaign_activity_id":"613f918c7a4440b69b09c4806d1a9357",
   "receivedAt":"1671009494989",
   "sentAt":"1671009495484"
}

Journey payload example:

{
   "pinpoint.openApp":"true",
   "pinpoint.notification.title":"Message title",
   "pinpoint":{
      "journey":{
         "journey_activity_id":"ibcF4z9lsp",
         "journey_run_id":"5df6dd97f9154cb688afc0b41ab221c3",
         "journey_id":"dc893692ea9848faa76cceef197c5305"
      }
   },
   "pinpoint.notification.body":"Message body",
   "data":"{\"pinpoint\":{\"endpointId\":\"endpoint_id1\",\"userId\":\"user_id1\"}}",
   "pinpoint.notification.silentPush":"0"
}

Transactional payload example:

Note that the transactional payload is the same for messages sent to a push notification token and to an endpoint ID. Additionally, the pinpoint.campaign.campaign_id is always set to _DIRECT.

{
   "pinpoint.openApp":"true",
   "pinpoint.notification.title":"Message title",
   "pinpoint.notification.body":"Message body",
   "pinpoint.campaign.campaign_id":"_DIRECT",
   "pinpoint.notification.silentPush":"0",
   "receivedAt":"1671731433375",
   "sentAt":"1671731433565"
}

Recording push notification events

To record push notification events from your mobile or web application, we will leverage the AWS Mobile SDKs or the Amazon Pinpoint Events API. To prevent inaccurate metrics such as double counting, we recommend using the appropriate endpoint_id, as Pinpoint uses it for de-duplication. Below you can find examples for both the Events REST API and the put_events operation in the AWS Python SDK (Boto3). Visit this page for more information on how to create a signed AWS API request.

Campaign event example – REST API:

Required fields: endpoint_id1, EventType, Timestamp, campaign_id and campaign_activity_id

POST https://pinpoint.us-east-1.amazonaws.com/v1/apps/<Pinpoint-App-id>/events

{
   "BatchItem":{
      "<endpoint_id1>":{
         "Endpoint":{}
       },
      "Events":{
         "<event_id>":{
            "EventType":"_campaign.opened_notification",
            "Timestamp":"2022-12-14T09:50:00.000Z",
            "Attributes":{
               "treatment_id":"0",
               "campaign_id":"5befa9dc28b1430cb0469554789e3f99",
               "campaign_activity_id":"613f918c7a4440b69b09c4806d1a9357"
            }
         }
      }
   }
}

Campaign event example – Python SDK:

Required fields: ApplicationId, endpoint_id, EventType, Timestamp, campaign_id and campaign_activity_id

import boto3

client = boto3.client("pinpoint")

# BatchItem keys are endpoint IDs; each endpoint carries an Events map keyed by event ID.
response = client.put_events(
  ApplicationId = "<Pinpoint-App-id>",
  EventsRequest = {
    "BatchItem": {
      "<endpoint_id1>": {
        "Endpoint": {},
        "Events": {
          "<event_id>": {
            "EventType":"_campaign.opened_notification",
            "Timestamp": "2022-12-14T09:50:00.000Z",
            "Attributes": {
              "treatment_id":"0",
              "campaign_id":"5befa9dc28b1430cb0469554789e3f99",
              "campaign_activity_id":"613f918c7a4440b69b09c4806d1a9357"
            }
          }
        }
      }
    }
  }
)
print(response)

Journey event example – REST API:

Required fields: endpoint_id, EventType, Timestamp, journey_id and journey_activity_id

POST https://pinpoint.us-east-1.amazonaws.com/v1/apps/<Pinpoint-App-id>/events

{
   "BatchItem":{
      "<endpoint_id1>":{
         "Endpoint":{}
      },
      "Events":{
         "<event_id>":{
            "EventType":"_journey.opened_notification",
            "Timestamp":"2022-12-14T09:50:00.000Z",
            "Attributes":{
               "journey_id":"5befa9dc28b1430cb0469554789e3f99",
               "journey_activity_id":"613f918c7a4440b69b09c4806d1a9357"
            }
         }
      }
   }
}

Journey event example – Python SDK:

Required fields: ApplicationId, endpoint_id1, EventType, Timestamp, journey_id and journey_activity_id

import boto3 
client = boto3.client("pinpoint")
response = client.put_events(
  ApplicationId = "<Pinpoint-App-id>",
  EventsRequest = { 
    "BatchItem": {
      "<endpoint_id1>": {
        "Endpoint": {},
        "Events": { 
          "<event_id>": { 
            "EventType":"_journey.opened_notification",
            "Timestamp": "2022-12-14T09:50:00.000Z",
            "Attributes": {
              "journey_id":"5befa9dc28b1430cb0469554789e3f99",
              "journey_activity_id":"613f918c7a4440b69b09c4806d1a9357"
            }
          }
        }
      }
    }
  }
)
print(response)

Transactional event:

Amazon Pinpoint doesn’t support push notification metrics for transactional messages. Specifically, the transactional push notification payload doesn’t provide an identifier, such as a Campaign ID or Journey ID, that can be used as an Amazon Pinpoint event attribute to attribute engagement events for data reconciliation purposes. These engagement events can still be recorded using Amazon Pinpoint’s Events API, as sketched below.
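
If you want to capture these opens anyway, one option is to record them as custom events. The sketch below uses placeholder application, endpoint, and event IDs, and the event type name is an arbitrary example rather than one of Pinpoint’s reserved _campaign or _journey types.

import boto3
from datetime import datetime, timezone

client = boto3.client("pinpoint")

# Record a custom event for a transactional push open. The event type
# "transactional.opened_notification" is an arbitrary name, not a reserved
# Amazon Pinpoint event type; pick a convention that suits your analytics.
client.put_events(
    ApplicationId="<Pinpoint-App-id>",
    EventsRequest={
        "BatchItem": {
            "<endpoint_id1>": {
                "Endpoint": {},
                "Events": {
                    "<event_id>": {
                        "EventType": "transactional.opened_notification",
                        "Timestamp": datetime.now(timezone.utc).isoformat(),
                        "Attributes": {"campaign_id": "_DIRECT"},
                    }
                },
            }
        }
    },
)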

Next steps

Requests to the Amazon Pinpoint Events API must be signed using AWS Signature Version 4. We recommend using the AWS Mobile SDKs, which handle request signing on your behalf. You can use the AWS Mobile SDKs with temporary limited-privilege Amazon Cognito credentials. For more information and examples, see Getting credentials.
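
If you call the Events REST API directly instead of going through an SDK, you have to sign the request yourself. The following Python sketch signs a request with botocore’s SigV4 helper; it assumes the Pinpoint signing name mobiletargeting, a placeholder application ID, and credentials available from the default provider chain.

import json
import boto3
import urllib3
from botocore.auth import SigV4Auth
from botocore.awsrequest import AWSRequest

region = "us-east-1"
app_id = "<Pinpoint-App-id>"  # placeholder
url = f"https://pinpoint.{region}.amazonaws.com/v1/apps/{app_id}/events"
body = json.dumps({"BatchItem": {}})  # replace with a real Events payload

# Sign the request with SigV4 using credentials from the default provider chain.
credentials = boto3.Session().get_credentials()
request = AWSRequest(method="POST", url=url, data=body,
                     headers={"Content-Type": "application/json"})
SigV4Auth(credentials, "mobiletargeting", region).add_auth(request)

# Send the signed request.
http = urllib3.PoolManager()
response = http.request("POST", url, body=body, headers=dict(request.headers))
print(response.status, response.data.decode())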

 

About the Authors

Franklin Ochieng

Franklin Ochieng

Franklin Ochieng is a senior software engineer at the Amazon Pinpoint team. He has attained over 7 years experience at AWS building highly scalable system that solve complex problems for our customers. Outside of work, Frank enjoys getting out in nature and playing basketball or pool.

Pavlos Ioannou Katidis

Pavlos Ioannou Katidis

Pavlos Ioannou Katidis is an Amazon Pinpoint and Amazon Simple Email Service Senior Specialist Solutions Architect at AWS. He enjoys diving deep into customers’ technical issues and help in designing communication solutions. In his spare time, he enjoys playing tennis, watching crime TV series, playing FPS PC games, and coding personal projects.

Build AI and ML into Email & SMS for customer engagement

Post Syndicated from Vinay Ujjini original https://aws.amazon.com/blogs/messaging-and-targeting/build-ai-and-ml-into-email-sms-for-customer-engagement/

Customers engage with businesses through various channels like email, SMS, push, and in-app. With the availability and ease of use of mobile phones, businesses can use 2-way Short Message Service (SMS) messaging to engage with their customers. Text messaging does not require an application and provides immediate interaction with your customers. Amazon Pinpoint enables businesses and organizations to interact with their customers in 2-way SMS conversations. Since it is not practical or scalable for organizations to have people responding to millions of customer texts, we can leverage Amazon Lex to build conversational AI into the 2-way SMS flow. Amazon Lex is a fully managed artificial intelligence (AI) AWS service with advanced natural language models to design, build, test, and deploy conversational interfaces in applications. Machine learning (ML) is used in digital marketing to help businesses detect patterns in customer behavior.

Today, if customers want to know the latest status of their order, they typically have to send an email, which is hard for businesses to monitor and respond to, or call, which is time consuming for the customer and expensive for the business to field.

This blog post shows how you can elevate your customer’s experience using Amazon Pinpoint’s omni-channel capabilities, Amazon Lex’s AI powered chat, and ML-powered personalization using Amazon Personalize.

The solution presented in this blog helps resolve all the above issues. The example used to depict this is a customer who orders a bike; because the delivery has been delayed, he wants timely updates on its progress. He has been given a phone number by the bike company to text with any questions. This solution elevates the customer’s experience by providing timely updates pulled from the order database and by sending additional product recommendations that predict what the customer might need.

Architecture

This solution uses Amazon Pinpoint, Amazon Lex, AWS Lambda, Amazon DynamoDB, Amazon Simple Notification Service (Amazon SNS), and Amazon Personalize.

AWS architecture diagram AI/ML, Email, SMS.

  1. The customer sends a message to the number provided by the store asking about their order status.
  2. Pinpoint 2-way SMS has an SNS topic tied to it.
  3. The SNS topic relays the message to the Lex integration Lambda.
  4. This Lex integration Lambda handles the integration between Pinpoint and Lex (see the sketch after this list).
  5. When the customer checks on their order status, Lex taps into the fulfillment lambda that is tied to it.
  6. That Lambda checks the order status in DynamoDB and sends it back to Lex.
  7. Lex sends the order details to Amazon Pinpoint and Amazon Pinpoint delivers the SMS with the order details to the customer’s phone number.
  8. Amazon Lex lets fulfillment Lambda know to send an email to the customer with the order details.
  9. The fulfillment Lambda creates an event called ‘OrderStatus’ for the Amazon Pinpoint Journey to consume.
  10. Amazon Pinpoint’s message template reaches out to Amazon Personalize to get the 3 recommendations.
  11. Amazon Pinpoint’s Journey triggers an email message to the customer with the order information and recommendations.
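
As a rough illustration of the integration Lambda described above, the following sketch shows how it might forward an inbound SMS (delivered via SNS) to Amazon Lex V2 and text the bot’s reply back through Amazon Pinpoint. The bot IDs, project ID, and origination number are placeholder environment variables; the LexIntegration.zip code in the GitHub repository is the reference implementation.

import json
import os
import boto3

lex = boto3.client("lexv2-runtime")
pinpoint = boto3.client("pinpoint")

def handler(event, context):
    # Pinpoint 2-way SMS publishes the inbound message to the SNS topic.
    sms = json.loads(event["Records"][0]["Sns"]["Message"])
    customer_number = sms["originationNumber"]

    # Forward the text to the Lex bot; the phone number doubles as the session ID.
    lex_response = lex.recognize_text(
        botId=os.environ["BOT_ID"],             # placeholder environment variables
        botAliasId=os.environ["BOT_ALIAS_ID"],
        localeId="en_US",
        sessionId=customer_number.replace("+", ""),
        text=sms["messageBody"],
    )
    reply = lex_response["messages"][0]["content"]  # assumes the bot returns a message

    # Send the bot's reply back to the customer over SMS.
    pinpoint.send_messages(
        ApplicationId=os.environ["PINPOINT_PROJECT_ID"],
        MessageRequest={
            "Addresses": {customer_number: {"ChannelType": "SMS"}},
            "MessageConfiguration": {
                "SMSMessage": {
                    "Body": reply,
                    "MessageType": "TRANSACTIONAL",
                    "OriginationNumber": os.environ["ORIGINATION_NUMBER"],
                }
            },
        },
    )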

Prerequisites

To deploy this solution, you must have the following:

  1. An AWS account.
  2. An Amazon Pinpoint project.
  3. An originating identity that supports 2-way SMS in the country you are planning to send SMS to – Supported countries and regions (SMS channel).
  4. A mobile number to send and receive text messages.
  5. An SMS customer segment – Download the example CSV, which contains sample SMS and email endpoints. Replace the phone number (column C) with yours and the email with your email address, then import it into Amazon Pinpoint – How to import an Amazon Pinpoint segment.
  6. Add your mobile number in the Amazon Pinpoint SMS sandbox.
  7. Verify your email address that needs to receive messages from this account.
  8. Download the LexIntegration.zip & RE_Order_Validation.zip Lambda files from this Github location.

Preparation:

  1. Download the CloudFormation template.
  2. Go to the Amazon S3 console and create a bucket. I created one for this example named ‘pinpointreinventaiml-code’. Under that S3 bucket, create a sub-folder and name it Lambda.
  3. Upload the two zip files you downloaded earlier from GitHub.
  4. In Amazon Pinpoint > Phone numbers, Check to make sure the phone number you are using is enabled for SMS and its status is active.
  5. Add the machine learning generated product recommendations using Amazon Personalize.
Check if phone number is enabled & active in Pinpoint console

Phone numbers in Pinpoint console

Solution implementation

Create a Lex Chat bot:

  1. Now it’s time to create your bot. To create your bot, sign in to the Lex console at https://console.aws.amazon.com/lex.
  2. For more information about creating bots in Lex, see https://docs.aws.amazon.com/lex/latest/dg/gs-console.html.
  3. Click on Create bot button. Next steps:
    1. Select Create a blank bot radio button.
    2. Give a Bot name ‘Order Status’ under Bot name Configuration. (Use the same Bot name as mentioned here. If you change the Bot name here, your CloudFormation will fail)
    3. Under IAM permissions, select the radio button Create a role with basic Amazon Lex permissions.
    4. For COPPA, choose No. Click Next
    5. Under Language dropdown, choose the language of usage. I chose Language as English in my example.
    6. Click Done, to complete the Bot creation.
  4. You have to create an Intent within the Bot you just created
    1. Click on the Bot you just created. Click on Intents and click the dropdown Add intent and select Add empty intent.
    2. Give an intent name and click Ok.
  5. Once the intent is created, go to the intent, open the Conversation flow section, and create a flow that has the following info and looks like the image below:
    1. Click on Sample utterances and type in Order status.
    2. Click on initial response and type in Okay, I can help with that. What is your order number?
    3. Click on the slot value and click on Add a slot. Name: OrderNumber and Slot type is AMAZON.AlphaNumeric. In the prompt, enter Please enter your order number.
    4. Click on Save Intent button. The conversation flow should look like the below screenshot:

Amazon Lex intent

6. Go back to the Intent you just created and click on the Build button that is to the right side of the page.

Build intent

7. Once the build is successfully completed, go back to the Bot you created and click on Aliases on the left frame. Click on the Alias that was created earlier, TestBotAlias.

Bot Alias

8. In the Languages section, click on the English language that we created earlier.
9. Open the Lambda function – optional section and point the source to RE_Order_Validation Lambda that we downloaded earlier.
10. For Lambda function version or alias, select $LATEST. Click on Save.

Add Lambda to Alias

11. Go to Intents, choose the intent you just built, and click on the Build button again. Once the build is complete, you can test the intent; a minimal sketch of the kind of fulfillment handler involved follows.
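
For orientation only, the following sketch shows the general shape of a Lex V2 fulfillment handler that looks up the order in DynamoDB and closes the intent with a status message. The table name, key, and attribute access are assumptions based on the example data later in this post; the RE_Order_Validation Lambda from the GitHub repository is the actual implementation.

import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("OrderStatus")  # table name assumed from this post

def handler(event, context):
    # Lex V2 passes slot values under sessionState.intent.slots.
    slots = event["sessionState"]["intent"]["slots"]
    order_num = slots["OrderNumber"]["value"]["interpretedValue"]

    # Simplified lookup; adjust the key and attribute access to match how your
    # items are actually stored (the sample item in this post nests values in maps).
    item = table.get_item(Key={"Order_Num": order_num}).get("Item", {})
    message = (
        f"Your order {order_num} was shipped on {item.get('Shipping_Dt')} and is "
        f"expected to be delivered to your address on {item.get('Delivery_Dt')}. "
        "Your order details have been emailed to you."
    )

    # Close the intent and return the message to Lex.
    return {
        "sessionState": {
            "dialogAction": {"type": "Close"},
            "intent": {"name": event["sessionState"]["intent"]["name"], "state": "Fulfilled"},
        },
        "messages": [{"contentType": "PlainText", "content": message}],
    }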

Import and execute CloudFormation:

  1. Navigate to the Amazon CloudFormation console in the AWS region you want to deploy this solution.
  2. Select Create stack and With new resources. Choose Template is ready as Prerequisite – Prepare template and Upload a template file as Specify template. Upload the template downloaded in step 1 under Preparation section of this document. Click Next.
  3. Fill the AWS CloudFormation parameters as shown below:
  4. Stack name: Give a name to this stack.
    1. Under Parameters, for BotAlias: The Bot Alias that you created as part of Amazon Lex above.
    2. BotId: The Bot ID for the bot that you created as part of Amazon Lex above.
    3. CodeS3Bucket: Give the name of the S3 bucket you created in step3 of the Preparation topic above.
    4. OriginationNumber: This is the origination identity phone number you created in step4 of the Preparation topic above.
    5. PinpointProjectId: Use the ProjectID you have from step2 of the Prerequisites phase above.
  5. After entering all the parameter info, it would look something like this below:
  6. CloudFormation parameters
  7. Click Next. Leave the default options on the next page and click Next again.
  8. Check the box I acknowledge that AWS CloudFormation might create IAM resources with custom names. Click Submit.

Set up data in Amazon Dynamo DB

  1. We are using DynamoDB table here as the transactional database that stores order information for the bike store.
  2. Once the solution has been successfully deployed, navigate to the Amazon DynamoDB console and access the OrderStatus DynamoDB table. Each row created in this table represents an order and its details. Each row should have a unique Order_Num that holds the order number and its related information. You can put additional information about the order, like the example below:
  3. {
       "Order_Num":{
          "Value":"ABC123"
       },
       "Delivery_Dt":{
          "Value":"12/01/2022"
       },
       "Order_Dt":{
          "Value":"11/01/2022"
       },
       "Shipping_Dt":{
          "Value":"11/24/2022"
       },
       "UserId":{
          "Value":"example-user-id-3"
       }
    }
  4. Once you enter the data, it should look like the image below. Click on Create item.
  5. Dynamo DB values

Set up Amazon Simple Notification Service (SNS) topic

  1. We need Amazon Simple Notification Service here to provide internal message delivery from publishers (the customer’s text message) to subscribers (the Lex integration Lambda in this example). This is used for internal notifications in this use case.
  2. As part of the CloudFormation above, check if you have an SNS topic created by the name LexPinpointIntegrationDemo.
  3. Now, we have successfully created an Amazon SNS topic.

Set up Lambda Functions

  1. Go to AWS Lambda console and open the Lambda function LexIntegration. Under the Function overview, click on the Add trigger. Under Trigger configuration dropdown, select SNS and under SNS Topic select LexPinpointIntegrationDemo topic. Click on Add.
  2. Note: In this example, I used Node.js in one Lambda and Python in the other to show that AWS Lambda functions can use the scripting language of your choice.

Setting up 2-way SMS in Amazon Pinpoint

  1. Go to Amazon Pinpoint console and click on Phone numbers under SMS & Voice in the left frame. If you don’t see any phone numbers, please refer to #3 under prerequisites section above.
  2. This is how your screen should look:
  3. Phone numbers in Pinpoint
  4. Click on the number.
  5. On the right frame, expand Two-way SMS drop down arrow.
  6. Click on the check box ‘Enable two-way SMS’.
  7. In the ‘Incoming message destination’ select the radio button ‘Choose an existing SNS topic’ and in the drop down below, choose the SNS topic you built above.
  8. The result would look like the screenshot below:
  9. 2-way SMS
  10. Click on Save.

Import Machine Learning model into Pinpoint

  1. Go to Amazon Pinpoint.
  2. Click on Machine Learning Models. Click on Add recommender model.
  3. Give a recommender model name and description under model details.
  4. Under Model configuration, choose the radio button ‘Automatically create a role’ and give an IAM role name in the textbox below.
  5. Under recommender model, choose the recommender model campaign that you created in Amazon Personalize earlier in the project.
    1. If you did not create it, use this Pinpoint workshop to create a recommender model in Amazon Personalize.
    2. The data used in this example is for retail industry, please edit the data as needed for your use case and industry.
  6. Under the settings section:
    1. Select ‘User Id’ as identifier.
    2. Click on the drop down ‘Number of recommendations per message’ and select 3.
  7. For Processing method, choose ‘Use value returned by model’.
  8. Click on Next.
  9. You are presented with the attributes section. Give the attribute a display name of ‘product_name’ and click Next.
  10. On the next screen, you can review all the information provided and click on Publish.
  11. The completed model after publishing looks like the screen below:
  12. Personalize model in Pinpoint

Create a Message Template in Amazon Pinpoint

  1. Use chapter 6.4 in this workshop Amazon Pinpoint Workshop to create a message template.
  2. Once the template is created, you need to add recommendations to the message template using the details in this Amazon Pinpoint Workshop. Change the type of data needed for your use case and industry in this workshop. I used sample retail data.
  3. To create an Amazon Pinpoint Journey, navigate to the Amazon Pinpoint console, select Journeys and click on Create journey.
  4. Give a name, click on Set entry condition in the Journey entry block.
  5. Choose the radio button Add participants when they perform an activity.
  6. Click in the ‘Events’ text box and type in OrderStatus.
  7. Pinpoint Journey entry
  8. Click on Add activity and select Send an email.
  9. Click on choose an email template and select the email message template we created earlier in this blog. Click on choose template button.
  10. Select a Sender email address from the drop down list.
  11. Choose sender email here
  12. Click Save. The final journey should look like this:
  13. This is the final journey
  14. Click on Actions > Settings where you will review the journey settings. There you set the start and end date of the journey if applicable as well as other advanced settings. Configure your journey settings to look like the screenshot below and click Save.
  15. Journey settings
  16. To publish your journey, click on Review. On the Review your journey page, click Next > Mark as reviewed > Publish. A 5-minute countdown will begin, after which your journey will be live.
  17. Once the journey is live, we need to pass the OrderStatus event; the matching endpoints will then enter the journey and receive an email. A minimal sketch of recording this event follows.
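
For reference, the following sketch shows one way the fulfillment Lambda could record the OrderStatus event that starts the journey. The application ID, endpoint ID, and attribute values are placeholders; the actual logic lives in the RE_Order_Validation Lambda from the GitHub repository.

import uuid
from datetime import datetime, timezone

import boto3

pinpoint = boto3.client("pinpoint")

# Record an "OrderStatus" event against the customer's endpoint so that the
# journey whose entry activity listens for this event type picks it up.
pinpoint.put_events(
    ApplicationId="<Pinpoint-App-id>",  # placeholder project ID
    EventsRequest={
        "BatchItem": {
            "<endpoint_id>": {          # endpoint of the customer who texted
                "Endpoint": {},
                "Events": {
                    str(uuid.uuid4()): {
                        "EventType": "OrderStatus",
                        "Timestamp": datetime.now(timezone.utc).isoformat(),
                        "Attributes": {"Order_Num": "ABC123"},  # example attribute
                    }
                },
            }
        }
    },
)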

Testing the solution

  1. Use a phone with a valid number (in this example, I took a US phone number) and send a text ‘Order Status’ to the number generated in Amazon Pinpoint above.
  2. You should get a response “Okay, I can help with that. What is your order number?”
  3. Type in the order number you entered earlier in the Amazon DynamoDB table.
  4. You should get a response “Your order <order number> was shipped on <shipped_dt> and is expected to be delivered to your address on <delivery_dt>. Your order details have been emailed to you.”
  5. Text message flow
  6. Alternatively, you can test this solution from the Lex bot.
  7. In Amazon Lex, go to the intent you created above and click on the Test button. Next steps:
    1. In the text box, enter Order Status.
    2. Bot should respond with Okay, I can help with that. What is your order number?
    3. You can respond with the order number you entered in the DynamoDB table.
    4. Bot should respond with Your order <Order_Num> was shipped on <Shipping_Dt> and is expected to be delivered to your address on <Delivery_Dt>. Your order details have been emailed to you.
    5. Testing the 2 way messaging in Lex console

Conclusion

Using this blog post, you can elevate your customer’s experience with Amazon Lex’s AI chat capabilities and Amazon Personalize’s ML recommendation models, and trigger an Amazon Pinpoint Journey. This blog highlights how organizations can interact in a 2-way SMS conversation with their customers and convert that engagement into a triggered email, with product recommendations, if needed.

Next Steps

You can use the above solution and modify it easily to work across different verticals and applicable use cases. You can also extend this solution to connect customers to an Amazon Connect agent via SMS chat, using this blog.

Clean-up

  1. To delete the solution, go to the CloudFormation stack you created as part of this project. Click on the stack and click Delete.
  2. Navigate to Amazon Pinpoint and stop the Journey you ran in this solution. Delete the Journey, Machine learning models, Message templates you created for this solution. Delete the Project you created for this solution.
  3. In Amazon Lex, delete the intent and bot you created for this solution.
  4. Delete the folder and bucket you created in S3 as part of this project.
  5. Amazon Personalize resources like dataset groups, datasets, etc. are not created via AWS CloudFormation, so you have to delete them manually. Please follow the instructions in the AWS documentation on how to clean up the created resources.

Additional resources

Retry delivering failed SMS using Pinpoint

How to target customers using ML, based on their interest in a product

 About the Authors

Vinay Ujjini

Vinay Ujjini is an Amazon Pinpoint and Amazon Simple Email Service Principal Specialist Solutions Architect at AWS. He has been solving customer’s omni-channel challenges for over 15 years. In his spare time, he enjoys playing tennis & cricket.

How SikSin improved customer engagement with AWS Data Lab and Amazon Personalize

Post Syndicated from Byungjun Choi original https://aws.amazon.com/blogs/big-data/how-siksin-improved-customer-engagement-with-aws-data-lab-and-amazon-personalize/

This post is co-written with Byungjun Choi and Sangha Yang from SikSin.

SikSin is a technology platform connecting customers with restaurant partners serving their multiple needs. Customers use the SikSin platform to search and discover restaurants, read and write reviews, and view photos. From the restaurateurs’ perspective, SikSin enables restaurant partners to engage and acquire customers in order to grow their business. SikSin has a partnership with 850 corporate companies and more than 50,000 restaurants. They issue restaurant e-vouchers to more than 220,000 members, including individuals as well as corporate members. The SikSin platform serves more than 3 million users a month. SikSin was listed in the top 100 of the Financial Times’s Asia-Pacific region’s high-growth companies in 2022.

SikSin was looking to deliver improved customer experiences and increase customer engagement. SikSin confronted two business challenges:

  • Customer engagement – SikSin maintains data on more than 750,000 restaurants and has more than 4,000 restaurant articles (and growing). SikSin was looking for a personalized and customized approach to provide restaurant recommendations for their customers and get them engaged with the content, thereby providing a personalized customer experience.
  • Data analysis activities – The SikSin Food Service team experienced difficulties with report generation due to data scattered across multiple systems. The team previously had to submit a request to the IT team and then wait for answers that might be outdated. For the IT team, they needed to manually pull data out of files, databases, and applications, and then combine them upon every request, which is a time-consuming activity. The SikSin Food Service team wanted to view web analytics log data by multiple dimensions, such as customer profiles and places. Examples include page view, conversion rate, and channels.

To overcome these two challenges, SikSin participated in the AWS Data Lab program to assist them in building a prototype solution. The AWS Data Lab offers accelerated, joint-engineering engagements between customers and AWS technical resources to create tangible deliverables that accelerate data and analytics modernization initiatives. The Build Lab is a 2–5-day intensive build with a technical customer team.

In this post, we share how SikSin built the basis for accelerating their data project with the help of the Data Lab and Amazon Personalize.

Use cases

The Data Lab team and SikSin team had three consecutive meetings to discuss business and technical requirements, and decided to work on two use cases to resolve their two business challenges:

  • Build personalized recommendations – SikSin wanted to deploy a machine learning (ML) model to produce personalized content on the landing page of the platform, particularly restaurants and restaurant articles. The success criteria was to increase the number of page views per session and membership subscription, reduce their bounce rate, and ultimately engage more visitors and members in SikSin’s contents.
  • Establish self-service analytics – SikSin’s business users wanted to reduce time to insight by making data more accessible while removing the reliance on the IT team by giving business users the ability to query data. The key was to consolidate web logs from BigQuery and operational business data from Amazon Relational Database Service (Amazon RDS) into a single place and analyze data whenever they need.

Solution overview

The following architecture depicts what the SikSin team built in the 4-day Build Lab. There are two parts in the solution to address SikSin’s business and technical requirements. The first part (1–8) is for building personalized recommendations, and the second part (A–D) is for establishing self-service analytics.

SikSin Solution Architecture

SikSin deployed an ML model to produce personalized content recommendations by using the following AWS services:

  1. AWS Database Migration Service (AWS DMS) helps migrate databases to AWS quickly and securely with minimal downtime. The SikSin team used AWS DMS to perform full load to bring data from the database tables into Amazon Simple Storage Service (Amazon S3) as a target. Amazon S3 is an object storage service offering industry-leading scalability, data availability, security, and performance. An AWS Glue crawler populates the AWS Glue Data Catalog with the data schema definitions (in a landing folder).
  2. An AWS Lambda function checks if any previous files still exist in the landing folder and archives the files into a backup folder, if any.
  3. AWS Glue is a serverless data integration service that makes it easier to discover, prepare, move, and integrate data from multiple sources for analytics, ML, and application development. The SikSin team created AWS Glue Spark extract, transform, and load (ETL) jobs to prepare input datasets for ML models. These datasets are used to train ML models in bulk mode. There are a total of five datasets for training and two datasets for batch inference jobs.
  4. Amazon Personalize allows developers to quickly build and deploy curated recommendations and intelligent user segmentation at scale using ML. Because Amazon Personalize can be tailored to your individual needs, you can deliver the right customer experience at the right time and in the right place. Also, users will select existing ML models (also known as recipes), train models, and run batch inference to make recommendations.
  5. An Amazon Personalize batch inference job produces predictions for each line of input data (restaurants and restaurant articles) and writes ML-generated recommendations to the designated S3 output folder (see the sketch after this list). The recommendation records are surfaced using interaction data, product data, and predictive models. An AWS Glue crawler populates the AWS Glue Data Catalog with the data schema definitions (in an output folder).
  6. The SikSin team applied business logics and filters in an AWS Glue job to prepare the final datasets for recommendations.
  7. AWS Step Functions enables you to build scalable, distributed applications using state machines. The SikSin team used AWS Step Functions Workflow Studio to visually create, run, and debug workflow runs. This workflow is triggered based on a schedule. The process includes data ingestion, cleansing, processing, and all steps defined in Amazon Personalize. This also involves managing run dependencies, scheduling, error-catching, and concurrency in accordance with the logical flow of the pipeline.
  8. Amazon Simple Notification Service (Amazon SNS) sends notifications. The SikSin team used Amazon SNS to send a notification via email and Google Hangouts with a Lambda function as a target.
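
As a point of reference, a batch inference job of the kind described in step 5 can be started with the Amazon Personalize API. The sketch below uses placeholder ARNs, S3 paths, and a role name; SikSin’s actual pipeline is orchestrated by Step Functions and AWS Glue as described above.

import boto3

personalize = boto3.client("personalize")

# Start a batch inference job that reads users (one JSON object per line) from S3
# and writes restaurant/article recommendations back to S3. All ARNs and paths
# are placeholders.
personalize.create_batch_inference_job(
    jobName="restaurant-recommendations-batch",
    solutionVersionArn="arn:aws:personalize:ap-northeast-2:111122223333:solution/restaurants/<version>",
    jobInput={"s3DataSource": {"path": "s3://example-bucket/batch/input/users.json"}},
    jobOutput={"s3DataDestination": {"path": "s3://example-bucket/batch/output/"}},
    roleArn="arn:aws:iam::111122223333:role/PersonalizeBatchRole",
    numResults=25,  # recommendations per user
)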

To establish a self-service analytics environment to enable business users to perform data analysis, SikSin used the following services:

  1. The Google BigQuery Connector for AWS Glue simplifies the process of connecting AWS Glue jobs to extract data from BigQuery. The SikSin team used the connector to extract web analytics logs from BigQuery and load them to an S3 bucket.
  2. AWS Glue DataBrew is a visual data preparation tool that makes it easy for data analysts and data scientists to clean and normalize data to prepare it for analytics and ML. You can choose from over 250 pre-built transformations to automate data preparation tasks, all without the need to write any code. The SikSin Food Service team used it to visually inspect large datasets and shape the data for their data analysis activities. An S3 bucket (in the intermediate folder) contains business operational data such as customers, places, articles, and products, and reference data loaded from AWS DMS and web analytics logs and data by AWS Glue jobs.
  3. An AWS Glue Python shell runs a job to cleanse and join data, and apply business rules to prepare the data for queries. The SikSin team used AWS SDK Pandas, an AWS Professional Service open-source Python initiative, which extends the power of the Pandas library to AWS, connecting DataFrames and AWS data related services. The output files are stored in an Apache Parquet format in a single folder. An AWS Glue crawler populates the data schema definitions (in an output folder) into the AWS Glue Data Catalog.
  4. The SikSin Food Service team used Amazon Athena and Amazon QuickSight to query and visualize the data analysis. Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. QuickSight is an ML-powered business intelligence service built for the cloud.

Business outcomes

The SikSin Food Service team is now able to access the available data for performing data analysis and manipulation operations efficiently, as well as for getting insights on their own. This immediately allows the team as well as other lines of business to understand how customers are interacting with SikSin’s contents and services on the platform and make decisions sooner. For example, with the data output, the Food Service team was able to provide insights and data points for their external stakeholder and customer to initiate a new business idea. Moreover, the team shared, “We anticipate the recommendations and personalized content will increase conversion rates and customer engagement.”

The AWS Data Lab enabled SikSin to review and assess thoroughly what data is actually usable and available. With SikSin’s objective to successfully build a data pipeline for data analytics purposes, the SikSin team came to realize the importance of data cleansing, categorization, and standardization. “Only fruitful analysis and recommendation are possible when data is intact and properly cleansed,” said Byungjun Choi (the Head of SikSin’s Food Service Team). After completing the Data Lab, SikSin completed and set up an internal process that can streamline the data cleansing pipeline.

SikSin was stuck in the research phase of looking for a solution to solve their personalization challenges. The AWS Data Lab enabled the SikSin IT Team to get hands-on with the technology and build a minimum viable product (MVP) to explore how Amazon Personalize would work in their environment with their data. They achieved this via the Data Lab by adopting AWS DMS, AWS Glue, Amazon Personalize, and Step Functions. “Though it is still the early stage of building a prototype, I am very confident with the right enablement provided from AWS that an effective recommendation system can be adopted on production level very soon,” commented Sangha Yang (the Head of SikSin IT Team).

Conclusion

As a result of the 4-day Build Lab, the SikSin team left with a working prototype that is custom fit to their needs, gaining a clear path forward for enabling end-users to gain valuable insights into its data. The Data Lab allowed the SikSin team to accelerate the architectural design and prototype build of this solution by months. Based on the lessons and learnings obtained from Data Lab, SikSin is planning to launch a Global News Content Platform equipped with a recommendation feature in FY23.

As demonstrated by SikSin’s achievements, Amazon Personalize allows developers to quickly build and deploy curated recommendations and intelligent user segmentation at scale using ML. Because Amazon Personalize can be tailored to your individual needs, you can deliver the right customer experience at the right time and in the right place, whether you want to optimize recommendations, target customers more accurately, maximize your data’s value, or promote items using business rules.

To accelerate your digital transformation with ML, the Data Lab program is available to support you by providing prescriptive architectural guidance on a particular use case, sharing best practices, and removing technical roadblocks. You’ll leave the engagement with an architecture or working prototype that is custom fit to your needs, a path to production, and deeper knowledge of AWS services.

Please contact your AWS Account Manager or Solutions Architect to get started. If you don’t have an AWS Account Manager, please contact Sales.


About the Authors

Byungjun Choi is the Head of SikSin Food Service at SikSin.

Sangha Yang is the Head of the IT team at SikSin.

Younggu Yun is a Senior Data Lab Architect at AWS. He works with customers around the APAC region to help them achieve business goals and solve technical problems by providing prescriptive architectural guidance, sharing best practices, and building innovative solutions together.

Junwoo Lee is an Account Manager at AWS. He provides technical and business support to help customers resolve their problems and enriches their customer journey by introducing local and global programs for his customers.

Jinwoo Park is a Senior Solutions Architect at AWS. He provides technical support for AWS customers to succeed with their cloud journey. He helps customers build more secure, efficient, and cost-optimized architectures and solutions, and delivers best practices and workshops.

AWS Week in Review – January 16, 2023

Post Syndicated from Antje Barth original https://aws.amazon.com/blogs/aws/aws-week-in-review-january-16-2023/

Today, we celebrate Martin Luther King Jr. Day in the US to honor the late civil rights leader’s life, legacy, and achievements. In this article, Amazon employees share what MLK Day means to them and how diversity makes us stronger.

Coming back to our AWS Week in Review—it’s been a busy week!

Last Week’s Launches
Here are some launches that got my attention during the previous week:

AWS Local Zones in Perth and Santiago now generally available – AWS Local Zones help you run latency-sensitive applications closer to end users. AWS now has a total of 29 Local Zones; 12 outside of the US (Bangkok, Buenos Aires, Copenhagen, Delhi, Hamburg, Helsinki, Kolkata, Muscat, Perth, Santiago, Taipei, and Warsaw) and 17 in the US. See the full list of available and announced AWS Local Zones and learn how to get started.

AWS Local Zones Locations

AWS Clean Rooms now available in preview – During AWS re:Invent this past November, we announced AWS Clean Rooms, a new analytics service that helps companies across industries easily and securely analyze and collaborate on their combined datasets—without sharing or revealing underlying data. You can now start using AWS Clean Rooms (Preview).

Amazon Kendra updates – Amazon Kendra is an intelligent search service powered by machine learning (ML) that helps you search across different content repositories with built-in connectors. With the new Amazon Kendra Intelligent Ranking for self-managed OpenSearch, you can now improve the quality of your OpenSearch search results using Amazon Kendra’s ML-powered semantic ranking technology.

Amazon Kendra also released an Amazon S3 connector with VPC support to index and search documents from Amazon S3 hosted in your VPC, a new Google Drive Connector to index and search documents from Google Drive, a Microsoft Teams Connector to enable Microsoft Teams messaging search, and a Microsoft Exchange Connector to enable email-messaging search.

Amazon Personalize updates – Amazon Personalize helps you improve customer engagement through personalized product and content recommendations. Using the new Trending-Now recipe, you can now generate recommendations for items that are rapidly becoming more popular with your users. Amazon Personalize now also supports tag-based resource authorization. Tags are labels in the form of key-value pairs that can be attached to individual Amazon Personalize resources to manage resources or allocate costs.

Amazon SageMaker Canvas now delivers up to 3x faster ML model training time – SageMaker Canvas is a visual interface that enables business analysts to generate accurate ML predictions on their own—without having to write a single line of code. The accelerated model training times help you prototype and experiment more rapidly, shortening the time to generate predictions and turn data into valuable insights.

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

Other AWS News
Here are some additional news items and blog posts that you may find interesting:

AWS open-source news and updates – My colleague Ricardo writes this weekly open-source newsletter in which he highlights new open-source projects, tools, and demos from the AWS Community. Read edition #141 here.

ML model hosting best practices in Amazon SageMaker – This seven-part blog series discusses best practices for ML model hosting in SageMaker to help you identify which hosting design pattern meets your needs best. The blog series also covers advanced concepts such as multi-model endpoints (MME), multi-container endpoints (MCE), serial inference pipelines, and model ensembles. Read part one here.

I would also like to recommend this really interesting Amazon Science article about differential privacy for end-to-end speech recognition. The data used to train AI models is protected by differential privacy (DP), which adds noise during training. In this article, Amazon researchers show how ensembles of teacher models can meet DP constraints while reducing error by more than 26 percent relative to standard DP methods.

Upcoming AWS Events
Check your calendars and sign up for these AWS events:

#BuildOnLive – Build On AWS Live events are a series of technical streams on twitch.tv/aws that focus on technology topics related to challenges hands-on practitioners face today.

  • Join the Build On Live Weekly show about the cloud, the community, the code, and everything in between, hosted by AWS Developer Advocates. The show streams every Thursday at 09:00 US PT on twitch.tv/aws.
  • Join the new The Big Dev Theory show, co-hosted with AWS partners, discussing various topics such as data and AI, AIOps, integration, and security. The show streams every Tuesday at 08:00 US PT on twitch.tv/aws.

Check the AWS Twitch schedule for all shows.

AWS Community Days – AWS Community Day events are community-led conferences that deliver a peer-to-peer learning experience, providing developers with a venue to acquire AWS knowledge in their preferred way: from one another.

AWS Innovate Data and AI/ML edition – AWS Innovate is a free online event to learn the latest from AWS experts and get step-by-step guidance on using AI/ML to drive fast, efficient, and measurable results.

  • AWS Innovate Data and AI/ML edition for Asia Pacific and Japan is taking place on February 22, 2023. Register here.
  • Registrations for AWS Innovate EMEA (March 9, 2023) and the Americas (March 14, 2023) will open soon. Check the AWS Innovate page for updates.

You can browse all upcoming in-person and virtual events.

That’s all for this week. Check back next Monday for another Week in Review!

— Antje

This post is part of our Week in Review series. Check back each week for a quick roundup of interesting news and announcements from AWS!

Architecting near real-time personalized recommendations with Amazon Personalize

Post Syndicated from Raghavarao Sodabathina original https://aws.amazon.com/blogs/architecture/architecting-near-real-time-personalized-recommendations-with-amazon-personalize/

Delivering personalized customer experiences enables organizations to improve business outcomes such as acquiring and retaining customers, increasing engagement, driving efficiencies, and improving discoverability. Developing an in-house personalization solution can take a lot of time, which increases the time it takes for your business to launch new features and user experiences.

In this post, we show you how to architect near real-time personalized recommendations using Amazon Personalize and AWS purpose-built data services.  We also discuss key considerations and best practices while building near real-time personalized recommendations.

Building personalized recommendations with Amazon Personalize

Amazon Personalize makes it easy for developers to build applications capable of delivering a wide array of personalization experiences, including specific product recommendations, personalized product re-ranking, and customized direct marketing.

Amazon Personalize provisions the necessary infrastructure and manages the entire machine learning (ML) pipeline, including processing the data, identifying features, using the most appropriate algorithms, and training, optimizing, and hosting the models. You receive results through an Application Programming Interface (API) and pay only for what you use, with no minimum fees or upfront commitments.

Figure 1 illustrates the comparison of Amazon Personalize with the ML lifecycle.


Figure 1. Machine learning lifecycle vs. Amazon Personalize

First, provide the user and items data to Amazon Personalize. In general, there are three steps for building near real-time recommendations with Amazon Personalize:

  1. Data preparation: Preparing data is one of the prerequisites for building accurate ML models and analytics, and it is the most time-consuming part of an ML project. There are three types of data you use for modeling on Amazon Personalize:
    • An Interactions data set captures the activity of your users, also known as events. Examples include items your users click on, purchase, or watch. The events you choose to send are dependent on your business domain. This data set has the strongest signal for personalization, and is the only mandatory data set.
    • An Items data set includes details about your items, such as price point, category information, and other essential information from your catalog. This data set is optional, but very useful for scenarios such as recommending new items.
    • A Users data set includes details about the users, such as their location, age, and other details.
  2. Train the model with Amazon Personalize: Amazon Personalize provides recipes, based on common use cases for training models. A recipe is an Amazon Personalize algorithm prepared for a given use case. Refer to Amazon Personalize recipes for more details. The four types of recipes are:
    • USER_PERSONALIZATION: Recommends items for a user from a catalog. This is often included on a landing page.
    • RELATED_ITEMS: Suggests items similar to a selected item on a detail page.
    • PERSONALIZED_RANKING: Re-ranks a list of items for a user, within a category or within search results.
    • USER_SEGMENTATION: Generates segments of users based on item input data. You can use this to create a targeted marketing campaign for particular products by brand.
  3. Get near real-time recommendations: Once your model is trained, a private personalization model is hosted for you. You can then provide recommendations for your users through a private API.
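
As a minimal sketch of steps 2 and 3 with the AWS SDK for Python (assuming the datasets have already been imported; the names and ARNs are placeholders, and waiting for each resource to become ACTIVE is omitted):

import boto3

personalize = boto3.client("personalize")
personalize_runtime = boto3.client("personalize-runtime")

# Step 2: train a model by creating a solution and a solution version,
# here with the user-personalization recipe.
solution = personalize.create_solution(
    name="demo-solution",
    datasetGroupArn="arn:aws:personalize:us-east-1:123456789012:dataset-group/demo",
    recipeArn="arn:aws:personalize:::recipe/aws-user-personalization",
)
solution_version = personalize.create_solution_version(
    solutionArn=solution["solutionArn"]
)

# Step 3: once the solution version is ACTIVE, deploy it behind a campaign...
campaign = personalize.create_campaign(
    name="demo-campaign",
    solutionVersionArn=solution_version["solutionVersionArn"],
    minProvisionedTPS=1,
)

# ...and request near real-time recommendations for a user.
recommendations = personalize_runtime.get_recommendations(
    campaignArn=campaign["campaignArn"],
    userId="123",
    numResults=10,
)
print([item["itemId"] for item in recommendations["itemList"]])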

Figure 2 illustrates a high-level overview of Amazon Personalize:


Figure 2. Building recommendations with Amazon Personalize

Near real-time personalized recommendations reference architecture

Figure 3 illustrates how to architect near real-time personalized recommendations using Amazon Personalize and AWS purpose-built data services.

Reference architecture for near real-time recommendations

Figure 3. Near real-time recommendations reference architecture

Architecture flow:

  1. Data preparation: Start by creating a dataset group, schemas, and datasets representing your items, interactions, and user data.
  2. Train the model: After importing your data, select the recipe matching your use case, and then create a solution to train a model by creating a solution version.
    Once your solution version is ready, you can create a campaign for your solution version. You can create a campaign for every solution version that you want to use for near real-time recommendations.
    In this example architecture, we’re just showing a single solution version and campaign. If you were building out multiple personalization use cases with different recipes, you could create multiple solution versions and campaigns from the same datasets.
  3. Get near real-time recommendations: Once you have a campaign, you can integrate calls to the campaign in your application. This is where calls to the GetRecommendations or GetPersonalizedRanking APIs are made to request near real-time recommendations from Amazon Personalize.
    • The approach you take to integrate recommendations into your application varies based on your architecture but it typically involves encapsulating recommendations in a microservice or AWS Lambda function that is called by your website or mobile application through a RESTful or GraphQL API interface.
    • Near real-time recommendations support the ability to adapt to each user’s evolving interests. This is done by creating an event tracker in Amazon Personalize.
    • An event tracker provides an endpoint that allows you to stream interactions that occur in your application back to Amazon Personalize in near real-time. You do this by using the PutEvents API.
    • Again, the architectural details of how you integrate PutEvents into your application vary, but it typically involves collecting events using a JavaScript library in your website or a native library in your mobile apps, and making API calls to stream them to your backend. AWS provides the AWS Amplify framework that can be integrated into your web and mobile apps to handle this for you (a brief PutEvents sketch follows this list).
    • In this example architecture, you can build an event collection pipeline using  Amazon API Gateway, Amazon Kinesis Data Streams, and Lambda to receive and forward interactions to Amazon Personalize.
    • The Event Tracker performs two primary functions. First, it persists all streamed interactions so they are incorporated into future retraining of your model. Second, streamed interactions are used to adjust recommendations in near real time; this is also how Amazon Personalize cold starts new users. When a new user visits your site, Amazon Personalize recommends popular items, and after you stream in an event or two, it immediately starts adjusting those recommendations.
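
As mentioned above, here is a brief sketch of streaming an interaction to Amazon Personalize with the PutEvents API (the tracking ID, session ID, and item values are placeholders):

from datetime import datetime, timezone

import boto3

personalize_events = boto3.client("personalize-events")

# Stream a single click interaction back to Amazon Personalize.
# The tracking ID comes from the event tracker created for your dataset group.
personalize_events.put_events(
    trackingId="your-event-tracker-tracking-id",
    userId="123",
    sessionId="session-456",
    eventList=[
        {
            "eventType": "click",
            "itemId": "79323P",
            "sentAt": datetime.now(timezone.utc),
        }
    ],
)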

Key considerations and best practices

  1. For all use cases, your interactions data must include a minimum of 1,000 interaction records from users interacting with items in your catalog. These interactions can come from bulk imports, streamed events, or both, and must include a minimum of 25 unique user IDs with at least two interactions each.
  2. Metadata fields (user or item) can be used for training, filters, or both.
  3. Amazon Personalize supports the encryption of your imported data. You can specify a role allowing Amazon Personalize to use an AWS Key Management Service (AWS KMS) key to decrypt your data, or use the Amazon Simple Storage Service (Amazon S3) AES-256 server-side default encryption.
  4. You can re-train Amazon Personalize deployments based on how much interaction data you generate on a daily basis. A good rule is to re-train your models once every week or two as needed.
  5. You can apply business rules for personalized recommendations using filters. Refer to Filtering recommendations and user segments for more details.
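
As an illustrative sketch of point 5, a filter could be created and applied with the AWS SDK for Python as follows (ARNs and names are placeholders; the filter must be active before it can be used):

import boto3

personalize = boto3.client("personalize")
personalize_runtime = boto3.client("personalize-runtime")

# Exclude items a user has already purchased from their recommendations.
filter_response = personalize.create_filter(
    name="exclude-purchased-items",
    datasetGroupArn="arn:aws:personalize:us-east-1:123456789012:dataset-group/demo",
    filterExpression='EXCLUDE ItemID WHERE Interactions.EVENT_TYPE IN ("purchase")',
)

# Apply the filter when requesting recommendations.
recommendations = personalize_runtime.get_recommendations(
    campaignArn="arn:aws:personalize:us-east-1:123456789012:campaign/demo-campaign",
    userId="123",
    filterArn=filter_response["filterArn"],
)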

Conclusion

In this post, we showed you how to build near real-time personalized recommendations using Amazon Personalize and AWS purpose-built data services. With the information in this post, you can now build your own personalized recommendations for your applications.

Read more and get started on building personalized recommendations on AWS:

Design a data mesh with event streaming for real-time recommendations on AWS

Post Syndicated from Vittorio Denti original https://aws.amazon.com/blogs/big-data/design-a-data-mesh-with-event-streaming-for-real-time-recommendations-on-aws/

This blog post was co-authored with Federico Piccinini.

The data landscape has been changing in recent years: there is a proliferation of entities producing and consuming large quantities of data within companies, and for most of them defining a proper data strategy has become of fundamental importance. A modern data strategy gives you a comprehensive plan to manage, access, analyze, and act on data.

As a result, more companies are considering the adoption of a data mesh architecture, a recently introduced paradigm where data is organized by domain, clear ownership of data and technology stack is enhanced, and a more agile setup is achieved. Because of this, some of your applications may need to be designed for a data-by-domain separation in order to benefit from a data mesh architecture.

In this post, we show you how to design a data mesh architecture for a scenario that requires real-time recommendations. The recommendation system is implemented through Amazon Personalize, a fully managed machine learning (ML) service, and works by consuming data by domain. For recommendations use cases, it’s important to have access to information about users, items, and interactions, often associated with different data sources within a company.

Because ML applications may have multiple types of input data, we propose a solution that works both for data at rest as well as real-time streaming. Real-time recommendations require streaming data in order to adapt to the user’s current intent.

Throughout the post, we introduce the data mesh paradigm and then extend it to a real-time use case by adding event streaming capabilities. We use the use case of a music streaming company that offers its customers the opportunity to listen to on-demand songs. The company has also started to offer, through the same platform, on-demand podcasts, and wants to take advantage of a modern data architecture to support data access for fast ML experimentation and inference.

Data mesh: A paradigm shift

Domain-driven design (DDD) represents a software design approach where complex solutions are divided into domains according to the underlying business logic. An architectural style that is often mentioned in the context of DDD is microservice architecture, a concept where software systems are structured into loosely coupled entities, namely microservices, each one owned by a small team and structured around business requirements. These paradigms, together with the advancement of cloud technologies, allowed companies to release software updates faster and continuously adapt their technology stack to evolving business requirements.

However, unlike software architectures, most data architectures were still designed around technologies rather than business domains. This changed in 2019, when Zhamak Dehghani introduced the data mesh. Data mesh is a paradigm shift towards data being treated as a product and processed as part of a domain. Data mesh applies the principles of DDD to data architectures: data is organized into data domains and the data is considered the product that the team owns and offers for consumption. This is a shift from a centralized ownership model to a decentralized one that allows companies to access data at scale. This shift also allows each team assigned to a data domain to build the data products by choosing the right technology for their job, analogous to software engineers working on a microservice.

Data mesh advocates for decentralized ownership and delivery of data management systems, while emphasizing the need for distributed governance and self-service tooling. The data mesh approach enables better autonomy of data domain owners and brings domains together to enable data sharing and federation across business units, without compromising on data security. This type of architecture supports the idea of distributed data, where all data is accessible for those with the right authority to access it. One key differentiator between a data lake and a data mesh is that in a data mesh, data doesn’t have to be consolidated into a single data lake and can remain within different databases.

For more information about the details and advantages of adopting the data mesh as a domain-driven data architecture, refer to Design a data mesh architecture using AWS Lake Formation and AWS Glue.

The components of a data mesh

Now that we have a good understanding of the data mesh paradigm, let’s look at the implementation and its components.

First, we start with data producers. These are the entities that are responsible for maintaining, owning, and exposing the specific data of their domain. Because of the domain separation, each producer can choose its own technology stack independently.

Similarly, we also have data consumers. These components, as their name indicates, use one or more data sources exposed by the producers. As before, adopting a data mesh architecture implies that each consumer is independent of the others, meaning they can implement different technology stacks and solve different use cases.

The data-at-rest plane is then completed by the Centralized Data Catalog, a component that works as the link between producers and consumers. This middle layer is responsible for indexing the available data producers into a centralized data catalog as well as controlling access to the different data sources.

The data catalog is used by the producers to expose the data products (steps 1a and 1b) to the organization’s data scientists and data engineers working on the consumer domains. The following figure illustrates how data products should be easily discoverable: the central data catalog allows the data consumers to find their data source of interest (steps 2a and 2b) after they have been registered with the centralized catalog by their corresponding producer domain (steps 1a and 1b).

Working with real-time events

One could argue that this architecture can only support data at rest as it is; indeed, there is no straightforward solution to move data in real time from a producer domain to a consumer. The paradigm presented so far addresses the scenario of data at rest, where consumers pull data on demand rather than being notified when data changes.

Because several applications need to quickly respond to the changes happening in the environment, real-time data is an important consideration in data architectures. For example, an ecommerce platform or a video streaming service can extract value from the real-time user interactions with content. In these cases, it’s critical to track events as they happen, feed them in the ML model, and adapt the predictions accordingly.

In this section, we want to introduce some of the streaming platforms that can work to implement this pattern, with a focus on Apache Kafka because it’s frequently used and many companies are moving their Kafka workloads to the cloud.

Apache Kafka is an open-source distributed event streaming platform that captures data in real time from sources such as microservices or databases, stores the events in streams organized into topics, and reacts to these events in real time as well as retrospectively. Event streaming architectures built on Apache Kafka follow the publish/subscribe paradigm: the producers publish events to topics via a write operation, and the consumers, which subscribe to such topics, receive the events as they happen. An alternative to Apache Kafka in this scenario could be Amazon Kinesis Data Streams, a streaming service that allows developers to collect, store, and process real-time data in the cloud.

If we consider, for example, an ecommerce platform, we could have a Payment microservice running the payment functionalities of the system and publishing events to a Purchases topic, tracking every transaction happening on the platform. Then, we could have another component subscribing to the Purchases topic to receive the events and take action accordingly, for example by updating a dashboard for business intelligence. For more information on Apache Kafka, we recommend reading Introduction to Apache Kafka.
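
To make the publish/subscribe flow concrete, here is a minimal sketch using the kafka-python client library (the broker endpoint, topic name, and payload fields are assumptions for illustration; any Kafka client would work similarly):

import json

from kafka import KafkaConsumer, KafkaProducer

BOOTSTRAP_SERVERS = ["broker-1.example.com:9092"]  # placeholder broker endpoint

# The Payment microservice publishes purchase events to the Purchases topic.
producer = KafkaProducer(
    bootstrap_servers=BOOTSTRAP_SERVERS,
    value_serializer=lambda event: json.dumps(event).encode("utf-8"),
)
producer.send("purchases", {"order_id": "42", "user_id": "123", "amount": 19.99})
producer.flush()

# A downstream component subscribes to the same topic and reacts to each event,
# for example by updating a business intelligence dashboard.
consumer = KafkaConsumer(
    "purchases",
    bootstrap_servers=BOOTSTRAP_SERVERS,
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)
for message in consumer:
    print(message.value)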

Event-streaming architecture

The data-in-motion plane is introduced to implement the publish/subscribe pattern in the context of a data mesh. Such a plane is composed of the set of producer and consumer domains connected via a central event streaming component that makes real-time events accessible. To benefit from the data-by-domain architecture, we consider each producer to have its own corresponding centralized stream, as shown in the following figure.

You can also think of the event stream as the channel for sending real-time events to the consumers, therefore each producer has its dedicated channel to send updates.

Each consumer can subscribe to multiple topics based on specific data needs. When new events are available, the corresponding producer publishes them in the associated stream (steps 1a and 1b) and the subscribers can read the events (step 2a and 2b), process them, and take action accordingly.

The preceding figure shows a scenario with N producer domains and M consumer domains: each consumer subscribes only to the streams of interest for that domain. In this example, Consumer #1 is subscribed to the events coming from Producer #1, while Consumer #M is subscribed to the events coming from both Producer #1 and Producer #N.

You could adopt this pattern to solve several use cases and data domains. For instance, a user playing a song on a music streaming platform can generate a new event sent from the Interactions service producer to the Personalization consumer, where the recommendation system generates personalized recommendations. Similarly, a Payment producer can send a transaction request, and a Fraud Detector consumer determines whether the transaction is fraudulent or not.

For producers and consumers to communicate correctly, the event payload schema needs to be consistent. Applications depend on schemas so no changes made to events break the implicit contract between producers and consumers. For complex use cases, you can use a schema registry to enforce compatibility in event streaming. For more information about the options for working with the AWS Glue Schema Registry, refer to Validate streaming data over Amazon MSK using schemas in cross-account AWS Glue Schema Registry.

Recommendation use case

Previously, we introduced the overall idea behind the data mesh architecture without focusing on a specific use case. In this section, we present a real-world scenario where the mesh paradigm is implemented using AWS.

Let’s consider the music streaming company XYZ, which offers its customers the opportunity to listen to on-demand songs. XYZ has recently started to offer, through the same platform, on-demand podcasts as well.

The ML team is interested in adding podcasts to the catalog of personalized recommendations that are presented to users. To do so, the ML team working on the recommendation system, which in the data mesh paradigm can be seen as a consumer, needs access to multiple data domains (producers): Users, Songs, Podcasts, and Interactions.

In this post, we use Amazon Personalize as a fully managed ML service for personalized recommendations. It allows developers to train, tune, and deploy custom ML models to deliver highly customized experiences. Amazon Personalize provisions the infrastructure and manages the entire ML pipeline, including processing the data; identifying features; and training, optimizing, and hosting the models. You can learn more about Amazon Personalize in the Developer Guide.

We now dive deeper into the implementation of the solution, both for the data-at-rest and data-in-motion scenario. ML needs large amounts of data at rest to create a dataset and train the models. Additionally, the personalization scenario requires access to real-time data to adapt to the users’ current intent, so we need access to real-time events and interactions. A data mesh solution for this scenario will require both:

  • Data at rest – Historical data from user, items, and interactions. Some of this could be stored in separate systems and data sources.
  • Data in motion – This data is for the real-time events, for instance songs listened to or new items made available in the catalog.

Architecture for data at rest

In this section, we focus on the data at rest part of the solution.

The following diagram shows how we can implement the data mesh architecture in the context of personalized recommendations, and include the podcasts in the recommendation system deployed with Amazon Personalize. Each producer domain owns the data and exposes them via the data catalogs. The consumers use the data catalogs to find the data they need for their application.

First, we can identify the three main components of the mesh architecture introduced before: data producers, the centralized data catalog, and data consumers.

In this specific example, we can see how different producer domains implement different storage solutions:

  • The Users domain uses Amazon Aurora as its own line of business (LOB) database, a relational database (step 1a)
  • Songs and Podcasts use Amazon DynamoDB, a NoSQL database (steps 1b and 1c)
  • Interactions ingests the events directly into Amazon S3 (step 1d)

The producer domains are decoupling their LOB databases from the data catalogs by using Amazon Simple Storage Service (Amazon S3). With the data mesh paradigm, each producer considers the data as a product, therefore it can preprocess the data before exposing them, and store the results in a format that is suitable for the consumers. This decoupling is implemented using AWS Glue to define an extract, transform, and load (ETL) pipeline, whose results are eventually stored in S3 buckets (steps 2a, 2b, 2c).

Finally, each producer shares its respective AWS Glue Data Catalog with the Centralized Data Catalog (steps 3a, 3b, 3c, 3d).

Data consumers can now access the different data domains through the central catalog. As shown in the preceding figure, we have two consumers: the Analytics domain, which accesses certain catalogs and showcases metrics on an Amazon QuickSight dashboard (step 4), and the Personalized Recommendations domain (step 5).

The latter, which is the one of interest for this post, consists of an AWS Glue ETL job that accesses, through the central catalog, data from the different producers. The ETL job performs traditional data engineering tasks, for example merging song and podcast data. We can now generate our Amazon Personalize solution, where our items dataset includes information about both songs and podcasts, expanding the initial recommendation catalog.
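
As a simplified sketch (not the exact job from the reference implementation), an AWS Glue PySpark script that merges the two catalogs could look like the following; the database, table, and bucket names are placeholders, and the two schemas are assumed to be pre-aligned:

import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the producer tables shared through the central data catalog
# (placeholder database and table names).
songs = glue_context.create_dynamic_frame.from_catalog(
    database="central_catalog", table_name="songs"
).toDF()
podcasts = glue_context.create_dynamic_frame.from_catalog(
    database="central_catalog", table_name="podcasts"
).toDF()

# Merge both catalogs into a single items dataset for Amazon Personalize,
# assuming the columns have already been aligned upstream.
items = songs.unionByName(podcasts)
items.write.mode("overwrite").csv("s3://items-bucket/personalize/items/", header=True)

job.commit()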

Our recommendation engine is then made available for inference requests through an API deployed using Amazon API Gateway (step 6).

The architecture is designed to work across multiple accounts: an AWS account is a natural boundary for the resources deployed into it and a single unit of billing. This approach allows us to separate the resources owned by the different domains and maintain operational agility: each team owns and controls its account. To learn more about the approaches for sharing data catalogs across different accounts while working with a data mesh, check out Design a data mesh architecture using AWS Lake Formation and AWS Glue.

We’re now able to provide users with song or podcast recommendations based on their comprehensive listening preferences across the two categories. In the next section, we explore how to improve the architecture to be reactive to continuously evolving data, such as new songs added to the catalog or new interactions made available.

Architecture for data in motion

Earlier, we introduced the theoretical framework for event streaming in the context of the data mesh, defined as the data-in-motion plane. We can now drill down into the architecture for our specific use case.

We’re using a scenario with four producers (Users, Songs, Podcasts, and Interactions), the central streaming component, and two consumer domains (Personalized Recommendations and Analytics). The data-in-motion plane is implemented by using a platform for event streaming, namely Apache Kafka, and each producer has a dedicated stream to publish its events.

In the scenario of real-time recommendations for music, the Personalized Recommendations consumer is notified about changes to Users, Songs, Podcasts, and Interactions. Similar to the at-rest example, we also consider a second consumer domain, called Analytics, used to create real-time dashboards about the trends in the interactions. Here, the analytics consumer requires only interaction events, therefore it subscribes only to the Interactions stream.

This architecture is designed to offer a loosely coupled interaction mechanism for producers and consumers: the producers don’t need to know about the consumers that are part of the system. The producers focus on emitting the events, the events are sent to the data-in-motion plane, and the delivery is guaranteed by the streaming platform.

Let’s drill down into the strategy for building this architecture in the cloud. For readability purposes, we study this part of the solution in isolation, without adding to the diagram of the data-at-rest scenario.

From a technological perspective, we use AWS Lambda to run the back-end business logic of the application: the microservice runs the logic in a Lambda function and publishes events to the event streams. We use Lambda because it fits our use case well, both for scalability and operational efficiency, because it offers minimal operational overhead. However, the architecture pattern is also valid for other types of backend deployments, for example, containers running on Amazon Elastic Kubernetes Service (Amazon EKS) or Amazon Elastic Container Service (Amazon ECS).

The data-in-motion plane is implemented using Amazon Managed Streaming for Apache Kafka (Amazon MSK), a fully managed solution for running Apache Kafka in the cloud. It provisions the servers, configures the Apache Kafka clusters, replaces servers when they fail, orchestrates server patches and upgrades, and runs clusters for high availability. Kafka organizes and stores events into topics. Topics are always multi-producer and multi-consumer: this means that one or many producers can publish to the same topic, and one or many consumers can subscribe to read from the topic. We use the concept of topics to model this architecture paradigm, and we assign one topic for each producer domain.

Finally, we adapt our previously introduced consumer domain, Personalized Recommendations, to take into account real-time events. This time, we use Lambda to read the events from the topics and invoke the commands to call the Amazon Personalize API through the Amazon Personalize SDK. Within the same consumer domain, we use a Lambda function per topic, which is triggered as soon as a new event is published in the monitored topic. This event-driven pattern allows us to run code only when a new event is published and we need to update the information in Amazon Personalize. Each Lambda function in the Personalized Recommendations domain uses the Amazon Personalize SDK to invoke the corresponding actions on Amazon Personalize and update the datasets.

Let’s consider a new interaction happening in the system using the following figure. This serverless implementation of the event streaming pattern extends the data mesh to respond to real-time events.

The Interactions microservice, which is running the backend logic of the application, publishes a new event (step 1), which is persisted in the Interactions topic (step 2). The publishing of a new event triggers the Lambda functions subscribed to the topic, in this case InteractionsUpdate and InteractionsIngestor (step 3). The InteractionsUpdate function invokes the PutEvents operation on the Amazon Personalize API through the Amazon Personalize SDK to add the real-time event to the recommendation system (step 4). InteractionsIngestor triggers the operations to refresh the dashboards according to the strategy adopted by the Analytics domain. Finally, other services and components can consume the recommendations through the API exposed by the Personalized Recommendation domain to make the predictions consumable (step 5).
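
An illustrative sketch of what a function like InteractionsUpdate could look like when triggered by an Amazon MSK event source mapping (the tracking ID and payload field names are assumptions, and error handling is omitted):

import base64
import json
from datetime import datetime, timezone

import boto3

personalize_events = boto3.client("personalize-events")
TRACKING_ID = "your-event-tracker-tracking-id"  # placeholder


def handler(event, context):
    # An MSK trigger batches records per topic-partition; each record value
    # is base64-encoded.
    for records in event["records"].values():
        for record in records:
            payload = json.loads(base64.b64decode(record["value"]))
            personalize_events.put_events(
                trackingId=TRACKING_ID,
                userId=payload["user_id"],
                sessionId=payload["session_id"],
                eventList=[
                    {
                        "eventType": payload["event_type"],
                        "itemId": payload["item_id"],
                        "sentAt": datetime.now(timezone.utc),
                    }
                ],
            )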

For the Analytics domain, which we added to showcase the scalability of this architecture, we use a Lambda function to ingest the real-time events into Amazon Kinesis Data Firehose. Then we can visualize the interactions using Amazon OpenSearch Service in conjunction with Amazon QuickSight. For more details, refer to Visualize live analytics from Amazon QuickSight connected to Amazon OpenSearch Service.

Because the data producers, Kafka resources, and data consumers are all in different accounts, we need to establish cross-account connectivity to keep the traffic within the AWS infrastructure and avoid the public internet, both for security reasons as well as cost-optimization. The objective of this post is to show the architecture and the approach to implement this pattern. If you want to dive deeper into how to establish cross-account connectivity between producers and consumers and Amazon MSK, refer to Secure connectivity patterns to access Amazon MSK and How Goldman Sachs builds cross-account connectivity to their Amazon MSK clusters with AWS Private Link.

Data mesh with event streaming: Putting it all together

Earlier, we recalled the data mesh paradigm and designed a solution to emphasize the importance of adopting a data as a product strategy. Each producer domain exposes the data via the catalog, and they are made centrally discoverable through the Centralized Data Catalog. Each consumer domain has a catalog interface for connecting to the central catalog and finding the data required to build the solution the domain focuses on.

Next, we studied the scenario for data in motion, introduced Apache Kafka and Amazon MSK to implement the event streaming platform, and connected the producers and consumers with the streaming service via Lambda. This event-driven implementation allows us to decouple the producers from the consumers, and make the solution scalable as the domains may change and evolve during time, without significant changes required in the architecture.

We can now put it all together, as shown in the following figure. The complete data mesh with event streaming architecture uses two different data planes: one is dedicated for sharing data at rest (blue); the other one is for data in motion (red).

Each domain has two interfaces required to communicate with both planes: the data catalogs and the Lambda functions. The data at rest is shared and discovered by taking advantage of the data catalogs, whereas the data in motion are emitted by the service running the backend logic in the producer domains. They’re consumed using the Lambda functions subscribed to the topics, which are deployed in the consumer domains.

Conclusion

In this post, we introduced the high-level architecture paradigm that allows you to extend the concept of a data mesh to real-time events.

We first covered the fundamental concepts associated with this architectural style, and then showcased how to apply this solution to solve real-world business challenges, such as real-time personalized recommendations and analytics, in a multi-account setting on AWS.

Furthermore, the framework presented in this post can be generalized to different domains, for example other AWS AI services such as Amazon Forecast or Amazon Comprehend, or your custom ML solutions built for your specific scenario and deployed through Amazon SageMaker. With the most experience, the most reliable, scalable and secure cloud, and the most comprehensive set of services and solutions, AWS is the best place to unlock value from your data.

More resources:


About the authors

Vittorio Denti is a Solutions Architect at AWS based in London. After completing his M.Sc. in Computer Science and Engineering at Politecnico di Milano (Milan) and the KTH Royal Institute of Technology (Stockholm), he joined AWS. Vittorio has a background in Distributed Systems and Machine Learning, and a strong interest in cloud technologies. He’s especially passionate for software engineering, building ML models, and putting ML into production.

Anna Grüebler is a Specialist Solutions Architect at AWS focusing on Artificial Intelligence. She has more than 10 years of experience helping customers develop and deploy machine learning applications. Her passion is taking new technologies and putting them in the hands of everyone, and solving difficult problems by leveraging the advantages of using AI in the cloud.

Amazon Personalize customer outreach on your ecommerce platform

Post Syndicated from Sridhar Chevendra original https://aws.amazon.com/blogs/architecture/amazon-personalize-customer-outreach-on-your-ecommerce-platform/

In the past, brick-and-mortar retailers leveraged native marketing and advertisement channels to engage with consumers. They have promoted their products and services through TV commercials, and magazine and newspaper ads. Many of them have started using social media and digital advertisements. Although marketing approaches are beginning to modernize and expand to digital channels, businesses still depend on expensive marketing agencies and inefficient manual processes to measure campaign effectiveness and understand buyer behavior. The recent pandemic has forced many retailers to take their businesses online. Those who are ready to embrace these changes have embarked on a technological and digital transformation to connect to their customers. As a result, they have begun to see greater business success compared to their peers.

Digitizing a business can be a daunting task, due to lack of expertise and high infrastructure costs. By using Amazon Web Services (AWS), retailers are able to quickly deploy their products and services online with minimal overhead. They don’t have to manage their own infrastructure. With AWS, retailers have no upfront costs, have minimal operational overhead, and have access to enterprise-level capabilities that scale elastically, based on their customers’ demands. Retailers can gain a greater understanding of customers’ shopping behaviors and personal preferences. Then, they are able to conduct effective marketing and advertisement campaigns, and develop and measure customer outreach. This results in increased satisfaction, higher retention, and greater customer loyalty. With AWS you can manage your supply chain and directly influence your bottom line.

Building a personalized shopping experience

Let’s dive into the components involved in building this experience. The first step in a retailer’s digital transformation journey is to create an ecommerce platform for their customers. This platform enables the organization to capture their customers’ actions, also referred to as ‘events’. Some examples of events are clicking on the shopping site to browse product categories, searching for a particular product, adding an item to the shopping cart, and purchasing a product. Each of these events gives the organization information about their customer’s intent, which is invaluable in creating a personalized experience for that customer. For instance, if a customer is browsing the “baby products” category, it indicates their interest in that category even if a purchase is not made. These insights are typically difficult to capture in an in-store experience. Online shopping makes gaining this knowledge much more straightforward and scalable.

The proposed solution outlines the use of AWS services to create a digital experience for a retailer and consumers. The three key areas are: 1) capturing customer interactions, 2) making real-time recommendations using AWS managed Artificial Intelligence/Machine Learning (AI/ML) services, and 3) creating an analytics platform to detect patterns and adjust customer outreach campaigns. Figure 1 illustrates the solution architecture.


Figure 1. Digital shopping experience architecture

For this use case, let’s assume that you are the owner of a local pizzeria, and you manage deliveries through an ecommerce platform like Shopify or WooCommerce. We will walk you through how to best serve your customer with a personalized experience based on their preferences.

The proposed solution consists of the following components:

  1. Data collection
  2. Promotion campaigns
  3. Recommendation engine
  4. Data analytics
  5. Customer reachability

Let’s explore each of these components separately.

Data collection with Amazon Kinesis Data Streams

When a customer uses your web/mobile application to order a pizza, the application captures their activity as click-stream ‘events’. These events provide valuable insights about your customers’ behavior. You can use these insights to understand the trends and browsing pattern of prospects who visited your web/mobile app, and use the data collected for creating promotion campaigns. As your business scales, you’ll need a durable system to preserve these events against system failures, and scale based on unpredictable traffic on your platform.

Amazon Kinesis is a Multi-AZ, managed streaming service that provides resiliency, scalability, and durability to capture an unlimited number of events without any additional operational overhead. Using Kinesis producers (Kinesis Agent, Kinesis Producer Library, and the Kinesis API), you can configure applications to capture your customer activity. You can ingest these events from the frontend, and then publish them to Amazon Kinesis Data Streams.
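
For illustration, here is a minimal sketch of publishing a click-stream event with the Kinesis API (the stream name and payload are placeholders):

import json

import boto3

kinesis = boto3.client("kinesis")

# Publish a single click-stream event; the customer ID is used as the
# partition key so that a customer's events stay ordered within a shard.
kinesis.put_record(
    StreamName="clickstream-events",  # placeholder stream name
    Data=json.dumps(
        {"customer_id": "123", "event_type": "add_to_cart", "item_id": "79323P"}
    ).encode("utf-8"),
    PartitionKey="123",
)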

Let us start by setting up Amazon Kinesis Data Streams to capture the real-time sales transactions from online channels like a portal or mobile app. For this blog post, we have used a public dataset from Kaggle as a reference. Figure 2 illustrates a snapshot of sample data used to build personalized recommendations for a customer.


Figure 2. Sample sales transaction data

Promotion campaigns with AWS Lambda

One way to increase customer conversion is by offering discounts. When the customer adds a pizza to their cart, you want to make sure they are receiving the best deal. Let’s assume that by adding an additional item, your customer will receive the best possible discount. Just by knowing the total cost of added items to the cart, you can provide these relevant promotions to this customer.

For this scenario, AWS Lambda polls the Amazon Kinesis data stream and reads the events in it, matching them against your criteria for items in the cart. The Lambda function then reads your up-to-date promotions stored in Amazon DynamoDB. As an option, caching recent or popular promotions improves your application response time, as well as the customer experience on your platform. Amazon DynamoDB Accelerator (DAX) is an integrated caching service for DynamoDB that can cache the most recent or popular promotions or items.
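
A simplified sketch of such a Lambda function follows; the table name, promotion schema, and helper functions are hypothetical and shown only to illustrate the flow:

import base64
import json

import boto3

dynamodb = boto3.resource("dynamodb")
promotions_table = dynamodb.Table("Promotions")  # placeholder table name


def handler(event, context):
    for record in event["Records"]:
        # Kinesis record payloads are base64-encoded.
        cart_event = json.loads(base64.b64decode(record["kinesis"]["data"]))
        cart_total = float(cart_event.get("cart_total", 0))

        # Look up the best promotion for this cart value, e.g. free shipping
        # above a certain amount (hypothetical key schema).
        promotion = promotions_table.get_item(
            Key={"promo_tier": best_tier_for(cart_total)}
        ).get("Item")
        if promotion:
            notify_customer(cart_event["customer_id"], promotion)


def best_tier_for(cart_total):
    # Hypothetical helper mapping a cart total to a promotion tier.
    return "FREE_SHIPPING" if cart_total >= 25 else "PERCENT_OFF"


def notify_customer(customer_id, promotion):
    # Hypothetical helper; in practice this could publish to Amazon Pinpoint
    # or push a message back to the frontend.
    print(f"Offering {promotion} to customer {customer_id}")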

For example, when the customer adds items to their shopping cart, Lambda sends them promotion details based on the purchase amount, such as free shipping or a discount of a certain percentage. Figure 3 illustrates a snapshot of sample promotions.


Figure 3. Promotions table in DynamoDB

Recommendations engine with Amazon Personalize

In addition to sharing these promotions with your customer, you may also want to share the recommended add-ons. In order to understand your customer preferences, you must gather historical datasets to determine patterns and generate relevant recommendations. Since web activity consists of millions of events, this would be a daunting task for humans to review, determine the patterns, and make recommendations. And since user preferences change, you need a system that can use all this volume of data and provide accurate predictions.

Amazon Personalize is a managed AI/ML service that helps you train an ML model based on your datasets, without requiring prior ML experience. It provides an inference endpoint for real-time recommendations. Based on the datasets, Amazon Personalize also provides recipes to generate recommendations. As your customers interact on the ecommerce platform, your frontend application calls the Amazon Personalize inference endpoint and retrieves a set of personalized recommendations based on your customer preferences.

Here is sample Python code to display the list of available recommenders and retrieve the associated recommendations.

import boto3
import json

# Control-plane client, used to list the recommenders available in this account and Region
client = boto3.client('personalize')
for recommender in client.list_recommenders()['recommenders']:
    print(recommender['name'], recommender['recommenderArn'])

# Connect to the Personalize runtime to retrieve the customer recommendations
recomm_endpoint = boto3.client('personalize-runtime')
response = recomm_endpoint.get_recommendations(itemId='79323P',
  recommenderArn='arn:aws:personalize:us-east-1::recommender/my-items',
  numResults=5)

print(json.dumps(response['itemList'], indent=2))

[
  {
    "itemId": "79323W"
  },
  {
    "itemId": "79323GR"
  },
  {
    "itemId": "79323LP"
  },
  {
  "itemId": "79323B"
  },
  {
    "itemId": "79323G"
  }
]

You can use Amazon Kinesis Data Firehose to read the data in near real time from the Amazon Kinesis data stream that collects events from the front-end applications, and store it in Amazon Simple Storage Service (Amazon S3). Amazon S3 provides petabyte-scale storage that acts as a repository and single source of truth. We use the S3 data as seed data to build a personalized recommendation engine with Amazon Personalize. As your customers interact on the ecommerce platform, call the Amazon Personalize inference endpoint to make personalized recommendations based on user preferences.
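
As a hedged sketch of seeding Amazon Personalize from that S3 data with a dataset import job (the bucket, dataset ARN, and role are placeholders):

import boto3

personalize = boto3.client("personalize")

# Bulk-import the interactions collected in S3 into the Personalize
# interactions dataset. The role must allow Personalize to read the bucket.
personalize.create_dataset_import_job(
    jobName="seed-interactions",
    datasetArn="arn:aws:personalize:us-east-1:123456789012:dataset/demo/INTERACTIONS",
    dataSource={"dataLocation": "s3://your-bucket/interactions/interactions.csv"},
    roleArn="arn:aws:iam::123456789012:role/PersonalizeS3AccessRole",
)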

Customer reachability with Amazon Pinpoint

If a customer adds products to their cart but never checks out, you may want to send them a reminder. You can set up an email to suggest they re-order after a period of time after their first order. Or you may want to send them promotions based on their preferences. And as your customers’ behavior changes, you probably want to adapt your messaging accordingly.

Your customer may have a communication preference, such as phone, email, SMS, or in-app notifications. If an order has an issue, you can inform the customer as soon as possible using their preferred method of communication, and perhaps follow it up with a discount.

Amazon Pinpoint is a flexible and scalable outbound and inbound marketing communications service. You can add users to Audience Segments, create reusable content templates integrated with Amazon Personalize, and run scheduled campaigns. With Amazon Pinpoint journeys, you can send action or time-based notifications to your users.

The workflow shown in Figure 4 illustrates a customer communication workflow for a promotion. A journey is created for a cohort of college students: a “Free Drink” promotion is offered with a new order. You can send this promotion over email. If the student opens the email, you can immediately send them a push notification reminding them to place an order. But if they don’t open the email, you could wait three days and follow up with a text message.


Figure 4. Promotion workflow in Amazon Pinpoint

Data analytics with Amazon Athena and Amazon QuickSight

To understand the effectiveness of your campaigns, you can use S3 data as a source for Amazon Athena. Athena is an interactive query service that analyzes data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run.

There are different ways to create visualizations in Amazon QuickSight. For instance, you can use Amazon S3 as a data lake. One option is to import your data into SPICE (Super-fast, Parallel, In-memory Calculation Engine) to provide high performance and concurrency. You can also create a direct connection to the underlying data source. For this use case, we choose to import to SPICE, which provides faster visualization in a production setup. Schedule consistent refreshes to help ensure that dashboards are referring to the most current data.

Once your data is imported to your SPICE, review QuickSight’s visualization dashboard. Here, you’ll be able to choose from a wide variety of charts and tables, while adding interactive features like drill downs and filters.

The following process illustrates how to create a customer outreach strategy using ZIP codes and allocate budgets to the marketing campaigns accordingly. First, we use this sample SQL command, which we ran in Athena, to query for the top 10 pizza providers. The results are shown in Figure 5.

SELECT name, count(*) as total_count FROM "analyticsdemodb"."fooddatauswest2"
group by name
order by total_count desc
limit 10


Figure 5. Athena query results for top 10 pizza providers

Second, here is the sample SQL command that we ran in Athena to find total pizza counts by postal code (ZIP code). Figure 6 shows a visualization that helps create a customer outreach strategy per ZIP code and budget the marketing campaigns accordingly.

SELECT postalcode, count(*) as total_count FROM "analyticsdemodb"."fooddatauswest2"
where postalcode is not null
group by postalcode
order by total_count desc limit 50;


Figure 6. QuickSight visualization showing pizza orders by zip codes

Conclusion

AWS enables you to build an ecommerce platform and scale your existing business with minimal operational overhead and no upfront costs. You can augment your ecommerce platform by building personalized recommendations and effective marketing campaigns based on your customer needs. The solution approach provided in this blog helps organizations build a reusable architecture pattern and personalization using AWS managed services.

Target your customers with ML based on their interest in a product or product attribute.

Post Syndicated from Pavlos Ioannou Katidis original https://aws.amazon.com/blogs/messaging-and-targeting/use-machine-learning-to-target-your-customers-based-on-their-interest-in-a-product-or-product-attribute/

Customer segmentation allows marketers to better tailor their efforts to specific subgroups of their audience. Businesses who employ customer segmentation can create and communicate targeted marketing messages that resonate with specific customer groups. Segmentation increases the likelihood that customers will engage with the brand, and reduces the potential for communications fatigue—that is, the disengagement of customers who feel like they’re receiving too many messages that don’t apply to them. For example, if your business wants to launch an email campaign about business suits, the target audience should only include people who wear suits.

This blog presents a solution that uses Amazon Personalize to generate highly personalized Amazon Pinpoint customer segments. Using Amazon Pinpoint, you can send messages to those customer segments via campaigns and journeys.

Personalizing Pinpoint segments

Marketers first need to understand their customers by collecting customer data such as key characteristics, transactional data, and behavioral data. This data helps to form buyer personas, understand how they spend their money, and what type of information they’re interested in receiving.

You can create two types of customer segments in Amazon Pinpoint: imported and dynamic. With both types of segments, you need to perform customer data analysis and identify behavioral patterns. After you identify the segment characteristics, you can build a dynamic segment that includes the appropriate criteria. You can learn more about dynamic and imported segments in the Amazon Pinpoint User Guide.

Businesses selling their products and services online could benefit from segments based on known customer preferences, such as product category, color, or delivery options. Marketers who want to promote a new product or inform customers about a sale on a product category can use these segments to launch Amazon Pinpoint campaigns and journeys, increasing the probability that customers will complete a purchase.

Building targeted segments requires you to obtain historical customer transactional data, and then invest time and resources to analyze it. This is where the use of machine learning can save time and improve the accuracy.

Amazon Personalize is a fully managed machine learning service, which requires no prior ML knowledge to operate. It offers ready-to-use models, called recipes, for segment creation as well as product recommendations. Using Amazon Personalize USER_SEGMENTATION recipes, you can generate segments based on a product ID or a product attribute.

About this solution

The solution is based on the following reference architectures:

Both of these architectures are deployed as nested stacks along the main application to showcase how contextual segmentation can be implemented by integrating Amazon Personalize with Amazon Pinpoint.

High level architecture

Architecture Diagram

Once training data and training configuration are uploaded to the Personalize data bucket (1), an AWS Step Functions state machine is executed (2). This state machine implements a training workflow to provision all required resources within Amazon Personalize. It trains a recommendation model (3a) based on the Item-Attribute-Affinity recipe. Once the solution is created, the workflow creates a batch segment job to get user segments (3b). The job configuration focuses on providing segments of users that are interested in action genre movies:

{ "itemAttributes": "ITEMS.genres = \"Action\"" }

When the batch segment job finishes, the result is uploaded to Amazon S3 (3c). The training workflow state machine publishes Amazon Personalize state changes on a custom event bus (4). An Amazon EventBridge rule listens for events indicating that a batch segment job has finished (5). Once this event is put on the event bus, a batch segment postprocessing workflow is executed as an AWS Step Functions state machine (6). This workflow reads and transforms the segment job output from Amazon Personalize (7) into a CSV file that can be imported as a static segment into Amazon Pinpoint (8). The CSV file contains only the Amazon Pinpoint endpoint IDs that refer to the corresponding users from the Amazon Personalize recommendation segment, in the following format:

Id
hcbmnng30rbzf7wiqn48zhzzcu4
tujqyruqu2pdfqgmcgkv4ux7qlu
keul5pov8xggc0nh9sxorldmlxc
lcuxhxpqh/ytkitku2zynrqb2ce

The mechanism for resolving an Amazon Pinpoint endpoint ID relies on the user ID set in Amazon Personalize also being referenced in each endpoint within Amazon Pinpoint through the user ID attribute.

State machine for getting Amazon Pinpoint endpoints

The workflow ensures that the segment file has a unique filename so that the segments within Amazon Pinpoint can be identified independently. Once the segment CSV file is uploaded to S3 (7), the segment import workflow creates a new imported segment within Amazon Pinpoint (8).
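
The import step could look roughly like the following sketch (the application ID, bucket path, role, and segment name are placeholders):

import boto3

pinpoint = boto3.client("pinpoint")

# Import the CSV of endpoint IDs produced by the postprocessing workflow
# as a new static segment in the Amazon Pinpoint project.
pinpoint.create_import_job(
    ApplicationId="your-pinpoint-project-id",
    ImportJobRequest={
        "Format": "CSV",
        "S3Url": "s3://segment-import-bucket/pinpoint/action-movie-fans.csv",
        "RoleArn": "arn:aws:iam::123456789012:role/PinpointSegmentImportRole",
        "DefineSegment": True,
        "SegmentName": "action-movie-fans",
    },
)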

Datasets

The solution uses an artificially generated movies’ dataset called Bingewatch for demonstration purposes. The data is pre-processed to make it usable in the context of Amazon Personalize and Amazon Pinpoint. The pre-processed data consists of the following:

  • Interactions’ metadata created out of the Bingewatch ratings.csv
  • Items’ metadata created out of the Bingewatch movies.csv
  • Users’ metadata created out of the Bingewatch ratings.csv, enriched with invented data about e-mail addresses and age
  • Amazon Pinpoint endpoint data

Interactions’ dataset

The interaction dataset describes movie ratings from Bingewatch users. Each row describes a single rating by a user identified by a user id.

The EVENT_VALUE describes the actual rating from 1.0 to 5.0 and the EVENT_TYPE specifies that the rating resulted because a user watched this movie at the given TIMESTAMP, as shown in the following example:

USER_ID,ITEM_ID,EVENT_VALUE,EVENT_TYPE,TIMESTAMP
1,1,4.0,Watch,964982703 
2,3,4.0,Watch,964981247
3,6,4.0,Watch,964982224
...

Items’ dataset

The item dataset describes each available movie using a TITLE, RELEASE_YEAR, CREATION_TIMESTAMP, and a pipe-concatenated list of GENRES, as shown in the following example:

ITEM_ID,TITLE,RELEASE_YEAR,CREATION_TIMESTAMP,GENRES
1,Toy Story,1995,788918400,Adventure|Animation|Children|Comedy|Fantasy
2,Jumanji,1995,788918400,Adventure|Children|Fantasy
3,Grumpier Old Men,1995,788918400,Comedy|Romance
...

Users’ dataset

The users dataset contains all known users identified by a USER_ID. This dataset contains artificially generated metadata that describes each user's GENDER, E_MAIL, and AGE, as shown in the following example:

USER_ID,GENDER,E_MAIL,AGE
1,Female,[email protected],21
2,Female,[email protected],35
3,Male,[email protected],37
4,Female,[email protected],47
5,Agender,[email protected],50
...

Amazon Pinpoint endpoints

To map Amazon Pinpoint endpoints to users in Amazon Personalize, it is important to have a consistent user identifier. The mechanism for resolving an Amazon Pinpoint endpoint ID relies on the user ID in Amazon Personalize also being referenced in each endpoint within Amazon Pinpoint through the userId attribute, as shown in the following example:

User.UserId,ChannelType,User.UserAttributes.Gender,Address,User.UserAttributes.Age
1,EMAIL,Female,[email protected],21
2,EMAIL,Female,[email protected],35
3,EMAIL,Male,[email protected],37
4,EMAIL,Female,[email protected],47
5,EMAIL,Agender,[email protected],50
...

Solution implementation

Prerequisites

To deploy this solution, you must have the following:

Note: This solution creates an Amazon Pinpoint project with the name personalize. If you want to deploy this solution on an existing Amazon Pinpoint project, you will need to make changes to the YAML template.

Deploy the solution

Step 1: Deploy the SAM solution

Clone the GitHub repository to your local machine (how to clone a GitHub repository). Navigate to the cloned repository location on your local machine and execute the command below using the SAM CLI:

sam deploy --stack-name contextual-targeting --guided

Fill in the fields below as displayed. Change the AWS Region to an AWS Region of your preference where Amazon Pinpoint and Amazon Personalize are available. The Email parameter is used by Amazon Simple Notification Service (SNS) to send you an email notification when the Amazon Personalize job is completed.

Configuring SAM deploy
======================
        Looking for config file [samconfig.toml] :  Not found
        Setting default arguments for 'sam deploy'     =========================================
        Stack Name [sam-app]: contextual-targeting
        AWS Region [us-east-1]: eu-west-1
        Parameter Email []: [email protected]
        Parameter PEVersion [v1.2.0]:
        Parameter SegmentImportPrefix [pinpoint/]:
        #Shows you resources changes to be deployed and require a 'Y' to initiate deploy
        Confirm changes before deploy [y/N]:
        #SAM needs permission to be able to create roles to connect to the resources in your template
        Allow SAM CLI IAM role creation [Y/n]:
        #Preserves the state of previously provisioned resources when an operation fails
        Disable rollback [y/N]:
        Save arguments to configuration file [Y/n]:
        SAM configuration file [samconfig.toml]:
        SAM configuration environment [default]:
        Looking for resources needed for deployment:
        Creating the required resources...
        [...]
        Successfully created/updated stack - contextual-targeting in eu-west-1
======================

Step 2: Import the initial segment to Amazon Pinpoint

We will import some initial and artificially generated endpoints into Amazon Pinpoint.

Execute the command below using the AWS CLI on your local machine.

The command below is compatible with Linux:

SEGMENT_IMPORT_BUCKET=$(aws cloudformation describe-stacks --stack-name contextual-targeting --query 'Stacks[0].Outputs[?OutputKey==`SegmentImportBucket`].OutputValue' --output text)
aws s3 sync ./data/pinpoint s3://$SEGMENT_IMPORT_BUCKET/pinpoint

For Windows PowerShell use the command below:

$SEGMENT_IMPORT_BUCKET = (aws cloudformation describe-stacks --stack-name contextual-targeting --query 'Stacks[0].Outputs[?OutputKey==`SegmentImportBucket`].OutputValue' --output text)
aws s3 sync ./data/pinpoint s3://$SEGMENT_IMPORT_BUCKET/pinpoint

Step 3: Upload training data and configuration for Amazon Personalize

Now we are ready to train our initial recommendation model. This solution provides you with dummy training data as well as a training and inference configuration, which needs to be uploaded into the Amazon Personalize S3 bucket. Training the model can take between 45 and 60 minutes.

Execute the command below using the AWS CLI on your local machine.

The command below is compatible with Linux:

PERSONALIZE_BUCKET=$(aws cloudformation describe-stacks --stack-name contextual-targeting --query 'Stacks[0].Outputs[?OutputKey==`PersonalizeBucketName`].OutputValue' --output text)
aws s3 sync ./data/personalize s3://$PERSONALIZE_BUCKET

For Windows PowerShell use the command below:

$PERSONALIZE_BUCKET = (aws cloudformation describe-stacks --stack-name contextual-targeting --query 'Stacks[0].Outputs[?OutputKey==`PersonalizeBucketName`].OutputValue' --output text)
aws s3 sync ./data/personalize s3://$PERSONALIZE_BUCKET

Step 4: Review the inferred segments from Amazon Personalize

Once the training workflow is completed, you should receive an email on the email address you provided when deploying the stack. The email should look like the one in the screenshot below:

SNS notification for Amazon Personalize job

Navigate to the Amazon Pinpoint console > Your Project > Segments, and you should see two imported segments: one named endpoints.csv, which contains all endpoints imported in Step 2, and one named ITEMSgenresAction_<date>-<time>.csv, which contains the IDs of the endpoints that Amazon Personalize inferred are interested in action movies.

Amazon Pinpoint segments created by the solution

You can engage with Amazon Pinpoint customer segments via Campaigns and Journeys. For more information on how to create and execute Amazon Pinpoint Campaigns and Journeys visit the workshop Building Customer Experiences with Amazon Pinpoint.
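
As an illustration only (the solution itself stops at segment creation), an email campaign targeting the imported segment could be created programmatically along the following lines. The project ID, segment ID, and sender address are placeholders, and the sender must be a verified identity in your account.

import boto3

pinpoint = boto3.client("pinpoint")

response = pinpoint.create_campaign(
    ApplicationId="YOUR_PINPOINT_PROJECT_ID",     # placeholder
    WriteCampaignRequest={
        "Name": "action-movie-recommendations",
        "SegmentId": "YOUR_IMPORTED_SEGMENT_ID",  # placeholder
        "Schedule": {"StartTime": "IMMEDIATE", "Frequency": "ONCE"},
        "MessageConfiguration": {
            "EmailMessage": {
                "Title": "Movies we think you will love",
                "FromAddress": "marketing@example.com",  # placeholder, must be a verified identity
                "HtmlBody": "<html><body><p>Check out this week's action picks!</p></body></html>",
            }
        },
    },
)
print(response["CampaignResponse"]["Id"])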

Next steps

Contextual targeting is not bound to a single channel, such as the email channel used in this solution. You can extend the batch-segmentation-postprocessing workflow to fit your engagement and targeting requirements.

For example, you could implement several branches based on the referenced endpoints’ channel types and create Amazon Pinpoint customer segments that can be engaged via push notifications, SMS, outbound voice, and in-app messages, as sketched below.
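
A possible shape for such a branch, sketched here as plain Python rather than the solution’s actual Lambda code, is to group the resolved endpoints by channel type and write one segment CSV per channel for import into Amazon Pinpoint.

import csv
from collections import defaultdict

def split_endpoints_by_channel(endpoints, output_dir):
    """Group resolved Pinpoint endpoints by channel type and write one segment CSV per channel."""
    by_channel = defaultdict(list)
    for endpoint in endpoints:
        by_channel[endpoint.get("ChannelType", "EMAIL")].append(endpoint)

    for channel, items in by_channel.items():
        path = f"{output_dir}/segment_{channel.lower()}.csv"
        with open(path, "w", newline="") as csv_file:
            writer = csv.writer(csv_file)
            writer.writerow(["Id", "ChannelType", "Address"])
            for item in items:
                writer.writerow([item["Id"], channel, item.get("Address", "")])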

Clean-up

To delete the solution, run the following commands in the AWS CLI.

The command below is compatible with Linux:

SEGMENT_IMPORT_BUCKET=$(aws cloudformation describe-stacks --stack-name contextual-targeting --query 'Stacks[0].Outputs[?OutputKey==`SegmentImportBucket`].OutputValue' --output text)
PERSONALIZE_BUCKET=$(aws cloudformation describe-stacks --stack-name contextual-targeting --query 'Stacks[0].Outputs[?OutputKey==`PersonalizeBucketName`].OutputValue' --output text)
aws s3 rm s3://$SEGMENT_IMPORT_BUCKET/ --recursive
aws s3 rm s3://$PERSONALIZE_BUCKET/ --recursive
sam delete

For Windows PowerShell use the command below:

$SEGMENT_IMPORT_BUCKET=$(aws cloudformation describe-stacks --stack-name contextual-targeting --query 'Stacks[0].Outputs[?OutputKey==`SegmentImportBucket`].OutputValue' --output text)
$PERSONALIZE_BUCKET=$(aws cloudformation describe-stacks --stack-name contextual-targeting --query 'Stacks[0].Outputs[?OutputKey==`PersonalizeBucketName`].OutputValue' --output text)
aws s3 rm s3://$SEGMENT_IMPORT_BUCKET/ --recursive
aws s3 rm s3://$PERSONALIZE_BUCKET/ --recursive
sam delete

Amazon Personalize resources such as dataset groups, datasets, and schemas are not created via AWS CloudFormation, so you have to delete them manually. Please follow the instructions in the official AWS documentation on how to clean up the created resources.
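
If you prefer to script that clean-up, the boto3 sketch below shows the general idea. The dataset group ARN is a placeholder, child resources (solutions and datasets) must be deleted before the dataset group itself, and each delete call is asynchronous, so wait for the deletions to complete before removing the dataset group.

import boto3

personalize = boto3.client("personalize")
dataset_group_arn = "arn:aws:personalize:eu-west-1:111122223333:dataset-group/personalize"  # placeholder

# Delete solutions first, then datasets, then (after the deletions finish) the dataset group.
for solution in personalize.list_solutions(datasetGroupArn=dataset_group_arn)["solutions"]:
    personalize.delete_solution(solutionArn=solution["solutionArn"])

for dataset in personalize.list_datasets(datasetGroupArn=dataset_group_arn)["datasets"]:
    personalize.delete_dataset(datasetArn=dataset["datasetArn"])

personalize.delete_dataset_group(datasetGroupArn=dataset_group_arn)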

About the Authors

Pavlos Ioannou Katidis

Pavlos Ioannou Katidis is an Amazon Pinpoint and Amazon Simple Email Service Specialist Solutions Architect at AWS. He loves to dive deep into his customers’ technical issues and help them design communication solutions. In his spare time, he enjoys playing tennis, watching crime TV series, playing FPS PC games, and coding personal projects.

Christian Bonzelet

Christian Bonzelet is an AWS Solutions Architect at DFL Digital Sports. He loves the challenge of providing highly scalable systems for millions of users and collaborating with lots of people to design systems in front of a whiteboard. He has used AWS since 2013, when he built a voting system for a big live TV show in Germany. Since then, he has become a big fan of cloud, AWS, and domain-driven design.

AWS Week in Review – August 15, 2022

Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/aws-week-in-review-august-15-2022/

I love the AWS Twitch channel for watching interesting online live shows such as AWS On Air, Containers from the Couch, and Serverless Office Hours.

Last week, AWS Storage Day 2022 was hosted virtually on the AWS Twitch channel and covered recent announcements and insights that address customers’ needs to reduce and optimize storage costs and build data resiliency into their organization. For example, we pre-announced Amazon File Cache, an upcoming new service on AWS that accelerates and simplifies hybrid cloud workloads. To learn more, watch the on-demand recording.

Two weeks ago, AWS Silicon Innovation Day 2022 was also hosted on the AWS Twitch channel. This event covered an overview of our history of silicon development and provided useful sessions on specific AWS chip innovations such as AWS Nitro, AWS Graviton, AWS Inferentia, and AWS Trainium. To learn more, watch the on-demand recording. If you don’t want to miss such useful live events or online shows, check out the upcoming live schedule!

Last Week’s Launches
Here are some launches that caught my eye last week:

AWS Private 5G – With the general availability of AWS Private 5G, you can set up your own private mobile network using an integrated package of hardware and software for 4G/LTE mobile networks. This new service lets you easily install, operate, and scale a highly reliable, low-latency private cellular network in a matter of days, and it does not require any specialized expertise. You pay only for the network coverage and capacity that you need.

AWS DeepRacer Student Community Races – Educators and event organizers can now create their own private virtual autonomous racing leagues for students, with 1/18th scale race cars driven by reinforcement learning. They can select their own track, race date, and time, and invite students to participate through a unique link for their event. To learn more, see the AWS DeepRacer Developer Guide.

Amazon SageMaker Updates – Amazon SageMaker Automatic Model Tuning now supports specifying multiple alternate SageMaker training instance types to make tuning jobs more robust when the preferred instance type is not available due to insufficient capacity. SageMaker Model Pipelines supports securely sharing pipeline entities across AWS accounts and access to shared pipelines through direct API calls. SageMaker Canvas expands capabilities to better prepare and analyze data, including replacing missing values and outliers and the flexibility to choose different sample sizes for your datasets.

Amazon Personalize Updates – Amazon Personalize supports incremental bulk dataset imports, a new option for updating your data and improving the quality of your recommendations. Also, Amazon Personalize allows you to promote specific items in all users’ recommendations based on rules that align with your business goals.
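
For context, an incremental import is requested through the importMode parameter of a dataset import job. A minimal boto3 sketch, with placeholder ARNs and S3 location, looks like this:

import boto3

personalize = boto3.client("personalize")

response = personalize.create_dataset_import_job(
    jobName="interactions-incremental-import",
    datasetArn="arn:aws:personalize:us-east-1:111122223333:dataset/my-group/INTERACTIONS",  # placeholder
    dataSource={"dataLocation": "s3://my-personalize-bucket/interactions/delta.csv"},       # placeholder
    roleArn="arn:aws:iam::111122223333:role/personalize-s3-access",                         # placeholder
    importMode="INCREMENTAL",  # append new records instead of replacing the whole dataset
)
print(response["datasetImportJobArn"])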

AWS Partner Program Updates – We announce the new AWS Transfer Family Delivery Program for AWS Partners that helps customers build sophisticated Managed File Transfer (MFT) and business-to-business (B2B) file exchange solutions with AWS Transfer Family. Also, we introduce the new AWS Supply Chain Competency, featuring top AWS Partners who provide professional services and cloud-native supply chain solutions on AWS.

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

Other AWS News
Here are some other news items that you may find interesting:

AWS CDK for Terraform – Two years ago, AWS began collaborating with HashiCorp to develop Cloud Development Kit for Terraform (CDKTF), an open-source tool that provides a developer-friendly workflow for deploying cloud infrastructure with Terraform in their preferred programming language. The CDKTF is now generally available, so try CDK for Terraform and AWS CDK.

Smithy Interface Definition Language (IDL) 2.0 – Smithy is Amazon’s next-generation API modeling language, based on our experience building tens of thousands of services and generating SDKs. This release focuses on improving the developer experience of authoring Smithy models and using code generated from Smithy models.

Serverless Snippets Collection – The AWS Serverless Developer Advocate team introduces the snippets collection to enable reusable, tested, and recommended snippets driven and maintained by the community. Builders can use serverless snippets to find and integrate tools and code examples to help with their development workflow. I recommend searching other useful resources such as Serverless patterns and workflows collection to get started on your serverless application.

Upcoming AWS Events
Check your calendars and sign up for these AWS events:

AWS Summit

AWS Summit – Registration is open for upcoming in-person AWS Summits that might be close to you in August and September: Anaheim (August 18), Chicago (August 28), Canberra (August 31), Ottawa (September 8), New Delhi (September 9), and Mexico City (September 21–22).

AWS Innovate – Data Edition – On August 23, learn how a modern data strategy can support your present and future use cases, including steps to build an end-to-end data solution to store and access, analyze and visualize, and even predict.

AWS Innovate – For Every Application Edition – On August 25, learn about a wide selection of AWS solutions across compute, storage, networking, hybrid, and edge infrastructure to help you scale application resources seamlessly and optimally.

Although these two Innovate events will be held in the Asia Pacific and Japan time zones, you can view on-demand videos for two months following your registration.

Also, we are preparing 16 upcoming online tech talks on August 15–26  to cover a range of topics and expertise levels and feature technical deep dives, demonstrations, customer examples, and live Q&A with AWS experts.

That’s all for this week. Check back next Monday for another Week in Review!

— Channy

This post is part of our Week in Review series. Check back each week for a quick roundup of interesting news and announcements from AWS!

Automating Recommendation Engine Training with Amazon Personalize and AWS Glue

Post Syndicated from Alexander Spivak original https://aws.amazon.com/blogs/architecture/automating-recommendation-engine-training-with-amazon-personalize-and-aws-glue/

Customers from startups to enterprises observe increased revenue when personalizing customer interactions. Still, many companies are not yet leveraging the power of personalization, or are relying solely on rule-based strategies. Those strategies are effort-intensive to maintain and not effective. Common reasons for not launching machine learning (ML) based personalization projects include the complexity of aggregating and preparing the datasets, gaps in data science expertise, and a lack of trust regarding the quality of ML recommendations.

This blog post demonstrates an approach for product recommendations to mitigate those concerns using historical datasets. To get started with your personalization journey, you don’t need ML expertise or a data lake. The following serverless end-to-end architecture involves aggregating and transforming the required data, as well as automatically training an ML-based recommendation engine.

I will outline the architectural production-ready setup for personalized product recommendations based on historical datasets. This is of interest to data analysts who look for ways to bring an existing recommendation engine to production, as well as solutions architects new to ML.

Solution Overview

The two core elements to create a proof-of-concept for ML-based product recommendations are:

  1. the recommendation engine and,
  2. the data set to train the recommendation engine.

Let’s start with the recommendation engine first, and work backwards to the corresponding data needs.

Product recommendation engine

To create the product recommendation engine, we use Amazon Personalize. Amazon Personalize differentiates three types of input data:

  1. interactions (user events such as views, signups, or likes),
  2. item metadata (descriptions of your items: category, genre, or availability), and
  3. user metadata (age, gender, or loyalty membership).

An interactions dataset is typically the minimal requirement to build a recommendation system. Providing user and item metadata datasets improves recommendation accuracy, and enables cold starts, item discovery and dynamic recommendation filtering.

Most companies already have existing historical datasets to populate all three input types. In the case of retail companies, the product order history is a good fit for interactions. In the case of the media and entertainment industry, the customer’s consumption history maps to the interaction dataset. The product and media catalogs map to the items dataset and the customer profiles to the user dataset.

Amazon Personalize: from datasets to a recommendation API

The Amazon Personalize Deep Dive Series provides a great introduction into the service and explores the topics of training, inference and operations. There are also multiple blog posts available explaining how to create a recommendation engine with Amazon Personalize and how to select the right metadata for the engine training. Additionally, the Amazon Personalize samples repository in GitHub showcases a variety of topics: from getting started with Amazon Personalize, up to performing a POC in a Box using existing datasets, and, finally, automating the recommendation engine with MLOps. In this post, we focus on getting the data from the historical data sources into the structure required by Amazon Personalize.

Creating the dataset

While manual data exports are a quick way to get started with one-time datasets for experiments, we use AWS Glue to automate this process. The automated approach with AWS Glue speeds up the proof of concept (POC) phase and simplifies the process to production by:

  • easily reproducing dataset exports from various data sources; these are used to iterate with other feature sets for recommendation engine training
  • adding additional data sources and using them to enrich existing datasets
  • efficiently performing transformation logic such as column renaming and fuzzy matching out of the box, with code generation support

AWS Glue is a serverless data integration service that is scalable and simple to use. It provides all of the capabilities needed for data integration and supports a wide variety of data sources: Amazon S3 buckets, JDBC connectors, MongoDB databases, Kafka, and Amazon Redshift, the AWS data warehouse. You can even make use of data sources living outside of your AWS environment, e.g. on-premises data centers or other services outside of your VPC. This enables you to perform a data-driven POC even when the data is not yet in AWS.

Modern application environments usually combine multiple heterogeneous database systems, like operational relational and NoSQL databases, in addition to, the BI-powering data warehouses. With AWS Glue, we orchestrate the ETL (extract, transform, and load) jobs to fetch the relevant data from the corresponding data sources. We then bring it into a format that Amazon Personalize understands: CSV files with pre-defined column names hosted in an Amazon S3 bucket.
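
As a simplified sketch of such a job (database, table, and column names are hypothetical), a PySpark ETL script that maps an order-history table to the Personalize interactions format and writes headered CSV files to S3 could look like this:

import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.transforms import ApplyMapping
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

# Job arguments: the target S3 prefix is passed in as --output_path.
args = getResolvedOptions(sys.argv, ["JOB_NAME", "output_path"])

sc = SparkContext()
glue_context = GlueContext(sc)
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the order-history table registered in the Data Catalog by a crawler.
orders = glue_context.create_dynamic_frame.from_catalog(
    database="retail", table_name="orders"
)

# Rename and cast the source columns into the Personalize interactions format.
interactions = ApplyMapping.apply(
    frame=orders,
    mappings=[
        ("customer_id", "string", "USER_ID", "string"),
        ("product_id", "string", "ITEM_ID", "string"),
        ("order_timestamp", "long", "TIMESTAMP", "long"),
    ],
)

# Amazon Personalize expects CSV files with a header row under a single S3 prefix.
interactions.toDF().coalesce(1).write.mode("overwrite").csv(
    args["output_path"], header=True
)

job.commit()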

Each dataset consists of one or multiple CSV files, which can be uniquely identified by an Amazon S3 prefix. Additionally, each dataset must have an associated schema describing the structure of your data. Depending on the dataset type, there are required and pre-defined fields:

  • USER_ID (string) and one additional metadata field for the users dataset
  • ITEM_ID (string) and one additional metadata field for the items dataset
  • USER_ID (string), ITEM_ID (string), TIMESTAMP (long; as Epoch time) for the interactions dataset
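
The schema itself is expressed as an Avro record and registered with Amazon Personalize. The sketch below defines a minimal interactions schema; the schema name is hypothetical.

import json

import boto3

personalize = boto3.client("personalize")

# Minimal Avro schema for the interactions dataset.
interactions_schema = {
    "type": "record",
    "name": "Interactions",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {"name": "USER_ID", "type": "string"},
        {"name": "ITEM_ID", "type": "string"},
        {"name": "TIMESTAMP", "type": "long"},
    ],
    "version": "1.0",
}

response = personalize.create_schema(
    name="interactions-schema",  # hypothetical name
    schema=json.dumps(interactions_schema),
)
print(response["schemaArn"])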

The following graph presents a high-level architecture for a retail customer, who has a heterogeneous data store landscape.

Using AWS Glue to export datasets from heterogeneous data sources to Amazon S3

To understand how AWS Glue connects to the variety of data sources and allows transforming the data into the required format, we need to drill down into the AWS Glue concepts and components.

One of the key components of AWS Glue is the AWS Glue Data Catalog: a persistent metadata store containing table definitions, connection information, as well as the ETL job definitions.
The tables are metadata definitions representing the structure of the data in the defined data sources. They do not contain any data entries from the sources but solely the structure definition. You can create a table either manually or automatically by using AWS Glue Crawlers.

AWS Glue Crawlers scan the data in the data sources, extract the schema information from it, and store the metadata as tables in the AWS Glue Data Catalog. This is the preferred approach for defining tables. The crawlers use AWS Glue Connections to connect to the data sources. Each connection contains the properties that are required to connect to a particular data store. The connections will be also used later by the ETL jobs to fetch the data from the data sources.

AWS Glue Crawlers also help to overcome a challenge that frequently appears in microservice environments. Microservice architectures are frequently operated by fully independent and autonomous teams. This means that keeping track of changes to the data source format becomes a challenge. Based on a schedule, the crawlers can be triggered to update the metadata for the relevant data sources in the AWS Glue Data Catalog automatically. To detect cases when a schema change would break the ETL job logic, you can combine the CloudWatch Events emitted by AWS Glue on updating the Data Catalog tables with an AWS Lambda function or a notification sent via the Amazon Simple Notification Service (SNS).
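
A minimal Lambda handler for such a notification, which simply forwards the event detail of the Data Catalog change to an SNS topic, could look like the sketch below. The topic ARN is assumed to be provided as an environment variable on the function.

import json
import os

import boto3

sns = boto3.client("sns")
TOPIC_ARN = os.environ["TOPIC_ARN"]  # assumed to be configured on the function

def handler(event, context):
    # Forward the raw event detail so the team can review the table change.
    detail = event.get("detail", {})
    sns.publish(
        TopicArn=TOPIC_ARN,
        Subject="AWS Glue Data Catalog table change detected",
        Message=json.dumps(detail, indent=2, default=str),
    )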

The AWS Glue ETL jobs use the defined connections and the table information from the Data Catalog to extract the data from the described sources, apply the user-defined transformation logic and write the results into a data sink. AWS Glue can automatically generate code for ETL jobs to help perform a variety of useful data transformation tasks. AWS Glue Studio makes the ETL development even simpler by providing an easy-to-use graphical interface that accelerates the development and allows designing jobs without writing any code. If required, the generated code can be fully customized.

AWS Glue supports Apache Spark jobs, written either in Python or in Scala, and Python Shell jobs. Apache Spark jobs are optimized to run in a highly scalable, distributed way dealing with any amount of data and are a perfect fit for data transformation jobs. The Python Shell jobs provide access to the raw Python execution environment, which is less scalable but provides a cost-optimized option for orchestrating AWS SDK calls.

The following diagram visualizes the interaction between the components described.

The basic concepts of populating your Data Catalog and processing ETL dataflow in AWS Glue

For each Amazon Personalize dataset type, we create a separate ETL job. Since those jobs are independent, they can also run in parallel. After all jobs have successfully finished, we can start the recommendation engine training. AWS Glue Workflows simplify data pipelines by letting you visualize and monitor complex ETL activities involving multiple crawlers, jobs, and triggers as a single entity.
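
For reference, a workflow run can also be started and monitored programmatically. The minimal boto3 sketch below assumes a workflow named personalize-dataset-export.

import boto3

glue = boto3.client("glue")

# Start the dataset export workflow; the workflow name is a placeholder.
run = glue.start_workflow_run(Name="personalize-dataset-export")

# Check the run status (e.g. RUNNING or COMPLETED) if you want to poll for completion.
status = glue.get_workflow_run(
    Name="personalize-dataset-export", RunId=run["RunId"]
)["Run"]["Status"]
print(status)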

The following graph visualizes a typical dataset export workflow for training a recommendation engine, which consists of:

  • a workflow trigger being either manual or scheduled
  • a Python Shell job to remove the results of the previous export workflow from S3
  • a trigger firing when the removal job is finished and initiating the parallel execution of the dataset ETL jobs
  • the three Apache Spark ETL jobs, one per dataset type
  • a trigger firing when all three ETL jobs are finished and initiating the training notification job
  • a Python Shell job to initiate a new dataset import or a full training cycle in Amazon Personalize (e.g. by triggering the MLOps pipeline using the AWS SDK)

 

AWS Glue workflow for extracting the three datasets and triggering the training workflow of the recommendation engine

Combining the data export and the recommendation engine

In the previous sections, we discussed how to create an ML-based recommendation engine and how to create the datasets for the training of the engine. In this section, we combine both parts of the solution leveraging an adjusted version of the MLOps pipeline solution available on GitHub to speed up the iterations on new solution versions by avoiding manual steps. Moreover, automation means new items can be put faster into production.

The MLOps pipeline uses a JSON file hosted in an S3 bucket to describe the training parameters for Amazon Personalize. The creation of a new parameter file version triggers a new training workflow orchestrated in a serverless manner using AWS Step Functions and AWS Lambda.

To integrate the Glue data export workflow described in the previous section, we also enable the Glue workflow to trigger the training pipeline. Additionally, we manipulate the pipeline to read the parameter file as the first pipeline step. The resulting architecture enables an automated end-to-end set up from dataset export up to the recommendation engine creation.
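
In that setup, the final Glue Python Shell job only has to write a new version of the parameter file to start a training cycle. The sketch below uses a hypothetical bucket, key, and parameter structure; check the parameter format expected by the version of the MLOps sample you deployed.

import json

import boto3

s3 = boto3.client("s3")

# Parameter structure, bucket, and key are assumptions for illustration only.
params = {
    "datasetGroupName": "retail-recommendations",
    "solutionConfig": {"trainingMode": "FULL"},
}

# Writing a new version of the parameter file triggers the Step Functions training pipeline.
s3.put_object(
    Bucket="mlops-parameter-bucket",
    Key="train/params.json",
    Body=json.dumps(params).encode("utf-8"),
)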

End-to-end architecture combining the data export with AWS Glue, the MLOps training workflow, and Amazon Personalize

The architecture for the end-to-end data export and recommendation engine creation solution is completely serverless. This makes it highly scalable, reliable, easy to maintain, and cost-efficient. You pay only for what you consume. For example, in the case of the data export, you pay only for the duration of the AWS Glue crawler executions and ETL jobs, which only need to run when you iterate with a new dataset.

The solution is also flexible in terms of the connected data sources. The architecture works just as well for use cases with a single data source: you can start with a single data store and enrich the datasets on demand with additional data sources in future iterations.

Testing the quality of the solution

A common approach to validate the quality of the solution is the A/B testing technique, which is widely used to measure the efficacy of generated recommendations. Based on the testing results, you can iterate on the recommendation engine by optimizing the underlying datasets and models. The high degree of automation increases the speed of iterations and the resiliency of the end-to-end process.

Conclusion

In this post, I presented a typical serverless architecture for a fully automated, end-to-end ML-based recommendation engine leveraging available historical datasets. As you begin to experiment with ML-based personalization, you will unlock value currently hidden in the data. This helps mitigate potential concerns, such as a lack of trust in machine learning, so that you can put the resulting engine into production.

Start your personalization journey today with the Amazon Personalize code samples and bring the engine to production with the architecture outlined in this blog. As a next step, you can add real-time event recording to update the generated recommendations automatically based on the event data.