Tag Archives: personalization

Page Simulator

Post Syndicated from Netflix Technology Blog original https://medium.com/netflix-techblog/page-simulator-fa02069fb269?source=rss----2615bd06b42e---4

Page Simulation for Better Offline Metrics at Netflix

by David Gevorkyan, Mehmet Yilmaz, Ajinkya More, Gaurav Agrawal,
Richard Wellington, Vivek Kaushal, Prasanna Padmanabhan, Justin Basilico

At Netflix, we spend a lot of effort to make it easy for our members to find content they will love. To make this happen, we personalize many aspects of our service, including which movies and TV shows we present on each member’s homepage. Over the years, we have built a recommendation system that uses many different machine learning algorithms to create these personalized recommendations. We also apply additional business logic to handle constraints like maturity filtering and deduplication of videos. All of these algorithms and logic come together in our page generation system to produce a personalized homepage for each of our members, which we have outlined in a previous post. While a diverse set of algorithms working together can produce a great outcome, innovating on such a complex system can be difficult. For instance, adding a single feature to one of the recommendation algorithms can change how the whole page is put together. Conversely, a big change to such a ranking system may only have a small incremental impact (for instance because it makes the ranking of a row similar to that of another existing row).

Every aspect is personalized

With systems driven by machine learning, it is important to measure the overall system-level impact of changes to a model, not just the local impact on the model performance itself. One way to do this is by running A/B tests. Netflix typically A/B tests all changes before rolling them out to all members. A drawback to this approach is that tests take time to run and require that experimental models be ready to run in production. In machine learning, offline metrics are often used to measure the performance of model changes on historical data. With a good offline metric, we can gain a reasonable understanding of how a particular model change would perform online. We would like to extend this approach, which is typically applied to a single machine-learned model, and apply it to the entire homepage generation system. This would allow us to measure the potential impact of offline changes in any of the models or logic involved in creating the homepage before running an A/B test.

To achieve this goal, we have built a system that simulates what a member’s homepage would have been given an experimental change and compares it against the page the member actually saw in the service. This provides an indication of the overall quality of the change. While we primarily use this for evaluating modifications to our machine learning algorithms, such as what happens when we have a new row selection or ranking algorithm, we can also use it to evaluate any changes in the code used to construct the page, from filtering rules to new row types. A key feature of this system is the ability to reconstruct a view of the systemic and user-level data state at a certain point in the past. As such, the system uses time-travel mechanisms for more precise reconstruction of an experience and coordinates time-travel across multiple systems. Thus, the simulator allows us to rapidly evaluate new ideas without needing to expose members to the changes.

In this blog post, we will go into more detail about this page simulation system and discuss some of the lessons we learned along the way.

Why Is This Hard?

A simulation system needs to run on many samples to generate reliable results. In our case, this requirement translates to generating millions of personalized homepages. Naturally, some problems of scale come into the picture, including:

  • How to ensure that the executions run within a reasonable time frame
  • How to coordinate work despite the distributed nature of the system
  • How to ensure that the system is easy to use and extend for future types of experiments

Stages Involved

At a high level, the Page Simulation system consists of the following stages:

  • Defining the experiment scope
  • Simulating the modified behavior
  • Running the experiment workflow (partitioning, asynchronous page computation, and metrics computation)

We’ll go through each of these stages in more detail below.

Experiment Scope

The experiment scope determines the set of experimental pages that will be simulated and which data sources will be used to generate those pages. Thus, the experimenter needs to tailor the scope to the metrics the experiment aims to measure. This involves defining three aspects:

  • A data source
  • Stratification rules for profile selection
  • Number of profiles for the experiment

Data Sources

We provide two different mechanisms for data retrieval: via time travel and via live service calls.

In the first approach, we use data from time-travel infrastructure built at Netflix to compute pages as they would have been at some point in the past. This gives us the ability to accurately backtest the performance of an experimental page generation model. In particular, it lets us compare a new page against a page that a member actually saw and interacted with in the past, including what actions they took in the session.

The second approach retrieves data in the exact same way as the live production system. To simulate production systems closely, in this mode, we randomly select profiles that have recently logged into Netflix. The primary drawback of using live data is that we can only compute a limited set of metrics compared to the time-travel approach. However, this type of experiment is still valuable in the following scenarios:

  • Doing final sanity checks before allocating a new A/B test or rolling out a new feature
  • Analyzing changes in page composition, i.e., measures of the rows and videos on the page. These measures are needed to validate that the changes we seek to test have the intended effect without unexpected side effects
  • Determining if two approaches are producing sufficiently similar pages that we may not need to test both
  • Early detection of negative interactions between two features that will be rolled out simultaneously

Stratification

Once the data source is specified, a combination of different stratification types can be applied to refine user selection. Some examples of stratification types are:

  • Country — select profiles based on their country
  • Tenure — select profiles based on their membership tenure; long-term members vs members in trial period
  • Login device — select users based on their active device type; e.g. Smart TV, Android, or devices supporting certain feature sets
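
As a rough illustration, here is a minimal sketch of how such stratification rules could be expressed and applied to select profiles. The profile fields and rule names are hypothetical, not the system’s actual configuration:

from dataclasses import dataclass
from typing import Callable, Dict, List

# A hypothetical profile record; the field names are illustrative only.
@dataclass
class Profile:
    profile_id: str
    country: str
    tenure_days: int
    device_type: str

# Each stratification type is expressed as a predicate over a profile.
STRATIFICATION_RULES: Dict[str, Callable[[Profile], bool]] = {
    "country:US": lambda p: p.country == "US",
    "tenure:long_term": lambda p: p.tenure_days > 180,
    "device:smart_tv": lambda p: p.device_type == "SMART_TV",
}

def select_profiles(profiles: List[Profile], rules: List[str], limit: int) -> List[Profile]:
    """Return up to `limit` profiles that match every requested stratification rule."""
    predicates = [STRATIFICATION_RULES[name] for name in rules]
    selected = [p for p in profiles if all(pred(p) for pred in predicates)]
    return selected[:limit]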

Number of Profiles

We typically start with a small number of profiles to perform a dry run of the experiment configuration and then extend it to millions to ensure reliable and statistically significant results.

Simulating Modified Behavior

Once the experiment scope is determined, experimenters specify the modifications they would like to test within the page generation framework. Generally, these changes can be made by either modifying the configuration of the existing system or by implementing new code and deploying it to the simulation system.

There are several ways to control what changes are run in the simulator, including but not limited to:

  1. A/B test allocations
  • Collect metrics of the behavior of an A/B test that is not yet allocated
  • Analyze the behavior across cells using custom metrics
  • Inspect the effect of cross-allocating members to multiple A/B tests
  2. Page generation models
  • Compare the performance of different page generation models
  • Evaluate interactions between different models (when a page is constructed using multiple models)
  3. Device capabilities and page geometry
  • Evaluate page composition for different geometries. Page geometry is the number of rows and columns, which differs between device types

Multiple modifications can be grouped together to define different experimental variants. During metrics computation we collect each metric at the level of variant and stratum. This detailed breakdown of metrics allows for a fine-grained attribution of any shifts in page characteristics.
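
As a rough sketch (the class and field names are hypothetical, not the system’s actual configuration schema), grouping modifications into named variants and keying metrics by variant and stratum might look like this:

from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Variant:
    """A named bundle of modifications to evaluate together."""
    name: str
    modifications: Dict[str, str] = field(default_factory=dict)

# Two hypothetical variants: a new ranking model alone, and the same model
# cross-allocated to an unlaunched A/B test cell.
variants: List[Variant] = [
    Variant("new_ranker", {"row_ranking_model": "ranker_v2"}),
    Variant("new_ranker_with_test", {"row_ranking_model": "ranker_v2",
                                     "ab_test_allocation": "test_123:cell_2"}),
]

def metric_key(variant: Variant, stratum: str) -> Tuple[str, str]:
    # Metrics are collected per (variant, stratum) so that shifts in page
    # characteristics can be attributed at a fine-grained level.
    return (variant.name, stratum)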

Experiment Workflow

Architecture diagram of the Page Simulation System

The lifecycle of an experiment starts when a user (Engineer, Researcher, Data Scientist or Product Manager) configures an experiment and submits it for execution (detailed below). Once the execution is complete, they get detailed Tableau reports. Those reports contain page composition and other important metrics regarding their experiment, which can be split by the different variants under test.

The execution workflow for the experiment proceeds through the following stages:

  • Partition the experiment into smaller chunks
  • Compute pages asynchronously for each partition
  • Compute experiment metrics

Experiment Partition

In the Page Simulation system, an experiment is configured as a single entity; however, when executing the experiment, the system splits it into multiple partitions. This is needed to isolate different parts of the experiment for the following reasons:

  • Some modifications to the page algorithm might impact the latency of page generation significantly
  • When time traveling to different times, different clusters of the page generation system are needed for each time (more on this later)

Asynchronous Page Computation

We embrace asynchronous computation as much as possible, especially in the page computation stage, which can be very compute-intensive and time consuming due to the heavy machine-learned models we often test. Each experiment partition is sent out as an event to a Request Poster. The Request Poster is responsible for reading data and applying stratification to select profiles for each partition. For each selected profile, page computation requests are generated and sent to a dedicated queue per partition. Each queue is then processed by a separate Page Generation cluster that is launched to serve a particular partition. Once the generator is running, it processes the requests in the queue to compute the simulated pages. Generated pages are then persisted to an S3-backed Hive table for metrics processing.
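
The sketch below illustrates the shape of this per-partition flow; the queue, page generator, and output writer are hypothetical stand-in interfaces, not the actual internal APIs:

import json

def process_partition_queue(queue, page_generator, output_writer):
    """Drain one partition's queue: compute a simulated page per request and persist it."""
    while True:
        message = queue.receive()          # one page computation request
        if message is None:
            break                          # queue drained; this partition is done
        request = json.loads(message.body)
        try:
            page = page_generator.compute_page(
                profile_id=request["profile_id"],
                variant=request["variant"],
            )
            # Persist the simulated page for out-of-band metrics computation.
            output_writer.write(request["partition_id"], page)
            message.ack()
        except Exception:
            # Leave the message un-acknowledged so this request can be retried on its own.
            message.nack()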

We chose to use queue-based communication between the systems instead of RESTful calls to decouple the systems and allow for easy retries of each request, as well as of individual experiment partitions. Writing the generated pages to Hive and running the Metrics Computation stage out-of-band allows us to modify or add new metrics on previously generated pages, thus avoiding the need to regenerate them.

Creating Mini Netflix Ecosystem on the Fly

The page generation system at Netflix consists of many interdependent services. Experiments can simulate new behaviors in any number of these microservices. Thus, for each experiment, we need to create an isolated mini Netflix ecosystem in which each service exhibits its respective new behavior. Because of this isolation requirement, we architected a system that can create a mini Netflix ecosystem on the fly.

Our approach is to create Docker container stacks to define a mini Netflix ecosystem for each simulation. We use Titus, a container management platform built internally at Netflix. We configure each cluster using custom bootstrapping code in order to create different server environments, for example to initialize the containers with different machine-learned model versions and other data to precisely replicate time-traveled state in the past. Because we would like to time-travel all the services together to replicate a specific point in time in the past, we created a new capability to start stacks of multiple services with a common time configuration and route traffic between them on the fly per experiment to maintain temporal accuracy of the data. This capability provides the precision we need to simulate and correlate metrics correctly with actions of our members that happened in the past.

Achieving high temporal accuracy across multiple systems and data sources is challenging. It took us several iterations to determine the correct set of data and services to include in this time-travel scheme for accurate simulation of pages in time-travel mode. To this end, we developed tools that compared real pages computed by our live production system with those of our simulators, both in terms of the final output and the features involved in our models. To ensure that we maintain temporal accuracy going forward, we also automated these checks to avoid future regressions and identify new data sources that we need to handle. As such, the system is architected in a flexible way so we can easily incorporate more downstream systems into the time-travel experiment workflow.

Metrics Computation

Once the generated pages are saved to a Hive table, the system sends a signal to the workflow manager (Controller) for the completion of the page generation experiment. This signal triggers a Spark job to calculate the metrics, normalize the results and save both the raw and normalized data to Hive. Experimenters can then access the results of their experiment either using pre-configured Tableau reports or from notebooks that pull the raw data from Hive. If necessary, they can also access the simulated pages to compute new experiment-specific metrics.
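
A minimal PySpark sketch of this step, assuming a hypothetical Hive table layout for the persisted pages and purely illustrative metrics (the real schema and metric definitions are internal):

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("page-simulation-metrics").enableHiveSupport().getOrCreate()

# Hypothetical table and columns for the simulated pages.
pages = spark.table("page_simulation.generated_pages").where(F.col("experiment_id") == "my_experiment")

# Aggregate per (variant, stratum), e.g. average rows per page and error rate.
raw = (pages.groupBy("variant", "stratum")
            .agg(F.avg("num_rows").alias("avg_rows_per_page"),
                 F.avg(F.col("had_error").cast("double")).alias("error_rate"),
                 F.count("*").alias("num_pages")))

# Normalize each metric against a baseline variant to ease comparison.
baseline = (raw.where(F.col("variant") == "production")
               .select("stratum", F.col("avg_rows_per_page").alias("baseline_rows")))
normalized = (raw.join(baseline, "stratum")
                 .withColumn("rows_vs_baseline", F.col("avg_rows_per_page") / F.col("baseline_rows")))

normalized.write.mode("overwrite").saveAsTable("page_simulation.experiment_metrics")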

Experiment Workflow Management

Given the asynchronous nature of the experiment workflow and the need to govern the lifecycle of multiple clusters dedicated to each partition, we needed a solution to manage the experiment workflow. Thus, we built a simple and lightweight workflow management system with the following capabilities:

  • Automatic retry of workflow steps in case of a transient failure
  • Conditional execution of workflow steps
  • Recording execution history
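
A minimal sketch of the kind of step execution these capabilities imply (conditional execution plus retry with recorded history); this is an illustration, not the internal implementation:

import time

def run_step(step_fn, precondition=None, max_retries=3, backoff_seconds=30):
    """Run one workflow step with conditional execution, automatic retry, and a history record.

    `step_fn` and `precondition` are hypothetical callables standing in for a real workflow step.
    """
    history = []
    if precondition is not None and not precondition():
        history.append(("skipped", 0))
        return history
    for attempt in range(1, max_retries + 1):
        try:
            step_fn()
            history.append(("succeeded", attempt))
            return history
        except Exception as err:                    # assume a transient failure and retry
            history.append(("failed", attempt, str(err)))
            time.sleep(backoff_seconds * attempt)
    history.append(("gave_up", max_retries))
    return history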

We use this simple workflow engine for the execution of the following tasks:

  • Govern the lifecycle of page generation services dedicated to each partition (external startup, shutdown tasks)
  • Initialize metrics computation when page generation for all partitions is complete
  • Terminate the experiment when the experiment does not have a sufficient page yield (i.e. there is a high error rate)
  • Send out notifications to experiment owners on the status of the experiment
  • Listen to the heartbeat of all components in the experimentation system and terminate the experiment when an issue is detected

Status Keeper

To facilitate lifecycle management and to monitor the overall health of an experiment, we built a separate micro-service called Status Keeper. This service provides the following capabilities:

  • Expose a detailed report with granular metrics about different steps (Controller / Request Poster / Page Generator and Metrics Processor) in the system
  • Aid in lifecycle decisions, such as failing the experiment fast if the failure threshold has been reached
  • Store and retrieve status and aggregate metrics

Throughout the experiment workflow, each application in the Page Simulation system reports its status to the Status Keeper. We combine all the status and metrics recorded by each application in the system to create a view of the overall health of the system.

Metrics

Need for Offline Metrics

An important part of improving our page generation approach is having good offline metrics to track model performance and to compare different model variants. Usually, there is not a perfect correspondence between offline results and results from A/B testing (if there were, it would do away with the need for online testing). For example, suppose we build two model variants and find that one is better than the other according to our offline metric. The online A/B test performance will usually be measured by a different metric, and it may turn out that the model that’s worse on the offline metric is actually the better model online, or even that there is no statistically significant difference between the two models online. Given that A/B tests need to run for a while to measure long-term metrics, finding an offline metric that provides an accurate pulse of how the testing might pan out is critical. So one of the main objectives in building our page simulation system was to come up with offline metrics that correspond better with online A/B metrics.

Presentation Bias

One major source of discrepancy between online and offline results is presentation bias. The real pages we presented to our members are the result of ranking videos and rows with our current production page generation models. Thus, the engagement data we get as a result (what members click, play, or thumb) can be strongly influenced by those models. Members can only see and play from rows that the production system served to them. Thus, it is important that our offline metrics mitigate this bias (i.e., they should not unduly favor or disfavor the production model).

Validation

In the absence of A/B testing results on new candidate models, there is no ground truth to compare offline metrics against. However, because of the system described above, we can simulate how a member’s page might have looked at a past point in time if it had been generated by our new model instead of the production model. Because of time travel, we can also build the new model using only the data available at that time, getting as close as possible to the unobserved counterfactual page that the new model would have shown.

Given these pages, the next question was which numerical metrics we could use to validate the effectiveness of our offline metrics. This turned out to be easy with the new system because we could use models from past A/B tests to ascertain how well the offline metrics computed on the simulated pages correlated with the actual online metrics for those A/B tests. That is, we could take the hypothetical pages generated by certain models, evaluate them according to an offline metric, and then see how well those offline metrics correspond to online ones. After trying out a few variations, we were able to settle on a suite of metrics that had a much stronger correlation with corresponding online metrics across many A/B tests than our previous offline metric, as shown below.
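
For example, the validation can be as simple as comparing, across past A/B tests, the offline metric computed on simulated pages with the online metric observed when each test actually ran. The numbers below are made-up placeholders, not Netflix data:

import numpy as np
from scipy.stats import spearmanr

# One entry per past A/B test cell: offline metric computed on simulated pages
# vs. the online metric observed when the test ran (illustrative values only).
offline_metric = np.array([0.12, 0.15, 0.11, 0.18, 0.14])
online_metric = np.array([0.021, 0.027, 0.019, 0.031, 0.024])

rho, p_value = spearmanr(offline_metric, online_metric)
print(f"Rank correlation between offline and online metrics: {rho:.2f} (p={p_value:.3f})")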

Benefits

Having offline metrics that strongly correlate with online metrics allows us to experiment more rapidly and reject model variants that may not be significantly better than the current production model, thus saving valuable A/B testing bandwidth and time. It has also helped us detect bugs early in the model development process, when the offline metrics go strongly against our hypothesis. This has saved many development and experimentation cycles and has enabled us to try out more ideas.

In addition, these offline metrics enable us to:

  • Compare models trained with different objective functions
  • Compare models trained on different datasets
  • Compare page construction related changes outside of our machine learning models
  • Reconcile effects due to changes arising out of many A/B tests running simultaneously

Conclusion

Personalizing home pages for users is a hard problem and one that traditionally required us to run A/B tests to find out whether a new approach works. However, our Page Simulation system allows us to rapidly try out new ideas and obtain results without needing to expose our members to all these experiences. Being able to create a mini Netflix ecosystem on the fly helps us iterate fast and allows us to try out more far-fetched ideas. Building this system was a big collaboration between our engineering and research teams that allows our researchers to run page simulations and our engineers to quickly extend the system to accommodate new types of simulations. This, in turn, has resulted in improvements of the personalized homepages for our members. If you are interested in helping us solve these types of problems and helping entertain the world, please take a look at some of our open positions on the Netflix jobs page.



Predictive User Engagement using Amazon Pinpoint and Amazon Personalize

Post Syndicated from Brent Meyer original https://aws.amazon.com/blogs/messaging-and-targeting/predictive-user-engagement-using-amazon-pinpoint-and-amazon-personalize/

Note: This post was written by John Burry, a Solution Architect on the AWS Customer Engagement team.


Predictive User Engagement (PUE) refers to the integration of machine learning (ML) and customer engagement services. By implementing a PUE solution, you can combine ML-based predictions and recommendations with real-time notifications and analytics, all based on your customers’ behaviors.

This blog post shows you how to set up a PUE solution by using Amazon Pinpoint and Amazon Personalize. Best of all, you can implement this solution even if you don’t have any prior machine learning experience. By completing the steps in this post, you’ll be able to build your own model in Personalize, integrate it with Pinpoint, and start sending personalized campaigns.

Prerequisites

Before you complete the steps in this post, you need to set up the following:

  • Create an admin user in AWS Identity and Access Management (IAM). For more information, see Creating Your First IAM Admin User and Group in the IAM User Guide. You need to specify the credentials of this user when you set up the AWS Command Line Interface.
  • Install Python 3 and the pip package manager. Python 3 is installed by default on recent versions of Linux and macOS. If it isn’t already installed on your computer, you can download an installer from the Python website.
  • Use pip to install the following modules:
    • awscli
    • boto3
    • jupyter
    • matplotlib
    • sklearn
    • sagemaker

    For more information about installing modules, see Installing Python Modules in the Python 3.X Documentation.

  • Configure the AWS Command Line Interface (AWS CLI). During the configuration process, you have to specify a default AWS Region. This solution uses Amazon SageMaker to build a model, so the Region that you specify has to be one that supports Amazon SageMaker. For a complete list of Regions where SageMaker is supported, see AWS Service Endpoints in the AWS General Reference. For more information about setting up the AWS CLI, see Configuring the AWS CLI in the AWS Command Line Interface User Guide.
  • Install Git. Git is installed by default on most versions of Linux and macOS. If Git isn’t already installed on your computer, you can download an installer from the Git website.

Step 1: Create an Amazon Pinpoint Project

In this section, you create and configure a project in Amazon Pinpoint. This project contains all of the customers that we will target, as well as the recommendation data that’s associated with each one. Later, we’ll use this data to create segments and campaigns.

To set up the Amazon Pinpoint project

  1. Sign in to the Amazon Pinpoint console at http://console.aws.amazon.com/pinpoint/.
  2. On the All projects page, choose Create a project. Enter a name for the project, and then choose Create.
  3. On the Configure features page, under SMS and voice, choose Configure.
  4. Under General settings, select Enable the SMS channel for this project, and then choose Save changes.
  5. In the navigation pane, under Settings, choose General settings. In the Project details section, copy the value under Project ID. You’ll need this value later.

Step 2: Create an Endpoint

In Amazon Pinpoint, an endpoint represents a specific method of contacting a customer, such as their email address (for email messages) or their phone number (for SMS messages). Endpoints can also contain custom attributes, and you can associate multiple endpoints with a single user. In this example, we use these attributes to store the recommendation data that we receive from Amazon Personalize.

In this section, we create a new endpoint and user by using the AWS CLI. We’ll use this endpoint to test the SMS channel, and to test the recommendations that we receive from Personalize.

To create an endpoint by using the AWS CLI

  1. At the command line, enter the following command:
    aws pinpoint update-endpoint --application-id <project-id> \
    --endpoint-id 12456 --endpoint-request "Address='<mobile-number>', \
    ChannelType='SMS',User={UserAttributes={recommended_items=['none']},UserId='12456'}"

    In the preceding example, replace <project-id> with the Amazon Pinpoint project ID that you copied in Step 1. Replace <mobile-number> with your phone number, formatted in E.164 format (for example, +12065550142).

Note that this endpoint contains hard-coded UserId and EndpointId values of 12456. These IDs match an ID that we’ll create later when we generate the Personalize data set.

Step 3: Create a Segment and Campaign in Amazon Pinpoint

Now that we have an endpoint, we need to add it to a segment so that we can use it within a campaign. By sending a campaign, we can verify that our Pinpoint project is configured correctly, and that we created the endpoint correctly.

To create the segment and campaign

  1. Open the Pinpoint console at http://console.aws.amazon.com/pinpoint, and then choose the project that you created in Step 1.
  2. In the navigation pane, choose Segments, and then choose Create a segment.
  3. Name the segment “No recommendations”. Under Segment group 1, on the Add a filter menu, choose Filter by user.
  4. On the Choose a user attribute menu, choose recommended_items. Set the value of the filter to “none”.
  5. Confirm that the Segment estimate section shows that there is one eligible endpoint, and then choose Create segment.
  6. In the navigation pane, choose Campaigns, and then choose Create a campaign.
  7. Name the campaign “SMS to users with no recommendations”. Under Choose a channel for this campaign, choose SMS, and then choose Next.
  8. On the Choose a segment page, choose the “No recommendations” segment that you just created, and then choose Next.
  9. In the message editor, type a test message, and then choose Next.
  10. On the Choose when to send the campaign page, keep all of the default values, and then choose Next.
  11. On the Review and launch page, choose Launch campaign. Within a few seconds, you receive a text message at the phone number that you specified when you created the endpoint.

Step 4: Load sample data into Amazon Personalize

At this point, we’ve finished setting up Amazon Pinpoint. Now we can start loading data into Amazon Personalize.

To load the data into Amazon Personalize

  1. At the command line, enter the following command to clone the sample data and Jupyter Notebooks to your computer:
    git clone https://github.com/markproy/personalize-car-search.git

  2. At the command line, change into the directory that contains the data that you just cloned. Enter the following command:
    jupyter notebook

    A new window opens in your web browser.

  3. In your web browser, open the first notebook (01_generate_data.ipynb). On the Cell menu, choose Run all. Wait for the commands to finish running.
  4. Open the second notebook (02_make_dataset_group.ipynb). In the first step, replace the value of the account_id variable with the ID of your AWS account. Then, on the Cell menu, choose Run all. This step takes several minutes to complete. Make sure that all of the commands have run successfully before you proceed to the next step.
  5. Open the third notebook (03_make_campaigns.ipynb). In the first step, replace the value of the account_id variable with the ID of your AWS account. Then, on the Cell menu, choose Run all. This step takes several minutes to complete. Make sure that all of the commands have run successfully before you proceed to the next step.
  6. Open the fourth notebook (04_use_the_campaign.ipynb). In the first step, replace the value of the account_id variable with the ID of your AWS account. Then, on the Cell menu, choose Run all. This step takes several minutes to complete.
  7. After the fourth notebook is finished running, choose Quit to terminate the Jupyter Notebook. You don’t need to run the fifth notebook for this example.
  8. Open the Amazon Personalize console at http://console.aws.amazon.com/personalize. Verify that Amazon Personalize contains one dataset group named car-dg.
  9. In the navigation pane, choose Campaigns. Verify that it contains all of the following campaigns, and that the status for each campaign is Active:
    • car-popularity-count
    • car-personalized-ranking
    • car-hrnn-metadata
    • car-sims
    • car-hrnn

Step 5: Create the Lambda function

We’ve loaded the data into Amazon Personalize, and now we need to create a Lambda function to update the endpoint attributes in Pinpoint with the recommendations provided by Personalize.

The version of the AWS SDK for Python that’s included with Lambda doesn’t include the libraries for Amazon Personalize. For this reason, you need to download these libraries to your computer, put them in a .zip file, and upload the entire package to Lambda.

To create the Lambda function

  1. In a text editor, create a new file. Paste the following code.
    # Copyright 2010-2019 Amazon.com, Inc. or its affiliates. All Rights Reserved.
    #
    # This file is licensed under the Apache License, Version 2.0 (the "License").
    # You may not use this file except in compliance with the License. A copy of the
    # License is located at
    #
    # http://aws.amazon.com/apache2.0/
    #
    # This file is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS
    # OF ANY KIND, either express or implied. See the License for the specific
    # language governing permissions and limitations under the License.
    
    AWS_REGION = "<region>"
    PROJECT_ID = "<project-id>"
    CAMPAIGN_ARN = "<car-hrnn-campaign-arn>"
    USER_ID = "12456"
    endpoint_id = USER_ID
    
    from datetime import datetime
    import json
    import boto3
    import logging
    from botocore.exceptions import ClientError
    
    DATE = datetime.now()
    
    personalize           = boto3.client('personalize')
    personalize_runtime   = boto3.client('personalize-runtime')
    personalize_events    = boto3.client('personalize-events')
    pinpoint              = boto3.client('pinpoint')
    
    def lambda_handler(event, context):
        itemList = get_recommended_items(USER_ID,CAMPAIGN_ARN)
        response = update_pinpoint_endpoint(PROJECT_ID,endpoint_id,itemList)
    
        return {
            'statusCode': 200,
            'body': json.dumps('Lambda execution completed.')
        }
    
    def get_recommended_items(user_id, campaign_arn):
        response = personalize_runtime.get_recommendations(campaignArn=campaign_arn, 
                                                           userId=str(user_id), 
                                                           numResults=10)
        itemList = response['itemList']
        return itemList
    
    def update_pinpoint_endpoint(project_id,endpoint_id,itemList):
        itemlistStr = []
        
        for item in itemList:
            itemlistStr.append(item['itemId'])
    
        pinpoint.update_endpoint(
        ApplicationId=project_id,
        EndpointId=endpoint_id,
        EndpointRequest={
                            'User': {
                                'UserAttributes': {
                                    'recommended_items': 
                                        itemlistStr
                                }
                            }
                        }
        )    
    
        return
    

    In the preceding code, make the following changes:

    • Replace <region> with the name of the AWS Region that you want to use, such as us-east-1.
    • Replace <project-id> with the ID of the Amazon Pinpoint project that you created earlier.
    • Replace <car-hrnn-campaign-arn> with the Amazon Resource Name (ARN) of the car-hrnn campaign in Amazon Personalize. You can find this value in the Amazon Personalize console.
  2. Save the file as pue-get-recs.py.
  3. Create and activate a virtual environment. In the virtual environment, use pip to download the latest versions of the boto3 and botocore libraries. For complete procedures, see Updating a Function with Additional Dependencies With a Virtual Environment in the AWS Lambda Developer Guide. Also, add the pue-get-recs.py file to the .zip file that contains the libraries.
  4. Open the IAM console at http://console.aws.amazon.com/iam. Create a new role. Attach the following policy to the role:
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "logs:CreateLogStream",
                    "logs:DescribeLogGroups",
                    "logs:CreateLogGroup",
                    "logs:PutLogEvents",
                    "personalize:GetRecommendations",
                    "mobiletargeting:GetUserEndpoints",
                    "mobiletargeting:GetApp",
                    "mobiletargeting:UpdateEndpointsBatch",
                    "mobiletargeting:GetApps",
                    "mobiletargeting:GetEndpoint",
                    "mobiletargeting:GetApplicationSettings",
                    "mobiletargeting:UpdateEndpoint"
                ],
                "Resource": "*"
            }
        ]
    }
    
  5. Open the Lambda console at http://console.aws.amazon.com/lambda, and then choose Create function.
  6. Create a new Lambda function from scratch. Choose the Python 3.7 runtime. Under Permissions, choose Use an existing role, and then choose the IAM role that you just created. When you finish, choose Create function.
  7. Upload the .zip file that contains the Lambda function and the boto3 and botocore libraries.
  8. Under Function code, change the Handler value to pue-get-recs.lambda_handler. Save your changes.

When you finish creating the function, you can test it to make sure it was set up correctly.

To test the Lambda function

  1. On the Select a test event menu, choose Configure test events. On the Configure test events window, specify an Event name, and then choose Create.
  2. Choose the Test button to execute the function.
  3. If the function executes successfully, open the Amazon Pinpoint console at http://console.aws.amazon.com/pinpoint.
  4. In the navigation pane, choose Segments, and then choose the “No recommendations” segment that you created earlier. Verify that the number under total endpoints is 0. This is the expected value; the segment is filtered to only include endpoints with no recommendation attributes, but when you ran the Lambda function, it added recommendations to the test endpoint.

Step 6: Create segments and campaigns based on recommended items

In this section, we’ll create a targeted segment based on the recommendation data provided by our Personalize dataset. We’ll then use that segment to create a campaign.

To create a segment and campaign based on personalized recommendations

  1. Open the Amazon Pinpoint console at http://console.aws.amazon.com/pinpoint. On the All projects page, choose the project that you created earlier.
  2. In the navigation pane, choose Segments, and then choose Create a segment. Name the new segment “Recommendations for product 26304”.
  3. Under Segment group 1, on the Add a filter menu, choose Filter by user. On the Choose a user attribute menu, choose recommended_items. Set the value of the filter to “26304”. Confirm that the Segment estimate section shows that there is one eligible endpoint, and then choose Create segment.
  4. In the navigation pane, choose Campaigns, and then choose Create a campaign.
  5. Name the campaign “SMS to users with recommendations for product 26304”. Under Choose a channel for this campaign, choose SMS, and then choose Next.
  6. On the Choose a segment page, choose the “Recommendations for product 26304” segment that you just created, and then choose Next.
  7. In the message editor, type a test message, and then choose Next.
  8. On the Choose when to send the campaign page, keep all of the default values, and then choose Next.
  9. On the Review and launch page, choose Launch campaign. Within a few seconds, you receive a text message at the phone number that you specified when you created the endpoint.

Next steps

Your PUE solution is now ready to use. From here, there are several ways that you can make the solution your own:

  • Expand your usage: If you plan to continue sending SMS messages, you should request a spending limit increase.
  • Extend to additional channels: This post showed the process of setting up an SMS campaign. You can add more endpoints—for the email or push notification channels, for example—and associate them with your users. You can then create new segments and new campaigns in those channels.
  • Build your own model: This post used a sample data set, but Amazon Personalize makes it easy to provide your own data. To start building a model with Personalize, you have to provide a data set that contains information about your users, items, and interactions. To learn more, see Getting Started in the Amazon Personalize Developer Guide.
  • Optimize your model: You can enrich your model by sending your mobile, web, and campaign engagement data to Amazon Personalize. In Pinpoint, you can use event streaming to move data directly to S3, and then use that data to retrain your Personalize model. To learn more about streaming events, see Streaming App and Campaign Events in the Amazon Pinpoint User Guide.
  • Update your recommendations on a regular basis: Use the create-campaign API to create a new recurring campaign. Rather than sending messages, include the hook property with a reference to the ARN of the pue-get-recs function. By completing this step, you can configure Pinpoint to retrieve the most up-to-date recommendation data each time the campaign recurs, as sketched after this list. For more information about using Lambda to modify segments, see Customizing Segments with AWS Lambda in the Amazon Pinpoint Developer Guide.
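
Below is a hedged boto3 sketch of that create-campaign call. The project ID, segment ID, schedule, and function ARN are placeholders, and you should confirm the exact parameters against the current Amazon Pinpoint API reference:

    import boto3

    pinpoint = boto3.client('pinpoint')

    # Placeholder identifiers -- replace with your project ID, segment ID, and
    # the ARN of the pue-get-recs Lambda function.
    response = pinpoint.create_campaign(
        ApplicationId='<project-id>',
        WriteCampaignRequest={
            'Name': 'Refresh recommendations',
            'SegmentId': '<segment-id>',
            'Schedule': {
                'Frequency': 'DAILY',
                'StartTime': '2019-10-01T00:00:00Z'
            },
            # The hook tells Pinpoint to invoke the Lambda function each time the
            # campaign recurs, refreshing the recommendation attributes.
            'Hook': {
                'LambdaFunctionName': '<pue-get-recs-function-arn>',
                'Mode': 'FILTER'
            }
        }
    )
    print(response['CampaignResponse']['Id'])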

How I built a data warehouse using Amazon Redshift and AWS services in record time

Post Syndicated from Stephen Borg original https://aws.amazon.com/blogs/big-data/how-i-built-a-data-warehouse-using-amazon-redshift-and-aws-services-in-record-time/

This is a customer post by Stephen Borg, the Head of Big Data and BI at Cerberus Technologies.

Cerberus Technologies, in their own words: Cerberus is a company founded in 2017 by a team of visionary iGaming veterans. Our mission is simple – to offer the best tech solutions through a data-driven and a customer-first approach, delivering innovative solutions that go against traditional forms of working and process. This mission is based on the solid foundations of reliability, flexibility and security, and we intend to fundamentally change the way iGaming and other industries interact with technology.

Over the years, I have developed and created a number of data warehouses from scratch. Recently, I built a data warehouse for the iGaming industry single-handedly. To do it, I used the power and flexibility of Amazon Redshift and the wider AWS data management ecosystem. In this post, I explain how I was able to build a robust and scalable data warehouse without the large team of experts typically needed.

In two of my recent projects, I ran into challenges when scaling our data warehouse using on-premises infrastructure. Data was growing at many tens of gigabytes per day, and query performance was suffering. Scaling required major capital investment for hardware and software licenses, and also significant operational costs for maintenance and technical staff to keep it running and performing well. Unfortunately, I couldn’t get the resources needed to scale the infrastructure with data growth, and these projects were abandoned. Thanks to cloud data warehousing, the bottlenecks of infrastructure resources, capital expense, and operational costs have been significantly reduced or have gone away entirely. There is no more excuse for allowing obstacles of the past to delay delivering timely insights to decision makers, no matter how much data you have.

With Amazon Redshift and AWS, I delivered a cloud data warehouse to the business very quickly, and with a small team: me. I didn’t have to order hardware or software, and I no longer needed to install, configure, tune, or keep up with patches and version updates. Instead, I easily set up a robust data processing pipeline and we were quickly ingesting and analyzing data. Now, my data warehouse team can be extremely lean, and focus more time on bringing in new data and delivering insights. In this post, I show you the AWS services and the architecture that I used.

Handling data feeds

I have several different data sources that provide everything needed to run the business. The data includes activity from our iGaming platform, social media posts, clickstream data, marketing and campaign performance, and customer support engagements.

To handle the diversity of data feeds, I developed abstract integration applications using Docker that run on Amazon EC2 Container Service (Amazon ECS) and feed data to Amazon Kinesis Data Streams. These data streams can be used for real-time analytics. In my system, each record in Kinesis is preprocessed by an AWS Lambda function to cleanse and aggregate information. Amazon Kinesis Data Firehose then routes the data to the Amazon S3 location where I need it stored. Suppose that you used an on-premises architecture to accomplish the same task. A team of data engineers would be required to maintain and monitor a Kafka cluster, develop applications to stream data, and maintain a Hadoop cluster and the infrastructure underneath it for data storage. With my stream processing architecture, there are no servers to manage, no disk drives to replace, and no service monitoring code to write.

Setting up a Kinesis stream can be done with a few clicks, and the same is true for Kinesis Data Firehose. Firehose can be configured to automatically consume data from a Kinesis Data Stream and write compressed data to Amazon S3 every N minutes. When I want to process a Kinesis data stream, it’s very easy to set up a Lambda function to be executed on each message received. I can just set a trigger from the AWS Lambda Management Console, as shown following.
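
As a rough sketch of what such a Lambda function can look like (the payload fields and aggregation are illustrative, not the author’s actual code), each Kinesis record arrives base64-encoded and is decoded, cleansed, and lightly aggregated:

import base64
import json
from collections import Counter

def lambda_handler(event, context):
    """Cleanse and aggregate a batch of Kinesis records; field names are hypothetical."""
    counts = Counter()
    cleaned = []
    for record in event['Records']:
        payload = json.loads(base64.b64decode(record['kinesis']['data']))
        if 'user_id' not in payload:                 # drop malformed events
            continue
        payload['event_type'] = payload.get('event_type', 'unknown').lower()
        cleaned.append(payload)
        counts[payload['event_type']] += 1
    # Downstream, the cleansed records are delivered to Amazon S3 by Kinesis Data Firehose.
    print(json.dumps({'processed': len(cleaned), 'by_type': counts.most_common(5)}))
    return {'processed': len(cleaned)}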

I also monitor the duration of function execution using Amazon CloudWatch and AWS X-Ray.

Regardless of the format in which I receive data from our partners, I can send it to Kinesis as JSON using my own formatters. After Firehose writes this to Amazon S3, I have everything in nearly the same structure I received it in, but compressed, encrypted, and optimized for reading.

This data is automatically crawled by AWS Glue and placed into the AWS Glue Data Catalog. This means that I can immediately query the data directly on S3 using Amazon Athena or through Amazon Redshift Spectrum. Previously, I used Amazon EMR and an Amazon RDS–based metastore in Apache Hive for catalog management. Now I can avoid the complexity of maintaining Hive Metastore catalogs. Glue takes care of high availability and the operations side so that I know that end users can always be productive.

Working with Amazon Athena and Amazon Redshift for analysis

I found Amazon Athena extremely useful out of the box for ad hoc analysis. Our engineers (me) use Athena to understand new datasets that we receive and to understand what transformations will be needed for long-term query efficiency.
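
A minimal boto3 sketch of kicking off such an ad hoc Athena query; the database, table, and results bucket names below are placeholders:

import boto3

athena = boto3.client('athena')

# Placeholder database and table names for whatever the AWS Glue crawler cataloged.
response = athena.start_query_execution(
    QueryString="SELECT event_type, COUNT(*) AS events FROM clickstream_raw GROUP BY event_type",
    QueryExecutionContext={'Database': 'igaming_catalog'},
    ResultConfiguration={'OutputLocation': 's3://my-athena-results-bucket/'}
)
print("Query execution id:", response['QueryExecutionId'])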

For our data analysts and data scientists, we’ve selected Amazon Redshift. Amazon Redshift has proven to be the right tool for us over and over again. It easily processes 20+ million transactions per day, regardless of the footprint of the tables and the type of analytics required by the business. Latency is low and query performance expectations have been more than met. We use Redshift Spectrum for long-term data retention, which enables me to extend the analytic power of Amazon Redshift beyond local data to anything stored in S3, and without requiring me to load any data. Redshift Spectrum gives me the freedom to store data where I want, in the format I want, and have it available for processing when I need it.

To load data directly into Amazon Redshift, I use AWS Data Pipeline to orchestrate data workflows. I create Amazon EMR clusters on an intra-day basis, which I can easily adjust to run more or less frequently as needed throughout the day. EMR clusters are used together with Amazon RDS, Apache Spark 2.0, and S3 storage. The data pipeline application loads ETL configurations from Spring RESTful services hosted on AWS Elastic Beanstalk. The application then loads data from S3 into memory, aggregates and cleans the data, and then writes the final version of the data to Amazon Redshift. This data is then ready to use for analysis. Spark on EMR also helps with recommendations and personalization use cases for various business users, and I find this easy to set up and deliver what users want. Finally, business users use Amazon QuickSight for self-service BI to slice, dice, and visualize the data depending on their requirements.
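
A simplified PySpark sketch of that load, aggregate, and write step, assuming placeholder S3 paths, table names, and a JDBC connection to Amazon Redshift (the real job loads its configuration from the Spring services mentioned above, and the Redshift JDBC driver must be available on the Spark classpath):

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("intraday-etl").getOrCreate()

# Placeholder input path; the real paths come from the ETL configuration service.
events = spark.read.json("s3://my-bucket/clickstream/2018/01/15/*/")

daily = (events.where(F.col("event_type").isNotNull())        # basic cleansing
               .groupBy("player_id", "event_type")
               .agg(F.count("*").alias("event_count"),
                    F.sum("amount").alias("total_amount")))

# Write the aggregated result to Amazon Redshift over JDBC (credentials are placeholders).
(daily.write
      .format("jdbc")
      .option("url", "jdbc:redshift://my-cluster.example.eu-west-1.redshift.amazonaws.com:5439/warehouse")
      .option("dbtable", "analytics.daily_player_events")
      .option("user", "etl_user")
      .option("password", "example-password")
      .option("driver", "com.amazon.redshift.jdbc42.Driver")
      .mode("append")
      .save())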

Each AWS service in this architecture plays its part in saving precious time that’s crucial for delivery and getting different departments in the business on board. I found the services easy to set up and use, and all have proven to be highly reliable in our production environments. Once the architecture was in place, scaling out was either completely handled by the service or a matter of a simple API call, and crucially didn’t require me to change a single line of code. Increasing shards for Kinesis can be done in a minute by editing a stream. Increasing capacity for Lambda functions can be accomplished by editing the megabytes allocated for processing, and concurrency is handled automatically. EMR cluster capacity can easily be increased by changing the master and slave node types in Data Pipeline, or by using Auto Scaling. Lastly, RDS and Amazon Redshift can be easily upgraded without any major tasks to be performed by our team (again, me).

In the end, using AWS services including Kinesis, Lambda, Data Pipeline, and Amazon Redshift allows me to keep my team lean and highly productive. I eliminated the cost and delays of capital infrastructure, as well as the late night and weekend calls for support. I can now give maximum value to the business while keeping operational costs down. My team pushed out an agile and highly responsive data warehouse solution in record time and we can handle changing business requirements rapidly, and quickly adapt to new data and new user requests.


Additional Reading

If you found this post useful, be sure to check out Deploy a Data Warehouse Quickly with Amazon Redshift, Amazon RDS for PostgreSQL and Tableau Server and Top 8 Best Practices for High-Performance ETL Processing Using Amazon Redshift.


About the Author

Stephen Borg is the Head of Big Data and BI at Cerberus Technologies. He has a background in platform software engineering, and first became involved in data warehousing using the typical RDBMS, SQL, ETL, and BI tools. He quickly became passionate about providing insight to help others optimize the business and add personalization to products. He is now the Head of Big Data and BI at Cerberus Technologies.

Optimize Delivery of Trending, Personalized News Using Amazon Kinesis and Related Services

Post Syndicated from Yukinori Koide original https://aws.amazon.com/blogs/big-data/optimize-delivery-of-trending-personalized-news-using-amazon-kinesis-and-related-services/

This is a guest post by Yukinori Koide, the head of development for the Newspass department at Gunosy.

Gunosy is a news curation application that covers a wide range of topics, such as entertainment, sports, politics, and gourmet news. The application has been installed more than 20 million times.

Gunosy aims to provide people with the content they want without the stress of dealing with a large influx of information. We analyze user attributes, such as gender and age, and past activity logs like click-through rate (CTR). We combine this information with article attributes to provide trending, personalized news articles to users.

In this post, I show you how to process user activity logs in real time using Amazon Kinesis Data Firehose, Amazon Kinesis Data Analytics, and related AWS services.

Why does Gunosy need real-time processing?

Users need fresh and personalized news. There are two constraints to consider when delivering appropriate articles:

  • Time: Articles have freshness—that is, they lose value over time. New articles need to reach users as soon as possible.
  • Frequency (volume): Only a limited number of articles can be shown. It’s unreasonable to display all articles in the application, and users can’t read all of them anyway.

To deliver fresh articles with a high probability that the user is interested in them, it’s necessary to include not only past user activity logs and some feature values of articles, but also the most recent (real-time) user activity logs.

We optimize the delivery of articles with these two steps.

  1. Personalization: Deliver articles based on each user’s attributes, past activity logs, and feature values of each article—to account for each user’s interests.
  2. Trends analysis/identification: Optimize delivering articles using recent (real-time) user activity logs—to incorporate the latest trends from all users.

Optimizing the delivery of articles always begins with a cold start. Initially, we deliver articles based on past logs, and we then use real-time data to optimize as quickly as possible. In addition, news has a short shelf life: day-old news is old news, and even news that is three hours old is already stale. Therefore, shortening the time between step 1 and step 2 is important.

To tackle this issue, we chose AWS for processing streaming data because of its fully managed services, cost-effectiveness, and so on.

Solution

The following diagrams depict the architecture for optimizing article delivery by processing real-time user activity logs.

There are three processing flows:

  1. Process real-time user activity logs.
  2. Store and process all user-based and article-based logs.
  3. Execute ad hoc or heavy queries.

In this post, I focus on the first processing flow and explain how it works.

Process real-time user activity logs

The following are the steps for processing user activity logs in real time using Kinesis Data Streams and Kinesis Data Analytics.

  1. The Fluentd server sends the following user activity logs to Kinesis Data Streams:
{"article_id": 12345, "user_id": 12345, "action": "click"}
{"article_id": 12345, "user_id": 12345, "action": "impression"}
...
  2. Map rows of logs to columns in Kinesis Data Analytics.

  3. Set up the reference data for Kinesis Data Analytics from Amazon S3.

a. Gunosy has user attributes such as gender, age, and segment. Prepare the following CSV file (user_id, gender, segment_id) and put it in Amazon S3:

101,female,1
102,male,2
103,female,3
...

b. Add the application reference data source to Kinesis Data Analytics using the AWS CLI:

$ aws kinesisanalytics add-application-reference-data-source \
  --application-name <my-application-name> \
  --current-application-version-id <version-id> \
  --reference-data-source '{
  "TableName": "REFERENCE_DATA_SOURCE",
  "S3ReferenceDataSource": {
    "BucketARN": "arn:aws:s3:::<my-bucket-name>",
    "FileKey": "mydata.csv",
    "ReferenceRoleARN": "arn:aws:iam::<account-id>:role/..."
  },
  "ReferenceSchema": {
    "RecordFormat": {
      "RecordFormatType": "CSV",
      "MappingParameters": {
        "CSVMappingParameters": {"RecordRowDelimiter": "\n", "RecordColumnDelimiter": ","}
      }
    },
    "RecordEncoding": "UTF-8",
    "RecordColumns": [
      {"Name": "USER_ID", "Mapping": "0", "SqlType": "INTEGER"},
      {"Name": "GENDER",  "Mapping": "1", "SqlType": "VARCHAR(32)"},
      {"Name": "SEGMENT_ID", "Mapping": "2", "SqlType": "INTEGER"}
    ]
  }
}'

This application reference data source can now be referenced from Kinesis Data Analytics.

  4. Run a query against the source data stream on Kinesis Data Analytics with the application reference data source.

a. Define the temporary stream named TMP_SQL_STREAM.

CREATE OR REPLACE STREAM "TMP_SQL_STREAM" (
  GENDER VARCHAR(32), SEGMENT_ID INTEGER, ARTICLE_ID INTEGER, ACTION VARCHAR(16)
);

b. Insert the joined source stream and application reference data source into the temporary stream.

CREATE OR REPLACE PUMP "TMP_PUMP" AS
INSERT INTO "TMP_SQL_STREAM"
SELECT STREAM
  R.GENDER, R.SEGMENT_ID, S.ARTICLE_ID, S.ACTION
FROM      "SOURCE_SQL_STREAM_001" S
LEFT JOIN "REFERENCE_DATA_SOURCE" R
  ON S.USER_ID = R.USER_ID;

c. Define the destination stream named DESTINATION_SQL_STREAM.

CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
  TIME TIMESTAMP, GENDER VARCHAR(32), SEGMENT_ID INTEGER, ARTICLE_ID INTEGER, 
  IMPRESSION INTEGER, CLICK INTEGER
);

d. Insert the processed temporary stream, using a tumbling window, into the destination stream per minute.

CREATE OR REPLACE PUMP "STREAM_PUMP" AS
INSERT INTO "DESTINATION_SQL_STREAM"
SELECT STREAM
  ROW_TIME AS TIME,
  GENDER, SEGMENT_ID, ARTICLE_ID,
  SUM(CASE ACTION WHEN 'impression' THEN 1 ELSE 0 END) AS IMPRESSION,
  SUM(CASE ACTION WHEN 'click' THEN 1 ELSE 0 END) AS CLICK
FROM "TMP_SQL_STREAM"
GROUP BY
  GENDER, SEGMENT_ID, ARTICLE_ID,
  FLOOR("TMP_SQL_STREAM".ROWTIME TO MINUTE);

The results look like the following:

  5. Insert the results into Amazon Elasticsearch Service (Amazon ES).
  6. Batch servers get results from Amazon ES every minute (a sketch of such a query follows below). They then optimize delivering articles with other data sources using a proprietary optimization algorithm.
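
A hedged sketch of such a per-minute query against Amazon ES, assuming a hypothetical index name and the column names from the destination stream above:

import json
import requests

# Placeholder endpoint and index name; the real ones are internal to Gunosy.
ES_ENDPOINT = "https://search-my-domain.us-west-2.es.amazonaws.com"

query = {
    "size": 0,
    "query": {"range": {"TIME": {"gte": "now-1m"}}},          # last minute only
    "aggs": {
        "by_article": {
            "terms": {"field": "ARTICLE_ID", "size": 100},
            "aggs": {
                "impressions": {"sum": {"field": "IMPRESSION"}},
                "clicks": {"sum": {"field": "CLICK"}}
            }
        }
    }
}

resp = requests.post(ES_ENDPOINT + "/article_stats/_search",
                     data=json.dumps(query),
                     headers={"Content-Type": "application/json"})
for bucket in resp.json()["aggregations"]["by_article"]["buckets"]:
    impressions = bucket["impressions"]["value"]
    clicks = bucket["clicks"]["value"]
    ctr = clicks / impressions if impressions else 0.0
    print(bucket["key"], ctr)   # per-article CTR feeds the optimization algorithm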

How to connect a stream to another stream in another AWS Region

When we built the solution, Kinesis Data Analytics was not available in the Asia Pacific (Tokyo) Region, so we used the US West (Oregon) Region. The following shows how we connected a data stream to another data stream in the other Region.

There is no need to keep all components in a single AWS Region unless a response difference at the millisecond level is critical to the service.

Benefits

The solution provides benefits for both our company and our users. Benefits for the company are cost savings, including development, operational, and infrastructure costs, and reduced delivery time. Users can now find articles of interest more quickly. The solution can process more than 500,000 records per minute, and it enables fast, personalized news curation for our users.

Conclusion

In this post, I showed you how we optimize trending user activities to personalize news using Amazon Kinesis Data Firehose, Amazon Kinesis Data Analytics, and related AWS services in Gunosy.

AWS gives us a quick and economical solution and a good experience.

If you have questions or suggestions, please comment below.


Additional Reading

If you found this post useful, be sure to check out Implement Serverless Log Analytics Using Amazon Kinesis Analytics and Joining and Enriching Streaming Data on Amazon Kinesis.


About the Authors

Yukinori Koide is the head of development for the Newspass department at Gunosy. He is working on standardization of provisioning and deployment flow, promoting the utilization of serverless and containers for machine learning and AI services. His favorite AWS services are DynamoDB, Lambda, Kinesis, and ECS.

Akihiro Tsukada is a start-up solutions architect with AWS. He supports start-up companies in Japan technically at many levels, ranging from seed to later-stage.

Yuta Ishii is a solutions architect with AWS. He works with our customers to provide architectural guidance for building media & entertainment services, helping them improve the value of their services when using AWS.

AWS IoT, Greengrass, and Machine Learning for Connected Vehicles at CES

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-iot-greengrass-and-machine-learning-for-connected-vehicles-at-ces/

Last week I attended a talk given by Bryan Mistele, president of Seattle-based INRIX. Bryan’s talk provided a glimpse into the future of transportation, centered on four principal attributes, often abbreviated as ACES:

Autonomous – Cars and trucks are gaining the ability to scan and to make sense of their environments and to navigate without human input.

Connected – Vehicles of all types have the ability to take advantage of bidirectional connections (either full-time or intermittent) to other cars and to cloud-based resources. They can upload road and performance data, communicate with each other to run in packs, and take advantage of traffic and weather data.

Electric – Continued development of battery and motor technology will make electric vehicles more convenient, cost-effective, and environmentally friendly.

Shared – Ride-sharing services will change usage from an ownership model to an as-a-service model (sound familiar?).

Individually and in combination, these emerging attributes mean that the cars and trucks we will see and use in the decade to come will be markedly different from those of the past.

On the Road with AWS
AWS customers are already using our AWS IoT, edge computing, Amazon Machine Learning, and Alexa products to bring this future to life – vehicle manufacturers, their tier 1 suppliers, and AutoTech startups all use AWS for their ACES initiatives. AWS Greengrass is playing an important role here, attracting design wins and helping our customers to add processing power and machine learning inferencing at the edge.

AWS customer Aptiv (formerly Delphi) presented their Automated Mobility on Demand (AMoD) smart vehicle architecture in an AWS re:Invent session. Aptiv's AMoD platform will use Greengrass and microservices to drive the onboard user experience, along with edge processing, monitoring, and control.

Another customer, Denso of Japan (one of the world's largest suppliers of auto components and software), is using Greengrass and AWS IoT to support their vision of Mobility as a Service (MaaS).

AWS at CES
The AWS team will be out in force at CES in Las Vegas and would love to talk to you. They’ll be running demos that show how AWS can help to bring innovation and personalization to connected and autonomous vehicles.

Personalized In-Vehicle Experience – This demo shows how AWS AI and Machine Learning can be used to create a highly personalized and branded in-vehicle experience. It makes use of Amazon Lex, Amazon Polly, and Amazon Rekognition, but the design is flexible and can be used with other services as well. The demo encompasses driver registration, login and startup (including facial recognition), voice assistance for contextual guidance, personalized e-commerce, and vehicle control.
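
To give a sense of how the facial-recognition login step could be wired up, here is a minimal sketch that matches a camera frame against an Amazon Rekognition face collection. The collection name and the use of ExternalImageId to carry the driver ID are assumptions for illustration, not the demo's actual implementation.

import boto3

rekognition = boto3.client("rekognition", region_name="us-east-1")

def identify_driver(frame_bytes, collection_id="registered-drivers"):
    # Search a (hypothetical) collection of registered drivers for the face
    # captured by the in-vehicle camera.
    resp = rekognition.search_faces_by_image(
        CollectionId=collection_id,
        Image={"Bytes": frame_bytes},
        MaxFaces=1,
        FaceMatchThreshold=90,
    )
    matches = resp["FaceMatches"]
    if not matches:
        return None  # unknown face; fall back to manual login
    # At registration time, the driver ID would be stored as ExternalImageId.
    return matches[0]["Face"]["ExternalImageId"]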

Connected Vehicle Solution – This demo shows how a connected vehicle can combine local and cloud intelligence, using edge computing and machine learning at the edge. It handles intermittent connections and uses AWS DeepLens to train a model that responds to distracted drivers. The overall architecture is described in our Connected Vehicle Solution.

Digital Content Delivery – This demo shows how a customer can use a web-based 3D configurator to build and personalize their vehicle. It also shows a high-resolution (4K) 3D image and an optional immersive AR/VR experience, both designed for use within a dealership.

Autonomous Driving – This demo will showcase the AWS services that can be used to build autonomous vehicles. There’s a 1/16th scale model vehicle powered and driven by Greengrass and an overview of a new AWS Autonomous Toolkit. As part of the demo, attendees drive the car, training a model via Amazon SageMaker for subsequent on-board inferencing, powered by Greengrass ML Inferencing.

To speak to one of my colleagues or to set up a time to see the demos, check out the Visit AWS at CES 2018 page.

Some Resources
If you are interested in this topic and want to learn more, the AWS for Automotive page is a great starting point, with discussions on connected vehicles & mobility, autonomous vehicle development, and digital customer engagement.

When you are ready to start building a connected vehicle, the AWS Connected Vehicle Solution contains a reference architecture that combines local computing, sophisticated event rules, and cloud-based data processing and storage. You can use this solution to accelerate your own connected vehicle projects.

Jeff;

Introducing Email Templates and Bulk Sending

Post Syndicated from Brent Meyer original https://aws.amazon.com/blogs/ses/introducing-email-templates-and-bulk-sending/

The Amazon SES team is excited to announce our latest update, which includes two related features that help you send personalized emails to large groups of customers. This post discusses these features and provides examples you can follow to start using them right away.

Email templates

You can use email templates to create the structure of an email that you plan to send to multiple recipients, or that you will use again in the future. Each template contains a subject line, a text part, and an HTML part. Both the subject and the email body can contain variables that are automatically replaced with values specific to each recipient. For example, you can include a {{name}} variable in the body of your email. When you send the email, you specify the value of {{name}} for each recipient. Amazon SES then automatically replaces the {{name}} variable with the recipient’s first name.

Creating a template

To create a template, you use the CreateTemplate API operation. To use this operation, pass a JSON object with four properties: a template name (TemplateName), a subject line (SubjectPart), a plain text version of the email body (TextPart), and an HTML version of the email body (HtmlPart). You can include variables in the subject line or message body by enclosing the variable names in two sets of curly braces. The following example shows the structure of this JSON object.

{
  "TemplateName": "MyTemplate",
  "SubjectPart": "Greetings, {{name}}!",
  "TextPart": "Dear {{name}},\r\nYour favorite animal is {{favoriteanimal}}.",
  "HtmlPart": "<h1>Hello {{name}}</h1><p>Your favorite animal is {{favoriteanimal}}.</p>"
}

Use this example to create your own template, and save the resulting file as mytemplate.json. You can then use the AWS Command Line Interface (AWS CLI) to create your template by running the following command: aws ses create-template --cli-input-json file://mytemplate.json
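
If you prefer an SDK to the CLI, the equivalent call with the AWS SDK for Python (boto3) looks roughly like the following sketch.

import boto3

ses = boto3.client("ses", region_name="us-east-1")  # any Region where Amazon SES is available

# Create the same template as mytemplate.json.
ses.create_template(
    Template={
        "TemplateName": "MyTemplate",
        "SubjectPart": "Greetings, {{name}}!",
        "TextPart": "Dear {{name}},\r\nYour favorite animal is {{favoriteanimal}}.",
        "HtmlPart": "<h1>Hello {{name}}</h1><p>Your favorite animal is {{favoriteanimal}}.</p>",
    }
)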

Sending an email created with a template

Now that you have created a template, you’re ready to send email that uses the template. You can use the SendTemplatedEmail API operation to send email to a single destination using a template. Like the CreateTemplate operation, this operation accepts a JSON object with four properties. For this operation, the properties are the sender’s email address (Source), the name of an existing template (Template), an object called Destination that contains the recipient addresses (and, optionally, any CC or BCC addresses) that will receive the email, and a property that refers to the values that will be replaced in the email (TemplateData). The following example shows the structure of the JSON object used by the SendTemplatedEmail operation.

{
  "Source": "[email protected]",
  "Template": "MyTemplate",
  "Destination": {
    "ToAddresses": [ "[email protected]" ]
  },
  "TemplateData": "{ \"name\":\"Alejandro\", \"favoriteanimal\": \"zebra\" }"
}

Customize this example to fit your needs, and then save the resulting file as myemail.json. One important note: in the TemplateData property, you must use a backslash (\) character to escape the quotes within this object, as shown in the preceding example.

When you're ready to send the email, run the following command: aws ses send-templated-email --cli-input-json file://myemail.json
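
The same send can be done with boto3; here is a sketch that mirrors myemail.json (addresses redacted as in the examples above).

import json
import boto3

ses = boto3.client("ses", region_name="us-east-1")

ses.send_templated_email(
    Source="[email protected]",
    Template="MyTemplate",
    Destination={"ToAddresses": ["[email protected]"]},
    # json.dumps takes care of the quote escaping shown in the JSON example.
    TemplateData=json.dumps({"name": "Alejandro", "favoriteanimal": "zebra"}),
)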

Bulk email sending

In most cases, you'll want to use email templates to send personalized emails to many customers at the same time. The SendBulkTemplatedEmail API operation helps you do that. This operation also accepts a JSON object. At a minimum, you must supply a sender email address (Source), a reference to an existing template (Template), a list of recipients in an array called Destinations (within which you specify each recipient's email address and the variable values for that recipient), and a set of fallback values for the variables in the template (DefaultTemplateData). The following example shows the structure of this JSON object.

{
  "Source":"[email protected]",
  "ConfigurationSetName":"ConfigSet",
  "Template":"MyTemplate",
  "Destinations":[
    {
      "Destination":{
        "ToAddresses":[
          "[email protected]"
        ]
      },
      "ReplacementTemplateData":"{ \"name\":\"Anaya\", \"favoriteanimal\":\"yak\" }"
    },
    {
      "Destination":{ 
        "ToAddresses":[
          "[email protected]"
        ]
      },
      "ReplacementTemplateData":"{ \"name\":\"Liu\", \"favoriteanimal\":\"water buffalo\" }"
    },
    {
      "Destination":{
        "ToAddresses":[
          "[email protected]"
        ]
      },
      "ReplacementTemplateData":"{ \"name\":\"Shirley\", \"favoriteanimal\":\"vulture\" }"
    },
    {
      "Destination":{
        "ToAddresses":[
          "[email protected]"
        ]
      },
      "ReplacementTemplateData":"{}"
    }
  ],
  "DefaultTemplateData":"{ \"name\":\"friend\", \"favoriteanimal\":\"unknown\" }"
}

This example sends unique emails to Anaya ([email protected]), Liu ([email protected]), Shirley ([email protected]), and a fourth recipient ([email protected]), whose name and favorite animal we didn’t specify. Anaya, Liu, and Shirley will see their names in place of the {{name}} tag in the template (which, in this example, is present in both the subject line and message body), as well as their favorite animals in place of the {{favoriteanimal}} tag in the message body. The DefaultTemplateData property determines what happens if you do not specify the ReplacementTemplateData property for a recipient. In this case, the fourth recipient will see the word “friend” in place of the {{name}} tag, and “unknown” in place of the {{favoriteanimal}} tag.

Use the example to create your own list of recipients, and save the resulting file as mybulkemail.json. When you're ready to send the email, run the following command: aws ses send-bulk-templated-email --cli-input-json file://mybulkemail.json
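
For completeness, here is a boto3 sketch of the same bulk send, shortened to two destinations (addresses redacted as in the JSON example).

import json
import boto3

ses = boto3.client("ses", region_name="us-east-1")

ses.send_bulk_templated_email(
    Source="[email protected]",
    Template="MyTemplate",
    DefaultTemplateData=json.dumps({"name": "friend", "favoriteanimal": "unknown"}),
    Destinations=[
        {
            "Destination": {"ToAddresses": ["[email protected]"]},
            "ReplacementTemplateData": json.dumps({"name": "Anaya", "favoriteanimal": "yak"}),
        },
        {
            "Destination": {"ToAddresses": ["[email protected]"]},
            "ReplacementTemplateData": "{}",  # falls back to DefaultTemplateData
        },
    ],
)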

Other considerations

There are a few limits and other considerations when using these features:

  • You can create up to 10,000 email templates per Amazon SES account.
  • Each template can be up to 10 MB in size.
  • You can include an unlimited number of replacement variables in each template.
  • You can send email to up to 50 destinations in each call to the SendBulkTemplatedEmail operation. A destination includes a list of recipients, as well as CC and BCC recipients. If you have more recipients than that, split them into batches of 50, as shown in the sketch after this list. Note that the number of destinations you can contact in a single call to the API may be limited by your account's maximum sending rate. For more information, see Managing Your Amazon SES Sending Limits in the Amazon SES Developer Guide.
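
Because each call accepts at most 50 destinations, larger recipient lists need to be split into batches. Here is a minimal sketch of that batching, assuming destinations is a list shaped like the Destinations array in the earlier example.

import json
import boto3

ses = boto3.client("ses", region_name="us-east-1")
BATCH_SIZE = 50  # maximum destinations per SendBulkTemplatedEmail call

def send_in_batches(destinations):
    # destinations: list of {"Destination": ..., "ReplacementTemplateData": ...} dicts.
    for start in range(0, len(destinations), BATCH_SIZE):
        ses.send_bulk_templated_email(
            Source="[email protected]",
            Template="MyTemplate",
            DefaultTemplateData=json.dumps({"name": "friend", "favoriteanimal": "unknown"}),
            Destinations=destinations[start:start + BATCH_SIZE],
        )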

We look forward to seeing the amazing things you create with these new features. If you have any questions, please leave a comment on this post, or let us know in the Amazon SES forum.