Tag Archives: Amazon Machine Learning

Introducing the new AWS Well-Architected Machine Learning Lens

Post Syndicated from Haleh Najafzadeh original https://aws.amazon.com/blogs/architecture/introducing-the-new-aws-well-architected-machine-learning-lens/

The AWS Well-Architected Framework provides you with a formal approach to compare your workloads against best practices. It also includes guidance on how to make improvements.

Machine learning (ML) algorithms discover and learn patterns in data, and construct mathematical models to predict future data. These solutions can revolutionize lives through better diagnoses of diseases, environmental protections, products and services transformation, and more.

Your ML models depend on the quality of input data to generate accurate results. As data changes over time, monitoring is required to continuously detect, correct, and mitigate issues. This improves accuracy and performance. It also may require you to retrain your model with the latest refined data.

Application workloads rely on step-by-step instructions to solve a problem. ML workloads enable algorithms to learn from data through an iterative and continuous cycle. We are announcing a brand-new version of the AWS Well-Architected Machine Learning Lens whitepaper. It complements and builds upon the Well-Architected Framework to address this difference between these two types of workloads.

The whitepaper provides you with a set of established cloud and technology agnostic best practices. You can apply this guidance and architectural principles when designing your ML workloads, or after your workloads have entered production as part of continuous improvement. The paper includes guidance and resources to help you implement these best practices on AWS.

The Well-Architected Machine Learning Lens components

The Lens includes four focus areas:

1. The Well-Architected Machine Learning Design Principles — A set of considerations that are used as the basis for a Well-Architected ML workload. These design principles are the guiding light for the collection of the best practices in the ML Lens.

2. The Well-Architected Machine Learning Lifecycle — This integrates the Well-Architected Framework into the Machine Learning Lifecycle as can be seen in figure 1.

    • The Well-Architected Framework pillars includes:
      1. Operational Excellence
      2. Security
      3. Reliability
      4. Performance Efficiency
      5. Cost Optimization
    • The Machine Learning Lifecycle phases referenced in the ML Lens include:
      1. Business goal identification
      2. ML problem framing
      3. Data processing (data collection, data pre-processing, feature engineering)
      4. Model development (training, tuning, evaluation)
      5. Model deployment (prediction, inference)
      6. Model monitoring
Figure 1. Well-Architected Machine Learning Lifecycle

Figure 1. Well-Architected Machine Learning Lifecycle

In the Well-Architected ML Lens whitepaper, the Well-Architected Machine Learning Lifecycle applies the Well-Architected Framework pillars to each of the lifecycle phases.

3. Cloud and technology agnostic best practices — These are best practices for each ML lifecycle phase across the Well-Architected Framework pillars. Best practices are accompanied by:

    • Implementation guidance that provides AWS implementation plans for each best practice with references to AWS technologies and resources.
    • Resources as a set of links to AWS documents, blogs, videos, and code examples as supporting resources to the best practices and their implementation plans.

4. ML Lifecycle architecture diagrams — These illustrate processes, technologies, and components that support many of the best practices, shown in Figure 2. They include: Feature stores, Model Registry, lineage tracker, alarm manager, scheduler, and more. Different pipeline technologies are illustrated using these architecture diagrams.

Figure 2. Machine Learning Lifecycle phases with expanded components

Figure 2. Machine Learning Lifecycle phases with expanded components

Where should you apply the Well-Architected Machine Learning Lens?

Use the Well-Architected ML Lens to:

  • Make informed decisions — Plan early and make informed decisions by reviewing best practices before a new workload design begins.
  • Build and deploy faster — Use the best practices to guide you through building new Well-Architected workloads across the ML lifecycle.
  • Lower or mitigate risks — Evaluate existing workloads regularly to identify, mitigate, and address potential issues early.
  • Learn AWS best practices — Use the provided implementation plans as guidance on implementing the best practices on AWS.


The new Well-Architected Machine Learning Lens whitepaper is available now. Use the Lens to help ensure that your ML workloads are architected with operational excellence, security, reliability, performance efficiency, and cost optimization in mind.

Special thanks to everyone across the AWS Solution Architecture and Machine Learning communities.  These contributions encompassed diverse perspectives, expertise, and experiences in developing the new AWS Well-Architected Machine Learning Lens.

Field Notes: Build a Cross-Validation Machine Learning Model Pipeline at Scale with Amazon SageMaker

Post Syndicated from Wei Teh original https://aws.amazon.com/blogs/architecture/field-notes-build-a-cross-validation-machine-learning-model-pipeline-at-scale-with-amazon-sagemaker/

When building a machine learning algorithm, such as a regression or classification algorithm, a common goal is to produce a generalized model. This is so that it performs well on new data that the model has not seen before. Overfitting and underfitting are two fundamental causes of poor performance for machine learning models. A model is overfitted when it performs well on known data, but generalizes poorly on new data. However, an underfit model performs poorly on both trained and new data. A reliable model validation technique helps provide better assessment for predicting model performance in practice, and provides insight for training models to achieve the best accuracy.

Cross-validation is a standard model validation technique commonly used for assessing performance of machine learning algorithms. In general, it works by first sampling the dataset into groups of similar sizes, where each group contains a subset of data dedicated for training and model evaluation. After the data has been grouped, a machine learning algorithm will fit and score a model using the data in each group independently. The final score of the model is defined by the average score across all the trained models for performance metric representation.

There are few cross-validation methods commonly used, including k-fold, stratified k-fold, and leave-p-out, to name a few. Although there are well-defined data science frameworks that can help simplify cross-validation processes, such as Python scikit-learn library, these frameworks are designed to work in a monolithic, single compute environment. When it comes to training machine learning algorithms with large volume of data, these frameworks become bottlenecked with limited scalability and reliability.

In this blog post, we are going to walk through the steps for building a highly scalable, high-accuracy, machine learning pipeline, with the k-fold cross-validation method, using Amazon Simple Storage Service (Amazon S3), Amazon SageMaker Pipelines, SageMaker automatic model tuning, and SageMaker training at scale.

Overview of solution

To operate the k-fold cross validation training pipeline at scale, we built an end to end machine learning pipeline using SageMaker native features. This solution implements the k-fold data processing, model training, and model selection processes as individual components to maximize parallellism. The pipeline is orchestrated through SageMaker Pipelines in distributed manner to achieve scalability and performance efficiency. Let’s dive into the high-level architecture of the solution in the following section.

Figure 1. Solution architecture

Figure 1. Solution architecture

The overall solution architecture is shown in Figure 1. There are four main building blocks in the k-fold cross-validation model pipeline:

  1. Preprocessing – Sample and split the entire dataset into k groups.
  2. Model training – Fit the SageMaker training jobs in parallel with hyperparameters optimized through the SageMaker automatic model tuning job.
  3. Model selection – Fit a final model, using the best hyperparameters obtained in step 2, with the entire dataset.
  4. Model registration – Register the final model with SageMaker Model Registry, for model lifecycle management and deployment.

The final output from the pipeline is a model that represents best performance and accuracy for the given dataset. The pipeline can be orchestrated easily using a workflow management tool, such as Pipelines.

Amazon SageMaker is a fully managed service that enables data scientists and developers to quickly develop, train, tune, and deploy machine learning quickly and at scale. When it comes to choosing the right machine learning and data processing frameworks to solve problems, SageMaker gives you the flexibility to use prebuilt containers bundled with the supported common machine learning frameworks—such as Tensorflow, Pytorch, and MxNet—or to bring your own container images with custom scripts and libraries that fit your use cases to train on the highly available SageMaker model training environment. Additionally, Pipelines enables users to develop complete machine learning workflows using python SDK, and manage these workflows in SageMaker Studio.

For simplicity, we will use the public Iris flower data as the train and test dataset to build a multivariate classification model using linear algorithm (SVM). The pipeline architecture is agnostic to the data and model; hence, it can be modified to adopt a different dataset or algorithm.


To deploy the solution, you require the following:

  • SageMaker Studio
  • A Command Line (Terminal) that supports building Docker images (or instance, AWS Cloud9)

Solution walkthrough

In this section, we are going to walk through the steps to create a cross-validation model training pipeline using Pipelines. The main components are as follows.

  1. Pipeline parameters
    Pipelines parameters are introduced as variables that allow the predefined values to be overridden at runtime. Pipelines supports the following parameters types: String, Integer, and Float (expressed as ParameterString, ParameterInteger, and ParameterFloat). The following are some examples of the parameters used in the cross-validation model training pipeline:
    • K-Fold – Value of k to be used in k-fold cross-validation
    • ProcessingInstanceCount – Number of instances for SageMaker processing job
    • ProcessingInstanceType – Instance type used for SageMaker processing job
    • TrainingInstanceType – Instance type used for SageMaker training job
    • TrainingInstanceCount – Number of instances for SageMaker training job
  1. Preprocessing

In this step, the original dataset is split into k equal-sized samples. One of the k samples is retained as the validation data for model evaluation, with the remaining k-1 samples to be used as training data. This process is repeated k times, with each of the k samples used as the validation set only one time. The k sample collections are uploaded to an S3 bucket, with the prefix corresponding to an index (0 – k-1) to be identified as the input path to the specified training jobs in the next step of the pipeline. The cross-validation split is submitted as a SageMaker processing job orchestrated through the Pipelines processing step. The processing flow is shown in Figure 2.

Figure 2. K-fold cross-validation: original data is split into k equal-sized samples uploaded to S3 bucket

Figure 2. K-fold cross-validation: original data is split into k equal-sized samples uploaded to S3 bucket

The following code snippet splits the k-fold dataset in the preprocessing script:

def save_kfold_datasets(X, y, k):
    """ Splits the datasets (X,y) k folds and saves the output from 
    each fold into separate directories.

        X : numpy array represents the features
        y : numpy array represetns the target
        k : int value represents the number of folds to split the given datasets

    # Shuffles and Split dataset into k folds. 
    kf = KFold(n_splits=k, random_state=23, shuffle=True)

    fold_idx = 0
    for train_index, test_index in kf.split(X, y=y, groups=None):    
       X_train, X_test = X[train_index], X[test_index]
       y_train, y_test = y[train_index], y[test_index]
       os.makedirs(f'{base_dir}/train/{fold_idx}', exist_ok=True)
       np.savetxt(f'{base_dir}/train/{fold_idx}/train_x.csv', X_train, delimiter=',')
       np.savetxt(f'{base_dir}/train/{fold_idx}/train_y.csv', y_train, delimiter=',')

       os.makedirs(f'{base_dir}/test/{fold_idx}', exist_ok=True)
       np.savetxt(f'{base_dir}/test/{fold_idx}/test_x.csv', X_test, delimiter=',')
       np.savetxt(f'{base_dir}/test/{fold_idx}/test_y.csv', y_test, delimiter=',')
       fold_idx += 1
  1.  Cross-validation training with SageMaker automatic model tuning

In a typical cross-validation training scenario, a chosen algorithm is trained for k times with specific training and a validation dataset sampled through the k-fold technique, mentioned in the previous step. Traditionally, the cross-validation model training process is performed sequentially on the same server. This method is inefficient and doesn’t scale well for models with large volumes of data. Because all the samples are uploaded to an S3 bucket, we can now run k training jobs in parallel. Each training job will consume input samples in the specified bucket location correspond to the index (ranged between 0 – k-1) given to the training job. Additionally, the hyperparameter values must be the same for all k jobs because cross validation estimates the true out-of-sample performance of a model trained with this specific set of hyperparameters.

Although the cross-validation technique helps generalize the models, hyperparameter tuning for the model is typically performed manually. In this blog post, we are going to take a heuristic approach of finding the most optimized hyperparameters using SageMaker automatic model tuning.

We start by defining a training script that accepts the hyperparameters as input for the specified model algorithm, and then implement the model training and evaluation steps.

The steps involved in the training script are summarized as follows:

    1. Parse hyperparameters from the input.
    2. Fit the model using the parsed hyperparameters.
    3. Evaluate model performance (score).
    4. Save the trained model.
if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('-c', '--c', type=float, default=1.0)
    parser.add_argument('--gamma', type=float)
    parser.add_argument('--kernel', type=str)
    # Sagemaker specific arguments. Defaults are set in the environment variables.
    parser.add_argument('--output-data-dir', type=str, default=os.environ['SM_OUTPUT_DATA_DIR'])
    parser.add_argument('--model-dir', type=str, default=os.environ['SM_MODEL_DIR'])
    parser.add_argument('--train', type=str, default=os.environ['SM_CHANNEL_TRAIN'])
    parser.add_argument('--test', type=str, default=os.environ.get('SM_CHANNEL_TEST'))
    args = parser.parse_args()
    model = train(train=args.train, test=args.test)
    evaluate(test=args.test, model=model)
    dump(model, os.path.join(args.model_dir, "model.joblib"))

Next, we create a python script that performs cross-validation model training by submitting k SageMaker training jobs in parallel with given hyperparameters. Additionally, the script monitors the progress of the training jobs, and calculates the objective metrics by averaging the scores across the completed jobs.

Now we create a python script that uses a SageMaker automatic model tuning job to find the optimal hyperparameters for the trained models. The hyperparameter tuner works by running a specified number of training jobs using the ranges of hyperparameters specified. The number of training jobs and ranges of hyperparameters are given in the input parameter to the script. After the tuning job completes, the objective metrics, as well as the hyperparameters from the best cross-validation model training job, are captured, formatted in JSON format, respectively, to be used in the next steps of the workflow. Figure 3 illustrates cross-validation training with automatic model tuning.

Figure 3. In cross-validation training step, a SageMaker HyperparameterTuner job invokes n training jobs. The metrics and hyperparameters are captured for downstream processes.

Figure 3. In cross-validation training step, a SageMaker HyperparameterTuner job invokes n training jobs. The metrics and hyperparameters are captured for downstream processes.

Finally, the training and cross-validation scripts are packaged and built as a custom container image, available for the SageMaker automatic model tuning job for submission. The following code snippet is for building the custom image:

FROM python:3.7
RUN apt-get update && pip install sagemaker boto3 numpy sagemaker-training
COPY cv.py /opt/ml/code/train.py
COPY scikit_learn_iris.py /opt/ml/code/scikit_learn_iris.py
  1. Model evaluation
    The objective metrics in the cross-validation training and tuning steps define the model quality. To evaluate the model performance, we created a conditional step that compares the metrics against a baseline to determine the next step in the workflow. The following code snippet illustrates the conditional step in detail. Specifically, this step first extracts the objective metrics based on the evaluation report uploaded in previous step, and then compares the value with baseline_model_objective_value provided in the pipeline job. The workflow continues if the model objective metric is greater than or equal to the baseline value, and stops otherwise.
from sagemaker.workflow.conditions import ConditionGreaterThanOrEqualTo
from sagemaker.workflow.condition_step import (
cond_gte = ConditionGreaterThanOrEqualTo(
step_cond = ConditionStep(
    if_steps=[step_model_selection, step_register_model],
  1. Model Selection
    At this stage of the pipeline, we’ve completed cross-validation and hyperparameter optimization steps to identify the best performing model trained with the specific hyperparameter values. In this step, we are going to fit a model using the same algorithm used in cross-validation training by providing the entire dataset and the hyperparameters from the best model. The trained model will be used for serving predictions for downstream applications. The following code snippet illustrates a Pipelines training step for model selection:
from sagemaker.inputs import TrainingInput
from sagemaker.workflow.steps import TrainingStep
from sagemaker.sklearn.estimator import SKLearn
sklearn_estimator = SKLearn("scikit_learn_iris.py", 
step_model_selection = TrainingStep(
        "train": TrainingInput(
        "jobinfo": TrainingInput(
  1. Model registration
    Because the cross-validation model training pipeline evolves, it’s important to have a mechanism for managing the version of model artifacts over time, so that the team responsible for the project can manage the model lifecycle, including track, deploy, or rollback a model based on the version. Building your own model registry, with lifecycle management capabilities, can be complicated and challenging to maintain and operate. SageMaker Model Registry simplifies model lifecycle management by enabling model catalog, versioning, metrics association, model approval workflow, and model deployment automation.

In the final step of the pipeline, we are going to register the trained model with Model Registry by associating model objective metrics, the model artifact location on S3 bucket, the estimator object used in the model selection step, model training and inference metadata, and approval status. The following code snippet illustrates the model registry step using ModelMetrics and RegisterModel.

from sagemaker.model_metrics import MetricsSource, ModelMetrics
from sagemaker.workflow.step_collections import RegisterModel
model_metrics = ModelMetrics(
step_register_model = RegisterModel(
    inference_instances=["ml.t2.medium", "ml.m5.xlarge"],

Figure 4 shows a model version registered in SageMaker Model Registry upon a successful pipeline job through Studio.

Figure 4. Model version registered successfully in SageMaker

  1. Putting everything together
    Now that we’ve defined a cross-validation training pipeline, we can track, visualize, and manage the pipeline job directly from within Studio. The following code snippet and Figure 5 depicts our pipeline definition:
from sagemaker.workflow.pipeline_experiment_config import PipelineExperimentConfig
from sagemaker.workflow.execution_variables import ExecutionVariables
pipeline_name = f"CrossValidationTrainingPipeline"
pipeline = Pipeline(
    steps=[step_process, step_cv_train_hpo, step_cond],
Figure 5. SageMaker Pipelines definition shown in SageMaker Studio

Figure 5. SageMaker Pipelines definition shown in SageMaker Studio

Finally, to kick off the pipeline, invoke the pipeline.start() function, with optional parameters specific to the job run:

execution = pipeline.start(

You can track the pipeline job from within Studio, or use SageMaker application programming interfaces (APIs). Figure 6 shows a screenshot of a pipeline job in progress from Studio.

Figure 6. SageMaker Pipelines job progress shown in SageMaker Studio

Figure 6. SageMaker Pipelines job progress shown in SageMaker Studio


In this blog post, we showed you an architecture that orchestrates a complete workflow for cross-validation model training. We implemented the workflow using SageMaker Pipelines that incorporates preprocessing, hyperparameter tuning, model evaluation, model selection, and model registration. The solution addresses the common challenge of orchestrating cross-validation model pipeline at scale. The entire pipeline implementation, including a jupyter notebook that defines the pipeline, a Dockerfile and python scripts described in this blog post, can be found in the GitHub project.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

Classifying Millions of Amazon items with Machine Learning, Part I: Event Driven Architecture

Post Syndicated from Mahmoud Abid original https://aws.amazon.com/blogs/architecture/classifying-millions-of-amazon-items-with-machine-learning-part-i-event-driven-architecture/

As part of AWS Professional Services, we work with customers across different industries to understand their needs and supplement their teams with specialized skills and experience.

Some of our customers are internal teams from the Amazon retail organization who request our help with their initiatives. One of these teams, the Global Environmental Affairs team, identifies the number of electronic products sold. Then they classify these products according to local laws and accurately report this data to regulators. This process covers the products’ end-of-life costs and ensures a high quality of recycling.

These electronic products have classification codes that differ from country to country, and these codes change according to each country’s latest regulations. This poses a complex technical problem. How do we automate our compliance teams’ work to efficiently and accurately classify over three million product classifications every month, in more than 38 countries, while also complying with evolving classification regulations?

To solve this problem, we used Amazon Machine Learning (Amazon ML) capabilities to build a resilient architecture. It ingests and processes data, trains ML models, and predicts (also known as inference workflow) monthly sales data for all countries concurrently.

In this post, we outline how we used AWS Lambda, Amazon EventBridge, and AWS Step Functions to build a scalable and cost-effective solution. We’ll also show you how to keep the data secure while processing it in Amazon ML flows.

Solution overview

Our solution consists of three main parts, which are summarized here and detailed in the following sections:

  1. Training the ML models
  2. Evaluating their performance
  3. Using them to run an inference workflow (in other words, label) the sold items with the correct classification codes

Training the Amazon ML model

For training our Amazon ML model, we use the architecture in Figure 1. It starts with a periodic query against the Amazon.com data warehouse in Amazon Redshift.

Training workflow

Figure 1. Training workflow

  1. A labeled dataset containing pre-recorded classification codes is extracted from Amazon Redshift. This dataset is stored in an Amazon Simple Storage Service (Amazon S3) bucket and split up by country. The data is encrypted at rest with server-side encryption using an AWS Key Management Service (AWS KMS) key. This is also known as server-side encryption with AWS KMS (SSE-KMS). The extraction query uses the AWS KMS key to encrypt the data when storing it in the S3 bucket.
  2. Each time a country’s dataset is uploaded to the S3 bucket, a message is sent to an Amazon Simple Queue Service (Amazon SQS) queue. This prompts a Lambda function. We use Amazon SQS to ensure resiliency. If the Lambda function fails, the message will be tried again automatically. Overall, the message is either processed successfully, or ends up in a dead letter queue that we monitor (not displayed in Figure 1).
  3. If the message is processed successfully, the Lambda function generates necessary input parameters. Then it starts a Step Functions workflow execution for the training process.
  4. The training process involves orchestrating Amazon SageMaker Processing jobs to prepare the data. Once the data is prepared, a hyperparameter optimization job invokes multiple training jobs. These run in parallel with different values from a range of hyperparameters. The model that performs the best is chosen to move forward.
  5. After the model is trained successfully, an EventBridge event is prompted, which will be used to invoke the performance comparison process.

Comparing performance of Amazon ML models

Because Amazon ML models are automatically trained periodically, we want to assess their performance automatically too. Newly created models should perform better than their predecessors. To measure this, we use the flow in Figure 2.

Model performance comparison workflow

Figure 2. Model performance comparison workflow

  1. The flow is activated by the EventBridge event at the end of the training flow.
  2. A Lambda function gathers the necessary input parameters and uses them to start an inference workflow, implemented as a Step Function.
  3. The inference workflow use SageMaker Processing jobs to prepare a new test dataset. It performs predictions using SageMaker Batch Transform jobs with the new model. The test dataset is a labeled subset that was not used in model training. Its prediction gives an unbiased estimation of the model’s performance, proving that the model can generalize.
  4. After the inference workflow is completed and the results are stored on Amazon S3, an EventBridge event is performed, which prompts another Lambda function. This function runs the performance comparison Step Function.
  5. The performance comparison workflow uses a SageMaker Processing job to analyze the inference results and calculate its performance score based on ground truth. For each country, the job compares the performance of the new model with the performance of the last used model to determine which one was best, otherwise known as the “winner model.” The metadata of the winner model is saved in an Amazon DynamoDB table so it can be queried and used in the next production inference job.
  6. At the end of the performance comparison flow, an informational notification is sent to an Amazon Simple Notification Service (Amazon SNS) topic, which will be received by the MLOps team.

Running inference

The inference flow starts with a periodic query against the Amazon.com data warehouse in Amazon Redshift, as shown in Figure 3.

Inference workflow

Figure 3. Inference workflow

  1. As with training, the dataset is extracted from Amazon Redshift, split up by country, and stored in an S3 bucket and encrypted at rest using the AWS KMS key.
  2. Every country dataset upload prompts a message to an SQS queue, which invokes a Lambda function.
  3. The Lambda function gathers necessary input parameters and starts a workflow execution for the inference process. This is the same Step Function we used in the performance comparison. Now it runs against the real dataset instead of the test set.
  4. The inference Step Function orchestrates the data preparation and prediction using the winner model for each country, as stored in the model performance DynamoDB table. The predictions are uploaded back to the S3 bucket to be further consumed for reporting.
  5. Lastly, an Amazon SNS message is sent to signal completion of the inference flow, which will be received by different stakeholders.

Data encryption

One of the key requirements of this solution was to provide least privilege access to all data. To achieve this, we use AWS KMS to encrypt all data as follows:

Restriction of data decryption permissions

Figure 4. Restriction of data decryption permissions


In this post, we outline how we used a serverless architecture to handle the end-to-end flow of data extraction, processing, and storage. We also talk about how we use this data for model training and inference.

With this solution, our customer team onboarded 38 countries and brought 60 Amazon ML models to production to classify 3.3 million items on a monthly basis.

In the next post, we show you how we use AWS Developer Tools to build a comprehensive continuous integration/continuous delivery (CI/CD) pipeline that safeguards the code behind this solution.


Improving Retail Forecast Accuracy with Machine Learning

Post Syndicated from Soonam Jose original https://aws.amazon.com/blogs/architecture/improving-retail-forecast-accuracy-with-machine-learning/

The global retail market continues to grow larger and the influx of consumer data increases daily. The rise in volume, variety, and velocity of data poses challenges with demand forecasting and inventory planning. Outdated systems generate inaccurate demand forecasts. This results in multiple challenges for retailers. They are faced with over-stocking and lost sales, and often have to rely on increased levels of safety stock to avoid losing sales.

A recent McKinsey study indicates that AI-based forecasting improves forecasting accuracy by 10–20 percent. This translates to revenue increases of 2–3 percent. An accurate forecasting system can also help determine ideal inventory levels and better predict the impact of sales promotions. It provides a single view of demand across all channels and a better customer experience overall.

In this blog post, we will show you how to build a reliable retail forecasting system. We will use Amazon Forecast, and an AWS-vetted solution called Improving Forecast Accuracy with Machine Learning. This is an AWS Solutions Implementation that automatically produces forecasts and generates visualization dashboards. This solution can be extended to use cases across a variety of industries.

Improving Forecast Accuracy solution architecture

This post will illustrate a retail environment that has an SAP S/4 HANA system for overall enterprise resource planning (ERP). We will show a forecasting solution based on Amazon Forecast to predict demand across product categories. The environment also has a unified platform for customer experience provided by SAP Customer Activity Repository (CAR). Replenishment processes are driven by SAP Forecasting and Replenishment (F&R), and SAP Fiori apps are used to manage forecasts.

The solution is divided into four parts: Data extraction and preparation, Forecasting and monitoring, Data visualization, and Forecast import and utilization in SAP.

Figure 1. Notional architecture for improving forecasting accuracy solution and SAP integration

Figure 1. Notional architecture for improving forecasting accuracy solution and SAP integration

­­Data extraction and preparation

Historical demand data such as sales, web traffic, inventory numbers, and resource demand are extracted from SAP and uploaded to Amazon Simple Storage Service (S3). There are multiple ways to extract data from an SAP system into AWS. As part of this architecture, we will use operational data provisioning (ODP) extraction. ODP acts as a data source for OData services, enabling REST-based integrations with external applications. The ODP-Based Data Extraction via OData document details this approach. The steps involved are:

  1. Create a data source using transaction RSO2, allow Change Data Capture for specific data to be extracted
  2. Create an OData service using transaction SEGW
  3. Create a Data model for ODP extraction, which refers to the defined data source, then register the service
  4. Initiate the service from SAP gateway client
  5. In the AWS Management Console, create an AWS Lambda function to extract data and upload to S3. Check out the sample extractor code using Python, referenced in the blog Building data lakes with SAP on AWS

Related data that can potentially affect demand levels can be uploaded to Amazon S3. These could include seasonal events, promotions, and item price. Additional item metadata, such as product descriptions, color, brand, size may also be uploaded. Amazon Forecast provides built-in related time series data for holidays and weather. These three components together form the forecast inputs.

Forecasting and monitoring

An S3 event notification will be initiated when new datasets are uploaded to the input bucket. This in turn, starts an AWS Step Functions state machine. The state machine combines a series of AWS Lambda functions that build, train, and deploy machine learning models in Amazon Forecast. All AWS Step Functions logs are sent to Amazon CloudWatch. Administrators will be notified with the results of the AWS Step Functions through Amazon Simple Notification Service (SNS).

An AWS Glue job combines raw forecast input data, metadata, predictor backtest exports, and forecast exports. These all go into an aggregated view of forecasts in an S3 bucket. It is then translated to the format expected by the External Forecast import interface. Amazon Athena can be used to query forecast output in S3 using standard SQL queries.

Data visualization

Amazon QuickSight analyses can be created on a per-forecast basis. This provides users with forecast output visualization across hierarchies and categories of forecasted items. It also displays item-level accuracy metrics. Dashboards can be created from these analyses and shared within the organization. Additionally, data scientists and developers can prepare and process data, and evaluate Forecast outputs using an Amazon SageMaker Notebook Instance.

Forecast import and utilization in SAP

Amazon Forecast outputs located in Amazon S3 will be imported into the Unified Demand Forecast (UDF) module within the SAP Customer Activity Repository (CAR). You can read here about how to import external forecasts. An AWS Lambda function will be initiated when aggregated forecasts are uploaded to the S3 bucket. The Lambda function performs a remote function call (RFC) to the SAP system through the official SAP JCo Library. The SAP RFC credentials and connection information may be stored securely inside AWS Secrets Manager and read on demand to establish connectivity.

Once imported, forecast values from the solution can be retrieved by SAP Forecasting and Replenishment (F&R). They will be consumed as an input to replenishment processes, which consist of requirements calculation and­­­­­ requirement quantity optimization. SAP F&R calculates requirements based on the forecast, the current stock, and the open purchase orders. The requirement quantity then may be improved in accordance with optimization settings defined in SAP F&R.


Additionally, you have the flexibly to adjust the system forecast as required by the demand situation or analyze forecasts via respective SAP Fiori Apps.

Sample use case: AnyCompany Stores, Inc.

To illustrate how beneficial this solution can be for retail organizations, let’s consider AnyCompany Stores, Inc. This is a hypothetical customer and leader in the retailer industry with 985 stores across the United States. They struggle with issues stemming from their existing forecasting implementation. That implementation only understands certain categories and does not factor in the entire product portfolio. Additionally, it is limited to available demand history and does not consider related information that may affect forecasts. AnyCompany Stores is looking to improve their demand forecasting system.

Using Improving Forecast Accuracy with Machine Learning, AnyCompany Stores can easily generate AI-based forecasts at appropriate quantiles to address sensitivities associated with respective product categories. This mitigates inconsistent inventory buys, overstocks, out-of-stocks, and margin erosion. The solution also considers all relevant related data in addition to the historical demand data. This ensures that generated forecasts are accurate for each product category.

The generated forecasts may be used to complement existing forecasting and replenishment processes. With an improved forecasting solution, AnyCompany Stores will be able to meet demand, while holding less inventory and improving customer experience. This also helps ensure that potential demand spikes are accurately captured, so staples will always be in stock. Additionally, the company will not overstock expensive items with short shelf lives that are likely to spoil.


In this post, we explored how to implement an accurate retail forecasting solution using a ready-to-deploy AWS Solution. We use generated forecasts to drive inventory replenishment optimization and improve customer experience. The solution can be extended to inventory, workforce, capacity, and financial planning.

We showcase one of the ways in which Improving Forecast Accuracy with Machine Learning may be extended for a use case in the retail industry. If your organization would like to improve business outcomes with the power of forecasting, explore customizing this solution to fit your unique needs.

Further reading:

Integrating Redaction of FinServ Data into a Machine Learning Pipeline

Post Syndicated from Ravikant Gupta original https://aws.amazon.com/blogs/architecture/integrating-redaction-of-finserv-data-into-a-machine-learning-pipeline/

Financial companies process hundreds of thousands of documents every day. These include loan and mortgage statements that contain large amounts of confidential customer information.

Data privacy requires that sensitive data be redacted to protect the customer and the institution. Redacting digital and physical documents is time-consuming and labor-intensive. The accidental or inadvertent release of personal information can be devastating for the customer and the institution. Having automated processes in place reduces the likelihood of a data breach.

In this post, we discuss how to automatically redact personally identifiable information (PII) data fields from your financial services (FinServ) data through machine learning (ML) capabilities of Amazon Comprehend and Amazon Athena. This will ensure you comply with federal regulations and meet customer expectations.

Protecting data and complying with regulations

Protecting PII is crucial to complying with regulations like the California Consumer Privacy Act (CCPA), Europe’s General Data Protection Regulation (GDPR), and Payment Card Industry’s data security standards (PCI DSS).

In Figure 1, we show how structured and non-structured sensitive data stored in AWS data stores can be redacted before it is made available to data engineers and data scientists for feature engineering and building ML models in compliance with organizations data security policies.

How to redact confidential information in your ML pipeline

Figure 1. How to redact confidential information in your ML pipeline

Architecture walkthrough

This section explains each step presented in Figure 1 and the AWS services used:

  1. By using services like AWS DataSync, AWS Storage Gateway, and AWS Transfer Family, data can be ingested into AWS using batch or streaming pattern. This data lands in an Amazon Simple Storage Service (Amazon S3) bucket, we call this “raw data” in Figure 1.
  2. To detect if the raw data bucket has any sensitive data, use Amazon Macie. Macie is a fully managed data security and data privacy service that uses ML and pattern matching to discover and protect your sensitive data in AWS. When Macie discovers sensitive data, you can configure it to tag the objects with an Amazon S3 object tag to identify that sensitive data was found in the object before progressing to the next stage of the pipeline. Refer to the Use Macie to discover sensitive data as part of automated data pipelines blog post for detailed instruction on building such pipeline.
  3.  This tagged data lands in a “scanned data” bucket, where we use Amazon Comprehend, a natural language processing (NLP) service that uses ML to uncover information in unstructured data. Amazon Comprehend works for unstructured text document data and redacts sensitive fields like credit card numbers, date of birth, social security number, passport number, and more. Refer to the Detecting and redacting PII using Amazon Comprehend blog post for step-by-step instruction on building such a capability.
  4. If your pipeline requires redaction for specific use cases only, you can use the information in Introducing Amazon S3 Object Lambda – Use Your Code to Process Data as It Is Being Retrieved from S3 to redact sensitive data. Using this operation, an AWS Lambda function will intercept each GET request. It will redact data as necessary before it goes back to the requestor. This allows you to keep one copy of all the data and redact the data as it is requested for a specific workload. For further details, refer to the Amazon S3 Object Lambda Access Point to redact personally identifiable information (PII) from documents developer guide.
  5. When you want to join multiple datasets from different data sources, use an Athena federated query. Using user-defined functions (UDFs) with Athena federated query will help you redact data in Amazon S3 or from other data sources such as an online transaction store like Amazon Relational Database Service (Amazon RDS), a data warehouse solution like Amazon Redshift, or a NoSQL store like Amazon DocumentDB. Athena supports UDFs, which enable you to write custom functions and invoke them in SQL queries. UDFs allow you to perform custom processing such as redacting sensitive data, compressing, and decompressing data or applying customized decryption. To read further on how you can get this set up refer to the Redacting sensitive information with user-defined functions in Amazon Athena blog post.
  6. Redacted data lands in another S3 bucket that is now ready for any ML pipeline consumption.
  7. Using AWS Glue DataBrew, the data preparation without writing any code. You can choose reusable recipes from over 250 pre-built transformations to automate data preparation tasks by jobs that can be scheduled based on your requirements.
  8. Data is then used by Amazon SageMaker Data Wrangler to do feature engineering on curated data in data preparation (step 6). SageMaker Data Wrangler offers over 300 pre-configured data transformations, such as convert column type, one hot encoding, impute missing data with mean or median, rescale columns, and data/time embedding, so you can transform your data into formats that can be effectively used for models without writing a single line of code.
  9. The output of the SageMaker Data Wrangler job is stored in Amazon SageMaker Feature Store, a purpose-built repository where you can store and access features to name, organize, and reuse them across teams. SageMaker Feature Store provides a unified store for features during training and real-time inference without the need to write additional code or create manual processes to keep features consistent.
  10. Use ML features in SageMaker notebooks or SageMaker Studio for ML training on your redacted data. SageMaker notebook instance is an ML compute instance running the Jupyter Notebook App. Amazon SageMaker Studio is a web-based, integrated development environment for ML that lets you build, train, debug, deploy, and monitor your ML models. SageMaker Studio is integrated with SageMaker Data Wrangler.


Federal regulations require that financial institutions protect customer data. To achieve this, redact sensitive fields in your data.

In this post, we showed you how to use AWS services to meet these requirements with Amazon Comprehend and Amazon Athena. These services allow data engineers and data scientist in your organization to safely consume this data for machine learning pipelines.

Field Notes: Automating Data Ingestion and Labeling for Autonomous Vehicle Development

Post Syndicated from Amr Ragab original https://aws.amazon.com/blogs/architecture/field-notes-automating-data-ingestion-and-labeling-for-autonomous-vehicle-development/

This post was co-written by Amr Ragab, AWS Sr. Solutions Architect, EC2 Engineering and Anant Nawalgaria, former AWS Professional Services EMEA.

One of the most common needs we have heard from customers in Autonomous Vehicle (AV) development, is to launch a hybrid deployment environment at scale. As vehicle fleets are deployed across the globe, they are capturing real-time telemetry and sensory data. A data lake is created to process the data, and then iterating that dataset improves machine learning models for L3+ development. These datasets can include 4K60Hz camera video captures, LIDAR, RADAR and car telemetry data. The first step is to create the data gravity on which the compute and labeling portions will operate.

Architecture Overview

In this blog, we explain how to build the components in the following architecture, which takes an input dataset of 4K camera data, and performs event-driven video processing. This also includes anonymizing the dataset with face and license plate blurring using AWS Batch.

Figure 1 - Architecture for Automating Data Ingestion and Labeling for Autonomous Vehicle Development

Figure 1 – Architecture for Automating Data Ingestion and Labeling for Autonomous Vehicle Development

The output is a cleaned dataset which is processed in an Amazon SageMaker Ground Truth workflow. You can visualize the results in a SageMaker Jupyter notebook.

You can transfer data from on-premises to AWS with both online transfer using  AWS Storage Gateway, AWS Transfer Family, or AWS DataSync via AWS DirectConnect, as well as offline using AWS Snowball. Whether you use an online or offline approach, a fully automated processing workflow can still be initiated.

Description of the Dataset

The dataset we acquired was a driving sample in the N. Virginia/Washington DC metro area, as shown in the following map.

The dataset we acquired was a driving sample in the N. Virginia/Washington DC metro area, as shown in the following image

Figure 2 – Map of the area in which the driving sample was taken.

This path was chosen because of several different driving patterns including city, suburban, highway and unique driving characteristics.

We captured a 4K-60Hz video as well telemetry data from the CAN bus and finally GPS coordinates. The telemetry data from the CAN bus and GPS coordinates were streamed in real time over 4G/5G mobile network through the Amazon Kinesis service. Reach out to your AWS account team if you are interested in exploring connected car applications.

Video Processing Workflow and Manifest Creation

We walk you through the process for the video processing workflow and manifest creation.

  • The first step in the workflow is to take the 4K video file and split it into frames. Some additional processing is done to perform color and lens correction but that is specific to the camera used in this blog.
  • The next step is to use the ML-based anonymizer application which processes incoming video frames and applies face and license plate blurring on the dataset.
  • It uses the excellent work from Understand.ai and is available on Github via the Apache 2.0 License.
  • We then take the processed data and create a manifest.json file which is uploaded to a S3 bucket.
  • The S3 bucket then becomes the source for the SageMaker Ground Truth workflow.
  • Ancillary steps also include applying a lifecycle policy to Amazon S3 to transfer the raw video file to Amazon S3 Glacier. The video is then reconstructed from the processed frames. The docker image contains the following enablement stack:

Ubuntu 18.04 – nvcr.io/nvidia/cuda
AWS command line utility
Understand.ai anonymizer GitHub

SageMaker Ground Truth Labeling and Analysis

To preparing image metadata, we use Amazon Rekognition. Any extra information, such as other objects were added to the image using Amazon Rekognition, and following is the Lambda code for it.

Preparing image metadata using Amazon Rekognition

Figure 3 -Lambda code to add additional objects to the image using Amazon Rekognition.

Before starting the analysis, let’s create some helper functions using the following code:

Code sample to show Helper functions

Code sample to show additional help functions

Compute Summary Statistics

First, let’s compute some summary statistics for a set of labeling jobs. For example, we might want to make a high-level comparison between two labeling jobs performed with different settings. Here, we’ll calculate the number of annotations and the mean confidence score for two jobs. We know that the first job was run with three workers per task, and the second was run with five workers per task.

Code to calculate the number of annotations and the mean confidence score for two jobs.

We can determine that the mean confidence of the whole job was 40%. Now, let us do the querying on the dataN.

Example 1:

Objective: All images with at least 5 cars annotations with confidence score of at least 80%.

Query: select * from s3object s where s.”demo-full-dataset-2″ is not null and ‘Car’ in s.”demo-full-dataset-2-metadata”.”class-map”.* and size(s.”demo-full-dataset-2″.”annotations”) >= 5 and min(s.”demo-full-dataset-2-metadata”.objects[*].”confidence”) >= 0.8 limit 20


Query output



Photos showing video output

Example 2:

Objective: Identify all images with at least one Person in them.

Query: select * from s3object s where ‘Person’ in s.”Objects”[*] limit 10.


Objective: Identify all images with at least one Person in them


Identify all images with at least one Person in them

Results - Identify all images with at least one Person in them

Example 3:

Objective: Identify all images with at least some text in them (for example, on signboards)

Query: select * from s3object s where CHAR_LENGTH(s.Text)>0 and s.Text  limit 10


Objective: Identify all images with at least some text in them


Results - images with at least some text in them

images with at least some text in them


In this blog, we showed an architecture to automate data ingestion and Ground Truth labeling for autonomous vehicle development. We initiated a workflow to process a data lake to anonymize the individual video frames and then prepare the dataset for Ground Truth labeling. The ground truth labeling UI was offered to a globally distributed workforce which labeled our images at scale. If you are developing an autonomous vehicle/robotics platform, contact your AWS account team for more information.

Recommended Reading:

Field Notes: Building an Autonomous Driving and ADAS Data Lake on AWS

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

Deploying machine learning models with serverless templates

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/deploying-machine-learning-models-with-serverless-templates/

This post written by Sean Wilkinson, Machine Learning Specialist Solutions Architect, and Newton Jain, Senior Product Manager for Lambda

After designing and training machine learning models, data scientists deploy the models so applications can use them. AWS Lambda is a compute service that lets you run code without provisioning or managing servers. Lambda’s pay-per-request billing, automatic scaling, and ease of use make it a popular deployment choice for data science teams.

With minimal code, data scientists can turn a model into a cost effective and scalable API endpoint backed by Lambda. Lambda supports container images, Advanced Vector Extensions 2 (AVX2), and functions with up to 10 GB of memory. Using these capabilities, data science teams can deploy larger, more powerful models with improved performance.

To deploy Lambda-based applications, serverless developers can use the AWS Serverless Application Model framework (AWS SAM). AWS SAM creates and manages serverless applications based on templates. It supports local testing, aids best practices, and integrates with popular developer tools. It allows data scientists to define serverless applications, security permissions, and advanced configuration capabilities using YAML.

AWS SAM contains pre-built templates that allow developers to get started quickly. This blog shows how to use machine learning templates to deploy a Scikit-Learn based model that classifies images of handwritten digits from zero to nine. Once deployed to Lambda, you can access the model via a REST API.

This walkthrough creates resources that incur costs in an AWS account. To minimize cost, follow the Cleaning up section to remove resources after completing the walkthrough.


The AWS SAM machine learning templates are available for the Scikit-Learn, PyTorch, TensorFlow, and XGBoost frameworks. Each template deploys a Lambda function to host the model behind an Amazon API Gateway, which serves as the front end and handles authentication. The following diagram shows the architecture of the solution:

Serverless architecture for ML inference

Serverless architecture for ML inference

Creating the containerized Lambda function

This section uses AWS SAM to build, test, and deploy a Docker image containing a pre-trained digit classifier model on Lambda:

  1. Update or install AWS SAM. AWS SAM CLI v1.24.1 or later is required to use the machine learning templates.
  2. In a terminal, create a new serverless application in AWS SAM using the command:
    sam init
  3. Follow the on-screen prompts, select AWS Quick Start Templates as the template source.

    SAM: choose a template source

    SAM: choose a template source

  4. Choose Image as the package type.

    SAM: Choose a package type

    SAM: Choose a package type

  5. Select amazon/python3.8-base as the base image.

    SAM: Choose an runtime image

    SAM: Choose an runtime image

  6. When prompted, enter an application name. AWS SAM uses this to group and label resources it creates.

    SAM: Choose an runtime image

    SAM: Choose an runtime image

  7. Select the desired ML framework from the template list. The walkthrough uses the Scikit-Learn template.

    SAM: choose the application template

    SAM: choose the application template

  8. AWS SAM creates a directory with the name of your application. Change to the new directory and run the AWS SAM build command:
    sam build

    SAM: build results

    SAM: build results

Files generated by AWS SAM

After selecting the template, AWS SAM generates the following files in the application directory:

  • Dockerfile: The application uses the Lambda-provided Python 3.8 base image. It installs the relevant dependencies and defines the CMD variable for the Lambda execution environment to initialize the handler.
    FROM public.ecr.aws/lambda/python:3.8
    COPY app.py requirements.txt ./
    COPY digit_classifier.joblib /opt/ml/model/1
    RUN python3.8 -m pip install -r requirements.txt -t .
    CMD ["app.lambda_handler"]
  • app.py: This Python code runs after the Lambda handler is invoked and generates predictions from the Scikit-Learn model. The model is reused across multiple Lambda invocations by loading it outside the lambda_handler.
    import joblib
    import base64
    import numpy as np
    import json
    from io import BytesIO
    from PIL import Image
    from scipy.ndimage import interpolation
    model_file = '/opt/ml/model'
    model = joblib.load(model_file)
    # Functions to pre-process images (we used same preprocessing when training)
    def moments(image):
        c0, c1 = np.mgrid[:image.shape[0], :image.shape[1]]
        img_sum = np.sum(image)
        m0 = np.sum(c0 * image) / img_sum
        m1 = np.sum(c1 * image) / img_sum
        m00 = np.sum((c0-m0)**2 * image) / img_sum
        m11 = np.sum((c1-m1)**2 * image) / img_sum
        m01 = np.sum((c0-m0) * (c1-m1) * image) / img_sum
        mu_vector = np.array([m0,m1])
        covariance_matrix = np.array([[m00, m01],[m01, m11]])
        return mu_vector, covariance_matrix
    def deskew(image):
        c, v = moments(image)
        alpha = v[0,1] / v[0,0]
        affine = np.array([[1,0], [alpha,1]])
        ocenter = np.array(image.shape) / 2.0
        offset = c - np.dot(affine, ocenter)
        return interpolation.affine_transform(image, affine, offset=offset)
    def get_np_image(image_bytes):
        image = Image.open(BytesIO(base64.b64decode(image_bytes))).convert(mode='L')
        image = image.resize((28, 28))
        return np.array(image)
    # Lambda handler code
    def lambda_handler(event, context):
        image_bytes = event['body'].encode('utf-8')
        x = deskew(get_np_image(image_bytes))
        prediction = int(model.predict(x.reshape(1, -1))[0])
        return {
            'statusCode': 200,
            'body': json.dumps(
                    "predicted_label": prediction,

After completing these steps, this is the directory structure:

File structure

File structure

Testing the AWS SAM templates

For container image-based Lambda functions, sam build creates and updates a container image in the local Docker repo. It copies the template to the output directory and updates the location for the newly built image.

You can see the following top-level tree under the .aws-sam directory:

SAM build artifacts directory structure

SAM build artifacts directory structure

After building the Docker image, use AWS SAM’s local test functionality to test the endpoint. There are two ways to test the application locally:

  1. Local invoke –event uses the mock data in event.json to invoke the function and generate a prediction. An image of a handwritten digit is encoded as a base64 string in the body attribute in the event.json file. Test using mock event.json:
    sam local invoke InferenceFunction --event events/event.json

    SAM local invoke results

    SAM local invoke results

  2. The start-api command starts up a local endpoint that emulates a REST API endpoint. It downloads an execution container that runs API Gateway and the Lambda function locally. Invoke using the API Gateway emulator:
    sam local start-apiSAM local start-api monitor

SAM local start-api monitorTo test the local endpoint use a REST client, like Postman, to send a POST request to the /classify_digit endpoint.

Testing with Postman

Testing with Postman

While testing locally, use images smaller than 100 KB. If the file is larger, the request fails with status code: 502 and the error “argument list too long”. After deploying to Lambda, you can use larger images.

Deploying the application to Lambda

After testing the model locally, use the AWS SAM guided deployment process to package and deploy the application:

  1. To deploy a Lambda function based on a container image, the container image must be pushed to Amazon Elastic Container Registry (ECR). Run the following command to retrieve an authentication token and authenticate the Docker client with the ECR registry. Replace the region and accountID placeholders with your Region and AWS account ID:
    aws --region <region> ecr get-login-password | docker login --username AWS --password-stdin <accountID>.dkr.ecr.<region>.amazonaws.com

    Login Succeeded

    Login Succeeded

  2. Use the AWS CLI to create an ECR repository called classifier-demo:
    aws ecr create-repository \
    --repository-name classifier-demo \
    --image-tag-mutability MUTABLE \
    --image-scanning-configuration scanOnPush=true

    Create ECR repo results

    Create ECR repo results

  3. Copy the repositoryUri from the output. This is needed in the next step. Initiate the AWS SAM guided deployment using the deploy command:
    sam deploy --guided
  4. Follow the on-screen prompts. To accept the default options provided in the interactive experience, press Enter. When prompted for an ECR repository, use the Amazon ECR repository created in the previous step.
    CloudFormation change set verification screen

    CloudFormation change set verification screen

    CloudFormation outputs

    CloudFormation outputs

  5. AWS SAM packages and deploys the application as a versioned entity. After deployment, the production API endpoint is ready to use. The template produces multiple outputs. Find the unique URL of the endpoint in the “HelloWorldAPI” key in the “Outputs” section.

After retrieving the URL, test the live endpoint using a REST client:

Testing with Postman

Testing with Postman

Optimizing performance

After the Lambda function is deployed, you can optimize for latency and cost. To do this, adjust the memory allocation setting for the function, which also linearly changes the allocated vCPU (to learn more, read the AWS News Blog).

The digit classifier model is optimized with 5 GB memory (~3 vCPUs). Any gains beyond 5 GB are relatively minor. Each model responds differently to changes in vCPU and memory, so it is best practice to determine this experimentally. There are open-source tools available to automate performance tuning.

Further optimizations can be made by compiling the source code to take advantage of AVX2 instructions. AVX2 allows Lambda to run more operations per clock cycle, reducing the time it takes a model to generate predictions.

Cleaning up

This walkthrough creates a Lambda function, API Gateway endpoint, and an ECR repository. These resources incur charges so it is recommended to clean up resources to avoid incurring cost. To delete the ECR repository, run:

aws ecr delete-repository --registry-id <account-id> --repository-name classifier-demo --force

To delete the remaining resources, navigate to AWS CloudFormation in the AWS Management Console and select the Region used for the walkthrough. Select the stack created by AWS SAM (the default is “sam-app”) and choose Delete.


Lambda is a cost-effective, scalable, and reliable way for data scientists to deploy CPU-based machine learning models for inference. With support for larger functions sizes, AVX2 instruction sets, and container image support, Lambda can now deploy more complex models while maintaining low latency.

Use the new machine learning templates within AWS SAM today to deploy your first serverless machine learning application in minutes. We look forward to seeing the exciting machine learning applications that you build on Lambda.

For more serverless learning resources, visit Serverless Land.

Gaining operational insights with AIOps using Amazon DevOps Guru

Post Syndicated from Nikunj Vaidya original https://aws.amazon.com/blogs/devops/gaining-operational-insights-with-aiops-using-amazon-devops-guru/

Amazon DevOps Guru offers a fully managed AIOps platform service that enables developers and operators to improve application availability and resolve operational issues faster. It minimizes manual effort by leveraging machine learning (ML) powered recommendations. Its ML models take advantage of AWS’s expertise in operating highly available applications for the world’s largest e-commerce business for over 20 years. DevOps Guru automatically detects operational issues, predicts impending resource exhaustion, details likely causes, and recommends remediation actions.

This post walks you through how to enable DevOps Guru for your account in a typical serverless environment and observe the insights and recommendations generated for various activities. These insights are generated for operational events that could pose a risk to your application availability. DevOps Guru uses AWS CloudFormation stacks as the application boundary to detect resource relationships and co-relate with deployment events.

Solution overview

The goal of this post is to demonstrate the insights generated for anomalies detected by DevOps Guru from DevOps operations. If you don’t have a test environment and want to build out infrastructure to test the generation of insights, then you can follow along through this post. However, if you have a test or production environment (preferably spawned from CloudFormation stacks), you can skip the first section and jump directly to section, Enabling DevOps Guru and injecting traffic.

The solution includes the following steps:

1. Deploy a serverless infrastructure.

2. Enable DevOps Guru and inject traffic.

3. Review the generated DevOps Guru insights.

4. Inject another failure to generate a new insight.

As depicted in the following diagram, we use a CloudFormation stack to create a serverless infrastructure, comprising of Amazon API Gateway, AWS Lambda, and Amazon DynamoDB, and inject HTTP requests at a high rate towards the API published to list records.

Serverless infrastructure monitored by DevOps Guru

When DevOps Guru is enabled to monitor your resources within the account, it uses a combination of vended Amazon CloudWatch metrics and specific patterns from its ML models to detect anomalies. When an anomaly is detected, it generates an insight with the recommendations.

The generation of insights is dependent upon several factors. Although this post provides a canned environment to reproduce insights, the results may vary depending upon traffic pattern, timings of traffic injection, and so on.


To complete this tutorial, you should have access to an AWS Cloud9 environment and the AWS Command Line Interface (AWS CLI).

Deploying a serverless infrastructure

To deploy your serverless infrastructure, you complete the following high-level steps:

1.   Create your IDE environment.

2.   Launch the CloudFormation template to deploy the serverless infrastructure.

3.   Populate the DynamoDB table.


1. Creating your IDE environment

We recommend using AWS Cloud9 to create an environment to get access to the AWS CLI from a bash terminal. AWS Cloud9 is a browser-based IDE that provides a development environment in the cloud. While creating the new environment, ensure you choose Linux2 as the operating system. Alternatively, you can use your bash terminal in your favorite IDE and configure your AWS credentials in your terminal.

When access is available, run the following command to confirm that you can see the Amazon Simple Storage Service (Amazon S3) buckets in your account:

aws s3 ls

Install the following prerequisite packages and ensure you have Python3 installed:

sudo yum install jq -y

export AWS_REGION=$(curl -s \ | jq -r '.region')

sudo pip3 install requests

Clone the git repository to download the required CloudFormation templates:

git clone https://github.com/aws-samples/amazon-devopsguru-samples
cd amazon-devopsguru-samples/generate-devopsguru-insights/

2. Launching the CloudFormation template to deploy your serverless infrastructure

To deploy your infrastructure, complete the following steps:

  • Run the CloudFormation template using the following command:
aws cloudformation create-stack --stack-name myServerless-Stack \
--template-body file:///$PWD/cfn-shops-monitoroper-code.yaml \

The AWS CloudFormation deployment creates an API Gateway, a DynamoDB table, and a Lambda function with sample code.

  • When it’s complete, go to the Outputs tab of the stack on the AWS CloudFormation console.
  • Record the links for the two APIs: one of them to list the table contents and other one to populate the contents.

3. Populating the DynamoDB table

Run the following commands (simply copy-paste) to populate the DynamoDB table. The below three commands will identify the name of the DynamoDB table from the CloudFormation stack and populate the name in the populate-shops-dynamodb-table.json file.

dynamoDBTableName=$(aws cloudformation list-stack-resources \
--stack-name myServerless-Stack | \
jq '.StackResourceSummaries[]|select(.ResourceType == "AWS::DynamoDB::Table").PhysicalResourceId' | tr -d '"')
sudo sed -i s/"<YOUR-DYNAMODB-TABLE-NAME>"/$dynamoDBTableName/g \
aws dynamodb batch-write-item \
--request-items file://populate-shops-dynamodb-table.json

The command gives the following output:

"UnprocessedItems": {}

This populates the DynamoDB table with a few sample records, which you can verify by accessing the ListRestApiEndpointMonitorOper API URL published on the Outputs tab of the CloudFormation stack. The following screenshot shows the output.

Screenshot showing the API output

Enabling DevOps Guru and injecting traffic

In this section, you complete the following high-level steps:

1.   Enable DevOps Guru for the CloudFormation stack.

2.   Wait for the serverless stack to complete.

3.   Update the stack.

4.   Inject HTTP requests into your API.


1. Enabling DevOps Guru for the CloudFormation stack

To enable DevOps Guru for CloudFormation, complete the following steps:

  • Run the CloudFormation template to enable DevOps Guru for this CloudFormation stack:
aws cloudformation create-stack \
--stack-name EnableDevOpsGuruForServerlessCfnStack \
--template-body file:///$PWD/EnableDevOpsGuruForServerlessCfnStack.yaml \
--parameters ParameterKey=CfnStackNames,ParameterValue=myServerless-Stack \
--region ${AWS_REGION}
  • When the stack is created, navigate to the Amazon DevOps Guru console.
  • Choose Settings.
  • Under CloudFormation stacks, locate myServerless-Stack.

If you don’t see it, your CloudFormation stack has not been successfully deployed. You may remove and redeploy the EnableDevOpsGuruForServerlessCfnStack stack.

Optionally, you can configure Amazon Simple Notification Service (Amazon SNS) topics or enable AWS Systems Manager integration to create OpsItem entries for every insight created by DevOps Guru. If you need to deploy as a stack set across multiple accounts and Regions, see Easily configure Amazon DevOps Guru across multiple accounts and Regions using AWS CloudFormation StackSets.

2. Waiting for baselining of resources

This is a necessary step to allow DevOps Guru to complete baselining the resources and benchmark the normal behavior. For our serverless stack with 3 resources, we recommend waiting for 2 hours before carrying out next steps. When enabled in a production environment, depending upon the number of resources selected to monitor, it can take up to 24 hours for it to complete baselining.

Note: Unlike many monitoring tools, DevOps Guru does not expect the dashboard to be continuously monitored and thus under normal system health, the dashboard would simply show zero’ed counters for the ongoing insights. It is only when an anomaly is detected, it will raise an alert and display an insight on the dashboard.

3. Updating the CloudFormation stack

When enough time has elapsed, we will make a configuration change to simulate a typical operational event. As shown below, update the CloudFormation template to change the read capacity for the DynamoDB table from 5 to 1.

CloudFormation showing read capacity settings to modify

Run the following command to deploy the updated CloudFormation template:

aws cloudformation update-stack --stack-name myServerless-Stack \
--template-body file:///$PWD/cfn-shops-monitoroper-code.yaml \

4. Injecting HTTP requests into your API

We now inject ingress traffic in the form of HTTP requests towards the ListRestApiEndpointMonitorOper API, either manually or using the Python script provided in the current directory (sendAPIRequest.py). Due to reduced read capacity, the traffic will oversubscribe the dynamodb tables, thus inducing an anomaly. However, before triggering the script, populate the url parameter with the correct API link for the  ListRestApiEndpointMonitorOper API, listed in the CloudFormation stack’s output tab.

After the script is saved, trigger the script using the following command:

python sendAPIRequest.py

Make sure you’re getting an output of status 200 with the records that we fed into the DynamoDB table (see the following screenshot). You may have to launch multiple tabs (preferably 4) of the terminal to run the script to inject a high rate of traffic.

Terminal showing script executing and injecting traffic to API

After approximately 10 minutes of the script running in a loop, an operational insight is generated in DevOps Guru.


Reviewing DevOps Guru insights

While these operations are being run, DevOps Guru monitors for anomalies, logs insights that provide details about the metrics indicating an anomaly, and prints actionable recommendations to mitigate the anomaly. In this section, we review some of these insights.

Under normal conditions, DevOps Guru dashboard will show the ongoing insights counter to be zero. It monitors a high number of metrics behind the scenes and offloads the operator from manually monitoring any counters or graphs. It raises an alert in the form of an insight, only when anomaly occurs.

The following screenshot shows an ongoing reactive insight for the specific CloudFormation stack. When you choose the insight, you see further details. The number under the Total resources analyzed last hour may vary, so for this post, you can ignore this number.

DevOps Guru dashboard showing an ongoing reactive insight

The insight is divided into four sections: Insight overview, Aggregated metrics, Relevant events, and Recommendations. Let’s take a closer look into these sections.

The following screenshot shows the Aggregated metrics section, where it shows metrics for all the resources that it detected anomalies in (DynamoDB, Lambda, and API Gateway). Note that depending upon your traffic pattern, lambda settings, baseline traffic, the list of metrics may vary. In the example below, the timeline shows that an anomaly for DynamoDB started first and was followed by API Gateway and Lambda. This helps us understand the cause and symptoms, and prioritize the right anomaly investigation.

The listing of metrics inside an Insight

Initially, you may see only two metrics listed, however, over time, it populates more metrics that showed anomalies. You can see the anomaly for DynamoDB started earlier than the anomalies for API Gateway and Lambda, thus indicating them as after effects. In addition to the information in the preceding screenshot, you may see Duration p90 and IntegrationLatency p90 (for Lambda and API Gateway respectively, due to increased backend latency) metrics also listed.

Now we check the Relevant events section, which lists potential triggers for the issue. The events listed here depend on the set of operations carried out on this CloudFormation stack’s resources in this Region. This makes it easy for the operator to be reminded of a change that may have caused this issue. The dots (representing events) that are near the Insight start portion of timeline are of particular interest.

Related Events shown inside the Insight

If you need to delve into any of these events, just click of any of these points, and it provides more details as shown below screenshot.

Delving into details of the related event listed in Insight

You can choose the link for an event to view specific details about the operational event (configuration change via CloudFormation update stack operation).

Now we move to the Recommendations section, which provides prescribed guidance for mitigating this anomaly. As seen in the following screenshot, it recommends to roll back the configuration change related to the read capacity for the DynamoDB table. It also lists specific metrics and the event as part of the recommendation for reference.

Recommendations provided inside the Insight

Going back to the metric section of the insight, when we select Graphed anomalies, it shows you the graphs of all related resources. Below screenshot shows a snippet showing anomaly for DynamoDB ReadThrottleEvents metrics. As seen in the below screenshot of the graph pattern, the read operations on the table are exceeding the provisioned throughput of read capacity. This clearly indicates an anomaly.

Graphed anomalies in DevOps Guru

Let’s navigate to the DynamoDB table and check our table configuration. Checking the table properties, we notice that our read capacity is reduced to 1. This is our root cause that led to this anomaly.

Checking the DynamoDB table capacity settings

If we change it to 5, we fix this anomaly. Alternatively, if the traffic is stopped, the anomaly moves to a Resolved state.

The ongoing reactive insight takes a few minutes after resolution to move to a Closed state.

Note: When the insight is active, you may see more metrics get populated over time as we detect further anomalies. When carrying out the preceding tests, if you don’t see all the metrics as listed in the screenshots, you may have to wait longer.


Injecting another failure to generate a new DevOps Guru insight

Let’s create a new failure and generate an insight for that.

1.   After you close the insight from the previous section, trigger the HTTP traffic generation loop from the AWS Cloud9 terminal.

We modify the Lambda functions’s resource-based policy by removing the permissions for API Gateway to access this function.

2.   On the Lambda console, open the function ScanFunctionMonitorOper.

3.   On the Permissions tab, access the policy.

Accessing the permissions tab for the Lambda


4.   Save a copy of the policy offline as a backup before making any changes.

5.   Note down the “Sid” values for the “AWS:SourceArn” that ends with prod/*/ and prod/*/*.

Checking the Resource-based policy for the Lambda

6.   Run the following command to remove the “Sid” JSON statements in your Cloud9 terminal:

aws lambda remove-permission --function-name ScanFunctionMonitorOper \
--statement-id <Sid-value-ending-with-prod/*/>

7.   Run the same command for the second Sid value:

aws lambda remove-permission --function-name ScanFunctionMonitorOper \
--statement-id <Sid-value-ending-with-prod/*/*>

You should see several 5XX errors, as in the following screenshot.

Terminal output now showing 500 errors for the script output

After less than 8 minutes, you should see a new ongoing reactive insight on the DevOps Guru dashboard.

Let’s take a closer look at the insight. The following screenshot shows the anomalous metric 5XXError Average of API Gateway and its duration. (This insight shows as closed because I had already restored permissions.)

Insight showing 5XX errors for API-Gateway and link to OpsItem

If you have configured to enable creating OpsItem in Systems Manager, you would see the link to OpsItem ID created in the insight, as shown above. This is an optional configuration, which will enable you to track the insights in the form of open tickets (OpsItems) in Systems Manager OpsCenter.

The recommendations provide guidance based upon the related events and anomalous metrics.

After the insight has been generated, reviewed, and verified, restore the permissions by running the following command:

aws lambda add-permission --function-name ScanFunctionMonitorOper  \
--statement-id APIGatewayProdPerm --action lambda:InvokeFunction \
--principal apigateway.amazonaws.com

If needed, you can insert the condition to point to the API Gateway ARN to allow only specific API Gateways to access the Lambda function.


Cleaning up

After you walk through this post, you should clean up and un-provision the resources to avoid incurring any further charges.

1.   To un-provision the CloudFormation stacks, on the AWS CloudFormation console, choose Stacks.

2.   Select each stack (EnableDevOpsGuruForServerlessCfnStack and myServerless-Stack) and choose Delete.

3.   Check to confirm that the DynamoDB table created with the stacks is cleaned up. If not, delete the table manually.

4.   Un-provision the AWS Cloud9 environment.



This post reviewed how DevOps Guru can continuously monitor the resources in your AWS account in a typical production environment. When it detects an anomaly, it generates an insight, which includes the vended CloudWatch metrics that breached the threshold, the CloudFormation stack in which the resource existed, relevant events that could be potential triggers, and actionable recommendations to mitigate the condition.

DevOps Guru generates insights that are relevant to you based upon the pre-trained machine-learning models, removing the undifferentiated heavy lifting of manually monitoring several events, metrics, and trends.

I hope this post was useful to you and that you would consider DevOps Guru for your production needs.


Top 15 Architecture Blog Posts of 2020

Post Syndicated from Jane Scolieri original https://aws.amazon.com/blogs/architecture/top-15-architecture-blog-posts-of-2020/

The goal of the AWS Architecture Blog is to highlight best practices and provide architectural guidance. We publish thought leadership pieces that encourage readers to discover other technical documentation, such as solutions and managed solutions, other AWS blogs, videos, reference architectures, whitepapers, and guides, Training & Certification, case studies, and the AWS Architecture Monthly Magazine. We welcome your contributions!

Field Notes is a series of posts within the Architecture blog channel which provide hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

We would like to thank you, our readers, for spending time on our blog this last year. Much appreciation also goes to our hard-working AWS Solutions Architects and other blog post writers. Below are the top 15 Architecture & Field Notes blog posts written in 2020.

#15: Field Notes: Choosing a Rehost Migration Tool – CloudEndure or AWS SMS

by Ebrahim (EB) Khiyami

In this post, Ebrahim provides some considerations and patterns where it’s recommended based on your migration requirements to choose one tool over the other.

Read Ebrahim’s post.

#14: Architecting for Reliable Scalability

by Marwan Al Shawi

In this post, Marwan explains how to architect your solution or application to reliably scale, when to scale and how to avoid complexity. He discusses several principles including modularity, horizontal scaling, automation, filtering and security.

Read Marwan’s post.

#13: Field Notes: Building an Autonomous Driving and ADAS Data Lake on AWS

by Junjie Tang and Dean Phillips

In this post, Junjie and Dean explain how to build an Autonomous Driving Data Lake using this Reference Architecture. They cover all steps in the workflow from how to ingest the data, to moving it into an organized data lake construct.

Read Junjie’s and Dean’s post.

#12: Building a Self-Service, Secure, & Continually Compliant Environment on AWS

by Japjot Walia and Jonathan Shapiro-Ward

In this post, Jopjot and Jonathan provide a reference architecture for highly regulated Enterprise organizations to help them maintain their security and compliance posture. This blog post provides an overview of a solution in which AWS Professional Services engaged with a major Global systemically important bank (G-SIB) customer to help develop ML capabilities and implement a Defense in Depth (DiD) security strategy.

Read Jopjot’s and Jonathan’s post.

#11: Introduction to Messaging for Modern Cloud Architecture

by Sam Dengler

In this post, Sam focuses on best practices when introducing messaging patterns into your applications. He reviews some core messaging concepts and shows how they can be used to address challenges when designing modern cloud architectures.

Read Sam’s post.

#10: Building a Scalable Document Pre-Processing Pipeline

by Joel Knight

In this post, Joel presents an overview of an architecture built for Quantiphi Inc. This pipeline performs pre-processing of documents, and is reusable for a wide array of document processing workloads.

Read Joel’s post.

#9: Introducing the Well-Architected Framework for Machine Learning

by by Shelbee Eigenbrode, Bardia Nikpourian, Sireesha Muppala, and Christian Williams

In the Machine Learning Lens whitepaper, the authors focus on how to design, deploy, and architect your machine learning workloads in the AWS Cloud. The whitepaper describes the general design principles and the five pillars of the Framework as they relate to ML workloads.

Read the post.

#8: BBVA: Helping Global Remote Working with Amazon AppStream 2.0

by Jose Luis Prieto

In this post, Jose explains why BBVA chose Amazon AppStream 2.0 to accommodate the remote work experience. BBVA built a global solution reducing implementation time by 90% compared to on-premises projects, and is meeting its operational and security requirements.

Read Jose’s post.

#7: Field Notes: Serverless Container-based APIs with Amazon ECS and Amazon API Gateway

by Simone Pomata

In this post, Simone guides you through the details of the option based on Amazon API Gateway and AWS Cloud Map, and how to implement it. First you learn how the different components (Amazon ECS, AWS Cloud Map, API Gateway, etc.) work together, then you launch and test a sample container-based API.

Read Simone’s post.

#6: Mercado Libre: How to Block Malicious Traffic in a Dynamic Environment

by Gaston Ansaldo and Matias Ezequiel De Santi

In this post, readers will learn how to architect a solution that can ingest, store, analyze, detect and block malicious traffic in an environment that is dynamic and distributed in nature by leveraging various AWS services like Amazon CloudFront, Amazon Athena and AWS WAF.

Read Gaston’s and Matias’ post.

#5: Announcing the New Version of the Well-Architected Framework

by Rodney Lester

In this post, Rodney announces the availability of a new version of the AWS Well-Architected Framework, and focuses on such issues as removing perceived repetition, adding content areas to explicitly call out previously implied best practices, and revising best practices to provide clarity.

Read Rodney’s post.

#4: Serverless Stream-Based Processing for Real-Time Insights

by Justin Pirtle

In this post, Justin provides an overview of streaming messaging services and AWS Serverless stream processing capabilities. He shows how it helps you achieve low-latency, near real-time data processing in your applications.

Read Justin’s post.

#3: Field Notes: Working with Route Tables in AWS Transit Gateway

by Prabhakaran Thirumeni

In this post, Prabhakaran explains the packet flow if both source and destination network are associated to the same or different AWS Transit Gateway Route Table. He outlines a scenario with a substantial number of VPCs, and how to make it easier for your network team to manage access for a growing environment.

Read Prabhakaran’s post.

#2: Using VPC Sharing for a Cost-Effective Multi-Account Microservice Architecture

by Anandprasanna Gaitonde and Mohit Malik

Anand and Mohit present a cost-effective approach for microservices that require a high degree of interconnectivity and are within the same trust boundaries. This approach requires less VPC management while still using separate accounts for billing and access control, and does not sacrifice scalability, high availability, fault tolerance, and security.

Read Anand’s and Mohit’s post.

#1: Serverless Architecture for a Web Scraping Solution

by Dzidas Martinaitis

You may wonder whether serverless architectures are cost-effective or expensive. In this post, Dzidas analyzes a web scraping solution. The project can be considered as a standard extract, transform, load process without a user interface and can be packed into a self-containing function or a library.

Read Dzidas’ post.

Thank You

Thanks again to all our readers and blog post writers! We look forward to learning and building amazing things together in 2021.

Field Notes: Improving Call Center Experiences with Iterative Bot Training Using Amazon Connect and Amazon Lex

Post Syndicated from Marius Cealera original https://aws.amazon.com/blogs/architecture/field-notes-improving-call-center-experiences-with-iterative-bot-training-using-amazon-connect-and-amazon-lex/

This post was co-written by Abdullah Sahin, senior technology architect at Accenture, and Muhammad Qasim, software engineer at Accenture. 

Organizations deploying call-center chat bots are interested in evolving their solutions continuously, in response to changing customer demands. When developing a smart chat bot, some requests can be predicted (for example following a new product launch or a new marketing campaign). There are however instances where this is not possible (following market shifts, natural disasters, etc.)

While voice and chat bots are becoming more and more ubiquitous, keeping the bots up-to-date with the ever-changing demands remains a challenge.  It is clear that a build>deploy>forget approach quickly leads to outdated AI that lacks the ability to adapt to dynamic customer requirements.

Call-center solutions which create ongoing feedback mechanisms between incoming calls or chat messages and the chatbot’s AI, allow for a programmatic approach to predicting and catering to a customer’s intent.

This is achieved by doing the following:

  • applying natural language processing (NLP) on conversation data
  • extracting relevant missed intents,
  • automating the bot update process
  • inserting human feedback at key stages in the process.

This post provides a technical overview of one of Accenture’s Advanced Customer Engagement (ACE+) solutions, explaining how it integrates multiple AWS services to continuously and quickly improve chatbots and stay ahead of customer demands.

Call center solution architects and administrators can use this architecture as a starting point for an iterative bot improvement solution. The goal is to lead to an increase in call deflection and drive better customer experiences.

Overview of Solution

The goal of the solution is to extract missed intents and utterances from a conversation and present them to the call center agent at the end of a conversation, as part of the after-work flow. A simple UI interface was designed for the agent to select the most relevant missed phrases and forward them to an Analytics/Operations Team for final approval.

Figure 1 – Architecture Diagram

Amazon Connect serves as the contact center platform and handles incoming calls, manages the IVR flows and the escalations to the human agent. Amazon Connect is also used to gather call metadata, call analytics and handle call center user management. It is the platform from which other AWS services are called: Amazon Lex, Amazon DynamoDB and AWS Lambda.

Lex is the AI service used to build the bot. Lambda serves as the main integration tool and is used to push bot transcripts to DynamoDB, deploy updates to Lex and to populate the agent dashboard which is used to flag relevant intents missed by the bot. A generic CRM app is used to integrate the agent client and provide a single, integrated, dashboard. For example, this addition to the agent’s UI, used to review intents, could be implemented as a custom page in Salesforce (Figure 2).

Figure 2 – Agent feedback dashboard in Salesforce. The section allows the agent to select parts of the conversation that should be captured as intents by the bot.

A separate, stand-alone, dashboard is used by an Analytics and Operations Team to approve the new intents, which triggers the bot update process.


The typical use case for this solution (Figure 4) shows how missing intents in the bot configuration are captured from customer conversations. These intents are then validated and used to automatically build and deploy an updated version of a chatbot. During the process, the following steps are performed:

  1. Customer intents that were missed by the chatbot are automatically highlighted in the conversation
  2. The agent performs a review of the transcript and selects the missed intents that are relevant.
  3. The selected intents are sent to an Analytics/Ops Team for final approval.
  4. The operations team validates the new intents and starts the chatbot rebuild process.

Figure 3 – Use case: the bot is unable to resolve the first call (bottom flow). Post-call analysis results in a new version of the bot being built and deployed. The new bot is able to handle the issue in subsequent calls (top flow)

During the first call (bottom flow) the bot fails to fulfil the request and the customer is escalated to a Live Agent. The agent resolves the query and, post call, analyzes the transcript between the chatbot and the customer, identifies conversation parts that the chatbot should have understood and sends a ‘missed intent/utterance’ report to the Analytics/Ops Team. The team approves and triggers the process that updates the bot.

For the second call, the customer asks the same question. This time, the (trained) bot is able to answer the query and end the conversation.

Ideally, the post-call analysis should be performed, at least in part, by the agent handling the call. Involving the agent in the process is critical for delivering quality results. Any given conversation can have multiple missed intents, some of them irrelevant when looking to generalize a customer’s question.

A call center agent is in the best position to judge what is or is not useful and mark the missed intents to be used for bot training. This is the important logical triage step. Of course, this will result in the occasional increase in the average handling time (AHT). This should be seen as a time investment with the potential to reduce future call times and increase deflection rates.

One alternative to this setup would be to have a dedicated analytics team review the conversations, offloading this work from the agent. This approach avoids the increase in AHT, but also introduces delay and, possibly, inaccuracies in the feedback loop.

The approval from the Analytics/Ops Team is a sign off on the agent’s work and trigger for the bot building process.


The following section focuses on the sequence required to programmatically update intents in existing Lex bots. It assumes a Connect instance is configured and a Lex bot is already integrated with it. Navigate to this page for more information on adding Lex to your Connect flows.

It also does not cover the CRM application, where the conversation transcript is displayed and presented to the agent for intent selection.  The implementation details can vary significantly depending on the CRM solution used. Conceptually, most solutions will follow the architecture presented in Figure1: store the conversation data in a database (DynamoDB here) and expose it through an (API Gateway here) to be consumed by the CRM application.

Lex bot update sequence

The core logic for updating the bot is contained in a Lambda function that triggers the Lex update. This adds new utterances to an existing bot, builds it and then publishes a new version of the bot. The Lambda function is associated with an API Gateway endpoint which is called with the following body:

	“intent”: “INTENT_NAME”,
	“utterances”: [“UTTERANCE_TO_ADD_1”, “UTTERANCE_TO_ADD_2” …]

Steps to follow:

  1. The intent information is fetched from Lex using the getIntent API
  2. The existing utterances are combined with the new utterances and deduplicated.
  3. The intent information is updated with the new utterances
  4. The updated intent information is passed to the putIntent API to update the Lex Intent
  5. The bot information is fetched from Lex using the getBot API
  6. The intent version present within the bot information is updated with the new intent

Figure 4 – Representation of Lex Update Sequence


7. The update bot information is passed to the putBot API to update Lex and the processBehavior is set to “BUILD” to trigger a build. The following code snippet shows how this would be done in JavaScript:

const updateBot = await lexModel
        processBehavior: "BUILD"

9. The last step is to publish the bot, for this we fetch the bot alias information and then call the putBotAlias API.

const oldBotAlias = await lexModel
        name: config.botAlias,
        botName: updatedBot.name

return lexModel
        name: config.botAlias,
        botName: updatedBot.name,
        botVersion: updatedBot.version,
        checksum: oldBotAlias.checksum,


In this post, we showed how a programmatic bot improvement process can be implemented around Amazon Lex and Amazon Connect.  Continuously improving call center bots is a fundamental requirement for increased customer satisfaction. The feedback loop, agent validation and automated bot deployment pipeline should be considered integral parts to any a chatbot implementation.

Finally, the concept of a feedback-loop is not specific to call-center chatbots. The idea of adding an iterative improvement process in the bot lifecycle can also be applied in other areas where chatbots are used.

Accelerating Innovation with the Accenture AWS Business Group (AABG)

By working with the Accenture AWS Business Group (AABG), you can learn from the resources, technical expertise, and industry knowledge of two leading innovators, helping you accelerate the pace of innovation to deliver disruptive products and services. The AABG helps customers ideate and innovate cloud solutions with customers through rapid prototype development.

Connect with our team at [email protected] to learn and accelerate how to use machine learning in your products and services.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.


Abdullah Sahin

Abdullah Sahin

Abdullah Sahin is a senior technology architect at Accenture. He is leading a rapid prototyping team bringing the power of innovation on AWS to Accenture customers. He is a fan of CI/CD, containerization technologies and IoT.

Muhammad Qasim

Muhammad Qasin

Muhammad Qasim is a software engineer at Accenture and excels in development of voice bots using services such as Amazon Connect. In his spare time, he plays badminton and loves to go for a run.

Automating Recommendation Engine Training with Amazon Personalize and AWS Glue

Post Syndicated from Alexander Spivak original https://aws.amazon.com/blogs/architecture/automating-recommendation-engine-training-with-amazon-personalize-and-aws-glue/

Customers from startups to enterprises observe increased revenue when personalizing customer interactions. Still, many companies are not yet leveraging the power of personalization, or, are relying solely on rule-based strategies. Those strategies are effort-intensive to maintain and not effective. Common reasons for not launching machine learning (ML) based personalization projects include: the complexity of aggregating and preparing the datasets, gaps in data science expertise and the lack of trust regarding the quality of ML recommendations.

This blog post demonstrates an approach for product recommendations to mitigate those concerns using historical datasets. To get started with your personalization journey, you don’t need ML expertise or a data lake. The following serverless end-to-end architecture involves aggregating and transforming the required data, as well as automatically training an ML-based recommendation engine.

I will outline the architectural production-ready setup for personalized product recommendations based on historical datasets. This is of interest to data analysts who look for ways to bring an existing recommendation engine to production, as well as solutions architects new to ML.

Solution Overview

The two core elements to create a proof-of-concept for ML-based product recommendations are:

  1. the recommendation engine and,
  2. the data set to train the recommendation engine.

Let’s start with the recommendation engine first, and work backwards to the corresponding data needs.

Product recommendation engine

To create the product recommendation engine, we use Amazon Personalize. Amazon Personalize differentiates three types of input data:

  1. user events called interactions (user events like views, signups or likes),
  2. item metadata (description of your items: category, genre or availability), and
  3. user metadata (age, gender, or loyalty membership).

An interactions dataset is typically the minimal requirement to build a recommendation system. Providing user and item metadata datasets improves recommendation accuracy, and enables cold starts, item discovery and dynamic recommendation filtering.

Most companies already have existing historical datasets to populate all three input types. In the case of retail companies, the product order history is a good fit for interactions. In the case of the media and entertainment industry, the customer’s consumption history maps to the interaction dataset. The product and media catalogs map to the items dataset and the customer profiles to the user dataset.

Amazon Personalize: from datasets to a recommendation API

Amazon Personalize: from datasets to a recommendation API

The Amazon Personalize Deep Dive Series provides a great introduction into the service and explores the topics of training, inference and operations. There are also multiple blog posts available explaining how to create a recommendation engine with Amazon Personalize and how to select the right metadata for the engine training. Additionally, the Amazon Personalize samples repository in GitHub showcases a variety of topics: from getting started with Amazon Personalize, up to performing a POC in a Box using existing datasets, and, finally, automating the recommendation engine with MLOps. In this post, we focus on getting the data from the historical data sources into the structure required by Amazon Personalize.

Creating the dataset

While manual data exports are a quick way to get started with one-time datasets for experiments, we use AWS Glue to automate this process. The automated approach with AWS Glue speeds up the proof of concept (POC) phase and simplifies the process to production by:

  • easily reproducing dataset exports from various data sources. This are used to iterate with other feature sets for recommendation engine training.
  • adding additional data sources and using those to enrich existing datasets
  • efficiently performing transformation logic like column renaming and fuzzy matching out of the box with code generation support.

AWS Glue is a serverless data integration service that is scalable and simple to use. It provides all of the capabilities needed for data integration and supports a wide variety of data sources: Amazon S3 buckets, JDBC connectors, MongoDB databases, Kafka, and Amazon Redshift, the AWS data warehouse. You can even make use of data sources living outside of your AWS environment, e.g. on-premises data centers or other services outside of your VPC. This enables you to perform a data-driven POC even when the data is not yet in AWS.

Modern application environments usually combine multiple heterogeneous database systems, like operational relational and NoSQL databases, in addition to, the BI-powering data warehouses. With AWS Glue, we orchestrate the ETL (extract, transform, and load) jobs to fetch the relevant data from the corresponding data sources. We then bring it into a format that Amazon Personalize understands: CSV files with pre-defined column names hosted in an Amazon S3 bucket.

Each dataset consists of one or multiple CSV files, which can be uniquely identified by an Amazon S3 prefix. Additionally, each dataset must have an associated schema describing the structure of your data. Depending on the dataset type, there are required and pre-defined fields:

  • USER_ID (string) and one additional metadata field for the users dataset
  • ITEM_ID (string) and one additional metadata field for the items dataset
  • USER_ID (string), ITEM_ID (string), TIMESTAMP (long; as Epoch time) for the interactions dataset

The following graph presents a high-level architecture for a retail customer, who has a heterogeneous data store landscape.

Using AWS Glue to export datasets from heterogeneous data sources to Amazon S3

Using AWS Glue to export datasets from heterogeneous data sources to Amazon S3

To understand how AWS Glue connects to the variety of data sources and allows transforming the data into the required format, we need to drill down into the AWS Glue concepts and components.

One of the key components of AWS Glue is the AWS Glue Data Catalog: a persistent metadata store containing table definitions, connection information, as well as, the ETL job definitions.
The tables are metadata definitions representing the structure of the data in the defined data sources. They do not contain any data entries from the sources but solely the structure definition. You can create a table either manually or automatically by using AWS Glue Crawlers.

AWS Glue Crawlers scan the data in the data sources, extract the schema information from it, and store the metadata as tables in the AWS Glue Data Catalog. This is the preferred approach for defining tables. The crawlers use AWS Glue Connections to connect to the data sources. Each connection contains the properties that are required to connect to a particular data store. The connections will be also used later by the ETL jobs to fetch the data from the data sources.

AWS Glue Crawlers also help to overcome a challenge frequently appearing in microservice environments. Microservice architectures are frequently operated by fully independent and autonomous teams. This means that keeping track of changes to the data source format becomes a challenge. Based on a schedule, the crawlers can be triggered to update the metadata for the relevant data sources in the AWS Glue Data Catalog automatically. To detect cases when a schema change would break the ETL job logic, you can combine the CloudWatch Events emitted by AWS Glue on updating the Data Catalog tables with an AWS Lambda function or a notification send via the Amazon Simple Notification Service (SNS).

The AWS Glue ETL jobs use the defined connections and the table information from the Data Catalog to extract the data from the described sources, apply the user-defined transformation logic and write the results into a data sink. AWS Glue can automatically generate code for ETL jobs to help perform a variety of useful data transformation tasks. AWS Glue Studio makes the ETL development even simpler by providing an easy-to-use graphical interface that accelerates the development and allows designing jobs without writing any code. If required, the generated code can be fully customized.

AWS Glue supports Apache Spark jobs, written either in Python or in Scala, and Python Shell jobs. Apache Spark jobs are optimized to run in a highly scalable, distributed way dealing with any amount of data and are a perfect fit for data transformation jobs. The Python Shell jobs provide access to the raw Python execution environment, which is less scalable but provides a cost-optimized option for orchestrating AWS SDK calls.

The following diagram visualizes the interaction between the components described.

The basic concepts of populating your Data Catalog and processing ETL dataflow in AWS Glue

The basic concepts of populating your Data Catalog and processing ETL dataflow in AWS Glue

For each Amazon Personalize dataset type, we create a separate ETL job. Since those jobs are independent, they also can run in parallel. After all jobs have successfully finished, we can start the recommendation engine training. AWS Glue Workflows allow simplifying data pipelines by visualizing and monitoring complex ETL activities involving multiple crawlers, jobs, and triggers, as a single entity.

The following graph visualizes a typical dataset export workflow for training a recommendation engine, which consists of:

  • a workflow trigger being either manual or scheduled
  • a Python Shell job to remove the results of the previous export workflow from S3
  • a trigger firing when the removal job is finished and initiating the parallel execution of the dataset ETL jobs
  • the three Apache Spark ETL jobs, one per dataset type
  • a trigger firing when all three ETL jobs are finished and initiating the training notification job
  • a Python Shell job to initiate a new dataset import or a full training cycle in Amazon Personalize (e.g. by triggering the MLOps pipeline using the AWS SDK)


AWS Glue workflow for extracting the three datasets and triggering the training workflow of the recommendation engine

AWS Glue workflow for extracting the three datasets and triggering the training workflow of the recommendation engine

Combining the data export and the recommendation engine

In the previous sections, we discussed how to create an ML-based recommendation engine and how to create the datasets for the training of the engine. In this section, we combine both parts of the solution leveraging an adjusted version of the MLOps pipeline solution available on GitHub to speed up the iterations on new solution versions by avoiding manual steps. Moreover, automation means new items can be put faster into production.

The MLOps pipeline uses a JSON file hosted in an S3 bucket to describe the training parameters for Amazon Personalize. The creation of a new parameter file version triggers a new training workflow orchestrated in a serverless manner using AWS Step Functions and AWS Lambda.

To integrate the Glue data export workflow described in the previous section, we also enable the Glue workflow to trigger the training pipeline. Additionally, we manipulate the pipeline to read the parameter file as the first pipeline step. The resulting architecture enables an automated end-to-end set up from dataset export up to the recommendation engine creation.

End-to-end architecture combining the data export with AWS Glue, the MLOps training workflow and Amazon Personalize

End-to-end architecture combining the data export with AWS Glue, the MLOps training workflow, and Amazon Personalize

The architecture for the end-to-end data export and recommendation engine creation solution is completely serverless. This makes it highly scalable, reliable, easy to maintain, and cost-efficient. You pay only for what you consume. For example, in the case of the data export, you pay only for the duration of the AWS Glue crawler executions and ETL jobs. These are only need to run to iterate with a new dataset.

The solution is also flexible in terms of the connected data sources. This architecture is also recommended for use cases with a single data source. You can also start with a single data store and enrich the datasets on-demand with additional data sources in future iterations.

Testing the quality of the solution

A common approach to validate the quality of the solution is the A/B testing technique, which is widely used to measure the efficacy of generated recommendations. Based on the testing results, you can iterate on the recommendation engine by optimizing the underlying datasets and models. The high degree of automation increases the speed of iterations and the resiliency of the end-to-end process.


In this post, I presented a typical serverless architecture for a fully automated, end-to-end ML-based recommendation engine leveraging available historical datasets. As you begin to experiment with ML-based personalization, you will unlock value currently hidden in the data. This helps mitigate potential concerns like the lack of trust in machine learning and you can put the resulting engine into production.

Start your personalization journey today with the Amazon Personalize code samples and bring the engine to production with the architecture outlined in this blog. As a next step, you can involve recording real-time events to update the generated recommendations automatically based on the event data.


Field Notes: Comparing Algorithm Performance Using MLOps and the AWS Cloud Development Kit

Post Syndicated from Moataz Gaber original https://aws.amazon.com/blogs/architecture/field-notes-comparing-algorithm-performance-using-mlops-and-the-aws-cloud-development-kit/

Comparing machine learning algorithm performance is fundamental for machine learning practitioners, and data scientists. The goal is to evaluate the appropriate algorithm to implement for a known business problem.

Machine learning performance is often correlated to the usefulness of the model deployed. Improving the performance of the model typically results in an increased accuracy of the prediction. Model accuracy is a key performance indicator (KPI) for businesses when evaluating production readiness and identifying the appropriate algorithm to select earlier in model development. Organizations benefit from reduced project expenses, accelerated project timelines and improved customer experience. Nevertheless, some organizations have not introduced a model comparison process into their workflow which negatively impacts cost and productivity.

In this blog post, I describe how you can compare machine learning algorithms using Machine Learning Operations (MLOps). You will learn how to create an MLOps pipeline for comparing machine learning algorithms performance using AWS Step Functions, AWS Cloud Development Kit (CDK) and Amazon SageMaker.

First, I explain the use case that will be addressed through this post. Then, I explain the design considerations for the solution. Finally, I provide access to a GitHub repository which includes all the necessary steps for you to replicate the solution I have described, in your own AWS account.

Understanding the Use Case

Machine learning has many potential uses and quite often the same use case is being addressed by different machine learning algorithms. Let’s take Amazon Sagemaker built-in algorithms. As an example, if you are having a “Regression” use case, it can be addressed using (Linear Learner, XGBoost and KNN) algorithms. Another example for a “Classification” use case you can use algorithm such as (XGBoost, KNN, Factorization Machines and Linear Learner). Similarly for “Anomaly Detection” there are (Random Cut Forests and IP Insights).

In this post, it is a “Regression” use case to identify the age of the abalone which can be calculated based on the number of rings on its shell (age equals to number of rings plus 1.5). Usually the number of rings are counted through microscopes examinations.

I use the abalone dataset in libsvm format which contains 9 fields [‘Rings’, ‘Sex’, ‘Length’,’ Diameter’, ‘Height’,’ Whole Weight’,’ Shucked Weight’,’ Viscera Weight’ and ‘Shell Weight’] respectively.

The features starting from Sex to Shell Weight are physical measurements that can be measured using the correct tools. Therefore, using the machine learning algorithms (Linear Learner and XGBoost) to address this use case, the complexity of having to examine the abalone under microscopes to understand its age can be improved.

Benefits of the AWS Cloud Development Kit (AWS CDK)

The AWS Cloud Development Kit (AWS CDK) is an open source software development framework to define your cloud application resources.

The AWS CDK uses the jsii which is an interface developed by AWS that allows code in any language to naturally interact with JavaScript classes. It is the technology that enables the AWS Cloud Development Kit to deliver polyglot libraries from a single codebase.

This means that you can use the CDK and define your cloud application resources in typescript language for example. Then by compiling your source module using jsii, you can package it as modules in one of the supported target languages (e.g: Javascript, python, Java and .Net). So if your developers or customers prefer any of those languages, you can easily package and export the code to their preferred choice.

Also, the cdk tf provides constructs for defining Terraform HCL state files and the cdk8s enables you to use constructs for defining kubernetes configuration in TypeScript, Python, and Java. So by using the CDK you have a faster development process and easier cloud onboarding. It makes your cloud resources more flexible for sharing.


Overview of solution

This architecture serves as an example of how you can build a MLOps pipeline that orchestrates the comparison of results between the predictions of two algorithms.

The solution uses a completely serverless environment so you don’t have to worry about managing the infrastructure. It also deletes resources not needed after collecting the predictions results, so as not to incur any additional costs.

Figure 1: Solution Architecture


In the preceding diagram, the serverless MLOps pipeline is deployed using AWS Step Functions workflow. The architecture contains the following steps:

  1. The dataset is uploaded to the Amazon S3 cloud storage under the /Inputs directory (prefix).
  2. The uploaded file triggers AWS Lambda using an Amazon S3 notification event.
  3. The Lambda function then will initiate the MLOps pipeline built using a Step Functions state machine.
  4. The starting lambda will start by collecting the region corresponding training images URIs for both Linear Learner and XGBoost algorithms. These are used in training both algorithms over the dataset. It will also get the Amazon SageMaker Spark Container Image which is used for running the SageMaker processing Job.
  5. The dataset is in libsvm format which is accepted by the XGBoost algorithm as per the Input/Output Interface for the XGBoost Algorithm. However, this is not supported by the Linear Learner Algorithm as per Input/Output interface for the linear learner algorithm. So we need to run a processing job using Amazon SageMaker Data Processing with Apache Spark. The processing job will transform the data from libsvm to csv and will divide the dataset into train, validation and test datasets. The output of the processing job will be stored under /Xgboost and /Linear directories (prefixes).

Figure 2: Train, validation and test samples extracted from dataset

6. Then the workflow of Step Functions will perform the following steps in parallel:

    • Train both algorithms.
    • Create models out of trained algorithms.
    • Create endpoints configurations and deploy predictions endpoints for both models.
    • Invoke lambda function to describe the status of the deployed endpoints and wait until the endpoints become in “InService”.
    • Invoke lambda function to perform 3 live predictions using boto3 and the “test” samples taken from the dataset to calculate the average accuracy of each model.
    • Invoke lambda function to delete deployed endpoints not to incur any additional charges.

7. Finally, a Lambda function will be invoked to determine which model has better accuracy in predicting the values.

The following shows a diagram of the workflow of the Step Functions:

Figure 3: AWS Step Functions workflow graph

The code to provision this solution along with step by step instructions can be found at this GitHub repo.

Results and Next Steps

After waiting for the complete execution of step functions workflow, the results are depicted in the following diagram:

Figure 4: Comparison results

This doesn’t necessarily mean that the XGBoost algorithm will always be the better performing algorithm. It just means that the performance was the result of these factors:

  • the hyperparameters configured for each algorithm
  • the number of epochs performed
  • the amount of dataset samples used for training

To make sure that you are getting better results from the models, you can run hyperparameters tuning jobs which will run many training jobs on your dataset using the algorithms and ranges of hyperparameters that you specify. This helps you allocate which set of hyperparameters which are giving better results.

Finally, you can use this comparison to determine which algorithm is best suited for your production environment. Then you can configure your step functions workflow to update the configuration of the production endpoint with the better performing algorithm.

Figure 5: Update production endpoint workflow


This post showed you how to create a repeatable, automated pipeline to deliver the better performing algorithm to your production predictions endpoint. This helps increase the productivity and reduce the time of manual comparison.  You also learned to provision the solution using AWS CDK and to perform regular cleaning of deployed resources to drive down business costs. If this post helps you or inspires you to solve a problem, share your thoughts and questions in the comments. You can use and extend the code on the GitHub repo.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers

New – Store, Discover, and Share Machine Learning Features with Amazon SageMaker Feature Store

Post Syndicated from Julien Simon original https://aws.amazon.com/blogs/aws/new-store-discover-and-share-machine-learning-features-with-amazon-sagemaker-feature-store/

Today, I’m extremely happy to announce Amazon SageMaker Feature Store, a new capability of Amazon SageMaker that makes it easy for data scientists and machine learning engineers to securely store, discover and share curated data used in training and prediction workflows.

For all the importance of selecting the right algorithm to train machine learning (ML) models, experienced practitioners know how crucial it is to feed it with high-quality data. Cleaning data is a good first step, and ML workflows routinely include steps to fill missing values, remove outliers, and so on. Then, they often move on to transforming data, using a mix of common and arcane techniques known as “feature engineering.”

Simply put, the purpose of feature engineering is to transform your data and to increase its expressiveness so that the algorithm may learn better. For instance, many columnar datasets include strings, such as street addresses. To most ML algorithms, strings are meaningless, and they need to be encoded in a numerical representation. Thus, you could replace street addresses with GPS coordinates, a much more expressive way to learn the concept of location. In other words, if data is the new oil, then feature engineering is the refining process that turns it into high-octane jet fuel that helps models get to stratospheric accuracy.

Indeed, ML practitioners spend a lot of time crafting feature engineering code, applying it to their initial datasets, training models on the engineered datasets, and evaluating model accuracy. Given the experimental nature of this work, even the smallest project will lead to multiple iterations. The same feature engineering code is often run again and again, wasting time and compute resources on repeating the same operations. In large organizations, this can cause an even greater loss of productivity, as different teams often run identical jobs, or even write duplicate feature engineering code because they have no knowledge of prior work.

There’s another hard problem that ML teams have to solve. As models are trained on engineered datasets, it’s imperative to apply the same transformations to data sent for prediction. This often means rewriting feature engineering code, sometimes in a different language, integrating it in your prediction workflow, and running it at prediction time. This whole process is not only time-consuming, it can also introduce inconsistencies, as even the tiniest variation in a data transform can have a large impact on predictions.

In order to solve these problems, ML teams sometimes build a feature store, a central repository where they can keep and retrieve engineered data used in their training and predictions jobs. As useful as feature stores are, building and managing your own involves a lot of engineering, infrastructure, and operational effort that takes valuable time away from actual ML work. Customers asked us for a better solution, and we got to work.

Introducing Amazon SageMaker Feature Store
Amazon SageMaker Feature Store is a fully managed centralized repository for your ML features, making it easy to securely store and retrieve features without having to manage any infrastructure. It’s part of Amazon SageMaker, our fully managed service for ML, and supports all algorithms. It’s also integrated with Amazon SageMaker Studio, our web-based development environment for ML.

Features stored in SageMaker Feature Store are organized in groups, and tagged with metadata. Thanks to this, you can quickly discover which features are available, and whether they’re suitable for your models. Multiple teams can also easily share and re-use features, reducing the cost of development and accelerating innovation.

Once stored, features can be retrieved and used in your SageMaker workflows: model training, batch transform, and real-time prediction with low latency. Not only do you avoid duplicating work, you also build consistent workflows that use the same consistent features stored in the offline and online stores.

The Climate Corporation (Climate) is a subsidiary of Bayer, and the industry leader in bringing digital innovation to farmers. Says Daniel McCaffrey, Vice President, Data and Analytics, Climate: “At Climate, we believe in providing the world’s farmers with accurate information to make data driven decisions and maximize their return on every acre. To achieve this, we have invested in technologies such as machine learning tools to build models using measurable entities known as features, such as yield for a grower’s field. With Amazon SageMaker Feature Store, we can accelerate the development of ML models with a central feature store to access and reuse features across multiple teams easily. SageMaker Feature Store makes it easy to access features in real-time using the online store, or run features on a schedule using the offline store for different use cases, and we can develop ML models faster.”

Care.com, the world’s leading platform for finding and managing high-quality family care, is also using Amazon SageMaker Feature Store. This is what Clemens Tummeltshammer, Data Science Manager, Care.com, told us: “A strong care industry where supply matches demand is essential for economic growth from the individual family up to the nation’s GDP. We’re excited about Amazon SageMaker Feature Store and Amazon SageMaker Pipelines , as we believe they will help us scale better across our data science and development teams, by using a consistent set of curated data that we can use to build scalable end-to-end machine learning model pipelines from data preparation to deployment. With the newly announced capabilities of Amazon SageMaker, we can accelerate development and deployment of our ML models for different applications, helping our customers make better informed decisions through faster real-time recommendations.

Now, let’s see how you can get started.

Storing and Retrieving Features with Amazon SageMaker Feature Store
Once you’ve run your feature engineering code on your data, you can organize and store your engineered features in SageMaker Feature Store, by grouping them in feature groups. A feature group is a collection of records, similar to rows in a table. Each record has a unique identifier, and holds the engineered feature values for one of the data instances in your original data source. Optionally, you can choose to encrypt the data at rest using your own AWS Key Management Service (KMS) key that is unique for each feature group.

How you define feature groups is up to you. For example, you could create one per data source (CSV files, database tables, and so on), and use a convenient unique column as the record identifier (primary key, customer id, transaction id, and so on).

Once you’ve got your groups figured out, you should repeat the following steps for each group:

  1. Create feature definitions, with the name and the type of each feature in a record (Fractional, Integral, or String).
  2. Create each feature group with the create_feature_group() API:
         # The name of the feature group
         # The name of the column acting as the record identifier
         # The name of the column action as the feature timestamp
         EventTimeFeatureName = event_time_feature_name,
         # A list of feature names and types
         # The S3 location for the offline feature store
         # Optionally, enable the online feature store
         # An IAM role
  3. In each feature group, store records containing a collection of feature name/feature value pairs, using the put_record() API:

    For faster ingestion, you could create multiple threads and parallelize this operation.

At this point, features will be available in Amazon SageMaker Feature Store. Thanks to the offline store, you can use services such as Amazon Athena, AWS Glue, or Amazon EMR to build datasets for training: fetch the corresponding JSON objects in S3, select the features that you need, and save them in S3 in the format expected by your ML algorithm. From then on, it’s SageMaker business as usual!

In addition, you can use the get_record() API to access individual records stored in the online store, passing the group name and the unique identifier of the record you want to access, like so:

record = sm_feature_store.get_record(
    RecordIdentifierValue={"IntegralValue": 5962}

Amazon SageMaker Feature Store is designed for fast and efficient access for real time inference, with a P95 latency lower than 10ms for a 15-kilobyte payload. This makes it possible to query for engineered features at prediction time, and to replace raw features sent by the upstream application with the exact same features used to train the model. Feature inconsistencies are eliminated by design, letting you focus on building the best models instead of chasing bugs.

Finally, as SageMaker Feature Store includes feature creation timestamps, you can retrieve the state of your features at a particular point in time.

As Amazon SageMaker Feature Store is integrated with SageMaker Studio, I can see my two features groups there.

SageMaker screenshot

Right-clicking on “Open feature group detail”, I open the identity feature group.

SageMaker screenshot

I can see feature definitions.

SageMaker screenshot

Finally, I can generate queries for the offline store, which I could add to a Amazon SageMaker Data Wrangler workflow to load features prior to training.

SageMaker screenshot

How to Get Started with Amazon SageMaker Feature Store
As you can see, SageMaker Feature Store makes it easy to store, retrieve, and share features required by your training and prediction workflows.

SageMaker Feature Store is available in all regions where SageMaker is available. Pricing is based on feature reads and writes, and on the total amount of data stored.

Here are sample notebooks that will help you get started right away. Give them a try, and let us know what you think. We’re always looking forward to your feedback, either through your usual AWS support contacts, or on the AWS Forum for SageMaker.

– Julien

New – Profile Your Machine Learning Training Jobs With Amazon SageMaker Debugger

Post Syndicated from Julien Simon original https://aws.amazon.com/blogs/aws/profile-your-machine-learning-training-jobs-with-amazon-sagemaker-debugger/

Today, I’m extremely happy to announce that Amazon SageMaker Debugger can now profile machine learning models, making it much easier to identify and fix training issues caused by hardware resource usage.

Despite its impressive performance on a wide range of business problems, machine learning (ML) remains a bit of a mysterious topic. Getting things right is an alchemy of science, craftsmanship (some would say wizardry), and sometimes luck. In particular, model training is a complex process whose outcome depends on the quality of your dataset, your algorithm, its parameters, and the infrastructure you’re training on.

As ML models become ever larger and more complex (I’m looking at you, deep learning), one growing issue is the amount of infrastructure required to train them. For instance, training BERT on the publicly available COCO dataset takes well over six hours on a single p3dn.24xlarge instance, even with its eight NVIDIA V100 GPUs. Some customers like autonomous vehicle companies deal with much larger datasets, and train object detection models for several days.

When a complex training job takes this long, the odds that something goes wrong and ruins it are pretty high, not only wasting time and money but also causing lots of frustration. Important work needs to be put on the back burner while you investigate, figure out the root cause, try to fix it, and then run your training job again. Often, you’ll have to iterate quite a few times to nail the problem.

Depending on the ML framework that you use, and sometimes on its version, you may or may not be able to use existing framework-specific tools. Often, you’ll have to build and maintain your own bespoke tools. Even for experienced practitioners, this is a lot of hard work. For regular developers like me, this is an utterly daunting task.

Introducing Model Profiling in Amazon SageMaker Debugger
Launched last year at AWS re:Invent, Amazon SageMaker Debugger is a capability of Amazon SageMaker that automatically identifies complex issues developing in ML training jobs. These include loss not decreasing, exploding gradients, and more.

Now, SageMaker Debugger can also monitor hardware resource usage, and allows you to profile your training job to help you correlate resource usage to ML operations in your training script. Thus, you’ll be able to resolve performance issues much quicker, and iterate through your training job much faster.

Chaim Rand, ML Algorithm Developer at Mobileye, an Intel company building automated driving and driver assistance systems, had the opportunity to work with the new profiling capabilities, and here’s what he told us: “Many of the assisted driving and autonomous vehicle technologies that we develop at Mobileye, rely on training deep neural network models to detect a wide variety of road artifacts, including vehicles, pedestrians, speed bumps, road signs and more. Often, these models train on extremely large datasets, on multiple machines, and for periods of up to several days. For us, at Mobileye, it is imperative that we have a toolkit of advanced performance profiling capabilities, for analyzing the flow of data across the network, CPU, and GPU resources, and for pinpointing performance issues. The profiling functionality in SageMaker Debugger provides just that, taking performance profiling out of the domain of a few specialized experts, and empowering our algorithm developers to maximize training resource utilization, accelerate model convergence, and reduce cost.

At launch, the new profiling capability of SageMaker Debugger is available for TensorFlow 2.x and PyTorch 1.x. All you have to do is to train with the corresponding built-in frameworks in Amazon SageMaker. Distributed training is supported out of the box.

Setting a single parameter in your SageMaker estimator, and without any change to your training code, you can enable the collection of infrastructure and model metrics such as:

  • CPU and GPU,
  • RAM and GPU RAM,
  • Network I/O,
  • Storage I/O (local storage and Pipe Mode),
  • Python metrics,
  • Data loading time,
  • Time spent in ML operators running on CPU and GPU,
  • Distributed training metrics for Horovod,
  • and many more.

In addition, you can visualize how much time is spent in different phases, such as preprocessing, training loop, and postprocessing. If needed, you can drill down on each training epoch, and even on each function in your training script.

By default, metrics are collected every 500ms, and you can also set this value to 100ms, 200ms, 1s, 5s, and 1min. For finer-grained analysis, you can also enable and disable profiling explicitly in your training code, only capturing metrics for specific parts.

While your training job is running, you can easily visualize these metrics in Amazon SageMaker Studio, our web-based integrated development environment for ML. As you would expect, all data is also available through the SageMaker Debugger API, and you can retrieve it to build your own graphs.

Running in parallel of the training job, an Amazon SageMaker Processing analyzes captured data, builds graphs, and generates a report providing insights on potential problems. This doesn’t require any work on your part, as this analysis runs inside a built-in container on fully managed infrastructure.

Now, let’s run a demo with PyTorch, where we’ll profile a ResNet-50 image classification model training on the CIFAR-10 dataset.

Profiling a Training Job with Amazon SageMaker Debugger
All it takes to enable profiling on your training job is an extra parameter in your SageMaker estimator. You don’t need to change a line in your training code. By default, SageMaker Debugger uses a set of built-in profiling rules looking for unwanted conditions that could develop during training, such as low GPU utilization. On top of reporting these conditions, SageMaker Debugger also triggers events in CloudWatch Events. For example, I could use them to run a AWS Lambda function that automatically stops inefficient training jobs.

First, I create a profiling configuration capturing data every 500ms. Optionally, I could select a training step interval if I wanted to profile only a certain portion of the job.

import sagemaker
from sagemaker.profiler import ProfilerConfig 
profiler_config = ProfilerConfig(profiling_interval_millis=500)

Then, I pass this configuration to my PyTorch Estimator, training on an ml.p3.8xlarge instance equipped with 4 NVIDIA V100 GPUs.

from sagemaker.pytorch import PyTorch
estimator = PyTorch(
    hyperparameters={"batch_size":512, "epochs":10},

Then, I launch the training job as usual. Once the job is running, profiling data is captured and stored in S3.

path = estimator.latest_job_profiler_artifacts_path()

Using the SageMaker SDK, I can retrieve and count profiling events.

from smdebug.profiler.system_metrics_reader import S3SystemMetricsReader
system_metrics_reader = S3SystemMetricsReader(path)
last_timestamp = system_metrics_reader.get_timestamp_of_latest_available_file()
events = system_metrics_reader.get_events(0, last_timestamp)
print("Found", len(events), "recorded system metric events. Latest recorded event:", last_timestamp)
Found 411853 recorded system metric events. Latest recorded event: 1605620280000000

Of course, I could parse and analyze these profiling events, build my own graphs, and so on. Instead, let’s visualize them in near real-time in SageMaker Studio.

While my training job is still running, I locate it in SageMaker Studio, and I right-click “Open Debugger for insights”.

SageMaker screenshot

This opens a new tab, and I select the “Nodes” panel where I can see details statistics for each instance in my training job. So, how’s my training job doing? Feel free to click on the image below to zoom in.

SageMaker screenshot

Apparently, this job isn’t going great. GPU utilization and GPU memory utilization are desperately flat at around 10%. I’m definitely not pushing my multi-GPU instance hard enough. Maybe GPUs are not receiving data fast enough because the CPU can’t keep up? Let’s check the system utilization heatmap.

SageMaker screenshot

The CPU is taking a nap here, hardly ever exceeding 20% usage. This instance is definitely not busy enough. Is there anything I could do to fix this?

Switching to the “Overview” panel, I see that some of the built-in profiling rules have been triggered.

SageMaker screenshot

LowGPUUtilization confirms what I saw on the graphs above. BatchSize is very interesting, as it suggests increasing the size of mini-batches sent to the GPUs by the training script running on the CPU. This should definitely help fill GPU memory, put more GPU cores to work, speed up my training job, and improve infrastructure usage.

At this point, I should decide to stop my inefficient training job, and to relaunch it with a larger batch size. Here, I’ll let it run to completion to show you the report generated by the SageMaker Processing job running in parallel of your training job.

Once the training job is complete, I can see its summary in the “Overview” panel.

SageMaker screenshot

Clicking on the “Download report” button, I get a very detailed report that includes additional metrics, for example the ratio between the different phases of the training job, or the ratio between the forward and backward pass.

SageMaker screenshot

I can also see information on the most time-consuming CPU and GPU operators, which is really important if I wanted to optimize my code. For example, the graph below tells me that the most time-consuming GPU operations in my training job are backward pass convolution operators.

SageMaker screenshot

There’s much more to read in the report (rules summary, training loop analysis, and more). A companion notebook is also available to understand how graphs have been built, and how you can tailor them to your own needs.

Getting Started
We’ve just scratched the surface, and there are many more features in Amazon SageMaker Debugger that make it easy to gather, analyze and visualize model profiling information. You can start using it today in all regions where Amazon SageMaker is available. You won’t be charged for any compute used to run built-in profiling rules.

You’ll find sample notebooks on Github, so give them a try, and let us know what you think. We’re always looking forward to your feedback, either through your usual AWS support contacts, or on the AWS Forum for SageMaker.

– Julien

re:Invent 2020 Liveblog: Machine Learning Keynote

Post Syndicated from AWS News Blog Team original https://aws.amazon.com/blogs/aws/reinvent-2020-liveblog-machine-learning-keynote/

Swami Sivasubramanian speaks on stage at AWS re:InventFollow along as AWS Chief Evangelist Jeff Barr and Developer Advocates Martin Beeby and Steve Roberts liveblog the first-ever Machine Learning Keynote. Swami Sivasubramanian, VP of Amazon ML/AI will share the latest developments and launches in AWS machine learning, as well as demos of new technology, and insights from customers.

Join us here from 7:45-10 AM (PST), Tuesday, Dec. 8, 2020! 


AWS Panorama Appliance: Bringing Computer Vision Applications to the Edge

Post Syndicated from Martin Beeby original https://aws.amazon.com/blogs/aws/using-computer-vision-applications-at-the-edge/

At AWS re:Invent today, we gave a preview of the AWS Panorama Appliance. Also, we announced the AWS Panorama SDK is coming soon. These allow organizations to bring computer vision to their on-premises cameras and make automated predictions with high accuracy and low latency.

Over the past couple of decades, computer vision has gone from a topic discussed by academics to a tool used by businesses worldwide. Cloud has been critical in enabling this growth, and we have had an explosion of services and infrastructure capabilities that have made the previously impossible possible.

Customers face challenges with their physical systems, whether it be inspecting parts on a manufacturing line, ensuring workers wear hard hats in hazardous areas or analyzing customer traffic in retail stores. Customers often solve these problems by manually monitoring live video feeds or reviewing recorded footage after an issue or incident has occurred. These solutions are manual, error-prone, and difficult to scale.

Computer Vision is increasingly being used to perform these inspection tasks using models running in the cloud. Still, there are circumstances when relying exclusively on the cloud is not optimal due to latency requirements or intermittent connectivity that make a round trip to the cloud unfeasible.

What Was Announced Today
You can now develop a computer vision model using Amazon SageMaker and then deploy it to a Panorama Appliance that can then run the model on video feeds from multiple network and IP cameras. The Panorama Appliance and its associated console are now in preview.

Coming soon, we have the Panorama SDK, which is a Software Development Kit (SDK) that can be used by third-party device manufacturers to build Panorama-enabled devices. The Panorama SDK is flexible, with a small footprint, making it easy for hardware vendors to build new devices in various form factors and sensors to satisfy use cases across different industries and environments, including industrial sites, low light scenarios, and the outdoors.

Unboxing the Appliance
Jeff was sent a Panorama Appliance a few weeks before AWS re:Invent so that we could write this blog; here is a picture of the device set up in Jeff’s office.

To set up the Panorama Appliance, I head over to the console and click on Get Started.

The console presents me with a three-step guide to getting a computer vision model running on the Panorama Appliance. In this blog, I will only look at Step 1, which helps me set up the Panorama Appliance.

I plugged the Panorama Appliance into the local network with an ethernet cable, and so in the configure step, I choose the Ethernet option.

The console creates a configuration archive, based on my input, that can set up the device. I download the file and transfer it to a USB key, which I will then plug into the Panorama Appliance.

I power on the Panorama Appliance and plug in the USB key; lights begin to flash as the Panorama Appliance connects to AWS. A green message appears on the console after a little while, saying that the Panorama Appliance is connected and online.

The guide then asks me to Add camera streams.

There are two ways to Add cameras: Automatic and Manual. I select Automatic, which will search the subnet for available cameras.

Some of the network cameras are password-protected. I add the Username and Password so that the Panorama Appliance can connect.

The Panorama Appliance is now connected to the network cameras, and my Panorama Appliance is ready to use.

The next step is to create an application and then deploy it to the Panorama Appliance. Over the next few weeks, I will produce a demo application, and when the service becomes Generally Available, I look forward to talking about it on this blog and sharing with you what I have created.

Get Started
To get started with the AWS Panorama Appliance, visit the product page. To get started with the AWS Panorama SDK, check out the documentation. We are excited to see what our customers will develop with the Panorama Appliance, and the sort of products third-party device manufacturers will build with the Panorama SDK.

Happy Automating!

— Martin







Amazon EC2 P4d instances deep dive

Post Syndicated from Neelay Thaker original https://aws.amazon.com/blogs/compute/amazon-ec2-p4d-instances-deep-dive/

This post is contributed by Amr Ragab, Senior Solutions Architect, Amazon EC2


AWS is excited to announce that the new Amazon EC2 P4d instances are now generally available. This instance type brings additional benefits with 2.5x higher deep learning performance; adding to the accelerated instances portfolio, new features, and technical breakthroughs that our customers can benefit from with this latest technology. This blog post details some of those key features and how to integrate them into your current workloads and architectures.


P4d instances

As you can see from the generalized block diagram above, the p4d comes with dual socket Intel Cascade Lake 8275CL processors totaling 96 vCPUs at 3.0 GHz with 1.1 TB of RAM and 8 TB of NVMe local storage. P4d also comes with 8 x 40 GB NVIDIA Tesla A100 GPUs with NVSwitch and 400 Gbps Elastic Fabric Adapter (EFA) enabled networking. This instance configuration represents the latest generation of computing for our customers spanning Machine Learning (ML), High Performance Computing (HPC), and analytics.

One of the improvements of the p4d is in the networking stack.  This new instance type has 400 Gbps with support for EFA and GPUDirect RDMA. Now, on AWS, you can take advantage of point-to-point GPU to GPU communication (across nodes), bypassing the CPU. Look out for additional blogs and webinars detailing use cases of GPUDirect and how this feature helps decrease latency and improve performance for certain workloads.

Let’s look at some new features and performance metrics for the P4d instances.


Local ephemeral NVMe storage
The p4d instance type comes with 8 TB of local NVMe storage. Each device has a maximum read/write throughput of 2.7 GB/s. To create a local namespace and staging area for input into the GPUs, you can create a local RAID 0 of all the drives. This results in aggregate read throughput of about 16 GB/s. The following table summarizes the I/O tests on the NVMe drives in this configuration.

FIO – Test Block Size Threads Bandwidth
1 Sequential Read 128k 96 16.4 GiB/s
2 Sequential Write 128k 96 8.2 GiB/s
3 Random Read 128k 96 16.3 GiB/s
4 Random Write 128k 96 8.1 GiB/s


Introduced with the p4d instance type is NVSwitch. Every GPU in the node is connected to each other in a full mesh topology up to 600 GB/s bidirectional bandwidth. ML frameworks and HPC applications that use NVIDIA communication collectives library (NCCL) can take full advantage of this all-to-all communication layer.

P4d GPU to GPU bandwidth

P3 GPU to GPU bandwidth

P4d uses a full mesh NVLink topology for optimized all-to-all communication, compared to the previous generation P3/P3dn instances, which have all-to-all communication across various data path domains (NUMA, PCIe switch, NVLink).  This new topology accessed via NCCL will improve performance for multiGPU workloads.
To make optimal use of the NVSwitch ensure that in your instance, all GPUs application boost clocks are set to its maximum values:

sudo nvidia-smi -ac 1215,1410

Multi-Instance GPU (MIG)

It’s now possible, at the user level, to have control of fractionating a GPU into multiple GPU slices, with each GPU slice isolated from each other. This enables multiple users to run different workloads on the same GPU without impacting performance. I walk you through an example implementation of MIG in the following steps:

With every newly launched instance, MIG is disabled. So, you must enable it with the following command:

[email protected]:~# sudo nvidia-smi -mig 1 

Enabled MIG Mode for GPU 00000000:10:1C.0
You can get a list of supported MIG profiles.
Next, you can create seven slices, and create compute instances for each slice.
[email protected]:~# sudo nvidia-smi mig -cgi 19,19,19,19,19,19,19 
Successfully created GPU instance ID 9 on GPU 0 using profile MIG 1g.5gb (ID 19) 
Successfully created GPU instance ID 7 on GPU 0 using profile MIG 1g.5gb (ID 19) 
Successfully created GPU instance ID 8 on GPU 0 using profile MIG 1g.5gb (ID 19) 
Successfully created GPU instance ID 11 on GPU 0 using profile MIG 1g.5gb (ID 19) 
Successfully created GPU instance ID 12 on GPU 0 using profile MIG 1g.5gb (ID 19) 
Successfully created GPU instance ID 13 on GPU 0 using profile MIG 1g.5gb (ID 19) 
Successfully created GPU instance ID 14 on GPU 0 using profile MIG 1g.5gb (ID 19)
[email protected]:~# nvidia-smi mig -cci -gi 7,8,9,11,12,13,14 
Successfully created compute instance ID 0 on GPU 0 GPU instance ID 7 using profile MIG 1g.5gb (ID 0) 
Successfully created compute instance ID 0 on GPU 0 GPU instance ID 8 using profile MIG 1g.5gb (ID 0) 
Successfully created compute instance ID 0 on GPU 0 GPU instance ID 9 using profile MIG 1g.5gb (ID 0) 
Successfully created compute instance ID 0 on GPU 0 GPU instance ID 11 using profile MIG 1g.5gb (ID 0) 
Successfully created compute instance ID 0 on GPU 0 GPU instance ID 12 using profile MIG 1g.5gb (ID 0) 
Successfully created compute instance ID 0 on GPU 0 GPU instance ID 13 using profile MIG 1g.5gb (ID 0) 
Successfully created compute instance ID 0 on GPU 0 GPU instance ID 14 using profile MIG 1g.5gb (ID 0)

You can split a GPU into a maximum of seven slices. To pass the GPU through into a docker container, you can specify the index pair at runtime:

docker run -it --gpus '"device=1:0"' nvcr.io/nvidia/tensorflow:20.09-tf1-py3

With MIG, you can run multiple smaller workloads on the same GPU without compromising performance. We will follow up with additional blogs on this feature as we integrate it with additional AWS services.


For workloads optimized for multiGPU capabilities, we introduced GPUDirect over EFA fabric. This allows direct GPU-GPU communication across multiple p4d nodes for decreased latency and improved performance. Follow this user guide to get started with installing the EFA driver and setting up the environment. The code sample below can be used as a template to use GPUDirect RDMA over EFA.

/opt/amazon/openmpi/bin/mpirun \
     -n ${NUM_PROCS} -N ${NUM_PROCS_NODE} \
     -x RDMAV_FORK_SAFE=1 -x NCCL_DEBUG=info \
     --hostfile ${HOSTS_FILE} \
     --mca pml ^cm --mca btl tcp,self --mca btl_tcp_if_exclude lo,docker0 --bind-to none \
     $HOME/nccl-tests/build/all_reduce_perf -b 8 -e 4G -f 2 -g 1 -c 1 -n 100

Machine learning Optimizations

You can quickly get started with the all benefits mentioned earlier for the p4d by using our latest Deep Learning AMI (DLAMI). The DLAMI now comes with CUDA11 and the latest NVLink and cuDNN SDKs and drivers to take advantage of the p4d.

TensorFloat32 – TF32

TF32 is a new 19 bit precision datatype from NVIDIA introduced for the first time for the p4d.24xlarge instance. This datatype improves performance with little to no loss of training and validation accuracy for most mainstream models. We have more detailed benchmarks for individual algorithms. But, on the p4d.24xlarge you can achieve approximately a 2.5 fold increase compared to FP32 on the p3dn.24xlarge for mainstream deep learning models.

We have updated our machine learning models here to show examples (see the following chart) of popular algorithms our customers are using today including general DNNs and Bert.

DNN P3dn FP32 (imgs/sec) P3dn FP16 (imgs/sec) P4d Throughput TF32 (imgs/sec) P4d Throughput FP16 (imgs/sec) P4d over p3dn TF32/FP32 P4d over P3dn FP16
1 Resnet50 3057 7413 6841 15621 2.2 2.1
2 Resnet152 1145 2644 2823 5700 2.5 2.2
3 Inception3 2010 4969 4808 10433 2.4 2.1
4 Inception4 847 1778 2025 3811 2.4 2.1
5 VGG16 1202 2092 4532 7240 3.8 3.5
6 Alexnet 32198 50708 82192 133068 2.6 2.6
7 SSD300 1554 2918 3467 6016 2.2 2.1

BERT Large – Wikipedia/Books Corpus

GPUs Sequence Length Batch size / GPU: mixed precision, TF32 Gradient Accumulation: mixed precision, TF32 Throughput – mixed precision
1 1 128 64,64 1024,1024 372
2 4 128 64,64 256,256 1493
3 8 128 64,64 128,128 2936
4 1 512 16,8 2048,4096 77
5 4 512 16,8 512,1024 303
6 8 512 16,8 256,512 596

You can find other code examples at github.com/NVIDIA/DeepLearningExamples.

If you want to builld your own AMI or extend an AMI maintained by your organization you can use the github repo, which provides Packer scripts to build AMIs for Amazon Linux 2 or Ubuntu 18.04 versions.


The stack includes the following components:

  • NVIDIA Driver 450.80.02
  • CUDA 11
  • NVIDIA Fabric Manager
  • cuDNN 8
  • NCCL 2.7.8
  • EFA latest driver
  • FSx kernel and client driver and utilities
  • Intel OneDNN
  • NVIDIA-runtime Docker


Get started with the new P4d instances with support on Amazon EKS, AWS Batch, and Amazon Sagemaker. We are excited to hear about what you develop and run with the new P4d instances. If you have any questions please reach out to your account team. Now, go power up your ML and HPC workloads with NVIDIA Tesla A100s and the P4d instances.

AWS Architecture Monthly Magazine: Robotics

Post Syndicated from Annik Stahl original https://aws.amazon.com/blogs/architecture/architecture-monthly-magazine-robotics/

Architecture Monthly: RoboticsSeptember’s issue of AWS Architecture Monthly issue is all about robotics. Discover why iRobot, the creator of your favorite (though maybe not your pet’s favorite) little robot vacuum, decided to move its mission-critical platform to the serverless architecture of AWS. Learn how and why you sometimes need to test in a virtual environment instead of a physical one. You’ll also have the opportunity to hear from technical experts from across the robotics industry who came together for the AWS Cloud Robotics Summit in August.

Our expert this month, Matt Hansen (who has dreamed of building robots since he was a teen), gives us his outlook for the industry and explains why cloud will be an essential part of that.

In September’s Robotics issue

  • Ask an Expert: Matt Hansen, Principle Solutions Architect
  • Blog: Testing a PR2 Robot in a Simulated Hospital
  • Case Study: iRobot
  • Blog: Introduction to Automatic Testing of Robotics Applications
  • Case Study: Multiply Labs Uses AWS RoboMaker to Manufacture Individualized Medicines
  • Demos & Videos: AWS Cloud Robotics Summit (August 18-19, 2020)
  • Related Videos: iRobot and ZS Associates

Survey opportunity

This month, we’re also asking you to take a 10-question survey about your experiences with this magazine. The survey is hosted by an external company (Qualtrics), so the below survey button doesn’t lead to our website. Please note that AWS will own the data gathered from this survey, and we will not share the results we collect with survey respondents. Your responses to this survey will be subject to Amazon’s Privacy Notice. Please take a few moments to give us your opinions.

How to access the magazine

We hope you’re enjoying Architecture Monthly, and we’d like to hear from you—leave us star rating and comment on the Amazon Kindle Newsstand page or contact us anytime at [email protected].

AWS announces AWS Contact Center Intelligence solutions

Post Syndicated from Alejandra Quetzalli original https://aws.amazon.com/blogs/aws/aws-announces-aws-contact-center-intelligence-solutions/

What was announced?

We’re announcing the availability of AWS Contact Center Intelligence (CCI) solutions, a combination of services that empowers customers to easily integrate AI into contact centers, made available through AWS Partner Network (APN) partners.

AWS CCI has solutions for self-service, live-call analytics & agent assist, and post-call analytics, making it possible for customers to quickly deploy AI into their existing workflows or build completely new ones.

Pricing and regional availability correspond to the underlying services (Amazon Comprehend, Amazon Kendra, Amazon Lex, Amazon Transcribe, Amazon Translate, and Amazon Polly) used.

What is AWS Contact Center Intelligence?

We mentioned that AWS CCI brings solutions to contact centers powered by AI for before, during, and after customer interactions.

My colleague Swami Sivasubramanian (VP, Amazon Machine Learning, AWS) said: “We want to make it easy for our customers with contact centers to benefit from machine learning capabilities even if they have no machine learning expertise. By partnering with APN technology and consulting partners to bring AWS Contact Center Intelligence solutions to market, we are making it easier for customers to realize the benefits of cloud-based machine learning services while removing the heavy lifting and the need to hire specialized developers to integrate the ML capabilities in to their existing contact centers.

But what does that mean? 🤔

AWS CCI solutions lets you leverage machine learning (ML) functionality such as text-to-speech, translation, enterprise search, chatbots, business intelligence, and language comprehension into current contact center environments. Customers can now implement contact center intelligence ML solutions to aid self-service, live-call analytics & agent assist, and post-call analytics. Currently, AWS CCI solutions are available through partners such as Genesys, Vonage, and UiPath for easy integration into existing enterprise contact center systems.

“We’re proud Genesys customers will be among the first to benefit from the off-the-shelf machine learning capabilities of AWS Contact Center Intelligence solutions. It’s now simpler and more cost-effective for organizations to combine AWS’s AI capabilities, including search, text-to-speech and natural language understanding, with the advanced contact center capabilities of Genesys Cloud to give customers outstanding self-service experiences.” ~ Olivier Jouve (Executive Vice President and General Manager of Genesys Cloud)

“More and more consumers are relying on automated methods to interact with brands, especially in today’s retail environment where online shopping is taking a front seat. The Genesys Cloud and Amazon Web Services (AWS) integration will make it easier to leverage conversational AI so we can provide more effective self-service experiences for our customers.” ~ Aarde Cosseboom (Senior Director of Global Member Services Technology, Analytics and Product at TechStyle Fashion Group)


How it works and who it’s for…

AWS Contact Center Intelligence solutions offer a variety of ways that organizations can quickly and cost-effectively add machine learning-based intelligence to their contact centers, via AWS pre-trained AI Services. AWS CCI is currently available through participating APN partners, and it is focused on three stages of the contact center workflow: Self-Service, Live Call Analytics and Agent Assist, and Post-Call Analytics. Let’s break each one of these up.

The Self-Service solution helps with creation of chatbots and ML-driven IVRs (Interactive voice response) to address the most common queries a contact center workforce often gets. This now allows actual call center employees to focus on higher value work. To implement this solution, you’ll want to work with either Amazon Lex and/or Amazon Kendra. The novelty of this solution is that Lex + Kendra not only fulfills transactional queries (i.e. book a hotel room or reset my password), but also addresses the long tail of customers questions whose answers live in enterprises knowledge systems. Before, these Q&A had to be hard coded in Lex, making it harder to implement and maintain. Today, you can implement this solution directly from your existing contact center platform with AWS CCI partners, such as Genesys.

The Live Call Analytics & Agent Assist solution enables the creation of real-time ML capabilities to increase staff productivity and engagement. Here, Amazon Transcribe is used to perform real-time speech transcription, while Amazon Comprehend can analyze interactions, detect the sentiment of the caller, and identify key words and phrases in the conversation. Amazon Translate can even be added to translate the conversation into a preferred language! Now, you can implement this solution directly from several leading contact center platforms with AWS CCI partners, like SuccessKPI.

The Post-Call Analytics solution is an automatic analysis of contact center conversations, which tend to leave actionable data for product and service feedback loops. Similar to live call analytics, this solution combines Amazon Transcribe to perform speech recognition and creates a high-quality text transcription of each call, with Amazon Comprehend to analyze the interaction. Amazon Translate can be added to translate the conversation into your preferred language, and Amazon Kendra can be used for contextual natural language queries. Today, you can implement this solution directly from several leading contact center platforms with AWS CCI partners, such as Acqueon.

AWS helps partners integrate these solutions into their products. Some solutions also have a Quick Start, which includes CloudFormation templates and deployment guide, to automate the deployments. The good news is that our AWS Partners landing pages will also provide additional implementation information specific to their products. 👌

Let’s see a demo…

For today’s post, we chose to focus on diving deeper into the Self-Service and Post-Call Analytics solutions, so let’s begin with Self-Service.

We have a public GitHub repository that has a complete Quick Start template plus a detailed deployment guide with architecture diagrams. (And the good news is that our APN partner landing pages will also reference this repo!)

This GitHub repo talks about the Amazon Lex chatbot integration with Amazon Kendra. The main idea here is that the customer can bring their own document repository through Amazon Kendra, which can be sourced through Amazon Lex when customers are interacting with this Lex chatbot.

The main thing we want to notice in this architecture is that customers can bring their existing documents and allow their chatbot to search that document whenever someone interacts with said chatbot. The architecture below assumes our docs are in an S3 bucket, but it’s worth noting that Amazon Kendra can integrate with multiple kinds of data sources. If using an S3 bucket, customers must provide their own S3 bucket name, the one that has their document repository. This is a prerequisite for deployment.

Let’s follow the instructions under the repo’s Deployment Steps, skipping ahead to Step #2, “Click Deploy to launch the CloudFormation template.”

Since this is a Quick Start template, you can see how everything is already filled out for us. We click Next and move on to Step 2, Specify stack details.

Notice how the S3 bucket section is blank. You can provide your own S3 bucket name if you want to test this out with your own docs. For today, I am going to use the S3 bucket name that was provided to us in the GitHub doc.

The next part to configure will be the Cross account role configuration section. For my demo, I will add my own AWS account ID under “Assuming Account ID.”

We click Next and move on to Step 3, Configure Stack options.

Nothing to configure here, so we can click Next again and move on to Step 4, Review. We click to accept these final acknowledgements and click Create Stack.

If we were to navigate over to our deployed AWS CloudFormation stacks, we can go to Outputs of this stack and see our Kendra index name and Lex bot name.

Now if we head over to Amazon Lex, we should be able to easily find our chatbot.

We click into it and we can see that our chatbot is ready. At this point, we can start interacting with it!

We can something like “Hi” for example.

Eventually we would also get a response that details the reply source. What this means is that it will tell you if this came from Amazon Lex or from Amazon Kendra and the documents we saved in our S3 bucket.


Live Call Analytics & Agent Assist
We have two public GitHub repositories for this solution too, and both have detailed deployment guide with architecture diagrams as well.

This GitHub repo provides us a code example and a fully functional AWS Lambda function to get you started with capturing and transcribing Amazon Chime Voice Connector phone calls using Amazon Kinesis Video Streams and Amazon Transcribe. This solution gives us the ability to see how to use AI and ML services to talk to the customer’s existent environment, to drive agent assistance or analytics. We can take a real-time voice feed, transcribe that information, and then use Amazon Comprehend to pull that information out to provide the key action and sentiment.

We now also provide the Chime SIP req connector (a chime component that allows you to connect voice over an IP compatible environment with Amazon voice services) to stream voice in Amazon Transcribe from virtually any contact center. Our partner Vonage can do the same through websocket.

👉🏽 Check out the GitHub developer docs:

And as we mentioned above, for today’s post, we chose to focus on diving deeper into the Self-Service and Post-Call Analytics solutions. So let’s move on to show an example for Post-Call Analytics.


Post-Call Analytics

We have a public GitHub repository for this solution too, with another complete Quick Start template and detailed deployment guide with architecture diagrams. This solution is used after the call has ended, so that our customers can review the analytics of those calls.

This GitHub repo talks about how to look for insights and information about calls that have already happened. We call this, Quality Management. We can use Amazon Transcribe and Amazon Comprehend to pull out key words, information, and data, in order to know how to better drive what is happening in our contact center calls. We can then review these insights on Amazon QuickSight.

Let’s look at the architecture diagram for this solution too. Our call recording gets stored in an S3 bucket, which is then picked up by a Lambda function which does a transcription using Amazon Transcribe. It puts the result in a different bucket and then that call’s metadata gets stored in DynamoDB. Now Amazon Comprehend can conduct text analysis on the call’s metadata, and stores the result in a Text analysis Output bucket. Eventually, QuickSight is used to provide dashboards showing the resulting call analytics.

Just like in the previous example, we move down to the Deployment steps section. Just like before, we have a pre-made CloudFormation template that is ready to be deployed.

Step 1, Specify template is good to go, so we click Next.

In Step 2, Specify stack details, something important to note is that the User Pool Domain Name must be globally unique.

We click Next and move on to Step 3, Configure Stack options. Nothing additional to configure here either, so we can click Next again and move on to Step 4, Review.

We click to accept these final acknowledgements and click Create Stack.

And if we were to navigate over to our deployed AWS CloudFormation stacks again, we can go to Outputs of this stack and see the PortalEndpoint key. After the stack creation has completed successfully, and portal website is available at CloudFront distribution endpoint. This key is what will allow us to find the portal URL.

We will need to have user created in Amazon Cognito for the next steps to work. (If you have never created one, visit this how-to guide.)

⚠ NOTE: Make sure to open the portal URL endpoint in a different Incognito Window as the portal attaches a QuickSight User Role that can interfere with your actual role.

We go to the portal URL and login with our created Cognito user. We’re prompted to change the temporary password and are eventually directed to the QuickSight homepage.

Now we want to upload the audio files of our calls and we can do so with the Upload button.

After successfully uploading our audio files, the audio processing will run through transcription and text analysis. At this point we can click on the Call Analytics logo in the top left of the Navigation Bar to return to home page.

Now we can drill down into a call to see Amazon Comprehend’s result of the call classifications and turn-by-turn sentiments.


🌎 Lastly…

Regional availability for AWS Contact Center Intelligence (CCI) solutions correspond to the underlying services (Amazon Comprehend, Amazon Kendra, Amazon Lex, Amazon Transcribe, Amazon Translate) used.

We are announcing AWS CCI availability with 12 APN partners: Genesys, UiPath, Vonage, Acqueon, SuccessKPI, and Inference Solutions (Technology partners), and Slalom, Onica/Rackspace, TensorIoT, Quantiphi, Accenture, and HGS Digital (Consulting partners).

Ready to get started? Contact one of the AWS CCI launch partners listed on the AWS CCI web page.


You may also want to see…

👉🏽AWS Quick Start links from post:


¡Gracias por tu tiempo!
~Alejandra 💁🏻‍♀️🤖 y Canela 🐾

AWS Architecture Monthly Magazine: Agriculture

Post Syndicated from Annik Stahl original https://aws.amazon.com/blogs/architecture/aws-architecture-monthly-magazine-agriculture/

Architecture Monthly Magazine cover - AgricultureIn this month’s issue of AWS Architecture Monthly, Worldwide Tech Lead for Agriculture, Karen Hildebrand (who’s also a fourth generation farmer) refers to agriculture as “the connective tissue our world needs to survive.” As our expert for August’s Agriculture issue, she also talks about what role cloud will play in future development efforts in this industry and why developing personal connections with our AWS agriculture customers is one of the most important aspects of our jobs.

You’ll also buzz through the world of high tech beehives, milk the information about data analytics-savvy cows, and see what the reference architecture of a Smart Farm looks like.

In August’s issue Agriculture issue

  • Ask an Expert: Karen Hildebrand, AWS WW Agriculture Tech Leader
  • Customer Success Story: Tine & Crayon: Revolutionizing the Norwegian Dairy Industry Using Machine Learning on AWS
  • Blog Post: Beewise Combines IoT and AI to Offer an Automated Beehive
  • Reference Architecture:Smart Farm: Enabling Sensor, Computer Vision, and Edge Inference in Agriculture
  • Customer Success Story: Farmobile: Empowering the Agriculture Industry Through Data
  • Blog Post: The Cow Collar Wearable: How Halter benefits from FreeRTOS
  • Related Videos: DuPont, mPrest & Netafirm, and Veolia

Survey opportunity

This month, we’re also asking you to take a 10-question survey about your experiences with this magazine. The survey is hosted by an external company (Qualtrics), so the below survey button doesn’t lead to our website. Please note that AWS will own the data gathered from this survey, and we will not share the results we collect with survey respondents. Your responses to this survey will be subject to Amazon’s Privacy Notice. Please take a few moments to give us your opinions.

How to access the magazine

We hope you’re enjoying Architecture Monthly, and we’d like to hear from you—leave us star rating and comment on the Amazon Kindle Newsstand page or contact us anytime at [email protected].