Tag Archives: Compute

Visualize user behavior with Auth0 and Amazon EventBridge

Post Syndicated from Benjamin Smith original https://aws.amazon.com/blogs/compute/visualize-user-behavior-with-auth0-and-amazon-eventbridge/

In this post, I show how to capture user events and monitor user behavior by using the Amazon EventBridge partner integration with Auth0. This enables you to gain insights to help deliver a more customized application experience for your users.

Auth0 is a flexible, drop-in solution that adds authentication and authorization services to your applications. The EventBridge integration automatically and continuously pushes Auth0 log events to your AWS account via a custom SaaS event bus.

The examples used in this post are implemented in a custom-built serverless application called FreshTracks. This is a demo application built in Vue.js, which I will use to demonstrate multiple SaaS integrations into AWS with EventBridge in this and future blog posts.

FreshTracks – a demo serverless web application with multiple SaaS Integrations.

The components for this EventBridge integration with Auth0 have been extracted into a separate example application in this GitHub repo.

How the application works

Routing Auth0 Events with Amazon EventBridge.

  1. Events are emitted from Auth0 when a user interacts with the login service on the front-end application.
  2. These events are streamed into a custom SaaS event bus in EventBridge.
  3. Event rules match events and send them downstream to a Lambda function target.
  4. The receiving Lambda function performs some data transformation before writing an object to S3.
  5. These objects are made available by a QuickSight data source manifest file and used as datapoints for QuickSight visuals.

Configuring the Auth0 EventBridge integration

To capture Auth0 emitted events in EventBridge, you must first configure Auth0 for use as the Event Source on your Auth0 Dashboard.

  1. Log in to the Auth0 Dashboard.
  2. Choose Logs > Streams.
  3. Choose + Create Stream.
  4. Choose Amazon EventBridge and enter a unique name for the new Amazon EventBridge Event Stream.
  5. Create the Event Source by providing your AWS Account ID and AWS Region. The Region you select must match the Region of the Amazon EventBridge bus.
  6. Choose Save.
Event Source Configuration on Auth0 dashboard

Auth0 provides you with an Event Source Name. Make sure to save this value, as you need it later to complete the integration.

Creating a custom event bus

  1. Go to the EventBridge partners tab in your AWS Management Console. Ensure the AWS Region matches where the Event Source was created.
  2. Paste the Event Source Name in the Partner event sources search box to find and choose the new Auth0 event source. Note: The Event Source remains in a pending state until it is associated with an event bus.

    Partner event source

  3. Choose the event source, then choose Associate with Event Bus.
  4. Choose Associate.

Deploying the application

Once you have associated the Event Source with a new partner event bus, you are ready to deploy backend services to receive and respond to these events.

To set up the example application, visit the GitHub repo and follow the instructions in the README.md file.

When deploying the application stack, make sure to provide the custom event bus name with --parameter-overrides.

sam deploy --parameter-overrides Auth0EventBusName=aws.partner/auth0.com/auth0username-0123344567-e5d2-4514-84f2-97dd4ff8aad0/auth0.logs

You can find the name of the new Auth0 custom event bus in the custom event bus section of the EventBridge console:

Custom event bus name

Routing events with rules

The AWS Serverless Application Model (SAM) template in the example application creates four event rules:

  1. Successful sign-in
  2. Successful signup
  3. Successful log-out
  4. Unsuccessful signup

These are defined with the `AWS::Events::Rule` resource type. Each of these rules is routed to a single target Lambda function. For a successful sign-in, the rule's event pattern matches on detail.data.type with the value "s", which is the Auth0 event type code for a successful sign-in. Every Auth0 event code is listed here.

SuccessfullSignIn: 
    Type: AWS::Events::Rule
    Properties: 
      Description: "Auth0 User Successfully signed in"
      EventBusName: 
         Ref: Auth0EventBusName
      EventPattern: 
        account:
        - !Sub '${AWS::AccountId}'
        detail:
          data:
            type:
            - s
      Targets: 
        - 
          Arn:
            Fn::GetAtt:
              - "SaveAuth0EventToS3" 
              - "Arn"
          Id: "SignInSuccessV1"

To respond to additional events, copy this event rule pattern and change the event code string for the event you want to match.

Writing events to S3 with Lambda

The application routes events to a Lambda function, which performs some data transformation before writing an object to S3. The function code uses an environment variable named AuthLogBucket to store the S3 bucket name. The permissions to write to S3 are granted by a policy defined within the SAM template:

  SaveAuth0EventToS3:
    Type: AWS::Serverless::Function 
    Properties:
      CodeUri: src/
      Handler: saveAuth0EventToS3.handler
      Runtime: nodejs12.x
      MemorySize: 128
      Environment:
        Variables:
          AuthLogBucket: !Ref AuthZeroToEventBridgeUserActivitylogs
      Policies:
        - S3CrudPolicy:
            BucketName: !Ref AuthZeroToEventBridgeUserActivitylogs

The S3 object is a CSV file with context about each event. Each of the Auth0 event schemas is different. To maintain a consistent CSV file structure across different event types, this Lambda function explicitly defines each of the header and row values. An output string is constructed from the Auth0 event:

Lambda function output string

This string is placed into a new buffer and written to S3 with the AWS SDK for JavaScript, as referenced in GitHub here.
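The function in the example repository is written in Node.js; as an illustration only, the following Python sketch shows the same kind of transformation with boto3. The field names used here (type, date, client_name, user_name) are assumptions for illustration, not the exact columns used by the example application:

import csv
import io
import os
from datetime import datetime

import boto3

s3 = boto3.client("s3")
BUCKET = os.environ["AuthLogBucket"]  # bucket name injected by the SAM template

def handler(event, context):
    # The Auth0 log entry arrives in the EventBridge envelope under detail.data
    data = event["detail"]["data"]

    # A fixed header keeps the CSV structure consistent across event types
    header = ["type", "date", "client_name", "user_name"]
    row = [data.get(field, "") for field in header]

    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerow(row)

    key = f"auth0/{datetime.utcnow().isoformat()}.csv"
    s3.put_object(Bucket=BUCKET, Key=key, Body=buf.getvalue().encode("utf-8"))
    return {"status": "written", "key": key}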

Sending events to the application

There is a test event in the /events directory of the example application. This contains an example of a successful sign-in event emitted from Auth0.

Sending a test Auth0 event to Lambda

Send a test event to the Lambda function using the AWS Command Line Interface.

Run the following command in the root directory of the example application, replacing {function-name} with the full name of your Lambda function.

aws lambda invoke --function-name {function-name} --invocation-type Event --payload file://events/event.json  events/response.json --log-type Tail

Response:

{  
 "StatusCode": 202
}

The response appears in the terminal window. To confirm that an object is stored in S3, navigate to the S3 console. Choose the AuthZeroToEventBridgeUserActivityLogs bucket. You see a new auth0 directory and can open the CSV file that holds context about the event.
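If you prefer to check from a script instead of the console, a minimal boto3 sketch like the following lists the new objects; the bucket name is a placeholder for the bucket created by your stack:

import boto3

s3 = boto3.client("s3")

# Replace with the bucket name created by your deployed stack
response = s3.list_objects_v2(Bucket="your-auth0-activity-logs-bucket", Prefix="auth0/")

for obj in response.get("Contents", []):
    print(obj["Key"], obj["Size"], obj["LastModified"])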

Object written to S3

Sending real Auth0 events from a front-end application

Follow the instructions in the Fresh Tracks repo on GitHub to deploy the front-end application. This application includes Auth0’s authentication flow. You can connect to your Auth0 application by entering your credentials in the `auth0_config.json` file:

{
  "domain": "<YOUR AUTH0 DOMAIN>",
  "clientId": "<YOUR AUTH0 CLIENT ID>"
}

The example backend application starts receiving Auth0 emitted events immediately.

To see the full Fresh Tracks application, continue to the backend deployment instructions. This is not required for the examples in this blog post.

Building a QuickSight dashboard

You can visualize these Auth0 user events with an Amazon QuickSight dashboard. This provides a snapshot analysis that you can share with other QuickSight users for reporting purposes.

To use Auth0 events as metrics, create a separate calculated field for each event type (for example, successful signup and successful login). An analysis can then combine multiple visuals, calculated fields, and conditional formatting to give a snapshot of user interaction with the front-end application at any given time.

FreshTracks final dashboard example

The example application in the GitHub repo provides instructions on how to create a dashboard.

Conclusion

This post explains how to set up EventBridge's third-party integration with Auth0 to capture events. The example backend application demonstrates how to filter these events, perform computations on them, save them as S3 objects, and send them to a downstream service.

The ability to build QuickSight storyboards from these events and share visuals with key business stakeholders can provide a narrative about the analysis data. This is implemented with minimal code, provides near real-time streaming of events, and adds no latency to your application.

The possibilities are vast. I am excited to see how builders use this serverless pattern to create their own visuals to build a better, more customized application experience for their users.

Start here to learn about other SaaS integrations with Amazon EventBridge.

10 things you can do today to reduce AWS costs

Post Syndicated from Roshni Pary original https://aws.amazon.com/blogs/compute/10-things-you-can-do-today-to-reduce-aws-costs/

This post is contributed by Shankar Ramachandran, SA Specialist, Cost Optimization

Introduction

AWS’s breadth of services and pricing options offers the flexibility to effectively manage your costs while still keeping the performance and capacity your business requires. While the fundamental process of cost optimization on AWS remains the same (monitor your AWS costs and usage, analyze the data to find savings, and take action to realize those savings), in this blog I take a more tactical approach to reducing cost in response to changes in user demand.

Before you start

Before taking any actions to reduce cost, find out the costs of AWS services you are consuming. The AWS Free Tier provides customers the ability to explore and try out AWS services free of charge up to specified limits for each service. Use the steps in this video to check if you are exceeding the free tier limit.

Next, use AWS Cost Explorer to view and analyze your AWS costs and usage. This tool provides default reports that help you visualize cost and usage at a high level (e.g. AWS accounts, AWS services) or at the resource level (e.g. EC2 instance ID). Start by identifying the top accounts that incur costs using the “Monthly costs by linked account” report. Next, identify the top services that contribute to the costs within those accounts using the “Monthly costs by service” report. Use hourly and resource-level granularity and tags to filter and identify the top resources incurring costs.

Cost Explorer resource level granularity
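You can also retrieve the same breakdown programmatically with the Cost Explorer API. The following boto3 sketch groups one month of unblended cost by service; the date range is an example you would adjust:

import boto3

ce = boto3.client("ce")

response = ce.get_cost_and_usage(
    TimePeriod={"Start": "2020-04-01", "End": "2020-05-01"},  # example month
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
)

for group in response["ResultsByTime"][0]["Groups"]:
    service = group["Keys"][0]
    amount = group["Metrics"]["UnblendedCost"]["Amount"]
    print(f"{service}: {float(amount):.2f} USD")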

Now, you should have an understanding of your AWS costs and usage. Next, let’s walk through 10 tactical things you can do today using existing AWS tools and services to reduce your AWS costs.

#1 Identify Amazon EC2 instances with low-utilization and reduce cost by stopping or rightsizing

Use AWS Cost Explorer Resource Optimization to get a report of EC2 instances that are either idle or have low utilization. You can reduce costs by either stopping or downsizing these instances. Use AWS Instance Scheduler to automatically stop instances. Use AWS Operations Conductor to automatically resize the EC2 instances (based on the recommendations report from Cost Explorer).

Cost Explorer rightsizing recommendations

Use AWS Compute Optimizer to look at instance type recommendations beyond downsizing within an instance family. It gives downsizing recommendations within or across instance families, upsizing recommendations to remove performance bottlenecks, and recommendations for EC2 instances that are parts of an Auto Scaling group.
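If you want to script a similar check yourself, the following boto3 sketch flags running instances whose average daily CPU utilization stayed below a threshold for the last 14 days. The 5% threshold is an example, and the stop call is commented out so you can review candidates first:

from datetime import datetime, timedelta

import boto3

ec2 = boto3.client("ec2")
cloudwatch = boto3.client("cloudwatch")

THRESHOLD = 5.0  # percent average CPU, an example cut-off
end = datetime.utcnow()
start = end - timedelta(days=14)

reservations = ec2.describe_instances(
    Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
)["Reservations"]

for reservation in reservations:
    for instance in reservation["Instances"]:
        instance_id = instance["InstanceId"]
        datapoints = cloudwatch.get_metric_statistics(
            Namespace="AWS/EC2",
            MetricName="CPUUtilization",
            Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
            StartTime=start,
            EndTime=end,
            Period=86400,
            Statistics=["Average"],
        )["Datapoints"]
        if datapoints and max(d["Average"] for d in datapoints) < THRESHOLD:
            print(f"{instance_id} is under {THRESHOLD}% CPU - candidate to stop or rightsize")
            # ec2.stop_instances(InstanceIds=[instance_id])  # uncomment to act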

#2 Identify Amazon EBS volumes with low-utilization and reduce cost by snapshotting then deleting them

EBS volumes that have very low activity (less than 1 IOPS per day) over a period of 7 days indicate that they are probably not in use. Identify these volumes using the Trusted Advisor Underutilized Amazon EBS Volumes Check. To reduce costs, first snapshot the volume (in case you need it later), then delete these volumes. You can automate the creation of snapshots using the Amazon Data Lifecycle Manager. Follow the steps here to delete EBS volumes.
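As a rough illustration of the snapshot-then-delete approach, the boto3 sketch below only targets unattached (“available”) volumes. It is a simplified stand-in for the Trusted Advisor check; confirm a volume is really unused before deleting it:

import boto3

ec2 = boto3.client("ec2")

# Unattached volumes are the safest candidates for this illustration
volumes = ec2.describe_volumes(
    Filters=[{"Name": "status", "Values": ["available"]}]
)["Volumes"]

for volume in volumes:
    volume_id = volume["VolumeId"]
    snapshot = ec2.create_snapshot(
        VolumeId=volume_id,
        Description=f"Backup of {volume_id} before deletion",
    )
    # Wait for the snapshot to finish before removing the volume
    ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snapshot["SnapshotId"]])
    ec2.delete_volume(VolumeId=volume_id)
    print(f"Snapshotted and deleted {volume_id}")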

#3 Analyze Amazon S3 usage and reduce cost by leveraging lower cost storage tiers

Use S3 Analytics to analyze storage access patterns on the object data set for 30 days or longer. It makes recommendations on where you can leverage S3 Standard-Infrequent Access (S3 Standard-IA) to reduce costs. You can automate moving these objects into a lower-cost storage tier using lifecycle policies. Alternatively, you can use S3 Intelligent-Tiering, which automatically analyzes and moves your objects to the appropriate storage tier.
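Lifecycle rules can also be applied programmatically. The boto3 sketch below transitions all objects in a bucket to S3 Standard-IA after 30 days; the bucket name and the 30-day threshold are placeholders:

import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="your-bucket-name",  # placeholder
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "transition-to-standard-ia",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # apply to the whole bucket
                "Transitions": [{"Days": 30, "StorageClass": "STANDARD_IA"}],
            }
        ]
    },
)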

#4 Identify Amazon RDS, Amazon Redshift instances with low utilization and reduce cost by stopping (RDS) and pausing (Redshift)

Use the Trusted Advisor Amazon RDS Idle DB Instances check to identify DB instances that have not had any connection over the last 7 days. To reduce costs, stop these DB instances using the automation steps described in this blog post. For Redshift, use the Trusted Advisor Underutilized Redshift Clusters check to identify clusters that have had no connections for the last 7 days and less than 5% cluster-wide average CPU utilization for 99% of the last 7 days. To reduce costs, pause these clusters using the steps in this blog.

Pause underutilized Redshift clusters
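If you prefer to script these actions rather than click through the console, the equivalent boto3 calls look like the following sketch; the identifiers are placeholders:

import boto3

rds = boto3.client("rds")
redshift = boto3.client("redshift")

# Stop an idle RDS DB instance (it can be started again later)
rds.stop_db_instance(DBInstanceIdentifier="my-idle-db-instance")  # placeholder

# Pause an underutilized Redshift cluster
redshift.pause_cluster(ClusterIdentifier="my-underutilized-cluster")  # placeholder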

#5 Analyze Amazon DynamoDB usage and reduce cost by leveraging Autoscaling or On-demand

Analyze your DynamoDB usage by monitoring two metrics, ConsumedReadCapacityUnits and ConsumedWriteCapacityUnits, in CloudWatch. To automatically scale (in and out) your DynamoDB table, use the auto scaling feature. Using the steps here, you can enable auto scaling on your existing tables. Alternatively, you can use the on-demand option. This option allows you to pay per request for reads and writes so that you only pay for what you use, making it easy to balance costs and performance.

DynamoDB read/write capacity mode
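For reference, both options can be enabled with a few boto3 calls. The sketch below registers auto scaling on a table's read capacity and, as an alternative, switches another table to on-demand billing; table names and capacity bounds are placeholders:

import boto3

autoscaling = boto3.client("application-autoscaling")
dynamodb = boto3.client("dynamodb")

# Option A: register the table's read capacity as a scalable target
autoscaling.register_scalable_target(
    ServiceNamespace="dynamodb",
    ResourceId="table/MyTable",  # placeholder
    ScalableDimension="dynamodb:table:ReadCapacityUnits",
    MinCapacity=5,
    MaxCapacity=100,
)
autoscaling.put_scaling_policy(
    PolicyName="MyTableReadScaling",
    ServiceNamespace="dynamodb",
    ResourceId="table/MyTable",
    ScalableDimension="dynamodb:table:ReadCapacityUnits",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 70.0,  # keep consumed capacity around 70% of provisioned
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "DynamoDBReadCapacityUtilization"
        },
    },
)

# Option B: switch a table to on-demand (pay-per-request) billing
dynamodb.update_table(TableName="MyOtherTable", BillingMode="PAY_PER_REQUEST")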

#6 Review networking and reduce costs by deleting idle load balancers

Use the Trusted Advisor Idle Load Balancers check to get a report of load balancers that have a RequestCount of less than 100 over the past 7 days. Then, use the steps here to delete these load balancers and reduce costs. Additionally, use the steps provided in this blog to review your data transfer costs using Cost Explorer.
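The idle load balancer check can also be approximated in a script. The boto3 sketch below sums RequestCount over the past 7 days for each Application Load Balancer and flags those under 100 requests; the delete call is commented out on purpose so you can review the list first:

from datetime import datetime, timedelta

import boto3

elbv2 = boto3.client("elbv2")
cloudwatch = boto3.client("cloudwatch")

end = datetime.utcnow()
start = end - timedelta(days=7)

for lb in elbv2.describe_load_balancers()["LoadBalancers"]:
    if lb.get("Type") != "application":
        continue  # RequestCount applies to Application Load Balancers
    # The CloudWatch dimension is the portion of the ARN after "loadbalancer/"
    dimension = lb["LoadBalancerArn"].split("loadbalancer/")[1]
    datapoints = cloudwatch.get_metric_statistics(
        Namespace="AWS/ApplicationELB",
        MetricName="RequestCount",
        Dimensions=[{"Name": "LoadBalancer", "Value": dimension}],
        StartTime=start,
        EndTime=end,
        Period=604800,
        Statistics=["Sum"],
    )["Datapoints"]
    requests = sum(d["Sum"] for d in datapoints)
    if requests < 100:
        print(f"{lb['LoadBalancerName']}: {requests:.0f} requests in 7 days - idle candidate")
        # elbv2.delete_load_balancer(LoadBalancerArn=lb["LoadBalancerArn"])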

Cost Explorer filters for EC2 Data Transfer

If data transfer from EC2 to the public internet shows up as a significant cost, consider using Amazon CloudFront. Any image, video, or static web content can be cached at AWS edge locations worldwide, using the Amazon CloudFront Content Delivery Network (CDN). CloudFront eliminates the need to over-provision capacity in order to serve potential spikes in traffic.

#7 Use Amazon EC2 Spot Instances to reduce EC2 costs

If your workload is fault-tolerant, use Spot Instances to reduce costs by up to 90%. Typical workload examples include big data, containerized workloads, CI/CD, web servers, high-performance computing (HPC), and other test and development workloads. Using EC2 Auto Scaling, you can launch both On-Demand and Spot Instances to meet a target capacity. Auto Scaling automatically takes care of requesting Spot Instances and attempts to maintain the target capacity even if your Spot Instances are interrupted. You can learn more about Spot by watching this 2019 re:Invent session.
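As a sketch of what this looks like in practice, the boto3 call below creates an Auto Scaling group with a mixed instances policy that keeps a small On-Demand base and fills the rest with Spot capacity. The launch template, subnets, instance types, and percentages are placeholders to adapt:

import boto3

autoscaling = boto3.client("autoscaling")

autoscaling.create_auto_scaling_group(
    AutoScalingGroupName="spot-web-fleet",  # placeholder
    MinSize=2,
    MaxSize=10,
    VPCZoneIdentifier="subnet-aaaa1111,subnet-bbbb2222",  # placeholder subnets
    MixedInstancesPolicy={
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateName": "web-launch-template",  # placeholder
                "Version": "$Latest",
            },
            # Diversify across instance types to reduce interruption impact
            "Overrides": [
                {"InstanceType": "m5.large"},
                {"InstanceType": "m5a.large"},
                {"InstanceType": "m4.large"},
            ],
        },
        "InstancesDistribution": {
            "OnDemandBaseCapacity": 1,
            "OnDemandPercentageAboveBaseCapacity": 25,  # 75% of extra capacity on Spot
            "SpotAllocationStrategy": "capacity-optimized",
        },
    },
)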

#8 Review and modify EC2 AutoScaling Groups configuration

An EC2 Auto Scaling group allows your EC2 fleet to expand or shrink based on demand. Review your scaling activity using either the describe-scaling-activities CLI command or the console, following the steps described here. Analyze the results to see if the scaling policy can be tuned to add instances less aggressively. Also review your settings to see if the minimum size can be reduced so that end-user requests are still served, but with a smaller fleet.
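If you prefer to do this review in code rather than the console, a boto3 sketch like the following lists recent scaling activity and lowers the group's minimum size; the group name and the new minimum are examples:

import boto3

autoscaling = boto3.client("autoscaling")
GROUP = "my-asg"  # placeholder

# Review recent scaling activity for the group
activities = autoscaling.describe_scaling_activities(AutoScalingGroupName=GROUP)
for activity in activities["Activities"][:10]:
    print(activity["StartTime"], activity["Description"])

# If the fleet is consistently over-provisioned, reduce the minimum size
autoscaling.update_auto_scaling_group(AutoScalingGroupName=GROUP, MinSize=1)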

#9 Use Reserved Instances (RI) to reduce RDS, Redshift, ElastiCache and Elasticsearch costs

Use one-year, no-upfront RIs to get a discount of up to 42% compared to On-Demand pricing. Use the recommendations provided in AWS Cost Explorer RI purchase recommendations, which are based on your RDS, Redshift, ElastiCache, and Elasticsearch usage. Make sure you adjust the parameters to one year, no upfront. This requires a one-year commitment, but the break-even point is typically seven to nine months. I recommend doing #4 before #9.
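You can also pull these recommendations programmatically through the Cost Explorer API. The boto3 sketch below requests one-year, no-upfront RI recommendations for RDS; swap the Service value to get Redshift, ElastiCache, or Elasticsearch recommendations:

import boto3

ce = boto3.client("ce")

response = ce.get_reservation_purchase_recommendation(
    Service="Amazon Relational Database Service",
    TermInYears="ONE_YEAR",
    PaymentOption="NO_UPFRONT",
    LookbackPeriodInDays="THIRTY_DAYS",
)

for recommendation in response.get("Recommendations", []):
    for detail in recommendation.get("RecommendationDetails", []):
        print(
            detail.get("InstanceDetails"),
            detail.get("RecommendedNumberOfInstancesToPurchase"),
            detail.get("EstimatedMonthlySavingsAmount"),
        )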

#10 Use Compute Savings Plans to reduce EC2, Fargate and Lambda costs

Compute Savings Plans automatically apply to EC2 instance usage regardless of instance family, size, AZ, Region, OS, or tenancy, and also apply to Fargate and Lambda usage. Use one-year, no-upfront Compute Savings Plans to get a discount of up to 54% compared to On-Demand pricing. Use the recommendations provided in AWS Cost Explorer, and ensure that you have chosen the compute, one-year, no-upfront options. Once you sign up for Savings Plans, your compute usage is automatically charged at the discounted Savings Plans prices. Any usage beyond your commitment is charged at regular On-Demand rates. I recommend doing #1 before #10.

Savings Plans Recommendations

What’s Next

With these 10 steps, you can save costs on EC2, Fargate, Lambda, EBS, S3, ELB, RDS, Redshift, DynamoDB, ElastiCache, and Elasticsearch. I recommend setting up a budget using AWS Budgets so that you get alerted when your cost and usage changes.

Set a budget to track your AWS cost and usage

 

Setup an alert on forecasted costs

Using Budgets, you can also set up an alert on forecasted costs (in addition to actual costs). This gives you the ability to get ahead of the problem and reduce costs proactively.
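A budget with a forecasted-cost alert can also be created with a few lines of boto3; the account ID, amount, threshold, and email address below are placeholders:

import boto3

budgets = boto3.client("budgets")

budgets.create_budget(
    AccountId="123456789012",  # placeholder account ID
    Budget={
        "BudgetName": "monthly-cost-budget",
        "BudgetLimit": {"Amount": "1000", "Unit": "USD"},  # example limit
        "TimeUnit": "MONTHLY",
        "BudgetType": "COST",
    },
    NotificationsWithSubscribers=[
        {
            "Notification": {
                "NotificationType": "FORECASTED",  # alert on forecasted, not just actual, cost
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": 80.0,
                "ThresholdType": "PERCENTAGE",
            },
            "Subscribers": [{"SubscriptionType": "EMAIL", "Address": "you@example.com"}],
        }
    ],
)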

Conclusion

To learn more quick cost optimization techniques, watch the on-demand webinar – Nine Ways to Reduce Your AWS Bill. We are here to help you, please reach out to AWS Support and your AWS account team if you need further assistance in optimizing your AWS environment.

Building a Raspberry Pi telepresence robot using serverless: Part 1

Post Syndicated from Moheeb Zara original https://aws.amazon.com/blogs/compute/building-a-raspberry-pi-telepresence-robot-using-serverless-part-1/

A Pimoroni STS-Pi Robot Kit connected to AWS for remote control and viewing.

A telepresence robot allows you to explore remote environments from the comfort of your home through live stream video and remote control. These types of robots can improve the lives of the disabled, elderly, or those that simply cannot be with their coworkers or loved ones in person. Some are used to explore off-world terrain and others for search and rescue.

This guide walks through building a simple telepresence robot using a Pimoroni STS-PI Raspberry Pi robot kit. A Raspberry Pi is a small low-cost device that runs Linux. Add-on modules for Raspberry Pi are called “hats”. You can substitute this kit with any mobile platform that uses two motors wired to an Adafruit Motor Hat or a Pimoroni Explorer Hat.

The sample serverless application uses AWS Lambda and Amazon API Gateway to create a REST API for driving the robot. A Python application running on the robot uses AWS IoT Core to receive drive commands and authenticate with Amazon Kinesis Video Streams with WebRTC using an IoT Credentials Provider. In the next blog I walk through deploying a web frontend to both view the livestream and control the robot via the API.

Prerequisites

You need the following to complete the project:

A Pimoroni STS-Pi robot kit, Explorer Hat, Raspberry Pi, camera, and battery.

Estimated Cost: $120

There are three major parts to this project. First deploy the serverless backend using the AWS Serverless Application Repository. Then assemble the robot and run an installer on the Raspberry Pi. Finally, configure and run the Python application on the robot to confirm it can be driven through the API and is streaming video.

Deploy the serverless application

In this section, use the Serverless Application Repository to deploy the backend resources for the robot. The resources to deploy are defined using the AWS Serverless Application Model (SAM), an open-source framework for building serverless applications using AWS CloudFormation. For a deeper understanding of how this application is built, look at the SAM template in the GitHub repository.

An architecture diagram of the AWS IoT and Amazon Kinesis Video Stream resources of the deployed application.

The Python application that runs on the robot requires permissions to connect as an IoT Thing and subscribe to messages sent to a specific topic on the AWS IoT Core message broker. The following policy is created in the SAM template:

RobotIoTPolicy:
      Type: "AWS::IoT::Policy"
      Properties:
        PolicyName: !Sub "${RobotName}Policy"
        PolicyDocument:
          Version: "2012-10-17"
          Statement:
            - Effect: Allow
              Action:
                - iot:Connect
                - iot:Subscribe
                - iot:Publish
                - iot:Receive
              Resource:
                - !Sub "arn:aws:iot:*:*:topicfilter/${RobotName}/action"
                - !Sub "arn:aws:iot:*:*:topic/${RobotName}/action"
                - !Sub "arn:aws:iot:*:*:topic/${RobotName}/telemetry"
                - !Sub "arn:aws:iot:*:*:client/${RobotName}"

To transmit video, the Python application runs the amazon-kinesis-video-streams-webrtc-sdk-c sample in a subprocess. Instead of using separate credentials to authenticate with Kinesis Video Streams, a Role Alias policy is created so that IoT credentials can be used.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "iot:Connect",
        "iot:AssumeRoleWithCertificate"
      ],
      "Resource": "arn:aws:iot:Region:AccountID:rolealias/robot-camera-streaming-role-alias",
      "Effect": "Allow"
    }
  ]
}

When the above policy is attached to a certificate associated with an IoT Thing, it can assume the following role:

 KVSCertificateBasedIAMRole:
      Type: 'AWS::IAM::Role'
      Properties:
        AssumeRolePolicyDocument:
          Version: '2012-10-17'
          Statement:
          - Effect: 'Allow'
            Principal:
              Service: 'credentials.iot.amazonaws.com'
            Action: 'sts:AssumeRole'
        Policies:
        - PolicyName: !Sub "KVSIAMPolicy-${AWS::StackName}"
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
            - Effect: Allow
              Action:
                - kinesisvideo:ConnectAsMaster
                - kinesisvideo:GetSignalingChannelEndpoint
                - kinesisvideo:CreateSignalingChannel
                - kinesisvideo:GetIceServerConfig
                - kinesisvideo:DescribeSignalingChannel
              Resource: "arn:aws:kinesisvideo:*:*:channel/${credentials-iot:ThingName}/*"

This role grants access to connect and transmit video over WebRTC using the Kinesis Video Streams signaling channel deployed by the serverless application.

An architecture diagram of the API endpoint in the deployed application.

A deployed API Gateway endpoint, when called with valid JSON, invokes a Lambda function that publishes to an IoT message topic, RobotName/action. The Python application on the robot subscribes to this topic and drives the motors based on any received message that maps to a command.
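As an illustration of that subscription loop, here is a minimal Python sketch using the AWSIoTPythonSDK that the installer sets up. It is not the repository's code: the endpoint and robot name are placeholders, the drive() helper stands in for the motor-driver calls, and the certificate paths follow the directory layout shown later in this post:

import json
import time

from AWSIoTPythonSDK.MQTTLib import AWSIoTMQTTClient

ROBOT_NAME = "myrobot"  # placeholder, set to your RobotName parameter

def drive(direction):
    # Placeholder: call the Explorer HAT or Motor HAT library here
    print(f"Driving {direction}")

def on_message(client, userdata, message):
    command = json.loads(message.payload)
    action = command.get("action")
    if action in ("forward", "backwards", "left", "right"):
        drive(action)

mqtt = AWSIoTMQTTClient(ROBOT_NAME)
mqtt.configureEndpoint("YOUR_IOT_ENDPOINT.iot.us-east-1.amazonaws.com", 8883)  # placeholder
mqtt.configureCredentials(
    "/home/pi/Projects/robot/certs/cacert.pem",
    "/home/pi/Projects/robot/certs/private.pem.key",
    "/home/pi/Projects/robot/certs/certificate.pem",
)
mqtt.connect()
mqtt.subscribe(f"{ROBOT_NAME}/action", 1, on_message)

# Keep the process alive while messages arrive
while True:
    time.sleep(1)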

  1. Navigate to the aws-serverless-telepresence-robot application in the Serverless Application Repository.
  2. Choose Deploy.
  3. On the next page, under Application Settings, fill out the parameter, RobotName.
  4. Choose Deploy.
  5. Once complete, choose View CloudFormation Stack.
  6. Select the Outputs tab. Copy the ApiURL and the EndpointURL for use when configuring the robot.

Create and download the AWS IoT device certificate

The robot requires an AWS IoT root CA (fetched by the install script), certificate, and private key to authenticate with AWS IoT Core. The certificate and private key are not created by the serverless application since they can only be downloaded on creation. Create a new certificate and attach the IoT policy and Role Alias policy deployed by the serverless application.

  1. Navigate to the AWS IoT Core console.
  2. Choose Manage, Things.
  3. Choose the Thing that corresponds with the name of the robot.
  4. Under Security, choose Create certificate.
  5. Choose Activate.
  6. Download the Private Key and Thing Certificate. Save these securely, as this is the only time you can download this certificate.
  7. Choose Attach Policy.
  8. Two policies are created and must be attached. From the list, select
    <RobotName>Policy
    AliasPolicy-<AppName>
  9. Choose Done.

Flash an operating system to an SD card

The Raspberry Pi single-board Linux computer uses an SD card as the main file system storage. Raspbian Buster Lite is an officially supported Debian Linux operating system that must be flashed to an SD card. Balena.io has created an application called balenaEtcher for the sole purpose of accomplishing this safely.

  1. Download the latest version of Raspbian Buster Lite.
  2. Download and install balenaEtcher.
  3. Insert the SD card into your computer and run balenaEtcher.
  4. Choose the Raspbian image. Choose Flash to burn the image to the SD card.
  5. When flashing is complete, balenaEtcher dismounts the SD card.

Configure Wi-Fi and SSH headless

Typically, a keyboard and monitor are used to configure Wi-Fi or to access the command line on a Raspberry Pi. Since it is on a mobile platform, configure the Raspberry Pi to connect to a Wi-Fi network and enable remote access headless by adding configuration files to the SD card.

  1. Re-insert the SD card into your computer so that it shows as a volume named boot.
  2. Create a file in the boot volume of the SD card named wpa_supplicant.conf.
  3. Paste in the following contents, substituting your Wi-Fi credentials.
    ctrl_interface=DIR=/var/run/wpa_supplicant GROUP=netdev
    update_config=1
    country=<Insert country code here>

    network={
     ssid="<Name of your WiFi>"
     psk="<Password for your WiFi>"
    }

  4. Create an empty file without a file extension in the boot volume named ssh. At boot, the Raspbian operating system looks for this file and enables remote access if it exists. This can be done from a command line:
    cd path/to/volume/boot
    touch ssh

  5. Safely eject the SD card from your computer.

Assemble the robot

For this section, you can use the Pimoroni STS-Pi robot kit with a Pimoroni Explorer Hat, along with a Raspberry Pi Model 3 B+ or newer, and a camera module. Alternatively, you can use any two motor robot platform that uses the Explorer Hat or Adafruit Motor Hat.

  1. Follow the instructions in this video to assemble the Pimoroni STS-Pi robot kit.
  2. Place the SD card in the Raspberry Pi.
  3. Since the installation may take some time, power the Raspberry Pi using a USB 5V power supply connected to a wall plug rather than a battery.

Connect remotely using SSH

Use your computer to gain remote command line access of the Raspberry Pi using SSH. Both devices must be on the same network.

  1. Open a terminal application with SSH installed. It is already built into Linux and macOS; to enable SSH on Windows, follow these instructions.
  2. Enter the following to begin a secure shell session as user pi on the default local hostname raspberrypi, which resolves to the IP address of the device using mDNS:
    ssh pi@raspberrypi.local
  3. If prompted to add an SSH key to the list of known hosts, type yes.
  4. When prompted for a password, type raspberry. This is the default password and can be changed using the raspi-config utility.
  5. Upon successful login, you now have shell access to your Raspberry Pi device.

Enable the camera using raspi-config

A built-in utility, raspi-config, provides an easy to use interface for configuring Raspbian. You must enable the camera module, along with I2C, a serial bus used for communicating with the motor driver.

  1. In an open SSH session, type the following to open the raspi-config utility:
    sudo raspi-config

  2. Using the arrows, choose Interfacing Options.
  3. Choose Camera. When prompted, choose Yes to enable the camera module.
  4. Repeat the process to enable the I2C interface.
  5. Select Finish and reboot.

Run the install script

An installer script is provided for building and installing the Kinesis Video Stream WebRTC producer, AWSIoTPythonSDK and Pimoroni Explorer Hat Python libraries. Upon completion, it creates a directory with the following structure:

├── /home/pi/Projects/robot
│  └── main.py // The main Python application
│  └── config.json // Parameters used by main.py
│  └── kvsWebrtcClientMasterGstSample //Kinesis Video Stream producer
│  └── /certs
│     └── cacert.pem // Amazon SFSRootCAG2 Certificate Authority
│     └── certificate.pem // AWS IoT certificate placeholder
│     └── private.pem.key // AWS IoT private key placeholder
  1. Open an SSH session on the Raspberry Pi.
  2. (Optional) If using the Adafruit Motor Hat, run this command; otherwise, the script defaults to the Pimoroni Explorer Hat.
    export MOTOR_DRIVER=adafruit  

  3. Run the following command to fetch and execute the installer script.
    wget -O - https://raw.githubusercontent.com/aws-samples/aws-serverless-telepresence-robot/master/scripts/install.sh | bash

  4. While the script installs, proceed to the next section.

Configure the code

The Python application on the robot subscribes to AWS IoT Core to receive messages. It requires the certificate and private key created for the IoT thing to authenticate. These files must be copied to the directory where the Python application is stored on the Raspberry Pi.

It also requires that the IoT credentials endpoint be added to the config.json file in order to assume the permissions necessary to transmit video to Amazon Kinesis Video Streams.

  1. Open an SSH session on the Raspberry Pi.
  2. Open the certificate.pem file with the nano text editor and paste in the contents of the certificate downloaded earlier.
    cd /home/pi/Projects/robot/certs
    nano certificate.pem

  3. Press CTRL+X and then Y to save the file.
  4. Repeat the process with the private.pem.key file.
    nano private.pem.key

  5. Open the config.json file.
    cd /home/pi/Projects/robot
    nano config.json

  6. Provide the following information:
    IOT_THINGNAME: The name of your robot, as set in the serverless application.
    IOT_CORE_ENDPOINT: This is found under the Settings page in the AWS IoT Core console.
    IOT_GET_CREDENTIAL_ENDPOINT: Provided by the serverless application.
    ROLE_ALIAS: This is already set to match the Role Alias deployed by the serverless application.
    AWS_DEFAULT_REGION: Corresponds to the Region the application is deployed in.
  7. Save the file using CTRL+X and Y.
  8. To start the robot, run the command:
    python3 main.py

  9. To stop the script, press CTRL+C.

View the Kinesis video stream

The following steps create a WebRTC connection with the robot to view the live stream.

  1. Navigate to the Amazon Kinesis Video Streams console.
  2. Choose Signaling channels from the left menu.
  3. Choose the channel that corresponds with the name of your robot.
  4. Open the Media Playback card.
  5. After a moment, a WebRTC peer to peer connection is negotiated and live video is displayed.
    An animated gif demonstrating a live video stream from the robot.

Sending drive commands

The serverless backend includes an Amazon API Gateway REST endpoint that publishes JSON messages to the Python script on the robot.

The robot expects a message:

{ "action": "<direction>" }

Where <direction> can be "forward", "backwards", "left", or "right".

  1. While the Python script is running on the robot, open another terminal window.
  2. Run this command to tell the robot to drive forward. Replace <API-URL> using the endpoint listed under Outputs in the CloudFormation stack for the serverless application.
    curl -d '{"action":"forward"}' -H "Content-Type: application/json" -X POST https://<API-URL>/publish

    An animated gif demonstrating the robot being driven from a REST request.

Conclusion

In this post, I show how to build and program a telepresence robot with remote control and a live video feed in the cloud. I did this by installing a Python application on a Raspberry Pi robot and deploying a serverless application.

The Python application uses AWS IoT credentials to receive remote commands from the cloud and transmit live video using Kinesis Video Streams with WebRTC. The serverless application deploys a REST endpoint using API Gateway and a Lambda function. Any application that can connect to the endpoint can drive the robot.

In part two, I build on this project by deploying a web interface for the robot using AWS Amplify.

A preview of the web frontend built in the next blog.

 

 

Building a Photo Diary Ghost on Amazon Lightsail

Post Syndicated from Emma White original https://aws.amazon.com/blogs/compute/building-a-photo-diary-ghost-on-amazon-lightsail/

This post was written by Robert Zhu, a principal technical evangelist at AWS and a member of the GraphQL Working Group. 

Ghost is a simple and flexible alternative to WordPress. With Ghost, you can build a beautiful company website, personal portfolio, photo diary, or anything in between.

In this post, I show you how to start a photo diary using Ghost on Amazon Lightsail, AWS’ easiest solution for hosting virtual private servers. Compared to Amazon EC2, Amazon Lightsail takes care of advanced concepts like VPCs, Security Groups, and IAM policies for you until you want to re-engage those services. Amazon Lightsail also bundles monthly network transfer with the instance, whereas Amazon EC2 charges separately for network transfer.

There are two easy ways to get up and running with Ghost on Lightsail:

1. Using the Ghost Blueprint

Lightsail includes a number of common instance images known as “Blueprints.” When you launch a Blueprint, the selected software is installed and preconfigured, along with any dependencies. This is especially handy for Ghost because it saves you from having to install and configure MySQL. You can find our click-to-launch stacks in the Amazon Lightsail console, as shown in the following image.

Lightsail has pre-built Blueprints for you to launch entire applications with your VPS.

To launch a Blueprint instance, navigate to the Lightsail console. Then, select “Apps + OS” on the “Create Instance” page, and choose “Ghost.” Select the $5/month instance (this is the cheapest instance that meets the minimum requirements for running Ghost and its database on a single box).

 

Once the instance is up and running, find its IP address and open it in your browser. It may take a few minutes for Ghost to start running, during which you may see an error in the browser. As you can see, it is easy to start running Ghost on Amazon Lightsail with the Blueprint.

 

2. Running Ghost in a Container

As an alternative to launching a Ghost Blueprint, you can also run Ghost within a Docker container. In order to run Ghost within a container, you must install Docker on the Lightsail instance. Compared to the Blueprint approach, there are two advantages:

  1. Ability to run other applications on your Lightsail instance, such as forums, static websites, APIs, etc.
  2. Painless upgrades to new Ghost versions. When a new version of Ghost is released, you can upgrade with just a single command.

To run Ghost in a container complete the following steps.

  1. Create a Lightsail Instance (select Ubuntu 18.04).
  2. Connect to it using Lightsail’s browser-based SSH client. Once the instance is up and running and you are connected, run the following commands to install Docker on your Lightsail instance:

sudo apt-get update && sudo apt-get install docker.io -y

sudo systemctl start docker

sudo systemctl enable docker

  3. To make sure Docker is installed and running, run:

sudo docker run hello-world 

You should see the “Hello from Docker!” welcome message.

  4. Start the Ghost container by running:

sudo mkdir /var/lib/ghost

sudo mkdir /var/lib/ghost/content

sudo docker run -d --name blog -p 80:2368 -v /var/lib/ghost/content:/var/lib/ghost/content ghost

The last command runs the “ghost” container from the public Docker registry. It also maps port 80 (HTTP) on the host to port 2368 inside the container where Ghost is listening for HTTP requests. The -v argument maps a path on the host to a path inside the container. This way, any blog content is persisted into /var/lib/ghost/content and doesn’t disappear when you stop/delete/upgrade the container. The -d argument tells Docker to run the container in detached mode.

 

Now you can test your new blog.

  1. Find the public IP address of your Lightsail instance by selecting Manage from the instance menu.
  2. Paste the IP address into a browser, and you should see your brand new Ghost blog being served from the IP address of your VPS.

 

Installing a Custom Theme

You can customize the look and feel of your Ghost blog by installing a custom theme. There are plenty of free and paid themes online. If you’re a designer, you can even build your own theme from scratch.

For my photo diary, I use the London theme. Download the theme by clicking “Clone or download” > “Download ZIP.”

To download the London theme from Github, click on "Clone or download" and then click "Download ZIP".

 

Next, we need to log into the Ghost admin console. If you launched your Ghost instance using the Lightsail blueprint, follow these steps:

  1. SSH into your Ghost instance
  2. You should see the Bitnami welcome message and a bitnami_credentials file under your home directory.
  3. Display the default admin user name and password by typing cat bitnami_credentials.
  4. Use your credentials to administer your ghost instance at http://INSTANCEIPADDRESS/ghost.

If you launched Ghost within a Docker container, then you can administer Ghost by going to http://INSTANCEIPADDRESS/admin.

Once you are logged into the Ghost admin console, click on Settings > Design > Upload a theme.

In the left menu, click on settings, design, then click "upload a theme" on the main page.

You then see a pop-up prompt to upload the theme file. Drag and drop the theme file you downloaded above. Once the upload is complete, click “activate.”

The london theme is perfect for showcasing your photographs

Refresh your ghost blog IP address in another tab, and you should see a bold, new look and feel.

The Ghost admin console is also used for managing content, plugins, users, and more. For example, you can use header and footer scripts to inject code to add features like analytics and comments. By combining these features, you can make your Ghost blog stand out and fit your workflow.

Managing DNS with Lightsail

If you want to add a custom domain for your blog, you can manage that domain within Lightsail. By managing all your DNS records in one place, you can simplify your workflows.

For example, Lightsail offers the ability to assign Static IP addresses to instances, and then automatically associate an A record to a static IP address. This is a quality-of-life improvement that helps you avoid typing the wrong IP address.

Here are the steps to manage your domain with Lightsail:

  1. Register your domain with a domain registrar, such as Route 53 or GoDaddy.
  2. Once registered, create a DNS Zone within Lightsail and note the name servers.
  3. In your domain registrar, replace the domain name servers with your Lightsail DNS Zone name servers. If you’re using Route 53, do not replace the name servers in the default hosted zone. Here’s a video that makes everything clear.
  4. After creating your DNS Zone in Lightsail, you can now associate A (address) records with your Lightsail instances.

Once you have a custom domain, you’ll want to consider adding SSL. Here are step-by-step instructions for issuing an SSL certificate using Let’s Encrypt and terminating SSL using NGINX.

Conclusion

While Ghost is already a very flexible platform, the Ghost community constantly adds new plugins, themes, extensions, and features. Hosting Ghost yourself is a great way to learn some basic server workflows and get the most out of your Ghost blog. And with Lightsail, it’s simple and cheap! Please feel free to contact me with comments and questions. Go ahead and start running Ghost on Amazon Lightsail.

 

About the Author: 
Robert Zhu is a Principal Developer Evangelist at Amazon Web Services. He focuses on APIs, web, mobile, and gaming. Prior to joining AWS, he worked on GraphQL at Facebook. While at Microsoft, he worked on the .NET Framework, Windows Server, and Microsoft Game Studios. In his spare time, he loves learning about history, economics, and psychology. You can reach him @rbzhu on Twitter or directly via telepathy.

Enabling job accounting for HPC with AWS ParallelCluster and Amazon RDS

Post Syndicated from Emma White original https://aws.amazon.com/blogs/compute/enabling-job-accounting-for-hpc-with-aws-parallelcluster-and-amazon-rds/

This post is written by Nicola Venuti, HPC Specialist SA, and contributed to by Rex Chen, Software Development Engineer.

Introduction

Accounting, reporting, and advanced analytics used for data-driven planning and decision making are key areas of focus for High Performance Computing (HPC) Administrators. In the cloud, these areas are more relevant to the costs of the services, which directly impact budgeting and forecasting of expenses. With the growth of new HPC services that perform analyses and corrective actions, you can better optimize for performance, which reduces cost.

Solution Overview

In this blog post, we walk through an easy way to collect accounting information for every job and step executed in a cluster with job scheduling. This post uses Slurm and a new feature in the latest version (2.6.0) of AWS ParallelCluster, which makes this process easier than before. Accounting records are saved into a relational database for both currently executing jobs and jobs that have already terminated.

Prerequisites

This tutorial assumes you already have a cluster in AWS ParallelCluster. If you don’t, refer to the AWS ParallelCluster documentation, a getting started blog post, or a how-to blog post.

Solution

Choose your architecture

There are two common architectures to save job accounting information into a database:

  1. Installing and directly managing a DBMS in the master node of your cluster (or in an additional EC2 instance dedicated to it)
  2. Using a fully managed service like Amazon Relational Database Service (RDS)

While the first option might appear to be the most economical solution, it requires heavy lifting: you must install and manage the database, which is not a core part of running your HPC workloads. Alternatively, Amazon RDS reduces the burden of installing updates, managing security patches, and properly allocating resources. Additionally, the Amazon RDS Free Tier can get you started with a managed database service in the cloud for free. Refer to the hyperlink for a list of free resources.

Amazon RDS is my preferred choice, and the following sections implement this architecture. Bear in mind, however, that the settings and the customizations required in the AWS ParallelCluster environment are the same, regardless of which architecture you prefer.

 

Set up and configure your database

Now, with your architecture determined, let’s configure it. First, go to the Amazon RDS console. Select the same Region where your AWS ParallelCluster is deployed, then click on Create database.

There are two database instances to consider: Amazon Aurora and MySQL.

Amazon Aurora has many benefits compared to MySQL. However, in this blog post, I use MySQL to show how to build your HPC accounting database within the Free-tier offering.

The following steps are the same regardless of your database choice. So, if you’re interested in one of the many features that differentiate Amazon Aurora from MySQL, feel free to use it instead. Check out Amazon Aurora’s landing page to learn more about its benefits, such as its faster performance and cost effectiveness.

To configure your database, you must complete the following steps:

  1. Name the database
  2. Establish credential settings
  3. Select the DB instance size
  4. Identify storage type
  5. Allocate amount of storage

The following images show the settings that I chose for the storage options and the “Free tier” template. Feel free to change them according to the scope and the usage you expect.

Make sure you select the VPC where your “compute fleet” is deployed by AWS ParallelCluster, and select the Security Group of your compute fleet. You can find information about your “compute fleet” in your AWS ParallelCluster config file. The Security Group should look something like this: “parallelcluster-XXX-ComputeSecurityGroup-XYZ”.

At this stage, you can click on Create database and wait until the database status moves from Creating to Available in the Amazon RDS Dashboard.

The last step for this section is to grant privileges on the database.

  1. Connect to your database. Use the master node of your AWS ParallelCluster as a client.
  2. Install the MySQL client by running sudo yum install mysql on AmazonLinux and CentOS and sudo apt-get install mysql-client on Ubuntu.
  3. Connect to your MySQL RDS database using the following command: mysql --host=<your_rds_endpoint> --port=3306 -u admin -p (the following screenshot shows how to find your RDS endpoint and port).

 

4. Run GRANT ALL ON `%`.* TO admin@`%`; to grant the required privileges.

The following code demonstrates these steps together:

[[email protected]]$ mysql --host=parallelcluster-accounting.c68dmmc6ycyr.us-east-1.rds.amazonaws.com --port=3306 -u admin -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 251
Server version: 8.0.16 Source distribution

Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> GRANT ALL ON `%`.* TO admin@`%`;

Note: Typically this command is run as GRANT ALL ON *.* TO 'admin'@'%'; With Amazon RDS, for security reasons, this is not possible because the master account does not have access to the MySQL database, and using *.* triggers an error. To work around this, I use the _ and % wildcards, which are permitted. To look at the actual grants, you can run the following: SHOW GRANTS;

Enable Slurm Database logging

Now, your database is fully configured. The next step is to enable Slurm as a workload manager.

A few steps must occur to let Slurm log its job accounting information on an external database. The following code demonstrates the steps you must make.

  1. Add the DB configuration file, slurmdbd.conf, under /opt/slurm/etc/
  2. Slurm’s slurm.conf file requires a few modifications. These changes are noted after the following code examples.

 

Note: You do not need to configure each and every compute node because AWS ParallelCluster installs Slurm in a shared directory. All of these nodes share this directory, and, thus the same configuration files with the master node of your cluster.

Below, you can find two example configuration files that you can use by modifying a few parameters according to your setup.

For more information about all the possible settings of configuration parameters, please refer to the official Slurm documentation, and in particular to the accounting section.

Add the DB configuration file

#
## Sample /opt/slurm/etc/slurmdbd.conf
#
ArchiveEvents=yes
ArchiveJobs=yes
ArchiveResvs=yes
ArchiveSteps=no
ArchiveSuspend=no
ArchiveTXN=no
ArchiveUsage=no
AuthType=auth/munge
DbdHost=ip-10-0-16-243  #YOUR_MASTER_IP_ADDRESS_OR_NAME
DbdPort=6819
DebugLevel=info
PurgeEventAfter=1month
PurgeJobAfter=12month
PurgeResvAfter=1month
PurgeStepAfter=1month
PurgeSuspendAfter=1month
PurgeTXNAfter=12month
PurgeUsageAfter=24month
SlurmUser=slurm
LogFile=/var/log/slurmdbd.log
PidFile=/var/run/slurmdbd.pid
StorageType=accounting_storage/mysql
StorageUser=admin
StoragePass=password
StorageHost=parallelcluster-accounting.c68dmmc6ycyr.us-east-1.rds.amazonaws.com # Endpoint from RDS console
StoragePort=3306                                                                # Port from RDS console

See below for key values that you should plug into the example configuration file:

  • DbdHost: the name of the machine where the Slurm Database Daemon is executed. This is typically the master node of your AWS ParallelCluster. You can run hostname -s on your master node to get this value.
  • DbdPort: The port number that the Slurm Database Daemon (slurmdbd) listens to for work. 6819 is the default value.
  • StorageUser: Define the user name used to connect to the database. This has been defined during the Amazon RDS configuration as shown in the second step of the previous section.
  • StoragePass: Define the password used to gain access to the database. This was defined, together with the user name, in the credential settings during the Amazon RDS configuration.
  • StorageHost: Define the name of the host running the database. You can find this value in the Amazon RDS console, under “Connectivity & security”.
  • StoragePort: Define the port on which the database is listening. You can find this value in the Amazon RDS console, under “Connectivity & security”. (see the screenshot below for more information).

Modify the file

Add the following lines at the end of the slurm configuration file:

#
## /opt/slurm/etc/slurm.conf
#
# ACCOUNTING
JobAcctGatherType=jobacct_gather/linux
JobAcctGatherFrequency=30
#
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageHost=ip-10-0-16-243
AccountingStorageUser=admin
AccountingStoragePort=6819

Modify the following:

  • AccountingStorageHost: The hostname or address of the host where SlurmDBD executes. In our case, this is again the master node of our AWS ParallelCluster; you can get this value by running hostname -s again.
  • AccountingStoragePort: The network port that SlurmDBD accepts communication on. It must be the same as DbdPort specified in /opt/slurm/etc/slurmdbd.conf
  • AccountingStorageUser: It must be the same as in /opt/slurm/etc/slurmdbd.conf (specified in the “Credential Settings” of your Amazon RDS database).

 

Restart the Slurm service and start the slurmdbd daemon on the master node

 

Depending on the operating system you are running, this would look like:

  • Amazon Linux / Amazon Linux 2
[[email protected]]$ sudo /etc/init.d/slurm restart                                                                                                                                                                                                                                             
stopping slurmctld:                                        [  OK  ]
slurmctld is stopped
slurmctld is stopped
starting slurmctld:                                        [  OK  ]
[[email protected]]$ 
[[email protected]]$ sudo /opt/slurm/sbin/slurmdbd
  • CentOS7 and Ubuntu 16/18
[[email protected] ~]$ sudo /opt/slurm/sbin/slurmdbd
[[email protected] ~]$ sudo systemctl restart slurmctld

Note: Even if you have jobs running, restarting the daemons does not affect them.

Check to see if your cluster is already in the Slurm Database:

/opt/slurm/bin/sacctmgr list cluster

And if it is not (see below):

[[email protected]]$ /opt/slurm/bin/sacctmgr list cluster
   Cluster     ControlHost  ControlPort   RPC     Share GrpJobs       GrpTRES GrpSubmit MaxJobs       MaxTRES MaxSubmit     MaxWall                  QOS   Def QOS 
---------- --------------- ------------ ----- --------- ------- ------------- --------- ------- ------------- --------- ----------- -------------------- --------- 
[[email protected]]$ 

You can add it as follows:

sudo /opt/slurm/bin/sacctmgr add cluster parallelcluster

You should now see something like the following:

[[email protected]]$ /opt/slurm/bin/sacctmgr list cluster
   Cluster     ControlHost  ControlPort   RPC     Share GrpJobs       GrpTRES GrpSubmit MaxJobs       MaxTRES MaxSubmit     MaxWall                  QOS   Def QOS 
---------- --------------- ------------ ----- --------- ------- ------------- --------- ------- ------------- --------- ----------- -------------------- --------- 
parallelc+     10.0.16.243         6817  8704         1                                                                                           normal           
[[email protected]]$ 

At this stage, you should be all set with your AWS ParallelCluster accounting configured to be stored in the Amazon RDS database.

Replicate the process on multiple clusters

The same database instance can be easily used for multiple clusters to log its accounting data in. To do this, repeat the last configuration step for your clusters built using AWS ParallelCluster that you want to share the same database.

The additional steps to follow are:

  • Ensure that all the clusters are in the same VPC (or, if you prefer to use multiple VPCs, set up VPC peering)
  • Add the SecurityGroup of your new compute fleets (“parallelcluster-XXX-ComputeSecurityGroup-XYZ”) to your RDS database
  • In addition to editing the Slurm configuration file (/opt/slurm/etc/slurm.conf) as explained earlier, change the ClusterName parameter at the very top of that file. By default, your cluster is called “parallelcluster.” You may want to change that to clearly identify other clusters using the same database. For instance: ClusterName=parallelcluster2

Once these additional steps are complete, you can run /opt/slurm/bin/sacctmgr list cluster  again. Now, you should see two (or multiple) clusters:

[[email protected] ]# /opt/slurm/bin/sacctmgr list cluster
   Cluster     ControlHost  ControlPort   RPC     Share GrpJobs       GrpTRES GrpSubmit MaxJobs       MaxTRES MaxSubmit     MaxWall                  QOS   Def QOS 
---------- --------------- ------------ ----- --------- ------- ------------- --------- ------- ------------- --------- ----------- -------------------- --------- 
parallelc+                            0  8704         1                                                                                           normal           
parallelc+     10.0.16.129         6817  8704         1                                                                                           normal           
[[email protected] ]#   

If you want to see the full name of your clusters, run the following:

[[email protected] ]# /opt/slurm/bin/sacctmgr list cluster format=cluster%30
                       Cluster 
------------------------------ 
               parallelcluster 
              parallelcluster2 
[[email protected] ]# 

Note: If you check the Slurm logs (under /var/log/slurm*), you may see this error:

error: Database settings not recommended values: innodb_buffer_pool_size innodb_lock_wait_timeout

This error refers to default parameters that Amazon RDS sets on your MySQL database. You can change them by setting a new parameter group as explained in the official documentation and in this support article. Please also note that innodb_buffer_pool_size is related to the amount of memory available on your instance, so you may want to use a different instance type with more memory to avoid this warning.
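If you want to apply the recommended values, you can do so with a custom DB parameter group. The following boto3 sketch is one way to do it, assuming a MySQL 8.0 instance; the group name, instance identifier, and parameter value are examples, and changing innodb_buffer_pool_size (a static parameter tied to instance memory) would additionally require a reboot:

import boto3

rds = boto3.client("rds")

rds.create_db_parameter_group(
    DBParameterGroupName="slurm-accounting-mysql80",  # placeholder
    DBParameterGroupFamily="mysql8.0",
    Description="Parameters tuned for Slurm accounting",
)

rds.modify_db_parameter_group(
    DBParameterGroupName="slurm-accounting-mysql80",
    Parameters=[
        {
            "ParameterName": "innodb_lock_wait_timeout",
            "ParameterValue": "900",  # example value
            "ApplyMethod": "immediate",
        }
    ],
)

# Attach the parameter group to the accounting database instance
rds.modify_db_instance(
    DBInstanceIdentifier="parallelcluster-accounting",  # placeholder
    DBParameterGroupName="slurm-accounting-mysql80",
    ApplyImmediately=True,
)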

Run your first job and check the accounting

Now that the application is installed and configured, you can test it! Submit a job to Slurm, query your database, and check your job accounting information.

If you are using a brand new cluster, test it with a simple hostname job as follows:

[[email protected]]$ sbatch -N2 <<EOF
> #!/bin/sh
> srun hostname |sort
> srun sleep 10
> EOF
Submitted batch job 31
[[email protected]]$

Immediately after you have submitted the job, you should see it with a state of “pending”:

[[email protected]]$ sacct
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode 
------------ ---------- ---------- ---------- ---------- ---------- -------- 
38               sbatch    compute                     2    PENDING      0:0 
[[email protected]]$ 

And, after a while the job should be “completed”:

[[email protected]]$ sacct
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode 
------------ ---------- ---------- ---------- ---------- ---------- -------- 
38               sbatch    compute                     2  COMPLETED      0:0 
38.batch           batch                                1  COMPLETED      0:0 
38.0           hostname                                2  COMPLETED      0:0 
[[email protected]]$

Now that you know your cluster works, you can build more complex queries using sacct. See a few examples below, and refer to the official documentation for more details:

[[email protected]]$ sacct --format=jobid,elapsed,ncpus,ntasks,state
       JobID    Elapsed      NCPUS   NTasks      State 
------------ ---------- ---------- -------- ---------- 
38             00:00:00          2           COMPLETED 
38.batch       00:00:00          1        1  COMPLETED 
38.0           00:00:00          2        2  COMPLETED 
39             00:00:00          2           COMPLETED 
39.batch       00:00:00          1        1  COMPLETED 
39.0           00:00:00          2        2  COMPLETED 
40             00:00:10          2           COMPLETED 
40.batch       00:00:10          1        1  COMPLETED 
40.0           00:00:00          2        2  COMPLETED 
40.1           00:00:10          2        2  COMPLETED 
[[email protected]]$ sacct --allocations
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode 
------------ ---------- ---------- ---------- ---------- ---------- -------- 
38               sbatch    compute                     2  COMPLETED      0:0 
39               sbatch    compute                     2  COMPLETED      0:0 
40               sbatch    compute                     2  COMPLETED      0:0 
[[email protected]]$ sacct -S2020-02-17 -E2020-02-20 -X -ojobid,start,end,state
       JobID               Start                 End      State 
------------ ------------------- ------------------- ---------- 
38           2020-02-17T12:25:12 2020-02-17T12:25:12  COMPLETED 
39           2020-02-17T12:25:12 2020-02-17T12:25:12  COMPLETED 
40           2020-02-17T12:27:59 2020-02-17T12:28:09  COMPLETED 
[[email protected]]$ 

If you have configured your cluster(s) for multiple users, you may want to look at the accounting info for all of them. If you want to configure your clusters with multiple users, follow this blog post. It demonstrates how to configure AWS ParallelCluster with AWS Directory Services to create a multiuser, POSIX-compliant system with centralized authentication.

Each user can only see their own accounting data, while Slurm admins (or root) can see accounting info for every user. The following output shows accounting data coming from two clusters (parallelcluster and parallelcluster2) and from two users (ec2-user and nicola):

[[email protected] ~]# sacct -S 2020-01-01 --clusters=parallelcluster,parallelcluster2 --format=jobid,elapsed,ncpus,ntasks,state,user,cluster%20                                                                                                                                                        
       JobID    Elapsed      NCPUS   NTasks      State      User              Cluster 
------------ ---------- ---------- -------- ---------- --------- -------------------- 
36             00:00:00          2              FAILED  ec2-user      parallelcluster 
36.batch       00:00:00          1        1     FAILED                parallelcluster 
37             00:00:00          2           COMPLETED  ec2-user      parallelcluster 
37.batch       00:00:00          1        1  COMPLETED                parallelcluster 
37.0           00:00:00          2        2  COMPLETED                parallelcluster 
38             00:00:00          2           COMPLETED  ec2-user      parallelcluster 
38.batch       00:00:00          1        1  COMPLETED                parallelcluster 
38.0           00:00:00          2        2  COMPLETED                parallelcluster 
39             00:00:00          2           COMPLETED  ec2-user      parallelcluster 
39.batch       00:00:00          1        1  COMPLETED                parallelcluster 
39.0           00:00:00          2        2  COMPLETED                parallelcluster 
40             00:00:10          2           COMPLETED  ec2-user      parallelcluster 
40.batch       00:00:10          1        1  COMPLETED                parallelcluster 
40.0           00:00:00          2        2  COMPLETED                parallelcluster 
40.1           00:00:10          2        2  COMPLETED                parallelcluster 
41             00:00:29        144              FAILED  ec2-user      parallelcluster 
41.batch       00:00:29         36        1     FAILED                parallelcluster 
41.0           00:00:00        144      144  COMPLETED                parallelcluster 
41.1           00:00:01        144      144  COMPLETED                parallelcluster 
41.2           00:00:00          3        3  COMPLETED                parallelcluster 
41.3           00:00:00          3        3  COMPLETED                parallelcluster 
42             01:22:03        144           COMPLETED  ec2-user      parallelcluster 
42.batch       01:22:03         36        1  COMPLETED                parallelcluster 
42.0           00:00:01        144      144  COMPLETED                parallelcluster 
42.1           00:00:00        144      144  COMPLETED                parallelcluster 
42.2           00:00:39          3        3  COMPLETED                parallelcluster 
42.3           00:34:55          3        3  COMPLETED                parallelcluster 
43             00:00:11          2           COMPLETED  ec2-user      parallelcluster 
43.batch       00:00:11          1        1  COMPLETED                parallelcluster 
43.0           00:00:01          2        2  COMPLETED                parallelcluster 
43.1           00:00:10          2        2  COMPLETED                parallelcluster 
44             00:00:11          2           COMPLETED    nicola      parallelcluster 
44.batch       00:00:11          1        1  COMPLETED                parallelcluster 
44.0           00:00:01          2        2  COMPLETED                parallelcluster 
44.1           00:00:10          2        2  COMPLETED                parallelcluster 
4              00:00:10          2           COMPLETED    nicola     parallelcluster2 
4.batch        00:00:10          1        1  COMPLETED               parallelcluster2 
4.0            00:00:00          2        2  COMPLETED               parallelcluster2 
4.1            00:00:10          2        2  COMPLETED               parallelcluster2 
[[email protected] ~]# 

You can also query your database directly to look at the accounting information stored in it, or link your preferred BI tool to get insights from your HPC cluster. To do so, connect to the database as follows:

[[email protected]]$ mysql --host=parallelcluster-accounting.c68dmmc6ycyr.us-east-1.rds.amazonaws.com --port=3306 -u admin -p
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 48
Server version: 8.0.16 Source distribution

Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
| slurm_acct_db      |
+--------------------+
4 rows in set (0.00 sec)

mysql> use slurm_acct_db;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> show tables;
+-----------------------------------------+
| Tables_in_slurm_acct_db                 |
+-----------------------------------------+
| acct_coord_table                        |
| acct_table                              |
| clus_res_table                          |
| cluster_table                           |
| convert_version_table                   |
| federation_table                        |
| parallelcluster_assoc_table             |
| parallelcluster_assoc_usage_day_table   |
| parallelcluster_assoc_usage_hour_table  |
| parallelcluster_assoc_usage_month_table |
| parallelcluster_event_table             |
| parallelcluster_job_table               |
| parallelcluster_last_ran_table          |
| parallelcluster_resv_table              |
| parallelcluster_step_table              |
| parallelcluster_suspend_table           |
| parallelcluster_usage_day_table         |
| parallelcluster_usage_hour_table        |
| parallelcluster_usage_month_table       |
| parallelcluster_wckey_table             |
| parallelcluster_wckey_usage_day_table   |
| parallelcluster_wckey_usage_hour_table  |
| parallelcluster_wckey_usage_month_table |
| qos_table                               |
| res_table                               |
| table_defs_table                        |
| tres_table                              |
| txn_table                               |
| user_table                              |
+-----------------------------------------+
29 rows in set (0.00 sec)

mysql>
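For example, the following sketch pipes a query straight into the mysql client to list the most recent jobs. The column names are taken from the standard Slurm accounting schema (the job state is stored as a numeric code), so verify them against your own slurm_acct_db before relying on the output:

echo "SELECT id_job, job_name, state, time_start, time_end \
      FROM parallelcluster_job_table \
      ORDER BY time_submit DESC LIMIT 10;" | \
mysql --host=parallelcluster-accounting.c68dmmc6ycyr.us-east-1.rds.amazonaws.com --port=3306 -u admin -p slurm_acct_db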

Conclusion

You’re finally all set! In this blog post you set up a database using Amazon RDS, configured AWS ParallelCluster and Slurm to enable job accounting with your database, and learned how to query your job accounting history from your database using the sacct command or by running SQL queries.

Deriving insights for your HPC workloads doesn’t end when your workloads finish running. Now, you can better understand and optimize your usage patterns and generate ideas about how to wring more price-performance out of your HPC clusters on AWS. For retrospective analysis, you can easily understand whether specific jobs, projects, or users are responsible for driving your HPC usage on AWS.  For forward-looking analysis, you can better forecast future usage to set budgets with appropriate insight into your costs and your resource consumption.

You can also use these accounting systems to identify users who may require additional training on how to make the most of cloud resources on AWS. Finally, together with your spending patterns, you can better capture and explain the return on investment from all of the valuable HPC work you do. And, this gives you the raw data to analyze how you can get even more value and price-performance out of the work you’re doing on AWS.

Amazon Lightsail Database Tips and Tricks

Post Syndicated from Emma White original https://aws.amazon.com/blogs/compute/amazon-lightsail-database-tips-and-tricks/

This post is contributed by Mike Coleman | Developer Advocate for Lightsail | Twitter: @mikegcoleman

Managed Databases on Amazon Lightsail are affordably priced, and incredibly easy to run. Lightsail databases offer a solid foundation on which to build your application.  You can leverage attractive features like one-click high availability, automatic backups, and a choice of database engines to support your Lightsail apps.

While it’s super simple to do an initial deployment on Amazon Lightsail, I often get questions about how to perform some standard management tasks, such as scaling up a database or accessing that database with command line tools. I am also asked how to handle the scenario where you find that you need some of the advanced features found in Amazon Relational Database Service (Amazon RDS).

This blog answers these questions and offers general guidance on how to address these issues.

Scale Up Your Database

When I first deploy resources to the cloud, I always choose the least expensive option. Often, that choice works out and everything runs fine. But sometimes it results in undersized resources, which necessitates a move to resources with more horsepower.

If this happens with your Lightsail databases, it’s straightforward to move your database to a larger size. Additionally, you can check the metrics page in the Amazon Lightsail console to see your database performance, and to determine if you need to upgrade.

Let’s walk through how to size up your database.

Start by creating a snapshot of your instance.

  1. Navigate to the Lightsail home page and click databases
  2. Click on the name of your database
  3. From the horizontal menu, click on Snapshots & restore
  4. Under Manual snapshot, click + Create snapshot
  5. Give the snapshot a name
  6. Click Create

It takes several minutes for the snapshot creation process to complete. Once the snapshot is available, you can create your new database instance choosing a larger size.

  1. Click the three-dot menu to the right of the snapshot you just created
  2. Choose Create new database
  3. Under Choose your database plan, select either a Standard or High Availability plan. If you’re running a mission-critical application, you definitely want to choose the high availability option. Standard is great for test environments or workloads where your application can withstand downtime in the event of a database failure.
  4. Choose the size for your new database instance
  5. Give your database instance a name
  6. Click Create database

The new database is created after several minutes.
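If you prefer to script this instead of clicking through the console, a rough AWS CLI equivalent looks like the following. The database and snapshot names are placeholders, and the bundle ID is only an example; run aws lightsail get-relational-database-bundles to see the sizes available to you.

# Snapshot the existing database
aws lightsail create-relational-database-snapshot \
  --relational-database-name my-database \
  --relational-database-snapshot-name my-database-snapshot

# Restore the snapshot into a larger database bundle
aws lightsail create-relational-database-from-snapshot \
  --relational-database-snapshot-name my-database-snapshot \
  --relational-database-name my-database-large \
  --relational-database-bundle-id medium_1_0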

Lightsail generates a new password when you create a new database from a snapshot. You can either use this newly generated password, or change it. You can change the password using the following steps:

  1. From the Lightsail home page, click Databases
  2. Scroll down to the Connection details section
  3. If you want to use the auto-generated password, click Show in the password box to display the password
    Otherwise complete steps 4 and 5 to specify a new password.
  4. Under Password, click Change password
  5. Enter a new password and click Save
    It will take a few minutes for the password to update

Now, go into your application and configure it to point to the new database using the new endpoint, user name, and password values.

Note: It’s out of scope for this blog to cover how to configure individual applications. Consult your application documentation to see how to do it for your specific application.

Command Line Access

There may be times when you need to work on your database using command line tools. You cannot connect directly to your Lightsail database instance. But, you can access the database remotely from another Lightsail instance.

You can also make your instance accessible via the public internet, and access it remotely from any internet-connected computer. However, I wouldn’t recommend this from a security perspective.

To get started accessing your Lightsail database via the command line, you first must create a new Lightsail instance. I recommend basing your instance on Lightsail’s LAMP blueprint because the MySQL command line tools are already installed.

To create a new LAMP instance, do the following:

  1. From the Lightsail home page, click Create Instance
  2. Make sure you create the instance in the same Region as your Lightsail database
  3. Under Select a blueprint, choose LAMP (PHP 7)
  4. Since you’re only using this instance to run MySQL command line tools, you can choose the smallest instance size
  5. Give your instance a name
  6. Click Create Instance

It takes a few minutes for your new instance to start up.

To check that everything is working correctly, use the MySQL command line interface.

Make sure you have the database user name, password, and endpoint. These can be found by clicking on the name of your database under the Connection details section.

  1. Use either your own SSH client or the built-in web client to access the Lightsail instance you just created
  2. On the command line, enter the following command, substituting the values for your database
mysql \
--host <lightsail database endpoint> \
--user <lightsail database username> \
--password

For example:

mysql \
--host ls-randomchars.us-east-2.rds.amazonaws.com \
--user dbmasteruser \
--password

Notice that you don’t actually put the password on the command line.

3. When prompted enter the password (note that the password will not show up when you enter it)

4. You should now be at the MySQL command prompt

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 87482
Server version: 5.7.26-log Source distribution

Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

From here, you can use the command line as you normally would.

Migrating From a Managed Database to Amazon RDS

One of the great things about Lightsail is that it’s easy to get started quickly. It also gives you an easy migration path to more advanced AWS services, should you ever need them. For instance, you might set up your database on Lightsail, and then realize that it could benefit from read replicas to handle growing traffic. Fortunately, it’s a pretty straightforward process to migrate your data from Lightsail to RDS.

 

Deploy an Amazon RDS database

First, make sure you have an RDS database running the same engine in the same Region as your Lightsail instance, and in your default Amazon VPC. For example, if your Lightsail database is running MySQL in the Oregon Region, RDS should also be running MySQL in the Oregon Region and in the default VPC. If you’re not sure how to create an RDS database, check out their documentation.

Make sure to note the username and password for your new database.

Create a Lightsail Instance

You also need a Lightsail instance with the MySQL command line tools installed. You can set one up by following the instructions in the previous section of this blog.

Enable VPC Peering

To get started, ensure that the Lightsail VPC can communicate with the VPC where your RDS database runs. You do this by enabling VPC peering in Lightsail, and modifying the security group for RDS to allow traffic from the Lightsail VPC.

  1. Return to the Lightsail console home page and click Account in the top-right corner. Choose Account from the pop out menu.
  2. Click Advanced on the horizontal menu
  3. Under VPC peering, ensure that the Enable VPC peering box is checked for the region where your database is deployed.

Adjust the RDS database security group

The next step is to edit the security group for the RDS instance to allow traffic from the Lightsail subnet.

  1. Return to the RDS console home page
  2. Under Resources, click on DB Instances
  3. Click on the name of the database you want to migrate data into
  4. Under Connectivity and security, click on the security group name

The security group dialog appears. From here you can add an entry for the Lightsail subnet.

  1. Click the Inbound tab near the bottom of the screen
  2. Click the Edit button
  3. Click Add rule in the pop-up box
  4. From the Type drop-down choose MySQL/Aurora
  5. In the Source box, enter 172.26.0.0/16, the CIDR range for the Lightsail subnet (a CLI sketch of the same rule follows this list)
  6. Click Save
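As a rough CLI alternative, you could add the same rule with the command below. The security group ID is a placeholder for the group attached to your RDS instance.

aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp \
  --port 3306 \
  --cidr 172.26.0.0/16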

Migrate the data from the Lightsail Database to RDS

Now that Lightsail resources can talk with your RDS database, you can do the actual migration.

The initial step is to use mysqldump to export your database information into a file that can be imported into RDS. mysqldump has many options. In this case, you export a database named tasks. Choose the appropriate database for your use case, as well as any other options that make sense.

  1. Use either your own SSH client or the built-in web client to access the Lightsail instance you just created.
  2. Use the following mysqldump command to create a backup of your database to a text file (dump.sql). Substitute the connection values for your Lightsail database; these values are on the details page of your database under Connection details. The database name must be specific to your environment.
mysqldump \
--host <lightsail database endpoint> \
--user <lightsail database username> \
--databases <database name> \
--password \
> dump.sql

For example:

mysqldump \
--host ls-randomchars.us-west-2.rds.amazonaws.com \
--user dbmasteruser \
--databases tasks \
--password \
--set-gtid-purged=OFF \
> dump.sql

Now that you have a database backup, you can import that into your RDS instance. You need the connection details from your RDS database. Use the username and password from when you created the database. You can find the endpoint on the details page of your database under Connectivity and security (See the following screenshot for an example).

endpoint and port for connectivity and security

If you are not already, return to the terminal session for the Lightsail instance that has the MySQL tools installed.

To import the data into the RDS database, you provide the contents of the dump.sql file to the mysql command-line tool. The cat command lists out the file, and by using | (referred to as a pipe) we can send the output from that command directly into mysql.

cat dump.sql | \
mysql \
--host <RDS database endpoint> \
--user <RDS user> \
--password

For example:

cat dump.sql | \
mysql \
--host database.randomchars.us-west-2.rds.amazonaws.com \
--user dbmasteruser \
--password

You can also use the mysql command to verify that the database was created. This is similar to what we did when we piped in the file in the previous step, except this time we use echo to pipe in the command show databases;:

echo "show databases;| \
mysql \
--host <RDS database endpoint> \
--user <RDS user> \
--password

For example:

echo "show databases;" | \
mysql \
--host database.randomchars.us-west-2.rds.amazonaws.com \
--user dbmasteruser \
--password

From here, you reconfigure your application to access your new RDS database.

Conclusion

In this post I reviewed some common tasks that you might want to do once you created your Amazon Lightsail database. You learned how to scale up the size of your database, how to access it with command line tools, and how to migrate to RDS.

If you’ve not yet deployed a managed database on Lightsail, why not head over to the Lightsail console and create one now? If you need a bit of guidance to get started, we have a workshop at https://lightsailworkshop.com that shows you how to use Lightsail to deploy a two-tier web application with a MySQL database backend. Please feel free to leave comments and questions for future blog posts.

Building faster, lower cost, better APIs – HTTP APIs now generally available

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/building-better-apis-http-apis-now-generally-available/

In July 2015, AWS announced Amazon API Gateway. This enabled developers to build secure, scalable APIs quickly in front of a variety of different types of architectures. Since then, the API Gateway team continues to build new features and services for customers.

Figure 1: API Gateway feature highlights timeline

In early 2019, the team evaluated the current services and made plans for the next chapter of API Gateway. They prototyped new languages and technologies, applied lessons learned from building the REST and WebSocket APIs, and looked closely at customer feedback. The result is HTTP APIs for Amazon API Gateway, a service built from the ground up to be faster, lower cost, and simpler to use. In short, HTTP APIs offers a better solution for building APIs. If you are building an API and HTTP APIs fit your requirements, this is the place to start.

Faster

For the majority of use cases, HTTP APIs offers up to 60% reduction in latency. Developers strive to build applications with minimal latency and maximum functionality. They understand that each service involved in the application process can introduce latency.

All services add latency

Figure 2: All services add latency

With that in mind, HTTP APIs is built to reduce the latency overhead of the API Gateway service. Combining both the request and response, 99% of all requests (p99) have less than 10 ms of additional latency from HTTP API.

Lower cost

At Amazon, one of our core leadership principles is frugality. We believe in doing things in a cost-effective manner and passing those savings to our customers. With the availability of new technology, and the expertise of running API Gateway for almost five years, we built HTTP APIs to run more efficiently.

REST/HTTP APIs price comparison

Figure 3: REST/HTTP APIs price comparison

Using the pricing for us-east-1, figure 3 shows a cost comparison for 100 million, 500 million, and 1 billion requests a month. Overall, HTTP APIs is at least 71% lower cost compared to API Gateway REST APIs.

Simpler

On the user interface for HTTP API, the API Gateway team has made the entire experience more intuitive and easier to use.

CORS configuration

Figure 4: CORS configuration

Another example is the configuration of cross-origin resource sharing (CORS). CORS provides security by controlling cross-domain access to servers and can be difficult to understand and configure. HTTP APIs enables a developer to configure CORS settings quickly using a simple, easy-to-understand UI. This same approach throughout the UI creates a powerful, yet easy approach to building APIs.
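As a sketch of the same idea outside the console, you can pass a CORS configuration when creating an HTTP API with the AWS CLI. The API name and the allowed origin below are placeholders.

aws apigatewayv2 create-api \
  --name my-http-api \
  --protocol-type HTTP \
  --cors-configuration '{"AllowOrigins":["https://www.example.com"],"AllowMethods":["GET","POST"],"AllowHeaders":["content-type"]}'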

New features

HTTP APIs beta was announced at AWS re:Invent 2019 with powerful features like JWT authorizers, auto-deploying stages, and simplified route integrations. Today, HTTP APIs is generally available (GA) with more features to help developers build APIs faster, lower cost and better.

Private integrations

HTTP APIs now offers developers the ability to integrate with resources secured in an Amazon VPC. When developing with technologies like containers via Amazon Elastic Container Service (ECS) or Amazon Elastic Kubernetes Service (EKS), the underlying Amazon EC2 clusters must reside inside a VPC. While it is possible to make these services available through Elastic Load Balancing, developers can also take advantage of HTTP APIs to front their applications.

VPC link configuration

Figure 5: VPC link configuration

To create a private integration, you need a VPC link. VPC links take advantage of AWS Hyperplane, the Network Function Virtualization platform used for Network Load Balancer and NAT Gateway. With this technology, multiple HTTP APIs can use a single VPC link to a VPC. Likewise, multiple REST APIs can share a single REST API VPC link.

Private integration configuration

Figure 6: Private integration configuration

Once a VPC link exists, you can configure an HTTP APIs private integration using a Network Load Balancer (NLB), an Application Load Balancer (ALB), or the resource discovery service, AWS Cloud Map.
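A rough CLI sketch of this flow is shown below; the subnet, security group, API ID, VPC link ID, and load balancer listener ARN are all placeholders for your own resources.

# Create the VPC link once per VPC
aws apigatewayv2 create-vpc-link \
  --name my-vpc-link \
  --subnet-ids subnet-0123456789abcdef0 subnet-0fedcba9876543210 \
  --security-group-ids sg-0123456789abcdef0

# Point a private integration at an NLB/ALB listener through that VPC link
aws apigatewayv2 create-integration \
  --api-id abc123 \
  --integration-type HTTP_PROXY \
  --integration-method ANY \
  --connection-type VPC_LINK \
  --connection-id vlink123 \
  --integration-uri arn:aws:elasticloadbalancing:us-east-1:111122223333:listener/net/my-nlb/0123456789abcdef/0123456789abcdef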

Custom domain cross compatibility

Amazon API Gateway now offers the ability to share custom domains across REST APIs and HTTP API. This flexibility allows developers to mix and match between REST APIs and HTTP APIs while building applications.

Custom domain cross compatibility

Figure 7: Custom domain cross compatibility

Previously, when building applications that needed a consistent domain name, developers could only use a single type of API. Because applications often require features only available on REST APIs, REST APIs are a common choice. From today, you can distribute these routes between HTTP APIs and REST APIs based on feature requirements.

Request throttling

HTTP APIs now offers the ability to do granular throttling at the stage and route level. API throttling is an often-overlooked API feature that is critical to the health of APIs and their infrastructure. By default, API Gateway limits the steady-state request rate to 10,000 requests per second (rps) with a 5,000 request burst limit. These are soft limits and can be raised via AWS Service Quotas.

HTTP APIs has the concept of stages that can be used for different purposes. Applications can have dev, beta, and prod stages on the same API. Additionally, for backwards compatibility, you can configure multiple production stages of the same API. Each stage has optional burst and rate limit settings that override the account level setting of 10,000 rps.

Stage level throttling

Figure 8: Stage level throttling

Throttling can also be set at the route level. A route is a combination of the path and method. For example, the GET method on a root path (/) combine to make a route. At the time of this writing, route level throttling must be created with the AWS Command Line Interface (CLI) or an AWS SDK. To set the throttling on a route of / [ANY] on the $default stage, use the following CLI command:

aws apigatewayv2 update-stage --api-id <your API ID> --stage-name $default --route-settings '{"ANY /": {"ThrottlingBurstLimit":1000, "ThrottlingRateLimit":2000}}'

Stage variables

HTTP APIs now supports the use of stage variables to pass dynamic data to the backend integration or even define the integration itself. When a stage is defined on HTTP API, it creates a new path to the backend integration. The following table shows a domain with several stages:

Stage       Path
$default    www.mydomain.com
dev         www.mydomain.com/dev
beta        www.mydomain.com/beta

When you access the link for the dev stage, the dev stage variables are passed to the backend integration in the event object. The backend uses this information when processing the request. While not a best practice for passing secrets, it is useful for designating non-secret data, like environment-specific endpoints or feature switches.
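For instance, you might define a stage and its variables with the AWS CLI as sketched below; the API ID, stage name, and variable values are placeholders.

aws apigatewayv2 create-stage \
  --api-id abc123 \
  --stage-name dev \
  --auto-deploy \
  --stage-variables endpointUrl=https://dev.internal.example.com,featureFlag=on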

Stage variables are also used to dynamically define the backend integration. For example, if the integration uses one AWS Lambda function for production and another for testing, you can use stage variables to dynamically route to the appropriate Lambda function as shown below:

Dynamically choosing an integration point

Figure 9: Dynamically choosing an integration point

When building dynamic integrations, it is also important to update permissions accordingly. HTTP APIs automatically adds invocation rights when an integration points to a single Lambda function. However, when using multiple functions, you must create and manage the role manually. Do this by turning off the Grant API Gateway permission to invoke your Lambda function option and entering a custom role with the appropriate permissions.

Integration custom role

Figure 10: Integration custom role

Lambda payload version 2.0

HTTP APIs now supports an updated event payload and response format for the Lambda function integration. Version 2.0 payload simplifies the format of the event object sent to the Lambda function. Here is a comparison of the event object that is sent to a Lambda function in version 1.0 and 2.0:

Version 1.0

{
    "version": "1.0",
    "resource": "/Echo",
    "path": "/Echo",
    "httpMethod": "GET",
    "headers": {
        "Content-Length": "0",
        "Host": "0000000000.execute-api.us-east-1.amazonaws.com",
        "User-Agent": "TestClient",
        "X-Amzn-Trace-Id": "Root=1-5e6ab926-933e1530e55773a0709dfaa6",
        "X-Forwarded-For": "1.1.1.1",
        "X-Forwarded-Port": "443",
        "X-Forwarded-Proto": "https",
        "accept": "*/*",
        "accept-encoding": "gzip, deflate, br",
        "cache-control": "no-cache",
        "clientInformation": "private",
        "cookie": "Cookie_2=value; Cookie_3=value; Cookie_4=value"
    },
    "multiValueHeaders": {
        "Content-Length": [
            "0"
        ],
        "Host": [
            "0000000000.execute-api.us-east-1.amazonaws.com"
        ],
        "X-Amzn-Trace-Id": [
            "Root=1-5e6ab926-933e1530e55773a0709dfaa6"
        ],
        "X-Forwarded-For": [
            "1.1.1.1"
        ],
        "X-Forwarded-Port": [
            "443"
        ],
        "X-Forwarded-Proto": [
            "https"
        ],
        "accept": [
            "*/*"
        ],
        "accept-encoding": [
            "gzip, deflate, br"
        ],
        "cache-control": [
            "no-cache"
        ],
        "clientInformation": [
            "public",
            "private"
        ],
        "cookie": [
            "Cookie_2=value; Cookie_3=value; Cookie_4=value"
        ]
    },
    "queryStringParameters": {
        "getValueFor": "newClient"
    },
    "multiValueQueryStringParameters": {
        "getValueFor": [
            "newClient"
        ]
    },
    "requestContext": {
        "accountId": "0000000000",
        "apiId": "0000000000",
        "domainName": "0000000000.execute-api.us-east-1.amazonaws.com",
        "domainPrefix": "0000000000",
        "extendedRequestId": "JTHd9j2EoAMEPEA=",
        "httpMethod": "GET",
        "identity": {
            "accessKey": null,
            "accountId": null,
            "caller": null,
            "cognitoAuthenticationProvider": null,
            "cognitoAuthenticationType": null,
            "cognitoIdentityId": null,
            "cognitoIdentityPoolId": null,
            "principalOrgId": null,
            "sourceIp": "1.1.1.1",
            "user": null,
            "userAgent": "TestClient",
            "userArn": null
        },
        "path": "/Echo",
        "protocol": "HTTP/1.1",
        "requestId": "JTHd9j2EoAMEPEA=",
        "requestTime": "12/Mar/2020:22:35:18 +0000",
        "requestTimeEpoch": 1584052518094,
        "resourceId": null,
        "resourcePath": "/Echo",
        "stage": "$default"
    },
    "pathParameters": null,
    "stageVariables": null,
    "body": null,
    "isBase64Encoded": true
}

Version 2.0

{
    "version": "2.0",
    "routeKey": "ANY /Echo",
    "rawPath": "/Echo",
    "rawQueryString": "getValueFor=newClient",
    "cookies": [
        "Cookie_2=value",
        "Cookie_3=value",
        "Cookie_4=value"
    ],
    "headers": {
        "accept": "*/*",
        "accept-encoding": "gzip, deflate, br",
        "cache-control": "no-cache",
        "clientinformation": "public,private",
        "content-length": "0",
        "host": "0000000000.execute-api.us-east-1.amazonaws.com",
        "user-agent": "TestClient",
        "x-amzn-trace-id": "Root=1-5e6ab967-cfe253ce6f8b90986a678c40",
        "x-forwarded-for": "1.1.1.1",
        "x-forwarded-port": "443",
        "x-forwarded-proto": "https"
    },
    "queryStringParameters": {
        "getValueFor": "newClient"
    },
    "requestContext": {
        "accountId": "0000000000",
        "apiId": "0000000000",
        "domainName": "0000000000.execute-api.us-east-1.amazonaws.com",
        "domainPrefix": "0000000000",
        "http": {
            "method": "GET",
            "path": "/Echo",
            "protocol": "HTTP/1.1",
            "sourceIp": "1.1.1.1",
            "userAgent": "TestClient"
        },
        "requestId": "JTHoQgr2oAMEPMg=",
        "routeId": "47matwk",
        "routeKey": "ANY /Echo",
        "stage": "$default",
        "time": "12/Mar/2020:22:36:23 +0000",
        "timeEpoch": 1584052583903
    },
    "isBase64Encoded": true
}

Additionally, version 2.0 allows more flexibility in the format of the response object from the Lambda function. Previously, this was the required format of the response:

{
  "statusCode": 200,
  "body":
  {
    "Name": "Eric Johnson",
    "TwitterHandle": "@edjgeek"
  },
  "headers": {
    "Access-Control-Allow-Origin": "https://amazon.com"
  }
}

When using version 2.0, the response is simpler:

{
  "Name": "Eric Johnson",
  "TwitterHandle": "@edjgeek"
}

When HTTP APIs receives the response, it uses data like CORS settings and integration response codes to populate the missing data.

By default, new Lambda function integrations use version 2.0. You can change this under the Advanced settings toggle for the Lambda function integration. The version applies to both the event and response payloads. If you choose version 1.0, the old event format is sent to the Lambda function and the full response object must be returned.

Lambda integration advanced settings

Figure 11: Lambda integration advanced settings
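If you manage the integration outside the console, the same setting can be changed with the CLI; the API and integration IDs below are placeholders.

aws apigatewayv2 update-integration \
  --api-id abc123 \
  --integration-id xyz789 \
  --payload-format-version 1.0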

OpenAPI/Swagger support

HTTP APIs now supports importing Swagger or OpenAPI configuration files. This makes it simple to migrate from other API Gateway services to HTTP API. When importing a configuration file, HTTP APIs implements all supported features and reports any features that are not currently supported.

AWS Serverless Application Model (SAM) support

At the time of this writing, the AWS Serverless Application Model (AWS SAM) supports most features released in beta at re:Invent 2019. AWS SAM support for many GA features is scheduled for release by March 20, 2020.

Conclusion

For almost five years, Amazon API Gateway has enabled developers to build highly scalable and durable application programming interfaces. It has allowed the abstraction of tasks like authorization, throttling, and data validation from within the application code to a managed service. With the introduction of HTTP APIs for Amazon API Gateway, developers can now use this powerful service in a faster, lower cost, and better way.

Configuring and using monitoring and notifications in Amazon Lightsail

Post Syndicated from Betsy Chernoff original https://aws.amazon.com/blogs/compute/configuring-and-using-monitoring-and-notifications-in-amazon-lightsail/

This post is contributed by Mike Coleman | Developer Advocate for Lightsail | Twitter: @mikegcoleman

We recently announced the release of resource monitoring, alarms, and notifications for Amazon Lightsail. This new feature allows you to set alarm thresholds on your Lightsail instances, databases, and load balancers. When those alarm thresholds are breached, you can be notified via the Lightsail console, SMS text message, and/or email.

In this blog, I walk you through the process of setting your notification contacts, creating a new alarm, and testing out the notifications. Additionally, you can simulate a failure to see how that is represented in the metrics graph, console, and notification endpoints.

The alarm you create during this blog notifies you whenever there are two or fewer healthy instances attached to your load balancer. In a production workload, this is a fairly critical scenario. So, you can configure Lightsail to notify you as soon as it discovers the problem.

Prerequisites
To complete this walkthrough, you need three running Lightsail instances placed behind a Lightsail load balancer. If you need help getting that done, check out our documentation on instances and load balancers. Don’t worry about what’s running on your instances; it’s not important for the sake of this walkthrough.

Configuring notification contacts
To get started, create notification contacts for both email and SMS. You can create one email contact per AWS Region where Lightsail operates. If you find that you need to notify multiple people via email, you can create an email distribution list with those contacts. Then, set Lightsail to send the email notification to that distribution list.

For SMS, you can only create contacts in specific Regions; refer to the Lightsail documentation for an up-to-date list of those Regions.

Let’s get started creating the notification contacts. First, log into the AWS Management Console, navigate to the Lightsail home page, and follow the steps below.

  1. Click Account near the top right of the Lightsail home page, and then click Account from the dropdown menu.
  2. Scroll down to the Notification contacts section and click + Add email address. Ensure that the correct AWS Region is selected.
  3. Enter the email address where you want Lightsail to send alarm notifications in the text box.
  4. Click Add contact. Then, click I understand. This signifies that you understand that a verification email is sent to the email address you entered, and that notifications are not sent to that address until it’s verified. You verify your email address in the next section.
  5. Scroll down and click + Add SMS number, and ensure that the correct AWS Region is selected.
  6. Choose the country/Region for the phone number you want to use (note: this can be different than your AWS Region), and enter the mobile number you want to receive notifications at.
  7. Click Add contact.

Verifying the notification contact email address

Return to the Lightsail home page. There should be a banner at the top of the page notifying you that your email address must be verified.

  1. Access the email account that you set up in the previous section. You should find an email with the subject AWS Notification – Subscription Confirmation
  2. Open the email.
  3. In the email body, near the bottom, there is a link to verify your email address. Click the verification link. A webpage loads, which lets you know that your email address was successfully verified.
  4. Refresh the Lightsail home page, and the verification banner should now be gone.

Creating an alarm

Now you have your notification contacts set. It’s time to create the actual alarm. An alarm is defined by choosing a metric to monitor, and defining a threshold value and schedule for evaluation.

In our example today, we’re going to look at the healthy host count metric. As the name implies, this metric tells us how many healthy hosts are connected to our Lightsail load balancer. In this example, there are three instances attached to the load balancer, and you want to know as soon as possible if one of them becomes unhealthy. In other words, when there are two or fewer healthy instances.

Lightsail reports metrics every five minutes, so the alarm can notify you the first time the metric reports an unhealthy host in any five-minute window. Follow the steps below to configure the alarm.

  1. From the Lightsail home page click on Networking from the horizontal menu, and then click on the name of your load balancer.
  2. From the horizontal menu click on Metric.
  3. Ensure that Healthy host count is selected from the metric dropdown menu.
  4. Scroll down the page and click +Add alarm
  5. Use the dialog box and drop-down menu to set the threshold for healthy host count to Less than or equal to 2. Set the time values so that an alarm is raised if the threshold is exceeded 1 time within the last 5 minutes.
  6. Leave the check boxes checked for SMS and email notifications.
  7. The alarm is created, but Lightsail may display a message indicating that the alarm is in an insufficient data state. This disappears in a few minutes. (A CLI sketch of the same alarm follows this list.)
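If you’d rather script the alarm than click through the console, a rough equivalent with the AWS CLI looks like the following. The alarm and load balancer names are placeholders, and you should double-check the parameter names against the current Lightsail CLI reference.

aws lightsail put-alarm \
  --alarm-name healthy-host-count-low \
  --monitored-resource-name my-load-balancer \
  --metric-name HealthyHostCount \
  --comparison-operator LessThanOrEqualToThreshold \
  --threshold 2 \
  --evaluation-periods 1 \
  --contact-protocols Email SMS \
  --notification-triggers ALARM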

Test to make sure that you actually receive a notification when the alarm threshold is breached.

  1. To test that the notifications are configured correctly, click the three dot menu for your alarm.
  2. From the drop down menu, choose Test alarm notification. After a minute or two, you should get both a text and an email notification.

Triggering the alarm with a simulated instance failure

So, at this point you have an alarm set, and you know that the notifications are working properly. In this final step, you simulate a failure to ensure that it’s detected correctly by Lightsail, and that the alarm then sends the appropriate notifications.

To simulate a failure, you’re simply going to stop one of the instances attached to your load balancer by following the steps below.

  1. Return to the Lightsail home page.
  2. Click on the three dot menu for one of your instances and click Stop
  3. Click Stop again to confirm.
  4. Return to the load balancer metrics page, and notice that after a few minutes the graph shows a drop in the healthy host count. Shortly after the graph updates, the alarm should trigger, and you should receive your text and email notifications. Note: Since the alarm threshold is evaluated over a 5-minute period, it may take a few minutes for the alarm to trigger and notifications to be sent.
  5. Once you receive your text and/or email notifications, return to the Lightsail home page. Notice there is also a notification banner at the top of the page.
  6. Click on the three dot menu for the instance you previously stopped, and click Start.
  7. Return to the load balancer metrics screen and wait for the graph to update and show three healthy instances. You may need to refresh your web browser.

As mentioned previously you can create alarms for a wide variety of metrics across databases, load balancers, and instances. For a complete list of metrics, check out the Lightsail documentation.

Conclusion

So, that’s all there is to it. If you spun up new instances for this simulation, make sure to terminate them to eliminate extra costs. If you’ve already got some critical resources running in Lightsail, now is a good time to set up some alarms and notifications. If you don’t have anything currently running in Lightsail, why not take advantage of our free 30-day offer for Lightsail instances and build something today? Need more? Reach out to me @mikegcoleman, and check out Lightsail’s landing page for other resources and tutorials.

AWS Lambda now supports Ruby 2.7

Post Syndicated from Benjamin Smith original https://aws.amazon.com/blogs/compute/aws-lambda-now-supports-ruby-2-7/

You can now develop your AWS Lambda functions using Ruby 2.7. Start using this runtime today by specifying a runtime parameter value of ruby2.7 when creating or updating Lambda functions.

New Ruby runtime features

Ruby 2.7 is a stable release and brings several new features, including pattern matching, argument forwarding, and numbered arguments.

Pattern matching

Pattern matching, a widely used feature in functional programming languages, is introduced as a new experimental feature. This allows deep matching of structured values, checking the structure and binding the matched parts to local variables.

It can traverse a given object and assign its values to local variables if it matches a pattern:

require "json"
 
json = <<END
{
  "name": "Alice",
  "age": 30,
  "children": [{ "name": "Bob", "age": 2 }]
}
END
 
case JSON.parse(json, symbolize_names: true)
in {name: "Alice", children: [{name: "Bob", age: age}]}
  p age #=> 2
end

For additional information on pattern matching, see Feature #14912.

Argument forwarding

Prior to Ruby 2.7, the * and ** operators were available for positional and keyword arguments. These are used to accept any number of arguments, or to convert arrays or hashes into several arguments.

Ruby 2.7 adds a new shorthand syntax ... for forwarding all arguments to a method irrespective of type. In the example below, all arguments to foo are forwarded to bar, including keyword and block arguments. It acts similarly to calling super without any arguments.

def foo(...)
  bar(...)
end

Numbered arguments

Numbered arguments allow you to reference block arguments solely by their index. They are only valid when referenced inside of a block:

Before:

[1, 2, 3].each { |i| puts i }

After:

[1, 2, 3].each { puts _1 }

This can make short code blocks easier to read and reduce code repetition.

Amazon Linux 2

Ruby 2.7, like Python 3.8, Node.js 10 and 12, and Java 11, is based on an Amazon Linux 2 execution environment. Amazon Linux 2 provides a secure, stable, and high-performance execution environment to develop and run cloud and enterprise applications.

Next steps

Get started building with Ruby 2.7 today by specifying a runtime parameter value of ruby2.7 when creating your Lambda functions. You can read about the Ruby programming model in the AWS Lambda documentation to learn more about writing functions in Ruby 2.7.

For existing Ruby functions, migrate to the new runtime by making any necessary changes to the code for compatibility with Ruby 2.7, then changing the function’s runtime configuration to ruby2.7.
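For example, switching an existing function to the new runtime with the AWS CLI is a one-liner (the function name below is a placeholder):

aws lambda update-function-configuration \
  --function-name my-existing-function \
  --runtime ruby2.7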

Enjoy, go build with Ruby!

Estimating Amazon EC2 instance needed when migrating ERP from IBM Power

Post Syndicated from Martin Yip original https://aws.amazon.com/blogs/compute/estimating-amazon-ec2-instance-needed-when-migrating-erp-from-ibm-power/

This post courtesy of CK Tan, AWS, Enterprise Migration Architect – APAC

Today there are many enterprise customers who are keen on migrating their mission critical Enterprise Resource Planning (ERP) applications such as Oracle E-Business Suite (Oracle EBS) or SAP running on IBM Power System from on-premises to the AWS cloud platform. They realize running critical ERP applications on AWS will enable their business to be more agile, cost-effective, and secure than on premises. AWS provides cloud native services that streamline your ability to adopt emerging technologies, enabling greater innovation, and faster time-to-value.

However, it is essential to note that IBM Power Systems and Amazon Elastic Compute Cloud (Amazon EC2) use different CPU processor architectures. Amazon EC2 instances run on either x86 or Arm-based processors, while IBM Power Systems run on IBM Power processors. As a result, there are no direct performance benchmarks mapping the two platforms, nor sizing guidance for enterprise customers who intend to migrate applications running on IBM Power Systems to Amazon EC2.

The purpose of this blog is to share a methodology, along with an example, of how to estimate the sizing needed to move mission-critical applications from IBM Power Systems to Amazon EC2 instances running on the x86 platform.

Methodology

In order to normalize the CPU performance between IBM Power and Intel x86 processors, we may refer to SAP Application Performance Standard (SAPS) for performance benchmark purposes. SAPS is a hardware-independent unit of measurement that describes the performance of a system configuration in the SAP environment. It is derived from the Sales and Distribution (SD) two-tier benchmark, where 100 SAPS is defined as 2,000 fully business processed order line items per hour.

In technical terms, this throughput is achieved by processing 6,000 dialog steps (screen changes), 2,000 postings per hour in the SD Benchmark, or 2,400 SAP transactions.

In the SD benchmark, fully business processed means the full business process of an order line item: creating the order, creating a delivery note for the order, displaying the order, changing the delivery, posting a goods issue, listing orders, and creating an invoice.

In this methodology, we compare the performance sizing by using SAPS which is not released by other ERP software such as Oracle EBS. However, we believe that this methodology will be most pertinent of indirect performance benchmark from an ERP software perspective as most of the enterprise businesses have similar process of ordering.

Discussion by Example

We will walk you through a real example to demonstrate how you can translate and migrate an SAP ERP 6.0 or Oracle EBS application database that is running on IBM Power 795 (IBM P795) with IBM AIX from on-premises to AWS cloud environment by choosing the right size of Amazon EC2 instances.

Step 1: IBM P795 System Performance Analysis

Users can leverage the IBM nmon Analyzer, a free tool that produces AIX performance analysis reports for systems running on IBM Power. The tool is designed to work with the latest version of nmon, but it is also tested with older versions for backwards compatibility. The tool is updated whenever nmon is updated, and at irregular intervals for new functionality. It allows users to:

  • View the data in spreadsheet form
  • Eliminate “bad” data
  • Automatically produce graphs for presentation to clients

Below is the performance analysis captured with IBM nmon Analyzer for CPU, memory, disk throughput, disk input/output (I/O), and network I/O during the database’s peak demand at month-end closing.

Figure 1: Actual Number of Physical CPU Cores Utilized on IBM P795

Figure 2: Memory Usage Analysis on IBM P795

Figure 3: Total Disk Throughput Analysis on IBM P795

Figure 4: Total Disk I/O on IBM P795

Figure 5: Network I/O Analysis on IBM P795

Step 2: CPU Normalization

Users can find published SAPS results for different hardware manufacturers in SAP’s standard application benchmark results. Table 1 below shows how to calculate the processing power of a single CPU core for each system. We have chosen the Amazon EC2 instances r5.metal and r4.16xlarge as potential candidates because they provide memory sizes that best fit the current demand of 408 GB (refer to Figure 2).

Instance Type     SAPS (2-Tier)   Memory (GB)   Total CPU Cores   SAPS/Core
IBM P795          688,630         4,096         256               2,690
EC2 r5.metal      140,000         768           48                2,917
EC2 r4.16xlarge   76,400          512           36                2,122

Table 1: Calculate the SAPS Processing Power for Single Core of Physical CPU

In order to calculate the normalization factor for any targeted Amazon EC2 instance with respect to the source IBM Power system, we apply Equation (1):

Normalization Factor = [SAPS per core of Amazon EC2 instance] / [SAPS per core of IBM Power system]     (1)

Therefore, by inserting the SAPS/Core values calculated in Table 1 into Equation (1):

  • The normalization factor for one core of the Amazon EC2 r5.metal instance relative to IBM P795 is 1.08
  • The normalization factor for one core of the Amazon EC2 r4.16xlarge instance relative to IBM P795 is 0.79 (the arithmetic is reproduced in the quick check after this list)
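As a quick sanity check, those two factors can be reproduced directly from the SAPS/Core column of Table 1:

# SAPS/Core of the EC2 instance divided by SAPS/Core of IBM P795
awk 'BEGIN { printf "r5.metal:    %.2f\n", 2917 / 2690 }'   # prints 1.08
awk 'BEGIN { printf "r4.16xlarge: %.2f\n", 2122 / 2690 }'   # prints 0.79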

Table 2 below shows how to calculate the normalized total number of CPU cores allocated to each system. We need to make sure that the normalized total number of CPU cores of the selected Amazon EC2 instance is greater than or equal to the assigned entitlement capacity of the IBM P795.

 

Instance Type     Normalization Factor (A)   Total # CPU Cores (B)   Normalized Total # CPU Cores (A x B)
IBM P795          1.00                       32                      32
EC2 r5.metal      1.08                       48                      52
EC2 r4.16xlarge   0.79                       36                      28

Table 2: Normalized Total Number of CPU Cores

Figure 1 indicates that the peak processing power needed by the ERP database running on IBM P795 is 29 physical CPU cores, compared to the assigned entitlement capacity of 32. As a result, the Amazon EC2 r5.metal instance is the right candidate: it delivers sufficient processing power, with enough buffer for business growth over the next 12 to 24 months.

Step 3: Amazon EC2 Right Sizing

Next, we need to check that other specifications such as network I/O, disk throughput, and disk I/O also meet the current peak demand of the application. Table 3 shows the relevant specifications of r5.metal. It is clear from Table 3 that r5.metal meets all the demands of the ERP database running on IBM P795.

Specification                Amazon EC2 r5.metal
Physical CPU cores           48 > 29 (current CPU utilization)
RAM (GiB)                    768 > 408 (current memory usage)
Network performance (GBps)   3 > 0.3 (current network I/O throughput)
Dedicated EBS BW (GBps)      ~2.4 > 2.0 (current disk throughput)
Maximum IOPS (16 KiB I/O)    80,000 > 25,000 (current disk I/O consumption)

Table 3: EC2 r5.metal Specifications

Conclusion

The methodology and example discussed in this blog give a pragmatic estimate for selecting the right type and size of Amazon EC2 instances when you plan to migrate a mission-critical ERP application running on IBM Power Systems from an on-premises data center to the AWS cloud. This methodology has been applied to many enterprise cloud transformation projects and delivered more predictable performance with significant TCO savings. Additionally, it can be adopted for capacity planning and helps enterprises establish strong business justifications for heterogeneous platform migration.

EC2 Price Reduction in the São Paulo Region (R5 and I3)

Post Syndicated from Julien Simon original https://aws.amazon.com/blogs/aws/ec2-price-reduction-in-the-sao-paulo-region-r5-and-i3/

I’ve got good news for AWS customers using our South America (São Paulo) Region!

Effective February 1, 2020, we are reducing prices for On-Demand, Reserved, and Dedicated Instances as follows:

  • All R5 families (R5, R5a, R5d, R5ad) – Up to 25%.
  • All I3 families (I3, I3en) – 13%.

The pricing pages have been updated.

Questions?
If you need assistance or have feedback, please reach out to your usual AWS support contacts, or post a message in the AWS Forum for Amazon EC2.

– Julien

Fact-checking GigaOm’s Microsoft-sponsored benchmark claims

Post Syndicated from Fred Wurden original https://aws.amazon.com/blogs/compute/fact-checking-gigaoms-microsoft-sponsored-benchmark-claims/

SQL Server on AWS delivers 40% price/performance advantage over Azure

In this blog, we review a recent benchmark that Microsoft sponsored and GigaOm published on 12/2/2019. This benchmark is not credible because Microsoft and GigaOm used configurations of AWS that generate weaker performance, they have not been transparent about how it was run, and the results are not reproducible.

AWS is committed to providing objective, transparent, and replicable benchmarking data so you can make an informed decision for why to run SQL Server on AWS. The latest AWS performance benchmark shows that AWS has up to a 1.75x performance advantage and up to 40% price/performance advantage over Azure (see Appendix).

The GigaOm/Microsoft benchmark is not an accurate, head-to-head comparison. It claims that Azure has over a 3x performance and 80% price/performance advantage compared to AWS. This claim by GigaOm and Microsoft is not reproducible, and was created by utilizing a modified TPC-E benchmark that uses Microsoft’s proprietary benchmark tool. These benchmark results are also inaccurate comparisons due to significant mismatches in the configurations and the price calculations for Azure and AWS. These inaccuracies include:

  1. The benchmark uses four striped disks for Azure, but did not apply any striping to AWS. Striping is a technique that is commonly used to enhance performance.
  2. The benchmark uses an older generation AWS instance (R4), ignoring AWS hardware innovations that the latest generation comparable instance R5d delivers. R5d also has up to 3.6TB local storage for additional performance benefits.
  3. The benchmark leaves out significant cost components for Microsoft, resulting in an inaccurate price/performance result. The Microsoft cost does not account for the original cost of the Windows Server licenses or the Software Assurance required to use Azure Hybrid Benefits. It also does not take into account the AWS programs that provide similar benefits, such as the Migration Acceleration Program (MAP).

AWS ran a performance analysis with comparable instance type and storage configurations using the publicly available TPC-C HammerDB benchmark tool. The benchmark running SQL Server on AWS delivers 1.75x better performance and delivers up to 40% better price/performance than Azure. You can use the same HammerDB benchmark tool and run your own TPC-C tests to replicate the latest results using this whitepaper. Similarly, you can use the same tool to replicate prior TPC-C like benchmarks from DB Best which showed that AWS delivered 2-3x better performance over Azure.

Performance is just one of the reasons customers such as NextGen Healthcare and Pearson choose AWS to run their Windows workloads, and according to a report by IDC, we host nearly two times as many Windows Server instances in the cloud as Microsoft. More and more enterprises are entrusting their Windows workloads to AWS because of its greater reliability, security, and performance. Publishing misleading benchmarks is just one more old-guard tactic by Microsoft, in addition to license complexity and licensing restrictions, to try to prevent customers from using the best cloud for their Windows workloads.

It is important that you have the facts when making a decision about your cloud provider. AWS encourages you to look for replicable research and benchmark your own workloads that help you verify which provider offers the best price, performance, and reliability. Don’t be misguided by vendor claims that cannot be validated.

Appendix

HammerDB TPC-C tests run internally on the same server and storage configurations as the GigaOm report deliver the following performance improvements over Azure when more optimal options are chosen:

About the Author

Fred Wurden is the GM of Enterprise Engineering (Windows, VMware, RedHat, SAP, benchmarking) working to make AWS the most customer-centric cloud platform on Earth. Prior to AWS, Fred worked at Microsoft for 17 years and held positions, including: EU/DoJ engineering compliance for Windows and Azure, interoperability principles and partner engagements, and open source engineering. He lives with his wife and a few four-legged friends since his kids are all in college now.

Update on Amazon Linux AMI end-of-life

Post Syndicated from Julien Simon original https://aws.amazon.com/blogs/aws/update-on-amazon-linux-ami-end-of-life/

Launched in September 2010, the Amazon Linux AMI has helped numerous customers build Linux-based applications on Amazon Elastic Compute Cloud (EC2). In order to bring them even more security, stability, and productivity, we introduced Amazon Linux 2 in 2017. Adding many modern features, Amazon Linux 2 is backed by long-term support, and we strongly encourage you to use it for your new applications.

As documented in the FAQ, the last version of the Amazon Linux AMI (2018.03) was scheduled to reach end-of-life on June 30, 2020. Based on customer feedback, we are extending the end-of-life date, and we’re also announcing a maintenance support period.

End-of-life Extension
The end-of-life for Amazon Linux AMI is now extended to December 31, 2020: until then, we will continue to provide security updates and refreshed versions of packages as needed.

Maintenance Support
Beyond December 31, 2020, the Amazon Linux AMI will enter a new maintenance support period that extends to June 30, 2023.

During this maintenance support period:

  • The Amazon Linux AMI will only receive critical and important security updates for a reduced set of packages.
  • It will no longer be guaranteed to support new EC2 platform capabilities, or new AWS features.

Supported packages will include:

  • The Linux kernel,
  • Low-level system libraries such as glibc and openssl,
  • Popular packages that are still in a supported state in their upstream sources, such as MySQL and PHP.

We will provide a detailed list of supported and unsupported packages in future posts.

Questions?
If you need assistance or have feedback, please reach out to your usual AWS support contacts, or post a message in the AWS Forum for Amazon Linux. Thank you for using Amazon Linux AMI!

– Julien

 

ICYMI: Serverless Q4 2019

Post Syndicated from Rob Sutter original https://aws.amazon.com/blogs/compute/icymi-serverless-q4-2019/

Welcome to the eighth edition of the AWS Serverless ICYMI (in case you missed it) quarterly recap. Every quarter, we share the most recent product launches, feature enhancements, blog posts, webinars, Twitch live streams, and other interesting things that you might have missed!

In case you missed our last ICYMI, check out what happened last quarter here.

The three months comprising the fourth quarter of 2019

AWS re:Invent

AWS re:Invent 2019

re:Invent 2019 dominated the fourth quarter at AWS. The serverless team presented a number of talks, workshops, and builder sessions to help customers increase their skills and deliver value more rapidly to their own customers.

Serverless talks from re:Invent 2019

Chris Munns presenting 'Building microservices with AWS Lambda' at re:Invent 2019

We presented dozens of sessions showing how customers can improve their architecture and agility with serverless. Here are some of the most popular.

Videos

Decks

You can also find decks for many of the serverless presentations and other re:Invent presentations on our AWS Events Content page.

AWS Lambda

For developers needing greater control over the performance of their serverless applications at any scale, AWS Lambda announced Provisioned Concurrency at re:Invent. This feature enables Lambda functions to execute with consistent start-up latency, making them ideal for building latency-sensitive applications.

As shown in the below graph, provisioned concurrency reduces tail latency, directly impacting response times and providing a more responsive end user experience.

Graph showing performance enhancements with AWS Lambda Provisioned Concurrency

Lambda rolled out enhanced VPC networking to 14 additional Regions around the world. This change brings dramatic improvements to startup performance for Lambda functions running in VPCs due to more efficient usage of elastic network interfaces.

Illustration of AWS Lambda VPC to VPC NAT

New VPC to VPC NAT for Lambda functions

Lambda now supports three additional runtimes: Node.js 12, Java 11, and Python 3.8. Each of these new runtimes has new version-specific features and benefits, which are covered in the linked release posts. Like the Node.js 10 runtime, these new runtimes are all based on an Amazon Linux 2 execution environment.

Lambda released a number of controls for both stream and async-based invocations:

  • You can now configure error handling for Lambda functions consuming events from Amazon Kinesis Data Streams or Amazon DynamoDB Streams. It’s now possible to limit the retry count, limit the age of records being retried, configure a failure destination, or split a batch to isolate a problem record. These capabilities help you deal with potential “poison pill” records that would previously cause streams to pause in processing.
  • For asynchronous Lambda invocations, you can now set the maximum event age and retry attempts on the event. If either configured condition is met, the event can be routed to a dead letter queue (DLQ), Lambda destination, or it can be discarded.

AWS Lambda Destinations is a new feature that allows developers to designate an asynchronous target for Lambda function invocation results. You can set separate destinations for success and failure. This unlocks new patterns for distributed event-based applications and can replace custom code previously used to manage routing results.

Illustration depicting AWS Lambda Destinations with success and failure configurations

Lambda Destinations
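As a rough illustration, the asynchronous invocation controls and destinations described above can be configured with the AWS SDK for Python. This is a sketch, not the exact setup from any launch post; the function name and SQS queue ARNs are placeholders:

import boto3

lambda_client = boto3.client("lambda")

# Configure max event age, retries, and success/failure destinations
# for asynchronous invocations of a hypothetical function "my-function"
lambda_client.put_function_event_invoke_config(
    FunctionName="my-function",
    MaximumEventAgeInSeconds=3600,   # discard events older than one hour
    MaximumRetryAttempts=1,          # retry failed async invocations once
    DestinationConfig={
        "OnSuccess": {"Destination": "arn:aws:sqs:us-east-1:123456789012:success-queue"},
        "OnFailure": {"Destination": "arn:aws:sqs:us-east-1:123456789012:failure-queue"},
    },
)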

Lambda also now supports setting a Parallelization Factor, which allows you to set multiple Lambda invocations per shard for Kinesis Data Streams and DynamoDB Streams. This enables faster processing without the need to increase your shard count, while still guaranteeing the order of records processed.

Illustration of multiple AWS Lambda invocations per Kinesis Data Streams shard

Lambda Parallelization Factor diagram
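The stream controls mentioned earlier (retry limits, record age, bisect-on-error, an on-failure destination) and the Parallelization Factor are all properties of the event source mapping. Here is a hedged boto3 sketch; the mapping UUID and queue ARN are placeholders:

import boto3

lambda_client = boto3.client("lambda")

# Tune a Kinesis/DynamoDB Streams event source mapping (UUID is a placeholder)
lambda_client.update_event_source_mapping(
    UUID="11111111-2222-3333-4444-555555555555",
    MaximumRetryAttempts=2,              # stop retrying a failing batch after two attempts
    MaximumRecordAgeInSeconds=3600,      # skip records older than one hour
    BisectBatchOnFunctionError=True,     # split a failing batch to isolate the bad record
    ParallelizationFactor=2,             # two concurrent invocations per shard
    DestinationConfig={
        "OnFailure": {"Destination": "arn:aws:sqs:us-east-1:123456789012:stream-dlq"}
    },
)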

Lambda introduced Amazon SQS FIFO queues as an event source. “First in, first out” (FIFO) queues guarantee the order of record processing, unlike standard queues. FIFO queues support message batching via a MessageGroupId attribute, which enables parallel Lambda consumers of a single FIFO queue and high-throughput record processing by Lambda.

Lambda now supports Environment Variables in the AWS China (Beijing) Region and the AWS China (Ningxia) Region.

You can now view percentile statistics for the duration metric of your Lambda functions. Percentile statistics show the relative standing of a value in a dataset, and are useful when applied to metrics that exhibit large variances. They can help you understand the distribution of a metric, discover outliers, and find hard-to-spot situations that affect customer experience for a subset of your users.
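For example, you could retrieve p95 and p99 duration for a function with a boto3 call along these lines; the function name is a placeholder:

import boto3
from datetime import datetime, timedelta

cloudwatch = boto3.client("cloudwatch")

# Fetch p95/p99 duration for the last 24 hours, in 5-minute periods
response = cloudwatch.get_metric_statistics(
    Namespace="AWS/Lambda",
    MetricName="Duration",
    Dimensions=[{"Name": "FunctionName", "Value": "my-function"}],
    StartTime=datetime.utcnow() - timedelta(hours=24),
    EndTime=datetime.utcnow(),
    Period=300,
    ExtendedStatistics=["p95", "p99"],
)
for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["ExtendedStatistics"])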

Amazon API Gateway

Screen capture of creating an Amazon API Gateway HTTP API in the AWS Management Console

Amazon API Gateway announced the preview of HTTP APIs. In addition to significant performance improvements, most customers see an average cost savings of 70% when compared with API Gateway REST APIs. With HTTP APIs, you can create an API in four simple steps. Once the API is created, additional configuration for CORS and JWT authorizers can be added.
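The quick-create flow can also be driven from the AWS SDK. The following is a sketch against the preview API; the Lambda ARN is a placeholder and the parameters shown are assumptions that may evolve as the preview matures:

import boto3

apigw = boto3.client("apigatewayv2")

# Quick-create an HTTP API that proxies all requests to a Lambda function
api = apigw.create_api(
    Name="my-http-api",
    ProtocolType="HTTP",
    Target="arn:aws:lambda:us-east-1:123456789012:function:my-function",
)
print(api["ApiEndpoint"])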

AWS SAM CLI

Screen capture of the new 'sam deploy' process in a terminal window

The AWS SAM CLI team simplified the bucket management and deployment process in the SAM CLI. You no longer need to manage a bucket for deployment artifacts – SAM CLI handles this for you. The deployment process has also been streamlined from multiple flagged commands to a single command, sam deploy.

AWS Step Functions

One powerful feature of AWS Step Functions is its ability to integrate directly with AWS services without you needing to write complicated application code. In Q4, Step Functions expanded its integration with Amazon SageMaker to simplify machine learning workflows. Step Functions also added a new integration with Amazon EMR, making EMR big data processing workflows faster to build and easier to monitor.

Screen capture of an AWS Step Functions step with Amazon EMR

Step Functions step with EMR

Step Functions now provides the ability to track state transition usage by integrating with AWS Budgets, allowing you to monitor trends and react to usage on your AWS account.

You can now view CloudWatch Metrics for Step Functions at a one-minute frequency. This makes it easier to set up detailed monitoring for your workflows. You can use one-minute metrics to set up CloudWatch Alarms based on your Step Functions API usage, Lambda functions, service integrations, and execution details.

Step Functions now supports higher throughput workflows, making it easier to coordinate applications with high event rates. This increases the limits to 1,500 state transitions per second and a default start rate of 300 state machine executions per second in US East (N. Virginia), US West (Oregon), and Europe (Ireland). Click the above link to learn more about the limit increases in other Regions.

Screen capture of choosing Express Workflows in the AWS Management Console

Step Functions released AWS Step Functions Express Workflows. With the ability to support event rates greater than 100,000 per second, this feature is designed for high-performance workloads at a reduced cost.
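Choosing the workflow type is a property of the state machine itself. A minimal boto3 sketch, with a placeholder role ARN and a trivial one-state definition:

import boto3
import json

sfn = boto3.client("stepfunctions")

# Create an Express Workflow with a minimal pass-through definition
sfn.create_state_machine(
    name="my-express-workflow",
    definition=json.dumps({
        "StartAt": "Hello",
        "States": {"Hello": {"Type": "Pass", "End": True}},
    }),
    roleArn="arn:aws:iam::123456789012:role/my-step-functions-role",
    type="EXPRESS",
)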

Amazon EventBridge

Illustration of the Amazon EventBridge schema registry and discovery service

Amazon EventBridge announced the preview of the Amazon EventBridge schema registry and discovery service. This service allows developers to automate discovery and cataloging event schemas for use in their applications. Additionally, once a schema is stored in the registry, you can generate and download a code binding that represents the schema as an object in your code.

Amazon SNS

Amazon SNS now supports the use of dead letter queues (DLQ) to help capture unhandled events. By enabling a DLQ, you can catch events that are not processed and re-submit them or analyze to locate processing issues.
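A DLQ is attached at the subscription level via a redrive policy. Here is a hedged boto3 sketch with placeholder ARNs:

import boto3
import json

sns = boto3.client("sns")

# Attach an SQS queue as a dead letter queue for an SNS subscription
sns.set_subscription_attributes(
    SubscriptionArn="arn:aws:sns:us-east-1:123456789012:my-topic:subscription-id",
    AttributeName="RedrivePolicy",
    AttributeValue=json.dumps(
        {"deadLetterTargetArn": "arn:aws:sqs:us-east-1:123456789012:my-sns-dlq"}
    ),
)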

Amazon CloudWatch

Amazon CloudWatch announced Amazon CloudWatch ServiceLens to provide a “single pane of glass” to observe health, performance, and availability of your application.

Screenshot of Amazon CloudWatch ServiceLens in the AWS Management Console

CloudWatch ServiceLens

CloudWatch also announced a preview of a capability called Synthetics. CloudWatch Synthetics allows you to test your application endpoints and URLs using configurable scripts that mimic what a real customer would do. This enables the outside-in view of your customers’ experiences, and your service’s availability from their point of view.

CloudWatch introduced Embedded Metric Format, which helps you ingest complex high-cardinality application data as logs and easily generate actionable metrics. You can publish these metrics from your Lambda function by using the PutLogEvents API or using an open source library for Node.js or Python applications.
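To illustrate the format, here is a sketch of a Python Lambda handler emitting a single custom metric as a structured log line. The namespace, dimension, and metric names are examples, not values from any AWS sample:

import json
import time

def handler(event, context):
    # Emit one Embedded Metric Format record; CloudWatch extracts the metric from the log line
    print(json.dumps({
        "_aws": {
            "Timestamp": int(time.time() * 1000),
            "CloudWatchMetrics": [{
                "Namespace": "MyApplication",
                "Dimensions": [["Service"]],
                "Metrics": [{"Name": "ProcessingLatencyMs", "Unit": "Milliseconds"}],
            }],
        },
        "Service": "order-import",
        "ProcessingLatencyMs": 142.3,
        "requestId": context.aws_request_id,
    }))
    return {"statusCode": 200}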

Finally, CloudWatch announced a preview of Contributor Insights, a capability to identify who or what is impacting your system or application performance by identifying outliers or patterns in log data.

AWS X-Ray

AWS X-Ray announced trace maps, which enable you to map the end-to-end path of a single request. Identifiers show issues and how they affect other services in the request’s path. These can help you to identify and isolate service points that are causing degradation or failures.

X-Ray also announced support for Amazon CloudWatch Synthetics, currently in preview. CloudWatch Synthetics on X-Ray support tracing canary scripts throughout the application, providing metrics on performance or application issues.

Screen capture of AWS X-Ray Service map in the AWS Management Console

X-Ray Service map with CloudWatch Synthetics

Amazon DynamoDB

Amazon DynamoDB announced support for customer-managed customer master keys (CMKs) to encrypt data in DynamoDB. This allows you to bring your own key (BYOK), giving you full control over how you encrypt and manage the security of your DynamoDB data.
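Switching a table to a customer-managed CMK is an update to the table's SSE specification. A hedged boto3 sketch, with a placeholder table name and key alias:

import boto3

dynamodb = boto3.client("dynamodb")

# Encrypt an existing table with a customer-managed CMK
dynamodb.update_table(
    TableName="my-table",
    SSESpecification={
        "Enabled": True,
        "SSEType": "KMS",
        "KMSMasterKeyId": "alias/my-dynamodb-cmk",  # customer-managed key
    },
)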

It is now possible to add global replicas to existing DynamoDB tables to provide enhanced availability across the globe.

Another new DynamoDB capability to identify frequently accessed keys and database traffic trends is currently in preview. With this, you can now more easily identify “hot keys” and understand usage of your DynamoDB tables.

Screen capture of Amazon CloudWatch Contributor Insights for DynamoDB in the AWS Management Console

CloudWatch Contributor Insights for DynamoDB

DynamoDB also released adaptive capacity. Adaptive capacity helps you handle imbalanced workloads by automatically isolating frequently accessed items and shifting data across partitions to rebalance them. This helps reduce cost by enabling you to provision throughput for a more balanced workload instead of over provisioning for uneven data access patterns.

Amazon RDS

Amazon Relational Database Services (RDS) announced a preview of Amazon RDS Proxy to help developers manage RDS connection strings for serverless applications.

Illustration of Amazon RDS Proxy

The RDS Proxy maintains a pool of established connections to your RDS database instances. This pool enables you to support a large number of application connections so your application can scale without compromising performance. It also increases security by enabling IAM authentication for database access and enabling you to centrally manage database credentials using AWS Secrets Manager.

AWS Serverless Application Repository

The AWS Serverless Application Repository (SAR) now offers Verified Author badges. These badges enable consumers to quickly and reliably know who you are. The badge appears next to your name in the SAR and links to your GitHub profile.

Screen capture of SAR Verified developer badge in the AWS Management Console

SAR Verified developer badges

AWS Developer Tools

AWS CodeCommit launched the ability for you to enforce rule workflows for pull requests, making it easier to ensure that code has passed specific rule requirements. You can now create an approval rule specifically for a pull request, or create approval rule templates to be applied to all future pull requests in a repository.

AWS CodeBuild added beta support for test reporting. With test reporting, you can now view the detailed results, trends, and history for tests executed on CodeBuild for any framework that supports the JUnit XML or Cucumber JSON test format.

Screen capture of AWS CodeBuild

CodeBuild test trends in the AWS Management Console

Amazon CodeGuru

AWS announced a preview of Amazon CodeGuru at re:Invent 2019. CodeGuru is a machine learning based service that makes code reviews more effective and aids developers in writing code that is more secure, performant, and consistent.

AWS Amplify and AWS AppSync

AWS Amplify added iOS and Android as supported platforms. Now developers can build iOS and Android applications using the Amplify Framework with the same category-based programming model that they use for JavaScript apps.

Screen capture of 'amplify init' for an iOS application in a terminal window

The Amplify team has also improved offline data access and synchronization by announcing Amplify DataStore. Developers can now create applications that allow users to continue to access and modify data, without an internet connection. Upon connection, the data synchronizes transparently with the cloud.

For a summary of Amplify and AppSync announcements before re:Invent, read: “A round up of the recent pre-re:Invent 2019 AWS Amplify Launches”.

Illustration of AWS AppSync integrations with other AWS services

Q4 serverless content

Blog posts

October

November

December

Tech talks

We hold several AWS Online Tech Talks covering serverless tech talks throughout the year. These are listed in the Serverless section of the AWS Online Tech Talks page.

Here are the ones from Q4:

Twitch

October

There are also a number of other helpful video series covering Serverless available on the AWS Twitch Channel.

AWS Serverless Heroes

We are excited to welcome some new AWS Serverless Heroes to help grow the serverless community. We look forward to some amazing content to help you with your serverless journey.

AWS Serverless Application Repository (SAR) Apps

In this edition of ICYMI, we are introducing a section devoted to SAR apps written by the AWS Serverless Developer Advocacy team. You can run these applications and review their source code to learn more about serverless and to see examples of suggested practices.

Still looking for more?

The Serverless landing page has much more information. The Lambda resources page contains case studies, webinars, whitepapers, customer stories, reference architectures, and even more Getting Started tutorials. We’re also kicking off a fresh series of Tech Talks in 2020 with new content providing greater detail on everything new coming out of AWS for serverless application developers.

Throughout 2020, the AWS Serverless Developer Advocates are crossing the globe to tell you more about serverless, and to hear more about what you need. Follow this blog to keep up on new launches and announcements, best practices, and examples of serverless applications in action.

You can also follow all of us on Twitter to see the latest news, follow conversations, and interact with the team.

Chris Munns: @chrismunns
Eric Johnson: @edjgeek
James Beswick: @jbesw
Moheeb Zara: @virgilvox
Ben Smith: @benjamin_l_s
Rob Sutter: @rts_rob
Julian Wood: @julian_wood

Happy coding!

Orchestrating a security incident response with AWS Step Functions

Post Syndicated from Benjamin Smith original https://aws.amazon.com/blogs/compute/orchestrating-a-security-incident-response-with-aws-step-functions/

In this post I will show how to implement the callback pattern of an AWS Step Functions Standard Workflow. This is used to add a manual approval step into an automated security incident response framework. The framework could be extended to remediate automatically, according to the individual policy actions defined. For example, applying alternative actions, or restricting actions to specific ARNs.

The application uses Amazon EventBridge to trigger a Step Functions Standard Workflow on an IAM policy creation event. The workflow compares the policy action against a customizable list of restricted actions. It uses AWS Lambda and Step Functions to roll back the policy temporarily, then notify an administrator and wait for them to approve or deny.

Figure 1: High-level architecture diagram.

Important: the application uses various AWS services, and there are costs associated with these services after the Free Tier usage. Please see the AWS pricing page for details.

You can deploy this application from the AWS Serverless Application Repository. You then create a new IAM Policy to trigger the rule and run the application.

Deploy the application from the Serverless Application Repository

  1. Find the “Automated-IAM-policy-alerts-and-approvals” app in the Serverless Application Repository.
  2. Complete the required application settings
    • Application name: an identifiable name for the application.
    • EmailAddress: an administrator’s email address for receiving approval requests.
    • restrictedActions: the IAM Policy actions you want to restrict.

      Figure 2 Deployment Fields

  3. Choose Deploy.

Once the deployment process is completed, 21 new resources are created. This includes:

  • Five Lambda functions that contain the business logic.
  • An Amazon EventBridge rule.
  • An Amazon SNS topic and subscription.
  • An Amazon API Gateway REST API with two resources.
  • An AWS Step Functions state machine

To receive Amazon SNS notifications as the application administrator, you must confirm the subscription to the SNS topic. To do this, choose the Confirm subscription link in the verification email that was sent to you when deploying the application.

EventBridge receives new events in the default event bus. Here, the event is compared with associated rules. Each rule has an event pattern defined, which acts as a filter to match inbound events to their corresponding rules. In this application, a matching event rule triggers an AWS Step Functions execution, passing in the event payload from the policy creation event.

Running the application

Trigger the application by creating a policy either via the AWS Management Console or with the AWS Command Line Interface.

Using the AWS CLI

First install and configure the AWS CLI, then run the following command:

aws iam create-policy --policy-name my-bad-policy1234 --policy-document '{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "s3:GetBucketObjectLockConfiguration",
                "s3:DeleteObjectVersion",
                "s3:DeleteBucket"
            ],
            "Resource": "*"
        }
    ]
}'

Using the AWS Management Console

  1. Go to Services > Identity Access Management (IAM) dashboard.
  2. Choose Create policy.
  3. Choose the JSON tab.
  4. Paste the following JSON:
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "VisualEditor0",
                "Effect": "Allow",
                "Action": [
                    "s3:GetBucketObjectLockConfiguration",
                    "s3:DeleteObjectVersion",
                    "s3:DeleteBucket"
                ],
                "Resource": "*"
            }
        ]
    }
  5. Choose Review policy.
  6. In the Name field, enter my-bad-policy.
  7. Choose Create policy.

Either of these methods creates a policy with the permissions required to delete Amazon S3 buckets. Deleting an S3 bucket is one of the restricted actions set when the application is deployed:

Figure 3 default restricted actions

This sends the event to EventBridge, which then triggers the Step Functions state machine. The Step Functions state machine holds each state object in the workflow. Some of the state objects use the Lambda functions created during deployment to process data.

Others use Amazon States Language (ASL) enabling the application to conditionally branch, wait, and transition to the next state. Using a state machine decouples the business logic from the compute functionality.

After triggering the application, go to the Step Functions dashboard and choose the newly created state machine. Choose the current running state machine from the executions table.

Figure 4 State machine executions.

You see a visual representation of the current execution, with the workflow paused at the AskUser state.

Figure 5 Workflow Paused

These are the states in the workflow:

ModifyData
State Type: Pass
Re-structures the input data into an object that is passed throughout the workflow.

ValidatePolicy
State type: Task. Services: AWS Lambda
Invokes the ValidatePolicy Lambda function that checks the new policy document against the restricted actions.

ChooseAction
State type: Choice
Branches depending on input from ValidatePolicy step.

TempRemove
State type: Task. Service: AWS Lambda
Creates a new default version of the policy with only permissions for Amazon CloudWatch Logs and deletes the previously created policy version.

AskUser
State type: Task. Service: AWS Lambda
Sends an approval email to the user via SNS, including the task token that initiates the callback pattern.

UsersChoice
State type: Choice
Branch based on the user action to approve or deny.

Denied
State type: Pass
Ends the execution with no further action.

Approved
State type: Task. Service: AWS Lambda
Restores the initial policy document by creating as a new version.

AllowWithNotification
State type: Task. Services: AWS Lambda
With no restricted actions detected, the user is still notified of change (via an email from SNS) before execution ends.

The callback pattern

An important feature of this application is the ability for an administrator to approve or deny a new policy. The Step Functions callback pattern makes this possible.

The callback pattern allows a workflow to pause during a task and wait for an external process to return a task token. The task token is generated when the task starts. When the AskUser function is invoked, it is passed a task token. The task token is published to the SNS topic along with the API resources for approval and denial. These API resources are created when the application is first deployed.

When the administrator clicks on the approve or deny links, it passes the token with the API request to the receiveUser Lambda function. This Lambda function uses the incoming task token to resume the AskUser state.

The lifecycle of the task token as it transitions through each service is shown below:

Figure 6 Task token lifecycle

  1. To invoke this callback pattern, the askUser state definition is declared using the .waitForTaskToken identifier, with the task token passed into the Lambda function as a payload parameter:
    "AskUser":{
     "Type": "Task",
     "Resource": "arn:aws:states:::lambda:invoke.waitForTaskToken",
     "Parameters":{  
     "FunctionName": "${AskUser}",
     "Payload":{  
     "token.$":"$$.Task.Token"
      }
     },
      "ResultPath":"$.taskresult",
      "Next": "usersChoice"
      },
  2. The askUser Lambda function can then access this token within the event object:
    exports.handler = async (event, context) => {
        // Build the approve and deny URLs, appending the task token as a query string parameter
        let approveLink = `${process.env.APIAllowEndpoint}?token=${JSON.stringify(event.token)}`
        let denyLink = `${process.env.APIDenyEndpoint}?token=${JSON.stringify(event.token)}`
    //code continues
  3. The task token is published to an SNS topic along with the message text parameter:
    let params = {
        TopicArn: process.env.Topic,
        Message: `A restricted Policy change has been detected Approve:${approveLink} Or Deny:${denyLink}`
    }
    let res = await sns.publish(params).promise()
    //code continues
  4. The administrator receives an email with two links, one to approve and one to deny. The task token is appended to these links as a request query string parameter named token:

    Figure 7 Approve / deny email.

  5. Using the Amazon API Gateway proxy integration, the task token is passed directly to the receiveUser Lambda function from the API resource, and is accessible from within the function code as part of the event’s queryStringParameters object:
    exports.handler = async(event, context) => {
    //some code
        let taskToken = event.queryStringParameters.token
    //more code
    
  6. The token is then sent back to the AskUser state via an API call from within the receiveUser Lambda function. This API call also defines the next course of action for the workflow to take.
    //some code 
    let params = {
            output: JSON.stringify({"action":NextAction}),
            taskToken: taskTokenClean
        }
    let res = await stepfunctions.sendTaskSuccess(params).promise()
    //code continues
    

Each Step Functions execution can last for up to a year, allowing for long wait periods for the administrator to take action. There is no extra cost for a longer wait time as you pay for the number of state transitions, and not for the idle wait time.

Conclusion

Using EventBridge to route IAM policy creation events directly to AWS Step Functions reduces the need for unnecessary communication layers. It helps promote good use of compute resources, ensuring Lambda is used to transform data, and not transport or orchestrate.

Using Step Functions to invoke services sequentially has two important benefits for this application. First, you can identify the use of restricted policies quickly and automatically. Also, these policies can be removed and held in a ‘pending’ state until approved.

Step Functions Standard Workflow’s callback pattern can create a robust orchestration layer that allows administrators to review each change before approving or denying.

For the full code base see the GitHub repository https://github.com/bls20AWS/AutomatedPolicyOrchestrator.

For more information on other Step Functions patterns, see our documentation on integration patterns.

Continued support for Python 2.7 on AWS Lambda

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/continued-support-for-python-2-7-on-aws-lambda/

The Python Software Foundation (PSF), which is the Python language governing body, ended support for Python 2.7 on January 1, 2020. Additionally, many popular open source software packages also ended their support for Python 2.7 then.

What is AWS Lambda doing?

We recognize that Python 2 and 3 differ across several core language aspects, and the application binary interface is not always compatible. We also recognize that these differences can make migration challenging. To allow you additional time to prepare, AWS Lambda will continue to provide critical security patches for the Python 2.7 runtime until at least December 31, 2020. Lambda’s scope of support includes the Python interpreter and standard library, but does not extend to third-party packages.

What do I need to do?

There is no action required to continue using your existing Python 2.7 Lambda functions. Python 2.7 function creates and updates will continue to work as normal. However, we highly recommend that you migrate your Lambda functions to Python 3. AWS Lambda supports Python 3.6, 3.7, and 3.8. As always, you should test your functions for Python 3 language compatibility before applying changes to your production functions.

The Python community offers helpful guides and tools to help you port Python 2 code to Python 3:

What if I have issues or need help?

Please contact us through AWS Support or the AWS Developer Forums with any questions.

Happy coding!

Running ANSYS Fluent on Amazon EC2 C5n with Elastic Fabric Adapter (EFA)

Post Syndicated from Emma White original https://aws.amazon.com/blogs/compute/running-ansys-fluent-on-amazon-ec2-c5n-with-elastic-fabric-adapter-efa/

Written by: Nicola Venuti, HPC Specialist Solutions Architect 

In July 2019 I published “Best Practices for Running Ansys Fluent Using AWS ParallelCluster,” which demonstrated how to launch ANSYS Fluent on AWS using AWS ParallelCluster. In this blog, I discuss a newer AWS offering, the Elastic Fabric Adapter (EFA), and walk you through an example of using EFA for tightly coupled workloads. Finally, I demonstrate how you can accelerate your tightly coupled (MPI) workloads with EFA, which lowers your cost per job.

 

EFA is a new network interface for Amazon EC2 instances designed to accelerate tightly coupled (MPI) workloads on AWS. AWS announced EFA at re:Invent in 2018. If you want to learn more about EFA, you can read Jeff Barr’s blog post and watch this technical video led by our Principal Engineer, Brian Barrett.

 

After reading our first blog post, readers asked for benchmark results and, where possible, the cost associated with each job. So, in addition to a step-by-step guide on how to run your first ANSYS Fluent job with EFA, this blog post also shows the results (in terms of rating and scaling curve) up to 5,000 cores for a common ANSYS Fluent benchmark, the Formula-1 Race Car (140M-cell mesh), and compares the cost per job across the most suitable Amazon EC2 instance types.

 

Create your HPC Cluster

In this part of the blog, I walk you through the following: setting up the AWS ParallelCluster configuration file, setting up the post-install script, and deploying your HPC cluster.

 

Set up AWS ParallelCluster
I use AWS ParallelCluster in this example because it simplifies the deployment of HPC clusters on AWS. This AWS-supported, open source tool manages and deploys HPC clusters in the cloud. Additionally, AWS ParallelCluster is already integrated with EFA, which eliminates extra effort to run your preferred HPC applications.

 

The latest release (2.5.1) of AWS ParallelCluster simplifies cluster deployment in three main ways. First, the updates remove the need for custom AMIs. Second, important components (particularly NICE DCV) now run on AWS ParallelCluster. Finally, hyperthreading can be shut down using a new parameter in the configuration file.

 

Note: If you need additional instructions on how to install AWS ParallelCluster and get started, read this blog post, and/or the AWS ParallelCluster documentation.

 

The first few steps of this blog post differ from the previous post’s because of these updates. This means that the AWS ParallelCluster configuration file is different. In particular, here are the additions:

  1. enable_efa = compute in the [cluster] section
  2. the new [dcv] section and the dcv_settings parameter in the [cluster] section
  3. the new parameter disable_hyperthreading in the [cluster] section

These additions to the configuration file enable automatic functionalities that previously needed to be enabled manually.

 

Next, in your preferred text editor paste the following code:

[aws]
aws_region_name = <your-preferred-region>

[global]
sanity_check = true
cluster_template = fluentEFA
update_check = true

[vpc my-vpc-1]
vpc_id = vpc-<VPC-ID>
master_subnet_id = subnet-<Subnet-ID>

[cluster fluentEFA]
key_name = <Key-Name>
vpc_settings = my-vpc-1
compute_instance_type=c5n.18xlarge
master_instance_type=c5n.2xlarge
initial_queue_size = 0
max_queue_size = 100
maintain_initial_size = true
scheduler=slurm
cluster_type = ondemand
s3_read_write_resource=arn:aws:s3:::<Your-S3-Bucket>*
post_install = s3://<Your-S3-Bucket>/fluent-efa-post-install.sh
placement_group = DYNAMIC
placement = compute
base_os = centos7
tags = {"Name" : "fluentEFA"}
disable_hyperthreading = true
fsx_settings = parallel-fs
enable_efa = compute
dcv_settings = my-dcv

[dcv my-dcv]
enable = master

[fsx parallel-fs]
shared_dir = /fsx
storage_capacity = 1200
import_path = s3://<Your-S3-Bucket>
imported_file_chunk_size = 1024
export_path = s3://<Your-S3-Bucket>/export

 

Now that AWS ParallelCluster is set up, you are ready for the second step: the post-install script.

 

Edit the Post-Install Script

Below is an example of a post install script. Make sure it is saved in the S3 bucket defined with the parameter post_install = s3://<Your-S3-Bucket>/fluent-efa-post-install.sh in the configuration file above.

 

To upload the post-install script into your S3 bucket, run the following command:

aws s3 cp fluent-efa-post-install.sh s3://<Your-S3-Bucket>/fluent-efa-post-install.sh

#!/bin/bash

#this will disable the ssh host key checking
#usually not needed, but Fluent might require this setting.
cat <<\EOF >> /etc/ssh/ssh_config
StrictHostKeyChecking no
UserKnownHostsFile=/dev/null
EOF

# set higher ulimits,
# useful when running Fluent (and HPC applications in general) on multiple instances via MPI
cat <<\EOF >> /etc/security/limits.conf
* hard memlock unlimited
* soft memlock unlimited
* hard stack 1024000
* soft stack 1024000
* hard nofile 1024000
* soft nofile 1024000
EOF

#stop and disable the firewall
systemctl disable firewalld
systemctl stop firewalld

Now you have all the AWS ParallelCluster settings in place, and you are ready to deploy your HPC cluster.

 

Deploy HPC Cluster

Run the following command to create your HPC cluster that is EFA enabled:

pcluster create -c fluent.config fluentEFA -t fluentEFA -r <your-preferred-region>

Note: The “*” at the end of the s3_read_write_resource parameter line is needed to let AWS ParallelCluster access your S3 bucket correctly. So, for example, if your S3 bucket is called “ansys-download,” it would look like:

s3_read_write_resource=arn:aws:s3:::ansys-download*

 

You should have your HPC cluster up and running after following the three main steps in this section. Now you can install ANSYS Fluent.

Install ANSYS Fluent

The previous section of this post should take about 10 minutes to produce the following output:

Status: parallelcluster-fluentEFA - CREATE_COMPLETE
MasterPublicIP: 3.212.243.33
ClusterUser: centos
MasterPrivateIP: 10.6.1.153

Once you receive that successful output, you can move on to install ANSYS Fluent. Enter the following commands to connect to the master node of your new cluster via SSH and/or DCV:

  1. via SSH: pcluster ssh fluentEFA -i ~/my_key.pem
  2. via DCV: pcluster dcv connect fluentEFA --key-path ~/my_key.pem

 

Once you are logged in, become root (sudo su - or sudo -i ), and install the ANSYS suite under the /fsx directory. You can install it manually, or you can use the sample script.

 

Note: I defined the import_path = s3://<Your-S3-Bucket> in the Amazon FSx section of the configuration file. This tells Amazon FSx to preload all the data from <Your-S3-Bucket>. I recommend copying the ANSYS installation files, and any other file or package you need, to S3 in advance. This step ensures that your files are available under the /fsx directory of your cluster.

 

The example below uses the ANSYS iso installation files. You can use either the tar or the iso file. You can download both from the ANSYS Customer Portal under Download → Current Release.

 

Run this sample script to install ANSYS:

#!/bin/bash

#check the installation directory
if [ ! -d "${1}" -o -z "${1}" ]; then
echo "Error: please check the install dir"
exit -1
fi

ansysDir="${1}/ansys_inc"
installDir="${1}/"

ansysDisk1="ANSYS2019R3_LINX64_Disk1.iso"
ansysDisk2="ANSYS2019R3_LINX64_Disk2.iso"

# mount the Disks
disk1="${installDir}/AnsysDisk1"
disk2="${installDir}/AnsysDisk2"
mkdir -p "${disk1}"
mkdir -p "${disk2}"

echo "Mounting ${ansysDisk1} ..."
mount -o loop "${installDir}/${ansysDisk1}" "${disk1}"

echo "Mounting ${ansysDisk2} ..."
mount -o loop "${installDir}/${ansysDisk2}" "${disk2}"

# INSTALL Ansys WB
echo "Installing Ansys ${ansysver}"
"${disk1}/INSTALL" -silent -install_dir "${ansysDir}" -media_dir2 "${disk2}"

echo "Ansys installed"

umount -l "${disk1}"
echo "${ansysDisk1} unmounted..."

umount -l "${disk2}"
echo "${ansysDisk2} unmounted..."

echo "Cleaning up temporary install directory"
rm -rf "${disk1}"
rm -rf "${disk2}"

echo "Installation process completed!"

 

Congrats, now you have successfully installed ANSYS Workbench!

 

Adapt the ANSYS Fluent mpi_wrapper

Now that your HPC cluster is running and ANSYS Workbench is installed, you can patch ANSYS Fluent. ANSYS Fluent does not currently support EFA out of the box, so you need to make a few modifications to get your app running properly.

 

Complete the following steps to make the proper modifications:

 

Open mpirun.fl (an MPI wrapper script) with your preferred text editor:

vim /fsx/ansys_inc/v195/fluent/fluent19.5.0/multiport/mpi_wrapper/bin/mpirun.fl 

 

Comment out line 465:

# For better performance, suggested by Intel
FS_MPIRUN_FLAGS="$FS_MPIRUN_FLAGS -genv I_MPI_ADJUST_REDUCE 2 -genv I_MPI_ADJUST_ALLREDUCE 2 -genv I_MPI_ADJUST_BCAST 1"

 

In addition to that, line 548:

FS_MPIRUN_FLAGS="$FS_MPIRUN_FLAGS -genv LD_PRELOAD $INTEL_ROOT/lib/libmpi_mt.so"

should be modified as follows:

FS_MPIRUN_FLAGS="$FS_MPIRUN_FLAGS -genv LD_PRELOAD $INTEL_ROOT/lib/release_mt/libmpi.so"

The library file location and name changed for Intel 2019 Update 5. Fixing this will remove the following error message:

ERROR: ld.so: object '/opt/intel/parallel_studio_xe_2019/compilers_and_libraries_2019/linux/mpi/intel64//lib/libmpi_mt.so' from LD_PR ELOAD cannot be preloaded: ignored

 

I recommend backing-up the MPI wrapper script before any modification:

cp /fsx/ansys_inc/v195/fluent/fluent19.5.0/multiport/mpi_wrapper/bin/mpirun.fl /fsx/ansys_inc/v195/fluent/fluent19.5.0/multiport/mpi_wrapper/bin/mpirun.fl.ORIG

Once these steps are completed, your ANSYS Fluent installation is properly modified to support EFA.

 

Run your first ANSYS Fluent job using EFA

You are almost ready to run your first ANSYS Fluent job using EFA. You can use the same submission script used previously.  Export INTELMPI_ROOT or OPENMPI_ROOT in order to specify the custom MPI library to use.

 

The following script demonstrates this step:

#!/bin/bash

#SBATCH -J Fluent
#SBATCH -o Fluent."%j".out

module load intelmpi
export INTELMPI_ROOT=/opt/intel/compilers_and_libraries_2019.5.281/linux/mpi/intel64/

# License server environment variables (variable names below are assumptions; adjust to match your ANSYS licensing setup)
export ANSYSLMD_LICENSE_FILE=<port>@<your-license-server>
export ANSYSLI_SERVERS=<port>@<your-license-server>

basedir="/fsx"
workdir="${basedir}/$(date "+%d-%m-%Y-%H-%M")-${SLURM_NPROCS}-$RANDOM"
mkdir "${workdir}"
cd "${workdir}"
cp "${basedir}/f1_racecar_140m.tar.gz" .
tar xzvf f1_racecar_140m.tar.gz
rm -f f1_racecar_140m.tar.gz
cd bench/fluent/v6/f1_racecar_140m/cas_dat

srun -l /bin/hostname | sort -n | awk '{print $2}' > hostfile
${basedir}/ansys_inc/v195/fluent/bin/fluentbench.pl f1_racecar_140m -t${SLURM_NPROCS} -cnf=hostfile -part=1 -nosyslog -noloadchk -ssh -mpi=intel -cflush

Save this snippet as fluent-run-efa.sh under /fsx and run it as follows:

sbatch -n 2304 /fsx/fluent-run-efa.sh

 

Note 1: The number of cores, 2304, is an example; this command tells AWS ParallelCluster to spin up 64 c5n.18xlarge instances. Feel free to change it and run it as you wish.

Note 2: You may want to copy the benchmark file f1_racecar_140m.tar.gz, or any other dataset you want to use, to S3 so that it is preloaded on Amazon FSx and ready for you to use.

 

Performance and cost considerations

Now I will show the benchmark results (in terms of rating and scaling efficiency) and cost per job (only Amazon EC2 instance costs are considered). The following graph shows the scaling curve of C5n.18xlarge + EFA vs. C5.18xlarge vs. ideal scalability.

 

The Formula-1 Race Car used for this benchmark is a 140-M cells mesh. The range of 70k-100k cells per core optimizes cost for performance. Improvement in turnaround time continues up to 40,000 cells per core with an acceptable cost for the performance. C5n.18xlarge + EFA shows ~89% scaling efficiency at 3024 cores. This metric is a great improvement compared to the C5.18xlarge scaling (48% at 3024 cores). In both cases, I ran with the hyperthreading disabled, up to 84 instances in total.

Scaling vs number of cores graph

ANSYS has published some results of this benchmark here. The plot below shows the “Rating” of a Cray XC50 and C5n.18xlarge + EFA. In ANSYS’ own words the rating is defined as: “ the primary metric used to report performance results of the Fluent Benchmarks. It is defined as the number of benchmarks that can be run on a given machine (in sequence) in a 24 hour period. It is computed by dividing the number of seconds in a day (86,400 seconds) by the number of seconds required to run the benchmark. A higher rating means faster performance.”

 

The plot below shows C5n.18xlarge + EFA with a higher rating than the XC50, up to ~2400 cores, and is on par with it up to ~3800 cores.

In addition to turnaround time improvements, EFA brings another primary advantage: cost reduction. At the moment, C5n.18xlarge costs 27% more than C5.18xlarge (EFA is available at no additional cost). This price difference is due to the 4x higher network performance (100 Gbps vs 25 Gbps) and 33% larger memory footprint (192 vs 144 GB). The following chart shows the cost comparison between C5.18xlarge and C5n.18xlarge + EFA as I scale out for the ANSYS benchmark run.

 

cost per run vs number of cores

Please note that the chart above shows the cost per job using the On-Demand (OD) price in US East (N. Virginia). For short jobs (lasting minutes or even hours), you may want to consider using the EC2 Spot price. Spot Instances offer spare EC2 capacity at steep discounts. At the time of writing, the C5n.18xlarge Spot price in N. Virginia is 70% lower than the On-Demand price: a significant price reduction.

 

Conclusions

This blog post reviewed best practices for running ANSYS Fluent with EFA and walked through the performance and cost benefits of EFA running a 140M-cell ANSYS Fluent benchmark. Computational fluid dynamics and other tightly coupled workloads involve an iterative process of tuning, refining, testing, and benchmarking. Many variables can affect the performance of these workloads, so the AWS HPC team is committed to documenting best practices for MPI workloads on AWS.

I would also love to hear from you. Please let us know about other applications you would like us to test and features you would like to request.

Orchestrating an application process with AWS Batch using AWS CloudFormation

Post Syndicated from Emma White original https://aws.amazon.com/blogs/compute/orchestrating-an-application-process-with-aws-batch-using-aws-cloudformation/

This post is written by Sivasubramanian Ramani

In many real-world applications, you can use custom Docker images with AWS Batch and AWS CloudFormation to execute complex jobs efficiently.

 

This post provides a file processing implementation using Docker images and Amazon S3, AWS Lambda, Amazon DynamoDB, and AWS Batch. In this scenario, the user uploads a CSV file into an Amazon S3 bucket, which is processed by AWS Batch as a job. These jobs can be packaged as Docker containers and are executed using Amazon EC2 and Amazon ECS.

 

The following steps provide an overview of this implementation:

  1. AWS CloudFormation template launches the S3 bucket that stores the CSV files.
  2. The Amazon S3 file event notification executes an AWS Lambda function that starts an AWS Batch job (a minimal sketch of this function follows the list).
  3. AWS Batch executes the job as a Docker container.
  4. A Python-based program reads the contents of the S3 bucket, parses each row, and updates an Amazon DynamoDB table.
  5. Amazon DynamoDB stores each processed row from the CSV.
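
A minimal sketch of the Lambda function from step 2, assuming the job queue and job definition names created by the CloudFormation template; the environment variable names used to pass the bucket and object key are hypothetical, not the exact ones from the sample repo:

import boto3

batch = boto3.client("batch")

def handler(event, context):
    # The S3 event notification carries the bucket name and object key of the uploaded CSV
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]

    # Submit an AWS Batch job that will process this file
    batch.submit_job(
        jobName="csv-processing-job",
        jobQueue="BatchProcessingJobQueue",
        jobDefinition="BatchJobDefinition",
        containerOverrides={
            "environment": [
                {"name": "BucketName", "value": bucket},  # hypothetical variable names
                {"name": "FileName", "value": key},
            ]
        },
    )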

 Prerequisites 

 

Walkthrough

The following steps outline this walkthrough. Detailed steps are given through the course of the material.

  1. Run the CloudFormation template (command provided) to create the necessary infrastructure.
  2. Set up the Docker image for the job:
    1. Build a Docker image.
    2. Tag the build and push the image to the repository.
  3. Drop the CSV into the S3 bucket (copy paste the contents and create them as a [sample file csv]).
  4. Confirm that the job runs and performs the operation based on the pushed container image. The job parses the CSV file and adds each row into DynamoDB.

 

Points to consider

  • The provided AWS CloudFormation template has all the services (refer to upcoming diagram) needed for this walkthrough in one single template. In an ideal production scenario, you might split them into different templates for easier maintenance.

 

  • As part of this walkthrough, you use the Optimal Instances for the batch. The a1.medium instance is a less expensive instance type introduced for batch operations, but you can use any AWS Batch capable instance type according to your needs.

 

  • To handle a higher volume of CSV file contents, you can do multithreaded or multiprocessing programming to complement the AWS Batch performance scale.

 

Deploying the AWS CloudFormation template

 

When deployed, the AWS CloudFormation template creates the following infrastructure.

An application process using AWS Batch

 

You can download the source from the GitHub location. The steps below use the downloaded code, which includes the CloudFormation template that spins up the infrastructure, a Python application (.py file), and a sample CSV file. You can optionally use the git command below to clone the repository. This becomes your SOURCE_REPOSITORY.

$ git clone https://github.com/aws-samples/aws-batch-processing-job-repo

$ cd aws-batch-processing-job-repo

$ aws cloudformation create-stack --stack-name batch-processing-job --template-body file://template/template.yaml --capabilities CAPABILITY_NAMED_IAM

 

When the preceding CloudFormation stack is created successfully, take a moment to identify the major components.

 

The CloudFormation stack spins up the following resources, which can be viewed in the AWS Management Console.

  1. CloudFormation Stack Name – batch-processing-job
  2. S3 Bucket Name – batch-processing-job-<YourAccountNumber>
    1. After the sample CSV file is dropped into this bucket, the process should kick start.
  3. JobDefinition – BatchJobDefinition
  4. JobQueue – BatchProcessingJobQueue
  5. Lambda – LambdaInvokeFunction
  6. DynamoDB – batch-processing-job
  7. Amazon CloudWatch Log – This is created when the first execution is made.
    1. /aws/batch/job
    2. /aws/lambda/LambdaInvokeFunction
  8. CodeCommit – batch-processing-job-repo
  9. CodeBuild – batch-processing-job-build

 

Once the above CloudFormation stack is complete in your personal account, we need to containerize the sample Python application and push it to Amazon ECR. This can be done in one of two ways:

 

Option A: CI/CD implementation.

As you noticed, a CodeCommit repository and a CodeBuild project were created as part of the stack. The CodeCommit URL can be found in the AWS Console under CodeCommit. With this option, you copy the contents from the downloaded source repo into your CodeCommit repository; a deployment is triggered as soon as the code is checked in. Your CodeCommit repository will be similar to “https://git-codecommit.us-east-1.amazonaws.com/v1/repos/batch-processing-job-repo”.

 

The following steps clone your CodeCommit repository and push the changes to it:

  1. $ git clone https://git-codecommit.us-east-1.amazonaws.com/v1/repos/batch-processing-job-repo
  2. $ cd batch-processing-job-repo
  3. Copy all the contents from SOURCE_REPOSITORY (from step 1) and paste them inside this folder
  4. $ git add .
  5. $ git commit -m "commit from source"
  6. $ git push

 

As soon as the code is checked in to your CodeCommit repo, a build is triggered, and a Docker image built from the Python source is pushed to Amazon ECR.

 

Option B: Pushing the Docker image to your repository manually in your local desktop

 

Optionally, you can build the Docker image and push it to the repository yourself. The following commands build the Docker image from the provided Python code file and push the image to your Amazon ECR repository. Make sure to replace <YourAccountNumber> with your information. The following sample CLI commands use the us-west-2 Region. If you change the Region, make sure to also replace the Region values in the get-login, docker tag, and docker push commands.

 

#Get login credentials by copying and pasting the following into the command line

$ aws ecr get-login --region us-west-2 --no-include-email

# Build the Docker image.

$ docker build -t batch_processor .

# Tag the image to your repository.

$ docker tag batch_processor <YourAccountNumber>.dkr.ecr.us-west-2.amazonaws.com/batch-processing-job-repository

# Push your image to the repository.

$ docker push <YourAccountNumber>.dkr.ecr.us-west-2.amazonaws.com/batch-processing-job-repository

 

Navigate to the AWS Management Console, and verify that you can see the image in the Amazon ECR section (on the AWS Management Console screen).

 

Testing

  • In the AWS Management Console, open the CloudFormation stack’s resources and locate the S3 bucket that was created as part of the stack. Its name will be something like batch-processing-job-<YourAccountNumber>.
  • Drop the sample CSV file provided as part of the SOURCE_REPOSITORY into this bucket.

 

Code cleanup

To clean up, delete the contents of the Amazon S3 bucket and Amazon ECR repository.

 

In the AWS Management Console, navigate to your CloudFormation stack “batch-processing-job” and delete it.

 

Alternatively, run this command in AWS CLI to delete the job:

$ aws cloudformation delete-stack --stack-name batch-processing-job

 

Conclusion 

You were able to launch an application process involving AWS Batch integrating with various AWS services. Depending on the scalability of your application needs, AWS Batch is able to process these workloads both faster and more cost efficiently. I also provided a Python script, CloudFormation template, and a sample CSV file in the corresponding GitHub repo that takes care of all the preceding CLI arguments for you to build out the job definitions.

 

I encourage you to test this example and see for yourself how this overall orchestration works with AWS Batch. Then, it is just a matter of replacing the sample code with your own Python (or any other language) code, packaging it as a Docker container, and letting AWS Batch handle the process efficiently.

 

If you decide to give it a try, have any questions, or want to let me know what you think about the post, please leave a comment!

 


About the Author

Sivasubramanian Ramani (Siva Ramani) is a Sr Cloud Application Architect at AWS. His expertise is in application optimization, serverless solutions, and running Microsoft application workloads on AWS.

New – Provisioned Concurrency for Lambda Functions

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/new-provisioned-concurrency-for-lambda-functions/

It’s really true that time flies, especially when you don’t have to think about servers: AWS Lambda just turned 5 years old and the team is always looking for new ways to help customers build and run applications in an easier way.

As more mission critical applications move to serverless, customers need more control over the performance of their applications. Today we are launching Provisioned Concurrency, a feature that keeps functions initialized and hyper-ready to respond in double-digit milliseconds. This is ideal for implementing interactive services, such as web and mobile backends, latency-sensitive microservices, or synchronous APIs.

When you invoke a Lambda function, the invocation is routed to an execution environment to process the request. When a function has not been used for some time, when you need to process more concurrent invocations, or when you update a function, new execution environments are created. The creation of an execution environment takes care of installing the function code and starting the runtime. Depending on the size of your deployment package, and the initialization time of the runtime and of your code, this can introduce latency for the invocations that are routed to a new execution environment. This latency is usually referred to as a “cold start”. For most applications this additional latency is not a problem. For some applications, however, this latency may not be acceptable.

When you enable Provisioned Concurrency for a function, the Lambda service will initialize the requested number of execution environments so they can be ready to respond to invocations.

Configuring Provisioned Concurrency
I create two Lambda functions that use the same Java code and can be triggered by Amazon API Gateway. To simulate a production workload, these functions are repeating some mathematical computation 10 million times in the initialization phase and 200,000 times for each invocation. The computation is using java.Math.Random and conditions (if ...) to avoid compiler optimizations (such as “unlooping” the iterations). Each function has 1GB of memory and the size of the code is 1.7MB.

I want to enable Provisioned Concurrency only for one of the two functions, so that I can compare how they react to a similar workload. In the Lambda console, I select one of the functions. In the configuration tab, I see the new Provisioned Concurrency settings.

I select Add configuration. Provisioned Concurrency can be enabled for a specific Lambda function version or alias (you can’t use $LATEST). You can have different settings for each version of a function. Using an alias, it is easier to apply these settings to the correct version of your function. In my case, I select the alias live, which I keep updated to the latest version using the AWS SAM AutoPublishAlias function preference. For Provisioned Concurrency, I enter 500 and choose Save.

Now, the Provisioned Concurrency configuration is in progress. The execution environments are being prepared to serve concurrent incoming requests based on my input. During this time the function remains available and continues to serve traffic.

After a few minutes, the concurrency is ready. With these settings, up to 500 concurrent requests will find an execution environment ready to process them. If I go above that, the usual scaling of Lambda functions still applies.
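
The same configuration can also be applied and checked from the AWS CLI; a minimal sketch, assuming a function named my-function with an alias named live:

# Configure 500 provisioned execution environments for the alias
$ aws lambda put-provisioned-concurrency-config --function-name my-function \
    --qualifier live --provisioned-concurrent-executions 500

# Check the status until it reports READY
$ aws lambda get-provisioned-concurrency-config --function-name my-function --qualifier live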

To generate some load, I use an Amazon Elastic Compute Cloud (EC2) instance in the same region. To keep it simple, I use the ab tool bundled with the Apache HTTP Server to call the two API endpoints 10,000 times with a concurrency of 500. Since these are new functions, I expect that:

  • For the function with Provisioned Concurrency enabled and set to 500, my requests are managed by pre-initialized execution environments.
  • For the other function, which has Provisioned Concurrency disabled, about 500 execution environments need to be created, adding some latency to the same number of invocations, about 5% of the total.

One cool feature of the ab tool is that it reports the percentage of requests served within a certain time. That is a very good way to look at API latency, as described in this post on Serverless Latency by Tim Bray.
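
For reference, an invocation along these lines reproduces the load pattern described above (the endpoint URL is a placeholder):

# 10,000 requests with a concurrency of 500 against one API endpoint
$ ab -n 10000 -c 500 https://<api-id>.execute-api.<region>.amazonaws.com/prod/my-function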

Here are the results for the function with Provisioned Concurrency disabled:

Percentage of the requests served within a certain time (ms)
50% 351
66% 359
75% 383
80% 396
90% 435
95% 1357
98% 1619
99% 1657
100% 1923 (longest request)

Looking at these numbers, I see that 50% of the requests are served within 351 ms, 66% of the requests within 359 ms, and so on. It’s clear that something happens when I look at 95% or more of the requests: the time suddenly increases by about a second.

These are the results for the function with Provisioned Concurrency enabled:

Percentage of the requests served within a certain time (ms)
50% 352
66% 368
75% 382
80% 387
90% 400
95% 415
98% 447
99% 513
100% 593 (longest request)

Let’s compare those numbers in a graph.

As expected for my test workload, I see a big difference in the response time of the slowest 5% of the requests (between 95% and 100%), where the function with Provisioned Concurrency disabled shows the latency added by the creation of new execution environments and the (slow) initialization in my function code.

In general, the amount of latency added depends on the runtime you use, the size of your code, and the initialization required by your code to be ready for a first invocation. As a result, the added latency can be more, or less, than what I experienced here.

The number of invocations affected by this additional latency depends on how often the Lambda service needs to create new execution environments. Usually that happens when the number of concurrent invocations increases beyond what is already provisioned, or when you deploy a new version of a function.

A small percentage of slow response times (generally referred to as tail latency) really makes a difference in end user experience. Over an extended period of time, most users are affected during some of their interactions. With Provisioned Concurrency enabled, user experience is much more stable.

Provisioned Concurrency is a Lambda feature and works with any trigger. For example, you can use it with WebSocket APIs, GraphQL resolvers, or IoT Rules. This feature gives you more control when building serverless applications that require low latency, such as web and mobile apps, games, or any service that is part of a complex transaction.

Available Now
Provisioned Concurrency can be configured using the console, the AWS Command Line Interface (CLI), or AWS SDKs for new or existing Lambda functions, and is available today in the following AWS Regions: US East (Ohio), US East (N. Virginia), US West (N. California), US West (Oregon), Asia Pacific (Hong Kong), Asia Pacific (Mumbai), Asia Pacific (Seoul), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Canada (Central), Europe (Frankfurt), Europe (Ireland), Europe (London), Europe (Paris), Europe (Stockholm), Middle East (Bahrain), and South America (São Paulo).

You can use the AWS Serverless Application Model (SAM) and SAM CLI to test, deploy and manage serverless applications that use Provisioned Concurrency.

With Application Auto Scaling you can automate configuring the required concurrency for your functions. As policies, Target Tracking and Scheduled Scaling are supported. Using these policies, you can automatically increase the amount of concurrency during times of high demand and decrease it when the demand decreases.
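
As an illustration, a target tracking policy for the alias used earlier could be registered from the AWS CLI; a minimal sketch, with my-function, live, and the capacity values as placeholders:

# Register the alias as a scalable target for Provisioned Concurrency
$ aws application-autoscaling register-scalable-target --service-namespace lambda \
    --resource-id function:my-function:live \
    --scalable-dimension lambda:function:ProvisionedConcurrency \
    --min-capacity 100 --max-capacity 500

# Keep provisioned concurrency utilization around 70%
$ aws application-autoscaling put-scaling-policy --service-namespace lambda \
    --resource-id function:my-function:live \
    --scalable-dimension lambda:function:ProvisionedConcurrency \
    --policy-name my-utilization-policy --policy-type TargetTrackingScaling \
    --target-tracking-scaling-policy-configuration '{"TargetValue": 0.7, "PredefinedMetricSpecification": {"PredefinedMetricType": "LambdaProvisionedConcurrencyUtilization"}}'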

You can also use Provisioned Concurrency today with AWS Partner tools, including configuring Provisioned Concurrency settings with the Serverless Framework and Terraform, or viewing metrics with Datadog, Epsagon, Lumigo, New Relic, SignalFx, SumoLogic, and Thundra.

You only pay for the amount of concurrency that you configure and for the period of time that you configure it. Pricing in US East (N. Virginia) is $0.015 per GB-hour for Provisioned Concurrency and $0.035 per GB-hour for Duration. The number of requests is charged at the same rate as normal functions. You can find more information in the Lambda pricing page.
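
As a rough illustration of the pricing model: keeping 500 execution environments warm for a 1 GB function for one full hour would cost about 500 × 1 GB × 1 hour × $0.015/GB-hour = $7.50 for the Provisioned Concurrency component in US East (N. Virginia), with Duration and request charges billed on top based on actual invocations.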

This new feature enables developers to use Lambda for a variety of workloads that require highly consistent latency. Let me know what you are going to use it for!

Danilo

Decoupled Serverless Scheduler To Run HPC Applications At Scale on EC2

Post Syndicated from Emma White original https://aws.amazon.com/blogs/compute/decoupled-serverless-scheduler-to-run-hpc-applications-at-scale-on-ec2/

This post is written by Ludvig Nordstrom and Mark Duffield | on November 27, 2019

In this blog post, we dive in to a cloud native approach for running HPC applications at scale on EC2 Spot Instances, using a decoupled serverless scheduler. This architecture is ideal for many workloads in the HPC and EDA industries, and can be used for any batch job workload.

At the end of this blog post, you will have two takeaways.

  1. A highly scalable environment that can run on hundreds of thousands of cores across EC2 Spot Instances.
  2. A fully serverless architecture for job orchestration.

We discuss deploying and running a pre-built serverless job scheduler that can run both Windows and Linux applications using any executable file format for your application. This environment provides high performance, scalability, cost efficiency, and fault tolerance. We introduce the best practices and benefits of creating this environment, and cover the architecture, running jobs, and integration into existing environments.

A quick note about the term cloud native: we use the term loosely in this blog. Here, cloud native means we use AWS services (including serverless and microservices) to build out our compute environment, instead of a traditional lift-and-shift method.

Let’s get started!

 

Solution overview

This blog goes over the deployment process, which leverages AWS CloudFormation. This allows you to use infrastructure as code to automatically build out your environment. There are two parts to the solution: the Serverless Scheduler and Resource Automation. Below are quick summaries of each part of the solution.

Part 1 – The serverless scheduler

This first part of the blog builds out a serverless workflow to get jobs from SQS and run them across EC2 instances. The CloudFormation template being used for Part 1 is serverless-scheduler-app.template, and here is the Reference Architecture:

 


    Figure 1: Serverless Scheduler Reference Architecture (grayed-out area is covered in Part 2).

Read the GitHub repo if you want to look at the Step Functions workflow shown in the preceding image. The walkthrough explains how the serverless application retrieves and runs jobs on its worker, updates the DynamoDB job monitoring table, and manages the worker for its lifetime.

 

Part 2 – Resource automation with serverless scheduler


This part of the solution relies on the serverless scheduler built in Part 1 to run jobs on EC2. Part 2 simplifies submitting and monitoring jobs, and retrieving results for users. Jobs are spread across cost-optimized Spot Instances. EC2 Auto Scaling automatically scales up the compute resources when jobs are submitted, then terminates them when jobs are finished. Both of these save you money.

The CloudFormation template used in Part 2 is resource-automation.template. Building on Figure 1, the additional resources launched with Part 2 are noted in the following image: an S3 bucket, an EC2 Auto Scaling group, and two Lambda functions.


 

Figure 2: Resource Automation using Serverless Scheduler

                               

Introduction to decoupled serverless scheduling

HPC schedulers traditionally run in a classic master and worker node configuration. A scheduler on the master node orchestrates jobs on worker nodes. This design has been successful for decades, however many powerful schedulers are evolving to meet the demands of HPC workloads. This scheduler design evolved from a necessity to run orchestration logic on one machine, but there are now options to decouple this logic.

What are the possible benefits that decoupling this logic could bring? First, we avoid a number of shortfalls in the environment such as the need for all worker nodes to communicate with a single master node. This single source of communication limits scalability and creates a single point of failure. When we split the scheduler into decoupled components both these issues disappear.

Second, in an effort to work around these pain points, traditional schedulers had to create extremely complex logic to manage all workers concurrently in a single application. This stifled the ability to customize and improve the code – restricting changes to be made by the software provider’s engineering teams.

Serverless services, such as AWS Step Functions and AWS Lambda fix these major issues. They allow you to decouple the scheduling logic to have a one-to-one mapping with each worker, and instead share an Amazon Simple Queue Service (SQS) job queue. We define our scheduling workflow in AWS Step Functions. Then the workflow scales out to potentially thousands of “state machines.” These state machines act as wrappers around each worker node and manage each worker node individually.  Our code is less complex because we only consider one worker and its job.

We illustrate the differences between a traditional shared scheduler and decoupled serverless scheduler in Figures 3 and 4.

 


Figure 3: Traditional Scheduler Model

 


Figure 4: Decoupled Serverless Scheduler on each instance

 

Each decoupled serverless scheduler will:

  • Retrieve and pass jobs to its worker
  • Monitor its worker's health and take action if needed
  • Confirm job success by checking output logs, and retry jobs if needed
  • Terminate the worker when the job queue is empty, just before terminating itself

With this new scheduler model, there are many benefits. Decoupling schedulers into smaller schedulers increases fault tolerance because any issue only affects one worker. Additionally, each scheduler consists of independent AWS Lambda functions, which maintains the state on separate hardware and builds retry logic into the service.  Scalability also increases, because jobs are not dependent on a master node, which enables the geographic distribution of jobs. This geographic distribution allows you to optimize use of low-cost Spot Instances. Also, when decoupling the scheduler, workflow complexity decreases and you can customize scheduler logic. You can leverage lower latency job monitoring and customize automated responses to job events as they happen.

 

Benefits

  • Fully managed – With Part 2 (Resource Automation) deployed, resources for a job are managed for you. When a job is submitted, resources launch and run the job. When the job is done, worker nodes automatically shut down. This prevents you from incurring continuous costs.

 

  • Performance – Your application runs on EC2, which means you can choose any of the high performance instance types. Input files are automatically copied from Amazon S3 into local Amazon EC2 Instance Store for high performance storage during execution. Result files are automatically moved to S3 after each job finishes.

 

  • Scalability – A worker node combined with a scheduler state machine become a stateless entity. You can spin up as many of these entities as you want, and point them to an SQS queue. You can even distribute worker and state machine pairs across multiple AWS regions. These two components paired with fully managed services optimize your architecture for scalability to meet your desired number of workers.

 

  • Fault Tolerance – The solution is completely decoupled, which means each worker has its own state machine that handles scheduling for that worker. Likewise, each state machine is decoupled into Lambda functions that make up your state machine. Additionally, the scheduler workflow includes a Lambda function that confirms each successful job or resubmits jobs.

 

  • Cost Efficiency – This fault tolerant environment is perfect for EC2 Spot Instances. This means you can save up to 90% on your workloads compared to On-Demand Instance pricing. The scheduler workflow ensures little to no idle time of workers by closely monitoring and sending new jobs as jobs finish. Because the scheduler is serverless, you only incur costs for the resources required to launch and run jobs. Once a job is complete, all resources are terminated automatically.

 

  • Agility – You can use AWS fully managed Developer Tools to quickly release changes and customize workflows. The reduced complexity of a decoupled scheduling workflow means that you don’t have to spend time managing a scheduling environment, and can instead focus on your applications.

 

 

Part 1 – serverless scheduler as a standalone solution

 

If you use the serverless scheduler as a standalone solution, you can build clusters and leverage shared storage such as FSx for Lustre, EFS, or S3. Additionally, you can use AWS CloudFormation to deploy more complex compute architectures that suit your application. The EC2 instances that run the serverless scheduler can therefore be launched in any number of ways; the scheduler only requires the instance ID and the SQS job queue name.

 

Submitting Jobs Directly to serverless scheduler

The serverless scheduler app is a fully built AWS Step Functions workflow that pulls jobs from an SQS queue and runs them on an EC2 instance. The jobs submitted to SQS consist of an AWS Systems Manager Run Command, and work with any SSM document and command that you choose for your jobs. Examples of SSM Run Command documents are AWS-RunShellScript and AWS-RunPowerShellScript. Feel free to read more about Running Commands Using Systems Manager Run Command.

The following code shows the format of a job submitted to SQS in JSON.

  {
    "job_id": "jobId_0",
    "retry": "3",
    "job_success_string": " ",
    "ssm_document": "AWS-RunPowerShellScript",
    "commands":
        [
            "cd C:\\ProgramData\\Amazon\\SSM; mkdir Result",
            "Copy-S3Object -Bucket my-bucket -KeyPrefix jobs/date/jobId_0 -LocalFolder .\\",
            "C:\\ProgramData\\Amazon\\SSM\\jobId_0.bat",
            "Write-S3Object -Bucket my-bucket -KeyPrefix jobs/date/jobId_0 -Folder .\\Result\\"
        ]
  }
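
To queue a job like this by hand, you could save the JSON to a file and send it to the scheduler's SQS job queue; a minimal sketch, where the queue URL and the file name job.json are placeholders:

# Submit the job definition to the scheduler's SQS job queue
$ aws sqs send-message \
    --queue-url https://sqs.<region>.amazonaws.com/<account-id>/my-sqs-job-queue-name \
    --message-body file://job.json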

 

Any EC2 instance associated with a serverless scheduler receives jobs from its designated SQS queue until the queue is empty. Then, the EC2 resource automatically terminates. If a job fails, it is retried up to the number of times specified in the job definition. You can include a specific string value so that the scheduler searches the job execution output and confirms the successful completion of the job.

 

Tagging EC2 workers to get a serverless scheduler state machine

In Part 1 of the deployment, you must manage your EC2 Instance launch and termination. When launching an EC2 Instance, tag it with a specific tag key that triggers a state machine to manage that instance. The tag value is the name of the SQS queue that you want your state machine to poll jobs from.

In the following example, “my-scheduler-cloudformation-stack-name” is the tag key that the serverless scheduler app looks for on any new EC2 instance that starts. Next, “my-sqs-job-queue-name” is the default job queue created with the scheduler, but you can change this to the name of any queue you want the instance to retrieve jobs from when it launches.

{"my-scheduler-cloudformation-stack-name":"my-sqs-job-queue-name"}

 

Monitor jobs in DynamoDB

You can monitor job status in the DynamoDB job monitoring table created by the solution. In the table you can find, among other things, the job_id, the commands sent to Amazon EC2, the job status, the job output logs from Amazon EC2, and the number of retries.

Alternatively, you can query DynamoDB for a given job_id via the AWS Command Line Interface:

aws dynamodb get-item --table-name job-monitoring \

                      --key '{"job_id": {"S": "/my-jobs/my-job-id.bat"}}'

 

Using the “job_success_string” parameter

For the prior DynamoDB table, we submitted two jobs with identical commands using an example script that you can also use. The command sent to the instance is “echo Hello World”, so the output from each job should be “Hello World.” We also specified three allowed job retries. In the following image, there are two jobs in the SQS queue before they ran. Look closely at the different “job_success_string” values for each job and the identical command sent to both:

Example DynamoDB output showing the two submitted jobs and their job information.

From the image we see that Job2 was successful and Job1 retried three times before being permanently labelled as failed. We forced this outcome to demonstrate how the job success string works: Job1 was submitted with “job_success_string” set to “Hello EVERYONE”, which does not appear in the job output “Hello World.” For Job2 we set “job_success_string” to “Hello” because we knew this string would be in the output log.

Job outputs commonly have text that only appears if job succeeded. You can also add this text yourself in your executable file. With “job_success_string,” you can confirm a job’s successful output, and use it to identify a certain value that you are looking for across jobs.

 

Part 2 – Resource Automation with the serverless scheduler

The additional services we deploy in Part 2 integrate with existing architectures to launch resources for your serverless scheduler. These services allow you to submit jobs simply by uploading input files and executable files to an S3 bucket.

Likewise, these additional resources can use any executable file format you want, including proprietary application level scripts. The solution automates everything else. This includes creating and submitting jobs to SQS job queue, spinning up compute resources when new jobs come in, and taking them back down when there are no jobs to run. When jobs are done, result files are copied to S3 for the user to retrieve. Similar to Part 1, you can still view the DynamoDB table for job status.

This architecture makes it easy to scale out to different teams and departments, and you can submit potentially hundreds of thousands of jobs while you remain in control of resources and cost.

 

Deeper Look at the S3 Architecture

The following diagram shows how you can submit jobs, monitor progress, and retrieve results. To submit jobs, upload all the needed input files and an executable script to S3. The suffix of the executable file (uploaded last) triggers an S3 event to start the process, and this suffix is configurable.

The S3 key of the executable file acts as the job id, and is kept as a reference to that job in DynamoDB. The Lambda (#2 in diagram below) uses the S3 key of the executable to create three SSM Run Commands.

  1. Synchronize all files in the same S3 folder to a working directory on the EC2 Instance.
  2. Run the executable file on EC2 Instances within a specified working directory.
  3. Synchronize the EC2 Instances working directory back to the S3 bucket where newly generated result files are included.

This Lambda function (#2) then places the job on the SQS queue using the scheduler's JSON-formatted job definition shown earlier.

IMPORTANT: Give each set of job files a unique job folder in S3; otherwise, more files than needed might be moved to the EC2 instance.

 


Figure 5: Resource Automation using Serverless Scheduler – A deeper look

 

The EC2 and Step Functions workflow uses the Lambda function (#3 in the prior diagram) and the Auto Scaling group to scale out, based on the number of jobs in the queue, up to the maximum number of workers (plus state machines) defined in the Auto Scaling group. When the job queue is empty, the number of running instances scales down to zero as they finish their remaining jobs.

 

Process Submitting Jobs and Retrieving Results

  1. As seen in step 1, upload input file(s) and an executable file into a unique job folder in S3 (such as /year/month/day/jobid/~job-files). Upload the executable file last because it automatically starts the job. You can also use a script to upload multiple files at a time, but each job needs a unique directory; a sample upload command follows this list. There are many ways to make S3 buckets available to users, including AWS Storage Gateway, AWS Transfer for SFTP, AWS DataSync, the AWS Console, or any one of the AWS SDKs leveraging S3 API calls.
  2. You can monitor job status by accessing the DynamoDB table directly via the AWS Management Console, or use the AWS CLI to call DynamoDB via an API call.
  3. As seen in step 5, you can retrieve result files for jobs from the same S3 directory where you left the input files. The DynamoDB table confirms when jobs are done. The SQS output queue can be used by applications that must automatically poll and retrieve results.
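
A minimal sketch of step 1 from the AWS CLI, with the bucket, folder, and file names as placeholders; the executable is uploaded last because its arrival triggers the job:

# Upload the input files first
$ aws s3 cp input.dat s3://<your-job-bucket>/2019/11/27/jobid_0/

# Upload the executable last to start the job
$ aws s3 cp jobid_0.bat s3://<your-job-bucket>/2019/11/27/jobid_0/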

You no longer need to create or access compute nodes as compute resources. These automatically scale up from zero when jobs come in, and then back down to zero when jobs are finished.

 

Deployment

Read the GitHub Repo for deployment instructions. Below are CloudFormation templates to help:

Launch stack links are available for the following AWS Regions: eu-north-1, ap-south-1, eu-west-3, eu-west-2, eu-west-1, ap-northeast-3, ap-northeast-2, ap-northeast-1, sa-east-1, ca-central-1, ap-southeast-1, ap-southeast-2, eu-central-1, us-east-1, us-east-2, us-west-1, and us-west-2.

 

 

Additional Points on Usage Patterns

 

  • While the two solutions in this blog are aimed at HPC applications, they can be used to run any batch jobs. Many customers that run large data processing batch jobs in their data lakes could use the serverless scheduler.

 

  • You can build pipelines of different applications when the output of one job triggers another to do something else – an example being pre-processing, meshing, simulation, post-processing. You simply deploy the Resource Automation template several times, and tailor it so that the output bucket for one step is the input bucket for the next step.

 

  • You might use the “job_success_string” parameter for iteration/verification in cases where a shotgun approach is needed to run thousands of jobs, and only one has a chance of producing the right result. In this case, the “job_success_string” identifies the successful job out of the potentially hundreds of thousands pushed to the SQS job queue.

 

Scale-out across teams and departments

Because all services used are serverless, you can deploy as many run environments as needed without increasing overall costs. Serverless workloads only accumulate cost when the services are used. So, you could deploy ten job environments and run one job in each, and your costs would be the same if you had one job environment running ten jobs.

 

All you need is an S3 bucket to upload jobs to and an associated AMI that has the right applications and license configuration. Because a job configuration is passed to the scheduler at each job start, you can add new teams by creating an S3 bucket and pointing S3 events to a default Lambda function that pulls configurations for each job start.

 

Setup CI/CD pipeline to start continuous improvement of scheduler

If you are advanced, we encourage you to clone the git repo and customize this solution. The serverless scheduler is less complex than other schedulers, because you only think about one worker and the process of one job’s run.

Ways you could tailor this solution:

  • Add Intelligent Job Scheduling using Amazon SageMaker – It is hard to find data as ready for ML as log data, because every job you run has different run times and resource consumption. So, you could tailor this solution to use ML to predict the best instance to use when workloads are submitted.
  • Add Custom Licensing Checkout Logic – Simply add one Lambda function to your Step Functions workflow to make an API call to a license server before continuing with one or more jobs. You can start a new worker when you have a license checked out or, if a license is not available, the instance can terminate to avoid incurring costs while waiting for licenses.
  • Add Custom Metrics to DynamoDB – You can easily add metrics to DynamoDB because the solution already has baseline logging and monitoring capabilities.
  • Run on Other AWS Services – There is a Lambda function in the Step Functions workflow called “Start_Job”. You can tailor this Lambda function to run your jobs on Amazon SageMaker, Amazon EMR, Amazon EKS, or Amazon ECS instead of EC2.

 

Conclusion

 

Although HPC workloads and EDA flows may still be dependent on current scheduling technologies, we illustrated the possibilities of decoupling your workloads from your existing shared scheduling environments. This post went deep into decoupled serverless scheduling, and we understand that it is difficult to unwind decades of dependencies. However, leveraging numerous AWS Services encourages you to think completely differently about running workloads.

But more importantly, it encourages you to Think Big. With this solution you can get up and running quickly, fail fast, and iterate. You can do this while scaling to your required number of resources, when you want them, and only pay for what you use.

Serverless computing catalyzes change across all industries, but that change is not obvious in the HPC and EDA industries. This solution is an opportunity for customers to take advantage of the nearly limitless capacity that AWS provides.

Please reach out with questions about HPC and EDA on AWS. You now have the architecture and the instructions to build your Serverless Decoupled Scheduling environment.  Go build!


About the Authors and Contributors

Authors 

 

Ludvig Nordstrom is a Senior Solutions Architect at AWS

 

 

 

 

Mark Duffield is a Tech Lead in Semiconductors at AWS

 

 

 

Contributors

 

Steve Engledow is a Senior Solutions Builder at AWS

 

 

 

 

Arun Thomas is a Senior Solutions Builder at AWS