Tag Archives: Best practices

Continuous runtime security monitoring with AWS Security Hub and Falco

Post Syndicated from Rajarshi Das original https://aws.amazon.com/blogs/security/continuous-runtime-security-monitoring-with-aws-security-hub-and-falco/

Customers want a single and comprehensive view of the security posture of their workloads. Runtime security event monitoring is important to building secure, operationally excellent, and reliable workloads, especially in environments that run containers and container orchestration platforms. In this blog post, we show you how to use services such as AWS Security Hub and Falco, a Cloud Native Computing Foundation project, to build a continuous runtime security monitoring solution.

With the solution in place, you can collect runtime security findings from multiple AWS accounts running one or more workloads on AWS container orchestration platforms, such as Amazon Elastic Kubernetes Service (Amazon EKS) or Amazon Elastic Container Service (Amazon ECS). The solution collates the findings across those accounts into a designated account where you can view the security posture across accounts and workloads.

 

Solution overview

Security Hub collects security findings from other AWS services using the standardized AWS Security Finding Format (ASFF). Falco provides the ability to detect security events at runtime for containers. Partner integrations like Falco are also available on Security Hub and use ASFF. Security Hub provides a custom integrations feature using ASFF to enable collection and aggregation of findings that are generated by custom security products.

The solution in this blog post uses AWS FireLens, Amazon CloudWatch Logs, and AWS Lambda to enrich logs from Falco and populate Security Hub.

Figure 1: Architecture diagram of continuous runtime security monitoring

Here’s how the solution works, as shown in Figure 1:

  1. An AWS account is running a workload on Amazon EKS.
    1. Runtime security events detected by Falco for that workload are sent to CloudWatch Logs using AWS FireLens.
    2. CloudWatch Logs acts as the destination for FireLens and the trigger for the Lambda function in the next step.
    3. The Lambda function transforms the logs into the ASFF. These findings can now be imported into Security Hub (a sketch of this transformation follows this list).
    4. The Security Hub instance that is running in the same account as the workload running on Amazon EKS stores and processes the findings provided by Lambda and provides the security posture to users of the account. This instance also acts as a member account for Security Hub.
  2. Another AWS account is running a workload on Amazon ECS.
    1. Runtime security events detected by Falco for that workload are sent to CloudWatch Logs using AWS FireLens.
    2. CloudWatch Logs acts as the destination for FireLens and the trigger for the Lambda function in the next step.
    3. The Lambda function transforms the logs into the ASFF. These findings can now be imported into Security Hub.
    4. The Security Hub instance that is running in the same account as the workload running on Amazon ECS stores and processes the findings provided by Lambda and provides the security posture to users of the account. This instance also acts as another member account for Security Hub.
  3. The designated Security Hub administrator account combines the findings generated by the two member accounts, and then provides a comprehensive view of security alerts and security posture across AWS accounts. If your workloads span multiple Regions, Security Hub supports aggregating findings across Regions.
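In the steps above, the Lambda function maps each Falco alert to an ASFF finding and imports it into Security Hub with the BatchImportFindings API. The following Python sketch illustrates that idea only; it is not the code from the solution's GitHub repository, the Falco field names (rule, priority, output, output_fields) follow Falco's JSON output format, and the account ID, Region, and resource mapping are placeholder assumptions.

import base64
import gzip
import json
import os
from datetime import datetime, timezone

import boto3

securityhub = boto3.client("securityhub")

# Placeholder assumptions for this sketch; the real solution derives these values from context.
ACCOUNT_ID = os.environ.get("ACCOUNT_ID", "111122223333")
REGION = os.environ.get("AWS_REGION", "us-east-1")

def handler(event, context):
    """Triggered by a CloudWatch Logs subscription; maps Falco alerts to ASFF findings."""
    # CloudWatch Logs delivers the payload base64-encoded and gzip-compressed.
    payload = json.loads(gzip.decompress(base64.b64decode(event["awslogs"]["data"])))

    findings = []
    for log_event in payload["logEvents"]:
        alert = json.loads(log_event["message"])  # Falco emits one JSON alert per line
        now = datetime.now(timezone.utc).isoformat()
        findings.append({
            "SchemaVersion": "2018-10-08",
            "Id": f"falco-{log_event['id']}",
            "ProductArn": f"arn:aws:securityhub:{REGION}:{ACCOUNT_ID}:product/{ACCOUNT_ID}/default",
            "GeneratorId": alert.get("rule", "falco-rule"),
            "AwsAccountId": ACCOUNT_ID,
            "Types": ["Unusual Behaviors/Container"],
            "CreatedAt": now,
            "UpdatedAt": now,
            "Severity": {"Label": "HIGH" if alert.get("priority") in ("Emergency", "Alert", "Critical", "Error") else "MEDIUM"},
            "Title": alert.get("rule", "Falco runtime alert"),
            "Description": alert.get("output", "")[:1024],
            "Resources": [{
                "Type": "Container",
                "Id": alert.get("output_fields", {}).get("container.id", "unknown"),
            }],
        })

    if findings:
        # BatchImportFindings accepts up to 100 findings per call; this sketch assumes fewer.
        securityhub.batch_import_findings(Findings=findings)
    return {"imported": len(findings)}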

 

Prerequisites

For this walkthrough, you should have the following in place:

  1. Three AWS accounts.

    Note: We recommend three accounts so you can experience Security Hub’s support for a multi-account setup. However, you can use a single AWS account instead to host the Amazon ECS and Amazon EKS workloads, and send findings to Security Hub in the same account. If you are using a single account, skip the following account-specific guidance. If you are integrated with AWS Organizations, the designated Security Hub administrator account will automatically have access to the member accounts.

  2. Security Hub set up with an administrator account on one account.
  3. Security Hub set up with member accounts on two accounts: one account to host the Amazon EKS workload, and one account to host the Amazon ECS workload.
  4. Falco set up on the Amazon EKS and Amazon ECS clusters, with logs routed to CloudWatch Logs using FireLens. For instructions on how to do this, see:

    Important: Take note of the names of the CloudWatch Logs groups, as you will need them in the next section.

  5. AWS Cloud Development Kit (CDK) installed on the member accounts to deploy the solution that provides the custom integration between Falco and Security Hub.

 

Deploying the solution

In this section, you will learn how to deploy the solution and enable the CloudWatch Logs group. Enabling the CloudWatch Logs group is the trigger for running the Lambda function in both member accounts.

To deploy this solution in your own account

  1. Clone the aws-securityhub-falco-ecs-eks-integration GitHub repository by running the following command.
    git clone https://github.com/aws-samples/aws-securityhub-falco-ecs-eks-integration
  2. Follow the instructions in the README file provided on GitHub to build and deploy the solution. Make sure that you deploy the solution to the accounts hosting the Amazon EKS and Amazon ECS clusters.
  3. Navigate to the AWS Lambda console and confirm that you see the newly created Lambda function. You will use this function in the next section.

Figure 2: Lambda function for Falco integration with Security Hub

To enable the CloudWatch Logs group

  1. In the AWS Management Console, select the Lambda function shown in Figure 2—AwsSecurityhubFalcoEcsEksln-lambdafunction—and then, on the Function overview screen, select + Add trigger.
  2. On the Add trigger screen, provide the following information and then select Add, as shown in Figure 3.
    • Trigger configuration – From the drop-down, select CloudWatch logs.
    • Log group – Choose the Log group you noted in Step 4 of the Prerequisites. In our setup, the log group for the Amazon ECS and Amazon EKS clusters, deployed in separate AWS accounts, was set with the same value (falco).
    • Filter name – Provide a name for the filter. In our example, we used the name falco.
    • Filter pattern – optional – Leave this field blank.

    Figure 3: Lambda function trigger – CloudWatch Log group

  3. Repeat these steps (as applicable) to set up the trigger for the Lambda function deployed in other accounts.

 

Testing the deployment

Now that you’ve deployed the solution, you will verify that it’s working.

With the default rules, Falco generates alerts for activities such as:

  • An attempt to write to a file below the /etc folder. The /etc folder contains important system configuration files.
  • An attempt to open a sensitive file (such as /etc/shadow) for reading.

To test your deployment, you will attempt to perform these activities to generate Falco alerts that are reported as Security Hub findings in the same account. Then you will review the findings.

To test the deployment in member account 1

  1. Run the following commands to trigger an alert in member account 1, which is running an Amazon EKS cluster. Replace <container_name> with your own value.
    kubectl exec -it <container_name> -- /bin/bash
    touch /etc/5
    cat /etc/shadow > /dev/null
  2. To see the list of findings, log in to your Security Hub admin account and navigate to Security Hub > Findings. As shown in Figure 4, you will see the alerts generated by Falco, including the Falco-generated title, and the instance where the alert was triggered.

    Figure 4: Findings in Security Hub

  3. To see more detail about a finding, check the box next to the finding. Figure 5 shows some of the details for the finding Read sensitive file untrusted.

    Figure 5: Sensitive file read finding – detail view

    Figure 6 shows the Resources section of this finding, which includes the instance ID of the Amazon EKS cluster node. In our example, this is an Amazon Elastic Compute Cloud (Amazon EC2) instance.

    Figure 6: Resource Detail in Security Hub finding

To test the deployment in member account 2

  1. Run the following commands to trigger a Falco alert in member account 2, which is running an Amazon ECS cluster. Replace <container_id> with your own value.
    docker exec -it <container_id> bash
    touch /etc/5
    cat /etc/shadow > /dev/null
  2. As in the preceding example with member account 1, to view the findings related to this alert, navigate to your Security Hub admin account and select Findings.

To view the collated findings from both member accounts in Security Hub

  1. In the designated Security Hub administrator account, navigate to Security Hub > Findings. The findings from both member accounts are collated in this account, so you can use it as a central place to view the security posture across accounts and workloads. Figure 7 shows two findings, one from each member account, viewable in this single-pane-of-glass administrator account.

    Figure 7: Write below /etc findings in a single view

  2. To see more information and a link to the corresponding member account where the finding was generated, check the box next to the finding. Figure 8 shows the account detail associated with a specific finding in member account 1.

    Figure 8: Write under /etc detail view in Security Hub admin account

    By centralizing and enriching the findings from Falco, you can take action more quickly or perform automated remediation on the impacted resources.
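If you prefer to pull the collated findings programmatically rather than through the console, you can query the administrator account with the Security Hub GetFindings API. The following is a small, hedged example using boto3; the title value is an assumption based on the Falco rule name shown in the screenshots and may differ in your environment.

import boto3

# Use credentials for the designated Security Hub administrator account.
securityhub = boto3.client("securityhub")

# Filter for active findings whose title starts with a Falco rule name (assumed value).
response = securityhub.get_findings(
    Filters={
        "Title": [{"Value": "Write below etc", "Comparison": "PREFIX"}],
        "RecordState": [{"Value": "ACTIVE", "Comparison": "EQUALS"}],
    },
    MaxResults=20,
)

for finding in response["Findings"]:
    print(finding["AwsAccountId"], finding["Title"], finding["Severity"]["Label"])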

 

Cleaning up

To clean up this demo:

  1. Delete the CloudWatch Logs trigger from the Lambda functions that were created in the section To enable the CloudWatch Logs group.
  2. Delete the Lambda functions by deleting the CloudFormation stack, created in the section To deploy this solution in your own account.
  3. Delete the Amazon EKS and Amazon ECS clusters created as part of the Prerequisites.

 

Conclusion

In this post, you learned how to achieve multi-account continuous runtime security monitoring for container-based workloads running on Amazon EKS and Amazon ECS. This is achieved by creating a custom integration between Falco and Security Hub.

You can extend this solution in a number of ways. For example:

  • You can forward findings from the administrator account, as a single source across accounts, to security information and event management (SIEM) tools such as Splunk.
  • You can perform automated remediation activities based on the findings generated, using Lambda.

To learn more about managing a centralized Security Hub administrator account, see Managing administrator and member accounts. To learn more about working with ASFF, see AWS Security Finding Format (ASFF) in the documentation. To learn more about the Falco engine and rule structure, see the Falco documentation.

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security news? Follow us on Twitter.

Rajarshi Das

Rajarshi is a Solutions Architect at Amazon Web Services. He focuses on helping Public Sector customers accelerate their security and compliance certifications and authorizations by architecting secure and scalable solutions. Rajarshi holds 4 AWS certifications, including AWS Certified Solutions Architect – Professional and AWS Certified Security – Specialty.

Adam Cerini

Adam is a Senior Solutions Architect with Amazon Web Services. He focuses on helping Public Sector customers architect scalable, secure, and cost-effective systems. Adam holds 5 AWS certifications, including AWS Certified Solutions Architect – Professional and AWS Certified Security – Specialty.

Unify log aggregation and analytics across compute platforms

Post Syndicated from Hari Ohm Prasath original https://aws.amazon.com/blogs/big-data/unify-log-aggregation-and-analytics-across-compute-platforms/

Our customers want to make sure their users have the best experience running their application on AWS. To make this happen, you need to monitor and fix software problems as quickly as possible. Doing this gets challenging as the volume of log data that needs to be quickly detected, analyzed, and stored keeps growing. In this post, we walk you through an automated process to aggregate and monitor application log data in near-real time, so you can remediate application issues faster.

This post shows how to unify and centralize logs across different computing platforms. With this solution, you can unify logs from Amazon Elastic Compute Cloud (Amazon EC2), Amazon Elastic Container Service (Amazon ECS), Amazon Elastic Kubernetes Service (Amazon EKS), Amazon Kinesis Data Firehose, and AWS Lambda using agents, log routers, and extensions. We use Amazon OpenSearch Service (successor to Amazon Elasticsearch Service) with OpenSearch Dashboards to visualize and analyze the logs, collected across different computing platforms to get application insights. You can deploy the solution using the AWS Cloud Development Kit (AWS CDK) scripts provided as part of the solution.

Customer benefits

A unified aggregated log system provides the following benefits:

  • A single point of access to all the logs across different computing platforms
  • A consistent way to define and standardize the transformation of logs before they get delivered to downstream systems like Amazon Simple Storage Service (Amazon S3), Amazon OpenSearch Service, Amazon Redshift, and other services
  • The ability to use Amazon OpenSearch Service to quickly index, and OpenSearch Dashboards to search and visualize, logs from your routers, applications, and other devices

Solution overview

In this post, we use the following services to demonstrate log aggregation across different compute platforms:

  • Amazon EC2 – A web service that provides secure, resizable compute capacity in the cloud. It’s designed to make web-scale cloud computing easier for developers.
  • Amazon ECS – A web service that makes it easy to run, scale, and manage Docker containers on AWS, designed to make the Docker experience easier for developers.
  • Amazon EKS – A managed service that makes it easy to run Kubernetes on AWS without needing to install and operate your own Kubernetes control plane.
  • Kinesis Data Firehose – A fully managed service that makes it easy to stream data to Amazon S3, Amazon Redshift, or Amazon OpenSearch Service.
  • Lambda – A compute service that lets you run code without provisioning or managing servers. It’s designed to make web-scale cloud computing easier for developers.
  • Amazon OpenSearch Service – A fully managed service that makes it easy for you to perform interactive log analytics, real-time application monitoring, website search, and more.

The following diagram shows the architecture of our solution.

The architecture uses various log aggregation tools such as log agents, log routers, and Lambda extensions to collect logs from multiple compute platforms and deliver them to Kinesis Data Firehose. Kinesis Data Firehose streams the logs to Amazon OpenSearch Service. Log records that fail to be persisted in Amazon OpenSearch Service are written to Amazon S3. To scale this architecture, each of these compute platforms streams its logs to a different Firehose delivery stream, which is added as a separate index and rotated every 24 hours.

The following sections demonstrate how the solution is implemented on each of these computing platforms.

Amazon EC2

The Kinesis agent collects and streams logs from the applications running on EC2 instances to Kinesis Data Firehose. The agent is a standalone Java software application that offers an easy way to collect and send data to Kinesis Data Firehose. The agent continuously monitors files and sends logs to the Firehose delivery stream.

The AWS CDK script provided as part of this solution deploys a simple PHP application that generates logs under the /etc/httpd/logs directory on the EC2 instance. The Kinesis agent is configured via /etc/aws-kinesis/agent.json to collect data from access_logs and error_logs, and stream them periodically to Kinesis Data Firehose (ec2-logs-delivery-stream).

Because Amazon OpenSearch Service expects data in JSON format, you can add a call to a Lambda function to transform the log data to JSON format within Kinesis Data Firehose before streaming to Amazon OpenSearch Service. The following is a sample input for the data transformer:

46.99.153.40 - - [29/Jul/2021:15:32:33 +0000] "GET / HTTP/1.1" 200 173 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"

The following is our output:

{
    "logs" : "46.99.153.40 - - [29/Jul/2021:15:32:33 +0000] \"GET / HTTP/1.1\" 200 173 \"-\" \"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36\"",
}

We can enhance the Lambda function to extract the timestamp, HTTP, and browser information from the log data, and store them as separate attributes in the JSON document.
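A Kinesis Data Firehose transformation Lambda function receives a batch of records and must return each one with the same recordId, a result of Ok, Dropped, or ProcessingFailed, and base64-encoded data. The sketch below wraps each access-log line in the JSON structure shown above and promotes a few extra attributes; the regular expression assumes the Apache combined log format and is not taken from the solution's repository.

import base64
import json
import re

# Apache combined log format (an assumption for this sketch).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def handler(event, context):
    output = []
    for record in event["records"]:
        line = base64.b64decode(record["data"]).decode("utf-8").strip()
        doc = {"logs": line}

        match = LOG_PATTERN.match(line)
        if match:
            # Promote useful fields to top-level attributes for easier querying in OpenSearch.
            doc.update({
                "client_ip": match.group("ip"),
                "timestamp": match.group("timestamp"),
                "http_method": match.group("method"),
                "path": match.group("path"),
                "status_code": int(match.group("status")),
                "user_agent": match.group("agent"),
            })

        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode((json.dumps(doc) + "\n").encode("utf-8")).decode("utf-8"),
        })
    return {"records": output}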

Amazon ECS

In the case of Amazon ECS, we use FireLens to send logs directly to Kinesis Data Firehose. FireLens is a container log router for Amazon ECS and AWS Fargate that gives you the extensibility to use the breadth of services at AWS or partner solutions for log analytics and storage.

The architecture hosts FireLens as a sidecar, which collects logs from the main container running an httpd application and sends them to Kinesis Data Firehose, which in turn streams them to Amazon OpenSearch Service. The AWS CDK script provided as part of this solution deploys an httpd container hosted behind an Application Load Balancer. The httpd logs are pushed to Kinesis Data Firehose (ecs-logs-delivery-stream) through the FireLens log router.
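In the task definition, the application container's log configuration points at the awsfirelens log driver and names the delivery stream, while the sidecar container runs Fluent Bit. The snippet below is a hedged sketch of those two pieces in the lowerCamelCase form used by the ECS RegisterTaskDefinition API (for example, through boto3); the image tags and the exact Fluent Bit output plugin options are assumptions that can vary by Fluent Bit version, so verify them against the repository's CDK code.

container_definitions = [
    {
        "name": "httpd",
        "image": "httpd:2.4",
        "essential": True,
        # Route the container's stdout/stderr through FireLens to Kinesis Data Firehose.
        "logConfiguration": {
            "logDriver": "awsfirelens",
            "options": {
                "Name": "firehose",                      # Fluent Bit output plugin for Firehose (assumed)
                "region": "us-east-1",
                "delivery_stream": "ecs-logs-delivery-stream",
            },
        },
    },
    {
        "name": "log_router",
        "image": "public.ecr.aws/aws-observability/aws-for-fluent-bit:stable",
        "essential": True,
        "firelensConfiguration": {"type": "fluentbit"},
    },
]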

Amazon EKS

With the recent announcement of Fluent Bit support for Amazon EKS, you no longer need to run a sidecar to route container logs from Amazon EKS pods running on Fargate. With the new built-in logging support, you can select a destination of your choice to send the records to. Amazon EKS on Fargate uses a version of Fluent Bit for AWS, an upstream conformant distribution of Fluent Bit managed by AWS.

The AWS CDK script provided as part of this solution deploys an NGINX container hosted behind an internal Application Load Balancer. The NGINX container logs are pushed to Kinesis Data Firehose (eks-logs-delivery-stream) through the Fluent Bit plugin.

Lambda

For Lambda functions, you can send logs directly to Kinesis Data Firehose using the Lambda extension. To avoid duplication, you can also deny the permissions that allow the log records to be written to Amazon CloudWatch Logs.

After deployment, the workflow is as follows:

  1. On startup, the extension subscribes to receive logs for the platform and function events. A local HTTP server is started inside the external extension, which receives the logs.
  2. The extension buffers the log events in a synchronized queue and writes them to Kinesis Data Firehose via PUT records.
  3. The logs are sent to downstream systems.
  4. The logs are sent to Amazon OpenSearch Service.

The Firehose delivery stream name gets specified as an environment variable (AWS_KINESIS_STREAM_NAME).

For this solution, because we’re only focusing on collecting the run logs of the Lambda function, the data transformer of the Kinesis Data Firehose delivery stream filters the records so that only entries of type function ("type":"function") are sent to Amazon OpenSearch Service.

The following is a sample input for the data transformer:

[
   {
      "time":"2021-07-29T19:54:08.949Z",
      "type":"platform.start",
      "record":{
         "requestId":"024ae572-72c7-44e0-90f5-3f002a1df3f2",
         "version":"$LATEST"
      }
   },
   {
      "time":"2021-07-29T19:54:09.094Z",
      "type":"platform.logsSubscription",
      "record":{
         "name":"kinesisfirehose-logs-extension-demo",
         "state":"Subscribed",
         "types":[
            "platform",
            "function"
         ]
      }
   },
   {
      "time":"2021-07-29T19:54:09.096Z",
      "type":"function",
      "record":"2021-07-29T19:54:09.094Z\tundefined\tINFO\tLoading function\n"
   },
   {
      "time":"2021-07-29T19:54:09.096Z",
      "type":"platform.extension",
      "record":{
         "name":"kinesisfirehose-logs-extension-demo",
         "state":"Ready",
         "events":[
            "INVOKE",
            "SHUTDOWN"
         ]
      }
   },
   {
      "time":"2021-07-29T19:54:09.097Z",
      "type":"function",
      "record":"2021-07-29T19:54:09.097Z\t024ae572-72c7-44e0-90f5-3f002a1df3f2\tINFO\tvalue1 = value1\n"
   },   
   {
      "time":"2021-07-29T19:54:09.098Z",
      "type":"platform.runtimeDone",
      "record":{
         "requestId":"024ae572-72c7-44e0-90f5-3f002a1df3f2",
         "status":"success"
      }
   }
]
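The delivery stream's data transformer receives each of these payloads base64-encoded, and only the "type":"function" entries carry the application's own log lines. Below is a minimal sketch of that filtering step, assuming the goal described above (forwarding only function records) and assuming each Firehose record contains a JSON array like the sample shown; records with no function entries are marked as Dropped so that Firehose discards them.

import base64
import json

def handler(event, context):
    output = []
    for record in event["records"]:
        entries = json.loads(base64.b64decode(record["data"]))
        # Keep only the function log lines; drop platform.* events.
        function_logs = [e for e in entries if e.get("type") == "function"]

        if not function_logs:
            output.append({"recordId": record["recordId"], "result": "Dropped", "data": record["data"]})
            continue

        transformed = "".join(
            json.dumps({"time": e["time"], "message": e["record"]}) + "\n" for e in function_logs
        )
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(transformed.encode("utf-8")).decode("utf-8"),
        })
    return {"records": output}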

Prerequisites

To implement this solution, you need the following prerequisites:

Build the code

Check out the AWS CDK code by running the following command:

mkdir unified-logs && cd unified-logs
git clone https://github.com/aws-samples/unified-log-aggregation-and-analytics .

Build the lambda extension by running the following command:

cd lib/computes/lambda/extensions
chmod +x extension.sh
./extension.sh
cd ../../../../

Make sure to replace the default AWS Region specified as the value of the firehose.endpoint attribute inside lib/computes/ec2/ec2-startup.sh.

Build the code by running the following command:

yarn install && npm run build

Deploy the code

If you’re running AWS CDK for the first time, run the following command to bootstrap the AWS CDK environment (provide your AWS account ID and AWS Region):

cdk bootstrap \
    --cloudformation-execution-policies arn:aws:iam::aws:policy/AdministratorAccess \
    aws://<AWS Account Id>/<AWS_REGION>

You only need to bootstrap the AWS CDK one time (skip this step if you have already done this).

Run the following command to deploy the code:

cdk deploy --require-approval never

You get the following output:

 ✅  CdkUnifiedLogStack

Outputs:
CdkUnifiedLogStack.ec2ipaddress = xx.xx.xx.xx
CdkUnifiedLogStack.ecsloadbalancerurl = CdkUn-ecsse-PY4D8DVQLK5H-xxxxx.us-east-1.elb.amazonaws.com
CdkUnifiedLogStack.ecsserviceLoadBalancerDNS570CB744 = CdkUn-ecsse-PY4D8DVQLK5H-xxxx.us-east-1.elb.amazonaws.com
CdkUnifiedLogStack.ecsserviceServiceURL88A7B1EE = http://CdkUn-ecsse-PY4D8DVQLK5H-xxxx.us-east-1.elb.amazonaws.com
CdkUnifiedLogStack.eksclusterClusterNameCE21A0DB = ekscluster92983EFB-d29892f99efc4419bc08534a3d253160
CdkUnifiedLogStack.eksclusterConfigCommand515C0544 = aws eks update-kubeconfig --name ekscluster92983EFB-d29892f99efc4419bc08534a3d253160 --region us-east-1 --role-arn arn:aws:iam::xxx:role/CdkUnifiedLogStack-clustermasterroleCD184EDB-12U2TZHS28DW4
CdkUnifiedLogStack.eksclusterGetTokenCommand3C33A2A5 = aws eks get-token --cluster-name ekscluster92983EFB-d29892f99efc4419bc08534a3d253160 --region us-east-1 --role-arn arn:aws:iam::xxx:role/CdkUnifiedLogStack-clustermasterroleCD184EDB-12U2TZHS28DW4
CdkUnifiedLogStack.elasticdomainarn = arn:aws:es:us-east-1:xxx:domain/cdkunif-elasti-rkiuv6bc52rp
CdkUnifiedLogStack.s3bucketname = cdkunifiedlogstack-logsfailederrcapturebucket0bcc-xxxxx
CdkUnifiedLogStack.samplelambdafunction = CdkUnifiedLogStack-LambdatransformerfunctionFA3659-c8u392491FrW

Stack ARN:
arn:aws:cloudformation:us-east-1:xxxx:stack/CdkUnifiedLogStack/6d53ef40-efd2-11eb-9a9d-1230a5204572

AWS CDK takes care of building the required infrastructure, deploying the sample application, and collecting logs from different sources to Amazon OpenSearch Service.

The following is some of the key information about the stack:

  • ec2ipaddress – The public IP address of the EC2 instance, deployed with the sample PHP application
  • ecsloadbalancerurl – The URL of the Amazon ECS Load Balancer, deployed with the httpd application
  • eksclusterClusterNameCE21A0DB – The Amazon EKS cluster name, deployed with the NGINX application
  • samplelambdafunction – The sample Lambda function using the Lambda extension to send logs to Kinesis Data Firehose
  • elasticdomainarn – The ARN of the Amazon OpenSearch Service domain

Generate logs

To visualize the logs, you first need to generate some sample logs.

  1. To generate Lambda logs, invoke the function using the following AWS CLI command (run it a few times):
aws lambda invoke \
--function-name "<<samplelambdafunction>>" \
--payload '{"payload": "hello"}' /tmp/invoke-result \
--cli-binary-format raw-in-base64-out \
--log-type Tail

Make sure to replace samplelambdafunction with the actual Lambda function name. The file path needs to be updated based on the underlying operating system.

The function should return "StatusCode": 200, with the following output:

{
    "StatusCode": 200,
    "LogResult": "<<Encoded>>",
    "ExecutedVersion": "$LATEST"
}
  2. Run the following command a couple of times to generate Amazon EC2 logs:
curl http://ec2ipaddress:80

Make sure to replace ec2ipaddress with the public IP address of the EC2 instance.

  3. Run the following command a couple of times to generate Amazon ECS logs:
curl http://ecsloadbalancerurl:80

Make sure to replace ecsloadbalancerurl with the public URL (DNS name) of the Application Load Balancer.

We deployed the NGINX application with an internal load balancer, so the load balancer hits the health checkpoint of the application, which is sufficient to generate the Amazon EKS access logs.

Visualize the logs

To visualize the logs, complete the following steps:

  1. On the Amazon OpenSearch Service console, choose the hyperlink provided for the OpenSearch Dashboards URL.
  2. Configure access to OpenSearch Dashboards.
  3. In OpenSearch Dashboards, on the Discover menu, start creating a new index pattern for each compute log.

We can see separate indexes for each compute log partitioned by date, as in the following screenshot.

The following screenshot shows the process to create index patterns for Amazon EC2 logs.

After you create the index patterns, you can start analyzing the logs using the Discover menu in OpenSearch Dashboards. This tool provides a single searchable and unified interface for all the records across the various compute platforms. You can switch between different logs using the Change index pattern submenu.

Clean up

Run the following command from the root directory to delete the stack:

cdk destroy

Conclusion

In this post, we showed how to unify and centralize logs across different compute platforms using Kinesis Data Firehose and Amazon OpenSearch Service. This approach allows you to analyze logs quickly and identify the root cause of failures, using a single platform rather than different platforms for different services.

If you have feedback about this post, submit your comments in the comments section.

Resources

For more information, see the following resources:


About the authors

Hari Ohm Prasath is a Senior Modernization Architect at AWS, helping customers with their modernization journey to become cloud native. Hari loves to code and actively contributes to open source initiatives. You can find him on Medium, GitHub, and Twitter @hariohmprasath.

Ballu Singh is a Principal Solutions Architect at AWS. He lives in the San Francisco Bay Area and helps customers architect and optimize applications on AWS. In his spare time, he enjoys reading and spending time with his family.

How to customize behavior of AWS Managed Rules for AWS WAF

Post Syndicated from Madhu Kondur original https://aws.amazon.com/blogs/security/how-to-customize-behavior-of-aws-managed-rules-for-aws-waf/

AWS Managed Rules for AWS WAF provides a group of rules created by AWS that can be used to help protect you against common application vulnerabilities and other unwanted access to your systems, without you having to write your own rules. The AWS Threat Research Team updates AWS Managed Rules in response to an ever-changing threat landscape in order to protect your applications.

Recently, AWS WAF launched four new features that are centered on rule customization:

  • Labels – Metadata that can be added to web requests when a rule is matched. Labels can be used to alter the behavior or default action of managed rules.
  • Version management – You can select a specific version of a managed rule group. Versioning can be used to return to previously tested versions.
  • Scope-down statements – Use to narrow the scope of the requests that a rule group evaluates.
  • Custom responses – Send a custom HTTP response back to the client from AWS WAF when a rule blocks a connection request.

In this blog, we go through four use cases to demonstrate how you can use these features to improve your security posture by customizing managed rules.

Case 1: Control automatic updates for a managed rule group by selecting a specific version

By default, managed rule groups are updated automatically as updates become available. This ensures you have the latest protection as soon as it’s available. With the version management feature, you can choose to stay on a specific version, meaning that it won’t update until you explicitly move to a newer version. This allows you to test a new version and promote it to your web ACL when you’re ready, and to return to a previously tested version if necessary.

Note: It’s recommended that you use a version as close as possible to the latest.

To select a managed rule group version

  1. In your AWS WAF console, navigate to the web ACL where you’ve added a managed rule group.
  2. Select the managed rule group whose version you want to set, and choose Edit.
  3. In the Version selection drop down, select the version you want to use. You’ll remain on this version until the version expires or you select another version—you’ll learn how to manage version expiration later in this post.

Note: If you want to receive updates automatically, select Default as the version.

  4. Choose Save Rule to save the configuration.

Figure 1: Console screenshot showing the AWS Managed Rules version drop down

Set up notifications

You can use Amazon Simple Notification Service (Amazon SNS) to get notifications of updates to a managed rule group. You can subscribe to the SNS topic using the ARN of the managed rule group. Every SNS notification for AWS Managed Rules updates uses the same message format, which enables you to consume these updates programmatically. For more details on the SNS notification message format, see Getting notified of new versions and updates to a managed rule group.

To set up email notifications on new rule updates through Amazon SNS

  1. In your AWS WAF console, navigate to the web ACL where you added the managed rule group.
  2. Select the managed rule group that you want to receive notifications for, and choose Edit.
  3. On the Core rule set page, look for the Amazon SNS topic ARN. Select the link to go to the Amazon SNS console. Make a note of the topic ARN to use in step 4.

Figure 2: Console screenshot highlighting the SNS topic ARN

  4. On the Create subscription page, enter the following information:
    Topic ARN: Enter the SNS topic ARN from step 3.
    Protocol: Select Email.
    Endpoint: Enter the email address where you want notifications sent.

Figure 3: SNS Create subscription console screenshot

  5. Choose Create subscription.
  6. Watch for a confirmation email from Amazon SNS. Choose the confirm subscription link in the email to complete the subscription.

Set up a version expiration alert using a CloudWatch alarm

When you stay on a specific version of a managed rule group for a long time, there is a risk that you may miss important updates. To ensure that you do not stay on a stale version for too long, you should set up an alarm to alert you when a version is close to expiring. When a version expires, the managed rule group automatically switches to the default version. To be notified when a version is about to expire, set up an alert using an Amazon CloudWatch alarm based on the DaysToExpiry metric. You can use the following procedure to set up a notification 60 days before a specific version of the rule set you’re using expires.

To set up a CloudWatch alarm

This will notify you 60 days before a specific version of a rule set expires

  1. Navigate to the CloudWatch console.
  2. Select All metrics from the left navigation pane, and then select WAFV2 from the list of namespaces.
  3. Choose ManagedRuleGroup, Region, Vendor, Version.
  4. Select the managed rule group whose expiration you want to monitor. This example uses AWSManagedRulesCommonRuleSet and Version_1.0.
  5. Select the Graphed metrics tab and select the bell alarm icon on the lower right, under Actions. Selecting this icon will take you to the CloudWatch alarms console.

Figure 4: CloudWatch Graphed metrics tab

  6. Configure the CloudWatch alarm with the following details, and then choose Next:
    Statistic: Select Minimum
    Period: Select 5 minutes
    Threshold Type: Select Static
    Operator: Select Lower/Equal (<=threshold)
    Threshold: Enter the value as 60
    Datapoints to alarm: Enter the lower value as 1 and higher value as 1
    Missing data treatment: Select Treat missing data as good (not breaching threshold)
  7. Select the SNS topic that you want to notify when the configured alarm goes into the ALARM state, and then choose Next.
  8. Enter a name and description for the alarm. Choose Next to preview the configuration, and then choose Create Alarm to complete the CloudWatch alarm creation process.
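The same alarm can be created with the CloudWatch API. The sketch below mirrors the console settings above; the namespace and dimension names follow the metric selected in the walkthrough, and the SNS topic ARN is a placeholder.

import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="waf-managed-rule-version-expiry",
    AlarmDescription="Alerts 60 days before the pinned managed rule group version expires",
    Namespace="AWS/WAFV2",
    MetricName="DaysToExpiry",
    Dimensions=[
        {"Name": "ManagedRuleGroup", "Value": "AWSManagedRulesCommonRuleSet"},
        {"Name": "Region", "Value": "us-east-1"},
        {"Name": "Vendor", "Value": "AWS"},
        {"Name": "Version", "Value": "Version_1.0"},
    ],
    Statistic="Minimum",
    Period=300,  # 5 minutes
    EvaluationPeriods=1,
    DatapointsToAlarm=1,
    Threshold=60,
    ComparisonOperator="LessThanOrEqualToThreshold",
    TreatMissingData="notBreaching",
    AlarmActions=["arn:aws:sns:us-east-1:111122223333:waf-version-expiry-topic"],  # placeholder ARN
)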

Additional tips

  • If the version of a managed rule group that you’re using has expired, AWS WAF will prevent any configuration change to the web ACL until you select a valid version. You should move on to the newest version as soon as possible so you are covered against the latest threats.
  • You will only receive the DaysToExpiry metric when there is traffic flowing through your web ACL.
  • You can use two different versions of a managed rule group in a web ACL. This can be useful if you want to test two different versions simultaneously to see how they will affect your traffic once deployed—for example, have one version in count mode and the other in block mode.

Note: This workflow is supported through the JSON rule editor and API, but not through the console.

Case 2: Use labels to mitigate false positives caused by a rule in a managed rule group

A label is metadata that a rule can add to matching web requests, regardless of the action associated with the rule. The latest version of AWS Managed Rules supports labels. By creating custom rules that match requests that have labels, you can change the behavior or default action of rules inside a managed rule group.

For example, if you have a rule that’s causing a false positive in a managed rule group, you can mitigate it by overriding the managed rule to Count and writing a custom rule with logic similar to the following:

IF (Statement 1) AND NOT (Statement 2) THEN Block
Statement 1 matches on the label generated from the rule causing a false positive.
Statement 2 contains exception conditions for when you don’t want the rule to evaluate because it’s causing false positives.

Consider a scenario where redirection requests to your application are blocked due to the rule GenericRFI_QUERYARGUMENTS in the managed rule group you’re using. This rule inspects the value of all query parameters and blocks requests that attempt to exploit remote file inclusion (RFI) in web applications, such as :// embedded midway through a URL. An example of a legitimate redirection request that could be blocked due to the characters :// present in the query argument scope could be as follows:

https://ourdomain.com/sso/complete?scope=email profile https://www.redirect-domain.com/auth/email https://www.redirect-domain.com/auth/profile

To prevent similar legitimate requests from being blocked, you can write a custom rule to match based on the label.

Step 1: Set the specific managed rule group to count mode

The first step is to set the specific managed rule to count mode, so that labels are added to the matching requests. Next, the priority of the managed rule must be set higher than the priority of the custom rule.

To set the specific managed rule group to count mode

  1. In your AWS WAF console, navigate to your web ACL and select the Rules tab. Choose Add Rule, and then select Add managed rule groups.
  2. Select AWS managed rule groups.
  3. Under Free rule groups, look for Core rule set and add it to your web ACL by selecting the toggle Add to web ACL.
  4. Choose Edit.
  5. From the list of rules, set the rule generating false positives to the Count action, by selecting the Count toggle beside the rule. This example changes the action for the rule GenericRFI_QUERYARGUMENTS to Count. This ensures that all the matching requests are sent to the subsequent WAF rules in order of priority and adds the label awswaf:managed:aws:commonruleset:GenericRFI_QueryArguments whenever there’s a matching request.
  6. Choose Save rule.
  7. Choose Add rules again to go to the next window where you can set the rule priority. The managed rule must have a higher priority than the custom rule that you will create in the next steps.
  8. Choose Save to save the configuration.

Step 2: Add a custom rule to the web ACL with lower priority than the managed rule

Create a custom rule in the web ACL that blocks requests if it has the label that you are looking for and doesn’t have the exception condition that caused the false positive. The priority of this custom rule should be set lower than the managed rule.

To add a custom rule with lower priority than the managed rule

  1. In your AWS WAF console, navigate to your web ACL Rules tab and choose Add Rule and select Add my own rules and rule groups.
  2. Select Rule Builder for the rule type.
  3. Enter a Rule Name and select Regular Rule as the Type.
  4. Use the If a request drop down to select matches all the statements (AND).
  5. Statement 1 checks if the request has the label that you’re looking for. In this example it is configured with the following details:
    Inspect: Select Has a label
    Labels: Select Label
    Match key: Select awswaf:managed:aws:commonruleset:GenericRFI_QueryArguments
  6. All subsequent statements must be negated so that the requests don’t match the statement criteria and will be treated as legitimate requests. In this example, we set NOT Statement 2, which checks if the request contains https://www.redirect-domain.com/ in its query string:
    Enable: Select Negate statement results
    Inspect: Select All query parameters
    Match type: Select Contains string
    String to Match: Enter https://www.redirect-domain.com/
    Text transformation: Select None
  7. Under Action, select Block and choose Add rule.
  8. In the Set rule Priority window, set the rule priority of your custom rule to lower than the AWS Managed Rules rule.
  9. Choose Save.
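Expressed in the JSON form that AWS WAF uses for rules, the custom rule built in the preceding steps is an AND of a label match and a negated string match. The following is a hedged sketch shown as a Python dictionary (for example, for use with the wafv2 API through boto3); the rule name, priority, and visibility settings are placeholders.

custom_rule = {
    "Name": "block-rfi-except-redirects",
    "Priority": 10,  # evaluated after the managed rule group, which adds the label in count mode
    "Statement": {
        "AndStatement": {
            "Statements": [
                {
                    # Statement 1: the request carries the label added by the rule in count mode.
                    "LabelMatchStatement": {
                        "Scope": "LABEL",
                        "Key": "awswaf:managed:aws:commonruleset:GenericRFI_QueryArguments",
                    }
                },
                {
                    # NOT Statement 2: exclude requests that reference the known-good redirect domain.
                    "NotStatement": {
                        "Statement": {
                            "ByteMatchStatement": {
                                "SearchString": b"https://www.redirect-domain.com/",  # boto3 expects bytes; the JSON editor takes a string
                                "FieldToMatch": {"AllQueryArguments": {}},
                                "TextTransformations": [{"Priority": 0, "Type": "NONE"}],
                                "PositionalConstraint": "CONTAINS",
                            }
                        }
                    }
                },
            ]
        }
    },
    "Action": {"Block": {}},
    "VisibilityConfig": {
        "SampledRequestsEnabled": True,
        "CloudWatchMetricsEnabled": True,
        "MetricName": "block-rfi-except-redirects",
    },
}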

Case 3: Use a scope-down statement to narrow the scope of traffic matching a managed rule group

A scope-down statement can be added to any rule group to narrow the scope of the requests that a rule group evaluates. This allows you to either filter in the requests that you want the rule group to inspect or filter out any requests that don’t meet the criteria.

Consider a case where you have a list of trusted IP addresses that you don’t want to be evaluated against AmazonIPReputationList. You can avoid blocking these trusted IP addresses by using a scope-down statement to exclude the traffic from evaluation.

Step 1: Create the IP set for allowed list of IPs

The first step is to create an IP set that contains the allowed list of IPs. The IP set can be created for a particular AWS Region, or can be global if the web ACL is associated with an Amazon CloudFront distribution.

To create an IP set

  1. Choose IP sets in the AWS WAF console and then choose Create IP set.
  2. In IP set name, enter Allowed-IPs. Enter the IPs that you want to allow in IP addresses. Choose Create IP set when done.

Figure 5: Console screenshot creating an IP set

Step 2: Add a scope-down statement to the managed rule group

Once you have created the IP set, you can add a scope-down statement to your managed rule group so that traffic originating from the IPs in the IP set isn’t evaluated against the rules in the managed rule group.

To add a scope-down statement

  1. On the Rules tab of your web ACL, choose Add Rule and select Add managed rule groups.
  2. Select AWS managed rule groups.
  3. Under Free rule groups, turn on Amazon IP reputation list to add it to the web ACL and choose Edit.
  4. Select Enable scope-down statement.

Figure 6: Console screenshot showing enabling the scope-down statement

  5. Add the condition so that only the requests that don’t originate from the allowed IPs list created earlier are evaluated for this rule group. Use the If a request drop down to select doesn’t match the statements (NOT).
    Inspect: Select Originates from an IP address in
    IP set: Select Allowed-IPs
    IP address to use as the originating address: Select Source IP address

Figure 7: Scope down statement configuration console screenshot

  6. Choose Save rule to add this rule to your web ACL.
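In the web ACL's JSON form, the scope-down statement sits inside the managed rule group statement. The sketch below shows that shape as a Python dictionary; the rule name, priority, and IP set ARN are placeholders for the Allowed-IPs set created in Step 1.

managed_rule_with_scope_down = {
    "Name": "AWS-AWSManagedRulesAmazonIpReputationList",
    "Priority": 0,
    "Statement": {
        "ManagedRuleGroupStatement": {
            "VendorName": "AWS",
            "Name": "AWSManagedRulesAmazonIpReputationList",
            # Only requests that do NOT originate from the allowed IP set are evaluated by the rule group.
            "ScopeDownStatement": {
                "NotStatement": {
                    "Statement": {
                        "IPSetReferenceStatement": {
                            "ARN": "arn:aws:wafv2:us-east-1:111122223333:regional/ipset/Allowed-IPs/EXAMPLE-ID"
                        }
                    }
                }
            },
        }
    },
    "OverrideAction": {"None": {}},
    "VisibilityConfig": {
        "SampledRequestsEnabled": True,
        "CloudWatchMetricsEnabled": True,
        "MetricName": "AmazonIpReputationList",
    },
}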

Case 4: Use custom responses to change the default block action for a managed rule group

AWS WAF sends back response code 403 (forbidden) when it blocks an incoming request. You can use the custom response feature to instead send a custom HTTP response back to the client when the rule blocks access. Using the custom response, you can customize the status code, response headers, and response body.

Let’s say you want to respond back to a client who might be connecting to your application over a VPN. You want to use a custom response to inform the user that this behavior is discouraged, by sending error code 400 (Bad Request) and a static body message (“Please don’t try to connect over a VPN”). To do this, you can use the AWS Managed Rule group AWSManagedRulesAnonymousIpList and then set up custom rules using the label awswaf:managed:aws:anonymous-ip-list:AnonymousIPList.

Step 1: Create a custom response body

The first step in creating a custom response is to create a custom response body. This is the message that will be shown when the custom response is sent.

To create a custom response body

  1. In the AWS WAF console, open your web ACL and select the Custom response bodies tab.
  2. Choose Create custom response body.
  3. In Response body object name, enter a name for this response—for example, Custom-body-IP-list.
  4. Choose a Content type for the response body.
  5. In Response body, enter the response that you want to send back to the client.
  6. Choose Save.

Figure 8: Custom response body creation on the AWS WAF console

Step 2: Override the actions of the managed rule group

The rule you use to send your custom response should be in count mode. This will ensure that all the matching requests are sent to the subsequent WAF rules in priority order. In the following example, the rule AnonymousIPList in the managed rule group AWSManagedRulesAnonymousIpList is set to count mode. For more details on how to override the action of a managed rule group, see Overriding the actions of a rule group or its rules.

Figure 9: Console screenshot overriding an AWS Managed Rules rule

Step 3: Create a rule to block the request and send a custom response back to the client

You’ll use the AWS WAF labels feature for this step. As explained in Case 2 above, you need to create a custom rule that matches the label generated by the managed rule. In this case, the custom rule should be configured to send your custom response.

To create a custom rule

  1. Expand the Custom response section and select Enable.
  2. Under Response code, enter the custom HTTP status code you want to send back to the client.
  3. (Optional) Use the Response headers section if you wish to add a custom response header.
  4. Under Choose how you would like to specify the response body, select the custom response body you created in Step 1.
  5. (Optional) If you wish to generate additional labels to track activity in logs, you can use Add label.
  6. Choose Add rule.
  7. Set the rule priority of your custom rule to lower than the AWS Managed Rules rule.
  8. Choose Save.

Figure 10: Console screenshot configuring a custom response body for a rule
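In the web ACL's JSON form, the custom response body is defined once at the web ACL level and then referenced from the rule's block action. The following hedged sketch shows both pieces as Python dictionaries; the key names and message text follow the example above, and the extra header is illustrative only (AWS WAF prefixes custom response header names with x-amzn-waf-).

# Defined once at the web ACL level (CustomResponseBodies).
custom_response_bodies = {
    "Custom-body-IP-list": {
        "ContentType": "TEXT_PLAIN",
        "Content": "Please don't try to connect over a VPN",
    }
}

# Block action of the custom rule that matches the AnonymousIPList label.
custom_rule_action = {
    "Block": {
        "CustomResponse": {
            "ResponseCode": 400,
            "CustomResponseBodyKey": "Custom-body-IP-list",
            "ResponseHeaders": [{"Name": "reason", "Value": "vpn-not-allowed"}],  # optional
        }
    }
}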

Summary

In this post, we demonstrated how the new AWS WAF features such as labels, version management, scope-down statements, and custom responses can help you customize the behavior of AWS Managed Rules to protect your web applications and minimize risk. You can use these features in various ways, such as customizing AWS Managed Rules by combining labels and request properties to allow or block requests, and using labels to help filter logs.

You can learn more about AWS WAF in other AWS WAF–related Security Blog posts.

 
If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security news? Follow us on Twitter.

Madhu Kondur

Madhu is a cloud support engineer at AWS. He’s passionate about helping customers solve their AWS issues. He specializes in network security and enjoys helping customers get the best cloud experience possible through AWS.

Venugopal Pai

Venugopal is a solutions architect at AWS. He lives in Bengaluru, India, and helps customers scale and optimize their applications in AWS.

Hardening the security of your AWS Elastic Beanstalk Application the Well-Architected way

Post Syndicated from Laurens Brinker original https://aws.amazon.com/blogs/security/hardening-the-security-of-your-aws-elastic-beanstalk-application-the-well-architected-way/

Launching an application in AWS Elastic Beanstalk is straightforward. You define a name for your application, select the platform you want to run it on (for example, Ruby), and upload the source code. The default Elastic Beanstalk configuration is intended to be a starting point which prioritizes simplicity and ease of setup. This allows you to quickly deploy a web application on the AWS Cloud. For increased security of production applications, we recommend additional steps you can take to complement the default configuration.

In this post we will describe our recommendations, which are aligned with the AWS Well-Architected Framework, to help you harden the security posture of your Elastic Beanstalk applications. The Well-Architected Framework provides best practices to help you build secure, high-performing, resilient, and efficient infrastructure for your applications and workloads. Focusing on the Security pillar of the framework, we will walk you through additional configurations for increased network protection and protection of data at rest and in transit.

Introduction to Elastic Beanstalk

Elastic Beanstalk is an orchestration service that provisions and operates infrastructure in the AWS Cloud. You can use Elastic Beanstalk to deploy and manage applications in the cloud. Elastic Beanstalk supports many programming languages and frameworks, such as Java, .NET, PHP, Node.js, Python, Ruby, Go, and Docker. Elastic Beanstalk can help you decrease overhead by handling tasks such as resource provisioning, load balancing, auto scaling, and health monitoring. You only need to upload the application code. Elastic Beanstalk automatically integrates with other AWS services such as Amazon CloudWatch for logging and monitoring.

Target scenario for this post

This post shows you how to achieve the following things:

  • Launch a highly available Ruby application on Elastic Beanstalk.
  • Attach a MySQL database to the application using Amazon RDS.
  • Protect your sensitive data.
  • Align your application’s security configuration to the Security pillar of the Well-Architected Framework.

Figure 1: Target architecture for the two-tier web application deployed using Elastic Beanstalk

Figure 1 depicts the target architecture, which is a two-tier web application. Clients resolve the website’s domain name using the Domain Name System (DNS) service Amazon Route 53. An Application Load Balancer (ALB) is used to direct traffic to and from the Amazon EC2 instances which are running the web servers. The EC2 instances are deployed in an Auto Scaling group in private subnets. To ensure that clients can always access the application, the infrastructure is set up so that it can automatically deal with system failures and scale up when there’s an increase in demand. This is done by placing the EC2 instances in the Auto Scaling group across two Availability Zones for high availability. There is also an RDS MySQL database deployed in a private subnet, which is replicated to a stand-by instance in another Availability Zone for disaster recovery. Logs and metrics are sent to CloudWatch, and Amazon Simple Storage Service (Amazon S3) is used to store logs and source code. Finally, a Network Address Translation (NAT) gateway and an Internet gateway manage inbound and outbound traffic to subnets.

The following sections focus on the four main security configurations numbered in Figure 1:

  1. Deploying the EC2 and RDS instances from the web and database layer in private subnets.
  2. Encrypting the logging and source code S3 bucket.
  3. Encrypting the RDS instance and its stand-by replica.
  4. Encrypting data in transit by using the HTTPS protocol.

Strengthening your Elastic Beanstalk application based on the Security pillar of the Well-Architected Framework

To harden the security of your Elastic Beanstalk application, you can build on top of the default setup to incorporate the following security best practices:

  1. Protect networks – In the default Elastic Beanstalk setup, the EC2 instances are deployed together with an Application Load Balancer (ALB) in a public subnet. In most cases, EC2 instances do not need to be directly accessible from the internet and therefore should be placed in private subnets. The ALB should be left in the public subnet to provide a single entry-point for inbound traffic from external clients and forward this traffic to the instances over a private network. If these instances need to make a direct outbound connection to the internet, for example to call third-party APIs, we recommend creating a Network Address Translation (NAT) gateway in a public subnet, and adding a route from the private subnet where your instances are running to the NAT Gateway. Your instances can then send requests to the internet and receive corresponding responses through the NAT gateway, but the instances themselves will not be directly accessible from the internet. For more options on interactively accessing instances see AWS Systems Manager.
  2. Protect data at rest – We recommend encrypting data at rest. Elastic Beanstalk does not encrypt data stored in Amazon S3 buckets by default, so you should modify the default setup to encrypt the bucket. Similarly, when you set up an RDS database directly through Elastic Beanstalk, you don’t have the option to encrypt the database, so you need to set up your database independently and enable encryption.
  3. Protect data in transit – Web traffic sent between your clients and the ALB over the internet should use HTTPS rather than HTTP. The HTTPS protocol creates an encrypted connection through TLS (Transport Layer Security) between client and server before sending any web traffic. The default setup in Elastic Beanstalk uses HTTP, so the choice to use HTTPS and how to enable it sits with the user. Setting up HTTPS can be done with SSL / TLS server certificates (X.509 certificates) which you manage inside AWS using AWS Certificate Manager or through an external provider. ALB supports TLS-termination, which means that it takes care of the encryption and decryption of the traffic communicated with clients, and then forwards the traffic to the instances over the AWS private network.

Implementing the recommended best practices for your application

To implement the best practices from the section above, you will take the following steps to launch your application, protect networks and to protect data at rest and in transit:

  1. Create your own VPC with public and private subnets.
  2. Create a highly-available Elastic Beanstalk application.
  3. Modify the configuration to deploy instances in private subnets.
  4. Encrypt the log and source code bucket.
  5. Launch an encrypted RDS instance.
  6. Set up encryption in-transit by using the HTTPS protocol.

Create your VPC with public and private subnets

  1. In the AWS Management Console, go to VPC, and select Launch VPC wizard.
  2. Select the VPC with Public and Private Subnets option on the left-hand side, as shown in Figure 2.

Figure 2: Launch VPC wizard

  3. Click Select.
  4. Adjust the VPC specifications as needed. Specify a CIDR range and a name for the VPC. For the private and public subnets, you need to additionally specify the subnets CIDR range as well as which Availability Zone they should be created in. In order for instances in the private subnet to access the internet, the set-up creates a NAT gateway that resides in the public subnet. In order to do that, you need to specify an Elastic IP ID. If you don’t have an Elastic IP yet, under the VPC console go to Elastic IP addresses, click on Allocate Elastic IP address and Allocate. Use the Allocation ID in the VPC wizard.
  5. Select Create VPC.
  6. Because the target architecture is highly available, another set of public and private subnets needs to be created and set to reside in a different Availability Zone from the subnets you configured in step 4. This is done by going to the Subnets section in the VPC Console. Click on Create subnet, select the VPC you just created, add a new subnet, making sure to assign it to a different Availability Zone. Press Add new subnet to add a second subnet on the same configuration page. When done, press Create subnet.
  7. By default, the subnets will use the main routing table, which will treat them as private subnets. In order to make one of the newly created subnets public, it needs to be added to the route table, which has a route to the Internet Gateway. Go to the Route Tables section in the VPC Console and find the route table associated with your newly created VPC, which has the route to the Internet Gateway. This should be the Route Table which has 1 explicit subnet association. Click on the Route Table’s ID, and verify that there’s a route to a target with the igw- prefix. Then, under the Subnet association tab, edit the explicit subnet associations to include the newly created subnet.

After this is done, you should have two public and two private subnets across two Availability Zones for your new VPC.

Create a highly available Elastic Beanstalk application

The following steps will show you how to create a highly available Elastic Beanstalk application.

  1. In the AWS Management Console, choose Elastic Beanstalk, and then, in the Get Started section, select Create Application.
  2. Provide a name for the application and define the platform it should run on. In our example, the platform is Ruby.
  3. Provide the source code for your web application or use the sample code provided in the Elastic Beanstalk setup console.
    • To use the sample code, select Sample Application.
    • To upload your own source code, in the Source code origin section, for Version label, enter the name of your application code, and then for Source code origin, choose Local file, select Choose File, and navigate to the file that you want to use, as shown in Figure 3.

Figure 3: Source code origin section of the Elastic Beanstalk console

Figure 3: Source code origin section of the Elastic Beanstalk console

  1. Select Configure more options.
  2. Depending on your application’s needs, you can select a configuration preset that includes recommended values for several configurations. Select High Availability to include a load balancer and auto scaling for multiple Availability Zones.

Deploy your instances in private subnets

In this step, you will set up Elastic Beanstalk to deploy the Application Load Balancer in the public subnets to provide a point of access for inbound traffic from the internet, and to deploy the EC2 instances in the private subnets.

While still in the Configure more options settings:

  1. In the Network section, select Edit, and then, from the dropdown list, select the VPC that you just created.
  2. To deploy your instances in private subnets, in the Load balancer settings section, for Load balancer subnets, check the box next to each public subnet, and in the Instance settings section, for Instance subnets, check the box next to each private subnet, as shown in Figure 4.
Figure 4: Elastic Beanstalk subnet settings for Load Balancer and instances

Figure 4: Elastic Beanstalk subnet settings for Load Balancer and instances

  1. Select Save.

Encrypt the log and source code bucket and block public access

After Elastic Beanstalk has created the application, you can encrypt the S3 bucket.

  1. Open the S3 console and choose the bucket that was created automatically as part of the Elastic Beanstalk setup. The bucket name will have the following structure: elasticbeanstalk-region-account-id.
  2. To encrypt the bucket, choose Properties, and then, for Default Encryption, select Edit, and for Server-side encryption, select Enable.
  3. For Encryption key type, you can use an S3-managed encryption key by selecting Amazon S3 key (SSE-S3). If you want more control over the keys used for encryption, select AWS Key Management Service key (SSE-KMS), an encryption key protected by AWS Key Management Service (AWS KMS). Here, you can specify an AWS managed key or one of your own customer managed keys from KMS. For more information on SSE-KMS, visit Protecting Data Using Server-Side Encryption with KMS keys Stored in AWS Key Management Service (SSE-KMS).
  4. Select Save changes.

Even though the bucket created by Elastic Beanstalk is non-public by default, we recommend enabling S3 Block Public Access at the account level, or at least at the bucket level, to prevent tampering or accidental changes to this setting in the future.

  1. In your S3 console, click on Block Public Access settings for this account.
  2. If Block all public access is not yet enabled, click on Edit, check the box next to Block all public access and click Save.
  3. Apart from that, you can also block public access at the bucket level. For this, click on the respective bucket, open the Permissions section, and edit Block public access (bucket settings) similarly to how you did in step 2. A CLI sketch covering both the encryption and public access settings follows these steps.
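
If you prefer the AWS CLI, the following sketch applies both settings to the bucket. It assumes SSE-S3 (AES256) as the encryption type; replace the bucket name with the one created by Elastic Beanstalk.

# Enable default encryption (SSE-S3) on the Elastic Beanstalk bucket
aws s3api put-bucket-encryption \
  --bucket elasticbeanstalk-<REGION>-<ACCOUNT-ID> \
  --server-side-encryption-configuration '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"AES256"}}]}'

# Block public access at the bucket level
aws s3api put-public-access-block \
  --bucket elasticbeanstalk-<REGION>-<ACCOUNT-ID> \
  --public-access-block-configuration BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true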

Launch an encrypted RDS instance

Elastic Beanstalk allows you to set up and run RDS instances in your Elastic Beanstalk environment. Until recently, the database was tied to the lifecycle of the Elastic Beanstalk environment, and its use was recommended to be limited to development and testing environments only. For example, if you previously launched an RDS instance using Elastic Beanstalk, and the Elastic Beanstalk environment was terminated, the RDS instance would also be deleted. As of October 6, 2021, Elastic Beanstalk now supports Database Decoupling, so that the database will persist when the environment is deleted.

However, Elastic Beanstalk currently does not allow you to set up encryption for your database. For this reason, this post shows you how to set up your Elastic Beanstalk application with a decoupled database, by creating the database directly in the RDS service, separate from your Elastic Beanstalk application. RDS allows you to encrypt your database.

Decoupling your database and setting it up directly through the RDS service in the AWS console will require additional steps for integration with your Elastic Beanstalk application, which this post will walk you through.

Note: If you are using the Elastic Beanstalk service to create your RDS instance, you can select one of three options:

  • The first option, enabled by selecting the Create snapshot retention option in the database settings in the Elastic Beanstalk console, makes sure that Elastic Beanstalk creates a snapshot of your database prior to termination. You can restore an existing snapshot of your database through the Elastic Beanstalk console. Bear in mind that there will be downtime of your database between snapshot creation and snapshot restore.
  • The second option, Retain, creates a decoupled database, which persists if the Elastic Beanstalk environment has been terminated.
  • The third option, Delete, removes the database upon termination.

In this step, you will create an encrypted RDS database, allow access to the database from your application’s instances only, and add the required environment variables to your application so you can use your database in the application.

  1. On the RDS service page in the console, select Create database.
  2. For the database creation method, select Standard create.
  3. For Engine options, choose MySQL and select the latest version.
  4. For Templates choose either the Dev/Test or Production template according to your use case.
  5. In the Settings section, provide a name to use as the database identifier and set a username and password.
  6. Select the appropriate DB instance class that meets your processing power and memory requirements.
  7. For Storage, select your storage type and allocate storage.
  8. If you need Multi-AZ deployment, in the Availability & durability section, choose Create a standby instance.
  9. In the Connectivity section, select the VPC that you created in the Create your VPC with public and private subnets section earlier in this blog post, and verify that Public access has been set to No. For VPC security group, choose Create new and provide a name to identify the group later on.
  10. In the Additional configuration section, under Encryption, choose Enable Encryption. You can choose the default AWS KMS key if you’re happy with AWS managing the keys, or provide a custom key if you want more control. Bear in mind that the encryption option cannot be changed after the database has been created.
  11. Leave the defaults for the remaining settings and select Create database. An equivalent AWS CLI sketch is shown after these steps.
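
For reference, the following AWS CLI sketch creates a comparable encrypted, non-public, Multi-AZ MySQL instance. The identifier, credentials, DB subnet group, security group, instance class, and storage size are placeholders and assumptions; adjust them to match the choices you made above.

aws rds create-db-instance \
  --db-instance-identifier <YOUR-DB-IDENTIFIER> \
  --engine mysql \
  --db-instance-class db.t3.medium \
  --allocated-storage 20 \
  --master-username <YOUR-DB-USERNAME> \
  --master-user-password <YOUR-DB-PASSWORD> \
  --db-subnet-group-name <YOUR-PRIVATE-DB-SUBNET-GROUP> \
  --vpc-security-group-ids <YOUR-DB-SECURITY-GROUP-ID> \
  --no-publicly-accessible \
  --multi-az \
  --storage-encrypted

Note that --storage-encrypted uses the default AWS managed KMS key unless you also pass --kms-key-id with a customer managed key.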

After you set up the RDS database and your new Elastic Beanstalk application, you can add the database to your application.

  1. In the RDS console, go to your newly created RDS database and scroll down to Security group rules.
  2. Select the security group that has the CIDR/IP – Inbound type.
  3. Under Inbound rules, select the rule that is listed, and then select Edit inbound rules.
  4. Under the Source column, make sure Custom is selected, and in the search box next to it, select the security group associated with your Elastic Beanstalk Auto Scaling group.

Important: As a security best practice, you should allow traffic to your RDS database from your instances only. Therefore, make sure the security group allows traffic only from the Auto Scaling group’s security group, and that it has no additional entries.

  1. To add the RDS details to the Elastic Beanstalk environment properties, go to your application’s environment in the Elastic Beanstalk service and navigate to Configuration > Software > Edit > Environment Properties. Add RDS_HOSTNAME, RDS_PORT, RDS_DB_NAME, RDS_USERNAME and RDS_PASSWORD as properties and provide the values that you used to set up the database.
  2. Restart the application by going back to your Elastic Beanstalk environment, and then under Environment actions, choose Restart app server(s).

After the server has restarted, you can access the RDS database in your web application by using the environment properties you set in the console, just as you would if you attached the database directly through the Elastic Beanstalk setup. For more information on using environment properties, visit Environment properties and other software settings.
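
As an alternative to editing the environment properties in the console, you can set them with the AWS CLI. The following is a minimal sketch with placeholder values; updating option settings through the API triggers an environment update, so the instances typically pick up the new values without a separate restart.

aws elasticbeanstalk update-environment \
  --environment-name <YOUR-ENVIRONMENT-NAME> \
  --option-settings \
    Namespace=aws:elasticbeanstalk:application:environment,OptionName=RDS_HOSTNAME,Value=<YOUR-RDS-ENDPOINT> \
    Namespace=aws:elasticbeanstalk:application:environment,OptionName=RDS_PORT,Value=3306 \
    Namespace=aws:elasticbeanstalk:application:environment,OptionName=RDS_DB_NAME,Value=<YOUR-DB-NAME> \
    Namespace=aws:elasticbeanstalk:application:environment,OptionName=RDS_USERNAME,Value=<YOUR-DB-USERNAME> \
    Namespace=aws:elasticbeanstalk:application:environment,OptionName=RDS_PASSWORD,Value=<YOUR-DB-PASSWORD>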

The new database is now separate from your application and it is encrypted to provide data protection at rest.

Important: The environment properties, including the database username and password, are visible and stored in plain text in the Environment Properties in Elastic Beanstalk.

Depending on your security requirements, you can choose to use AWS Secrets Manager to protect your database credentials, which you can then fetch programmatically in your Elastic Beanstalk instance or through Elastic Beanstalk’s custom environment configuration files (.ebextensions). To learn more about using Secrets Manager to protect and rotate database credentials, see Rotate Amazon RDS database credentials automatically with AWS Secrets Manager. However, this will require additional configuration for Elastic Beanstalk and is beyond the scope of this post.

A second possibility is to use IAM database authentication, which allows you to use your Elastic Beanstalk EC2 instance's IAM role to connect to your database. This method uses short-lived authentication tokens rather than a static database password. To set this up, you need to enable IAM database authentication, create an IAM policy that allows IAM database access, and create a database account for IAM authentication using the AWSAuthenticationPlugin (for MySQL). Authentication tokens are valid for 15 minutes. If your web instances need to create a new database connection or reconnect, expired tokens must be refreshed first; otherwise, the connection will be rejected.

For an implementation guide, check out How do I allow users to authenticate to an Amazon RDS MySQL DB instance using their IAM credentials. For Ruby applications, you can get the authentication token in your application by leveraging the auth_token_generator method in the Ruby aws-sdk.
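
To verify the IAM authentication setup before wiring it into your application, you can generate a token with the AWS CLI; the endpoint, Region, and database user below are placeholders. The returned token is used in place of a password when opening the connection and expires after 15 minutes.

aws rds generate-db-auth-token \
  --hostname <YOUR-RDS-ENDPOINT> \
  --port 3306 \
  --region <YOUR-REGION> \
  --username <YOUR-IAM-DB-USER>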

Set up encryption in transit using the HTTPS protocol

In the Elastic Beanstalk architecture, you can encrypt data in transit at three connection points: from your clients to the load balancer, from the load balancer to the EC2 instances, and from the EC2 instances to the RDS database.

Securing the connection from clients to the ALB

You can use a custom domain name with HTTPS for your Elastic Beanstalk environment so that your clients can connect securely to your environment. If you don't have a domain name, you can assign a self-signed server certificate to your ALB to use HTTPS for development and testing purposes.

To secure the connection to your ALB, add an HTTPS listener for the inbound traffic port (typically 443) and attach a TLS/SSL server certificate (X.509 certificate). To generate certificates, use AWS Certificate Manager or a third-party provider such as Let's Encrypt. For a walkthrough on how to set up an HTTPS listener through the console or through .ebextensions configuration files, see Configuring your Elastic Beanstalk environment's load balancer to terminate HTTPS.
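
As a rough sketch, the listener can also be configured through Elastic Beanstalk option settings rather than the console; the option names below assume an Application Load Balancer environment, and the environment name and certificate ARN are placeholders.

aws elasticbeanstalk update-environment \
  --environment-name <YOUR-ENVIRONMENT-NAME> \
  --option-settings \
    Namespace=aws:elbv2:listener:443,OptionName=ListenerEnabled,Value=true \
    Namespace=aws:elbv2:listener:443,OptionName=Protocol,Value=HTTPS \
    Namespace=aws:elbv2:listener:443,OptionName=SSLCertificateArns,Value=<YOUR-ACM-CERTIFICATE-ARN>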

Securing the connection from the ALB to the EC2 instances

While securing the connection between clients and the ALB is enough for most applications, in some cases complete end-to-end encryption may be required; for example, to comply with (external) regulations. To secure the connection from your ALB to your application running on an EC2 instance, you must use the .ebextensions configuration files to modify the software running on the instance. You then need to allow HTTPS traffic to pass from the ALB to your EC2 instance by allowing inbound traffic on port 443 in the instance's security group. For a Ruby-specific example, see Terminating HTTPS on EC2 instances running Ruby.

For a complete end-to-end encryption walkthrough, see How can I configure HTTPS for my Elastic Beanstalk environment?

Securing the RDS connection

To securely connect from your application to your RDS database, you can use SSL/TLS to encrypt the connection. You will need to download an RDS root certificate and require your application to use this certificate when connecting to the RDS instance, so that it can verify the RDS server certificate. For more information on how to download and use the root certificate to set up a secure RDS connection, see the Using SSL with a MySQL DB instance documentation page.
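
The following sketch shows one way to test an encrypted connection from an instance using the MySQL command line client. The certificate bundle URL is taken from the RDS documentation and may differ for your Region, --ssl-mode assumes a MySQL 5.7 or later client, and the endpoint and username are placeholders.

# Download the RDS root certificate bundle (see the RDS documentation for the current location)
curl -o global-bundle.pem https://truststore.pki.rds.amazonaws.com/global/global-bundle.pem

# Connect and verify the RDS server certificate
mysql -h <YOUR-RDS-ENDPOINT> -P 3306 -u <YOUR-DB-USERNAME> -p \
  --ssl-ca=global-bundle.pem \
  --ssl-mode=VERIFY_IDENTITY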

This post has shown you how to align your application with some of the security best practices of the Well-Architected Framework. After completing these steps, your architecture includes four key modifications to improve security:

  1. The EC2 and RDS instances are deployed in a private subnet.
  2. The logging and source code S3 bucket is encrypted.
  3. An encrypted RDS instance is attached.
  4. Encryption occurs in transit by using the HTTPS protocol.

Conclusion

In this post, we’ve covered the additional configuration you should be aware of to harden the security posture of your Elastic Beanstalk applications, aligning to the Security pillar of the Well-Architected Framework. The final setup you created uses a VPC and private subnets to allow internet access only to resources that require it, and provides encryption at rest and in transit using AWS Cloud Security services and secure protocols. The Well-Architected Framework describes additional concepts, design principles, and architectural best practices for designing and running workloads in the cloud. To learn more, see AWS Well-Architected.

 
If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security news? Follow us on Twitter.

Laurens Brinker

Laurens Brinker

Laurens Brinker is an Associate Solutions Architect based in London who is part of the Security Community at AWS. Laurens joined AWS as part of the TechU Graduate program in 2020 and now helps customers run their workloads securely in the AWS Cloud. Outside of work, Laurens enjoys cycling, a casual game of chess, and building small web applications.

Katja Philipp

Katja Philipp

Katja Philipp is an Associate Solutions Architect based in Germany. With a background in M.Sc. Information Systems, she joined AWS in September 2020 with the TechU Graduate program. She enables her customers in the Power & Utilities vertical with best practices around their cloud journey. Katja is passionate about sustainability and how technology can be leveraged to solve current challenges for a better future.

Laura Verghote

Laura Verghote

Laura Verghote is an Associate Technical Trainer based in London, UK. With a background in electrical engineering, she joined AWS in September 2020 with the TechU Graduate program. She delivers a variety of technical trainings to AWS customers across EMEA.

Kimessha Paupamah

Kimessha Paupamah

Kimessha Paupamah is a ProServe Consultant based in South Africa. With a background in Computer Science, she joined AWS in September 2020 with the TechU Graduate program. She accelerates customer business outcomes through guidance on how to architect, design, develop and implement the AWS platform. Kimessha is passionate about enabling customers to build innovative solutions in the cloud.

Benjamin Richer

Benjamin Richer

Benjamin Richer is an Associate Solutions Architect based in Paris. With a background in Network & Telecom, he joined AWS in 2020 through the TechU Graduate Program. Currently working in the Digital Native Business segment, he helps growing startups optimize their workloads in the cloud.

Choose the right storage tier for your needs in Amazon OpenSearch Service

Post Syndicated from Changbin Gong original https://aws.amazon.com/blogs/big-data/choose-the-right-storage-tier-for-your-needs-in-amazon-opensearch-service/

Amazon OpenSearch Service (successor to Amazon Elasticsearch Service) enables organizations to perform interactive log analytics, real-time application monitoring, website search, and more. OpenSearch is an open-source, distributed search and analytics suite derived from Elasticsearch. Amazon OpenSearch Service offers the latest versions of OpenSearch, support for 19 versions of Elasticsearch (1.5 to 7.10 versions), and visualization capabilities powered by OpenSearch Dashboards and Kibana (1.5 to 7.10 versions).

In this post, we present the three storage tiers of Amazon OpenSearch Service—hot, UltraWarm, and cold storage—and discuss how to choose the right storage tier for your needs. This post can help you understand how these storage tiers work together and what the trade-offs are for each, so that you can weigh performance, latency, and cost when deciding on a tier for your use case.

Amazon OpenSearch Service storage tiers overview

There are three different storage tiers for Amazon OpenSearch Service: hot, UltraWarm, and cold. The following diagram illustrates these three storage tiers.

Hot storage

Hot storage for Amazon OpenSearch Service is used for indexing and updating, while providing fast access to data. Standard data nodes use hot storage, which takes the form of instance store or Amazon Elastic Block Store (Amazon EBS) volumes attached to each node. Hot storage provides the fastest possible performance for indexing and searching new data.

You get the lowest latency for reading data in the hot tier, so you should use the hot tier to store frequently accessed data driving real-time analysis and dashboards. As your data ages, you access it less frequently and can tolerate higher latency, so keeping data in the hot tier is no longer cost-efficient.

If you want to have low latency and fast access to the data, hot storage is a good choice for you.

UltraWarm storage

UltraWarm nodes use Amazon Simple Storage Service (Amazon S3) with related caching solutions to improve performance. UltraWarm offers significantly lower costs per GiB for read-only data that you query less frequently and don’t need the same performance as hot storage. Although you can’t modify the data while in UltraWarm, you can move the data to the hot storage tier for edits before moving it back.

When calculating UltraWarm storage requirements, you consider only the size of the primary shards. When you query for the list of shards in UltraWarm, you still see the primary and replicas listed. Both shards are stubs for the same, single copy of the data, which is in Amazon S3. The durability of data in Amazon S3 removes the need for replicas, and Amazon S3 abstracts away any operating system or service considerations. In the hot tier, accounting for one replica, 20 GB of index uses 40 GB of storage. In the UltraWarm tier, it’s billed at 20 GB.

The UltraWarm tier acts like a caching layer on top of the data in Amazon S3. UltraWarm moves data from Amazon S3 onto the UltraWarm nodes on demand, which speeds up access for subsequent queries on that data. For that reason, UltraWarm works best for use cases that access the same, small slice of data multiple times. You can add or remove UltraWarm nodes to increase or decrease the amount of cache against your data in Amazon S3 to optimize your cost per GB. To dial in your cost, be sure to test using a representative dataset. To monitor performance, use the WarmCPUUtilization and WarmJVMMemoryPressure metrics. See UltraWarm metrics for a complete list of metrics.

The combined CPU cores and RAM allocated to UltraWarm nodes affects performance for simultaneous searches across shards. We recommend deploying enough UltraWarm instances so that you store no more than 400 shards per ultrawarm1.medium.search node and 1,000 shards per ultrawarm1.large.search node (including both primaries and replicas). We recommend a maximum shard size of 50 GB for both hot and warm tiers. When you query UltraWarm, each shard uses a CPU and moves data from Amazon S3 to local storage. Running single or concurrent queries that access many indexes can overwhelm the CPU and local disk resources. This can cause longer latencies through inefficient use of local storage, and even cause cluster failures.

UltraWarm storage requires OpenSearch 1.0 or later, or Elasticsearch version 6.8 or later.

If you have large amounts of read-only data and want to balance the cost and performance, use UltraWarm for your infrequently accessed, older data.

Cold storage

Cold storage is optimized to store infrequently accessed or historical data at $0.024 per GB per month. When you use cold storage, you detach your indexes from the UltraWarm tier, making them inaccessible. You can reattach these indexes in a few seconds when you need to query that data. Cold storage is a great fit for scenarios in which a low ROI necessitates an archive or delete action on historical data, or if you need to conduct research or perform forensic analysis on older data with Amazon OpenSearch Service.

Cold storage doesn’t have specific instance types because it doesn’t have any compute capacity attached to it. You can store any amount of data in cold storage.

Cold storage requires OpenSearch 1.0 or later, or Elasticsearch version 7.9 or later and UltraWarm.

Manage storage tiers in OpenSearch Dashboards

OpenSearch Dashboards installed on your Amazon OpenSearch Service domain provides a useful UI for managing indexes in different storage tiers on your domain. From the OpenSearch Dashboards main menu, you can view all indexes in hot, UltraWarm, and cold storage. You can also see the indexes managed by Index State Management (ISM) policies. OpenSearch Dashboards enables you to migrate indexes between UltraWarm and cold storage, and monitor index migration status, without using the AWS Command Line Interface (AWS CLI) or configuration API. For more information on OpenSearch Dashboards, see Using OpenSearch Dashboards with Amazon OpenSearch Service.
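
Tier transitions can also be automated with an ISM policy. The following is a sketch of a policy that keeps an index hot for 7 days, migrates it to UltraWarm, and then to cold storage after 90 days; the index pattern, timestamp field, and timings are assumptions you should adapt to your own retention requirements.

{
  "policy": {
    "description": "Example hot to UltraWarm to cold lifecycle",
    "default_state": "hot",
    "states": [
      {
        "name": "hot",
        "actions": [],
        "transitions": [{ "state_name": "warm", "conditions": { "min_index_age": "7d" } }]
      },
      {
        "name": "warm",
        "actions": [{ "warm_migration": {} }],
        "transitions": [{ "state_name": "cold", "conditions": { "min_index_age": "90d" } }]
      },
      {
        "name": "cold",
        "actions": [{ "cold_migration": { "timestamp_field": "@timestamp" } }],
        "transitions": []
      }
    ],
    "ism_template": { "index_patterns": ["logs-*"] }
  }
}

You can create a policy like this from the Index Management section of OpenSearch Dashboards or through the ISM API.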

Cost considerations

The hot tier requires you to pay for what is provisioned, which includes the hourly rate for the instance type. Storage is either Amazon EBS or a local SSD instance store. For Amazon EBS-only instance types, additional EBS volume pricing applies. You pay for the amount of storage you deploy.

UltraWarm nodes charge per hour just like other node types, but you only pay for the storage actually stored in Amazon S3. For example, although the instance type ultrawarm1.large.elasticsearch provides up to 20 TiB addressable storage on Amazon S3, if you only store 2 TiB of data, you’re only billed for 2 TiB. Like the standard data node types, you also pay an hourly rate for each UltraWarm node. For more information, see Pricing for Amazon OpenSearch Service.

Cold storage doesn’t incur compute costs, and like UltraWarm, you’re only billed for the amount of data stored in Amazon S3. There are no additional transfer charges when moving data between cold and UltraWarm storage.

Example use case

Let’s look at an example with 1 TB of source data per day, 7 days hot, 83 days warm, 365 days cold. For more information on sizing the cluster, see Sizing Amazon OpenSearch Service domains.

For hot storage, you can make a baseline estimate with the following calculation: storage needed = (daily source data in bytes * 1.25) * (number_of_replicas + 1) * number of days retention. Following the best practice of two replicas, the minimum storage required to retain 7 TB of data in the hot tier is (7 TB * 1.25) * (2 + 1) = 26.25 TB. For this amount of storage, we need 6x R6g.4xlarge.search instances, given the Amazon EBS size limit.

We also need to verify the vCPU requirements. Each daily index needs 25 primary shards (1 TB * 1.25 / 50 GB = 25). With two replicas, that gives a total of 75 active shards. Using the guideline of 1.5 vCPUs per active shard, the total vCPU needed is 75 * 1.5 = 112.5 vCPUs, which means 8x R6g.4xlarge.search instances. This also requires three dedicated c6g.xlarge.search leader nodes.

When calculating UltraWarm storage requirements, you consider only the size of the primary shards, because that's the amount of data stored in Amazon S3. For this example, the total primary shard size for warm storage is 83 days * 1 TB * 1.25 = 103.75 TB. Each ultrawarm1.large.search instance has 16 CPU cores and can address up to 20 TiB of storage on Amazon S3. A minimum of six ultrawarm1.large.search nodes is recommended. You're charged for the actual storage, which is 103.75 TB.

For cold storage, you only pay for the cost of storing 365 days * 1 TB * 1.25 = 456.25 TB on Amazon S3. The following table contains a breakdown of the monthly costs (USD) you're likely to incur. This assumes 1-year reserved instances for the cluster instances with no upfront payment in the US East (N. Virginia) Region.

Cost Type Pricing Usage Cost per month
Instance Usage R6g.4xlarge.search = $0.924 per hour 8 instances * 730 hours in a month = 5,840 hours 5,840 hours * $0.924 = $5,396.16
c6g.xlarge.search = $0.156 per hour 3 instances (leader nodes) * 730 hours in a month = 2,190 hours 2,190 hours * $0.156 = $341.64
ultrawarm1.large.search = $2.68 per hour 6 instances * 730 hours = 4,380 hours 4,380 hours * $2.68 = $11,738.40
Storage Cost Hot storage cost (Amazon EBS) EBS general purpose SSD (gp3) = $0.08 per GB per month 7 days hot = 26.25 TB 26,880 GB * $0.08 = $2,150.40
UltraWarm managed storage cost = $0.024 per GB per month 83 days warm = 103.75 TB per month 106,240 GB * $0.024 = $2,549.76
Cold storage cost on Amazon S3 = $0.022 per GB per month 365 days cold = 456.25 TB per month 467,200 GB * $0.022 = $10,278.40

The total monthly cost is $32,454.76. The hot tier costs $7,888.20, UltraWarm costs $14,288.16, and cold storage is $10,278.40. UltraWarm allows 83 days of additional retention for slightly more cost than the hot tier, which only provides 7 days. For nearly the same cost as the hot tier, the cold tier stores the primary shards for up to 1 year.

Conclusion

Amazon OpenSearch Service supports three integrated storage tiers: hot, UltraWarm, and cold storage. Based on your data retention, query latency, and budgeting requirements, you can choose the best strategy to balance cost and performance. You can also migrate data between different storage tiers. To start using these storage tiers, sign in to the AWS Management Console, use the AWS SDK, or AWS CLI, and enable the corresponding storage tier.


About the Author

Changbin Gong is a Senior Solutions Architect at Amazon Web Services (AWS). He engages with customers to create innovative solutions that address customer business problems and accelerate the adoption of AWS services. In his spare time, Changbin enjoys reading, running, and traveling.

Rich Giuli is a Principal Solutions Architect at Amazon Web Service (AWS). He works within a specialized group helping ISVs accelerate adoption of cloud services. Outside of work Rich enjoys running and playing guitar.

Insulating AWS Outposts Workloads from Amazon EC2 Instance Size, Family, and Generation Dependencies

Post Syndicated from Emma White original https://aws.amazon.com/blogs/compute/insulating-aws-outposts-workloads-from-amazon-ec2-instance-size-family-and-generation-dependencies/

This post is written by Garry Galinsky, Senior Solutions Architect.

AWS Outposts is a fully managed service that offers the same AWS infrastructure, AWS services, APIs, and tools to virtually any datacenter, co-location space, or on-premises facility for a truly consistent hybrid experience. AWS Outposts is ideal for workloads that require low-latency access to on-premises systems, local data processing, data residency, and application migration with local system interdependencies.

Unlike AWS Regions, which offer near-infinite scale, Outposts are limited by their provisioned capacity, EC2 family and generations, configured instance sizes, and availability of compute capacity that is not already consumed by other workloads. This post explains how Amazon EC2 Fleet can be used to insulate workloads running on Outposts from EC2 instance size, family, and generation dependencies, reducing the likelihood of encountering an error when launching new workloads or scaling existing ones.

Product Overview

Outposts is available as a 42U rack that can scale to create pools of on-premises compute and storage capacity. When you order an Outposts rack, you specify the quantity, family, and generation of Amazon EC2 instances to be provisioned. As of this writing, five EC2 families, each of a single generation, are available on Outposts (m5, c5, r5, g4dn, and i3en). However, in the future, more families and generations may be available, and a given Outposts rack may include a mix of families and generations. EC2 servers on Outposts are partitioned into instances of homogeneous or heterogeneous sizes (e.g., large, 2xlarge, 12xlarge) based on your workload requirements.

Workloads deployed through AWS CloudFormation or scaled through Amazon EC2 Auto Scaling generally assume that the required EC2 instance type will be available when the deployment or scaling event occurs. Although in the Region this is a reasonable assumption, the same is not true for Outposts. Whether as a result of competing workloads consuming the capacity, the Outpost having been configured with limited capacity for a given instance size, or an Outpost update resulting in instances being replaced with a newer generation, a deployment or scaling event tied to a specific instance size, family, and generation may encounter an InsufficientInstanceCapacity error (ICE). This may occur even though sufficient unused capacity of a different size, family, or generation is available.

EC2 Fleet

Amazon EC2 Fleet simplifies the provisioning of Amazon EC2 capacity across different Amazon EC2 instance types and Availability Zones, as well as across On-Demand, Amazon EC2 Reserved Instances (RI), and Amazon EC2 Spot purchase models. A single API call lets you provision capacity across EC2 instance types and purchase models in order to achieve the desired scale, performance, and cost.

An EC2 Fleet contains a configuration to launch a fleet, or group, of EC2 instances. The LaunchTemplateConfigs parameter lets multiple instance size, family, and generation combinations be specified in a priority order.

This feature is commonly used in AWS Regions to optimize fleet costs and allocations across multiple deployment strategies (reserved, on-demand, and spot), while on Outposts it can be used to eliminate the tight coupling of a workload to specific EC2 instances by specifying multiple instance families, generations, and sizes.

Launch Template Overrides

The EC2 Fleet LaunchTemplateConfigs definition describes the EC2 instances required for the fleet. A specific parameter of this definition, the Overrides, can include prioritized and/or weighted options of EC2 instances that can be launched to satisfy the workload. Let’s investigate how you can use Overrides to decouple the EC2 size, family, and generation dependencies.

Overriding EC2 Instance Size

Let’s assume our Outpost was provisioned with an m5 server. The server is the equivalent of an m5.24xlarge, which can be configured into multiple instances. For example, the server can be homogeneously provisioned into 12 x m5.2xlarge, or heterogeneously into 1 x m5.8xlarge, 3 x m5.2xlarge, 8 x m5.xlarge, and 4 x m5.large. Let’s assume the heterogeneous configuration has been applied.

Our workload requires compute capacity equivalent to an m5.4xlarge (16 vCPUs, 64 GiB memory), but that instance size is not available on the Outpost. Attempting to launch this instance would result in an InsufficientInstanceCapacity error. Instead, the following LaunchTemplateConfigs override could be used:

"Overrides": [
    {
        "InstanceType": "m5.4xlarge",
        "WeightedCapacity": 1.0,
        "Priority": 1.0
    },
    {
        "InstanceType": "m5.2xlarge",
        "WeightedCapacity": 0.5,
        "Priority": 2.0
    },
    {
        "InstanceType": "m5.8xlarge",
        "WeightedCapacity": 2.0,
        "Priority": 3.0
    }
]

The Priority describes our order of preference. Ideally, we launch a single m5.4xlarge instance, but that’s not an option. Therefore, in this case, the EC2 Fleet would move to the next priority option, an m5.2xlarge. Given that an m5.2xlarge (8 vCPUs, 32 GiB memory) offers only half of the resource of the m5.4xlarge, the override includes the WeightedCapacity parameter of 0.5, resulting in two m5.2xlarge instances launching instead of one.

Our overrides include a third, over-provisioned, and less preferable option, should the Outpost lack capacity for two m5.2xlarge instances: launch one m5.8xlarge. Operating within finite resources requires tradeoffs, and priority lets us optimize them. Note that had the launch required 2 x m5.4xlarge, only one m5.8xlarge instance would have been launched.

Overriding EC2 Instance Family

Let’s assume our Outpost was provisioned with an m5 and a c5 server, homogeneously partitioned into 12 x m5.2xlarge and 12 x c5.2xlarge instances. Our workload requires compute capacity equivalent to a c5.2xlarge instance (8 vCPUs, 16 GiB memory). As our workload scales, more instances must be launched to meet demand. If we couple our workload to c5.2xlarge, then our scaling will be blocked as soon as all 12 instances are consumed. Instead, we use the following LaunchTemplateConfigs override:

"Overrides": [
    {
        "InstanceType": "c5.2xlarge",
        "WeightedCapacity": 1.0,
        "Priority": 1.0
    },
    {
        "InstanceType": "m5.2xlarge",
        "WeightedCapacity": 1.0,
        "Priority": 2.0
    }
]

The Priority describes our order of preference. Ideally, we scale more c5.2xlarge instances, but when those are not an option EC2 Fleet would launch the next priority option, an m5.2xlarge. Here again the outcome may result in over-provisioned memory capacity (32 vs 16 GiB memory), but it’s a reasonable tradeoff in a finite resource environment.

Overriding EC2 Instance Generation

Let’s assume our Outpost was provisioned two years ago with an m5 server. Since then, m6 servers have become available, and there’s an expectation that m7 servers will be available soon. Our single-generation Outpost may unexpectedly become multi-generation if the Outpost is expanded, or if a hardware failure results in a newer generation replacement.

Coupling our workload to a specific generation could result in future scaling challenges. Instead, we use the following LaunchTemplateConfigs override:

"Overrides": [
    {
        "InstanceType": "m6.2xlarge",
        "WeightedCapacity": 1.0,
        "Priority": 1.0
    },
    {
        "InstanceType": "m5.2xlarge",
        "WeightedCapacity": 1.0,
        "Priority": 2.0
    },
    {
        "InstanceType": "m7.2xlarge",
        "WeightedCapacity": 1.0,
        "Priority": 3.0
    }

]

Note the Priority here: our preference is for the current generation m6, even though it's not yet provisioned in our Outpost. The m5 is what would be launched now, given that it's the only provisioned generation. However, we've also future-proofed our workload by including the not-yet-released m7.

Deploying an EC2 Fleet

To deploy an EC2 Fleet, you must:

  1. Create a launch template, which streamlines and standardizes EC2 instance provisioning by simplifying permission policies and enforcing best practices across your organization.
  2. Create a fleet configuration, where you set the number of instances required and specify the prioritized instance family/generation combinations.
  3. Launch your fleet (or a single EC2 instance).

These steps can be codified through AWS CloudFormation or executed through AWS Command Line Interface (CLI) commands. However, fleet definitions cannot be implemented by using the AWS Console. This example will use CLI commands to conduct these steps.

Prerequisites

To follow along with this tutorial, you should have the following prerequisites:

Create a Launch Template

Launch templates let you store launch parameters so that you do not have to specify them every time you launch an EC2 instance. A launch template can contain the Amazon Machine Image (AMI) ID, instance type, and network settings that you typically use to launch instances. For more details about launch templates, see Launch an instance from a launch template.

For this example, we will focus on these specifications:

  • AMI image ImageId
  • Subnet (the SubnetId associated with your Outpost)
  • Availability zone (the AvailabilityZone associated with your Outpost)
  • Tags

Create a launch template configuration (launch-template.json) with the following content:

{
    "ImageId": "<YOUR-AMI>",
    "NetworkInterfaces": [
        {
            "DeviceIndex": 0,
            "SubnetId": "<YOUR-OUTPOST-SUBNET>"
        }
    ],
    "Placement": {
        "AvailabilityZone": "<YOUR-OUTPOST-AZ>"
    },
    "TagSpecifications": [
        {
            "ResourceType": "instance",
            "Tags": [
                {
                    "Key": "<YOUR-TAG-KEY>",
                    "Value": "<YOUR-TAG-VALUE>"
                }
            ]
        }
    ]
}

Create your launch template using the following CLI command:

aws ec2 create-launch-template \
  --launch-template-name <YOUR-LAUNCH-TEMPLATE-NAME> \
  --launch-template-data file://launch-template.json

You should see a response like this:

{
    "LaunchTemplate": {
        "LaunchTemplateId": "lt-010654c96462292e8",
        "LaunchTemplateName": "<YOUR-LAUNCH-TEMPLATE-NAME>",
        "CreateTime": "2021-07-12T15:55:00+00:00",
        "CreatedBy": "arn:aws:sts::<YOUR-AWS-ACCOUNT>:assumed-role/<YOUR-AWS-ROLE>",
        "DefaultVersionNumber": 1,
        "LatestVersionNumber": 1
    }
}

The value for LaunchTemplateId is the identifier for your newly created launch template. You will need this value lt-010654c96462292e8 in the subsequent step.

Create a Fleet Configuration

Refer to Generate an EC2 Fleet JSON configuration file for full documentation on the EC2 Fleet configuration.

For this example, we will use a configuration that overrides a mix of instance sizes, families, and generations. The override includes five EC2 instance types:

  • m5.large and m5.2xlarge, the instance family and generation currently available on the Outpost.
  • c5.2xlarge and r5.2xlarge, alternative families that may be provisioned on the Outpost.
  • m6.2xlarge, a forthcoming family and generation not yet available for Outposts.

Create an EC2 fleet configuration (ec2-fleet.json) with the following content (note that the LaunchTemplateId was the value returned in the prior step):

{
    "TargetCapacitySpecification": {
        "TotalTargetCapacity": 1,
        "OnDemandTargetCapacity": 1,
        "SpotTargetCapacity": 0,
        "DefaultTargetCapacityType": "on-demand"
    },
    "OnDemandOptions": {
        "AllocationStrategy": "prioritized",
        "SingleInstanceType": true,
        "SingleAvailabilityZone": true,
        "MinTargetCapacity": 1
    },
    "LaunchTemplateConfigs": [
        {
            "LaunchTemplateSpecification": {
                "LaunchTemplateId": "lt-010654c96462292e8",
                "Version": "1"
            },
            "Overrides": [
                {
                    "InstanceType": "m6.2xlarge",
                    "WeightedCapacity": 1.0,
                    "Priority": 1.0
                },
                {
                    "InstanceType": "c5.2xlarge",
                    "WeightedCapacity": 1.0,
                    "Priority": 2.0
                },
                {
                    "InstanceType": "m5.large",
                    "WeightedCapacity": 0.25,
                    "Priority": 3.0
                },
                {
                    "InstanceType": "m5.2xlarge",
                    "WeightedCapacity": 1.0,
                    "Priority": 4.0
                },
                {
                    "InstanceType": "r5.2xlarge",
                    "WeightedCapacity": 1.0,
                    "Priority": 5.0
                }


            ]
        }
    ],
    "Type": "instant"
}

Launch the Single Instance Fleet

To launch the fleet, execute the following CLI command (this will launch a single instance, but a similar process can be used to launch multiple):

aws ec2 create-fleet \
  --cli-input-json file://ec2-fleet.json

You should see a response like this:

{
    "FleetId": "fleet-dc630649-5d77-60b3-2c30-09808ef8aa90",
    "Errors": [
        {
            "LaunchTemplateAndOverrides": {
                "LaunchTemplateSpecification": {
                    "LaunchTemplateId": "lt-010654c96462292e8",
                    "Version": "1"
                },
                "Overrides": {
                    "InstanceType": "m6.2xlarge",
                    "WeightedCapacity": 1.0,
                    "Priority": 1.0
                }
            },
            "Lifecycle": "on-demand",
            "ErrorCode": "InvalidParameterValue",
            "ErrorMessage": "The instance type 'm6.2xlarge' is not supported in Outpost 'arn:aws:outposts:us-west-2:111111111111:outpost/op-0000ffff0000fffff'."
        },
        {
            "LaunchTemplateAndOverrides": {
                "LaunchTemplateSpecification": {
                    "LaunchTemplateId": "lt-010654c96462292e8",
                    "Version": "1"
                },
                "Overrides": {
                    "InstanceType": "c5.2xlarge",
                    "WeightedCapacity": 1.0,
                    "Priority": 2.0
                }
            },
            "Lifecycle": "on-demand",
            "ErrorCode": "InsufficientCapacityOnOutpost",
            "ErrorMessage": "There is not enough capacity on the Outpost to launch or start the instance."
        }
    ],
    "Instances": [
        {
            "LaunchTemplateAndOverrides": {
                "LaunchTemplateSpecification": {
                    "LaunchTemplateId": "lt-010654c96462292e8",
                    "Version": "1"
                },
                "Overrides": {
                    "InstanceType": "m5.large",
                    "WeightedCapacity": 0.25,
                    "Priority": 3.0
                }
            },
            "Lifecycle": "on-demand",
            "InstanceIds": [
                "i-03d6323c8a1df8008",
                "i-0f62593c8d228dba5",
                "i-0ae25baae1f621c15",
                "i-0af7e688d0460a60a"
            ],
            "InstanceType": "m5.large"
        }
    ]
}

Results

Navigate to the EC2 Console where you will find new instances running on your Outpost. An example is shown in the following screenshot:

EC2 running instances, AWS console, network view, filtered by tag

Although multiple instance size, family, and generation combinations were included in the Overrides, only the m5.large was available on the Outpost. Instead of launching one m6.2xlarge, four m5.large instances were launched to compensate for their lower WeightedCapacity. From the create-fleet response, the overrides were clearly evaluated in priority order, with the error messages explaining why the top two overrides were ignored.

Clean up

AWS CLI EC2 commands can be used to create fleets but can also be used to delete them.

To clean up the resources created in this tutorial:

  1. Note the FleetId values returned by the create-fleet command.
  2. Run the following command for each fleet created:

aws ec2 delete-fleets \
  --fleet-ids <YOUR-FLEET-ID> \
  --terminate-instances

You should see a response like this:

{
    "SuccessfulFleetDeletions": [
        {
            "CurrentFleetState": "deleted_terminating",
            "PreviousFleetState": "active",
            "FleetId": "fleet-dc630649-5d77-60b3-2c30-09808ef8aa90"
        }
    ],
    "UnsuccessfulFleetDeletions": []
}

  3. Note the launch-template-name used in the create-launch-template command, and delete the launch template:

aws ec2 delete-launch-template \
  --launch-template-name <YOUR-LAUNCH-TEMPLATE-NAME>

  4. Clean up any resources you created for the prerequisites.

Conclusion

This post discussed how EC2 Fleet can be used to decouple the availability of specific EC2 instance sizes, families, and generation from the ability to launch or scale workloads. On an Outpost provisioned with multiple families of EC2 instances (say m5 and c5) and different sizes (say m5.large and m5.2xlarge), EC2 Fleet can be used to satisfy a workload launch request even if the capacity of the preferred instance size, family, or generation is unavailable.

To learn more about AWS Outposts, check out the Outposts product page. To see a full list of pre-defined Outposts configurations, visit the Outposts pricing page

Building Resilient Well-Architected Workloads Using AWS Resilience Hub

Post Syndicated from Seth Eliot original https://aws.amazon.com/blogs/architecture/building-resilient-well-architected-workloads-using-aws-resilience-hub/

AWS Resilience Hub is a new service that helps you understand and improve the resiliency of your workloads using AWS Well-Architected best practices. As the lead for the Reliability Pillar of AWS Well-Architected, I am eager to share with you how you can use Resilience Hub to ensure your workload architecture is as reliable as you need.

In this blog post, I’ll show you how to use Resilience Hub to assess and improve the resiliency of your architecture based on its recommendations. I’ll start with a single Availability Zone (AZ) architecture, and evolve the architecture using the resiliency recommendations.

Single AZ architecture

Figure 1 shows the single AZ architecture I’m going to start with and assess using Resilience Hub. This simple web server runs on Amazon Elastic Compute Cloud (Amazon EC2). It serves a static web page stored in an Amazon Simple Storage Service (Amazon S3) bucket, and then records web site statistics in a MySQL Amazon Relational Database Service (Amazon RDS) database. A NAT gateway is also deployed so the EC2 servers can make calls out to the internet. When I add my application to Resilience Hub, it will discover my application structure. Then I can use it to assess my application’s resiliency per the instructions in the Measure and Improve Your Application Resilience with AWS Resilience Hub blog post.

Single AZ architecture

Figure 1. Single AZ architecture

Even with only a single Amazon EC2 instance, it is still useful to include an Elastic Load Balancer. This lets you configure health checks performed against the EC2 instance. It also makes it easier to add more EC2 instances later. The Amazon EC2 Auto Scaling group helps improve resiliency—if the EC2 instance fails its health check, the Amazon EC2 Auto Scaling group will replace it.

Resilience Hub assessment results for the single AZ architecture

Figure 2 shows the results from Resilience Hub, and it's showing a lot of red flags! This architecture does not meet my required RTO (Recovery Time Objective) and RPO (Recovery Point Objective) goals for resiliency, and it is unrecoverable for all failure types. As Figure 2 shows, Resilience Hub assesses several failure types, including failures in the workload application (bad code or data), infrastructure (component failure), or individual AZ availability. AZs within a Region are independent of each other; even if one AZ experiences issues, the other AZs remain available. The single AZ architecture does not take advantage of that.

Resilience Hub assessment of the single AZ architecture

Figure 2. Resilience Hub assessment of the single AZ architecture

Figure 3 shows the component-level assessment of resiliency where each component corresponds to a part of the single AZ architecture. The results show that the S3 bucket does well. S3 is resilient to AZ failures and stores data across multiple AZs, which results in high data durability.

However, having a single RDS instance means that if the instance fails (infrastructure), or the AZ containing the instance fails (AZ) then it cannot operate (unrecoverable RTO), and the data will be lost (unrecoverable RPO). Similarly, deploying only one NAT gateway leaves the architecture vulnerable if the AZ experiences issues, so it shows as unrecoverable for AZ disruptions. But, because the NAT gateway is a fully managed service, there is no hardware to manage, and therefore it shows as resilient (0s RTO) for infrastructure issues.

Resilience Hub component-level assessment of the single AZ architecture against cloud infrastructure failures

Figure 3. Resilience Hub component-level assessment of the single AZ architecture against cloud infrastructure failures

To improve the single AZ architecture’s resiliency, Resilience Hub recommends enabling multi-AZ for the RDS instance. This will set up a standby database instance in another AZ. For the NAT gateway, it suggests “Add NAT Gateways in multiple AZs. (i.e., every AZ you have resources in).” I’ll implement these suggestions in the next section.
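
As a sketch of what those two recommendations look like outside the console, you could convert the existing RDS instance to Multi-AZ and add a NAT gateway in the second AZ using the AWS CLI; the identifiers are placeholders, and the Elastic IP must already be allocated.

# Add a standby database instance in another AZ
aws rds modify-db-instance \
  --db-instance-identifier <YOUR-DB-IDENTIFIER> \
  --multi-az \
  --apply-immediately

# Add a NAT gateway in the public subnet of the second AZ
aws ec2 create-nat-gateway \
  --subnet-id <YOUR-SECOND-PUBLIC-SUBNET-ID> \
  --allocation-id <YOUR-ELASTIC-IP-ALLOCATION-ID>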

Multi-AZ architecture

Figure 4 shows the multi-AZ architecture that I built based on Resilience Hub’s recommendations, which use the following Well-Architected Reliability best practices:

Well-Architected Best Practice Modification to Architecture
Deploy the workload to multiple locations I set up RDS instances, NAT gateways, and EC2 instances distributed across AZs.
Fail over to healthy resources If one EC2 instance fails, the Elastic Load Balancer will fail over and send traffic to the remaining healthy ones. If the primary RDS instance fails, the standby will be promoted to be the new primary.
Automate healing on all layers The Amazon EC2 Auto Scaling group will replace faulty EC2 instances, and the RDS failover is automatic.
Multi-AZ architecture

Figure 4. Multi-AZ architecture

In the next section, I’ll send this new architecture through Resilience Hub to check how much it has improved.

Resilience Hub assessment results for the multi-AZ architecture

As shown in Figure 5, the architecture still has some problems, but it’s looking much better! Resilience Hub has highlighted application failure as the possible source of resilience issues, so let’s dive into application RTO and RPO.

Resilience Hub assessment of the multi-AZ architecture

Figure 5. Resilience Hub assessment of the multi-AZ architecture

Resilience Hub component-level assessment of the multi-AZ architecture against customer application failures

Figure 6. Resilience Hub component-level assessment of the multi-AZ architecture against customer application failures

By making RDS multi-AZ, data is replicated to a standby instance. If your infrastructure or AZ fails, the system will fail over to the standby instance.

Application failures are different. If the data is corrupted or deleted due to a bug, accident, or unauthorized action, then that deletion or corruption will be replicated to the standby. The standby does not protect you in this case. Similarly, with S3, I need to protect against unwanted deletion or corruption by the application, so let’s see what Resilience Hub recommends.

As shown in Figure 7, for RDS, I can enable instance backup, with a suggested retention period of 7 days. By doing this, I can achieve an RPO of 5 minutes because using automatic backup in RDS allows me to restore a DB instance to any specific time during the backup retention period. The latest restorable time for a DB instance is typically within 5 minutes of the current time. The RTO is longer because it takes time to restore a new RDS instance from backup.

Resilience Hub suggestions to improve RDS resiliency in the architecture

Figure 7. Resilience Hub suggestions to improve RDS resiliency in the architecture

As shown in Figure 8, for S3, it recommends I enable versioning. This feature allows me to roll back any change to an object (including deletion) to a last known good state. This means zero data loss and a 0s RPO.

Resilience Hub suggestions to improve S3 resiliency in the architecture

Figure 8. Resilience Hub suggestions to improve S3 resiliency in the architecture

Let’s implement these suggestions!

Multi-AZ architecture with data backup

Figure 9 shows the multi-AZ architecture that incorporates data backup features.

Multi-AZ architecture incorporating data backup features

Figure 9. Multi-AZ architecture incorporating data backup features

Resilience Hub has recommended more revisions to this architecture based on AWS Well-Architected Reliability best practices:

Well-Architected Best Practice Modification to Architecture
Identify and back up all data that needs to be backed up The data in my RDS database and S3 bucket are backed up.
Perform data backup automatically RDS backups are automatic. S3 object versioning is also automatic.

Resilience Hub final assessment results

And…that’s it! You’ll see in Figure 10 that I’ve achieved the goals I set for RTO and RPO and my architecture is more resilient and reliable.

Resilience Hub assessment of the multi-AZ architecture after incorporating data backup features

Figure 10. Resilience Hub assessment of the multi-AZ architecture after incorporating data backup features

Resilience Hub has even more recommendations for things I could improve. For example, if I switch from MySQL RDS to Amazon Aurora I can use backtracking to reduce the 1 hour 40 minute RTO that it takes to restore an RDS database backup. Backtracking “rewinds” the DB cluster to the time you specify, so you can restore a last known good state without needing to recreate the entire database from backup, which saves time and reduces RTO.

Conclusion

To improve the resiliency of your workloads, you need to apply the right best practices. This blog post shows you how Resilience Hub can help you assess and improve a workload with poor resiliency, identify areas for improvement, implement best practices, and evaluate how those practices meet your resiliency goals.

Optimizing Apache Flink on Amazon EKS using Amazon EC2 Spot Instances

Post Syndicated from Emma White original https://aws.amazon.com/blogs/compute/optimizing-apache-flink-on-amazon-eks-using-amazon-ec2-spot-instances/

This post is written by Kinnar Sen, Senior EC2 Spot Specialist Solutions Architect

Apache Flink is a distributed data processing engine for stateful computations for both batch and stream data sources. Flink supports event time semantics for out-of-order events, exactly-once semantics, backpressure control, and optimized APIs. Flink has connectors for third-party data sources and AWS Services, such as Apache Kafka, Apache NiFi, Amazon Kinesis, and Amazon MSK. Flink can be used for Event Driven (Fraud Detection), Data Analytics (Ad-Hoc Analysis), and Data Pipeline (Continuous ETL) applications. Amazon Elastic Kubernetes Service (Amazon EKS) is the chosen deployment option for many AWS customers for Big Data frameworks such as Apache Spark and Apache Flink. Flink has native integration with Kubernetes allowing direct deployment and dynamic resource allocation.

In this post, I illustrate the deployment of scalable, highly available (HA), resilient, and cost optimized Flink application using Kubernetes via Amazon EKS and Amazon EC2 Spot Instances (Spot). Learn how to save money on big data streaming workloads by implementing this solution.

Overview

Amazon EC2 Spot Instances

Amazon EC2 Spot Instances let you take advantage of spare EC2 capacity in the AWS Cloud and are available at up to a 90% discount compared to On-Demand Instances. Spot Instances receive a two-minute warning when they are about to be reclaimed by Amazon EC2, and there are many graceful ways to handle the interruption. Recently, the EC2 Instance rebalance recommendation was added to send proactive notifications when a Spot Instance is at elevated risk of interruption. Spot Instances are a great way to scale up and increase the throughput of Big Data workloads, and they have been adopted by many customers.

Apache Flink and Kubernetes

Apache Flink is an adaptable framework that allows multiple deployment options, one of them being Kubernetes. The Flink framework has a few key building blocks.

  • The Job Client submits the job, in the form of a JobGraph, to the Job Manager.
  • The Job Manager plays the role of central work coordinator and distributes the job to the Task Managers.
  • Task Managers are the worker components, which run the operators for sources, transformations, and sinks.
  • Optional external components, such as a Resource Provider, HA Service Provider, Application Data Sources, and Sinks, vary with the deployment mode and options.

Image shows Flink application deployment architecture with Job Manager, Task Manager, Scheduler, Data Flow Graph, and client.

Flink supports different deployment (Resource Provider) modes when running on Kubernetes. In this blog we use the Standalone Deployment mode, as we just want to showcase the functionality. However, we recommend that first-time users deploy Flink on Kubernetes using the Native Kubernetes Deployment.

Flink can be run in different modes such as Session, Application, and Per-Job. The modes differ in cluster lifecycle, resource isolation, and where the application's main() method is executed. On Kubernetes, Flink can run jobs in Application and Session modes only.

  • Application Mode: This is a lightweight and scalable way to submit an application to Flink and is the preferred way to launch applications because it provides better resource isolation. Resource isolation is achieved by running a cluster per job; once the application shuts down, all the Flink components are cleaned up.
  • Session Mode: This is a long-running Kubernetes deployment of Flink. Multiple applications can be launched on one cluster, and the applications compete for resources; multiple jobs may run on a TaskManager in parallel. Its main advantage is that it saves the time of spinning up a new Flink cluster for each new job; however, if one of the Task Managers fails, it may impact all the jobs running on it.

Amazon EKS

Amazon EKS is a fully managed Kubernetes service. EKS supports creating and managing Spot Instances using Amazon EKS managed node groups following Spot best practices. This enables you to take advantage of the steep savings and scale that Spot Instances provide for interruptible workloads. EKS-managed node groups require less operational effort compared to using self-managed nodes. You can learn more in the blog “Amazon EKS now supports provisioning and managing EC2 Spot Instances in managed node groups.”

Apache Flink and Spot

Big data frameworks like Spark and Flink are distributed systems built to manage and process high volumes of data. Because they are designed for failure, they can run on machines with different configurations and are inherently resilient and flexible. Spot Instances can optimize runtimes by increasing throughput while spending the same amount (or less). Flink can tolerate interruptions by using its restart and failover strategies.

Fault Tolerance

Fault tolerance is implemented in Flink with the help of checkpointing the state. Checkpoints allow Flink to recover state and positions in the streams. There are two prerequisites for checkpointing: a persistent data source that can replay data (such as Apache Kafka or Amazon Kinesis), and persistent distributed storage to store state (such as Amazon Simple Storage Service (Amazon S3) or HDFS).
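As an illustration only, the following sketch shows the kind of checkpoint-related settings you might add to the Flink configuration (flink-conf.yaml, or the ConfigMap that provides it in a Kubernetes deployment). The key names are standard Flink options, but the interval, state backend, and bucket path are assumptions you should adapt to your workload:

cat <<'EOF' >> conf/flink-conf.yaml
# Take a checkpoint every 60 seconds and persist state durably in Amazon S3
execution.checkpointing.interval: 60s
state.backend: filesystem
state.checkpoints.dir: s3://<<S3_BUCKET>>/checkpoints
EOF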

Cost Optimization

The Job Manager and Task Managers are the key building blocks of Flink. The Task Managers are the compute-intensive part and the Job Manager is the orchestrator. In this solution, we run the Task Managers on Spot Instances and the Job Manager on On-Demand Instances.

Scaling

Flink supports elastic scaling via Reactive Mode: Task Managers can be added or removed based on metrics monitored by an external service such as the Horizontal Pod Autoscaler (HPA). When scaling up, new pods are added; if the cluster has free resources they are scheduled, otherwise they go into a pending state. The Cluster Autoscaler (CA) detects pods in the pending state, and new nodes are added by EC2 Auto Scaling. This works well with Spot Instances because it implements elastic scaling with higher throughput in a cost-optimized way.
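For illustration, assuming the TaskManagers run as a Kubernetes Deployment named flink-taskmanager (the actual name depends on the manifests you deploy later in this post), an HPA with an assumed CPU target and replica bounds could be created with a single command:

kubectl autoscale deployment flink-taskmanager --min=1 --max=10 --cpu-percent=60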

Tutorial: Running Flink applications in a cost optimized way

In this tutorial, I review the steps that help you launch cost-optimized and resilient Flink workloads running on EKS in Application mode. The streaming application reads dummy stock ticker prices sent to an Amazon Kinesis Data Stream by the Amazon Kinesis Data Generator, determines the highest price within a pre-defined window, and writes the output to files in Amazon S3.

Image shows Flink application pipeline with data flowing from Amazon Kinesis Data Generator to Kinesis Data Stream, processed in Apache Flink and output being written in Amazon S3

The configuration files can be found in this GitHub location. To run the workload on Kubernetes, make sure you have the eksctl and kubectl command line utilities installed on your computer or on an AWS Cloud9 environment. You can run this by using an AWS IAM user or role that has the AdministratorAccess policy attached to it, or check the minimum required permissions for using eksctl. The Spot node groups in the Amazon EKS cluster can be launched in either a managed or a self-managed way; in this post I use an EKS managed node group for Spot Instances.
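To confirm that the command line utilities are available before you begin, you can check their versions:

eksctl version
kubectl version --client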

Steps

When we deploy Flink in Application Mode, it runs as a single application and the cluster is exclusive to that job. For that purpose, we bundle the user code into the Flink image and upload it to Amazon Elastic Container Registry (Amazon ECR). Amazon ECR is a fully managed container registry that makes it easy to store, manage, share, and deploy your container images and artifacts anywhere.

1. Build the Amazon ECR Image

  • Log in using the following command, and don't forget to replace AWS_REGION and ACCOUNT_ID with your details.

aws ecr get-login-password --region ${AWS_REGION} | docker login --username AWS --password-stdin ${ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com

  • Create a repository

aws ecr create-repository --repository-name flink-demo --image-scanning-configuration scanOnPush=true --region ${AWS_REGION}

  • Build the Docker image:

Download the Dockerfile. I am using a multistage Docker build here. The sample code is from GitHub's Amazon Kinesis Data Analytics Java examples. I modified the code to enable checkpointing and change the sliding window interval. Build and push the Docker image using the following instructions.

docker build --tag flink-demo .

  • Tag and Push your image to Amazon ECR

docker tag flink-demo:latest ${ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/flink-demo:latest
docker push ${ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/flink-demo:latest

2. Create Amazon S3/Amazon Kinesis Access Policy

First, I must create an access policy to allow the Flink application to read/write from Amazon S3 and to read the Kinesis data stream. Download the Amazon S3 policy file from here and modify the <<output folder>> value to an Amazon S3 bucket, which you have to create.
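If you have not created the bucket yet, you can create it from the CLI; the bucket name below is a placeholder:

aws s3 mb s3://<<S3_BUCKET>> --region ${AWS_REGION}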

  • Run the following to create the policy. Note the ARN.

aws iam create-policy --policy-name flink-demo-policy --policy-document file://flink-demo-policy.json

3. Cluster and node groups deployment

  • Create an EKS cluster using the following command:

eksctl create cluster --name=flink-demo --node-private-networking --without-nodegroup --asg-access --region=<<AWS Region>>

The cluster takes approximately 15 minutes to launch.

  • Create the node groups using the nodeGroup config file. I am using multiple node groups of different sizes to adopt the Spot best practice of diversification. Replace the <<Policy ARN>> string with the ARN from the previous step.

eksctl create nodegroup -f managedNodeGroups.yml

  • Download the Cluster Autoscaler manifest and edit it to add the cluster name (flink-demo), as shown after the download command below.

curl -LO https://raw.githubusercontent.com/kubernetes/autoscaler/master/cluster-autoscaler/cloudprovider/aws/examples/cluster-autoscaler-autodiscover.yaml
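One way to make the edit is with sed. At the time of writing, the example manifest uses the placeholder <YOUR CLUSTER NAME>; adjust the pattern if your copy differs:

sed -i 's/<YOUR CLUSTER NAME>/flink-demo/g' cluster-autoscaler-autodiscover.yaml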

4. Install the Cluster Autoscaler using the following command:

kubectl apply -f cluster-autoscaler-autodiscover.yaml

  • Using EKS managed node groups requires significantly less operational effort compared to using self-managed node groups and enables:
    • Automatic enforcement of Spot best practices.
    • Spot Instance lifecycle management.
    • Automatic labeling of pods.
  • eksctl has integrated amazon-ec2-instance-selector to enable automatic selection of instances based on the criteria passed. This has multiple benefits:
    • Instance diversification is implemented by enabling multiple instance types in the node group, which works well with the Cluster Autoscaler.
    • It reduces the manual effort of selecting the instances.
  • We can create node group manifests using the --dry-run flag and then create node groups from those manifests.

eksctl create cluster --name flink-demo --instance-selector-vcpus=2 --instance-selector-memory=4 --dry-run

eksctl create nodegroup -f managedNodeGroups.yml

5. Create service accounts for Flink

$ kubectl create serviceaccount flink-service-account
$ kubectl create clusterrolebinding flink-role-binding-flink --clusterrole=edit --serviceaccount=default:flink-service-account

6. Deploy Flink

The install folder in the repository has all the YAML files required to deploy a standalone Flink cluster. Run the install.sh file. This deploys the cluster with a JobManager, a pool of TaskManagers, and a Service exposing the JobManager's ports.

  • This is a High Availability (HA) deployment of Flink that uses the Kubernetes high availability service.
  • The JobManager runs on On-Demand Instances and the TaskManagers run on Spot Instances. Because the cluster is launched in Application Mode, if a node is interrupted only one job is restarted.
  • Autoscaling is enabled through Reactive Mode. The Horizontal Pod Autoscaler monitors the CPU load and scales accordingly.
  • Checkpointing is enabled, which allows Flink to save state and be fault tolerant.

Image shows the Flink dashboard highlighting checkpoints for a job

7. Create an Amazon Kinesis data stream and send dummy data

Log in to the AWS Management Console and create a Kinesis data stream named ‘ExampleInputStream’. The Kinesis Data Generator is used to send data to the data stream; the template for the dummy data can be found here. Once the generator starts sending data, the Flink application starts processing it.
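Alternatively, you can create the data stream from the CLI; the shard count here is an assumption sized for this demo:

aws kinesis create-stream --stream-name ExampleInputStream --shard-count 1 --region ${AWS_REGION}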

Image shows Amazon Kinesis Data Generator console sending data to the Kinesis Data Stream

Observations

Spot Interruptions

If there is an interruption, the Flink application is restarted using the checkpointed data. The JobManager restores the job, as highlighted in the following log. The node is replaced automatically by the managed node group.

Image shows logs from a Flink job highlighting job restart using checkpoints.

You can observe the graceful restart in the Flink UI.

Image shows the Flink dashboard highlighting job restart after failure.

AutoScaling

You can observe the elastic scaling in the logs. The number of TaskManagers shown in the Flink UI also reflects the scaling state.

Image shows kubectl output showing status of HPA during scale-out

Cleanup

If you are trying out the tutorial, run the following steps to make sure that you don't incur unwanted costs.

  • Run the delete.sh file.
  • Delete the EKS cluster and the node groups:
    • eksctl delete cluster --name flink-demo
  • Delete the Amazon S3 Access Policy:
    • aws iam delete-policy --policy-arn <<POLICY ARN>>
  • Delete the Amazon S3 Bucket:
    • aws s3 rb --force s3://<<S3_BUCKET>>
  • Delete the CloudFormation stack related to Kinesis Data Generator named ‘Kinesis-Data-Generator-Cognito-User’
  • Delete the Kinesis Data Stream.

Conclusion

In this blog post, I demonstrated how you can run Flink workloads on a Kubernetes cluster using Spot Instances, achieving scalability, resilience, and cost optimization. To cost optimize your Flink-based big data workloads, consider using Amazon EKS and Spot Instances.

Deep Dive on Amazon EC2 VT1 Instances

Post Syndicated from Rick Armstrong original https://aws.amazon.com/blogs/compute/deep-dive-on-amazon-ec2-vt1-instances/

This post is written by:  Amr Ragab, Senior Solutions Architect; Bryan Samis, Principal Elemental SSA; Leif Reinert, Senior Product Manager

Introduction

We at AWS are excited to announce that new Amazon Elastic Compute Cloud (Amazon EC2) VT1 instances are now generally available in the US East (N. Virginia), US West (Oregon), Europe (Ireland), and Asia Pacific (Tokyo) Regions. This instance family provides dedicated video transcoding hardware in Amazon EC2 and offers up to 30% lower cost per stream compared to G4dn GPU-based instances, or 60% lower cost per stream compared to C5 CPU-based instances. These instances are powered by Xilinx Alveo U30 media accelerators, with up to eight U30 media accelerators per instance in the vt1.24xlarge. Each U30 accelerator comes with two XCU30 Zynq UltraScale+ SoCs with H.264/H.265 Video Codec Unit (VCU) cores, totaling 16 addressable devices in the vt1.24xlarge instance.

Each U30 accelerator card comes with two XCU30 Zynq UltraScale+ SoCs

Currently, the VT1 family consists of three sizes, as summarized in the following table:

Instance Type vCPUs RAM (GiB) U30 accelerator cards Addressable XCU30 SoCs
vt1.3xlarge 12 24 1 2
vt1.6xlarge 24 48 2 4
vt1.24xlarge 96 192 8 16

Each addressable XCU30 SoC device supports:

  • Codecs: MPEG-4 Part 10 (H.264), MPEG-H Part 2 (HEVC/H.265)
  • Resolutions: 128×128 to 3840×2160
  • Flexible rate control: Constant Bitrate (CBR), Variable Bitrate (VBR), and Constant Quantization Parameter (QP)
  • Frame scan types: Progressive H.264/H.265
  • Input color space: YCbCr 4:2:0, 8-bit per color channel

The following table outlines the number of transcoding streams per addressable device and instance type:

Transcoding Each XCU30 SoC vt1.3xlarge vt1.6xlarge vt1.24xlarge
3840x2160p60 1 2 4 16
3840x2160p30 2 4 8 32
1920x1080p60 4 8 16 64
1920x1080p30 8 16 32 128
1280x720p30 16 32 64 256
960x540p30 24 48 96 384

Each XCU30 SoC can support the following stream densities: 1x 4kp60, 2x 4kp30, 4x 1080p60, 8x 1080p30, 16x 720p30

Customers with applications such as live broadcast, video conferencing and just-in-time transcoding can now benefit from a dedicated instance family devoted to video encoding and decoding with rescaling optimizations at the lowest cost per stream. This dedicated instance family lets customers run batch, real-time, and faster than real-time transcoding workloads.

Deployment and Quick Start

To get started, you can launch a VT1 instance with the prebuilt VT1 Amazon Machine Images (AMIs) available on the AWS Marketplace. However, if you have AMI hardening requirements or other requirements that require you to install the Xilinx software stack yourself, you can reference the Xilinx Video SDK documentation for VT1.

The software stack consists of a driver suite that combines the driver stack with management and client tools. The following terminology is used with this instance family:

  • XRT – Xilinx Runtime Library
  • XRM – Xilinx Runtime Management Library
  • XCDR – Xilinx Video Transcoding SDK
  • XMA – Xilinx Media Accelerator API and Samples
  • XOCL – Xilinx driver (xocl)

To run workloads directly on Amazon EC2 instances, you must load both the XRT and XRM stacks. These are conveniently provided by loading the XCDR environment. To load the devices, run the following:

source /opt/xilinx/xcdr/setup.sh

With the output:

-----Source Xilinx U30 setup files-----
XILINX_XRT        : /opt/xilinx/xrt
PATH              : /opt/xilinx/xrt/bin:/usr/local/sbin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin
LD_LIBRARY_PATH   : /opt/xilinx/xrt/lib:
PYTHONPATH        : /opt/xilinx/xrt/python:
XILINX_XRM      : /opt/xilinx/xrm
PATH            : /opt/xilinx/xrm/bin:/opt/xilinx/xrt/bin:/usr/local/sbin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin
LD_LIBRARY_PATH : /opt/xilinx/xrm/lib:/opt/xilinx/xrt/lib:
Number of U30 devices found : 16  

Running Containerized Workloads on Amazon ECS and Amazon EKS

To help build AMIs for Amazon Linux 2, Ubuntu 18/20, Amazon ECS, and Amazon Elastic Kubernetes Service (Amazon EKS), we have provided a GitHub project that simplifies the build process using Packer:

https://github.com/aws-samples/aws-vt-baseami-pipeline

At the time of writing, Xilinx does not have an officially supported container runtime. However, it is possible to pass the specific devices in the docker run ... stanza. To set up this environment, download the xilinx_aws_docker_setup.sh script. The following example shows the output for a vt1.24xlarge:

[ec2-user@ip-10-0-254-236 ~]$ source xilinx_aws_docker_setup.sh
XILINX_XRT : /opt/xilinx/xrt
PATH : /opt/xilinx/xrt/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/ec2-user/.local/bin:/home/ec2-user/bin
LD_LIBRARY_PATH  : /opt/xilinx/xrt/lib:
PYTHONPATH : /opt/xilinx/xrt/python:
XILINX_AWS_DOCKER_DEVICES : --device=/dev/dri/renderD128:/dev/dri/renderD128
--device=/dev/dri/renderD129:/dev/dri/renderD129
--device=/dev/dri/renderD130:/dev/dri/renderD130
--device=/dev/dri/renderD131:/dev/dri/renderD131
--device=/dev/dri/renderD132:/dev/dri/renderD132
--device=/dev/dri/renderD133:/dev/dri/renderD133
--device=/dev/dri/renderD134:/dev/dri/renderD134
--device=/dev/dri/renderD135:/dev/dri/renderD135
--device=/dev/dri/renderD136:/dev/dri/renderD136
--device=/dev/dri/renderD137:/dev/dri/renderD137
--device=/dev/dri/renderD138:/dev/dri/renderD138
--device=/dev/dri/renderD139:/dev/dri/renderD139
--device=/dev/dri/renderD140:/dev/dri/renderD140
--device=/dev/dri/renderD141:/dev/dri/renderD141
--device=/dev/dri/renderD142:/dev/dri/renderD142
--device=/dev/dri/renderD143:/dev/dri/renderD143
--mount type=bind,source=/sys/bus/pci/devices/0000:00:1d.0,target=/sys/bus/pci/devices/0000:00:1d.0 --mount type=bind,source=/sys/bus/pci/devices/0000:00:1e.0,target=/sys/bus/pci/devices/0000:00:1e.0

Once the devices have been enumerated, start the workload by running:

docker run -it $XILINX_AWS_DOCKER_DEVICES <image:tag>

Amazon EKS Setup

To launch an EKS cluster with VT1 instances, create the AMI using the scripts provided in the repository referenced earlier:

https://github.com/aws-samples/aws-vt-baseami-pipeline

Once the AMI is created, launch an EKS cluster:

eksctl create cluster --region us-east-1 --without-nodegroup --version 1.19 \
       --zones us-east-1c,us-east-1d

Once the cluster is created, substitute the values for the cluster name, subnets, and AMI IDs in the following template.

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: <cluster-name>
  region: us-east-1

vpc:
  id: vpc-285eb355
  subnets:
    public:
      endpoint-one:
        id: subnet-5163b237
      endpoint-two:
        id: subnet-baff22e5

managedNodeGroups:
  - name: vt1-ng-1d
    instanceType: vt1.3xlarge
    volumeSize: 200
    instancePrefix: vt1-ng-1d-worker
    ami: <ami-id>
    iam:
      withAddonPolicies:
        imageBuilder: true
        autoScaler: true
        ebs: true
        fsx: true
        cloudWatch: true
    ssh:
      allow: true
      publicKeyName: amrragab-aws
    subnets:
    - endpoint-one
    minSize: 1
    desiredCapacity: 1
    maxSize: 4
    overrideBootstrapCommand: |
      #!/bin/bash
      /etc/eks/bootstrap.sh <cluster-name>

Save this file as vt1-managed-ng.yaml, and then deploy the node group:

eksctl create nodegroup -f vt1-managed-ng.yaml

Once deployed, apply the FPGA U30 device plugin. The daemonset container is available on the Amazon Elastic Container Registry (ECR) public gallery. You can also access the daemonset deployment file.

kubectl apply -f xilinx-device-plugin.yml

Confirm that the Xilinx U30 device(s) are seen by the Kubernetes API server and are allocatable for your jobs.
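One way to check is to describe a node and review its Capacity and Allocatable sections (substitute your own node name); the excerpt that follows shows the kind of output to expect:

kubectl describe node <node-name>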

Capacity:
  attachable-volumes-aws-ebs:                  39
  cpu:                                         12
  ephemeral-storage:                           209702892Ki
  hugepages-1Gi:                               0
  hugepages-2Mi:                               0
  memory:                                      23079768Ki
  pods:                                        15
  xilinx.com/fpga-xilinx_u30_gen3x4_base_1-0:  1
Allocatable:
  attachable-volumes-aws-ebs:                  39
  cpu:                                         11900m
  ephemeral-storage:                           192188443124
  hugepages-1Gi:                               0
  hugepages-2Mi:                               0
  memory:                                      22547288Ki
  pods:                                        15
  xilinx.com/fpga-xilinx_u30_gen3x4_base_1-0:  1

Video Quality Analysis

The video quality produced by the U30 is roughly equivalent to the “faster” preset in the x264 and x265 codecs, or the “p4” preset using the nvenc encoder on G4dn. For example, in the following test we encoded the same UHD (4K) video at multiple bitrates into H.264, and then compared Video Multimethod Assessment Fusion (VMAF) scores:

Plotting VMAF and bitrate we see comparable quality across x264 faster, h264_nvenc p4 and u30

Stream Density and Encoding Performance

To illustrate the VT1 instance family stream density and encoding performance, let’s look at the smallest instance, the vt1.3xlarge, which can encode up to eight simultaneous 1080p60 streams into H.264. We chose a set of similar instances at a close price point, and then compared how many 1080p60 H264 streams they could encode simultaneously to an equivalent quality:

Instance Codec us-east-1 Hourly Price* 1080p60 Streams / Instance Hourly Cost / Stream
c5.4xlarge x264 $0.680 2 $0.340
c6g.4xlarge x264 $0.544 2 $0.272
c5a.4xlarge x264 $0.616 3 $0.205
g4dn.xlarge nvenc $0.526 4 $0.132
vt1.3xlarge xma $0.650 8 $0.081

* Prices accurate as of the publishing date of this article.

As you can see, the vt1.3xlarge instance can encode four times as many streams as the c5.4xlarge, and at a lower hourly cost. It can also encode twice as many streams as a g4dn.xlarge instance. In this example, that yields a cost-per-stream reduction of up to 76% compared to c5.4xlarge, and up to 39% compared to g4dn.xlarge.

Faster than Real-time Transcoding

In addition to encoding multiple live streams in parallel, VT1 instances can also be utilized to encode file-based content at faster-than-real-time performance. This can be done by over-provisioning resources on a single XCU30 device so that more resources are dedicated to transcoding than are necessary to maintain real-time.

For example, running the following command (specifying -cores 4) will utilize all resources on a single XCU30 device, and yield an encode speed of approximately 177 FPS, or 2.95 times faster than real-time for a 60 FPS source:

$ ffmpeg -c:v mpsoc_vcu_h264 -i input_1920x1080p60_H264_8Mbps_AAC_Stereo.mp4 -f mp4 -b:v 5M -c:v mpsoc_vcu_h264 -cores 4 -slices 4 -y /tmp/out.mp4
frame=43092 fps=177 q=-0.0 Lsize= 402721kB time=00:11:58.92 bitrate=4588.9kbits/s speed=2.95x

To maximize FPS further, utilize the “split and stitch” operation to break the input file into segments, and then transcode those in parallel across multiple XCU30 chips or even multiple U30 cards in an instance. Then, recombine the file at the output. For more information, see the Xilinx Video SDK documentation on Faster than Real-time transcoding.

Using the provided example script on the same 12-minute source file as the preceding example on a vt1.3xlarge, we can utilize both addressable devices on the U30 card at once in order to yield an effective encode speed of 512 fps, or 8.5 times faster than real-time.

$ python3 13_ffmpeg_transcode_only_split_stitch.py -s input_1920x1080p60_H264_8Mbps_AAC_Stereo.mp4 -d /tmp/out.mp4 -i h264 -o h264 -b 5.0

There are 1 cards, 2 chips in the system
...
Time from start to completion : 84 seconds (1 minute and 24 seconds)
This clip was processed 8.5 times faster than realtime

This clip was effectively processed at 512.34 FPS

Conclusion

We are excited to launch VT1, our first EC2 instance with dedicated hardware acceleration for video transcoding, which provides up to 30% lower cost per stream compared to G4dn instances or 60% lower cost per stream compared to C5 instances. With up to eight Xilinx Alveo U30 media accelerators, you can parallelize up to 16 4K UHD streams for batch, real-time, and faster than real-time transcoding. If you have any questions, reach out to your account team. Now, go power up your video transcoding workloads with Amazon EC2 VT1 instances.

Cybersecurity Awareness Month: Learn about the job zero of securing your data using Amazon Redshift

Post Syndicated from Thiyagarajan Arumugam original https://aws.amazon.com/blogs/big-data/cybersecurity-awareness-month-learn-about-the-job-zero-of-securing-your-data-using-amazon-redshift/

Amazon Redshift is the most widely used cloud data warehouse. It allows you to run complex analytic queries against terabytes to petabytes of structured and semi-structured data, using sophisticated query optimization, columnar storage on high-performance disk, and massively parallel query execution.

At AWS, we embrace the culture that security is job zero, by which we mean it’s even more important than any number one priority. AWS provides comprehensive security capabilities to satisfy the most demanding requirements, and Amazon Redshift provides data security out of the box at no extra cost. Amazon Redshift uses the AWS security frameworks to implement industry-leading security in the areas of authentication, access control, auditing, logging, compliance, data protection, and network security.

Cybersecurity Awareness Month raises awareness about the importance of cybersecurity, ensuring everyone has the resources they need to be safer and more secure online. This post highlights some of the key out-of-the-box capabilities available in Amazon Redshift to manage your data securely.

Authentication

Amazon Redshift supports industry-leading security with built-in AWS Identity and Access Management (IAM) integration, identity federation for single sign-on (SSO), and multi-factor authentication. You can easily federate database user authentication with Amazon Redshift by using IAM and a third-party SAML 2.0 identity provider (IdP), such as AD FS, PingFederate, or Okta.
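For example, rather than distributing static database passwords, you can generate temporary database credentials through IAM. The following sketch uses placeholder cluster, user, and database names:

aws redshift get-cluster-credentials --cluster-identifier my-redshift-cluster --db-user analyst --db-name dev --auto-create --duration-seconds 900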

To get started, see the following posts:

Access control

Granular row- and column-level security controls ensure that users see only the data they should have access to. You can achieve column-level access control for data in Amazon Redshift by using column-level GRANT and REVOKE statements, without having to implement view-based access control or use another system. Amazon Redshift is also integrated with AWS Lake Formation, which makes sure that column- and row-level access defined in Lake Formation is enforced for Amazon Redshift queries on the data in the data lake.

The following guides can help you implement fine-grained access control on Amazon Redshift:

Auditing and logging

To monitor Amazon Redshift for suspicious activity, you can take advantage of the auditing and logging features. Amazon Redshift logs information about connections and user activities, which can be uploaded to Amazon Simple Storage Service (Amazon S3) if you enable the audit logging feature. API calls to Amazon Redshift are logged to AWS CloudTrail, and you can create a log trail by configuring CloudTrail to upload to Amazon S3. For more details, see Database audit logging, Analyze logs using Amazon Redshift Spectrum, Querying AWS CloudTrail Logs, and System object persistence utility.
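As a sketch, audit logging can be enabled from the CLI; the cluster identifier, bucket, and prefix below are placeholders, and the target bucket needs the bucket policy required for Amazon Redshift audit logging:

aws redshift enable-logging --cluster-identifier my-redshift-cluster --bucket-name my-audit-log-bucket --s3-key-prefix redshift-audit/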

Compliance

Amazon Redshift is assessed by third-party auditors for compliance with multiple programs. If your use of Amazon Redshift is subject to compliance with standards like HIPAA, PCI, or FedRAMP, you can find more details at Compliance validation for Amazon Redshift.

Data protection

To protect data both at rest and in transit, Amazon Redshift provides options to encrypt the data. Although the encryption settings are optional, we highly recommend enabling them. When you enable encryption at rest for your cluster, it encrypts both the data blocks and the metadata, and there are multiple ways to manage the encryption key (see Amazon Redshift database encryption). To safeguard your data while it's in transit from your SQL clients to the Amazon Redshift cluster, we highly recommend configuring the security options as described in Configuring security options for connections.
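For illustration, encryption at rest with a customer managed KMS key can be requested when the cluster is created; the identifiers, node type, credentials, and key ARN below are placeholders:

aws redshift create-cluster --cluster-identifier my-encrypted-cluster --node-type ra3.xlplus --number-of-nodes 2 --master-username awsuser --master-user-password '<StrongPassword1>' --encrypted --kms-key-id arn:aws:kms:us-east-1:111122223333:key/<key-id>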

For additional data protection options, see the following resources:

Amazon Redshift enables you to use an AWS Lambda function as a UDF in Amazon Redshift. You can write Lambda UDFs to enable external tokenization of data and dynamic data masking, as illustrated in Amazon Redshift – Dynamic Data Masking.

Network security

Amazon Redshift is a service that runs within your VPC. There are multiple configurations to ensure that access to your Amazon Redshift cluster is secured, whether the connection comes from an application within your VPC or from an on-premises system. For more information, see VPCs and subnets. Amazon Redshift support for AWS PrivateLink ensures that all API calls from your VPC to Amazon Redshift stay within the AWS network. For more information, see Connecting to Amazon Redshift using an interface VPC endpoint.

Customer success stories

You can run your security-demanding analytical workload using out-of-the-box features. For example, SoePay, a Hong Kong–based payments solutions provider, uses AWS Fargate and Amazon Elastic Container Service (Amazon ECS) to scale its infrastructure, AWS Key Management Service (AWS KMS) to manage cryptographic keys, and Amazon Redshift to store data from merchants’ smart devices.

With AWS services, GE Renewable Energy has created a data lake where it collects and analyzes machine data captured at GE wind turbines around the world. GE relies on Amazon S3 to store and protect its ever-expanding collection of wind turbine data and on Amazon Redshift to help it gain new insights from the data it collects.

For more customer stories, see Amazon Redshift customers.

Conclusion and Next Steps

In this post, we discussed some of the key out-of-the-box capabilities available in Amazon Redshift at no extra cost to manage your data securely, such as authentication, access control, auditing, logging, compliance, data protection, and network security.

You should periodically review your AWS workloads to ensure that security best practices have been implemented. The AWS Well-Architected Framework helps you understand the pros and cons of the decisions you make while building systems on AWS. This framework can help you learn architectural best practices for designing and operating reliable, secure, efficient, and cost-effective systems in the cloud. Review the security pillar of the framework as part of this exercise.

In addition, AWS Security Hub provides a comprehensive view of your security state within AWS and helps you check your compliance with security industry standards and best practices.

To adhere to the security needs of your organization, you can automate the deployment of an Amazon Redshift cluster in an AWS account using AWS CloudFormation and AWS Service Catalog. For more information, see Automate Amazon Redshift cluster creation using AWS CloudFormation and Automate Amazon Redshift Cluster management operations using AWS CloudFormation.


About the Authors

Kunal Deep Singh is a Software Development Manager at Amazon Web Services (AWS) and leads development of security features for Amazon Redshift. Prior to AWS he has worked at Amazon Ads and Microsoft Azure. He is passionate about building customer solutions for cloud, data and security.

Thiyagarajan Arumugam is a Principal Solutions Architect at Amazon Web Services and designs customer architectures to process data at scale. Prior to AWS, he built data warehouse solutions at Amazon.com. In his free time, he enjoys all outdoor sports and practices the Indian classical drum mridangam.

Building Modern Applications with Amazon EKS on Amazon Outposts

Post Syndicated from Emma White original https://aws.amazon.com/blogs/compute/building-modern-applications-with-amazon-eks-on-amazon-outposts/

This post is written by Brad Kirby, Principal Outposts Specialist, and Chris Lunsford, Senior Outposts SA. 

Customers are modernizing applications by deconstructing monolithic architectures and migrating application components into container–based, service-oriented, and microservices architectures. Modern applications improve scalability, reliability, and development efficiency by allowing services to be owned by smaller, more focused teams.

This post shows how you can combine Amazon Elastic Kubernetes Service (Amazon EKS) with AWS Outposts to deploy managed Kubernetes application environments on-premises alongside your existing data and applications.

For a brief introduction to the subject matter, watch this video, which demonstrates:

  • The benefits of application modernization
  • How containers are an ideal enabling technology for microservices architectures
  • How AWS Outposts combined with Amazon container services enables you to unwind complex service interdependencies and modernize on-premises applications with low latency, local data processing, and data residency requirements

Understanding the Amazon EKS on AWS Outposts architecture

Amazon EKS

Many organizations chose Kubernetes as their container orchestration platform because of its openness, flexibility, and a growing pool of Kubernetes literate IT professionals. Amazon EKS enables you to run Kubernetes clusters on AWS without needing to install and manage the Kubernetes control plane. The control plane, including the API servers, scheduler, and cluster store services, runs within a managed environment in the AWS Region. Kubernetes clients (like kubectl) and cluster worker nodes communicate with the managed control plane via public and private EKS service endpoints.

AWS Outposts

The AWS Outposts service delivers AWS infrastructure and services to on-premises locations from the AWS Global Cloud Infrastructure. An Outpost functions as an extension of the Availability Zone (AZ) where it is anchored. AWS operates, monitors, and manages AWS Outposts infrastructure as part of its parent Region. Each Outpost connects back to its anchor AZ via a Service Link connection (a set of encrypted VPN tunnels). AWS Outposts extends Virtual Private Cloud (VPC) environments from the Region to on-premises locations and enables you to deploy VPC subnets to Outposts in your data center and co-location spaces. The Outposts Local Gateway (LGW) routes traffic between Outpost VPC subnets and the on-premises network.

Amazon EKS on AWS Outposts

You deploy Amazon EKS worker nodes to Outposts using self-managed node groups. The worker nodes run on Outposts and register with the Kubernetes control plane in the AWS Region. The worker nodes, and containers running on the nodes, can communicate with AWS services and resources running on the Outpost and in the region (via the Service Link) and with on-premises networks (via the Local Gateway).

 

You use the same AWS and Kubernetes tools and APIs to work with EKS on Outposts nodes that you use to work with EKS nodes in the Region. You can use eksctl, the AWS Management Console, AWS CLI, or infrastructure as code (IaC) tools like AWS CloudFormation or HashiCorp Terraform to create self-managed node groups on AWS Outposts.

Amazon EKS self-managed node groups on AWS Outposts

Like many customers, you might use managed node groups when you deploy EKS worker nodes in your Region, and you may be wondering, “what are self-managed node groups?”

Self-managed node groups, like managed node groups, use standard Amazon Elastic Compute Cloud (Amazon EC2) instances in EC2 Auto Scaling groups to deploy, scale-up, and scale-down EKS worker nodes using Amazon EKS optimized Amazon Machine Images (AMIs). Amazon configures the EKS optimized AMIs to work with the EKS service. The images include Docker, kubelet, and the AWS IAM Authenticator. The AMIs also contain a specialized bootstrap script /etc/eks/bootstrap.sh that allows worker nodes to discover and connect to your cluster control plane and add Kubernetes labels that identify the nodes in the cluster.
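As a sketch of what the bootstrap step typically looks like in a node's user data (the cluster name and label here are placeholders, not values from this tutorial):

#!/bin/bash
# Join the node to the EKS cluster and apply an identifying Kubernetes label
/etc/eks/bootstrap.sh my-eks-cluster \
  --kubelet-extra-args '--node-labels=node-group=outposts-workers'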

What makes the node groups self-managed? Managed node groups have additional features that simplify updating nodes in the node group. With self-managed nodes, you must implement a process to update or replace your node group when you want to update the nodes to use a new Amazon EKS optimized AMI.

You create self-managed node groups on AWS Outposts using the same process and resources that you use to deploy EC2 instances using EC2 Auto Scaling groups. All instances in a self-managed node group must:

  • Be the same Instance type
  • Be running the same Amazon Machine Image (AMI)
  • Use the same Amazon EKS node IAM role
  • Be tagged with the kubernetes.io/cluster/<cluster-name>=owned tag

Additionally, to deploy on AWS Outposts, instances must:

  • Use encrypted EBS volumes
  • Launch in Outposts subnets

Kubernetes authenticates all API requests to a cluster. You must configure an EKS cluster to allow nodes from a self-managed node group to join the cluster. Self-managed nodes use the node IAM role to authenticate with an EKS cluster. Amazon EKS clusters use the AWS IAM Authenticator for Kubernetes to authenticate requests and Kubernetes native Role Based Access Control (RBAC) to authorize requests. To enable self-managed nodes to register with a cluster, configure the AWS IAM Authenticator to recognize the node IAM role and assign the role to the system:bootstrappers and system:nodes RBAC groups.

In the following tutorial, we take you through the steps required to deploy EKS worker nodes on an Outpost and register those nodes with an Amazon EKS cluster running in the Region. We created a sample Terraform module aws-eks-self-managed-node-group to help you get started quickly. If you are interested, you can dive into the module sample code to see the detailed configurations for self-managed node groups.

Deploying Amazon EKS on AWS Outposts (Terraform)

To deploy Amazon EKS nodes on an AWS Outposts deployment, you must complete two high-level steps:

Step 1: Create a self-managed node group

Step 2: Configure Kubernetes to allow the nodes to register

We use Terraform in this tutorial to provide visibility (through the sample code) into the details of the configurations required to deploy self-managed node groups on AWS Outposts.

Prerequisites

To follow along with this tutorial, you should have the following prerequisites:

  • An AWS account.
  • An operational AWS Outposts deployment (Outpost) associated with your AWS account.
  • AWS Command Line Interface (CLI) version 1.25 or later installed and configured on your workstation.
  • HashiCorp Terraform version 0.14.6 or later installed on your workstation.
  • Familiarity with Terraform HashiCorp Configuration Language (HCL) syntax and experience using Terraform to deploy AWS resources.
  • An existing VPC that meets the requirements for an Amazon EKS cluster. For more information, see Cluster VPC considerations. You can use the Getting started with Amazon EKS guide to walk you through creating a VPC that meets the requirements.
  • An existing Amazon EKS cluster. You can use the Creating an Amazon EKS cluster guide to walk you through creating the cluster.
  • Tip: We recommend creating and using a new VPC and Amazon EKS cluster to complete this tutorial. Procedures like modifying the aws-auth ConfigMap on the cluster may impact other nodes, users, and workloads on the cluster. Using a new VPC and Amazon EKS cluster will help minimize the risk of impacting other applications as you complete the tutorial.
  • Note: Do not reference subnets deployed on AWS Outposts when creating Amazon EKS clusters in the Region. The Amazon EKS cluster control plane runs in the Region and attaches to VPC subnets in the Region availability zones.
  • A symmetric KMS key to encrypt the EBS volumes of the EKS worker nodes. You can use the alias/aws/ebs AWS managed key for this prerequisite.

Using the sample code

The source code for the amazon-eks-self-managed-node-group Terraform module is available in AWS-Samples on GitHub.

Setup

To follow along with this tutorial, complete the following steps to setup your workstation:

  1. Open your terminal.
  2. Make a new directory to hold your EKS on Outposts Terraform configurations.
  3. Change to the new directory.
  4. Create a new file named providers.tf with the following contents:
    terraform {
      required_providers {
        aws = {
          source  = "hashicorp/aws"
          version = "~> 3.27"
        }
    
        kubernetes = {
          source  = "hashicorp/kubernetes"
          version = "~> 1.13.3"
        }
      }
    }

  5. Keep your terminal open and do all the work in this tutorial from this directory.

Step 1: Create a self-managed node group

To create a self-managed node group on your Outpost

  1. Create a new Terraform configuration file named self-managed-node-group.tf.
  2. Configure the aws Terraform provider with your AWS CLI profile and Outpost parent Region:
    provider "aws" {
      region  = "us-west-2"
      profile = "default"
    }
    

  3. Configure the aws-eks-self-managed-node-group module with the following (minimum) arguments:
    • eks_cluster_name the name of your EKS cluster
    • instance_type an instance type supported on your AWS Outposts deployment
    • desired_capacity, min_size, and max_size as desired to control the number of nodes in your node group (ensure that your Outpost has sufficient resources available to run the desired number of nodes of the specified instance type)
    • subnets the subnet IDs of the Outpost subnets where you want the nodes deployed
    • (Optional) Kubernetes node_labels to apply to the nodes
    • Enable ebs_encrypted and configure ebs_kms_key_arn with the KMS key you want to use to encrypt the nodes’ EBS volumes (required for Outposts deployments)

Example:

module "eks_self_managed_node_group" {
  source = "github.com/aws-samples/amazon-eks-self-managed-node-group"

  eks_cluster_name = "cmluns-eks-cluster"
  instance_type    = "m5.2xlarge"
  desired_capacity = 1
  min_size         = 1
  max_size         = 1
  subnets          = ["subnet-0afb721a5cc5bd01f"]

  node_labels = {
    "node.kubernetes.io/outpost"    = "op-0d4579457ff2dc345"
    "node.kubernetes.io/node-group" = "node-group-a"
  }

  ebs_encrypted   = true
  ebs_kms_key_arn = "arn:aws:kms:us-west-2:799838960553:key/0e8f15cc-d3fc-4da4-ae03-5fadf45cc0fb"
}

You may configure other optional arguments as appropriate for your use case – see the module README for details.

Step 2: Configure Kubernetes to allow the nodes to register

Use the following procedures to configure Terraform to manage the AWS IAM Authenticator (aws-auth) ConfigMap on the cluster. This adds the node-group IAM role to the IAM authenticator and Kubernetes RBAC groups.

Configure the Terraform Kubernetes Provider to allow Terraform to interact with the Kubernetes control plane.

Note: If you add a node group to a cluster with existing node groups, mapped IAM roles, or mapped IAM users, the aws-auth ConfigMap may already be configured on your cluster. If the ConfigMap exists, you must download, edit, and replace the ConfigMap on the cluster using kubectl. We do not provide a procedure for this operation as it may affect workloads running on your cluster. Please see the section Managing users or IAM roles for your cluster in the Amazon EKS User Guide for more information.

To check if your cluster has the aws-auth ConfigMap configured

  1. Run the aws eks --region <region> update-kubeconfig --name <cluster-name> command to update your workstation’s ~/.kube/config with the information needed to connect to your cluster, substituting your <region> and <cluster-name>.
❯ aws eks --region us-west-2 update-kubeconfig --name cmluns-eks-cluster
Updated context arn:aws:eks:us-west-2:799838960553:cluster/cmluns-eks-cluster in ~/.kube/config
  2. Run the kubectl describe configmap -n kube-system aws-auth command.
  • If you receive an error stating, Error from server (NotFound): configmaps "aws-auth" not found, then proceed with the following procedures to use Terraform to apply the ConfigMap.
❯ kubectl describe configmap -n kube-system aws-auth
Error from server (NotFound): configmaps "aws-auth" not found
  • If you do not receive the preceding error, and kubectl returns an aws-auth ConfigMap, then you should not use Terraform to manage the ConfigMap.

To configure the Terraform Kubernetes provider

  1. Create a new Terraform configuration file named aws-auth-config-map.tf.
  2. Add the aws_eks_cluster Terraform data source, and configure it to look up your cluster by name.
data "aws_eks_cluster" "selected" {
  name = "cmluns-eks-cluster"
}
  3. Add the aws_eks_cluster_auth Terraform data source, and configure it to look up your cluster by name.
data "aws_eks_cluster_auth" "selected" {
  name = "cmluns-eks-cluster"
}
  4. Configure the kubernetes provider with your cluster host (endpoint address), cluster_ca_certificate, and the token from the aws_eks_cluster and aws_eks_cluster_auth data sources.
provider "kubernetes" {
  load_config_file       = false
  host                   = data.aws_eks_cluster.selected.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.selected.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.selected.token
}

To configure the AWS IAM Authenticator on the cluster

  1. Open the aws-auth-config-map.tf Terraform configuration file you created in the last step.
  2. Add a kubernetes_config_map Terraform resource to add the aws-auth ConfigMap to the kube-system namespace.
  3. Configure the data argument with a YAML-format heredoc string that adds the Amazon Resource Name (ARN) of your node group IAM role to the mapRoles key.
resource "kubernetes_config_map" "aws_auth" {
  metadata {
    name      = "aws-auth"
    namespace = "kube-system"
  }

  data = {
    mapRoles = <<-EOT
      - rolearn: ${module.eks_self_managed_node_group.role_arn}
        username: system:node:{{EC2PrivateDNSName}}
        groups:
          - system:bootstrappers
          - system:nodes
    EOT
  }
}

Apply the configuration and verify node registration

You have created the Terraform configurations to deploy an EKS self-managed node group to your Outpost and configure Kubernetes to authenticate the nodes. Now you apply the configurations and verify that the nodes register with your Kubernetes cluster.

To apply the Terraform configurations

  1. Run the terraform init command to download the providers, self-managed node group module, and prepare the directory for use.
  2. Run the terraform apply command.
  3. Review the resources that will be created.
  4. Enter yes to confirm that you want to create the resources.
    ❯ terraform apply
    
    Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
      + create
    
    Terraform will perform the following actions:
    
    <-- Output omitted for brevity -->
    
    Plan: 9 to add, 0 to change, 0 to destroy.
    
    Do you want to perform these actions?
      Terraform will perform the actions described above.
      Only 'yes' will be accepted to approve.
    
      Enter a value: yes
    

  5. Press Enter.
    <-- Output omitted for brevity -->
    
    Apply complete! Resources: 9 added, 0 changed, 0 destroyed.
    

  6. Run the aws eks --region <region> update-kubeconfig --name <cluster-name> command to update your workstation’s ~/.kube/config with the information required to connect to your cluster – substituting your <region> and <cluster-name>.
    ❯ aws eks --region us-west-2 update-kubeconfig --name cmluns-eks-cluster
    Updated context arn:aws:eks:us-west-2:799838960553:cluster/cmluns-eks-cluster in ~/.kube/config
    

  7. Run the kubectl get nodes command to view the status of your cluster nodes.
  8. Verify your new nodes show up in the list with a STATUS of Ready.
    ❯ kubectl get nodes
    NAME                                          STATUS   ROLES    AGE   VERSION
    ip-10-253-46-119.us-west-2.compute.internal   Ready    <none>   37s   v1.18.9-eks-d1db3c
    

  9. (Optional) If your nodes are registering, and their STATUS does not show Ready, run the kubectl get nodes --watch command to watch them come online.
  10. (Optional) Run the kubectl get nodes --show-labels command to view the node list with the labels assigned to each node. The nodes created by your AWS Outposts node group will have the labels you assigned in Step 1.

To verify the Kubernetes system pods deploy on the worker nodes

  1. Run the kubectl get pods --namespace kube-system command.
  2. Verify that each pod shows READY 1/1 with a STATUS of Running.
❯ kubectl get pods --namespace kube-system
NAME                       READY   STATUS    RESTARTS   AGE
aws-node-84xlc             1/1     Running   0          2m16s
coredns-559b5db75d-jxqbp   1/1     Running   0          5m36s
coredns-559b5db75d-vdccc   1/1     Running   0          5m36s
kube-proxy-fspw4           1/1     Running   0          2m16s

The nodes in your AWS Outposts node group should be registered with the EKS control plane in the Region and the Kubernetes system pods should successfully deploy on the new nodes.

Clean up

One of the nice things about using infrastructure as code tools like Terraform is that they automate stack creation and deletion. Use the following procedure to remove the resources you created in this tutorial.

To clean up the resources created in this tutorial

  1. Run the terraform destroy command.
  2. Review the resources that will be destroyed.
  3. Enter yes to confirm that you want to destroy the resources.
❯ terraform destroy

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  - destroy

Terraform will perform the following actions:

Plan: 0 to add, 0 to change, 9 to destroy.

Do you really want to destroy all resources?
  Terraform will destroy all your managed infrastructure, as shown above.
  There is no undo. Only 'yes' will be accepted to confirm.

  Enter a value: yes
  4. Press Enter.

Destroy complete! Resources: 9 destroyed.

  5. Clean up any resources you created for the prerequisites.

Conclusion

In this post, we discussed how containers sit at the heart of the application modernization process, making it easy to adopt microservices architectures that improve application scalability, availability, and performance. We also outlined the challenges associated with modernizing on-premises applications with low latency, local data processing, and data residency requirements. In these cases, AWS Outposts brings cloud services, like Amazon EKS, close to the workloads being refactored. We also walked you through deploying Amazon EKS worker nodes on-premises on AWS Outposts.

Now that you have deployed Amazon EKS worker nodes in a test VPC using Terraform, you should adapt the Terraform module(s) and resources to prepare and deploy your production Kubernetes clusters on-premises on Outposts. If you don't have an Outpost and you want to get started modernizing your on-premises applications with Amazon EKS and AWS Outposts, contact your local AWS account Solutions Architect (SA) to help you get started.

Apple Mail’s iOS15 Privacy Protection Impact to Senders

Post Syndicated from Matt Strzelecki original https://aws.amazon.com/blogs/messaging-and-targeting/apple-mails-ios15-privacy-protection-impact-to-senders-2/

On June 7th at Apple’s Worldwide Developer’s Conference (WWDC 2021) Apple announced that Apple Mail users can now choose to use Apple Mail Privacy Protection. Apple Mail Privacy Protection will allow iOS to privately load remote message content which will hide recipient’s mail activity information like IP and user agent information, including geolocation and device(s) used to engage with the message. Apple Mail Privacy Protection will eliminate the open as being a reliable metric to evaluate user engagement on the sender’s side as all tracking pixels and images will be cached and fired as it hits Apple Mail. Apple is doing this in order to protect user information and increase privacy while also helping to facilitate a richer user experience as Apple Mail user can confidently open, read and engage with messages without all their email interactions are being tracked through remote images and tracking pixels. This will result in all messages that have the Apple Mail Privacy Protection enabled to register an open regardless of whether the recipient has read the email message or not. The end user will also have more confidence in the security of the message including its links.

When a user starts Apple Mail on their iOS device, emails to that user are initiated for download to the device but are first cached by Apple, including all images and pixels, on a proxy server that does not expose individual recipient IP addresses but rather a generic IP of the Apple Cache. This happens regardless of whether the user actually opens the mail at that time. If the user opens the email, it pulls the message from the Apple Cache rather than from the original sending source, typically an email service provider (ESP). As a result, senders will not have open tracking insight, because all tracking images and pixels fire as the messages are downloaded to the Apple Cache.

Apple Mail Privacy Protection applies to email opened in the Apple Mail app. If a user engages with messages through another mail application, such as the Gmail app, Apple Mail Privacy Protection is not applied. Apple Mail Privacy Protection is not enabled by default, but when users first launch the Apple Mail app on iOS 15 they are prompted to enable privacy protection, and most users are expected to turn it on.

Impact to Marketers

There will be a major impact on marketers who rely heavily on open rates as a conversion metric for user engagement, because open data will be skewed: tracking pixels will fire regardless of whether a recipient actually engages with the message. However, other data points and user activity will still be available, such as click-through rates, onsite activity, and conversion history. These types of metrics will need to be relied upon to supplement open tracking data. Additionally, email deliverability best practices will be more important than ever to help maintain healthy lists and a responsive user base. Best practices such as confirmed opt-in list building, list maintenance and hygiene, consistent sending patterns and cadence, and honoring opt-outs and complaints will be even more important for marketers to adhere to as they adjust to the new Mail Privacy Protection feature.

While Mail Privacy Protection reduces visibility into open rates, there are benefits to the user experience as user trust in messages received through Apple Mail increases. For example, users who previously chose to receive text-only messages to protect their privacy will now receive the richer content of the full message, providing a better experience while engaging with it. Full images and content will be delivered to recipients, who will have a much higher sense of security when reading and acting on the email and its content. Prior to Apple Mail Privacy Protection, there could be skepticism about URLs and links within messages, leading to more deletes or false positives, and potentially also to more complaints and/or unsubscribes.

There are other benefits of Apple’s Mail Privacy Protection to marketers such as validation of email addresses. Since emails are cached as the messages are initiated for download to a device, and as a result it is downloaded to the Apple Cache and the tracking image or pixel is fired, it validates the existence of that email address. This does not mean you should use this feature as a validation tool as mailbox providers such as Gmail will still evaluate senders in part on list hygiene and high invalid requests will still lead to negative sender reputation with those providers. Confirmed opt-in practices are going to be even more crucial for managing healthy and long-term lists for marketers than it was prior to Apple Mail Privacy Protection. If a marketer is unsure about opt-in status, look into creating a re-confirmation campaign and only add back in recipients that re-confirm the opt-in by clicking a confirmation link in the message.

Conclusion

Email is still the most used communication tool, whether business-to-business, business-to-consumer, or peer-to-peer, especially when it comes to marketing. Marketers need to continue to evolve and be creative when sending messages to their recipients, because email, as it relates to privacy and security, will continue to evolve and leave behind marketers who don't keep pace. While Apple's Mail Privacy Protection reduces open rate visibility, it provides the Apple Mail user base with more security and confidence in the messages delivered to their devices. That confidence can allow marketers to focus on developing richer content for a better user experience and drive conversions rather than just opens.

Developing and managing a list with proper confirmed opt-in methods is crucial to building long-term email lists and earning the trust of your recipients. The implementation of Apple Mail Privacy Protection reinforces this principle.

Lastly, email privacy & security will continue to advance forward and marketers along with email service providers should not be trying to “get around” these privacy features, rather they need to understand that these features are intended to help the end user and your customers. Work within the ideology of providing the customers what they want to receive and nothing more or less, and you can help your emails thrive. Stay tuned for more updates as they become available.

Enabling parallel file systems in the cloud with Amazon EC2 (Part I: BeeGFS)

Post Syndicated from Ben Peven original https://aws.amazon.com/blogs/compute/enabling-parallel-file-systems-in-the-cloud-with-amazon-ec2-part-i-beegfs/

This post was authored by AWS Solutions Architects Ray Zaman, David Desroches, and Ameer Hakme.

In this blog series, you will discover how to build and manage your own Parallel Virtual File System (PVFS) on AWS. In this post you will learn how to deploy the popular open source parallel file system, BeeGFS, using AWS D3en and I3en EC2 instances. We will also provide a CloudFormation template to automate this BeeGFS deployment.

A PVFS is a type of distributed file system that distributes file data across multiple servers and provides concurrent data access to multiple execution tasks of an application. PVFS focuses on high-performance access to large datasets. It consists of a server process and a client library, which allows the file system to be mounted and used with standard utilities. PVFS on the Linux OS originated in the 1990s, and today several projects are available, including Lustre, GlusterFS, and BeeGFS. Workloads such as shared storage for video transcoding and export, batch processing jobs, high frequency online transaction processing (OLTP) systems, and scratch storage for high performance computing (HPC) benefit from the high throughput and performance provided by PVFS.

Implementation of a PVFS can be complex and expensive. There are many variables you will want to take into account when designing a PVFS cluster including the number of nodes, node size (CPU, memory), cluster size, storage characteristics (size, performance), and network bandwidth. Due to the difficulty in estimating the correct configuration, systems procured for on-premises data centers are typically oversized, resulting in additional costs, and underutilized resources. In addition, the hardware procurement process is lengthy and the installation and maintenance of the hardware adds additional overhead.

AWS makes it easy to run and fully manage your parallel file systems by allowing you to choose from a variety of Amazon Elastic Compute Cloud (EC2) instances. EC2 instances are available on-demand and allow you to scale your workload as needed. AWS storage-optimized EC2 instances offer up to 60 TB of NVMe SSD storage per instance and up to 336 TB of local HDD storage per instance. With storage-optimized instances, you can easily deploy PVFS to support workloads requiring high-performance access to large datasets. You can test and iterate on different instances to find the optimal size for your workloads.

D3en instances leverage 2nd-generation Intel Xeon Scalable Processors (Cascade Lake) and provide a sustained all core frequency up to 3.1 GHz. These instances provide up to 336 TB of local HDD storage (which is the highest local storage capacity in EC2), up to 6.2 GiBps of disk throughput, and up to 75 Gbps of network bandwidth.

I3en instances are powered by 1st or 2nd generation Intel® Xeon® Scalable (Skylake or Cascade Lake) processors with 3.1 GHz sustained all-core turbo performance. These instances provide up to 60 TB of NVMe storage, up to 16 GB/s of sequential disk throughput, and up to 100 Gbps of network bandwidth.

BeeGFS, originally released by ThinkParQ in 2014, is an open source, software defined PVFS that runs on Linux. You can scale the size and performance of the BeeGFS file-system by configuring the number of servers and disks in the clusters up to thousands of nodes.

BeeGFS architecture

D3en instances offer HDD storage while I3en instances offer NVMe SSD storage. This diversity allows you to create tiers of storage based on performance requirements. In the example presented in this post you will use four D3en.8xlarge (32 vCPU, 128 GB, 16x14TB HDD, 50 Gbit) and two I3en.12xlarge (48 vCPU, 384 GB, 4 x 7.5-TB NVMe) instances to create two storage tiers. You may choose different sizes and quantities to meet your needs. The I3en instances, with SSD, will be configured as tier 1 and the D3en instances, with HDD, will be configured as tier 2. One disk from each instance will be formatted as ext4 and used for metadata while the remaining disks will be formatted as XFS and used for storage. You may choose to separate metadata and storage on different hosts for workloads where these must scale independently. The array will be configured as RAID 0, since it provides maximum performance. Software replication or other RAID types can be employed for higher durability.

BeeGFS architecture

Figure 1: BeeGFS architecture

You will deploy all instances within a single VPC in the same Availability Zone and subnet to minimize latency. Security groups must be configured to allow the following ports:

  • Management service (beegfs-mgmtd): 8008
  • Metadata service (beegfs-meta): 8005
  • Storage service (beegfs-storage): 8003
  • Client service (beegfs-client): 8004

You will use the Debian Quick Start Amazon Machine Image (AMI) as it supports BeeGFS. You can enable Amazon CloudWatch to capture metrics.

How to deploy the BeeGFS architecture

Follow the steps below to create the PVFS described above. For automated deployment, use the CloudFormation template located at AWS Samples.

  1. Use the AWS Management Console or CLI to deploy one D3en.8xlarge instance into a VPC as described above.
  2. Log in to the instance and update the system:
    • sudo apt update
    • sudo apt upgrade
  3. Install the XFS utilities and load the kernel module:
    • sudo apt-get -y install xfsprogs
    • sudo modprobe -v xfs

Format the first disk as ext4, since it is used for metadata; the rest are formatted as XFS. The disks appear with “nvme” device names, which on D3en instances actually represent the HDD drives.

4. View a listing of available disks:

    • sudo lsblk

5. Format hard disks:

    • sudo mkfs -t ext4 /dev/nvme0n1
    • sudo mkfs -t xfs /dev/nvme1n1
    • Repeat this command for disks nvme2n1 through nvme15n1

6. Create file system mount points:

    • sudo mkdir /disk00
    • sudo mkdir /disk01
    • Repeat this command for disks disk02 through disk15

7. Mount the filesystems:

    • sudo mount /dev/nvme0n1 /disk00
    • sudo mount /dev/nvme1n1 /disk01
    • Repeat this command, mounting /dev/nvme2n1 through /dev/nvme15n1 on /disk02 through /disk15

Repeat steps 1 through 7 on the remaining nodes. Remember to account for fewer disks for i3en.12xlarge instances or if you decide to use different instance sizes.

8. Add the BeeGFS Repo to each node:

    • sudo apt-get -y install gnupg
    • wget https://www.beegfs.io/release/beegfs_7.2.3/dists/beegfs-deb10.list
    • sudo cp beegfs-deb10.list /etc/apt/sources.list.d/
    • sudo wget -q https://www.beegfs.io/release/latest-stable/gpg/DEB-GPG-KEY-beegfs -O- | sudo apt-key add -
    • sudo apt update

9. Install BeeGFS management (node 1 only):

    • sudo apt-get -y install beegfs-mgmtd
    • sudo mkdir /beegfs-mgmt
    • sudo /opt/beegfs/sbin/beegfs-setup-mgmtd -p /beegfs-mgmt/beegfs/beegfs_mgmtd

10. Install BeeGFS metadata and storage (all nodes):

    • sudo apt-get -y install beegfs-meta beegfs-storage beegfs-client beegfs-helperd beegfs-utils
    • # -s is unique ID based on node - change this!, -m is hostname of management server
    • sudo /opt/beegfs/sbin/beegfs-setup-meta -p /disk00/beegfs/beegfs_meta -s 1 -m ip-XXX-XXX-XXX-XXX
    • # Change -s to nodeID and -i to (nodeid)0(disk), -m is hostname of management server
    • sudo /opt/beegfs/sbin/beegfs-setup-storage -p /disk01/beegfs_storage -s 1 -i 101 -m ip-XXX-XXX-XXX-XXX
    • sudo /opt/beegfs/sbin/beegfs-setup-storage -p /disk02/beegfs_storage -s 1 -i 102 -m ip-XXX-XXX-XXX-XXX
    • Repeat this last command for the remaining disks disk03 through disk15

11. Start the services:

    • #Only on node1
    • sudo systemctl start beegfs-mgmtd
    • #All servers
    • sudo systemctl start beegfs-meta
    • sudo systemctl start beegfs-storage

At this point, your BeeGFS cluster is running and ready for use by a client system. The client system requires BeeGFS client software in order to mount the cluster.

12. Deploy an m5n.2xlarge instance into the same subnet as the PVFS cluster.

13. Log in to the instance, install, and configure the client:

    • sudo apt update
    • sudo apt upgrade
    • sudo apt-get -y install gnupg
    • #Need linux sources for client compilation
    • sudo apt-get -y install linux-source
    • sudo apt-get -y install linux-headers-4.19.0-14-all
    • wget https://www.beegfs.io/release/beegfs_7.2.3/dists/beegfs-deb10.list
    • sudo cp beegfs-deb10.list /etc/apt/sources.list.d/
    • sudo wget -q https://www.beegfs.io/release/latest-stable/gpg/DEB-GPG-KEY-beegfs -O- | sudo apt-key add -
    • sudo apt update
    • sudo apt-get -y install beegfs-client beegfs-helperd beegfs-utils
    • sudo /opt/beegfs/sbin/beegfs-setup-client -m ip-XXX-XXX-XXX-XX # use the ip address of the management node
    • sudo systemctl start beegfs-helperd
    • sudo systemctl start beegfs-client

14. Create the storage pools:

    • sudo beegfs-ctl --addstoragepool --desc="tier1" --targets=501,502,503,601,602,603
    • sudo beegfs-ctl --addstoragepool --desc="tier2" --targets=101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,201,202,203,204,205,206,207,208,209,210,
      211,212,213,214,215,301,302,303,304,305,306,307,308,309,310,311,312,313,314,315,401,402,403,404,405,406,407,
      408,409,410,411,412,413,414,415
    • sudo beegfs-ctl --liststoragepools
      Pool ID   Pool Description               Targets                            Buddy Groups
      =======   ==================   ==================================   ============================
            1   Default
            2   tier1                501,502,503,601,602,603
            3   tier2                101,102,103,104,105,106,107,108,
                                     109,110,111,112,113,114,115,201,
                                     202,203,204,205,206,207,208,209,
                                     210,211,212,213,214,215,301,302,
                                     303,304,305,306,307,308,309,310,
                                     311,312,313,314,315,401,402,403,
                                     404,405,406,407,408,409,410,411,
                                     412,413,414,415

15. Assign the storage pools to directories in the file system:

    • sudo beegfs-ctl --setpattern --storagepoolid=2 /mnt/beegfs/tier1
    • sudo beegfs-ctl --setpattern --storagepoolid=3 /mnt/beegfs/tier2

The BeeGFS PVFS is now ready to be used by the client system.

How to test your new BeeGFS PVFS

BeeGFS provides StorageBench to evaluate the performance of BeeGFS on the storage targets. This benchmark measures the streaming throughput of the underlying file system and devices independent of the network performance. To simulate client I/O, this benchmark generates read/write locally on the servers without any client communication.

It is possible to benchmark specific targets or all targets together using the “servers” parameter. A “read” or “write” parameter sets the type of test to perform. The “threads” parameter is set to the number of storage devices.

Try the following commands to test performance:

Write test (1x d3en):

sudo beegfs-ctl --storagebench --servers=1 --write --blocksize=512K --size=20G --threads=15

Write test (4x d3en):

sudo beegfs-ctl --storagebench --alltargets --write --blocksize=512K --size=20G --threads=15

Read test (4x d3en):

sudo beegfs-ctl --storagebench --servers=1,2,3,4 --read --blocksize=512K --size=20G --threads=15

Write test (1x i3en):

sudo beegfs-ctl --storagebench --servers=5 --write --blocksize=512K --size=20G --threads=3

Read test (2x i3en):

sudo beegfs-ctl --storagebench --servers=5,6 --read --blocksize=512K --size=20G --threads=3

StorageBench is a great way to test what the potential performance of a given environment looks like by reducing variables like network throughput and latency, but you may want to test in a more real-world fashion. For this, tools like ‘fio’ can generate mixed read/write workloads against files on the client BeeGFS mountpoint.

First, we need to define which directory goes to which Storage Pool (tier) by setting a pattern:

sudo beegfs-ctl --setpattern --storagepoolid=2 /mnt/beegfs/tier1
sudo beegfs-ctl --setpattern --storagepoolid=3 /mnt/beegfs/tier2

You can see how a file gets striped across the various disks in a pool by adding a file and running the command:

sudo beegfs-ctl --getentryinfo /mnt/beegfs/tier1/myfile.bin

Install fio:

sudo apt-get install -y fio

Now you can run a fio test against one of the tiers. This example command runs eight jobs performing a 75/25 read/write workload against a 10-GB file:

sudo fio --numjobs=8 --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=/mnt/beegfs/tier1/test --bs=512k --iodepth=64 --size=10G --readwrite=randrw --rwmixread=75

Cleaning up

To avoid ongoing charges for resources you created, terminate the EC2 instances you launched for the BeeGFS servers and the client, or delete the CloudFormation stack if you used the automated deployment.

Conclusion

In this blog post we demonstrated how to build and manage your own BeeGFS Parallel Virtual File System on AWS. In this example, you created two storage tiers using the I3en and D3en. The I3en was used as the first tier for SSD storage and the D3en was used as a second tier for HDD storage. By using two different tiers, you can optimize performance to meet your application requirements.

Amazon EC2 storage-optimized instances make it easy to deploy the BeeGFS Parallel Virtual File System. Using combinations of SSD and HDD storage available on the I3en and D3en instance types, you can achieve the capacity and performance needed to run the most demanding workloads. Read more about the D3en and I3en instances.

Improve the performance of Lambda applications with Amazon CodeGuru Profiler

Post Syndicated from Vishaal Thanawala original https://aws.amazon.com/blogs/devops/improve-performance-of-lambda-applications-amazon-codeguru-profiler/

As businesses expand applications to reach more users and devices, they risk higher latencies that could degrade the customer experience. Slow applications can frustrate or even turn away customers. Developers therefore increasingly face the challenge of ensuring that their code runs as efficiently as possible, since application performance can make or break businesses.

Amazon CodeGuru Profiler helps developers improve their application speed by analyzing its runtime. CodeGuru Profiler analyzes single and multi-threaded applications and then generates visualizations to help developers understand the latency sources. Afterwards, CodeGuru Profiler provides recommendations to help resolve the root cause.

CodeGuru Profiler recently began providing recommendations for applications written in Python. Additionally, the new automated onboarding process for AWS Lambda functions makes it even easier to use CodeGuru Profiler with serverless applications built on AWS Lambda.

This post highlights these new features by explaining how to set up and utilize CodeGuru Profiler on an AWS Lambda function written in Python.

Prerequisites

This post focuses on improving the performance of an application written with AWS Lambda, so it’s important to understand the Lambda functions that work best with CodeGuru Profiler. You will get the most out of CodeGuru Profiler on long-duration Lambda functions (>10 seconds) or frequently invoked, shorter-duration Lambda functions (~100 milliseconds). Because CodeGuru Profiler requires five minutes of runtime data before the Lambda container is recycled, very short duration Lambda functions with an execution time of 1-10 milliseconds may not provide sufficient data for CodeGuru Profiler to generate meaningful results.

The automated CodeGuru Profiler onboarding process, which automatically creates the profiling group for you, supports Lambda functions running on Java 8 (Amazon Corretto), Java 11, and Python 3.8 runtimes. Additional runtimes, without the automated onboarding process, are supported and can be found in the Java documentation and the Python documentation.
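For runtimes or setups where you onboard manually instead of using the automated process, the Python documentation describes adding the profiler agent decorator to your handler. The following is a minimal sketch of that manual approach; it assumes the codeguru_profiler_agent package is bundled with your deployment package and the profiling group name shown is a placeholder, not one created by this walkthrough.

# Minimal sketch of manual onboarding for a Python Lambda function.
# Assumes the codeguru_profiler_agent package is packaged with the function
# and a profiling group named "MyLambdaProfilingGroup" already exists.
from codeguru_profiler_agent import with_lambda_profiler

@with_lambda_profiler(profiling_group_name="MyLambdaProfilingGroup")
def handler(event, context):
    # Placeholder business logic for illustration only
    return {"statusCode": 200}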

Getting Started

Let’s quickly demonstrate the new Lambda onboarding process and the new Python recommendations. This example assumes you have already created a Lambda function, so we will just walk through the process of turning on CodeGuru Profiler and viewing results. If you don’t already have a Lambda function created, you can create one by following these set up instructions. If you would like to replicate this example, the code we used can be found on GitHub here.

  1. On the AWS Lambda Console page, open your Lambda function. For this example, we’re using a function with a Python 3.8 runtime.

 This image shows the Lambda console page for the Lambda function we are referencing.

2. Navigate to the Configuration tab, go to the Monitoring and operations tools page, and click Edit on the right side of the page.

This image shows the instructions to open the page to turn on profiling

3. Scroll down to “Amazon CodeGuru Profiler” and click the button next to “Code profiling” to turn it on. After enabling Code profiling, click Save.

This image shows how to turn on CodeGuru Profiler

4. Verify that CodeGuru Profiler has been turned on within the Monitoring and operations tools page

This image shows how to validate Code profiling has been turned on

That’s it! You can now navigate to CodeGuru Profiler within the AWS console and begin viewing results.

Viewing your results

CodeGuru Profiler requires 5 minutes of Lambda runtime data to generate results. After your Lambda function provides this runtime data, which may take multiple runs if your Lambda function has a short runtime, it will display within the “Profiling group” page in the CodeGuru Profiler console. The profiling group will be given a default name (i.e., aws-lambda-<lambda-function-name>), and it will take approximately 15 minutes after CodeGuru Profiler receives the runtime data before it appears on this page.

This image shows where you can see the profiling group created after turning on CodeGuru Profiler on your Lambda function

After the profile appears, customers can view their profiling results by analyzing the flame graphs. Additionally, after approximately 1 hour, customers will receive their first recommendation set (if applicable). For more information on how to read the CodeGuru Profiler results, see Investigating performance issues with Amazon CodeGuru Profiler.

The below images show the two flame graphs (CPU Utilization and Latency) generated from profiling the Lambda function. Note that the highlighted horizontal bar (also referred to as a frame) in both images corresponds with one of the three frames that generates a recommendation. We’ll dive into more details on the recommendation in the following sections.

CPU Utilization Flame Graph:

This image shows the CPU Utilization flame graph for the Lambda function

Latency Flame Graph:

This image shows the Latency flame graph for the Lambda function

Here are the three recommendations generated from the above Lambda function:

This image shows the 3 recommendations generated by CodeGuru Profiler

Addressing a recommendation

Let’s dive further into an example recommendation. The first recommendation above notices that the Lambda function is spending more than the normal amount of runnable time (6.8% vs <1%) creating AWS SDK service clients. It recommends ensuring that the function doesn’t unnecessarily create AWS SDK service clients, which wastes CPU time.

This image shows the detailed recommendation for 'Recreation of AWS SDK service clients'

Based on the suggested resolution step, we made a quick and easy code change, moving the client creation outside of the lambda-handler function. This ensures that we don’t create unnecessary AWS SDK clients. The code change below shows how we would resolve the issue.

import boto3
...

# Create the AWS SDK service clients once, outside the handler, so they are
# reused across invocations instead of being recreated on every request.
s3_client = boto3.client('s3')
cw_client = boto3.client('cloudwatch')

def lambda_handler(event, context):
...

Reviewing your savings

After making each of the three changes recommended above by CodeGuru Profiler, look at the new flame graphs to see how the changes impacted the application’s profile. You’ll notice below that we no longer see the previously wide frames for boto3 clients, put_metric_data, or the logger in the S3 API call.

CPU Utilization Flame Graph:

This image shows the CPU Utilization flame graph after making the recommended changes

Latency Flame Graph:

This image shows the Latency flame graph after making the recommended changes

Moreover, we can run the Lambda function for one day (1,439 invocations) and review the results in Lambda Insights to understand our total savings. After every recommendation was addressed, this Lambda function, configured with 128 MB of memory and a 10-second timeout, used 10% less CPU time and showed lower maximum memory usage and network IO, leading to a 30% drop in GB-seconds. Decreasing GB-seconds leads to a 30% lower duration charge, as explained in AWS Lambda Pricing.
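As a rough illustration of how the duration charge is derived, the sketch below computes GB-seconds from invocation count, average duration, and memory size. The duration value and per-GB-second price are placeholder assumptions for illustration, not measured values from this test; check the current AWS Lambda pricing page for your Region.

# Rough GB-second cost sketch; all inputs are illustrative assumptions.
invocations = 1439             # one day of invocations, as in this example
avg_duration_s = 2.0           # assumed average billed duration per invocation
memory_gb = 128 / 1024         # 128 MB function memory
price_per_gb_s = 0.0000166667  # example x86 duration price; verify current pricing

gb_seconds = invocations * avg_duration_s * memory_gb
duration_cost = gb_seconds * price_per_gb_s
print(f"{gb_seconds:.1f} GB-s -> ${duration_cost:.6f} duration charge")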

Latency (before and after CodeGuru Profiler):

The graph below displays the duration change while running the Lambda function for 1 day.

This image shows the duration change while running the Lambda function for 1 day.

Cost (before and after CodeGuru Profiler):

The graph below displays the function cost change while running the lambda function for 1 day.

This image shows the function cost change while running the lambda function for 1 day.

Conclusion

This post showed how developers can easily onboard onto CodeGuru Profiler in order to improve the performance of their serverless applications built on AWS Lambda. Get started with CodeGuru Profiler by visiting the CodeGuru console.

All across Amazon, numerous teams have utilized CodeGuru Profiler’s technology to generate performance optimizations for customers. It has also reduced infrastructure costs, saving millions of dollars annually.

Onboard your Python applications onto CodeGuru Profiler by following the instructions on the documentation. If you’re interested in the Python agent, note that it is open-sourced on GitHub. Also for more demo applications using the Python agent, check out the GitHub repository for additional samples.

About the authors

Headshot of Vishaal

Vishaal is a Product Manager at Amazon Web Services. He enjoys getting to know his customers’ pain points and transforming them into innovative product solutions.
Headshot of Mirela

Mirela is working as a Software Development Engineer at Amazon Web Services as part of the CodeGuru Profiler team in London. Previously, she has worked for Amazon.com’s teams and enjoys working on products that help customers improve their code and infrastructure.
Headshot of Parag

Parag is a Software Development Engineer at Amazon Web Services as part of the CodeGuru Profiler team in Seattle. Previously, he was working for Amazon.com’s catalog and selection team. He enjoys working on large scale distributed systems and products that help customers build scalable, and cost-effective applications.

How to run massively multiplayer games with EC2 Spot using Aurora Serverless

Post Syndicated from Emma White original https://aws.amazon.com/blogs/compute/how-to-run-massively-multiplayer-games-with-ec2-spot-using-aurora-serverless/

This post is written by Yahav Biran, Principal Solutions Architect, and Pritam Pal, Sr. EC2 Spot Specialist SA

Massively multiplayer online (MMO) game servers must dynamically scale their compute and storage to create a world-scale persistence simulation with millions of dynamic objects, such as complex AR/VR synthetic environments that match real-world fidelity. Amazon Elastic Kubernetes Service (Amazon EKS) powered by Amazon EC2 Spot, together with Aurora Serverless, lets customers create world-scale persistence simulations processed by numerous cost-effective compute chipsets, such as ARM, x86, or NVIDIA GPU, and persist them on demand using automatically scaling open-source database engines like MySQL or PostgreSQL without managing any database capacity. This post proposes a fully open-sourced option to build, deploy, and manage MMOs on AWS. We use a Python-based game server to demonstrate the MMO.

Challenge

Increasing competition in the gaming industry has driven game developers to architect cost-effective game servers that can scale up to meet player demands and scale down to meet cost goals. AWS enables MMO developers to scale their game beyond the limits of a single server. The game state (world) can be spatially partitioned across many Regions based on the requested number of sessions or simulations using Amazon EC2 compute power. As the game progresses over many sessions, you must track the simulation’s global state. Amazon Aurora maintains the global game state in memory to manage complex interactions, such as hand-offs across instances and Regions. Amazon Aurora Serverless powers all of these for PostgreSQL or MySQL.

This blog shows how to use a commodity server with an ephemeral disk and decouple the game state from the game server. We store the game state in Aurora for PostgreSQL, but you can also use DynamoDB or Amazon Keyspaces for the NoSQL case.

 

Game overview

We use a Minecraft clone to demonstrate a distributed persistence simulation. The game server is Python-based and deployed on Agones, an open-source multiplayer dedicated game-server platform on Kubernetes. The Kubernetes cluster is powered by EC2 Spot Instances and configured to auto scale, expanding and shrinking the compute seamlessly upon game-server allocation. We add a GitOps-based continuous delivery system that stores the game-server binaries and config in a git repository and deploys the game to a cluster in one or more Regions to allow global compute scale. The following image is a diagram of the architecture.

The game server persists every object in an Aurora Serverless PostgreSQL-compatible edition. The serverless database configuration aids automatic start-up and scales capacity up or down as per player demand. The world is divided into 32×32 block chunks in the XYZ plane (Y is up). This allows it to be “infinite” (PostgreSQL Bigint type) and eases data management. Only visible chunks must be queried from the database.

The central database table is named “block” and has the columns p, q, x, y, z, w. (p, q) identifies the chunk, (x, y, z) identifies the block position, and (w) identifies the block type. 0 represents an empty block (air).

In the game, the chunks store their blocks in a hash map. An (x, y, z) key maps to a (w) value.

The y positions of blocks are limited to 0 <= y < 256. The upper limit is essentially an artificial limitation that prevents users from building tall structures. Users cannot destroy blocks at y = 0 to avoid falling underneath the world.
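To make the schema concrete, here is a minimal sketch of how a chunk could be loaded from the block table into the in-memory hash map described above, and how a single block could be upserted. It assumes a PostgreSQL connection via psycopg2, placeholder connection details, and a unique constraint on (p, q, x, y, z); the game's own persistence layer may differ.

# Minimal sketch: load one chunk (p, q) from the "block" table into a dict,
# and upsert a single block. Connection parameters are placeholder assumptions.
import psycopg2

conn = psycopg2.connect(host="aurora-endpoint", dbname="craft",
                        user="craft_user", password="example-password")

def load_chunk(p, q):
    blocks = {}
    with conn.cursor() as cur:
        cur.execute("SELECT x, y, z, w FROM block WHERE p = %s AND q = %s", (p, q))
        for x, y, z, w in cur.fetchall():
            blocks[(x, y, z)] = w   # (x, y, z) key maps to block type w
    return blocks

def save_block(p, q, x, y, z, w):
    # Assumes a unique constraint on (p, q, x, y, z) so ON CONFLICT applies.
    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO block (p, q, x, y, z, w) VALUES (%s, %s, %s, %s, %s, %s) "
            "ON CONFLICT (p, q, x, y, z) DO UPDATE SET w = EXCLUDED.w",
            (p, q, x, y, z, w))
    conn.commit()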

Solution overview

Kubernetes allows dedicated game-server scaling and orchestration without limiting the compute platform, spanning many Regions and staying closer to the player. For simplicity, we use EKS to reduce operational overhead.

Amazon EC2 runs the compute simulation, which might require different EC2 instance types. These include compute-optimized instances for compute-bound applications benefiting from high-performance processors, or accelerated compute (GPU) instances that use hardware accelerators to perform functions like graphics processing or data pattern matching. In addition, EC2 Auto Scaling runs the game server configured to use Amazon EC2 Spot Instances, allowing discounts of up to 90% compared to On-Demand Instance prices. However, Amazon EC2 can interrupt your Spot Instance when the demand for Spot Instances rises, when the supply of Spot Instances decreases, or when the Spot price exceeds your maximum price.

The following two mechanisms minimize the EC2 reclaim compute capacity impact:

  1. Pull interruption notifications and notify the game-server to replicate the session to another game server.
  2. Prioritize compute capacity based on availability.

Two Auto Scaling groups are deployed for method two. The first Auto Scaling group uses latest generation Spot Instances (C5, C5a, C5n, M5, M5a, and M5n) instances, and the second uses all generations x86-based instances (C4 and M4). We configure the cluster-autoscaler that controls the Auto Scaling group size with the Expander priority option in order to favor the latest generation Spot Auto Scaling group. The priority should be a positive value, and the highest value wins. For each priority value, a list of regular expressions should be given. The following example assumes an EKS cluster craft-us-west-2 and two ASGs. The craft-us-west-2-nodegroup-spot5 Auto Scaling group wins the priority. Therefore, new instances will be spawned from the EC2 Spot Auto Scaling group.

.…
- --expander=priority
- --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/craft-us-west-2
….
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler-priority-expander
  namespace: kube-system
data:
  priorities: |-
    10:
      - .*craft-us-west-2-nodegroup-spot4*
    50:
      - .*craft-us-west-2-nodegroup-spot5*
---

The complete working spec is available in https://github.com/aws-samples/spotable-game-server/blob/master/specs/cluster_autoscaler.yml.

We propose the following two options to minimize player impact during interruption. The first is based on two-minute interruption notification, and the second on rebalance recommendations.

  1. Notify that the instance will be shut down within two minutes.
  2. Notify when a Spot Instance is at elevated risk of interruption; this signal can arrive sooner than the two-minute Spot Instance interruption notice.

Choosing between the two depends on the transition you want for the player. Both cases deploy a DaemonSet that listens to the relevant notification and notifies every game server running on the EC2 instance. It also prevents new game servers from running on this instance.

The DaemonSet polls the instance metadata every five seconds (denoted by POLL_INTERVAL), as follows:

while http_status=$(curl -o /dev/null -w '%{http_code}' -sL ${NOTICE_URL}); [ ${http_status} -ne 200 ]; do
  echo $(date): ${http_status}
  sleep ${POLL_INTERVAL}
done

where NOTICE_URL can be either

NOTICE_URL="http://169.254.169.254/latest/meta-data/spot/termination-time"

Or, for the second option:

NOTICE_URL="http://169.254.169.254/latest/meta-data/events/recommendations/rebalance"

The command that notifies all the game-servers about the interruption is:

kubectl drain ${NODE_NAME} --force --ignore-daemonsets --delete-local-data

From that point, every game server that runs on the instance gets notified by the Unix signal SIGTERM.

In our example server.py, we tell the OS to signal the Python process and complete the sig_handler function. The example prompts a message to every connected player regarding the incoming interruption.

def sig_handler(signum, frame):
  log('Signal handler called with signal',signum)
  model.send_talk("WARN game server maintenance is pending - your universe is saved")

def main():
    ..
    signal.signal(signal.SIGTERM,sig_handler)

 

Why Agones?

Agones orchestrates game servers via declarative configuration in order to manage groups of ready game-servers to play. It also offers an integrated SDK for managing game server lifecycle, health, and configuration. Finally, it runs on Kubernetes, so it is an all-up open-source platform that runs anywhere. The Agones SDK is easily implemented. Furthermore, combining the AWS compute platform and Agones offers a secure, resilient, scalable, and cost-effective method for running an MMO.

 

In our example, we implemented the /health call in agones_health and the /allocate call in agones_allocate. The agones_health() function is called upon server init to indicate that it is ready to assign new players.

def agones_allocate(model):
  url="http://localhost:"+agones_port+"/allocate"
  req = urllib2.Request(url)
  req.add_header('Content-Type','application/json')
  req.add_data('')
  r = urllib2.urlopen(req)
  resp=r.getcode()
  log('agones- Response code from agones allocate was:',resp)
  model.send_talk("new player joined - reporting to agones the server is allocated") 

The agones_health() function uses the native health check and relays the game server’s health to keep it in a viable state.

def agones_health(model):
  url="http://localhost:"+agones_port+"/health"
  req = urllib2.Request(url)
  req.add_header('Content-Type','application/json')
  req.add_data('')
  while True:
    model.ishealthy()
    r = urllib2.urlopen(req)
    resp=r.getcode()
    log('agones- Response code from agones health was:',resp)
    time.sleep(10)

The main() function forks a new process that reports health. Agones manages the port allocation and maintains the game-server state, e.g., Allocated, Scheduled, Shutdown, Creating, and Unhealthy.

def main():
    …
    server = Server((host, port), Handler)
    server.model = model
    newpid=os.fork()
    if newpid ==0:
      log('agones-in child process about to call agones_health()')
      agones_health()
      log('agones-in child process called agones_health()')
    else:
      pids = (os.getpid(), newpid)
      log('agones server pid and health pid',pids)
    log('SERV', host, port)
    server.serve_forever()

 

Other ways than Agones on EKS?

Configuring an Agones group of ready game-servers behind a load balancer is difficult. Agones game-server endpoints must be published so that players’ clients can connect and play. Agones creates game-server endpoints that are an IP and port pair. The IP is the public IP of the EC2 instance. The port results from PortPolicy, which generates a non-predictable port number. This makes it impossible to use a load balancer such as Amazon Network Load Balancer (NLB).

Suppose you want to use a load balancer to route players via a predictable endpoint. You could use the Kubernetes Deployment construct and configure a Kubernetes Service construct with an NLB, with no need to implement an additional SDK or install any additional components on your EKS cluster.

The following example defines a service craft-svc that creates an NLB that will route TCP connections to Pod targets carrying the selector craft and listening on port 4080.

apiVersion: v1
kind: Service
metadata:
  name: craft-svc
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
spec:
  selector:
    app: craft
  ports:
    - protocol: TCP
      port: 4080
      targetPort: 4080
  type: LoadBalancer

The game server Deployment sets the metadata label for the Service load balancer and the port.

Furthermore, the Kubernetes readinessProbe and livenessProbe offer features similar to the Agones SDK /allocate and /health calls implemented prior, bringing the Deployment option to parity with the Agones option.

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    app: craft
  name: craft
spec:
…
    metadata:
      labels:
        app: craft
    spec:
      …
        readinessProbe:
          tcpSocket:
            port: 4080
          initialDelaySeconds: 5
          periodSeconds: 10
        livenessProbe:
          tcpSocket:
            port: 4080
          initialDelaySeconds: 5
          periodSeconds: 10

Overview of the Database

The compute layer running the game could be reclaimed at any time within minutes, so it is imperative to continuously store the game state as players progress without impacting the experience. Furthermore, it is essential to read the game state quickly from scratch upon session recovery from an unexpected interruption. Therefore, the game requires fast reads and lazy writes. More considerations are consistency and isolation. Developers could handle inconsistency via the game in order to relax hard consistency requirements. As for isolation, in our Craft example players can build a structure without other players seeing it and publish it globally only when they choose.

Choosing the proper database for the MMO depends on the game state structure, e.g., a set of related objects such as our Craft example, or a single denormalized table. The former fits the relational model used with RDS or Aurora open-source databases such as MySQL or PostgreSQL, while the latter can be used with Amazon Keyspaces, a managed Cassandra-compatible service, or a key-value store such as DynamoDB. Our example includes two options to store the game state, with Aurora Serverless for PostgreSQL or DynamoDB. We chose those because of their ACID support. PostgreSQL offers four transaction isolation levels: Read Uncommitted, Read Committed, Repeatable Read, and Serializable, which guard against phenomena such as dirty reads, nonrepeatable reads, phantom reads, and serialization anomalies. DynamoDB offers two isolation levels: serializable and read-committed. Both database options allow the game developer to implement the best player experience and avoid additional implementation effort.

Moreover, both engines offer Restful connection methods to the database. Aurora uses Data API. The Data API doesn’t require a persistent DB cluster connection. Instead, it provides a secure HTTP endpoint and integration with AWS SDKs. Game developers can run SQL statements via the endpoint without managing connections. DynamoDB only supports a Restful connection. Finally, Aurora Serverless and DynamoDB scale the compute and storage to reduce operational overhead and pay only for what was played.
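To illustrate the connection-less access pattern the Data API enables, here is a minimal sketch using the boto3 rds-data client to query a chunk’s blocks; the cluster ARN, secret ARN, and database name are placeholder assumptions.

# Minimal Data API sketch: query the visible blocks of a chunk without
# managing a persistent database connection. ARNs and names are placeholders.
import boto3

rds_data = boto3.client("rds-data")

response = rds_data.execute_statement(
    resourceArn="arn:aws:rds:us-west-2:111111111111:cluster:craft-cluster",
    secretArn="arn:aws:secretsmanager:us-west-2:111111111111:secret:craft-db-secret",
    database="craft",
    sql="SELECT x, y, z, w FROM block WHERE p = :p AND q = :q",
    parameters=[
        {"name": "p", "value": {"longValue": 0}},
        {"name": "q", "value": {"longValue": 0}},
    ],
)
for row in response["records"]:
    x, y, z, w = (col["longValue"] for col in row)
    # Each row is one non-empty block in the requested chunk.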

 

Conclusion

MMOs are unique because they require infrastructure features similar to other game types like FPS or casual games, and reliable persistence storage for the game-state. Unfortunately, this leads to expensive choices that make monetizing the game difficult. Therefore, we proposed an option with a fun game to help you, the developer, analyze what is best for your MMO. Our option is built upon open-source projects that allow you to build it anywhere, but we also show that AWS offers the most cost-effective and scalable option. We encourage you to read recent announcements about these topics, including several at AWS re:Invent 2021.

 

Understanding Amazon Machine Images for Red Hat Enterprise Linux with Microsoft SQL Server

Post Syndicated from Emma White original https://aws.amazon.com/blogs/compute/understanding-amazon-machine-images-for-red-hat-enterprise-linux-with-microsoft-sql-server/

This post is written by Kumar Abhinav, Sr. Product Manager EC2, and David Duncan, Principal Solution Architect. 

Customers now have access to AWS license-included Amazon Machine Images (AMI) for hosting their SQL Server workloads with Red Hat Enterprise Linux (RHEL). With these AMIs, customers can easily build highly available, reliable, and performant Microsoft SQL Server 2017 and 2019 clusters with RHEL 7 and 8, with a simple pay-as-you-go pricing model that doesn’t require long-term commitments or upfront costs. By switching from Windows Server to RHEL to run SQL Server workloads, customers can reduce their operating system licensing costs and further lower total cost of ownership. This blog post provides a deep dive into how to deploy SQL Server on RHEL using these new AMIs, how to tune instances for performance, and how to reduce licensing costs with RHEL.

Overview

The RHEL AMIs with SQL Server are customized for the following editions of SQL Server 2019 or SQL Server 2017:

  • Express/Web Edition is an entry-level database for small web and mobile apps.
  • Standard Edition is a full-featured database designed for medium-sized applications and data marts.
  • Enterprise Edition is a mission-critical database built for high-performing, intelligent apps.

To maintain high availability (HA), you can bring the same HA configuration from an on-premises Enterprise Edition deployment to AWS by combining SQL Server Availability Groups with the Red Hat Enterprise Linux HA add-on. The HA add-on provides a lightweight cluster management portfolio that runs across multiple AWS Availability Zones to eliminate single points of failure and deliver timely recovery when needed.

These AMIs include all the software packages required to run SQL Server on RHEL along with the most recent updates and security patches. By removing some of the heavy lifting around deployment, you can deploy SQL Server instances faster to accommodate growth and service events.

The following architecture diagram shows the necessary building blocks for SQL Server Enterprise HA configuration on RHEL.

instance configuration

To build the RHEL 7 and RHEL 8 AMIs, we focused on requirements from the Red Hat community of practice for SQL Server, which includes code, documentation, playbooks, and other artifacts relating to deployment of SQL Server on RHEL. We used Amazon EC2 Image Builder for installing and updating Microsoft SQL Server. For custom configuration or for installing any additional software, you can use the base machine image and extend it using EC2 Image Builder. If you have different requirements, you can build your own images using the Red Hat Image Builder.

Tuning for virtual performance and compatibility with storage options

It is a common practice to apply pre-built performance profiles for SQL Server deployments when running on-premises. However, you do not need to apply the operating system performance tuning profiles for SQL Server to optimize the performance on Amazon EC2. The AMIs include the virtual-guest tuning from Red Hat along with additional optimizations for the EC2 environment. For example, the images include the timeout for NVMe IO operations set to the maximum possible value for an experience that is more consistent with the way EBS volumes are managed. Database administrators can further configure workload specific tuning parameters such as paging, swapping, and memory pressure using the Microsoft SQL Server performance best practices guidelines.

SQL Server availability groups help achieve HA and improve the read performance of your database cluster. However, this approach only improves availability at the database layer. RHEL with HA further improves the availability of a SQL Server cluster by providing service failover capabilities at the operating system layer. You can easily build a highly available database cluster, as shown in the following figure, by using the RHEL with SQL Server and HA add-on AMI on an instance of your choice across multiple Availability Zones.

 

High availability SQL Cluster built on top of RHEL HA

When it comes to storage, AWS offers many different choices. Amazon Elastic Block Store (Amazon EBS) offers Provisioned IOPS volumes for specific performance requirements, where you know you need a specific level of performance for database operations. Provisioned IOPS volumes are an excellent option when a general-purpose volume doesn’t deliver the level of I/O operations necessary for your production database. EBS volumes add the flexibility you need to increase your volume storage space and performance through API calls. With Amazon EBS, you can also use additional data volumes directly attached to your instance or leverage multiple instance store volumes for performance targets. Volume IOPS are optimized to be sustainable even if they climb into the thousands, and your maximum IOPS does not decrease.

Storing data on secondary volumes improves performance of your database

Provisioning only one large root EBS volume for the database storage and sharing that with the operating system and any logging, management tools, or monitoring processes is not a well-architected solution. That shared activity reduces the bandwidth and operational performance of the database workloads. On the other hand, using practices like separating the database workloads onto separate EBS volumes or leveraging instance store volumes work well for use cases like storing a large number of temp tables. By separating the volumes by specialized activities, the performance of each component is independently manageable. Profile your utilization to choose the right combination of EBS and instance storage options for your workload.

Lower Total Cost of Ownership

Another benefit of using RHEL AMIs with SQL Server is cost savings. When you move from Windows Server to RHEL to run SQL Server, you can reduce the operating system licensing costs. Windows Server virtual machines are priced per core and hence you pay more for virtual machines with large numbers of cores. In other words, the more virtual machine cores you have, the more you pay in software license fees. On the other hand, RHEL has just two pricing tiers. One is for virtual machines with fewer than four cores and the other is for virtual machines with four cores or more. You pay the same operating system software subscription fees whether you choose a virtual machine with two or three virtual cores. Similarly, for larger workloads, your operating system software subscription costs are the same no matter whether you choose a virtual machine with eight or 16 virtual cores.

In addition, with the elasticity of EC2, you can save costs by sizing your starting workloads accordingly and later resizing your instances to prevent over provisioning compute resources when workloads experience uneven usage patterns, such as month end business reporting and batch programs. You can choose to use On-Demand Instances or use Savings Plans to build flexible pricing for long-term compute costs effectively. With AWS, you have the flexibility to right-size your instances and save costs without compromising business agility.

Conclusion

These new RHEL with SQL Server AMIs on Amazon EC2 are pre-configured and optimized to reduce undifferentiated heavy lifting. Customers can easily build highly available, reliable, and performant database clusters using RHEL with SQL Server along with the HA add-on and Provisioned IOPS EBS volumes. To get started, search for RHEL with SQL Server in the Amazon EC2 Console or find it in the AWS Marketplace. To learn more about Red Hat Enterprise Linux on EC2, check out the frequently asked questions page.

 

Deploy data lake ETL jobs using CDK Pipelines

Post Syndicated from Ravi Itha original https://aws.amazon.com/blogs/devops/deploying-data-lake-etl-jobs-using-cdk-pipelines/

Many organizations are building data lakes on AWS, which provides the most secure, scalable, comprehensive, and cost-effective portfolio of services. Like any application development project, a data lake must answer a fundamental question: “What is the DevOps strategy?” Defining a DevOps strategy for a data lake requires extensive planning and multiple teams. This typically requires multiple development and test cycles before maturing enough to support a data lake in a production environment. If an organization doesn’t have the right people, resources, and processes in place, this can quickly become daunting.

What if your data engineering team uses basic building blocks to encapsulate data lake infrastructure and data processing jobs? This is where CDK Pipelines brings the full benefit of infrastructure as code (IaC). CDK Pipelines is a high-level construct library within the AWS Cloud Development Kit (AWS CDK) that makes it easy to set up a continuous deployment pipeline for your AWS CDK applications. The AWS CDK provides essential automation for your release pipelines so that your development and operations team remain agile and focus on developing and delivering applications on the data lake.

In this post, we discuss a centralized deployment solution utilizing CDK Pipelines for data lakes. This implements a DevOps-driven data lake that delivers benefits such as continuous delivery of data lake infrastructure, data processing, and analytical jobs through a configuration-driven multi-account deployment strategy. Let’s dive in!

Data lakes on AWS

A data lake is a centralized repository where you can store all of your structured and unstructured data at any scale. Store your data as is, without having to first structure it, and run different types of analytics—from dashboards and visualizations to big data processing, real-time analytics, and machine learning in order to guide better decisions. To further explore data lakes, refer to What is a data lake?

We design a data lake with the following elements:

  • Secure data storage
  • Data cataloging in a central repository
  • Data movement
  • Data analysis

The following figure represents our data lake.

Data Lake on AWS

We use three Amazon Simple Storage Service (Amazon S3) buckets:

  • raw – Stores the input data in its original format
  • conformed – Stores the data that meets the data lake quality requirements
  • purpose-built – Stores the data that is ready for consumption by applications or data lake consumers

The data lake has a producer that ingests data into the raw bucket at periodic intervals. We utilize the following tools: AWS Glue processes and analyzes the data; the AWS Glue Data Catalog persists metadata in a central repository; AWS Lambda and AWS Step Functions schedule and orchestrate AWS Glue extract, transform, and load (ETL) jobs; and Amazon Athena is used for interactive queries and analysis. Finally, we engage various AWS services for logging, monitoring, security, authentication, authorization, alerting, and notification.

A common data lake practice is to have multiple environments such as dev, test, and production. Applying the IaC principle for data lakes brings the benefit of consistent and repeatable runs across multiple environments, self-documenting infrastructure, and greater flexibility with resource management. The AWS CDK offers high-level constructs for use with all of our data lake resources. This simplifies usage and streamlines implementation.

Before exploring the implementation, let’s gain further scope of how we utilize our data lake.

The solution

Our goal is to implement a CI/CD solution that automates the provisioning of data lake infrastructure resources and deploys ETL jobs interactively. We accomplish this as follows: 1) applying separation of concerns (SoC) design principle to data lake infrastructure and ETL jobs via dedicated source code repositories, 2) a centralized deployment model utilizing CDK pipelines, and 3) AWS CDK enabled ETL pipelines from the start.

Data lake infrastructure

Our data lake infrastructure provisioning includes Amazon S3 buckets, S3 bucket policies, AWS Key Management Service (KMS) encryption keys, Amazon Virtual Private Cloud (Amazon VPC), subnets, route tables, security groups, VPC endpoints, and secrets in AWS Secrets Manager. The following diagram illustrates this.

Data Lake Infrastructure

Data lake ETL jobs

For our ETL jobs, we process New York City TLC Trip Record Data. The following figure displays our ETL process, wherein we run two ETL jobs within a Step Functions state machine.

AWS Glue ETL Jobs

Here are a few important details:

  1. A file server uploads files to the S3 raw bucket of the data lake. The file server is a data producer and source for the data lake. We assume that the data is pushed to the raw bucket.
  2. Amazon S3 triggers an event notification to the Lambda function.
  3. The function inserts an item in the Amazon DynamoDB table in order to track the file processing state. The first state written indicates the AWS Step Function start.
  4. The function starts the state machine (a minimal sketch of steps 3 and 4 appears after this list).
  5. The state machine runs an AWS Glue job (Apache Spark).
  6. The job processes input data from the raw zone to the data lake conformed zone. The job also converts CSV input data to Parquet formatted data.
  7. The job updates the Data Catalog table with the metadata of the conformed Parquet file.
  8. A second AWS Glue job (Apache Spark) processes the input data from the conformed zone to the purpose-built zone of the data lake.
  9. The job fetches ETL transformation rules from the Amazon S3 code bucket and transforms the input data.
  10. The job stores the result in Parquet format in the purpose-built zone.
  11. The job updates the Data Catalog table with the metadata of the purpose-built Parquet file.
  12. The job updates the DynamoDB table and updates the job status to completed.
  13. An Amazon Simple Notification Service (Amazon SNS) notification is sent to subscribers that states the job is complete.
  14. Data engineers or analysts can now analyze data via Athena.
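The sketch below shows how steps 3 and 4 of this flow might look inside the Lambda function, writing the tracking item to DynamoDB and then starting the state machine. The table name, state machine ARN, and item attributes are placeholder assumptions rather than the exact implementation in the starter kit.

# Minimal sketch of the S3-triggered Lambda: record the file's processing state
# in DynamoDB, then start the Step Functions state machine. Names are placeholders.
import json
import boto3

dynamodb = boto3.resource("dynamodb")
sfn = boto3.client("stepfunctions")

STATE_TABLE = "etl-file-processing-state"                                      # assumed table name
STATE_MACHINE_ARN = "arn:aws:states:us-east-1:111111111111:stateMachine:etl"   # assumed ARN

def lambda_handler(event, context):
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]

    # Step 3: track the file processing state
    dynamodb.Table(STATE_TABLE).put_item(
        Item={"file_key": key, "bucket": bucket, "status": "STEP_FUNCTION_STARTED"}
    )

    # Step 4: start the state machine that runs the two AWS Glue jobs
    sfn.start_execution(
        stateMachineArn=STATE_MACHINE_ARN,
        input=json.dumps({"bucket": bucket, "key": key}),
    )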

We discuss data formats, AWS Glue jobs, ETL transformation logic, data cataloging, auditing, notification, orchestration, and data analysis in more detail in the AWS CDK Pipelines for Data Lake ETL Deployment GitHub repository, which is referenced in a later section.

Centralized deployment

Now that we have data lake infrastructure and ETL jobs ready, let’s define our deployment model. This model is based on the following design principles:

  • A dedicated AWS account to run CDK pipelines.
  • One or more AWS accounts into which the data lake is deployed.
  • The data lake infrastructure has a dedicated source code repository. Typically, data lake infrastructure is a one-time deployment and rarely evolves. Therefore, a dedicated code repository provides a landing zone for your data lake.
  • Each ETL job has a dedicated source code repository. Each ETL job may have unique AWS service, orchestration, and configuration requirements. Therefore, a dedicated source code repository will help you more flexibly build, deploy, and maintain ETL jobs.

We organize our source code repo into three branches: dev (main), test, and prod. In the deployment account, we manage three separate CDK Pipelines and each pipeline is sourced from a dedicated branch. Here we choose a branch-based software development method in order to demonstrate the strategy in more complex scenarios where integration testing and validation layers require human intervention. As well, these may not immediately follow with a corresponding release or deployment due to their manual nature. This facilitates the propagation of changes through environments without blocking independent development priorities. We accomplish this by isolating resources across environments in the central deployment account, allowing for the independent management of each environment, and avoiding cross-contamination during each pipeline’s self-mutating updates. The following diagram illustrates this method.

Centralized deployment

 

Note: This centralized deployment strategy can be adopted for trunk-based software development with minimal solution modification.

Deploying data lake ETL jobs

The following figure illustrates how we utilize CDK Pipelines to deploy data lake infrastructure and ETL jobs from a central deployment account. This model follows standard nomenclature from the AWS CDK. Each repository represents a cloud infrastructure code definition. This includes the pipelines construct definition. Pipelines have one or more actions, such as cloning the source code (source action) and synthesizing the stack into an AWS CloudFormation template (synth action). Each pipeline has one or more stages, such as testing and deploying. In an AWS CDK app context, the pipelines construct is a stack like any other stack. Therefore, when the AWS CDK app is deployed, a new pipeline is created in AWS CodePipeline.

This provides incredible flexibility regarding DevOps. In other words, as a developer with an understanding of AWS CDK APIs, you can harness the power and scalability of AWS services such as CodePipeline, AWS CodeBuild, and AWS CloudFormation.

Deploying data lake ETL jobs using CDK Pipelines

Here are a few important details:

  1. The DevOps administrator checks in the code to the repository.
  2. The DevOps administrator (with elevated access) facilitates a one-time manual deployment on a target environment. Elevated access includes administrative privileges on the central deployment account and target AWS environments.
  3. CodePipeline periodically listens to commit events on the source code repositories. This is the self-mutating nature of CDK Pipelines: the pipeline is configured to work with, and can update itself according to, the provided definition.
  4. Code changes made to the main repo branch are automatically deployed to the data lake dev environment.
  5. Code changes to the repo test branch are automatically deployed to the test environment.
  6. Code changes to the repo prod branch are automatically deployed to the prod environment.
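To make the pipelines construct relationships described above concrete, here is a minimal Python sketch of a CDK Pipelines stack with a GitHub source action and a synth action. The repository name, branch, account, Region, and stage contents are placeholder assumptions and do not mirror the starter kits exactly; the GitHub source also expects an access token configured in Secrets Manager.

# Minimal CDK Pipelines sketch (CDK v2, Python). Repo, account, and Region are placeholders.
import aws_cdk as cdk
from aws_cdk import pipelines
from constructs import Construct

class DataLakeStage(cdk.Stage):
    def __init__(self, scope: Construct, id: str, **kwargs) -> None:
        super().__init__(scope, id, **kwargs)
        # Stacks that provision data lake resources (S3, VPC, Glue, etc.) go here.

class DataLakePipelineStack(cdk.Stack):
    def __init__(self, scope: Construct, id: str, **kwargs) -> None:
        super().__init__(scope, id, **kwargs)
        pipeline = pipelines.CodePipeline(
            self, "Pipeline",
            synth=pipelines.ShellStep(
                "Synth",
                # Source action: clone the infrastructure repo (placeholder name/branch)
                input=pipelines.CodePipelineSource.git_hub("my-org/data-lake-infra", "main"),
                # Synth action: synthesize the CDK app into CloudFormation templates
                commands=["pip install -r requirements.txt", "npx cdk synth"],
            ),
        )
        # Deploy stage: add one stage per target environment
        pipeline.add_stage(DataLakeStage(
            self, "Dev",
            env=cdk.Environment(account="111111111111", region="us-east-1"),
        ))

app = cdk.App()
DataLakePipelineStack(app, "DataLakePipelineStack",
                      env=cdk.Environment(account="111111111111", region="us-east-1"))
app.synth()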

CDK Pipelines starter kits for data lakes

Want to get going quickly with CDK Pipelines for your data lake? Start by cloning our two GitHub repositories. Here is a summary:

AWS CDK Pipelines for Data Lake Infrastructure Deployment

This repository contains the following reusable resources:

  • CDK Application
  • CDK Pipelines stack
  • CDK Pipelines deploy stage
  • Amazon VPC stack
  • Amazon S3 stack

It also contains the following automation scripts:

  • AWS environments configuration
  • Deployment account bootstrapping
  • Target account bootstrapping
  • Account secrets configuration (e.g., GitHub access tokens)

AWS CDK Pipelines for Data Lake ETL Deployment

This repository contains the following reusable resources:

  • CDK Application
  • CDK Pipelines stack
  • CDK Pipelines deploy stage
  • Amazon DynamoDB stack
  • AWS Glue stack
  • AWS Step Functions stack

It also contains the following:

  • AWS Lambda scripts
  • AWS Glue scripts
  • AWS Step Functions State machine script

Advantages

This section summarizes some of the advantages offered by this solution.

Scalable and centralized deployment model

We utilize a scalable and centralized deployment model to deliver end-to-end automation. This allows DevOps and data engineers to apply the single responsibility principle while maintaining precise control over the deployment strategy and code quality. The model can readily be expanded to more accounts, and the pipelines are responsive to custom controls within each environment, such as a production approval layer.

Configuration-driven deployment

Configuration in the source code and AWS Secrets Manager allows deployments to use targeted values that are declared globally in a single location. This provides consistent management of global configurations and dependencies such as resource names, AWS account IDs, Regions, and VPC CIDR ranges. Similarly, CDK Pipelines exports CloudFormation stack outputs for later consumption by other resources.
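A minimal sketch of this pattern is shown below; it assumes a hypothetical configuration module and a Secrets Manager secret named data-lake/github-token that stores a JSON document with a token key. Both names are placeholders, not the starter kits' actual values.

import json
import boto3

# Globally declared, environment-keyed configuration (illustrative values only).
GLOBAL_CONFIG = {
    "dev":  {"account": "111111111111", "region": "us-east-1", "vpc_cidr": "10.20.0.0/16"},
    "test": {"account": "222222222222", "region": "us-east-1", "vpc_cidr": "10.21.0.0/16"},
    "prod": {"account": "333333333333", "region": "us-east-1", "vpc_cidr": "10.22.0.0/16"},
}


def get_environment_config(environment: str) -> dict:
    """Return the globally declared settings for one target environment."""
    return GLOBAL_CONFIG[environment]


def get_github_token(secret_name: str = "data-lake/github-token") -> str:
    """Read a repository access token from AWS Secrets Manager.

    The secret name and the JSON key are placeholders; the starter kits may
    store and name their secrets differently.
    """
    client = boto3.client("secretsmanager")
    response = client.get_secret_value(SecretId=secret_name)
    return json.loads(response["SecretString"])["token"]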

Repeatable and consistent deployment of new ETL jobs

Continuous integration and continuous delivery (CI/CD) pipelines allow teams to deploy to production more frequently. Code changes can be safely and securely propagated through environments and released for deployment. This allows rapid iteration on data processing jobs, and these jobs can be changed in isolation from pipeline changes, resulting in reliable workflows.

Cleaning up

You can delete the resources provisioned by the starter kits by running the cdk destroy command with the AWS CDK Toolkit. For detailed instructions, refer to the Clean up sections in the starter kit README files.

Conclusion

In this post, we showed how to utilize CDK Pipelines to deploy infrastructure and data processing ETL jobs of your data lake in dev, test, and production AWS environments. We provided two GitHub repositories for you to test and realize the full benefits of this solution firsthand. We encourage you to fork the repositories, bring your ETL scripts, bootstrap your accounts, configure account parameters, and continuously deliver your data lake ETL jobs.

Let’s stay in touch via the GitHub repositories: AWS CDK Pipelines for Data Lake Infrastructure Deployment and AWS CDK Pipelines for Data Lake ETL Deployment.


About the authors

Ravi Itha

Ravi Itha is a Sr. Data Architect at AWS. He works with customers to design and implement Data Lakes, Analytics, and Microservices on AWS. He is an open-source committer and has published more than a dozen solutions using AWS CDK, AWS Glue, AWS Lambda, AWS Step Functions, Amazon ECS, Amazon MQ, Amazon SQS, Amazon Kinesis Data Streams, and Amazon Kinesis Data Analytics for Apache Flink. His solutions can be found at his GitHub handle. Outside of work, he is passionate about books, cooking, movies, and yoga.

Isaiah Grant

Isaiah Grant is a Cloud Consultant at 2nd Watch. His primary function is to design architectures and build cloud-based applications and services. He leads customer engagements and helps customers with enterprise cloud adoptions. In his free time, he is engaged in local community initiatives and enjoys being outdoors with his family.

Zahid Ali

Zahid Ali is a Data Architect at AWS. He helps customers design, develop, and implement data warehouse and Data Lake solutions on AWS. Outside of work he enjoys playing tennis, spending time outdoors, and traveling.

Advanced Amazon Pinpoint Templates using Message Template Helpers

Post Syndicated from davelem original https://aws.amazon.com/blogs/messaging-and-targeting/advanced-amazon-pinpoint-templates-using-message-template-helpers/

Personalization of customer messages is a proven way to increase engagement of promotional and transactional communications. In order to make these communications repeatable and scalable, building personalization through templates is often required.

Using the Advanced Template Capabilities feature of Amazon Pinpoint, it’s now possible to create highly customized templates used for email, SMS, and Push Notifications.

Pinpoint templates are personalized using Handlebars.js. The new message template helpers are an expansion of the default Handlebars.js features. Refer to handlebarsjs.com for more information on the default functionality of Handlebars.js.

In this blog we will build an Order Confirmation template that demonstrates several of the message template helpers, including inline partials, conditional helpers, string helpers, and math and encoding helpers.

Prerequisites

Before creating a template, you need to have an existing Amazon Pinpoint Project with the email channel enabled. The following will walk you through creating a project if you don’t already have one: Create and configure a Pinpoint Project

Architecture Overview

Step 1: Create a CSV file with sample endpoint data and import it as a segment

In Amazon Pinpoint, an endpoint represents a specific method of contacting a customer. This could be their email address (for email messages) or their phone number (for SMS messages) or a custom endpoint type. Endpoints can also contain custom attributes, and you can associate multiple endpoints with a single user. In this step, we create a simple CSV file which we will use to create a Segment in Pinpoint.

The data below contains the sample Order and Product data we will use in our Order Confirmation Email.

  1. Create a .CSV file named AdvancedTemplatesSegment.csv with the following data:
    ChannelType,Address,Id,Attributes.FirstName,Attributes.LastName,Attributes.OrderDate,Attributes.OrderNumber,Attributes.ProductNumber,Attributes.ProductNumber,Attributes.ProductNumber,Attributes.ProductNumber,Attributes.ProductNumber,Attributes.ProductName,Attributes.ProductName,Attributes.ProductName,Attributes.ProductName,Attributes.ProductName,Attributes.Amount,Attributes.Amount,Attributes.Amount,Attributes.Amount,Attributes.Amount,Attributes.ItemCount,Attributes.ItemCount,Attributes.ItemCount,Attributes.ItemCount,Attributes.ItemCount,Attributes.CLVTier,User.UserId,Metrics.Age
    EMAIL,[email protected],287b3858-3097-40e3-9af4-19bd4509a8f2,Mary,Smith,2021-01-15T18:07:13Z,460-ITS-2320,DWG8799598,XTC5517773,XRO7471152,EAT5122843,LNP9056489,non lectus aliquam,sapien placerat ante,semper sapien a libero nam dui,vitae consectetuer eget rutrum,nisl ut volutpat sapien arcu,68.88,32.89,53.19,45.38,47.31,20,76,33,15,53,High,66af7a81-77f2-485f-b115-d8c3a00f7077,84
    EMAIL,[email protected],b42e6c5f-3e15-4fdd-b61c-499508271082,,,2021-01-30T22:33:22Z,296-OZA-6579,VMC0637283,RGM6575767,BTM9430068,XCV9343127,GVU2858284,a libero nam dui,sit amet consectetuer adipiscing,at ipsum,ut dolor morbi,nullam molestie,74.86,83.18,15.42,97.03,37.42,13,94,50,54,84,Low,dadc1be9-daf4-46ce-9069-13565e03eaa0,61

    NOTE: The file above has a few attributes that are the key to personalizing our email and including multiple items in our Order Confirmation table:

    • Attributes.FirstName – This will allow us to personalize with a salutation if available.
    • Attributes.CLVTier – This is an attribute that could be supplied by a Machine Learning model to determine the customer’s CLV Tier. We will be using it to provide coupons specific to a given CLV Tier. See Predictive Segmentation Using Amazon Pinpoint and Amazon SageMaker for an example solution that demonstrates using Machine Learning to analyze information in Pinpoint.
    • Attributes.ProductNumber – Note that we have multiple columns that repeat for the product information in the order. Pinpoint attributes are actually stored as a list, so if you pass multiple columns with the same name, items are added to the attribute list. This is the key to how we are able to display a table of information, but note that it does require making sure the attributes are aligned in the proper columns. For example, Attributes.ProductNumber[0] needs to align with Attributes.ProductName[0]. See Using variables with message template helpers for more details.
  2. Search for [email protected] above and replace with two valid email addresses. Note that if your account is still in the sandbox these will need to be verified email addresses. If you only have access to a single email address you can use labels by adding a plus sign (+) followed by a string of text after the local part of the address and before the at (@) sign. For example: [email protected] and [email protected]
  3. Create a Pinpoint Segment
    1. Open the Amazon Pinpoint console at http://console.aws.amazon.com/pinpoint, and then choose the project that you created as part of the Prerequisites.
    2. In the navigation pane, choose Segments, and then choose Create a segment.
    3. Select Import a Segment.
    4. Browse to or Drag and Drop the .CSV file you created in the previous step.
    5. Use the default Segment Name and select Create Segment.
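If you prefer to script the segment import instead of using the console steps above, a boto3 sketch like the following can create an import job. It assumes you have uploaded the CSV to an S3 bucket and created an IAM role that Amazon Pinpoint can assume to read it; the project ID, bucket, Region, and role ARN are placeholders.

import boto3

pinpoint = boto3.client("pinpoint", region_name="us-east-1")  # use your project's Region

response = pinpoint.create_import_job(
    ApplicationId="YOUR_PINPOINT_PROJECT_ID",  # placeholder project (application) ID
    ImportJobRequest={
        "Format": "CSV",
        "S3Url": "s3://your-bucket/AdvancedTemplatesSegment.csv",                   # placeholder bucket
        "RoleArn": "arn:aws:iam::111111111111:role/PinpointSegmentImportRole",      # placeholder role
        "DefineSegment": True,
        "SegmentName": "AdvancedTemplatesSegment",
        "RegisterEndpoints": True,
    },
)
print(response["ImportJobResponse"]["Id"], response["ImportJobResponse"]["JobStatus"])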

Step 2: Build The Message Template

  1. Open the Amazon Pinpoint console at http://console.aws.amazon.com/pinpoint.
  2. In the navigation pane, choose Message templates, and then choose Create template.
  3. Select Email as the Channel.
  4. For Template name use: AdvancedTemplateExample.
  5. For Subject use: AdvancedTemplateExample.
  6. Paste the following code into the HTML Editor. We will take some time later on to dig into the specific Handlebars helpers:
    {{#* inline "salutation"}}
        {{#if Attributes.FirstName.[0]}}
            Dear {{Attributes.FirstName.[0]}},<br />
        {{else}}
            Dear Valued Customer,<br />
        {{/if}}
    {{/inline}}
    
    {{#* inline "clvcoupon"}}
        {{#if Attributes.CLVTier.[0]}}
            {{#eq Attributes.CLVTier.[0] "High"}}
                As a thank-you for your continued support, please use this coupon code for <strong>30%</strong> off your next order: <strong>WELOVEYOU30</strong>
            {{/eq}}
        {{/if}}
    {{/inline}}
    
    {{#* inline "footer"}}
        <hr />	
        Accent Athletics - 1234 Anywhere Ave, Anywhere USA, 12345 - <a href="https://www.example.com/preferences/index.html?pid={{ApplicationId}}&uid={{User.UserId}}&h={{sha256 (join User.UserId "d67c37ed538b751d850de18" "+" prefix="" suffix="")}}">Manage Preferences</a>
        <hr />
    {{/inline}}
    
    
    <!DOCTYPE html>
        <html lang="en">
        <head>
        <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    </head>
    <body>
      {{> salutation}}
      Thank-you for your order! {{> clvcoupon}}<br /><br />
  Order Date: {{now format="d MMM yyyy HH:mm:ss" tz="America/Los_Angeles"}}<br /><br />
      <table>
      <thead>
        <tr style="background-color: #f2f2f2;">
          <th style="text-align:left; width:75px">
            Product #
          </th>
          <th style="text-align:left; width:200px;">
            Name
          </th>
          <th style="text-align:center; width:75px;">
            Count
          </th>
          <th style="text-align:center; width:75px;">
            Amount
          </th>
        </tr>
      </thead>
      <tbody>
      {{#each Attributes.ProductNumber}}
        {{#eq (modulo @index 2) "1.0"}}
            <tr style="background-color: #f2f2f2;">
        {{else}}
            <tr>
        {{/eq}}
          <td style="text-align:left;">{{this}}</td>
          <td style="text-align:left;">{{#with (lookup ../Attributes.ProductName @index)}}{{this}}{{/with}}</td>
          <td style="text-align:center;">{{#with (lookup ../Attributes.ItemCount @index)}}{{this}}{{/with}}</td>
          <td style="text-align:center;">${{#with (lookup ../Attributes.Amount @index)}}{{this}}{{/with}}</td>
        </tr>
      {{else}}
        <tr>
          <td style="text-align:left;">{{Attributes.ProductNumber}}</td>
          <td style="text-align:center;">{{Attributes.ProductName}}</td>
          <td style="text-align:center;">{{Attributes.ItemCount}}</td>
          <td style="text-align:center;">{{Attributes.Amount}}</td>
        </tr>
      {{/each}}
      </tbody>
      </table>
      {{> footer}}
    </body>
    </html>

  7. Click Create to finish creating your template
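As an alternative to the console, the template can also be registered with boto3. A minimal sketch, assuming the HTML from step 6 has been saved locally as template.html and that your project runs in us-east-1 (adjust the Region as needed):

import boto3

pinpoint = boto3.client("pinpoint", region_name="us-east-1")  # use your project's Region

# Read the template HTML from step 6 (assumed to be saved locally as template.html).
with open("template.html", encoding="utf-8") as f:
    html_part = f.read()

pinpoint.create_email_template(
    TemplateName="AdvancedTemplateExample",
    EmailTemplateRequest={
        "Subject": "AdvancedTemplateExample",
        "HtmlPart": html_part,
    },
)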

Step 3: Create an Amazon Pinpoint Campaign

By sending a campaign, we can verify that our Amazon Pinpoint project is configured correctly, and that we created the segment and template correctly.

To create the segment and campaign:

  1. Open the Amazon Pinpoint console at http://console.aws.amazon.com/pinpoint, and then choose the project that you created in step 1.
  2. In the navigation pane, choose Campaigns, and then choose Create a campaign.
  3. Name the campaign “AdvancedTemplateTest.” Under Choose a channel for this campaign, choose Email, and then choose Next.
  4. On the Choose a segment page, choose the “AdvancedTemplateExample” segment that you just created, and then choose Next.
  5. In Create your message, choose the template we just created, AdvancedTemplateExample. Note: You will see an alert stating “This template contains a reference to an attribute from another project…” This is expected, as Pinpoint scans the template for attributes so that you can specify default values in case the endpoint doesn’t contain a value for the attribute. In this blog post we are using the {{#if}} conditional helper to handle any missing data, i.e. {{#if Attributes.FirstName.[0]}}.
  6. On the Choose when to send the campaign page, keep all of the default values, and then choose Next.
  7. On the Review and launch page, choose Launch campaign.

Within a few seconds, you should receive an email:

So what just happened?

Let’s take a deeper dive into each of the helpers we included in the template:

{{#* inline "salutation"}}
    {{#if Attributes.FirstName.[0]}}
        Dear {{Attributes.FirstName.[0]}},<br /><br />
    {{else}}
        Dear Valued Customer,<br /><br />
    {{/if}}
{{/inline}}

First you will notice that we are making use of Inline Partials. Using Inline Partials allows you to build a library of frequently used snippets of content. In this case we frequently use a salutation in our communications. You can build and maintain your own frequently used snippets and include them at the beginning of the template.

Later in the message we can simply include: {{> salutation}} to include a salutation in our email.

In this example we also see the {{#if}} helper which is used to evaluate if a first name is available on the endpoint. If the name is found, a greeting is returned that passes the user’s first name in the response. Otherwise, the else statement returns an alternative greeting.

{{#* inline "clvcoupon"}}
    {{#if Attributes.CLVTier.[0]}}
        {{#eq Attributes.CLVTier.[0] "High"}}
            As a thank-you for your continued support, please use this coupon code for <strong>30%</strong> off your next order: <strong>WELOVEYOU30</strong>
        {{/eq}}
    {{/if}}
{{/inline}}

Again, we are using Inline Partials to organize our code. Additionally we are using {{#if}} to see if the user has a CLVTier attribute and if so, we use the {{#eq}} conditional helper to see if their CLVTier is “High” as we only want this coupon to display for customers that fall into that tier.

Note that CLVTier is an attribute that is populated along with the Endpoint when we created the Segment above. You could also use a solution such as Predictive Segmentation using Amazon Pinpoint and Amazon SageMaker to incorporate Machine Learning to classify your existing users.

{{#* inline "footer"}}
    <hr />
    Accent Athletics - 1234 Anywhere Ave, Anywhere USA, 12345 - <a href="https://www.example.com/preferences/index.html?pid={{ApplicationId}}&uid={{User.UserId}}&h={{sha256 (join User.UserId "d67c37ed538b751d850de18" "+" prefix="" suffix="")}}">Manage Preferences</a>
    <hr />
{{/inline}}

In the example above we are using the {{sha256}} and {{join}} helpers to create a secure link to the Preference Center deployed as part of the Amazon Pinpoint Preference Center solution.

{{> salutation}}
Thank-you for your order! {{> clvcoupon}}<br /><br /><hr />
Order Date: {{now format="d MMM yyyy HH:mm:ss" tz="America/Los_Angeles"}}<br /><br />

This is where all of our hard work implementing Inline Partials really starts to pay off. To include our salutation and coupon we simply need to specify: {{> salutation}} and {{> clvcoupon}}

The {{now}} string helper allows us to display the current date and time in the format of our choosing. Refer to the Amazon Pinpoint documentation for more details on the date pattern and the available time zones.

<tbody>
{{#each Attributes.ProductNumber}}
{{#eq (modulo @index 2) "1.0"}}
    		<tr style="background-color: #f2f2f2;">
{{else}}
    		<tr>
{{/eq}}
    	<td style="text-align:left;">{{this}}</td>
    	<td style="text-align:left;">{{#with (lookup ../Attributes.ProductName @index)}}{{this}}{{/with}}</td>
    	<td style="text-align:center;">{{#with (lookup ../Attributes.ItemCount @index)}}{{this}}{{/with}}</td>
    	<td style="text-align:center;">${{#with (lookup ../Attributes.Amount @index)}}{{this}}{{/with}}</td>
</tr>
{{else}}
<tr>
    		<td style="text-align:left;">{{Attributes.ProductNumber}}</td>
    		<td style="text-align:center;">{{Attributes.ProductName}}</td>
    		<td style="text-align:center;">{{Attributes.ItemCount}}</td>
    		<td style="text-align:center;">{{Attributes.Amount}}</td>
</tr>
{{/each}}
</tbody>

This particular section has a lot going on. We will break each part down for further explanation:

  • {{#each}} – Allows us to loop through each of the values in our attribute. In this case our ProductNumber attribute will contain: [“DWG8799598”, “XTC5517773”, “XRO7471152”, etc.]
    Note that if you only have one item in the attribute array, Pinpoint simplifies it into a single string attribute. That is why we include the {{else}} section of the {{#each}} statement; it allows us to reference the attribute as a single string when the attribute doesn’t contain a collection of values.
  • {{#eq (modulo @index 2) “1.0”}} – In order to alternate our background color for even/odd rows, we are making use of the {{modulo}} operator from the Math and encoding helpers which will return the remainder of two given numbers allowing us to determine if this is an odd or even row.
    @index is a native Handlebars.js property that contains the current index we are on in the loop.
  • {{this}} – When iterating through a collection using {{#each}}, {{this}} allows you to reference the current item in the collection
  • {{#with (lookup ../Attributes.ProductName @index)}}{{this}}{{/with}} – Lookup is a built-in Handlebars helper that allows us to find values in another collection. We use the index combined with lookup to find the Product Name that goes along with the Product Number we are currently on. The same pattern is used for the remaining columns of the table. The ability to look up values in another attribute collection is the key to how we are able to display a table of information, but note that it does require making sure the attributes are aligned in the proper columns. For example, Attributes.ProductNumber[0] needs to align with Attributes.ProductName[0]. See Using variables with message template helpers for more details.
{{> footer}}

And just to wrap things up, let’s pull in the footer we defined with Inline Partials.

Next Steps

Using the techniques above, you can create sophisticated and personalized communications using Amazon Pinpoint.

Think about your existing communications to see if you can use personalization to increase customer engagement for your promotional and transactional messages.

Amazon Pinpoint is a flexible and scalable outbound and inbound marketing communications service. Learn more here: https://aws.amazon.com/pinpoint/

Optimizing EC2 Workloads with Amazon CloudWatch

Post Syndicated from Emma White original https://aws.amazon.com/blogs/compute/optimizing-ec2-workloads-with-amazon-cloudwatch/

This post is written by David (Dudu) Twizer, Principal Solutions Architect, and Andy Ward, Senior AWS Solutions Architect – Microsoft Tech.

In December 2020, AWS announced the availability of gp3, the next-generation General Purpose SSD volumes for Amazon Elastic Block Store (Amazon EBS), which allow customers to provision performance independent of storage capacity and provide up to a 20% lower price-point per GB than existing volumes.

This new release provides an excellent opportunity to right-size your storage layer by leveraging AWS’ built-in monitoring capabilities. This is especially important with SQL workloads as there are many instance types and storage configurations you can select for your SQL Server on AWS.

Many customers ask for our advice on choosing the ‘best’ or the ‘right’ storage and instance configuration, but there is no one solution that fits all circumstances. This blog post covers the critical techniques to right-size your workloads. We focus on right-sizing a SQL Server as our example workload, but the techniques we will demonstrate apply equally to any Amazon EC2 instance running any operating system or workload.

We create and use an Amazon CloudWatch dashboard to highlight any limits and bottlenecks within our example instance. Using our dashboard, we can ensure that we are using the right instance type and size, and the right storage volume configuration. The dimensions we look into are EC2 Network throughput, Amazon EBS throughput and IOPS, and the relationship between instance size and Amazon EBS performance.

 

The Dashboard

It can be challenging to locate every relevant resource limit and configure appropriate monitoring. To simplify this task, we wrote a simple Python script that creates a CloudWatch Dashboard with the relevant metrics pre-selected.

The script takes an instance-id list as input, and it creates a dashboard with all of the relevant metrics. The script also creates horizontal annotations on each graph to indicate the maximums for the configured metric. For example, for an Amazon EBS IOPS metric, the annotation shows the Maximum IOPS. This helps us identify bottlenecks.
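To give a feel for what the script produces, here is a simplified boto3 sketch that creates a one-widget dashboard with a horizontal annotation. It is not the linked script, and the volume ID, Region, and IOPS limit are placeholders.

import json
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="eu-west-1")

VOLUME_ID = "vol-0123456789abcdef0"   # placeholder EBS volume
MAX_IOPS = 3000                       # placeholder configured IOPS limit for that volume

dashboard_body = {
    "widgets": [
        {
            "type": "metric",
            "x": 0, "y": 0, "width": 12, "height": 6,
            "properties": {
                "title": f"EBS IOPS - {VOLUME_ID}",
                "region": "eu-west-1",
                "stat": "Sum",
                "period": 300,
                "metrics": [
                    # Metric math: total read + write operations per second.
                    [{"expression": "(m1+m2)/PERIOD(m1)", "label": "Total IOPS", "id": "iops"}],
                    ["AWS/EBS", "VolumeReadOps", "VolumeId", VOLUME_ID, {"id": "m1", "visible": False}],
                    ["AWS/EBS", "VolumeWriteOps", "VolumeId", VOLUME_ID, {"id": "m2", "visible": False}],
                ],
                # Horizontal annotation marking the volume's configured maximum.
                "annotations": {"horizontal": [{"label": "Max IOPS", "value": MAX_IOPS}]},
            },
        }
    ]
}

cloudwatch.put_dashboard(
    DashboardName="ExampleInstanceDashboard",
    DashboardBody=json.dumps(dashboard_body),
)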

Please take a moment now to run the script using either of the following methods described. Then, we run through the created dashboard and each widget, and guide you through the optimization steps that will allow you to increase performance and decrease cost for your workload.

 

Creating the Dashboard with CloudShell

First, we log in to the AWS Management Console and load AWS CloudShell.

Once we have logged in to CloudShell, we must set up our environment using the following command:

# Download the script locally
wget -L https://raw.githubusercontent.com/aws-samples/amazon-ec2-mssql-workshop/master/resources/code/Monitoring/create-cw-dashboard.py

# Prerequisites (venv and boto3)
python3 -m venv env # Optional
source env/bin/activate  # Optional
pip3 install boto3 # Required

The preceding commands download the script and configure the CloudShell environment with the correct Python settings to run it. Run the following command to create the CloudWatch Dashboard.

# Execute
python3 create-cw-dashboard.py --InstanceList i-example1 i-example2 --region eu-west-1

At its most basic, you just specify the list of instances you are interested in (i-example1 and i-example2 in the preceding example) and the Region in which those instances are running (eu-west-1 in the preceding example). For detailed usage instructions, see the README file here. A link to the CloudWatch Dashboard is provided in the output of the command.

 

Creating the Dashboard Directly from your Local Machine

If you’re familiar with running the AWS CLI locally, and have Python and the other pre-requisites installed, then you can run the same commands as in the preceding CloudShell example, but from your local environment. For detailed usage instructions see the README file here. If you run into any issues, we recommend running the script from CloudShell as described prior.

 

Examining Our Metrics

 

Once the script has run, navigate to the CloudWatch Dashboard that has been created. A direct link to the CloudWatch Dashboard is provided as an output of the script. Alternatively, you can navigate to CloudWatch within the AWS Management Console, and select the Dashboards menu item to access the newly created CloudWatch Dashboard.

The Network Layer

The first widget of the CloudWatch Dashboard is the EC2 Network throughput:

The automatic annotation creates a red line that indicates the maximum throughput your Instance can provide in Mbps (Megabits per second). This metric is important when running workloads with high network throughput requirements. For our SQL Server example, this has additional relevance when considering adding replica Instances for SQL Server, which place an additional burden on the Instance’s network.

 

In general, if your Instance is frequently reaching 80% of this maximum, you should consider choosing an Instance with greater network throughput. For our SQL example, we could consider changing our architecture to minimize network usage. For example, if we were using an “Always On Availability Group” spread across multiple Availability Zones and/or Regions, then we could consider using an “Always On Distributed Availability Group” to reduce the amount of replication traffic. Before making a change of this nature, take some time to consider any SQL licensing implications.

 

If your Instance generally doesn’t pass 10% network utilization, the metric is indicating that networking is not a bottleneck. For SQL, if you have low network utilization coupled with high Amazon EBS throughput utilization, you should consider optimizing the Instance’s storage usage by offloading some Amazon EBS usage onto networking – for example by implementing SQL as a Failover Cluster Instance with shared storage on Amazon FSx for Windows File Server, or by moving SQL backup storage on to Amazon FSx.

The Storage Layer

The second widget of the CloudWatch Dashboard is the overall EC2 to Amazon EBS throughput, which means the sum of all the attached EBS volumes’ throughput.

Each Instance type and size has a different Amazon EBS Throughput, and the script automatically annotates the graph based on the specs of your instance. You might notice that this metric is heavily utilized when analyzing SQL workloads, which are usually considered to be storage-heavy workloads.

If you find data points that reach the maximum, such as in the preceding screenshot, this indicates that your workload has a bottleneck in the storage layer. Let’s see if we can find the EBS volume that is using all this throughput in our next series of widgets, which focus on individual EBS volumes.

And here is the culprit. From the widget, we can see the volume ID and type, and the performance maximum for this volume. Each graph represents one of the two dimensions of the EBS volume: throughput and IOPS. The automatic annotation gives you visibility into the limits of the specific volume in use. In this case, we are using a gp3 volume, configured with a 750-MBps throughput maximum and 3000 IOPS.

Looking at the widget, we can see that the throughput reaches certain peaks, but they are less than the configured maximum. Considering the preceding screenshot, which shows that the overall instance Amazon EBS throughput is reaching maximum, we can conclude that the gp3 volume here is unable to reach its maximum performance. This is because the Instance we are using does not have sufficient overall throughput.
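Before changing the size, it can also help to compare the documented Amazon EBS maximums of candidate instance types. A quick boto3 sketch follows; the instance types shown are examples, not a recommendation.

import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")

# Compare the EBS-optimized limits of two candidate instance sizes (examples).
response = ec2.describe_instance_types(InstanceTypes=["m5.xlarge", "m5.4xlarge"])

for itype in response["InstanceTypes"]:
    ebs = itype["EbsInfo"]["EbsOptimizedInfo"]
    print(
        itype["InstanceType"],
        f"max throughput: {ebs['MaximumThroughputInMBps']} MBps,",
        f"max IOPS: {ebs['MaximumIops']}",
    )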

Let’s change the Instance size so that we can see if that fixes our issue. When changing Instance or volume types and sizes, remember to re-run the dashboard creation script to update the thresholds. We recommend using the same script parameters, as re-running the script with the same parameters overwrites the initial dashboard and updates the threshold annotations – the metrics data will be preserved.  Running the script with a different dashboard name parameter creates a new dashboard and leaves the original dashboard in place. However, the thresholds in the original dashboard won’t be updated, which can lead to confusion.

Here is the widget for our EBS volume after we increased the size of the Instance:

We can see that the EBS volume is now able to reach its configured maximums without issue. Let’s look at the overall Amazon EBS throughput for our larger Instance as well:

We can see that the Instance now has sufficient Amazon EBS throughput to support our gp3 volume’s configured performance, and we have some headroom.

Now, let’s swap our Instance back to its original size, and swap our gp3 volume for a Provisioned IOPS io2 volume with 45,000 IOPS, and re-run our script to update the dashboard. Running an IOPS intensive task on the volume results in the following:

As you can see, despite having 45,000 IOPS configured, it seems to be capping at about 15,000 IOPS. Looking at the instance level statistics, we can see the answer:

Much like with our throughput testing earlier, we can see that our io2 volume performance is being restricted by the Instance size. Let’s increase the size of our Instance again, and see how the volume performs when the Instance has been correctly sized to support it:

We are now reaching the configured limits of our io2 volume, which is exactly what we wanted and expected to see. The instance level IOPS limit is no longer restricting the performance of the io2 volume:

Using the preceding steps, we can identify where storage bottlenecks are, and we can identify if we are using the right type of EBS volume for the workload. In our examples, we sought bottlenecks and scaled upwards to resolve them. This process should be used to identify where resources have been over-provisioned and under-provisioned.

If we see a volume that never reaches the maximums that it has been configured for, and that is not subject to any other bottlenecks, we usually conclude that the volume in question can be right-sized to a more appropriate volume that costs less, and better fits the workload.

We can, for example, change an Amazon EBS gp2 volume to an EBS gp3 volume with the correct IOPS and throughput. EBS gp3 provides up to 1000-MBps throughput per volume and costs $0.08/GB (versus $0.10/GB for gp2). Additionally, unlike with gp2, gp3 volumes allow you to specify provisioned IOPS independently of size and throughput. By using the process described above, we could identify that a gp2, io1, or io2 volume could be swapped out with a more cost-effective gp3 volume.
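As a sketch of that swap, an existing gp2 volume can be modified to gp3 in place with a single API call. The volume ID and performance values below are placeholders; size IOPS and throughput from your dashboard data.

import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")

# Convert a gp2 volume to gp3 in place (volume ID and performance values are placeholders).
ec2.modify_volume(
    VolumeId="vol-0123456789abcdef0",
    VolumeType="gp3",
    Iops=3000,        # gp3 baseline; raise only if the dashboard shows you need more
    Throughput=125,   # MBps; gp3 baseline, configurable independently of size
)

# Optionally track the modification until it completes.
state = ec2.describe_volumes_modifications(VolumeIds=["vol-0123456789abcdef0"])
print(state["VolumesModifications"][0]["ModificationState"])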

If during our analysis we observe an SSD-based volume with relatively high throughput usage, but low IOPS usage, we should investigate further. A lower-cost HDD-based volume, such as an st1 or sc1 volume, might be more cost-effective while maintaining the required level of performance. Amazon EBS st1 volumes provide up to 500 MBps throughput and cost $0.045 per GB-month, and are often an ideal volume-type to use for SQL backups, for example.

Additional storage optimization you can implement

Move the TempDB to Instance Store NVMe storage – The data on an SSD instance store volume persists only for the life of its associated instance. This is a good fit for TempDB, because SQL Server recreates TempDB each time the service starts, so the data does not need to survive a stop and start. Placing the TempDB on the local instance store frees the associated Amazon EBS throughput while providing better performance, because the storage is locally attached to the instance.

Consider Amazon FSx for Windows File Server as a shared storage solution – As described here, Amazon FSx can be used to store a SQL database in a shared location, enabling the use of a SQL Server Failover Cluster Instance.

 

The Compute Layer

After you finish optimizing your storage layer, wait a few days and re-examine the metrics for both Amazon EBS and networking. Use these metrics in conjunction with CPU metrics and Memory metrics to select the right Instance type to meet your workload requirements.

AWS offers nearly 400 instance types in different sizes. From a SQL perspective, it’s essential to choose instances with high single-thread performance, such as the z1d instance, due to SQL’s license-per-core model. z1d instances also provide instance store storage for the TempDB.

You might also want to check out the AWS Compute Optimizer, which helps you by automatically recommending instance types by using machine learning to analyze historical utilization metrics. More details can be found here.

We strongly advise you to thoroughly test your applications after making any configuration changes.

 

Conclusion

This blog post covers some simple and useful techniques to gain visibility into important instance metrics, and provides a script that greatly simplifies the process. Any workload running on EC2 can benefit from these techniques. We have found them especially effective at identifying actionable optimizations for SQL Servers, where small changes can have beneficial cost, licensing and performance implications.

Frictionless hosting of containerized ASP.NET web apps using Amazon Lightsail

Post Syndicated from Emma White original https://aws.amazon.com/blogs/compute/frictionless-hosting-of-containerized-asp-net-web-apps-using-amazon-lightsail/

This post is written by Fahad Mustafa, Cloud Application Architect, AWS Professional Services

There are many ways to deploy ASP.NET web apps to AWS, each with its own use cases and pricing model. But what if you have a small website and database that you must deploy rapidly, manage, and scale? What if you want a cost-effective, simple monthly plan? In these cases, Amazon Lightsail is a great choice. This post shows you how to take a containerized ASP.NET web application that connects to a PostgreSQL database and deploy it to Lightsail, so that you can get your ASP.NET web app up and running.

Product Overview

Amazon Lightsail is an easy way to get started on AWS. It gives you building blocks to deploy an application or website and provision a database at an affordable, monthly price.

Lightsail is perfect for students, small businesses, and startups to get their website or application up and running in the cloud. By providing a secure, highly available, and managed environment Lightsail does all the heavy lifting like setting up IAM roles and policies.

Lightsail can also run containers! By pointing Lightsail to a public image on Amazon ECR or Docker Hub, or uploading an image from your local machine, you can easily run the container, scale it, monitor it and use a custom domain.

Overview of solution

To deploy an ASP.NET app that connects to a PostgreSQL database, you create a Lightsail container service and PostgreSQL database through the AWS Management Console. Create your app and container image. Push the image to Lightsail and finally create the Lightsail deployment to run the container.

solution diagram

Overview of steps

In this post, you create a sample ASP.NET web app through the .NET CLI. Alternatively, you can use Visual Studio to create the app.

This is the sequence of steps I review in this post:

  • Create a PostgreSQL database
  • Create a Lightsail container service
  • Create an ASP.NET web app
  • Create a Dockerfile and build image
  • Upload the image to Lightsail
  • Deploy and run the image

Prerequisites

For this walkthrough, you should have the following prerequisites: an AWS account, the AWS CLI configured to access your account, the .NET CLI, Docker, and a PostgreSQL client such as pgAdmin.

Walkthrough

Create a PostgreSQL database

In this step, you create a PostgreSQL database through the Lightsail console.

Create the database

  1. Sign in to the Lightsail console.
  2. On the Lightsail home page, choose the Databases tab.
  3. Choose Create database.
  4. Choose the Database location by changing the AWS Region and Availability Zone.
  5. Choose the database engine. In this example, select PostgreSQL 12.6.
  6. Optional – Specify login credentials. If not changed, AWS generates a default secure password.
  7. Optional – Specify the master database name. If not changed, AWS will use “dbmaster” as the default.
  8. Choose the database plan. Compare the plan’s memory, CPU, storage, and transfer quota to decide which best fits your needs. The smallest database plan is Free Tier eligible.
  9. Identify your database by giving it a unique name.
  10. Choose Create database.

Creating and configuring the database can take a few minutes. Once ready, the status changes to Available. For more information and options on creating a database in Lightsail, see Creating a database in Amazon Lightsail.

available database

Now you are ready to connect to the database and create a table. To connect, see Connecting to your PostgreSQL database in Amazon Lightsail. This sample uses a database named aspnetlightsaildb and a table named Person that you can create by running the following script using PgAdmin. Note that the Owner value is dbmasteruser. This is the default username AWS generates. If you changed the default, then use the username you specified in step 6.

-- Database: aspnetlightsaildb
CREATE DATABASE aspnetlightsaildb
    WITH 
    OWNER = dbmasteruser
    ENCODING = 'UTF8'
    LC_COLLATE = 'en_US.UTF-8'
    LC_CTYPE = 'en_US.UTF-8'
    TABLESPACE = pg_default
    CONNECTION LIMIT = -1;	
-- Table: public.Person
CREATE TABLE IF NOT EXISTS public."Person"
(
    "Id" integer NOT NULL GENERATED ALWAYS AS IDENTITY ( INCREMENT 1 START 1 MINVALUE 1 MAXVALUE 2147483647 CACHE 1 ),
    "Name" text COLLATE pg_catalog."default",
    "DateOfBirth" date,
    "Address" text COLLATE pg_catalog."default",
    CONSTRAINT "Person_pkey" PRIMARY KEY ("Id")
)
TABLESPACE pg_default;
ALTER TABLE public."Person"
    OWNER to dbmasteruser;

Now your database and table is created and you can create a container service.

Create a Lightsail container service

In this step, you create a Lightsail container service that is ready to accept your container images.

Create the container service

  1. Sign in to the Lightsail console.
  2. On the Lightsail home page, choose the Containers Tab
  3. Choose Create container service.
  4. In the Create a container service page, choose Change AWS Region, then choose an AWS Region for your container service.
  5. Choose a capacity for your container service. For more information, see Container service capacity (scale and power).
  6. Skip the Set up your first deployment step as you’ll create the deployment after creating the container image on your dev machine.
  7. Enter a name for your container service. Take note of this name, you’ll need it later to deploy the container to Lightsail.
  8. Click Create container service.

After a few minutes, your container service status changes from Pending to Ready. This indicates you can now deploy images. If this is the first time you created a service, it can take 10–15 minutes for the status to become Ready.

container service
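If you would rather script this step than use the console, a rough boto3 equivalent is shown below. The service name matches the one used later in this post, and the Region, power, and scale values are examples.

import boto3

lightsail = boto3.client("lightsail", region_name="ap-southeast-2")  # match your chosen Region

# Create the container service; power and scale mirror the capacity chosen in the console.
lightsail.create_container_service(
    serviceName="aspnet-helloworld",
    power="nano",   # smallest capacity; adjust to your needs
    scale=1,
)

# Check the state; it moves from PENDING to READY once the service is ready for deployments.
service = lightsail.get_container_services(serviceName="aspnet-helloworld")
print(service["containerServices"][0]["state"])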

Create an ASP.NET web app

Using the .NET CLI, you’ll create a sample ASP.NET web app. In an empty directory run the following command:

dotnet new webapp --name HelloWorldLightsail

The “webapp” segment of the command specifies the project template to use. In this case, it’s a default ASP.NET web app. The “--name” parameter is the name of the ASP.NET project.

To connect to your PostgreSQL Db from ASP.NET, you must install the “Npgsql” Nuget package. In the root directory of the project run the following command in the terminal:

dotnet add package Npgsql.EntityFrameworkCore.PostgreSQL --version 5.0.6

Once installed, you create a Model class to represent the data and a DbContext class to connect and query the database.

public class Person
    {
        public int Id { get; set; }

        public string Name { get; set; }

        public DateTime DateOfBirth { get; set; }

        public string Address { get; set; }
    }

public class PostgreSqlContext : DbContext
    {
        public PostgreSqlContext(DbContextOptions<PostgreSqlContext> options) : base(options)
        {
        }

        public DbSet<Person> Person { get; set; }
    }

The next step is to add the connection string to appSettings.json. In the root of the settings file add a new ConnectionStrings property as shown below. The following properties are required:

  • lightsail-endpoint: The database endpoint as shown in the Lightsail console.
  • db-name: The name of the database you want to connect to.
  • db-username: The username as shown in the Lightsail console.
  • db-password: The password as shown in the Lightsail console.
"ConnectionStrings": {
    "AspnetLightsailDb": "Server=<lightsail-endpoint>;Port=5432;Database=<db-name>;User Id=<db-username>;Password=<db-password>;"
  }

The next step is to tell ASP.NET where to find the connection string and which DbContext class to use. This is done by configuring the DbContext in Startup.cs. In the ConfigureServices method, add the following line of code:

  services.AddDbContext<PostgreSqlContext>(options => options.UseNpgsql(Configuration.GetConnectionString("AspnetLightsailDb")));

 

Now, you are ready to perform operations against the database. This is done by performing operations against the Person property of the PostgreSqlContext instance.

For example, to fetch all records from the Person table:

public IList<Person> Person { get;set; }

        public async Task OnGetAsync()
        {
            Person = await _context.Person.ToListAsync();
        }

You now have an ASP.NET web application that can query the “Person” table against the PostgreSQL database.

Create a Dockerfile and build image

In order to containerize the web app, you must create a Dockerfile. This file provides instructions to Docker on how to build the container image.

To create a Dockerfile and build image

  1. In the root directory of the project, you created (where the .csproj file lives) create an empty file named “Dockerfile”. Note this file does not have an extension.
  2. Open the file with a text editor or IDE and insert the following:
# https://hub.docker.com/_/microsoft-dotnet
FROM mcr.microsoft.com/dotnet/sdk:5.0 AS build
WORKDIR /source

# copy csproj and restore as distinct layers
COPY *.csproj .
RUN dotnet restore

# copy everything else and build app
COPY . .
RUN dotnet publish -c release -o /app --no-restore

# final stage/image
FROM mcr.microsoft.com/dotnet/aspnet:5.0
WORKDIR /app
COPY --from=build /app ./
ENTRYPOINT ["dotnet", "HelloWorldLightsail.dll"]
  1. To build the image, open a terminal in the same directory as the Dockerfile. Run the following command to build the image.
    docker build -t helloworldlightsail .
    The “-t” parameter is a human readable tag you give the image to make it easy to identify.
  2. After the command completes, you can verify that the image exists by running
    docker images
    You should see the newly created image.newly created image

Upload the image to Lightsail

In this step, you upload the newly built image to the Lightsail container service that you created earlier.

To upload the image to Lightsail

  1. Ensure you have configured the AWS CLI to access AWS.
  2. In a terminal enter the following command:
    aws lightsail push-container-image --region ap-southeast-2 --service-name aspnet-helloworld --label helloworldlightsail --image helloworldlightsail:latest
    The --region and --service-name parameters should match the container service you created through the AWS Management Console. The --label parameter is a descriptive name you give the image when it’s stored in the container service. This will help you track the different versions of the image. The --image parameter consists of the image name and tag on your local machine that you want to push to Lightsail. Read more about how to push images to Lightsail.
  3. After the command runs successfully browse your container service in the Lightsail console and click the “Images” tab. You should see the uploaded image.

3. After the command runs successfully browse your container service in the Lightsail console and click the “Images” tab

Deploy and run the image

Now that your image is uploaded to the container service it’s time to create a deployment to run the app.

To create a deployment

  1. Go to the Deployments tab in the Lightsail console.
  2. Click on Create your first deployment.
  3. Enter the Container name.
  4. Click Choose stored image and select the image you uploaded in the previous step.
  5. Click on Add open ports to add a port mapping to the container. This allows Lightsail to forward web traffic to your ASP.NET web app. By default, the ASP.NET web server listens on port 80.
  6. Under the Public endpoint section, select the container from the drop-down. This specifies which container Lightsail will forward traffic to since a single deployment can have more than one container.
  7. Click Save and deploy

Your configuration should look like this. Read more about creating container service deployments in Lightsail.

configuration overview
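The same deployment can also be created with the SDK. A hedged boto3 sketch follows, assuming the service and label names used earlier and that version 1 is the image you just pushed (check the Images tab for the exact reference).

import boto3

lightsail = boto3.client("lightsail", region_name="ap-southeast-2")  # match your service's Region

lightsail.create_container_service_deployment(
    serviceName="aspnet-helloworld",
    containers={
        "helloworldlightsail": {
            # Stored images are referenced as ":<service-name>.<label>.<version>".
            "image": ":aspnet-helloworld.helloworldlightsail.1",
            "ports": {"80": "HTTP"},   # ASP.NET listens on port 80 inside the container
        }
    },
    publicEndpoint={
        "containerName": "helloworldlightsail",
        "containerPort": 80,
    },
)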

After the deployment is complete, you can navigate to the Public domain of your container service. You will see your ASP.NET web app in action!

public domain

Conclusion

In this post, I demonstrated how easy it is to create a PostgreSQL database and deploy an ASP.NET web app to Amazon Lightsail, going from a container on your dev machine to a publicly accessible, scalable, and secure cloud environment within minutes.

You can now add a custom domain to your web app through the Lightsail console. Additionally, you can increase the scale of your container to keep up with demand based on the useful CPU and memory metrics provided in the console.

If you have more advanced needs for your web app, you have the whole robust ecosystem of AWS at your disposal. You can deploy your ASP.NET web app to Amazon Elastic Container Service (Amazon ECS) or even decide to go completely serverless and utilize AWS Lambda and API Gateway.

Visit the Amazon Lightsail homepage to get started with your next idea and read the docs for more details about container services on Amazon Lightsail.