Tag Archives: Amazon EC2

Genomics workflows, Part 5: automated benchmarking

Post Syndicated from Rostislav Markov original https://aws.amazon.com/blogs/architecture/genomics-workflows-part-5-automated-benchmarking/

Launching and running genomics workflows can take hours and involves large pools of compute instances that process data at a petabyte scale. Benchmarking helps you evaluate workflow performance and discover faster and cheaper ways of running them.

In practice, performance evaluations happen irregularly because of the associated heavy lifting. In this blog post, we discuss how life-science research teams can automate evaluations.

Business Benefits

An automated benchmarking solution provides:

  • more accurate enterprise resource planning by performing historical analytics,
  • lower cost to the business by comparing performance on different resource types, and
  • cost transparency to the business by quantifying periodical chargeback.

We’ve used automated benchmarking to compare processing times on different services such as Amazon Elastic Compute Cloud (Amazon EC2), AWS Batch, AWS ParallelCluster, Amazon Elastic Kubernetes Service (Amazon EKS), and on-premises HPC clusters. Scientists, financiers, technical leaders, and other stakeholders can build reports and dashboards to compare consumption data by consumer, workflow type, and time period.

Design pattern

Our automated benchmarking solution measures performance on two dimensions:

  • Timing: measures the duration of a workflow launch on a specific dataset
  • Pricing: measures the associated cost

This solution can be extended to other performance metrics such as iterations per second or process/thread distribution across compute nodes.

Our requirements include the following:

  • Consistent measurement of timing based on workflow status (such as preparing, waiting, ready, running, failed, complete)
  • Extensible pricing models based on unit prices (the Amazon EC2 Spot price at a specific period of time compared to Amazon EC2 On-Demand pricing)
  • Scalable, cost-efficient, and flexible data store enabling historical benchmarking and estimations
  • Minimal infrastructure management overhead

We choose a serverless design pattern using AWS Step Functions orchestration, AWS Lambda for our application code, and Amazon DynamoDB to track workflow launch IDs and states (as described in Part 3). We assume that the genomics workflows run on AWS Batch with genomics data on Amazon FSx for Lustre (Part 1). AWS Step Functions allows us to break down processing into smaller steps and avoid monolithic application code. Our evaluation process runs in four steps:

  1. Monitor for completed workflow launches in the DynamoDB stream using an Amazon EventBridge pipe with a Step Functions workflow as target. This event-driven approach is more efficient than periodic polling and avoids custom code for parsing status and cost values in all records of the DynamoDB stream.
  2. Collect a list of all compute resources associated with the workflow launch. Design a Lambda function that queries the AWS Batch API (see Part 1) to describe compute environment parameters like the Amazon EC2 instance IDs and their details, such as processing times, instance family/size, and allocation strategy (for example, Spot Instances, Reserved Instances, On-Demand Instances).
  3. Calculate the cost of all consumed resources. We achieve this with another Lambda function, which calculates the total price based on unit prices from the AWS Price List Query API.
  4. Our state machine updates the total price in the DynamoDB table without the need for additional application code.

Figure 1 visualizes these steps.

Automated benchmarking of genomics workflows

Figure 1. Automated benchmarking of genomics workflows

Implementation considerations

AWS Step Functions orchestrates our benchmarking workflow reliably and makes our application code easy to maintain. Figure 2 summarizes the state machine transitions that we’ll describe.

AWS Step Functions state machine for automated benchmarking

Figure 2. AWS Step Functions state machine for automated benchmarking

Gather consumption details

Configure the DynamoDB stream view type to New image so that the entire item is passed through as it appears after it was changed. We set up an Amazon EventBridge pipe with event filtering and the DynamoDB stream as a source. Our event filter uses multiple matching on records with a status of COMPLETE, but no cost entry in order to avoid an infinite loop. Once our state machine has updated the DynamoDB item with the workflow price, the resulting record in the DynamoDB stream will not pass our event filter.

The syntax of our event filter is as follows:

{
  "dynamodb": {
    "NewImage": {
      "status": {
        "S": ["COMPLETE"]
      },
      "totalCost": {
        "S": [{
          "exists": false
        }]
      }
    }
  }
}

We use an input transformer to simplify follow-on parsing by removing unnecessary metadata from the event.

The consumed resources included in the stream record are the auto-scaling group ID for AWS Batch and the Amazon FSx for Lustre volume ID. We use the DescribeJobs API (describe_jobs in Boto3) to determine which compute resources were used. If the response is a list of EC2 instances, we then look up consumption information including start and end times using the ListJobs API (list_jobs in Boto3) for each compute node. We use describe_volumes with filters on the identified EC2 instances to obtain the size and type of Amazon Elastic Block Store (Amazon EBS) volumes.

Calculate prices

Another Lambda function obtains the associated unit prices of all consumed resources using the GetProducts request of AWS Price List Query API (get_products in Boto3) and then parsing the pricePerUnit value. For Spot Instances, we use describe_spot_price_history of the EC2 client in Boto3 and specify the time range and instance types for which we want to receive prices.

Calculate the price of workflow launches based on the following factors:

  • Number and size of EC2 instances in auto-scaling node groups
  • Size of EBS volumes and Amazon FSx for Lustre
  • Processing duration

Our Python-based Lambda function calculates the total, rounds it, and delivers the price breakdown in the following format:

total_cost: str, instance_cost: str, volume_cost: str, filesystem_cost: str

Lastly, we put the price breakdown to the DynamoDB table using UpdateItem directly from the Amazon States Language.

Note that AWS credits and enterprise discounts might not be reflected in the responses of the AWS Price List Query API unless applied to the particular AWS account. This is often considered best practice in light of least-privilege considerations.

In the past, we’ve also used AWS Cost Explorer instead of the AWS Price List API. AWS Cost Explorer data is updated at least once every 24 hours. You can denote the pending price status in the DynamoDB table item and use the Wait state to delay the calculation process.

The presented solution can be extended to other compute services such as Amazon Elastic Kubernetes Service (Amazon EKS). For Amazon EKS, events are enriched with the cluster ID from the DynamoDB table and the price calculation should also include control plane costs.

Conclusion

Life-science research teams use benchmarking to compare workflow performance and inform their architectural decisions. Such evaluations are effort-intensive and therefore done irregularly.

In this blog post, we showed how life-science research teams can automate benchmarking for their scientific workflows. The insights teams gain from automated benchmarking indicate continuous optimization opportunities, such as by adjusting compute node configuration. The evaluation data is also available on demand for other purposes including chargeback.

Stay tuned for our next post in which we show how to use historical benchmarking data for price estimations of future workflow launches.

Related information

Building diversified and cost-optimized EC2 server groups in Spinnaker

Post Syndicated from Sheila Busser original https://aws.amazon.com/blogs/compute/building-diversified-and-cost-optimized-ec2-server-groups-in-spinnaker/

This blog post is written by Sandeep Palavalasa, Sr. Specialist Containers SA, and Prathibha Datta-Kumar, Software Development Engineer

Spinnaker is an open source continuous delivery platform created by Netflix for releasing software changes rapidly and reliably. It enables teams to automate deployments into pipelines that are run whenever a new version is released with proven deployment strategies that are faster and more dependable with zero downtime. For many AWS customers, Spinnaker is a critical piece of technology that allows developers to deploy their applications safely and reliably across different AWS managed services.

Listening to customer requests on the Spinnaker open source project and in the Amazon EC2 Spot Instances integrations roadmap, we have further enhanced Spinnaker’s ability to deploy on Amazon Elastic Compute Cloud (Amazon EC2). The enhancements make it easier to combine Spot Instances with On-Demand, Reserved, and Savings Plans Instances to optimize workload costs with performance. You can improve workload availability when using Spot Instances with features such as allocation strategies and proactive Spot capacity rebalancing, when you are flexible about Instance types and Availability Zones. Combinations of these features offer the best possible experience when using Amazon EC2 with Spinnaker.

In this post, we detail the recent enhancements, along with a walkthrough of how you can use them following the best practices.

Amazon EC2 Spot Instances

EC2 Spot Instances are spare compute capacity in the AWS Cloud available at steep discounts of up to 90% when compared to On-Demand Instance prices. The primary difference between an On-Demand Instance and a Spot Instance is that a Spot Instance can be interrupted by Amazon EC2 with a two-minute notification when Amazon EC2 needs the capacity back. Amazon EC2 now sends rebalance recommendation notifications when Spot Instances are at an elevated risk of interruption. This signal can arrive sooner than the two-minute interruption notice. This lets you proactively replace your Spot Instances before it’s interrupted.

The best way to adhere to Spot best practices and instance fleet management is by using an Amazon EC2 Auto Scaling group When using Spot Instances in Auto Scaling group, enabling Capacity Rebalancing helps you maintain workload availability by proactively augmenting your fleet with a new Spot Instance before a running instance is interrupted by Amazon EC2.

Spinnaker concepts

Spinnaker uses three key concepts to describe your services, including applications, clusters, and server groups, and how your services are exposed to users is expressed as Load balancers and firewalls.

An application is a collection of clusters, a cluster is a collection of server groups, and a server group identifies the deployable artifact and basic configuration settings such as the number of instances, autoscaling policies, metadata, etc. This corresponds to an Auto Scaling group in AWS. We use Auto Scaling groups and server groups interchangeably in this post.

Spinnaker and Amazon EC2 Integration

In mid-2020, we started looking into customer requests and gaps in the Amazon EC2 feature set supported in Spinnaker. Around the same time, Spinnaker OSS added support for Amazon EC2 Launch Templates. Thanks to their effort, we could follow-up and expand the Amazon EC2 feature set supported in Spinnaker. Now that we understand the new features, let’s look at how to use some of them in the following tutorial spinnaker.io.

Here are some highlights of the features contributed recently:

Feature Why use it? (Example use cases)
  Multiple Instance Types   Tap into multiple capacity pools to achieve and maintain the desired scale using Spot Instances.
  Combining On-Demand and Spot Instances

  – Control the proportion of On-Demand and Spot Instances launched in your sever group.

– Combine Spot Instances with Amazon EC2 Reserved Instances or Savings Plans.

  Amazon EC2 Auto Scaling allocation strategies   Reduce overall Spot interruptions by launching from Spot pools that are optimally chosen based on the available Spot capacity, using capacity-optimized Spot allocation strategy.
  Capacity rebalancing   Improve your workload availability by proactively shifting your Spot capacity to optimal pools by enabling capacity rebalancing along with capacity-optimized allocation strategy.
  Improved support for burstable performance instance types with custom credit specification   Reduce costs by preventing wastage of CPU cycles.

We recommend using Spinnaker stable release 1.28.x for API users and 1.29.x for UI users. Here is the Git issue for related PRs and feature releases.

Now that we understand the new features, let’s look at how to use some of them in the following tutorial.

Example tutorial: Deploy a demo web application on an Auto Scaling group with On-Demand and Spot Instances

In this example tutorial, we setup Spinnaker to deploy to Amazon EC2, create an Application Load Balancer, and deploy a demo application on a server group diversified across multiple instance types and purchase options – this case On-Demand and Spot Instances.

We leverage Spinnaker’s API throughout the tutorial to create new resources, along with a quick guide on how to deploy the same using Spinnaker UI (Deck) and leverage UI to view them.

Prerequisites

As a prerequisite to complete this tutorial, you must have an AWS Account with an AWS Identity and Access Management (IAM) User that has the AdministratorAccess configured to use with AWS Command Line Interface (AWS CLI).

1. Spinnaker setup

We will use the AWS CloudFormation template setup-spinnaker-with-deployment-vpc.yml to setup Spinnaker and the required resources.

1.1 Create an Secure Shell(SSH) keypair used to connect to Spinnaker and EC2 instances launched by Spinnaker.

AWS_REGION=us-west-2 # Change the region where you want Spinnaker deployed
EC2_KEYPAIR_NAME=spinnaker-blog-${AWS_REGION}
aws ec2 create-key-pair --key-name ${EC2_KEYPAIR_NAME} --region ${AWS_REGION} --query KeyMaterial --output text > ~/${EC2_KEYPAIR_NAME}.pem
chmod 600 ~/${EC2_KEYPAIR_NAME}.pem

1.2 Deploy the Cloudformation stack.

STACK_NAME=spinnaker-blog
SPINNAKER_VERSION=1.29.1 # Change the version if newer versions are available
NUMBER_OF_AZS=3
AVAILABILITY_ZONES=${AWS_REGION}a,${AWS_REGION}b,${AWS_REGION}c
ACCOUNT_ID=$(aws sts get-caller-identity --query "Account" --output text)
S3_BUCKET_NAME=spin-persitent-store-${ACCOUNT_ID}

# Download template
curl -o setup-spinnaker-with-deployment-vpc.yml https://raw.githubusercontent.com/awslabs/ec2-spot-labs/master/ec2-spot-spinnaker/setup-spinnaker-with-deployment-vpc.yml

# deploy stack
aws cloudformation deploy --template-file setup-spinnaker-with-deployment-vpc.yml \
    --stack-name ${STACK_NAME} \
    --parameter-overrides NumberOfAZs=${NUMBER_OF_AZS} \
    AvailabilityZones=${AVAILABILITY_ZONES} \
    EC2KeyPairName=${EC2_KEYPAIR_NAME} \
    SpinnakerVersion=${SPINNAKER_VERSION} \
    SpinnakerS3BucketName=${S3_BUCKET_NAME} \
    --capabilities CAPABILITY_NAMED_IAM --region ${AWS_REGION}

1.3 Connecting to Spinnaker

1.3.1 Get the SSH command to port forwarding for Deck – the browser-based UI (9000) and Gate – the API Gateway (8084) to access the Spinnaker UI and API.

SPINNAKER_INSTANCE_DNS_NAME=$(aws cloudformation describe-stacks --stack-name ${STACK_NAME} --region ${AWS_REGION} --query "Stacks[].Outputs[?OutputKey=='SpinnakerInstance'].OutputValue" --output text)
echo 'ssh -A -L 9000:localhost:9000 -L 8084:localhost:8084 -L 8087:localhost:8087 -i ~/'${EC2_KEYPAIR_NAME}' [email protected]$'{SPINNAKER_INSTANCE_DNS_NAME}''

1.3.2 Open a new terminal and use the SSH command (output from the previous command) to connect to the Spinnaker instance. After you successfully connect to the Spinnaker instance via SSH, access the Spinnaker UI here and API here.

2. Deploy a demo web application

Let’s make sure that we have the environment variables required in the shell before proceeding. If you’re using the same terminal window as before, then you might already have these variables.

STACK_NAME=spinnaker-blog
AWS_REGION=us-west-2 # use the same region as before
EC2_KEYPAIR_NAME=spinnaker-blog-${AWS_REGION}
VPC_ID=$(aws cloudformation describe-stacks --stack-name ${STACK_NAME} --region ${AWS_REGION} --query "Stacks[].Outputs[?OutputKey=='VPCID'].OutputValue" --output text)

2.1 Create a Spinnaker Application

We start by creating an application in Spinnaker, a placeholder for the service that we deploy.

curl 'http://localhost:8084/tasks' \
-H 'Content-Type: application/json;charset=utf-8' \
--data-raw \
'{
   "job":[
      {
         "type":"createApplication",
         "application":{
            "cloudProviders":"aws",
            "instancePort":80,
            "name":"demoapp",
            "email":"[email protected]",
            "providerSettings":{
               "aws":{
                  "useAmiBlockDeviceMappings":true
               }
            }
         }
      }
   ],
   "application":"demoapp",
   "description":"Create Application: demoapp"
}'

Spin Create Server Group

2.2 Create an Application Load Balancer

Let’s create an Application Load Balanacer and a target group for port 80, spanning the three availability zones in our public subnet. We use the Demo-ALB-SecurityGroup for Firewalls to allow public access to the ALB on port 80.

As Spot Instances are interrupted with a two minute warning, you must adjust the Target Group’s deregistration delay to a slightly lower time. Recommended values are 90 seconds or less. This allows time for in-flight requests to complete and gracefully close existing connections before the instance is interrupted.

curl 'http://localhost:8084/tasks' \
-H 'Content-Type: application/json;charset=utf-8' \
--data-binary \
'{
   "application":"demoapp",
   "description":"Create Load Balancer: demoapp",
   "job":[
      {
         "type":"upsertLoadBalancer",
         "name":"demoapp-lb",
         "loadBalancerType":"application",
         "cloudProvider":"aws",
         "credentials":"my-aws-account",
         "region":"'"${AWS_REGION}"'",
         "vpcId":"'"${VPC_ID}"'",
         "subnetType":"public-subnet",
         "idleTimeout":60,
         "targetGroups":[
            {
               "name":"demoapp-targetgroup",
               "protocol":"HTTP",
               "port":80,
               "targetType":"instance",
               "healthCheckProtocol":"HTTP",
               "healthCheckPort":"traffic-port",
               "healthCheckPath":"/",
               "attributes":{
                  "deregistrationDelay":90
               }
            }
         ],
         "regionZones":[
            "'"${AWS_REGION}"'a",
            "'"${AWS_REGION}"'b",
            "'"${AWS_REGION}"'c"
         ],
         "securityGroups":[
            "Demo-ALB-SecurityGroup"
         ],
         "listeners":[
            {
               "protocol":"HTTP",
               "port":80,
               "defaultActions":[
                  {
                     "type":"forward",
                     "targetGroupName":"demoapp-targetgroup"
                 }
               ]
            }
         ]
      }
   ]
}'

Spin Create ALB

2.3 Create a server group

Before creating a server group (Auto Scaling group), here is a brief overview of the features used in the example:

      • onDemandBaseCapacity (default 0): The minimum amount of your ASG’s capacity that must be fulfilled by On-Demand instances (can also be applied toward Reserved Instances or Savings Plans). The example uses an onDemandBaseCapacity of three.
      • onDemandPercentageAboveBaseCapacity (default 100): The percentages of On-Demand and Spot Instances for additional capacity beyond OnDemandBaseCapacity. The example uses onDemandPercentageAboveBaseCapacity of 10% (i.e. 90% Spot).
      • spotAllocationStrategy: This indicates how you want to allocate instances across Spot Instance pools in each Availability Zone. The example uses the recommended Capacity Optimized strategy. Instances are launched from optimal Spot pools that are chosen based on the available Spot capacity for the number of instances that are launching.
      • launchTemplateOverridesForInstanceType: The list of instance types that are acceptable for your workload. Specifying multiple instance types enables tapping into multiple instance pools in multiple Availability Zones, designed to enhance your service’s availability. You can use the ec2-instance-selector, an open source AWS Command Line Interface(CLI) tool to narrow down the instance types based on resource criteria like vcpus and memory.
      • capacityRebalance: When enabled, this feature proactively manages the EC2 Spot Instance lifecycle leveraging the new EC2 Instance rebalance recommendation. This increases the emphasis on availability by automatically attempting to replace Spot Instances in an ASG before they are interrupted by Amazon EC2. We enable this feature in this example.

Learn more on spinnaker.io: feature descriptions and use cases and sample API requests.

Let’s create a server group with a desired capacity of 12 instances diversified across current and previous generation instance types, attach the previously created ALB, use Demo-EC2-SecurityGroup for the Firewalls which allows http traffic only from the ALB, use the following bash script for UserData to install httpd, and add instance metadata into the index.html.

2.3.1 Save the userdata bash script into a file user-date.sh.

Note that Spinnaker only support base64 encoded userdata. We use base64 bash command to encode the file contents in the next step.

cat << "EOF" > user-data.sh
#!/bin/bash
yum update -y
yum install httpd -y
echo "<html>
    <head>
        <title>Demo Application</title>
        <style>body {margin-top: 40px; background-color: #Gray;} </style>
    </head>
    <body>
        <h2>You have reached a Demo Application running on</h2>
        <ul>
            <li>instance-id: <b> `curl http://169.254.169.254/latest/meta-data/instance-id` </b></li>
            <li>instance-type: <b> `curl http://169.254.169.254/latest/meta-data/instance-type` </b></li>
            <li>instance-life-cycle: <b> `curl http://169.254.169.254/latest/meta-data/instance-life-cycle` </b></li>
            <li>availability-zone: <b> `curl http://169.254.169.254/latest/meta-data/placement/availability-zone` </b></li>
        </ul>
    </body>
</html>" > /var/www/html/index.html
systemctl start httpd
systemctl enable httpd
EOF

2.3.2 Create the server group by running the following command. Note we use the KeyPairName that we created as part of the prerequisites.

curl 'http://localhost:8084/tasks' \
-H 'Content-Type: application/json;charset=utf-8' \
-d \
'{
   "job":[
      {
         "type":"createServerGroup",
         "cloudProvider":"aws",
         "account":"my-aws-account",
         "application":"demoapp",
         "stack":"",
         "credentials":"my-aws-account",
	"healthCheckType": "ELB",
	"healthCheckGracePeriod":600,
	"capacityRebalance": true,
         "onDemandBaseCapacity":3, 
         "onDemandPercentageAboveBaseCapacity":10,
         "spotAllocationStrategy":"capacity-optimized",
         "setLaunchTemplate":true,
         "launchTemplateOverridesForInstanceType":[
            {
               "instanceType":"m4.large"
            },
            {
               "instanceType":"m5.large"
            },
            {
               "instanceType":"m5a.large"
            },
            {
               "instanceType":"m5ad.large"
            },
            {
               "instanceType":"m5d.large"
            },
            {
               "instanceType":"m5dn.large"
            },
            {
               "instanceType":"m5n.large"
            }

         ],
         "capacity":{
            "min":6,
            "max":21,
            "desired":12
         },
         "subnetType":"private-subnet",
         "availabilityZones":{
            "'"${AWS_REGION}"'":[
               "'"${AWS_REGION}"'a",
               "'"${AWS_REGION}"'b",
               "'"${AWS_REGION}"'c"
            ]
         },
         "keyPair":"'"${EC2_KEYPAIR_NAME}"'",
         "securityGroups":[
            "Demo-EC2-SecurityGroup"
         ],
         "instanceType":"m5.large",
         "virtualizationType":"hvm",
         "amiName":"'"$(aws ec2 describe-images --owners amazon --filters "Name=name,Values=amzn2-ami-hvm-2*x86_64-gp2" --query 'reverse(sort_by(Images, &CreationDate))[0].Name' --region ${AWS_REGION} --output text)"'",
         "targetGroups":[
            "demoapp-targetgroup"
         ],
         "base64UserData":"'"$(base64 user-data.sh)"'",,
        "associatePublicIpAddress":false,
         "instanceMonitoring":false
      }
   ],
   "application":"demoapp",
   "description":"Create New server group in cluster demoapp"
}'

Spin Create ServerGroup

Spinnaker creates an Amazon EC2 Launch Template and an ASG with specified parameters and waits until the ALB health check passes before sending traffic to the EC2 Instances.

The server group and launch template that we just created will look like this in Spinnaker UI:

Spin View ServerGroup

The UI also displays capacity type, such as the purchase option for each instance type in the Instance Information section:

Spin View ServerGroup Purchase Options 1Spin View ServerGroup Purchase Options 2

3. Access the application

Copy the Application Load Balancer URL by selecting the tree icon in the right top corner of the server group, and access it in a browser. You can refresh multiple times to see that the requests are going to different instances every time.

Spin Access App

Congratulations! You successfully deployed the demo application on an Amazon EC2 server group diversified across multiple instance types and purchase options.

Moreover, you can clone, modify, disable, and destroy these server groups, as well as use them with Spinnaker pipelines to effectively release new versions of your application.

Cost savings

Check the savings you realized by deploying your demo application on EC2 Spot Instances by going to EC2 console > Spot Requests > Saving Summary.

Spin Spot Savings

Cleanup

To avoid incurring any additional charges, clean up the resources created in the tutorial.

Frist, delete the server group, application load balancer and application in Spinnaker.

curl 'http://localhost:8084/tasks' \
-H 'Content-Type: application/json;charset=utf-8' \
--data-raw \
'{
   "job":[
      {
         "reason":"Cleanup",
         "asgName":"demoapp-v000",
         "moniker":{
            "app":"demoapp",
            "cluster":"demoapp",
            "sequence":0
         },
         "serverGroupName":"demoapp-v000",
         "type":"destroyServerGroup",
         "region":"'"${AWS_REGION}"'",
         "credentials":"my-aws-account",
         "cloudProvider":"aws"
      },
      {
         "cloudProvider":"aws",
         "loadBalancerName":"demoapp-lb",
         "loadBalancerType":"application",
         "regions":[
            "'"${AWS_REGION}"'"
         ],
         "credentials":"my-aws-account",
         "vpcId":"'"${VPC_ID}"'",
         "type":"deleteLoadBalancer"
      },
      {
         "type":"deleteApplication",
         "application":{
            "name":"demoapp",
            "cloudProviders":"aws"
         }
      }
   ],
   "application":"demoapp",
   "description":"Deleting ServerGroup, ALB and Application: demoapp"
}'

Wait for Spinnaker to delete all of the resources before proceeding further. You can confirm this either on the Spinnaker UI or AWS Management Console.

Then delete the Spinnaker infrastructure by running the following command:

aws ec2 delete-key-pair --key-name ${EC2_KEYPAIR_NAME} --region ${AWS_REGION}
rm ~/${EC2_KEYPAIR_NAME}.pem
aws s3api delete-objects \
--bucket ${S3_BUCKET_NAME} \
--delete "$(aws s3api list-object-versions \
--bucket ${S3_BUCKET_NAME} \
--query='{Objects: Versions[].{Key:Key,VersionId:VersionId}}')" #If error occurs, there are no Versions and is OK
aws s3api delete-objects \
--bucket ${S3_BUCKET_NAME} \
--delete "$(aws s3api list-object-versions \
--bucket ${S3_BUCKET_NAME} \
--query='{Objects: DeleteMarkers[].{Key:Key,VersionId:VersionId}}')" #If error occurs, there are no DeleteMarkers and is OK
aws s3 rb s3://${S3_BUCKET_NAME} --force #Delete Bucket
aws cloudformation delete-stack --region ${AWS_REGION} --stack-name ${STACK_NAME}

Conclusion

In this post, we learned about the new Amazon EC2 features recently added to Spinnaker, and how to use them to build diversified and optimized Auto Scaling Groups. We also discussed recommended best practices for EC2 Spot and how they can improve your experience with it.

We would love to hear from you! Tell us about other Continuous Integration/Continuous Delivery (CI/CD) platforms that you want to use with EC2 Spot and/or Auto Scaling Groups by adding an issue on the Spot integrations roadmap.

Amazon Linux 2023, a Cloud-Optimized Linux Distribution with Long-Term Support

Post Syndicated from Sébastien Stormacq original https://aws.amazon.com/blogs/aws/amazon-linux-2023-a-cloud-optimized-linux-distribution-with-long-term-support/

I am excited to announce the general availability of Amazon Linux 2023 (AL2023). AWS has provided you with a cloud-optimized Linux distribution since 2010. This is the third generation of our Amazon Linux distributions.

Every generation of Amazon Linux distribution is secured, optimized for the cloud, and receives long-term AWS support. We built Amazon Linux 2023 on these principles, and we go even further. Deploying your workloads on Amazon Linux 2023 gives you three major benefits: a high-security standard, a predictable lifecycle, and a consistent update experience.

Let’s look at security first. Amazon Linux 2023 includes preconfigured security policies that make it easy for you to implement common industry guidelines. You can configure these policies at launch time or run time.

For example, you can configure the system crypto policy to enforce system-wide usage of a specific set of cipher suites, TLS versions, or acceptable parameters in certificates and key exchanges. Also, the Linux kernel has many hardening features enabled by default.

Amazon Linux 2023 makes it easier to plan and manage the operating system lifecycle. New Amazon Linux major versions will be available every two years. Major releases include new features and improvements in security and performance across the stack. The improvements might include major changes to the kernel, toolchain, GLib C, OpenSSL, and any other system libraries and utilities.

During those two years, a major release will receive an update every three months. These updates include security updates, bug fixes, and new features and packages. Each minor version is a cumulative list of updates that includes security and bug fixes in addition to new features and packages. These releases might include the latest language runtimes such as Python or Java. They might also include other popular software packages such as Ansible and Docker. In addition to these quarterly updates, security updates will be provided as soon as they are available.

Each major version, including 2023, will come with five years of long-term support. After the initial two-year period, each major version enters a three-year maintenance period. During the maintenance period, it will continue to receive security bug fixes and patches as soon as they are available. This support commitment gives you the stability you need to manage long project lifecycles.

The following diagram illustrates the lifecycle of Amazon Linux distributions:

Last—and this policy is by far my favorite—Amazon Linux provides you with deterministic updates through versioned repositories, a flexible and consistent update mechanism. The distribution locks to a specific version of the Amazon Linux package repository, giving you control over how and when you absorb updates. By default, and in contrast with Amazon Linux 2, a dnf update command will not update your installed packages (dnf is the successor to yum). This helps to ensure that you are using the same package versions across your fleet. All Amazon Elastic Compute Cloud (Amazon EC2) instances launched from an Amazon Machine Image (AMI) will have the same version of packages. Deterministic updates also promote usage of immutable infrastructure, where no infrastructure is updated after deployment. When an update is required, you update your infrastructure as code scripts and redeploy a new infrastructure. Of course, if you really want to update your distribution in place, you can point dnf to an updated package repository and update your machine as you do today. But did I tell you this is not a good practice for production workloads? I’ll share more technical details later in this blog post.

How to Get Started
Getting started with Amazon Linux 2023 is no different than with other Linux distributions. You can use the EC2 run-instances API, the AWS Command Line Interface (AWS CLI), or the AWS Management Console, and one of the four Amazon Linux 2023 AMIs that we provide. We support two machine architectures (x86_64 and Arm) and two sizes (standard and minimal). Minimal AMIs contain the most basic tools and utilities to start the OS. The standard version comes with the most commonly used applications and tools installed.

To retrieve the latest AMI ID for a specific Region, you can use AWS Systems Manager get-parameter API and query the /aws/service/ami-amazon-linux-latest/<alias> parameter.

Be sure to replace <alias> with one of the four aliases available:

  • For arm64 architecture (standard AMI): al2023-ami-kernel-default-arm64
  • For arm64 architecture (minimal AMI): al2023-ami-minimal-kernel-default-arm64
  • For x86_64 architecture (standard AMI): al2023-ami-kernel-default-x86_64
  • For x86_64 architecture (minimal AMI): al2023-ami-minimal-kernel-default-x86_64

For example, to search for the latest Arm64 full distribution AMI ID, I open a terminal and enter:

~ aws ssm get-parameters --region us-east-2 --names /aws/service/ami-amazon-linux-latest/al2023-ami-kernel-default-arm64
{
    "Parameters": [
        {
            "Name": "/aws/service/ami-amazon-linux-latest/al2023-ami-kernel-default-arm64",
            "Type": "String",
            "Value": "ami-02f9b41a7af31dded",
            "Version": 1,
            "LastModifiedDate": "2023-02-24T22:54:56.940000+01:00",
            "ARN": "arn:aws:ssm:us-east-2::parameter/aws/service/ami-amazon-linux-latest/al2023-ami-kernel-default-arm64",
            "DataType": "text"
        }
    ],
    "InvalidParameters": []
}

To launch an instance, I use the run-instances API. Notice how I use Systems Manager resolution to dynamically lookup the AMI ID from the CLI.

➜ aws ec2 run-instances                                                                            \
       --image-id resolve:ssm:/aws/service/ami-amazon-linux-latest/al2023-ami-kernel-default-arm64  \
       --key-name my_ssh_key_name                                                                   \
       --instance-type c6g.medium                                                                   \
       --region us-east-2 
{
    "Groups": [],
    "Instances": [
        {
          "AmiLaunchIndex": 0,
          "ImageId": "ami-02f9b41a7af31dded",
          "InstanceId": "i-0740fe8e23f903bd2",
          "InstanceType": "c6g.medium",
          "KeyName": "my_ssh_key_name",
          "LaunchTime": "2023-02-28T14:12:34+00:00",

...(redacted for brevity)
}

When the instance is launched, and if the associated security group allows SSH (TCP 22) connections, I can connect to the machine:

~ ssh [email protected]
Warning: Permanently added '3.145.19.213' (ED25519) to the list of known hosts.
   ,     #_
   ~\_  ####_        Amazon Linux 2023
  ~~  \_#####\       Preview
  ~~     \###|
  ~~       \#/ ___   https://aws.amazon.com/linux/amazon-linux-2023
   ~~       V~' '->
    ~~~         /
      ~~._.   _/
         _/ _/
       _/m/'
Last login: Tue Feb 28 14:14:44 2023 from 81.49.148.9
[[email protected] ~]$ uname -a
Linux ip-172-31-9-76.us-east-2.compute.internal 6.1.12-19.43.amzn2023.aarch64 #1 SMP Thu Feb 23 23:37:18 UTC 2023 aarch64 aarch64 aarch64 GNU/Linux

We also distribute Amazon Linux 2023 as Docker images. The Amazon Linux 2023 container image is built from the same software components that are included in the Amazon Linux 2023 AMI. The container image is available for use in any environment as a base image for Docker workloads. If you’re using Amazon Linux for applications in EC2, you can containerize your applications with the Amazon Linux container image.

These images are available from Amazon Elastic Container Registry (Amazon ECR) and from Docker Hub. Here is a quick demo to start a Docker container using Amazon Linux 2023 from Elastic Container Registry.

$ aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin public.ecr.aws
Login Succeeded
~ docker run --rm -it public.ecr.aws/amazonlinux/amazonlinux:2023 /bin/bash
Unable to find image 'public.ecr.aws/amazonlinux/amazonlinux:2023' locally
2023: Pulling from amazonlinux/amazonlinux
b4265814d5cf: Pull complete 
Digest: sha256:bbd7a578cff9d2aeaaedf75eb66d99176311b8e3930c0430a22e0a2d6c47d823
Status: Downloaded newer image for public.ecr.aws/amazonlinux/amazonlinux:2023
bash-5.2# uname -a 
Linux 9d5b45e9f895 5.15.49-linuxkit #1 SMP PREEMPT Tue Sep 13 07:51:32 UTC 2022 aarch64 aarch64 aarch64 GNU/Linux
bash-5.2# exit 

When pulling from Docker Hub, you can use this command to pull the image: docker pull amazonlinux:2023.

What Are the Main Differences Compared to Amazon Linux 2?
Amazon Linux 2023 has some differences compared to Amazon Linux 2. The documentation explains these differences in detail. The two differences I would like to focus on are dnf and the package management policies.

AL2023 comes with Fedora’s dnf, the successor to yum. But don’t worry, dnf provides similar commands as yum to search, install, or remove packages. Where you used to run the commands yum list or yum install httpd, you may now run dnf list or dnf install httpd. For convenience, we create a symlink for /usr/bin/yum, so you can run your scripts unmodified.

$ which yum
/usr/bin/yum
$ ls -al /usr/bin/yum
lrwxrwxrwx. 1 root root 5 Jun 19 18:06 /usr/bin/yum -> dnf-3

The biggest difference, in my opinion, is the deterministic updates through versioned repositories. By default, the software repository is locked to the AMI version. This means that a dnf update command will not return any new packages to install. Versioned repositories give you the assurance that all machines started from the same AMI ID are identical. Your infrastructure will not deviate from the baseline.

$ sudo dnf update 
Last metadata expiration check: 0:14:10 ago on Tue Feb 28 14:12:50 2023.
Dependencies resolved.
Nothing to do.
Complete!

Yes, but what if you want to update a machine? You have two options to update an existing machine. The cleanest one for your production environment is to create duplicate infrastructure based on new AMIs. As I mentioned earlier, we publish updates for every security fix and a consolidated update every three months for two years after the initial release. Each update is provided as a set of AMIs and their corresponding software repository.

For smaller infrastructure, such as test or development machines, you might choose to update the operating system or individual packages in place as well. This is a three-step process:

  • first, list the available updated software repositories;
  • second, point dnf to a specific software repository;
  • and third, update your packages.

To show you how it works, I purposely launched an EC2 instance with an “old” version of Amazon Linux 2023 from February 2023. I first run dnf check-release-update to list the available updated software repositories.

$ dnf check-release-update
WARNING:
  A newer release of "Amazon Linux" is available.

  Available Versions:

  Version 2023.0.20230308:
    Run the following command to upgrade to 2023.0.20230308:

      dnf upgrade --releasever=2023.0.20230308

    Release notes:
     https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes.html

Then, I might either update the full distribution using dnf upgrade --releasever=2023.0.20230308 or point dnf to the updated repository to select individual packages.

$ dnf check-update --releasever=2023.0.20230308

Amazon Linux 2023 repository                                                    28 MB/s |  11 MB     00:00
Amazon Linux 2023 Kernel Livepatch repository                                  1.2 kB/s | 243  B     00:00

amazon-linux-repo-s3.noarch                          2023.0.20230308-0.amzn2023                amazonlinux
binutils.aarch64                                     2.39-6.amzn2023.0.5                       amazonlinux
ca-certificates.noarch                               2023.2.60-1.0.amzn2023.0.1                amazonlinux
(redacted for brevity)
util-linux-core.aarch64 2.37.4-1.amzn2022.0.1 amazonlinux

Finally, I might run a dnf update <package_name> command to update a specific package.

This might look like overkill for a simple machine, but when managing enterprise infrastructure or large-scale fleets of instances, this facilitates the management of your fleet by ensuring that all instances run the same version of software packages. It also means that the AMI ID is now something that you can fully run through your CI/CD pipelines for deployment and that you have a way to roll AMI versions forward and backward according to your schedule.

Where is Fedora?
When looking for a base to serve as a starting point for Amazon Linux 2023, Fedora was the best choice. We found that Fedora’s core tenets (Freedom, Friends, Features, First) resonate well with our vision for Amazon Linux. However, Amazon Linux focuses on a long-term, stable OS for the cloud, which is a notable different release cycle and lifecycle than Fedora. Amazon Linux 2023 provides updated versions of open-source software, a larger variety of packages, and frequent releases.

Amazon Linux 2023 isn’t directly comparable to any specific Fedora release. The Amazon Linux 2023 GA version includes components from Fedora 34, 35, and 36. Some of the components are the same as the components in Fedora, and some are modified. Other components more closely resemble the components in CentOS Stream 9 or were developed independently. The Amazon Linux kernel, on its side, is sourced from the long-term support options that are on kernel.org, chosen independently from the kernel provided by Fedora.

Like every good citizen in the open-source community, we give back and contribute our changes to upstream distributions and sources for the benefit of the entire community. Amazon Linux 2023 itself is open source. The source code for all RPM packages that are used to build the binaries that we ship are available through the SRPM yum repository (sudo dnf install -y 'dnf-command(download)' && dnf download --source bash)

One More Thing: Amazon EBS Gp3 Volumes
Amazon Linux 2023 AMIs use gp3 volumes by default.

Gp3 is the latest generation general-purpose solid-state drive (SSD) volume for Amazon Elastic Block Store (Amazon EBS). Gp3 provides 20 percent lower storage costs compared to gp2. Gp3 volumes deliver a baseline performance of 3,000 IOPS and 125MB/s at any volume size. What I particularly like about gp3 volumes is that I can now provision performance independently of capacity. When using gp3 volumes, I can now increase IOPS and throughput without incurring charges for extra capacity that I don’t actually need.

With the availability of gp3-backed AL2023 AMIs, this is the first time a gp3-backed Amazon Linux AMI is available. Gp3-backed AMIs have been a common customer request since gp3 was launched in 2020. It is now available by default.

Price and Availability
Amazon Linux 2023 is provided at no additional charge. Standard Amazon EC2 and AWS charges apply for running EC2 instances and other services. This distribution includes full support for five years. When deploying on AWS, our support engineers will provide technical support according to the terms and conditions of your AWS Support plan. AMIs are available in all AWS Regions.

Amazon Linux is the most used Linux distribution on AWS, with hundreds of thousands of customers using Amazon Linux 2. Dozens of Independent Software Vendors (ISVs) and hardware partners are supporting Amazon Linux 2023 today. You can adopt this new version with the confidence that the partner tools you rely on are likely to be supported. We are excited about this release, which brings you an even higher level of security, a predictable release lifecycle, and a consistent update experience.

Now go build and deploy your workload on Amazon Linux 2023 today.

— seb

Realtime monitoring of microservices and cloud-native applications with IBM Instana SaaS on AWS

Post Syndicated from Eduardo Monich Fronza original https://aws.amazon.com/blogs/architecture/realtime-monitoring-of-microservices-and-cloud-native-applications-with-ibm-instana-saas-on-aws/

Customers are adopting microservices architecture to build innovative and scalable applications on Amazon Web Services (AWS). These microservices applications are deployed across multiple AWS services, and customers are looking for comprehensive observability solutions that can help them effectively monitor and manage the performance of their applications in real-time.

IBM Instana is a fully automated application performance management (APM) solution, available to customers as a fully managed software as a service (SaaS) solution on AWS. It is specifically designed to help customers address the challenges of monitoring microservices and cloud-native applications in real-time. It uses artificial intelligence and machine learning to provide detailed insights into the health and behavior of applications, allowing developers and IT teams to gain real-time insights into their microservices applications, optimize performance, and quickly identify and troubleshoot issues.

This post explains the capabilities of IBM Instana to automatically collect observability metrics, traces, and events from microservices deployed on AWS cloud, as well as on-premises, to provide full visibility into the performance of individual components and applications as a whole.

IBM Instana solution overview

IBM Instana is designed to be highly scalable and adaptable to changing microservices applications environments. Its architecture (Figure 1) consists of several components that work together to provide comprehensive monitoring for microservices and cloud-native applications.

Instana’s main building blocks are host agents and agent sensors that are deployed in a customer’s AWS account and responsible for collecting, aggregating, and sending detailed monitoring information of applications and AWS services to the Instana SaaS backend.

The Instana SaaS backend services provide several key components, including data collectors, storage services, analytics engines, and user interfaces. It allows customers to process and analyze data in real-time, generate actionable insights, have a comprehensive view of their applications and infrastructure performance, enabling them to quickly identify and resolve issues and improve their overall operations.

IBM Instana architecture on AWS

Figure 1. IBM Instana architecture on AWS

Monitoring data

Instana monitors and observes microservices and cloud-native applications by collecting beacons, traces, and one-second metrics:

  • Beacons are small monitoring payloads that are transmitted by a JavaScript agent to the Instana servers, modeling specific events occurring within the lifecycle of a page view of a website; for example, page loading, resource retrieval, and HTTP requests.
  • Traces are detailed records of the requests and transactions that flow through a microservice architecture. They record the sequence of events that occur when a request is processed, including the services that are involved, the duration of each service, and any errors or exceptions that occur. Instana automatically correlates traces across services to provide a complete view of an entire transaction. This allows for easy identification and diagnosis of performance issues.
  • Metrics are numerical values that represent the performance and resource utilization of a microservice or infrastructure component. Metrics are collected by Instana Agents and sent to the Instana backend at regular intervals. Instana Agents collect hundreds of different metrics, including (but not limited to) CPU usage, memory usage, network traffic, and disk I/O.

This information is captured by Instana agents and sensors, which also collect application configurations and events, plus discover application building blocks, including clusters, containers, and services.

IBM Instana agents and sensors

The Instana host agent is a lightweight software component that collects and aggregates data from various sensors before sending the data to the Instana backend. It can be deployed to AWS services, including Amazon Elastic Compute Cloud (Amazon EC2), Amazon Elastic Kubernetes Service (Amazon EKS), AWS Fargate, AWS Lambda, or Red Hat OpenShift Service on AWS (ROSA). A single host agent, one per host, is used to collect data from monitored systems.

Once Instana agents are running, they automatically detect applications and services, such as containers running on Amazon EKS, and processes like Nginx, NodeJS, Spring Boot, Postgres, Elasticsearch, or Cassandra. For each component detected, different Instana sensors are automatically downloaded, installed, and configured to monitor the environment.

Instana sensors are small programs that are designed to attach and monitor one specific technology and pass their data to the agent. They are automatically managed, updated, loaded, and unloaded by the host agent.

These sensors can monitor several different AWS services like Lambda, Amazon DynamoDB, Amazon Simple Storage Service (Amazon S3), Amazon Aurora, Amazon Simple Queue Service, and Amazon Managed Streaming for Apache Kafka. They collect data—like request and error rates, latency, CPU utilization—via AWS APIs and Amazon CloudWatch.

Instana also provides sensors to collect data from applications running on AWS, like IBM MQ, IBM Db2, or Red Hat OpenShift Container Platform. Review IBM’s full list of supported technologies and AWS services.

Instana also provides tracers, which are used with runtimes like Java, .NET, NodeJS, plus others. They modify code execution to capture logs, traces at request level, and send those back to the Instana agent.

With the use of sensors, the host agent collects configuration data and monitors the applications it has detected. The host agent also handles communications with the Instana SaaS backend services. It collects, aggregates and sends logs, traces and records metrics (such as response times, error rates, and resource utilization) every second to the Instana SaaS backend in real-time, using secure and efficient communication protocols.

IBM Instana SaaS

The Instana SaaS backend is the heart of the Instana APM solution and responsible for processing, storing, and analyzing the monitoring data collected from the Instana agents and sensors installed in the customer’s infrastructure.

It consists of several components and services that work together to provide real-time monitoring and analysis of microservices applications, including:

  • Data collectors: Receive and process data from the Instana agents and sensors, and store it in the Instana backend for further analysis.
  • Analytics engine: Analyzes the data collected by the agents and sensors to provide insights into the performance and health of the microservices applications.
  • User interface: Web-based interface that customers use to view and analyze their monitoring data.
  • Alerting engine: Generates alerts when thresholds or anomalies are detected in the monitoring data.
  • Data storage: Time-series database that stores the monitoring data collected by the agents and sensors. Allows customers to query and analyze the data in real-time.
  • Integrations: Integrates with various third-party tools, such as Slack, PagerDuty, and ServiceNow, providing seamless alerting and incident management.

IBM Instana backend: making sense of the situation in real time

The Instana SaaS platform automatically ingests data from agents and continuously updates a dependency map (Figure 2). This map presents every dependency in context, giving users an easy way to understand the interrelationships between application components and services.

This understanding enables users to identify the upstream and downstream impacts of any issue, ensuring that they stay informed about any potential impacts.

An example of an IBM Instana dependency map

Figure 2. An example of an IBM Instana dependency map

Instana traces every request end-to-end without sampling. The traces are analyzed in real-time, providing metrics that make any performance problems immediately visible. In the event of an incident, Instana can illustrate how a single issue can generate a ripple effect and impact a number of directly and indirectly connected services. Using the relationship information from the Dynamic Graph, Instana’s automatic root-cause analysis can precisely aggregate the individual issues into a single incident.

Applications monitoring with IBM Instana

Figure 3. Applications monitoring with IBM Instana

Developers, IT operations, or site reliability engineers (SREs) can access the Instana backend end-user monitoring interface (Figure 3) or end-user monitoring (EUM) interface (Figure 4) to view monitoring data of their workloads. These can be websites, mobile applications, AWS services, and infrastructure levels. From this UI, these personas can access service dashboards that show key performance indicators (KPIs), like response time and error rate.

End-user monitoring with IBM Instana

Figure 4. End-user monitoring with IBM Instana

The following actions demonstrate how an EUM for a JavaScript application, deployed to Amazon S3 can be completed:

  • Developers inject Instana JavaScript code (Figure 5) into the static website (HTML).
  • When a user visits the website, the JavaScript agent sends beacons to the Instana backend.
  • Dashboards show specific events of the website lifecycle, including page loading, JS errors, and HTTP requests.
  • Teams access Instana UI to check performance matrices. They can configure Smart Alerts with custom alerting policies based on specific metrics and KPIs.
  • Smart Alerts can send alerts via various channels, such as email, Slack, or IBM Watson AIOps Webhook.
  • In case of an incident, teams can use Instana to retrieve various performance metrics for root-cause analysis.
  • Developers can resolve the issues and apply the patch.
IBM Instana EUM JavaScript agent

Figure 5. IBM Instana EUM JavaScript agent

Instana also offers Smart Alerts (Figure 6) to provide a more intuitive process of managing alerts. With Smart Alerts, customers can automatically generate alerting configurations using relevant KPIs and automatic threshold detection for use cases like website slowness or website errors.

IBM Instana Smart Alerts

Figure 6. IBM Instana Smart Alerts

Conclusion

In this post, we discussed how IBM Instana provides a comprehensive monitoring solution with the right tools to help you implement a real-time observability and monitoring solution. It allows you to gain insight into your microservices and cloud-native applications, including visibility into AWS services, containers, on-premises infrastructure, and other technologies. Instana can quickly identify and resolve issues before they impact end-users, ensuring that your applications are performing optimally.

As an IT administrator, developer, or business owner, IBM Instana on AWS give a deeper understanding of your applications and help you make data-driven decisions to improve overall performance.

Additional resources

Enabling Microsoft Defender Credential Guard on Amazon EC2

Post Syndicated from Sheila Busser original https://aws.amazon.com/blogs/compute/enabling-microsoft-defender-credential-guard-on-amazon-ec2/

This blog post is written by Jason Nicholls, Principal Solutions Architect AWS.

In this post we show you how to enable that Windows Defender Credential Guard (Credential Guard) on Amazon Elastic Compute Cloud (Amazon EC2) running Microsoft Windows Server. Credential Guard, when enabled on Amazon EC2 Windows Instances protects sensitive user login information from being extracted from the Operating System (OS) memory.

Microsoft Windows stores credential material such as authentication tokens in the Local Security Authority (LSA), a process in the Microsoft Windows operating system that is responsible for enforcing the security policy of the system. Traditionally these credentials are stored in LSA’s process memory accessible to the rest of the OS, making it possible for a privileged user to extract these credentials. The use of complex passwords, limiting privileged account use, and training users not to use the same password across multiple systems are steps that limit the use of credentials extracted from the LSA, but customers want to isolate these credentials from other rest of the OS.

With Credential Guard enabled, the LSA is isolated by Windows virtualization-based security (VBS). VBS is a suite of Windows security mechanisms that use hardware virtualization features to create an isolated compute environment to store user credentials referred to as the isolated LSA. The isolated LSA is inaccessible to the rest of the OS.

Prerequisites

Credential Guard requires a virtualization technology that supports Virtualization-based Security (VBS) and Unified Extensible Firmware Interface (UEFI) Secure Boot. Optionally, Credential Guard can leverage a Trusted Platform Module (TPM) to further secure credentials. In this walk-through we will use NitroTPM, a virtual TPM 2.0-compliant (ISO/IEC 11889:2015) module for your Amazon EC2 instances to enhance the security of the isolated LSA.

Launch your instance

A list of Windows Amazon Machine Image (AMI)s preconfigured to enable UEFI Secure Boot is available in our “Launch an instance with UEFI Secure Boot support ” guide. You can verify that your AMI supports UEFI Secure Boot and NitroTPM using the AWS Command Line Interface (AWS CLI) describe-images command as follows:

aws ec2 describe-images --image-ids ami-0123456789

When UEFI Secure Boot and NitroTPM are enabled for the AMI, “TpmSupport“: “v2.0“, and “BootMode”: “uefi” appear in the output respectively, such as in the following example.

{
   "Images": [
      {
         ...
         "BootMode": "uefi",
         "TpmSupport": "v2.0"
      }
   ]
}

Before launching the AMI verify that the instance type you want to launch supports UEFI boot mode and uses the Nitro System by using the DescribeInstanceTypes API call. Using the AWS CLI and calling describe-instance-types as follows:

aws ec2 describe-instance-types --instance-types [INSTANCE_TYPE] --region [REGION]

Where INSTANCE_TYPE is a supported instance type as defined in the documentation, an example is c5.large and REGION is a supported AWS Region.

The output of the command should display nitro as the hypervisor and list uefi as a supported boot mode. For example:

{
   "InstanceTypes": [
   {
      ...
      "Hypervisor": "nitro",
      ...
      "SupportedBootModes": [
                "legacy-bios",
                "uefi"
      ]
   }
}

Use the Amazon EC2 console or AWS Command Line Interface (AWS CLI) to launch an Amazon EC2 instance which has “uefi” boot mode enabled. Administrator access to the Amazon EC2 instance is required to enable Credential Guard.

Walkthrough:

Enabling Credential Guard with Amazon EC2 Launch an Amazon EC2 Windows Instance

Launch an Amazon EC2 Windows Instance using a Windows AMI preconfigured to enable UEFI Secure Boot with Microsoft Windows Secure Boot Keys on an instance type that supports UEFI Secure Boot.

You can use the launch wizard in the Amazon EC2 console or run-instances command via the AWS CLI to launch an instance that can support Credential Guard. You need a compatible AMI ID for launching your instance which is unique for each AWS Region. You can use the following link to discover and launch instances with compatible Amazon-provided AMIs in the Amazon EC2 console:

Enable Credential Guard on the launched Instance

Now that you know how to create an AMI with UEFI Secure Boot support enabled, let’s create a Windows instance and configure Credential Guard.

Credential Guard can be enabled either by using Group Policies, the Windows Registry, or the Windows Defender Device Guard and Windows Defender Credential Guard hardware readiness tool (DG-Readiness tool). In this post we will show you how to enable Credential Guard using the Windows Registry and via the DG-Readiness tool.

Option 1: Enabling Credential Guard via the Windows Registry

Start the Amazon EC2 instance using an AMI that has the BootMode set to “uefi“. Windows AMI must be preconfigured to enable UEFI Secure Boot with Microsoft Windows Secure Boot keys as we defined earlier. Once you’re connected to the instance use the Windows System Information tool to check that Credential Guard isn’t running on the instance:

  1. Select Start, type msinfo32.exe, and then select System Information.
  2. Select System Summary on the left.
  3. Confirm that Virtualization-based security is Not Enabled.

Figure 1 Overview of System Information in WindowsFigure 1 Overview of System Information in Windows

Figure 2 System Information confirming that Credential Guard is Not Enabled

Figure 2 System Information confirming that Credential Guard is Not Enabled

4. Open the Windows Command Shell by selecting Start, type cmd.exe, and then press Enter.

5. Run the following commands from the Windows Command Shell to enable Credential Guard using the Windows Registry:

REG ADD "HKLM\System\CurrentControlSet\Control\DeviceGuard" /v EnableVirtualizationBasedSecurity /t REG_DWORD /d 1 /f

REG ADD "HKLM\System\CurrentControlSet\Control\DeviceGuard" /v RequirePlatformSecurityFeatures /t REG_DWORD /d 1 /f

REG ADD "HKLM\System\CurrentControlSet\Control\LSA" /v LsaCfgFlags /t REG_DWORD /d 2 /f

REG ADD "HKLM\SYSTEM\CurrentControlSet\Control\DeviceGuard\Capabilities\" /v "HyperVEnabled" /t REG_DWORD /d 1 /f

Figure 3 Update the Windows Registry to enable Credential GuardFigure 3 Update the Windows Registry to enable Credential Guard

6. Once the changes have been made, reboot the instance.

7. When the instance is back up, reconnect to the instance and select Start, type msinfo32.exe, and then select System Information.

8. Select System Summary on the left.

9. Note that Virtualisation-based security is now changed to Running, and that Secure Boot and Credential Guard are enabled.

Figure 4 Overview of System Information showing Credential Guard is EnabledFigure 4 Overview of System Information showing Credential Guard is Enabled

Figure 5 System Information shows Credential Guard is now EnabledFigure 5 System Information shows Credential Guard is now Enabled

Option 2: Enabling Credential Guard via the DG-Readiness tool

Option 1 requires a remote desktop session to your Windows Instance. Option 2 is run via PowerShell which can either be done in a remote desktop session as described here or via a remote PowerShell session.

Start the Amazon EC2 instance using an AMI that has the BootMode set to “uefi“. Windows AMI must be preconfigured to enable UEFI Secure Boot with Microsoft Windows Secure Boot keys as we defined earlier. Once you’re connected to the instance start a PowerShell session:

  1. Select Start, type PowerShell, and then click Windows PowerShell.

Figure 6 Start PowerShell via the Start BarFigure 6 Start PowerShell via the Start Bar

      2. Download the DG-Readiness tool by running the command:

wget https://download.microsoft.com/download/B/D/8/BD821B1F-05F2-4A7E-AA03-DF6C4F687B07/dgreadiness_v3.6.zip -outfile dgreadiness.zip

Figure 7 Download the DG-Readiness toolFigure 7 Download the DG-Readiness tool

  1. Uncompress the downloaded zip file using the Expand-Archive function within PowerShell
Expand-Archive -Path C:\Users\Administrator\dgreadiness.zip -DestinationPath C:\dgreadiness
  1. Move the DG-Readiness tool to the current folder

copy C:\dgreadiness\dgreadiness_v3.6\DG_Readiness_Tool_v3.6.ps1 .\

Figure 8 Copy the DG-Readiness tool to the current folderFigure 8 Copy the DG-Readiness tool to the current folder

  1. Confirm that Credential Guard is disabled by running the DG-Readiness tool with the -Ready option, as follows:
DG_Readiness_Tool_v3.6.ps1 -Ready

Figure 9 Using the DG-Readiness tool to confirm Credential Guard is not enabledFigure 9 Using the DG-Readiness tool to confirm Credential Guard is not enabled

  1. Enable Credential Guard using the -Enable -CG options as follows:
DG_Readiness_Tool_v3.6.ps1 -Enable -CG
  1. Reboot the instance
  2. Reconnect to the instance after the reboot and confirm that Credential Guard is now running by running the command:
DG_Readiness_Tool_v3.6.ps1 -Ready

Figure 10 DG Readiness Tool shows Credential Guard is enabled and runningFigure 10 DG Readiness Tool shows Credential Guard is enabled and running

Conclusion

With support for Windows Defender Credential Guard on Amazon EC2 Windows Instances customers can create an isolated compute environment that is inaccessible to the rest of the OS. Credential Guard requires UEFI Secure Boot support. Credential Guard can leverage NitroTPM to further secure credentials.

To learn more about the support of Credential Guard visit documentation.

Using Porting Advisor for Graviton

Post Syndicated from Sheila Busser original https://aws.amazon.com/blogs/compute/using-porting-advisor-for-graviton/

This blog post is written by Ryan Doty Solutions Architect , AWS and Vishal Manan Sr. SSA, EC2 Graviton , AWS.

Introduction

AWS customers recognize that Graviton-based EC2 instances deliver price-performance benefits but many are concerned about the effort to port existing applications. Porting code from one architecture to another can result in a substantial investment in time and effort. AWS has worked continuously to improve the migration process for customers. We recently introduced the Porting Advisor for Graviton as a tool to further simplify the migration process. In this blog, we’ll walk you through how to use Porting Advisor for Graviton so that you can learn how to use it.

Porting Advisor for Graviton is an open-source, command-line tool that analyzes source code and generates a report highlighting missing or outdated libraries and code constructs that may require modification and provides a user with alternative recommendations. It helps customers and developers accelerate their transition to Graviton-based Amazon EC2 instances by reducing the iterative process of identifying and resolving source code and library dependencies. This blog post will provide you with a step-by-step implementation on how to use Porting Advisor for Graviton. At the end of the blog, you will be able to run Porting Advisor for Graviton on your source code tree, generating findings that will help simplify the effort required to port your application.

Porting Advisor for Graviton scans for potentially unsupported or non-portable arm64 code in source code trees. The tool only scans source code files for the programming languages C/C++, Fortran, Go 1.11+, Java 8+, Python 3+, and dependency files such as project/requirements.txt file(s). Most importantly the Porting Advisor doesn’t make any code modifications, API-level recommendations, or send data back to AWS.

You can utilize Porting Advisor for Graviton either as a Python script or compiled into a binary and run on x86-64 or arm64 systems. Therefore, it can be easily implemented as part of your build processes.

Expected Results

Porting Advisor for Graviton reports the following issues:

  1. Inline assembly with no corresponding arm64 inline assembly.
  2. Assembly source files with no corresponding arm64 assembly source files.
  3. Missing arm64 architecture detection in autoconf config.guess scripts.
  4. Linking against libraries that aren’t available on the arm64 architecture.
  5. Use  architecture specific intrinsics.
  6. Preprocessor errors that trigger when compiling on arm64.
  7. Usages of old Visual C++ runtime (Windows specific).

Compiler specific code guarded by compiler specific pre-defined macros is detected, but not reported by default. The following cross-compile specific issues are detected, but not reported by default:

  • Architecture detection that depends on the host rather than the target.
  • Use of build artifacts in the build process.

Skillsets needed for using the tool

Porting Advisor for Graviton is designed to be easy to use. Users though should be versed in the following skills in order to take advantage of the recommendations the tool provides:

  • An understanding of the build system – project requirements and dependencies, versioning, etc.
  • Basic scripting language skills around Python, PowerShell/Bash.
  • Understanding hardware (inline assembly for C/C++) or compiler specific (intrinsic for C/C++) constructs when applicable.
  • The ability to follow best practices in the AWS Graviton Technical Guide for code optimization.

How to use Porting Advisor for Graviton

Prerequisites

The tool requires a minimum version of Python 3.10 and Java (8+ to be installed). The installation of Java is also optional and only required if you want to scan JAR files for native method calls. You can run the tool on a Windows/Linux/MacOS machine, or in an EC2 instance. I will show case usage on Windows and Amazon Linux 2(AL2) running on an EC2 instance. It supports both arm64 and x86-64 processors.

You don’t need to be on an arm64-based processor to run the tool.

The tool doesn't need a lot of CPU Horsepower and even a system with few processors will do

You can run the tool as a Python script or an executable. The executable requires extra steps to build. However, it can be used on another machine without the need to install Python packages.

You must copy the complete “dist” folder for it to work, as there are libraries that are present in that folder along with the executable.

Porting Advisor for Graviton can be run as a binary or as a script. You must run the script to generate the binaries

./porting-advisor-linux-x86_64 ~/my/path/to/my/repo --output report.html 
./porting-advisor-linux-x86_64.exe ../test/CppCode/inline_assembly --output report.html

Running Porting Advisor for Graviton as an executable

The first step is to build the executable.

Building the executable

The first step is to set up the Python Environment. If you run into any errors building the binary, see the Troubleshooting section for more details.

Building the binary on Windows

Building Porting Advisor using Powershell

Building Porting Advisor binary

Building the binary on Linux/Mac

Using shell script to build on Linux or macOS

Porting Advisor binary saved in dist folder

Running the binary

Here you can see how you can run the tool on Linux as a binary for a C++ project.

Porting advisor binary run on a C++ codebase with 350 files

The “dist” folder will have the executable.

Running Porting Advisor for Graviton as a script

Enable the Python environment for the following:

Linux/Mac:

$. python3 -m venv .venv
$. source .venv/bin/activate

PowerShell:

PS> python -m venv .venv
PS> .\.venv\Scripts\Activate.ps1

The following shows how the tool would work on Windows when run as a script:

Running Porting Advisor on Windows as a powershell script

Running the Porting Advisor for Graviton on Linux as a Script

Setting up Python environment to run the Porting Advisor as a script

Running Porting Advisor on Linux as a script

Output of the Porting Advisor for Graviton

The Porting Advisor for Graviton requires the directory parameter to point to the folder where your source code lives. If there are multiple locations, then you can run the tool as part of the script.

If no output file is supplied only standard output will be produced. The following is the output of the tool in HTML format. The first line shows the total files scanned.

  1. If no issues are found, then you’ll see an output like the following:

Results of Porting Advisor run on C++ code with 2836 files with no issues found

  1. With x86-64 specific intrinsics, such as _bswap64, you’ll see it flagged. arm64 specific intrinsics won’t be flagged. Therefore, if your code has arm64 specific intrinsics, then the porting advisor will only flag x86-64 to arm64 and not vice versa. The primary goal of the tool is determining the arm64 readiness of your code.

Porting Advisor reporting inline assembly files in C++ code

  1. The scanner typically looks for source code files only, but it can also look for assembly files with *.s extensions. An example of a file with the C++ code with inline assembly code is as follows:

Porting Advisor reporting use of intrinsics in C++ code

  1. The tool has pointed out errors such as a preprocessor error and architecture-specific intrinsic usage errors.

Porting Advisor results run on C++ code pointing out missing preprocessor macros for arm64 and x86-64 specific intrinsics

Next steps

If you don’t see any issues reported by the tool, then you are in good shape for doing the port. If no issues are reported it does not guarantee that it will work as port. Porting Advisor for Graviton is a tool used best as a helper. Regardless of the issues reported by the tool, we still highly suggest testing the application thoroughly.

As a good practice, we recommend that you use the latest version of your libraries.

Based on the issue, you can consider further actions. For Compiler intrinsic errors, we recommend studying Intel and arm64 intrinsics.

Once you’ve migrated your code and are using Gravition, you can start to look at taking steps to optimize your performance. If you’re interested in doing that please look at our Getting Started Guide.

If you run into any issues, then see our CONTRIBUTING file.

FAQs

  1. How fast is the tool? The tool is able to scan 4048 files in 1.18 seconds.

On an arm64 Based Instance:

Porting Advisor scans 4048 files in 1.18 seconds

  1. False Positives?

This tool scans all files in a source tree, regardless of whether they are included by the build system or not. Therefore, it may misreport issues in files that appear in the source tree but are excluded by the build system. Currently, the tool supports the following languages/dependencies: C/C++, Fortran, Go 1.11+, Java 8+, and Python 3+.

For example: You may have legacy code using Python version 2.7 that is in the source tree but isn’t being used. The tool will scan the code base and point out issues in that particular codebase even though you may not be using that piece of code.  To mitigate, either  remove that folder from the source code or ignore the error pointed by the tool.

  1. I see mention of Ruby and .Net in the Open source tool, but they don’t work on my tool.

Ruby and .Net haven’t been implemented yet, but please consider contributing to it and open an issue requesting support. If you need support then see our CONTRIBUTING file.

Troubleshooting

Errors that you may encounter while building the tool binary:

PyInstaller needs a shared version of libraries.

  1. Python 3.10+ not having shared libraries for use by the PyInstaller tool.

Building Porting Advisor binaries

Pyinstaller failed at building binary and suggesting building Python configure script with --enable-shared on Linux or --enable-framework on macOS

The fix for this is to build your version of Python(3.10+) with the right flags:

./configure --enable-optimizations --enable-shared

If the two flags don’t work together, try doing the build with each flag enabled sequentially.

pyinstaller tool needs python configure script with --enable-shared and enable-optimizations flag

  1. Incorrect Python version (version less than 3.10).If you aren’t on the correct version of Python:

You will get errors similar to the ones here:

Python version on host is 3.7.15 which is less than the recommended version

If you want to run the tool in an EC2 instance on Amazon Linux 2(AL2), then you could try upgrading/installing Python 3.10 as pointed out here.

If you run into any issues, then see our CONTRIBUTING file.

Trying to run Porting Advisor as script will result in Syntax errors on Python version less than 3.10

Conclusion

Porting Advisor for Graviton helps customers quantify the amount of work that is required to port an application. It accelerates your ability to transition to Graviton-based Amazon EC2 instances by reducing the iterative process of identifying and resolving source code and library dependencies.

Resources

To learn how to migrate your workloads to Graviton-based instances, see the AWS Graviton Technical Guide GitHub Repository and AWS Graviton Transition Guide. To get started with Graviton-based Amazon EC2 instances, see the AWS Management Console, AWS Command Line Interface (AWS CLI), and AWS SDKs.

Some other resources include:

AWS Week in Review – February 20, 2023

Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/aws-week-in-review-february-20-2023/

Since the devastating earthquake in Türkiye and Syria, Amazon has activated disaster relief services to quickly provide relief items to impacted areas. The company and Amazon customers have donated nearly 100,000 relief items so far, and donations continue to come in.

The AWS Disaster Preparedness and Response team is providing trained technical volunteers and solutions to Help.NGO, a United Nations standby partner assisting in the region.

We continue to support field requests for winter survival equipment, clothing, hygiene products, and other items. If you wish to donate, check out our blog post to find your local donation site and to learn more about how we’ve supported relief efforts so far. Thank you for your support!

Last Week’s Launches
As usual, let’s take a look at some launches from the last week that I want to remind you of:

New Amazon EC2 M7g and R7g instances – Since we launched C7g instances in May 2022, the General Purpose (M7g) and the Memory-Optimized (R7g) instances are generally available. Both types are powered by the latest generation AWS Graviton3 processors, and are designed to deliver up to 25 percent better performance than the equivalent sixth-generation (M6g and R6g) instances, making them the best performers in Amazon EC2.

Here is my infographic to highlight the principal performance and capacity improvements that we have made available with the new instances:

Enable AWS Systems Manager across all Amazon EC2 instances – All EC2 instances in your account become managed instances, with a single action using the Default Host Management Configuration (DHMC) Agent without changing existing instance profile roles. DHMC is ideal for all EC2 users, and offers a simple, scalable process to standardize the availability of System Manager tools for users who manage many instances. To learn more, see Default Host Management Configuration in the AWS documentation.

Programmatically manage opt-in AWS Regions – You can now view and manage enabled and disabled opt-in AWS Regions on your AWS accounts using AWS APIs. You can enable, disable, read, and list Region opt status by using the following AWS CLI commands in case of enabling Africa (Cape Town) Region:

$ aws account enable-region --region-name af-south-1
$ aws account get-region-opt-status --region-name af-south-1 
{ 
   "RegionName": "af-south-1", 
   "RegionOptStatus": "ENABLING" 
}

It will save you the time and effort of doing it through the AWS Management Console. To learn more, see Specifying which AWS Regions your account can use in the AWS documentation.

Pictured: A 3D rendering of the AWS Modular Data Center (MDC) unit.AWS Modular Data Center (AWS MDC) – AWS MDC is available as a self-contained modular data center unit: an environmentally controlled physical enclosure that can host racks of AWS Outposts or AWS Snow Family devices. AWS MDC lets defense customers run low-latency applications in infrastructure-limited environments for scenarios like large-scale military operations, crisis response, and security cooperation.

At this time, AWS MDC is now available in the AWS GovCloud Regions, and this service can only be purchased by the U.S. Department of Defense under the Joint Warfighting Cloud Capability (JWCC) contract. To learn more, read the AWS Public Sector Blog post.

A picture of a cute English bulldog on top of 3 AWS Snowball Edge device. Amazon EKS Anywhere on Snow – This is a new deployment option that helps you create and operate Kubernetes clusters on AWS Snowball Edge devices for provisioning and familiar operational visibility tooling of container applications deployed at the edge.

Amazon EKS Anywhere on Snow is ideal for customers who run their operations using secure and durable AWS Snow Family devices in unconditioned or mobile environments such as construction sites, ships, and rapidly deployed military forces. To learn more, read the AWS Container Blog post.

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

Other AWS News
Here are some other news items that you may find interesting in the last week:

Upcoming AWS Events
Check your calendars and sign up for these AWS-led events:

AWS at MWC 2023 – Join AWS at MWC23 in Barcelona, Spain, February 27 – March 2, and interact with upcoming innovative new service demonstrations, be inspired at one of our many sessions, or request a more personal meeting with us onsite.

AWS Innovate Data and AI/ML edition – AWS Innovate is a free online event to learn the latest from AWS experts and get step-by-step guidance on using AI/ML to drive fast, efficient, and measurable results. Register now for Asia Pacific & Japan (February 22, 2023), EMEA (March 9), and the Americas (March 14).

AWS Summits – AWS Global Summits are free events that bring the cloud computing community together to connect, collaborate, and learn about AWS. We kick off Paris and Sydney on April 4th and schedule most other Summits from April to June. Please stay tuned and watch for the dates and locations to be announced.

You can browse all upcoming AWS-led in-person, virtual events, and developer focused events such as Community Days.

That’s all for this week. Check back next Monday for another Week in Review!

— Channy

This post is part of our Week in Review series. Check back each week for a quick roundup of interesting news and announcements from AWS!

New Graviton3-Based General Purpose (m7g) and Memory-Optimized (r7g) Amazon EC2 Instances

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-graviton3-based-general-purpose-m7g-and-memory-optimized-r7g-amazon-ec2-instances/

We’ve come a long way since the launch of the m1.small instance in 2006, adding instances with additional memory, compute power, and your choice of Intel, AMD, or Graviton processors. The original general-purpose “one size fits all” instance has evolved into six families, each one optimized for specific uses cases, with over 600 generally available instances in all.

New M7g and R7g
Today I am happy to tell you about the newest Amazon EC2 instance types, the M7g and the R7g. Both types are powered by the latest generation AWS Graviton3 processors, and are designed to deliver up to 25% better performance than the equivalent sixth-generation (M6g and R6g) instances, making them the best performers in EC2.

The M7g instances are for general purpose workloads such as application servers, microservices, gaming servers, mid-sized data stores, and caching fleets. The R7g instances are a great fit for memory-intensive workloads such as open-source databases, in-memory caches, and real-time big data analytics.

Here are the specs for the M7g instances:

Instance Name vCPUs
Memory
Network Bandwidth
EBS Bandwidth
m7g.medium 1 4 GiB up to 12.5 Gbps up to 10 Gbps
m7g.large 2 8 GiB up to 12.5 Gbps up to 10 Gbps
m7g.xlarge 4 16 GiB up to 12.5 Gbps up to 10 Gbps
m7g.2xlarge 8 32 GiB up to 15 Gbps up to 10 Gbps
m7g.4xlarge 16 64 GiB up to 15 Gbps up to 10 Gbps
m7g.8xlarge 32 128 GiB 15 Gbps 10 Gbps
m7g.12xlarge 48 192 GiB 22.5 Gbps 15 Gbps
m7g.16xlarge 64 256 GiB 30 Gbps 20 Gbps
m7g.metal 64 256 GiB 30 Gbps 20 Gbps

And here are the specs for the R7g instances:

Instance Name vCPUs
Memory
Network Bandwidth
EBS Bandwidth
r7g.medium 1 8 GiB up to 12.5 Gbps up to 10 Gbps
r7g.large 2 16 GiB up to 12.5 Gbps up to 10 Gbps
r7g.xlarge 4 32 GiB up to 12.5 Gbps up to 10 Gbps
r7g.2xlarge 8 64 GiB up to 15 Gbps up to 10 Gbps
r7g.4xlarge 16 128 GiB up to 15 Gbps up to 10 Gbps
r7g.8xlarge 32 256 GiB 15 Gbps 10 Gbps
r7g.12xlarge 48 384 GiB 22.5 Gbps 15 Gbps
r7g.16xlarge 64 512 GiB 30 Gbps 20 Gbps
r7g.metal 64 512 GiB 30 Gbps 20 Gbps

Both types of instances are equipped with DDR5 memory, which provides up to 50% higher memory bandwidth than the DDR4 memory used in previous generations. Here’s an infographic that I created to highlight the principal performance and capacity improvements that we have made available with the new instances:

If you are not yet running your application on Graviton instances, be sure to take advantage of the AWS Graviton Ready Program. The partners in this program provide services and solutions that will help you to migrate your application and to take full advantage of all that the Graviton instances have to offer. Other helpful resources include the Porting Advisor for Graviton and the Graviton Fast Start program.

The instances are built on the AWS Nitro System, and benefit from multiple features that enhance security: always-on memory encryption, a dedicated cache for each vCPU, and support for pointer authentication. They also support encrypted EBS volumes, which protect data at rest on the volume, data moving between the instance and the volume, snapshots created from the volume, and volumes created from those snapshots. To learn more about these and other Nitro-powered security features, be sure to read The Security Design of the AWS Nitro System.

On the network side the instances are EBS-Optimized with dedicated networking between the instances and the EBS volumes, and also support Enhanced Networking (read How do I enable and configure enhanced networking on my EC2 instances? for more info). The 16xlarge and metal instances also support Elastic Fabric Adapter (EFA) for applications that need a high level of inter-node communication.

Pricing and Regions
M7g and R7g instances are available today in the US East (N. Virginia), US East (Ohio), US West (Oregon), and Europe (Ireland) AWS Regions in On-Demand, Spot, Reserved Instance, and Savings Plan form.

Jeff;

PS – Launch one today and let me know what you think!

How to create custom health checks for your Amazon EC2 Auto Scaling Fleet

Post Syndicated from Sheila Busser original https://aws.amazon.com/blogs/compute/how-to-create-custom-health-checks-for-your-amazon-ec2-auto-scaling-fleet/

This blog post is written by Gaurav Verma, Cloud Infrastructure Architect, Professional Services AWS.

Amazon EC2 Auto Scaling helps you maintain application availability and lets you automatically add or remove Amazon Elastic Compute Cloud (Amazon EC2) instances according to the conditions that you define. You can use dynamic and predictive scaling to scale-out and scale-in EC2 instances. Auto Scaling helps to maintain the self-healing Amazon EC2 environment for an application where Auto Scaling can use the status of Amazon EC2 health checks to determine if an instance is faulty and needs replacement. Amazon EC2 Auto Scaling provides three types of health checks, which are discussed below.

EC2 Status check: AWS provides two types of health checks for EC2 instances: System status check and instance status check. System status checks monitor the AWS system on which an instance is running. If the problem is with underlying system, AWS will fix the problem. Instance status check monitors the software and network configuration of an instance. If the instance status check fails, then you can fix the problem by following the steps in the troubleshoot instances with failed status checks documentation.

Elastic Load Balancer Health Check: Auto Scaling groups are generally connected to Elastic Load Balancers (ELB). ELB provides the application-level health check by monitoring the endpoint (a webpage, or a health page in a web application) of an application. ELB health check monitors the application and marks the instance unhealthy if there is no response from an instance in the configured time.

Custom Health Check: You can use custom health checks to mark any instance as unhealthy if the instance fails for the check you define. Custom health checks can be used to implement various user requirements, such as the presence of instance tags added upon completion of a required workflow. The user data script is executed at instance boot time, and it can perform additional investigation into whether or not the user requirements are met before confirming that the instance is ready to accept load. For example, this approach could be used to confirm that the instance was successfully integrated with other parts of a complex application stack.

In some cases, a customer may add multiple checks either in the Amazon EC2 AMI or in the boot sequence to keep the instance secure and compliant. These checks can increase the boot time for the EC2 instance, and they can reboot the EC2 instance multiple times before an instance can be marked as compliant. Therefore, in some cases, an EC2 instance boot period can take forty to fifty minutes or longer.

If an EC2 instance isn’t marked as healthy within a defined time, Auto Scaling will mark an instance unhealthy, even though the instance wasn’t yet ready for evaluation. Custom health checks can help manage these situations. You can write the Amazon EC2 user data script to perform the custom health check and force Auto Scaling to wait until the instance is truly healthy (i.e., functional, secure, and compliant).

This blog describes a method to write a custom health check. We write an Amazon EC2 user data script to perform the custom health check and automate it for future EC2 instances in the Auto Scaling group. This script can wait for an instance to successfully complete the boot process and then mark the instance as healthy.

Prerequisites

You must have an AWS Identity and Access Management (IAM) role for Amazon EC2 with an Auto Scaling policy, which has these two actions allowed for the Auto Scaling group:

autoscaling:CompleteLifecycleAction

autoscaling:RecordLifecycleActionHeartbeat

Furthermore, we use the Amazon EC2 Auto Scaling lifecycle hooks. Lifecycle hooks let you create a solutions that are aware of events in Auto Scaling instance lifecycle, and then perform a custom action on instances when the lifecycle event occurs. As mentioned previously, typically a custom health check is needed when determining the workload readiness of an instance would be longer than the usual boot time for an EC2 instance that Auto Scaling assumes. Therefore, we utilize lifecycle hooks to keep the checks running until the instance is marked healthy.

Create custom health check

Let’s look at an example where an instance can only be marked as healthy if the instance has a tag with the key “Compliance-Check” and value “Successful”. Until this tag is both (a) present and (b) carries the value “Successful”, the instance shouldn’t be marked as “InService”.

  1. Create the Auto Scaling launch template for Amazon EC2 Auto Scaling. Name your Launch template “test”. In the additional configuration for user data, use this shell script as text.

The Following script will install the AWS Command Line Interface (AWS CLI) to interact with the AWS tagging and Auto Scaling APIs. Then, the script will run the while loop until the instance has a tag with the key “Compliance-Check” and value “Successful”. Once the instance has a tag, it will mark the instance as healthy and the instance will move into the “InService” state.

#!/bin/bash
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install
#get instance id
instance=$(curl http://169.254.169.254/latest/meta-data/instance-id)

#Checking instance status
while true
do
readystatus=$(aws ec2 describe-instances --instance-ids $instance --filters "Name=tag: Compliance-Check,Values= Successful" |grep -i $instance)
if [[ $readystatus = *"InstanceId"* ]]; then
        echo $readystatus >> /home/ec2-user/user-script-output.txt
        aws autoscaling set-instance-health --instance-id $instance --health-status Healthy
        aws autoscaling complete-lifecycle-action --lifecycle-action-result CONTINUE --instance-id $instance --lifecycle-hook-name test --auto-scaling-group-name my-asg
        break 
else
	aws autoscaling set-instance-health --instance-id $instance --health-status Unhealthy
	sleep 5  
fi

done
  1. Create an Amazon EC2 Auto Scaling group using the AWS CLI with the “test” launch template that you just created and a predefined lifecycle hook. First, create a JSON file “config.json” in a system where you will run the AWS command to create the Auto Scaling group.
{
    "AutoScalingGroupName": "my-asg",
    "LaunchTemplate": {"LaunchTemplateId": "lt-1234567890abcde12"} ,
    "MinSize": 2,
    "MaxSize": 4,
    "DesiredCapacity": 2,
    "VPCZoneIdentifier": "subnet-12345678, subnet-90123456",
    "NewInstancesProtectedFromScaleIn": true,
    "LifecycleHookSpecificationList": [
        {
            "LifecycleHookName": "test",
            "LifecycleTransition": "autoscaling:EC2_INSTANCE_LAUNCHING",
            "HeartbeatTimeout": 300,
            "DefaultResult": "ABANDON"
        }
    ],
    "Tags": [
        {
            "ResourceId": "my-asg",
            "ResourceType": "auto-scaling-group",
            "Key": "Compliance-Check",
            "Value": "UnSuccessful",
            "PropagateAtLaunch": true
        }
    ]
}

To create the Auto Scaling group with the AWS CLI, you must run the following command at the same location where you saved the preceding JSON file. Make sure to replace the relevant subnets that you intend to use in the VPCZoneIdentifier.

>> aws autoscaling create-auto-scaling-group –cli-input-json file://config.json

This command will create the Auto Scaling group with a configuration defined in the JSON file. This Auto Scaling group should have two instances and a lifecycle hook called “test” with a 300 second wait period at the time of launch of an instance.

Tests

Now is the time to test the newly-created instances with a custom health check. Instances in Auto Scaling should be in the “Pending:Wait” stage, not the “InService” stage. Instances will be in this stage for approximately five minutes because we have a lifecycle hook time of 300 seconds in the config.json file.

If the workload readiness evaluation takes more than 300 seconds in your environment, then you can increase the lifecycle hook period to as long as 7200 seconds.

Change the tag value for one instance from “UnSuccessful” to “Successful”. If you’ve changed the tag within the five minutes of instances creation, then the instance should be in the “InService” state and marked as healthy.

Change Compliance-Check value from "UnSuccessful" to "Successful".

This test is a simulation of the situation where the health check of an instance depends on the tag values, and the tag values are only updated if the instance passes all of the checks as per the organization standards. Here we change the tag value manually, but in a real use case scenario, this value would be changed by the booting process when instances are marked as compliant.

Another test case could be that an instance should be marked as healthy if it’s added to the configuration management database, but not before that. For these checks, you can use the API with the curl command and look for the desired result. An example to call an API is in the above script, where it calls the AWS API to get the instance ID.

In case your custom health check script needs more than 7200 seconds, you can use this command to increase the lifecycle hook time:

>> aws autoscaling record-lifecycle-action-heartbeat –lifecycle-hook-name <lh_
name> –auto-scaling-group-name <asg_name> –instance-id <instance_id>

This command will give you the extra time equal to the time that you have configured in the life cycle hook.

Cleanup

Once you successfully test the solution, to avoid ongoing charges for resources you created, you should delete the resources.

To delete the Amazon EC2 Auto Scaling group, run the following command:

aws autoscaling delete-auto-scaling-group --auto-scaling-group-name my-asg

To delete the launch template, run the following command:

aws ec2 delete-launch-template --launch-template-id lt-1234567890abcde12

Delete the role and policy as well if you no longer need it.

Conclusion

EC2 Auto Scaling custom health checks are useful when system or instance health is insufficient, and you want instances to be marked as healthy only after additional checks. Typically, because of these different checks, the Amazon EC2 boot period can be longer than usual, and this may impact the scale-out process when an application needs more resources.

You can start by exploring EC2 Auto Scaling warm pools for these environments. You can keep the healthy marked instances in the warm pool in the Stopped stage. Then, these instances can be brought into the main pool at the time of scale-out without spending time on the boot process and lengthy health check. If you enable scale-in protection, then these healthy instances can move back to the warm pool at the time of scale-in rather than being terminated altogether.

How to choose between CoIP and Direct VPC routing modes on AWS Outposts rack

Post Syndicated from Sheila Busser original https://aws.amazon.com/blogs/compute/how-to-choose-between-coip-and-direct-vpc-routing-modes-on-aws-outposts-rack/

This blog post is written by Sumit Menaria, Senior Hybrid Solutions Architect AWS WWSO Core Services.

AWS Outposts Rack is a fully-managed service that extends AWS infrastructure, services, APIs, and tools to customer premises. By providing local access to AWS managed infrastructure and services, Outposts rack enables customers to build and run applications on premises using the same programming interfaces as in AWS Regions, while using local compute and storage resources for low latency, local data processing, and data residency needs.

There are various data sources on premises that you might want to connect from your Outpost. These sources can include field devices, on-premises databases, mainframes, storage arrays, or end users. Each Outpost supports a single Local Gateway (LGW) construct, which enables connectivity from your Outpost subnets to an on-premises network. Note that this post is specific to Outposts racks and a different method of local communication is used for AWS Outposts servers.

Two different options for facilitating communication between your Outpost based resources and on-premises network: Direct VPC routing and customer-owned IP pool. Both of these are mutually exclusive options, and routing works differently based on your choice of the mode. The two modes are the attributes of the LGW route table that your Outpost subnets’ VPC is associated with, which specifies the communication mode for the Outpost subnets.

Direct VPC routing mode

Direct VPC routing uses the private IP address of the instances in the VPC CIDR block to facilitate communication with your on-premises network. These addresses are advertised to your on-premises network with Border Gateway Protocol (BGP). Advertisement via BGP is only for the private IP addresses that belong to the subnets on your Outpost and have a route pointing to the LGW via the subnet’s route table. This type of routing is the default mode for Outposts Rack. In this mode, the LGW doesn’t perform Network Address Translation (NAT) for instances. Furthermore, you don’t have to assign an Elastic IP address to your Amazon Elastic Compute Cloud (Amazon EC2) instance from a (CoIP) to enable communication with your on-premises resources.

A diagram showing how an EC2 instance on an Outpost communicates with on-premises network using direct VPC routing mode

In this diagram, when the instance Y wants to communicate with an on-premises server, it traverses the LGW and can talk to the on-premises server using its source address (10.0.1.11) in the Subnet CIDR range (10.0.1.0/24) that is advertised over BGP from the LGW to the Customer Network Device. Similarly when the on-premises server wants to initiate communication with the Outpost based EC2 instance, it uses the instance’s private IP address (10.0.1.11) as the destination IP address to set up the connection.

CoIP mode

Utilizing CoIP mode means that you must provide a separate IP address range from your on-premises IP space for AWS to create an address pool, known as a CoIP. With CoIP, when an Outpost based resources, such as EC2 instances, Application Load Balancer (ALB), or Amazon Relational Database Service (Amazon RDS) instances, need to communicate to your on-premises network, the Local Gateway will perform 1:1 NAT from the resource’s private IP address from the Outpost subnet range to an IP address from the CoIP pool. The subnet-to-CoIP address mapping is done by assigning an Elastic IP (EIP) from the CoIP address range allocated for resources such as EC2 instances. To enable the communication with the CoIP pool from the on-premises network, and then the LGW advertises the CoIP pool through BGP over its peering with the Customer Network Device.

A diagram showing how an EC2 instance on an Outpost communicates with on-premises network using CoIP mode

In this diagram, when the instance Y wants to talk to an on-premises server, the traffic traverses the LGW and the source IP address (10.0.1.11) of the instance gets translated to an IP address (192.168.0.11) in the CoIP range that is associated with the instance. Similarly, when the on-premises server initiates the communication, the request will be sent with the CoIP address (192.168.0.11) of the instance as the destination IP address. This will be changed to the instance’s private IP address (10.0.1.11) via NAT at the LGW. The CoIP pool (192.168.0.0/26) is advertised via BGP to the Customer Network Device to provide the route to the on-premises environment for reaching the Outpost based resources.

When to choose CoIP routing mode

CoIP is particularly useful when you want to isolate your Outpost based workloads from the on-premises infrastructure and only need specific resources on the Outpost to be able to communicate to the on-premises infrastructure. This is useful in situations where large enterprise networks have hundreds of IP pools allocated and there is a high chance of overlap between IP addresses allocated to Outpost based VPCs/Subnets and those allocated to on-premises infrastructure. Furthermore, CoIP can act as another layer of security, as you may choose to allocate the CoIP addresses to only the resources which must communicate with the on-premises network. Then you can allocate for the rest of them using the subnet private IP address range for communication within the Outpost or Region based resources.

This means that you don’t need to have the number of IPs in the CoIP pool be equal to the number of resources on your Outpost. For example, you may choose to configure a /26 CoIP range and a /22 pool for subnets to meet your workload requirements.

CoIP mode can also be useful when using an external ALB on Outpost and you want to make it routable through the local internet connectivity. By using a smaller internet routable CoIP address range assignment for your external ALB, you can route traffic to the ALB on the Outpost without needing to traverse through the internet gateway (IGW) in the parent Region.

When to choose Direct VPC routing mode

You can choose Direct VPC routing if you don’t want the operational overhead of managing the additional IP pools for NAT between your Outpost based resources and on-premises network. There are also few applications which may not work well if there is an NAT of IPs between the two endpoints communicating with each other. Some examples you may see are Active Directory communication with on-premises based servers, or iSCSI mount of your instances as an additional storage to on-premises Storage Area Network (SAN). These applications may not work or may need additional tuning if they encounter NATed IP addresses between an Outpost based client on an EC2 instance and on-premises based server for a two-way communication.

When Direct VPC routing mode is used, multiple VPCs can be associated to an Outpost LGW route table, and the Outpost subnets with the LGW as the route target, are automatically advertised to the on-premises network through BGP. Therefore, you must make sure that appropriate IP planning is in place to avoid any overlap of the Outpost VPC/Subnet IP range with the on-premises IP range, as they are directly advertised from LGW toward the Customer Network Device. Having overlapping IP subnets in your network can lead to undesired effects on your application connectivity and you must pay special attention when allocating IP pools for your on-premises and Outpost based VPC address space. You can use Amazon VPC IP Address Manager (IPAM) to plan the IP space of your VPCs and CoIP pools, as well as add on-premises based IP Pools using manual allocation.

Conclusion

You can select either Direct VPC routing or CoIP mode for routing through an Outpost Local Gateway. Since this selection affects the routing for all of the subnets on your Outpost associated with the LGW route table, it should be selected based on your workload requirements and existing IP infrastructure planning. You can also change the LGW route table mode at a later stage. However, that involves network disruption and the creation of a new LGW route table. To learn more about Outposts Racks routing, visit the LGW Route table documentation.

Quick Restoration through Replacing the Root Volumes of Amazon EC2 instances

Post Syndicated from Sheila Busser original https://aws.amazon.com/blogs/compute/quick-restoration-through-replacing-the-root-volumes-of-amazon-ec2/

This blog post is written by Katja-Maja Krödel, IoT Specialist Solutions Architect, and Benjamin Meyer, Senior Solutions Architect, Game Tech.

Customers use Amazon Elastic Compute Cloud (Amazon EC2) instances to develop, deploy, and test applications. To use those instances most effectively, customers have expressed the need to set back their instance to a previous state within minutes or even seconds. They want to find a quick and automated way to manage setting back their instances at scale.

The feature of replacing Root Volumes of Amazon EC2 instances enables customers to replace the root volumes of running EC2 instances to a specific snapshot or its launch state. Without stopping the instance, this allows customers to fix issues while retaining the instance store data, networking, and AWS Identity and Access Management (IAM) configuration. Customers can resume their operations with their instance store data intact. This works for all virtualized EC2 instances and bare metal EC2 Mac instances today.

In this post, we show you how to design your architecture for automated Root Volume Replacement using this Amazon EC2 feature. We start with the automated snapshot creation, continue with automatically replacing the root volume, and finish with how to keep your environment clean after your replacement job succeeds.

What is Root Volume Replacement?

Amazon EC2 enables customers to replace the root Amazon Elastic Block Store (Amazon EBS) volume for an instance without stopping the instance to which it’s attached. An Amazon EBS root volume is replaced to the launch state, or any snapshot taken from the EBS volume itself. This allows issues to be fixed, such as root volume corruption or guest OS networking errors. Replacing the root volume of an instance includes the following steps:

  • A new EBS volume is created from a previously taken snapshot or the launch state
  • Reboot of the instance
  • While rebooting, the current root volume is detached and the new root volume is attached

The previous EBS root volume isn’t deleted and can be attached to an instance for later investigation of the volume. If replacing to a different state of the EBS than the launch state, then a snapshot of the current root volume is used.

An example use case is a continuous integration/continuous deployment (CI/CD) System that builds on EC2 instances to build artifacts. Within this system, you could alter the installed tools on the host and may cause failing builds on the same machine. To prevent any unclean builds, the introduced architecture is used to clean up the machine by replacing the root volume to a previously known good state. This is especially interesting for EC2 Mac Instances, as their Dedicated Host won’t undergo the scrubbing process, and the instance is more quickly restored than launching a fresh EC2 Mac instance on the same host.

Overview

The feature of replacing Root Volumes was introduced in April 2021 and has just been <TBD> extended to work for Bare Metal EC2 Mac Instances. This means that EC2 Mac Instances are included. If you want to reset an EC2 instance to a previously known good state, then you can create Snapshots of your EBS volumes. To reset the root volume to its launch state, no snapshot is needed. For non-root volumes, you can use these Snapshots to create new EBS volumes, and then attach those to your instance as well as detach them. To automate the process of replacing your root volume not only once, but also in a repeatable manner, we’re introducing you to an architecture that can fully-automate this process.

In the case that you use a snapshot to create a new root volume, you must take a new snapshot of that volume to be able to get back to that state later on. You can’t use a snapshot of a different volume to restore to, which is the reason that the architecture includes the automatic snapshot creation of a fresh root volume.

The architecture is built in three steps:

  1. Automation of Snapshot Creation for new EBS volumes
  2. Automation of replacing your Root Volume
  3. Preparation of the environment for the next Root Volume Replacement

The following diagram illustrates the architecture of this solution.

 Architecture of the automated creation of Root Volumes for Amazon EC2 Instances

In the next sections, we go through these concepts to design the automatic Root Volume Replacement Task.

Automation of Snapshot Creation for new EBS volumes

Architecture of the automated creation of Snapshots of new EBS Volumes.

The figure above illustrates the architecture for automatically creating a snapshot of an existing EBS volume. In this architecture, we focus on the automation of creating a snapshot whenever a new EBS root volume is created.

Amazon EventBridge is used to invoke an AWS Lambda function on an emitted createVolume event. For automated reaction to the event, you can add a rule to the EventBridge which will forward the event to an AWS Lambda function whenever a new EBS volume is created. The rule within EventBridge looks like this:

{
  "source": ["aws.ec2"],
  "detail-type": ["EBS Volume Notification"],
  "detail": {
    "event": ["createVolume"]
  }
}

An example event is emitted when an EBS root volume is created, which will then invoke the Lambda function to look like this:

{
   "version": "0",
   "id": "01234567-0123-0123-0123-012345678901",
   "detail-type": "EBS Volume Notification",
   "source": "aws.ec2",
   "account": "012345678901",
   "time": "yyyy-mm-ddThh:mm:ssZ",
   "region": "us-east-1",
   "resources": [
      "arn:aws:ec2:us-east-1:012345678901:volume/vol-01234567"
   ],
   "detail": {
      "result": "available",
      "cause": "",
      "event": "createVolume",
      "request-id": "01234567-0123-0123-0123-0123456789ab"
   }
}

The code of the function uses the resource ARN within the received event and requests resource details about the EBS volume from the Amazon EC2 APIs. Since the event doesn’t include information if it’s a root volume, then you must verify this using the Amazon EC2 API.

The following is a summary of the tasks of the Lambda function:

  1. Extract the EBS ARN from the EventBridge Event
  2. Verify that it’s a root volume of an EC2 Instance
  3. Call the Amazon EC2 API create-snapshot to create a snapshot of the root volume and add a tag replace-snapshot=true

Then, the tag is used to clean up the environment and get rid of snapshots that aren’t needed.

As an alternative, you can emit your own event to EventBridge. This can be used to automatically create snapshots to which you can restore your volume. Instead of reacting to the createVolume event, you can use a customized approach for this architecture.

Automation of replacing your Root Volume

Architecture of the automated creation of Snapshots of new EBS Volumes.

The figure above illustrates the procedure of replacing the EBS root volume. It starts with the event, which is created through the AWS Command Line Interface (AWS CLI), console, or usage of the API. This leads to creating a new volume from a snapshot or using the initial launch state. The EC2 instance is rebooted, and during that time the old root volume is detached and a new volume gets attached as the root volume.

To invoke the create-replace-root-volume-task, you can call the Amazon EC2 API with the following AWS CLI command:

aws ec2 create-replace-root-volume-task --instance-id <value> --snapshot <value> --tag-specifications ResourceType=string,Tags=[{Key=replaced-volume,Value=true}]

If you want to restore to launch state, then omit the --snapshot parameter:

aws ec2 create-replace-root-volume-task --instance-id <value> --tag-specifications ResourceType=string,Tags=[{Key=delete-volume,Value=true}]

After running this command, AWS will create a new EBS volume, add the tag to the old EBS replaced-volume=true, restart your instance, and attach the new volume to the instance as the root volume. The tag is used later to detect old root volumes and clean up the environment.

If this is combined with the earlier explained automation, then the automation will immediately take a snapshot from the new EBS volume. A restore operation can only be done to a snapshot of the current EBS root volume. Therefore, if no snapshot is taken from the freshly restored EBS volume, then no restore operation is possible except the restore to launch state.

Preparation of the Environment for the next Root Volume Replacement

After the task is completed, the old root volume isn’t removed. Additionally, snapshots of previous root volumes can’t be used to restore current root volumes. To clean up your environment, you can schedule a Lambda function which does the following steps:

  • Delete detached EBS volumes with the tag delete-volume=true
  • Delete snapshots with the tag replace-snapshot=true, which aren’t associated with an existing EBS volume

Conclusion

In this post, we described an architecture to quickly restore EC2 instances through Root Volume Replacement. The feature of replacing Root Volumes of Amazon EC2 instances, now including Bare Metal EC2 Mac instances, enables customers to replace the root volumes of running EC2 instances to a specific snapshot or its launch state. Customers can resume their operations with their instance store data intact. We’ve split the process of doing this in an automated and quick manner into three steps: Create a snapshot, run the replacement task, and reset your environment to be prepared for a following replacement task. If you want to learn more about this feature, then see the Announcement of replacing Root Volumes, as well as the documentation for this feature. <TBD Announcement Bare Metal>

Automating your workload deployments in AWS Local Zones

Post Syndicated from Sheila Busser original https://aws.amazon.com/blogs/compute/automating-your-workload-deployments-in-aws-local-zones/

This blog post is written by Enrico Liguori, SA – Solutions Builder , WWPS Solution Architecture.

AWS Local Zones are a type of infrastructure deployment that places compute, storage,and other select AWS services close to large population and industry centers.

We now have a total of 32 Local Zones; 15 outside of the US (Bangkok, Buenos Aires, Copenhagen, Delhi, Hamburg, Helsinki, Kolkata, Lagos, Lima, Muscat, Perth, Querétaro, Santiago, Taipei, and Warsaw) and 17 in the US. We will continue to launch Local Zones in 21 metro areas in 18 countries, including Australia, Austria, Belgium, Brazil, Canada, Colombia, Czech Republic, Germany, Greece, India, Kenya, Netherlands, New Zealand, Norway, Philippines, Portugal, South Africa, and Vietnam.

Customers using AWS Local Zones can provision the infrastructure and services needed to host their workloads with the same APIs and tools for automation that they use in the AWS Region, included the AWS Cloud Development Kit (AWS CDK).

The AWS CDK is an open source software development framework to model and provision your cloud application resources using familiar programming languages, including TypeScript, JavaScript, Python, C#, and Java. For the solution in this post, we use Python.

Overview

In this post we demonstrate how to:

  1. Programmatically enable the Local Zone of your interest.
  2. Explore the supported APIs to check the types of Amazon Elastic Compute Cloud (Amazon EC2) instances available in a specific Local Zone and get their associated price per hour;
  3. Deploy a simple WordPress application in the Local Zone through AWS CDK.

Prerequisites

To be able to try the examples provided in this post, you must configure:

  1. AWS Command Line Interface (AWS CLI)
  2. Python version 3.8 or above
  3. AWS CDK

Enabling a Local Zone programmatically

To get started with Local Zones, you must first enable the Local Zone that you plan to use in your AWS account. In this tutorial, you can learn how to select the Local Zone that provides the lowest latency to your site and understand how to opt into the Local Zone from the AWS Management Console.

If you prefer to interact with AWS APIs programmatically, then you can enable the Local Zone of your interest by calling the ModifyAvailabilityZoneGroup API through the AWS CLI or one of the supported AWS SDKs.

The following examples show how to opt into the Atlanta Local Zone through the AWS CLI and through the Python SDK:

AWS CLI:

aws ec2 modify-availability-zone-group \
  --region us-east-1 \
  --group-name us-east-1-atl-1 \
  --opt-in-status opted-in

Python SDK:

ec2 = boto3.client('ec2', config=Config(region_name='us-east-1'))
response = ec2.modify_availability_zone_group(
                  GroupName='us-east-1-atl-1',
                  OptInStatus='opted-in'
           )

The opt in process takes approximately five minutes to complete. After this time, you can confirm the opt in status using the DescribeAvailabilityZones API.

From the AWS CLI, you can check the enabled Local Zones with:

aws ec2 describe-availability-zones --region us-east-1

Or, once again, we can use one of the supported SDKs. Here is an example using Phyton:

ec2 = boto3.client('ec2', config=Config(region_name='us-east-1'))
response = ec2.describe_availability_zones()

In both cases, a JSON object similar to the following, will be returned:

{
"State": "available",
"OptInStatus": "opted-in",
"Messages": [],
"RegionName": "us-east-1",
"ZoneName": "us-east-1-atl-1a",
"ZoneId": "use1-atl1-az1",
"GroupName": "us-east-1-atl-1",
"NetworkBorderGroup": "us-east-1-atl-1",
"ZoneType": "local-zone",
"ParentZoneName": "us-east-1d",
"ParentZoneId": "use1-az4"
}

The OptInStatus confirms that we successful enabled the Atlanta Local Zone and that we can now deploy resources in it.

How to check available EC2 instances in Local Zones

The set of instance types available in a Local Zone might change from one Local Zone to another. This means that before starting deploying resources, it’s a good practice to check which instance types are supported in the Local Zone.

After enabling the Local Zone, we can programmatically check the instance types that are available by using DescribeInstanceTypeOfferings. To use the API with Local Zones, we must pass availability-zone as the value of the LocationType parameter and use a Filter object to select the correct Local Zone that we want to check. The resulting AWS CLI command will look like the following example:

aws ec2 describe-instance-type-offerings --location-type "availability-zone" --filters 
Name=location,Values=us-east-1-atl-1a --region us-east-1

Using Python SDK:

ec2 = boto3.client('ec2', config=Config(region_name='us-east-1'))
response = ec2.describe_instance_type_offerings(
      LocationType='availability-zone',
      Filters=[
            {
            'Name': 'location',
            'Values': ['us-east-1-atl-1a']
            }
            ]
      )

How to check prices of EC2 instances in Local Zones

EC2 instances and other AWS resources in Local Zones will have different prices than in the parent Region. Check the pricing page for the complete list of pricing options and associated price-per-hour.

To access the pricing list programmatically, we can use the GetProducts API. The API returns the list of pricing options available for the AWS service specified in the ServiceCode parameter. We also recommend defining Filters to restrict the number of results returned. For example, to retrieve the On-Demand pricing list of a T3 Medium instance in Atlanta from the AWS CLI, we can use the following:

aws pricing get-products --format-version aws_v1 --service-code AmazonEC2 --region us-east-1 \
--filters 'Type=TERM_MATCH,Field=instanceType,Value=t3.medium' \
--filters 'Type=TERM_MATCH,Field=location,Value=US East (Atlanta)'

Similarly, with Python SDK we can use the following:

pricing = boto3.client('pricing',config=Config(region_name="us-east-1")) response = pricing.get_products(
         ServiceCode='AmazonEC2',
         Filters= [
          {
          "Type": "TERM_MATCH",
          "Field": "instanceType",
          "Value": "t3.medium"
          },
          {
          "Type": "TERM_MATCH",
          "Field": "regionCode",
          "Value": "us-east-1-atl-1"
          }
        ],
         FormatVersion='aws_v1',
)

Note that the Region specified in the CLI command and in Boto3, is the location of the AWS Price List service API endpoint. This API is available only in us-east-1 and ap-south-1 Regions.

Deploying WordPress in Local Zones using AWS CDK

In this section, we see how to use the AWS CDK and Python to deploy a simple non-production WordPress installation in a Local Zone.

Architecture overview

architecture overview

The AWS CDK stack will deploy a new standard Amazon Virtual Private Cloud (Amazon VPC) in the parent Region (us-east-1) that will be extended to the Local Zone. This creates two subnets associated with the Atlanta Local Zone: a public subnet to expose resources on the Internet, and a private subnet to host the application and database layers. Review the AWS public documentation for a definition of public and private subnets in a VPC.

The application architecture is made of the following:

  • A front-end in the private subnet where a WordPress application is installed, through a User Data script, in a type T3 medium EC2 instance.
  • A back-end in the private subnet where MySQL database is installed, through a User Data script, in a type T3 medium EC2 instance.
  • An Application Load Balancer (ALB) in the public subnet that will act as the entry point for the application.
  • A NAT instance to allow resources in the private subnet to initiate traffic to the Internet.

Clone the sample code from the AWS CDK examples repository

We can clone the AWS CDK code hosted on GitHub with:

$ git clone https://github.com/aws-samples/aws-cdk-examples.git

Then navigate to the directory aws-cdk-examples/python/vpc-ec2-local-zones using the following:

$ cd aws-cdk-examples/python/vpc-ec2-local-zones

Before starting the provisioning, let’s look at the code in the following sections.

Networking infrastructure

The networking infrastructure is usually the first building block that we must define. In AWS CDK, this can be done using the VPC construct:

import aws_cdk.aws_ec2 as ec2
vpc = ec2.Vpc(
            self,
            "Vpc",
            cidr=”172.31.100.0/24”,
            subnet_configuration=[
                ec2.SubnetConfiguration(
                    name = 'Public-Subnet',
                    subnet_type = ec2.SubnetType.PUBLIC,
                    cidr_mask = 26,
                ),
                ec2.SubnetConfiguration(
                    name = 'Private-Subnet',
                    subnet_type = ec2.SubnetType.PRIVATE_ISOLATED,
                    cidr_mask = 26,
                ),
            ]      
        )

Together with the VPC CIDR (i.e. 172.31.100.0/24), we define also the subnets configuration through the subnet_configuration parameter.

Note that in the subnet definitions above there is no specification of the Availability Zone or Local Zone that we want to associate them with. We can define this setting at the VPC level, overwriting the availability_zones method as shown here:

@property
def availability_zones(self):
   return [“us-east-1-atl-1a”]

As an alternative, you can use a Local Zone Name as the value of the availability_zones parameter in each Subnet definition. For a complete list of Local Zone Names, check out the Zone Names on the Local Zones Locations page.

Specifying ec2.SubnetType.PUBLIC  in the subnet_type parameter, AWS CDK  automatically creates an Internet Gateway (IGW) associated with our VPC and a default route in its routing table pointing to the IGW. With this setup, the Internet traffic will go directly to the IGW in the Local Zone without going through the parent AWS Region. For other connectivity options, check the AWS Local Zone User Guide.

The last piece of our networking infrastructure is a self-managed NAT instance. This will allow instances in the private subnet to communicate with services outside of the VPC and simultaneously prevent them from receiving unsolicited connection requests.

We can implement the best practices for NAT instances included in the AWS public documentation using a combination of parameters of the Instance construct, as shown here:

nat = ec2.Instance(self, "NATInstanceInLZ",
                 vpc=vpc,
                 security_group=self.create_nat_SG(vpc),
                 instance_type=ec2.InstanceType.of(ec2.InstanceClass.T3, ec2.InstanceSize.MEDIUM),
                 machine_image=ec2.MachineImage.latest_amazon_linux(),
                 user_data=ec2.UserData.custom(user_data),
                 vpc_subnets=ec2.SubnetSelection(availability_zones=[“us-east-1-atl-1a”], subnet_type=ec2.SubnetType.PUBLIC),
                 source_dest_check=False
                )

In the previous code example, we specify the following as parameters:

The final required step is to update the route table of the private subnet with the following:

priv_subnet.add_route("DefRouteToNAT",
            router_id=nat_instance.instance_id,
            router_type=ec2.RouterType.INSTANCE,
            destination_cidr_block="0.0.0.0/0",
            enables_internet_connectivity=True)

The application stack

The other resources, including the front-end instance managed by AutoScaling, the back-end instance, and ALB are deployed using the standard AWS CDK constructs. Note that the ALB service is only available in some Local Zones. If you plan to use a Local Zone where ALB isn’t supported, then you must deploy a load balancer on a self-managed EC2 instance, or use a load balancer available in AWS Marketplace.

Stack deployment

Next, let’s go through the AWS CDK bootstrapping process. This is required only for the first time that we use AWS CDK in a specific AWS environment (an AWS environment is a combination of an AWS account and Region).

$ cdk bootstrap

Now we can deploy the stack with the following:

$ cdk deploy

After the deployment is completed, we can connect to the application with a browser using the URL returned in the output of the cdk deploy command:

terminal screenshot

The WordPress install wizard will be displayed in the browser, thereby confirming that the deployment worked as expected:

The WordPress install wizard

Note that in this post we use the Local Zone in Atlanta. Therefore, we must deploy the stack in its parent Region, US East (N. Virginia). To select the Region used by the stack, configure the AWS CLI default profile.

Cleanup

To terminate the resources that we created in this post, you can simply run the following:

$ cdk destroy

Conclusion

In this post, we demonstrated how to interact programmatically with the different AWS APIs available for Local Zones. Furthermore, we deployed a simple WordPress application in the Atlanta Local Zone after analyzing the AWS CDK code used for the deployment.

We encourage you to try the examples provided in this post and get familiar with the programmatic configuration and deployment of resources in a Local Zone.

Streaming the AWS Wickr desktop client with Amazon AppStream 2.0

Post Syndicated from Charles H. original https://aws.amazon.com/blogs/architecture/streaming-the-aws-wickr-desktop-client-with-amazon-appstream-2-0/

Amazon Web Services (AWS) customers using AWS Wickr who want to find a way to access their AWS Wickr Windows desktop client though a web browser, can use Amazon AppStream 2.0 to stream the application through to their users.

Using this architecture, you can provide lightweight access to the AWS Wickr desktop client for users that cannot install it onto their local device. By using AppStream 2.0, you can focus on managing your AWS Wickr network while AppStream 2.0 manages the AWS resources required to host and run the application, scaling automatically and providing on-demand access to your users.

If you want to ensure that AWS Wickr user data persists between streaming sessions, you can make use of AppStream 2.0 user persistence to securely save user data (including AWS Wickr client data) to Amazon Simple Storage Service (Amazon S3).

In this post, we discuss how to build an AppStream 2.0 image for AWS Wickr on Windows, enable persistence for users, and deploy a stack.

Solution overview

These steps will illustrate deploying an AppStream 2.0 Windows Image Builder to a single availability zone. Then we deploy an AppStream 2.0 fleet with internet access and user data persistence to three availability zones for high availability.

Review the Regions and Availability Zones documentation and the AWS Regional Services List to choose the best Region for your deployment, as well as the networking and bandwidth requirements for User Connections to Amazon AppStream 2.0.

AWS Wickr with Amazon AppStream 2.0 Architecture

Figure 1. AWS Wickr with Amazon AppStream 2.0 Architecture

Cost

The costs associated with using AWS services when deploying AppStream 2.0 and AWS Wickr in your AWS account can be estimated on the pricing pages for the services used.

Walkthrough

This walkthrough takes you from installing the AWS Wickr for Windows desktop client onto an AppStream 2.0 Image Builder, to configuring the client for persistence, and finally to deploying a fleet and stack for your users to consume:

  1. Install AWS Wickr for Windows client onto your AppStream 2.0 Windows Image Builder.
  2. Configure the client using the Image Assistant to set up user data and settings persistence.
  3. Create an on-demand instance streaming fleet.
  4. Create a user stack and enable user data and settings persistence.
  5. Create a user pool.
  6. Test your streaming application and prove data persistence.

Prerequisites

You should have the following prerequisites:

  • Familiarity with AppStream 2.0 and an existing AWS Wickr account with associated credentials. Currently, it is not possible for an AWS Wickr user to register their account over AppStream 2.0.
  • An AWS account with access to AppStream 2.0. Instructions for setting up an AWS account can be found at Setting Up for Amazon AppStream 2.0.
  • A running AppStream 2.0 image builder, based on the WinServer2019-10-05-2022 base image with the AWS Wickr for Windows client installer downloaded to it.
  • An up-to-date Google Chrome browser (if you want to use your device’s webcam).

Install AWS Wickr

AppStream 2.0 uses Amazon Elastic Compute Cloud (Amazon EC2) instances to stream applications. You launch instances from base images, called image builders. In this step you will install the AWS Wickr client onto your image builder before moving on to configuration.

  1. Connect to your image builder as an Administrator.
  2. Run the AWS Wickr installer and carry out the following steps:
    1. On the Welcome to the AWS Wickr Setup Wizard prompt, choose Next.
    2. For Installation Type, choose Everybody (all users).
    3. Choose Next to accept the default installation folder.
    4. Choose Install.
  3. Once the AWS Wickr client has finished installing, uncheck Launch Wickr on the final screen, and then choose Finish.

Run the Image Assistant

To create your own custom image, connect to an image builder instance, install and configure your applications for streaming with guidance from the Image Assistant, and then create your image by creating a snapshot of the image builder instance:

  1. Run the Image Assistant shortcut found on the desktop.
  2. Under 1. ADD APPS, choose Add App.
  3. Choose Program Files, Amazon Web Services, Wickr, and AWS Wickr.
  4. Scroll to the bottom, choose the Wickr application, and then choose Open.
  5. In the pop-up that appears, customize the Name and Display Name (if needed), and then in the Launch Parameters box, enter -datalocation “C:\Users\%username%” (Figure 2).
    Updating Launch Parameters

    Figure 2. Updating Launch Parameters

    Note: If you want to mask the name of the application within the URL used for streaming, replace the Display Name field text with something else.

  6. Choose Save.
  7. Wickr will now appear as an app on the Image Assistant screen. Choose Next.
  8. Under the 2. CONFIGURE APPS stage, follow the instructions from 1-5, choose Save settings, and choose Next (Figure 3).

    Configuring the Application with Image Assistant

    Figure 3. Configuring the Application with Image Assistant

  9. Under the 3. TEST stage, carry out steps 1-3.
  10. Under the 4. OPTIMIZE stage, select Launch.
  11. As instructed, once the app has launched successfully, select Continue and wait for the app to be optimized.
  12. Under the 5. CONFIGURE IMAGE stage, give the image that will be created a Name, Display name, and Description.
  13. Choose the Always use latest agent version checkbox. This ensures that new image builders or fleet instances that are launched from your image always use the latest AppStream 2.0 agent version, and then select Next (Figure 4).

    Finalizing the Application Image

    Figure 4. Finalizing the Application Image

  14. Under the 6. REVIEW stage, select Disconnect and Create Image. Your session will be terminated while your image is being created.

Create an AppStream 2.0 fleet

With AppStream 2.0, you create fleet instances and stacks as part of the process of streaming applications. A fleet consists of streaming instances that run the image that you specify.

  1. Return to the AppStream 2.0 management console.
  2. Choose Fleets from the menu on the left.
  3. Choose your Fleet type (this demonstration uses On-Demand) and choose Next.
  4. Give the Fleet a Name, Display name, and a Description.
    Note: If you wish to mask the name of the fleet within the URL used for streaming, replace the Name and Display name section with something else.
  5. Choose your Fleet instance type (this walkthrough uses a general purpose stream.standard.large instance type).
  6. Adjust the Fleet capacity as required, leave the other sections as they’re displayed by default, and choose Next.
  7. On the Choose an Image screen, choose the image you created earlier.
  8. On the Configure network screen, choose Enable default internet access and choose your VPC, along with three subnets that you will deploy into. Finally, choose the Security group that you will use to restrict access (the default is to allow access from all IPs). Choose Next.
  9. Review your settings and then choose Create fleet.

Create an AppStream 2.0 stack

A stack consists of an associated fleet, user access policies, and storage configurations. In this step, you will create a stack, enable application settings persistence, and associate the stack with the fleet you provisioned previously.

  1. From the menu in the AppStream 2.0 management console, choose Stacks.
  2. Choose Create Stack.
  3. Give the stack a Name, and optionally a Display name and Description.
  4. Leave all other options as they’re displayed by default and choose Next.
  5. In the Enable storage window, ensure that Enable home folders is selected, leave all the settings as they are, and choose Next.
  6. In the Edit user settings section, modify the copy and paste functionality to your requirements, and ensure that Enable application settings persistence is selected. Choose Next.
  7. Review your configuration and choose Create stack.
  8. You will now be presented with an overview of the stack you have created. Choose the Action dropdown list and choose Associate fleet.
  9. From the dropdown list, choose the fleet that you previously provisioned and choose Associate.

Create an AppStream 2.0 user pool

Users can access application stacks through a persistent URL and login credentials by using their email address and a password that they choose. In this step, you will create a user and assign it to a stack so you can access your AppStream 2.0 streaming session.

  1. Choose User pool.
  2. Choose Create user.
  3. Enter an email address, first name, and last name, and then choose Create User.
  4. In the User pool window, choose the User and then choose Action.
  5. Choose Assign stack, choose the newly created stack, and choose Send email notification to user.
  6. Choose Assign stack.
  7. You will receive two emails. Follow the instructions on the one titled Start accessing your apps using AppStream 2.0 to access your app.

Launch your AppStream 2.0 streaming session

In this step, you will use the user created earlier to log in to an AppStream 2.0 streaming session. You will then prove AWS Wickr user data persistence by exiting and logging back into your session.

Note: You will only be able to launch your session once your fleet has been provisioned. This can take around 15-20 minutes.

  1. Choose the login page link from the email you received in the previous section and log in.
  2. You will be presented with the AWS Wickr client icon. Choose it to start your session.
  3. Log in to the AWS Wickr client with your credentials.
  4. As you have application persistence enabled, you can close the tab and the session will pick up from where you left it when you log back in (Figure 5).
Accessing the Streaming Application

Figure 5. Accessing the Streaming Application

Cleanup

To avoid incurring future charges, delete the stack, user, fleet, custom image, and image builder that you have created.

Conclusion

In this post, we demonstrated how customers can take advantage of AppStream 2.0 as a managed service to enable the provisioning of AWS Wickr clients for users, with persistence between sessions, through a web browser.

Building Sustainable, Efficient, and Cost-Optimized Applications on AWS

Post Syndicated from Sheila Busser original https://aws.amazon.com/blogs/compute/building-sustainable-efficient-and-cost-optimized-applications-on-aws/

This blog post is written by Isha Dua Sr. Solutions Architect AWS, Ananth Kommuri Solutions Architect AWS, and Dr. Sam Mokhtari Sr. Sustainability Lead SA WA for AWS.

Today, more than ever, sustainability and cost-savings are top of mind for nearly every organization. Research has shown that AWS’ infrastructure is 3.6 times more energy efficient than the median of U.S. enterprise data centers and up to five times more energy efficient than the average in Europe. That said, simply migrating to AWS isn’t enough to meet the Environmental, Social, Governance (ESG) and Cloud Financial Management (CFM) goals that today’s customers are setting. In order to make conscious use of our planet’s resources, applications running on the cloud must be built with efficiency in mind.

That’s because cloud sustainability is a shared responsibility. At AWS, we’re responsible for optimizing the sustainability of the cloud – building efficient infrastructure, enough options to meet every customer’s needs, and the tools to manage it all effectively. As an AWS customer, you’re responsible for sustainability in the cloud – building workloads in a way that minimizes the total number of resource requirements and makes the most of what must be consumed.

Most AWS service charges are correlated with hardware usage, so reducing resource consumption also has the added benefit of reducing costs. In this blog post, we’ll highlight best practices for running efficient compute environments on AWS that maximize utilization and decrease waste, with both sustainability and cost-savings in mind.

First: Measure What Matters

Application optimization is a continuous process, but it has to start somewhere. The AWS Well Architected Framework Sustainability pillar includes an improvement process that helps customers map their journey and understand the impact of possible changes. There is a saying “you can’t improve what you don’t measure.”, which is why it’s important to define and regularly track metrics which are important to your business. Scope 2 Carbon emissions, such as those provided by the AWS Customer Carbon Footprint Tool, are one metric that many organizations use to benchmark their sustainability initiatives, but they shouldn’t be the only one.

Even after AWS meets our 2025 goal of powering our operations with 100% renewable energy, it’s still be important to maximize the utilization and minimize the consumption of the resources that you use. Just like installing solar panels on your house, it’s important to limit your total consumption to ensure you can be powered by that energy. That’s why many organizations use proxy metrics such as vCPU Hours, storage usage, and data transfer to evaluate their hardware consumption and measure improvements made to infrastructure over time.

In addition to these metrics, it’s helpful to baseline utilization against the value delivered to your end-users and customers. Tracking utilization alongside business metrics (orders shipped, page views, total API calls, etc) allows you to normalize resource consumption with the value delivered to your organization. It also provides a simple way to track progress towards your goals over time. For example, if the number of orders on your ecommerce site remained constant over the last month, but your AWS infrastructure usage decreased by 20%, you can attribute the efficiency gains to your optimization efforts, not changes in your customer behavior.

Utilize all of the available pricing models

Compute tasks are the foundation of many customers’ workloads, so it typically sees biggest benefit by optimization. Amazon EC2 provides resizable compute across a wide variety of compute instances, is well-suited to virtually every use case, is available via a number of highly flexible pricing options. One of the simplest changes you can make to decrease your costs on AWS is to review the purchase options for the compute and storage resources that you already use.

Amazon EC2 provides multiple purchasing options to enable you to optimize your costs based on your needs. Because every workload has different requirements, we recommend a combination of purchase options tailored for your specific workload needs. For steady-state workloads that can have a 1-3 year commitment, using Compute Savings Plans helps you save costs, move from one instance type to a newer, more energy-efficient alternative, or even between compute solutions (e.g., from EC2 instances to AWS Lambda functions, or AWS Fargate).

EC2 Spot instances are another great way to decrease cost and increase efficiency on AWS. Spot Instances make unused Amazon EC2 capacity available for customers at discounted prices. At AWS, one of our goals it to maximize utilization of our physical resources. By choosing EC2 Spot instances, you’re running on hardware that would otherwise be sitting idle in our datacenters. This increases the overall efficiency of the cloud, because more of our physical infrastructure is being used for meaningful work. Spot instances use market-based pricing that changes automatically based on supply and demand. This means that the hardware with the most spare capacity sees the highest discounts, sometimes up to XX% off on-demand prices, to encourage our customers to choose that configuration.

Savings Plans are ideal for predicable, steady-state work. On-demand is best suited for new, stateful, and spiky workloads which can’t be instance, location, or time flexible. Finally, Spot instances are a great way to supplement the other options for applications that are fault tolerant and flexible. AWS recommends using a mix of pricing models based on your workload needs and ability to be flexible.

By using these pricing models, you’re creating signals for your future compute needs, which helps AWS better forecast resource demands, manage capacity, and run our infrastructure in a more sustainable way.

Choose efficient, purpose-built processors whenever possible

Choosing the right processor for your application is as equally important consideration because under certain use cases a more powerful processor can allow for the same level of compute power with a smaller carbon footprint. AWS has the broadest choice of processors, such as Intel – Xeon scalable processors, AMD – AMD EPYC processors, GPU’s FPGAs, and Custom ASICs for Accelerated Computing.

AWS Graviton3, AWS’s latest and most power-efficient processor, delivers 3X better CPU performance per-watt than any other processor in AWS, provides up to 40% better price performance over comparable current generation x86-based instances for various workloads, and helps customers reduce their carbon footprint. Consider transitioning your workload to Graviton-based instances to improve the performance efficiency of your workload (see AWS Graviton Fast Start and AWS Graviton2 for ISVs). Note the considerations when transitioning workloads to AWS Graviton-based Amazon EC2 instances.

For machine learning (ML) workloads, use Amazon EC2 instances based on purpose-built Amazon Machine Learning (Amazon ML) chips, such as AWS TrainiumAWS Inferentia, and Amazon EC2 DL1.

Optimize for hardware utilization

The goal of efficient environments is to use only as many resources as required in order to meet your needs. Thankfully, this is made easier on the cloud because of the variety of instance choices, the ability to scale dynamically, and the wide array of tools to help track and optimize your cloud usage. At AWS, we offer a number of tools and services that can help you to optimize both the size of individual resources, as well as scale the total number of resources based on traffic and load.

Two of the most important tools to measure and track utilization are Amazon CloudWatch and the AWS Cost & Usage Report (CUR). With CloudWatch, you can get a unified view of your resource metrics and usage, then analyze the impact of user load on capacity utilization over time. The Cost & Usage Report (CUR) can help you understand which resources are contributing the most to your AWS usage, allowing you to fine-tune your efficiency and save on costs. CUR data is stored in S3, which allows you to query it with tools like Amazon Athena or generate custom reports in Amazon QuickSight or integrate with AWS Partner tools for better visibility and insights.

An example of a tool powered by CUR data is the AWS Cost Intelligence Dashboard. The Cost Intelligence Dashboard provides a detailed, granular, and recommendation-driven view of your AWS usage. With its prebuilt visualizations, it can help you identify which service and underlying resources are contributing the most towards your AWS usage, and see the potential savings you can realize by optimizing. It even provides right sizing recommendations and the appropriate EC2 instance family to help you optimize your resources.

Cost Intelligence Dashboard is also integrated with AWS Compute Optimizer, which makes instance type and size recommendations based on workload characteristics. For example, it can identify if the workload is CPU-intensive, if it exhibits a daily pattern, or if local storage is accessed frequently. Compute Optimizer then infers how the workload would have performed on various hardware platforms (for example, Amazon EC2 instance types) or using different configurations (for example, Amazon EBS volume IOPS settings, and AWS Lambda function memory sizes) to offer recommendations. For stable workloads, check AWS Compute Optimizer at regular intervals to identify right-sizing opportunities for instances. By right sizing with Compute Optimizer, you can increase resource utilization and reduce costs by up to 25%. Similarly, Lambda Power Tuning can help choose the memory allocated to Lambda functions is an optimization process that balances speed (duration) and cost while lowering your carbon emission in the process.

CloudWatch metrics are used to power EC2 Autoscaling, which can automatically choose the right instance to fit your needs with attribute-based instance selection and scale your entire instance fleet up and down based on demand in order to maintain high utilization. AWS Auto Scaling makes scaling simple with recommendations that let you optimize performance, costs, or balance between them. Configuring and testing workload elasticity will help save money, maintain performance benchmarks, and reduce the environmental impact of workloads. You can utilize the elasticity of the cloud to automatically increase the capacity during user load spikes, and then scale down when the load decreases. Amazon EC2 Auto Scaling allows your workload to automatically scale up and down based on demand. You can set up scheduled or dynamic scaling policies based on metrics such as average CPU utilization or average network in or out. Then, you can integrate AWS Instance Scheduler and Scheduled scaling for Amazon EC2 Auto Scaling to schedule shut downs and terminate resources that run only during business hours or on weekdays to further reduce your carbon footprint.

Design applications to minimize overhead and use fewer resources

Using the latest Amazon Machine Image (AMI) gives you updated operating systems, packages, libraries, and applications, which enable easier adoption as more efficient technologies become available. Up-to-date software includes features to measure the impact of your workload more accurately, as vendors deliver features to meet their own sustainability goals.

By reducing the amount of equipment that your company has on-premises and using managed services, you can help facilitate the move to a smaller, greener footprint. Instead of buying, storing, maintaining, disposing of, and replacing expensive equipment, businesses can purchase services as they need that are already optimized with a greener footprint. Managed services also shift responsibility for maintaining high average utilization and sustainability optimization of the deployed hardware to AWS. Using managed services will help distribute the sustainability impact of the service across all of the service tenants, thereby reducing your individual contribution. The following services help reduce your environmental impact because capacity management is automatically optimized.

 AWS  Managed Service   Recommendation for sustainability improvement

Amazon Aurora

You can use Amazon  Aurora Serverless to automatically start up, shut down, and scale capacity up or down based on your application’s needs.

Amazon Redshift

You can use Amazon Redshift Serverless to run and scale data warehouse capacity.

AWS Lambda

You can Migrate AWS Lambda functions to Arm-based AWS Graviton2 processors.

Amazon ECS

You can run Amazon ECS on AWS Fargate to avoid the undifferentiated heavy lifting by leveraging sustainability best practices AWS put in place for management of the control plane.

Amazon EMR

You can use EMR Serverless to avoid over- or under-provisioning resources for your data processing jobs.

AWS Glue

You can use Auto-scaling for AWS Glue to enable on-demand scaling up and scaling down of the computing resources.

 Centralized data centers consume a lot of energy, produce a lot of carbon emissions and cause significant electronic waste. While more data centers are moving towards green energy, an even more sustainable approach (alongside these so-called “green data centers”) is to actually cut unnecessary cloud traffic, central computation and storage as much as possible by shifting computation to the edge. Edge Computing stores and uses data locally, on or near the device it was created on. This reduces the amount of traffic sent to the cloud and, at scale, can limit the overall energy used and carbon emissions.

Use storage that best supports how your data is accessed and stored to minimize the resources provisioned while supporting your workload. Solid state devices (SSDs) are more energy intensive than magnetic drives and should be used only for active data use cases. You should look into using ephemeral storage whenever possible and categorize, centralize, deduplicate, and compress persistent storage.

AWS OutpostsAWS Local Zones and AWS Wavelength services deliver data processing, analysis, and storage close to your endpoints, allowing you to deploy APIs and tools to locations outside AWS data centers. Build high-performance applications that can process and store data close to where it’s generated, enabling ultra-low latency, intelligent, and real-time responsiveness. By processing data closer to the source, edge computing can reduce latency, which means that less energy is required to keep devices and applications running smoothly. Edge computing can help to reduce the carbon footprint of data centers by using renewable energy sources such as solar and wind power.

Conclusion

In this blog post, we discussed key methods and recommended actions you can take to optimize your AWS compute infrastructure for resource efficiency. Using the appropriate EC2 instance types with the right size, processor, instance storage and pricing model can enhance the sustainability of your applications. Use of AWS managed services, options for edge computing and continuously optimizing your resource usage can further improve the energy efficiency of your workloads. You can also analyze the changes in your emissions over time as you migrate workloads to AWS, re-architect applications, or deprecate unused resources using the Customer Carbon Footprint Tool.

Ready to get started? Check out the AWS Sustainability page to find out more about our commitment to sustainability and learn more about renewable energy usage, case studies on sustainability through the cloud, and more.

Building a Cloud in the Cloud: Running Apache CloudStack on Amazon EC2, Part 2

Post Syndicated from Sheila Busser original https://aws.amazon.com/blogs/compute/building-a-cloud-in-the-cloud-running-apache-cloudstack-on-amazon-ec2-part-2/

This blog is written by Mark Rogers, SDE II – Customer Engineering AWS.

In part 1, I showed you how to run Apache CloudStack with KVM on a single Amazon Elastic Compute Cloud (Amazon EC2) instance. That simple setup is great for experimentation and light workloads. In this post, things will get a lot more interesting. I’ll show you how to create an overlay network in your Amazon Virtual Private Cloud (Amazon VPC) that allows CloudStack to scale horizontally across multiple EC2 instances. This same method could work with other hypervisors, too.

If you haven’t read it yet, then start with part 1. It explains why this network setup is necessary. The same prerequisites apply to both posts.

Making things easier

I wrote some scripts to automate the CloudStack installation and OS configuration on CentOS 7. You can customize them to meet your needs. I also wrote some AWS CloudFormation templates you can copy in order to create a demo environment. The README file has more details.

The scalable method

Our team started out using a single EC2 instance, as described in my last post. That worked at first, but it didn’t have the capacity we needed. We were limited to a couple of dozen VMs, but we needed hundreds. We also needed to scale up and down as our needs changed. This meant we needed the ability to add and remove CloudStack hosts. Using a Linux bridge as a virtual subnet was no longer adequate.

To support adding hosts, we need a subnet that spans multiple instances. The solution I found is Virtual Extensible LAN (VXLAN). It’s lightweight, easy to configure, and included in the Linux kernel. VXLAN creates a layer 2 overlay network that abstracts away the details of the underlying network. It allows machines in different parts of a network to communicate as if they’re all attached to the same simple network switch.

Another example of an overlay network is an Amazon VPC. It acts like a physical network, but it’s actually a layer on top of other networks. It’s networks all the way down. VXLAN provides a top layer where CloudStack can sit comfortably, handling all of your VM needs, blissfully unaware of the world below it.

An overlay network comes with some big advantages. The biggest improvement is that you can have multiple hosts, allowing for horizontal scaling. Having more hosts not only gives you more computing power, but also lets you do rolling maintenance. Instead of putting the database and file storage on the management server, I’ll show you how to use Amazon Elastic File System (Amazon EFS) and Amazon Relational Database Service (Amazon RDS) for scalable and reliable storage.

EC2 Instances

Let’s start with three Amazon EC2 instances. One will be a router between the overlay network and your Amazon VPC, the second one will be your CloudStack management server, and the third one will be your VM host. You’ll also need a way to connect to your instances, such as a bastion host or a VPN endpoint.

Three EC2 instances are connected to an AWS subnet. There's an overlay network that spans all three instances.The router instance connects the overlay network to the AWS subnet. The management instance contains the CloudStack management service, which is attached to the overlay network. The host instance contains the CloudStack agent and some VMs, all of which are connected to the overlay network.VXLAN must send and receive multicast traffic. Only Nitro instances can be multicast senders. As you plan, look at the list of Nitro instance types.

The router won’t need much computing power, but it will need enough network bandwidth to meet your needs. If you put Amazon EFS in the same subnet as your instances, then they’ll communicate with it directly, thereby reducing the load on the router. Decide how much network throughput you want, and then pick a suitable Nitro instance type.

After creating the router instance, configure AWS to use it as a router. Stop source/destination checking in the instance’s network settings. Then update the applicable AWS route tables to use the router as the target for the overlay network. The router’s security group needs to allow ingress to the CloudStack UI (TCP port 8080) and any services you plan to offer from VMs.

For the management server, you’ll want a Nitro instance type. It’s going to use more CPU than your router, so plan accordingly.

In addition to being a Nitro type, the host instance must also be a metal type. Metal instances have hardware virtualization support, which is needed by KVM. If you have a new AWS account with a low on-demand vCPU limit, then consider starting with an m5zn.metal, which has 48 vCPUs. Otherwise, I suggest going directly to a c5.metal because it provides 96 vCPUs for a similar price. There are bigger types available depending on your compute needs, budget, and vCPU limit. If your account’s on-demand vCPU limit is too low, then you can file a support ticket to have it raised.

Networking

All of the instances should be on a dedicated subnet. Sharing the subnet with other instances can cause communication issues. For an example, refer to the following figure. The subnet has an instance named TroubleMaker that’s not on the overlay network. If TroubleMaker sends a request to the management instance’s overlay network address, then here’s what happens:

  1. The request goes through the AWS subnet to the router.
  2. The router forwards the request via the overlay network.
  3. The CloudStack management instance has a connection to the same AWS subnet that TroubleMaker is on. Therefore, it responds directly instead of using the router. This isn’t the return path that AWS is expecting, so the response is dropped.

This diagram depicts the steps described in the previous paragraph.

If you move TroubleMaker to a different subnet, then the requests and responses will all go through the router. That will fix the communication issues.

The instances in the overlay network will use special interfaces that serve as VXLAN tunnel endpoints (VTEPs). The VTEPs must know how to contact each other via the underlay network. You could manually give each instance a list of all of the other instances, but that’s a maintenance nightmare. It’s better to let the VTEPs discover each other, which they can do using multicast. You can add multicast support using AWS Transit Gateway.

Here are the steps to make VXLAN multicasts work:

  1. Enable multicast support when you create the transit gateway.
  2. Attach the transit gateway to your subnet.
  3. Create a transit gateway multicast domain with IGMPv2 support enabled.
  4. Associate the multicast domain with your subnet.
  5. Configure the eth0 interface on each instance to use IGMPv2. The following sample code shows how to do this.
  6. Make sure that your instance security groups allow ingress for IGMP queries (protocol 2 traffic from 0.0.0.0/32) and VXLAN traffic (UDP port 4789 from the other instances).

CloudStack VMs must connect to the same bridge as the VXLAN interface. As mentioned in the previous post, CloudStack cares about names. I recommend giving the interface a name starting with “eth”. Moreover, this naming convention tells CloudStack which bridge to use, thereby avoiding the need for a dummy interface like the one in the simple setup.

The following snippet shows how I configured the networking in CentOS 7. You must provide values for these variables:

  • $overlay_host_ip_address, $overlay_netmask, and $overlay_gateway_ip: Use values for the overlay network that you’re creating.
  • $dns_address: I recommend using the base of the VPC IPv4 network range, plus two. You shouldn’t use 169.654.169.253 because CloudStack reserves link-local addresses for its own use.
  • $multicast_address: The multicast address that you want VXLAN to use. Pick something in the multicast range that won’t conflict with anything else. I recommend choosing from the IPv4 local scope (239.255.0.0/16).
  • $interface_name: The name of the interface VXLAN should use to communicate with the physical network. This is typically eth0.

A couple of the steps are different for the router instance than for the other instances. Pay attention to the comments!

yum install -y bridge-utils net-tools

# IMPORTANT: Omit the GATEWAY setting on the router instance!
cat << EOF > /etc/sysconfig/network-scripts/ifcfg-cloudbr0
DEVICE=cloudbr0
TYPE=Bridge
ONBOOT=yes
BOOTPROTO=none
IPV6INIT=no
IPV6_AUTOCONF=no
DELAY=5
STP=no
USERCTL=no
NM_CONTROLLED=no
IPADDR=$overlay_host_ip_address
NETMASK=$overlay_netmask
DNS1=$dns_address
GATEWAY=$overlay_gateway_ip
EOF

cat << EOF > /sbin/ifup-local
#!/bin/bash
# Set up VXLAN once cloudbr0 is available.
if [[ \$1 == "cloudbr0" ]]
then
    ip link add ethvxlan0 type vxlan id 100 dstport 4789 group "$multicast_address" dev "$interface_name"
    brctl addif cloudbr0 ethvxlan0
    ip link set up dev ethvxlan0
fi
EOF

chmod +x /sbin/ifup-local

# Transit Gateway requires IGMP version 2
echo "net.ipv4.conf.$interface_name.force_igmp_version=2" >> /etc/sysctl.conf
sysctl -p

# Enable IPv4 forwarding
# IMPORTANT: Only do this on the router instance!
echo 'net.ipv4.ip_forward=1' >> /etc/sysctl.conf
sysctl -p

# Restart the network service to make the changes take effect.
systemctl restart network

Storage

Let’s look at storage. Create an Amazon RDS database using the MySQL 8.0 engine, and set a master password for CloudStack’s database setup tool to use. Refer to the CloudStack documentation to find the MySQL settings that you’ll need. You can put the settings in an RDS parameter group. In case you’re wondering why I’m not using Amazon Aurora, it’s because CloudStack needs the MyISAM storage engine, which isn’t available in Aurora.

I recommend Amazon EFS for file storage. For efficiency, create a mount target in the subnet with your EC2 instances. That will enable them to communicate directly with the mount target, thereby bypassing the overlay network and router. Note that the system VMs will use Amazon EFS via the router.

If you want, you can consolidate your CloudStack file systems. Just create a single file system with directories for each zone and type of storage. For example, I use directories named /zone1/primary, /zone1/secondary, /zone2/primary, etc. You should also consider enabling provisioned throughput on the file system, or you may run out of bursting credits after booting a few VMs.

One consequence of the file system’s scalability is that the amount of free space (8 exabytes) will cause an integer overflow in CloudStack! To avoid this problem, reduce storage.overprovisioning.factor in CloudStack’s global settings from 2 to 1.

When your environment is ready, install CloudStack. When it asks for a default gateway, remember to use the router’s overlay network address. When you add a host, make sure that you use the host’s overlay network IP address.

Cleanup

If you used my CloudFormation template, delete the stack and remove any route table entries you added.

If you didn’t use CloudFormation, here are the things to delete:

  1. The CloudStack EC2 instances
  2. The Amazon RDS database and parameter group
  3. The Amazon EFS file system
  4. The transit gateway multicast domain subnet association
  5. The transit gateway multicast domain
  6. The transit gateway VPC attachment
  7. The transit gateway
  8. The route table entries that you created
  9. The security groups that you created for the instances, database, and file system

Conclusion

The approach I shared has many steps, but they’re not bad when you have a plan. Whether you need a simple setup for experiments, or you need a scalable environment for a data center migration, you now have a path forward. Give it a try, and comment here about the things you learned. I hope you find it useful and fun!

“Apache”, “Apache CloudStack”, and “CloudStack” are trademarks of the Apache Software Foundation.

Building a Cloud in the Cloud: Running Apache CloudStack on Amazon EC2, Part 1

Post Syndicated from Sheila Busser original https://aws.amazon.com/blogs/compute/building-a-cloud-in-the-cloud-running-apache-cloudstack-on-amazon-ec2-part-1/

This blog is written by Mark Rogers, SDE II – Customer Engineering AWS.

How do you put a cloud inside another cloud? Some features that make Amazon Elastic Compute Cloud (Amazon EC2) secure and wonderful also make running CloudStack difficult. The biggest obstacle is that AWS and CloudStack both want to manage network resources. Therefore, we must keep them out of each other’s way. This requires some steps that aren’t obvious, and it took a long time to figure out. I’m going to share what I learned, so that you can navigate the process more easily.

Apache CloudStack is an open-source platform for deploying and managing virtual machines (VMs) and the associated network and storage infrastructure. You would normally run it on your own hardware to create your own cloud. But there can be advantages to running it inside of an Amazon Virtual Private Cloud (Amazon VPC), including how it could help you migrate out of a data center. It’s a great way to create disposable environments for experiments or training. Furthermore, it’s a convenient way to test-drive the new CloudStack support in Amazon Elastic Kubernetes Service (Amazon EKS) Anywhere. In my case, I needed to create development and test environments for a project that uses the CloudStack API. The environments needed to be shared and scalable. Our build pipelines were already in AWS, so it made sense to put the new environments there, too.

CloudStack can work with a number of hypervisors. The instructions in this article will use Kernel-based Virtual Machine (KVM) on Linux. KVM will manage the VMs at a low level, and CloudStack will manage KVM.

Prerequisites

Most of the information in this article should be applicable to a range of CloudStack versions. I targeted CloudStack 4.14 on CentOS 7. I also tested CloudStack versions 4.16 and 4.17, and I recommend them.

The official CentOS 7 x86_64 HVM image works well. If you use a different Linux flavor or version, then you might have to modify some of the implementation details.

You’ll need to know the basics of CloudStack. The scope of this article is making CloudStack and AWS coexist peacefully. Once CloudStack is running, I’m assuming that you’ll handle things from there.  Refer to the AWS documentation and CloudStack documentation for information on security and other best practices.

Making things easier

I wrote some scripts to automate the installation. You can run them on EC2 instances with CentOS 7, and they’ll do all the installation and OS configuration for you. You can use them as they are, or customize them to meet your needs. I also wrote some AWS CloudFormation templates you can copy in order to create a demo environment. The README file has more details.

Amazon EC2 instance types

KVM requires hardware virtualization support. Most EC2 instances are VMs that don’t support nested virtualization. To get access to the bare hardware, you need a metal instance type.

I like c5.metal because it’s one of the least expensive metal types, and has a low cost per vCPU. It has 96 vCPUs and 192 GiB of memory. If you run 20 VMs on it, with 4 CPU cores and 8 GiB of memory each, then you’d still have 16 vCPUs and 32 GiB to share between the operating system, CloudStack, and MySQL. Using CloudStack’s overprovisioning feature, you could fit even more VMs if they’re running light loads.

Networking

The biggest challenge is the network. AWS knows which IP and MAC addresses should exist, and it knows the machines to which they should belong. It blocks any traffic that doesn’t fit its idea of how the network should behave. Simultaneously, CloudStack assumes that any IP or MAC address it invents should work just fine. When CloudStack assigns addresses to VMs on an AWS subnet, their network traffic gets blocked.

You could get around this by enabling network address translation (NAT) on the instance running CloudStack. That’s a great solution if it fits your needs, but it makes it hard for other machines in your Amazon VPC to contact your VMs. I recommend a different approach.

Although AWS restricts what you can do with its layer 2 network, it’s perfectly happy to let you run your own layer 3 router. Your EC2 instance can act as a router to a new virtual subnet that’s outside of the jurisdiction of AWS. The instance integrates with AWS just like a VPN appliance, routing traffic to wherever it needs to go. CloudStack can do whatever it wants in the virtual subnet, and everybody’s happy.

What do I mean by a virtual subnet? This is a subnet that exists only inside the EC2 instance.  It consists of logical network interfaces attached to a Linux bridge. The entire subnet exists inside a single EC2 instance. It doesn’t scale well, but it’s simple. In my next post, I’ll cover a more complicated setup with an overlay network that spans multiple instances to allow horizontal scaling.

The simple way

The simple way is to put everything in one EC2 instance, including the database, file storage, and a virtual subnet. Because everything’s stored locally, allocate enough disk space for your needs. 500 GB will be enough to support a few basic VMs. Create or select a security group for your instance that gives users access to the CloudStack UI (TCP port 8080). The security group should also allow access to any services that you’ll offer from your VMs.

EC2 instance summary info showing 1 instance, CentOS 7 (x86_64) AMI, c5.metal instance type, a security group name, and a 500 GiB volume

When you have your instance, configure AWS to treat it as a router.

  1. Go to Amazon EC2 in the AWS Management Console.
  2. Select your instance, and stop source/destination checking.

In the EC2 Actions menu, select Networking, then Change source/destination check.

3. Update the subnet route tables.

a. Go to the VPC settings, and select Route Tables.

b. Identify the tables for subnets that need CloudStack access.

c. In each of these tables, add a route to the new virtual subnet. The route target should be your EC2 instance.

4. Depending on your network needs, you may also need to add routes to transit gateways, VPN endpoints, etc.

Because everything will be on one server, creating a virtual subnet is simply a matter of creating a Linux bridge. CloudStack must find a network adapter attached to the bridge. Therefore, add a dummy interface with a name that CloudStack will recognize.

A single EC2 instance contains the CloudStack management service, the CloudStack agent, a dummy network interface, several virtual machines, and a router. All of those things are connected to each other by a virtual subnet that exists inside the instance. The instance's elastic network interface is connected between the router and the Amazon VPC.The following snippet shows how I configure networking in CentOS 7. You must provide values for the variables $virutal_host_ip_address and $virtual_netmask to reflect the virtual subnet that you want to create. For $dns_address, I recommend the base of the VPC IPv4 network range, plus two. You shouldn’t use 169.654.169.253 because CloudStack reserves link-local addresses for its own use.

yum install -y bridge-utils net-tools

# The bridge must be named cloudbr0.

cat << EOF > /etc/sysconfig/network-scripts/ifcfg-cloudbr0
DEVICE=cloudbr0
TYPE=Bridge
ONBOOT=yes
BOOTPROTO=none
IPV6INIT=no
IPV6_AUTOCONF=no
DELAY=5
STP=yes
USERCTL=no
NM_CONTROLLED=no
IPADDR=$virtual_host_ip_address
NETMASK=$virtual_netmask
DNS1=$dns_address
EOF

# Create a dummy network interface.
cat << EOF > /etc/sysconfig/modules/dummy.modules
#!/bin/sh
/sbin/modprobe dummy numdummies=1
/sbin/ip link set name ethdummy0 dev dummy0
EOF

chmod +x /etc/sysconfig/modules/dummy.modules
/etc/sysconfig/modules/dummy.modules

cat << EOF > /etc/sysconfig/network-scripts/ifcfg-ethdummy0
TYPE=Ethernet
BOOTPROTO=none
NAME=ethdummy0
DEVICE=ethdummy0
ONBOOT=yes
BRIDGE=cloudbr0
NM_CONTROLLED=no
EOF

# Turn the instance into a router

echo 'net.ipv4.ip_forward=1' >> /etc/sysctl.conf
sysctl -p

# Must kill dhclient or the network service won't restart properly.
# A reboot would also work, if you’d rather do that.

pkill dhclient
systemctl restart network

CloudStack must know which IP addresses to use for inter-service communication. It will select by resolving the machine’s fully qualified domain name (FQDN) to an address. The following commands will make it to choose the right one. You must provide a value for $virtual_host_ip_address.

hostnamectl set-hostname cloudstack.localdomain

echo "$virtual_host_ip_address cloudstack.localdomain" >> 
/etc/hosts

You can finish the setup by following the Quick Installation Guide.

Remember that CloudStack is only directly connected to your virtual network. The EC2 instance is the router that connects the virtual subnet to the Amazon VPC. When you’re configuring CloudStack, use your instance’s virtual subnet address as the default gateway.

Use the EC2 instance's virtual subnet IP address as the default gateway in CloudStack. In this example, the virtual subnet is 10.100.0.0/16, and the instance's address in that subnet is 10.100.0.1. CloudStack then uses 10.100.0.1 as the default gateway.

To access CloudStack from your workstation, you’ll need a connection to your VPC. This can be through a client VPN or a bastion host. If you use a bastion, its subnet needs a route to your virtual subnet, and you’ll need an SSH tunnel for your browser to access the CloudStack UI. The UI is at http://x.x.x.x:8080/client/, where x.x.x.x is your CloudStack instance’s virtual subnet address. Note that CloudStack’s console viewer won’t work if you’re using an SSH tunnel.

If you’re just experimenting with CloudStack, then I suggest saving money by stopping your instance when it isn’t needed. The safe way to do that is:

  1. Disable your zone in the CloudStack UI.
  2. Put the primary storage into maintenance mode.
  3. Wait for the switch to maintenance mode to be complete.
  4. Stop the EC2 instance.

When you’re ready to turn everything back on, simply reverse those steps. If you have any virtual routers in CloudStack, then you may need to start those, too.

Cleanup

If you used my CloudFormation template, then delete the stack and remove any route table entries you added. If you didn’t use CloudFormation, then terminate the EC2 instance, delete the security group you created for it, and remove any route table entries that you added.

Conclusion

Getting CloudStack to run on AWS isn’t so bad. The hardest part is simply knowing how. The setup explained here is great for small installations, but it can only scale vertically. In my next post, I’ll show you how to create an installation that scales horizontally. Instead of using a virtual subnet that exists in a single EC2 instance, we’ll build an overlay network that spans multiple instances. It will use more components and features, including some that might be new to you. I hope you find it interesting!

Now that you can create a simple setup, give it a try! I hope you have fun and learn something new along the way. Comment with the results of your experiments.

“Apache”, “Apache CloudStack”, and “CloudStack” are trademarks of the Apache Software Foundation.

Deploying Oracle RAC in AWS Outposts via FlashGrid Cluster

Post Syndicated from Andreas Bogner original https://aws.amazon.com/blogs/architecture/deploying-oracle-rac-in-aws-outposts-via-flashgrid-cluster/

Amazon Web Services (AWS) customers are deploying AWS Outposts as a fully managed solution that delivers AWS infrastructure and services to on-premises or edge locations for a truly consistent hybrid experience. Those hybrid cloud workloads can require highly available Oracle databases running on- or close-to premises. One way to meet this requirement is Oracle Real Application Clusters (RAC) on top of two or more AWS Outposts racks using the FlashGrid Cluster solution.

In this post, we follow up on a Marketplace blog that describes how to deploy Oracle RAC via FlashGrid Cluster in the AWS regions. Deploying the solution onto AWS Outposts requires additional steps as the Outpost racks communicate between VPCs and across the on-premises network. In the following we explain how to configure the network for a multi-Outpost setup, how to deploy FlashGrid Cluster with Oracle RAC, and how to connect to the database cluster from the on-premises network.

Solution overview

This architecture uses two logical Outposts (42U racks) that are connected to different Availability Zones (AZs) in the same region for high availability. The networking is configured such that the communication between Amazon Elastic Compute Cloud (Amazon EC2) instances on two distinct Outpost racks uses Outpost’s local gateways and the corporate network. Therefore, data will not leave the premises unless explicitly copied to the Cloud region.

The FlashGrid Cluster solution deploys one node to each Outpost rack and an additional quorum node to the Cloud region. The cluster nodes provide the Oracle ASM disk groups and the Oracle RAC nodes. FlashGrid Cluster for Oracle RAC ensures that storage is replicated between nodes (Figure 1).

Architecture for a deployment across two logical AWS Outposts

Figure 1. Architecture for a deployment across two logical AWS Outposts

We provide a complete, step-by-step guide that deploys an Oracle RAC database across two Outpost racks.

It takes three steps to get your database up and running:

  1. Networking: prepare the virtual private clouds (VPCs), subnets, and route tables
  2. FlashGrid Cluster: use the FlashGrid Launcher to create an Oracle RAC cluster
  3. Database: configure Oracle RAC and connect to the database

The deployment uses a CloudFormation template that is generated based on the workload’s specific parameters.

Prerequisites

For this solution, you require:

Networking setup

A single VPC cannot span multiple Outposts. Therefore, each Outpost should have a separately configured VPC.

A security group allowing traffic between the nodes of the cluster must be created in each VPC. Private IP addresses of the other nodes in the cluster should be configured as allowed sources of the traffic. (A security group rule cannot reference a security group in a different VPC.)

The steps suggested for configuring the network, as in Figure 2:

  1. Create a VPC for each Outpost and a VPC between those VPCs. Ensure that the VPCs are associated with the respective local gateways on each Outpost.
  2. Within each VPC, create a subnet for the corresponding Outpost and a subnet in the Region. Add route tables to each Outpost subnet that allow cross-VPC routing (as in Table 1).Table 1. Main route table for VPC ‘Outpost 1’

    Destination Target
    10.0.1.0/24 local
    10.0.2.0/25 lgw-11111111111111111
    10.0.2.128/25 pcx-11223344556677889
    pl-12345678 vpce-12345678901234567
  3. For each of the cluster nodes, allocate a private IP address within the corresponding subnet.
  4. Within each VPC create a separate security group for each cluster. The security group must allow all inbound and outbound traffic from all nodes of the same cluster using their private IP addresses.
  5. Optional: Create a public subnet in the region in either one of the VPCs. Deploying a bastion host EC2 instance into this subnet will allow you to connect by SSH to the Oracle RAC nodes later.
  6. Open SSH access (TCP 22) and Oracle client access (default ports: TCP 1521, 1522) either by adding the corresponding rules to the same security groups or assigning additional security groups to all cluster node instances after the cluster deployment is complete.
  7. Create an Amazon Simple Storage Service (Amazon S3) gateway endpoint in each VPC and add corresponding routes. It is critical to use Amazon S3 gateway endpoints, otherwise the Oracle RAC nodes cannot download the installation binaries from the S3 bucket.
Setting up subnets on each Outpost via the AWS console

Figure 2. Setting up subnets on each Outpost via the AWS console

FlashGrid Cluster setup

The process of deploying FlashGrid Cluster consists of five main steps:

  1. Subscribing to a FlashGrid Cluster product in AWS Marketplace
  2. Uploading Oracle installation files to an S3 bucket in the Region
  3. Modifying the cluster configuration template with parameters specific to Outposts
  4. Using the FlashGrid Launcher tool to finalize cluster configuration and generate an AWS CloudFormation template
  5. Deploying the CloudFormation template

Subscribing to a FlashGrid Cluster product in AWS Marketplace

  1. Open one of the FlashGrid Cluster product pages in AWS Marketplace corresponding with the preferred operating system.
  2. Select Continue.
  3. Select the Manual Launch tab.
  4. Choose Accept Software Terms.

Creating an S3 bucket for Oracle installation files

During cluster provisioning, Oracle installation files are downloaded from an S3 bucket. The list of files that must be placed in the S3 bucket is displayed on the Oracle Files tab of the FlashGrid Launcher tool (next step). The S3 bucket is not supposed to have any sensitive data in it and can be hosted in any AWS Region. You can use the same bucket for multiple deployments.

To create an S3 bucket with the correct IAM role access:

  1. Create an S3 bucket or folder for uploading the installation files
  2. Within the IAM console, create a policy named GetOracleFilesFromS3, which allows the s3:GetObject action on all uploaded files
  3. Create an EC2 instance role named GetOracleFilesFromS3 and attach the GetOracleFilesFromS3 policy to it
  4. Use the GetOracleFilesFromS3 role when configuring cluster parameters in the FlashGrid Launcher
  5. Once the FlashGrid Launcher has provided a list of required Oracle installation files, place those into the S3 bucket

Modifying the cluster configuration template with parameters specific to Outposts

Download the configuration file template, and open it in a text editor. The template is for a two-node RAC cluster on Outposts in Multi-AZ configuration.

For the database node instances in the configuration file, manually set the following attributes:

  • outpost_arn: ARN of the target Outpost
  • ip: private IP address assigned to the EC2 instance
  • sg: security group ID
  • ins_type: instance type (ensure that this matches the available types for the Outposts)

Creating a cluster configuration file using FlashGrid Launcher tool

FlashGrid Launcher is an online tool that allows creating your desired configuration for the cluster and then generating a CloudFormation template for it.

  1. Open FlashGrid Launcher and upload the customized configuration file created at the previous step.
  2. Follow the step-by-step instructions in the Launcher tool (see Figure 3) until you get to the last Launch step.
    a. In the Oracle Files step, there is a list of installer files. Upload these files into the S3 bucket from the previous section (Creating an S3 bucket for Oracle installation files).
    b. In the Storage step, ignore the IOPS and MBPS parameters, as these parameters will not be used because Outposts uses the GP2 type of volumes.
    c. In the Network step, provide the Security Group ID that will be used for the quorum node located in the Region. Make sure to use a security group from the quorum node’s VPC.
  3. At the Launch step, click Launch FlashGrid. This generates a CloudFormation template and takes you to the AWS CloudFormation console.
The FlashGrid Launcher generates a CloudFormation template based on various input parameters

Figure 3. The FlashGrid Launcher generates a CloudFormation template based on various input parameters

Deploying the CloudFormation template

  1. Once you are in AWS CloudFormation console, select Next.
  2. Select the SSH key; do not change network parameters if you set them correctly in the Launcher. Select Next.
  3. On the Options page, if you added tags in FlashGrid Launcher, then do not add the same tags in CloudFormation console. You are free to add further tags in this step. Select Next.
  4. Select Create, and wait until the status of the stack changes to CREATE_COMPLETE.
  5. Connect by using SSH from the bastion host to the first cluster node as ec2-user with the SSH key you were provided when the stack was deployed.
  6. The welcome message details the current initialization status of the cluster: in progress, failed, or completed.
  7. If initialization is still in progress, wait for it to complete (this includes Oracle software installation and configuration). A broadcast message is delivered when initialization completes or fails. Cluster initialization takes 1 to 2 hours, depending on configuration.

Oracle database configuration

You can create an Oracle RAC database (or multiple databases) using Oracle DBCA tool and following standard Oracle best practices.

To connect to the Oracle RAC database using the SCAN listener, configure the Domain Name System (DNS) records and the connection string on client side:

  • On the DNS server(s) used by clients, add a record resolving to the VPC Private IP address of the node instance for each database node. In a test environment without a DNS server, the entries can be added to /etc/hosts on the clients instead of the DNS server.
  • In our example deployment, this is:
    rac1.example.com 10.0.1.77
    rac2.example.com 10.0.2.77
  • It is important that hostnames and domain names in the DNS records exactly match the hostnames as reported by the hostname command on the database servers.
  • Finally, define a connection string with the addresses of all database nodes listed:
    SCAN=
         (DESCRIPTION=
               (TRANSPORT_CONNECT_TIMEOUT=3) (RETRY_COUNT=6)
               (ADDRESS=(PROTOCOL=tcp) (HOST=rac2.example.com) (PORT=1521))
               (ADDRESS=(PROTOCOL=tcp) (HOST=rac2.example.com) (PORT=1521))
               (CONNECT_DATA=
                  (SERVER=DEDICATED)
                  (SERVICE_NAME=<service name>)
               )
          )

Visit the FlashGrid Help Center for more information on creating and connecting to a database.

Cleanup

The Oracle RAC instance together with the FlashGrid Cluster solution is deployed as a single CloudFormation stack. Deleting this stack will remove all associated resources, including the EBS volumes. Follow the approach detailed in the Delete Your Stacks But Keep Your Data blog post at the deployment stage to retain snapshots of all volumes automatically. You can also take snapshots of all EBS volumes manually before deleting the stack.

Conclusion

In this blog post, we have explored how to deploy Oracle RAC across two or more Outpost racks using FlashGrid Cluster. Running Oracle RAC on top of Outposts provides a highly available database solution for use cases that require you to run the workload on-premises, as in data-residency or latency-critical scenarios. Using the growing number of features for AWS Outposts rack, and you can more efficiently run hybrid workloads using the same tools and automation both in Cloud Regions and on-premises.

Further reading

An elastic deployment of Stable Diffusion with Discord on AWS

Post Syndicated from Steven Warren original https://aws.amazon.com/blogs/architecture/an-elastic-deployment-of-stable-diffusion-with-discord-on-aws/

Stable Diffusion is a state-of-the-art text-to-image model that generates images from text. Deploying text-to-image models such as Stable Diffusion can be difficult. Currently, Stable Diffusion requires specific computer hardware known as graphical processing units (GPUs). You can lower the bar to entry by offloading the text-to-image generation onto Amazon Web Services (AWS).

Discord is a popular voice, video, and text communication service. It provides a user interface that people can use to make text-to-image requests. When deployed, all members of a Discord server can create images by using Discord Slash Commands.

In this post, we discuss how to deploy a highly available solution on AWS. This solution will perform text-to-image generation with Stable Diffusion and use Discord as the user interface.

Solution architecture

Many of the services selected are serverless, which will offer many benefits. At the time of writing, Stable Diffusion requires a GPU for inference. Amazon Elastic Compute Cloud (Amazon EC2) was selected because it provides GPUs. The solution architecture is shown in the Figure 1.

Solution architecture diagram

Figure 1. Solution architecture diagram

Let us walk through the architecture of this solution.

Auto scaling with custom metrics

To properly scale the system, a custom Amazon CloudWatch metric is created. This custom CloudWatch metric calculates the number of Amazon Elastic Container Service (Amazon ECS) tasks required to adequately handle the amount of Amazon Simple Queue Service (Amazon SQS) messages. You should have a high-resolution CloudWatch metric to scale up quickly. For this use case, a high-resolution CloudWatch metric of every 10 seconds was implemented.

Next, let’s create the custom CW metric. Amazon EventBridge rules provide a serverless solution for starting actions on a schedule. Here we use an Amazon EventBridge rule, which initiates an AWS Step Function Express Workflow every minute. With the Express Workflow, we can create serverless workflows that take less than five minutes, which helps us avoid long running AWS Lambda functions. The Express Workflow runs a Lambda function every 10 seconds over a one-minute period, which generates the custom CloudWatch metric.

Two high-resolution CloudWatch alarms scale the system up and down, and are initiated by the custom CloudWatch metric. One CloudWatch alarm increases the ECS tasks and EC2 machines. The other alarm decreases the ECS tasks and EC2 machines.

Handling Discord requests

Someone on Discord sends a request. The Amazon API Gateway HTTP API receives the request and passes the information to an AWS Lambda function. The HTTP API provides a cost-effective option compared to REST APIs and provides tools for authentication and authorization. The HTTP API uses cross-origin resource sharing (CORS), which provides security because it only allows discord.com as an origin.

The AWS Lambda function provides a serverless solution for responding to the HTTP API requests. It transforms the HTTP API request and sends a message to the SQS First-In-First-Out (FIFO) queue. SQS seamelessly decouples the architecture between user requests and backend processing. A FIFO queue ensures that user requests are processed in the order they were requested. The AWS Lambda function sends a response back to the HTTP API within three seconds, which is a requirement of Discord Slash Commands.

When scaling up, an EC2 instance is registered with the ECS cluster. The EC2 instance type was selected because it provides GPU instances. ECS provides a repeatable solution to deploy the container across a variety of instance types. This solution currently only uses the g4dn.xlarge instance type. The ECS service will then place an ECS task onto the eligible EC2 instance. The ECS task will use the Amazon Elastic Container Registry (Amazon ECR) private registry to pull the image,  perform text-to-image processing, and respond to the Discord request. The ECR private registry is a managed container registry that manages the image.

Once there is an ECS task running on an Amazon EC2 instance, the ECS task will consume messages from the queue using long polling. This reduces the amount of ReceiveMessage requests the ECS task needs to send. When the ECS task receives a message from the queue, it will then processes the request.

Estimated monthly cost

The example assumes 1,000 requests per month and each request takes 16 seconds to complete. Extra EC2 time was added for the time to begin processing messages (seven minutes) and auto scaling cooldown time (30 minutes). You can adjust the pricing calculations with the AWS Pricing Calculator to reflect your usage and estimated cost.

Prerequisites

This blog assumes familiarity with Terraform, Docker, Discord, Amazon EC2Amazon Elastic Block Store (Amazon EBS)Amazon Virtual Private Cloud (Amazon VPC), AWS Identity and Access Management (IAM), Amazon API Gateway, AWS Lambda, Amazon SQS, Amazon Elastic Container Registry (Amazon ECR), Amazon ECS, Amazon EventBridge, AWS Step Functions, and Amazon CloudWatch.

For this walkthrough, you should have the following prerequisites:

  • Access to an AWS account, with permissions to create the resources described in the installation steps section
  • A virtual private cloud (VPC) with public subnets that is associated with an internet gateway in the region you are deploying into
    We suggest using the default VPC. The subnets will need the tag of key: Tier and value: Public and be attached to the VPC. If you decided to create your own VPC with subnets, make sure that auto-assign IP settings is enabled.
  • An IAM user with the required permissions to deploy the infrastructure
  • A new Discord application that is registered to a Discord server you own with the scope applications.command. Use this tutorial if you need a starting point on creating a Discord application.
    • Discord Bot token
    • Discord Application ID
    • Discord Public Key
  • A Hugging Face account
  • A computer with the following packages installed:

Walkthrough

Complete the following steps to deploy this solution on AWS.

Increase EC2 limits

This solution uses the g4dn.xlarge instance type, which might require you to request an EC2 limit increase. Check your current limit of Running On-Demand All G and VT instances. Make sure you have more than 4 vCPU; a single g4dn.xlarge requires 4 vCPU. We suggest requesting 8 vCPU so that you can access 2 g4dn.xlarge instances.

Deploy the infrastructure

  1. Ensure you have at least 60 GB of storage available and you’re running on a 64-bit x86 architecture system.
  2. Open a command line on the machine you will be deploying from.
  3. Log in as your AWS user through the AWS CLI with the command aws configure. If you are using an EC2 instance, create and use an instance profile rather than using the AWS CLI.
    The region you select will be the one you will deploy into.
  4. Clone the Terraform repository:
    git clone https://github.com/aws-samples/amazon-scalable-infra-discord-diffusion.git
  5. Navigate into the Terraform repository:
    cd amazon-scalable-infra-discord-diffusion
  6. Customize the variables in terraform.tfvars to match your deployment.
  7. Export the following secrets to the command line:
    • export TF_VAR_discord_bot_secret='DISCORD_BOT_SECRET_HERE'
    • export TF_VAR_huggingface_password='HUGGINGFACE_PASSWORD_HERE'
  8. Initialize the repository:
    terraform init
  9. Apply the infrastructure (this takes about 2 minutes):
    terraform apply
  10. Save the outputs for future use.

Set up Discord

This setup adds the Discord interactions URL to your Discord application. After terraform apply comes back successfully, move onto these steps.

  1. Open Discord Application Page -> General Information.
  2. Copy and paste the value from discord_interactions_endpoint_url into the Interactions Endpoint URL, and then save the changes.

If successful, there should be a green box with All your edits have been carefully recorded.

Docker image and Amazon Elastic Container Registry

In this section, you will create a docker image with the Stable Diffusion model.

  1. Exit the terraform repository:
    cd ..
  2. Clone the Docker build repository:
    git clone https://github.com/aws-samples/amazon-scalable-discord-diffusion.git
  3. Navigate to the Docker repository:
    cd amazon-scalable-discord-diffusion
  4. Build and push the docker image to ECR. This requires docker to be installed on the machine and actively running.
    You can find the commands for your deployment from the Amazon ECR repository.

    View push commands for Amazon ECR

    Figure 2. View push commands for Amazon ECR

This is a large image (10GB) and can take over 20 minutes to push depending on your machine’s internet connection.

Request an image with Discord Slash Commands

This section will describe how to request a text to image response with Discord.

  1. Log in to Discord and navigate to the server with your Discord application deployed.
  2. Navigate to a text channel.
  3. Type the command /sparkle.
    A box with COMMANDS MATCHING /sparkle will appear. Select the /sparkle command box.
    Depending on how you customized your Discord Application, the avatar image shown in Figure 3 might be different from what you have.

    Writing a Discord Slash Command

    Figure 3. Writing a Discord Slash Command

  4. Type in a prompt such as a corgi, style of monet.
    A response from YourBotName should appear with the response Submitted to Sparkle: YourPromptHere, as shown in Figure 4.

    First response from AWS Lambda function

    Figure 4. First response from AWS Lambda function

    It will take 10 minutes for an EC2 instance to start with an ECS Task running on the instance. Once an ECS Task is running on the instance, inference times should reduce to under 30 seconds, depending on the request.
    When an ECS Task is running your request, you will see a Processing your Sparkle message, as shown in Figure 5.

    Amazon ECS task processing a request

    Figure 5. Amazon ECS task processing a request

    The message is complete when it says Completed your Sparkle! as shown in Figure 6.

    Amazon ECS task returning the final response

    Figure 6. Amazon ECS task returning the final response

Cleaning up

To avoid incurring future charges, delete the resources created by the Terraform script.

  1. Return to the directory where you deployed your terraform script.
  2. To destroy the infrastructure in AWS, run the command terraform destroy.
  3. When prompted to confirm that you want to destroy the infrastructure, type yes and press Enter.

Conclusion

In summary, we created a solution that allows members of a Discord server to create images from text with a Stable Diffusion model. With this implementation, the deployment can scale to many Discord Servers and handle over one hundred requests per second.

Create projects on AWS that lower the bar to entry for people wanting to try text to image models.

AWS CIRT announces the release of five publicly available workshops

Post Syndicated from Steve de Vera original https://aws.amazon.com/blogs/security/aws-cirt-announces-the-release-of-five-publicly-available-workshops/

Greetings from the AWS Customer Incident Response Team (CIRT)! AWS CIRT is dedicated to supporting customers during active security events on the customer side of the AWS Shared Responsibility Model.

Over the past year, AWS CIRT has responded to hundreds of such security events, including the unauthorized use of AWS Identity and Access Management (IAM) credentials, ransomware and data deletion in an AWS account, and billing increases due to the creation of unauthorized resources to mine cryptocurrency.

We are excited to release five workshops that simulate these security events to help you learn the tools and procedures that AWS CIRT uses on a daily basis to detect, investigate, and respond to such security events. The workshops cover AWS services and tools, such as Amazon GuardDuty, Amazon CloudTrail, Amazon CloudWatch, Amazon Athena, and AWS WAF, as well as some open source tools written and published by AWS CIRT.

To access the workshops, you just need an AWS account, an internet connection, and the desire to learn more about incident response in the AWS Cloud! Choose the following links to access the workshops.

Unauthorized IAM Credential Use – Security Event Simulation and Detection

During this workshop, you will simulate the unauthorized use of IAM credentials by using a script invoked within AWS CloudShell. The script will perform reconnaissance and privilege escalation activities that have been commonly seen by AWS CIRT and that are typically performed during similar events of this nature. You will also learn some tools and processes that AWS CIRT uses, and how to use these tools to find evidence of unauthorized activity by using IAM credentials.

Ransomware on S3 – Security Event Simulation and Detection

During this workshop, you will use an AWS CloudFormation template to replicate an environment with multiple IAM users and five Amazon Simple Storage Service (Amazon S3) buckets. AWS CloudShell will then run a bash script that simulates data exfiltration and data deletion events that replicate a ransomware-based security event. You will also learn the tools and processes that AWS CIRT uses to respond to similar events, and how to use these tools to find evidence of unauthorized S3 bucket and object deletions.

Cryptominer Based Security Events – Simulation and Detection

During this workshop, you will simulate a cryptomining security event by using a CloudFormation template to initialize three Amazon Elastic Compute Cloud (Amazon EC2) instances. These EC2 instances will mimic cryptomining activity by performing DNS requests to known cryptomining domains. You will also learn the tools and processes that AWS CIRT uses to respond to similar events, and how to use these tools to find evidence of unauthorized creation of EC2 instances and communication with known cryptomining domains.

SSRF on IMDSv1 – Simulation and Detection

During this workshop, you will simulate the unauthorized use of a web application that is hosted on an EC2 instance configured to use Instance Metadata Service Version 1 (IMDSv1) and vulnerable to server side request forgery (SSRF). You will learn how web application vulnerabilities, such as SSRF, can be used to obtain credentials from an EC2 instance. You will also learn the tools and processes that AWS CIRT uses to respond to this type of access, and how to use these tools to find evidence of the unauthorized use of EC2 instance credentials through web application vulnerabilities such as SSRF.

AWS CIRT Toolkit For Automating Incident Response Preparedness

During this workshop, you will install and experiment with some common tools and utilities that AWS CIRT uses on a daily basis to detect security misconfigurations, respond to active events, and assist customers with protecting their infrastructure.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Author

Steve de Vera

Steve is the Incident Response Watch Lead for the US Pacific region of the AWS CIRT. He is passionate about American-style BBQ and is a certified competition BBQ judge. He has a dog named Brisket.

Enabling load-balancing of non-HTTP(s) traffic on AWS Wavelength

Post Syndicated from Sheila Busser original https://aws.amazon.com/blogs/compute/enabling-load-balancing-of-non-https-traffic-on-aws-wavelength/

This blog post is written by Jack Chen, Telco Solutions Architect, and Robert Belson, Developer Advocate.

AWS Wavelength embeds AWS compute and storage services within 5G networks, providing mobile edge computing infrastructure for developing, deploying, and scaling ultra-low-latency applications. AWS recently introduced support for Application Load Balancer (ALB) in AWS Wavelength zones. Although ALB addresses Layer-7 load balancing use cases, some low latency applications that get deployed in AWS Wavelength Zones rely on UDP-based protocols, such as QUIC, WebRTC, and SRT, which can’t be load-balanced by Layer-7 Load Balancers. In this post, we’ll review popular load-balancing patterns on AWS Wavelength, including a proposed architecture demonstrating how DNS-based load balancing can address customer requirements for load-balancing non-HTTP(s) traffic across multiple Amazon Elastic Compute Cloud (Amazon EC2) instances. This solution also builds a foundation for automatic scale-up and scale-down capabilities for workloads running in an AWS Wavelength Zone.

Load balancing use cases in AWS Wavelength

In the AWS Regions, customers looking to deploy highly-available edge applications often consider Amazon Elastic Load Balancing (Amazon ELB) as an approach to automatically distribute incoming application traffic across multiple targets in one or more Availability Zones (AZs). However, at the time of this publication, AWS-managed Network Load Balancer (NLB) isn’t supported in AWS Wavelength Zones and ALB is being rolled out to all AWS Wavelength Zones globally. As a result, this post will seek to document general architectural guidance for load balancing solutions on AWS Wavelength.

As one of the most prominent AWS Wavelength use cases, highly-immersive video streaming over UDP using protocols such as WebRTC at scale often require a load balancing solution to accommodate surges in traffic, either due to live events or general customer access patterns. These use cases, relying on Layer-4 traffic, can’t be load-balanced from a Layer-7 ALB. Instead, Layer-4 load balancing is needed.

To date, two infrastructure deployments involving Layer-4 load balancers are most often seen:

  • Amazon EC2-based deployments: Often the environment of choice for earlier-stage enterprises and ISVs, a fleet of EC2 instances will leverage a load balancer for high-throughput use cases, such as video streaming, data analytics, or Industrial IoT (IIoT) applications
  • Amazon EKS deployments: Customers looking to optimize performance and cost efficiency of their infrastructure can leverage containerized deployments at the edge to manage their AWS Wavelength Zone applications. In turn, external load balancers could be configured to point to exposed services via NodePort objects. Furthermore, a more popular choice might be to leverage the AWS Load Balancer Controller to provision an ALB when you create a Kubernetes Ingress.

Regardless of deployment type, the following design constraints must be considered:

  • Target registration: For load balancing solutions not managed by AWS, seamless solutions to load balancer target registration must be managed by the customer. As one potential solution, visit a recent HAProxyConf presentation, Practical Advice for Load Balancing at the Network Edge.
  • Edge Discovery: Although DNS records can be populated into Amazon Route 53 for each carrier-facing endpoint, DNS won’t deterministically route mobile clients to the most optimal mobile endpoint. When available, edge discovery services are required to most effectively route mobile clients to the lowest latency endpoint.
  • Cross-zone load balancing: Given the hub-and-spoke design of AWS Wavelength, customer-managed load balancers should proxy traffic only to that AWS Wavelength Zone.

Solution overview – Amazon EC2

In this solution, we’ll present a solution for a highly-available load balancing solution in a single AWS Wavelength Zone for an Amazon EC2-based deployment. In a separate post, we’ll cover the needed configurations for the AWS Load Balancer Controller in AWS Wavelength for Amazon Elastic Kubernetes Service (Amazon EKS) clusters.

The proposed solution introduces DNS-based load balancing, a technique to abstract away the complexity of intelligent load-balancing software and allow your Domain Name System (DNS) resolvers to distribute traffic (equally, or in a weighted distribution) to your set of endpoints.

Our solution leverages the weighted routing policy in Route 53 to resolve inbound DNS queries to multiple EC2 instances running within an AWS Wavelength zone. As EC2 instances for a given workload get deployed in an AWS Wavelength zone, Carrier IP addresses can be assigned to the network interfaces at launch.

Through this solution, Carrier IP addresses attached to AWS Wavelength instances are automatically added as DNS records for the customer-provided public hosted zone.

To determine how Route 53 responds to queries, given an arbitrary number of records of a public hosted zone, Route53 offers numerous routing policies:

Simple routing policy – In the event that you must route traffic to a single resource in an AWS Wavelength Zone, simple routing can be used. A single record can contain multiple IP addresses, but Route 53 returns the values in a random order to the client.

Weighted routing policy – To route traffic more deterministically using a set of proportions that you specify, this policy can be selected. For example, if you would like Carrier IP A to receive 50% of the traffic and Carrier IP B to receive 50% of the traffic, we’ll create two individual A records (one for each Carrier IP) with a weight of 50 and 50, respectively. Learn more about Route 53 routing policies by visiting the Route 53 Developer Guide.

The proposed solution leverages weighted routing policy in Route 53 DNS to route traffic to multiple EC2 instances running within an AWS Wavelength zone.

Reference architecture

The following diagram illustrates the load-balancing component of the solution, where EC2 instances in an AWS Wavelength zone are assigned Carrier IP addresses. A weighted DNS record for a host (e.g., www.example.com) is updated with Carrier IP addresses.

DNS-based load balancing

When a device makes a DNS query, it will be returned to one of the Carrier IP addresses associated with the given domain name. With a large number of devices, we expect a fair distribution of load across all EC2 instances in the resource pool. Given the highly ephemeral mobile edge environments, it’s likely that Carrier IPs could frequently be allocated to accommodate a workload and released shortly thereafter. However, this unpredictable behavior could yield stale DNS records, resulting in a “blackhole” – routes to endpoints that no longer exist.

Time-To-Live (TTL) is a DNS attribute that specifies the amount of time, in seconds, that you want DNS recursive resolvers to cache information about this record.

In our example, we should set to 30 seconds to force DNS resolvers to retrieve the latest records from the authoritative nameservers and minimize stale DNS responses. However, a lower TTL has a direct impact on cost, as a result of increased number of calls from recursive resolvers to Route53 to constantly retrieve the latest records.

The core components of the solution are as follows:

Alongside the services above in the AWS Wavelength Zone, the following services are also leveraged in the AWS Region:

  • AWS Lambda – a serverless event-driven function that makes API calls to the Route 53 service to update DNS records.
  • Amazon EventBridge– a serverless event bus that reacts to EC2 instance lifecycle events and invokes the Lambda function to make DNS updates.
  • Route 53– cloud DNS service with a domain record pointing to AWS Wavelength-hosted resources.

In this post, we intentionally leave the specific load balancing software solution up to the customer. Customers can leverage various popular load balancers available on the AWS Marketplace, such as HAProxy and NGINX. To focus our solution on the auto-registration of DNS records to create functional load balancing, this solution is designed to support stateless workloads only. To support stateful workloads, sticky sessions – a process in which routes requests to the same target in a target group – must be configured by the underlying load balancer solution and are outside of the scope of what DNS can provide natively.

Automation overview

Using the aforementioned components, we can implement the following workflow automation:

Event-driven Auto Scaling Workflow

Amazon CloudWatch alarm can trigger the Auto Scaling group Scale out or Scale in event by adding or removing EC2 instances. Eventbridge will detect the EC2 instance state change event and invoke the Lambda function. This function will update the DNS record in Route53 by either adding (scale out) or deleting (scale in) a weighted A record associated with the EC2 instance changing state.

Configuration of the automatic auto scaling policy is out of the scope of this post. There are many auto scaling triggers that you can consider using, based on predefined and custom metrics such as memory utilization. For the demo purposes, we will be leveraging manual auto scaling.

In addition to the core components that were already described, our solution also utilizes AWS Identity and Access Management (IAM) policies and CloudWatch. Both services are key components to building AWS Well-Architected solutions on AWS. We also use AWS Systems Manager Parameter Store to keep track of user input parameters. The deployment of the solution is automated via AWS CloudFormation templates. The Lambda function provided should be uploaded to an AWS Simple Storage Service (Amazon S3) bucket.

Amazon Virtual Private Cloud (Amazon VPC), subnets, Carrier Gateway, and Route Tables are foundational building blocks for AWS-based networking infrastructure. In our deployment, we are creating a new VPC, one subnet in an AWS Wavelength zone of your choice, a Carrier Gateway, and updating the route table for this subnet to point the default route to the Carrier Gateway.

Wavelength VPC architecture.

Deployment prerequisites

The following are prerequisites to deploy the described solution in your account:

  • Access to an AWS Wavelength zone. If your account is not allow-listed to use AWS Wavelength zones, then opt-in to AWS Wavelength zones here.
  • Public DNS Hosted Zone hosted in Route 53. You must have access to a registered public domain to deploy this solution. The zone for this domain should be hosted in the same account where you plan to deploy AWS Wavelength workloads.
    If you don’t have a public domain, then you can register a new one. Note that there will be a service charge for the domain registration.
  • Amazon S3 bucket. For the Lambda function that updates DNS records in Route 53, store the source code as a .zip file in an Amazon S3 bucket.
  • Amazon EC2 Key pair. You can use an existing Key pair for the deployment. If you don’t have a KeyPair in the region where you plan to deploy this solution, then create one by following these instructions.
  • 4G or 5G-connected device. Although the infrastructure can be deployed independent of the underlying connected devices, testing the connectivity will require a mobile device on one of the Wavelength partner’s networks. View the complete list of Telecommunications providers and Wavelength Zone locations to learn more.

Conclusion

In this post, we demonstrated how to implement DNS-based load balancing for workloads running in an AWS Wavelength zone. We deployed the solution that used the EventBridge Rule and the Lambda function to update DNS records hosted by Route53. If you want to learn more about AWS Wavelength, subscribe to AWS Compute Blog channel here.