Tag Archives: Amazon EC2

New – Trigger a Kernel Panic to Diagnose Unresponsive EC2 Instances

Post Syndicated from Sébastien Stormacq original https://aws.amazon.com/blogs/aws/new-trigger-a-kernel-panic-to-diagnose-unresponsive-ec2-instances/

When I was working on systems deployed in on-premises data centers, I occasionally had to debug an unresponsive server. It usually involved asking someone to physically press a non-maskable interrupt (NMI) button on the frozen server or to send a signal to a command controller over a serial interface (yes, serial, as in RS-232). This command triggered the system to dump the state of the frozen kernel to a file for further analysis. Such a file is usually called a core dump or a crash dump. The crash dump includes an image of the memory of the crashed process, the system registers, the program counter, and other information useful in determining the root cause of the freeze.

Today, we are announcing a new Amazon Elastic Compute Cloud (EC2) API that allows you to remotely trigger a kernel panic on EC2 instances. The EC2:SendDiagnosticInterrupt API sends a diagnostic interrupt, similar to pressing an NMI button on a physical machine, to a running EC2 instance. It causes the instance’s hypervisor to send a non-maskable interrupt (NMI) to the operating system. How your operating system behaves when it receives an NMI depends on its configuration. Typically, it enters a kernel panic. The kernel panic behaviour also depends on the operating system configuration: it might generate the crash dump data file, obtain a backtrace, load a replacement kernel, or restart the system.

You can control who in your organisation is authorized to use this API through IAM policies; I give an example below.

Cloud and system engineers, or specialists in kernel diagnosis and debugging, find invaluable information in the crash dump for analysing the causes of a kernel freeze. Tools like WinDbg (on Windows) and crash (on Linux) can be used to inspect the dump.

Using Diagnostic Interrupt
Using this API is a three-step process. First, you need to configure the behavior of your OS when it receives the interrupt.

By default, our Windows Server AMIs have memory dumps already turned on, and automatic restart after the memory dump has been saved is also selected. The default location for the memory dump file is %SystemRoot%, which is equivalent to C:\Windows.

You can access these options by going to:
Start > Control Panel > System > Advanced System Settings > Startup and Recovery

On Amazon Linux 2, you need to install and configure kdump & kexec. This is a one-time setup.

$ sudo yum install kexec-tools

Then edit the file /etc/default/grub to allocate the amount of memory to be reserved for the crash kernel. In this example, we reserve 160M by adding crashkernel=160M. The amount of memory to allocate depends on your instance’s memory size. The general recommendation is to test kdump to see if the allocated memory is sufficient. The kernel doc has the full syntax of the crashkernel kernel parameter.

GRUB_CMDLINE_LINUX_DEFAULT="crashkernel=160M console=tty0 console=ttyS0,115200n8 net.ifnames=0 biosdevname=0 nvme_core.io_timeout=4294967295 rd.emergency=poweroff rd.shell=0"

And rebuild the grub configuration:

$ sudo grub2-mkconfig -o /boot/grub2/grub.cfg

Finally, edit /etc/sysctl.conf and add the line kernel.unknown_nmi_panic=1. This tells the kernel to trigger a kernel panic upon receiving the interrupt.

You are now ready to reboot your instance. Be sure to include these commands in your user data script or in your AMI to automatically configure this on all your instances. Once the instance is rebooted, verify that kdump is correctly started.

$ systemctl status kdump.service
● kdump.service - Crash recovery kernel arming
   Loaded: loaded (/usr/lib/systemd/system/kdump.service; enabled; vendor preset: enabled)
   Active: active (exited) since Fri 2019-07-05 15:09:04 UTC; 3h 13min ago
  Process: 2494 ExecStart=/usr/bin/kdumpctl start (code=exited, status=0/SUCCESS)
 Main PID: 2494 (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/kdump.service

Jul 05 15:09:02 ip-172-31-15-244.ec2.internal systemd[1]: Starting Crash recovery kernel arming...
Jul 05 15:09:04 ip-172-31-15-244.ec2.internal kdumpctl[2494]: kexec: loaded kdump kernel
Jul 05 15:09:04 ip-172-31-15-244.ec2.internal kdumpctl[2494]: Starting kdump: [OK]
Jul 05 15:09:04 ip-172-31-15-244.ec2.internal systemd[1]: Started Crash recovery kernel arming.
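To automate this one-time setup, you can bake the same commands into your AMI or user data, as suggested above. Here is a minimal sketch of a user data script for Amazon Linux 2; it assumes the default GRUB file layout and reuses the 160M reservation from the example, which you should adjust and test for your instance size.

#!/bin/bash
# Sketch: configure kdump and NMI panic behavior on Amazon Linux 2 at first boot.
yum install -y kexec-tools

# Reserve memory for the crash kernel on the kernel command line.
sed -i 's/^GRUB_CMDLINE_LINUX_DEFAULT="/GRUB_CMDLINE_LINUX_DEFAULT="crashkernel=160M /' /etc/default/grub
grub2-mkconfig -o /boot/grub2/grub.cfg

# Panic when an unknown NMI (the diagnostic interrupt) is received.
echo 'kernel.unknown_nmi_panic=1' >> /etc/sysctl.conf

# Arm kdump and reboot so the crashkernel reservation takes effect.
systemctl enable kdump.service
reboot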

Our documentation contains the instructions for other operating systems.

Once this one-time configuration is done, you’re ready for the second step: triggering the API. You can do this from any machine where the AWS CLI or SDK is configured. For example:

$ aws ec2 send-diagnostic-interrupt --region us-east-1 --instance-id <value>

There is no return value from the CLI; this is expected. If you have a terminal session open on that instance, it disconnects and your instance reboots. When you reconnect to your instance, you find the crash dump in /var/crash.

The third and last step is to analyse the content of the crash dump. On Linux systems, you need to install the crash utility and the debugging symbols for your version of the kernel. Note that the kernel version should be the same as the one captured by kdump. To find out which kernel you are currently running, use the uname -r command.

$ sudo yum install crash
$ sudo debuginfo-install kernel
$ sudo crash /usr/lib/debug/lib/modules/4.14.128-112.105.amzn2.x86_64/vmlinux /var/crash/127.0.0.1-2019-07-05-15\:08\:43/vmcore

crash 7.2.6-1.amzn2.0.1
... output suppressed for brevity ...

      KERNEL: /usr/lib/debug/lib/modules/4.14.128-112.105.amzn2.x86_64/vmlinux
    DUMPFILE: /var/crash/127.0.0.1-2019-07-05-15:08:43/vmcore  [PARTIAL DUMP]
        CPUS: 2
        DATE: Fri Jul  5 15:08:38 2019
      UPTIME: 00:07:23
LOAD AVERAGE: 0.00, 0.00, 0.00
       TASKS: 104
    NODENAME: ip-172-31-15-244.ec2.internal
     RELEASE: 4.14.128-112.105.amzn2.x86_64
     VERSION: #1 SMP Wed Jun 19 16:53:40 UTC 2019
     MACHINE: x86_64  (2500 Mhz)
      MEMORY: 7.9 GB
       PANIC: "Kernel panic - not syncing: NMI: Not continuing"
         PID: 0
     COMMAND: "swapper/0"
        TASK: ffffffff82013480  (1 of 2)  [THREAD_INFO: ffffffff82013480]
         CPU: 0
       STATE: TASK_RUNNING (PANIC)

Collecting kernel crash dumps is often the only way to gather kernel debugging information, so be sure to test this procedure frequently, in particular after updating your operating system or creating new AMIs.

Control Who Is Authorized to Send Diagnostic Interrupt
You can control who in your organisation is authorized to send the Diagnostic Interrupt, and to which instances, through IAM policies with resource-level permissions, as in the example below.

{
   "Version": "2012-10-17",
   "Statement": [
      {
      "Effect": "Allow",
      "Action": "ec2:SendDiagnosticInterrupt",
      "Resource": "arn:aws:ec2:region:account-id:instance/instance-id"
      }
   ]
}
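As a sketch, you could create such a policy and attach it to a user with the AWS CLI; the file name, policy name, account ID, and user name below are placeholders.

# Create the policy from the JSON document above, then attach it to a user.
$ aws iam create-policy \
    --policy-name SendDiagnosticInterrupt \
    --policy-document file://send-diagnostic-interrupt.json

$ aws iam attach-user-policy \
    --user-name on-call-engineer \
    --policy-arn arn:aws:iam::123456789012:policy/SendDiagnosticInterrupt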

Pricing
There are no additional charges for using this feature. However, as your instance continues to be in a ‘running’ state after it receives the diagnostic interrupt, instance billing will continue as usual.

Availability
You can send Diagnostic Interrupts to all EC2 instances powered by the AWS Nitro System, except A1 (Arm-based). As I write this, that includes C5, C5d, C5n, i3.metal, I3en, M5, M5a, M5ad, M5d, p3dn.24xlarge, R5, R5a, R5ad, R5d, T3, T3a, and Z1d.

The Diagnostic Interrupt API is now available in all public AWS Regions and GovCloud (US); you can start using it today.

— seb

Introducing the capacity-optimized allocation strategy for Amazon EC2 Spot Instances

Post Syndicated from Chad Schmutzer original https://aws.amazon.com/blogs/compute/introducing-the-capacity-optimized-allocation-strategy-for-amazon-ec2-spot-instances/

AWS announces the new capacity-optimized allocation strategy for Amazon EC2 Auto Scaling and EC2 Fleet. This new strategy automatically makes the most efficient use of spare capacity while still taking advantage of the steep discounts offered by Spot Instances. It’s a new way for you to gain easy access to extra EC2 compute capacity in the AWS Cloud.

This post compares how the capacity-optimized allocation strategy deploys capacity compared to the current lowest-price allocation strategy.

Overview

Spot Instances are spare EC2 compute capacity in the AWS Cloud available to you at savings of up to 90% compared to On-Demand prices. The only difference between On-Demand Instances and Spot Instances is that Spot Instances can be interrupted by EC2, with two minutes of notification, when EC2 needs the capacity back.

When making requests for Spot Instances, customers can take advantage of allocation strategies within services such as EC2 Auto Scaling and EC2 Fleet. The allocation strategy determines how the Spot portion of your request is fulfilled from the possible Spot Instance pools you provide in the configuration.

The existing allocation strategy available in EC2 Auto Scaling and EC2 Fleet is called “lowest-price” (with an option to diversify across N pools). This strategy allocates capacity strictly based on the lowest-priced Spot Instance pool or pools. The “diversified” allocation strategy (available in EC2 Fleet but not in EC2 Auto Scaling) spreads your Spot Instances across all the Spot Instance pools you’ve specified as evenly as possible.

As the AWS global infrastructure has grown over time in terms of geographic Regions and Availability Zones, as well as the raw number of EC2 instance families and types, so has the amount of spare EC2 capacity. It is therefore important that customers have access to tools that help them utilize spare EC2 capacity optimally. The new capacity-optimized strategy for both EC2 Auto Scaling and EC2 Fleet provisions Spot Instances from the most-available Spot Instance pools by analyzing capacity metrics.

Walkthrough

To illustrate how the capacity-optimized allocation strategy deploys capacity compared to the existing lowest-price allocation strategy, here are examples of Auto Scaling group configurations and use cases for each strategy.

Lowest-price (diversified over N pools) allocation strategy

The lowest-price allocation strategy deploys Spot Instances from the pools with the lowest price in each Availability Zone. This strategy has an optional modifier SpotInstancePools that provides the ability to diversify over the N lowest-priced pools in each Availability Zone.

Spot pricing changes slowly over time based on long-term trends in supply and demand, but capacity fluctuates in real time. The lowest-price strategy does not account for pool capacity depth as it deploys Spot Instances.

As a result, the lowest-price allocation strategy is a good choice for workloads with a low cost of interruption that want the lowest possible prices, such as:

  • Time-insensitive workloads
  • Extremely transient workloads
  • Workloads that are easily check-pointed and restarted

Example

The following example configuration shows how capacity could be allocated in an Auto Scaling group using the lowest-price allocation strategy diversified over two pools:

{
  "AutoScalingGroupName": "runningAmazonEC2WorkloadsAtScale",
  "MixedInstancesPolicy": {
    "LaunchTemplate": {
      "LaunchTemplateSpecification": {
        "LaunchTemplateName": "my-launch-template",
        "Version": "$Latest"
      },
      "Overrides": [
        {
          "InstanceType": "c3.large"
        },
        {
          "InstanceType": "c4.large"
        },
        {
          "InstanceType": "c5.large"
        }
      ]
    },
    "InstancesDistribution": {
      "OnDemandPercentageAboveBaseCapacity": 0,
      "SpotAllocationStrategy": "lowest-price",
      "SpotInstancePools": 2
    }
  },
  "MinSize": 10,
  "MaxSize": 100,
  "DesiredCapacity": 60,
  "HealthCheckType": "EC2",
  "VPCZoneIdentifier": "subnet-a1234567890123456,subnet-b1234567890123456,subnet-c1234567890123456"
}

In this configuration, you request 60 Spot Instances because DesiredCapacity is set to 60 and OnDemandPercentageAboveBaseCapacity is set to 0. The example follows Spot best practices and is flexible across c3.large, c4.large, and c5.large in us-east-1a, us-east-1b, and us-east-1c (mapped according to the subnets in VPCZoneIdentifier). The Spot allocation strategy is set to lowest-price over two SpotInstancePools.
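If you save this example configuration to a file, a sketch of creating the Auto Scaling group with the AWS CLI could look like the following (the file name is a placeholder):

# Create the Auto Scaling group from the JSON configuration shown above.
$ aws autoscaling create-auto-scaling-group --cli-input-json file://asg-lowest-price.json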

First, EC2 Auto Scaling tries to make sure that it balances the requested capacity across all the Availability Zones provided in the request. To do so, it splits the target capacity request of 60 across the three zones. Then, the lowest-price allocation strategy allocates the Spot Instance launches to the lowest-priced pool per zone.

Using the example Spot prices shown in the following table, the resulting allocation is:

  • 20 Spot Instances from us-east-1a (10 c3.large, 10 c4.large)
  • 20 Spot Instances from us-east-1b (10 c3.large, 10 c4.large)
  • 20 Spot Instances from us-east-1c (10 c3.large, 10 c4.large)

Availability Zone   Instance type   Spot Instances allocated   Spot price
us-east-1a          c3.large        10                         $0.0294
us-east-1a          c4.large        10                         $0.0308
us-east-1a          c5.large        0                          $0.0408
us-east-1b          c3.large        10                         $0.0294
us-east-1b          c4.large        10                         $0.0308
us-east-1b          c5.large        0                          $0.0387
us-east-1c          c3.large        10                         $0.0294
us-east-1c          c4.large        10                         $0.0331
us-east-1c          c5.large        0                          $0.0353

The cost for this Auto Scaling group is $1.83/hour. Of course, the Spot Instances are allocated according to the lowest price and are not optimized for capacity. The Auto Scaling group could experience higher interruptions if the lowest-priced Spot Instance pools are not as deep as others, since upon interruption the Auto Scaling group will attempt to re-provision instances into the lowest-priced Spot Instance pools.

Capacity-optimized allocation strategy

There is a price associated with interruptions, restarting work, and checkpointing. While the overall hourly cost of the capacity-optimized allocation strategy might be slightly higher, the possibility of having fewer interruptions can lower the overall cost of your workload.

The effectiveness of the capacity-optimized allocation strategy depends on following Spot best practices by being flexible and providing as many instance types and Availability Zones (Spot Instance pools) as possible in the configuration. It is also important to understand that as capacity demands change, the allocations provided by this strategy also change over time.

Remember that Spot pricing changes slowly over time based on long-term trends in supply and demand, but capacity fluctuates in real time. The capacity-optimized strategy does account for pool capacity depth as it deploys Spot Instances, but it does not account for Spot prices.

As a result, the capacity-optimized allocation strategy is a good choice for workloads with a high cost of interruption, such as:

  • Big data and analytics
  • Image and media rendering
  • Machine learning
  • High performance computing

Example

The following example configuration shows how capacity could be allocated in an Auto Scaling group using the capacity-optimized allocation strategy:

{
  "AutoScalingGroupName": "runningAmazonEC2WorkloadsAtScale",
  "MixedInstancesPolicy": {
    "LaunchTemplate": {
      "LaunchTemplateSpecification": {
        "LaunchTemplateName": "my-launch-template",
        "Version": "$Latest"
      },
      "Overrides": [
        {
          "InstanceType": "c3.large"
        },
        {
          "InstanceType": "c4.large"
        },
        {
          "InstanceType": "c5.large"
        }
      ]
    },
    "InstancesDistribution": {
      "OnDemandPercentageAboveBaseCapacity": 0,
      "SpotAllocationStrategy": "capacity-optimized"
    }
  },
  "MinSize": 10,
  "MaxSize": 100,
  "DesiredCapacity": 60,
  "HealthCheckType": "EC2",
  "VPCZoneIdentifier": "subnet-a1234567890123456,subnet-b1234567890123456,subnet-c1234567890123456"
}

In this configuration, you request 60 Spot Instances because DesiredCapacity is set to 60 and OnDemandPercentageAboveBaseCapacity is set to 0. The example follows Spot best practices (especially critical when using the capacity-optimized allocation strategy) and is flexible across c3.large, c4.large, and c5.large in us-east-1a, us-east-1b, and us-east-1c (mapped according to the subnets in VPCZoneIdentifier). The Spot allocation strategy is set to capacity-optimized.
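If you already run a group like the one in the lowest-price example, a hedged sketch of switching its Spot allocation strategy with the AWS CLI might look like this (it reuses the names from the example configuration above):

# Switch an existing Auto Scaling group to the capacity-optimized strategy.
$ aws autoscaling update-auto-scaling-group \
    --auto-scaling-group-name runningAmazonEC2WorkloadsAtScale \
    --mixed-instances-policy '{
      "LaunchTemplate": {
        "LaunchTemplateSpecification": {
          "LaunchTemplateName": "my-launch-template",
          "Version": "$Latest"
        },
        "Overrides": [
          {"InstanceType": "c3.large"},
          {"InstanceType": "c4.large"},
          {"InstanceType": "c5.large"}
        ]
      },
      "InstancesDistribution": {
        "OnDemandPercentageAboveBaseCapacity": 0,
        "SpotAllocationStrategy": "capacity-optimized"
      }
    }'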

First, EC2 Auto Scaling tries to make sure that the requested capacity is evenly balanced across all the Availability Zones provided in the request. To do so, it splits the target capacity request of 60 across the three zones. Then, the capacity-optimized allocation strategy optimizes the Spot Instance launches by analyzing capacity metrics per instance type per zone. This is because this strategy effectively optimizes by capacity instead of by the lowest price (hence its name).

Using the example Spot prices shown in the following table, the resulting allocation is:

  • 20 Spot Instances from us-east-1a (20 c4.large)
  • 20 Spot Instances from us-east-1b (20 c3.large)
  • 20 Spot Instances from us-east-1c (20 c5.large)

Availability Zone   Instance type   Spot Instances allocated   Spot price
us-east-1a          c3.large        0                          $0.0294
us-east-1a          c4.large        20                         $0.0308
us-east-1a          c5.large        0                          $0.0408
us-east-1b          c3.large        20                         $0.0294
us-east-1b          c4.large        0                          $0.0308
us-east-1b          c5.large        0                          $0.0387
us-east-1c          c3.large        0                          $0.0294
us-east-1c          c4.large        0                          $0.0308
us-east-1c          c5.large        20                         $0.0353

The cost for this Auto Scaling group is $1.91/hour, only about 5% more than in the lowest-price example above. However, notice that the distribution of the Spot Instances is different. This is because the capacity-optimized allocation strategy determined this was the most efficient distribution from an available-capacity perspective.

Conclusion

Consider using the new capacity-optimized allocation strategy to make the most efficient use of spare capacity and automatically deploy into the most available Spot Instance pools, while still taking advantage of the steep discounts provided by Spot Instances.

This allocation strategy may be especially useful for workloads with a high cost of interruption, including:

  • Big data and analytics
  • Image and media rendering
  • Machine learning
  • High performance computing

No matter which allocation strategy you choose, you still enjoy the steep discounts provided by Spot Instances. These discounts are possible thanks to the stable Spot pricing made available with the new Spot pricing model.

Chad Schmutzer is a Principal Developer Advocate for the EC2 Spot team. Follow him on Twitter to get the latest updates on saving at scale with Spot Instances, to provide feedback, or just say HI.

Optimizing Amazon ECS task density using awsvpc network mode

Post Syndicated from Ignacio Riesgo original https://aws.amazon.com/blogs/compute/optimizing-amazon-ecs-task-density-using-awsvpc-network-mode/

This post is contributed by Tony Pujals | Senior Developer Advocate, AWS

 

AWS recently increased the number of elastic network interfaces available when you run tasks on Amazon ECS, through an account setting called awsvpcTrunking. If you use the Amazon EC2 launch type and task networking (awsvpc network mode), you can now run 5 to 17 times as many tasks on an instance as you could before.

As more of you embrace microservices architectures, you deploy increasing numbers of smaller tasks. AWS now offers you the option of more efficient packing per instance, potentially resulting in smaller clusters and associated savings.

 

Overview

To manage your own cluster of EC2 instances, use the EC2 launch type. Use task networking to run ECS tasks using the same networking properties as if tasks were distinct EC2 instances.

Task networking offers several benefits. Every task launched with awsvpc network mode has its own attached network interface, a primary private IP address, and an internal DNS hostname. This simplifies container networking and gives you more control over how tasks communicate, both with each other and with other services within their virtual private clouds (VPCs).

Task networking also lets you take advantage of other EC2 networking features like VPC Flow Logs. This feature lets you monitor traffic to and from tasks. It also provides greater security control for containers, allowing you to use security groups and network monitoring tools at a more granular level within tasks. For more information, see Introducing Cloud Native Networking for Amazon ECS Containers.

However, if you run container tasks on EC2 instances with task networking, you can face a networking limit. This might surprise you, particularly when an instance has plenty of free CPU and memory. The limit reflects the number of network interfaces available to support awsvpc network mode per container instance.

 

Raise network interface density limits with trunking

The good news is that AWS raised network interface density limits by implementing a networking feature on ECS called “trunking.” This is a technique for multiplexing data over a shared communication link.

If you’re migrating to microservices using AWS App Mesh, you should optimize network interface density. App Mesh requires awsvpc networking to provide routing control and visibility over an ever-expanding array of running tasks. In this context, increased network interface density might save money.

By opting for network interface trunking, you should see a significant increase in capacity—from 5 to 17 times more than the previous limit. For more information on the new task limits per container instance, see Supported Amazon EC2 Instance Types.

Applications with tasks not hitting CPU or memory limits also benefit from this feature through the more cost-effective “bin packing” of container instances.

 

Trunking is an opt-in feature

AWS chose to make the trunking feature opt-in due to the following factors:

  • Instance registration: While normal instance registration is straightforward with trunking, this feature increases the number of asynchronous instance registration steps that can potentially fail. Any such failures might add extra seconds to launch time.
  • Available IP addresses: The “trunk” belongs to the same subnet in which the instance’s primary network interface originates. This effectively reduces the available IP addresses and potentially the ability to scale out on other EC2 instances sharing the same subnet. The trunk consumes an IP address. With a trunk attached, there are two assigned IP addresses per instance, one for the primary interface and one for the trunk.
  • Differing customer preferences and infrastructure: If you have high CPU or memory workloads, you might not benefit from trunking. Or, you may not want awsvpc networking.

Consequently, AWS leaves it to you to decide if you want to use this feature. AWS might revisit this decision in the future, based on customer feedback. For now, your account roles or users must opt in to the awsvpcTrunking account setting to gain the benefits of increased task density per container instance.

 

Enable trunking

Enable the ECS elastic network interface trunking feature to increase the number of network interfaces that can be attached to supported EC2 container instance types. You must meet the following prerequisites before you can launch a container instance with the increased network interface limits:

  • Your account must have the AWSServiceRoleForECS service-linked role for ECS.
  • You must opt in to the awsvpcTrunking account setting.

 

Make sure that a service-linked role exists for ECS

A service-linked role is a unique type of IAM role linked to an AWS service (such as ECS). This role lets you delegate the permissions necessary to call other AWS services on your behalf. Because ECS is a service that manages resources on your behalf, you need this role to proceed.

In most cases, you won’t have to create a service-linked role. If you created or updated an ECS cluster, ECS likely created the service-linked role for you.

You can confirm that your service-linked role exists using the AWS CLI, as shown in the following code example:

$ aws iam get-role --role-name AWSServiceRoleForECS
{
    "Role": {
        "Path": "/aws-service-role/ecs.amazonaws.com/",
        "RoleName": "AWSServiceRoleForECS",
        "RoleId": "AROAJRUPKI7I2FGUZMJJY",
        "Arn": "arn:aws:iam::226767807331:role/aws-service-role/ecs.amazonaws.com/AWSServiceRoleForECS",
        "CreateDate": "2018-11-09T21:27:17Z",
        "AssumeRolePolicyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Effect": "Allow",
                    "Principal": {
                        "Service": "ecs.amazonaws.com"
                    },
                    "Action": "sts:AssumeRole"
                }
            ]
        },
        "Description": "Role to enable Amazon ECS to manage your cluster.",
        "MaxSessionDuration": 3600
    }
}

If the service-linked role does not exist, create it manually with the following command:

aws iam create-service-linked-role --aws-service-name ecs.amazonaws.com

For more information, see Using Service-Linked Roles for Amazon ECS.

 

Opt in to the awsvpcTrunking account setting

Your account, IAM user, or role must opt in to the awsvpcTrunking account setting. Select this setting using the AWS CLI or the ECS console. You can opt in for an account by making awsvpcTrunking its default setting. Or, you can enable this setting for the role associated with the instance profile with which the instance launches. For instructions, see Account Settings.
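For example, a sketch of opting in from the AWS CLI, either for the account default or for a specific role, might look like this (the role ARN is a placeholder):

# Opt the account's default setting in to ENI trunking.
$ aws ecs put-account-setting-default --name awsvpcTrunking --value enabled

# Or opt in only the role used by your container instances.
$ aws ecs put-account-setting --name awsvpcTrunking --value enabled \
    --principal-arn arn:aws:iam::123456789012:role/ecsInstanceRole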

 

Other considerations

After completing the prerequisites described in the preceding sections, launch a new container instance with increased network interface limits using one of the supported EC2 instance types.

Keep the following in mind:

  • It’s available with the latest variant of the ECS-optimized AMI.
  • It only affects creation of new container instances after opting into awsvpcTrunking.
  • It only affects tasks created with awsvpc network mode and EC2 launch type. Tasks created with the AWS Fargate launch type always have a dedicated network interface, no matter how many you launch.

For details, see ENI Trunking Considerations.

 

Summary

If you seek to optimize the usage of your EC2 container instances for clusters that you manage, enable the increased network interface density feature with awsvpcTrunking. By following the steps outlined in this post, you can launch tasks using significantly fewer EC2 instances. This is especially useful if you embrace a microservices architecture, with its increasing numbers of lighter tasks.

Hopefully, you found this post informative and the proposed solution intriguing. As always, AWS welcomes all feedback or comment.

Using AWS App Mesh with Fargate

Post Syndicated from Ignacio Riesgo original https://aws.amazon.com/blogs/compute/using-aws-app-mesh-with-fargate/

This post is contributed by Tony Pujals | Senior Developer Advocate, AWS

 

AWS App Mesh is a service mesh, which provides a framework to control and monitor services spanning multiple AWS compute environments. My previous post provided a walkthrough to get you started. In it, I showed deploying a simple microservice application to Amazon ECS and configuring App Mesh to provide traffic control and observability.

In this post, I show more advanced techniques using AWS Fargate as an ECS launch type. I show you how to deploy a specific version of the colorteller service from the previous post. Finally, I move on and explore distributing traffic across other environments, such as Amazon EC2 and Amazon EKS.

I simplified this example for clarity, but in the real world, creating a service mesh that bridges different compute environments becomes useful. Fargate is a compute service for AWS that helps you run containerized tasks using the primitives (the tasks and services) of an ECS application. This lets you work without needing to directly configure and manage EC2 instances.

 

Solution overview

This post assumes that you already have a containerized application running on ECS, but want to shift your workloads to use Fargate.

You deploy a new version of the colorteller service with Fargate, and then begin shifting traffic to it. If all goes well, you continue to shift more traffic to the new version until it serves 100% of all requests. Use the labels “blue” to represent the original version and “green” to represent the new version. The following diagram shows the programmer model of the Color App.

You want to begin shifting traffic from version 1 (represented by colorteller-blue in the following diagram) to version 2 (represented by colorteller-green).

In App Mesh, every version of a service is ultimately backed by actual running code somewhere, in this case ECS/Fargate tasks. Each service has its own virtual node representation in the mesh that provides this conduit.

The following diagram shows the App Mesh configuration of the Color App.

 

 

Before you can shift the traffic, both versions must be physically deployed to a compute environment. In this demo, colorteller-blue runs on ECS using the EC2 launch type and colorteller-green runs on ECS using the Fargate launch type. The goal is to test with a portion of traffic going to colorteller-green, ultimately increasing to 100% of traffic going to the new green version.

 

AWS compute model of the Color App.

Prerequisites

Before following along, set up the resources and deploy the Color App as described in the previous walkthrough.

 

Deploy the Fargate app

To get started after you complete your Color App, configure it so that your traffic goes to colorteller-blue for now. The blue color represents version 1 of your colorteller service.

Log into the App Mesh console and navigate to Virtual routers for the mesh. Configure the HTTP route to send 100% of traffic to the colorteller-blue virtual node.

The following screenshot shows routes in the App Mesh console.

Test the service and confirm in AWS X-Ray that the traffic flows through the colorteller-blue as expected with no errors.

The following screenshot shows tracing the colorgateway virtual node.

 

Deploy the new colorteller to Fargate

With your original app in place, deploy the second version on Fargate and begin slowly shifting traffic to it rather than the original. The app colorteller-green represents version 2 of the colorteller service. Initially, send only 30% of your traffic to it.

If your monitoring indicates a healthy service, then increase it to 60%, then finally to 100%. In the real world, you might choose more granular increases with automated rollout (and rollback if issues arise), but this demonstration keeps things simple.

You pushed the gateway and colorteller images to ECR (see Deploy Images) in the previous post, and then launched ECS tasks with these images. For this post, launch an ECS task using the Fargate launch type with the same colorteller and envoy images. This sets up the running envoy container as a sidecar for the colorteller container.

With the Fargate launch type, you don’t have to configure EC2 instances manually. Fargate automatically colocates the sidecar with the primary application container on the same underlying host and gives both the same lifecycle.

To begin deploying the Fargate instance and diverting traffic to it, follow these steps.

 

Step 1: Update the mesh configuration

You can download updated AWS CloudFormation templates located in the repo under walkthroughs/fargate.

This updated mesh configuration adds a new virtual node (colorteller-green-vn). It updates the virtual router (colorteller-vr) for the colorteller virtual service so that it distributes traffic between the blue and green virtual nodes at a 2:1 ratio. That is, the green node receives one-third of the traffic.

$ ./appmesh-colorapp.sh
...
Waiting for changeset to be created..
Waiting for stack create/update to complete
...
Successfully created/updated stack - DEMO-appmesh-colorapp
$

Step 2: Deploy the green task to Fargate

The fargate-colorteller.sh script creates parameterized template definitions before deploying the fargate-colorteller.yaml CloudFormation template. The change to launch a colorteller task as a Fargate task is in fargate-colorteller-task-def.json.

$ ./fargate-colorteller.sh
...

Waiting for changeset to be created..
Waiting for stack create/update to complete
Successfully created/updated stack - DEMO-fargate-colorteller
$

 

Verify the Fargate deployment

The ColorApp endpoint is one of the CloudFormation template’s outputs. You can view it in the stack output in the AWS CloudFormation console, or fetch it with the AWS CLI:

$ colorapp=$(aws cloudformation describe-stacks --stack-name=$ENVIRONMENT_NAME-ecs-colorapp --query="Stacks[0].Outputs[?OutputKey=='ColorAppEndpoint'].OutputValue" --output=text); echo $colorapp
http://DEMO-Publi-YGZIJQXL5U7S-471987363.us-west-2.elb.amazonaws.com

With the endpoint assigned to the colorapp environment variable, you can use it for a few curl requests:

$ curl $colorapp/color
{"color":"blue", "stats": {"blue":1}}
$

The 2:1 weight of blue to green provides predictable results. Clear the histogram and run a batch of requests to observe the distribution:

$ curl $colorapp/color/clear
cleared

$ for ((n=0;n<200;n++)); do echo "$n: $(curl -s $colorapp/color)"; done

0: {"color":"blue", "stats": {"blue":1}}
1: {"color":"green", "stats": {"blue":0.5,"green":0.5}}
2: {"color":"blue", "stats": {"blue":0.67,"green":0.33}}
3: {"color":"green", "stats": {"blue":0.5,"green":0.5}}
4: {"color":"blue", "stats": {"blue":0.6,"green":0.4}}
5: {"color":"green", "stats": {"blue":0.5,"green":0.5}}
6: {"color":"blue", "stats": {"blue":0.57,"green":0.43}}
7: {"color":"blue", "stats": {"blue":0.63,"green":0.38}}
8: {"color":"green", "stats": {"blue":0.56,"green":0.44}}
...
199: {"color":"blue", "stats": {"blue":0.66,"green":0.34}}

This reflects the expected result for a 2:1 ratio. Check everything on your AWS X-Ray console.

The following screenshot shows the X-Ray console map after the initial testing.

The results look good: 100% success, no errors.

You can now increase the rollout of the new (green) version of your service running on Fargate.

Using AWS CloudFormation to manage your stacks lets you keep your configuration under version control and simplifies the process of deploying resources. AWS CloudFormation also gives you the option to update the virtual route in appmesh-colorapp.yaml and deploy the updated mesh configuration by running appmesh-colorapp.sh.

For this post, use the App Mesh console to make the change. Choose Virtual routers for appmesh-mesh, and edit the colorteller-route. Update the HTTP route so colorteller-blue-vn handles 33.3% of the traffic and colorteller-green-vn now handles 66.7%.
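If you prefer scripting this step instead, a rough sketch of an equivalent route update with the AWS CLI might look like the following; the mesh, router, route, and virtual node names come from this walkthrough, and the 1:2 weights give green roughly two-thirds of the traffic.

$ aws appmesh update-route \
    --mesh-name appmesh-mesh \
    --virtual-router-name colorteller-vr \
    --route-name colorteller-route \
    --spec '{
      "httpRoute": {
        "match": {"prefix": "/"},
        "action": {
          "weightedTargets": [
            {"virtualNode": "colorteller-blue-vn", "weight": 1},
            {"virtualNode": "colorteller-green-vn", "weight": 2}
          ]
        }
      }
    }'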

Run your simple verification test again:

$ curl $colorapp/color/clear
cleared
fargate $ for ((n=0;n<200;n++)); do echo "$n: $(curl -s $colorapp/color)"; done
0: {"color":"green", "stats": {"green":1}}
1: {"color":"blue", "stats": {"blue":0.5,"green":0.5}}
2: {"color":"green", "stats": {"blue":0.33,"green":0.67}}
3: {"color":"green", "stats": {"blue":0.25,"green":0.75}}
4: {"color":"green", "stats": {"blue":0.2,"green":0.8}}
5: {"color":"green", "stats": {"blue":0.17,"green":0.83}}
6: {"color":"blue", "stats": {"blue":0.29,"green":0.71}}
7: {"color":"green", "stats": {"blue":0.25,"green":0.75}}
...
199: {"color":"green", "stats": {"blue":0.32,"green":0.68}}
$

If your results look good, double-check your result in the X-Ray console.

Finally, shift 100% of your traffic over to the new colorteller version. You could make this change in the App Mesh console as well, but this time modify the mesh configuration template and redeploy it:

appmesh-colorteller.yaml
  ColorTellerRoute:
    Type: AWS::AppMesh::Route
    DependsOn:
      - ColorTellerVirtualRouter
      - ColorTellerGreenVirtualNode
    Properties:
      MeshName: !Ref AppMeshMeshName
      VirtualRouterName: colorteller-vr
      RouteName: colorteller-route
      Spec:
        HttpRoute:
          Action:
            WeightedTargets:
              - VirtualNode: colorteller-green-vn
                Weight: 1
          Match:
            Prefix: "/"
$ ./appmesh-colorapp.sh
...
Waiting for changeset to be created..
Waiting for stack create/update to complete
...
Successfully created/updated stack - DEMO-appmesh-colorapp
$

Again, repeat your verification process in both the CLI and X-Ray to confirm that the new version of your service is running successfully.

 

Conclusion

In this walkthrough, I showed you how to roll out an update from version 1 (blue) of the colorteller service to version 2 (green). I demonstrated that App Mesh supports a mesh spanning ECS services that you ran as EC2 tasks and as Fargate tasks.

In my next walkthrough, I will demonstrate that App Mesh handles even uncontainerized services launched directly on EC2 instances. It provides a uniform and powerful way to control and monitor your distributed microservice applications on AWS.

If you have any questions or feedback, feel free to comment below.

New: Using Amazon EC2 Instance Connect for SSH access to your EC2 Instances

Post Syndicated from Chris Munns original https://aws.amazon.com/blogs/compute/new-using-amazon-ec2-instance-connect-for-ssh-access-to-your-ec2-instances/

This post is courtesy of Saloni Sonpal – Senior Product Manager – Amazon EC2

Today, AWS is introducing Amazon EC2 Instance Connect, a new way to control SSH access to your EC2 instances using AWS Identity and Access Management (IAM).

About Amazon EC2 Instance Connect

While configuration management and infrastructure as code (IaC) tools such as Chef and Puppet have become customary in the industry for configuring servers, you occasionally must access your instances to fine-tune them, consult system logs, or debug application issues. The most common tool to connect to Linux servers is Secure Shell (SSH). It was created in 1995 and is now installed by default on almost every Linux distribution.

When connecting to hosts via SSH, SSH key pairs are often used to individually authorize users. As a result, organizations have to store, share, manage access for, and maintain these SSH keys.

Some organizations also maintain bastion hosts, which help limit network access into hosts by the use of a single jump point. They provide logging and prevent rogue SSH access by adding an additional layer of network obfuscation. However, running bastion hosts comes with challenges. You maintain the installed user keys, handle rotation, and make sure that the bastion host is always available and, more importantly, secured.

Amazon EC2 Instance Connect simplifies many of these issues and provides the following benefits to help improve your security posture:

  • Centralized access control – You get centralized access control to your EC2 instances on a per-user and per-instance level. IAM policies and principals remove the need to share and manage SSH keys.
  • Short-lived keys – SSH keys are not persisted on the instance, but are ephemeral in nature. They are only accessible by an instance at the time that an authorized user connects, making it easier to grant or revoke access in real time. This also allows you to move away from long-lived keys. Instead, you generate one-time SSH keys each time that an authorized user connects, eliminating the need to track and maintain keys.
  • Auditability – User connections via EC2 Instance Connect are logged to AWS CloudTrail, providing the visibility needed to easily audit connection requests and maintain compliance.
  • Ubiquitous access – EC2 Instance Connect works seamlessly with your existing SSH client. You can also connect to your instances from a new browser-based SSH client in the EC2 console, providing a consistent experience without having to change your workflows or tools.

How EC2 Instance Connect works

When the EC2 Instance Connect feature is enabled on an instance, the SSH daemon (sshd) on that instance is configured to use a custom AuthorizedKeysCommand script. This script reads SSH public keys from instance metadata during the SSH authentication process and uses them to connect you to the instance.

The SSH public keys are only available for one-time use for 60 seconds in the instance metadata. To connect to the instance successfully, you must connect using SSH within this time window. Because the keys expire, there is no need to track or manage these keys directly, as you did previously.

Configuring an EC2 instance for EC2 Instance Connect

To get started using EC2 Instance Connect, you first configure your existing instances. Currently, EC2 Instance Connect supports Amazon Linux 2 and Ubuntu. Install the RPM or Debian package, respectively, to enable the feature. New Amazon Linux 2 instances have the EC2 Instance Connect feature enabled by default, so you can connect to those newly launched instances right away using SSH without any further configuration.

First, configure an existing instance. In this case, set up an Amazon Linux 2 instance running in your account. For the steps for Ubuntu, see Set Up EC2 Instance Connect.

  1. Connect to the instance using SSH. The instance is running a relatively recent version of Amazon Linux 2:
    [ec2-user@<instance> ~]$ uname -srv
    Linux 4.14.104-95.84.amzn2.x86_64 #1 SMP Fri Jun 21 12:40:53 UTC 2019
  2. Use the yum command to install the ec2-instance-connect RPM package.
    $ sudo yum install ec2-instance-connect
    Loaded plugins: extras_suggestions, langpacks, priorities, update-motd
    Resolving Dependencies
    --> Running transaction check
    ---> Package ec2-instance-connect.noarch 0:1.1-9.amzn2 will be installed
    --> Finished Dependency Resolution
    
    ........
    
    Installed:
      ec2-instance-connect.noarch 0:1.1-9.amzn2                                                                                                     
    
    Complete!

    This RPM installs a few scripts locally and changes the AuthorizedKeysCommand and AuthorizedKeysCommandUser configurations in /etc/ssh/sshd_config. If you are using a configuration management tool to manage your sshd configuration, install the package and add the lines as described in the documentation.
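To double-check what the package changed, you can take a quick look at the resulting sshd configuration; this is just a verification sketch:

# Show the EC2 Instance Connect entries that the package added to sshd.
$ sudo grep AuthorizedKeysCommand /etc/ssh/sshd_config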

With ec2-instance-connect installed, you are ready to set up your users and have them connect to instances.

Set up IAM users

First, allow an IAM user to be able to push their SSH keys up to EC2 Instance Connect. Create a new IAM policy so that you can add it to any other users in your organization.

  1. In the IAM console, choose Policies, Create Policy.
  2. On the Create Policy page, choose JSON and enter the following JSON document. Replace $REGION and $ACCOUNTID with your own values:
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "ec2-instance-connect:SendSSHPublicKey"
                ],
                "Resource": [
                    "arn:aws:ec2:$REGION:$ACCOUNTID:instance/*"
                ],
                "Condition": {
                    "StringEquals": {
                        "ec2:osuser": "ec2-user"
                    }
                }
            }
        ]
    }

    Use ec2-user as the value for ec2:osuser with Amazon Linux 2. Specify this so that the metadata is made available for the proper SSH user. For more information, see Actions, Resources, and Condition Keys for Amazon EC2 Instance Connect Service.

  3. Choose Review Policy.
  4. On the Review Policy page, name the policy, give it a description, and choose Create Policy.
  5. Attach the policy by creating a new group (I named mine “HostAdmins”).
  6. On the Attach Policy page, attach the policy that you just created and choose Next Step.
  7. Choose Create Group.
  8. On the Groups page, select your newly created group and choose Group Actions, Add Users to Group.
  9. Select the user or users to add to this group, then choose Add Users.

Your users can now use EC2 Instance Connect.
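If you prefer to script this instead of clicking through the console, a rough sketch with the AWS CLI might look like the following; the policy file, group name, account ID, and user name are placeholders:

# Create the policy from the JSON document shown above.
$ aws iam create-policy \
    --policy-name EC2InstanceConnectSendKey \
    --policy-document file://instance-connect-policy.json

# Create a group, attach the policy, and add a user to the group.
$ aws iam create-group --group-name HostAdmins
$ aws iam attach-group-policy --group-name HostAdmins \
    --policy-arn arn:aws:iam::123456789012:policy/EC2InstanceConnectSendKey
$ aws iam add-user-to-group --group-name HostAdmins --user-name alice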

Connecting to an instance using EC2 Instance Connect

With your instance configured and the users set with the proper policy, connect to your instance with your normal SSH client or directly, using the AWS Management Console.

To offer a seamless SSH experience, EC2 Instance Connect wraps up these steps in a command line tool. It also offers a browser-based interface in the console, which takes care of the SSH key generation and distribution for you.
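As a sketch, assuming the ec2instanceconnectcli package (which provides the mssh wrapper) is available from pip in your environment, connecting can be as simple as:

# Install the EC2 Instance Connect CLI wrapper (assumed package name).
$ pip install ec2instanceconnectcli

# Generate a one-time key, push it with SendSSHPublicKey, and open the SSH session.
$ mssh i-0989ec3292613a4f9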

To connect with your SSH client

  1. Generate the new private and public keys mynew_key and mynew_key.pub, respectively:
    $ ssh-keygen -t rsa -f mynew_key
  2. Use the following AWS CLI command to authorize the user and push the public key to the instance using the send-ssh-public-key command. To support this, you need the latest version of the AWS CLI.
    $ aws ec2-instance-connect send-ssh-public-key --region us-east-1 --instance-id i-0989ec3292613a4f9 --availability-zone us-east-1f --instance-os-user ec2-user --ssh-public-key file://mynew_key.pub
    {
        "RequestId": "505f8675-710a-11e9-9263-4d440e7745c6", 
        "Success": true
    } 
  3. After authentication, the public key is made available to the instance through the instance metadata for 60 seconds. During this time, connect to the instance using the associated private key:
    $ ssh -i mynew_key ec2-user@<instance-ip>

If for some reason you don’t connect within that 60-second window, you see the following error:

$ ssh -i mynew_key ec2-user@<instance-ip>
Permission denied (publickey,gssapi-keyex,gssapi-with-mic).

If that happens, run the send-ssh-public-key command again and reconnect using SSH within the 60-second window.

Now, connect to your instance from the console.

To connect from the Amazon EC2 console

  1. Open the Amazon EC2 console.
  2. In the left navigation pane, choose Instances and select the instance to which to connect.
  3. Choose Connect.
  4. On the Connect To Your Instance page, choose EC2 Instance Connect (browser-based SSH connection), Connect.

The following terminal window opens and you are now connected through SSH to your instance.

Auditing with CloudTrail

For every connection attempt, you can also view the event details. These include the destination instance ID, OS user name, and public key used to make the SSH connection, all of which correspond to the SendSSHPublicKey API calls in CloudTrail.

In the CloudTrail console, search for SendSSHPublicKey.

If EC2 Instance Connect has been used recently, you should see records of your users having called this API operation to send their SSH key to the target host. Viewing the event’s details shows you the instance and other valuable information that you might want to use for auditing.

In the following example, see the JSON from a CloudTrail event that shows the SendSSHPublicKey command in use:

{
    "eventVersion": "1.05",
    "userIdentity": {
        "type": "User",
        "principalId": "1234567890",
        "arn": "arn:aws:iam:: 1234567890:$USER",
        "accountId": "1234567890",
        "accessKeyId": "ABCDEFGHIJK3RFIONTQQ",
        "userName": "$ACCOUNT_NAME",
        "sessionContext": {
            "attributes": {
                "mfaAuthenticated": "true",
                "creationDate": "2019-05-07T18:35:18Z"
            }
        }
    },
    "eventTime": "2019-06-21T13:58:32Z",
    "eventSource": "ec2-instance-connect.amazonaws.com",
    "eventName": "SendSSHPublicKey",
    "awsRegion": "us-east-1",
    "sourceIPAddress": "34.204.194.237",
    "userAgent": "aws-cli/1.15.61 Python/2.7.16 Linux/4.14.77-70.59.amzn1.x86_64 botocore/1.10.60",
    "requestParameters": {
        "instanceId": "i-0989ec3292613a4f9",
        "osUser": "ec2-user",
        "SSHKey": {
            "publicKey": "ssh-rsa <removed>\\n"
        }
    },
    "responseElements": null,
    "requestID": "df1a5fa5-710a-11e9-9a13-cba345085725",
    "eventID": "070c0ca9-5878-4af9-8eca-6be4158359c4",
    "eventType": "AwsApiCall",
    "recipientAccountId": "1234567890"
}

If you’ve configured your AWS account to collect CloudTrail events in an S3 bucket, you can download and audit the information programmatically. For more information, see Getting and Viewing Your CloudTrail Log Files.
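For a quick programmatic check without downloading the full log files, a hedged sketch using the CloudTrail lookup-events API could look like this:

# List recent SendSSHPublicKey calls recorded by CloudTrail in this Region.
$ aws cloudtrail lookup-events \
    --lookup-attributes AttributeKey=EventName,AttributeValue=SendSSHPublicKey \
    --max-results 10 \
    --query "Events[].{Time:EventTime,User:Username,Id:EventId}" \
    --output table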

Conclusion

EC2 Instance Connect offers an alternative to complicated SSH key management strategies and includes the benefits of using built-in auditability with CloudTrail. By integrating with IAM and the EC2 instance metadata available on all EC2 instances, you get a secure way to distribute short-lived keys and control access by IAM policy.

There are some additional features in the works for EC2 Instance Connect. In the future, AWS hopes to launch tag-based authorization, which allows you to use resource tags in the condition of a policy to control access. AWS also plans to enable EC2 Instance Connect by default in popular Linux distros in addition to Amazon Linux 2.

EC2 Instance Connect is available now at no extra charge in the US East (Ohio and N. Virginia), US West (N. California and Oregon), Asia Pacific (Mumbai, Seoul, Singapore, Sydney, and Tokyo), Canada (Central), EU (Frankfurt, Ireland, London, and Paris), and South America (São Paulo) AWS Regions.

EC2 Instance Update – Two More Sizes of M5 & R5 Instances

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/ec2-instance-update-two-more-sizes-of-m5-r5-instances/

When I introduced the Nitro system last year I said:

The Nitro system is a rich collection of building blocks that can be assembled in many different ways, giving us the flexibility to design and rapidly deliver EC2 instance types with an ever-broadening selection of compute, storage, memory, and networking options. We will deliver new instance types more quickly than ever in the months to come, with the goal of helping you to build, migrate, and run even more types of workloads.

Today I am happy to make good on that promise, with the introduction of two additional sizes of the Intel and AMD-powered M5 and R5 instances, including optional NVMe storage. These additional sizes will make it easier for you to find an instance size that is a perfect match for your workload.

M5 Instances
These instances are designed for general-purpose workloads such as web servers, app servers, dev/test environments, gaming, logging, and media processing. Here are the specs:

Instance Name   vCPUs   RAM       Storage               EBS-Optimized Bandwidth   Network Bandwidth
m5.8xlarge      32      128 GiB   EBS Only              5 Gbps                    10 Gbps
m5.16xlarge     64      256 GiB   EBS Only              10 Gbps                   20 Gbps
m5a.8xlarge     32      128 GiB   EBS Only              3.5 Gbps                  Up to 10 Gbps
m5a.16xlarge    64      256 GiB   EBS Only              7 Gbps                    12 Gbps
m5d.8xlarge     32      128 GiB   2 x 600 GB NVMe SSD   5 Gbps                    10 Gbps
m5d.16xlarge    64      256 GiB   4 x 600 GB NVMe SSD   10 Gbps                   20 Gbps

If you are currently using m4.10xlarge or m4.16xlarge instances, you now have several upgrade paths.

To learn more, read M5 – The Next Generation of General-Purpose EC2 Instances, New Lower-Cost, AMD-Powered M5a and R5a EC2 Instances, and EC2 Instance Update – M5 Instances with Local NVMe Storage.

R5 Instances
These instances are designed for data mining, in-memory analytics, caching, simulations, and other memory-intensive workloads. Here are the specs:

Instance Name   vCPUs   RAM       Storage               EBS-Optimized Bandwidth   Network Bandwidth
r5.8xlarge      32      256 GiB   EBS Only              5 Gbps                    10 Gbps
r5.16xlarge     64      512 GiB   EBS Only              10 Gbps                   20 Gbps
r5a.8xlarge     32      256 GiB   EBS Only              3.5 Gbps                  Up to 10 Gbps
r5a.16xlarge    64      512 GiB   EBS Only              7 Gbps                    12 Gbps
r5d.8xlarge     32      256 GiB   2 x 600 GB NVMe SSD   5 Gbps                    10 Gbps
r5d.16xlarge    64      512 GiB   4 x 600 GB NVMe SSD   10 Gbps                   20 Gbps

If you are currently using r4.8xlarge or r4.16xlarge instances, you now have several easy and powerful upgrade paths.

To learn more, read Amazon EC2 Instance Update – Faster Processors and More Memory.

Things to Know
Here are a couple of things to keep in mind when you use these new instances:

Processor Choice – You can choose between Intel and AMD EPYC processors (instance names include an “a”). Read my post, New Lower-Cost AMD-Powered M5a and R5a EC2 Instances, to learn more.

AMIs – You can use the same AMIs that you use with your existing M5 and R5 instances.

Regions – The new sizes are available in all AWS Regions where the existing sizes are already available.

Local NVMe Storage – On “d” instances with local NVMe storage, the devices are encrypted using the XTS-AES-256 block cipher and a unique key. Each key is destroyed when the instance is stopped or terminated. The local devices have the same lifetime as the instance they are attached to, and do not stick around after the instance has been stopped or terminated.

Available Now
The new sizes are available in On-Demand, Spot, and Reserved Instance form and you can start using them today!

Jeff;

 

Now Available: New C5 instance sizes and bare metal instances

Post Syndicated from Julien Simon original https://aws.amazon.com/blogs/aws/now-available-new-c5-instance-sizes-and-bare-metal-instances/

Amazon EC2 C5 instances are very popular for running compute-heavy workloads like batch processing, distributed analytics, high-performance computing, machine/deep learning inference, ad serving, highly scalable multiplayer gaming, and video encoding.

Today, we are happy to expand the Amazon EC2 C5 family with:

  • New larger virtualized instance sizes: 12xlarge and 24xlarge,
  • A bare metal option.

The new C5 instance sizes run on Intel’s Second Generation Xeon Scalable processors (code-named Cascade Lake) with a sustained all-core turbo frequency of 3.6 GHz and a maximum single-core turbo frequency of 3.9 GHz.

The new processors also enable a new feature called Intel Deep Learning Boost, a capability based on the AVX-512 instruction set. Thanks to the new Vector Neural Network Instructions (AVX-512 VNNI), deep learning frameworks will speed up typical machine learning operations like convolution, and automatically improve inference performance over a wide range of workloads.

These instances are also based on the AWS Nitro System, with dedicated hardware accelerators for EBS processing (including crypto operations), the software-defined network inside of each Virtual Private Cloud (VPC), and ENA networking.

New C5 instance sizes: 12xlarge and 24xlarge

Previously, the largest C5 instance available was the c5.18xlarge, with 72 logical processors and 144 GiB of memory. As you can see below, the new 24xlarge size increases the available resources by 33%, so you can scale up and reduce the time required to complete compute-intensive tasks.

Instance Name   Logical Processors   Memory    EBS-Optimized Bandwidth   Network Bandwidth
c5.12xlarge     48                   96 GiB    7 Gbps                    12 Gbps
c5.24xlarge     96                   192 GiB   14 Gbps                   25 Gbps

Bare metal C5

Just like for existing bare metal instances (M5, M5d, R5, R5d, z1d, and so forth), your operating system runs directly on the underlying hardware with direct access to the processor.

As described in a previous blog post, you can leverage bare metal instances for applications that:

  • do not want to take the performance hit of nested virtualization,
  • need access to physical resources and low-level hardware features, such as performance counters and Intel VT that are not always available or fully supported in virtualized environments,
  • are intended to run directly on the hardware, or licensed and supported for use in non-virtualized environments.

Bare metal instances can also take advantage of Elastic Load Balancing, Auto Scaling, Amazon CloudWatch, and other AWS services.

Instance Name   Logical Processors   Memory    EBS-Optimized Bandwidth   Network Bandwidth
c5.metal        96                   192 GiB   14 Gbps                   25 Gbps

Now Available!

You can start using these new instances today in the following regions: US East (N. Virginia), US West (Oregon), Europe (Ireland), Europe (London), Europe (Frankfurt), Europe (Stockholm), Europe (Paris), Asia Pacific (Singapore), Asia Pacific (Sydney), and AWS GovCloud (US-West).

Please send us feedback and help us build the next generation of compute-optimized instances.

Julien;

New – Opt-in to Default Encryption for New EBS Volumes

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-opt-in-to-default-encryption-for-new-ebs-volumes/

My colleagues on the AWS team are always looking for ways to make it easier and simpler for you to protect your data from unauthorized access. This work is visible in many different ways, and includes the AWS Cloud Security page, the AWS Security Blog, a rich collection of AWS security white papers, an equally rich set of AWS security, identity, and compliance services, and a wide range of security features within individual services. As you might recall from reading this blog, many AWS services support encryption at rest & in transit, logging, IAM roles & policies, and so forth.

Default Encryption
Today I would like to tell you about a new feature that makes the use of encrypted Amazon EBS (Elastic Block Store) volumes even easier. This launch builds on some earlier EBS security launches including:

You can now specify that you want all newly created EBS volumes to be created in encrypted form, with the option to use the default key provided by AWS, or a key that you create. Because keys and EC2 settings are specific to individual AWS regions, you must opt-in on a region-by-region basis.

This new feature will let you reach your protection and compliance goals by making it simpler and easier for you to ensure that newly created volumes are created in encrypted form. It will not affect existing unencrypted volumes.

If you use IAM policies that require the use of encrypted volumes, you can use this feature to avoid launch failures that would occur if unencrypted volumes were inadvertently referenced when an instance is launched. Your security team can enable encryption by default without having to coordinate with your development team, and with no other code or operational changes.

Encrypted EBS volumes deliver the specified instance throughput, volume performance, and latency, at no extra charge. I open the EC2 Console, make sure that I am in the region of interest, and click Settings to get started:

Then I select Always encrypt new EBS volumes:

I can click Change the default key and choose one of my keys as the default:

Either way, I click Update to proceed. One thing to note here: This setting applies to a single AWS region; I will need to repeat the steps above for each region of interest, checking the option and choosing the key.

Going forward, all EBS volumes that I create in this region will be encrypted, with no additional effort on my part. When I create a volume, I can use the key that I selected in the EC2 Settings, or I can select a different one:

Any snapshots that I create are encrypted with the key that was used to encrypt the volume:

If I use the volume to create a snapshot, I can use the original key or I can choose another one:

Things to Know
Here are some important things that you should know about this important new AWS feature:

Older Instance Types – After you enable this feature, you will not be able to launch any more C1, M1, M2, or T1 instances or attach newly encrypted EBS volumes to existing instances of these types. We recommend that you migrate to newer instance types.

AMI Sharing – As I noted above, we recently gave you the ability to share encrypted AMIs with other AWS accounts. However, you cannot share them publicly, and you should use a separate account to create community AMIs, Marketplace AMIs, and public snapshots. To learn more, read How to Share Encrypted AMIs Across Accounts to Launch Encrypted EC2 Instances.

Other AWS Services – AWS services such as Amazon Relational Database Service (RDS) and Amazon WorkSpaces that use EBS for storage perform their own encryption and key management and are not affected by this launch. Services such as Amazon EMR that create volumes within your account will automatically respect the encryption setting, and will use encrypted volumes if the always-encrypt feature is enabled.

API / CLI Access – You can also access this feature from the EC2 CLI and the API.
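
For example, here is a minimal sketch using the AWS CLI; the region and the key ARN are placeholders, so adjust them to your own account.

# Opt in to default encryption for one region and confirm the setting
aws ec2 enable-ebs-encryption-by-default --region eu-west-1
aws ec2 get-ebs-encryption-by-default --region eu-west-1

# Optionally make one of your own CMKs the default key (placeholder ARN)
aws ec2 modify-ebs-default-kms-key-id --region eu-west-1 \
    --kms-key-id arn:aws:kms:eu-west-1:111122223333:key/example-key-id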

No Charge – There is no charge to enable or use encryption. If you are using encrypted AMIs and create a separate one for each AWS account, you can now share the AMI with other accounts, leading to a reduction in storage utilization and charges.

Per-Region – As noted above, you can opt-in to default encryption on a region-by-region basis.

Available Now
This feature is available now and you can start using it today in all public AWS regions and in GovCloud. It is not available in the AWS regions in China.

Jeff;

 

New – The Next Generation (I3en) of I/O-Optimized EC2 Instances

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-the-next-generation-i3en-of-i-o-optimized-ec2-instances/

Amazon’s Customer Obsession leadership principle says:

Leaders start with the customer and work backwards. They work vigorously to earn and keep customer trust. Although leaders pay attention to competitors, they obsess over customers.

Starting from the customer and working backwards means that we do not invent in a vacuum. Instead, we speak directly to our customers (both external and internal), ask detailed questions, and pay attention to what we learn. On the AWS side, we often hear about new use cases that help us to get a better understanding of what our customers are doing with AWS. For example, large-scale EC2 users provide us with another set of interesting data points, often expressed in terms of ratios between dollars, vCPUs, memory size, storage size, and networking throughput.

We launched the I3 instances (Now Available – I3 Instances for Demanding, I/O Intensive Workloads) just about two years ago. Our customers use them to host distributed file systems, relational & NoSQL databases, in-memory caches, key-value stores, data warehouses, and MapReduce clusters. Because our customers are always (in Jeff Bezos’ words) “divinely discontent”, they want I/O-optimized instances with even more power & storage. To be specific, they have asked us for:

  • A lower price per TB of storage
  • Increased storage density to allow consolidation of workloads and scale-up processing
  • A higher ratio of network bandwidth and instance storage to vCPUs

The crucial element here is that our customers were able to express their needs in a detailed and specific manner. Simply asking for something to be better, faster, and cheaper does not help us to make well-informed decisions.

New I3en Instances
Today I am happy to announce the I3en instances. Designed to meet these needs and to do an even better job of addressing the use cases that I mentioned above, these instances are powered by AWS-custom Intel Xeon Scalable (Skylake) processors with 3.1 GHz sustained all-core turbo performance, up to 60 TB of fast NVMe storage, and up to 100 Gbps of network bandwidth. Here are the specs:

Instance Name | vCPUs | Memory  | Local Storage (NVMe SSD) | Random Read IOPS (4 K Block) | Read Throughput (128 K Block) | EBS-Optimized Bandwidth | Network Bandwidth
i3en.large    | 2     | 16 GiB  | 1 x 1.25 TB              | 42.5 K                       | 325 MB/s                      | Up to 3,500 Mbps        | Up to 25 Gbps
i3en.xlarge   | 4     | 32 GiB  | 1 x 2.50 TB              | 85 K                         | 650 MB/s                      | Up to 3,500 Mbps        | Up to 25 Gbps
i3en.2xlarge  | 8     | 64 GiB  | 2 x 2.50 TB              | 170 K                        | 1.3 GB/s                      | Up to 3,500 Mbps        | Up to 25 Gbps
i3en.3xlarge  | 12    | 96 GiB  | 1 x 7.5 TB               | 250 K                        | 2 GB/s                        | Up to 3,500 Mbps        | Up to 25 Gbps
i3en.6xlarge  | 24    | 192 GiB | 2 x 7.5 TB               | 500 K                        | 4 GB/s                        | 3,500 Mbps              | 25 Gbps
i3en.12xlarge | 48    | 384 GiB | 4 x 7.5 TB               | 1 M                          | 8 GB/s                        | 7,000 Mbps              | 50 Gbps
i3en.24xlarge | 96    | 768 GiB | 8 x 7.5 TB               | 2 M                          | 16 GB/s                       | 14,000 Mbps             | 100 Gbps

In comparison to the I3 instances, the I3en instances offer:

  • A cost per GB of SSD instance storage that is up to 50% lower
  • Storage density (GB per vCPU) that is roughly 2.6x greater
  • Ratio of network bandwidth to vCPUs that is up to 2.7x greater

You will need HVM AMIs with the NVMe 1.0e and ENA drivers. You can also make use of the new Elastic Fabric Adapter (EFA) if you are using the i3en.24xlarge (read my recent post to learn more).
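
If you are not sure whether an AMI or instance meets those driver requirements, a quick check like the following can help; the AMI ID is a placeholder and the on-instance commands assume a recent Amazon Linux-style environment.

# Check whether the AMI advertises ENA support (placeholder AMI ID)
aws ec2 describe-images --image-ids ami-0123456789abcdef0 \
    --query 'Images[].EnaSupport'

# On a running instance, confirm the ena and nvme modules are present
modinfo ena | head -n 1
lsmod | grep -E 'ena|nvme'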

Now Available
You can launch I3en instances today in the US East (N. Virginia), US West (Oregon), and Europe (Ireland) Regions in On-Demand and Spot form. Reserved Instances, Dedicated Instances, and Dedicated Hosts are available.

Jeff;

 

 

SAP on AWS Update – Customer Case Studies, Scale-Up, Scale-Out, and More

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/sap-on-aws-update-customer-case-studies-scale-up-scale-out-and-more/

SAP SAPPHIRE NOW 2019 takes place this week in Florida! Many of my AWS colleagues will be there, and they would love to talk to you. Today, I would like to share some customer success stories and give you a brief update on the work that we have been doing to make sure that AWS is the best place for you to run your SAP-powered OLTP and OLAP enterprise workloads.

Customer Update
Let’s start with a quick recap of some recent customer success stories. Here are just a few of the many customers that are using SAP on AWS in production today:

Fiat Chrysler Automobiles – After exploring multiple options and vendors, FIAT decided to deploy SAP on AWS with Capgemini as a partner:

Engie – Read the case study to learn how this international energy provider has been able to transform and streamline their financial processes and drastically reduce the ramp-up time for new users from three days to one day by running SAP S/4HANA on AWS:


AIG – Watch the video to learn how AIG migrated 13 SAP landscapes from an on-premises environment to SAP HANA on AWS in 13 months, while reducing their infrastructure cost by $8M:

Sumitomo Chemical – Read this case study to learn how Sumitomo Chemical runs a huge number of SAP ERP batch jobs on AWS, cutting job processing time by around 40%:

There are additional SAP on AWS Case Studies for your reading pleasure!

AWS customers are making great use of the 6, 9, and 12 TB EC2 High Memory instances that we launched last year. They are building many different SAP Solutions on AWS, taking advantage of the SAP Rapid Migration Test Program, and working with members of the SAP Competency Partner Network.

What’s New
Our customers are building ever-larger SAP installations, using both scale-up (larger instances) and scale-out (more instances) models. We have been working with SAP to certify two additional scale-out options:

48 TB Scale-Out (S/4HANA) – When we launched the EC2 High Memory instances with 12 TB of memory last year, they were certified by SAP to run OLTP and OLAP HANA workloads in scale-up configurations. These instances now support additional configuration choices for your OLTP workloads. You can now use up to four of these 12 TB High Memory instances to run an OLTP S/4HANA solution in scale-out mode, while meeting all SAP requirements.

This is the first-ever SAP-certified scale-out configuration of S/4HANA on cloud instances. SAP recommends (SAP OSS Note 2408419) the use of bare metal platforms with a minimum of 8 CPUs and 6 TB of memory for running S/4HANA in scale-out mode. Since the EC2 High Memory instance with 12 TB of memory is an EC2 bare metal instance that combines the benefits of the cloud with the performance characteristics of a bare metal platform, it is able to support SAP-certified scale-out configurations for S/4HANA in the cloud. To learn more, read Announcing support for extremely large S/4HANA deployments on AWS and review the certification.

100 TB Scale-Out (BW4/HANA, BW on HANA, Datamart) – You can now use up to 25 x1e.32xlarge EC2 instances (thanks to TDI Phase 5) to create an OLAP solution that scales to 100 TB, again while meeting all SAP requirements. You can start with as little as 244 GB of memory and scale out to 100 TB; review the certification to learn more.

The 48 TB OLTP solution and the 100 TB OLAP solution are the largest SAP-certified solutions available from any cloud provider.

We also have a brand-new S/4HANA Quick Start to help you get going in minutes. It sets up a VPC that spans two Availability Zones, each with a public and private subnet, and a pair of EC2 instances. One instance hosts the primary copy of S/4HANA and the other hosts the secondary. Read the Quick Start to learn more:

What’s Next
Ok, still with me? I hope so, since I have saved the biggest news for last!

We are getting ready to extend our lineup of EC2 High Memory instances, and will make them available with 18 TB and 24 TB of memory in the fall of 2019. The instances will use second-generation Intel® Xeon® Scalable processors, and will be available in bare metal form. Like the existing EC2 High Memory instances, you will be able to run them in the same Virtual Private Cloud (VPC) that hosts your cloud-based business applications, and you will be able to make use of important EC2 features such as Elastic Block Store and advanced networking. You can launch, manage, and even resize these EC2 instances using the AWS Command Line Interface (CLI) and the AWS SDKs.

Here are screen shots of SAP HANA Studio running on 18 TB and 24 TB instances that are currently in development:

And here is the output from top on those instances:

Here is a handy reference to all of your scale-up and scale-out SAP HANA on AWS options:

If you want to learn more or you want to gain early access to the new instances, go ahead and contact us.

Jeff;

 

Now Available – Elastic Fabric Adapter (EFA) for Tightly-Coupled HPC Workloads

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/now-available-elastic-fabric-adapter-efa-for-tightly-coupled-hpc-workloads/

We announced Elastic Fabric Adapter (EFA) at re:Invent 2018 and made it available in preview form at the time. During the preview, AWS customers put EFA through its paces on a variety of tightly-coupled HPC workloads, providing us with valuable feedback and helping us to fine-tune the final product.

Now Available
Today I am happy to announce that EFA is now ready for production use in multiple AWS regions. It is ready to support demanding HPC workloads that need lower and more consistent network latency, along with higher throughput, than is possible with traditional TCP communication. This launch lets you apply the scale, flexibility, and elasticity of the AWS Cloud to tightly-coupled HPC apps and I can’t wait to hear what you do with it. You can, for example, scale up to thousands of compute nodes without having to reserve the hardware or the network ahead of time.

All About EFA
An Elastic Fabric Adapter is an AWS Elastic Network Adapter (ENA) with added capabilities (read my post, Elastic Network Adapter – High Performance Network Interface for Amazon EC2, to learn more about ENA). An EFA can still handle IP traffic, but also supports an important access model commonly called OS bypass. This model allows the application (most commonly through some user-space middleware) to access the network interface without having to get the operating system involved with each message. Doing so reduces overhead and allows the application to run more efficiently. Here’s what this looks like (source):

The MPI Implementation and libfabric layers of this cake play crucial roles:

MPI – Short for Message Passing Interface, MPI is a long-established communication protocol that is designed to support parallel programming. It provides functions that allow processes running on a tightly-coupled set of computers to communicate in a language-independent way.

libfabric – This library fits in between several different types of network fabric providers (including EFA) and higher-level libraries such as MPI. EFA supports the standard RDM (reliable datagram) and DGRM (unreliable datagram) endpoint types; to learn more, check out the libfabric Programmer’s Manual. EFA also supports a new protocol that we call Scalable Reliable Datagram; this protocol was designed to work within the AWS network and is implemented as part of our Nitro chip.

Working together, these two layers (and others that can be slotted in instead of MPI) allow you to bring your existing HPC code to AWS and run it with little or no change.

You can use EFA today on c5n.18xlarge and p3dn.24xlarge instances in all AWS regions where those instances are available. The instances can use EFA to communicate within a VPC subnet, and the security group must have ingress and egress rules that allow all traffic within the security group to flow. Each instance can have a single EFA, which can be attached when an instance is started or while it is stopped.
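
As a rough sketch of that security group requirement, the following AWS CLI commands create a group that allows all traffic to and from itself; the VPC ID is a placeholder, and if you have locked down outbound traffic you would add a matching self-referencing egress rule as well.

# Create a security group and allow all traffic between members of the group (placeholder VPC ID)
SG_ID=$(aws ec2 create-security-group --group-name efa-sg \
    --description "All traffic within the group, for EFA" \
    --vpc-id vpc-0123456789abcdef0 --query GroupId --output text)

aws ec2 authorize-security-group-ingress --group-id "$SG_ID" \
    --ip-permissions "IpProtocol=-1,UserIdGroupPairs=[{GroupId=$SG_ID}]"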

You will also need the following software components:

EFA Kernel Module – The EFA Driver is in the Amazon GitHub repo, and in the Amazon Linux & Amazon Linux 2 AMIs. We are working to add it to AMIs for other Linux distributions.

Libfabric Network Stack – You will need to use an AWS-custom version (already present in the Amazon Linux and Amazon Linux 2 AMIs) for now. We are working to get our changes into the next release (1.8) of libfabric.

MPI or NCCL Implementation – You can use Open MPI 3.1.3 (or later) or NCCL (2.3.8 or later) plus the OFI driver for NCCL. We are also working on support for the Intel MPI library.

You can launch an instance and attach an EFA using the CLI, API, or the EC2 Console, with CloudFormation support coming in a couple of weeks. If you are using the CLI, you need to include the subnet ID and ask for an EFA, like this (be sure to include the appropriate security group):

$ aws ec2 run-instances ... \
  --network-interfaces DeleteOnTermination=true,DeviceIndex=0,SubnetId=SUBNET,InterfaceType=efa

After your instance has launched, run lspci | grep efa0 to verify that the EFA device is attached. You can (but don’t have to) launch your instances in a Cluster Placement Group in order to benefit from physical adjacency when every light-foot counts. When used in this way, EFA can provide one-way MPI latency of 15.5 microseconds.
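
If you do want that physical adjacency, a minimal sketch of creating a cluster placement group and launching EFA-enabled instances into it might look like this; the AMI ID, subnet, and security group are placeholders.

# Create a cluster placement group and launch two EFA-enabled instances into it (placeholder IDs)
aws ec2 create-placement-group --group-name efa-cluster --strategy cluster

aws ec2 run-instances --image-id ami-0123456789abcdef0 \
    --instance-type c5n.18xlarge --count 2 \
    --placement GroupName=efa-cluster \
    --network-interfaces "DeleteOnTermination=true,DeviceIndex=0,SubnetId=subnet-0123456789abcdef0,Groups=sg-0123456789abcdef0,InterfaceType=efa"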

You can also create a Launch Template and use it to launch EC2 instances (either directly or as part of an EC2 Auto Scaling group) in On-Demand or Spot form, launch Spot Fleets, and run compute jobs on AWS Batch.

Learn More
To learn more about EFA, and to see some additional benchmarks, be sure to watch this re:Invent video (Scaling HPC Applications on EC2 w/ Elastic Fabric Adapter):

 

 

AWS Customer CFD Direct maintains the popular OpenFOAM platform for Computational Fluid Dynamics (CFD) and also produces CFD Direct From the Cloud (CFDDC), an AWS Marketplace offering that makes it easy for you to run OpenFOAM on AWS. They have been testing and benchmarking EFA and recently shared their measurements in a blog post titled OpenFOAM HPC with AWS EFA. In the post, they report on a pair of simulations:

External Aerodynamics Around a Car – This simulation scales extra-linearly to over 200 cores, gradually declining to linear scaling at 1000 cores (about 100K simulation cells per core).

Flow Over a Weir with Hydraulic Jump – This simulation (1000 cores and 100M cells) scales at between 67% and 72.6%, depending on a “data write” setting.

Read the full post to learn more and to see some graphs and visualizations.

In the Works
We plan to add EFA support to additional EC2 instance types over time. In general, we plan to provide EFA support for the two largest sizes of “n” instances of any given type, and also for bare metal instances.

Jeff;

 

Now Available – AMD EPYC-Powered Amazon EC2 T3a Instances

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/now-available-amd-epyc-powered-amazon-ec2-t3a-instances/

The AMD EPYC-powered T3a instances that I promised you last year are available now and you can start using them today! Like the recently announced M5ad and R5ad instances, the T3a instances are built on the AWS Nitro System and give you an opportunity to balance your instance mix based on cost and performance.

T3a Instances
These instances deliver burstable, cost-effective performance and are a great fit for workloads that do not need high sustained compute power but experience temporary spikes in usage. You get a generous and assured baseline amount of processing power and the ability to transparently scale up to full core performance when you need more processing power, for as long as necessary. To learn more about the burstable compute model common to the T3 and the T3a, read New T3 Instances – Burstable, Cost-Effective Performance.

You can launch T3a instances today in seven sizes in the US East (N. Virginia), US West (Oregon), Europe (Ireland), US East (Ohio), and Asia Pacific (Singapore) Regions in On-Demand, Spot, and Reserved Instance form. Here are the specs:

Instance Name | vCPUs | RAM     | EBS-Optimized Bandwidth | Network Bandwidth
t3a.nano      | 2     | 0.5 GiB | Up to 1.5 Gbps          | Up to 5 Gbps
t3a.micro     | 2     | 1 GiB   | Up to 1.5 Gbps          | Up to 5 Gbps
t3a.small     | 2     | 2 GiB   | Up to 1.5 Gbps          | Up to 5 Gbps
t3a.medium    | 2     | 4 GiB   | Up to 1.5 Gbps          | Up to 5 Gbps
t3a.large     | 2     | 8 GiB   | Up to 2.1 Gbps          | Up to 5 Gbps
t3a.xlarge    | 4     | 16 GiB  | Up to 2.1 Gbps          | Up to 5 Gbps
t3a.2xlarge   | 8     | 32 GiB  | Up to 2.1 Gbps          | Up to 5 Gbps

The T3 and T3a instances are available in the same sizes and can use the same AMIs, making it easy for you to try both and find the one that is the best match for your application.
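
For example, here is a minimal sketch of launching a t3a.micro with the unlimited CPU credit option from the AWS CLI; the AMI ID is a placeholder.

# Launch a t3a.micro with unlimited CPU credits (placeholder AMI ID)
aws ec2 run-instances \
    --image-id ami-0123456789abcdef0 \
    --instance-type t3a.micro \
    --credit-specification CpuCredits=unlimited \
    --count 1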

Pricing is 10% lower than the equivalent existing T3 instances; see the On-Demand, Spot, and Reserved Instance pricing pages for more info.

Jeff;

Optimizing Network Intensive Workloads on Amazon EC2 A1 Instances

Post Syndicated from Martin Yip original https://aws.amazon.com/blogs/compute/optimizing-network-intensive-workloads-on-amazon-ec2-a1-instances/

This post courtesy of Ali Saidi, AWS, Principal Engineer

At re:Invent 2018, AWS announced the Amazon EC2 A1 instance. The A1 instances are powered by our internally developed Arm-based AWS Graviton processors and are up to 45% less expensive than other instance types with the same number of vCPUs and DRAM. These instances are based on the AWS Nitro System, and offer enhanced-networking of up to 10 Gbps with Elastic Network Adapters (ENA).

One of the use cases for the A1 instance is key-value stores, and in this post we describe how to get the most performance from an A1 instance running memcached. Some simple configuration options increase the performance of memcached by 3.9X over the out-of-the-box experience, as we’ll show below. Although we focus on memcached, the configuration advice is similar for any network intensive workload running on A1 instances. Typically, the performance of network intensive workloads improves when you tune some of these parameters, but the best values depend on your particular data rates and processing requirements.

irqbalance

Most Linux distributions enable irqbalance by default, which load-balances interrupts across CPUs at runtime. It does a good job of balancing interrupt load, but in some cases we can do better by pinning interrupts to specific CPUs. For our optimizations we’re going to temporarily disable irqbalance; if this is a production configuration that needs to survive a server reboot, irqbalance would need to be permanently disabled and the changes below would need to be added to the boot sequence.
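
On a systemd-based distribution, the permanent change is a one-liner; persisting the tuning script shown later (for example via a small systemd unit or rc.local) is left as an exercise.

# Keep irqbalance from coming back after a reboot (systemd-based distributions)
sudo systemctl disable --now irqbalance.service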

Receive Packet Steering (RPS)

RPS controls which CPUs process packets received by the Linux networking stack (softIRQs). Depending on instance size and the amount of application processing needed per packet, sometimes the optimal configuration is to have the core receiving packets also execute the Linux networking stack; other times it’s better to spread the processing among a set of cores. For memcached on EC2 A1 instances, we found that using RPS to spread the load out is helpful on the larger instance sizes.

Networking Queues

A1 instances in the medium, large, and xlarge sizes have a single queue to send and receive packets, while the 2xlarge and 4xlarge sizes have two queues. On the single-queue instances we’ll pin the IRQ to core 0, while on the dual-queue instances we’ll use either core 0 alone or core 0 and core 8.

Instance Type | IRQ settings      | RPS settings    | Application settings
a1.xlarge     | Core 0            | Core 0          | Run on cores 1-3
a1.2xlarge    | Both on core 0    | Cores 0-3, 4-7  | Run on cores 1-7
a1.4xlarge    | Core 0 and core 8 | Cores 0-7, 8-15 | Run on cores 1-7 and 9-15

The following script sets up the Linux kernel parameters:

#!/bin/bash
# Pin the eth0 IRQ(s) and set the RPS masks on an A1 instance.

# Stop irqbalance so it does not undo the manual pinning below.
sudo systemctl stop irqbalance.service

# Pin each eth0 IRQ to the CPU given by the corresponding positional argument.
set_irq_affinity() {
  grep eth0 /proc/interrupts | awk '{print $1}' | tr -d : | while read IRQ; do
    sudo sh -c "echo $1 > /proc/irq/$IRQ/smp_affinity_list"
    shift
  done
}

case `grep -c ^processor /proc/cpuinfo` in
  (4)  # a1.xlarge: single queue, softIRQ processing stays on core 0
       sudo sh -c 'echo 1 > /sys/class/net/eth0/queues/rx-0/rps_cpus'
       set_irq_affinity 0
       ;;
  (8)  # a1.2xlarge: two queues, RPS spread over cores 0-3 and 4-7
       sudo sh -c 'echo f > /sys/class/net/eth0/queues/rx-0/rps_cpus'
       sudo sh -c 'echo f0 > /sys/class/net/eth0/queues/rx-1/rps_cpus'
       set_irq_affinity 0 0
       ;;
  (16) # a1.4xlarge: two queues, RPS spread over cores 0-7 and 8-15
       sudo sh -c 'echo ff > /sys/class/net/eth0/queues/rx-0/rps_cpus'
       sudo sh -c 'echo ff00 > /sys/class/net/eth0/queues/rx-1/rps_cpus'
       set_irq_affinity 0 8
       ;;
  *)   echo "Script only supports 4, 8, 16 cores on A1 instances"
       exit 1
       ;;
esac
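
Once the script has run, you can sanity-check the result; this assumes the same eth0 interface name used above.

# Show the RPS mask for each receive queue and the CPU each eth0 IRQ is pinned to
grep . /sys/class/net/eth0/queues/rx-*/rps_cpus
for irq in $(grep eth0 /proc/interrupts | awk '{print $1}' | tr -d :); do
    echo "IRQ $irq -> CPU(s) $(cat /proc/irq/$irq/smp_affinity_list)"
done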

Summary

Some simple tuning parameters can significantly improve the performance of network intensive workloads on the A1 instance. With these changes we get 3.9X the performance on an a1.4xlarge and the other two instance sizes see similar improvements. While the particular values listed here aren’t applicable to all network intensive benchmarks, this article demonstrates the methodology and provides a starting point to tune the system and balance the load across CPUs to improve performance. If you have questions about your own workload running on A1 instances, please don’t hesitate to get in touch with us at [email protected] .

Using partition placement groups for large distributed and replicated workloads in Amazon EC2

Post Syndicated from Roshni Pary original https://aws.amazon.com/blogs/compute/using-partition-placement-groups-for-large-distributed-and-replicated-workloads-in-amazon-ec2/

This post is contributed by Ankit Jain – Sr. Product Manager, Amazon EC2 and Harsha Warrdhan Sharma – Global Account Solutions Architect at AWS

Before we introduced partition placement groups, customers deployed large distributed and replicated workloads across multiple Availability Zones to reduce correlated failures. This new Amazon EC2 placement strategy helps reduce the likelihood of correlated failures for workloads such as Hadoop, HBase, Cassandra, Kafka, and Aerospike running on EC2.

While Multi-AZ deployment offers high availability, some workloads are more sensitive to internode latency and could not be deployed across multiple zones. With partition placement groups, you can now deploy these workloads within a single zone and reduce the likelihood of correlated failures, improving your application performance and availability.

Placement group overview

Placement groups enable you to influence how your instances are placed on the underlying hardware. EC2 offers different types of placement groups to address different types of workloads. Use cluster placement groups to give applications the low-latency network performance necessary for the tightly coupled node-to-node communication typical of many HPC applications. Or, use spread placement groups to place a small number of critical instances on distinct racks, reducing correlated failures for your applications.

While spread placement groups work well for small workloads, customers wanted a way to reduce correlated failures for large distributed and replicated workloads that required hundreds of EC2 instances. They also wanted visibility into how instances are placed relative to each other on the underlying hardware. To address these needs, we introduced partition placement groups.

How partition placement groups work

To isolate the impact of hardware faults, EC2 subdivides each partition placement group into logical segments called partitions. EC2 ensures that no two partitions within a placement group share the same racks. The following diagram shows a partition placement group in a single zone:

Applications such as Hadoop, HBase, Cassandra, Kafka, and Aerospike have replicated nodes for fault tolerance and use the topology information to make intelligent data storage decisions. With partition placement groups, EC2 can place the replicated nodes across separate racks in a zone and isolate the risk of hardware failure to only one node. In addition, partition placement groups offer visibility into the partitions that allows these rack-aware applications to make intelligent data replication decisions, increasing data availability and durability.

Deploying applications using partition placement groups

Here’s an example of how to use a partition placement group through the AWS Management Console or AWS CLI.

First, in the console, choose EC2, Placement Group, create a partition placement group, and then define the number of partitions within that placement group. When creating a partition placement group, we recommend mirroring the replication factor of your application and the number of partitions that you run in that placement group. For instance, if a Hadoop cluster has a replication factor of 3, then you should define a partition placement group with three partitions.

You can also run the following AWS CLI command to perform the same action:

aws ec2 create-placement-group --group-name HDFS-GROUP-A --strategy partition --partition-count 3

After defining the placement group, launch the desired number of EC2 instances into this placement group from the EC2 launch instance wizard.

If you specify Auto distribution for the Target partition field at the time of launch, EC2 tries to evenly distribute all instances across the partitions. For example, a node might consist of 100 EC2 instances and there could be three such replicated nodes in your application. You can launch 300 EC2 instances, select the partition placement group, and keep Auto distribution as the default. Each partition then has approximately 100 instances, with each partition running on a separate group of racks.

Run the following command (auto distribution is the default behavior):

aws ec2 run-instances --placement "GroupName=HDFS-GROUP-A" --count 300

Alternatively, you can specify a target partition number at launch. As your workload grows, this feature makes it easy to maintain the placement group. You can replace instances in a specified partition or add more instances to a specified partition, using the Target partition field.

Run the following command and specify PartitionNumber:

aws ec2 run-instances --placement "GroupName=HDFS-GROUP-A,PartitionNumber=1" --count 100

After you launch the EC2 instances, you can view the partition number associated with each instance. For these rack-aware applications, you can now treat the partition number associated with instances as rack-ids and pass this information to the application. The application can then use this topology information to decide the replication model to improve the application performance and data reliability. View this information in the console by filtering by placement group name or by the partition number.

Run the following command:

aws ec2 describe-instances --filters "Name=placement-group-name,Values=HDFS-GROUP-A" "Name=placement-partition-number,Values=1"

Conclusion

Partition placement groups provide you with a way to reduce the likelihood of correlated failures for large workloads. This allows you to run applications like Hadoop, HBase, Cassandra, Kafka, and Aerospike within a single Availability Zone. By deploying these applications within a single zone, you can benefit from the lower latency offered within a zone, improving the application performance. Partition placement groups also offer elasticity, allowing you to run any number of instances within the placement group. For more information, see Partition Placement Groups.

The Wide World of Microsoft Windows on AWS

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/the-wide-world-of-microsoft-windows-on-aws/

You have been able to run Microsoft Windows on AWS since 2008 (my ancient post, Big Day for Amazon EC2: Production, SLA, Windows, and 4 New Capabilities, shows you just how far AWS has come in a little over a decade). According to IDC, AWS has nearly twice as many Windows Server instances in the cloud as the next largest cloud provider.

Today, we believe that AWS is the best place to run Windows and Windows applications in the cloud. You can run the full Windows stack on AWS, including Active Directory, SQL Server, and System Center, while taking advantage of 61 Availability Zones across 20 AWS Regions. You can run existing .NET applications, and you can use Visual Studio or VS Code to build new, cloud-native Windows applications using the AWS SDK for .NET.

Wide World of Windows
Starting from this amazing diagram drawn by my colleague Jerry Hargrove, I’d like to explore the Windows-on-AWS ecosystem in detail:

1 – SQL Server Upgrades
AWS provides first-class support for SQL Server, encompassing all four Editions (Express, Web, Standard, and Enterprise), with multiple versions of each edition. This wide-ranging support has helped SQL Server to become one of the most popular Windows workloads on AWS.

The SQL Server Upgrade Tool (an AWS Systems Manager script) makes it easy for you to upgrade an EC2 instance that is running SQL Server 2008 R2 SP3 to SQL Server 2016. The tool creates an AMI from a running instance, upgrades the AMI to SQL Server 2016, and launches the new AMI. To learn more, read about the AWSEC2-CloneInstanceAndUpgradeSQLServer action.
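
As a rough sketch, you could start that Systems Manager automation from the CLI along these lines; the parameter names and IDs below are illustrative assumptions, so check the document’s own parameter list before running it.

# Start the upgrade automation against a SQL Server 2008 R2 SP3 instance
# (parameter names and IDs are illustrative assumptions)
aws ssm start-automation-execution \
    --document-name "AWSEC2-CloneInstanceAndUpgradeSQLServer" \
    --parameters "InstanceId=i-0123456789abcdef0,SubnetId=subnet-0123456789abcdef0"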

Amazon RDS makes it easy for you to upgrade your DB Instances to new major or minor versions of SQL Server. The upgrade is performed in-place, and can be initiated with a couple of clicks. For example, if you are currently running SQL Server 2014, you have the following upgrades available:

You can also opt-in to automatic upgrades to new minor versions that take place within your preferred maintenance window:

Before you upgrade a production DB Instance, you can create a snapshot backup, use it to create a test DB Instance, upgrade that instance to the desired new version, and perform acceptance testing. To learn more about upgrades, read Upgrading the Microsoft SQL Server DB Engine.

2 – SQL Server on Linux
If your organization prefers Linux, you can run SQL Server on Ubuntu, Amazon Linux 2, or Red Hat Enterprise Linux using our License Included (LI) Amazon Machine Images. Read the most recent launch announcement or search for the AMIs in AWS Marketplace using the EC2 Launch Instance Wizard:

This is a very cost-effective option since you do not need to pay for Windows licenses.

You can use the new re-platforming tool (another AWS Systems Manager script) to move your existing SQL Server databases (2008 and above, either in the cloud or on-premises) from Windows to Linux.

3 – Always-On Availability Groups (Amazon RDS for SQL Server)
If you are running enterprise-grade production workloads on Amazon RDS (our managed database service), you should definitely enable this feature! It enhances availability and durability by replicating your database between two AWS Availability Zones, with a primary instance in one and a hot standby in another, with fast, automatic failover in the event of planned maintenance or a service disruption. You can enable this option for an existing DB Instance, and you can also specify it when you create a new one:

To learn more, read Multi-AZ Deployments Using Microsoft SQL Mirroring or Always On.
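
If you prefer the CLI, a rough sketch of enabling Multi-AZ on an existing SQL Server DB instance follows; the instance identifier is a placeholder.

# Turn on Multi-AZ for an existing SQL Server DB instance (placeholder identifier)
aws rds modify-db-instance \
    --db-instance-identifier my-sqlserver-db \
    --multi-az \
    --apply-immediately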

4 – Lambda Support
Let’s talk about some features for developers!

Launched in 2014, and the subject of continuous innovation ever since, AWS Lambda lets you run code in the cloud without having to own, manage, or even think about servers. You can choose from several .NET Core runtimes for your Lambda functions, and then write your code in either C# or PowerShell:

To learn more, read Working with C# and Working with PowerShell in the AWS Lambda Developer Guide. Your code has access to the full set of AWS services, and can make use of the AWS SDK for .NET; read the Developing .NET Core AWS Lambda Functions post for more info.
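
If you want to try this from the command line, a minimal sketch using the Lambda templates and the Amazon.Lambda.Tools global tool (both assumed to be installed from NuGet) looks like this; the function name is a placeholder.

# Scaffold and deploy a C# Lambda function (template pack and tools assumed installed)
dotnet new -i Amazon.Lambda.Templates
dotnet tool install -g Amazon.Lambda.Tools
dotnet new lambda.EmptyFunction --name MyFirstFunction
cd MyFirstFunction/src/MyFirstFunction
dotnet lambda deploy-function MyFirstFunction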

5 – CDK for .NET
The AWS CDK (Cloud Development Kit) for .NET lets you define your cloud infrastructure as code and then deploy it using AWS CloudFormation. For example, this code (stolen from this post) will generate a template that creates an Amazon Simple Queue Service (SQS) queue and an Amazon Simple Notification Service (SNS) topic:

var queue = new Queue(this, "MyFirstQueue", new QueueProps
{
    VisibilityTimeoutSec = 300
});
var topic = new Topic(this, "MyFirstTopic", new TopicProps
{
    DisplayName = "My First Topic Yeah"
});

6 – EC2 AMIs for .NET Core
If you are building Linux applications that make use of .NET Core, you can use our Amazon Linux 2 and Ubuntu AMIs. With .NET Core, PowerShell Core, and the AWS Command Line Interface (CLI) preinstalled, you’ll be up and running— and ready to deploy applications—in minutes. You can find the AMIs by searching for core when you launch an EC2 instance:

7 – .NET Dev Center
The AWS .NET Dev Center contains materials that will help you learn how to design, build, and run .NET applications on AWS. You’ll find articles, sample code, 10-minute tutorials, projects, and lots more:

8 – AWS License Manager
We want to help you to manage and optimize your Windows and SQL Server applications in new ways. For example,  AWS License Manager helps you to manage the licenses for the software that you run in the cloud or on-premises (read my post, New AWS License Manager – Manage Software Licenses and Enforce Licensing Rules, to learn more). You can create custom rules that emulate those in your licensing agreements, and enforce them when an EC2 instance is launched:

The License Manager also provides you with information on license utilization so that you can fine-tune your license portfolio, possibly saving some money in the process!

9 – Import, Export, and Migration
You have lots of options and choices when it comes to moving your code and data into and out of AWS. Here’s a very brief summary:

TSO Logic – This new member of the AWS family (we acquired the company earlier this year) offers an analytics solution that helps you to plan, optimize, and save money as you make your journey to the cloud.

VM Import/Export – This service allows you to import existing virtual machine images to EC2 instances, and export them back to your on-premises environment. Read Importing a VM as an Image Using VM Import/Export to learn more.

AWS Snowball – This service lets you move petabyte scale data sets into and out of AWS. If you are at exabyte scale, check out the AWS Snowmobile.

AWS Migration Acceleration Program – This program encompasses AWS Professional Services and teams from our partners. It is based on a three step migration model that includes a readiness assessment, a planning phase, and the actual migration.

10 – 21st Century Applications
AWS gives you a full-featured, rock-solid foundation and a rich set of services so that you can build tomorrow’s applications today! You can go serverless with the .NET Core support in Lambda, make use of our Deep Learning AMIs for Windows, host containerized apps on Amazon ECS or Amazon EKS, and write code that makes use of the latest AI-powered services. Your applications can make use of recommendations, forecasting, image analysis, video analysis, text analytics, document analysis, text to speech, translation, transcription, and more.

11 – AWS Integration
Your existing Windows Applications, both cloud-based and on-premises, can make use of Windows file system and directory services within AWS:

Amazon FSx for Windows Server – This fully managed native Windows file system is compatible with the SMB protocol and NTFS. It provides shared file storage for Windows applications, backed by SSD storage for fast & reliable performance. To learn more, read my blog post.

AWS Directory Service – Your directory-aware workloads and AWS Enterprise IT applications can use this managed Active Directory that runs in the AWS Cloud.

Join our Team
If you would like to build, manage, or market new AWS offerings for the Windows market, be sure to check out our current openings. Here’s a sampling:

Senior Digital Campaign Marketing Manager – Own the digital tactics for product awareness and run adoption campaigns.

Senior Product Marketing Manager – Drive communications and marketing, create compelling content, and build awareness.

Developer Advocate – Drive adoption and community engagement for SQL Server on EC2.

Learn More
Our freshly updated Windows on AWS and SQL Server on AWS pages contain case studies, quick starts, and lots of other useful information.

Jeff;

GPU workloads on AWS Batch

Post Syndicated from Josh Rad original https://aws.amazon.com/blogs/compute/gpu-workloads-on-aws-batch/

Contributed by Manuel Manzano Hoss, Cloud Support Engineer

I remember playing around with graphics processing units (GPUs) workload examples in 2017 when the Deep Learning on AWS Batch post was published by my colleague Kiuk Chung. He provided an example of how to train a convolutional neural network (CNN), the LeNet architecture, to recognize handwritten digits from the MNIST dataset using Apache MXNet as the framework. Back then, to run such jobs with GPU capabilities, I had to do the following:

  • Create a custom GPU-enabled AMI that had installed Docker, the ECS agent, NVIDIA driver and container runtime, and CUDA.
  • Identify the type of P2 EC2 instance that had the required amount of GPU for my job.
  • Check the number of vCPUs that it offered (even if I was not interested in using them).
  • Specify that number of vCPUs for my job.

All that, without any certainty that the required GPU would actually be available to my job once it was running. Back then, there was no GPU pinning; other jobs running on the same EC2 instance were able to use that GPU, making the orchestration of my jobs a tricky task.

Fast forward two years. Today, AWS Batch announced integrated support for Amazon EC2 Accelerated Instances. It is now possible to specify a number of GPUs as a resource that AWS Batch considers when choosing the EC2 instance to run your job, along with vCPUs and memory. That allows me to take advantage of the main benefits of using AWS Batch: the compute resource selection algorithm and the job scheduler. It also frees me from having to check which EC2 instance types have enough GPUs.

Also, I can take advantage of the Amazon ECS GPU-optimized AMI maintained by AWS. It comes with the NVIDIA drivers and all the necessary software to run GPU-enabled jobs. When I allow the P2 or P3 instance types on my compute environment, AWS Batch launches my compute resources using the Amazon ECS GPU-optimized AMI automatically.

In other words, now I don’t worry about the GPU task list mentioned earlier. I can focus on deciding which framework and command to run on my GPU-accelerated workload. At the same time, I’m now sure that my jobs have access to the required performance, as physical GPUs are pinned to each job and not shared among them.

A GPU race against the past

As a kind of GPU-race exercise, I checked a similar example to the one from Kiuk’s post, to see how fast it could be to run a GPU-enabled job now. I used the AWS Management Console to demonstrate how simple the steps are.

In this case, I decided to use the deep neural network architecture called multilayer perceptron (MLP), not the LeNet CNN, to compare the validation accuracy between them.

To make the test even simpler and faster to implement, I thought I would use one of the recently announced AWS Deep Learning containers, which come pre-packed with different frameworks and ready-to-process data. I chose the container that comes with MXNet and Python 2.7, customized for Training and GPU. For more information about the Docker images available, see the AWS Deep Learning Containers documentation.

In the AWS Batch console, I created a managed compute environment with the default settings, allowing AWS Batch to create the required IAM roles on my behalf.

On the configuration of the compute resources, I selected the P2 and P3 families of instances, as those are the type of instance with GPU capabilities. You can select On-Demand Instances, but in this case I decided to use Spot Instances to take advantage of the discounts that this pricing model offers. I left the defaults for all other settings, selecting the AmazonEC2SpotFleetRole role that I created the first time that I used Spot Instances:

Finally, I also left the network settings as default. My compute environment selected the default VPC, three subnets, and a security group. They are enough to run my jobs and at the same time keep my environment safe by limiting connections from outside the VPC:

I created a job queue, GPU_JobQueue, attaching it to the compute environment that I just created:

Next, I registered the same job definition that I would have created following Kiuk’s post. I specified enough memory to run this test, one vCPU, and the AWS Deep Learning Docker image that I chose, in this case mxnet-training:1.4.0-gpu-py27-cu90-ubuntu16.04. The number of GPUs required was, in this case, one. To have access to run the script, the container must run as privileged, or run using the root user.
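
For reference, an equivalent job definition could be registered from the CLI along these lines; the job definition name is made up for this sketch and the image URI is a placeholder for the full registry path of the Deep Learning container.

# Register a job definition that asks AWS Batch for one GPU
# (job definition name and image registry path are placeholders)
aws batch register-job-definition \
    --job-definition-name mxnet-train-gpu \
    --type container \
    --container-properties '{
        "image": "<registry>/mxnet-training:1.4.0-gpu-py27-cu90-ubuntu16.04",
        "vcpus": 1,
        "memory": 8192,
        "privileged": true,
        "resourceRequirements": [{"type": "GPU", "value": "1"}]
    }'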

Finally, I submitted the job. I first cloned the MXNet repository for the train_mnist.py Python script. Then I ran the script itself, with the parameter --gpus 0 to indicate that the assigned GPU should be used. The job inherits all the other parameters from the job definition:

sh -c 'git clone -b 1.3.1 https://github.com/apache/incubator-mxnet.git && python /incubator-mxnet/example/image-classification/train_mnist.py --gpus 0'
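
Submitting the job from the CLI instead of the console might look like the following, reusing the queue created earlier and the hypothetical job definition name from the sketch above.

# Submit the training job to the GPU queue (job definition name from the sketch above)
aws batch submit-job \
    --job-name train-mnist-gpu \
    --job-queue GPU_JobQueue \
    --job-definition mxnet-train-gpu \
    --container-overrides '{"command": ["sh", "-c", "git clone -b 1.3.1 https://github.com/apache/incubator-mxnet.git && python /incubator-mxnet/example/image-classification/train_mnist.py --gpus 0"]}'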

That’s all, and my GPU-enabled job was running. It took me less than two minutes to go from zero to having the job submitted. This is the log of my job, from which I removed the iterations from epoch 1 to 18 to make it shorter:

14:32:31     Cloning into 'incubator-mxnet'...

14:33:50     Note: checking out '19c501680183237d52a862e6ae1dc4ddc296305b'.

14:33:51     INFO:root:start with arguments Namespace(add_stn=False, batch_size=64, disp_batches=100, dtype='float32', gc_threshold=0.5, gc_type='none', gpus='0', initializer='default', kv_store='device', load_epoch=None, loss='', lr=0.05, lr_factor=0.1, lr_step_epochs='10', macrobatch_size=0, model_prefix=None, mom=0.9, monitor=0, network='mlp', num_classes=10, num_epochs=20, num_examples=60000, num_layers=No

14:33:51     DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): yann.lecun.com:80

14:33:54     DEBUG:urllib3.connectionpool:http://yann.lecun.com:80 "GET /exdb/mnist/train-labels-idx1-ubyte.gz HTTP/1.1" 200 28881

14:33:55     DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): yann.lecun.com:80

14:33:55     DEBUG:urllib3.connectionpool:http://yann.lecun.com:80 "GET /exdb/mnist/train-images-idx3-ubyte.gz HTTP/1.1" 200 9912422

14:33:59     DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): yann.lecun.com:80

14:33:59     DEBUG:urllib3.connectionpool:http://yann.lecun.com:80 "GET /exdb/mnist/t10k-labels-idx1-ubyte.gz HTTP/1.1" 200 4542

14:33:59     DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): yann.lecun.com:80

14:34:00     DEBUG:urllib3.connectionpool:http://yann.lecun.com:80 "GET /exdb/mnist/t10k-images-idx3-ubyte.gz HTTP/1.1" 200 1648877

14:34:04     INFO:root:Epoch[0] Batch [0-100] Speed: 37038.30 samples/sec accuracy=0.793472

14:34:04     INFO:root:Epoch[0] Batch [100-200] Speed: 36457.89 samples/sec accuracy=0.906719

14:34:04     INFO:root:Epoch[0] Batch [200-300] Speed: 36981.20 samples/sec accuracy=0.927500

14:34:04     INFO:root:Epoch[0] Batch [300-400] Speed: 36925.04 samples/sec accuracy=0.935156

14:34:04     INFO:root:Epoch[0] Batch [400-500] Speed: 37262.36 samples/sec accuracy=0.940156

14:34:05     INFO:root:Epoch[0] Batch [500-600] Speed: 37729.64 samples/sec accuracy=0.942813

14:34:05     INFO:root:Epoch[0] Batch [600-700] Speed: 37493.55 samples/sec accuracy=0.949063

14:34:05     INFO:root:Epoch[0] Batch [700-800] Speed: 37320.80 samples/sec accuracy=0.953906

14:34:05     INFO:root:Epoch[0] Batch [800-900] Speed: 37705.85 samples/sec accuracy=0.958281

14:34:05     INFO:root:Epoch[0] Train-accuracy=0.924024

14:34:05     INFO:root:Epoch[0] Time cost=1.633

...  LOGS REMOVED

14:34:44     INFO:root:Epoch[19] Batch [0-100] Speed: 36864.44 samples/sec accuracy=0.999691

14:34:44     INFO:root:Epoch[19] Batch [100-200] Speed: 37088.35 samples/sec accuracy=1.000000

14:34:44     INFO:root:Epoch[19] Batch [200-300] Speed: 36706.91 samples/sec accuracy=0.999687

14:34:44     INFO:root:Epoch[19] Batch [300-400] Speed: 37941.19 samples/sec accuracy=0.999687

14:34:44     INFO:root:Epoch[19] Batch [400-500] Speed: 37180.97 samples/sec accuracy=0.999844

14:34:44     INFO:root:Epoch[19] Batch [500-600] Speed: 37122.30 samples/sec accuracy=0.999844

14:34:45     INFO:root:Epoch[19] Batch [600-700] Speed: 37199.37 samples/sec accuracy=0.999687

14:34:45     INFO:root:Epoch[19] Batch [700-800] Speed: 37284.93 samples/sec accuracy=0.999219

14:34:45     INFO:root:Epoch[19] Batch [800-900] Speed: 36996.80 samples/sec accuracy=0.999844

14:34:45     INFO:root:Epoch[19] Train-accuracy=0.999733

14:34:45     INFO:root:Epoch[19] Time cost=1.617

14:34:45     INFO:root:Epoch[19] Validation-accuracy=0.983579

Summary

As you can see, after AWS Batch launched the instance, the job took slightly more than two minutes to run. I spent roughly five minutes from start to finish. That was much faster than the time that I was previously spending just to configure the AMI. Using the AWS CLI, one of the AWS SDKs, or AWS CloudFormation, the same environment could be created even faster.

From a training point of view, I lost on the validation accuracy, as the results obtained using the LeNet CNN are higher than when using an MLP network. On the other hand, my job was faster, with an average time cost of 1.6 seconds per epoch. As the software stack evolves and hardware capabilities increase, these numbers keep improving, but that shouldn’t mean extra complexity. Using managed primitives like the one presented in this post enables a simpler implementation.

I encourage you to test this example and see for yourself how just a few clicks or commands lets you start running GPU jobs with AWS Batch. Then, it is just a matter of replacing the Docker image that I used for one with the framework of your choice, TensorFlow, Caffe, PyTorch, Keras, etc. Start to run your GPU-enabled machine learning, deep learning, computational fluid dynamics (CFD), seismic analysis, molecular modeling, genomics, or computational finance workloads. It’s faster and easier than ever.

If you decide to give it a try, have any doubt or just want to let me know what you think about this post, please write in the comments section!

Learn about AWS Services & Solutions – April AWS Online Tech Talks

Post Syndicated from Robin Park original https://aws.amazon.com/blogs/aws/learn-about-aws-services-solutions-april-aws-online-tech-talks/

AWS Tech Talks

Join us this April to learn about AWS services and solutions. The AWS Online Tech Talks are live, online presentations that cover a broad range of topics at varying technical levels. These tech talks, led by AWS solutions architects and engineers, feature technical deep dives, live demonstrations, customer examples, and Q&A with AWS experts. Register Now!

Note – All sessions are free and in Pacific Time.

Tech talks this month:

Blockchain

May 2, 2019 | 11:00 AM – 12:00 PM PT – How to Build an Application with Amazon Managed Blockchain – Learn how to build an application on Amazon Managed Blockchain with the help of demo applications and sample code.

Compute

April 29, 2019 | 1:00 PM – 2:00 PM PT – How to Optimize Amazon Elastic Block Store (EBS) for Higher Performance – Learn how to optimize performance and spend on your Amazon Elastic Block Store (EBS) volumes.

May 1, 2019 | 11:00 AM – 12:00 PM PT – Introducing New Amazon EC2 Instances Featuring AMD EPYC and AWS Graviton Processors – See how new Amazon EC2 instance offerings that feature AMD EPYC processors and AWS Graviton processors enable you to optimize performance and cost for your workloads.

Containers

April 23, 2019 | 11:00 AM – 12:00 PM PT – Deep Dive on AWS App Mesh – Learn how AWS App Mesh makes it easy to monitor and control communications for services running on AWS.

March 22, 2019 | 9:00 AM – 10:00 AM PT – Deep Dive Into Container Networking – Dive deep into microservices networking and how you can build, secure, and manage the communications into, out of, and between the various microservices that make up your application.

Databases

April 23, 2019 | 1:00 PM – 2:00 PM PT – Selecting the Right Database for Your Application – Learn how to develop a purpose-built strategy for databases, where you choose the right tool for the job.

April 25, 2019 | 9:00 AM – 10:00 AM PT – Mastering Amazon DynamoDB ACID Transactions: When and How to Use the New Transactional APIs – Learn how the new Amazon DynamoDB transactional APIs simplify the developer experience of making coordinated, all-or-nothing changes to multiple items both within and across tables.

DevOps

April 24, 2019 | 9:00 AM – 10:00 AM PT – Running .NET applications with AWS Elastic Beanstalk Windows Server Platform V2 – Learn about the easiest way to get your .NET applications up and running on AWS Elastic Beanstalk.

Enterprise & Hybrid

April 30, 2019 | 11:00 AM – 12:00 PM PT – Business Case Teardown: Identify Your Real-World On-Premises and Projected AWS Costs – Discover tools and strategies to help you as you build your value-based business case.

IoT

April 30, 2019 | 9:00 AM – 10:00 AM PT – Building the Edge of Connected Home – Learn how AWS IoT edge services are enabling smarter products for the connected home.

Machine Learning

April 24, 2019 | 11:00 AM – 12:00 PM PT – Start Your Engines and Get Ready to Race in the AWS DeepRacer League – Learn more about reinforcement learning, how to build a model, and compete in the AWS DeepRacer League.

April 30, 2019 | 1:00 PM – 2:00 PM PT – Deploying Machine Learning Models in Production – Learn best practices for training and deploying machine learning models.

May 2, 2019 | 9:00 AM – 10:00 AM PT – Accelerate Machine Learning Projects with Hundreds of Algorithms and Models in AWS Marketplace – Learn how to use third party algorithms and model packages to accelerate machine learning projects and solve business problems.

Networking & Content Delivery

April 23, 2019 | 9:00 AM – 10:00 AM PT – Smart Tips on Application Load Balancers: Advanced Request Routing, Lambda as a Target, and User Authentication – Learn tips and tricks about important Application Load Balancer (ALB) features that were recently launched.

Productivity & Business Solutions

April 29, 2019 | 11:00 AM – 12:00 PM PT – Learn How to Set up Business Calling and Voice Connector in Minutes with Amazon Chime – Learn how Amazon Chime Business Calling and Voice Connector can help you with your business communication needs.

May 1, 2019 | 1:00 PM – 2:00 PM PT – Bring Voice to Your Workplace – Learn how you can bring voice to your workplace with Alexa for Business.

Serverless

April 25, 2019 | 11:00 AM – 12:00 PM PT – Modernizing .NET Applications Using the Latest Features on AWS Development Tools for .NET – Get a deep dive and demonstration of the latest updates to the AWS SDK and tools for .NET that make development even easier, more powerful, and more productive.

May 1, 2019 | 9:00 AM – 10:00 AM PT – Customer Showcase: Improving Data Processing Workloads with AWS Step Functions’ Service Integrations – Learn how innovative customers like SkyWatch are coordinating AWS services using AWS Step Functions to improve productivity.

Storage

April 24, 2019 | 1:00 PM – 2:00 PM PT – Amazon S3 Glacier Deep Archive: The Cheapest Storage in the Cloud – See how Amazon S3 Glacier Deep Archive offers the lowest cost storage in the cloud, at prices significantly lower than storing and maintaining data in on-premises magnetic tape libraries or archiving data offsite.

New AMD EPYC-Powered Amazon EC2 M5ad and R5ad Instances

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-amd-epyc-powered-amazon-ec2-m5ad-and-r5ad-instances/

Last year I told you about our New Lower-Cost, AMD-Powered M5a and R5a EC2 Instances. Built on the AWS Nitro System, these instances are powered by custom AMD EPYC processors running at 2.5 GHz. They are priced 10% lower than comparable EC2 M5 and R5 instances, and give you a new opportunity to balance your instance mix based on cost and performance.

Today we are adding M5ad and R5ad instances, both powered by custom AMD EPYC 7000 series processors and built on the AWS Nitro System.

M5ad and R5ad Instances
These instances add high-speed, low latency local (physically connected) block storage to the existing M5a and R5a instances that we launched late last year.

M5ad instances are designed for general purpose workloads such as web servers, app servers, dev/test environments, gaming, logging, and media processing. They are available in 6 sizes:

Instance Name | vCPUs | RAM     | Local Storage       | EBS-Optimized Bandwidth | Network Bandwidth
m5ad.large    | 2     | 8 GiB   | 1 x 75 GB NVMe SSD  | Up to 2.120 Gbps        | Up to 10 Gbps
m5ad.xlarge   | 4     | 16 GiB  | 1 x 150 GB NVMe SSD | Up to 2.120 Gbps        | Up to 10 Gbps
m5ad.2xlarge  | 8     | 32 GiB  | 1 x 300 GB NVMe SSD | Up to 2.120 Gbps        | Up to 10 Gbps
m5ad.4xlarge  | 16    | 64 GiB  | 2 x 300 GB NVMe SSD | 2.120 Gbps              | Up to 10 Gbps
m5ad.12xlarge | 48    | 192 GiB | 2 x 900 GB NVMe SSD | 5 Gbps                  | 10 Gbps
m5ad.24xlarge | 96    | 384 GiB | 4 x 900 GB NVMe SSD | 10 Gbps                 | 20 Gbps

R5ad instances are designed for memory-intensive workloads: data mining, in-memory analytics, caching, simulations, and so forth. The R5ad instances are available in 6 sizes:

Instance Name | vCPUs | RAM | Local Storage | EBS-Optimized Bandwidth | Network Bandwidth
r5ad.large | 2 | 16 GiB | 1 x 75 GB NVMe SSD | Up to 2.120 Gbps | Up to 10 Gbps
r5ad.xlarge | 4 | 32 GiB | 1 x 150 GB NVMe SSD | Up to 2.120 Gbps | Up to 10 Gbps
r5ad.2xlarge | 8 | 64 GiB | 1 x 300 GB NVMe SSD | Up to 2.120 Gbps | Up to 10 Gbps
r5ad.4xlarge | 16 | 128 GiB | 2 x 300 GB NVMe SSD | 2.120 Gbps | Up to 10 Gbps
r5ad.12xlarge | 48 | 384 GiB | 2 x 900 GB NVMe SSD | 5 Gbps | 10 Gbps
r5ad.24xlarge | 96 | 768 GiB | 4 x 900 GB NVMe SSD | 10 Gbps | 20 Gbps

Again, these instances are available in the same sizes as the M5d and R5d instances, and the AMIs work on either, so go ahead and try both!
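If you would like to try one from the command line, here is a minimal sketch using the AWS CLI; the AMI ID, key pair, and security group below are placeholders that you would replace with your own values:

# Launch a single m5ad.large instance (AMI ID, key pair, and security group are placeholders)
$ aws ec2 run-instances \
    --image-id ami-0123456789abcdef0 \
    --instance-type m5ad.large \
    --key-name my-key-pair \
    --security-group-ids sg-0123456789abcdef0 \
    --count 1

The same command works for any of the sizes above, or for the equivalent R5ad sizes, by changing the --instance-type value.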

Here are some things to keep in mind about the local NVMe storage on the M5ad and R5ad instances:

Naming – You don’t have to specify a block device mapping in your AMI or during the instance launch; the local storage will show up as one or more devices (/dev/nvme*1 on Linux) after the guest operating system has booted (see the sketch after these notes for one way to find and mount them).

Encryption – Each local NVMe device is hardware encrypted using the XTS-AES-256 block cipher and a unique key. Each key is destroyed when the instance is stopped or terminated.

Lifetime – Local NVMe devices have the same lifetime as the instance they are attached to, and do not stick around after the instance has been stopped or terminated.
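To illustrate the naming behaviour described above, here is a minimal sketch of finding and mounting one of the local NVMe volumes on Linux; the device name /dev/nvme1n1 is only an example, so check the output of lsblk on your own instance first:

# List block devices; the local NVMe volumes appear alongside the EBS root volume
$ lsblk
# Create a filesystem on a local volume (the device name can differ on your instance)
$ sudo mkfs.ext4 /dev/nvme1n1
# Mount it somewhere convenient; remember the data does not survive a stop or terminate
$ sudo mkdir -p /mnt/local-ssd
$ sudo mount /dev/nvme1n1 /mnt/local-ssd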

M5ad and R5ad instances are available in the US East (N. Virginia), US West (Oregon), US East (Ohio), and Asia Pacific (Singapore) Regions in On-Demand, Spot, and Reserved Instance form.

Jeff;

 

In the Works – EC2 Instances (G4) with NVIDIA T4 GPUs

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/in-the-works-ec2-instances-g4-with-nvidia-t4-gpus/

I’ve written about the power and value of GPUs in the past, including launch posts for many generations of GPU-equipped EC2 instances: the CG1, G2, G3, P2, P3, and P3dn instance types.

Today I would like to give you a sneak peek at our newest GPU-equipped instance, the G4. Designed for machine learning training & inferencing, video transcoding, and other demanding applications, G4 instances will be available in multiple sizes and also in bare metal form. We are still fine-tuning the specs, but you can look forward to:

  • AWS-custom Intel CPUs (4 to 96 vCPUs)
  • 1 to 8 NVIDIA T4 Tensor Core GPUs
  • Up to 384 GiB of memory
  • Up to 1.8 TB of fast, local NVMe storage
  • Up to 100 Gbps networking

The brand-new NVIDIA T4 GPUs feature 320 Turing Tensor cores, 2,560 CUDA cores, and 16 GB of memory. In addition to support for machine learning inferencing and video processing, the T4 includes RT Cores for real-time ray tracing and can provide up to 2x the graphics performance of the NVIDIA M60 (watch Ray Tracing in Games with NVIDIA RTX to learn more).

I’ll have a lot more to say about these powerful, high-end instances very soon, so stay tuned!

Jeff;

PS – If you are interested in joining a private preview, sign up now.

Getting started with the A1 instance

Post Syndicated from Martin Yip original https://aws.amazon.com/blogs/compute/getting-started-with-the-a1-instance/

This post courtesy of Ali Saidi, Annapurna Labs, Principal Systems Developer

At re:Invent 2018 AWS announced the Amazon EC2 A1 instance. These instances are based on the AWS Nitro System that powers all of our latest generation of instances, and are the first instance types powered by the AWS Graviton Processor. These processors feature 64-bit Arm Neoverse cores and are the first general-purpose processors designed by Amazon specifically for use in AWS. The instances are up to 40% less expensive than other instance types with the same number of vCPUs and amount of DRAM. A1 instances are currently available in the US East (N. Virginia), US East (Ohio), US West (Oregon), and EU (Ireland) Regions with the following configurations:

Model | vCPUs | Memory (GiB) | Instance Store | Network Bandwidth | EBS Bandwidth
a1.medium | 1 | 2 | EBS Only | Up to 10 Gbps | Up to 3.5 Gbps
a1.large | 2 | 4 | EBS Only | Up to 10 Gbps | Up to 3.5 Gbps
a1.xlarge | 4 | 8 | EBS Only | Up to 10 Gbps | Up to 3.5 Gbps
a1.2xlarge | 8 | 16 | EBS Only | Up to 10 Gbps | Up to 3.5 Gbps
a1.4xlarge | 16 | 32 | EBS Only | Up to 10 Gbps | Up to 3.5 Gbps

For further information about the instance itself, developers can watch this re:Invent talk and visit the A1 product details page.

Since launch, we’ve been expanding the operating systems available for the instance and working with the Arm software ecosystem. This post summarizes what’s supported and how to use it.

Operating System Support

If you’re running on an open source stack, as many customers who build applications that scale out in the cloud are, the Arm ecosystem is well developed and likely already supports your application.

The A1 instance requires AMIs and software built for Arm processors. When we announced A1, we had support for Amazon Linux 2, Ubuntu 16.04 and 18.04, as well as Red Hat Enterprise Linux 7.6. A little over two months later, the list of available operating systems has grown to include Red Hat Enterprise Linux 8.0 Beta, NetBSD, Fedora Rawhide, Ubuntu 18.10, and Debian 9.8. We expect to see more operating systems, Linux distributions, and AMIs become available in the coming months.
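If you want to locate an Arm AMI from the command line, one possible approach (a sketch, with illustrative filter values) is to ask EC2 for the most recent arm64 Amazon Linux 2 image:

# Find the most recent arm64 Amazon Linux 2 AMI published by Amazon (filters are illustrative)
$ aws ec2 describe-images \
    --owners amazon \
    --filters "Name=architecture,Values=arm64" "Name=name,Values=amzn2-ami-hvm-*" \
    --query "sort_by(Images, &CreationDate)[-1].[ImageId,Name]" \
    --output text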

These operating systems and Linux distributions offer the same level of support for their Arm AMIs as they do for their existing x86 AMIs. In almost every case, if you’re installing packages with apt or yum, those packages exist for Arm in the OS of your choice and will run in the same way.

For example, to install PHP 7.2 on the Arm version of Amazon Linux 2 or Ubuntu, we follow the exact same steps we would on an x86-based instance type. On Amazon Linux 2:

$ sudo amazon-linux-extras install php7.2
$ sudo yum install php

Or on Ubuntu 18.04:

$ sudo apt update
$ sudo apt install php

Containers

Containers are one of the most popular application deployment mechanisms for A1. Amazon Elastic Container Service (ECS) already supports the A1 instance and there’s an Amazon ECS-Optimized Amazon Linux 2 AMI, and we’ll soon be launching support for Amazon Elastic Kubernetes Service (EKS). The majority of Docker Official Images hosted on Docker Hub already support 64-bit Arm systems along with x86.
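If you want to check whether a particular image publishes an Arm variant before you pull it, one way is to inspect its manifest list; this is a sketch, and depending on your Docker version the manifest subcommand may require experimental CLI features to be enabled:

# Show the platforms an image is published for; an arm64 entry means it will run on A1
$ docker manifest inspect hello-world | grep architecture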

We’ve further expanded support for running containers at scale with AWS Batch support for A1.

Running a container on A1

In this section we show how to run a container on Amazon Linux 2. Many Docker Official Images (at least 76% as of this writing) already support 64-bit Arm systems, and the majority of the ones that don’t either have pending patches to add support or are based on commercial software.

$ sudo yum install -y docker
$ sudo service docker start
$ sudo docker run hello-world
 
$ sudo docker run hello-world
Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
3b4173355427: Pull complete
Digest: sha256:2557e3c07ed1e38f26e389462d03ed943586f744621577a99efb77324b0fe535
Status: Downloaded newer image for hello-world:latest
 
Hello from Docker!
This message shows that your installation appears to be working correctly.
...

Running WordPress on A1

As an example of automating the deployment of a LAMP (Linux, Apache HTTPd, MariaDB, and PHP) stack on an A1 instance, we’ve updated a basic CloudFormation template to support the A1 instance type. We made some changes to the template to support Amazon Linux 2, but otherwise the same template works for all our instance types. The template is here, and it can be launched like any other CloudFormation template.

It defaults to running on an A1 Arm instance. After the template is launched, the output is the URL of the running instance, which can be accessed from a browser to confirm that the default WordPress home page is being served.
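If you prefer the CLI to the console, a hedged sketch of the same launch looks like this; the stack name, template file name, and parameter key are placeholders that depend on the template you downloaded:

# Create the stack from a local copy of the template (file and parameter names are placeholders)
$ aws cloudformation create-stack \
    --stack-name wordpress-on-a1 \
    --template-body file://wordpress-a1.yaml \
    --parameters ParameterKey=KeyName,ParameterValue=my-key-pair
# Wait for the stack to finish, then read the outputs, which include the instance URL
$ aws cloudformation wait stack-create-complete --stack-name wordpress-on-a1
$ aws cloudformation describe-stacks --stack-name wordpress-on-a1 --query "Stacks[0].Outputs"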

Summary

If you’re using open source software, everything you rely on most likely works on Arm systems today, and over the coming months we’ll be working on broadening support and improving the performance of software running on the A1 instances. If you have an open source web tier or containerized application, give the A1 instances a try and let us know what you think. If you run into any issues, please don’t hesitate to get in touch at [email protected], via the AWS Compute Forum, or through your usual AWS support contacts. We love customers’ feedback.