Tag Archives: Amazon EC2

The Wide World of Microsoft Windows on AWS

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/the-wide-world-of-microsoft-windows-on-aws/

You have been able to run Microsoft Windows on AWS since 2008 (my ancient post, Big Day for Amazon EC2: Production, SLA, Windows, and 4 New Capabilities, shows you just how far AWS has come in a little over a decade). According to IDC, AWS has nearly twice as many Windows Server instances in the cloud as the next largest cloud provider.

Today, we believe that AWS is the best place to run Windows and Windows applications in the cloud. You can run the full Windows stack on AWS, including Active Directory, SQL Server, and System Center, while taking advantage of 61 Availability Zones across 20 AWS Regions. You can run existing .NET applications and you can use Visual Studio or VS Code to build new, cloud-native Windows applications using the AWS SDK for .NET.

Wide World of Windows
Starting from this amazing diagram drawn by my colleague Jerry Hargrove, I’d like to explore the Windows-on-AWS ecosystem in detail:

1 – SQL Server Upgrades
AWS provides first-class support for SQL Server, encompassing all four Editions (Express, Web, Standard, and Enterprise), with multiple versions of each edition. This wide-ranging support has helped SQL Server become one of the most popular Windows workloads on AWS.

The SQL Server Upgrade Tool (an AWS Systems Manager script) makes it easy for you to upgrade an EC2 instance that is running SQL Server 2008 R2 SP3 to SQL Server 2016. The tool creates an AMI from a running instance, upgrades the AMI to SQL Server 2016, and launches the new AMI. To learn more, read about the AWSEC2-CloneInstanceAndUpgradeSQLServer action.
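If you prefer to start the upgrade from the command line instead of the console, you can invoke the automation document with the AWS CLI. Here's a minimal sketch; the instance ID is a placeholder, and the document may expect additional parameters, so check its description in the Systems Manager console first:

$ aws ssm start-automation-execution \
    --document-name "AWSEC2-CloneInstanceAndUpgradeSQLServer" \
    --parameters "InstanceId=i-0123456789abcdef0"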

Amazon RDS makes it easy for you to upgrade your DB Instances to new major or minor versions of SQL Server. The upgrade is performed in-place, and can be initiated with a couple of clicks. For example, if you are currently running SQL Server 2014, you have the following upgrades available:

You can also opt-in to automatic upgrades to new minor versions that take place within your preferred maintenance window:

Before you upgrade a production DB Instance, you can create a snapshot backup, use it to create a test DB Instance, upgrade that instance to the desired new version, and perform acceptance testing. To learn more about upgrades, read Upgrading the Microsoft SQL Server DB Engine.
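The same upgrade can be performed with the AWS CLI. Here's a sketch, assuming a Standard Edition instance and an illustrative target engine version; use describe-db-engine-versions to list the valid targets for your engine:

$ aws rds describe-db-engine-versions --engine sqlserver-se \
    --query "DBEngineVersions[].EngineVersion"
$ aws rds modify-db-instance \
    --db-instance-identifier my-sql-server-db \
    --engine-version 13.00.5216.0.v1 \
    --allow-major-version-upgrade \
    --apply-immediately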

2 – SQL Server on Linux
If your organization prefers Linux, you can run SQL Server on Ubuntu, Amazon Linux 2, or Red Hat Enterprise Linux using our License Included (LI) Amazon Machine Images. Read the most recent launch announcement or search for the AMIs in AWS Marketplace using the EC2 Launch Instance Wizard:

This is a very cost-effective option since you do not need to pay for Windows licenses.

You can use the new re-platforming tool (another AWS Systems Manager script) to move your existing SQL Server databases (2008 and above, either in the cloud or on-premises) from Windows to Linux.

3 – Always-On Availability Groups (Amazon RDS for SQL Server)
If you are running enterprise-grade production workloads on Amazon RDS (our managed database service), you should definitely enable this feature! It enhances availability and durability by replicating your database between two AWS Availability Zones, with a primary instance in one and a hot standby in another, with fast, automatic failover in the event of planned maintenance or a service disruption. You can enable this option for an existing DB Instance, and you can also specify it when you create a new one:

To learn more, read Multi-AZ Deployments Using Microsoft SQL Mirroring or Always On.
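If you would rather use the AWS CLI than the console, enabling Multi-AZ for an existing DB Instance is a single call (the instance identifier below is a placeholder):

$ aws rds modify-db-instance \
    --db-instance-identifier my-sql-server-db \
    --multi-az \
    --apply-immediately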

4 – Lambda Support
Let’s talk about some features for developers!

Launched in 2014, and the subject of continuous innovation ever since, AWS Lambda lets you run code in the cloud without having to own, manage, or even think about servers. You can choose from several .NET Core runtimes for your Lambda functions, and then write your code in either C# or PowerShell:

To learn more, read Working with C# and Working with PowerShell in the AWS Lambda Developer Guide. Your code has access to the full set of AWS services, and can make use of the AWS SDK for .NET; read the Developing .NET Core AWS Lambda Functions post for more info.
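As a sketch of what deployment looks like, once you have packaged a C# handler into a zip file, you can create the function with the AWS CLI; the function name, handler string, and role ARN below are placeholders:

$ aws lambda create-function \
    --function-name my-dotnet-function \
    --runtime dotnetcore2.1 \
    --handler MyAssembly::MyNamespace.MyFunction::FunctionHandler \
    --zip-file fileb://function.zip \
    --role arn:aws:iam::123456789012:role/my-lambda-role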

5 – CDK for .NET
The AWS CDK (Cloud Development Kit) for .NET lets you define your cloud infrastructure as code and then deploy it using AWS CloudFormation. For example, this code (stolen from this post) will generate a template that creates an Amazon Simple Queue Service (SQS) queue and an Amazon Simple Notification Service (SNS) topic:

var queue = new Queue(this, "MyFirstQueue", new QueueProps
{
    VisibilityTimeoutSec = 300
});
var topic = new Topic(this, "MyFirstTopic", new TopicProps
{
    DisplayName = "My First Topic Yeah"
});
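Once the stack is defined, the CDK command line turns the code into a CloudFormation template and deploys it:

$ cdk synth     # generate the CloudFormation template
$ cdk deploy    # create or update the stack via CloudFormation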

6 – EC2 AMIs for .NET Core
If you are building Linux applications that make use of .NET Core, you can use our Amazon Linux 2 and Ubuntu AMIs. With .NET Core, PowerShell Core, and the AWS Command Line Interface (CLI) preinstalled, you'll be up and running, and ready to deploy applications, in minutes. You can find the AMIs by searching for core when you launch an EC2 instance:

7 – .NET Dev Center
The AWS .NET Dev Center contains materials that will help you learn how to design, build, and run .NET applications on AWS. You'll find articles, sample code, 10-minute tutorials, projects, and lots more:

8 – AWS License Manager
We want to help you to manage and optimize your Windows and SQL Server applications in new ways. For example, AWS License Manager helps you to manage the licenses for the software that you run in the cloud or on-premises (read my post, New AWS License Manager – Manage Software Licenses and Enforce Licensing Rules, to learn more). You can create custom rules that emulate those in your licensing agreements, and enforce them when an EC2 instance is launched:

The License Manager also provides you with information on license utilization so that you can fine-tune your license portfolio, possibly saving some money in the process!
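License configurations can also be created programmatically. Here's a sketch using the AWS CLI; the name and vCPU count are examples, not recommendations:

$ aws license-manager create-license-configuration \
    --name "sql-server-enterprise" \
    --license-counting-type vCPU \
    --license-count 96 \
    --license-count-hard-limit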

9 – Import, Export, and Migration
You have lots of options and choices when it comes to moving your code and data into and out of AWS. Here’s a very brief summary:

TSO Logic – This new member of the AWS family (we acquired the company earlier this year) offers an analytics solution that helps you to plan, optimize, and save money as you make your journey to the cloud.

VM Import/Export – This service allows you to import existing virtual machine images to EC2 instances, and export them back to your on-premises environment. Read Importing a VM as an Image Using VM Import/Export to learn more.
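As a sketch, assuming you have already uploaded a VMDK disk image to one of your S3 buckets (the bucket and key below are placeholders), an import looks like this:

$ aws ec2 import-image \
    --description "Windows Server VM" \
    --disk-containers "Format=vmdk,UserBucket={S3Bucket=my-import-bucket,S3Key=vms/server.vmdk}"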

AWS Snowball – This service lets you move petabyte scale data sets into and out of AWS. If you are at exabyte scale, check out the AWS Snowmobile.

AWS Migration Acceleration Program – This program encompasses AWS Professional Services and teams from our partners. It is based on a three step migration model that includes a readiness assessment, a planning phase, and the actual migration.

10 – 21st Century Applications
AWS gives you a full-featured, rock-solid foundation and a rich set of services so that you can build tomorrow’s applications today! You can go serverless with the .NET Core support in Lambda, make use of our Deep Learning AMIs for Windows, host containerized apps on Amazon ECS or Amazon EKS, and write code that makes use of the latest AI-powered services. Your applications can make use of recommendations, forecasting, image analysis, video analysis, text analytics, document analysis, text to speech, translation, transcription, and more.

11 – AWS Integration
Your existing Windows Applications, both cloud-based and on-premises, can make use of Windows file system and directory services within AWS:

Amazon FSx for Windows Server – This fully managed native Windows file system is compatible with the SMB protocol and NTFS. It provides shared file storage for Windows applications, backed by SSD storage for fast & reliable performance. To learn more, read my blog post.

AWS Directory Service – Your directory-aware workloads and AWS Enterprise IT applications can use this managed Active Directory that runs in the AWS Cloud.

Join our Team
If you would like to build, manage, or market new AWS offerings for the Windows market, be sure to check out our current openings. Here’s a sampling:

Senior Digital Campaign Marketing Manager – Own the digital tactics for product awareness and run adoption campaigns.

Senior Product Marketing Manager – Drive communications and marketing, create compelling content, and build awareness.

Developer Advocate – Drive adoption and community engagement for SQL Server on EC2.

Learn More
Our freshly updated Windows on AWS and SQL Server on AWS pages contain case studies, quick starts, and lots of other useful information.

Jeff;

GPU workloads on AWS Batch

Post Syndicated from Josh Rad original https://aws.amazon.com/blogs/compute/gpu-workloads-on-aws-batch/

Contributed by Manuel Manzano Hoss, Cloud Support Engineer

I remember playing around with graphics processing unit (GPU) workload examples in 2017 when the Deep Learning on AWS Batch post was published by my colleague Kiuk Chung. He provided an example of how to train a convolutional neural network (CNN), the LeNet architecture, to recognize handwritten digits from the MNIST dataset using Apache MXNet as the framework. Back then, to run such jobs with GPU capabilities, I had to do the following:

  • Create a custom GPU-enabled AMI that had Docker, the ECS agent, the NVIDIA driver and container runtime, and CUDA installed.
  • Identify the type of P2 EC2 instance that had the required number of GPUs for my job.
  • Check the number of vCPUs that it offered (even if I was not interested in using them).
  • Specify that number of vCPUs for my job.

All that, without any certainty that the required GPU would actually be available to my job once it was running. Back then, there was no GPU pinning. Other jobs running on the same EC2 instance were able to use that GPU, making the orchestration of my jobs a tricky task.

Fast forward two years. Today, AWS Batch announced integrated support for Amazon EC2 Accelerated Instances. It is now possible to specify a number of GPUs as a resource that AWS Batch considers in choosing the EC2 instance to run your job, along with vCPU and memory. That allows me to take advantage of the main benefits of using AWS Batch: the compute resource selection algorithm and the job scheduler. It also frees me from having to check which types of EC2 instances have enough GPUs.

Also, I can take advantage of the Amazon ECS GPU-optimized AMI maintained by AWS. It comes with the NVIDIA drivers and all the necessary software to run GPU-enabled jobs. When I allow the P2 or P3 instance types on my compute environment, AWS Batch launches my compute resources using the Amazon ECS GPU-optimized AMI automatically.

In other words, now I don’t worry about the GPU task list mentioned earlier. I can focus on deciding which framework and command to run on my GPU-accelerated workload. At the same time, I’m now sure that my jobs have access to the required performance, as physical GPUs are pinned to each job and not shared among them.

A GPU race against the past

As a kind of GPU-race exercise, I checked a similar example to the one from Kiuk’s post, to see how fast it could be to run a GPU-enabled job now. I used the AWS Management Console to demonstrate how simple the steps are.

In this case, I decided to use the deep neural network architecture called multilayer perceptron (MLP), not the LeNet CNN, to compare the validation accuracy between them.

To make the test even simpler and faster to implement, I thought I would use one of the recently announced AWS Deep Learning containers, which come pre-packed with different frameworks and ready-to-process data. I chose the container that comes with MXNet and Python 2.7, customized for Training and GPU. For more information about the Docker images available, see the AWS Deep Learning Containers documentation.

In the AWS Batch console, I created a managed compute environment with the default settings, allowing AWS Batch to create the required IAM roles on my behalf.

On the configuration of the compute resources, I selected the P2 and P3 families of instances, as those are the instance types with GPU capabilities. You can select On-Demand Instances, but in this case I decided to use Spot Instances to take advantage of the discounts that this pricing model offers. I left the defaults for all other settings, selecting the AmazonEC2SpotFleetRole role that I created the first time that I used Spot Instances:

Finally, I also left the network settings as default. My compute environment selected the default VPC, three subnets, and a security group. They are enough to run my jobs and at the same time keep my environment safe by limiting connections from outside the VPC:

I created a job queue, GPU_JobQueue, attaching it to the compute environment that I just created:

Next, I registered the same job definition that I would have created following Kiuk’s post. I specified enough memory to run this test, one vCPU, and the AWS Deep Learning Docker image that I chose, in this case mxnet-training:1.4.0-gpu-py27-cu90-ubuntu16.04. The number of GPUs required was, in this case, one. To be able to run the script, the container must run as privileged, or as the root user.
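If you prefer the CLI to the console, the same job definition can be registered with a call along these lines. This is a sketch: the registry address is a placeholder and the memory value is simply enough for this test. The GPU requirement goes in the resourceRequirements field:

$ aws batch register-job-definition \
    --job-definition-name gpu-mnist-training \
    --type container \
    --container-properties '{
        "image": "<registry>/mxnet-training:1.4.0-gpu-py27-cu90-ubuntu16.04",
        "vcpus": 1,
        "memory": 4096,
        "privileged": true,
        "resourceRequirements": [{"type": "GPU", "value": "1"}]
    }'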

Finally, I submitted the job. I first cloned the MXNet repository for the train_mnist.py Python script. Then I ran the script itself, with the parameter --gpus 0 to indicate that the assigned GPU should be used. The job inherits all the other parameters from the job definition:

sh -c 'git clone -b 1.3.1 https://github.com/apache/incubator-mxnet.git && python /incubator-mxnet/example/image-classification/train_mnist.py --gpus 0'

That’s all, and my GPU-enabled job was running. It took me less than two minutes to go from zero to having the job submitted. This is the log of my job, from which I removed the iterations from epoch 1 to 18 to make it shorter:

14:32:31     Cloning into 'incubator-mxnet'...
14:33:50     Note: checking out '19c501680183237d52a862e6ae1dc4ddc296305b'.
14:33:51     INFO:root:start with arguments Namespace(add_stn=False, batch_size=64, disp_batches=100, dtype='float32', gc_threshold=0.5, gc_type='none', gpus='0', initializer='default', kv_store='device', load_epoch=None, loss='', lr=0.05, lr_factor=0.1, lr_step_epochs='10', macrobatch_size=0, model_prefix=None, mom=0.9, monitor=0, network='mlp', num_classes=10, num_epochs=20, num_examples=60000, num_layers=No
14:33:51     DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): yann.lecun.com:80
14:33:54     DEBUG:urllib3.connectionpool:http://yann.lecun.com:80 "GET /exdb/mnist/train-labels-idx1-ubyte.gz HTTP/1.1" 200 28881
14:33:55     DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): yann.lecun.com:80
14:33:55     DEBUG:urllib3.connectionpool:http://yann.lecun.com:80 "GET /exdb/mnist/train-images-idx3-ubyte.gz HTTP/1.1" 200 9912422
14:33:59     DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): yann.lecun.com:80
14:33:59     DEBUG:urllib3.connectionpool:http://yann.lecun.com:80 "GET /exdb/mnist/t10k-labels-idx1-ubyte.gz HTTP/1.1" 200 4542
14:33:59     DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): yann.lecun.com:80
14:34:00     DEBUG:urllib3.connectionpool:http://yann.lecun.com:80 "GET /exdb/mnist/t10k-images-idx3-ubyte.gz HTTP/1.1" 200 1648877
14:34:04     INFO:root:Epoch[0] Batch [0-100] Speed: 37038.30 samples/sec accuracy=0.793472
14:34:04     INFO:root:Epoch[0] Batch [100-200] Speed: 36457.89 samples/sec accuracy=0.906719
14:34:04     INFO:root:Epoch[0] Batch [200-300] Speed: 36981.20 samples/sec accuracy=0.927500
14:34:04     INFO:root:Epoch[0] Batch [300-400] Speed: 36925.04 samples/sec accuracy=0.935156
14:34:04     INFO:root:Epoch[0] Batch [400-500] Speed: 37262.36 samples/sec accuracy=0.940156
14:34:05     INFO:root:Epoch[0] Batch [500-600] Speed: 37729.64 samples/sec accuracy=0.942813
14:34:05     INFO:root:Epoch[0] Batch [600-700] Speed: 37493.55 samples/sec accuracy=0.949063
14:34:05     INFO:root:Epoch[0] Batch [700-800] Speed: 37320.80 samples/sec accuracy=0.953906
14:34:05     INFO:root:Epoch[0] Batch [800-900] Speed: 37705.85 samples/sec accuracy=0.958281
14:34:05     INFO:root:Epoch[0] Train-accuracy=0.924024
14:34:05     INFO:root:Epoch[0] Time cost=1.633
...  LOGS REMOVED
14:34:44     INFO:root:Epoch[19] Batch [0-100] Speed: 36864.44 samples/sec accuracy=0.999691
14:34:44     INFO:root:Epoch[19] Batch [100-200] Speed: 37088.35 samples/sec accuracy=1.000000
14:34:44     INFO:root:Epoch[19] Batch [200-300] Speed: 36706.91 samples/sec accuracy=0.999687
14:34:44     INFO:root:Epoch[19] Batch [300-400] Speed: 37941.19 samples/sec accuracy=0.999687
14:34:44     INFO:root:Epoch[19] Batch [400-500] Speed: 37180.97 samples/sec accuracy=0.999844
14:34:44     INFO:root:Epoch[19] Batch [500-600] Speed: 37122.30 samples/sec accuracy=0.999844
14:34:45     INFO:root:Epoch[19] Batch [600-700] Speed: 37199.37 samples/sec accuracy=0.999687
14:34:45     INFO:root:Epoch[19] Batch [700-800] Speed: 37284.93 samples/sec accuracy=0.999219
14:34:45     INFO:root:Epoch[19] Batch [800-900] Speed: 36996.80 samples/sec accuracy=0.999844
14:34:45     INFO:root:Epoch[19] Train-accuracy=0.999733
14:34:45     INFO:root:Epoch[19] Time cost=1.617
14:34:45     INFO:root:Epoch[19] Validation-accuracy=0.983579

Summary

As you can see, after AWS Batch launched the instance, the job took slightly more than two minutes to run. I spent roughly five minutes from start to finish. That was much faster than the time that I was previously spending just to configure the AMI. Using the AWS CLI, one of the AWS SDKs, or AWS CloudFormation, the same environment could be created even faster.

From a training point of view, I lost on the validation accuracy, as the results obtained using the LeNet CNN are higher than those of an MLP network. On the other hand, my job was faster, with a time cost of 1.6 seconds on average for each epoch. As the software stack evolves and increased hardware capabilities come along, these numbers keep improving, but that shouldn’t mean extra complexity. Using managed primitives like the one presented in this post enables a simpler implementation.

I encourage you to test this example and see for yourself how just a few clicks or commands let you start running GPU jobs with AWS Batch. Then, it is just a matter of replacing the Docker image that I used with one that includes the framework of your choice: TensorFlow, Caffe, PyTorch, Keras, etc. Start running your GPU-enabled machine learning, deep learning, computational fluid dynamics (CFD), seismic analysis, molecular modeling, genomics, or computational finance workloads. It’s faster and easier than ever.

If you decide to give it a try, have any doubts, or just want to let me know what you think about this post, please write in the comments section!

Learn about AWS Services & Solutions – April AWS Online Tech Talks

Post Syndicated from Robin Park original https://aws.amazon.com/blogs/aws/learn-about-aws-services-solutions-april-aws-online-tech-talks/

AWS Tech Talks

Join us this April to learn about AWS services and solutions. The AWS Online Tech Talks are live, online presentations that cover a broad range of topics at varying technical levels. These tech talks, led by AWS solutions architects and engineers, feature technical deep dives, live demonstrations, customer examples, and Q&A with AWS experts. Register Now!

Note – All sessions are free and in Pacific Time.

Tech talks this month:

Blockchain

May 2, 2019 | 11:00 AM – 12:00 PM PT – How to Build an Application with Amazon Managed Blockchain – Learn how to build an application on Amazon Managed Blockchain with the help of demo applications and sample code.

Compute

April 29, 2019 | 1:00 PM – 2:00 PM PT – How to Optimize Amazon Elastic Block Store (EBS) for Higher Performance – Learn how to optimize performance and spend on your Amazon Elastic Block Store (EBS) volumes.

May 1, 2019 | 11:00 AM – 12:00 PM PT – Introducing New Amazon EC2 Instances Featuring AMD EPYC and AWS Graviton Processors – See how new Amazon EC2 instance offerings that feature AMD EPYC processors and AWS Graviton processors enable you to optimize performance and cost for your workloads.

Containers

April 23, 2019 | 11:00 AM – 12:00 PM PT – Deep Dive on AWS App Mesh – Learn how AWS App Mesh makes it easy to monitor and control communications for services running on AWS.

March 22, 2019 | 9:00 AM – 10:00 AM PT – Deep Dive Into Container Networking – Dive deep into microservices networking and how you can build, secure, and manage the communications into, out of, and between the various microservices that make up your application.

Databases

April 23, 2019 | 1:00 PM – 2:00 PM PT – Selecting the Right Database for Your Application – Learn how to develop a purpose-built strategy for databases, where you choose the right tool for the job.

April 25, 2019 | 9:00 AM – 10:00 AM PT – Mastering Amazon DynamoDB ACID Transactions: When and How to Use the New Transactional APIs – Learn how Amazon DynamoDB’s new transactional APIs simplify the developer experience of making coordinated, all-or-nothing changes to multiple items both within and across tables.

DevOps

April 24, 2019 | 9:00 AM – 10:00 AM PT – Running .NET applications with AWS Elastic Beanstalk Windows Server Platform V2 – Learn about the easiest way to get your .NET applications up and running on AWS Elastic Beanstalk.

Enterprise & Hybrid

April 30, 2019 | 11:00 AM – 12:00 PM PT – Business Case Teardown: Identify Your Real-World On-Premises and Projected AWS Costs – Discover tools and strategies to help you as you build your value-based business case.

IoT

April 30, 2019 | 9:00 AM – 10:00 AM PT – Building the Edge of Connected Home – Learn how AWS IoT edge services are enabling smarter products for the connected home.

Machine Learning

April 24, 2019 | 11:00 AM – 12:00 PM PT – Start Your Engines and Get Ready to Race in the AWS DeepRacer League – Learn more about reinforcement learning, how to build a model, and compete in the AWS DeepRacer League.

April 30, 2019 | 1:00 PM – 2:00 PM PT – Deploying Machine Learning Models in Production – Learn best practices for training and deploying machine learning models.

May 2, 2019 | 9:00 AM – 10:00 AM PT – Accelerate Machine Learning Projects with Hundreds of Algorithms and Models in AWS Marketplace – Learn how to use third party algorithms and model packages to accelerate machine learning projects and solve business problems.

Networking & Content Delivery

April 23, 2019 | 9:00 AM – 10:00 AM PT – Smart Tips on Application Load Balancers: Advanced Request Routing, Lambda as a Target, and User Authentication – Learn tips and tricks about important Application Load Balancer (ALB) features that were recently launched.

Productivity & Business Solutions

April 29, 2019 | 11:00 AM – 12:00 PM PT – Learn How to Set up Business Calling and Voice Connector in Minutes with Amazon Chime – Learn how Amazon Chime Business Calling and Voice Connector can help you with your business communication needs.

May 1, 2019 | 1:00 PM – 2:00 PM PT – Bring Voice to Your Workplace – Learn how you can bring voice to your workplace with Alexa for Business.

Serverless

April 25, 2019 | 11:00 AM – 12:00 PM PT – Modernizing .NET Applications Using the Latest Features on AWS Development Tools for .NET – Get a deep dive and demonstration of the latest updates to the AWS SDK and tools for .NET to make development even easier, more powerful, and more productive.

May 1, 2019 | 9:00 AM – 10:00 AM PT – Customer Showcase: Improving Data Processing Workloads with AWS Step Functions’ Service Integrations – Learn how innovative customers like SkyWatch are coordinating AWS services using AWS Step Functions to improve productivity.

Storage

April 24, 2019 | 1:00 PM – 2:00 PM PT – Amazon S3 Glacier Deep Archive: The Cheapest Storage in the Cloud – See how Amazon S3 Glacier Deep Archive offers the lowest cost storage in the cloud, at prices significantly lower than storing and maintaining data in on-premises magnetic tape libraries or archiving data offsite.

New AMD EPYC-Powered Amazon EC2 M5ad and R5ad Instances

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-amd-epyc-powered-amazon-ec2-m5ad-and-r5ad-instances/

Last year I told you about our New Lower-Cost, AMD-Powered M5a and R5a EC2 Instances. Built on the AWS Nitro System, these instances are powered by custom AMD EPYC processors running at 2.5 GHz. They are priced 10% lower than comparable EC2 M5 and R5 instances, and give you a new opportunity to balance your instance mix based on cost and performance.

Today we are adding M5ad and R5ad instances, both powered by custom AMD EPYC 7000 series processors and built on the AWS Nitro System.

M5ad and R5ad Instances
These instances add high-speed, low latency local (physically connected) block storage to the existing M5a and R5a instances that we launched late last year.

M5ad instances are designed for general purpose workloads such as web servers, app servers, dev/test environments, gaming, logging, and media processing. They are available in 6 sizes:

Instance Name | vCPUs | RAM | Local Storage | EBS-Optimized Bandwidth | Network Bandwidth
m5ad.large | 2 | 8 GiB | 1 x 75 GB NVMe SSD | Up to 2.120 Gbps | Up to 10 Gbps
m5ad.xlarge | 4 | 16 GiB | 1 x 150 GB NVMe SSD | Up to 2.120 Gbps | Up to 10 Gbps
m5ad.2xlarge | 8 | 32 GiB | 1 x 300 GB NVMe SSD | Up to 2.120 Gbps | Up to 10 Gbps
m5ad.4xlarge | 16 | 64 GiB | 2 x 300 GB NVMe SSD | 2.120 Gbps | Up to 10 Gbps
m5ad.12xlarge | 48 | 192 GiB | 2 x 900 GB NVMe SSD | 5 Gbps | 10 Gbps
m5ad.24xlarge | 96 | 384 GiB | 4 x 900 GB NVMe SSD | 10 Gbps | 20 Gbps

R5ad instances are designed for memory-intensive workloads: data mining, in-memory analytics, caching, simulations, and so forth. The R5ad instances are available in 6 sizes:

Instance Name | vCPUs | RAM | Local Storage | EBS-Optimized Bandwidth | Network Bandwidth
r5ad.large | 2 | 16 GiB | 1 x 75 GB NVMe SSD | Up to 2.120 Gbps | Up to 10 Gbps
r5ad.xlarge | 4 | 32 GiB | 1 x 150 GB NVMe SSD | Up to 2.120 Gbps | Up to 10 Gbps
r5ad.2xlarge | 8 | 64 GiB | 1 x 300 GB NVMe SSD | Up to 2.120 Gbps | Up to 10 Gbps
r5ad.4xlarge | 16 | 128 GiB | 2 x 300 GB NVMe SSD | 2.120 Gbps | Up to 10 Gbps
r5ad.12xlarge | 48 | 384 GiB | 2 x 900 GB NVMe SSD | 5 Gbps | 10 Gbps
r5ad.24xlarge | 96 | 768 GiB | 4 x 900 GB NVMe SSD | 10 Gbps | 20 Gbps

Again, these instances are available in the same sizes as the M5d and R5d instances, and the AMIs work on either, so go ahead and try both!

Here are some things to keep in mind about the local NVMe storage on the M5ad and R5ad instances:

Naming – You don’t have to specify a block device mapping in your AMI or during the instance launch; the local storage will show up as one or more devices (/dev/nvme*1 on Linux) after the guest operating system has booted. A quick way to find and mount these devices is sketched after this list.

Encryption – Each local NVMe device is hardware encrypted using the XTS-AES-256 block cipher and a unique key. Each key is destroyed when the instance is stopped or terminated.

Lifetime – Local NVMe devices have the same lifetime as the instance they are attached to, and do not stick around after the instance has been stopped or terminated.
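Here is the quick sketch promised in the Naming item: on Linux, you can list the devices, create a filesystem, and mount one. The device name varies by instance, so check lsblk first:

$ lsblk                           # instance store volumes appear as nvme devices
$ sudo mkfs.ext4 /dev/nvme1n1     # create a filesystem (device name varies)
$ sudo mkdir -p /mnt/local-ssd
$ sudo mount /dev/nvme1n1 /mnt/local-ssd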

M5ad and R5ad instances are available in the US East (N. Virginia), US West (Oregon), US East (Ohio), and Asia Pacific (Singapore) Regions in On-Demand, Spot, and Reserved Instance form.

Jeff;

 

In the Works – EC2 Instances (G4) with NVIDIA T4 GPUs

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/in-the-works-ec2-instances-g4-with-nvidia-t4-gpus/

I’ve written about the power and value of GPUs in the past, and I have written posts to launch many generations of GPU-equipped EC2 instances including the CG1, G2, G3, P2, P3, and P3dn instance types.

Today I would like to give you a sneak peek at our newest GPU-equipped instance, the G4. Designed for machine learning training & inferencing, video transcoding, and other demanding applications, G4 instances will be available in multiple sizes and also in bare metal form. We are still fine-tuning the specs, but you can look forward to:

  • AWS-custom Intel CPUs (4 to 96 vCPUs)
  • 1 to 8 NVIDIA T4 Tensor Core GPUs
  • Up to 384 GiB of memory
  • Up to 1.8 TB of fast, local NVMe storage
  • Up to 100 Gbps networking

The brand-new NVIDIA T4 GPUs feature 320 Turing Tensor cores, 2,560 CUDA cores, and 16 GB of memory. In addition to support for machine learning inferencing and video processing, the T4 includes RT Cores for real-time ray tracing and can provide up to 2x the graphics performance of the NVIDIA M60 (watch Ray Tracing in Games with NVIDIA RTX to learn more).

I’ll have a lot more to say about these powerful, high-end instances very soon, so stay tuned!

Jeff;

PS – If you are interested in joining a private preview, sign up now.

Getting started with the A1 instance

Post Syndicated from Martin Yip original https://aws.amazon.com/blogs/compute/getting-started-with-the-a1-instance/

This post courtesy of Ali Saidi, Annapurna Labs, Principal Systems Developer

At re:Invent 2018 AWS announced the Amazon EC2 A1 instance. These instances are based on the AWS Nitro System that powers all of our latest generation of instances, and are the first instance types powered by the AWS Graviton Processor. These processors feature 64-bit Arm Neoverse cores and are the first general-purpose processors designed by Amazon specifically for use in AWS. The instances cost up to 40% less than other instance types with the same number of vCPUs and amount of DRAM. A1 instances are currently available in the US East (N. Virginia and Ohio), US West (Oregon) and EU (Ireland) regions with the following configurations:

Model | vCPUs | Memory (GiB) | Instance Store | Network Bandwidth | EBS Bandwidth
a1.medium | 1 | 2 | EBS Only | Up to 10 Gbps | Up to 3.5 Gbps
a1.large | 2 | 4 | EBS Only | Up to 10 Gbps | Up to 3.5 Gbps
a1.xlarge | 4 | 8 | EBS Only | Up to 10 Gbps | Up to 3.5 Gbps
a1.2xlarge | 8 | 16 | EBS Only | Up to 10 Gbps | Up to 3.5 Gbps
a1.4xlarge | 16 | 32 | EBS Only | Up to 10 Gbps | Up to 3.5 Gbps

For further information about the instance itself, developers can watch this re:Invent talk and visit the A1 product details page.
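Launching an A1 instance from the command line looks just like launching any other instance type; the only difference is that the AMI must be built for 64-bit Arm. Here's a sketch in which the AMI ID, key pair, and security group are placeholders:

$ aws ec2 run-instances \
    --image-id ami-0123456789abcdef0 \
    --instance-type a1.xlarge \
    --key-name my-key-pair \
    --security-group-ids sg-0123456789abcdef0 \
    --count 1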

Since introduction, we’ve been expanding the available operating systems for the instance and working with the Arm software ecosystem. This blog will provide a summary of what’s supported and how to use it.

Operating System Support

If you’re running on an open source stack, as many customers who build applications that scale out in the cloud are, the Arm ecosystem is well developed and likely already supports your application.

The A1 instance requires AMIs and software built for Arm processors. When we announced A1, we had support for Amazon Linux 2, Ubuntu 16.04 and 18.04, as well as Red Hat Enterprise Linux 7.6. A little over two months later, the list of available operating systems has grown to include Red Hat Enterprise Linux 8.0 Beta, NetBSD, Fedora Rawhide, Ubuntu 18.10, and Debian 9.8. We expect to see more operating systems, Linux distributions, and AMIs available in the coming months.

These operating systems and Linux distributions offer the same level of support for their Arm AMIs as they do for their existing x86 AMIs. In almost every case, if you’re installing packages with apt or yum, those packages exist for Arm in the OS of your choice and will run in the same way.

For example, to install PHP 7.2 on the Arm version of Amazon Linux 2 or Ubuntu we follow the exact same steps we would on an x86 based instance type:

$ sudo amazon-linux-extras enable php7.2
$ sudo yum install php

Or on Ubuntu 18.04:

$ sudo apt update
$ sudo apt install php
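You can confirm that you are on an Arm system, and that the package manager installed an Arm build of PHP, with a couple of quick checks:

$ uname -m     # prints aarch64 on an A1 instance
$ php -v       # the same PHP, built for 64-bit Arm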

Containers

Containers are one of the most popular application deployment mechanisms for A1. The Elastic Container Service (ECS) already supports the A1 instance and there’s an Amazon ECS-Optimized Amazon Linux 2 AMI, and we’ll soon be launching support for Elastic Kubernetes Service (EKS). The majority of Docker official-images hosted in Docker Hub already have support for 64-bit Arm systems along with x86.

We’ve further expanded support for running containers at scale with AWS Batch support for A1.

Running a container on A1

In this section we show how to run a container on Amazon Linux 2. Many Docker official images (at least 76% as of this writing) already support 64-bit Arm systems, and the majority of the ones that don’t either have pending patches to add support or are based on commercial software.

$ sudo yum install -y docker
$ sudo service docker start
$ sudo docker run hello-world
Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
3b4173355427: Pull complete
Digest: sha256:2557e3c07ed1e38f26e389462d03ed943586f744621577a99efb77324b0fe535
Status: Downloaded newer image for hello-world:latest
 
Hello from Docker!
This message shows that your installation appears to be working correctly.
...

Running WordPress on A1

As an example of automating the running of a LAMP (Linux, Apache HTTPd, MariaDB, and PHP) stack on an A1 instance, we’ve updated a basic CloudFormation template to support the A1 instance type. We made some changes to the template to support Amazon Linux 2, but otherwise the same template works for all our instance types. The template is here and it can be launched like any other CloudFormation template.

It defaults to running on an A1 Arm instance. After the template is launched, the output is the URL of the running instance, which can be accessed from a browser to verify that the default WordPress home page is being served.

Summary

If you’re using open source software, everything you rely on most likely works on Arm systems today, and over the coming months we’ll be working on increasing the support and improving performance of software running on the A1 instances. If you have an open source based web tier or containerized application, give the A1 instances a try and let us know what you think. If you run into any issues, please don’t hesitate to get in touch at [email protected], via the AWS Compute Forum, or through your usual AWS support contacts; we love customers’ feedback.

New Whitepaper: Active Directory Domain Services on AWS

Post Syndicated from Vinod Madabushi original https://aws.amazon.com/blogs/architecture/new-whitepaper-active-directory-domain-services-on-aws/

The cloud is now at the center of most Enterprise IT strategies. As such, a well-planned move to the cloud can result in immediate business payoff. To achieve such success, it’s important that you adopt Microsoft Active Directory (AD), the foundation of many large enterprise Windows and .NET applications, in a secure, scalable, and highly available manner within the AWS Cloud.

AWS offers flexible options for running AD, so as a customer it’s essential to select an architecture well-suited to support your applications. AWS offers a fully managed option called AWS Managed Active Directory, which enables your directory-aware workloads to use Managed Active Directory in AWS. You can also run Active Directory on Amazon Elastic Compute Cloud (Amazon EC2) and manage both the EC2 Instances and Active Directory, which provides the flexibility needed to extend an existing Active Directory domain to the AWS infrastructure.

In this regard, we are very excited to release the Active Directory Domain Services on AWS whitepaper. This whitepaper describes best practices for running Active Directory on AWS, including different architectural approaches for running AWS Managed AD and Active Directory on EC2 Instances. In addition, this document discusses the design considerations, security, network connectivity, and multi-region deployment of Active Directory for both scenarios.

Read the whitepaper: Active Directory on AWS.

About the author

Vinod Madabushi is an Enterprise Solutions Architect and subject matter expert in Microsoft technologies, including Active Directory. He works with customers on building highly available, scalable, and resilient applications on AWS Cloud. He’s passionate about solving technology challenges and helping customers with their cloud journey.

 

Running your game servers at scale for up to 90% lower compute cost

Post Syndicated from Roshni Pary original https://aws.amazon.com/blogs/compute/running-your-game-servers-at-scale-for-up-to-90-lower-compute-cost/

This post is contributed by Yahav Biran, Chad Schmutzer, and Jeremy Cowan, Solutions Architects at AWS

Many successful video games, such as Fortnite: Battle Royale, Warframe, and Apex Legends, use a free-to-play model, which offers players access to a portion of the game without paying. Free-to-play no longer means low quality: players expect a premium experience. The business model is constrained on cost, and Amazon EC2 Spot Instances offer a viable low-cost compute option. Casual multiplayer games naturally fit the Spot offering. With the orchestration of Amazon EKS containers and the mechanisms available to minimize player impact and optimize cost when running multiplayer game-server workloads, both casual and hardcore multiplayer games fit the Spot Instance offering.

Spot Instances offer spare compute capacity available in the AWS Cloud at steep discounts compared to On-Demand Instances. Spot Instances enable you to optimize your costs and scale your application’s throughput up to 10 times for the same budget. Spot Instances are best suited for fault-tolerant workloads. Multiplayer game-servers are no exception: a game-server state is updated using real-time player inputs, which makes the server state transient. Game-server workloads can be disposable and take advantage of Spot Instances to save up to 90% on compute cost. In this blog, we share how to architect your game-server workloads to handle interruptions and effectively use Spot Instances.

Characteristics of game-server workloads

Simply put, multiplayer game servers spend most of their life updating current character position and state (mostly animation). The rest of the time is spent on image updates that result from combat actions, moves, and other game-related events. More specifically, game servers’ CPUs are busy doing network I/O operations by accepting client positions, calculating the new game state, and multi-casting the game state back to the clients. That makes a game server workload a good fit for general-purpose instance types for casual multiplayer games and, preferably, compute-optimized instance types for the hardcore multiplayer games.

AWS provides a wide variety of both compute-optimized (C5 and C4) and general-purpose (M5) instance types with Amazon EC2 Spot Instances. Because capacities fluctuate independently for each instance type in an Availability Zone, you can often get more compute capacity for the same price when using a wide range of instance types. For more information on Spot Instance best practices, see Getting Started with Amazon EC2 Spot Instances.

One solution that customers use for running dedicated game servers is Amazon GameLift. This solution uses Amazon GameLift FleetIQ to deploy a fleet of Spot Instances in an AWS Region. FleetIQ places new sessions on game servers based on player latencies, instance prices, and Spot Instance interruption rates so that you don’t need to worry about Spot Instance interruptions. For more information, see Reduce Cost by up to 90% with Amazon GameLift FleetIQ and Spot Instances on the AWS Game Tech Blog.

In other cases, you can use game-server deployment patterns like containers-based orchestration (such as Kubernetes, Swarm, and Amazon ECS) for deploying multiplayer game servers. Those systems manage a large number of game-servers deployed as Docker containers across several Regions. The rest of this blog focuses on this containerized game-server solution. Containers fit the game-server workload because they’re lightweight, start quickly, and allow for greater utilization of the underlying instance.

Why use Amazon EC2 Spot Instances?

Spot Instances are the natural choice for running a disposable game-server workload because of the built-in two-minute interruption notice, which allows for graceful handling. The two-minute termination notification enables the game server to act upon interruption. We demonstrate two examples of notification handling, through instance metadata and Amazon CloudWatch. For more information, see the “Interruption handling” and “What if I want my game-server to be redundant?” sections later in this blog.

Spot Instances also offer a variety of EC2 instances types that fit game-server workloads, such as general-purpose and compute-optimized (C4 and C5). Finally, Spot Instances provide low termination rates. The Spot Instance Advisor can help you choose a good starting point for determining which instance types have lower historical interruption rates.

Interruption handling

Avoiding player impact is key when using Spot Instances. Here is a strategy to avoid player impact that we apply in the proposed reference architecture and code examples available at Spotable Game Server on GitHub. Specifically, for Amazon EKS, node drainage requires draining the node via the kubectl drain command. This makes the node unschedulable and evicts the pods currently running on the node, but each pod is given a graceful termination period (terminationGracePeriodSeconds) before it is killed. As a result, pods continue to run while a signal is sent to the game so that it can end gracefully, minimizing the impact on the player experience.

Node drainage

Node drainage requires an agent pod that runs as a DaemonSet on every Spot Instance host to poll for a potential Spot interruption notice from Amazon CloudWatch or the instance metadata. We’re going to use the instance metadata notification. The following describes how termination events are handled with node drainage:

  1. Launch the game-server pod with a default of 120 seconds (terminationGracePeriodSeconds). As an example, see this deploy YAML file on GitHub.
  2. Provision a worker node pool with a mixed instances policy of On-Demand and Spot Instances. It uses the Spot Instance allocation strategy with the lowest price. For example, see this AWS CloudFormation template on GitHub.
  3. Use the Amazon EKS bootstrap tool (/etc/eks/bootstrap.sh in the recommended AMI) to label each node with its instance lifecycle, either OnDemand or Spot. For example:
    • OnDemand: "--kubelet-extra-args --node-labels=lifecycle=ondemand,title=minecraft,region=uswest2"
    • Spot: "--kubelet-extra-args --node-labels=lifecycle=spot,title=minecraft,region=uswest2"
  4. A daemon set deployed on every node pulls the termination status from the instance metadata endpoint. When a termination notification arrives, the `kubectl drain node` command is executed, and a SIGTERM signal is sent to the game-server pod. You can see these commands in the batch file on GitHub; a minimal sketch of the polling loop appears after this list.
  5. The game server keeps running for the next 120 seconds to allow the game to notify the players about the incoming termination.
  6. No new game-server is scheduled on the node to be terminated because it’s marked as unschedulable.
  7. A notification to an external system such as a matchmaking system is sent to update the current inventory of available game servers.
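Here is the minimal sketch of the polling loop mentioned in step 4. It assumes that the node name matches the host name and that kubectl on the node is configured with sufficient permissions; the full implementation is in the Spotable Game Server repository:

#!/bin/bash
# Poll the instance metadata for a Spot interruption notice.
NODE_NAME=$(hostname)
while true; do
  CODE=$(curl -s -o /dev/null -w '%{http_code}' \
    http://169.254.169.254/latest/meta-data/spot/instance-action)
  if [ "$CODE" = "200" ]; then
    # Notice received: cordon and drain the node so that the game-server
    # pods receive SIGTERM and their 120-second grace period.
    kubectl drain "$NODE_NAME" --ignore-daemonsets --grace-period=120
    break
  fi
  sleep 5
done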

Optimization strategies for Kubernetes specifications

This section describes a few recommended strategies for Kubernetes specifications that enable optimal game server placements on the provisioned worker nodes.

  • Use single Spot Instance Auto Scaling groups as worker nodes. To accommodate the use of multiple Auto Scaling groups, we use Kubernetes nodeSelector to control the game-server scheduling on any of the nodes in any of the Spot Instance–based Auto Scaling groups.
    nodeSelector:
      lifecycle: spot
      title: your-game-title

  • The lifecycle label is populated upon node creation through the AWS CloudFormation template in the following section:
    BootstrapArgumentsForSpotFleet:
      Description: Sets Node Labels to set lifecycle as Ec2Spot
      Default: "--kubelet-extra-args --node-labels=lifecycle=spot,title=minecraft,region=uswest2"
      Type: String

  • You might have a case where the incoming player actions are served by UDP and masking the interruption from the player is required. Here, the game-server allocator (a Kubernetes scheduler for us) schedules more than one game server as target upstream servers behind a UDP load balancer that multicasts any packet received to the set of game servers. After the scheduler terminates the game server upon node termination, the failover occurs seamlessly. For more information, see “What if I want my game-server to be redundant?” later in this blog.

Reference architecture

The following architecture describes an instance mix of On-Demand and Spot Instances in an Amazon EKS cluster of multiplayer game servers. Within a single VPC, the control plane node pools (Master Host and Storage Host) are highly available and thus run On-Demand Instances. The game-server hosts/nodes use a mix of Spot and On-Demand Instances. The control plane’s API server is accessible via an Elastic Load Balancing Application Load Balancer with a preconfigured allow list.

What if I want my game server to be redundant?

A game server is a sessionful workload, but it traditionally runs as a single dedicated game server instance with no redundancy. For game servers that use TCP as the transport network layer, AWS offers Network Load Balancers as an option for distributing player traffic across multiple game servers’ targets. Currently, game servers that use UDP don’t have similar load balancer solutions that add redundancy to maintain a highly available game server.

This section proposes a solution for the case where game servers deployed as containerized Amazon EKS pods use UDP as the network transport layer and are required to be highly available. We’re using the UDP load balancer because of the Spot Instances, but the choice isn’t limited to when you’re using Spot Instances.

The following diagram shows a reference architecture for the implementation of a UDP load balancer based on Amazon EKS. It requires a setup of an Amazon EKS cluster as suggested above and a set of components that simulate architecture that supports multiplayer game services. For example, this includes game-server inventory that captures the running game-servers, their status, and allocation placement. The Amazon EKS cluster is on the left, and the proposed UDP load-balancer system is on the right. A new game server is reported to an Amazon SQS queue that persists in an Amazon DynamoDB table. When a player’s assignment is required, the match-making service queries an API endpoint for an optimal available game server through the game-server inventory that uses the DynamoDB tables.

The solution includes the following main components:

  • The game server (see mockup-udp-server at GitHub). This is a simple UDP socket server that accepts a delta of a game state from connected players and multicasts the updated state based on pseudo computation back to the players. It’s a single threaded server whose goal is to prove the viability of UDP-based load balancing in dedicated game servers. The model presented here isn’t limited to this implementation. It’s deployed as a single-container Kubernetes pod that uses hostNetwork: true for network optimization.
  • The load balancer (udp-lb). This is a containerized NGINX server loaded with the stream module. The load balancer’s upstream set is configured upon initialization based on the dedicated game-server state that is stored in the DynamoDB table game-server-status-by-endpoint. Available load balancer instances are also stored in a DynamoDB table, lb-status-by-endpoint, to be used by core game services such as a matchmaking service.
  • An Amazon SQS queue that captures the initialization and termination of game servers and load balancers instances deployed in the Kubernetes cluster.
  • DynamoDB tables that persist the state of the cluster with regards to the game servers and load balancer inventory.
  • An API operation based on AWS Lambda (game-server-inventory-api-lambda) that serves the game servers and load balancers for an updated list of resources available. The operation supports /get-available-gs needed for the load balancer to set its upstream target game servers. It also supports /set-gs-busy/{endpoint} for labeling already claimed game servers from the available game servers inventory.
  • A Lambda function (game-server-status-poller-lambda) that the Amazon SQS queue triggers and that populates the DynamoDB tables.

Scheduling mechanism

Our goal in this example is to reduce the chance that two game servers that serve the same load-balancer game endpoint are interrupted at the same time. Therefore, we need to prevent the scheduling of the same game servers (mockup-UDP-server) on the same host. This example uses advanced scheduling in Kubernetes where the pod affinity/anti-affinity policy is being applied.

We define two soft labels, mockup-grp1 and mockup-grp2, in the podAntiAffinity section as follows:

      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: "app"
                    operator: In
                    values:
                      - mockup-grp1
              topologyKey: "kubernetes.io/hostname"

The requiredDuringSchedulingIgnoredDuringExecution tells the scheduler that the subsequent rule must be met upon pod scheduling. The rule says that pods that carry the value of key: “app” mockup-grp1 will not be scheduled on the same node as pods with key: “app” mockup-grp2 due to topologyKey: “kubernetes.io/hostname”.

When a load balancer pod (udp-lb) is scheduled, it queries the game-server-inventory-api endpoint for two game-server pods that run on different nodes. If this request isn’t fulfilled, the load balancer pod enters a crash loop until two available game servers are ready.

Try it out

We published two examples that guide you on how to build an Amazon EKS cluster that uses Spot Instances. The first example, Spotable Game Server, creates the cluster, deploys Spot Instances, Dockerizes the game server, and deploys it. The second example, Game Server Glutamate, enhances the game-server workload and enables redundancy as a mechanism for handling Spot Instance interruptions.

Conclusion

Multiplayer game servers have relatively short-lived processes that last from a few minutes to a few hours. The current average observed life span of Spot Instances in US and EU Regions ranges from a few hours to a few days, which makes Spot Instances a good fit for game servers. Amazon GameLift FleetIQ offers native and seamless support for Spot Instances, and Amazon EKS offers mechanisms to significantly minimize the probability of interrupting the player experience. This makes Spot Instances an attractive option not only for casual multiplayer game servers but also for hardcore game servers. Game studios that use Spot Instances for multiplayer game servers can save up to 90% of the compute cost, benefiting them as well as delighting their players.

Introducing AWS Solutions: Expert architectures on demand

Post Syndicated from AWS Admin original https://aws.amazon.com/blogs/architecture/introducing-aws-solutions-expert-architectures-on-demand/

AWS Solutions Architects are on the front line of helping customers succeed using our technologies. Our team members leverage their deep knowledge of AWS technologies to build custom solutions that solve specific problems for clients. But many customers want to solve common technical problems that don’t require custom solutions, or they want a general solution they can use as a reference to build their own custom solution. For these customers, we offer AWS Solutions: vetted, technical reference implementations built by AWS Solutions Architects and AWS Partner Network partners. AWS Solutions are designed to help customers solve common business and technical problems, or they can be customized for specific use cases.

AWS Solutions are built to be operationally effective, performant, reliable, secure, and cost-effective; and incorporate architectural frameworks such as the Well-Architected Framework. Every AWS Solution comes with a detailed architecture diagram, a deployment guide, and instructions for both manual and automated deployment.

Here are some Solutions we are particularly excited about.

Media2Cloud

We released the Media2Cloud solution in January 2019. This solution helps customers migrate their existing video archives to the cloud. Media2Cloud sets up a serverless end-to-end workflow to ingest your videos and generate metadata, proxy videos, and image thumbnails.

Because migrating your existing video archives to the cloud can be a challenging and slow process, the Media2Cloud solution builds the following architecture.

Media2Cloud architecture

The solution leverages the Media Analysis Solution to analyze and extract valuable metadata from your video archives using Amazon Rekognition, Amazon Transcribe, and Amazon Comprehend.

The solution also includes a simple web interface that helps make it easier to get started ingesting your videos to the AWS Cloud. This solution is set up to integrate with AWS Partner Network partners to help customers migrate their video archives to the cloud.

AWS Instance Scheduler

In October 2018, we updated the AWS Instance Scheduler, a solution that enables customers to easily configure custom start and stop schedules for their Amazon EC2 and Amazon RDS instances.

When you deploy the solution’s template, the solution builds the following architecture.

AWS Instance Scheduler

 

For customers who leave all of their instances running at full utilization, this solution can result in up to 70% cost savings for those instances that are only necessary during regular business hours.

The Instance Scheduler solution gives you the flexibility to automatically manage multiple schedules as necessary, to configure multiple start and stop schedules by either deploying multiple Instance Schedulers or modifying individual resource tags, and to review Instance Scheduler metrics to better assess your instance capacity and usage and to calculate your cost savings.
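Schedules are attached to instances through tags. For example, assuming you have defined a schedule named office-hours, tagging an instance with the solution's default Schedule tag key looks like this (the instance ID and schedule name are placeholders):

$ aws ec2 create-tags \
    --resources i-0123456789abcdef0 \
    --tags Key=Schedule,Value=office-hours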

AWS Connected Vehicle Solution

In January 2018, we updated the AWS Connected Vehicle Solution, a solution that provides secure vehicle connectivity to the AWS Cloud. This solution includes capabilities for local computing within vehicles, sophisticated event rules, and data processing and storage. The solution also allows you to implement a core framework for connected vehicle services that allows you to focus on developing new functionality rather than managing infrastructure.

When you deploy the solution’s template, the solution builds the following architecture.

Connected Vehicle solution

You can build upon this framework to address a variety of use cases such as voice interaction, navigation and other location-based services, remote vehicle diagnostics and health monitoring, predictive analytics and required maintenance alerts, media streaming services, vehicle safety and security services, head unit applications, and mobile applications.

These are just some of our current offerings. Other notable Solutions include AWS WAF Security Automations, Machine Learning for Telecommunication, and AWS Landing Zone. In the coming months, we plan to continue expanding our portfolio of AWS Solutions to address common business and technical problems that our customers face. Visit our homepage to keep up to date with the latest AWS Solutions.

Now Available – Five New Amazon EC2 Bare Metal Instances: M5, M5d, R5, R5d, and z1d

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/now-available-five-new-amazon-ec2-bare-metal-instances-m5-m5d-r5-r5d-and-z1d/

Today we are launching the five new EC2 bare metal instances that I promised you a few months ago. Your operating system runs on the underlying hardware and has direct access to the processor and other hardware. The instances are powered by AWS-custom Intel® Xeon® Scalable (Skylake) processors that deliver sustained all-core Turbo performance.

Here are the specs:

Instance Name | Sustained All-Core Turbo | Logical Processors | Memory | Local Storage | EBS-Optimized Bandwidth | Network Bandwidth
m5.metal | Up to 3.1 GHz | 96 | 384 GiB | EBS Only | 14 Gbps | 25 Gbps
m5d.metal | Up to 3.1 GHz | 96 | 384 GiB | 4 x 900 GB NVMe SSD | 14 Gbps | 25 Gbps
r5.metal | Up to 3.1 GHz | 96 | 768 GiB | EBS Only | 14 Gbps | 25 Gbps
r5d.metal | Up to 3.1 GHz | 96 | 768 GiB | 4 x 900 GB NVMe SSD | 14 Gbps | 25 Gbps
z1d.metal | Up to 4.0 GHz | 48 | 384 GiB | 2 x 900 GB NVMe SSD | 14 Gbps | 25 Gbps

The M5 instances are designed for general-purpose workloads, such as web and application servers, gaming servers, caching fleets, and app development environments. The R5 instances are designed for high performance databases, web scale in-memory caches, mid-sized in-memory databases, real-time big data analytics, and other memory-intensive enterprise applications. The M5d and R5d variants also include 3.6 TB of local NVMe SSD storage.

z1d instances provide high compute performance and lots of memory, making them ideal for electronic design automation (EDA) and relational databases with high per-core licensing costs. The high CPU performance allows you to license fewer cores and significantly reduce your TCO for Oracle or SQL Server workloads.

All of the instances are powered by the AWS Nitro System, with dedicated hardware accelerators for EBS processing (including crypto operations), the software-defined network inside of each Virtual Private Cloud (VPC), ENA networking, and access to the local NVMe storage on the M5d, R5d, and z1d instances. Bare metal instances can also take advantage of Elastic Load Balancing, Auto Scaling, Amazon CloudWatch, and other AWS services.

In addition to being a great home for old-school applications and system software that are licensed specifically and exclusively for use on physical, non-virtualized hardware, bare metal instances can be used to run tools and applications that require access to low-level processor features such as performance counters. For example, Mozilla’s Record and Replay Framework (rr) records and replays program execution with low overhead, using the performance counters to measure application performance and to deliver signals and context-switch events with high fidelity. You can read their paper, Engineering Record And Replay For Deployability, to learn more.
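
Launching a bare metal instance works the same as launching any other EC2 instance type. As a minimal sketch with boto3 (the AMI ID, key pair, and subnet ID are placeholders):

# Minimal sketch: launch an m5.metal instance with boto3. The AMI ID,
# key pair, and subnet ID are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",      # placeholder AMI
    InstanceType="m5.metal",
    KeyName="my-key-pair",                # placeholder key pair
    SubnetId="subnet-0123456789abcdef0",  # placeholder subnet
    MinCount=1,
    MaxCount=1,
)
print(response["Instances"][0]["InstanceId"])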

Launch One Today
m5.metal instances are available in the US East (N. Virginia and Ohio), US West (N. California and Oregon), Europe (Frankfurt, Ireland, London, Paris, and Stockholm), and Asia Pacific (Mumbai, Seoul, Singapore, Sydney, and Tokyo) AWS regions.

m5d.metal instances are available in the US East (N. Virginia and Ohio), US West (Oregon), Europe (Frankfurt, Ireland, Paris, and Stockholm), and Asia Pacific (Mumbai, Seoul, Singapore, and Sydney) AWS regions.

r5.metal instances are available in the US East (N. Virginia and Ohio), US West (N. California and Oregon), Europe (Frankfurt, Ireland, Paris, and Stockholm), Asia Pacific (Mumbai, Seoul, and Singapore), and AWS GovCloud (US-West) AWS regions.

r5d.metal instances are available in the US East (N. Virginia and Ohio), US West (N. California), Europe (Frankfurt, Paris, and Stockholm), Asia Pacific (Mumbai, Seoul, and Singapore), and AWS GovCloud (US-West) AWS regions.

z1d.metal instances are available in the US East (N. Virginia), US West (N. California and Oregon), Europe (Ireland), and Asia Pacific (Singapore and Tokyo) AWS regions.

The bare metal instances will become available in even more AWS regions as soon as possible.

Jeff;

 

Best Practices for Porting Applications to the EC2 A1 Instance Type

Post Syndicated from Martin Yip original https://aws.amazon.com/blogs/compute/best-practices-for-porting-applications-to-the-ec2-a1-instance-type/

This post courtesy of Dr. Jonathan Shapiro-Ward, AWS Solutions Architect

The new Amazon EC2 A1 instance types are powered by an ARM-based AWS Graviton CPU. A1 instances are extremely cost-effective and are ideal for scale-out scenarios where a large number of smaller instances are required.

Prior to the launch of A1 instances, all AWS instances were x86-based. There are a number of significant differences between the two instruction sets. The predominant difference is that ARM follows a RISC (Reduced Instruction Set Computer) design, whereas x86 is a CISC (Complex Instruction Set Computer) architecture. In short, ARM is a comparatively simple architecture composed of simple instructions that execute within a single cycle, while x86 has mostly complex instructions that execute over multiple cycles. This key difference has a range of implications for compiler and hardware complexity, power efficiency, and performance, but the key question for application developers is portability.

Workloads built for x86 will not run on the A1 family of instances; they must be ported. In many cases this is trivial. An extensive range of free and open source software supports ARM and requires no modification to port. Aside from installing a different binary, the process for installing the Apache Web Server, Nginx, PostgreSQL, Docker, and many more applications is unchanged. Unfortunately, porting is not always so simple, especially for applications developed in-house. Ideally, software is written to be portable, but this is not always the case. Even workloads written in languages designed for portability, such as Java, can prove a challenge. In this blog post, we'll review common challenges and migration paths when porting from x86-based architectures to ARM.

A General Porting Strategy

  1. Check for Core Language and Platform Support. The vast majority of common languages such as Java, Python, Perl, Ruby, PHP, Go, Rust, and so forth support modern ARM architectures. Likewise, major frameworks such as Django, Spring, Hadoop, Apache Spark, and many more run on ARM. More niche languages and frameworks may not have as robust support. If your language or crucial framework is dependent on x86, you may not be able to run your application on the A1 instance type.
  2. Identify all third-party libraries and dependencies. All non-trivial applications rely on third-party libraries to provide essential functionality. These can range from the standard library, to open source libraries, to paid-for proprietary libraries. Examine these libraries and determine whether they support ARM. If a library is dependent upon a specific architecture, search for an open issue around ARM support or inquire with the vendor as to the roadmap. If ARM support is not forthcoming, investigate alternative libraries.
  3. Identify Porting Path. There are three common strategies for porting. The strategy that applies will depend upon the language being used.
    1. For interpreted languages and those compiled to bytecode, the first step is to translate runbooks, scripts, AMIs, and templates to install the ARM equivalent of the interpreter or language VM; for instance, installing the ARM version of the JVM or CPython. If installation is done via a package manager, no change may be necessary. Subsequently, it is necessary to ensure that any native code libraries are replaced with their ARM equivalents. For Java, this would entail swapping out libraries leveraging JNI. For Python, this would entail swapping out CPython C-API based libraries (such as numpy). Once again, if this is done via a package manager such as yum or pip, manual intervention should not be necessary.
    2. In the case of compiled languages such as C/C++, the application will have to be re-compiled for ARM. If your application utilizes any machine specific features or relies on behavior that varies between compilers, it may be necessary to re-write parts of your application. This is discussed in more detail below. If your application follows common standards such as ANSI C and avoids any machine specific dependencies, the majority of effort will be spent in modifying the build process. This will depend upon your build process and will likely involve modifying build configurations, makefiles, configure scripts, or other assets. The binary can either be compiled on an A1 instance or cross compiled. Once a binary has been produced, the deployment process should be adapted to target the A1 instance.
    3. For applications built using platform specific languages and frameworks, which have an ARM alternative, the application will have to be ported to this alternative. By far the most common example of this is the .NET Framework. In order to run on ARM, .NET applications must be ported to .NET Core or to Mono. This will likely entail a non-trivial modification to the application codebase. This is discussed in more detail below.
  4. Test! All tests must be ported over and, if there is insufficient test coverage, it may be necessary to write new tests. This is especially pertinent if you had to recompile your application. A strong test suite should identify any issues arising from machine specific code that behaves incorrectly on the A1 instance type. Perform the usual range of unit testing, acceptance testing, and pre-prod testing.
  5. Update your infrastructure as code resources, such as AWS CloudFormation templates, to provision your application to A1 instances. This will likely be a simple change, modifying the instance type, AMI, and user data to reflect the change to the A1 instance.
  6. Perform a Green/Blue Deployment. Create your new A1-based stack alongside your existing stack and leverage Route 53 weighted routing to route 10% of requests to the new stack (see the sketch after this list). Monitor error rates, user behavior, load, and other critical factors in order to determine the health of the ported application in production. If the application behaves correctly, swap all traffic over to the new stack. Otherwise, reexamine the application and identify the root cause of any errors.
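
As a hedged illustration of step 6, weighted routing can be set up with two records that share a name. The hosted zone ID, record name, and load balancer DNS names below are placeholders:

# Minimal sketch: Route 53 weighted records sending ~10% of traffic to the
# new A1-based stack. The hosted zone ID, record name, and load balancer
# DNS names are placeholders.
import boto3

route53 = boto3.client("route53")

def weighted_record(identifier, dns_name, weight):
    return {
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": "app.example.com",
            "Type": "CNAME",
            "SetIdentifier": identifier,
            "Weight": weight,
            "TTL": 60,
            "ResourceRecords": [{"Value": dns_name}],
        },
    }

route53.change_resource_record_sets(
    HostedZoneId="Z0123456789ABC",  # placeholder hosted zone
    ChangeBatch={"Changes": [
        weighted_record("x86-stack", "x86-alb.example.com", 90),
        weighted_record("a1-stack", "a1-alb.example.com", 10),
    ]},
)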

Porting C/C++ Applications to A1 Instances

C and C++ are very portable languages. Indeed, C runs on more architectures than any other language, but that is not to say that all C will run on all systems. When porting applications to A1 instances, many of the challenges one might expect do not arise. The AWS Graviton chip is little endian, just like x86, and int, float, double, and other common types are the same size on both architectures. This does not, however, guarantee portability. Most commonly, issues porting a C-based application arise from aspects of the C standard that are dependent upon the architecture and implementation. Let's briefly look at an example of C that is not portable between architectures.

The most frequently discussed issue in porting C from x86 to ARM is the use of the character datatype. This issue arises from the C99 standard, which requires the implementation to decide whether the char datatype is signed or unsigned. On x86 Linux, a char is signed by default. On ARM Linux, a char is unsigned. This discrepancy is due to performance: unsigned char types result in more efficient ARM assembly. It can, however, cause issues. Let's examine the following code listing:

//Code Listing 1
#include <stdio.h>

int main(){
    char c = -1;  /* signed char on x86 Linux; unsigned (wraps to 255) on ARM Linux */
    if (c < 0){
        /* Taken on x86 Linux, where c holds -1 */
        printf("The value of the char is less than 0\n");
    } else {
        /* Taken on ARM Linux, where c holds 255 */
        printf("The value of the char is 0 or greater\n");
    }
    return 0;
}

On an x86 instance (in this case a t3.large), the above code has the expected result, printing "The value of the char is less than 0". On an A1 instance, it does not. There are mechanisms around this; for example, gcc has the -fsigned-char flag, which forces all char types to become signed upon compilation. Crucially, a developer must be aware of these types of issues ahead of time. Not all compilers and warning levels will provide appropriate warnings around char signedness (and indeed many other issues arising from architectural differences). As a result, without rigorous testing, it is possible to introduce unexpected errors by porting. This makes a comprehensive set of tests for your application an essential part of the porting process.

If your application has only ever been built for a single target environment (e.g., x86 Linux with gcc), there are potentially unexpected behaviors that will emerge when that application is built for a different architecture or with a different compiler.

The key best practice for ensuring portability of your C applications (and other compiled languages) is to adhere to a standard. Vanilla C99 will ensure the broadest compatibility across architectures and operating systems. A compiler-specific standard such as gnu99 will also ensure compatibility but can tether you to one compiler.

Static analysis should always be used when building your applications. Static analysis helps to detect bugs, security flaws, and, pertinent to our discussion, compatibility issues.

Porting .NET Applications to A1 Instances

Porting a .NET application from a Windows instance to an A1 Linux instance can yield significant cost reductions and is an effective way to economically scale out web apps and other parallelizable workloads.

For all intents and purposes, the .NET Framework only runs on x86 Windows. This limitation does not necessarily prevent running your .NET application on an A1 instance. There are two main .NET implementations: .NET Framework and .NET Core. The .NET Framework is the original implementation and is tightly coupled to x86 Windows. Meanwhile, .NET Core is a more recent open source project, developed by Microsoft, which is decoupled from Windows and runs on a variety of platforms, including ARM Linux. There is one final option, Mono, an open source implementation of the .NET Framework which runs on a variety of architectures.

For greenfield projects, .NET Core has become the de facto option, as it has a number of advantages over the alternatives. Projects based on .NET Core are cross-platform and significantly lighter than .NET Framework and Mono projects, making them far better suited to developing microservices and to running in containers or serverless environments. From version 2.1 onward, .NET Core supports ARM.

For existing projects, .NET Core is the best migration path to containers and to running on A1 instances. There are, however, a number of factors that might prohibit replatforming to .NET Core. These include:

• Dependency on Windows specific APIs
• Reliance on features that are only available in the .NET Framework, such as WPF or Windows Forms. Many of these features are coming to .NET Core as part of .NET Core 3, but at the time of writing, this is in preview.
• Use of third-party libraries that do not support .NET Core.
• Use of an unsupported language. Currently, .NET Core only supports C#, F#, and Visual Basic.

There are a number of tools to help port from .NET Framework to .NET Core. These include the .NET Portability Analyzer, which analyzes a .NET codebase and identifies any factors that might prohibit porting.

Conclusion

A1 instances can deliver significant cost savings over other instance types and are ideal for scale-out applications such as microservices. In many cases, moving to A1 instances can be easy. Many languages, frameworks, and applications have strong support for ARM. For Python, Java, Ruby, and other open source languages, porting can be trivial. For other applications, such as native binary applications or .NET applications, there can be some challenges. By examining your application and determining what x86 dependencies, if any, exist, you can devise a migration strategy that enables you to make use of A1 instances.

AWS Ops Automator v2 features vertical scaling (Preview)

Post Syndicated from AWS Admin original https://aws.amazon.com/blogs/architecture/aws-ops-automator-v2-features-vertical-scaling-preview/

The new version of the AWS Ops Automator, a solution that enables you to automatically manage your AWS resources, features vertical scaling for Amazon EC2 instances. With vertical scaling, the solution automatically adjusts capacity to maintain steady, predictable performance at the lowest possible cost. The solution can resize your instances either by restarting the existing instance with a new size, or by replacing the existing instance with a new, resized instance.

With this update, the AWS Ops Automator can help make setting up vertical scaling easier. All you have to do is define the time-based or event-based trigger that determines when the solution scales your instances, and choose whether you want to change the size of your existing instances or replace them with new, resized instances. The trigger invokes an AWS Lambda function that scales your instances.

Ops Automator Vertical Scaling

Restarting with a new size

When you choose to resize your instances by restarting the instance with a new size, the solution increases or decreases the size of your existing instances in response to changes in demand or at a specified point in time. The solution automatically changes the instance size to the next defined size up or down.

Replacing with a new, resized instance

Alternatively, you can choose to have the Ops Automator replace your instance with a new, resized instance instead of restarting your existing instance. When the solution determines that your instances need to be scaled, it launches new instances with the next defined instance size up or down. The solution is also integrated with Elastic Load Balancing to automatically register the new instance with your load balancers.

Getting Started

To learn more, visit the solution webpage and request access to the private preview.

Learn about hourly-replication in Server Migration Service and the ability to migrate large data volumes

Post Syndicated from Martin Yip original https://aws.amazon.com/blogs/compute/learn-about-hourly-replication-in-server-migration-service-and-the-ability-to-migrate-large-data-volumes/

This post courtesy of Shane Baldacchino, AWS Solutions Architect

AWS Server Migration Service (AWS SMS) is an agentless service that makes it easier and faster for you to migrate thousands of on-premises workloads to AWS. AWS SMS allows you to automate, schedule, and track incremental replications of live server volumes, making it easier for you to coordinate large-scale server migrations.

In my previous blog posts, we introduced how you can use AWS Server Migration Service (AWS SMS) to migrate WordPress, a popular commercial off-the-shelf application, to AWS.

For details and a walkthrough of how to set up AWS Server Migration Service, please see the blog posts for Hyper-V and VMware hypervisors, which guide you through the high-level process.

In this article we are going to step it up a few notches and look past the common migration of off-the-shelf software, providing a pattern for how you can use AWS SMS and some of its recently launched features, notably compression and resiliency for replication jobs and support for data volumes greater than 4 TB, to migrate a more complicated environment.

This post covers a migration of a complex, internally developed eCommerce system comprising a polyglot architecture. It is made up of a Microsoft IIS (Windows) presentation tier, a Tomcat application tier, and a Microsoft SQL Server database tier. All workloads run on-premises as virtual machines in a VMware vCenter 5.5 and ESX 5.5 environment.

This theoretical customer environment has various business and infrastructure requirements.

Application downtime: During any migration activities, the application cannot be offline for more than 2 hours
Licensing: The customer has renewed their Microsoft SQL Server license for an additional 3 years and holds the License Mobility with Software Assurance option for Microsoft SQL Server, and therefore wants to take advantage of AWS BYOL licensing for Microsoft SQL Server and Microsoft Windows Server.
Large data volumes: The Microsoft SQL Server database engine (.mdf, .ldf, and .ndf files) consumes 11 TB of storage

Walkthrough

Key elements of this migration process are identical to the process outlined in my previous blog posts for Hyper-V and VMware hypervisors (see those posts for more information on the process). At a high level, you will need to:

• Establish your AWS environment.
• Download the SMS Connector from the AWS Management Console.
• Configure AWS SMS and Hypervisor permissions.
• Install and configure the SMS Connector appliance.
• Import your virtual machine inventory and create replication jobs.
• Launch your Amazon EC2 instances and associated network ACLs, security groups, and Elastic Load Balancers.
• Change your DNS records to resolve the custom application to an Elastic Load Balancer.

Before you start, ensure that your source systems' OS and vCenter version are supported by AWS. For more information, see the Server Migration Service FAQ.

Planning the Migration

Once you have downloaded and configured the AWS SMS Connector for your hypervisor, you can get started creating replication jobs.

The artifacts derived from our replication jobs with AWS SMS will be AMIs (Amazon Machine Images). Because this three-tier architecture has commonality between servers, with multiple application and web servers performing the same function, we do not need to replicate each server individually. Instead, we can leverage a common AMI per tier and create three replication jobs:

1. Microsoft SQL Server – Database Tier
2. Ubuntu Server – Application Tier
3. IIS Web server – Webserver Tier

Performing the Replication

After validating that the SMS Connector is in a “HEALTHY” state, import your server catalog from your Hypervisor to AWS SMS. This process can take up to a minute.

Select the three servers (Microsoft SQL Server, Ubuntu Server, IIS web server) to migrate and choose Create replication job. AWS SMS now supports creating replication jobs with frequencies as short as 1 hour; to ensure the business RTO (Recovery Time Objective) of 2 hours is met, we will create our replication jobs with a frequency of 1 hour. This minimizes the risk that delta updates fail to complete during the cutover window.

Given the business's existing licensing investment in Microsoft SQL Server, they will leverage the BYOL (Bring Your Own License) option when creating the Microsoft SQL Server replication job.
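
Replication jobs can also be created programmatically. As a minimal sketch with boto3 (the server ID is a placeholder taken from the imported server catalog):

# Minimal sketch: create an hourly AWS SMS replication job with BYOL
# licensing for the SQL Server VM. The server ID is a placeholder from
# the imported server catalog.
import datetime

import boto3

sms = boto3.client("sms", region_name="us-east-1")
sms.create_replication_job(
    serverId="s-0123456789abcdef0",                  # placeholder server ID
    seedReplicationTime=datetime.datetime.utcnow(),  # start seeding now
    frequency=1,                                     # replicate every hour
    licenseType="BYOL",                              # reuse the existing SQL Server license
    description="SQL Server database tier",
)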

The AWS SMS console guides you through the process. The time that the initial replication task takes to complete is dependent on available bandwidth and the size of your virtual machines.

After the initial seed replication, network bandwidth requirement is minimized as AWS SMS replicates only incremental changes occurring on the VM.

The progress updates from AWS SMS are automatically sent to AWS Migration Hub so that you can track tasks in progress.

AWS Migration Hub provides a single location to track the progress of application migrations across multiple AWS and partner solutions. In this post, we are using AWS SMS as a mechanism to migrate the virtual machines (VMs) and track them via AWS Migration Hub.

Migration Hub and AWS SMS are both free. You pay only for the cost of the individual migration tools that you use, and for any resources being consumed on AWS.

The dashboard reflects any status changes that occur in the linked services. You can see from the following image that two servers are complete whilst another is in progress.

Using Migration Hub, you can view the migration progress of all applications. This allows you to quickly get progress updates across all of your migrations, easily identify and troubleshoot any issues, and reduce the overall time and effort spent on your migration projects.


Testing Your Replicated Instances

Thirty hours after creating the replication jobs, notification was received via Amazon SNS (Simple Notification Service) that all three replication jobs had completed. During the 30-hour replication window, the customer's ISP experienced downtime and sporadic flapping of the link, but this was negated by the network auto-recovery feature of AWS SMS, which recovered and resumed replication without any intervention.

With the replication tasks complete, the artifact created by AWS SMS is a custom AMI that you can use to deploy an EC2 instance. Follow the usual process to launch your EC2 instance, noting that you may need to replace any host-based firewalls with security groups and network ACLs, and any hardware-based load balancers with Elastic Load Balancing, to achieve fault tolerance, scalability, performance, and security.

As this environment is a three-tier architecture with commonality between tiers (the application and presentation tiers), we can create an ASG (Auto Scaling group) during the EC2 launch process to ensure that deployed capacity matches user demand. The ASG will be based on the custom AMIs generated by the replication jobs.

When you create an EC2 instance, ensure that you pick the most suitable EC2 instance type and size to match your performance and cost requirements.

While your new EC2 instances are a replica of your on-premises VMs, you should always validate that applications are functioning. How you do this differs on an application-by-application basis. You can use a combination of approaches, such as editing a local hosts file and testing your application, or connecting via SSH, RDP, or Telnet.

For our Windows presentation and database tiers, I can RDP in to my systems and validate that IIS 8.0 and other services are functioning correctly.

For our Ubuntu Application tier, we can SSH in to perform validation.

After validating each individual server, we can test the application end to end. Because our systems have been instantiated inside a VPC with no route back to our on-premises environment, we can test functionality without the risk of the replica communicating with our production application.

After validating the systems, it is time to cut over. Plan your runbook accordingly to ensure you either eliminate or minimize application disruption.

Cutting Over

As the replication window specified in the AWS SMS replication jobs was 1 hour, hourly AMIs were created that provide delta updates since the initial seed replication. The customer verified the stack by executing the previously created runbook against the latest AMIs, and verified that the application behaved as expected.

After another round of testing, the customer decided to plan the cutover for the coming Saturday at midnight, announcing a two-hour scheduled maintenance window. During the cutover window, the customer took the application offline, shut down the Microsoft SQL Server instance, and performed an on-demand sync of all systems.

This generated new versioned AMIs that contained all of the on-premises data. The customer then executed the runbook using the new AMIs. For the application and presentation tiers, these AMIs were used in the ASG configuration. After application validation, Amazon Route 53 was updated to resolve the application CNAME to the Application Load Balancer CNAME used to load balance traffic to the fleet of IIS servers.

Based on the TTL (Time To Live) of the Amazon Route 53 DNS zone file, end users gradually resolved the application to AWS, in this case within 300 seconds. Once this TTL period had elapsed, the customer brought the application back online and exited the maintenance window, with time to spare.

After modifying the Amazon Route 53 Zone Apex, the physical topology now looks as follows with traffic being routed to AWS.

After validation of a successful migration the customer deleted their AWS Server Migration Service replication jobs and began planning to decommission their on-premises resources.

Summary

This post presented an example pattern for migrating a complex, custom, polyglot environment into AWS using AWS migration services, specifically leveraging many of the new features of AWS SMS.

Many architectures can be extended to take advantage of the inherent benefits of AWS with little effort. This article illustrated how AWS migration services can be used to migrate a complex environment into AWS, after which native AWS services such as Amazon CloudWatch metrics can drive Auto Scaling policies to ensure deployed capacity matches user demand, while technologies such as Application Load Balancers provide fault tolerance and scalability.

Think big and get building!

 

 

Handling AWS Chargebacks for Enterprise Customers

Post Syndicated from Varad Ram original https://aws.amazon.com/blogs/architecture/handling-aws-chargebacks-for-enterprise-customers/

As AWS product portfolios and feature sets grow, as an enterprise customer, you are likely to migrate your existing workloads and innovate your new products on AWS. To help you keep your cloud charges simple, you can use consolidated billing. This can, however, create complexity for your internal chargebacks, especially if some of your resources and services are not tagged correctly. To help your individual teams and business units normalize and reduce their costs as your AWS implementation grows, you can implement chargebacks transparently and automate billing.

This blog post includes a walkthrough of an end-to-end mechanism that you can use to automate your consolidated billing charges for either your existing AWS accounts, or for newly created accounts.

Walkthrough

Prerequisites for implementation:

  • One account that is the payer account, which consolidates billing and links all other accounts (including admin accounts)
  • An understanding of billing, Detailed Billing Report (DBR), Cost and Usage Report (CUR), and blended and unblended costs
  • Activate propagation of necessary cost allocation tags to consolidated billing
  • Access to reservations across the linked accounts
  • Read permission on the source bucket and write permission to the transformed bucket
  • An automated method (such as database access or an API) to verify the cost centers tagged to AWS resources
  • Permissions to get access to the services described in this solution on the account targeted for this automation

Before you begin, it is important to understand the blended costs and unblended costs in consolidated billing. Blended costs are calculated based on the blended rate (the average rates for the reserved and on-demand instances that are used by your member accounts) for each service your accounts used, multiplied by the account usage of those services. Unblended costs are the charges for those services broken out for each linked account.

Based on your organization’s strategy for savings (centralized or not), you could consider either the blended or unblended costs. The consolidated billing files that include the information for the chargeback are the Detailed Billing Report (DBR) and Cost and Usage Report (CUR). Both of these reports provide both the blended and unblended rates as separate columns.

To help you create and maintain your AWS accounts, you can use AWS Account Vending Machine (AVM). You can launch AVM from either the AWS Landing Zone or with a custom solution. AVM keeps all your account information in a DynamoDB table (such as the account number, root mail ID, default cost center, name of the owner, etc.) and maintains reservation-related data (such as invoice ID, instance type, region, amount, cost center, etc.) in another table. To enable your account administrator to add invoice details for all your reservations, you can use a web page hosted on AWS Lambda, Amazon Simple Storage Service (Amazon S3), or a web server.

To begin the billing transformation, add a trigger on the S3 bucket that contains the raw AWS billing files so that each PutObject event pushes a message into Amazon Simple Queue Service (SQS). A billing transformation program (written in Python, Node.js, Java, .NET, etc., using the AWS SDK) then consumes those messages; it can run on an Amazon Elastic Compute Cloud (Amazon EC2) instance, in containers, or on Lambda (if the bill can be processed within 15 minutes, given Lambda's file size restrictions).

The billing transformation program must do the following (a minimal sketch follows the list):

  • Cache the Account details and reservation DynamoDB tables
  • Verify if there are any messages in SQS
  • Ignore if the file is not a DBR or CUR file (process either of them, not both)
  • Download the file, unzip, and read row-by-row; for a DBR file, consider only the “LineItem” RecordType
  • Add two new columns: Bill_CostCenter and Bill_Notes
    • If there is a valid value in the CostCenter tag (verified with internal automation processes), add the same value to the Bill_CostCenter column and any notes to the Bill_Notes column
    • If the CostCenter is invalid, get the default Cost Center from the cached account details and add the information to the Bill_CostCenter and Bill_Notes columns
    • If the row is a reservation invoice, the cost center information comes from the reservation table and is added to the correct column
  • Cache consolidation of cost centers with the blended or unblended cost of each row
  • Write each of these processed line items into a new file
  • Handle exceptions by the normal organization practices (for example, email the owner of the cost center or the finance team)
  • Push the new file into the transformed Amazon S3 bucket
  • Write the consolidated lines into a different file and upload it to the transformed Amazon S3 bucket
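
A minimal sketch of the row-level transformation is shown below. The column names, cost-center validation, and local file paths are assumptions; a real DBR/CUR file has many more columns and would be read from and written back to S3:

# Minimal sketch of the row-level billing transformation. Column names,
# the cost-center validation, and file paths are assumptions; real DBR/CUR
# files have many more columns and are read from and written back to S3.
import csv

def is_valid_cost_center(cost_center):
    # Placeholder for the internal automation (database or API lookup)
    return cost_center in {"CC-1001", "CC-1002"}

DEFAULT_COST_CENTER = "CC-UNALLOCATED"  # would come from the cached account table

with open("dbr.csv", newline="") as src, open("dbr_transformed.csv", "w", newline="") as dst:
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames + ["Bill_CostCenter", "Bill_Notes"])
    writer.writeheader()
    for row in reader:
        if row.get("RecordType") != "LineItem":  # DBR: process line items only
            continue
        cc = row.get("user:CostCenter", "")
        if is_valid_cost_center(cc):
            row["Bill_CostCenter"], row["Bill_Notes"] = cc, "tagged cost center"
        else:
            row["Bill_CostCenter"], row["Bill_Notes"] = DEFAULT_COST_CENTER, "invalid or missing tag"
        writer.writerow(row)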

Figure 1 – Architecture of processing a billing chargeback

 

Figure 2 – Validating the Cost Center process

After you have the consolidated billing file aggregated by cost center, you can easily see and handle your internal chargebacks. To further simplify your chargeback model, you can get help from AWS Technical Account Managers and Billing Concierge, if your organization would like AWS to provide custom invoices from the consolidated billing file.

Because the cost centers in your organization can expire over time, it's important to validate them frequently with automation, such as a Lambda function.

Improvements

If your organization has a more complex chargeback structure, you can extend the logic described above to support deeper and broader chargeback codes, or implement a hierarchical chargeback structure.

You can also extend the transformation logic to support several chargeback codes (such as comma-separated values or additional tags) if you have multiple teams or projects that want to share a resource.

Summary

As enterprise organizations grow and consume more cloud services, the cost optimization process grows and evolves with them. Sophisticated chargeback models enable the teams and business units in the organization to be accountable and to take the steps necessary to normalize the usage and costs of AWS services.

About the Author

Varad Ram likes to help customers adopt cloud technologies and is particularly interested in artificial intelligence. He believes deep learning will power future technology growth. In his spare time, his daughter and toddler son keep him busy biking and hiking.

Podcast #289: A Look at Amazon FSx For Windows File Server

Post Syndicated from Simon Elisha original https://aws.amazon.com/blogs/aws/podcast-amazon-fsx-for-windows-file-server/

In this episode, Simon speaks with Andrew Crudge (Senior Product Manager, FSx) about this newly released service, capabilities available to customers and how to make the best use of it in your environment.

Additional Resources

About the AWS Podcast

The AWS Podcast is a cloud platform podcast for developers, DevOps engineers, and cloud professionals seeking the latest news and trends in storage, security, infrastructure, serverless, and more. Join Simon Elisha and Jeff Barr for regular updates, deep dives, and interviews. Whether you're building machine learning and AI models, open source projects, or hybrid cloud solutions, the AWS Podcast has something for you. Subscribe with one of the following:

Like the Podcast?

Rate us on iTunes and send your suggestions, show ideas, and comments to [email protected]. We want to hear from you!

Western Digital HDD Simulation at Cloud Scale – 2.5 Million HPC Tasks, 40K EC2 Spot Instances

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/western-digital-hdd-simulation-at-cloud-scale-2-5-million-hpc-tasks-40k-ec2-spot-instances/

Earlier this month my colleague Bala Thekkedath published a story about Extreme Scale HPC and talked about how AWS customer Western Digital built a cloud-scale HPC cluster on AWS and used it to simulate crucial elements of upcoming head designs for their next-generation hard disk drives (HDD).

The simulation described in the story encompassed a little over 2.5 million tasks, and ran to completion in just 8 hours on a million-vCPU Amazon EC2 cluster. As Bala shared in his story, much of the simulation work at Western Digital revolves around the need to evaluate different combinations of technologies and solutions that comprise an HDD. The engineers focus on cramming ever-more data into the same space, improving storage capacity and increasing transfer speed in the process. Simulating millions of combinations of materials, energy levels, and rotational speeds allows them to pursue the highest density and the fastest read-write times. Getting the results more quickly allows them to make better decisions and lets them get new products to market more rapidly than before.

Here’s a visualization of Western Digital’s energy-assisted recording process in action. The top stripe represents the magnetism; the middle one represents the added energy (heat); and the bottom one represents the actual data written to the medium via the combination of magnetism and heat:

I recently spoke to my colleagues and to the teams at Western Digital and Univa who worked together to make this record-breaking run a reality. My goal was to find out more about how they prepared for this run, see what they learned, and to share it with you in case you are ready to run a large-scale job of your own.

Ramping Up
About two years ago, the Western Digital team was running clusters as big as 80K vCPUs, powered by EC2 Spot Instances in order to be as cost-effective as possible. They had grown to the 80K vCPU level after repeated, successful runs with 8K, 16K, and 32K vCPUs. After these early successes, they decided to shoot for the moon, push the boundaries, and work toward a one million vCPU run. They knew that this would stress and tax their existing tools, and settled on a find/fix/scale-some-more methodology.

Univa’s Grid Engine is a batch scheduler. It is responsible for keeping track of the available compute resources (EC2 instances) and dispatching work to the instances as quickly and efficiently as possible. The goal is to get the job done in the smallest amount of time and at the lowest cost. Univa’s Navops Launch supports container-based computing and also played a critical role in this run by allowing the same containers to be used for Grid Engine and AWS Batch.

One interesting scaling challenge arose when 50K hosts created concurrent connections to the Grid Engine scheduler. Once running, the scheduler can dispatch up to 3000 tasks per second, with an extra burst in the (relatively rare) case that an instance terminates unexpectedly and signals the need to reschedule 64 or more tasks as quickly as possible. The team also found that referencing worker instances by IP addresses allowed them to sidestep some internal (AWS) rate limits on the number of DNS lookups per Elastic Network Interface.

The entire simulation is packed in a Docker container for ease of use. When newly launched instances come online they register their specs (instance type, IP address, vCPU count, memory, and so forth) in an ElastiCache for Redis cluster. Grid Engine uses this data to find and manage instances; this is more efficient and scalable than calling DescribeInstances continually.
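
As a hedged sketch of that registration step (the Redis endpoint and key layout are assumptions), a worker coming online might do something like the following:

# Minimal sketch: a newly launched worker registers its specs in
# ElastiCache for Redis so the scheduler can discover it without calling
# DescribeInstances. The endpoint and key layout are assumptions.
import urllib.request

import redis  # redis-py

def metadata(path):
    # EC2 instance metadata service (IMDSv1 shown for brevity)
    url = "http://169.254.169.254/latest/meta-data/" + path
    with urllib.request.urlopen(url, timeout=2) as resp:
        return resp.read().decode()

r = redis.Redis(host="workers.abc123.0001.use1.cache.amazonaws.com", port=6379)
instance_id = metadata("instance-id")
r.hset("worker:" + instance_id, mapping={
    "instance_type": metadata("instance-type"),
    "ip": metadata("local-ipv4"),
})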

The simulation tasks read and write data from Amazon Simple Storage Service (S3), taking advantage of S3’s ability to store vast amounts of data and to handle any conceivable request rate.

Inside a Simulation Task
Each potential head design is described by a collection of parameters; the overall simulation run consists of an exploration of this parameter space. The results of the run help the designers to find designs that are buildable, reliable, and manufacturable. This particular run focused on modeling write operations.

Each simulation task ran for 2 to 3 hours, depending on the EC2 instance type. In order to avoid losing work if a Spot Instance is about to be terminated, the tasks checkpoint themselves to S3 every 15 minutes, with a bit of extra logic to cover the important case where the job finishes after the termination signal but before the actual shutdown.
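
A hedged sketch of that checkpoint loop (the bucket, key, and file names are placeholders) might look like this:

# Minimal sketch: checkpoint simulation state to S3 every 15 minutes, and
# poll the Spot metadata endpoint so a final checkpoint can be written if
# an interruption is scheduled. Bucket, key, and file names are placeholders.
import time
import urllib.error
import urllib.request

import boto3

s3 = boto3.client("s3")

def termination_pending():
    try:
        # Returns 200 with an action document when an interruption is scheduled
        urllib.request.urlopen(
            "http://169.254.169.254/latest/meta-data/spot/instance-action",
            timeout=1)
        return True
    except urllib.error.URLError:
        return False  # 404 (or timeout) means no interruption is scheduled

def checkpoint():
    s3.upload_file("/scratch/state.bin", "simulation-checkpoints", "task-42/state.bin")

last_checkpoint = 0.0
while True:  # the simulation work itself would run here
    if termination_pending() or time.time() - last_checkpoint > 15 * 60:
        checkpoint()
        last_checkpoint = time.time()
    time.sleep(5)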

Making the Run
After just 6 weeks of planning and prep (including multiple large-scale AWS Batch runs to generate the input files), the combined Western Digital / Univa / AWS team was ready to make the full-scale run. They used an AWS CloudFormation template to start Grid Engine and launch the cluster. Due to the Redis-based tracking that I described earlier, they were able to start dispatching tasks to instances as soon as they became available. The cluster grew to one million vCPUs in 1 hour and 32 minutes and ran full-bore for 6 hours:

When there were no more undispatched tasks available, Grid Engine began to shut the instances down, reaching the zero-instance point in about an hour. During the run, Grid Engine was able to keep the instances fully supplied with work over 99% of the time. The run used a combination of C3, C4, M4, R3, R4, and M5 instances. Here’s the overall breakdown over the course of the run:

The job spanned all six Availability Zones in the US East (N. Virginia) Region. Spot bids were placed at the On-Demand price. Over the course of the run, about 1.5% of the instances in the fleet were terminated and automatically replaced; the vast majority of the instances stayed running for the entire time.

And That’s That
This job ran 8 hours and cost $137,307 ($17,164 per hour). The folks I talked to estimated that this was about half the cost of making the run on an in-house cluster, if they had one of that size!

Evaluating the success of the run, Steve Phillpott (CIO of Western Digital) told us:

“Storage technology is amazingly complex and we’re constantly pushing the limits of physics and engineering to deliver next-generation capacities and technical innovation. This successful collaboration with AWS shows the extreme scale, power and agility of cloud-based HPC to help us run complex simulations for future storage architecture analysis and materials science explorations. Using AWS to easily shrink simulation time from 20 days to 8 hours allows Western Digital R&D teams to explore new designs and innovations at a pace un-imaginable just a short time ago.”

The Western Digital team behind this one is hiring an R&D Engineering Technologist; they also have many other open positions!

A Run for You
If you want to do a run on the order of 100K to 1M cores (or more), our HPC team is ready to help, as are our friends at Univa. To get started, Contact HPC Sales!

Jeff;

Optimizing a Lift-and-Shift for Security

Post Syndicated from Jonathan Shapiro-Ward original https://aws.amazon.com/blogs/architecture/optimizing-a-lift-and-shift-for-security/

This is the third and final blog within a three-part series that examines how to optimize lift-and-shift workloads. A lift-and-shift is a common approach for migrating to AWS, whereby you move a workload from on-prem with little or no modification. This third blog examines how lift-and-shift workloads can benefit from an improved security posture with no modification to the application codebase. (Read about optimizing a lift-and-shift for performance and for cost effectiveness.)

Moving to AWS can help to strengthen your security posture by eliminating many of the risks present in on-premises deployments. It is still essential to consider how to best use AWS security controls and mechanisms to ensure the security of your workload. Security can often be a significant concern in lift-and-shift workloads, especially for legacy workloads where modern encryption and security features may not be present. By making use of AWS security features, you can significantly improve the security posture of a lift-and-shift workload, even if it lacks native support for modern security best practices.

Adding TLS with Application Load Balancers

Legacy applications are often the subject of a lift-and-shift. Such migrations can help reduce risks by moving away from out-of-date hardware, but security risks are often harder to manage. Many legacy applications leverage HTTP or other plaintext protocols that are vulnerable to all manner of attacks. Often, modifying a legacy application's codebase to implement TLS is untenable, necessitating other options.

One comparatively simple approach is to leverage an Application Load Balancer or a Classic Load Balancer to provide SSL offloading. In this scenario, the load balancer is exposed to users, while the application servers that only support plaintext protocols reside within a subnet that can only be accessed by the load balancer. The load balancer performs the decryption of all traffic destined for the application instances, forwarding the plaintext traffic to them. This allows you to use encryption on traffic between the client and the load balancer, leaving only internal communication between the load balancer and the application in plaintext. Often this approach is sufficient to meet security requirements. In more stringent scenarios, however, it is never acceptable for traffic to be transmitted in plaintext, even within a secured subnet; in that case, a sidecar can be used to ensure plaintext traffic never traverses the network.
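
As a minimal sketch (all ARNs are placeholders, and the certificate is assumed to live in AWS Certificate Manager), adding an HTTPS listener that forwards to the plaintext application looks like this with boto3:

# Minimal sketch: terminate TLS at an Application Load Balancer and forward
# traffic to the legacy application's target group over plaintext HTTP.
# All ARNs are placeholders; the certificate is assumed to be in ACM.
import boto3

elbv2 = boto3.client("elbv2")
elbv2.create_listener(
    LoadBalancerArn="arn:aws:elasticloadbalancing:us-east-1:111122223333:loadbalancer/app/legacy/abc123",
    Protocol="HTTPS",
    Port=443,
    Certificates=[{"CertificateArn": "arn:aws:acm:us-east-1:111122223333:certificate/example"}],
    DefaultActions=[{
        "Type": "forward",
        # The target group points at the plaintext-only app instances
        "TargetGroupArn": "arn:aws:elasticloadbalancing:us-east-1:111122223333:targetgroup/legacy/def456",
    }],
)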

Improving Security and Configuration Management with Sidecars

One approach to providing encryption to legacy applications is to leverage what's often termed the "sidecar pattern." The sidecar pattern entails a second process acting as a proxy to the legacy application. The legacy application only exposes its services via the local loopback adapter and is thus accessible only to the sidecar. In turn, the sidecar acts as an encrypted proxy, exposing the legacy application's API to external consumers via TLS. As unencrypted traffic between the sidecar and the legacy application traverses only the loopback adapter, it never crosses the network. This approach can help add encryption (or stronger encryption) to legacy applications when it's not feasible to modify the original codebase. A common approach to implementing sidecars is through container groups, such as a pod in Amazon EKS or a task in Amazon ECS.


Figure 1: Implementing the Sidecar Pattern With Containers

Another use of the sidecar pattern is to help legacy applications leverage modern cloud services. A common example of this is using a sidecar to manage files pertaining to the legacy application. This could entail a number of options including:

  • Having the sidecar dynamically modify the configuration for a legacy application based upon some external factor, such as the output of a Lambda function, an SNS event, or a DynamoDB write.
  • Having the sidecar write application state to a cache or database. Often applications will write state to the local disk. This can be problematic for autoscaling or disaster recovery, where having the state easily accessible to other instances is advantageous. To facilitate this, the sidecar can write state to Amazon S3, Amazon DynamoDB, Amazon ElastiCache, or Amazon RDS.

A sidecar requires custom development, but it doesn't require any modification of the lift-and-shifted application. A sidecar treats the application as a black box and interacts with it via its API, configuration file, or other standard mechanism.
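
As a hedged sketch of the state-syncing variant (the paths, bucket name, and interval are assumptions), a minimal sidecar could be as simple as:

# Minimal sketch of a state-syncing sidecar: the legacy application keeps
# writing state to local disk, and the sidecar periodically pushes it to S3
# so other instances can recover it. Paths, bucket, and interval are
# assumptions.
import time

import boto3

s3 = boto3.client("s3")
STATE_FILE = "/var/app/state.dat"  # written by the legacy application
BUCKET = "legacy-app-state"        # placeholder bucket

while True:
    s3.upload_file(STATE_FILE, BUCKET, "instances/app-01/state.dat")
    time.sleep(60)  # tune to the application's tolerance for state loss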

Automating Security

A lift-and-shift can achieve a significantly stronger security posture by incorporating elements of DevSecOps. DevSecOps is a philosophy that holds that everyone is responsible for security and advocates for automating all parts of the security process. AWS has a number of services that can help implement a DevSecOps strategy. These services include:

  • Amazon GuardDuty: a continuous monitoring system that analyzes AWS CloudTrail events, Amazon VPC Flow Logs, and DNS logs. GuardDuty can detect threats and trigger an automated response.
  • AWS Shield: a managed DDoS protection service
  • AWS WAF: a managed Web Application Firewall
  • AWS Config: a service for assessing, tracking, and auditing changes to AWS configuration

These services can help detect security problems and implement a response in real time, achieving a significantly stronger posture than traditional security strategies. You can build a DevSecOps strategy around a lift-and-shift workload using these services, without having to modify the lift-and-shift application.

Conclusion

There are many opportunities for taking advantage of AWS services and features to improve a lift-and-shift workload. Without any alteration to the application you can strengthen your security posture by utilizing AWS security services and by making small environmental and architectural changes that can help alleviate the challenges of legacy workloads.

About the author

Dr. Jonathan Shapiro-Ward is an AWS Solutions Architect based in Toronto. He helps customers across Canada to transform their businesses and build industry leading cloud solutions. He has a background in distributed systems and big data and holds a PhD from the University of St Andrews.

Optimizing a Lift-and-Shift for Cost Effectiveness and Ease of Management

Post Syndicated from Jonathan Shapiro-Ward original https://aws.amazon.com/blogs/architecture/optimizing-a-lift-and-shift-for-cost/

Lift-and-shift is the process of migrating a workload from on-premises to AWS with little or no modification. A lift-and-shift is a common route for enterprises to move to the cloud, and can be a transitional state on the way to a more cloud-native approach. This is the second blog post in a three-part series which investigates how to optimize a lift-and-shift workload. The first post is about performance.

A key concern that many customers have with a lift-and-shift is cost. If you move an application as is from on-prem to AWS, is there any possibility for meaningful cost savings? By employing AWS services in lieu of self-managed EC2 instances, and by leveraging cloud capabilities such as auto scaling, there is potential for significant cost savings. In this blog post, we will discuss a number of AWS services and solutions that you can leverage with minimal or no change to your application codebase in order to significantly reduce management costs and overall Total Cost of Ownership (TCO).

Automate

Even if you can’t modify your application, you can change the way you deploy it. Adopting an infrastructure-as-code approach can vastly improve the ease of management of your application, thereby reducing cost. By templating your application through AWS CloudFormation, AWS OpsWorks, or open source tools, you can make deploying and managing your workloads a simple and repeatable process.

As part of the lift-and-shift process, rationalizing the workload into a set of templates means less time is spent in the future deploying and modifying the workload. It enables the easy creation of dev/test environments, facilitates blue-green testing, opens up options for DR, and gives you the option to roll back in the event of error. Automation is the single step most conducive to improving ease of management.

Reserved Instances and Spot Instances

An initial consideration around cost should be the purchasing model for any EC2 instances. Reserved Instances (RIs) represent a 1-year or 3-year commitment to EC2 instances and can enable up to 75% cost reduction (over On-Demand) for steady-state EC2 workloads. They are ideal for 24/7 workloads that must be continually in operation. An application requires no modification to make use of RIs.

An alternative purchasing model is EC2 spot. Spot instances offer unused capacity available at a significant discount – up to 90%. Spot instances receive a two-minute warning when the capacity is required back by EC2 and can be suspended and resumed. Workloads which are architected for batch runs – such as analytics and big data workloads – often require little or no modification to make use of spot instances. Other burstable workloads such as web apps may require some modification around how they are deployed.

A final alternative is on-demand. For workloads that are not running in perpetuity, on-demand is ideal. Workloads can be deployed, used for as long as required, and then terminated. By leveraging some simple automation (such as AWS Lambda and CloudWatch alarms), you can schedule workloads to start and stop at the open and close of business (or at other meaningful intervals). This typically requires no modification to the application itself. For workloads that are not 24/7 steady state, this can provide greater cost effectiveness compared to RIs and more certainty and ease of use when compared to spot.
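
A minimal sketch of such a scheduler (the tag key and value are assumptions) is a Lambda handler invoked by two scheduled CloudWatch Events rules, one passing {"action": "start"} at open of business and one passing {"action": "stop"} at close:

# Minimal sketch: a Lambda handler that starts or stops all instances
# tagged Schedule=office-hours. Two scheduled CloudWatch Events rules
# would invoke it with {"action": "start"} or {"action": "stop"}.
# The tag key and value are assumptions.
import boto3

ec2 = boto3.client("ec2")

def handler(event, context):
    reservations = ec2.describe_instances(
        Filters=[{"Name": "tag:Schedule", "Values": ["office-hours"]}]
    )["Reservations"]
    instance_ids = [i["InstanceId"] for r in reservations for i in r["Instances"]]
    if not instance_ids:
        return
    if event.get("action") == "start":
        ec2.start_instances(InstanceIds=instance_ids)
    else:
        ec2.stop_instances(InstanceIds=instance_ids)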

Amazon FSx for Windows File Server

Amazon FSx for Windows File Server provides a fully managed Windows filesystem that has full compatibility with SMB and DFS and full AD integration. Amazon FSx is an ideal choice for lift-and-shift architectures as it requires no modification to the application codebase in order to enable compatibility. Windows based applications can continue to leverage standard, Windows-native protocols to access storage with Amazon FSx. It enables users to avoid having to deploy and manage their own fileservers – eliminating the need for patching, automating, and managing EC2 instances. Moreover, it’s easy to scale and minimize costs, since Amazon FSx offers a pay-as-you-go pricing model.

Amazon EFS

Amazon Elastic File System (EFS) provides high-performance, highly available multi-attach storage via NFS. EFS offers a drop-in replacement for existing NFS deployments. This is ideal for a range of Linux and Unix use cases as well as cross-platform solutions such as enterprise Java applications. EFS eliminates the need to manage NFS infrastructure and simplifies storage concerns. Moreover, EFS provides high availability out of the box, which helps to reduce single points of failure and avoids the need to manually configure storage replication. Much like Amazon FSx, EFS enables customers to realize cost improvements by moving to a pay-as-you-go pricing model, and it requires no modification of the application.

Amazon MQ

Amazon MQ is a managed message broker service that provides compatibility with JMS, AMQP, MQTT, OpenWire, and STOMP. These are amongst the most extensively used middleware and messaging protocols and are a key foundation of enterprise applications. Rather than having to manually maintain a message broker, Amazon MQ provides a performant, highly available managed message broker service that is compatible with existing applications.

Applications that leverage a standard messaging protocol can, in most cases, use Amazon MQ without any modification; all you need to do is update the broker endpoint in the application’s configuration. Subsequently, the Amazon MQ service handles the heavy lifting of operating a message broker, configuring HA, fault detection, failure recovery, software updates, and so forth. This offers a simple option for reducing management overhead and improving the reliability of a lift-and-shift architecture. What’s more, applications can migrate to Amazon MQ without any downtime, making this an easy and effective way to improve a lift-and-shift.

You can also use Amazon MQ to integrate legacy applications with modern serverless applications. Lambda functions can subscribe to MQ topics and trigger serverless workflows, enabling compatibility between legacy and new workloads.
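
As a hedged sketch of how little usually changes, here is a STOMP client pointed at an Amazon MQ broker endpoint. The endpoint, credentials, and queue name are placeholders, and stomp.py is just one of several STOMP client libraries that could be used:

# Minimal sketch: point an existing STOMP client at an Amazon MQ broker
# endpoint; the endpoint, credentials, and queue name are placeholders.
import stomp  # stomp.py, one of several STOMP client libraries

HOST = "b-1234abcd-1.mq.us-east-1.amazonaws.com"  # placeholder broker endpoint
PORT = 61614                                      # STOMP+SSL port on Amazon MQ

conn = stomp.Connection(host_and_ports=[(HOST, PORT)])
conn.set_ssl(for_hosts=[(HOST, PORT)])             # Amazon MQ requires TLS
conn.connect("mq_user", "mq_password", wait=True)  # placeholder credentials
conn.send(destination="/queue/orders", body="order-created")
conn.disconnect()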


Figure 1: Integrating Lift-and-Shift Workloads with Lambda via Amazon MQ

Amazon Managed Streaming Kafka

Lift-and-shift workloads that include a streaming data component are often built around Apache Kafka. There is a certain amount of complexity involved in operating a Kafka cluster, which incurs management and operational expense. Amazon Kinesis is a managed alternative to Apache Kafka, but it is not a drop-in replacement. At re:Invent 2018, we announced the launch of Amazon Managed Streaming Kafka (MSK) in public preview. MSK provides a managed Kafka deployment with pay-as-you-go pricing and acts as a drop-in replacement for existing Kafka workloads. MSK can help reduce management costs and improve cost efficiency and is ideal for lift-and-shift workloads.

Leveraging S3 for Static Web Hosting

A significant portion of any web application is static content. This includes videos, images, text, and other content that changes seldom, if ever. In many lifted-and-shifted applications, web servers are migrated to EC2 instances and host all content, static and dynamic. Hosting static content from an EC2 instance incurs a number of costs, including the instance itself, EBS volumes, and likely a load balancer. By moving static content to S3, you can significantly reduce the amount of compute required to host your web applications. In many cases, this change is non-disruptive and can be made at the DNS or CDN layer, requiring no change to your application.


Figure 2: Reducing Web Hosting Costs with S3 Static Web Hosting
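As a rough sketch of the mechanics, the following boto3 snippet enables static website hosting on an existing bucket and uploads a piece of content. The bucket name and object keys are placeholders, and public read access (via a bucket policy or a CDN) would still need to be configured:

```python
import boto3

s3 = boto3.client("s3")

# Enable static website hosting on an existing bucket (placeholder name).
s3.put_bucket_website(
    Bucket="example-static-assets",
    WebsiteConfiguration={
        "IndexDocument": {"Suffix": "index.html"},
        "ErrorDocument": {"Key": "error.html"},
    },
)

# Upload a piece of static content with the appropriate content type.
s3.put_object(
    Bucket="example-static-assets",
    Key="index.html",
    Body=b"<html><body>Hello from S3</body></html>",
    ContentType="text/html",
)
```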

Conclusion

There are numerous opportunities for reducing the cost of a lift-and-shift. Even without any modification to the application, lift-and-shift workloads can benefit from cloud-native features. By using AWS services and features, you can significantly reduce the undifferentiated heavy lifting inherent in on-prem workloads and cut both resource and management overhead.

About the author

Dr. Jonathan Shapiro-Ward is an AWS Solutions Architect based in Toronto. He helps customers across Canada to transform their businesses and build industry leading cloud solutions. He has a background in distributed systems and big data and holds a PhD from the University of St Andrews.

Optimizing a Lift-and-Shift for Performance

Post Syndicated from Jonathan Shapiro-Ward original https://aws.amazon.com/blogs/architecture/optimizing-a-lift-and-shift-for-performance/

Many organizations begin their cloud journey with a lift-and-shift of applications from on-premises to AWS. This approach involves migrating software deployments with little or no modification. A lift-and-shift avoids a potentially expensive application rewrite but can result in a less optimal workload than a cloud-native solution. For many organizations, a lift-and-shift is a transitional stage on the way to an eventual cloud-native solution, but some applications can't feasibly be made cloud-native, such as legacy systems or proprietary third-party solutions. There are still clear benefits to moving these workloads to AWS, but how can they be best optimized?

In this blog series, we'll look at different approaches to optimizing a black-box lift-and-shift. We'll consider how to significantly improve a lift-and-shift application from three perspectives: performance, cost, and security. We'll show that, without modifying the application, we can integrate services and features that make a lift-and-shift workload cheaper, faster, more secure, and more reliable. In this first post, we'll investigate how a lift-and-shift workload can achieve improved performance by leveraging AWS features and services.

Performance gains are often a motivating factor behind a cloud migration. On-premises systems may suffer from performance bottlenecks owing to legacy infrastructure or capacity issues. When performing a lift-and-shift, how can you improve performance? Cloud computing is famous for enabling horizontally scalable architectures, but many legacy applications don't support this mode of operation. Traditional business applications are often architected around a fixed number of servers and are unable to take advantage of horizontal scalability. Even if a lift-and-shift can't make use of Auto Scaling groups and horizontal scalability, you can achieve significant performance gains by moving to AWS.

Scaling Up

The easiest way to scale up compute is vertical scaling. AWS provides the widest selection of virtual machine types and the largest machine sizes. Instances range from the small, burstable T3 series all the way to the memory-optimized X1 series. By selecting the appropriate instance, lift-and-shifts can realize significant performance gains. Depending on your workload, you can also swap out the instances that power your workload to better meet demand. For example, on days when you anticipate high load you could move to more powerful instances. This could easily be automated via a Lambda function, as sketched below.
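A minimal sketch of such a function follows; the instance ID and target type are placeholders, and a production version would add error handling plus a schedule or event to trigger it:

```python
import boto3

ec2 = boto3.client("ec2")

def handler(event, context):
    """Resize an instance ahead of anticipated load (placeholder values)."""
    instance_id = event.get("instance_id", "i-0123456789abcdef0")
    target_type = event.get("target_type", "x1e.32xlarge")

    # An instance must be stopped before its type can be changed.
    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

    ec2.modify_instance_attribute(
        InstanceId=instance_id,
        InstanceType={"Value": target_type},
    )

    ec2.start_instances(InstanceIds=[instance_id])
```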

The X1 family of instances offers considerable CPU, memory, storage, and network performance and can be used to accelerate applications that are designed to maximize single-machine performance. The x1e.32xlarge instance, for example, offers 128 vCPUs, 4 TB of RAM, and 14,000 Mbps of EBS bandwidth. This instance is ideal for high-performance in-memory workloads such as real-time financial risk processing or SAP HANA.

By selecting the appropriate instance type and scaling that instance up and down to meet demand, you can achieve superior performance and cost effectiveness compared to running a single static instance. This affords lift-and-shift workloads far greater efficiency than their on-prem counterparts.

Placement Groups and C5n Instances

EC2 placement groups determine how instances are deployed onto underlying hardware. You can either cluster instances into a low-latency group within a single Availability Zone or spread instances across distinct underlying hardware. Both types of placement group are useful for optimizing lift-and-shifts.

The spread placement group is valuable for applications that rely on a small number of critical instances. If you can't modify your application to leverage auto scaling, liveness probes, or failover, then spread placement groups can help reduce the risk of simultaneous failure while improving the overall reliability of the application.

Cluster placement groups help improve network QoS between instances. When used in conjunction with enhanced networking, cluster placement groups help to ensure low latency, high throughput, and a high rate of network packets per second. This is beneficial for chatty applications and for any application that leveraged physical co-location for performance on-prem.

There is no additional charge for using placement groups.

You can extend this approach further with C5n instances. These instances offer 100 Gbps networking and can be used in a placement group for the most demanding network-intensive workloads. Using placement groups and C5n instances requires no modification to your application, only to how it is deployed, making this a strong solution for providing network performance to lift-and-shift workloads.
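As a sketch, deploying into a cluster placement group changes only the launch call, not the application; the AMI ID and group name below are placeholders:

```python
import boto3

ec2 = boto3.client("ec2")

# Create a cluster placement group for low-latency, high-throughput networking.
ec2.create_placement_group(GroupName="example-cluster-pg", Strategy="cluster")

# Launch network-intensive instances into the group (placeholder AMI ID).
ec2.run_instances(
    ImageId="ami-0123456789abcdef0",
    InstanceType="c5n.18xlarge",
    MinCount=2,
    MaxCount=2,
    Placement={"GroupName": "example-cluster-pg"},
)
```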

Leverage Tiered Storage to Optimize for Price and Performance

AWS offers a range of storage options, each with its own performance characteristics and price point. By leveraging a combination of storage types, lift-and-shifts can meet their performance and availability requirements in a cost-effective manner. The storage options include the following:

Amazon EBS is the most common storage service involved in lift-and-shifts. EBS provides block storage that can be attached to EC2 instances and formatted with a typical file system such as NTFS or ext4. There are several different EBS volume types, ranging from inexpensive magnetic storage to highly performant provisioned IOPS SSDs. There are also storage-optimized instances that offer high-performance EBS access and NVMe storage. By choosing the appropriate EBS volume type and instance, you can strike a balance between performance and price. RAID offers a further option for optimizing EBS. EBS automatically replicates data within its Availability Zone at no additional cost, but an EC2 instance can also apply RAID levels across multiple volumes. For instance, you can apply RAID 0 over a number of EBS volumes in order to improve storage performance.
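For illustration, the snippet below provisions an io1 volume at a chosen IOPS rate and attaches it to an instance; the Availability Zone, instance ID, and sizing are placeholders:

```python
import boto3

ec2 = boto3.client("ec2")

# Create a provisioned IOPS SSD volume for an IO-heavy workload (placeholders).
volume = ec2.create_volume(
    AvailabilityZone="us-east-1a",
    Size=500,          # GiB
    VolumeType="io1",
    Iops=10000,
)

ec2.get_waiter("volume_available").wait(VolumeIds=[volume["VolumeId"]])

# Attach it to the lifted-and-shifted instance; the OS then formats and
# mounts it like any other disk.
ec2.attach_volume(
    VolumeId=volume["VolumeId"],
    InstanceId="i-0123456789abcdef0",
    Device="/dev/sdf",
)
```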

In addition to EBS, EC2 instances can utilize the EC2 instance store. The instance store provides ephemeral, directly attached storage that is included with the EC2 instance, offering a place to store non-persistent data. This makes it ideal for temporary files produced by an application that require performant storage. Both EBS and the instance store are exposed to the EC2 instance as block-level devices, and the OS can use its native tools to format and mount these volumes like a traditional disk, requiring no significant departure from the on-prem configuration. Several instance types, including the C5d and P3dn, are equipped with local NVMe storage that can support extremely IO-intensive workloads.

Not all workloads require high-performance storage. In many cases, finding a compromise between price and performance is the top priority. Amazon S3 provides highly durable object storage at a significantly lower price point than block storage. S3 is ideal for a large number of use cases, including content distribution, data ingestion, analytics, and backup. S3, however, is accessed via a RESTful API and does not provide the conventional file system semantics of EBS. This may make S3 less viable for applications that you can't easily modify, but there are still options for using S3 in such a scenario.

One option for leveraging S3 is AWS Storage Gateway. Storage Gateway is a virtual appliance that can run on-prem or on EC2. The appliance can operate in three configurations: file gateway, volume gateway, and tape gateway. File gateway provides an NFS interface, volume gateway provides an iSCSI interface, and tape gateway provides an iSCSI virtual tape library interface. This allows files, volumes, and tapes to be exposed to an application host through conventional protocols, with the Storage Gateway appliance persisting data to S3. The application remains agnostic to S3 while leveraging typical enterprise storage protocols.


Figure 1: Using S3 Storage via Storage Gateway

Conclusion

A lift-and-shift can achieve significant performance gains on AWS by making use of a range of instance types, storage services, and other features. Even without any modification to the application, lift-and-shift workloads can benefit from cutting-edge compute, network, and IO capabilities, helping realize meaningful performance gains.

About the author

Dr. Jonathan Shapiro-Ward is an AWS Solutions Architect based in Toronto. He helps customers across Canada to transform their businesses and build industry leading cloud solutions. He has a background in distributed systems and big data and holds a PhD from the University of St Andrews.

New – Hibernate Your EC2 Instances

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-hibernate-your-ec2-instances/

As you know, you can easily build highly scalable AWS applications that launch fresh EC2 instances on an as-needed basis. While the instances can be up and running in a matter of seconds, booting the operating system and the application can take considerable time. Also, caches and other memory-centric application components can take some time (sometimes tens of minutes) to preload or warm up. Both of these factors impose a delay that can force you to over-provision in case you need incremental capacity very quickly.

Hibernation for EC2 Instances
Today we are giving you the ability to launch EC2 instances, set them up as desired, hibernate them, and then bring them back to life when you need them. The hibernation process stores the in-memory state of the instance, along with its private and elastic IP addresses, allowing it to pick up exactly where it left off.

This feature is available today and you can use it on freshly launched M3, M4, M5, C3, C4, C5, R3, R4, and R5 instances running Amazon Linux 1 (support for Amazon Linux 2 is in the works and will be ready soon). It applies to On-Demand instances and instances running with Reserved Instance coverage.

When an instance is instructed to hibernate, it writes the in-memory state to a file in the root EBS volume and then (in effect) shuts itself down. The AMI used to launch the instance must be encrypted, as must the root EBS volume of the instance. The encryption ensures proper protection for sensitive data when it is copied from memory to the EBS volume.

While the instance is in hibernation, you pay only for the EBS volumes and Elastic IP Addresses attached to it; there are no other hourly charges (just like any other stopped instance).

Hibernation in Action
In order to check out this feature I launch a c4.large instance, and select hibernation as a stop behavior:

I also expand my instance’s root volume, adding 10 GB + the memory size of the instance to the desired size:

I also create and associate an Elastic IP address with my instance since the public IP address will change. My instance is up and running, and I can check the uptime:

Then I select the instance in the EC2 Console and choose Stop – Hibernate from the Instance State menu (API and CLI support is also available):
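(For reference, the API path looks roughly like the boto3 sketch below; the instance ID is a placeholder, and the instance must have been launched with hibernation enabled.)

```python
import boto3

ec2 = boto3.client("ec2")

# Hibernation must be enabled at launch time:
# ec2.run_instances(..., HibernationOptions={"Configured": True})

instance_id = "i-0123456789abcdef0"  # placeholder

# Stop - Hibernate: persists in-memory state to the encrypted root EBS volume.
ec2.stop_instances(InstanceIds=[instance_id], Hibernate=True)

# Later, bring the instance back exactly where it left off.
ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])
ec2.start_instances(InstanceIds=[instance_id])
```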

The instance state transitions from running to stopping, and then to stopped, in seconds:

The console provides additional information about the transition:

The SSH connection to the instance drops, since it is no longer running:

Later, when I am ready to proceed, I click Start:

This time the state goes from stopped to pending, and then to running, again in seconds, and I can reconnect. I can then use uptime to see that the instance has not been rebooted, but has continued from where it left off:

If I were using this instance interactively, I could use a session manager such as screen, tmux, or mosh to make this totally seamless. However, the most interesting use cases for hibernation revolve around long-running processes and services that take a lot of time to initialize before they are ready to accept traffic.

Things to Know
As you can see, hibernation is really easy to use, and I hope that you are already thinking of some ways to apply it to your application. Here are a couple of things to keep in mind:

Instance Type – You can enable and use hibernation on freshly launched instances of the types listed above.

Root Volume Size – The root volume must have free space equal to the amount of RAM on the instance in order for the hibernation to succeed.

Operating Systems – The newest Amazon Linux 1 AMIs are configured for hibernation, with many others in the works. You will need to create an encrypted AMI, using one of these AMIs as a base. You can also follow our directions to customize and use your own AMI.

Modifications – You cannot modify the instance size or type while it is in hibernation, but you can modify the user data and the EBS Optimization setting.

Pricing – While the instance is in hibernation, you pay only for the EBS storage and any Elastic IP addresses attached to the instance.

Performance – The time to hibernate or resume is dependent on the memory size of the instance, the amount of in-memory data to be saved, and the throughput of the root EBS volume.

Coming Soon – We are working on support for Amazon Linux 2, Ubuntu, Windows Server 2008 R2, Windows Server 2012, Windows Server 2012 R2, and Windows Server 2016, along with the SQL Server variants of the Windows AMIs.

Available Now
This feature is available now in the US East (N. Virginia, Ohio), US West (N. California, Oregon), Canada (Central), South America (São Paulo), Asia Pacific (Mumbai, Seoul, Singapore, Sydney, Tokyo), and EU (Frankfurt, London, Ireland, Paris) Regions.

Jeff;