Tag Archives: Compute

Running the most reliable choice for Windows workloads: Windows on AWS

Post Syndicated from Sandy Carter original https://aws.amazon.com/blogs/compute/running-the-most-reliable-choice-for-windows-workloads-windows-on-aws/

Some of you may not know, but AWS began supporting Microsoft Windows workloads on AWS in 2008—over 11 years ago. Year over year, we have released exciting new services and enhancements based on feedback from customers like you. AWS License Manager and Amazon CloudWatch Application Insights for .NET and SQL Server are just some of the recent examples. The rate and pace of innovation is eye-popping.

In addition to innovation, one of the key areas that companies value is the reliability of the cloud platform. I recently chatted with David Sheehan, DevOps engineer at eMarketer. He told me, “Our move from Azure to AWS improved the performance and reliability of our microservices in addition to significant cost savings.” If a healthcare clinic can’t connect to the internet, it may not be able to deliver care to its patients. If a bank can’t process transactions because of an outage, it could lose business.

In 2018, the next-largest cloud provider had almost 7x more downtime hours than AWS, according to data pulled directly from the public service health dashboards of the major cloud providers. That reliability is one reason companies like Edwards Lifesciences chose AWS. They are a global leader in patient-focused medical innovations for structural heart disease, as well as critical care and surgical monitoring. Rajeev Bhardwaj, the senior director for Enterprise Technology, recently told me, “We chose AWS for our data center workloads, including Windows, based on our assessment of the security, reliability, and performance of the platform.”

There are several reasons why AWS delivers a more reliable platform for Microsoft workloads, but I would like to focus on two here: designing for reliability and scaling within a Region.

Reason #1—It’s designed for reliability

AWS has significantly better reliability than the next largest cloud provider, due to our fundamentally better global infrastructure design based on Regions and Availability Zones. The AWS Cloud spans 64 zones within 21 geographic Regions around the world. We’ve announced plans for 12 more zones and four more Regions in Bahrain, Cape Town, Jakarta, and Milan.

Consider networking capabilities across five key areas: security, global coverage, performance, manageability, and availability. AWS has made deep investments in each of these areas over the past 12 years. We want to ensure that AWS has the networking capabilities required to run the world’s most demanding workloads.

There is no compression algorithm for experience. From running the most extensive, reliable, and secure global cloud infrastructure technology platform, we’ve learned that you care about the availability and performance of your applications. You want to deploy applications across multiple zones in the same Region for fault tolerance and latency.

I want to take a moment to emphasize that our approach to building our network is fundamentally different from our competitors’, and that difference matters. Each of our Regions is fully isolated from all other Regions. Unlike virtually every other cloud provider, each AWS Region has multiple zones and data centers. Each zone is a fully isolated partition of our infrastructure that contains up to eight separate data centers.

The zones are connected to each other with fast, private fiber-optic networking, enabling you to easily architect applications that automatically fail over between zones without interruption. Each zone has its own power infrastructure and is physically separated from the others by a meaningful distance of many kilometers. You can partition applications across multiple zones in the same Region to better isolate any issues and achieve high availability.

The AWS control plane (including APIs) and AWS Management Console are distributed across AWS Regions. They use a Multi-AZ architecture within each Region to deliver resilience and ensure continuous availability. This ensures that you avoid having a critical service dependency on a single data center.

While other cloud vendors claim to have Availability Zones, they do not have the same stringent requirements for isolation between zones, leading to impact across multiple zones. Furthermore, AWS has more zones and more Regions with support for multiple zones than any other cloud provider. This design is why the next largest cloud provider had almost 7x more downtime hours in 2018 than AWS.

Reason #2—Scale within a Region

We also designed our services into smaller cells that scale out within a Region, as opposed to a single Region-wide instance that scales up. This approach reduces the blast radius when there is a cell-level failure. It is why AWS—unlike other providers—has never experienced a network event spanning multiple Regions.

AWS also provides the most detailed information on service availability via the Service Health Dashboard, including Regions affected, services impacted, and downtime duration. AWS keeps a running log of all service interruptions for the past year. Finally, you can subscribe to an RSS feed to be notified of interruptions to each individual service.

Reliability matters

Running Windows workloads on AWS means that you not only get the most innovative cloud, but also the most reliable one.

For example, Mary Kay is one of the world’s leading direct sellers of skin care products and cosmetics. They have tens of thousands of employees and beauty consultants working outside the office, so the IT system is fundamental for the success of their company.

Mary Kay used Availability Zones and Microsoft Active Directory to architect their applications on AWS. AWS Managed Microsoft AD provided Mary Kay with the features they needed to deploy SQL Server Always On availability groups on Amazon EC2 Windows. This configuration gave Mary Kay the control to scale their deployment out to meet their performance requirements. They were able to deploy the service in multiple Regions to support users worldwide. Their on-premises users get the same experience when using Active Directory–aware services, whether on-premises or in the AWS Cloud.

Now, with our cross-account and cross-VPC support, Mary Kay is looking at reducing their managed Active Directory infrastructure footprint, saving money and reducing complexity. But this identity management system must be reliable and scalable as well as innovative.

Fugro is a Dutch multinational company that provides geotechnical, survey, subsea, and geoscience services for clients, typically oil and gas, telecommunications cable, and infrastructure companies. Fugro leverages the cloud to support the delivery of geo-intelligence and asset management services for clients globally in industries including onshore and offshore energy, renewables, power, and construction.

As I was chatting with Scott Carpenter, the global cloud architect for Fugro, he said, “Fugro is also now in the process of migrating a complex ESRI ArcGIS environment from an existing cloud provider to AWS. It is going to centralize and accelerate access from existing AWS hosted datasets, while still providing flexibility and interoperability to external and third-party data sources. The ArcGIS migration is driven by a focus on providing the highest level of operational excellence.”

With AWS, you don’t have to be concerned about reliability. AWS has the reliability and scale that drives innovation for Windows applications running in the cloud. And the reliability that makes AWS best for your Windows applications is the same reliability that makes it the best cloud for all your applications.

Let AWS help you assess how your company can get the most out of cloud. Join all the AWS customers that trust us to run their most important applications in the best cloud. To have us create an assessment for your Windows applications or all your applications, email us at [email protected].

Optimizing Network Intensive Workloads on Amazon EC2 A1 Instances

Post Syndicated from Martin Yip original https://aws.amazon.com/blogs/compute/optimizing-network-intensive-workloads-on-amazon-ec2-a1-instances/

This post courtesy of Ali Saidi, AWS, Principal Engineer

At re:Invent 2018, AWS announced the Amazon EC2 A1 instance. The A1 instances are powered by our internally developed Arm-based AWS Graviton processors and are up to 45% less expensive than other instance types with the same number of vCPUs and amount of DRAM. These instances are based on the AWS Nitro System and offer enhanced networking of up to 10 Gbps with Elastic Network Adapters (ENA).

One of the use cases for the A1 instance is key-value stores, and in this post we describe how to get the most performance from an A1 instance running memcached. Some simple configuration options increase the performance of memcached by 3.9X over the out-of-the-box experience, as we’ll show below. Although we focus on memcached, the configuration advice is similar for any network-intensive workload running on A1 instances. Typically, the performance of network-intensive workloads improves by tuning some of these parameters; however, depending on the particular data rates and processing requirements, the optimal values could change.

irqbalance

Most Linux distributions enable irqbalance by default, which load-balances interrupts across CPUs at runtime. It does a good job of balancing interrupt load, but in some cases we can do better by pinning interrupts to specific CPUs. For our optimizations, we’re going to temporarily disable irqbalance. However, if this is a production configuration that needs to survive a server reboot, irqbalance must be permanently disabled and the changes below added to the boot sequence.
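
For example, on systemd-based distributions irqbalance can be stopped for the current boot or disabled permanently (a sketch; the exact service name can vary by distribution):

sudo systemctl stop irqbalance.service     # temporary: stops balancing until reboot
sudo systemctl disable irqbalance.service  # permanent: survives reboots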

Receive Packet Steering (RPS)

RPS controls which CPUs process packets received by the Linux networking stack (softIRQs). Depending on instance size and the amount of application processing needed per packet, sometimes the optimal configuration is to have the core receiving packets also execute the Linux networking stack; other times it’s better to spread the processing among a set of cores. For memcached on EC2 A1 instances, we found that using RPS to spread the load out is helpful on the larger instance sizes.
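
Concretely, RPS is configured per receive queue by writing a hexadecimal CPU bitmask to rps_cpus. For example, mask f (binary 1111) lets cores 0-3 process packets from queue rx-0:

echo f | sudo tee /sys/class/net/eth0/queues/rx-0/rps_cpus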

Networking Queues

A1 instances with medium, large, and xlarge instance sizes have a single queue to send and receive packets, while 2xlarge and 4xlarge instance sizes have two queues. On the single-queue instance sizes, we’ll pin the IRQ to core 0; on the dual-queue sizes, we’ll pin the IRQs either both to core 0, or to cores 0 and 8.

Instance Type | IRQ settings       | RPS settings        | Application settings
a1.xlarge     | Core 0             | Core 0              | Run on cores 1-3
a1.2xlarge    | Both on core 0     | Cores 0-3 and 4-7   | Run on cores 1-7
a1.4xlarge    | Core 0 and core 8  | Cores 0-7 and 8-15  | Run on cores 1-7 and 9-15

The following script sets up the Linux kernel parameters:

#!/bin/bash

# Stop irqbalance so that it does not override our manual IRQ pinning.
sudo systemctl stop irqbalance.service

# Pin each eth0 IRQ to the CPU given by the corresponding positional argument.
set_irq_affinity() {
  grep eth0 /proc/interrupts | awk '{print $1}' | tr -d : | while read IRQ
  do
    sudo sh -c "echo $1 > /proc/irq/$IRQ/smp_affinity_list"
    shift
  done
}

case `grep -c ^processor /proc/cpuinfo` in
  (4)  # a1.xlarge: single queue; IRQ and RPS on core 0
      sudo sh -c 'echo 1 > /sys/class/net/eth0/queues/rx-0/rps_cpus'
      set_irq_affinity 0
      ;;
  (8)  # a1.2xlarge: two queues; both IRQs on core 0, RPS on cores 0-3 and 4-7
      sudo sh -c 'echo f > /sys/class/net/eth0/queues/rx-0/rps_cpus'
      sudo sh -c 'echo f0 > /sys/class/net/eth0/queues/rx-1/rps_cpus'
      set_irq_affinity 0 0
      ;;
  (16) # a1.4xlarge: two queues; IRQs on cores 0 and 8, RPS on cores 0-7 and 8-15
      sudo sh -c 'echo ff > /sys/class/net/eth0/queues/rx-0/rps_cpus'
      sudo sh -c 'echo ff00 > /sys/class/net/eth0/queues/rx-1/rps_cpus'
      set_irq_affinity 0 8
      ;;
  *)  echo "Script only supports the 4, 8, and 16 core A1 instance sizes"
      exit 1
      ;;
esac
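
The “Application settings” column of the table above is then applied when launching the application. A minimal sketch for an a1.4xlarge (the thread count is an assumption, not a tuned value):

# Keep memcached on cores 1-7 and 9-15, leaving cores 0 and 8 to the network IRQs
sudo taskset -c 1-7,9-15 memcached -u memcache -t 14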

Summary

Some simple tuning parameters can significantly improve the performance of network-intensive workloads on the A1 instance. With these changes, we get 3.9X the performance on an a1.4xlarge, and the other two instance sizes see similar improvements. While the particular values listed here aren’t applicable to all network-intensive benchmarks, this article demonstrates the methodology and provides a starting point for tuning the system and balancing the load across CPUs to improve performance. If you have questions about your own workload running on A1 instances, please don’t hesitate to get in touch with us at [email protected].

Fact-checking the truth on TCO for running Windows workloads in the cloud

Post Syndicated from Sandy Carter original https://aws.amazon.com/blogs/compute/fact-checking-the-truth-on-tco-for-running-windows-workloads-in-the-cloud/

We’ve been talking to many customers over the last 3–4 months who are concerned about the total cost of ownership (TCO) for running Microsoft Windows workloads in the cloud.

For example, Infor is a global leader in enterprise resource planning (ERP) for manufacturing, healthcare, and retail. They’ve moved thousands of their existing Microsoft SQL Server workloads to Amazon EC2 instances. As a result, they are saving 75% on monthly backup costs. With these tremendous cost savings, Infor can now focus their resources on exponential business growth, with initiatives around AI and optimization.

We also love the story of Just Eat, a UK-based company that has migrated their SQL Server workloads to AWS. They’re now focused on using that data to train Alexa skills for ordering takeout!

Here are three fact checks that you should review to ensure that you are getting the best TCO!

Fact check #1: Microsoft’s cost comparisons are misleading for running Windows workloads in the cloud

Customers have shared with us over and over why they continue to trust AWS to run their most important Windows workloads. Still, some of those customers tell us that Microsoft claims Azure is cheaper for running Windows workloads. But can this really be true?

When looking at Microsoft’s cost comparisons, we can see that their analysis is misleading because of some false assumptions. For example, Microsoft only compares the costs of the compute service and licenses. But every workload needs storage and networking! By leaving out these necessary services, Microsoft is not comparing real-world workloads.

The comparison also assumes that the AWS and Azure offerings are at performance parity, which isn’t true. While the comparison uses equivalent virtual instance configurations, Microsoft ignores the significantly higher performance of AWS compute. We hear that customers must run two to three times as many Azure instances to get the same performance as they do on AWS (see Fact check #2).

And the list goes on. Microsoft’s analysis only looks at 2008 versions of Windows Server or SQL Server. Then, it adds in the cost for expensive Extended Support to the AWS calculation (extended support costs 75% of the current license cost per year). This addition makes up more than half of the claimed cost difference.

Microsoft assumes that in the next three years, customers won’t move off software that’s more than 10 years old. What we hear from customers is that they plan to use their upgrade rights from Software Assurance (SA) to move to newer versions, such as SQL Server 2016. They’ll use our new automated upgrade tool to eliminate the need for these expensive fees.

Finally, the comparison assumes the use of Azure Hybrid Benefit to reduce the cost of the Azure virtual instance. It does not factor in the cost of the required Microsoft SA on each license. The required SA adds significant cost to the Microsoft side of the example and further demonstrates that their example was misleading.

These assumptions result in a comparison that does not factor in all the costs needed to run SQL Server in Azure. Nor does it account for the performance gains that you get from running on AWS.

At AWS, we are committed to helping you make the most efficient use of cloud resources and lower your Microsoft bill. It appears that Microsoft is focused on keeping those line items flat or growing them over time by adding more and more licensing complexity.

Fact check #2: Price-performance matters to your business for running SQL Server in the cloud

When deciding what cloud is best for your Windows workloads, you should consider both price and performance to find the right operational combination to maximize value for your business. It is also important to think about the future and not make important platform decisions based on technology that was designed before the rise of the cloud.

We know that getting better application performance for your apps is critical to your customers’ satisfaction. In fact, excellent application performance leads to 39% higher customer satisfaction. For more information, see the Netmagic Solutions whitepaper, Application Performance Management: How End-User Experience Affects Your Bottom Line. Poor performance may lead to damaged reputations or even worse, customer attrition.

To make sure that you have the best possible experience for your customers, we focused on pushing the boundaries around performance.

With that in mind, here are some comparisons done between Azure and AWS:

  • DB Best, an enterprise database consulting company, wrote two blog posts—one each for Azure and AWS. They showed how to get the best price-performance ratio for running current versions of SQL Server in the cloud.
  • ZK Research took these posts and compared the results from DB Best to show an apples-to-apples comparison. The testing from DB Best found that SQL Server on AWS consistently shows 2–3x better performance compared to Azure, using a TPC-C-like benchmark tool called HammerDB.
  • ZK Research then used the DB Best data to calculate the comparison cost for running 1 billion transactions per month. ZK Research found that SQL Server running on Azure would cost twice as much as running on AWS, when comparing price-performance, including storage, compute, and networking.

As you can see from this data, running on AWS gives you the best price-performance ratio for Windows workloads.

Fact check #3: What does an optimized TCO for Windows workloads in the cloud look like?

When assessing which cloud to run your Windows workloads on, your comparison must go well beyond just the compute and support costs. Look at the TCO of your workloads and include everything necessary to run and support them, like storage, networking, and the cost benefits of better reliability. Then, see how you can use the cloud to lower your overall TCO.

So how do you lower your costs to run Windows workloads like Windows Server and SQL Server in the cloud? Optimize those workloads for the scalability and flexibility of cloud. When companies plan cloud migrations on their own, they often use a spreadsheet inventory of their on-premises servers and try to map them, one-to-one, to new cloud-based servers. But these inventories don’t account for the capabilities of cloud-based systems.

On-premises servers are not optimized, with 84% of workloads currently over-provisioned. Many Windows and SQL Server 2008 workloads are running on older, slower server hardware. By sizing your workloads for performance and capability, not by physical servers, you can optimize your cloud migration.

Reducing the number of licenses that you use, both by server and core counts, can also drive significant cost savings. See which on-premises workloads are fault-tolerant, and then use Amazon EC2 Spot Instances to save up to 90% on your compute costs vs. On-Demand pricing.

To get the most out of moving your Windows workloads into the cloud, review and optimize each workload to take best advantage of cloud scalability and flexibility. Our customers have made the most efficient use of cloud resources by working with assessment partners like Movere or TSO Logic, which is now part of AWS.

By running detailed assessments of their environments before migration, customers can yield up to 36% savings using AWS over three years. Customers with optimized environments often find that their AWS solutions are price-competitive with Microsoft even before taking into account the AWS price-performance advantage.

In addition, you can optimize utilization with AWS Trusted Advisor. In fact, over the last couple years, we’ve used AWS Trusted Advisor to tell customers how to spend less money with us, leading to hundreds of millions of dollars in savings for our customers every year.

Why run Windows Server and SQL Server anywhere else but AWS?

For the past 10 years, many companies, such as Adobe and Salesforce, have trusted AWS to run Windows-based workloads such as Windows Server and SQL Server. Many customers tell us they choose AWS because of its TCO and reliability. Customers have been able to run their Windows workloads with lower costs and higher performance than on any other cloud. To learn more about our story and why customers trust AWS for their Windows workloads, check out Windows on AWS.

After the workloads are optimized for cloud, you can save even more money by efficiently managing your Windows Server and SQL Server licenses with AWS License Manager. By the way, License Manager lets you manage licenses on-premises and in the cloud, as well as other software like SAP, Oracle, and IBM.

Amazon EC2 Dedicated Hosts allow customers to bring Windows Server and SQL Server licenses with or without Software Assurance. Licenses without Software Assurance cannot be taken to Azure. Furthermore, Dedicated Hosts allow customers to license Windows Server at the physical level and achieve a greater number of instances at a lower price than they would get through Azure Hybrid Use Benefits.

Summary

The answer is clear: AWS is the best cloud to run your Windows workloads. AWS offers the best experience for Windows workloads in the cloud, which is why we run almost 2x the number of Windows workloads compared to the next largest cloud.

Our customers have found that migrating their Windows workloads to AWS can yield significant savings and better performance. Customers like Sysco, Hess, Sony DADC New Media Solutions, Ancestry, and Expedia have chosen AWS to upgrade, migrate, and modernize their Windows workloads in the cloud.

Don’t let misleading cost comparisons prevent you from getting the most out of cloud. Let AWS help you assess how you can get the most out of cloud. Join all the AWS customers that trust us to run their most important applications in the best cloud for Windows workloads. If you want us to do an assessment for you, email us at [email protected].

Docker, Amazon ECS, and Spot Fleets: A Great Fit Together

Post Syndicated from Tung Nguyen original https://aws.amazon.com/blogs/aws/docker-amazon-ecs-and-spot-fleets-a-great-fit-together/

Guest post by AWS Container Hero Tung Nguyen. Tung is the president and founder of BoltOps, a consulting company focused on cloud infrastructure and software on AWS. He also enjoys writing for the BoltOps Nuts and Bolts blog.

EC2 Spot Instances allow me to use spare compute capacity at a steep discount. Using Amazon ECS with Spot Instances is probably one of the best ways to run my workloads on AWS. By using Spot Instances, I can save 50–90% on Amazon EC2 instances. You would think that folks would jump at a huge opportunity like a Black Friday sales special. However, most folks either seem to not know about Spot Instances or are hesitant. This may be due to some fallacies about Spot.

Spot Fallacies

With the Spot model, AWS can remove instances at any time. It can be due to a maintenance upgrade, high demand for that instance type, an aging instance type, or any reason whatsoever.

Hence the first fear and fallacy that people quickly point out with Spot:

What do you mean that the instance can be replaced at any time? Oh no, that must mean that within 20 minutes of launching the instance, it gets killed.

I felt the same way too initially. The actual Spot Instance Advisor website states:

The average frequency of interruption across all Regions and instance types is less than 5%.

From my own usage, I have seen instances run for weeks. Need proof? Here’s a screenshot from an instance in one of our production clusters.

If you’re wondering how many days that is…

Yes, that is 228 continuous days. You might not get these same long uptimes, but it disproves the fallacy that Spot Instances are usually interrupted within 20 minutes from launch.

Spot Fleets

With Spot Instances, I place a single request for a specific instance type in a specific Availability Zone. With Spot Fleets, instead of requesting a single instance type, I can ask for a variety of instance types that meet my requirements. For many workloads, as long as the CPU and RAM are close enough, many instance types do just fine.

So, I can spread my instance bets across instance types and multiple zones with Spot Fleets. Using Spot Fleets makes the system dramatically more robust on top of the already mentioned low interruption rate. Also, I can run an On-Demand cluster to provide additional safeguard capacity.

ECS and Spot Fleets: A Great Fit Together

This is one of my favorite ways to run workloads because it gives me a scalable system at a ridiculously low cost. The technologies are such a great fit together that one might think they were built for each other.

  1. Docker provides a consistent, standard binary format to deploy. If it works in one Docker environment, then it works in another. Containers can be pulled down in seconds, making them an excellent fit for Spot Instances, where containers might move around during an interruption.
  2. ECS provides a great ecosystem to run Docker containers. ECS supports a feature called container instance draining that allows me to tell ECS to relocate the Docker containers to other EC2 instances.
  3. Spot Instances fire off a two-minute warning signal letting me know that the instance is about to be terminated.

These are the necessary pieces I need for building an ECS cluster on top of Spot Fleet. I use the two-minute warning to trigger ECS container instance draining, and ECS automatically moves containers to another instance in the fleet.
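
A minimal sketch of one way to wire this up with a script on each container instance (it assumes IMDSv1 is reachable and that the cluster name and container instance ARN variables are set; the CloudFormation template introduced below may implement the mechanism differently):

#!/bin/bash
# Poll the instance metadata service for the Spot two-minute interruption
# notice; when it appears, put this container instance into DRAINING so that
# ECS reschedules its tasks onto other instances in the fleet.
while true; do
  if curl -fs http://169.254.169.254/latest/meta-data/spot/instance-action > /dev/null; then
    aws ecs update-container-instances-state \
      --cluster "$CLUSTER" \
      --container-instances "$CONTAINER_INSTANCE_ARN" \
      --status DRAINING
    break
  fi
  sleep 5
done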

Here’s a CloudFormation template that achieves this: ecs-ec2-spot-fleet. Because the focus is on understanding Spot Fleets, the VPC is designed to be simple.

The template specifies two instance types in the Spot Fleet: t3.small and t3.medium with 2 GB and 4 GB of RAM, respectively. The template weights the t3.medium twice as much as the t3.small. Essentially, the Spot Fleet TargetCapacity value equals the total RAM to provision for the ECS cluster. So if I specify 8, the Spot Fleet service might provision four t3.small instances or two t3.medium instances. The cluster adds up to at least 8 GB of RAM.
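
As an illustration of how that weighting can be expressed, here is a trimmed, hypothetical fragment of a Spot Fleet resource (not the literal contents of the ecs-ec2-spot-fleet template):

SpotFleetRequestConfigData:
  TargetCapacity: 8            # total GiB of RAM to provision
  LaunchSpecifications:
    - InstanceType: t3.small
      WeightedCapacity: 2      # each instance contributes 2 GiB
    - InstanceType: t3.medium
      WeightedCapacity: 4      # each instance contributes 4 GiB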

To launch the stack, I run the following command:

aws cloudformation create-stack --stack-name ecs-spot-demo --template-body file://ecs-spot-demo.yml --capabilities CAPABILITY_IAM

The CloudFormation stack launches container instances and registers them to an ECS cluster named development by default. I can change this with the EcsCluster parameter. For more information on the parameters, see the README and the template source.

When I deploy the application, the deploy tool creates the ECS cluster itself. Here are the Spot Instances in the EC2 console.

Deploy the demo app

After the Spot cluster is up, I can deploy a demo app on it. I wrote a tool called Ufo that is useful for these tasks:

  1. Build the Docker image.
  2. Register the ECS task definition.
  3. Register and deploy the ECS service.
  4. Create the load balancer.

Docker should be installed as a prerequisite. First, I create an ECR repo and set some variables:

ECR_REPO=$(aws ecr create-repository --repository-name demo/sinatra | jq -r '.repository.repositoryUri')
VPC_ID=$(aws ec2 describe-vpcs --filters Name=tag:Name,Values="demo vpc" | jq -r '.Vpcs[].VpcId')

Now I’m ready to clone the demo repo and deploy a sample app to ECS with ufo.

git clone https://github.com/tongueroo/demo-ufo.git demo
cd demo
ufo init --image $ECR_REPO --vpc-id $VPC_ID
ufo current --service demo-web
ufo ship # deploys to ECS on the Spot Fleet cluster

Here’s the ECS service running:

I then grab the Elastic Load Balancing endpoint from the console or with ufo ps.

$ ufo ps
Elb: develop-Elb-12LHJWU4TH3Q8-597605736.us-west-2.elb.amazonaws.com
$

Now I test with curl:

$ curl develop-Elb-12LHJWU4TH3Q8-597605736.us-west-2.elb.amazonaws.com
42

The application successfully returns "42," the meaning of life. That’s it! I now have an application running on ECS with Spot Fleet Instances.

Parting thoughts

One additional advantage of using Spot is that it encourages me to think about my architecture in a highly available manner. The Spot “constraints” ironically result in much better sleep at night as the system must be designed to be self-healing.

Hopefully, this post opens the world of running ECS on Spot Instances to you. It’s a core part of the systems that BoltOps has been running on its own production system and for customers. I still get excited about the setup today. If you’re interested in Spot architectures, contact me at BoltOps.

One last note: Auto Scaling groups also support running multiple instance types and purchase options. Jeff mentions in his post that weight support is planned for a future release. That’s exciting, as it may streamline the usage of Spot with ECS even further.

Learn about AWS Services & Solutions – April AWS Online Tech Talks

Post Syndicated from Robin Park original https://aws.amazon.com/blogs/aws/learn-about-aws-services-solutions-april-aws-online-tech-talks/

AWS Tech Talks

Join us this April to learn about AWS services and solutions. The AWS Online Tech Talks are live, online presentations that cover a broad range of topics at varying technical levels. These tech talks, led by AWS solutions architects and engineers, feature technical deep dives, live demonstrations, customer examples, and Q&A with AWS experts. Register Now!

Note – All sessions are free and in Pacific Time.

Tech talks this month:

Blockchain

May 2, 2019 | 11:00 AM – 12:00 PM PT – How to Build an Application with Amazon Managed Blockchain – Learn how to build an application on Amazon Managed Blockchain with the help of demo applications and sample code.

Compute

April 29, 2019 | 1:00 PM – 2:00 PM PT – How to Optimize Amazon Elastic Block Store (EBS) for Higher Performance – Learn how to optimize performance and spend on your Amazon Elastic Block Store (EBS) volumes.

May 1, 2019 | 11:00 AM – 12:00 PM PT – Introducing New Amazon EC2 Instances Featuring AMD EPYC and AWS Graviton Processors – See how new Amazon EC2 instance offerings that feature AMD EPYC processors and AWS Graviton processors enable you to optimize performance and cost for your workloads.

Containers

April 23, 2019 | 11:00 AM – 12:00 PM PT – Deep Dive on AWS App Mesh – Learn how AWS App Mesh makes it easy to monitor and control communications for services running on AWS.

March 22, 2019 | 9:00 AM – 10:00 AM PT – Deep Dive Into Container Networking – Dive deep into microservices networking and how you can build, secure, and manage the communications into, out of, and between the various microservices that make up your application.

Databases

April 23, 2019 | 1:00 PM – 2:00 PM PT – Selecting the Right Database for Your Application – Learn how to develop a purpose-built strategy for databases, where you choose the right tool for the job.

April 25, 2019 | 9:00 AM – 10:00 AM PT – Mastering Amazon DynamoDB ACID Transactions: When and How to Use the New Transactional APIs – Learn how the new Amazon DynamoDB transactional APIs simplify the developer experience of making coordinated, all-or-nothing changes to multiple items both within and across tables.

DevOps

April 24, 2019 | 9:00 AM – 10:00 AM PT – Running .NET applications with AWS Elastic Beanstalk Windows Server Platform V2 – Learn about the easiest way to get your .NET applications up and running on AWS Elastic Beanstalk.

Enterprise & Hybrid

April 30, 2019 | 11:00 AM – 12:00 PM PT – Business Case Teardown: Identify Your Real-World On-Premises and Projected AWS Costs – Discover tools and strategies to help you as you build your value-based business case.

IoT

April 30, 2019 | 9:00 AM – 10:00 AM PT – Building the Edge of Connected Home – Learn how AWS IoT edge services are enabling smarter products for the connected home.

Machine Learning

April 24, 2019 | 11:00 AM – 12:00 PM PT – Start Your Engines and Get Ready to Race in the AWS DeepRacer League – Learn more about reinforcement learning, how to build a model, and compete in the AWS DeepRacer League.

April 30, 2019 | 1:00 PM – 2:00 PM PT – Deploying Machine Learning Models in Production – Learn best practices for training and deploying machine learning models.

May 2, 2019 | 9:00 AM – 10:00 AM PT – Accelerate Machine Learning Projects with Hundreds of Algorithms and Models in AWS Marketplace – Learn how to use third party algorithms and model packages to accelerate machine learning projects and solve business problems.

Networking & Content Delivery

April 23, 2019 | 9:00 AM – 10:00 AM PT – Smart Tips on Application Load Balancers: Advanced Request Routing, Lambda as a Target, and User Authentication – Learn tips and tricks about important Application Load Balancer (ALB) features that were recently launched.

Productivity & Business Solutions

April 29, 2019 | 11:00 AM – 12:00 PM PT – Learn How to Set up Business Calling and Voice Connector in Minutes with Amazon Chime – Learn how Amazon Chime Business Calling and Voice Connector can help you with your business communication needs.

May 1, 2019 | 1:00 PM – 2:00 PM PT – Bring Voice to Your Workplace – Learn how you can bring voice to your workplace with Alexa for Business.

Serverless

April 25, 2019 | 11:00 AM – 12:00 PM PT – Modernizing .NET Applications Using the Latest Features on AWS Development Tools for .NET – Get a deep dive and demonstration of the latest updates to the AWS SDK and tools for .NET to make development even easier, more powerful, and more productive.

May 1, 2019 | 9:00 AM – 10:00 AM PT – Customer Showcase: Improving Data Processing Workloads with AWS Step Functions’ Service Integrations – Learn how innovative customers like SkyWatch are coordinating AWS services using AWS Step Functions to improve productivity.

Storage

April 24, 2019 | 1:00 PM – 2:00 PM PT – Amazon S3 Glacier Deep Archive: The Cheapest Storage in the Cloud – See how Amazon S3 Glacier Deep Archive offers the lowest cost storage in the cloud, at prices significantly lower than storing and maintaining data in on-premises magnetic tape libraries or archiving data offsite.

Getting started with the A1 instance

Post Syndicated from Martin Yip original https://aws.amazon.com/blogs/compute/getting-started-with-the-a1-instance/

This post courtesy of Ali Saidi, Annapurna Labs, Principal Systems Developer

At re:Invent 2018, AWS announced the Amazon EC2 A1 instance. These instances are based on the AWS Nitro System that powers all of our latest generation of instances, and they are the first instance types powered by the AWS Graviton Processor. These processors feature 64-bit Arm Neoverse cores and are the first general-purpose processors designed by Amazon specifically for use in AWS. The instances are up to 40% less expensive than the same number of vCPUs and amount of DRAM available in other instance types. A1 instances are currently available in the US East (N. Virginia), US East (Ohio), US West (Oregon), and EU (Ireland) Regions with the following configurations:

Model      | vCPUs | Memory (GiB) | Instance Store | Network Bandwidth | EBS Bandwidth
a1.medium  | 1     | 2            | EBS Only       | Up to 10 Gbps     | Up to 3.5 Gbps
a1.large   | 2     | 4            | EBS Only       | Up to 10 Gbps     | Up to 3.5 Gbps
a1.xlarge  | 4     | 8            | EBS Only       | Up to 10 Gbps     | Up to 3.5 Gbps
a1.2xlarge | 8     | 16           | EBS Only       | Up to 10 Gbps     | Up to 3.5 Gbps
a1.4xlarge | 16    | 32           | EBS Only       | Up to 10 Gbps     | Up to 3.5 Gbps

For further information about the instance itself, developers can watch this re:Invent talk and visit the A1 product details page.

Since introduction, we’ve been expanding the available operating systems for the instance and working with the Arm software ecosystem. This blog will provide a summary of what’s supported and how to use it.

Operating System Support

If you’re running on an open source stack, as many customers who build applications that scale out in the cloud are, the Arm ecosystem is well developed and likely already supports your application.

The A1 instance requires AMIs and software built for Arm processors. When we announced A1, we had support for Amazon Linux 2, Ubuntu 16.04 and 18.04, as well as Red Hat Enterprise Linux 7.6. A little over two months later, the set of operating systems available to our customers has grown to include Red Hat Enterprise Linux 8.0 Beta, NetBSD, Fedora Rawhide, Ubuntu 18.10, and Debian 9.8. We expect to see more operating systems, Linux distributions, and AMIs available in the coming months.

These operating systems and Linux distributions offer the same level of support for their Arm AMIs as they do for their existing x86 AMIs. In almost every case, if you’re installing packages with apt or yum, those packages exist for Arm in the OS of your choice and will run in the same way.

For example, to install PHP 7.2 on the Arm version of Amazon Linux 2 or Ubuntu we follow the exact same steps we would on an x86 based instance type:

$ sudo amazon-linux-extras enable php7.2
$ sudo yum install php

Or on Ubuntu 18.04:

$ sudo apt update
$ sudo apt install php

Containers

Containers are one of the most popular application deployment mechanisms for A1. The Elastic Container Service (ECS) already supports the A1 instance and there’s an Amazon ECS-Optimized Amazon Linux 2 AMI, and we’ll soon be launching support for Elastic Kubernetes Service (EKS). The majority of Docker Official Images hosted in Docker Hub already support 64-bit Arm systems along with x86.

We’ve further expanded support for running containers at scale with AWS Batch support for A1.

Running a container on A1

In this section, we show how to run a container on Amazon Linux 2. Many Docker Official Images (at least 76% as of this writing) already support 64-bit Arm systems, and the majority of those that don’t either have pending patches to add support or are based on commercial software.

$ sudo yum install -y docker
$ sudo service docker start
$ sudo docker run hello-world
Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
3b4173355427: Pull complete
Digest: sha256:2557e3c07ed1e38f26e389462d03ed943586f744621577a99efb77324b0fe535
Status: Downloaded newer image for hello-world:latest
 
Hello from Docker!
This message shows that your installation appears to be working correctly.
...
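
Before pulling an image, you can check whether it publishes a 64-bit Arm variant (a sketch; docker manifest is gated behind experimental CLI features on older Docker releases):

# List the architectures the image manifest advertises; look for arm64
docker manifest inspect ubuntu | grep architecture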

Running WordPress on A1

As an example of automating the running of a LAMP (Linux, Apache HTTPd, MariaDB, and PHP) stack on an A1 instance, we’ve updated a basic CloudFormation template to support the A1 instance type. We made some changes to the template to support Amazon Linux 2, but otherwise the same template works for all our instance types. The template is here and it can be launched like any other CloudFormation template.

It defaults to running on an A1 Arm instance. After the template is launched, the output is the URL of the running instance which can be accessed from a browser to observe the default WordPress home page is being served.

Summary

If you’re using open source software, everything you rely on most likely works on Arm systems today, and over the coming months we’ll be working on increasing the support and improving performance of software running on the A1 instances. If you have an open source based web tier or containerized application, give the A1 instances a try and let us know what you think. If you run into any issues, please don’t hesitate to get in touch at [email protected], via the AWS Compute Forum, or through your usual AWS support contacts. We love customers’ feedback.

Learn about AWS Services & Solutions – February 2019 AWS Online Tech Talks

Post Syndicated from Robin Park original https://aws.amazon.com/blogs/aws/learn-about-aws-services-solutions-february-2019-aws-online-tech-talks/

AWS Tech Talks

Join us this February to learn about AWS services and solutions. The AWS Online Tech Talks are live, online presentations that cover a broad range of topics at varying technical levels. These tech talks, led by AWS solutions architects and engineers, feature technical deep dives, live demonstrations, customer examples, and Q&A with AWS experts. Register Now!

Note – All sessions are free and in Pacific Time.

Tech talks this month:

Application Integration

February 20, 2019 | 11:00 AM – 12:00 PM PT – Customer Showcase: Migration & Messaging for Mission Critical Apps with S&P Global Ratings – Learn how S&P Global Ratings meets the high availability and fault tolerance requirements of their mission critical applications using Amazon MQ.

AR/VR

February 28, 2019 | 1:00 PM – 2:00 PM PT – Build AR/VR Apps with AWS: Creating a Multiplayer Game with Amazon Sumerian – Learn how to build real-world augmented reality, virtual reality and 3D applications with Amazon Sumerian.

Blockchain

February 18, 2019 | 11:00 AM – 12:00 PM PT – Deep Dive on Amazon Managed Blockchain – Explore the components of blockchain technology, discuss use cases, and do a deep dive into capabilities, performance, and key innovations in Amazon Managed Blockchain.

Compute

February 25, 2019 | 9:00 AM – 10:00 AM PT – What’s New in Amazon EC2 – Learn about the latest innovations in Amazon EC2, including new instance types, related technologies, and consumption options that help you optimize running your workloads for performance and cost.

February 27, 2019 | 1:00 PM – 2:00 PM PT – Deploy and Scale Your First Cloud Application with Amazon Lightsail – Learn how to quickly deploy and scale your first multi-tier cloud application using Amazon Lightsail.

Containers

February 19, 2019 | 9:00 AM – 10:00 AM PT – Securing Container Workloads on AWS Fargate – Explore the security controls and best practices for securing containers running on AWS Fargate.

Data Lakes & Analytics

February 18, 2019 | 1:00 PM – 2:00 PM PT – Amazon Redshift Tips & Tricks: Scaling Storage and Compute Resources – Learn about the tools and best practices Amazon Redshift customers can use to scale storage and compute resources on-demand and automatically to handle growing data volume and analytical demand.

Databases

February 18, 2019 | 9:00 AM – 10:00 AM PT – Building Real-Time Applications with Redis – Learn about Amazon’s fully managed Redis service and how it makes it easier, simpler, and faster to build real-time applications.

February 21, 2019 | 1:00 PM – 2:00 PM PT – Introduction to Amazon DocumentDB (with MongoDB Compatibility) – Get an introduction to Amazon DocumentDB (with MongoDB compatibility), a fast, scalable, and highly available document database that makes it easy to run, manage & scale MongoDB workloads.

DevOps

February 20, 2019 | 1:00 PM – 2:00 PM PT – Fireside Chat: DevOps at Amazon with Ken Exner, GM of AWS Developer Tools – Join our fireside chat with Ken Exner, GM of Developer Tools, to learn about Amazon’s DevOps transformation journey and latest practices and tools that support the current DevOps model.

End-User Computing

February 28, 2019 | 9:00 AM – 10:00 AM PT – Enable Your Remote and Mobile Workforce with Amazon WorkLink – Learn about Amazon WorkLink, a new, fully-managed service that provides your employees secure, one-click access to internal corporate websites and web apps using their mobile phones.

Enterprise & Hybrid

February 26, 2019 | 1:00 PM – 2:00 PM PT – The Amazon S3 Storage Classes – For cloud ops professionals, by cloud ops professionals. Wallace and Orion will tackle your toughest AWS hybrid cloud operations questions in this live Office Hours tech talk.

IoT

February 26, 2019 | 9:00 AM – 10:00 AM PT – Bring IoT and AI Together – Learn how to bring intelligence to your devices with the intersection of IoT and AI.

Machine Learning

February 19, 2019 | 1:00 PM – 2:00 PM PT – Getting Started with AWS DeepRacer – Learn about the basics of reinforcement learning, what’s under the hood, opportunities to get hands-on with AWS DeepRacer, and how to participate in the AWS DeepRacer League.

February 20, 2019 | 9:00 AM – 10:00 AM PT – Build and Train Reinforcement Models with Amazon SageMaker RL – Learn how to use Amazon SageMaker RL to apply reinforcement learning and build intelligent applications for your business.

February 21, 2019 | 11:00 AM – 12:00 PM PT – Train ML Models Once, Run Anywhere in the Cloud & at the Edge with Amazon SageMaker Neo – Learn about Amazon SageMaker Neo, where you can train ML models once and run them anywhere in the cloud and at the edge.

February 28, 2019 | 11:00 AM – 12:00 PM PT – Build your Machine Learning Datasets with Amazon SageMaker Ground Truth – Learn how customers are using Amazon SageMaker Ground Truth to build highly accurate training datasets for machine learning quickly and reduce data labeling costs by up to 70%.

Migration

February 27, 2019 | 11:00 AM – 12:00 PM PT – Maximize the Benefits of Migrating to the Cloud – Learn how to group and rationalize applications and plan migration waves in order to realize the full set of benefits that cloud migration offers.

Networking

February 27, 2019 | 9:00 AM – 10:00 AM PT – Simplifying DNS for Hybrid Cloud with Route 53 Resolver – Learn how to enable DNS resolution in hybrid cloud environments using Amazon Route 53 Resolver.

Productivity & Business Solutions

February 26, 2019 | 11:00 AM – 12:00 PM PT – Transform the Modern Contact Center Using Machine Learning and Analytics – Learn how to integrate Amazon Connect and AWS machine learning services, such as Amazon Lex, Amazon Transcribe, and Amazon Comprehend, to quickly process and analyze thousands of customer conversations and gain valuable insights.

Serverless

February 19, 2019 | 11:00 AM – 12:00 PM PT – Best Practices for Serverless Queue Processing – Learn the best practices of serverless queue processing, using Amazon SQS as an event source for AWS Lambda.

Storage

February 25, 2019 | 11:00 AM – 12:00 PM PT – Introducing AWS Backup: Automate and Centralize Data Protection in the AWS Cloud – Learn about this new, fully managed backup service that makes it easy to centralize and automate the backup of data across AWS services in the cloud as well as on-premises.

Best Practices for Porting Applications to the EC2 A1 Instance Type

Post Syndicated from Martin Yip original https://aws.amazon.com/blogs/compute/best-practices-for-porting-applications-to-the-ec2-a1-instance-type/

This post courtesy of Dr. Jonathan Shapiro-Ward, AWS Solutions Architect

The new Amazon EC2 A1 instance types are powered by an ARM based AWS Graviton CPU. A1 instances are extremely cost effective and are ideal for scale-out scenarios where a large number of smaller instances are required.

Prior to the launch of A1 instances, all AWS instances were x86-based. There are a number of significant differences between the two instruction sets. The predominant difference is that ARM follows a RISC (Reduced Instruction Set Computer) design, whereas x86 is a CISC (Complex Instruction Set Computer) architecture. In short, ARM is a comparatively simple architecture composed of simple instructions that execute within a single cycle, while x86 has mostly complex instructions that execute over multiple cycles. This key difference has a range of implications for compiler and hardware complexity, power efficiency, and performance, but the key question for application developers is portability.

Workloads built for x86 will not run on the A1 family of instances; they must be ported. In many cases this is trivial. An extensive range of free and open source software supports ARM and requires no modification to port workloads. Aside from installing a different binary, the process for installing the Apache Web Server, Nginx, PostgreSQL, Docker, and many more applications is unchanged. Unfortunately, porting is not always so simple, especially for applications developed in house. Ideally, software is written to be portable, but this is not always the case. Even workloads written in languages designed for portability, such as Java, can prove a challenge. In this blog post, we review common challenges and migration paths for porting from x86-based architectures to ARM.

A General Porting Strategy

  1. Check for Core Language and Platform Support. The vast majority of common languages such as Java, Python, Perl, Ruby, PHP, Go, Rust, and so forth support modern ARM architectures. Likewise, major frameworks such as Django, Spring, Hadoop, Apache Spark, and many more run on ARM. More niche languages and frameworks may not have as robust support. If your language or crucial framework is dependent on x86, you may not be able to run your application on the A1 instance type.
  2. Identify all third party libraries and dependencies. All non-trivial applications rely on third party libraries to provide essential functionality. These can range from a standard library, to open source libraries, to paid for proprietary libraries. Examine these libraries and determine if they support ARM. In the event that a library is dependent upon a specific architecture, search for an open issue around ARM support or inquire with the vendor as to the roadmap. If possible, investigate alternative libraries if ARM support is not forthcoming.
  3. Identify Porting Path. There are three common strategies for porting. The strategy that applies will depend upon the language being used.
    1. For interpreted languages and those compiled to bytecode, the first step is to translate runbooks, scripts, AMIs, and templates to install the ARM equivalent of the interpreter or language VM. For instance, install the ARM version of the JVM or CPython. If installation is done via a package manager, no change may be necessary. Subsequently, ensure that any native code libraries are replaced with their ARM equivalents. For Java, this entails swapping out libraries that leverage JNI. For Python, this entails swapping out CPython C-API based libraries (such as numpy). Once again, if this is done via a package manager such as yum or pip, manual intervention should not be necessary.
    2. In the case of compiled languages such as C/C++, the application will have to be re-compiled for ARM. If your application utilizes any machine specific features or relies on behavior that varies between compilers, it may be necessary to re-write parts of your application. This is discussed in more detail below. If your application follows common standards such as ANSI C and avoids any machine specific dependencies, the majority of effort will be spent in modifying the build process. This will depend upon your build process and will likely involve modifying build configurations, makefiles, configure scripts, or other assets. The binary can either be compiled on an A1 instance or cross compiled. Once a binary has been produced, the deployment process should be adapted to target the A1 instance.
    3. For applications built using platform specific languages and frameworks, which have an ARM alternative, the application will have to be ported to this alternative. By far the most common example of this is the .NET Framework. In order to run on ARM, .NET applications must be ported to .NET Core or to Mono. This will likely entail a non-trivial modification to the application codebase. This is discussed in more detail below.
  4. Test! All tests must be ported over and, if there is insufficient test coverage, it may be necessary to write new tests. This is especially pertinent if you had to recompile your application. A strong test suite should identify any issues arising from machine specific code that behaves incorrectly on the A1 instance type. Perform the usual range of unit testing, acceptance testing, and pre-prod testing.
  5. Update your infrastructure as code resources, such as AWS CloudFormation templates, to provision your application to A1 instances. This will likely be a simple change, modifying the instance type, AMI, and user data to reflect the change to the A1 instance.
  6. Perform a Green/Blue Deployment. Create your new A1-based stack alongside your existing stack and leverage Route 53 weighted routing to route 10% of requests to the new stack; a sketch of such a weighted record change follows this list. Monitor error rates, user behavior, load, and other critical factors in order to determine the health of the ported application in production. If the application behaves correctly, swap over all traffic to the new stack. Otherwise, reexamine the application and identify the root cause of any errors.
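
A minimal sketch of the weighted-routing change using the AWS CLI (the hosted zone ID, record name, and ELB DNS name are all hypothetical placeholders):

# Send ~10% of traffic to the new A1 stack; an equivalent record for the
# existing stack would carry Weight 90.
aws route53 change-resource-record-sets \
  --hosted-zone-id Z123EXAMPLE \
  --change-batch '{
    "Changes": [{
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "app.example.com",
        "Type": "CNAME",
        "SetIdentifier": "a1-stack",
        "Weight": 10,
        "TTL": 60,
        "ResourceRecords": [{"Value": "a1-stack-elb.us-east-1.elb.amazonaws.com"}]
      }
    }]
  }'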

Porting C/C++ Applications to A1 Instances

C and C++ are very portable languages. Indeed, C runs on more architectures than any other language, but that is not to say that all C will run on all systems. When porting applications to A1 instances, many of the challenges one might expect do not arise. The AWS Graviton chip is little endian, just like x86, and int, float, double, and other common types are the same size on both architectures. This does not, however, guarantee portability. Most commonly, issues porting a C-based application arise from aspects of the C standard that are dependent upon the architecture and implementation. Let’s briefly look at some examples of C that are not portable between architectures.

The most frequently discussed issue in porting C from x86 to ARM is the use of the character datatype. This issue arises from the C99 standard which requires the implementation to decide if the char datatype is signed or unsigned. On x86 Linux a char is signed by default. On ARM Linux a char is unsigned. This discrepancy is due to performance, with unsigned char types resulting in more efficient ARM assembly. This can, however, cause issues. Let’s examine the following code listing:

//Code Listing 1
#include <stdio.h>

int main(){
    char c = -1;
    if (c < 0){
        printf("The value of the char is less than 0\n");
    } else {
        printf("The value of the char is greater than 0\n");
    }
    return 0;
}

On an x86 instance (in this case a t3.large), the above code has the expected result, printing "The value of the char is less than 0". On an A1 instance, it does not. There are mechanisms around this; for example, gcc has the -fsigned-char flag, which forces all char types to become signed upon compilation. Crucially, a developer must be aware of these types of issues ahead of time. Not all compilers and warning levels will provide appropriate warnings around char signedness (and indeed many other issues arising from architectural differences). As a result, without rigorous testing, it is possible to introduce unexpected errors by porting. This makes a comprehensive set of tests for your application an essential part of the porting process.
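
For instance, compiling the listing above on an A1 instance with and without the flag makes the difference visible (the file name is hypothetical):

$ gcc listing1.c -o listing1 && ./listing1
The value of the char is greater than 0
$ gcc -fsigned-char listing1.c -o listing1 && ./listing1
The value of the char is less than 0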

If your application has only ever been built for a single target environment, (e.g. x86 Linux with gcc) there are potentially unexpected behaviors which will emerge when that application is built for a different architecture or with a different compiler.

The key best practice for ensuring portability of your C applications (and other compiled languages) is to adhere to a standard. Vanilla C99 will ensure the broadest compatibility across architectures and operating systems. A compiler specific standard such as gnu99 will ensure compatibility but can tether you to one compiler.

Static analysis should always be used when building your applications. Static analysis helps to detect bugs, security flaws, and pertinent to our discussion: compatibility issues.
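A minimal sketch of wiring such checks into a build, assuming gcc and cppcheck are available (the file name is hypothetical):

# Fail the build on warnings, then run portability-focused static analysis
gcc -std=c99 -Wall -Wextra -Werror -c app.c
cppcheck --enable=warning,portability app.c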

Porting .NET Applications to A1 Instances

Porting a .NET application from a Windows instance to an A1 Linux instance can yield significant cost reductions and is an effective way to economically scale out web apps and other parallelizable workloads.

For all intents and purposes, the .NET Framework only runs on x86 Windows. This limitation does not necessarily prevent running your .NET application on an A1 instance. There are two .NET implementations, .NET Core and .NET Framework. The .NET Framework is the modern evolution of the original .NET release and is tightly coupled to x86 Windows. Meanwhile, .NET Core is a recent open source project, developed by Microsoft, which is decoupled from Windows and runs on a variety of platforms, including ARM Linux. There is one final option, Mono – an open source implementation of the .NET Framework which runs on a variety of architectures.

For greenfield projects .NET Core has become the de facto option as it has a number of advantages over the alternatives. Projects based on .NET Core are cross platform and significantly lighter than .NET Framework and Mono projects – making them far better suited to developing microservices and to running in containers or serverless environments. From version 2.1 onward, .NET Core supports ARM.

For existing projects, .NET Core is the best migration path to containers and to running on A1 instances. There are, however, a number of factors that might prohibit replatforming to .NET Core. These include:

• Dependency on Windows specific APIs
• Reliance on features that are only available in .NET Framework, such as WPF or Windows Forms. Many of these features are coming to .NET Core as part of .NET Core 3 but, at the time of writing, this is in preview.
• Use of third-party libraries that do not support .NET Core.
• Use of an unsupported language. Currently, .NET Core only supports C#, F#, and Visual Basic.

There are a number of tools to help port from .NET Framework to .NET Core. These include the .NET Portability Analyzer, which will analyze a .NET codebase and determine any factors that might prohibit porting.

Conclusion

The A1 instances can deliver significant cost savings over other instance types and are ideal for scale-out applications such as microservices. In many cases, moving to A1 instances can be easy. Many languages, frameworks, and applications have strong support for ARM. For Python, Java, Ruby, and other open source languages, porting can be trivial. For other applications, such as native binary applications or .NET applications, there can be some challenges. By examining your application and determining what, if any, x86 dependencies exist, you can devise a migration strategy that will enable you to make use of A1 instances.

Learn about hourly-replication in Server Migration Service and the ability to migrate large data volumes

Post Syndicated from Martin Yip original https://aws.amazon.com/blogs/compute/learn-about-hourly-replication-in-server-migration-service-and-the-ability-to-migrate-large-data-volumes/

This post courtesy of Shane Baldacchino, AWS Solutions Architect

AWS Server Migration Service (AWS SMS) is an agentless service that makes it easier and faster for you to migrate thousands of on-premises workloads to AWS. AWS SMS allows you to automate, schedule, and track incremental replications of live server volumes, making it easier for you to coordinate large-scale server migrations.

In my previous blog posts, we introduced how you can use AWS Server Migration Service (AWS SMS) to migrate a popular commercial off-the-shelf software package, WordPress, into AWS.

For details and a walkthrough on how to set up AWS Server Migration Service, please see the following blog posts for Hyper-V and VMware hypervisors, which will guide you through the high-level process.

In this article we are going to step it up a few notches and look past the common migration of off-the-shelf software to provide you with a pattern for using AWS SMS and some of its recently launched features, notably compression and resiliency for replication jobs and support for data volumes greater than 4TB, to migrate a more complicated environment.

This post covers the migration of a complex, internally developed eCommerce system with a polyglot architecture. It is made up of a Microsoft IIS presentation tier on Windows, a Tomcat application tier, and a Microsoft SQL Server database tier. All workloads run on-premises as virtual machines in a VMware vCenter 5.5 and ESX 5.5 environment.

This theoretical customer environment has various business and infrastructure requirements.

Application downtime: During any migration activities, the application cannot be offline for more than 2 hours
Licensing: The customer has renewed their Microsoft SQL Server license for an additional 3 years and holds the License Mobility with Software Assurance option for Microsoft SQL Server, and therefore wants to take advantage of AWS BYOL licensing for Microsoft SQL Server and Microsoft Windows Server.
Large data volumes: The Microsoft SQL Server database engine (.mdf, .ldf and .ndf files) consumes 11TB of storage

Walkthrough

Key elements of this migration process are identical to the process outlined in my previous blog posts (for more information, please see the blog posts for Hyper-V and VMware hypervisors), but at a high level you will need to:

• Establish your AWS environment.
• Download the SMS Connector from the AWS Management Console.
• Configure AWS SMS and Hypervisor permissions.
• Install and configure the SMS Connector appliance.
• Import your virtual machine inventory and create replication jobs.
• Launch your Amazon EC2 instances and associated NACLs, security groups, and Elastic Load Balancers.
• Change your DNS records to resolve the custom application to an AWS Elastic Load Balancer.

Before you start, ensure that your source systems’ OS and vCenter version are supported by AWS. For more information, see the Server Migration Service FAQ.

Planning the Migration

Once you have downloaded and configured the AWS SMS Connector for your hypervisor, you can get started creating replication jobs.

The artifacts derived from our replication jobs with AWS SMS will be AMIs (Amazon Machine Images), so we do not need to replicate each server individually. This is a three-tier architecture with commonality between servers, with multiple application and web servers performing the same function, so we can leverage a common AMI per tier and create three replication jobs:

1. Microsoft SQL Server – Database Tier
2. Ubuntu Server – Application Tier
3. IIS Web server – Webserver Tier

Performing the Replication

After validating that the SMS Connector is in a “HEALTHY” state, import your server catalog from your Hypervisor to AWS SMS. This process can take up to a minute.

Select the three servers (Microsoft SQL Server, Ubuntu Server, IIS web server) to migrate and choose Create replication job. AWS SMS now supports creating replication jobs with frequencies as short as 1 hour, so to ensure our business RTO (Recovery Time Objective) of 2 hours is met we will create our replication jobs with a frequency of 1 hour. This minimizes the risk of delta updates not completing during the cutover window.

Given the business’s existing licensing investment in Microsoft SQL Server, they will leverage the BYOL (Bring Your Own License) offering when creating the Microsoft SQL Server replication job.
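If you prefer to script this step rather than use the console, a rough sketch of the equivalent boto3 call is shown below. The server ID is a placeholder taken from the imported server catalog, and the other parameter values simply mirror the choices described above.

import datetime
import boto3

sms = boto3.client('sms')

# Hypothetical server ID from the imported catalog; frequency is in hours.
sms.create_replication_job(
    serverId='s-12345678',
    seedReplicationTime=datetime.datetime.utcnow(),
    frequency=1,
    licenseType='BYOL',                 # reuse the existing SQL Server license
    numberOfRecentAmisToKeep=5,
    description='SQL Server database tier',
)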

The AWS SMS console guides you through the process. The time that the initial replication task takes to complete is dependent on available bandwidth and the size of your virtual machines.

After the initial seed replication, network bandwidth requirement is minimized as AWS SMS replicates only incremental changes occurring on the VM.

The progress updates from AWS SMS are automatically sent to AWS Migration Hub so that you can track tasks in progress.

AWS Migration Hub provides a single location to track the progress of application migrations across multiple AWS and partner solutions. In this post, we are using AWS SMS as a mechanism to migrate the virtual machines (VMs) and track them via AWS Migration Hub.

Migration Hub and AWS SMS are both free. You pay only for the cost of the individual migration tools that you use, and any resources being consumed on AWS.

The dashboard reflects any status changes that occur in the linked services. You can see from the following image that two servers are complete whilst another is in progress.

Using Migration Hub, you can view the migration progress of all applications. This allows you to quickly get progress updates across all of your migrations, easily identify and troubleshoot any issues, and reduce the overall time and effort spent on your migration projects.


Testing Your Replicated Instances

Thirty hours after creating the replication jobs, notification was received via AWS SNS (Simple Notification Service) that all three replication jobs had completed. During the 30-hour replication window the customer’s ISP experienced downtime and sporadic flapping of the link, but this was negated by the network auto-recovery feature of AWS SMS: it recovered and resumed replication without any intervention.

With the replication tasks complete, the artifact created by AWS SMS is a custom AMI that you can use to deploy an EC2 instance. Follow the usual process to launch your EC2 instance, noting that you may need to replace any host-based firewalls with security groups and NACLs, and any hardware-based load balancers with Elastic Load Balancing, to achieve fault tolerance, scalability, performance, and security.

As this environment is a three-tier architecture with commonality between tiers (the application and presentation tiers), during the EC2 launch process we can create an ASG (Auto Scaling group) to ensure that deployed capacity matches user demand. The ASG will be based on the custom AMIs generated by the replication jobs.

When you create an EC2 instance, ensure that you pick the most suitable EC2 instance type and size to match your performance and cost requirements.

While your new EC2 instances are a replica of your on-premises VM, you should always validate that applications are functioning. How you do this differs on an application-by-application basis. You can use a combination of approaches, such as editing a local host file and testing your application, SSH, RDP and Telnet.

For our Windows Presentation and database tier, I can RDP in to my systems and validate IIS 8.0 and other services are functioning correctly.

For our Ubuntu Application tier, we can SSH in to perform validation.

Post validation of each individual server we can now continue to test the application end to end. This is because our systems have been instantiated inside a VPC with no route back to our on-premises environment which allows us to test functionality without the risk of communication back to our production application.

After validation of systems it is now time to cut over. Plan your runbook accordingly to ensure you either eliminate or minimize application disruption.

Cutting Over

As the replication window specified in the AWS SMS replication jobs was 1 hour, hourly AMIs were created that provided delta updates since the initial seed replication. The customer verified the stack by executing the previously created runbook using the latest AMIs, and verified that the application behaved as expected.

After another round of testing, the customer decided to plan the cutover for the coming Saturday at midnight, announcing a two-hour scheduled maintenance window. During the cutover window, the customer took the application offline, shut down the Microsoft SQL Server instance, and performed an on-demand sync of all systems.

This generated a new versioned AMI that contained all on-premises data. The customer then executed the runbook on the new AMIs. For the application and presentation tiers these AMIs were used in the ASG configuration. After application validation, Amazon Route 53 was updated to resolve the application CNAME to the Application Load Balancer CNAME used to load balance traffic to the fleet of IIS servers.

Based on the TTL (Time To Live) of your Amazon Route 53 DNS zone file, end users slowly resolve the application to AWS, in this case within 300 seconds. Once this TTL period had elapsed the customer brought their application back online and exited their maintenance window, with time to spare.
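As an illustration, the DNS change can be scripted with a call along the following lines. The hosted zone ID, record name, and load balancer DNS name are placeholders, and the 300-second TTL matches the propagation window described above.

import boto3

route53 = boto3.client('route53')

route53.change_resource_record_sets(
    HostedZoneId='Z0123456789EXAMPLE',          # hypothetical hosted zone
    ChangeBatch={
        'Comment': 'Cut over the eCommerce application to AWS',
        'Changes': [{
            'Action': 'UPSERT',
            'ResourceRecordSet': {
                'Name': 'shop.example.com',
                'Type': 'CNAME',
                'TTL': 300,
                'ResourceRecords': [
                    {'Value': 'my-alb-1234567890.ap-southeast-2.elb.amazonaws.com'}
                ],
            },
        }],
    },
)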

After modifying the Amazon Route 53 Zone Apex, the physical topology now looks as follows with traffic being routed to AWS.

After validation of a successful migration the customer deleted their AWS Server Migration Service replication jobs and began planning to decommission their on-premises resources.

Summary

This is an example pattern for migrating a complex, custom, polyglot environment into AWS using AWS migration services, specifically leveraging many of the new features of the AWS SMS service.

Many architectures can be extended to use the inherent benefits of AWS with little effort. For example, this article illustrated how AWS migration services can be used to migrate complex environments into AWS and then use native AWS services, such as Amazon CloudWatch metrics driving Auto Scaling policies, to ensure deployed capacity matches user demand, whilst technologies such as Application Load Balancers can be used to achieve fault tolerance and scalability.

Think big and get building!

 

 

AWS Fargate Price Reduction – Up to 50%

Post Syndicated from Nathan Peck original https://aws.amazon.com/blogs/compute/aws-fargate-price-reduction-up-to-50/

AWS Fargate is a compute engine that uses containers as its fundamental compute primitive. AWS Fargate runs your application containers for you on demand. You no longer need to provision a pool of instances or manage a Docker daemon or orchestration agent. Because the infrastructure that runs your containers is invisible, you don’t have to worry about whether you have provisioned enough instances to run your containerized workload. You also don’t have to worry about whether you’re using those instances efficiently to avoid paying for resources that you don’t use. You no longer need to do undifferentiated heavy lifting to maintain the infrastructure that runs your containers. AWS Fargate automatically updates and patches underlying resources to keep you safe from vulnerabilities in the underlying operating system and software. AWS Fargate uses an on-demand pricing model that charges per vCPU and per GB of memory reserved per second, with a 1-minute minimum.

At re:Invent 2018 we announced Firecracker, an open source virtualization technology that is purpose-built for creating and managing secure, multi-tenant containers and functions-based services. Firecracker enables you to deploy workloads in lightweight virtual machines called microVMs. These microVMs can initiate code faster, with less overhead. Innovations such as these allow us to improve the efficiency of Fargate and help us pass on cost savings to customers.

Effective January 7th, 2019 Fargate pricing per vCPU per second is being reduced by 20%, and pricing per GB of memory per second is being reduced by 65%. Depending on the ratio of CPU to memory that you’re allocating for your containers, you could see an overall price reduction of anywhere from 35% to 50%.

The following table shows the price reduction for each built-in launch configuration.

vCPU | GB Memory | Effective Price Cut
0.25 | 0.5 | -35.00%
0.25 | 1 | -42.50%
0.25 | 2 | -50.00%
0.5 | 1 | -35.00%
0.5 | 2 | -42.50%
0.5 | 3 | -47.00%
0.5 | 4 | -50.00%
1 | 2 | -35.00%
1 | 3 | -39.30%
1 | 4 | -42.50%
1 | 5 | -45.00%
1 | 6 | -47.00%
1 | 7 | -48.60%
1 | 8 | -50.00%
2 | 4 | -35.00%
2 | 5 | -37.30%
2 | 6 | -39.30%
2 | 7 | -41.00%
2 | 8 | -42.50%
2 | 9 | -43.80%
2 | 10 | -45.00%
2 | 11 | -46.10%
2 | 12 | -47.00%
2 | 13 | -47.90%
2 | 14 | -48.60%
2 | 15 | -49.30%
2 | 16 | -50.00%
4 | 8 | -35.00%
4 | 9 | -36.20%
4 | 10 | -37.30%
4 | 11 | -38.30%
4 | 12 | -39.30%
4 | 13 | -40.20%
4 | 14 | -41.00%
4 | 15 | -41.80%
4 | 16 | -42.50%
4 | 17 | -43.20%
4 | 18 | -43.80%
4 | 19 | -44.40%
4 | 20 | -45.00%
4 | 21 | -45.50%
4 | 22 | -46.10%
4 | 23 | -46.50%
4 | 24 | -47.00%
4 | 25 | -47.40%
4 | 26 | -47.90%
4 | 27 | -48.30%
4 | 28 | -48.60%
4 | 29 | -49.00%
4 | 30 | -49.30%
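To see where these figures come from, the effective price cut can be reproduced from the two per-dimension reductions. The sketch below uses illustrative us-east-1 rates at the time of the announcement; rates vary by Region, so treat the absolute numbers as assumptions rather than a price list.

# Illustrative previous us-east-1 rates (per hour); actual rates vary by Region.
OLD_VCPU_HOUR, OLD_GB_HOUR = 0.0506, 0.0127
# Apply the announced cuts: 20% off vCPU, 65% off memory.
NEW_VCPU_HOUR, NEW_GB_HOUR = OLD_VCPU_HOUR * 0.80, OLD_GB_HOUR * 0.35

def effective_cut(vcpu, gb):
    old = vcpu * OLD_VCPU_HOUR + gb * OLD_GB_HOUR
    new = vcpu * NEW_VCPU_HOUR + gb * NEW_GB_HOUR
    return (1 - new / old) * 100

for vcpu, gb in [(0.25, 0.5), (1, 2), (2, 16), (4, 30)]:
    print(f"{vcpu} vCPU / {gb} GB -> {effective_cut(vcpu, gb):.1f}% cheaper")
# Prints roughly 35.0%, 35.0%, 50.0%, and 49.4%, in line with the table above.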

Many engineering organizations such as Turner Broadcasting System, Veritone, and Catalytic have already been using AWS Fargate to achieve significant infrastructure cost savings for batch jobs, cron jobs, and other on-and-off workloads. Running a cluster of instances at all times to run your containers constantly incurs cost, but AWS Fargate stops charging when your containers stop.

With these new price reductions, AWS Fargate also enables significant savings for containerized web servers, API services, and background queue consumers run by organizations like KPMG, CBS, and Product Hunt. If your application is currently running on large EC2 instances that peak at 10-20% CPU utilization, consider migrating to containers in AWS Fargate. Containers give you more granularity to provision the exact amount of CPU and memory that your application needs. You no longer pay for instance resources that your application doesn’t use. If a sudden spike of traffic causes your application to require more resources you still have the ability to rapidly scale your application out by adding more containers, or scale your application up by launching larger containers.

AWS Fargate lets you focus on building your containerized application without worrying about the infrastructure. This encompasses not just the infrastructure capacity provisioning, monitoring, and maintenance but also the infrastructure price. Implementing Firecracker in AWS Fargate is just part of our journey to keep making AWS Fargate faster, more powerful, and more efficient. Running your containers in AWS Fargate allows you to benefit from these improvements without any manual intervention required on your part.

AWS Fargate has achieved SOC, PCI, HIPAA BAA, ISO, MTCS, C5, and ENS High compliance certification, and has a 99.99% SLA. You can get started with AWS Fargate in 13 AWS Regions around the world.

Learn about AWS Services & Solutions – January AWS Online Tech Talks

Post Syndicated from Robin Park original https://aws.amazon.com/blogs/aws/learn-about-aws-services-solutions-january-aws-online-tech-talks/

AWS Tech Talks

Happy New Year! Join us this January to learn about AWS services and solutions. The AWS Online Tech Talks are live, online presentations that cover a broad range of topics at varying technical levels. These tech talks, led by AWS solutions architects and engineers, feature technical deep dives, live demonstrations, customer examples, and Q&A with AWS experts. Register Now!

Note – All sessions are free and in Pacific Time.

Tech talks this month:

Containers

January 22, 2019 | 9:00 AM – 10:00 AM PT – Deep Dive Into AWS Cloud Map: Service Discovery for All Your Cloud Resources – Learn how to increase your application availability with AWS Cloud Map, a new service that lets you discover all your cloud resources.

Data Lakes & Analytics

January 22, 2019 | 1:00 PM – 2:00 PM PT – Increase Your Data Engineering Productivity Using Amazon EMR Notebooks – Learn how to develop analytics and data processing applications faster with Amazon EMR Notebooks.

Enterprise & Hybrid

January 29, 2019 | 1:00 PM – 2:00 PM PT – Build Better Workloads with the AWS Well-Architected Framework and Tool – Learn how to apply architectural best practices to guide your cloud migration.

IoT

January 29, 2019 | 9:00 AM – 10:00 AM PT – How To Visually Develop IoT Applications with AWS IoT Things Graph – See how easy it is to build IoT applications by visually connecting devices & web services.

Mobile

January 21, 2019 | 11:00 AM – 12:00 PM PT – Build Secure, Offline, and Real Time Enabled Mobile Apps Using AWS AppSync and AWS Amplify – Learn how to easily build secure, cloud-connected data-driven mobile apps using AWS Amplify, GraphQL, and mobile-optimized AWS services.

Networking

January 30, 2019 | 9:00 AM – 10:00 AM PT – Improve Your Application’s Availability and Performance with AWS Global Accelerator – Learn how to accelerate your global latency-sensitive applications by routing traffic across AWS Regions.

Robotics

January 29, 2019 | 11:00 AM – 12:00 PM PT – Using AWS RoboMaker Simulation for Real World Applications – Learn how AWS RoboMaker simulation works and how you can get started with your own projects.

Security, Identity & Compliance

January 23, 2019 | 1:00 PM – 2:00 PM PT – Customer Showcase: How Dow Jones Uses AWS to Create a Secure Perimeter Around Its Web Properties – Learn tips and tricks from a real-life example on how to be in control of your cloud security and automate it on AWS.

January 30, 2019 | 11:00 AM – 12:00 PM PT – Introducing AWS Key Management Service Custom Key Store – Learn how you can generate, store, and use your KMS keys in hardware security modules (HSMs) that you control.

Serverless

January 31, 2019 | 9:00 AM – 10:00 AM PT – Nested Applications: Accelerate Serverless Development Using AWS SAM and the AWS Serverless Application Repository – Learn how to compose nested applications using the AWS Serverless Application Model (SAM), SAM CLI, and the AWS Serverless Application Repository.

January 31, 2019 | 11:00 AM – 12:00 PM PT – Deep Dive Into Lambda Layers and the Lambda Runtime API – Learn how to use Lambda Layers to enable re-use and sharing of code, and how you can build and test Layers locally using the AWS Serverless Application Model (SAM).

Storage

January 28, 2019 | 11:00 AM – 12:00 PM PT – The Amazon S3 Storage Classes – Learn about the Amazon S3 Storage Classes and how to use them to optimize your storage resources.

January 30, 2019 | 1:00 PM – 2:00 PM PT – Deep Dive on Amazon FSx for Windows File Server: Running Windows on AWS – Learn how to deploy Amazon FSx for Windows File Server in some of the most common use cases.

Optimizing a Lift-and-Shift for Security

Post Syndicated from Jonathan Shapiro-Ward original https://aws.amazon.com/blogs/architecture/optimizing-a-lift-and-shift-for-security/

This is the third and final blog within a three-part series that examines how to optimize lift-and-shift workloads. A lift-and-shift is a common approach for migrating to AWS, whereby you move a workload from on-prem with little or no modification. This third blog examines how lift-and-shift workloads can benefit from an improved security posture with no modification to the application codebase. (Read about optimizing a lift-and-shift for performance and for cost effectiveness.)

Moving to AWS can help to strengthen your security posture by eliminating many of the risks present in on-premises deployments. It is still essential to consider how to best use AWS security controls and mechanisms to ensure the security of your workload. Security can often be a significant concern in lift-and-shift workloads, especially for legacy workloads where modern encryption and security features may not be present. By making use of AWS security features you can significantly improve the security posture of a lift-and-shift workload, even if it lacks native support for modern security best practices.

Adding TLS with Application Load Balancers

Legacy applications are often the subject of a lift-and-shift. Such migrations can help reduce risk by moving away from out-of-date hardware, but application-level security risks are often harder to manage. Many legacy applications use HTTP or other plaintext protocols that are vulnerable to all manner of attacks. Often, modifying a legacy application’s codebase to implement TLS is untenable, necessitating other options.

One comparatively simple approach is to leverage an Application Load Balancer or a Classic Load Balancer to provide SSL offloading. In this scenario, the load balancer is exposed to users, while the application servers that only support plaintext protocols reside within a subnet that can only be accessed by the load balancer. The load balancer performs the decryption of all traffic destined for the application instances and forwards the plaintext traffic to them. This allows you to use encryption on traffic between the client and the load balancer, leaving only internal communication between the load balancer and the application in plaintext. Often this approach is sufficient to meet security requirements. In more stringent scenarios, however, it is never acceptable for traffic to be transmitted in plaintext, even within a secured subnet. In this scenario, a sidecar can be used to ensure plaintext traffic never traverses the network.
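As a rough sketch, adding the HTTPS listener to an existing Application Load Balancer looks like the following. The load balancer, certificate, and target group ARNs are placeholders; the certificate would typically come from AWS Certificate Manager.

import boto3

elbv2 = boto3.client('elbv2')

elbv2.create_listener(
    LoadBalancerArn='arn:aws:elasticloadbalancing:us-east-1:111122223333:loadbalancer/app/legacy-app/0123456789abcdef',
    Protocol='HTTPS',
    Port=443,
    SslPolicy='ELBSecurityPolicy-TLS-1-2-2017-01',
    Certificates=[{'CertificateArn': 'arn:aws:acm:us-east-1:111122223333:certificate/example'}],
    DefaultActions=[{
        'Type': 'forward',
        # Target group whose registered instances only listen on plaintext HTTP.
        'TargetGroupArn': 'arn:aws:elasticloadbalancing:us-east-1:111122223333:targetgroup/legacy-app/0123456789abcdef',
    }],
)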

Improving Security and Configuration Management with Sidecars

One approach to providing encryption to legacy applications is to leverage what’s often termed the “sidecar pattern.” The sidecar pattern entails a second process acting as a proxy to the legacy application. The legacy application only exposes its services via the local loopback adapter and is thus accessible only to the sidecar. In turn the sidecar acts as an encrypted proxy, exposing the legacy application’s API to external consumers via TLS. As unencrypted traffic between the sidecar and the legacy application traverses the loopback adapter, it never traverses the network. This approach can help add encryption (or stronger encryption) to legacy applications when it’s not feasible to modify the original codebase. A common approach to implementing sidecars is through container groups, such as a pod in EKS or a task in ECS.


Figure 1: Implementing the Sidecar Pattern With Containers
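To make the pattern concrete, here is a minimal sketch of a TLS-terminating sidecar written in Python. It assumes the legacy application listens on 127.0.0.1:8080 over plain HTTP and that a certificate and key are available locally; a production sidecar would usually be a hardened proxy such as Envoy or NGINX rather than hand-written code.

import http.server
import ssl
import urllib.request

# Hypothetical loopback-only endpoint exposed by the legacy application.
LEGACY_APP = 'http://127.0.0.1:8080'

class SidecarProxy(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        # Forward the request over the loopback adapter; plaintext never
        # leaves the host.
        with urllib.request.urlopen(LEGACY_APP + self.path) as upstream:
            body = upstream.read()
            self.send_response(upstream.status)
            self.send_header('Content-Length', str(len(body)))
            self.end_headers()
            self.wfile.write(body)

if __name__ == '__main__':
    server = http.server.HTTPServer(('0.0.0.0', 443), SidecarProxy)
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    context.load_cert_chain(certfile='cert.pem', keyfile='key.pem')
    server.socket = context.wrap_socket(server.socket, server_side=True)
    server.serve_forever()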

Another use of the sidecar pattern is to help legacy applications leverage modern cloud services. A common example of this is using a sidecar to manage files pertaining to the legacy application. This could entail a number of options including:

  • Having the sidecar dynamically modify the configuration for a legacy application based upon some external factor, such as the output of a Lambda function, an SNS event, or a DynamoDB write.
  • Having the sidecar write application state to a cache or database. Often applications will write state to the local disk. This can be problematic for auto scaling or disaster recovery, where having the state easily accessible to other instances is advantageous. To facilitate this, the sidecar can write state to Amazon S3, Amazon DynamoDB, Amazon ElastiCache, or Amazon RDS.

A sidecar requires custom development, but it doesn’t require any modification of the lift-and-shifted application. A sidecar treats the application as a black box and interacts with it via its API, configuration file, or other standard mechanism.

Automating Security

A lift-and-shift can achieve a significantly stronger security posture by incorporating elements of DevSecOps. DevSecOps is a philosophy that argues that everyone is responsible for security and advocates for automating all parts of the security process. AWS has a number of services which can help implement a DevSecOps strategy. These services include:

  • Amazon GuardDuty: a continuous monitoring service which analyzes AWS CloudTrail events, Amazon VPC Flow Logs, and DNS logs. GuardDuty can detect threats and trigger an automated response.
  • AWS Shield: a managed DDoS protection service
  • AWS WAF: a managed Web Application Firewall
  • AWS Config: a service for assessing, tracking, and auditing changes to AWS configuration

These services can help detect security problems and implement a response in real time, achieving a significantly stronger posture than traditional security strategies. You can build a DevSecOps strategy around a lift-and-shift workload using these services, without having to modify the lift-and-shift application.
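As an example of this kind of automation, the following hedged sketch shows a Lambda handler that could be wired to a CloudWatch Events rule for GuardDuty findings. It isolates a compromised instance by moving it into a pre-created, deny-all quarantine security group; the security group ID is a placeholder.

import boto3

QUARANTINE_SG_ID = 'sg-0123456789abcdef0'   # hypothetical deny-all security group

ec2 = boto3.client('ec2')

def lambda_handler(event, context):
    finding = event['detail']
    instance_id = (finding.get('resource', {})
                          .get('instanceDetails', {})
                          .get('instanceId'))
    # Only act on findings that involve an EC2 instance.
    if instance_id:
        ec2.modify_instance_attribute(
            InstanceId=instance_id,
            Groups=[QUARANTINE_SG_ID],
        )
    return {'isolated': instance_id}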

Conclusion

There are many opportunities for taking advantage of AWS services and features to improve a lift-and-shift workload. Without any alteration to the application you can strengthen your security posture by utilizing AWS security services and by making small environmental and architectural changes that can help alleviate the challenges of legacy workloads.

About the author

Dr. Jonathan Shapiro-Ward is an AWS Solutions Architect based in Toronto. He helps customers across Canada to transform their businesses and build industry leading cloud solutions. He has a background in distributed systems and big data and holds a PhD from the University of St Andrews.

Optimizing a Lift-and-Shift for Cost Effectiveness and Ease of Management

Post Syndicated from Jonathan Shapiro-Ward original https://aws.amazon.com/blogs/architecture/optimizing-a-lift-and-shift-for-cost/

Lift-and-shift is the process of migrating a workload from on premises to AWS with little or no modification. A lift-and-shift is a common route for enterprises to move to the cloud, and can be a transitional state toward a more cloud native approach. This is the second blog post in a three-part series which investigates how to optimize a lift-and-shift workload. The first post is about performance.

A key concern that many customers have with a lift-and-shift is cost. If you move an application as is from on-prem to AWS, is there any possibility for meaningful cost savings? By employing AWS services in lieu of self-managed EC2 instances, and by leveraging cloud capabilities such as auto scaling, there is potential for significant cost savings. In this blog post, we will discuss a number of AWS services and solutions that you can leverage with minimal or no change to your application codebase in order to significantly reduce management costs and overall Total Cost of Ownership (TCO).

Automate

Even if you can’t modify your application, you can change the way you deploy it. Adopting an infrastructure-as-code approach can vastly improve the ease of management of your application, thereby reducing cost. By templating your application through AWS CloudFormation, AWS OpsWorks, or open source tools, you can make deploying and managing your workloads a simple and repeatable process.

As part of the lift-and-shift process, rationalizing the workload into a set of templates means less time spent in the future deploying and modifying the workload. It enables the easy creation of dev/test environments, facilitates blue-green testing, opens up options for DR, and gives you the option to roll back in the event of error. Automation is the single step that is most conducive to improving ease of management.

Reserved Instances and Spot Instances

An initial consideration around cost should be the purchasing model for any EC2 instances. Reserved Instances (RIs) represent a 1-year or 3-year commitment to EC2 instances and can enable up to 75% cost reduction (over On-Demand) for steady state EC2 workloads. They are ideal for 24/7 workloads that must be continually in operation. An application requires no modification to make use of RIs.

An alternative purchasing model is EC2 spot. Spot instances offer unused capacity available at a significant discount – up to 90%. Spot instances receive a two-minute warning when the capacity is required back by EC2 and can be suspended and resumed. Workloads which are architected for batch runs – such as analytics and big data workloads – often require little or no modification to make use of spot instances. Other burstable workloads such as web apps may require some modification around how they are deployed.

A final alternative is on-demand. For workloads that are not running in perpetuity, on-demand is ideal. Workloads can be deployed, used for as long as required, and then terminated. By leveraging some simple automation (such as AWS Lambda and CloudWatch alarms), you can schedule workloads to start and stop at the open and close of business (or at other meaningful intervals). This typically requires no modification to the application itself. For workloads that are not 24/7 steady state, this can provide greater cost effectiveness compared to RIs and more certainty and ease of use when compared to spot.
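A minimal sketch of that scheduling automation is shown below: a Lambda function, triggered by a CloudWatch Events schedule, that starts or stops instances carrying an assumed Schedule=office-hours tag. The tag name and the ACTION environment variable are illustrative conventions, not anything the services require.

import os
import boto3

ec2 = boto3.client('ec2')

def lambda_handler(event, context):
    action = os.environ.get('ACTION', 'stop')   # 'start' or 'stop'
    reservations = ec2.describe_instances(
        Filters=[{'Name': 'tag:Schedule', 'Values': ['office-hours']}]
    )['Reservations']
    instance_ids = [i['InstanceId']
                    for r in reservations
                    for i in r['Instances']]
    if not instance_ids:
        return 'no matching instances'
    if action == 'start':
        ec2.start_instances(InstanceIds=instance_ids)
    else:
        ec2.stop_instances(InstanceIds=instance_ids)
    return f'{action} requested for {len(instance_ids)} instances'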

Amazon FSx for Windows File Server

Amazon FSx for Windows File Server provides a fully managed Windows filesystem that has full compatibility with SMB and DFS and full AD integration. Amazon FSx is an ideal choice for lift-and-shift architectures as it requires no modification to the application codebase in order to enable compatibility. Windows based applications can continue to leverage standard, Windows-native protocols to access storage with Amazon FSx. It enables users to avoid having to deploy and manage their own fileservers – eliminating the need for patching, automating, and managing EC2 instances. Moreover, it’s easy to scale and minimize costs, since Amazon FSx offers a pay-as-you-go pricing model.

Amazon EFS

Amazon Elastic File System (EFS) provides high performance, highly available multi-attach storage via NFS. EFS offers a drop-in replacement for existing NFS deployments. This is ideal for a range of Linux and Unix use cases as well as cross-platform solutions such as Enterprise Java applications. EFS eliminates the need to manage NFS infrastructure and simplifies storage concerns. Moreover, EFS provides high availability out of the box, which helps to reduce single points of failure and avoids the need to manually configure storage replication. Much like Amazon FSx, EFS enables customers to realize cost improvements by moving to a pay-as-you-go pricing model and requires no modification of the application.

Amazon MQ

Amazon MQ is a managed message broker service that provides compatibility with JMS, AMQP, MQTT, OpenWire, and STOMP. These are amongst the most extensively used middleware and messaging protocols and are a key foundation of enterprise applications. Rather than having to manually maintain a message broker, Amazon MQ provides a performant, highly available managed message broker service that is compatible with existing applications.

Applications that leverage a standard messaging protocol can use Amazon MQ without any modification. In most cases, all you need to do is update the application’s MQ endpoint in its configuration. Subsequently, the Amazon MQ service handles the heavy lifting of operating a message broker, configuring HA, fault detection, failure recovery, software updates, and so forth. This offers a simple option for reducing management overhead and improving the reliability of a lift-and-shift architecture. What’s more, applications can migrate to Amazon MQ without the need for any downtime, making this an easy and effective way to improve a lift-and-shift.

You can also use Amazon MQ to integrate legacy applications with modern serverless applications. Lambda functions can subscribe to MQ topics and trigger serverless workflows, enabling compatibility between legacy and new workloads.


Figure 1: Integrating Lift-and-Shift Workloads with Lambda via Amazon MQ

Amazon Managed Streaming for Kafka

Lift-and-shift workloads which include a streaming data component are often built around Apache Kafka. There is a certain amount of complexity involved in operating a Kafka cluster, which incurs management and operational expense. Amazon Kinesis is a managed alternative to Apache Kafka, but it is not a drop-in replacement. At re:Invent 2018, we announced the launch of Amazon Managed Streaming for Kafka (MSK) in public preview. MSK provides a managed Kafka deployment with pay-as-you-go pricing and acts as a drop-in replacement in existing Kafka workloads. MSK can help reduce management costs and improve cost efficiency and is ideal for lift-and-shift workloads.

Leveraging S3 for Static Web Hosting

A significant portion of any web application is static content. This includes videos, images, text, and other content that changes seldom, if ever. In many lift-and-shifted applications, web servers are migrated to EC2 instances and host all content – static and dynamic. Hosting static content from an EC2 instance incurs a number of costs, including the instance, EBS volumes, and, likely, a load balancer. By moving static content to S3, you can significantly reduce the amount of compute required to host your web applications. In many cases, this change is non-disruptive and can be done at the DNS or CDN layer, requiring no change to your application.
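As a rough sketch of the setup, the bucket can be configured for website hosting and the static assets uploaded with long cache lifetimes. The bucket name and file paths are placeholders; in practice a CloudFront distribution or DNS record in front of the bucket completes the picture.

import boto3

s3 = boto3.client('s3')
BUCKET = 'example-static-assets'   # hypothetical bucket name

# Enable static website hosting on the bucket.
s3.put_bucket_website(
    Bucket=BUCKET,
    WebsiteConfiguration={
        'IndexDocument': {'Suffix': 'index.html'},
        'ErrorDocument': {'Key': 'error.html'},
    },
)

# Upload a static asset with a long cache lifetime.
s3.upload_file(
    'dist/logo.png', BUCKET, 'images/logo.png',
    ExtraArgs={'ContentType': 'image/png', 'CacheControl': 'max-age=86400'},
)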


Figure 2: Reducing Web Hosting Costs with S3 Static Web Hosting

Conclusion

There are numerous opportunities for reducing the cost of a lift-and-shift. Without any modification to the application, lift-and-shift workloads can benefit from cloud-native features. By using AWS services and features, you can significantly reduce the undifferentiated heavy lifting inherent in on-prem workloads and reduce resources and management overheads.

About the author

Dr. Jonathan Shapiro-Ward is an AWS Solutions Architect based in Toronto. He helps customers across Canada to transform their businesses and build industry leading cloud solutions. He has a background in distributed systems and big data and holds a PhD from the University of St Andrews.

Optimizing a Lift-and-Shift for Performance

Post Syndicated from Jonathan Shapiro-Ward original https://aws.amazon.com/blogs/architecture/optimizing-a-lift-and-shift-for-performance/

Many organizations begin their cloud journey with a lift-and-shift of applications from on-premises to AWS. This approach involves migrating software deployments with little, or no, modification. A lift-and-shift avoids a potentially expensive application rewrite but can result in a less optimal workload than a cloud native solution. For many organizations, a lift-and-shift is a transitional stage to an eventual cloud native solution, but there are some applications that can’t feasibly be made cloud native, such as legacy systems or proprietary third-party solutions. There are still clear benefits of moving these workloads to AWS, but how can they be best optimized?

In this blog series, we’ll look at different approaches for optimizing a black box lift-and-shift. We’ll consider how we can significantly improve a lift-and-shift application across three perspectives: performance, cost, and security. We’ll show that without modifying the application we can integrate services and features that will make a lift-and-shift workload cheaper, faster, more secure, and more reliable. In this first blog, we’ll investigate how a lift-and-shift workload can have improved performance through leveraging AWS features and services.

Performance gains are often a motivating factor behind a cloud migration. On-premise systems may suffer from performance bottlenecks owing to legacy infrastructure or through capacity issues. When performing a lift-and-shift, how can you improve performance? Cloud computing is famous for enabling horizontally scalable architectures but many legacy applications don’t support this mode of operation. Traditional business applications are often architected around a fixed number of servers and are unable to take advantage of horizontal scalability. Even if a lift-and-shift can’t make use of auto scaling groups and horizontal scalability, you can achieve significant performance gains by moving to AWS.

Scaling Up

The easiest way to scale up compute is vertical scaling. AWS provides the widest selection of virtual machine types and the largest machine types. Instances range from small, burstable t3 instances all the way to memory optimized x1 series instances. By leveraging the appropriate instance, lift-and-shifts can see significant performance gains. Depending on your workload, you can also swap out the instances used to power your workload to better meet demand. For example, on days in which you anticipate high load you could move to more powerful instances. This could be easily automated via a Lambda function.
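A hedged sketch of that automation follows: it stops the instance, changes its type, and starts it again. The instance ID and target type are placeholders, and the approach assumes a brief outage is acceptable while the instance is stopped.

import boto3

ec2 = boto3.client('ec2')

def resize_instance(instance_id, target_type):
    # The instance must be stopped before its type can be changed.
    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter('instance_stopped').wait(InstanceIds=[instance_id])
    ec2.modify_instance_attribute(
        InstanceId=instance_id,
        InstanceType={'Value': target_type},
    )
    ec2.start_instances(InstanceIds=[instance_id])

# Example: move to a larger memory-optimized instance ahead of month-end processing.
resize_instance('i-0123456789abcdef0', 'x1e.4xlarge')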

The x1 family of instances offers considerable CPU, memory, storage, and network performance and can be used to accelerate applications that are designed to maximize single machine performance. The x1e.32xlarge instance, for example, offers 128 vCPUs, 4TB RAM, and 14,000 Mbps EBS bandwidth. This instance is ideal for high performance in-memory workloads such as real time financial risk processing or SAP Hana.

Through selecting the appropriate instance types and scaling that instance up and down to meet demand, you can achieve superior performance and cost effectiveness compared to running a single static instance. This affords lift-and-shift workloads far greater efficiency than their on-prem counterparts.

Placement Groups and C5n Instances

EC2 Placement groups determine how you deploy instances to underlying hardware. One can either choose to cluster instances into a low latency group within a single AZ or spread instances across distinct underlying hardware. Both types of placement groups are useful for optimizing lift-and-shifts.

The spread placement group is valuable in applications that rely on a small number of critical instances. If you can’t modify your application  to leverage auto scaling, liveness probes, or failover, then spread placement groups can help reduce the risk of simultaneous failure while improving the overall reliability of the application.

Cluster placement groups help improve network QoS between instances. When used in conjunction with enhanced networking, cluster placement groups help to ensure low latency, high throughput, and high network packets per second. This is beneficial for chatty applications and any application that leveraged physical co-location for performance on-prem.

There is no additional charge for using placement groups.

You can extend this approach further with C5n instances. These instances offer 100Gbps networking and can be used in a placement group for the most demanding network-intensive workloads. Using both placement groups and C5n instances requires no modification to your application, only to how it is deployed – making this a strong solution for providing network performance to lift-and-shift workloads.
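For illustration, creating a cluster placement group and launching C5n instances into it can look like the sketch below. The AMI ID and group name are placeholders.

import boto3

ec2 = boto3.client('ec2')

# Create a cluster placement group for low-latency, high-throughput networking.
ec2.create_placement_group(GroupName='low-latency-tier', Strategy='cluster')

# Launch a pair of C5n instances into the placement group.
ec2.run_instances(
    ImageId='ami-0123456789abcdef0',    # hypothetical AMI
    InstanceType='c5n.9xlarge',
    MinCount=2,
    MaxCount=2,
    Placement={'GroupName': 'low-latency-tier'},
)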

Leverage Tiered Storage to Optimize for Price and Performance

AWS offers a range of storage options, each with its own performance characteristics and price point. Through leveraging a combination of storage types, lift-and-shifts can achieve the performance and availability requirements in a price effective manner. The range of storage options include:

Amazon EBS is the most common storage service involved with lift-and-shifts. EBS provides block storage that can be attached to EC2 instances and formatted with a typical file system such as NTFS or ext4. There are several different EBS types, ranging from inexpensive magnetic storage to highly performant provisioned IOPS SSDs. There are also storage-optimized instances that offer high performance EBS access and NVMe storage. By utilizing the appropriate type of EBS volume and instance, a compromise of performance and price can be achieved. RAID offers a further option to optimize EBS. EBS volumes are replicated within their Availability Zone by default, providing resilience at no additional cost, and an EC2 instance can additionally apply RAID configurations across multiple volumes. For instance, you can apply RAID 0 over a number of EBS volumes in order to improve storage performance.

In addition to EBS, EC2 instances can utilize the EC2 instance store. The instance store provides ephemeral, directly attached storage to EC2 instances. The instance store is included with the EC2 instance and provides a facility to store non-persistent data. This makes it ideal for temporary files that an application produces which require performant storage. Both EBS and the instance store are exposed to the EC2 instance as block-level devices, and the OS can use its native management tools to format and mount these volumes as per a traditional disk – requiring no significant departure from the on-prem configuration. Several instance types, including the C5d and P3dn, are equipped with local NVMe storage which can support extremely IO-intensive workloads.

Not all workloads require high performance storage. In many cases, finding a compromise between price and performance is the top priority. Amazon S3 provides highly durable object storage at a significantly lower price point than block storage. S3 is ideal for a large number of use cases, including content distribution, data ingestion, analytics, and backup. S3, however, is accessible via a RESTful API and does not provide conventional file system semantics as EBS does. This may make S3 less viable for applications that you can’t easily modify, but there are still options for using S3 in such a scenario.

An option for leveraging S3 is AWS Storage Gateway. Storage Gateway is a virtual appliance that can be run on-prem or on EC2. The Storage Gateway appliance can operate in three configurations: file gateway, volume gateway, and tape gateway. File gateway provides an NFS interface, volume gateway provides an iSCSI interface, and tape gateway provides an iSCSI virtual tape library interface. This allows files, volumes, and tapes to be exposed to an application host through conventional protocols, with the Storage Gateway appliance persisting data to S3. This allows an application to be agnostic to S3 while leveraging typical enterprise storage protocols.


Figure 1: Using S3 Storage via Storage Gateway

Conclusion

A lift-and-shift can achieve significant performance gains on AWS by making use of a range of instance types, storage services, and other features. Even without any modification to the application, lift-and-shift workloads can benefit from cutting edge compute, network, and IO which can help realize significant, meaningful performance gains.

About the author

Dr. Jonathan Shapiro-Ward is an AWS Solutions Architect based in Toronto. He helps customers across Canada to transform their businesses and build industry leading cloud solutions. He has a background in distributed systems and big data and holds a PhD from the University of St Andrews.

Learn about New AWS re:Invent Launches – December AWS Online Tech Talks

Post Syndicated from Robin Park original https://aws.amazon.com/blogs/aws/learn-about-new-aws-reinvent-launches-december-aws-online-tech-talks/

AWS Tech Talks

Join us in the next couple weeks to learn about some of the new service and feature launches from re:Invent 2018. Learn about features and benefits, watch live demos and ask questions! We’ll have AWS experts online to answer any questions you may have. Register today!

Note – All sessions are free and in Pacific Time.

Tech talks this month:

Compute

December 19, 2018 | 01:00 PM – 02:00 PM PT – Developing Deep Learning Models for Computer Vision with Amazon EC2 P3 Instances – Learn about the different steps required to build, train, and deploy a machine learning model for computer vision.

Containers

December 11, 2018 | 01:00 PM – 02:00 PM PT – Introduction to AWS App Mesh – Learn about using AWS App Mesh to monitor and control microservices on AWS.

Data Lakes & Analytics

December 10, 2018 | 11:00 AM – 12:00 PM PT – Introduction to AWS Lake Formation – Build a Secure Data Lake in Days – AWS Lake Formation (coming soon) will make it easy to set up a secure data lake in days. With AWS Lake Formation, you will be able to ingest, catalog, clean, transform, and secure your data, and make it available for analysis and machine learning.

December 12, 2018 | 11:00 AM – 12:00 PM PT – Introduction to Amazon Managed Streaming for Kafka (MSK) – Learn about features and benefits, use cases and how to get started with Amazon MSK.

Databases

December 10, 2018 | 01:00 PM – 02:00 PM PT – Introduction to Amazon RDS on VMware – Learn how Amazon RDS on VMware can be used to automate on-premises database administration, enable hybrid cloud backups and read scaling for on-premises databases, and simplify database migration to AWS.

December 13, 2018 | 09:00 AM – 10:00 AM PT – Serverless Databases with Amazon Aurora and Amazon DynamoDB – Learn about the new serverless features and benefits in Amazon Aurora and DynamoDB, use cases and how to get started.

Enterprise & Hybrid

December 19, 2018 | 11:00 AM – 12:00 PM PT – How to Use “Minimum Viable Refactoring” to Achieve Post-Migration Operational Excellence – Learn how to improve the security and compliance of your applications in two weeks with “minimum viable refactoring”.

IoT

December 17, 2018 | 11:00 AM – 12:00 PM PT – Introduction to New AWS IoT Services – Dive deep into the AWS IoT service announcements from re:Invent 2018, including AWS IoT Things Graph, AWS IoT Events, and AWS IoT SiteWise.

Machine Learning

December 10, 2018 | 09:00 AM – 10:00 AM PT – Introducing Amazon SageMaker Ground Truth – Learn how to build highly accurate training datasets with machine learning and reduce data labeling costs by up to 70%.

December 11, 2018 | 09:00 AM – 10:00 AM PT – Introduction to AWS DeepRacer – AWS DeepRacer is the fastest way to get rolling with machine learning, literally. Get hands-on with a fully autonomous 1/18th scale race car driven by reinforcement learning, 3D racing simulator, and a global racing league.

December 12, 2018 | 01:00 PM – 02:00 PM PT – Introduction to Amazon Forecast and Amazon Personalize – Learn about Amazon Forecast and Amazon Personalize – what are the key features and benefits of these managed ML services, common use cases and how you can get started.

December 13, 2018 | 01:00 PM – 02:00 PM PT – Introduction to Amazon Textract: Now in Preview – Learn how Amazon Textract, now in preview, enables companies to easily extract text and data from virtually any document.

Networking

December 17, 2018 | 01:00 PM – 02:00 PM PT – Introduction to AWS Transit Gateway – Learn how AWS Transit Gateway significantly simplifies management and reduces operational costs with a hub and spoke architecture.

Robotics

December 18, 2018 | 11:00 AM – 12:00 PM PT – Introduction to AWS RoboMaker, a New Cloud Robotics Service – Learn about AWS RoboMaker, a service that makes it easy to develop, test, and deploy intelligent robotics applications at scale.

Security, Identity & Compliance

December 17, 2018 | 09:00 AM – 10:00 AM PT – Introduction to AWS Security Hub – Learn about AWS Security Hub, and how it gives you a comprehensive view of high-priority security alerts and your compliance status across AWS accounts.

Serverless

December 11, 2018 | 11:00 AM – 12:00 PM PT – What’s New with Serverless at AWS – In this tech talk, we’ll catch you up on our ever-growing collection of natively supported languages, console updates, and re:Invent launches.

December 13, 2018 | 11:00 AM – 12:00 PM PT – Building Real Time Applications using WebSocket APIs Supported by Amazon API Gateway – Learn how to build, deploy and manage APIs with API Gateway.

Storage

December 12, 2018 | 09:00 AM – 10:00 AM PT – Introduction to Amazon FSx for Windows File Server – Learn about Amazon FSx for Windows File Server, a new fully managed native Windows file system that makes it easy to move Windows-based applications that require file storage to AWS.

December 14, 2018 | 01:00 PM – 02:00 PM PT – What’s New with AWS Storage – A Recap of re:Invent 2018 Announcements – Learn about the key AWS storage announcements that occurred prior to and at re:Invent 2018. With 15+ new service, feature, and device launches in object, file, block, and data transfer storage services, you will be able to start designing the foundation of your cloud IT environment for any application and easily migrate data to AWS.

December 18, 2018 | 09:00 AM – 10:00 AM PT – Introduction to Amazon FSx for Lustre – Learn about Amazon FSx for Lustre, a fully managed file system for compute-intensive workloads. Process files from S3 or data stores, with throughput up to hundreds of GBps and sub-millisecond latencies.

December 18, 2018 | 01:00 PM – 02:00 PM PT – Introduction to New AWS Services for Data Transfer – Learn about new AWS data transfer services, and which might best fit your requirements for data migration or ongoing hybrid workloads.

New for AWS Lambda – Use Any Programming Language and Share Common Components

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/new-for-aws-lambda-use-any-programming-language-and-share-common-components/

I remember the excitement when AWS Lambda was announced in 2014. Four years on, customers are using Lambda functions for many different use cases. For example, iRobot is using AWS Lambda to provide compute services for their Roomba robotic vacuum cleaners, Fannie Mae to run Monte Carlo simulations for millions of mortgages, and Bustle to serve billions of requests for their digital content.

Today, we are introducing two new features that are going to make serverless development even easier:

  • Lambda Layers, a way to centrally manage code and data that is shared across multiple functions.
  • Lambda Runtime API, a simple interface to use any programming language, or a specific language version, for developing your functions.

These two features can be used together: runtimes can be shared as layers so that developers can pick them up and use their favorite programming language when authoring Lambda functions.

Let’s see how they work more in detail.

Lambda Layers

When building serverless applications, it is quite common to have code that is shared across Lambda functions. It can be your custom code that is used by more than one function, or a standard library that you add to simplify the implementation of your business logic.

Previously, you would have to package and deploy this shared code together with all the functions using it. Now, you can put common components in a ZIP file and upload it as a Lambda Layer. Your function code doesn’t need to be changed and can reference the libraries in the layer as it would normally do.

Layers can be versioned to manage updates; each version is immutable. When a version is deleted or permissions to use it are revoked, functions that used it previously will continue to work, but you won’t be able to create new ones.

In the configuration of a function, you can reference up to five layers, one of which can optionally be a runtime. When the function is invoked, layers are installed in /opt in the order you provided. Order is important because layers are all extracted under the same path, so each layer can potentially overwrite the previous one. This approach can be used to customize the environment. For example, the first layer can be a runtime and the second layer adds specific versions of the libraries you need.

The overall, uncompressed size of function and layers is subject to the usual unzipped deployment package size limit.

Layers can be used within an AWS account, shared between accounts, or shared publicly with the broad developer community.
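If you automate this outside the console, a rough sketch with boto3 looks like the following. The bucket, key, layer name, function name, and account ID are placeholders.

import boto3

lambda_client = boto3.client('lambda')

# Publish a new layer version from a ZIP file stored in S3.
layer = lambda_client.publish_layer_version(
    LayerName='scientific-python',
    Description='NumPy and SciPy for Python 3.6',
    Content={'S3Bucket': 'example-layer-bucket', 'S3Key': 'scipy-layer.zip'},
    CompatibleRuntimes=['python3.6'],
)

# Grant another account permission to use this layer version.
lambda_client.add_layer_version_permission(
    LayerName='scientific-python',
    VersionNumber=layer['Version'],
    StatementId='share-with-dev-account',
    Action='lambda:GetLayerVersion',
    Principal='123456789012',           # hypothetical account ID
)

# Attach the layer to an existing function.
lambda_client.update_function_configuration(
    FunctionName='my-data-function',    # hypothetical function name
    Layers=[layer['LayerVersionArn']],
)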

There are many advantages when using layers. For example, you can use Lambda Layers to:

  • Enforce separation of concerns, between dependencies and your custom business logic.
  • Make your function code smaller and more focused on what you want to build.
  • Speed up deployments, because less code must be packaged and uploaded, and dependencies can be reused.

Based on our customer feedback, and to provide an example of how to use Lambda Layers, we are publishing a public layer which includes NumPy and SciPy, two popular scientific libraries for Python. This prebuilt and optimized layer can help you start very quickly with data processing and machine learning applications.

In addition to that, you can find layers for application monitoring, security, and management from partners such as Datadog, Epsagon, IOpipe, NodeSource, Thundra, Protego, PureSec, Twistlock, Serverless, and Stackery.

Using Lambda Layers

In the Lambda console I can now manage my own layers:

I don’t want to create a new layer now but use an existing one in a function. I create a new Python function and, in the function configuration, I can see that there are no referenced layers. I choose to add a layer:

From the list of layers compatible with the runtime of my function, I select the one with NumPy and SciPy, using the latest available version:

After I add the layer, I click Save to update the function configuration. In case you’re using more than one layer, you can adjust the order in which they are merged with the function code here.

To use the layer in my function, I just have to import the features I need from NumPy and SciPy:

import numpy as np
from scipy.spatial import ConvexHull

def lambda_handler(event, context):

    print("\nUsing NumPy\n")

    print("random matrix_a =")
    matrix_a = np.random.randint(10, size=(4, 4))
    print(matrix_a)

    print("random matrix_b =")
    matrix_b = np.random.randint(10, size=(4, 4))
    print(matrix_b)

    print("matrix_a * matrix_b = ")
    print(matrix_a.dot(matrix_b))
    print("\nUsing SciPy\n")

    num_points = 10
    print(num_points, "random points:")
    points = np.random.rand(num_points, 2)
    for i, point in enumerate(points):
        print(i, '->', point)

    hull = ConvexHull(points)
    print("The smallest convex set containing all",
        num_points, "points has", len(hull.simplices),
        "sides,\nconnecting points:")
    for simplex in hull.simplices:
        print(simplex[0], '<->', simplex[1])

I run the function, and looking at the logs, I can see some interesting results.

First, I am using NumPy to perform matrix multiplication (matrices and vectors are often used to represent the inputs, outputs, and weights of neural networks):

random matrix_a =
[[8 4 3 8]
[1 7 3 0]
[2 5 9 3]
[6 6 8 9]]
random matrix_b =
[[2 4 7 7]
[7 0 0 6]
[5 0 1 0]
[4 9 8 6]]
matrix_a * matrix_b = 
[[ 91 104 123 128]
[ 66 4 10 49]
[ 96 35 47 62]
[130 105 122 132]]

Then, I use SciPy advanced spatial algorithms to compute something quite hard to build by myself: finding the smallest “convex set” containing a list of points on a plane. For example, this can be used in a Lambda function receiving events from multiple geographic locations (corresponding to buildings, customer locations, or devices) to visually “group” similar events together in an efficient way:

10 random points:
0 -> [0.07854072 0.91912467]
1 -> [0.11845307 0.20851106]
2 -> [0.3774705 0.62954561]
3 -> [0.09845837 0.74598477]
4 -> [0.32892855 0.4151341 ]
5 -> [0.00170082 0.44584693]
6 -> [0.34196204 0.3541194 ]
7 -> [0.84802508 0.98776034]
8 -> [0.7234202 0.81249389]
9 -> [0.52648981 0.8835746 ]
The smallest convex set containing all 10 points has 6 sides,
connecting points:
1 <-> 5
0 <-> 5
0 <-> 7
6 <-> 1
8 <-> 7
8 <-> 6

When I was building this example, there was no need to install or package dependencies. I could quickly iterate on the code of the function. Deployments were very fast because I didn’t have to include large libraries or modules.

To visualize the output of SciPy, it was easy for me to create an additional layer to import matplotlib, a plotting library. Adding a few lines of code at the end of the previous function, I can now upload to Amazon Simple Storage Service (S3) an image that shows how the “convex set” is wrapping all the points:

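    # This snippet assumes the following at the top of the file:
    #   import io
    #   import boto3
    #   import matplotlib
    #   matplotlib.use('Agg')  # non-interactive backend for Lambda
    #   import matplotlib.pyplot as plt
    # and that S3_BUCKET_NAME and S3_KEY are set to your bucket and object key.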
    plt.plot(points[:,0], points[:,1], 'o')
    for simplex in hull.simplices:
        plt.plot(points[simplex, 0], points[simplex, 1], 'k-')
        
    img_data = io.BytesIO()
    plt.savefig(img_data, format='png')
    img_data.seek(0)

    s3 = boto3.resource('s3')
    bucket = s3.Bucket(S3_BUCKET_NAME)
    bucket.put_object(Body=img_data, ContentType='image/png', Key=S3_KEY)
    
    plt.close()

Lambda Runtime API

You can now select a custom runtime when creating or updating a function:

With this selection, the function must include (in its code or in a layer) an executable file called bootstrap, responsible for the communication between your code (that can use any programming language) and the Lambda environment.

The runtime bootstrap uses a simple HTTP-based interface to get the event payload for a new invocation and to return the response from the function. Information on the interface endpoint and the function handler is shared through environment variables.
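
To make the interface concrete, here is a minimal sketch of a bootstrap loop written in Python; the endpoint paths follow the Runtime API, while the handler itself is a placeholder (a real bootstrap would load the handler named by the _HANDLER variable and add error handling):

#!/usr/bin/env python
import json
import os
import urllib.request

# The Runtime API endpoint is provided in an environment variable.
API = "http://" + os.environ["AWS_LAMBDA_RUNTIME_API"] + "/2018-06-01/runtime"

def handler(event):
    # Placeholder for the function handler named by _HANDLER.
    return {"received": event}

while True:
    # Block until the next invocation's event payload is available.
    with urllib.request.urlopen(API + "/invocation/next") as resp:
        request_id = resp.headers["Lambda-Runtime-Aws-Request-Id"]
        event = json.loads(resp.read())

    result = json.dumps(handler(event)).encode()

    # Return the function response for this invocation.
    urllib.request.urlopen(urllib.request.Request(
        API + "/invocation/" + request_id + "/response",
        data=result, method="POST"))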

For the execution of your code, you can use anything that can run in the Lambda execution environment. For example, you can bring an interpreter for the programming language of your choice.

You only need to know how the Runtime API works if you want to manage or publish your own runtimes. As a developer, you can quickly use runtimes that are shared with you as layers.
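
For example, creating a function that uses a runtime shared as a layer could look like the following boto3 sketch; the role ARN, layer ARN, and ZIP file name are placeholders:

import boto3

lambda_client = boto3.client("lambda")

with open("function.zip", "rb") as f:
    lambda_client.create_function(
        FunctionName="my-custom-runtime-function",
        Runtime="provided",  # use a custom runtime instead of a managed one
        Handler="function.handler",
        Role="arn:aws:iam::123456789012:role/my-lambda-role",
        Code={"ZipFile": f.read()},
        # The layer that provides the bootstrap for the custom runtime.
        Layers=["arn:aws:lambda:us-east-1:123456789012:layer:my-runtime:1"],
    )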

We are making these open source runtimes available today:

  • C++
  • Rust

We are also working with our partners to provide more open source runtimes:

  • Erlang (Alert Logic)
  • Elixir (Alert Logic)
  • Cobol (Blu Age)
  • N|Solid (NodeSource)
  • PHP (Stackery)

The Runtime API is the future of how we’ll support new languages in Lambda. For example, this is how we built support for the Ruby language.

Available Now

You can use runtimes and layers in all Regions where Lambda is available, via the console or the AWS Command Line Interface (CLI). You can also use the AWS Serverless Application Model (SAM) and the SAM CLI to test, deploy, and manage serverless applications using these new features.

There is no additional cost for using runtimes and layers. The storage of your layers counts toward the AWS Lambda function storage limit per Region.

To learn more about using the Runtime API and Lambda Layers, don’t miss our webinar on December 11, hosted by Principal Developer Advocate Chris Munns.

I am so excited by these new features. Please let me know what you are going to build next!

Introducing AWS App Mesh – service mesh for microservices on AWS

Post Syndicated from Nathan Taber original https://aws.amazon.com/blogs/compute/introducing-aws-app-mesh-service-mesh-for-microservices-on-aws/

AWS App Mesh is a service mesh that allows you to easily monitor and control communications across microservices applications on AWS. You can use App Mesh with microservices running on Amazon Elastic Container Service (Amazon ECS), Amazon Elastic Container Service for Kubernetes (Amazon EKS), and Kubernetes running on Amazon EC2.

Today, App Mesh is available as a public preview. In the coming months, we plan to add new functionality and integrations.

Why App Mesh?

Many of our customers are building applications with microservices architectures, breaking applications into many separate, smaller pieces of software that are independently deployed and operated. Microservices help to increase the availability and scalability of an application by allowing each component to scale independently based on demand. Each microservice interacts with the other microservices through an API.

When you start building more than a few microservices within an application, it becomes difficult to identify and isolate issues. These can include high latencies, error rates, or error codes across the application. There is no dynamic way to route network traffic when there are failures or when new containers need to be deployed.

You can address these problems by adding custom code and libraries into each microservice and using open source tools that manage communications for each microservice. However, these solutions can be hard to install, difficult to update across teams, and complex to manage for availability and resiliency.

AWS App Mesh implements a new architectural pattern that helps solve many of these challenges and provides a consistent, dynamic way to manage the communications between microservices. With App Mesh, the logic for monitoring and controlling communications between microservices is implemented as a proxy that runs alongside each microservice, instead of being built into the microservice code. The proxy handles all of the network traffic into and out of the microservice and provides consistency for visibility, traffic control, and security capabilities to all of your microservices.

Use App Mesh to model how all of your microservices connect. App Mesh automatically computes and sends the appropriate configuration information to each microservice proxy. This gives you standardized, easy-to-use visibility and traffic controls across your entire application.  App Mesh uses Envoy, an open source proxy. That makes it compatible with a wide range of AWS partner and open source tools for monitoring microservices.

Using App Mesh, you can export observability data to multiple AWS and third-party tools, including Amazon CloudWatch, AWS X-Ray, or any third-party monitoring and tracing tool that integrates with Envoy. You can configure new traffic routing controls to enable dynamic blue/green canary deployments for your services.

Getting started

Here’s a sample application with two services, where service A receives traffic from the internet and uses service B for some backend processing. You want to route traffic dynamically between services B and B’, a new version of B deployed to act as the canary.

First, create a mesh, a namespace that groups related microservices that must interact.

Next, create virtual nodes to represent services in the mesh. A virtual node can represent a microservice or a specific microservice version. In this example, services A and B participate in the mesh, and you manage the traffic to service B using App Mesh.

Now, deploy your services with the required Envoy proxy and with a mapping to the node in the mesh.

After you have defined your virtual nodes, you can define how the traffic flows between your microservices. To do this, define a virtual router and routes for communications between microservices.

A virtual router handles traffic for your microservices. After you create a virtual router, you create routes to direct traffic appropriately. These routes include the connection requests that the route should accept, where they should go, and the weighted amount of traffic to send. All of these changes to adjust traffic between services are computed and sent dynamically to the appropriate proxies by App Mesh to execute your deployment.
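
As a sketch of what this can look like programmatically, the following uses the App Mesh API through boto3 to split traffic 90/10 between service B and the canary B'; all names are placeholders, and the exact request shapes may evolve while the service is in preview:

import boto3

appmesh = boto3.client("appmesh")

# Route 90% of HTTP traffic to service B and 10% to the canary (placeholder names).
appmesh.create_route(
    meshName="my-mesh",
    virtualRouterName="serviceB-router",
    routeName="serviceB-route",
    spec={
        "httpRoute": {
            "match": {"prefix": "/"},
            "action": {
                "weightedTargets": [
                    {"virtualNode": "serviceB", "weight": 9},
                    {"virtualNode": "serviceB-canary", "weight": 1},
                ]
            },
        }
    },
)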

You now have a virtual router set up that accepts all traffic from virtual node A, sending it to the existing version of service B, as well as some traffic to the new version, B'.

Exporting metrics, logs, and traces

One of the benefits of placing a proxy in front of every microservice is that you can automatically capture metrics, logs, and traces about the communication between your services. App Mesh enables you to easily collect and export this data to the tools of your choice. Envoy is already integrated with several tools like Prometheus and Datadog.

During the preview, we are adding support for AWS services such as Amazon CloudWatch and AWS X-Ray. We have a lot more integrations planned as well.

Available now

AWS App Mesh is available as a public preview and you can start using it today in the N. Virginia, Ohio, Oregon, and Ireland AWS Regions. During the preview, we plan to add new features and want to hear your feedback. You can check out our GitHub repository for examples and our roadmap.

— Nate

Building a tightly coupled molecular dynamics workflow with multi-node parallel jobs in AWS Batch

Post Syndicated from Josh Rad original https://aws.amazon.com/blogs/compute/building-a-tightly-coupled-molecular-dynamics-workflow-with-multi-node-parallel-jobs-in-aws-batch/

Contributed by Amr Ragab, HPC Application Consultant, AWS Professional Services and Aswin Damodar, Senior Software Development Engineer, AWS Batch

At Supercomputing 2018 in Dallas, TX, AWS announced AWS Batch support for running tightly coupled workloads in a multi-node parallel jobs environment. This AWS Batch feature enables applications that require strong scaling for efficient computational workloads.

Some of the more popular workloads that can take advantage of this feature enhancement include computational fluid dynamics (CFD) codes such as OpenFoam, Fluent, and ANSYS. Other workloads include molecular dynamics (MD) applications such as AMBER, GROMACS, and NAMD.

Running tightly coupled, distributed, deep learning frameworks is also now possible on AWS Batch. Applications that can take advantage include TensorFlow, MXNet, PyTorch, and Chainer. Essentially, any application that benefits from tightly coupled scaling can now be integrated with AWS Batch.

In this post, we show you how to build a workflow executing an MD simulation using GROMACS running on GPUs, using the p3 instance family.

AWS Batch overview

AWS Batch is a service providing managed planning, scheduling, and execution of containerized workloads on AWS. Purpose-built for scalable compute workloads, AWS Batch is ideal for high throughput, distributed computing jobs such as video and image encoding, loosely coupled numerical calculations, and multistep computational workflows.

If you are new to AWS Batch, consider gaining familiarity with the service by following the tutorial in the Creating a Simple “Fetch & Run” AWS Batch Job post.

Prerequisites

You need an AWS account to go through this walkthrough. Other prerequisites include:

  • Launch an ECS instance, p3.2xlarge, with an NVIDIA Tesla V100 backend. Use the Amazon Linux 2 AMIs for ECS.
  • In the ECS instance, install the latest CUDA 10 stack, which provides the toolchain and compilation libraries as well as the NVIDIA driver.
  • Install nvidia-docker2.
  • In your /etc/docker/daemon.json file, ensure that the default-runtime value is set to nvidia.
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "default-runtime": "nvidia"
}
  • Finally, save the EC2 instance as an AMI in your account. Copy the AMI ID, as you need it later in the post.

Deploying the workload

In a production environment, it’s important to efficiently execute the compute workload with multi-node parallel jobs. Most of the optimization is on the application layer and how efficiently the Message Passing Interface (MPI) ranks (MPI and OpenMP threads) are distributed across nodes. Application-level optimization is out of scope for this post, but should be considered when running in production.

One of the key requirements for running on AWS Batch is a Dockerized image with the application, libraries, scripts, and code. For multi-node parallel jobs, you need an MPI stack for the tightly coupled communication layer and a wrapper script for the MPI orchestration. The running child Docker containers need to pass container IP address information to the master node to fill out the MPI host file.

The undifferentiated heavy lifting that AWS Batch provides is the Docker-to-Docker communication across nodes using Amazon ECS task networking. With multi-node parallel jobs, the ECS container receives environmental variables from the backend, which can be used to establish which running container is the master and which is the child.

  • AWS_BATCH_JOB_MAIN_NODE_INDEX—The designation of the master node in a multi-node parallel job. This is the main node in which the MPI job is launched.
  • AWS_BATCH_JOB_MAIN_NODE_PRIVATE_IPV4_ADDRESS—The IPv4 address of the main node. This is presented in the environment for all children nodes.
  • AWS_BATCH_JOB_NODE_INDEX—The designation of the node index.
  • AWS_BATCH_JOB_NUM_NODES—The number of nodes launched as part of the node group for your multi-node parallel job.

If AWS_BATCH_JOB_MAIN_NODE_INDEX = AWS_BATCH_JOB_NODE_INDEX, then this is the main node. The following code block is an example MPI synchronization script that you can include as part of the CMD structure of the Docker container. Save the following code as mpi-run.sh.

#!/bin/bash

cd $JOB_DIR

PATH="$PATH:/opt/openmpi/bin/"
BASENAME="${0##*/}"
log () {
  echo "${BASENAME} - ${1}"
}
HOST_FILE_PATH="/tmp/hostfile"
AWS_BATCH_EXIT_CODE_FILE="/tmp/batch-exit-code"

aws s3 cp $S3_INPUT $SCRATCH_DIR
tar -xvf $SCRATCH_DIR/*.tar.gz -C $SCRATCH_DIR

sleep 2

usage () {
  if [ "${#@}" -ne 0 ]; then
    log "* ${*}"
    log
  fi
  cat <<ENDUSAGE
Usage:
export AWS_BATCH_JOB_NODE_INDEX=0
export AWS_BATCH_JOB_NUM_NODES=10
export AWS_BATCH_JOB_MAIN_NODE_INDEX=0
export AWS_BATCH_JOB_ID=string
./mpi-run.sh
ENDUSAGE

  error_exit
}

# Standard function to print an error and exit with a failing return code
error_exit () {
  log "${BASENAME} - ${1}" >&2
  log "${2:-1}" > $AWS_BATCH_EXIT_CODE_FILE
  kill  $(cat /tmp/supervisord.pid)
}

# Set child by default switch to main if on main node container
NODE_TYPE="child"
if [ "${AWS_BATCH_JOB_MAIN_NODE_INDEX}" == 
"${AWS_BATCH_JOB_NODE_INDEX}" ]; then
  log "Running synchronize as the main node"
  NODE_TYPE="main"
fi


# wait for all nodes to report
wait_for_nodes () {
  log "Running as master node"

  touch $HOST_FILE_PATH
  ip=$(/sbin/ip -o -4 addr list eth0 | awk '{print $4}' | cut -d/ -f1)
  
  if [ -x "$(command -v nvidia-smi)" ] ; then
      NUM_GPUS=$(ls -l /dev/nvidia[0-9] | wc -l)
      availablecores=$NUM_GPUS
  else
      availablecores=$(nproc)
  fi

  log "master details -> $ip:$availablecores"
  echo "$ip slots=$availablecores" >> $HOST_FILE_PATH

  lines=$(uniq $HOST_FILE_PATH|wc -l)
  while [ "$AWS_BATCH_JOB_NUM_NODES" -gt "$lines" ]
  do
    log "$lines out of $AWS_BATCH_JOB_NUM_NODES nodes joined, check again in 1 second"
    sleep 1
    lines=$(uniq $HOST_FILE_PATH|wc -l)
  done
  # Make the temporary file executable and run it with any given arguments
  log "All nodes successfully joined"
  
  # remove duplicates if there are any.
  awk '!a[$0]++' $HOST_FILE_PATH > ${HOST_FILE_PATH}-deduped
  cat $HOST_FILE_PATH-deduped
  log "executing main MPIRUN workflow"

  cd $SCRATCH_DIR
  . /opt/gromacs/bin/GMXRC
  /opt/openmpi/bin/mpirun --mca btl_tcp_if_include eth0 \
                          -x PATH -x LD_LIBRARY_PATH -x GROMACS_DIR -x GMXBIN -x GMXMAN -x GMXDATA \
                          --allow-run-as-root --machinefile ${HOST_FILE_PATH}-deduped \
                          $GMX_COMMAND
  sleep 2

  tar -czvf $JOB_DIR/batch_output_$AWS_BATCH_JOB_ID.tar.gz $SCRATCH_DIR/*
  aws s3 cp $JOB_DIR/batch_output_$AWS_BATCH_JOB_ID.tar.gz $S3_OUTPUT

  log "done! goodbye, writing exit code to 
$AWS_BATCH_EXIT_CODE_FILE and shutting down my supervisord"
  echo "0" > $AWS_BATCH_EXIT_CODE_FILE
  kill  $(cat /tmp/supervisord.pid)
  exit 0
}


# Report this node's IP address and slot count to the main node
report_to_master () {
  # get own ip and num cpus
  #
  ip=$(/sbin/ip -o -4 addr list eth0 | awk '{print $4}' | cut -d/ -f1)

  if [ -x "$(command -v nvidia-smi)" ] ; then
      NUM_GPUS=$(ls -l /dev/nvidia[0-9] | wc -l)
      availablecores=$NUM_GPUS
  else
      availablecores=$(nproc)
  fi

  log "I am a child node -> $ip:$availablecores, reporting to the master node -> 
${AWS_BATCH_JOB_MAIN_NODE_PRIVATE_IPV4_ADDRESS}"
  until echo "$ip slots=$availablecores" | ssh 
${AWS_BATCH_JOB_MAIN_NODE_PRIVATE_IPV4_ADDRESS} "cat >> 
/$HOST_FILE_PATH"
  do
    echo "Sleeping 5 seconds and trying again"
  done
  log "done! goodbye"
  exit 0
  }


# Main - dispatch user request to appropriate function
log $NODE_TYPE
case $NODE_TYPE in
  main)
    wait_for_nodes "${@}"
    ;;

  child)
    report_to_master "${@}"
    ;;

  *)
    log $NODE_TYPE
    usage "Could not determine node type. Expected (main/child)"
    ;;
esac

The synchronization script supports downloading the assets from Amazon S3 as well as preparing the MPI host file based on GPU scheduling for GROMACS.

Furthermore, the mpirun stanza is captured in this script. This script can be a template for several multi-node parallel job applications by just changing a few lines. These lines are essentially the GROMACS-specific steps:

. /opt/gromacs/bin/GMXRC
export OMP_NUM_THREADS=$OMP_THREADS
/opt/openmpi/bin/mpirun -np $MPI_THREADS --mca btl_tcp_if_include eth0 \
-x OMP_NUM_THREADS -x PATH -x LD_LIBRARY_PATH -x GROMACS_DIR -x GMXBIN -x GMXMAN -x GMXDATA \
--allow-run-as-root --machinefile ${HOST_FILE_PATH}-deduped \
$GMX_COMMAND

In your development environment for building Docker images, create a Dockerfile that prepares the software stack for running GROMACS. The key elements of the Dockerfile are:

  1. Set up a passwordless-ssh keygen.
  2. Download and compile OpenMPI. In this Dockerfile, you are downloading the recently released OpenMPI 4.0.0 source and compiling it on an NVIDIA Tesla V100 GPU-backed instance (p3.2xlarge).
  3. Download and compile GROMACS.
  4. Set up supervisor to run SSH at Docker container startup as well as processing the mpi-run.sh script as the CMD.

Save the following script as a Dockerfile:

FROM nvidia/cuda:latest

ENV USER root

# -------------------------------------------------------------------------------------
# install needed software -
# openssh
# mpi
# awscli
# supervisor
# -------------------------------------------------------------------------------------

RUN apt update
RUN DEBIAN_FRONTEND=noninteractive apt install -y iproute2 cmake openssh-server openssh-client python python-pip build-essential gfortran wget curl
RUN pip install supervisor awscli

RUN mkdir -p /var/run/sshd
ENV DEBIAN_FRONTEND noninteractive

ENV NOTVISIBLE "in users profile"

#####################################################
## SSH SETUP

RUN sed -i 's/PermitRootLogin prohibit-password/PermitRootLogin yes/' /etc/ssh/sshd_config
RUN sed 's@session\s*required\s*pam_loginuid.so@session optional pam_loginuid.so@g' -i /etc/pam.d/sshd
RUN echo "export VISIBLE=now" >> /etc/profile

RUN echo "${USER} ALL=(ALL) NOPASSWD:ALL" >> /etc/sudoers
ENV SSHDIR /root/.ssh
RUN mkdir -p ${SSHDIR}
RUN touch ${SSHDIR}/sshd_config
RUN ssh-keygen -t rsa -f ${SSHDIR}/ssh_host_rsa_key -N ''
RUN cp ${SSHDIR}/ssh_host_rsa_key.pub ${SSHDIR}/authorized_keys
RUN cp ${SSHDIR}/ssh_host_rsa_key ${SSHDIR}/id_rsa
RUN echo " IdentityFile ${SSHDIR}/id_rsa" >> /etc/ssh/ssh_config
RUN echo "Host *" >> /etc/ssh/ssh_config && echo " StrictHostKeyChecking no" >> /etc/ssh/ssh_config
RUN chmod -R 600 ${SSHDIR}/* && \
chown -R ${USER}:${USER} ${SSHDIR}/
# check if ssh agent is running or not, if not, run
RUN eval `ssh-agent -s` && ssh-add ${SSHDIR}/id_rsa

##################################################
## S3 OPTIMIZATION

RUN aws configure set default.s3.max_concurrent_requests 30
RUN aws configure set default.s3.max_queue_size 10000
RUN aws configure set default.s3.multipart_threshold 64MB
RUN aws configure set default.s3.multipart_chunksize 16MB
RUN aws configure set default.s3.max_bandwidth 4096MB/s
RUN aws configure set default.s3.addressing_style path

##################################################
## CUDA MPI

RUN wget -O /tmp/openmpi.tar.gz https://download.open-mpi.org/release/open-mpi/v4.0/openmpi-4.0.0.tar.gz && \
tar -xvf /tmp/openmpi.tar.gz -C /tmp
RUN cd /tmp/openmpi* && ./configure --prefix=/opt/openmpi --with-cuda --enable-mpirun-prefix-by-default && \
make -j $(nproc) && make install
RUN echo "export PATH=$PATH:/opt/openmpi/bin" >> /etc/profile
RUN echo "export LD_LIBRARY_PATH=$LD_LIRBARY_PATH:/opt/openmpi/lib:/usr/local/cuda/include:/usr/local/cuda/lib64" >> /etc/profile

###################################################
## GROMACS 2018 INSTALL

ENV PATH $PATH:/opt/openmpi/bin
ENV LD_LIBRARY_PATH $LD_LIBRARY_PATH:/opt/openmpi/lib:/usr/local/cuda/include:/usr/local/cuda/lib64
RUN wget -O /tmp/gromacs.tar.gz http://ftp.gromacs.org/pub/gromacs/gromacs-2018.4.tar.gz && \
tar -xvf /tmp/gromacs.tar.gz -C /tmp
RUN cd /tmp/gromacs* && mkdir build
RUN cd /tmp/gromacs*/build && \
cmake .. -DGMX_MPI=on -DGMX_THREAD_MPI=ON -DGMX_GPU=ON -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda -DGMX_BUILD_OWN_FFTW=ON -DCMAKE_INSTALL_PREFIX=/opt/gromacs && \
make -j $(nproc) && make install
RUN echo "source /opt/gromacs/bin/GMXRC" >> /etc/profile

###################################################
## supervisor container startup

ADD conf/supervisord/supervisord.conf /etc/supervisor/supervisord.conf
ADD supervised-scripts/mpi-run.sh supervised-scripts/mpi-run.sh
RUN chmod 755 supervised-scripts/mpi-run.sh

EXPOSE 22
RUN export PATH="$PATH:/opt/openmpi/bin"
ADD batch-runtime-scripts/entry-point.sh batch-runtime-scripts/entry-point.sh
RUN chmod 0755 batch-runtime-scripts/entry-point.sh

CMD /batch-runtime-scripts/entry-point.sh

After the container is built, push the image to your Amazon ECR repository and note the container image URI for later steps.

Set up GROMACS

For the input files, use the Chalcone Synthase (1CGZ) example, from RCSB.org. For this post, just run a simple simulation following the Lysozyme in Water GROMACS tutorial.

Execute the production MD run before the analysis (that is, after the system is solvated, neutralized, and equilibrated), so you can show that the longest part of the simulation can be achieved in a containerized workflow.

From the tutorial, it is possible to run the entire workflow in AWS Batch, from PDB preparation through solvation, energy minimization, and analysis.

Set up the compute environment

For the purpose of running the MD simulation in this test case, use two p3.2xlarge instances. Each instance provides one NVIDIA Tesla V100 GPU, across which GROMACS distributes the job. You don't have to launch a specific instance type; within the p3 family, the MPI wrapper adjusts the MPI ranks to accommodate the current GPU and node topology.

When the job is executed, instantiate two MPI processes with two OpenMP threads per MPI process. For this post, launch EC2 On-Demand Instances; because you are using Amazon Linux AMIs, you can take advantage of per-second billing.

Under Create Compute Environment, choose a managed compute environment and provide a name, such as gromacs-gpu-ce. Attach two roles:

  • AWSBatchServiceRole—Allows AWS Batch to make EC2 calls on your behalf.
  • ecsInstanceRole—Allows the underlying instance to make AWS API calls.

In the next panel, specify the following field values:

  • Provisioning model: EC2
  • Allowed instance types: p3 family
  • Minimum vCPUs: 0
  • Desired vCPUs: 0
  • Maximum vCPUs: 128

Select Enable user-specified AMI ID and enter the AMI that you created earlier.

Finally, for the compute environment, specify the network VPC and subnets for launching the instances, as well as a security group. We recommend specifying a placement group for tightly coupled workloads for better performance. You can also create EC2 tags for the launch instances. We used name=gromacs-gpu-processor.

Next, choose Job Queues and create a gromacs-queue queue coupled with the compute environment created earlier. Set the priority to 1 and select Enable job queue.

Set up the job definition

In the job definition setup, you create a two-node group, where each node pulls the gromacs_mpi image. Because you are using the p3.2xlarge instance providing one V100 GPU per instance, your vCPU slots = 8 for scheduling purposes.

{
    "jobDefinitionName": "gromacs-jobdef",
    "jobDefinitionArn": "arn:aws:batch:us-east-2:<accountid>:job-definition/gromacs-jobdef:1",
    "revision": 6,
    "status": "ACTIVE",
    "type": "multinode",
    "parameters": {},
    "nodeProperties": {
        "numNodes": 2,
        "mainNode": 0,
        "nodeRangeProperties": [
            {
                "targetNodes": "0:1",
                "container": {
                    "image": "<accountid>.dkr.ecr.us-east-2.amazonaws.com/gromacs_mpi:latest",
                    "vcpus": 8,
                    "memory": 24000,
                    "command": [],
                    "jobRoleArn": "arn:aws:iam::<accountid>:role/ecsTaskExecutionRole",
                    "volumes": [
                        {
                            "host": {
                                "sourcePath": "/scratch"
                            },
                            "name": "scratch"
                        },
                        {
                            "host": {
                                "sourcePath": "/efs"
                            },
                            "name": "efs"
                        }
                    ],
                    "environment": [
                        {
                            "name": "SCRATCH_DIR",
                            "value": "/scratch"
                        },
                        {
                            "name": "JOB_DIR",
                            "value": "/efs"
                        },
                        {
                            "name": "GMX_COMMAND",
                            "value": "gmx_mpi mdrun -deffnm md_0_1 -nb gpu -ntomp 1"
                        },
                        {
                            "name": "OMP_THREADS",
                            "value": "2"
                        },
                        {
                            "name": "MPI_THREADS",
                            "value": "1"
                        },
                        {
                            "name": "S3_INPUT",
                            "value": "s3://ragab-md/1CGZ.tar.gz"
                        },
                        {
                            "name": "S3_OUTPUT",
                            "value": "s3://ragab-md"
                        }
                    ],
                    "mountPoints": [
                        {
                            "containerPath": "/scratch",
                            "sourceVolume": "scratch"
                        },
                        {
                            "containerPath": "/efs",
                            "sourceVolume": "efs"
                        }
                    ],
                    "ulimits": [],
                    "instanceType": "p3.2xlarge"
                }
            }
        ]
    }
}

Submit the GROMACS job

In the AWS Batch job submission portal, provide a job name and select the job definition created earlier as well as the job queue. Ensure that the vCPU value is set to 8 and the Memory (MiB) value is 24000.

Under Environment Variables, within each node group, ensure that the keys are set correctly as follows.

Key            Value
SCRATCH_DIR    /scratch
JOB_DIR        /efs
OMP_THREADS    2
GMX_COMMAND    gmx_mpi mdrun -deffnm md_0_1 -nb gpu
MPI_THREADS    2
S3_INPUT       s3://<your input>
S3_OUTPUT      s3://<your output>

Submit the job and wait for it to enter the RUNNING state. After the job is in the RUNNING state, select the job ID and choose Nodes.
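
Alternatively, you can submit the same job programmatically. The following boto3 sketch mirrors the job definition and the table above; the job name is a placeholder:

import boto3

batch = boto3.client("batch")

batch.submit_job(
    jobName="gromacs-md-run",
    jobQueue="gromacs-queue",
    jobDefinition="gromacs-jobdef",
    nodeOverrides={
        "nodePropertyOverrides": [
            {
                "targetNodes": "0:1",
                "containerOverrides": {
                    "environment": [
                        {"name": "GMX_COMMAND",
                         "value": "gmx_mpi mdrun -deffnm md_0_1 -nb gpu"},
                        {"name": "OMP_THREADS", "value": "2"},
                        {"name": "MPI_THREADS", "value": "2"},
                    ]
                },
            }
        ]
    },
)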

The containers listed each write to a separate Amazon CloudWatch log stream where you can monitor the progress.

After the job is completed, the entire working directory is compressed and uploaded to S3. The trajectories (*.xtc) and input .gro files can then be viewed in your favorite MD analysis package. For more information about preparing a desktop, see Deploying a 4x4K, GPU-backed Linux desktop instance on AWS.

You can view the trajectories in PyMOL as well as run any subsequent trajectory analysis.

Extending the solution

As we mentioned earlier, you can take this core workload and extend it as part of a job execution chain in a workflow. Native support for job dependencies exists in AWS Batch and alternatively in AWS Step Functions. With Step Functions, you can create a decision-based workflow tree to run the preparation, solvation, energy minimization, equilibration, production MD, and analysis.

Conclusion

In this post, we showed that tightly coupled, scalable MD simulations can be executed using the recently released multi-node parallel jobs feature for AWS Batch. You finally have an end-to-end solution for distributed and MPI-based workloads.

As we mentioned earlier, many other applications can also take advantage of this feature set. We invite you to try this out and let us know how it goes.

Want to discuss how your tightly coupled workloads can benefit on AWS Batch? Contact AWS.

Scaling Amazon Kinesis Data Streams with AWS Application Auto Scaling

Post Syndicated from Giorgio Nobile original https://aws.amazon.com/blogs/big-data/scaling-amazon-kinesis-data-streams-with-aws-application-auto-scaling/

Recently, AWS launched a new feature of AWS Application Auto Scaling that lets you define scaling policies that automatically add and remove shards in an Amazon Kinesis Data Stream. For more detailed information about this feature, see the Application Auto Scaling GitHub repository.

As your streaming information increases, you require a scaling solution to accommodate all requests. If your streaming information decreases, you might use scaling to reduce costs. Currently, you scale an Amazon Kinesis Data Stream programmatically. Alternatively, you can use the Amazon Kinesis Scaling Utilities, either manually or automated with an AWS Elastic Beanstalk environment.

With the new feature of Application Auto Scaling, you can use AWS services to create a scaling solution without manual intervention or complex solutions.

Auto scaling solution overview

This blog post shows you how to deploy an auto scaling solution for your Amazon Kinesis Data Streams based on the default Amazon CloudWatch metrics. It also provides an AWS CloudFormation template to set up the environment automatically, and the code for the Lambda function.

How the auto scaling solution works

Begin with a CloudWatch alarm that monitors Kinesis Data Stream shard metrics. When a custom threshold of the alarm is reached, for example because the number of requests has grown, the alarm fires. This sends a notification to an Application Auto Scaling policy, which responds based on the stated preference to scale up or down.

When the scaling policy is triggered, Application Auto Scaling calls an API operation. The call passes the new number of Kinesis Data Stream shards for the desired capacity (for more information, see here). The call also passes the name of the resource to scale, provided by Amazon API Gateway. Amazon API Gateway invokes an AWS Lambda function. Based on the information sent by Application Auto Scaling, the Lambda function increases or decreases the number of shards in the Kinesis Data Stream. It does so by using Kinesis Data Stream’s UpdateShardCount API operation. The following diagram illustrates the scenario.

As you can see from the diagram, AWS Systems Manager Parameter Store is also involved. We use Parameter Store to store the desired capacity value that Application Auto Scaling sends to API Gateway to increase or decrease the capacity. (In this scenario, the capacity is the number of shards.) In fact, Application Auto Scaling often invokes API Gateway to get the status of the custom resource, in this case the Kinesis Data Stream. It does so to see if there are actions to be taken and if previous actions were successful. Because Lambda is stateless, we need somewhere to save the desired capacity value communicated by Application Auto Scaling at any point.

Solution components

This solution uses the following components:

Application Auto Scaling scalable target – A scalable target is a resource registered with the Application Auto Scaling service. The service can scale any defined and registered resources. A scalable target handles the minimum and maximum value for the scalable dimension. It requires the following parameters (a registration sketch follows this list):

  • ResourceId: The resource that is the scalable target. For custom resources, such as in the following example, specify the OutputValue returned from the AWS CloudFormation template.
  • RoleARN: The service-linked role used to grant permission to modify scalable target resources.
  • ScalableDimension: The dimension of the scalable target. For custom resources, the value must be custom-resource:ResourceType:Property.
  • ServiceNamespace: The namespace of the AWS service. In this case, this value is the custom resource.
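
As a sketch, registering such a custom-resource scalable target with boto3 could look like the following; the endpoint URL and role ARN are placeholders for the resources that the AWS CloudFormation template creates:

import boto3

autoscaling = boto3.client("application-autoscaling")

autoscaling.register_scalable_target(
    ServiceNamespace="custom-resource",
    # For custom resources, the ResourceId is the API Gateway endpoint path
    # exposing the resource (placeholder URL).
    ResourceId="https://example.execute-api.us-east-1.amazonaws.com"
               "/prod/scalableTargetDimensions/MyKinesisStream",
    ScalableDimension="custom-resource:ResourceType:Property",
    MinCapacity=1,
    MaxCapacity=10,
    RoleARN="arn:aws:iam::123456789012:role/MyCustomResourceScalingRole",
)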

Scaling policy – After you register a scalable target, you can apply a scaling policy that describes how the service should scale.

The following policy types are supported:

  • TargetTrackingScaling — Only for Amazon DynamoDB
  • TargetTrackingScaling and StepScaling — Supported by Amazon ECS, Amazon EC2 Spot Fleets, and Amazon RDS
  • StepScaling — Supported by other services

In our scenario, we use a StepScaling policy because we are using a custom resource type, as discussed later in the Scaling policy and scheduled actions section. However, custom resource types can also support scheduled actions.

API Gateway – In our solution, we use Amazon API Gateway to expose a secure REST endpoint. Application Auto Scaling uses this endpoint to send authenticated calls, using IAM, to get the current capacity of the custom service to scale with HTTP GET. Application Auto Scaling also uses this endpoint to adjust the relative capacity of the custom service (with HTTP PATCH).

CloudWatch metrics and alarms – The KPIs to monitor; when a threshold is breached, the alarm notifies the Application Auto Scaling endpoint.

Lambda function – In our scenario, the AWS Lambda function mainly does two tasks (a sketch of the scaling call follows this list):

  1. If the API request is GET, the Lambda function returns JSON that includes the information of the status of the custom resource that Application Auto Scaling controls. In this case, this custom resource is the Kinesis Data Stream.
  2. If the API request is PATCH, the Lambda function stores the new desired capacity in a DynamoDB table. The Lambda function then calls the UpdateShardCount API operation for the Kinesis Data Stream.
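
A minimal sketch of the scaling call behind the PATCH path could look like this; the stream name and desired capacity are placeholders for the values passed through API Gateway:

import boto3

kinesis = boto3.client("kinesis")

def scale_stream(stream_name, desired_capacity):
    # Resize the stream to the capacity requested by Application Auto Scaling.
    kinesis.update_shard_count(
        StreamName=stream_name,
        TargetShardCount=int(desired_capacity),
        ScalingType="UNIFORM_SCALING",
    )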

AWS Systems Manager Parameter Store – Stores the desired capacity value that Application Auto Scaling communicates, so that the stateless Lambda function can retrieve it between invocations.

Prerequisites

Prerequisites for this solution include the following:

  • User credentials with permissions that allow you to configure automatic scaling and create the required service-linked role. For more information, see the Application Auto Scaling User Guide.
  • Permissions to create a stack using an AWS CloudFormation template, plus full access permissions to resources within the stack. For more information, see the AWS CloudFormation User Guide.

Scaling policy and scheduled actions

You can use the same architecture to work in two different situations for your Amazon Kinesis Data Stream:

  1. The first is predictable traffic, which calls for scheduled actions. An example of predictable traffic is when your Kinesis Data Stream endpoint sees growing traffic in a specific time window. In this case, you can make sure that an Application Auto Scaling scheduled action increases the number of Kinesis Data Stream shards to meet the demand. For instance, you might increase the number of shards at 12:00 p.m. and decrease them at 8:00 p.m.
  2. The second is the classic on-demand scenario, which calls for a scaling policy. In this case, you create an Application Auto Scaling scaling policy that increases or decreases the number of Kinesis Data Stream shards to meet the client demand.

In this blog post, we are going to focus on the second scenario with the scaling policy, as we believe it is more challenging to implement.

Limitations

Application Auto Scaling can scale up and down continuously to make sure that you can meet your demand. However, Kinesis Data Streams have some limitations to consider when configuring Application Auto Scaling. With Kinesis Data Streams, you can’t do the following:

  • Scale more than twice for each rolling 24-hour period for each stream
  • Scale up to more than double your current shard count for a stream
  • Scale down below half your current shard count for a stream
  • Scale up to more than 500 shards in a stream
  • Scale a stream with more than 500 shards down unless the result is fewer than 500 shards
  • Scale up to more than the shard limit for your account

If you need to scale more often than these limits allow, you can use this AWS Support form to request an increase.

Choosing the metric

When choosing the metrics to monitor to scale up and down, we can use the stream-level metrics IncomingBytes and IncomingRecords, as described in the Kinesis Data Streams documentation. Each Kinesis shard supports writes of up to 1 MiB of data per second or 1,000 records per second. We can use IncomingBytes and IncomingRecords to set an alarm based on a threshold, let's say 80 percent. We do this to call the Application Auto Scaling service before Amazon Kinesis starts throttling our requests. This is the most effective method to proactively scale our resource. However, we need to set up the right cooldown period in Application Auto Scaling to avoid multiple scaling actions triggered by both metrics at the same time.

Alternatively, we can use the WriteProvisionedThroughputExceeded metric to scale when we reach the Amazon Kinesis shard limit, as described in the CloudWatch documentation.

In this example, we use the first approach, using IncomingRecords.
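
For instance, an alarm at 80 percent of a single shard's write limit of 1,000 records per second, evaluated over one-minute periods, could be created with a sketch like this; the stream and alarm names are placeholders:

import boto3

cloudwatch = boto3.client("cloudwatch")

# 1,000 records/s * 60 s * 0.8 = 48,000 records per one-minute period.
cloudwatch.put_metric_alarm(
    AlarmName="IncomingRecords-alarm-out",
    Namespace="AWS/Kinesis",
    MetricName="IncomingRecords",
    Dimensions=[{"Name": "StreamName", "Value": "MyKinesisStream"}],
    Statistic="Sum",
    Period=60,
    EvaluationPeriods=1,
    Threshold=48000,
    ComparisonOperator="GreaterThanOrEqualToThreshold",
)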

Deploying and testing the solution

To test the solution, we can use the AWS CloudFormation template found here. The AWS CloudFormation template automatically creates for you: the API Gateway, the Lambda function, the Kinesis Data Stream, the DynamoDB table, and the Application Auto Scaling group with its scaling policy.

Deploying the solution

To let AWS CloudFormation create these resources on your behalf:

  1. Open the AWS Management Console in the AWS Region you want to deploy the solution to, and on the Services menu, choose CloudFormation.
  2. Choose Create Stack, choose Upload a template to Amazon S3, and then choose the file custom-application-autoscaling-kinesis.yaml included in the solution.
  3. Give a friendly name to the stack. Specify the Amazon S3 bucket that contains the compressed version of the AWS Lambda function (index.py) included in the solution.
  4. For Options, you can specify tags for your stack and an optional IAM role to be used by AWS CloudFormation to create resources. If the role isn’t specified, a new role is created. You can also perform additional configuration for rollback settings and notification options.
  5. The review section shows a recap of the information. Be sure to select the two AWS CloudFormation acknowledgements to allow AWS CloudFormation to create resources with custom names on your behalf. Also, create a change set, because the AWS CloudFormation template includes the AWS::Serverless-2016-10-31 transform.
  6. Execute the change set to create the resources present in the stack.

Testing the solution

Now that the environment is created, test it. To manually fire the Amazon CloudWatch alarm, we must generate traffic to the stream. The Amazon Kinesis Data Generator is an efficient way to do this.

  1. First, follow this guide to set up your Amazon Kinesis Data Generator: https://awslabs.github.io/amazon-kinesis-data-generator/web/help.html
  2. After the generator is created, select the Region and the newly created Kinesis Data Stream, in our case Kinesis-MyKinesisStream-1MUOGAD9OBCJH.
  3. In Records per second, insert a value greater than 1000 if you have one shard. Otherwise, multiply this number by the number of shards (for instance, if you have two shards, 1500 * 2 = 3000).
  4. In the form, enter test, and then choose Send data.
  5. Now that the traffic is being generated, open the Amazon CloudWatch console, and in Alarms, choose Alarms.
  6. In the ALARM list, select IncomingRecords-alarm-out. Open the History tab at the bottom of the page to see that the alarm triggered Application Auto Scaling.

To verify that the number of open shards has been updated:

  1. Open the Amazon Kinesis console and select Data Streams, then select your Data Stream, in our case Kinesis-MyKinesisStream-1MUOGAD9OBCJH.
  2. In Details, it is possible to see that the number of shards increased to three, as shown in the following example:

Cleaning up the environment after testing

To clean up the environment after the testing, the procedure is straightforward. By removing the AWS CloudFormation stack, everything is removed, as follows:

  1. Open the AWS Management Console in the AWS Region that you want to deploy the solution to, and select the CloudFormation stack from the list.
  2. Choose Actions, and then choose Delete Stack.
  3. Optionally, delete the S3 bucket and the Lambda function that you created.

Conclusion

This post described how to use the Application Auto Scaling service to automatically scale Amazon Kinesis Data Streams. With the help of Amazon API Gateway, you can allow Application Auto Scaling to securely invoke the AWS Lambda function that interacts with the desired stream.


About the Authors

Giorgio Nobile works as Solutions Architect for Amazon Web Services in Italy. He works with enterprise customers and helps them to embrace the digital transformation. Giorgio’s field of expertise covers Big Data. In his free time, Giorgio loves playing with his two children and is addicted to DIY and snowboarding.

Diego Natali works as Solutions Architect for Amazon Web Services in Italy. With several years engineering background, he helps ISV and Start up customers designing flexible and resilient architectures using AWS services. In his spare time he enjoys watching movies and riding his dirt bike.

Learn about AWS – November AWS Online Tech Talks

Post Syndicated from Robin Park original https://aws.amazon.com/blogs/aws/learn-about-aws-november-aws-online-tech-talks/

AWS Tech Talks

AWS Online Tech Talks are live, online presentations that cover a broad range of topics at varying technical levels. Join us this month to learn about AWS services and solutions. We’ll have experts online to help answer any questions you may have.

Featured this month! Check out the tech talks: Virtual Hands-On Workshop: Amazon Elasticsearch Service – Analyze Your CloudTrail Logs, AWS re:Invent: Know Before You Go, and AWS Office Hours: Amazon GuardDuty Tips and Tricks.

Register today!

Note – All sessions are free and in Pacific Time.

Tech talks this month:

AR/VR

November 13, 2018 | 11:00 AM – 12:00 PM PT – How to Create a Chatbot Using Amazon Sumerian and Sumerian Hosts – Learn how to quickly and easily create a chatbot using Amazon Sumerian & Sumerian Hosts.

Compute

November 19, 2018 | 11:00 AM – 12:00 PM PT – Using Amazon Lightsail to Create a Database – Learn how to set up a database on your Amazon Lightsail instance for your applications or stand-alone websites.

November 21, 2018 | 09:00 AM – 10:00 AM PT – Save up to 90% on CI/CD Workloads with Amazon EC2 Spot Instances – Learn how to automatically scale a fleet of Spot Instances with Jenkins and the EC2 Spot Plug-In.

Containers

November 13, 2018 | 09:00 AM – 10:00 AM PT – Customer Showcase: How Portal Finance Scaled Their Containerized Application Seamlessly with AWS Fargate – Learn how to scale your containerized applications without managing servers and clusters, using AWS Fargate.

November 14, 2018 | 11:00 AM – 12:00 PM PT – Customer Showcase: How 99designs Used AWS Fargate and Datadog to Manage their Containerized Application – Learn how 99designs scales their containerized applications using AWS Fargate.

November 21, 2018 | 11:00 AM – 12:00 PM PT – Monitor the World: Meaningful Metrics for Containerized Apps and Clusters – Learn about metrics and tools you need to monitor your Kubernetes applications on AWS.

Data Lakes & Analytics

November 12, 2018 | 01:00 PM – 01:45 PM PT – Search Your DynamoDB Data with Amazon Elasticsearch Service – Learn the joint power of Amazon Elasticsearch Service and DynamoDB and how to set up your DynamoDB tables and streams to replicate your data to Amazon Elasticsearch Service.

November 13, 2018 | 01:00 PM – 01:45 PM PT – Virtual Hands-On Workshop: Amazon Elasticsearch Service – Analyze Your CloudTrail Logs – Get hands-on experience and learn how to ingest and analyze CloudTrail logs using Amazon Elasticsearch Service.

November 14, 2018 | 01:00 PM – 01:45 PM PT – Best Practices for Migrating Big Data Workloads to AWS – Learn how to migrate analytics, data processing (ETL), and data science workloads running on Apache Hadoop, Spark, and data warehouse appliances from on-premises deployments to AWS.

November 15, 2018 | 11:00 AM – 11:45 AM PT – Best Practices for Scaling Amazon Redshift – Learn about the most common scalability pain points with analytics platforms and see how Amazon Redshift can quickly scale to fulfill growing analytical needs and data volume.

Databases

November 12, 2018 | 11:00 AM – 11:45 AM PT – Modernize your SQL Server 2008/R2 Databases with AWS Database Services – As the end of extended support for SQL Server 2008/R2 nears, learn how AWS’s portfolio of fully managed, cost-effective databases and easy-to-use migration tools can help.

DevOps

November 16, 2018 | 09:00 AM – 09:45 AM PT – Build and Orchestrate Serverless Applications on AWS with PowerShell – Learn how to build and orchestrate serverless applications on AWS with AWS Lambda and PowerShell.

End-User Computing

November 19, 2018 | 01:00 PM – 02:00 PM PT – Work Without Workstations with AppStream 2.0 – Learn how to work without workstations and accelerate your engineering workflows using AppStream 2.0.

Enterprise & Hybrid

November 19, 2018 | 09:00 AM – 10:00 AM PT – Enterprise DevOps: New Patterns of Efficiency – Learn how to implement “Enterprise DevOps” in your organization through building a culture of inclusion, common sense, and continuous improvement.

November 20, 2018 | 11:00 AM – 11:45 AM PT – Are Your Workloads Well-Architected? – Learn how to measure and improve your workloads with AWS Well-Architected best practices.

IoT

November 16, 2018 | 01:00 PM – 02:00 PM PT – Pushing Intelligence to the Edge in Industrial Applications – Learn how GE uses AWS IoT for industrial use cases, including 3D printing and aviation.

Machine Learning

November 12, 2018 | 09:00 AM – 09:45 AM PT – Automate for Efficiency with Amazon Transcribe and Amazon Translate – Learn how you can increase efficiency and reach of your operations with Amazon Translate and Amazon Transcribe.

Mobile

November 20, 2018 | 01:00 PM – 02:00 PM PT – GraphQL Deep Dive – Designing Schemas and Automating Deployment – Get an overview of the basics of how GraphQL works and dive into different schema designs, best practices, and considerations for providing data to your applications in production.

re:Invent

November 9, 2018 | 08:00 AM – 08:30 AM PT – Episode 7: Getting Around the re:Invent Campus – Learn how to efficiently get around the re:Invent campus using our new mobile app technology. Make sure you arrive on time and never miss a session.

November 14, 2018 | 08:00 AM – 08:30 AM PT – Episode 8: Know Before You Go – Learn about all final details you need to know before you arrive in Las Vegas for AWS re:Invent!

Security, Identity & Compliance

November 16, 2018 | 11:00 AM – 12:00 PM PT – AWS Office Hours: Amazon GuardDuty Tips and Tricks – Join us for office hours and get the latest tips and tricks for Amazon GuardDuty from AWS Security experts.

Serverless

November 14, 2018 | 09:00 AM – 10:00 AM PT – Serverless Workflows for the Enterprise – Learn how to seamlessly build and deploy serverless applications across multiple teams in large organizations.

Storage

November 15, 2018 | 01:00 PM – 01:45 PM PT – Move From Tape Backups to AWS in 30 Minutes – Learn how to switch to cloud backups easily with AWS Storage Gateway.

November 20, 2018 | 09:00 AM – 10:00 AM PT – Deep Dive on Amazon S3 Security and Management – Amazon S3 provides some of the most enhanced data security features available in the cloud today, including access controls, encryption, security monitoring, remediation, and security standards and compliance certifications.