Tag Archives: Compute

Automating your lift-and-shift migration at no cost with CloudEndure Migration

Post Syndicated from Chris Munns original https://aws.amazon.com/blogs/compute/automating-your-lift-and-shift-migration-at-no-cost-with-cloudendure-migration/

This post is courtesy of Gonen Stein, Head of Product Strategy, CloudEndure 

Acquired by AWS in January 2019, CloudEndure offers a highly automated migration tool to simplify and expedite rehost (lift-and-shift) migrations. AWS recently announced that CloudEndure Migration is now available to all customers and partners at no charge.

Each free CloudEndure Migration license provides 90 days of use following agent installation. During this period, you can perform all migration steps: replicate your source machines, conduct tests, and perform a scheduled cutover to complete the migration.

Overview

In this post, I show you how to obtain free CloudEndure Migration licenses and how to use CloudEndure Migration to rehost a machine from an on-premises environment to AWS. Although I’ve chosen to focus on an on-premises-to-AWS use case, CloudEndure also supports migration from cloud-based environments. For those of you who are interested in advanced automation, I include information on how to further automate large-scale migration projects.

Understanding CloudEndure Migration

CloudEndure Migration works through an agent that you install on your source machines. No reboot is required, and there is no performance impact on your source environment. The CloudEndure Agent connects to the CloudEndure User Console, which issues an API call to your target AWS Region. The API call creates a Staging Area in your AWS account that is designated to receive replicated data.

CloudEndure Migration automated rehosting consists of three main steps :

  1. Installing the Agent: The CloudEndure Agent replicates entire machines to a Staging Area in your target.
  2. Configuration and testing: You configure your target machine settings and launch non-disruptive tests.
  3. Performing cutover: CloudEndure automatically converts machines to run natively in AWS.

The Staging Area comprises both lightweight Amazon EC2 instances that act as replication servers and staging Amazon EBS volumes. Each source disk maps to an identically sized EBS volume in the Staging Area. The replication servers receive data from the CloudEndure Agent running on the source machines and write this data onto staging EBS volumes. One replication server can handle multiple source machines replicating concurrently.

After all source disks copy to the Staging Area, the CloudEndure Agent continues to track and replicate any changes made to the source disks. Continuous replication occurs at the block level, enabling CloudEndure to replicate any application that runs on supported x86-based Windows or Linux operating systems via an installed agent.

When the target machines launch for testing or cutover, CloudEndure automatically converts the target machines so that they boot and run natively on AWS. This conversion includes injecting the appropriate AWS drivers, making appropriate bootloader changes, modifying network adapters, and activating operating systems using the AWS KMS. This machine conversion process generally takes less than a minute, irrespective of machine size, and runs on all launched machines in parallel.

CloudEndure Migration Architecture

CloudEndure Migration Architecture

Installing CloudEndure Agent

To install the Agent:

  1. Start your migration project by registering for a free CloudEndure Migration license. The registration process is quick–use your email address to create a username and password for the CloudEndure User Console. Use this console to create and manage your migration projects.

    After you log in to the CloudEndure User Console, you need an AWS access key ID and secret access key to connect your CloudEndure Migration project to your AWS account. To obtain these credentials, sign in to the AWS Management Console and create an IAM user.Enter your AWS credentials in the CloudEndure User Console.

  2. Configure and save your migration replication settings, including your migration project’s Migration Source, Migration Target, and Staging Area. For example, I selected the Migration Source: Other Infrastructure, because I am migrating on-premises machines. Also, I selected the Migration Target AWS Region: AWS US East (N. Virginia). The CloudEndure User Console also enables you to configure a variety of other settings after you select your source and target, such as subnet, security group, VPN or Direct Connect usage, and encryption.

    After you save your replication settings, CloudEndure prompts you to install the CloudEndure Agent on your source machines. In my example, the source machines consist of an application server, a web server, and a database server, and all three are running Debian GNU / Linux 9.

  3. Download the CloudEndure Agent Installer for Linux by running the following command:
    wget -O ./installer_linux.py https://console.cloudendure.com/installer_linux.py
  4. Run the Installer:
    sudo python ./installer_linux.py -t <INSTALLATION TOKEN> --no-prompt

    You can install the CloudEndure Agent locally on each machine. For large-scale migrations, use the unattended installation parameters with any standard deployment tool to remotely install the CloudEndure Agent on your machines.

    After the Agent installation completes, CloudEndure adds your source machines to the CloudEndure User Console. From there, your source machines undergo several initial replication steps. To obtain a detailed breakdown of these steps, in the CloudEndure User Console, choose Machines, and select a specific machine to open the Machine Details View page.

    details page

    Data replication consists of two stages: Initial Sync and Continuous Data Replication. During Initial Sync, CloudEndure copies all of the source disks’ existing content into EBS volumes in the Staging Area. After Initial Sync completes, Continuous Data Replication begins, tracking your source machines and replicating new data writes to the staging EBS volumes. Continuous Data Replication makes sure that your Staging Area always has the most up-to-date copy of your source machines.

  5. To track your source machines’ data replication progress, in the CloudEndure User Console, choose Machines, and see the Details view.
    When the Data Replication Progress status column reads Continuous Data Replication, and the Migration Lifecycle status column reads Ready for Testing, Initial Sync is complete. These statuses indicate that the machines are functioning correctly and are ready for testing and migration.

Configuration and testing

To test how your machine runs on AWS, you must configure the Target Machine Blueprint. This Blueprint is a set of configurations that define where and how the target machines are launched and provisioned, such as target subnet, security groups, instance type, volume type, and tags.

For large-scale migration projects, APIs can be used to configure the Blueprint for all of your machines within seconds.

I recommend performing a test at least two weeks before migrating your source machines, to give you enough time to identify potential problems and resolve them before you perform the actual cutover. For more information, see Migration Best Practices.

To launch machines in Test Mode:

  1. In the CloudEndure User Console, choose Machines.
  2. Select the Name box corresponding to each machine to test.
  3. Choose Launch Target Machines, Test Mode.Launch target test machines

After the target machines launch in Test Mode, the CloudEndure User Console reports those machines as Tested and records the date and time of the test.

Performing cutover

After you have completed testing, your machines continue to be in Continuous Data Replication mode until the scheduled cutover window.

When you are ready to perform a cutover:

  1. In the CloudEndure User Console, choose Machines.
  2. Select the Name box corresponding to each machine to migrate.
  3. Choose Launch Target Machines, Cutover Mode.
    To confirm that your target machines successfully launch, see the Launch Target Machines menu. As your data replicates, verify that the target machines are running correctly, make any necessary configuration adjustments, perform user acceptance testing (UAT) on your applications and databases, and redirect your users.
  4. After the cutover completes, remove the CloudEndure Agent from the source machines and the CloudEndure User Console.
  5. At this point, you can also decommission your source machines.

Conclusion

In this post, I showed how to rehost an on-premises workload to AWS using CloudEndure Migration. CloudEndure automatically converts your machines from any source infrastructure to AWS infrastructure. That means they can boot and run natively in AWS, and run as expected after migration to the cloud.

If you have further questions see CloudEndure Migration, or Registering to CloudEndure Migration.

Get started now with free CloudEndure Migration licenses.

Learn about AWS Services & Solutions – September AWS Online Tech Talks

Post Syndicated from Jenny Hang original https://aws.amazon.com/blogs/aws/learn-about-aws-services-solutions-september-aws-online-tech-talks/

Learn about AWS Services & Solutions – September AWS Online Tech Talks

AWS Tech Talks

Join us this September to learn about AWS services and solutions. The AWS Online Tech Talks are live, online presentations that cover a broad range of topics at varying technical levels. These tech talks, led by AWS solutions architects and engineers, feature technical deep dives, live demonstrations, customer examples, and Q&A with AWS experts. Register Now!

Note – All sessions are free and in Pacific Time.

Tech talks this month:

 

Compute:

September 23, 2019 | 11:00 AM – 12:00 PM PTBuild Your Hybrid Cloud Architecture with AWS – Learn about the extensive range of services AWS offers to help you build a hybrid cloud architecture best suited for your use case.

September 26, 2019 | 1:00 PM – 2:00 PM PTSelf-Hosted WordPress: It’s Easier Than You Think – Learn how you can easily build a fault-tolerant WordPress site using Amazon Lightsail.

October 3, 2019 | 11:00 AM – 12:00 PM PTLower Costs by Right Sizing Your Instance with Amazon EC2 T3 General Purpose Burstable Instances – Get an overview of T3 instances, understand what workloads are ideal for them, and understand how the T3 credit system works so that you can lower your EC2 instance costs today.

 

Containers:

September 26, 2019 | 11:00 AM – 12:00 PM PTDevelop a Web App Using Amazon ECS and AWS Cloud Development Kit (CDK) – Learn how to build your first app using CDK and AWS container services.

 

Data Lakes & Analytics:

September 26, 2019 | 9:00 AM – 10:00 AM PTBest Practices for Provisioning Amazon MSK Clusters and Using Popular Apache Kafka-Compatible Tooling – Learn best practices on running Apache Kafka production workloads at a lower cost on Amazon MSK.

 

Databases:

September 25, 2019 | 1:00 PM – 2:00 PM PTWhat’s New in Amazon DocumentDB (with MongoDB compatibility) – Learn what’s new in Amazon DocumentDB, a fully managed MongoDB compatible database service designed from the ground up to be fast, scalable, and highly available.

October 3, 2019 | 9:00 AM – 10:00 AM PTBest Practices for Enterprise-Class Security, High-Availability, and Scalability with Amazon ElastiCache – Learn about new enterprise-friendly Amazon ElastiCache enhancements like customer managed key and online scaling up or down to make your critical workloads more secure, scalable and available.

 

DevOps:

October 1, 2019 | 9:00 AM – 10:00 AM PT – CI/CD for Containers: A Way Forward for Your DevOps Pipeline – Learn how to build CI/CD pipelines using AWS services to get the most out of the agility afforded by containers.

 

Enterprise & Hybrid:

September 24, 2019 | 1:00 PM – 2:30 PM PT Virtual Workshop: How to Monitor and Manage Your AWS Costs – Learn how to visualize and manage your AWS cost and usage in this virtual hands-on workshop.

October 2, 2019 | 1:00 PM – 2:00 PM PT – Accelerate Cloud Adoption and Reduce Operational Risk with AWS Managed Services – Learn how AMS accelerates your migration to AWS, reduces your operating costs, improves security and compliance, and enables you to focus on your differentiating business priorities.

 

IoT:

September 25, 2019 | 9:00 AM – 10:00 AM PTComplex Monitoring for Industrial with AWS IoT Data Services – Learn how to solve your complex event monitoring challenges with AWS IoT Data Services.

 

Machine Learning:

September 23, 2019 | 9:00 AM – 10:00 AM PTTraining Machine Learning Models Faster – Learn how to train machine learning models quickly and with a single click using Amazon SageMaker.

September 30, 2019 | 11:00 AM – 12:00 PM PTUsing Containers for Deep Learning Workflows – Learn how containers can help address challenges in deploying deep learning environments.

October 3, 2019 | 1:00 PM – 2:30 PM PTVirtual Workshop: Getting Hands-On with Machine Learning and Ready to Race in the AWS DeepRacer League – Join DeClercq Wentzel, Senior Product Manager for AWS DeepRacer, for a presentation on the basics of machine learning and how to build a reinforcement learning model that you can use to join the AWS DeepRacer League.

 

AWS Marketplace:

September 30, 2019 | 9:00 AM – 10:00 AM PTAdvancing Software Procurement in a Containerized World – Learn how to deploy applications faster with third-party container products.

 

Migration:

September 24, 2019 | 11:00 AM – 12:00 PM PTApplication Migrations Using AWS Server Migration Service (SMS) – Learn how to use AWS Server Migration Service (SMS) for automating application migration and scheduling continuous replication, from your on-premises data centers or Microsoft Azure to AWS.

 

Networking & Content Delivery:

September 25, 2019 | 11:00 AM – 12:00 PM PTBuilding Highly Available and Performant Applications using AWS Global Accelerator – Learn how to build highly available and performant architectures for your applications with AWS Global Accelerator, now with source IP preservation.

September 30, 2019 | 1:00 PM – 2:00 PM PTAWS Office Hours: Amazon CloudFront – Just getting started with Amazon CloudFront and [email protected]? Get answers directly from our experts during AWS Office Hours.

 

Robotics:

October 1, 2019 | 11:00 AM – 12:00 PM PTRobots and STEM: AWS RoboMaker and AWS Educate Unite! – Come join members of the AWS RoboMaker and AWS Educate teams as we provide an overview of our education initiatives and walk you through the newly launched RoboMaker Badge.

 

Security, Identity & Compliance:

October 1, 2019 | 1:00 PM – 2:00 PM PTDeep Dive on Running Active Directory on AWS – Learn how to deploy Active Directory on AWS and start migrating your windows workloads.

 

Serverless:

October 2, 2019 | 9:00 AM – 10:00 AM PTDeep Dive on Amazon EventBridge – Learn how to optimize event-driven applications, and use rules and policies to route, transform, and control access to these events that react to data from SaaS apps.

 

Storage:

September 24, 2019 | 9:00 AM – 10:00 AM PTOptimize Your Amazon S3 Data Lake with S3 Storage Classes and Management Tools – Learn how to use the Amazon S3 Storage Classes and management tools to better manage your data lake at scale and to optimize storage costs and resources.

October 2, 2019 | 11:00 AM – 12:00 PM PTThe Great Migration to Cloud Storage: Choosing the Right Storage Solution for Your Workload – Learn more about AWS storage services and identify which service is the right fit for your business.

 

 

Running AWS Infrastructure On Premises with AWS Outposts

Post Syndicated from Matt Garman original https://aws.amazon.com/blogs/compute/running-aws-infrastructure-on-premises-with-aws-outposts/

We announced AWS Outposts at re:Invent last December and since then have seen immense customer interest. Customers have been asking for an AWS option on-premises to run applications with low latency and local data-processing requirements. AWS Outposts is a new service slated to launch in late 2019, that brings the same infrastructure, APIs, and tools that customers use in AWS to virtually any customer on-premises facility. It is a fully managed service; the physical infrastructure is delivered and installed by AWS, operated and monitored by AWS, and automatically updated and patched as part of being connected to an AWS Region.

You can use AWS Outposts to launch a range of Amazon EC2 instances (C5, M5, R5, I3en and G4, both with and without local storage options) and Amazon EBS volumes locally. In addition to EC2 and EBS, you can also run a wide range of AWS services locally on Outposts. At general availability, AWS services supported locally on Outposts will include Amazon ECS and Amazon EKS clusters for container-based applications, Amazon EMR clusters for data analytics, and Amazon RDS instances for relational database services. Additional services such as Amazon SageMaker and Amazon MSK are coming soon after launch.

An Outpost works as an extension of the AWS Region into your own data center; services running on the Outpost can seamlessly work with any AWS service or resource running in the cloud. For example, you can use private connectivity to your Amazon S3 buckets or Amazon DynamoDB tables in the public region. Amazon tools will work with Outposts as well. API calls will be logged via CloudTrail automatically and existing CloudFormation templates will work as well. When AWS launches new innovations, they will work with Outposts so customers can always take advantage of the latest technologies.

We are speaking with customers across verticals including manufacturing, healthcare, financial services, media and entertainment, and telecom who are interested in Outposts for their connected environments. One of the most common scenarios is applications that need single-digit millisecond latency to end-users or onsite equipment. Customers may need to run compute-intensive workloads on their manufacturing factory floors with precision and quality. Others have graphics-intensive applications such as image analysis that need low-latency access to end-users or storage-intensive workloads that collect and process hundreds of TBs of data a day. Customers want to integrate their cloud deployments with their on-premises environments and use AWS services for a consistent hybrid experience.

Some customers are also interested in leveraging AWS services in disconnected environments like cruise ships or remote mining locations. For these types of environments, AWS offers Snowball Edge, which is optimized to operate in environments with limited to no connectivity. Compared to Snowball Edge, Outposts are designed to be run exclusively in connected, on-premises environments.

I want to share an example of how an early user of Outposts is using it to control and operate industrial equipment at hundreds of sites around the world. They already run centralized decision-making applications in AWS to identify what work to execute at which site. Predictable low-latency access to local compute resources is essential for their on-premises control systems to manage materials with smoothness and speed. For instance, control systems need to process video streams to sense the product on the conveyor belt, and execute a robotic movement to direct the product to the right location. Their sites also run video monitoring applications where the captured data can exceed available bandwidth so they want to conduct video encoding on-premises.

We worked with our early customer to deploy an Outpost rack at one of their sites. After connecting their Outpost to the nearest local AWS Region, the customer has complete control over their virtual network, including selection of an IP address range, creation of subnets, and configuration of route tables and network gateways, just like with their Amazon VPC today. They can seamlessly extend their regional VPC to the Outpost by creating a subnet and associating it with the Outpost the same way they associate subnets with an Availability Zone today.

To launch instances on their Outposts, they use the exact same API call as they do in the public region, but target their Outpost subnet so the instances launch on the Outpost in their facility. These instances will run in their existing VPC and even be able to communicate with instances running in the public region via private IP addresses. As the infrastructure is the same hardware as what we use in our public AWS Region, the applications running on these instances will perform the same as they do in the public region. To get low-latency access to local compute and storage already running in their facility, the customer can create a local gateway in their VPC that makes it simple for the Outpost to route traffic directly to their local datacenter networks.

Using Outposts, the customer plans to standardize tooling across on-premises and the cloud, and automate deployments and configurations across hundreds of sites by using the same APIs, the same IAM permissions, the same EC2 AMIs, the same CloudFormation templates, and the same deployment pipelines everywhere. As Outposts are updated and patched as part of AWS regional operations, they no longer need to upgrade and patch their on-premises infrastructure or take downtime for maintenance.

We are excited about the enthusiasm we’re seeing from customers, and are eager to help them focus on innovating for their end-users without worrying about procuring, deploying, and operating the infrastructure for their applications. With AWS Outposts, we are bringing the same AWS infrastructure, APIs, services, and tools to help customers build applications that can run as reliably and securely at any of their sites as in the cloud. We recently released a new video that shares more details about the Outposts rack. If you haven’t already seen it, go check it out! As always, we look forward to your feedback.

To learn more about AWS Outposts, visit the product detail page.

Additional Resources:

What is an AWS Outposts Rack?

AWS Nitro System

Sharing automated blueprints for Amazon ECS continuous delivery using AWS Service Catalog

Post Syndicated from Ignacio Riesgo original https://aws.amazon.com/blogs/compute/sharing-automated-blueprints-for-amazon-ecs-continuous-delivery-using-aws-service-catalog/

This post is contributed by Mahmoud ElZayet | Specialist SA – Dev Tech, AWS

 

Modern application development processes enable organizations to improve speed and quality continually. In this innovative culture, small, autonomous teams own the entire application life cycle. While such nimble, autonomous teams speed product delivery, they can also impose costs on compliance, quality assurance, and code deployment infrastructures.

Standardized tooling and application release code helps share best practices across teams, reduce duplicated code, speed on-boarding, create consistent governance, and prevent resource over-provisioning.

 

Overview

In this post, I show you how to use AWS Service Catalog to provide standardized and automated deployment blueprints. This helps accelerate and improve your product teams’ application release workflows on Amazon ECS. Follow my instructions to create a sample blueprint that your product teams can use to release containerized applications on ECS. You can also apply the blueprint concept to other technologies, such as serverless or Amazon EC2–based deployments.

The sample templates and scripts provided here are for demonstration purposes and should not be used “as-is” in your production environment. After you become familiar with these resources, create customized versions for your production environment, taking account of in-house tools and team skills, as well as all applicable standards and restrictions.

 

Prerequisites

To use this solution, you need the following resources:

 

Sample scenario

Example Corp. has various product teams that develop applications and services on AWS. Example Corp. teams have expressed interest in deploying their containerized applications managed by AWS Fargate on ECS. As part of Example Corp’s central tooling team, you want to enable teams to quickly release their applications on Fargate. However, you also make sure that they comply with all best practices and governance requirements.

For convenience, I also assume that you have supplied product teams working on the same domain, application, or project with a shared AWS account for service deployment. Using this account, they all deploy to the same ECS cluster.

In this scenario, you can author and provide these teams with a shared deployment blueprint on ECS Fargate. Using AWS Service Catalog, you can share the blueprint with teams as follows:

  1. Every time that a product team wants to release a new containerized application on ECS, they retrieve a new AWS Service Catalog ECS blueprint product. This enables them to obtain the required infrastructure, permissions, and tools. As a prerequisite, the ECS blueprint requires building blocks such as a git repository or an AWS CodeBuild project. Again, you can acquire those blocks through another AWS Service Catalog product.
  2. The product team completes the ECS blueprint’s required parameters, such as the desired number of ECS tasks and application name. As an administrator, you can constrain the value of some parameters such as the VPC and the cluster name. For more information, see AWS Service Catalog Template Constraints.
  3. The ECS blueprint product deploys all the required ECS resources, configured according to best practices. You can also use the AWS Cloud Development Kit (CDK) to maintain and provision pre-defined constructs for your infrastructure.
  4. A standardized CI/CD pipeline also generates, enabling your product teams to publish their application to ECS automatically. Ideally, this pipeline should have all stages, practices, security checks, and standards required for application release. Product teams must still author application code, create a Dockerfile, build specifications, run automated tests and deployment scripts, and complete other tasks required for application release.
  5. The ECS blueprint can be continually updated based on organization-wide feedback and to support new use cases. Your product team can always access the latest version through AWS Service Catalog. I recommend retaining multiple, customizable blueprints for various technologies.

 

For simplicity’s sake, my explanation envisions your environment as consisting of one AWS account. In practice, you can use IAM controls to segregate teams’ access to each other’s resources, even when they share an account. However, I recommend having at least two AWS accounts, one for testing and one for production purposes.

To see an example framework that helps deploy your AWS Service Catalog products to multiple accounts, see AWS Deployment Framework (ADF). This framework can also help you create cross-account pipelines that cater to different product teams’ needs, even when these teams deploy to the same technology stack.

To set up shared deployment blueprints for your production teams, follow the steps outlined in the following sections.

 

Set up the environment

In this section, I explain how to create a central ECS cluster in the appropriate VPC where teams can deploy their containers. I provide an AWS CloudFormation template to help you set up these resources. This template also creates an IAM role to be used by AWS Service Catalog later.

To run the CloudFormation template:

1. Use a git client to clone the following GitHub repository to a local directory. This will be the directory where you will run all the subsequent AWS CLI commands.

2. Using the AWS CLI, run the following commands. Replace <Application_Name> with a lowercase string with no spaces representing the application or microservice that your product team plans to release—for example, myapp.

aws cloudformation create-stack --stack-name "fargate-blueprint-prereqs" --template-body file://environment-setup.yaml --capabilities CAPABILITY_NAMED_IAM --parameters ParameterKey=ApplicationName,ParameterValue=<Application_Name>

3. Keep running the following command until the output reads CREATE_COMPLETE:

aws cloudformation describe-stacks --stack-name "fargate-blueprint-prereqs" --query Stacks[0].StackStatus

4. In case of error, use the describe-events CLI command or review error details on the console.

5. When the stack creation reads CREATE_COMPLETE, run the following command, and make a note of the output values in an editor of your choice. You need this information for a later step:

aws cloudformation describe-stacks  --stack-name fargate-blueprint-prereqs --query Stacks[0].Outputs

6. Run the following commands to copy those CloudFormation templates to Amazon S3. Replace <Template_Bucket_Name> with the template bucket output value you just copied into your editor of choice:

aws s3 cp core-build-tools.yml s3://<Template_Bucket_Name>/core-build-tools.yml

aws s3 cp ecs-fargate-deployment-blueprint.yml s3://<Template_Bucket_Name>/ecs-fargate-deployment-blueprint.yml

Create AWS Service Catalog products

In this section, I show you how to create two AWS Service Catalog products for teams to use in publishing their containerized app:

  1. Core Build Tools
  2. ECS Fargate Deployment Blueprint

To create an AWS Service Catalog portfolio that includes these products:

1. Using the AWS CLI, run the following command, replacing <Application_Name>
with the application name you defined earlier and replacing <Template_Bucket_Name>
with the template bucket output value you copied into your editor of choice:

aws cloudformation create-stack --stack-name "fargate-blueprint-catalog-products" --template-body file://catalog-products.yaml --parameters ParameterKey=ApplicationName,ParameterValue=<Application_Name> ParameterKey=TemplateBucketName,ParameterValue=<Template_Bucket_Name>

2. After a few minutes, check the stack creation completion. Run the following command until the output reads CREATE_COMPLETE:

aws cloudformation describe-stacks --stack-name "fargate-blueprint-catalog-products" --query Stacks[0].StackStatus

3. In case of error, use the describe-events CLI command or check error details in the console.

Your AWS Service Catalog configuration should now be ready.

 

Test product teams experience

In this section, I show you how to use IAM roles to impersonate a product team member and simulate their first experience of containerized application deployment.

 

Assume team role

To assume the role that you created during the environment setup step

1.     In the Management console, follow the instructions in Switching a Role.

  • For Account, enter the account ID used in the sample solution. To learn more about how to find an AWS account ID, see Your AWS Account ID and Its Alias.
  • For Role, enter <Application_Name>-product-team-role, where <Application_Name> is the same application name you defined in Environment Setup section.
  • (Optional) For Display name, enter a custom session value.

You are now logged in as a member of the product team.

 

Provision core build product

Next, provision the core build tools for your blueprint:

  1. In the Service Catalog console, you should now see the two products created earlier listed under Products.
  2. Select the first product, Core Build Tools.
  3. Choose LAUNCH PRODUCT.
  4. Name the product something such as <Application_Name>-build-tools, replacing <Application_Name> with the name previously defined for your application.
  5. Provide the same application name you defined previously.
  6. Leave the ContainerBuild parameter default setting as yes, as you are building a container requiring a container repository and its associated permissions.
  7. Choose NEXT three times, then choose LAUNCH.
  8. Under Events, watch the Status property. Keep refreshing until the status reads Succeeded. In case of failure, choose the URL value next to the key CloudformationStackARN. This choice takes you to the CloudFormation console, where you can find more information on the errors.

Now you have the following build tools created along with the required permissions:

  • AWS CodeCommit repository to store your code
  • CodeBuild project to build your container image and test your application code
  • Amazon ECR repository to store your container images
  • Amazon S3 bucket to store your build and release artifacts

 

Provision ECS Fargate deployment blueprint

In the Service Catalog console, follow the same steps to deploy the blueprint for ECS deployment. Here are the product provisioning details:

  • Product Name: <Application_Name>-fargate-blueprint.
  • Provisioned Product Name: <Application_Name>-ecs-fargate-blueprint.
  • For the parameters Subnet1, Subnet2, VpcId, enter the output values you copied earlier into your editor of choice in the Setup Environment section.
  • For other parameters, enter the following:
    • ApplicationName: The same application name you defined previously.
    • ClusterName: Enter the value example-corp-ecs-cluster, which is the name chosen in the template for the central cluster.
  • Leave the DesiredCount and LaunchType parameters to their default values.

After the blueprint product creation completes, you should have an ECS service with a sample task definition for your product team. The build tools created earlier include the permissions required for deploying to the ECS service. Also, a CI/CD pipeline has been created to guide your product teams as they publish their application to the ECS service. Ideally, this pipeline should have all stages, practices, security checks, and standards required for application release.

Product teams still have to author application code, create a Dockerfile, build specifications, run automated tests and deployment scripts, and perform other tasks required for application release. The blueprint product can provide wiki links to reference examples for these steps, or access to pre-provisioned sample pipelines.

 

Test your pipeline

Now, upload a sample app to test your pipeline:

  1. Log in with the product team role.
  2. In the CodeCommit console, select the repository with the application name that you defined in the environment setup section.
  3. Scroll down, choose Add file, Create file.
  4. Paste the following in the page editor, which is a script to build the container image and push it to the ECR repository:
version: 0.2
phases:
  pre_build:
    commands:
      - $(aws ecr get-login --no-include-email)
      - TAG="$(echo $CODEBUILD_RESOLVED_SOURCE_VERSION | head -c 8)"
      - IMAGE_URI="${REPOSITORY_URI}:${TAG}"
  build:
    commands:
      - docker build --tag "$IMAGE_URI" .
  post_build:
    commands:
      - docker push "$IMAGE_URI"      
      - printf '[{"name":"%s","imageUri":"%s"}]' "$APPLICATION_NAME" "$IMAGE_URI" > images.json
artifacts:
  files: 
    - images.json
    - '**/*'

5. For File name, enter buildspec.yml.

6. For Author name and Email address, enter your name and your preferred email address for the commit. Although optional, the addition of a commit message is a good practice.

7. Choose Commit changes.

8. Repeat the same steps for the Dockerfile. The sample Dockerfile creates a straightforward PHP application. Typically, you add your application content to that image.

File name: Dockerfile

File content:

FROM ubuntu:12.04

# Install dependencies
RUN apt-get update -y
RUN apt-get install -y git curl apache2 php5 libapache2-mod-php5 php5-mcrypt php5-mysql

# Configure apache
RUN a2enmod rewrite
RUN chown -R www-data:www-data /var/www
ENV APACHE_RUN_USER www-data
ENV APACHE_RUN_GROUP www-data
ENV APACHE_LOG_DIR /var/log/apache2

EXPOSE 80

CMD ["/usr/sbin/apache2", "-D",  "FOREGROUND"]

Your pipeline should now be ready to run successfully. Although you can list all current pipelines in the Region, you can only describe and modify pipelines that have a prefix matching your application name. To confirm:

  1. In the AWS CodePipeline console, select the pipeline <Application_Name>-ecs-fargate-pipeline.
  2. The pipeline should now be running.

Because you performed two commits to the repository from the console, you must wait for the second run to complete before successful deployment to ECS Fargate.

 

Clean up

To clean up the environment, run the following commands in the AWS CLI, replacing <Application_Name>
with your application name, <Account_Id> with your AWS Account ID with no hyphens and <Template_Bucket_Name>
with the template bucket output value you copied into your editor of choice:

aws ecr delete-repository --repository-name <Application_Name> --force

aws s3 rm s3://<Application_Name>-artifactbucket-<Account_Id> --recursive

aws s3 rm s3://<Template_Bucket_Name> --recursive

 

To remove the AWS Service Catalog products:

  1. Log in with the Product team role
  2. In the console, follow the instructions at Deleting Provisioned Products.
  3. Delete the AWS Service Catalog products in reverse order, starting with the blueprint product.

Run the following commands to delete the administrative resources:

aws cloudformation delete-stack --stack-name fargate-blueprint-catalog-products

aws cloudformation delete-stack --stack-name fargate-blueprint-prereqs

Conclusion

In this post, I showed you how to design and build ECS Fargate deployment blueprints. I explained how these accelerate and standardize the release of containerized applications on AWS. Your product teams can keep getting the latest standards and coded best practices through those automated blueprints.

As always, AWS welcomes feedback. Please submit comments or questions below.

Optimizing NGINX load balancing on Amazon EC2 A1 instances

Post Syndicated from Betsy Chernoff original https://aws.amazon.com/blogs/compute/optimizing-nginx-load-balancing-on-amazon-ec2-a1-instances/

This post is contributed by Geoff Blake | Sr System Development Engineer

In a previous post, Optimizing Network Intensive Workloads on Amazon EC2 A1 Instances, I provided general guidance on tuning network-intensive workloads on A1 instances using Memcached as the example use case.

NGINX is another network-intensive application that, like Memcached, is a good fit for the A1 instances. This post describes how to configure NGINX as a load balancer on A1 for optimal performance using Amazon Linux 2, highlighting the important tuning parameters. You can extract up to a 30% performance benefit with these tunings over the default configuration on A1. However, depending on the particular data rates, processing required per request, instance size, and chosen AMI, the values in this post could change for your particular scenario. However, the methodologies described here are still applicable.

IRQ affinity and receive packet steering

Turning off irqbalance, pinning IRQs, and distributing network processing to specific cores helps the performance of NGINX when it runs on A1 instances.

Unlike Memcached in the previous post, NGINX does not benefit from using receive packet steering (RPS) to spread network processing among all cores or isolating processing to a subset of cores. It is better to allow NGINX access to all cores, while keeping IRQs and network processing isolated to a subset of cores.

Using a modified version of the script from the previous post, you can set IRQs, RPS, and NGINX workers to the following mappings. Finding the optimal balance of IRQs, RPS, and worker mappings can benefit performance by up to 10% on a1.4xlarge instances.

Instance typeIRQ settingsRPS settingsNGINX workers
a1.2xlargeCore 0, 4Core 0, 4Run on cores 0-7
a1.4xlargeCore 0, 8Core 0, 8Run on cores 0-15

NGINX access logging

For production deployments of NGINX, logging is critically important for monitoring the health of servers and debugging issues.

On large a1.4xlarge instance types, logging each request can become a performance bottleneck in certain situations. To alleviate this issue, tune the access_log configuration with the buffer modifier. Only a small amount of buffering is necessary to avoid logging becoming a bottleneck, on the order of 8 KB. This tuning parameter alone can give a significant boost of 20% or more on a1.4xlarge instances, depending on the traffic profile.

Additional Linux tuning parameters

The Linux networking stack is tuned to conserve resources for general use cases. When running NGINX as a load balancer, the server must be tuned to give the network stack considerably more resources than the default amount, for the prevention of dropped connections and packets. Here are the key parameters to tune:

  • core.somaxcon: Maximum number of backlogged sockets allowed. Increase this to 4096 or more to prevent dropping connection requests.
  • ipv4.tcp_max_syn_backlog: Maximum number of outstanding SYN requests. Set this to the same value as net.core.somaxconn.
  • ipv4.ip_local_port_range: To avoid prematurely running out of connections with clients, set to a larger ephemeral port range of 1024 65535.
  • core.rmem_max, net.core.wmem_max, net.ipv4.tcp_rmem, net.ipv4.tcp_wmem, net.ipv4.tcp_mem: Socket and TCP buffer settings. Tune these to be larger than the default. Setting the maximum buffer sizes to 8 MB should be sufficient.

Additional NGINX configuration parameters

To extract the most out of an NGINX load balancer on A1 instances, set the following  NGINX parameters higher than their default values:

 

  • worker_processes: Keeping this set to the default of auto works well on A1.
  • worker_rlimit_nofile: Set this to a high value such as 65536 to allow many connections and access to files.
  • worker_connections: Set this to a high value such as 49152 to cover most of the ephemeral port range.
  • keepalive_requests: The number of requests that a downstream client can make before a connection is closed. Setting this to a reasonably high number such as 10000 helps prevent connection churn and ephemeral port exhaustion.
  • Keepalive: Set to a value that covers your total number of backends plus expected growth, such as 100 in your upstream blocks to keep connections open to your backends from each worker process.

Summary

Using the above tuning parameters versus the defaults that come with Amazon Linux 2 and NGINX can significantly increase the performance of an NGINX load balancing workload by up to 30% on the a1.4xlarge instance type. Similar, but less dramatic performance gains were seen on the smaller a1.2xlarge instance type as well. If you have questions about your own workload running on A1 instances, contact us at [email protected].

 

Deploying GitOps with Weave Flux and Amazon EKS

Post Syndicated from Ignacio Riesgo original https://aws.amazon.com/blogs/compute/deploying-gitops-with-weave-flux-and-amazon-eks/

This post is contributed by Jon Jozwiak | Senior Solutions Architect, AWS

 

You have countless options for deploying resources into an Amazon EKS cluster. GitOps—a term coined by Weaveworks—provides some substantial advantages over the alternatives. With only Git as the single, central source for controlling deployment into your cluster, GitOps provides easy version control on a platform your team already knows. Getting started with GitOps is straightforward: create a pull request, merge, and the configuration deploys to the EKS cluster.

Weave Flux makes running GitOps in your EKS cluster fast and easy, as it monitors your configuration in Git and image repositories and automates deployments. Weave Flux follows a pull model, automatically triggering deployments based on changes. This provides better security than most continuous deployment tools, which need permissions to access your cluster. This approach also provides Git with version control over your configuration and enables rollback.

This post walks through implementing Weave Flux and deploying resources to EKS using Git. To simplify the image build pipeline, I use AWS Service Catalog to provide a standardized pipeline. AWS Service Catalog lets you centrally define a portfolio of approved products that AWS users can provision. An AWS CloudFormation template defines each product, which can be version-controlled.

After you deploy the sample resources, I quickly demonstrate the GitOps approach where a new image results in the configuration automatically deploying to EKS. This new image may be a commit of Kubernetes manifests or a commit of Helm release definitions.

The following diagram shows the workflow.

Prerequisites

In GitOps, you manage Docker image builds separately from deployment configuration. For image builds, this example uses AWS CodePipeline and AWS CodeBuild, which provide a managed workflow from GitHub source through to an image landing in Amazon Elastic Container Registry (ECR).

This post assumes that you already have an EKS cluster deployed, including kubectl access. It also assumes that you have a GitHub account.

GitHub setup

First, create a GitHub repository to store the Kubernetes manifests (configuration files) to apply to the cluster.

In GitHub, create a GitHub repository. This repository holds Kubernetes manifests for your deployments. Name the repository k8s-config to align with this post. Leave it as a public repository, check the box for Initialize this repository with a README, and choose Create Repo.

On the GitHub repository page, choose Clone or Download and save the SSH string:

[email protected]:youruser/k8s-config.git

Next, create a GitHub token that allows creating and deleting repositories so AWS Service Catalog can deploy and remove pipelines.

  1. In your GitHub profile, access your token settings.
  2. Choose Generate New Token.
  3. Name your new token CodePipeline Service Catalog, and select the following options:
  • repo scopes (repo:status, repo_deployment, public_repo, and repo:invite)
  • read:org
  • write:public_key and read:public_key
  • write:repo_hook and read:repo_hook
  • read:user and user:email
  • delete_repo

4 . Choose Generate Token.

5. Copy and save your access token for future access.

 

Deploy Helm

Helm is a package manager for Kubernetes that allows you to define a chart. Charts are collections of related resources that let you create, version, share, and publish applications. By deploying Helm into your cluster, you make it much easier to deploy Weave Flux and other systems. If you’ve deployed Helm already, skip this section.

First, install the Helm client with the following command:

curl -LO https://git.io/get_helm.sh

chmod 700 get_helm.sh

./get_helm.sh

 

On macOS, you could alternatively enter the following command:

brew install kubernetes-helm

 

Next, set up a service account with cluster role for Tiller, Helm’s server-side component. This allows Tiller to manage resources in your cluster.

kubectl -n kube-system create sa tiller

kubectl create clusterrolebinding tiller-cluster-rule \

--clusterrole=cluster-admin \

--serviceaccount=kube-system:tiller

 

Finally, initialize Helm and verify your version. Tiller takes a few seconds to start.

helm init --service-account tiller --history-max 200

helm version

 

Deploy Weave Flux

With Helm installed, proceed with the Weave Flux installation. Begin by installing the Flux Custom Resource Definition.

kubectl apply -f https://raw.githubusercontent.com/fluxcd/flux/helm-0.10.1/deploy-helm/flux-helm-release-crd.yaml

Now add the Weave Flux Helm repository and proceed with the install. Make sure that you update the git.url to match the GitHub repository that you created earlier.

helm repo add fluxcd https://charts.fluxcd.io

helm upgrade -i flux --set helmOperator.create=true --set helmOperator.createCRD=false --set [email protected]:YOURUSER/k8s-config --namespace flux fluxcd/flux

 

You can use the following code to verify that you successfully deployed Flux. You should see three pods running:

kubectl get pods -n flux

NAME                                 READY     STATUS    RESTARTS   AGE

flux-5bd7fb6bb6-4sc78                1/1       Running   0          52s

flux-helm-operator-df5746688-84kw8   1/1       Running   0          52s

flux-memcached-6f8c446979-f45wj      1/1       Running   0          52s

 

Flux requires a deploy key to work with the GitHub repository. In this post, Flux generates the SSH key pair itself, but you can also specify a different key pair when deploying. To access the key, download fluxctl, a command line utility that interacts with the Flux API. The following steps work for Linux. For other OS platforms, see Installing fluxctl.

sudo wget -O /usr/local/bin/fluxctl https://github.com/fluxcd/flux/releases/download/1.14.1/fluxctl_linux_amd64

sudo chmod 755 /usr/local/bin/fluxctl

 

Validate that fluxctl installed successfully, then retrieve the public key pair using the following command. Specify the namespace where you deployed Flux.

fluxctl version

fluxctl --k8s-fwd-ns=flux identity

 

Copy the key and add that as a deploy key in your GitHub repository.

  1. In your GitHub repository, choose Settings, Deploy Keys.
  2. Choose Add deploy key and name the key Flux Deploy Key.
  3. Paste the key from fluxctl identity.
  4. Choose Allow Write Access, Add Key.

Now use AWS Service Catalog to set up your image build pipeline.

 

Set up AWS Service Catalog

To allow end users to consume product portfolios, you must associate a portfolio with an IAM principal (or principals): a user, group, or role. For this example, associate your current identity. After you master these basics, there are additional resources to teach you how to set up a multi-region, multi-account catalog.

To retrieve your current identity, use the AWS CLI to get your ARN:

aws sts get-caller-identity

Deploy the product portfolio that contains an image build pipeline service by doing the following:

  1. In the AWS CloudFormation console, launch the CloudFormation stack with the following link:

 

 

2. Choose Next.

3. On the Specify Details page, enter your ARN from get-caller-identity. Also enter an environment tag, which AWS applies to all resources from this portfolio.

4. Choose Next.

5. On the Options page, choose Next.

6. On the Review page, select the check box displayed next to I acknowledge that AWS CloudFormation might create IAM resources.

7. Choose Create. CloudFormation takes a few minutes to create your resources.

 

Deploy the image pipeline

The image pipeline provisions a GitHub repository, Amazon ECR repository, and AWS CodeBuild project. It also uses AWS CodePipeline to build a Docker image.

  1. In the AWS Management Console, go to the AWS Service Catalog products list and choose Pipeline for Docker Images.
  2. Choose Launch Product.
  3. For Name, enter ExamplePipeline, and choose Next.
  4. On the Parameters page, fill in a project name, description, and unique S3 bucket name. The specifics don’t matter, but make a note of the name and S3 bucket for later use.
  5. Fill in your GitHub User and GitHub Token values from earlier. Leave the rest of the fields as the default values.
  6. To clean up your GitHub repository on stack delete, change Delete Repository to true.
  7. Choose Next.
  8. On the TagOptions screen, choose Next.
  9. Choose Next on the Notifications page.
  10. On the Review page, choose Launch.

The launch process takes 1–2 minutes. You can verify that you now have a repository matching your project name (eks-example) in GitHub. You can also look at the pipeline created in the AWS CodePipeline console.

 

Deploying with GitOps

You can now provision workloads into the EKS cluster. With a GitOps approach, you only commit code and Kubernetes resource definitions to GitHub. AWS CodePipeline handles the image builds, and Weave Flux applies the desired state to Kubernetes.

First, create a simple Hello World application in your example pipeline. Clone the GitHub repository that you created in the previous step and substitute your GitHub user below.

git clone [email protected]:youruser/eks-example.git

cd eks-example

Create a base README file, a source directory, and download a simple NGINX configuration (hello.conf), home page (index.html), and Dockerfile.

echo "# eks-example" > README.md

mkdir src

wget -O src/hello.conf https://blog-gitops-eks.s3.amazonaws.com/hello.conf

wget -O src/index.html https://blog-gitops-eks.s3.amazonaws.com/index.html

wget https://blog-gitops-eks.s3.amazonaws.com/Dockerfile

 

Now that you have a simple Hello World app with Dockerfile, commit the changes to kick off the pipeline.

git add .

git commit -am "Initial commit"

[master (root-commit) d69a6ba] Initial commit

4 files changed, 34 insertions(+)

create mode 100644 Dockerfile

create mode 100644 README.md

create mode 100644 src/hello.conf

create mode 100644 src/index.html

git push

 

Watch in the AWS CodePipeline console to see the image build in process. This may take a minute to start. When it’s done, look in the ECR console to see the first version of the container image.

To deploy this image and the Hello World application, commit Kubernetes manifests for Flux. Create a namespace, deployment, and service in the Kubernetes Git repository (k8s-config) you created. Make sure that you aren’t in your eks-example repository directory.

cd ..

git clone [email protected]:youruser/k8s-config.git

cd k8s-config

mkdir charts namespaces releases workloads

 

The preceding directory structure helps organize the repository but isn’t necessary. Flux can descend into subdirectories and look for YAML files to apply.

Create a namespace Kubernetes manifest.

cat << EOF > namespaces/eks-example.yaml
apiVersion: v1
kind: Namespace
metadata:
  labels:
    name: eks-example
  name: eks-example
EOF

Now create a deployment manifest. Make sure that you update this image to point to your repository and image tag. For example, <Account ID>.dkr.ecr.us-east-1.amazonaws.com/eks-example:d69a6bac.

cat << EOF > workloads/eks-example-dep.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: eks-example
  namespace: eks-example
  labels:
    app: eks-example
  annotations:
    # Container Image Automated Updates
    flux.weave.works/automated: "true"
    # do not apply this manifest on the cluster
    #flux.weave.works/ignore: "true"
spec:
  replicas: 1
  selector:
    matchLabels:
      app: eks-example
  template:
    metadata:
      labels:
        app: eks-example
    spec:
      containers:
      - name: eks-example
        image: <Your Account>.dkr.ecr.us-east-1.amazonaws.com/eks-example:d69a6bac
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 80
          name: http
          protocol: TCP
        livenessProbe:
          httpGet:
            path: /
            port: http
        readinessProbe:
          httpGet:
            path: /
            port: http
EOF

 

Finally, create a service manifest to create a load balancer.

cat << EOF > workloads/eks-example-svc.yaml
apiVersion: v1
kind: Service
metadata:
  name: eks-example
  namespace: eks-example
  labels:
    app: eks-example
spec:
  type: LoadBalancer
  ports:
    - port: 80
      targetPort: http
      protocol: TCP
      name: http
  selector:
    app: eks-example
EOF

 

In the preceding code, there are two Kubernetes annotations for Flux. The first, flux.weave.works/automated, tells Flux whether the container image should be automatically updated. This example sets the value to true, enabling updates to your deployment as new images arrive in the registry. This example comments out the second annotation, flux.weave.works/ignore. However, you can use it to tell Flux to ignore the deployment temporarily.

Commit the changes, and in a few minutes, it automatically deploys.

git add .
git commit -am "eks-example deployment"
[master 954908c] eks-example deployment
 3 files changed, 64 insertions(+)
 create mode 100644 namespaces/eks-example.yaml
 create mode 100644 workloads/eks-example-dep.yaml
 create mode 100644 workloads/eks-example-svc.yaml

 

Make sure that you push your changes.

git push

Now check the logs of your Flux pod:

kubectl get pods -n flux

Update the name below to reflect the name of the pod in your deployment. This sample pulls every five minutes for changes. When it triggers, you should see kubectl apply log messages to create the namespace, service, and deployment.

kubectl logs flux-5bd7fb6bb6-4sc78 -n flux

Find the load balancer input for your service with the following:

kubectl describe service eks-example -n eks-example

Now when you connect to the load balancer address in a browser, you can see the Hello World app.

Change the eks-example source code in a small way (such as changing index.html to say Hello World Deployment 2), then commit and push to Git.

After a few minutes, refresh your browser to see the deployed change. You can watch the changes in AWS CodePipeline, in ECR, and through Flux logs. Weave Flux automatically updated your deployment manifests in the k8s-config repository to deploy the new image as it detected it. To back out that change, use a git revert or git reset command.

Finally, you can use the same approach to deploy Helm charts. You can host these charts within the configuration Git repository (k8s-config in this example), or on an external chart repository. In the following example, you use an external chart repository.

In your k8s-config directory, get the latest changes from your repository and then create a Helm release from an external chart.

cd k8s-config

git pull

 

First, create the namespace manifest.

cat << EOF > namespaces/nginx.yaml
apiVersion: v1
kind: Namespace
metadata:
  labels:
    name: nginx
  name: nginx
EOF

 

Then create the Helm release manifest. This is a custom resource definition provided by Weave Flux.

cat << EOF > releases/nginx.yaml
apiVersion: flux.weave.works/v1beta1
kind: HelmRelease
metadata:
  name: mywebserver
  namespace: nginx
  annotations:
    flux.weave.works/automated: "true"
    flux.weave.works/tag.nginx: semver:~1.16
    flux.weave.works/locked: 'true'
    flux.weave.works/locked_msg: '"Halt updates for now"'
    flux.weave.works/locked_user: User Name <[email protected]>
spec:
  releaseName: mywebserver
  chart:
    repository: https://charts.bitnami.com/bitnami/
    name: nginx
    version: 3.3.2
  values:
    usePassword: true
    image:
      registry: docker.io
      repository: bitnami/nginx
      tag: 1.16.0-debian-9-r46
    service:
      type: LoadBalancer
      port: 80
      nodePorts:
        http: ""
      externalTrafficPolicy: Cluster
    ingress:
      enabled: false
    livenessProbe:
      httpGet:
        path: /
        port: http
      initialDelaySeconds: 30
      timeoutSeconds: 5
      failureThreshold: 6
    readinessProbe:
      httpGet:
        path: /
        port: http
      initialDelaySeconds: 5
      timeoutSeconds: 3
      periodSeconds: 5
    metrics:
      enabled: false
EOF

git add . 
git commit -am "Adding NGINX Helm release"
git push

 

There are a few new annotations for Flux above. The flux.weave.works/locked annotation tells Flux to lock the deployment. This is useful if you find a known bad image and must roll back to a previous version. In addition, the flux.weave.works/tag.nginx annotation filters image tags by semantic versioning.

Wait up to five minutes for Flux to pull the configuration and verify this deployment as you did in the previous example:

kubectl get pods -n flux

kubectl logs flux-5bd7fb6bb6-4sc78 -n flux

 

kubectl get all -n nginx

 

If this doesn’t deploy, ensure Helm initialized as described earlier in this post.

kubectl get pods -n kube-system | grep tiller

kubectl get pods -n flux

kubectl logs flux-helm-operator-df5746688-84kw8 -n flux

 

Clean up

Log in as an administrator and follow these steps to clean up your sample deployment.

  1. Delete all images from the Amazon ECR repository.

2. In AWS Service Catalog provisioned products, select the three dots to the left of your ExamplePipeline service and choose Terminate provisioned product. Wait until it completes termination (1–2 minutes).

3. Delete your Amazon S3 artifact bucket.

4. Delete Weave Flux:

helm delete flux --purge

kubectl delete ns flux

kubectl delete crd helmreleases.flux.weave.works

5. Delete the load balancer services:

helm delete mywebserver --purge

kubectl delete ns nginx

kubectl delete svc eks-example -n eks-example

kubectl delete deployment eks-example -n eks-example

kubectl delete ns eks-example

6. Clean up your GitHub repositories:

 – Go to your k8s-config repository in GitHub, choose Settings, scroll to the bottom and choose Delete this repository. If you set delete to false in the pipeline service, you also must delete your eks-example repository.

 – Delete the personal access token that you created.

7.     If you provisioned an EKS cluster at the beginning of this post, delete it:

eksctl get cluster

eksctl delete cluster <clustername>

8.     In the AWS CloudFormation console, select the DevServiceCatalog stack, and choose the Actions, Delete Stack.

Conclusion

In this post, I demonstrated how to use a GitOps approach, which allows you to focus on committing code and configuration to Git rather than learning new CI/CD tooling. Git acts as the single source of truth, and Weave Flux pulls changes and ensures that the Kubernetes cluster configuration matches the desired state.

In addition, AWS Service Catalog can be used to create a portfolio of services that enables you to standardize your offerings, such as an image build pipeline based on AWS CodePipeline.

As always, AWS welcomes feedback. Please submit comments or questions below.

Introducing the capacity-optimized allocation strategy for Amazon EC2 Spot Instances

Post Syndicated from Chad Schmutzer original https://aws.amazon.com/blogs/compute/introducing-the-capacity-optimized-allocation-strategy-for-amazon-ec2-spot-instances/

AWS announces the new capacity-optimized allocation strategy for Amazon EC2 Auto Scaling and EC2 Fleet. This new strategy automatically makes the most efficient use of spare capacity while still taking advantage of the steep discounts offered by Spot Instances. It’s a new way for you to gain easy access to extra EC2 compute capacity in the AWS Cloud.

This post compares how the capacity-optimized allocation strategy deploys capacity compared to the current lowest-price allocation strategy.

Overview

Spot Instances are spare EC2 compute capacity in the AWS Cloud available to you at savings of up to 90% off compared to On-Demand prices. The only difference between On-Demand Instances and Spot Instances is that Spot Instances can be interrupted by EC2 with two minutes of notification when EC2 needs the capacity back.

When making requests for Spot Instances, customers can take advantage of allocation strategies within services such as EC2 Auto Scaling and EC2 Fleet. The allocation strategy determines how the Spot portion of your request is fulfilled from the possible Spot Instance pools you provide in the configuration.

The existing allocation strategy available in EC2 Auto Scaling and EC2 Fleet is called “lowest-price” (with an option to diversify across N pools). This strategy allocates capacity strictly based on the lowest-priced Spot Instance pool or pools. The “diversified” allocation strategy (available in EC2 Fleet but not in EC2 Auto Scaling) spreads your Spot Instances across all the Spot Instance pools you’ve specified as evenly as possible.

As the AWS global infrastructure has grown over time in terms of geographic Regions and Availability Zones as well as the raw number of EC2 Instance families and types, so has the amount of spare EC2 capacity. Therefore it is important that customers have access to tools to help them utilize spare EC2 capacity optimally. The new capacity-optimized strategy for both EC2 Auto Scaling and EC2 Fleet provisions Spot Instances from the most-available Spot Instance pools by analyzing capacity metrics.

Walkthrough

To illustrate how the capacity-optimized allocation strategy deploys capacity compared to the existing lowest-price allocation strategy, here are examples of Auto Scaling group configurations and use cases for each strategy.

Lowest-price (diversified over N pools) allocation strategy

The lowest-price allocation strategy deploys Spot Instances from the pools with the lowest price in each Availability Zone. This strategy has an optional modifier SpotInstancePools that provides the ability to diversify over the N lowest-priced pools in each Availability Zone.

Spot pricing changes slowly over time based on long-term trends in supply and demand, but capacity fluctuates in real time. The lowest-price strategy does not account for pool capacity depth as it deploys Spot Instances.

As a result, the lowest-price allocation strategy is a good choice for workloads with a low cost of interruption that want the lowest possible prices, such as:

  • Time-insensitive workloads
  • Extremely transient workloads
  • Workloads that are easily check-pointed and restarted

Example

The following example configuration shows how capacity could be allocated in an Auto Scaling group using the lowest-price allocation strategy diversified over two pools:

{
  "AutoScalingGroupName": "runningAmazonEC2WorkloadsAtScale",
  "MixedInstancesPolicy": {
    "LaunchTemplate": {
      "LaunchTemplateSpecification": {
        "LaunchTemplateName": "my-launch-template",
        "Version": "$Latest"
      },
      "Overrides": [
        {
          "InstanceType": "c3.large"
        },
        {
          "InstanceType": "c4.large"
        },
        {
          "InstanceType": "c5.large"
        }
      ]
    },
    "InstancesDistribution": {
      "OnDemandPercentageAboveBaseCapacity": 0,
      "SpotAllocationStrategy": "lowest-price",
      "SpotInstancePools": 2
    }
  },
  "MinSize": 10,
  "MaxSize": 100,
  "DesiredCapacity": 60,
  "HealthCheckType": "EC2",
  "VPCZoneIdentifier": "subnet-a1234567890123456,subnet-b1234567890123456,subnet-c1234567890123456"
}

In this configuration, you request 60 Spot Instances because DesiredCapacity is set to 60 and OnDemandPercentageAboveBaseCapacity is set to 0. The example follows Spot best practices and is flexible across c3.large, c4.large, and c5.large in us-east-1a, us-east-1b, and us-east-1c (mapped according to the subnets in VPCZoneIdentifier). The Spot allocation strategy is set to lowest-price over two SpotInstancePools.

First, EC2 Auto Scaling tries to make sure that it balances the requested capacity across all the Availability Zones provided in the request. To do so, it splits the target capacity request of 60 across the three zones. Then, the lowest-price allocation strategy allocates the Spot Instance launches to the lowest-priced pool per zone.

Using the example Spot prices shown in the following table, the resulting allocation is:

  • 20 Spot Instances from us-east-1a (10 c3.large, 10 c4.large)
  • 20 Spot Instances from us-east-1b (10 c3.large, 10 c4.large)
  • 20 Spot Instances from us-east-1c (10 c3.large, 10 c4.large)
Availability ZoneInstance typeSpot Instances allocatedSpot price
us-east-1ac3.large10$0.0294
us-east-1ac4.large10$0.0308
us-east-1ac5.large0$0.0408
us-east-1bc3.large10$0.0294
us-east-1bc4.large10$0.0308
us-east-1bc5.large0$0.0387
us-east-1cc3.large10$0.0294
us-east-1cc4.large10$0.0331
us-east-1cc5.large0$0.0353

The cost for this Auto Scaling group is $1.83/hour. Of course, the Spot Instances are allocated according to the lowest price and are not optimized for capacity. The Auto Scaling group could experience higher interruptions if the lowest-priced Spot Instance pools are not as deep as others, since upon interruption the Auto Scaling group will attempt to re-provision instances into the lowest-priced Spot Instance pools.

Capacity-optimized allocation strategy

There is a price associated with interruptions, restarting work, and checkpointing. While the overall hourly cost of capacity-optimized allocation strategy might be slightly higher, the possibility of having fewer interruptions can lower the overall cost of your workload.

The effectiveness of the capacity-optimized allocation strategy depends on following Spot best practices by being flexible and providing as many instance types and Availability Zones (Spot Instance pools) as possible in the configuration. It is also important to understand that as capacity demands change, the allocations provided by this strategy also change over time.

Remember that Spot pricing changes slowly over time based on long-term trends in supply and demand, but capacity fluctuates in real time. The capacity-optimized strategy does account for pool capacity depth as it deploys Spot Instances, but it does not account for Spot prices.

As a result, the capacity-optimized allocation strategy is a good choice for workloads with a high cost of interruption, such as:

  • Big data and analytics
  • Image and media rendering
  • Machine learning
  • High performance computing

Example

The following example configuration shows how capacity could be allocated in an Auto Scaling group using the capacity-optimized allocation strategy:

{
  "AutoScalingGroupName": "runningAmazonEC2WorkloadsAtScale",
  "MixedInstancesPolicy": {
    "LaunchTemplate": {
      "LaunchTemplateSpecification": {
        "LaunchTemplateName": "my-launch-template",
        "Version": "$Latest"
      },
      "Overrides": [
        {
          "InstanceType": "c3.large"
        },
        {
          "InstanceType": "c4.large"
        },
        {
          "InstanceType": "c5.large"
        }
      ]
    },
    "InstancesDistribution": {
      "OnDemandPercentageAboveBaseCapacity": 0,
      "SpotAllocationStrategy": "capacity-optimized"
    }
  },
  "MinSize": 10,
  "MaxSize": 100,
  "DesiredCapacity": 60,
  "HealthCheckType": "EC2",
  "VPCZoneIdentifier": "subnet-a1234567890123456,subnet-b1234567890123456,subnet-c1234567890123456"
}

In this configuration, you request 60 Spot Instances because DesiredCapacity is set to 60 and OnDemandPercentageAboveBaseCapacity is set to 0. The example follows Spot best practices (especially critical when using the capacity-optimized allocation strategy) and is flexible across c3.large, c4.large, and c5.large in us-east-1a, us-east-1b, and us-east-1c (mapped according to the subnets in VPCZoneIdentifier). The Spot allocation strategy is set to capacity-optimized.

First, EC2 Auto Scaling tries to make sure that the requested capacity is evenly balanced across all the Availability Zones provided in the request. To do so, it splits the target capacity request of 60 across the three zones. Then, the capacity-optimized allocation strategy optimizes the Spot Instance launches by analyzing capacity metrics per instance type per zone. This is because this strategy effectively optimizes by capacity instead of by the lowest price (hence its name).

Using the example Spot prices shown in the following table, the resulting allocation is:

  • 20 Spot Instances from us-east-1a (20 c4.large)
  • 20 Spot Instances from us-east-1b (20 c3.large)
  • 20 Spot Instances from us-east-1c (20 c5.large)
Availability ZoneInstance typeSpot Instances allocatedSpot price
us-east-1ac3.large0$0.0294
us-east-1ac4.large20$0.0308
us-east-1ac5.large0$0.0408
us-east-1bc3.large20$0.0294
us-east-1bc4.large0$0.0308
us-east-1bc5.large0$0.0387
us-east-1cc3.large0$0.0294
us-east-1cc4.large0$0.0308
us-east-1cc5.large20$0.0353

The cost for this Auto Scaling group is $1.91/hour, only 5% more than the lowest-priced example above. However, notice the distribution of the Spot Instances is different. This is because the capacity-optimized allocation strategy determined this was the most efficient distribution from an available capacity perspective.

Conclusion

Consider using the new capacity-optimized allocation strategy to make the most efficient use of spare capacity. Automatically deploy into the most available Spot Instance pools—while still taking advantage of the steep discounts provided by Spot Instances.

This allocation strategy may be especially useful for workloads with a high cost of interruption, including:

  • Big data and analytics
  • Image and media rendering
  • Machine learning
  • High performance computing

No matter which allocation strategy you choose, you still enjoy the steep discounts provided by Spot Instances. These discounts are possible thanks to the stable Spot pricing made available with the new Spot pricing model.

Chad Schmutzer is a Principal Developer Advocate for the EC2 Spot team. Follow him on twitter to get the latest updates on saving at scale with Spot Instances, to provide feedback, or just say HI.

Why AWS is the best place for your Windows workloads, and how Microsoft is changing their licensing to try to awkwardly force you into Azure

Post Syndicated from Betsy Chernoff original https://aws.amazon.com/blogs/compute/why-aws-is-the-best-place-for-your-windows-workloads-and-how-microsoft-is-changing-their-licensing-to-try-to-awkwardly-force-you-into-azure/

This post is contributed by Sandy Carter, Vice President at AWS. It is also located on LinkedIn

Many companies today are considering how to migrate to the cloud to take advantage of the agility and innovation that the cloud brings. Having the right to choose the best provider for your business is critical.

AWS is the best cloud for running your Windows workloads and our experience running Windows applications has earned our customers’ trust. It’s been more than 11 years since AWS first made it possible for customers to run their Windows workloads on AWS—longer than Azure has even been around, and according to a report by IDC, we host nearly two times as many Windows Server instances in the cloud as Microsoft. And more and more enterprises are entrusting their Windows workloads to AWS because of its greater reliability, higher performance, and lower cost, with the number of AWS enterprise customers using AWS for Windows Server growing more than 400% in the past three years.

In fact, we are seeing a trend of customers moving from Azure to AWS. eMarketer started their digital transformation with Azure, but found performance challenges and higher costs that led them to migrate all of their workloads over to AWS. Why did they migrate? They found a better experience, stronger support, higher availability, and better performance, with 4x faster launch times and 35% lower costs compared to Azure. Ancestry, a leader in consumer genomics, went all-in on development in the cloud moving 10 PB data and 400 Windows-based applications in less than 9 months. They also modernized to Linux with .NET Core and leveraged advanced technologies including serverless and containers. With results like that, you can see why organizations like Sysco, Edwards Life Sciences, Expedia, and NextGen Healthcare have chosen AWS to upgrade, migrate, and modernize their Windows workloads

If you are interested in seeing your cost savings over running on-premises or over running on Azure,  send us an email at [email protected] or visit why AWS is the best cloud for Windows.

Optimizing Amazon ECS task density using awsvpc network mode

Post Syndicated from Ignacio Riesgo original https://aws.amazon.com/blogs/compute/optimizing-amazon-ecs-task-density-using-awsvpc-network-mode/

This post is contributed by Tony Pujals | Senior Developer Advocate, AWS

 

AWS recently increased the number of elastic network interfaces available when you run tasks on Amazon ECS. Use the account setting called awsvpcTrunking. If you use the Amazon EC2 launch type and task networking (awsvpc network mode), you can now run more tasks on an instance—5 to 17 times as many—as you did before.

As more of you embrace microservices architectures, you deploy increasing numbers of smaller tasks. AWS now offers you the option of more efficient packing per instance, potentially resulting in smaller clusters and associated savings.

 

Overview

To manage your own cluster of EC2 instances, use the EC2 launch type. Use task networking to run ECS tasks using the same networking properties as if tasks were distinct EC2 instances.

Task networking offers several benefits. Every task launched with awsvpc network mode has its own attached network interface, a primary private IP address, and an internal DNS hostname. This simplifies container networking and gives you more control over how tasks communicate, both with each other and with other services within their virtual private clouds (VPCs).

Task networking also lets you take advantage of other EC2 networking features like VPC Flow Logs. This feature lets you monitor traffic to and from tasks. It also provides greater security control for containers, allowing you to use security groups and network monitoring tools at a more granular level within tasks. For more information, see Introducing Cloud Native Networking for Amazon ECS Containers.

However, if you run container tasks on EC2 instances with task networking, you can face a networking limit. This might surprise you, particularly when an instance has plenty of free CPU and memory. The limit reflects the number of network interfaces available to support awsvpc network mode per container instance.

 

Raise network interface density limits with trunking

The good news is that AWS raised network interface density limits by implementing a networking feature on ECS called “trunking.” This is a technique for multiplexing data over a shared communication link.

If you’re migrating to microservices using AWS App Mesh, you should optimize network interface density. App Mesh requires awsvpc networking to provide routing control and visibility over an ever-expanding array of running tasks. In this context, increased network interface density might save money.

By opting for network interface trunking, you should see a significant increase in capacity—from 5 to 17 times more than the previous limit. For more information on the new task limits per container instance, see Supported Amazon EC2 Instance Types.

Applications with tasks not hitting CPU or memory limits also benefit from this feature through the more cost-effective “bin packing” of container instances.

 

Trunking is an opt-in feature

AWS chose to make the trunking feature opt-in due to the following factors:

  • Instance registration: While normal instance registration is straightforward with trunking, this feature increases the number of asynchronous instance registration steps that can potentially fail. Any such failures might add extra seconds to launch time.
  • Available IP addresses: The “trunk” belongs to the same subnet in which the instance’s primary network interface originates. This effectively reduces the available IP addresses and potentially the ability to scale out on other EC2 instances sharing the same subnet. The trunk consumes an IP address. With a trunk attached, there are two assigned IP addresses per instance, one for the primary interface and one for the trunk.
  • Differing customer preferences and infrastructure: If you have high CPU or memory workloads, you might not benefit from trunking. Or, you may not want awsvpc networking.

Consequently, AWS leaves it to you to decide if you want to use this feature. AWS might revisit this decision in the future, based on customer feedback. For now, your account roles or users must opt in to the awsvpcTrunking account setting to gain the benefits of increased task density per container instance.

 

Enable trunking

Enable the ECS elastic network interface trunking feature to increase the number of network interfaces that can be attached to supported EC2 container instance types. You must meet the following prerequisites before you can launch a container instance with the increased network interface limits:

  • Your account must have the AWSServiceRoleForECS service-linked role for ECS.
  • You must opt into the awsvpcTrunking  account setting.

 

Make sure that a service-linked role exists for ECS

A service-linked role is a unique type of IAM role linked to an AWS service (such as ECS). This role lets you delegate the permissions necessary to call other AWS services on your behalf. Because ECS is a service that manages resources on your behalf, you need this role to proceed.

In most cases, you won’t have to create a service-linked role. If you created or updated an ECS cluster, ECS likely created the service-linked role for you.

You can confirm that your service-linked role exists using the AWS CLI, as shown in the following code example:

$ aws iam get-role --role-name AWSServiceRoleForECS
{
    "Role": {
        "Path": "/aws-service-role/ecs.amazonaws.com/",
        "RoleName": "AWSServiceRoleForECS",
        "RoleId": "AROAJRUPKI7I2FGUZMJJY",
        "Arn": "arn:aws:iam::226767807331:role/aws-service-role/ecs.amazonaws.com/AWSServiceRoleForECS",
        "CreateDate": "2018-11-09T21:27:17Z",
        "AssumeRolePolicyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Effect": "Allow",
                    "Principal": {
                        "Service": "ecs.amazonaws.com"
                    },
                    "Action": "sts:AssumeRole"
                }
            ]
        },
        "Description": "Role to enable Amazon ECS to manage your cluster.",
        "MaxSessionDuration": 3600
    }
}

If the service-linked role does not exist, create it manually with the following command:

aws iam create-service-linked-role --aws-service-name ecs.amazonaws.com

For more information, see Using Service-Linked Roles for Amazon ECS.

 

Opt in to the awsvpcTrunking account setting

Your account, IAM user, or role must opt in to the awsvpcTrunking account setting. Select this setting using the AWS CLI or the ECS console. You can opt in for an account by making awsvpcTrunking  its default setting. Or, you can enable this setting for the role associated with the instance profile with which the instance launches. For instructions, see Account Settings.

 

Other considerations

After completing the prerequisites described in the preceding sections, launch a new container instance with increased network interface limits using one of the supported EC2 instance types.

Keep the following in mind:

  • It’s available with the latest variant of the ECS-optimized AMI.
  • It only affects creation of new container instances after opting into awsvpcTrunking.
  • It only affects tasks created with awsvpc network mode and EC2 launch type. Tasks created with the AWS Fargate launch type always have a dedicated network interface, no matter how many you launch.

For details, see ENI Trunking Considerations.

 

Summary

If you seek to optimize the usage of your EC2 container instances for clusters that you manage, enable the increased network interface density feature with awsvpcTrunking. By following the steps outlined in this post, you can launch tasks using significantly fewer EC2 instances. This is especially useful if you embrace a microservices architecture, with its increasing numbers of lighter tasks.

Hopefully, you found this post informative and the proposed solution intriguing. As always, AWS welcomes all feedback or comment.

Using AWS App Mesh with Fargate

Post Syndicated from Ignacio Riesgo original https://aws.amazon.com/blogs/compute/using-aws-app-mesh-with-fargate/

This post is contributed by Tony Pujals | Senior Developer Advocate, AWS

 

AWS App Mesh is a service mesh, which provides a framework to control and monitor services spanning multiple AWS compute environments. My previous post provided a walkthrough to get you started. In it, I showed deploying a simple microservice application to Amazon ECS and configuring App Mesh to provide traffic control and observability.

In this post, I show more advanced techniques using AWS Fargate as an ECS launch type. I show you how to deploy a specific version of the colorteller service from the previous post. Finally, I move on and explore distributing traffic across other environments, such as Amazon EC2 and Amazon EKS.

I simplified this example for clarity, but in the real world, creating a service mesh that bridges different compute environments becomes useful. Fargate is a compute service for AWS that helps you run containerized tasks using the primitives (the tasks and services) of an ECS application. This lets you work without needing to directly configure and manage EC2 instances.

 

Solution overview

This post assumes that you already have a containerized application running on ECS, but want to shift your workloads to use Fargate.

You deploy a new version of the colorteller service with Fargate, and then begin shifting traffic to it. If all goes well, then you continue to shift more traffic to the new version until it serves 100% of all requests. Use the labels “blue” to represent the original version and “green” to represent the new version. The following diagram shows programmer model of the Color App.

You want to begin shifting traffic over from version 1 (represented by colorteller-blue in the following diagram) over to version 2 (represented by colorteller-green).

In App Mesh, every version of a service is ultimately backed by actual running code somewhere, in this case ECS/Fargate tasks. Each service has its own virtual node representation in the mesh that provides this conduit.

The following diagram shows the App Mesh configuration of the Color App.

 

 

After shifting the traffic, you must physically deploy the application to a compute environment. In this demo, colorteller-blue runs on ECS using the EC2 launch type and colorteller-green runs on ECS using the Fargate launch type. The goal is to test with a portion of traffic going to colorteller-green, ultimately increasing to 100% of traffic going to the new green version.

 

AWS compute model of the Color App.

Prerequisites

Before following along, set up the resources and deploy the Color App as described in the previous walkthrough.

 

Deploy the Fargate app

To get started after you complete your Color App, configure it so that your traffic goes to colorteller-blue for now. The blue color represents version 1 of your colorteller service.

Log into the App Mesh console and navigate to Virtual routers for the mesh. Configure the HTTP route to send 100% of traffic to the colorteller-blue virtual node.

The following screenshot shows routes in the App Mesh console.

Test the service and confirm in AWS X-Ray that the traffic flows through the colorteller-blue as expected with no errors.

The following screenshot shows racing the colorgateway virtual node.

 

Deploy the new colorteller to Fargate

With your original app in place, deploy the send version on Fargate and begin slowly increasing the traffic that it handles rather than the original. The app colorteller-green represents version 2 of the colorteller service. Initially, only send 30% of your traffic to it.

If your monitoring indicates a healthy service, then increase it to 60%, then finally to 100%. In the real world, you might choose more granular increases with automated rollout (and rollback if issues arise), but this demonstration keeps things simple.

You pushed the gateway and colorteller images to ECR (see Deploy Images) in the previous post, and then launched ECS tasks with these images. For this post, launch an ECS task using the Fargate launch type with the same colorteller and envoy images. This sets up the running envoy container as a sidecar for the colorteller container.

You don’t have to manually configure the EC2 instances in a Fargate launch type. Fargate automatically colocates the sidecar on the same physical instance and lifecycle as the primary application container.

To begin deploying the Fargate instance and diverting traffic to it, follow these steps.

 

Step 1: Update the mesh configuration

You can download updated AWS CloudFormation templates located in the repo under walkthroughs/fargate.

This updated mesh configuration adds a new virtual node (colorteller-green-vn). It updates the virtual router (colorteller-vr) for the colorteller virtual service so that it distributes traffic between the blue and green virtual nodes at a 2:1 ratio. That is, the green node receives one-third of the traffic.

$ ./appmesh-colorapp.sh
...
Waiting for changeset to be created..
Waiting for stack create/update to complete
...
Successfully created/updated stack - DEMO-appmesh-colorapp
$

Step 2: Deploy the green task to Fargate

The fargate-colorteller.sh script creates parameterized template definitions before deploying the fargate-colorteller.yaml CloudFormation template. The change to launch a colorteller task as a Fargate task is in fargate-colorteller-task-def.json.

$ ./fargate-colorteller.sh
...

Waiting for changeset to be created..
Waiting for stack create/update to complete
Successfully created/updated stack - DEMO-fargate-colorteller
$

 

Verify the Fargate deployment

The ColorApp endpoint is one of the CloudFormation template’s outputs. You can view it in the stack output in the AWS CloudFormation console, or fetch it with the AWS CLI:

$ colorapp=$(aws cloudformation describe-stacks --stack-name=$ENVIRONMENT_NAME-ecs-colorapp --query="Stacks[0
].Outputs[?OutputKey=='ColorAppEndpoint'].OutputValue" --output=text); echo $colorapp> ].Outputs[?OutputKey=='ColorAppEndpoint'].OutputValue" --output=text); echo $colorapp
http://DEMO-Publi-YGZIJQXL5U7S-471987363.us-west-2.elb.amazonaws.com

Assign the endpoint to the colorapp environment variable so you can use it for a few curl requests:

$ curl $colorapp/color
{"color":"blue", "stats": {"blue":1}}
$

The 2:1 weight of blue to green provides predictable results. Clear the histogram and run it a few times until you get a green result:

$ curl $colorapp/color/clear
cleared

$ for ((n=0;n<200;n++)); do echo "$n: $(curl -s $colorapp/color)"; done

0: {"color":"blue", "stats": {"blue":1}}
1: {"color":"green", "stats": {"blue":0.5,"green":0.5}}
2: {"color":"blue", "stats": {"blue":0.67,"green":0.33}}
3: {"color":"green", "stats": {"blue":0.5,"green":0.5}}
4: {"color":"blue", "stats": {"blue":0.6,"green":0.4}}
5: {"color":"gre
en", "stats": {"blue":0.5,"green":0.5}}
6: {"color":"blue", "stats": {"blue":0.57,"green":0.43}}
7: {"color":"blue", "stats": {"blue":0.63,"green":0.38}}
8: {"color":"green", "stats": {"blue":0.56,"green":0.44}}
...
199: {"color":"blue", "stats": {"blue":0.66,"green":0.34}}

This reflects the expected result for a 2:1 ratio. Check everything on your AWS X-Ray console.

The following screenshot shows the X-Ray console map after the initial testing.

The results look good: 100% success, no errors.

You can now increase the rollout of the new (green) version of your service running on Fargate.

Using AWS CloudFormation to manage your stacks lets you keep your configuration under version control and simplifies the process of deploying resources. AWS CloudFormation also gives you the option to update the virtual route in appmesh-colorapp.yaml and deploy the updated mesh configuration by running appmesh-colorapp.sh.

For this post, use the App Mesh console to make the change. Choose Virtual routers for appmesh-mesh, and edit the colorteller-route. Update the HTTP route so colorteller-blue-vn handles 33.3% of the traffic and colorteller-green-vn now handles 66.7%.

Run your simple verification test again:

$ curl $colorapp/color/clear
cleared
fargate $ for ((n=0;n<200;n++)); do echo "$n: $(curl -s $colorapp/color)"; done
0: {"color":"green", "stats": {"green":1}}
1: {"color":"blue", "stats": {"blue":0.5,"green":0.5}}
2: {"color":"green", "stats": {"blue":0.33,"green":0.67}}
3: {"color":"green", "stats": {"blue":0.25,"green":0.75}}
4: {"color":"green", "stats": {"blue":0.2,"green":0.8}}
5: {"color":"green", "stats": {"blue":0.17,"green":0.83}}
6: {"color":"blue", "stats": {"blue":0.29,"green":0.71}}
7: {"color":"green", "stats": {"blue":0.25,"green":0.75}}
...
199: {"color":"green", "stats": {"blue":0.32,"green":0.68}}
$

If your results look good, double-check your result in the X-Ray console.

Finally, shift 100% of your traffic over to the new colorteller version using the same App Mesh console. This time, modify the mesh configuration template and redeploy it:

appmesh-colorteller.yaml
  ColorTellerRoute:
    Type: AWS::AppMesh::Route
    DependsOn:
      - ColorTellerVirtualRouter
      - ColorTellerGreenVirtualNode
    Properties:
      MeshName: !Ref AppMeshMeshName
      VirtualRouterName: colorteller-vr
      RouteName: colorteller-route
      Spec:
        HttpRoute:
          Action:
            WeightedTargets:
              - VirtualNode: colorteller-green-vn
                Weight: 1
          Match:
            Prefix: "/"
$ ./appmesh-colorapp.sh
...
Waiting for changeset to be created..
Waiting for stack create/update to complete
...
Successfully created/updated stack - DEMO-appmesh-colorapp
$

Again, repeat your verification process in both the CLI and X-Ray to confirm that the new version of your service is running successfully.

 

Conclusion

In this walkthrough, I showed you how to roll out an update from version 1 (blue) of the colorteller service to version 2 (green). I demonstrated that App Mesh supports a mesh spanning ECS services that you ran as EC2 tasks and as Fargate tasks.

In my next walkthrough, I will demonstrate that App Mesh handles even uncontainerized services launched directly on EC2 instances. It provides a uniform and powerful way to control and monitor your distributed microservice applications on AWS.

If you have any questions or feedback, feel free to comment below.

Deploying an Nginx-based HTTP/HTTPS load balancer with Amazon Lightsail

Post Syndicated from Betsy Chernoff original https://aws.amazon.com/blogs/compute/deploying-an-nginx-based-http-https-load-balancer-with-amazon-lightsail/

In this post, I discuss how to configure a load balancer to route web traffic for Amazon Lightsail using NGINX. I define load balancers and explain their value. Then, I briefly weigh the pros and cons of self-hosted load balancers against Lightsail’s managed load balancer service. Finally, I cover how to set up a NGINX-based load balancer inside of a Lightsail instance.

If you feel like you already understand what load balancers are, and the pros and cons of self-hosted load balancers vs. managed services, feel free to skip ahead to the deployment section.

What is a load balancer?

Although load balancers offer many different functionalities, for the sake of this discussion, I focus on one main task: A load balancer accepts your users’ traffic and routes it to the right server.

For example, if I assign the load balancer the DNS name: www.example.com, anyone visiting the site first encounters the load balancer. The load balancer then routes the request to one of the backend servers: web-1, web-2, or web-3. The backend servers then respond to the requestor.

A load balancer provides multiple benefits to your application or website. Here are a few key advantages:

  • Redundancy
  • Publicly available IP addresses
  • Horizontally scaled application capacity

Redundancy

Load balancers usually front at least two backend servers. Using something called a health check, they ensure that these servers are online and available to service user requests. If a backend server goes out of service, the load balancer stops routing traffic to that instance. By running multiple servers, you help to ensure that at least one is always available to respond to incoming traffic. As a result, your users aren’t bogged down by errors from a downed machine.

Publicly available IP addresses

Without a load balancer, a server requires a unique IP address to accept an incoming request via the internet. There are a finite number of these IP addresses, and most cloud providers limit the number of statically assigned public IP addresses that you can have.

By using a load balancer, you can task a single public IP address with servicing multiple backend servers. Later in this post, I return to this topic as I discuss configuring a load balancer.

Horizontally scaled application capacity

As your application or website becomes more popular, its performance may degrade. Adding additional capacity can be as easy as spinning up a new instance and placing it behind your load balancer. If demand drops, you can spin down any unneeded instances to save money.

Horizontal scaling means the deployment of additional identically configured servers to handle increased load. Vertical scaling means the deployment of a more powerful server to handle increased load. If you deploy an underpowered server, expect poor performance, whether you have a single server or ten.

Self-managed load balancer vs. a managed service

Now that you have a better understanding of load balancers and the key benefits that they provide, the next question is: How can you get one into your environment?

On one hand, you could spin up a Lightsail load balancer. These load balancers are all managed by AWS and don’t require any patching or maintenance on your part to stay up-to-date. You only need to name your load balancer and pick instances to service. Your load balancer is then up and running. If you’re so inclined, you can also get a free SSL (secure socket layer) certificate with a few extra clicks.

Lightsail load balancers deploy easily and require essentially no maintenance after they’re operational, for $18 per month (at publication time). Lightsail load balancer design prioritizes easy installation and maintenance. As a result, they lack some advanced configuration settings found with other models.

Consequently, you might prefer to configure your load balancer if you prioritize configuration flexibility or cost reduction. A self-hosted load balancer provides access to many advanced features, and your only hard cost is the instance price.

The downsides of self-hosting are that you are also responsible for the following:

  • Installing the load balancer software.
  • Keeping the software (and the host operating system) updated and secure.

Deploying a NGINX-based load balancer with Lightsail

Although many software-based load balancers are available, I recommend building a solution on NGINX because this wildly popular tool:

  • Is open source/free.
  • Offers great community support.
  • Has a custom Lightsail blueprint.

Prerequisites

This tutorial assumes that you already have your backend servers deployed. These servers should all:

  • Be identically configured.
  • Point to a central backend database.

In other words, the target servers shouldn’t each have database copies installed.

To deploy a centralized database, see Lightsail’s managed database offering.

Because I’ve tailored these instructions to generic website and web app hosting, they may not work with specific applications such as WordPress.

Required prerequisites

Before installing an optional SSL certificate, you need to have the following:

  • A purchased domain name.
  • Permissions to update the DNS servers for that domain.

Optional prerequisites

Although not required, the following prerequisites may also be helpful:

  • Familiarity with accessing instances via SSH.
  • Using basic LINUX commands. 

Deploy an NGINX instance

To begin, deploy an NGINX instance in Lightsail, choosing the NGINX blueprint. Make sure that you are deploying it into the same Region as the servers to load balance.

Choose an appropriate instance size for your application, being aware of the amount of RAM and CPU, as well as the data transfer. If you choose an undersized instance, you can always scale it up via a snapshot. However, an oversized instance cannot as easily be scaled down. You may need to rebuild the load balancer on a smaller-sized instance from scratch.

Configure HTTP load balancing

In the following steps, edit the NGINX configuration file to load balance HTTP requests to the appropriate servers.

First, start up an SSH session with your new NGINX instance and change into the appropriate configuration directory:

cd /opt/bitnami/nginx/conf/bitnami/

Make sure that you have the IP addresses for the servers to load balance. In most cases, traffic shouldn’t flow from your load balancer to your instances across the internet. So, make sure to use the instance’s private IP address. You can find this information on the instance management page, in the Lightsail console.

In this example, my three servers have the following private IP addresses:

  • 192.0.2.67
  • 192.0.2.206
  • 192.0.2.198

The configuration file to edit is named bitnami.conf. Open it using your preferred text editor (use sudo to edit the file):

sudo vi bitnami.conf

Clear the contents of the file and add the following code, making sure to substitute the private IP addresses of the servers to load balance:

# Define Pool of servers to load balance upstream webservers { 
server 192.0.2.67 max_fails=3 fail_timeout=30s; 
server 192.0.2.206 max_fails=3 fail_timeout=30s;
server 192.0.2.198 max_fails=3 fail_timeout=30s;
}

In the code, you used the keyword upstream to define a pool (named webservers) of three servers to which NGINX should route traffic. If you don’t specify how NGINX should route each request, it defaults to round-robin server routing. Two other routing methods are available:

  • Least connected, which routes to the server with the fewest number of active connections.
  • IP hash, which uses a hashing function to enable sticky sessions (otherwise called session persistence).

Discussion on these methods is out of scope for this post. For more information, see Using nginx as HTTP load balancer.

Additionally, I recommend max_fails and fail_timeout to define health checks. Based on the configuration above, NGINX marks a server as down if it fails to respond or responds with an error three times in 30 seconds. If a server is marked down, NGINX continues to probe every 30 seconds. If it receives a positive response, it marks the server as live.

After the code you just inserted to the file, add the following:

# Forward traffic on port 80 to one of the servers in the webservers group server {
listen 80; location / {
   proxy_pass http://webservers;
   }
}

This code tells NGINX to listen for requests on port 80, the default port for unencrypted web (HTTP) traffic and forward such requests to one of the servers in the webservers group defined by the upstream keyword.

Save the file and quit back to your command prompt.

For the changes to take effect, restart the NGINX service using the Bitnami control script:

sudo /opt/bitnami/ctlscript.sh restart nginx

At this point, you should be able to visit the IP address of your NGINX instance in your web browser. The load balancer then routes the request to one of the servers defined in your webservers group.

For reference, here’s the full bitnami.conf file.

# Define Pool of servers to load balance
upstream webservers {
server 192.0.2.67 max_fails=3 fail_timeout=30s;
server 192.0.2.206 max_fails=3 fail_timeout=30s;
server 192.0.2.198 max_fails=3 fail_timeout=30s;
}
# Forward traffic on port 80 to one of the servers in the webservers group server {
listen 80; location / {
proxy_pass http://webservers;
}
}

Configure HTTPS load balancing

Configuring your load balancer to use SSL requires three steps:

  1. Ensure that you have a domain record for your NGINX load balancer instance.
  2. Obtain and install a certificate.
  3. Update the NGINX configuration file.

If you have not already done so, assign your NGINX instance an entry with your DNS provider. Remember, the load balancer is the address your users use to reach your site. For instance, it might be appropriate to create a record that points http://www.yourdomain.com/ at your NGINX load balancer. If you need help configuring the DNS in Lightsail, see DNS in Amazon Lightsail.

Similarly, to configure your NGINX instance to use a free SSL certificate from Let’s Encrypt, follow steps 1–7 in Tutorial: Using Let’s Encrypt SSL certificates with your Nginx instance in Amazon Lightsail. You handle step 8 later in this post,

After you configure NGINX to use the SSL certificate and update your DNS, you must modify the configuration file to allow for HTTPS traffic.

Again, use a text editor to open the bitnami.conf file:

sudo vi bitnami.conf

Add the following code to the bottom of the file:

server {
     listen 443 ssl;
     location / {
          proxy_pass http://webservers;
     }
     ssl_certificate server.crt;
     ssl_certificate_key server.key;
     ssl_session_cache shared:SSL:1m;
     ssl_session_timeout 5m;
     ssl_ciphers HIGH:!aNULL:!MD5;
     ssl_prefer_server_ciphers on;
}

This code closely resembles the HTTP code added previously. However, in this case, the code tells NGINX to accept SSL connections on the secure port 443 (HTTPS) and forward them to one of your web servers. The rest of the commands instruct NGINX on where to locate SSL certificates, as well as setting various SSL parameters.

Here again, restart the NGINX service:

sudo /opt/bitnami/ctlscript.sh restart nginx

Optional steps

At this point, you should be able to access your website using both HTTP and HTTPS. However, there are a couple of optional steps to consider, including:

  • Shutting off direct HTTP/HTTPS access to your web servers.
  • Automatically redirecting incoming load balancer HTTP requests to HTTPS.

It’s probably not a great idea to allow people to access your load-balanced servers directly. Fortunately, you can easily restrict access:

  1. Navigate to each instance’s management page in the Lightsail console.
  2. Choose Networking.
  3. Remove the HTTP and HTTPS (if enabled) firewall port entries.

This restriction shuts down access via the internet while still allowing communications between the load balancer and the servers over the private AWS network.

In many cases, there’s no good reason to allow access to a website or web app over unencrypted HTTP. However, the load balancer configuration described to this point still accepts HTTP requests. To automatically reroute requests from HTTP to HTTPS, make one small change to the configuration file:

  1. Edit the conf file.
  2. Find this code:
server {
listen 80; location / {
proxy_pass http://webservers;
}
}
  1. Replace it with this code:
server {
listen 80;
return 301 https://$host$request_uri;
}

The replacement code instructs NGINX to respond to HTTP requests with a “page has been permanently redirected” message and a citation of the new page address. The new address is simply requested one, only accessed over HTTPS instead of HTTP.

For this change to take effect, you must restart NGINX:

sudo /opt/bitnami/ctlscript.sh restart nginx

For reference, this is what the final bitnami.conf file looks like:

# Define the pool of servers to load balance
upstream webservers {
server 192.0.2.67 max_fails=3 fail_timeout=30s;
server 192.0.2.206 max_fails=3 fail_timeout=30s;
server 192.0.2.198 max_fails=3 fail_timeout=30s;
}
# Redirect traffic on port 80 to use HTTPS
server {
listen 80;
return 301 https://$host$request_uri;
}
# Forward traffic on port 443 to one of the servers in the web servers group
server {
     listen 443 ssl;
     location / {
          proxy_pass http://webservers;
     }
     ssl_certificate server.crt;
     ssl_certificate_key server.key;
     ssl_session_cache shared:SSL:1m;
     ssl_session_timeout 5m;
     ssl_ciphers HIGH:!aNULL:!MD5;
     ssl_prefer_server_ciphers on;
}

Conclusion

This post explained how to configure a load balancer to route web traffic for Amazon Lightsail using NGINX. I defined load balancers and their utility. I weighed the pros and cons of self-hosted load balancers against Lightsail’s managed load balancer service. Finally, I walked you through how to set up a NGINX-based load balancer inside of a Lightsail instance.

Thanks for reading this post. If you have any questions, feel free to contact me on Twitter, @mikegcoleman or visit the Amazon Lightsail forums.

Scaling Kubernetes deployments with Amazon CloudWatch metrics

Post Syndicated from Ignacio Riesgo original https://aws.amazon.com/blogs/compute/scaling-kubernetes-deployments-with-amazon-cloudwatch-metrics/

This post is contributed by Kwunhok Chan | Solutions Architect, AWS

 

In an earlier post, AWS introduced Horizontal Pod Autoscaler and Kubernetes Metrics Server support for Amazon Elastic Kubernetes Service. These tools make it easy to scale your Kubernetes workloads managed by EKS in response to built-in metrics like CPU and memory.

However, one common use case for applications running on EKS is the integration with AWS services. For example, you administer an application that processes messages published to an Amazon SQS queue. You want the application to scale according to the number of messages in that queue. The Amazon CloudWatch Metrics Adapter for Kubernetes (k8s-cloudwatch-adapter) helps.

 

Amazon CloudWatch Metrics Adapter for Kubernetes

The k8s-cloudwatch-adapter is an implementation of the Kubernetes Custom Metrics API and External Metrics API with integration for CloudWatch metrics. It allows you to scale your Kubernetes deployment using the Horizontal Pod Autoscaler (HPA) with CloudWatch metrics.

 

Prerequisites

Before starting, you need the following:

 

Getting started

Before using the k8s-cloudwatch-adapter, set up a way to manage IAM credentials to Kubernetes pods. The CloudWatch Metrics Adapter requires the following permissions to access metric data from CloudWatch:

cloudwatch:GetMetricData

Create an IAM policy with the following template:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "cloudwatch:GetMetricData"
            ],
            "Resource": "*"
        }
    ]
}

For demo purposes, I’m granting admin permissions to my Kubernetes worker nodes. Don’t do this in your production environment. To associate IAM roles to your Kubernetes pods, you may want to look at kube2iam or kiam.

If you’re using an EKS cluster, you most likely provisioned it with AWS CloudFormation. The following command uses AWS CloudFormation stacks to update the proper instance policy with the correct permissions:

aws iam attach-role-policy \
--policy-arn arn:aws:iam::aws:policy/AdministratorAccess \
--role-name $(aws cloudformation describe-stacks --stack-name ${STACK_NAME} --query 'Stacks[0].Parameters[?ParameterKey==`NodeInstanceRoleName`].ParameterValue' | jq -r ".[0]")

 

Make sure to replace ${STACK_NAME} with the nodegroup stack name from the AWS CloudFormation console .

 

You can now deploy the k8s-cloudwatch-adapter to your Kubernetes cluster.

$ kubectl apply -f https://raw.githubusercontent.com/awslabs/k8s-cloudwatch-adapter/master/deploy/adapter.yaml

 

This deployment creates a new namespace—custom-metrics—and deploys the necessary ClusterRole, Service Account, and Role Binding values, along with the deployment of the adapter. Use the created custom resource definition (CRD) to define the configuration for the external metrics to retrieve from CloudWatch. The adapter reads the configuration defined in ExternalMetric CRDs and loads its external metrics. That allows you to use HPA to autoscale your Kubernetes pods.

 

Verifying the deployment

Next, query the metrics APIs to see if the adapter is deployed correctly. Run the following command:

$ kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq.
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "external.metrics.k8s.io/v1beta1",
  "resources": [
  ]
}

There are no resources from the response because you haven’t registered any metric resources yet.

 

Deploying an Amazon SQS application

Next, deploy a sample SQS application to test out k8s-cloudwatch-adapter. The SQS producer and consumer are provided, together with the YAML files for deploying the consumer, metric configuration, and HPA.

Both the producer and consumer use an SQS queue named helloworld. If it doesn’t exist already, the producer creates this queue.

Deploy the consumer with the following command:

$ kubectl apply -f https://raw.githubusercontent.com/awslabs/k8s-cloudwatch-adapter/master/samples/sqs/deploy/consumer-deployment.yaml

 

You can verify that the consumer is running with the following command:

$ kubectl get deploy sqs-consumer
NAME           DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
sqs-consumer   1         1         1            0           5s

 

Set up Amazon CloudWatch metric and HPA

Next, create an ExternalMetric resource for the CloudWatch metric. Take note of the Kind value for this resource. This CRD resource tells the adapter how to retrieve metric data from CloudWatch.

You define the query parameters used to retrieve the ApproximateNumberOfMessagesVisible for an SQS queue named helloworld. For details about how metric data queries work, see CloudWatch GetMetricData API.

apiVersion: metrics.aws/v1alpha1
kind: ExternalMetric:
  metadata:
    name: hello-queue-length
  spec:
    name: hello-queue-length
    resource:
      resource: "deployment"
      queries:
        - id: sqs_helloworld
          metricStat:
            metric:
              namespace: "AWS/SQS"
              metricName: "ApproximateNumberOfMessagesVisible"
              dimensions:
                - name: QueueName
                  value: "helloworld"
            period: 300
            stat: Average
            unit: Count
          returnData: true

 

Create the ExternalMetric resource:

$ kubectl apply -f https://raw.githubusercontent.com/awslabs/k8s-cloudwatch-adapter/master/samples/sqs/deploy/externalmetric.yaml

 

Then, set up the HPA for your consumer. Here is the configuration to use:

kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2beta1
metadata:
  name: sqs-consumer-scaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1beta1
    kind: Deployment
    name: sqs-consumer
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metricName: hello-queue-length
      targetValue: 30

 

This HPA rule starts scaling out when the number of messages visible in your SQS queue exceeds 30, and scales in when there are fewer than 30 messages in the queue.

Create the HPA resource:

$ kubectl apply -f https://raw.githubusercontent.com/awslabs/k8s-cloudwatch-adapter/master/samples/sqs/deploy/hpa.yaml

 

Generate load using a producer

Finally, you can start generating messages to the queue:

$ kubectl apply -f https://raw.githubusercontent.com/awslabs/k8s-cloudwatch-adapter/master/samples/sqs/deploy/producer-deployment.yaml

On a separate terminal, you can now watch your HPA retrieving the queue length and start scaling the replicas. SQS metrics generate at five-minute intervals, so give the process a few minutes:

$ kubectl get hpa sqs-consumer-scaler -w

 

Clean up

After you complete this experiment, you can delete the Kubernetes deployment and respective resources.

Run the following commands to remove the consumer, external metric, HPA, and SQS queue:

$ kubectl delete deploy sqs-producer
$ kubectl delete hpa sqs-consumer-scaler
$ kubectl delete externalmetric sqs-helloworld-length
$ kubectl delete deploy sqs-consumer

$ aws sqs delete-queue helloworld

 

Other CloudWatch integrations

AWS recently announced the preview for Amazon CloudWatch Container Insights, which monitors, isolates, and diagnoses containerized applications running on EKS and Kubernetes clusters. To get started, see Using Container Insights.

 

Get involved

This project is currently under development. AWS welcomes issues and pull requests, and would love to hear your feedback.

How could this adapter be best implemented to work in your environment? Visit the Amazon CloudWatch Metrics Adapter for Kubernetes project on GitHub and let AWS know what you think.

Access Private applications on AWS Fargate using Amazon API Gateway PrivateLink

Post Syndicated from Ignacio Riesgo original https://aws.amazon.com/blogs/compute/access-private-applications-on-aws-fargate-using-amazon-api-gateway-privatelink/

This post is contributed by Mani Chandrasekaran | Solutions Architect, AWS

 

Customers would like to run container-based applications in a private subnet inside a virtual private cloud (VPC), where there is no direct connectivity from the outside world to these applications. This is a very secure way of running applications which do not want to be directly exposed to the internet.

AWS Fargate is a compute engine for Amazon ECS that enables you to run containers without having to manage servers or clusters. With AWS Fargate with Amazon ECS, you don’t have to provision, configure, and scale clusters of virtual machines to run containers.

Amazon API Gateway is a fully managed service that makes it easy for developers to create, publish, maintain, monitor, and secure APIs at any scale. The API Gateway private integration makes it simple to expose your HTTP and HTTPS resources behind a virtual private cloud (VPC) with Amazon VPC private endpoints. This allows access by clients outside of the VPC without exposing the resources to the internet.

This post shows how API Gateway can be used to expose an application running on Fargate in a private subnet in a VPC using API Gateway private integration through AWS PrivateLink. With the API Gateway private integration, you can enable access to HTTP and HTTPS resources in a VPC without detailed knowledge of private network configurations or technology-specific appliances.

 

Architecture

You deploy a simple NGINX application running on Fargate within a private subnet as a first step, and then expose this NGINX application to the internet using the API.

As shown in the architecture in the following diagram, you create a VPC with two private subnets and two public subnets. To enable the Fargate tasks to download Docker images from Amazon ECR, you deploy two network address translation (NAT) gateways in the public subnets.

You also deploy a container application, NGINX, as an ECS service with one or more Fargate tasks running inside the private subnets. You provision an internal Network Load Balancer in the VPC private subnets and target the ECS service running as Fargate tasks. This is provisioned using an AWS CloudFormation template (link provided later in this post).

The integration between API Gateway and the Network Load Balancer inside the private subnet uses an API Gateway VpcLink resource. The VpcLink encapsulates connections between the API and targeted VPC resources when the application is hosted on Fargate. You set up an API with the private integration by creating a VpcLink that targets the Network Load Balancer and then uses the VpcLink as an integration endpoint .

 

 

Deployment

Here are the steps to deploy this solution:

  1. Deploy an application on Fargate.
  2. Set up an API Gateway private integration.
  3. Deploy and test the API.
  4. Clean up resources to avoid incurring future charges.

 

Step 1 — Deploy an application on AWS Fargate
I’ve created an AWS CloudFormation template to make it easier for you to get started.

  1. Get the AWS CloudFormation template.
  2. In the AWS Management Console, deploy the CloudFormation template in an AWS Region where Fargate and API Gateway are available.
  3. On the Create stack page, specify the parameters specific to your environment. Or, use the default parameters, which deploy an NGINX Docker image as a Fargate task in an ECS cluster across two Availability Zones.

When the process is finished, the status changes to CREATE_COMPLETE and the details of the Network Load Balancer, VPC, subnets, and ECS cluster name appear on the Outputs tab.

 

Step 2 — Set up an API Gateway Private Integration
Next, set up an API Gateway API with private integrations using the AWS CLI and specify the AWS Region in all the AWS CLI commands.

1. Create a VPCLink in API Gateway with the ARN of the Network Load Balancer that you provisioned. Make sure that you specify the correct endpoint URL and Region based on the AWS Region that you selected for the CloudFormation template. Run the following command:

aws apigateway create-vpc-link \
--name fargate-nlb-private-link \
--target-arns arn:aws:elasticloadbalancing:ap-south-1:xxx:loadbalancer/net/Farga-Netwo-XX/xx \
--endpoint-url https://apigateway.ap-south-1.amazonaws.com \
--region ap-south-1

The command immediately returns the following response, acknowledges the receipt of the request, and shows the PENDING status for the new VpcLink:

{
    "id": "alnXXYY",
    "name": "fargate-nlb-private-link",
    "targetArns": [
        " arn:aws:elasticloadbalancing:ap-south-1:xxx:loadbalancer/net/Farga-Netwo-XX/xx"
    ],
    "status": "PENDING"
}

It takes 2–4 minutes for API Gateway to create the VpcLink. When the operation finishes successfully, the status changes to AVAILABLE.

 

2. To verify that the VpcLink was successfully created, run the following command:

aws apigateway get-vpc-link --vpc-link-id alnXXYY --region ap-south-1

When the VpcLink status is AVAILABLE, you can create the API and integrate it with the VPC resource through the VpcLink.

 

3. To set up an API, run the following command to create an API Gateway RestApi resource

aws apigateway create-rest-api --name 'API Gateway VPC Link NLB Fargate Test' --region ap-south-1

{
    "id": "qc83xxxx",
    "name": "API Gateway VPC Link NLB Fargate Test",
    "createdDate": 1547703133,
    "apiKeySource": "HEADER",
    "endpointConfiguration": {
        "types": [
            "EDGE"
        ]
    }
}

Find the ID value of the RestApi in the returned result. In this example, it is qc83xxxx. Use this ID to finish the operations on the API, including methods and integrations setup.

 

4. In this example, you create an API with only a GET method on the root resource (/) and integrate the method with the VpcLink.

Set up the GET / method. First, get the identifier of the root resource (/):

aws apigateway get-resources --rest-api-id qc83xxxx --region ap-south-1

In the output, find the ID value of the / path. In this example, it is mq165xxxx.

 

5. Set up the method request for the API method of GET /:

aws apigateway put-method \
       --rest-api-id qc83xxxx \
       --resource-id mq165xxxx \
       --http-method GET \
       --authorization-type "NONE" --region ap-south-1

6. Set up the private integration of the HTTP_PROXY type and call the put-integration command:

aws apigateway put-integration \
--rest-api-id qc83xxxx \
--resource-id mq165xxxx \
--uri 'http://myApi.example.com' \
--http-method GET \
--type HTTP_PROXY \
--integration-http-method GET \
--connection-type VPC_LINK \
--connection-id alnXXYY --region ap-south-1

For a private integration, you must set connection-type to VPC_LINK and set connection-id to the VpcLink identifier, alnXXYY in this example. The URI parameter is not used to route requests to your endpoint, but is used to set the host header and for certificate validation.

 

Step 3 — Deploy and test the API

To test the API, run the following command to deploy the API:

aws apigateway create-deployment \
--rest-api-id qc83xxxx \
--stage-name test \
--variables vpcLinkId= alnXXYY --region ap-south-1

Test the APIs with tools such as Postman or the curl command. To call a deployed API, you must submit requests to the URL for the API Gateway component service for API execution, known as execute-api.

The base URL for REST APIs is in this format:

https://{restapi_id}.execute-api.{region}.amazonaws.com/{stage_name}/

Replace {restapi_id} with the API identifier, {region} with the Region, and {stage_name} with the stage name of the API deployment.

To test the API with curl, run the following command:

curl -X GET https://qc83xxxx.execute-api.ap-south-1.amazonaws.com/test/

The curl response should be the NGINX home page.

To test the API with Postman, place the Invoke URL into Postman and choose GET as the method. Choose Send.

The returned result (the NGINX home page) appears.

For more information, see Use Postman to Call a REST API.

 

Step 4 — Clean up resources

After you finish your deployment test, make sure to delete the following resources to avoid incurring future charges.

1. Delete the REST API created in the API Gateway and Amazon VPC endpoint services using the console.
Or, in the AWS CLI, run the following command:

aws apigateway delete-rest-api --rest-api-id qc83xxxx --region ap-south-1

aws apigateway delete-vpc-link --vpc-link-id alnXXYY --region ap-south-1

2. To delete the Fargate-related resources created in CloudFormation, in the console, choose Delete Stack.

 

Conclusion

API Gateway private endpoints enable use cases for building private API–based services running on Fargate inside your own VPCs. You can take advantage of advanced features of API Gateway, such as custom authorizers, Amazon Cognito User Pools integration, usage tiers, throttling, deployment canaries, and API keys. At the same time, you can make sure the APIs or applications running in Fargate are not exposed to the internet.

Now Available: New C5 instance sizes and bare metal instances

Post Syndicated from Julien Simon original https://aws.amazon.com/blogs/aws/now-available-new-c5-instance-sizes-and-bare-metal-instances/

Amazon EC2 C5 instances are very popular for running compute-heavy workloads like batch processing, distributed analytics, high-performance computing, machine/deep learning inference, ad serving, highly scalable multiplayer gaming, and video encoding.

Today, we are happy to expand the Amazon EC2 C5 family with:

  • New larger virtualized instance sizes: 12xlarge and 24xlarge,
  • A bare metal option.

The new C5 instance sizes run on Intel’s Second Generation Xeon Scalable processors (code-named Cascade Lake) with sustained all-core turbo frequency of 3.6GHz and maximum single core turbo frequency of 3.9GHz.

The new processors also enable a new feature called Intel Deep Learning Boost, a capability based on the AVX-512 instruction set. Thanks to the new Vector Neural Network Instructions (AVX-512 VNNI), deep learning frameworks will speed up typical machine learning operations like convolution, and automatically improve inference performance over a wide range of workloads.

These instances are also based on the AWS Nitro System, with dedicated hardware accelerators for EBS processing (including crypto operations), the software-defined network inside of each Virtual Private Cloud (VPC), and ENA networking.

New C5 instance sizes: 12xlarge and 24xlarge

Previously, the largest C5 instance available was C5.18xlarge, with 72 logical processors and 144 GiB of memory. As you can see, the new 24xlarge size increases available resources by 33%, in order to scale up and reduce the time required to compute intensive tasks.

Instance NameLogical ProcessorsMemoryEBS-Optimized BandwidthNetwork Bandwidth
c5.12xlarge4896 GiB7 Gbps12 Gbps
c5.24xlarge96192 GiB14 Gbps25 Gbps

Bare metal C5

Just like for existing bare metal instances (M5, M5d, R5, R5d, z1d, and so forth), your operating system runs directly on the underlying hardware with direct access to the processor.

As described in a previous blog post, you can leverage bare metal instances for applications that:

  • do not want to take the performance hit of nested virtualization,
  • need access to physical resources and low-level hardware features, such as performance counters and Intel VT that are not always available or fully supported in virtualized environments,
  • are intended to run directly on the hardware, or licensed and supported for use in non-virtualized environments.

Bare metal instances can also take advantage of Elastic Load Balancing, Auto Scaling, Amazon CloudWatch, and other AWS services.

Instance NameLogical ProcessorsMemoryEBS-Optimized BandwidthNetwork Bandwidth
c5.metal96192 GiB14 Gbps25 Gbps

Now Available!

You can start using these new instances today in the following regions: US East (N. Virginia), US West (Oregon), Europe (Ireland), Europe (London), Europe (Frankfurt), Europe (Stockholm), Europe (Paris), Asia Pacific (Singapore), Asia Pacific (Sydney), and AWS GovCloud (US-West).

Please send us feedback and help us build the next generation of compute-optimized instances.

Julien;

Securing credentials using AWS Secrets Manager with AWS Fargate

Post Syndicated from Anuneet Kumar original https://aws.amazon.com/blogs/compute/securing-credentials-using-aws-secrets-manager-with-aws-fargate/

This post is contributed by Massimo Re Ferre – Principal Developer Advocate, AWS Container Services.

Cloud security at AWS is the highest priority and the work that the Containers team is doing is a testament to that. A month ago, the team introduced an integration between AWS Secrets Manager and AWS Systems Manager Parameter Store with AWS Fargate tasks. Now, Fargate customers can easily consume secrets securely and parameters transparently from their own task definitions.

In this post, I show you an example of how to use Secrets Manager and Fargate integration to ensure that your secrets are never exposed in the wild.

Overview

AWS has engineered Fargate to be highly secure, with multiple, important security measures. One of these measures is ensuring that each Fargate task has its own isolation boundary and does not share the underlying kernel, CPU resources, memory resources, or elastic network interface with other tasks.

Another area of security focus is the Amazon VPC networking integration, which ensures that tasks can be protected the way that an Amazon EC2 instance can be protected from a networking perspective.

This specific announcement, however, is important in the context of our shared responsibility model. For example, DevOps teams building and running solutions on the AWS platform require proper tooling and functionalities to securely manage secrets, passwords, and sensitive parameters at runtime in their application code. Our job is to empower them with platform capabilities to do exactly that and make it as easy as possible.

Sometimes, in a rush to get things out the door quick, we have seen some users trading off some security aspects for agility, from embedding AWS credentials in source code pushed to public repositories all the way to embedding passwords in clear text in privately stored configuration files. We have solved this problem for developers consuming various AWS services by letting them assign IAM roles to Fargate tasks so that their AWS credentials are transparently handled.

This was useful for consuming native AWS services, but what about accessing services and applications that are outside of the scope of IAM roles and IAM policies? Often, the burden of having to deal with these credentials is pushed onto the developers and AWS users in general. It doesn’t have to be this way. Enter the Secrets Manager and Fargate integration!

Starting with Fargate platform version 1.3.0 and later, it is now possible for you to instruct Fargate tasks to securely grab secrets from Secrets Manager so that these secrets are never exposed in the wild—not even in private configuration files.

In addition, this frees you from the burden of having to implement the undifferentiated heavy lifting of securing these secrets. As a bonus, because Secrets Manager supports secrets rotation, you also gain an additional level of security with no additional effort.

Twitter matcher example

In this example, you create a Fargate task that reads a stream of data from Twitter, matches a particular pattern in the messages, and records some information about the tweet in a DynamoDB table.

To do this, use a Python Twitter library called Tweepy to read the stream from Twitter and the AWS Boto 3 Python library to write to Amazon DynamoDB.

The following diagram shows the high-level flow:

The objective of this example is to show a simple use case where you could use IAM roles assigned to tasks to consume AWS services (such as DynamoDB). It also includes consuming external services (such as Twitter), for which explicit non-AWS credentials need to be stored securely.

This is what happens when you launch the Fargate task:

  • The task starts and inherits the task execution role (1) and the task role (2) from IAM.
  • It queries Secrets Manager (3) using the credentials inherited by the task execution role to retrieve the Twitter credentials and pass them onto the task as variables.
  • It reads the stream from Twitter (4) using the credentials that are stored in Secrets Manager.
  • It matches the stream with a configurable pattern and writes to the DynamoDB table (5) using the credentials inherited by the task role.
  • It matches the stream with a configurable pattern and writes to the DynamoDB table (5) and logs to CloudWatch (6) using the credentials inherited by the task role.

As a side note, while for this specific example I use Twitter as an external service that requires sensitive credentials, any external service that has some form of authentication using passwords or keys is acceptable. Modify the Python script as needed to capture relevant data from your own service to write to the DynamoDB table.

Here are the solution steps:

  • Create the Python script
  • Create the Dockerfile
  • Build the container image
  • Create the image repository
  • Create the DynamoDB table
  • Store the credentials securely
  • Create the IAM roles and IAM policies for the Fargate task
  • Create the Fargate task
  • Clean up

Prerequisites

To be able to execute this exercise, you need an environment configured with the following dependencies:

You can also skip this configuration part and launch an AWS Cloud9 instance.

For the purpose of this example, I am working with the AWS CLI, configured to work with the us-west-2 Region. You can opt to work in a different Region. Make sure that the code examples in this post are modified accordingly.

In addition to the list of AWS prerequisites, you need a Twitter developer account. From there, create an application and use the credentials provided that allow you to connect to the Twitter APIs. We will use them later in the blog post when we will add them to AWS Secrets Manager.

Note: many of the commands suggested in this blog post use $REGION and $AWSACCOUNT in them. You can either set environmental variables that point to the region you want to deploy to and to your own account or you can replace those in the command itself with the region and account number. Also, there are some configuration files (json) that use the same patterns; for those the easiest option is to replace the $REGION and $AWSACCOUNT placeholders with the actual region and account number.

Create the Python script

This script is based on the Tweepy streaming example. I modified the script to include the Boto 3 library and instructions that write data to a DynamoDB table. In addition, the script prints the same data to standard output (to be captured in the container log).

This is the Python script:

from __future__ import absolute_import, print_function from tweepy.streaming import StreamListener from tweepy import OAuthHandler from tweepy import Stream import json import boto3 import os

# DynamoDB table name and Region dynamoDBTable=os.environ['DYNAMODBTABLE'] region_name=os.environ['AWSREGION'] # Filter variable (the word for which to filter in your stream) filter=os.environ['FILTER'] # Go to http://apps.twitter.com and create an app. # The consumer key and secret are generated for you after consumer_key=os.environ['CONSUMERKEY'] consumer_secret=os.environ['CONSUMERSECRETKEY'] # After the step above, you are redirected to your app page. # Create an access token under the "Your access token" section access_token=os.environ['ACCESSTOKEN'] access_token_secret=os.environ['ACCESSTOKENSECRET'] class StdOutListener(StreamListener): """ A listener handles tweets that are received from the stream. This is a basic listener that prints received tweets to stdout. """ def on_data(self, data): j = json.loads(data) tweetuser = j['user']['screen_name'] tweetdate = j['created_at'] tweettext = j['text'].encode('ascii', 'ignore').decode('ascii') print(tweetuser) print(tweetdate) print(tweettext) dynamodb = boto3.client('dynamodb',region_name) dynamodb.put_item(TableName=dynamoDBTable, Item={'user':{'S':tweetuser},'date':{'S':tweetdate},'text':{'S':tweettext}}) return True def on_error(self, status): print(status) if __name__ == '__main__': l = StdOutListener() auth = OAuthHandler(consumer_key, consumer_secret) auth.set_access_token(access_token, access_token_secret) stream = Stream(auth, l) stream.filter(track=[filter]) 

Save this file in a directory and call it twitterstream.py.

This image requires seven parameters, which are clearly visible at the beginning of the script as system variables:

  • The name of the DynamoDB table
  • The Region where you are operating
  • The word or pattern for which to filter
  • The four keys to use to connect to the Twitter API services. Later, I explore how to pass these variables to the container, keeping in mind that some are more sensitive than others.

Create the Dockerfile

Now onto building the actual Docker image. To do that, create a Dockerfile that contains these instructions:

FROM amazonlinux:2
RUN yum install shadow-utils.x86_64 -y
RUN curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
RUN python get-pip.py
RUN pip install tweepy
RUN pip install boto3
COPY twitterstream.py .
RUN groupadd -r twitterstream && useradd -r -g twitterstream twitterstream
USER twitterstream
CMD ["python", "-u", "twitterstream.py"]

Save it as Dockerfile in the same directory with the twitterstream.py file.

Build the container image

Next, create the container image that you later instantiate as a Fargate task. Build the container image running the following command in the same directory:

docker build -t twitterstream:latest .

Don’t overlook the period (.) at the end of the command: it tells Docker to find the Dockerfile in the current directory.

You now have a local Docker image that, after being properly parameterized, can eventually read from the Twitter APIs and save data in a DynamoDB table.

Create the image repository

Now, store this image in a proper container registry. Create an Amazon ECR repository with the following command:

aws ecr create-repository --repository-name twitterstream --region $REGION

You should see something like the following code example as a result:

{
"repository": {
"registryId": "012345678910",
"repositoryName": "twitterstream",
"repositoryArn": "arn:aws:ecr:us-west-2:012345678910:repository/twitterstream",
"createdAt": 1554473020.0,
"repositoryUri": "012345678910.dkr.ecr.us-west-2.amazonaws.com/twitterstream"
}
}

Tag the local image with the following command:

docker tag twitterstream:latest $AWSACCOUNT.dkr.ecr.$REGION.amazonaws.com/twitterstream:latest

Make sure that you refer to the proper repository by using your AWS account ID and the Region to which you are deploying.

Grab an authorization token from AWS STS:

$(aws ecr get-login --no-include-email --region $REGION)

Now, push the local image to the ECR repository that you just created:

docker push $AWSACCOUNT.dkr.ecr.$REGION.amazonaws.com/twitterstream:latest

You should see something similar to the following result:

The push refers to repository [012345678910.dkr.ecr.us-west-2.amazonaws.com/twitterstream]
435b608431c6: Pushed
86ced7241182: Pushed
e76351c39944: Pushed
e29c13e097a8: Pushed
e55573178275: Pushed
1c729a602f80: Pushed
latest: digest: sha256:010c2446dc40ef2deaedb3f344f12cd916ba0e96877f59029d047417d6cb1f95 size: 1582

Now the image is safely stored in its ECR repository.

Create the DynamoDB table

Now turn to the backend DynamoDB table. This is where you store the extract of the Twitter stream being generated. Specifically, you store the user that published the Tweet, the date when the Tweet was published, and the text of the Tweet.

For the purpose of this example, create a table called twitterStream. This can be customized as one of the parameters that you have to pass to the Fargate task.

Run this command to create the table:

aws dynamodb create-table --region $REGION --table-name twitterStream \
                          --attribute-definitions AttributeName=user,AttributeType=S AttributeName=date,AttributeType=S \
                          --key-schema AttributeName=user,KeyType=HASH AttributeName=date,KeyType=RANGE \
                          --billing-mode PAY_PER_REQUEST

Store the credentials securely

As I hinted earlier, the Python script requires the Fargate task to pass some information as variables. You pass the table name, the Region, and the text to filter as standard task variables. Because this is not sensitive information, it can be shared without raising any concern.

However, other configurations are sensitive and should not be passed over in plaintext, like the Twitter API key. For this reason, use Secrets Manager to store that sensitive information and then read them within the Fargate task securely. This is what the newly announced integration between Fargate and Secrets Manager allows you to accomplish.

You can use the Secrets Manager console or the CLI to store sensitive data.

If you opt to use the console, choose other types of secrets. Under Plaintext, enter your consumer key. Under Select the encryption key, choose DefaultEncryptionKey, as shown in the following screenshot. For more information, see Creating a Basic Secret.

For this example, however, it is easier to use the AWS CLI to create the four secrets required. Run the following commands, but customize them with your own Twitter credentials:

aws secretsmanager create-secret --region $REGION --name CONSUMERKEY \
    --description "Twitter API Consumer Key" \
    --secret-string <your consumer key here> 
aws secretsmanager create-secret --region $REGION --name CONSUMERSECRETKEY \
    --description "Twitter API Consumer Secret Key" \
    --secret-string <your consumer secret key here> 
aws secretsmanager create-secret --region $REGION --name ACCESSTOKEN \
    --description "Twitter API Access Token" \
    --secret-string <your access token here> 
aws secretsmanager create-secret --region $REGION --name ACCESSTOKENSECRET \
    --description "Twitter API Access Token Secret" \
    --secret-string <your access token secret here> 

Each of those commands reports a message confirming that the secret has been created:

{
"VersionId": "7d950825-7aea-42c5-83bb-0c9b36555dbb",
"Name": "CONSUMERSECRETKEY",
"ARN": "arn:aws:secretsmanager:us-west-2:01234567890:secret:CONSUMERSECRETKEY-5D0YUM"
}

From now on, these four API keys no longer appear in any configuration.

The following screenshot shows the console after the commands have been executed:

Create the IAM roles and IAM policies for the Fargate task

To run the Python code properly, your Fargate task must have some specific capabilities. The Fargate task must be able to do the following:

  1. Pull the twitterstream container image (created earlier) from ECR.
  2. Retrieve the Twitter credentials (securely stored earlier) from Secrets Manager.
  3. Log in to a specific Amazon CloudWatch log group (logging is optional but a best practice).
  4. Write to the DynamoDB table (created earlier).

The first three capabilities should be attached to the ECS task execution role. The fourth should be attached to the ECS task role. For more information, see Amazon ECS Task Execution IAM Role.

In other words, the capabilities that are associated with the ECS agent and container instance need to be configured in the ECS task execution role. Capabilities that must be available from within the task itself are configured in the ECS task role.

First, create the two IAM roles that are eventually attached to the Fargate task.

Create a file called ecs-task-role-trust-policy.json with the following content (make sure you replace the $REGION, $AWSACCOUNT placeholders as well as the proper secrets ARNs):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "ecs-tasks.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Now, run the following commands to create the twitterstream-task-role role, as well as the twitterstream-task-execution-role:

aws iam create-role --region $REGION --role-name twitterstream-task-role --assume-role-policy-document file://ecs-task-role-trust-policy.json

aws iam create-role --region $REGION --role-name twitterstream-task-execution-role --assume-role-policy-document file://ecs-task-role-trust-policy.json

Next, create a JSON file that codifies the capabilities required for the ECS task role (twitterstream-task-role):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "dynamodb:PutItem"
            ],
            "Resource": [
                "arn:aws:dynamodb:$REGION:$AWSACCOUNT:table/twitterStream"
            ]
        }
    ]
}

Save the file as twitterstream-iam-policy-task-role.json.

Now, create a JSON file that codifies the capabilities required for the ECS task execution role (twitterstream-task-execution-role):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ecr:GetAuthorizationToken",
                "ecr:BatchCheckLayerAvailability",
                "ecr:GetDownloadUrlForLayer",
                "ecr:BatchGetImage"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "secretsmanager:GetSecretValue",
                "kms:Decrypt"
            ],
            "Resource": [
                "arn:aws:secretsmanager:$REGION:$AWSACCOUNT:secret:CONSUMERKEY-XXXXXX",
                "arn:aws:secretsmanager:$REGION:$AWSACCOUNT:secret:CONSUMERSECRETKEY-XXXXXX",
                "arn:aws:secretsmanager:$REGION:$AWSACCOUNT:secret:ACCESSTOKEN-XXXXXX",
                "arn:aws:secretsmanager:$REGION:$AWSACCOUNT:secret:ACCESSTOKENSECRET-XXXXXX"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "*"
        }
    ]
}

Save the file as twitterstream-iam-policy-task-execution-role.json.

The following two commands create IAM policy documents and associate them with the IAM roles that you created earlier:

aws iam put-role-policy --region $REGION --role-name twitterstream-task-role --policy-name twitterstream-iam-policy-task-role --policy-document file://twitterstream-iam-policy-task-role.json

aws iam put-role-policy --region $REGION --role-name twitterstream-task-execution-role --policy-name twitterstream-iam-policy-task-execution-role --policy-document file://twitterstream-iam-policy-task-execution-role.json

Create the Fargate task

Now it’s time to tie everything together. As a recap, so far you have:

  • Created the container image that contains your Python code.
  • Created the DynamoDB table where the code is going to save the extract from the Twitter stream.
  • Securely stored the Twitter API credentials in Secrets Manager.
  • Created IAM roles with specific IAM policies that can write to DynamoDB and read from Secrets Manager (among other things).

Now you can tie everything together by creating a Fargate task that executes the container image. To do so, create a file called twitterstream-task.json and populate it with the following configuration:

{
    "family": "twitterstream", 
    "networkMode": "awsvpc", 
    "executionRoleArn": "arn:aws:iam::$AWSACCOUNT:role/twitterstream-task-execution-role",
    "taskRoleArn": "arn:aws:iam::$AWSACCOUNT:role/twitterstream-task-role",
    "containerDefinitions": [
        {
            "name": "twitterstream", 
            "image": "$AWSACCOUNT.dkr.ecr.$REGION.amazonaws.com/twitterstream:latest", 
            "essential": true,
            "environment": [
                {
                    "name": "DYNAMODBTABLE",
                    "value": "twitterStream"
                },
                {
                    "name": "AWSREGION",
                    "value": "$REGION"
                },                
                {
                    "name": "FILTER",
                    "value": "Cloud Computing"
                }
            ],    
            "secrets": [
                {
                    "name": "CONSUMERKEY",
                    "valueFrom": "arn:aws:secretsmanager:$REGION:$AWSACCOUNT:secret:CONSUMERKEY-XXXXXX"
                },
                {
                    "name": "CONSUMERSECRETKEY",
                    "valueFrom": "arn:aws:secretsmanager:$REGION:$AWSACCOUNT:secret:CONSUMERSECRETKEY-XXXXXX"
                },
                {
                    "name": "ACCESSTOKEN",
                    "valueFrom": "arn:aws:secretsmanager:$REGION:$AWSACCOUNT:secret:ACCESSTOKEN-XXXXXX"
                },
                {
                    "name": "ACCESSTOKENSECRET",
                    "valueFrom": "arn:aws:secretsmanager:$REGION:$AWSACCOUNT:secret:ACCESSTOKENSECRET-XXXXXX"
                }
            ],
            "logConfiguration": {
                    "logDriver": "awslogs",
                    "options": {
                            "awslogs-group": "twitterstream",
                            "awslogs-region": "$REGION",
                            "awslogs-stream-prefix": "twitterstream"
                    }
            }
        }
    ], 
    "requiresCompatibilities": [
        "FARGATE"
    ], 
    "cpu": "256", 
    "memory": "512"
}

To tweak the search string, change the value of the FILTER variable (currently set to “Cloud Computing”).

The Twitter API credentials are never exposed in clear text in these configuration files. There is only a reference to the Amazon Resource Names (ARNs) of the secret names. For example, this is the system variable CONSUMERKEY in the Fargate task configuration:

"secrets": [
                {
                    "name": "CONSUMERKEY",
                    "valueFrom": "arn:aws:secretsmanager:$REGION:$AWSACCOUNT:secret:CONSUMERKEY-XXXXXX"
                }

This directive asks the ECS agent running on the Fargate instance (that has assumed the specified IAM execution role) to do the following:

  • Connect to Secrets Manager.
  • Get the secret securely.
  • Assign its value to the CONSUMERKEY system variable to be made available to the Fargate task.

Register this task by running the following command:

aws ecs register-task-definition --region $REGION --cli-input-json file://twitterstream-task.json

In preparation to run the task, create the CloudWatch log group with the following command:

aws logs create-log-group --log-group-name twitterstream --region $REGION

If you don’t create the log group upfront, the task fails to start.

Create the ECS cluster

The last step before launching the Fargate task is creating an ECS cluster. An ECS cluster has two distinct dimensions:

  • The EC2 dimension, where the compute capacity is managed by the customer as ECS container instances)
  • The Fargate dimension, where the compute capacity is managed transparently by AWS.

For this example, you use the Fargate dimension, so you are essentially using the ECS cluster as a logical namespace.

Run the following command to create a cluster called twitterstream_cluster (change the name as needed). If you have a default cluster already created in your Region of choice, you can use that, too.

aws ecs create-cluster --cluster-name "twitterstream_cluster" --region $REGION

Now launch the task in the ECS cluster just created (in the us-west-2 Region) with a Fargate launch type. Run the following command:

aws ecs run-task --region $REGION \
  --cluster "twitterstream_cluster" \
  --launch-type FARGATE \
  --network-configuration "awsvpcConfiguration={subnets=["subnet-6a88e013","subnet-6a88e013"],securityGroups=["sg-7b45660a"],assignPublicIp=ENABLED}" \
  --task-definition twitterstream:1

A few things to pay attention to with this command:

  • If you created more than one revision of the task (by re-running the aws ecs register-task-definition command), make sure to run the aws ecs run-task command with the proper revision number at the end.
  • Customize the network section of the command for your own environment:
    • Use the default security group in your VPC, as the Fargate task only needs outbound connectivity.
    • Use two public subnets in which to start the Fargate task.

The Fargate task comes up in a few seconds and you can see it from the ECS console, as shown in the following screenshot:

Similarly, the DynamoDB table starts being populated with the information collected by the script running in the task, as shown in the following screenshot:

Finally, the Fargate task logs all the activities in the CloudWatch Log group, as shown in the following screenshot:

The log may take a few minutes to populate and be consolidated in CloudWatch.

Clean up

Now that you have completed the walkthrough, you can tear down all the resources that you created to avoid incurring future charges.

First, stop the ECS task that you started:

aws ecs stop-task --cluster twitterstream_cluster --region $REGION --task 4553111a-748e-4f6f-beb5-f95242235fb5

Your task number is different. You can grab it either from the ECS console or from the AWS CLI. This is how you read it from the AWS CLI:

aws ecs list-tasks --cluster twitterstream_cluster --family twitterstream --region $REGION  
{
"taskArns": [
"arn:aws:ecs:us-west-2:693935722839:task/4553111a-748e-4f6f-beb5-f95242235fb5 "
]
}

Then, delete the ECS cluster that you created:

aws ecs delete-cluster --cluster "twitterstream_cluster" --region $REGION

Next, delete the CloudWatch log group:

aws logs delete-log-group --log-group-name twitterstream --region $REGION

The console provides a fast workflow to delete the IAM roles. In the IAM console, choose Roles and filter your search for twitter. You should see the two roles that you created:

Select the two roles and choose Delete role.

Cleaning up the secrets created is straightforward. Run a delete-secret command for each one:

aws secretsmanager delete-secret --region $REGION --secret-id CONSUMERKEY
aws secretsmanager delete-secret --region $REGION --secret-id CONSUMERSECRETKEY
aws secretsmanager delete-secret --region $REGION --secret-id ACCESSTOKEN
aws secretsmanager delete-secret --region $REGION --secret-id ACCESSTOKENSECRET

The next step is to delete the DynamoDB table:

aws dynamodb delete-table --table-name twitterStream --region $REGION

The last step is to delete the ECR repository. By default, you cannot delete a repository that still has container images in it. To address that, add the –force directive:

aws ecr delete-repository --region $REGION --repository-name twitterstream --force

You can de-register the twitterstream task definition by following this procedure in the ECS console. The task definitions remain inactive but visible in the system.

With this, you have deleted all the resources that you created.

Conclusion

In this post, I demonstrated how Fargate can interact with Secrets Manager to retrieve sensitive data (for example, Twitter API credentials). You can securely make the sensitive data available to the code running in the container inside the Fargate task.

I also demonstrated how a Fargate task with a specific IAM role can access other AWS services (for example, DynamoDB).

 

Improving and securing your game-binaries distribution at scale

Post Syndicated from Ignacio Riesgo original https://aws.amazon.com/blogs/compute/improving-and-securing-your-game-binaries-distribution-at-scale/

This post is contributed by Yahav Biran | Sr. Solutions Architect, AWS and Scott Selinger | Associate Solutions Architect, AWS 

One of the challenges that game publishers face when employing CI/CD processes is the distribution of updated game binaries in a scalable, secure, and cost-effective way. Continuous integration and continuous deployment (CI/CD) processes enable game publishers to improve games throughout their lifecycle.

Often, CI/CD jobs contain minor changes that cause the CI/CD processes to push a full set of game binaries over the internet. This is a suboptimal approach. It negatively affects the cost of development network resources, customer network resources (output and input bandwidth), and the time it takes for a game update to propagate.

This post proposes a method of optimizing the game integration and deployments. Specifically, this method improves the distribution of updated game binaries to various targets, such as game-server farms. The proposed mechanism also adds to the security model designed to include progressive layers, starting from the Amazon EC2 instance that runs the game server. It also improves security of the game binaries, the game assets, and the monitoring of the game server deployments across several AWS Regions.

Why CI/CD in gaming is hard today

Game server binaries are usually a native application that includes binaries like graphic, sound, network, and physics assets, as well as scripts and media files. Game servers are usually developed with game engines like Unreal, Amazon Lumberyard, and Unity. Game binaries typically take up tens of gigabytes. However, because game developer teams modify only a few tens of kilobytes every day, frequent distribution of a full set of binaries is wasteful.

For a standard global game deployment, distributing game binaries requires compressing the entire binaries set and transferring the compressed version to destinations, then decompressing it upon arrival. You can optimize the process by decoupling the various layers, pushing and deploying them individually.

In both cases, the continuous deployment process might be slow due to the compression and transfer durations. Also, distributing the image binaries incurs unnecessary data transfer costs, since data is duplicated. Other game-binary distribution methods may require the game publisher’s DevOps teams to install and maintain custom caching mechanisms.

This post demonstrates an optimal method for distributing game server updates. The solution uses containerized images stored in Amazon ECR and deployed using Amazon ECS or Amazon EKS to shorten the distribution duration and reduce network usage.

How can containers help?

Dockerized game binaries enable standard caching with no implementation from the game publisher. Dockerized game binaries allow game publishers to stage their continuous build process in two ways:

  • To rebuild only the layer that was updated in a particular build process and uses the other cached layers.
  • To reassemble both packages into a deployable game server.

The use of ECR with either ECS or EKS takes care of the last mile deployment to the Docker container host.

Larger application binaries mean longer application loading times. To reduce the overall application initialization time, I decouple the deployment of the binaries and media files to allow the application to update faster. For example, updates in the application media files do not require the replication of the engine binaries or media files. This is achievable if the application binaries can be deployed in a separate directory structure. For example:

/opt/local/engine

/opt/local/engine-media

/opt/local/app

/opt/local/app-media

Containerized game servers deployment on EKS

The application server can be deployed as a single Kubernetes pod with multiple containers. The engine media (/opt/local/engine-media), the application (/opt/local/app), and the application media (/opt/local/app-media) spawn as Kubernetes initContainers and the engine binary (/opt/local/engine) runs as the main container.

apiVersion: v1
kind: Pod
metadata:
  name: my-game-app-pod
  labels:
    app: my-game-app
volumes:
      - name: engine-media-volume
          emptyDir: {}
      - name: app-volume
          emptyDir: {}
      - name: app-media-volume
          emptyDir: {}
      initContainers:
        - name: app
          image: the-app- image
          imagePullPolicy: Always
          command:
            - "sh"
            - "-c"
            - "cp /* /opt/local/engine-media"
          volumeMounts:
            - name: engine-media-volume
              mountPath: /opt/local/engine-media
        - name: engine-media
          image: the-engine-media-image
          imagePullPolicy: Always
          command:
            - "sh"
            - "-c"
            - "cp /* /opt/local/app"
          volumeMounts:
            - name: app-volume
              mountPath: /opt/local/app
        - name: app-media
          image: the-app-media-image
          imagePullPolicy: Always
          command:
            - "sh"
            - "-c"
            - "cp /* /opt/local/app-media"
          volumeMounts:
            - name: app-media-volume
              mountPath: /opt/local/app-media
spec:
  containers:
  - name: the-engine
    image: the-engine-image
    imagePullPolicy: Always
    volumeMounts:
       - name: engine-media-volume
         mountPath: /opt/local/engine-media
       - name: app-volume
         mountPath: /opt/local/app
       - name: app-media-volume
         mountPath: /opt/local/app-media
    command: ['sh', '-c', '/opt/local/engine/start.sh']

Applying multi-stage game binaries builds

In this post, I use Docker multi-stage builds for containerizing the game asset builds. I use AWS CodeBuild to manage the build and to deploy the updates of game engines like Amazon Lumberyard as ready-to-play dedicated game servers.

Using this method, frequent changes in the game binaries require less than 1% of the data transfer typically required by full image replication to the nodes that run the game-server instances. This results in significant improvements in build and integration time.

I provide a deployment example for Amazon Lumberyard Multiplayer Sample that is deployed to an EKS cluster, but this can also be done using different container orchestration technology and different game engines. I also show that the image being deployed as a game-server instance is always the latest image, which allows centralized control of the code to be scheduled upon distribution.

This example shows an update of only 50 MB of game assets, whereas the full game-server binary is 3.1 GB. With only 1.5% of the content being updated, that speeds up the build process by 90% compared to non-containerized game binaries.

For security with EKS, apply the imagePullPolicy: Always option as part of the Kubernetes best practice container images deployment option. This option ensures that the latest image is pulled every time that the pod is started, thus deploying images from a single source in ECR, in this case.

Example setup

  • Read through the following sample, a multiplayer game sample, and see how to build and structure multiplayer games to employ the various features of the GridMate networking library.
  • Create an AWS CodeCommit or GitHub repository (multiplayersample-lmbr) that includes the game engine binaries, the game assets (.pak, .cfg and more), AWS CodeBuild specs, and EKS deployment specs.
  • Create a CodeBuild project that points to the CodeCommit repo. The build image uses aws/codebuild/docker:18.09.0: the built-in image maintained by CodeBuild configured with 3 GB of memory and two vCPUs. The compute allocated for build capacity can be modified for cost and build time tradeoff.
  • Create an EKS cluster designated as a staging or an integration environment for the game title. In this case, it’s multiplayersample.

The binaries build Git repository

The Git repository is composed of five core components ordered by their size:

  • The game engine binaries (for example, BinLinux64.Dedicated.tar.gz). This is the compressed version of the game engine artifacts that are not updated regularly, hence they are deployed as a compressed file. The maintenance of this file is usually done by a different team than the developers working on the game title.
  • The game binaries (for example, MultiplayerSample_pc_Paks_Dedicated). This directory is maintained by the game development team and managed as a standard multi-branch repository. The artifacts under this directory get updated on a daily or weekly basis, depending on the game development plan.
  • The build-related specifications (for example, buildspec.yml  and Dockerfile). These files specify the build process. For simplicity, I only included the Docker build process to convey the speed of continuous integration. The process can be easily extended to include the game compilation and linked process as well.
  • The Docker artifacts for containerizing the game engine and the game binaries (for example, start.sh and start.py). These scripts usually are maintained by the game DevOps teams and updated outside of the regular game development plan. More details about these scripts can be found in a sample that describes how to deploy a game-server in Amazon EKS.
  • The deployment specifications (for example, eks-spec) specify the Kubernetes game-server deployment specs. This is for reference only, since the CD process usually runs in a separate set of resources like staging EKS clusters, which are owned and maintained by a different team.

The game build process

The build process starts with any Git push event on the Git repository. The build process includes three core phases denoted by pre_build, buildand post_build in multiplayersample-lmbr/buildspec.yml

  1. The pre_build phase unzips the game-engine binaries and logs in to the container registry (Amazon ECR) to prepare.
  2. The buildphase executes the docker build command that includes the multi-stage build.
    • The Dockerfile spec file describes the multi-stage image build process. It starts by adding the game-engine binaries to the Linux OS, ubuntu:18.04 in this example.
    • FROM ubuntu:18.04
    • ADD BinLinux64.Dedicated.tar /
    • It continues by adding the necessary packages to the game server (for example, ec2-metadata, boto3, libc, and Python) and the necessary scripts for controlling the game server runtime in EKS. These packages are only required for the CI/CD process. Therefore, they are only added in the CI/CD process. This enables a clean decoupling between the necessary packages for development, integration, and deployment, and simplifies the process for both teams.
    • RUN apt-get install -y python python-pip
    • RUN apt-get install -y net-tools vim
    • RUN apt-get install -y libc++-dev
    • RUN pip install mcstatus ec2-metadata boto3
    • ADD start.sh /start.sh
    • ADD start.py /start.py
    • The second part is to copy the game engine from the previous stage --from=0 to the next build stage. In this case, you copy the game engine binaries with the two COPY Docker directives.
    • COPY --from=0 /BinLinux64.Dedicated/* /BinLinux64.Dedicated/
    • COPY --from=0 /BinLinux64.Dedicated/qtlibs /BinLinux64.Dedicated/qtlibs/
    • Finally, the game binaries are added as a separate layer on top of the game-engine layers, which concludes the build. It’s expected that constant daily changes are made to this layer, which is why it is packaged separately. If your game includes other abstractions, you can break this step into several discrete Docker image layers.
    • ADD MultiplayerSample_pc_Paks_Dedicated /BinLinux64.Dedicated/
  3. The post_build phase pushes the game Docker image to the centralized container registry for further deployment to the various regional EKS clusters. In this phase, tag and push the new image to the designated container registry in ECR.

- docker tag $IMAGE_REPO_NAME:$IMAGE_TAG

$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME:$IMAGE_TAG

docker push

$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME:$IMAGE_TAG

The game deployment process in EKS

At this point, you’ve pushed the updated image to the designated container registry in ECR (/$IMAGE_REPO_NAME:$IMAGE_TAG). This image is scheduled as a game server in an EKS cluster as game-server Kubernetes deployment, as described in the sample.

In this example, I use  imagePullPolicy: Always.


containers:
…
        image: /$IMAGE_REPO_NAME:$IMAGE_TAG/multiplayersample-build
        imagePullPolicy: Always
        name: multiplayersample
…

By using imagePullPolicy, you ensure that no one can circumvent Amazon ECR security. You can securely make ECR the single source of truth with regards to scheduled binaries. However, ECR to the worker nodes via kubelet, the node agent. Given the size of a whole image combined with the frequency with which it is pulled, that would amount to a significant additional cost to your project.

However, Docker layers allow you to update only the layers that were modified, preventing a whole image update. Also, they enable secure image distribution. In this example, only the layer MultiplayerSample_pc_Paks_Dedicated is updated.

Proposed CI/CD process

The following diagram shows an example end-to-end architecture of a full-scale game-server deployment using EKS as the orchestration system, ECR as the container registry, and CodeBuild as the build engine.

Game developers merge changes to the Git repository that include both the preconfigured game-engine binaries and the game artifacts. Upon merge events, CodeBuild builds a multistage game-server image that is pushed to a centralized container registry hosted by ECR. At this point, DevOps teams in different Regions continuously schedule the image as a game server, pulling only the updated layer in the game server image. This keeps the entire game-server fleet running the same game binaries set, making for a secure deployment.

 

Try it out

I published two examples to guide you through the process of building an Amazon EKS cluster and deploying a containerized game server with large binaries.

Conclusion

Adopting CI/CD in game development improves the software development lifecycle by continuously deploying quality-based updated game binaries. CI/CD in game development is usually hindered by the cost of distributing large binaries, in particular, by cross-regional deployments.

Non-containerized paradigms require deployment of the full set of binaries, which is an expensive and time-consuming task. Containerized game-server binaries with AWS build tools and Amazon EKS-based regional clusters of game servers enable secure and cost-effective distribution of large binary sets to enable increased agility in today’s game development.

In this post, I demonstrated a reduction of more than 90% of the network traffic required by implementing an effective CI/CD system in a large-scale deployment of multiplayer game servers.

Running the most reliable choice for Windows workloads: Windows on AWS

Post Syndicated from Sandy Carter original https://aws.amazon.com/blogs/compute/running-the-most-reliable-choice-for-windows-workloads-windows-on-aws/

Some of you may not know, but AWS began supporting Microsoft Windows workloads on AWS in 2008—over 11 years ago. Year over year, we have released exciting new services and enhancements based on feedback from customers like you. AWS License Manager and Amazon CloudWatch Application Insights for .NET and SQL Server are just some of the recent examples. The rate and pace of innovation is eye-popping.

In addition to innovation, one of the key areas that companies value is the reliability of the cloud platform. I recently chatted with David Sheehan, DevOps engineer at eMarketer. He told me, “Our move from Azure to AWS improved the performance and reliability of our microservices in addition to significant cost savings.” If a healthcare clinic can’t connect to the internet, then it’s possible that they can’t deliver care to their patients. If a bank can’t process transactions because of an outage, they could lose business.

In 2018, the next-largest cloud provider had almost 7x more downtime hours than AWS per data pulled directly from the public service health dashboards of the major cloud providers. It is the reason companies like Edwards Lifesciences chose AWS. They are a global leader in patient-focused medical innovations for structural heart disease, as well as critical care and surgical monitoring. Rajeev Bhardwaj, the senior director for Enterprise Technology, recently told me, “We chose AWS for our data center workloads, including Windows, based on our assessment of the security, reliability, and performance of the platform.”

There are several reasons as to why AWS delivers a more reliable platform for Microsoft workloads but I would like to focus on two here: designing for reliability and scaling within a Region.

Reason #1—It’s designed for reliability

AWS has significantly better reliability than the next largest cloud provider, due to our fundamentally better global infrastructure design based on Regions and Availability Zones. The AWS Cloud spans 64 zones within 21 geographic Regions around the world. We’ve announced plans for 12 more zones and four more Regions in Bahrain, Cape Town, Jakarta, and Milan.

Look at networking capabilities across five key areas: security, global coverage, performance, manageability, and availability. AWS has made deep investments in each of these areas over the past 12 years. We want to ensure that AWS has the networking capabilities required to run the world’s most demanding workloads.

There is no compression algorithm for experience. From running the most extensive, reliable, and secure global cloud infrastructure technology platform, we’ve learned that you care about the availability and performance of your applications. You want to deploy applications across multiple zones in the same Region for fault tolerance and latency.

I want to take a moment to emphasize that our approach to building our network is fundamentally different from our competitors, and that difference matters. Each of our Regions is fully isolated from all other Regions. Unlike virtually every other cloud provider, each AWS Region has multiple zones and data centers. These zones are a fully isolated partition of our infrastructure that contains up to eight separate data centers.

The zones are connected to each other with fast, private fiber-optic networking, enabling you to easily architect applications that automatically fail over between zones without interruption. With their own power infrastructure, the zones are physically separated by a meaningful distance, many kilometers, from any other zone. You can partition applications across multiple zones in the same Region to better isolate any issues and achieve high availability.

The AWS control plane (including APIs) and AWS Management Console are distributed across AWS Regions. They use a Multi-AZ architecture within each Region to deliver resilience and ensure continuous availability. This ensures that you avoid having a critical service dependency on a single data center.

While other cloud vendors claim to have Availability Zones, they do not have the same stringent requirements for isolation between zones, leading to impact across multiple zones. Furthermore, AWS has more zones and more Regions with support for multiple zones than any other cloud provider. This design is why the next largest cloud provider had almost 7x more downtime hours in 2018 than AWS.

Reason #2—Scale within a Region

We also designed our services into smaller cells that scale out within a Region, as opposed to a single-Region instance that scales up. This approach reduces the blast radius when there is a cell-level failure. It is why AWS—unlike other providers—has never experienced a network event spanning multiple Regions.

AWS also provides the most detailed information on service availability via the Service Health Dashboard, including Regions affected, services impacted, and downtime duration. AWS keeps a running log of all service interruptions for the past year. Finally, you can subscribe to an RSS feed to be notified of interruptions to each individual service.

Reliability matters

Running Windows workloads on AWS means that you not only get the most innovative cloud, but you also have the most reliable cloud as well.

For example, Mary Kay is one of the world’s leading direct sellers of skin care products and cosmetics. They have tens of thousands of employees and beauty consultants working outside the office, so the IT system is fundamental for the success of their company.

Mary Kay used Availability Zones and Microsoft Active Directory to architect their applications on AWS. AWS Microsoft Managed AD provides Mary Kay the features that enabled them to deploy SQL Server Always On availability groups on Amazon EC2 Windows. This configuration gave Mary Kay the control to scale their deployment out to meet their performance requirements. They were able to deploy the service in multiple Regions to support users worldwide. Their on-premises users get the same experience when using Active Directory–aware services, either on-premises or in the AWS Cloud.

Now, with our cross-account and cross-VPC support, Mary Kay is looking at reducing their managed Active Directory infrastructure footprint, saving money and reducing complexity. But this identity management system must be reliable and scalable as well as innovative.

Fugro is a Dutch multinational public company headquartered in the Netherlands. They provide geotechnical, survey, subsea, and geoscience services for clients, typically oil and gas, telecommunications cable, and infrastructure companies. Fugro leverages the cloud to support the delivery of geo-intelligence and asset management services for clients globally in industries including onshore and offshore energy, renewables, power, and construction.

As I was chatting with Scott Carpenter, the global cloud architect for Fugro, he said, “Fugro is also now in the process of migrating a complex ESRI ArcGIS environment from an existing cloud provider to AWS. It is going to centralize and accelerate access from existing AWS hosted datasets, while still providing flexibility and interoperability to external and third-party data sources. The ArcGIS migration is driven by a focus on providing the highest level of operational excellence.”

With AWS, you don’t have to be concerned about reliability. AWS has the reliability and scale that drives innovation for Windows applications running in the cloud. And the same reliability that makes it best for your Windows applications is the same reliability that makes AWS the best cloud for all your applications.

Let AWS help you assess how your company can get the most out of cloud. Join all the AWS customers that trust us to run their most important applications in the best cloud. To have us create an assessment for your Windows applications or all your applications, email us at [email protected].

Optimizing Network Intensive Workloads on Amazon EC2 A1 Instances

Post Syndicated from Martin Yip original https://aws.amazon.com/blogs/compute/optimizing-network-intensive-workloads-on-amazon-ec2-a1-instances/

This post courtesy of Ali Saidi, AWS, Principal Engineer

At re:Invent 2018, AWS announced the Amazon EC2 A1 instance. The A1 instances are powered by our internally developed Arm-based AWS Graviton processors and are up to 45% less expensive than other instance types with the same number of vCPUs and DRAM. These instances are based on the AWS Nitro System, and offer enhanced-networking of up to 10 Gbps with Elastic Network Adapters (ENA).

One of the use cases for the A1 instance is key-value stores and in this post, we describe how to get the most performance from the A1 instance running memcached. Some simple configuration options increase the performance of memcached by 3.9X over the out-of-the-box experience as we’ll show below. Although we focus on memcached, the configuration advice is similar for any network intensive workload running on A1 instances. Typically, the performance of network intensive workloads will improve by tuning some of these parameters, however depending on the particular data rates and processing requirements the values below could change.

irqbalance

Most Linux distributions enable irqbalance by default which load-balance interrupts to different CPUs during runtime. It does a good job to balance interrupt load, but in some cases, we can do better by pinning interrupts to specific CPUs. For our optimizations we’re going to temporarily disable irqbalance, however, if this is a production configuration that needs to survive a server reboot, irqbalance would need to be permanently disabled and the changes below would need to be added to the boot sequence.

Receive Packet Steering (RPS)

RPS controls which CPUs process packets are received by the Linux networking stack (softIRQs). Depending on instance size and the amount of application processing needed per packet, sometimes the optimal configuration is to have the core receiving packets also execute the Linux networking stack, other times it’s better to spread the processing among a set of cores. For memcached on EC2 A1 instances, we found that using RPS to spread the load out is helpful on the larger instance sizes.

Networking Queues

A1 instances with medium, large, and xlarge instance sizes have a single queue to send and receive packets while 2xlarge and 4xlarge instance sizes have two queues. On the single queue droplets, we’ll pin the IRQ to core 0, while on the dual-queue droplets we’ll use either core 0 or core 0 and core 8.

Instance TypeIRQ settingsRPS settingsApplication settings
a1.xlargeCore 0Core 0Run on cores 1-3
a1.2xlargeBoth on core 0Core 0-3, 4-7Run on core 1-7
a1.4xlargeCore 0 and core 8Core 0-7, 8-15Run on cores 1-7 and 9-15

 

 

 

 

 

The following script sets up the Linux kernel parameters:

#!/bin/bash 

sudo systemctl stop irqbalance.service
set_irq_affinity() {
  grep eth0 /proc/interrupts | awk '{print $1}' | tr -d : | while read IRQ; 
do
    sudo sh -c "echo $1 > /proc/irq/$IRQ/smp_affinity_list"
    shift
  done
}
 
case `grep ^processor /proc/cpuinfo  | wc -l ` in
  (4) sudo sh -c 'echo 1 > /sys/class/net/eth0/queues/rx-0/rps_cpus'
      set_irq_affinity 0
      ;;
  (8) sudo sh -c 'echo f > /sys/class/net/eth0/queues/rx-0/rps_cpus'
      sudo sh -c 'echo f0 > /sys/class/net/eth0/queues/rx-0/rps_cpus'
      set_irq_affinity 0 0
      ;;
  (16) sudo sh -c 'echo ff > /sys/class/net/eth0/queues/rx-0/rps_cpus'
      sudo sh -c 'echo ff00 > /sys/class/net/eth0/queues/rx-0/rps_cpus'
      set_irq_affinity 0 08
      ;;
  *)  echo "Script only supports 4, 8, 16 cores on A1 instances"
      exit 1;
      ;;
esac

Summary

Some simple tuning parameters can significantly improve the performance of network intensive workloads on the A1 instance. With these changes we get 3.9X the performance on an a1.4xlarge and the other two instance sizes see similar improvements. While the particular values listed here aren’t applicable to all network intensive benchmarks, this article demonstrates the methodology and provides a starting point to tune the system and balance the load across CPUs to improve performance. If you have questions about your own workload running on A1 instances, please don’t hesitate to get in touch with us at [email protected] .

Fact-checking the truth on TCO for running Windows workloads in the cloud

Post Syndicated from Sandy Carter original https://aws.amazon.com/blogs/compute/fact-checking-the-truth-on-tco-for-running-windows-workloads-in-the-cloud/

We’ve been talking to many customers over the last 3–4 months who are concerned about the total cost of ownership (TCO) for running Microsoft Windows workloads in the cloud.

For example, Infor is a global leader in enterprise resource planning (ERP) for manufacturing, healthcare, and retail. They’ve moved thousands of their existing Microsoft SQL Server workloads to Amazon EC2 instances. As a result, they are saving 75% on monthly backup costs. With these tremendous cost savings, Infor can now focus their resources on exponential business growth, with initiatives around AI and optimization.

We also love the story of Just Eat, a UK-based company that has migrated their SQL Server workloads to AWS. They’re now focused on using that data to train Alexa skills for ordering take out!

Here are three fact checks that you should review to ensure that you are getting the best TCO!

Fact check #1: Microsoft’s cost comparisons are misleading for running Windows workloads in the cloud

Customers have shared with us over and over why they continue to trust AWS to run their most important Windows workloads. Still, some of those customers tell us that Microsoft claims Azure is cheaper for running Windows workloads. But can this really be true?

When looking at Microsoft’s cost comparisons, we can see that their analysis is misleading because of some false assumptions. For example, Microsoft only compares the costs of the compute service and licenses. But every workload needs storage and networking! By leaving out these necessary services, Microsoft is not comparing real-world workloads.

The comparison also assumes that the AWS and Azure offerings are at a performance parity, which isn’t true. While the comparison uses equivalent virtual instance configuration, Microsoft ignores the significantly higher performance of AWS compute. We hear that customers must run between two to three times as many Azure instances to get the same performance as they do on AWS (see Fact check #2).

And the list goes on. Microsoft’s analysis only looks at 2008 versions of Windows Server or SQL Server. Then, it adds in the cost for expensive Extended Support to the AWS calculation (extended support costs 75% of the current license cost per year). This addition makes up more than half of the claimed cost difference.

Microsoft assumes that in the next three years, customers won’t move off software that’s more than 10 years old. What we hear from customers is that they plan to use their upgrade rights from Software Assurance (SA) to move to newer versions, such as SQL Server 2016. They’ll use our new automated upgrade tool to eliminate the need for these expensive fees.

Finally, the comparison assumes the use of Azure Hybrid Benefit to reduce the cost of the Azure virtual instance. It does not factor in the cost of the required Microsoft SA on each license. The required SA adds significant cost to the Microsoft side of the example and further demonstrates that their example was misleading.

These assumptions result in a comparison that does not factor in all the costs needed to run SQL Server in Azure. Nor does it account for the performance gains that you get from running on AWS.

At AWS, we are committed to helping you make the most efficient use of cloud resources and lower your Microsoft bill. It appears that Microsoft is focused on keeping those line items flat or growing them over time by adding more and more licensing complexity.

Fact check #2: Price-performance matters to your business for running SQL Server in the cloud

When deciding what cloud is best for your Windows workloads, you should consider both price and performance to find the right operational combination to maximize value for your business. It is also important to think about the future and not make important platform decisions based on technology that was designed before the rise of the cloud.

We know that getting better application performance for your apps is critical to your customers’ satisfaction. In fact, excellent application performance leads to 39% higher customer satisfaction. For more information, see the Netmagic Solutions whitepaper, Application Performance Management: How End-User Experience Affects Your Bottom Line. Poor performance may lead to damaged reputations or even worse, customer attrition.

To make sure that you have the best possible experience for your customers, we focused on pushing the boundaries around performance.

With that in mind, here are some comparisons done between Azure and AWS:

  • DB Best, an enterprise database consulting company, wrote two blog posts—one each for Azure and AWS. They showed how to get the best price-performance ratio for running current versions of SQL Server in the cloud.
  • ZK Research took these posts and compared the results from DB Best to show an apples-to-apples comparison. The testing from DB Best found that SQL Server on AWS consistently shows a 2–3x better performance compared to Azure, using a TPC-C-like benchmark tool called HammerDB.
  • ZK Research then used the DB Best data to calculate the comparison cost for running 1 billion transactions per month. ZK Research found that SQL Server running on Azure would have twice the cost than when running on AWS, when comparing price-performance, including storage, compute, and networking.

As you can see from this data, running on AWS gives you the best price-performance ratio for Windows workloads.

Fact check #3: What does an optimized TCO for Windows workloads in the cloud look like?

When assessing which cloud to run your Windows workloads, your comparison must go well beyond just the compute and support costs. Look at the TCO of your workloads and include everything necessary to run and support these workloads, like storage, networking, and the cost benefits of better reliability. Then, see how you can use the cloud to lower your overall TCO.

So how do you lower your costs to run Windows workloads like Windows Server and SQL Server in the cloud? Optimize those workloads for the scalability and flexibility of cloud. When companies plan cloud migrations on their own, they often use a spreadsheet inventory of their on-premises servers and try to map them, one-to-one, to new cloud-based servers. But these inventories don’t account for the capabilities of cloud-based systems.

On-premises servers are not optimized, with 84% of workloads currently over-provisioned. Many Windows and SQL Server 2008 workloads are running on older, slower server hardware. By sizing your workloads for performance and capability, not by physical servers, you can optimize your cloud migration.

Reducing the number of licenses that you use, both by server and core counts, can also drive significant cost savings. See which on-premises workloads are fault-tolerant, and then use Amazon EC2 Spot Instances to save up to 90% on your compute costs vs. On-Demand pricing.

To get the most out of moving your Windows workloads into the cloud, review and optimize each workload to take best advantage of cloud scalability and flexibility. Our customers have made the most efficient use of cloud resources by working with assessment partners like Movere or TSO Logic, which is now part of AWS.

By running detailed assessments of their environments before migration, customers can yield up to 36% savings using AWS over three years. Customer with optimized environments often find that their AWS solutions are price-competitive with Microsoft even before taking in account the AWS price-performance advantage.

In addition, you can optimize utilization with AWS Trusted Advisor. In fact, over the last couple years, we’ve used AWS Trusted Advisor to tell customers how to spend less money with us, leading to hundreds of millions of dollars in savings for our customers every year.

Why run Windows Server and SQL Server anywhere else but AWS?

For the past 10 years, many companies, such as Adobe and Salesforce, have trusted AWS to run Windows-based workloads such as Windows Server and SQL Server. Many customers tell us the reasons they choose AWS is due to TCO and reliability. Customers have been able to run their Windows workloads with lower costs and higher performance than on any other cloud. To learn more about our story and why customers trust AWS for their Windows workloads, check out Windows on AWS.

After the workloads are optimized for cloud, you can save even more money by efficiently managing your Window Server and SQL Server licenses with AWS License Manager. By the way, License Manager lets you manage on-premises and in the cloud, as well as other software like SAP, Oracle, and IBM.

Dedicated hosts allow customers to bring Windows Server and SQL Server licenses with or without Software Assurance. Licenses without Software Assurance cannot be taken to Azure. Furthermore, Dedicated Hosts allow customers to license Windows Server at the physical level and achieve a greater number of instances at a lower price than they would get through Azure Hybrid Use Benefits.

Summary

The answer is clear: AWS is the best cloud to run your Windows workloads. AWS offers the best experience for Windows workloads in the cloud, which is why we run almost 2x the number of Windows workloads compared to the next largest cloud.

Our customers have found that migrating their Windows workloads to AWS can yield significant savings and better performance. Customers like Sysco, Hess, Sony DADC New Media Solutions, Ancestry, and Expedia have chosen AWS to upgrade, migrate, and modernize their Windows workloads in the cloud.

Don’t let misleading cost comparisons prevent you from getting the most out of cloud. Let AWS help you assess how you can get the most out of cloud. Join all the AWS customers that trust us to run their most important applications in the best cloud for Windows workloads. If you want us to do an assessment for you, email us at [email protected].

Docker, Amazon ECS, and Spot Fleets: A Great Fit Together

Post Syndicated from Tung Nguyen original https://aws.amazon.com/blogs/aws/docker-amazon-ecs-and-spot-fleets-a-great-fit-together/

Guest post by AWS Container Hero Tung Nguyen. Tung is the president and founder of BoltOps, a consulting company focused on cloud infrastructure and software on AWS. He also enjoys writing for the BoltOps Nuts and Bolts blog.

EC2 Spot Instances allow me to use spare compute capacity at a steep discount. Using Amazon ECS with Spot Instances is probably one of the best ways to run my workloads on AWS. By using Spot Instances, I can save 50–90% on Amazon EC2 instances. You would think that folks would jump at a huge opportunity like a black Friday sales special. However, most folks either seem to not know about Spot Instances or are hesitant. This may be due to some fallacies about Spot.

Spot Fallacies

With the Spot model, AWS can remove instances at any time. It can be due to a maintenance upgrade; high demand for that instance type; older instance type; or for any reason whatsoever.

Hence the first fear and fallacy that people quickly point out with Spot:

What do you mean that the instance can be replaced at any time? Oh no, that must mean that within 20 minutes of launching the instance, it gets killed.

I felt the same way too initially. The actual Spot Instance Advisor website states:

The average frequency of interruption across all Regions and instance types is less than 5%.

From my own usage, I have seen instances run for weeks. Need proof? Here’s a screenshot from an instance in one of our production clusters.

If you’re wondering how many days that is….

Yes, that is 228 continuous days. You might not get these same long uptimes, but it disproves the fallacy that Spot Instances are usually interrupted within 20 minutes from launch.

Spot Fleets

With Spot Instances, I place a single request for a specific instance in a specific Availability Zone. With Spot Fleets, instead of requesting a single instance type, I can ask for a variety of instance types that meet my requirements. For many workloads, as long as the CPU and RAM are close enough, many instance types do just fine.

So, I can spread my instance bets across instance types and multiple zones with Spot Fleets. Using Spot Fleets dramatically makes the system more robust on top of the already mentioned low interruption rate. Also, I can run an On-Demand cluster to provide additional safeguard capacity.

ECS and Spot Fleets: A Great Fit Together

This is one of my favorite ways to run workloads because it gives me a scalable system at a ridiculously low cost. The technologies are such a great fit together that one might think they were built for each other.

  1. Docker provides a consistent, standard binary format to deploy. If it works in one Docker environment, then it works in another. Containers can be pulled down in seconds, making them an excellent fit for Spot Instances, where containers might move around during an interruption.
  2. ECS provides a great ecosystem to run Docker containers. ECS supports a feature called connection instance draining that allows me to tell ECS to relocate the Docker containers to other EC2 instances.
  3. Spot Instances fire off a two-minute warning signal letting me know when it’s about to terminate the instance.

These are the necessary pieces I need for building an ECS cluster on top of Spot Fleet. I use the two-minute warning to call ECS connection draining, and ECS automatically moves containers to another instance in the fleet.

Here’s a CloudFormation template that achieves this: ecs-ec2-spot-fleet. Because the focus is on understanding Spot Fleets, the VPC is designed to be simple.

The template specifies two instance types in the Spot Fleet: t3.small and t3.medium with 2 GB and 4 GB of RAM, respectively. The template weights the t3.medium twice as much as the t3.small. Essentially, the Spot Fleet TargetCapacity value equals the total RAM to provision for the ECS cluster. So if I specify 8, the Spot Fleet service might provision four t3.small instances or two t3.medium instances. The cluster adds up to at least 8 GB of RAM.

To launch the stack run, I run the following command:

aws cloudformation create-stack --stack-name ecs-spot-demo --template-body file://ecs-spot-demo.yml --capabilities CAPABILITY_IAM

The CloudFormation stack launches container instances and registers them to an ECS cluster named developmentby default. I can change this with the EcsCluster parameter. For more information on the parameters, see the README and the template source.

When I deploy the application, the deploy tool creates the ECS cluster itself. Here are the Spot Instances in the EC2 console.

Deploy the demo app

After the Spot cluster is up, I can deploy a demo app on it. I wrote a tool called Ufo that is useful for these tasks:

  1. Build the Docker image.
  2. Register the ECS task definition.
  3. Register and deploy the ECS service.
  4. Create the load balancer.

Docker should be installed as a prerequisite. First, I create an ECR repo and set some variables:

ECR_REPO=$(aws ecr create-repository --repository-name demo/sinatra | jq -r '.repository.repositoryUri')
VPC_ID=$(aws ec2 describe-vpcs --filters Name=tag:Name,Values="demo vpc" | jq -r '.Vpcs[].VpcId')

Now I’m ready to clone the demo repo and deploy a sample app to ECS with ufo.

git clone https://github.com/tongueroo/demo-ufo.git demo
cd demo
ufo init --image $ECR_REPO --vpc-id $VPC_ID
ufo current --service demo-web
ufo ship # deploys to ECS on the Spot Fleet cluster

Here’s the ECS service running:

I then grab the Elastic Load Balancing endpoint from the console or with ufo ps.

$ ufo ps
Elb: develop-Elb-12LHJWU4TH3Q8-597605736.us-west-2.elb.amazonaws.com
$

Now I test with curl:

$ curl develop-Elb-12LHJWU4TH3Q8-597605736.us-west-2.elb.amazonaws.com
42

The application returns “42,” the meaning of life, successfully. That’s it! I now have an application running on ECS with Spot Fleet Instances.

Parting thoughts

One additional advantage of using Spot is that it encourages me to think about my architecture in a highly available manner. The Spot “constraints” ironically result in much better sleep at night as the system must be designed to be self-healing.

Hopefully, this post opens the world of running ECS on Spot Instances to you. It’s a core of part of the systems that BoltOps has been running on its own production system and for customers. I still get excited about the setup today. If you’re interested in Spot architectures, contact me at BoltOps.

One last note: Auto Scaling groups also support running multiple instance types and purchase options. Jeff mentions in his post that weight support is planned for a future release. That’s exciting, as it may streamline the usage of Spot with ECS even further.