
Adopt Recommendations and Monitor Predictive Scaling for Optimal Compute Capacity

Post Syndicated from Sheila Busser original https://aws.amazon.com/blogs/compute/evaluating-predictive-scaling-for-amazon-ec2-capacity-optimization/

This post is written by Ankur Sethi, Sr. Product Manager, EC2, and Kinnar Sen, Sr. Specialist Solution Architect, AWS Compute.

Amazon EC2 Auto Scaling helps customers optimize their Amazon EC2 capacity by dynamically responding to varying demand. Based on customer feedback, we enhanced the scaling experience with the launch of predictive scaling policies. Predictive scaling proactively adds EC2 instances to your Auto Scaling group in anticipation of demand spikes. This results in better availability and performance for your applications that have predictable demand patterns and long initialization times. We recently launched a couple of features designed to help you assess the value of predictive scaling – prescriptive recommendations on whether to use predictive scaling based on its potential availability and cost impact, and integration with Amazon CloudWatch to continuously monitor the accuracy of predictions. In this post, we discuss the features in detail and the steps that you can easily adopt to enjoy the benefits of predictive scaling.

Recap: Predictive Scaling

EC2 Auto Scaling helps customers maintain application availability by managing the capacity and health of the underlying cluster. Prior to predictive scaling, EC2 Auto Scaling offered dynamic scaling policies such as target tracking and step scaling. These dynamic scaling policies are configured with an Amazon CloudWatch metric that represents an application’s load. EC2 Auto Scaling constantly monitors this metric and responds according to your policies, thereby triggering the launch or termination of instances. Although it’s extremely effective and widely used, this model is reactive in nature, and for larger spikes it may momentarily leave capacity unfulfilled while the cluster scales out. Customers mitigate this by scaling out aggressively and scaling in conservatively to maintain a buffer of additional instances. However, sometimes applications take a long time to initialize or have a recurring pattern with a sudden spike of high demand. These can have an impact on the initial response of the system when it is scaling out. Customers asked for a proactive scaling mechanism that can scale capacity ahead of predictable spikes, and so we delivered predictive scaling.

Predictive scaling was launched to make the scaling action proactive, anticipating the changes required in compute demand and scaling accordingly. The scaling action is determined by ensemble machine learning (ML) models built with data from your Auto Scaling group’s scaling patterns, as well as billions of data points from our observations. Predictive scaling should be used for applications where demand changes rapidly but with a recurring pattern, instances require a long time to initialize, or where you’re manually invoking scheduled scaling for routine demand patterns. Predictive scaling not only forecasts capacity requirements based on historical usage, but also learns continuously, thereby making forecasts more accurate with time. Furthermore, a predictive scaling policy is designed to only scale out and never scale in your Auto Scaling groups, eliminating the risk of ending up with less capacity because of inexact predictions. You must use a dynamic scaling policy, scheduled scaling, or your own custom mechanism for scale-in. In the case of exceptional demand spikes, adding a dynamic scaling policy can also improve your application performance by bridging the gap between demand and predicted capacity.

What’s new with predictive scaling

Predictive scaling policies can be configured in a non-mutative ‘Forecast Only’ mode to evaluate the accuracy of forecasts. When you’re ready to start scaling, you can switch to the ‘Forecast and Scale’ mode. Now we prescriptively recommend switching your policy to ‘Forecast and Scale’ mode when doing so can potentially lead to better availability and lower costs, saving you the time and effort of performing such an evaluation manually. You can test different configurations by creating multiple predictive scaling policies in ‘Forecast Only’ mode, and choose the one that performs best in terms of availability and cost improvements.
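If you prefer to work from the command line, the following is a minimal sketch of creating a policy in Forecast Only mode with the AWS CLI. The group name, policy name, and 40% CPU target are assumptions to adjust for your workload; switching the Mode value to ForecastAndScale later turns scaling on.

# Group and policy names below are placeholders.
cat << 'EOF' > predictive-policy.json
{
  "MetricSpecifications": [
    {
      "TargetValue": 40,
      "PredefinedMetricPairSpecification": {
        "PredefinedMetricType": "ASGCPUUtilization"
      }
    }
  ],
  "Mode": "ForecastOnly"
}
EOF

aws autoscaling put-scaling-policy \
  --auto-scaling-group-name my-asg \
  --policy-name cpu-forecast-only \
  --policy-type PredictiveScaling \
  --predictive-scaling-configuration file://predictive-policy.json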

Monitoring and observability are key elements of the AWS Well-Architected Framework. Now we also offer CloudWatch metrics for your predictive scaling policies so that you can programmatically monitor your predictive scaling policy for demand pattern changes or prolonged periods of inaccurate predictions. This enables you to monitor the key performance metrics and makes it easier to adopt AWS Well-Architected best practices.

In the following sections, we deep dive into the details of these two features.

Recommendations for predictive scaling

Once you set up an Auto Scaling group with a predictive scaling policy in Forecast Only mode, as explained in this introduction to predictive scaling blog post, you can review the results of the forecast visually and adjust any parameters to more accurately reflect the behavior that you desire. Evaluating purely on the basis of visualization may not be very intuitive if the scaling patterns are erratic. Moreover, if you keep higher minimum capacities, then the graph may show a flat line for the actual capacity, because your Auto Scaling group capacity is an outcome of your existing scaling policy configurations and the minimum capacity that you configured. This makes it difficult to judge whether the lower capacity predicted by predictive scaling would leave your Auto Scaling group under-scaled.

This new feature provides prescriptive guidance on switching predictive scaling to Forecast and Scale mode based on availability and cost savings. To determine the availability and cost savings, we compare the predictions against the actual capacity and the optimal, required capacity. This required capacity is inferred based on whether your instances were running at a higher or lower value than the target value for the scaling metric that you defined as part of the predictive scaling policy configuration. For example, if an Auto Scaling group is running 10 instances at 20% CPU utilization while the target defined in the predictive scaling policy is 40%, then the instances are running under-utilized by 50% and the required capacity is assumed to be 5 instances (half of your current capacity). For an Auto Scaling group, based on the time range in which you’re interested (two weeks by default), we aggregate the cost saving and availability impact of predictive scaling. The availability impact measures the amount of time that the actual metric value was higher than the target value that you defined to be optimal for each policy. Similarly, cost savings measures the aggregated savings based on the capacity utilization of the underlying Auto Scaling group for each defined policy. The final cost and availability lead to a recommendation based on the following:

  • If availability increases (or remains same) and cost reduces (or remains same), then switch on Forecast and Scale
  • If availability reduces, then disable predictive scaling
  • If an availability increase comes at an increased cost, then you should decide based on your cost-availability tradeoff threshold

Figure 1: Predictive Scaling Recommendations on EC2 Auto Scaling console

The preceding figure shows how the console reflects the recommendation for a predictive scaling policy. You get information on whether the policy can lead to higher availability and lower cost, which leads to a recommendation to switch to Forecast and Scale. To achieve this cost saving, you might have to lower your minimum capacity and aim for higher utilization in dynamic scaling policies.

To get the most value from this feature, we recommend that you create multiple predictive scaling policies in Forecast Only mode with different configurations, choosing different metrics and/or different target values. The target value is an important lever that changes how aggressive the capacity forecasts must be. A lower target value increases your capacity forecast, resulting in better availability for your application. However, this also means higher Amazon EC2 costs. Similarly, a higher target value can leave you under-scaled while reactive scaling bridges the gap in just a few minutes. Separate estimates of cost and availability impact are provided for each of the predictive scaling policies. We recommend using a policy if either availability or cost improves and the other variable improves or stays the same. As long as there is a predictable pattern, Auto Scaling enhanced with predictive scaling maintains high availability for your applications.

Continuous Monitoring of predictive scaling

Once you’re using a predictive scaling policy in Forecast and Scale mode based on the recommendation, you must monitor the predictive scaling policy for demand pattern changes or inaccurate predictions. We introduced two new CloudWatch metrics for predictive scaling, ‘PredictiveScalingLoadForecast’ and ‘PredictiveScalingCapacityForecast’. Using the CloudWatch metric math feature, you can create a customized metric that measures the accuracy of predictions. For example, to monitor whether your policy is over- or under-forecasting, you can publish separate metrics to measure the respective errors. In the following graphic, we show how metric math expressions can be used to create a Mean Absolute Error for over-forecasting on the load forecasts. Because predictive scaling can only increase capacity, it is useful to alert when the policy is excessively over-forecasting to prevent unnecessary cost.

Figure 2: Graphing an accuracy metric using metric math on CloudWatch

In the previous graph, the total CPU utilization of the Auto Scaling group is represented by the m1 metric (orange), while the load predicted by the policy is represented by the m2 metric (green). We used the following expression to get the ratio of the over-forecasting error with respect to the actual value.

IF((m2-m1)>0, (m2-m1), 0)/m1

Next, we will set up an alarm to automatically send notifications using Amazon Simple Notification Service (Amazon SNS). You can create similar accuracy monitoring for capacity forecasts, but remember that once the policy is in Forecast and Scale mode, it already starts influencing the actual capacity. Hence, putting alarms on load forecast accuracy may be more intuitive, as load is generally independent of the capacity of an Auto Scaling group.

Figure 3: Creating a CloudWatch Alarm on the accuracy metric

In the preceding screenshot, we have set an alarm that triggers when our custom accuracy metric goes above 0.02 (2%) for 10 out of the last 12 data points, which translates to 10 of the last 12 hours. We prefer to alarm on a greater number of data points so that we get notified only when predictive scaling is consistently giving inaccurate results.
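If you would rather configure the same alarm programmatically, here is a minimal sketch using the AWS CLI. The Auto Scaling group name, policy name, metric namespace and dimensions, and the SNS topic ARN are assumptions; adjust them, and the m1 query, to match the load metric that your policy actually forecasts.

# Namespace and dimensions for the forecast metric are assumptions; confirm them in your account.
cat << 'EOF' > accuracy-alarm-metrics.json
[
  {
    "Id": "m1",
    "MetricStat": {
      "Metric": {
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [ { "Name": "AutoScalingGroupName", "Value": "my-asg" } ]
      },
      "Period": 3600,
      "Stat": "Average"
    },
    "ReturnData": false
  },
  {
    "Id": "m2",
    "MetricStat": {
      "Metric": {
        "Namespace": "AWS/AutoScaling",
        "MetricName": "PredictiveScalingLoadForecast",
        "Dimensions": [
          { "Name": "AutoScalingGroupName", "Value": "my-asg" },
          { "Name": "PolicyName", "Value": "my-predictive-policy" }
        ]
      },
      "Period": 3600,
      "Stat": "Average"
    },
    "ReturnData": false
  },
  {
    "Id": "e1",
    "Expression": "IF((m2-m1)>0, (m2-m1), 0)/m1",
    "Label": "OverForecastError",
    "ReturnData": true
  }
]
EOF

# Alarm when the error exceeds 0.02 for 10 of the last 12 hourly data points.
aws cloudwatch put-metric-alarm \
  --alarm-name predictive-scaling-over-forecast \
  --metrics file://accuracy-alarm-metrics.json \
  --comparison-operator GreaterThanThreshold \
  --threshold 0.02 \
  --evaluation-periods 12 \
  --datapoints-to-alarm 10 \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:my-sns-topic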

Conclusion

With these new features, you can make a more informed decision about whether predictive scaling is right for you and which configuration makes the most sense. We recommend that you start off with Forecast Only mode and switch over to Forecast and Scale based on the recommendations. Once in Forecast and Scale mode, predictive scaling starts taking proactive scaling actions so that your instances are launched and ready to contribute to the workload in advance of the predicted demand. Then continuously monitor the forecast to maintain high availability and cost optimization of your applications. You can also use the new predictive scaling metrics and CloudWatch features, such as metric math, alarms, and notifications, to monitor and take actions when predictions are off by a set threshold for prolonged periods.

Building Sustainable, Efficient, and Cost-Optimized Applications on AWS

Post Syndicated from Sheila Busser original https://aws.amazon.com/blogs/compute/building-sustainable-efficient-and-cost-optimized-applications-on-aws/

This blog post is written by Isha Dua, Sr. Solutions Architect, AWS; Ananth Kommuri, Solutions Architect, AWS; and Dr. Sam Mokhtari, Sr. Sustainability Lead SA WA, AWS.

Today, more than ever, sustainability and cost-savings are top of mind for nearly every organization. Research has shown that AWS’ infrastructure is 3.6 times more energy efficient than the median of U.S. enterprise data centers and up to five times more energy efficient than the average in Europe. That said, simply migrating to AWS isn’t enough to meet the Environmental, Social, Governance (ESG) and Cloud Financial Management (CFM) goals that today’s customers are setting. In order to make conscious use of our planet’s resources, applications running on the cloud must be built with efficiency in mind.

That’s because cloud sustainability is a shared responsibility. At AWS, we’re responsible for optimizing the sustainability of the cloud – building efficient infrastructure, enough options to meet every customer’s needs, and the tools to manage it all effectively. As an AWS customer, you’re responsible for sustainability in the cloud – building workloads in a way that minimizes the total number of resource requirements and makes the most of what must be consumed.

Most AWS service charges are correlated with hardware usage, so reducing resource consumption also has the added benefit of reducing costs. In this blog post, we’ll highlight best practices for running efficient compute environments on AWS that maximize utilization and decrease waste, with both sustainability and cost-savings in mind.

First: Measure What Matters

Application optimization is a continuous process, but it has to start somewhere. The AWS Well-Architected Framework Sustainability pillar includes an improvement process that helps customers map their journey and understand the impact of possible changes. There is a saying, “you can’t improve what you don’t measure,” which is why it’s important to define and regularly track metrics that are important to your business. Scope 2 carbon emissions, such as those provided by the AWS Customer Carbon Footprint Tool, are one metric that many organizations use to benchmark their sustainability initiatives, but they shouldn’t be the only one.

Even after AWS meets our 2025 goal of powering our operations with 100% renewable energy, it will still be important to maximize the utilization and minimize the consumption of the resources that you use. Just like installing solar panels on your house, it’s important to limit your total consumption to ensure you can be powered by that energy. That’s why many organizations use proxy metrics such as vCPU hours, storage usage, and data transfer to evaluate their hardware consumption and measure improvements made to infrastructure over time.

In addition to these metrics, it’s helpful to baseline utilization against the value delivered to your end-users and customers. Tracking utilization alongside business metrics (orders shipped, page views, total API calls, etc) allows you to normalize resource consumption with the value delivered to your organization. It also provides a simple way to track progress towards your goals over time. For example, if the number of orders on your ecommerce site remained constant over the last month, but your AWS infrastructure usage decreased by 20%, you can attribute the efficiency gains to your optimization efforts, not changes in your customer behavior.
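As a simple sketch of this idea (the namespace, metric name, and value below are assumptions, not part of any AWS service), you can publish a business metric to CloudWatch and then graph or divide it against usage metrics such as vCPU hours:

# Publish a custom business metric (names and values are illustrative).
aws cloudwatch put-metric-data \
  --namespace "MyApp/Business" \
  --metric-name OrdersShipped \
  --value 1250 \
  --unit Count

# In a CloudWatch dashboard, a metric math expression such as
# vcpu_hours / orders_shipped then tracks consumption per unit of business value.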

Utilize all of the available pricing models

Compute tasks are the foundation of many customers’ workloads, so they typically see the biggest benefit from optimization. Amazon EC2 provides resizable compute across a wide variety of instance types, is well-suited to virtually every use case, and is available via a number of highly flexible pricing options. One of the simplest changes you can make to decrease your costs on AWS is to review the purchase options for the compute and storage resources that you already use.

Amazon EC2 provides multiple purchasing options to enable you to optimize your costs based on your needs. Because every workload has different requirements, we recommend a combination of purchase options tailored to your specific workload needs. For steady-state workloads that can commit to a one- to three-year term, Compute Savings Plans help you save costs while retaining the flexibility to move from one instance type to a newer, more energy-efficient alternative, or even between compute solutions (e.g., from EC2 instances to AWS Lambda functions or AWS Fargate).

EC2 Spot Instances are another great way to decrease cost and increase efficiency on AWS. Spot Instances make unused Amazon EC2 capacity available to customers at discounted prices. At AWS, one of our goals is to maximize utilization of our physical resources. By choosing EC2 Spot Instances, you’re running on hardware that would otherwise be sitting idle in our datacenters. This increases the overall efficiency of the cloud, because more of our physical infrastructure is being used for meaningful work. Spot Instances use market-based pricing that changes automatically based on supply and demand. This means that the hardware with the most spare capacity sees the highest discounts, sometimes up to 90% off On-Demand prices, to encourage our customers to choose that configuration.

Savings Plans are ideal for predictable, steady-state work. On-Demand is best suited for new, stateful, and spiky workloads which can’t be instance, location, or time flexible. Finally, Spot Instances are a great way to supplement the other options for applications that are fault tolerant and flexible. AWS recommends using a mix of pricing models based on your workload needs and ability to be flexible.
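One common way to combine these models is an Auto Scaling group with a mixed instances policy that keeps a small On-Demand base (which a Savings Plan could cover) and fills the rest with Spot. The following AWS CLI sketch uses placeholder names, subnets, and instance types:

# Group name, launch template, subnets, and instance types are placeholders.
aws autoscaling create-auto-scaling-group \
  --auto-scaling-group-name my-flexible-asg \
  --min-size 2 --max-size 20 \
  --vpc-zone-identifier "subnet-aaaa1111,subnet-bbbb2222" \
  --mixed-instances-policy '{
    "LaunchTemplate": {
      "LaunchTemplateSpecification": { "LaunchTemplateName": "my-launch-template", "Version": "$Latest" },
      "Overrides": [ { "InstanceType": "c6g.large" }, { "InstanceType": "c5.large" }, { "InstanceType": "c5a.large" } ]
    },
    "InstancesDistribution": {
      "OnDemandBaseCapacity": 2,
      "OnDemandPercentageAboveBaseCapacity": 25,
      "SpotAllocationStrategy": "price-capacity-optimized"
    }
  }'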

By using these pricing models, you’re creating signals for your future compute needs, which helps AWS better forecast resource demands, manage capacity, and run our infrastructure in a more sustainable way.

Choose efficient, purpose-built processors whenever possible

Choosing the right processor for your application is an equally important consideration, because for certain use cases a more efficient processor can deliver the same level of compute performance with a smaller carbon footprint. AWS has the broadest choice of processors, such as Intel Xeon Scalable processors, AMD EPYC processors, GPUs, FPGAs, and custom ASICs for accelerated computing.

AWS Graviton3, AWS’s latest and most power-efficient processor, delivers 3X better CPU performance per-watt than any other processor in AWS, provides up to 40% better price performance over comparable current generation x86-based instances for various workloads, and helps customers reduce their carbon footprint. Consider transitioning your workload to Graviton-based instances to improve the performance efficiency of your workload (see AWS Graviton Fast Start and AWS Graviton2 for ISVs). Note the considerations when transitioning workloads to AWS Graviton-based Amazon EC2 instances.

For machine learning (ML) workloads, use Amazon EC2 instances based on purpose-built ML chips, such as AWS Trainium and AWS Inferentia, and Amazon EC2 DL1 instances.

Optimize for hardware utilization

The goal of efficient environments is to use only as many resources as required in order to meet your needs. Thankfully, this is made easier on the cloud because of the variety of instance choices, the ability to scale dynamically, and the wide array of tools to help track and optimize your cloud usage. At AWS, we offer a number of tools and services that can help you to optimize both the size of individual resources, as well as scale the total number of resources based on traffic and load.

Two of the most important tools to measure and track utilization are Amazon CloudWatch and the AWS Cost & Usage Report (CUR). With CloudWatch, you can get a unified view of your resource metrics and usage, then analyze the impact of user load on capacity utilization over time. The CUR can help you understand which resources are contributing the most to your AWS usage, allowing you to fine-tune your efficiency and save on costs. CUR data is stored in Amazon S3, which allows you to query it with tools like Amazon Athena, generate custom reports in Amazon QuickSight, or integrate it with AWS Partner tools for better visibility and insights.
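For example, once the CUR is integrated with Athena, a query like the following sketch summarizes usage and cost by service. The database, table, partition values, and output bucket are assumptions for your own CUR setup:

# Database, table, partitions, and output location are placeholders.
aws athena start-query-execution \
  --query-execution-context Database=cur_database \
  --result-configuration OutputLocation=s3://my-athena-results/ \
  --query-string "
    SELECT line_item_product_code,
           SUM(line_item_usage_amount)   AS usage_amount,
           SUM(line_item_unblended_cost) AS unblended_cost
    FROM   cur_table
    WHERE  year = '2022' AND month = '12'
    GROUP  BY line_item_product_code
    ORDER  BY unblended_cost DESC"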

An example of a tool powered by CUR data is the AWS Cost Intelligence Dashboard. The Cost Intelligence Dashboard provides a detailed, granular, and recommendation-driven view of your AWS usage. With its prebuilt visualizations, it can help you identify which service and underlying resources are contributing the most towards your AWS usage, and see the potential savings you can realize by optimizing. It even provides right sizing recommendations and the appropriate EC2 instance family to help you optimize your resources.

Cost Intelligence Dashboard is also integrated with AWS Compute Optimizer, which makes instance type and size recommendations based on workload characteristics. For example, it can identify if the workload is CPU-intensive, if it exhibits a daily pattern, or if local storage is accessed frequently. Compute Optimizer then infers how the workload would have performed on various hardware platforms (for example, Amazon EC2 instance types) or using different configurations (for example, Amazon EBS volume IOPS settings, and AWS Lambda function memory sizes) to offer recommendations. For stable workloads, check AWS Compute Optimizer at regular intervals to identify right-sizing opportunities for instances. By right sizing with Compute Optimizer, you can increase resource utilization and reduce costs by up to 25%. Similarly, AWS Lambda Power Tuning can help you choose the memory allocated to your Lambda functions, an optimization process that balances speed (duration) and cost while lowering your carbon emissions in the process.
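As a quick sketch, you can pull these findings with the AWS CLI; the query below assumes the standard GetEC2InstanceRecommendations response fields and prints only the finding and the top recommended instance type:

# Requires Compute Optimizer to be enabled for the account.
aws compute-optimizer get-ec2-instance-recommendations \
  --query 'instanceRecommendations[].{Instance:instanceArn,Finding:finding,Recommended:recommendationOptions[0].instanceType}' \
  --output table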

CloudWatch metrics are used to power EC2 Auto Scaling, which can automatically choose the right instance to fit your needs with attribute-based instance selection and scale your entire instance fleet up and down based on demand in order to maintain high utilization. AWS Auto Scaling makes scaling simple with recommendations that let you optimize performance, costs, or the balance between them. Configuring and testing workload elasticity will help save money, maintain performance benchmarks, and reduce the environmental impact of workloads. You can utilize the elasticity of the cloud to automatically increase the capacity during user load spikes, and then scale down when the load decreases. Amazon EC2 Auto Scaling allows your workload to automatically scale up and down based on demand. You can set up scheduled or dynamic scaling policies based on metrics such as average CPU utilization or average network in or out. Then, you can use AWS Instance Scheduler and scheduled scaling for Amazon EC2 Auto Scaling to shut down or terminate resources that only need to run during business hours or on weekdays, further reducing your carbon footprint.
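For example, a development Auto Scaling group can be scaled to zero overnight and restored each weekday morning with scheduled actions. The group name, sizes, and cron expressions (in UTC) below are assumptions:

# Scale the group to zero each weekday evening...
aws autoscaling put-scheduled-update-group-action \
  --auto-scaling-group-name my-dev-asg \
  --scheduled-action-name scale-in-evenings \
  --recurrence "0 20 * * 1-5" \
  --min-size 0 --max-size 0 --desired-capacity 0

# ...and bring it back each weekday morning.
aws autoscaling put-scheduled-update-group-action \
  --auto-scaling-group-name my-dev-asg \
  --scheduled-action-name scale-out-mornings \
  --recurrence "0 6 * * 1-5" \
  --min-size 1 --max-size 10 --desired-capacity 2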

Design applications to minimize overhead and use fewer resources

Using the latest Amazon Machine Image (AMI) gives you updated operating systems, packages, libraries, and applications, which enable easier adoption as more efficient technologies become available. Up-to-date software includes features to measure the impact of your workload more accurately, as vendors deliver features to meet their own sustainability goals.

By reducing the amount of equipment that your company has on-premises and using managed services, you can help facilitate the move to a smaller, greener footprint. Instead of buying, storing, maintaining, disposing of, and replacing expensive equipment, businesses can purchase services as they need them, services that are already optimized for a greener footprint. Managed services also shift responsibility for maintaining high average utilization and sustainability optimization of the deployed hardware to AWS. Using managed services helps distribute the sustainability impact of the service across all of the service tenants, thereby reducing your individual contribution. The following services help reduce your environmental impact because capacity management is automatically optimized.

AWS managed service and recommendation for sustainability improvement:

  • Amazon Aurora: You can use Amazon Aurora Serverless to automatically start up, shut down, and scale capacity up or down based on your application’s needs.
  • Amazon Redshift: You can use Amazon Redshift Serverless to run and scale data warehouse capacity.
  • AWS Lambda: You can migrate AWS Lambda functions to Arm-based AWS Graviton2 processors.
  • Amazon ECS: You can run Amazon ECS on AWS Fargate to avoid the undifferentiated heavy lifting by leveraging sustainability best practices AWS put in place for management of the control plane.
  • Amazon EMR: You can use EMR Serverless to avoid over- or under-provisioning resources for your data processing jobs.
  • AWS Glue: You can use Auto Scaling for AWS Glue to enable on-demand scaling up and scaling down of the computing resources.

 Centralized data centers consume a lot of energy, produce a lot of carbon emissions and cause significant electronic waste. While more data centers are moving towards green energy, an even more sustainable approach (alongside these so-called “green data centers”) is to actually cut unnecessary cloud traffic, central computation and storage as much as possible by shifting computation to the edge. Edge Computing stores and uses data locally, on or near the device it was created on. This reduces the amount of traffic sent to the cloud and, at scale, can limit the overall energy used and carbon emissions.

Use storage that best supports how your data is accessed and stored to minimize the resources provisioned while supporting your workload. Solid state drives (SSDs) are more energy intensive than magnetic drives and should be used only for active data use cases. You should look into using ephemeral storage whenever possible and categorize, centralize, deduplicate, and compress persistent storage.

AWS Outposts, AWS Local Zones, and AWS Wavelength deliver data processing, analysis, and storage close to your endpoints, allowing you to deploy APIs and tools to locations outside AWS data centers. Build high-performance applications that can process and store data close to where it’s generated, enabling ultra-low latency, intelligent, and real-time responsiveness. By processing data closer to the source, edge computing can reduce latency, which means that less energy is required to keep devices and applications running smoothly. Edge computing can also help to reduce the carbon footprint of data centers by using renewable energy sources such as solar and wind power.

Conclusion

In this blog post, we discussed key methods and recommended actions you can take to optimize your AWS compute infrastructure for resource efficiency. Using the appropriate EC2 instance types with the right size, processor, instance storage, and pricing model can enhance the sustainability of your applications. Using AWS managed services, exploring options for edge computing, and continuously optimizing your resource usage can further improve the energy efficiency of your workloads. You can also analyze the changes in your emissions over time as you migrate workloads to AWS, re-architect applications, or deprecate unused resources using the Customer Carbon Footprint Tool.

Ready to get started? Check out the AWS Sustainability page to find out more about our commitment to sustainability and learn more about renewable energy usage, case studies on sustainability through the cloud, and more.

Building a Cloud in the Cloud: Running Apache CloudStack on Amazon EC2, Part 2

Post Syndicated from Sheila Busser original https://aws.amazon.com/blogs/compute/building-a-cloud-in-the-cloud-running-apache-cloudstack-on-amazon-ec2-part-2/

This blog is written by Mark Rogers, SDE II – Customer Engineering AWS.

In part 1, I showed you how to run Apache CloudStack with KVM on a single Amazon Elastic Compute Cloud (Amazon EC2) instance. That simple setup is great for experimentation and light workloads. In this post, things will get a lot more interesting. I’ll show you how to create an overlay network in your Amazon Virtual Private Cloud (Amazon VPC) that allows CloudStack to scale horizontally across multiple EC2 instances. This same method could work with other hypervisors, too.

If you haven’t read it yet, then start with part 1. It explains why this network setup is necessary. The same prerequisites apply to both posts.

Making things easier

I wrote some scripts to automate the CloudStack installation and OS configuration on CentOS 7. You can customize them to meet your needs. I also wrote some AWS CloudFormation templates you can copy in order to create a demo environment. The README file has more details.

The scalable method

Our team started out using a single EC2 instance, as described in my last post. That worked at first, but it didn’t have the capacity we needed. We were limited to a couple of dozen VMs, but we needed hundreds. We also needed to scale up and down as our needs changed. This meant we needed the ability to add and remove CloudStack hosts. Using a Linux bridge as a virtual subnet was no longer adequate.

To support adding hosts, we need a subnet that spans multiple instances. The solution I found is Virtual Extensible LAN (VXLAN). It’s lightweight, easy to configure, and included in the Linux kernel. VXLAN creates a layer 2 overlay network that abstracts away the details of the underlying network. It allows machines in different parts of a network to communicate as if they’re all attached to the same simple network switch.

Another example of an overlay network is an Amazon VPC. It acts like a physical network, but it’s actually a layer on top of other networks. It’s networks all the way down. VXLAN provides a top layer where CloudStack can sit comfortably, handling all of your VM needs, blissfully unaware of the world below it.

An overlay network comes with some big advantages. The biggest improvement is that you can have multiple hosts, allowing for horizontal scaling. Having more hosts not only gives you more computing power, but also lets you do rolling maintenance. Instead of putting the database and file storage on the management server, I’ll show you how to use Amazon Elastic File System (Amazon EFS) and Amazon Relational Database Service (Amazon RDS) for scalable and reliable storage.

EC2 Instances

Let’s start with three Amazon EC2 instances. One will be a router between the overlay network and your Amazon VPC, the second one will be your CloudStack management server, and the third one will be your VM host. You’ll also need a way to connect to your instances, such as a bastion host or a VPN endpoint.

Three EC2 instances are connected to an AWS subnet, with an overlay network that spans all three instances. The router instance connects the overlay network to the AWS subnet. The management instance contains the CloudStack management service, which is attached to the overlay network. The host instance contains the CloudStack agent and some VMs, all of which are connected to the overlay network.

VXLAN must send and receive multicast traffic. Only Nitro instances can be multicast senders. As you plan, look at the list of Nitro instance types.

The router won’t need much computing power, but it will need enough network bandwidth to meet your needs. If you put Amazon EFS in the same subnet as your instances, then they’ll communicate with it directly, thereby reducing the load on the router. Decide how much network throughput you want, and then pick a suitable Nitro instance type.

After creating the router instance, configure AWS to use it as a router. Stop source/destination checking in the instance’s network settings. Then update the applicable AWS route tables to use the router as the target for the overlay network. The router’s security group needs to allow ingress to the CloudStack UI (TCP port 8080) and any services you plan to offer from VMs.

For the management server, you’ll want a Nitro instance type. It’s going to use more CPU than your router, so plan accordingly.

In addition to being a Nitro type, the host instance must also be a metal type. Metal instances have hardware virtualization support, which is needed by KVM. If you have a new AWS account with a low on-demand vCPU limit, then consider starting with an m5zn.metal, which has 48 vCPUs. Otherwise, I suggest going directly to a c5.metal because it provides 96 vCPUs for a similar price. There are bigger types available depending on your compute needs, budget, and vCPU limit. If your account’s on-demand vCPU limit is too low, then you can file a support ticket to have it raised.

Networking

All of the instances should be on a dedicated subnet. Sharing the subnet with other instances can cause communication issues. For an example, refer to the following figure. The subnet has an instance named TroubleMaker that’s not on the overlay network. If TroubleMaker sends a request to the management instance’s overlay network address, then here’s what happens:

  1. The request goes through the AWS subnet to the router.
  2. The router forwards the request via the overlay network.
  3. The CloudStack management instance has a connection to the same AWS subnet that TroubleMaker is on. Therefore, it responds directly instead of using the router. This isn’t the return path that AWS is expecting, so the response is dropped.

This diagram depicts the steps described in the previous paragraph.

If you move TroubleMaker to a different subnet, then the requests and responses will all go through the router. That will fix the communication issues.

The instances in the overlay network will use special interfaces that serve as VXLAN tunnel endpoints (VTEPs). The VTEPs must know how to contact each other via the underlay network. You could manually give each instance a list of all of the other instances, but that’s a maintenance nightmare. It’s better to let the VTEPs discover each other, which they can do using multicast. You can add multicast support using AWS Transit Gateway.

Here are the steps to make VXLAN multicasts work:

  1. Enable multicast support when you create the transit gateway.
  2. Attach the transit gateway to your subnet.
  3. Create a transit gateway multicast domain with IGMPv2 support enabled.
  4. Associate the multicast domain with your subnet.
  5. Configure the eth0 interface on each instance to use IGMPv2. The networking snippet later in this section shows how to do this.
  6. Make sure that your instance security groups allow ingress for IGMP queries (protocol 2 traffic from 0.0.0.0/32) and VXLAN traffic (UDP port 4789 from the other instances). A CLI sketch of the transit gateway and security group steps follows this list.
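The following AWS CLI sketch shows what steps 1 through 4 and step 6 can look like; all of the resource IDs are placeholders that you must replace with your own.

# Steps 1-4: transit gateway with multicast and IGMPv2 support, attached and
# associated with the subnet. IDs below are placeholders.
aws ec2 create-transit-gateway --options MulticastSupport=enable
aws ec2 create-transit-gateway-vpc-attachment \
  --transit-gateway-id tgw-0123456789abcdef0 \
  --vpc-id vpc-0123456789abcdef0 \
  --subnet-ids subnet-0123456789abcdef0
aws ec2 create-transit-gateway-multicast-domain \
  --transit-gateway-id tgw-0123456789abcdef0 \
  --options Igmpv2Support=enable
aws ec2 associate-transit-gateway-multicast-domain \
  --transit-gateway-multicast-domain-id tgw-mcast-domain-0123456789abcdef0 \
  --transit-gateway-attachment-id tgw-attach-0123456789abcdef0 \
  --subnet-ids subnet-0123456789abcdef0

# Step 6: allow IGMP queries and VXLAN traffic between the instances.
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --ip-permissions 'IpProtocol=2,IpRanges=[{CidrIp=0.0.0.0/32}]'
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol udp --port 4789 \
  --source-group sg-0123456789abcdef0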

CloudStack VMs must connect to the same bridge as the VXLAN interface. As mentioned in the previous post, CloudStack cares about names. I recommend giving the interface a name starting with “eth”. Moreover, this naming convention tells CloudStack which bridge to use, thereby avoiding the need for a dummy interface like the one in the simple setup.

The following snippet shows how I configured the networking in CentOS 7. You must provide values for these variables:

  • $overlay_host_ip_address, $overlay_netmask, and $overlay_gateway_ip: Use values for the overlay network that you’re creating.
  • $dns_address: I recommend using the base of the VPC IPv4 network range, plus two. You shouldn’t use 169.254.169.253 because CloudStack reserves link-local addresses for its own use.
  • $multicast_address: The multicast address that you want VXLAN to use. Pick something in the multicast range that won’t conflict with anything else. I recommend choosing from the IPv4 local scope (239.255.0.0/16).
  • $interface_name: The name of the interface VXLAN should use to communicate with the physical network. This is typically eth0.

A couple of the steps are different for the router instance than for the other instances. Pay attention to the comments!

yum install -y bridge-utils net-tools

# IMPORTANT: Omit the GATEWAY setting on the router instance!
cat << EOF > /etc/sysconfig/network-scripts/ifcfg-cloudbr0
DEVICE=cloudbr0
TYPE=Bridge
ONBOOT=yes
BOOTPROTO=none
IPV6INIT=no
IPV6_AUTOCONF=no
DELAY=5
STP=no
USERCTL=no
NM_CONTROLLED=no
IPADDR=$overlay_host_ip_address
NETMASK=$overlay_netmask
DNS1=$dns_address
GATEWAY=$overlay_gateway_ip
EOF

cat << EOF > /sbin/ifup-local
#!/bin/bash
# Set up VXLAN once cloudbr0 is available.
if [[ \$1 == "cloudbr0" ]]
then
    ip link add ethvxlan0 type vxlan id 100 dstport 4789 group "$multicast_address" dev "$interface_name"
    brctl addif cloudbr0 ethvxlan0
    ip link set up dev ethvxlan0
fi
EOF

chmod +x /sbin/ifup-local

# Transit Gateway requires IGMP version 2
echo "net.ipv4.conf.$interface_name.force_igmp_version=2" >> /etc/sysctl.conf
sysctl -p

# Enable IPv4 forwarding
# IMPORTANT: Only do this on the router instance!
echo 'net.ipv4.ip_forward=1' >> /etc/sysctl.conf
sysctl -p

# Restart the network service to make the changes take effect.
systemctl restart network

Storage

Let’s look at storage. Create an Amazon RDS database using the MySQL 8.0 engine, and set a master password for CloudStack’s database setup tool to use. Refer to the CloudStack documentation to find the MySQL settings that you’ll need. You can put the settings in an RDS parameter group. In case you’re wondering why I’m not using Amazon Aurora, it’s because CloudStack needs the MyISAM storage engine, which isn’t available in Aurora.
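Here’s a rough sketch of that setup with the AWS CLI. The identifiers, instance class, storage size, subnet group, and security group are placeholders, and you still need to apply the MySQL parameters from the CloudStack documentation to the parameter group:

# All names and IDs below are placeholders.
aws rds create-db-parameter-group \
  --db-parameter-group-name cloudstack-mysql8 \
  --db-parameter-group-family mysql8.0 \
  --description "MySQL settings required by CloudStack"

aws rds create-db-instance \
  --db-instance-identifier cloudstack-db \
  --engine mysql \
  --db-instance-class db.m5.large \
  --allocated-storage 100 \
  --master-username cloudstack \
  --master-user-password 'REPLACE_ME' \
  --db-parameter-group-name cloudstack-mysql8 \
  --db-subnet-group-name my-db-subnet-group \
  --vpc-security-group-ids sg-0123456789abcdef0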

I recommend Amazon EFS for file storage. For efficiency, create a mount target in the subnet with your EC2 instances. That will enable them to communicate directly with the mount target, thereby bypassing the overlay network and router. Note that the system VMs will use Amazon EFS via the router.

If you want, you can consolidate your CloudStack file systems. Just create a single file system with directories for each zone and type of storage. For example, I use directories named /zone1/primary, /zone1/secondary, /zone2/primary, etc. You should also consider enabling provisioned throughput on the file system, or you may run out of bursting credits after booting a few VMs.
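A minimal sketch of creating such a file system and a mount target with the AWS CLI follows; the throughput value and resource IDs are placeholders:

# Provisioned throughput avoids running out of bursting credits after booting VMs.
aws efs create-file-system \
  --performance-mode generalPurpose \
  --throughput-mode provisioned \
  --provisioned-throughput-in-mibps 128 \
  --tags Key=Name,Value=cloudstack-storage

aws efs create-mount-target \
  --file-system-id fs-0123456789abcdef0 \
  --subnet-id subnet-0123456789abcdef0 \
  --security-groups sg-0123456789abcdef0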

One consequence of the file system’s scalability is that the amount of free space (8 exabytes) will cause an integer overflow in CloudStack! To avoid this problem, reduce storage.overprovisioning.factor in CloudStack’s global settings from 2 to 1.

When your environment is ready, install CloudStack. When it asks for a default gateway, remember to use the router’s overlay network address. When you add a host, make sure that you use the host’s overlay network IP address.

Cleanup

If you used my CloudFormation template, delete the stack and remove any route table entries you added.

If you didn’t use CloudFormation, here are the things to delete:

  1. The CloudStack EC2 instances
  2. The Amazon RDS database and parameter group
  3. The Amazon EFS file system
  4. The transit gateway multicast domain subnet association
  5. The transit gateway multicast domain
  6. The transit gateway VPC attachment
  7. The transit gateway
  8. The route table entries that you created
  9. The security groups that you created for the instances, database, and file system

Conclusion

The approach I shared has many steps, but they’re not bad when you have a plan. Whether you need a simple setup for experiments, or you need a scalable environment for a data center migration, you now have a path forward. Give it a try, and comment here about the things you learned. I hope you find it useful and fun!

“Apache”, “Apache CloudStack”, and “CloudStack” are trademarks of the Apache Software Foundation.

Building a Cloud in the Cloud: Running Apache CloudStack on Amazon EC2, Part 1

Post Syndicated from Sheila Busser original https://aws.amazon.com/blogs/compute/building-a-cloud-in-the-cloud-running-apache-cloudstack-on-amazon-ec2-part-1/

This blog is written by Mark Rogers, SDE II – Customer Engineering AWS.

How do you put a cloud inside another cloud? Some features that make Amazon Elastic Compute Cloud (Amazon EC2) secure and wonderful also make running CloudStack difficult. The biggest obstacle is that AWS and CloudStack both want to manage network resources. Therefore, we must keep them out of each other’s way. This requires some steps that aren’t obvious, and it took a long time to figure out. I’m going to share what I learned, so that you can navigate the process more easily.

Apache CloudStack is an open-source platform for deploying and managing virtual machines (VMs) and the associated network and storage infrastructure. You would normally run it on your own hardware to create your own cloud. But there can be advantages to running it inside of an Amazon Virtual Private Cloud (Amazon VPC), including how it could help you migrate out of a data center. It’s a great way to create disposable environments for experiments or training. Furthermore, it’s a convenient way to test-drive the new CloudStack support in Amazon Elastic Kubernetes Service (Amazon EKS) Anywhere. In my case, I needed to create development and test environments for a project that uses the CloudStack API. The environments needed to be shared and scalable. Our build pipelines were already in AWS, so it made sense to put the new environments there, too.

CloudStack can work with a number of hypervisors. The instructions in this article will use Kernel-based Virtual Machine (KVM) on Linux. KVM will manage the VMs at a low level, and CloudStack will manage KVM.

Prerequisites

Most of the information in this article should be applicable to a range of CloudStack versions. I targeted CloudStack 4.14 on CentOS 7. I also tested CloudStack versions 4.16 and 4.17, and I recommend them.

The official CentOS 7 x86_64 HVM image works well. If you use a different Linux flavor or version, then you might have to modify some of the implementation details.

You’ll need to know the basics of CloudStack. The scope of this article is making CloudStack and AWS coexist peacefully. Once CloudStack is running, I’m assuming that you’ll handle things from there.  Refer to the AWS documentation and CloudStack documentation for information on security and other best practices.

Making things easier

I wrote some scripts to automate the installation. You can run them on EC2 instances with CentOS 7, and they’ll do all the installation and OS configuration for you. You can use them as they are, or customize them to meet your needs. I also wrote some AWS CloudFormation templates you can copy in order to create a demo environment. The README file has more details.

Amazon EC2 instance types

KVM requires hardware virtualization support. Most EC2 instances are VMs that don’t support nested virtualization. To get access to the bare hardware, you need a metal instance type.

I like c5.metal because it’s one of the least expensive metal types, and has a low cost per vCPU. It has 96 vCPUs and 192 GiB of memory. If you run 20 VMs on it, with 4 CPU cores and 8 GiB of memory each, then you’d still have 16 vCPUs and 32 GiB to share between the operating system, CloudStack, and MySQL. Using CloudStack’s overprovisioning feature, you could fit even more VMs if they’re running light loads.

Networking

The biggest challenge is the network. AWS knows which IP and MAC addresses should exist, and it knows the machines to which they should belong. It blocks any traffic that doesn’t fit its idea of how the network should behave. Simultaneously, CloudStack assumes that any IP or MAC address it invents should work just fine. When CloudStack assigns addresses to VMs on an AWS subnet, their network traffic gets blocked.

You could get around this by enabling network address translation (NAT) on the instance running CloudStack. That’s a great solution if it fits your needs, but it makes it hard for other machines in your Amazon VPC to contact your VMs. I recommend a different approach.

Although AWS restricts what you can do with its layer 2 network, it’s perfectly happy to let you run your own layer 3 router. Your EC2 instance can act as a router to a new virtual subnet that’s outside of the jurisdiction of AWS. The instance integrates with AWS just like a VPN appliance, routing traffic to wherever it needs to go. CloudStack can do whatever it wants in the virtual subnet, and everybody’s happy.

What do I mean by a virtual subnet? This is a subnet that exists only inside the EC2 instance.  It consists of logical network interfaces attached to a Linux bridge. The entire subnet exists inside a single EC2 instance. It doesn’t scale well, but it’s simple. In my next post, I’ll cover a more complicated setup with an overlay network that spans multiple instances to allow horizontal scaling.

The simple way

The simple way is to put everything in one EC2 instance, including the database, file storage, and a virtual subnet. Because everything’s stored locally, allocate enough disk space for your needs. 500 GB will be enough to support a few basic VMs. Create or select a security group for your instance that gives users access to the CloudStack UI (TCP port 8080). The security group should also allow access to any services that you’ll offer from your VMs.

EC2 instance summary info showing 1 instance, CentOS 7 (x86_64) AMI, c5.metal instance type, a security group name, and a 500 GiB volume

When you have your instance, configure AWS to treat it as a router (a CLI sketch of these steps follows the list).

  1. Go to Amazon EC2 in the AWS Management Console.
  2. Select your instance, and stop source/destination checking.

In the EC2 Actions menu, select Networking, then Change source/destination check.

3. Update the subnet route tables.

a. Go to the VPC settings, and select Route Tables.

b. Identify the tables for subnets that need CloudStack access.

c. In each of these tables, add a route to the new virtual subnet. The route target should be your EC2 instance.

4. Depending on your network needs, you may also need to add routes to transit gateways, VPN endpoints, etc.
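If you prefer the AWS CLI over the console, steps 2 and 3 look roughly like this; the instance ID, route table ID, and virtual subnet CIDR are placeholders:

# Stop source/destination checking on the CloudStack instance.
aws ec2 modify-instance-attribute \
  --instance-id i-0123456789abcdef0 \
  --no-source-dest-check

# Route the virtual subnet to the instance in each applicable route table.
aws ec2 create-route \
  --route-table-id rtb-0123456789abcdef0 \
  --destination-cidr-block 10.100.0.0/16 \
  --instance-id i-0123456789abcdef0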

Because everything will be on one server, creating a virtual subnet is simply a matter of creating a Linux bridge. CloudStack must find a network adapter attached to the bridge. Therefore, add a dummy interface with a name that CloudStack will recognize.

A single EC2 instance contains the CloudStack management service, the CloudStack agent, a dummy network interface, several virtual machines, and a router. All of those things are connected to each other by a virtual subnet that exists inside the instance. The instance's elastic network interface is connected between the router and the Amazon VPC.

The following snippet shows how I configure networking in CentOS 7. You must provide values for the variables $virtual_host_ip_address and $virtual_netmask to reflect the virtual subnet that you want to create. For $dns_address, I recommend the base of the VPC IPv4 network range, plus two. You shouldn’t use 169.254.169.253 because CloudStack reserves link-local addresses for its own use.

yum install -y bridge-utils net-tools

# The bridge must be named cloudbr0.

cat << EOF > /etc/sysconfig/network-scripts/ifcfg-cloudbr0
DEVICE=cloudbr0
TYPE=Bridge
ONBOOT=yes
BOOTPROTO=none
IPV6INIT=no
IPV6_AUTOCONF=no
DELAY=5
STP=yes
USERCTL=no
NM_CONTROLLED=no
IPADDR=$virtual_host_ip_address
NETMASK=$virtual_netmask
DNS1=$dns_address
EOF

# Create a dummy network interface.
cat << EOF > /etc/sysconfig/modules/dummy.modules
#!/bin/sh
/sbin/modprobe dummy numdummies=1
/sbin/ip link set name ethdummy0 dev dummy0
EOF

chmod +x /etc/sysconfig/modules/dummy.modules
/etc/sysconfig/modules/dummy.modules

cat << EOF > /etc/sysconfig/network-scripts/ifcfg-ethdummy0
TYPE=Ethernet
BOOTPROTO=none
NAME=ethdummy0
DEVICE=ethdummy0
ONBOOT=yes
BRIDGE=cloudbr0
NM_CONTROLLED=no
EOF

# Turn the instance into a router

echo 'net.ipv4.ip_forward=1' >> /etc/sysctl.conf
sysctl -p

# Must kill dhclient or the network service won't restart properly.
# A reboot would also work, if you’d rather do that.

pkill dhclient
systemctl restart network

CloudStack must know which IP addresses to use for inter-service communication. It will select one by resolving the machine’s fully qualified domain name (FQDN) to an address. The following commands will make it choose the right one. You must provide a value for $virtual_host_ip_address.

hostnamectl set-hostname cloudstack.localdomain

echo "$virtual_host_ip_address cloudstack.localdomain" >> 
/etc/hosts

You can finish the setup by following the Quick Installation Guide.

Remember that CloudStack is only directly connected to your virtual network. The EC2 instance is the router that connects the virtual subnet to the Amazon VPC. When you’re configuring CloudStack, use your instance’s virtual subnet address as the default gateway.

Use the EC2 instance's virtual subnet IP address as the default gateway in CloudStack. In this example, the virtual subnet is 10.100.0.0/16, and the instance's address in that subnet is 10.100.0.1. CloudStack then uses 10.100.0.1 as the default gateway.

To access CloudStack from your workstation, you’ll need a connection to your VPC. This can be through a client VPN or a bastion host. If you use a bastion, its subnet needs a route to your virtual subnet, and you’ll need an SSH tunnel for your browser to access the CloudStack UI. The UI is at http://x.x.x.x:8080/client/, where x.x.x.x is your CloudStack instance’s virtual subnet address. Note that CloudStack’s console viewer won’t work if you’re using an SSH tunnel.
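For example, assuming a bastion host and a CloudStack instance at 10.100.0.1 (both placeholders), the tunnel could look like this:

# Forward local port 8080 to the CloudStack UI through the bastion.
ssh -L 8080:10.100.0.1:8080 ec2-user@bastion.example.com
# Then browse to http://localhost:8080/client/ on your workstation.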

If you’re just experimenting with CloudStack, then I suggest saving money by stopping your instance when it isn’t needed. The safe way to do that is:

  1. Disable your zone in the CloudStack UI.
  2. Put the primary storage into maintenance mode.
  3. Wait for the switch to maintenance mode to be complete.
  4. Stop the EC2 instance.

When you’re ready to turn everything back on, simply reverse those steps. If you have any virtual routers in CloudStack, then you may need to start those, too.

Cleanup

If you used my CloudFormation template, then delete the stack and remove any route table entries you added. If you didn’t use CloudFormation, then terminate the EC2 instance, delete the security group you created for it, and remove any route table entries that you added.

Conclusion

Getting CloudStack to run on AWS isn’t so bad. The hardest part is simply knowing how. The setup explained here is great for small installations, but it can only scale vertically. In my next post, I’ll show you how to create an installation that scales horizontally. Instead of using a virtual subnet that exists in a single EC2 instance, we’ll build an overlay network that spans multiple instances. It will use more components and features, including some that might be new to you. I hope you find it interesting!

Now that you can create a simple setup, give it a try! I hope you have fun and learn something new along the way. Comment with the results of your experiments.

“Apache”, “Apache CloudStack”, and “CloudStack” are trademarks of the Apache Software Foundation.

Serverless ICYMI Q4 2022

Post Syndicated from Marcia Villalba original https://aws.amazon.com/blogs/compute/serverless-icymi-q4-2022/

Welcome to the 20th edition of the AWS Serverless ICYMI (in case you missed it) quarterly recap. Every quarter, we share all the most recent product launches, feature enhancements, blog posts, webinars, Twitch live streams, and other interesting things that you might have missed! In case you missed our last ICYMI, check out what happened last quarter here.

AWS Lambda

For developers using Java, AWS Lambda has introduced Lambda SnapStart. SnapStart is a new capability that can improve the start-up performance of functions using Corretto (java11) runtime by up to 10 times, at no extra cost.

To use this capability, you must enable it in your function and then publish a new version. This triggers the optimization process. This process initializes the function, takes an immutable, encrypted snapshot of the memory and disk state, and caches it for reuse. When the function is invoked, the state is retrieved from the cache in chunks, on an as-needed basis, and it is used to populate the execution environment.
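As a quick sketch with the AWS CLI (the function name is a placeholder), enabling SnapStart and publishing a version looks like this:

# SnapStart applies to published versions only.
aws lambda update-function-configuration \
  --function-name my-java-function \
  --snap-start ApplyOn=PublishedVersions

aws lambda publish-version --function-name my-java-function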

The ICYMI: Serverless pre:Invent 2022 post shares some of the launches for Lambda before November 21, like the support of Lambda functions using Node.js 18 as a runtime, the Lambda Telemetry API, and new .NET tooling to support .NET 7 applications.

Also, now Amazon Inspector supports Lambda functions. You can enable Amazon Inspector to scan your functions continually for known vulnerabilities. The log4j vulnerability shows how important it is to scan your code for vulnerabilities continuously, not only after deployment. Vulnerabilities can be discovered at any time, and with Amazon Inspector, your functions and layers are rescanned whenever a new vulnerability is published.

AWS Step Functions

There were many new launches for AWS Step Functions, like intrinsic functions, cross-account access capabilities, and the new executions experience for Express Workflows covered in the pre:Invent post.

During AWS re:Invent this year, we announced Step Functions Distributed Map. If you need to process many files, or items inside CSV or JSON files, this new flow can help you. The new distributed map flow orchestrates large-scale parallel workloads.

This feature is optimized for files stored in Amazon S3. You can either process in parallel multiple files stored in a bucket, or process one large JSON or CSV file, in which each line contains an independent item. For example, you can convert a video file into multiple .gif animations using a distributed map, or process over 37 GB of aggregated weather data to find the highest temperature of the day. 
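The following is a hedged sketch of a state machine definition that uses a distributed map to process every object in an S3 bucket with a Lambda function; the bucket, function, role, and account details are assumptions.

# Bucket, Lambda function, role ARN, and account ID are placeholders.
cat << 'EOF' > distributed-map.asl.json
{
  "StartAt": "ProcessObjects",
  "States": {
    "ProcessObjects": {
      "Type": "Map",
      "ItemReader": {
        "Resource": "arn:aws:states:::s3:listObjectsV2",
        "Parameters": { "Bucket": "my-input-bucket" }
      },
      "ItemProcessor": {
        "ProcessorConfig": { "Mode": "DISTRIBUTED", "ExecutionType": "EXPRESS" },
        "StartAt": "ProcessOneObject",
        "States": {
          "ProcessOneObject": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:process-object",
            "End": true
          }
        }
      },
      "MaxConcurrency": 1000,
      "End": true
    }
  }
}
EOF

aws stepfunctions create-state-machine \
  --name distributed-map-demo \
  --role-arn arn:aws:iam::123456789012:role/states-exec-role \
  --definition file://distributed-map.asl.json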

Amazon EventBridge

Amazon EventBridge launched two major features: Scheduler and Pipes. Amazon EventBridge Scheduler allows you to create, run, and manage scheduled tasks at scale. You can schedule one-time or recurring tasks across 270 services and over 6,000 APIs.
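For example, creating a recurring schedule that invokes a Lambda function with EventBridge Scheduler looks roughly like this; the schedule name, cron expression, function ARN, and role ARN are placeholders:

# The role must allow scheduler.amazonaws.com to invoke the target function.
aws scheduler create-schedule \
  --name nightly-report \
  --schedule-expression "cron(0 2 * * ? *)" \
  --flexible-time-window Mode=OFF \
  --target '{"Arn":"arn:aws:lambda:us-east-1:123456789012:function:nightly-report","RoleArn":"arn:aws:iam::123456789012:role/scheduler-invoke-role"}'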

Amazon EventBridge Pipes allows you to create point-to-point integrations between event producers and consumers. With Pipes you can now connect different sources, like Amazon Kinesis Data Streams, Amazon DynamoDB Streams, Amazon SQS, Amazon Managed Streaming for Apache Kafka, and Amazon MQ to over 14 targets, such as Step Functions, Kinesis Data Streams, Lambda, and others. It not only allows you to connect these different event producers to consumers, but also provides filtering and enriching capabilities for events.

EventBridge now supports enhanced filtering capabilities, including the following (a sample pattern follows the list):

  • Matching against characters at the end of a value (suffix filtering)
  • Ignoring case sensitivity (equals-ignore-case)
  • OR matching: A single rule can match if any conditions across multiple separate fields are true.
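The sample pattern below sketches suffix and case-insensitive matching in one rule; the field names and values are illustrative, not from a real event source. OR matching across separate fields uses a "$or" array of sub-patterns.

# Field names and values are illustrative placeholders.
cat << 'EOF' > pattern.json
{
  "detail": {
    "fileName": [ { "suffix": ".png" } ],
    "status": [ { "equals-ignore-case": "success" } ]
  }
}
EOF

aws events put-rule --name sample-filter-rule --event-pattern file://pattern.json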

It’s now also simpler to build rules, and you can generate AWS CloudFormation from the console pages and generate event patterns from a schema.

AWS Serverless Application Model (AWS SAM)

There were many announcements for AWS SAM during this quarter, summarized in the ICYMI: Serverless pre:Invent 2022 post, like AWS SAM Connectors, SAM CLI Pipelines support for OpenID Connect Protocol, and AWS SAM CLI Terraform support.

AWS Application Composer

AWS Application Composer is a new visual designer that you can use to build serverless applications using multiple AWS services. This is ideal if you want to build a prototype, review architectures with others, generate diagrams for your projects, or onboard new team members to a project.

Within a simple user interface, you can drag and drop the different AWS resources and configure them visually. You can use AWS Application Composer together with AWS SAM Accelerate to build and test your applications in the AWS Cloud.

AWS Serverless digital learning badges

The new AWS Serverless digital learning badges let you show your AWS Serverless knowledge and skills. This is a verifiable digital badge that is aligned with the AWS Serverless Learning Plan.

This badge proves your knowledge and skills for Lambda, Amazon API Gateway, and designing serverless applications. To earn this badge, you must score at least 80 percent on the assessment associated with the Learning Plan. Visit this link if you are ready to get started learning or just jump directly to the assessment. 

News from other services:

Amazon SNS

Amazon SQS

AWS AppSync and AWS Amplify

Observability

AWS re:Invent 2022

AWS re:Invent was held in Las Vegas from November 28 to December 2, 2022. Werner Vogels, Amazon’s CTO, highlighted event-driven applications during his keynote. He stated that the world is asynchronous and showed how strange a synchronous world would be. During the keynote, he showcased Serverlesspresso as an example of an event-driven application. The Serverless DA team presented many breakouts, workshops, and chalk talks, and you can rewatch all of our breakout content online.

In addition, we brought Serverlesspresso back to Vegas. Serverlesspresso is a contactless, serverless order management system for a physical coffee bar. The architecture comprises several serverless apps that support an ordering process from a customer’s smartphone to a real espresso bar. The customer can check the virtual line, place an order, and receive a notification when their drink is ready for pickup.

Serverless blog posts

October

November

December

Videos

Serverless Office Hours – Tuesday 10 AM PT

Weekly live virtual office hours: in each session, we talk about a specific topic or technology related to serverless and open the floor to help with your real serverless challenges and issues. Ask us anything about serverless technologies and applications.

YouTube: youtube.com/serverlessland

Twitch: twitch.tv/aws

October

November

December

FooBar Serverless YouTube Channel

Marcia Villalba frequently publishes new videos on her popular FooBar Serverless YouTube channel.

October

November

December

Still looking for more?

The Serverless landing page has more information. The Lambda resources page contains case studies, webinars, whitepapers, customer stories, reference architectures, and even more Getting Started tutorials. If you want to learn more about event-driven architectures, read our new guide that will help you get started.

You can also follow the Serverless Developer Advocacy team on Twitter and LinkedIn to see the latest news, follow conversations, and interact with the team.

For more serverless learning resources, visit Serverless Land.

Enabling load-balancing of non-HTTP(s) traffic on AWS Wavelength

Post Syndicated from Sheila Busser original https://aws.amazon.com/blogs/compute/enabling-load-balancing-of-non-https-traffic-on-aws-wavelength/

This blog post is written by Jack Chen, Telco Solutions Architect, and Robert Belson, Developer Advocate.

AWS Wavelength embeds AWS compute and storage services within 5G networks, providing mobile edge computing infrastructure for developing, deploying, and scaling ultra-low-latency applications. AWS recently introduced support for Application Load Balancer (ALB) in AWS Wavelength zones. Although ALB addresses Layer-7 load balancing use cases, some low latency applications that get deployed in AWS Wavelength Zones rely on UDP-based protocols, such as QUIC, WebRTC, and SRT, which can’t be load-balanced by Layer-7 Load Balancers. In this post, we’ll review popular load-balancing patterns on AWS Wavelength, including a proposed architecture demonstrating how DNS-based load balancing can address customer requirements for load-balancing non-HTTP(s) traffic across multiple Amazon Elastic Compute Cloud (Amazon EC2) instances. This solution also builds a foundation for automatic scale-up and scale-down capabilities for workloads running in an AWS Wavelength Zone.

Load balancing use cases in AWS Wavelength

In the AWS Regions, customers looking to deploy highly-available edge applications often consider Amazon Elastic Load Balancing (Amazon ELB) as an approach to automatically distribute incoming application traffic across multiple targets in one or more Availability Zones (AZs). However, at the time of this publication, AWS-managed Network Load Balancer (NLB) isn’t supported in AWS Wavelength Zones and ALB is being rolled out to all AWS Wavelength Zones globally. As a result, this post will seek to document general architectural guidance for load balancing solutions on AWS Wavelength.

As one of the most prominent AWS Wavelength use cases, highly immersive video streaming over UDP at scale, using protocols such as WebRTC, often requires a load balancing solution to accommodate surges in traffic, whether due to live events or general customer access patterns. These use cases, relying on Layer-4 traffic, can’t be load-balanced by a Layer-7 ALB. Instead, Layer-4 load balancing is needed.

To date, two infrastructure deployments involving Layer-4 load balancers are most often seen:

  • Amazon EC2-based deployments: Often the environment of choice for earlier-stage enterprises and ISVs, a fleet of EC2 instances will leverage a load balancer for high-throughput use cases, such as video streaming, data analytics, or Industrial IoT (IIoT) applications
  • Amazon EKS deployments: Customers looking to optimize performance and cost efficiency of their infrastructure can leverage containerized deployments at the edge to manage their AWS Wavelength Zone applications. In turn, external load balancers could be configured to point to exposed services via NodePort objects. Furthermore, a more popular choice might be to leverage the AWS Load Balancer Controller to provision an ALB when you create a Kubernetes Ingress.

Regardless of deployment type, the following design constraints must be considered:

  • Target registration: For load balancing solutions not managed by AWS, seamless solutions to load balancer target registration must be managed by the customer. As one potential solution, visit a recent HAProxyConf presentation, Practical Advice for Load Balancing at the Network Edge.
  • Edge Discovery: Although DNS records can be populated into Amazon Route 53 for each carrier-facing endpoint, DNS won’t deterministically route mobile clients to the most optimal mobile endpoint. When available, edge discovery services are required to most effectively route mobile clients to the lowest latency endpoint.
  • Cross-zone load balancing: Given the hub-and-spoke design of AWS Wavelength, customer-managed load balancers should proxy traffic only to that AWS Wavelength Zone.

Solution overview – Amazon EC2

In this post, we present a highly available, DNS-based load balancing approach for an Amazon EC2-based deployment in a single AWS Wavelength Zone. In a separate post, we’ll cover the configurations needed for the AWS Load Balancer Controller in AWS Wavelength for Amazon Elastic Kubernetes Service (Amazon EKS) clusters.

The proposed solution introduces DNS-based load balancing, a technique to abstract away the complexity of intelligent load-balancing software and allow your Domain Name System (DNS) resolvers to distribute traffic (equally, or in a weighted distribution) to your set of endpoints.

Our solution leverages the weighted routing policy in Route 53 to resolve inbound DNS queries to multiple EC2 instances running within an AWS Wavelength zone. As EC2 instances for a given workload get deployed in an AWS Wavelength zone, Carrier IP addresses can be assigned to the network interfaces at launch.

Through this solution, Carrier IP addresses attached to AWS Wavelength instances are automatically added as DNS records for the customer-provided public hosted zone.

To determine how Route 53 responds to queries, given an arbitrary number of records in a public hosted zone, Route 53 offers several routing policies:

Simple routing policy – In the event that you must route traffic to a single resource in an AWS Wavelength Zone, simple routing can be used. A single record can contain multiple IP addresses, but Route 53 returns the values in a random order to the client.

Weighted routing policy – To route traffic more deterministically using a set of proportions that you specify, this policy can be selected. For example, if you would like Carrier IP A to receive 50% of the traffic and Carrier IP B to receive 50% of the traffic, we’ll create two individual A records (one for each Carrier IP) with a weight of 50 and 50, respectively. Learn more about Route 53 routing policies by visiting the Route 53 Developer Guide.
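As a minimal sketch of what the weighted records look like in practice, the following Python (boto3) snippet upserts two weighted A records, one per Carrier IP, each with a weight of 50. The hosted zone ID, record name, and Carrier IP addresses are placeholders.

import boto3

route53 = boto3.client("route53")

# Hypothetical weighted A records: one per Carrier IP, 50/50 split.
records = [("wavelength-a", "155.146.1.10"), ("wavelength-b", "155.146.1.11")]

for set_identifier, carrier_ip in records:
    route53.change_resource_record_sets(
        HostedZoneId="Z0123456789EXAMPLE",
        ChangeBatch={
            "Changes": [{
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": "www.example.com",
                    "Type": "A",
                    "SetIdentifier": set_identifier,
                    "Weight": 50,
                    "TTL": 30,
                    "ResourceRecords": [{"Value": carrier_ip}],
                },
            }]
        },
    )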

The proposed solution leverages weighted routing policy in Route 53 DNS to route traffic to multiple EC2 instances running within an AWS Wavelength zone.

Reference architecture

The following diagram illustrates the load-balancing component of the solution, where EC2 instances in an AWS Wavelength zone are assigned Carrier IP addresses. A weighted DNS record for a host (e.g., www.example.com) is updated with Carrier IP addresses.

DNS-based load balancing

When a device makes a DNS query, it receives one of the Carrier IP addresses associated with the given domain name. With a large number of devices, we expect a fair distribution of load across all EC2 instances in the resource pool. Given the highly ephemeral nature of mobile edge environments, it’s likely that Carrier IPs are frequently allocated to accommodate a workload and released shortly thereafter. However, this unpredictable behavior can yield stale DNS records, resulting in a “blackhole” – routes to endpoints that no longer exist.

Time-To-Live (TTL) is a DNS attribute that specifies the amount of time, in seconds, that you want DNS recursive resolvers to cache information about this record.

In our example, we set the TTL to 30 seconds to force DNS resolvers to retrieve the latest records from the authoritative nameservers and minimize stale DNS responses. However, a lower TTL has a direct impact on cost because of the increased number of calls from recursive resolvers to Route 53 to retrieve the latest records.

The core components of the solution are as follows:

Alongside the services above in the AWS Wavelength Zone, the following services are also leveraged in the AWS Region:

  • AWS Lambda – a serverless, event-driven function that makes API calls to the Route 53 service to update DNS records.
  • Amazon EventBridge – a serverless event bus that reacts to EC2 instance lifecycle events and invokes the Lambda function to make DNS updates.
  • Route 53 – a cloud DNS service with a domain record pointing to AWS Wavelength-hosted resources.

In this post, we intentionally leave the specific load balancing software up to the customer. Customers can leverage various popular load balancers available on AWS Marketplace, such as HAProxy and NGINX. To keep the focus on the auto-registration of DNS records, this solution is designed to support stateless workloads only. To support stateful workloads, sticky sessions – a process in which requests are routed to the same target in a target group – must be configured by the underlying load balancer solution and are outside the scope of what DNS can provide natively.

Automation overview

Using the aforementioned components, we can implement the following workflow automation:

Event-driven Auto Scaling Workflow

An Amazon CloudWatch alarm can trigger an Auto Scaling group scale-out or scale-in event by adding or removing EC2 instances. EventBridge detects the EC2 instance state change event and invokes the Lambda function. This function updates the DNS records in Route 53 by either adding (scale out) or deleting (scale in) a weighted A record associated with the EC2 instance changing state.
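The exact function code is packaged with the solution’s CloudFormation template; as an illustrative approximation only, a handler along these lines could implement the workflow. The environment variable names, the SSM parameter naming convention, and the lookup_recorded_ip helper are assumptions introduced for this sketch.

import os
import boto3

ec2 = boto3.client("ec2")
ssm = boto3.client("ssm")
route53 = boto3.client("route53")

HOSTED_ZONE_ID = os.environ["HOSTED_ZONE_ID"]  # assumed environment variable
RECORD_NAME = os.environ["RECORD_NAME"]        # for example, www.example.com


def lookup_recorded_ip(instance_id):
    # Assumed convention: the Carrier IP was stored in Parameter Store at launch.
    return ssm.get_parameter(Name=f"/dns-lb/{instance_id}")["Parameter"]["Value"]


def handler(event, context):
    # Reacts to an EC2 instance state-change notification from EventBridge.
    instance_id = event["detail"]["instance-id"]
    state = event["detail"]["state"]

    if state == "running":
        # Look up the Carrier IP assigned to the instance's network interface.
        reservation = ec2.describe_instances(InstanceIds=[instance_id])["Reservations"][0]
        interface = reservation["Instances"][0]["NetworkInterfaces"][0]
        carrier_ip = interface["Association"]["CarrierIp"]
        ssm.put_parameter(Name=f"/dns-lb/{instance_id}", Value=carrier_ip,
                          Type="String", Overwrite=True)
        action = "UPSERT"
    elif state in ("shutting-down", "terminated", "stopped"):
        carrier_ip = lookup_recorded_ip(instance_id)
        action = "DELETE"
    else:
        return

    route53.change_resource_record_sets(
        HostedZoneId=HOSTED_ZONE_ID,
        ChangeBatch={"Changes": [{
            "Action": action,
            "ResourceRecordSet": {
                "Name": RECORD_NAME,
                "Type": "A",
                "SetIdentifier": instance_id,
                "Weight": 50,
                "TTL": 30,
                "ResourceRecords": [{"Value": carrier_ip}],
            },
        }]},
    )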

Configuration of the automatic scaling policy is out of scope for this post. There are many scaling triggers that you can consider, based on predefined and custom metrics such as memory utilization. For demo purposes, we use manual scaling.

In addition to the core components already described, our solution also uses AWS Identity and Access Management (IAM) policies and CloudWatch. Both services are key components for building Well-Architected solutions on AWS. We also use AWS Systems Manager Parameter Store to keep track of user input parameters. The deployment of the solution is automated via AWS CloudFormation templates. The Lambda function provided should be uploaded to an Amazon Simple Storage Service (Amazon S3) bucket.

Amazon Virtual Private Cloud (Amazon VPC), subnets, Carrier Gateway, and Route Tables are foundational building blocks for AWS-based networking infrastructure. In our deployment, we are creating a new VPC, one subnet in an AWS Wavelength zone of your choice, a Carrier Gateway, and updating the route table for this subnet to point the default route to the Carrier Gateway.

Wavelength VPC architecture.

Deployment prerequisites

The following are prerequisites to deploy the described solution in your account:

  • Access to an AWS Wavelength zone. If your account is not allow-listed to use AWS Wavelength zones, then opt-in to AWS Wavelength zones here.
  • Public DNS Hosted Zone hosted in Route 53. You must have access to a registered public domain to deploy this solution. The zone for this domain should be hosted in the same account where you plan to deploy AWS Wavelength workloads.
    If you don’t have a public domain, then you can register a new one. Note that there will be a service charge for the domain registration.
  • Amazon S3 bucket. For the Lambda function that updates DNS records in Route 53, store the source code as a .zip file in an Amazon S3 bucket.
  • Amazon EC2 key pair. You can use an existing key pair for the deployment. If you don’t have a key pair in the Region where you plan to deploy this solution, then create one by following these instructions.
  • 4G or 5G-connected device. Although the infrastructure can be deployed independent of the underlying connected devices, testing the connectivity will require a mobile device on one of the Wavelength partner’s networks. View the complete list of Telecommunications providers and Wavelength Zone locations to learn more.

Conclusion

In this post, we demonstrated how to implement DNS-based load balancing for workloads running in an AWS Wavelength Zone. We deployed a solution that uses an EventBridge rule and a Lambda function to update DNS records hosted in Route 53. If you want to learn more about AWS Wavelength, subscribe to the AWS Compute Blog channel here.

Run fault tolerant and cost-optimized Spark clusters using Amazon EMR on EKS and Amazon EC2 Spot Instances

Post Syndicated from Kinnar Kumar Sen original https://aws.amazon.com/blogs/big-data/run-fault-tolerant-and-cost-optimized-spark-clusters-using-amazon-emr-on-eks-and-amazon-ec2-spot-instances/

Amazon EMR on EKS is a deployment option in Amazon EMR that allows you to run Spark jobs on Amazon Elastic Kubernetes Service (Amazon EKS). Amazon Elastic Compute Cloud (Amazon EC2) Spot Instances save you up to 90% over On-Demand Instances, and are a great way to cost optimize the Spark workloads running on Amazon EMR on EKS. Because Spot is an interruptible service, if we can move or reuse the intermediate shuffle files, it improves the overall stability and SLA of the job. The latest versions of Amazon EMR on EKS have integrated Spark features to enable this capability.

In this post, we discuss these features—Node Decommissioning and Persistent Volume Claim (PVC) reuse—and their impact on increasing the fault tolerance of Spark jobs on Amazon EMR on EKS when cost optimizing using EC2 Spot Instances.

Amazon EMR on EKS and Spot

EC2 Spot Instances are spare EC2 capacity provided at a steep discount of up to 90% over On-Demand prices. Spot Instances are a great choice for stateless and flexible workloads. The caveat with this discount and spare capacity is that Amazon EC2 can interrupt an instance with a proactive or reactive (2-minute) warning when it needs the capacity back. You can provision compute capacity in an EKS cluster using Spot Instances with managed or self-managed node groups to cost optimize your workloads.

Amazon EMR on EKS uses Amazon EKS to run jobs with the EMR runtime for Apache Spark, which can be cost optimized by running the Spark executors on Spot. It provides up to 61% lower costs and up to 68% performance improvement for Spark workloads on Amazon EKS. The Spark application launches a driver and executors to run the computation. Spark is a semi-fault tolerant framework that is resilient to executor loss due to an interruption and therefore can run on EC2 Spot. On the other hand, when the driver is interrupted, the job fails. Hence, we recommend running drivers on On-Demand Instances. Some of the best practices for running Spark on Amazon EKS are applicable with Amazon EMR on EKS.

EC2 Spot Instances also help in cost optimization by improving the overall throughput of the job. This can be achieved by auto scaling the cluster using the Cluster Autoscaler (for managed node groups) or Karpenter.

Though Spark executors are resilient to Spot interruptions, the shuffle files and RDD data are lost when the executor gets killed. The lost shuffle files need to be recomputed, which increases the overall runtime of the job. Apache Spark has released two features (in versions 3.1 and 3.2) that address this issue. Amazon EMR on EKS released features such as node decommissioning (version 6.3) and PVC reuse (version 6.8) to simplify recovery and reuse shuffle files, which increases the overall resiliency of your application.

Node decommissioning

The node decommissioning feature works by preventing scheduling of new jobs on the nodes that are to be decommissioned. It also moves any shuffle files or cache present in those nodes to other executors (peers). If there are no other available executors, the shuffle files and cache are moved to a remote fallback storage.

Fig 1: Node Decommissioning

Let’s look at the decommission steps in more detail.

If one of the nodes that is running executors is interrupted, the executor starts the process of decommissioning and sends the message to the driver:

21/05/05 17:41:41 WARN KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Received executor 7 decommissioned message
21/05/05 17:41:41 DEBUG TaskSetManager: Valid locality levels for TaskSet 2.0: NO_PREF, ANY
21/05/05 17:41:41 INFO KubernetesClusterSchedulerBackend: Decommission executors: 7
21/05/05 17:41:41 DEBUG TaskSchedulerImpl: parentName: , name: TaskSet_2.0, runningTasks: 10
21/05/05 17:41:41 INFO BlockManagerMasterEndpoint: Mark BlockManagers (BlockManagerId(7, 192.168.82.107, 39007, None)) as being decommissioning.
21/05/05 20:22:17 INFO CoarseGrainedExecutorBackend: Decommission executor 1.
21/05/05 20:22:17 INFO CoarseGrainedExecutorBackend: Will exit when finished decommissioning
21/05/05 20:22:17 INFO BlockManager: Starting block manager decommissioning process...
21/05/05 20:22:17 DEBUG FileSystem: Looking for FS supporting s3a

The executor looks for RDD or shuffle files and tries to replicate or migrate those files. It first tries to find a peer executor. If successful, it will move the files to the peer executor:

22/06/07 20:41:38 INFO ShuffleStatus: Updating map output for 46 to BlockManagerId(4, 192.168.13.235, 34737, None)
22/06/07 20:41:38 DEBUG BlockManagerMasterEndpoint: Received shuffle data block update for 0 46, ignore.
22/06/07 20:41:38 DEBUG BlockManagerMasterEndpoint: Received shuffle index block update for 0 46, updating.

However, if it is not able to find a peer executor, it will try to move the files to a fallback storage if available.

Fig 2: Fallback Storage

The executor is then decommissioned. When a new executor comes up, the shuffle files are reused:

22/06/07 20:42:50 INFO BasicExecutorFeatureStep: Adding decommission script to lifecycle
22/06/07 20:42:50 DEBUG ExecutorPodsAllocator: Requested executor with id 19 from Kubernetes.
22/06/07 20:42:50 DEBUG ExecutorPodsWatchSnapshotSource: Received executor pod update for pod named amazon-reviews-word-count-bfd0a5813fd1b80f-exec-19, action ADDED
22/06/07 20:42:50 DEBUG BlockManagerMasterEndpoint: Received shuffle index block update for 0 52, updating.
22/06/07 20:42:50 INFO ShuffleStatus: Recover 52 BlockManagerId(fallback, remote, 7337, None)

The key advantage of this process is that it migrates blocks and shuffle data, thereby reducing recomputation, which adds to the overall resiliency of the system and reduces runtime. This process can be triggered by a Spot interruption signal (SIGTERM) or by node draining. Node draining may happen due to high-priority task scheduling or independently.

When you use Amazon EMR on EKS with managed node groups/Karpenter, the Spot interruption handling is automated, wherein Amazon EKS gracefully drains and rebalances the Spot nodes to minimize application disruption when a Spot node is at elevated risk of interruption. If you’re using managed node groups/Karpenter, the decommission gets triggered when the nodes are getting drained and, because it’s proactive, it gives you more time (at least 2 minutes) to move the files. In the case of self-managed node groups, we recommend installing the AWS Node Termination Handler to handle the interruption, and the decommission is triggered when the reactive (2-minute) notification is received. We recommend using Karpenter with Spot Instances as it has faster node scheduling with early pod binding and bin packing to optimize resource utilization.

The following code enables this configuration; more details are available on GitHub:

"spark.decommission.enabled": "true"
"spark.storage.decommission.rddBlocks.enabled": "true"
"spark.storage.decommission.shuffleBlocks.enabled" : "true"
"spark.storage.decommission.enabled": "true"
"spark.storage.decommission.fallbackStorage.path": "s3://<<bucket>>"

PVC reuse

Apache Spark enabled dynamic PVC in version 3.1, which is useful with dynamic allocation because we don’t have to pre-create the claims or volumes for the executors and delete them after completion. PVC enables true decoupling of data and processing when we’re running Spark jobs on Kubernetes, because we can use it as a local storage to spill in-process files too. The latest version of Amazon EMR 6.8 has integrated the PVC reuse feature of Spark, wherein if an executor is terminated due to an EC2 Spot interruption or any other reason (for example, a JVM failure), then the PVC is not deleted but persisted and reattached to another executor. If there are shuffle files in that volume, then they are reused.

As with node decommissioning, this reduces the overall runtime because we don’t have to recompute the shuffle files. We also save the time required to request a new volume for an executor, and shuffle files can be reused without moving the files around.

The following diagram illustrates this workflow.

Fig 3: PVC Reuse

Let’s look at the steps in more detail.

If one or more of the nodes that are running executors are interrupted, the underlying pods get terminated and the driver gets the update. Note that the driver is the owner of the PVCs of the executors, and the PVCs are not terminated. See the following code:

22/06/15 23:25:07 DEBUG ExecutorPodsWatchSnapshotSource: Received executor pod update for pod named amazon-reviews-word-count-9ee82b8169a75183-exec-3, action DELETED
22/06/15 23:25:07 DEBUG ExecutorPodsWatchSnapshotSource: Received executor pod update for pod named amazon-reviews-word-count-9ee82b8169a75183-exec-6, action MODIFIED
22/06/15 23:25:07 DEBUG ExecutorPodsWatchSnapshotSource: Received executor pod update for pod named amazon-reviews-word-count-9ee82b8169a75183-exec-6, action DELETED
22/06/15 23:25:07 DEBUG ExecutorPodsWatchSnapshotSource: Received executor pod update for pod named amazon-reviews-word-count-9ee82b8169a75183-exec-3, action MODIFIED

The ExecutorPodsAllocator tries to allocate new executor pods to replace the ones terminated due to interruption. During the allocation, it figures out how many of the existing PVCs have files and can be reused:

22/06/15 23:25:23 INFO ExecutorPodsAllocator: Found 2 reusable PVCs from 10 PVCs

The ExecutorPodsAllocator requests a pod, and when it launches, the PVC is reused. In the following example, the PVC from executor 6 is reused for new executor pod 11:

22/06/15 23:25:23 DEBUG ExecutorPodsAllocator: Requested executor with id 11 from Kubernetes.
22/06/15 23:25:24 DEBUG ExecutorPodsWatchSnapshotSource: Received executor pod update for pod named amazon-reviews-word-count-9ee82b8169a75183-exec-11, action ADDED
22/06/15 23:25:24 INFO KubernetesClientUtils: Spark configuration files loaded from Some(/usr/lib/spark/conf) : log4j.properties,spark-env.sh,hive-site.xml,metrics.properties
22/06/15 23:25:24 INFO BasicExecutorFeatureStep: Decommissioning not enabled, skipping shutdown script
22/06/15 23:25:24 DEBUG ExecutorPodsWatchSnapshotSource: Received executor pod update for pod named amazon-reviews-word-count-9ee82b8169a75183-exec-11, action MODIFIED
22/06/15 23:25:24 INFO ExecutorPodsAllocator: Reuse PersistentVolumeClaim amazon-reviews-word-count-9ee82b8169a75183-exec-6-pvc-0

The shuffle files, if present in the PVC, are reused.

The key advantage of this technique is that it allows us to reuse pre-computed shuffle files in their original location, thereby reducing the time of the overall job run.

This works for both static and dynamic PVCs. Amazon EKS supports three different storage options, which can also be encrypted: Amazon Elastic Block Store (Amazon EBS), Amazon Elastic File System (Amazon EFS), and Amazon FSx for Lustre. We recommend using dynamic PVCs with Amazon EBS because with static PVCs, you would need to create multiple PVCs.

The following code enables this configuration; more details are available on GitHub:

"spark.kubernetes.driver.ownPersistentVolumeClaim": "true"
"spark.kubernetes.driver.reusePersistentVolumeClaim": "true"

For this to work, we need to enable PVC with Amazon EKS and mention the details in the Spark runtime configuration. For instructions, refer to How do I use persistent storage in Amazon EKS? The following code contains the Spark configuration details for using PVC as local storage; other details are available on GitHub:

"spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.mount.readOnly": "false"
"spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.claimName": "OnDemand"
"spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.storageClass": "spark-sc"
"spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.sizeLimit": "10Gi"
"spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.mount.path": "/var/data/spill"

Conclusion

With Amazon EMR on EKS (6.9) and the features discussed in this post, you can further reduce the overall runtime for Spark jobs when running with Spot Instances. This also improves the overall resiliency and flexibility of the job while cost optimizing the workload on EC2 Spot.

Try out the EMR on EKS workshop for improved performance when running Spark workloads on Kubernetes and cost optimize using EC2 Spot Instances.


About the Author

Kinnar Kumar Sen is a Sr. Solutions Architect at Amazon Web Services (AWS) focusing on Flexible Compute. As a part of the EC2 Flexible Compute team, he works with customers to guide them to the most elastic and efficient compute options that are suitable for their workload running on AWS. Kinnar has more than 15 years of industry experience working in research, consultancy, engineering, and architecture.

Scaling AWS Outposts rack deployments with ACE racks

Post Syndicated from Sheila Busser original https://aws.amazon.com/blogs/compute/scaling-aws-outposts-rack-deployments-with-ace-racks/

This blog post is written by Eric Vasquez, Specialist Hybrid Edge Solutions Architect, and Paul Scherer, Senior Network Service Tech.

Overview

AWS Outposts brings managed, monitored AWS infrastructure, compute, and storage to your on-premises environment. It provides the same AWS APIs and console experience you would get within the AWS Region to which the Outpost is homed. You may already have an Outposts rack. An Outpost can consist of one or more racks creating a pool of consumable resources as a single logical Outpost. In this post, we will introduce you to an Aggregation, Core, Edge (ACE) rack.

Depending on your familiarity with the Outpost family, you might have already heard about an ACE rack. An ACE rack serves as an aggregation point for multi-rack Outpost deployments. ACE racks reduce the physical networking port requirements as well as the logical interfaces needed, while allowing for connectivity between multiple racks in your logical Outpost. ACE racks are recommended for customers with planned deployments beyond three racks excluding the ACE rack itself.

We recommend that all customers leverage an ACE rack if planning expansions beyond three racks in the long-term, even if the initial deployment is a single rack. An ACE rack contains four routers, and these routers can connect to either two or four customer upstream devices. For the best redundancy, reliability, and resiliency, we recommend deploying an ACE rack to four upstream customer devices.

ACE racks support 10G, 40G, and 100G connections to a customer network. However, 100G connections between each ACE router to a customer device are recommended.

Outpost extension from the AWS Region and ACE rack deployment in a 15-rack Outpost configuration

Each Outposts rack comes standard with redundant Outpost networking devices, power supplies, and two top-of-rack patch panels which serve as demarcation points between the Outpost rack and your customer networking device (CND). For the remainder of this post, we’ll refer to the Outpost Networking Devices as OND and customer switches/routers as CND. The Outpost rack ONDs form Border Gateway Protocol (BGP) neighbor relationships with either your CND or the ACE rack using point-to-point (P2P) Virtual LAN (VLAN) interfaces.

For an Outposts installation without an ACE rack, each Outposts OND connects to your LAN using single-mode or multi-mode fiber with LC connectors supporting 1G, 10G, 40G, or 100G connectivity. We provide flexibility for the CNDs and allow either Layer 2 or Layer 3 devices, including firewalls. Each OND uses a single LACP port channel that carries two P2P VLAN virtual interfaces (VIFs) to establish two BGP relationships over the port channel to your upstream CND and aggregate total bandwidth. This results in each Outpost rack requiring a minimum of two physical uplinks, but as a general best practice we recommend two per device for a total of four uplinks, along with two LACP port channels and four VLANs to establish P2P BGP peerings. Note that the IPs used in the following diagram are just examples.

Outpost Service link and Local Gateway VLAN

As we continue to expand rack deployments, so will the number of physical uplinks and VLAN interfaces required for the added OND to a CND. When we introduce the ACE rack, the OND is no longer attached to your CND. Instead, it goes directly to ACE devices, which provide at least one uplink to your network switch/router. In this topology, AWS owns the VLAN interface allocation and configuration between compute rack OND and the ACE routers.

Let’s cover the potential downsides of a multi-rack installation without an ACE rack. In this case, we have a three-rack Outpost deployment, with one uplink from each rack OND (two per rack) to the CND. This would require you to provide six physical ports on your devices, six fiber cables, 12 VLAN VIFs, 12 P2P subnets (potentially exhausting 24 IP addresses), and six port channels.

In comparison, for a three-rack installation that sits behind an ACE rack, you provide fewer physical network ports on your devices, fewer fiber cabling uplinks, fewer VLAN VIFs, fewer port channels, and fewer P2P subnets. Each ACE router has its own LACP port channel with two VLAN VIFs in each channel (the same as an OND-to-CND connection). The following table highlights the advantages of using an ACE rack when running a multi-rack Outpost, which become more pronounced as you continue to scale.

| Requirement | 2-Rack Outpost, without ACE | 2-Rack Outpost, with ACE | 3-Rack Outpost, without ACE | 3-Rack Outpost, with ACE | 4-Rack Outpost, without ACE | 4-Rack Outpost, with ACE |
| --- | --- | --- | --- | --- | --- | --- |
| Physical Ports | 4 | 4 | 6 | 4 | 8 | 4 |
| Fiber Cables | 4 | 4 | 6 | 4 | 8 | 4 |
| LACP Port Channels | 4 | 4 | 6 | 4 | 8 | 4 |
| VLAN VIFs | 8 | 8 | 12 | 8 | 16 | 8 |
| P2P Subnets | 8 | 8 | 12 | 8 | 16 | 8 |

ACE vs. non-ACE rack components comparison

Furthermore, you should consider the additional weight and power requirements that an ACE rack introduces when planning for multi-rack deployments. In addition to the initial kVA requirements for the Outpost racks, you must account for the resources required for an ACE rack. An ACE rack consumes up to 10 kVA of power and weighs up to 705 lbs. Carefully planning additional capacity for these resources with your AWS account team will be critical for a successful deployment.

Similar to an Outpost rack, an ACE rack deployment is monitored by AWS. The rack provides telemetry data transmitted over a set of VPN tunnels back to the anchor points in the Region to which the Outpost is homed. This allows AWS to monitor the rack for hardware failures, performance degradation, and other alarm conditions, including links or interfaces going down and BGP drops.

As part of the Outpost ordering process, AWS will work closely with you to determine the location for the install, power availability on-site, and the network configuration of both the Outposts rack and the ACE rack. This includes BGP configuration and the Customer-owned IP address (CoIP) pool, which is the pool of IP addresses for route advertisements back to your CND. The CoIP pool allows resources inside your Outpost rack to communicate with on-premises resources and vice versa. Another connectivity option is Direct VPC Routing (DVR), where we advertise VPC subnets associated with your LGW to your on-premises networks. Outposts uses network connectivity back to the Region for management purposes called the service link (SL). The SL is an encrypted set of VPN connections used whenever the Outpost communicates with your chosen home Region.

Conclusion

This post addressed the most common questions surrounding ACE racks, how an ACE rack can be deployed, and why an ACE rack would be leveraged for a multi-rack Outpost deployment. We demonstrated how an ACE rack serves as a consolidation point in your on-premises environment, making multi-rack deployments scalable while reducing complexity and physical port allocation for connectivity between an Outpost and your LAN. In addition, we described how you can get this process started. If you want to learn more about Outposts fundamentals and how you can build your applications with AWS services using Outposts for hybrid cloud deployments, check out the Outposts user guide.

Securing Lambda Function URLs using Amazon Cognito, Amazon CloudFront and AWS WAF

Post Syndicated from Marcia Villalba original https://aws.amazon.com/blogs/compute/securing-lambda-function-urls-using-amazon-cognito-amazon-cloudfront-and-aws-waf/

This post is written by Madhu Singh (Solutions Architect), and Krupanidhi Jay (Solutions Architect).

A Lambda function URL is a dedicated HTTPS endpoint for an AWS Lambda function. You can configure a function URL to have one of two methods of authentication: IAM and NONE. IAM authentication means that you are restricting access to the function URL (and in turn, access to invoke the Lambda function) to certain AWS principals (such as roles or users). An authentication type of NONE means that the Lambda function URL has no authentication and is open for anyone to invoke the function.

This blog shows how to use Lambda function URLs with an authentication type of NONE, implement custom authorization logic as part of the function code, and only allow requests that present valid Amazon Cognito credentials to invoke the function. You also learn ways to protect the Lambda function URL against common security threats like DDoS using AWS WAF and Amazon CloudFront.

Lambda function URLs provide a simpler way to invoke your function using HTTP calls. However, they are not a replacement for Amazon API Gateway, which provides advanced features like request validation and rate throttling.

Solution overview

There are four core components in the example.

1. A Lambda function with function URLs enabled

At the core of the example is a Lambda function with the function URLs feature enabled with the authentication type of NONE. This function responds with a success message if a valid authorization code is passed during invocation. If not, it responds with a failure message.

2. Amazon Cognito User Pool

Amazon Cognito user pools enable user authentication on websites and mobile apps. You can also enable publicly accessible login and sign-up pages in your applications using the Amazon Cognito user pools feature called the hosted UI.

In this example, you use a user pool and the associated hosted UI to enable user login and sign-up on the website used as the entry point. The Lambda function validates the authorization code against this Amazon Cognito user pool.

3. CloudFront distribution using AWS WAF

CloudFront is a content delivery network (CDN) service that helps deliver content to end users with low latency, while also improving the security posture for your applications.

AWS WAF is a web application firewall that helps protect your web applications or APIs against common web exploits and bots. AWS Shield is a managed distributed denial of service (DDoS) protection service that safeguards applications running on AWS. AWS WAF inspects the incoming request according to the configured Web Access Control List (web ACL) rules.

Adding CloudFront in front of your Lambda function URL helps to cache content closer to the viewer, and activating AWS WAF and AWS Shield helps in increasing security posture against multiple types of attacks, including network and application layer DDoS attacks.

4. Public website that invokes the Lambda function

The example also creates a public website built on React JS and hosted in AWS Amplify as the entry point for the demo. This website works both in authenticated mode and in guest mode. For authentication, the website uses Amazon Cognito user pools hosted UI.

Solution architecture

This shows the architecture of the example and the information flow for user requests.

In the request flow:

  1. The entry point is the website hosted in AWS Amplify. In the home page, when you choose “sign in”, you are redirected to the Amazon Cognito hosted UI for the user pool.
  2. Upon successful login, Amazon Cognito returns the authorization code, which is stored as a cookie with the name “code”. The user is redirected back to the website, which has an “execute Lambda” button.
  3. When the user chooses “execute Lambda”, the value from the “code” cookie is passed in the request body to the CloudFront distribution endpoint.
  4. The AWS WAF web ACL rules are configured to determine whether the request originates from US or Canada IP addresses and whether the request should be allowed to invoke the Lambda function URL origin.
  5. Allowed requests are forwarded to the CloudFront distribution endpoint.
  6. CloudFront is configured to allow CORS headers and has the origin set to the Lambda function URL. The request that CloudFront receives is passed to the function URL.
  7. This invokes the Lambda function associated with the function URL, which validates the token.
  8. The function code does the following, in order (a hedged sketch of this flow follows the list):
    1. Exchanges the authorization code in the request body (passed as the event object to the Lambda function) for an access_token using Amazon Cognito’s token endpoint (check the documentation for more details).
      1. Amazon Cognito user pool attributes like the user pool URL, Client ID, and Secret are retrieved from AWS Systems Manager Parameter Store (SSM Parameters).
      2. These values are stored in SSM Parameter Store at the time these resources are deployed via AWS CDK (see the “how to deploy” section).
    2. The access token is then verified to determine its authenticity.
    3. If valid, the Lambda function returns a message stating the user is authenticated as <username> and the execution was successful.
    4. If the authorization code was not present (for example, the user was in “guest mode” on the website), or the code is invalid or expired, the Lambda function returns a message stating that the user is not authorized to execute the function.
  9. The webpage displays the Lambda function return message as an alert.
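The exact handler ships with the example repository; as a hedged sketch only, the core of such a function could look like the following Python. The SSM parameter names, Cognito domain, and redirect URI are assumptions, and production code would also verify the token signature, expiry, and audience against the user pool’s JWKS before trusting it.

import base64
import json
import urllib.error
import urllib.parse
import urllib.request

import boto3

ssm = boto3.client("ssm")


def get_param(name, decrypt=False):
    return ssm.get_parameter(Name=name, WithDecryption=decrypt)["Parameter"]["Value"]


def handler(event, context):
    body = json.loads(event.get("body") or "{}")
    auth_code = body.get("code")
    if not auth_code:
        return {"statusCode": 401, "body": json.dumps({"message": "User is not authorized"})}

    # Hypothetical SSM parameter names written at deploy time by the CDK stack.
    domain = get_param("/lambda-url-demo/cognito-domain")  # e.g. https://myapp.auth.us-east-1.amazoncognito.com
    client_id = get_param("/lambda-url-demo/client-id")
    client_secret = get_param("/lambda-url-demo/client-secret", decrypt=True)
    redirect_uri = get_param("/lambda-url-demo/redirect-uri")

    # Exchange the authorization code for tokens at Cognito's /oauth2/token endpoint.
    data = urllib.parse.urlencode({
        "grant_type": "authorization_code",
        "client_id": client_id,
        "code": auth_code,
        "redirect_uri": redirect_uri,
    }).encode()
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    request = urllib.request.Request(
        f"{domain}/oauth2/token",
        data=data,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Authorization": f"Basic {basic}",
        },
    )
    try:
        with urllib.request.urlopen(request) as response:
            tokens = json.loads(response.read())
    except urllib.error.HTTPError:
        return {"statusCode": 401, "body": json.dumps({"message": "User is not authorized"})}

    access_token = tokens.get("access_token")
    # Token verification (signature, expiry, audience) and username extraction
    # from the verified token would happen here before trusting the caller.
    if not access_token:
        return {"statusCode": 401, "body": json.dumps({"message": "User is not authorized"})}
    return {"statusCode": 200, "body": json.dumps({"message": "Execution successful"})}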

Getting started

Pre-requisites:

Before deploying the solution, please follow the README from the GitHub repository and take the necessary steps to fulfill the pre-requisites.

Deploy the sample solution

1. From the code directory, download the dependencies:

$ npm install

2. Start the deployment of the AWS resources required for the solution:

$ cdk deploy

Note:

  • Optionally pass in the --profile argument if needed
  • The deployment can take up to 15 minutes

3. Once the deployment completes, the output looks similar to this:

Open the amplifyAppUrl from the output in your browser. This is the URL for the demo website. If you don’t see the “Welcome to Compute Blog” page, the Amplify app is still building, and the website is not available yet. Retry in a few minutes. This website works either in an authenticated or unauthenticated state.

Test the authenticated flow

  1. To test the authenticated flow, choose “Sign In”.

2. In the sign-in page, choose sign-up (for the first time) and create a user name and password.

3. To use an existing user name and password, enter those credentials and choose login.

4. Upon successful sign-in or sign up, you are redirected back to the webpage with “Execute Lambda” button.

5. Choose this button. In a few seconds, an alert pop-up shows the logged-in user and that the Lambda execution is successful.

Testing the unauthenticated flow

1. To test the unauthenticated flow, from the Home page, choose “Continue”.

2. Choose “Execute Lambda” and in a few seconds, you see a message that you are not authorized to execute the Lambda function.

Testing the geo-block feature of AWS WAF

1. Access the website from a Region other than US or Canada. If you are physically in the US or Canada, you may use a VPN service to connect to a Region other than US or Canada.

2. Choose the “Execute Lambda” button. In the browser’s network trace, you can see that the call to invoke the Lambda function was blocked with a Forbidden response.

3. To try either the authenticated or unauthenticated flow again, choose “Return to Home Page” to go back to the home page with “Sign In” and “Continue” buttons.

Cleaning up

To delete the resources provisioned, run the cdk destroy command from the AWS CDK CLI.

Conclusion

In this blog, you created a Lambda function with function URLs enabled with NONE as the authentication type. You then implemented a custom authentication mechanism as part of your Lambda function code. You also increased the security of your Lambda function URL by setting it as the origin for a CloudFront distribution and using AWS WAF geo and IP limiting rules for protection against common web threats, like DDoS.

For more serverless learning resources, visit Serverless Land.

Journey to adopt Cloud-Native DevOps platform Series #1: OfferUp modernized DevOps platform with Amazon EKS and Flagger to accelerate time to market

Post Syndicated from Purna Sanyal original https://aws.amazon.com/blogs/devops/journey-to-adopt-cloud-native-devops-platform-series-1-offerup-modernized-devops-platform-with-amazon-eks-and-flagger-to-accelerate-time-to-market/

In this two part series, we discuss the challenges faced by OfferUp, a Digital Native customer, to meet business growth and time-to-market. Their journey involved modernizing their existing DevOps platform, from the traditional monolith virtual machine (VM) based architecture to modern containerized architecture and running cloud-native applications for secured progressive delivery to accelerate time to market. This series will provide strategies, architecture patterns, and technical steps you can adopt to become more agile and innovative like OfferUp has.

OfferUp engineers were using the homegrown DevOps platform to build and release new services on the marketplace platform. In this first post, we discuss the key challenges encountered by OfferUp engineers with the existing DevOps platform, as well as how OfferUp modernized its DevOps platform with Amazon Elastic Kubernetes Service (Amazon EKS) and Flagger, automating production releases with progressive delivery techniques for faster time-to-market with new products and services. Amazon EKS is a managed container service to run and scale Kubernetes applications in the cloud or on-premises.

Previous DevOps architecture

OfferUp is a leading online and mobile customer to customer (C2C) marketplace where users can both buy and sell goods on the platform. Users can browse and purchase products from a broad range of categories, including furniture, clothing, sports equipment, toys, and many more. As a mobile-first company, OfferUp puts a great deal of emphasis on in-person communication between buyers and sellers.

OfferUp built a home grown, self-managed DevOps platform. This platform used a set of manual processes and third-party applications that allows both developers and operations engineers to build and deploy code to a production environment. The DevOps pipeline included topic areas such as source code control, continuous integration/continuous delivery (CI/CD), microservices, as well as development and test Methodologies. The following diagram depicts the previous architecture of OfferUp’s DevOps platform, which was self-managed on Amazon Elastic Compute Cloud (Amazon EC2).

Figure 1: Previous DevOps architecture of OfferUp

OfferUp used GitHub for code repositories. Once the source code was committed in the code repository, Jenkins pulled the source code from the code repositories on a scheduled or on-demand basis and built Amazon Machine Images (AMIs). The built image was deployed in production by a custom-built deployment tool, Vanaheim, which supports one-box canary deployment and full roll-out deployment strategies. The DevOps engineers would manually create a deployment job in the Vanaheim portal and then manually monitor the test success rate and service metrics to detect any impact from the deployment. Once the success rate was reached, a full production roll-out was performed from the Vanaheim portal.

Key challenges with previous DevOps pipeline

In 2020, OfferUp experienced significant transaction volume growth on its Marketplace platform with the increase of its user base. With OfferUp’s acquisition of LetGo in 2020, there was a need to build a scalable DevOps platform to support future integration and organic growth. The previous DevOps platform, designed and deployed over seven years ago, had reached the limits of its scalability, and could no longer keep up with the platform’s growth. The previous architecture was expensive to run and had a complex infrastructure that made it difficult to upgrade and add new features.

The following key factors drove the push for modernization:

  • Manual verification was required to check if the code was correctly deployed in one of the servers in production, and if the deployment was right in one server, then it was rolled out to other production servers. Full Rollout to production wasn’t automated due to frequent failures requiring manual rollbacks.
  • The previous platform required a longer deployment time (1–2 hours) due to the authoritative batch process, which sometimes caused delays in releasing and testing of new features.
  • The self-managed nature of the Jenkins and Vanaheim clusters was consuming far too much engineering time. Most of the institutional knowledge of this legacy platform was lost over the years and it didn’t align with OfferUp’s philosophy of small DevOps engineering teams. Innovation had stalled partly due to the difficulty of simultaneously upgrading the DevOps platform and releasing new features.

DevOps platform automation with Flagger and Gloo Ingress Controller on Amazon EKS

A key requirement for the next-generation system was that the new architecture would reduce the operational burden on engineering teams, deployment lifecycle, and total cost of ownership. OfferUp evaluated multiple managed container orchestration platforms for its DevOps Platform. It finally selected Amazon EKS for high availability, reducing the average time to deploy a change to the stack from hours to just a few minutes and reducing the complexity in managing and upgrading the Kubernetes cluster. On the Amazon EKS platform, OfferUp uses Flagger, a progressive delivery tool that automates the release process for applications running on Kubernetes. Flagger implements several deployment strategies (Canary releases, A/B testing, and Blue/Green mirroring) using the Gloo Edge ingress controller for traffic routing. Datadog is used as an observability service for monitoring the health of the deployments and effectively managing the canary to progressive delivery. For release analysis, Flagger runs a query on Datadog logs and uses Slack for alerting and notifications. The cloud native technology components of the architecture are described as follows:

Kubernetes and Amazon EKS – Kubernetes is an open-source system for automating the deployment, scaling, and management of containerized applications. Kubernetes is a graduate project in the CNCF. Amazon EKS is a fully-managed, certified Kubernetes conformant service that simplifies the process of building, securing, operating, and maintaining Kubernetes clusters on AWS. Amazon EKS integrates with core AWS services, such as Amazon CloudWatch, Auto Scaling Groups, and AWS Identity and Access Management (IAM) to provide a seamless experience for monitoring, scaling, and load balancing your containerized applications.

Helm – Helm helps you manage Kubernetes applications. Helm Charts define, install, and upgrade even the most complex Kubernetes applications. Charts are easy to create, version, share, and publish. If Kubernetes were an operating system, then Helm would be the package manager. Helm is a graduate project in the CNCF and is maintained by the Helm community.

Flagger – Flagger is a progressive delivery tool that automates the release process for applications running on Kubernetes. Flagger implements a control loop that gradually shifts traffic to the canary while measuring key performance indicators such as HTTP requests success rate, requests average duration, and pods health. Based on the set thresholds, a canary is either promoted or aborted and its analysis is pushed to a Slack channel. Flagger became a CNCF project – part of the Flux family of GitOps tools.

Gloo EdgeGloo Edge is a feature-rich, Kubernetes-native ingress controller. Gloo Edge is exceptional in its function-level routing; its support for legacy apps, microservices, and serverless; its discovery capabilities; and its tight integration with leading open-source projects. Gloo Edge is uniquely designed to support hybrid applications, in which multiple technologies, architectures, protocols, and clouds can coexist.

Observability platformDatadog’s integrations with Kubernetes, Docker, and AWS will let you track the full range of Amazon EKS metrics, as well as logs and performance data from your cluster and applications. Datadog gives you comprehensive coverage of your dynamic infrastructure and applications with features like auto discovery to track services across containers, sophisticated graphing, and alerting options.

Modernized DevOps architecture

In the new architecture, OfferUp uses Github as a version control tool and Github actions as their CI/CD tool. On every Pull request, tests are run, artifacts are built and stored in the JFrog Artifactory, and docker Images are stored in the Amazon Elastic Container Registry (Amazon ECR). Separate deployment pipelines are triggered based on the environment (dev, staging, and production) of choice. Flagger detects any changes in the version of the application and gradually shifts production traffic to the canary. It measures the requests success rate and average response duration metrics from Datadog to decide full rollout in production. For an application deployment, a canary promotion can be defined using Flagger’s custom resource. Flagger rolls back the deployment when the success rate falls below the defined desired success rate metrics.

Figure 2: Modernized DevOps architecture of OfferUp

With the modernized DevOps platform, OfferUp moved from a monolithic to a microservice architecture, where front-end applications and GraphQL run on the Amazon EKS cluster. The production cluster runs 110 services and 650+ pods on 60 nodes. The cluster scales up to 100 nodes with an Auto Scaling group based on the traffic pattern. On the networking front, the cluster has a private endpoint and uses both the VPC CNI plugin and the CoreDNS add-on. There are four Amazon EKS clusters, one each for the production, test, utility, and staging environments. OfferUp plans to explore the Karpenter open-source autoscaling project, and it will move new applications to the Amazon EKS cluster, allowing the total node count to scale up to 200.

Benefits of modernized architecture

The new architecture helped OfferUp make automated decisions to deploy new releases and improve time to market while reducing unplanned production downtime:

  • Faster deployments and Quicker rollbacks – The new architecture reduces the Service Deployment time from one hour down to five minutes, and automates rollback time to five minutes from the manual rollback time of one hour.
  • Automate deployment of new releases – The lack of canary deployment processes in the previous architecture required OfferUp engineers to manually intervene to validate the deployment status, which led to administrative overhead and production outages. The canary deployments take care of the traffic shifting by automatically measuring the requests’ success rate and latency metrics from Datadog and subsequently release the service to production. Deployments are automatically rolled back when the success rate falls below the defined success rate metric thresholds.
  • Simplified Configuration – Configuration has been simplified drastically and integrated within the CI/CD pipeline in the new architecture, thereby reducing configuration complexity, eliminating manual processes, and saving Developers time.
  • More time to Focus on Innovation – With fully automated progressive delivery, the developers no longer need to spend time testing and releasing source code in production. Similarly, migrating from a Self-managed DevOps platform to the Managed Amazon EKS services lowered the DevOps platform’s infrastructure management burden on the engineering team. This helps developers spend more time focusing on building and testing new features and innovations.
  • Cost reduction – Moving from the self-managed Amazon EC2-based architecture to the Amazon EKS cluster reduced the cost of operations through shared nodes and improved pod density. The previous architecture used 200 Amazon EC2 instances. The same workload was moved to a 50-node Amazon EKS cluster. Furthermore, custom applications (Vanaheim and Jenkins) were retired, further reducing costs.

Conclusion

In this post, you see how OfferUp embarked on the journey to modernize its DevOps platform to support its growth and developers’ velocity. The key factors that drove the modernization decisions were the ability to scale the platform to support the automated testing of features in production, the faster release of new features, cost reduction, and to facilitate future innovation. The modernized DevOps platform on Amazon EKS also decreased the ongoing operational support burden for engineers, and the scalability of the design opens up a lot of headroom for growth.

We encourage you to look into modernizing your existing CI/CD pipeline on Amazon EKS with the Flagger progressive delivery mechanism. Amazon EKS removes the undifferentiated heavy lifting of managing and updating the Kubernetes cluster. Managed node groups automate the provisioning and lifecycle management of worker nodes in an Amazon EKS cluster, which greatly simplifies operational activities, such as new Kubernetes version deployments.

In the next part of the series, you’ll discover how to implement Flagger and Gloo Edge Ingress Controller on Amazon EKS to automate the release process for applications running on Kubernetes.

Further Reading

Journey to adopt Cloud-Native DevOps platform Series #2: Progressive delivery on Amazon EKS with Flagger and Gloo Edge Ingress Controller

About the authors:

Purna Sanyal

Purna Sanyal is a technology enthusiast and an architect at AWS, helping digital native customers solve their business problems with successful adoption of cloud native architecture. He provides technical thought leadership, architecture guidance, and conducts PoCs to enable customers’ digital transformation. He is also passionate about building innovative solutions around Kubernetes, database, analytics, and machine learning.

Alan Liu

Alan Liu is Sr Director of Engineering at OfferUp. He is a technology enthusiast who has worked across a wide variety of industries. He is a highly effective, adaptable, and experienced leader with a proven track record.

Monitoring shared AWS Outposts rack capacity

Post Syndicated from Sheila Busser original https://aws.amazon.com/blogs/compute/monitoring-shared-aws-outposts-rack-capacity/

This post is written by Adam Imeson, Sr. Hybrid Edge Specialist Solutions Architect.

AWS Outposts rack is a fully-managed service that offers the same AWS infrastructure, APIs, tools, and a subset of AWS services to any data center, colocation space, or on-premises facility for a consistent hybrid experience. Outposts rack is ideal for workloads that require low latency, access to on-premises systems, local data processing, data residency, and migration of applications with local system interdependencies.

An Outpost is a pool of AWS compute and storage capacity deployed at a customer site. In an Outposts rack deployment, an Outpost may comprise one or more racks connected together at the site. It’s common for customers to order their Outpost in a dedicated account and then integrate with their multi-account organizational architecture by sharing the Outpost via AWS Resource Access Manager (AWS RAM). This post will explain how to set up cross-account Amazon CloudWatch metrics so that disparate stakeholders within your organization can effectively monitor your Outpost’s capacity to meet their specific needs.

Overview

The AWS account that you use to order an Outpost owns that Outpost. This includes all metrics and health events pertaining to that Outpost. Many customers must integrate Outposts into their multi-account environments, as discussed in the “Best practices: AWS Outposts in a multi-account AWS environment” posts (part 1 and part 2). This post will go into more detail on how to monitor Outposts in these environments.

The nuance here stems from the different ways to share access to AWS resources. AWS RAM allows infrastructure resources to be shared across multiple accounts. Then, the consumer accounts can launch resources on the infrastructure as though they owned it. AWS Identity and Access Management (IAM) allows customers to modify a given account’s permissions such that users in other accounts can make AWS API calls that affect the given account.

An Outpost provides infrastructure resources, so customers can share Outposts via AWS RAM. CloudWatch metrics about Outposts are data which customers retrieve using AWS API calls, so customers can share access to those metrics using IAM.

In a typical customer’s AWS Organization, there are two cases to consider. First, when the customer is sharing an Outpost to multiple development accounts, each account needs to view metrics relevant to the Outpost so that the development accounts can deploy and operate their applications.

Diagram depicting an Outpost that a customer has shared to three different accounts using RAM. The three different accounts each have a different application deployed in them.

Second, when the customer has several accounts that each own different Outposts, the customer’s centralized monitoring account needs to track metrics relevant to each of the Outposts.

Diagram depicting three accounts that each own a separate Outpost, with all three accounts sharing Outpost metrics to CloudWatch in the customer’s central monitoring account.

This post will explain strategies for both cases.

Customers must monitor the health of the Outpost’s connection to its regional control plane (the Outpost’s service link), as an Outpost is an extension of an AWS Availability Zone (AZ) and is designed to be connected to an AZ at all times. The health of the Outpost’s service link is a crucial variable when application owners are diagnosing disruptions to their application, and also when infrastructure owners are diagnosing disruptions to a site. Customers can monitor their service link’s status with the ConnectedStatus metric.
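
If you prefer to pull the ConnectedStatus metric programmatically rather than through the console, the following is a minimal boto3 sketch. It assumes the metric lives in the AWS/Outposts CloudWatch namespace with an OutpostId dimension, and the Outpost ID shown is a placeholder.

import boto3
from datetime import datetime, timedelta

# Assumes Outposts metrics are published in the AWS/Outposts CloudWatch
# namespace with an OutpostId dimension; the Outpost ID is a placeholder.
cloudwatch = boto3.client("cloudwatch")

response = cloudwatch.get_metric_statistics(
    Namespace="AWS/Outposts",
    MetricName="ConnectedStatus",
    Dimensions=[{"Name": "OutpostId", "Value": "op-0123456789abcdef0"}],
    StartTime=datetime.utcnow() - timedelta(hours=1),
    EndTime=datetime.utcnow(),
    Period=300,
    Statistics=["Average"],
)

# An average of 1 indicates the service link was connected for the whole
# period; anything below 1 suggests a disconnection during that period.
for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Average"])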

Customers also must monitor their Outposts’ current capacity. Outposts necessarily have a limited capacity footprint when compared to an AWS Region. Application owners must make informed decisions about capacity as they scale their apps over time or respond to occasional hardware failures. Infrastructure owners also must maintain a holistic view of capacity across all of the Outposts for which they are responsible so that they can plan for capacity expansion over time. Customers can monitor their Outposts’ capacity using the various capacity metrics that Outposts provide.

For an overview of how to set up a capacity dashboard and capacity-based CloudWatch alarms within a single account, see “Monitoring AWS Outposts capacity.” This post will expand on the single-account strategy by introducing cross-account capabilities. See also “Cross-Account Cross-Region Dashboards with Amazon CloudWatch.” These two posts provide practical walkthroughs for setting up the metric flows explained below.

Setting up Outposts metric permissions for your organization

This post assumes that you have multiple Outposts in different accounts that are all part of the same Organization. You’re sharing these Outposts into accounts that development teams use to deploy and operate their applications. You also have a centralized monitoring account where your infrastructure team tracks various metrics across all accounts. Your Organization might look something like this:

A base diagram depicting six AWS accounts with different names. Outpost Account 1 contains an Outpost. Outpost Account 2 contains a different Outpost. Monitoring Account contains Amazon CloudWatch. Accounts A through C contain Applications A through C respectively.

The first Outpost is shared to Accounts A and B, and the second Outpost is only shared to Account B. This is just an example of how a customer might set up their environment so that Application A can deploy on Outpost 1, and Application B can deploy on both Outpost 1 and 2.

The same base diagram of the six AWS accounts as before, with arrows added to depict AWS RAM resource shares. Outpost Account 1 shares its Outpost to Accounts A and B. Outpost Account 2 shares its Outpost to Account B.

To enable centralized monitoring, each account shares CloudWatch metrics with the central monitoring account as described in “Cross-Account Cross-Region Dashboards with Amazon CloudWatch.”

The same base diagram of the six AWS accounts, with arrows added to depict CloudWatch metrics being shared from all five of the other accounts to the Monitoring Account.

Now there are application accounts which can launch on the desired Outposts, and all of the accounts are sharing metrics with the central monitoring account. The team responsible for procuring and managing the Outposts can now set up dashboards in the central monitoring account in accordance with “Monitoring AWS Outposts capacity” to get a holistic view of capacity. This is valuable for capacity planning as applications naturally grow over time.

However, this may not be sufficient for operations. Consider that each application team needs to understand how much capacity is available on the Outpost that they’re using. This is crucial for teams operating highly available applications to maintain awareness of whether they still have N+1 capacity available on the Outpost to use in the event of a hardware failure. This is also important for planning expansions to the application ahead of time, as application teams have the best understanding of the future needs of their applications. Finally, application teams can use the metrics to track the operational health of the Outpost, which is crucial for root-causing any application disruptions.

You can implement this by sharing CloudWatch metrics from the Outpost accounts to the application accounts which are consuming the Outposts’ capacity, as shown in the following diagram.

The same base diagram of the six AWS accounts, with arrows depicting CloudWatch metrics being shared. Outpost Account 1 is sharing CloudWatch metrics to Accounts A and B. Outpost Account 2 is sharing CloudWatch metrics to Account B.

Walkthrough

Log in to your application account and navigate to the CloudWatch console. Open the Settings menu and choose Configure.

Screenshot of the CloudWatch Console’s Settings menu.

Scroll to the bottom. In the View cross-account cross-region section, choose Edit.

Screenshot of the Cross-account cross-region sub-menu in the CloudWatch console.

Choose your preferred account selection method from the three options and choose Save changes. I recommend the Custom account selector option, as it strikes a good balance between a simple setup and ease of use. If you choose this option, then input the Outpost owner account’s account ID and a human-readable name for the account. This name will appear in the drop-down when you’re using the CloudWatch console to view metrics from other accounts later.

Screenshot of the Cross-account cross-region sub-menu in the CloudWatch console, with the “Custom account selector” option selected and “123456789012 Outpost owner account” in the input field.

Your application account is now prepared to view metrics from the Outpost owner account. Now log in to the account that owns the Outpost and navigate to the CloudWatch console. You still need to share the Outpost’s metrics to the application account. Open the Settings page again, and choose Configure in the Cross-account cross-region section as before. This time, choose Share data in the Share your CloudWatch data section:

Screenshot of the Cross-account cross-region sub-menu in the CloudWatch console, with the “Share data” button circled in red in the “Share your CloudWatch data” section.

Choose Add account and input the application account’s account ID. Then scroll to the bottom of the page and choose Launch CloudFormation template.

Screenshot of the “Share your CloudWatch data” sub-menu in the CloudWatch console. The “Specific accounts” option in the “Sharing” section is highlighted, and the sample account ID “234567890123” is typed into the input field.

The AWS CloudFormation template will create the CloudWatch-CrossAccountSharingRole. This role gives CloudWatch read access to the AWS account that you specified, the application account. You can view and modify this role using the IAM console if you want to. For example, you might adjust the role to allow read access to an entire Organizational Unit (OU).
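
If you’d rather inspect or adjust the role programmatically than through the IAM console, here is a hedged boto3 sketch. The role name comes from the paragraph above; the organization path used in the condition is a placeholder, and widening trust to an entire OU with the aws:PrincipalOrgPaths condition key is shown as one possible approach, not a prescribed configuration.

import json
import boto3

iam = boto3.client("iam")

# Inspect the trust policy created by the CloudFormation template.
role = iam.get_role(RoleName="CloudWatch-CrossAccountSharingRole")
print(json.dumps(role["Role"]["AssumeRolePolicyDocument"], indent=2))

# One possible adjustment (illustration only): trust every account under a
# specific OU using the aws:PrincipalOrgPaths condition key. The organization,
# root, and OU IDs below are placeholders.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": "*"},
            "Action": "sts:AssumeRole",
            "Condition": {
                "ForAnyValue:StringLike": {
                    "aws:PrincipalOrgPaths": ["o-exampleorgid/r-exampleroot/ou-exampleou/*"]
                }
            },
        }
    ],
}
iam.update_assume_role_policy(
    RoleName="CloudWatch-CrossAccountSharingRole",
    PolicyDocument=json.dumps(trust_policy),
)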

Now, log back in to the application account and navigate to the CloudWatch console. Choose All metrics in the left-side menu. In the Metrics section, select the Outpost owner account from the drop-down.

Screenshot of the CloudWatch console’s “All metrics” sub-page. The account selection drop-down is circled in red in the “Metrics” subsection.

You can now view the metrics from the Outpost owner account and incorporate them into the dashboards in the application account. Now the application teams can track the Outposts’ ConnectedStatus metrics to be alerted on any disconnections from the region, and they can track the Outposts’ capacity metrics as well. It’s a best practice to alarm on Outpost capacity metrics once a consumption threshold defined by business needs has been breached.
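
As a sketch of what such an alarm could look like in boto3, the example below fires when an assumed capacity utilization metric stays above 80%. The metric name (InstanceTypeCapacityUtilization), its dimensions, and the 80% threshold are assumptions for illustration; substitute the capacity metrics and threshold that fit your Outpost and business needs.

import boto3

cloudwatch = boto3.client("cloudwatch")

# The metric name, dimensions, and threshold below are assumptions for
# illustration; use the Outposts capacity metrics and the business-defined
# threshold that apply to your environment.
cloudwatch.put_metric_alarm(
    AlarmName="outpost-c5-large-capacity-above-80-percent",
    Namespace="AWS/Outposts",
    MetricName="InstanceTypeCapacityUtilization",
    Dimensions=[
        {"Name": "OutpostId", "Value": "op-0123456789abcdef0"},  # placeholder
        {"Name": "InstanceType", "Value": "c5.large"},
    ],
    Statistic="Average",
    Period=300,
    EvaluationPeriods=3,
    Threshold=80.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmDescription="Example: c5.large capacity utilization above 80% on the Outpost",
)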

Conclusion

Outposts rack allows customers to deploy AWS infrastructure into virtually any data center, colocation space, or on-premises facility. Outposts are tied to the AWS account that ordered them, and customers can share Outposts among AWS accounts within the same Organization. When multiple teams within a customer’s Organization are interacting with the same Outpost, that introduces additional monitoring surface area for capacity and service health. This post explains how customers can accommodate their teams’ different needs by sharing Outposts metrics around their Organization along with their Outposts. As best practices, customers should share their Outposts capacity and ConnectedStatus metrics to teams who are running applications on Outposts. Customers’ operations teams should also work with their stakeholders to define a maximum capacity utilization threshold for a given Outpost and alarm on that threshold.

Our guide to AWS Compute at re:Invent 2022

Post Syndicated from Sheila Busser original https://aws.amazon.com/blogs/compute/our-guide-to-aws-compute-at-reinvent-2022/

This blog post is written by Shruti Koparkar, Senior Product Marketing Manager, Amazon EC2.

AWS re:Invent is the most transformative event in cloud computing, and it starts on November 28, 2022. The AWS Compute team has many exciting sessions planned for you, covering everything from foundational content to technology deep dives, customer stories, and even hands-on workshops. To help you build out your calendar for this year’s re:Invent, let’s look at some highlights from the AWS Compute track in this blog. Please visit the session catalog for a full list of AWS Compute sessions.

Learn what powers AWS Compute

AWS offers the broadest and deepest functionality for compute. Amazon Elastic Compute Cloud (Amazon EC2) offers granular control for managing your infrastructure with the choice of processors, storage, and networking.

The AWS Nitro System is the underlying platform for all our modern EC2 instances. It enables AWS to innovate faster, further reduce cost for our customers, and deliver added benefits like increased security and new instance types.

Discover the benefits of AWS Silicon

AWS has invested years designing custom silicon optimized for the cloud. This investment helps us deliver high performance at lower costs for a wide range of applications and workloads using AWS services.

  • Explore the AWS journey into silicon innovation with our “CMP201: Silicon Innovation at AWS” session. We will cover some of the thought processes, learnings, and results from our experience building silicon for AWS Graviton, AWS Nitro System, and AWS Inferentia.
  • To learn about customer-proven strategies to help you make the move to AWS Graviton quickly and confidently while minimizing uncertainty and risk, attend “CMP410: Framework for adopting AWS Graviton-based instances”.

Explore different use cases

Amazon EC2 provides secure and resizable compute capacity for several different use cases, including general purpose computing for cloud native and enterprise applications, and accelerated computing for machine learning and high performance computing (HPC) applications.

High performance computing

  • HPC on AWS can help you design your products faster with simulations, predict the weather, detect seismic activity with greater precision, and more. To learn how to solve the world’s toughest problems with extreme-scale compute, come join us for “CMP205: HPC on AWS: Solve complex problems with pay-as-you-go infrastructure”.
  • Single on-premises general-purpose supercomputers can fall short when solving increasingly complex problems. Attend “CMP222: Redefining supercomputing on AWS” to learn how AWS is reimagining supercomputing to provide scientists and engineers with more access to world-class facilities and technology.
  • AWS offers many solutions to design, simulate, and verify the advanced semiconductor devices that are the foundation of modern technology. Attend “CMP320: Accelerating semiconductor design, simulation, and verification” to hear from Arm and Marvell about how they are using AWS to accelerate EDA workloads.

Machine Learning

Cost Optimization

Hear from our customers

We have several sessions this year where AWS customers are taking the stage to share their stories and details of exciting innovations made possible by AWS.

Get started with hands-on sessions

There’s nothing like a hands-on session where you can learn by doing and get started easily with AWS Compute. Our speakers and workshop assistants will help you every step of the way. Just bring your laptop to get started!

You’ll get to meet the global cloud community at AWS re:Invent and get an opportunity to learn, get inspired, and rethink what’s possible. So build your schedule in the re:Invent portal and get ready to hit the ground running. We invite you to stop by the AWS Compute booth and chat with our experts. We look forward to seeing you in Las Vegas!

Doubling down on local development with Workers: Miniflare meets workerd

Post Syndicated from Brendan Coll original https://blog.cloudflare.com/miniflare-and-workerd/

Local development gives you a fully-controllable and easy-to-debug testing environment. At the start of this year, we brought this experience to Workers developers by launching Miniflare 2.0: a local Cloudflare Workers simulator. Miniflare 2 came with features like step-through debugging support, detailed console.logs, pretty source-mapped error pages, live reload and a highly-configurable unit testing environment. Not only that, but we also incorporated Miniflare into Wrangler, our Workers CLI, to enable wrangler dev's --local mode.

Today, we’re taking local development to the next level! In addition to introducing new support for migrating existing projects to your local development environment, we’re making it easier to work with your remote data—locally! Most importantly, we’re releasing a much more accurate Miniflare 3, powered by the recently open-sourced workerd runtime—the same runtime used by Cloudflare Workers!

Enabling local development with workerd

One of the superpowers of having a local development environment is that you can test changes without affecting users in production. A great local environment offers a level of fidelity on par with production.

The way we originally approached local development was with Miniflare 2, which reimplemented Workers runtime APIs in JavaScript. Unfortunately, there were subtle behavior mismatches between these re-implementations and the real Workers runtime. These types of issues are really difficult for developers to debug, as they don’t appear locally, and step-through debugging of deployed Workers isn’t possible yet. For example, the following Worker returns responses successfully in Miniflare 2, so we might assume it’s safe to publish:

let cachedResponsePromise;
export default {
  async fetch(request, env, ctx) {
    // Let's imagine this fetch takes a few seconds. To speed up our worker, we
    // decide to only fetch on the first request, and reuse the result later.
    // This works fine in Miniflare 2, so we must be good right?
    cachedResponsePromise ??= fetch("https://example.com");
    return (await cachedResponsePromise).clone();
  },
};

However, as soon as we send multiple requests to our deployed Worker, it fails with Error: Cannot perform I/O on behalf of a different request. The problem here is that response bodies created in one request’s handler cannot be accessed from a different request’s handler. This limitation allows Cloudflare to improve overall Worker performance, but it was almost impossible for Miniflare 2 to detect these types of issues locally. In this particular case, the best solution is to cache using fetch itself.

Additionally, because the Workers runtime uses a very recent version of V8, it supports some JavaScript features that aren’t available in all versions of Node.js. This meant a few features implemented in Workers, like Array#findLast, weren’t always available in Miniflare 2.

With the Workers runtime now open-sourced, Miniflare 3 can leverage the same implementations that are deployed on Cloudflare’s network, giving bug-for-bug compatibility and practically eliminating behavior mismatches. 🎉

Miniflare 3's new simplified architecture using workerd

This radically simplifies our implementation too. We were able to remove over 50,000 lines of code from Miniflare 2. Of course, we still kept all the Miniflare special-sauce that makes development fun like live reload and detailed logging. 🙂

Local development with real data

We know that many developers choose to test their Workers remotely on the Cloudflare network as it gives them the ability to test against real data. Testing against fake data in staging and local environments is sometimes difficult, as it never quite matches the real thing.

With Miniflare 3, we’re blurring the lines between local and remote development, by bringing real data to your machine as an experimental opt-in feature. If enabled, Miniflare will read and write data to namespaces on the Cloudflare network, as your Worker would when deployed. This is only supported with Workers KV for now, but we’re exploring similar solutions for R2 and D1.

Miniflare's system for accessing real KV data; reads and writes are cached locally for future accesses

A new default for Wrangler

With Miniflare 3 now effectively as accurate as the real Workers environment, and with the ability to access real data locally, we’re revisiting the decision to make remote development the initial Wrangler experience. In a future update, wrangler dev will run in local mode by default, and the --local flag will no longer be required. Benchmarking suggests this will bring an approximate 10x reduction in startup time and a massive 60x reduction in script reload times! Over the next few weeks, we’ll be focusing on further optimizing Wrangler’s performance to bring you the fastest Workers development experience yet!

wrangler init --from-dash

We want all developers to be able to take advantage of the improved local experience, so we’re making it easy to start a local Wrangler project from an existing Worker that’s been developed in the Cloudflare dashboard. With Node.js installed, run npx wrangler init --from-dash <your_worker_name> in your terminal to set up a new project with all your existing code and bindings, such as KV namespaces, configured. You can now seamlessly continue development of your application locally, taking advantage of all the developer experience improvements Wrangler and Miniflare provide. When you’re ready to deploy your worker, run npx wrangler publish.

Looking to the future

Over the next few months, the Workers team is planning to further improve the local development experience with a specific focus on automated testing. Already, we’ve released a preliminary API for programmatic end-to-end tests with wrangler dev, but we’re also investigating ways of bringing Miniflare 2’s Jest/Vitest environments to workerd. We’re also considering creating extensions for popular IDEs to make developing workers even easier. 👀

Miniflare 3.0 is now included in Wrangler! Try it out by running npx wrangler@latest dev --experimental-local. Let us know what you think in the #wrangler channel on the Cloudflare Developers Discord, and please open a GitHub issue if you hit any unexpected behavior.

Keep track of Workers’ code and configuration changes with Deployments

Post Syndicated from Kabir Sikand original https://blog.cloudflare.com/deployments-for-workers/

Today we’re happy to introduce Deployments for Workers. Deployments allow developers to keep track of changes to their Worker; not just the code, but the configuration and bindings as well. With deployments, developers now have access to a powerful audit log of changes to their production applications.

And tracking changes is just the beginning! Deployments provide a strong foundation to add: automated deployments, rollbacks, and integration with version control.

Today we’ll dive into the details of deployments, how you can use them, and what we’re thinking about next.

Deployments

Deployments are a powerful new way to track changes to your Workers. With them, you can track who’s making changes to your Workers, where those changes are coming from, and when those changes are being made.

Cloudflare reports on deployments made from wrangler, API, dashboard, or Terraform anytime you make changes to your Worker’s code, edit resource bindings and environment variables, or modify configuration like name or usage model.

We expose the source of your deployments, so you can track where changes are coming from. For example, if you have a CI job that’s responsible for changes, and you see a user made a change through the Cloudflare dashboard, it’s easy to flag that and dig into whether the deployment was a mistake.

Interacting with deployments

Cloudflare tracks the authors, sources, and timestamps of deployments. If you have a set of users responsible for deployment, or an API token that’s associated with your CI tool, it’s easy to see which of them made recent deployments. Each deployment also includes a timestamp, so you can track when those changes were made.

You can access all this deployment information in your Cloudflare dashboard, under your Worker’s Deployments tab. We also report on the active version right at the front of your Worker’s detail page. Wrangler will also report on deployment information. wrangler publish now reports the latest deployed version, and a new `wrangler deployments` command can be used to view a deployment history.

To learn more about the details of deployments, head over to our Developer Documentation.

What’s next?

We’re excited to share deployments with our customers, available today in an open beta. As we mentioned up front, we’re just getting started with deployments. We’re also excited for more on-platform tooling like rollbacks, deploy status, deployment rules, and a view-only mode to historical deployments. Beyond that, we want to ensure deployments can be automated from commits to your repository, which means working on version control integrations to services like GitHub, Bitbucket, and Gitlab. We’d love to hear more about how you’re currently using Workers and how we can improve developer experience. If you’re interested, let’s chat.

If you’d like to join the conversation, head over to Cloudflare’s Developer Discord and give us a shout! We love hearing from our customers, and we’re excited to see what you build with Cloudflare.

UPDATE Supercloud SET status = ‘open alpha’ WHERE product = ‘D1’;

Post Syndicated from Nevi Shah original https://blog.cloudflare.com/d1-open-alpha/

In May 2022, we announced our quest to simplify databases – building them, maintaining them, integrating them. Our goal is to empower you with the tools to run a database that is powerful, scalable, with world-beating performance without any hassle. And we first set our sights on reimagining the database development experience for every type of user – not just database experts.

Over the past couple of months, we’ve been working to create just that, while learning some very important lessons along the way. As it turns out, building a global relational database product on top of Workers pushes the boundaries of the developer platform to their absolute limit, and often beyond them, but in a way that’s absolutely thrilling to us at Cloudflare. It means that while our progress might seem slow from the outside, every improvement, bug fix, or stress test helps lay down a path for all of our customers to build the world’s most ambitious serverless applications.

However, as we continue down the road to making D1 production ready, it wouldn’t be “the Cloudflare way” unless we stopped for feedback first – even though it’s not quite finished yet. In the spirit of Developer Week, there is no better time to introduce the D1 open alpha!

An “open alpha” is a new concept for us. You’ll likely hear the term “open beta” in various announcements at Cloudflare, and while it makes sense for many products here, it wasn’t quite right for D1. There are still some crucial pieces in active development and testing, so before we release the fully-formed D1 as a public beta for you to start building real-world apps with, we want to make sure everybody can start to get a feel for the product on their hobby apps or side projects.

What’s included in the alpha?

While a lot is still changing behind the scenes with D1, we’ve put a lot of thought into how you, as a developer, interact with it – even if you’re new to databases.

Using the D1 dashboard

In a few clicks you can get your D1 database up and running right from within your dashboard. In our D1 interface, you can create, maintain and view your database as you please. Changes made in the UI are instantly available to your Worker – no redeploy required!

Use Wrangler

If you’re looking to get your hands a little dirty, you can also work with your database using our Wrangler CLI. Create your database and begin adding your data manually, or bootstrap your database in one of two ways:

1.  Execute an SQL file

$ wrangler d1 execute my-database-name --file ./customers.sql

where your .sql file looks something like this:

customers.sql

DROP TABLE IF EXISTS Customers;
CREATE TABLE Customers (CustomerID INT, CompanyName TEXT, ContactName TEXT, PRIMARY KEY (`CustomerID`));
INSERT INTO Customers (CustomerID, CompanyName, ContactName) 
VALUES (1, 'Alfreds Futterkiste', 'Maria Anders'),(4, 'Around the Horn', 'Thomas Hardy'),(11, 'Bs Beverages', 'Victoria Ashworth'),(13, 'Bs Beverages', 'Random Name');

2. Create and run migrations

Migrations are a way to version your database changes. With D1, you can create a migration and then apply it to your database.

To create the migration, execute:

wrangler d1 migrations create <my-database-name> <short description of migration>

This will create an SQL file in a migrations folder where you can then go ahead and add your queries. Then apply the migrations to your database by executing:

wrangler d1 migrations apply <my-database-name>

Access D1 from within your Worker

You can attach your D1 to a Worker by adding the D1 binding to your wrangler.toml configuration file. Then interact with D1 by executing queries inside your Worker like so:

export default {
 async fetch(request, env) {
   const { pathname } = new URL(request.url);

   if (pathname === "/api/beverages") {
     const { results } = await env.DB.prepare(
       "SELECT * FROM Customers WHERE CompanyName = ?"
     )
       .bind("Bs Beverages")
       .all();
     return Response.json(results);
   }

   return new Response("Call /api/beverages to see Bs Beverages customers");
 },
};

Or access D1 from within your Pages Function

In this Alpha launch, D1 also supports integration with Cloudflare Pages! You can add a D1 binding inside the Pages dashboard, and write your queries inside a Pages Function to build a full-stack application! Check out the full documentation to get started with Pages and D1.

Community built tooling

During our private alpha period, the excitement behind D1 led to some valuable contributions to the D1 ecosystem and developer experience by members of the community. Here are some of our favorite projects to date:

d1-orm

An Object Relational Mapping (ORM) is a way for you to query and manipulate data by using JavaScript. Created by a Cloudflare Discord Community Champion, the d1-orm seeks to provide a strictly typed experience while using D1:

const users = new Model(
    // table name, primary keys, indexes etc
    tableDefinition,
    // column types, default values, nullable etc
    columnDefinitions
)

// TS helper for typed queries
type User = Infer<typeof users>;

// ORM-style query builder
const user = await users.First({
    where: {
        id: 1,
    },
});

You can check out the full documentation, and provide feedback by making an issue on the GitHub repository.

workers-qb

This is a zero-dependency query builder that provides a simple standardized interface while keeping the benefits and speed of using raw queries over a traditional ORM. While not intended to provide ORM-like functionality, workers-qb makes it easier to interact with the database from code for direct SQL access:

const qb = new D1QB(env.DB)

const fetched = await qb.fetchOne({
  tableName: 'employees',
  fields: 'count(*) as count',
  where: {
    conditions: 'department = ?1',
    params: ['HQ'],
  },
})

You can read more about the query builder here.

d1-console

Instead of running the wrangler d1 execute command in your terminal every time you want to interact with your database, you can interact with D1 from within the d1-console. Created by a Discord Community Champion, this gives the benefit of executing multi-line queries, obtaining command history, and viewing a cleanly formatted table output.

While this is a community project today, we plan to natively support a “D1 Console” in the future. For now, get started by checking out the d1-console package here.

D1 adapter for Kysely

Kysely is a type-safe and autocompletion-friendly TypeScript SQL query builder. With this adapter you can interact with D1 using the familiar Kysely interface:

// Create Kysely instance with kysely-d1
const db = new Kysely<Database>({ 
  dialect: new D1Dialect({ database: env.DB })
});
    
// Read row from D1 table
const result = await db
  .selectFrom('kv')
  .selectAll()
  .where('key', '=', key)
  .executeTakeFirst();

Check out the project here.

What’s still in testing?

The biggest pieces that have been disabled for this alpha release are replication and JavaScript transaction support. While we’ll be rolling out these changes gradually, we want to call out some limitations that exist today that we’re actively working on testing:

  • Database location: Each D1 database only runs a single instance. It’s created close to where you, as the developer, create the database, and does not currently move regions based on access patterns. Workers running elsewhere in the world will see higher latency as a result.
  • Concurrency limitations: Under high load, read and write queries may be queued rather than triggering new replicas to be created. As a result, the performance & throughput characteristics of the open alpha won’t be representative of the final product.
  • Availability limitations: Backups will block access to the DB while they’re running. In most cases this should only be a second or two, and any requests that arrive during the backup will be queued.

You can also check out a more detailed, up-to-date list on D1 alpha Limitations.

Request for feedback

While we can make all sorts of guesses and bets on the kind of databases you want to use D1 for, we are not the users – you are! We want developers from all backgrounds to preview the D1 tech at its early stages, and let us know where we need to improve to make it suitable for your production apps.

For general feedback about your experience and to interact with other folks in the alpha, join our #d1-open-alpha channel in the Cloudflare Developers Discord. We plan to make any important announcements and changes in this channel as well as on our monthly community calls.

To file more specific feature requests (no matter how wacky) and report any bugs, create a thread in the Cloudflare Community forum under the D1 category. We will be maintaining this forum as a way to plan for the months ahead!

Get started

Want to get started right away? Check out our D1 documentation to get started today. Build our classic Northwind Traders demo to explore the D1 experience and deploy your first D1 database!

Introducing the price-capacity-optimized allocation strategy for EC2 Spot Instances

Post Syndicated from Sheila Busser original https://aws.amazon.com/blogs/compute/introducing-price-capacity-optimized-allocation-strategy-for-ec2-spot-instances/

This blog post is written by Jagdeep Phoolkumar, Senior Specialist Solution Architect, Flexible Compute and Peter Manastyrny, Senior Product Manager Tech, EC2 Core.

Amazon EC2 Spot Instances are unused Amazon Elastic Compute Cloud (Amazon EC2) capacity in the AWS Cloud available at up to a 90% discount compared to On-Demand prices. One of the best practices for using EC2 Spot Instances is to be flexible across a wide range of instance types to increase the chances of getting the aggregate compute capacity. Amazon EC2 Auto Scaling and Amazon EC2 Fleet make it easy to configure a request with a flexible set of instance types, as well as use a Spot allocation strategy to determine how to fulfill Spot capacity from the Spot Instance pools that you provide in your request.

The existing allocation strategies available in Amazon EC2 Auto Scaling and Amazon EC2 Fleet are called “lowest-price” and “capacity-optimized”. The lowest-price allocation strategy allocates Spot Instance pools where the Spot price is currently the lowest. Customers told us that in some cases the lowest-price strategy picks the Spot Instance pools that are not optimized for capacity availability and results in more frequent Spot Instance interruptions. As an improvement over lowest-price allocation strategy, in August 2019 AWS launched the capacity-optimized allocation strategy for Spot Instances, which helps customers tap into the deepest Spot Instance pools by analyzing capacity metrics. Since then, customers have seen a significantly lower interruption rate with capacity-optimized strategy when compared to the lowest-price strategy. You can read more about these customer stories in the Capacity-Optimized Spot Instance Allocation in Action at Mobileye and Skyscanner blog post. The capacity-optimized allocation strategy strictly selects the deepest pools. Therefore, sometimes it can pick high-priced pools even when there are low-priced pools available with marginally less capacity. Customers have been telling us that, for an optimal experience, they would like an allocation strategy that balances the best trade-offs between lowest-price and capacity-optimized.

Today, we’re excited to share the new price-capacity-optimized allocation strategy that makes Spot Instance allocation decisions based on both the price and the capacity availability of Spot Instances. The price-capacity-optimized allocation strategy should be the first preference and the default allocation strategy for most Spot workloads.

This post illustrates how the price-capacity-optimized allocation strategy selects Spot Instances in comparison with lowest-price and capacity-optimized. Furthermore, it discusses some common use cases of the price-capacity-optimized allocation strategy.

Overview

The price-capacity-optimized allocation strategy makes Spot allocation decisions based on both capacity availability and Spot prices. In comparison to the lowest-price allocation strategy, the price-capacity-optimized strategy doesn’t always attempt to launch in the absolute lowest priced Spot Instance pool. Instead, price-capacity-optimized attempts to diversify as much as possible across the multiple low-priced pools with high capacity availability. As a result, the price-capacity-optimized strategy in most cases has a higher chance of getting Spot capacity and delivers lower interruption rates when compared to the lowest-price strategy. If you factor in the cost associated with retrying the interrupted requests, then the price-capacity-optimized strategy becomes even more attractive from a savings perspective over the lowest-price strategy.

We recommend the price-capacity-optimized allocation strategy for workloads that require optimization of cost savings, Spot capacity availability, and interruption rates. For existing workloads using lowest-price strategy, we recommend price-capacity-optimized strategy as a replacement. The capacity-optimized allocation strategy is still suitable for workloads that either use similarly priced instances, or ones where the cost of interruption is so significant that any cost saving is inadequate in comparison to a marginal increase in interruptions.

Walkthrough

In this section, we illustrate how the price-capacity-optimized allocation strategy deploys Spot capacity when compared to the other two allocation strategies. The following example configuration shows how Spot capacity could be allocated in an Auto Scaling group using the different allocation strategies:

{
    "AutoScalingGroupName": "myasg ",
    "MixedInstancesPolicy": {
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateId": "lt-abcde12345"
            },
            "Overrides": [
                {
                    "InstanceRequirements": {
                        "VCpuCount": {
                            "Min": 4,
                            "Max": 4
                        },
                        "MemoryMiB": {
                            "Min": 0,
                            "Max": 16384
                        },
                        "InstanceGenerations": [
                            "current"
                        ],
                        "BurstablePerformance": "excluded",
                        "AcceleratorCount": {
                            "Max": 0
                        }
                    }
                }
            ]
        },
        "InstancesDistribution": {
            "OnDemandPercentageAboveBaseCapacity": 0,
            "SpotAllocationStrategy": "spot-allocation-strategy"
        }
    },
    "MinSize": 10,
    "MaxSize": 100,
    "DesiredCapacity": 60,
    "VPCZoneIdentifier": "subnet-a12345a,subnet-b12345b,subnet-c12345c"
}
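
If you save this configuration to a file, one way to create the Auto Scaling group from it is with boto3, as sketched below. The file name is a placeholder, and the snippet simply substitutes the spot-allocation-strategy placeholder from the configuration above before making the call.

import json
import boto3

autoscaling = boto3.client("autoscaling")

# Load the example configuration shown above; the file name is a placeholder.
with open("my_asg_spot_allocation_configuration.json") as f:
    config = json.load(f)

# Set the strategy you want to test, e.g. "price-capacity-optimized",
# "capacity-optimized", or "lowest-price".
config["MixedInstancesPolicy"]["InstancesDistribution"]["SpotAllocationStrategy"] = (
    "price-capacity-optimized"
)

autoscaling.create_auto_scaling_group(**config)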

First, Amazon EC2 Auto Scaling attempts to balance capacity evenly across Availability Zones (AZs). Next, Amazon EC2 Auto Scaling applies the Spot allocation strategy using the 30+ instance types selected by attribute-based instance type selection in each Availability Zone. The results after testing the different allocation strategies are as follows:

  • The price-capacity-optimized strategy diversifies over multiple low-priced Spot Instance pools that are optimized for capacity availability.
  • The capacity-optimized strategy identifies Spot Instance pools that are only optimized for capacity availability.
  • The lowest-price strategy by default allocates the two lowest-priced Spot Instance pools that aren't optimized for capacity availability.

To find out how each allocation strategy fares regarding Spot savings and capacity, we compare ‘Cost of Auto Scaling group’ (number of instances x Spot price/hour for each type of instance) and ‘Spot interruptions rate’ (number of instances interrupted/number of instances launched) for each allocation strategy. We use fictional numbers for the purpose of this post. However, you can use the Cloud Intelligence Dashboards to find the actual Spot Saving, and the Amazon EC2 Spot interruption dashboard to log Spot Instance interruptions. The example results after a 30-day period are as follows:

Allocation strategy         Instance allocation             Cost of Auto Scaling group   Spot interruptions rate
price-capacity-optimized    40 c6i.xlarge, 20 c5.xlarge     $4.80/hour                   3%
capacity-optimized          60 c5.xlarge                    $5.00/hour                   2%
lowest-price                30 c5a.xlarge, 30 m5n.xlarge    $4.75/hour                   20%

As per the above table, with the price-capacity-optimized strategy, the cost of the Auto Scaling group is only 5 cents (1%) higher, whereas the rate of Spot interruptions is six times lower (3% vs 20%) than the lowest-price strategy. In summary, from this exercise you learn that the price-capacity-optimized strategy provides the optimal Spot experience that is the best of both the lowest-price and capacity-optimized allocation strategies.

Common use-cases of price-capacity-optimized allocation strategy

Earlier we mentioned that the price-capacity-optimized allocation strategy is recommended for most Spot workloads. To elaborate further, in this section we explore some of these common workloads.

Stateless and fault-tolerant workloads

Stateless workloads that can complete ongoing requests within two minutes of a Spot interruption notice, and the fault-tolerant workloads that have a low cost of retries, are the best fit for the price-capacity-optimized allocation strategy. This category has workloads such as stateless containerized applications, microservices, web applications, data and analytics jobs, and batch processing.

Workloads with a high cost of interruption

Workloads that have a high cost of interruption associated with an expensive cost of retries should implement checkpointing to lower the cost of interruptions. By using checkpointing, you make the price-capacity-optimized allocation strategy a good fit for these workloads, as it allocates capacity from the low-priced Spot Instance pools that offer a low Spot interruptions rate. This category has workloads such as long Continuous Integration (CI), image and media rendering, Deep Learning, and High Performance Compute (HPC) workloads.

Conclusion

We recommend that customers use the price-capacity-optimized allocation strategy as the default option. The price-capacity-optimized strategy helps Amazon EC2 Auto Scaling groups and Amazon EC2 Fleet provision target capacity with an optimal experience. Updating to the price-capacity-optimized allocation strategy is as simple as updating a single parameter in an Amazon EC2 Auto Scaling group and Amazon EC2 Fleet.
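
For example, a minimal boto3 sketch of that single-parameter update might look like the following, assuming an existing group named myasg and that properties you don't specify in MixedInstancesPolicy are left as they are:

import boto3

autoscaling = boto3.client("autoscaling")

# Switch an existing Auto Scaling group's Spot allocation strategy. The group
# name is a placeholder; properties of the mixed instances policy that aren't
# specified here are expected to remain unchanged.
autoscaling.update_auto_scaling_group(
    AutoScalingGroupName="myasg",
    MixedInstancesPolicy={
        "InstancesDistribution": {
            "SpotAllocationStrategy": "price-capacity-optimized"
        }
    },
)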

To learn more about allocation strategies for Spot Instances, visit the Spot allocation strategies documentation page.

Simplifying Amazon EC2 instance type flexibility with new attribute-based instance type selection features

Post Syndicated from Sheila Busser original https://aws.amazon.com/blogs/compute/simplifying-amazon-ec2-instance-type-flexibility-with-new-attribute-based-instance-type-selection-features/

This blog is written by Rajesh Kesaraju, Sr. Solution Architect, EC2-Flexible Compute and Peter Manastyrny, Sr. Product Manager, EC2.

Today AWS is adding two new attributes for the attribute-based instance type selection (ABS) feature to make it even easier to create and manage instance type flexible configurations on Amazon EC2. The new network bandwidth attribute allows customers to request instances based on the network requirements of their workload. The new allowed instance types attribute is useful for workloads that have some instance type flexibility but still need more granular control over which instance types to run on.

The two new attributes are supported in EC2 Auto Scaling Groups (ASG), EC2 Fleet, Spot Fleet, and Spot Placement Score.

Before exploring the new attributes in detail, let us review the core ABS capability.

ABS refresher

ABS lets you express your instance type requirements as a set of attributes, such as vCPU, memory, and storage when provisioning EC2 instances with ASG, EC2 Fleet, or Spot Fleet. Your requirements are translated by ABS to all matching EC2 instance types, simplifying the creation and maintenance of instance type flexible configurations. ABS identifies the instance types based on attributes that you set in ASG, EC2 Fleet, or Spot Fleet configurations. When Amazon EC2 releases new instance types, ABS will automatically consider them for provisioning if they match the selected attributes, removing the need to update configurations to include new instance types.

ABS helps you to shift from an infrastructure-first to an application-first paradigm. ABS is ideal for workloads that need generic compute resources and do not necessarily require the hardware differentiation that the Amazon EC2 instance type portfolio delivers. By defining a set of compute attributes instead of specific instance types, you allow ABS to always consider the broadest and newest set of instance types that qualify for your workload. When you use EC2 Spot Instances to optimize your costs and save up to 90% compared to On-Demand prices, instance type diversification is the key to access the highest amount of Spot capacity. ABS provides an easy way to configure and maintain instance type flexible configurations to run fault-tolerant workloads on Spot Instances.

We recommend ABS as the default compute provisioning method for instance type flexible workloads including containerized apps, microservices, web applications, big data, and CI/CD.

Now, let us dive deep on the two new attributes: network bandwidth and allowed instance types.

How network bandwidth attribute for ABS works

Network bandwidth attribute allows customers with network-sensitive workloads to specify their network bandwidth requirements for compute infrastructure. Some of the workloads that depend on network bandwidth include video streaming, networking appliances (e.g., firewalls), and data processing workloads that require faster inter-node communication and high-volume data handling.

The network bandwidth attribute uses the same min/max format as other ABS attributes (e.g., vCPU count or memory) that assume a numeric value or range (e.g., min: ‘10’ or min: ‘15’; max: ‘40’). Note that setting the minimum network bandwidth does not guarantee that your instance will achieve that network bandwidth. ABS will identify instance types that support the specified minimum bandwidth, but the actual bandwidth of your instance might go below the specified minimum at times.

Two important things to remember when using the network bandwidth attribute are:

  • ABS will only take burst bandwidth values into account when evaluating maximum values. When evaluating minimum values, only the baseline bandwidth will be considered.
    • For example, if you specify the minimum bandwidth as 10 Gbps, instances that have burst bandwidth of “up to 10 Gbps” will not be considered, as their baseline bandwidth is lower than the minimum requested value (e.g., m5.4xlarge is burstable up to 10 Gbps with a baseline bandwidth of 5 Gbps).
    • Alternatively, c5n.2xlarge, which is burstable up to 25 Gbps with a baseline bandwidth of 10 Gbps will be considered because its baseline bandwidth meets the minimum requested value.
  • Our recommendation is to only set a value for maximum network bandwidth if you have specific requirements to restrict instances with higher bandwidth. That would help to ensure that ABS considers the broadest possible set of instance types to choose from.

Using the network bandwidth attribute in ASG

In this example, let us look at a high-performance computing (HPC) workload or similar network bandwidth sensitive workload that requires a high volume of inter-node communications. We use ABS to select instances that have at minimum 10 Gbps of network bandwidth and at least 32 vCPUs and 64 GiB of memory.

To get started, you can create or update an ASG or EC2 Fleet set up with ABS configuration and specify the network bandwidth attribute.

The following example shows an ABS configuration with network bandwidth attribute set to a minimum of 10 Gbps. In this example, we do not set a maximum limit for network bandwidth. This is done to remain flexible and avoid restricting available instance type choices that meet our minimum network bandwidth requirement.

Create the following configuration file and name it: my_asg_network_bandwidth_configuration.json

{
    "AutoScalingGroupName": "network-bandwidth-based-instances-asg",
    "DesiredCapacityType": "units",
    "MixedInstancesPolicy": {
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateName": "LaunchTemplate-x86",
                "Version": "$Latest"
            },
            "Overrides": [
                {
                "InstanceRequirements": {
                    "VCpuCount": {"Min": 32},
                    "MemoryMiB": {"Min": 65536},
                    "NetworkBandwidthGbps": {"Min": 10} }
                 }
            ]
        },
        "InstancesDistribution": {
            "OnDemandPercentageAboveBaseCapacity": 30,
            "SpotAllocationStrategy": "capacity-optimized"
        }
    },
    "MinSize": 1,
    "MaxSize": 10,
    "DesiredCapacity":10,
    "VPCZoneIdentifier": "subnet-f76e208a, subnet-f76e208b, subnet-f76e208c"
}

Next, let us create an ASG from the my_asg_network_bandwidth_configuration.json file using the following command:

aws autoscaling create-auto-scaling-group --cli-input-json file://my_asg_network_bandwidth_configuration.json

As a result, you have created an ASG that may include instance types m5.8xlarge, m5.12xlarge, m5.16xlarge, m5n.8xlarge, and c5.9xlarge, among others. The actual selection at the time of the request is made by the capacity-optimized Spot allocation strategy. If EC2 releases an instance type in the future that satisfies the attributes provided in the request, that instance type will also be automatically considered for provisioning.

Considered Instances (not an exhaustive list)

Instance Type    Network Bandwidth
m5.8xlarge       10 Gbps
m5.12xlarge      12 Gbps
m5.16xlarge      20 Gbps
m5n.8xlarge      25 Gbps
c5.9xlarge       10 Gbps
c5.12xlarge      12 Gbps
c5.18xlarge      25 Gbps
c5n.9xlarge      50 Gbps
c5n.18xlarge     100 Gbps
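
If you would like to preview which instance types match a set of attributes before creating an ASG, one option is the GetInstanceTypesFromInstanceRequirements API. The boto3 sketch below mirrors the attributes used above; the architecture and virtualization values are assumptions you would adjust to match your own launch template.

import boto3

ec2 = boto3.client("ec2")

# Preview instance types that match the attributes used in the ASG example
# above. ArchitectureTypes and VirtualizationTypes are required by this API;
# the values here are assumptions matching a typical x86_64 launch template.
response = ec2.get_instance_types_from_instance_requirements(
    ArchitectureTypes=["x86_64"],
    VirtualizationTypes=["hvm"],
    InstanceRequirements={
        "VCpuCount": {"Min": 32},
        "MemoryMiB": {"Min": 65536},
        "NetworkBandwidthGbps": {"Min": 10},
    },
)

for match in response["InstanceTypes"]:
    print(match["InstanceType"])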

Now let us focus our attention on another new attribute – allowed instance types.

How allowed instance types attribute works in ABS

As discussed earlier, ABS lets us provision compute infrastructure based on our application requirements instead of selecting specific EC2 instance types. Although this infrastructure-agnostic approach is suitable for many workloads, some workloads, while having some instance type flexibility, still need to limit the selection to specific instance families and/or generations for reasons like licensing or compliance requirements, application performance benchmarking, and others. Furthermore, customers have asked us for the ability to restrict the automatic consideration of newly released instance types in their ABS configurations so that they can meet their specific hardware qualification requirements before adopting them for their workload. To provide this functionality, we added a new allowed instance types attribute to ABS.

The allowed instance types attribute allows ABS customers to narrow down the list of instance types that ABS considers for selection to a specific list of instances, families, or generations. It takes a comma separated list of specific instance types, instance families, and wildcard (*) patterns. Please note, that it does not use the full regular expression syntax.

For example, consider container-based web application that can only run on any 5th generation instances from compute optimized (c), general purpose (m), or memory optimized (r) families. It can be specified as “AllowedInstanceTypes”: [“c5*”, “m5*”,”r5*”].

Another example could be to limit the ABS selection to only memory-optimized instances for big data Spark workloads. It can be specified as “AllowedInstanceTypes”: [“r6*”, “r5*”, “r4*”].

Note that you cannot use the existing excluded instance types attribute and the new allowed instance types attribute together; doing so results in a validation error.

Using the allowed instance types attribute in an ASG

Let us look at the InstanceRequirements section of an ASG configuration file for a sample web application. The AllowedInstanceTypes attribute is configured as ["c5.*", "m5.*", "c4.*", "m4.*"], which means that ABS will limit the instance type consideration set to any instance from the 4th and 5th generations of the c or m families. Additional attributes require a minimum of 4 vCPUs and 16 GiB of RAM, and allow both Intel and AMD processors.

Create the following configuration file and name it: my_asg_allow_instance_types_configuration.json

{
    "AutoScalingGroupName": "allow-instance-types-based-instances-asg",
    "DesiredCapacityType": "units",
    "MixedInstancesPolicy": {
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateName": "LaunchTemplate-x86",
                "Version": "$Latest"
            },
            "Overrides": [
                {
                "InstanceRequirements": {
                    "VCpuCount": {"Min": 4},
                    "MemoryMiB": {"Min": 16384},
                    "CpuManufacturers": ["intel","amd"],
                    "AllowedInstanceTypes": ["c5.*", "m5.*","c4.*", "m4.*"] }
            }
            ]
        },
        "InstancesDistribution": {
            "OnDemandPercentageAboveBaseCapacity": 30,
            "SpotAllocationStrategy": "capacity-optimized"
        }
    },
    "MinSize": 1,
    "MaxSize": 10,
    "DesiredCapacity":10,
    "VPCZoneIdentifier": "subnet-f76e208a, subnet-f76e208b, subnet-f76e208c"
}

As a result, you have created an ASG that may include instance types like m5.xlarge, m5.2xlarge, c5.xlarge, and c5.2xlarge, among others. The actual selection at the time of the request is made by the capacity-optimized Spot allocation strategy. Note that if EC2 releases a new instance type in the future that satisfies the other attributes provided in the request, but is not a member of the 4th or 5th generation of the m or c families specified in the allowed instance types attribute, that instance type will not be considered for provisioning.

Selected Instances (not an exhaustive list)

m5.xlarge
m5.2xlarge
m5.4xlarge
c5.xlarge
c5.2xlarge
m4.xlarge
m4.2xlarge
m4.4xlarge
c4.xlarge
c4.2xlarge

As you can see, ABS considers a broad set of instance types for provisioning; however, they all meet the compute attributes required for your workload.

Cleanup

To delete both ASGs and terminate all the instances, execute the following commands:

aws autoscaling delete-auto-scaling-group --auto-scaling-group-name network-bandwidth-based-instances-asg --force-delete

aws autoscaling delete-auto-scaling-group --auto-scaling-group-name allow-instance-types-based-instances-asg --force-delete

Conclusion

In this post, we explored the two new ABS attributes – network bandwidth and allowed instance types. You can use these attributes to select instances based on network bandwidth and to limit the set of instances that ABS selects from. The two new attributes, together with the existing set of ABS attributes, help you save time on creating and maintaining instance type flexible configurations and make it even easier to express the compute requirements of your workload.

ABS represents a paradigm shift in the way our customers interact with compute, making it easier than ever to request diversified compute resources at scale. We recommend ABS as a tool to help you identify and access the largest amount of EC2 compute capacity for your instance type flexible workloads.

Building highly resilient applications with on-premises interdependencies using AWS Local Zones

Post Syndicated from Sheila Busser original https://aws.amazon.com/blogs/compute/building-highly-resilient-applications-with-on-premises-interdependencies-using-aws-local-zones/

This blog post is written by Rachel Rui Liu, Senior Solutions Architect.

AWS Local Zones are a type of infrastructure deployment that places compute, storage, database, and other select AWS services close to large population and industry centers.

Following the successful launch of AWS Local Zones in 16 US cities since 2019, AWS announced in February 2022 plans to launch new AWS Local Zones in 32 metropolitan areas across 26 countries worldwide.

With Local Zones, we’ve seen use cases in two common categories.

The first category of use cases is for workloads that require extremely low latency between end-user devices and workload servers. For example, let’s consider media content creation and real-time multiplayer gaming. For these use cases, deploying the workload to a Local Zone can help achieve down to single-digit milliseconds latency between end-user devices and the AWS infrastructure, which is ideal for a good end-user experience.

This post focuses on the second category of use cases, commonly seen in enterprise hybrid architectures, where customers must achieve low latency between AWS infrastructure and existing on-premises data centers. Compared to the first category of use cases, these use cases can tolerate slightly higher latency between the end-user devices and the AWS infrastructure. However, these workloads depend on on-premises systems, so the lowest possible latency between AWS infrastructure and on-premises data centers is required for better application performance. Here are a few examples of these systems:

  • Financial services sector mainframe workloads hosted on premises serving regional customers.
  • Enterprise Active Directory hosted on premises serving cloud and on-premises workloads.
  • Enterprise applications hosted on premises processing a high volume of locally generated data.

For workloads deployed in AWS, the time taken for each interaction with components still hosted in the on-premises data center is increased by the latency. In turn, this delays responses received by the end-user. The total latency accumulates and results in suboptimal user experiences.

By deploying modernized workloads in Local Zones, you can reduce latency while continuing to access systems hosted in on-premises data centers, thereby reducing the total latency for the end-user. At the same time, you can enjoy the benefits of agility, elasticity, and security offered by AWS, and can apply the same automation, compliance, and security best practices that you’ve been familiar with in the AWS Regions.

Enterprise workload resiliency with Local Zones

While designing hybrid architectures with Local Zones, resiliency is an important consideration. You want to route traffic to the nearest Local Zone for low latency. However, when disasters happen, it’s critical to fail over to the parent Region automatically.

Let’s look at the details of hybrid architecture design based on real world deployments from different angles to understand how the architecture achieves all of the design goals.

Hybrid architecture with resilient network connectivity

The following diagram shows a high-level overview of a resilient enterprise hybrid architecture with Local Zones, where you have redundant connections between the AWS Region, the Local Zone, and the corporate data center.

Resilient network connectivity

Here are a few key points with this network connectivity design:

  1. Use AWS Direct Connect or Site-to-Site VPN to connect the corporate data center and AWS Region.
  2. Use Direct Connect or self-hosted VPN to connect the corporate data center and the Local Zone. This connection will provide dedicated low-latency connectivity between the Local Zone and corporate data center.
  3. Transit Gateway is a regional service. When attaching the VPC to AWS Transit Gateway, you can only add subnets provisioned in the Region. Instances on subnets in the Local Zone can still use Transit Gateway to reach resources in the Region.
  4. For subnets provisioned in the Region, the VPC route table should be configured to route the traffic to the corporate data center via Transit Gateway.
  5. For subnets provisioned in the Local Zone, the VPC route table should be configured to route the traffic to the corporate data center via the self-hosted VPN instance or Direct Connect (a minimal sketch follows this list).
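
Here is a minimal, hypothetical sketch of step 5, adding a route from a Local Zone subnet's route table toward a self-hosted VPN instance; the route table ID, on-premises CIDR, and network interface ID are placeholder assumptions.

# Hypothetical sketch: route corporate data center traffic from a Local Zone
# subnet's route table to the self-hosted VPN instance's network interface.
import boto3

ec2 = boto3.client("ec2", region_name="us-west-2")  # Region is an assumption

ec2.create_route(
    RouteTableId="rtb-0123456789abcdef0",        # route table of the Local Zone subnet
    DestinationCidrBlock="10.100.0.0/16",        # corporate data center network (placeholder)
    NetworkInterfaceId="eni-0123456789abcdef0",  # ENI of the self-hosted VPN instance
)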

Hybrid architecture with resilient workload deployment

The next examples show a public and a private facing workload.

To simplify the diagram and focus on application layer architecture, the following diagrams assume that you are using Direct Connect to connect between AWS and the on-premises data center.

Example 1: Resilient public facing workload

With a public facing workload, end-user traffic will be routed to the Local Zone. If the Local Zone is unavailable, then the traffic will be routed to the Region automatically using an Amazon Route 53 failover policy.

Public facing workload resiliency

Here are the key design considerations for this architecture:

  1. Deploy the workload in the Local Zone and put the compute layer in an Amazon EC2 Auto Scaling group, so that the application can scale up and down depending on the volume of requests.
  2. Deploy the workload in both the Local Zone and an AWS Region, and put the compute layer into an Auto Scaling group. The regional deployment will act as a pilot light or warm standby with a minimal footprint, but it can scale out when the Local Zone is unavailable.
  3. Two Application Load Balancers (ALBs) are required: one in the Region and one in the Local Zone. Each ALB will dispatch the traffic to the workload cluster inside the Auto Scaling group local to it.
  4. An internet gateway is required for public facing workloads. When using a Local Zone, there’s no extra configuration needed: define a single internet gateway and attach it to the VPC.

If you want to specify an Elastic IP address as the workload's public endpoint, note that the Local Zone has a different address pool than the Region, and that BYOIP is unsupported for Local Zones.

  5. Create a Route 53 DNS record with “Failover” as the routing policy.
  • For the primary record, point it to the alias of the ALB in the Local Zone. This sets the Local Zone as the preferred destination for application traffic, which minimizes latency for end-users.
  • For the secondary record, point it to the alias of the ALB in the AWS Region.
  • Enable health checks for the primary record. If the health check against the primary record fails, indicating that the workload deployed in the Local Zone is not responding, then Route 53 automatically fails over to the secondary record, which points to the workload deployed in the AWS Region.
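
The failover records described above can be created in several ways; the following is a minimal, hypothetical boto3 sketch in which the hosted zone ID, record name, ALB DNS names, and ALB alias hosted zone IDs are all placeholder assumptions, and target health evaluation stands in for the health check on the primary record.

# Hypothetical sketch of the Route 53 failover records described above.
import boto3

route53 = boto3.client("route53")

route53.change_resource_record_sets(
    HostedZoneId="Z0000000000EXAMPLE",   # public hosted zone (placeholder)
    ChangeBatch={
        "Changes": [
            {   # Primary record: alias of the ALB in the Local Zone
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": "app.example.com",
                    "Type": "A",
                    "SetIdentifier": "local-zone-primary",
                    "Failover": "PRIMARY",
                    "AliasTarget": {
                        "HostedZoneId": "Z1111111111111",  # hosted zone ID of the Local Zone ALB (placeholder)
                        "DNSName": "local-zone-alb-123.us-west-2.elb.amazonaws.com",
                        "EvaluateTargetHealth": True,      # health evaluation for the primary endpoint
                    },
                },
            },
            {   # Secondary record: alias of the ALB in the AWS Region
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": "app.example.com",
                    "Type": "A",
                    "SetIdentifier": "region-secondary",
                    "Failover": "SECONDARY",
                    "AliasTarget": {
                        "HostedZoneId": "Z2222222222222",  # hosted zone ID of the regional ALB (placeholder)
                        "DNSName": "regional-alb-456.us-west-2.elb.amazonaws.com",
                        "EvaluateTargetHealth": True,
                    },
                },
            },
        ]
    },
)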

Example 2: Resilient private workload

For a private workload that’s only accessible by internal users, a few extra considerations must be made to keep the traffic inside of the trusted private network.

Private workload resiliency

The architecture for a resilient private facing workload follows the same steps as the public facing workload, with some key differences:

  1. Instead of using a public hosted zone, create private hosted zones in Route 53 to respond to DNS queries for the workload.
  2. Create the primary and secondary records in Route 53 just like the public workload but referencing the private ALBs.
  3. To allow end-users on the corporate network (within offices or connected via VPN) to resolve the workload, use the Route 53 Resolver with an inbound endpoint (a minimal sketch follows this list). This allows end-users located on premises to resolve the records in the private hosted zone. Route 53 Resolver is designed to integrate with an on-premises DNS server.
  4. No internet gateway is required for hosting the private workload. You might need an internet gateway in the Local Zone for other purposes: for example, to host a self-managed VPN solution to connect the Local Zone with the corporate data center.
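
As a minimal, hypothetical sketch of step 3, a Route 53 Resolver inbound endpoint could be created as follows; the subnet IDs, security group, and names are placeholder assumptions, and two IP addresses are specified for redundancy.

# Hypothetical sketch: inbound Resolver endpoint so on-premises DNS servers can
# forward queries for the private hosted zone into the VPC.
import boto3

resolver = boto3.client("route53resolver", region_name="us-west-2")  # Region is an assumption

resolver.create_resolver_endpoint(
    CreatorRequestId="private-workload-inbound-1",   # idempotency token
    Name="onprem-to-private-hosted-zone",
    Direction="INBOUND",
    SecurityGroupIds=["sg-0123456789abcdef0"],
    IpAddresses=[
        {"SubnetId": "subnet-0aaaaaaaaaaaaaaa0"},
        {"SubnetId": "subnet-0bbbbbbbbbbbbbbb0"},
    ],
)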

Hosting multiple workloads

Customers who host multiple workloads in a single VPC generally must consider how to segregate those workloads. As with workloads in the AWS Region, segregation can be implemented at a subnet or VPC level.

If you want to segregate workloads at the subnet level, you can extend your existing VPC architecture by provisioning extra sets of subnets in the Local Zone (a minimal sketch follows the diagram).

segregate workloads at subnet level
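
The extra subnets are created the same way as Regional subnets, by targeting the Local Zone's name. The following is a minimal, hypothetical sketch; the VPC ID, CIDR block, and Local Zone name are placeholder assumptions, and the Local Zone must already be enabled for your account.

# Hypothetical sketch: add a subnet in a Local Zone to an existing VPC.
import boto3

ec2 = boto3.client("ec2", region_name="us-west-2")  # parent Region of the Local Zone (assumption)

ec2.create_subnet(
    VpcId="vpc-0123456789abcdef0",
    CidrBlock="10.0.8.0/24",              # placeholder CIDR for the new workload subnet
    AvailabilityZone="us-west-2-lax-1a",  # Local Zone name instead of a Regional AZ (placeholder)
)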

Although not shown in the diagram, for those of you using a self-hosted VPN to connect the Local Zone with an on-premises data center, the VPN solution can be deployed in a centralized subnet.

You can continue to use security groups, network access control lists (NACLs), and VPC route tables to segregate the workloads, just as you would in the Region.

If you want to segregate workloads at the VPC level, as many of our customers do, note that within the Region inter-VPC routing is generally handled by Transit Gateway. In this case, however, it may be undesirable to send traffic to the Region to reach a subnet in another VPC that is also extended to the Local Zone.

segregate workloads at VPC level

Key considerations for this design are as follows:

  1. Direct Connect is deployed to connect the Local Zone with the corporate data center. Therefore, each VPC will have a dedicated Virtual Private Gateway provisioned to allow association with the Direct Connect Gateway.
  2. To enable inter-VPC traffic within the Local Zone, peer the two VPCs together.
  3. Create a VPC route table in VPC A. Add a route for Subnet Y where the destination is the peering link. Assign this route table to Subnet X.
  4. Create a VPC route table in VPC B. Add a route for Subnet X where the destination is the peering link. Assign this route table to Subnet Y.
  5. If necessary, add routes for on-premises networks and the transit gateway to both route tables.

This design allows traffic between subnets X and Y to stay within the Local Zone, thereby avoiding any latency from the Local Zone to the AWS Region while still permitting full connectivity to all other networks.
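
As a minimal, hypothetical sketch of steps 2 through 4, the peering connection and routes could be created as follows; every ID and CIDR below is a placeholder assumption.

# Hypothetical sketch: peer VPC A and VPC B and keep subnet-to-subnet traffic
# on the peering link within the Local Zone.
import boto3

ec2 = boto3.client("ec2", region_name="us-west-2")  # Region is an assumption

# Step 2: peer the two VPCs and accept the peering request.
peering = ec2.create_vpc_peering_connection(
    VpcId="vpc-0aaaaaaaaaaaaaaa0",       # VPC A
    PeerVpcId="vpc-0bbbbbbbbbbbbbbb0",   # VPC B
)
peering_id = peering["VpcPeeringConnection"]["VpcPeeringConnectionId"]
ec2.accept_vpc_peering_connection(VpcPeeringConnectionId=peering_id)

# Step 3: in VPC A's route table (assigned to Subnet X), route Subnet Y via the peering link.
ec2.create_route(
    RouteTableId="rtb-0aaaaaaaaaaaaaaa0",
    DestinationCidrBlock="10.1.1.0/24",  # Subnet Y CIDR (placeholder)
    VpcPeeringConnectionId=peering_id,
)

# Step 4: in VPC B's route table (assigned to Subnet Y), route Subnet X via the peering link.
ec2.create_route(
    RouteTableId="rtb-0bbbbbbbbbbbbbbb0",
    DestinationCidrBlock="10.0.1.0/24",  # Subnet X CIDR (placeholder)
    VpcPeeringConnectionId=peering_id,
)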

Conclusion

In this post, we summarized the use cases for enterprise hybrid architecture with Local Zones, and showed you:

  • Reference architectures to host workloads in Local Zones with low-latency connectivity to corporate data centers and resiliency that enables automatic failover to the AWS Region.
  • Different design considerations for public and private facing workloads utilizing this hybrid architecture.
  • Segregation and connectivity considerations when extending this hybrid architecture to host multiple workloads.

Hopefully you will be able to follow along with these reference architectures to build and run highly resilient applications with local system interdependencies using Local Zones.

Amazon EC2 Trn1 Instances for High-Performance Model Training are Now Available

Post Syndicated from Antje Barth original https://aws.amazon.com/blogs/aws/amazon-ec2-trn1-instances-for-high-performance-model-training-are-now-available/

Deep learning (DL) models have been increasing in size and complexity over the last few years, pushing the time to train from days to weeks. Training large language models the size of GPT-3 can take months, leading to an exponential growth in training cost. To reduce model training times and enable machine learning (ML) practitioners to iterate fast, AWS has been innovating across chips, servers, and data center connectivity.

At AWS re:Invent 2021, we announced the preview of Amazon EC2 Trn1 instances powered by AWS Trainium chips. AWS Trainium is optimized for high-performance deep learning training and is the second-generation ML chip built by AWS, following AWS Inferentia.

Today, I’m excited to announce that Amazon EC2 Trn1 instances are now generally available! These instances are well-suited for large-scale distributed training of complex DL models across a broad set of applications, such as natural language processing, image recognition, and more.

Compared to Amazon EC2 P4d instances, Trn1 instances deliver 1.4x the teraFLOPS for BF16 data types, 2.5x more teraFLOPS for TF32 data types, 5x the teraFLOPS for FP32 data types, 4x inter-node network bandwidth, and up to 50 percent cost-to-train savings. Trn1 instances can be deployed in EC2 UltraClusters that serve as powerful supercomputers to rapidly train complex deep learning models. I’ll share more details on EC2 UltraClusters later in this blog post.

New Trn1 Instance Highlights
Trn1 instances are available today in two sizes and are powered by up to 16 AWS Trainium chips with 128 vCPUs. They provide high-performance networking and storage to support efficient data and model parallelism, popular strategies for distributed training.

Trn1 instances offer up to 512 GB of high-bandwidth memory, deliver up to 3.4 petaFLOPS of TF32/FP16/BF16 compute power, and feature an ultra-high-speed NeuronLink interconnect between chips. NeuronLink helps avoid communication bottlenecks when scaling workloads across multiple Trainium chips.

Trn1 instances are also the first EC2 instances to enable up to 800 Gbps of Elastic Fabric Adapter (EFA) network bandwidth for high-throughput network communication. This second generation EFA delivers lower latency and up to 2x more network bandwidth compared to the previous generation. Trn1 instances also come with up to 8 TB of local NVMe SSD storage for ultra-fast access to large datasets.

The following table lists the sizes and specs of Trn1 instances in detail.

Instance Name    vCPUs    AWS Trainium Chips    Accelerator Memory    NeuronLink    Instance Memory    Instance Networking    Local Instance Storage
trn1.2xlarge     8        1                     32 GB                 N/A           32 GB              Up to 12.5 Gbps        1x 500 GB NVMe
trn1.32xlarge    128      16                    512 GB                Supported     512 GB             800 Gbps               4x 2 TB NVMe

Trn1 EC2 UltraClusters
For large-scale model training, Trn1 instances integrate with Amazon FSx for Lustre high-performance storage and are deployed in EC2 UltraClusters. EC2 UltraClusters are hyperscale clusters interconnected with a non-blocking petabit-scale network. This gives you on-demand access to a supercomputer to cut model training time for large and complex models from months to weeks or even days.

Amazon EC2 Trn1 UltraCluster

AWS Trainium Innovation
AWS Trainium chips include specific scalar, vector, and tensor engines that are purpose-built for deep learning algorithms. This ensures higher chip utilization as compared to other architectures, resulting in higher performance.

Here is a short summary of additional hardware innovations:

  • Data Types: AWS Trainium supports a wide range of data types, including FP32, TF32, BF16, FP16, and UINT8, so you can choose the most suitable data type for your workloads. It also supports a new, configurable FP8 (cFP8) data type, which is especially relevant for large models because it reduces the memory footprint and I/O requirements of the model.
  • Hardware-Optimized Stochastic Rounding: Stochastic rounding achieves close to FP32-level accuracy with faster BF16-level performance when you enable auto-casting from FP32 to BF16 data types. Stochastic rounding is a different way of rounding floating-point numbers that is better suited to machine learning workloads than the commonly used round-to-nearest-even rounding. By setting the environment variable NEURON_RT_STOCHASTIC_ROUNDING_EN=1 to use stochastic rounding (a minimal snippet follows this list), you can train a model up to 30 percent faster.
  • Custom Operators, Dynamic Tensor Shapes: AWS Trainium also supports custom operators written in C++ and dynamic tensor shapes. Dynamic tensor shapes are key for models with unknown input tensor sizes, such as models processing text.
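
Below is a minimal sketch of enabling the stochastic rounding environment variable mentioned above from within a Python training script; exporting it in the shell before launching training works equally well, and setting it before the Neuron runtime initializes is an assumption on my part.

# Minimal sketch: enable stochastic rounding before training starts.
import os

os.environ["NEURON_RT_STOCHASTIC_ROUNDING_EN"] = "1"  # enable hardware stochastic rounding

# ...then import torch / torch_xla and run training as shown later in this post.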

AWS Trainium shares the same AWS Neuron SDK as AWS Inferentia, making it easy for everyone who is already using AWS Inferentia to get started with AWS Trainium.

For model training, the Neuron SDK consists of a compiler, framework extensions, a runtime library, and developer tools. The Neuron plugin natively integrates with popular ML frameworks, such as PyTorch and TensorFlow.

The AWS Neuron SDK supports just-in-time (JIT) compilation, in addition to ahead-of-time (AOT) compilation, to speed up model compilation, and Eager Debug Mode, for a step-by-step execution.

To compile and run your model on AWS Trainium, you need to change only a few lines of code in your training script. You don’t need to tweak your model or think about data type conversion.

Get Started with Trn1 Instances
In this example, I train a PyTorch model on an EC2 Trn1 instance using the available PyTorch Neuron packages. PyTorch Neuron is based on the PyTorch XLA software package and enables conversion of PyTorch operations to AWS Trainium instructions.

Each AWS Trainium chip includes two NeuronCore accelerators, which are the main neural network compute units. With only a few changes to your training code, you can train your PyTorch model on AWS Trainium NeuronCores.

SSH into the Trn1 instance and activate a Python virtual environment that includes the PyTorch Neuron packages. If you’re using a Neuron-provided AMI, you can activate the preinstalled environment by running the following command:

source aws_neuron_venv_pytorch_p36/bin/activate

Before you can run your training script, you need to make a few modifications. On Trn1 instances, the default XLA device should be mapped to a NeuronCore.

Let’s start by adding the PyTorch XLA imports to your training script:

import torch, torch_xla
import torch_xla.core.xla_model as xm

Then, place your model and tensors onto an XLA device:

model.to(xm.xla_device())
tensor.to(xm.xla_device())

When the model is moved to the XLA device (NeuronCore), subsequent operations on the model are recorded for later execution. This is XLA’s lazy execution, which differs from PyTorch’s eager execution. Within the training loop, you have to mark the graph to be optimized and run on the XLA device using xm.mark_step(). Without this mark, XLA cannot determine where the graph ends.

...
for data, target in train_loader:
	optimizer.zero_grad()   # clear gradients accumulated from the previous step
	output = model(data)    # forward pass (recorded lazily by XLA)
	loss = loss_fn(output, target)
	loss.backward()         # backward pass
	optimizer.step()        # update the model parameters
	xm.mark_step()          # cut and execute the recorded graph on the NeuronCore
...

You can now run your training script using torchrun <my_training_script>.py.

When running the training script, you can configure the number of NeuronCores to use for training with the torchrun --nproc_per_node option.

For example, to run a multi-worker data parallel model training on all 32 NeuronCores in one trn1.32xlarge instance, run torchrun --nproc_per_node=32 <my_training_script>.py.

Data parallel is a strategy for distributed training that allows you to replicate your script across multiple workers, with each worker processing a portion of the training dataset. The workers then share their results with each other.
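
To make that concrete, here is a minimal sketch of a data parallel training loop on Trn1 using the torch_xla utilities shown earlier; the dataset, model, loss function, optimizer settings, and batch size are placeholder assumptions, and it assumes the script is launched with torchrun as described above.

# Minimal data parallel sketch for NeuronCore workers launched via torchrun.
import torch
import torch_xla.core.xla_model as xm
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

def train(dataset, model, loss_fn, epochs=1):
    device = xm.xla_device()                 # map this worker to a NeuronCore
    model = model.to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)  # placeholder optimizer

    # Each worker reads a distinct shard of the dataset.
    sampler = DistributedSampler(
        dataset,
        num_replicas=xm.xrt_world_size(),    # total workers in the job
        rank=xm.get_ordinal(),               # this worker's index
        shuffle=True,
    )
    loader = DataLoader(dataset, batch_size=32, sampler=sampler)

    for epoch in range(epochs):
        sampler.set_epoch(epoch)             # reshuffle shards each epoch
        for data, target in loader:
            data, target = data.to(device), target.to(device)
            optimizer.zero_grad()
            loss = loss_fn(model(data), target)
            loss.backward()
            xm.optimizer_step(optimizer)     # all-reduce gradients across workers, then step
            xm.mark_step()                   # cut and execute the recorded XLA graph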

For more details on supported ML frameworks, model types, and how to prepare your model training script for large-scale distributed training across trn1.32xlarge instances, have a look at the AWS Neuron SDK documentation.

Profiling Tools
Let’s have a quick look at useful tools to keep track of your ML experiments and profile Trn1 instance resource consumption. Neuron integrates with TensorBoard to track and visualize your model training metrics.

AWS Neuron SDK TensorBoard integration

On the Trn1 instance, you can use the neuron-ls command to describe the number of Neuron devices present in the system, along with the associated NeuronCore count, memory, connectivity/topology, PCI device information, and the Python process that currently has ownership of the NeuronCores:

AWS Neuron SDK neuron-ls command

Similarly, you can use the neuron-top command to see a high-level view of the Neuron environment. This shows the utilization of each of the NeuronCores, any models that are currently loaded onto one or more NeuronCores, process IDs for any processes that are using the Neuron runtime, and basic system statistics relating to vCPU and memory usage.

AWS Neuron SDK neuron-top command

Available Now
You can launch Trn1 instances today in the AWS US East (N. Virginia) and US West (Oregon) Regions as On-Demand, Reserved, and Spot Instances or as part of a Savings Plan. As usual with Amazon EC2, you pay only for what you use. For more information, see Amazon EC2 pricing.

Trn1 instances can be deployed using AWS Deep Learning AMIs, and container images are available via managed services such as Amazon SageMaker, Amazon Elastic Kubernetes Service (Amazon EKS), Amazon Elastic Container Service (Amazon ECS), and AWS ParallelCluster.

To learn more, visit our Amazon EC2 Trn1 instances page, and please send feedback to AWS re:Post for EC2 or through your usual AWS Support contacts.

— Antje

Let’s Architect! Architecting with custom chips and accelerators

Post Syndicated from Luca Mezzalira original https://aws.amazon.com/blogs/architecture/lets-architect-custom-chips-and-accelerators/

It’s hard to imagine a world without computer chips. They are at the heart of the devices that we use to work and play every day. Currently, Amazon Web Services (AWS) is offering customers the next generation of computer chips, with lower cost, higher performance, and a reduced carbon footprint.

This edition of Let’s Architect! focuses on custom computer chips, accelerators, and technologies developed by AWS, such as the AWS Nitro System, custom-designed Arm-based AWS Graviton processors that support data-intensive workloads, and the AWS Trainium and AWS Inferentia chips optimized for machine learning training and inference.

In this post, we discuss these new AWS technologies, their main characteristics, and how to take advantage of them in your architecture.

Deliver high performance ML inference with AWS Inferentia

As Deep Learning models become increasingly large and complex, the training cost for these models increases, as well as the inference time for serving.

With AWS Inferentia, machine learning practitioners can deploy complex neural-network models that are built and trained on popular frameworks, such as TensorFlow, PyTorch, and MXNet, on AWS Inferentia-based Amazon EC2 Inf1 instances.

This video introduces you to the main concepts of AWS Inferentia, a service designed to reduce both cost and latency for inference. To speed up inference, AWS Inferentia shares a model across multiple chips, places the pieces inside the on-chip cache, and then streams the data through a pipeline for low-latency predictions.

The presenters walk through the structure of the chip and software considerations, and share anecdotes from the Amazon Alexa team, which uses AWS Inferentia to serve predictions. If you want to learn more about high throughput coupled with low latency, explore Achieve 12x higher throughput and lowest latency for PyTorch Natural Language Processing applications out-of-the-box on AWS Inferentia on the AWS Machine Learning Blog.

AWS Inferentia shares a model across different chips to speed up inference

AWS Lambda Functions Powered by AWS Graviton2 Processor – Run Your Functions on Arm and Get Up to 34% Better Price Performance

AWS Lambda is a serverless, event-driven compute service that enables code to run from virtually any type of application or backend service, without provisioning or managing servers. Lambda uses a high-availability compute infrastructure and performs all of the administration of the compute resources, including server- and operating-system maintenance, capacity-provisioning, and automatic scaling and logging.

AWS Graviton processors are designed to deliver the best price and performance for cloud workloads. AWS Graviton3 processors are the latest in the AWS Graviton processor family and provide up to 25% higher compute performance, two times higher floating-point performance, and two times faster cryptographic workload performance compared with AWS Graviton2 processors. You can migrate AWS Lambda functions to Graviton in minutes and get as much as 19% better performance at approximately 20% lower cost compared with x86 (a minimal sketch follows the figure).

Comparison between x86 and Arm/Graviton2 results for the AWS Lambda function computing prime numbers
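
As a minimal, hypothetical illustration of targeting Graviton, a Lambda function can be created with the arm64 architecture; the function name, role ARN, runtime, handler, and deployment package below are placeholder assumptions.

# Hypothetical sketch: create a Lambda function that runs on Graviton (arm64).
import boto3

lambda_client = boto3.client("lambda", region_name="us-east-1")  # Region is an assumption

with open("function.zip", "rb") as package:                      # placeholder deployment package
    lambda_client.create_function(
        FunctionName="prime-numbers-arm64",
        Runtime="python3.9",
        Role="arn:aws:iam::123456789012:role/lambda-execution-role",  # placeholder role
        Handler="app.handler",
        Code={"ZipFile": package.read()},
        Architectures=["arm64"],            # run on Graviton instead of x86_64
    )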

Powering next-gen Amazon EC2: Deep dive on the Nitro System

The AWS Nitro System is a collection of building-block technologies that includes AWS-built hardware offload and security components. It is powering the next generation of Amazon EC2 instances, with a broadening selection of compute, storage, memory, and networking options.

In this session, dive deep into the Nitro System, reviewing its design and architecture, exploring new innovations to the Nitro platform, and understanding how it allows for faster innovation and increased security while reducing costs.

Traditionally, hypervisors protect the physical hardware and BIOS; virtualize the CPU, storage, and networking; and provide a rich set of management capabilities. With the AWS Nitro System, AWS breaks apart those functions and offloads them to dedicated hardware and software.

AWS Nitro System separates functions and offloads them to dedicated hardware and software, in place of a traditional hypervisor

How Amazon migrated a large ecommerce platform to AWS Graviton

In this re:Invent 2021 session, we learn about the benefits Amazon’s ecommerce Datapath platform has realized with AWS Graviton.

With a range of 25%-40% performance gains across 53,000 Amazon EC2 instances worldwide for Prime Day 2021, the Datapath team is lowering their internal costs with AWS Graviton’s improved price performance. Explore the software updates that were required to achieve this and the testing approach used to optimize and validate the deployments. Finally, learn about the Datapath team’s migration approach that was used for their production deployment.

AWS Graviton2: core components

See you next time!

Thanks for exploring custom computer chips, accelerators, and technologies developed by AWS. Join us in a couple of weeks when we talk more about architectures and the daily challenges faced while working with distributed systems.

Other posts in this series

Looking for more architecture content?

AWS Architecture Center provides reference architecture diagrams, vetted architecture solutions, Well-Architected best practices, patterns, icons, and more!