Tag Archives: Amazon EC2

New – Run Visual Studio Software on Amazon EC2 with User-Based License Model

Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/new-run-visual-studio-software-on-amazon-ec2-with-user-based-license-model/

We announce the general availability of license-included Visual Studio software on Amazon Elastic Cloud Compute (Amazon EC2) instances. You can now purchase fully compliant AWS-provided licenses of Visual Studio with a per-user subscription fee. Amazon EC2 provides preconfigured Amazon Machine Images (AMIs) of Visual Studio Enterprise 2022 and Visual Studio Professional 2022. You can launch on-demand Windows instances including Visual Studio and Windows Server licenses without long-term licensing commitments.

Amazon EC2 provides a broad choice of instances, and customers not only have the flexibility of paying for what their end users use but can also provide the capacity and right hardware to their end users. You can simply launch EC2 instances using license-included AMIs, and multiple authorized users can connect to these EC2 instances by using Remote Desktop software. Your administrator can authorize users centrally using AWS License Manager and AWS Managed Microsoft Active Directory (AD).

Configure Visual Studio License with AWS License Manager
As a prerequisite, your administrator needs to create an instance of AWS Managed Microsoft AD and allow AWS License Manager to onboard to it by accepting permission. To set up authorized users, see AWS Managed Microsoft AD documentation.

AWS License Manager makes it easier to manage your software licenses from vendors such as Microsoft, SAP, Oracle, and IBM across AWS and on-premises environments. To display a list of available Visual Studio software licenses, select User-based subscriptions in the AWS Licence Manager console.

You can see listed products to support user-based subscriptions. Each product has a descriptive name, a count of the subscribed users to utilize the product, and whether the subscription has been activated for use with a directory. Also, you are required to purchase Remote Desktop Services SAL licenses in the same way as Visual Studio by authorizing users for those licenses.

When you select Visual Studio Professional, you can see product details and subscribed users. By selecting Subscribe users, you can add authorized users to the license of Visual Studio Professional software.

You can perform the administrative tasks using the AWS Command Line Interface (CLI) tools via AWS License Manager APIs. For example, you can subscribe a user to the product in your Active Directory.

$ aws license-manager-user-subscriptions start-product-subscription \
         --username vscode2 \
         --product VISUAL_STUDIO_PROFESSIONAL \
         --identity-provider " \
                "ActiveDirectoryIdentityProvider" = \
                {"DirectoryId" = "d-9067b110b5"}" 
         --endpoint-url https://license-manager-user-subscriptions.us-east-1.amazonaws.com

To launch a Windows instance with preconfigured Visual Studio software, go to the EC2 console and select Launch instances. In the Application and OS Images (Amazon Machine Image), search for “Visual Studio on EC2,” and you can find AMIs under the Quickstart AMIs and AWS Marketplace AMIs tabs.

After launching your Windows instance, your administrator associates a user to the product in the Instances screen of the License Manager console. You can see the listed instances were launched using an AMI to provide the specified product to users who can then be associated.

These steps will be performed by the administrators who are responsible for managing users, instances, and costs across the organization. To learn more about administrative tasks, see User-based subscriptions in AWS License Manager.

Run Visual Studio Software on EC2 Instances
Once administrators authorize end users and launch the instances, you can remotely connect to Visual Studio instances using your AD account information shared by your administrator via Remote Desktop software. That’s all!

The instances deployed for user-based subscriptions must remain as managed nodes with AWS Systems Manager. For more information, see Troubleshooting managed node availability and Troubleshooting SSM Agent in the AWS Systems Manager User Guide.

Now Available
License-included Visual Studio on Amazon EC2 is now available in all AWS commercial and public Regions. You are billed per user for licenses of Visual Studio through a monthly subscription and per vCPU for license-included Windows Server instances on EC2.  You can use On-Demand InstancesReserved Instances, and Savings Plan pricing models like you do today for EC2 instances.

To learn more, visit our License Manager User Based Subscriptions documentation, and please send any feedback to AWS re:Post for EC2 or through your usual AWS Support contacts.

Channy

AWS Week In Review – July 25, 2022

Post Syndicated from Antje Barth original https://aws.amazon.com/blogs/aws/aws-week-in-review-july-25-2022/

A few weeks ago, we hosted the first EMEA AWS Heroes Summit in Milan, Italy. This past week, I had the privilege to join the Americas AWS Heroes Summit in Seattle, Washington, USA. Meeting with our community experts is always inspiring and a great opportunity to learn from each other. During the Summit, AWS Heroes from North America and Latin America shared their thoughts with AWS developer advocates and product teams on topics such as serverless, containers, machine learning, data, and DevTools. You can learn more about the AWS Heroes program here.

AWS Heroes Summit Americas 2022

Last Week’s Launches
Here are some launches that got my attention during the previous week:

Cloudscape Design System Cloudscape is an open source design system for creating web applications. It was built for and is used by AWS products and services. We created it in 2016 to improve the user experience across web applications owned by AWS services and also to help teams implement those applications faster. If you’ve ever used the AWS Management Console, you’ve seen Cloudscape in action. If you are building a product that extends the AWS Management Console, designing a user interface for a hybrid cloud management system, or setting up an on-premises solution that uses AWS, have a look at Cloudscape Design System.

Cloudscape Design System

AWS re:Post introduces community-generated articlesAWS re:Post gives you access to a vibrant community that helps you become even more successful on AWS. Expert community members can now share technical guidance and knowledge beyond answering questions through the Articles feature. Using this feature, community members can share best practices and troubleshooting processes and address customer needs around AWS technology in greater depth. The Articles feature is unlocked for community members who have achieved Rising Star status on re:Post or subject matter experts who built their reputation in the community based on their contributions and certifications. If you have a Rising Star status on re:Post, start writing articles now! All other members can unlock Rising Star status through community contributions or simply browse available articles today on re:Post.

AWS re:Post

AWS Lambda announces support for attribute-based access control (ABAC) and new IAM condition key – You can now use attribute-based access control (ABAC) with AWS Lambda to control access to functions within AWS Identity and Access Management (IAM) using tags. ABAC is an authorization strategy that defines access permissions based on attributes. In AWS, these attributes are called tags. With ABAC, you can scale an access control strategy by setting granular permissions with tags without requiring permissions updates for every new user or resource as your organization scales. Read this blog post by Julian Wood and Chris McPeek to learn more.

AWS Lambda also announced support for lambda:SourceFunctionArn, a new IAM condition key that can be used for IAM policy conditions that specify the Amazon Resource Name (ARN) of the function from which a request is made. You can use the Condition element in your IAM policy to compare the lambda:SourceFunctionArn condition key in the request context with values that you specify in your policy. This allows you to implement advanced security controls for the AWS API calls taken by your Lambda function code. For more details, have a look at the Lambda Developer Guide.

Amazon Fraud Detector launches Account Takeover Insights (ATI)Amazon Fraud Detector now supports an Account Takeover Insights (ATI) model, a low-latency fraud detection machine learning model specifically designed to detect accounts that have been compromised through stolen credentials, phishing, social engineering, or other forms of account takeover. The ATI model is designed to detect up to four times more ATI fraud than traditional rules-based account takeover solutions while minimizing the level of friction for legitimate users. To learn more, have a look at the Amazon Fraud Detector documentation.

Amazon EMR on EC2 clusters (EMR Clusters) introduces more fine-grained access controls – Previously, all jobs running on an EMR cluster used the IAM role associated with the EMR cluster’s EC2 instances to access resources. This role is called the EMR EC2 instance profile. With the new runtime roles for Amazon EMR Steps, you can now specify a different IAM role for your Apache Spark and Hive jobs, scoping down access at a job level. This simplifies access controls on a single EMR cluster that is shared between multiple tenants, wherein each tenant is isolated using IAM roles. You can now also enforce table and column permissions based on your Amazon EMR runtime role to manage your access to data lakes with AWS Lake Formation. For more details, read the What’s New post.

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

Other AWS News
Here are some additional news and customer stories you may find interesting:

AWS open-source news and updates – My colleague Ricardo Sueiras writes this weekly open-source newsletter in which he highlights new open-source projects, tools, and demos from the AWS Community. Read edition #121 here.

AI Use Case Explorer – If you are interested in AI use cases, have a look at the new AI Use Case Explorer. You can search over 100 use cases and 400 customer success stories by industry, business function, and the business outcome you want to achieve.

Bayer centralizes and standardizes data from the carbon program using AWS – To help Brazilian farmers adopt climate-smart agricultural practices and reduce carbon emissions in their activities, Bayer created the Carbon Program, which aims to build carbon-neutral agriculture practices. Learn how Bayer uses AWS to centralize and standardize the data received from the different partners involved in the project in this Bayer case study.

Upcoming AWS Events
Check your calendars and sign up for these AWS events:

AWS re:Inforce 2022 – The event will be held this week in person on July 26 and 27 in Boston, Massachusetts, USA. You can watch the keynote and leadership sessions online for free. AWS On Air will also stream live from re:Inforce.

AWS SummitAWS Global Summits – AWS Global Summits are free events that bring the cloud computing community together to connect, collaborate, and learn about AWS. Registrations are open for the following AWS Summits in August:

Imagine Conference 2022IMAGINE 2022 – The IMAGINE 2022 conference will take place on August 3 at the Seattle Convention Center, Washington, USA. It’s a no-cost event that brings together education, state, and local leaders to learn about the latest innovations and best practices in the cloud. You can register here.

I’ll be speaking at Data Con LA on August 13–14 in Los Angeles, California, USA. Feel free to say “Hi!” if you’re around. And if you happen to be at Ray Summit on August 23–24 in San Francisco, California, USA, stop by the AWS booth. I’ll be there to discuss all things Ray on AWS.

That’s all for this week. Check back next Monday for another Week in Review!

Antje

This post is part of our Week in Review series. Check back each week for a quick roundup of interesting news and announcements from AWS!

New – Amazon EC2 R6a Instances Powered by 3rd Gen AMD EPYC Processors for Memory-Intensive Workloads

Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/new-amazon-ec2-r6a-instances-powered-by-3rd-gen-amd-epyc-processors-for-memory-intensive-workloads/

We launched the general-purpose Amazon EC2 M6a instances at AWS re:Invent 2021 and compute-intensive C6a instances in February of this year. These instances are powered by the 3rd Gen AMD EPYC processors running at frequencies up to 3.6 GHz to offer up to 35 percent better price performance versus the previous generation instances.

Today, we are expanding our portfolio to include memory-optimized Amazon EC2 R6a instances featuring AMD EPYC (Milan) processors 10 percent less expensive than comparable x86 instances.

R6a instances, powered by 3rd Gen AMD EPYC processors are well suited for memory-intensive applications such as high-performance databases (relational databases, noSQL databases), distributed web scale in-memory caches (such as memcached, Redis), in-memory databases such as real-time big data analytics (such as Hadoop, Spark clusters), and other enterprise applications.

R6a instances are built on the AWS Nitro System and support Elastic Fabric Adapter (EFA) for workloads that benefit from lower network latency and highly scalable inter-node communication, such as high-performance computing and video processing.

Here’s a quick recap of the advantages of the new R6a instances compared to R5a instances:

  • Up to 35 percent higher price performance per vCPU versus comparable R5a instances
  • Up to 10 percent less expensive than comparable x86 instances
  • Up to 1536 GiB of memory, 2 times more than the previous generation, giving you the benefit of scaling up databases and running larger in-memory workloads.
  • Up to 192 vCPUs, 50 Gbps enhanced networking, and 40 Gbps EBS bandwidth, enabling you to process data faster, consolidate workloads, and lower the cost of ownership.
  • SAP-certified instances require memory-intensive applications such as high-performance enterprise databases like SAP Business Suite.
  • Support always-on memory encryption with AMD transparent sngle key memory encryption (TSME), and support new AVX2 instructions for accelerating encryption and decryption algorithms.

Here are the specs of R6a instances in detail:

Name vCPUs Memory (GiB) Network Bandwidth (Gbps) EBS Throughput (Gbps)
r6a.large 2 16 Up to 12.5 Up to 6.6
r6a.xlarge 4 32 Up to 12.5 Up to 6.6
r6a.2xlarge 8 64 Up to 12.5 Up to 6.6
r6a.4xlarge 16 128 Up to 12.5 Up to 6.6
r6a.8xlarge 32 256 12.5 6.6
r6a.12xlarge 48 384 18.75 10
r6a.16xlarge 64 512 25 13.3
r6a.24xlarge 96 768 37.5 20
r6a.32xlarge 128 1024 50 26.6
r6a.48xlarge 192 1536 50 40
r6a.metal 192 1536 50 40

Now Available
You can launch R6a instances today in the AWS US East (N. Virginia, Ohio), US West (Oregon), Asia Pacific (Mumbai) and Europe (Frankfurt, Ireland) as On-DemandSpot, and Reserved Instances or as part of a Savings Plan.

To learn more, visit the R6a instances page. Please send feedback to [email protected]AWS re:Post for EC2, or through your usual AWS Support contacts.

— Channy

Selecting Network Switches for Your AWS Outposts

Post Syndicated from Sheila Busser original https://aws.amazon.com/blogs/compute/selecting-network-switches-for-your-aws-outposts/

This blog post is written by, Frankie Negro, Outposts Solution Architect.

AWS Outposts is a family of fully managed solutions that extend AWS infrastructure, services, APIs, and tools to customer premises. Outposts is available in a variety of form factors, from 1U and 2U Outposts servers (https://aws.amazon.com/outposts/servers/) to 42U Outposts racks (https://aws.amazon.com/outposts/rack/). AWS Outposts is ideal for workloads that require low-latency access to on-premises systems, local data processing, data residency, and application migration with local system interdependencies.

When operating and consuming services in the AWS Regions, the underlying networking layer is completely abstracted. You do not need to be aware of the underlying networking topology, device port speeds, connectors, transports, links, and media types. Instead, the focus is on design, with the architecture leveraging the high-level constructs available for the Amazon Virtual Private Cloud (VPC), such as VPCs, Subnets, Route Tables, Security Groups, and network access control lists. The network bandwidth available for an Amazon Elastic Compute Cloud (Amazon EC2) instance depends on the number of vCPUs that it has.

AWS Outposts requires a dedicated network connection to an AWS Region defined by the customer when ordering the product. This connection is called the Service Link, and it connects to either public or private anchors (not both) in a specific Availability Zone (AZ) in the selected parent Region. AWS recommends redundant connections that meet the bandwidth requirements for Outposts rack and Outposts servers.

The purpose of AWS Outposts is to fulfill use cases where the workload has requirements that prevent or make it unfeasible to operate in the AWS Regions. Most of these use cases, such as low latency and local data processing, require strong and reliable network infrastructure to handle a high volume of packets per second.

The construct connecting AWS Outposts to the customer on-premises network is called local gateway (LGW) for Outposts rack and local network interface (LNI) for Outposts Servers. These logical elements mediate the data traffic between Outposts and the customer premises.

On Outposts rack, Service Link and LGW traffic flows through the same network connection, which can be a single link per physical device or an aggregated link. Network packets sent to the Region or to your local network are segregated using distinct virtual LANs (VLANs) on Outposts rack. The smaller family members, Outposts servers, use two distinct physical ports.

The physical network elements providing the connections between the devices and services are called Outposts Networking Devices (ONDs) on the AWS side and Customer Networking Devices (CNDs) on the customer side. For its part, Outposts rack can deliver throughput up to 400 Gbps, aggregating 4 x 100 Gbps uplinks to support Service Link and LGW network traffic, while an Outposts server can provide a 10 Gbps dedicated network port for each traffic.

Outpost network traffic segments and logical elements

The upstream devices you provide play a fundamental role in the harmonic coexistence and operation at the ethernet physical and data link layers, which are the basis for performance and stability of the upper network and transport layers as defined by the OSI model. A careful selection of your upstream networking devices must combine reliable operations, cost effectiveness, and long-term vision.

The physical layer (L1)

Here we are talking about physical cables and media interfaces. There are no supported options for UTP Cables with RJ-45 connectors, as Outposts rack only supports Fiber Optic cables with Lucent Connectors (LC). For short distances you can use MMF (Multi-Mode Fiber) or MMF OM4 (Optical Multimode) with LC.  Longer distances can be achieved using SMF (Single Mode Fiber). Distance limits depend on the Fiber Mode and Type.

: Lucent Connector (LC) DuplexEach Outposts server has one physical QSFP+ interface. A 4-way breakout cable is supplied with SFP+ transceivers. You will use two interfaces: One for the LNI traffic and another for the Service Link traffic.

With this is mind, RJ-45 ports on upstream switches will not suit any AWS Outposts connections. Switch models that combine RF-45 and optical ports can be used in conjunction with copper ethernet cables category 8 (CAT8), which support up to 40 Gbps speeds, to connect other segments while the optical ports can be used for AWS Outposts.

When evaluating your upstream switches, bear in mind that Outposts rack switches are always capable of 1 / 10 / 40 / 100 Gbps speeds, and it is the same equipment regardless of the selected AWS Outposts resource ID and uplink connection speed defined during the order process.

It is recommended to account for future traffic needs from the beginning and specify upstream switches with 40 or 100 Gbps ports rather than start small and upgrade in the future. Upgrades and changes always carry risk, so limiting future risk by minimizing the need for upgrades will help mitigate issues and provide a stable, productive environment.

Another characteristic to look for when selecting your networking devices is “non-blocking” switches. These switches can handle all ports at full capacity simultaneously, without contention. It is a simple feature to select, and you can expect high performance out-of-the box without having to go too deep into details such as buffering mechanisms.

The Data Link layer (L2)

This layer establishes and terminates the logical links between nodes and exchange frames end-to-end. Outposts rack requires that your upstream devices support 802.1Q (Dot1q) standards that implement the VLAN support needed to segregate traffic to be forwarded to the Region (via Service Link) from traffic to be forwarded to the customer’s local network.

Most core switches ship with this capability. One good spec to evaluate is the maximum size of the MAC Address Table per VLAN supported by the switch. If the MAC Table gets full, your equipment may fail over to broadcast mode in that VLAN, which introduces additional stress in the network and is a potential exploit condition.

Another common feature for core switches is to support link aggregation or bundle links together so they act like a single, logical link. While AWS Outposts will work with just a single connection per OND, a recommended fault tolerance and high availability best practice is to aggregate multiple paths to withstand the failure of one or multiple members of the logical aggregation group.

As defined in the AWS Well-Architected Framework Reliability pillar design principles, to observe best practices of Automatically recover from failure and Scale horizontally to increase aggregate workload availability, you should consider implementing, for example, 4 x 10 Gbps instead of a single 40 Gbps uplink. AWS Outposts uses link aggregation control protocol (LACP) aggregations with the immediate customer network device (CND) according to the IEEE 802.3ad standard.

To learn more about how you can architect Outposts for network failures, check out the AWS Outposts High Availability Design and Architecture Considerations at this URL.

The logical interface defined as a result of the link aggregation (LAG) can be configured as an ethernet trunk port defined in the IEEE 802.1q standard to allow the use of multiple VLANs. Alternatively, the logical interface can be configured as an L3 interface with the Service Link and LGW defined as VLAN sub-interfaces. This is how AWS Outposts segregates traffic forwarded to Service Link from packets sent to the customer local network.

The Network layer (L3)

At this layer, we get into routing and logical addressing. Outposts rack requires Border Gateway Protocol (BGP) to dynamically exchange routes. Each OND device will establish eBGP peering with the upstream routing device for the Service Link and the LGW.

The architectural decision will be a trade-off between discrete components for routing and switching and an L3 switch capable of BGP routing. This aspect requires a careful assessment. It is common for a core switch to offer L3 capabilities, but BGP support is not available in most cases.

Switch design often aims for excelling at L2 and basic L3. If the network design requires advanced routing features or large IP routing tables, the safest path is to specify a powerful L2 switch and a dedicated L3 router.

Redundant equipment for fault tolerance is recommended as well. AWS does not have restrictions on how the customer implements core switches, but it’s always a good practice to keep it simple and standard, avoiding designs that include proprietary solutions, such as Virtual Chassis and Switch Clustering, because it can make troubleshooting difficult.

Conclusion

In this post, I showed the importance of dedicating time and effort to carefully evaluating the networking landscape where your AWS Outposts will be deployed, assessing the network device options available to you, designing for high-availability, and selecting switch models with proper feature sets and future-proof specifications.

The performance and operation of your AWS Outposts is largely dependent on your network substrate, and all efforts dedicated to making good decisions will be time well spent, allowing you to get the best value out of your hybrid solution while focusing on creating compelling applications and addressing your use cases with AWS Outposts.

Automating Amazon EC2-Windows EBS Volumes monitoring and creating alarms

Post Syndicated from Sheila Busser original https://aws.amazon.com/blogs/compute/automating-amazon-ec2-windows-ebs-volumes-monitoring-and-creating-alarms/

This blog post is written by, Santhosh Kumar Adapa, Database Consultant,  AWS WWCO ProServe,  Jeevan Shetty, Database Consultant, AWS WWCO ProServe, and Bhanu Ganesh Gudivada, Consultant  Databases, AWS WWCO ProServe.

Customers who are running fleets of Amazon Elastic Compute Cloud (Amazon EC2) instances use advanced monitoring techniques to observe their operational performance. Capabilities like aggregated and custom dimensions help customers categorize and customize their metrics across server fleets for fast and efficient decision making. Customers require visibility into not only infrastructure metrics (such as CPU and memory), but also disk usage metrics.

Monitoring Amazon EC2-Windows Amazon Elastic Block Store (Amazon EBS) Volumes usage is critical, especially when customers have a large fleet of Amazon EC2 Windows servers running to host their databases and applications in AWS. Generally, we see issues with EC2 instances running out of disk space, and free disk space isn’t a metric that is directly available with Amazon CloudWatch. Amazon CloudWatch agent helps solve this problem. After installing and configuring the CloudWatch agent on your EC2 instance, the agent will send metric data with the disk utilization to CloudWatch. The next step is to create a CloudWatch alarm to monitor the disk utilization metric.

In this post, we showcase the steps to automate the monitoring and creating alarms for EBS volumes attached to Amazon EC2 Windows instances. Alarms are created using AWS Lambda that monitors the free disk space and alerts whenever thresholds are crossed using Amazon Simple Notification Service (Amazon SNS).

Solution overview

To demonstrate the solution we first install and configure the CloudWatch agent in your Amazon EC2 Windows instance, and then the agent will send metric data with the disk utilization to CloudWatch. To monitor the disk on each Amazon EC2 Windows instance, we’ll use two custom Metrics, “FreeStorageSpaceInMB” and “FreeStorageSpaceInPercent”, that are collected by CloudWatch agent and pushed to CloudWatch.

The following diagram illustrates the architecture used in this post:

architecture used in this post

  1. Amazon EC2 Windows instance with attached Amazon EBS Volumes to be monitored for free disk usage. The EC2 instance is configured with Amazon CloudWatch Agent.
  2. CloudWatch agent is configured to monitor the “FreeStorageSpaceInMB” and “FreeStorageSpaceInPercent” metrics, and pushed to AWS CloudWatch.
  3. Lambda function that can be invoked to create CloudWatch alarms for each disk attached to the EC2 instance.
  4. CloudWatch Alarms are created with warnings and critical thresholds based on storage size.
  5. Amazon SNS is used to send alerts when the CloudWatch Alarms crosses the threshold.
  6. AWS Identity and Access Management (IAM) to provide permission to the Lambda function to get Amazon EBS metrics and to create CloudWatch Alarms.

Prerequisites

You will need the following prerequisites:

  • To implement this solution, you must have an Amazon EC2 Windows instance configured with Amazon CloudWatch Agent by following the steps documented in the article – How to monitor Windows and Linux servers and get internal performance metrics.
  • To monitor the “FreeStorageSpaceInMB” and “FreeStorageSpaceInPercent” metrics for Amazon EBS volumes attached to the EC2 instance, the CloudWatch agent configuration JSON should have the following section:
"LogicalDisk": {
	"measurement": [
	{
		"name":"% Free Space",
		"rename":"FreeStorageSpaceInPercent",
		"unit":"Percent"
	},
	{
		"name":"Free Megabytes",
		"rename":"FreeStorageSpaceInMB",
		"unit":"Megabytes"
	}
	],
	"metrics_collection_interval": 10,
	"resources": [
		"*"
	]
},
  1. Amazon EC2 host or bastion host with an IAM role attached that has permissions to create an IAM role, Lambda function, and run Amazon Relational Database Service (Amazon RDS) CLI commands. A Lambda function and an IAM role are created using AWS Serverless Application Model (SAM).

AWS SAM

In this section, we provide the steps to create an IAM role and deploy a Lambda function using AWS SAM.

  1. Log in to the Amazon EC2 host and install the AWS SAM CLI.
  2. Download the source code and deploy it by running the following command:
git clone https://github.com/aws-samples/aws-ec2-windows-ebs-volumes-monitoring

cd aws-ec2-windows-ebs-volumes-monitoring/ebs_volumes_monitoring
sam deploy --guided

3. Provide the following parameters:

    1. Stack Name – Name for the AWS CloudFormation stack.
    2. AWS Region – AWS Region where the stack is being deployed.

The following is the sample output when you run sam deploy –guided with default arguments:

=========================================
Stack Name [ebs-volumes-monitoring]: ebs-volumes-monitoring
AWS Region [us-west-2]:
#Shows you resources changes to be deployed and require a 'Y' to initiate deploy
Confirm changes before deploy [y/N]:
#SAM needs permission to be able to create roles to connect to the resources in your template
Allow SAM CLI IAM role creation [Y/n]:
#Preserves the state of previously provisioned resources when an operation fails
Disable rollback [y/N]:
Save arguments to configuration file [Y/n]:
SAM configuration file [samconfig.toml]:
SAM configuration environment [default]:

In the following sections, we describe the AWS services deployed with AWS SAM.

IAM role

AWS SAM creates an IAM role with policies to describe EC2 instances, as well as List, Get, and Put CloudWatch metrics. Furthermore, it attaches an AWS managed IAM policy called AWSLambdaBasicExecutionRole to the IAM role. This role is attached to the Lambda functions to create Amazon EBS volume alarms for EC2 instances.

Lambda function

AWS SAM also deploys the Lambda function. It uses Python 3.8 and accepts two parameters:

  1. Hostname: Amazon EC2 Windows instance name, or, if we must configure alarms for multiple servers, then you can use a wild card character, such as Instance_name* or Instance_name?
  2. sns_topic_name: ARN of the SNS topic that is used to configure CloudWatch Alarms. Notification is sent to the SNS topic when the Amazon EBS Volume metric crosses the threshold.

Invoking Lambda function

After the SAM deployment is successful, we can invoke the Lambda function with the instance name and the SNS Topic ARN. The Lambda function creates two alarms (Warning and Critical) for every Amazon EBS volume attached to the instance. The Warning and Critical values can be changed in the Lambda code so that there are two different values depending on the size of the disk drive. Furthermore, the alarms are configured to send notifications to the SNS Topic. The following is the sample command to invoke the Lambda function:

aws lambda invoke --function-name ec2-ebs-metric --cli-binary-format raw-in-base64-out \
--payload '{"hostname": "Windows*", "sns_topic_name": "arn:aws:sns:us-west-2:123456789:notify_dba" }' response.json

Verifying CloudWatch Alarms:

Verify the CloudWatch Alarms that are created in the CloudWatch console. The following screenshot shows the CloudWatch alarms created for an EC2 instance with four disks. There are two alarms (Warning and Critical) created for every disk (four disks in total). Therefore, we see eight CloudWatch alarms.

CloudWatch console alarms

Checking CloudWatch Logs:

After running the Lambda function, to verify the log, go to Lambda Service page, select the Lambda function created, navigate to the Monitor tab, and then select “View logs in CloudWatch”. Then, go to the latest log file to check the CloudWatch log files for any errors.

Checking CloudWatch LogsSelect the latest Log Steam to check the details of the last Lambda function execution.

Log Steam detailsThe log file shows the full details of the Lambda function execution. Furthermore, it shows the CloudWatch alarms configured for each disk, as well as if there are any errors generated during execution.

Log file detailsCleanup

To clean up the resources used in this post, complete the following steps:

  1. Delete the CloudFormation stack by running below command and replacing STACK_NAME with stack name provided in step 3a above, under section “AWS SAM”
sam delete --stack-name STACK_NAME
  1. Confirm the stack has been deleted by running below command. Replace STACK_NAME as mentioned in previous step.
aws cloudformation list-stacks --query "StackSummaries[?contains(StackName,' STACK_NAME ')].StackStatus"
  1. Delete any CloudWatch alarms created by the Lambda function by following the document – Editing or deleting a CloudWatch alarm.

Conclusion

In this post, we demonstrated how the requirement of monitoring Amazon EC2 Windows EBS Volumes usage is critical. In particular, this is essential when customers have a large fleet of Amazon EC2 Windows servers running to host their databases and applications in the cloud. We showcased the process of automating the free disk monitoring using Lambda and notifying through Amazon SNS when disks cross the storage threshold. By implementing such monitoring, customers can prevent issues with EC2 instances running out of disk space thus preventing critical production outages.

Provide any thoughts or questions in the comments section. We also encourage you to explore CloudWatch monitoring and try out additional use cases mentioned in the documentation.

Using AWS Backup and Oracle RMAN for backup/restore of Oracle databases on Amazon EC2: Part 2

Post Syndicated from Jeevan Shetty original https://aws.amazon.com/blogs/architecture/using-aws-backup-and-oracle-rman-for-backup-restore-of-oracle-databases-on-amazon-ec2-part-2/

Customers running Oracle databases on Amazon Elastic Compute Cloud (Amazon EC2) often take database and schema backups using Oracle native tools like Data Pump and Recovery Manager (RMAN) to satisfy data protection, disaster recovery (DR), and compliance requirements. A priority is to reduce backup time as the data grows exponentially and recover sooner in case of failure/disaster.

In Part 1 of this two-part series, we explain how we can use AWS Backup and Amazon Simple Storage Service (Amazon S3) bucket to perform the backup and restore of an Oracle Database on AWS EC2.

In Part 2, we provide a mechanism to use AWS Backup to create a full backup of the EC2 instance, including the OS image, Oracle binaries, logs, and data files. The mechanism also uses Oracle RMAN to perform archived redo log backup to Amazon Elastic File System (Amazon EFS). Then, we demonstrate the steps to restore a database to a specific point-in-time using AWS Backup and Oracle RMAN.

Solution overview

Figure 1 demonstrates the workflow:

  1. Oracle database on Amazon EC2 configured with Oracle Secure Backup.
  2. AWS Backup service to backup EC2 instance at regular intervals.
  3. Amazon EFS for storing Oracle RMAN archive log backups.
Oracle Database in Amazon EC2 using AWS Backup and EFS for backup and restore

Figure 1. Oracle Database in Amazon EC2 using AWS Backup and EFS for backup and restore

Prerequisites

  1. An AWS account
  2. Oracle database and AWS CLI in an EC2 instance
  3. Access to configure AWS Backup
  4. Access to configure EFS to store the Oracle RMAN archive log backups

1. Configure AWS Backup

Configure AWS Backup as detailed in Step 1 of Part 1.

Oracle RMAN archive log backup

While AWS Backup is now creating a daily backup of the EC2 instance, we also want to make sure we backup the archived log files to a protected location. This will let us do point-in-time restores and restore to more recent times than just the last daily EC2 backup. Below we provide the steps to backup archive log using RMAN to Amazon EFS.

Backup/restore archive logs to/from Amazon EFS

Backing up the Oracle Archive logs is an important part of the process. In this section, we will describe how you can backup their Oracle archive logs to Amazon EFS. One advantage of this option (as compared with using Oracle Secure Backup [detailed in Part 1 of this series]) is that it does not require any additional Oracle licensing.

2. Configure Amazon EFS

a. Create an Amazon EFS file system that will be used to store Oracle RMAN Archive log backups. The image below details the steps involved in creation of an Amazon EFS. Consider that a sample file system ID: fs-0123abcdef012345 is created and will be used to store RMAN archive log backup.

Figure 2. Configure Amazon EFS which is used to store Oracle RMAN archive log backups

b. Install the Amazon EFS Client and follow instructions to install EFS client on RHEL EC2 instance. Note: next steps were tested on RHEL 7.9.

sudo yum -y install git
sudo yum -y install rpm-build
git clone https://github.com/aws/efs-utils
cd efs-utils/
sudo yum -y install make
sudo make rpm
sudo yum -y install ./build/amazon-efs-utils*rpm

c. Mount the EFS file system on your EC2 instance. In this example, we show the steps to mount EFS filesystem on EC2 Instance (if the command requests to upgrade stunnel, refer to Upgrading stunnel. Ensure that the EC2 instance profile attached has necessary policies to access EFS. /rman for mount point and file system ID: fs-0123abcdef012345 are examples for EFS file system.

sudo mkdir /rman
sudo mount -t efs -o tls,iam fs-0123abcdef012345 /rman

d. To mount EFS file system automatically on EC2 instance reboot, add an entry in /etc/fstab. This example is for RHEL EC2 instance:

fs-0123abcdef012345:/      /rman        efs     _netdev,tls,iam        0 0

3. Configure RMAN backup to Amazon EFS

With Amazon EFS mounted on EC2 instance, we can configure Oracle RMAN archive log backups to EFS. In below commands oratst is used as an example of your ORACLE_SID.

a. Configure RMAN repository to take control file backup to Amazon EFS automatically.

CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO '/rman/ctrl-D_%d_%F';
CONFIGURE CONTROLFILE AUTOBACKUP ON;

b. Create a script (for example, rman_archive.sh) with below commands and schedule using crontab (example entry: */5 * * * * rman_archive.sh) to run every 5 minutes. This will ensure that Oracle Archive logs are backed up to Amazon EFS (/rman) frequently, ensuring an recovery point objective (RPO) of 5 minutes.

dt=`date +%Y%m%d_%H%M%S`

rman target / log=/rman/rman_arch_bkup_oratst_${dt}.log <<EOF

RUN
{
    allocate channel c1_efs device type disk format '/rman/arch-D-%d_%T_s%s_p%p' MAXPIECESIZE 10G;
    BACKUP ARCHIVELOG ALL delete all input;
    release channel c1_efs;
}

EOF

4. Perform database point-in-time recovery

In event of a database crash/corruption, we can use AWS Backup service and Oracle RMAN archive log backup to recover database to a specific point-in-time.

a. Typically, you would pick the most recent Recovery Point completed before the time to which you wish to recover. Using AWS Backup, identify the Recovery point ID to restore by following the steps from Restoring an Amazon EC2 instance. Note: when following the steps, be sure to set the “User data” settings as described in the next bulleted item.

After the EBS volumes are created from the snapshot, there is no need to wait for all of the data to transfer from Amazon S3 to your Amazon EBS volume before your attached instance can start accessing the volume. Amazon EBS Snapshots implement lazy loading, so that you can begin using them right away.

b. Ensure that database does not start automatically after restoring the EC2 instance, by renaming /etc/oratab. Use below command in “User data” section while restoring EC2 instance. After database recovery, we can rename it back to /etc/oratab.

#!/usr/bin/sh
sudo su - 
mv /etc/oratab /etc/oratab_bk

c. Login to the EC2 instance once it is up and execute the RMAN recovery commands mentioned below. Identify the DBID from RMAN logs saved in the EFS. Below commands use database oratst as an example.

rman target /

RMAN> startup nomount

RMAN> set dbid DBID


# Below command is to restore the controlfile from autobackup

RMAN> RUN
{
    set controlfile autobackup format for device type disk to '/rman/ctrl-D_%d_%F';
    RESTORE CONTROLFILE FROM AUTOBACKUP;
    alter database mount;
}


#Identify the recovery point (sequence_number) by listing the backups available in catalog.

RMAN> list backup;

In Figure 3, the most recent archive log backed up is 460, so you can use this sequence number in the next set of RMAN commands.

RMAN> RUN
{
    allocate channel c1_efs device type disk format '/rman/arch-D-%d_%T_s%s_p%p';    
    recover database until sequence sequence_number;
    ALTER DATABASE OPEN RESETLOGS;
    release channel c1_efs;
}

Sample output of Oracle RMAN “list backup” command

Figure 3. Sample output of Oracle RMAN “list backup” command

d. To avoid performance issues due to lazy loading, after the database is open, you can run below command to force a faster restoration of the blocks from S3 bucket to EBS volumes (below example allocates two channels and validates the entire database).

RMAN> RUN
{
  ALLOCATE CHANNEL c1 DEVICE TYPE DISK;
  ALLOCATE CHANNEL c2 DEVICE TYPE DISK;
  VALIDATE database section size 1200M;
}

e. This completes the recovery of database, and we can let the database to auto start by renaming file back to /etc/oratab.

mv /etc/oratab_bk /etc/oratab

5. Backup retention

Ensure that the AWS Backup Lifecycle policy match Oracle archive log backup retention. Also, follow documentation to configure Oracle backup retention and deleting expired backup. Below is a sample command for Oracle backup retention.

CONFIGURE BACKUP OPTIMIZATION ON;
CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 31 DAYS; 

RMAN> RUN
{
    allocate channel c1_efs device type disk format '/rman/arch-D-%d_%T_s%s_p%p';

    crosscheck backup;
    delete noprompt obsolete;
    delete noprompt expired backup;
    
    release channel c1_efs;
}

Cleanup

Follow below instructions to remove or cleanup the setup:

  1. Delete the backup plan created in Step 1.
  2. Remove the cron entry from the EC2 instance configured in Step 3b.
  3. Delete the EFS that was created in Step 2 to store Oracle RMAN archive log backups.

Conclusion

In this post, we demonstrated the use for AWS Backup for EC2 snapshot and EFS as storage for Oracle RMAN archive log backups. With this strategy for backup, Oracle Database running on EC2 can be restored and recovered to a point-in-time faster than oracle native backup and recovery strategies. Also, by using EFS for Oracle RMAN archive log backups, we can avoid the additional licensing required to use Oracle Secure Backup, explained in Part 1. You can leverage this solution to facilitate restoring copies of your production database for development or testing purposes and to Recover from a user error that removes data or corrupts existing data.

To learn more about AWS Backup, refer to the AWS Backup Documentation.

Using AWS Backup and Oracle RMAN for backup/restore of Oracle databases on Amazon EC2: Part 1

Post Syndicated from Jeevan Shetty original https://aws.amazon.com/blogs/architecture/using-aws-backup-and-oracle-rman-for-backup-restore-of-oracle-databases-on-amazon-ec2-part-1/

Customers running Oracle databases on Amazon Elastic Compute Cloud (Amazon EC2) often take database and schema backups using Oracle native tools, like Data Pump and Recovery Manager (RMAN), to satisfy data protection, disaster recovery (DR), and compliance requirements. A priority is to reduce backup time as the data grows exponentially and recover sooner in case of failure/disaster.

In situations where RMAN backup is used as a DR solution, using AWS Backup to backup the file system and using RMAN to backup the archive logs are an efficient method to perform Oracle database point-in-time recovery in the event of a disaster.

Sample use cases:

  1. Quickly build a copy of production database to test bug fixes or for a tuning exercise.
  2. Recover from a user error that removes data or corrupts existing data.
  3. A complete database recovery after a media failure.

There are two options to backup the archive logs using RMAN:

  1. Using Oracle Secure Backup (OSB) and an Amazon Simple Storage Service (Amazon S3) bucket as the storage for archive logs
  2. Using Amazon Elastic File System (Amazon EFS) as the storage for archive logs

This is Part 1 of this two-part series, we provide a mechanism to use AWS Backup to create a full backup of the EC2 instance, including the OS image, Oracle binaries, logs, and data files. In this post, we will use Oracle RMAN to perform archived redo log backup to an Amazon S3 bucket. Then, we demonstrate the steps to restore a database to a specific point-in-time using AWS Backup and Oracle RMAN.

Solution overview

Figure 1 demonstrates the workflow:

  1. Oracle database on Amazon EC2 configured with Oracle Secure Backup (OSB)
  2. AWS Backup service to backup EC2 instance at regular intervals.
  3. AWS Identity and Access Management (IAM) role for EC2 instance that grants permission to write archive log backups to Amazon S3
  4. S3 bucket for storing Oracle RMAN archive log backups
Figure 1. Oracle Database in Amazon EC2 using AWS Backup and S3 for backup and restore

Figure 1. Oracle Database in Amazon EC2 using AWS Backup and S3 for backup and restore

Prerequisites

For this solution, the following prerequisites are required:

  1. An AWS account
  2. Oracle database and AWS CLI in an EC2 instance
  3. Access to configure AWS Backup
  4. Acces to S3 bucket to store the RMAN archive log backup

1. Configure AWS Backup

You can choose AWS Backup to schedule daily backups of the EC2 instance. AWS Backup efficiently stores your periodic backups using backup plans. Only the first EBS snapshot performs a full copy from Amazon Elastic Block Storage (Amazon EBS) to Amazon S3. All subsequent snapshots are incremental snapshots, copying just the changed blocks from Amazon EBS to Amazon S3, thus, reducing backup duration and storage costs. Oracle supports Storage Snapshot Optimization, which takes third-party snapshots of the database without placing the database in backup mode. By default, AWS Backup now creates crash-consistent backups of Amazon EBS volumes that are attached to an EC2 instance. Customers no longer have to stop their instance or coordinate between multiple Amazon EBS volumes attached to the same EC2 instance to ensure crash-consistency of their application state.

You can create daily scheduled backup of EC2 instances. Figures 2, 3, and 4 are sample screenshots of the backup plan, associating an EC2 instance with the backup plan.

Configure backup rule using AWS Backup

Figure 2. Configure backup rule using AWS Backup

Select EC2 instance containing Oracle Database for backup

Figure 3. Select EC2 instance containing Oracle Database for backup

Summary screen showing the backup rule and resources managed by AWS Backup

Figure 4. Summary screen showing the backup rule and resources managed by AWS Backup

Oracle RMAN archive log backup

While AWS Backup is now creating a daily backup of the EC2 instance, we also want to make sure we backup the archived log files to a protected location. This will let us do point-in-time restores and restore to other recent times than just the last daily EC2 backup. Here, we provide the steps to backup archive log using RMAN to S3 bucket.

Backup/restore archive logs to/from Amazon S3 using OSB

Backing-up the Oracle archive logs is an important part of the process. In this section, we will describe how you can backup their Oracle Archive logs to Amazon S3 using OSB. Note: OSB is a separately licensed product from Oracle Corporation, so you will need to be properly licensed for OSB if you use this approach.

2. Setup S3 bucket and IAM role

Oracle Archive log backups can be scheduled using cron script to run at regular interval (for example, every 15 minutes). These backups are stored in an S3 bucket.

a. Create an S3 bucket with lifecycle policy to transition the objects to S3 Standard-Infrequent Access.
b. Attach the following policy to the IAM Role of EC2 containing Oracle database or create an IAM role (ec2access) with the following policy and attach it to the EC2 instance. Update bucket-name with the bucket created in previous step.


        {
            "Sid": "S3BucketAccess",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObjectAcl",
                "s3:GetObject",
                "s3:ListBucket",
                "s3:DeleteObject"
            ],
            "Resource": [
                "arn:aws:s3:::bucket-name",
                "arn:aws:s3:::bucket-name/*"
            ]
        }

3. Setup OSB

After we have configured the backup of EC2 instance using AWS Backup, we setup OSB in the EC2 instance. In these steps, we show the mechanism to configure OSB.

a. Verify hardware and software prerequisites for OSB Cloud Module.
b. Login to the EC2 instance with User ID owning the Oracle Binaries.
c. Download Amazon S3 backup installer file (osbws_install.zip)
d. Create Oracle wallet directory.

mkdir $ORACLE_HOME/dbs/osbws_wallet

e. Create a file (osbws.sh) in the EC2 instance with the following commands. Update IAM role with the one created/updated in Step 2b.

java -jar osbws_install.jar —IAMRole ec2access walletDir $ORACLE_HOME/dbs/osbws_wallet -libDir $ORACLE_HOME/lib/

f. Change permission and run the file.

chmod 700 osbws.sh
./osbws.sh

Sample output: AWS credentials are valid.
Oracle Secure Backup Web Service wallet created in directory /u01/app/oracle/product/19.3.0.0/db_1/dbs/osbws_wallet.
Oracle Secure Backup Web Service initialization file /u01/app/oracle/product/19.3.0.0/db_1/dbs/osbwsoratst.ora created.
Downloading Oracle Secure Backup Web Service Software Library from file osbws_linux64.zip.
Download complete.

g. Set ORACLE_SID by executing below command:

. oraenv

h. Running the script – osbws.sh installs OSB libraries and creates a file called osbws<ORACLE_SID>.ora.
i. Add/modify below with S3 bucket(bucket-name) and region(ex:us-west-2) created in Step 2a.

OSB_WS_HOST=http://s3.us-west-2.amazonaws.com
OSB_WS_BUCKET=bucket-name
OSB_WS_LOCATION=us-west-2

4. Configure RMAN backup to S3 bucket

With OSB installed in the EC2 instance, you can backup Oracle archive logs to S3 bucket. These backups can be used to perform database point-in-time recovery in case of database crash/corruption . oratst is used as an example in below commands.

a. Configure RMAN repository. Example below uses Oracle 19c and Oracle Sid – oratst.

RMAN> configure channel device type sbt parms='SBT_LIBRARY=/u01/app/oracle/product/19.3.0.0/db_1/lib/libosbws.so,SBT_PARMS=(OSB_WS_PFILE=/u01/app/oracle/product/19.3.0.0/db_1/dbs/osbwsoratst.ora)';

b. Create a script (for example, rman_archive.sh) with below commands, and schedule using crontab (example entry: */5 * * * * rman_archive.sh) to run every 5 minutes. This will makes sure Oracle Archive logs are backed up to Amazon S3 frequently, thus ensuring an recovery point objective (RPO) of 5 minutes.

dt=`date +%Y%m%d_%H%M%S`

rman target / log=rman_arch_bkup_oratst_${dt}.log <<EOF

RUN
{
	allocate channel c1_s3 device type sbt
	parms='SBT_LIBRARY=/u01/app/oracle/product/19.3.0.0/db_1/lib/libosbws.so,SBT_PARMS=(OSB_WS_PFILE=/u01/app/oracle/product/19.3.0.0/db_1/dbs/osbwsoratst.ora)' MAXPIECESIZE 10G;

	BACKUP ARCHIVELOG ALL delete all input;
	Backup CURRENT CONTROLFILE;

release channel c1_s3;
	
}

EOF

c. Copy RMAN logs to S3 bucket. These logs contain the database identifier (DBID) that is required when we have to restore the database using Oracle RMAN.

aws s3 cp rman_arch_bkup_oratst_${dt}.log s3://bucket-name

5. Perform database point-in-time recovery

In the event of a database crash/corruption, we can use AWS Backup service and Oracle RMAN Archive log backup to recover database to a specific point-in-time.

a. Typically, you would pick the most recent recovery point completed before the time you wish to recover. Using AWS Backup, identify the recovery point ID to restore by following the steps on restoring an Amazon EC2 instance. Note: when following the steps, be sure to set the “User data” settings as described in the next bullet item.

After the EBS volumes are created from the snapshot, there is no need to wait for all of the data to transfer from Amazon S3 to your EBS volume before your attached instance can start accessing the volume. Amazon EBS snapshots implement lazy loading, so that you can begin using them right away.

b. Be sure the database does not start automatically after restoring the EC2 instance, by renaming /etc/oratab. Use the following command in “User data” section while restoring EC2 instance. After database recovery, we can rename it back to /etc/oratab.

#!/usr/bin/sh
sudo su - 
mv /etc/oratab /etc/oratab_bk

c. Login to the EC2 instance once it is up, and execute the RMAN recovery commands mentioned. Identify the DBID from RMAN logs saved in the S3 bucket. These commands use database oratst as an example:

rman target /

RMAN> startup nomount

RMAN> set dbid DBID

# Below command is to restore the controlfile from autobackup

RMAN> RUN
{
    allocate channel c1_s3 device type sbt
	parms='SBT_LIBRARY=/u01/app/oracle/product/19.3.0.0/db_1/lib/libosbws.so,SBT_PARMS=(OSB_WS_PFILE=/u01/app/oracle/product/19.3.0.0/db_1/dbs/osbwsoratst.ora)';

    RESTORE CONTROLFILE FROM AUTOBACKUP;
    alter database mount;

    release channel c1_s3;
}


#Identify the recovery point (sequence_number) by listing the backups available in catalog.

RMAN> list backup;

In Figure 5, the most recent archive log backed up is 380, so you can use this sequence number in the next set of RMAN commands.

Sample output of Oracle RMAN “list backup” command

Figure 5. Sample output of Oracle RMAN “list backup” command

RMAN> RUN
{
    allocate channel c1_s3 device type sbt
	parms='SBT_LIBRARY=/u01/app/oracle/product/19.3.0.0/db_1/lib/libosbws.so,SBT_PARMS=(OSB_WS_PFILE=/u01/app/oracle/product/19.3.0.0/db_1/dbs/osbwsoratst.ora)';

    recover database until sequence sequence_number;
    ALTER DATABASE OPEN RESETLOGS;
    release channel c1_s3;
}

d. To avoid performance issues due to lazy loading, after the database is open, run the following command to force a faster restoration of the blocks from S3 bucket to EBS volumes (this example allocates two channels and validates the entire database).

RMAN> RUN
{
  ALLOCATE CHANNEL c1 DEVICE TYPE DISK;
  ALLOCATE CHANNEL c2 DEVICE TYPE DISK;
  VALIDATE database section size 1200M;
}

e. This completes the recovery of database, and we can let the database automatically start by renaming file back to /etc/oratab.

mv /etc/oratab_bk /etc/oratab

6. Backup retention

Ensure that the AWS Backup lifecycle policy matches the Oracle Archive log backup retention. Also, follow documentation to configure Oracle backup retention and delete expired backups. This is a sample command for Oracle backup retention:

CONFIGURE BACKUP OPTIMIZATION ON;
CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 31 DAYS; 

RMAN> RUN
{
    allocate channel c1_s3 device type sbt
	parms='SBT_LIBRARY=/u01/app/oracle/product/19.3.0.0/db_1/lib/libosbws.so,SBT_PARMS=(OSB_WS_PFILE=/u01/app/oracle/product/19.3.0.0/db_1/dbs/osbwsoratst.ora)';

            crosscheck backup;
            delete noprompt obsolete;
            delete noprompt expired backup;

    release channel c1_s3;
}

Cleanup

Follow below instructions to remove or cleanup the setup:

  1. Delete the backup plan created in Step 1.
  2. Uninstall Oracle Secure Backup from the EC2 instance.
  3. Delete/Update IAM role (ec2access) to remove access from the S3 bucket used to store archive logs.
  4. Remove the cron entry from the EC2 instance configured in Step 4b.
  5. Delete the S3 bucket that was created in Step 2a to store Oracle RMAN archive log backups.

Conclusion

In this post, we demonstrate how to use AWS Backup and Oracle RMAN Archive log backup of Oracle databases running on Amazon EC2 can restore and recover efficiently to a point-in-time, without requiring an extra-step of restoring data files. Data files are restored as part of the AWS Backup EC2 instance restoration. You can leverage this solution to facilitate restoring copies of your production database for development or testing purposes, plus recover from a user error that removes data or corrupts existing data.

To learn more about AWS Backup, refer to the AWS Backup AWS Backup Documentation.

New – Amazon EC2 M1 Mac Instances

Post Syndicated from Sébastien Stormacq original https://aws.amazon.com/blogs/aws/new-amazon-ec2-m1-mac-instances/

Last year, during the re:Invent 2021 conference, I wrote a blog post to announce the preview of EC2 M1 Mac instances. I know many of you requested access to the preview, and we did our best but could not satisfy everybody. However, the wait is over. I have the pleasure of announcing the general availability of EC2 M1 Mac instances.

EC2 Mac instances are dedicated Mac mini computers attached through Thunderbolt to the AWS Nitro System, which lets the Mac mini appear and behave like another EC2 instance. It connects to your Amazon Virtual Private Cloud (Amazon VPC), boots from Amazon Elastic Block Store (EBS) volumes, and uses EBS snapshots, Amazon Machine Images (AMIs), security groups and other AWS services such as Amazon CloudWatch and AWS Systems Manager.

The availability of EC2 M1 Mac instances lets you access machines built around the Apple-designed M1 System on Chip (SoC). If you are a Mac developer and re-architecting your apps to natively support Macs with Apple silicon, you may now build and test your apps and take advantage of all the benefits of AWS. Developers building for iPhone, iPad, Apple Watch, and Apple TV will also benefit from faster builds. EC2 M1 Mac instances deliver up to 60 percent better price performance over the x86-based EC2 Mac instances for iPhone and Mac app build workloads.

For example, I tested the time it takes to clean, build, archive, and run the unit tests on a sample project I wrote. The new EC2 M1 Mac instances complete this set of tasks in 49 seconds on average. This is 47.8 percent faster than the same set of tasks running on the previous generation of EC2 Mac instances.

To see how to launch an EC2 M1 Mac instance from the AWS Management Console or the AWS Command Line Interface (CLI), I invite you to read my last blog post on the subject.

EC2 Mac M1 Instance

During the six months of the preview, we collected your feedback and fine-tuned the service to your needs.

We’ve added a new FAQ section to our documentation to get started with EC2 M1 Mac instances. Agents for management and observability, such as Systems Manager and CloudWatch, are pre-installed on all our macOS AMIs, along with tools such as the AWS Command Line Interface (CLI) and our AWS SDKs. EC2 M1 Mac instances integrate with other AWS services, such as Amazon Elastic File System (Amazon EFS) for file storage, AWS Auto Scaling, or AWS Secrets Manager.

For example, I am using Secrets Manager to securely store my build secrets, such as the signing keys and certificates used to sign my binaries before to distribute them on the App Store. From my laptop, I first make sure to export the certificate from the macOS keychain. I then upload my certificate to Secrets Manager with this command:

aws secretsmanager create-secret            \
       --name apple-signing-dev-certificate \
       --secret-binary fileb://./secrets/apple_dev_seb.p12 

On the EC2 M1 Mac instance, to prepare my instance before the build phase, I download the certificate, decode it (it is base64-encoded), and store it in the EC2 M1 Mac instance keychain, where the codesign tool will find it during the build.

# download the certificate from Secrets Manager
SIGNING_DEV_KEY=$($aws secretsmanager get-secret-value  \
      --secret-id apple-signing-dev-certificate         \
      --query SecretBinary --output text)
	  

# save the certificate as a file
echo $SIGNING_DEV_KEY | base64 -d > seb_dev_certificate.p12

# import the certificate in the keychain 
security import seb_dev_certificate.p12 \
                -P "my_cert_password"   \
                -k my.dev.keychain      \
                -T /usr/bin/security -T /usr/bin/codesign -T /usr/bin/xcodebuild

# delete the certificate from disk
rm seb_dev_certificate.p12

There are a few more configuration steps to get code signing work from the macOS command line. You can check out this presentation I made or my code repository for the details.

We are preparing a couple of events to help you learn more about EC2 M1 Mac instance use cases and configuration. First, we recently had an online webinar to learn how to take advantage of EC2 Mac instances for iOS development, content is available for you to consume on-demand after a free registration step. Second, we are preparing a one-day, in-person developer conference for later this year. The conference agenda will be packed with technical content and workshops. Stay tuned on social media to learn more about it.

Last and not least, but not related to EC2 Mac instances, the Apple WWDC 2022 conference took place last month, from June 6–8, 2022, and the content is available online. This is a great occasion to learn more about development for Apple systems in general.

And now, go build 😉

— seb

Configuring low latency connectivity between AWS Outposts rack and on-premises data using CoIP

Post Syndicated from Sheila Busser original https://aws.amazon.com/blogs/compute/configuring-low-latency-connectivity-between-aws-outposts-rack-using-coip-and-on-premises-data/

This blog post is written by, Leonardo Azize Martins, Cloud Infrastructure Architect, Professional Services.

AWS Outposts rack enables applications that need to run on-premises due to low latency, local data processing, or local data storage needs by connecting Outposts rack to your on-premises network via the local gateway (LGW).

Each Outpost rack includes a local gateway to provide low latency connectivity between the Outpost and any local data sources, end users, local machinery and equipment, or local databases. If you have an Outpost rack, you can include a local gateway as the target in your VPC subnet route table where the destination is your on-premises network. Local gateways are only available for Outposts rack and can only be used in route tables where the VPC has been associated with an LGW.

In a previous blog post on connecting AWS Outposts to on-premises data sources, you learned the different use cases to use AWS Outposts connected with your local network. In this post, you will dive deep into local gateway usage and specific details about it. You will learn how to use Outpost local gateway when it is configured as Customer-owned IP addresses (CoIP) mode. You will also learn how to integrate it with your Amazon Virtual Private Cloud (Amazon VPC) and how different routes work regarding Amazon Elastic Compute Cloud (Amazon EC2) instances running on Outposts.

Overview of solution

The primary role of a local gateway is to provide connectivity from an Outpost to your local on-premises LAN. It also provides AWS Outposts connectivity to the internet through your on-premises network via the LGW, so you don’t need to rely on an internet gateway (IGW). The local gateway can also provide a data plane path back to the AWS Region. If you already have connectivity between your LAN and the Region through AWS Site-to-Site VPN or AWS Direct Connect, you can use the same path to connect from the Outpost to the AWS Region privately.

Outpost subnet

Public subnets and private subnets are important concepts to understand for Outposts networking. A public subnet has a route to an internet gateway. The same concept applies to Outpost subnets, when a public subnet exists on Outposts, it will have a route to an internet gateway, which will use a service link as a communication path between an Outpost and the internet gateway in the parent AWS Region.

Outpost public route via internet gateway

A private subnet does not have a direct route to the internet gateway. It will only be local inside the VPC, or it could have a route to a Network Address Translation (NAT) gateway. In both cases, the communication between Outpost subnets and AWS Region subnets will be done via the service link.

Outpost private route via NAT gateway

Your subnet can be private and only be allowed to communicate with your on-premises network. You just need a route pointing to the LGW.

Outpost private route via LGW

You can also provide internet connectivity to your Outpost subnets via LGW. In this case, it will not use the service link. As soon as it traverses the LGW and goes to your next hop, it will follow your routing flow to the internet.

Outpost public route via LGW

Routing

By default, every Outpost subnet inherits the main route table from its VPC. You can create a custom route table and associate it with an Outpost subnet. You can include a local gateway as the target when the destination is your on-premises network. A local gateway can only be used in VPC and subnet route tables that are exclusively associated with an Outpost subnet. If the route table is associated with an Outpost subnet and a Region subnet, it will not allow you to add a local gateway as the target.

Error message: addition of a local gateway as the target is denied

Local gateways are also not supported in the main route table.

Error message: routes that target local gateways not supported in main route table

The local gateway advertises Outpost IP address ranges to your on-premises network via BGP. In the other direction, from an on-premises network to the Outpost, it doesn’t use BGP, which means there is no propagation. You need to configure your VPC route table with static routes.

As of this writing, the LGW does not support jumbo frame.

Outposts IP addresses

Outposts can be configured in customer-owned IP (CoIP) mode.

Customer-owned IP

During the installation process, uses information that you provide about your on-premises network to create an address pool, which is known as a customer-owned IP address pool (CoIP pool). AWS then assigns it to the local gateway for use and advertises back to your on-premises network through BGP.

CoIP addresses provide local or external connectivity to resources in your Outpost subnets through your on-premises network. You can assign these IP addresses to resources on your Outpost, such as an EC2 instance, by allocating a new Elastic IP address from the customer-owned IP pool and then assigning this new Elastic IP address to your EC2 instance.

A local gateway serves as NAT for EC2 instances that have been assigned addresses from your customer-owned IP pool.

You can optionally share your customer-owned pool with multiple AWS accounts in your AWS Organizations using the AWS Resource Access Manager (RAM). After you share the pool, participants can allocate and associate Elastic IP addresses from the customer-owned IP pool.

Communication between your Outpost and on-premises network will use the CoIP Elastic IP addresses to address instances in the Outpost; the VPC CIDR range is not used.

Walkthrough

You will follow the steps required to configure your VPC to use LGW configured as CoIP, including:

  • Associate your VPC with the LGW route table.
  • Create an Outposts subnet.
  • Create and associate the VPC route table with the subnet.
  • Add a route to on-premises network with LGW as the target.

Prerequisites

For this walkthrough, you should have the following prerequisites:

  • An AWS account.
  • An Outpost that consists of one or more Outposts racks configured in CoIP mode.

Associate VPC with LGW route table

Use the following procedure to associate a VPC with the LGW route table. You can’t associate VPCs that have a CIDR block conflict.

  1. Open the AWS Outposts console.
  2. In the navigation pane, choose Local gateway route tables.
  3. Select the route table and then choose Actions, Associate VPC.
  4. For VPC ID, select the VPC to associate with the local gateway route table.
  5. Choose Associate VPC.

Create an Outpost subnet

You can add Outpost subnets to any VPC in the parent AWS Region for the Outpost. When you do so, the VPC also spans the Outpost.

  1. Open the AWS Outposts console.
  2. On the navigation pane, choose Outposts.
  3. Select the Outpost, and then choose Actions, Create subnet.
  4. Select the VPC and specify an IP address range for the subnet.
  5. Choose Create.

Create and associate VPC route table with the Outpost subnet

You can create a custom route table for your VPC using the Amazon VPC console. It is a best practice to have one specific route table for each subnet.

  1. Open the Amazon VPC console.
  2. In the navigation pane, choose Route Tables.
  3. Choose Create route table.
  4. For VPC, choose your VPC.
  5. Choose Create.
  6. On the Subnet associations tab, choose Edit subnet associations.
  7. Select the check box for the subnet to associate with the route table and then choose Save associations.

Add a route to on-premises network with LGW as the target

You can add a route to a route table using the Amazon VPC console.

  1. Open the Amazon VPC console.
  2. In the navigation pane, choose Route Tables and select
  3. Choose Actions, Edit routes.
  4. Choose Add route. For Destination, enter the destination CIDR block, a single IP address, or the ID of a prefix list.
  5. Choose Save routes.

Allocate and associate a customer-owned IP address

  1. Open the Amazon EC2 console.
  2. In the navigation pane, choose Elastic IPs.
  3. Choose Allocate new address.
  4. For Network Border Group, select the location from which the IP address is advertised.
  5. For Public IPv4 address pool, choose Customer owned IPv4 address pool.
  6. For Customer owned IPv4 address pool, select the pool that you configured.
  7. Choose Allocate and close the confirmation screen.
  8. In the navigation pane, choose Elastic IPs.
  9. Select an Elastic IP address and choose Actions, Associate address.
  10. Select the instance from Instance and then choose Associate.

For more information about Launch an instance on your Outpost, refer to the AWS Outposts User Guide.

Allocate Elastic IP address

Cleaning up

To avoid incurring future charges, delete the resources, like EC2 instances.

Conclusion

In this post, I covered how to use the Outposts rack local gateway to communicate with your on-premises network. You learned how a subnet route table can influence the connectivity of public or private Outpost instances.

To learn more, check out our Outposts local gateway documentation and the networking reference architecture.

Understanding the lifecycle of Amazon EC2 Dedicated Hosts

Post Syndicated from Sheila Busser original https://aws.amazon.com/blogs/compute/understanding-the-lifecycle-of-amazon-ec2-dedicated-hosts/

This post is written by Benjamin Meyer, Sr. Solutions Architect, and Pascal Vogel, Associate Solutions Architect.

Amazon Elastic Compute Cloud (Amazon EC2) Dedicated Hosts enable you to run software on dedicated physical servers. This lets you comply with corporate compliance requirements or per-socket, per-core, or per-VM licensing agreements by vendors, such as Microsoft, Oracle, and Red Hat. Dedicated Hosts are also required to run Amazon EC2 Mac Instances.

The lifecycles and states of Amazon EC2 Dedicated Hosts and Amazon EC2 instances are closely connected and dependent on each other. To operate Dedicated Hosts correctly and consistently, it is critical to understand the interplay between Dedicated Hosts and EC2 Instances. In this post, you’ll learn how EC2 instances are reliant on their (dedicated) hosts. We’ll also dive deep into their respective lifecycles, the connection points of these lifecycles, and the resulting considerations.

What is an EC2 instance?

An EC2 instance is a virtual server running on top of a physical Amazon EC2 host. EC2 instances are launched using a preconfigured template called Amazon Machine Image (AMI), which packages the information required to launch an instance. EC2 instances come in various CPU, memory, storage and GPU configurations, known as instance types, to enable you to choose the right instance for your workload. The process of finding the right instance size is known as right sizing. Amazon EC2 builds on the AWS Nitro System, which is a combination of dedicated hardware and the lightweight Nitro hypervisor. The EC2 instances that you launch in your AWS Management Console via Launch Instances are launched on AWS-controlled physical hosts.

What is an Amazon EC2 Bare Metal instance?

Bare Metal instances are instances that aren’t using the Nitro hypervisor. Bare Metal instances provide direct access to physical server hardware. Therefore, they let you run legacy workloads that don’t support a virtual environment, license-restricted business-critical applications, or even your own hypervisor. Workloads on Bare Metal instances continue to utilize AWS Cloud features, such as Amazon Elastic Block Store (Amazon EBS), Elastic Load Balancing (ELB), and Amazon Virtual Private Cloud (Amazon VPC).

What is an Amazon EC2 Dedicated Host?

An Amazon EC2 Dedicated Host is a physical server fully dedicated to a single customer. With visibility of sockets and physical cores of the Dedicated Host, you can address corporate compliance requirements, such as per-socket, per-core, or per-VM software licensing agreements.

You can launch EC2 instances onto a Dedicated Host. Instance families such as M5, C5, R5, M5n, C5n, and R5n allow for the launching of different instance sizes, such as4xlarge and 8xlarge, to the same host. Other instance families only support a homogenous launching of a single instance size. For more details, see Dedicated Host instance capacity.

As an example, let’s look at an M6i Dedicated Host. M6i Dedicated Hosts have 2 sockets and 64 physical cores. If you allocate a M6i Dedicated Host, then you can specify what instance type you’d like to support for allocation. In this case, possible instance sizes are:

  • large
  • xlarge
  • 2xlarge
  • 4xlarge
  • 8xlarge
  • 12xlarge
  • 16xlarge
  • 24xlarge
  • 32xlarge
  • metal

The number of instances that you can launch on a single M6i Dedicated Host depends on the selected instance size. For example:

  • In the case of xlarge (4 vCPUs), a maximum of 32 m6i.xlarge instances can be scheduled on this Dedicated Host.
  • In the case of 8xlarge (32 vCPUs), a maximum of 4 m6i.8xlarge instances can be scheduled on this Dedicated Host.
  • In the case of metal (128 vCPUs), a maximum of 1 m6i.metal instance can be scheduled on this Dedicated Host.

When launching an EC2 instance on a Dedicated Host, you’re billed for the Dedicated Host but not for the instance. The cost for Amazon EBS volumes is the same as in the case of regular EC2 instances.

Exemplary homogenious M6i Dedicated Host shown with 32 m6i.xlarge, four m6i.8xlarge and one m6i.metal each.

Exemplary M6i Dedicated Host instance selections: m6i.xlarge, m6i.8xlarge and m6i.metal

Understanding the EC2 instance lifecycle

Amazon EC2 instance lifecycle states and transitions

Throughout its lifecycle, an EC2 instance transitions through different states, starting with its launch and ending with its termination. Upon Launch, an EC2 instance enters the pending state. You can only launch EC2 instances on Dedicated Hosts in the available state. You aren’t billed for the time that the EC2 instance is in any state other than running. When launching an EC2 instance on a Dedicated Host, you’re billed for the Dedicated Host but not for the instance. Depending on the user action, the instance can transition into three different states from the running state:

  1. Via Reboot from the running state, the instance enters the rebooting state. Once the reboot is complete, it reenters the running state.
  2. In the case of an Amazon EBS-backed instance, a Stop or Stop-Hibernate transitions the running instance into the stopping state. After reaching the stopped state, it remains there until further action is taken. Via Start, the instance will reenter the pending and subsequently the running state. Via Terminate from the stopped state, the instance will enter the terminated state. As part of a Stop or Stop-Hibernate and subsequent Start, the EC2 instance may move to a different AWS-managed host. On Reboot, it remains on the same AWS-managed host.
  3. Via Terminate from the running state, the instance will enter the shutting-down state, and finally the terminated state. An instance can’t be started from the terminated state.

Understanding the Amazon EC2 Dedicated Host lifecycle

A diagram of the the Amazon EC2 Dedicated Host lifecycle states and transitions between them.

Amazon EC2 Dedicated Host lifecycle states and transitions

An Amazon EC2 Dedicated Host enters the available state as soon as you allocate it in your AWS account. Only if the Dedicated Host is in the available state, you can launch EC2 instances on it. You aren’t billed for the time that your Dedicated Host is in any state other than available. From the available state, the following states and state transitions can be reached:

  1. You can Release the Dedicated Host, transitioning it into the released state. Amazon EC2 Mac Instances Dedicated Hosts have a minimum allocation time of 24h. They can’t be released within the 24h. You can’t release a Dedicated Host that contains instances in one of the following states: pending, running, rebooting, stopping, or shutting down. Consequently, you must Stop or Terminate any EC2 instances on the Dedicated Host and wait until it’s in the available state before being able to release it. Once an instance is in the stopped state, you can move it to a different Dedicated Host by modifying its Instance placement configuration.
  2. The Dedicated Host may enter the pending state due to a number of reasons. In case of an EC2 Mac instance, stopping or terminating a Mac instance initiates a scrubbing workflow of the underlying Dedicated Host, during which it enters the pending state. This scrubbing workflow includes tasks such as erasing the internal SSD, resetting NVRAM, and more, and it can take up to 50 minutes to complete. Additionally, adding or removing a Dedicated Host to or from a Resource Group can cause the Dedicated Host to go into the pending state. From the pending state, the Dedicated Host will reenter the available state.
  3. The Dedicated Host may enter the under-assessment state if AWS is investigating a possible issue with the underlying infrastructure, such as a hardware defect or network connectivity event. While the host is in the under-assessment state, all of the EC2 instances running on it will have the impaired status. Depending on the nature of the underlying issue and if it’s configured, the Dedicated Host will initiate host auto recovery.

If Dedicated Host Auto Recovery is enabled for your host, then AWS attempts to restart the instances currently running on a defect Dedicated Host on an automatically allocated replacement Dedicated Host without requiring your manual intervention. When host recovery is initiated, the AWS account owner is notified by email and by an AWS Health Dashboard event. A second notification is sent after the host recovery has been successfully completed. Initially, the replacement Dedicated Host is in the pending state. EC2 instances running on the defect dedicated Host remain in the impaired status throughout this process. For more information, see the Host Recovery documentation.

Once all of the EC2 instances have been successfully relaunched on the replacement Dedicated Host, it enters the available state. Recovered instances reenter the running state. The original Dedicated Host enters the released-permanent-failure state. However, if the EC2 instances running on the Dedicated Host don’t support host recovery, then the original Dedicated Host enters the permanent-failure state instead.

Conclusion

In this post, we’ve explored the lifecycles of Amazon EC2 instances and Amazon EC2 Dedicated Hosts. We took a close look at the individual lifecycle states and how both lifecycles must be considered in unison to operate EC2 Instances on EC2 Dedicated Hosts correctly and consistently. To learn more about operating Amazon EC2 Dedicated Hosts, visit the EC2 Dedicated Hosts User Guide.

Use AWS Nitro Enclaves to perform computation of multiple sensitive datasets

Post Syndicated from Sheila Busser original https://aws.amazon.com/blogs/compute/leveraging-aws-nitro-enclaves-to-perform-computation-of-multiple-sensitive-datasets/

This blog post is written by, Jeff Wisman, Principal Solutions Architect and Andrew Lee, Solutions Architect.

Introduction

Many organizations have sensitive datasets that they do not want to share with others because of stringent security and compliance requirements. However, they would still like to use each other’s data to perform processing and aggregation. For example, B2B (business to business) companies often want to augment their customer information dataset with additional demographic or psychographic signals. This enrichment of data is often done by one party sending customer information to be matched against another party’s data universe. Naturally, privacy and the revealing of business-critical customer information to an external entity is a major concern here. In this blog, we present a solution where multiple parties can choose to give an isolated compute environment access to their encrypted data to be decrypted and processed in a secure way using AWS Nitro Enclaves.

Designing and building your own secure private computing solution can be challenging, with few out-of-the-box solutions. Our sample application uses Nitro Enclaves, which support the creation of an isolated execution environment called an enclave and a cryptographic attestation process for generating and validating the enclave’s identity. The attestation process makes it possible to ensure only authorized code is running, as well as integration with the AWS Key Management Service (AWS KMS), so that only enclaves that you choose can access sensitive data. Nitro Enlaves enables customers to focus more on their application instead of worrying about integration with external services. While many enterprise use cases involve complex datasets, we’ll use a hypothetical scenario to learn the fundamentals of how this works. The example proof of concept (POC) application will be centered around a third-party bidding service for real estate transactions. Buyers will submit encrypted bids to the application. Once all the bids have been entered, the application will decrypt the bids, determine the highest bidder, and return a result without disclosing the actual bid amounts to any party.

How it works

The POC will be deployed across three AWS accounts, one each for buyer-1, buyer-2, and the bidding service. The bidding service will be run in an enclave on the bidding service’s account.

  1. The bidding service will generate a set of measurements called platform configuration registers (PCRs) from the application code that uses the attestation process. PCRs are cryptographic measurements that are unique to an enclave. An attestation document can be used to verify the identity of the enclave and establish trust.
  2. The buyers will each generate their own AWS KMS key and use AWS KMS to authorize cryptographic requests from the bidding service enclave based on PCR values in the attestation document.
  3. The buyers will place their bids into a file, encrypt the file using AWS KMS, and store them in their own Amazon Simple Storage Service (Amazon S3)
  4. The bidding service will run the application, which will retrieve the encrypted bids from each buyer’s S3 bucket, decrypt the bids, calculate the highest bidder, and store the result in an S3 bucket.

Overall workflow of POC

Implementation

Let’s take a deeper dive into the steps involved in implementing this POC. To deploy the POC to your environment, follow the instructions in the AWS Nitro Enclaves Bidding Service GitHub project.

Enclave image generation

The first step is for the bidding service to launch the parent instance that will host the enclave. Refer to Launch the parent instance for more information about this process. The two real estate buyers, which we will call buyer-1 and buyer-2, will need to review the application code of the enclave application and agree that their data will not be exposed outside the enclave. Once they agree on the code, the bidding service generates the enclave image and a set of measurements as part of the attestation process. The buyers should also perform this process to ensure that the enclave image was generated and its measurements are from the agreed-upon application code. During the generation process, a unique set of measurements is taken of the application, which will make up its identity. When the enclave makes a request to decrypt data with AWS KMS, those measurements will be included in an attestation document to prove the enclave’s identity. Access policies in AWS KMS can then grant access to that identity. An example of a set of measurements is shown here:

Enclave Image successfully created.
{ "Measurements": { "HashAlgorithm": "Sha384 { ... }",
"PCR0":"287b24930a9f0fe14b01a71ecdc00d8be8fad90f9834d547158854b8279c74095c43f8d7f047714e98deb7903f20e3dd",
"PCR1":"aca6e62ffbf5f7deccac452d7f8cee1b94048faf62afc16c8ab68c9fed8c38010c73a669f9a36e596032f0b973d21895",
"PCR2":"0315f483ae1220b5e023d8c80ff1e135edcca277e70860c31f3003b36e3b2aaec5d043c9ce3a679e3bbd5b3b93b61d6f"
} }

Preparing encrypted data

Each buyer will create an AWS KMS key and use that key to encrypt their bids. The encrypted bids will be stored in their respective S3 buckets. Because AWS KMS integrates with Nitro Enclaves to provide built-in attestation support, each buyer can add the PCR values generated earlier as a condition to their respective AWS KMS key policies. This will ensure that only the enclave application code agreed upon by both buyers will have access to utilize the keys for decryption. The following is an example of a KMS key policy with PCR values as a condition:

{
  "Version": "2012-10-17",
  "Id": "key-default-1",
  "Statement": [{
    "Sid": "Enable decrypt from enclave",
    "Effect": "Allow",
    "Principal": < PARENT INSTANCE ROLE ARN > ,
    "Action": "kms:Decrypt",
    "Resource": "",
    "Condition": {
      "StringEqualsIgnoreCase": {
        "kms:RecipientAttestation:ImageSha384": "<PCR0 VALUE FROM BUILDING ENCLAVE IMAGE>",
        "kms:RecipientAttestation:PCR1":"<PCR1 VALUE FROM BUILDING ENCLAVE IMAGE>",
        "kms:RecipientAttestation:PCR2":"<PCR2 VALUE FROM BUILDING ENCLAVE IMAGE>"
      }
    }
  }]
}

The previous example only shows a key policy that uses PCR0, PCR1, and PCR2. You can further scope down the permissions by adding additional PCR values, for instance, role, parent instance ID, and a signing certificate for the enclave image. Refer to the AWS Nitro Enclaves User Guide for more details about PCR values.

Running the POC

The bidding service will run the enclave image generated earlier on the parent instance. The application runs as two parts, one part on the parent instance and another in the enclave. Communication between the parent and the enclave is done through a vsock connection. An AWS KMS proxy is also used on the parent to allow communication between the enclave and AWS KMS for decrypting data. The parent application will retrieve the encrypted bids from each buyer’s S3 bucket and send them to the enclave. The enclave will decrypt the data using both buyers’ AWS KMS keys and present attestation documents signed by the Nitro Hypervisor. AWS KMS will validate that the PCR values in the attestation documents match the key policy before performing the decryption. The decryption event will be logged in AWS CloudTrail for auditing purposes. Once the encrypted bids are decrypted, the values are compared, and the winning buyer is recorded in the result. The unencrypted result is then returned to the parent and written to an S3 bucket in the bidding service’s account. A diagram of this process is shown in Figure 2.

Cleanup

Be sure you delete all the resources that were created when following the included Github project:

  • Bidding service EC2 instance EC2 instance IAM role and policy
  • AWS KMS Customer managed keys
  • S3 Buckets for storing encrypted files

Conclusion

In this blog post, we introduced a sample POC utilizing Nitro Enclaves to allow a bidding service to process two parties’ encrypted data without revealing their data to any party. We did this by ensuring access to sensitive data is only allowed from an application running within an enclave. With the straightforward integration of AWS KMS and the attestation process, customers can quickly develop applications on Nitro Enclaves that can enable computing on encrypted datasets from multiple accounts. For more information on AWS Nitro Enclaves, see the official product documentation or the introductory videos on YouTube.

Continually assessing application resilience with AWS Resilience Hub and AWS CodePipeline

Post Syndicated from Scott Bryen original https://aws.amazon.com/blogs/architecture/continually-assessing-application-resilience-with-aws-resilience-hub-and-aws-codepipeline/

As customers commit to a DevOps mindset and embrace a nearly continuous integration/continuous delivery model to implement change with a higher velocity, assessing every change impact on an application resilience is key. This blog shows an architecture pattern for automating resiliency assessments as part of your CI/CD pipeline. Automatically running a resiliency assessment within CI/CD pipelines, development teams can fail fast and understand quickly if a change negatively impacts an applications resilience. The pipeline can stop the deployment into further environments, such as QA/UAT and Production, until the resilience issues have been improved.

AWS Resilience Hub is a managed service that gives you a central place to define, validate and track the resiliency of your AWS applications. It is integrated with AWS Fault Injection Simulator (FIS), a chaos engineering service, to provide fault-injection simulations of real-world failures. Using AWS Resilience Hub, you can assess your applications to uncover potential resilience enhancements. This will allow you to validate your applications recovery time (RTO), recovery point (RPO) objectives and optimize business continuity while reducing recovery costs. Resilience Hub also provides APIs for you to integrate its assessment and testing into your CI/CD pipelines for ongoing resilience validation.

AWS CodePipeline is a fully managed continuous delivery service for fast and reliable application and infrastructure updates. You can use AWS CodePipeline to model and automate your software release processes. This enables you to increase the speed and quality of your software updates by running all new changes through a consistent set of quality checks.

Continuous resilience assessments

Figure 1 shows the resilience assessments automation architecture in a multi-account setup. AWS CodePipeline, AWS Step Functions, and AWS Resilience Hub are defined in your deployment account while the application AWS CloudFormation stacks are imported from your workload account. This pattern relies on AWS Resilience Hub ability to import CloudFormation stacks from a different accounts, regions, or both, when discovering an application structure.

High-level architecture pattern for automating resilience assessments

Figure 1. High-level architecture pattern for automating resilience assessments

Add application to AWS Resilience Hub

Begin by adding your application to AWS Resilience Hub and assigning a resilience policy. This can be done via the AWS Management Console or using CloudFormation. In this instance, the application has been created through the AWS Management Console. Sebastien Stormacq’s post, Measure and Improve Your Application Resilience with AWS Resilience Hub, walks you through how to add your application to AWS Resilience Hub.

In a multi-account environment, customers typically have dedicated AWS workload account per environment and we recommend you separate CI/CD capabilities into another account. In this post, the AWS Resilience Hub application has been created in the deployment account and the resources have been discovered using an CloudFormation stack from the workload account. Proper permissions are required to use AWS Resilience Hub to manage application in multiple accounts.

Adding application to AWS Resilience Hub

Figure 2. Adding application to AWS Resilience Hub

Create AWS Step Function to run resilience assessment

Whenever you make a change to your application CloudFormation, you need to update and publish the latest version in AWS Resilience Hub to ensure you are assessing the latest changes. Now that AWS Step Functions SDK integrations support AWS Resilience Hub, you can build a state machine to coordinate the process, which will be triggered from AWS Code Pipeline.

AWS Step Functions is a low-code, visual workflow service that developers use to build distributed applications, automate IT and business processes, and build data and machine learning pipelines using AWS services. Workflows manage failures, retries, parallelization, service integrations, and observability so developers can focus on higher-value business logic.

AWS Step Function for orchestrating AWS SDK calls

Figure 3. AWS Step Function for orchestrating AWS SDK calls

  1. The first step in the workflow is to update the resources associated with the application defined in AWS Resilience Hub by calling ImportResourcesToDraftApplication.
  2. Check for the import process to complete using a wait state, a call to DescribeDraftAppVersionResourcesImportStatus and then a choice state to decide whether to progress or continue waiting.
  3. Once complete, publish the draft application by calling PublishAppVersion to ensure we are assessing the latest version.
  4. Once published, call StartAppAssessment to kick-off a resilience assessment.
  5. Check for the assessment to complete using a wait state, a call to DescribeAppAssessment and then a choice state to decide whether to progress or continue waiting.
  6. In the choice state, use assessment status from the response to determine if the assessment is pending, in progress or successful.
  7. If successful, use the compliance status from the response to determine whether to progress to success or fail.
    • Compliance status will be either “PolicyMet” or “PolicyBreached”.
  8. If policy breached, publish onto SNS to alert the development team before moving to fail.

Create stage within code pipeline

Now that we have the AWS Step Function created, we need to integrate it into our pipeline. The post Fine-grained Continuous Delivery With CodePipeline and AWS Step Functions demonstrates how you can trigger a step function from AWS Code Pipeline.

When adding the stage, you need to pass the ARN of the stack which was deployed in the previous stage as well as the ARN of the application in AWS Resilience Hub. These will be required on the AWS SDK calls and you can pass this in as a literal.

AWS CodePipeline stage step function input

Figure 4. AWS CodePipeline stage step function input

Example state using the input from AWS CodePipeline stage

Figure 5. Example state using the input from AWS CodePipeline stage

For more information about these AWS SDK calls, please refer to the AWS Resilience Hub API Reference documents.

Customers often run their workloads in lower environments in a less resilient way to save on cost. It’s important to add the assessment stage at the appropriate point of your pipeline. We recommend adding this to your pipeline after the deployment to a test environment which mirrors production but before deploying to production. By doing this you can fail fast and halt changes which will lower resilience in production.

A note on service quotas: AWS Resilience Hub allows you to run 20 assessments per month per application. If you need to increase this quota, please raise a ticket with AWS Support.

Conclusion

In this post, we have seen an approach to continuously assessing resilience as part of your CI/CD pipeline using AWS Resilience Hub, AWS CodePipeline and AWS Step Functions. This approach will enable you to understand fast if a change will weaken resilience.

AWS Resilience Hub also generates recommended AWS FIS Experiments that you can deploy and use to test the resilience of your application. As well as assessing the resilience, we also recommend you integrate running these tests into your pipeline. The post Chaos Testing with AWS Fault Injection Simulator and AWS CodePipeline demonstrates how you can active this.

AWS Week in Review – June 20, 2022

Post Syndicated from Steve Roberts original https://aws.amazon.com/blogs/aws/aws-week-in-review-june-20-2022/

This post is part of our Week in Review series. Check back each week for a quick roundup of interesting news and announcements from AWS!

Last Week’s Launches
It’s been a quiet week on the AWS News Blog, however a glance at What’s New page shows the various service teams have been busy as usual. Here’s a round-up of announcements that caught my attention this past week.

Support for 15 new resource types in AWS Config – AWS Config is a service for assessment, audit, and evaluation of the configuration of resources in your account. You can monitor and review changes in resource configuration using automation against a desired configuration. The newly expanded set of types includes resources from Amazon SageMaker, Elastic Load Balancing, AWS Batch, AWS Step Functions, AWS Identity and Access Management (IAM), and more.

New console experience for AWS Budgets – A new split-view panel allows for viewing details of a budget without needing to leave the overview page. The new panel will save you time (and clicks!) when you’re analyzing performance across a set of budgets. By the way, you can also now select multiple budgets at the same time.

VPC endpoint support is now available in Amazon SageMaker Canvas SageMaker Canvas is a visual point-and-click service enabling business analysts to generate accurate machine-learning (ML) models without requiring ML experience or needing to write code. The new VPC endpoint support, available in all Regions where SageMaker Canvas is suppported, eliminates the need for an internet gateway, NAT instance, or a VPN connection when connecting from your SageMaker Canvas environment to services such as Amazon Simple Storage Service (Amazon S3), Amazon Redshift, and more.

Additional data sources for Amazon AppFlow – Facebook Ads, Google Ads, and Mixpanel are now supported as data sources, providing the ability to ingest marketing and product analytics for downstream analysis in AppFlow-connected software-as-a-service (SaaS) applications such as Marketo and Salesforce Marketing Cloud.

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

Other AWS News
Some other updates you may have missed from the past week:

Amazon Elastic Compute Cloud (Amazon EC2) expanded the Regional availability of AWS Nitro System-based C6 instance types. C6gn instance types, powered by Arm-based AWS Graviton2 processors, are now available in the Asia Pacific (Seoul), Europe (Milan), Europe (Paris), and Middle East (Bahrain) Regions, while C6i instance types, powered by 3rd generation Intel Xeon Scalable processors, are now available in the Europe (Frankfurt) Region.

As a .NET and PowerShell Developer Advocate here at AWS, there are some news and updates related to .NET I want to highlight:

Upcoming AWS Events
The AWS New York Summit is approaching quickly, on July 12. Registration is also now open for the AWS Summit Canberra, an in-person event scheduled for August 31.

Microsoft SQL Server users may be interested in registering for the SQL Server Database Modernization webinar on June 21. The webinar will show you how to go about modernizing and how to cost-optimize SQL Server on AWS.

Amazon re:MARS is taking place this week in Las Vegas. I’ll be there as a host of the AWS on Air show, along with special guests highlighting their latest news from the conference. I also have some On Air sessions on using our AI services from .NET lined up! As usual, we’ll be streaming live from the expo hall, so if you’re at the conference, give us a wave. You can watch the show live on Twitch.tv/aws, Twitter.com/AWSOnAir, and LinkedIn Live.

A reminder that if you’re a podcast listener, check out the official AWS Podcast Update Show. There is also the latest installment of the AWS Open Source News and Updates newsletter to help keep you up to date.

No doubt there’ll be a whole new batch of releases and announcements from re:MARS, so be sure to check back next Monday for a summary of the announcements that caught our attention!

— Steve

Implementing Attribute-Based Instance Type Selection using Terraform

Post Syndicated from Sheila Busser original https://aws.amazon.com/blogs/compute/implementing-attribute-based-instance-type-selection-using-terraform/

This blog post is written by Christian Melendez, Senior Specialist Solutions Architect, Flexible Compute – EC2 Spot and Carlos Manzanedo Rueda, WW SA Leader, Flexible Compute – EC2 Spot.

In this blog post we will cover the release of Terraform support for Attribute-Based Instance Type Selection (ABS). ABS simplifies the configuration required to acquire compute capacity for Instance Flexible workloads. Terraform  is an open-source infrastructure as code software tool by HashiCorp. Hashicorp is an AWS Partner Network (APN) Advanced Technology Partner and member of the AWS DevOps Competency.

Introduction

Amazon EC2 provides a wide selection of instance types optimized to fit different use cases. Instance types comprise varying combinations of CPU, memory, storage, and networking capacity and give you the flexibility to choose the appropriate mix of resources for your applications.

Workloads such as continuous integration, analytics, microservices on containers, etc., can use multiple instance types. Customers have been telling us that simplifying the configuration of instance flexible workloads is important. For workloads that are instance flexible, AWS released ABS to express workload requirements as a set of instance attributes such as: vCPU, memory, type of processor, etc. ABS translates these requirements and selects all matching instance types that meet the criteria. To select which instance to launch, Amazon EC2 Auto Scaling Groups and EC2 Fleet chose instances based on the allocation strategy configured. Lowest-price allocation strategy is supported on both Amazon EC2 On-Demand Instances and Amazon EC2 Spot Instances. The recommendation for Spot Instances is to use capacity-optimized, which select the optimal instances that reduce the frequency of interruptions. ABS does also future-proof EC2 Auto Scaling Group and EC2 Fleet configurations: any new instance type we launch that matches the selected attributes, will be included in the list automatically. No need to update your EC2 Auto Scaling Group or EC2 Fleet configuration.

Following our commitment to open-source projects, AWS has added support for ABS in the AWS Terraform provider. You can use ABS for launch templates, EC2 Auto Scaling Group, and EC2 Fleet resources. The minimum required version of the AWS provider is v4.16.0.

Applying instance flexibility is key for running fault-tolerant, elastic, reliable, and cost optimized workloads. By selecting a diversified choice of instances that qualify for your workload, your application will be better prepared to avoid scenarios where lack of capacity on a specific instance type could be an issue. This applies both to On-Demand and Spot Instance-based workloads. For Spot Instances, applying diversification is key. Spot Instances are spare capacity that can be reclaimed by EC2 when it is required. ABS allows you to specify diversification in simple terms, allowing EC2 Auto Scaling Group and EC2 Fleet’s allocation strategy to replace reclaimed Spot Instances with instances from other pools where capacity is available.

Instance Requirement Attributes

To represent the instance requirements for your workload using ABS, there are a set of attributes you can use within the instance_requirements block. When using Terraform, the only two required attributes are memory_mib and vcpu_count. The rest of the attributes provide default values that adhere to Instance Flexible workloads best practices. For example, bare_metal attribute is by default excluded. You can see the full list of ABS attributes in the Terraform docs site.

Once ABS attributes are configured, ABS picks a list of instance types that match the criteria. This list is especially important when you’re using Spot Instances. One of Spot Instances best practices is to diversify the instance types which, in combination with the capacity-optimized allocation strategy, gives you access to the highest amount of Spot capacity pools. For On-Demand Instances, the instance types list is important as well. There might be scenarios where On-Demand Instance pools lack capacity. By applying instance flexibility using ABS, you can avoid the InsufficientInstanceCapacity error. And in combination with the lowest-price allocation strategy, you get the lowest price instance types from your diversified selection.

There are different places where we can specify ABS attributes. We can specify them at the launch templates level and declare a base mechanism to select instances. In most cases, the recommendation is to configure ABS attributes at the EC2 Auto Scaling Group and EC2 Fleet level. Let’s explore each of these options.

Configuring instance requirements within launch templates

Launch templates are instance configuration templates where you specify parameters like AMI ID, instance type, key pair, and security groups to launch instances. You can use ABS attributes in a launch template when you need to be prescriptive, and define sane defaults or guardrails for your workloads. This way, EC2 Auto Scaling Group or EC2 Fleet simply reference and use the launch template.

You should use ABS attributes in launch templates when you want to prevent users from overriding the resources specified by a launch template. Note that is still possible to override those requirements.

Let’s say that we have a Java application that requires a minimum of 4 vCPUs and 8 GiB of memory, and has been using the c5.xlarge instance type. After performance testing we’ve identified that runs better with current instance types generations. The following code snippet represents how to define these requirements in a Launch Template. To see the full list of attributes, visit the launch template doc site.

resource "aws_launch_template" "abs" {
  name_prefix = "abs"
  image_id    = data.aws_ami.abs.id

  instance_requirements {
    memory_mib {
      min = 8192
    }
    vcpu_count {
      min = 4
    }
    instance_generations = ["current"]
  }
}

Note by using instance_requirements block in a LaunchTemplate, you’ll need to use the mixed_instances_policy block in the EC2 Auto Scaling Group.

resource "aws_autoscaling_group" "on_demand" {
  availability_zones = ["us-east-1a", "us-east-1b", "us-east-1c"]
  max_size           = 1
  min_size           = 1
  mixed_instances_policy {
    launch_template {
      launch_template_specification {
        launch_template_id = aws_launch_template.abs.id
      }
    }
  }
}

The EC2 Auto Scaling Group will use instance types that match the requirements in the launch template.

You can preview what are the instances that the EC2 Auto Scaling Group will select. The section “How to Preview Matching Instances without Launching Them” of the ABS blog post, describes how to preview the instances that will be selected.

Launching EC2 Spot Instances with EC2 Auto Scaling

Launch templates are very powerful. They allow you to decouple the attributes such as user_data from the actual instance management. They are idempotent and can be versioned which is key for rolling out changes in the configuration and applying EC2 Auto Scaling Group Instance Refresh.

Our recommendation is to define ABS attributes as overrides within the mixed_instances_policy block in EC2 Auto Scaling Groups. For most of the applications, we recommend using EC2 Auto Scaling Group to provision EC2 instances.

Let’s get back to our previous example. Now we want to be more prescriptive for the instance requirements our Java application uses.

Let’s assume that the Java application is memory intensive, requires a vCPU to Memory ratio of 4GB for every vCPU, and has been using the m5.large instance type. Additionally, it does not need hardware accelerators like GPUs, FPGAs, etc. Our application does also require a minimum of 2 vCPUs, and the range of memory has been reduced to 32GB to avoid large garbage collection scenarios. This time, we’d like to launch only Spot Instances. As mentioned earlier in this blog post, diversification is key for Spot. To improve the experience with Spot, let’s enable the capacity rebalance feature from the EC2 Auto Scaling Group to proactively replace instances that are at an elevated risk of being interrupted. The following code snippet represents the ABS attributes we need for the more prescriptive workload:

resource “aws_autoscaling_group” “spot” {
  availability_zones = [“us-east-1a”, “us-east-1b”, “us-east-1c”]
  desired_capacity   = 1
  max_size           = 1
  min_size           = 1
  capacity_rebalance = true

  mixed_instances_policy {
    instances_distribution {
      spot_allocation_strategy = "capacity-optimized"
    }

    launch_template {
      launch_template_specification {
        launch_template_id = aws_launch_template.x86.id
      }

      override {
        instance_requirements {
          memory_mib {
            min = 4096
            max = 32768
          }

          vcpu_count {
            min = 2
          }

          memory_gib_per_vcpu {
            min = 4
            max = 4
          }

          accelerator_count {
            max = 0
          }
        }
      }
    }
  }
}

Launching EC2 Spot Instances with EC2 Fleet

Another method we have to launch EC2 instances is the EC2 Fleet API. We recommend to use EC2 Fleet for workloads that need granular controls to provision capacity. For example, tight HPC workloads where instances must be close together (single Availability Zone and within the same placement group) and need similar instance types. EC2 Fleet is also used by capacity orchestrators such as Karpenter or Atlassian Escalator that implement tuned up and optimized logic to provision capacity.

Let’s say that this time the workload is a CPU bound workload and has been using the c5.9xlarge instance type. The workload can be retried and the application supports checkpointing, so it qualifies to use Spot Instances. Given we’ll be using Spot Instances, we would like to benefit from the capacity rebalance feature as we did before in the EC2 Auto Scaling Group example. The application requires very prescriptive ranges of vCPU and memory and we also need a minimum of 100 GB SSD local storage. While in most cases EC2 Auto Scaling Groups are appropriate solution to procure and maintain capacity, in this case we will use EC2 Fleet.

The following code snippet represents the ABS attributes we need for this workload:

resource "aws_ec2_fleet" "spot" {
  target_capacity_specification {
    default_target_capacity_type = "spot"
    total_target_capacity        = 5
  }

  spot_options {
    allocation_strategy = "capacity-optimized"
    maintenance_strategies {
      capacity_rebalance {
        replacement_strategy = "launch"
      }
    }
  }

  launch_template_config {
    launch_template_specification {
      launch_template_id = aws_launch_template.x86.id
      version            = aws_launch_template.x86.latest_version
    }

    override {
      instance_requirements {
        memory_mib {
          min = 65536
          max = 73728
        }

        vcpu_count {
          min = 32
          max = 36
        }

        cpu_manufacturers   = ["intel"]
        local_storage       = "required"
        local_storage_types = ["ssd"]

        total_local_storage_gb {
          min = 100
        }
      }
    }
  }
}

Multi-Architecture workloads using Graviton and x86 with EC2 Auto Scaling Groups

Another recent feature from EC2 Auto Scaling Groups is that you can build multi-architecture workloads. EC2 Auto Scaling Group allows you to mix Graviton and x86 instance types in the same EC2 Auto Scaling Group. Unlike the x86_64 instances we have used so far, AWS Graviton processors are custom built by AWS using 64-bit Arm. You need to use different launch templates as each architecture needs to use a different AMI. This can be defined in within the override block.

In the example below, we use ABS to define different attributes depending on the CPU architecture. And what’s great about doing this is that we don’t need to exclude instance types. Instances will be launched with a compatible CPU architecture based on the AMI that you specify in our launch template.

Besides supporting mixing architectures, EC2 Auto Scaling Groups allows to combine purchase models. For our example, this time we’ll use a more complex scenario to showcase how powerful and feature rich EC2 Auto Scaling Group has become. The following code snippet applies many of the configurations we’ve seen before, but the key difference here is that we have two overrides. One is for Graviton instances, and another one for x86 instances.

resource "aws_autoscaling_group" "on_demand_spot" {
  availability_zones = ["us-east-1a", "us-east-1b", "us-east-1c"]
  desired_capacity   = 4
  max_size           = 10
  min_size           = 2
  capacity_rebalance = true

  mixed_instances_policy {
    instances_distribution {
      on_demand_base_capacity                  = 2
      on_demand_percentage_above_base_capacity = 0
      spot_allocation_strategy                 = "capacity-optimized"
    }

    launch_template {
      launch_template_specification {
        launch_template_id = aws_launch_template.arm.id
      }

      override {
        launch_template_specification {
          launch_template_id = aws_launch_template.arm.id
        }

        instance_requirements {
          memory_mib {
            min = 16384
            max = 16384
          }

          vcpu_count {
            max = 4
          }
        }
      }

      override {
        launch_template_specification {
          launch_template_id = aws_launch_template.x86.id
        }

        instance_requirements {
          memory_mib {
            min = 16384
          }

          vcpu_count {
            min = 4
          }
        }
      }
    }
  }
}

Multi-architecture workloads can also be applied to container orchestration. Thanks to the Amazon Elastic Container Registry (Ama­zon ECR) support for multi-architecture container images, you can use manifests to push both container images, and Amazon ECR will pull the proper image based on the CPU architecture. We have a workshop for Elastic Kubernetes Service (Amazon EKS) where you can learn more about how to deploy a multi-architecture workload in Amazon EKS.

Conclusion

In this post you’ve learned how to configure ABS to launch instances using Auto Scaling Groups and EC2 Fleet. You’ve learned and how to mix CPU architectures along with purchase models in the same EC2 Auto Scaling Group using Terraform while simplifying the configuration.

Our commitment with open-source projects such as Terraform is to help customers implement AWS best practices in a larger ecosystem easily. ABS support allows customers to get access to compute capacity by simply specifying the resource requirement attributes of their workloads rather than the instance names.

ABS simplifies the configuration for instance flexible workloads and removes the need to list the instances that qualify for your workload. Instead, it simplifies the configuration and future proof for scenarios where AWS includes new instances that qualify for the workload. For Spot workloads where instance diversification is key, ABS simplifies the selection of instances and helps to increase the total number of pools. For more information, visit the ABS user guide and Terraform documentation for Auto Scaling Groups and EC2 Fleet.

Adding approval notifications to EC2 Image Builder before sharing AMIs

Post Syndicated from Sheila Busser original https://aws.amazon.com/blogs/compute/adding-approval-notifications-to-ec2-image-builder-before-sharing-amis/

This blog post was written by, Glenn Chia Jin Wee, Associate Cloud Architect at AWS and Randall Han, Associate Professional Services Consultant at AWS.

In some situations, you may be required to manually validate the Amazon Machine Image (AMI) built from an Amazon Elastic Compute Cloud (Amazon EC2) Image Builder pipeline before sharing this AMI to other AWS accounts or to an AWS Organization. Currently, Image Builder provides an end-to-end pipeline that automatically shares AMIs after they’ve been built.

In this post, we will walk through the steps to enable approval notifications before AMIs are shared with other AWS accounts. Having a manual approval step could be useful if you would like to verify the AMI configurations before it is shared to other AWS accounts or an AWS Organization. This reduces the possibility of incorrectly configured AMIs being shared to other teams which in turn could lead to downstream issues if applications are installed using this AMI. This solution uses serverless resources to send an email with a link that automatically shares the AMI with the specified AWS accounts. Users select this link after they’ve verified that the AMI is built according to specifications.

Overview

Architecture Diagram

  1. In this solution, an Image Builder Pipeline is run that builds a Golden AMI in Account A. After the AMI is built, Image Builder publishes data about the AMI to an Amazon Simple Notification Service (Amazon SNS) topic.
  2. This SNS Topic passes the data to an AWS Lambda function that subscribes to it.
  3. The Lambda function that subscribes to this topic retrieves the data, formats it, and sends a customized email to another SNS Topic.
  4. The second SNS Topic has an email subscription with the Approver’s email. The approver will receive the customized email with a URL that interacts with the next set of Serverless resources.
  5. Selecting the URL makes a GET request to Amazon API Gateway, thereby passing the AMI ID in the query string.
  6. API Gateway then triggers another Lambda function and passes the AMI ID to it.
  7. The Lambda function obtains the AMI ID from the query string parameter of the API Gateway request, and then shares it with the provided target account.

Prerequisites

For this walkthrough, you will need the following:

Walkthrough

In this section, we will guide you through the steps required to deploy the Image Builder solution that utilizes Serverless resources. The solution is deployed with AWS SAM.

In this scenario, we deploy the solution within the approver’s account. The approval email will be sent to a predefined email address for manual approval, before the newly created AMI is shared to target accounts.

Once the approver selects the approval link, an email notification will be sent to the predefined target account email address, notifying that the AMI has been successfully shared.

The high-level steps we will follow are:

  1. In Account A, deploy the provided AWS SAM template. This includes an example Image Builder Pipeline, Amazon SNS topics, API Gateway, and Lambda functions.
  2. Approve the SNS subscription from your supplied email address.
  3. Run the pipeline from the Amazon EC2 Image Builder Console.
  4. [Optional] After the pipeline runs, launch an Amazon EC2 instance from the built AMI to conduct manual tests
  5. An Amazon SNS email will be sent to you with an API Gateway URL. When clicked, an AWS Lambda function shares the AMI to the Account B.
  6. Log in to Account B and verify that the AMI has been shared.

Step 1: Launch the AWS SAM template

  1. Clone the SAM templates from this GitHub repository.
  2. Run the following command to deploy the templates via SAM. Replace <approver email> with the Approver’s email and <AWS Account B ID> with the AWS Account ID of your second AWS Account.

sam deploy \

–template-file template.yaml \

–stack-name ec2-image-builder-approver-notifications \

–capabilities CAPABILITY_IAM \

–resolve-s3 \

–parameter-overrides \

ApproverEmail=<approver email> \

TargetAccountEmail=<target account email> \

TargetAccountlds=<AWS Account B ID>

Step 2: Verify your email address

  1. After running the deployment, you will receive an email prompting you to confirm the Subscription at the approver email address. Choose Confirm subscription.

Email to confirm SNS topic subscription

  1. This leads to the following screen, which shows that your subscription is confirmed.

SNS topic subscription confirmation

  1. Repeat the previous 2 steps for the target email address.

Step 3: Run the pipeline from the Image Builder console

  1. In the Image Builder console, under Image pipelines, select the checkbox next to the Pipeline created, choose Actions, and select Run pipeline.

Run the Image Builder Pipeline

Note that the pipeline takes approximately 20 to 30 minutes to complete.

Step 4: [Optional] Launch an Amazon EC2 instance from the built AMI

There could be a requirement to manually validate the AMI before sharing it to other AWS accounts or to the AWS organization. With this requirement, approvers will launch an Amazon EC2 instance from the built AMI and conduct manual tests on the EC2 instance to make sure that it is functional.

  1. In the Amazon EC2 console, under Images, choose AMIs. Validate that the AMI is created.

Validate the AMI has been built

  1. Follow AWS docs: Launching an EC2 instances from a custom AMI for steps on how to launch an Amazon EC2 instance from the AMI.

Step 5: Select the approval URL in the email sent

  1. When the pipeline is run successfully, you will receive another email with a URL to share the AMI.

Approval link to share the AMI to Account B

2. Selecting this URL results in the following screen which shows that the AMI share is successful.

Result showing the AMI was successfully shared after selecting the approval link

Step 6: Verify that the AMI is shared to Account B

  1. Log in to Account B.
  2. In the Amazon EC2 console, under Images, choose AMIs. Then, in the dropdown, choose Private images. Validate that the AMI is shared.

AMI is shared when Private images are selected from the dropdown

3. Verify that a success email notification was sent to the target account email address provided.

Successful AMI share email notification sent to Target Account Email Address

Clean up

This section provides the necessary information for deleting various resources created as part of this post.

1. Deregister the AMIs created and shared.

a. Log in to Account A and follow the steps at AWS documentation: Deregister your Linux AMI.

2. Delete the SAM stack with the following command. Replace <region> with the Region of choice.

sam delete –stack-name ec2-image-builder-approver-notifications –no-prompts –region <region>

3. Delete the CloudWatch log groups for the Lambda functions. You’ll identify it with the name `/aws/lambda/ec2-image-builder-approve*`.

4. Consider deleting the Amazon S3 bucket used to store the packaged Lambda artifact.

Conclusion

In this post, we explained how to use Serverless resources to enable approval notifications for an Image Builder pipeline before AMIs are shared to other accounts. This solution can be extended to share to more than one AWS account or even to an AWS organization. With this solution, you will be notified when new golden images are created, allowing you to verify the correctness of their configuration before sharing them to for wider use. This reduces the possibility of sharing AMIs with misconfigurations that the written tests may not have identified.

We invite you to experiment with different AMIs created using Image Builder, and with different Image Builder components. Check out this GitHub repository for various examples that use Image Builder. Also check out this blog on Image builder integrations with EC2 Auto Scaling Instance Refresh. Let us know your questions and findings in the comments, and have fun!

Capturing GPU Telemetry on the Amazon EC2 Accelerated Computing Instances

Post Syndicated from Sheila Busser original https://aws.amazon.com/blogs/compute/capturing-gpu-telemetry-on-the-amazon-ec2-accelerated-computing-instances/

This post is written by Amr Ragab, Principal Solutions Architect EC2.

AWS is excited to announce the native integration of monitoring GPU metrics through the CloudWatch Agent. Customers can now easily monitor GPU utilization and its memory to scale their workloads more effectively without custom scripts. In this post, we’ll describe how to allow GPU monitoring and integrate it into your Amazon Machine Images (AMI). Furthermore, we’ll extend this to include the monitoring of GPU hardware events utilizing CloudWatch Log Streams. By combining this telemetry into the Amazon CloudWatch Console, customers can have a complete picture of GPU activity across their fleets.

Capturing GPU metrics

There is an extensive list of NVIDIA accelerator metrics that can be captured. Depending on the workload type, it may be unnecessary to capture all of the metrics at all times. The following table breaks down the suggested metrics to collect by workload type. This considers a balance of cost and impactful metrics at scale.

Compute (Machine Learning (ML), High Performance Computing (HPC)) Graphics/Gaming
utilization_gpu
power_draw
utilization_memory
memory_total
memory_used
memory_free
pcie_link_gen_current
pcie_link_width_current
clocks_current_smclocks_current_memory
utilization_gpu
utilization_memory
memory_total
memory_usedmemory_free
pcie_link_gen_current
pcie_link_width_current
encoder_stats_session_count
encoder_stats_average_fps
encoder_stats_average_latency
clocks_current_graphics
clocks_current_memory
clocks_current_video

Moreover, this is supported through custom AMIs that are deployed with managed service offerings, including Amazon Elastic Kubernetes Service (Amazon EKS), Amazon Elastic Container Services (Amazon ECS), and AWS ParallelCluster w/ SLURM for HPC clusters.

The following is an example screenshot from the CloudWatch Console showcasing the telemetry captured for a P4d instance. You can see that we captured the preceding metrics on a per-GPU index. Each Amazon Elastic Compute Cloud (Amazon EC2) P4d instance has 8x A100 GPUs.

Cloudwatch Console

Capturing GPU Xid events

Xid events are a reporting mechanism from GPU hardware vendors that emit notable events from the device to the OS in this case we are capturing the events through the NVRM kernel module. Current GPU architecture requires that the full GPU with protections are passed into the running instance. Thus, most errors that manifest inside of the customer instance aren’t directly visible to the Amazon EC2 virtualization stack. Although some of these errors are benign, others indicate problems with the customer application, the NVIDIA driver, and under rare circumstances a defect in the GPU hardware.

For NVIDIA-based Amazon EC2 instances, these errors will be logged in the system journal with an “NVRM:” regular expression.

These events can be collected and pushed to Amazon CloudWatch Logs as a stream. When an Xid event occurs on the GPU, it will parse the event and push it the log stream for that instance ID in the Region in which the instance is running. The following steps are required to get started capturing those events.

Deployment

We’ll cover the deployment in two different use-cases: 1. You have an existing instance running and you want to start to capture metrics and XID events. 2. You want to build and an AMI and use it within Amazon EC2 or additional services.

I. On a running Amazon EC2 instance

Step 1. Attach an IAM Role to the EC2 instance that has permission to CloudWatch Metrics/Logs. The following is an IAM policy that you can attach to your IAM Role.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "1",
            "Effect": "Allow",
            "Action": [
                "cloudwatch:PutMetricStream",
                "logs:CreateLogDelivery",
                "logs:CreateLogStream",
                "cloudwatch:PutMetricData",
                "logs:UpdateLogDelivery",
                "logs:CreateLogGroup",
                "logs:PutLogEvents",
                "cloudwatch:ListMetrics"
            ],
            "Resource": "*"
        }
    ]
}

Step 2. Connect to a shell on the EC2 instance (through SSM or SSH). Install the CloudWatch Agent following the instructions here. There is support across architectures and distributions.

Step 3. Next, we can create our CloudWatch Agent JSON configuration file. The following JSON snippet will capture the logs from gpuerrors.log and push to CloudWatch Logs. Save the contents of the following JSON snippet to a file on the instance at /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json.

{
     "agent": {
         "run_as_user": "root"
     },
     "metrics": {
         "append_dimensions": {
             "AutoScalingGroupName": "${aws:AutoScalingGroupName}",
             "ImageId": "${aws:ImageId}",
             "InstanceId": "${aws:InstanceId}",
             "InstanceType": "${aws:InstanceType}"
         },
         "aggregation_dimensions": [["InstanceId"]],
         "metrics_collected": {
            "nvidia_gpu": {
                "measurement": [
                    "utilization_gpu",
                    "utilization_memory",
                    "memory_total",
                    "memory_used",
                    "memory_free",
                    "clocks_current_graphics",
                    "clocks_current_sm",
                    "clocks_current_memory"
                ]
            }
         }
     },
     "logs": {
         "logs_collected": {
             "files": {
                 "collect_list": [
                     {
                         "file_path": "/var/log/gpuevent.log",
                         "log_group_name": "/ec2/accelerated/accel-event-log",
                         "log_stream_name": "{instance_id}"
                     }
                 ]
             }
         }
     }
 }

Step 4. To start capturing the logs, restart the aws cloudwatch systemd service.

sudo systemctl restart amazon-cloudwatch-agent.service

At this point, if you navigate to the CloudWatch Console in the Region that the instance is running, – All metrics – CWAgent, you should see a table of metrics similar to the following screenshot.

Cloudwatch Agent Metrics

Step 5. To capture the XID events it’s possible to use the same CloudWatch Log directive used in the preceding image were set the GPU metrics to capture. The JSON following snippet defines that we will stream the log in /var/log/gpuevent.log to CloudWatch.

"logs": {
         "logs_collected": {
             "files": {
                 "collect_list": [
                     {
                         "file_path": "/var/log/gpuevent.log",
                         "log_group_name": "/ec2/accelerated/accel-event-log",
                         "log_stream_name": "{instance_id}"
                     }
                 ]
             }
         }
     }

The GitHub project is an open source reference design for capturing these errors in the CloudWatch agent.

https://github.com/aws-samples/aws-efa-nccl-baseami-pipeline

Step 6. Save the following file as /opt/aws/aws-hwaccel-event-parser.py|.go with the following contents, which will write the Xid errors parsed to /var/log/gpuevent.log:

The code is available in either Python3 or Go (> 1.16).

Golang code of the hwaccel-event-parser: https://github.com/aws-samples/aws-efa-nccl-baseami-pipeline/blob/master/nvidia-efa-ami_base/cloudwatch/nvidia/aws-hwaccel-event-parser.go

Python3 code: https://github.com/aws-samples/aws-efa-nccl-baseami-pipeline/blob/master/nvidia-efa-ami_base/cloudwatch/nvidia/aws-hwaccel-event-parser.py

As you can see from the code, this is a blocking thread, and it will be running during the lifetime of the instance or container.

Step 7. For ease of deployment, you can also create a systemd service (aws-hw-monitor.service), which will run at startup before the CloudWatch agent.

[Unit]
Description=HW Error Monitor
Before=amazon-cloudwatch-agent.service
After=syslog.target network-online.target

[Service]
Type=simple
ExecStart=/opt/aws/cloudwatch/aws-cloudwatch-wrapper.sh
RemainAfterExit=1
TimeoutStartSec=0

[Install]
WantedBy=multi-user.target

Where /opt/aws/cloudwatch/aws-cloudwatch-wrapper.sh is a script which contains:

#!/bin/bash
python3 /opt/aws/aws-hwaccel-event-parser.py &

Finally, enable and start the hw monitor service

sudo systemctl enable aws-hw-monitor.service –now

II. Building an AMI

For convenience, the following repo has what is needed to build the AMI for Amazon EC2, Amazon EKS, Amazon ECS, Amazon Linux 2, and Ubuntu 18.04/20.04 distributions. You must have Packer installed on your machine, and it must be authenticated to make API calls on your behalf to AWS. Generally you need to modify the variables:{} json and execute the packer build.

"variables": {
    "region": "us-east-1",
    "flag": "<flag>",
    "subnet_id": "<subnetid>",
    "security_groupids": "<security_group_id,security_group_id",
    "build_ami": "<buildami>",
    "efa_pkg": "aws-efa-installer-latest.tar.gz",
    "intel_mkl_version": "intel-mkl-2020.0-088",
    "nvidia_version": "510.47.03",
    "cuda_version": "cuda-toolkit-11-6 nvidia-gds-11-6",
    "cudnn_version": "libcudnn8",
    "nccl_version": "v2.12.7-1"
  },

After filling in the variables, check that the packer script is validated.

packer validate nvidia-efa-ml-al2.yml
packer build nvidia-efa-ml-al2.yml

The log group namespace is /ec2/accelerated/accel-event-log. However, you may change this to the namespace of your preference in the CloudWatch Agent config file created earlier.

Navigate to the CloudWatch Console – Logs – Log groups – /ec2/accelerated/accel-event-log. It’s sorted by instance ID, where the instance ID of the latest stream is on top.

CloudWatch Log-events

We can see in the preceding screenshot that an example workload ran on instance i-03a7b66de3198977e, which was a p4d.24xlarge triggered a Xid 63 event. Capturing these events is the first step. Next, we must interpret what these events mean. With each Xid error, there is a number associated with each event. As previously mentioned, these can be hardware errors, driver, and/or application errors. If you’re running on an Amazon EC2 accelerated instance, and after code execution run into one of these errors, contact AWS Support with the instance ID and Xid error. The following is a list of the more common Xid errors that you may encounter.

Xid Error Name Description Action
48 Double Bit ECC error Hardware memory error Contact AWS Support with Xid error and instance ID
74 GPU NVLink error Further SXid errors should also be populated which will inform on the error seen with the NVLink fabric Get information on which links are causing the issue by running nvidia-smi nvlink -e
63 GPU Row Remapping Event Specific to Ampere architecture –- a row bank is pending a memory remap Stop all CUDA processes, and reset the GPU (nvidia-smi -r), and make sure thatensure the remap is cleared in nvidia-smi -q
13 Graphics Engine Exception User application fault , illegal instruction or register Rerun the application with CUDA_LAUNCH_BLOCKING=1 enabled which should determine if it’s a NVIDIA driver or hardware issue
31 GPU memory page fault Illegal memory address access error Rerun the application with CUDA_LAUNCH_BLOCKING=1 enabled which should determine if it’s a NVIDIA driver or hardware issue

A quick way to check for row remapping failures is to run the below command on the instance.

nvidia-smi --query-remapped-
rows=gpu_name,gpu_bus_id,remapped_rows.failure,remapped_rows.pending,remapped_rows.correctable,remapped_rows.uncorrectable --format=csv
gpu_name, gpu_bus_id, remapped_rows.failure, remapped_rows.pending, remapped_rows.correctable, remapped_rows.uncorrectable
NVIDIA A100-SXM4-40GB, 00000000:10:1C.0, 0, 0, 0, 0
NVIDIA A100-SXM4-40GB, 00000000:10:1D.0, 0, 0, 0, 0
NVIDIA A100-SXM4-40GB, 00000000:20:1C.0, 0, 0, 0, 0
NVIDIA A100-SXM4-40GB, 00000000:20:1D.0, 0, 0, 0, 0
NVIDIA A100-SXM4-40GB, 00000000:90:1C.0, 0, 0, 0, 0
NVIDIA A100-SXM4-40GB, 00000000:90:1D.0, 0, 0, 0, 0
NVIDIA A100-SXM4-40GB, 00000000:A0:1C.0, 0, 0, 0, 0
NVIDIA A100-SXM4-40GB, 00000000:A0:1D.0, 0, 0, 0, 0

This isn’t an exhaustive list of Xid events, but it provides some of the more common ones that you may come across as you develop your accelerated workload. You can find a more complete table of events here. Furthermore, if you have questions, you can reach out to AWS Support with the output of the tar ball created by executing the nvidia-bug-report.sh script included with the NVIDIA driver.

Conclusion

Get started with integrating this monitoring into your AMIs if you use custom AMIs specifically for key services, such as Amazon EKS, Amazon ECS, or Amazon EC2 with AWS ParallelCluster. This will help you discover utilization metrics for your accelerated computing workloads. If you have any questions about this post, then reach out to your account team.

Identification of replication bottlenecks when using AWS Application Migration Service

Post Syndicated from Tobias Reekers original https://aws.amazon.com/blogs/architecture/identification-of-replication-bottlenecks-when-using-aws-application-migration-service/

Enterprises frequently begin their journey by re-hosting (lift-and-shift) their on-premises workloads into AWS and running Amazon Elastic Compute Cloud (Amazon EC2) instances. A simpler way to re-host is by using AWS Application Migration Service (Application Migration Service), a cloud-native migration service.

To streamline and expedite migrations, automate reusable migration patterns that work for a wide range of applications. Application Migration Service is the recommended migration service to lift-and-shift your applications to AWS.

In this blog post, we explore key variables that contribute to server replication speed when using Application Migration Service. We will also look at tests you can run to identify these bottlenecks and, where appropriate, include remediation steps.

Overview of migration using Application Migration Service

Figure 1 depicts the end-to-end data replication flow from source servers to a target machine hosted on AWS. The diagram is designed to help visualize potential bottlenecks within the data flow, which are denoted by a black diamond.

Data flow when using AWS Application Migration Service (black diamonds denote potential points of contention)

Figure 1. Data flow when using AWS Application Migration Service (black diamonds denote potential points of contention)

Baseline testing

To determine a baseline replication speed, we recommend performing a control test between your target AWS Region and the nearest Region to your source workloads. For example, if your source workloads are in a data center in Rome and your target Region is Paris, run a test between eu-south-1 (Milan) and eu-west-3 (Paris). This will give a theoretical upper bandwidth limit, as replication will occur over the AWS backbone. If the target Region is already the closest Region to your source workloads, run the test from within the same Region.

Network connectivity

There are several ways to establish connectivity between your on-premises location and AWS Region:

  1. Public internet
  2. VPN
  3. AWS Direct Connect

This section pertains to options 1 and 2. If facing replication speed issues, the first place to look is at network bandwidth. From a source machine within your internal network, run a speed test to calculate your bandwidth out to the internet; common test providers include Cloudflare, Ookla, and Google. This is your bandwidth to the internet, not to AWS.

Next, to confirm the data flow from within your data center, run a traceroute (Windows) or tracert (Linux). Identify any network hops that are unusual or potentially throttling bandwidth (due to hardware limitations or configuration).

To measure the maximum bandwidth between your data center and the AWS subnet that is being used for data replication, while accounting for Security Sockets Layer (SSL) encapsulation, use the CloudEndure SSL bandwidth tool (refer to Figure 1).

Source storage I/O

The next area to look for replication bottlenecks is source storage. The underlying storage for servers can be a point of contention. If the storage is maxing-out its read speeds, this will impact the data-replication rate. If your storage I/O is heavily utilized, it can impact block replication by Application Migration Service. In order to measure storage speeds, you can use the following tools:

  • Windows: WinSat (or other third-party tooling, like AS SSD Benchmark)
  • Linux: hdparm

We suggest reducing read/write operations on your source storage when starting your migration using Application Migration Service.

Application Migration Service EC2 replication instance size

The size of the EC2 replication server instance can also have an impact on the replication speed. Although it is recommended to keep the default instance size (t3.small), it can be increased if there are business requirements, like to speed up the initial data sync. Note: using a larger instance can lead to increased compute costs.

-508 (1)

Common replication instance changes include:

  • Servers with <26 disks: change the instance type to m5.large. Increase the instance type to m5.xlarge or higher, as needed.
  • Servers with <26 disks (or servers in AWS Regions that do not support m5 instance types): change the instance type to m4.large. Increase to m4.xlarge or higher, as needed.

Note: Changing the replication server instance type will not affect data replication. Data replication will automatically pick up where it left off, using the new instance type you selected.

Application Migration Service Elastic Block Store replication volume

You can customize the Amazon Elastic Block Store (Amazon EBS) volume type used by each disk within each source server in that source server’s settings (change staging disk type).

By default, disks <500GiB use Magnetic HDD volumes. AWS best practice suggests not change the default Amazon EBS volume type, unless there is a business need for doing so. However, as we aim to speed up the replication, we actively change the default EBS volume type.

There are two options to choose from:

  1. The lower cost, Throughput Optimized HDD (st1) option utilizes slower, less expensive disks.

-508 (2)

    • Consider this option if you(r):
      • Want to keep costs low
      • Large disks do not change frequently
      • Are not concerned with how long the initial sync process will take
  1. The faster, General Purpose SSD (gp2) option utilizes faster, but more expensive disks.

-508 (3)

    • Consider this option if you(r):
      • Source server has disks with a high write rate, or if you need faster performance in general
      • Want to speed up the initial sync process
      • Are willing to pay more for speed

Source server CPU

The Application Migration Service agent that is installed on the source machine for data replication uses a single core in most cases (agent threads can be scheduled to multiple cores). If core utilization reaches a maximum, this can be a limitation for replication speed. In order to check the core utilization:

  • Windows: Launch the Task Manger application within Windows, and click on the “CPU” tab. Right click on the CPU graph (this is currently showing an average of cores) > select “Change graph to” > “Logical processors”. This will show individual cores and their current utilization (Figure 2).
Logical processor CPU utilization

Figure 2. Logical processor CPU utilization

Linux: Install htop and run from the terminal. The htop command will display the Application Migration Service/CE process and indicate the CPU and memory utilization percentage (this is of the entire machine). You can check the CPU bars to determine if a CPU is being maxed-out (Figure 3).

AWS Application Migration Service/CE process to assess CPU utilization

Figure 3. AWS Application Migration Service/CE process to assess CPU utilization

Conclusion

In this post, we explored several key variables that contribute to server replication speed when using Application Migration Service. We encourage you to explore these key areas during your migration to determine if your replication speed can be optimized.

Related information

New – Amazon EC2 R6id Instances with NVMe Local Instance Storage of up to 7.6 TB

Post Syndicated from Veliswa Boya original https://aws.amazon.com/blogs/aws/new-amazon-ec2-r6id-instances/

In November 2021, we launched the memory-optimized Amazon EC2 R6i instances, our sixth-generation x86-based offering powered by 3rd Generation Intel Xeon Scalable processors (code named Ice Lake).

Today I am excited to announce a disk variant of the R6i instance: the Amazon EC2 R6id instances with non-volatile memory express (NVMe) SSD local instance storage. The R6id instances are designed to power applications that require low storage latency or require temporary swap space.

Customers with workloads that require access to high-speed, low-latency storage, including those that need temporary storage for scratch space, temporary files, and caches, have the option to choose the R6id instances with NVMe local instance storage of up to 7.6 TB. The new instances are also available as bare-metal instances to support workloads that benefit from direct access to physical resources.

Here’s some background on what led to the development of the sixth-generation instances. Our customers who are currently using fifth-generation instances are looking for the following:

  • Higher Compute Performance – Higher CPU performance to improve latency and processing time for their workloads
  • Improved Price Performance – Customers are very sensitive to price performance to optimize costs
  • Larger Sizes – Customers require larger sizes to scale their enterprise databases
  • Higher Amazon EBS Performance – Customers have requested higher Amazon EBS throughput (“at least double”) to improve response times for their analytics applications
  • Local Storage – Large customers have expressed a need for more local storage per vCPU

Sixth-generation instances address these requirements by offering generational improvement across the board, including 15 percent increase in price performance, 33 percent more vCPUs, up to 1 TB memory, 2x networking performance, 2x EBS performance, and global availability.

Compared to R5d instances, the R6id instances offer:

  • Larger instance size (.32xlarge) with 128 vCPUs and 1024 GiB of memory, enabling customers to consolidate their workloads and scale up applications.
  • Up to 15 percent improvement in compute price performance and 20 percent higher memory bandwidth.
  • Up to 58 percent higher storage per vCPU and 34 percent lower cost per TB.
  • Up to 50 Gbps network bandwidth and up to 40 Gbps EBS bandwidth; EBS burst bandwidth support for sizes up to .4xlarge.
  • Always-on memory encryption.
  • Support for new Intel Advanced Vector Extensions (AVX 512) instructions such as VAES, VCLMUL, VPCLMULQDQ, and GFNI for faster execution of cryptographic algorithms such as those used in IPSec and TLS implementations.

The detailed specifications of the R6id instances are as follows:

Instance Name

vCPUs RAM (GiB)

Local NVMe SSD Storage (GB)

EBS Throughput (Gbps)

Network Bandwidth (Gbps)

r6id.large 2 16 1 x 118 Up to 10 Up to 12.5
r6id.xlarge 4 32 1 x 237 Up to 10 Up to 12.5
r6id.2xlarge 8 64 1 x 474 Up to 10 Up to 12.5
r6id.4xlarge 16 128 1 x 950 Up to 10 Up to 12.5
r6id.8xlarge 32 256 1 x 1900 10 12.5
r6id.12xlarge 48 384 2 x 1425 15 18.75
r6id.16xlarge 64 512 2 x 1900 20 25
r6id.24xlarge 96 768 4 x 1425 30 37.5
r6id.32xlarge 128 1024 4 x 1900 40 50
r6id.metal 128 1024 4 x 1900 40 50

Now available

The R6id instances are available today in the AWS US East (Ohio), US East (N.Virginia), US West (Oregon), and Europe (Ireland) Regions as On-Demand, Spot, and Reserved Instances or as part of a Savings Plan. As usual, with EC2, you pay for what you use. For more information, see the Amazon EC2 pricing page.

To learn more, visit our Amazon EC2 R6i instances page, and please send feedback to AWS re:Post for EC2 or through your usual AWS Support contacts.

Veliswa x

Considerations for modernizing Microsoft SQL database service with high availability on AWS

Post Syndicated from Lewis Tang original https://aws.amazon.com/blogs/architecture/considerations-for-modernizing-microsoft-sql-database-service-with-high-availability-on-aws/

Many organizations have applications that require Microsoft SQL Server to run relational database workloads: some applications can be proprietary software that the vendor mandates Microsoft SQL Server to run database service; the other applications can be long-standing, home-grown applications that included Microsoft SQL Server when they were initially developed. When organizations migrate applications to AWS, they often start with lift-and-shift approach and run Microsoft SQL database service on Amazon Elastic Compute Cloud (Amazon EC2). The reason could be this is what they are most familiar with.

In this post, I share the architecture options to modernize Microsoft SQL database service and run highly available relational data services on Amazon EC2, Amazon Relational Database Service (Amazon RDS), and Amazon Aurora (Aurora).

Running Microsoft SQL database service on Amazon EC2 with high availability

This option is the least invasive to existing operations models. It gives you a quick start to modernize Microsoft SQL database service by leveraging the AWS Cloud to manage services like physical facilities. The low-level infrastructure operational tasks—such as server rack, stack, and maintenance—are managed by AWS. You have full control of the database and operating-system–level access, so there is a choice of tools to manage the operating system, database software, patches, data replication, backup, and restoration.

You can use any Microsoft SQL Server-supported replication technology with your Microsoft SQL Server database on Amazon EC2 to achieve high availability, data protection, and disaster recovery. Common solutions include log shipping, database mirroring, Always On availability groups, and Always On Failover Cluster Instances.

High availability in a single Region

Figure 1 shows how you can use Microsoft SQL Server on Amazon EC2 across multiple Availability Zones (AZs) within single Region. The interconnects among AZs that are similar to your data center intercommunications are managed by AWS. The primary database is a read-write database, and the secondary database is configured with log shipping, database mirroring, or Always On availability groups for high availability. All the transactional data from the primary database is transferred and can be applied to the secondary database asynchronously for log shipping, and it can either asynchronously or synchronously for Always On availability groups and mirroring.

High availability in a single Region with Microsoft SQL Database Service on Amazon EC2

Figure 1. High availability in a single Region with Microsoft SQL database service on Amazon EC2

High availability across multiple Regions

Figure 2 demonstrates how to configure high availability for Microsoft SQL Server on Amazon EC2 across multiple Regions. A secondary Microsoft SQL Server in a different Region from the primary is configured with log shipping, database mirroring, or Always On availability groups for high availability. The transactional data from primary database is transferred via the fully managed backbone network of AWS across Regions.

High availability across multiple Regions with Microsoft SQL database service on Amazon EC2

Figure 2. High availability across multiple Regions with Microsoft SQL database service on Amazon EC2

Replatforming Microsoft SQL Database Service on Amazon RDS with high availability

Amazon RDS is a managed database service and responsible for most management tasks. It currently supports Multi-AZ deployments for SQL Server using SQL Server Database Mirroring (DBM) or Always On Availability Groups (AGs) as a high-availability, failover solution.

High availability in a single Region

Figure 3 demonstrates the Microsoft SQL database service that is run on Amazon RDS is configured with a multi-AZ deployment model in single region. Multi-AZ deployments provide increased availability, data durability, and fault tolerance for DB instances. In the event of planned database maintenance or unplanned service disruption, Amazon RDS automatically fails-over to the up-to-date secondary DB instance. This functionality lets database operations resume quickly without manual intervention. The primary and standby instances use the same endpoint, whose physical network address transitions to the secondary replica as part of the failover process. You don’t have to reconfigure your application when a failover occurs. Amazon RDS supports multi-AZ deployments for Microsoft SQL Server by using either SQL Server database mirroring or Always On availability groups.

High availability in a single Region with Microsoft SQL database service on Amazon RDS

Figure 3. High availability in a single Region with Microsoft SQL database service on Amazon RDS

High availability across multiple Regions

Figure 4 depicts how you can use AWS Database Migration Service (AWS DMS) to configure continuous replication among Microsoft SQL Database Service on Amazon RDS across multiple Regions. AWS DMS needs Microsoft Change Data Capture to be enabled on the Amazon RDS for the Microsoft SQL Server instance. If problems occur, you can initiate manual failovers and reinstate database services by promoting the Amazon RDS read replica in a different Region.

High availability across multiple Regions with Microsoft SQL database service on Amazon RDS

Figure 4. High availability across multiple Regions with Microsoft SQL database service on Amazon RDS

Refactoring Microsoft SQL database service on Amazon Aurora with high availability

This option helps you to eliminate the cost of SQL database service license. You can run database service on a truly cloud native modern database architecture. You can use AWS Schema Conversion Tool to assist in the assessment and conversion of your database code and storage objects. Any objects that cannot be automatically converted are clearly marked so they can be manually converted to complete the migration.

The Aurora architecture involves separation of storage and compute. Aurora includes some high availability features that apply to the data in your database cluster. The data remains safe even if some or all of the DB instances in the cluster become unavailable. Other high availability features apply to the DB instances. These features help to make sure that one or more DB instances are ready to handle database requests from your application.

High availability in a single Region

Figure 5 demonstrates Aurora stores copies of the data in a database cluster across multiple AZs in single Region. When data is written to the primary DB instance, Aurora synchronously replicates the data across AZs to six storage nodes associated with your cluster volume. Doing so provides data redundancy, eliminates I/O freezes, and minimizes latency spikes during system backups. Running a DB instance with high availability can enhance availability during planned system maintenance, such as database engine updates, and help protect your databases against failure and AZ disruption.

High availability in a single Region with Amazon Aurora

Figure 5. High availability in a single Region with Amazon Aurora

High availability across multiple Regions

Figure 6 depicts how you can set up Aurora global databases for high availability across multiple Regions. An Aurora global database consists of one primary Region where your data is written, and up to five read-only secondary Regions. You issue write operations directly to the primary database cluster in the primary Region. Aurora automatically replicates data to the secondary Regions using dedicated infrastructure, with latency typically under a second.

High availability across multiple Regions with Amazon Aurora global databases

Figure 6. High availability across multiple Regions with Amazon Aurora global databases

Summary

You can choose among the options of Amazon EC2, Amazon RDS, and Amazon Aurora when modernizing SQL database service on AWS. Understanding the features required by business and the scope of service management responsibilities are good starting points. When presented with multiple options that meet with business needs, choose one that will allow more focus on your application, business value-add capabilities, and help you to reduce the services’ “total cost of ownership”.

AWS MGN Update – Configure DR, Convert CentOS Linux to Rocky Linux, and Convert SUSE Linux Subscription

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-mgn-update-configure-dr-convert-centos-linux-to-rocky-linux-and-convert-suse-linux-subscription/

Just about a year ago, Channy showed you How to Use the New AWS Application Migration Server for Lift-and-Shift Migrations. In his post, he introduced AWS Application Migration Service (AWS MGN) and said:

With AWS MGN, you can minimize time-intensive, error-prone manual processes by automatically replicating entire servers and converting your source servers from physical, virtual, or cloud infrastructure to run natively on AWS. The service simplifies your migration by enabling you to use the same automated process for a wide range of applications.

Since launch, we have added agentless replication along with support for Windows 10 and multiple versions of Windows Server (2003, 2008, and 2022). We also expanded into additional regions throughout 2021.

New Post-Launch Actions
As the title of Channy’s post stated, AWS MGN initially supported direct, lift-and-shift migrations. In other words, the selected disk volumes on the source servers were directly copied, bit-for-bit to EBS volumes attached to freshly launched Amazon Elastic Compute Cloud (Amazon EC2) instances.

Today we are adding a set of optional post-launch actions that provide additional support for your migration and modernization efforts. The actions are initiated and managed by the AWS Systems Manager agent, which can be automatically installed as the first post-launch action. We are launching with an initial set of four actions, and plan to add more over time:

Install Agent – This action installs the AWS Systems Manager agent, and is a prerequisite to the other actions.

Disaster Recovery – Installs the AWS Elastic Disaster Recovery Service agent on each server and configures replication to a specified target region.

CentOS Conversion – If the source server is running CentOS, the instances can be migrated to Rocky Linux.

SUSE Subscription Conversion – If the source service is running SUSE Linux via a subscription provided by SUSE, the instance is changed to use an AWS-provided SUSE subscription.

Using Post-Launch Actions
My AWS account has a post-launch settings template that serves as a starting point, and provides the default settings each time I add a new source server. I can use the values from the template as-is, or I can customize them as needed. I open the Application Migration Service Console and click Settings to view and edit my template:

I click Post-launch settings template, and review the default values. Then I click Edit to make changes:

As I noted earlier, the Systems Manager agent executes the other post-launch actions, and is a prerequisite, so I enable it:

Next, I choose to run the post-launch actions on both my test and cutover instances, since I want to test against the final migrated configuration:

I can now configure any or all of the post-launch options, starting with disaster recovery. I check Configure disaster recovery on migrated servers and choose a target region:

Next, I check Convert CentOS to Rocky Linux distribution. This action converts a CentOS 8 distribution to a Rocky Linux 8 distribution:

Moving right along, I check Change SUSE Linux Subscription to AWS provided SUSE Linux subscription, and then click Save template:

To learn more about pricing for the SUSE subscriptions, visit the Amazon EC2 On-Demand Pricing page.

After I have set up my template, I can view and edit the settings for each of my source servers. I simply select the server and choose Edit post-launch settings from the Replication menu:

The post-launch actions will be run at the appropriate time on the test or the cutover instances, per my selections. Any errors that arise during the execution of an action are written to the SSM execution log. I can also examine the Migration dashboard for each source server and review the Post-launch actions:

Available Now
The post-launch actions are available now and you can start using them today in all regions where AWS Application Migration Service (AWS MGN) is supported.

Jeff;