Tag Archives: Amazon EC2

New – Amazon EC2 R6id Instances with NVMe Local Instance Storage of up to 7.6 TB

Post Syndicated from Veliswa Boya original https://aws.amazon.com/blogs/aws/new-amazon-ec2-r6id-instances/

In November 2021, we launched the memory-optimized Amazon EC2 R6i instances, our sixth-generation x86-based offering powered by 3rd Generation Intel Xeon Scalable processors (code named Ice Lake).

Today I am excited to announce a disk variant of the R6i instance: the Amazon EC2 R6id instances with non-volatile memory express (NVMe) SSD local instance storage. The R6id instances are designed to power applications that require low storage latency or require temporary swap space.

Customers with workloads that require access to high-speed, low-latency storage, including those that need temporary storage for scratch space, temporary files, and caches, have the option to choose the R6id instances with NVMe local instance storage of up to 7.6 TB. The new instances are also available as bare-metal instances to support workloads that benefit from direct access to physical resources.

Here’s some background on what led to the development of the sixth-generation instances. Our customers who are currently using fifth-generation instances are looking for the following:

  • Higher Compute Performance – Higher CPU performance to improve latency and processing time for their workloads
  • Improved Price Performance – Customers are very sensitive to price performance to optimize costs
  • Larger Sizes – Customers require larger sizes to scale their enterprise databases
  • Higher Amazon EBS Performance – Customers have requested higher Amazon EBS throughput (“at least double”) to improve response times for their analytics applications
  • Local Storage – Large customers have expressed a need for more local storage per vCPU

Sixth-generation instances address these requirements by offering generational improvement across the board, including a 15 percent increase in price performance, 33 percent more vCPUs, up to 1 TB memory, 2x networking performance, 2x EBS performance, and global availability.

Compared to R5d instances, the R6id instances offer:

  • Larger instance size (.32xlarge) with 128 vCPUs and 1024 GiB of memory, enabling customers to consolidate their workloads and scale up applications.
  • Up to 15 percent improvement in compute price performance and 20 percent higher memory bandwidth.
  • Up to 58 percent higher storage per vCPU and 34 percent lower cost per TB.
  • Up to 50 Gbps network bandwidth and up to 40 Gbps EBS bandwidth; EBS burst bandwidth support for sizes up to .4xlarge.
  • Always-on memory encryption.
  • Support for new Intel Advanced Vector Extensions (AVX 512) instructions such as VAES, VCLMUL, VPCLMULQDQ, and GFNI for faster execution of cryptographic algorithms such as those used in IPSec and TLS implementations.

The detailed specifications of the R6id instances are as follows:

Instance Name    vCPUs   RAM (GiB)   Local NVMe SSD Storage (GB)   EBS Throughput (Gbps)   Network Bandwidth (Gbps)
r6id.large       2       16          1 x 118                       Up to 10                Up to 12.5
r6id.xlarge      4       32          1 x 237                       Up to 10                Up to 12.5
r6id.2xlarge     8       64          1 x 474                       Up to 10                Up to 12.5
r6id.4xlarge     16      128         1 x 950                       Up to 10                Up to 12.5
r6id.8xlarge     32      256         1 x 1900                      10                      12.5
r6id.12xlarge    48      384         2 x 1425                      15                      18.75
r6id.16xlarge    64      512         2 x 1900                      20                      25
r6id.24xlarge    96      768         4 x 1425                      30                      37.5
r6id.32xlarge    128     1024        4 x 1900                      40                      50
r6id.metal       128     1024        4 x 1900                      40                      50
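
As a quick check on the storage figures, the following Go sketch computes total local storage and storage per vCPU for a few rows of the specification table. The `instance` type and the helper names are illustrative only and are not part of any AWS SDK; the numbers are copied from the table.

```go
package main

import "fmt"

// instance holds a few columns from the R6id specification table.
// The type and field names are hypothetical, chosen for this sketch.
type instance struct {
	name   string
	vcpus  int
	disks  int // number of local NVMe SSDs
	diskGB int // capacity of each SSD in GB
}

// totalStorageGB returns the combined capacity of all local SSDs.
func totalStorageGB(i instance) int {
	return i.disks * i.diskGB
}

// storagePerVCPU returns local storage in GB divided by the vCPU count.
func storagePerVCPU(i instance) float64 {
	return float64(totalStorageGB(i)) / float64(i.vcpus)
}

func main() {
	sizes := []instance{
		{"r6id.large", 2, 1, 118},
		{"r6id.12xlarge", 48, 2, 1425},
		{"r6id.32xlarge", 128, 4, 1900},
	}
	for _, s := range sizes {
		fmt.Printf("%-14s %5d GB total, %.1f GB per vCPU\n",
			s.name, totalStorageGB(s), storagePerVCPU(s))
	}
}
```

For r6id.32xlarge this gives 4 x 1900 = 7,600 GB, which is the 7.6 TB quoted in the announcement.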

Now available

The R6id instances are available today in the AWS US East (Ohio), US East (N. Virginia), US West (Oregon), and Europe (Ireland) Regions as On-Demand, Spot, and Reserved Instances or as part of a Savings Plan. As usual with EC2, you pay for what you use. For more information, see the Amazon EC2 pricing page.

To learn more, visit our Amazon EC2 R6i instances page, and please send feedback to AWS re:Post for EC2 or through your usual AWS Support contacts.

Veliswa x

Considerations for modernizing Microsoft SQL database service with high availability on AWS

Post Syndicated from Lewis Tang original https://aws.amazon.com/blogs/architecture/considerations-for-modernizing-microsoft-sql-database-service-with-high-availability-on-aws/

Many organizations have applications that require Microsoft SQL Server to run relational database workloads. Some are proprietary applications whose vendor mandates Microsoft SQL Server for the database service; others are long-standing, home-grown applications that included Microsoft SQL Server when they were first developed. When organizations migrate applications to AWS, they often start with a lift-and-shift approach and run Microsoft SQL database service on Amazon Elastic Compute Cloud (Amazon EC2), possibly because this is what they are most familiar with.

In this post, I share the architecture options to modernize Microsoft SQL database service and run highly available relational data services on Amazon EC2, Amazon Relational Database Service (Amazon RDS), and Amazon Aurora (Aurora).

Running Microsoft SQL database service on Amazon EC2 with high availability

This option is the least invasive to existing operations models. It gives you a quick start to modernize Microsoft SQL database service by leveraging the AWS Cloud to manage services like physical facilities. The low-level infrastructure operational tasks—such as server rack, stack, and maintenance—are managed by AWS. You have full control of the database and operating-system–level access, so there is a choice of tools to manage the operating system, database software, patches, data replication, backup, and restoration.

You can use any Microsoft SQL Server-supported replication technology with your Microsoft SQL Server database on Amazon EC2 to achieve high availability, data protection, and disaster recovery. Common solutions include log shipping, database mirroring, Always On availability groups, and Always On Failover Cluster Instances.

High availability in a single Region

Figure 1 shows how you can use Microsoft SQL Server on Amazon EC2 across multiple Availability Zones (AZs) within a single Region. The interconnects among AZs, which are similar to your data center intercommunications, are managed by AWS. The primary database is a read-write database, and the secondary database is configured with log shipping, database mirroring, or Always On availability groups for high availability. All the transactional data from the primary database is transferred to the secondary database, where it is applied asynchronously for log shipping and either asynchronously or synchronously for Always On availability groups and mirroring.

Figure 1. High availability in a single Region with Microsoft SQL database service on Amazon EC2

High availability across multiple Regions

Figure 2 demonstrates how to configure high availability for Microsoft SQL Server on Amazon EC2 across multiple Regions. A secondary Microsoft SQL Server in a different Region from the primary is configured with log shipping, database mirroring, or Always On availability groups for high availability. The transactional data from the primary database is transferred across Regions via the fully managed AWS backbone network.

Figure 2. High availability across multiple Regions with Microsoft SQL database service on Amazon EC2

Replatforming Microsoft SQL Database Service on Amazon RDS with high availability

Amazon RDS is a managed database service and is responsible for most management tasks. It currently supports Multi-AZ deployments for SQL Server using SQL Server Database Mirroring (DBM) or Always On Availability Groups (AGs) as a high-availability failover solution.

High availability in a single Region

Figure 3 shows a Microsoft SQL database service that runs on Amazon RDS and is configured with a Multi-AZ deployment model in a single Region. Multi-AZ deployments provide increased availability, data durability, and fault tolerance for DB instances. In the event of planned database maintenance or unplanned service disruption, Amazon RDS automatically fails over to the up-to-date secondary DB instance. This functionality lets database operations resume quickly without manual intervention. The primary and standby instances use the same endpoint, whose physical network address transitions to the secondary replica as part of the failover process. You don’t have to reconfigure your application when a failover occurs. Amazon RDS supports Multi-AZ deployments for Microsoft SQL Server by using either SQL Server database mirroring or Always On availability groups.

Figure 3. High availability in a single Region with Microsoft SQL database service on Amazon RDS
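
Because the endpoint stays the same across a failover, an application typically only needs to retry for the short period while the endpoint moves to the new primary. The following Go sketch shows one way to do that with a simple exponential backoff. It is an illustration under assumptions, not part of Amazon RDS or any driver: `withRetry`, `errTransient`, and the simulated `query` function are hypothetical names, and a real application would call its database driver inside the retried function.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTransient is a hypothetical stand-in for a connection error seen
// while the endpoint is moving to the new primary instance.
var errTransient = errors.New("transient connection error")

// withRetry calls op up to attempts times. After each failure except the
// last it sleeps for delay and then doubles the delay.
func withRetry(attempts int, delay time.Duration, op func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = op(); err == nil {
			return nil
		}
		if i < attempts-1 {
			time.Sleep(delay)
			delay *= 2
		}
	}
	return fmt.Errorf("gave up after %d attempts: %w", attempts, err)
}

func main() {
	calls := 0
	// Simulated query against the unchanged endpoint: it fails twice,
	// as it might during a failover, and then succeeds.
	query := func() error {
		calls++
		if calls < 3 {
			return errTransient
		}
		return nil
	}
	err := withRetry(5, 10*time.Millisecond, query)
	fmt.Println("calls:", calls, "error:", err)
}
```

The retry budget (attempts and initial delay) is an assumption to tune against the failover times you observe in your own environment.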

High availability across multiple Regions

Figure 4 depicts how you can use AWS Database Migration Service (AWS DMS) to configure continuous replication between Microsoft SQL database services on Amazon RDS across multiple Regions. AWS DMS needs Microsoft Change Data Capture to be enabled on the Amazon RDS for Microsoft SQL Server instance. If problems occur, you can initiate manual failovers and reinstate database services by promoting the Amazon RDS read replica in a different Region.

Figure 4. High availability across multiple Regions with Microsoft SQL database service on Amazon RDS

Refactoring Microsoft SQL database service on Amazon Aurora with high availability

This option helps you eliminate the cost of the SQL database service license and run your database service on a cloud-native database architecture. You can use AWS Schema Conversion Tool to assist in the assessment and conversion of your database code and storage objects. Any objects that cannot be automatically converted are clearly marked so they can be manually converted to complete the migration.

The Aurora architecture involves separation of storage and compute. Aurora includes some high availability features that apply to the data in your database cluster. The data remains safe even if some or all of the DB instances in the cluster become unavailable. Other high availability features apply to the DB instances. These features help to make sure that one or more DB instances are ready to handle database requests from your application.

High availability in a single Region

Figure 5 demonstrates how Aurora stores copies of the data in a database cluster across multiple AZs in a single Region. When data is written to the primary DB instance, Aurora synchronously replicates the data across AZs to six storage nodes associated with your cluster volume. Doing so provides data redundancy, eliminates I/O freezes, and minimizes latency spikes during system backups. Running a DB instance with high availability can enhance availability during planned system maintenance, such as database engine updates, and help protect your databases against failure and AZ disruption.

Figure 5. High availability in a single Region with Amazon Aurora

High availability across multiple Regions

Figure 6 depicts how you can set up Aurora global databases for high availability across multiple Regions. An Aurora global database consists of one primary Region where your data is written, and up to five read-only secondary Regions. You issue write operations directly to the primary database cluster in the primary Region. Aurora automatically replicates data to the secondary Regions using dedicated infrastructure, with latency typically under a second.

Figure 6. High availability across multiple Regions with Amazon Aurora global databases

Summary

You can choose among the options of Amazon EC2, Amazon RDS, and Amazon Aurora when modernizing SQL database service on AWS. Understanding the features required by the business and the scope of service management responsibilities is a good starting point. When presented with multiple options that meet business needs, choose one that allows more focus on your application and business value-add capabilities, and that helps you reduce the services’ “total cost of ownership”.

AWS MGN Update – Configure DR, Convert CentOS Linux to Rocky Linux, and Convert SUSE Linux Subscription

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-mgn-update-configure-dr-convert-centos-linux-to-rocky-linux-and-convert-suse-linux-subscription/

Just about a year ago, Channy showed you How to Use the New AWS Application Migration Service for Lift-and-Shift Migrations. In his post, he introduced AWS Application Migration Service (AWS MGN) and said:

With AWS MGN, you can minimize time-intensive, error-prone manual processes by automatically replicating entire servers and converting your source servers from physical, virtual, or cloud infrastructure to run natively on AWS. The service simplifies your migration by enabling you to use the same automated process for a wide range of applications.

Since launch, we have added agentless replication along with support for Windows 10 and multiple versions of Windows Server (2003, 2008, and 2022). We also expanded into additional regions throughout 2021.

New Post-Launch Actions
As the title of Channy’s post stated, AWS MGN initially supported direct, lift-and-shift migrations. In other words, the selected disk volumes on the source servers were directly copied, bit-for-bit to EBS volumes attached to freshly launched Amazon Elastic Compute Cloud (Amazon EC2) instances.

Today we are adding a set of optional post-launch actions that provide additional support for your migration and modernization efforts. The actions are initiated and managed by the AWS Systems Manager agent, which can be automatically installed as the first post-launch action. We are launching with an initial set of four actions, and plan to add more over time:

Install Agent – This action installs the AWS Systems Manager agent, and is a prerequisite to the other actions.

Disaster Recovery – Installs the AWS Elastic Disaster Recovery Service agent on each server and configures replication to a specified target region.

CentOS Conversion – If the source server is running CentOS, the instances can be migrated to Rocky Linux.

SUSE Subscription Conversion – If the source server is running SUSE Linux via a subscription provided by SUSE, the instance is changed to use an AWS-provided SUSE subscription.

Using Post-Launch Actions
My AWS account has a post-launch settings template that serves as a starting point, and provides the default settings each time I add a new source server. I can use the values from the template as-is, or I can customize them as needed. I open the Application Migration Service Console and click Settings to view and edit my template:

I click Post-launch settings template, and review the default values. Then I click Edit to make changes:

As I noted earlier, the Systems Manager agent executes the other post-launch actions, and is a prerequisite, so I enable it:

Next, I choose to run the post-launch actions on both my test and cutover instances, since I want to test against the final migrated configuration:

I can now configure any or all of the post-launch options, starting with disaster recovery. I check Configure disaster recovery on migrated servers and choose a target region:

Next, I check Convert CentOS to Rocky Linux distribution. This action converts a CentOS 8 distribution to a Rocky Linux 8 distribution:

Moving right along, I check Change SUSE Linux Subscription to AWS provided SUSE Linux subscription, and then click Save template:

To learn more about pricing for the SUSE subscriptions, visit the Amazon EC2 On-Demand Pricing page.

After I have set up my template, I can view and edit the settings for each of my source servers. I simply select the server and choose Edit post-launch settings from the Replication menu:

The post-launch actions will be run at the appropriate time on the test or the cutover instances, per my selections. Any errors that arise during the execution of an action are written to the SSM execution log. I can also examine the Migration dashboard for each source server and review the Post-launch actions:

Available Now
The post-launch actions are available now and you can start using them today in all regions where AWS Application Migration Service (AWS MGN) is supported.

Jeff;

Running AWS Lambda functions on AWS Outposts using AWS IoT Greengrass

Post Syndicated from Sheila Busser original https://aws.amazon.com/blogs/compute/running-aws-lambda-functions-on-aws-outposts-using-aws-iot-greengrass/

This blog post is written by Adam Imeson, Sr. Hybrid Edge Specialist Solution Architect.

Today, AWS customers can deploy serverless applications in AWS Regions using a variety of AWS services. Customers can also use AWS Outposts to deploy fully managed AWS infrastructure at virtually any datacenter, colocation space, or on-premises facility.

AWS Outposts extends the cloud by bringing AWS services to customers’ premises to support their hybrid and edge workloads. This post will describe how to deploy Lambda functions on an Outpost using AWS IoT Greengrass.

Consider a customer who has built an application that runs in an AWS Region and depends on AWS Lambda. This customer has a business need to enter a new geographic market, but the nearest AWS Region is not close enough to meet application latency or data residency requirements. AWS Outposts can help this customer extend AWS infrastructure and services to their desired geographic region. This blog post will explain how a customer can move their Lambda-dependent application to an Outpost.

Overview

In this walkthrough you will create a Lambda function that can run on AWS IoT Greengrass and deploy it on an Outpost. This architecture results in an AWS-native Lambda function running on the Outpost.

Architecture overview - Lambda functions on AWS Outposts

Deploying Lambda functions on Outposts rack

Prerequisites: Building a VPC

To get started, build a VPC in the same Region as your Outpost. You can do this with the create VPC option in the AWS console. The workflow allows you to set up a VPC with public and private subnets, an internet gateway, and NAT gateways as necessary. Do not consume all of the available IP space in the VPC with your subnets in this step, because you will still need to create Outposts subnets after this.

Now, build a subnet on your Outpost. You can do this by selecting your Outpost in the Outposts console and choosing Create Subnet in the drop-down Actions menu in the top right.

Confirm subnet details

Choose the VPC you just created and select a CIDR range for your new subnet that doesn’t overlap with the other subnets that are already in the VPC. Once you’ve created the subnet, you need to create a new subnet route table and associate it with your new subnet. Go into the subnet route tables section of the VPC console and create a new route table. Associate the route table with your new subnet. Add a 0.0.0.0/0 route pointing at your VPC’s internet gateway. This sets the subnet up as a public subnet, which for the purposes of this post will make it easier to access the instance you are about to build for Greengrass Core. Depending on your requirements, it may make more sense to set up a private subnet on your Outpost instead. You can also add a route pointing at your Outpost’s local gateway here. Although you won’t be using the local gateway during this walkthrough, adding a route to the local gateway makes it possible to trigger your Outpost-hosted Lambda function with on-premises traffic.

Create a new route table

Associate the route table with the new subnet

Add a 0.0.0.0/0 route pointing at your VPC’s internet gateway
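
The subnet step above requires a CIDR range that sits inside the VPC and does not overlap any existing subnet. As a sanity check before you create the subnet, the following Go sketch validates a candidate range with the standard `net/netip` package (available from Go 1.18). The function name `subnetUsable` and the example address ranges are assumptions made for illustration; substitute your own VPC and subnet CIDRs.

```go
package main

import (
	"fmt"
	"net/netip"
)

// subnetUsable reports whether candidateCIDR lies inside vpcCIDR and
// overlaps none of the existing subnet CIDRs. The name and signature
// are hypothetical; only the net/netip calls are standard library.
func subnetUsable(vpcCIDR, candidateCIDR string, existingCIDRs []string) (bool, error) {
	vpc, err := netip.ParsePrefix(vpcCIDR)
	if err != nil {
		return false, err
	}
	candidate, err := netip.ParsePrefix(candidateCIDR)
	if err != nil {
		return false, err
	}
	// Normalize both prefixes so host bits are zeroed before comparing.
	vpc, candidate = vpc.Masked(), candidate.Masked()

	// The candidate must be at least as specific as the VPC range and
	// must start inside it.
	if candidate.Bits() < vpc.Bits() || !vpc.Contains(candidate.Addr()) {
		return false, nil
	}
	for _, c := range existingCIDRs {
		existing, err := netip.ParsePrefix(c)
		if err != nil {
			return false, err
		}
		if candidate.Overlaps(existing) {
			return false, nil
		}
	}
	return true, nil
}

func main() {
	// Example ranges only: a /16 VPC with two /24 subnets already in use.
	existing := []string{"10.0.0.0/24", "10.0.1.0/24"}
	for _, c := range []string{"10.0.1.128/25", "10.0.2.0/24", "10.1.0.0/24"} {
		ok, err := subnetUsable("10.0.0.0/16", c, existing)
		fmt.Println(c, "usable:", ok, "error:", err)
	}
}
```

In this example only 10.0.2.0/24 is usable: 10.0.1.128/25 overlaps an existing subnet, and 10.1.0.0/24 falls outside the VPC range.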

Setup: Launching an instance to run Greengrass Core

Create a new EC2 instance in your Outpost subnet. As long as your Outpost has capacity for your desired instance type, this operation will proceed the same way as any other EC2 instance launch. You can check your Outpost’s capacity in the Outposts console or in Amazon CloudWatch:

I used a c5.large instance running Amazon Linux 2 with 20 GiB of Amazon EBS storage for this walkthrough. You can pick a different instance size or a different operating system in accordance with your application’s needs and the AWS IoT Greengrass documentation. For the purposes of this tutorial, we assign a public IP address to the EC2 instance on creation.

Step 1: Installing the AWS IoT Greengrass Core software

Once your EC2 instance is up and running, you will need to install the AWS IoT Greengrass Core software on the instance. Follow the AWS IoT Greengrass documentation to do this. You will need to do the following:

  1. Ensure that your EC2 instance has appropriate AWS permissions to make AWS API calls. You can do this by attaching an instance profile to the instance, or by providing AWS credentials directly to the instance as environment variables, as in the Greengrass documentation.
  2. Log in to your instance.
  3. Install OpenJDK 11. For Amazon Linux 2, you can use sudo amazon-linux-extras install java-openjdk11 to do this.
  4. Create the default system user and group that runs components on the device, with
    sudo useradd --system --create-home ggc_user
    sudo groupadd --system ggc_group
  5. Edit the /etc/sudoers file with sudo visudo such that the entry for the root user looks like root ALL=(ALL:ALL) ALL
  6. Enable cgroups and enable and mount the memory and devices cgroups. In Amazon Linux 2, you can do this with the grubby utility as follows:
    sudo grubby --args="cgroup_enable=memory cgroup_memory=1 systemd.unified_cgroup_hierarchy=0" --update-kernel /boot/vmlinuz-$(uname -r)
  7. Type sudo reboot to reboot your instance with the cgroup boot parameters enabled.
  8. Log back in to your instance once it has rebooted.
  9. Use this command to download the AWS IoT Greengrass Core software to the instance:
    curl -s https://d2s8p88vqu9w66.cloudfront.net/releases/greengrass-nucleus-latest.zip > greengrass-nucleus-latest.zip
  10. Unzip the AWS IoT Greengrass Core software:
    unzip greengrass-nucleus-latest.zip -d GreengrassInstaller && rm greengrass-nucleus-latest.zip
  11. Run the following command to launch the installer. Replace each argument with appropriate values for your particular deployment, particularly the aws-region and thing-name arguments.
    sudo -E java -Droot="/greengrass/v2" -Dlog.store=FILE \
    -jar ./GreengrassInstaller/lib/Greengrass.jar \
    --aws-region region \
    --thing-name MyGreengrassCore \
    --thing-group-name MyGreengrassCoreGroup \
    --thing-policy-name GreengrassV2IoTThingPolicy \
    --tes-role-name GreengrassV2TokenExchangeRole \
    --tes-role-alias-name GreengrassCoreTokenExchangeRoleAlias \
    --component-default-user ggc_user:ggc_group \
    --provision true \
    --setup-system-service true \
    --deploy-dev-tools true
  12. You have now installed the AWS IoT Greengrass Core software on your EC2 instance. If you type sudo systemctl status greengrass.service then you should see output similar to this:

Step 2: Building and deploying a Lambda function

Now build a Lambda function and deploy it to the new Greengrass Core instance. You can find example local Lambda functions in the aws-greengrass-lambda-functions GitHub repository. This example will use the Hello World Python 3 function from that repo.

  1. Create the Lambda function. Go to the Lambda console, choose Create function, and select the Python 3.8 runtime:

  2. Choose Create function at the bottom of the page. Once your new function has been created, copy the code from the Hello World Python 3 example into your function:

  3. Choose Deploy to deploy your new function’s code.
  4. In the top right, choose Actions and select Publish new version. For this particular function, you would need to create a deployment package with the AWS IoT Greengrass SDK for the function to work on the device. I’ve omitted this step for brevity as it is not a main focus of this post. Please reference the Lambda documentation on deployment packages and the Python-specific deployment package docs if you want to pursue this option.

  5. Go to the AWS IoT Greengrass console and choose Components in the left-side pop-in menu.
  6. On the Components page, choose Create component, and then Import Lambda function. If you prefer to do this programmatically, see the relevant AWS IoT Greengrass documentation or AWS CloudFormation documentation.
  7. Choose your new Lambda function from the drop-down.

Create component

  8. Scroll to the bottom and choose Create component.
  9. Go to the Core devices menu in the left-side nav bar and select your Greengrass Core device. This is the Greengrass Core EC2 instance you set up earlier. Make a note of the core device’s name.

  10. Use the left-side nav bar to go to the Deployments menu. Choose Create to create a new deployment, which will place your Lambda function on your Outpost-hosted core device.
  11. Give the deployment a name and select Core device, providing the name of your core device. Choose Next.

  12. Select your Lambda function and choose Next.

  13. Choose Next again on both the Configure components and Configure advanced settings pages. On the last page, choose Deploy.

You should see a green message at the top of the screen indicating that your configuration is now being deployed.

Clean up

  1. Delete the Lambda function you created.
  2. Terminate the Greengrass Core EC2 instance.
  3. Delete the VPC.

Conclusion

Many customers use AWS Outposts to expand applications into new geographies. Some customers want to run Lambda-based applications on Outposts. This blog post shows how to use AWS IoT Greengrass to build Lambda functions which run locally on Outposts.

To learn more about Outposts, please contact your AWS representative and visit the Outposts homepage and documentation.

Making your Go workloads up to 20% faster with Go 1.18 and AWS Graviton

Post Syndicated from Sheila Busser original https://aws.amazon.com/blogs/compute/making-your-go-workloads-up-to-20-faster-with-go-1-18-and-aws-graviton/

This blog post was written by Syl Taylor, Professional Services Consultant.

In March 2022, the highly anticipated Go 1.18 was released. Go 1.18 brings to the language some long-awaited features and additions, such as generics. It also brings significant performance improvements for Arm’s 64-bit architecture used in AWS Graviton server processors. In this post, we show how migrating Go workloads from Go 1.17.8 to Go 1.18 can help you run your applications up to 20% faster and more cost-effectively. To achieve this goal, we selected a series of realistic and relatable workloads to showcase how they perform when compiled with Go 1.18.

Overview

Go is an open-source programming language which can be used to create a wide range of applications. It’s developer-friendly and suitable for designing production-grade workloads in areas such as web development, distributed systems, and cloud-native software.

AWS Graviton2 processors are custom-built by AWS using 64-bit Arm Neoverse cores to deliver the best price-performance for your cloud workloads running in Amazon Elastic Compute Cloud (Amazon EC2). They provide up to 40% better price/performance over comparable x86-based instances for a wide variety of workloads and they can run numerous applications, including those written in Go.

Web service throughput

For web applications, the number of HTTP requests that a server can process in a window of time is an important measurement to determine scalability needs and reduce costs.

To demonstrate the performance improvements for a Go-based web service, we selected the popular Caddy web server. To perform the load testing, we selected the hey application, which was also written in Go. We deployed these packages in a client/server scenario on m6g Graviton instances.

Relative performance comparison for requesting a static webpage

The Caddy web server compiled with Go 1.18 brings a 7-8% throughput improvement as compared with the variant compiled with Go 1.17.8.

We conducted a second test where the client downloads a dynamic page on which the request handler performs some additional processing to write the HTTP response content. The performance gains were also noticeable at 10-11%.

Relative performance comparison for requesting a dynamic webpage
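
To make the "dynamic page" case concrete, here is a minimal Go sketch of a request handler that does some per-request processing before writing the response. It is not the Caddy configuration or handler used in our tests; the handler, the `name` query parameter, and the `fetch` helper are assumptions made for illustration, using only the standard library.

```go
package main

import (
	"fmt"
	"html"
	"net/http"
	"net/http/httptest"
	"strings"
)

// dynamicPage builds its response for each request instead of serving a
// static file. The processing here (read a query parameter, upper-case
// it, escape it) is a stand-in for real application logic.
func dynamicPage(w http.ResponseWriter, r *http.Request) {
	name := r.URL.Query().Get("name")
	if name == "" {
		name = "world"
	}
	fmt.Fprintf(w, "<h1>Hello, %s!</h1>", html.EscapeString(strings.ToUpper(name)))
}

// fetch runs the handler in-process and returns the status code and body,
// so the sketch can be exercised without opening a network port.
func fetch(target string) (int, string) {
	rec := httptest.NewRecorder()
	dynamicPage(rec, httptest.NewRequest(http.MethodGet, target, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	code, body := fetch("/?name=graviton")
	fmt.Println(code, body)
}
```

To measure throughput, you could register the handler with `http.HandleFunc`, start it with `http.ListenAndServe`, and drive it with a load generator such as hey.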

Regular expression searches

Searching through large amounts of text is where regular expression patterns excel. They can be used for many use cases, such as:

  • Checking if a string has a valid format (e.g., email address, domain name, IP address),
  • Finding all of the occurrences of a string (e.g., date) in a text document,
  • Identifying a string and replacing it with another.

However, despite their efficiency in search engines, text editors, or log parsers, regular expression evaluation is an expensive operation to run. We recommend identifying optimizations to reduce search time and compute costs.

The following example uses the Go regexp package to compile a pattern and search for the presence of a standard date format in a large generated string. We observed a 13.5% increase in completed executions with a 12% reduction in execution time.

Relative performance comparison for using regular expressions to check that a pattern exists
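
The presence check described above can be sketched as follows. This is not the exact benchmark program; the date format (YYYY-MM-DD), the pattern, and the generated text are assumptions chosen to illustrate the approach with the standard `regexp` package.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// datePattern assumes dates written as YYYY-MM-DD. Compiling once at
// package level avoids recompiling the pattern on every call.
var datePattern = regexp.MustCompile(`\b\d{4}-\d{2}-\d{2}\b`)

// containsDate reports whether text contains at least one date.
func containsDate(text string) bool {
	return datePattern.MatchString(text)
}

func main() {
	// Build a large string with a single date at the very end, so the
	// search has to scan the whole input.
	haystack := strings.Repeat("lorem ipsum ", 100000) + "released 2022-03-15"
	fmt.Println(containsDate(haystack))
	fmt.Println(containsDate("no dates here"))
}
```

Wrapping `containsDate` in a standard `testing.B` benchmark and running it under both Go versions is one way to reproduce this kind of comparison on your own data.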

In a second example, we used the Go regexp package to find all of the occurrences of a pattern for character sequences in a string, and then replace them with a single character. We observed a 12% increase in evaluation rate with an 11% reduction in execution time.

Relative performance comparison for using regular expressions to find and replace all of the occurrences of a pattern
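
A find-and-replace workload of this kind might look like the sketch below. The pattern (runs of two or more spaces or tabs) and the replacement character are assumptions made for illustration; the benchmark's actual pattern is not shown in this post.

```go
package main

import (
	"fmt"
	"regexp"
)

// runs matches sequences of two or more spaces or tabs.
var runs = regexp.MustCompile(`[ \t]{2,}`)

// collapse replaces every matched run with a single space and also
// returns how many runs were found.
func collapse(text string) (string, int) {
	n := len(runs.FindAllStringIndex(text, -1))
	return runs.ReplaceAllString(text, " "), n
}

func main() {
	out, n := collapse("a  b   c d")
	fmt.Printf("%q (%d runs replaced)\n", out, n)
}
```

Counting matches with `FindAllStringIndex` scans the input a second time; if you only need the rewritten text, `ReplaceAllString` alone is enough.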

As with most workloads, the improvements will vary depending on the input data, the hardware selected, and the software stack installed. Furthermore, with this use case, the regular expression usage will have an impact on the overall performance. Given the importance of regex patterns in modern applications, as well as the scale at which they’re used, we recommend upgrading to Go 1.18 for any software that relies heavily on regular expression operations.

Database storage engines

Many database storage engines use a key-value store design to benefit from simplicity of use, faster speed, and improved horizontal scalability. Two implementations commonly used are B-trees and LSM (log-structured merge) trees. In the age of cloud technology, building distributed applications that leverage a suitable database service is important to make sure that you maximize your business outcomes.

B-trees are seen in many database management systems (DBMS), and they’re used to efficiently perform queries using indexes. When we tested a sample program for inserting and deleting in a large B-tree structure, we observed a 10.5% throughput increase with a 10% reduction in execution time.

Relative performance comparison for inserting and deleting in a B-Tree structure

On the other hand, LSM trees can achieve high rates of write throughput, thus making them useful for big data or time series events, such as metrics and real-time analytics. They’re used in modern applications due to their ability to handle large write workloads in a time of rapid data growth. The following are examples of databases that use LSM trees:

  • InfluxDB is a powerful database used for high-speed read and writes on time series data. It’s written in Go and its storage engine uses a variation of LSM called the Time-Structured Merge Tree (TSM).
  • CockroachDB is a popular distributed SQL database written in Go with its own LSM tree implementation.
  • Badger is written in Go and is the engine behind Dgraph, a graph database. Its design leverages LSM trees.

When we tested an LSM tree sample program, we observed a 13.5% throughput increase with a 9.5% reduction in execution time.
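For readers unfamiliar with the structure, the following is a toy sketch of the LSM idea, not the sample program that was benchmarked. It assumes a simplified design: writes land in an in-memory memtable, which is flushed to an immutable sorted run once it reaches a size limit. Compaction and deletes (tombstones) are left out.

```go
package main

import (
	"fmt"
	"sort"
)

type entry struct{ key, value string }

// lsm is a toy log-structured merge store.
type lsm struct {
	memtable map[string]string
	runs     [][]entry // oldest first; each run is sorted by key
	limit    int       // memtable size that triggers a flush
}

func newLSM(limit int) *lsm {
	return &lsm{memtable: make(map[string]string), limit: limit}
}

// Put writes to the memtable only, which is what keeps writes cheap.
func (l *lsm) Put(key, value string) {
	l.memtable[key] = value
	if len(l.memtable) >= l.limit {
		l.flush()
	}
}

// flush turns the memtable into a sorted run and starts a fresh memtable.
func (l *lsm) flush() {
	run := make([]entry, 0, len(l.memtable))
	for k, v := range l.memtable {
		run = append(run, entry{k, v})
	}
	sort.Slice(run, func(i, j int) bool { return run[i].key < run[j].key })
	l.runs = append(l.runs, run)
	l.memtable = make(map[string]string)
}

// Get checks the memtable first, then the runs from newest to oldest,
// so the most recent write for a key wins.
func (l *lsm) Get(key string) (string, bool) {
	if v, ok := l.memtable[key]; ok {
		return v, true
	}
	for i := len(l.runs) - 1; i >= 0; i-- {
		run := l.runs[i]
		j := sort.Search(len(run), func(k int) bool { return run[k].key >= key })
		if j < len(run) && run[j].key == key {
			return run[j].value, true
		}
	}
	return "", false
}

func main() {
	db := newLSM(2)
	db.Put("cpu", "10")
	db.Put("mem", "20") // second entry reaches the limit and triggers a flush
	db.Put("cpu", "30") // the newer value stays in the memtable
	v, _ := db.Get("cpu")
	fmt.Println(v, len(db.runs))
}
```

Reads may have to consult several runs, which is the price paid for the cheap, append-style write path.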

We also tested InfluxDB using comparison benchmarks to analyze writes and reads to the database server. On the load stress test, we saw a 10% increase of insertion throughput and a 14.5% faster rate when querying at a large scale.

Relative performance comparison for inserting to and querying from an InfluxDB database

In summary, for databases with an engine written in Go, you’ll likely observe better performance when upgrading to a version that has been compiled with Go 1.18.

Machine learning training

A popular unsupervised machine learning (ML) algorithm is K-Means clustering. It aims to group similar data points into k clusters. We used a dataset of 2D coordinates to train K-Means and obtain the cluster distribution in a deterministic manner. The example program uses an OOP design. We noticed an 18% improvement in execution throughput and a 15% reduction in execution time.

Relative performance comparison for training a K-means model
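As an illustration of the algorithm, here is a minimal K-Means sketch (Lloyd's algorithm) over 2D points. It is not the benchmarked program: it uses plain functions rather than an OOP design, and it assumes that seeding the centroids with the first k points is an acceptable way to keep the run deterministic.

```go
package main

import (
	"fmt"
	"math"
)

type point struct{ x, y float64 }

// dist2 returns the squared Euclidean distance between a and b.
func dist2(a, b point) float64 {
	return (a.x-b.x)*(a.x-b.x) + (a.y-b.y)*(a.y-b.y)
}

// kmeans runs Lloyd's algorithm for a fixed number of iterations.
// Seeding with the first k points keeps the result deterministic.
func kmeans(pts []point, k, iters int) (centroids []point, labels []int) {
	centroids = append([]point(nil), pts[:k]...)
	labels = make([]int, len(pts))
	for it := 0; it < iters; it++ {
		// Assignment step: attach each point to its nearest centroid.
		for i, p := range pts {
			best, bestD := 0, math.Inf(1)
			for c, ctr := range centroids {
				if d := dist2(p, ctr); d < bestD {
					best, bestD = c, d
				}
			}
			labels[i] = best
		}
		// Update step: move each centroid to the mean of its points.
		sums := make([]point, k)
		counts := make([]int, k)
		for i, p := range pts {
			sums[labels[i]].x += p.x
			sums[labels[i]].y += p.y
			counts[labels[i]]++
		}
		for c := range centroids {
			if counts[c] > 0 {
				n := float64(counts[c])
				centroids[c] = point{sums[c].x / n, sums[c].y / n}
			}
		}
	}
	return centroids, labels
}

func main() {
	pts := []point{{0, 0}, {10, 10}, {0, 1}, {10, 11}, {1, 0}, {11, 10}}
	centroids, labels := kmeans(pts, 2, 10)
	fmt.Println(labels)
	fmt.Println(centroids)
}
```

The inner loops consist of many small function calls, which is the kind of code the post identifies as benefiting from the register-based calling convention.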

A widely-used and supervised ML algorithm for both classification and regression is Random Forest. It’s composed of numerous individual decision trees, and it uses a voting mechanism to determine which prediction to use. It’s a powerful method for optimizing ML models.

We ran a deterministic example to train a dense Random Forest. The program uses an OOP design and we noted a 20% improvement in execution throughput and a 15% reduction in execution time.

Relative performance comparison for training a Random Forest model

Recursion

An efficient, general-purpose method for sorting data is the merge sort algorithm. It works by repeatedly splitting the data into halves until each part holds a single element. Then, it merges the parts back together in sorted order, step by step, until it produces the final sorted result. This divide-and-conquer approach is most naturally implemented with recursion. We ran the program using a large dataset of numbers and observed a 7% improvement in execution throughput and a 4.5% reduction in execution time.

Relative performance comparison for running a merge sort algorithm
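A minimal top-down merge sort in Go shows the recursive shape of the workload. This is a sketch for illustration and is not the benchmarked program; the function names are our own.

```go
package main

import "fmt"

// mergeSort returns a sorted copy of xs using top-down recursion.
func mergeSort(xs []int) []int {
	if len(xs) <= 1 {
		return append([]int(nil), xs...)
	}
	mid := len(xs) / 2
	return merge(mergeSort(xs[:mid]), mergeSort(xs[mid:]))
}

// merge combines two sorted slices into one sorted slice.
func merge(a, b []int) []int {
	out := make([]int, 0, len(a)+len(b))
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		if a[i] <= b[j] {
			out = append(out, a[i])
			i++
		} else {
			out = append(out, b[j])
			j++
		}
	}
	out = append(out, a[i:]...)
	return append(out, b[j:]...)
}

func main() {
	fmt.Println(mergeSort([]int{5, 2, 9, 1, 5, 6})) // prints [1 2 5 5 6 9]
}
```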

Depth-first search (DFS) is a fundamental recursive algorithm for traversing tree or graph data structures. Many complex applications rely on DFS variants to solve or optimize hard problems in various areas, such as path finding, scheduling, or circuit design. We implemented a standard DFS traversal in a fully-connected graph. Then we observed a 14.5% improvement in execution throughput and a 13% reduction in execution time.

Relative performance comparison for running a DFS algorithm
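The traversal can be sketched as follows. This is a minimal recursive DFS over a small fully-connected graph built as an adjacency list; the graph size and helper names are assumptions for illustration, not details of the benchmarked program.

```go
package main

import "fmt"

// completeGraph builds an adjacency list in which every node
// is connected to every other node.
func completeGraph(n int) [][]int {
	adj := make([][]int, n)
	for u := 0; u < n; u++ {
		for v := 0; v < n; v++ {
			if u != v {
				adj[u] = append(adj[u], v)
			}
		}
	}
	return adj
}

// dfs visits every node reachable from u, recursing into unvisited
// neighbors, and returns the order in which nodes were first visited.
func dfs(adj [][]int, u int, visited []bool, order []int) []int {
	visited[u] = true
	order = append(order, u)
	for _, v := range adj[u] {
		if !visited[v] {
			order = dfs(adj, v, visited, order)
		}
	}
	return order
}

func main() {
	adj := completeGraph(5)
	order := dfs(adj, 0, make([]bool, len(adj)), nil)
	fmt.Println(order) // prints [0 1 2 3 4]
}
```

In a fully-connected graph the recursion depth equals the number of nodes, so each visit adds one more nested function call.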

Conclusion

In this post, we’ve shown that a variety of applications, not just those primarily compute-bound, can benefit from the 64-bit Arm CPU performance improvements released in Go 1.18. Programs that use an object-oriented design, recursion, or many function calls in their implementation will likely benefit more from the new register ABI calling convention.

By using AWS Graviton EC2 instances, you can benefit from up to a 40% price/performance improvement over other instance types. Furthermore, you can save even more with Graviton through the additional performance improvements by simply recompiling your Go applications with Go 1.18.

To learn more about Graviton, see the Getting started with AWS Graviton guide.

AWS Week In Review – May 30, 2022

Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/aws-week-in-review-may-30-2022/

Today, the US observes Memorial Day. South Korea also has a national Memorial Day, celebrated next week on June 6. In both countries, the day is set aside to remember those who sacrificed in service to their country. This time provides an opportunity to recognize and show our appreciation for the armed services and the important role they play in protecting and preserving national security.

AWS has also supported our veterans, active-duty military personnel, and military spouses with our training and hiring programs in the US. We’ve developed a number of programs focused on engaging the military community, helping them develop valuable AWS technical skills, and aiding in transitioning them to begin their journey to the cloud. To learn more, see AWS’s military commitment.

Last Week’s Launches
The launches that caught my attention last week are the following:

Three New AWS Wavelength Zones in the US and South Korea – We announced the availability of three new AWS Wavelength Zones: Nashville, Tennessee, and Tampa, Florida, in the US on Verizon’s 5G Ultra Wideband network, and Seoul in South Korea on SK Telecom’s 5G network.

AWS Wavelength Zones embed AWS compute and storage services at the edge of communications service providers’ 5G networks while providing seamless access to cloud services running in an AWS Region. We have a total of 28 Wavelength Zones across Canada, Germany, Japan, South Korea, the UK, and the US. Learn more about AWS Wavelength and get started today.

New Amazon EC2 C7g, M6id, C6id, and P4de Instance Types – Last week, we announced four new EC2 instance types. C7g instances are the first instances powered by the latest AWS Graviton3 processors and deliver up to 25 percent better performance over Graviton2-based C6g instances for a broad spectrum of applications, including high-performance computing (HPC) and CPU-based machine learning (ML) inference.

M6id and C6id instances are powered by the Intel Xeon Scalable processors (Ice Lake) with an all-core turbo frequency of 3.5 GHz, equipped with up to 7.6 TB of local NVMe-based SSD block-level storage, and deliver up to 15 percent better price performance compared to the previous generation instances.

P4de instances are a preview of our latest GPU-based instances that provide the highest performance for ML training and HPC applications. They are powered by 8 NVIDIA A100 GPUs with 80 GB of high-performance HBM2e GPU memory each, 2X higher than the GPUs in our current P4d instances. The new P4de instances provide a total of 640 GB of GPU memory, providing up to 60 percent better ML training performance along with 20 percent lower cost to train when compared to P4d instances.

Amazon EC2 Stop Protection Feature to Protect Instances From Unintentional Stop Actions – Now you don’t have to worry about accidentally stopping your instances. With Stop Protection, you can safeguard data in instance store volumes from unintentional stop actions. Previously, you could already protect your instances from unintentional termination by enabling Termination Protection.

When enabled, the Stop or Termination Protection feature blocks attempts to stop or terminate the instance via the EC2 console, API call, or CLI command. This feature provides an extra measure of protection for stateful workloads, since instances can be stopped or terminated only after the corresponding protection feature is deactivated.

AWS DataSync Supports Google Cloud Storage and Azure Files Storage Locations – We announced the general availability of two additional storage locations for AWS DataSync, an online data movement service that makes it easy to sync your data both into and out of the AWS Cloud. With this release, DataSync now supports Google Cloud Storage and Azure Files storage locations in addition to Network File System (NFS) shares, Server Message Block (SMB) shares, Hadoop Distributed File Systems (HDFS), self-managed object storage, AWS Snowcone, Amazon Simple Storage Service (Amazon S3), Amazon Elastic File System (Amazon EFS), Amazon FSx for Windows File Server, Amazon FSx for Lustre, and Amazon FSx for OpenZFS.

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

Other AWS News
Last week, there were many public sector announcements at AWS Summit Washington, DC.

To learn more, watch the keynote of Max Peterson, Vice President of AWS Worldwide Public Sector.

Upcoming AWS Events
If you have a developer background or similar and are looking to develop ML skills you can use to solve real-world problems, Let’s Ship It – with AWS! ML Edition is the perfect place to start. Over eight episodes of Twitch training scheduled from June 2 to July 21, you can learn hands-on how to build ML models, such as predicting demand and personalizing your offerings, and more.

The AWS Summit season is mostly over in Asia Pacific and Europe, but there are some upcoming virtual and in-person Summits that might be close to you in June:

More to come in August and September.

Please join Amazon re:MARS 2022 (June 21 – 24) to hear from recognized thought leaders and technical experts who are building the future of machine learning, automation, robotics, and space. You can also preview Robotics at Amazon, published by Amazon Science, which discusses recent real-world challenges of building robotic systems.

You can now register for AWS re:Inforce 2022 (July 26 – 27). Join us in Boston to learn how AWS is innovating in the world of cloud security, and hone your technical skills in expert-led interactive sessions.

You can now register for AWS re:Invent 2022 (November 28 – December 2). Join us in Las Vegas to experience our most vibrant event that brings together the global cloud community. You can virtually attend live keynotes and leadership sessions and access our on-demand breakout sessions even after re:Invent closes.

That’s all for this week. Check back next Monday for another Week in Review!

Channy

This post is part of our Week in Review series. Check back each week for a quick roundup of interesting news and announcements from AWS!

New – Amazon EC2 M6id and C6id Instances with Up to 7.6 TB Local NVMe Storage

Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/new-amazon-ec2-m6id-and-c6id-instances-with-up-to-7-6-tb-local-nvme-storage/

Last year, we launched the Amazon EC2 M6i instances and C6i instances, our sixth-generation offerings that include 3rd generation Intel Xeon Scalable processors.

Today, we are expanding this lineup with Amazon EC2 M6id and C6id instances, backed by NVMe-based SSD block-level instance storage physically connected to the host server. These instances are powered by the Intel Xeon Scalable processors (Ice Lake) with an all-core turbo frequency of 3.5 GHz, equipped with up to 7.6 TB of local NVMe-based SSD block-level storage, and deliver up to 15 percent better price performance compared to previous generation instances.

M6id instances are ideal for workloads that require a balance of compute and memory resources along with high-speed, low-latency local block storage, including data logging and media processing. C6id is ideal for compute-intensive workloads, including those that need access to high-speed, low-latency local storage like video encoding, image manipulation, and other forms of media processing. Both M6id and C6id will also benefit applications that need temporary storage of data, such as batch and log processing and applications that need caches and scratch files.

Compared to previous generation instances, the new instance types provide:

  • Up to 58 percent higher storage per vCPU and 34 percent lower cost per TB compared to M5d instances, and up to 138 percent higher storage per vCPU and 56 percent lower cost per TB compared with C5d instances.
  • Larger instance sizes (32xlarge) with up to 128 vCPUs and 512 GiB (M6id) or 256 GiB (C6id) of memory that make it easier and more cost-efficient to consolidate workloads and scale up applications.
  • Up to 15 percent improvement in compute price performance and 20 percent higher memory bandwidth.
  • Twice the bandwidth: up to 40 Gbps for Amazon EBS and up to 50 Gbps for networking.

Here are the specs of M6id instances in detail:

| Instance Name | vCPUs | RAM (GiB) | Local NVMe SSD Storage (GB) | EBS Throughput (Gbps) | Network Bandwidth (Gbps) |
|---|---|---|---|---|---|
| m6id.large | 2 | 8 | 1 x 118 | Up to 10 | Up to 12.5 |
| m6id.xlarge | 4 | 16 | 1 x 237 | Up to 10 | Up to 12.5 |
| m6id.2xlarge | 8 | 32 | 1 x 474 | Up to 10 | Up to 12.5 |
| m6id.4xlarge | 16 | 64 | 1 x 950 | Up to 10 | Up to 12.5 |
| m6id.8xlarge | 32 | 128 | 1 x 1900 | 10 | 12.5 |
| m6id.12xlarge | 48 | 192 | 2 x 1425 | 15 | 18.75 |
| m6id.16xlarge | 64 | 256 | 2 x 1900 | 20 | 25 |
| m6id.24xlarge | 96 | 384 | 4 x 1425 | 30 | 37.5 |
| m6id.32xlarge | 128 | 512 | 4 x 1900 | 40 | 50 |
| m6id.metal | 128 | 512 | 4 x 1900 | 40 | 50 |
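The "storage per vCPU" comparison can be checked with simple arithmetic. The sketch below takes the m6id.32xlarge figures from the table above; the m5d.24xlarge figures (4 x 900 GB, 96 vCPUs) are an assumption based on published EC2 specifications and do not appear in this post.

```go
package main

import "fmt"

// storagePerVCPU returns local NVMe capacity in GB per vCPU.
func storagePerVCPU(disks int, diskGB float64, vcpus int) float64 {
	return float64(disks) * diskGB / float64(vcpus)
}

// pctHigher returns how much larger a is than b, in percent.
func pctHigher(a, b float64) float64 {
	return (a/b - 1) * 100
}

func main() {
	m6id := storagePerVCPU(4, 1900, 128) // m6id.32xlarge, from the table above
	m5d := storagePerVCPU(4, 900, 96)    // m5d.24xlarge: assumed figures, not from this post
	fmt.Printf("%.1f GB/vCPU vs %.1f GB/vCPU: %.0f%% higher\n", m6id, m5d, pctHigher(m6id, m5d))
}
```

Under these assumptions the result is roughly 58 percent, which is consistent with the "up to 58 percent" figure quoted for M5d.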

Here are the specs of C6id instances in detail:

| Instance Name | vCPUs | RAM (GiB) | Local NVMe SSD Storage (GB) | EBS Throughput (Gbps) | Network Bandwidth (Gbps) |
|---|---|---|---|---|---|
| c6id.large | 2 | 4 | 1 x 118 | Up to 10 | Up to 12.5 |
| c6id.xlarge | 4 | 8 | 1 x 237 | Up to 10 | Up to 12.5 |
| c6id.2xlarge | 8 | 16 | 1 x 474 | Up to 10 | Up to 12.5 |
| c6id.4xlarge | 16 | 32 | 1 x 950 | Up to 10 | Up to 12.5 |
| c6id.8xlarge | 32 | 64 | 1 x 1900 | 10 | 12.5 |
| c6id.12xlarge | 48 | 96 | 2 x 1425 | 15 | 18.75 |
| c6id.16xlarge | 64 | 128 | 2 x 1900 | 20 | 25 |
| c6id.24xlarge | 96 | 192 | 4 x 1425 | 30 | 37.5 |
| c6id.32xlarge | 128 | 256 | 4 x 1900 | 40 | 50 |
| c6id.metal | 128 | 256 | 4 x 1900 | 40 | 50 |

You can use any Amazon Machine Image (AMI) that includes drivers for the Elastic Network Adapter (ENA) and NVMe. For optimal networking performance on these new instances, an ENA driver update may be required. For more information on the optimal ENA driver for M6id and C6id instances, see this article on migrating instances.

Here are a few things to keep in mind about the local NVMe storage on these instances:

  • You don’t have to specify a block device mapping in your AMI or during the instance launch; the local storage will show up as one or more devices (/dev/nvme*1 on Linux) after the guest operating system has booted.
  • Each local NVMe device is hardware encrypted using the XTS-AES-256 block cipher and a unique key. Each key is destroyed when the instance is stopped or terminated.
  • Local NVMe devices have the same lifetime as the instance they are attached to and do not stick around after the instance has been stopped or terminated.
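To see which NVMe devices the guest operating system exposes after boot, a small Go program can glob the device directory. This is a sketch under assumptions: the pattern /dev/nvme*n1 relies on the Linux convention of naming NVMe namespaces nvme<controller>n<namespace>, and on Nitro-based instances EBS volumes also appear as NVMe devices, so the list is not limited to instance store volumes.

```go
package main

import (
	"fmt"
	"log"
	"path/filepath"
)

// nvmeDevices returns the paths that match the given glob pattern.
func nvmeDevices(pattern string) ([]string, error) {
	return filepath.Glob(pattern)
}

func main() {
	// Assumed pattern: Linux names NVMe namespaces /dev/nvme<controller>n<namespace>.
	devs, err := nvmeDevices("/dev/nvme*n1")
	if err != nil {
		log.Fatal(err)
	}
	for _, d := range devs {
		fmt.Println(d)
	}
}
```

Because the data on instance store volumes does not survive a stop or terminate, anything written to these devices should be treated as scratch data.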

Now Available
You can launch M6id and C6id instances today in the AWS US East (Ohio), US East (N. Virginia), US West (Oregon), and Europe (Ireland) Regions as On-Demand, Spot, and Reserved Instances or as part of a Savings Plan. As usual with EC2, you pay for what you use. For more information, see the EC2 pricing page.

To learn more, visit our Amazon EC2 M6i instances or C6i instances page, and please send feedback to AWS re:Post for EC2 or through your usual AWS Support contacts.

– Channy

Implementing lightweight on-premises API connectivity using inverting traffic proxy

Post Syndicated from Oleksiy Volkov original https://aws.amazon.com/blogs/architecture/implementing-lightweight-on-premises-api-connectivity-using-inverting-traffic-proxy/

This post will explore the use of lightweight application inversion proxy as a solution for multi-point hybrid or multi-cloud, API-level connectivity for cases where AWS Direct Connect or VPN may not be practical. Then, we will present a sample solution and explain how it addresses typical challenges involved in this space.

Defining the issue

Large ISV providers and integration vendors often need to have API-level integration between a central cloud-based system and a number of on-premises APIs. Use cases can range from refactoring/modernization initiatives to interfacing with legacy on-premises applications, which have no direct migration path to the cloud.

The typical approach is to use VPN or Direct Connect, as they can provide significant benefits in terms of latency and security. However, they are not always practical in situations involving multi-source systems deployed by various groups or organizations that may have significant budget, process, or timeline constraints.

Conceptual solution

An option that addresses the connectivity need is an inverting application proxy, which can be deployed as a lightweight executable on an on-premises backend. The locally deployed agent can communicate with the proxy server on AWS using an inverted communication pattern. This means that the agent will establish an outbound connection to the proxy, and it will use the connection to receive inbound requests, too. Figure 1 describes a sample architecture that applies the inverting proxy pattern behind an Amazon API Gateway façade.


Figure 1. Inverting application proxy

The advantages of this approach include ease of deployment (a drop-in executable agent) and ease of configuration. As the proxy inverts the direction of application connectivity to originate from on-premises servers, the local firewall does not need to be reconfigured to open the additional ports needed for a traditional proxy deployment.

Realizing the solution on AWS

We have built a sample traffic routing solution based on the original open-source Inverting Proxy and Agent by Ian Maddox, Jason Cooke, and Omar Janjur. The solution is written in Go and leverages multiple AWS services to provide additional telemetry, security, and discoverability capabilities that address the common needs of enterprise customers.

The solution consists of an inverting proxy and a forwarding agent. The inverting proxy is deployed on AWS as a stand-alone executable running on Amazon Elastic Compute Cloud (EC2) and is responsible for forwarding traffic to the agent. The agent can be deployed as a binary or container within the target on-premises system.

Upon starting, the agent will establish an outbound connection with the proxy and with the local server application. Once the connection is established, the proxy will use it in reverse to forward all incoming client requests through the agent to the backend application. Connections are secured by Transport Layer Security (TLS) to protect communications between client and proxy and between agent and backend application.
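The inversion can be illustrated with a toy, in-process sketch. This is not the implementation of the open-source Inverting Proxy: the /agent/poll and /agent/respond endpoints, the type names, and the single in-flight request are all assumptions made to keep the example short, and it uses plain HTTP rather than TLS. The point it shows is that the agent only ever makes outbound calls, yet client requests still reach the backend.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"log"
	"net/http"
	"net/http/httptest"
)

// pending is one client request parked at the proxy.
type pending struct {
	path  string
	reply chan []byte
}

// proxy hands client requests to an agent that polls for them.
type proxy struct {
	queue    chan *pending // client requests waiting for the agent
	inflight chan *pending // the request the agent is currently serving
}

func (p *proxy) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	switch r.URL.Path {
	case "/agent/poll": // hypothetical endpoint: agent asks for work
		req := <-p.queue
		p.inflight <- req
		fmt.Fprint(w, req.path)
	case "/agent/respond": // hypothetical endpoint: agent returns the answer
		body, _ := io.ReadAll(r.Body)
		req := <-p.inflight
		req.reply <- body
	default: // ordinary client request: wait until the agent answers
		req := &pending{path: r.URL.Path, reply: make(chan []byte, 1)}
		p.queue <- req
		w.Write(<-req.reply)
	}
}

// fetch performs a GET and returns the response body.
func fetch(url string) ([]byte, error) {
	resp, err := http.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	return io.ReadAll(resp.Body)
}

// serveOne is the agent: poll the proxy, call the backend, post the result.
// Every call it makes is outbound.
func serveOne(proxyURL, backendURL string) error {
	path, err := fetch(proxyURL + "/agent/poll")
	if err != nil {
		return err
	}
	body, err := fetch(backendURL + string(path))
	if err != nil {
		return err
	}
	resp, err := http.Post(proxyURL+"/agent/respond", "application/octet-stream", bytes.NewReader(body))
	if err != nil {
		return err
	}
	return resp.Body.Close()
}

// demo wires a backend, a proxy, and an agent together for one request.
func demo() (string, error) {
	backend := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "Hello World!")
	}))
	defer backend.Close()
	front := httptest.NewServer(&proxy{queue: make(chan *pending), inflight: make(chan *pending, 1)})
	defer front.Close()

	agentDone := make(chan error, 1)
	go func() { agentDone <- serveOne(front.URL, backend.URL) }()

	body, err := fetch(front.URL + "/hello")
	if err != nil {
		return "", err
	}
	if err := <-agentDone; err != nil {
		return "", err
	}
	return string(body), nil
}

func main() {
	out, err := demo()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(out)
}
```

A production agent would keep polling in a loop, handle many requests concurrently, and match responses to requests by ID; the sketch serves a single request to keep the control flow visible.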

This solution uses a unique backend ID and IAM user/role tags to identify different backend servers and control access to proxies. The backend ID is passed as a command-line parameter to the agent. The agent checks the IAM account or IAM role that the Amazon EC2 instance is running under for the “AllowedBackends” tag. The tag contains a comma-separated list of backend IDs that the agent is allowed to access. The connectivity is established only if the provided backend ID matches one of the values in the comma-separated list.
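The matching step itself is simple and can be sketched as follows. The function name is hypothetical, trimming whitespace around each ID is an assumption, and retrieving the tag value from IAM is left out.

```go
package main

import (
	"fmt"
	"strings"
)

// backendAllowed reports whether backendID appears in the comma-separated
// value of the "AllowedBackends" tag. An empty backend ID never matches.
func backendAllowed(tagValue, backendID string) bool {
	if backendID == "" {
		return false
	}
	for _, id := range strings.Split(tagValue, ",") {
		// Assumption: whitespace around each ID is not significant.
		if strings.TrimSpace(id) == backendID {
			return true
		}
	}
	return false
}

func main() {
	tag := "SampleBackend, Billing"
	fmt.Println(backendAllowed(tag, "SampleBackend")) // prints true
	fmt.Println(backendAllowed(tag, "Sample"))        // prints false
}
```

Comparing whole IDs rather than using a substring search avoids accepting a backend ID that is only a prefix of an allowed one.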

The solution supports native integration with AWS Cloud Map to enable automatic discoverability of remote API endpoints. Upon start and once the IAM access control checks are successfully validated, the agent can register the backend endpoints within AWS Cloud Map using a provided service name and service namespace ID.

The inverting proxy agent can collect telemetry and automatically publish it to Amazon CloudWatch using a custom namespace. This includes HTTP response codes and counts from the server application, aggregated by the backend ID.
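The aggregation can be pictured as a counter keyed by backend ID and HTTP status code. The sketch below is an assumption about the shape of such a counter, not the agent's actual code, and it stops short of publishing: the call that sends the values to CloudWatch is omitted.

```go
package main

import (
	"fmt"
	"sync"
)

// metricKey identifies one counter: a backend ID plus an HTTP status code.
type metricKey struct {
	BackendID string
	Status    int
}

// telemetry counts responses per backend and status code.
type telemetry struct {
	mu     sync.Mutex
	counts map[metricKey]int
}

func newTelemetry() *telemetry {
	return &telemetry{counts: make(map[metricKey]int)}
}

// Observe records one response from the given backend.
func (t *telemetry) Observe(backendID string, status int) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.counts[metricKey{backendID, status}]++
}

// Flush returns the counts gathered so far and resets them. A real agent
// would publish the returned values to its metrics service at this point.
func (t *telemetry) Flush() map[metricKey]int {
	t.mu.Lock()
	defer t.mu.Unlock()
	out := t.counts
	t.counts = make(map[metricKey]int)
	return out
}

func main() {
	tm := newTelemetry()
	tm.Observe("SampleBackend", 200)
	tm.Observe("SampleBackend", 200)
	tm.Observe("SampleBackend", 502)
	snapshot := tm.Flush()
	fmt.Println(snapshot[metricKey{"SampleBackend", 200}], snapshot[metricKey{"SampleBackend", 502}])
}
```

Flushing on a timer rather than per request keeps the number of publish calls independent of traffic volume.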

For a full list of options, features, and supported configurations, use the --help command-line parameter with both the agent and proxy executables.

Enabling highly resilient proxy deployment

For production scenarios that require high availability, deploy a pair of inverting proxies connecting to a pair of agents deployed on separate EC2 instances. The entire configuration is then placed behind an Application Load Balancer to provide a single point of ingress, load-balancing, and health-checking functionality. Figure 2 demonstrates a highly resilient setup for critical workloads.


Figure 2. Highly resilient deployment diagram for inverting proxy

Additionally, for real-life production workloads dealing with sensitive data, we recommend following security and resilience best practices for Amazon EC2.

Deploying and running the solution

The solution includes a simple demo Node.js server application to simulate connectivity with an inverting proxy. A restrictive security group will be used to simulate an on-premises data center.

Deployment steps:

1. Create a “backend” Amazon EC2 server using the Amazon Linux 2 free-tier AMI. Ensure that port 443 (the inbound port for the sample server application) is blocked from external access via an appropriate security group.

2. Connect to the target server using SSH and run updates:

sudo yum update -y

3. Install development tools and dependencies:

sudo yum groupinstall "Development Tools" -y

4. Install Golang:

sudo yum install golang -y

5. Install Node.js:

curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.34.0/install.sh | bash

. ~/.nvm/nvm.sh

nvm install 16

6. Clone the inverting proxy GitHub repository to the “backend” EC2 instance.

7. From the inverting-proxy folder, build the application by running:

mkdir /home/ec2-user/inverting-proxy/bin

export GOPATH=/home/ec2-user/inverting-proxy/bin

make

8. From the /simple-server folder, run the sample appTLS application in the background (see instructions below). Note: to enable SSL, you will need to generate an encryption key and certificate files (server.crt and server.key) and place them in the simple-server folder.

npm install

node appTLS &

Example app listening at https://localhost:443

Confirm that the application is running by using ps -ef | grep node:

ec2-user  1700 30669  0 19:45 pts/0    00:00:00 node appTLS

ec2-user  1708 30669  0 19:45 pts/0    00:00:00 grep --color=auto node

9. For the backend Amazon EC2 server, navigate to the Amazon EC2 security settings and create an IAM role for the instance. Keep the default permissions and add an “AllowedBackends” tag with the backend ID as the tag value (the backend ID can be any string, as long as it matches the backend ID parameter in Step 13).

10. Create a proxy Amazon EC2 server using a Linux AMI in a public subnet and, once it is online, connect to it using SSH. Copy the contents of the bin folder from the backend (agent) EC2 instance, or clone the repository and follow the build instructions above (Steps 2-7).

Note: the agent will be establishing outbound connectivity to the proxy; open the appropriate port (443) in the proxy Amazon EC2 security group. The proxy server needs to be accessible by the backend Amazon EC2 and your client workstation, as you will use your local browser to test the application.

11. To enable TLS encryption on incoming connections to the proxy, you will need to generate and upload the certificate and private key (server.crt and server.key) to the bin folder of the proxy deployment.

12. Navigate to the /bin folder of the inverting proxy and start the proxy by running:

sudo ./proxy --port 443 -tls

2021/12/19 19:56:46 Listening on [::]:443

13. Use SSH to connect to the backend Amazon EC2 server and configure the inverting proxy agent. Navigate to the /bin folder in the cloned repository and run the command below, replacing uppercase strings with the appropriate values. Note the required trailing slash after the proxy DNS URL.

./proxy-forwarding-agent -proxy https://YOUR_PROXYSERVER_PUBLIC_DNS/ -backend SampleBackend -host localhost:443 -scheme https

14. Use your local browser to navigate to the proxy server’s public DNS name (https://YOUR_PROXYSERVER_PUBLIC_DNS). You should see the following response from your sample backend application:

Hello World!
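Steps 8 and 11 call for a certificate and private key (server.crt and server.key) but do not say how to create them. For a test setup, one option is a self-signed pair generated with the Go standard library. The sketch below is an assumption about what is acceptable for a demo: the host name, key type (ECDSA P-256), and 30-day validity are illustrative choices, browsers will warn about the self-signed certificate, and production deployments should use a certificate from a trusted authority.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/tls"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"log"
	"math/big"
	"os"
	"time"
)

// selfSigned returns a PEM-encoded self-signed certificate and private key
// for the given host name. Intended for testing only.
func selfSigned(host string) (certPEM, keyPEM []byte, err error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: host},
		DNSNames:     []string{host},
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(30 * 24 * time.Hour),
		KeyUsage:     x509.KeyUsageDigitalSignature,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
	}
	// Passing the template as its own parent makes the certificate self-signed.
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, nil, err
	}
	keyDER, err := x509.MarshalPKCS8PrivateKey(key)
	if err != nil {
		return nil, nil, err
	}
	certPEM = pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der})
	keyPEM = pem.EncodeToMemory(&pem.Block{Type: "PRIVATE KEY", Bytes: keyDER})
	return certPEM, keyPEM, nil
}

// checkPair verifies that the certificate and key belong together.
func checkPair(certPEM, keyPEM []byte) error {
	_, err := tls.X509KeyPair(certPEM, keyPEM)
	return err
}

func main() {
	certPEM, keyPEM, err := selfSigned("localhost")
	if err != nil {
		log.Fatal(err)
	}
	if err := checkPair(certPEM, keyPEM); err != nil {
		log.Fatal(err)
	}
	if err := os.WriteFile("server.crt", certPEM, 0o644); err != nil {
		log.Fatal(err)
	}
	// The private key is written with owner-only permissions.
	if err := os.WriteFile("server.key", keyPEM, 0o600); err != nil {
		log.Fatal(err)
	}
}
```

For the proxy, the host name passed to selfSigned would be the proxy server's public DNS name rather than localhost.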

Conclusion

Inverting proxy is a flexible, lightweight pattern that can be used for routing API traffic in non-trivial hybrid and multi-cloud scenarios that do not require low-latency connectivity. It can also be used for securing existing endpoints, refactoring legacy applications, and enabling visibility into legacy backends. The sample solution we have detailed can be customized to create unique implementations and provides out-of-the-box baseline integration with multiple AWS services.

New – Amazon EC2 C7g Instances, Powered by AWS Graviton3 Processors

Post Syndicated from Sébastien Stormacq original https://aws.amazon.com/blogs/aws/new-amazon-ec2-c7g-instances-powered-by-aws-graviton3-processors/

I am excited to announce that Amazon Elastic Compute Cloud (Amazon EC2) C7g instances, powered by the latest AWS Graviton3 processors and available in preview since re:Invent last year, are now available for all.

Let’s decompose the name C7g: the “C” instance family is designed for compute-intensive workloads. This is the 7th generation of this instance family. And the “g” means it is based on AWS Graviton, the silicon designed by AWS. These instances are the first instances to be powered by the latest generation of AWS Graviton, the Graviton3 processors.

As you bring more diverse workloads to the cloud, and as your compute, storage, and networking demands increase at a rapid pace, you are asking us to push the price performance boundary even further so that you can accelerate your migration to the cloud and optimize your costs. Additionally, you are looking for more energy-efficient compute options to help you reduce your carbon footprint and achieve your sustainability goals. We do this by working back from your requests, and innovating at a rapid pace across all levels of the AWS infrastructure. Our Graviton chips offer better performance at lower cost along with enhanced capabilities. For example, AWS Graviton3 processors offer you enhanced security with always-on memory encryption, dedicated caches for every vCPU, and support for pointer authentication.

Let’s illustrate this with numbers. When we launched Graviton2-based instances, they provided up to 40 percent better price/performance for a wide variety of workloads over comparable fifth-generation x86-based instances. We now have 12 instance families (M6g, M6gd, C6g, C6gd, C6gn, R6g, R6gd, T4g, X2gd, Im4gn, Is4gen, and G5g) that are powered by AWS Graviton2 processors that provide significant price performance benefits for a wide range of workloads. In 2021, we saw tens of thousands of AWS customers take advantage of this innovation by using Graviton2-based EC2 instances.

Our next generation, Graviton3 processors, deliver up to 25 percent higher performance, up to 2x higher floating-point performance, and 50 percent faster memory access based on leading-edge DDR5 memory technology compared with Graviton2 processors.

Graviton3 also uses up to 60 percent less energy for the same performance as comparable EC2 instances, which helps you reduce your carbon footprint.

Snap Inc, known for its popular social media services such as Snapchat and Bitmoji, adopted AWS Graviton2-based instances to optimize their price performance on Amazon EC2. Aaron Sheldon, software engineer at Snap, told us: “We trialed the new AWS Graviton3-based Amazon EC2 C7g instances and found that they provide significant performance improvements on real workloads compared to previous generation C6g instances. We are excited to migrate our Graviton2-based workloads to Graviton3, including messaging, storage, and friend graph workloads.”

The C7g instances are available in eight sizes with 1, 2, 4, 8, 16, 32, 48, and 64 vCPUs. C7g instances support configurations up to 128 GiB of memory, 30 Gbps of network performance, and 20 Gbps of Amazon Elastic Block Store (EBS) performance. These instances are powered by the AWS Nitro System, a combination of dedicated hardware and a lightweight hypervisor.

The following table summarizes the key characteristics of each instance type in this family.

| Instance Name | vCPUs | Memory | Network Bandwidth | EBS Bandwidth |
|---|---|---|---|---|
| c7g.medium | 1 | 2 GiB | up to 12.5 Gbps | up to 10 Gbps |
| c7g.large | 2 | 4 GiB | up to 12.5 Gbps | up to 10 Gbps |
| c7g.xlarge | 4 | 8 GiB | up to 12.5 Gbps | up to 10 Gbps |
| c7g.2xlarge | 8 | 16 GiB | up to 15 Gbps | up to 10 Gbps |
| c7g.4xlarge | 16 | 32 GiB | up to 15 Gbps | up to 10 Gbps |
| c7g.8xlarge | 32 | 64 GiB | 15 Gbps | 10 Gbps |
| c7g.12xlarge | 48 | 96 GiB | 22.5 Gbps | 15 Gbps |
| c7g.16xlarge | 64 | 128 GiB | 30 Gbps | 20 Gbps |

C7g instances are initially available in US East (N. Virginia) and US West (Oregon) AWS Regions; other Regions will be added shortly after launch.

As usual, you can purchase C7g capacity on demand, as Reserved Instances, or as Spot Instances, and use your Savings Plans. The pricing details are available on the EC2 pricing page.

I have the chance to talk with AWS customers on a daily basis, and many of my discussions are around price performance and the sustainability of their workloads. With more than 500 instance types to choose from, one question I often receive is: what are the workloads that would benefit from C7g?

You will find that C7g instances provide the best price performance within their instance families for a broad spectrum of compute-intensive workloads, including application servers, microservices, high-performance computing, electronic design automation, gaming, media encoding, and CPU-based ML inference. These instances are ideal for all Linux-based workloads, including containerized and microservice-based applications built using Amazon Elastic Kubernetes Service (EKS), Amazon Elastic Container Service (Amazon ECS), Amazon Elastic Container Registry, Kubernetes, and Docker, and written in popular programming languages such as C/C++, Rust, Go, Java, Python, .NET Core, Node.js, Ruby, and PHP.

The next question I receive is: given that Graviton instances are based on Arm architecture, how difficult is it to migrate from x86?

Graviton3 instances are supported by a broad choice of operating systems, independent software vendors, container services, agents, and developer tools, enabling you to migrate your workloads with minimal effort.

Applications and scripts written in high-level programming languages such as Python, Node.js, Ruby, Java, or PHP will typically just require a redeployment. Applications written in lower-level programming languages such as C/C++, Rust, or Go will require a re-compilation.

But you don’t always need to migrate your applications. Several managed services are based on Graviton already, such as Amazon ElastiCache, Amazon EKS, Amazon ECS, Amazon Relational Database Service (RDS), Amazon EMR, Amazon Aurora, and Amazon OpenSearch Service, and your application can benefit from Graviton with minimal effort. A French customer told me recently they migrated a significant portion of their Amazon EMR clusters to Graviton by changing just one line in their Terraform scripts; all the rest worked as-is.

For those of you building with serverless, we have also released Graviton support for AWS Fargate and AWS Lambda, extending the price, efficiency, and performance benefits of Graviton to serverless workloads. Lambda functions using Graviton2 can see up to 34 percent better price/performance.

Reducing the carbon footprint of your organization is also of paramount importance. Reducing the carbon footprint of cloud-based workloads is a shared responsibility between you and us. We do our part by innovating at all levels: from the materials used to build our facilities, the usage of water for cooling, and the production of renewable energy, down to inventing new silicon that is more energy efficient. To help you meet your own sustainability goals, we added a sustainability pillar to the AWS Well-Architected framework, and we released the Customer Carbon Footprint tool. Graviton3 fits into that context. It uses up to 60 percent less energy for the same performance as comparable EC2 instances.

We do our part in this shared responsibility model, and now, it is your turn. You can use our innovations and tools to help you optimize your workloads and only use the resources you need. Take the occasion to write clever code that uses fewer CPU cycles, less storage, or less network bandwidth. And be sure to select energy-efficient options, such as Graviton3-based instance types or managed services, when deploying your code.

To help you get started migrating your applications to Graviton instance types today, we curated this list of technical resources. Have a look at it. To learn more about Graviton-based instances, visit the Graviton page or the C7g page.

If you’d like to get started with Graviton-based instances for free, we also just reintroduced the free trial on T4g.small instances for up to 750 hours/month until the end of this year (December 31, 2022).

And now, go build 😉

— seb

AWS Week In Review – May 23, 2022

Post Syndicated from Sébastien Stormacq original https://aws.amazon.com/blogs/aws/aws-week-in-review-may-27-2022/

This post is part of our Week in Review series. Check back each week for a quick roundup of interesting news and announcements from AWS!

This is the right place to quickly learn about last week’s AWS news in about five minutes or less. This week, I have collected a couple of news items that might be of interest to you: the IT professionals, developers, system administrators, and builders of any type who have their hands on the AWS console or the CLI, or who are writing code.

Last Week’s Launches
The launches that caught my attention last week are the following:

EC2 now supports NitroTPM and SecureBoot – A Trusted Platform Module is often a discrete chip in a computer where you can store secrets and release them to the operating system only when the system is in a known good state. You typically use TPM modules to store operating-system-level volume encryption keys, such as the ones used by BitLocker on Windows or LUKS. NitroTPM is a virtual TPM module available on selected instance families that allows you to deploy your workloads depending on TPM functionalities on EC2 instances.

Amazon EC2 Auto Scaling now backfills predictive scaling forecasts so you can quickly validate forecast accuracy. Auto Scaling Predictive Scaling is a capability of Auto Scaling that allows you to scale your fleet in and out based on observed usage patterns. It uses AI/ML to predict when your fleet needs more or less capacity. It allows you to scale a fleet in advance of the scaling event and have the fleet prepared at peak times. The new backfill shows you how predictive scaling would have scaled your fleet during the last 14 days. This allows you to quickly decide if the predictive scaling policy is accurate for your applications by comparing the demand and capacity forecasts against actual demand immediately after you create a predictive scaling policy.

AWS Backup adds support for two new managed file systems, Amazon FSx for OpenZFS and Amazon FSx for NetApp ONTAP. These additions help you meet your centralized data protection and regulatory compliance needs. You can now use AWS Backup’s policy-based capabilities to centrally protect Amazon FSx for NetApp ONTAP or Amazon FSx for OpenZFS, along with the other AWS services for storage, database, and compute that AWS Backup supports.

AWS App Mesh now supports IPv6 – AWS App Mesh is a service mesh that provides application-level networking to make it easy for your services to communicate with each other across multiple types of compute infrastructure. The new support for IPv6 allows you to support workloads running in IPv6 networks and to invoke App Mesh APIs over IPv6. This helps you meet IPv6 compliance requirements, and removes the need for complex networking configuration to handle address translation between IPv4 and IPv6.

Amazon Chime SDK now supports video background replacement and blur on iOS and Android. When you want to integrate audio and video call capabilities in your mobile applications, the Chime SDK is the easiest way to get started. It provides an easy-to-use API that uses the scalable and robust Amazon Chime backend to power your communications. For example, Slack uses Chime as the backend for the communications in their apps. The Chime SDK client libraries for iOS and Android now include video background replacement and blur, which developers can use to reduce visual distractions and help increase visual privacy for mobile users.

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

Other AWS News
Some other updates and news that you may have missed:

Amazon Redshift: Ten years of continuous reinvention. This is an Amazon Redshift research paper that will be presented at a leading international forum for database researchers. The authors reflect on how far the first petabyte-scale cloud data warehouse has advanced since it was announced ten years ago.

Improve Your Security at the Edge with AWS IoT Services is a new blog post on the IoT channel. We understand the risks associated with operating at the edge and that you need additional capabilities to ensure that your data is protected. AWS IoT services can help you with end-to-end data protection, device security, and device identification to create the foundation of an expanded information security model and confidently operate at the edge.

AWS Open Source News and Updates – Ricardo Sueiras, my colleague from the AWS Developer Relations team, runs this newsletter. It brings you all the latest open-source projects, posts, and more. Read edition #113 here.

Upcoming AWS Events
CDK Day, on May 26, is a one-day, fully virtual event dedicated to the AWS Cloud Development Kit. With four versions of the CDK released (AWS, Terraform, CDK8s, and Projen), we thought the CDK deserves its own full-fledged conference. We will take one day and showcase the brightest and best of CDK from across the whole product family. Let’s talk serverless, Kubernetes, and multi-cloud all on the same day! CDK Day will take place on May 26, 2022, and will be live-streamed to our YouTube channel. Book your ticket now, it’s free.

The AWS Summit season is mostly over in Europe, but there are upcoming Summits in North America and the Asia Pacific Regions. Here are some virtual and in-person Summits that might be close to you:

More to come in July, August, and September.

You can register for re:MARS to get fresh ideas on topics such as machine learning, automation, robotics, and space. The conference will be in person in Las Vegas, June 21–24.

That’s all for this week. Check back next Monday for another Week in Review!

— seb

Amazon EC2 Now Supports NitroTPM and UEFI Secure Boot

Post Syndicated from Sébastien Stormacq original https://aws.amazon.com/blogs/aws/amazon-ec2-now-supports-nitrotpm-and-uefi-secure-boot/

In computing, Trusted Platform Module (TPM) technology is designed to provide hardware-based, security-related functions. A TPM chip is a secure crypto-processor that is designed to carry out cryptographic operations. There are three key advantages of using TPM technology. First, you can generate, store, and control access to encryption keys outside of the operating system. Second, you can use a TPM module to perform platform device authentication by using the TPM’s unique RSA key, which is burned into it. And third, it may help to ensure platform integrity by taking and storing security measurements.

During re:Invent 2021, we announced the future availability of NitroTPM, a virtual TPM 2.0-compliant module for your Amazon Elastic Compute Cloud (Amazon EC2) instances, based on AWS Nitro System. We also announced Unified Extensible Firmware Interface (UEFI) Secure Boot availability for EC2.

I am happy to announce you can start to use both NitroTPM and Secure Boot today in all AWS Regions outside of China, including the AWS GovCloud (US) Regions.

You can use NitroTPM to store secrets, such as disk encryption keys or SSH keys, outside of the EC2 instance memory, protecting them from applications running on the instance. NitroTPM leverages the isolation and security properties of the Nitro System to ensure only the instance can access these secrets. It provides the same functions as a physical or discrete TPM. NitroTPM follows the ISO TPM 2.0 specification, allowing you to migrate existing on-premises workloads that leverage TPMs to EC2.

The availability of NitroTPM unlocks a couple of use cases to strengthen the security posture of your EC2 instances, such as secured key storage and access for OS-level volume encryption or platform attestation for measured boot or identity access.

Secured Key Storage and Access
NitroTPM can create and store keys that are wrapped and tied to certain platform measurements (known as Platform Configuration Registers – PCR). NitroTPM unwraps the key only when those platform measurements have the same value as they had at the moment the key was created. This process is referred to as “sealing the key to the TPM.” Decrypting the key is called unsealing. NitroTPM only unseals keys when the instance and the OS are in a known good state. Operating systems compliant with TPM 2.0 specifications use this mechanism to securely unseal volume encryption keys. You can use NitroTPM to store encryption keys for BitLocker on Microsoft Windows. Linux Unified Key Setup (LUKS) or dm-verity on Linux are examples of OS-level applications that can leverage NitroTPM too.

Platform Attestation
Another key feature that NitroTPM provides is “measured boot,” a process where the bootloader and operating system extend PCRs with measurements of the software or configuration that they load during the boot process. This improves security in the event that, for example, a malicious program overwrites part of your kernel with malware. With measured boot, you can also obtain signed PCR values from the TPM and use them to prove to remote servers that the boot state is valid, enabling remote attestation support.
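
The extend operation behind measured boot, and the reason a sealed key is tied to a specific boot state, can be illustrated with a few lines of Python. This is a toy sketch and not NitroTPM code: it assumes a single SHA-256 PCR that starts at all zeros, and the boot-stage names are made up for illustration. It shows that the final PCR value depends on every measurement and on their order, so any change in the boot chain produces a value that no longer matches the one a key was sealed to.

```python
import hashlib

PCR_SIZE = 32  # bytes in a SHA-256 PCR


def extend(pcr: bytes, component: bytes) -> bytes:
    """Return the new PCR value after measuring one component.

    TPM 2.0 extend: new_pcr = H(old_pcr || H(component)).
    """
    measurement = hashlib.sha256(component).digest()
    return hashlib.sha256(pcr + measurement).digest()


def measure_boot(components: list) -> bytes:
    """Fold a boot chain into a single PCR value, starting from zeros."""
    pcr = bytes(PCR_SIZE)
    for component in components:
        pcr = extend(pcr, component)
    return pcr


# Hypothetical boot chain; the names are placeholders, not real images.
good_chain = [b"bootloader-v1", b"kernel-v1", b"initrd-v1"]
tampered_chain = [b"bootloader-v1", b"kernel-v1-patched", b"initrd-v1"]

sealed_to = measure_boot(good_chain)

# A key sealed to `sealed_to` would only be released when the values match.
print(measure_boot(good_chain) == sealed_to)      # True
print(measure_boot(tampered_chain) == sealed_to)  # False
```

A real TPM keeps several PCRs and hash banks and signs the values it reports; the sketch only models the chaining.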

How to Use NitroTPM
There are three prerequisites to start using NitroTPM:

  • You must use an operating system that has Command Response Buffer (CRB) drivers for TPM 2.0, such as recent versions of Windows or Linux. We tested the following OSes: Red Hat Enterprise Linux 8, SUSE Linux Enterprise Server 15, Ubuntu 18.04, Ubuntu 20.04, and Windows Server 2016, 2019, and 2022.
  • You must deploy it on a Nitro-based EC2 instance. At the moment, we support all Intel and AMD instance types that support UEFI boot mode. Graviton1, Graviton2, Xen-based, Mac, and bare-metal instances are not supported.
  • Note that NitroTPM does not work today with some additional instance types, but support for these instance types will come soon after the launch. The list is: C6a, C6i, G4ad, G4dn, G5, Hpc6a, I4i, M6a, M6i, P3dn, R6i, T3, T3a, U-12tb1, U-3tb1, U-6tb1, U-9tb1, X2idn, X2iedn, and X2iezn.
  • When you create your own AMI, it must be flagged to use UEFI as boot mode and NitroTPM. Windows AMIs provided by AWS are flagged by default. Linux-based AMIs are not flagged by default; you must create your own.
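
As a quick pre-flight check against the list above, the following Python sketch tests whether an instance type belongs to one of the families excluded at launch. The family names are copied from this post; the helper names and the sample instance types are only illustrative, and the list will go stale as support is added.

```python
# Families listed in this post as not yet supported at launch.
NOT_YET_SUPPORTED = {
    "c6a", "c6i", "g4ad", "g4dn", "g5", "hpc6a", "i4i", "m6a", "m6i",
    "p3dn", "r6i", "t3", "t3a", "u-12tb1", "u-3tb1", "u-6tb1", "u-9tb1",
    "x2idn", "x2iedn", "x2iezn",
}


def family_of(instance_type: str) -> str:
    """Return the family part of an instance type, e.g. 'm5' for 'm5.large'."""
    return instance_type.split(".", 1)[0].lower()


def excluded_at_launch(instance_type: str) -> bool:
    """True when the post lists the family as unsupported at launch.

    A False result is not a guarantee of support: the instance must also be
    Nitro-based, Intel or AMD, not bare metal, and support UEFI boot mode.
    """
    return family_of(instance_type) in NOT_YET_SUPPORTED


for candidate in ("m5.large", "t3.micro", "u-6tb1.112xlarge"):
    print(candidate, excluded_at_launch(candidate))
```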

How to Create an AMI with TPM Enabled
AWS provides AMIs for multiple versions of Windows with TPM enabled. I can verify if an AMI supports NitroTPM using the DescribeImages API call. For example:

aws ec2 describe-images --image-ids ami-0123456789

When NitroTPM is enabled for the AMI, "TpmSupport": "v2.0" appears in the output, such as in the following example.

{
   "Images": [
      {
         ...
         "BootMode": "uefi",
         "TpmSupport": "v2.0"
      }
   ]
}
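
When checking several AMIs at once, the same test can be scripted. The following Python sketch parses the JSON printed by the command above and reports which images carry both flags. It is a minimal example under stated assumptions: the sample document is trimmed to the relevant fields, the second AMI ID is made up for contrast, and the check is deliberately strict (it accepts only "uefi" and "v2.0").

```python
import json


def nitrotpm_ready(describe_images_output: str) -> dict:
    """Map each ImageId to True when the AMI is flagged for UEFI and TPM 2.0."""
    images = json.loads(describe_images_output).get("Images", [])
    return {
        image.get("ImageId", "unknown"): (
            image.get("BootMode") == "uefi" and image.get("TpmSupport") == "v2.0"
        )
        for image in images
    }


# Sample output trimmed to the relevant fields; the second image is a
# made-up AMI without the TPM flag, for contrast.
sample = """
{
  "Images": [
    {"ImageId": "ami-0123456789", "BootMode": "uefi", "TpmSupport": "v2.0"},
    {"ImageId": "ami-0abcdef", "BootMode": "uefi"}
  ]
}
"""

print(nitrotpm_ready(sample))
```

In practice, the input would be the captured output of the describe-images command rather than an inline string.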

I may also query for tpmSupport using the DescribeImageAttribute API call.

When creating my own AMI, I may enable TPM support using the RegisterImage API call, by setting boot-mode to uefi and tpm-support to v2.0.

aws ec2 register-image             \
       --region us-east-1           \
       --name my-image              \
       --boot-mode uefi             \
       --architecture x86_64        \
       --root-device-name /dev/xvda \
       --block-device-mappings "DeviceName=/dev/xvda,Ebs={SnapshotId=snap-0123456789example}" "DeviceName=/dev/xvdf,Ebs={VolumeSize=10}" \
       --tpm-support v2.0

Now that you know how to create an AMI with TPM enabled, let’s create a Windows instance and configure BitLocker to encrypt the root volume.

A Walk Through: Using NitroTPM with BitLocker
BitLocker automatically detects and uses NitroTPM when available. There is no extra configuration step beyond what you do today to install and configure BitLocker. Upon installation, BitLocker recognizes the TPM module and starts to use it automatically.

Let’s go through the installation steps. I start the instance as usual, using an AMI that has both uefi and TPM v2.0 enabled. I make sure I use a supported version of Windows. Here I am using Windows Server 2022 04.13.

Once connected to the instance, I verify that Windows recognizes the TPM module. To do so, I launch the tpm.msc application, and the Trusted Platform Module (TPM) Management window opens. When everything goes well, it shows Manufacturer Name: AMZN under TPM Manufacturer Information.

Next, I install BitLocker.

I open the servermanager.exe application and select Manage at the top right of the screen. In the dropdown menu, I select Add Roles and Features.

I select Role-based or feature-based installation from the wizard.

I select Next multiple times until I reach the Features section. I select BitLocker Drive Encryption, and I select Install.

I wait a bit for the installation and then restart the server at the end of the installation.

After reboot, I reconnect to the server and open the control panel. I select BitLocker Drive Encryption under the System and Security section.

I select Turn on BitLocker, and then I select Next and wait for the verification of the system and the time it takes to encrypt my volume’s data.

Just for extra safety, I decide to reboot at the end of the encryption. It is not strictly necessary. But I encrypted the root volume of the machine (C:) so I am wondering if the machine can still boot.

After the reboot, I reconnect to the instance, and I verify the encryption status.

I also verify BitLocker’s status and key protection method enabled on the volume. To do so, I open PowerShell and type:

manage-bde -protectors -get C:

I can see on the resulting screen that the C: volume encryption key is coming from the NitroTPM module and the instance used Secure Boot for integrity validation. I can also view the recovery key.

I left the recovery key in plain text in the previous screenshot because the instance and volume I used for this demo will not exist anymore by the time you read this. Do not share your recovery keys publicly otherwise.

Important Considerations
Now that I have shown how to use NitroTPM to protect BitLocker’s volume encryption key, I’ll go through a couple of additional considerations:

  • You can only enable an AMI for NitroTPM support by using the RegisterImage API via the AWS CLI and not via the Amazon EC2 console.
  • NitroTPM support is enabled by setting a flag on an AMI. After you launch an instance with the AMI, you can’t modify the attributes on the instance. The ModifyInstanceAttribute API is not supported on running or stopped instances.
  • Importing or exporting EC2 instances with NitroTPM, such as with the ImportImage API, will omit NitroTPM data.
  • The NitroTPM state is not included in EBS snapshots. You can only restore an EBS snapshot to the same EC2 instance.
  • BitLocker volumes that are encrypted with TPM-based keys cannot be restored on a different instance. It is possible to change the instance type (stop, change instance type, and restart it).

At the moment, we support all Intel and AMD instance types that support UEFI boot mode. Graviton1, Graviton2, Xen-based, Mac, and bare-metal instances are not supported. Some additional instance types are not supported at launch (I shared the exact list previously). We will add support for these soon after launch.

There is no additional cost for using NitroTPM. It is available today in all AWS Regions, including the AWS GovCloud (US) Regions, except in China.

And now, go build 😉

— seb

Amazon EC2 DL1 instances Deep Dive

Post Syndicated from Sheila Busser original https://aws.amazon.com/blogs/compute/amazon-ec2-dl1-instances-deep-dive/

This post is written by Amr Ragab, Principal Solutions Architect, Amazon EC2.

AWS is excited to announce that the new Amazon Elastic Compute Cloud (Amazon EC2) DL1 instances are now generally available in US East (N. Virginia) and US West (Oregon). DL1 provides up to 40% better price performance for training deep learning models as compared to current generation GPU-based EC2 instances. The dl1.24xlarge instance type features eight Intel-Habana Gaudi accelerators, which are custom-built to train deep learning models. Each Gaudi accelerator has 32 GB of high bandwidth memory (HBM2) and a peer-to-peer bidirectional bandwidth of 100 Gbps RoCE, for a total bidirectional interconnect bandwidth of 700 Gbps per card. Further instance specifications are as follows:

Instance Size: dl1.24xlarge
vCPU: 96
Instance Memory (GiB): 768
Gaudi Accelerators: 8
Network Bandwidth (Gbps): 4×100
Total Accelerator Interconnect (Gbps): 700
Local Instance Storage: 4x1TB NVMe
EBS Bandwidth (Gbps): 19

Instance Architecture

System architecture of the amazon ec2 dl1 instances.

As the preceding instance architecture indicates, pairs of Gaudi accelerators (e.g., Gaudi0 and Gaudi1) are attached directly through a PCIe Gen3x16 link. Additionally, peer-to-peer networking via 100 Gbps RoCEv2 links – with seven active links per card – provides a torus configuration with a total of 700 Gbps of interconnect bandwidth per card. This topology is a separate interconnect outside of the two NUMA domains. Furthermore, the instance supports four EFA ENIs and 4x1TB of local NVMe SSD storage. We will provide a peer-direct driver over EFA, which will let you utilize high throughput, low latency peer-direct networking between accelerators across multiple instances to efficiently scale multi-node distributed training workloads.

Quick Start

Quickly get started with DL1 and the SynapseAI SDK through the following options:

1) Habana Deep Learning AMIs provided by AWS.

2) AWS Marketplace AMIs provided by Habana.

3) Using Packer to build a custom Amazon Machine Image (AMI) provided by this GitHub repo. This repo also provides build scripts to create Amazon Elastic Container Service (Amazon ECS) and Amazon Elastic Kubernetes Service (Amazon EKS) AMIs.

After selecting an AMI, launch a dl1.24xlarge instance in either us-east-1 or us-west-2. To help identify in which availability zone(s) dl1.24xlarge is available, run the following command:

aws ec2 describe-instance-type-offerings \
--location-type availability-zone \
--filters Name=instance-type,Values=dl1.24xlarge \
--region us-west-2 \
--output table

Once launched, you can connect to the instance over SSH (with the correct security group attached).

Habana Collectives Communication Library (HCL/HCCL)

As part of the Habana SynapseAI SDK, Habana Gaudi accelerators use the HCCL library to handle the collectives between HPUs. Get more information on HCCL here. On DL1, the HCL tests confirm close to 700 Gbps (689 Gbps) per card for the collectives tested as follows.

You can confirm these tests by cloning the GitHub repo here.

Habana DL1 HCCL tests.
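
As a quick sanity check on the figures above, the per-card peak follows from seven active 100 Gbps links, and the measured 689 Gbps can be expressed as a fraction of that peak. The short Python sketch below only redoes this arithmetic with the numbers quoted in this post; it does not run or reproduce the HCL tests.

```python
LINKS_PER_CARD = 7      # active RoCEv2 links per Gaudi card
LINK_GBPS = 100         # bidirectional bandwidth per link
MEASURED_GBPS = 689     # per-card figure reported from the HCL tests

peak_gbps = LINKS_PER_CARD * LINK_GBPS
efficiency = MEASURED_GBPS / peak_gbps

print(f"peak per card: {peak_gbps} Gbps")      # peak per card: 700 Gbps
print(f"measured / peak: {efficiency:.1%}")    # measured / peak: 98.4%
```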

Amazon EKS Quick Start

Support for DL1 on Amazon EKS is available today with Amazon EKS versions > 1.19. The following is a quick start to get up and running with DL1.

The following dependencies will be needed:

eksctl – You need version 0.70.0+ of eksctl.
kubectl – You use Kubernetes version 1.20 in this post.

Create EKS cluster:

eksctl create cluster --region us-east-1 --without-nodegroup \
--vpc-public-subnets subnet-037d8e430963c2d3e,subnet-0abe898359a7d43e9

Nodegroup configuration – save the following codeblock to a file called dl1-managed-ng.yaml. Replace the AMI ID in the code block with the AMI created earlier.

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: fabulous-rainbow-1635807811
  region: us-west-2

vpc:
  id: vpc-34f1894c
  subnets:
    public:
      endpoint-one:
        id: subnet-4532e73d
      endpoint-two:
        id: subnet-8f8b7dc5

managedNodeGroups:
  - name: dl1-ng-1d
    instanceType: dl1.24xlarge
    volumeSize: 200
    instancePrefix: dl1-ng-1d-worker
    ami: ami-072c632cbbc2255b3
    iam:
      withAddonPolicies:
        imageBuilder: true
        autoScaler: true
        ebs: true
        fsx: true
        cloudWatch: true
    ssh:
      allow: true
      publicKeyName: amrragab-aws
    subnets:
    - endpoint-one
    minSize: 1
    desiredCapacity: 1
    maxSize: 4
    overrideBootstrapCommand: |
      #!/bin/bash
      /etc/eks/bootstrap.sh fabulous-rainbow-1635807811

Create the managed nodegroup with the following command:

eksctl create nodegroup -f dl1-managed-ng.yaml

Once the nodegroup has been created, you must apply the habana-k8s-device-plugin:

kubectl create -f https://vault.habana.ai/artifactory/docker-k8s-device-plugin/habana-k8s-device-plugin.yaml

Once completed, you should see the Gaudi devices as an allocatable resource in your EKS cluster, presenting 8 Gaudi accelerators per DL1 node in the cluster.

Allocatable:

attachable-volumes-aws-ebs: 39
cpu:                        95690m
ephemeral-storage:          192188443124
habana.ai/gaudi:            8
hugepages-1Gi:              0
hugepages-2Mi:              30000Mi
memory:                     753055132Ki
pods:                       15

Example Distributed Machine Learning (ML) Workloads

The following tables are examples of Mixed Precision/FP32 training results comparing DL1 to the common GPU instances used for ML training.

Model: ResNet50
Framework: TensorFlow 2
Dataset: Imagenet2012
GitHub: https://github.com/HabanaAI/Model-References/tree/master/TensorFlow/computer_vision/Resnets/resnet_keras

Instance Type | Batch Size | Mixed Precision Training Throughput (images/sec)
--- | --- | ---
8x Gaudi – 32 GB (dl1.24xlarge) | 256 | 13036
8x A100 – 40 GB (p4d.24xlarge) | 256 | 17921
8x V100 – 32 GB (p3dn.24xlarge) | 256 | 9685
8x V100 – 16 GB (p3.16xlarge) | 256 | 8945

Model: Bert Large – Pretraining
Framework: PyTorch 1.9
Dataset: Wikipedia/BooksCorpus
GitHub: https://github.com/HabanaAI/Model-References/tree/master/PyTorch/nlp/bert

Instance Type | Batch Size @128 Sequence Length | Mixed Precision Training Throughput (seq/sec)
--- | --- | ---
8x Gaudi – 32 GB (dl1.24xlarge) | 256 | 1318
8x A100 – 40 GB (p4d.24xlarge) | 8192 | 2979
8x V100 – 32 GB (p3dn.24xlarge) | 8192 | 1458
8x V100 – 16 GB (p3.16xlarge) | 8192 | 1013
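
To put the two tables side by side, the Python sketch below divides the DL1 throughput by that of each GPU instance. It uses only the figures listed above and compares raw throughput; price performance would also need per-instance pricing, which these tables do not include. Note as well that the BERT rows use different batch sizes (256 on DL1, 8192 on the GPU instances), so those ratios are not a like-for-like comparison.

```python
# Throughput figures copied from the two tables above.
RESNET50_IMAGES_PER_SEC = {
    "dl1.24xlarge": 13036,
    "p4d.24xlarge": 17921,
    "p3dn.24xlarge": 9685,
    "p3.16xlarge": 8945,
}
BERT_LARGE_SEQ_PER_SEC = {
    "dl1.24xlarge": 1318,
    "p4d.24xlarge": 2979,
    "p3dn.24xlarge": 1458,
    "p3.16xlarge": 1013,
}


def dl1_ratio(results: dict, baseline: str = "dl1.24xlarge") -> dict:
    """Return baseline throughput divided by each other instance's throughput."""
    base = results[baseline]
    return {
        name: round(base / value, 2)
        for name, value in results.items()
        if name != baseline
    }


print("ResNet50:  ", dl1_ratio(RESNET50_IMAGES_PER_SEC))
print("BERT Large:", dl1_ratio(BERT_LARGE_SEQ_PER_SEC))
```

A ratio above 1 means DL1 delivered more throughput than that instance in the listed run; a ratio below 1 means less.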

You can find a more comprehensive list of ML models supported with performance data here. Support for containers with TensorFlow and PyTorch is also available. Furthermore, you can stay up-to-date with the operator support for TensorFlow and PyTorch.

Conclusion

We are excited to innovate on behalf of our customers and provide a diverse choice in ML accelerators with DL1 instances. The DL1 instances powered by Gaudi accelerators can provide up to 40% better price performance for training deep learning models as compared to current generation GPU-based EC2 instances. DL1 instances use the Habana SynapseAI SDK with framework support in PyTorch and TensorFlow. Support for EFA with peer-direct networking between HPUs across nodes will be added in the future. Now it’s time to go power up your ML workloads with Amazon EC2 DL1 instances.

AWS Week in Review – May 2, 2022

Post Syndicated from Steve Roberts original https://aws.amazon.com/blogs/aws/aws-week-in-review-may-2-2022/

This post is part of our Week in Review series. Check back each week for a quick roundup of interesting news and announcements from AWS!

Wow, May already! Here in the Pacific Northwest, spring is in full bloom and nature has emerged completely from her winter slumbers. It feels that way here at AWS, too, with a burst of new releases and updates and our in-person summits and other events now in full flow. Two weeks ago, we had the San Francisco summit; last week, we held the London summit and also our .NET Enterprise Developer Day virtual event in EMEA. This week we have the Madrid summit, with more summits and events to come in the weeks ahead. Be sure to check the events section at the end of this post for a summary and registration links.

Last week’s launches
Here are some of the launches and updates last week that caught my eye:

If you’re looking to reduce or eliminate the operational overhead of managing your Apache Kafka clusters, then the general availability of Amazon Managed Streaming for Apache Kafka (MSK) Serverless will be of interest. Starting with the original release of Amazon MSK in 2019, the work needed to set up, scale, and manage Apache Kafka has been reduced, requiring just minutes to create a cluster. With Amazon MSK Serverless, the provisioning, scaling, and management of the required resources is automated, eliminating the undifferentiated heavy lifting. As my colleague Marcia notes in her blog post, Amazon MSK Serverless is a perfect solution when getting started with a new Apache Kafka workload where you don’t know how much capacity you will need, or when your applications produce unpredictable or highly variable throughput and you don’t want to pay for idle capacity.

Another week, another set of Amazon Elastic Compute Cloud (Amazon EC2) instances! This time around, it’s new storage-optimized I4i instances based on the latest generation Intel Xeon Scalable (Ice Lake) Processors. These new instances are ideal for workloads that need minimal latency, and fast access to data held on local storage. Examples of these workloads include transactional databases such as MySQL, Oracle DB, and Microsoft SQL Server, as well as NoSQL databases including MongoDB, Couchbase, Aerospike, and Redis. Additionally, workloads that benefit from very high compute performance per TB of storage (for example, data analytics and search engines) are also an ideal target for these instance types, which offer up to 30 TB of AWS Nitro SSD storage.

Deploying AWS compute and storage services within telecommunications providers’ data centers, at the edge of the 5G networks, opens up interesting new possibilities for applications requiring end-to-end low latency (for example, delivery of high-resolution and high-fidelity live video streaming, and improved augmented/virtual reality (AR/VR) experiences). The first AWS Wavelength deployments started in the US in 2020, and have expanded to additional countries since. This week we announced the opening of the first Canadian AWS Wavelength zone, in Toronto.

Other AWS News
Some other launches and news items you may have missed:

Amazon Relational Database Service (RDS) had a busy week. I don’t have room to list them all, so below is just a subset of updates!

  • The addition of IPv6 support enables customers to simplify their networking stack. The increase in address space offered by IPv6 removes the need to manage overlapping address spaces in your Amazon Virtual Private Cloud (VPC)s. IPv6 addressing can be enabled on both new and existing RDS instances.
  • Customers in the Asia Pacific (Sydney) and Asia Pacific (Singapore) Regions now have the option to use Multi-AZ deployments to provide enhanced availability and durability for Amazon RDS DB instances, offering one primary and two readable standby database instances spanning three Availability Zones (AZs). These deployments benefit from up to 2x faster transaction commit latency, and automated failovers, typically under 35 seconds.
  • Amazon RDS PostgreSQL users can now choose from General-Purpose M6i and Memory-Optimized R6i instance types. Both of these sixth-generation instance types are AWS Nitro System-based, delivering practically all of the compute and memory resources of the host hardware to your instances.
  • Applications using RDS Data API can now elect to receive SQL results as a simplified JSON string, making it easier to deserialize results to an object. Previously, the API returned a JSON string as an array of data type and value pairs, which required developers to write custom code to parse the response and extract the values, so as to translate the JSON string into an object. Applications that use the API to receive the previous JSON format are still supported and will continue to work unchanged.
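
To make the Data API change concrete, here is a minimal Python sketch of the kind of custom parsing the previous format required. It assumes the typed response shape of the Data API, where each field is a single-key object such as {"stringValue": ...} or {"isNull": true} and column names come from the result metadata. The sample rows are invented, the helper names are hypothetical, and nested array values are not unwrapped.

```python
import json


def field_value(field: dict):
    """Unwrap one typed field, e.g. {"longValue": 1} -> 1, {"isNull": True} -> None."""
    if field.get("isNull"):
        return None
    # Every non-null field carries exactly one typed key.
    (value,) = field.values()
    return value


def records_to_rows(column_metadata: list, records: list) -> list:
    """Pair column names with unwrapped values, one dict per record."""
    names = [column["name"] for column in column_metadata]
    return [dict(zip(names, map(field_value, record))) for record in records]


# Invented sample in the older, typed format.
column_metadata = [{"name": "id"}, {"name": "name"}, {"name": "email"}]
records = [
    [{"longValue": 1}, {"stringValue": "Ada"}, {"isNull": True}],
    [{"longValue": 2}, {"stringValue": "Linus"}, {"stringValue": "linus@example.com"}],
]

rows = records_to_rows(column_metadata, records)
print(json.dumps(rows, indent=2))
```

With the simplified JSON output, the service returns the equivalent of `rows` as a JSON string, so a single `json.loads` call can replace helpers like these.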

Applications using Amazon Interactive Video Service (IVS), offering low-latency interactive video experiences, can now add a livestream chat feature, complete with built-in moderation, to help foster community participation in livestreams using Q&A discussions. The new chat support provides chat room resource management and a messaging API for sending, receiving, and moderating chat messages.

Amazon Polly now offers a new Neural Text-to-Speech (TTS) voice, Vitória, for Brazilian Portuguese. The original Vitória voice, dating back to 2016, used standard technology. The new voice offers a more natural-sounding rhythm, intonation, and sound articulation. In addition to Vitória, Polly also offers a second Brazilian Portuguese neural voice, Camila.

Finally, if you’re a .NET developer who’s modernizing .NET Framework applications to run in the cloud, then the announcement that the open-source CoreWCF project has reached its 1.0 release milestone may be of interest. AWS is a major contributor to the project, a port of Windows Communication Foundation (WCF), to run on modern cross-platform .NET versions (.NET Core 3.1, or .NET 5 or higher). This project benefits all .NET developers working on WCF applications, not just those on AWS. You can read more about the project in my blog post from last year, where I spoke with one of the contributing AWS developers. Congratulations to all concerned on reaching the 1.0 milestone!

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

Upcoming AWS Events
As I mentioned earlier, the AWS Summits are in full flow, with some virtual and in-person events in the very near future you may want to check out:

I’m also happy to share that I’ll be joining the AWS on Air crew at AWS Summit Washington, DC. This in-person event is coming up May 23–25. Be sure to tune in to the livestream for all the latest news from the event, and if you’re there in person feel free to come say hi!

Registration is also now open for re:MARS, our conference for topics related to machine learning, automation, robotics, and space. The conference will be in-person in Las Vegas, June 21–24.

That’s all the news I have room for this week — check back next Monday for another week in review!

— Steve

New – Storage-Optimized Amazon EC2 Instances (I4i) Powered by Intel Xeon Scalable (Ice Lake) Processors

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-storage-optimized-amazon-ec2-instances-i4i-powered-by-intel-xeon-scalable-ice-lake-processors/

Over the years we have released multiple generations of storage-optimized Amazon Elastic Compute Cloud (Amazon EC2) instances including the HS1 (2012), D2 (2015), I2 (2013), I3 (2017), I3en (2019), D3/D3en (2020), and Im4gn/Is4gen (2021). These instances are used to host high-performance real-time relational databases, distributed file systems, data warehouses, key-value stores, and more.

New I4i Instances
Today I am happy to introduce the new I4i instances, powered by the latest generation Intel Xeon Scalable (Ice Lake) Processors with an all-core turbo frequency of 3.5 GHz.

The instances offer up to 30 TB of NVMe storage using AWS Nitro SSD devices that are custom-built by AWS, and are designed to minimize latency and maximize transactions per second (TPS) on workloads that need very fast access to medium-sized datasets on local storage. This includes transactional databases such as MySQL, Oracle DB, and Microsoft SQL Server, as well as NoSQL databases: MongoDB, Couchbase, Aerospike, Redis, and the like. They are also an ideal fit for workloads that can benefit from very high compute performance per TB of storage such as data analytics and search engines.

Here are the specs:

| Instance Name | vCPUs | Memory (DDR4) | Local NVMe Storage (AWS Nitro SSD) | Sequential Read Throughput (128 KB Blocks) | EBS-Optimized Bandwidth | Network Bandwidth |
|---|---|---|---|---|---|---|
| i4i.large | 2 | 16 GiB | 468 GB | 350 MB/s | Up to 10 Gbps | Up to 10 Gbps |
| i4i.xlarge | 4 | 32 GiB | 937 GB | 700 MB/s | Up to 10 Gbps | Up to 10 Gbps |
| i4i.2xlarge | 8 | 64 GiB | 1,875 GB | 1,400 MB/s | Up to 10 Gbps | Up to 12 Gbps |
| i4i.4xlarge | 16 | 128 GiB | 3,750 GB | 2,800 MB/s | Up to 10 Gbps | Up to 25 Gbps |
| i4i.8xlarge | 32 | 256 GiB | 7,500 GB (2 x 3,750 GB) | 5,600 MB/s | 10 Gbps | 18.75 Gbps |
| i4i.16xlarge | 64 | 512 GiB | 15,000 GB (4 x 3,750 GB) | 11,200 MB/s | 20 Gbps | 37.5 Gbps |
| i4i.32xlarge | 128 | 1024 GiB | 30,000 GB (8 x 3,750 GB) | 22,400 MB/s | 40 Gbps | 75 Gbps |
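One way to read the specs table is per vCPU. The short sketch below copies the vCPU, memory, and sequential read throughput columns from the table and divides them out; the names `I4I_SPECS` and `per_vcpu_ratios` are made up for this illustration and are not part of any AWS tool.

```python
# Rows copied from the specs table above:
# (instance name, vCPUs, memory in GiB, sequential read throughput in MB/s)
I4I_SPECS = [
    ("i4i.large", 2, 16, 350),
    ("i4i.xlarge", 4, 32, 700),
    ("i4i.2xlarge", 8, 64, 1400),
    ("i4i.4xlarge", 16, 128, 2800),
    ("i4i.8xlarge", 32, 256, 5600),
    ("i4i.16xlarge", 64, 512, 11200),
    ("i4i.32xlarge", 128, 1024, 22400),
]


def per_vcpu_ratios(specs):
    """Map each instance name to (GiB of memory per vCPU, MB/s of read throughput per vCPU)."""
    return {
        name: (memory_gib / vcpus, read_mbps / vcpus)
        for name, vcpus, memory_gib, read_mbps in specs
    }


if __name__ == "__main__":
    for name, (mem, tput) in per_vcpu_ratios(I4I_SPECS).items():
        print(f"{name:13s} {mem:.0f} GiB/vCPU  {tput:.0f} MB/s per vCPU")
```

Every size works out to 8 GiB of memory and 175 MB/s of sequential read throughput per vCPU, so within the family you can pick a size by vCPU count and the other two figures scale with it.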

In comparison to the Xen-based I3 instances, the Nitro-powered I4i instances give you:

  • Up to 60% lower storage I/O latency, along with up to 75% lower storage I/O latency variability.
  • A new, larger instance size (i4i.32xlarge).
  • Up to 30% better compute price/performance.

The i4i.16xlarge and i4i.32xlarge instances give you control over C-states, and the i4i.32xlarge instances support non-uniform memory access (NUMA). All of the instances support AVX-512, and use Intel Total Memory Encryption (TME) to deliver always-on memory encryption.

From Our Customers
AWS customers and AWS service teams have been putting these new instances to the test ahead of today’s launch. Here’s what they had to say:

Redis Enterprise powers mission-critical applications for over 8,000 organizations. According to Yiftach Shoolman (Co-Founder and CTO of Redis):

We are thrilled with the performance we are seeing from the Amazon EC2 I4i instances which use the new low latency AWS Nitro SSDs. Our testing shows I4i instances delivering an astonishing 2.9x higher query throughput than the previous generation I3 instances. We have also tested with various read and write mixes, and observed consistent and linearly scaling performance.

ScyllaDB is a high-performance NoSQL database that can take advantage of high-performance cloud computing instances. Avi Kivity (Co-Founder and CTO of ScyllaDB) told us:


When we tested I4i instances, we observed up to 2.7x increase in throughput per vCPU compared to I3 instances for reads. With an even mix of reads and writes, we observed 2.2x higher throughput per vCPU, with a 40% reduction in average latency than I3 instances. We are excited for the incredible performance and value these new instances will enable for our customers going forward.

Amazon QuickSight is a business intelligence service. After testing, Tracy Daugherty (General Manager, Amazon QuickSight) reported that:

I4i instances have demonstrated superior performance over previous generation I instances, with a 30% improvement across operations. We look forward to using I4i to further elevate performance for our customers.

Available Now

You can launch I4i instances today in the AWS US East (N. Virginia), US East (Ohio), US West (Oregon), and Europe (Ireland) Regions (with more to come) in On-Demand and Spot form. Savings Plans and Reserved Instances are available, as are Dedicated Instances and Dedicated Hosts.

In order to take advantage of the performance benefits of these new instances, be sure to use recent AMIs that include current ENA drivers and support for NVMe 1.4.

To learn more, visit the I4i instance home page.

Jeff;

Develop and test AWS Glue version 3.0 jobs locally using a Docker container

Post Syndicated from Subramanya Vajiraya original https://aws.amazon.com/blogs/big-data/develop-and-test-aws-glue-version-3-0-jobs-locally-using-a-docker-container/

AWS Glue is a fully managed serverless service that allows you to process data coming through different data sources at scale. You can use AWS Glue jobs for various use cases such as data ingestion, preprocessing, enrichment, and data integration from different data sources. AWS Glue version 3.0, the latest version of AWS Glue Spark jobs, provides a performance-optimized Apache Spark 3.1 runtime experience for batch and stream processing.

You can author AWS Glue jobs in different ways. If you prefer coding, AWS Glue allows you to write Python/Scala source code with the AWS Glue ETL library. If you prefer interactive scripting, AWS Glue interactive sessions and AWS Glue Studio notebooks help you write scripts in notebooks by inspecting and visualizing the data. If you prefer a graphical interface rather than coding, AWS Glue Studio helps you author data integration jobs visually without writing code.

For a production-ready data platform, a development process and CI/CD pipeline for AWS Glue jobs is key. We understand the huge demand for developing and testing AWS Glue jobs wherever you prefer, whether that is a local laptop, a Docker container on Amazon Elastic Compute Cloud (Amazon EC2), or another environment. You can achieve that by using AWS Glue Docker images hosted on Docker Hub or the Amazon Elastic Container Registry (Amazon ECR) Public Gallery. The Docker images help you set up your development environment with additional utilities. You can use your preferred IDE, notebook, or REPL using the AWS Glue ETL library.

This post is a continuation of the blog post “Developing AWS Glue ETL jobs locally using a container”. While the earlier post introduced the pattern of development for AWS Glue ETL jobs on a Docker container using a Docker image, this post focuses on how to develop and test AWS Glue version 3.0 jobs using the same approach.

Solution overview

The following Docker images are available for AWS Glue on Docker Hub:

  • AWS Glue version 3.0: amazon/aws-glue-libs:glue_libs_3.0.0_image_01
  • AWS Glue version 2.0: amazon/aws-glue-libs:glue_libs_2.0.0_image_01

You can also obtain the images from the Amazon ECR Public Gallery:

  • AWS Glue version 3.0: public.ecr.aws/glue/aws-glue-libs:glue_libs_3.0.0_image_01
  • AWS Glue version 2.0: public.ecr.aws/glue/aws-glue-libs:glue_libs_2.0.0_image_01

Note: AWS Glue Docker images are x86_64 compatible and arm64 hosts are currently not supported.

In this post, we use amazon/aws-glue-libs:glue_libs_3.0.0_image_01 and run the container on a local machine (Mac, Windows, or Linux). This container image has been tested for AWS Glue version 3.0 Spark jobs. The image contains the following:

  • Amazon Linux
  • AWS Glue ETL Library (aws-glue-libs)
  • Apache Spark 3.1.1
  • Spark history server
  • JupyterLab
  • Livy
  • Other library dependencies (the same as the ones of the AWS Glue job system)

To set up your container, you pull the image from Docker Hub and then run the container. We demonstrate how to run your container with the following methods, depending on your requirements:

  • spark-submit
  • REPL shell (pyspark)
  • pytest
  • JupyterLab
  • Visual Studio Code

Prerequisites

Before you start, make sure that Docker is installed and the Docker daemon is running. For installation instructions, see the Docker documentation for Mac, Windows, or Linux. Also make sure that you have at least 7 GB of disk space for the image on the host running Docker.
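If you want to script part of this check, the following is a minimal sketch. It only looks for a `docker` executable on the PATH and compares free disk space against the 7 GB mentioned above. It does not verify that the Docker daemon is running, and it measures the disk that holds the given path, which is not necessarily where Docker stores images. The function names are made up for this example, and 1 GB is taken as 10**9 bytes.

```python
import shutil

REQUIRED_FREE_GB = 7  # disk space this post asks for


def has_enough_disk(free_bytes, required_gb=REQUIRED_FREE_GB):
    """Return True if free_bytes covers the required space (assuming 1 GB = 10**9 bytes)."""
    return free_bytes >= required_gb * 10**9


def check_prerequisites(path="."):
    """Return a list of human-readable problems; an empty list means nothing was flagged."""
    problems = []
    if shutil.which("docker") is None:
        problems.append("docker executable not found on PATH")
    free = shutil.disk_usage(path).free
    if not has_enough_disk(free):
        problems.append(f"only {free / 10**9:.1f} GB free, need {REQUIRED_FREE_GB} GB")
    return problems


if __name__ == "__main__":
    for line in check_prerequisites() or ["no problems found"]:
        print(line)
```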

For more information about restrictions when developing AWS Glue code locally, see Local Development Restrictions.

Configure AWS credentials

To enable AWS API calls from the container, set up your AWS credentials with the following steps:

  1. Create an AWS named profile.
  2. Open cmd on Windows or a terminal on Mac/Linux, and run the following command:
    PROFILE_NAME="profile_name"

In the following sections, we use this AWS named profile.

Pull the image from Docker Hub

If you’re running Docker on Windows, choose the Docker icon (right-click) and choose Switch to Linux containers… before pulling the image.

Run the following command to pull the image from Docker Hub:

docker pull amazon/aws-glue-libs:glue_libs_3.0.0_image_01

Run the container

Now you can run a container using this image. You can choose any of the following methods based on your requirements.

spark-submit

You can run an AWS Glue job script by running the spark-submit command on the container.

Write your ETL script (sample.py in the example below) and save it under the /local_path_to_workspace/src/ directory using the following commands:

$ WORKSPACE_LOCATION=/local_path_to_workspace
$ SCRIPT_FILE_NAME=sample.py
$ mkdir -p ${WORKSPACE_LOCATION}/src
$ vim ${WORKSPACE_LOCATION}/src/${SCRIPT_FILE_NAME}

These variables are used in the docker run command below. The sample code (sample.py) used in the spark-submit command below is included in the appendix at the end of this post.

Run the following command to run the spark-submit command on the container to submit a new Spark application:

$ docker run -it -v ~/.aws:/home/glue_user/.aws -v $WORKSPACE_LOCATION:/home/glue_user/workspace/ -e AWS_PROFILE=$PROFILE_NAME -e DISABLE_SSL=true --rm -p 4040:4040 -p 18080:18080 --name glue_spark_submit amazon/aws-glue-libs:glue_libs_3.0.0_image_01 spark-submit /home/glue_user/workspace/src/$SCRIPT_FILE_NAME
...22/01/26 09:08:55 INFO DAGScheduler: Job 0 finished: fromRDD at DynamicFrame.scala:305, took 3.639886 s
root
|-- family_name: string
|-- name: string
|-- links: array
| |-- element: struct
| | |-- note: string
| | |-- url: string
|-- gender: string
|-- image: string
|-- identifiers: array
| |-- element: struct
| | |-- scheme: string
| | |-- identifier: string
|-- other_names: array
| |-- element: struct
| | |-- lang: string
| | |-- note: string
| | |-- name: string
|-- sort_name: string
|-- images: array
| |-- element: struct
| | |-- url: string
|-- given_name: string
|-- birth_date: string
|-- id: string
|-- contact_details: array
| |-- element: struct
| | |-- type: string
| | |-- value: string
|-- death_date: string

...
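The docker run commands in this post share most of their flags and differ mainly in the container name, the published ports, and the command passed to the image. As a rough sketch of that pattern, the hypothetical helper below (`glue_docker_command` is not part of Docker or AWS Glue) assembles the same argument list in Python, using only flags that appear in the commands shown in this post.

```python
import os
import shlex

IMAGE = "amazon/aws-glue-libs:glue_libs_3.0.0_image_01"
DEFAULT_PORTS = (4040, 18080)  # ports published by the commands in this post


def glue_docker_command(name, container_args, profile_name,
                        workspace=None, ports=DEFAULT_PORTS):
    """Build the `docker run` argument list for one of the run methods."""
    # Expand ~ here because no shell does it when the list is passed to subprocess.
    aws_dir = os.path.expanduser("~/.aws")
    cmd = ["docker", "run", "-it", "-v", f"{aws_dir}:/home/glue_user/.aws"]
    if workspace is not None:
        cmd += ["-v", f"{workspace}:/home/glue_user/workspace/"]
    cmd += ["-e", f"AWS_PROFILE={profile_name}", "-e", "DISABLE_SSL=true", "--rm"]
    for port in ports:
        cmd += ["-p", f"{port}:{port}"]
    cmd += ["--name", name, IMAGE, *container_args]
    return cmd


if __name__ == "__main__":
    cmd = glue_docker_command(
        "glue_spark_submit",
        ["spark-submit", "/home/glue_user/workspace/src/sample.py"],
        profile_name="profile_name",
        workspace="/local_path_to_workspace",
    )
    print(shlex.join(cmd))  # pass cmd to subprocess.run to actually start the container
```

For the other methods, only the arguments change: `["pyspark"]` for the REPL shell, `["-c", "python3 -m pytest"]` for pytest, and `["/home/glue_user/jupyter/jupyter_start.sh"]` with ports 8998 and 8888 added for JupyterLab.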

REPL shell (pyspark)

You can run a REPL (read-eval-print loop) shell for interactive development. Run the following command to run the pyspark command on the container to start the REPL shell:

$ docker run -it -v ~/.aws:/home/glue_user/.aws -e AWS_PROFILE=$PROFILE_NAME -e DISABLE_SSL=true --rm -p 4040:4040 -p 18080:18080 --name glue_pyspark amazon/aws-glue-libs:glue_libs_3.0.0_image_01 pyspark
...
 ____ __
 / __/__ ___ _____/ /__
 _\ \/ _ \/ _ `/ __/ '_/
 /__ / .__/\_,_/_/ /_/\_\  version 3.1.1-amzn-0
 /_/

Using Python version 3.7.10 (default, Jun 3 2021 00:02:01)
Spark context Web UI available at http://56e99d000c99:4040
Spark context available as 'sc' (master = local[*], app id = local-1643011860812).
SparkSession available as 'spark'.
>>> 

pytest

For unit testing, you can use pytest for AWS Glue Spark job scripts.

Run the following commands for preparation:

$ WORKSPACE_LOCATION=/local_path_to_workspace
$ SCRIPT_FILE_NAME=sample.py
$ UNIT_TEST_FILE_NAME=test_sample.py
$ mkdir -p ${WORKSPACE_LOCATION}/tests
$ vim ${WORKSPACE_LOCATION}/tests/${UNIT_TEST_FILE_NAME}

Run the following command to run pytest on the test suite:

$ docker run -it -v ~/.aws:/home/glue_user/.aws -v $WORKSPACE_LOCATION:/home/glue_user/workspace/ -e AWS_PROFILE=$PROFILE_NAME -e DISABLE_SSL=true --rm -p 4040:4040 -p 18080:18080 --name glue_pytest amazon/aws-glue-libs:glue_libs_3.0.0_image_01 -c "python3 -m pytest"
starting org.apache.spark.deploy.history.HistoryServer, logging to /home/glue_user/spark/logs/spark-glue_user-org.apache.spark.deploy.history.HistoryServer-1-5168f209bd78.out
============================================================= test session starts =============================================================
platform linux -- Python 3.7.10, pytest-6.2.3, py-1.11.0, pluggy-0.13.1
rootdir: /home/glue_user/workspace
plugins: anyio-3.4.0
collected 1 item  

tests/test_sample.py . [100%]

============================================================== warnings summary ===============================================================
tests/test_sample.py::test_counts
 /home/glue_user/spark/python/pyspark/sql/context.py:79: DeprecationWarning: Deprecated in 3.0.0. Use SparkSession.builder.getOrCreate() instead.
 DeprecationWarning)

-- Docs: https://docs.pytest.org/en/stable/warnings.html
======================================================== 1 passed, 1 warning in 21.07s ========================================================

JupyterLab

You can start Jupyter for interactive development and ad hoc queries on notebooks. Complete the following steps:

  1. Run the following command to start JupyterLab:
    $ JUPYTER_WORKSPACE_LOCATION=/local_path_to_workspace/jupyter_workspace/
    $ docker run -it -v ~/.aws:/home/glue_user/.aws -v $JUPYTER_WORKSPACE_LOCATION:/home/glue_user/workspace/jupyter_workspace/ -e AWS_PROFILE=$PROFILE_NAME -e DISABLE_SSL=true --rm -p 4040:4040 -p 18080:18080 -p 8998:8998 -p 8888:8888 --name glue_jupyter_lab amazon/aws-glue-libs:glue_libs_3.0.0_image_01 /home/glue_user/jupyter/jupyter_start.sh
    ...
    [I 2022-01-24 08:19:21.368 ServerApp] Serving notebooks from local directory: /home/glue_user/workspace/jupyter_workspace
    [I 2022-01-24 08:19:21.368 ServerApp] Jupyter Server 1.13.1 is running at:
    [I 2022-01-24 08:19:21.368 ServerApp] http://faa541f8f99f:8888/lab
    [I 2022-01-24 08:19:21.368 ServerApp] or http://127.0.0.1:8888/lab
    [I 2022-01-24 08:19:21.368 ServerApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).

  2. Open http://127.0.0.1:8888/lab in a web browser on your local machine to access the JupyterLab UI.
  3. Choose Glue Spark Local (PySpark) under Notebook.

Now you can start developing code in the interactive Jupyter notebook UI.

Visual Studio Code

To set up the container with Visual Studio Code, complete the following steps:

  1. Install Visual Studio Code.
  2. Install Python.
  3. Install Visual Studio Code Remote – Containers.
  4. Open the workspace folder in Visual Studio Code.
  5. Choose Settings.
  6. Choose Workspace.
  7. Choose Open Settings (JSON).
  8. Enter the following JSON and save it:
    {
        "python.defaultInterpreterPath": "/usr/bin/python3",
        "python.analysis.extraPaths": [
            "/home/glue_user/aws-glue-libs/PyGlue.zip",
            "/home/glue_user/spark/python/lib/py4j-0.10.9-src.zip",
            "/home/glue_user/spark/python/"
        ]
    }

Now you’re ready to set up the container.

  1. Run the Docker container:
    $ docker run -it -v ~/.aws:/home/glue_user/.aws -v $WORKSPACE_LOCATION:/home/glue_user/workspace/ -e AWS_PROFILE=$PROFILE_NAME -e DISABLE_SSL=true --rm -p 4040:4040 -p 18080:18080 --name glue_pyspark amazon/aws-glue-libs:glue_libs_3.0.0_image_01 pyspark

  2. Start Visual Studio Code.
  3. Choose Remote Explorer in the navigation pane, and choose the container amazon/aws-glue-libs:glue_libs_3.0.0_image_01.
  4. Right-click and choose Attach to Container.
  5. If the following dialog appears, choose Got it.
  6. Open /home/glue_user/workspace/.
  7. Create an AWS Glue PySpark script and choose Run.

You should see the successful run of the AWS Glue PySpark script.

Conclusion

In this post, we learned how to get started with AWS Glue Docker images. AWS Glue Docker images help you develop and test your AWS Glue job scripts anywhere you prefer. They are available on Docker Hub and the Amazon ECR Public Gallery. Check them out. We look forward to getting your feedback.

Appendix: AWS Glue job sample codes for testing

This appendix introduces two scripts as AWS Glue job sample code for testing purposes. You can use either of them in the tutorial.

The following sample.py code uses the AWS Glue ETL library with an Amazon Simple Storage Service (Amazon S3) API call. The code requires Amazon S3 permissions in AWS Identity and Access Management (IAM). You need to grant the IAM-managed policy arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess or an IAM custom policy that allows you to make ListBucket and GetObject API calls for the S3 path.

import sys
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions


class GluePythonSampleTest:
    def __init__(self):
        params = []
        if '--JOB_NAME' in sys.argv:
            params.append('JOB_NAME')
        args = getResolvedOptions(sys.argv, params)

        self.context = GlueContext(SparkContext.getOrCreate())
        self.job = Job(self.context)

        if 'JOB_NAME' in args:
            jobname = args['JOB_NAME']
        else:
            jobname = "test"
        self.job.init(jobname, args)

    def run(self):
        dyf = read_json(self.context, "s3://awsglue-datasets/examples/us-legislators/all/persons.json")
        dyf.printSchema()

        self.job.commit()


def read_json(glue_context, path):
    dynamicframe = glue_context.create_dynamic_frame.from_options(
        connection_type='s3',
        connection_options={
            'paths': [path],
            'recurse': True
        },
        format='json'
    )
    return dynamicframe


if __name__ == '__main__':
    GluePythonSampleTest().run()
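In sample.py, the constructor only asks getResolvedOptions for JOB_NAME when the --JOB_NAME flag is present in sys.argv, and otherwise falls back to the name "test". That lets the same script run with or without the argument. The plain-Python stand-in below mirrors that fallback without needing the AWS Glue libraries. `resolve_job_name` is a hypothetical helper, not part of aws-glue-libs, and it only handles the space-separated `--JOB_NAME value` form that test_sample.py appends to sys.argv.

```python
import sys


def resolve_job_name(argv, default="test"):
    """Return the value that follows '--JOB_NAME' in argv, or default if it is absent."""
    if "--JOB_NAME" in argv:
        index = argv.index("--JOB_NAME")
        if index + 1 < len(argv):
            return argv[index + 1]
    return default


if __name__ == "__main__":
    print(resolve_job_name(sys.argv))
```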
	

The following test_sample.py code is a sample for a unit test of sample.py:

import pytest
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
import sys
from src import sample


@pytest.fixture(scope="module", autouse=True)
def glue_context():
    sys.argv.append('--JOB_NAME')
    sys.argv.append('test_count')

    args = getResolvedOptions(sys.argv, ['JOB_NAME'])
    context = GlueContext(SparkContext.getOrCreate())
    job = Job(context)
    job.init(args['JOB_NAME'], args)

    yield context

    job.commit()


def test_counts(glue_context):
    dyf = sample.read_json(glue_context, "s3://awsglue-datasets/examples/us-legislators/all/persons.json")
    assert dyf.toDF().count() == 1961

About the Authors

Subramanya Vajiraya is a Cloud Engineer (ETL) at AWS Sydney specialized in AWS Glue. He is passionate about helping customers solve issues related to their ETL workload and implement scalable data processing and analytics pipelines on AWS. Outside of work, he enjoys going on bike rides and taking long walks with his dog Ollie, a 1-year-old Corgi.

Vishal Pathak is a Data Lab Solutions Architect at AWS. Vishal works with customers on their use cases, architects solutions to solve their business problems, and helps them build scalable prototypes. Prior to his journey in AWS, Vishal helped customers implement business intelligence, data warehouse, and data lake projects in the US and Australia.

Noritaka Sekiyama is a Principal Big Data Architect on the AWS Glue team. He enjoys learning different use cases from customers and sharing knowledge about big data technologies with the wider community.

How to re-platform and modernize Java web applications on AWS

Post Syndicated from Rick Armstrong original https://aws.amazon.com/blogs/compute/re-platform-java-web-applications-on-aws/

This post is written by: Bill Chan, Enterprise Solutions Architect

According to a report from Grand View Research, “the global application server market size was valued at USD 15.84 billion in 2020 and is expected to expand at a compound annual growth rate (CAGR) of 13.2% from 2021 to 2028.” The report also suggests that Java based application servers “accounted for the largest share of around 50% in 2020.” This means that many organizations continue to rely on Java application server capabilities to deliver middleware services that underpin the web applications running their transactional, content management and business process workloads.

The maturity of the application server technology also means that many of these web applications were built on traditional three-tier web architectures running in on-premises data centers. As organizations embark on their journey to the cloud, the question arises: what is the best approach to migrate these applications?

There are seven common migration strategies when moving applications to the cloud, including:

  • Retain – keeping applications running as is and revisiting the migration at a later stage
  • Retire – decommissioning applications that are no longer required
  • Repurchase – switching from existing applications to a software-as-a-service (SaaS) solution
  • Rehost – moving applications as is (lift and shift), without making any changes to take advantage of cloud capabilities
  • Relocate – moving applications as is, but at a hypervisor level
  • Replatform – moving applications as is, but introducing capabilities that take advantage of cloud-native features
  • Refactor – re-architecting the application to take full advantage of cloud-native features

Refer to Migrating to AWS: Best Practices & Strategies and the 6 Strategies for Migrating Applications to the Cloud for more details.

This blog focuses on the ‘replatform’ strategy, which suits customers who have large investments in application server technologies and for whom the business case for re-architecting doesn’t stack up. By re-platforming their applications into the cloud, customers can benefit from the flexibility of a ‘pay-as-you-go’ model, dynamically scale to meet demand, and provision infrastructure as code. Additionally, customers can increase the speed and agility with which they modernize existing applications and build new cloud-native applications to deliver better customer experiences.

In this post, we walk through the steps to replatform a simple contact management Java application running on an open-source Tomcat application server, along with modernization aspects that include:

  • Deploying a Tomcat web application with automatic scaling capabilities
  • Integrating Tomcat with Redis cache (using Redisson Session Manager for Tomcat)
  • Integrating Tomcat with Amazon Cognito for authentication (using Boyle Software’s OpenID Connect Authenticator for Tomcat)
  • Delegating user log in and sign up to Amazon Cognito

Overview of solution

Solution architecture overview diagram

The solution consists of the following components:

  • A VPC across two Availability Zones
  • Two public subnets, two private app subnets, and two private DB subnets
  • An Internet Gateway attached to the VPC
    • A public route table routing internet traffic to the Internet Gateway
    • Two private route tables routing traffic internally within the VPC
  • A frontend Application Load Balancer (Elastic Load Balancing) that routes traffic to the Apache Web Servers
  • An Auto Scaling group that launches additional Apache Web Servers based on defined scaling policies. Each instance of the web server is based on a launch template, which defines the same configuration for each new web server.
  • A hosted zone in Amazon Route 53 with a domain name that routes to the frontend web server Elastic Load Balancing
  • An application Elastic Load Balancing that routes traffic to the Tomcat application servers
  • An Auto Scaling group that launches additional Tomcat Application Servers based on defined scaling policies. Each instance of the Tomcat application server is based on a launch template, which defines the same configuration and software components for each new application server
  • A Redis cache cluster with a primary and replica node to store session data after the user has authenticated, making your application servers stateless
  • A Redis open-source Java client, with a Tomcat Session Manager implementation to store authenticated user session data in Redis cache
  • An Amazon Relational Database Service (Amazon RDS) for MySQL Multi-AZ deployment to store the contact management and role access tables
  • An Amazon Simple Storage Service (Amazon S3) bucket to store the application and framework artifacts, images, scripts and configuration files that are referenced by any new Tomcat application server instances provisioned by automatic scaling
  • Amazon Cognito with a sign-up Lambda function to register users and insert a corresponding entry in the user account tables. Cognito acts as an identity provider and performs the user authentication using an OpenID Connect Authenticator Java component
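To illustrate why storing session data in a shared cache makes the application servers stateless, here is a small sketch in Python. It is not the Redisson or Tomcat implementation: `SessionStore` is an in-memory stand-in for the Redis cache, `AppServer` is a stand-in for one Tomcat instance, and all names are invented for the example.

```python
import uuid


class SessionStore:
    """In-memory stand-in for the shared Redis cache (illustration only)."""

    def __init__(self):
        self._data = {}

    def save(self, session_id, attributes):
        self._data[session_id] = dict(attributes)

    def load(self, session_id):
        return self._data.get(session_id)


class AppServer:
    """Stand-in for one application server; it keeps no session state of its own."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def login(self, user_email):
        # Session data goes to the shared store, not to this server's memory.
        session_id = str(uuid.uuid4())
        self.store.save(session_id, {"user": user_email})
        return session_id

    def handle(self, session_id):
        session = self.store.load(session_id)
        if session is None:
            return f"{self.name}: not signed in"
        return f"{self.name}: hello {session['user']}"


if __name__ == "__main__":
    store = SessionStore()
    server_a, server_b = AppServer("app-1", store), AppServer("app-2", store)
    sid = server_a.login("user@example.com")
    print(server_b.handle(sid))  # a different server can serve the same session
```

Because any server can look up the session, the load balancer is free to route each request to any instance, and automatic scaling can add or remove instances without signing users out.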

Walkthrough

The following steps give an overview of how to deploy the blog solution:

  • Clone and build the Sample Web Application and AWS Signup Lambda Maven projects from the GitHub repository
  • Deploy the CloudFormation template (java-webapp-infra.yaml) to create the AWS networking infrastructure and the CloudFormation template (java-webapp-rds.yaml) to create the database instance
  • Update and build the sample web application and signup Lambda function
  • Upload the packages into your S3 bucket
  • Deploy the CloudFormation template (java-webapp-components.yaml) to create the blog solution components
  • Update the solution configuration files and upload them into your S3 bucket
  • Run a script to provision the underlying database tables
  • Validate the web application, session cache and automatic scaling functionality
  • Clean up resources

Prerequisites

For this walkthrough, you should have the following prerequisites:

  • An AWS account
  • An Amazon Elastic Compute Cloud (Amazon EC2) key pair (required for authentication). For more details, see Amazon EC2 key pairs
  • A Java Integrated Development Environment (IDE) such as Eclipse or NetBeans. AWS also offers a cloud-based IDE that lets you write, run and debug code in your browser without having to install files or configure your development machine, called AWS Cloud9. I will show how AWS Cloud9 can be used as part of a DevOps solution in a subsequent post
  • A valid domain name and SSL certificate for the deployed web application. To validate the OAuth 2.0 integration, Cognito requires the URL that the user is redirected to after successful sign-in to be HTTPS. Refer to Configuring a user pool app client for more details
  • Downloaded the following JARs:

Note: the solution was validated in the preceding versions and therefore, the launch template created for the CloudFormation solution stack refers to these specific JARs. If you decide to use different versions, then the ‘java-webapp-components.yaml’ will need to be updated to reflect the new versions. Alternatively, you can externalize the parameters in the template.

Clone the GitHub repository to your local machine

This repository contains the sample code for the Java web application and post confirmation sign-up Lambda function. It also contains the CloudFormation templates required to set up the AWS infrastructure, SQL script to create the supporting database and configuration files for the web server, Tomcat application server and Redis cache.

Deploy infrastructure CloudFormation template

  1. Log in to the AWS Management Console and open the CloudFormation service.

Diagram showing the first step in creating a CloudFormation stack.

  2. Create the infrastructure stack using the java-webapp-infra.yaml template (located in the ‘config’ directory of the repo).

  3. Infrastructure stack outputs:

Diagram showing the outputs generated from the infrastructure stack creation

Deploy database CloudFormation template

  1. Log in to the AWS Management Console and open the CloudFormation service.
  2. Create the database stack using the java-webapp-rds.yaml template (located in the ‘config’ directory of the repo).
  3. Database stack outputs.

Diagram showing the outputs generated from the relational database service stack creation

Update and build sample web application and signup Lambda function

  1. Import the ‘sample-webapp’ and ‘aws-signup-lambda’ Maven projects from the repository into your IDE.
  2. Update the sample-webapp’s UserDAO class to reflect the RDSEndpoint, DBUserName, and DBPassword from the previous step:
    // Externalize and update jdbcURL, jdbcUsername, jdbcPassword parameters specific to your environment
    	private String jdbcURL = "jdbc:mysql://<RDSEndpoint>:3306/webappdb?useSSL=false";
    	private String jdbcUsername = "<DBUserName>";
    	private String jdbcPassword = "<DBPassword>";

  3. To build the ‘sample-webapp’ Maven project, use the standard ‘clean install’ goals.
  4. Update the aws-signup-lambda’s signupHandler class to reflect RDSEndpoint, DBUserName, and DBPassword from the solution stack:
    // Update with your database connection details
    		String jdbcURL = "jdbc:mysql://<RDSEndpoint>:3306/webappdb?useSSL=false";
    		String jdbcUsername = "<DBUserName>";
    		String jdbcPassword = "<DBPassword>";

  5. To build the aws-signup-lambda Maven project, use the ‘package shade:shade’ goals to include all dependencies in the package.
  6. Two packages are created in their respective target directory: ‘sample-webapp.war’ and ‘create-user-lambda-1.0.jar’
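The code comment in step 2 suggests externalizing the JDBC parameters rather than hardcoding them. The sample application is written in Java, but the idea can be sketched in a few lines of Python: read the three values from the environment and build the same URL shape shown above. The environment variable names (`RDS_ENDPOINT`, `DB_USERNAME`, `DB_PASSWORD`) and the function name are assumptions made for this example, not names used by the sample project.

```python
import os


def jdbc_settings(env=os.environ):
    """Build the JDBC settings from environment-style key/value pairs.

    Raises KeyError if one of the (hypothetical) variables is missing.
    """
    endpoint = env["RDS_ENDPOINT"]
    return {
        "jdbcURL": f"jdbc:mysql://{endpoint}:3306/webappdb?useSSL=false",
        "jdbcUsername": env["DB_USERNAME"],
        "jdbcPassword": env["DB_PASSWORD"],
    }


if __name__ == "__main__":
    # Example values only; a real endpoint comes from the database stack outputs.
    example = {
        "RDS_ENDPOINT": "db.example.com",
        "DB_USERNAME": "example_user",
        "DB_PASSWORD": "example_password",
    }
    print(jdbc_settings(example)["jdbcURL"])
```

Keeping the values outside the source means the same build can be pointed at a different database without recompiling, and the password does not end up in the repository.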

Upload the packages into your S3 bucket

  1. Log in to the AWS Management Console and open the S3 service.
  2. Select the bucket created by the infrastructure CloudFormation template in an earlier step.

Diagram showing the S3 bucket interface with no objects.

  3. Create a ‘config’ and ‘lib’ folder in the bucket.

Diagram showing the S3 bucket interface with the new folders.

  4. Upload the ‘sample-webapp.war’ and ‘create-user-lambda-1.0.jar’ created in an earlier step (along with the downloaded packages from the prerequisites section) into the ‘lib’ folder of the bucket. The ‘lib’ folder should look like this:

Diagram showing the S3 bucket interface and objects in the lib folder

Note: the solution was validated in the preceding versions and therefore, the launch template created for the CloudFormation solution stack refers to these specific package names.

Deploy the solution components CloudFormation template

  1. Log in to the AWS Management Console and open the CloudFormation service (if you aren’t already logged in from the previous step).
  2. Create the web application solution stack using the ‘java-webapp-components.yaml’ template (located in the ‘config’ directory of the repo).
  3. Guidance on the different template input parameters:
    a. BastionSGSource – default is 0.0.0.0/0, but it is recommended to restrict this to your allowed IPv4 CIDR range for additional security
    b. BucketName – the bucket name created as part of the infrastructure stack. This blog uses the bucket name ‘chanbi-java-webapp-bucket’
    c. CallbackURL – the URL that the user is redirected to after successful sign up/sign in. It is composed of your domain name (blog.example.com), the application root (sample-webapp), and the authentication form action ‘j_security_check’. As noted earlier, this needs to be over HTTPS
    d. CreateUserLambdaKey – the S3 object key for the signup Lambda package. This blog uses the key ‘lib/create-user-lambda-1.0.jar’
    e. DBUserName – the database user name for the MySQL RDS. Make note of this as it will be required in a subsequent step
    f. DBUserPassword – the database user password. Make note of this as it will be required in a subsequent step
    g. KeyPairName – the key pair to use when provisioning the EC2 instances. This key pair was created in the prerequisite step
    h. WebALBSGSource – the IPv4 CIDR range allowed to access the web app. Default is 0.0.0.0/0
    i. The remaining parameters are import names from the infrastructure stack. Use default settings
  4. After successful stack creation, you should see the following Java web application solution stack output:

 Diagram showing the outputs generated from the solution components stack creation.
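The CallbackURL parameter is described above as a combination of the domain name, the application root, and the form action. As a small sketch, and assuming the three parts are simply joined with slashes under an HTTPS scheme, the value can be composed and checked like this (the function names are made up for the example):

```python
from urllib.parse import urlsplit


def callback_url(domain, app_root="sample-webapp", form_action="j_security_check"):
    """Join the domain, application root, and form action into an HTTPS URL."""
    return f"https://{domain}/{app_root}/{form_action}"


def is_https(url):
    """Return True if the URL uses the https scheme."""
    return urlsplit(url).scheme == "https"


if __name__ == "__main__":
    url = callback_url("blog.example.com")
    print(url, is_https(url))
```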

Update configuration files

  1. The GitHub repository’s ‘config’ folder contains the configuration files for the web server, Tomcat application server and Redis cache, which need to be updated to reflect the parameters specific to your stack output in the previous step.
  2. Update the virtual hosts in ‘httpd.conf’ to proxy web traffic to the internal app load balancer. Use the value defined by the key ‘AppALBLoadBalancerDNS’ from the stack output.
    <VirtualHost *:80>
    ProxyPass / http://<AppALBLoadBalancerDNS>:8080/
    ProxyPassReverse / http://<AppALBLoadBalancerDNS>:8080/
    </VirtualHost>

  3. Update the JDBC resource for the ‘webappdb’ in the ‘context.xml’, with the values defined by the RDSEndpoint, DBUserName, and DBPassword from the solution components CloudFormation stack:
    <Resource name="jdbc/webappdb" auth="Container" type="javax.sql.DataSource"
                   maxTotal="100" maxIdle="30" maxWaitMillis="10000"
                   username="<DBUserName>" password="<DBPassword>" driverClassName="com.mysql.jdbc.Driver"
                   url="jdbc:mysql://<RDSEndpoint>:3306/webappdb"/>

  4. Log in to the AWS Management Console and open the Amazon Cognito service. Select ‘Manage User Pools’ and you will notice that a ‘java-webapp-pool’ has been created by the solution components CloudFormation stack. Select the ‘java-webapp-pool’ and make note of the ‘Pool Id’, ‘App client id’ and ‘App client secret’.

Diagram showing the Cognito User Pool interface general settings

Diagram showing the Cognito User Pool interface app client settings

5.  Update ‘Valve’ configuration in the ‘context.xml’, with the ‘Pool Id’, ‘App client id’ and ‘App client secret’ values from the previous step. The Cognito IDP endpoint specific to your Region can be found here. The host base URI needs to be replaced with the domain for your web application.

    <Valve className="org.bsworks.catalina.authenticator.oidc.tomcat90.OpenIDConnectAuthenticator"
       providers="[
           {
               name: 'Amazon Cognito',
               issuer: https://<cognito-idp-endpoint-for-you-region>/<cognito-pool-id>,
               clientId: <user-pool-app-client-id>,
               clientSecret: <user-pool-app-client-secret>
           }
       ]"
        hostBaseURI="https://<your-sample-webapp-domain>" usernameClaim="email" />

6.  Update the ‘address’ parameter in ‘redisson.yaml’ with Redis cluster endpoint. Use the value defined by the key ‘RedisClusterEndpoint’ from the solution components CloudFormation stack output.

singleServerConfig:
    address: "redis://<RedisClusterEndpoint>:6379"

7.  No updates are required to the following files:

a.  server.xml – defines a data source realm for the user names, passwords, and roles assigned to users

      <Realm className="org.apache.catalina.realm.DataSourceRealm"
   dataSourceName="jdbc/webappdb" localDataSource="true"
   userTable="user_accounts" userNameCol="user_name" userCredCol="user_pass"
   userRoleTable="user_account_roles" roleNameCol="role_name" debug="9" />
      </Realm>

b.  tomcat.service – allows Tomcat to run as a service

c.  uninstall-sample-webapp.sh – removes the sample web application

Upload configuration files into your S3 bucket

  1. Upload the configuration files from the previous step into the ‘config’ folder of the bucket. The ‘config’ folder should look like this:

Diagram showing the S3 bucket interface and objects in the config folder

Update the Auto Scaling groups

  1. Auto Scaling groups manage the provisioning and removal of the web and application instances in our solution. To start an instance of the web server, update the Auto Scaling group’s desired capacity (1), minimum capacity (1) and maximum capacity (2) as shown in the following image:

Diagram showing the web server auto scaling group interface and group details.

2.  To start an instance of the application server, update the Auto Scaling group’s desired capacity (1), minimum capacity (1) and maximum capacity (2) as shown in the following image:

Diagram showing the web server auto scaling group interface and group details.

The web and application scaling groups will show a status of “Updating capacity” (as shown in the following image) as the instances start up.

Diagram showing the auto scaling groups interface and updating capacity status.

After web and application servers have started, an instance will appear under ‘Instance management’ with a ‘Healthy’ status for each Auto Scaling group (as shown in the following image).

Diagram showing the web server auto scaling group interface and instance status

Diagram showing the application server auto scaling group interface and instance status
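The capacity changes made in the console above can also be scripted. The following is a minimal sketch, assuming the boto3 SDK is installed and credentials are configured; the group name you pass in is hypothetical and should be replaced with the Auto Scaling group name from your own stack.

```python
def capacity_update(group_name, minimum=1, desired=1, maximum=2):
    """Build the arguments for an UpdateAutoScalingGroup call.

    The defaults mirror the values used in this walkthrough.
    """
    if not minimum <= desired <= maximum:
        raise ValueError("expected minimum <= desired <= maximum")
    return {
        "AutoScalingGroupName": group_name,
        "MinSize": minimum,
        "DesiredCapacity": desired,
        "MaxSize": maximum,
    }


def apply_capacity(group_name, **sizes):
    """Send the update to Amazon EC2 Auto Scaling.

    boto3 is imported here so that capacity_update() has no dependencies.
    """
    import boto3

    client = boto3.client("autoscaling")
    client.update_auto_scaling_group(**capacity_update(group_name, **sizes))
```

Calling `apply_capacity("<your-web-asg-name>")` once for the web group and once for the application group would have the same effect as the two console updates.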

Run the database script webappdb_create_tables.sql

  1. The database script creates the database and underlying tables required by the web application. As the database server resides in the DB private subnet and is only accessible from the application server instance, we need to first connect (via SSH) to the bastion host (using public IPv4 DNS), and from there we can connect (via SSH) to the application server instance (using its private IPv4 address). This will in turn allow us to connect to the database instance and run the database script. Refer to connecting to your Linux instance using SSH for more details. Instance details are located under the ‘Instances’ view (as shown in the following image).

Diagram showing the instances interface and the running instances for the VPC

2.  Transfer the database script webappdb_create_tables.sql to the application server instance via the Bastion Host. Refer to transferring files using a client for details.

3.  Once connected to the application server via SSH, execute the command to connect to the database instance:

mysql -h <RDSEndpoint> -P 3306 -u <DBUserName> -p

4. Enter the DB user password used when creating the database instance. You will be presented with the MySQL prompt after successful login:

Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MySQL connection id is 300
Server version: 8.0.23 Source distribution

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MySQL [(none)]>

5. Run the command to run the database script webappdb_create_tables.sql:

source /home/ec2-user/webappdb_create_tables.sql

Add an HTTPS listener to the external web load balancer

  1. Log in to the AWS Management Console and select Load Balancers as part of the EC2 service
    Diagram showing the load balancer interface
  2. Add an HTTPS listener on port 443 for the web load balancer. The default action for the listener is to forward traffic to the web instance target group. Refer to create an HTTPS listener for your Application Load Balancer for more details.
    Diagram showing the load balancer add listener interface

Reference the SSL certificate for your domain. In the following example, I have used a certificate from AWS Certificate Manager (ACM) for my domain. You also have the option of using a certificate from AWS Identity and Access Management (IAM) or importing your own certificate.

Diagram showing the secure listener settings interface

Update your DNS to route traffic from the external web load balancer

  1. In this example, I use Amazon Route 53 as the Domain Name System (DNS) service, but the steps will be similar when using your own DNS service.
  2. Create an A record type that routes traffic from your domain name to the external web load balancer. For more details, refer to creating records by using the Amazon Route 53 console.
    Diagram showing the hosted zone interface

Validate the web application

  1. In your browser, access the following URL: https://<yourdomain.example.com>/sample-webapp
    Diagram showing the log in page for the sample web application.
  2. Select “Amazon Cognito” to authenticate using Cognito as the Identity Provider (IdP). You will be redirected to the login page for your Cognito domain.
    Diagram showing the sign in page provided by Amazon Cognito
  3. Select “Sign up” to create a new user and enter your email and password. Note the password strength requirements, which can be configured as part of the user pool’s policies.
    Diagram showing the sign up page provided by Amazon Cognito
  4. An email with the verification code will be sent to the sign-up email address. Enter the code on the verification code screen.
    Diagram showing the account confirmation page with verification code provided by Amazon Cognito
  5. After successful confirmation, you will be re-directed to the authenticated landing page for the web application.
    Diagram showing the main page with the list of contacts for the sample web application.
  6. The simple web application allows you to add, edit, and delete contacts as shown in the following image.
    Diagram showing the list of contacts for the sample web application with edit and delete functionality.

Validate the session data on Redis

  1. Follow the steps outlined in connecting to nodes for details on connecting to your Redis cache cluster. You will need to connect to your application server instance (via the bastion host) to perform this as the Redis cache is only accessible from the private subnet.
  2. After successfully installing the Redis client, search for your authenticated user session key in the cluster by running the command (from within the ‘redis-stable’ directory):
    src/redis-cli -c -h <RedisClusterEndpoint> -p 6379 --bigkeys

  3. You should see an output with your Tomcat authenticated session (if you can’t, perform another login via the Cognito login screen):
    # Scanning the entire keyspace to find biggest keys as well as
    # average sizes per key type.  You can use -i 0.1 to sleep 0.1 sec
    # per 100 SCAN commands (not usually needed).
    
    [00.00%] Biggest hash   found so far '"redisson:tomcat_session:AE647D93F2BECEFEE07B5B42C435E3DE"' with 8 fields

  4. Connect to the cache cluster:
    # src/redis-cli -c -h <RedisClusterEndpoint> -p 6379

  5. Run the HGETALL command to get the session details:
    java-webapp-redis-cluster.<xxxxxx>.0001.apse2.cache.amazonaws.com:6379> HGETALL "redisson:tomcat_session:AE647D93F2BECEFEE07B5B42C435E3DE"
     1) "session:creationTime"
     2) "\x04L\x00\x00\x01}\x16\x92\x1bX"
     3) "session:lastAccessedTime"
     4) "\x04L\x00\x00\x01}\x16\x92%\x9c"
     5) "session:principal"
     6) "\x04\x04\t>@org.apache.catalina.realm.GenericPrincipal$SerializablePrincipal\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x04>\x04name\x16\x00>\bpassword\x16\x00>\tprincipal\x16\x00>\x05roles\x16\x00\x16>\[email protected]>\[email protected]\x01B\x01\x14>\bstandard"
     7) "session:maxInactiveInterval"
     8) "\x04K\x00\x00\a\b"
     9) "session:isValid"
    10) "\x04P"
    11) "session:authtype"
    12) "\x04>\x04FORM"
    13) "session:isNew"
    14) "\x04Q"
    15) "session:thisAccessedTime"
    16) "\x04L\x00\x00\x01}\x16\x92%\x9c"
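The binary values in this output can be made readable with a few lines of Python. The sketch below is an inference from the output above, not a documented format: it assumes each numeric value is a two-byte type prefix followed by a big-endian integer. Under that assumption, `session:maxInactiveInterval` decodes to 1800, which is consistent with a 30-minute session timeout expressed in seconds, and `session:creationTime` decodes to a timestamp in milliseconds since the Unix epoch.

```python
from datetime import datetime, timezone


def decode_integer(value: bytes) -> int:
    """Drop the assumed two-byte type prefix and read the rest as a big-endian integer."""
    return int.from_bytes(value[2:], byteorder="big")


# Values copied from the HGETALL output above.
creation_time_ms = decode_integer(b"\x04L\x00\x00\x01}\x16\x92\x1bX")
max_inactive_interval = decode_integer(b"\x04K\x00\x00\a\b")

# Interpreting the creation time as epoch milliseconds is part of the assumption.
print(datetime.fromtimestamp(creation_time_ms / 1000, tz=timezone.utc))
print(max_inactive_interval)
```

If your session codec differs from the one used here, the prefix length and integer width may differ as well.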

Scale your web and application server instances

  1. Amazon EC2 Auto Scaling provides several ways for you to scale instances in your Auto Scaling group such as scaling manually as we did in an earlier step. But you also have the option to scale dynamically to meet changes in demand (such as maintaining CPU Utilization at 50%), predictively scale in advance of daily and weekly patterns in traffic flows, or scale based on a scheduled time. Refer to scaling the size of your Auto Scaling group for more details.
    Diagram showing the auto scaling groups interface and scaling policies
  2. We will create a scheduled action to provision another application server instance.
    Diagram showing the auto scaling group's create schedule action interface.
  3. As per our scheduled action, at 11:30 AM, an additional application server instance is started.
    Diagram showing the activity history for the instance.
  4. Under instance management, you will see an additional instance in ‘Pending’ state as it starts.
    Diagram showing the auto scaling groups interface and additional instances.
  5. To test the stateless nature of your application, manually stop the original application server instance and observe that your end-user experience is unaffected: you are not prompted to re-authenticate and can continue using the application, because your session data is stored in ElastiCache for Redis and is not tied to the original instance.
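The scheduled action created in the console above could also be defined in code. This is a sketch only, assuming boto3 is available; the action name `add-app-server` and the group name are hypothetical, and the desired capacity of 2 matches the maximum capacity set earlier in this walkthrough.

```python
from datetime import datetime, timezone


def scheduled_scale_out(group_name, start_time, desired=2):
    """Build the arguments for a PutScheduledUpdateGroupAction call."""
    if start_time.tzinfo is None:
        raise ValueError("start_time must be timezone-aware")
    return {
        "AutoScalingGroupName": group_name,
        "ScheduledActionName": "add-app-server",  # hypothetical name
        "StartTime": start_time.astimezone(timezone.utc),
        "DesiredCapacity": desired,
    }


def create_scheduled_action(group_name, start_time, desired=2):
    """Register the scheduled action; boto3 is imported lazily."""
    import boto3

    client = boto3.client("autoscaling")
    client.put_scheduled_update_group_action(
        **scheduled_scale_out(group_name, start_time, desired)
    )
```

Requiring a timezone-aware start time avoids ambiguity about whether a value such as 11:30 AM refers to local time or UTC.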

Cleaning up

To avoid incurring future charges, remove the resources by deleting the java-webapp-components, java-webapp-rds and java-webapp-infra CloudFormation stacks.
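If you prefer to script the cleanup, the following sketch deletes the three stacks one at a time and waits for each deletion to finish. It assumes boto3 is installed, and it assumes that the order in which the stacks are listed above is a safe deletion order, because the components stack imports values exported by the other stacks.

```python
STACKS_IN_DELETION_ORDER = (
    "java-webapp-components",
    "java-webapp-rds",
    "java-webapp-infra",
)


def delete_stacks(stack_names=STACKS_IN_DELETION_ORDER, client=None):
    """Delete each stack and wait for completion before starting the next one."""
    if client is None:
        import boto3  # imported lazily so the function can be tested with a stub

        client = boto3.client("cloudformation")
    waiter = client.get_waiter("stack_delete_complete")
    deleted = []
    for name in stack_names:
        client.delete_stack(StackName=name)
        waiter.wait(StackName=name)
        deleted.append(name)
    return deleted
```

Waiting between deletions keeps a stack that exports values from being removed while another stack still imports them.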

Conclusion

Customers with significant investments in Java application server technologies have options to migrate to the cloud without requiring a complete re-architecture of their applications. In this blog, we’ve shown an approach to modernizing Java applications running on Tomcat Application Server in AWS. In doing so, we took advantage of cloud-native features such as automatic scaling and provisioning infrastructure as code, and we used managed services (such as ElastiCache for Redis and Amazon RDS) to make our application stateless. We also demonstrated modernization features such as authentication and user provisioning via an external IdP (Amazon Cognito). For more information on different re-platforming patterns, refer to the AWS Prescriptive Guidance on Migration.

Managing and Securing AWS Outposts Instances using AWS Systems Manager, Amazon Inspector, and Amazon GuardDuty

Post Syndicated from sbbusser original https://aws.amazon.com/blogs/compute/managing-and-securing-aws-outposts-instances-using-aws-systems-manager-amazon-inspector-and-amazon-guardduty/

This post is written by Sumeeth Siriyur, Specialist Solutions Architect.

AWS Outposts is a family of fully managed solutions that deliver AWS infrastructure and services to virtually any on-premises or edge location for a truly consistent hybrid experience. Outposts is ideal for workloads that need low latency access to on-premises applications or systems, local data processing, and secure storage of sensitive customer data that must remain in a location without an AWS Region, including inside company-controlled environments or a specific country.

A key feature of Outposts is that it offers the same AWS hardware infrastructure, services, APIs, and tools to build and run your applications on premises as in AWS Regions. Outposts is part of the cloud for a truly consistent hybrid experience. AWS compute, storage, database, and other services run locally on Outposts, and you can access the full range of AWS services available in the Region to build, manage, and scale your on-premises applications using familiar AWS services and tools.

Outposts comes in a variety of form factors, from 1U and 2U servers to 42U Outposts rack. This post focuses on the 42U form factor of Outposts.

This post demonstrates how to use some of the existing AWS services in the Region, such as AWS Systems Manager (SSM), Amazon Inspector, and Amazon GuardDuty, to manage and secure your workload environment on Outposts rack. This is no different from how you use these services for workloads in the AWS Regions.

Solution overview

In this scenario, Outposts rack is installed locally on the customer’s premises. The service link connectivity to the AWS Region can be via an AWS Direct Connect private virtual interface, a public virtual interface, or the public internet.

The local gateway (LGW) provides connectivity between the Outposts instances and the local on-premises network.

A virtual private cloud (VPC) spans all Availability Zones in its AWS Region. You can extend the VPC in the Region to the Outpost by adding an Outpost subnet. To add an Outpost subnet to a VPC, specify the Amazon Resource Name (ARN) – arn:aws:outposts:region:account-id – of the Outpost when you create the subnet. Outposts rack supports multiple subnets. In this scenario, we have extended the VPC from the Region (us-west-2) to the Outpost.

To improve the security posture of the Outposts instance, you can configure AWS SSM to use an interface VPC endpoint in Amazon Virtual Private Cloud (VPC). An interface VPC endpoint lets you connect to services powered by AWS PrivateLink, a technology that lets you privately access AWS SSM APIs by using private IP addresses. See the details in the following AWS SSM section for the VPC endpoints.

Most importantly, to leverage any of the AWS services in the Region, Outposts rack relies on connectivity to the parent AWS Region. Outposts rack is not designed for disconnected operations or environments with limited to no connectivity. We recommend that you have highly-available networking connections back to your AWS Region. For an optimal experience and resiliency, AWS recommends that you use redundant connectivity of at least 500 Mbps (1 Gbps or higher) for the service link connection to the AWS Region.

An overview of the AWS Outposts setup and connectivity back to the region.

Outposts offers a consistent experience with the same hardware infrastructure, services, APIs, management, and operations on-premises as in the AWS Regions. Unlike other hybrid solutions that require different APIs, manual software updates, and purchase of third-party hardware and support, Outposts enables developers and IT operations teams to achieve the same pace of innovation across different environments.

In the first section, let’s see how we can use AWS SSM services for managing and operating Outposts instances.

Managing Outposts instances using AWS SSM

The AWS Systems Manager Agent (SSM Agent) is installed and running on the Outposts instances.

SSM Agent is installed by default on Amazon Linux, Amazon Linux 2, Ubuntu Server 16.04, and Ubuntu Server 18.04 LTS based Amazon Elastic Compute Cloud (EC2) AMIs. If SSM Agent isn’t preinstalled, then you must manually install the agent. Agent communication with SSM is via TCP port 443.

Linux: Manually install SSM Agent on EC2 instances for Linux

Windows: Manually install SSM Agent on EC2 instances for Windows Server

  1. Create an IAM instance profile for SSM

By default, SSM doesn’t have permission to perform actions on your instances. Grant access by using an AWS Identity and Access Management (IAM) instance profile. An instance profile is a container that passes IAM role information to an Amazon EC2 instance at launch. You can create an instance profile for SSM by attaching one or more IAM policies that define the necessary permissions to a new role or to a role that you already created. Make sure that you follow AWS best practices by having a least-privileges policy created.
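As an illustration of this step, the sketch below creates a role that EC2 can assume, attaches the AWS managed policy `AmazonSSMManagedInstanceCore`, and wraps the role in an instance profile. It assumes boto3 is installed; the profile name is hypothetical, and the managed policy is used here only as a starting point that you would narrow to meet your own least-privilege requirements.

```python
import json

# AWS managed policy that grants the core permissions SSM Agent needs.
SSM_MANAGED_POLICY_ARN = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"


def ec2_trust_policy():
    """Trust policy that lets EC2 instances assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "ec2.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def create_ssm_instance_profile(name):
    """Create a role and an instance profile that share the same name."""
    import boto3  # imported lazily so ec2_trust_policy() has no dependencies

    iam = boto3.client("iam")
    iam.create_role(
        RoleName=name,
        AssumeRolePolicyDocument=json.dumps(ec2_trust_policy()),
    )
    iam.attach_role_policy(RoleName=name, PolicyArn=SSM_MANAGED_POLICY_ARN)
    iam.create_instance_profile(InstanceProfileName=name)
    iam.add_role_to_instance_profile(InstanceProfileName=name, RoleName=name)
```

For example, `create_ssm_instance_profile("outposts-ssm-profile")` would produce a profile that can be attached to the Outposts instances at launch.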

  2. Create VPC endpoints for SSM.

a. com.amazonaws.us-west-2.ssm: The endpoint for the Systems Manager service.

b. com.amazonaws.us-west-2.ec2messages: Systems Manager uses this endpoint to make calls from the SSM Agent to the Systems Manager service.

c. com.amazonaws.us-west-2.ec2: If you’re using Systems Manager to create VSS-enabled snapshots, then you must make sure that you have an endpoint to the EC2 service. Without the EC2 endpoint defined, a call to enumerate attached Amazon Elastic Block Store (EBS) volumes fails, which causes the Systems Manager command to fail.

d. com.amazonaws.us-west-2.ssmmessages: This endpoint is for connecting to your instances with a secure data channel using Session Manager.

e. com.amazonaws.us-west-2.s3: Systems Manager uses this endpoint to update SSM Agent, perform patching operations, and upload logs into Amazon Simple Storage Service (S3) buckets.
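The endpoint list above can be generated for any Region. The sketch below assumes the full service names take the form `com.amazonaws.<region>.<service>`, as in the Amazon Inspector endpoint name used later in this post. The `create_interface_endpoints` helper is hypothetical; it assumes boto3 is installed and that every service is created as an interface endpoint, which may not suit your network design for Amazon S3.

```python
SSM_ENDPOINT_SERVICES = ("ssm", "ec2messages", "ec2", "ssmmessages", "s3")


def endpoint_service_names(region, services=SSM_ENDPOINT_SERVICES):
    """Return the VPC endpoint service names for the given Region."""
    return [f"com.amazonaws.{region}.{service}" for service in services]


def create_interface_endpoints(vpc_id, subnet_ids, security_group_ids, region):
    """Create one interface endpoint per service; boto3 is imported lazily."""
    import boto3

    ec2 = boto3.client("ec2", region_name=region)
    for service_name in endpoint_service_names(region):
        ec2.create_vpc_endpoint(
            VpcEndpointType="Interface",
            VpcId=vpc_id,
            ServiceName=service_name,
            SubnetIds=subnet_ids,
            SecurityGroupIds=security_group_ids,
        )
```

The security groups attached to the endpoints need to allow inbound TCP port 443 from the Outposts instances, because that is the port SSM Agent uses.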

  3. Once SSM Agent has been installed and the necessary permissions have been granted to Systems Manager, log in to the Systems Manager console and navigate to Fleet Manager to discover the Outposts instances as shown in the following image.

Fleet Manager to discover the Outposts instances.

4. You can use compliance to scan the Outposts instances for patch compliance and configuration inconsistencies.

Compliance to scan the Outposts instances for patch compliance and configuration inconsistencies.

5. AWS Systems Manager Inventory provides visibility into your Outposts computing environment. You can use this inventory to collect metadata about the instances.

AWS SSM inventory to collect metadata about the instances.

6. With Session Manager, you can log in to your Outposts instances. You can use either an interactive one-click browser-based shell, or the AWS Command Line Interface (CLI) for Linux based EC2 instances. For Windows instances, you can connect using Remote Desktop Protocol (RDP). For details, check out how to connect to Windows instances from the Fleet Manager console.

Note that accessing the Outposts EC2 instances through SSH or RDP via the Region based Session Manager will have more latency via service link than accessing via the LGW.

Session Manager to connect to Outposts EC2 instances.

7. Patch Manager automates the process of patching the Outposts instances with both security-related and other types of updates. In the following image, you can see that one of the Outposts instances is scanned and updated with an operational update.

AWS SSM Patch Manager to patch the Outposts Instances.

Security at AWS is the highest priority. Security is a shared responsibility between AWS and customers. We offer the same security tools and procedures to secure the Outposts instances as in the AWS Region. By using AWS services, you can enhance your security posture on Outposts rack in these areas.

In the second section, let’s see how we can use Amazon Inspector running in the AWS Region to scan for vulnerabilities within the Outposts environment. Amazon Inspector uses the widely deployed SSM Agent to automatically scan for vulnerabilities on Outposts instances.

Scan Outposts instances for vulnerabilities using Amazon Inspector

Amazon Inspector is an automated vulnerability management service that continually scans AWS workloads for software vulnerabilities and unintended network exposure. Amazon Inspector automatically discovers all of the Outposts EC2 instances (installed with SSM Agent) and container images residing in Amazon Elastic Container Registry (ECR) that are identified for scanning. Then, it immediately starts scanning them for software vulnerabilities and unintended network exposure.

All workloads are continually rescanned when a new Common Vulnerabilities And Exposures (CVE) is published, or when there are changes in the workloads, such as installation of new software in an Outposts EC2 instance.

Amazon Inspector uses the widely deployed SSM Agent (deployed in the previous scenario) to collect the software inventory and configurations from your Outposts EC2 instances. Use the VPC interface endpoint – com.amazonaws.us-west-2.inspector2 – to privately access Amazon Inspector. The collected application inventory and configurations are used to assess workloads for vulnerabilities.

  1. The following Summary Dashboard shows how many Outposts EC2 instances and container repositories have been discovered and scanned.

Amazon Inspector Summary Console.

2. The findings by Vulnerability tab helps to identify the most vulnerable Outposts EC2 instances in your environment. In the following image, you can see Outposts instances with the following vulnerabilities highlighted.

a. Port range 0 to 65535 is reachable from an Internet Gateway

b. Port 22 is reachable from an Internet Gateway

Amazon Inspector Vulnerability console.

3. The findings by instance tab shows you all of the active findings for a single Outposts instance in your environment. In the following image, you can see that for this instance there are a total of 12 high and 19 medium findings based on the rules in the Common Vulnerabilities And Exposures (CVE) package.

Amazon Inspector Instances Console.

In the last section, let’s see how we can use GuardDuty to detect any threats within the Outposts environment.

Threat Detection service for your AWS accounts and Outposts workloads using Amazon GuardDuty

GuardDuty is a threat detection service that continuously monitors your AWS accounts and workloads for malicious activities and delivers detailed security findings for visibility and remediation.

GuardDuty continuously monitors and analyses the Outposts instances and reports suspicious activities using the GuardDuty console. It gets this information from CloudTrail Management Events, VPC Flow Logs, and DNS logs.

In this scenario, GuardDuty has detected an SSH brute force attack against an Outposts instance.

Amazon GuardDuty threat detection console.

Costs associated with the scenario

  • Systems Manager: With AWS Systems Manager, you pay only for what you use of the priced features. In this scenario, we have used the following features.
    1. Inventory – No additional charges
    2. Session Manager – No additional charges
    3. Patch Manager – No additional charges

*Note that there will be charges for the VPC endpoint created.

  • Amazon Inspector: Costs for Amazon Inspector are based on the container images scanned in Amazon ECR and the EC2 instances being scanned.
    1. In the us-west-2 Region, the price is $1.258 per instance, based on the average number of EC2 instances scanned per month. In the above scenario, there are three instances within the Outposts, so 3 x $1.258 = $3.774 per month.
  • Amazon GuardDuty: VPC Flow logs and CloudWatch logs are used for GuardDuty analysis. In this scenario, only VPC Flow logs are considered.
    1. VPC Flow log analysis is charged per GB/month. In the us-west-2 Region, the first 500 GB/month is $1 per GB. In the above scenario, there are three instances within the Outposts that would generate approximately 80 MB of data, which is still within the 500 GB tier.
  • Understand more about AWS Outposts rack pricing on our website.
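The arithmetic behind these estimates can be reproduced with a short calculation. The prices below are the ones quoted in this post for us-west-2 and may have changed since; the conversion of 1 GB to 1,000 MB is an assumption made for illustration, and the resulting flow log figure is an estimate that this post does not state directly.

```python
from decimal import Decimal

# Prices quoted in this post for the us-west-2 Region.
INSPECTOR_PRICE_PER_INSTANCE = Decimal("1.258")
FLOW_LOG_PRICE_PER_GB = Decimal("1")  # first 500 GB per month


def inspector_monthly_cost(instance_count):
    """Monthly Amazon Inspector cost for EC2 scanning."""
    return INSPECTOR_PRICE_PER_INSTANCE * instance_count


def flow_log_monthly_cost(megabytes):
    """Monthly GuardDuty cost for VPC Flow log analysis (assumes 1 GB = 1,000 MB)."""
    return FLOW_LOG_PRICE_PER_GB * Decimal(megabytes) / Decimal(1000)


print(inspector_monthly_cost(3))
print(flow_log_monthly_cost(80))
```

Using `Decimal` instead of floating-point numbers keeps the currency amounts exact.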

Cleaning up

Please delete example resources if they are no longer needed to avoid incurring future costs.

  • Amazon Inspector: Disable Amazon Inspector from the Amazon Inspector Console.
  • Amazon GuardDuty: You can use the GuardDuty console to suspend or disable GuardDuty. You are not charged for using GuardDuty when the service is suspended.
  • Delete unused IAM policies

Conclusion

On-premises data centers traditionally use a variety of infrastructure, tools, and APIs. This disparate assortment of hardware and software solutions results in complexity. In turn, this leads to greater management costs, inability of staff to translate skills from one setting to another, and limits in innovation and knowledge-sharing between environments.

Using a common set of tools and services in the AWS Regions and on Outposts on premises allows you to have a consistent operating environment, thereby delivering a true hybrid cloud experience. Equally, by using the same tools to deploy and manage workloads in both environments, you can reduce operational overhead.

To get started with Outposts, see AWS Outposts Family. For more information about Outposts availability, see the Outposts rack FAQ.

Automate the Creation of On-Demand Capacity Reservations for running EC2 instances

Post Syndicated from sbbusser original https://aws.amazon.com/blogs/compute/automate-the-creation-of-on-demand-capacity-reservations-for-running-ec2-instances/

This post is written by Ballu Singh, a Principal Solutions Architect at AWS, Neha Joshi, a Senior Solutions Architect at AWS, and Naveen Jagathesan, a Technical Account Manager at AWS.

Customers have asked how they can create On-Demand Capacity Reservations (ODCRs) for their existing instances during events such as the holiday season, Black Friday, marketing campaigns, or others.

ODCRs let you reserve compute capacity for your Amazon Elastic Compute Cloud (Amazon EC2) instances. ODCRs make sure that you always have access to EC2 capacity when required, and for as long as you need it. Customers who want to make sure that instances stopped and started during a critical event are available when needed should cover them with ODCRs.

ODCRs let you reserve compute capacity for your Amazon EC2 instances in a specific availability zone for any duration. This means that you can create and manage capacity reservations independently from the billing discounts offered by Savings Plans or Regional Reserved Instances. You can create ODCR at any time, without entering into a one-year or three-year term commitment, and the capacity is available immediately. Billing starts as soon as the ODCR enters the active state. When you no longer need it, cancel the ODCR to stop incurring charges.

At the time of this blog publication, if you need to create ODCRs for existing running instances, you must manually identify the configuration of your running instances and create reservations with matching attributes, such as instance type, platform, and Availability Zone. This is a time- and resource-consuming process.

In this post, we provide an automated way to manage ODCR operations. This includes creating, modifying, and cancelling ODCRs for the running instances across regions in an account, all without requiring any manual intervention of specifying instance configuration attributes. Additionally, it creates an Amazon CloudWatch Alarm for InstanceUtilization and an Amazon Simple Notification Service (Amazon SNS) topic with topic name ODCRAlarmNotificationTopic to notify when the threshold breaches.

Note: This will not create cluster placement group ODCRs. For details on capacity reservations in cluster placement groups, refer here.

Getting started

Before you create Capacity Reservations, note the limitations and restrictions here.

To get started, download the scripts for registering, modifying, and canceling ODCRs and associated requirements.txt, as well as AWS Identity and Access Management (IAM) policy from the GitHub link here.

Pre-requisites

To implement these scripts, you need the following prerequisites:

  1. Access to the AWS Management Console, AWS Command Line Interface (CLI), or AWS SDK for ODCR.
  2. The IAM permissions provided in ODCR_IAM.json, granted to the IAM users who run the solution.
  3. Amazon EC2 instances running a platform supported by Capacity Reservations. Capacity Reservations support the platforms listed here for Linux and Windows.
  4. Refer to the above GitHub link for the code, and save the requirements.txt file in the same directory as the other Python scripts. If you don’t already have the dependencies needed to run the Python scripts, install them from requirements.txt using the following command:
pip3 install -r requirements.txt

Implementation Details

To create ODCR capacity reservation

The following instructions will guide you through creating a capacity reservation of running instances across all of the Regions within an AWS account.
Input variables needed from users:

  • EndDateType (String) – Indicates how the Capacity Reservation ends. A Capacity Reservation can have one of the following end types:
      • unlimited – The Capacity Reservation remains active until you explicitly cancel it. Don’t provide an EndDate if the EndDateType is unlimited.
      • limited – The Capacity Reservation expires automatically at a specified date and time. You must provide an EndDate value if the EndDateType value is limited.
  • EndDate (datetime) – The date and time when the Capacity Reservation expires. When a Capacity Reservation expires, the reserved capacity is released and you can no longer launch instances into it. The Capacity Reservation’s state changes to expired when it reaches its end date and time.

  • You must provide EndDateType as ‘limited’ and the EndDate in standard UTC format to secure instances for a limited period. Command to execute the register ODCR script with a limited period:
registerODCR.py '<EndDateType>' '<EndDate>'
    Example- registerODCR.py 'limited' '2022-01-31 14:30:00'
  • You must provide EndDateType as ‘unlimited’ to secure instances for an unlimited period. Command to execute the register ODCR script with an unlimited period:
registerODCR.py '<EndDateType>'
    Example- registerODCR.py 'unlimited'
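To make the relationship between the two inputs concrete, here is a minimal sketch of how they could be validated and turned into arguments for the EC2 CreateCapacityReservation API. This is not the code from registerODCR.py; the function name is hypothetical, and the date format is inferred from the example above.

```python
from datetime import datetime, timezone


def reservation_end_arguments(end_date_type, end_date=None):
    """Validate EndDateType and EndDate and return the matching API arguments."""
    if end_date_type == "unlimited":
        if end_date is not None:
            raise ValueError("do not provide EndDate when EndDateType is unlimited")
        return {"EndDateType": "unlimited"}
    if end_date_type == "limited":
        if end_date is None:
            raise ValueError("EndDate is required when EndDateType is limited")
        # The input is interpreted as UTC, matching the format in the example.
        parsed = datetime.strptime(end_date, "%Y-%m-%d %H:%M:%S")
        return {
            "EndDateType": "limited",
            "EndDate": parsed.replace(tzinfo=timezone.utc),
        }
    raise ValueError("EndDateType must be 'limited' or 'unlimited'")
```

Rejecting an EndDate for an unlimited reservation up front surfaces a mistyped command before any API call is made.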

The registerODCR.py script does the following four things:

1. Describes instances across Regions in an account. It checks for instances where:

    • No Capacity reservation
    • State of the instance is running
    • Tenancy is default
    • InstanceLifecycle is None (this field indicates whether the instance is a Spot Instance or a Scheduled Instance)

Note: Describe instances API call is counted toward your account API limit. Therefore, it is advisable to run the script during non-peak hours or before the short-term scaling event begins. Work with AWS Support team if you run into API throttling.

2. Aggregates instances with similar attributes, such as InstanceType, AvailabilityZone, Tenancy, and Platform.

3. Describes Reserved Instances across Regions in an account. It checks for zonal Reserved Instances (ZRIs) and compares them with the aggregated instances that have similar attributes.

4. Finally,

    • Reserves ODCR(s) for existing running instances with matching attributes for which ZRIs do not exist.

Note: If you have one or more ZRIs in an account, then the script compares them with the existing instances with matching characteristics – Instance Type, AZ, and Platform – and does NOT create ODCR for the ZRIs to avoid incurring redundant charges. If there are more running instances than ZRIs, then the script creates an ODCR for just the delta.
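The selection, aggregation, and delta logic described so far can be sketched as follows. This is a minimal illustration, not the code of registerODCR.py: the function names are hypothetical, the instance dictionaries only mimic the shape of a describe-instances result, and the ZRI counts are assumed to be already keyed by the same attributes.

```python
from collections import Counter


def instance_key(instance):
    """Attributes used to group instances with similar characteristics."""
    return (
        instance["InstanceType"],
        instance["Placement"]["AvailabilityZone"],
        instance["Placement"]["Tenancy"],
        instance.get("PlatformDetails", "Linux/UNIX"),
    )


def is_candidate(instance):
    """Running, default tenancy, no lifecycle value, and no Capacity Reservation."""
    return (
        instance["State"]["Name"] == "running"
        and instance["Placement"]["Tenancy"] == "default"
        and instance.get("InstanceLifecycle") is None
        and instance.get("CapacityReservationId") is None
    )


def odcr_counts(instances, zri_counts):
    """Return how many instances per group still need an ODCR after ZRIs."""
    running = Counter(instance_key(i) for i in instances if is_candidate(i))
    # Counter subtraction drops groups whose result is zero or negative,
    # so groups fully covered by ZRIs get no ODCR.
    return running - Counter(zri_counts)


def make(instance_type, az, state="running"):
    """Build a sample instance record (made-up data for the demo)."""
    return {
        "InstanceType": instance_type,
        "Placement": {"AvailabilityZone": az, "Tenancy": "default"},
        "State": {"Name": state},
        "PlatformDetails": "Linux/UNIX",
    }


instances = [make("m5.large", "us-east-1a") for _ in range(3)]
instances.append(make("c5.xlarge", "us-east-1b", state="stopped"))
zris = {("m5.large", "us-east-1a", "default", "Linux/UNIX"): 2}
print(odcr_counts(instances, zris))
```

With three running m5.large instances and two matching ZRIs, only the delta of one instance is left to reserve, and the stopped instance is ignored.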

    • Creates an SNS topic with the topic name – ODCRAlarmNotificationTopic in the region where you’re registering ODCR, if it doesn’t already exist.
    • Creates CloudWatch alarm for InstanceUtilization using the best practices, which can be found here.

Note: To receive notifications, you must subscribe to the SNS topic and confirm the subscription, if you haven’t already.

The CloudWatch alarm is also created on your behalf in the region for each ODCR. This alarm monitors your ODCR metric, InstanceUtilization. Whenever it breaches the threshold (50% in this case), it enters the alarm state and sends an SNS notification to the topic that was created for you. You receive the notification only if you subscribed to the topic.

Note: You can change the alarm threshold based on your specific needs.

  • You will receive an email notification when the CloudWatch alarm state changes to ALARM, with:
    • SNS subject (assuming the CloudWatch alarm triggers in the US East (N. Virginia) Region):
ALARM: "ODCRAlarm-cr-009969c7abf4daxxx" in US East (N. Virginia)
    • SNS body, which has the details:
      • CloudWatch alarm, Region, link to view the alarm, alarm details, and state change actions.

With this, if your ODCR InstanceUtilization drops, then you will be notified in near-real time to help you optimize the capacity and stop unnecessary payments for unused capacity.
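For illustration, the alarm described above could be expressed as parameters for the CloudWatch put_metric_alarm call in boto3. This is a sketch under assumptions rather than the script's actual settings: the helper name, the single five-minute evaluation period, the LessThanThreshold comparison, and the placeholder topic ARN are all assumed. The namespace, metric name, alarm name pattern, and the 50% threshold follow this post.

```python
def odcr_alarm_params(capacity_reservation_id, topic_arn, threshold=50.0):
    """Build put_metric_alarm keyword arguments for one Capacity Reservation.

    Hypothetical helper; evaluation settings are assumptions for the sketch.
    """
    return {
        "AlarmName": f"ODCRAlarm-{capacity_reservation_id}",
        "Namespace": "AWS/EC2CapacityReservations",
        "MetricName": "InstanceUtilization",
        "Dimensions": [
            {"Name": "CapacityReservationId", "Value": capacity_reservation_id}
        ],
        "Statistic": "Average",
        "Period": 300,  # the metric is published every five minutes
        "EvaluationPeriods": 1,  # assumption: alarm on a single low datapoint
        "Threshold": threshold,
        "ComparisonOperator": "LessThanThreshold",  # assumption: alert on a drop
        "AlarmActions": [topic_arn],
    }


# Placeholder ARN with a made-up account ID, for the demo only.
params = odcr_alarm_params(
    "cr-009969c7abf4daxxx",
    "arn:aws:sns:us-east-1:111122223333:ODCRAlarmNotificationTopic",
)
print(params["AlarmName"])

# With credentials configured, the parameters could be applied like this:
# import boto3
# boto3.client("cloudwatch", region_name="us-east-1").put_metric_alarm(**params)
```

Keeping the threshold as a parameter reflects the earlier note that you can change it based on your needs.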

To modify ODCR capacity reservation

To modify the attributes of an active capacity reservation after you have created it, adhere to the following instructions.

Note: When modifying a Capacity Reservation, you can only increase or decrease the quantity and change how it is released. You can’t change the instance type, EBS optimization, instance store settings, platform, Availability Zone, or instance eligibility of a Capacity Reservation. If you must modify any of these attributes, then we recommend that you cancel the reservation, and then create a new one with the required attributes. You can’t modify a Capacity Reservation after it has expired or after you have explicitly canceled it.

  • Input variables needed from users:
    • CapacityReservationID – The ID of the Capacity Reservation that you want to modify.
    • InstanceCount (integer) – The number of instances for which to reserve capacity. The number of instances can’t be increased or decreased by more than 1000 in a single request.
    • EndDateType (String) – Indicates how the Capacity Reservation ends. A Capacity Reservation can have one of the following end types:
      • unlimited – The Capacity Reservation remains active until you explicitly cancel it. Don’t provide an EndDate if the EndDateType is unlimited.
      • limited – The Capacity Reservation expires automatically at a specified date and time. You must provide an EndDate value if the EndDateType value is limited.
    • EndDate (datetime) – The date and time when the Capacity Reservation expires. When a Capacity Reservation expires, the reserved capacity is released, and you can no longer launch instances into it. The Capacity Reservation’s state changes to expired when it reaches its end date and time.
  • Command to execute modify ODCR script:
    modifyODCR.py <CapacityReservationId> <InstanceCount> <EndDateType> <EndDate> 
  • Example to execute the modify ODCR script for limited period:
modifyODCR.py 'cr-05e6a94b99915xxxx' '1' 'limited' '2022-01-31 14:30:00'

Note: EndDate is in the standard UTC time.

  • You must provide EndDateType as ‘unlimited’ to modify instances for an unlimited period. Command to execute the modify ODCR script with an unlimited period:
modifyODCR.py <CapacityReservationId> <InstanceCount> <EndDateType>
  • Example to execute the modify ODCR script for unlimited period:
modifyODCR.py 'cr-05e6a94b99915xxxx' '1' 'unlimited'
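The modification rules above can be checked before a request is sent. The sketch below is a hypothetical helper, not the code of modifyODCR.py; it assumes the caller already knows the current instance count of the reservation and passes the end settings as a small dictionary.

```python
MAX_COUNT_CHANGE = 1000  # per-request limit stated in the input variable list


def modify_params(capacity_reservation_id, current_count, new_count, end_args):
    """Build request parameters for modifying a Capacity Reservation.

    Hypothetical helper; end_args is a dict such as
    {"EndDateType": "unlimited"} or one that also carries an EndDate.
    """
    if new_count < 0:
        raise ValueError("InstanceCount can't be negative")
    if abs(new_count - current_count) > MAX_COUNT_CHANGE:
        raise ValueError(
            f"InstanceCount can't change by more than {MAX_COUNT_CHANGE} in a single request"
        )
    params = {
        "CapacityReservationId": capacity_reservation_id,
        "InstanceCount": new_count,
    }
    params.update(end_args)
    return params


print(modify_params("cr-05e6a94b99915xxxx", 1, 5, {"EndDateType": "unlimited"}))
```

A change larger than 1000 instances would have to be split into several requests, which is why the check compares the new count with the current one rather than with a fixed ceiling.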

To cancel ODCR capacity reservation

To cancel ODCRs that are in the “Active” state, follow these instructions:

Note: Once the cancellation request succeeds, the reservation status will be marked as “cancelled”.

  • Input variables needed from users:
    • CapacityReservationID – The ID of the Capacity Reservation to cancel.
  • You must provide one parameter while executing the cancellation script.
  • Command to execute cancel ODCR script:
cancelODCR.py <CapacityReservationId> 
  • Example to execute the cancel ODCR script:
cancelODCR.py 'cr-05e6a94b99915xxxx'

Monitoring

CloudWatch metrics let you monitor the unused capacity in your Capacity Reservations to optimize the ODCR. ODCRs send metric data to CloudWatch every five minutes. Although Capacity Reservation usage metrics are UsedInstanceCount, AvailableInstanceCount, TotalInstanceCount, and InstanceUtilization, for this solution we will be using the InstanceUtilization metric. This shows the percentage of reserved capacity instances that are currently in use. This will be useful for monitoring and optimizing ODCR consumption.

For example, if your On-Demand Capacity Reservation is for four instances and only one EC2 instance with matching criteria is currently running, then the InstanceUtilization metric will be 25% for that Capacity Reservation.
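The arithmetic behind this example can be written out as a short sketch. The function name is hypothetical, and the inputs stand in for the used and total instance counts that the metrics listed above report.

```python
def instance_utilization(used_instance_count, total_instance_count):
    """Percentage of reserved capacity that is currently in use."""
    if total_instance_count <= 0:
        raise ValueError("The reservation must cover at least one instance")
    return 100.0 * used_instance_count / total_instance_count


# One running instance in a reservation for four instances.
print(instance_utilization(1, 4))  # 25.0
```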

Let’s look at the steps to create the CloudWatch monitoring dashboard for your On-Demand Capacity Reservation solution:

  1. Open the CloudWatch console at https://console.aws.amazon.com/cloudwatch/.
  2. If necessary, change the Region. From the navigation bar, select the Region where your Capacity Reservation resides. For more information, see Regions and Endpoints.
  3. In the navigation pane, choose Metrics.

Amazon CloudWatch Dashboard

For All metrics, choose EC2 Capacity Reservations.

Amazon CloudWatch Dashboard: Metrics

4. Choose the metric dimension By Capacity Reservation. Metrics will be grouped by Capacity Reservation ID.

Amazon CloudWatch Metrics: Capacity Reservation Ids

5. Select the dropdown arrow for InstanceUtilization, and select Search for this only.

Amazon CloudWatch Metrics Filter

Once we see the InstanceUtilization metric in the filter list, select Graph Search.

Amazon CloudWatch Metrics: Graph Search

This displays the InstanceUtilization metrics for the selected period.

Amazon CloudWatch Metrics Duration

OPTIONAL: To display the Capacity Reservation IDs for active metrics only:

    • Navigate to Graphed metrics.

Amazon CloudWatch: Graphed Metrics

    • Under Details column, select Edit math expression.

Amazon CloudWatch Metrics: Math Expression

    • Edit the math expression with the following, and select Apply:
REMOVE_EMPTY(SEARCH('{AWS/EC2CapacityReservations,CapacityReservationId} MetricName="InstanceUtilization"', 'Average', 300))

Amazon CloudWatch Graphed Metrics: Math Expression Apply

This displays the Capacity Reservation IDs for active metrics only.

Amazon CloudWatch Metrics: Active Capacity Reservation Ids

With this configuration, whenever new Capacity Reservations are created, the InstanceUtilization metric for respective Capacity Reservation IDs will be populated.

6. From the Actions drop-down menu, select Add to dashboard.

Amazon CloudWatch Metrics: Add to Dashboard

Select Create new to create a new dashboard for monitoring your ODCR metrics.

Amazon CloudWatch: Create New Dashboard

Specify the new dashboard name, and select Add to dashboard.

Amazon CloudWatch: Create New Dashboard

7. These configuration steps will navigate you to your newly created CloudWatch dashboard under Dashboards.

Amazon CloudWatch Dashboard: ODCR Metrics

Once this is created, if you create new Capacity Reservations, or new instances get added to existing reservations, then those metrics will automatically be added to your CloudWatch Dashboard.

Note: You may see a delay of approximately 5-10 minutes from the point when changes are made to your environment (ODCR operations or instances launch/termination activities) to those changes getting reflected on your CloudWatch Dashboard metrics.
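The console steps above can also be scripted. The following sketch builds a dashboard body that contains one metric widget driven by the same math expression, in the JSON structure that CloudWatch dashboards use. The helper name, widget size, widget title, and the dashboard name in the commented call are assumptions made for the example.

```python
import json

# Same expression as in the Edit math expression step above.
SEARCH_EXPRESSION = (
    "REMOVE_EMPTY(SEARCH('{AWS/EC2CapacityReservations,CapacityReservationId} "
    "MetricName=\"InstanceUtilization\"', 'Average', 300))"
)


def dashboard_body(region, title="ODCR InstanceUtilization"):
    """Return a dashboard body (JSON string) with a single metric widget."""
    widget = {
        "type": "metric",
        "width": 24,
        "height": 6,
        "properties": {
            "metrics": [[{"expression": SEARCH_EXPRESSION, "id": "e1"}]],
            "view": "timeSeries",
            "region": region,
            "period": 300,
            "title": title,
        },
    }
    return json.dumps({"widgets": [widget]})


body = dashboard_body("us-east-1")
print(body)

# With credentials configured, the dashboard could be created like this
# (the dashboard name is a placeholder):
# import boto3
# boto3.client("cloudwatch", region_name="us-east-1").put_dashboard(
#     DashboardName="ODCR-Monitoring", DashboardBody=body
# )
```

Because the widget uses a search expression instead of a fixed list of reservation IDs, new Capacity Reservations show up without editing the dashboard, which matches the behavior described above.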

Conclusion

In this post, we discussed a solution for automating ODCR operations for existing EC2 instances. This included creating capacity reservation, modifying capacity reservation, and cancelling capacity reservation operations that inherit your existing EC2 instances for attribute details. We also discussed monitoring aspects of ODCR metrics using CloudWatch. This solution allows you to automate some of the ODCR operations for existing instances, thereby optimizing and speeding up the entire process.

For more information, see the Target a group of Amazon EC2 On-Demand Capacity Reservations blog post and the Capacity Reservations documentation.

If you have feedback or questions about this post, please submit your comments in the comments section or contact AWS Support.

AWS Week in Review – March 14, 2022

Post Syndicated from Steve Roberts original https://aws.amazon.com/blogs/aws/aws-week-in-review-march-14-2022/

This post is part of our Week in Review series. Check back each week for a quick round up of interesting news and announcements from AWS!

Welcome to the March 14 AWS Week in Review post, and Happy Pi Day! I hope you managed to catch some of our livestreamed Pi day celebration of the 16th birthday of Amazon Simple Storage Service (Amazon S3). I certainly had a lot of fun in the event, along with my co-hosts – check out the end of this post for some interesting facts and fun from the day.

First, let’s dive right into the news items and launches from the last week that caught my attention.

Last Week’s Launches
New X2idn and X2iedn EC2 Instance Types – Customers with memory-intensive workloads and a requirement for high networking bandwidth may be interested in the newly announced X2idn and X2iedn instance types, which are built on the AWS Nitro system. Featuring third-generation Intel Xeon Scalable (Ice Lake) processors, these instance types can yield up to 50 percent higher compute price performance and up to 45 percent higher SAP Application Performance Standard (SAPS) performance than comparable X1 instances. If you’re curious about the suffixes on those instance type names, they specify processor and other information. In this case, the i suffix indicates that the instances are using an Intel processor, e means it’s a memory-optimized instance family, d indicates local NVMe-based SSDs physically connected to the host server, and n means the instance types support higher network bandwidth up to 100 Gbps. You can find out more about the new instance types in this news blog post.

Amazon DynamoDB released two updates – First, an increase in the default service quotas raises the number of tables allowed by default from 256 to 2500 tables. This will help customers working with large numbers of tables. At the same time, the service also increased the allowed number of concurrent table management operations, from 50 to 500. Table management operations are those that create, update, or delete tables. The second update relates to PartiQL, a SQL-compatible query language you can use to query, insert, update, or delete DynamoDB table data. You can now specify a limit on the number of items processed. You’ll find this useful when you know you only need to process a certain number of items, helping reduce the cost and duration of requests.

If you’re coding against the Amazon ECS API, you may want to take a look at the change to UpdateService that now enables you to update load balancers, service registries, tag propagation, and ECS managed tags for a service. Previously, you would have had to delete and recreate the service to make changes to these resources for a service. Now you can do it all with one call, which makes for a less disruptive and more efficient experience. Take a look at the What’s New post for more details.

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

Other AWS News
If you’re analyzing time series data, take a look at this new book on building forecasting models and detecting anomalies in your data. It’s authored by Michael Hoarau, an AI/ML Specialist Solutions Architect at AWS.

March 8 was International Women’s Day and we published a post featuring several women, including fellow news blogger and published author Antje Barth, chatting about their experiences working in Developer Relations at AWS.

Upcoming AWS Events
Check your calendars and sign up for these AWS events:

.NET Application Modernization Webinar (March 23) – Sign up today to learn about .NET modernization, what it is, and why you might want to modernize. The webinar will include a deep dive focusing on the AWS Microservice Extractor for .NET.

AWS Summit Brussels is fast approaching on March 31st. Register here.

Pi Day Fun & Facts
As this post is published, we’re coming to the end of our livestreamed Pi Day event celebrating the 16th birthday of S3 – how time flies! Here are some interesting facts & fun snippets from the event:

  • In the keynote, we learned S3 currently stores over 200 trillion objects, and serves over 100 million requests per second!
  • S3’s Intelligent Tiering has saved customers over $250 million to date.
  • Did you know that S3, having reached 16 years of age, is now eligible for a Washington State driver’s license? Or that it can now buy a lottery ticket, get a passport, or – check this – it can pilot a hang glider!
  • We asked each of our guests on the livestream, and the team of AWS news bloggers, to nominate their favorite pie. The winner? It’s a tie between apple and pecan pie!

That’s all for this week. Check back next Monday for another Week in Review!

— Steve

Extend SQL Server DR using log shipping for SQL Server FCI with Amazon FSx for Windows configuration

Post Syndicated from Yogi Barot original https://aws.amazon.com/blogs/architecture/extend-sql-server-dr-using-log-shipping-for-sql-server-fci-with-amazon-fsx-for-windows-configuration/

This week for Women’s History Month, we’re continuing to feature female authors. We’re showcasing women in the tech industry who are building, creating, and, above all, inspiring, empowering, and encouraging everyone—especially women and girls—in tech.


Companies choosing to rehost their on-premises SQL Server workloads to AWS can face challenges with setting up their disaster recovery (DR) strategy. Solutions such as Always On can be a more expensive, complex configuration across Regions. It can cause latency issues when synchronously replicating data cross-Region. Snapshots have additional overhead and may breach their stringent recovery point objective/recovery time objective (RPO/RTO) requirements.

A log shipping solution can take advantage of cross-Region replication of data using Amazon FSx for Windows File Server. It has less maintenance overhead, doesn’t introduce latency, and meets RPO/RTO requirements. A multi-Region architecture for Microsoft SQL Server is often adopted for SQL Server deployments for business continuity (disaster recovery) and improved latency (for a geographically distributed customer base).

This blog post explores SQL Server DR architecture using SQL Server failover cluster with Amazon FSx for Windows File Server for the primary site and secondary DR site. We describe how to set up a multi-Region DR using log shipping. We’ll explain the architecture patterns so you can follow along and effectively design a highly available SQL Server deployment that spans two or more AWS Regions.

Here are some advantages of using log shipping versus Always On distributed availability group DR setup.

  • Log shipping works with SQL Server Standard edition
  • It lowers total cost of ownership (TCO) as you only need one SQL Server Standard edition license at the primary/DR site
  • It’s straightforward to configure
  • There’s no need for clustering setup at the OS level
  • It supports all SQL Server versions. You don’t need the SQL Server version to be the same for source and destination instances.

Log shipping DR solution for SQL Server FCI with Amazon FSx

The architecture diagram in Figure 1 depicts SQL Server failover cluster instance (FCI) using Amazon FSx as storage (multiple Availability Zones) in Region 1. It uses a standalone or a similar setup on Region 2. It uses a log shipping feature for replication and DR. This will also serve as the reference architecture for our solution.

Figure 1. Log shipping DR solution for SQL Server FCI with Amazon FSx

Figure 1 shows an SQL cluster in Region 1 and standalone SQL cluster in Region 2. The primary cluster in Region 1 is initially configured with SQL Server failover cluster instance (FCI) using Amazon FSx for its shared storage. Region 2 can have a standalone Amazon EC2 server with SQL Server and Amazon Elastic Block Store (EBS) as storage. Or it can have an identical configuration to Region 1, but with different hostnames, and an SQL network name (SQLFCI02) to avoid possible collisions.

You can build the VPC peering or AWS Transit Gateway to have seamless connectivity between the two Regions for the opened ports (SQL Server, SMB for file share, and others.)

With Amazon FSx, you get a fully managed and shared file storage solution that automatically replicates the underlying storage synchronously across multiple Availability Zones. Amazon FSx provides high availability with automatic failure detection and automatic failover if there are any hardware or storage issues. The service fully supports continuously available shares, a feature that permits SQL Server uninterrupted access to shared file data.

There is an asynchronous replication setup from Region 1 to Region 2 using the log shipping feature. In this type of configuration, Microsoft SQL Server log shipping replicates databases using transaction logs. This ensures that a physically replicated warm standby database is an exact binary replica of the primary database. This is referred to as physical replication.

Log shipping can be configured with two available modes. These are related to the state of the secondary log-shipped SQL Server database.

  • Standby mode. The database is available for querying. Users cannot access the database while restore is going on. But once restore is completed, users can access it in read-only mode.
  • Restore mode. The database is not accessible for users.

In this solution, you configure a warm standby SQL Server database on an EC2 instance designated in SQL FCI using Amazon FSx as shared storage. You can send transaction log backups asynchronously between your primary Region database and the warm standby server in the other Region. The transaction log backups are then applied to the warm standby database sequentially. When all the logs have been applied, you can perform a manual failover and point the application to the secondary Region. We recommend running the primary and secondary database instances in separate Availability Zones, and configuring a monitor instance to track all the details of log shipping.

Prerequisites

Walkthrough steps to set up DR for SQL Server FCI

Following are the steps required to configure SQL Server DR using SQL Server failover cluster. Amazon FSx for Windows File Server is used for the primary site and secondary DR site. We also demonstrate how to set up a multi-Region log shipping.

Assumed variables

Region_01:
WSFC Cluster Name: SQLCluster1
FCI Virtual Network Name: SQLFCI01
Region_02:
Amazon EC2 Name: EC2SQL2

Make sure to configure network connectivity between your clusters. In this solution, we are using two VPCs in two separate Regions.

    • VPC peering is configured to enable network traffic on both VPCs.
    • The domain controller (AWS Managed Microsoft AD) on both VPCs are configured with conditional forwarding. This enables DNS resolution between the two VPCs.
  1. Configure SQL FCI setup using Amazon FSx as shared storage on Region_01.
  2. Configure SQL standalone instance on Region_02 with EBS volume as storage.
  3. Create an Amazon FSx in the primary Region with AWS managed Active Directory, or on-premises Active Directory connected with trust relation or AD Connector.
  4. Create a SQL Server service account with proper permissions to be able to set up transaction log settings.
  5. Configure VPC peering between the primary and DR/secondary Region.
  6. Join the domain to the Active Directory network for both primary and secondary servers in primary Region.
  7. Mount Amazon FSx on primary and secondary server and allow shared permissions, so SQL Server is able to access the folder. Use Amazon FSx for storing transaction log backups and EBS for storing transaction logs on the secondary Region.
  8. Set up log shipping from the primary SQL Server SQLFCI01 to the secondary SQL Server EC2SQL2 with the standby option enabled. This way, the databases can be queried in read-only mode on the secondary SQL Server.
  9. In case of disaster, follow the FAILOVER and FAILBACK steps in the next sections. Learn more by reading Change Roles Between Primary and Secondary Log Shipping Servers.

Failover steps

In case of a disaster at the primary Region node SQLFCI01, log shipping acts as the DR solution. The following steps bring the databases online on EC2SQL02. For a DR drill that checks failover, use all of the steps. In a real disaster, follow the process from Step 3 onwards. Once SQLFCI01 is back, use one of the failback methods in the next section.

1. Stop all activities on SQLFCI01 databases involved in log shipping jobs on SQLFCI01 and EC2SQL02. Confirm if any process is running by using the following query:

Use master
Go
select * from sysprocesses where dbid = DB_ID('DatabaseName')

2. Take full backup on SQLFCI01 as rollback option.

BACKUP DATABASE [DatabaseName]
TO  DISK = N'Provide Drive details'
WITH COMPRESSION
GO

3. Take a final tail-log backup if you still have access to SQL Server on SQLFCI01. Otherwise, check the last available transaction log backup stored on EC2SQL02. In either case, restore the last log backup with RECOVERY to bring the databases online on EC2SQL02.

RESTORE LOG [DatabaseName] FROM  DISK = N'Provide path of last tlog'
WITH  FILE = 1,  RECOVERY,  NOUNLOAD,  STATS = 10
GO

4. Redirect the application connections to EC2SQL02.
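To make the restore order in Step 3 concrete, the sketch below generates the RESTORE LOG statements for a set of pending log backups: every backup except the last is restored WITH NORECOVERY, and only the last one uses RECOVERY to bring the database online. The function name and file paths are hypothetical, the paths are assumed to be listed in backup order, and the sketch only builds strings without connecting to SQL Server.

```python
def restore_statements(database, log_backup_paths):
    """Build RESTORE LOG statements; only the last one recovers the database.

    Hypothetical helper. Paths must already be sorted in backup order and
    are inserted as-is, without escaping.
    """
    if not log_backup_paths:
        raise ValueError("At least one log backup is required")
    statements = []
    last_index = len(log_backup_paths) - 1
    for index, path in enumerate(log_backup_paths):
        option = "RECOVERY" if index == last_index else "NORECOVERY"
        statements.append(
            f"RESTORE LOG [{database}] FROM DISK = N'{path}' WITH {option};"
        )
    return statements


# Placeholder file names for the demo.
pending = [
    r"D:\LogShip\DatabaseName_001.trn",
    r"D:\LogShip\DatabaseName_002.trn",
    r"D:\LogShip\DatabaseName_tail.trn",
]
for statement in restore_statements("DatabaseName", pending):
    print(statement)
```

Recovering too early would end the restore sequence, so any log backup that still has to be applied must come before the statement that uses RECOVERY.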

Failback methods

1. Native backup/restore or rollback strategy

  • Take full backup from EC2SQL02 and copy to the SQLFCI01.
  • RESTORE the full backup on SQLFCI01.
  • Reconfigure log shipping between SQLFCI01 and EC2SQL02.

2. Reverse log shipping

In case of DR drills or business continuity and disaster recovery (BCDR) activities, we can set up reverse log shipping to reduce the time taken to failover. It doesn’t require reinitializing the database with a full backup if performed carefully. It is crucial to preserve the log sequence number (LSN) chain. Perform the final log backup using the NORECOVERY option. Backing up the log with this option puts the database in a state where log backups can be restored. It ensures that the database’s LSN chain doesn’t deviate. This procedure helps reduce downtime to bring back SQLFCI01.

  • STOP all activities on SQLFCI01 databases involved in log shipping jobs on SQLFCI01 and EC2SQL02.
  • TAKE Tlog backup of SQLFCI01 with NORECOVERY option.

BACKUP LOG [DatabaseName]
TO DISK = 'BackupFilePathname'
WITH NORECOVERY;

  • RESTORE transaction log backup on EC2SQL02 with NORECOVERY.
  • Reconfigure log shipping and re-enable the jobs.
  • Reconfigure the application connections to SQLFCI01.
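Because reverse log shipping depends on an unbroken LSN chain, it can help to check the chain before restoring. The following sketch assumes that each transaction log backup is described by its first and last LSN (for example, as recorded in the first_lsn and last_lsn columns of msdb.dbo.backupset) and that, in an unbroken chain, each log backup starts at the LSN where the previous one ended. The function name and the sample LSN values are made up for illustration.

```python
def lsn_chain_is_unbroken(log_backups):
    """Check that consecutive log backups line up without gaps.

    log_backups is a list of (first_lsn, last_lsn) tuples in backup order.
    Hypothetical helper; the LSN values are treated as plain integers.
    """
    for (_, previous_last), (next_first, _) in zip(log_backups, log_backups[1:]):
        if next_first != previous_last:
            return False
    return True


print(lsn_chain_is_unbroken([(100, 200), (200, 300), (300, 450)]))
print(lsn_chain_is_unbroken([(100, 200), (250, 300)]))
```

The second call reports a broken chain because the backup that covers LSNs 200 to 250 is missing, which is the situation that would force a reinitialization from a full backup.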

Conclusion

A multi-Region strategy for your mission-critical SQL Server deployments is key for business continuity and disaster recovery. This blog post shows how to achieve that using log shipping for SQL Server FCI deployment. Setting up DR using log shipping can help you save costs and meet your business requirements.

To learn more, check out Simplify your Microsoft SQL Server high availability deployments using Amazon FSx for Windows File Server.

More posts for Women’s History Month!

Other ways to participate