Tag Archives: high performance computing

New – Amazon EC2 Hpc7a Instances Powered by 4th Gen AMD EPYC Processors Optimized for High Performance Computing

Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/new-amazon-ec2-hpc7a-instances-powered-by-4th-gen-amd-epyc-processors-optimized-for-high-performance-computing/

In January 2022, we launched Amazon EC2 Hpc6a instances for customers to efficiently run their compute-bound high performance computing (HPC) workloads on AWS with up to 65 percent better price performance over comparable x86-based compute-optimized instances.

As their jobs grow more complex, customers have asked for more cores, more compute performance, and more memory and network performance to reduce the time to complete jobs. Additionally, as customers bring more of their HPC workloads to EC2, they have asked us to make it easier to distribute processes across cores so they can make the best use of memory and network bandwidth for their workload requirements.

Today, we are announcing the general availability of Amazon EC2 Hpc7a instances, the next generation of instance types that are purpose-built for tightly coupled HPC workloads. Hpc7a instances powered by the 4th Gen AMD EPYC processors (Genoa) deliver up to 2.5 times better performance compared to Hpc6a instances. These instances offer 300 Gbps Elastic Fabric Adapter (EFA) bandwidth powered by the AWS Nitro System, for fast and low-latency internode communications.

Hpc7a instances feature Double Data Rate 5 (DDR5) memory, which provides 50 percent higher memory bandwidth compared to DDR4 memory to enable high-speed access to data in memory. These instances are ideal for compute-intensive, latency-sensitive workloads such as computational fluid dynamics (CFD) and numerical weather prediction (NWP).

If you are running on Hpc6a, you can move to Hpc7a instances and take advantage of 2 times higher core density, 2.1 times higher effective memory bandwidth, and 3 times higher network bandwidth to reduce the time needed to complete jobs.

Here’s a quick infographic that shows you how the Hpc7a instances and the 4th Gen AMD EPYC processor (Genoa) compare to the previous instances and processor:

Hpc7a instances feature sizes of up to 192 cores of 4th Gen AMD EPYC processors with 768 GiB RAM. Here are the detailed specs:

Instance Name  CPUs  RAM (GiB)  EFA Network Bandwidth (Gbps)  Attached Storage
hpc7a.12xlarge  24  768  Up to 300  EBS Only
hpc7a.24xlarge  48  768  Up to 300  EBS Only
hpc7a.48xlarge  96  768  Up to 300  EBS Only
hpc7a.96xlarge  192  768  Up to 300  EBS Only

These instances provide higher compute, memory, and network performance to run the most compute-intensive workloads, such as CFD, weather forecasting, molecular dynamics, and computational chemistry on AWS.

Similar to EC2 Hpc7g instances released a month earlier, we are offering smaller instance sizes that make it easier for customers to activate a smaller number of CPU cores while keeping all other resources constant, based on their workload requirements. For HPC workloads, common scenarios include providing more memory bandwidth per core for CFD workloads, allocating fewer cores in license-bound scenarios, and supporting more memory per core. To learn more, see Instance sizes in the Amazon EC2 Hpc7 family – a different experience in the AWS HPC Blog.
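
The sizing tradeoff is easy to see with a little arithmetic. Every Hpc7a size keeps the full 768 GiB of memory and network bandwidth while varying only the active core count, so per-core memory scales inversely with size. The following is a rough illustration, not an official sizing tool:

```python
# Per-core memory for each Hpc7a size; all sizes keep 768 GiB of RAM
# and up to 300 Gbps of EFA bandwidth, only the active core count differs.
SIZES = {
    "hpc7a.12xlarge": 24,
    "hpc7a.24xlarge": 48,
    "hpc7a.48xlarge": 96,
    "hpc7a.96xlarge": 192,
}
RAM_GIB = 768

def memory_per_core(size: str) -> float:
    """GiB of memory available to each active core for a given size."""
    return RAM_GIB / SIZES[size]

for size in SIZES:
    print(f"{size}: {memory_per_core(size):.0f} GiB per core")
```

For example, hpc7a.12xlarge gives each core 32 GiB and roughly four times the per-core memory bandwidth of the full-size hpc7a.96xlarge, which is exactly the property memory-bandwidth-bound CFD and license-bound workloads benefit from.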

As with Hpc6a instances, you can use Hpc7a instances to run your largest and most complex HPC simulations on EC2 and optimize for cost and performance. You can also use the new Hpc7a instances with AWS Batch and AWS ParallelCluster to simplify workload submission and cluster creation, and with Amazon FSx for Lustre for submillisecond latencies and up to hundreds of gigabytes per second of storage throughput.

To achieve the best performance for HPC workloads, these instances have Simultaneous Multithreading (SMT) disabled, they’re available in a single Availability Zone, and they have limited external network and EBS bandwidth.

Now Available
Amazon EC2 Hpc7a instances are available today in three AWS Regions: US East (Ohio), EU (Ireland), and AWS GovCloud (US-West), for purchase as On-Demand or Reserved Instances or with Savings Plans. For more information, see the Amazon EC2 pricing page.

To learn more, visit our Hpc7a instances page and get in touch with our HPC team through AWS re:Post for EC2 or your usual AWS Support contacts.

Channy

New – Amazon EC2 P5 Instances Powered by NVIDIA H100 Tensor Core GPUs for Accelerating Generative AI and HPC Applications

Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/new-amazon-ec2-p5-instances-powered-by-nvidia-h100-tensor-core-gpus-for-accelerating-generative-ai-and-hpc-applications/

In March 2023, AWS and NVIDIA announced a multipart collaboration focused on building the most scalable, on-demand artificial intelligence (AI) infrastructure optimized for training increasingly complex large language models (LLMs) and developing generative AI applications.

We preannounced Amazon Elastic Compute Cloud (Amazon EC2) P5 instances powered by NVIDIA H100 Tensor Core GPUs and AWS’s latest networking and scalability, which will deliver up to 20 exaflops of compute performance for building and training the largest machine learning (ML) models. This announcement is the product of more than a decade of collaboration between AWS and NVIDIA, delivering visual computing, AI, and high performance computing (HPC) clusters across Cluster GPU (cg1) instances (2010), G2 (2013), P2 (2016), P3 (2017), G3 (2017), P3dn (2018), G4 (2019), P4 (2020), G5 (2021), and P4de instances (2022).

Most notably, ML model sizes are now reaching trillions of parameters. But this complexity has increased customers’ time to train, where the latest LLMs are now trained over the course of multiple months. HPC customers also exhibit similar trends. With the fidelity of HPC customer data collection increasing and data sets reaching exabyte scale, customers are looking for ways to enable faster time to solution across increasingly complex applications.

Introducing EC2 P5 Instances
Today, we are announcing the general availability of Amazon EC2 P5 instances, the next-generation GPU instances to address those customer needs for high performance and scalability in AI/ML and HPC workloads. P5 instances are powered by the latest NVIDIA H100 Tensor Core GPUs and will provide a reduction of up to 6 times in training time (from days to hours) compared to previous generation GPU-based instances. This performance increase will enable customers to see up to 40 percent lower training costs.
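
The relationship between the quoted speedup and training cost can be sketched with simple algebra. The numbers here are purely illustrative; the 3.6x price factor is a hypothetical input, not AWS pricing:

```python
# Illustrative job-cost arithmetic: if a job runs `speedup` times faster on an
# instance whose hourly price is `price_ratio` times the old one, the total
# job cost scales by price_ratio / speedup.
def job_cost_ratio(price_ratio: float, speedup: float) -> float:
    return price_ratio / speedup

# Hypothetical example: a 3.6x hourly price combined with the up-to-6x
# speedup quoted above yields a job cost of roughly 0.6x, i.e. 40 percent lower.
print(round(job_cost_ratio(3.6, 6.0), 2))
```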

P5 instances provide 8 x NVIDIA H100 Tensor Core GPUs with 640 GB of high bandwidth GPU memory, 3rd Gen AMD EPYC processors, 2 TB of system memory, and 30 TB of local NVMe storage. P5 instances also provide 3200 Gbps of aggregate network bandwidth with support for GPUDirect RDMA, enabling lower latency and efficient scale-out performance by bypassing the CPU on internode communication.

Here are the specs for these instances:

Instance Size  vCPUs  Memory (GiB)  GPUs (H100)  Network Bandwidth (Gbps)  EBS Bandwidth (Gbps)  Local Storage (TB)
p5.48xlarge  192  2048  8  3200  80  8 x 3.84
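
Dividing the totals in the spec row by the eight GPUs gives each accelerator's share of the machine, which is a useful sanity check when planning per-GPU data loading and communication. This is illustrative arithmetic only:

```python
# Per-GPU share of p5.48xlarge resources from the spec row above.
TOTALS = {"gpu_memory_gb": 640, "network_gbps": 3200, "vcpus": 192, "ram_gib": 2048}
NUM_GPUS = 8

per_gpu = {name: total / NUM_GPUS for name, total in TOTALS.items()}
# Each H100 gets 80 GB of HBM, a 400 Gbps slice of EFA bandwidth,
# 24 vCPUs, and 256 GiB of host memory.
for name, share in per_gpu.items():
    print(f"{name}: {share:g}")
```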

Here’s a quick infographic that shows you how the P5 instances and NVIDIA H100 Tensor Core GPUs compare to previous instances and processors:

P5 instances are ideal for training and running inference for increasingly complex LLMs and computer vision models behind the most demanding and compute-intensive generative AI applications, including question answering, code generation, video and image generation, speech recognition, and more. P5 instances provide up to 6 times lower time to train compared with previous-generation GPU-based instances across those applications. Customers who can use lower-precision FP8 data types in their workloads, common in many language models that use a transformer backbone, will see a further benefit of up to 6 times higher performance through support for the NVIDIA Transformer Engine.

HPC customers using P5 instances can deploy demanding applications at greater scale in pharmaceutical discovery, seismic analysis, weather forecasting, and financial modeling. Customers using dynamic programming (DP) algorithms for applications like genome sequencing or accelerated data analytics will also see further benefit from P5 through support for a new DPX instruction set.

This enables customers to explore problem spaces that previously seemed unreachable, iterate on their solutions at a faster clip, and get to market more quickly.

Here are the detailed instance specifications, along with a comparison between the p4d.24xlarge and the new p5.48xlarge:

Feature  p4d.24xlarge  p5.48xlarge  Comparison
Number & Type of Accelerators  8 x NVIDIA A100  8 x NVIDIA H100
FP8 TFLOPS per Server  N/A  16,000  640% vs. A100 FP16
FP16 TFLOPS per Server  2,496  8,000  320%
GPU Memory (per GPU)  40 GB  80 GB  200%
GPU Memory Bandwidth  12.8 TB/s  26.8 TB/s  200%
CPU Family  Intel Cascade Lake  AMD Milan
vCPUs  96  192  200%
Total System Memory  1152 GB  2048 GB  200%
Networking Throughput  400 Gbps  3200 Gbps  800%
EBS Throughput  19 Gbps  80 Gbps  400%
Local Instance Storage  8 TB NVMe  30 TB NVMe  375%
GPU-to-GPU Interconnect  600 GB/s  900 GB/s  150%

Second-generation Amazon EC2 UltraClusters and Elastic Fabric Adapter
P5 instances provide market-leading scale-out capability for multi-node distributed training and tightly coupled HPC workloads. They offer up to 3,200 Gbps of networking using the second-generation Elastic Fabric Adapter (EFA) technology, 8 times more than P4d instances.

To address customer needs for large scale and low latency, P5 instances are deployed in second-generation EC2 UltraClusters, which now provide customers with lower latency across more than 20,000 NVIDIA H100 Tensor Core GPUs. Providing the largest scale of ML infrastructure in the cloud, P5 instances in EC2 UltraClusters deliver up to 20 exaflops of aggregate compute capability.

EC2 UltraClusters use Amazon FSx for Lustre, fully managed shared storage built on the most popular high-performance parallel file system. With FSx for Lustre, you can quickly process massive datasets on demand and at scale and deliver sub-millisecond latencies. The low-latency and high-throughput characteristics of FSx for Lustre are optimized for deep learning, generative AI, and HPC workloads on EC2 UltraClusters.

FSx for Lustre keeps the GPUs and ML accelerators in EC2 UltraClusters fed with data, accelerating the most demanding workloads. These workloads include LLM training, generative AI inferencing, and HPC workloads, such as genomics and financial risk modeling.

Getting Started with EC2 P5 Instances
To get started, you can use P5 instances in the US East (N. Virginia) and US West (Oregon) Regions.

When launching P5 instances, you can choose AWS Deep Learning AMIs (DLAMIs) that support P5 instances. DLAMIs provide ML practitioners and researchers with the infrastructure and tools to quickly build scalable, secure, distributed ML applications in preconfigured environments.

You can run containerized applications on P5 instances with AWS Deep Learning Containers using Amazon Elastic Container Service (Amazon ECS) or Amazon Elastic Kubernetes Service (Amazon EKS). For a more managed experience, you can also use P5 instances via Amazon SageMaker, which helps developers and data scientists scale to tens, hundreds, or thousands of GPUs to train a model quickly at any scale without worrying about setting up clusters and data pipelines. HPC customers can use AWS Batch and AWS ParallelCluster with P5 instances to orchestrate jobs and clusters efficiently.

Existing P4 customers will need to update their AMIs to use P5 instances. Specifically, the AMIs must include the latest NVIDIA driver with support for NVIDIA H100 Tensor Core GPUs, the latest CUDA version (CUDA 12), the latest cuDNN version, the latest framework versions (for example, PyTorch and TensorFlow), and the EFA driver with updated topology files. To make this process easy for you, we will provide new DLAMIs and Deep Learning Containers that come prepackaged with all the software and frameworks needed to use P5 instances out of the box.

Now Available
Amazon EC2 P5 instances are available today in the US East (N. Virginia) and US West (Oregon) AWS Regions. For more information, see the Amazon EC2 pricing page. To learn more, visit our P5 instance page and explore AWS re:Post for EC2, or reach out through your usual AWS Support contacts.

You can choose a broad range of AWS services that have generative AI built in, all running on the most cost-effective cloud infrastructure for generative AI. To learn more, visit Generative AI on AWS to innovate faster and reinvent your applications.

Channy

New – Amazon EC2 Hpc7g Instances Powered by AWS Graviton3E Processors Optimized for High Performance Computing Workloads

Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/new-amazon-ec2-hpc7g-instances-powered-by-aws-graviton3e-processors-optimized-for-high-performance-computing-workloads/

At AWS re:Invent 2022, Adam Selipsky, CEO of AWS, explained in his keynote that high performance computing (HPC) workloads are typically compute-intensive, compute- and networking-intensive, or data- and memory-intensive.

Compute-intensive workloads include weather forecasting, computational fluid dynamics, and financial options pricing. To help with these, we offer Amazon EC2 Hpc6a instances, which deliver up to 65 percent better price performance than comparable compute-optimized x86-based instances.

Other HPC workloads require modeling the performance of complex structures—things like wind turbines, concrete buildings, and industrial equipment. Without enough data and memory, these models can take days or weeks to run in a cost-effective way. The Amazon EC2 Hpc6id instance is designed to deliver leading price performance for data and memory-intensive HPC workloads with higher memory bandwidth per core, faster local solid-state drive (SSD) storage, and enhanced networking with Elastic Fabric Adapter (EFA).

Announcing Amazon EC2 Hpc7g Instances
Compute-intensive HPC workloads such as weather forecasting, computational fluid dynamics, and financial options pricing also require more network performance, even better price performance, and greater energy efficiency.

Today we are announcing the general availability of Amazon EC2 Hpc7g instances, a new purpose-built instance type for tightly coupled compute and network-intensive HPC workloads.

Hpc7g instances are powered by AWS Graviton3E processors, which provide up to two times better floating-point performance than the AWS Graviton2 processors in EC2 C6gn instances, offer 200 Gbps of dedicated EFA bandwidth, and are up to 60 percent more energy efficient than comparable x86 instances.

Here’s a quick infographic that shows you how the Hpc7g instances and the Graviton3E processors compare to previous instances and processors:

Hpc7g instances feature sizes of up to 64 cores of the latest AWS custom Graviton3E CPUs with 128 GiB RAM. Here are the detailed specs:

Instance Name  CPUs  RAM (GiB)  EFA Network Bandwidth (Gbps)  Attached Storage
hpc7g.4xlarge  16  128  Up to 200  EBS Only
hpc7g.8xlarge  32  128  Up to 200  EBS Only
hpc7g.16xlarge  64  128  Up to 200  EBS Only

Hpc7g instances are the most cost-efficient option for scaling your HPC clusters on AWS. If you are considering migrating your largest HPC workloads, which require tens of thousands of cores at scale, you can take advantage of up to 200 Gbps of EFA bandwidth to reduce latency when running message passing interface (MPI) applications on parallel computing architectures, while minimizing power consumption.
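
For cluster sizing at that scale, the arithmetic is straightforward: with 64 physical cores per hpc7g.16xlarge (Graviton cores run one thread each, so one MPI rank per core is the natural mapping), a sketch like the following estimates the fleet size. This is an illustration, not a capacity-planning tool:

```python
import math

# Instances needed to host an MPI job at one rank per physical core.
CORES_PER_NODE = 64  # hpc7g.16xlarge

def nodes_needed(total_ranks: int) -> int:
    """Smallest number of instances that can host `total_ranks` MPI ranks."""
    return math.ceil(total_ranks / CORES_PER_NODE)

print(nodes_needed(40_000))  # a 40,000-core job needs 625 instances
```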

You can choose smaller sizes of Hpc7g instances to activate fewer cores and distribute memory and network resources evenly across those cores, increasing per-core performance and helping to reduce software licensing costs.

You can also use Hpc7g instances with AWS ParallelCluster to build a complete HPC runtime environment that spans both x86 and arm64 instance types, giving you the flexibility to run different workload types within the same HPC cluster. You can compare and contrast performance across architectures, making it easier to find what’s best for you and simplifying workload porting.

Customer Story
The Water Institute is an independent, non-profit applied research organization that works across disciplines to advance science and develop integrated methods used to solve complex environmental and societal challenges.

They benchmarked the Hpc7g instances with 200 Gbps EFA using the Advanced Circulation (ADCIRC) model. ADCIRC is deployed across many US government agencies to simulate the movement of water due to astronomic tides, riverine flows, and atmospheric forces, including hurricanes, and it is often used for real-time forecasting applications and design studies.

The model run for this application targets Southern Louisiana and is the basis for most of the analysis conducted there, including levee design, planning studies, and real-time hurricane storm surge forecasting applications. The left graphic above shows the full extent of the domain, while to the right of that, the high-resolution area targeted at Southern Louisiana shows flooding around the levees in New Orleans during a simulation of Hurricane Katrina.

The model contains 1.6 million vertices and 3 million elements. It’s these parameters that affect the computational complexity of the simulations. The simulations depict 18 days of astronomic tide, river inflows, and atmospheric wind and pressure forcing.

The Water Institute benchmarked against many of the instance types that would be useful for their workload types at AWS, including c6gn.16xlarge, hpc7g.16xlarge, hpc6a.48xlarge, and hpc6id.36xlarge.

The Hpc7g instance shows more than 40 percent better performance than the C6gn instance and has comparable performance to other high performance x86 instance types but with a better price-to-performance ratio. With Hpc7g instances, the Water Institute can lower its costs while maintaining the performance levels they expect.

RIKEN, which built the powerful arm64-based supercomputer Fugaku, is collaborating with AWS to create a virtual Fugaku using Hpc7g instances with Graviton3E processors to support Japanese manufacturers’ increasing demand for compute power. RIKEN has already confirmed that multiple Fugaku applications provide excellent performance on the AWS Graviton3E processor in the AWS cloud environment.

Also, Siemens has optimized the scalability of Simcenter STAR-CCM+ across a broad range of CPU and GPU instances on AWS. This technology is supported on Linux and available through Arm-based EC2 instances or the Fugaku supercomputer.

To hear more voices of customers and partners such as Ansys, Arup, CERFACS, ESI, Jij, ParTec, Rescale, and TotalCAE, see the Hpc7g instances page.

Now Available
Amazon EC2 Hpc7g instances are now generally available in the US East (N. Virginia) Region for purchase as On-Demand or Reserved Instances or with Savings Plans.

To learn more, see the Amazon EC2 Hpc7g instances page. Give it a try, and please send feedback to AWS re:Post for High Performance Compute or through your usual AWS support contacts.

Channy

Streaming Android games from cloud to mobile with AWS Graviton-based Amazon EC2 G5g instances

Post Syndicated from Sheila Busser original https://aws.amazon.com/blogs/compute/streaming-android-games-from-cloud-to-mobile-with-aws-graviton-based-amazon-ec2-g5g-instances/

This blog post is written by Vincent Wang, GCR EC2 Specialist SA, Compute.

Streaming games from the cloud to mobile devices is an emerging technology that allows less powerful and less expensive devices to play high-quality games with lower battery consumption and less storage capacity. This technology enables a wider audience to enjoy high-end gaming experiences from their existing devices, such as smartphones, tablets, and smart TVs.

To load games for streaming on AWS, it’s necessary to use Android environments that can utilize GPU acceleration for graphics rendering and optimize for network latency. Cloud-native products, such as the Anbox Cloud Appliance or Genymotion available on the AWS Marketplace, can provide a cost-effective containerized solution for game streaming workloads on Amazon Elastic Compute Cloud (Amazon EC2).

For example, Anbox Cloud’s virtual device infrastructure can run games with low latency and high frame rates. When combined with the AWS Graviton-based Amazon EC2 G5g instances, which offer a cost reduction of up to 30% per-game stream per-hour compared to x86-based GPU instances, it enables companies to serve millions of customers in a cost-efficient manner.

In this post, we chose the Anbox Cloud Appliance to demonstrate how you can use it to stream a resource-demanding game called Genshin Impact. We use a G5g instance along with a mobile phone to run the streamed game inside of a Firefox browser application.

Overview

Graviton-based instances utilize fewer compute resources than x86-based instances because Android runs natively on the 64-bit Arm architecture of AWS Graviton processors. As shown in the following diagram, Graviton instances eliminate the need for cross-compilation or Android emulation. This simplifies development efforts and reduces time-to-market, thereby lowering the cost per stream. With G5g instances, customers can run their Android games natively, encode CPU- or GPU-rendered graphics, and stream the game over the network to multiple mobile devices.

Figure 1: Architecture difference when running Android on X86-based instance and Graviton-based instance.

Real-time ray-traced rendering is required for most modern games to deliver photorealistic objects and environments with physically accurate shadows, reflections, and refractions. The G5g instance, which is powered by AWS Graviton2 processors and NVIDIA T4G Tensor Core GPUs, provides a cost-effective solution for running these resource-intensive games.

Architecture

Figure 2: Architecture of Android Streaming Game.

When streaming games from a mobile device, only input data (touchscreen, audio, etc.) is sent over the network to the game streaming server hosted on a G5g instance. Then, the input is directed to the appropriate Android container designated for that particular client. The game application running in the container processes the input and updates the game state accordingly. Then, the resulting rendered image frames are sent back to the mobile device for display on the screen. In certain games, such as multiplayer games, the streaming server must communicate with external game servers to reflect the full game state. In these cases, additional data is transferred to and from game servers and back to the mobile client. The communication between clients and the streaming server is performed using the WebRTC network protocol to minimize latency and make sure that users’ gaming experience isn’t affected.
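
The per-client routing described above can be sketched as a simple session map. The class and method names here are hypothetical stand-ins for Anbox Cloud's actual session management, used only to make the data flow concrete:

```python
# Hypothetical sketch of per-client session routing on the streaming server:
# each connected client gets its own Android container, and input events are
# forwarded only to that container.
class Session:
    def __init__(self, container_id: str):
        self.container_id = container_id
        self.events: list[dict] = []

    def handle_input(self, event: dict) -> str:
        # In a real deployment the event would arrive over a WebRTC channel;
        # here we just record it and report the target container.
        self.events.append(event)
        return f"{self.container_id} <- {event['type']}"

sessions = {
    "client-a": Session("android-0"),
    "client-b": Session("android-1"),
}
print(sessions["client-a"].handle_input({"type": "touch", "x": 10, "y": 20}))
```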

The Graviton processor handles compute-intensive tasks, such as the Android runtime and I/O transactions on the streaming server. However, for resource-demanding games, the Nvidia GPU is utilized for graphics rendering. To scale effortlessly, the Anbox Cloud software can be utilized to manage and execute several game sessions on the same instance.

Prerequisites

First, you need an Ubuntu single sign-on (SSO) account. If you don’t have one yet, you can create one on the Ubuntu One website. You also need an Android mobile phone with the Firefox or Chrome browser installed to play the streamed games.

Setup

Subscribe to the Anbox Cloud Appliance in the AWS Marketplace. Select the Arm variant so that it works on Graviton-based instances. If the subscription doesn’t work on the first try, you will receive an email that guides you to a page where you can try again.

Figure 3: Subscribe Anbox Cloud Appliance in AWS Marketplace.

In this demonstration, we select G5g.xlarge in the Instance type section and leave all settings at their default values, except for storage, as follows:

  1. A root disk with minimum 50 GB (required)
  2. An additional Amazon Elastic Block Store (Amazon EBS) volume with at least 100 GB (recommended)

For the Genshin Impact demo, we recommend this amount of storage. When deploying your own Android applications, select an appropriate storage size based on the package size, and choose an instance size based on the resources you plan to use for your gaming sessions, such as CPU, memory, and networking. In our demo, we launched only one session from a single mobile device.

Launch the instance and wait until it reaches the running state. Then you can connect to the instance over SSH to configure the Android environment.

Install Anbox cloud

To ensure the security and reliability of the package repositories used, update the CUDA Linux GPG repository key. See this NVIDIA blog post for more details on this procedure.

$ sudo apt-key del 7fa2af80

$ wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/sbsa/cuda-keyring_1.0-1_all.deb

$ sudo dpkg -i cuda-keyring_1.0-1_all.deb

Because Android in the Anbox Cloud Appliance runs in an LXD container environment, upgrade LXD to the latest version:

  $ sudo snap refresh --channel=5.0/stable lxd

Install the Anbox Cloud Appliance software using the following command and selecting the default answers:

  $ sudo anbox-cloud-appliance init

Watch the status page at https://<EC2 public DNS name> for progress information.

Figure 4: The status of deploying Anbox Cloud.

The initialization process takes approximately 20 minutes. After it’s complete, register the Ubuntu SSO account you created earlier, then follow the instructions provided to finalize the process.

  $ anbox-cloud-appliance dashboard register <your Ubuntu SSO email address>

Stream an Android game application

Use the sample from the following repository to set up the service on the streaming server:

  $ git clone https://github.com/anbox-cloud/cloud-gaming-demo.git

Build the Flutter web UI:

$ sudo snap install flutter --classic

$ cd cloud-gaming-demo/ui && flutter build web && cd ..

$ mkdir -p backend/service/static

$ cp -av ui/build/web/* backend/service/static

Then build the backend service, which processes requests and interacts with the Anbox Stream Gateway to create instances of game applications. Start by preparing the environment:

$ sudo apt-get install python3-pip

$ sudo pip3 install virtualenv

$ cd backend && virtualenv venv

Create the configuration file for the backend service so that it can access the Anbox Stream Gateway. There are two parameters to set: gateway-url and gateway-token. The gateway token can be obtained with the following command:

$ anbox-cloud-appliance gateway account create <account-name>

Create a file called config.yaml that contains the two values:

gateway-url: https://<EC2 public DNS name>
gateway-token: <gateway_token>

Add the following line to the activate hook in the backend/venv/bin/ directory so that the backend service can read config.yaml on startup:

export CONFIG_PATH=<path_to_config_yaml>

Now we can launch the backend service which will be served by default on TCP port 8002.

$ ./run.sh

In the next steps, we download a game and build it via Anbox Cloud. We need an Android APK and a configuration file. Create a folder under the HOME directory, and create a manifest.yaml file in it. In this example, we add the following details to the file. You can refer to the Anbox Cloud documentation for more information on the format.

name: genshin
instance-type: g10.3
resources:
  cpus: 10
  memory: 25GB
  disk-size: 50GB
  gpu-slots: 15
features: ["enable_virtual_keyboard"]

Select an APK built for the arm64-v8a architecture, which is natively supported on Graviton. In this example, we download Genshin Impact, an action role-playing game developed and published by miHoYo. You must supply your own Android APK if you want to try these steps. Download the APK into the folder and rename it to app.apk. The final layout of the game folder should look as follows:

.

├── app.apk

└── manifest.yaml

Run the following command from the folder to create the application:

$ amc application create .

Wait until the application status changes to ready. You can monitor the status with the following command:

$ amc application ls

Edit the following:

  1. Update the gameids variable defined in the ui/lib/homepage.dart file to include the name of the game (as declared in the manifest file).
  2. Insert a new key/value pair into the static appNameMap and appDesMap variables defined in the lib/api/application.dart file.
  3. Provide a screenshot of the game (in JPEG format), rename it to <game-name>.jpeg, and put it in the ui/lib/assets directory.

Then, re-build the web UI, copy the contents from the ui/build/web folder to the backend/service/static directory, and refresh the webpage.

Test the game

Using your mobile phone, open Firefox or another browser that supports WebRTC. Enter the public DNS name of the G5g instance with TCP port 8002, and you should see something similar to the following:

Figure 5: The webpage of the Android streaming game portal.

Select the Play now button, wait a moment for the application to be set up on the server side, and then enjoy the game.

Figure 6: The screen capture of playing Android streaming game.

Clean-up

Cancel your subscription to the Anbox Cloud Appliance in the AWS Marketplace (you can follow the AWS Marketplace Buyer Guide for details), and then terminate the G5g.xlarge instance to avoid incurring future costs.

Conclusion

In this post, we demonstrated how a resource-intensive Android game can run natively on a Graviton-based G5g instance and be streamed to an Arm-based mobile device. The benefits include better price performance, reduced development effort, and faster time-to-market. One way to run your games efficiently on the cloud is through software available on the AWS Marketplace, such as the Anbox Cloud Appliance showcased in this example.

To learn more about AWS Graviton, visit the official product page and the technical guide.

New – Amazon EC2 Hpc6id Instances Optimized for High Performance Computing

Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/new-amazon-ec2-hpc6id-instances-optimized-for-high-performance-computing/

We have given you the flexibility and ability to run the largest and most complex high performance computing (HPC) workloads with Amazon Elastic Compute Cloud (Amazon EC2) instances that feature enhanced networking, like C5n, C6gn, R5n, M5n, and our recently launched Hpc6a HPC instances.

We heard feedback from customers asking us to deliver more options for their most intensive workloads, with higher per-vCPU compute performance as well as larger memory and local disk storage, to reduce job completion times for data-intensive workloads like finite element analysis (FEA) and seismic processing.

Announcing Amazon EC2 Hpc6id Instance for HPC Workloads
Today, we announce the general availability of Amazon EC2 Hpc6id instances, a new instance type purpose-built for tightly coupled HPC workloads. Amazon EC2 Hpc6id instances are powered by 3rd Gen Intel Xeon Scalable processors (Ice Lake) that run at frequencies up to 3.5 GHz, and they feature 1024 GiB of memory, 15.2 TB of local SSD storage, and 200 Gbps of Elastic Fabric Adapter (EFA) network bandwidth, which is 4 times higher than R6i instances.

Amazon EC2 Hpc6id instances have the best per-vCPU HPC performance when compared to similar x86-based EC2 instances for data-intensive HPC workloads.

Here are the detailed specs:

Instance Name CPUs RAM EFA Network Bandwidth Attached Storage
hpc6id.32xlarge 64 1024 GiB Up to 200 Gbps 15.2 TB local SSD disk

Amazon EC2 Hpc6id Instances Use Cases
Customers running license-bound scenarios can lower infrastructure and HPC software licensing costs with Hpc6id. Other customers with HPC codes that are optimized for Intel-specific features, such as Math Kernel Library or AVX-512, can migrate their largest HPC workloads to Hpc6id and scale up their workloads on AWS by taking advantage of 200 Gbps EFA bandwidth.

Other customers using HPC software codes that are optimized for per-CPU performance are also able to consolidate their workloads on fewer nodes and complete jobs faster with Hpc6id. Faster job completion time helps customers to reduce both infrastructure and software licensing costs. Customers can use Hpc6id instances to quickly carry out complex calculations across a range of cluster sizes—up to tens of thousands of cores.
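As a back-of-the-envelope illustration of the consolidation argument above, the sketch below compares node counts and per-node licensing costs for a hypothetical workload. The figures (required cores, license price) are assumptions for illustration, not AWS or ISV pricing.

```python
import math

def nodes_needed(total_cores_required: int, cores_per_node: int) -> int:
    """Smallest number of nodes that supplies the required cores."""
    return math.ceil(total_cores_required / cores_per_node)

def license_cost(nodes: int, cost_per_node: float) -> float:
    """Total cost for HPC software licensed per node."""
    return nodes * cost_per_node

# Hypothetical workload: 4,096 cores, software licensed per node.
cores_required = 4096
per_node_license = 1000.0  # illustrative price, not a real quote

# Consolidating from 32-core nodes onto 64-core nodes (the vCPU count
# of hpc6id.32xlarge in the table above) halves the node count, and
# with it the per-node licensing bill.
small = nodes_needed(cores_required, 32)   # 128 nodes
large = nodes_needed(cores_required, 64)   # 64 nodes
savings = license_cost(small, per_node_license) - license_cost(large, per_node_license)
print(small, large, savings)  # 128 64 64000.0
```

The same per-node arithmetic applies to infrastructure cost: fewer, denser nodes also means fewer instances to run for the same core count.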

Customers also can use Hpc6id instances with AWS ParallelCluster to provision Hpc6id instances alongside other instance types, giving customers the flexibility to run different workload types within the same HPC cluster. Hpc6id instances benefit from the AWS Nitro System, a rich collection of building blocks that offloads many of the traditional virtualization functions to dedicated hardware and software to deliver high performance, high availability, and high security while also reducing virtualization overhead.

Now Available
Amazon EC2 Hpc6id instances are available for purchase as On-Demand or Reserved Instances or with Savings Plans. Hpc6id instances are available in the US East (Ohio) and AWS GovCloud (US-West) Regions. To optimize Amazon EC2 Hpc6id instances networking for tightly coupled workloads, use cluster placement groups within a single Availability Zone.

To learn more, visit our Hpc6id instance page and get in touch with our HPC team, AWS re:Post for EC2, or through your usual AWS Support contacts.

Channy

Our guide to AWS Compute at re:Invent 2022

Post Syndicated from Sheila Busser original https://aws.amazon.com/blogs/compute/our-guide-to-aws-compute-at-reinvent-2022/

This blog post is written by Shruti Koparkar, Senior Product Marketing Manager, Amazon EC2.

AWS re:Invent is the most transformative event in cloud computing, and it starts on November 28, 2022. The AWS Compute team has many exciting sessions planned for you, covering everything from foundational content to technology deep dives, customer stories, and even hands-on workshops. To help you build out your calendar for this year’s re:Invent, let’s look at some highlights from the AWS Compute track in this blog. Please visit the session catalog for a full list of AWS Compute sessions.

Learn what powers AWS Compute

AWS offers the broadest and deepest functionality for compute. Amazon Elastic Compute Cloud (Amazon EC2) offers granular control for managing your infrastructure with the choice of processors, storage, and networking.

The AWS Nitro System is the underlying platform for all our modern EC2 instances. It enables AWS to innovate faster, further reduce cost for our customers, and deliver added benefits like increased security and new instance types.

Discover the benefits of AWS Silicon

AWS has invested years designing custom silicon optimized for the cloud. This investment helps us deliver high performance at lower costs for a wide range of applications and workloads using AWS services.

  • Explore the AWS journey into silicon innovation with our “CMP201: Silicon Innovation at AWS” session. We will cover some of the thought processes, learnings, and results from our experience building silicon for AWS Graviton, AWS Nitro System, and AWS Inferentia.
  • To learn about customer-proven strategies to help you make the move to AWS Graviton quickly and confidently while minimizing uncertainty and risk, attend “CMP410: Framework for adopting AWS Graviton-based instances”.

Explore different use cases

Amazon EC2 provides secure and resizable compute capacity for several different use-cases including general purpose computing for cloud native and enterprise applications, and accelerated computing for machine learning and high performance computing (HPC) applications.

High performance computing

  • HPC on AWS can help you design your products faster with simulations, predict the weather, detect seismic activity with greater precision, and more. To learn how to solve the world’s toughest problems with extreme-scale compute, come join us for “CMP205: HPC on AWS: Solve complex problems with pay-as-you-go infrastructure”.
  • Single on-premises general-purpose supercomputers can fall short when solving increasingly complex problems. Attend “CMP222: Redefining supercomputing on AWS” to learn how AWS is reimagining supercomputing to provide scientists and engineers with more access to world-class facilities and technology.
  • AWS offers many solutions to design, simulate, and verify the advanced semiconductor devices that are the foundation of modern technology. Attend “CMP320: Accelerating semiconductor design, simulation, and verification” to hear from Arm and Marvell about how they are using AWS to accelerate EDA workloads.

Machine Learning

Cost Optimization

Hear from our customers

We have several sessions this year where AWS customers are taking the stage to share their stories and details of exciting innovations made possible by AWS.

Get started with hands-on sessions

Nothing like a hands-on session where you can learn by doing and get started easily with AWS compute. Our speakers and workshop assistants will help you every step of the way. Just bring your laptop to get started!

You’ll get to meet the global cloud community at AWS re:Invent and get an opportunity to learn, get inspired, and rethink what’s possible. So build your schedule in the re:Invent portal and get ready to hit the ground running. We invite you to stop by the AWS Compute booth and chat with our experts. We look forward to seeing you in Las Vegas!

Let’s Architect! Architecting with custom chips and accelerators

Post Syndicated from Luca Mezzalira original https://aws.amazon.com/blogs/architecture/lets-architect-custom-chips-and-accelerators/

It’s hard to imagine a world without computer chips. They are at the heart of the devices that we use to work and play every day. Currently, Amazon Web Services (AWS) is offering customers the next generation of computer chip, with lower cost, higher performance, and a reduced carbon footprint.

This edition of Let’s Architect! focuses on custom computer chips, accelerators, and technologies developed by AWS, such as AWS Nitro System, custom-designed Arm-based AWS Graviton processors that support data-intensive workloads, as well as AWS Trainium, and AWS Inferentia chips optimized for machine learning training and inference.

In this post, we discuss these new AWS technologies, their main characteristics, and how to take advantage of them in your architecture.

Deliver high performance ML inference with AWS Inferentia

As Deep Learning models become increasingly large and complex, the training cost for these models increases, as well as the inference time for serving.

With AWS Inferentia, machine learning practitioners can deploy complex neural-network models that are built and trained on popular frameworks, such as Tensorflow, PyTorch, and MXNet on AWS Inferentia-based Amazon EC2 Inf1 instances.

This video introduces you to the main concepts of AWS Inferentia, a custom chip designed to reduce both cost and latency for inference. To speed up inference, AWS Inferentia shares a model across multiple chips, places the pieces inside the on-chip cache, and then streams the data through a pipeline for low-latency predictions.

Presenters walk through the structure of the chip and software considerations, and share anecdotes from the Amazon Alexa team, which uses AWS Inferentia to serve predictions. If you want to learn more about high throughput coupled with low latency, explore Achieve 12x higher throughput and lowest latency for PyTorch Natural Language Processing applications out-of-the-box on AWS Inferentia on the AWS Machine Learning Blog.

AWS Inferentia shares a model across different chips to speed up inference

AWS Lambda Functions Powered by AWS Graviton2 Processor – Run Your Functions on Arm and Get Up to 34% Better Price Performance

AWS Lambda is a serverless, event-driven compute service that enables code to run from virtually any type of application or backend service, without provisioning or managing servers. Lambda uses a high-availability compute infrastructure and performs all of the administration of the compute resources, including server- and operating-system maintenance, capacity-provisioning, and automatic scaling and logging.

AWS Graviton processors are designed to deliver the best price and performance for cloud workloads. AWS Graviton3 processors are the latest in the AWS Graviton processor family and provide up to 25 percent higher compute performance, two times higher floating-point performance, and two times faster cryptographic workload performance compared with AWS Graviton2 processors. This means you can migrate AWS Lambda functions to Graviton in minutes, plus get as much as 19 percent improved performance at approximately 20 percent lower cost (compared with x86).
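As a rough sanity check on how a headline price-performance figure can be composed from the two numbers above (the exact AWS methodology may differ, so treat this as illustrative arithmetic only), cost per unit of work scales as price divided by performance:

```python
def price_performance_savings(perf_gain: float, price_cut: float) -> float:
    """Relative reduction in cost per unit of work when performance
    improves by perf_gain and price drops by price_cut.
    Cost per unit of work scales as price / performance."""
    return 1 - (1 - price_cut) / (1 + perf_gain)

# ~19% better performance at ~20% lower price, per the post.
savings = price_performance_savings(0.19, 0.20)
print(f"{savings:.1%}")  # 32.8%
```

The result lands in the same ballpark as the "up to 34%" headline; the small gap is plausibly down to rounding and workload-dependent measurements.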

Comparison between x86 and Arm/Graviton2 results for the AWS Lambda function computing prime numbers

Powering next-gen Amazon EC2: Deep dive on the Nitro System

The AWS Nitro System is a collection of building-block technologies that includes AWS-built hardware offload and security components. It is powering the next generation of Amazon EC2 instances, with a broadening selection of compute, storage, memory, and networking options.

In this session, dive deep into the Nitro System, reviewing its design and architecture, exploring new innovations to the Nitro platform, and understanding how it allows for faster innovation and increased security while reducing costs.

Traditionally, hypervisors protect the physical hardware and BIOS; virtualize the CPU, storage, and networking; and provide a rich set of management capabilities. With the AWS Nitro System, AWS breaks apart those functions and offloads them to dedicated hardware and software.

AWS Nitro System separates functions and offloads them to dedicated hardware and software, in place of a traditional hypervisor

How Amazon migrated a large ecommerce platform to AWS Graviton

In this re:Invent 2021 session, we learn about the benefits Amazon’s ecommerce Datapath platform has realized with AWS Graviton.

With a range of 25%-40% performance gains across 53,000 Amazon EC2 instances worldwide for Prime Day 2021, the Datapath team is lowering their internal costs with AWS Graviton’s improved price performance. Explore the software updates that were required to achieve this and the testing approach used to optimize and validate the deployments. Finally, learn about the Datapath team’s migration approach that was used for their production deployment.

AWS Graviton2: core components

See you next time!

Thanks for exploring custom computer chips, accelerators, and technologies developed by AWS. Join us in a couple of weeks when we talk more about architectures and the daily challenges faced while working with distributed systems.

Other posts in this series

Looking for more architecture content?

AWS Architecture Center provides reference architecture diagrams, vetted architecture solutions, Well-Architected best practices, patterns, icons, and more!

New – Amazon EC2 X2idn and X2iedn Instances for Memory-Intensive Workloads with Higher Network Bandwidth

Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/new-amazon-ec2-x2idn-and-x2iedn-instances-for-memory-intensive-workloads-with-higher-network-bandwidth/

In 2016, we launched Amazon EC2 X1 instances designed for large-scale and in-memory applications in the cloud. The price per GiB of RAM for X1 instances is among the lowest. X1 instances are ideal for high performance computing (HPC) applications and running in-memory databases like SAP HANA and big data processing engines such as Apache Spark or Presto.

The following year, we launched X1e instances with up to 4 TiB of memory designed to run SAP HANA and other memory-intensive, in-memory applications. These instances are certified by SAP to run production environments of the next-generation Business Suite S/4HANA, Business Suite on HANA (SoH), Business Warehouse on HANA (BW), and Data Mart Solutions on HANA on the AWS Cloud.

Today, I am happy to announce the general availability of Amazon EC2 X2idn/X2iedn instances, built on the AWS Nitro system and featuring the third-generation Intel Xeon Scalable (Ice Lake) processors with up to 50 percent higher compute price performance than comparable X1 instances. These improvements result in up to 45 percent higher SAP Application Performance Standard (SAPS) performance than comparable X1 instances.

You might have noticed that we’re now using the “i” suffix in the instance type to specify that the instances are using an Intel processor, “e” in the memory-optimized instance family to indicate extended memory, “d” with local NVMe-based SSDs that are physically connected to the host server, and “n” to support higher network bandwidth up to 100 Gbps.
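The naming scheme described above can be sketched as a small lookup table. The suffix meanings follow the post; the parsing logic is my own illustrative simplification (real EC2 family names encode more than this, so don't rely on it as a general parser).

```python
SUFFIX_MEANINGS = {
    "i": "Intel processor",
    "e": "extended memory",
    "d": "local NVMe-based SSD storage",
    "n": "higher network bandwidth",
}

def decode_family_suffixes(family: str) -> list:
    """Decode the trailing letters of an EC2 family name, e.g. 'x2iedn'.

    Illustrative only: assumes the name is a prefix, a generation digit,
    then suffix letters, which holds for the instances in this post.
    """
    # Everything after the last digit is the suffix string.
    idx = max(i for i, ch in enumerate(family) if ch.isdigit())
    suffixes = family[idx + 1:]
    return [SUFFIX_MEANINGS[ch] for ch in suffixes if ch in SUFFIX_MEANINGS]

print(decode_family_suffixes("x2iedn"))
# ['Intel processor', 'extended memory', 'local NVMe-based SSD storage',
#  'higher network bandwidth']
```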

X2idn instances offer up to 2 TiB of memory, while X2iedn instances offer up to 4 TiB. X2idn and X2iedn instances also support 100 Gbps of network performance with hardware-enabled VPC encryption, along with up to 80 Gbps of Amazon EBS bandwidth and 260K IOPS with EBS-encrypted volumes.

Instance Name vCPUs RAM (GiB) Local NVMe SSD Storage (GB) Network Bandwidth (Gbps) EBS-Optimized Bandwidth (Gbps)
x2idn.16xlarge 64 1024 1 x 1900 Up to 50 Up to 40
x2idn.24xlarge 96 1536 1 x 1425 75 60
x2idn.32xlarge 128 2048 2 x 1900 100 80
x2iedn.xlarge 4 128 1 x 118 Up to 25 Up to 20
x2iedn.2xlarge 8 256 1 x 237 Up to 25 Up to 20
x2iedn.4xlarge 16 512 1 x 475 Up to 25 Up to 20
x2iedn.8xlarge 32 1024 1 x 950 25 20
x2iedn.16xlarge 64 2048 1 x 1900 50 40
x2iedn.24xlarge 96 3072 2 x 1425 75 60
x2iedn.32xlarge 128 4096 2 x 1900 100 80
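Using the specs from the table above, a quick calculation confirms the memory-per-vCPU ratios: every X2idn size comes out at 16 GiB per vCPU, and every X2iedn size at 32 GiB per vCPU. (Only a few sizes are shown here; the values are copied straight from the table.)

```python
# (vCPUs, RAM in GiB) for a few sizes, taken from the table above
X2_SPECS = {
    "x2idn.16xlarge":  (64, 1024),
    "x2idn.32xlarge":  (128, 2048),
    "x2iedn.xlarge":   (4, 128),
    "x2iedn.32xlarge": (128, 4096),
}

def gib_per_vcpu(instance: str) -> float:
    """Memory-to-vCPU ratio for a given instance size."""
    vcpus, ram_gib = X2_SPECS[instance]
    return ram_gib / vcpus

for name in X2_SPECS:
    print(f"{name}: {gib_per_vcpu(name):.0f} GiB/vCPU")
# x2idn sizes -> 16 GiB/vCPU, x2iedn sizes -> 32 GiB/vCPU
```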

X2idn instances are ideal for running large in-memory databases such as SAP HANA. All of the X2idn instance sizes are certified by SAP for production HANA and S/4HANA workloads. In addition, X2idn instances are ideal for memory-intensive and latency-sensitive workloads such as Apache Spark and Presto, and for generating real-time analytics, processing giant graphs using Neo4j or Titan, or creating enormous caches.

X2iedn instances are optimized for applications that need a high memory-to-vCPU ratio and deliver the highest memory capacity per vCPU among all virtualized EC2 instance types. X2iedn is suited to running high-performance databases (such as Oracle DB and SQL Server) and in-memory workloads (such as SAP HANA and Redis). Workloads that are sensitive to per-core licensing, such as Oracle DB, greatly benefit from the higher memory per vCPU (32 GiB per vCPU) offered by X2iedn. X2iedn lets you optimize licensing costs because it provides the same memory at half the number of vCPUs compared to X2idn.

These instances offer the same amount of local storage as X1/X1e, up to 3.8 TB, but the local storage in X2idn/X2iedn is NVMe-based, which offers an order of magnitude lower latency compared to the SATA SSDs in X1/X1e.

Things to Know
Here are some fun facts about the X2idn and X2iedn instances:

Optimizing CPU—You can disable Intel Hyper-Threading Technology for workloads that perform well with single-threaded CPUs, like some HPC applications.

NUMA—You can make use of non-uniform memory access (NUMA) on X2idn and X2iedn instances. This advanced feature is worth exploring if you have a deep understanding of your application’s memory access patterns.
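For intuition, here is a toy sketch of NUMA-aware placement: pinning worker ranks so that each stays within one NUMA domain and avoids remote-memory accesses. The vCPU and domain counts are illustrative; on a real instance you would read the actual topology with a tool such as lscpu or numactl.

```python
def numa_node_for_rank(rank: int, vcpus: int, numa_nodes: int) -> int:
    """Map a worker rank (one per vCPU) to a NUMA domain, assuming vCPUs
    are split evenly and contiguously across domains (illustrative model,
    not a real topology query)."""
    vcpus_per_node = vcpus // numa_nodes
    return (rank % vcpus) // vcpus_per_node

# Toy example: 128 vCPUs split across 2 NUMA domains.
assignments = [numa_node_for_rank(r, 128, 2) for r in range(128)]
print(assignments[0], assignments[63], assignments[64], assignments[127])  # 0 0 1 1
```

Keeping each rank's memory allocations on its own domain is the payoff: local accesses are faster than cross-domain ones, which is exactly what understanding your application's memory access patterns lets you exploit.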

Available Now
X2idn instances are now available in the US East (N. Virginia), Asia Pacific (Mumbai, Singapore, Tokyo), Europe (Frankfurt, Ireland) Regions.

X2iedn instances are now available in the US East (Ohio, N. Virginia), US West (Oregon), Asia Pacific (Singapore, Tokyo), Europe (Frankfurt, Ireland) Regions.

You can use On-Demand Instances, Reserved Instances, Savings Plan, and Spot Instances. Dedicated Instances and Dedicated Hosts are also available.

To learn more, visit our EC2 X2i Instances page, and please send feedback to AWS re:Post for EC2 or through your usual AWS Support contacts.

Channy

New – Amazon EC2 Hpc6a Instance Optimized for High Performance Computing

Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/new-amazon-ec2-hpc6a-instance-optimized-for-high-performance-computing/

High Performance Computing (HPC) allows scientists and engineers to solve complex, compute-intensive problems such as computational fluid dynamics (CFD), weather forecasting, and genomics. HPC applications typically require instances with high memory bandwidth; a low-latency, high-bandwidth network interconnect; and access to a fast parallel file system.

Many customers have turned to AWS to run their HPC workloads. For example, Descartes Labs used AWS to power a TOP500 LINPACK benchmark run (the TOP500 list ranks the most powerful commercially available computer systems) that delivered 1.93 PFLOPS, landing at position 136 on the TOP500 list in June 2019. That run made use of 41,472 cores on a cluster of Amazon EC2 C5 instances. Last year, Descartes Labs ran the LINPACK benchmark again and placed within the top 40 of the June 2021 TOP500 list with 172,692 cores on a cluster of EC2 instances, which represents a 417 percent performance increase in just two years.

AWS enables you to increase the speed of research and reduce time-to-results by running HPC in the cloud and scaling to tens of thousands of parallel tasks that wouldn’t be practical in most on-premises environments. AWS helps you reduce costs by providing CPU, GPU, and FPGA instances on demand; Elastic Fabric Adapter (EFA), an EC2 network device that improves throughput and scaling for tightly coupled workloads; and AWS ParallelCluster, an open-source cluster management tool that makes it easy for you to deploy and manage HPC clusters on AWS.

Announcing EC2 Hpc6a Instances for HPC Workloads
Customers today across various industries use compute-optimized EFA-enabled Amazon EC2 instances (for example, C5n, R5n, M5n, and M5zn) to maximize the performance of a variety of HPC workloads, but as these workloads scale to tens of thousands of cores, cost-efficiency becomes increasingly important. We have found that customers are not only looking to optimize performance for their HPC workloads but want to optimize costs as well.

As we pre-announced in November 2021, Hpc6a, a new HPC-optimized EC2 instance, is generally available beginning today. This instance delivers 100 Gbps networking through EFA with 96 third-generation AMD EPYC™ processor (Milan) cores with 384 GB RAM, and offers up to 65 percent better price-performance over comparable x86-based compute-optimized instances.

You can launch Hpc6a instances today in the US East (Ohio) and AWS GovCloud (US-West) Regions, as On-Demand or Dedicated Instances or as part of a Savings Plan. Here are the detailed specs:

Instance Name CPUs* RAM EFA Network Bandwidth Attached Storage
hpc6a.48xlarge 96 384 GiB Up to 100 Gbps EBS Only

*Hpc6a instances have simultaneous multi-threading disabled to optimize for HPC codes. This means that unlike other EC2 instances, Hpc6a vCPUs are physical cores, not threads.

To enable predictable thread performance and efficient scheduling for HPC workloads, simultaneous multi-threading is disabled. Thanks to the AWS Nitro System, no cores are held back for the hypervisor, making all cores available to your code.

Hpc6a instances introduce a number of targeted features to deliver cost and performance optimizations for customers running tightly coupled HPC workloads that rely on high levels of inter-instance communications. These instances enable EFA networking bandwidth of 100 Gbps and are designed to efficiently scale large tightly coupled clusters within a single Availability Zone.

We hear from many of our engineering customers, such as those in the automotive sector, that they want to reduce the need for physical testing and move toward an increasingly virtual, simulation-based product design process, faster and at a lower cost.

According to our benchmarking results for a Siemens Simcenter STAR-CCM+ automotive CFD simulation, when Hpc6a scales up to 400 nodes (approximately 40,000 cores), with the help of EFA networking it is able to maintain approximately 100 percent scaling efficiency. Hpc6a also shows 70 percent lower cost compared to C5n, which means companies can deliver new designs faster and at a lower cost when using Hpc6a instances.
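Scaling efficiency, as quoted for the STAR-CCM+ result above, is conventionally the achieved speedup divided by the ideal (linear) speedup. A minimal sketch, with made-up illustrative timings rather than the Siemens benchmark data:

```python
def scaling_efficiency(t_base: float, n_base: int,
                       t_scaled: float, n_scaled: int) -> float:
    """Parallel scaling efficiency: achieved speedup / ideal speedup."""
    speedup = t_base / t_scaled        # how much faster the job ran
    ideal = n_scaled / n_base          # linear scaling would give this
    return speedup / ideal

# Made-up illustration: a job taking 400 min on 10 nodes finishes
# in 10.5 min on 400 nodes -> ~95% efficiency.
eff = scaling_efficiency(400.0, 10, 10.5, 400)
print(f"{eff:.0%}")  # 95%
```

An efficiency near 100 percent, as reported for Hpc6a at 400 nodes, means adding nodes shortens the run almost in exact proportion to their cost.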

You can use the Hpc6a instance with AMD EPYC third-generation (Milan) processors to run your largest and most complex HPC simulations on EC2 and optimize for cost and performance. Customers can also use the new Hpc6a instances with AWS Batch and AWS ParallelCluster to simplify workload submission and cluster creation.

To learn more, visit our Hpc6a instance page and get in touch with our HPC team, AWS re:Post for EC2, or through your usual AWS Support contacts.

Channy

Monitoring dashboard for AWS ParallelCluster

Post Syndicated from Ben Peven original https://aws.amazon.com/blogs/compute/monitoring-dashboard-for-aws-parallelcluster/

This post is contributed by Nicola Venuti, Sr. HPC Solutions Architect.

AWS ParallelCluster is an AWS-supported, open source cluster management tool that makes it easy to deploy and manage High Performance Computing (HPC) clusters on AWS. While AWS ParallelCluster includes many benefits for its users, it has not provided straightforward support for monitoring your workloads. In this post, I walk you through an add-on extension that you can use to help monitor your cloud resources with AWS ParallelCluster.

Product overview

AWS ParallelCluster offers many benefits that hide the complexity of the underlying platform and processes. Some of these benefits include:

  • Automatic resource scaling
  • A batch scheduler
  • Easy cluster management that allows you to build and rebuild your infrastructure without the need for manual actions
  • Seamless migration to the cloud supporting a wide variety of operating systems.

While all of these benefits streamline processes for you, one crucial task for HPC stakeholders that customers have found challenging with AWS ParallelCluster is monitoring.

Resources in an HPC cluster like schedulers, instances, storage, and networking are important assets to monitor, especially when you pay for what you use. Organizing and displaying all these metrics becomes challenging as the scale of the infrastructure increases or changes over time, which is the typical scenario in the cloud.

Given these challenges, AWS wants to provide AWS ParallelCluster users with two additional benefits: 1/ facilitate the optimization of price performance and 2/ visualize and monitor the components of cost for their workloads.

The HPC Solution Architects team has created an AWS ParallelCluster add-on that is easy to use and customize. In this blog post, I demonstrate how Grafana – a platform for monitoring and observability – can run on AWS ParallelCluster to enable infrastructure monitoring.

AWS ParallelCluster integrated dashboard and Grafana add-on dashboards

Many customers want a tool that consolidates information coming from different AWS services and makes it easy to monitor the computing infrastructure created and managed by AWS ParallelCluster. The AWS ParallelCluster team – just like every other service team at AWS – is open and keen to listen to our customers.

This feedback helps us inform our product roadmap and build new, customer-focused features.

We recently released AWS ParallelCluster 2.10.0 based on customer feedback. Here are some key features that were released with it:

  • Support for the CentOS 8 operating system: Customers can now choose CentOS 8 as their base operating system of choice to run their clusters, for both x86 and Arm architectures.
  • Support for P4d instances along with NVIDIA GPUDirect Remote Direct Memory Access (RDMA).
  • FSx for Lustre enhancements (support for FSx AutoImport and HDD-based file system options)
  • And finally, a new CloudWatch dashboard designed to help aggregate cluster information, metrics, and logging already available on CloudWatch.

The last of these features, the integrated CloudWatch dashboard, is designed to help customers face the cluster monitoring challenge mentioned above. In addition, the Grafana dashboards demonstrated later in this blog are a complementary add-on to this new CloudWatch-based dashboard. There are some key differences between the two, summarized in the table below.

The CloudWatch-based dashboard does not require any additional component or agent running on either the head or the compute nodes. It aggregates the metrics already pushed by AWS ParallelCluster to CloudWatch into a single dashboard. This translates into zero overhead for the cluster, but at the expense of less flexibility and expandability.

The Grafana-based dashboard, instead, offers additional flexibility and customizability, and requires a few lightweight components installed on either the head or the compute nodes. Another key difference between the two monitoring dashboards is access control: the CloudWatch-based one requires AWS credentials, configured IAM users and roles, and access to the AWS Management Console, while the Grafana-based one has its own built-in authentication and authorization system, independent of the AWS account. This matters because end users or HPC admins might not have permission (or simply might not want) to access the AWS Management Console in order to monitor their clusters.

CloudWatch-based dashboard                  | Grafana-based monitoring add-on
No additional components                    | Grafana + Prometheus
No overhead                                 | Minimal overhead
Little to no expandability                  | Supports full customizability
Requires AWS credentials and IAM configured | Custom credentials, no AWS access required
                                            | Custom user-interface

Grafana add-on dashboards on GitHub

There are many components of an HPC cluster to monitor. Moreover, the cluster is built on a system that is continuously evolving: AWS ParallelCluster and AWS services in general are updated, and new features are released often. Because of this, we wanted a monitoring solution built on flexible components that can evolve rapidly, so we released a Grafana add-on as an open-source project in this GitHub repository.

Releasing it as an open-source project allows us to more easily and frequently release new updates and enhancements. It also enables users to customize and extend its functionality by adding new dashboards or extending existing ones (for example, with GPU monitoring or tracking of EC2 Spot Instance interruptions).

At the moment, this new solution is composed of the following open-source components:

  • Grafana is an open-source platform for monitoring and observability. Grafana allows you to query, visualize, alert on, and understand your metrics. It also helps you create, explore, and share dashboards, fostering a data-driven culture.
  • Prometheus is an open source project for systems and service monitoring from the Cloud Native Computing Foundation. It collects metrics from configured targets at given intervals, evaluates rule expressions, displays the results, and can trigger alerts if some condition is observed to be true.
  • The Prometheus Pushgateway is an open source tool that allows ephemeral and batch jobs to expose their metrics to Prometheus.
  • NGINX is an HTTP and reverse proxy server, a mail proxy server, and a generic TCP/UDP proxy server.
  • Prometheus-Slurm-Exporter is a Prometheus collector and exporter for metrics extracted from the Slurm resource scheduling system.
  • Node_exporter is a Prometheus exporter for hardware and OS metrics exposed by *NIX kernels, written in Go with pluggable metric collectors.

Note: while almost all components are under the Apache 2.0 license, Prometheus-Slurm-Exporter is licensed under GPLv3. You should be aware of this license and accept its terms before proceeding to install this component.
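The exporters and the Pushgateway listed above all speak Prometheus's text exposition format, which Prometheus scrapes at each interval. Here is a minimal sketch of one such sample line; the metric name and label are illustrative, not the actual series that Prometheus-Slurm-Exporter emits.

```python
def exposition_line(name: str, labels: dict, value) -> str:
    """Render one sample in the Prometheus text exposition format:
    metric_name{label="value",...} sample_value"""
    label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return f"{name}{{{label_str}}} {value}"

# Hypothetical queue-depth sample a Slurm exporter might expose.
line = exposition_line("slurm_queue_pending", {"partition": "compute"}, 12)
print(line)  # slurm_queue_pending{partition="compute"} 12
```

Prometheus collects lines like this from each configured target, stores them as time series, and Grafana then queries those series to render the dashboards described below.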

The dashboards

I demonstrate a few different Grafana dashboards in this post. These dashboards are available for you in the AWS Samples GitHub repository. In addition, two dashboards – still under development – are proposed in beta. The first shows the cluster logs coming from Amazon CloudWatch Logs. The second one shows the costs associated to each AWS service utilized.

All these dashboards can be used as they are or customized as you need:

  • AWS ParallelCluster Summary – This is the main dashboard that shows general monitoring info and metrics for the whole cluster. It includes: Slurm metrics, compute related metrics, storage performance metrics, and network usage.

ParallelCluster dashboard

  • Master Node Details – This dashboard shows detailed metrics for the head node, including CPU, memory, network, and storage utilization.

Master node details

  • Compute Node List – This dashboard shows the list of the available compute nodes. Each entry is a link to a more detailed dashboard: the compute node details (see the following image).

Compute node list

  • Compute Node Details – Similar to the head node details, this dashboard shows detailed metrics for the compute nodes.
  • Cluster Logs – This dashboard (still under development) shows all the logs of your HPC cluster. The logs are pushed by AWS ParallelCluster to Amazon CloudWatch Logs and are reported here.

Cluster logs

  • Cluster Costs (also under development) – This dashboard shows the cost associated to AWS Service utilized by your cluster. It includes: EC2, Amazon EBS, FSx, Amazon S3, Amazon EFS. as well as an aggregation of all the costs of every single component.

Cluster costs

How to deploy it

You can simply use the post-install script that you can find in this GitHub repo as it is, or customize it as you need. For instance, you might want to change your Grafana password to something more secure and meaningful for you, or you might want to customize some dashboards by adding additional components to monitor.

#Load AWS ParallelCluster environment variables
. /etc/parallelcluster/cfnconfig
#get GitHub repo to clone and the installation script
monitoring_url=$(echo ${cfn_postinstall_args}| cut -d ',' -f 1 )
monitoring_dir_name=$(echo ${cfn_postinstall_args}| cut -d ',' -f 2 )
monitoring_tarball="${monitoring_dir_name}.tar.gz"
setup_command=$(echo ${cfn_postinstall_args}| cut -d ',' -f 3 )
monitoring_home="/home/${cfn_cluster_user}/${monitoring_dir_name}"
case ${cfn_node_type} in
    MasterServer)
        wget ${monitoring_url} -O ${monitoring_tarball}
        mkdir -p ${monitoring_home}
        tar xvf ${monitoring_tarball} -C ${monitoring_home} --strip-components 1
    ;;
    ComputeFleet)
    ;;
esac
#Execute the monitoring installation script
bash -x "${monitoring_home}/parallelcluster-setup/${setup_command}" >/tmp/monitoring-setup.log 2>&1
exit $?

The proposed post-install script takes care of installing and configuring everything for you, although a few additional parameters are needed in the AWS ParallelCluster configuration file: the post-install argument, additional IAM policies, a custom security group, and a tag. You can find an AWS ParallelCluster template here.

Please note that, at the moment, the post-install script has only been tested on Amazon Linux 2.

Add the following parameters to your AWS ParallelCluster configuration file, and then build up your cluster:

base_os = alinux2
post_install = s3://<my-bucket-name>/post-install.sh
post_install_args = https://github.com/aws-samples/aws-parallelcluster-monitoring/tarball/main,aws-parallelcluster-monitoring,install-monitoring.sh
additional_iam_policies = arn:aws:iam::aws:policy/CloudWatchFullAccess,arn:aws:iam::aws:policy/AWSPriceListServiceFullAccess,arn:aws:iam::aws:policy/AmazonSSMFullAccess,arn:aws:iam::aws:policy/AWSCloudFormationReadOnlyAccess
tags = {"Grafana" : "true"}

Make sure that ports 80 and 443 of your head node are accessible from the internet or from your network. You can achieve this by creating the appropriate security group via the AWS Management Console or via the AWS Command Line Interface (AWS CLI), as in the following example:

aws ec2 create-security-group --group-name my-grafana-sg --description "Open Grafana dashboard ports" --vpc-id vpc-1a2b3c4d
aws ec2 authorize-security-group-ingress --group-id sg-12345 --protocol tcp --port 443 --cidr 0.0.0.0/0
aws ec2 authorize-security-group-ingress --group-id sg-12345 --protocol tcp --port 80 --cidr 0.0.0.0/0

There is more information on how to create your security groups here.

Finally, set the additional_sg parameter in the [VPC] section of your AWS ParallelCluster configuration file.

After your cluster is created, you can open a web browser and connect to https://your_public_ip. You should see a landing page with links to the Prometheus database service and the Grafana dashboards.

Size your compute and head nodes

We looked into the resource utilization of the components required for building this monitoring solution. In particular, the Prometheus node exporter installed on the compute nodes consumes a small (almost negligible) amount of CPU, memory, and network resources.

Depending on the size of your cluster, the components installed on the head node (see the list in the “Solution components” section of this blog) might require additional CPU, memory, and network capacity. In particular, if you expect to run a large-scale cluster (hundreds of instances), we recommend you use a larger instance type than you originally planned, because of the higher volume of network traffic generated by the compute nodes continuously pushing metrics to the head node.

We cannot advise you exactly how to size your head node because many factors can influence resource utilization. The best recommendation we can give is to use the Grafana dashboard itself to monitor the CPU, memory, disk, and, most importantly, network utilization, and then resize your head node (or other components) accordingly.

Conclusions

This monitoring solution for AWS ParallelCluster is an add-on for your HPC Cluster on AWS and a complement to new features recently released in AWS ParallelCluster 2.10. This blog aimed to provide you with instructions and basic tooling that can be customized based on your needs and that can be adapted quickly as the underlying AWS services evolve.

We look forward to hearing from you; feedback, issues, and pull requests to extend its functionality are welcome.

Amazon EC2 P4d instances deep dive

Post Syndicated from Neelay Thaker original https://aws.amazon.com/blogs/compute/amazon-ec2-p4d-instances-deep-dive/

This post is contributed by Amr Ragab, Senior Solutions Architect, Amazon EC2

Introduction

AWS is excited to announce that the new Amazon EC2 P4d instances are now generally available. This instance type delivers up to 2.5x higher deep learning performance and adds new features and technical breakthroughs to our accelerated instance portfolio that our customers can benefit from with this latest technology. This blog post details some of those key features and how to integrate them into your current workloads and architectures.

Overview

P4d instances

As you can see from the generalized block diagram above, the p4d comes with dual socket Intel Cascade Lake 8275CL processors totaling 96 vCPUs at 3.0 GHz with 1.1 TB of RAM and 8 TB of NVMe local storage. P4d also comes with 8 x 40 GB NVIDIA Tesla A100 GPUs with NVSwitch and 400 Gbps Elastic Fabric Adapter (EFA) enabled networking. This instance configuration represents the latest generation of computing for our customers spanning Machine Learning (ML), High Performance Computing (HPC), and analytics.

One of the improvements of the p4d is in the networking stack.  This new instance type has 400 Gbps with support for EFA and GPUDirect RDMA. Now, on AWS, you can take advantage of point-to-point GPU to GPU communication (across nodes), bypassing the CPU. Look out for additional blogs and webinars detailing use cases of GPUDirect and how this feature helps decrease latency and improve performance for certain workloads.

Let’s look at some new features and performance metrics for the P4d instances.

Features

Local ephemeral NVMe storage
The p4d instance type comes with 8 TB of local NVMe storage. Each device has a maximum read/write throughput of 2.7 GB/s. To create a local namespace and staging area for input into the GPUs, you can create a local RAID 0 array across all the drives. This results in an aggregate read throughput of about 16 GB/s. The following table summarizes the I/O tests on the NVMe drives in this configuration.

   FIO Test            Block Size   Threads   Bandwidth
1  Sequential Read        128k         96     16.4 GiB/s
2  Sequential Write       128k         96      8.2 GiB/s
3  Random Read            128k         96     16.3 GiB/s
4  Random Write           128k         96      8.1 GiB/s
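A RAID 0 array like the one used for these tests is typically assembled with mdadm. The sketch below only prints the commands it would run, since creating the array requires root privileges and the actual p4d hardware; the /dev/nvme1n1 … /dev/nvme8n1 device names are an assumption — check yours with lsblk first.

```shell
# Assumed device names for the eight local NVMe drives; verify with lsblk.
DEVICES="/dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1 /dev/nvme4n1 /dev/nvme5n1 /dev/nvme6n1 /dev/nvme7n1 /dev/nvme8n1"

# Print the commands rather than running them (they need root and real hardware):
echo "mdadm --create /dev/md0 --level=0 --raid-devices=8 ${DEVICES}"
echo "mkfs.xfs /dev/md0"
echo "mount /dev/md0 /scratch"
```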

NVSwitch

Introduced with the p4d instance type is NVSwitch. Every GPU in the node is connected to every other GPU in a full mesh topology with up to 600 GB/s of bidirectional bandwidth. ML frameworks and HPC applications that use the NVIDIA Collective Communications Library (NCCL) can take full advantage of this all-to-all communication layer.

P4d GPU to GPU bandwidth

P3 GPU to GPU bandwidth

P4d uses a full mesh NVLink topology for optimized all-to-all communication, compared to the previous generation P3/P3dn instances, which have all-to-all communication across various data path domains (NUMA, PCIe switch, NVLink). This new topology, accessed via NCCL, improves performance for multi-GPU workloads.
To make optimal use of the NVSwitch, ensure that all GPU application boost clocks on your instance are set to their maximum values:

sudo nvidia-smi -ac 1215,1410

Multi-Instance GPU (MIG)

It’s now possible, at the user level, to fractionate a GPU into multiple GPU slices, with each slice isolated from the others. This enables multiple users to run different workloads on the same GPU without impacting performance. I walk you through an example implementation of MIG in the following steps:

MIG is disabled on every newly launched instance, so you must enable it with the following command:

ubuntu@ip-172-31-34-6:~# sudo nvidia-smi -mig 1 

Enabled MIG Mode for GPU 00000000:10:1C.0
You can also get a list of the supported MIG profiles.
Next, create seven slices, and then create a compute instance for each slice:
ubuntu@ip-172-31-34-6:~# sudo nvidia-smi mig -cgi 19,19,19,19,19,19,19 
Successfully created GPU instance ID 9 on GPU 0 using profile MIG 1g.5gb (ID 19) 
Successfully created GPU instance ID 7 on GPU 0 using profile MIG 1g.5gb (ID 19) 
Successfully created GPU instance ID 8 on GPU 0 using profile MIG 1g.5gb (ID 19) 
Successfully created GPU instance ID 11 on GPU 0 using profile MIG 1g.5gb (ID 19) 
Successfully created GPU instance ID 12 on GPU 0 using profile MIG 1g.5gb (ID 19) 
Successfully created GPU instance ID 13 on GPU 0 using profile MIG 1g.5gb (ID 19) 
Successfully created GPU instance ID 14 on GPU 0 using profile MIG 1g.5gb (ID 19)
ubuntu@ip-172-31-34-6:~# nvidia-smi mig -cci -gi 7,8,9,11,12,13,14 
Successfully created compute instance ID 0 on GPU 0 GPU instance ID 7 using profile MIG 1g.5gb (ID 0) 
Successfully created compute instance ID 0 on GPU 0 GPU instance ID 8 using profile MIG 1g.5gb (ID 0) 
Successfully created compute instance ID 0 on GPU 0 GPU instance ID 9 using profile MIG 1g.5gb (ID 0) 
Successfully created compute instance ID 0 on GPU 0 GPU instance ID 11 using profile MIG 1g.5gb (ID 0) 
Successfully created compute instance ID 0 on GPU 0 GPU instance ID 12 using profile MIG 1g.5gb (ID 0) 
Successfully created compute instance ID 0 on GPU 0 GPU instance ID 13 using profile MIG 1g.5gb (ID 0) 
Successfully created compute instance ID 0 on GPU 0 GPU instance ID 14 using profile MIG 1g.5gb (ID 0)

You can split a GPU into a maximum of seven slices. To pass a GPU slice through to a Docker container, you can specify the index pair at runtime:

docker run -it --gpus '"device=1:0"' nvcr.io/nvidia/tensorflow:20.09-tf1-py3

With MIG, you can run multiple smaller workloads on the same GPU without compromising performance. We will follow up with additional blogs on this feature as we integrate it with additional AWS services.

NVIDIA GPUDirect RDMA over EFA

For workloads optimized for multi-GPU capabilities, we introduced GPUDirect RDMA over the EFA fabric. This allows direct GPU-to-GPU communication across multiple p4d nodes for decreased latency and improved performance. Follow this user guide to get started with installing the EFA driver and setting up the environment. The code sample below can be used as a template for running GPUDirect RDMA over EFA.

/opt/amazon/openmpi/bin/mpirun \
     -n ${NUM_PROCS} -N ${NUM_PROCS_NODE} \
     -x RDMAV_FORK_SAFE=1 -x NCCL_DEBUG=info \
     -x FI_EFA_USE_DEVICE_RDMA=1 \
     --hostfile ${HOSTS_FILE} \
     --mca pml ^cm --mca btl tcp,self --mca btl_tcp_if_exclude lo,docker0 --bind-to none \
     $HOME/nccl-tests/build/all_reduce_perf -b 8 -e 4G -f 2 -g 1 -c 1 -n 100
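The --hostfile argument above expects a standard Open MPI host file listing one node per line. For two hypothetical p4d nodes (private IPs are placeholders), it could look like:

```
# Hypothetical private IPs of two p4d.24xlarge nodes, 8 GPU processes each
172.31.10.11 slots=8
172.31.10.12 slots=8
```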

Machine learning Optimizations

You can quickly get started with all the benefits mentioned earlier for the p4d by using our latest Deep Learning AMI (DLAMI). The DLAMI now comes with CUDA 11 and the latest NVLink and cuDNN SDKs and drivers to take advantage of the p4d.

TensorFloat32 – TF32

TF32 is a new 19-bit precision datatype from NVIDIA, introduced for the first time on the p4d.24xlarge instance. This datatype improves performance with little to no loss of training and validation accuracy for most mainstream models. We have more detailed benchmarks for individual algorithms, but on the p4d.24xlarge you can achieve approximately a 2.5-fold increase compared to FP32 on the p3dn.24xlarge for mainstream deep learning models.

We have updated our machine learning models here to show examples (see the following chart) of popular algorithms our customers are using today, including general DNNs and BERT.

   DNN          P3dn FP32    P3dn FP16    P4d TF32     P4d FP16     P4d over P3dn   P4d over P3dn
                (imgs/sec)   (imgs/sec)   (imgs/sec)   (imgs/sec)   TF32 vs FP32    FP16 vs FP16
1  Resnet50       3057         7413         6841        15621           2.2             2.1
2  Resnet152      1145         2644         2823         5700           2.5             2.2
3  Inception3     2010         4969         4808        10433           2.4             2.1
4  Inception4      847         1778         2025         3811           2.4             2.1
5  VGG16          1202         2092         4532         7240           3.8             3.5
6  Alexnet       32198        50708        82192       133068           2.6             2.6
7  SSD300         1554         2918         3467         6016           2.2             2.1

BERT Large – Wikipedia/Books Corpus

   GPUs   Sequence   Batch size / GPU          Gradient Accumulation     Throughput
          Length     (mixed precision, TF32)   (mixed precision, TF32)   (mixed precision)
1    1      128           64, 64                   1024, 1024                372
2    4      128           64, 64                    256, 256                1493
3    8      128           64, 64                    128, 128                2936
4    1      512            16, 8                   2048, 4096                 77
5    4      512            16, 8                    512, 1024                303
6    8      512            16, 8                    256, 512                 596

You can find other code examples at github.com/NVIDIA/DeepLearningExamples.

If you want to build your own AMI or extend an AMI maintained by your organization, you can use the following GitHub repo, which provides Packer scripts to build AMIs for Amazon Linux 2 or Ubuntu 18.04:

https://github.com/aws-samples/aws-efa-nccl-baseami-pipeline

The stack includes the following components:

  • NVIDIA Driver 450.80.02
  • CUDA 11
  • NVIDIA Fabric Manager
  • cuDNN 8
  • NCCL 2.7.8
  • EFA latest driver
  • AWS-OFI-NCCL
  • FSx kernel and client driver and utilities
  • Intel OneDNN
  • NVIDIA-runtime Docker

Conclusion

Get started with the new P4d instances with support on Amazon EKS, AWS Batch, and Amazon SageMaker. We are excited to hear about what you develop and run with the new P4d instances. If you have any questions, please reach out to your account team. Now, go power up your ML and HPC workloads with NVIDIA Tesla A100 GPUs and the P4d instances.

New – GPU-Equipped EC2 P4 Instances for Machine Learning & HPC

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-gpu-equipped-ec2-p4-instances-for-machine-learning-hpc/

The Amazon EC2 team has been providing our customers with GPU-equipped instances for nearly a decade. The first-generation Cluster GPU instances were launched in late 2010, followed by the G2 (2013), P2 (2016), P3 (2017), G3 (2017), P3dn (2018), and G4 (2019) instances. Each successive generation incorporates increasingly-capable GPUs, along with enough CPU power, memory, […]

Introducing retry strategies for AWS Batch

Post Syndicated from Emma White original https://aws.amazon.com/blogs/compute/introducing-retry-strategies-for-aws-batch/

This post is contributed by Christian Kniep, Sr. Developer Advocate, HPC and AWS Batch.

Scientists, researchers, and engineers are using AWS Batch to run workloads reliably at scale, and to offload the undifferentiated heavy lifting in their day-to-day work. But because there is always a slight chance of failure somewhere in the stack, the need to mitigate these failures reminds customers that infrastructure, middleware, and software are not error proof.

Many customers use Amazon EC2 Spot Instances to save up to 90% on their computing cost by leveraging unused EC2 capacity. When EC2 needs that capacity back, a Spot Instance can be reclaimed. While AWS Batch takes care of rescheduling the job on a different instance, the rescheduling should be handled differently depending on whether it is an application failure or an infrastructure event interrupting the job.

Starting today, customers can define how many retries are performed in cases where a task does not finish correctly. AWS Batch now allows customers to define custom retry conditions, so that failures like the interruption of an instance or of an infrastructure agent are handled differently, and do not just exhaust the number of attempted retries.

In this blog, I show the benefits of custom retry with AWS Batch by using different error codes from a job to control whether it should be retried. I will also demonstrate how to handle infrastructure events like a failing container image download, or an EC2 Spot interruption.

Example setup

To showcase this new feature, I use the AWS Command Line Interface (AWS CLI) to set up the following:

  1. IAM roles, policies, and instance profiles to grant access and permissions
  2. A compute environment (CE) to provide the compute resources to run jobs
  3. A job queue, which supervises the job execution and schedules jobs on the CE
  4. Job definitions with different retry strategies, which use a simple job to demonstrate how the new configuration can be applied

Once those tasks are completed, I submit jobs to show how you can handle different scenarios, such as infrastructure failure, application handling via error code or a middleware event.

Prerequisite

To make things easier, I first set up a couple of environment variables to have the information available for later use. I use the following code to set up the environment variables:

# in case it is not already installed
sudo yum install -y jq 
export MD_URL=http://169.254.169.254/latest/meta-data
export IFACE=$(curl -s ${MD_URL}/network/interfaces/macs/)
export SUBNET_ID=$(curl -s ${MD_URL}/network/interfaces/macs/${IFACE}/subnet-id)
export VPC_ID=$(curl -s ${MD_URL}/network/interfaces/macs/${IFACE}/vpc-id)
export AWS_REGION=$(curl -s ${MD_URL}/placement/availability-zone | sed 's/[a-z]$//')
export AWS_ACCT_ID=$(curl -s ${MD_URL}/identity-credentials/ec2/info |jq -r .AccountId)
export AWS_SG_DEFAULT=$(aws ec2 describe-security-groups \
--filters Name=group-name,Values=default \
|jq -r '.SecurityGroups[0].GroupId')
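As a side note on the sed step above: the Region is simply the Availability Zone name with its trailing letter removed. The AZ value here is a hypothetical example:

```shell
# Hypothetical Availability Zone; the Region is the AZ minus its trailing letter.
AZ="eu-west-1c"
REGION=$(echo ${AZ} | sed 's/[a-z]$//')
echo ${REGION}   # eu-west-1
```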

IAM

While the AWS Management Console can create some of these IAM roles automatically, with the AWS CLI I must create them manually.

Trust policies

IAM roles are defined to be used by an individual service. In the simplest case, I want a role to be used by Amazon EC2 – the service that provides the compute capacity in the cloud. The definition of which entity is able to use an IAM role is called a Trust Policy. To set up a Trust Policy for an IAM role, I use the following code snippet:

cat > ec2-trust-policy.json << EOF
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {
      "Service": "ec2.amazonaws.com"
    },
    "Action": "sts:AssumeRole"
  }]
}
EOF

Instance role

With the IAM trust policy, I can now create an ecsInstanceRole and attach the pre-defined policy AmazonEC2ContainerServiceforEC2Role. This allows an instance to interact with Amazon ECS.

aws iam create-role --role-name ecsInstanceRole \
 --assume-role-policy-document file://ec2-trust-policy.json
aws iam create-instance-profile --instance-profile-name ecsInstanceProfile
aws iam add-role-to-instance-profile \
    --instance-profile-name ecsInstanceProfile \
    --role-name ecsInstanceRole
aws iam attach-role-policy --role-name ecsInstanceRole \
 --policy-arn arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role

Service role

The AWS Batch service uses a role to interact with different services. The trust relationship reflects that the AWS Batch service is going to assume this role. I can set up this role with the following logic:

cat > svc-trust-policy.json << EOF
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {
      "Service": "batch.amazonaws.com"
    },
    "Action": "sts:AssumeRole"
  }]
}
EOF
aws iam create-role --role-name AWSBatchServiceRole \
--assume-role-policy-document file://svc-trust-policy.json
aws iam attach-role-policy --role-name AWSBatchServiceRole \
--policy-arn arn:aws:iam::aws:policy/service-role/AWSBatchServiceRole

At this point, I have created the IAM roles and policies so that the instances and services can interact with the AWS API operations, including trust policies that define which services are meant to use them: EC2 for the ecsInstanceRole, and the AWS Batch service itself for the AWSBatchServiceRole.

Compute environment

Now, I am going to create a CE, which will launch instances to run the example jobs.

cat > compute-environment.json << EOF
{
  "computeEnvironmentName": "compute-0",
  "type": "MANAGED",
  "state": "ENABLED",
  "computeResources": {
    "type": "SPOT",
    "allocationStrategy": "SPOT_CAPACITY_OPTIMIZED",
    "minvCpus": 2,
    "maxvCpus": 32,
    "desiredvCpus": 4,
    "instanceTypes": [ "m5.xlarge","m5.2xlarge","m4.xlarge","m4.2xlarge","m5a.xlarge","m5a.2xlarge"],
    "subnets": ["${SUBNET_ID}"],
    "securityGroupIds": ["${AWS_SG_DEFAULT}"],
    "instanceRole": "arn:aws:iam::${AWS_ACCT_ID}:instance-profile/ecsInstanceRole",
    "tags": {"Name": "aws-batch-instances"},
    "ec2KeyPair": "batch-ssh-key",
    "bidPercentage": 0
  },
  "serviceRole": "arn:aws:iam::${AWS_ACCT_ID}:role/AWSBatchServiceRole"
}
EOF
aws batch create-compute-environment --cli-input-json file://compute-environment.json

Once this is complete, my compute environment begins to launch instances. This takes a few minutes. I can use the following command to check on the status of the compute environment whenever I want:

aws batch describe-compute-environments |jq '.computeEnvironments[] |select(.computeEnvironmentName=="compute-0")'

The command uses jq to filter the output to only show the compute environment I just created.
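If you want to experiment with the jq filter itself without calling the API, the same select can be applied to a trimmed, hypothetical sample of the response (the real output contains many more fields):

```shell
# A trimmed, hypothetical sample of the describe-compute-environments output
cat > /tmp/ce-sample.json << 'EOF'
{"computeEnvironments": [
  {"computeEnvironmentName": "compute-0", "status": "VALID"},
  {"computeEnvironmentName": "other-ce", "status": "INVALID"}
]}
EOF

# Select only the compute environment named "compute-0" and print its status
jq -r '.computeEnvironments[] | select(.computeEnvironmentName=="compute-0") | .status' /tmp/ce-sample.json
```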

Job queue

Now that I have my compute environment up and running, I can create a job queue, which accepts job submissions and schedules the jobs to the compute environment.

cat > job-queue.json << EOF
{
  "jobQueueName": "queue-0",
  "state": "ENABLED",
  "priority": 1,
  "computeEnvironmentOrder": [{
    "order": 0,
    "computeEnvironment": "compute-0"
  }]
}
EOF
aws batch create-job-queue --cli-input-json file://job-queue.json

Job definition

The job definition is used as a template for jobs. It is referenced in a job submission to specify the defaults of the job configuration, and some of the parameters can be overwritten at submission time.

Within the job definition, different retry strategies can be configured along with a maximum number of attempts for the job.
Three possible conditions can be used:

  • onExitCode is evaluated against non-zero exit codes
  • onReason is matched against middleware errors
  • onStatusReason can be used to react to infrastructure events such as an instance termination

Each condition is assigned an action to either EXIT or RETRY the job. It is important to note that a job finishing with an exit code of zero EXITs without evaluating the retry conditions. The default behavior for all non-zero exit codes is the following:

{
  "onExitCode": "",
  "onStatusReason": "",
  "onReason": "*",
  "action": "RETRY"
}

This condition retries every job that does not succeed (exit code 0) until the attempts are exhausted.

Spot Instance interruptions

AWS Batch works great with Spot Instances and customers are using this to reduce their compute cost. If Spot Instances become unavailable, instances are reclaimed by EC2, which can lead to one or more of my hosts being shut down. When this happens, the jobs running on those hosts are shut down due to an infrastructure event, not an application failure. Previously, separating these kinds of events from one another was only possible by catching the notification on the instance itself or through CloudWatch Events. Now with custom retry, you don’t have to rely on instance notifications or CloudWatch Events.

Using the job definition below, the job is restarted if the instance running it is shut down, which includes termination due to a Spot Instance reclaim. The additional condition makes sure that the job exits whenever the exit code is not zero; otherwise, the job would be rescheduled until the attempts are exhausted (see the default behavior above).

cat > jdef-spot.json << EOF
{
    "jobDefinitionName": "spot",
    "type": "container",
    "containerProperties": {
        "image": "alpine:latest",
        "vcpus": 2,
        "memory": 256,
        "command":  ["sleep","600"],
        "readonlyRootFilesystem": false
    },
    "retryStrategy": { 
        "attempts": 5,
        "evaluateOnExit": 
        [{
            "onStatusReason": "Host EC2*",
            "action": "RETRY"
        },{
            "onReason": "*",
            "action": "EXIT"
        }]
    }
}
EOF
aws batch register-job-definition --cli-input-json file://jdef-spot.json

To simulate a Spot Instance reclaim, I submit a job and manually shut down the host the job is running on. This triggers my condition, asking AWS Batch to make 5 attempts to finish the job before marking it as failed.

When I use the AWS CLI to describe my job, it displays the number of attempts to retry.

After I shut down my instance, the job returns to the RUNNABLE status and is scheduled again until it succeeds or reaches the maximum number of attempts defined.

Exit code mitigation

I can also use the exit code to decide which mitigation I want to use based on the exit code of the job script or application itself.

To illustrate this, I create a new job definition that uses a container image that exits with a random exit code between 0 and 3. Traditionally, an exit code of 0 means success and does not trigger this retry strategy. For all other (non-zero) exit codes, the retry strategy is evaluated. In my example, 1 or 2 reflect situations where a retry is needed, but an exit code of 3 means that AWS Batch should let the job fail.
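The qnib/random-ec image referenced in the job definition below is a prebuilt example; its exact contents are an assumption on my part, but a minimal script with the same behavior could look like this sketch:

```shell
# Hypothetical sketch of a job script that exits with a random code between 0 and 3,
# mimicking the behavior of the qnib/random-ec container used below.
random_ec_job() {
    local ec=$((RANDOM % 4))   # 0 = success; 1 or 2 = retry; 3 = permanent failure
    echo "job finished with exit code ${ec}"
    return ${ec}
}

# Capture the exit code without aborting the script on a non-zero return
ec=0
random_ec_job || ec=$?
echo "observed exit code: ${ec}"
```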

cat > jdef-randomEC.json << EOF
{
    "jobDefinitionName": "randomEC",
    "type": "container",
    "containerProperties": {
        "image": "qnib/random-ec:2020-10-13.3",
        "vcpus": 2,
        "memory": 256,
        "readonlyRootFilesystem": false
    },
    "retryStrategy": { 
        "attempts": 10,
        "evaluateOnExit": 
        [{
            "onExitCode": "1",
            "action": "RETRY"
        },{
            "onExitCode": "2",
            "action": "RETRY"
        },{
            "onExitCode": "3",
            "action": "EXIT"
        }]
    }
}
EOF
aws batch register-job-definition --cli-input-json file://jdef-randomEC.json

A submitted job retries until it succeeds with exit code 0, fails with exit code 3, or the attempts are exhausted (in this case, 10 of them).

aws batch submit-job  --job-name randomEC-$(date +"%F_%H-%M-%S") --job-queue queue-0   --job-definition randomEC:1

The output of a job submission shows the job name and the job id.

If the exit code is 1 or 2, the job is requeued.

Container image pull failure

The first example showed an error on the infrastructure layer and the second showed how to handle errors on the application layer. In this last example, I show how to handle errors that are introduced in the middleware layer, in this case: the container daemon.

This can happen if your Docker registry is down or having issues. In that case, the job should not get rescheduled only to fail again immediately. To demonstrate this, I use an image name that is not present in the registry.

The following job definition again allows 10 attempts for a job, except when the container cannot be pulled, which leads to a direct failure of the job.

cat > jdef-noContainer.json << EOF
{
    "jobDefinitionName": "noContainer",
    "type": "container",
    "containerProperties": {
        "image": "no-container-image",
        "vcpus": 2,
        "memory": 256,
        "readonlyRootFilesystem": false
    },
    "retryStrategy": { 
        "attempts": 10,
        "evaluateOnExit": 
        [{
            "onReason": "CannotPullContainerError:*",
            "action": "EXIT"
        }]
    }
}
EOF
aws batch register-job-definition --cli-input-json file://jdef-noContainer.json

Note that the job defines an image name (“no-container-image”) that is not present in the registry. The job is set up to fail when trying to download the image, and would do so repeatedly if AWS Batch kept retrying.

Even though the job definition allows 10 attempts, the job fell straight through to FAILED because the retry strategy sets the action to EXIT when a CannotPullContainerError occurs. Many of the error codes you can create conditions for are documented in the Amazon ECS user guide (e.g. task error codes / container pull errors).

Conclusion

In this blog post, I showed three different scenarios that leverage the new custom retry features in AWS Batch to control when a job should exit or get rescheduled.

By defining retry strategies you can react to an infrastructure event (like an EC2 Spot interruption), an application signal (via the exit code), or an event within the middleware (like a container image not being available).

This new feature gives you fine-grained control over how your jobs react to different error scenarios.

Fire Dynamics Simulation CFD workflow using AWS ParallelCluster, Elastic Fabric Adapter, Amazon FSx for Lustre and NICE DCV

Post Syndicated from Emma White original https://aws.amazon.com/blogs/compute/fire-dynamics-simulation-cfd-workflow-using-aws-parallelcluster-elastic-fabric-adapter-amazon-fsx-for-lustre-and-nice-dcv/

This post was written by Kevin Tuil, AWS HPC consultant

Modeling fires is key for many industries, from the design of new buildings and the definition of evacuation procedures for trains, planes, and ships, to predicting the spread of wildfires. Modeling these fires is complex: it involves both the three-dimensional unsteady turbulent flow of the fire and the many potential chemical reactions. To achieve this, the fire modeling community has moved to higher-fidelity turbulence modeling approaches such as Large Eddy Simulation, which requires significant temporal and spatial resolution. As a result, the computational cost of these simulations is typically on the order of days to weeks on a single workstation.
While there are a number of software packages, one of the most popular is the open-source Fire Dynamics Simulator (FDS), developed by the National Institute of Standards and Technology (NIST).

In this blog, I focus on how AWS High Performance Computing (HPC) resources (e.g., AWS ParallelCluster, Amazon FSx for Lustre, Elastic Fabric Adapter (EFA), and Amazon S3) allow FDS users to scale up beyond a single workstation to hundreds of cores, achieving simulation times of hours rather than days or weeks. I outline the architecture needed and provide scripts and templates to compile FDS and run your simulation.

Service and solution overview

AWS ParallelCluster

AWS ParallelCluster is an open source cluster management tool that simplifies deploying and managing HPC clusters with Amazon FSx for Lustre, EFA, a variety of job schedulers, and the MPI library of your choice. AWS ParallelCluster simplifies cluster orchestration on AWS so that HPC environments become easy to use, even if you are new to the cloud. AWS recently released AWS ParallelCluster 2.9.1 and its user guide, which is the version I use in this blog.

These three AWS HPC resources are optimal for Fire Dynamics Simulation. Together, they provide easy deployment of HPC systems on AWS, low latency network communication for MPI workloads, and a fast, parallel file system.

Elastic Fabric Adapter

EFA is a critical service that provides low latency and high-bandwidth 100 Gbps network communication. EFA allows applications to scale at the level of on-premises HPC clusters with the on-demand elasticity and flexibility of the AWS Cloud. Computational Fluid Dynamics (CFD), among other tightly coupled applications, is an excellent candidate for the use of EFA.

Amazon FSx for Lustre

Amazon FSx for Lustre is a fully managed, high-performance file system optimized for fast processing workloads like HPC. Amazon FSx for Lustre allows users to access and alter data from either Amazon S3 or on-premises storage seamlessly and exceptionally fast. For example, you can launch and run a file system that provides sub-millisecond latency access to your data, and read and write data at speeds of up to hundreds of gigabytes per second of throughput and millions of IOPS. This speed and low latency unleash innovation at an unparalleled pace. This blog post uses the latest version of Amazon FSx for Lustre, which recently added a new API for moving data in and out of Amazon S3. This API also includes POSIX support, which allows files to mount with the same user ID. Additionally, the latest version includes a new backup feature that allows you to back up your files to an S3 bucket.

Solution and steps

The overall solution that I deploy in this blog is represented in the following diagram:

solution overview diagram

Step 1: Access to AWS Cloud9 terminal and upload data

There are two ways to start using AWS ParallelCluster. You can either install the AWS CLI or use AWS Cloud9, a cloud-based integrated development environment (IDE) that includes a terminal. For simplicity, I use AWS Cloud9 to create the HPC cluster. Please refer to this link to proceed to AWS Cloud9 setup and to this link for AWS CLI setup.

Once logged into your AWS Cloud9 instance, the first thing to create is the S3 bucket. This bucket is key for exchanging user data between the corporate data center and the AWS HPC cluster. Please make sure that your bucket name is globally unique, meaning no other bucket with the same name can exist anywhere across all AWS Regions.

aws s3 mb s3://fds-smv-bucket-unique
make_bucket: fds-smv-bucket-unique
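One simple way to get a globally unique name is to append your account ID and Region to a base name. The account ID and Region below are hypothetical placeholders:

```shell
# Hypothetical account ID and Region; substitute your own values.
ACCOUNT_ID="123456789012"
REGION="us-east-1"
BUCKET_NAME="fds-smv-bucket-${ACCOUNT_ID}-${REGION}"
echo ${BUCKET_NAME}   # fds-smv-bucket-123456789012-us-east-1
```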

Download the latest FDS-SMV Linux version package from the official NIST website. It looks something like: FDS6.7.4_SMV6.7.14_lnx.sh

Rename your geometry file to “geometry.fds”, and upload it to your AWS Cloud9 instance or directly to your S3 bucket.

Please note that once the FDS-SMV package has been downloaded locally to the instance, you must upload it to the S3 bucket using the following command.

aws s3 cp FDS6.7.4_SMV6.7.14_lnx.sh s3://fds-smv-bucket-unique
aws s3 cp geometry.fds s3://fds-smv-bucket-unique

You use the same S3 bucket to install FDS-SMV later on with the Amazon FSx for Lustre File System.

Step 2: Set up AWS ParallelCluster

You can install AWS ParallelCluster running the following command from your AWS Cloud9 instance:

sudo pip install aws-parallelcluster

Once it is installed, you can run the following command to check the version:

pcluster version 

At the time of writing this blog, 2.9.1 is the most up-to-date version.

Then use the text editor of your choice and open the configuration file as follows:

vim ~/.parallelcluster/config

Replace the placeholder values in angle brackets, if not yet filled in, with your own information and save the configuration file.

[aws]
aws_region_name = <AWS-REGION>

[global]
sanity_check = true
cluster_template = fds-smv-cluster
update_check = true

[vpc public]
vpc_id = vpc-<VPC-ID>
master_subnet_id = subnet-<SUBNET-ID>

[cluster fds-smv-cluster]
key_name = <Key-Name>
vpc_settings = public
compute_instance_type=c5n.18xlarge
master_instance_type=c5.xlarge
initial_queue_size = 0
max_queue_size = 100
scheduler=slurm
cluster_type = ondemand
s3_read_write_resource=arn:aws:s3:::fds-smv-bucket-unique*
placement_group = DYNAMIC
placement = compute
base_os = alinux2
tags = {"Name" : "fds-smv"}
disable_hyperthreading = true
fsx_settings = fsxshared
enable_efa = compute
dcv_settings = hpc-dcv

[dcv hpc-dcv]
enable = master

[fsx fsxshared]
shared_dir = /fsx
storage_capacity = 1200
import_path = s3://fds-smv-bucket-unique
imported_file_chunk_size = 1024
export_path = s3://fds-smv-bucket-unique

[aliases]
ssh = ssh {CFN_USER}@{MASTER_IP} {ARGS}

Let’s review the different sections of the configuration file and explain their role:

  • scheduler: Supported job schedulers are SGE, TORQUE, SLURM, and AWS Batch. I have selected SLURM for this example.
  • cluster_type: You have the choice between On-Demand (ondemand) and Spot Instances (spot) for your compute instances. With On-Demand, instances are available (subject to capacity in the selected Region) at a set price per hour with the pay-as-you-go model, and once started they are reserved for your use. With Spot Instances, you can take advantage of unused EC2 capacity in the AWS Cloud at up to a 90% discount compared to On-Demand Instance prices. You can use Spot Instances for various stateless, fault-tolerant, or flexible applications such as HPC. For more information about Spot Instances, visit this webpage.
  • s3_read_write_resource: This parameter allows the cluster you created to read and write objects directly in your S3 bucket without additional permissions. It acts as a role for your cluster, granting access to the specified S3 bucket.
  • placement_group: Use DYNAMIC to ensure that your instances are located as physically close to one another as possible. Close placement minimizes the latency between compute nodes and takes advantage of EFA’s low-latency networking.
  • placement: By selecting compute you only enforce compute instances to be placed within the same placement group, leaving the head node placement free.
  • compute_instance_type: Select c5n.18xlarge because it is optimized for compute-intensive workloads and supports EFA for better scaling of HPC applications. Note that EFA is supported only on specific instance types. Please visit currently supported instances for more information.
  • master_instance_type: This can be any instance type. Because traffic between head and compute nodes is relatively small, and the head node runs for the entire lifetime of the cluster, I use c5.xlarge; it is inexpensive and a good fit for this use case.
  • initial_queue_size: You start with no compute instances after the HPC cluster is up. This means that any newly submitted job has some delay (the time for nodes to be powered on) before the scheduler sees the nodes as available. This helps you pay only for what you use and keeps costs as low as possible.
  • max_queue_size: Limit the maximum compute fleet to 100 instances. This gives you room to scale your jobs up to a large number of cores, while capping the number of compute nodes to help control costs.
  • base_os: For this blog, select Amazon Linux 2 (alinux2) as the base OS. Currently we also support Amazon Linux (alinux), CentOS 7 (centos7), Ubuntu 16.04 (ubuntu1604), and Ubuntu 18.04 (ubuntu1804) with EFA.
  • disable_hyperthreading: This setting turns off hyperthreading (true) on your cluster, which is the right configuration for this use case.
  • [fsx fsxshared]: This section contains the settings that define your FSx for Lustre parallel file system, including where the shared directory is mounted, the storage capacity of the file system, the chunk size for imported files, and the location from which the data is imported. You can read more about FSx for Lustre here.
  • enable_efa: Set to compute to enable EFA on the compute nodes, since this is a tightly coupled CFD simulation use case.
  • dcv_settings: With AWS ParallelCluster, you can use NICE DCV to support your remote visualization needs.
  • [dcv hpc-dcv]: This section contains the settings that define your remote visualization setup. You can read more about DCV with AWS ParallelCluster here.
  • import_path: This parameter makes all the objects present in the S3 bucket when the cluster is created visible directly from the FSx for Lustre file system. In this case, you can access the FDS-SMV package and the geometry under the /fsx mounted folder.
  • export_path: This parameter is useful for backup purposes using data repository tasks. I share more details about this in step 7 (optional).

Step 3: Create the HPC cluster and log in

Now, you can create the HPC cluster, named fds-smv. It takes around 10 minutes to complete, and you can watch the status change as it goes through the different AWS CloudFormation template steps. At the end of creation, two IP addresses are displayed: a public IP and/or a private IP, depending on your network choice.

pcluster create fds-smv
Creating stack named: parallelcluster-fds-smv
Status: parallelcluster-fds-smv - CREATE_COMPLETE                               
MasterPublicIP: X.X.X.X
ClusterUser: ec2-user
MasterPrivateIP: X.X.X.X

In order to log in, you must use the key you specified in the AWS ParallelCluster configuration file before creating the cluster:

pcluster ssh fds-smv -i <Key-Name>

You should now be logged in as an ec2-user (since we are using Amazon Linux 2 base OS).

Step 4: Install FDS-SMV package

Now that the HPC cluster using AWS ParallelCluster is set up, it is time to install the FDS-SMV package. In the prior steps, you uploaded both the FDS-SMV package and the geometry to your S3 bucket. Since you set import_path to that bucket, they are already available on the Amazon FSx for Lustre storage under /fsx.

Run the script as follows and select /fsx/fds-smv as final target for installation:

cd /fsx
./FDS6.7.4_SMV6.7.14_lnx.sh
[ec2-user@ip-X-X-X-X fsx]$ ./FDS6.7.4_SMV6.7.14_lnx.sh 

Installing FDS and Smokeview  for Linux

Options:
  1) Press <Enter> to begin installation [default]
  2) Type "extract" to copy the installation files to:
     FDS6.7.4_SMV6.7.14_lnx.tar.gz
 

FDS install options:
  Press 1 to install in /home/ec2-user/FDS/FDS6 [default]
  Press 2 to install in /opt/FDS/FDS6
  Press 3 to install in /usr/local/bin/FDS/FDS6
  Enter a directory path to install elsewhere
/fsx/fds-smv

Source the following scripts, which are part of the installed package, to check that the installation succeeded with the correct versions. Here is the output you should get:

[ec2-user@ip-X-X-X-X ~]$ source /fsx/fds-smv/bin/SMV6VARS.sh 
[ec2-user@ip-X-X-X-X ~]$ source /fsx/fds-smv/bin/FDS6VARS.sh 
[ec2-user@ip-X-X-X-X ~]$ fds -version
FDS revision       : FDS6.7.4-0-gbfaa110-release
MPI library version: Intel(R) MPI Library 2019 Update 4 for Linux* OS

[ec2-user@ip-10-0-2-233 ~]$ smokeview -version

Smokeview  SMV6.7.14-0-g568693b-release - Mar  9 2020

Revision         : SMV6.7.14-0-g568693b-release
Revision Date    : Wed Mar 4 23:13:42 2020 -0500
Compilation Date : Mar  9 2020 16:31:22
Compiler         : Intel C/C++ 19.0.4.243
Checksum(SHA1)   : e801eace7c6597dc187739e51ba6f546bfde4e48
Platform         : LINUX64

Important notes:

The FDS-SMV package was installed with the default installation method. Binaries come precompiled, and the Intel MPI libraries are embedded in the installation package; it is what one would call a self-contained application. For further builds and source code, please visit this webpage.

Step 5: Running the fire dynamics simulation using FDS

Now that everything is installed, it is time to create the SLURM submission script. In this step, you take advantage of the FSx for Lustre File System, the compute-optimized instance, and the EFA network to maximize simulation performance.

cd /fsx/
vi fds-smv.sbatch

Here is the information you should specify in your submission script:

#!/bin/bash
#SBATCH --job-name=fds-smv-job
#SBATCH --ntasks=<Total number of MPI processes>
#SBATCH --ntasks-per-node=36
#SBATCH --output=%x_%j.out

source /fsx/fds-smv/bin/FDS6VARS.sh
source /fsx/fds-smv/bin/SMV6VARS.sh

module load intelmpi 

export OMP_NUM_THREADS=1
export I_MPI_PIN_DOMAIN=omp

cd /fsx/<results>

time mpirun -ppn 36 -np <Total number of MPI processes>  fds geometry.fds

Replace <results> with a directory of your choice, and don’t forget to copy the geometry.fds file into it before submitting your job. Once ready, save the file and submit the job using the following command:

sbatch fds-smv.sbatch 

If you decided to build your HPC cluster with c5n.18xlarge instances, the number of MPI processes per node is 36, since you turned off hyperthreading and the instance has 36 physical cores. That is the meaning of the “#SBATCH --ntasks-per-node=36” line.
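If you pick a different instance type, you can determine the physical core count on a node instead of hard-coding 36. A minimal Linux-only sketch (the helper name is illustrative; it falls back to the logical CPU count when /proc/cpuinfo lacks topology data):

```shell
# Count unique (physical id, core id) pairs to get physical cores;
# with hyperthreading disabled, this equals the right --ntasks-per-node.
physical_cores() {
  if grep -q '^core id' /proc/cpuinfo 2>/dev/null; then
    awk -F: '/^physical id/{p=$2} /^core id/{print p ":" $2}' /proc/cpuinfo \
      | sort -u | wc -l
  else
    # Fallback: logical CPU count (equals physical cores when SMT is off)
    getconf _NPROCESSORS_ONLN
  fi
}
physical_cores
```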

For any run exceeding 36 MPI processes, the job is split among multiple instances and takes advantage of EFA for internode communication.

It is important to note that FDS requires the number of MPI processes to equal the number of meshes in the input geometry (geometry.fds in this scenario). If the number of meshes in the input geometry cannot be modified, OpenMP threads can be enabled to efficiently increase performance, using up to four OpenMP threads across four CPU cores attached to one MPI process.
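Since the rank count must match the mesh count, it can help to derive it from the input file itself. A small sketch (the helper name is hypothetical; FDS mesh records start with &MESH):

```shell
# Count the &MESH records in an FDS input file; FDS expects one MPI
# process per mesh, so --ntasks should equal this number.
count_meshes() {
  grep -c '^&MESH' "$1"
}
# Example sizing at 36 ranks per node (c5n.18xlarge, hyperthreading off):
# meshes=$(count_meshes geometry.fds)
# nodes=$(( (meshes + 35) / 36 ))
```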

Please read best practices provided by NIST for that topic on their user guide.

In order to take advantage of the distributed computing capability of FDS, it is mandatory to work first on the input geometry, and divide it into the appropriate number of meshes. It is also highly advised to evenly distribute the number of cells/elements per mesh across all meshes. This best practice optimizes the load balancing for each CPU core.

Step 6: Visualizing the results using NICE DCV and SMV

In order to visualize results, you must connect to the head node using NICE DCV streaming protocol.

As a reminder, the current instance type for the head node is c5.xlarge, which is not a graphics-accelerated instance. For heavy, GPU-intensive visualization, it is important to set up a more appropriate instance, such as a G4 instance.

Go back to your AWS Cloud9 instance, open a new terminal side by side to your session connected to your AWS HPC cluster, and enter the following command in the terminal:

pcluster dcv connect fds-smv -k <Key-Name>

You are provided a one-time HTTPS URL available for a short period of time in order to connect to your head node using the NICE DCV protocol.

Once connected, open the terminal inside your session and source the FDS-SMV scripts as before:

source /fsx/fds-smv/bin/FDS6VARS.sh
source /fsx/fds-smv/bin/SMV6VARS.sh

Navigate to your <results> folder and start SMV with your result.

I have selected one of the geometries named fire_whirl_pool.fds in the Examples folder, part of the default FDS-SMV installation package located here:

/fsx/fds-smv/Examples/Fires/fire_whirl_pool.fds

You can find other scenarios under the Examples folder to run some more use cases if you did not already choose your geometry.fds file.

Now you can run SMV and visualize your results:

smokeview fire_whirl_pool.smv

SMV (smokeview) takes files with the .smv extension as input; please replace with your appropriate file. If you have already chosen your geometry.fds, then run the following command:

smokeview geometry.smv

The application then opens as follows, and you can visualize the results. The following image is an output of the SOOT DENSITY of the 3D smoke.

fire simulation picture

Step 7 (optional): Back up your FDS-SMV results to an S3 bucket

First, update the AWS CLI to its most recent version; data repository tasks require version 1.16.309 or later.

After running your FDS-SMV simulation, you can back up your data in /fsx to the S3 bucket you used earlier to upload the installation package and input files, using data repository tasks.

Data Repository Tasks represent bulk operations between your Amazon FSx for Lustre file system and your S3 bucket. One of the jobs is to export your changed file system contents back to its linked S3 bucket.

Open your AWS Cloud9 terminal and exit the HPC head node cluster. Retrieve your Amazon FSx for Lustre ID using:

aws fsx describe-file-systems

It looks something like, fs-0533eebf1148fc8dd. Then create a backup of the data as follows:

aws fsx create-data-repository-task --file-system-id fs-0533eebf1148fc8dd --type EXPORT_TO_REPOSITORY --paths results --report Enabled=true,Scope=FAILED_FILES_ONLY,Format=REPORT_CSV_20191124,Path=s3://fds-smv-bucket-unique/

The following are definitions about the command parameters:

  • file-system-id: Your file system ID.
  • type EXPORT_TO_REPOSITORY: Exports the data back to the S3 bucket.
  • paths results: The directory you want to export to your S3 bucket. If you have more than one folder to back up, use a comma-separated notation such as: results1,results2,…
  • Format=REPORT_CSV_20191124: This is the only report format Amazon FSx for Lustre currently supports, so keep it as-is.
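For several result folders, you can assemble the comma-separated value for --paths programmatically, following the notation described above. A minimal sketch with hypothetical folder names:

```shell
# Join folder names with commas for the --paths parameter.
join_paths() {
  local IFS=','
  echo "$*"
}
paths=$(join_paths results1 results2 results3)
echo "$paths"
# aws fsx create-data-repository-task ... --paths "$paths" ...
```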

You can check the backup status by running:

aws fsx describe-data-repository-tasks

Please wait for the copy to complete; once finished, you should see "Lifecycle": "SUCCEEDED" on the Lifecycle line.
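Rather than re-running describe-data-repository-tasks by hand, you can poll until the task finishes. The following is a sketch: the status command is passed in as a string so the loop is easy to exercise on its own, and the task ID in the example comment is a placeholder.

```shell
# Poll a status command until it prints SUCCEEDED (or FAILED / timeout).
wait_for_lifecycle() {
  local status_cmd="$1" max_tries="${2:-60}" status i
  for i in $(seq 1 "$max_tries"); do
    status=$(eval "$status_cmd")
    [ "$status" = "SUCCEEDED" ] && return 0
    [ "$status" = "FAILED" ] && return 1
    sleep "${POLL_INTERVAL:-10}"
  done
  return 1
}
# Example (task ID is a placeholder):
# wait_for_lifecycle "aws fsx describe-data-repository-tasks \
#   --task-ids task-0123456789abcdef0 \
#   --query 'DataRepositoryTasks[0].Lifecycle' --output text"
```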

Also go back to your S3 bucket, and your folder(s) should appear with all the files correctly uploaded from your /fsx folder you specified.

In terms of data management, Amazon S3 is an important service. You started by uploading the installation package and geometry files from an external source, such as your laptop or an on-premises system. Then you made these files available to the AWS HPC cluster through the Amazon FSx for Lustre file system and ran the simulation. Finally, you backed up the results from Amazon FSx for Lustre to Amazon S3. You can also download the results from Amazon S3 back to your local system if needed.

Step 8: Delete your AWS resources created during the deployment of this blog

After your run is completed and your data backed up successfully (Step 7 is optional) on your S3 bucket, you can then delete your cluster by using the following command in your Cloud9 terminal:

pcluster delete fds-smv

Warning:

If you run the command above, all resources you created during this blog are automatically deleted, except your Cloud9 session and the data in the S3 bucket you created earlier.

Your S3 bucket still contains your input “geometry.fds” and your installation package “FDS6.7.4_SMV6.7.14_lnx.sh” files.

If you selected to back up your data during Step 7 (optional), then your S3 bucket also contains that data on top of the two previous files mentioned above.

If you want to delete your S3 bucket and all the data mentioned above, go to the AWS Management Console, select the S3 service, select your S3 bucket, and choose Delete at the top.

If you want to terminate your Cloud9 session, go to the AWS Management Console, select the Cloud9 service, select your session, and choose Delete at the top right.

After performing these operations, there will be no more resources running on AWS related to this blog.

Conclusion

I showed that AWS ParallelCluster, Amazon FSx for Lustre, EFA, and Amazon S3 are key AWS services and features for HPC workloads such as CFD and in particular for FDS.

You can achieve simulation times of hours on AWS rather than days or weeks on a single workstation.

Please visit this workshop for a more in-depth tutorial on running Fire Dynamics Simulation on AWS, and our HPC dedicated homepage.

 

How to run 3D interactive applications with NICE DCV in AWS Batch

Post Syndicated from Ben Peven original https://aws.amazon.com/blogs/compute/how-to-run-3d-interactive-applications-with-nice-dcv-in-aws-batch/

This post is contributed by Alberto Falzone, Consultant, HPC and Roberto Meda, Senior Consultant, HPC.

High Performance Computing (HPC) workflows across industry verticals such as Design and Engineering, Oil and Gas, and Life Sciences often require GPU-based 3D/OpenGL rendering. Setting up drivers and applications for these types of workflows can require significant effort.

Similar GPU-intensive workloads, such as AI/ML, make heavy use of containers to package software stacks, reducing the complexity of installing and setting up the required binaries and scripts to downloading and running a single container image. This approach is rarely used for visualization in the previously mentioned pre- and post-processing steps, due to the complexity of using a graphical user interface within a container.

This post describes how to reduce the complexity of installing and configuring a GPU accelerated application while maintaining performance by using NICE DCV. NICE DCV is a high-performance remote display protocol that provides customers with a secure way to deliver remote desktops and application streaming from any cloud or data center to any device, over varying network conditions.

With remote server-side graphical rendering and optimized streaming technology over the network, huge data volumes can be analyzed easily without moving or downloading them to the client, saving on data transfer costs.

Services and solution overview

This post provides a step-by-step guide on how to build a container able to run accelerated graphical applications using NICE DCV, and how to set up AWS Batch to run it. Finally, I showcase how to submit an AWS Batch job that provisions the compute environment (CE), a set of managed or unmanaged compute resources used to run jobs, and launches the application in a container, and how to connect to the application with NICE DCV.

Services

Before reviewing the solution, below are the AWS services and products you will use to run your application:

  • AWS Batch plans, schedules, and runs batch workloads on Amazon Elastic Container Service (Amazon ECS), dynamically provisioning the defined CE with Amazon EC2.
  • Amazon Elastic Container Registry (Amazon ECR) is a fully managed Docker container registry that simplifies how developers store, manage, and deploy Docker container images. In this example, you use it to register the Docker image, with the full required software stack, that AWS Batch uses to submit batch jobs.
  • NICE DCV is a high-performance remote display protocol that delivers remote desktops and application streaming from any cloud or data center to any device, over varying network conditions. With NICE DCV and Amazon EC2, customers can run graphics-intensive applications remotely on G3/G4 EC2 instances and stream the results to client machines that lack a GPU.
  • AWS Secrets Manager helps you securely encrypt, store, and retrieve credentials for your databases and other services. Instead of hardcoding credentials in your apps, you can make calls to Secrets Manager to retrieve your credentials whenever needed.
  • AWS Systems Manager gives you visibility and control of your infrastructure on AWS, and provides a unified user interface so you can view operational data from multiple AWS services. It also allows you to automate operational tasks across your AWS resources. Here it is used to retrieve a public parameter.
  • Amazon Simple Notification Service (Amazon SNS) enables applications, end users, and devices to instantly send and receive notifications from the cloud. You can send notifications by email to the user who has created a valid and verified subscription.

Solution

The goal of this solution is to run an interactive Linux desktop session in a single Amazon ECS container, with support for GPU rendering, and connect remotely through NICE DCV protocol. AWS Batch will dynamically provision EC2 instances, with or without GPU (e.g. G3/G4 instances).

Solution scheme

You will build and register the DCV container image to be used for the DCV desktop sessions. In AWS Batch, we will set up a managed CE starting from the Amazon ECS GPU-optimized AMI, which comes with the NVIDIA drivers and Amazon ECS agent already installed. Also, you will use AWS Secrets Manager to safely store user credentials and Amazon SNS to automatically notify the user that the interactive job is ready.

Tutorial

As a Computational Fluid Dynamics (CFD) visualization application example you will use Paraview.

This blog post goes through the following steps:

  1. Prepare required components
    • Launch temporary EC2 instance to build a DCV container image
    • Store user’s credentials and notification data
    • Create required roles
  2. Build DCV container image
  3. Create a repository on Amazon ECR
    • Push the DCV container image
  4. Configure AWS Batch
    • Create a managed CE
    • Create a related job queue
    • Create its Job Definition
  5. Submit a batch job
  6. Connect to the interactive desktop session using NICE DCV
    • Run the Paraview application to visualize results of a job simulation

Prerequisites

  • An Amazon Linux 2 instance as a Docker host, launched from the latest Amazon ECS GPU-optimized AMI
  • In order to connect to desktop sessions, inbound DCV port must be opened (by default DCV port is 8443)
  • AWS account credentials with the necessary access permissions
  • AWS Command Line Interface (CLI) installed and configured with the same AWS credentials
  • To easily install third-party/open source required software, assume that the Docker host has outbound internet access allowed

Step 1. Required components

In this step you’ll create a temporary EC2 instance dedicated to building the Docker image, and create the IAM policies required for the next steps. Then you create the secrets in AWS Secrets Manager to store sensitive data like credentials and the SNS topic ARN, and apply and verify the required system settings.

1.1 Launch the temporary EC2 instance for Docker image building

Launch the EC2 instance that becomes your Docker host from the Amazon ECS GPU-optimized AMI. Retrieve its AMI ID. For cost savings, you can use a t3 family instance type for this stage (e.g., t3.medium).

1.2 Store user credentials and notification data

To avoid hardcoding credentials or keys in the scripts used in later stages, we’ll use AWS Secrets Manager to safely store the final user’s OS credentials and other sensitive data.

  • In the AWS Management Console select Secrets Manager, create a new secret, select type Other type of secrets, and specify a key/value pair. Store the user login name as the key (e.g., user001) and the password as the value, then name the secret Run_DCV_in_Batch. Alternatively, use the following commands, where xxxxxxxxxx is your chosen password.

aws secretsmanager create-secret --name Run_DCV_in_Batch
aws secretsmanager put-secret-value --secret-id Run_DCV_in_Batch --secret-string '{"user001":"xxxxxxxxxx"}'

  • Create an SNS Topic to send email notifications to the user when a DCV session is ready for connection:
  • In the AWS Management Console select Secrets Manager service to create a new secret named DCV_Session_Ready_Notification, with type other type of secrets and key pair values. Store the string sns_topic_arn as a key and the SNS Topic ARN as value:

aws secretsmanager create-secret --name DCV_Session_Ready_Notification
aws secretsmanager put-secret-value --secret-id DCV_Session_Ready_Notification --secret-string '{"sns_topic_arn":"<put here your SNS Topic ARN>"}'
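At runtime (for example, in the container startup scripts), the stored values can be read back with get-secret-value. A sketch; the sed-based parser is a stand-in for jq and assumes the single key/value JSON shape stored above:

```shell
# Extract the value from a one-entry JSON object like {"user001":"pw"}.
parse_secret_value() {
  echo "$1" | sed -E 's/^\{"[^"]*":"([^"]*)"\}$/\1/'
}
# With AWS credentials available:
# secret_json=$(aws secretsmanager get-secret-value \
#   --secret-id Run_DCV_in_Batch --query SecretString --output text)
# password=$(parse_secret_value "$secret_json")
```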

1.3 Create required role and policy

To simplify, define a single role named dcv-ecs-batch-role that gathers all the necessary policies. This role is associated with the EC2 instance that launches from an AWS Batch job submission, so it is included in the CE definition later.

To allow DCV sessions, pushing images into Amazon ECR, and AWS Batch operations, create the role and include the following AWS managed and custom policies:

  • AmazonEC2ContainerRegistryFullAccess
  • AmazonEC2ContainerServiceforEC2Role
  • SecretsManagerReadWrite
  • AmazonSNSFullAccess
  • AmazonECSTaskExecutionRolePolicy

To reach the NICE DCV licenses stored in Amazon S3 (see licensing the NICE DCV server for more details), define a custom policy named DCVLicensePolicy (the following policy is for eu-west-1 Region, you might also use us-east-1):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::dcv-license.eu-west-1/*"
        }
    ]
}

create role

Note: If needed, you can add additional policies to allow the copy data from/to S3 bucket.

Update the Trust relationships of the same role in order to allow the Amazon ECS tasks execution and use this role from the AWS Batch Job definition as well:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    },
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "ecs-tasks.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Trusted relationships and Trusted entities

1.4 Create required Security Group

In the AWS Management Console, access EC2, and create a Security Group, named dcv-sg, that is open to DCV sessions and DCV clients by enabling tcp port 8443 in Inbound.
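The same security group can be created from the CLI. The following is a sketch: pass "echo" for a dry run that only prints the commands, the VPC and group IDs are placeholders, and in production you would restrict the ingress CIDR to your client network rather than 0.0.0.0/0.

```shell
# Create the dcv-sg security group, then open DCV's TCP port 8443.
# In a non-default VPC, ingress must be authorized by group ID.
create_dcv_sg() {
  local run="$1"   # "" to execute, "echo" for a dry run
  $run aws ec2 create-security-group --group-name dcv-sg \
    --description "NICE DCV sessions" --vpc-id "$VPC_ID" \
    --query GroupId --output text
  $run aws ec2 authorize-security-group-ingress --group-id "$SG_ID" \
    --protocol tcp --port 8443 --cidr 0.0.0.0/0
}
# Dry-run preview; VPC_ID and SG_ID are placeholders:
VPC_ID=vpc-xxxxxxxx SG_ID=sg-xxxxxxxx create_dcv_sg echo
```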

Step 2. DCV container image

Now you will build a container that provides OpenGL acceleration via NICE DCV. You’ll write the Dockerfile starting from Amazon Linux 2 base image, and add DCV with its related requirements.

2.1 Define the Dockerfile

The base software packages in the Dockerfile include the NVIDIA libraries, an X server, the GNOME desktop, and some external scripts to manage the DCV service startup and email notification for the user.

Starting from the base image just pulled, the Dockerfile installs all required (and optional) system tools, libraries, and desktop manager packages; handles the prerequisites for Linux NICE DCV servers; and installs the NICE DCV server on Linux plus the Paraview application for 2D/3D data visualization.

The final contents of the Dockerfile are available here; in the same repository, you can also find the scripts that manage the DCV system service, the notification message sent to the user, the creation of the local user at startup, and the run script for the DCV container.

2.2 Build Dockerfile

Install the required tools, both to unpack archives and to run commands against AWS:

sudo yum install -y unzip awscli

Download the Git archive within the EC2 instance, and unpack on a temporary directory:

curl -s -L -o - https://github.com/aws-samples/aws-batch-using-nice-dcv/archive/latest.tar.gz | tar zxvf -

From inside the folder containing aws-batch-using-nice-dcv.dockerfile, let’s build the Docker image:

docker build -t dcv -f aws-batch-using-nice-dcv.dockerfile .

The first build takes a while, since it has to download and install all the required packages and related dependencies. After the command completes, check that the image has been built and tagged correctly with the command:

docker images

Step 3. Amazon ECR configuration

In this step, you’ll push/archive your newly built DCV container image into Amazon ECR. Having this image in Amazon ECR allows you to use it inside Amazon ECS and AWS Batch.

3.1 Push DCV image into Amazon ECR repository

Set a desired name for your new repository, e.g. dcv, and push your latest dcv image into it. The push procedure is described in Amazon ECR by selecting your repository, and clicking on the top-right button View push commands.

Install the required tool to manage content in JSON format:

sudo yum install -y jq

Amazon ECR push commands to run include:

  • Login command to authenticate your Docker client to the Amazon ECR registry. Using the AWS CLI:

AWS_REGION="$(curl -s http://169.254.169.254/latest/dynamic/instance-identity/document | jq -r .region)"
eval $(aws ecr get-login --no-include-email --region "${AWS_REGION}")

Note: If you receive an “Unknown options: --no-include-email” error when using the AWS CLI, ensure that you have the latest version installed. Learn more.

  • Create the repository:

aws ecr create-repository --repository-name=dcv --region "${AWS_REGION}"
DCV_REPOSITORY=$(aws ecr describe-repositories --repository-names=dcv --region "${AWS_REGION}" | jq -r '.repositories[0].repositoryUri')

  • Build and tag the image for the Amazon ECR repository:

docker build -t "${DCV_REPOSITORY}:$(date +%F)" -f aws-batch-using-nice-dcv.dockerfile .

  • Push command:

docker push "${DCV_REPOSITORY}:$(date +%F)"
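The repository URI used for tagging above follows a predictable shape, which is handy if you later need to reference the image (for example, in the job definition in step 4.3). A small sketch; the account ID and Region in the example call are placeholders:

```shell
# Build the ECR repository URI: <account>.dkr.ecr.<region>.amazonaws.com/<name>
ecr_repo_uri() {
  echo "$1.dkr.ecr.$2.amazonaws.com/$3"
}
# The image pushed above is then addressable as, e.g.:
# $(ecr_repo_uri 123456789012 eu-west-1 dcv):$(date +%F)
ecr_repo_uri 123456789012 eu-west-1 dcv
```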

Step 4. AWS Batch configuration

The final step is to set up AWS Batch to manage your DCV containers. The link to all previous steps is the use of our DCV container image inside the AWS Batch CE.

4.1 Compute environment

Create an AWS Batch CE using the Amazon ECS GPU-optimized AMI.

  • Log into the AWS Management Console, select AWS Batch, select ‘get started’, and skip the wizard on next page.
  • Choose Compute Environments on the left, and click on Create Environment.
  • Specify all your desired settings, e.g.:
      • Managed type
      • Name: DCV-GPU-CE
      • Service role: AWSBatchServiceRole
      • Instance role: dcv-ecs-batch-role
  • Since you want OpenGL acceleration, choose an instance type with GPU (e.g. g4dn.xlarge).
  • Choose an allocation strategy. In this example I choose BEST_FIT_PROGRESSIVE
  • Assign the security group dcv-sg, created previously at step 1.4 that keeps DCV port 8443 open.
  • Add a Name tag with a value such as “DCV-GPU-Batch-Instance”; it is assigned automatically to the EC2 instances started by AWS Batch, so you can recognize them if needed.

4.2 Job Queue

Time to create a Job Queue for DCV with your preferred settings.

  • Select Job Queues from the left menu, then select Create queue (naming, for instance, e.g. DCV-GPU-Queue)
  • Specify a required Priority integer value.
  • Associate to this queue the CE you defined in the previous step (e.g. DCV-GPU-CE).

4.3 Job Definition

Now, we create a Job Definition by selecting the related item in the left menu, and select Create. 

We’ll use, listed per section:

  • Job Definition name (e.g. DCV-GPU-JD)
  • Execution timeout to 1h: 3600
  • Parameter section:
    • Add the Parameter named command with value: --network=host
      • Note: This parameter is required and equivalent to specify the same option to the docker run.Learn more.
  • Environment section:
    • Job role: dcv-ecs-batch-role
    • Container image: Use the ECR repository previously created, e.g. dkr.ecr.eu-west-1.amazonaws.com/dcv. If you don’t remember the Amazon ECR image URI, just return to Amazon ECR -> Repository -> Images.
    • vCPUs: 8
      • Note: Use a value equal to the vCPUs of the chosen instance type (in this example: g4dn.2xlarge), so that one job runs per node and avoids conflicts on the TCP ports required by the NICE DCV daemons.
    • Memory (MiB): 2048
  • Security section:
    • Check Privileged
    • Set user root (run as root)
  • Environment Variables section:
    • DISPLAY: 0
    • NVIDIA_VISIBLE_DEVICES: 0
    • NVIDIA_ALL_CAPABILITIES: all

Note: Amazon ECS provides a GPU-optimized AMI that comes ready with pre-configured NVIDIA kernel drivers and a Docker GPU runtime, learn more; the variables above make available the required graphic device(s) inside the container.
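Taken together, the settings above might be expressed as CLI input JSON along these lines; the account ID is a placeholder, and the file name is arbitrary:

```shell
# Sketch of the job definition above as CLI input JSON; <account-id> is a placeholder.
cat > dcv-job-definition.json << EOF
{
  "jobDefinitionName": "DCV-GPU-JD",
  "type": "container",
  "timeout": {"attemptDurationSeconds": 3600},
  "parameters": {"command": "--network=host"},
  "containerProperties": {
    "image": "<account-id>.dkr.ecr.eu-west-1.amazonaws.com/dcv",
    "vcpus": 8,
    "memory": 2048,
    "jobRoleArn": "arn:aws:iam::<account-id>:role/dcv-ecs-batch-role",
    "privileged": true,
    "user": "root",
    "environment": [
      {"name": "DISPLAY", "value": "0"},
      {"name": "NVIDIA_VISIBLE_DEVICES", "value": "0"},
      {"name": "NVIDIA_ALL_CAPABILITIES", "value": "all"}
    ]
  }
}
EOF
aws batch register-job-definition --cli-input-json file://dcv-job-definition.json
```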

4.4 Create and submit a Job

We can finally create an AWS Batch job by selecting Batch → Jobs → Submit Job.
Specify the job queue and job definition defined in the previous steps, and leave the command field as pre-filled from the job definition.

Running DCV job on AWS Batch

4.5 Connect to sessions

Once the job is in the RUNNING state, go to the AWS Batch dashboard. You can get the instance’s IP address or DNS name in several ways, as noted in How do I get the ID or IP address of an Amazon EC2 instance for an AWS Batch job. For example, assuming the tag Name set on the CE is DCV-GPU-Batch-Instance:

aws ec2 describe-instances --filters Name=instance-state-name,Values=running Name=tag:Name,Values="DCV-GPU-Batch-Instance" --query "Reservations[].Instances[].{id: InstanceId, tm: LaunchTime, ip: PublicIpAddress}" | jq -r 'sort_by(.tm) | reverse | .[0]' | jq -r .ip

Note: You may need to add EC2 read permissions (such as ec2:DescribeInstances) to the IAM role used to run this query. If the AWS SNS topic is properly configured, as mentioned in subsection 1.2, you receive the notification email message with the URL link to connect to the interactive graphical DCV session.

Email from SNS

Finally, connect to it:

  • https://<ip address>:8443

Note: You might need to wait for the host to report as running on EC2 in AWS Management Console.

Below is a NICE DCV session running inside a container, accessed through the web browser (the NICE DCV native client works equally well), running the ParaView visualization application. It shows the basic elbow results from an external OpenFOAM simulation, whose data was previously copied over from an S3 bucket, as well as the dcvgltest:

DCV Client connected to a running session

Cleanup

Once you’ve finished running the application, avoid incurring future charges by navigating to the AWS Batch console, terminating the job, and setting the CE parameters Minimum vCPUs and Desired vCPUs to 0. Also, navigate to Amazon EC2 and stop the temporary EC2 instance used to build the Docker image.
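A possible CLI sketch of these cleanup steps, where <job-id> is a placeholder for the ID of your submitted job:

```shell
# Hedged cleanup sketch; <job-id> is a placeholder.
aws batch terminate-job --job-id <job-id> --reason "cleanup"
aws batch update-compute-environment \
  --compute-environment DCV-GPU-CE \
  --compute-resources minvCpus=0,desiredvCpus=0
```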

For a full cleanup of all of the configurations and resources used, delete: the job definition, the job queue, and the CE (AWS Batch); the Docker image and ECR repository (Amazon ECR); the role dcv-ecs-batch-role (AWS IAM); the security group dcv-sg (Amazon EC2); the topic DCV_Session_Ready_Notification (Amazon SNS); and the secret Run_DCV_in_Batch (AWS Secrets Manager).

Conclusion

This blog post demonstrates how AWS Batch enables innovative approaches to run HPC workflows including not only batch jobs, but also pre-/post-analysis steps done through interactive graphical OpenGL/3D applications.

You are now ready to start interactive applications with AWS Batch and NICE DCV on G-series instance types with dedicated 3D hardware. This allows you to take advantage of remote graphical rendering on optimized infrastructure without moving data, saving costs.

Custom logging with AWS Batch

Post Syndicated from Emma White original https://aws.amazon.com/blogs/compute/custom-logging-with-aws-batch/

This post was written by Christian Kniep, Senior Developer Advocate for HPC and AWS Batch. 

For HPC workloads, visibility into job logs is important, both to debug a failed job and to gain insight into a running job: tracking its trajectory can influence the configuration of the next job, or justify terminating a job that went off track.

With AWS Batch, customers are able to run batch workloads at scale, reliably and with ease, as this managed service takes care of the undifferentiated heavy lifting. The customer can then focus on submitting jobs and getting work done. Customers told us that at a certain scale, the single logging driver available within AWS Batch made it hard to separate logs, as they all ended up in the same log group in Amazon CloudWatch.

With the new release of custom logging driver support, customers are now able to adjust how job output is logged: they can not only customize the Amazon CloudWatch settings, but also enable external logging drivers such as splunk, fluentd, json-file, syslog, gelf, and journald.

This allows AWS Batch jobs to use the existing logging systems customers are accustomed to, with fine-grained control of the log data for debugging and access-control purposes.

In this blog, I show the benefits of custom logging with AWS Batch by adjusting the log targets for jobs. The first example will customize the Amazon CloudWatch log group, the second will log to Splunk, an external logging service.

Example setup

To showcase this new feature, I use the AWS Command Line Interface (CLI) to set up the following:

  1. IAM roles, policies, and profiles to grant access and permissions
  2. A compute environment to provide the compute resources to run jobs
  3. A job queue, which supervises the job execution and schedules jobs on a compute environment
  4. A job definition, which uses a simple job to demonstrate how the new configuration can be applied

Once those tasks are completed, I submit a job and send logs to a customized CloudWatch log-group and Splunk.

Prerequisite

To make things easier, I first set a couple of environment variables to have the information handy for later use. I use the following code to set up the environment variables.

# in case it is not already installed
sudo yum install -y jq 
export MD_URL=http://169.254.169.254/latest/meta-data
export IFACE=$(curl -s ${MD_URL}/network/interfaces/macs/)
export SUBNET_ID=$(curl -s ${MD_URL}/network/interfaces/macs/${IFACE}/subnet-id)
export VPC_ID=$(curl -s ${MD_URL}/network/interfaces/macs/${IFACE}/vpc-id)
export AWS_REGION=$(curl -s ${MD_URL}/placement/availability-zone | sed 's/[a-z]$//')
export AWS_ACCT_ID=$(curl -s ${MD_URL}/identity-credentials/ec2/info |jq -r .AccountId)
export AWS_SG_DEFAULT=$(aws ec2 describe-security-groups \
--filters Name=group-name,Values=default \
|jq -r '.SecurityGroups[0].GroupId')

IAM

Note that when using the CLI rather than the AWS Management Console wizard, you must create the IAM roles manually.

Trust Policies

IAM roles are defined to be used by a certain service. In the simplest case, you want a role to be used by Amazon EC2, the service that provides compute capacity in the cloud. The definition of which entity is able to use an IAM role is called a trust policy. To set up a trust policy for an IAM role, use the following code snippet.

cat > ec2-trust-policy.json << EOF
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {
      "Service": "ec2.amazonaws.com"
    },
    "Action": "sts:AssumeRole"
  }]
}
EOF

Instance role

With the IAM trust policy, I now create an ecsInstanceRole and attach the pre-defined policy AmazonEC2ContainerServiceforEC2Role. This allows an instance to interact with Amazon ECS.

aws iam create-role --role-name ecsInstanceRole \
 --assume-role-policy-document file://ec2-trust-policy.json
aws iam create-instance-profile --instance-profile-name ecsInstanceProfile
aws iam add-role-to-instance-profile \
    --instance-profile-name ecsInstanceProfile \
    --role-name ecsInstanceRole
aws iam attach-role-policy --role-name ecsInstanceRole \
 --policy-arn arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role

Service Role

The AWS Batch service uses a role to interact with different services. The trust relationship reflects that the AWS Batch service is going to assume this role.  You can set up this role with the following logic.

cat > svc-trust-policy.json << EOF
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {
      "Service": "batch.amazonaws.com"
    },
    "Action": "sts:AssumeRole"
  }]
}
EOF
aws iam create-role --role-name AWSBatchServiceRole \
--assume-role-policy-document file://svc-trust-policy.json
aws iam attach-role-policy --role-name AWSBatchServiceRole \
--policy-arn arn:aws:iam::aws:policy/service-role/AWSBatchServiceRole

In addition to interacting with Amazon ECS, the instance role can create and write to Amazon CloudWatch log groups. To control which log group names are used, a condition is attached.

While the compute environment is coming up, let us create and attach a policy that allows the new log group to be created.

cat > policy.json << EOF
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": [
      "logs:CreateLogGroup"
    ],
    "Resource": "*",
    "Condition": {
      "StringEqualsIfExists": {
        "batch:LogDriver": ["awslogs"],
        "batch:AWSLogsGroup": ["/aws/batch/custom/*"]
      }
    }
  }]
}
EOF
aws iam create-policy --policy-name batch-awslog-policy \
    --policy-document file://policy.json
aws iam attach-role-policy --policy-arn arn:aws:iam::${AWS_ACCT_ID}:policy/batch-awslog-policy --role-name ecsInstanceRole

At this point, I created the IAM roles and policies so that the instance and service are able to interact with the AWS APIs, including trust-policies to define which services are meant to use them. EC2 for the ecsInstanceRole and the AWSBatchServiceRole for the AWS Batch service itself.

Compute environment

Now, I am going to create a compute environment, which spins up an instance (with a desired capacity of one vCPU) to run the example job.

cat > compute-environment.json << EOF
{
  "computeEnvironmentName": "od-ce",
  "type": "MANAGED",
  "state": "ENABLED",
  "computeResources": {
    "type": "EC2",
    "allocationStrategy": "BEST_FIT_PROGRESSIVE",
    "minvCpus": 1,
    "maxvCpus": 8,
    "desiredvCpus": 1,
    "instanceTypes": ["m5.xlarge"],
    "subnets": ["${SUBNET_ID}"],
    "securityGroupIds": ["${AWS_SG_DEFAULT}"],
    "instanceRole": "arn:aws:iam::${AWS_ACCT_ID}:instance-profile/ecsInstanceRole",
    "tags": {"Name": "aws-batch-compute"},
    "bidPercentage": 0
  },
  "serviceRole": "arn:aws:iam::${AWS_ACCT_ID}:role/AWSBatchServiceRole"
}
EOF
aws batch create-compute-environment --cli-input-json file://compute-environment.json  

Once this section is complete, a compute environment is spun up in the background. This takes a moment. You can use the following command to check on the status of the compute environment.

aws batch describe-compute-environments

Once it is ENABLED and VALID, we can continue by setting up the job queue.
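To narrow the output down to the fields that matter here, the describe call can be piped through jq (installed in the prerequisites); this is a sketch, not part of the original walkthrough:

```shell
# Print just the name, state, and status of each compute environment
aws batch describe-compute-environments \
  | jq -r '.computeEnvironments[] | "\(.computeEnvironmentName): \(.state)/\(.status)"'
```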

Job Queue

Now that I have a compute environment up and running, I will create a job queue which accepts job submissions and schedules the jobs on the compute environment.

cat > job-queue.json << EOF
{
  "jobQueueName": "jq",
  "state": "ENABLED",
  "priority": 1,
  "computeEnvironmentOrder": [{
    "order": 0,
    "computeEnvironment": "od-ce"
  }]
}
EOF
aws batch create-job-queue --cli-input-json file://job-queue.json

Job definition

The job definition is used as a template for jobs. This example runs a plain container and prints the environment variables. With the new release of AWS Batch, the logging driver awslogs now allows you to change the log group configuration within the job definition.

cat > job-definition.json << EOF
{
  "jobDefinitionName": "alpine-env",
  "type": "container",
  "containerProperties": {
  "image": "alpine",
  "vcpus": 1,
  "memory": 128,
  "command": ["env"],
  "readonlyRootFilesystem": true,
  "logConfiguration": {
    "logDriver": "awslogs",
    "options": { 
      "awslogs-region": "${AWS_REGION}", 
      "awslogs-group": "/aws/batch/custom/env-queue",
      "awslogs-create-group": "true"}
    }
  }
}
EOF
aws batch register-job-definition --cli-input-json file://job-definition.json

Job Submission

Using the above job definition, you can now submit a job.

aws batch submit-job \
  --job-name test-$(date +"%F_%H-%M-%S") \
  --job-queue arn:aws:batch:${AWS_REGION}:${AWS_ACCT_ID}:job-queue/jq \
  --job-definition arn:aws:batch:${AWS_REGION}:${AWS_ACCT_ID}:job-definition/alpine-env:1
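To follow the job through the queue, you can list jobs per status; this is a hedged sketch, and the queue name jq matches the one created above:

```shell
# A job typically moves SUBMITTED -> RUNNABLE -> STARTING -> RUNNING -> SUCCEEDED/FAILED
for s in SUBMITTED RUNNABLE STARTING RUNNING SUCCEEDED FAILED; do
  aws batch list-jobs --job-queue jq --job-status "$s" \
    | jq -r --arg s "$s" '.jobSummaryList[] | "\($s): \(.jobName)"'
done
```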

Now, you can check the ‘Log Group’ in CloudWatch. Go to the CloudWatch console and find the ‘Log Group’ section on the left.

log groups in cloudwatch

Now, click on the log group defined above, and you should see the output of the job. This allows you to debug if something within the container went wrong, or to process the logs and create alarms and reports.

cloudwatch log events

Splunk

Splunk is an established log engine for a broad set of customers. You can use the Docker container to set up a Splunk server quickly. More information can be found in the Splunk documentation. You need to configure the HTTP Event Collector, which provides you with a link and a token.
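As one possible local test setup (the image name and environment variables follow Splunk’s Docker documentation; the password and published ports are illustrative), a Splunk server with the HTTP Event Collector port exposed might be started like this:

```shell
# Quick local Splunk test server; 8000 is the web UI, 8088 the HTTP Event Collector.
# <admin-password> is a placeholder you must set yourself.
docker run -d --name splunk \
  -p 8000:8000 -p 8088:8088 \
  -e SPLUNK_START_ARGS='--accept-license' \
  -e SPLUNK_PASSWORD='<admin-password>' \
  splunk/splunk:latest
```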

To send logs to Splunk, create an additional job-definition with the Splunk token and URL. Please adjust the splunk-url and splunk-token to match your Splunk setup.

{
  "jobDefinitionName": "alpine-splunk",
  "type": "container",
  "containerProperties": {
    "image": "alpine",
    "vcpus": 1,
    "memory": 128,
    "command": ["env"],
    "readonlyRootFilesystem": false,
    "logConfiguration": {
      "logDriver": "splunk",
      "options": {
        "splunk-url": "https://<splunk-url>",
        "splunk-token": "XXX-YYY-ZZZ"
      }
    }
  }
}
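Assuming the JSON above is saved as job-definition-splunk.json (the file name is arbitrary), registering and submitting it mirrors the earlier steps:

```shell
# Register the Splunk job definition and submit a test job against the existing queue
aws batch register-job-definition --cli-input-json file://job-definition-splunk.json
aws batch submit-job \
  --job-name splunk-test-$(date +"%F_%H-%M-%S") \
  --job-queue jq \
  --job-definition alpine-splunk
```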

This forwards the logs to Splunk, as you can see in the following image.

forward to splunk

Conclusion

This blog post showed you how to apply custom logging to AWS Batch using the awslogs and splunk logging drivers. While these are two important logging drivers, please head over to the documentation to learn about fluentd, syslog, json-file, and other drivers, and find the one that best matches your current logging infrastructure.

 

EFA-enabled C5n instances to scale Simcenter STAR-CCM+

Post Syndicated from Ben Peven original https://aws.amazon.com/blogs/compute/efa-enabled-c5n-instances-to-scale-simcenter-star-ccm/

This post was contributed by Dnyanesh Digraskar, Senior Partner SA, High Performance Computing; Linda Hedges, Principal SA, High Performance Computing

In this blog, we define and demonstrate the scalability metrics for a typical real-world application using Computational Fluid Dynamics (CFD) software from Siemens, Simcenter STAR-CCM+, running on a High Performance Computing (HPC) cluster on Amazon Web Services (AWS). This scenario demonstrates the scaling of an external aerodynamics CFD case with 97 million cells to over 4,000 cores of Amazon EC2 C5n.18xlarge instances using the Simcenter STAR-CCM+ software. We also discuss the effects of scaling on efficiency, simulation turn-around time, and total simulation costs. TLG Aerospace, a Seattle-based aerospace engineering services company, contributed the data used in this blog. For a detailed case study describing TLG Aerospace’s experience and the results they achieved, see the TLG Aerospace case study.

For HPC workloads that use multiple nodes, the cluster setup, including the network, is at the heart of scalability concerns. Some of the most common questions from CFD or HPC engineers are: “How well will my application scale on AWS?”, “How do I optimize the associated costs for the best performance of my application on AWS?”, and “What are the best practices in setting up an HPC cluster on AWS to reduce the simulation turn-around time and maintain high efficiency?” This post aims to answer these questions by defining and explaining important scalability-related parameters, illustrated with the results from the CFD case. For detailed HPC-specific information, visit the High Performance Computing page and download the CFD whitepaper, Computational Fluid Dynamics on AWS.

CFD scaling on AWS

Scale-up

HPC applications, such as CFD, depend heavily on the applications’ ability to scale compute tasks efficiently in parallel across multiple compute resources. We often evaluate parallel performance by determining an application’s scale-up. Scale-up – a function of the number of processors used – is the time to complete a run on one processor, divided by the time to complete the same run on the number of processors used for the parallel run.

Scale-up(n) = T(1) / T(n), where T(n) is the time to complete the run on n processors.

In addition to characterizing the scale-up of an application, scalability can be further characterized as “strong” or “weak”. Strong scaling offers a traditional view of application scaling, where a problem size is fixed and spread over an increasing number of processors. As more processors are added to the calculation, good strong scaling means that the time to complete the calculation decreases proportionally with increasing processor count. In comparison, weak scaling does not fix the problem size used in the evaluation, but purposely increases the problem size as the number of processors also increases. An application demonstrates good weak scaling when the time to complete the calculation remains constant as the ratio of compute effort to the number of processors is held constant. Weak scaling offers insight into how an application behaves with varying case size.
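As a quick illustration of these definitions, the following sketch computes scale-up and efficiency from hypothetical timings (the numbers are invented for illustration, not TLG Aerospace’s data):

```shell
# Illustrative strong-scaling arithmetic with made-up timings:
# t1 = wall-clock seconds on 1 core, tn = wall-clock seconds on n cores
t1=100000
tn=35
n=3240
awk -v t1="$t1" -v tn="$tn" -v n="$n" 'BEGIN {
  su  = t1 / tn        # scale-up: serial time / parallel time
  eff = su / n         # efficiency: scale-up / processor count
  printf "scale-up = %.1f on %d cores, efficiency = %.1f%%\n", su, n, eff * 100
}'
```

With these made-up inputs, the script reports a scale-up of roughly 2857 on 3,240 cores, i.e. about 88% efficiency.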

Figure 1, the following image, shows scale-up as a function of increasing processor count for the Simcenter STAR-CCM+ case data provided by TLG Aerospace. This is a demonstration of “strong” scalability. The blue line shows what ideal or perfect scalability looks like. The purple triangles show the actual scale-up for the case as a function of increasing processor count. The closeness of these two curves demonstrates excellent scaling to well over 3,000 processors for this mid-to-large-sized 97M cell case. This example was run on Amazon EC2 C5n.18xlarge Intel Skylake instances, 3.0 GHz, each providing 36 cores with Hyper-Threading disabled.

Figure 1. Strong scaling demonstrated for a 97M cell Simcenter STAR-CCM+ CFD calculation

Efficiency

Now that you understand the variation of scale-up with the number of processors, we discuss the relation of scale-up with number of grid cells per processor, which determines the efficiency of the parallel simulation. Efficiency is the scale-up divided by the number of processors used in the calculation. By plotting grid cells per processor, as in Figure 2, scaling estimates can be made for simulations with different grid sizes with Simcenter STAR-CCM+. The purple line in Figure 2 shows scale-up as a function of grid cells per processor. The vertical axis for scale-up is on the left-hand side of the graph as indicated by the purple arrow. The green line in Figure 2 shows efficiency as a function of grid cells per processor. The vertical axis for efficiency is on the right side of the graph and is indicated by the green arrow.

Figure 2. Scale-up and efficiency as a function of cells per processor.

Fewer grid cells per processor means reduced computational effort per processor. Maintaining efficiency while reducing cells per processor demonstrates the strong scalability of Simcenter STAR-CCM+ on AWS.

Efficiency remains at about 100% between approximately 700,000 cells per processor core and 60,000 cells per processor core. Efficiency starts to fall off at about 60,000 cells per core. An efficiency of at least 80% is maintained until 25,000 cells per core. Decreasing cells per core leads to decreased efficiency because the total computational effort per processor core is reduced. Efficiencies of more than 100% (here, at about 250,000 cells per core) are common in scaling studies, are case-specific, and are often related to smaller effects such as timing variation and memory caching.

Turn-around time and cost

Case turn-around time and cost are what really matter to most HPC users. A plot of turn-around time versus CPU cost for this case is shown in Figure 3. As the number of cores increases, the total turn-around time decreases. But as the number of cores increases, the inefficiency also increases, which leads to increased costs. The cost, represented by the solid blue curve, is based on the On-Demand price for the C5n.18xlarge and includes only the computational costs. Small costs are also incurred for data storage. Minimum cost and turn-around time are achieved with approximately 60,000 cells per core.

Figure 3. Cost per run for: On-Demand pricing ($3.888 per hour for C5n.18xlarge in US-East-1) with and without the Simcenter STAR-CCM+ POD license cost as a function of turn-around time [Blue]; 3-yr all-upfront pricing ($1.475 per hour for C5n.18xlarge in US-East-1) [Green]

Many users choose a cell count per core to achieve the lowest possible cost. Others may choose a cell count per core to achieve the fastest turn-around time. If a run is desired in one third of the time of the lowest price point, it can be achieved with approximately 25,000 cells per core.

Additional information about the test scenario

TLG Aerospace used the Simcenter STAR-CCM+ Power-On-Demand (POD) license for running the simulations for this case. The POD license enables flexible On-Demand usage of the software on unlimited cores for a fixed price of $22 per hour. The total cost per run, which includes the computational cost plus the POD license cost, is represented in Figure 3 by the dashed blue curve. As the POD license is charged per hour, the total cost per run increases for higher turn-around times. Note that many users run Simcenter STAR-CCM+ with fewer cells per core than this case. While this increases the compute cost, other concerns, such as license costs or schedules, can be overriding factors. However, many find the reduced turn-around time well worth the price of the additional instances.
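To make the cost trade-off concrete, here is a back-of-the-envelope sketch using the prices quoted above; the two-hour run time and 56-instance count are hypothetical, chosen only for illustration:

```shell
# Illustrative cost estimate: 56 C5n.18xlarge instances (about 2,016 cores) for a 2-hour run.
# The run time and instance count are hypothetical; prices are from this post.
HOURS=2; INSTANCES=56
OD_PRICE=3.888     # C5n.18xlarge On-Demand, US-East-1 ($/hr)
POD_PRICE=22       # Simcenter STAR-CCM+ POD license ($/hr)
awk -v h="$HOURS" -v n="$INSTANCES" -v p="$OD_PRICE" -v l="$POD_PRICE" 'BEGIN {
  compute = h * n * p          # EC2 compute cost for the run
  total   = compute + h * l    # add the hourly POD license cost
  printf "compute cost: $%.2f, with POD license: $%.2f\n", compute, total
}'
```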

AWS also offers Savings Plans, a flexible pricing model that offers substantially lower prices on EC2 instances compared to On-Demand pricing, in exchange for a committed usage of a 1- or 3-year term. For example, the 3-year all-upfront pricing of the C5n.18xlarge instance is 62% cheaper than the On-Demand pricing. The total cost per run using the 3-year all-upfront pricing model is illustrated in Figure 3 by the solid green line. The 3-year all-upfront pricing plan offers a substantial reduction in price for running the simulations.

Amazon Linux is optimized to run on AWS and offers excellent performance for running HPC applications. For the case presented here, the operating system used was Amazon Linux 2. While other Linux distributions are also performant, we strongly recommend that for Linux HPC applications, you use a current Linux kernel.

Amazon Elastic Block Store (Amazon EBS) is a persistent, block-level storage device that is often used for cluster storage on AWS. A standard EBS General Purpose SSD (gp2) volume was used for this scenario. For other HPC applications that may require faster I/O to prevent data writes from becoming a bottleneck to turn-around speed, we recommend Amazon FSx for Lustre. FSx for Lustre seamlessly integrates with Amazon S3, allowing users to interact efficiently with data stored in Amazon S3.

AWS customers can choose to run their applications on either threads or cores. With hyper-threading, a single CPU physical core appears as two logical CPUs to the operating system. For an application like Simcenter STAR-CCM+, excellent linear scaling can be seen when using either threads or cores, though we generally recommend disabling hyper-threading. Most HPC applications benefit from disabling hyper-threading, and therefore, it tends to be the preferred environment for running HPC workloads. For more information, see Well-Architected Framework HPC Lens.

Elastic Fabric Adapter (EFA)

Elastic Fabric Adapter (EFA) is a network device that can be attached to Amazon EC2 instances to accelerate HPC applications by providing lower and more consistent latency and higher throughput than the Transmission Control Protocol (TCP) transport. The C5n.18xlarge instances used for running Simcenter STAR-CCM+ in this case support EFA, which is generally recommended for best scaling.

Summary

This post demonstrated the scalability of the commercial CFD software Simcenter STAR-CCM+ for an external aerodynamics simulation performed on Amazon EC2 C5n.18xlarge instances. The availability of EFA, a high-performing network device, on these instances results in excellent scalability of the application. The case turn-around time and associated costs of running Simcenter STAR-CCM+ on AWS hardware were also discussed. In general, excellent performance can be achieved on AWS for most HPC applications. In addition to low cost and quick turn-around time, important considerations for HPC also include throughput and availability. AWS offers high throughput, scalability, security, cost savings, and high availability, decreasing long queue times and reducing case turn-around time.