Tag Archives: Amazon FSx for Lustre

New — File Release for Amazon FSx for Lustre

Post Syndicated from Veliswa Boya original https://aws.amazon.com/blogs/aws/new-file-release-for-amazon-fsx-for-lustre/

Amazon FSx for Lustre provides fully managed shared storage with the scalability and high performance of the open-source Lustre file system to support your Linux-based workloads. FSx for Lustre is built for workloads where storage speed and throughput matter, because it helps you avoid storage bottlenecks, increase utilization of compute resources, and decrease time to value for workloads that include artificial intelligence (AI) and machine learning (ML), high performance computing (HPC), financial modeling, and media processing. FSx for Lustre integrates natively with Amazon Simple Storage Service (Amazon S3), synchronizing changes in both directions with automatic import and export, so that you can access your Amazon S3 data lakes through a high-performance POSIX-compliant file system on demand.

Today, I’m excited to announce file release for FSx for Lustre. This feature helps you manage your data lifecycle by releasing file data that has been synchronized with Amazon S3. File release frees up storage space so that you can continue writing new data to the file system, while retaining on-demand access to released files through FSx for Lustre lazy loading from Amazon S3. You specify a directory to release from and, optionally, a minimum amount of time since last access; only files in the specified directory that meet that age criterion are released. File release helps you with data lifecycle management by moving colder file data to S3, enabling you to take advantage of S3 tiering.

File release tasks are initiated using the AWS Management Console or by making an API call using the AWS CLI or AWS SDKs; you can also use Amazon EventBridge Scheduler to run release tasks at regular intervals. You can optionally receive a completion report at the end of each release task.
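
For example, the following Python (Boto3) sketch initiates a release task on a directory, releasing only files that haven’t been accessed in the last 30 days. The file system ID, path, and retention period are placeholder values, so check the SDK documentation for the exact parameter shapes before using it.

import boto3

fsx = boto3.client("fsx")

# Release files under the "scratch" directory that are synchronized
# with S3 and were last accessed more than 30 days ago (placeholders).
response = fsx.create_data_repository_task(
    Type="RELEASE_DATA_FROM_FILESYSTEM",
    FileSystemId="fs-0123456789abcdef0",  # placeholder file system ID
    Paths=["scratch"],                    # directory to release from
    ReleaseConfiguration={
        "DurationSinceLastAccess": {"Unit": "DAYS", "Value": 30}
    },
    Report={"Enabled": False},            # set Enabled=True for a completion report
)
print(response["DataRepositoryTask"]["TaskId"])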

Initiating a Release Task
As an example, let’s look at how to use the console to initiate a release task. To specify criteria for files to release (for example, directories or time since last access), we define release data repository tasks (DRTs). DRTs release all files that are synchronized with Amazon S3 and that meet the specified criteria. It’s worth noting that release DRTs are processed in sequence. This means that if you submit a release DRT while another DRT (for example, import or export) is in progress, the release DRT will be queued but not processed until after the import or export DRT has completed.

Note: For the data repository association to work, automatic backups for the file system must be disabled (use the Backups tab to do this), and the file system and the associated S3 bucket must be in the same AWS Region.

I already have an FSx for Lustre file system named my-fsx-test.

I create a data repository association, which is a link between a directory on the file system and an S3 bucket or prefix.

I specify the name of the S3 bucket or an S3 prefix to be associated with the file system.

After the data repository association has been created, I select Create release task.

The release task releases the files or directories that meet your criteria (again, these files or directories must be synchronized with an S3 bucket for the release to work). If you specified a minimum time since last access in addition to the directory, only files that have not been accessed within that period are released.

In my example, I chose to Disable completion reports. If you choose Enable completion reports instead, the release task produces a report when it completes.

Files that have been released can still be accessed using existing FSx for Lustre functionality to automatically retrieve data from Amazon S3 back to the file system on demand. This is because, although released, their metadata stays on the file system.

File release won’t automatically prevent your file system from becoming full. It remains important to ensure that you don’t write more data than the available storage capacity before you run the next release task.
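
One way to stay ahead of this is to watch the file system’s FreeDataStorageCapacity metric in Amazon CloudWatch and initiate a release task when free space runs low. The following Boto3 sketch illustrates the idea; the file system ID and the 100 GiB threshold are placeholder assumptions.

import boto3
from datetime import datetime, timedelta

cloudwatch = boto3.client("cloudwatch")

# Fetch the most recent free-capacity datapoints for the file system.
stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/FSx",
    MetricName="FreeDataStorageCapacity",
    Dimensions=[{"Name": "FileSystemId", "Value": "fs-0123456789abcdef0"}],  # placeholder
    StartTime=datetime.utcnow() - timedelta(hours=1),
    EndTime=datetime.utcnow(),
    Period=300,
    Statistics=["Minimum"],
)

datapoints = sorted(stats["Datapoints"], key=lambda d: d["Timestamp"])
if datapoints and datapoints[-1]["Minimum"] < 100 * 1024**3:  # below ~100 GiB free
    print("Free capacity is low; consider initiating a release task.")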

Now Available
File release on FSx for Lustre is available today in all AWS Regions where FSx for Lustre is supported, on all new or existing S3-linked file systems running Lustre version 2.12 or later. There is no additional cost for file release. However, if you release files and later access them again from the file system, you will incur normal Amazon S3 request and data retrieval costs, where applicable, as those files are read back into the file system.

To learn more, visit the Amazon FSx for Lustre page, and please send feedback to AWS re:Post for Amazon FSx for Lustre or through your usual AWS Support contacts.

Veliswa

Designing a hybrid AI/ML data access strategy with Amazon SageMaker

Post Syndicated from Franklin Aguinaldo original https://aws.amazon.com/blogs/architecture/designing-a-hybrid-ai-ml-data-access-strategy-with-amazon-sagemaker/

Investment in artificial intelligence (AI) is at a different stage in every business organization. Many enterprises began their ML journey by experimenting locally on laptops, then built an on-premises cluster of servers, accumulating data and procuring more servers and storage over time. Today, some remain completely on-premises, others are hybrid (both on-premises and cloud), and the rest have moved completely into the cloud for their AI and machine learning (ML) workloads.

These enterprises are also researching or have started using the cloud to augment their on-premises systems, for several reasons. As technology improves, both the size and quantity of data increase over time. The amount of data captured and the number of datapoints continue to expand, which is challenging to manage on-premises. Many enterprises are distributed, with offices in different geographic regions, continents, and time zones. While it is possible to increase the on-premises footprint and network pipes, there are still hidden costs to consider for maintenance and upkeep. These organizations are looking to the cloud to shift some of that effort, enabling them to burst and use the rich AI and ML features in the cloud.

Defining a hybrid data access strategy

Moving ML workloads into the cloud calls for a robust hybrid data strategy describing how and when you will connect your on-premises data stores to the cloud. For most, it makes sense to make the cloud the source of truth, while still permitting your teams to use and curate datasets on-premises. Defining the cloud as the source of truth for your datasets means the primary copy will be in the cloud, and any dataset generated will be stored in the same location in the cloud. This ensures that requests for data are served from the primary copy and any derived copies.

A hybrid data access strategy should address the following:

Understand your current and future storage footprint for ML on-premises. Create a map of your ML workloads, along with performance and access requirements for testing and training.
Define connectivity across on-premises locations and the cloud. This includes east-west and north-south traffic to support interconnectivity between sites, as well as the bandwidth and throughput required by your data movement workloads.
Define your single source of truth (SSOT)[1] and where the ML datasets will primarily live. Consider how dated, new, hot, and cold data will be stored.
Define your storage performance requirements, mapping them to the appropriate cloud storage services. This will give you the ability to take advantage of cloud-native ML with Amazon SageMaker.

Hybrid data access strategy architecture

To help address these challenges, we worked on outlining an end-to-end system architecture in Figure 1 that defines: 1) connectivity between on-premises data centers and AWS Regions; 2) mappings for on-premises data to the cloud; and 3) alignment of Amazon SageMaker to appropriate storage, based on ML requirements.

AI/ML hybrid data access strategy reference architecture

Figure 1. AI/ML hybrid data access strategy reference architecture

Let’s explore this architecture step by step.

  1. On-premises connectivity to the AWS Cloud runs through AWS Direct Connect for high transfer speeds.
  2. AWS DataSync is used for migrating large datasets into Amazon Simple Storage Service (Amazon S3). The AWS DataSync agent is installed on-premises.
  3. On-premises Network File System (NFS) or Server Message Block (SMB) data is bridged to the cloud through Amazon S3 File Gateway, using either a virtual machine (VM) or a hardware appliance.
  4. AWS Storage Gateway uploads data into Amazon S3 and caches it on-premises.
  5. Amazon S3 is the source of truth for ML assets stored on the cloud.
  6. S3 data is downloaded to Amazon SageMaker Studio for experimentation.
  7. Amazon SageMaker notebook instances can access data through Amazon S3, Amazon FSx for Lustre, and Amazon Elastic File System (Amazon EFS). Use Amazon File Cache for high-speed access to on-premises data, and Amazon FSx for NetApp ONTAP for cloud bursting.
  8. SageMaker training jobs can use data in Amazon S3, EFS, and FSx for Lustre. S3 data is accessed via File, Fast File, or Pipe mode, and FSx for Lustre data can be pre-loaded or lazy-loaded when used as training job input. Any existing data on EFS can also be made available to training jobs.
  9. Leverage Amazon S3 Glacier for archiving data and reducing storage costs.

ML workloads using Amazon SageMaker

Let’s go deeper into how SageMaker can help you with your ML workloads.

To start mapping ML workloads to the cloud, consider which AWS storage services work with Amazon SageMaker. Amazon S3 typically serves as the central storage location for both structured and unstructured data that is used for ML. This includes raw data coming from upstream applications, and also curated datasets that are organized and stored as part of a Feature Store.

In the initial phases of development, a SageMaker Studio user will leverage S3 APIs to download data from S3 to their private home directory. This home directory is backed by a SageMaker-managed EFS file system. Studio users then point their notebook code (also stored in the home directory) to the local dataset and begin their development tasks.
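
As a minimal illustration of that first step, a Studio notebook might pull a dataset object into the EFS-backed home directory using Boto3 (the bucket, key, and local path below are placeholders):

import os

import boto3

s3 = boto3.client("s3")

# Download a dataset object from S3 into the Studio home directory.
local_path = "/home/sagemaker-user/data/train.csv"
os.makedirs(os.path.dirname(local_path), exist_ok=True)
s3.download_file(
    Bucket="my-ml-datasets",       # placeholder bucket name
    Key="raw/training/train.csv",  # placeholder object key
    Filename=local_path,
)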

To scale up and automate model training, SageMaker users can launch training jobs that run outside of the SageMaker Studio notebook environment. There are several options for making data available to a SageMaker training job, as illustrated in the sketch after the following list.

  1. Amazon S3. Users can specify the S3 location of the training dataset. When using S3 as a data source, there are three input modes to choose from:
    • File mode. This is the default input mode, where SageMaker copies the data from S3 to the training instance storage. This storage is either a SageMaker-provisioned Amazon Elastic Block Store (Amazon EBS) volume or an NVMe SSD that is included with specific instance types. Training only starts after the dataset has been downloaded to the storage, and there must be enough storage space to fit the entire dataset.
    • Fast file mode. Fast file mode exposes S3 objects as a POSIX file system on the training instance. Dataset files are streamed from S3 on demand, as the training script reads them. This means that training can start sooner and require less disk space. Fast file mode also does not require changes to the training code.
    • Pipe mode. Pipe input also streams data from S3 as the training script reads it, but requires code changes. Pipe input mode has largely been replaced by the newer and easier-to-use fast file mode.
  2. FSx for Lustre. Users can specify an FSx for Lustre file system, which SageMaker mounts to the training instance to run the training code. When the FSx for Lustre file system is linked to an S3 bucket, the data can be lazily loaded from S3 during the first training job. Subsequent training jobs on the same dataset can then access it with low latency. Users can also choose to pre-load the file system with S3 data using hsm_restore commands.
  3. Amazon EFS. Users can specify an EFS file system that already contains their training data. SageMaker will mount the file system on the training instance and run the training code.
    Find out how to Choose the best data source for your SageMaker training job.
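
To make these options concrete, the following SageMaker Python SDK sketch shows how an S3 input using fast file mode and an FSx for Lustre input are expressed when launching a training job. The container image, role, file system ID, and paths are placeholder assumptions rather than values from this post.

from sagemaker.estimator import Estimator
from sagemaker.inputs import FileSystemInput, TrainingInput

estimator = Estimator(
    image_uri="<training-image-uri>",       # placeholder training container
    role="<execution-role-arn>",            # placeholder IAM role
    instance_count=1,
    instance_type="ml.m5.xlarge",
    # For file system inputs, the job must run in the file system's VPC:
    # subnets=["subnet-..."], security_group_ids=["sg-..."],
)

# Option 1: S3 input streamed on demand with fast file mode.
s3_input = TrainingInput(
    s3_data="s3://my-ml-datasets/train/",   # placeholder S3 prefix
    input_mode="FastFile",                  # or "File" (default) / "Pipe"
)

# Option 2: FSx for Lustre file system mounted on the training instance.
fsx_input = FileSystemInput(
    file_system_id="fs-0123456789abcdef0",  # placeholder file system ID
    file_system_type="FSxLustre",
    directory_path="/<mount-name>/train",   # placeholder Lustre path
    file_system_access_mode="ro",
)

estimator.fit({"training": s3_input})       # or {"training": fsx_input}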

Conclusion

With this reference architecture, you can develop and deliver ML workloads that run either on-premises or in the cloud. Your enterprise can continue using its on-premises storage and compute for particular ML workloads, while also taking advantage of the cloud, using Amazon SageMaker. The scale available on the cloud allows your enterprise to conduct experiments without worrying about capacity. Start defining your hybrid data strategy on AWS today!

Additional resources:

[1] The practice of aggregating data from many sources to a single source or location.

Genomics workflows, Part 5: automated benchmarking

Post Syndicated from Rostislav Markov original https://aws.amazon.com/blogs/architecture/genomics-workflows-part-5-automated-benchmarking/

Launching and running genomics workflows can take hours and involves large pools of compute instances that process data at a petabyte scale. Benchmarking helps you evaluate workflow performance and discover faster and cheaper ways of running them.

In practice, performance evaluations happen irregularly because of the associated heavy lifting. In this blog post, we discuss how life-science research teams can automate evaluations.

Business benefits

An automated benchmarking solution provides:

  • more accurate enterprise resource planning by performing historical analytics,
  • lower cost to the business by comparing performance on different resource types, and
  • cost transparency to the business by quantifying periodic chargeback.

We’ve used automated benchmarking to compare processing times on different services such as Amazon Elastic Compute Cloud (Amazon EC2), AWS Batch, AWS ParallelCluster, Amazon Elastic Kubernetes Service (Amazon EKS), and on-premises HPC clusters. Scientists, financiers, technical leaders, and other stakeholders can build reports and dashboards to compare consumption data by consumer, workflow type, and time period.

Design pattern

Our automated benchmarking solution measures performance on two dimensions:

  • Timing: measures the duration of a workflow launch on a specific dataset
  • Pricing: measures the associated cost

This solution can be extended to other performance metrics such as iterations per second or process/thread distribution across compute nodes.

Our requirements include the following:

  • Consistent measurement of timing based on workflow status (such as preparing, waiting, ready, running, failed, complete)
  • Extensible pricing models based on unit prices (for example, the Amazon EC2 Spot price at a specific point in time compared to Amazon EC2 On-Demand pricing)
  • Scalable, cost-efficient, and flexible data store enabling historical benchmarking and estimations
  • Minimal infrastructure management overhead

We choose a serverless design pattern using AWS Step Functions orchestration, AWS Lambda for our application code, and Amazon DynamoDB to track workflow launch IDs and states (as described in Part 3). We assume that the genomics workflows run on AWS Batch with genomics data on Amazon FSx for Lustre (Part 1). AWS Step Functions allows us to break down processing into smaller steps and avoid monolithic application code. Our evaluation process runs in four steps:

  1. Monitor for completed workflow launches in the DynamoDB stream using an Amazon EventBridge pipe with a Step Functions workflow as target. This event-driven approach is more efficient than periodic polling and avoids custom code for parsing status and cost values in all records of the DynamoDB stream.
  2. Collect a list of all compute resources associated with the workflow launch. Design a Lambda function that queries the AWS Batch API (see Part 1) to describe compute environment parameters like the Amazon EC2 instance IDs and their details, such as processing times, instance family/size, and allocation strategy (for example, Spot Instances, Reserved Instances, On-Demand Instances).
  3. Calculate the cost of all consumed resources. We achieve this with another Lambda function, which calculates the total price based on unit prices from the AWS Price List Query API.
  4. Our state machine updates the total price in the DynamoDB table without the need for additional application code.

Figure 1 visualizes these steps.

Automated benchmarking of genomics workflows

Figure 1. Automated benchmarking of genomics workflows

Implementation considerations

AWS Step Functions orchestrates our benchmarking workflow reliably and makes our application code easy to maintain. Figure 2 summarizes the state machine transitions that we’ll describe.

AWS Step Functions state machine for automated benchmarking

Figure 2. AWS Step Functions state machine for automated benchmarking

Gather consumption details

Configure the DynamoDB stream view type to New image so that the entire item is passed through as it appears after it was changed. We set up an Amazon EventBridge pipe with event filtering and the DynamoDB stream as the source. Our event filter matches records with a status of COMPLETE but no cost entry, in order to avoid an infinite loop: once our state machine has updated the DynamoDB item with the workflow price, the resulting record in the DynamoDB stream no longer passes the event filter.

The syntax of our event filter is as follows:

{
  "dynamodb": {
    "NewImage": {
      "status": {
        "S": ["COMPLETE"]
      },
      "totalCost": {
        "S": [{
          "exists": false
        }]
      }
    }
  }
}

We use an input transformer to simplify follow-on parsing by removing unnecessary metadata from the event.

The consumed resources included in the stream record are the auto-scaling group ID for AWS Batch and the Amazon FSx for Lustre volume ID. We use the DescribeJobs API (describe_jobs in Boto3) to determine which compute resources were used. If the response is a list of EC2 instances, we then look up consumption information including start and end times using the ListJobs API (list_jobs in Boto3) for each compute node. We use describe_volumes with filters on the identified EC2 instances to obtain the size and type of Amazon Elastic Block Store (Amazon EBS) volumes.
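
A condensed Boto3 sketch of this lookup logic follows; the job ID and instance ID are placeholders, and error handling is omitted.

import boto3

batch = boto3.client("batch")
ec2 = boto3.client("ec2")

# Describe completed workflow jobs to identify timings and compute resources.
jobs = batch.describe_jobs(jobs=["<batch-job-id>"])["jobs"]  # placeholder job ID
for job in jobs:
    duration_ms = job["stoppedAt"] - job["startedAt"]  # epoch milliseconds
    print(job["jobId"], duration_ms / 1000, "seconds")

# Look up the EBS volumes attached to the identified EC2 instances.
volumes = ec2.describe_volumes(
    Filters=[{
        "Name": "attachment.instance-id",
        "Values": ["i-0123456789abcdef0"],  # placeholder instance IDs
    }]
)["Volumes"]
for volume in volumes:
    print(volume["VolumeId"], volume["Size"], "GiB,", volume["VolumeType"])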

Calculate prices

Another Lambda function obtains the associated unit prices of all consumed resources using the GetProducts request of the AWS Price List Query API (get_products in Boto3) and then parses the pricePerUnit value. For Spot Instances, we use describe_spot_price_history of the EC2 client in Boto3 and specify the time range and instance types for which we want to receive prices.
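
The following Boto3 sketch illustrates both lookups for a single instance type. The filter fields and response parsing follow the public Price List API structure; the instance type, location, and time window are placeholder assumptions.

import json

import boto3

pricing = boto3.client("pricing", region_name="us-east-1")  # Price List API endpoint
ec2 = boto3.client("ec2")

# On-Demand unit price for an instance type from the Price List Query API.
response = pricing.get_products(
    ServiceCode="AmazonEC2",
    Filters=[
        {"Type": "TERM_MATCH", "Field": "instanceType", "Value": "m5.xlarge"},
        {"Type": "TERM_MATCH", "Field": "location", "Value": "US East (N. Virginia)"},
        {"Type": "TERM_MATCH", "Field": "operatingSystem", "Value": "Linux"},
        {"Type": "TERM_MATCH", "Field": "tenancy", "Value": "Shared"},
        {"Type": "TERM_MATCH", "Field": "preInstalledSw", "Value": "NA"},
        {"Type": "TERM_MATCH", "Field": "capacitystatus", "Value": "Used"},
    ],
)
product = json.loads(response["PriceList"][0])
term = next(iter(product["terms"]["OnDemand"].values()))
dimension = next(iter(term["priceDimensions"].values()))
print("On-Demand USD per hour:", dimension["pricePerUnit"]["USD"])

# Spot price history for the same instance type over the processing window.
spot = ec2.describe_spot_price_history(
    InstanceTypes=["m5.xlarge"],
    ProductDescriptions=["Linux/UNIX"],
    StartTime="2022-09-01T00:00:00Z",  # placeholder processing window
    EndTime="2022-09-01T06:00:00Z",
)
for entry in spot["SpotPriceHistory"][:3]:
    print(entry["AvailabilityZone"], entry["SpotPrice"])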

Calculate the price of workflow launches based on the following factors:

  • Number and size of EC2 instances in auto-scaling node groups
  • Size of EBS volumes and Amazon FSx for Lustre
  • Processing duration

Our Python-based Lambda function calculates the total, rounds it, and delivers the price breakdown in the following format:

total_cost: str, instance_cost: str, volume_cost: str, filesystem_cost: str

Lastly, we write the price breakdown to the DynamoDB table using UpdateItem directly from the Amazon States Language.

Note that AWS credits and enterprise discounts are not reflected in the responses of the AWS Price List Query API, which returns public list prices. Relying on public pricing data, rather than granting the calculation function access to account billing information, is often considered best practice in light of least-privilege considerations.

In the past, we’ve also used AWS Cost Explorer instead of the AWS Price List API. AWS Cost Explorer data is updated at least once every 24 hours. You can denote the pending price status in the DynamoDB table item and use the Wait state to delay the calculation process.
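
For reference, such a Cost Explorer lookup might resemble the following sketch, assuming the workflow’s resources carry a launch-identifier cost allocation tag; the tag key, values, and dates are hypothetical.

import boto3

ce = boto3.client("ce")

# Query actual (discounted) costs for a completed workflow launch,
# assuming resources were tagged with a launch identifier.
result = ce.get_cost_and_usage(
    TimePeriod={"Start": "2022-09-01", "End": "2022-09-02"},  # placeholder dates
    Granularity="DAILY",
    Metrics=["UnblendedCost"],
    Filter={
        "Tags": {
            "Key": "WorkflowLaunchId",  # hypothetical cost allocation tag
            "Values": ["launch-0001"],
        }
    },
)
for period in result["ResultsByTime"]:
    print(period["TimePeriod"]["Start"], period["Total"]["UnblendedCost"]["Amount"], "USD")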

The presented solution can be extended to other compute services such as Amazon Elastic Kubernetes Service (Amazon EKS). For Amazon EKS, events are enriched with the cluster ID from the DynamoDB table and the price calculation should also include control plane costs.

Conclusion

Life-science research teams use benchmarking to compare workflow performance and inform their architectural decisions. Such evaluations are effort-intensive and therefore done irregularly.

In this blog post, we showed how life-science research teams can automate benchmarking for their scientific workflows. The insights teams gain from automated benchmarking indicate continuous optimization opportunities, such as by adjusting compute node configuration. The evaluation data is also available on demand for other purposes including chargeback.

Stay tuned for our next post in which we show how to use historical benchmarking data for price estimations of future workflow launches.

Related information

Improve the performance of Apache Iceberg’s metadata file operations using Amazon FSx for Lustre on Amazon EMR

Post Syndicated from Rajarshi Sarkar original https://aws.amazon.com/blogs/big-data/improve-the-performance-of-apache-icebergs-metadata-file-operations-using-amazon-fsx-for-lustre-on-amazon-emr/

Apache Iceberg is an open table format for large datasets in Amazon Simple Storage Service (Amazon S3), and provides fast query performance over large tables, atomic commits, concurrent writes, and SQL-compatible table evolution. With Amazon EMR 6.5+, you can use Apache Spark on EMR clusters with the Iceberg table format.

Iceberg helps data engineers manage complex challenges such as continuously evolving datasets while maintaining query performance. Iceberg allows you to do the following:

  • Maintain transactional consistency on tables between multiple applications where files can be added, removed, or modified atomically with full read isolation and multiple concurrent writes
  • Implement full schema evolution to track changes to a table over time
  • Issue time travel queries to query historical data and verify changes between updates
  • Organize tables into flexible partition layouts with partition evolution, enabling updates to partition schemes as queries and data volumes change, without relying on physical directories
  • Roll back tables to prior versions to quickly correct issues and return tables to a known good state
  • Perform advanced planning and filtering in high-performance queries on large datasets

In this post, we show you how to improve the performance of Iceberg’s metadata file operations using Amazon FSx for Lustre and Amazon EMR.

Performance of metadata file operations in Iceberg

The catalog, metadata layer, and data layer of Iceberg are outlined in the following diagram.

Iceberg maintains metadata across multiple small files (metadata file, manifest list, and manifest files) to effectively prune data, filter data, read the correct snapshot, merge delta files, and more. Although Iceberg has implemented fast scan planning to make sure that metadata file operations don’t take a large amount of time, the time taken is still noticeably higher on object storage like Amazon S3 because of its higher read/write latency.

In use cases like a high-throughput streaming application writing data into an S3 data lake in near-real time, snapshots are produced in microbatches at a very fast rate, resulting in a high number of snapshot files and causing degradation in performance of metadata file operations.

As shown in the following architecture diagram, the EMR cluster consumes from Kafka and writes to an Iceberg table, which uses Amazon S3 as storage and AWS Glue as the catalog.

In this post, we dive deep into how to improve query performance by caching metadata files in a low-latency file system like FSx for Lustre.

Overview of solution

FSx for Lustre makes it easy and cost-effective to launch and run the high-performance Lustre file system. You use it for workloads where speed matters, such as high-throughput streaming writes, machine learning, high performance computing (HPC), video processing, and financial modeling. You can also link the FSx for Lustre file system to an S3 bucket, if required. FSx for Lustre offers multiple deployment options, including the following:

  • Scratch file systems, which are designed for temporary storage and short-term processing of data. Data isn’t replicated and doesn’t persist if a file server fails. Use scratch file systems when you need cost-optimized storage for short-term, processing-heavy workloads.
  • Persistent file systems, which are designed for long-term storage and workloads. The file servers are highly available, and data is automatically replicated within the same Availability Zone in which the file system is located. The data volumes attached to the file servers are replicated independently from the file servers to which they are attached.

The use case with Iceberg’s metadata files is related to caching, and the workloads are short-running (a few hours), so the scratch file system can be considered a viable deployment option. A Scratch-2 file system with 200 MB/s/TiB of throughput is sufficient for our needs because Iceberg’s metadata files are small in size and we don’t expect a very high number of parallel connections.

You can use FSx for Lustre as a cache for the metadata files (on top of an S3 location) to achieve better performance for metadata file operations. To read/write files, Iceberg provides a capability to load a custom FileIO dynamically during runtime. You can pass the FSxForLustreS3FileIO reference using a Spark configuration, which takes care of reading/writing to the appropriate file systems (FSx for Lustre for reads and Amazon S3 for writes). By setting the catalog properties lustre.mount.path, lustre.file.system.path, and data.repository.path, Iceberg resolves the S3 path to the FSx for Lustre path at runtime.
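
If you configure the Spark session programmatically rather than through the spark-sql shell shown later in this post, the same catalog properties apply. The following PySpark sketch mirrors that configuration; the bucket name and table path are placeholders.

from pyspark.sql import SparkSession

# Same catalog properties as the spark-sql example later in this post.
spark = (
    SparkSession.builder
    .config("spark.sql.catalog.my_catalog", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.my_catalog.warehouse", "s3://<bucket>/warehouse/sample_table")
    .config("spark.sql.catalog.my_catalog.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog")
    .config("spark.sql.catalog.my_catalog.io-impl", "org.apache.iceberg.aws.s3.FSxForLustreS3FileIO")
    .config("spark.sql.catalog.my_catalog.lustre.mount.path", "file:///mnt/fsx")
    .config("spark.sql.catalog.my_catalog.lustre.file.system.path", "/warehouse/sample_table/metadata")
    .config("spark.sql.catalog.my_catalog.data.repository.path", "s3://<bucket>/warehouse/sample_table/metadata")
    .getOrCreate()
)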

As shown in the following architecture diagram, the EMR cluster consumes from Kafka and writes to an Iceberg table that uses Amazon S3 as storage and AWS Glue as the catalog. Metadata reads are redirected to FSx for Lustre, which updates asynchronously.

Pricing and performance

We took a sample dataset across 100, 1,000, and 10,000 snapshots, and could observe up to 8.78 times speedup in metadata file operations and up to 1.26 times speedup in query time. Note that the benefit was observed for tables with a higher number of snapshots. The environment components used in this benchmark are listed in the following table.

Iceberg Version   Spark Version   Cluster Version    Master       Workers
0.14.1-amzn-0     3.3.0-amzn-1    Amazon EMR 6.9.0   m5.8xlarge   15 x m5.8xlarge

The following graph compares the speedup for each number of snapshots.

You can calculate the price using the AWS Pricing Calculator. The estimated monthly cost of an FSx for Lustre file system (scratch deployment type) in the US East (N. Virginia) Region with 1.2 TB storage capacity and 200 MB/s/TiB throughput per unit of storage is $336.38.

The overall benefit is significant considering the low cost incurred. The performance gain in terms of metadata file operations can help you achieve low-latency read for high-throughput streaming workloads.

Prerequisites

For this walkthrough, you need the following prerequisites:

Create an FSx for Lustre file system

In this section, we walk through the steps to create your FSx for Lustre file system via the FSx for Lustre console. To use the AWS Command Line Interface (AWS CLI), refer to create-file-system.

  1. On the Amazon FSx console, create a new file system.
  2. For File system options, select Amazon FSx for Lustre.
  3. Choose Next.
  4. For File system name, enter an optional name.
  5. For Deployment and storage type, select Scratch, SSD, because it’s designed for short-term storage and workloads.
  6. For Throughput per unit of storage, select 200 MB/s/TiB. You can choose the storage capacity according to your use case. As noted earlier, a Scratch-2 file system with 200 MB/s/TiB of throughput is sufficient because Iceberg’s metadata files are small and we don’t expect a very high number of parallel connections.
  7. Enter an appropriate VPC, security group, and subnet. Make sure that the security group has the appropriate inbound and outbound rules enabled to access the FSx for Lustre file system from Amazon EMR.
  8. In the Data Repository Import/Export section, select Import data from and export data to S3.
  9. Select Update my file and directory listing as objects are added to, changed in, or deleted from my S3 bucket to keep the file system listing updated.
  10. For Import bucket, enter the S3 bucket to store the Iceberg metadata.
  11. Choose Next and verify the summary of the file system, then choose Create File System.

When the file system is created, you can view the DNS name and mount name.

Create an EMR cluster with FSx for Lustre mounted

This section shows how to create an Iceberg table using Spark, though we can use other engines as well. To create your EMR cluster with FSx for Lustre mounted, complete the following steps:

  1. On the Amazon EMR console, create an EMR cluster (6.9.0 or above) with Iceberg installed. For instructions, refer to Use a cluster with Iceberg installed.

To use the AWS CLI, refer to create-cluster.

  2. Keep the network (VPC) and EC2 subnet the same as the ones you used when creating the FSx for Lustre file system.
  3. Create a bootstrap script and upload it to an S3 bucket that is accessible to EMR. Refer to the following bootstrap script to mount FSx for Lustre in an EMR cluster (the file system gets mounted in the /mnt/fsx path of the cluster). The file system DNS name and the mount name can be found in the file system summary details.
    sudo amazon-linux-extras install -y lustre2.10
    sudo mkdir -p /mnt/fsx
    sudo mount -t lustre -o noatime <Lustre-File-System-DNS-Name>@tcp:/<mount-Name> /mnt/fsx
    sudo ln -s /mnt/fsx /lustre
    sudo chmod -R 755 /mnt/fsx
    sudo chmod -R 755 /lustre

  4. Add the bootstrap action script to the EMR cluster.
  5. Specify your EC2 key pair.
  6. Choose Create cluster.
  7. When the EMR cluster is running, SSH into the cluster and launch spark-sql using the following code:
    spark-sql --conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog \
        --conf spark.sql.catalog.my_catalog.warehouse=s3://<bucket>/warehouse/sample_table \
        --conf spark.sql.catalog.my_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog \
        --conf spark.sql.catalog.my_catalog.io-impl=org.apache.iceberg.aws.s3.FSxForLustreS3FileIO \
        --conf spark.sql.catalog.my_catalog.lustre.mount.path=file:///mnt/fsx \
        --conf spark.sql.catalog.my_catalog.lustre.file.system.path=/warehouse/sample_table/metadata \
        --conf spark.sql.catalog.my_catalog.data.repository.path=s3://<bucket>/warehouse/sample_table/metadata

Note the following:

    • At a Spark session level, the catalog properties io-impl, lustre.mount.path, lustre.file.system.path, and data.repository.path have been set.
    • io-impl sets a custom FileIO implementation that resolves the FSx for Lustre location (from the S3 location) during reads. lustre.mount.path is the local mount path in the EMR cluster, lustre.file.system.path is the FSx for Lustre file system path, and data.repository.path is the S3 data repository path, which is linked to the FSx for Lustre file system path. After all these properties are provided, the data.repository.path is resolved to the concatenation of lustre.mount.path and lustre.file.system.path during reads. Note that FSx for Lustre is eventually consistent after an update in Amazon S3. So, in case FSx for Lustre is catching up with the S3 updates, the FileIO will fall back to appropriate S3 paths.
    • If write.metadata.path is configured, make sure that the path doesn’t contain any trailing slashes and data.repository.path is equivalent to write.metadata.path.
  8. Create the Iceberg database and table using the following queries:
    spark-sql> CREATE DATABASE IF NOT EXISTS my_catalog.db_iceberg;
    
    spark-sql> CREATE TABLE IF NOT EXISTS my_catalog.db_iceberg.sample_table (id int, data string)
    USING iceberg
    LOCATION 's3://<bucket>/warehouse/sample_table';

    Note that to migrate an existing table to use FSx for Lustre, you must create the FSx for Lustre file system, mount it while starting the EMR cluster, and start the Spark session as highlighted in the previous step. The Amazon S3 listing of the existing table is eventually updated in the FSx for Lustre file system.

  9. Insert the data into the table using an INSERT INTO query and then query the same:
    spark-sql> INSERT INTO my_catalog.db_iceberg.sample_table VALUES (1, 'a'), (2, 'b'), (3, 'c');
    
    spark-sql> SELECT * FROM my_catalog.db_iceberg.sample_table;

  10. You can now view the metadata files in the local FSx mount, which is also linked to the S3 bucket s3://<bucket>/warehouse/sample_table/metadata/:
    $ ls -ltr /mnt/fsx/warehouse/sample_table/metadata/
    total 3
    -rw-r--r-- 1 hadoop hadoop 1289 Mar 22 12:01 00000-40c0ef36-5a4c-4b3e-ba88-68398581c1a8.metadata.json
    -rw-r--r-- 1 hadoop hadoop 1289 Mar 22 12:01 00000-0b2b7ee5-4167-4fab-9527-d79d05c0a864.metadata.json
    -rw-r--r-- 1 hadoop hadoop 3727 Mar 22 12:03 snap-8127395396703547805-1-9a1cf328-db72-4cac-ad61-018048c3c470.avro
    -rw-r--r-- 1 hadoop hadoop 5853 Mar 22 12:03 9a1cf328-db72-4cac-ad61-018048c3c470-m0.avro
    -rw-r--r-- 1 hadoop hadoop 2188 Mar 22 12:03 00001-100c61d1-51c4-4337-8641-a6ed5ed9802e.metadata.json

  11. You can view the metadata files in Amazon S3:
    [hadoop@ip-172-31-17-161 ~]$ aws s3 ls s3://<bucket>/warehouse/sample_table/metadata/
    2022-03-24 05:29:46         20 .00000-0b2b7ee5-4167-4fab-9527-d79d05c0a864.metadata.json.crc
    2022-03-24 05:29:45         20 .00000-40c0ef36-5a4c-4b3e-ba88-68398581c1a8.metadata.json.crc
    2022-03-24 05:29:45         28 .00001-100c61d1-51c4-4337-8641-a6ed5ed9802e.metadata.json.crc
    2022-03-24 05:29:46         56 .9a1cf328-db72-4cac-ad61-018048c3c470-m0.avro.crc
    2022-03-24 05:29:45         40 .snap-8127395396703547805-1-9a1cf328-db72-4cac-ad61-018048c3c470.avro.crc
    2022-03-24 05:29:45       1289 00000-0b2b7ee5-4167-4fab-9527-d79d05c0a864.metadata.json
    2022-03-24 05:29:46       1289 00000-40c0ef36-5a4c-4b3e-ba88-68398581c1a8.metadata.json
    2022-03-24 05:29:45       2188 00001-100c61d1-51c4-4337-8641-a6ed5ed9802e.metadata.json
    2022-03-24 05:29:45       5853 9a1cf328-db72-4cac-ad61-018048c3c470-m0.avro
    2022-03-24 05:29:46       3727 snap-8127395396703547805-1-9a1cf328-db72-4cac-ad61-018048c3c470.avro

  12. You can also view the data files in Amazon S3:
    $ aws s3 ls s3://<bucket>/warehouse/sample_table/data/
    2022-03-22 12:03:04        619 00000-0-2a0e3499-189e-42c3-8c86-df47c91b1a11-00001.parquet
    2022-03-22 12:03:04        619 00001-1-b3e34418-30cf-4a81-80c7-04d2fc089435-00001.parquet

Clean up

When you’re done exploring the solution, complete the following steps to clean up the resources:

  1. Drop the Iceberg table.
  2. Delete the EMR cluster.
  3. Delete the FSx for Lustre file system.
  4. If any orphan files are present, empty the S3 bucket.
  5. Delete the EC2 key pair.
  6. Delete the VPC.

Conclusion

In this post, we demonstrated how to create an FSx for Lustre file system and an EMR cluster with the file system mounted. We observed the performance gain in terms of Iceberg metadata file operations and then cleaned up so as not to incur any additional charges.

Using FSx for Lustre with Iceberg on Amazon EMR allows you to gain significant performance in terms of metadata file operations. We observed 6.33–8.78 times speedup in metadata file operations and 1.06–1.26 times speedup in query time for Iceberg tables with 100, 1,000, and 10,000 snapshots. Note that this approach reduces the time for metadata file operations, not for data operations. The overall performance gain depends on the number of metadata files, the size of each metadata file, the amount of data being processed, and so on.


About the Author

Rajarshi Sarkar is a Software Development Engineer at Amazon EMR. He works on cutting-edge features of Amazon EMR and is also involved in open-source projects such as Apache Iceberg and Trino. In his spare time, he likes to travel, watch movies and hang out with friends.

AWS Week in Review – September 19, 2022

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-week-in-review-september-19-2022/

Things are heating up in Seattle, with preparation for AWS re:Invent 2022 well underway. Later this month the entire News Blog team will participate in our now-legendary “speed storming” event. Over the course of three or four days, each of the AWS service teams with a launch in the works for re:Invent will give us an overview and share their PRFAQ (Press Release + FAQ) with us. After the meetings conclude, we’ll divvy up the launches and get to work on our blog posts!

Last Week’s Launches
Here are some of the launches that caught my eye last week:

Amazon Lex Visual Conversation Builder – This new tool makes bot design easier than ever. You get a complete view of the conversation in one place, and you can manage complex conversations that have dynamic paths. To learn more and see the builder in action, read Announcing Visual Conversation Builder for Amazon Lex on the AWS Machine Learning Blog.

AWS Config Conformance Pack Price Reduction – We have reduced the price for evaluation of AWS Config Conformance Packs by up to 58%. These packs contain AWS Config rules and remediation actions that can be deployed as a single entity in an account and a Region, or across an entire organization. The price reduction took effect September 14, 2022; it lowers the cost per evaluation and decreases the number of evaluations needed to reach each pricing tier.

CDK (Cloud Development Kit) Tree View – The AWS CloudFormation console now includes a Constructs tree view that automatically organizes the resources that were synthesized by AWS CDK constructs. The top level of the tree view includes the named constructs and the second level includes all of the resources generated by the named construct. Read the What’s New to learn more!

AWS Incident Detection and Response – AWS Enterprise Support customers now have access to proactive monitoring and incident management for selected workloads running on AWS. As part of the onboarding process, AWS experts review workloads for reliability and operational excellence, and work with the customer to identify critical metrics and associated alarms. Incident Management Engineers then monitor the workloads, detect critical incidents, and initiate a call bridge to accelerate recovery. Read the AWS Incident Detection and Response page and the What’s New to learn more.

ECS Cluster Scale-In Speed – Auto-scaled ECS clusters can now scale in (reduce capacity) faster than ever before. Previously, each scale-in event would reduce the capacity within an Auto Scaling group (ASG) by 5% at a time. Now, capacity can be reduced by up to 50%. This change makes scaling more responsive to workload changes while still maintaining availability for spiky traffic patterns. Read Faster Scaling-In for Amazon ECS Cluster Auto Scaling and the What’s New to learn more.

AWS Outposts Rack Networking – AWS Outposts racks now support local gateway ingress routing to redirect incoming traffic to an Elastic Network Interface (ENI) attached to an EC2 instance before traffic reaches workloads running on the Outpost; read Deploying Local Gateway Ingress Routing on AWS Outposts to learn more. Outposts racks now also support direct VPC routing to simplify the process of communicating with your on-premises network; read the What’s New to learn more.

Amazon SWF Console Experience Updated – The new console experience for Amazon Simple Workflow Service (SWF) gives you better visibility of your SWF domains along with additional information about your workflow executions and events. You can efficiently manage high-volume workloads and quickly find the detailed information that helps you to operate at peak efficiency. Read the What’s New to learn more.

Dynamic Intermediate Certificate Authorities – According to a post on the AWS Security Blog, public certificates issued through AWS Certificate Manager (ACM) will soon (October 11, 2022) be issued from one of several intermediate certificate authorities managed by Amazon. This change will be transparent to most customers and applications, except those that make use of certificate pinning. In some cases, older browsers will need to be updated in order to properly trust the Amazon Trust Services CAs.

X in Y – We launched existing AWS services and instance types in additional regions:

Other AWS News
AWS Open Source – Check out Installment #127 of the AWS Open Source News and Updates Newsletter to learn about new tools for AWS CloudFormation, AWS Lambda, Terraform / EKS, AWS Step Functions, AWS Identity and Access Management (IAM), and more.

New Case Study – Read this new case study to learn how the Deep Data Research Computing Center at Stanford University is creating tools designed to bridge the gap between biology and computer science in order to help researchers in precision medicine deliver tangible medical solutions.

Application Management – The AWS DevOps Blog showed you how to Implement Long-Running Deployments with AWS CloudFormation Custom Resources Using AWS Step Functions.

Architecture – The AWS Architecture Blog showed you how to Maintain Visibility Over the Use of Cloud Architecture Patterns.

Big Data – The AWS Big Data Blog showed you how to Optimize Amazon EMR Costs for Legacy and Spark Workloads.

Migration – In a two-part series on the AWS Compute Blog, Marcia showed you how to Lift and Shift a Web Application to AWS Serverless (Part 1, Part 2).

Mobile – The AWS Mobile Blog showed you how to Build Your Own Application for Route Optimization and Tracking using AWS Amplify and Amazon Location Service.

Security – The AWS Security Blog listed 10 Reasons to Import a Certificate into AWS Certificate Manager and 154 AWS Services that have achieved HITRUST Certification.

Training and Certification – The AWS Training and Certification Blog talked about The Value of Data and Pursuing the AWS Certified Data Analytics – Specialty Certification.

Containers – The AWS Containers Blog encouraged you to Achieve Consistent Application-Level Tagging for Cost Tracking in AWS.

Upcoming AWS Events
Check your calendar and sign up for an AWS event in your locale:

AWS Summits – Come together to connect, collaborate, and learn about AWS. Registration is open for the following in-person AWS Summits: Mexico City (September 21–22), Bogotá (October 4), and Singapore (October 6).

AWS Community Days – AWS Community Day events are community-led conferences to share and learn with one another. In September, the AWS community in the US will run events in Arlington, Virginia (September 30). In Europe, Community Day events will be held in October. Join us in Amersfoort, Netherlands (October 3), Warsaw, Poland (October 14), and Dresden, Germany (October 19).

AWS Fest – This third-party event will feature AWS influencers, community heroes, industry leaders, and AWS customers, all sharing AWS optimization secrets (September 29); register here.

Stay Informed
I hope that you have enjoyed this look back at some of what took place in AWS-land last week! To better keep up with all of this news, please check out the following resources:

Jeff;

Run Apache Spark with Amazon EMR on EKS backed by Amazon FSx for Lustre storage

Post Syndicated from Vara Bonthu original https://aws.amazon.com/blogs/big-data/run-apache-spark-with-amazon-emr-on-eks-backed-by-amazon-fsx-for-lustre-storage/

Traditionally, Spark workloads have been run on a dedicated setup like a Hadoop stack with YARN or Mesos as the resource manager. Starting with Apache Spark 2.3, Spark added support for Kubernetes as a resource manager. The new Kubernetes scheduler natively supports the submission of Spark jobs to a Kubernetes cluster. Spark on Kubernetes provides simpler administration, better developer experience, easier dependency management with containers, a fine-grained security layer, and optimized resource allocation. As a result, Spark on Kubernetes gained much traction as a high-performance and cost-effective way of running big data and machine learning (ML) workloads.

In AWS, we offer a managed service, Amazon EMR on EKS, to run your Apache Spark workloads on Amazon Elastic Kubernetes Service (Amazon EKS). This service uses the Amazon EMR runtime for Apache Spark, which increases the performance of your Spark jobs so that they run faster and cost less. EMR on EKS lets you run Spark applications alongside other application types on the same Amazon EKS cluster to improve resource utilization. In addition, EMR on EKS integrates with Amazon EMR Studio for authoring jobs and the Apache Spark UI for debugging out of the box to simplify infrastructure management.

For storage, EMR on EKS supports node ephemeral storage using hostPath, where the storage is attached to individual nodes, and an Amazon Elastic Block Store (Amazon EBS) volume per executor/driver pod using dynamic Persistent Volume Claims. However, some Spark users are looking for an HDFS-like shared file system to handle specific workloads like time-sensitive applications or streaming analytics. HDFS is best suited for jobs that require highly interactive speed for a large number of files with random access reads, atomic rename operations, and sequential metadata requests.

Amazon FSx for Lustre is a fully managed shared storage option built on the world’s most popular high-performance file system. It offers highly scalable, cost-effective storage, which provides sub-millisecond latencies, millions of IOPS, and throughput of hundreds of gigabytes per second. Its popular use cases include high-performance computing (HPC), financial modeling, video rendering, and machine learning. FSx for Lustre supports two types of deployments:

  • Scratch file systems – These are designed for temporary or short-term storage, where data does not need to be replicated or persisted if a file server fails
  • Persistent file systems – These are suitable for long-term storage where the file server is highly available and the data is replicated within the Availability Zone

In both deployment types, automatic data sync between the mounted file system and Amazon Simple Storage Service (Amazon S3) buckets is supported, helping you offload large volumes of cold and warm data for a more cost-efficient design. It makes multi-AZ or multi-Region failover possible via Amazon S3 for businesses that require resiliency and availability.

This post demonstrates how to use EMR on EKS to submit Spark jobs with FSx for Lustre as the storage. It can be mounted on Spark driver and executor pods through static and dynamic PersistentVolumeClaim provisioning.

Static vs. dynamic provisioning

With static provisioning, the FSx for Lustre file system and PersistentVolume (PV) must be created in advance. The following diagram illustrates the static provisioning architecture. The Spark application driver and executor pods refer to an existing static PersistentVolumeClaim (PVC) to mount the FSx for Lustre file system.

Unlike static provisioning, the FSx for Lustre file system and PV don’t need to be pre-created for dynamic provisioning. As shown in the following diagram, the FSx for Lustre CSI driver plugin is deployed to an Amazon EKS cluster to dynamically provision the FSx for Lustre file system with a given PVC. Dynamic provisioning only requires a PVC and the corresponding storage class. After the PVC is created in Kubernetes, the FSx for Lustre CSI driver identifies the storage class and creates the requested file system.

The Spark application driver and executor pods in the architecture refer to an existing dynamic PVC to mount the FSx for Lustre file system.

Solution overview

In this post, you provision the following resources with Amazon EKS Blueprints for Terraform to run Spark jobs using EMR on EKS:

Prerequisites

Before you build the entire infrastructure, you must have the following prerequisites:

Now you’re ready to deploy the solution.

Clone the GitHub repo

Open your terminal window, change to the home directory, and clone the GitHub repo:

cd ~
git clone https://github.com/aws-ia/terraform-aws-eks-blueprints.git

Then, navigate to the following:

cd ~/terraform-aws-eks-blueprints/examples/analytics/emr-eks-fsx-lustre

Initialize Terraform

Initialize the project, which downloads plugins that allow Terraform to interact with AWS services:

terraform init

Run terraform plan

Run terraform plan to verify the resources created by this deployment:

export AWS_REGION="<enter-your-region>"
terraform plan

The terraform plan output shows the resources that are created by this plan.

Run terraform apply

Run terraform apply to deploy the resources:

terraform apply --auto-approve

This deployment may take up to 30 minutes to create all the resources.

Verify the resources

Verify the Amazon EKS cluster created by the deployment. The following command displays the cluster details in JSON format:

aws eks describe-cluster --name emr-eks-fsx-lustre

Let’s create a kubeconfig file for the EKS cluster with the following command. This command creates a new cluster context entry with certificate authority data under ~/.kube/config to authenticate with the EKS cluster:

aws eks --region <ENTER_YOUR_REGION> update-kubeconfig --name emr-eks-fsx-lustre

Verify the managed node groups:

aws eks list-nodegroups --cluster-name emr-eks-fsx-lustre

The output should show two node groups:

{
    "nodegroups": [
        "core-node-grp-<some_random_numbers>",
        "spark-node-grp-<some_random_numbers>"
    ]
}

List the pods created by the FSx for Lustre CSI driver. The following command shows two controllers and an fsx-csi-node daemonset pod for each node:

kubectl get pods -n kube-system | grep fsx

List the namespace created for emr-data-team-a:

kubectl get ns | grep emr-data-team-a

The output will display the active namespace.

List the FSx storage class, PV, and PVCs created by this deployment. You may notice that fsx-dynamic-pvc is in Pending status because the FSx for Lustre file system it requests is still being created. The PVC status changes to Bound after the file system is created.

#FSx Storage Class
kubectl get storageclasses | grep fsx
  emr-eks-fsx-lustre   fsx.csi.aws.com         Delete          Immediate              false                  109s

# Output of static persistent volume with name fsx-static-pv
kubectl get pv | grep fsx  
  fsx-static-pv                              1000Gi     RWX            Recycle          Bound    emr-data-team-a/fsx-static-pvc       fsx

# Output of static persistent volume claim with name fsx-static-pvc and fsx-dynamic-pvc
kubectl get pvc -n emr-data-team-a | grep fsx
  fsx-dynamic-pvc   Pending                                             fsx            4m56s
  fsx-static-pvc    Bound     fsx-static-pv   1000Gi     RWX            fsx            4m56s

Log in to the FSx for Lustre console and verify the two file systems created by this deployment:

  • The first file system (emr-eks-fsx-lustre-static) is a persistent file system created with the Terraform resource
  • The second file system (fs-0e77adf20acb4028f) is created by the FSx for Lustre CSI driver dynamically with a dynamic PVC manifest

In this demo, we learn how to use a statically provisioned FSx for Lustre file system and dynamically provisioned FSx for Lustre file system in EMR on EKS Spark jobs.

Static provisioning

You can create an FSx for Lustre file system using the AWS CLI or any infrastructure as code (IaC) tool. In this example, we used Terraform to create the FSx for Lustre file system with deployment type as PERSISTENT_2. For static provisioning, we must create the FSx for Lustre file system first, followed by the PV and PVCs. After we create all three resources, we can mount the FSx for Lustre file system on a Spark driver and executor pod.

We use the following Terraform code snippet in the deployment to create the FSx for Lustre file system (2400 GB) and the file system association with the S3 bucket for import and export under the /data file system path. Note that this resource refers to a single subnet (single Availability Zone) for creating an FSx for Lustre file system. However, the Spark pods can use this file system across all Availability Zones, unlike the EBS volume, which is Availability Zone specific. In addition, the FSx for Lustre association with the S3 bucket creates a file system directory called /data. The Spark job driver and executor pod templates use this /data directory as a spark-local-dir for scratch space.

# New FSx for Lustre filesystem
resource "aws_fsx_lustre_file_system" "this" {
  deployment_type             = "PERSISTENT_2"
  storage_type                = "SSD"
  per_unit_storage_throughput = "500"
  storage_capacity            = 2400

  subnet_ids         = [module.vpc.private_subnets[0]]
  security_group_ids = [aws_security_group.fsx.id]
  log_configuration {
    level = "WARN_ERROR"
  }
  tags = merge({ "Name" : "${local.name}-static" }, local.tags)
}

# S3 bucket association with FSx for Lustre filesystem
resource "aws_fsx_data_repository_association" "example" {
  file_system_id       = aws_fsx_lustre_file_system.this.id
  data_repository_path = "s3://${aws_s3_bucket.this.id}"
  file_system_path     = "/data" # This directory will be used in Spark podTemplates under volumeMounts as subPath

  s3 {
    auto_export_policy {
      events = ["NEW", "CHANGED", "DELETED"]
    }

    auto_import_policy {
      events = ["NEW", "CHANGED", "DELETED"]
    }
  }
}

Persistent Volume

The following YAML template shows the definition of the PV created by this deployment. For example, running the command kubectl edit pv fsx-static-pv displays the manifest. PVs are a cluster-scoped resource, so no namespace is defined in the template. The DevOps or cluster admin teams typically create this.

apiVersion: v1
kind: PersistentVolume
metadata:
  name: fsx-static-pv
spec:
  accessModes:
  - ReadWriteMany
  capacity:
    storage: 1000Gi
  claimRef:  # PV Claimed by fsx-static-pvc                
    apiVersion: v1
    kind: PersistentVolumeClaim             
    name: fsx-static-pvc
    namespace: emr-data-team-a
    resourceVersion: "5731"
    uid: 9110afc4-c605-440e-b022-190904866f0c
  csi:
    driver: fsx.csi.aws.com
    volumeAttributes:
      dnsname: fs-0a85fd096ef3f0089.fsx.eu-west-1.amazonaws.com # FSx DNS Name
      mountname: fz5jzbmv
    volumeHandle: fs-0a85fd096ef3f0089
  mountOptions:
  - flock
  persistentVolumeReclaimPolicy: Recycle

Persistent Volume Claim

The following YAML template shows the definition of the PVC created by this deployment. For example, running the command kubectl edit pvc fsx-static-pvc -n emr-data-team-a shows the deployed resource.

PVCs are namespace-specific resources typically created by the developers. The emr-data-team-a namespace is defined in the template.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fsx-static-pvc
  namespace: emr-data-team-a
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 1000Gi
  storageClassName: fsx
  volumeMode: Filesystem
  volumeName: fsx-static-pv
status:
  accessModes:
  - ReadWriteMany
  capacity:
    storage: 1000Gi
  phase: Bound

Now that we have set up the static FSx for Lustre file system, we can use the PVC in EMR on EKS Spark jobs with pod templates. Key things to note in the template: the volumes section in the following code is defined as persistentVolumeClaim with the claim name fsx-static-pvc, and the containers section refers to the unique mountPath folder /static. We also use initContainers in the driver pod template to give correct permissions and ownership to the Hadoop users used by the EMR on EKS driver and executor pods. Finally, notice that the data subPath is associated with the S3 bucket sync in the preceding Terraform resource.

We use the following driver pod template:

apiVersion: v1
kind: Pod
metadata:
  name: fsx-taxi-driver
  namespace: emr-data-team-a
spec:
  volumes:
    - name: spark-local-dir-ny-taxi
      persistentVolumeClaim:
        claimName: fsx-static-pvc # Static PVC pre-created by this example terraform template

  nodeSelector:
    NodeGroupType: spark

  containers:
    - name: spark-kubernetes-driver 
      volumeMounts:
        - name: spark-local-dir-ny-taxi
          mountPath: /static
          subPath: data # subfolder created in the FSx for Lustre filesystem and mapped to the S3 bucket for sync and export
          readOnly: false
  initContainers:
    - name: spark-init-container-driver  
      image: public.ecr.aws/y4g4v0z7/busybox
      volumeMounts:
        - name: spark-local-dir-ny-taxi
          mountPath: /static
      command: ["sh", "-c", "chmod -R 777 /static && chown -hR +999:+1000 /static/data"] # run both commands in a single shell invocation

The executor pod template also refers to the same persistentVolumeClaim (fsx-static-pvc) and mounts it at the same mountPath, /static. Notice that we don't use an initContainers section in this template because the required permissions for the file system directory /static/data have already been applied by the driver pod. Because it's a shared file system, the same permissions apply to the executor processes as well.

apiVersion: v1
kind: Pod
metadata:
  name: fsx-taxi-exec
  namespace: emr-data-team-a
spec:
  volumes:
    - name: spark-local-dir-ny-taxi
      persistentVolumeClaim:
        claimName: fsx-static-pvc # Static PVC pre-created by this example terraform template

  nodeSelector:
    NodeGroupType: spark

  containers:
    - name: spark-kubernetes-executor # Don't change this name. EMR on EKS looks for this container name.
      volumeMounts:
        - name: spark-local-dir-ny-taxi
          mountPath: /static # the mountPath can be any folder, but it must match the driver pod template
          subPath: data # subfolder created in the FSx for Lustre filesystem and mapped to the S3 bucket for sync and export
          readOnly: false

Let’s run the sample PySpark script using the preceding pod templates. Navigate to the examples/spark-execute directory and run the shell script (fsx-static-spark.sh):

cd ~/terraform-aws-eks-blueprints/examples/analytics/emr-eks-fsx-lustre/examples/spark-execute

This shell script expects three input values. EMR_VIRTUAL_CLUSTER_ID and EMR_JOB_EXECUTION_ROLE_ARN can be extracted from the Terraform output values. Additionally, you create an S3 bucket with the required permissions. This S3 bucket stores the sample PySpark script, the pod templates, and the input and output data generated by the shell script and the Spark job. Check out the shell script for more details.

EMR_VIRTUAL_CLUSTER_ID=$1     # Terraform output variable: emrcontainers_virtual_cluster_id    
S3_BUCKET=$2                  # This script requires s3 bucket as input parameter e.g., s3://<bucket-name>    
EMR_JOB_EXECUTION_ROLE_ARN=$3 # Terraform output variable: emr_on_eks_role_arn

Let's run the fsx-static-spark.sh shell script. The job takes approximately 6 minutes with two executors, which process 40 objects with a total size of 1.4 GB (each object is around 36.4 MB). You can increase the number of objects from 40 to any larger number to process more data. The shell script downloads the public dataset (NY Taxi Trip Data) to your local disk and uploads it to the S3 bucket using Amazon S3 sync. The PySpark job reads the data from the S3 bucket, applies a GroupBy on a few fields, and writes back to the S3 bucket to demonstrate the shuffle activity.

./fsx-static-spark.sh <EMR_VIRTUAL_CLUSTER_ID> \
s3://<YOUR_BUCKET_NAME> \
<EMR_JOB_EXECUTION_ROLE_ARN>
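
Under the hood, the shell script wraps a call to the EMR on EKS StartJobRun API and points Spark at the preceding pod templates. The following is a simplified sketch of such a call; the entry point, template paths, and release label are assumptions for illustration, and the script in the repository remains the authoritative version:

aws emr-containers start-job-run \
  --virtual-cluster-id <EMR_VIRTUAL_CLUSTER_ID> \
  --name ny-taxi-trip-static \
  --execution-role-arn <EMR_JOB_EXECUTION_ROLE_ARN> \
  --release-label emr-6.5.0-latest \
  --job-driver '{
    "sparkSubmitJobDriver": {
      "entryPoint": "s3://<YOUR_BUCKET_NAME>/scripts/ny-taxi-trip.py",
      "sparkSubmitParameters": "--conf spark.executor.instances=2 --conf spark.kubernetes.driver.podTemplateFile=s3://<YOUR_BUCKET_NAME>/pod-templates/fsx-static-driver.yaml --conf spark.kubernetes.executor.podTemplateFile=s3://<YOUR_BUCKET_NAME>/pod-templates/fsx-static-executor.yaml"
    }
  }'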

You can run the following queries to monitor the Spark job and the usage of the FSx for Lustre file system mounted on the driver and executor pods. Verify the job run events with the following command:

kubectl get pods --namespace=emr-data-team-a -w

You will notice one job object pod, one driver pod, and two executor pods. The Spark executor instance count can be updated in the shell script.

You can also monitor the usage of the mounted FSx for Lustre file system. The following commands show the growth of the mounted file system during the test run:

# Verify the used FSx for Lustre filesystem disk size with executor1
kubectl exec -ti ny-taxi-trip-static-exec-1 -c spark-kubernetes-executor -n emr-data-team-a -- df -h

# Verify the files created under the /static/data FSx mount
kubectl exec -ti ny-taxi-trip-static-exec-1 -c spark-kubernetes-executor -n emr-data-team-a -- ls -lah /static

# Verify the file sync from FSx to S3 bucket. 
aws s3 ls s3://<YOUR_SYNC_BUCKET_NAME_FROM_TERRAFORM_OUTPUT>/

The following screenshot shows the output of the preceding commands. The files under the executor are the same as those under the S3 bucket because the S3 sync feature is enabled on the FSx for Lustre file system. This test uses the FSx for Lustre file system for scratch space, so the shuffle files are deleted from the FSx for Lustre file system and the S3 bucket when the test is complete.

This PySpark job writes the aggregated and repartitioned output directly to an S3 bucket location. Instead, you can choose to write to the FSx for Lustre file system path, which eventually syncs to an S3 bucket. The FSx for Lustre file system provides low latency, high throughput, and high IOPS for reading and writing data by multiple Spark jobs. In addition, the data stored on the FSx for Lustre file system is synced to an S3 bucket for durable storage.

You can monitor the FSx for Lustre file system using Amazon CloudWatch metrics. The following time series graph shows the average stats with a period of 30 seconds.

When the Spark job is complete, you can verify the results in the Spark Web UI from the EMR on EKS console.

You can also verify the FSx for Lustre file system data sync to an S3 bucket.

Dynamic provisioning

So far, we have looked at an FSx for Lustre statically provisioned file system example and its usage with Spark jobs.

We can also provision an FSx for Lustre file system on-demand using the FSx for Lustre CSI driver and Persistent Volume Claim. Whenever you create a PVC with a dynamic volume referring to an FSx storage class, the FSx for Lustre CSI driver automatically provisions the FSx for Lustre file system and the corresponding Persistent Volume. Admin teams (DevOps) are responsible for deploying the FSx for Lustre CSI driver and FSx storage class, and the developers and data engineers (DataOps) are responsible for deploying the PVC, which refers to the FSx storage class.

The following storage class is deployed to Amazon EKS by this Terraform deployment. This dynamic PVC example doesn't use an Amazon S3 data repository association. You can still configure one, but it requires an Amazon S3 config in the storage class manifest. Check out Dynamic Provisioning with Data Repository to configure the FSx storage class with the S3 import/export path and your choice of deployment type (SCRATCH_1, SCRATCH_2, or PERSISTENT_1). We have also created a dedicated security group used in this manifest. For more information, refer to File System Access Control with Amazon VPC.

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: fsx
provisioner: fsx.csi.aws.com
parameters:
  securityGroupIds: sg-0c8a656a0bbb17fe2
  subnetId: subnet-03cb3d850193b907b
reclaimPolicy: Delete
volumeBindingMode: Immediate

The following YAML template shows the definition of the dynamic PVC used in this deployment. Running the command kubectl edit pvc fsx-dynamic-pvc -n emr-data-team-a shows the deployed resource. PVCs are namespace-specific resources typically created by developers, so we define the emr-data-team-a namespace.

Spark can dynamically provision the PVC with claimName using SparkConf (for example, spark.kubernetes.driver.volumes.persistentVolumeClaim.data.options.claimName=OnDemand; see the sketch after the following manifest). However, we recommend deploying the PVC before the start of the Spark job to avoid delays while the FSx for Lustre file system is provisioned during the job run. Creating the FSx for Lustre file system takes approximately 10–12 minutes.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fsx-dynamic-pvc
  namespace: emr-data-team-a
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 2000Gi
  storageClassName: fsx # PVC reference to Storage class created by Terraform
  volumeMode: Filesystem
  volumeName: pvc-0da5a625-03ba-48fa-b08e-3f74291c0e5e # Dynamically created Persistent Volume
status:
  accessModes:
  - ReadWriteMany
  capacity:
    storage: 2400Gi
  phase: Bound
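
If you do want Spark to create the claim on demand, as mentioned earlier, the submit parameters might look like the following sketch. The volume name, size, and mount path below are assumptions based on this example:

# Spark on Kubernetes volume options for an on-demand PVC backed by the fsx storage class
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.spark-local-dir-ny-taxi.options.claimName=OnDemand
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.spark-local-dir-ny-taxi.options.storageClass=fsx
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.spark-local-dir-ny-taxi.options.sizeLimit=2000Gi
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.spark-local-dir-ny-taxi.mount.path=/dynamic
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.spark-local-dir-ny-taxi.mount.readOnly=false

The same options exist under spark.kubernetes.executor.volumes for the executor pods. Keep in mind that each on-demand claim triggers a new file system creation, which is why pre-creating the PVC is recommended here.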

Now that we have set up the dynamic FSx for Lustre file system, we can use it in EMR on EKS Spark jobs using pod templates. Key things to note in the following template are that the volumes section is defined as persistentVolumeClaim with the claim name fsx-dynamic-pvc, and the containers section refers to the unique mountPath folder /dynamic. We also use initContainers in the driver pod template to set the correct permissions and ownership for the Hadoop users used by the EMR on EKS driver and executor processes.

The following is our driver pod template:

# NOTE: PVC created before the start of the Spark job to avoid waiting for 15 mins to create FSx for Lustre filesystem while the job is running
---
apiVersion: v1
kind: Pod
metadata:
  name: fsx-taxi-driver
  namespace: emr-data-team-a # Namespace used to submit the jobs
spec:
  volumes:
    - name: spark-local-dir-ny-taxi
      persistentVolumeClaim:
        claimName: fsx-dynamic-pvc  # Dynamic PVC pre-created by this example terraform template

  nodeSelector:
    NodeGroupType: spark

  containers:
    - name: spark-kubernetes-driver 
      volumeMounts:
        - name: spark-local-dir-ny-taxi
          mountPath: /dynamic # FSx SCRATCH_1 filesystem for executors scratch space
          readOnly: false
  initContainers:  # initContainer only used in Driver to set the permissions for dynamically created filesystem.
    - name: spark-init-container-driver  
      image: public.ecr.aws/y4g4v0z7/busybox
      volumeMounts:
        - name: spark-local-dir-ny-taxi
          mountPath: /dynamic # FSx Scratch 1 filesystem for executors scratch space
      command: ["sh", "-c", "chmod 777 /dynamic && chown -hR +999:+1000 /dynamic"] # run both commands in a single shell invocation

The executor pod template also refers to the same persistentVolumeClaim (fsx-dynamic-pvc) and mounts it at the same mountPath, /dynamic:

apiVersion: v1
kind: Pod
metadata:
  name: fsx-taxi-exec
  namespace: emr-data-team-a
spec:
  volumes:
    - name: spark-local-dir-ny-taxi
      persistentVolumeClaim:
        claimName: fsx-dynamic-pvc  # Dynamic PVC pre-created by this example terraform template

  nodeSelector:
    NodeGroupType: spark

  containers:
    - name: spark-kubernetes-executor # Don't change this name. EMR on EKS looks for this container name.
      volumeMounts:
        - name: spark-local-dir-ny-taxi
          mountPath: /dynamic  # FSx Scratch 1 filesystem for executor’s scratch space
          readOnly: false

Let’s run the sample PySpark script using the preceding pod templates. Navigate to the examples/spark-execute directory and run the shell script (fsx-dynamic-spark.sh). This script is the same as the static provisioning example; the only difference is the pod templates, which refer to the dynamic volumes.

cd ~/terraform-aws-eks-blueprints/examples/analytics/emr-eks-fsx-lustre/examples/spark-execute

This shell script expects three input values: EMR_VIRTUAL_CLUSTER_ID, EMR_JOB_EXECUTION_ROLE_ARN, and your S3 bucket name. Use the same values used in the previous static provisioning example.

Let’s run the fsx-dynamic-spark.sh shell script:

./fsx-dynamic-spark.sh <EMR_VIRTUAL_CLUSTER_ID> \
s3://<YOUR_BUCKET_NAME> \
<EMR_JOB_EXECUTION_ROLE_ARN>

After the job is triggered, run the following commands to see the output of the job:

# Output of dynamic persistent volume claim fsx-dynamic-pvc
kubectl get pvc -n emr-data-team-a | grep fsx-dynamic-pvc

# Verify the used FSx for Lustre filesystem disk size with executor1
kubectl exec -ti ny-taxi-trip-dynamic-exec-1 -c spark-kubernetes-executor -n emr-data-team-a -- df -h

# Verify the files created under the /dynamic FSx mount
kubectl exec -ti ny-taxi-trip-dynamic-exec-1 -c spark-kubernetes-executor -n emr-data-team-a -- ls -lah /dynamic

The following screenshot shows the file system mounted under the /dynamic path. We can also see the Spark shuffle files created in the /dynamic folder.

Clean up

To clean up your environment, empty any S3 buckets created by this module, and then destroy the Terraform modules in reverse order with the following commands:

cd ~/terraform-aws-eks-blueprints/examples/analytics/emr-eks-fsx-lustre

terraform destroy -target="module.eks_blueprints_kubernetes_addons" -auto-approve

terraform destroy -target="module.eks_blueprints" -auto-approve

terraform destroy -target="module.vpc" -auto-approve

# Finally, destroy any additional resources that are not in the above modules

terraform destroy -auto-approve

Furthermore, log in to the AWS Management Console and delete any S3 buckets or FSx for Lustre file systems created by this deployment to avoid unwanted charges to your AWS account.

Conclusion

In this post, we demonstrated how to mount an FSx for Lustre file system as a PVC for Spark applications with EMR on EKS. We showed two mounting methods: static provisioning and dynamic provisioning via the FSx for Lustre CSI driver. This HDFS-like storage can be used with the Spark on Kubernetes pattern to achieve optimal storage performance. You can use it either as temporary scratch space to store intermediate data during processing, or as a shared, persistent file system to exchange data between multiple pods in a single job or between multiple Spark jobs.

To try out the full solution, or for more EMR on EKS examples, check out our open-source project on GitHub.


About the authors

Vara Bonthu is a Senior Open Source Engineer focused on data analytics and containers working with Strategic Accounts. He is passionate about open source, big data, Kubernetes, and has a substantial development, DevOps, and architecture background.

Karthik Prabhakar is a Senior Analytics Specialist Solutions Architect at AWS, helping strategic customers adopt and run AWS Analytics services.

Melody Yang is a Senior Big Data Solutions Architect for Amazon EMR at AWS. She is an experienced analytics leader working with AWS customers to provide best practice guidance and technical advice in order to assist their success in data transformation. Her areas of interests are open-source frameworks and automation, data engineering, and DataOps.

Migrating petabytes of data from on-premises file systems to Amazon FSx for Lustre

Post Syndicated from Vimala Pydi original https://aws.amazon.com/blogs/architecture/migrating-petabytes-of-data-from-on-premises-file-systems-to-amazon-fsx-for-lustre/

Many organizations use the Lustre filesystem for Linux-based applications that require petabytes of data and high-performance storage. Lustre file systems are used in machine learning (ML), high performance computing (HPC), big data, and financial analytics. Many such high-performance workloads are being migrated to Amazon Web Services (AWS) to take advantage of the scalability, elasticity, and agility that AWS offers. Amazon FSx for Lustre is a fully managed service that provides cost-effective, high-performance, and scalable storage for Lustre file systems on AWS.

AWS DataSync is an AWS managed service for copying data to and from Amazon FSx for Lustre. It provides high-speed transfer through its use of compression and parallel transfer mechanisms, and it integrates with Amazon CloudWatch for observability.

This blog will show you how to migrate petabytes of data files from on-premises to Amazon FSx for Lustre using AWS DataSync. It will provide an overview of Amazon CloudWatch metrics and logs to help you monitor your data transfer using AWS DataSync and metrics from Amazon FSx for Lustre.

Solution overview for file storage data migration

The high-level architecture diagram in Figure 1 depicts file storage data migration from an on-premises data center to Amazon FSx for Lustre using AWS DataSync.

Following are the steps for the migration:

  1. Create an Amazon FSx file system.
  2. Install the AWS DataSync agent on premises to connect to the AWS DataSync service over a secure TLS connection.
  3. Configure source and target locations to create an AWS DataSync task.
  4. Configure and start the AWS DataSync task to migrate the data from on-premises to Amazon FSx for Lustre.
Figure 1. Architecture diagram for transferring files on-premises to Amazon FSx for Lustre using AWS DataSync

Prerequisites

Steps for migration

1. Create an Amazon FSx file system

To start the migration, create a Lustre file system in the Amazon FSx service, following the step-by-step guidance provided in Getting started with Amazon FSx for Lustre.

For this blog, an FSx for Lustre file system with the ‘Persistent 2’ deployment type and a storage capacity of 1.2 TB is selected as the target (Figure 2.)

Figure 2. FSx for Lustre target file system
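
You can create an equivalent file system from the AWS CLI as well. The following is a minimal sketch; the subnet, security group, and throughput values are assumptions to adjust for your environment:

aws fsx create-file-system \
  --file-system-type LUSTRE \
  --storage-capacity 1200 \
  --subnet-ids subnet-0123456789abcdef0 \
  --security-group-ids sg-0123456789abcdef0 \
  --lustre-configuration DeploymentType=PERSISTENT_2,PerUnitStorageThroughput=125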

2. Install AWS DataSync agent on-premises

Follow the steps in Getting started with AWS DataSync to set up the AWS DataSync service. Then configure the source system to migrate the file system data using the following steps:

  • Deploy an AWS DataSync agent on-premises on a supported virtual machine or hypervisor (Figure 3.)
  • Configure the AWS DataSync agent from AWS Management Console.
  • Activate the AWS DataSync agent configured from the preceding step.
Figure 3. Create AWS DataSync agent

3. Configure source and destination locations

A DataSync task consists of a pair of locations between which data is transferred. The source location defines the storage system that you want to read from. The destination location defines the storage service that you want to write data to. Here the source location is an on-premises Lustre system and the destination location is the Amazon FSx for Lustre service (Figure 4.)

Figure 4. Configure source and destination location for AWS DataSync task
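
The destination location can also be created from the AWS CLI. The following sketch creates the FSx for Lustre location; the ARNs are placeholders:

aws datasync create-location-fsx-lustre \
  --fsx-filesystem-arn arn:aws:fsx:us-east-1:111122223333:file-system/fs-0123456789abcdef0 \
  --security-group-arns arn:aws:ec2:us-east-1:111122223333:security-group/sg-0123456789abcdef0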

4. Configure and start task

A task is a set of two locations (source and destination) and a set of options that you use to control the behavior of the task. Create a task with the source and destination locations and choose Start from the Actions menu (Figure 5.)

Figure 5. Start task
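
Equivalently, from the AWS CLI, you can create the task from the two location ARNs and then start an execution; all ARNs below are placeholders:

# Create the task from the source and destination locations
aws datasync create-task \
  --source-location-arn arn:aws:datasync:us-east-1:111122223333:location/loc-0123456789abcdef0 \
  --destination-location-arn arn:aws:datasync:us-east-1:111122223333:location/loc-0fedcba9876543210 \
  --name fsx-lustre-migration

# Start the data transfer
aws datasync start-task-execution \
  --task-arn arn:aws:datasync:us-east-1:111122223333:task/task-0123456789abcdef0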

Wait for the task status to change to Running (Figure 6.)

Figure 6. Task status

To check the details of the task completion, select the task and click on the History tab (Figure 7.) The status should show Success once the task successfully completes the migration.

Figure 7. Task history

Monitoring the file transfer

Amazon CloudWatch is the AWS native observability service. It collects and processes raw data from AWS services such as Amazon FSx for Lustre and AWS DataSync into readable, near real-time metrics. It provides metrics that you can use to get more visibility into the data transfer. For a full list of CloudWatch metrics for AWS DataSync and Amazon FSx for Lustre, read Monitoring AWS DataSync and Monitoring Amazon FSx for Lustre.

Amazon FSx for Lustre also provides various metrics, for example, the number of read or write operations via DataReadOperations and DataWriteOperations. To find the total storage available, check the FreeDataStorageCapacity metric (Figure 8.)

Figure 8. CloudWatch metrics for Amazon FSx for Lustre
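
As an example, the following sketch queries the remaining capacity from the CLI; the file system ID and time window are placeholders:

aws cloudwatch get-metric-statistics \
  --namespace AWS/FSx \
  --metric-name FreeDataStorageCapacity \
  --dimensions Name=FileSystemId,Value=fs-0123456789abcdef0 \
  --start-time 2023-01-01T00:00:00Z \
  --end-time 2023-01-01T01:00:00Z \
  --period 300 \
  --statistics Average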

AWS DataSync metrics such as FilesTransferred give the actual number of files or metadata items that transferred over the network. BytesTransferred provides the total number of bytes that transferred over the network as the agent reads from the source location and writes to the destination location.

You can build a robust monitoring system by setting up automated notifications for any errors or issues in the data transfer task, integrating Amazon CloudWatch with Amazon Simple Notification Service (Amazon SNS). Figure 9 depicts the AWS DataSync logs in Amazon CloudWatch.

Figure 9. AWS DataSync logs in Amazon CloudWatch

You can also gather insights from the data transfer logs using CloudWatch Logs Insights, which enables you to quickly search and query your log data (Figure 10.) You can set a metric filter for error codes and alert the appropriate team.

Figure 10. Amazon CloudWatch Logs Insights for querying logs
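
The following sketch runs such a query from the CLI, filtering the DataSync task logs for errors. The log group name is an assumption; use the log group configured for your task, and note that the start and end times are epoch seconds:

aws logs start-query \
  --log-group-name /aws/datasync \
  --start-time 1672531200 \
  --end-time 1672534800 \
  --query-string 'fields @timestamp, @message | filter @message like /ERROR/ | sort @timestamp desc | limit 20'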

Cleanup

If you are no longer using the resources discussed in this blog, remove the unneeded AWS resources to avoid incurring charges. After finishing the file transfer, clean up resources by deleting the Amazon FSx file system and AWS DataSync objects (DataSync agent, task, source location, and destination location.)

Conclusion

In this post, we demonstrated how we can accelerate migration of Lustre files from on-premises into Amazon FSx for Lustre using AWS DataSync. As a fully managed service, AWS DataSync securely and seamlessly connects to your Amazon FSx for Lustre file system. This makes it possible for you to move millions of files and petabytes of data without the need for deploying or managing infrastructure in the cloud. We walked through different observability metrics with Amazon CloudWatch integration to provide performance metrics, logging, and events. This can further help to speed up critical hybrid cloud storage workflows in industries that must move active files into AWS quickly. This capability is available in Regions where AWS DataSync and Amazon FSx for Lustre are available. For further details on using this cost-effective service, see Amazon FSx for Lustre pricing and AWS DataSync pricing.


Enhanced Amazon S3 Integration for Amazon FSx for Lustre

Post Syndicated from Sébastien Stormacq original https://aws.amazon.com/blogs/aws/enhanced-amazon-s3-integration-for-amazon-fsx-for-lustre/

Today, we are announcing two additional capabilities of Amazon FSx for Lustre. First, a full bi-directional synchronization of your file systems with Amazon Simple Storage Service (Amazon S3), including deleted files and objects. Second, the ability to synchronize your file systems with multiple S3 buckets or prefixes.

Lustre is a large-scale, distributed parallel file system powering the workloads of most of the largest supercomputers. It is popular among AWS customers for high-performance computing workloads, such as meteorology, life sciences, and engineering simulations. It is also used in media and entertainment, as well as the financial services industry.

I had my first hands-on experience with Lustre file systems when I was working for Sun Microsystems. I was a pre-sales engineer and worked on some deals to sell multimillion-dollar compute and storage infrastructure to financial services companies. Back then, having access to a Lustre file system was a luxury. It required expensive compute, storage, and network hardware. We had to wait weeks for delivery. Furthermore, it required days to install and configure a cluster.

Fast forward to 2021: I can create a petabyte-scale Lustre cluster and attach the file system to compute resources running in the AWS Cloud, on demand, and only pay for what I use. There is no need to know about Storage Area Networks (SAN), Fibre Channel (FC) fabric, and other underlying technologies.

Modern applications use different storage options for different workloads. It is common to use S3 object storage for data transformation, preparation, or import/export tasks. Other workloads may require POSIX file systems to access the data. FSx for Lustre lets you synchronize objects stored on S3 with the Lustre file system to meet these requirements.

When you link your S3 bucket to your file system, FSx for Lustre transparently presents S3 objects as files and lets you write results back to S3.

Full Bi-Directional Synchronization with Multiple S3 Buckets
If your workloads require fast, POSIX-compliant file system access to your S3 buckets, then you can use FSx for Lustre to link your S3 buckets to a file system and keep data synchronized between the file system and S3 in both directions. However, until today, there were a couple of limitations. First, you had to manually configure a task to export data back from FSx for Lustre to S3. Second, deleted files on S3 were not automatically deleted from the file system. And third, an FSx for Lustre file system could be synchronized with only one S3 bucket. We are addressing these three challenges with this launch.

Starting today, when you configure an automatic export policy for your data repository association, files on your FSx for Lustre file system are automatically exported to your data repository on S3. Next, deleted objects on S3 are now deleted from the FSx for Lustre file system. The opposite is also available: deleting files on FSx for Lustre triggers the deletion of corresponding objects on S3. Finally, you may now synchronize your FSx for Lustre file system with multiple S3 buckets. Each bucket has a different path at the root of your Lustre file system. For example, your S3 bucket logs may be mapped to /fsx/logs and your other financial_data bucket may be mapped to /fsx/finance.

These new capabilities are useful when you must concurrently process data in S3 buckets using both a file-based and an object-based workflow, as well as share results in near real time between these workflows. For example, an application that accesses file data can do so by using an FSx for Lustre file system linked to your S3 bucket, while another application running on Amazon EMR may process the same files from S3.

Moreover, you may link multiple S3 buckets or prefixes to a single FSx for Lustre file system, thereby enabling a unified view across multiple datasets. Now you can create a single FSx for Lustre file system and easily link multiple S3 data repositories (S3 buckets or prefixes). This is convenient when you use multiple S3 buckets or prefixes to organize and manage access to your data lake, access files from a public S3 bucket (such as these hundreds of public datasets) and write job outputs to a different S3 bucket, or when you want to use a larger FSx for Lustre file system linked to multiple S3 datasets to achieve greater scale-out performance.
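
With data repository associations, linking two buckets to the same file system under different paths might look like the following sketch from the AWS CLI; the file system ID and bucket names are placeholders:

# Map s3://logs to /fsx/logs
aws fsx create-data-repository-association \
  --file-system-id fs-0123456789abcdef0 \
  --file-system-path /fsx/logs \
  --data-repository-path s3://logs

# Map s3://financial_data to /fsx/finance
aws fsx create-data-repository-association \
  --file-system-id fs-0123456789abcdef0 \
  --file-system-path /fsx/finance \
  --data-repository-path s3://financial_data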

How It Works
Let’s create an FSx for Lustre file system and attach it to an Amazon Elastic Compute Cloud (Amazon EC2) instance. I make sure that the file system and instance are in the same VPC subnet to minimize data transfer costs. The file system security group must authorize access from the instance.

I open the AWS Management Console, navigate to FSx, and select Create file system. Then, I select Amazon FSx for Lustre. I am not going through all of the options to create a file system here; you can refer to the documentation to learn how to create a file system. I make sure that Import data from and export data to S3 is selected.

Lustre - enable S3 synchronization

It takes a few minutes to create the file system. Once the status is ✅ Available, I navigate to the Data repository tab, and then select Create data repository association.

I choose a Data Repository path (my source S3 bucket) and a file system path (where in the file system that bucket will be imported).

FsX Lustre Data repository

Then, I choose the Import policy and Export policy. I may synchronize the creation of files/objects, their updates, and their deletions. I select Create.

FsX Lustre Data repository import policies

When I use automatic import, I also make sure to provide an S3 bucket in the same AWS Region as the FSx for Lustre cluster. FSx for Lustre supports linking to an S3 bucket in a different AWS Region for automatic export and all other capabilities.

Using the console, I see the list of Data repository associations. I wait for the import task status to become ✅ Succeeded. If I link the file system to an S3 bucket with a large number of objects, then I may choose to skip Importing metadata from repository while creating the data repository association, and instead load the metadata required for my workload from selected prefixes in my S3 buckets using an import task.

FsX for lustre - meta data repository tasks
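
As a sketch, loading metadata for a selected prefix with an import task could look like the following from the AWS CLI; the file system ID and prefix are placeholders:

aws fsx create-data-repository-task \
  --file-system-id fs-0123456789abcdef0 \
  --type IMPORT_METADATA_FROM_REPOSITORY \
  --paths myprefix \
  --report Enabled=false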

I create an EC2 instance in the same VPC subnet. Furthermore, I make sure that the FSx for Lustre cluster security group authorizes ingress traffic from the EC2 instance. I use SSH to connect to the instance, and then type the following commands (commands are prefixed with the $ sign that is part of my shell prompt).

# check kernel version, minimum version 4.14.104-95.84 is required 
$ uname -r
4.14.252-195.483.amzn2.aarch64

# install lustre client 
$ sudo amazon-linux-extras install -y lustre2.10
Installing lustre-client
...
Installed:
  lustre-client.aarch64 0:2.10.8-5.amzn2                                                                                                                        

Complete!

# create a mount point 
$ sudo mkdir /fsx

# mount the file system 
$ sudo mount -t lustre -o noatime,flock fs-00...9d.fsx.us-east-1.amazonaws.com@tcp:/ny345bmv /fsx

# verify mount succeeded
$ mount 
...
172.0.0.0@tcp:/ny345bmv on /fsx type lustre (rw,noatime,flock,lazystatfs)

Then, I verify that the file system contains the S3 objects, and I create a new file using the touch command.

Fsx Lustre - check file system

I switch to the AWS Console, under S3 and then my bucket name, and I verify that the file has been synchronized.

Fsx Lustre - check s3

Using the console, I delete the file from S3. And, unsurprisingly, after a few seconds, the file is also deleted from the FSx file system.

Fsx Lustre - check file systems - deleted

Pricing and Availability
These new capabilities are available at no additional cost on Amazon FSx for Lustre file systems. Automatic export and multiple repositories are only available on Persistent 2 file systems in US East (N. Virginia), US East (Ohio), US West (Oregon), Canada (Central), Asia Pacific (Tokyo), Europe (Frankfurt), and Europe (Ireland). Automatic import with support for deleted and moved objects in S3 is available on file systems created after July 23, 2020 in all regions where FSx for Lustre is available.

You can configure your file system to automatically import S3 updates by using the AWS Management Console, the AWS Command Line Interface (CLI), and AWS SDKs.

Learn more about using S3 data repositories with Amazon FSx for Lustre file systems.

One More Thing
One more thing while you are reading. Today, we also launched the next generation of FSx for Lustre file systems. FSx for Lustre next-gen file systems are built on AWS Graviton processors. They are designed to provide you with up to 5x higher throughput per terabyte (up to 1 GB/s per terabyte) and reduce your cost of throughput by up to 60% as compared to previous generation file systems. Give it a try today!

— seb

PS: my colleague Michael recorded a demo video to show you the enhanced S3 integration for FSx for Lustre in action. Check it out today.

Fire Dynamics Simulation CFD workflow using AWS ParallelCluster, Elastic Fabric Adapter, Amazon FSx for Lustre and NICE DCV

Post Syndicated from Emma White original https://aws.amazon.com/blogs/compute/fire-dynamics-simulation-cfd-workflow-using-aws-parallelcluster-elastic-fabric-adapter-amazon-fsx-for-lustre-and-nice-dcv/

This post was written by Kevin Tuil, AWS HPC consultant.

Modeling fires is key for many industries, from the design of new buildings and the definition of evacuation procedures for trains, planes, and ships, to the spread of wildfires. Modeling these fires is complex. It involves both the need to model the three-dimensional unsteady turbulent flow of the fire and the many potential chemical reactions. To achieve this, the fire modeling community has moved to higher-fidelity turbulence modeling approaches such as the Large Eddy Simulation, which requires both significant temporal and spatial resolution. This means that the computational cost for these simulations is typically in the order of days to weeks on a single workstation.
While there are a number of software packages available, one of the most popular is the open-source Fire Dynamics Simulator (FDS), developed by the National Institute of Standards and Technology (NIST).

In this blog, I focus on how AWS High Performance Computing (HPC) resources (e.g., AWS ParallelCluster, Amazon FSx for Lustre, Elastic Fabric Adapter (EFA), and Amazon S3) allow FDS users to scale up beyond a single workstation to hundreds of cores and achieve simulation times of hours rather than days or weeks. I also outline the architecture needed, providing scripts and templates to compile FDS and run your simulation.

Service and solution overview

AWS ParallelCluster

AWS ParallelCluster is an open source cluster management tool that simplifies deploying and managing HPC clusters with Amazon FSx for Lustre, EFA, a variety of job schedulers, and the MPI library of your choice. AWS ParallelCluster simplifies cluster orchestration on AWS so that HPC environments become easy to use, even if you are new to the cloud. AWS recently released AWS ParallelCluster 2.9.1 and its user guide, which is the version I use in this blog.

These three AWS HPC resources are optimal for Fire Dynamics Simulation. Together, they provide easy deployment of HPC systems on AWS, low latency network communication for MPI workloads, and a fast, parallel file system.

Elastic Fabric Adapter

EFA is a critical service that provides low-latency, high-bandwidth 100 Gbps network communication. EFA allows applications to scale at the level of on-premises HPC clusters with the on-demand elasticity and flexibility of the AWS Cloud. Computational Fluid Dynamics (CFD), among other tightly coupled applications, is an excellent candidate for the use of EFA.

Amazon FSx for Lustre

Amazon FSx for Lustre is a fully managed, high-performance file system, optimized for fast processing workloads like HPC. Amazon FSx for Lustre allows users to access and alter data from either Amazon S3 or on-premises systems seamlessly and exceptionally fast. For example, you can launch and run a file system that provides sub-millisecond latency access to your data. Additionally, you can read and write data at speeds of up to hundreds of gigabytes per second of throughput, and millions of IOPS. This speed and low latency unleash innovation at an unparalleled pace. This blog post uses the latest version of Amazon FSx for Lustre, which recently added a new API for moving data in and out of Amazon S3. This API also includes POSIX support, which allows files to mount with the same user ID. Additionally, the latest version includes a new backup feature that allows you to back up your files to an S3 bucket.

Solution and steps

The overall solution that I deploy in this blog is represented in the following diagram:

solution overview diagram

Step 1: Access to AWS Cloud9 terminal and upload data

There are two ways to start using AWS ParallelCluster. You can either install the AWS CLI or use AWS Cloud9, a cloud-based integrated development environment (IDE) that includes a terminal. For simplicity, I use AWS Cloud9 to create the HPC cluster. Please refer to this link to proceed with the AWS Cloud9 setup, and to this link for the AWS CLI setup.

Once logged in to your AWS Cloud9 instance, the first thing to create is the S3 bucket. This bucket is key to exchanging user data between the corporate data center and the AWS HPC cluster. Please make sure that your bucket name is globally unique, meaning there is only one worldwide across all AWS Regions.

aws s3 mb s3://fds-smv-bucket-unique
make_bucket: fds-smv-bucket-unique

Download the latest FDS-SMV Linux version package from the official NIST website. It looks something like: FDS6.7.4_SMV6.7.14_lnx.sh

Rename the geometry file to “geometry.fds”, and upload it to your AWS Cloud9 instance or directly to your S3 bucket.

Please note that once the FDS-SMV package has been downloaded locally to the instance, you must upload it to the S3 bucket using the following commands.

aws s3 cp FDS6.7.4_SMV6.7.14_lnx.sh s3://fds-smv-bucket-unique
aws s3 cp geometry.fds s3://fds-smv-bucket-unique

You use the same S3 bucket to install FDS-SMV later on with the Amazon FSx for Lustre File System.

Step 2: Set up AWS ParallelCluster

You can install AWS ParallelCluster running the following command from your AWS Cloud9 instance:

sudo pip install aws-parallelcluster

Once it is installed, you can run the following command to check the version:

pcluster version 

At the time of writing this blog, 2.9.1 is the most up-to-date version.

Then use the text editor of your choice and open the configuration file as follows:

vim ~/.parallelcluster/config

Replace the bolded sections, if not yet filled in, with your own information, and save the configuration file.

[aws]
aws_region_name = <AWS-REGION>

[global]
sanity_check = true
cluster_template = fds-smv-cluster
update_check = true

[vpc public]
vpc_id = vpc-<VPC-ID>
master_subnet_id = subnet-<SUBNET-ID>

[cluster fds-smv-cluster]
key_name = <Key-Name>
vpc_settings = public
compute_instance_type=c5n.18xlarge
master_instance_type=c5.xlarge
initial_queue_size = 0
max_queue_size = 100
scheduler=slurm
cluster_type = ondemand
s3_read_write_resource=arn:aws:s3:::fds-smv-bucket-unique*
placement_group = DYNAMIC
placement = compute
base_os = alinux2
tags = {"Name" : "fds-smv"}
disable_hyperthreading = true
fsx_settings = fsxshared
enable_efa = compute
dcv_settings = hpc-dcv

[dcv hpc-dcv]
enable = master

[fsx fsxshared]
shared_dir = /fsx
storage_capacity = 1200
import_path = s3://fds-smv-bucket-unique
imported_file_chunk_size = 1024
export_path = s3://fds-smv-bucket-unique

[aliases]
ssh = ssh {CFN_USER}@{MASTER_IP} {ARGS}

Let’s review the different sections of the configuration file and explain their role:

  • scheduler: Supported job schedulers are SGE, TORQUE, SLURM, and AWS Batch. I have selected SLURM for this example.
  • cluster_type: You have the choice between On-Demand (ondemand) or Spot Instances (spot) for your compute instances. For On-Demand, instances are available for use without condition (if available in the selected Region) at a certain price per hour with the pay-as-you-go model, meaning that as soon as they are started, they are reserved for your utilization. For Spot Instances, you can take advantage of unused EC2 capacity in the AWS Cloud. Spot Instances are available at up to a 90% discount compared to On-Demand Instance prices. You can use Spot Instances for various stateless, fault-tolerant, or flexible applications such as HPC. For more information about Spot Instances, feel free to visit this webpage.
  • s3_read_write_resource: This parameter allows you to read and write objects directly on your S3 bucket from the cluster you created without additional permissions. It acts as a role for your cluster, allowing you access to your specified S3 bucket.
  • placement_group: Use DYNAMIC to ensure that your instances are located as physically close to one another as possible. Close placement minimizes the latency between compute nodes and takes advantage of EFA’s low latency networking.
  • placement: By selecting compute you only enforce compute instances to be placed within the same placement group, leaving the head node placement free.
  • compute_instance_type: Select c5n.18xlarge because it is optimized for compute-intensive workloads and supports EFA for better scaling of HPC applications. Note that EFA is supported only for specific instance types. Please visit currently supported instances for more information.
  • master_instance_type: This can be any instance type. As traffic between head and compute nodes is relatively small, and the head node runs during the entire lifetime of the cluster, I use c5.xlarge because it is inexpensive and is a good fit for this use case.
  • initial_queue_size: You start with no compute instances after the HPC cluster is up. This means that any new job submitted has some delay (time for the nodes to be powered on) before it is seen as available by the job scheduler. This helps you pay for what you use and keeps costs as low as possible.
  • max_queue_size: Limit the maximum compute fleet to 100 instances. This allows you room to scale your jobs up to a large number of cores, while putting a limit on the number of compute nodes to help control costs.
  • base_os: For this blog, select Amazon Linux 2 (alinux2) as the base OS. Currently we also support Amazon Linux (alinux), CentOS 7 (centos7), Ubuntu 16.04 (ubuntu1604), and Ubuntu 18.04 (ubuntu1804) with EFA.
  • disable_hyperthreading: This setting turns off hyperthreading (true) on your cluster, which is the right configuration in this use case.
  • [fsx fsxshared]: This section contains the settings to define your FSx for Lustre parallel file system, including the location where the shared directory is mounted, the storage capacity for the file system, the chunk size for files to be imported, and the location from which the data will be imported. You can read more about FSx for Lustre here.
  • enable_efa: Mark as (true) in this use case, since it is a tightly coupled CFD simulation use case.
  • dcv_settings: With AWS ParallelCluster, you can use NICE DCV to support your remote visualization needs.
  • [dcv hpc-dcv]: This section contains the settings to define your remote visualization setup. You can read more about DCV with AWS ParallelCluster here.
  • import_path: This parameter makes all the objects in the S3 bucket available when creating the cluster, so that they can be seen directly from the FSx for Lustre file system. In this case, you are able to access the FDS-SMV package and the geometry under the /fsx mounted folder.
  • export_path: This parameter is useful for backup purposes using Data Repository Tasks. I share more details about this in step 7 (optional).

Step 3: Create the HPC cluster and log in

Now, you can create the HPC cluster, named fds-smv. It takes around 10 minutes to complete, and you can see the status changing as it goes through the different AWS CloudFormation template steps. At the end of creation, two IP addresses are displayed: a public IP and/or a private IP, depending on your network choice.

pcluster create fds-smv
Creating stack named: parallelcluster-fds-smv
Status: parallelcluster-fds-smv - CREATE_COMPLETE                               
MasterPublicIP: X.X.X.X
ClusterUser: ec2-user
MasterPrivateIP: X.X.X.X

In order to log in, you must use the key you specified in the AWS ParallelCluster configuration file before creating the cluster:

pcluster ssh fds-smv -i <Key-Name>

You should now be logged in as an ec2-user (since we are using Amazon Linux 2 base OS).

Step 4: Install FDS-SMV package

Now that the HPC cluster using AWS ParallelCluster is set up, it is time to install the FDS-SMV package. In the prior steps, you uploaded both the FDS-SMV package and the geometry to your S3 bucket. Since you set "import_path" to that bucket, they are already available on the Amazon FSx for Lustre storage under /fsx.

Run the script as follows and select /fsx/fds-smv as final target for installation:

cd /fsx
./FDS6.7.4_SMV6.7.14_lnx.sh
[ec2-user@ip-X-X-X-X fsx]$ ./FDS6.7.4_SMV6.7.14_lnx.sh 

Installing FDS and Smokeview  for Linux

Options:
  1) Press <Enter> to begin installation [default]
  2) Type "extract" to copy the installation files to:
     FDS6.7.4_SMV6.7.14_lnx.tar.gz
 

FDS install options:
  Press 1 to install in /home/ec2-user/FDS/FDS6 [default]
  Press 2 to install in /opt/FDS/FDS6
  Press 3 to install in /usr/local/bin/FDS/FDS6
  Enter a directory path to install elsewhere
/fsx/fds-smv

It is important to source the following scripts, which are part of the installed package, to check that the installation succeeded with the correct versions. Here is the correct output you should get:

[ec2-user@ip-X-X-X-X ~]$ source /fsx/fds-smv/bin/SMV6VARS.sh 
[ec2-user@ip-X-X-X-X ~]$ source /fsx/fds-smv/bin/FDS6VARS.sh 
[ec2-user@ip-X-X-X-X ~]$ fds -version
FDS revision       : FDS6.7.4-0-gbfaa110-release
MPI library version: Intel(R) MPI Library 2019 Update 4 for Linux* OS

[ec2-user@ip-10-0-2-233 ~]$ smokeview -version

Smokeview  SMV6.7.14-0-g568693b-release - Mar  9 2020

Revision         : SMV6.7.14-0-g568693b-release
Revision Date    : Wed Mar 4 23:13:42 2020 -0500
Compilation Date : Mar  9 2020 16:31:22
Compiler         : Intel C/C++ 19.0.4.243
Checksum(SHA1)   : e801eace7c6597dc187739e51ba6f546bfde4e48
Platform         : LINUX64

Important notes:

The FDS-SMV package was installed using the default installation. Binaries are already compiled, and Intel MPI libraries are embedded as part of the installation package. It is what one would call a self-contained application. For further builds and source code, please visit this webpage.

Step 5: Running the fire dynamics simulation using FDS

Now that everything is installed, it is time to create the SLURM submission script. In this step, you take advantage of the FSx for Lustre File System, the compute-optimized instance, and the EFA network to maximize simulation performance.

cd /fsx/
vi fds-smv.sbatch

Here is the information you should specify in your submission script:

#!/bin/bash
#SBATCH --job-name=fds-smv-job
#SBATCH --ntasks=<Total number of MPI processes>
#SBATCH --ntasks-per-node=36
#SBATCH --output=%x_%j.out

source /fsx/fds-smv/bin/FDS6VARS.sh
source /fsx/fds-smv/bin/SMV6VARS.sh

module load intelmpi 

export OMP_NUM_THREADS=1
export I_MPI_PIN_DOMAIN=omp

cd /fsx/<results>

time mpirun -ppn 36 -np <Total number of MPI processes>  fds geometry.fds

Replace <results> with the directory of your choice, and don't forget to copy the geometry.fds file into it before submitting your job. Once ready, save the file and submit the job using the following command:

sbatch fds-smv.sbatch 

If you decided to build your HPC cluster with c5n.18xlarge instances, the number of MPI processes per node is 36, since you turned off hyperthreading and the instance has 36 physical cores. That is the meaning of the “#SBATCH --ntasks-per-node=36” line.

For any run exceeding 36 MPI processes, the job is split among multiple instances and takes advantage of EFA for internode communication.

It is important to note that FDS only allows the number of MPI processes to equal the number of meshes in the input geometry (geometry.fds in this scenario). In case the number of meshes in the input geometry cannot be modified, OpenMP threads can be enabled to efficiently increase performance, using up to four OpenMP threads across four CPU cores attached to one MPI process.

Please read best practices provided by NIST for that topic on their user guide.

In order to take advantage of the distributed computing capability of FDS, it is mandatory to work first on the input geometry, and divide it into the appropriate number of meshes. It is also highly advised to evenly distribute the number of cells/elements per mesh across all meshes. This best practice optimizes the load balancing for each CPU core.
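
As a sketch, if your geometry has fewer meshes than available cores, the preceding submission script could be adapted to give each MPI rank four OpenMP threads; the task counts below are illustrative for c5n.18xlarge (36 physical cores per node):

#!/bin/bash
#SBATCH --job-name=fds-smv-omp-job
#SBATCH --ntasks=<Total number of MPI processes>
#SBATCH --ntasks-per-node=9
#SBATCH --cpus-per-task=4
#SBATCH --output=%x_%j.out

source /fsx/fds-smv/bin/FDS6VARS.sh
source /fsx/fds-smv/bin/SMV6VARS.sh

module load intelmpi

# Four OpenMP threads per MPI process (9 ranks x 4 threads = 36 cores per node)
export OMP_NUM_THREADS=4
export I_MPI_PIN_DOMAIN=omp

cd /fsx/<results>

time mpirun -ppn 9 -np <Total number of MPI processes> fds geometry.fds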

Step 6: Visualizing the results using NICE DCV and SMV

In order to visualize results, you must connect to the head node using the NICE DCV streaming protocol.

As a reminder, the current instance type for the head node is a c5.xlarge, which is not a graphics-accelerated instance. For heavy and GPU-intensive visualization, it is important to set up a more appropriate instance type, such as the G4 instance family.

Go back to your AWS Cloud9 instance, open a new terminal side by side to your session connected to your AWS HPC cluster, and enter the following command in the terminal:

pcluster dcv connect fds-smv -k <Key-Name>

You are provided a one-time HTTPS URL available for a short period of time in order to connect to your head node using the NICE DCV protocol.

Once connected, open the terminal inside your session and source the FDS-SMV scripts as before:

source /fsx/fds-smv/bin/FDS6VARS.sh
source /fsx/fds-smv/bin/SMV6VARS.sh

Navigate to your <results> folder and start SMV with your result.

I have selected one of the geometries named fire_whirl_pool.fds in the Examples folder, part of the default FDS-SMV installation package located here:

/fsx/fds-smv/Examples/Fires/fire_whirl_pool.fds

You can find other scenarios under the Examples folder to run some more use cases if you did not already choose your geometry.fds file.

Now you can run SMV and visualize your results:

smokeview fire_whirl_pool.smv

SMV (Smokeview) takes files with the .smv extension as input; replace with your appropriate file. If you have already chosen your geometry.fds, then run the following command:

smokeview geometry.smv

The application then opens as follows, and you can visualize the results. The following image shows the SOOT DENSITY output of the 3D smoke.

fire simulation picture

Step 7 (optional): Back up your FDS-SMV results to an S3 bucket

First, update the AWS CLI to its most recent version; this feature is compatible with version 1.16.309 and above.

After running your FDS-SMV simulation, you can use Data Repository Tasks to back up the data in /fsx to the S3 bucket you used earlier to upload the installation package and input files.

Data Repository Tasks represent bulk operations between your Amazon FSx for Lustre file system and your S3 bucket. One of the jobs is to export your changed file system contents back to its linked S3 bucket.

Open your AWS Cloud9 terminal and exit the HPC cluster head node. Retrieve your Amazon FSx for Lustre ID using:

aws fsx describe-file-systems

It looks something like fs-0533eebf1148fc8dd. Then create a backup of the data as follows:

aws fsx create-data-repository-task --file-system-id fs-0533eebf1148fc8dd --type EXPORT_TO_REPOSITORY --paths results --report Enabled=true,Scope=FAILED_FILES_ONLY,Format=REPORT_CSV_20191124,Path=s3://fds-smv-bucket-unique/

The following are definitions about the command parameters:

  • file-system-id: Your file system ID.
  • type EXPORT_TO_REPOSITORY: Exports the data back to the S3 bucket.
  • paths results: The directory you want to export to your S3 bucket. If you have more than one folder to back up, use a space-separated list such as: results1 results2
  • Format=REPORT_CSV_20191124: Note that this is the only report format Amazon FSx for Lustre currently supports. Please keep it the same.

You can check the backup status by running:

aws fsx describe-data-repository-tasks

Please wait for the copy to complete; once finished, you should see "Lifecycle": "SUCCEEDED" on the Lifecycle line.

Also go back to your S3 bucket; your folder(s) should appear with all the files correctly uploaded from the /fsx folder you specified.

In terms of data management, Amazon S3 is an important service. You started by uploading the installation package and geometry files from an external source, such as your laptop or an on-premises system. You then made these files available to the AWS HPC cluster under the Amazon FSx for Lustre file system and ran the simulation. Finally, you backed up the results from Amazon FSx for Lustre to Amazon S3. You can also decide to download the results from Amazon S3 back to your local system if needed.

Step 8: Delete your AWS resources created during the deployment of this blog

After your run is completed and your data backed up successfully (Step 7 is optional) on your S3 bucket, you can then delete your cluster by using the following command in your Cloud9 terminal:

pcluster delete fds-smv

Warning:

If you run the preceding command, all resources you created during this blog are automatically deleted, except your Cloud9 session and the data in the S3 bucket you created earlier.

Your S3 bucket still contains your input “geometry.fds” and your installation package “FDS6.7.4_SMV6.7.14_lnx.sh” files.

If you chose to back up your data during Step 7 (optional), then your S3 bucket also contains that data in addition to the two files mentioned above.

If you want to delete your S3 bucket and all the data mentioned above, go to the AWS Management Console, select the S3 service, select your S3 bucket, and choose Delete at the top.

If you want to terminate your Cloud9 session, go to the AWS Management Console, select the Cloud9 service, select your session, and choose Delete at the top right.

After performing these operations, there will be no more resources running on AWS related to this blog.

Conclusion

I showed that AWS ParallelCluster, Amazon FSx for Lustre, EFA, and Amazon S3 are key AWS services and features for HPC workloads such as CFD, and in particular for FDS.

You can achieve simulation times of hours on AWS rather than days or weeks on a single workstation.

Please visit this workshop for a more in-depth tutorial on running Fire Dynamics Simulator on AWS, and check out our dedicated HPC homepage.