All posts by Karthik Prabhakar

Achieve up to 27% better price-performance for Spark workloads with AWS Graviton2 on Amazon EMR Serverless

Post Syndicated from Karthik Prabhakar original https://aws.amazon.com/blogs/big-data/achieve-up-to-27-better-price-performance-for-spark-workloads-with-aws-graviton2-on-amazon-emr-serverless/

Amazon EMR Serverless is a serverless option in Amazon EMR that makes it simple to run applications using open-source analytics frameworks such as Apache Spark and Hive without configuring, managing, or scaling clusters.

At AWS re:Invent 2022, we announced support for running serverless Spark and Hive workloads with AWS Graviton2 (Arm64) on Amazon EMR Serverless. AWS Graviton2 processors are custom-built by AWS using 64-bit Arm Neoverse cores, delivering a significant leap in price-performance for your cloud workloads.

This post discusses the performance improvements observed while running Apache Spark jobs using AWS Graviton2 on EMR Serverless. We found that Graviton2 on EMR Serverless achieved 10% performance improvement for Spark workloads based on runtime. AWS Graviton2 is offered at a 20% lower cost than the x86 architecture option (see the Amazon EMR pricing page for details), resulting in a 27% overall better price-performance for workloads.

Spark performance test results

The following charts compare the benchmark runtime with and without Graviton2 for a EMR Serverless Spark application (note that the charts are not drawn to scale). We observed up to 10% improvement in total runtime and 8% improvement in geometric mean for the queries compared to x86.

The following table summarizes our results.

Metric Graviton2 x86 %Gain
Total Execution Time (in seconds) 2,670 2,959 10%
Geometric Mean (in seconds) 22.06 24.07 8%

Testing configuration

To evaluate the performance improvements, we use benchmark tests derived from TPC-DS 3 TB scale performance benchmarks. The benchmark consists of 104 queries, and each query is submitted sequentially to an EMR Serverless application. EMR Serverless has automatic and fine-grained scaling enabled by default. Spark provides Dynamic Resource Allocation (DRA) to dynamically adjust the application resources based on the workload, and EMR Serverless uses the signals from DRA to elastically scale workers as needed. For our tests, we chose a predefined pre-initialized capacity that allows the application to scale to default limits. Each application has 1 driver and 100 workers configured as pre-initialized capacity, allowing it to scale to a maximum of 8000 vCPU/60000 GB capacity. When launching the applications, as default we use x86_64 to get baseline numbers and Arm64 for AWS Graviton2, and the application had VPC networking enabled.

The following table summarizes the Spark application configuration.

Number of Drivers Driver Size Number of Executors Executor Size Ephemeral Storage Amazon EMR release label
1 4 vCPUs, 16 GB Memory 100 4 vCPUs, 16 GB Memory 200 G 6.9

Performance test results and cost comparison

Let’s do a cost comparison of the benchmark tests. Because we used 1 driver [4 vCPUs, 16 GB memory] and 100 executors [4 vCPUs, 16 GB memory] for each run, the total capacity used is 4*101=192 vCPUs, 16*101=1616 GB memory, 200*100=20000 GB storage. The following table summarizes the cost.

Test Total time (Seconds) vCPUs Memory (GB) Ephemeral (Storage GB) Cost
x86_64 2,958.82 404 1616 18000 $26.73
Graviton2 2,670.38 404 1616 18000 $19.59

The calculations are as follows:

  • Total vCPU cost = (number of vCPU * per vCPU rate * job runtime in hour)
  • Total GB = (Total GB of memory configured * per GB-hours rate * job runtime in hour)
  • Storage = 20 GB of ephemeral storage is available for all workers by default—you pay only for any additional storage that you configure per worker

Cost breakdown

Let’s look at the cost breakdown for x86:

  • Job runtime – 49.3 minutes = 0.82 hours
  • Total vCPU cost – 404 vCPUs x 0.82 hours job runtime x 0.052624 USD per vCPU = 17.4333 USD
  • Total GB cost – 1,616 memory-GBs x 0.82 hours job runtime x 0.0057785 USD per memory GB = 7.6572 USD
  • Storage cost – 18,000 storage-GBs x 0.82 hours job runtime x 0.000111 USD per storage GB = 1.6386 USD
  • Additional storage – 20,000 GB – 20 GB free tier * 100 workers = 18,000 additional storage GB
  • EMR Serverless total cost (x86): 17.4333 USD + 7.6572 USD + 1.6386 USD = 26.7291 USD

Let’s compare to the cost breakdown for Graviton 2:

  • Job runtime – 44.5 minutes = 0.74 hours
  • Total vCPU cost – 404 vCPUs x 0.74 hours job runtime x 0.042094 USD per vCPU = 12.5844 USD
  • Total GB cost – 1,616 memory-GBs x 0.74 hours job runtime x 0.004628 USD per memory GB = 5.5343 USD
  • Storage cost – 18,000 storage-GBs x 0.74 hours job runtime x 0.000111 USD per storage GB = 1.4785 USD
  • Additional storage – 20,000 GB – 20 GB free tier * 100 workers = 18,000 additional storage GB
  • EMR Serverless total cost (Graviton2): 12.5844 USD + 5.5343 USD + 1.4785 USD = 19.5972 USD

The tests indicate that for the benchmark run, AWS Graviton2 lead to an overall cost savings of 27%.

Individual query improvements and observations

The following chart shows the relative speedup of individual queries with Graviton2 compared to x86.

We see some regression in a few shorter queries, which had little impact on the overall benchmark runtime. We observed better performance gains for long running queries, for example:

  • q67 average 86 seconds for x86, 74 seconds for Graviton2 with 24% runtime performance gain
  • q23a and q23b gained 14% and 16%, respectively
  • q32 regressed by 7%; the difference between average runtime is <500 milliseconds (11.09 seconds for Graviton2 vs. 10.39 seconds for x86)

To quantify performance, we use benchmark SQL derived from TPC-DS 3 TB scale performance benchmarks.

If you’re evaluating migrating your workloads to Graviton2 architecture on EMR Serverless, we recommend testing the Spark workloads based on your real-world use cases. The outcome might vary based on the pre-initialized capacity and number of workers chosen. If you want to run workloads across multiple processor architectures, (for example, test the performance on x86 and Arm vCPUs) follow the walkthrough in the GitHub repo to get started with some concrete ideas.

Conclusion

As demonstrated in this post, Graviton2 on EMR Serverless applications consistently yielded better performance for Spark workloads. Graviton2 is available in all Regions where EMR Serverless is available. To see a list of Regions where EMR Serverless is available, see the EMR Serverless FAQs. To learn more, visit the Amazon EMR Serverless User Guide and sample codes with Apache Spark and Apache Hive.

If you’re wondering how much performance gain you can achieve with your use case, try out the steps outlined in this post and replace with your queries.

To launch your first Spark or Hive application using a Graviton2-based architecture on EMR Serverless, see Getting started with Amazon EMR Serverless.


About the authors

Karthik Prabhakar is a Senior Big Data Solutions Architect for Amazon EMR at AWS. He is an experienced analytics engineer working with AWS customers to provide best practices and technical advice in order to assist their success in their data journey.

Nithish Kumar Murcherla is a Senior Systems Development Engineer on the Amazon EMR Serverless team. He is passionate about distributed computing, containers, and everything and anything about the data.