<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Apache Spark &#8211; Noise</title>
	<atom:link href="https://noise.getoto.net/tag/apache-spark/feed/" rel="self" type="application/rss+xml" />
	<link>https://noise.getoto.net</link>
	<description>The collective thoughts of the interwebz</description>
	<lastBuildDate>Tue, 02 Sep 2025 20:22:40 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.8.2</generator>
	<item>
		<title>Deploy Apache YuniKorn batch scheduler for Amazon EMR on EKS</title>
		<link>https://noise.getoto.net/2025/09/02/deploy-apache-yunikorn-batch-scheduler-for-amazon-emr-on-eks/</link>
		
		<dc:creator><![CDATA[Suvojit Dasgupta]]></dc:creator>
		<pubDate>Tue, 02 Sep 2025 20:22:40 +0000</pubDate>
				<category><![CDATA[Advanced (300)]]></category>
		<category><![CDATA[Amazon EMR]]></category>
		<category><![CDATA[Amazon EMR on EKS]]></category>
		<category><![CDATA[Analytics]]></category>
		<category><![CDATA[Apache Spark]]></category>
		<category><![CDATA[Technical How-to]]></category>
		<guid isPermaLink="false">http://noise.getoto.net/?guid=497f5421c7c54a816a5ac129af43680b</guid>

					<description><![CDATA[This post explores Kubernetes scheduling fundamentals, examines the limitations of the default kube-scheduler for batch workloads, and demonstrates how YuniKorn addresses these challenges. We discuss how to deploy YuniKorn as a custom scheduler for Amazon EMR on EKS, its integration with job submissions, how to configure queues and placement rules, and how to establish resource quotas. We also show these features in action through practical Spark job examples.]]></description>
		
		

			</item>
		<item>
		<title>Build a centralized observability platform for Apache Spark on Amazon EMR on EKS using external Spark History Server</title>
		<link>https://noise.getoto.net/2025/06/03/build-a-centralized-observability-platform-for-apache-spark-on-amazon-emr-on-eks-using-external-spark-history-server/</link>
		
		<dc:creator><![CDATA[Sri Potluri]]></dc:creator>
		<pubDate>Tue, 03 Jun 2025 16:20:37 +0000</pubDate>
				<category><![CDATA[Advanced (300)]]></category>
		<category><![CDATA[Amazon EMR]]></category>
		<category><![CDATA[Amazon EMR on EKS]]></category>
		<category><![CDATA[Apache Spark]]></category>
		<category><![CDATA[Technical How-to]]></category>
		<guid isPermaLink="false">http://noise.getoto.net/?guid=c31ae3b162b5f425b837208e6ffffb74</guid>

					<description><![CDATA[This post demonstrates how to build a centralized observability platform using SHS for Spark applications running on EMR on EKS. We showcase how to enhance SHS with performance monitoring tools, with a pattern applicable to many monitoring solutions such as SparkMeasure and DataFlint.]]></description>
		
		

			</item>
		<item>
		<title>Use Batch Processing Gateway to automate job management in multi-cluster Amazon EMR on EKS environments</title>
		<link>https://noise.getoto.net/2024/09/13/use-batch-processing-gateway-to-automate-job-management-in-multi-cluster-amazon-emr-on-eks-environments/</link>
		
		<dc:creator><![CDATA[Umair Nawaz]]></dc:creator>
		<pubDate>Fri, 13 Sep 2024 18:51:11 +0000</pubDate>
				<category><![CDATA[Amazon Elastic Kubernetes Service]]></category>
		<category><![CDATA[Amazon EMR]]></category>
		<category><![CDATA[Amazon EMR on EKS]]></category>
		<category><![CDATA[Apache Spark]]></category>
		<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Technical How-to]]></category>
		<guid isPermaLink="false">http://noise.getoto.net/?guid=16777eae0c14766e6c6aa326f1f00aea</guid>

					<description><![CDATA[AWS customers often process petabytes of data using Amazon EMR on EKS. In enterprise environments with diverse workloads or varying operational requirements, customers frequently choose a multi-cluster setup due to the following advantages: Better resiliency and no single point of failure – If one cluster fails, other clusters can continue processing critical workloads, maintaining business […]]]></description>
		
		

			</item>
		<item>
		<title>Detect and handle data skew on AWS Glue</title>
		<link>https://noise.getoto.net/2024/05/01/detect-and-handle-data-skew-on-aws-glue/</link>
		
		<dc:creator><![CDATA[Salim Tutuncu]]></dc:creator>
		<pubDate>Wed, 01 May 2024 16:27:24 +0000</pubDate>
				<category><![CDATA[Analytics]]></category>
		<category><![CDATA[Apache Spark]]></category>
		<category><![CDATA[AWS Analytics]]></category>
		<category><![CDATA[AWS Glue]]></category>
		<category><![CDATA[Best practices]]></category>
		<category><![CDATA[Big Data]]></category>
		<category><![CDATA[Expert (400)]]></category>
		<category><![CDATA[Optimization]]></category>
		<category><![CDATA[Spark]]></category>
		<category><![CDATA[Technical How-to]]></category>
		<guid isPermaLink="false">http://noise.getoto.net/?guid=ac6e0a0c43720a34b439d0d7b7faf802</guid>

					<description><![CDATA[AWS Glue is a fully managed, serverless data integration service provided by Amazon Web Services (AWS) that uses Apache Spark as one of its backend processing engines (as of this writing, you can use Python Shell, Spark, or Ray). Data skew occurs when the data being processed is not evenly distributed across the Spark cluster, […]]]></description>
		
		

			</item>
		<item>
		<title>Introducing Amazon EMR on EKS job submission with Spark Operator and spark-submit</title>
		<link>https://noise.getoto.net/2023/06/06/introducing-amazon-emr-on-eks-job-submission-with-spark-operator-and-spark-submit/</link>
		
		<dc:creator><![CDATA[Lotfi Mouhib]]></dc:creator>
		<pubDate>Tue, 06 Jun 2023 19:29:04 +0000</pubDate>
				<category><![CDATA[#emroneks #ec2spot #costoptimization #sparkec2spot]]></category>
		<category><![CDATA[Advanced (300)]]></category>
		<category><![CDATA[Amazon Elastic Kubernetes Service]]></category>
		<category><![CDATA[Amazon EMR]]></category>
		<category><![CDATA[Amazon EMR on EKS]]></category>
		<category><![CDATA[Analytics]]></category>
		<category><![CDATA[Apache Spark]]></category>
		<guid isPermaLink="false">http://noise.getoto.net/?guid=f8d00e04aaa0768830bd963478a8299f</guid>

					<description><![CDATA[Amazon EMR on EKS provides a deployment option for Amazon EMR that allows organizations to run open-source big data frameworks on Amazon Elastic Kubernetes Service (Amazon EKS). With EMR on EKS, Spark applications run on the Amazon EMR runtime for Apache Spark. This performance-optimized runtime offered by Amazon EMR makes your Spark jobs run fast […]]]></description>
		
		

			</item>
		<item>
		<title>Run Apache Spark workloads 3.5 times faster with Amazon EMR 6.9</title>
		<link>https://noise.getoto.net/2023/01/30/run-apache-spark-workloads-3-5-times-faster-with-amazon-emr-6-9/</link>
		
		<dc:creator><![CDATA[Sekar Srinivasan]]></dc:creator>
		<pubDate>Mon, 30 Jan 2023 18:34:02 +0000</pubDate>
				<category><![CDATA[Amazon EMR]]></category>
		<category><![CDATA[Analytics]]></category>
		<category><![CDATA[Apache Spark]]></category>
		<category><![CDATA[benchmark]]></category>
		<category><![CDATA[Intermediate (200)]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[Technical How-to]]></category>
		<guid isPermaLink="false">http://noise.getoto.net/?guid=abc71ebc3c1715293a6b731d42523b02</guid>

					<description><![CDATA[In this post, we analyze the results from our benchmark tests running a TPC-DS application on open-source Apache Spark and then on Amazon EMR 6.9, which comes with an optimized Spark runtime that is compatible with open-source Spark. We walk through a detailed cost analysis and finally provide step-by-step instructions to run the benchmark. With Amazon EMR 6.9.0, you can now run your Apache Spark 3.x applications faster and at lower cost without requiring any changes to your applications. In our performance benchmark tests, derived from TPC-DS performance tests at 3 TB scale, we found that the EMR runtime for Apache Spark 3.3.0 provides, on average, a 3.5 times performance improvement (measured by total runtime) over open-source Apache Spark 3.3.0.]]></description>
		
		

			</item>
		<item>
		<title>Design considerations for Amazon EMR on EKS in a multi-tenant Amazon EKS environment</title>
		<link>https://noise.getoto.net/2022/09/21/design-considerations-for-amazon-emr-on-eks-in-a-multi-tenant-amazon-eks-environment/</link>
		
		<dc:creator><![CDATA[Lotfi Mouhib]]></dc:creator>
		<pubDate>Wed, 21 Sep 2022 16:03:06 +0000</pubDate>
				<category><![CDATA[Amazon EMR]]></category>
		<category><![CDATA[Amazon EMR on EKS]]></category>
		<category><![CDATA[Analytics]]></category>
		<category><![CDATA[Apache Spark]]></category>
		<category><![CDATA[Best practices]]></category>
		<category><![CDATA[EKS]]></category>
		<category><![CDATA[Kubernetes]]></category>
		<category><![CDATA[Spark]]></category>
		<category><![CDATA[Technical How-to]]></category>
		<guid isPermaLink="false">http://noise.getoto.net/?guid=2d8d7a599ba488e42b682d29ad6bd035</guid>

					<description><![CDATA[Many AWS customers use Amazon Elastic Kubernetes Service (Amazon EKS) in order to take advantage of Kubernetes without the burden of managing the Kubernetes control plane. With Kubernetes, you can centrally manage your workloads and offer administrators a multi-tenant environment where they can create, update, scale, and secure workloads using a single API. Kubernetes also […]]]></description>
		
		

			</item>
	</channel>
</rss>
