Post Syndicated from Anubhav Awasthi original https://aws.amazon.com/blogs/big-data/fine-grained-access-control-in-amazon-emr-serverless-with-aws-lake-formation/
In today’s data-driven world, enterprises are increasingly reliant on vast amounts of data to drive decision-making and innovation. With this reliance comes the critical need for robust data security and access control mechanisms. Fine-grained access control restricts access to specific data subsets, protecting sensitive information and maintaining regulatory compliance. It allows organizations to set detailed permissions at various levels, including database, table, column, and row. This precise control mitigates risks of unauthorized access, data leaks, and misuse. In the unfortunate event of a security incident, fine-grained access control helps limit the scope of the breach, minimizing potential damage.
AWS is introducing general availability of fine-grained access control based on AWS Lake Formation for Amazon EMR Serverless on Amazon EMR 7.2. Enterprises can now significantly enhance their data governance and security frameworks. This new integration supports the implementation of modern data lake architectures, such as data mesh, by providing a seamless way to manage and analyze data. You can use EMR Serverless to enforce data access controls using Lake Formation when reading data from Amazon Simple Storage Service (Amazon S3), enabling robust data processing workflows and real-time analytics without the overhead of cluster management.
In this post, we discuss how to implement fine-grained access control in EMR Serverless using Lake Formation. With this integration, organizations can achieve better scalability, flexibility, and cost-efficiency in their data operations, ultimately driving more value from their data assets.
Key use cases for fine-grained access control in analytics
The following are key use cases for fine-grained access control in analytics:
- Customer 360 – You can enable different departments to securely access specific customer data relevant to their functions. For example, the sales team can be granted access only to data such as customer purchase history, preferences, and transaction patterns. Meanwhile, the marketing team is limited to viewing campaign interactions, customer demographics, and engagement metrics.
- Financial reporting – You can enable financial analysts to access the necessary data for reporting and analysis while restricting sensitive financial details to authorized executives.
- Healthcare analytics – You can enable healthcare researchers and data scientists to analyze de-identified patient data for medical advancements and research, while making sure Protected Health Information (PHI) remains confidential and accessible only to authorized healthcare professionals and personnel.
- Supply chain optimization – You can grant logistics teams visibility into inventory and shipment data while limiting access to pricing or supplier information to relevant stakeholders.
Solution overview
In this post, we explore how to implement fine-grained access control on Iceberg tables within an EMR Serverless application, using the capabilities of Lake Formation. If you’re interested in learning how to implement fine-grained access control on open table formats in Amazon EMR running on Amazon Elastic Compute Cloud (Amazon EC2) instances using Lake Formation, refer to Enforce fine-grained access control on Open Table Formats via Amazon EMR integrated with AWS Lake Formation.
With the data access control features available in Lake Formation, you can enforce granular permissions and govern access to specific columns, rows, or cells within your Iceberg tables. This approach makes sure sensitive data remains secure and accessible only to authorized users or applications, aligning with your organization’s data governance policies and regulatory compliance requirements.
A cross-account modern data platform on AWS involves setting up a centralized data lake in a primary AWS account, while allowing controlled access to this data from secondary AWS accounts. This setup helps organizations maintain a single source of truth for their data, provides consistent data governance, and uses the robust security features of AWS across multiple business units or project teams.
To demonstrate how you can use Lake Formation to implement cross account fine-grained access control within an EMR Serverless environment, we use the TPC-DS dataset to create tables in the AWS Glue Data Catalog in the AWS producer account and provision different user personas to reflect various roles and access levels in the AWS consumer account, forming a secure and governed data lake.
The following diagram illustrates the solution architecture.

The producer account contains the following persona:
- Data engineer – Tasks include data preparation, bulk updates, and incremental updates. The data engineer has the following access:
  - Table-level access – Full read/write access to all TPC-DS tables.
The consumer account contains the following personas:
- Finance analyst – We run a sample query that performs a sales data analysis to guide marketing, inventory, and promotion strategies based on demographic and geographic performance. The finance analyst has the following access:
  - Table-level access – Full access to tables store_sales, catalog_sales, web_sales, item, and promotion for comprehensive financial analysis.
  - Column-level access – Limited access to cost-related columns in the sales tables to avoid exposure to sensitive pricing strategies. Limited access to sensitive columns like credit_rating in the customer_demographics table.
  - Row-level access – Access only to sales data from the current fiscal year or specific promotional periods.
- Product analyst – We run a sample query that does a customer behavior analysis to tailor marketing, promotions, and loyalty programs based on purchase patterns and regional insights. The product analyst has the following access:
  - Table-level access – Full access to the item, store_sales, and customer tables to evaluate product and market trends.
  - Column-level access – Restricted access to personal identifiers in the customer table, such as customer_address, email_address, and date of birth.
Prerequisites
You should have the following prerequisites:
- Access to the producer account and consumer account with adequate permissions to create and deploy AWS CloudFormation stacks, upload files to S3 buckets, accept shared resources in AWS Resource Access Manager (AWS RAM), and other actions taken in this post.
- Access to AWS Identity and Access Management (IAM) roles or users who are a Lake Formation data lake administrator in both the producer and consumer account. For instructions, refer to Create a data lake administrator.
Set up infrastructure in the producer account
We provide a CloudFormation template to deploy the data lake stack with the following resources:
- Two S3 buckets: one for scripts and query results, and one for the data lake storage
- An Amazon Athena workgroup
- An EMR Serverless application
- An AWS Glue database and tables on external public S3 buckets of TPC-DS data
- An AWS Glue database for the data lake
- An IAM role and policies
Set up Lake Formation for the data engineer in the producer account
Set up Lake Formation cross-account data sharing version settings:
- Open the Lake Formation console with the Lake Formation data lake administrator in the producer account.
- Under Data Catalog settings, pick Version 4 under Cross-account version settings.
To learn more about the differences between data sharing versions, refer to Updating cross-account data sharing version settings. Make sure Default permissions for newly created databases and tables is unchecked.
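If you prefer to script this setting, the following is a minimal sketch using boto3, not the method used in this post. The helper names `with_cross_account_version` and `apply_settings` are hypothetical. The sketch assumes that empty default-permission lists correspond to leaving Default permissions for newly created databases and tables unchecked, and it reads the current settings first so existing data lake administrators are preserved.

```python
import copy


def with_cross_account_version(settings: dict, version: str = "4") -> dict:
    """Return a copy of DataLakeSettings with the cross-account version set.

    The default permission lists are cleared, which is assumed here to match
    unchecking the default permissions for new databases and tables.
    """
    updated = copy.deepcopy(settings)
    updated.setdefault("Parameters", {})["CROSS_ACCOUNT_VERSION"] = version
    updated["CreateDatabaseDefaultPermissions"] = []
    updated["CreateTableDefaultPermissions"] = []
    return updated


def apply_settings(region: str) -> None:
    """Read the current settings, update them, and write them back."""
    import boto3  # imported lazily so the helper above has no AWS dependency

    lf = boto3.client("lakeformation", region_name=region)
    current = lf.get_data_lake_settings()["DataLakeSettings"]
    lf.put_data_lake_settings(DataLakeSettings=with_cross_account_version(current))
```

Run `apply_settings` with credentials for the Lake Formation data lake administrator in the producer account.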

Register the Amazon S3 location as the data lake location
When you register an Amazon S3 location with Lake Formation, you specify an IAM role with read/write permissions on that location. After registering, when EMR Serverless requests access to this Amazon S3 location, Lake Formation will supply temporary credentials of the provided role to access the data. We already created the role LakeFormationServiceRole using the CloudFormation template. To register the Amazon S3 location as the data lake location, complete the following steps:
- Open the Lake Formation console with the Lake Formation data lake administrator in the producer account.
- In the navigation pane, choose Data lake locations under Administration.
- Choose Register location.
- For Amazon S3 path, enter s3://<DatalakeBucketName>. (Copy the bucket name from the CloudFormation stack’s Outputs tab.)
- For IAM role, enter LakeFormationServiceRoleDatalake.
- For Permission mode, select Lake Formation.
- Choose Register location.
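The registration can also be sketched with boto3. The helper names below are hypothetical, and the role ARN is built on the assumption that the role has no IAM path. Hybrid access mode is not requested, which we take to match the Lake Formation permission mode selected in the console.

```python
def build_register_request(bucket_name: str, account_id: str,
                           role_name: str = "LakeFormationServiceRoleDatalake") -> dict:
    """Build keyword arguments for lakeformation.register_resource."""
    return {
        "ResourceArn": f"arn:aws:s3:::{bucket_name}",
        # Use the role created by the CloudFormation stack rather than
        # the Lake Formation service-linked role.
        "UseServiceLinkedRole": False,
        "RoleArn": f"arn:aws:iam::{account_id}:role/{role_name}",
    }


def register_location(region: str, bucket_name: str, account_id: str) -> None:
    """Register the data lake bucket with Lake Formation."""
    import boto3  # lazy import keeps the request builder testable offline

    lf = boto3.client("lakeformation", region_name=region)
    lf.register_resource(**build_register_request(bucket_name, account_id))
```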

Generate TPC-DS tables in the producer account
In this section, we generate TPC-DS tables in Iceberg format in the producer account.
Grant database permissions to the data engineer
First, let’s grant database permissions to the data engineer IAM role Amazon-EMR-ExecutionRole_DE that we will use with EMR Serverless. Complete the following steps:
- Open the Lake Formation console with the Lake Formation data lake administrator in the producer account.
- Choose Databases and Create database.
- Enter iceberg_db for Name and s3://<DatalakeBucketName> for Location.
- Choose Create database.
- In the navigation pane, choose Data lake permissions and choose Grant.

- In the Principals section, select IAM users and roles and choose Amazon-EMR-ExecutionRole_DE.
- In the LF-Tags or catalog resources section, select Named Data Catalog resources and choose tpc-source and iceberg_db for Databases.

- Select Super for both Database permissions and Grantable permissions and choose Grant.
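The same grant can be expressed through the Lake Formation API, where the console's Super permission corresponds to `ALL`. This is a sketch with hypothetical helper names. It assumes the role ARN has no IAM path and that both databases live in the caller's own catalog.

```python
def build_database_grant(principal_arn: str, database: str,
                         permissions, grantable=()) -> dict:
    """Build keyword arguments for lakeformation.grant_permissions."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {"Database": {"Name": database}},
        "Permissions": list(permissions),
        "PermissionsWithGrantOption": list(grantable),
    }


def grant_data_engineer(region: str, account_id: str) -> None:
    """Grant Super (ALL) on both databases to the data engineer role."""
    import boto3  # lazy import keeps the request builder testable offline

    lf = boto3.client("lakeformation", region_name=region)
    role_arn = f"arn:aws:iam::{account_id}:role/Amazon-EMR-ExecutionRole_DE"
    for database in ("tpc-source", "iceberg_db"):
        lf.grant_permissions(
            **build_database_grant(role_arn, database, ["ALL"], ["ALL"])
        )
```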

Create an EMR Serverless application
Now, let’s log in to EMR Serverless using Amazon EMR Studio and complete the following steps:
- On the Amazon EMR console, choose EMR Serverless.
- Under Manage applications, choose my-emr-studio. You will be directed to the Create application page on EMR Studio. Let’s create a Lake Formation enabled EMR Serverless application.
- Under Application settings, provide the following information:
  - For Name, enter emr-fgac-application.
  - For Type, choose Spark.
  - For Release version, choose emr-7.2.0.
  - For Architecture, choose x86_64.
- Under Application setup options, select Use custom settings.
- Under Interactive endpoint, select Enable endpoint for EMR Studio.
- Under Additional configurations, for Metastore configuration, select Use AWS Glue Data Catalog as metastore, then select Use Lake Formation for fine-grained access control.
- Under Network connections, choose emrs-vpc for the VPC, enter any two private subnets, and enter emr-serverless-sg for Security groups.

- Choose Create and start application.
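For reference, the console choices above can be approximated with the EMR Serverless API. The following is a sketch under assumptions: the helper names are hypothetical, the subnet and security group IDs are placeholders you supply, and Lake Formation is enabled through the `spark.emr-serverless.lakeformation.enabled` Spark property in the application's runtime configuration.

```python
def build_application_request(name: str, subnet_ids, security_group_ids) -> dict:
    """Build keyword arguments for emr-serverless create_application."""
    return {
        "name": name,
        "releaseLabel": "emr-7.2.0",
        "type": "SPARK",
        "architecture": "X86_64",
        # Interactive endpoint so EMR Studio Workspaces can attach.
        "interactiveConfiguration": {"studioEnabled": True},
        "networkConfiguration": {
            "subnetIds": list(subnet_ids),
            "securityGroupIds": list(security_group_ids),
        },
        # Application-level switch for Lake Formation fine-grained access control.
        "runtimeConfiguration": [
            {
                "classification": "spark-defaults",
                "properties": {
                    "spark.emr-serverless.lakeformation.enabled": "true"
                },
            }
        ],
    }


def create_and_start(region: str, subnet_ids, security_group_ids) -> str:
    """Create the application, start it, and return its ID."""
    import boto3  # lazy import keeps the request builder testable offline

    emr = boto3.client("emr-serverless", region_name=region)
    request = build_application_request(
        "emr-fgac-application", subnet_ids, security_group_ids
    )
    application_id = emr.create_application(**request)["applicationId"]
    emr.start_application(applicationId=application_id)
    return application_id
```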
Create a Workspace
Complete the following steps to create an EMR Workspace:
- On the Amazon EMR console, choose Workspaces in the navigation pane and choose Create Workspace.
- Enter the Workspace name emr-fgac-workspace.
- Leave all other settings as default and choose Create Workspace.
- Choose Launch Workspace. Your browser might request to allow pop-up permissions the first time you launch the Workspace.
- After the Workspace is launched, in the navigation pane, choose Compute.
- For Compute type, select EMR Serverless application and enter emr-fgac-application for the application and Amazon-EMR-ExecutionRole_DE as the runtime role.
- Make sure the kernel attached to the Workspace is PySpark.

- Navigate to the File browser section and choose Upload files.
- Upload the file Iceberg-ingest-final_v2.ipynb.
- Update the data lake bucket name, AWS account ID, and AWS Region accordingly.
- Choose the double arrow icon to restart the kernel and rerun the notebook.

To verify that the data is generated, you can go to the AWS Glue console. Under Data Catalog, Databases, you should see TPC-DS tables ending with _iceberg for the database iceberg_db.
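The same check can be scripted. This sketch lists the tables in iceberg_db through the AWS Glue API and keeps the names ending in _iceberg. The helper names are hypothetical.

```python
def iceberg_table_names(pages) -> list:
    """Collect table names ending in _iceberg from get_tables result pages."""
    names = []
    for page in pages:
        for table in page.get("TableList", []):
            if table["Name"].endswith("_iceberg"):
                names.append(table["Name"])
    return sorted(names)


def list_iceberg_tables(region: str, database: str = "iceberg_db") -> list:
    """Page through the Glue Data Catalog and return the Iceberg table names."""
    import boto3  # lazy import keeps the filter above testable offline

    glue = boto3.client("glue", region_name=region)
    pages = glue.get_paginator("get_tables").paginate(DatabaseName=database)
    return iceberg_table_names(pages)
```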
Share the database and TPC-DS tables to the consumer account
We now grant permissions to the consumer account, including grantable permissions. This allows the Lake Formation data lake administrator in the consumer account to control access to the data within the account.
Grant database permissions to the consumer account
Complete the following steps:
- Open the Lake Formation console with the Lake Formation data lake administrator in the producer account.
- In the navigation pane, choose Databases.
- Select the database iceberg_db, and on the Actions menu, under Permissions, choose Grant.
- In the Principals section, select External accounts and enter the consumer account ID.
- In the LF-Tags or catalog resources section, select Named Data Catalog resources and choose iceberg_db for Databases.
- In the Database permissions section, select Describe for both Database permissions and Grantable permissions.
This allows the data lake administrator in the consumer account to describe the database and grant describe permissions to other principals in the consumer account.

Grant table permissions to the consumer account
Repeat the preceding steps to grant table permissions to the consumer account.

Choose All tables under Tables and provide Select and Describe permissions for both Table permissions and Grantable permissions.
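As a sketch of the two cross-account grants through the API: the principal is the consumer account ID, and the table grant uses a table wildcard to cover all tables in the database. The helper names are hypothetical, and the account IDs are placeholders.

```python
def build_cross_account_grants(producer_id: str, consumer_id: str,
                               database: str = "iceberg_db") -> list:
    """Build grant_permissions requests that share a database and all its tables."""
    principal = {"DataLakePrincipalIdentifier": consumer_id}
    database_grant = {
        "Principal": principal,
        "Resource": {"Database": {"CatalogId": producer_id, "Name": database}},
        "Permissions": ["DESCRIBE"],
        "PermissionsWithGrantOption": ["DESCRIBE"],
    }
    table_grant = {
        "Principal": principal,
        "Resource": {
            "Table": {
                "CatalogId": producer_id,
                "DatabaseName": database,
                "TableWildcard": {},  # all tables in the database
            }
        },
        "Permissions": ["SELECT", "DESCRIBE"],
        "PermissionsWithGrantOption": ["SELECT", "DESCRIBE"],
    }
    return [database_grant, table_grant]


def share_with_consumer(region: str, producer_id: str, consumer_id: str) -> None:
    """Issue both grants from the producer account."""
    import boto3  # lazy import keeps the request builder testable offline

    lf = boto3.client("lakeformation", region_name=region)
    for request in build_cross_account_grants(producer_id, consumer_id):
        lf.grant_permissions(**request)
```

The grantable permissions are what let the consumer account's data lake administrator pass access on to its own principals.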

Set up Lake Formation in the consumer account
For the remaining section of the post, we focus on the consumer account. Deploy the following CloudFormation stack to set up resources:
The template will create the Amazon EMR runtime role for both analyst user personas.
Log in to the AWS consumer account and accept the AWS RAM invitation first:
- Open the AWS RAM console with the IAM identity that has AWS RAM access.
- In the navigation pane, choose Resource shares under Shared with me.
- You should see two pending resource shares from the producer account.
- Accept both invitations.
You should be able to see the iceberg_db database on the Lake Formation console.
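Accepting the invitations can also be scripted with the AWS RAM API. This sketch uses hypothetical helper names and reads only the first page of invitations, which is assumed to be enough for the two shares in this walkthrough.

```python
def pending_invitation_arns(invitations, sender_account_id: str) -> list:
    """Return ARNs of pending invitations sent by the given account."""
    return [
        invitation["resourceShareInvitationArn"]
        for invitation in invitations
        if invitation.get("status") == "PENDING"
        and invitation.get("senderAccountId") == sender_account_id
    ]


def accept_pending(region: str, producer_account_id: str) -> None:
    """Accept every pending resource share invitation from the producer."""
    import boto3  # lazy import keeps the filter above testable offline

    ram = boto3.client("ram", region_name=region)
    response = ram.get_resource_share_invitations()
    invitations = response.get("resourceShareInvitations", [])
    for arn in pending_invitation_arns(invitations, producer_account_id):
        ram.accept_resource_share_invitation(resourceShareInvitationArn=arn)
```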
Create a resource link for the shared database
To access the database and table resources that were shared by the producer AWS account, you need to create a resource link in the consumer AWS account. A resource link is a Data Catalog object that is a link to a local or shared database or table. After you create a resource link to a database or table, you can use the resource link name wherever you would use the database or table name. In this step, you grant permission on the resource links to the job runtime roles for EMR Serverless. The runtime roles will then access the data in shared databases and underlying tables through the resource link.
To create a resource link, complete the following steps:
- Open the Lake Formation console with the Lake Formation data lake administrator in the consumer account.
- In the navigation pane, choose Databases.
- Select the iceberg_db database, verify that the owner account ID is the producer account, and on the Actions menu, choose Create resource links.
- For Resource link name, enter the name of the resource link (iceberg_db_shared).
- For Shared database’s region, choose the Region of the iceberg_db database.
- For Shared database, choose the iceberg_db database.
- For Shared database’s owner ID, enter the account ID of the producer account.
- Choose Create.
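In the AWS Glue API, a database resource link is a database whose input carries a `TargetDatabase` pointing at the shared database. The following sketch uses hypothetical helper names, and the producer account ID is a placeholder.

```python
def build_resource_link_input(link_name: str, producer_account_id: str,
                              target_database: str, target_region: str = None) -> dict:
    """Build the DatabaseInput for glue.create_database as a resource link."""
    target = {
        "CatalogId": producer_account_id,
        "DatabaseName": target_database,
    }
    if target_region:
        # Only needed when the shared database is in another Region.
        target["Region"] = target_region
    return {"Name": link_name, "TargetDatabase": target}


def create_resource_link(region: str, producer_account_id: str) -> None:
    """Create iceberg_db_shared in the consumer account's Data Catalog."""
    import boto3  # lazy import keeps the request builder testable offline

    glue = boto3.client("glue", region_name=region)
    glue.create_database(
        DatabaseInput=build_resource_link_input(
            "iceberg_db_shared", producer_account_id, "iceberg_db"
        )
    )
```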

Grant permissions on the resource link to the EMR job runtime roles
Grant permissions on the resource link to Amazon-EMR-ExecutionRole_Finance and Amazon-EMR-ExecutionRole_Product using the following steps:
- Open the Lake Formation console with the Lake Formation data lake administrator in the consumer account.
- In the navigation pane, choose Databases.
- Select the resource link (iceberg_db_shared) and on the Actions menu, choose Grant.
- In the Principals section, select IAM users and roles, and choose Amazon-EMR-ExecutionRole_Finance and Amazon-EMR-ExecutionRole_Product.
- In the LF-Tags or catalog resources section, select Named Data Catalog resources and for Databases, choose iceberg_db_shared.
- In the Resource link permissions section, select Describe for Resource link permissions.
This allows the EMR Serverless job runtime roles to describe the resource link. We don’t make any selections for grantable permissions because runtime roles shouldn’t be able to grant permissions to other principals.
- Choose Grant.
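A sketch of the same grant through the API follows. It gives each runtime role Describe on the resource link and omits the grant option on purpose. The helper names are hypothetical, and the role ARNs assume no IAM path.

```python
RUNTIME_ROLES = (
    "Amazon-EMR-ExecutionRole_Finance",
    "Amazon-EMR-ExecutionRole_Product",
)


def build_resource_link_grants(consumer_account_id: str,
                               link_name: str = "iceberg_db_shared") -> list:
    """Build one DESCRIBE grant per runtime role, with no grant option."""
    return [
        {
            "Principal": {
                "DataLakePrincipalIdentifier":
                    f"arn:aws:iam::{consumer_account_id}:role/{role}"
            },
            "Resource": {"Database": {"Name": link_name}},
            "Permissions": ["DESCRIBE"],
        }
        for role in RUNTIME_ROLES
    ]


def grant_on_resource_link(region: str, consumer_account_id: str) -> None:
    """Apply the grants from the consumer account."""
    import boto3  # lazy import keeps the request builder testable offline

    lf = boto3.client("lakeformation", region_name=region)
    for request in build_resource_link_grants(consumer_account_id):
        lf.grant_permissions(**request)
```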

Grant table permissions for the finance analyst
Complete the following steps:
- Open the Lake Formation console with the Lake Formation data lake administrator in the consumer account.
- In the navigation pane, choose Databases.
- Select the resource link (iceberg_db_shared) and on the Actions menu, choose Grant on target.
- In the Principals section, select IAM users and roles, then choose Amazon-EMR-ExecutionRole_Finance.
- In the LF-Tags or catalog resources section, select Named Data Catalog resources and specify the following:
  - For Databases, choose iceberg_db.
  - For Tables, choose store_sales_iceberg.
- In the Table permissions section, for Table permissions, select Select.
- In the Data permissions section, select Column-based access.
- Select Exclude columns and choose all cost-related columns (ss_wholesale_cost and ss_ext_wholesale_cost).
- Choose Grant.

- Similarly, grant access to table customer_demographics_iceberg and exclude the column cd_credit_rating.
- Following the same steps, grant All data access for tables store_iceberg and item_iceberg.
- For the table date_dim_iceberg, we provide selective row-level access.
- Similar to the preceding table permissions, select date_dim_iceberg under Tables and in the Data filters section, choose Create new.

- For Data filter name, enter FA_Filter_year.
- Select Access to all columns under Column-level access.
- Select Filter rows and for Row filter expression, enter d_year=2002 to provide access only to rows for the year 2002.
- Choose Save changes.

- Choose Create filter.
- Make sure FA_Filter_year is selected under Data filters and grant Select permissions on the filter.
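The column exclusion and the row filter above map onto two Lake Formation API shapes: a `TableWithColumns` resource with excluded column names, and a data cells filter with a row filter expression. The sketch below uses hypothetical helper names. It assumes the table's catalog ID is the producer account ID that owns iceberg_db, which you should confirm for your setup.

```python
def build_column_exclusion_grant(role_arn: str, catalog_id: str, table: str,
                                 excluded_columns, database: str = "iceberg_db") -> dict:
    """Grant SELECT on every column except the excluded ones."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": role_arn},
        "Resource": {
            "TableWithColumns": {
                "CatalogId": catalog_id,
                "DatabaseName": database,
                "Name": table,
                "ColumnWildcard": {"ExcludedColumnNames": list(excluded_columns)},
            }
        },
        "Permissions": ["SELECT"],
    }


def build_row_filter(catalog_id: str, table: str, filter_name: str,
                     expression: str, database: str = "iceberg_db") -> dict:
    """Build TableData for create_data_cells_filter: all columns, filtered rows."""
    return {
        "TableCatalogId": catalog_id,
        "DatabaseName": database,
        "TableName": table,
        "Name": filter_name,
        "RowFilter": {"FilterExpression": expression},
        "ColumnWildcard": {},  # access to all columns
    }


def build_filter_grant(role_arn: str, table_data: dict) -> dict:
    """Grant SELECT on an existing data cells filter."""
    keys = ("TableCatalogId", "DatabaseName", "TableName", "Name")
    return {
        "Principal": {"DataLakePrincipalIdentifier": role_arn},
        "Resource": {"DataCellsFilter": {key: table_data[key] for key in keys}},
        "Permissions": ["SELECT"],
    }


def grant_finance_analyst(region: str, role_arn: str, catalog_id: str) -> None:
    """Apply the store_sales column grant and the date_dim row filter."""
    import boto3  # lazy import keeps the request builders testable offline

    lf = boto3.client("lakeformation", region_name=region)
    lf.grant_permissions(**build_column_exclusion_grant(
        role_arn, catalog_id, "store_sales_iceberg",
        ["ss_wholesale_cost", "ss_ext_wholesale_cost"],
    ))
    table_data = build_row_filter(
        catalog_id, "date_dim_iceberg", "FA_Filter_year", "d_year=2002"
    )
    lf.create_data_cells_filter(TableData=table_data)
    lf.grant_permissions(**build_filter_grant(role_arn, table_data))
```

The remaining finance analyst tables follow the same pattern with different table names and exclusions.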
Grant table permissions for the product analyst
You can provide permissions for the next set of tables required for the product analyst role using the Lake Formation console. Alternatively, you can use the AWS Command Line Interface (AWS CLI) to grant permissions. We provide grant on target permissions for the resource link iceberg_db_shared to IAM role Amazon-EMR-ExecutionRole_Product.
- Similar to the steps followed in previous sections, for tables store_sales_iceberg, date_dim_iceberg, store_iceberg, and house_hold_demographics_iceberg, provide Select permissions for All data access. Make sure the role selected is Amazon-EMR-ExecutionRole_Product.
For table customer_iceberg, we limit access to personally identifiable information (PII) columns.
- Under Data permissions, select Column-based access and Exclude columns.
- Choose columns c_birth_day, c_birth_month, c_birth_year, c_current_addr_sk, c_customer_id, c_email_address, and c_birth_country.
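As a programmatic alternative to the console or the AWS CLI, the product analyst grants can be sent in one call with the Lake Formation `batch_grant_permissions` API. This is a sketch with hypothetical helper names and entry IDs. As before, it assumes the catalog ID is the producer account ID that owns the tables.

```python
FULL_ACCESS_TABLES = (
    "store_sales_iceberg",
    "date_dim_iceberg",
    "store_iceberg",
    "house_hold_demographics_iceberg",
)

PII_COLUMNS = (
    "c_birth_day", "c_birth_month", "c_birth_year", "c_current_addr_sk",
    "c_customer_id", "c_email_address", "c_birth_country",
)


def build_product_analyst_entries(role_arn: str, catalog_id: str,
                                  database: str = "iceberg_db") -> list:
    """Build batch entries: full SELECT on four tables, PII-free SELECT on customer."""
    principal = {"DataLakePrincipalIdentifier": role_arn}
    entries = [
        {
            "Id": f"full-{table}",
            "Principal": principal,
            "Resource": {
                "Table": {
                    "CatalogId": catalog_id,
                    "DatabaseName": database,
                    "Name": table,
                }
            },
            "Permissions": ["SELECT"],
        }
        for table in FULL_ACCESS_TABLES
    ]
    entries.append({
        "Id": "customer-without-pii",
        "Principal": principal,
        "Resource": {
            "TableWithColumns": {
                "CatalogId": catalog_id,
                "DatabaseName": database,
                "Name": "customer_iceberg",
                "ColumnWildcard": {"ExcludedColumnNames": list(PII_COLUMNS)},
            }
        },
        "Permissions": ["SELECT"],
    })
    return entries


def grant_product_analyst(region: str, role_arn: str, catalog_id: str) -> list:
    """Send the batch and return any per-entry failures reported by the API."""
    import boto3  # lazy import keeps the request builder testable offline

    lf = boto3.client("lakeformation", region_name=region)
    response = lf.batch_grant_permissions(
        Entries=build_product_analyst_entries(role_arn, catalog_id)
    )
    return response.get("Failures", [])
```

Check the returned failures list, because a batch call can succeed overall while individual entries are rejected.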
Verify access using interactive notebooks from EMR Studio
Complete the following steps to test role access:
- Log in to the AWS consumer account and open the Amazon EMR console.
- Choose EMR Serverless in the navigation pane and choose an existing EMR Studio.
- If you don’t have EMR Studio configured, choose Get Started and select Create and launch EMR Studio.
- Create a Lake Formation enabled EMR Serverless application as described in previous sections.
- Create an EMR Studio Workspace as described in previous sections.
- Use emr-studio-service-role for Service role and datalake-resources-<account_id>-<region> for Workspace storage, then launch your Workspace.
Now, let’s verify access for the finance analyst.
- Make sure the compute type inside your Workspace points to the EMR Serverless application created in the prior step, with Amazon-EMR-ExecutionRole_Finance as the interactive runtime role.
- Go to File browser in the navigation pane, choose Upload files, and add Notebook_FA.ipynb to your Workspace.
- Run all the cells to verify fine-grained access.

Now let’s test access for the product analyst.
- In the same Workspace, detach the EMR Serverless application and reattach it with Amazon-EMR-ExecutionRole_Product as the interactive runtime role.
- Upload Notebook_PA.ipynb under the File browser section.
- Run all the cells to verify fine-grained access for the product analyst.

In a real-world scenario, both analysts will likely have their own Workspace with restricted rights to assume only the authorized interactive runtime role.
Considerations and limitations
EMR Serverless with Lake Formation uses Spark resource profiles to create two profiles and two Spark drivers for access control. Read this white paper to learn about the feature details. The user profile runs the supplied code, and the system profile enforces Lake Formation policies. Therefore, it’s recommended that you have a minimum of two Spark drivers when pre-initialized capacity is used with Lake Formation enabled jobs. No change in executor count is required. Refer to Using EMR Serverless with AWS Lake Formation for fine-grained access control to learn more about the technical implementation of the Lake Formation integration with EMR Serverless.

You can expect a performance overhead after enabling Lake Formation. The level of access (table, column, or row) and the amount of data filtered will have significant impact on query performance.
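To illustrate the two-driver recommendation for pre-initialized capacity, the following sketch builds an `initialCapacity` map for the EMR Serverless API. The helper names, the worker sizes, and the executor count are assumptions for illustration only, and the `DRIVER` and `EXECUTOR` keys follow the form commonly shown in AWS CLI examples. The application is assumed to be stopped before it is updated.

```python
def build_initial_capacity(driver_count: int = 2, executor_count: int = 4) -> dict:
    """Build an initialCapacity map with at least two Spark drivers.

    Two drivers reflect the recommendation for Lake Formation enabled jobs:
    one for the user profile and one for the system profile.
    """
    if driver_count < 2:
        raise ValueError("use at least two drivers with Lake Formation enabled")
    worker = {"cpu": "4 vCPU", "memory": "16 GB"}  # illustrative sizes
    return {
        "DRIVER": {
            "workerCount": driver_count,
            "workerConfiguration": dict(worker),
        },
        "EXECUTOR": {
            "workerCount": executor_count,
            "workerConfiguration": dict(worker),
        },
    }


def set_initial_capacity(region: str, application_id: str) -> None:
    """Apply the pre-initialized capacity to a stopped application."""
    import boto3  # lazy import keeps the builder above testable offline

    emr = boto3.client("emr-serverless", region_name=region)
    emr.update_application(
        applicationId=application_id,
        initialCapacity=build_initial_capacity(),
    )
```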
Clean up
To avoid incurring ongoing costs, complete the following steps to clean up your resources:
- In your secondary (consumer) account, log in to the Lake Formation console.
- Drop the resource share table.
- In your primary (producer) account, log in to the Lake Formation console.
- Revoke the access you configured.
- Drop the AWS Glue tables and database.
- Delete the AWS Glue job.
- Delete the S3 buckets and any other resources that you created as part of the prerequisites for this post.
Conclusion
In this post, we showed how to integrate Lake Formation with EMR Serverless to manage access to Iceberg tables. This solution showcases a modern way to enforce fine-grained access control in a multi-account open data lake setup. The approach simplifies data management in the main account while carefully controlling how users access data in other secondary accounts.
Try out the solution for your own use case, and let us know your feedback and questions in the comments section.
About the Authors
Anubhav Awasthi is a Sr. Big Data Specialist Solutions Architect at AWS. He works with customers to provide architectural guidance for running analytics solutions on Amazon EMR, Amazon Athena, AWS Glue, and AWS Lake Formation.
Nishchai JM is an Analytics Specialist Solutions Architect at Amazon Web Services. He specializes in building big data applications and helps customers modernize their applications in the cloud. He thinks data is the new oil and spends most of his time deriving insights from data.














Dhrubajyoti Mukherjee is a Cloud Infrastructure Architect with a strong focus on data strategy, data analytics, and data governance at Amazon Web Services (AWS). He uses his deep expertise to provide guidance to global enterprise customers across industries, helping them build scalable and secure AWS solutions that drive meaningful business outcomes. Dhrubajyoti is passionate about creating innovative, customer-centric solutions that enable digital transformation, business agility, and performance improvement. An active contributor to the AWS community, Dhrubajyoti authors AWS Prescriptive Guidance publications, blog posts, and open-source artifacts, sharing his insights and best practices with the broader community. Outside of work, Dhrubajyoti enjoys spending quality time with his family and exploring nature through his love of hiking mountains.
Ravi Kumar is a Data Architect and Analytics expert at Amazon Web Services; he finds immense fulfillment in working with data. His days are dedicated to designing and analyzing complex data systems, uncovering valuable insights that drive business decisions. Outside of work, he unwinds by listening to music and watching movies, activities that allow him to recharge after a long day of data wrangling.
Martin Mikoleizig studied mechanical engineering and production technology at the RWTH Aachen University before starting to work in Dr. h.c. Ing. F. Porsche AG 2015 as a production planner for the engine assembly. In several years as a Project Manager on Testing Technology for new engine models he also introduced several innovations like human-machine-collaborations and intelligent assistance systems. From 2017, he was responsible for the Shopfloor IT team of the module lines in Zuffenhausen before he became responsible for the Planning of the E-Drive assembly at Porsche. Beside this he was responsible for the Digitalisation Strategy of the Production Ressort at Porsche. Since October 2022, he has been assigned to Volkswagen Autoeuropa in Portugal in the role of a Digital Transformation Manager for the plant driving the Digital Transformation towards a Data Driven Factory.
Weizhou Sun is a Lead Architect at Amazon Web Services, specializing in digital manufacturing solutions and IoT. With extensive experience in Europe, she has enhanced operational efficiencies, reducing latency and increasing throughput. Weizhou’s expertise includes Industrial Computer Vision, predictive maintenance, and predictive quality, consistently delivering top performance and client satisfaction. A recognized thought leader in IoT and remote driving, she has contributed to business growth through innovations and open-source work. Committed to knowledge sharing, Weizhou mentors colleagues and contributes to practice development. Known for her problem-solving skills and customer focus, she delivers solutions that exceed expectations. In her free time, Weizhou explores new technologies and fosters a collaborative culture.
Shameka Almond is an Advisory Consultant at Amazon Web Services. She works closely with enterprise customers to help them better understand the business impact and value of implementing data solutions, including data governance best practices. Shameka has over a decade of wide-ranging IT experience in the manufacturing and aerospace industries, and the nonprofit sector. She has supported several data governance initiatives, helping both public and private organizations identify opportunities for improvement and increased efficiency. Outside of the office she enjoys hosting large family gatherings, and supporting community outreach events dedicated to introducing students in K-12 to STEM.
Adjoa Taylor has over 20 years of experience in industrial manufacturing, providing industry and technology consulting services, digital transformation, and solution delivery. Currently Adjoa leads Product Centric Digital Transformation, enabling customers to solve complex manufacturing problems by leveraging Smart Factory and Industry leading transformation mechanisms. Most recently driving value with AI/ML and generative AI use-cases for the plant floor. Adjoa is an experienced leader spending over 20 years of her career delivering projects in countries throughout North America, Latin America, Europe, and Asia. Through prior roles, Adjoa brings deep experience across multiple business segments with a focus on business outcome driven solutions. Adjoa is passionate about helping customers solve problems while realizing the art of the possible via the right impacting value-based solution.



















Ramesh H Singh is a Senior Product Manager Technical (External Services) at AWS in Seattle, Washington, currently with the Amazon DataZone team. He is passionate about building high-performance ML/AI and analytics products that enable enterprise customers to achieve their critical goals using cutting-edge technology. Connect with him on
Eric Fleishman is a software engineer at AWS in Seattle. He loves diving into cloud technology and solving complex problems to build impactful solutions. Outside of work, he is all about staying active—whether its snowboarding down the slopes or working out. He enjoys pushing his limits and embracing new challenges.
Theo Tolv is a Senior Analytics Architect based in Stockholm, Sweden. He’s worked with small and big data for most of his career, and has built applications running on AWS since 2008. In his spare time he likes to tinker with electronics and read space opera.
Joel Farvault is Principal Specialist SA Analytics for AWS with 25 years’ experience working on enterprise architecture, data governance and analytics, mainly in the financial services industry. Joel has led data transformation projects on fraud analytics, claims automation, and Master Data Management. He leverages his experience to advise customers on their data strategy and technology foundations.
Lakshmi Nair is a Senior Analytics Specialist Solutions Architect at AWS. She specializes in designing advanced analytics systems across industries. She focuses on crafting cloud-based data platforms, enabling real-time streaming, big data processing, and robust data governance.
Fabricio Hamada is a Senior Data Strategy Solutions Architect at AWS.
Lionel Pulickal is Sr. Solutions Architect at AWS











Shovan Kanjilal is a Senior Analytics and Machine Learning Architect with Amazon Web Services. He is passionate about helping customers build scalable, secure, and high-performance data solutions in the cloud.
Manoj Shunmugam is a DevOps Consultant in Professional Services at Amazon Web Services. He works with customers to establish infrastructures using cloud-centered and/or container-based platforms in the AWS Cloud.
Noritaka Sekiyama is a Principal Big Data Architect on the AWS Glue team. He is responsible for building software artifacts to help customers. In his spare time, he enjoys cycling on his road bike.
Gal Heyne is a Product Manager for AWS Glue with a strong focus on AI/ML, data engineering, and BI. She is passionate about developing a deep understanding of customers’ business needs and collaborating with engineers to design easy-to-use data products.
Roger Kim is a Software Development Engineer on the Amazon Redshift team focusing on query performance and optimization. He holds a BA in Computer Science and Mathematics from Cornell University.
Mohammed Alkateb is an Engineering Manager at Amazon Redshift. Prior to joining Amazon, Mohammed had 12 years of industry experience in query optimization and database internals as an Individual Contributor and Engineering Manager. Mohammed has 18 US patents, and he has publications in research and industrial tracks of premier database conferences including EDBT, ICDE, SIGMOD and VLDB. Mohammed holds a PhD in Computer Science from The University of Vermont, and MSc and BSc degrees in Information Systems from Cairo University.
Mengchu Cai is a principal engineer on the Amazon Redshift team. Mengchu currently works on query optimization and data lake query performance. He also led the development of SQL language features. Mengchu received his PhD in Computer Science and Engineering from the University of Nebraska Lincoln.
Ravi Animi is a Senior Product Leader on the Amazon Redshift team and manages several functional areas of Amazon Redshift analytics, data, and AI, including spatial analytics, streaming analytics, query performance, Spark integration, and analytics business strategy. He has experience with relational databases, multi-dimensional databases, IoT technologies, storage and compute infrastructure services, and more recently, as a startup founder in the areas of AI and deep learning. Ravi holds dual Bachelors degrees in Physics and Electrical Engineering from Washington University, St. Louis, a Masters in Engineering from Stanford, and an MBA from Chicago Booth.





Debaprasun Chakraborty is an AWS Solutions Architect, specializing in the analytics domain. He has around 20 years of software development and architecture experience. He is passionate about helping customers in cloud adoption, migration and strategy.
Roman Martynenko is a Senior Solutions Architect at Amazon Web Services with over 20 years of experience in software engineering, architecture, and cloud technologies. Roman helps Canadian public sector customers with their cloud journey. He focuses on next-generation developer experience, helping organizations re-imagine the entire software development lifecycle. Outside of work, he enjoys hiking, home automation, and DIY projects.
Karthik Chemudupati is a Principal Technical Account Manager (TAM) with AWS, focused on helping customers achieve cost optimization and operational excellence. He has more than 20 years of IT experience in software engineering, cloud operations, and automation. Karthik joined AWS in 2016 as a TAM and has worked with more than a dozen enterprise customers across US-West. Outside of work, he enjoys spending time with his family.
Shardul Vaidya is a Worldwide Partner Solutions Architect with AWS, focused on helping partners and customers build and effectively use generative AI-powered developer experiences. Shardul joined AWS in 2020 as part of its early career talent Solutions Architect team and has worked with over a hundred modernization and DevOps partners across the world. Outside of work, he’s a music lover and collects records.
Naidu Rongali is a Big Data and ML engineer at Amazon. He designs and develops data processing solutions for data-intensive analytical systems supporting Amazon’s retail business. He has been working on integrating generative AI capabilities into the data lake and data warehouse systems using Amazon Bedrock AI models. Naidu has a PG diploma in Applied Statistics from the Indian Statistical Institute, Calcutta, and a BTech in Electrical and Electronics from NIT, Warangal. Outside of work, Naidu practices yoga and often goes trekking.
Miranda Diaz is a Software Development Engineer for EMR at AWS. Miranda works to design and develop technologies that make it easy for customers across the world to automatically scale their computing resources to their needs, helping them achieve the best performance at the optimal cost.
Sajjan Bhattarai is a Senior Cloud Support Engineer at AWS who specializes in big data and machine learning workloads. He enjoys helping customers around the world troubleshoot and optimize their data platforms.
Bezuayehu Wate is an Associate Big Data Specialist Solutions Architect at AWS. She works with customers to provide strategic and architectural guidance on designing, building, and modernizing their cloud-based analytics solutions using AWS.
Ramchandra Anil Kulkarni is a software development engineer who has been with Amazon Redshift for over 4 years. He is driven to develop database innovations that serve AWS customers globally. Kulkarni’s long-standing tenure and dedication to the Amazon Redshift service demonstrate his deep expertise and commitment to delivering cutting-edge database solutions that empower AWS customers worldwide.
Mark Lyons is a Principal Product Manager on the Amazon Redshift team. He works at the intersection of data lakes and data warehouses. Prior to joining AWS, Mark held product leadership roles with Dremio and Vertica. He is passionate about data analytics and empowering customers to change the world with their data.
Asser Moustafa is a Principal Worldwide Specialist Solutions Architect at AWS, based in Dallas, Texas. He partners with customers worldwide, advising them on all aspects of their data architectures, migrations, and strategic data visions to help organizations adopt cloud-based solutions, maximize the value of their data assets, modernize legacy infrastructures, and implement cutting-edge capabilities like machine learning and advanced analytics. Prior to joining AWS, Asser held various data and analytics leadership roles. He holds an MBA from New York University and an MS in Computer Science from Columbia University in New York. He is passionate about empowering organizations to become truly data-driven and unlock the transformative potential of their data.
Satish Nandi is a Senior Product Manager with Amazon OpenSearch Service. He is focused on OpenSearch Serverless and has years of experience in networking, security and AI/ML. He holds a Bachelor’s degree in Computer Science and an MBA in Entrepreneurship. In his free time, he likes to fly airplanes and hang gliders and ride his motorcycle.
Milav Shah is an Engineering Leader with Amazon OpenSearch Service. He focuses on search experience for OpenSearch customers. He has extensive experience building highly scalable solutions in databases, real-time streaming, and distributed computing. He also has functional domain expertise in verticals like Internet of Things, fraud protection, gaming, and AI/ML. In his free time, he likes to cycle, hike, and play chess.
Qiaoxuan Xue is a Senior Software Engineer at AWS leading the search and benchmarking areas of the Amazon OpenSearch Serverless Project. His passion lies in finding solutions for intricate challenges within large-scale distributed systems. Outside of work, he enjoys woodworking, biking, playing basketball, and spending time with his family and dog.
Prashant Agrawal is a Sr. Search Specialist Solutions Architect with Amazon OpenSearch Service. He works closely with customers to help them migrate their workloads to the cloud and helps existing customers fine-tune their clusters to achieve better performance and save on cost. Before joining AWS, he helped various customers use OpenSearch and Elasticsearch for their search and log analytics use cases. When not working, you can find him traveling and exploring new places. In short, he likes doing Eat → Travel → Repeat.
Steven Carpenter is a Senior Solution Developer on the AWS Industries Prototyping and Customer Engineering (PACE) team, helping AWS customers bring innovative ideas to life through rapid prototyping on the AWS platform. He holds a master’s degree in Computer Science from Wayne State University in Detroit, Michigan.
Aravindharaj Rajendran is a Senior Solution Developer on the AWS Industries Prototyping and Customer Engineering (PACE) team, based in Herndon, VA. He helps AWS customers realize their innovative ideas through rapid prototyping on the AWS platform. Outside of work, he loves playing PC games and badminton, and traveling.