Tag Archives: Foundational (100)

AWS now licensed by DESC to operate as a Tier 1 cloud service provider in the Middle East (UAE) Region

Post Syndicated from Ioana Mecu original https://aws.amazon.com/blogs/security/aws-now-licensed-by-desc-to-operate-as-a-tier-1-cloud-service-provider-in-the-middle-east-uae-region/

We continue to expand the scope of our assurance programs at Amazon Web Services (AWS) and are pleased to announce that our Middle East (UAE) Region is now certified by the Dubai Electronic Security Centre (DESC) to operate as a Tier 1 cloud service provider (CSP). This alignment with DESC requirements demonstrates our ongoing commitment to adhere to the heightened expectations for CSPs. AWS government customers can run their applications in the certified AWS Regions with confidence.

AWS was evaluated by independent third-party auditor BSI on behalf of DESC on January 23, 2023. The Certificate of Compliance illustrating the AWS compliance status is available through AWS Artifact. AWS Artifact is a self-service portal for on-demand access to AWS compliance reports. Sign in to AWS Artifact in the AWS Management Console, or learn more at Getting Started with AWS Artifact.

As of this writing, 62 services offered in the Middle East (UAE) Region are in scope of this certification. For up-to-date information, including when additional services are added, visit the AWS Services in Scope by Compliance Program webpage and choose DESC CSP.

AWS strives to continuously bring services into scope of its compliance programs to help you meet your architectural and regulatory needs. Please reach out to your AWS account team if you have questions or feedback about DESC compliance.

To learn more about our compliance and security programs, see AWS Compliance Programs. As always, we value your feedback and questions; reach out to the AWS Compliance team through the Contact Us page.

 
If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Ioana Mecu

Ioana is a Security Audit Program Manager at AWS based in Madrid, Spain. She leads security audits, attestations, and certification programs across Europe and the Middle East. Ioana has 15 years of previous experience in risk management, security assurance, and technology audits in the financial sector.

AWS Security Profile: Jana Kay, Cloud Security Strategist

Post Syndicated from Roger Park original https://aws.amazon.com/blogs/security/aws-security-profile-jana-kay-cloud-security-strategist/

In the AWS Security Profile series, we interview Amazon Web Services (AWS) thought leaders who help keep our customers safe and secure. This interview features Jana Kay, Cloud Security Strategist. Jana shares her unique career journey, insights on the Security and Resiliency of the Cloud Tabletop Exercise (TTX) program, thoughts on the data protection and cloud security landscape, and more.


How long have you been at AWS and what do you do in your current role?
I’ve been at AWS a little over four years. I started in 2018 as a Cloud Security Strategist, and in my opinion, I have one of the coolest jobs at AWS. I get to help customers think through how to use the cloud to address some of their most difficult security challenges, by looking at trends and emerging and evolving issues, and anticipating those that might still be on the horizon. I do this through various means, such as whitepapers, short videos, and tabletop exercises. I love working on a lot of different projects, which all have an impact on customers and give me the opportunity to learn new things myself all the time!

How did you get started in the security space? What about it piqued your interest?
After college, I worked in the office of a United States senator, which led me to apply to the Harvard Kennedy School for a graduate degree in public policy. When I started graduate school, I wasn’t sure what my focus would be, but my first day of class was September 11, 2001, which obviously had a tremendous impact on me and my classmates. I first heard about the events of September 11 while I was in an international security policy class, taught by the late Dr. Ash Carter. My classmates and I came from a range of backgrounds, cultures, and professions, and Dr. Carter challenged us to think strategically and objectively—but compassionately—about what was unfolding in the world and our responsibility to effect change. That experience led me to pursue a career in security. I concentrated in international security and political economy, and after graduation, accepted a Presidential Management Fellowship in the Office of the Secretary of Defense at the Pentagon, where I worked for 16 years before coming to AWS.

What’s been the most dramatic change you’ve seen in the security industry?
From the boardroom to builder teams, the understanding that security has to be integrated into all aspects of an organization’s ecosystem has been an important shift. Acceptance of security as foundational to the health of an organization has been evolving for a while, and a lot of organizations have more work to do, but overall there is prioritization of security within organizations.

I understand you’ve helped publish a number of papers at AWS. What are they and how can customers find them?
Good question! AWS publishes a lot of great whitepapers for customers. A few that I’ve worked on are Accreditation Models for Secure Cloud Adoption, Security at the Edge: Core Principles, and Does data localization cause more problems than it solves? To stay updated on the latest whitepapers, see AWS Whitepapers & Guides.

What are your thoughts on the security of the cloud today?
There are a lot of great technologies—such as AWS Data Protection services—that can help you with data protection, but it’s equally important to have the right processes in place and to create a strong culture of data protection. Although one of the biggest shifts I’ve seen in the industry is recognition of the importance of security, we still have a ways to go for people to understand that security and data protection is everyone’s job, not just the job of security experts. So when we talk about data protection and privacy issues, a lot of the conversation focuses on things like encryption, but the conversation shouldn’t end there because ultimately, security is only as good as the processes and people who implement it.

Do you have a favorite AWS Security service and why?
I like anything that helps simplify my life, so AWS Control Tower is one of my favorites. It has so much functionality. Not only does AWS Control Tower help you set up multi-account AWS environments, but you can also use it to help identify which of your resources are compliant. The dashboard gives you visibility into provisioned accounts, the controls enabled for policy enforcement, and any noncompliant resources it detects.

What are you currently working on that you’re excited about?
Currently, my focus is the Security and Resiliency of the Cloud Tabletop Exercise (TTX). It’s a 3-hour interactive event about incident response in which participants discuss how to prevent, detect, contain, and eradicate a simulated cyber incident. I’ve had the opportunity to conduct the TTX in South America, the Middle East, Europe, and the US, and it’s been so much fun meeting customers and hearing the discussions during the TTX and how much participants enjoy the experience. It scales well for groups of different sizes—and for a single customer or industry or for multiple customers or industries—and it’s been really interesting to see how the conversations change depending on the participants.

How does the Security and Resiliency of the Cloud Tabletop Exercise help security professionals hone their skills?
One of the great things about the tabletop is that it involves interacting with other participants. So it’s an opportunity for security professionals and business leaders to learn from their peers, hear different perspectives, and understand all facets of the problem and potential solutions. Often our participants range from CISOs to policymakers to technical experts, who come to the exercise with different priorities for data protection and different ideas on how to solve the scenarios that we present. The TTX isn’t a technical exercise, but participants link their collective understanding of what capabilities are needed in a given scenario to what services are available to them and then finally how to implement those services. One of the things that I hope participants leave with is a better understanding of the AWS tools and services that are available to them.

How can customers learn more about the Security and Resiliency of the Cloud Tabletop Exercise?
To learn more about the TTX, reach out to your account manager.

Is there something you wish customers would ask you about more often?
I wish they’d ask more about what they should be doing to prepare for a cyber incident. It’s one thing to have an incident response plan; it’s another thing to be confident that it’s going to work if you ever need it. If you don’t practice the plan, how do you know that it’s effective, if it has gaps, or if everyone knows their role in an incident?

How about outside of work—any hobbies?
I’m the mother of a teenager and a tween, so keeping up with their activities doesn’t leave much time for hobbies! But someday soon, I’d like to get back to traveling more for leisure, reading for fun, and playing tennis.

 
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Roger Park

Roger is a Senior Security Content Specialist at AWS Security focusing on data protection. He has worked in cybersecurity for almost ten years as a writer and content producer. In his spare time, he enjoys trying new cuisines, gardening, and collecting records.

Author

Jana Kay

Since 2018, Jana has been a cloud security strategist with the AWS Security Growth Strategies team. She develops innovative ways to help AWS customers achieve their objectives, such as security tabletop exercises and other strategic initiatives. Previously, she was a cyber, counter-terrorism, and Middle East expert for 16 years in the Pentagon’s Office of the Secretary of Defense.

Updated ebook: Protecting your AWS environment from ransomware

Post Syndicated from Megan O'Neil original https://aws.amazon.com/blogs/security/updated-ebook-protecting-your-aws-environment-from-ransomware/

Amazon Web Services is excited to announce that we’ve updated the AWS ebook, Protecting your AWS environment from ransomware. The new ebook includes the top 10 best practices for ransomware protection and covers new services and features that have been released since the original publication in April 2020.

We know that customers care about ransomware. Security teams across the board are ramping up their protective, detective, and reactive measures. AWS serves all customers, including those most sensitive to disruption like teams responsible for critical infrastructure, healthcare organizations, manufacturing, educational institutions, and state and local governments. We want to empower our customers to protect themselves against ransomware by using a range of security capabilities. These capabilities provide unparalleled visibility into your AWS environment, as well as the ability to update and patch efficiently, to seamlessly and cost-effectively back up your data, and to templatize your environment, enabling a rapid return to a known good state. Keep in mind that there is no single solution or quick fix to mitigate ransomware. In fact, the mitigations and controls outlined in this document are general security best practices. We hope you find this information helpful and take action.

For example, to help protect against a security event that impacts stored backups in the source account, AWS Backup supports cross-account backups and the ability to centrally define backup policies for accounts in AWS Organizations by using the management account. Also, AWS Backup Vault Lock enforces write-once, read-many (WORM) backups to help protect backups (recovery points) in your backup vaults from inadvertent or malicious actions. You can copy backups to a known logically isolated destination account in the organization, and you can restore from the destination account or, alternatively, to a third account. This gives you an additional layer of protection if the source account experiences disruption from accidental or malicious deletion, disasters, or ransomware.
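As a rough sketch of how these protections can be wired up with boto3 (the vault names, destination account ARN, schedule, and retention values below are placeholders, not recommendations):

```python
import boto3

backup = boto3.client("backup")

# Enforce WORM protection on the source vault with AWS Backup Vault Lock.
backup.put_backup_vault_lock_configuration(
    BackupVaultName="prod-backup-vault",   # placeholder vault name
    MinRetentionDays=7,
    MaxRetentionDays=365,
    ChangeableForDays=3,                   # lock becomes immutable after this grace period
)

# Backup plan rule that copies recovery points to a vault in a logically isolated account.
backup.create_backup_plan(
    BackupPlan={
        "BackupPlanName": "prod-daily-with-cross-account-copy",
        "Rules": [
            {
                "RuleName": "daily",
                "TargetBackupVaultName": "prod-backup-vault",
                "ScheduleExpression": "cron(0 5 * * ? *)",
                "Lifecycle": {"DeleteAfterDays": 35},
                "CopyActions": [
                    {
                        # Destination vault in the isolated account (placeholder ARN).
                        "DestinationBackupVaultArn": "arn:aws:backup:us-east-1:222222222222:backup-vault:isolated-vault",
                        "Lifecycle": {"DeleteAfterDays": 90},
                    }
                ],
            }
        ],
    }
)
```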

Learn more about solutions like these by checking out our Protecting against ransomware webpage, which discusses security resources that can help you secure your AWS environments from ransomware.

Download the ebook Protecting your AWS environment from ransomware.

 
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Author

Megan O’Neil

Megan is a Principal Specialist Solutions Architect focused on threat detection and incident response. Megan and her team enable AWS customers to implement sophisticated, scalable, and secure solutions that solve their business challenges.

Merritt Baer

Merritt is a Principal in the Office of the CISO. She can be found on Twitter at @merrittbaer and looks forward to meeting you on her Twitch show, at re:Invent, or in your next executive conversation.

How Amazon Devices scaled and optimized real-time demand and supply forecasts using serverless analytics

Post Syndicated from Avinash Kolluri original https://aws.amazon.com/blogs/big-data/how-amazon-devices-scaled-and-optimized-real-time-demand-and-supply-forecasts-using-serverless-analytics/

Every day, Amazon Devices processes and analyzes billions of transactions from global shipping, inventory, capacity, supply, sales, marketing, producers, and customer service teams. This data is used to procure device inventory that meets Amazon customers’ demand. With data volumes exhibiting double-digit percentage growth year over year and the COVID pandemic disrupting global logistics in 2021, it became more critical to scale and generate near-real-time data.

This post shows you how we migrated to a serverless data lake built on AWS that automatically consumes data from multiple sources and formats. The migration also created new opportunities for our data scientists and engineers to use AI and machine learning (ML) services to continuously feed and analyze data.

Challenges and design concerns

Our legacy architecture primarily used Amazon Elastic Compute Cloud (Amazon EC2) to extract data from various internal heterogeneous data sources and REST APIs, Amazon Simple Storage Service (Amazon S3) to load the data, and Amazon Redshift for further analysis and for generating purchase orders.

We found that this approach had a few deficiencies, which drove improvements in the following areas:

  • Developer velocity – The lack of schema unification and discovery, a primary cause of runtime failures, meant developers often spent time dealing with operational and maintenance issues.
  • Scalability – Most of these datasets are shared across the globe. Therefore, we must stay within scaling limits while querying the data.
  • Minimal infrastructure maintenance – The current process spans multiple computes depending on the data source. Therefore, reducing infrastructure maintenance is critical.
  • Responsiveness to data source changes – Our current system gets data from various heterogeneous data stores and services. Any updates to those services take months of developer cycles. The response times for these data sources are critical to our key stakeholders. Therefore, we must take a data-driven approach to select a high-performance architecture.
  • Storage and redundancy – Due to the heterogeneous data stores and models, it was challenging to store the different datasets from various business stakeholder teams. Therefore, having versioning along with incremental and differential data to compare provides a remarkable ability to generate more optimized plans.
  • Agility and accessibility – Due to the volatile nature of logistics, a few business stakeholder teams have a requirement to analyze the data on demand and generate the near-real-time optimal plan for the purchase orders. This introduces the need for both polling and pushing the data to access and analyze it in near-real time.

Implementation strategy

Based on these requirements, we changed strategies and analyzed each issue to identify a solution. Architecturally, we chose a serverless model, with a data lake architecture that addresses the architectural gaps and challenges we identified as areas for improvement. From an operational standpoint, we designed a new shared responsibility model for data ingestion using AWS Glue instead of internal services (REST APIs) designed on Amazon EC2 to extract the data. We also used AWS Lambda for data processing. Then we chose Amazon Athena as our query service. To further optimize and improve developer velocity for our data consumers, we added Amazon DynamoDB as a metadata store for the different data sources landing in the data lake. These two decisions drove every design and implementation decision we made.

The following diagram illustrates the architecture.

In the following sections, we look at each component in the architecture in more detail as we move through the process flow.

AWS Glue for ETL

To meet customer demand while supporting the scale of new businesses’ data sources, it was critical for us to have a high degree of agility, scalability, and responsiveness in querying various data sources.

AWS Glue is a serverless data integration service that makes it easy for analytics users to discover, prepare, move, and integrate data from multiple sources. You can use it for analytics, ML, and application development. It also includes additional productivity and DataOps tooling for authoring, running jobs, and implementing business workflows.

With AWS Glue, you can discover and connect to more than 70 diverse data sources and manage your data in a centralized data catalog. You can visually create, run, and monitor extract, transform, and load (ETL) pipelines to load data into your data lakes. Also, you can immediately search and query cataloged data using Athena, Amazon EMR, and Amazon Redshift Spectrum.

AWS Glue made it easy for us to connect to the data in various data stores, edit and clean the data as needed, and load the data into an AWS-provisioned store for a unified view. AWS Glue jobs can be scheduled or called on demand to extract data from the client’s resource and from the data lake.

Some responsibilities of these jobs are as follows:

  • Extract and convert a source entity into a data entity
  • Enrich the data with year, month, and day for better cataloging, and include a snapshot ID for better querying
  • Perform input validation and path generation for Amazon S3
  • Associate the accredited metadata based on the source system

Querying REST APIs from internal services is one of our core challenges, and given our goal of minimal infrastructure, we wanted to keep using them in this project. AWS Glue connectors helped us meet that requirement. To query data from REST APIs and other data sources, we used PySpark and JDBC modules.

AWS Glue supports a wide variety of connection types. For more details, refer to Connection Types and Options for ETL in AWS Glue.
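A minimal sketch of such an extraction job, assuming a hypothetical cataloged JDBC connection (supply-source-jdbc), a hypothetical source table (purchase_signals), and job arguments for the dataset name and landing bucket, could look like the following PySpark script:

```python
import sys
from datetime import datetime, timezone

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from pyspark.sql import functions as F

# Job parameters are passed as --input_dataset_name and --landing_bucket.
args = getResolvedOptions(sys.argv, ["JOB_NAME", "input_dataset_name", "landing_bucket"])

glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Extract: read the source entity through a cataloged JDBC connection (placeholder names).
source = glue_context.create_dynamic_frame.from_options(
    connection_type="mysql",
    connection_options={
        "useConnectionProperties": "true",
        "connectionName": "supply-source-jdbc",   # placeholder Glue connection
        "dbtable": "purchase_signals",            # placeholder source table
    },
)

# Enrich: add partition columns and a snapshot ID for better cataloging and querying.
now = datetime.now(timezone.utc)
df = (
    source.toDF()
    .withColumn("year", F.lit(now.strftime("%Y")))
    .withColumn("month", F.lit(now.strftime("%m")))
    .withColumn("day", F.lit(now.strftime("%d")))
    .withColumn("snapshot_id", F.lit(now.strftime("%Y%m%d%H%M%S")))
)

# Load: write partitioned Parquet into the landing zone bucket.
(
    df.write.mode("append")
    .partitionBy("year", "month", "day")
    .parquet(f"s3://{args['landing_bucket']}/{args['input_dataset_name']}/")
)

job.commit()
```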

S3 bucket as landing zone

We used an S3 bucket as the immediate landing zone of the extracted data, which is further processed and optimized.

Lambda as AWS Glue ETL Trigger

We enabled S3 event notifications on the S3 bucket to trigger Lambda, which further partitions our data. The data is partitioned on InputDataSetName, Year, Month, and Date. Any query processor running on top of this data will scan only a subset of data for better cost and performance optimization. Our data can be stored in various formats, such as CSV, JSON, and Parquet.

The raw data isn’t ideal for most of our use cases to generate the optimal plan because it often has duplicates or incorrect data types. Most importantly, the data is in multiple formats, but we quickly modified the data and observed significant query performance gains from using the Parquet format. Here, we used one of the performance tips in Top 10 performance tuning tips for Amazon Athena.
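A minimal sketch of such an S3-triggered Lambda function, assuming the object key starts with the dataset name and that a hypothetical downstream Glue job (optimize-landing-data) performs the partitioned Parquet rewrite:

```python
import urllib.parse
from datetime import datetime, timezone

import boto3

glue = boto3.client("glue")


def handler(event, context):
    """Triggered by S3 event notifications on the landing bucket."""
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

        # Assumes keys look like <InputDataSetName>/<file>; the dataset name drives partitioning.
        dataset_name = key.split("/")[0]
        now = datetime.now(timezone.utc)

        # Start the downstream Glue job that rewrites the object into the
        # InputDataSetName/Year/Month/Date partition layout in Parquet (job name is a placeholder).
        glue.start_job_run(
            JobName="optimize-landing-data",
            Arguments={
                "--source_path": f"s3://{bucket}/{key}",
                "--input_dataset_name": dataset_name,
                "--year": now.strftime("%Y"),
                "--month": now.strftime("%m"),
                "--date": now.strftime("%d"),
            },
        )
```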

AWS Glue jobs for ETL

We wanted better data segregation and accessibility, so we chose to have a different S3 bucket to improve performance further. We used the same AWS Glue jobs to further transform and load the data into the required S3 bucket and a portion of extracted metadata into DynamoDB.

DynamoDB as metadata store

Now that we have the data, various business stakeholders further consume it. This leaves us with two questions: which source data resides in the data lake, and which version. We chose DynamoDB as our metadata store, which provides the latest details to consumers so they can query the data effectively. Every dataset in our system is uniquely identified by a snapshot ID, which we can look up in the metadata store. Clients access this data store through an API.
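A minimal sketch of a metadata lookup, assuming a hypothetical DynamoDB table named data_lake_metadata with dataset_name as the partition key and snapshot_id as the sort key:

```python
import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")
# Placeholder table name; assumes partition key "dataset_name" and sort key "snapshot_id".
table = dynamodb.Table("data_lake_metadata")


def latest_snapshot(dataset_name: str) -> dict:
    """Return the metadata record for the most recent snapshot of a dataset."""
    response = table.query(
        KeyConditionExpression=Key("dataset_name").eq(dataset_name),
        ScanIndexForward=False,  # newest snapshot_id first
        Limit=1,
    )
    items = response["Items"]
    if not items:
        raise ValueError(f"No snapshots found for dataset {dataset_name}")
    # Example attributes: snapshot_id, partition string, timestamp, and S3 location.
    return items[0]


# Example usage with a placeholder dataset name:
# meta = latest_snapshot("device_demand_forecast")
# print(meta["snapshot_id"], meta["s3_location"])
```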

Amazon S3 as data lake

For better data quality, we extracted the enriched data into another S3 bucket with the same AWS Glue job.

AWS Glue Crawler

Crawlers are the “secret sauce” that enable us to be responsive to schema changes. Throughout the process, we chose to make each step as schema-agnostic as possible, which allows any schema changes to flow through until they reach AWS Glue. With a crawler, we could keep up with changes to the schema as they happen. This helped us automatically crawl the data from Amazon S3 and generate the schema and tables.

AWS Glue Data Catalog

The Data Catalog helped us maintain the catalog as an index to the data’s location, schema, and runtime metrics in Amazon S3. Information in the Data Catalog is stored as metadata tables, where each table specifies a single data store.

Athena for SQL queries

Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries you run. We considered operational stability and increasing developer velocity as our key improvement factors.

We further optimized the process to query Athena so that users can plug in the values and the queries to get data out of Athena by creating the following:

  • An AWS Cloud Development Kit (AWS CDK) template to create Athena infrastructure and AWS Identity and Access Management (IAM) roles to access data lake S3 buckets and the Data Catalog from any account
  • A library so that clients can provide an IAM role, query, data format, and output location to start an Athena query and get the status and result of the query run in the bucket of their choice

Querying Athena is a two-step process:

  • StartQueryExecution – This starts the query run and gets the run ID. Users can provide the output location where the output of the query will be stored.
  • GetQueryExecution – This gets the query status because the run is asynchronous. When successful, you can query the output in an S3 file or via API.

The helper method for starting the query run and getting the result would be in the library.
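A minimal sketch of such a helper using boto3, with placeholder database, query, and output location values:

```python
import time

import boto3

athena = boto3.client("athena")


def run_athena_query(query: str, database: str, output_location: str, poll_seconds: int = 2) -> str:
    """Start an Athena query, poll until it finishes, and return the S3 path of the results."""
    # Step 1: StartQueryExecution returns a run ID; results land in the caller's bucket.
    start = athena.start_query_execution(
        QueryString=query,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_execution_id = start["QueryExecutionId"]

    # Step 2: GetQueryExecution reports status because the run is asynchronous.
    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_execution_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if state != "SUCCEEDED":
        reason = execution["QueryExecution"]["Status"].get("StateChangeReason", "unknown")
        raise RuntimeError(f"Athena query {query_execution_id} ended in {state}: {reason}")

    return execution["QueryExecution"]["ResultConfiguration"]["OutputLocation"]


# Example usage (placeholder names):
# path = run_athena_query(
#     "SELECT * FROM demand_forecast WHERE snapshot_id = '20230115'",
#     database="devices_data_lake",
#     output_location="s3://my-athena-results/devices/",
# )
```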

Data lake metadata service

This service is custom developed and interacts with DynamoDB to get the metadata (dataset name, snapshot ID, partition string, timestamp, and S3 link of the data) in the form of a REST API. When the schema is discovered, clients use Athena as their query processor to query the data.

Because all datasets have a snapshot ID and are partitioned, a join query doesn’t result in a full table scan but only a partition scan on Amazon S3. We used Athena as our query processor because it spares us from managing our own query infrastructure. Later, if we need something more, we can use either Redshift Spectrum or Amazon EMR.

Conclusion

Amazon Devices teams discovered significant value by moving to a data lake architecture using AWS Glue, which enabled multiple global business stakeholders to ingest data in more productive ways. This enabled the teams to generate the optimal plan to place purchase orders for devices by analyzing the different datasets in near-real time with appropriate business logic to solve the problems of the supply chain, demand, and forecast.

From an operational perspective, the investment has already started to pay off:

  • It standardized our ingestion, storage, and retrieval mechanisms, saving onboarding time. Before the implementation of this system, one dataset took 1 month to onboard. Due to our new architecture, we were able to onboard 15 new datasets in less than 2 months, which improved our agility by 70%.
  • It removed scaling bottlenecks, creating a homogeneous system that can quickly scale to thousands of runs.
  • The solution added schema and data quality validation before accepting any inputs and rejecting them if data quality violations are discovered.
  • It made it easy to retrieve datasets while supporting future simulations and back tester use cases requiring versioned inputs. This will make launching and testing models simpler.
  • The solution created a common infrastructure that can be easily extended to other teams across DIAL having similar issues with data ingestion, storage, and retrieval use cases.
  • Our operating costs have fallen by almost 90%.
  • This data lake can be accessed efficiently by our data scientists and engineers to perform other analytics and have a predictive approach as a future opportunity to generate accurate plans for the purchase orders.

The steps in this post can help you plan to build a similar modern data strategy using AWS-managed services to ingest data from different sources, automatically create metadata catalogs, share data seamlessly between the data lake and data warehouse, and create alerts in the event of an orchestrated data workflow failure.


About the authors

Avinash Kolluri is a Senior Solutions Architect at AWS. He works across Amazon Alexa and Devices to architect and design modern distributed solutions. His passion is to build cost-effective and highly scalable solutions on AWS. In his spare time, he enjoys cooking fusion recipes and traveling.

Vipul Verma is a Sr. Software Engineer at Amazon.com. He has been with Amazon since 2015, solving real-world challenges through technology that directly impacts and improves the life of Amazon customers. In his spare time, he enjoys hiking.

Amazon EMR launches support for Amazon EC2 C7g (Graviton3) instances to improve cost performance for Spark workloads by 7–13%

Post Syndicated from Al MS original https://aws.amazon.com/blogs/big-data/amazon-emr-launches-support-for-amazon-ec2-c7g-graviton3-instances-to-improve-cost-performance-for-spark-workloads-by-7-13/

Amazon EMR provides a managed service to easily run analytics applications using open-source frameworks such as Apache Spark, Hive, Presto, Trino, HBase, and Flink. The Amazon EMR runtime for Spark and Presto includes optimizations that provide more than twice the performance of open-source Apache Spark and Presto.

With Amazon EMR release 6.7, you can now use Amazon Elastic Compute Cloud (Amazon EC2) C7g instances, which use the AWS Graviton3 processors. These instances improve price-performance of running Spark workloads on Amazon EMR by 7.93–13.35% over previous generation instances, depending on the instance size. In this post, we describe how we estimated the price-performance benefit.

Amazon EMR runtime performance with EC2 C7g instances

We ran TPC-DS 3 TB benchmark queries on Amazon EMR 6.9 using the Amazon EMR runtime for Apache Spark (compatible with Apache Spark 3.3) with C7g instances. Data was stored in Amazon Simple Storage Service (Amazon S3), and results were compared to equivalent C6g clusters from the previous generation instance family. We measured performance improvements using the total query runtime and geometric mean of the query runtime across TPC-DS 3 TB benchmark queries.

Our results showed 13.65–18.73% improvement in total query runtime performance and 16.98–20.28% improvement in geometric mean on EMR clusters with C7g compared to equivalent EMR clusters with C6g instances, depending on the instance size. In comparing costs, we observed 7.93–13.35% reduction in cost on the EMR cluster with C7g compared to the equivalent with C6g, depending on the instance size. We did not benchmark the C6g xlarge instance because it didn’t have sufficient memory to run the queries.

The following table shows the results from running the TPC-DS 3 TB benchmark queries using Amazon EMR 6.9 compared to equivalent C7g and C6g instance EMR clusters.

| Instance Size | 16 XL | 12 XL | 8 XL | 4 XL | 2 XL |
|---|---|---|---|---|---|
| Total size of the cluster (1 leader + 5 core nodes) | 6 | 6 | 6 | 6 | 6 |
| Total query runtime on C6g (seconds) | 2774.86205 | 2752.84429 | 3173.08086 | 5108.45489 | 8697.08117 |
| Total query runtime on C7g (seconds) | 2396.22799 | 2336.28224 | 2698.72928 | 4151.85869 | 7249.58148 |
| Total query runtime improvement with C7g | 13.65% | 15.13% | 14.95% | 18.73% | 16.64% |
| Geometric mean query runtime C6g (seconds) | 22.2113 | 21.75459 | 23.38081 | 31.97192 | 45.41656 |
| Geometric mean query runtime C7g (seconds) | 18.43905 | 17.65898 | 19.01684 | 25.48695 | 37.43737 |
| Geometric mean query runtime improvement with C7g | 16.98% | 18.83% | 18.66% | 20.28% | 17.57% |
| EC2 C6g instance price ($ per hour) | $2.1760 | $1.6320 | $1.0880 | $0.5440 | $0.2720 |
| EMR C6g instance price ($ per hour) | $0.5440 | $0.4080 | $0.2720 | $0.1360 | $0.0680 |
| (EC2 + EMR) C6g instance price ($ per hour) | $2.7200 | $2.0400 | $1.3600 | $0.6800 | $0.3400 |
| Cost of running on C6g ($ per instance) | $2.09656 | $1.55995 | $1.19872 | $0.96493 | $0.82139 |
| EC2 C7g instance price ($ per hour) | $2.3200 | $1.7400 | $1.1600 | $0.5800 | $0.2900 |
| EMR C7g instance price ($ per hour) | $0.5800 | $0.4350 | $0.2900 | $0.1450 | $0.0725 |
| (EC2 + EMR) C7g instance price ($ per hour) | $2.9000 | $2.1750 | $1.4500 | $0.7250 | $0.3625 |
| Cost of running on C7g ($ per instance) | $1.930290 | $1.411500 | $1.086990 | $0.836140 | $0.729990 |
| Total cost reduction with C7g including performance improvement | -7.93% | -9.52% | -9.32% | -13.35% | -11.13% |

The following graph shows per-query improvements observed on C7g 2xlarge instances compared to equivalent C6g generations.

Benchmarking methodology

The benchmark used in this post is derived from the industry-standard TPC-DS benchmark, and uses queries from the Spark SQL Performance Tests GitHub repo with the following fixes applied.

We calculated the total cost of ownership (TCO) by multiplying the cost per hour by the number of instances in the cluster and the time taken to run the queries on the cluster. We used on-demand pricing in the US East (N. Virginia) Region for all instances.
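Applied to the 16 XL column of the table above, the arithmetic works out as follows:

```python
# Worked example for the 16 XL column (prices and runtimes from the table above).
c6g_price_per_hour = 2.72                # (EC2 + EMR) price for a C6g 16xlarge instance
c7g_price_per_hour = 2.90                # (EC2 + EMR) price for a C7g 16xlarge instance
c6g_runtime_hours = 2774.86205 / 3600    # total TPC-DS query runtime on C6g
c7g_runtime_hours = 2396.22799 / 3600    # total TPC-DS query runtime on C7g
instances_in_cluster = 6

# Per-instance cost of the benchmark run, as reported in the table.
c6g_cost_per_instance = c6g_price_per_hour * c6g_runtime_hours   # ~ $2.0966
c7g_cost_per_instance = c7g_price_per_hour * c7g_runtime_hours   # ~ $1.9303

# Total cost of the run multiplies by the 6 instances in the cluster.
c6g_total_cost = instances_in_cluster * c6g_cost_per_instance    # ~ $12.58
c7g_total_cost = instances_in_cluster * c7g_cost_per_instance    # ~ $11.58

cost_reduction = c7g_cost_per_instance / c6g_cost_per_instance - 1   # ~ -7.93%
print(f"C6g: ${c6g_cost_per_instance:.5f}, C7g: ${c7g_cost_per_instance:.5f}, change: {cost_reduction:.2%}")
```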

Conclusion

In this post, we described how we estimated the cost-performance benefit from using Amazon EMR with C7g instances compared to using equivalent previous generation instances. Using these new instances with Amazon EMR improves cost-performance by an additional 7–13%.


About the authors

Al MS is a product manager for Amazon EMR at Amazon Web Services.

Kyeonghyun Ryoo is a Software Development Engineer for EMR at Amazon Web Services. He primarily works on designing and building automation tools for internal teams and customers to maximize their productivity. Outside of work, he is a retired world champion in professional gaming who still enjoys playing video games.

Yuzhou Sun is a software development engineer for EMR at Amazon Web Services.

Steve Koonce is an Engineering Manager for EMR at Amazon Web Services.

AWS Lake Formation 2022 year in review

Post Syndicated from Jason Berkowitz original https://aws.amazon.com/blogs/big-data/aws-lake-formation-2022-year-in-review/

Data governance is the collection of policies, processes, and systems that organizations use to ensure the quality and appropriate handling of their data throughout its lifecycle for the purpose of generating business value. Data governance is increasingly top-of-mind for customers as they recognize data as one of their most important assets. Effective data governance enables better decision-making by improving data quality, reducing data management costs, and ensuring secure access to data for stakeholders. In addition, data governance is required to comply with an increasingly complex regulatory environment with data privacy (such as GDPR and CCPA) and data residency regulations (such as in the EU, Russia, and China).

For AWS customers, effective data governance improves decision-making, increases business agility, provides a competitive advantage, and reduces the risk of fines due to non-compliance with regulatory obligations. We understand the unique opportunity to provide our customers a comprehensive end-to-end data governance solution that is seamlessly integrated into our portfolio of services, and AWS Lake Formation and the AWS Glue Data Catalog are key to solving these challenges.

In this post, we are excited to summarize the features that the AWS Glue Data Catalog, AWS Glue crawler, and Lake Formation teams delivered in 2022. We have collected some of the key talks and solutions on data governance, data mesh, and modern data architecture published and presented in AWS re:Invent 2022, and a few data lake solutions built by customers and AWS Partners for easy reference. Whether you are a data platform builder, data engineer, data scientist, or any technology leader interested in data lake solutions, this post is for you.

To learn more about how customers are securing and sharing data with Lake Formation, we recommend going deeper into GoDaddy’s decentralized data mesh, Novo Nordisk’s modern data architecture, and JPMorgan’s improvements to their Federated Data Lake, a governed data mesh implementation using Lake Formation. Also, you can learn how AWS Partners integrated with Lake Formation to help customers build unique data lakes, in Starburst’s data mesh solution, Informatica’s automated data sharing solution, Ahana’s Presto integration with Lake Formation, Ascending’s custom data governance system, how PBS used machine learning on their data lakes, and how hc1 provides personalized health insights for customers.

You can review how Lake Formation is used by customers to build modern data architectures in the following re:Invent 2022 talks:

The Lake Formation team listened to customer feedback and made improvements in the areas of cross-account data governance, expanding the source of data lakes, enabling unified data governance of a business data catalog, making secure business-to-business data sharing possible, and expanding the coverage area for fine-grained access controls to Amazon Redshift. In the rest of this post, we are happy to share the progress we made in 2022.

Enhancing cross-account governance

Lake Formation provides the foundation for customers to share data across accounts within their organization. You can share AWS Glue Data Catalog resources to AWS Identity and Access Management (IAM) principals within an account as well as other AWS accounts using two methods. The first one is called the named-resource method, where users can select the names of databases and tables and choose the type of permissions to share. The second method uses LF-Tags, where users can create and associate LF-Tags to databases and tables and grant permission to IAM principals using LF-Tag policies and expressions.

In November 2022, Lake Formation introduced version 3 of its cross-account sharing feature. With this new version, Lake Formation users can share catalog resources using LF-Tags at the AWS Organizations level. Sharing data using LF-tags helps scale permissions and reduces the admin work for data lake builders. The cross-account sharing version 3 also allows you to share resources to specific IAM principals in other accounts, providing data owners control over who can access their data in other accounts. Lastly, we have removed the overhead of writing and maintaining Data Catalog resource policies by introducing AWS Resource Access Manager (AWS RAM) invites with LF-Tags-based policies in the cross-account sharing version 3. We encourage you to further explore cross-account sharing in Lake Formation.
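As an illustration of what an LF-Tag-based cross-account grant looks like with boto3 (the account ID, tag keys, and tag values below are placeholders):

```python
import boto3

lakeformation = boto3.client("lakeformation")

# Grant SELECT/DESCRIBE on all tables matching the LF-Tag expression to another account.
# The consumer account receives an AWS RAM invitation that its admin can then accept.
lakeformation.grant_permissions(
    Principal={"DataLakePrincipalIdentifier": "222222222222"},  # placeholder consumer account
    Resource={
        "LFTagPolicy": {
            "ResourceType": "TABLE",
            "Expression": [
                {"TagKey": "domain", "TagValues": ["sales"]},             # placeholder tag
                {"TagKey": "confidentiality", "TagValues": ["public"]},   # placeholder tag
            ],
        }
    },
    Permissions=["SELECT", "DESCRIBE"],
    PermissionsWithGrantOption=["SELECT", "DESCRIBE"],
)
```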

Extending Lake Formation permissions to new data

Until re:Invent 2022, Lake Formation provided permissions management for IAM principals on Data Catalog resources with underlying data primarily on Amazon Simple Storage Service (Amazon S3). At re:Invent 2022, we introduced Lake Formation permissions management for Amazon Redshift data shares in preview mode. Amazon Redshift is a fully-managed, petabyte-scale data warehouse service in the AWS Cloud. The data sharing feature allows data owners to group databases, tables, and views in an Amazon Redshift cluster and share it with other Amazon Redshift clusters within or across AWS accounts. Data sharing reduces the need to keep multiple copies of the same data in different data warehouses to accelerate business decision-making across an organization. Lake Formation further enhances sharing data within Amazon Redshift data shares by providing fine-grained access control on tables and views.

For additional details on this feature, refer to AWS Lake Formation-managed Redshift datashares (preview) and How Redshift data share can be managed by Lake Formation.

Amazon EMR is a managed cluster platform to run big data applications using Apache Spark, Apache Hive, Apache HBase, Apache Flink, Apache Hudi, and Presto at scale. You can use Amazon EMR to run batch and stream processing analytics jobs on your S3 data lakes. Starting with Amazon EMR release 6.7.0, we introduced Lake Formation permissions management on a runtime IAM role used with the EMR Steps API. This feature enables you to submit Apache Spark and Apache Hive applications to an EMR cluster through the EMR Steps API, with Lake Formation enforcing the table-level and column-level permissions granted to the IAM role that submits the application. This Lake Formation integration with Amazon EMR allows you to share an EMR cluster across multiple users in an organization with different permissions by isolating your applications through a runtime IAM role. We encourage you to try this feature in the Lake Formation workshop Integration with Amazon EMR using Runtime Roles. To explore a use case, see Introducing runtime roles for Amazon EMR steps: Use IAM roles and AWS Lake Formation for access control with Amazon EMR.
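A brief sketch of submitting a step with a runtime role through the EMR Steps API (the cluster ID, role ARN, and script location below are placeholders); Lake Formation then enforces the table- and column-level grants attached to that role:

```python
import boto3

emr = boto3.client("emr")

emr.add_job_flow_steps(
    JobFlowId="j-XXXXXXXXXXXXX",  # placeholder EMR cluster ID
    # The runtime role whose Lake Formation grants govern table and column access for this step.
    ExecutionRoleArn="arn:aws:iam::111122223333:role/analyst-runtime-role",
    Steps=[
        {
            "Name": "spark-sql-over-governed-tables",
            "ActionOnFailure": "CONTINUE",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": [
                    "spark-submit",
                    "--deploy-mode", "cluster",
                    "s3://my-bucket/scripts/query_governed_tables.py",  # placeholder script
                ],
            },
        }
    ],
)
```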

Amazon SageMaker Studio is a fully integrated development environment (IDE) for machine learning (ML) that enables data scientists and developers to prepare data for building, training, tuning, and deploying models. Studio offers a native integration with Amazon EMR so that data scientists and data engineers can interactively prepare data at petabyte scale using open-source frameworks such as Apache Spark, Presto, and Hive from Studio notebooks. With the release of Lake Formation permissions management on a runtime IAM role, Studio now supports table-level and column-level access with Lake Formation. When users connect to EMR clusters from Studio notebooks, they can choose the IAM role (called the runtime IAM role) that they want to connect with. If data access is managed by Lake Formation, users can enforce table-level and column-level permissions using policies attached to the runtime role. For more details, refer to Apply fine-grained data access controls with AWS Lake Formation and Amazon EMR from Amazon SageMaker Studio.

Ingest and catalog varied data

A robust data governance model includes data from an organization’s many data sources and methods to discover and catalog those varied data assets. AWS Glue crawlers provide the ability to discover data from sources including Amazon S3, Amazon Redshift, and NoSQL databases, and populate the AWS Glue Data Catalog.

In 2022, we launched AWS Glue crawler support for Snowflake and AWS Glue crawler support for Delta Lake tables. These integrations allow AWS Glue crawlers to create and update Data Catalog tables based on these popular data sources. This makes it even easier to create extract, transform, and load (ETL) jobs with AWS Glue based on these Data Catalog tables as sources and targets.

In 2022, the AWS Glue crawlers UI was redesigned to offer a better user experience. One of the main enhancements delivered as part of this revision is greater insight into AWS Glue crawler history. The crawler history UI provides an easy view of crawler runs, schedules, data sources, and tags. For each crawl, the crawler history offers a summary of changes in the database schema or Amazon S3 partition changes. Crawler history also provides detailed information about DPU hours and reduces the time spent analyzing and debugging crawler operations and costs. To explore the new functionalities added to the crawlers UI, refer to Set up and monitor AWS Glue crawlers using the enhanced AWS Glue UI and crawler history.

In 2022, we also extended support for crawlers based on Amazon S3 event notifications to support catalog tables. With this feature, incremental crawling can be offloaded from data pipelines to the scheduled AWS Glue crawler, reducing crawls to incremental S3 events. For more information, refer to Build incremental crawls of data lakes with existing Glue catalog tables.
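As a rough sketch of configuring an event-mode crawler with boto3 (the crawler name, role, queue ARN, and paths below are placeholders; catalog-table targets follow a similar pattern):

```python
import boto3

glue = boto3.client("glue")

glue.create_crawler(
    Name="incremental-data-lake-crawler",                          # placeholder crawler name
    Role="arn:aws:iam::111122223333:role/GlueCrawlerRole",         # placeholder IAM role
    DatabaseName="devices_data_lake",                              # placeholder catalog database
    Targets={
        "S3Targets": [
            {
                "Path": "s3://my-data-lake-bucket/curated/",
                # SQS queue that receives the bucket's S3 event notifications.
                "EventQueueArn": "arn:aws:sqs:us-east-1:111122223333:data-lake-events",
            }
        ]
    },
    # Crawl only the objects reported by S3 events instead of re-listing the whole prefix.
    RecrawlPolicy={"RecrawlBehavior": "CRAWL_EVENT_MODE"},
    SchemaChangePolicy={
        "UpdateBehavior": "UPDATE_IN_DATABASE",
        "DeleteBehavior": "LOG",
    },
    Schedule="cron(0 */6 * * ? *)",                                # scheduled crawl every 6 hours
)
```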

More ways to share data beyond the data lake

During re:Invent 2022, we announced a preview of AWS Data Exchange for AWS Lake Formation, a new feature that enables data subscribers to find and subscribe to third-party datasets that are managed directly through Lake Formation. Until now, AWS Data Exchange subscribers could access third-party datasets by exporting providers’ files to their own S3 buckets, calling providers’ APIs through Amazon API Gateway, or querying producers’ Amazon Redshift data shares from their Amazon Redshift cluster. With the new Lake Formation integration, data providers curate AWS Data Exchange datasets using Lake Formation tags. Data subscribers are able to query and explore the databases and tables associated with those tags, just like any other AWS Glue Data Catalog resource. Organizations can apply resource-based Lake Formation permissions to share the licensed datasets within the same account or across accounts using AWS License Manager. AWS Data Exchange for Lake Formation streamlines data licensing and sharing operations by accelerating data onboarding, reducing the amount of ETL required for end-users to access third-party data, and centralizing governance and access controls for third-party data.

At re:Invent 2022, we also announced Amazon DataZone, a new data management service that makes it faster and easier for you to catalog, discover, share, and govern data stored across AWS, on-premises, and third-party sources. Amazon DataZone is a business data catalog service that supplements the technical metadata in the AWS Glue Data Catalog. Amazon DataZone is integrated with Lake Formation permissions management so that you can effectively manage and govern access to your data, and audit who is accessing what data and for what purpose. With the publisher-subscriber model of Amazon DataZone, data assets can be shared and accessed across Regions. For additional details about the service and its capabilities, refer to the Amazon DataZone FAQs and re:Invent launch.

Conclusion

Data is transforming every field and every business. However, with data growing faster than most companies can keep track of, collecting, securing, and getting value out of that data is a challenging thing to do. A modern data strategy can help you create better business outcomes with data. AWS provides the most complete set of services for the end-to-end data journey to help you unlock value from your data and turn it into insight.

At AWS, we work backward from customer requirements. The Lake Formation team worked hard to deliver the features described in this post, and we invite you to check them out. With our continued focus on invention, we hope to play a key role in empowering organizations to build new data governance models that help you derive more business value at lightning speed.

You can get started with Lake Formation by exploring our hands-on workshop modules and Getting started tutorials. We look forward to hearing from you, our customers, on your data lake and data governance use cases. Please get in touch through your AWS account team and share your comments.


About the Authors

Jason Berkowitz is a Senior Product Manager with AWS Lake Formation. He comes from a background in machine learning and data lake architectures. He helps customers become data-driven.

Aarthi Srinivasan is a Senior Big Data Architect with AWS Lake Formation. She enjoys building data lake solutions for AWS customers and partners. When not on the keyboard, she explores the latest science and technology trends and spends time with her family.

Leonardo Gómez is a Senior Analytics Specialist Solutions Architect at AWS. Based in Toronto, Canada, he has over a decade of experience in data management, helping customers around the globe address their business and technical needs.

United Arab Emirates IAR compliance assessment report is now available with 58 services in scope

Post Syndicated from Ioana Mecu original https://aws.amazon.com/blogs/security/united-arab-emirates-iar-compliance-assessment-report-is-now-available-with-58-services-in-scope/

Amazon Web Services (AWS) is pleased to announce the publication of our compliance assessment report on the Information Assurance Regulation (IAR) established by the Telecommunications and Digital Government Regulatory Authority (TDRA) of the United Arab Emirates. The report covers the AWS Middle East (UAE) Region, with 58 services in scope of the assessment.

The IAR provides management and technical information security controls to establish, implement, maintain, and continuously improve information assurance. AWS alignment with IAR requirements demonstrates our ongoing commitment to adhere to the heightened expectations for cloud service providers. As such, IAR-regulated customers can use AWS services with confidence.

Independent third-party auditors from BDO evaluated AWS for the period of November 1, 2021, to October 31, 2022. The assessment report illustrating the status of AWS compliance is available through AWS Artifact. AWS Artifact is a self-service portal for on-demand access to AWS compliance reports. Sign in to AWS Artifact in the AWS Management Console, or learn more at Getting Started with AWS Artifact.

For up-to-date information, including when additional services are added, see AWS Services in Scope by Compliance Program and choose IAR.

AWS strives to continuously bring services into the scope of its compliance programs to help you meet your architectural and regulatory needs. If you have questions or feedback about IAR compliance, reach out to your AWS account team.

To learn more about our compliance and security programs, see AWS Compliance Programs. As always, we value your feedback and questions; reach out to the AWS Compliance team through the Contact Us page.

 
If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Ioana Mecu

Ioana is a Security Audit Program Manager at AWS based in Madrid, Spain. She leads security audits, attestations, and certification programs across Europe and the Middle East. Ioana has 15 years of previous experience in risk management, security assurance, and technology audits in the financial sector.

Gokhan Akyuz

Gokhan is a Security Audit Program Manager at AWS based in Amsterdam, Netherlands. He leads security audits, attestations, and certification programs across Europe and the Middle East. Gokhan has more than 15 years of experience in IT and cybersecurity audits and controls implementation in a wide range of industries.

AWS CloudHSM is now PCI PIN certified

Post Syndicated from Nivetha Chandran original https://aws.amazon.com/blogs/security/aws-cloudhsm-is-now-pci-pin-certified/

Amazon Web Services (AWS) is pleased to announce that AWS CloudHSM is certified for Payment Card Industry Personal Identification Number (PCI PIN) version 3.1.

With CloudHSM, you can manage and access your keys on FIPS 140-2 Level 3 certified hardware, protected with customer-owned, single-tenant hardware security module (HSM) instances that run in your own virtual private cloud (VPC). This PCI PIN attestation gives you the flexibility to deploy your regulated workloads with reduced compliance overhead.

Coalfire, a third-party Qualified Security Assessor (QSA), evaluated CloudHSM. Customers can access the PCI PIN Attestation of Compliance (AOC) report through AWS Artifact.

To learn more about our PCI program and other compliance and security programs, see the AWS Compliance Programs page. As always, we value your feedback and questions; reach out to the AWS Compliance team through the Contact Us page.

 
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Author

Nivetha Chandran

Nivetha is a Security Assurance Manager at Amazon Web Services on the Global Audits team, managing the PCI compliance program. Nivetha holds a Master’s degree in Information Management from the University of Washington.

Fall 2022 SOC reports now available in Spanish

Post Syndicated from Rodrigo Fiuza original https://aws.amazon.com/blogs/security/fall-2022-soc-reports-now-available-in-spanish/

Spanish version >>

We continue to listen to our customers, regulators, and stakeholders to understand their needs regarding audit, assurance, certification, and attestation programs at Amazon Web Services (AWS). We are pleased to announce that Fall 2022 System and Organization Controls (SOC) 1, SOC 2, and SOC 3 reports are now available in Spanish. These translated reports will help drive greater engagement and alignment with customer and regulatory requirements across Latin America and Spain.

The Spanish language version of the reports does not contain the independent opinion issued by the auditors or the control test results, but you can find this information in the English language version. Stakeholders should use the English version as a complement to the Spanish version.

Translated SOC reports in Spanish are available to customers through AWS Artifact. Translated SOC reports in Spanish will be published twice a year, in alignment with the Fall and Spring reporting cycles.

We value your feedback and questions—feel free to reach out to our team or give feedback about this post through the Contact Us page.

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security news? Follow us on Twitter.

 


Spanish

Los informes SOC de Otoño de 2022 ahora están disponibles en español

Seguimos escuchando a nuestros clientes, reguladores y partes interesadas para comprender sus necesidades en relación con los programas de auditoría, garantía, certificación y atestación en Amazon Web Services (AWS). Nos complace anunciar que los informes SOC 1, SOC 2 y SOC 3 de AWS de Otoño de 2022 ya están disponibles en español. Estos informes traducidos ayudarán a impulsar un mayor compromiso y alineación con los requisitos regulatorios y de los clientes en las regiones de América Latina y España.

La versión en inglés de los informes debe tenerse en cuenta en relación con la opinión independiente emitida por los auditores y los resultados de las pruebas de controles, como complemento de las versiones en español.

Los informes SOC traducidos en español están disponibles en AWS Artifact. Los informes SOC traducidos en español se publicarán dos veces al año según los ciclos de informes de Otoño y Primavera.

Valoramos sus comentarios y preguntas; no dude en ponerse en contacto con nuestro equipo o enviarnos sus comentarios sobre esta publicación a través de nuestra página Contáctenos.

Si tienes comentarios sobre esta publicación, envíalos en la sección Comentarios a continuación.

¿Desea obtener más noticias sobre seguridad de AWS? Síguenos en Twitter.

Author

Rodrigo Fiuza

Rodrigo is a Security Audit Manager at AWS, based in São Paulo. He leads audits, attestations, certifications, and assessments across Latin America, the Caribbean, and Europe. Rodrigo has 12 years of previous experience in risk management, security assurance, and technology audits.

Andrew Najjar

Andrew is a Compliance Program Manager at Amazon Web Services. He leads multiple security and privacy initiatives within AWS and has 8 years of experience in security assurance. Andrew holds a master’s degree in information systems and a bachelor’s degree in accounting from Indiana University. He is a CPA and an AWS Certified Solutions Architect – Associate.

Ryan Wilks

Ryan is a Compliance Program Manager at Amazon Web Services. He leads multiple security and privacy initiatives within AWS. Ryan has 11 years of experience in information security and holds ITIL, CISM and CISA certifications.

Nathan Samuel

Nathan is a Compliance Program Manager at Amazon Web Services. He leads multiple security and privacy initiatives within AWS. Nathan has a Bachelor of Commerce degree from the University of the Witwatersrand, South Africa, has 17 years’ experience in security assurance, and holds the CISA, CRISC, CGEIT, CISM, CDPSE, and Certified Internal Auditor certifications.

C5 Type 2 attestation report now available with 156 services in scope

Post Syndicated from Julian Herlinghaus original https://aws.amazon.com/blogs/security/c5-type-2-attestation-report-now-available-with-156-services-in-scope/

We continue to expand the scope of our assurance programs at Amazon Web Services (AWS), and we are pleased to announce that AWS has successfully completed the 2022 Cloud Computing Compliance Controls Catalogue (C5) attestation cycle with 156 services in scope. This alignment with C5 requirements demonstrates our ongoing commitment to adhere to the heightened expectations for cloud service providers. AWS customers in Germany and across Europe can run their applications on AWS Regions in scope of the C5 report with the assurance that AWS aligns with C5 requirements.

The C5 attestation scheme is backed by the German government and was introduced by the Federal Office for Information Security (BSI) in 2016. AWS has adhered to the C5 requirements since their inception. C5 helps organizations demonstrate operational security against common cyberattacks when using cloud services within the context of the German Government’s Security Recommendations for Cloud Computing Providers.

Independent third-party auditors evaluated AWS for the period October 1, 2021, through September 30, 2022. The C5 report illustrates AWS’ compliance status for both the basic and additional criteria of C5. Customers can download the C5 report through AWS Artifact. AWS Artifact is a self-service portal for on-demand access to AWS compliance reports. Sign in to AWS Artifact in the AWS Management Console, or learn more at Getting Started with AWS Artifact.

AWS has added the following 16 services to the current C5 scope:

At present, the services offered in the Frankfurt, Dublin, London, Paris, Milan, Stockholm, and Singapore Regions are in scope of this certification. For up-to-date information, see the AWS Services in Scope by Compliance Program page and choose C5.

AWS strives to continuously bring services into the scope of its compliance programs to help you meet your architectural and regulatory needs. If you have questions or feedback about C5 compliance, reach out to your AWS account team.

To learn more about our compliance and security programs, see AWS Compliance Programs. As always, we value your feedback and questions; reach out to the AWS Compliance team through the Contact Us page.

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Julian Herlinghaus

Julian is a Manager in AWS Security Assurance based in Berlin, Germany. He leads third-party and customer security audits across Europe and specifically the DACH region. He previously led the information security department of an accredited certification body and has many years of experience in information security, security assurance, and compliance.

Andreas Terwellen

Andreas is a senior manager in security audit assurance at AWS, based in Frankfurt, Germany. His team is responsible for third-party and customer audits, attestations, certifications, and assessments across Europe. Previously, he was a CISO in a DAX-listed telecommunications company in Germany. He also worked for different consulting companies managing large teams and programs across multiple industries and sectors.

AWS achieves HDS certification in two additional Regions

Post Syndicated from Janice Leung original https://aws.amazon.com/blogs/security/aws-achieves-hds-certification-in-two-additional-regions/

We’re excited to announce that two additional AWS Regions—Asia Pacific (Jakarta) and Europe (Milan)—have been granted the Health Data Hosting (Hébergeur de Données de Santé, HDS) certification. This alignment with HDS requirements demonstrates our continued commitment to adhere to the heightened expectations for cloud service providers. AWS customers who handle personal health data can use HDS-certified Regions with confidence to manage their workloads.

The following 18 Regions are in scope for this certification:

  • US East (Ohio)
  • US East (Northern Virginia)
  • US West (Northern California)
  • US West (Oregon)
  • Asia Pacific (Jakarta)
  • Asia Pacific (Seoul)
  • Asia Pacific (Mumbai)
  • Asia Pacific (Singapore)
  • Asia Pacific (Sydney)
  • Asia Pacific (Tokyo)
  • Canada (Central)
  • Europe (Frankfurt)
  • Europe (Ireland)
  • Europe (London)
  • Europe (Milan)
  • Europe (Paris)
  • Europe (Stockholm)
  • South America (São Paulo)

Introduced by the French governmental agency for health, Agence Française de la Santé Numérique (ASIP Santé), the HDS certification aims to strengthen the security and protection of personal health data. Achieving this certification demonstrates that AWS provides a framework for technical and governance measures to secure and protect personal health data, governed by French law.

Independent third-party auditors evaluated and certified AWS on January 13, 2023. The Certificate of Compliance that demonstrates AWS compliance status is available on the Agence du Numérique en Santé (ANS) website and AWS Artifact. AWS Artifact is a self-service portal for on-demand access to AWS compliance reports. Sign in to AWS Artifact in the AWS Management Console, or learn more at Getting Started with AWS Artifact.

For up-to-date information, including when additional Regions are added, see the AWS Compliance Programs page, and choose HDS.

AWS strives to continuously bring services into the scope of its compliance programs to help you meet your architectural and regulatory needs. If you have questions or feedback about HDS compliance, reach out to your AWS account team.

To learn more about our compliance and security programs, see AWS Compliance Programs. As always, we value your feedback and questions; reach out to the AWS Compliance team through the Contact Us page.

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security news? Follow us on Twitter.

Janice Leung

Janice is a security audit program manager at AWS, based in New York. She leads security audits across Europe and previously worked in security assurance and technology risk management in the financial industry for 11 years.

Recap to security, identity, and compliance sessions at AWS re:Invent 2022

Post Syndicated from Katie Collins original https://aws.amazon.com/blogs/security/recap-to-security-identity-and-compliance-sessions-at-aws-reinvent-2022/

AWS re:Invent returned to Las Vegas, NV, in November 2022. The conference featured more than 2,200 sessions and hands-on labs, and welcomed more than 51,000 attendees over 5 days. If you weren’t able to join us in person, or just want to revisit some of the security, identity, and compliance announcements and on-demand sessions, this blog post is for you.

Key announcements

Here are some of the security announcements that we made at AWS re:Invent 2022.

  • We announced the preview of a new service, Amazon Security Lake. Amazon Security Lake automatically centralizes security data from cloud, on-premises, and custom sources into a purpose-built data lake stored in your AWS account. Security Lake makes it simpler to analyze security data so that you can get a more complete understanding of security across your entire organization. You can also improve the protection of your workloads, applications, and data. Security Lake automatically gathers and manages your security data across accounts and AWS Regions.
  • We introduced the AWS Digital Sovereignty Pledge—our commitment to offering the most advanced set of sovereignty controls and features available in the cloud. As part of this pledge, we launched a new feature of AWS Key Management Service, External Key Store (XKS), where you can use your own encryption keys stored outside of the AWS Cloud to protect data on AWS.
  • To help you with the building blocks for zero trust, we introduced two new services:
    • AWS Verified Access provides secure access to corporate applications without a VPN. Verified Access verifies each access request in real time and only connects users to the applications that they are allowed to access, removing broad access to corporate applications and reducing the associated risks.
    • Amazon Verified Permissions is a scalable, fine-grained permissions management and authorization service for custom applications. Using the Cedar policy language, Amazon Verified Permissions centralizes fine-grained permissions for custom applications and helps developers authorize user actions in applications.
  • We announced Automated sensitive data discovery for Amazon Macie. This new capability helps you gain visibility into where your sensitive data resides on Amazon Simple Storage Service (Amazon S3) at a fraction of the cost of running a full data inspection across all your S3 buckets. Automated sensitive data discovery automates the continual discovery of sensitive data and potential data security risks across your S3 storage aggregated at the AWS Organizations level.
  • Amazon Inspector now supports AWS Lambda functions, adding continual, automated vulnerability assessments for serverless compute workloads. Amazon Inspector automatically discovers eligible AWS Lambda functions and identifies software vulnerabilities in application package dependencies used in the Lambda function code. The functions are initially assessed upon deployment to Lambda and continually monitored and reassessed, informed by updates to the function and newly published vulnerabilities. When vulnerabilities are identified, actionable security findings are generated, aggregated in Amazon Inspector, and pushed to Security Hub and Amazon EventBridge to automate workflows.
  • Amazon GuardDuty now offers threat detection for Amazon Aurora to identify potential threats to data stored in Aurora databases. Currently in preview, Amazon GuardDuty RDS Protection profiles and monitors access activity to existing and new databases in your account, and uses tailored machine learning models to detect suspicious logins to Aurora databases. When a potential threat is detected, GuardDuty generates a security finding that includes database details and contextual information on the suspicious activity. GuardDuty is integrated with Aurora for direct access to database events without requiring you to modify your databases.
  • AWS Security Hub is now integrated with AWS Control Tower, allowing you to pair Security Hub detective controls with AWS Control Tower proactive or preventive controls and manage them together using AWS Control Tower. Security Hub controls are mapped to related control objectives in the AWS Control Tower control library, providing you with a holistic view of the controls required to meet a specific control objective. This combination of over 160 detective controls from Security Hub, with the AWS Control Tower built-in automations for multi-account environments, gives you a strong baseline of governance and off-the-shelf controls to scale your business using new AWS workloads and services. This combination of controls also helps you monitor whether your multi-account AWS environment is secure and managed in accordance with best practices, such as the AWS Foundational Security Best Practices standard.
  • We launched our Cloud Audit Academy (CAA) course for Federal and DoD Workloads (FDW) on AWS. This new course is a 12-hour interactive training based on NIST SP 800-171, with mappings to NIST SP 800-53 and the Cybersecurity Maturity Model Certification (CMMC) and covers AWS services relevant to each NIST control family. This virtual instructor-led training is industry- and framework-specific for our U.S. Federal and DoD customers.
  • AWS Wickr allows businesses and public sector organizations to collaborate more securely, while retaining data to help meet requirements such as e-discovery and Freedom of Information Act (FOIA) requests. AWS Wickr is an end-to-end encrypted enterprise communications service that facilitates one-to-one chats, group messaging, voice and video calling, file sharing, screen sharing, and more.
  • We introduced the Post-Quantum Cryptography hub that aggregates resources and showcases AWS research and engineering efforts focused on providing cryptographic security for our customers, and how AWS interfaces with the global cryptographic community.

Watch on demand

Were you unable to join the event in person? See the following for on-demand sessions.

Keynotes and leadership sessions

Watch the AWS re:Invent 2022 keynote where AWS Chief Executive Officer Adam Selipsky shares best practices for managing security, compliance, identity, and privacy in the cloud. You can also replay the other AWS re:Invent 2022 keynotes.

To learn about the latest innovations in cloud security from AWS and what you can do to foster a culture of security in your business, watch AWS Chief Information Security Officer CJ Moses’s leadership session with guest Deneen DeFiore, Chief Information Security Officer at United Airlines.

Breakout sessions and new launch talks

You can watch talks and learning sessions on demand to learn about the following topics:

  • See how AWS, customers, and partners work together to raise their security posture with AWS infrastructure and services. Learn about trends in identity and access management, threat detection and incident response, network and infrastructure security, data protection and privacy, and governance, risk, and compliance.
  • Dive into our launches! Hear from security experts on recent announcements. Learn how new services and solutions can help you meet core security and compliance requirements.

Consider joining us for more in-person security learning opportunities by saving the date for AWS re:Inforce 2023, which will be held June 13-14 in Anaheim, California. We look forward to seeing you there!

If you’d like to discuss how these new announcements can help your organization improve its security posture, AWS is here to help. Contact your AWS account team today.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Katie Collins

Katie is a Product Marketing Manager in AWS Security, where she brings her enthusiastic curiosity to deliver products that drive value for customers. Her experience also includes product management at both startups and large companies. With a love for travel, Katie is always eager to visit new places while enjoying a great cup of coffee.

Himanshu Verma

Himanshu is a Worldwide Specialist for AWS Security Services. In this role, he leads the go-to-market creation and execution for AWS Security Services, field enablement, and strategic customer advisement. Prior to AWS, he held several leadership roles in product management, engineering, and development, working on various identity, information security, and data protection technologies. He enjoys brainstorming disruptive ideas, venturing outdoors, photography, and trying various “hole in the wall” food and drinking establishments around the globe.

Introducing the Security Design of the AWS Nitro System whitepaper

Post Syndicated from J.D. Bean original https://aws.amazon.com/blogs/security/introducing-the-security-design-of-the-aws-nitro-system-whitepaper/

AWS recently released a whitepaper on the Security Design of the AWS Nitro System. The Nitro System is a combination of purpose-built server designs, data processors, system management components, and specialized firmware that serves as the underlying virtualization technology that powers all Amazon Elastic Compute Cloud (Amazon EC2) instances launched since early 2018. With the Nitro System, AWS undertook an effort to reimagine the architecture of virtualization to deliver security, isolation, performance, cost savings, and a pace of innovation that our customers require.

This whitepaper is a detailed design document on the inner workings of the AWS Nitro System, and how we use it to help secure your most critical workloads. This is the first time that AWS has provided such a detailed design document on the Nitro System and how it offers a no-operator access design and strong tenant isolation. The whitepaper describes the security design of the Nitro System in detail to help you evaluate Amazon EC2 for your sensitive workloads.

Three key components of the Nitro System are used to implement this design:

  • Purpose-built Nitro Cards – Hardware devices designed by AWS that provide overall system control and I/O virtualization that is independent of the main system board with its CPUs and memory.
  • Nitro Security Chip – Enables a secure boot process for the overall system based on a hardware root of trust, the ability to offer bare metal instances, and defense-in-depth that offers protection to the server from unauthorized modification of system firmware.
  • Nitro Hypervisor – A deliberately minimized and firmware-like hypervisor designed to provide strong resource isolation, and performance that is nearly indistinguishable from a bare metal server.

The whitepaper describes the fundamental architectural change introduced by the Nitro System compared to previous approaches to virtualization. It discusses the three key components of the Nitro System, and provides a demonstration of how these components work together by walking through what happens when a new Amazon Elastic Block Store (Amazon EBS) volume is added to a running EC2 instance. The whitepaper also discusses how the Nitro System is designed to eliminate the possibility of administrator access to an EC2 server, the overall passive communications design of the Nitro System, and the Nitro System change management process. Finally, the paper surveys important aspects of the EC2 system design that provide mitigations against potential side-channel issues that can arise in compute environments.

The whitepaper dives deep into each of these considerations, offering a detailed picture of the Nitro System security design. For more information about cloud security at AWS, contact us.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

J.D. Bean

J.D. is a Principal Security Architect for Amazon EC2 based out of New York City. His interests include security, privacy, and compliance. He is passionate about his work enabling AWS customers’ successful cloud journeys. J.D. holds a Bachelor of Arts from The George Washington University and a Juris Doctor from New York University School of Law.

Getting started with AWS Glue Data Quality from the AWS Glue Data Catalog

Post Syndicated from Aniket Jiddigoudar original https://aws.amazon.com/blogs/big-data/getting-started-with-aws-glue-data-quality-from-the-aws-glue-data-catalog/

AWS Glue is a serverless data integration service that makes it simple to discover, prepare, and combine data for analytics, machine learning (ML), and application development. You can use AWS Glue to create, run, and monitor data integration and ETL (extract, transform, and load) pipelines and catalog your assets across multiple data stores.

Hundreds of thousands of customers use data lakes for analytics and machine learning to make data-driven business decisions. Data consumers lose trust in data if it is not accurate and recent, making data quality essential for sound and timely decision-making.

Evaluation of the accuracy and freshness of data is a common task for engineers. Currently, there are various tools available to evaluate data quality. However, these tools often require manual processes of data discovery and expertise in data engineering and coding.

We are pleased to announce the public preview launch of AWS Glue Data Quality. You can access this feature today in the available Regions without requesting any additional access. AWS Glue Data Quality is a new preview feature of AWS Glue that measures and monitors the data quality of Amazon S3-based data lakes and of AWS Glue ETL jobs. It does not require any expertise in data engineering or coding, and it simplifies your experience of monitoring and evaluating the quality of your data.

This is Part 1 of a four-part series of posts to explain how AWS Glue Data Quality works. Check out the next posts in the series:

Getting started with AWS Glue Data Quality

In this post, we will go over the simplicity of using the AWS Glue Data Quality feature by:

  1. Starting data quality recommendations and runs on your data in AWS Glue Data Catalog.
  2. Creating an Amazon CloudWatch alarm for getting notifications when data quality results are below a certain threshold.
  3. Analyzing your AWS Glue Data Quality run results through Amazon Athena.

Set up resources with AWS CloudFormation

The provided CloudFormation script creates the following resources for you:

  1. The IAM role required to run AWS Glue Data Quality runs
  2. An Amazon Simple Storage Service (Amazon S3) bucket to store the NYC Taxi dataset
  3. An S3 bucket to store and analyze the results of AWS Glue Data Quality runs
  4. An AWS Glue database and table created from the NYC Taxi dataset

Steps:

  1. Open the AWS CloudFormation console.
  2. Choose Create stack and then select With new resources (standard).
  3. For Template source, choose Upload a template file, and provide the template file attached to this post. Then choose Next.
  4. Leave Stack name, DataQualityDatabase, and DataQualityTable as the defaults. For DataQualityS3BucketName, enter the name of your S3 bucket. Then choose Next.
  5. On the final screen, acknowledge that the stack creates IAM resources for you, and choose Submit.
  6. Once the stack is successfully created, navigate to the S3 bucket created by the stack and upload the yellow_tripdata_2022-01.parquet file.

Start an AWS Glue Data Quality run on your data in AWS Glue Data Catalog

In this first section, we will generate data quality rule recommendations from the AWS Glue Data Quality service. Using these recommendations, we will then run a data quality task against our dataset to obtain an analysis of our data.

To get started, complete the following steps:

  1. Open AWS Glue console.
  2. Choose Tables under Data Catalog.
  3. Select the DataQualityTable table created via the CloudFormation stack.
  4. Select the Data quality tab.
  5. Choose Recommend ruleset.
  6. On the Recommend data quality rules page, check Save recommended rules as a ruleset. This will allow us to save the recommended rules in a ruleset automatically, for use in the next steps.
  7. For IAM Role, choose the IAM role that was created from the CloudFormation stack.
  8. For Additional configurations -optional, leave the default number of workers and timeout.
  9. Choose Recommend ruleset. This will start a data quality recommendation run, with the given number of workers.
  10. Wait for the ruleset to be completed.
  11. Once completed, navigate back to the Rulesets tab. You should see a successful recommendation run and a ruleset created.

Understand AWS Glue Data Quality recommendations

AWS Glue Data Quality recommendations are suggestions generated by the AWS Glue Data Quality service and are based on the shape of your data. These recommendations automatically take into account aspects of your data such as row counts, means, and standard deviations, and generate a set of rules for you to use as a starting point.

The dataset used here is the NYC Taxi dataset. Based on the columns in this dataset and the values of those columns, AWS Glue Data Quality recommends a set of rules. In total, the recommendation service automatically took all the columns of the dataset into consideration and recommended 55 rules.

Some of these rules are:

  • RowCount between <> and <> → Expect the number of rows to fall within a range based on the observed data
  • ColumnValues “VendorID” in [ ] → Expect the “VendorID” column values to fall within a specific set of values
  • IsComplete “VendorID” → Expect the “VendorID” column to contain non-null values

How do I use the recommended AWS Glue Data Quality rules?

  1. From the Rulesets section, you should see your generated ruleset. Select the generated ruleset, and choose Evaluate ruleset.
    • If you didn’t check the box to Save recommended rules as a ruleset when you ran the recommendation, you can still click on the recommendation task run and copy the rules to create a new ruleset
  2. For Data quality actions under Data quality properties, select Publish metrics to Amazon CloudWatch. If this box isn’t checked, the data quality run will not publish metrics to Amazon CloudWatch.
  3. For IAM role, select the GlueDataQualityBlogRole created in the AWS CloudFormation stack.
  4. For Requested number of workers under Advanced properties, leave as default. 
  5. For Data quality results location, select the GlueDataQualityResultsS3Bucket location that was created via the AWS CloudFormation stack.
  6. Choose Evaluate ruleset.
  7. Once the run begins, you can see the status of the run on the Data quality results tab.
  8. After the run reaches a successful stage, select the completed data quality task run, and view the data quality results shown in Run results.

Our recommendation service suggested that we enforce 55 rules, based on the column values and the data within our NYC Taxi dataset. We then converted the collection of 55 rules into a ruleset and ran a data quality evaluation task against our dataset using that ruleset. In the run results above, we can see the status of each rule within the ruleset.

You can also utilize the AWS Glue Data Quality APIs to carry out these steps.
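
As a rough sketch of that API path, the recommendation and evaluation runs could be started with boto3 along the following lines. The database, table, role ARN, and ruleset name are placeholders, and the request shapes should be checked against the current AWS Glue Data Quality documentation.

    import boto3

    glue = boto3.client("glue")

    # Placeholder values: replace with the database, table, and IAM role from your stack.
    data_source = {"GlueTable": {"DatabaseName": "dataqualitydatabase", "TableName": "dataqualitytable"}}
    role_arn = "arn:aws:iam::111122223333:role/GlueDataQualityBlogRole"

    # Start a rule recommendation run against the Data Catalog table.
    recommendation_run = glue.start_data_quality_rule_recommendation_run(
        DataSource=data_source,
        Role=role_arn,
        NumberOfWorkers=5,
    )

    # After the recommendation run completes and its rules are saved as a ruleset,
    # evaluate that ruleset against the same table.
    evaluation_run = glue.start_data_quality_ruleset_evaluation_run(
        DataSource=data_source,
        Role=role_arn,
        RulesetNames=["my-recommended-ruleset"],  # placeholder ruleset name
    )

    print(recommendation_run["RunId"], evaluation_run["RunId"])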

Get Amazon SNS notifications for my failing data quality runs through Amazon CloudWatch alarms

Each AWS Glue Data Quality evaluation run from the Data Catalog emits a pair of metrics per run: glue.data.quality.rules.passed (the number of rules that passed) and glue.data.quality.rules.failed (the number of rules that failed). These metrics can be used to create alarms that alert users if a given data quality run falls below a threshold.
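
If you prefer to create the alarm programmatically rather than through the console steps below, a minimal boto3 sketch could look like the following. The alarm name, SNS topic ARN, and dimension names and values are placeholders; copy the exact dimensions shown for your metric in the CloudWatch console.

    import boto3

    cloudwatch = boto3.client("cloudwatch")

    cloudwatch.put_metric_alarm(
        AlarmName="myFirstDQAlarm",
        Namespace="Glue Data Quality",
        MetricName="glue.data.quality.rules.failed",
        # Placeholder dimensions: use the exact names and values that appear
        # with the metric in the CloudWatch console for your table and ruleset.
        Dimensions=[{"Name": "dataQualityTableName", "Value": "dataqualitytable"}],
        Statistic="Sum",
        Period=60,
        EvaluationPeriods=1,
        Threshold=1,
        ComparisonOperator="GreaterThanOrEqualToThreshold",
        TreatMissingData="notBreaching",  # treat missing data as good
        AlarmActions=["arn:aws:sns:us-east-1:111122223333:my-dq-alerts"],  # placeholder SNS topic
    )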

To get started with setting up an alarm that would send an email via an Amazon SNS notification, follow the steps below:

  1. Open Amazon CloudWatch console.
  2. Choose All metrics under Metrics. You will see an additional namespace under Custom namespaces titled Glue Data Quality.

Note: When starting an AWS Glue Data Quality run, make sure that the Publish metrics to Amazon CloudWatch checkbox is selected. Otherwise, metrics for that particular run will not be published to Amazon CloudWatch.

  1. Under the Glue Data Quality namespace, you should be able to see metrics being emitted per table, per ruleset. For the purposes of this blog post, we will use the glue.data.quality.rules.failed metric and alarm if its value reaches 1 or more (indicating that, if we see one or more failed rule evaluations, we would like to be notified).
  2. In order to create the alarm, choose All alarms under Alarms.
  3. Choose Create alarm.
  4. Choose Select metric.
  5. Select the glue.data.quality.rules.failed metric corresponding to the table you’ve created, then choose Select metric.
  6. Under the Specify metric and conditions tab, under the Metrics section:
    1. For Statistic, select Sum.
    2. For Period, select 1 minute.
  7. Under the Conditions section:
    1. For Threshold type, choose Static.
    2. For Whenever glue.data.quality.rules.failed is…, select Greater/Equal.
    3. For than…, enter 1 as the threshold value.
    4. Expand the Additional configurations dropdown and select Treat missing data as good

These selections imply that if the glue.data.quality.rules.failed metric emits a value greater than or equal to 1, we will trigger an alarm. However, if there is no data, we will treat it as acceptable.

  1. Choose Next.
  2. On Configure actions:
    1. For the Alarm state trigger section, select In alarm.
    2. For Send a notification to the following SNS topic, choose Create a new topic to send a notification via a new SNS topic.
    3. For Email endpoints that will receive the notification…, enter your email address. Choose Next.
  3. For Alarm name, enter myFirstDQAlarm, then choose Next.
  4. Finally, you should see a summary of all the selections on the Preview and create screen. Choose Create alarm at the bottom.
  5. You should now be able to see the alarm being created from the Amazon CloudWatch alarms dashboard.

In order to demonstrate AWS Glue Data Quality alarms, we are going to go over a real-world scenario in which corrupted data is ingested, and show how we can use the AWS Glue Data Quality service to get notified of this by means of the alarm we created in the previous steps. For this purpose, we will use the provided file malformed_yellow_tripdata.parquet, which contains data that has been tweaked on purpose.

  1. Navigate to the S3 location DataQualityS3BucketName mentioned in the CloudFormation template supplied at the beginning of the blog post.
  2. Upload the malformed_yellow_tripdata.parquet file to this location. This will help us simulate a flow where we have a file with poor data quality coming into our data lakes via our ETL processes.
  3. Navigate to the AWS Glue Data Catalog console, select the demo_nyc_taxi_data_input that was created via the provided AWS CloudFormation template and then navigate to the Data quality tab.
  4. Select the RuleSet we had created in the first section. Then select Evaluate ruleset.
  5. From the Evaluate data quality screen:
    1. Check the box to Publish metrics to Amazon CloudWatch. This checkbox is needed to ensure that the failure metrics are emitted to Amazon CloudWatch.
    2. Select the IAM role created via the AWS CloudFormation template.
    3. Optionally, select an S3 location to publish your AWS Glue Data Quality results.
    4. Select Evaluate ruleset.
  6. Navigate to the Data quality results tab. You should now see two runs, one from the previous steps of this blog post and one that we just triggered. Wait for the current run to complete.
  7. As you see, we have a failed AWS Glue Data Quality run result, with only 52 of our original 55 rules passing. These failures are attributed to the new file we uploaded to S3.
  8. Navigate to the Amazon CloudWatch console and select the alarm we created at the beginning of this section.
  9. As you can see, we configured the alarm to fire whenever the glue.data.quality.rules.failed metric reaches a threshold of 1 or more. After the AWS Glue Data Quality run above, we see 3 rules failing, which triggered the alarm. You should also have received an email with details about the alarm.

We have thus demonstrated an example in which malformed data coming into our data lake can be identified via AWS Glue Data Quality rules, and alerting mechanisms can be created to notify the appropriate personas.

Analyze your AWS Glue Data Quality run results through Amazon Athena

In scenarios where you have multiple AWS Glue Data Quality run results against a dataset over a period of time, you might want to track the trends of the dataset’s quality. To achieve this, we can export our AWS Glue Data Quality run results to S3 and use Amazon Athena to run analytical queries against the exported runs. The results can then be used in Amazon QuickSight to build dashboards that provide a graphical representation of your data quality trends.

In this third section of the post, we will go over the steps needed to start tracking your dataset’s quality:

  1. For our data quality runs that we set up in the previous sections, we set the Data quality results location parameter to the bucket location specified by the AWS CloudFormation stack.
  2. After each successful run, you should see a single JSONL file being exported to your selected S3 location, corresponding to that particular run.
  3. Open the Amazon Athena console.
  4. In the query editor, run the following statements (replace <my_table_name> with a relevant value, and <GlueDataQualityResultsS3Bucket_from_cfn> with the GlueDataQualityResultsS3Bucket value from the provided AWS CloudFormation template); the MSCK REPAIR TABLE statement at the end loads the partitions:
    CREATE EXTERNAL TABLE `<my_table_name>`(
    `catalogid` string,
    `databasename` string,
    `tablename` string,
    `dqrunid` string,
    `evaluationstartedon` timestamp,
    `evaluationcompletedon` timestamp,
    `rule` string,
    `outcome` string,
    `failurereason` string,
    `evaluatedmetrics` string)
    PARTITIONED BY (
    `year` string,
    `month` string,
    `day` string)
    ROW FORMAT SERDE
    'org.openx.data.jsonserde.JsonSerDe'
    WITH SERDEPROPERTIES (
    'paths'='catalogId,databaseName,dqRunId,evaluatedMetrics,evaluationCompletedOn,evaluationStartedOn,failureReason,outcome,rule,tableName')
    STORED AS INPUTFORMAT
    'org.apache.hadoop.mapred.TextInputFormat'
    OUTPUTFORMAT
    'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
    LOCATION
    's3://<GlueDataQualityResultsS3Bucket_from_cfn>/'
    TBLPROPERTIES (
    'classification'='json',
    'compressionType'='none',
    'typeOfData'='file');
    
    MSCK REPAIR TABLE `<my_table_name>`;

  5. Once the above table is created, you should be able to run queries to analyze your data quality results.

For example, consider the following query that shows me the failed AWS Glue Data Quality runs against my table demo_nyc_taxi_data_input within a time window:

SELECT * from "<my_table_name>"
WHERE "outcome" = 'Failed'
AND "tablename" = '<my_source_table>'
AND "evaluationcompletedon" between
parse_datetime('2022-12-05 17:00:00:000', 'yyyy-MM-dd HH:mm:ss:SSS') AND parse_datetime('2022-12-05 20:00:00:000', 'yyyy-MM-dd HH:mm:ss:SSS');

The output of the above query shows me details about all the runs with “outcome” = 'Failed' that ran against my NYC Taxi dataset table (“tablename” = 'demo_nyc_taxi_data_input'). The output also gives me information about the failure reason (failurereason) and the values each rule was evaluated against (evaluatedmetrics).

As you can see, via the run results uploaded to S3, we are able to get detailed information about our AWS Glue Data Quality runs, perform more detailed analysis, and build dashboards on top of the data.
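
If you want to automate this kind of trend analysis, the same queries can be submitted programmatically. The following sketch assumes the Athena table created above already exists; the database name, results table name, and query results location are placeholders.

    import boto3

    athena = boto3.client("athena")

    # Count failed versus total rule evaluations per day from the exported run results.
    query = """
    SELECT date_format(evaluationcompletedon, '%Y-%m-%d') AS run_day,
           count_if(outcome = 'Failed') AS failed_rules,
           count(*) AS total_rules
    FROM my_dq_results_table
    GROUP BY 1
    ORDER BY 1
    """

    response = athena.start_query_execution(
        QueryString=query,
        QueryExecutionContext={"Database": "default"},  # placeholder database
        ResultConfiguration={"OutputLocation": "s3://my-athena-query-results/"},  # placeholder bucket
    )
    print(response["QueryExecutionId"])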

Clean up

  • Navigate to the Amazon Athena console and delete the table created for data quality analysis.
  • Navigate to the Amazon CloudWatch console and delete the alarms created.
  • If you deployed the sample CloudFormation stack, delete the CloudFormation stack via the AWS CloudFormation console. You will need to empty the S3 bucket before you delete the bucket.
  • If you have enabled your AWS Glue Data Quality runs to output to S3, empty those buckets as well.

Conclusion

In this post, we talked about the ease and speed of incorporating data quality rules into your AWS Glue Data Catalog tables using the AWS Glue Data Quality feature. We also talked about how to run recommendations and evaluate data quality against your tables. We then discussed analyzing the data quality results via Amazon Athena, and the process for setting up alarms via Amazon CloudWatch in order to notify users of failed data quality runs.

To dive into the AWS Glue Data Quality APIs, take a look at the AWS Glue Data Quality API documentation.
To learn more about AWS Glue Data Quality, check out the AWS Glue Data Quality Developer Guide.


About the authors

Aniket Jiddigoudar is a Big Data Architect on the AWS Glue team.

Joseph Barlan is a Frontend Engineer at AWS Glue. He has over 5 years of experience helping teams build reusable UI components and is passionate about frontend design systems. In his spare time, he enjoys pencil drawing and binge-watching TV shows.

Authority to operate (ATO) on AWS Program now available for customers in Spain

Post Syndicated from Greg Herrmann original https://aws.amazon.com/blogs/security/authority-to-operate-on-aws-program-now-available-for-customers-in-spain/

Meeting stringent security and compliance requirements in regulated or public sector environments can be challenging and time consuming, even for organizations with strong technical competencies. To help customers navigate the different requirements and processes, we launched the ATO on AWS Program in June 2019 for US customers. The program involves a community of expert AWS partners to help support and accelerate customers’ ability to meet their security and compliance obligations.

We’re excited to announce that we have now expanded the ATO on AWS Program to Spain. As part of the launch in Spain, we recruited and vetted five partners with a demonstrated competency in helping customers meet Spanish and European Union (EU) regulatory compliance and security requirements, such as the General Data Protection Regulation (GDPR), Esquema Nacional de Seguridad (ENS), and European Banking Authority guidelines.

How does the ATO on AWS Program support customers?

The primary offering of the ATO on AWS Program is access to a community of vetted, expert partners that specialize in customers’ authorization needs, whether it be architecting, configuring, deploying, or integrating tools and controls. The team also provides direct engagement activities to introduce you to publicly available and no-cost resources, tools, and offerings so you can work to meet your security obligations on AWS. These activities include one-on-one meetings, answering questions, technical workshops (in specific cases), and more.

Who are the partners?

Partners in the ATO on AWS Program go through a rigorous evaluation conducted by a team of AWS Security and Compliance experts. Before acceptance into the program, the partners complete a checklist of criteria and provide detailed evidence that they meet those criteria.

Our initial launch in Spain includes the following five partners that have successfully met the criteria to join the program. Each partner has also achieved the Esquema Nacional de Seguridad certification.

  • ATOS – a global leader in digital transformation, cybersecurity, and cloud and high performance computing. ATOS was ranked #1 in Managed Security Services (MSS) revenue by Gartner in 2021.
  • Indra Sistemas – a global technology and consulting company that provides proprietary solutions for the transport and defense markets. It also offers digital transformation consultancy and information technologies in Spain and Latin America through its affiliate Minsait.
  • NTT Data EMEAL – an operational company created from an alliance between everis and NTT DATA EMEAL to support clients in Europe and Latin America. NTT Data EMEAL supports customers through strategic consulting and advisory services, new technologies, applications, infrastructure, IT modernization, and business process outsourcing (BPO).
  • Telefónica Tech – a leading company in digital transformation. Telefónica Tech combines cybersecurity and cloud technologies to help simplify technological environments and build appropriate solutions for customers.
  • T-Systems – a leading service provider for the public sector in Spain. As an AWS Premier Tier Services Partner and Managed Service Provider, T-Systems maintains the Security and Migration Competencies, supporting customers with migration and secure operation of applications.

For a complete list of ATO on AWS Program partners, see the ATO on AWS Partners page.

Engage the ATO on AWS Program

Customers seeking support can engage the ATO on AWS Program and our partners in multiple ways. The best way to reach us is to complete a short, online ATO on AWS Questionnaire so we can learn more about your timeline and goals. If you prefer to engage AWS partners directly, see the complete list of our partners and their contact information at ATO on AWS Partners.

 
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Greg Herrmann

Greg has worked in the security and compliance field for over 18 years, supporting both classified and unclassified workloads for U.S. federal and DoD customers. He has been with AWS for more than 6 years as a Senior Security Partner Strategist for the Security and Compliance Partner Team, working with AWS partners and customers to accelerate security and compliance processes.

Borja Larrumbide

Borja is a Security Assurance Manager for AWS in Spain and Portugal. Previously, he worked at companies such as Microsoft and BBVA in different roles and sectors. Borja is a seasoned security assurance practitioner with years of experience engaging key stakeholders at national and international levels. His areas of interest include security, privacy, risk management, and compliance.

How to use Amazon Verified Permissions for authorization

Post Syndicated from Jeremy Ware original https://aws.amazon.com/blogs/security/how-to-use-amazon-verified-permissions-for-authorization/

Applications with multiple users and shared data require permissions management. The permissions describe what each user of an application is permitted to do. Permissions are defined as allow or deny decisions for resources in the application.

To manage permissions, developers often combine attribute-based access control (ABAC) and role-based access control (RBAC) models with custom code coupled with business logic. This requires a review of the code to understand the permissions, and changes to the code to modify the permissions. Auditing permissions within an application can require the same level of time and effort as a full application code review. This can cause delays to deliver and require additional time and resources to ascertain permissions across your application.

In this post, I will show you how to use Amazon Verified Permissions to define permissions within custom applications using the Cedar policy language. I’ll also show you how authorization requests are made.

Overview of Amazon Verified Permissions

Amazon Verified Permissions provides a prebuilt, flexible permissions system that you can use to build permissions based on both ABAC and RBAC in your applications. You define and manage fine-grained permissions using both permit policies, which grant permissions, and forbid policies, which restrict actions. This lets you focus on building or modernizing the application.

Amazon Verified Permissions maintains a centralized policy store, which helps you manage permissions throughout an application, authorize actions, and analyze permissions with automated reasoning. It also has an evaluation simulator tool to help you test your authorization decisions and author policies.

Policy creation

To author policies with Amazon Verified Permissions, use the purpose-built Cedar policy language to create specific permission policies that include traits of ABAC and RBAC. This allows you to apply granularity with least privilege in mind.

The following figure shows a permission policy for a document management application. In the figure, between the set of parentheses on lines 1-4 of the policy, RBAC is used, based on the principal’s UserGroup, to limit the permit action to registered users—and not guest or machine principals, for example. Between the brackets on lines 5–7 of the policy, ABAC is used, where resource.owner == principal limits access to the resource to only the owner.

Figure 1: Using the Cedar policy language to create permissions

Policies are developed in two ways:

  • Developers build out policies as part of the deployment of the application – Policy permissions that are defined as part of deployment are a great way for developers to set up guardrails on actions that should not cross set boundaries.
  • Policies are created through the use of the application by end users – Policy permissions that are configurable within the application provide the freedom for data to be shared between users.

We will walk you through these two approaches in the following sections.

Create policies as part of the deployment of the application

The following figure shows how a developer can configure a permit policy as part of the deployment of an application.

Figure 2: Creating policies as part of the deployment of the application

Policies configured by developers with pre-defined permissions that are deployed alongside the application is a familiar method for setting up guardrails in an application. Consider the document management application shown in Figure 3. There is a permit policy in place that allows users to view their own documents. Without a policy, the default result is a deny. You should also configure explicit forbid policies to act as guardrails to prevent overly permissive policies. In Figure 3, the policy restricts a user to only GET documents that they own or that are not tagged as private.

Figure 3: Example of a permit policy using Cedar

Create policies within the application by end users

The following figure shows how end users can apply policies within the application.

Figure 4: How permissions can be applied using policies for application end users

In a document sharing application, the application usually provides a simple end-user experience with a menu containing point-and-click actions that allow the user to select predefined permissions, such as read, write, or delete. Abstracted by the application, these permissions are transformed into Amazon Verified Permissions policy statements and stored in the designated policy location for the application. When an end user tries to take actions protected by these permissions, the application queries the Amazon Verified Permissions backend to determine if the principal in question has permissions to do so.
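
To make that flow concrete, the following is a minimal sketch of how an application backend might request an authorization decision with the AWS SDK for Python. The policy store ID, entity types, and action name are illustrative placeholders, and the request shape should be verified against the current Amazon Verified Permissions API reference.

    import boto3

    avp = boto3.client("verifiedpermissions")

    # Ask whether user "alice" is allowed to perform the GET action on a given document.
    # All identifiers below are placeholders.
    decision = avp.is_authorized(
        policyStoreId="PSEXAMPLEabcd1234",
        principal={"entityType": "DocumentApp::User", "entityId": "alice"},
        action={"actionType": "DocumentApp::Action", "actionId": "GET"},
        resource={"entityType": "DocumentApp::Document", "entityId": "doc-42"},
    )

    if decision["decision"] == "ALLOW":
        print("Request permitted")
    else:
        print("Request denied")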

You can allow users of the application to create policies directly with respect to their given environments or current permissions. For example, if the application is targeted to system administrators or engineers who are technically proficient, you might choose not to hide the policy generation process behind a UI. The Amazon Verified Permissions policy grammar is designed for users comfortable with text-based query languages. Figure 5 shows an example policy that allows a user to GET or POST documents that they own.

Figure 5: Amazon Verified Permissions policy grammar written with Cedar to define permissions

Conclusion

Amazon Verified Permissions is a scalable, fine-grained permissions management and authorization service that helps you build and modernize applications without relying heavily on coding authorization within your applications. By using the Cedar policy language, you can define granular access controls that use both RBAC and ABAC and help end users create policies within the application. This allows for alignment of authorization standards across applications and provides clear visibility into existing permissions for review and auditability.

To learn more about ABAC and RBAC and how to design policy statements, see the blog post Get the best out of Amazon Verified Permissions by using fine-grained authorization methods.

 
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Jeremy Ware

Jeremy is a Security Specialist Solutions Architect focused on Identity and Access Management. Jeremy and his team enable AWS customers to implement sophisticated, scalable, and secure IAM architectures and authentication workflows to solve business challenges. With a background in security engineering, Jeremy has spent many years working to close the security maturity gap at numerous global enterprises. Outside of work, Jeremy loves to explore the mountainous outdoors and participate in sports such as snowboarding, wakeboarding, and dirt bike riding.

AWS achieves GNS Portugal certification for classified information

Post Syndicated from Rodrigo Fiuza original https://aws.amazon.com/blogs/security/aws-achieves-gns-portugal-certification-for-classified-information/

We continue to expand the scope of our assurance programs at Amazon Web Services (AWS), and we are pleased to announce that our Regions and AWS Edge locations in Europe are now certified by the Portuguese GNS/NSO (National Security Office) at the National Restricted level. This certification demonstrates our ongoing commitment to adhere to the heightened expectations for cloud service providers to process, transmit, and store classified data.

The GNS certification is based on NIST SP800-53 R4 and CSA CCM v4 frameworks, with the goal of protecting the processing and transmission of classified information.

AWS was evaluated by Adyta Lda, an independent third-party auditor, and by GNS Portugal. The Certificate of Compliance illustrating the compliance status of AWS is available on the GNS Certifications page and through AWS Artifact. AWS Artifact is a self-service portal for on-demand access to AWS compliance reports. Sign in to AWS Artifact in the AWS Management Console, or learn more at Getting Started with AWS Artifact.

As of this writing, 26 services offered in Europe are in scope of this certification. For up-to-date information, including when additional services are added, see the AWS Services in Scope by Compliance Program and select GNS.

AWS strives to continuously bring services into the scope of its compliance programs to help you meet your architectural and regulatory needs. If you have questions or feedback about GNS Portugal compliance, reach out to your AWS account team.

To learn more about our compliance and security programs, see AWS Compliance Programs. As always, we value your feedback and questions; reach out to the AWS Compliance team through the Contact Us page.

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Rodrigo Fiuza

Rodrigo is a Security Audit Manager at AWS, based in São Paulo. He leads audits, attestations, certifications, and assessments across Latin America, the Caribbean, and Europe. Rodrigo has worked in risk management, security assurance, and technology audits for the past 12 years.

Approaches for authenticating external applications in a machine-to-machine scenario

Post Syndicated from Patrick Sard original https://aws.amazon.com/blogs/security/approaches-for-authenticating-external-applications-in-a-machine-to-machine-scenario/

December 8, 2022: This post has been updated to reflect changes to machine-to-machine (M2M) options with the new IAM Roles Anywhere (IAMRA) service. This blog post was first published November 19, 2013.

August 10, 2022: This blog post has been updated to reflect the new name of AWS Single Sign-On (SSO) – AWS IAM Identity Center. Read more about the name change here.


Amazon Web Services (AWS) supports multiple authentication mechanisms (AWS Signature v4, OpenID Connect, SAML 2.0, and more), which are essential in providing secure access to AWS resources. However, in a strictly machine-to-machine (m2m) scenario, not all of them are a good fit. In these cases, a human is not present to provide user credential input. An example of such a scenario is when an on-premises application sends data to an AWS environment, as shown in Figure 1.

This post is designed to help you decide which approach is best to securely connect your applications, either residing on premises or hosted outside of AWS, to your AWS environment when no human interaction comes into play. We will go through the various alternatives available and highlight the pros and cons of each.

Figure 1: Securely connect your external applications to AWS in machine-to-machine scenarios

Determining the best approach

Let’s start by looking at possible authentication mechanisms that AWS supports in the following table. We’ll first identify the AWS service or services where the authentication can be set up—called the AWS front-end service. Then we’ll point out the AWS service that actually handles the authentication with AWS in the background—called the AWS backend service. We will also assess each mechanism based on use case.

Table 1: Authentication mechanisms available in AWS

Authentication mechanism | AWS front-end service | AWS backend service | Good for m2m communication?
AWS Signature v4 | All | AWS Security Token Service (AWS STS) | Yes
Mutual TLS | Amazon API Gateway | AWS STS | Yes
OpenID Connect | Amazon API Gateway | AWS STS | Yes
SAML | AWS Identity and Access Management (IAM) | AWS STS | Yes
Kerberos | n/a | AWS STS | Yes
Microsoft Active Directory communication | AWS IAM Identity Center | AWS STS | No
IAM Roles Anywhere | IAM Roles Anywhere | AWS STS | Yes

We’ll now review each of these alternatives and also evaluate two additional characteristics on a 5-grade scale (from very low to very high) for each authentication mechanism:

  • Complexity: How complex is it to implement the authentication mechanism?
  • Convenience: How convenient is it to use the authentication mechanism on an ongoing basis?

As you’ll see, not all of the mechanisms are necessarily a good fit for a machine-to-machine scenario. Our focus here is on authentication of external applications, but not authentication of servers or other computers or Internet of Things (IoT) devices, which has already been documented extensively.

Active Directory–based authentication is available through either AWS IAM Identity Center or a limited set of AWS services and is meant in both cases to provide end users with access to AWS accounts and business applications. Active Directory–based authentication is also used broadly to authenticate devices such as Windows or Linux computers on a network. However, it isn’t used for authenticating applications with AWS. For that reason, we’ll exclude it from further scrutiny in this article.

Let’s look at the remaining authentication mechanisms one by one, with their respective pros and cons.

AWS Signature v4

The purpose of AWS Signature v4 is to authenticate incoming HTTP(S) requests to AWS services APIs. The AWS Signature v4 process is explained in detail in the documentation for the AWS APIs but, in a nutshell, the caller computes a signature using their credentials and then adds it to the header of the HTTP(S) request. On the other end, AWS accepts the request only if the provided signature is valid.

Figure 2: AWS Signature v4 authentication

Native to AWS, low in complexity and highly convenient, AWS Signature v4 is the natural choice for machine-to-machine authentication scenarios with AWS. It is used behind the scenes by the AWS Command Line Interface (AWS CLI) and the AWS SDKs.
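
As a small illustration, the following Python sketch signs a request with the SigV4 implementation that ships with botocore, which is the same code path the AWS CLI and SDKs use. The STS endpoint, Region, and the use of the requests library are example choices.

    import boto3
    import requests
    from botocore.auth import SigV4Auth
    from botocore.awsrequest import AWSRequest

    # Resolve credentials the same way the CLI does (environment, profile, instance role, and so on).
    credentials = boto3.Session().get_credentials().get_frozen_credentials()

    url = "https://sts.us-east-1.amazonaws.com/?Action=GetCallerIdentity&Version=2011-06-15"
    request = AWSRequest(method="GET", url=url)

    # Compute the AWS Signature v4 and add the Authorization and X-Amz-Date headers.
    SigV4Auth(credentials, "sts", "us-east-1").add_auth(request)

    response = requests.get(url, headers=dict(request.headers))
    print(response.status_code)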

Pros

  • AWS Signature v4 is very convenient: the signature process is built into the SDKs provided by AWS and the signature is automatically computed on the caller’s behalf. If you prefer not to use an SDK, the signature process is a simple computation that can be implemented in any programming language.
  • There are fewer credentials to manage. No need to manage tedious digital certificates or even long-lived AWS credentials, because the AWS Signature v4 process supports temporary AWS credentials.
  • There is no need to interact with a third-party identity provider: once the request is signed, you’re good to go, provided that the signature is valid.

Cons

  • If you prefer not to store long-lived AWS credentials for your on-premises applications, you must first perform authentication through a third-party identity provider to obtain temporary AWS credentials. This would require using either OpenID Connect or SAML, in addition to AWS Signature v4. You could also use IAM Roles Anywhere, which exchanges a trusted certificate for temporary AWS credentials.

Mutual TLS

Mutual TLS, more specifically the mutual authentication mechanism of the Transport Layer Security (TLS) Protocol, allows the authentication of both ends—the client and the server sides—of a communication channel. By default, the server side of the TLS channel is always authenticated. With mutual TLS, the clients must also present a valid X.509 certificate for their identity to be verified.

Amazon API Gateway has recently announced native support for mutual TLS authentication (see this blog post for more details on the new feature). You can enable mutual TLS authentication on custom domains to authenticate your regional REST and HTTP APIs (except for private or edge APIs, for which the new feature is not supported at the time of this writing).

Figure 3: Mutual TLS authentication

Mutual TLS can be both time-consuming and complicated to set up, but it is a widespread authentication mechanism.
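
For illustration only, a client call over mutual TLS might look like the following Python sketch; the custom domain and the certificate and key file paths are placeholders issued by your own CA.

    import requests

    # Placeholder custom domain configured for mutual TLS on API Gateway.
    url = "https://api.example.com/orders"

    # Client certificate and private key issued by your CA (placeholder paths).
    client_cert = ("client-cert.pem", "client-key.pem")

    # The client certificate is presented during the TLS handshake; the connection
    # is rejected if the certificate is not trusted by the configured truststore.
    response = requests.get(url, cert=client_cert, timeout=10)
    print(response.status_code)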

Pros

  • Mutual TLS is widespread for IoT and business-to-business applications.

Cons

  • You need to manage the digital certificates and their lifecycles. This can add significant burden and complexity to your IT operations.
  • You also need, at the application level, to pay special attention to revoked certificates to reduce the risk of misuse. Since API Gateway doesn’t automatically verify whether a client certificate has been revoked, you have to implement your own logic to do so, such as by using a Lambda authorizer.

OpenID Connect

OpenID Connect (OIDC), specifically OIDC 1.0, is a standard built on top of the OAuth 2.0 authorization framework to provide authentication for mobile and web-based applications. The OIDC client authentication method can be used by a client application to gain access to APIs exposed through Amazon API Gateway. The client application typically authenticates to an OAuth 2.0 authorization server, such as Amazon Cognito or another solution supporting that standard. As a result, the client application obtains a JSON Web Token (JWT) from the OAuth 2.0 authorization server. API Gateway then allows or denies the request based on the JWT validation. For more information about the access control part of this process, see the Amazon API Gateway documentation.

Figure 4: OIDC client authentication

OIDC can be complex to put in place, but it’s a widespread authentication mechanism, especially for mobile and web applications and microservices architecture, including machine-to-machine scenarios.
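
The client credentials flow described above could be sketched as follows; the token endpoint, client ID, client secret, and API URL are placeholders, and your OAuth 2.0 authorization server may require additional parameters such as scopes.

    import requests

    # Placeholder OAuth 2.0 token endpoint (for example, an Amazon Cognito domain) and client credentials.
    token_url = "https://auth.example.com/oauth2/token"
    client_id = "my-client-id"
    client_secret = "my-client-secret"

    # Client credentials grant: the machine client authenticates with its own credentials
    # and receives a JSON Web Token (JWT).
    token_response = requests.post(
        token_url,
        data={"grant_type": "client_credentials"},
        auth=(client_id, client_secret),
        timeout=10,
    )
    access_token = token_response.json()["access_token"]

    # Present the JWT as a bearer token when calling the API protected by API Gateway.
    api_response = requests.get(
        "https://api.example.com/orders",
        headers={"Authorization": f"Bearer {access_token}"},
        timeout=10,
    )
    print(api_response.status_code)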

Pros

  • With OIDC, you avoid storing long-lived AWS credentials for your on-premises applications.
  • OIDC uses REST or JSON message flows over HTTP, which makes it a particularly good fit (compared to SAML) for application developers today.

Cons

  • You need to store and maintain a set of credentials (such as a client ID and client secret) for each client application and make them accessible to the application. This can add complexity to your IT operations.

SAML

SAML 2.0 is an open standard for exchanging identity and security information between applications and service providers. SAML can be used to delegate authentication to a third-party identity provider, such as an Active Directory environment that is running on premises, and to gain access to AWS by providing a valid SAML assertion. (See About SAML 2.0-based federation to learn how to configure your AWS environment to leverage SAML 2.0.)

IAM validates the SAML assertion with your identity provider and, upon success, provides a set of AWS temporary credentials to the requesting party. The whole process is described in the IAM documentation.

Figure 5: SAML authentication

SAML can be complex to put in place, but it’s a versatile authentication mechanism that can fit a lot of different use cases, including machine-to-machine scenarios.

Pros

  • With SAML, you not only avoid storing long-lived AWS credentials for your on-premises applications, but you can also use an existing on-premises directory, such as Active Directory, as an identity provider.
  • SAML doesn’t prescribe any particular technology or protocol by which the authentication should take place. The developer has total freedom to employ whichever is more convenient or makes more sense: key-based (such as X.509 certificates), ticket-based (such as Kerberos), or another applicable mechanism.
  • SAML is also a good fit when protocol bindings other than HTTP are needed.

Cons

  • Using SAML with AWS requires a third-party identity provider for your on-premises environment.
  • SAML also requires a trust to be established between your identity provider and your AWS environment, which adds more complexity to the process.
  • Because SAML is XML-based, it isn’t as concise or nimble as AWS Signature v4 or OIDC, for example.
  • You need to manage the SAML assertions and their lifecycles. This can add significant burden and complexity to your IT operations.
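To illustrate the exchange step, here is a minimal sketch, in Python with boto3, of trading a SAML assertion (already obtained from your identity provider and base64-encoded) for temporary AWS credentials through AWS STS. The role ARN, identity provider ARN, Region, and file name are placeholders for illustration.

    import boto3

    # Assertion previously obtained from your identity provider, base64-encoded.
    with open("saml-assertion.b64") as f:
        saml_assertion = f.read().strip()

    sts = boto3.client("sts", region_name="eu-west-1")  # placeholder Region

    response = sts.assume_role_with_saml(
        RoleArn="arn:aws:iam::111122223333:role/OnPremAppRole",          # placeholder
        PrincipalArn="arn:aws:iam::111122223333:saml-provider/CorpIdP",  # placeholder
        SAMLAssertion=saml_assertion,
        DurationSeconds=3600,
    )

    # The temporary credentials can now be used to sign AWS API calls directly.
    creds = response["Credentials"]
    s3 = boto3.client(
        "s3",
        region_name="eu-west-1",
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )
    print([bucket["Name"] for bucket in s3.list_buckets()["Buckets"]])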

Kerberos

Initially developed by MIT, Kerberos v5 is an IETF standard protocol that enables client/server authentication on an unprotected network. It isn’t supported out of the box by AWS, but you can use an identity provider, such as Active Directory, to exchange the Kerberos ticket provided to your application for either an OIDC/OAuth token or a SAML assertion that can be validated by AWS.

Figure 6: Kerberos authentication (through SAML or OIDC)

Kerberos is highly complex to set up, but it can make sense in cases where you already have an on-premises environment with Kerberos authentication in place.

Pros

  • With Kerberos, you not only avoid storing long-lived AWS credentials for your on-premises applications, but you can also use an existing on-premises directory, such as Active Directory, as an identity provider.

Cons

  • Using Kerberos with AWS requires the Kerberos ticket to be converted into something that can be accepted by AWS. Therefore, it requires you to use either the OIDC or SAML authentication mechanisms, as described previously.

IAM Roles Anywhere

IAM Roles Anywhere establishes a trust between your AWS account and the certificate authority (CA) that issues certificates to your on-premises workloads using public key infrastructure (PKI). For a detailed overview, see the blog post Extend AWS IAM roles to workloads outside of AWS with IAM Roles Anywhere. Your workloads outside of AWS use IAM Roles Anywhere to exchange X.509 certificates for temporary AWS credentials in order to interact with AWS APIs, thus removing the need for long-term credentials in your on-premises applications. IAM Roles Anywhere enables short-term credentials for numerous hybrid environment use cases, including machine-to-machine scenarios.

Figure 7: IAMRA authentication process

IAM Roles Anywhere is a versatile authentication mechanism that can fit a lot of different use cases, including machine-to-machine scenarios where your on-premises workload is accessing AWS resources.

Pros

  • With IAM Roles Anywhere you avoid storing long-lived AWS credentials for your on-premises workloads.
  • You can import a certificate revocation list (CRL) from your certificate authority (CA) to support certificate revocation.

Cons

  • You need to manage the digital certificates and their lifecycles. This can add complexity to your IT operations.
  • IAM Roles Anywhere does not support callbacks to CRL distribution points (CDPs) or Online Certificate Status Protocol (OCSP) endpoints.
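As a sketch of what consuming these credentials can look like, the following Python example assumes that the AWS-provided credential helper (aws_signing_helper) is installed on the on-premises host and that a profile in ~/.aws/config delegates credential retrieval to it; the profile name, file paths, and ARNs are placeholders for illustration.

    # Assumed ~/.aws/config profile (placeholder paths and ARNs; the credential_process
    # value must be a single line in the real file and is wrapped here only for readability):
    #
    #   [profile onprem-workload]
    #   credential_process = aws_signing_helper credential-process
    #       --certificate /etc/pki/workload-cert.pem
    #       --private-key /etc/pki/workload-key.pem
    #       --trust-anchor-arn arn:aws:rolesanywhere:eu-west-1:111122223333:trust-anchor/EXAMPLE
    #       --profile-arn arn:aws:rolesanywhere:eu-west-1:111122223333:profile/EXAMPLE
    #       --role-arn arn:aws:iam::111122223333:role/OnPremWorkloadRole

    import boto3

    # boto3 invokes the credential helper, which exchanges the X.509 certificate for
    # temporary credentials through IAM Roles Anywhere and refreshes them as needed.
    session = boto3.Session(profile_name="onprem-workload")
    s3 = session.client("s3", region_name="eu-west-1")  # placeholder Region
    print([bucket["Name"] for bucket in s3.list_buckets()["Buckets"]])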

Conclusion

The following table summarizes this discussion, listing the AWS front-end services that support each authentication mechanism along with its relative complexity and convenience.

Authentication mechanism | AWS front-end service | Complexity | Convenience
AWS Signature v4 | All | Low | Very High
Mutual TLS | AWS IoT Core, Amazon API Gateway | Medium | High
OpenID Connect | Amazon Cognito, Amazon API Gateway | Medium | High
SAML | Amazon Cognito, AWS Identity and Access Management (IAM) | High | Medium
Kerberos | n/a | Very High | Low
IAM Roles Anywhere | AWS Identity and Access Management (IAM) | Medium | High

AWS Signature v4 is the most convenient and least complex of these mechanisms, but as with every architectural decision, it’s important to start from your own requirements and context before making a choice. Additional factors may influence your decision, such as the structure or culture of your organization, or the resources available for your project. To keep the discussion focused on a few simple factors, we’ve come up with the following actionable decision helper.

Use AWS Signature v4 when:

  • You have access to AWS credentials (temporary or long-lived)
  • You want to call AWS services directly through their APIs

Use mutual TLS when:

  • The cost and effort of maintaining digital certificates is acceptable for your organization
  • Your organization already has a process in place to maintain digital certificates
  • You plan to call AWS services indirectly through custom-built APIs

Use OpenID Connect when:

  • You need or want to procure temporary AWS credentials by using a REST-based mechanism
  • You want to call AWS services directly through their APIs

Use SAML when:

  • You need to procure temporary AWS credentials
  • You already have a SAML-based authentication process in place
  • You want to call AWS services directly through their APIs

Use Kerberos when:

  • You already have a Kerberos-based authentication process in place
  • None of the previously mentioned mechanisms can be used for your use case

Use IAMRA when:

  • The cost and effort of maintaining digital certificates is acceptable for your organization
  • Your organization already has a process in place to maintain digital certificates
  • You want to call AWS services directly through their APIs
  • You need temporary security credentials for workloads such as servers, containers, and applications that run outside of AWS

We hope this post helps you find your way among the various alternatives that AWS offers to securely connect your external applications to your AWS environment, and to select the most appropriate mechanism for your specific use case. We look forward to your feedback.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on one of the AWS Developer forums or contact AWS Support.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Authors

Patrick Sard

Patrick works as a Solutions Architect at AWS. Apart from being a cloud enthusiast, Patrick loves practicing tai-chi (preferably Chen style), enjoys an occasional wine-tasting (he trained as a Sommelier), and is an avid tennis player.

Jeremy Ware

Jeremy is a Security Specialist Solutions Architect focused on Identity and Access Management. Jeremy and his team enable AWS customers to implement sophisticated, scalable, and secure IAM architectures and authentication workflows to solve business challenges. With a background in security engineering, Jeremy has spent many years working to close the security maturity gap at numerous global enterprises. Outside of work, Jeremy loves to explore the mountainous outdoors and participate in sports such as snowboarding, wakeboarding, and dirt bike riding.

Amazon EMR launches support for Amazon EC2 C6i, M6i, I4i, R6i and R6id instances to improve cost performance for Spark workloads by 6–33%

Post Syndicated from Al MS original https://aws.amazon.com/blogs/big-data/amazon-emr-launches-support-for-amazon-ec2-c6i-m6i-i4i-r6i-and-r6id-instances-to-improve-cost-performance-for-spark-workloads-by-6-33/

Amazon EMR provides a managed service to easily run analytics applications using open-source frameworks such as Apache Spark, Hive, Presto, Trino, HBase, and Flink. The Amazon EMR runtime for Spark and Presto includes optimizations that deliver more than twice the performance of open-source Apache Spark and Presto, so that your applications run faster and at lower cost.

With Amazon EMR release 6.8, you can now use Amazon Elastic Compute Cloud (Amazon EC2) instances such as C6i, M6i, I4i, R6i, and R6id, which use the third-generation Intel Xeon scalable processors. Using these new instances with Amazon EMR improves cost-performance by an additional 5–33% over previous generation instances.

In this post, we describe how we estimated the cost-performance benefit from using Amazon EMR with these new instances compared to using equivalent previous generation instances.

Amazon EMR runtime performance improvements with EC2 I4i instances

We ran TPC-DS 3 TB benchmark queries on Amazon EMR 6.8 using the Amazon EMR runtime for Apache Spark (compatible with Apache Spark 3.3) on five-node clusters of I4i instances with data in Amazon Simple Storage Service (Amazon S3), and compared the results to equivalently sized I3 instance clusters. We measured performance improvements using the total query runtime and the geometric mean of query runtime across the TPC-DS 3 TB benchmark queries.

Our results showed between 36.41–44.39% improvement in total query runtime performance on I4i instance EMR clusters compared to equivalent I3 instance EMR clusters, and between 36–45.2% improvement in geometric mean. To measure cost improvement, we added up the Amazon EMR and Amazon EC2 cost per instance per hour (on-demand) and multiplied it by the total query runtime. Note that I4i 32XL instances were not benchmarked because I3 instances don’t have the 32 XL size available. We observed between 22.56–33.1% reduced instance hour cost on I4i instance EMR clusters compared to equivalent I3 instance EMR clusters to run the TPC-DS benchmark queries. All TPC-DS queries ran faster on I4i instance clusters compared to I3 instance clusters.

The following table shows the results from running TPC-DS 3 TB benchmark queries using Amazon EMR 6.8 over equivalent I3 and I4i instance EMR clusters.

Instance Size 16 XL 8 XL 4 XL 2 XL XL
Number of core instances in EMR cluster 5 5 5 5 5
Total query runtime on I3 (seconds) 4752.15457 4506.43694 7110.03042 11853.40336 21333.05743
Total query runtime on I4I (seconds) 2642.77407 2812.05517 4415.0023 7537.52779 12981.20251
Total query runtime improvement with I4I 44.39% 37.60% 37.90% 36.41% 39.15%
Geometric mean query runtime on I3 (sec) 34.99551 29.14821 41.53093 60.8069 95.46128
Geometric mean query runtime on I4I (sec) 19.17906 18.65311 25.66263 38.13503 56.95073
Geometric mean query runtime improvement with I4I 45.20% 36.01% 38.21% 37.29% 40.34%
EC2 I3 instance price ($ per hour) $4.990 $2.496 $1.248 $0.624 $0.312
EMR I3 instance price ($ per hour) $0.270 $0.270 $0.270 $0.156 $0.078
(EC2 + EMR) I3 instance price ($ per hour) $5.260 $2.766 $1.518 $0.780 $0.390
Cost of running on I3 ($ per instance) $6.943 $3.462 $2.998 $2.568 $2.311
EC2 I4I instance price ($ per hour) $5.491 $2.746 $1.373 $0.686 $0.343
EMR I4I price ($ per hour per instance) $1.373 $0.687 $0.343 $0.172 $0.086
(EC2 + EMR) I4I instance price ($ per hour) $6.864 $3.433 $1.716 $0.858 $0.429
Cost of running on I4I ($ per instance) $5.039 $2.681 $2.105 $1.795 $1.546
Total cost reduction with I4I including performance improvement -27.43% -22.56% -29.79% -30.09% -33.10%

The following graph shows per query improvements we observed on I4i 2XL instances with EMR Runtime for Spark on Amazon EMR version 6.8 compared to equivalent I3 2XL instances for the TPC-DS 3 TB benchmark.

Amazon EMR runtime performance improvements with EC2 M6i instances

M6i instances showed a similar performance improvement while running Apache Spark workloads compared to equivalent M5 instances. Our test results showed between 13.45–29.52% improvement in total query runtime for seven different instance sizes within the instance family, and between 15.29–32.62% improvement in geometric mean. On cost comparison, we observed 7.98–25.37% reduced instance hour cost on M6i instance EMR clusters compared to M5 instance EMR clusters to run the TPC-DS benchmark queries.

The following table shows the results from running TPC-DS 3 TB benchmark queries using Amazon EMR 6.8 over equivalent M6i and M5 instance EMR clusters.

Instance Size 24 XL 16 XL 12 XL 8 XL 4 XL 2 XL XL
Number of core instances in EMR cluster 5 5 5 5 5 5 5
Total query runtime on M5 (seconds) 4027.58043 3782.10766 3348.05362 3516.4308 5621.22532 10075.45109 17278.15146
Total query runtime on M6I (seconds) 3106.43834 2665.70607 2714.69862 3043.5975 4195.02715 8226.88301 14515.50394
Total query runtime improvement with M6I 22.87% 29.52% 18.92% 13.45% 25.37% 18.35% 15.99%
Geometric mean query runtime M5 (sec) 30.45437 28.5207 23.95314 23.55958 32.95975 49.43178 75.95984
Geometric mean query runtime M6I (sec) 23.76853 19.21783 19.16869 19.9574 24.23012 39.09965 60.79494
Geometric mean query runtime improvement with M6I 21.95% 32.62% 19.97% 15.29% 26.49% 20.90% 19.96%
EC2 M5 instance price ($ per hour) $4.61 $3.07 $2.30 $1.54 $0.77 $0.38 $0.19
EMR M5 instance price ($ per hour) $0.27 $0.27 $0.27 $0.27 $0.19 $0.10 $0.05
(EC2 + EMR) M5 instance price ($ per hour) $4.88 $3.34 $2.57 $1.81 $0.96 $0.48 $0.24
Cost of running on M5 ($ per instance) $5.46 $3.51 $2.39 $1.76 $1.50 $1.34 $1.15
EC2 M6I instance price ($ per hour) $4.61 $3.07 $2.30 $1.54 $0.77 $0.38 $0.19
EMR M6I price ($ per hour per instance) $1.15 $0.77 $0.58 $0.38 $0.19 $0.10 $0.05
(EC2 + EMR) M6I instance price ($ per hour) $5.76 $3.84 $2.88 $1.92 $0.96 $0.48 $0.24
Cost of running on M6I ($ per instance) $4.97 $2.84 $2.17 $1.62 $1.12 $1.10 $0.97
Total cost reduction with M6I including performance improvement -8.92% -19.02% -9.28% -7.98% -25.37% -18.35% -15.99%

Amazon EMR runtime performance improvements with EC2 R6i instances

R6i instances showed a similar performance improvement while running Apache Spark workloads compared to equivalent R5 instances. Our test results showed between 14.25–32.23% improvement in total query runtime for six different instance sizes within the instance family, and between 16.12–36.5% improvement in geometric mean. R5.xlarge instances didn’t have sufficient memory to run TPC-DS benchmark queries, and weren’t included in this comparison. On cost comparison, we observed 5.48–23.5% reduced instance hour cost on R6i instance EMR clusters compared to R5 EMR instance clusters to run the TPC-DS benchmark queries.

The following table shows the results from running TPC-DS 3 TB benchmark queries using Amazon EMR 6.8 over equivalent R6i and R5 instance EMR clusters.

Instance Size 24 XL 16 XL 12 XL 8 XL 4 XL 2XL
Number of core instances in EMR cluster 5 5 5 5 5 5
Total query runtime on R5 (seconds) 4024.4737 3715.74432 3552.97298 3535.69879 5379.73168 9121.41532
Total query runtime on R6I (seconds) 2865.83169 2518.24192 2513.4849 3031.71973 4544.44854 6977.9508
Total query runtime improvement with R6I 28.79% 32.23% 29.26% 14.25% 15.53% 23.50%
Geometric mean query runtime R5 (sec) 30.59066 28.30849 25.30903 23.85511 32.33391 47.28424
Geometric mean query runtime R6I (sec) 21.87897 17.97587 17.54117 20.00918 26.6277 34.52817
Geometric mean query runtime improvement with R6I 28.48% 36.50% 30.69% 16.12% 17.65% 26.98%
EC2 R5 instance price ($ per hour) $6.0480 $4.0320 $3.0240 $2.0160 $1.0080 $0.5040
EMR R5 instance price ($ per hour) $0.2700 $0.2700 $0.2700 $0.2700 $0.2520 $0.1260
(EC2 + EMR) R5 instance price ($ per hour) $6.3180 $4.3020 $3.2940 $2.2860 $1.2600 $0.6300
Cost of running on R5 ($ per instance) $7.0630 $4.4403 $3.2510 $2.2452 $1.8829 $1.5962
EC2 R6I instance price ($ per hour) $6.0480 $4.0320 $3.0240 $2.0160 $1.0080 $0.5040
EMR R6I price ($ per hour per instance) $1.5120 $1.0080 $0.7560 $0.5040 $0.2520 $0.1260
(EC2 + EMR) R6I instance price ($ per hour) $7.5600 $5.0400 $3.7800 $2.5200 $1.2600 $0.6300
Cost of running on R6I ($ per instance) $6.0182 $3.5255 $2.6392 $2.1222 $1.5906 $1.2211
Total cost reduction with R6I including performance improvement -14.79% -20.60% -18.82% -5.48% -15.53% -23.50%

Amazon EMR runtime performance improvements with EC2 C6i instances

C6i instances showed a similar performance improvement while running Apache Spark workloads compared to equivalent C5 instances. Our test results showed between 12.61–21.09% improvement in total query runtime for four different instance sizes within the instance family, and between 14.57–20.35% improvement in geometric mean. Only C6i 24, 12, 4, and 2xlarge sizes were benchmarked because C5 doesn’t have 32, 16, and 8 xlarge sizes. C5.xlarge instances didn’t have sufficient memory to run TPC-DS benchmark queries, and weren’t included in this comparison. On cost comparison, we observed 5.93–13.62% reduced instance hour cost on C6i instance EMR clusters compared to C5 instance EMR clusters to run the TPC-DS benchmark queries.

The following table shows the results from running TPC-DS 3 TB benchmark queries using Amazon EMR 6.8 over equivalent C6i and C5 instance EMR clusters.

Instance Size 24 XL 12 XL 4 XL 2 XL
Number of core instances in EMR cluster 5 5 5 5
Total query runtime on C5 (seconds) 3435.59808 2900.84981 5945.12879 10173.00757
Total query runtime on C6I (seconds) 2711.16147 2471.86778 5195.30093 8787.43422
Total query runtime improvement with C6I 21.09% 14.79% 12.61% 13.62%
Geometric mean query runtime C5 (sec) 25.67058 20.06539 31.76582 46.78632
Geometric mean query runtime C6I (sec) 20.4458 17.14133 26.92196 39.32622
Geometric mean query runtime improvement with C6I 20.35% 14.57% 15.25% 15.95%
EC2 C5 instance price ($ per hour) $4.080 $2.040 $0.680 $0.340
EMR C5 instance price ($ per hour) $0.270 $0.270 $0.170 $0.085
(EC2 + EMR) C5 instance price ($ per hour) $4.35000 $2.31000 $0.85000 $0.42500
Cost of running on C5 ($ per instance) $4.15135 $1.86138 $1.40371 $1.20098
EC2 C6I instance price ($ per hour) $4.0800 $2.0400 $0.6800 $0.3400
EMR C6I price ($ per hour per instance) $1.02000 $0.51000 $0.17000 $0.08500
(EC2 + EMR) C6I instance price ($ per hour) $5.10000 $2.55000 $0.85000 $0.42500
Cost of running on C6I ($ per instance) $3.84081 $1.75091 $1.22667 $1.03741
Total cost reduction with C6I including performance improvement -7.48% -5.93% -12.61% -13.62%

Amazon EMR runtime performance improvements with EC2 R6id instances

R6id instances showed a similar performance improvement while running Apache Spark workloads compared to equivalent R5D instances. Our test results showed between 11.8–28.7% improvement in total query runtime for seven different instance sizes within the instance family, and between 15.1–32.0% improvement in geometric mean. R6ID 32 XL instances were not benchmarked because R5D instances don’t have that size available. On cost comparison, we observed 6.8–11.5% reduced instance hour cost on R6ID instance EMR clusters compared to R5D instance EMR clusters to run the TPC-DS benchmark queries.

The following table shows the results from running TPC-DS 3 TB benchmark queries using Amazon EMR 6.8 over equivalent R6id and R5d instance EMR clusters.

Instance Size 24 XL 16 XL 12 XL 8 XL 4 XL 2 XL XL
Number of core instances in EMR cluster 5 5 5 5 5 5 5
Total query runtime on R5D (seconds) 4054.4492975042 3691.7569385583 3598.6869168064 3532.7398928104 5397.5330161574 9281.2627059927 16862.8766838096
Total query runtime on R6ID (seconds) 2992.1198446983 2633.7131630720 2632.3186613402 2729.8860537867 4583.1040980373 7921.9960917943 14867.5391541445
Total query runtime improvement with R6ID 26.20% 28.66% 26.85% 22.73% 15.09% 14.65% 11.83%
Geometric mean query runtime R5D (sec) 31.0238156851 28.1432927726 25.7532157307 24.0596427675 32.5800246829 48.2306670294 76.6771994376
Geometric mean query runtime R6ID (sec) 22.8681174894 19.1282742957 18.6161830746 18.0498249257 25.9500918360 39.6580341258 65.0947323858
Geometric mean query runtime improvement with R6ID 26.29% 32.03% 27.71% 24.98% 20.35% 17.77% 15.11%
EC2 R5D instance price ($ per hour) $6.912000 $4.608000 $3.456000 $2.304000 $1.152000 $0.576000 $0.288000
EMR R5D instance price ($ per hour) $0.270000 $0.270000 $0.270000 $0.270000 $0.270000 $0.144000 $0.072000
(EC2 + EMR) R5D instance price ($ per hour) $7.182000 $4.878000 $3.726000 $2.574000 $1.422000 $0.720000 $0.360000
Cost of running on R5D ($ per instance) $8.088626 $5.002331 $3.724641 $2.525909 $2.132026 $1.856253 $1.686288
EC2 R6ID instance price ($ per hour) $7.257600 $4.838400 $3.628800 $2.419200 $1.209600 $0.604800 $0.302400
EMR R6ID price ($ per hour per instance) $1.814400 $1.209600 $0.907200 $0.604800 $0.302400 $0.151200 $0.075600
(EC2 + EMR) R6ID instance price ($ per hour) $9.072000 $6.048000 $4.536000 $3.024000 $1.512000 $0.756000 $0.378000
Cost of running on R6ID ($ per instance) $7.540142 $4.424638 $3.316722 $2.293104 $1.924904 $1.663619 $1.561092
Total cost reduction with R6ID including performance improvement -6.78% -11.55% -10.95% -9.22% -9.71% -10.38% -7.42%

Benchmarking methodology

The benchmark used in this post is derived from the industry-standard TPC-DS benchmark, and uses queries from the Spark SQL Performance Tests GitHub repo with the following fixes applied.

We calculated TCO by multiplying the cost per hour by the number of instances in the cluster and the time taken to run the queries on the cluster. We used the on-demand pricing in the US East (N. Virginia) Region for all instances.
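As a worked example of that arithmetic, the following snippet reproduces two cells of the I4i table above (the 16 XL column), using the per-hour prices and total runtimes listed there.

    # Per-instance cost = (EC2 + EMR hourly price) x total query runtime in hours.
    cost_i4i_16xl = (5.491 + 1.373) * 2642.77407 / 3600
    cost_i3_16xl = (4.990 + 0.270) * 4752.15457 / 3600

    print(round(cost_i4i_16xl, 3))  # ~5.039, the "Cost of running on I4I" row
    print(round(cost_i3_16xl, 3))   # ~6.943, the "Cost of running on I3" row

    # Relative cost reduction, including the performance improvement:
    print(round((cost_i4i_16xl - cost_i3_16xl) / cost_i3_16xl * 100, 2))  # ~ -27.43%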

Conclusion

In this post, we described how we estimated the cost-performance benefit from using Amazon EMR with C6i, M6i, I4i, R6i, and R6id, instances compared to using equivalent previous generation instances. Using these new instances with Amazon EMR improves cost-performance by an additional 5–33%.


About the authors

Al MS is a product manager for Amazon EMR at Amazon Web Services.

Kyeonghyun Ryoo is a Software Development Engineer for EMR at Amazon Web Services. He primarily works on designing and building automation tools for internal teams and customers to maximize their productivity. Outside of work, he is a retired world champion in professional gaming who still enjoys playing video games.

Renewal of AWS CyberGRX assessment to enhance customers’ third-party due diligence process

Post Syndicated from Naranjan Goklani original https://aws.amazon.com/blogs/security/renewal-of-aws-cybergrx-assessment-to-enhance-customers-third-party-due-diligence-process/

CyberGRX

Amazon Web Services (AWS) is pleased to announce renewal of the AWS CyberGRX cyber risk assessment report. This third-party validated report helps customers perform effective cloud supplier due diligence on AWS and enhances their third-party risk management process.

With the increase in adoption of cloud products and services across multiple sectors and industries, AWS has become a critical component of customers’ third-party environments. Regulated customers are held to high standards by regulators and auditors when it comes to exercising effective due diligence on third parties.

Many customers use third-party cyber risk management (TPCRM) services such as CyberGRX to better manage risks from their evolving third-party environments and to drive operational efficiencies. To help with such efforts, AWS has completed the CyberGRX assessment of its security posture. CyberGRX security analysts perform the assessment and validate the results annually.

The CyberGRX assessment applies a dynamic approach to third-party risk assessment. This approach integrates advanced analytics, threat intelligence, and sophisticated risk models with vendors’ responses to provide an in-depth view of how a vendor’s security controls help protect against potential threats.

Vendor profiles are continuously updated as the risk level of cloud service providers changes, or as AWS updates its security posture and controls. This approach eliminates outdated static spreadsheets for third-party risk assessments, in which the risk matrices are not updated in near real time.

In addition, AWS customers can use the CyberGRX Framework Mapper to map AWS assessment controls and responses to well-known industry standards and frameworks, such as National Institute of Standards and Technology (NIST) 800-53, the NIST Cybersecurity Framework, International Organization for Standardization (ISO) 27001, the Payment Card Industry Data Security Standard (PCI DSS), and the U.S. Health Insurance Portability and Accountability Act (HIPAA). This mapping can reduce customers’ third-party supplier due-diligence burden.

Customers can access the AWS CyberGRX report at no additional cost. Customers can request access to the report by completing an access request form, available on the AWS CyberGRX page.

As always, we value your feedback and questions. Reach out to the AWS Compliance team through the Contact Us page. If you have feedback about this post, submit comments in the Comments section below. To learn more about our other compliance and security programs, see AWS Compliance Programs.

Want more AWS Security news? Follow us on Twitter.

Naranjan Goklani

Naranjan is a Security Audit Manager at AWS, based in Toronto (Canada). He leads audits, attestations, certifications, and assessments across North America and Europe. Naranjan has more than 13 years of experience in risk management, security assurance, and performing technology audits. Naranjan previously worked in one of the Big 4 accounting firms and supported clients from the financial services, technology, retail, ecommerce, and utilities industries.