
How Amazon Devices scaled and optimized real-time demand and supply forecasts using serverless analytics

Post Syndicated from Avinash Kolluri original https://aws.amazon.com/blogs/big-data/how-amazon-devices-scaled-and-optimized-real-time-demand-and-supply-forecasts-using-serverless-analytics/

Every day, Amazon Devices processes and analyzes billions of transactions from global shipping, inventory, capacity, supply, sales, marketing, producer, and customer service teams. This data is used to procure device inventory that meets Amazon customers’ demand. With data volumes growing at a double-digit percentage rate year over year and the COVID-19 pandemic disrupting global logistics in 2021, it became more critical to scale and generate near-real-time data.

This post shows you how we migrated to a serverless data lake built on AWS that automatically ingests data from multiple sources and in different formats. The migration also created opportunities for our data scientists and engineers to use AI and machine learning (ML) services to continuously feed and analyze the data.

Challenges and design concerns

Our legacy architecture primarily used Amazon Elastic Compute Cloud (Amazon EC2) to extract data from various internal heterogeneous data sources and REST APIs, Amazon Simple Storage Service (Amazon S3) to load the data, and Amazon Redshift for further analysis and for generating purchase orders.

We found that this approach had a few deficiencies, which drove improvements in the following areas:

  • Developer velocity – The lack of schema unification and discovery was a primary cause of runtime failures, so developers spent much of their time on operational and maintenance issues.
  • Scalability – Most of these datasets are shared across the globe, so we must meet scaling demands when querying the data.
  • Minimal infrastructure maintenance – The current process spans multiple compute environments, depending on the data source, so reducing infrastructure maintenance is critical.
  • Responsiveness to data source changes – Our current system gets data from various heterogeneous data stores and services. Any updates to those services take months of developer cycles. The response times for these data sources are critical to our key stakeholders, so we must take a data-driven approach to selecting a high-performance architecture.
  • Storage and redundancy – Due to the heterogeneous data stores and models, it was challenging to store the different datasets from various business stakeholder teams. Having versioning, along with incremental and differential data to compare, gives us a remarkable ability to generate more optimized plans.
  • Agility and accessibility – Due to the volatile nature of logistics, a few business stakeholder teams need to analyze the data on demand and generate the near-real-time optimal plan for purchase orders. This introduces the need to both poll and push the data so it can be accessed and analyzed in near-real time.

Implementation strategy

Based on these requirements, we changed strategies and started analyzing each issue to identify a solution. Architecturally, we chose a serverless model, and the data lake architecture addresses the architectural gaps and challenging features we identified as improvement areas. From an operational standpoint, we designed a new shared responsibility model for data ingestion, using AWS Glue instead of internal services (REST APIs) running on Amazon EC2 to extract the data. We also used AWS Lambda for data processing. Then we chose Amazon Athena as our query service. To further improve developer velocity for our data consumers, we added Amazon DynamoDB as a metadata store for the different data sources landing in the data lake. These decisions drove every subsequent design and implementation choice.

The following diagram illustrates the architecture.


In the following sections, we look at each component in the architecture in more detail as we move through the process flow.

AWS Glue for ETL

To meet customer demand while supporting the scale of new businesses’ data sources, it was critical for us to have a high degree of agility, scalability, and responsiveness in querying various data sources.

AWS Glue is a serverless data integration service that makes it easy for analytics users to discover, prepare, move, and integrate data from multiple sources. You can use it for analytics, ML, and application development. It also includes additional productivity and DataOps tooling for authoring, running jobs, and implementing business workflows.

With AWS Glue, you can discover and connect to more than 70 diverse data sources and manage your data in a centralized data catalog. You can visually create, run, and monitor extract, transform, and load (ETL) pipelines to load data into your data lakes. Also, you can immediately search and query cataloged data using Athena, Amazon EMR, and Amazon Redshift Spectrum.

AWS Glue made it easy for us to connect to the data in various data stores, edit and clean the data as needed, and load the data into an AWS-provisioned store for a unified view. AWS Glue jobs can be scheduled or called on demand to extract data from the client’s resource and from the data lake.

Some responsibilities of these jobs are as follows (a sketch of such a job follows the list):

  • Extract and convert a source entity into a data entity
  • Enrich the data with year, month, and day for better cataloging, and include a snapshot ID for better querying
  • Perform input validation and path generation for Amazon S3
  • Associate the accredited metadata based on the source system
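
The following is a minimal PySpark sketch of such an AWS Glue job, condensing the extract, enrich, and load responsibilities into one script. The Glue connection name, bucket, dataset, and table names are hypothetical, and the enrichment logic is simplified for illustration.

```python
import sys
from datetime import datetime, timezone

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from pyspark.sql import functions as F

# Job arguments (names are illustrative)
args = getResolvedOptions(sys.argv, ["JOB_NAME", "dataset_name", "landing_bucket"])

sc = SparkContext()
glue_context = GlueContext(sc)
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

now = datetime.now(timezone.utc)
snapshot_id = now.strftime("%Y%m%d%H%M%S")  # uniquely identifies this extract

# Extract: read the source entity through a hypothetical Glue connection
source_df = glue_context.create_dynamic_frame.from_options(
    connection_type="mysql",
    connection_options={
        "useConnectionProperties": "true",
        "connectionName": "devices-supply-db",   # hypothetical Glue connection
        "dbtable": "purchase_signals",
    },
).toDF()

# Enrich: add year/month/day for cataloging and a snapshot ID for versioned queries
enriched_df = (
    source_df
    .withColumn("year", F.lit(now.strftime("%Y")))
    .withColumn("month", F.lit(now.strftime("%m")))
    .withColumn("day", F.lit(now.strftime("%d")))
    .withColumn("snapshot_id", F.lit(snapshot_id))
)

# Load: generate and validate the target path, then write to the S3 landing zone
landing_path = f"s3://{args['landing_bucket']}/{args['dataset_name']}/"
enriched_df.write.mode("append").partitionBy("year", "month", "day").parquet(landing_path)

job.commit()
```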

Querying REST APIs exposed by internal services is one of our core challenges, and given the goal of minimal infrastructure, we wanted to reuse those services in this project. AWS Glue connectors helped us meet that requirement. To query data from REST APIs and other data sources, we used PySpark and JDBC modules.

AWS Glue supports a wide variety of connection types. For more details, refer to Connection Types and Options for ETL in AWS Glue.
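
As an illustration of the REST and JDBC extraction patterns, the sketch below pulls a paginated internal API with standard Python libraries inside a Glue PySpark job and converts the response into a Spark DataFrame, then shows the equivalent JDBC read. The endpoint, field names, credentials, and table names are hypothetical.

```python
import requests
from pyspark.sql import Row, SparkSession

spark = SparkSession.builder.getOrCreate()

def fetch_pages(base_url, token, page_size=500):
    """Yield records from a paginated REST endpoint (hypothetical API contract)."""
    page = 1
    while True:
        resp = requests.get(
            base_url,
            headers={"Authorization": f"Bearer {token}"},
            params={"page": page, "pageSize": page_size},
            timeout=30,
        )
        resp.raise_for_status()
        records = resp.json().get("items", [])
        if not records:
            break
        yield from records
        page += 1

# REST extraction: materialize the API response as a DataFrame
records = list(fetch_pages("https://internal.example.com/api/shipments", token="..."))
rest_df = spark.createDataFrame([Row(**r) for r in records])

# Relational extraction: the equivalent read with Spark's JDBC module
jdbc_df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://supply-db.example.com:5432/supply")
    .option("dbtable", "public.inventory_positions")
    .option("user", "reader")
    .option("password", "...")
    .load()
)
```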

S3 bucket as landing zone

We used an S3 bucket as the immediate landing zone of the extracted data, which is further processed and optimized.

Lambda as AWS Glue ETL Trigger

We enabled S3 event notifications on the S3 bucket to trigger Lambda, which further partitions our data. The data is partitioned on InputDataSetName, Year, Month, and Date. Any query processor running on top of this data will scan only a subset of data for better cost and performance optimization. Our data can be stored in various formats, such as CSV, JSON, and Parquet.
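
A minimal sketch of such a Lambda handler is shown below. The post doesn’t specify the exact mechanism, so this version simply copies each newly landed object under a partitioned prefix; the bucket names, key layout, and partition convention are assumptions for illustration.

```python
import json
import urllib.parse
from datetime import datetime, timezone

import boto3

s3 = boto3.client("s3")

PARTITIONED_BUCKET = "devices-partitioned"  # hypothetical target bucket

def handler(event, context):
    """Copy each newly landed object into a partitioned prefix."""
    for record in event["Records"]:
        source_bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

        # Assume keys look like <InputDataSetName>/<file>, e.g. "shipments/part-0001.json"
        dataset_name, _, file_name = key.partition("/")
        now = datetime.now(timezone.utc)

        partitioned_key = (
            f"InputDataSetName={dataset_name}/"
            f"Year={now:%Y}/Month={now:%m}/Date={now:%d}/{file_name}"
        )

        s3.copy_object(
            Bucket=PARTITIONED_BUCKET,
            CopySource={"Bucket": source_bucket, "Key": key},
            Key=partitioned_key,
        )

    return {"statusCode": 200, "body": json.dumps("partitioned")}
```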

The raw data isn’t ideal for most of our use cases for generating the optimal plan, because it often has duplicates or incorrect data types. Most importantly, the data arrives in multiple formats, and we observed significant query performance gains after quickly converting it to the Parquet format. Here, we used one of the performance tips in Top 10 performance tuning tips for Amazon Athena.

AWS Glue jobs for ETL

We wanted better data segregation and accessibility, so we chose to use a separate S3 bucket to further improve performance. We used the same AWS Glue jobs to further transform and load the data into the required S3 bucket and a portion of the extracted metadata into DynamoDB.

DynamoDB as metadata store

Now that we have the data, various business stakeholders consume it further. This leaves us with two questions: which source data resides in the data lake, and which version. We chose DynamoDB as our metadata store, which provides the latest details so that consumers can query the data effectively. Every dataset in our system is uniquely identified by a snapshot ID, which we can look up in our metadata store. Clients access this data store through an API.
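
For illustration, the following sketch writes and reads a metadata item keyed by dataset name and snapshot ID; the table name, key schema, and attributes are hypothetical, and it assumes snapshot IDs sort chronologically.

```python
from datetime import datetime, timezone

import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("datalake-dataset-metadata")  # hypothetical table

def register_snapshot(dataset_name, snapshot_id, s3_location, partition_string):
    """Record the latest snapshot for a dataset after the Glue job finishes."""
    table.put_item(
        Item={
            "dataset_name": dataset_name,   # partition key
            "snapshot_id": snapshot_id,     # sort key (timestamp-like string)
            "s3_location": s3_location,
            "partition_string": partition_string,
            "created_at": datetime.now(timezone.utc).isoformat(),
        }
    )

def latest_snapshot(dataset_name):
    """Return the most recent snapshot metadata for a dataset."""
    response = table.query(
        KeyConditionExpression=Key("dataset_name").eq(dataset_name),
        ScanIndexForward=False,  # newest snapshot_id first
        Limit=1,
    )
    items = response.get("Items", [])
    return items[0] if items else None
```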

Amazon S3 as data lake

For better data quality, we extracted the enriched data into another S3 bucket with the same AWS Glue job.

AWS Glue Crawler

Crawlers are the “secret sauce” that enables us to be responsive to schema changes. Throughout the process, we chose to make each step as schema-agnostic as possible, which allows any schema changes to flow through until they reach AWS Glue. With a crawler, we could keep up with changes to the schema without manual intervention. This helped us automatically crawl the data from Amazon S3 and generate the schema and tables.

AWS Glue Data Catalog

The Data Catalog helped us maintain the catalog as an index to the data’s location, schema, and runtime metrics in Amazon S3. Information in the Data Catalog is stored as metadata tables, where each table specifies a single data store.

Athena for SQL queries

Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries you run. We considered operational stability and increasing developer velocity as our key improvement factors.

We further optimized the process of querying Athena so that users can plug in their values and queries to get data out of Athena. We did this by creating the following:

  • An AWS Cloud Development Kit (AWS CDK) template to create Athena infrastructure and AWS Identity and Access Management (IAM) roles to access data lake S3 buckets and the Data Catalog from any account
  • A library so that clients can provide an IAM role, query, data format, and output location to start an Athena query and get the status and result of the query run in the bucket of their choice

Querying Athena is a two-step process:

  • StartQueryExecution – This starts the query run and gets the run ID. Users can provide the output location where the output of the query will be stored.
  • GetQueryExecution – This gets the query status because the run is asynchronous. When successful, you can query the output in an S3 file or via API.

Helper methods for starting the query run and getting the result are part of the library.
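
A simplified version of such a helper, calling the Athena API through boto3, might look like the following. The database, table names, snapshot ID, and output location are placeholders, and the query restricts both sides of the join to a single snapshot so that Athena scans only those partitions.

```python
import time

import boto3

athena = boto3.client("athena")

def run_athena_query(query, database, output_location, poll_seconds=2):
    """Start an Athena query and block until it finishes, returning the run metadata."""
    start = athena.start_query_execution(
        QueryString=query,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_execution_id = start["QueryExecutionId"]

    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_execution_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return execution
        time.sleep(poll_seconds)

# Example: a join restricted to one snapshot so only those partitions are scanned
result = run_athena_query(
    query="""
        SELECT d.asin, d.forecast_units, s.available_units
        FROM demand d
        JOIN supply s ON d.asin = s.asin
        WHERE d.snapshot_id = '20230115083000'
          AND s.snapshot_id = '20230115083000'
    """,
    database="devices_data_lake",                        # hypothetical database
    output_location="s3://devices-athena-results/out/",  # hypothetical bucket
)
print(result["QueryExecution"]["Status"]["State"])
```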

Data lake metadata service

This service is custom developed and interacts with DynamoDB to get the metadata (dataset name, snapshot ID, partition string, timestamp, and S3 link of the data) in the form of a REST API. When the schema is discovered, clients use Athena as their query processor to query the data.

Because all datasets have a snapshot ID and are partitioned, a join query doesn’t result in a full table scan, only a partition scan on Amazon S3. We used Athena as our query processor because it relieves us of managing query infrastructure. Later, if we need something more, we can use either Redshift Spectrum or Amazon EMR.

Conclusion

Amazon Devices teams discovered significant value by moving to a data lake architecture using AWS Glue, which enabled multiple global business stakeholders to ingest data in more productive ways. This enabled the teams to generate the optimal plan to place purchase orders for devices by analyzing the different datasets in near-real time with appropriate business logic to solve the problems of the supply chain, demand, and forecast.

From an operational perspective, the investment has already started to pay off:

  • It standardized our ingestion, storage, and retrieval mechanisms, saving onboarding time. Before the implementation of this system, one dataset took 1 month to onboard. Due to our new architecture, we were able to onboard 15 new datasets in less than 2 months, which improved our agility by 70%.
  • It removed scaling bottlenecks, creating a homogeneous system that can quickly scale to thousands of runs.
  • The solution added schema and data quality validation before accepting any inputs and rejecting them if data quality violations are discovered.
  • It made it easy to retrieve datasets while supporting future simulation and back-testing use cases that require versioned inputs. This will make launching and testing models simpler.
  • The solution created a common infrastructure that can be easily extended to other teams across DIAL having similar issues with data ingestion, storage, and retrieval use cases.
  • Our operating costs have fallen by almost 90%.
  • This data lake can be accessed efficiently by our data scientists and engineers to perform other analytics and have a predictive approach as a future opportunity to generate accurate plans for the purchase orders.

The steps in this post can help you plan to build a similar modern data strategy using AWS-managed services to ingest data from different sources, automatically create metadata catalogs, share data seamlessly between the data lake and data warehouse, and create alerts in the event of an orchestrated data workflow failure.


About the authors

Avinash Kolluri is a Senior Solutions Architect at AWS. He works across Amazon Alexa and Devices to architect and design modern distributed solutions. His passion is to build cost-effective and highly scalable solutions on AWS. In his spare time, he enjoys cooking fusion recipes and traveling.

Vipul Verma is a Sr. Software Engineer at Amazon.com. He has been with Amazon since 2015, solving real-world challenges through technology that directly impacts and improves the lives of Amazon customers. In his spare time, he enjoys hiking.

Amazon EMR launches support for Amazon EC2 C7g (Graviton3) instances to improve cost performance for Spark workloads by 7–13%

Post Syndicated from Al MS original https://aws.amazon.com/blogs/big-data/amazon-emr-launches-support-for-amazon-ec2-c7g-graviton3-instances-to-improve-cost-performance-for-spark-workloads-by-7-13/

Amazon EMR provides a managed service to easily run analytics applications using open-source frameworks such as Apache Spark, Hive, Presto, Trino, HBase, and Flink. The Amazon EMR runtime for Spark and Presto includes optimizations that provide more than twice the performance of open-source Apache Spark and Presto.

With Amazon EMR release 6.7, you can now use Amazon Elastic Compute Cloud (Amazon EC2) C7g instances, which use the AWS Graviton3 processors. These instances improve price-performance of running Spark workloads on Amazon EMR by 7.93–13.35% over previous generation instances, depending on the instance size. In this post, we describe how we estimated the price-performance benefit.

Amazon EMR runtime performance with EC2 C7g instances

We ran TPC-DS 3 TB benchmark queries on Amazon EMR 6.9 using the Amazon EMR runtime for Apache Spark (compatible with Apache Spark 3.3) with C7g instances. Data was stored in Amazon Simple Storage Service (Amazon S3), and results were compared to equivalent C6g clusters from the previous generation instance family. We measured performance improvements using the total query runtime and geometric mean of the query runtime across TPC-DS 3 TB benchmark queries.

Our results showed a 13.65–18.73% improvement in total query runtime performance and a 16.98–20.28% improvement in geometric mean on EMR clusters with C7g compared to equivalent EMR clusters with C6g instances, depending on the instance size. Comparing costs, we observed a 7.93–13.35% reduction in cost on the EMR cluster with C7g compared to the equivalent cluster with C6g, depending on the instance size. We did not benchmark the C6g xlarge instance because it didn’t have sufficient memory to run the queries.

The following table shows the results from running the TPC-DS 3 TB benchmark queries using Amazon EMR 6.9 compared to equivalent C7g and C6g instance EMR clusters.

Instance size | 16xlarge | 12xlarge | 8xlarge | 4xlarge | 2xlarge
Total size of the cluster (1 leader + 5 core nodes) | 6 | 6 | 6 | 6 | 6
Total query runtime on C6g (seconds) | 2774.86205 | 2752.84429 | 3173.08086 | 5108.45489 | 8697.08117
Total query runtime on C7g (seconds) | 2396.22799 | 2336.28224 | 2698.72928 | 4151.85869 | 7249.58148
Total query runtime improvement with C7g | 13.65% | 15.13% | 14.95% | 18.73% | 16.64%
Geometric mean query runtime C6g (seconds) | 22.2113 | 21.75459 | 23.38081 | 31.97192 | 45.41656
Geometric mean query runtime C7g (seconds) | 18.43905 | 17.65898 | 19.01684 | 25.48695 | 37.43737
Geometric mean query runtime improvement with C7g | 16.98% | 18.83% | 18.66% | 20.28% | 17.57%
EC2 C6g instance price ($ per hour) | $2.1760 | $1.6320 | $1.0880 | $0.5440 | $0.2720
EMR C6g instance price ($ per hour) | $0.5440 | $0.4080 | $0.2720 | $0.1360 | $0.0680
(EC2 + EMR) C6g instance price ($ per hour) | $2.7200 | $2.0400 | $1.3600 | $0.6800 | $0.3400
Cost of running on C6g ($ per instance) | $2.09656 | $1.55995 | $1.19872 | $0.96493 | $0.82139
EC2 C7g instance price ($ per hour) | $2.3200 | $1.7400 | $1.1600 | $0.5800 | $0.2900
EMR C7g instance price ($ per hour) | $0.5800 | $0.4350 | $0.2900 | $0.1450 | $0.0725
(EC2 + EMR) C7g instance price ($ per hour) | $2.9000 | $2.1750 | $1.4500 | $0.7250 | $0.3625
Cost of running on C7g ($ per instance) | $1.930290 | $1.411500 | $1.086990 | $0.836140 | $0.729990
Total cost reduction with C7g including performance improvement | -7.93% | -9.52% | -9.32% | -13.35% | -11.13%

The following graph shows per-query improvements observed on C7g 2xlarge instances compared to equivalent C6g generations.

Benchmarking methodology

The benchmark used in this post is derived from the industry-standard TPC-DS benchmark, and uses queries from the Spark SQL Performance Tests GitHub repo with the following fixes applied.

We calculated TCO by multiplying the cost per hour by the number of instances in the cluster and the time taken to run the queries on the cluster (see the worked example below). We used On-Demand pricing in the US East (N. Virginia) Region for all instances.
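
As a worked example using the 16xlarge column of the table above, the per-instance cost and the cost reduction can be reproduced from the published per-hour prices and the measured runtimes:

```python
# C6g.16xlarge: (EC2 + EMR) price is $2.72/hour, total runtime 2774.86205 seconds
c6g_cost_per_instance = 2.7200 * (2774.86205 / 3600)   # ≈ $2.09656

# C7g.16xlarge: (EC2 + EMR) price is $2.90/hour, total runtime 2396.22799 seconds
c7g_cost_per_instance = 2.9000 * (2396.22799 / 3600)   # ≈ $1.93029

# Cost change including the performance improvement (negative means cheaper)
cost_reduction = (c7g_cost_per_instance - c6g_cost_per_instance) / c6g_cost_per_instance
print(f"{c6g_cost_per_instance:.5f} {c7g_cost_per_instance:.5f} {cost_reduction:.2%}")
# 2.09656 1.93029 -7.93%

# Cluster-level TCO multiplies the per-instance figure by the 6 nodes in the cluster
print(f"C7g cluster cost: ${c7g_cost_per_instance * 6:.2f}")
```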

Conclusion

In this post, we described how we estimated the cost-performance benefit from using Amazon EMR with C7g instances compared to using equivalent previous generation instances. Using these new instances with Amazon EMR improves cost-performance by an additional 7–13%.


About the authors

Al MS is a product manager for Amazon EMR at Amazon Web Services.

Kyeonghyun Ryoo is a Software Development Engineer for EMR at Amazon Web Services. He primarily works on designing and building automation tools for internal teams and customers to maximize their productivity. Outside of work, he is a retired world champion in professional gaming who still enjoys playing video games.

Yuzhou Sun is a software development engineer for EMR at Amazon Web Services.

Steve Koonce is an Engineering Manager for EMR at Amazon Web Services.

AWS Lake Formation 2022 year in review

Post Syndicated from Jason Berkowitz original https://aws.amazon.com/blogs/big-data/aws-lake-formation-2022-year-in-review/

Data governance is the collection of policies, processes, and systems that organizations use to ensure the quality and appropriate handling of their data throughout its lifecycle for the purpose of generating business value. Data governance is increasingly top-of-mind for customers as they recognize data as one of their most important assets. Effective data governance enables better decision-making by improving data quality, reducing data management costs, and ensuring secure access to data for stakeholders. In addition, data governance is required to comply with an increasingly complex regulatory environment with data privacy (such as GDPR and CCPA) and data residency regulations (such as in the EU, Russia, and China).

For AWS customers, effective data governance improves decision-making, increases business agility, provides a competitive advantage, and reduces the risk of fines due to non-compliance with regulatory obligations. We understand the unique opportunity to provide our customers a comprehensive end-to-end data governance solution that is seamlessly integrated into our portfolio of services, and AWS Lake Formation and the AWS Glue Data Catalog are key to solving these challenges.

In this post, we are excited to summarize the features that the AWS Glue Data Catalog, AWS Glue crawler, and Lake Formation teams delivered in 2022. We have collected some of the key talks and solutions on data governance, data mesh, and modern data architecture published and presented in AWS re:Invent 2022, and a few data lake solutions built by customers and AWS Partners for easy reference. Whether you are a data platform builder, data engineer, data scientist, or any technology leader interested in data lake solutions, this post is for you.

To learn more about how customers are securing and sharing data with Lake Formation, we recommend going deeper into GoDaddy’s decentralized data mesh, Novo Nordisk’s modern data architecture, and JPMorgan’s improvements to their Federated Data Lake, a governed data mesh implementation using Lake Formation. Also, you can learn how AWS Partners integrated with Lake Formation to help customers build unique data lakes, in Starburst’s data mesh solution, Informatica’s automated data sharing solution, Ahana’s Presto integration with Lake Formation, Ascending’s custom data governance system, how PBS used machine learning on their data lakes, and how hc1 provides personalized health insights for customers.

You can review how Lake Formation is used by customers to build modern data architectures in the following re:Invent 2022 talks:

The Lake Formation team listened to customer feedback and made improvements in the areas of cross-account data governance, expanding the source of data lakes, enabling unified data governance of a business data catalog, making secure business-to-business data sharing possible, and expanding the coverage area for fine-grained access controls to Amazon Redshift. In the rest of this post, we are happy to share the progress we made in 2022.

Enhancing cross-account governance

Lake Formation provides the foundation for customers to share data across accounts within their organization. You can share AWS Glue Data Catalog resources to AWS Identity and Access Management (IAM) principals within an account as well as other AWS accounts using two methods. The first one is called the named-resource method, where users can select the names of databases and tables and choose the type of permissions to share. The second method uses LF-Tags, where users can create and associate LF-Tags to databases and tables and grant permission to IAM principals using LF-Tag policies and expressions.

In November 2022, Lake Formation introduced version 3 of its cross-account sharing feature. With this new version, Lake Formation users can share catalog resources using LF-Tags at the AWS Organizations level. Sharing data using LF-tags helps scale permissions and reduces the admin work for data lake builders. The cross-account sharing version 3 also allows you to share resources to specific IAM principals in other accounts, providing data owners control over who can access their data in other accounts. Lastly, we have removed the overhead of writing and maintaining Data Catalog resource policies by introducing AWS Resource Access Manager (AWS RAM) invites with LF-Tags-based policies in the cross-account sharing version 3. We encourage you to further explore cross-account sharing in Lake Formation.
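
To give a flavor of an LF-Tag-based grant, the following boto3 sketch shares all tables carrying a given tag with an IAM role in another account. The tag key, tag value, account ID, and role name are placeholders, and this is a simplified sketch rather than a complete cross-account setup.

```python
import boto3

lakeformation = boto3.client("lakeformation")

# Grant SELECT and DESCRIBE on every table tagged domain=supply-chain
# to an analytics role in another account (IDs and names are hypothetical).
lakeformation.grant_permissions(
    Principal={
        "DataLakePrincipalIdentifier": "arn:aws:iam::222233334444:role/AnalyticsConsumerRole"
    },
    Resource={
        "LFTagPolicy": {
            "ResourceType": "TABLE",
            "Expression": [
                {"TagKey": "domain", "TagValues": ["supply-chain"]},
            ],
        }
    },
    Permissions=["SELECT", "DESCRIBE"],
)
```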

Extending Lake Formation permissions to new data

Until re:Invent 2022, Lake Formation provided permissions management for IAM principals on Data Catalog resources with underlying data primarily on Amazon Simple Storage Service (Amazon S3). At re:Invent 2022, we introduced Lake Formation permissions management for Amazon Redshift data shares in preview mode. Amazon Redshift is a fully-managed, petabyte-scale data warehouse service in the AWS Cloud. The data sharing feature allows data owners to group databases, tables, and views in an Amazon Redshift cluster and share it with other Amazon Redshift clusters within or across AWS accounts. Data sharing reduces the need to keep multiple copies of the same data in different data warehouses to accelerate business decision-making across an organization. Lake Formation further enhances sharing data within Amazon Redshift data shares by providing fine-grained access control on tables and views.

For additional details on this feature, refer to AWS Lake Formation-managed Redshift datashares (preview) and How Redshift data share can be managed by Lake Formation.

Amazon EMR is a managed cluster platform to run big data applications using Apache Spark, Apache Hive, Apache HBase, Apache Flink, Apache Hudi, and Presto at scale. You can use Amazon EMR to run batch and stream processing analytics jobs on your S3 data lakes. Starting with Amazon EMR release 6.7.0, we introduced Lake Formation permissions management on a runtime IAM role used with the EMR Steps API. This feature enables you to submit Apache Spark and Apache Hive applications to an EMR cluster through the EMR Steps API, with Lake Formation enforcing table-level and column-level permissions on the IAM role submitting the application. This Lake Formation integration with Amazon EMR allows you to share an EMR cluster across multiple users in an organization with different permissions by isolating your applications through a runtime IAM role. We encourage you to try this feature in the Lake Formation workshop Integration with Amazon EMR using Runtime Roles. To explore a use case, see Introducing runtime roles for Amazon EMR steps: Use IAM roles and AWS Lake Formation for access control with Amazon EMR.
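
For illustration, the sketch below submits a Spark step with a runtime role through the EMR Steps API. The cluster ID, role ARN, and script location are placeholders; the Lake Formation permissions granted to that runtime role would then govern which tables and columns the job can read.

```python
import boto3

emr = boto3.client("emr")

response = emr.add_job_flow_steps(
    JobFlowId="j-XXXXXXXXXXXXX",  # hypothetical EMR cluster ID
    ExecutionRoleArn="arn:aws:iam::111122223333:role/AnalystRuntimeRole",
    Steps=[
        {
            "Name": "spark-with-runtime-role",
            "ActionOnFailure": "CONTINUE",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": [
                    "spark-submit",
                    "--deploy-mode", "cluster",
                    "s3://example-bucket/scripts/report.py",  # hypothetical script
                ],
            },
        }
    ],
)
print(response["StepIds"])
```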

Amazon SageMaker Studio is a fully integrated development environment (IDE) for machine learning (ML) that enables data scientists and developers to prepare data for building, training, tuning, and deploying models. Studio offers a native integration with Amazon EMR so that data scientists and data engineers can interactively prepare data at petabyte scale using open-source frameworks such as Apache Spark, Presto, and Hive using Studio notebooks. With the release of Lake Formation permissions management on a runtime IAM role, Studio now supports table-level and column-level access with Lake Formation. When users connect to EMR clusters from Studio notebooks, they can choose the IAM role (called the runtime IAM role) that they want to connect with. If data access is managed by Lake Formation, users can enforce table-level and column-level permissions using policies attached to the runtime role. For more details, refer to Apply fine-grained data access controls with AWS Lake Formation and Amazon EMR from Amazon SageMaker Studio.

Ingest and catalog varied data

A robust data governance model includes data from an organization’s many data sources and methods to discover and catalog those varied data assets. AWS Glue crawlers provide the ability to discover data from sources including Amazon S3, Amazon Redshift, and NoSQL databases, and populate the AWS Glue Data Catalog.

In 2022, we launched AWS Glue crawler support for Snowflake and AWS Glue crawler support for Delta Lake tables. These integrations allow AWS Glue crawlers to create and update Data Catalog tables based on these popular data sources. This makes it even easier to create extract, transform, and load (ETL) jobs with AWS Glue based on these Data Catalog tables as sources and targets.

In 2022, the AWS Glue crawlers UI was redesigned to offer a better user experience. One of the main enhancements delivered as part of this revision is greater insight into AWS Glue crawler history. The crawler history UI provides an easy view of crawler runs, schedules, data sources, and tags. For each crawl, the crawler history offers a summary of changes to the database schema or Amazon S3 partitions. Crawler history also provides detailed information about DPU hours and reduces the time spent analyzing and debugging crawler operations and costs. To explore the new functionalities added to the crawlers UI, refer to Set up and monitor AWS Glue crawlers using the enhanced AWS Glue UI and crawler history.

In 2022, we also extended support for crawlers based on Amazon S3 event notifications to support catalog tables. With this feature, incremental crawling can be offloaded from data pipelines to the scheduled AWS Glue crawler, reducing crawls to incremental S3 events. For more information, refer to Build incremental crawls of data lakes with existing Glue catalog tables.
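
As a rough illustration, the following boto3 sketch configures a crawler that consumes S3 event notifications from an SQS queue instead of re-listing the whole prefix. The crawler name, role, database, queue ARN, and schedule are placeholders, and the exact options should be checked against the current AWS Glue documentation.

```python
import boto3

glue = boto3.client("glue")

glue.create_crawler(
    Name="sales-incremental-crawler",                       # hypothetical
    Role="arn:aws:iam::111122223333:role/GlueCrawlerRole",  # hypothetical
    DatabaseName="sales_db",
    Targets={
        "S3Targets": [
            {
                "Path": "s3://example-datalake/sales/",
                # SQS queue that receives the bucket's S3 event notifications
                "EventQueueArn": "arn:aws:sqs:us-east-1:111122223333:sales-s3-events",
            }
        ]
    },
    # Crawl only the objects reported by S3 events rather than the full prefix
    RecrawlPolicy={"RecrawlBehavior": "CRAWL_EVENT_MODE"},
    Schedule="cron(0 */6 * * ? *)",  # every 6 hours
)
```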

More ways to share data beyond the data lake

During re:Invent 2022, we announced a preview of AWS Data Exchange for AWS Lake Formation, a new feature that enables data subscribers to find and subscribe to third-party datasets that are managed directly through Lake Formation. Until now, AWS Data Exchange subscribers could access third-party datasets by exporting providers’ files to their own S3 buckets, calling providers’ APIs through Amazon API Gateway, or querying producers’ Amazon Redshift data shares from their Amazon Redshift cluster. With the new Lake Formation integration, data providers curate AWS Data Exchange datasets using Lake Formation tags. Data subscribers are able to query and explore the databases and tables associated with those tags, just like any other AWS Glue Data Catalog resource. Organizations can apply resource-based Lake Formation permissions to share the licensed datasets within the same account or across accounts using AWS License Manager. AWS Data Exchange for Lake Formation streamlines data licensing and sharing operations by accelerating data onboarding, reducing the amount of ETL required for end-users to access third-party data, and centralizing governance and access controls for third-party data.

At re:Invent 2022, we also announced Amazon DataZone, a new data management service that makes it faster and easier for you to catalog, discover, share, and govern data stored across AWS, on-premises, and third-party sources. Amazon DataZone is a business data catalog service that supplements the technical metadata in the AWS Glue Data Catalog. Amazon DataZone is integrated with Lake Formation permissions management so that you can effectively manage and govern access to your data, and audit who is accessing what data and for what purpose. With the publisher-subscriber model of Amazon DataZone, data assets can be shared and accessed across Regions. For additional details about the service and its capabilities, refer to the Amazon DataZone FAQs and re:Invent launch.

Conclusion

Data is transforming every field and every business. However, with data growing faster than most companies can keep track of, collecting, securing, and getting value out of that data is a challenging thing to do. A modern data strategy can help you create better business outcomes with data. AWS provides the most complete set of services for the end-to-end data journey to help you unlock value from your data and turn it into insight.

At AWS, we work backward from customer requirements. From the Lake Formation team, we worked hard to deliver the features described in this post, and we invite you to check them out. With our continued focus to invent, we hope to play a key role in empowering organizations to build new data governance models that help you derive more business value at lightning speed.

You can get started with Lake Formation by exploring our hands-on workshop modules and Getting started tutorials. We look forward to hearing from you, our customers, on your data lake and data governance use cases. Please get in touch through your AWS account team and share your comments.


About the Authors

Jason Berkowitz is a Senior Product Manager with AWS Lake Formation. He comes from a background in machine learning and data lake architectures. He helps customers become data-driven.

Aarthi Srinivasan is a Senior Big Data Architect with AWS Lake Formation. She enjoys building data lake solutions for AWS customers and partners. When not on the keyboard, she explores the latest science and technology trends and spends time with her family.

Leonardo Gómez is a Senior Analytics Specialist Solutions Architect at AWS. Based in Toronto, Canada, he has over a decade of experience in data management, helping customers around the globe address their business and technical needs.

United Arab Emirates IAR compliance assessment report is now available with 58 services in scope

Post Syndicated from Ioana Mecu original https://aws.amazon.com/blogs/security/united-arab-emirates-iar-compliance-assessment-report-is-now-available-with-58-services-in-scope/

Amazon Web Services (AWS) is pleased to announce the publication of our compliance assessment report on the Information Assurance Regulation (IAR) established by the Telecommunications and Digital Government Regulatory Authority (TDRA) of the United Arab Emirates. The report covers the AWS Middle East (UAE) Region, with 58 services in scope of the assessment.

The IAR provides management and technical information security controls to establish, implement, maintain, and continuously improve information assurance. AWS alignment with IAR requirements demonstrates our ongoing commitment to adhere to the heightened expectations for cloud service providers. As such, IAR-regulated customers can use AWS services with confidence.

Independent third-party auditors from BDO evaluated AWS for the period of November 1, 2021, to October 31, 2022. The assessment report illustrating the status of AWS compliance is available through AWS Artifact. AWS Artifact is a self-service portal for on-demand access to AWS compliance reports. Sign in to AWS Artifact in the AWS Management Console, or learn more at Getting Started with AWS Artifact.

For up-to-date information, including when additional services are added, see AWS Services in Scope by Compliance Program and choose IAR.

AWS strives to continuously bring services into the scope of its compliance programs to help you meet your architectural and regulatory needs. If you have questions or feedback about IAR compliance, reach out to your AWS account team.

To learn more about our compliance and security programs, see AWS Compliance Programs. As always, we value your feedback and questions; reach out to the AWS Compliance team through the Contact Us page.

 

Ioana Mecu

Ioana is a Security Audit Program Manager at AWS based in Madrid, Spain. She leads security audits, attestations, and certification programs across Europe and the Middle East. Ioana has previously worked in risk management, security assurance, and technology audits in the financial sector for the past 15 years.

Gokhan Akyuz

Gokhan is a Security Audit Program Manager at AWS based in Amsterdam, Netherlands. He leads security audits, attestations, and certification programs across Europe and the Middle East. Gokhan has more than 15 years of experience in IT and cybersecurity audits and controls implementation in a wide range of industries.

AWS CloudHSM is now PCI PIN certified

Post Syndicated from Nivetha Chandran original https://aws.amazon.com/blogs/security/aws-cloudhsm-is-now-pci-pin-certified/

Amazon Web Services (AWS) is pleased to announce that AWS CloudHSM is certified for Payment Card Industry Personal Identification Number (PCI PIN) version 3.1.

With CloudHSM, you can manage and access your keys on FIPS 140-2 Level 3 certified hardware, protected with customer-owned, single-tenant hardware security module (HSM) instances that run in your own virtual private cloud (VPC). This PCI PIN attestation gives you the flexibility to deploy your regulated workloads with reduced compliance overhead.

Coalfire, a third-party Qualified Security Assessor (QSA), evaluated CloudHSM. Customers can access the PCI PIN Attestation of Compliance (AOC) report through AWS Artifact.

To learn more about our PCI program and other compliance and security programs, see the AWS Compliance Programs page. As always, we value your feedback and questions; reach out to the AWS Compliance team through the Contact Us page.

 

Author

Nivetha Chandran

Nivetha is a Security Assurance Manager at Amazon Web Services on the Global Audits team, managing the PCI compliance program. Nivetha holds a Master’s degree in Information Management from the University of Washington.

Fall 2022 SOC reports now available in Spanish

Post Syndicated from Rodrigo Fiuza original https://aws.amazon.com/blogs/security/fall-2022-soc-reports-now-available-in-spanish/

Spanish version >>

We continue to listen to our customers, regulators, and stakeholders to understand their needs regarding audit, assurance, certification, and attestation programs at Amazon Web Services (AWS). We are pleased to announce that Fall 2022 System and Organization Controls (SOC) 1, SOC 2, and SOC 3 reports are now available in Spanish. These translated reports will help drive greater engagement and alignment with customer and regulatory requirements across Latin America and Spain.

The Spanish language version of the reports does not contain the independent opinion issued by the auditors or the control test results, but you can find this information in the English language version. Stakeholders should use the English version as a complement to the Spanish version.

Translated SOC reports in Spanish are available to customers through AWS Artifact. Translated SOC reports in Spanish will be published twice a year, in alignment with the Fall and Spring reporting cycles.

We value your feedback and questions—feel free to reach out to our team or give feedback about this post through the Contact Us page.


 


Spanish

Los informes SOC de Otoño de 2022 ahora están disponibles en español

Seguimos escuchando a nuestros clientes, reguladores y partes interesadas para comprender sus necesidades en relación con los programas de auditoría, garantía, certificación y atestación en Amazon Web Services (AWS). Nos complace anunciar que los informes SOC 1, SOC 2 y SOC 3 de AWS de Otoño de 2022 ya están disponibles en español. Estos informes traducidos ayudarán a impulsar un mayor compromiso y alineación con los requisitos regulatorios y de los clientes en las regiones de América Latina y España.

La versión en inglés de los informes debe tenerse en cuenta en relación con la opinión independiente emitida por los auditores y los resultados de las pruebas de controles, como complemento de las versiones en español.

Los informes SOC traducidos en español están disponibles en AWS Artifact. Los informes SOC traducidos en español se publicarán dos veces al año según los ciclos de informes de Otoño y Primavera.

Valoramos sus comentarios y preguntas; no dude en ponerse en contacto con nuestro equipo o enviarnos sus comentarios sobre esta publicación a través de nuestra página Contáctenos.


Author

Rodrigo Fiuza

Rodrigo is a Security Audit Manager at AWS, based in São Paulo. He leads audits, attestations, certifications, and assessments across Latin America, the Caribbean, and Europe. Rodrigo has previously worked in risk management, security assurance, and technology audits for the past 12 years.

Andrew Najjar

Andrew is a Compliance Program Manager at Amazon Web Services. He leads multiple security and privacy initiatives within AWS and has 8 years of experience in security assurance. Andrew holds a master’s degree in information systems and a bachelor’s degree in accounting from Indiana University. He is a CPA and an AWS Certified Solutions Architect – Associate.

Ryan Wilks

Ryan is a Compliance Program Manager at Amazon Web Services. He leads multiple security and privacy initiatives within AWS. Ryan has 11 years of experience in information security and holds ITIL, CISM and CISA certifications.

Nathan Samuel

Nathan is a Compliance Program Manager at Amazon Web Services. He leads multiple security and privacy initiatives within AWS. Nathan has a Bachelor of Commerce degree from the University of the Witwatersrand, South Africa, has 17 years’ experience in security assurance, and holds the CISA, CRISC, CGEIT, CISM, CDPSE, and Certified Internal Auditor certifications.

C5 Type 2 attestation report now available with 156 services in scope

Post Syndicated from Julian Herlinghaus original https://aws.amazon.com/blogs/security/c5-type-2-attestation-report-now-available-with-156-services-in-scope/

We continue to expand the scope of our assurance programs at Amazon Web Services (AWS), and we are pleased to announce that AWS has successfully completed the 2022 Cloud Computing Compliance Controls Catalogue (C5) attestation cycle with 156 services in scope. This alignment with C5 requirements demonstrates our ongoing commitment to adhere to the heightened expectations for cloud service providers. AWS customers in Germany and across Europe can run their applications on AWS Regions in scope of the C5 report with the assurance that AWS aligns with C5 requirements.

The C5 attestation scheme is backed by the German government and was introduced by the Federal Office for Information Security (BSI) in 2016. AWS has adhered to the C5 requirements since their inception. C5 helps organizations demonstrate operational security against common cyberattacks when using cloud services within the context of the German Government’s Security Recommendations for Cloud Computing Providers.

Independent third-party auditors evaluated AWS for the period October 1, 2021, through September 30, 2022. The C5 report illustrates AWS’ compliance status for both the basic and additional criteria of C5. Customers can download the C5 report through AWS Artifact. AWS Artifact is a self-service portal for on-demand access to AWS compliance reports. Sign in to AWS Artifact in the AWS Management Console, or learn more at Getting Started with AWS Artifact.

AWS has added the following 16 services to the current C5 scope:

At present, the services offered in the Frankfurt, Dublin, London, Paris, Milan, Stockholm and Singapore Regions are in scope of this certification. For up-to-date information, see the AWS Services in Scope by Compliance Program page and choose C5.

AWS strives to continuously bring services into the scope of its compliance programs to help you meet your architectural and regulatory needs. If you have questions or feedback about C5 compliance, reach out to your AWS account team.

To learn more about our compliance and security programs, see AWS Compliance Programs. As always, we value your feedback and questions; reach out to the AWS Compliance team through the Contact Us page.


Julian Herlinghaus

Julian is a Manager in AWS Security Assurance based in Berlin, Germany. He leads third-party and customer security audits across Europe and specifically the DACH region. He has previously worked as Information Security department lead of an accredited certification body and has multiple years of experience in information security and security assurance & compliance.

Andreas Terwellen

Andreas is a senior manager in security audit assurance at AWS, based in Frankfurt, Germany. His team is responsible for third-party and customer audits, attestations, certifications, and assessments across Europe. Previously, he was a CISO in a DAX-listed telecommunications company in Germany. He also worked for different consulting companies managing large teams and programs across multiple industries and sectors.

AWS achieves HDS certification in two additional Regions

Post Syndicated from Janice Leung original https://aws.amazon.com/blogs/security/aws-achieves-hds-certification-in-two-additional-regions/

We’re excited to announce that two additional AWS Regions—Asia Pacific (Jakarta) and Europe (Milan)—have been granted the Health Data Hosting (Hébergeur de Données de Santé, HDS) certification. This alignment with HDS requirements demonstrates our continued commitment to adhere to the heightened expectations for cloud service providers. AWS customers who handle personal health data can use HDS-certified Regions with confidence to manage their workloads.

The following 18 Regions are in scope for this certification:

  • US East (Ohio)
  • US East (Northern Virginia)
  • US West (Northern California)
  • US West (Oregon)
  • Asia Pacific (Jakarta)
  • Asia Pacific (Seoul)
  • Asia Pacific (Mumbai)
  • Asia Pacific (Singapore)
  • Asia Pacific (Sydney)
  • Asia Pacific (Tokyo)
  • Canada (Central)
  • Europe (Frankfurt)
  • Europe (Ireland)
  • Europe (London)
  • Europe (Milan)
  • Europe (Paris)
  • Europe (Stockholm)
  • South America (São Paulo)

Introduced by the French governmental agency for health, Agence Française de la Santé Numérique (ASIP Santé), the HDS certification aims to strengthen the security and protection of personal health data. Achieving this certification demonstrates that AWS provides a framework for technical and governance measures to secure and protect personal health data, governed by French law.

Independent third-party auditors evaluated and certified AWS on January 13, 2023. The Certificate of Compliance that demonstrates AWS compliance status is available on the Agence du Numérique en Santé (ANS) website and AWS Artifact. AWS Artifact is a self-service portal for on-demand access to AWS compliance reports. Sign in to AWS Artifact in the AWS Management Console, or learn more at Getting Started with AWS Artifact.

For up-to-date information, including when additional Regions are added, see the AWS Compliance Programs page, and choose HDS.

AWS strives to continuously bring services into the scope of its compliance programs to help you meet your architectural and regulatory needs. If you have questions or feedback about HDS compliance, reach out to your AWS account team.

To learn more about our compliance and security programs, see AWS Compliance Programs. As always, we value your feedback and questions; reach out to the AWS Compliance team through the Contact Us page.


Author

Janice Leung

Janice is a security audit program manager at AWS, based in New York. She leads security audits across Europe and previously worked in security assurance and technology risk management in the financial industry for 11 years.

Recap to security, identity, and compliance sessions at AWS re:Invent 2022

Post Syndicated from Katie Collins original https://aws.amazon.com/blogs/security/recap-to-security-identity-and-compliance-sessions-at-aws-reinvent-2022/

AWS re:Invent returned to Las Vegas, NV, in November 2022. The conference featured over 2,200 sessions and hands-on labs and more than 51,000 attendees over 5 days. If you weren’t able to join us in person, or just want to revisit some of the security, identity, and compliance announcements and on-demand sessions, this blog post is for you.


Key announcements

Here are some of the security announcements that we made at AWS re:Invent 2022.

  • We announced the preview of a new service, Amazon Security Lake. Amazon Security Lake automatically centralizes security data from cloud, on-premises, and custom sources into a purpose-built data lake stored in your AWS account. Security Lake makes it simpler to analyze security data so that you can get a more complete understanding of security across your entire organization. You can also improve the protection of your workloads, applications, and data. Security Lake automatically gathers and manages your security data across accounts and AWS Regions.
  • We introduced the AWS Digital Sovereignty Pledge—our commitment to offering the most advanced set of sovereignty controls and features available in the cloud. As part of this pledge, we launched a new feature of AWS Key Management Service, External Key Store (XKS), where you can use your own encryption keys stored outside of the AWS Cloud to protect data on AWS.
  • To help you with the building blocks for zero trust, we introduced two new services:
    • AWS Verified Access provides secure access to corporate applications without a VPN. Verified Access verifies each access request in real time and only connects users to the applications that they are allowed to access, removing broad access to corporate applications and reducing the associated risks.
    • Amazon Verified Permissions is a scalable, fine-grained permissions management and authorization service for custom applications. Using the Cedar policy language, Amazon Verified Permissions centralizes fine-grained permissions for custom applications and helps developers authorize user actions in applications.
  • We announced Automated sensitive data discovery for Amazon Macie. This new capability helps you gain visibility into where your sensitive data resides on Amazon Simple Storage Service (Amazon S3) at a fraction of the cost of running a full data inspection across all your S3 buckets. Automated sensitive data discovery automates the continual discovery of sensitive data and potential data security risks across your S3 storage aggregated at the AWS Organizations level.
  • Amazon Inspector now supports AWS Lambda functions, adding continual, automated vulnerability assessments for serverless compute workloads. Amazon Inspector automatically discovers eligible AWS Lambda functions and identifies software vulnerabilities in application package dependencies used in the Lambda function code. The functions are initially assessed upon deployment to Lambda and continually monitored and reassessed, informed by updates to the function and newly published vulnerabilities. When vulnerabilities are identified, actionable security findings are generated, aggregated in Amazon Inspector, and pushed to Security Hub and Amazon EventBridge to automate workflows.
  • Amazon GuardDuty now offers threat detection for Amazon Aurora to identify potential threats to data stored in Aurora databases. Currently in preview, Amazon GuardDuty RDS Protection profiles and monitors access activity to existing and new databases in your account, and uses tailored machine learning models to detect suspicious logins to Aurora databases. When a potential threat is detected, GuardDuty generates a security finding that includes database details and contextual information on the suspicious activity. GuardDuty is integrated with Aurora for direct access to database events without requiring you to modify your databases.
  • AWS Security Hub is now integrated with AWS Control Tower, allowing you to pair Security Hub detective controls with AWS Control Tower proactive or preventive controls and manage them together using AWS Control Tower. Security Hub controls are mapped to related control objectives in the AWS Control Tower control library, providing you with a holistic view of the controls required to meet a specific control objective. This combination of over 160 detective controls from Security Hub, with the AWS Control Tower built-in automations for multi-account environments, gives you a strong baseline of governance and off-the-shelf controls to scale your business using new AWS workloads and services. This combination of controls also helps you monitor whether your multi-account AWS environment is secure and managed in accordance with best practices, such as the AWS Foundational Security Best Practices standard.
  • We launched our Cloud Audit Academy (CAA) course for Federal and DoD Workloads (FDW) on AWS. This new course is a 12-hour interactive training based on NIST SP 800-171, with mappings to NIST SP 800-53 and the Cybersecurity Maturity Model Certification (CMMC) and covers AWS services relevant to each NIST control family. This virtual instructor-led training is industry- and framework-specific for our U.S. Federal and DoD customers.
  • AWS Wickr allows businesses and public sector organizations to collaborate more securely, while retaining data to help meet requirements such as e-discovery and Freedom of Information Act (FOIA) requests. AWS Wickr is an end-to-end encrypted enterprise communications service that facilitates one-to-one chats, group messaging, voice and video calling, file sharing, screen sharing, and more.
  • We introduced the Post-Quantum Cryptography hub that aggregates resources and showcases AWS research and engineering efforts focused on providing cryptographic security for our customers, and how AWS interfaces with the global cryptographic community.

Watch on demand

Were you unable to join the event in person? See the following for on-demand sessions.

Keynotes and leadership sessions

Watch the AWS re:Invent 2022 keynote where AWS Chief Executive Officer Adam Selipsky shares best practices for managing security, compliance, identity, and privacy in the cloud. You can also replay the other AWS re:Invent 2022 keynotes.

To learn about the latest innovations in cloud security from AWS and what you can do to foster a culture of security in your business, watch AWS Chief Information Security Officer CJ Moses’s leadership session with guest Deneen DeFiore, Chief Information Security Officer at United Airlines.

Breakout sessions and new launch talks

You can watch talks and learning sessions on demand to learn about the following topics:

  • See how AWS, customers, and partners work together to raise their security posture with AWS infrastructure and services. Learn about trends in identity and access management, threat detection and incident response, network and infrastructure security, data protection and privacy, and governance, risk, and compliance.
  • Dive into our launches! Hear from security experts on recent announcements. Learn how new services and solutions can help you meet core security and compliance requirements.

Consider joining us for more in-person security learning opportunities by saving the date for AWS re:Inforce 2023, which will be held June 13-14 in Anaheim, California. We look forward to seeing you there!

If you’d like to discuss how these new announcements can help your organization improve its security posture, AWS is here to help. Contact your AWS account team today.


Katie Collins

Katie is a Product Marketing Manager in AWS Security, where she brings her enthusiastic curiosity to deliver products that drive value for customers. Her experience also includes product management at both startups and large companies. With a love for travel, Katie is always eager to visit new places while enjoying a great cup of coffee.

Author

Himanshu Verma

Himanshu is a Worldwide Specialist for AWS Security Services. In this role, he leads the go-to-market creation and execution for AWS Security Services, field enablement, and strategic customer advisement. Prior to AWS, he held several leadership roles in product management, engineering, and development, working on various identity, information security, and data protection technologies. He obsesses over brainstorming disruptive ideas, venturing outdoors, photography, and trying various “hole in the wall” food and drink establishments around the globe.

Introducing the Security Design of the AWS Nitro System whitepaper

Post Syndicated from J.D. Bean original https://aws.amazon.com/blogs/security/introducing-the-security-design-of-the-aws-nitro-system-whitepaper/

AWS recently released a whitepaper on the Security Design of the AWS Nitro System. The Nitro System is a combination of purpose-built server designs, data processors, system management components, and specialized firmware that serves as the underlying virtualization technology that powers all Amazon Elastic Compute Cloud (Amazon EC2) instances launched since early 2018. With the Nitro System, AWS undertook an effort to reimagine the architecture of virtualization to deliver security, isolation, performance, cost savings, and a pace of innovation that our customers require.

This whitepaper is a detailed design document on the inner workings of the AWS Nitro System, and how we use it to help secure your most critical workloads. This is the first time that AWS has provided such a detailed design document on the Nitro System and how it offers a no-operator access design and strong tenant isolation. The whitepaper describes the security design of the Nitro System in detail to help you evaluate Amazon EC2 for your sensitive workloads.

Three key components of the Nitro System are used to implement this design:

  • Purpose-built Nitro Cards – Hardware devices designed by AWS that provide overall system control and I/O virtualization that is independent of the main system board with its CPUs and memory.
  • Nitro Security Chip – Enables a secure boot process for the overall system based on a hardware root of trust, the ability to offer bare metal instances, and defense-in-depth that offers protection to the server from unauthorized modification of system firmware.
  • Nitro Hypervisor – A deliberately minimized and firmware-like hypervisor designed to provide strong resource isolation, and performance that is nearly indistinguishable from a bare metal server.

The whitepaper describes the fundamental architectural change introduced by the Nitro System compared to previous approaches to virtualization. It discusses the three key components of the Nitro System, and provides a demonstration of how these components work together by walking through what happens when a new Amazon Elastic Block Store (Amazon EBS) volume is added to a running EC2 instance. The whitepaper also discusses how the Nitro System is designed to eliminate the possibility of administrator access to an EC2 server, the overall passive communications design of the Nitro System, and the Nitro System change management process. Finally, the paper surveys important aspects of the EC2 system design that provide mitigations against potential side-channel issues that can arise in compute environments.

The whitepaper dives deep into each of these considerations, offering a detailed picture of the Nitro System security design. For more information about cloud security at AWS, contact us.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

J.D. Bean

J.D. is a Principal Security Architect for Amazon EC2 based out of New York City. His interests include security, privacy, and compliance. He is passionate about his work enabling AWS customers’ successful cloud journeys. J.D. holds a Bachelor of Arts from The George Washington University and a Juris Doctor from New York University School of Law.

Getting started with AWS Glue Data Quality from the AWS Glue Data Catalog

Post Syndicated from Aniket Jiddigoudar original https://aws.amazon.com/blogs/big-data/getting-started-with-aws-glue-data-quality-from-the-aws-glue-data-catalog/

AWS Glue is a serverless data integration service that makes it simple to discover, prepare, and combine data for analytics, machine learning (ML), and application development. You can use AWS Glue to create, run, and monitor data integration and ETL (extract, transform, and load) pipelines and catalog your assets across multiple data stores.

Hundreds of thousands of customers use data lakes for analytics and machine learning to make data-driven business decisions. Data consumers lose trust in data if it isn’t accurate and recent, making data quality essential for sound and timely decision-making.

Evaluation of the accuracy and freshness of data is a common task for engineers. Currently, there are various tools available to evaluate data quality. However, these tools often require manual processes of data discovery and expertise in data engineering and coding.

We are pleased to announce the public preview launch of AWS Glue Data Quality. You can access this feature today in the available Regions without requesting any additional access. AWS Glue Data Quality is a new preview feature of AWS Glue that measures and monitors the data quality of Amazon S3-based data lakes and AWS Glue ETL jobs. It doesn’t require any expertise in data engineering or coding, and it simplifies the experience of monitoring and evaluating the quality of your data.

This is Part 1 of a four-part series of posts to explain how AWS Glue Data Quality works. Check out the next posts in the series:

Getting started with AWS Glue Data Quality

In this post, we will go over the simplicity of using the AWS Glue Data Quality feature by:

  1. Starting data quality recommendations and runs on your data in AWS Glue Data Catalog.
  2. Creating an Amazon CloudWatch alarm for getting notifications when data quality results are below a certain threshold.
  3. Analyzing your AWS Glue Data Quality run results through Amazon Athena.

Set up resources with AWS CloudFormation

The provided CloudFormation template creates the following resources for you:

  1. The IAM role required to run AWS Glue Data Quality runs
  2. An Amazon Simple Storage Service (Amazon S3) bucket to store the NYC Taxi dataset
  3. An S3 bucket to store and analyze the results of AWS Glue Data Quality runs
  4. An AWS Glue database and table created from the NYC Taxi dataset

Steps:

  1. Open the AWS CloudFormation console.
  2. Choose Create stack and then select With new resources (standard).
  3. For Template source, choose Upload a template file, provide the attached template file, and then choose Next.
  4. For Stack name, DataQualityDatabase, and DataQualityTable, leave the default values. For DataQualityS3BucketName, enter the name of your S3 bucket. Then choose Next.
  5. On the final screen, acknowledge that this stack creates IAM resources for you, and choose Submit.
  6. Once the stack is successfully created, navigate to the S3 bucket created by the stack and upload the yellow_tripdata_2022-01.parquet file.
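If you prefer to script this setup, the following is a minimal boto3 sketch of the same steps. The stack name, template file name, and bucket name are placeholders (not from this post); only the DataQualityS3BucketName parameter key comes from the template described above.

import boto3

cloudformation = boto3.client("cloudformation")
s3 = boto3.client("s3")

stack_name = "glue-data-quality-blog"      # placeholder stack name
bucket_name = "my-data-quality-bucket"     # placeholder; becomes DataQualityS3BucketName

# Create the stack from a local copy of the template and wait until it finishes.
cloudformation.create_stack(
    StackName=stack_name,
    TemplateBody=open("glue-data-quality-template.yaml").read(),  # placeholder file name
    Parameters=[{"ParameterKey": "DataQualityS3BucketName", "ParameterValue": bucket_name}],
    Capabilities=["CAPABILITY_NAMED_IAM"],  # the stack creates an IAM role
)
cloudformation.get_waiter("stack_create_complete").wait(StackName=stack_name)

# Upload the NYC Taxi sample file to the bucket the stack created.
s3.upload_file("yellow_tripdata_2022-01.parquet", bucket_name, "yellow_tripdata_2022-01.parquet")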

Start an AWS Glue Data Quality run on your data in AWS Glue Data Catalog

In this first section, we will generate data quality rule recommendations from the AWS Glue Data Quality service. Using these recommendations, we will then run a data quality task against our dataset to obtain an analysis of our data.

To get started, complete the following steps:

  1. Open the AWS Glue console.
  2. Choose Tables under Data Catalog.
  3. Select the DataQualityTable table created via the CloudFormation stack.
  4. Select the Data quality tab.
  5. Choose Recommend ruleset.
  6. On the Recommend data quality rules page, check Save recommended rules as a ruleset. This will allow us to save the recommended rules in a ruleset automatically, for use in the next steps.
  7. For IAM Role, choose the IAM role that was created from the CloudFormation stack.
  8. For Additional configurations – optional, leave the default number of workers and timeout.
  9. Choose Recommend ruleset. This will start a data quality recommendation run with the given number of workers.
  10. Wait for the recommendation run to complete.
  11. Once completed, navigate back to the Rulesets tab. You should see a successful recommendation run and a ruleset created.

Understand AWS Glue Data Quality recommendations

AWS Glue Data Quality recommendations are suggestions generated by the AWS Glue Data Quality service based on the shape of your data. These recommendations automatically take into account aspects of your data such as row counts, means, and standard deviations, and generate a set of rules for you to use as a starting point.

The dataset used here was the NYC Taxi dataset. Based on the columns in this dataset and the values in those columns, AWS Glue Data Quality recommends a set of rules. In total, the recommendation service took all the columns of the dataset into consideration and recommended 55 rules.

Some of these rules are:

  • RowCount between <> and <> → Expect the number of rows to fall within a range based on the data the service saw
  • ColumnValues “VendorID” in [ ] → Expect the “VendorID” column values to be within a specific set of values
  • IsComplete “VendorID” → Expect the “VendorID” column values to be non-null

How do I use the recommended AWS Glue Data Quality rules?

  1. From the Rulesets section, you should see your generated ruleset. Select the generated ruleset, and choose Evaluate ruleset.
    • If you didn’t check the box to Save recommended rules as a ruleset when you ran the recommendation, you can still choose the recommendation task run and copy the rules to create a new ruleset.
  2. For Data quality actions under Data quality properties, select Publish metrics to Amazon CloudWatch. If this box isn’t checked, the data quality run will not publish metrics to Amazon CloudWatch.
  3. For IAM role, select the GlueDataQualityBlogRole created in the AWS CloudFormation stack.
  4. For Requested number of workers under Advanced properties, leave as default. 
  5. For Data quality results location, select the value of the GlueDataQualityResultsS3Bucket location that was created via the AWS CloudFormation stack.
  6. Choose Evaluate ruleset.
  7. Once the run begins, you can see the status of the run on the Data quality results tab.
  8. After the run reaches a successful stage, select the completed data quality task run, and view the data quality results shown in Run results.

Our recommendation service suggested that we enforce 55 rules, based on the column values and the data within our NYC Taxi dataset. We then converted the collection of 55 rules into a ruleset and ran a data quality evaluation task run using that ruleset against our dataset. In our results above, we see the status of each rule within the ruleset.

You can also utilize the AWS Glue Data Quality APIs to carry out these steps.
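For example, the console steps above map roughly to the following boto3 sketch. The database, table, role ARN, and ruleset name are placeholders, and the exact request fields may differ slightly from your SDK version, so treat this as an illustration rather than a drop-in script.

import boto3

glue = boto3.client("glue")

# Placeholders -- substitute the database/table and IAM role created by the CloudFormation stack.
data_source = {"GlueTable": {"DatabaseName": "<your_database>", "TableName": "<your_table>"}}
role_arn = "arn:aws:iam::111122223333:role/GlueDataQualityBlogRole"

# 1. Generate rule recommendations and save them as a ruleset.
#    (The recommendation run is asynchronous; wait for it to finish before step 2.)
glue.start_data_quality_rule_recommendation_run(
    DataSource=data_source,
    Role=role_arn,
    CreatedRulesetName="nyc-taxi-recommended-ruleset",
)

# 2. Evaluate the saved ruleset against the table, publishing metrics to CloudWatch
#    and writing results to S3.
run = glue.start_data_quality_ruleset_evaluation_run(
    DataSource=data_source,
    Role=role_arn,
    RulesetNames=["nyc-taxi-recommended-ruleset"],
    AdditionalRunOptions={
        "CloudWatchMetricsEnabled": True,
        "ResultsS3Prefix": "s3://<GlueDataQualityResultsS3Bucket>/",
    },
)

# 3. Once the evaluation run succeeds, fetch the per-rule outcomes.
status = glue.get_data_quality_ruleset_evaluation_run(RunId=run["RunId"])
if status["Status"] == "SUCCEEDED":
    for result_id in status["ResultIds"]:
        print(glue.get_data_quality_result(ResultId=result_id)["RuleResults"])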

Get Amazon SNS notifications for my failing data quality runs through Amazon CloudWatch alarms

Each AWS Glue Data Quality evaluation run from the Data Catalog emits a pair of metrics: glue.data.quality.rules.passed (the number of rules that passed) and glue.data.quality.rules.failed (the number of rules that failed). These metrics can be used to create alarms that alert users if a given data quality run falls below a threshold.

To get started with setting up an alarm that would send an email via an Amazon SNS notification, follow the steps below:

  1. Open Amazon CloudWatch console.
  2. Choose All metrics under Metrics. You will see an additional namespace under Custom namespaces titled Glue Data Quality.

Note: When starting an AWS Glue Data Quality run, make sure the Publish metrics to Amazon CloudWatch checkbox is enabled. Otherwise, metrics for that particular run will not be published to Amazon CloudWatch.

  1. Under the Glue Data Quality namespace, you should be able to see metrics emitted per table, per ruleset. For the purposes of this blog, we will use the glue.data.quality.rules.failed metric and alarm when its value reaches 1 or more (that is, if we see one or more failed rule evaluations, we would like to be notified).
  2. In order to create the alarm, choose All alarms under Alarms.
  3. Choose Create alarm.
  4. Choose Select metric.
  5. Select the glue.data.quality.rules.failed metric corresponding to the table you’ve created, then choose Select metric.
  6. Under the Specify metric and conditions tab, under the Metrics section:
    1. For Statistic, select Sum.
    2. For Period, select 1 minute.
  7. Under the Conditions section:
    1. For Threshold type, choose Static.
    2. For Whenever glue.data.quality.rules.failed is…, select Greater/Equal.
    3. For than…, enter 1 as the threshold value.
    4. Expand the Additional configurations dropdown and select Treat missing data as good.

These selections imply that if the glue.data.quality.rules.failed metric emits a value greater than or equal to 1, we will trigger an alarm. However, if there is no data, we will treat it as acceptable.

  1. Choose Next.
  2. On Configure actions:
    1. For the Alarm state trigger section, select In alarm.
    2. For Send a notification to the following SNS topic, choose Create a new topic to send a notification via a new SNS topic.
    3. For Email endpoints that will receive the notification…, enter your email address. Choose Next.
  3. For Alarm name, enter myFirstDQAlarm, then choose Next.
  4. Finally, you should see a summary of all the selections on the Preview and create screen. Choose Create alarm at the bottom.
  5. You should now be able to see the alarm being created from the Amazon CloudWatch alarms dashboard.
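The same alarm can also be created programmatically. The following boto3 sketch assumes the metric dimensions you see in the console (copy their exact names and values from the Glue Data Quality namespace); the topic name and email address are placeholders.

import boto3

cloudwatch = boto3.client("cloudwatch")
sns = boto3.client("sns")

# SNS topic plus an email subscription for the notification (placeholder address).
topic_arn = sns.create_topic(Name="glue-dq-alarm-topic")["TopicArn"]
sns.subscribe(TopicArn=topic_arn, Protocol="email", Endpoint="you@example.com")

# Alarm whenever one or more rules fail in a data quality run.
cloudwatch.put_metric_alarm(
    AlarmName="myFirstDQAlarm",
    Namespace="Glue Data Quality",
    MetricName="glue.data.quality.rules.failed",
    Dimensions=[{"Name": "<dimension_name>", "Value": "<dimension_value>"}],  # copy from the console
    Statistic="Sum",
    Period=60,                        # 1 minute
    EvaluationPeriods=1,
    Threshold=1,
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    TreatMissingData="notBreaching",  # "treat missing data as good"
    AlarmActions=[topic_arn],
)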

To demonstrate AWS Glue Data Quality alarms, we will go over a real-world scenario in which corrupted data is ingested, and show how the AWS Glue Data Quality service, together with the alarm we created in the previous steps, notifies us. For this purpose, we will use the provided file malformed_yellow_tripdata.parquet, which contains data that has been tweaked on purpose.

  1. Navigate to the S3 location DataQualityS3BucketName mentioned in the CloudFormation template supplied at the beginning of the blog post.
  2. Upload the malformed_yellow_tripdata.parquet file to this location. This will help us simulate a flow where we have a file with poor data quality coming into our data lakes via our ETL processes.
  3. Navigate to the AWS Glue Data Catalog console, select the demo_nyc_taxi_data_input table that was created via the provided AWS CloudFormation template, and then navigate to the Data quality tab.
  4. Select the ruleset we created in the first section, then select Evaluate ruleset.
  5. From the Evaluate data quality screen:
    1. Check the box to Publish metrics to Amazon CloudWatch. This checkbox is needed to ensure that the failure metrics are emitted to Amazon CloudWatch.
    2. Select the IAM role created via the AWS CloudFormation template.
    3. Optionally, select an S3 location to publish your AWS Glue Data Quality results.
    4. Select Evaluate ruleset.
  6.  Navigate to the Data Quality results tab. You should now see two runs, one from the previous steps of this blog and one that we currently triggered. Wait for the current run to complete.
  7. As you see, we have a failed AWS Glue Data Quality run result, with only 52 of our original 55 rules passing. These failures are attributed to the new file we uploaded to S3.
  8. Navigate to the Amazon CloudWatch console and select the alarm we created at the beginning of this section.
  9. As you can see, we configured the alarm to fire every time the glue.data.quality.rules.failed metric reaches the threshold of 1. After the above AWS Glue Data Quality run, we see 3 rules failing, which triggered the alarm. You should also have received an email detailing the alarm’s firing.

We have thus demonstrated an example where malformed data coming into our data lakes can be identified via AWS Glue Data Quality rules, and subsequent alerting mechanisms can be created to notify the appropriate personas.

Analyze your AWS Glue Data Quality run results through Amazon Athena

In scenarios where you have multiple AWS Glue Data Quality run results against a dataset over a period of time, you might want to track the trends of the dataset’s quality. To achieve this, we can export our AWS Glue Data Quality run results to S3 and use Amazon Athena to run analytical queries against the exported runs. The results can then be used in Amazon QuickSight to build dashboards that give a graphical representation of your data quality trends.

In this third section, we will walk through the steps needed to start tracking your dataset’s quality:

  1. For our data quality runs that we set up in the previous sections, we set the Data quality results location parameter to the bucket location specified by the AWS CloudFormation stack.
  2. After each successful run, you should see a single JSONL file being exported to your selected S3 location, corresponding to that particular run.
  3. Open the Amazon Athena console.
  4. In the query editor, run the following CREATE TABLE statement (replace the <my_table_name> with a relevant value, and <GlueDataQualityResultsS3Bucket_from_cfn> section with the GlueDataQualityResultsS3Bucket value from the provided AWS CloudFormation template):
    CREATE EXTERNAL TABLE `<my_table_name>`(
    `catalogid` string,
    `databasename` string,
    `tablename` string,
    `dqrunid` string,
    `evaluationstartedon` timestamp,
    `evaluationcompletedon` timestamp,
    `rule` string,
    `outcome` string,
    `failurereason` string,
    `evaluatedmetrics` string)
    PARTITIONED BY (
    `year` string,
    `month` string,
    `day` string)
    ROW FORMAT SERDE
    'org.openx.data.jsonserde.JsonSerDe'
    WITH SERDEPROPERTIES (
    'paths'='catalogId,databaseName,dqRunId,evaluatedMetrics,evaluationCompletedOn,evaluationStartedOn,failureReason,outcome,rule,tableName')
    STORED AS INPUTFORMAT
    'org.apache.hadoop.mapred.TextInputFormat'
    OUTPUTFORMAT
    'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
    LOCATION
    's3://<GlueDataQualityResultsS3Bucket_from_cfn>/'
    TBLPROPERTIES (
    'classification'='json',
    'compressionType'='none',
    'typeOfData'='file')
    
    MSCK REPAIR TABLE `<my_table_name>`

  5. Once the above table is created, you should be able to run queries to analyze your data quality results.

For example, consider the following query that shows me the failed AWS Glue Data Quality runs against my table demo_nyc_taxi_data_input within a time window:

SELECT * from "<my_table_name>"
WHERE "outcome" = 'Failed'
AND "tablename" = '<my_source_table>'
AND "evaluationcompletedon" between
parse_datetime('2022-12-05 17:00:00:000', 'yyyy-MM-dd HH:mm:ss:SSS') AND parse_datetime('2022-12-05 20:00:00:000', 'yyyy-MM-dd HH:mm:ss:SSS');

The output of the above query shows me details about all the runs with “outcome” = ‘Failed’ that ran against my NYC Taxi dataset table (“tablename” = ‘demo_nyc_taxi_data_input’). The output also gives me information about the failure reason (failurereason) and the values each rule was evaluated against (evaluatedmetrics).

As you can see, we are able to get detailed information about our AWS Glue Data Quality runs via the run results uploaded to S3, perform more detailed analysis, and build dashboards on top of the data.
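If you want to run this kind of analysis programmatically rather than in the Athena console, a minimal boto3 sketch looks like the following. The database, table, and query result location are placeholders.

import time
import boto3

athena = boto3.client("athena")

query = """
SELECT rule, failurereason, evaluatedmetrics
FROM "<my_table_name>"
WHERE "outcome" = 'Failed' AND "tablename" = '<my_source_table>'
"""

execution = athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={"Database": "default"},                            # placeholder database
    ResultConfiguration={"OutputLocation": "s3://<athena-results-bucket>/"},  # placeholder bucket
)
query_id = execution["QueryExecutionId"]

# Poll until the query finishes, then print each row of the result set.
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(2)

if state == "SUCCEEDED":
    for row in athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]:
        print([col.get("VarCharValue") for col in row["Data"]])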

Clean up

  • Navigate to the Amazon Athena console and delete the table created for data quality analysis.
  • Navigate to the Amazon CloudWatch console and delete the alarms created.
  • If you deployed the sample CloudFormation stack, delete the CloudFormation stack via the AWS CloudFormation console. You will need to empty the S3 bucket before you delete the bucket.
  • If you have enabled your AWS Glue Data Quality runs to output to S3, empty those buckets as well.

Conclusion

In this post, we talked about the ease and speed of incorporating data quality rules using the AWS Glue Data Quality feature, into your AWS Glue Data Catalog tables. We also talked about how to run recommendations and evaluate data quality against your tables. We then discussed analyzing the data quality results via Amazon Athena, and the process for setting up alarms via Amazon CloudWatch in order to notify users of failed data quality.

To dive into the AWS Glue Data Quality APIs, take a look at the AWS Glue Data Quality API documentation.
To learn more about AWS Glue Data Quality, check out the AWS Glue Data Quality Developer Guide.


About the authors

Aniket Jiddigoudar is a Big Data Architect on the AWS Glue team.

Joseph Barlan is a Frontend Engineer at AWS Glue. He has over 5 years of experience helping teams build reusable UI components and is passionate about frontend design systems. In his spare time, he enjoys pencil drawing and binge-watching TV shows.

Authority to operate (ATO) on AWS Program now available for customers in Spain

Post Syndicated from Greg Herrmann original https://aws.amazon.com/blogs/security/authority-to-operate-on-aws-program-now-available-for-customers-in-spain/

Meeting stringent security and compliance requirements in regulated or public sector environments can be challenging and time consuming, even for organizations with strong technical competencies. To help customers navigate the different requirements and processes, we launched the ATO on AWS Program in June 2019 for US customers. The program involves a community of expert AWS partners to help support and accelerate customers’ ability to meet their security and compliance obligations.

We’re excited to announce that we have now expanded the ATO on AWS Program to Spain. As part of the launch in Spain, we recruited and vetted five partners with a demonstrated competency in helping customers meet Spanish and European Union (EU) regulatory compliance and security requirements, such as the General Data Protection Regulation (GDPR), Esquema Nacional de Seguridad (ENS), and European Banking Authority guidelines.

How does the ATO on AWS Program support customers?

The primary offering of the ATO on AWS Program is access to a community of vetted, expert partners that specialize in customers’ authorization needs, whether it be architecting, configuring, deploying, or integrating tools and controls. The team also provides direct engagement activities to introduce you to publicly available and no-cost resources, tools, and offerings so you can work to meet your security obligations on AWS. These activities include one-on-one meetings, answering questions, technical workshops (in specific cases), and more.

Who are the partners?

Partners in the ATO on AWS Program go through a rigorous evaluation conducted by a team of AWS Security and Compliance experts. Before acceptance into the program, the partners complete a checklist of criteria and provide detailed evidence that they meet those criteria.

Our initial launch in Spain includes the following five partners that have successfully met the criteria to join the program. Each partner has also achieved the Esquema Nacional de Seguridad certification.

  • ATOS – a global leader in digital transformation, cybersecurity, and cloud and high performance computing. ATOS was ranked #1 in Managed Security Services (MSS) revenue by Gartner in 2021.
  • Indra Sistemas – a global technology and consulting company that provides proprietary solutions for the transport and defense markets. It also offers digital transformation consultancy and information technologies in Spain and Latin America through its affiliate Minsait.
  • NTT Data EMEAL – an operational company created from an alliance between everis and NTT DATA EMEA to support clients in Europe and Latin America. NTT Data EMEAL supports customers through strategic consulting and advisory services, new technologies, applications, infrastructure, IT modernization, and business process outsourcing (BPO).
  • Telefónica Tech – a leading company in digital transformation. Telefónica Tech combines cybersecurity and cloud technologies to help simplify technological environments and build appropriate solutions for customers.
  • T-Systems – a leading service provider for the public sector in Spain. As an AWS Premier Tier Services Partner and Managed Service Provider, T-Systems maintains the Security and Migration Competencies, supporting customers with migration and secure operation of applications.

For a complete list of ATO on AWS Program partners, see the ATO on AWS Partners page.

Engage the ATO on AWS Program

Customers seeking support can engage the ATO on AWS Program and our partners in multiple ways. The best way to reach us is to complete a short, online ATO on AWS Questionnaire so we can learn more about your timeline and goals. If you prefer to engage AWS partners directly, see the complete list of our partners and their contact information at ATO on AWS Partners.

 
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Greg Herrmann

Greg has worked in the security and compliance field for over 18 years, supporting both classified and unclassified workloads for U.S. federal and DoD customers. He has been with AWS for more than 6 years as a Senior Security Partner Strategist for the Security and Compliance Partner Team, working with AWS partners and customers to accelerate security and compliance processes.

Borja Larrumbide

Borja is a Security Assurance Manager for AWS in Spain and Portugal. Previously, he worked at companies such as Microsoft and BBVA in different roles and sectors. Borja is a seasoned security assurance practitioner with years of experience engaging key stakeholders at national and international levels. His areas of interest include security, privacy, risk management, and compliance.

How to use Amazon Verified Permissions for authorization

Post Syndicated from Jeremy Ware original https://aws.amazon.com/blogs/security/how-to-use-amazon-verified-permissions-for-authorization/

Applications with multiple users and shared data require permissions management. The permissions describe what each user of an application is permitted to do. Permissions are defined as allow or deny decisions for resources in the application.

To manage permissions, developers often combine attribute-based access control (ABAC) and role-based access control (RBAC) models with custom code coupled with business logic. This requires a review of the code to understand the permissions, and changes to the code to modify the permissions. Auditing permissions within an application can require the same level of time and effort as a full application code review. This can delay delivery and require additional time and resources to ascertain permissions across your application.

In this post, I will show you how to use Amazon Verified Permissions to define permissions within custom applications using the Cedar policy language. I’ll also show you how authorization requests are made.

Overview of Amazon Verified Permissions

Amazon Verified Permissions provides a prebuilt, flexible permissions system that you can use to build permissions based on both ABAC and RBAC in your applications. You define and manage fine-grained permissions using both permit policies, which grant permissions, and forbid policies, which restrict actions. This lets you focus on building or modernizing the application.

Amazon Verified Permissions maintains a centralized policy store, which helps you manage permissions throughout an application, authorize actions, and analyze permissions with automated reasoning. It also has an evaluation simulator tool to help you test your authorization decisions and author policies.

Policy creation

To author policies with Amazon Verified Permissions, use the purpose-built Cedar policy language to create specific permission policies that include traits of ABAC and RBAC. This allows you to apply granularity with least privilege in mind.

The following figure shows a permission policy for a document management application. In the figure, between the set of parentheses on lines 1-4 of the policy, RBAC is used, based on the principal’s UserGroup, to limit the permit action to registered users—and not guest or machine principals, for example. Between the brackets on lines 5–7 of the policy, ABAC is used, where resource.owner == principal limits access to the resource to only the owner.

Figure 1: Using the Cedar policy language to create permissions

Policies are developed in two ways:

  • Developers build out policies as part of the deployment of the application – Policy permissions that are defined as part of deployment are a great way for developers to set up guardrails on actions that should not cross set boundaries.
  • Policies are created through the use of the application by end users – Policy permissions that are configurable within the application provide the freedom for data to be shared between users.

We will walk you through these two approaches in the following sections.

Create policies as part of the deployment of the application

The following figure shows how a developer can configure a permit policy as part of the deployment of an application.

Figure 2: Creating policies as part of the deployment of the application

Configuring policies with predefined permissions and deploying them alongside the application is a familiar method for setting up guardrails in an application. Consider the document management application shown in Figure 3. There is a permit policy in place that allows users to view their own documents. Without a policy, the default result is a deny. You should also configure explicit forbid policies to act as guardrails that prevent overly permissive policies. In Figure 3, the policy restricts a user to only GET documents that they own or that are not tagged as private.

Figure 3: Example of a permit policy using Cedar

Create policies within the application by end users

The following figure shows how end users can apply policies within the application.

Figure 4: How permissions can be applied using policies for application end users

In a document sharing application, the application usually provides a simple end-user experience with a menu containing point-and-click actions that allow the user to select predefined permissions, such as read, write, or delete. Abstracted by the application, these permissions are transformed into Amazon Verified Permissions policy statements and stored in the designated policy location for the application. When an end user tries to take actions protected by these permissions, the application queries the Amazon Verified Permissions backend to determine if the principal in question has permissions to do so.
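As a rough illustration of that query, the following sketch calls the IsAuthorized API with boto3. The policy store ID and the entity type and ID values are placeholders, and the parameter shapes may differ slightly depending on your SDK version.

import boto3

avp = boto3.client("verifiedpermissions")

# Ask Verified Permissions whether a given user may GET a given document.
response = avp.is_authorized(
    policyStoreId="<policy-store-id>",                            # placeholder
    principal={"entityType": "User", "entityId": "alice"},        # placeholder entity
    action={"actionType": "Action", "actionId": "GET"},           # placeholder action
    resource={"entityType": "Document", "entityId": "doc-123"},   # placeholder resource
)

if response["decision"] == "ALLOW":
    print("Request permitted")   # the application serves the document
else:
    print("Request denied")      # the application returns an access-denied error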

You can allow users of the application to create policies directly with respect to their given environments or current permissions. For example, if the application is targeted to system administrators or engineers who are technically proficient, you might choose not to hide the policy generation process behind a UI. The Amazon Verified Permissions policy grammar is designed for users comfortable with text-based query languages. Figure 5 shows an example policy that allows a user to GET or POST documents that they own.

Figure 5: Amazon Verified Permissions policy grammar written with Cedar to define permissions

Conclusion

Amazon Verified Permissions is a scalable, fine-grained permissions management and authorization service that helps you build and modernize applications without relying heavily on coding authorization within your applications. By using the Cedar policy language, you can define granular access controls that use both RBAC and ABAC and help end users create policies within the application. This allows for alignment of authorization standards across applications and provides clear visibility into existing permissions for review and auditability.

To learn more about ABAC and RBAC and how to design policy statements, see the blog post Get the best out of Amazon Verified Permissions by using fine-grained authorization methods.

 
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Jeremy Ware

Jeremy is a Security Specialist Solutions Architect focused on Identity and Access Management. Jeremy and his team enable AWS customers to implement sophisticated, scalable, and secure IAM architecture and authentication workflows to solve business challenges. With a background in security engineering, Jeremy has spent many years working to raise the security maturity of numerous global enterprises. Outside of work, Jeremy loves to explore the mountainous outdoors and participate in sports such as snowboarding, wakeboarding, and dirt bike riding.

AWS achieves GNS Portugal certification for classified information

Post Syndicated from Rodrigo Fiuza original https://aws.amazon.com/blogs/security/aws-achieves-gns-portugal-certification-for-classified-information/

We continue to expand the scope of our assurance programs at Amazon Web Services (AWS), and we are pleased to announce that our Regions and AWS Edge locations in Europe are now certified by the Portuguese GNS/NSO (National Security Office) at the National Restricted level. This certification demonstrates our ongoing commitment to adhere to the heightened expectations for cloud service providers to process, transmit, and store classified data.

The GNS certification is based on NIST SP800-53 R4 and CSA CCM v4 frameworks, with the goal of protecting the processing and transmission of classified information.

AWS was evaluated by Adyta Lda, an independent third-party auditor, and by GNS Portugal. The Certificate of Compliance illustrating the compliance status of AWS is available on the GNS Certifications page and through AWS Artifact. AWS Artifact is a self-service portal for on-demand access to AWS compliance reports. Sign in to AWS Artifact in the AWS Management Console, or learn more at Getting Started with AWS Artifact.

As of this writing, 26 services offered in Europe are in scope of this certification. For up-to-date information, including when additional services are added, see the AWS Services in Scope by Compliance Program and select GNS.

AWS strives to continuously bring services into the scope of its compliance programs to help you meet your architectural and regulatory needs. If you have questions or feedback about GNS Portugal compliance, reach out to your AWS account team.

To learn more about our compliance and security programs, see AWS Compliance Programs. As always, we value your feedback and questions; reach out to the AWS Compliance team through the Contact Us page.

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Rodrigo Fiuza

Rodrigo is a Security Audit Manager at AWS, based in São Paulo. He leads audits, attestations, certifications, and assessments across Latin America, the Caribbean, and Europe. Rodrigo has worked in risk management, security assurance, and technology audits for the past 12 years.

Approaches for authenticating external applications in a machine-to-machine scenario

Post Syndicated from Patrick Sard original https://aws.amazon.com/blogs/security/approaches-for-authenticating-external-applications-in-a-machine-to-machine-scenario/

December 8, 2022: This post has been updated to reflect changes for M2M options with the new service of IAMRA. This blog post was first published November 19, 2013.

August 10, 2022: This blog post has been updated to reflect the new name of AWS Single Sign-On (SSO) – AWS IAM Identity Center. Read more about the name change here.


Amazon Web Services (AWS) supports multiple authentication mechanisms (AWS Signature v4, OpenID Connect, SAML 2.0, and more), which are essential in providing secure access to AWS resources. However, in a strictly machine-to-machine (M2M) scenario, not all of them are a good fit. In these cases, a human is not present to provide user credential input. An example of such a scenario is when an on-premises application sends data to an AWS environment, as shown in Figure 1.

This post is designed to help you decide which approach is best to securely connect your applications, either residing on premises or hosted outside of AWS, to your AWS environment when no human interaction comes into play. We will go through the various alternatives available and highlight the pros and cons of each.

Figure 1: Securely connect your external applications to AWS in machine-to-machine scenarios

Determining the best approach

Let’s start by looking at possible authentication mechanisms that AWS supports in the following table. We’ll first identify the AWS service or services where the authentication can be set up—called the AWS front-end service. Then we’ll point out the AWS service that actually handles the authentication with AWS in the background—called the AWS backend service. We will also assess each mechanism based on use case.

Table 1: Authentication mechanisms available in AWS

| Authentication mechanism | AWS front-end service | AWS backend service | Good for M2M communication? |
| AWS Signature v4 | All | AWS Security Token Service (AWS STS) | Yes |
| Mutual TLS | AWS IoT Core, Amazon API Gateway | AWS STS | Yes |
| OpenID Connect | Amazon Cognito, Amazon API Gateway | AWS STS | Yes |
| SAML | Amazon Cognito, AWS Identity and Access Management (IAM) | AWS STS | Yes |
| Kerberos | n/a | AWS STS | Yes |
| Microsoft Active Directory communication | AWS IAM Identity Center | AWS STS | No |
| IAM Roles Anywhere | AWS Identity and Access Management (IAM) | AWS STS | Yes |

We’ll now review each of these alternatives and also evaluate two additional characteristics on a 5-grade scale (from very low to very high) for each authentication mechanism:

  • Complexity: How complex is it to implement the authentication mechanism?
  • Convenience: How convenient is it to use the authentication mechanism on an ongoing basis?

As you’ll see, not all of the mechanisms are necessarily a good fit for a machine-to-machine scenario. Our focus here is on authentication of external applications, but not authentication of servers or other computers or Internet of Things (IoT) devices, which has already been documented extensively.

Active Directory–based authentication is available through either AWS IAM Identity Center or a limited set of AWS services and is meant in both cases to provide end users with access to AWS accounts and business applications. Active Directory–based authentication is also used broadly to authenticate devices such as Windows or Linux computers on a network. However, it isn’t used for authenticating applications with AWS. For that reason, we’ll exclude it from further scrutiny in this article.

Let’s look at the remaining authentication mechanisms one by one, with their respective pros and cons.

AWS Signature v4

The purpose of AWS Signature v4 is to authenticate incoming HTTP(S) requests to AWS services APIs. The AWS Signature v4 process is explained in detail in the documentation for the AWS APIs but, in a nutshell, the caller computes a signature using their credentials and then adds it to the header of the HTTP(S) request. On the other end, AWS accepts the request only if the provided signature is valid.

Figure 2: AWS Signature v4 authentication

Native to AWS, low in complexity and highly convenient, AWS Signature v4 is the natural choice for machine-to-machine authentication scenarios with AWS. It is used behind the scenes by the AWS Command Line Interface (AWS CLI) and the AWS SDKs.
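To make the mechanism concrete, here is a minimal sketch that signs a request by hand with botocore and sends it with the requests library; in practice the AWS SDKs and AWS CLI do this for you automatically.

import boto3
import requests
from botocore.auth import SigV4Auth
from botocore.awsrequest import AWSRequest

# Sign a simple STS GetCallerIdentity request with SigV4 and send it over HTTPS.
credentials = boto3.Session().get_credentials()

url = "https://sts.us-east-1.amazonaws.com/?Action=GetCallerIdentity&Version=2011-06-15"
request = AWSRequest(method="GET", url=url)
SigV4Auth(credentials, "sts", "us-east-1").add_auth(request)  # adds Authorization and X-Amz-Date headers

response = requests.get(url, headers=dict(request.headers))
print(response.status_code)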

Pros

  • AWS Signature v4 is very convenient: the signature is built into the SDKs provided by AWS and is automatically computed on the caller’s behalf. If you prefer not to use an SDK, the signature process is a simple computation that can be implemented in any programming language.
  • There are fewer credentials to manage. No need to manage tedious digital certificates or even long-lived AWS credentials, because the AWS Signature v4 process supports temporary AWS credentials.
  • There is no need to interact with a third-party identity provider: once the request is signed, you’re good to go, provided that the signature is valid.

Cons

  • If you prefer not to store long-lived AWS credentials for your on-premises applications, you must first perform authentication through a third-party identity provider to obtain temporary AWS credentials. This would require using either OpenID Connect or SAML, in addition to AWS Signature v4. You could also use IAM Roles Anywhere, which exchanges a trusted certificate for temporary AWS credentials.

Mutual TLS

Mutual TLS, more specifically the mutual authentication mechanism of the Transport Layer Security (TLS) Protocol, allows the authentication of both ends—the client and the server sides—of a communication channel. By default, the server side of the TLS channel is always authenticated. With mutual TLS, the clients must also present a valid X.509 certificate for their identity to be verified.

Amazon API Gateway has recently announced native support for mutual TLS authentication (see this blog post for more details on the new feature). You can enable mutual TLS authentication on custom domains to authenticate your regional REST and HTTP APIs (except for private or edge APIs, for which the new feature is not supported at the time of this writing).

Figure 3: Mutual TLS authentication

Mutual TLS can be both time-consuming and complicated to set up, but it is a widespread authentication mechanism.
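For the client side of such a call, a minimal Python sketch looks like the following; the API domain name and the certificate and key paths are placeholders.

import requests

# Call a REST API exposed through an API Gateway custom domain with mutual TLS enabled.
api_url = "https://api.example.com/orders"  # placeholder custom domain

response = requests.get(
    api_url,
    cert=("/path/to/client-cert.pem", "/path/to/client-key.pem"),  # client certificate and private key
)
print(response.status_code)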

Pros

  • Mutual TLS is widespread for IoT and business-to-business applications.

Cons

  • You need to manage the digital certificates and their lifecycles. This can add significant burden and complexity to your IT operations.
  • You also need, at an application level, to pay special care to revoked certificates to reduce the risk of misuse. Since API Gateway doesn’t automatically verify if a client certificate has been revoked, you have to implement your own logic to do so, such as by using a Lambda authorizer.

OpenID Connect

OpenID Connect (OIDC), specifically OIDC 1.0, is a standard built on top of the OAuth 2.0 authorization framework to provide authentication for mobile and web-based applications. The OIDC client authentication method can be used by a client application to gain access to APIs exposed through Amazon API Gateway. The client application typically authenticates to an OAuth 2.0 authorization server, such as Amazon Cognito or another solution supporting that standard. As a result, the client application obtains a JSON Web Token (JWT) from the OAuth 2.0 authorization server. API Gateway then allows or denies the request based on the JWT validation. For more information about the access control part of this process, see the Amazon API Gateway documentation.

Figure 4: OIDC client authentication

OIDC can be complex to put in place, but it’s a widespread authentication mechanism, especially for mobile and web applications and microservices architecture, including machine-to-machine scenarios.
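The client credentials flow described above can be sketched as follows; the token endpoint, client ID and secret, scope, and API URL are placeholders for values issued by your OAuth 2.0 authorization server (Amazon Cognito in this sketch) and your API.

import requests

# 1. Exchange the application's client credentials for a JWT at the token endpoint.
token_url = "https://<your-domain>.auth.<region>.amazoncognito.com/oauth2/token"  # placeholder
token_response = requests.post(
    token_url,
    data={"grant_type": "client_credentials", "scope": "orders/read"},            # placeholder scope
    auth=("<client_id>", "<client_secret>"),                                      # placeholder credentials
)
access_token = token_response.json()["access_token"]

# 2. Present the JWT to the API exposed through Amazon API Gateway, which validates it
#    before allowing or denying the request.
api_response = requests.get(
    "https://<api-id>.execute-api.<region>.amazonaws.com/orders",                 # placeholder API URL
    headers={"Authorization": f"Bearer {access_token}"},
)
print(api_response.status_code)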

Pros

  • With OIDC, you avoid storing long-lived AWS credentials for your on-premises applications.
  • OIDC uses REST or JSON message flows over HTTP, which makes it a particularly good fit (compared to SAML) for application developers today.

Cons

  • You need to store and maintain a set of credentials for each client application (such as client id and client secret) and make it accessible to the application. This can add complexity to your IT operations.

SAML

SAML 2.0 is an open standard for exchanging identity and security information between applications and service providers. SAML can be used to delegate authentication to a third-party identity provider, such as an Active Directory environment that is running on premises, and to gain access to AWS by providing a valid SAML assertion. (See About SAML 2.0-based federation to learn how to configure your AWS environment to leverage SAML 2.0.)

IAM validates the SAML assertion with your identity provider and, upon success, provides a set of AWS temporary credentials to the requesting party. The whole process is described in the IAM documentation.

Figure 5: SAML authentication

SAML can be complex to put in place, but it’s a versatile authentication mechanism that can fit a lot of different use cases, including machine-to-machine scenarios.
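Once your identity provider has produced a base64-encoded SAML assertion for the application, exchanging it for temporary AWS credentials is a single STS call, sketched below; the role and provider ARNs are placeholders.

import boto3

sts = boto3.client("sts")

saml_assertion_b64 = "<base64-encoded SAML assertion from your identity provider>"  # placeholder

response = sts.assume_role_with_saml(
    RoleArn="arn:aws:iam::111122223333:role/OnPremAppRole",        # placeholder role
    PrincipalArn="arn:aws:iam::111122223333:saml-provider/MyIdP",  # placeholder SAML provider
    SAMLAssertion=saml_assertion_b64,
)

# Use the temporary credentials to call AWS services directly through their APIs.
credentials = response["Credentials"]
s3 = boto3.client(
    "s3",
    aws_access_key_id=credentials["AccessKeyId"],
    aws_secret_access_key=credentials["SecretAccessKey"],
    aws_session_token=credentials["SessionToken"],
)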

Pros

  • With SAML, you not only avoid storing long-lived AWS credentials for your on-premises applications, but you can also use an existing on-premises directory, such as Active Directory, as an identity provider.
  • SAML doesn’t prescribe any particular technology or protocol by which the authentication should take place. The developer has total freedom to employ whichever is more convenient or makes more sense: key-based (such as X.509 certificates), ticket-based (such as Kerberos), or another applicable mechanism.
  • SAML is also a good fit when protocol bindings other than HTTP are needed.

Cons

  • Using SAML with AWS requires a third-party identity provider for your on-premises environment.
  • SAML also requires a trust to be established between your identity provider and your AWS environment, which adds more complexity to the process.
  • Because SAML is XML-based, it isn’t as concise or nimble as AWS Signature v4 or OIDC, for example.
  • You need to manage the SAML assertions and their lifecycles. This can add significant burden and complexity to your IT operations.

Kerberos

Initially developed by MIT, Kerberos v5 is an IETF standard protocol that enables client/server authentication on an unprotected network. It isn’t supported out-of-the-box by AWS, but you can use an identity provider, such as Active Directory, to exchange the Kerberos ticket provided to your application for either an OIDC/OAuth token or a SAML assertion that can be validated by AWS.

Figure 6: Kerberos authentication (through SAML or OIDC)

Kerberos is highly complex to set up, but it can make sense in cases where you already have an on-premises environment with Kerberos authentication in place.

Pros

  • With Kerberos, you not only avoid storing long-lived AWS credentials for your on-premises applications, but you can also use an existing on-premises directory, such as Active Directory, as an identity provider.

Cons

  • Using Kerberos with AWS requires the Kerberos ticket to be converted into something that can be accepted by AWS. Therefore, it requires you to use either the OIDC or SAML authentication mechanisms, as described previously.

IAM Roles Anywhere

IAM Roles Anywhere establishes a trust between your AWS account and the certificate authority (CA) that issues certificates to your on-premises workloads using public key infrastructure (PKI). For a detailed overview, see the blog post Extend AWS IAM roles to workloads outside of AWS with IAM Roles Anywhere. Your workloads outside of AWS use IAM Roles Anywhere to exchange X.509 certificates for temporary AWS credentials in order to interact with AWS APIs, thus removing the need for long-term credentials in your on-premises applications. IAM Roles Anywhere enables short-term credentials for numerous hybrid environment use cases, including machine-to-machine scenarios.

Figure 7: IAMRA authentication process

IAM Roles Anywhere is a versatile authentication mechanism that can fit a lot of different use cases, including machine-to-machine scenarios where your on-premises workload is accessing AWS resources.

Pros

  • With IAM Roles Anywhere you avoid storing long-lived AWS credentials for your on-premises workloads.
  • You can import a certificate revocation list (CRL) from your certificate authority (CA) to support certificate revocation.

Cons

  • You need to manage the digital certificates and their lifecycles. This can add complexity to your IT operations.
  • IAM Roles Anywhere does not support callbacks to CRL distribution points (CDPs) or Online Certificate Status Protocol (OCSP) endpoints.

Conclusion

Now we’ll collect and summarize this discussion in the following table, with the pros and cons of each approach.

| Authentication mechanism | AWS front-end service | Complexity | Convenience |
| AWS Signature v4 | All | Low | Very High |
| Mutual TLS | AWS IoT Core, Amazon API Gateway | Medium | High |
| OpenID Connect | Amazon Cognito, Amazon API Gateway | Medium | High |
| SAML | Amazon Cognito, AWS Identity and Access Management (IAM) | High | Medium |
| Kerberos | n/a | Very High | Low |
| IAM Roles Anywhere | AWS Identity and Access Management (IAM) | Medium | High |

AWS Signature v4 is the most convenient and least complex mechanism of these options, but as for every situation, it’s important to start from your own requirements and context before making a choice. Additional factors may influence your choice, such as the structure or the culture of your organization, or the resources available for your project. Keeping the discussion focused on simple factors on purpose, we’ve come up with the following actionable decision helper.

Use AWS Signature v4 when:

  • You have access to AWS credentials (temporary or long-lived)
  • You want to call AWS services directly through their APIs

Use mutual TLS when:

  • The cost and effort of maintaining digital certificates is acceptable for your organization
  • Your organization already has a process in place to maintain digital certificates
  • You plan to call AWS services indirectly through custom-built APIs

Use OpenID Connect when:

  • You need or want to procure temporary AWS credentials by using a REST-based mechanism
  • You want to call AWS services directly through their APIs

Use SAML when:

  • You need to procure temporary AWS credentials
  • You already have a SAML-based authentication process in place
  • You want to call AWS services directly through their APIs

Use Kerberos when:

  • You already have a Kerberos-based authentication process in place
  • None of the previously mentioned mechanisms can be used for your use case

Use IAMRA when:

  • The cost and effort of maintaining digital certificates is acceptable for your organization
  • Your organization already has a process in place to maintain digital certificates
  • You want to call AWS services directly through their APIs
  • You need temporary security credentials for workloads such as servers, containers, and applications that run outside of AWS

We hope this post helps you find your way among the various alternatives that AWS offers to securely connect your external applications to your AWS environment, and to select the most appropriate mechanism for your specific use case. We look forward to your feedback.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on one of the AWS Developer forums or contact AWS Support.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Patrick Sard

Patrick works as a Solutions Architect at AWS. Apart from being a cloud enthusiast, Patrick loves practicing tai-chi (preferably Chen style), enjoys an occasional wine-tasting (he trained as a Sommelier), and is an avid tennis player.

Jeremy Ware

Jeremy is a Security Specialist Solutions Architect focused on Identity and Access Management. Jeremy and his team enable AWS customers to implement sophisticated, scalable, and secure IAM architecture and authentication workflows to solve business challenges. With a background in security engineering, Jeremy has spent many years working to raise the security maturity of numerous global enterprises. Outside of work, Jeremy loves to explore the mountainous outdoors and participate in sports such as snowboarding, wakeboarding, and dirt bike riding.

Amazon EMR launches support for Amazon EC2 C6i, M6i, I4i, R6i and R6id instances to improve cost performance for Spark workloads by 6–33%

Post Syndicated from Al MS original https://aws.amazon.com/blogs/big-data/amazon-emr-launches-support-for-amazon-ec2-c6i-m6i-i4i-r6i-and-r6id-instances-to-improve-cost-performance-for-spark-workloads-by-6-33/

Amazon EMR provides a managed service to easily run analytics applications using open-source frameworks such as Apache Spark, Hive, Presto, Trino, HBase, and Flink. The Amazon EMR runtime for Spark and Presto includes optimizations that provide more than twice the performance of open-source Apache Spark and Presto, so that your applications run faster and at lower cost.

With Amazon EMR release 6.8, you can now use Amazon Elastic Compute Cloud (Amazon EC2) instances such as C6i, M6i, I4i, R6i, and R6id, which use the third-generation Intel Xeon scalable processors. Using these new instances with Amazon EMR improves cost-performance by an additional 5–33% over previous generation instances.

In this post, we describe how we estimated the cost-performance benefit from using Amazon EMR with these new instances compared to using equivalent previous generation instances.

Amazon EMR runtime performance improvements with EC2 I4i instances

We ran TPC-DS 3 TB benchmark queries on Amazon EMR 6.8 using the Amazon EMR runtime for Apache Spark (compatible with Apache Spark 3.3) on five-node clusters of I4i instances with data in Amazon Simple Storage Service (Amazon S3), and compared the results to equivalently sized I3 instance clusters. We measured performance improvements using the total query runtime and the geometric mean of query runtime across the TPC-DS 3 TB benchmark queries.

Our results showed between 36.41–44.39% improvement in total query runtime performance on I4i instance EMR clusters compared to equivalent I3 instance EMR clusters, and between 36–45.2% improvement in geometric mean. To measure cost improvement, we added up the Amazon EMR and Amazon EC2 cost per instance per hour (on-demand) and multiplied it by the total query runtime. Note that I4i 32XL instances were not benchmarked because I3 instances don’t have the 32 XL size available. We observed between 22.56–33.1% reduced instance hour cost on I4i instance EMR clusters compared to equivalent I3 instance EMR clusters to run the TPC-DS benchmark queries. All TPC-DS queries ran faster on I4i instance clusters compared to I3 instance clusters.
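To make the cost arithmetic explicit, the following sketch reproduces the comparison for the 16 XL size using the runtimes and combined EC2 + EMR on-demand prices from the table that follows.

# Cost per instance = (EC2 + EMR price per hour) x total query runtime in hours.
def cost_per_instance(price_per_hour, runtime_seconds):
    return price_per_hour * runtime_seconds / 3600

i3_cost = cost_per_instance(5.260, 4752.15457)    # I3.16xlarge
i4i_cost = cost_per_instance(6.864, 2642.77407)   # I4i.16xlarge

reduction = (i4i_cost - i3_cost) / i3_cost
print(f"I3: ${i3_cost:.3f}  I4i: ${i4i_cost:.3f}  change: {reduction:.2%}")
# Roughly $6.943 vs. $5.039, about -27.4%, matching the table.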

The following table shows the results from running TPC-DS 3 TB benchmark queries using Amazon EMR 6.8 over equivalent I3 and I4i instance EMR clusters.

Instance Size 16 XL 8 XL 4 XL 2 XL XL
Number of core instances in EMR cluster 5 5 5 5 5
Total query runtime on I3 (seconds) 4752.15457 4506.43694 7110.03042 11853.40336 21333.05743
Total query runtime on I4I (seconds) 2642.77407 2812.05517 4415.0023 7537.52779 12981.20251
Total query runtime improvement with I4I 44.39% 37.60% 37.90% 36.41% 39.15%
Geometric mean query runtime on I3 (sec) 34.99551 29.14821 41.53093 60.8069 95.46128
Geometric mean query runtime on I4I (sec) 19.17906 18.65311 25.66263 38.13503 56.95073
Geometric mean query runtime improvement with I4I 45.20% 36.01% 38.21% 37.29% 40.34%
EC2 I3 instance price ($ per hour) $4.990 $2.496 $1.248 $0.624 $0.312
EMR I3 instance price ($ per hour) $0.270 $0.270 $0.270 $0.156 $0.078
(EC2 + EMR) I3 instance price ($ per hour) $5.260 $2.766 $1.518 $0.780 $0.390
Cost of running on I3 ($ per instance) $6.943 $3.462 $2.998 $2.568 $2.311
EC2 I4I instance price ($ per hour) $5.491 $2.746 $1.373 $0.686 $0.343
EMR I4I price ($ per hour per instance) $1.373 $0.687 $0.343 $0.172 $0.086
(EC2 + EMR) I4I instance price ($ per hour) $6.864 $3.433 $1.716 $0.858 $0.429
Cost of running on I4I ($ per instance) $5.039 $2.681 $2.105 $1.795 $1.546
Total cost reduction with I4I including performance improvement -27.43% -22.56% -29.79% -30.09% -33.10%

The following graph shows per query improvements we observed on I4i 2XL instances with EMR Runtime for Spark on Amazon EMR version 6.8 compared to equivalent I3 2XL instances for the TPC-DS 3 TB benchmark.

Amazon EMR runtime performance improvements with EC2 M6i instances

M6i instances showed a similar performance improvement while running Apache Spark workloads compared to equivalent M5 instances. Our test results showed between 13.45–29.52% improvement in total query runtime for seven different instance sizes within the instance family, and between 15.29–32.62% improvement in geometric mean. On cost comparison, we observed 7.98–25.37% reduced instance hour cost on M6i instance EMR clusters compared to M5 instance EMR clusters to run the TPC-DS benchmark queries.

The following table shows the results from running TPC-DS 3 TB benchmark queries using Amazon EMR 6.8 over equivalent M6i and M5 instance EMR clusters.

Instance Size 24 XL 16 XL 12 XL 8 XL 4 XL 2 XL XL
Number of core instances in EMR cluster 5 5 5 5 5 5 5
Total query runtime on M5 (seconds) 4027.58043 3782.10766 3348.05362 3516.4308 5621.22532 10075.45109 17278.15146
Total query runtime on M6I (seconds) 3106.43834 2665.70607 2714.69862 3043.5975 4195.02715 8226.88301 14515.50394
Total query runtime improvement with M6I 22.87% 29.52% 18.92% 13.45% 25.37% 18.35% 15.99%
Geometric mean query runtime M5 (sec) 30.45437 28.5207 23.95314 23.55958 32.95975 49.43178 75.95984
Geometric mean query runtime M6I (sec) 23.76853 19.21783 19.16869 19.9574 24.23012 39.09965 60.79494
Geometric mean query runtime improvement with M6I 21.95% 32.62% 19.97% 15.29% 26.49% 20.90% 19.96%
EC2 M5 instance price ($ per hour) $4.61 $3.07 $2.30 $1.54 $0.77 $0.38 $0.19
EMR M5 instance price ($ per hour) $0.27 $0.27 $0.27 $0.27 $0.19 $0.10 $0.05
(EC2 + EMR) M5 instance price ($ per hour) $4.88 $3.34 $2.57 $1.81 $0.96 $0.48 $0.24
Cost of running on M5 ($ per instance) $5.46 $3.51 $2.39 $1.76 $1.50 $1.34 $1.15
EC2 M6I instance price ($ per hour) $4.61 $3.07 $2.30 $1.54 $0.77 $0.38 $0.19
EMR M6I price ($ per hour per instance) $1.15 $0.77 $0.58 $0.38 $0.19 $0.10 $0.05
(EC2 + EMR) M6I instance price ($ per hour) $5.76 $3.84 $2.88 $1.92 $0.96 $0.48 $0.24
Cost of running on M6I ($ per instance) $4.97 $2.84 $2.17 $1.62 $1.12 $1.10 $0.97
Total cost reduction with M6I including performance improvement -8.92% -19.02% -9.28% -7.98% -25.37% -18.35% -15.99%

Amazon EMR runtime performance improvements with EC2 R6i instances

R6i instances showed a similar performance improvement while running Apache Spark workloads compared to equivalent R5 instances. Our test results showed between 14.25–32.23% improvement in total query runtime for six different instance sizes within the instance family, and between 16.12–36.5% improvement in geometric mean. R5.xlarge instances didn’t have sufficient memory to run TPC-DS benchmark queries, and weren’t included in this comparison. On cost comparison, we observed 5.48–23.5% reduced instance hour cost on R6i instance EMR clusters compared to R5 EMR instance clusters to run the TPC-DS benchmark queries.

The following table shows the results from running TPC-DS 3 TB benchmark queries using Amazon EMR 6.8 over equivalent R6i and R5 instance EMR clusters.

Instance Size 24 XL 16 XL 12 XL 8 XL 4 XL 2XL
Number of core instances in EMR cluster 5 5 5 5 5 5
Total query runtime on R5 (seconds) 4024.4737 3715.74432 3552.97298 3535.69879 5379.73168 9121.41532
Total query runtime on R6I (seconds) 2865.83169 2518.24192 2513.4849 3031.71973 4544.44854 6977.9508
Total query runtime improvement with R6I 28.79% 32.23% 29.26% 14.25% 15.53% 23.50%
Geometric mean query runtime R5 (sec) 30.59066 28.30849 25.30903 23.85511 32.33391 47.28424
Geometric mean query runtime R6I (sec) 21.87897 17.97587 17.54117 20.00918 26.6277 34.52817
Geometric mean query runtime improvement with R6I 28.48% 36.50% 30.69% 16.12% 17.65% 26.98%
EC2 R5 instance price ($ per hour) $6.0480 $4.0320 $3.0240 $2.0160 $1.0080 $0.5040
EMR R5 instance price ($ per hour) $0.2700 $0.2700 $0.2700 $0.2700 $0.2520 $0.1260
(EC2 + EMR) R5 instance price ($ per hour) $6.3180 $4.3020 $3.2940 $2.2860 $1.2600 $0.6300
Cost of running on R5 ($ per instance) $7.0630 $4.4403 $3.2510 $2.2452 $1.8829 $1.5962
EC2 R6I instance price ($ per hour) $6.0480 $4.0320 $3.0240 $2.0160 $1.0080 $0.5040
EMR R6I price ($ per hour per instance) $1.5120 $1.0080 $0.7560 $0.5040 $0.2520 $0.1260
(EC2 + EMR) R6I instance price ($ per hour) $7.5600 $5.0400 $3.7800 $2.5200 $1.2600 $0.6300
Cost of running on R6I ($ per instance) $6.0182 $3.5255 $2.6392 $2.1222 $1.5906 $1.2211
Total cost reduction with R6I including performance improvement -14.79% -20.60% -18.82% -5.48% -15.53% -23.50%

Amazon EMR runtime performance improvements with EC2 C6i instances

C6i instances showed a similar performance improvement while running Apache Spark workloads compared to equivalent C5 instances. Our test results showed between 12.61–21.09% improvement in total query runtime for four different instance sizes within the instance family, and between 14.57–20.35% improvement in geometric mean. Only C6i 24, 12, 4, and 2xlarge sizes were benchmarked because C5 doesn’t have 32, 16, and 8 xlarge sizes. C5.xlarge instances didn’t have sufficient memory to run TPC-DS benchmark queries, and weren’t included in this comparison. On cost comparison, we observed 5.93–13.62% reduced instance hour cost on C6i instance EMR clusters compared to equivalent C5 instance EMR clusters to run the TPC-DS benchmark queries.

The following table shows the results from running TPC-DS 3 TB benchmark queries using Amazon EMR 6.8 over equivalent C6i and C5 instance EMR clusters.

Instance Size * 24 XL 12 XL 4 XL 2 XL
Number of core instances in EMR cluster 5 5 5 5
Total query runtime on C5 (seconds) 3435.59808 2900.84981 5945.12879 10173.00757
Total query runtime on C6I (seconds) 2711.16147 2471.86778 5195.30093 8787.43422
Total query runtime improvement with C6I 21.09% 14.79% 12.61% 13.62%
Geometric mean query runtime C5 (sec) 25.67058 20.06539 31.76582 46.78632
Geometric mean query runtime C6I (sec) 20.4458 17.14133 26.92196 39.32622
Geometric mean query runtime improvement with C6I 20.35% 14.57% 15.25% 15.95%
EC2 C5 instance price ($ per hour) $4.080 $2.040 $0.680 $0.340
EMR C5 instance price ($ per hour) $0.270 $0.270 $0.170 $0.085
(EC2 + EMR) C5 instance price ($ per hour) $4.35000 $2.31000 $0.85000 $0.42500
Cost of running on C5 ($ per instance) $4.15135 $1.86138 $1.40371 $1.20098
EC2 C6I instance price ($ per hour) $4.0800 $2.0400 $0.6800 $0.3400
EMR C6I price ($ per hour per instance) $1.02000 $0.51000 $0.17000 $0.08500
(EC2 + EMR) C6I instance price ($ per hour) $5.10000 $2.55000 $0.85000 $0.42500
Cost of running on C6I ($ per instance) $3.84081 $1.75091 $1.22667 $1.03741
Total cost reduction with C6I including performance improvement -7.48% -5.93% -12.61% -13.62%

Amazon EMR runtime performance improvements with EC2 R6id instances

R6id instances showed a similar performance improvement while running Apache Spark workloads compared to equivalent R5d instances. Our test results showed between 11.8–28.7% improvement in total query runtime for seven different instance sizes within the instance family, and between 15.1–32.0% improvement in geometric mean. R6id 32 XL instances were not benchmarked because R5d instances don’t have that size available. On cost comparison, we observed 6.8–11.5% reduced instance hour cost on R6id instance EMR clusters compared to equivalent R5d instance EMR clusters to run the TPC-DS benchmark queries.

The following table shows the results from running TPC-DS 3 TB benchmark queries using Amazon EMR 6.8 over equivalent R6id and R5d instance EMR clusters.

Instance Size 24 XL 16 XL 12 XL 8 XL 4 XL 2 XL XL
Number of core instances in EMR cluster 5 5 5 5 5 5 5
Total query runtime on R5D (seconds) 4054.4492975042 3691.7569385583 3598.6869168064 3532.7398928104 5397.5330161574 9281.2627059927 16862.8766838096
Total query runtime on R6ID (seconds) 2992.1198446983 2633.7131630720 2632.3186613402 2729.8860537867 4583.1040980373 7921.9960917943 14867.5391541445
Total query runtime improvement with R6ID 26.20% 28.66% 26.85% 22.73% 15.09% 14.65% 11.83%
Geometric mean query runtime R5D (sec) 31.0238156851 28.1432927726 25.7532157307 24.0596427675 32.5800246829 48.2306670294 76.6771994376
Geometric mean query runtime R6ID (sec) 22.8681174894 19.1282742957 18.6161830746 18.0498249257 25.9500918360 39.6580341258 65.0947323858
Geometric mean query runtime improvement with R6ID 26.29% 32.03% 27.71% 24.98% 20.35% 17.77% 15.11%
EC2 R5D instance price ($ per hour) $6.912000 $4.608000 $3.456000 $2.304000 $1.152000 $0.576000 $0.288000
EMR R5D instance price ($ per hour) $0.270000 $0.270000 $0.270000 $0.270000 $0.270000 $0.144000 $0.072000
(EC2 + EMR) R5D instance price ($ per hour) $7.182000 $4.878000 $3.726000 $2.574000 $1.422000 $0.720000 $0.360000
Cost of running on R5D ($ per instance) $8.088626 $5.002331 $3.724641 $2.525909 $2.132026 $1.856253 $1.686288
EC2 R6ID instance price ($ per hour) $7.257600 $4.838400 $3.628800 $2.419200 $1.209600 $0.604800 $0.302400
EMR R6ID price ($ per hour per instance) $1.814400 $1.209600 $0.907200 $0.604800 $0.302400 $0.151200 $0.075600
(EC2 + EMR) R6ID instance price ($ per hour) $9.072000 $6.048000 $4.536000 $3.024000 $1.512000 $0.756000 $0.378000
Cost of running on R6ID ($ per instance) $7.540142 $4.424638 $3.316722 $2.293104 $1.924904 $1.663619 $1.561092
Total cost reduction with R6ID including performance improvement -6.78% -11.55% -10.95% -9.22% -9.71% -10.38% -7.42%

Benchmarking methodology

The benchmark used in this post is derived from the industry-standard TPC-DS benchmark, and uses queries from the Spark SQL Performance Tests GitHub repo with the following fixes applied.

We calculated TCO by multiplying cost per hour by number of instances in the cluster and time taken to run the queries on the cluster. We used the on-demand pricing in the US East (N. Virginia) Region for all instances.
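
As a worked example using the I4i 16XL figures from the earlier table: the combined (EC2 + EMR) price is $5.491 + $1.373 = $6.864 per instance per hour, and the total query runtime is 2642.77 seconds, or roughly 0.734 hours, so the cost works out to approximately $6.864 × 0.734 ≈ $5.04 per instance, in line with the “Cost of running on I4I ($ per instance)” row. For the five core instances in the cluster, that is roughly 5 × $5.04 ≈ $25.20 to run the benchmark queries.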

Conclusion

In this post, we described how we estimated the cost-performance benefit from using Amazon EMR with C6i, M6i, I4i, R6i, and R6id instances compared to using equivalent previous-generation instances. Using these new instances with Amazon EMR improves cost-performance by an additional 5–33%.


About the authors

Al MS is a product manager for Amazon EMR at Amazon Web Services.

Kyeonghyun Ryoo is a Software Development Engineer for EMR at Amazon Web Services. He primarily works on designing and building automation tools for internal teams and customers to maximize their productivity. Outside of work, he is a retired world champion in professional gaming who still enjoys playing video games.

Renewal of AWS CyberGRX assessment to enhance customers’ third-party due diligence process

Post Syndicated from Naranjan Goklani original https://aws.amazon.com/blogs/security/renewal-of-aws-cybergrx-assessment-to-enhance-customers-third-party-due-diligence-process/

Amazon Web Services (AWS) is pleased to announce the renewal of the AWS CyberGRX cyber risk assessment report. This third-party validated report helps customers perform effective cloud supplier due diligence on AWS and enhances their third-party risk management process.

With the increase in adoption of cloud products and services across multiple sectors and industries, AWS has become a critical component of customers’ third-party environments. Regulated customers are held to high standards by regulators and auditors when it comes to exercising effective due diligence on third parties.

Many customers use third-party cyber risk management (TPCRM) services such as CyberGRX to better manage risks from their evolving third-party environments and to drive operational efficiencies. To help with such efforts, AWS has completed the CyberGRX assessment of its security posture. CyberGRX security analysts perform the assessment and validate the results annually.

The CyberGRX assessment applies a dynamic approach to third-party risk assessment. This approach integrates advanced analytics, threat intelligence, and sophisticated risk models with vendors’ responses to provide an in-depth view of how a vendor’s security controls help protect against potential threats.

Vendor profiles are continuously updated as the risk level of cloud service providers changes, or as AWS updates its security posture and controls. This approach eliminates outdated static spreadsheets for third-party risk assessments, in which the risk matrices are not updated in near real time.

In addition, AWS customers can use the CyberGRX Framework Mapper to map AWS assessment controls and responses to well-known industry standards and frameworks, such as National Institute of Standards and Technology (NIST) 800-53, NIST Cybersecurity Framework, International Organization for Standardization (ISO) 27001, Payment Card Industry Data Security Standard (PCI DSS), and the U.S. Health Insurance Portability and Accountability Act (HIPAA). This mapping can reduce customers’ third-party supplier due-diligence burden.

Customers can access the AWS CyberGRX report at no additional cost. Customers can request access to the report by completing an access request form, available on the AWS CyberGRX page.

As always, we value your feedback and questions. Reach out to the AWS Compliance team through the Contact Us page. If you have feedback about this post, submit comments in the Comments section below. To learn more about our other compliance and security programs, see AWS Compliance Programs.

Want more AWS Security news? Follow us on Twitter.

Naranjan Goklani

Naranjan is a Security Audit Manager at AWS, based in Toronto (Canada). He leads audits, attestations, certifications, and assessments across North America and Europe. Naranjan has more than 13 years of experience in risk management, security assurance, and performing technology audits. Naranjan previously worked in one of the Big 4 accounting firms and supported clients from the financial services, technology, retail, ecommerce, and utilities industries.

Get the best out of Amazon Verified Permissions by using fine-grained authorization methods

Post Syndicated from Jeff Lombardo original https://aws.amazon.com/blogs/security/get-the-best-out-of-amazon-verified-permissions-by-using-fine-grained-authorization-methods/

With the release of Amazon Verified Permissions, developers of custom applications can implement access control logic based on caller and resource information; group membership, hierarchy, and relationship; and session context, such as device posture, location, time, or method of authentication. With Amazon Verified Permissions, you can focus on building simple authorization policies and your applications—instead of, for example, building an authorization engine for your multi-tenant consumer applications.

Amazon Verified Permissions uses the Cedar policy language, which simplifies the implementation, review, and maintenance of large and complex access control strategies.

Amazon Verified Permissions includes schema definitions, policy statement grammar, and automated reasoning that scales across millions of permissions, which enables you to enforce the principles of default deny and of least privilege. These features facilitate the deployment of an in-depth fine-grained authorization model to support your Zero-Trust objectives.

In this blog post, we’ll discuss how you can use Amazon Verified Permissions to create authorization policies that are an improvement over traditional access control models, and we provide some best practices for the use of this feature.

What is fine-grained authorization? Is it a role-based or an attribute-based access control mechanism?

Traditionally, customers deploy access control strategies based on roles or attributes.

Role-based access control (RBAC) is an approach that grants access to resources through group memberships rather than individual user assignments. Although this approach simplifies the definition of entitlements, it can become very complex as group memberships, hierarchies, and nesting scale out.

Consider a photo sharing application that allows users to upload photos and share those photos with friends. We have a user Alice who uploads their vacation photos to a folder named Austin2022. Alice decides to share these photos with friends.

Alice provides a link to their vacation photos to a friend named Bob. Using the link, Bob is able to view photos in the folder Austin2022, because Bob is in the user group Alice/Friends. That is, Bob has the role of Alice/Friends. If Bob were removed as Alice’s friend, Bob would not be able to view Alice’s photos. This is an example of how role-based access control works.

Attribute-based access control (ABAC) deviates from the static nature of RBAC by introducing access rules based on the characteristics of the following: the requestor identity; the attributes of the resources targeted; or contextual elements such as the request time, where the request originated, or the device used to make the request.

Let’s consider who can delete photos in the example photo sharing application. We want to make sure that only Alice can delete their photos. That is, we make an authorization decision based on the attribute owner of the resource photo.

Fine-grained authorization (FGA) is a model that combines the advantages of both RBAC and ABAC, so that customers can find the right balance between each approach for their individual use case. Understanding the FGA approach is key to writing policy statements in Amazon Verified Permissions.

How does permissions policy statement language work?

To define a policy statement, Amazon Verified Permissions uses a policy language based on the PARC model, as AWS Identity and Access Management (IAM) does for IAM policies. PARC refers to the four objects in the policy language: principal, action, resource, and condition, and these are defined as follows:

  1. The principal is the entity taking the action. Often this will be a human user, but it could also be another service or a device.
  2. The action is the operation being performed, for which permission must be granted. Often the action will map to an API call.
  3. The resource is the target of the call.
  4. The condition limits when or where the principal can perform the action on the resource.

Using this language, you can create a policy that allows user Alice (the principal) to call deletePhoto (the action) on VacationPhoto_1.jpg (the resource) when Alice is logged in by using multi-factor authentication (the condition). After the Amazon Verified Permissions policy is authored, you will store it in your Amazon Verified Permissions policy store instance.
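
Expressed in the policy language, that example might look like the following sketch (the User and File entity type names here are illustrative):

permit(
    principal == User::"Alice",
    action == Action::"deletePhoto",
    resource == File::"VacationPhoto_1.jpg"
)
when {
    context.MFA == true
};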

Policy statements are divided into two sections:

  1. The policy head, which defines the targets of the policy (principal, action, resource) and whether the policy permits or forbids the action.
  2. The Conditions section, which allows you to place conditions that authorize API actions only when specified criteria are met.

You can use the structure of the policy statements to tell at a glance whether a policy follows an RBAC, an ABAC, or an FGA approach, as shown in the following three examples.

// This style of policy can be used to implement a RBAC approach
permit(
  principal in UserGroup::"Alice/Friends",
  action in [
    Action::"readFile", 
    Action::"writeFile"
  ],
  resource in Folder::"Playa del Sol 2021"
);
// This style of policy can be used to implement an ABAC approach
permit(principal, action, resource)
when {
  principal.permitted_access_level >= resource.access_level
};
// This style of policy can be used to implement a hybrid approach
permit(
  principal in UserGroup::"Alice/Friends",
  action in [
    Action::"readFile", 
    Action::"writeFile"
  ],
  resource in Folder::"Playa del Sol 2021"
)
when {
  principal.permitted_access_level >= resource.access_level
};

Let’s go back to our example of Alice and Bob. Now, Alice can define a policy that allows their friends to view photos in their folder Austin2022, as follows.

permit(
    principal in UserGroup::"Alice/Friends",
    action == Action::"viewPhoto",
    resource in Folder::"Austin2022"
);

The policy head says to permit the viewPhoto action to be performed on resources in the folder Austin2022 for principals in user group Alice/Friends. There is no condition section for this policy. With the preceding policy, Bob can access the photos in Alice’s Austin2022 album as long as Bob is a member of the group Alice/Friends.

We can go back to the photo deletion workflow for a more complex scenario. To delete photos, you want to ensure that the requestor owns the photo. Additionally, you might require the user to be logged in via multi-factor authentication (MFA). This policy can be written as follows.

permit(
    principal,
    action == Action::"deletePhoto",
    resource == File::"photo"
)
when {
    resource.owner == principal.name
    && context.MFA == true
};

The policy head permits a user to call the action deletePhoto on photos. The condition section limits the policy to permit photo deletion only when the resource’s owner attribute is the same as the principal’s name attribute and the context object’s MFA attribute equals true.

Designing well-architected policy statements

In this section, we cover six best practices that help customers scale out efficiently.

Use immutable identifiers to reduce risk of collision

The policy statements in this blog post and in the Amazon Verified Permissions documentation intentionally use human-readable values such as Bob for a Principal entity, or Alice/Friends for a Group entity. This is useful when discussing general concepts, but in production systems, customers should use unique and immutable values for entities. As an example, what would happen if Alice wants to change their user name?

Instead of creating a user named Alice, you should use an autogenerated and unique identifier such as a Universally Unique Identifier (UUID). Those are generally available from your user directory, JSON Web Token, or file system. That way, you can create a user object with the ID a1b2c3d4-5678-90ab-cdef-EXAMPLE11111 and the name attribute Alice. This would allow you to update Alice’s user name without needing to recreate the user object.
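
For instance, the user entity might then be stored as an object like the following sketch, which reuses the entity format shown in the next section (the name attribute here is illustrative):

{
    "EntityId": {
        "EntityType": "User",
        "EntityId": "a1b2c3d4-5678-90ab-cdef-EXAMPLE11111"
    },
    "Parents": [],
    "Attributes": {
        "name": {
            "String": "Alice"
        }
    }
}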

Reduce the number of policies that use entity grouping

Policy statements can only contain a single principal entity and a single resource entity. If you want the same policy to apply to multiple principals or resources, you can group common entities and use an in statement.

In this example, Bob’s user account could be stored as the following object.

{
    "EntityId": {
        "EntityType": "User",
        "EntityId": "Bob"
    },
    "Parents":[
        {
            "EntityType": "UserGroup",
            "EntityId": "Alice/Friends"
        }
    ],
    "Attributes": {
        "username": {
            "String": "Bob"
        },
        "email": {
            "String":"[email protected]"
           },
    }
}

And user group Alice/Friends could be stored as the following object.

{
  "EntityId": {
    "EntityType": "UserGroup",
    "EntityId": "Alice/Friends"
  }
}

The parent relationship defined in Bob’s user account object is what makes Bob a member of the group Alice/Friends.

Now you can define a policy that allows Bob to gain access to Alice’s vacation photos because he is in the group Alice/Friends, as follows.

permit(
    principal in UserGroup::"Alice/Friends",
    action == Action::"viewPhoto",
    resource in Folder::"Austin2022"
);

Use namespaces to remove ambiguity

You can use namespaces to remove ambiguity. Returning to our application, let’s say that you want to give users the ability to delete their photos. But your moderators also need the ability to delete inappropriate photos. How can you distinguish between the user action deletePhoto and the administrator action deletePhoto? Namespaces give you this flexibility.

When creating your entities, you can add namespaces in the EntityType field, as in the following example.

{
  "EntityId": {
    "EntityType": "Admin::Action",
    "EntityId": "\"deletePhoto\""
  },
  "Parents": [],
  "Attributes": {
    "readOnly": {
      "String": "false"
    },
    "appliesTo": {
      "String": "\"Photo\""
    }
  }
}

You then use the namespace in your permit policy, as follows.

permit(
  principal,
  action == Admin::Action::"deletePhoto",
  resource == File::"Photo"
)
when {
  principal.role == "Moderator"
};

This policy requires a user to have the role Moderator to successfully use the administrator deletePhoto action.

Set permission guardrails with forbid statements

The Amazon Verified Permissions policy engine denies any action that is not explicitly allowed with a permit policy. But you might want to establish permission guardrails to ensure that a given action is never allowed. You can create forbid policies for this purpose.

Returning to our photo sharing application, suppose that you want to ensure that no user can delete a photo unless the user has been authenticated with MFA. You could use the following policy.

forbid(
  principal,
  action == Action::"deletePhoto",
  resource == File::"Photo"
)
unless {
  context.MFA == true
};

This permission guardrail will help prevent the accidental grant of overly permissive deletePhoto permissions.

Simplify statements with unless conditions

When you define complex conditions for a policy statement, you might face situations where a policy needs multiple negative conditions. To simplify such statements, Amazon Verified Permissions provides an alternative keyword for the conditional expression: unless. For example, you might deny moderators the ability to delete photos unless they have flagged the photo as inappropriate, are authenticated using MFA, and are on the company’s network.

Unless behaves the same as when, except that the conditions in an unless block must evaluate to false for the policy to apply. With this additional expression, you can create statements that are less complex to review and maintain. The following example shows a policy that expresses a negative condition with multiple parameters by using a when expression.

// Allow access unless a resource was deleted more than 7 days ago
permit(
  principal in Group::"Alice/Friends",
  action == Action::"readPhoto",
  resource in Folder::"Playa del Sol 2021"
)
when {
  !(resource.status == "deleted"
   && resource.deletion_date < (context.time.now - 604800)) //7 days ago
};

The following example shows how you can simplify the previous policy by using an unless expression.

// Allow access unless a resource was deleted more than 7 days ago
permit(
  principal in Group::"Alice/Friends",
  action == Action::"readPhoto",
  resource in Folder::"Playa del Sol 2021"
)
unless {
  (resource.status == "deleted"
   && resource.deletion_date < (context.time.now - 604800)) //7 days ago
};

Rationalize policies with a template

You might face a situation where you are repeatedly creating the same rule for different contexts. In the following example, we demonstrate a policy that permits Alice to describe the folder Alice’s Org. Then we replicate the same policy for Bob and the folder Bob’s Org.

permit(
    principal == "Alice",
    action == Action::"describeFolder",
    resource == Folder::"Alice's Org"
)
when {
    resource.owner == principal.username
};

permit(
    principal == "Bob",
    action == Action::"describeFolder",
    resource == Folder::"Bob's Org"
)
when {
    resource.owner == principal.username
};

In this case, we recommend that you use a policy template to simplify the evaluation, as in the following example.

permit(
    principal == ?principal,
    action == Action::"describeFolder",
    resource == ?resource
)
when {
    resource.owner == principal.username
};

With a policy template, the statement uses placeholders (in this example, ?principal and ?resource) that are resolved dynamically for each policy evaluation request, based on context that the application provides.

Conclusion: Start authorizing with Amazon Verified Permissions

With Amazon Verified Permissions, you can create permission policies with expressiveness, performance, and readability in mind.

Using the best practices described in this post, you are ready to author policies with Amazon Verified Permissions. When combined with services like Amazon Cognito, Amazon API Gateway, an AWS Lambda authorizer, or AWS AppSync, Amazon Verified Permissions allows you to unlock in-depth and explicit access control logic securely using native AWS services.

Over the next months, AWS will release more resources to support our customers in their implementation of Amazon Verified Permissions. Learn more about Amazon Verified Permissions. Stay tuned and happy building.

 
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Jeff Lombardo

Jeff is a Solutions Architect expert in IAM, Application Security, and Data Protection. Through 17 years as a security consultant for enterprises of all sizes and business verticals, he delivered innovative solutions with respect to standards and governance frameworks. Today at AWS, he helps organizations enforce best practices and defense in depth for secure cloud adoption.

Brad Burnett

Brad is a Security Specialist Solutions Architect focused on Identity. Before AWS, he worked as a Linux Systems Administrator and Incident Responder. When he isn’t helping customers design robust and secure Identity solutions, Brad can be found sharpening his offensive security skills or playing card games.

AWS Digital Sovereignty Pledge: Control without compromise

Post Syndicated from Matt Garman original https://aws.amazon.com/blogs/security/aws-digital-sovereignty-pledge-control-without-compromise/

French | German | Italian | Japanese | Korean

We’ve always believed that for the cloud to realize its full potential it would be essential that customers have control over their data. Giving customers this sovereignty has been a priority for AWS since the very beginning when we were the only major cloud provider to allow customers to control the location and movement of their data. The importance of this foundation has only grown over the last 16 years as the cloud has become mainstream, and governments and standards bodies continue to develop security, data protection, and privacy regulations.

Today, having control over digital assets, or digital sovereignty, is more important than ever.

As we’ve innovated and expanded to offer the world’s most capable, scalable, and reliable cloud, we’ve continued to prioritize making sure customers are in control and able to meet regulatory requirements anywhere they operate. What this looks like varies greatly across industries and countries. In many places around the world, like in Europe, digital sovereignty policies are evolving rapidly. Customers are facing an incredible amount of complexity, and over the last 18 months, many have told us they are concerned that they will have to choose between the full power of AWS and a feature-limited sovereign cloud solution that could hamper their ability to innovate, transform, and grow. We firmly believe that customers shouldn’t have to make this choice.

This is why today we’re introducing the AWS Digital Sovereignty Pledge—our commitment to offering all AWS customers the most advanced set of sovereignty controls and features available in the cloud.

AWS already offers a range of data protection features, accreditations, and contractual commitments that give customers control over where they locate their data, who can access it, and how it is used. We pledge to expand on these capabilities to allow customers around the world to meet their digital sovereignty requirements without compromising on the capabilities, performance, innovation, and scale of the AWS Cloud. At the same time, we will continue to work to deeply understand the evolving needs and requirements of both customers and regulators, and rapidly adapt and innovate to meet them.

Sovereign-by-design

Our approach to delivering on this pledge is to continue to make the AWS Cloud sovereign-by-design—as it has been from day one. Early in our history, we received a lot of input from customers in industries like financial services and healthcare—customers who are among the most security- and data privacy-conscious organizations in the world—about what data protection features and controls they would need to use the cloud. We developed AWS encryption and key management capabilities, achieved compliance accreditations, and made contractual commitments to satisfy the needs of our customers. As customers’ requirements evolve, we evolve and expand the AWS Cloud. A couple of recent examples include the data residency guardrails we added to AWS Control Tower (a service for governing AWS environments) late last year, which give customers even more control over the physical location of where customer data is stored and processed. In February 2022, we announced AWS services that adhere to the Cloud Infrastructure Service Providers in Europe (CISPE) Data Protection Code of Conduct, giving customers an independent verification and an added level of assurance that our services can be used in compliance with the General Data Protection Regulation (GDPR). These capabilities and assurances are available to all AWS customers.

We pledge to continue to invest in an ambitious roadmap of capabilities for data residency, granular access restriction, encryption, and resilience:

1. Control over the location of your data

Customers have always controlled the location of their data with AWS. For example, currently in Europe, customers have the choice to deploy their data into any of eight existing Regions. We commit to deliver even more services and features to protect our customers’ data. We further commit to expanding on our existing capabilities to provide even more fine-grained data residency controls and transparency. We will also expand data residency controls for operational data, such as identity and billing information.

2. Verifiable control over data access

We have designed and delivered first-of-a-kind innovation to restrict access to customer data. The AWS Nitro System, which is the foundation of AWS computing services, uses specialized hardware and software to protect data from outside access during processing on Amazon Elastic Compute Cloud (Amazon EC2). By providing a strong physical and logical security boundary, Nitro is designed to enforce restrictions so that nobody, including anyone in AWS, can access customer workloads on EC2. We commit to continue to build additional access restrictions that limit all access to customer data unless requested by the customer or a partner they trust.

3. The ability to encrypt everything everywhere

Currently, we give customers features and controls to encrypt data, whether in transit, at rest, or in memory. All AWS services already support encryption, with most also supporting encryption with customer managed keys that are inaccessible to AWS. We commit to continue to innovate and invest in additional controls for sovereignty and encryption features so that our customers can encrypt everything everywhere with encryption keys managed inside or outside the AWS Cloud.

4. Resilience of the cloud

It is not possible to achieve digital sovereignty without resiliency and survivability. Control over workloads and high availability are essential in the case of events like supply chain disruption, network interruption, and natural disaster. Currently, AWS delivers the highest network availability of any cloud provider. Each AWS Region is comprised of multiple Availability Zones (AZs), which are fully isolated infrastructure partitions. To better isolate issues and achieve high availability, customers can partition applications across multiple AZs in the same AWS Region. For customers that are running workloads on premises or in intermittently connected or remote use cases, we offer services that provide specific capabilities for offline data and remote compute and storage. We commit to continue to enhance our range of sovereign and resilient options, allowing customers to sustain operations through disruption or disconnection.

Earning trust through transparency and assurances

At AWS, earning customer trust is the foundation of our business. We understand that protecting customer data is key to achieving this. We also know that trust must continue to be earned through transparency. We are transparent about how our services process and transfer data. We will continue to challenge requests for customer data from law enforcement and government agencies. We provide guidance, compliance evidence, and contractual commitments so that our customers can use AWS services to meet compliance and regulatory requirements. We commit to continuing to provide the transparency and business flexibility needed to meet evolving privacy and sovereignty laws.

Navigating changes as a team

Helping customers protect their data in a world with changing regulations, technology, and risks takes teamwork. We would never expect our customers to go it alone. Our trusted partners play a prominent role in bringing solutions to customers. For example, in Germany, T-Systems (part of Deutsche Telekom) offers Data Protection as a Managed Service on AWS. It provides guidance to ensure data residency controls are properly configured, offering services for the configuration and management of encryption keys and expertise to help guide their customers in addressing their digital sovereignty requirements in the AWS Cloud. We are doubling down with local partners that our customers trust to help address digital sovereignty requirements.

We are committed to helping our customers meet digital sovereignty requirements. We will continue to innovate sovereignty features, controls, and assurances within the global AWS Cloud and deliver them without compromise to the full power of AWS.

 
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Matt Garman

Matt is currently the Senior Vice President of AWS Sales, Marketing and Global Services at AWS, and also sits on Amazon’s executive leadership S-Team. Matt joined Amazon in 2006, and has held several leadership positions in AWS over that time. Matt previously served as Vice President of the Amazon EC2 and Compute Services businesses for AWS for over 10 years. Matt was responsible for P&L, product management, and engineering and operations for all compute and storage services in AWS. He started at Amazon when AWS first launched in 2006 and served as one of the first product managers, helping to launch the initial set of AWS services. Prior to Amazon, he spent time in product management roles at early stage Internet startups. Matt earned a BS and MS in Industrial Engineering from Stanford University, and an MBA from the Kellogg School of Management at Northwestern University.

 


French version

AWS Digital Sovereignty Pledge: le contrôle sans compromis

Nous avons toujours pensé que pour que le cloud révèle son entier potentiel, il était essentiel que les clients aient le contrôle de leurs données. Garantir à nos clients cette souveraineté a été la priorité d’AWS depuis l’origine, lorsque nous étions le seul grand fournisseur de cloud à permettre aux clients de contrôler la localisation et le flux de leurs données. L’importance de cette démarche n’a cessé de croître ces 16 dernières années, à mesure que le cloud s’est démocratisé et que les gouvernements et les organismes de régulation ont développé des réglementations en matière de sécurité, de protection des données et de confidentialité.

Aujourd’hui, le contrôle des ressources numériques, ou souveraineté numérique, est plus important que jamais.

Tout en innovant et en nous développant pour offrir le cloud le plus performant, évolutif et fiable au monde, nous avons continué à ériger comme priorité la garantie que nos clients gardent le contrôle et soient en mesure de répondre aux exigences réglementaires partout où ils opèrent. Ces exigences varient considérablement selon les secteurs et les pays. Dans de nombreuses régions du monde, comme en Europe, les politiques de souveraineté numérique évoluent rapidement. Les clients sont confrontés à une incroyable complexité. Au cours des dix-huit derniers mois, ils nous ont rapporté qu’ils craignaient de devoir choisir entre les vastes possibilités offertes par les services d’AWS et une solution aux fonctionnalités limitées qui pourrait entraver leur capacité à innover, à se transformer et à se développer. Nous sommes convaincus que les clients ne devraient pas avoir à faire un tel choix.

C’est pourquoi nous présentons aujourd’hui l’AWS Digital Sovereignty Pledge – notre engagement à offrir à tous les clients AWS l’ensemble le plus avancé d’outils et de fonctionnalités de contrôle disponibles dans le cloud au service de la souveraineté.

AWS propose déjà une série de fonctionnalités de protection des données, de certifications et d’engagements contractuels qui donnent à nos clients le plein contrôle de la localisation de leurs données, de leur accès et de leur utilisation. Nous nous engageons à développer ces capacités pour permettre à nos clients du monde entier de répondre à leurs exigences en matière de souveraineté numérique, sans faire de compromis sur les capacités, les performances, l’innovation et la portée du cloud AWS. En parallèle, nous continuerons à travailler pour comprendre en profondeur l’évolution des besoins et des exigences des clients et des régulateurs, et nous nous adapterons et innoverons rapidement pour y répondre.

Souverain dès la conception

Pour respecter l’AWS Digital Sovereignty Pledge, notre approche est de continuer à rendre le cloud AWS souverain dès sa conception, comme il l’a été dès le premier jour. Aux débuts de notre histoire, nous recevions de nombreux commentaires de nos clients de secteurs tels que les services financiers et la santé – des clients qui comptent parmi les organisations les plus soucieuses de la sécurité et de la confidentialité des données dans le monde – sur les fonctionnalités et les contrôles de protection des données dont ils auraient besoin pour leur utilisation du cloud. Nous avons développé des compétences en matière de chiffrement et de gestion des données, obtenu des certifications de conformité et pris des engagements contractuels pour répondre aux besoins de nos clients. Nous développons le cloud AWS à mesure que les exigences des clients évoluent. Nous pouvons citer, parmi les exemples récents, les data residency guardrails (les garde-fous de la localisation des données) que nous avons ajoutés en fin d’année dernière à l’AWS Control Tower (un service de gestion des environnements AWS) et qui donnent aux clients davantage de contrôle sur l’emplacement physique de leurs données, où elles sont stockées et traitées. En février 2022, nous avons annoncé de nouveaux services AWS adhérant au Code de Conduite sur la protection des données de l’association CISPE (Cloud Infrastructure Service Providers in Europe (CISPE)). Ils apportent à nos clients une vérification indépendante et un niveau de garantie supplémentaire attestant que nos services peuvent être utilisés conformément au Règlement général sur la protection des données (RGPD). Ces capacités et ces garanties sont disponibles pour tous les clients d’AWS.

Nous nous engageons à poursuivre nos investissements conformément à une ambitieuse feuille de route pour le développement de capacités au service de la localisation des données, de la restriction d’accès granulaire (pratique consistant à accorder à des utilisateurs spécifiques différents niveaux d’accès à une ressource particulière), de chiffrement et de résilience :

1. Contrôle de l’emplacement de vos données

AWS a toujours permis à ses clients de contrôler l’emplacement de leurs données. Aujourd’hui en Europe, par exemple, les clients ont le choix de déployer leurs données dans l’une des huit Régions existantes. Nous nous engageons à fournir encore plus de services et de capacités pour protéger les données de nos clients. Nous nous engageons également à développer nos capacités existantes pour fournir des contrôles de localisation des données encore plus précis et transparents. Nous allons également étendre les contrôles de localisation des données pour les données opérationnelles, telles que les informations relatives à l’identité et à la facturation.

2. Contrôle fiable de l’accès aux données

Nous avons conçu et fourni une innovation unique en son genre pour restreindre l’accès aux données clients. Le système AWS Nitro, qui constitue la base des services informatiques d’AWS, utilise du matériel et des logiciels spécialisés pour protéger les données contre tout accès extérieur pendant leur traitement sur les serveurs EC2. En fournissant une solide barrière de sécurité physique et logique, Nitro est conçu pour empêcher, y compris au sein d’AWS, l’accès à ces données sur EC2. Nous nous engageons à continuer à développer des restrictions d’accès supplémentaires qui limitent tout accès aux données de nos clients, sauf indication contraire de la part du client ou de l’un de ses prestataires de confiance.

3. La possibilité de tout chiffrer, partout

Aujourd’hui, nous offrons à nos clients des fonctionnalités et des outils de contrôle pour chiffrer les données, qu’elles soient en transit, au repos ou en mémoire. Tous les services AWS prennent déjà en charge le chiffrement, la plupart permettant également le chiffrement sur des clés gérées par le client et inaccessibles à AWS. Nous nous engageons à continuer d’innover et d’investir dans des outils de contrôle au service de la souveraineté et des fonctionnalités de chiffrement supplémentaires afin que nos clients puissent chiffrer l’ensemble de leurs données partout, avec des clés de chiffrement gérées à l’intérieur ou à l’extérieur du cloud AWS.

4. La résilience du cloud

La souveraineté numérique est impossible sans résilience et sans capacités de continuité d’activité lors de crise majeure. Le contrôle des charges de travail et la haute disponibilité de réseau sont essentiels en cas d’événements comme une rupture de la chaîne d’approvisionnement, une interruption du réseau ou encore une catastrophe naturelle. Actuellement, AWS offre la plus haute disponibilité de réseau de tous les fournisseurs de cloud. Chaque Région AWS est composée de plusieurs zones de disponibilité (AZ), qui sont des portions d’infrastructure totalement isolées. Pour mieux isoler les difficultés et obtenir une haute disponibilité de réseau, les clients peuvent répartir les applications sur plusieurs zones dans la même Région AWS. Pour les clients qui exécutent des charges de travail sur place ou dans des cas d’utilisation à distance ou connectés par intermittence, nous proposons des services qui offrent des capacités spécifiques pour les données hors ligne, le calcul et le stockage à distance. Nous nous engageons à continuer d’améliorer notre gamme d’options souveraines et résilientes, permettant aux clients de maintenir leurs activités en cas de perturbation ou de déconnexion.

Gagner la confiance par la transparence et les garanties

Chez AWS, gagner la confiance de nos clients est le fondement de notre activité. Nous savons que la protection des données de nos clients est essentielle pour y parvenir. Nous savons également que leur confiance s’obtient par la transparence. Nous sommes transparents sur la manière dont nos services traitent et transfèrent leurs données. Nous continuerons à nous contester les demandes de données des clients émanant des autorités judiciaires et des organismes gouvernementaux. Nous fournissons des conseils, des preuves de conformité et des engagements contractuels afin que nos clients puissent utiliser les services AWS pour répondre aux exigences de conformité et de réglementation. Nous nous engageons à continuer à fournir la transparence et la flexibilité commerciale nécessaires pour répondre à l’évolution du cadre réglementaire relatif à la confidentialité et à la souveraineté des données.

Naviguer en équipe dans un monde en perpétuel changement

Aider les clients à protéger leurs données dans un monde où les réglementations, les technologies et les risques évoluent nécessite un travail d’équipe. Nous ne pouvons-nous résoudre à ce que nos clients relèvent seuls ces défis. Nos partenaires de confiance jouent un rôle prépondérant dans l’apport de solutions aux clients. Par exemple, en Allemagne, T-Systems (qui fait partie de Deutsche Telekom) propose la protection des données en tant que service géré sur AWS. L’entreprise fournit des conseils pour s’assurer que les contrôles de localisation des données sont correctement configurés, offrant des services pour la configuration en question, la gestion des clés de chiffrements et une expertise pour aider leurs clients à répondre à leurs exigences de souveraineté numérique dans le cloud AWS. Nous redoublons d’efforts avec les partenaires locaux en qui nos clients ont confiance pour les aider à répondre à ces exigences de souveraineté numérique.

Nous nous engageons à aider nos clients à répondre aux exigences de souveraineté numérique. Nous continuerons d’innover en matière de fonctionnalités, de contrôles et de garanties de souveraineté dans le cloud mondial d’AWS, tout en fournissant sans compromis et sans restriction la pleine puissance d’AWS.


German version

AWS Digital Sovereignty Pledge: Kontrolle ohne Kompromisse

Wir waren immer der Meinung, dass die Cloud ihr volles Potenzial nur dann erschließen kann, wenn Kunden die volle Kontrolle über ihre Daten haben. Diese Datensouveränität des Kunden genießt bei AWS schon seit den Anfängen der Cloud Priorität, als wir der einzige große Anbieter waren, bei dem Kunden sowohl Kontrolle über den Speicherort als auch über die Übertragung ihrer Daten hatten. Die Bedeutung dieser Grundsätze hat über die vergangenen 16 Jahre stetig zugenommen: Die Cloud ist im Mainstream angekommen, sowohl Gesetzgeber als auch Regulatoren entwickeln ihre Vorgaben zu IT-Sicherheit und Datenschutz stetig weiter.

Kontrolle bzw. Souveränität über digitale Ressourcen ist heute wichtiger denn je.

Unsere Innovationen und Entwicklungen haben stets darauf abgezielt, unseren Kunden eine Cloud zur Verfügung zu stellen, die skalierend und zugleich verlässlich global nutzbar ist. Dies beinhaltet auch unseren Kunden die Kontrolle zu gewährleisten die sie benötigen, damit sie alle ihre regulatorischen Anforderungen erfüllen können. Regulatorische Anforderungen sind länder- und sektorspezifisch. Vielerorts – wie auch in Europa – entstehen neue Anforderungen und Regularien zu digitaler Souveränität, die sich rasant entwickeln. Kunden sehen sich einer hohen Anzahl verschiedenster Regelungen ausgesetzt, die eine enorme Komplexität mit sich bringen. Innerhalb der letzten achtzehn Monate haben sich viele unserer Kunden daher mit der Sorge an uns gewandt, vor eine Wahl gestellt zu werden: Entweder die volle Funktionalität und Innovationskraft von AWS zu nutzen, oder auf funktionseingeschränkte „souveräne“ Cloud-Lösungen zurückzugreifen, deren Kapazität für Innovation, Transformation, Sicherheit und Wachstum aber limitiert ist. Wir sind davon überzeugt, dass Kunden nicht vor diese „Wahl“ gestellt werden sollten.

Deswegen stellen wir heute den „AWS Digital Sovereignty Pledge“ vor – unser Versprechen allen AWS Kunden, ohne Kompromisse die fortschrittlichsten Souveränitäts-Kontrollen und Funktionen in der Cloud anzubieten.

AWS bietet schon heute eine breite Palette an Datenschutz-Funktionen, Zertifizierungen und vertraglichen Zusicherungen an, die Kunden Kontrollmechanismen darüber geben, wo ihre Daten gespeichert sind, wer darauf Zugriff erhält und wie sie verwendet werden. Wir werden diese Palette so erweitern, dass Kunden überall auf der Welt, ihre Anforderungen an Digitale Souveränität erfüllen können, ohne auf Funktionsumfang, Leistungsfähigkeit, Innovation und Skalierbarkeit der AWS Cloud verzichten zu müssen. Gleichzeitig werden wir weiterhin daran arbeiten, unser Angebot flexibel und innovativ an die sich weiter wandelnden Bedürfnisse und Anforderungen von Kunden und Regulatoren anzupassen.

Sovereign-by-design

Wir werden den „AWS Digital Sovereignty Pledge“ so umsetzen, wie wir das seit dem ersten Tag machen und die AWS Cloud gemäß unseres „sovereign-by-design“ Ansatz fortentwickeln. Wir haben von Anfang an, durch entsprechende Funktions- und Kontrollmechanismen für spezielle IT-Sicherheits- und Datenschutzanforderungen aus den verschiedensten regulierten Sektoren Lösungen gefunden, die besonders sensiblen Branchen wie beispielsweise dem Finanzsektor oder dem Gesundheitswesen frühzeitig ermöglichten, die Cloud zu nutzen. Auf dieser Basis haben wir die AWS Verschlüsselungs- und Schlüsselmanagement-Funktionen entwickelt, Compliance-Akkreditierungen erhalten und vertragliche Zusicherungen gegeben, welche die Bedürfnisse unserer Kunden bedienen. Dies ist ein stetiger Prozess, um die AWS Cloud auf sich verändernde Kundenanforderungen anzupassen. Ein Beispiel dafür sind die Data Residency Guardrails, um die wir AWS Control Tower Ende letzten Jahres erweitert haben. Sie geben Kunden die volle Kontrolle über die physikalische Verortung ihrer Daten zu Speicherungs- und Verarbeitungszwecken. Dieses Jahr haben wir einen Katalog von AWS Diensten veröffentlicht, die den Cloud Infrastructure Service Providers in Europe (CISPE) erfüllen. Damit verfügen Kunden über eine unabhängige Verifizierung und zusätzliche Versicherung, dass unsere Dienste im Einklang mit der DSGVO verwendet werden können. Diese Instrumente und Nachweise stehen schon heute allen AWS Kunden zur Verfügung.

Wir haben uns ehrgeizige Ziele für unsere Roadmap gesetzt und investieren kontinuierlich in Funktionen für die Verortung von Daten (Datenresidenz), granulare Zugriffsbeschränkungen, Verschlüsselung und Resilienz:

1. Kontrolle über den Ort der Datenspeicherung

Bei AWS hatten Kunden immer schon die Kontrolle über Datenresidenz, also den Ort der Datenspeicherung. Aktuell können Kunden ihre Daten z.B. in 8 bestehenden Regionen innerhalb Europas speichern, von denen 6 innerhalb der Europäischen Union liegen. Wir verpflichten uns dazu, noch mehr Dienste und Funktionen zur Verfügung zustellen, die dem Schutz der Daten unserer Kunden dienen. Ebenso verpflichten wir uns, noch granularere Kontrollen für Datenresidenz und Transparenz auszubauen. Wir werden auch zusätzliche Kontrollen für Daten einführen, die insbesondere die Bereiche Identitäts- und Abrechnungs-Management umfassen.

2. Verifizierbare Kontrolle über Datenzugriffe

Mit dem AWS Nitro System haben wir ein innovatives System entwickelt, welches unberechtigte Zugriffsmöglichkeiten auf Kundendaten verhindert: Das Nitro System ist die Grundlage der AWS Computing Services (EC2). Es verwendet spezialisierte Hardware und Software, um den Schutz von Kundendaten während der Verarbeitung auf EC2 zu gewährleisten. Nitro basiert auf einer starken physikalischen und logischen Sicherheitsabgrenzung und realisiert damit Zugriffsbeschränkungen, die unautorisierte Zugriffe auf Kundendaten auf EC2 unmöglich machen – das gilt auch für AWS als Betreiber. Wir werden darüber hinaus für weitere AWS Services zusätzliche Mechanismen entwickeln, die weiterhin potentielle Zugriffe auf Kundendaten verhindern und nur in Fällen zulassen, die explizit durch Kunden oder Partner ihres Vertrauens genehmigt worden sind.

3. Möglichkeit der Datenverschlüsselung überall und jederzeit

Gegenwärtig können Kunden Funktionen und Kontrollen verwenden, die wir zur Verschlüsselung von Daten während der Übertragung, persistenten Speicherungen oder Verarbeitung in flüchtigem Speicher anbieten. Alle AWS Dienste unterstützen schon heute Datenverschlüsselung, die meisten davon auf Basis der Customer Managed Keys – d.h. Schlüssel, die von Kunden verwaltet werden und für AWS nicht zugänglich sind. Wir werden auch in diesem Bereich weiter investieren und Innovationen vorantreiben. Es wird zusätzliche Kontrollen für Souveränität und Verschlüsselung geben, damit unsere Kunden alles jederzeit und überall verschlüsseln können – und das mit Schlüsseln, die entweder durch AWS oder durch den Kunden selbst bzw. ausgewählte Partner verwaltet werden können.

4. Resilienz der Cloud

Digitale Souveränität lässt sich nicht ohne Ausfallsicherheit und Überlebensfähigkeit herstellen. Die Kontrolle über Workloads und hohe Verfügbarkeit z.B. in Fällen von Lieferkettenstörungen, Netzwerkausfällen und Naturkatastrophen ist essenziell. Aktuell bietet AWS die höchste Netzwerk-Verfügbarkeit unter allen Cloud-Anbietern. Jede AWS Region besteht aus mehreren Availability Zones (AZs), die jeweils vollständig isolierte Partitionen unserer Infrastruktur sind. Um Probleme besser zu isolieren und eine hohe Verfügbarkeit zu erreichen, können Kunden Anwendungen auf mehrere AZs in derselben Region verteilen. Kunden, die Workloads on-premises oder in Szenarien mit sporadischer Netzwerk-Anbindung betreiben, bieten wir Dienste an, welche auf Offline-Daten und Remote Compute und Storage Anwendungsfälle angepasst sind. Wir werden unser Angebot an souveränen und resilienten Optionen ausbauen und fortentwickeln, damit Kunden den Betrieb ihrer Workloads auch bei Trennungs- und Disruptionsszenarien aufrechterhalten können.

Vertrauen durch Transparenz und Zusicherungen

Der Aufbau eines Vertrauensverhältnisses mit unseren Kunden, ist die Grundlage unserer Geschäftsbeziehung bei AWS. Wir wissen, dass der Schutz der Daten unserer Kunden der Schlüssel dazu ist. Wir wissen auch, dass Vertrauen durch fortwährende Transparenz verdient und aufgebaut wird. Wir bieten schon heute transparenten Einblick, wie unsere Dienste Daten verarbeiten und übertragen. Wir werden auch in Zukunft Anfragen nach Kundendaten durch Strafverfolgungsbehörden und Regierungsorganisationen konsequent anfechten. Wir bieten Rat, Compliance-Nachweise und vertragliche Zusicherungen an, damit unsere Kunden AWS Dienste nutzen und gleichzeitig ihre Compliance und regulatorischen Anforderungen erfüllen können. Wir werden auch in Zukunft die Transparenz und Flexibilität an den Tag legen, um auf sich weiterentwickelnde Datenschutz- und Soveränitäts-Regulierungen passende Antworten zu finden.

Den Wandel als Team bewältigen

Regulatorik, Technologie und Risiken sind stetigem Wandel unterworfen: Kunden dabei zu helfen, ihre Daten in diesem Umfeld zu schützen, ist Teamwork. Wir würden nie erwarten, dass unsere Kunden das alleine bewältigen müssen. Unsere Partner genießen hohes Vertrauen und spielen eine wichtige Rolle dabei, Lösungen für Kunden zu entwickeln. Zum Beispiel bietet T-Systems in Deutschland Data Protection as a Managed Service auf AWS an. Das Angebot umfasst Hilfestellungen bei der Konfiguration von Kontrollen zur Datenresidenz, Zusatzdienste im Zusammenhang mit der Schlüsselverwaltung für kryptographische Verfahren und Rat bei der Erfüllung von Anforderungen zu Datensouveränität in der AWS Cloud. Wir werden die Zusammenarbeit mit lokalen Partnern, die besonderes Vertrauen bei unseren gemeinsamen Kunden genießen, intensivieren, um bei der Erfüllung der digitalen Souveränitätsanforderungen zu unterstützen.

Wir verpflichten uns dazu, unseren Kunden bei der Erfüllung ihrer Anforderungen an digitale Souveränität zu helfen. Wir werden weiterhin Souveränitäts-Funktionen, Kontrollen und Zusicherungen für die globale AWS Cloud entwickeln, die das gesamte Leistungsspektrum von AWS erschließen.


Italian version

AWS Digital Sovereignty Pledge: Controllo senza compromessi

Abbiamo sempre creduto che, affinché il cloud possa realizzare in pieno il potenziale, sia essenziale che i clienti abbiano il controllo dei propri dati. Offrire ai clienti questa “sovranità” è sin dall’inizio una priorità per AWS, da quando eravamo l’unico grande cloud provider a consentire ai clienti di controllare la localizzazione e lo spostamento dei propri dati. L’importanza di questo principio è aumentata negli ultimi 16 anni, man mano che il cloud ha iniziato a diffondersi e i governi e gli organismi di standardizzazione hanno continuato a sviluppare normative in materia di sicurezza, protezione dei dati e privacy.

Oggi, avere il controllo sulle risorse digitali, sulla sovranità digitale, è più importante che mai.

Nell’innovare ed espandere l’offerta cloud più completa, scalabile e affidabile al mondo, la nostra priorità è stata assicurarci che i clienti – ovunque operino – abbiano il controllo e siano in grado di soddisfare i requisiti normativi. Il contesto varia notevolmente tra i settori e i paesi. In molti luoghi del mondo, come in Europa, le politiche di sovranità digitale si stanno evolvendo rapidamente. I clienti stanno affrontando un crescente livello di complessità, e negli ultimi diciotto mesi molti di essi ci hanno detto di essere preoccupati di dover scegliere tra usare AWS in tutta la sua potenza e una soluzione cloud sovrana con funzionalità limitate che potrebbe ostacolare la loro capacità di innovazione, trasformazione e crescita. Crediamo fermamente che i clienti non debbano fare questa scelta.

Ecco perché oggi presentiamo l’AWS Digital Sovereignty Pledge, il nostro impegno a offrire a tutti i clienti AWS l’insieme più avanzato di controlli e funzionalità sulla sovranità digitale disponibili nel cloud.

AWS offre già una gamma di funzionalità di protezione dei dati, accreditamenti e impegni contrattuali che consentono ai clienti di controllare la localizzazione, l’accesso e l’utilizzo dei propri dati. Ci impegniamo a espandere queste funzionalità per consentire ai clienti di tutto il mondo di soddisfare i propri requisiti di sovranità digitale senza compromettere le capacità, le prestazioni, l’innovazione e la scalabilità del cloud AWS. Allo stesso tempo, continueremo a lavorare per comprendere a fondo le esigenze e i requisiti in evoluzione sia dei clienti che delle autorità di regolamentazione, adattandoci e innovando rapidamente per soddisfarli.

Sovereign-by-design

Il nostro approccio per mantenere questo impegno è quello di continuare a rendere il cloud AWS “sovereign-by-design”, come è stato sin dal primo giorno. All’inizio della nostra storia, abbiamo ricevuto da clienti in settori come quello finanziario e sanitario – che sono tra le organizzazioni più attente al mondo alla sicurezza e alla privacy dei dati – molti input sulle funzionalità e sui controlli relativi alla protezione dei dati di cui necessitano per utilizzare il cloud. Abbiamo sviluppato le funzionalità di crittografia e gestione delle chiavi di AWS, ottenuto accreditamenti di conformità e preso impegni contrattuali per soddisfare le esigenze dei nostri clienti. Con l’evolversi delle loro esigenze, sviluppiamo ed espandiamo il cloud AWS.

Un paio di esempi recenti includono i data residency guardrail che abbiamo aggiunto a AWS Control Tower (un servizio per la gestione degli ambienti in AWS) alla fine dell’anno scorso, che offrono ai clienti un controllo ancora maggiore sulla posizione fisica in cui i dati dei clienti vengono archiviati ed elaborati. A febbraio 2022 abbiamo annunciato i servizi AWS che aderiscono al Codice di condotta sulla protezione dei dati dei servizi di infrastruttura cloud in Europa (CISPE), offrendo ai clienti una verifica indipendente e un ulteriore livello di garanzia che i nostri servizi possano essere utilizzati in conformità con il Regolamento generale sulla protezione dei dati (GDPR). Queste funzionalità e garanzie sono disponibili per tutti i clienti AWS.

Ci impegniamo a continuare a investire in una roadmap ambiziosa di funzionalità per la residenza dei dati, la restrizione granulare dell’accesso, la crittografia e la resilienza:

1. Controllo della localizzazione dei tuoi dati

I clienti hanno sempre controllato la localizzazione dei propri dati con AWS. Ad esempio, attualmente in Europa i clienti possono scegliere di distribuire i propri dati attraverso una delle otto Region AWS disponibili. Ci impegniamo a fornire ancora più servizi e funzionalità per proteggere i dati dei nostri clienti e ad espandere le nostre capacità esistenti per fornire controlli e trasparenza ancora più dettagliati sulla residenza dei dati. Estenderemo inoltre anche i controlli relativi ai dati operativi, come le informazioni su identità e fatturazione.

2. Controllo verificabile sull’accesso ai dati

Abbiamo progettato e realizzato un’innovazione unica nel suo genere per limitare l’accesso ai dati dei clienti. Il sistema AWS Nitro, che è alla base dei servizi informatici di AWS, utilizza hardware e software specializzati per proteggere i dati dall’accesso esterno durante l’elaborazione su EC2. Fornendo un solido limite di sicurezza fisico e logico, Nitro è progettato per applicare restrizioni in modo che nessuno, nemmeno in AWS, possa accedere ai carichi di lavoro dei clienti su EC2. Ci impegniamo a continuare a creare ulteriori restrizioni di accesso che limitino tutti gli accessi ai dati dei clienti, a meno che non sia richiesto dal cliente o da un partner di loro fiducia.

3. La capacità di criptare tutto e ovunque

Attualmente, offriamo ai clienti funzionalità e controlli per criptare i dati, che siano questi in transito, a riposo o in memoria. Tutti i servizi AWS già supportano la crittografia, e la maggior parte supporta anche la crittografia con chiavi gestite dal cliente, non accessibili per AWS. Ci impegniamo a continuare a innovare e investire in controlli aggiuntivi per la sovranità e le funzionalità di crittografia in modo che i nostri clienti possano criptare tutto e ovunque con chiavi di crittografia gestite all’interno o all’esterno del cloud AWS.

4. Resilienza del cloud

Non è possibile raggiungere la sovranità digitale senza resilienza e affidabilità. Il controllo dei carichi di lavoro e l’elevata disponibilità sono essenziali in caso di eventi come l’interruzione della catena di approvvigionamento, l’interruzione della rete e i disastri naturali. Attualmente AWS offre il più alto livello di disponibilità di rete rispetto a qualsiasi altro cloud provider. Ogni regione AWS è composta da più zone di disponibilità (AZ), che sono partizioni di infrastruttura completamente isolate. Per isolare meglio i problemi e ottenere un’elevata disponibilità, i clienti possono partizionare le applicazioni tra più AZ nella stessa Region AWS. Per i clienti che eseguono carichi di lavoro in locale o in casi d’uso remoti o connessi in modo intermittente, offriamo servizi che forniscono funzionalità specifiche per i dati offline e l’elaborazione e lo storage remoti. Ci impegniamo a continuare a migliorare la nostra gamma di opzioni per la sovranità del dato e la resilienza, consentendo ai clienti di continuare ad operare anche in caso di interruzioni o disconnessioni.

Guadagnare fiducia attraverso trasparenza e garanzie

In AWS, guadagnare la fiducia dei clienti è alla base della nostra attività. Comprendiamo che proteggere i dati dei clienti è fondamentale per raggiungere questo obiettivo, e sappiamo che la fiducia si guadagna anche attraverso la trasparenza. Per questo siamo trasparenti su come i nostri servizi elaborano e spostano i dati. Continueremo ad opporci alle richieste relative ai dati dei clienti provenienti dalle forze dell’ordine e dalle agenzie governative. Forniamo linee guida, prove di conformità e impegni contrattuali in modo che i nostri clienti possano utilizzare i servizi AWS per soddisfare i requisiti normativi e di conformità. Ci impegniamo infine a continuare a fornire la trasparenza e la flessibilità aziendali necessarie a soddisfare l’evoluzione delle leggi sulla privacy e sulla sovranità del dato.

Gestire i cambiamenti come un team

Aiutare i clienti a proteggere i propri dati in un mondo caratterizzato da normative, tecnologie e rischi in evoluzione richiede un lavoro di squadra. Non ci aspetteremmo mai che i nostri clienti lo facessero da soli. I nostri partner di fiducia svolgono un ruolo di primo piano nel fornire soluzioni ai clienti. Ad esempio in Germania, T-Systems (parte di Deutsche Telekom) offre un servizio denominato Data Protection as a Managed Service on AWS. Questo fornisce indicazioni per garantire che i controlli di residenza dei dati siano configurati correttamente, offrendo servizi per la configurazione e la gestione delle chiavi di crittografia, oltre a competenze per aiutare i clienti a soddisfare i requisiti di sovranità digitale nel cloud AWS, per i quali stiamo collaborando con partner locali di cui i nostri clienti si fidano.

Ci impegniamo ad aiutare i nostri clienti a soddisfare i requisiti di sovranità digitale. Continueremo a innovare le funzionalità, i controlli e le garanzie di sovranità del dato all’interno del cloud AWS globale e a fornirli senza compromessi sfruttando tutta la potenza di AWS.


Japanese version

AWS Digital Sovereignty Pledge(AWSのデジタル統制に関するお客様との約束): 妥協のない管理

AWS セールス、マーケティングおよびグローバルサービス担当シニアバイスプレジデント マット・ガーマン(Matt Garman)

クラウドの可能性を最大限に引き出すためには、お客様が自らデータを管理することが不可欠であると、私たちは常に考えてきました。アマゾン ウェブ サービス(以下、AWS) は、お客様がデータの保管場所の管理とデータの移動を統制できる唯一の大手クラウドプロバイダーであった当初から、お客様にこれらの統制を実施していただくことを最優先事項としていました。クラウドが主流になり、政府機関及び標準化団体がセキュリティ、データ保護、プライバシーに関する規制を策定し続ける中で、お客様による統制の重要性は過去 16 年間にわたって一貫して高まっています。

今日、デジタル資産を管理すること、つまりデジタル統制は、かつてないほど重要になっています。

私たちは、世界で最も高性能で、スケーラビリティと信頼性の高いクラウドを提供するために、イノベーションとサービスの拡大を実現してきました。その中でお客様が、どの地域でも継続して管理し、規制要件を満たせるようにすることを最優先してきました。この状況は、業界や国によって大きく異なります。ヨーロッパなど世界中の多くの地域で、デジタル統制に関する政策が急速に発展しています。お客様は非常に複雑な状況に直面しています。過去 18 か月間にわたり、私たちは、お客様から、AWSの全ての機能を使うか、またはイノベーション・変革・成長を妨げてでも機能が限定されたクラウドソリューションを使うかを選択しなければいけない、という懸念の声を多く聞いてきました。私たちは、お客様がこのような選択を迫られるべきではないと考えています。

本日、私たちは、「AWS Digital Sovereignty Pledge(AWSのデジタル統制に関するお客様との約束)」を発表いたします。クラウドで利用できる最も高度な一連の統制管理と機能とを、すべての AWS のお客様に提供します。

AWS は既に、データの保存場所、アクセスできるユーザー、データの使用方法をお客様が管理できるようにする、さまざまなデータ保護機能、認定、契約上の責任を提供しています。私たちは、世界中のお客様が AWS クラウドの機能、パフォーマンス、イノベーション、スケールを犠牲にすることなく、デジタル統制の要件を満たすことができるよう、これらの機能を拡張することを約束いたします。同時に、お客様と規制当局双方の進化するニーズと要件を深く理解し、迅速な導入とイノベーションにより、デジタル統制のニーズと要件を満たすよう継続的に努力していきます。

企画・設計段階からの統制 (sovereign-by-design)

この約束を果たすための私たちのアプローチは、創業初日からそうであったように、AWS クラウドにおいて企画・設計段階からの統制を維持し続けることです。私たちのこれまでの取組の初期段階で、金融サービスやヘルスケアなどの業界のお客様 (世界で最もセキュリティとデータのプライバシーを重視する組織であるお客様) から、クラウドを使うために必要なデータ保護機能や管理について多くのご意見をいただきました。そして、私たちは AWS の暗号化とキー管理機能を開発し、コンプライアンス認定を取得して、お客様のニーズを満たすための契約を締結してきました。お客様の要件の進化に応じて、AWS クラウドも進化、拡張しています。最近の例としては、昨年末に AWS Control Tower (AWS 環境を管理するサービス) に追加したデータレジデンシーガードレールがあります。これにより、お客様のデータが保存および処理される物理的な場所をさらに詳細に管理できるようになりました。2022 年 2 月に、Cloud Infrastructure Service Providers in Europe (CISPE) のデータ保護行動規範に準拠した AWS のサービスを発表しました。これにより、お客様は独立した検証と、当社のサービスがEUの一般データ保護規則 (GDPR) に準拠して使用できるという追加の保証を得ることができます。これらの機能と保証は、すべての AWS のお客様にご利用いただけます。

私たちは、データ保管、きめ細かなアクセス制限、暗号化、耐障害性に関する機能の展開・拡大に引き続き投資することを約束します。

1. データの保管場所の管理

お客様は、AWS を使用して常にデータの保管場所を制御できます。例えば、現在ヨーロッパでは、8 つある既存のリージョンのいずれにもデータを保管できます。私たちは、お客様のデータを保護するために、さらに多くのサービスと機能を提供するよう努めています。さらに、よりきめ細かなデータの保管・管理と透明性を提供するため、既存の機能を拡張することにも取り組んでいます。また、ID や請求情報などの運用データのデータ保管・管理も拡大していきます。

2.検証可能なデータアクセスの管理

私たちは、お客様のデータへのアクセスを制限する、他に類を見ないイノベーションを設計し、実現してきました。AWS のコンピューティングサービスの基盤である AWS Nitro System は、専用のハードウェアとソフトウェアを使用して、Amazon Elastic Compute Cloud (Amazon EC2) での処理中に外部アクセスからデータを保護します。Nitro は、物理的にも論理的にも強固なセキュリティ境界を設けることで、AWS の従業員を含め、誰も Amazon EC2 上のお客様のワークロードにアクセスできないように制限を行います。私たちは、お客様またはお客様が信頼しているパートナーからの要求がない限り、お客様のデータへのすべてのアクセスを制限できるよう、さらなるアクセス管理の構築に継続的に取り組んでいきます。

3.あらゆる場所ですべてを暗号化する機能

現在私たちは、転送中、保存中、メモリ内にあるかを問わず、データを暗号化する機能と管理をお客様に提供しています。すべての AWS のサービスは既に暗号化をサポートしており、そのほとんどのサービスでは、AWSからもアクセスできないお客様が管理するキーによる暗号化もサポートしています。私たちは、お客様が AWS クラウドの内部または外部で管理されている暗号化キーを使用して、あらゆる場所ですべてを暗号化できるように、統制と暗号化機能のさらなる管理に向けたイノベーションと投資を続けていきます。

4. クラウドの耐障害性

可用性と強靭性なくしてデジタル統制を実現することは不可能です。サプライチェーンの中断、ネットワークの中断、自然災害などの事象が発生した場合、ワークロードの管理と高可用性が重要になります。現在、AWS はどのクラウドプロバイダーよりも高いネットワーク可用性を実現しています。各 AWS リージョンは、完全に分離されたインフラストラクチャパーティションである複数のアベイラビリティーゾーン (AZ) で構成されています。生じうる事象をより適切に分離して高可用性を実現するために、お客様は同じ AWS リージョン内の複数の AZ にアプリケーションを分割できます。ワークロードをオンプレミスで実行しているお客様、または断続的な接続やリモートのユースケースには、オフラインデータおよびリモートコンピューティングとストレージに関する特定の機能を備えたサービスを提供しています。私たちは、中断や切断が発生してもお客様が業務を継続できるように、統制と回復力のある選択肢を継続的に拡大していきます。

透明性と保証による信頼の獲得

AWS では、お客様の信頼を得ることはビジネスの根幹です。そのためには、お客様のデータの保護が不可欠であると考えています。また、信頼を継続的に得るには、透明性が必要であることも理解しています。私たちは、当社のサービスによるデータの処理および転送方法に関して、透明性を確保しています。法執行機関や政府機関からのお客様のデータの要求に対しては、引き続き異議申し立てを行っていきます。お客様が AWS のサービスを利用してコンプライアンスや規制の要件を満たすことができるように、ガイダンス、コンプライアンスの証拠、契約責任を提供します。私たちは、進化するプライバシーおよび統制に関する法律に対応するために必要な透明性とビジネスの柔軟性を引き続き提供していきます。

チームとしての変化への対応

規制、テクノロジー、リスクが変化する世界において、お客様によるデータの保護を支援するためには、チームワークが必要です。私たちは、お客様だけで対応することを期待するようなことは決してありません。AWS の信頼できるパートナーが、お客様にソリューションを提供するうえで顕著な役割を果たします。例えば、ドイツでは、T-Systems (ドイツテレコムグループ) が AWS のマネージドサービスとしてデータ保護を提供しています。同社は、データ保護・管理が適切に設定されていることを確認するためのガイダンスを提供し、暗号化キーの設定と管理に関するサービスと専門知識を提供して、顧客が AWS クラウドでデジタル統制要件に対応できるよう支援しています。私たちは、デジタル統制要件への対応を支援するために、お客様が信頼するローカルパートナーとの連携を強化しています。

私たちは、お客様がデジタル統制要件を満たすことができるよう、支援を行うことを約束しています。私たちは引き続き、グローバルな AWS クラウドにおける統制機能、管理、保証を革新し、それらを AWS の全ての機能において妥協することなく提供していきます。


Korean version

AWS 디지털 주권 서약: 타협 없는 제어

작성자: Matt Garman, 아마존웹서비스(AWS) 마케팅 및 글로벌 서비스 담당 수석 부사장

아마존웹서비스(AWS)는 클라우드가 지닌 잠재력이 최대로 발휘되기 위해서는 고객이 반드시 자신의 데이터를 제어할 수 있어야 한다고 항상 믿어 왔습니다. AWS는 서비스 초창기부터 고객이 데이터의 위치와 이동을 제어할 수 있도록 허용하는 유일한 주요 클라우드 공급업체였고 지금까지도 고객에게 이러한 권리를 부여하는 것을 최우선 과제로 삼고 있습니다. 클라우드가 주류가 되고 정부 및 표준 제정 기관이 보안, 데이터 보호 및 개인정보보호 규정을 지속적으로 발전시키면서 지난 16년간 이러한 원칙의 중요성은 더욱 커졌습니다.

오늘날 디지털 자산 또는 디지털 주권 제어는 그 어느 때보다 중요합니다.

AWS는 세계에서 가장 우수하고 확장 가능하며 신뢰할 수 있는 클라우드를 제공하기 위한 혁신과 서비스 확대에 주력하면서, 고객이 사업을 운영하는 모든 곳에서 스스로 통제할 수 있을 뿐 아니라 규제 요구 사항을 충족할 수 있어야 한다는 점을 언제나 최우선 순위로 여겨왔습니다. AWS의 이러한 기업 철학은 개별 산업과 국가에 따라 상당히 다른 모습으로 표출됩니다. 유럽을 비롯해 전 세계 많은 지역에서 디지털 주권 정책이 빠르게 진화하고 있습니다. 고객은 엄청난 복잡성에 직면해 있으며, 지난 18개월 동안 많은 고객들은 AWS의 모든 기능을 갖춘 클라우드 서비스와 혁신, 변화, 성장을 저해할 수 있는 제한적 기능의 클라우드 솔루션 중 하나를 선택해야만 할 수도 있다는 우려가 있다고 말하였습니다. 하지만 AWS는 고객이 이러한 선택을 해서는 안 된다는 굳건한 믿음을 가지고 있습니다.

이것이 바로 오늘, 모든 AWS 고객에게 클라우드 기술에서 가장 발전된 형태의 디지털 주권 제어와 기능을 제공하겠다는 약속인, “AWS 디지털 주권 서약”을 소개하는 이유입니다.

AWS는 이미 고객이 데이터 위치, 데이터에 액세스가 가능한 인력 및 데이터 이용 방식을 제어할 수 있는 다양한 데이터 보호 기능, 인증 및 계약상 의무를 제공하고 있습니다. 또한 전 세계 고객이 AWS 클라우드의 기능, 성능, 혁신 및 규모에 영향을 주지 않으면서 디지털 주권 관련 요건을 충족할 수 있도록 기존 데이터 보호 정책을 지속적으로 확대할 것을 약속합니다. 동시에 고객과 규제 기관의 변화하는 요구와 규제 요건을 깊이 이해하고 이를 충족하기 위해 신속하게 적응하고 혁신을 이어나가고자 합니다.

디지털 주권 보호를 위한 원천 설계

이 약속을 이행하기 위한 우리의 접근 방식은 처음부터 그러했듯이 AWS 클라우드를 애초부터 디지털 주권을 강화하는 방식으로 설계하는 것입니다. AWS는 사업 초기부터 금융 서비스 및 의료 업계와 같이 세계 최고 수준의 보안 및 데이터 정보 보호를 요구하는 조직의 고객으로부터 클라우드를 사용하는 데 필요한 데이터 보호 기능 및 제어에 관한 많은 의견을 수렴했습니다. AWS는 고객의 요구 사항을 충족하기 위하여 암호화 및 핵심적 관리 기능을 개발하고, 규정 준수 인증을 획득했으며, 계약상 의무를 제공하였습니다. 고객의 요구 사항이 변화함에 따라 AWS 클라우드도 진화하고 확장해 왔습니다. 최근의 몇 가지 예를 들면, 작년 말 AWS Control Tower(AWS 환경 관리 서비스)에 추가한 데이터 레지던시 가드레일이 있습니다. 이는 고객 데이터가 저장되고 처리되는 물리적 위치에 대한 제어 권한을 고객에게 훨씬 더 많이 부여하는 것을 목표로 합니다. 2022년 2월에는 유럽 클라우드 인프라 서비스(CISPE) 데이터 보호 행동 강령을 준수하는 AWS 서비스를 발표한 바 있습니다. 이것은 유럽연합의 개인정보보호규정(GDPR)에 따라 서비스를 사용할 수 있다는 독립적인 검증과 추가적 보증을 고객에게 제공하는 것입니다. 이러한 기능과 보증은 AWS의 모든 고객에게 적용됩니다.

AWS는 데이터 레지던시, 세분화된 액세스 제한, 암호화 및 복원력 기능의 증대를 위한 야심 찬 로드맵에 다음과 같이 계속 투자할 것을 약속합니다.

1. 데이터 위치에 대한 제어

고객은 항상 AWS를 통해 데이터 위치를 제어해 왔습니다. 예를 들어 현재 유럽에서는 고객이 기존 8개 리전 중 원하는 위치에 데이터를 배치할 수 있습니다. AWS는 고객의 데이터를 보호하기 위한 더 많은 서비스와 기능을 제공할 것을 약속합니다. 또한, 기존 기능을 확장하여 더욱 세분화된 데이터 레지던시 제어 및 투명성을 제공할 것을 약속합니다. 뿐만 아니라, ID 및 결제 정보와 같은 운영 데이터에 대한 데이터 레지던시 제어를 확대할 예정입니다.

2. 데이터 액세스에 대한 검증 가능한 제어

AWS는 고객 데이터에 대한 액세스를 제한하는 최초의 혁신적 설계를 제공했습니다. AWS 컴퓨팅 서비스의 기반인 AWS Nitro System은 특수 하드웨어 및 소프트웨어를 사용하여 EC2에서 처리하는 동안 외부 액세스로부터 데이터를 보호합니다. 강력한 물리적 보안 및 논리적 보안의 경계를 제공하는 Nitro는 AWS의 모든 사용자를 포함하여 누구도 EC2의 고객 워크로드에 액세스할 수 없도록 설계되었습니다. AWS는 고객 또는 고객이 신뢰하는 파트너의 요청이 있는 경우를 제외하고, 고객 데이터에 대한 모든 액세스를 제한하는 추가 액세스 제한을 계속 구축할 것을 약속합니다.

3. 모든 것을 어디서나 암호화하는 능력

현재 AWS는 전송 중인 데이터, 저장된 데이터 또는 메모리에 있는 데이터를 암호화할 수 있는 기능과 제어권을 고객에게 제공합니다. 모든 AWS 서비스는 이미 암호화를 지원하고 있으며, 대부분 AWS에서 액세스할 수 없는 고객 관리형 키를 사용한 암호화도 지원합니다. 또한 고객이 AWS 클라우드 내부 또는 외부에서 관리되는 암호화 키로 어디서든 어떤 것이든 것을 암호화할 수 있도록 디지털 주권 및 암호화 기능에 대한 추가 제어를 지속적으로 혁신하고 투자할 것을 약속합니다.

4. 클라우드의 복원력

복원력과 생존성 없이는 디지털 주권을 달성할 수 없습니다. 공급망 장애, 네트워크 중단 및 자연 재해와 같은 이벤트가 발생할 경우 워크로드 제어 및 고가용성은 매우 중요합니다. 현재 AWS는 모든 클라우드 공급업체 중 가장 높은 네트워크 가용성을 제공합니다. 각 AWS 리전은 완전히 격리된 인프라 파티션인 여러 가용 영역(AZ)으로 구성됩니다. 문제를 보다 효과적으로 격리하고 고가용성을 달성하기 위해 고객은 동일한 AWS 리전의 여러 AZ로 애플리케이션을 분할할 수 있습니다. 온프레미스 또는 간헐적으로 연결되거나 원격 사용 사례에서 워크로드를 실행하는 고객을 위해 오프라인 데이터와 원격 컴퓨팅 및 스토리지에 대한 특정 기능을 지원하는 서비스를 제공합니다. AWS는 고객이 중단되거나 연결이 끊기는 상황에도 운영을 지속할 수 있도록 자주적이고 탄력적인 옵션의 범위를 지속적으로 강화할 것을 약속합니다.

투명성과 보장을 통한 신뢰 확보

고객의 신뢰를 얻는 것이 AWS 비즈니스의 토대입니다. 이를 위해서는 고객 데이터를 보호하는 것이 핵심이라는 점을 잘 알고 있습니다. 투명성을 통해 계속 신뢰를 쌓아야 한다는 점 또한 잘 알고 있습니다. AWS는 서비스가 데이터를 처리하고 전송하는 방식을 투명하게 공개합니다. 법 집행 기관 및 정부 기관의 고객 데이터 제공 요청에 대하여 계속해서 이의를 제기할 것입니다. AWS는 고객이 AWS 서비스를 사용하여 규정 준수 및 규제 요건을 충족할 수 있도록 지침, 규정 준수 증거 및 계약상의 의무 이행을 제공합니다. AWS는 진화하는 개인 정보 보호 및 각 지역의 디지털 주권 규정을 준수하는 데 필요한 투명성과 비즈니스 유연성을 계속 제공할 것을 약속합니다.

팀 차원의 변화 모색

변화하는 규정, 기술 및 위험이 있는 환경에서 고객이 데이터를 보호할 수 있도록 하려면 팀워크가 필요합니다. 고객 혼자서는 절대 할 수 없는 일입니다. 신뢰할 수 있는 AWS의 파트너는 고객이 문제를 해결하는데 중요한 역할을 합니다. 예를 들어, 독일에서는 T-Systems(Deutsche Telekom의 일부)를 통해 AWS의 관리형 서비스로서의 데이터 보호를 제공합니다. 데이터 레지던시 제어가 올바르게 구성되도록 하기 위한 지침을 제공하고, 암호화 키의 구성 및 관리를 위한 서비스를 통해 고객이 AWS 클라우드에서 디지털 주권 요구 사항을 해결하는 데 도움이 되는 전문 지식을 제공합니다. AWS는 고객이 디지털 주권 요구 사항을 해결하는 데 도움이 될 수 있도록 신뢰할 수 있는 현지 파트너와 협력하고 있습니다.

AWS는 고객이 디지털 주권 요구 사항을 충족할 수 있도록 최선의 지원을 다하고있습니다. AWS는 글로벌 AWS 클라우드 내에서 주권 기능, 제어 및 보안을 지속적으로 혁신하고, AWS의 모든 기능에 대한 어떠한 타협 없이 이를 제공할 것입니다.

Matt Garman

Matt is the Senior Vice President of Sales, Marketing and Global Services at AWS, and also sits on Amazon's executive leadership S-Team. Matt joined Amazon in 2006, when AWS first launched, and served as one of the first product managers, helping to launch the initial set of AWS services. Since then, he has held several leadership positions in AWS, including more than 10 years as Vice President of the Amazon EC2 and Compute Services businesses, where he was responsible for P&L, product management, and engineering and operations for all compute and storage services in AWS. Prior to Amazon, he spent time in product management roles at early-stage Internet startups. Matt earned a BS and MS in Industrial Engineering from Stanford University, and an MBA from the Kellogg School of Management at Northwestern University.

Lower your Amazon OpenSearch Service storage cost with gp3 Amazon EBS volumes

Post Syndicated from Siddhant Gupta original https://aws.amazon.com/blogs/big-data/lower-your-amazon-opensearch-service-storage-cost-with-gp3-amazon-ebs-volumes/

Amazon OpenSearch Service makes it easy for you to perform interactive log analytics, real-time application monitoring, website search, and more. OpenSearch is an open-source, distributed search and analytics suite comprising OpenSearch, a distributed search and analytics engine, and OpenSearch Dashboards, a UI and visualization tool. When you use Amazon OpenSearch Service, you configure a set of data nodes to store indexes and serve queries. The service supports instance types for data nodes with different storage options. Some supported Amazon Elastic Compute Cloud (Amazon EC2) instance types, like the R6GD or I3, have local NVMe disks. Others use Amazon Elastic Block Store (Amazon EBS) storage.

In July 2022, OpenSearch Service launched support for next-generation general purpose SSD (gp3) EBS volumes. OpenSearch Service data nodes require low-latency, high-throughput storage to provide fast indexing and querying. With gp3 EBS volumes, you get higher baseline performance (IOPS and throughput) at a 9.6% lower cost than with the previously offered gp2 EBS volume type. With gp3, you can also provision additional IOPS and throughput independent of volume size. gp3 volumes are more stable as well, because they don't use burst credits. In addition, OpenSearch Service support for gp3 doubles the per-data-node volume size limit. With these larger volumes, you can reduce the cost of passive data by increasing the amount of storage per node.

We recommend that you consider gp3 as the best Amazon EBS option for price/performance and flexibility. In this post, I discuss the basics of gp3 and various cost-saving use cases. Migrating from previous generation storage (gp2, PIOPS, and magnetic) volumes to the latest generation gp3 volumes allows you to reduce monthly storage costs and optimize instance utilization.

Comparing gp2 and gp3

gp3 is the successor to the general purpose SSD gp2 volume type. The key benefits of gp3 include higher baseline performance, a 9.6% lower cost, and the ability to provision higher performance regardless of volume size. The following table summarizes the key differences between gp2 and gp3.

Volume size
  • gp3 – Depends on instance type. OpenSearch Service supports a maximum of 24 TiB on R6g.12Xlarge. For the latest instance limits, see Amazon OpenSearch Service quotas.
  • gp2 – Depends on instance type. OpenSearch Service supports a maximum of 12 TiB on R6g.12Xlarge.
Baseline IOPS
  • gp3 – 3,000 IOPS for volumes up to 1,024 GiB. For volumes above 1,024 GiB, you get 3 IOPS/GiB, without burst credit complexity.
  • gp2 – 3 IOPS/GiB (minimum 100 IOPS), up to a maximum of 16,000 IOPS. Volumes smaller than 1 TiB can also burst up to 3,000 IOPS.
Max IOPS per volume
  • gp3 – 16,000
  • gp2 – 16,000
Baseline throughput
  • gp3 – 125 MiB/s free for volumes up to 170 GiB, or 250 MiB/s free for volumes above 170 GiB.
  • gp2 – Between 125 MiB/s and 250 MiB/s, depending on the volume size.
Max throughput per volume
  • gp3 – 1,000 MiB/s
  • gp2 – 250 MiB/s
Price for the us-east-1 Region
  • gp3 – Storage: $0.122/GB-month. IOPS: 3,000 IOPS free for volumes up to 1,024 GiB, or 3 IOPS/GiB free for volumes above 1,024 GiB; $0.008/provisioned IOPS-month over the free limits. Throughput: 125 MiB/s free for volumes up to 170 GiB, or an additional 250 MiB/s free for every 3 TiB for volumes above 170 GiB; $0.064/provisioned MiB/s-month over the free limits.
  • gp2 – Storage: $0.135/GB-month. IOPS and throughput provisioning not allowed.
Supported instance families
  • gp3 – T3, C5, M5, R5, C6g, M6g, and R6g
  • gp2 – T2, C4, M4, R4, T3, C5, M5, R5, C6g, M6g, and R6g

Lower your monthly bills with gp3

The ability to provision IOPS and throughput independent of volume size, and support for denser (twice as large) volumes, are two significant advantages of adopting gp3. Together, these benefits enable several use cases that lower your monthly bill. In this section, we present a few pricing comparisons for OpenSearch Service domains.

gp2 vs. gp3

This is the most common scenario, in which existing gp2 customers switch to gp3 and immediately begin saving 9.6% due to the lower monthly price per GB for gp3 storage. You can also benefit from the fact that gp3 supports volume sizes two times larger for the R5, R6g, M5, and M6g instance families. This means that you don’t need to spin up new instances for denser storage requirements and can achieve higher storage on the same instance. OpenSearch Service currently supports a maximum of 24 TiB of gp3 storage on R6g.12Xlarge instances.

PIOPS (io1) vs. gp3

OpenSearch Service supports the PIOPS SSD (io1) EBS volume type. You can switch to gp3 and provision additional IOPS and throughput to meet your specific performance requirements. The following table compares the monthly cost of PIOPS (io1) storage on r5.large.search instances with gp3 storage on r6g.large.search instances, for a requirement of 6 TiB of storage and 16,000 IOPS. In this example, you would save 65% by adopting gp3.

Instance cost
  • PIOPS (io1) – 6 instances * $0.186/hr = $830/month. (r5.large.search supports up to 1 TiB of io1 storage, so six instances are required for 6 TiB.)
  • gp3 – 3 instances * $0.167/hr = $372/month. (r6g.large.search supports up to 2 TiB of gp3 storage, so three instances are required for 6 TiB.)
Storage cost (6 TiB = 6,597 GB)
  • PIOPS (io1) – 6,597 GB * $0.169/GB-month = $1,115/month. (io1 storage is priced at $0.169 per GB-month.)
  • gp3 – 6,597 GB * $0.122/GB-month = $805/month. (gp3 storage is priced at $0.122 per GB-month.)
PIOPS cost (16,000 IOPS)
  • PIOPS (io1) – 16,000 IOPS * $0.088/IOPS-month = $1,408/month. (io1 PIOPS is priced at $0.088 per IOPS-month.)
  • gp3 – $0. Approximately 18,000 IOPS (3 IOPS/GiB) is included in the price of a 6 TiB gp3 volume, so no additional IOPS need to be provisioned.
Total monthly bill
  • PIOPS (io1) – $3,353/month
  • gp3 – $1,177/month
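To sanity-check comparisons like this for your own workload, the arithmetic can be scripted. The following is a minimal sketch that closely reproduces the numbers above; it assumes a 744-hour month and the us-east-1 prices listed in this post, and the DomainCost model and its field names are illustrative, not an AWS API.

```python
from dataclasses import dataclass
from math import ceil

HOURS_PER_MONTH = 744  # 31-day month, matching the examples in this post

@dataclass
class DomainCost:
    """Rough monthly cost model for an OpenSearch Service domain (illustrative only)."""
    instance_hourly_price: float   # $/hour per data node
    per_node_storage_tib: float    # max storage per data node for the chosen volume type
    storage_price_gb_month: float  # $/GB-month
    extra_iops: int = 0            # IOPS billed on top of the free baseline
    iops_price: float = 0.0        # $/IOPS-month

    def monthly_total(self, storage_tib: float) -> float:
        nodes = ceil(storage_tib / self.per_node_storage_tib)
        storage_gb = storage_tib * 1099.5  # 1 TiB ~= 1,099.5 GB (6 TiB = 6,597 GB)
        instance = nodes * self.instance_hourly_price * HOURS_PER_MONTH
        storage = storage_gb * self.storage_price_gb_month
        return instance + storage + self.extra_iops * self.iops_price

# io1 on r5.large.search vs. gp3 on r6g.large.search, for 6 TiB and 16,000 IOPS
io1 = DomainCost(0.186, 1, 0.169, extra_iops=16_000, iops_price=0.088)
gp3 = DomainCost(0.167, 2, 0.122)  # 16,000 IOPS fits inside gp3's free 3 IOPS/GiB baseline

print(f"io1: ${io1.monthly_total(6):,.0f}/month, gp3: ${gp3.monthly_total(6):,.0f}/month")
```

You can plug in your own instance prices, per-node volume limits, and storage sizes to evaluate the I3 and UltraWarm comparisons later in this post the same way.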

I3 vs. gp3

I3 instances include Non-Volatile Memory Express (NVMe) SSD-based instance storage optimized for low latency, very high random I/O performance, and high sequential read throughput, and deliver high IOPS. However, I3 instances use older, third-generation CPUs, and the largest supported storage size is 15 TiB on the i3.16xlarge.search instance. You should consider using latest-generation instances such as R6g with gp3 storage to get lower cost and better performance than I3 instances.

To understand the cost advantage, let's compare I3 and gp3 for a 12 TiB data storage requirement. By switching to gp3 on current-generation instances, you can reduce your monthly bill by 56%, as calculated in the following table.

On-demand instance cost for the us-east-1 Region
  • I3.4xlarge – 4 instances * $1.99/hr = $5,922/month. (i3.4xlarge.search supports up to 3.8 TiB of instance storage, so four instances are required for 12 TiB.)
  • gp3 with R6g.xlarge – 4 instances * $0.335/hr = $996/month. (r6g.xlarge.search supports up to 3 TiB with gp3, so four instances are required for 12 TiB.)
Storage cost (12 TiB = 13,194 GB)
  • I3.4xlarge – N/A (included in the instance price)
  • gp3 with R6g.xlarge – 13,194 GB * $0.122/GB-month = $1,610/month.
Total monthly bill
  • I3.4xlarge – $5,922/month
  • gp3 with R6g.xlarge – $2,606/month

UltraWarm vs. gp3

UltraWarm is designed to provide inexpensive access to infrequently accessed data, such as logs older than 30 days. Warm storage is useful for indexes that aren't actively being written to, are queried less frequently, and don't require high performance. If you have a large, query-intensive workload and are using UltraWarm to optimize costs but encountering higher query volumes than it can handle, consider moving some of that data to hot nodes with gp3 storage. UltraWarm remains the least expensive option for warm (less frequently accessed) data, but you shouldn't use it for hot data. A combination of low-cost gp3 storage and denser instances can help you achieve cost-optimized, higher performance for hot data.

The following table shows the monthly cost of running a 30 TiB workload on UltraWarm, compared to the potential monthly cost of running it entirely on hot nodes with gp2 and gp3. With gp3, you can save up to 36% compared to gp2. Note that an UltraWarm setup also requires hot data nodes; we excluded them from the UltraWarm column to focus on the cost of replacing UltraWarm capacity with hot data nodes using gp2 and gp3.

Instance cost (On-Demand)
  • UltraWarm – 2 instances * $2.68/hr = $3,987/month. (ultrawarm1.large.search supports a maximum of 20 TiB, so two instances are required.)
  • All hot (gp2 with R6g.8xlarge) – 4 instances * $2.677/hr = $7,966/month. (r6g.8xlarge.search supports a maximum of 8 TiB with gp2, so four instances are required.)
  • All hot (gp3 with R6g.8xlarge) – 2 instances * $2.677/hr = $3,984/month. (r6g.8xlarge.search supports a maximum of 16 TiB with gp3, so only two instances are required.)
Storage cost (30 TiB = 32,985 GB)
  • UltraWarm – 32,985 GB * $0.024/GB-month = $792/month.
  • All hot (gp2) – 32,985 GB * $0.135/GB-month = $4,453/month.
  • All hot (gp3) – 32,985 GB * $0.122/GB-month = $4,024/month.
Total monthly bill
  • UltraWarm – $4,779/month
  • All hot (gp2) – $12,419/month
  • All hot (gp3) – $8,008/month

All the preceding use cases are from a cost perspective. Before making any changes to the production environment, we recommend validating performance in a test environment for your unique workload and ensuring that configuration changes don’t result in performance degradation.

Optimize instance cost with gp3’s denser storage

OpenSearch Service increased the maximum volume size supported per instance for gp3 by 100% compared to gp2 for the R5, R6g, M5, and M6g instance families, thanks to gp3's improved baseline performance. You can optimize your instance count by taking advantage of the increased storage per data node. For example, R6g.large supports up to 2 TiB with gp3, but only 1 TiB with gp2. If you require support for 12 TiB of data storage, you can reconfigure your domain from twelve R6g.large data nodes (1 TiB each on gp2) to six (2 TiB each on gp3) in order to reduce your instance costs. For OpenSearch Service EBS instance-specific volume limits, refer to EBS volume size quotas.

Upgrade from gp2 to gp3

To use the EBS gp3 volume type, you must first upgrade your domain's instances to a supported instance type if they don't already support gp3. For a list of OpenSearch Service supported instances, see EBS volume size quotas. The transition from gp2 to gp3 is seamless: you can upgrade domain configurations from existing EBS volume types such as gp2, Magnetic, and PIOPS (io1) to gp3 through the OpenSearch Service console or the UpdateDomainConfig API. The configuration change initiates a blue/green deployment, which runs in the background without interrupting your online traffic or causing data loss and, depending on the data size, completes in a few hours.
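As an illustration, the same change can be scripted with the AWS SDK. The following is a minimal sketch using the boto3 OpenSearch Service client's update_domain_config call (the SDK equivalent of the UpdateDomainConfig API); the domain name and volume size are hypothetical, and you would adjust the EBS options, including the optional Iops and Throughput values, to your own requirements.

```python
import boto3

client = boto3.client("opensearch", region_name="us-east-1")

# Hypothetical domain name used for illustration
response = client.update_domain_config(
    DomainName="my-logs-domain",
    EBSOptions={
        "EBSEnabled": True,
        "VolumeType": "gp3",
        "VolumeSize": 1024,   # GiB per data node (hypothetical)
        # Optional: provision performance beyond the gp3 baseline
        # "Iops": 4000,
        # "Throughput": 250,
    },
)

# The change triggers a blue/green deployment in the background;
# the returned domain config reflects the requested EBS options.
print(response["DomainConfig"]["EBSOptions"])
```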

gp3 baseline performance and additional provisioning limits

One of gp3's key features is the ability to scale IOPS and throughput independent of volume size. When your application requires more performance, you can scale up to 16,000 IOPS and 1,000 MiB/s throughput for an additional fee. OpenSearch Service EBS gp3 delivers a baseline performance of 3,000 IOPS and 125 MiB/s throughput at any volume size. In addition, OpenSearch Service provisions additional IOPS and throughput for larger volumes to ensure optimal performance: for volumes above 1,024 GiB, you receive 3 IOPS/GiB, and for volumes above 170 GiB, you get an incremental 250 MiB/s for every 3 TiB of storage.

The following table outlines OpenSearch Service baseline IOPS and throughput, as well as the maximum amount you can provision. Note that your instance type may have additional limitations regarding how much and for how long it can support these performance baselines in a 24-hour period. For more information about instances and their limits, refer to Amazon EBS-optimized instances.

Volume storage (GiB) – baseline IOPS / throughput (MiB/s) included in the storage price – additional IOPS / throughput (MiB/s) you can provision
  • 170 GiB – 3,000 IOPS / 125 MiB/s – up to 13,000 IOPS / 875 MiB/s
  • 172 GiB – 3,000 IOPS / 250 MiB/s – up to 13,000 IOPS / 750 MiB/s
  • 1,024 GiB – 3,000 IOPS / 250 MiB/s – up to 13,000 IOPS / 750 MiB/s
  • 1,025 GiB – 3,075 IOPS / 250 MiB/s – up to 12,925 IOPS / 750 MiB/s
  • 3,000 GiB – 9,000 IOPS / 250 MiB/s – up to 7,000 IOPS / 750 MiB/s
  • 3,001 GiB – 9,003 IOPS / 500 MiB/s – up to 6,997 IOPS / 500 MiB/s
  • 6,000 GiB – 18,000 IOPS / 500 MiB/s – no additional IOPS / up to 500 MiB/s
  • 6,001 GiB – 18,003 IOPS / 750 MiB/s – no additional IOPS / up to 250 MiB/s
  • 9,001 GiB – 27,003 IOPS / 1,000 MiB/s – no additional IOPS or throughput
  • 24,000 GiB – 72,000 IOPS / 2,000 MiB/s – no additional IOPS or throughput
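The baseline figures above follow from the rules described earlier: 3,000 IOPS up to 1,024 GiB and then 3 IOPS/GiB, and 125 MiB/s up to 170 GiB and then 250 MiB/s for every 3 TiB of storage started. As a rough sketch, assuming those rules and treating the 3 TiB step as 3,000 GiB (inferred from the table), the baselines can be computed like this:

```python
from math import ceil

def gp3_baseline(volume_gib: int) -> tuple[int, int]:
    """Approximate OpenSearch Service gp3 baseline (IOPS, throughput in MiB/s)
    included in the storage price, per the rules described in this post."""
    iops = 3_000 if volume_gib <= 1_024 else 3 * volume_gib
    if volume_gib <= 170:
        throughput = 125
    else:
        throughput = 250 * ceil(volume_gib / 3_000)  # +250 MiB/s per 3 TiB started
    return iops, throughput

for size in (170, 1_025, 3_001, 6_001, 9_001, 24_000):
    iops, tput = gp3_baseline(size)
    print(f"{size:>6} GiB -> {iops:>6} IOPS, {tput:>5} MiB/s")
```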

Do you need additional performance?

In the majority of use cases, you don't need to provision additional IOPS and throughput; the gp3 baseline performance should suffice. You can use Amazon CloudWatch metrics to understand your usage patterns, and if you observe the current IOPS and throughput limits bottlenecking your indexing and query performance, you should provision additional performance. For more information, refer to EBS volume metrics.
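For example, a quick way to look at recent disk throughput is to pull the domain's EBS volume metrics from CloudWatch. The following is a minimal sketch that reads the WriteThroughput metric from the AWS/ES namespace, where OpenSearch Service publishes domain metrics; the domain name and account ID are hypothetical placeholders.

```python
from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

now = datetime.now(timezone.utc)
response = cloudwatch.get_metric_statistics(
    Namespace="AWS/ES",  # OpenSearch Service publishes domain metrics here
    MetricName="WriteThroughput",
    Dimensions=[
        {"Name": "DomainName", "Value": "my-logs-domain"},  # hypothetical domain
        {"Name": "ClientId", "Value": "111122223333"},      # hypothetical account ID
    ],
    StartTime=now - timedelta(days=3),
    EndTime=now,
    Period=3600,
    Statistics=["Maximum"],
)

# If hourly maximums sit near the provisioned throughput, consider provisioning more
for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], round(point["Maximum"], 1))
```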

Conclusion

This post explains how OpenSearch Service general purpose SSD gp3 volumes can significantly reduce monthly storage and instance costs, making them more cost-effective than gp2 volumes. Migrating to gp3 volumes with the same size and performance configuration as gp2 is the quickest and simplest way to reduce costs. You should also consider reducing instance costs by taking advantage of gp3's support for denser storage per data node.

For more details, check out Amazon OpenSearch Service pricing and Configuration API reference for Amazon OpenSearch Service.


About the author

Siddhant Gupta is a Sr. Technical Product Manager at Amazon Web Services based in Hyderabad, India. Siddhant has been with Amazon for over five years and is currently working with the OpenSearch Service team, helping with new Region launches, pricing strategy, and bringing EC2 and EBS innovations to OpenSearch Service customers. He is passionate about analytics and machine learning. In his free time, he loves traveling, fitness activities, spending time with his family, and reading non-fiction books.