
Securing the future: building a culture of security

Post Syndicated from Carter Spriggs original https://aws.amazon.com/blogs/security/securing-the-future-building-a-culture-of-security/

According to a 2024 Verizon report, nearly 70% of data breaches occurred because a person was manipulated by social engineering or made some type of error. This highlights the importance of human-layer defenses in an organization’s security strategy. In addition to technology, tools, and processes, security requires awareness and action from everyone in an organization to recognize anomalies, escalate potential issues, and ultimately, mitigate risk.

Organizations that invest in a culture of security see better employee adoption of security controls, improved cybersecurity behavior, and a more effective use of cybersecurity resources, according to a 2024 Gartner analysis. This aligns with our own experience at AWS, where we deeply invest in our culture of security. Our leadership prioritizes security and builds it into our organizational structure. Everyone, regardless of role, views security as a shared responsibility. Security advocates and advisors are embedded in our teams to share their expertise, and innovation empowers our people to move fast while staying secure.

Building and maintaining a culture of security requires constant investment and focus. In our recent culture of security series with The Guardian, we share perspectives from AWS leaders on some of the most common questions that people ask us about how to create a culture of security.

The journey to creating a culture of security begins with the first step. Although this journey looks different for every organization, sharing what we’ve learned may spur ideas for how you can help create a security-first mindset in your own team or organization.

We invite you to explore the series and learn more about how AWS sustains a strong culture of security.

 
 

Carter Spriggs

Carter is a Product Marketing Manager at AWS.

AWS post-quantum cryptography migration plan

Post Syndicated from Matthew Campagna original https://aws.amazon.com/blogs/security/aws-post-quantum-cryptography-migration-plan/

Amazon Web Services (AWS) is migrating to post-quantum cryptography (PQC). Like other security and compliance features in AWS, we will deliver PQC as part of our shared responsibility model. This means that some PQC features will be transparently enabled for all customers while others will be options that customers can choose to implement to help meet their requirements. This transition will happen in phases, starting with systems that communicate over untrusted networks such as the internet.

The threat of a large-scale quantum computer, sometimes referred to as a cryptographically relevant quantum computer, is its potential to break the public-key cryptographic algorithms in use today. These algorithms are used in most communication protocols and digital signature schemes. For the past eight years, AWS—along with other industry leaders, government agencies, and academia—has been advocating, researching, and proposing new public-key cryptographic algorithms that are resistant to quantum computing. Because customers rely on cryptography performed by AWS to secure their data, we engaged in this work early on to minimize the effort and the impact of the eventual migration to PQC. While there is no evidence that a quantum computer powerful enough to break the public key cryptography in use throughout AWS exists today, we are not waiting. We would rather put protections in place now to protect the security of our customers’ data into the future.

This post summarizes where AWS is today in the journey of migrating to PQC and outlines our path forward.

For the past five years we’ve deployed early versions of PQC algorithms under evaluation by the U.S. National Institute of Standards and Technology (NIST) in both our open-source libraries and security-critical services to allow customers to test the performance impact of moving to PQC. For example, our open-source library for algorithm implementations (AWS-LC), our implementation of TLS (s2n) and core security services like AWS Key Management Service (AWS KMS), AWS Secrets Manager and AWS Certificate Manager (ACM) have had implementations of NIST PQC proposed algorithms for key encapsulation as far back as 2019.

On August 13, 2024, NIST announced three new post-quantum cryptographic (PQC) algorithms as Federal Information Processing Standards (FIPS). This was the result of NIST’s PQC Standardization Process, which started in 2016. AWS employees are contributors to many of the proposed schemes, including the three new standards.

Many of our customers have been tracking the standardization process, including the U.S. Government’s Commercial National Security Algorithms (CNSA) Suite 2.0 requirements around PQC adoption and the European Commission’s Recommendation on a Coordinated Implementation Roadmap for the transition to Post-Quantum Cryptography.

Now that the first round of PQC algorithms has been standardized, we can start to implement them for long-term support. Here’s our approach to implementing PQC to provide a seamless transition for our customers who rely on our services and open-source tools to handle cryptographic operations on their behalf.

AWS will take a multi-layered approach to migrating to PQC over the coming years. We define the simultaneous workstreams as:

  • Workstream 1: Inventory of existing systems, identification and development of new standards, testing, and migration planning. While the first set of algorithm standards has been published, there are additional standards to come that will define how PQC should be integrated in specific applications and protocols to ensure interoperability.
  • Workstream 2: Integration of PQC algorithms on public AWS endpoints to provide long-lived confidentiality of customer data transmitted to AWS.
  • Workstream 3: Integration of PQC signing algorithms into AWS cryptographic services to enable customers to deploy new post-quantum long-lived roots of trust to be used for functions such as software, firmware, and document signing.
  • Workstream 4: Integration of PQC signing algorithms into AWS services to enable the use of post-quantum signatures for session-based authentication such as server and client certificate validation.

Workstream 1

We view the work here as an ongoing aspect of our migration plan. It has already informed our overall strategy and prioritized our migration based on our customers’ needs.

Similar to our customers, we had to look across all of the places where we use cryptography to determine which implementations needed migration and at which priority. One of the important decisions we made was to focus more on encryption in transit and less on encryption at rest. Public key (asymmetric) cryptography is the foundation of encryption in transit because it enables two parties to negotiate a shared secret across an untrusted network. It’s today’s traditional public key algorithms that are at risk of being compromised by a cryptographically relevant quantum computer. Based on the current consensus across the industry, the risk that a cryptographically relevant quantum computer poses to 256-bit symmetric key cryptography isn’t something that needs to be mitigated. Because data at rest in AWS is encrypted using 256-bit symmetric cryptography, we believe that we don’t need to re-encrypt existing customer data or change the symmetric algorithms and keys that we use to encrypt future data.

While the security of symmetric encryption keys and algorithms isn’t impacted by a cryptographically relevant quantum computer, there are cases where public key algorithms are used to negotiate a shared symmetric key, thereby creating risk that the symmetric key could be compromised. The first use of public key cryptography in AWS that we will migrate to PQC is exactly this case, where we negotiate a shared symmetric key between our customers and the public endpoints of AWS services. The networks that customers use to communicate with AWS services are often outside the control of either AWS or the customer, and are therefore susceptible to a bad actor capturing data now and decrypting it in the future using a cryptographically relevant quantum computer. Workstream 2 discusses this plan in more detail.

The next use of public key cryptography in AWS that we will migrate to PQC is where we offer the ability to create a key pair that acts as a long-term root of trust, typically used to apply a digital signature to software, firmware, or documents. These types of key pairs might need to be valid for digital signing years into the future because they can’t easily be updated. Think of the firmware on satellites, gaming consoles, and other IoT devices where replacing the public key pairs and the signing algorithm code might not be possible over the life of the device. Workstream 3 describes this plan in more detail.

The final area of public key cryptography in AWS that we will migrate to PQC is where we offer the ability to create a key pair that acts as a shorter-term root of trust, typically used to apply a digital signature to a single transaction, a web session, or some other ephemeral message. The most common example of this use case is the way that digital certificates are used to authenticate the server or client in a TLS session. You might assume that workstreams 2 and 3 handle the risks to session key negotiation and digital signatures in a TLS session, so what’s left to protect? It turns out that the way that public key cryptography is used to mutually authenticate two parties using digital certificates to exchange a message is heavily dependent on standards and interoperability across a large set of internet infrastructure. Getting the industry to agree on those standards and testing interoperability will take time before this workstream is finished. Workstream 4 describes this plan in more detail.
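The prioritization described above can be sketched as a small triage over a cryptographic inventory. This is only an illustration of the ordering in this post, not an AWS tool; the category names, system names, and the `migration_order` helper are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Use(Enum):
    KEY_AGREEMENT_IN_TRANSIT = "key agreement over untrusted networks"
    LONG_LIVED_SIGNING = "long-lived root of trust (software, firmware, documents)"
    SESSION_AUTHENTICATION = "short-lived signatures (for example, TLS certificates)"
    SYMMETRIC_AT_REST = "256-bit symmetric encryption at rest"


# Ordering taken from the workstreams in this post.
# None means no migration is planned for that use.
PRIORITY = {
    Use.KEY_AGREEMENT_IN_TRANSIT: 1,  # workstream 2
    Use.LONG_LIVED_SIGNING: 2,        # workstream 3
    Use.SESSION_AUTHENTICATION: 3,    # workstream 4
    Use.SYMMETRIC_AT_REST: None,
}


@dataclass(frozen=True)
class InventoryItem:
    system: str  # hypothetical system name
    use: Use


def migration_order(items):
    """Return the items that need migration, highest priority first."""
    to_migrate = [item for item in items if PRIORITY[item.use] is not None]
    return sorted(to_migrate, key=lambda item: PRIORITY[item.use])


inventory = [
    InventoryItem("object-store", Use.SYMMETRIC_AT_REST),
    InventoryItem("device-firmware-signer", Use.LONG_LIVED_SIGNING),
    InventoryItem("public-api", Use.KEY_AGREEMENT_IN_TRANSIT),
    InventoryItem("internal-mtls", Use.SESSION_AUTHENTICATION),
]
ordered = [item.system for item in migration_order(inventory)]
print(ordered)
```

In a real inventory, each entry would also carry the algorithm, key size, and owner; the point here is only that symmetric encryption at rest drops out of the list while in-transit key agreement comes first.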

We’ve talked about how AWS has done its cryptographic inventory and our plan to migrate to PQC. If you don’t delegate all your cryptographic operations to AWS, what should you be doing to prepare? While no single approach will be right for all applications and industries, here are some resources with more context on recommendations that we contributed to or used as part of our work:

While doing an inventory of cryptography across your organization to prioritize PQC migration might be a multi-year effort for you, we have a couple of tactical recommendations to consider in the short term. First, make sure that you can quickly distribute updated versions of software. This is a critical capability for any organization in the context of vulnerability management and software lifecycle maintenance. You will need it to adopt new PQC-enabled versions of the AWS Command Line Interface (AWS CLI) and AWS SDKs when we publish them. You might also need to update third-party software components that use TLS or other cryptographic implementations to communicate with AWS services, to make sure that you can take advantage of the PQC we offer.

Second, we strongly encourage you to begin a comprehensive program to adopt TLS 1.3 across your entire organization. TLS 1.3 and later versions not only offer security and performance improvements with classical public key cryptography, but are also required in order to use PQC at all. Even if you recently updated your clients and servers to TLS 1.2, you still have work to do to prepare your systems for a PQC future.
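As a small illustration of that recommendation, a client written in Python can refuse anything older than TLS 1.3 by using the standard library `ssl` module. This is a minimal sketch, not AWS guidance for a specific SDK, and it does not enable PQC by itself; it only enforces the protocol version.

```python
import ssl

# Client-side context with certificate and hostname verification enabled.
context = ssl.create_default_context()

# Refuse to negotiate any protocol version older than TLS 1.3.
context.minimum_version = ssl.TLSVersion.TLSv1_3

print(context.minimum_version.name)
```

A socket wrapped with `context.wrap_socket(sock, server_hostname=...)` then fails the handshake against servers that only offer TLS 1.2 or earlier, which is a quick way to find endpoints that still need upgrading.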

Workstream 2

Customers communicate with cloud services using protocols based on public key cryptography. These protocols (such as TLS) help ensure that customers’ communications are confidential and cannot be altered in transit. To protect our customers’ long-term need for confidentiality, we pioneered a mechanism known as hybrid post-quantum key agreement. Hybrid post-quantum key agreement combines Elliptic-Curve Diffie-Hellman (ECDH), a classic key exchange algorithm, with a post-quantum key encapsulation method, such as the newly standardized ML-KEM algorithm. The resulting two keys are combined to establish session communication keys that encrypt the network traffic. An adversary would need to break both of these public-key primitives (ECDH and ML-KEM) to break the confidentiality property provided by the hybrid key agreement.
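The combining step can be sketched as follows. This post doesn't say how the two secrets are combined, so the sketch assumes one common construction: concatenate them and pass the result through a key derivation function (here HKDF with SHA-256, written against the Python standard library). The secrets are random placeholders; no ECDH exchange or ML-KEM encapsulation is actually performed, and the `combine` helper and `info` label are hypothetical.

```python
import hashlib
import hmac
import secrets


def hkdf_sha256(ikm: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF (RFC 5869) with SHA-256 and an all-zero salt."""
    prk = hmac.new(b"\x00" * 32, ikm, hashlib.sha256).digest()  # extract step
    okm, block, counter = b"", b"", 1
    while len(okm) < length:  # expand step
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def combine(ecdh_secret: bytes, mlkem_secret: bytes) -> bytes:
    # Both secrets feed the KDF, so an adversary has to recover both of
    # them to reproduce the derived session key.
    return hkdf_sha256(ecdh_secret + mlkem_secret, info=b"hybrid-sketch")


# Placeholders standing in for the outputs of ECDH and ML-KEM.
ecdh_secret = secrets.token_bytes(32)
mlkem_secret = secrets.token_bytes(32)

session_key = combine(ecdh_secret, mlkem_secret)
print(len(session_key))
```

In TLS, the protocol's own key schedule plays the role of the KDF, so application code normally never performs this step directly.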

AWS has taken the first step in deploying PQC by implementing ML-KEM within AWS-LC, our open-source FIPS-140-3-validated cryptographic library. AWS-LC is the core cryptographic library used throughout AWS. Relevant to this workstream, it’s used in s2n-tls, our open-source TLS implementation used across AWS services with HTTPS-based endpoints.

The Internet Engineering Task Force (IETF) is currently finalizing the TLS protocol standard incorporating post-quantum cryptography. Upon completion of this standard, AWS will update s2n-tls to align with these new specifications. After we have the ML-KEM implementation from AWS-LC integrated with a version of s2n based on IETF standards, we will begin deployment of this s2n version across all AWS public endpoints that offer HTTPS-based interfaces. This represents most AWS services, typically accessed through the AWS SDK or AWS CLI. AWS services that offer public endpoints with other interfaces such as SFTP, IPsec, or SSH will get ML-KEM support as standards bodies such as the IETF publish implementation guidance for those protocols.

As a part of migrating AWS managed service endpoints to PQC over TLS, we’ll also be enabling services that provide server-side TLS termination for your workloads, including Elastic Load Balancing (ELB), Amazon API Gateway, and Amazon CloudFront. This will allow you to use the same digital certificates that you’ve been using with these services and let them negotiate the server-side TLS session using ML-KEM on your behalf. This will provide the long-term confidentiality of your TLS sessions without you having to upgrade the underlying certificates themselves to some as-yet-undefined PQC standard.

To further strengthen this transition to ML-KEM, AWS is collaborating with key industry initiatives, including the National Cybersecurity Center of Excellence (NCCoE) Migration to Post-Quantum Cryptography, the Linux Foundation’s Post Quantum Cryptography Alliance, and the Rust TLS Project. These partnerships are crucial in helping to ensure seamless interoperability between different implementations of PQC solutions across the technology landscape.

Workstream 3

Many of our customers manufacture systems with firmware, operating systems, and pre-installed third-party applications. These components are cryptographically signed using a public-key-based root of trust to maintain the security and authenticity of systems as they deliver services to end users. Some of these systems, such as smart TVs connected to set-top boxes, might operate without internet connectivity for a decade or longer until they’re installed.

Additionally, certain customers must embed long-lived roots of trust directly into their hardware during manufacturing—a process that cannot be reversed or updated. For devices designed to operate for 10+ years, the security of these initial roots of trust must remain robust even when cryptographically relevant quantum computers become available.

To address this need for long-lived roots of trust for code and document signing, AWS will adopt ML-DSA, a new digital signature algorithm that is believed to be secure against adversaries in possession of a cryptographically relevant quantum computer. We will first offer ML-DSA as a feature within AWS Key Management Service (AWS KMS), enabling customers to generate and use PQC keys as roots of trust for signing operations within the FIPS-140-3 Level 3 validated hardware security modules (HSMs) used in AWS KMS. This integration represents a crucial milestone in our PQC roadmap, providing customers with the capability to establish secure, quantum-resistant roots of trust and authentication for their long-term security needs.

This long-term perspective underscores the importance of implementing PQC early, helping to ensure that systems will remain secure throughout their entire operational lifetime, even if they are disconnected for a prolonged period. While Amazon will use this capability from AWS KMS to protect our own roots of trust, we encourage you to consider ways in which this capability might help you do the same.

Workstream 4

In workstream 2 we discussed how PQC can be deployed to protect against risks to the confidentiality of data shared across a communication channel. To complete the story, there still needs to be a way to protect the authenticity of server and client identities over a communication channel in a post-quantum world.

Digital signatures are used today for end-entity authentication in networking protocols such as TLS and SSH. Customers use certificates from a trusted certificate authority (CA) that binds a public key to an identity using a digital signature to authenticate service and client endpoints. While some of the PQC standards available today (for example, ML-DSA) could be implemented with certificates to address the post-quantum risk, the work cannot begin without further standardization and interoperability testing between certificate authorities and the systems that use digital certificates. The primary reason for this delay has to do with how publicly trusted certificates are validated today by recipients of a signed message. In the TLS protocol, for example, the client connecting to a server that presents a chain of digital certificates would need to validate all PQC signatures embedded in each certificate to determine if the server is authentic. The format of those signatures and the processes by which the certificates are issued and managed are governed by the Certificate Authority (CA) Browser Forum. The internet browser manufacturers and certificate issuer members of the CA Browser Forum need to determine how PQC will work for certificate issuance and validation before anyone can safely use them for publicly trusted certificates in TLS sessions. Amazon Trust Services is a certificate issuer and member of the CA Browser Forum; we are engaged to help drive these standards to expedite interoperability testing.

While the PQC story is being finalized for publicly trusted certificates, the AWS Private CA service isn’t necessarily blocked for the same reasons from issuing privately trusted certificates using PQC algorithms like ML-DSA. We would be able to do this because privately trusted certificates aren’t strictly beholden to the rules published by the CA Browser Forum. Customers using privately trusted certificates have the freedom to implement both the client and server portions of a PQC authentication scheme when they control the software on both ends. When workstream 3 is finished and ML-DSA is available for use from AWS KMS for signing operations, AWS Private CA will consider offering PQC as a part of certificate issuance for those customers who are ready to adopt it for their private networking channels. Our open-source AWS-LC and s2n solutions will be available for our customers to implement the PQC certificate validation functions on their clients and servers if need be.

Conclusion

In this post, we covered how AWS will migrate to PQC as part of our shared responsibility model. We also provided guidance to you on how to start your PQC migration strategy, and what part of that strategy you can expect AWS to provide. The road ahead will present new challenges and new opportunities as the industry performs the migration to new cryptographic algorithms. For additional information, blog posts, and periodic updates on our PQC migration, keep watching the AWS Post-Quantum Cryptography page.

If you want to learn more about post-quantum cryptography with AWS, contact the post-quantum cryptography team.


Matthew Campagna

Matthew is a Cryptographer and Sr. Principal Engineer at Amazon Web Services. He manages the design and review of cryptographic solutions across the company and leads the migration to post-quantum cryptography. In his spare time, he commutes to work and sleeps.
Melanie Goldsborough

Melanie is a Worldwide Senior Security Specialist at AWS and has over 20 years of intelligence and technology experience. She develops global go-to-market strategies for security services, focusing on public sector organizations. Melanie’s expertise spans content development, executive engagement, and program execution to enhance security practices for customers and partners globally.
Peter M. O’Donnell

Peter is an AWS Principal Solutions Architect (SA), specializing in security, risk, and compliance with the Strategic Accounts team. He has been an AWS SA for 9.5 years and he supports some of the company’s largest and most complex strategic customers in security and security-related topics, including data protection, cryptography, identity, threat modeling, compliance, security culture, CISO engagement, and more.

How ANZ Institutional Division built a federated data platform to enable their domain teams to build data products to support business outcomes

Post Syndicated from Leo Ramsamy original https://aws.amazon.com/blogs/big-data/how-anz-institutional-division-built-a-federated-data-platform-to-enable-their-domain-teams-to-build-data-products-to-support-business-outcomes/

In today’s rapidly evolving financial landscape, data is the bedrock of innovation, enhancing customer and employee experiences and securing a competitive edge. Recognizing this paradigm shift, ANZ Institutional Division has embarked on a transformative journey to redefine its approach to data management, utilization, and extracting significant business value from data insights.

Like many large financial institutions, ANZ Institutional Division operated with siloed data practices and centralized data management teams. As time went on, the limitations of this approach became apparent due to rising data complexity, larger volumes, and the growing demand for swift, business-driven insights. Consequently, the bank encountered several challenges and needed to take the following actions:

  • Create business insights from untapped data potential, estimated to be approximately $150 million in the Institutional Division alone
  • Improve operational efficiency by removing manual data handling, the use of spreadsheets, and duplicate data entries
  • Increase agility by making data expertise more readily available, thereby improving time to market and overall customer experience
  • Address data quality
  • Standardize tooling and remove the Shadow IT culture, driving scalability, reducing risk, and minimizing overall operational inefficiencies

These challenges are not unique to ANZ Institutional Division. Globally, financial institutions have been experiencing similar issues, prompting a widespread reassessment of traditional data management approaches.

One major trend, embraced by many financial institutions, has been the adoption of the data mesh architecture and the shift towards treating data as a product. This paradigm, pioneered by thought leaders like Zhamak Dehghani, introduces a decentralized approach to data management that aligns closely with modern organizational structures and agile methodologies.

Some notable global examples of leading companies embracing and implementing this trend are JPMorgan Chase, Capital One, and Saxo Bank.

Inspired by these global trends and driven by its own unique challenges, ANZ’s Institutional Division decided to pivot from viewing data as a byproduct of projects to treating it as a valuable product in its own right.

This shift promises several business benefits:

  • Empowered domain expertise – By decentralizing data ownership to domain-based teams, ANZ can use the deep business knowledge within each unit to create more relevant and valuable data products
  • Increased agility – Domain teams can now respond more quickly to business needs, creating and iterating on data products without relying on a centralized bottleneck
  • Improved data quality – With domain experts overseeing their own data, there’s a greater likelihood of catching and correcting quality issues at the source
  • Scalability – The federated approach allows for greater scalability, enabling ANZ to handle increasing data volumes and complexity more effectively
  • Innovation catalyst – By democratizing data access and empowering teams to create data products, ANZ is fostering a culture of innovation and data-driven decision-making across the organization

This transition is not just about technology; it represents a fundamental shift in how ANZ views and values its data assets. By treating data as a product, the bank is positioned to not only overcome current challenges, but to unlock new opportunities for growth, customer service, and competitive advantage.

This post explores how the shift to a data product mindset is being implemented, the challenges faced, and the early wins that are shaping the future of data management in the Institutional Division.

ANZ’s federated data strategy

In response to the challenges, ANZ Group formulated a data strategy that focuses on empowering employees to securely use data to improve the sustainability and financial well-being of their customers. At its core are the following pillars:

  • Introducing new ways of working that focus on generating customer value first
  • New technology platforms and tooling that allow the bank to collect, share, archive, and dispose of data in a secure and controlled way
  • Achieving consistency in how data is produced and consumed across the entire bank through data products and better-connected systems
  • Supporting the bank’s risk and regulatory obligations by providing a secure and resilient data platform that provides fine-grained, controlled access to quality data products

ANZ has made the strategic decision to adopt an architectural and operational model aligned with the data mesh paradigm, which revolves around four key principles: domain ownership, data as a product, a self-serve data platform, and federated computational governance.

Domain ownership recognizes that the teams generating the data have the deepest understanding of it and are therefore best suited to manage, govern, and share it effectively. This principle makes sure data accountability remains close to the source, fostering higher data quality and relevance.

Treating data as a product instils a product-centric mindset, emphasizing that data must be secure, discoverable, understandable, interoperable, reusable, and managed throughout its lifecycle. This principle makes sure data consumers, both internal and external, derive consistent value from well-designed data products.

A self-serve data platform empowers domains to create, discover, and consume data products independently. It abstracts technical complexities and provides user-friendly tools, enabling a scalable, repeatable, and automated approach to producing high-quality data products.

Under the federated mesh architecture, each divisional mesh functions as a node within the broader enterprise data mesh, maintaining a degree of autonomy in managing its data products. To coordinate these autonomous nodes and facilitate seamless integration, enterprise-wide standards, such as those for data governance, interoperability, and security, are essential to maintain alignment and consistency across all nodes and the domains and teams within them.

With this approach, each node in ANZ maintains its divisional alignment and adherence to data risk and governance standards and policies to manage local data products and data assets. This enables global discoverability and collaboration without centralizing ownership or operations.

As a result, governance resides with the data products themselves, making sure standards and policies, such as access control, data quality, and compliance, are enforced where the data lives. In this regard, the enterprise data product catalog acts as a federated portal, facilitating cross-domain access and interoperability while maintaining alignment with governance principles. This model balances node or domain-level autonomy with enterprise-level oversight, creating a scalable and consistent framework across ANZ.

Within the ANZ enterprise data mesh strategy, aligning data mesh nodes with the ANZ Group’s divisional structure provides optimal alignment between data mesh principles and organizational structure, as shown in the following diagram.

Central to the success of this strategy is its support for each division’s autonomy and freedom to choose their own domain structure, which is closely aligned to their business needs. Divisions decide how many domains to have within their node; some may have one, others many. These nodes can implement analytical platforms like data lake houses, data warehouses, or data marts, all united by producing data products. Nodes and domains serve business needs and are not technology mandated.

Under the federated computational governance model, the ANZ Group strategy defines guardrails that treat a node as a logical data container suitable for the following:

  • Ingestion and metadata management
  • Creating source-aligned data products complying with ANZ’s Data Product Specification (DPS)
  • Integrating source-aligned data products from other nodes
  • Producing consumer-aligned data products for specific business purposes
  • Publishing conforming data products to ANZ’s Data Product Catalog (DPC)

Following on from this strategy, each division organizes its domain structure to provide autonomy to its functional teams while preserving the core values of data mesh. The following diagram depicts an example of a possible structure.

For instance, Domain A will have the flexibility to create data products that can be published to the divisional catalog, while also maintaining the autonomy to develop data products that are exclusively accessible to teams within the domain. These products will not be available to others until they are deemed ready for broader enterprise use.

This strategy supports each division’s autonomy to implement their own data catalogs and decide which data products to publish to the group-level catalog. This flexibility extends to divisional domains, which can choose which data products to publish to the divisional catalog or keep visible only to domain consumers.
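The publishing choices described above amount to a visibility scope on each data product. The following is a minimal sketch of that idea, not ANZ's implementation; the `Visibility` levels, the `DataProduct` fields, and the example names are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Visibility(Enum):
    DOMAIN = "domain"      # visible only to consumers inside the owning domain
    DIVISION = "division"  # published to the divisional catalog
    GROUP = "group"        # published to the group-level catalog


@dataclass(frozen=True)
class DataProduct:
    name: str
    division: str
    domain: str
    visibility: Visibility


def can_discover(product: DataProduct, consumer_division: str, consumer_domain: str) -> bool:
    """Decide whether a consumer can see the product in a catalog."""
    if product.visibility == Visibility.GROUP:
        return True
    if product.visibility == Visibility.DIVISION:
        return consumer_division == product.division
    # DOMAIN scope: the consumer must sit in the same division and domain.
    return (consumer_division, consumer_domain) == (product.division, product.domain)


# Hypothetical product that Domain A has not yet released beyond its own teams.
draft = DataProduct("trade-exposure", "institutional", "domain-a", Visibility.DOMAIN)
# The same product once the domain publishes it to the divisional catalog.
released = DataProduct("trade-exposure", "institutional", "domain-a", Visibility.DIVISION)

print(can_discover(draft, "institutional", "domain-b"))
print(can_discover(released, "institutional", "domain-b"))
```

Promoting a product is then a change of scope decided by its owners rather than a copy into a central store, which matches the ownership model described in this section.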

Institutional Data & AI Platform architecture

The Institutional Division has implemented a self-service data platform to enable the domain teams to build and manage data products autonomously. The Institutional Data & AI platform adopts a federated approach to data while centralizing the metadata to facilitate simpler discovery and sharing of data products. The following diagram illustrates the building blocks of the Institutional Data & AI Platform.

The building blocks are as follows:

  1. Foundational Data & AI Platform capabilities – A dedicated data platform team provides domain-agnostic tools, systems, and capabilities to enable autonomous data product development across domains. This self-serve infrastructure allows domain teams to manage the full data lifecycle without relying on a centralized data team. Key capabilities include data storage, data onboarding and transformation, and data utilities that facilitate data sharing with interoperability between domains. These capabilities abstract the technical complexities associated with data management infrastructure, allowing domain experts to focus on creating valuable data products rather than infrastructure management.
  2. Domain-owned data assets – The domain-oriented data ownership approach distributes responsibility for data across the business units within the Institutional Division. Domain teams are responsible for developing, deploying, and managing their own analytical data products alongside operational data services. Data contracts authored by data product owners automate data product creation and provide a standard to access data products. By treating the data as a product, the outcome is a reusable asset that outlives a project and meets the needs of the enterprise consumer. Consumer feedback and demand drive creation and maintenance of the data product.
  3. Division-level metadata management and data governance – A centrally hosted service provides domain teams with the capability to publish their data products along with relevant metadata, like business definitions and lineage. Some of the key features implemented are:
    1. Metadata management that centralizes metadata and presents it within the context of data products, such as data quality scores and data product lineage.
    2. A data portal for consumers to discover data products and access associated metadata.
    3. Subscription workflows that simplify access management to the data products.
    4. Computational governance that enforces divisional and enterprise data policies and standards, such as data classification and business data models for aligning terminology.
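To make the idea of a data contract more concrete, the following is a minimal sketch of a contract that carries ownership, classification, a schema, and named quality checks, together with a function that validates a batch of rows against it. The post does not describe the contract format ANZ uses, so every class, field, and label here is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

Row = dict[str, Any]


@dataclass(frozen=True)
class Column:
    name: str
    dtype: type
    required: bool = True


@dataclass(frozen=True)
class DataContract:
    product: str
    owner: str
    classification: str            # e.g. "confidential"; labels are assumed
    columns: tuple[Column, ...]
    # Named row-level quality checks; each returns True when the row passes.
    checks: dict[str, Callable[[Row], bool]] = field(default_factory=dict)


def validate(contract: DataContract, rows: list[Row]) -> list[str]:
    """Return readable violations; an empty list means the batch passes."""
    problems = []
    for i, row in enumerate(rows):
        for col in contract.columns:
            value = row.get(col.name)
            if value is None:
                if col.required:
                    problems.append(f"row {i}: missing {col.name}")
            elif not isinstance(value, col.dtype):
                problems.append(f"row {i}: {col.name} is not {col.dtype.__name__}")
        for name, check in contract.checks.items():
            try:
                ok = check(row)
            except (KeyError, TypeError):
                ok = False  # a check that cannot run counts as a failure
            if not ok:
                problems.append(f"row {i}: failed check {name}")
    return problems
```

Keeping the schema and the checks in one declarative object is the design choice that lets a platform generate pipelines and enforce policy from the same source, which is the role the contract plays in the description above.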

The following diagram is a high-level example of the technical architecture approach for the Institutional Data & AI Platform. The solution uses a building block approach on a cloud-centered platform composed of AWS services, partner solutions, and open standards like OpenLineage and Apache Iceberg.

Let’s look at the key services that enable the federated platform to operate at scale:

  • Data storage and processing:
    • Apache Iceberg on Amazon Simple Storage Service (Amazon S3) offers an optimized way to store data assets and products and promotes interoperability across other services
    • Amazon Redshift allows domain teams to create and manage fit-for-purpose data marts
    • AWS Lambda and AWS Glue are used for data onboarding and processing, and data utilities created in Python and PySpark promote reusability and quality across the data processing pipelines
    • dbt simplifies data transformation rules and allows sub-domain data analysts to build modeling logic as SQL statements
    • Amazon Managed Workflows for Apache Airflow (Amazon MWAA) enables efficient management of workflows and data pipeline orchestration using out-of-the-box integrations with AWS services
  • Metadata management and data governance:
    • To maintain data reliability and accuracy, a data quality framework built on Soda Core automates data quality checks that are defined in a data contract
    • Amazon DataZone enables data product cataloging, discovery, metadata management, and implementing computational governance
    • OpenLineage simplifies harvesting and collection of data and process-level lineage, which are then published to Amazon DataZone
    • AWS Lake Formation, combined with AWS Glue Data Catalog, provides data governance and access management to data products that reside within sub-domains
  • Analytics:
    • Tableau provides sub-domains with data visualization and business intelligence capabilities
  • Observability and security:
    • Observability needs of the platform are built into all the processes using monitoring, with logging functionality provided by Amazon CloudWatch and AWS CloudTrail
    • AWS Secrets Manager makes sure secrets are stored and made available for data pipelines to access services in a secure manner
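As an illustration of the lineage metadata mentioned in the list above, the following sketch builds a dictionary shaped like an OpenLineage run event using only the standard library. The field names follow the OpenLineage run event structure as commonly documented; the namespace, job and dataset names, and the producer URI are hypothetical. A real event also carries a `schemaURL` field, omitted here, and the step that delivers events to Amazon DataZone is outside this sketch.

```python
import json
import uuid
from datetime import datetime, timezone


def lineage_event(job_name, inputs, outputs, *, namespace="institutional",
                  event_type="COMPLETE", run_id=None):
    """Build a dict shaped like an OpenLineage run event."""
    def dataset(name):
        return {"namespace": namespace, "name": name}

    return {
        "eventType": event_type,
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id or str(uuid.uuid4())},
        "job": {"namespace": namespace, "name": job_name},
        "inputs": [dataset(n) for n in inputs],
        "outputs": [dataset(n) for n in outputs],
        # Hypothetical producer URI identifying the emitting pipeline.
        "producer": "https://example.com/hypothetical-pipeline",
    }


if __name__ == "__main__":
    event = lineage_event("load_fx_trades",
                          inputs=["raw.fx_trades"],
                          outputs=["curated.fx_trades"])
    print(json.dumps(event, indent=2))
```

Because the event names both the input and output datasets of a job run, a catalog that collects these events can reconstruct data product lineage without inspecting the pipeline code.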

The technical implementation actualizes the data product strategy at ANZ Institutional Division. Amazon DataZone plays an essential role in facilitating data product management for the domain teams. The service addresses several critical aspects of the Institutional Division’s data product strategy, including:

  • Data cataloging and metadata management – Amazon DataZone provides comprehensive data cataloging and metadata management capabilities
  • Data governance and compliance – Effective data governance is essential for scaling data products
  • Self-service capabilities – Amazon DataZone empowers domain teams with self-service capabilities, enabling them to create, manage, and deploy data products independently
  • Integration and interoperability – One of the challenges in scaling data products is providing seamless integration across various data sources and systems
  • Collaboration and sharing – Amazon DataZone provides a platform for sharing data and metadata across teams and domains

Institutional Division’s delivery model to achieve scale

The Institutional Division has successfully used the federated architecture, and key to this delivery model is the implementation of Foundational Data & AI Platform capabilities that serve all domains within the division. This model promotes self-service and accelerates the delivery of subsequent initiatives by using the capabilities built for previous use cases.

To evaluate the success of the delivery model, ANZ has implemented key metrics, such as cost transparency and domain adoption, to guide the data mesh governance team in refining the delivery approach. For instance, one enhancement involves integrating cross-functional squads to support data literacy.

The following considerations are key to scaling the Institutional Division operating model:

  • Data as a product approach – Use techniques like event storming and domain-driven design to capture business events and their meanings.
  • Education and enablement – Conduct learning interventions to upskill teams on understanding and using the data as a product approach.
  • Iterative data platform delivery – Work backward from business initiative to iteratively deliver self-service data platform infrastructure capabilities.
  • Managing demand efficiently – Implement a feedback mechanism to manage demand on data products. Track and manage data debt using standard data contract specifications. Most importantly, adopt governance and standards to make sure data products are built and maintained with a long-term perspective, minimizing technical debt.

“The Institutional Data & Analytics Platform (IDAP) has allowed the Institutional team to establish a base foundation to allow various teams to aggregate and consume the wealth of data across the division. This self-service platform enables business leaders to both create and consume reusable data products, unlocking value across this division. It’s also an excellent proof point for our broader data mesh architecture, allowing us to connect this divisional data to broader enterprise data stores—further positioning us to put the customer at the center of everything we do.”

– Tim Hogarth, CTO ANZ

“AWS believes that democratizing data, while not compromising on security and fine-grained access, is a key component of any future-proof, scalable data platform, so we are pleased to be enabling ANZ bank’s IDAP metadata management and data governance capabilities through Amazon DataZone. This allows the diverse business functions at ANZ the autonomy to self-serve on their data needs with built-in governance.”

– Shikha Verma, Head of Product, Amazon DataZone

Conclusion

ANZ’s move towards a data product approach has improved how the organization manages data and reduces data silos, and has positioned it to become a data-driven, customer-centric organization. By combining federated platform practices with AWS services and open standards, ANZ Institutional Division is achieving its decentralization objectives with a scalable data platform that enables its domain teams to make informed decisions, drive innovation, and maintain a competitive edge.

Special thanks: This implementation success is a result of close collaboration between ANZ Institutional Division, AWS ProServe, and the AWS account team. We want to thank ANZ Institutional Executives and the Leadership Team for the strong sponsorship and direction.


About the Authors

Leo Ramsamy is a Platform Architect specializing in data and analytics for ANZ’s Institutional division. He focuses on modern data practices, including Data Mesh architecture, data governance, quality management, and observability. His work aligns data strategies with business goals, improving accessibility and enabling better decision-making across ANZ.

Srinivasan Kuppusamy is a Senior Cloud Architect – Data at AWS ProServe, where he helps customers solve their business problems using the power of AWS Cloud technology. His areas of interests are data and analytics, data governance, and AI/ML.

Rada Stanic is a Chief Technologist at Amazon Web Services, where she helps ANZ customers across different segments solve their business problems using AWS Cloud technologies. Her special areas of interest are data analytics, machine learning/AI, and application modernization.

Preparing for take-off: Regulatory perspectives on generative AI adoption within Australian financial services

Post Syndicated from Julian Busic original https://aws.amazon.com/blogs/security/preparing-for-take-off-regulatory-perspectives-on-generative-ai-adoption-within-australian-financial-services/

The Australian financial services regulator, the Australian Prudential Regulation Authority (APRA), has provided its most substantial guidance on generative AI to date in Member Therese McCarthy Hockey’s remarks to the AFIA Risk Summit 2024. The guidance gives a green light for banks, insurance companies, and superannuation funds to accelerate their adoption of this transformative technology, but reminded the financial services industry of the need for adequate guardrails to make sure that the benefits of generative AI don’t come at an unacceptable cost to the community.

Amazon Web Services (AWS) is committed to developing AI responsibly and strongly supports APRA’s message to proceed with generative AI adoption with appropriate guardrails implemented. AWS is at the forefront of generative AI research and innovation, and many of our financial services customers are already harnessing the benefits of our artificial intelligence (AI), machine learning (ML), and generative AI services. AWS is committed to the responsible development and use of AI so that we can help our customers achieve their business goals while meeting—and aiming to exceed—their regulators’ expectations.

A green light for AI, ML, and generative AI

APRA’s guidance, as outlined in APRA Member Therese McCarthy Hockey’s remarks to the AFIA Risk Summit 2024, offers a clear pathway for adoption of AI, ML, and generative AI technologies by APRA-regulated entities. Ms. McCarthy Hockey says that there is “keen support” within APRA and across government for companies to realize the benefits of technology-led innovation, and she highlights the significant advantages that effective use of generative AI can deliver, such as improved productivity, cost efficiencies, more personalized customer experiences, and the ability to divert valuable resources to higher-level areas of need.

“Within APRA and across governments and regulators there is keen support for the realisation of tangible improvements through innovation.” — APRA Member Therese McCarthy Hockey’s remarks to AFIA Risk Summit May 2024

AWS financial services customers are starting to use more advanced AI for a variety of purposes, such as customer service, marketing, application development, fraud detection, and regulatory compliance. Specific use cases cited by APRA were the use of generative AI to rapidly review long documents against criteria such as policy requirements, use of generative AI-powered coding tools to produce better code faster, and creating generative AI bots to simulate customer testing of products and services. This is an extension of less sophisticated forms of AI which have been in operation for some time, with APRA citing internet chat bots and natural language processing as examples where businesses have already realized efficiencies by automating and speeding up manual or time-consuming processes.

APRA and other financial services regulators are experimenting internally with AI themselves. In Ms. McCarthy Hockey’s speech, she noted that APRA itself is using text analysis tools on an ongoing basis to review responses to APRA risk culture surveys, with the results helping APRA risk specialists direct focus to where it’s most required. APRA is also experimenting with natural language processing tools to review incident reporting data from regulated entities and to highlight incidents that are worthy of further investigation. This helps to reduce the human effort required by APRA staff and increase regulatory efficiency. Finally, APRA is collaborating with the Australian Securities and Investments Commission (ASIC) and the Reserve Bank of Australia (RBA) on a proof of concept to reduce the effort required to compare, analyze, and summarize the reams of documentation the three agencies must review as part of their regular entity supervision duties.

Risks must be understood and managed

APRA advocates for a prudent approach to experimentation with these technologies. As was the case with cloud adoption, organizations with more mature risk and data management capabilities will be able to move faster than those without.

“APRA’s message to the entities we regulate is that firm board oversight, robust technology platforms and strong risk management are essential for companies that want to begin experimenting with new ways of harnessing AI.” — APRA Member Therese McCarthy Hockey’s remarks to AFIA Risk Summit May 2024

APRA’s current regulatory framework is fit-for-purpose

APRA also made the specific point that its existing prudential framework remains fit-for-purpose for the increased uptake of AI, ML, and generative AI.

APRA’s primary focus is on governance, citing three key areas:

  1. Do boards have sufficient capability to determine an appropriate AI strategy and make sound risk management decisions? Are they able to effectively challenge management? What sort of learning and development programs are in train, and do the boards have access to external skills and advice if required?
  2. How mature is the risk culture? Is a risk management mindset embedded and functioning effectively across all three lines of defense? What controls and monitoring are in place to help prevent employees making unauthorized use of AI, ML, and generative AI tools?
  3. Is there adequate data quality and reliability? AI outputs depend directly on the quality of the inputs. APRA states that data management is an area where many regulated entities have a long way to go.

APRA also focuses on accountability, reminding regulated entities that as with any form of outsourcing or use of third-party services, the regulated entity retains accountability for the outputs of the AI, ML, and generative AI programs they deploy. There must always be a human in the loop: a person accountable for verifying that AI operates as intended. The level of human involvement can vary—for example, APRA does not suggest that a human should be involved in every AI decision made by a fraud detection service, but there should be a human who is accountable for the algorithm it runs, its operations, and the outcomes it drives.

How AWS is helping customers locally and globally use AI responsibly

From the outset, AWS has prioritized responsible AI innovation by embedding safety, fairness, robustness, security, and privacy into our development processes, and continuously educating our employees. We extend this commitment through to our customers by designing services that help customers derive business value from AI in a safe and responsible way.

AWS collaborates with organizations such as the OECD AI working groups, the Partnership on AI, and the Responsible AI Institute, and maintains strategic partnerships with universities worldwide. In Australia, AWS collaborates with key institutions like the National AI Centre, CSIRO, the Australian Information Industry Association, and the Tech Council of Australia to provide insights on responsible AI adoption and to maximize the benefits of AI technology for the country. The recent Voluntary AI Safety Standard developed by the National AI Centre is the start of clear guidance for Australian organizations to follow, and AWS is engaging with Australia and other governments on the responsible adoption and use of generative AI.

Recently, AWS has supported global financial services customers in critical areas such as risk management, financial crime prevention, and cybersecurity by using generative AI to analyze and respond to large data volumes in real-time. Verafin (a Nasdaq company) used Amazon Bedrock to improve anti-money laundering and fraud prevention processes. This application of AI enhances the effectiveness of financial crime management programs. Mastercard employs AWS AI and machine learning services to detect and prevent fraud while providing the most seamless customer experience possible.

Generative AI’s role in modernizing legacy systems is increasingly recognized, especially among Australian financial services customers who are undertaking transformation programs to reduce technology debt and enhance process resilience. CommBank, PEXA, and National Australia Bank (NAB) employ generative AI technology to improve speed, quality, and security when building and modifying applications.

How to implement responsible AI within your organization

The core dimensions of responsible AI at AWS align to the key regulatory considerations of both APRA and regulators globally:

  • Fairness – Considering impacts on different groups of stakeholders
  • Explainability – Understanding and evaluating system outputs
  • Privacy and security – Appropriately obtaining, using, and protecting data and models
  • Safety – Working to prevent harmful system output and misuse
  • Controllability – Having mechanisms to monitor and steer AI system behavior
  • Veracity and robustness – Achieving correct system outputs, even with unexpected or adversarial inputs
  • Governance – Incorporating best practices into the AI supply chain, including providers and deployers
  • Transparency – Enabling stakeholders to make informed choices about their engagement with an AI system

Note that responsible AI is a continually evolving field. Customers can keep updated with developments in this area on our Responsible AI webpage.

The Cloud Adoption Framework for Artificial Intelligence, Machine Learning, and Generative AI provides extensive guidance, and serves as both a starting point and a guide to help customers meet, and in many cases exceed, regulatory expectations.

We have integrated features into our generative AI services to facilitate the application of responsible AI policies for organizations. For example, Amazon Bedrock Guardrails can help financial services organizations comply with APRA guidance on AI use in several key ways:

  1. Content filtering – Guardrails allows organizations to configure content filters to block harmful or inappropriate content in AI model inputs and outputs. This helps AI applications to adhere to APRA’s expectations for responsible AI use.
  2. Topic restrictions – Organizations can define specific topics to be avoided in AI interactions. For example, a banking chatbot could be configured so it won’t provide investment advice, aligning with regulatory restrictions.
  3. Sensitive information protection – Guardrails can detect and redact personally identifiable information (PII) in AI inputs and outputs. This helps protect customer privacy and aids in compliance with data protection requirements.
  4. Custom word filters – Companies can set up lists of words or phrases to block, helping maintain appropriate communication.
  5. Contextual grounding checks – This feature helps detect and filter AI hallucinations in model responses where a reference source and a user query are provided, improving the accuracy and reliability of AI-generated responses. This aligns with APRA’s focus on making sure that AI systems provide accurate and trustworthy information.
  6. Customizable policies – Guardrails allows organizations to tailor AI safeguards to their specific needs and regulatory requirements, helping them align with APRA’s principles-based approach.
  7. Consistent safeguards – Guardrails can be applied across multiple AI models and applications, enabling a standardized approach to responsible AI use across the organization.
  8. Transparency and testing – The ability to test guardrails and iterate on configurations supports APRA’s expectations for due diligence and appropriate monitoring of AI systems.
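Several of the capabilities listed above map onto sections of a guardrail definition. The following sketch builds such a definition as a plain Python dictionary. The field names follow the Amazon Bedrock `CreateGuardrail` request as we understand it and should be checked against the current API reference; the topic, word list, thresholds, and messages are illustrative assumptions, not recommended settings.

```python
def build_guardrail_request(name: str) -> dict:
    """Assemble a guardrail definition; all values are illustrative."""
    return {
        "name": name,
        "description": "Sketch of safeguards for a banking assistant",
        # Content filtering on model inputs and outputs.
        "contentPolicyConfig": {"filtersConfig": [
            {"type": t, "inputStrength": "HIGH", "outputStrength": "HIGH"}
            for t in ("HATE", "INSULTS", "VIOLENCE", "MISCONDUCT")
        ]},
        # Topic restriction: the assistant should not give investment advice.
        "topicPolicyConfig": {"topicsConfig": [{
            "name": "investment-advice",
            "definition": "Recommendations to buy, sell, or hold "
                          "specific financial products.",
            "examples": ["Which shares should I buy this year?"],
            "type": "DENY",
        }]},
        # Sensitive information protection: mask selected PII types.
        "sensitiveInformationPolicyConfig": {"piiEntitiesConfig": [
            {"type": "EMAIL", "action": "ANONYMIZE"},
            {"type": "PHONE", "action": "ANONYMIZE"},
        ]},
        # Custom word filter (hypothetical phrase).
        "wordPolicyConfig": {"wordsConfig": [{"text": "guaranteed returns"}]},
        # Contextual grounding checks with assumed thresholds.
        "contextualGroundingPolicyConfig": {"filtersConfig": [
            {"type": "GROUNDING", "threshold": 0.75},
            {"type": "RELEVANCE", "threshold": 0.75},
        ]},
        "blockedInputMessaging": "Sorry, I can't help with that request.",
        "blockedOutputsMessaging": "Sorry, I can't provide that response.",
    }


# With boto3 installed and AWS credentials configured, the definition could
# be submitted along these lines (not executed in this sketch):
#   import boto3
#   boto3.client("bedrock").create_guardrail(**build_guardrail_request("demo"))
```

Keeping the definition in code or configuration makes it reviewable and repeatable, which supports the testing and iteration described in the last item of the list.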

We have a comprehensive user guide detailing how to implement, configure, and test Amazon Bedrock Guardrails.

AWS AI Service Cards also provide detailed information on AWS AI services, including intended use cases, limitations, and responsible AI design choices. This transparency helps financial institutions understand and responsibly use AI technologies.

APRA’s existing prudential standards do not set specific rules for managing AI/ML and generative AI risks. Instead, APRA outlines desired risk management outcomes, leaving it to each regulated entity to assess AI deployment risks and implement appropriate controls. AWS offers the User Guide to Financial Services Regulations and Guidelines in Australia to help customers meet APRA’s requirements.

Ultimately, the rate of AI, ML, and generative AI adoption amongst APRA-regulated entities will be determined by the risk appetite and risk management capability of individual entities. APRA openly encourages its regulated entities—our financial services customers—who are considering AI, ML, and generative AI experimentation and adoption to reach out to APRA directly and initiate dialogue. APRA is a highly experienced, knowledgeable, and approachable regulator, and will be able to provide valuable insights and guidance to regulated entities.

Conclusion and next steps

APRA’s messaging to industry is a significant milestone for AI, ML, and generative AI adoption in the Australian financial services industry. Boards, executives, and technology decision-makers should review APRA’s Risk Summit speech and consider APRA’s support for the adoption of these technologies when refining their strategies and plans.

AWS, and our AWS Partner Network, are experienced in working with financial services customers, and there are already a number of examples both internationally and locally where generative AI has been implemented to create value for our customers. AWS is ready to help our customers meet and exceed APRA’s risk management expectations.

Contact your AWS representative to discuss how the AWS solution architects, AWS Professional Services teams, AWS Training and Certification, and the AWS Partner Network can assist with your AI, ML, and generative AI adoption journey. If you don’t have an AWS representative, please contact us at https://aws.amazon.com/contact-us.
 

Julian Busic

Julian is a Security Solutions Architect with a focus on regulatory engagement. He works with our customers, their regulators, and AWS teams to help customers raise the bar on secure cloud adoption and usage. Julian has over 15 years of experience working in risk and technology across the financial services industry in Australia and New Zealand.
Jamie Simon

Jamie leads AWS business within the banking and financial services industry across Australia and New Zealand, supporting financial services customers as they make use of the cloud to transform their business for a digital and AI-enabled future.
Warren Cammack

Warren supports AWS customers in applying the value of the AWS Cloud at scale, focusing on identifying and overcoming blockers to adoption. Currently he is leading the rollout of generative AI services to enable enterprises to benefit from the new technology in a safe, responsible, and effective manner.
Krish De

Krish is a Principal Solutions Architect with a focus on financial services. He works with AWS customers, their regulators, and AWS teams to safely accelerate customers’ cloud adoption, with prescriptive guidance on governance, risk, and compliance. Krish has over 20 years of experience working in governance, risk, and technology across the financial services industry in Australia, New Zealand, and the United States.

Exploring the benefits of artificial intelligence while maintaining digital sovereignty

Post Syndicated from Max Peterson original https://aws.amazon.com/blogs/security/exploring-benefits-of-artificial-intelligence-while-maintaining-digital-sovereignty/

Around the world, organizations are evaluating and embracing artificial intelligence (AI) and machine learning (ML) to drive innovation and efficiency. From accelerating research and enhancing customer experiences to optimizing business processes, improving patient outcomes, and enriching public services, the transformative potential of AI is being realized across sectors. Although using emerging technologies helps drive positive outcomes, leaders worldwide must balance these benefits with the need to maintain security, compliance, and resilience. Many organizations, including those in the public sector and regulated industries, are investing in generative AI applications powered by large language models (LLMs) and other foundation models (FMs) because these applications can transform and scale their work and provide better experiences for customers. Beyond computing power, unlocking this AI potential resides in the AI applications that organizations can create based on a variety of AI/ML development services, models, and data sources. Organizations must navigate the complexity of building AI applications in light of existing and emerging regulatory regimes while verifying that their AI applications and related data are secure, protected, and resilient to risks and threats.

AWS offers a wide range of AI/ML services and capabilities, built on our sovereign-by-design foundation, that are making it simpler for our customers to meet their digital sovereignty needs while getting the security, control, compliance, and resilience that they need. For example, Amazon Bedrock is a fully managed service that offers a choice of high-performing FMs from leading AI companies such as AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, and Stability AI through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI. Amazon SageMaker provides tools and infrastructure to build, train, and deploy ML models at scale while supporting responsible AI with governance controls and access to pretrained models.

Innovating securely across the AI lifecycle

Security is and always has been our top priority at AWS. AWS customers benefit from our ongoing investment in data centers, networks, custom hardware, and secure software services, built to satisfy the requirements of the most security-sensitive organizations, including the government, healthcare, and financial services. We have always believed that it is essential that customers have control over their data and its location. That’s why we architected the AWS Cloud to be secure and sovereign-by-design from day one. We remain committed to giving our customers more control and choice so that they can use the full power of AWS while meeting their unique digital sovereignty needs.

As organizations develop and implement generative AI, they want to make sure that their data and applications are secured across the AI lifecycle, including data preparation, training, and inferencing. To help ensure the confidentiality and integrity of customer data, all of our Nitro-based Amazon Elastic Compute Cloud (Amazon EC2) instances that run ML accelerators such as AWS Inferentia and AWS Trainium, and graphics processing units (GPUs) such as P4, P5, G5, and G6, are backed by the industry-leading security capabilities of the AWS Nitro System. By design, there is no mechanism for anyone at AWS to access Nitro EC2 instances that customers use to run their workloads. The NCC Group, an independent cybersecurity firm, has validated the design of the Nitro System.

We take a secure approach to generative AI and make it practical for our customers to secure their generative AI workloads across the generative AI stack so that they can focus on building and scaling. All AWS services—including generative AI services—support encryption, and we continue to innovate and invest in controls and encryption features that allow our customers to encrypt everything everywhere.

For example, Amazon Bedrock uses encryption to protect data in transit and at rest, and data remains in the AWS Region where Amazon Bedrock is being used. Customer data, such as prompts, completions, custom models, and data used for fine-tuning or continued pre-training, is not used for Amazon Bedrock service improvement and is never shared with third-party model providers. When customers fine-tune a model in Amazon Bedrock, the data is never exposed to the public internet, never leaves the AWS network, is securely transferred through a customer’s virtual private cloud (VPC), and is encrypted in transit and at rest.

SageMaker protects ML model artifacts and other system artifacts by encrypting data in transit and at rest. Amazon Bedrock and SageMaker integrate with AWS Key Management Service (AWS KMS) so that customers can securely manage cryptographic keys. AWS KMS is designed so that no one—not even AWS employees—can retrieve plaintext keys from the service.

Developing responsibly

The responsible development and use of AI is a priority for AWS. We believe that AI should take a people-centric approach that makes AI safe, fair, secure, and robust. We are committed to supporting customers with responsible AI and helping them build fairer and more transparent AI applications to foster trust, meet regulatory requirements, and use AI to benefit their business and stakeholders. AWS is the first major cloud service provider to announce ISO/IEC 42001 accredited certification for AI services, covering Amazon Bedrock, Amazon Q Business, Amazon Textract, and Amazon Transcribe. ISO/IEC 42001 is an international management system standard that outlines requirements and controls for organizations to promote the responsible development and use of AI systems.

We take responsible AI from theory into practice by providing the necessary tools, guidance, and resources, including Amazon Bedrock Guardrails to help implement safeguards tailored to customer generative AI applications and aligned with their responsible AI policies, or Model Evaluation on Amazon Bedrock to evaluate, compare, and select the best FMs for specific use cases based on custom metrics, such as accuracy, robustness, and toxicity. Additionally, Amazon SageMaker Model Monitor automatically detects and alerts customers of inaccurate predictions from deployed models. We continue to publish AI Service Cards to enhance transparency by providing a single place to find information on the intended use cases and limitations, responsible AI design choices, and performance optimization best practices for our AI services and models.

Building resilience

Resilience plays a pivotal role in the development of any workload, and AI/ML workloads are no different. Customers need to know that their workloads in the cloud will continue to operate in the face of natural disasters, network disruptions, or disruptions due to geopolitical crises. AWS delivers the highest network availability of any cloud provider and is the only cloud provider to offer three or more Availability Zones (AZs) in all Regions, providing more redundancy. Understanding and prioritizing resilience is crucial for generative AI workloads to meet organizational availability and business continuity requirements. We have published guidance on designing generative AI workloads for resilience. To enable higher throughput and enhanced resilience during periods of peak demands in Amazon Bedrock, customers can use cross-region inference to distribute traffic across multiple Regions. For customers with specific European Union data sovereignty requirements, we are launching the AWS European Sovereign Cloud in 2025 to offer an additional layer of control and resilience.

Supporting choice and flexibility

It’s important that customers have access to diverse AI technologies, while having the freedom to choose the right solutions to meet their needs. AWS provides more diversity, choice, and flexibility so that customers can select the AI solution that best aligns with their specific requirements, whether that’s using open-source models, proprietary solutions, or their own custom AI models. For example, we understand the importance of open-source AI in fostering transparency, collaboration, and rapid innovation. Open-source models enable scrutiny of vulnerabilities, drive security improvements, and support research on AI safety. Amazon SageMaker JumpStart provides pretrained, open-source models for a wide range of common use cases. To provide practitioners and developers with the guidance and tools that they need to create secure-by-design AI systems, we are a founding member of the open-source initiative Coalition for Secure AI (CoSAI).

Also, our commitment to portability and interoperability helps ensure that customers can move easily between environments. For customers changing IT providers, we’ve taken concrete steps to lower costs, and AWS is actively engaged in efforts to facilitate switching between cloud providers, including through our support of the Cloud Infrastructure Service Providers in Europe (CISPE) Cloud Switching Framework, which lays out guidance to assist providers and customers in the switching process. This gives organizations the flexibility to adapt their cloud and AI strategies as their needs evolve.

We remain committed to providing customers with a choice of diverse AI technologies, along with secure and compliant ways to build their AI applications throughout the development lifecycle. Through this approach, customers can enhance the security, compliance, and resilience of their systems.

 
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.
 

Max Peterson

Max is the Vice President of AWS Sovereign Cloud. He leads efforts to ensure that AWS customers around the world have the most advanced set of sovereignty controls, privacy safeguards, and security features available in the cloud. Previously, Max served as the VP of AWS Worldwide Public Sector (WWPS) and created and led the WWPS International Sales division, with a focus on empowering government, education, healthcare, aerospace and satellite, and nonprofit organizations to drive rapid innovation while meeting evolving compliance, security, and policy requirements. Max has over 30 years of public sector experience and served in other technology leadership roles before joining Amazon. Max has earned both a Bachelor of Arts in Finance and Master of Business Administration in Management Information Systems from the University of Maryland.

How SmugMug Increased Data Modeling Productivity with Amazon Q Developer

Post Syndicated from Will Matos original https://aws.amazon.com/blogs/devops/how-smugmug-increased-data-modeling-productivity-with-amazon-q-developer/

This post is co-written with Dr. Geoff Ryder, Manager, at SmugMug.

Introduction

SmugMug operates two very large online photo platforms: SmugMug and Flickr. These platforms enable more than 100 million customers to safely store, search, share, and sell tens of billions of photos every day. However, the data science and engineering team at SmugMug and Flickr often faces complex data modeling challenges that require significant time to resolve.

These challenges arise due to several factors. First, the team has to contend with diverse datasets from different sources. Additionally, the database schema and tables are highly complex, and the team needs to quickly understand application (PHP) code and database table structures in order to generate the necessary complex database queries. Specifically, SmugMug uses Amazon Redshift as its cloud data warehouse to analyze patterns in petabyte-scale data stored in Amazon S3, as well as transactional data in Amazon Aurora and Amazon DynamoDB. This allows them to generate dozens of business reports daily.

However, the complexity increases further as many database tables also need to be imported from third-party organizations into Amazon Redshift, where they are joined with SmugMug and Flickr’s internal tables. In extreme cases, properly modeling all these database tables and handling issues like granularity, cardinality, timestamps and missing data could take years – an impractical timeline for the business. We are excited to walk through SmugMug’s data modeling use cases and how SmugMug uses Amazon Q Developer to improve the data science and engineering team’s productivity.

Discovering Amazon Q Developer

SmugMug was one of the first customers to pilot Amazon Q Developer (previously Amazon CodeWhisperer), the most capable AI-powered assistant for software development that re-imagines the experience across the entire software development lifecycle, making it easier and faster to build, secure, manage, optimize, operate, and transform applications on AWS. There are multiple Amazon Q Developer use cases at SmugMug and Flickr, such as using Amazon Q Developer agent (/dev) for software development (that is, generating implementation plans and the accompanying code), generating inline code suggestions, asking Amazon Q Developer in chat about AWS services and best practices, and analyzing AWS usage and costs for Cloud Financial Management (CFM) needs. For the data science and engineering team specifically, the key feature is chatting with Amazon Q Developer in integrated development environments (IDEs) like JetBrains DataGrip. The data analysts and data scientists at SmugMug and Flickr ask questions in Amazon Q Developer chat to analyze database schemas, generate data model diagrams from DDL (Data Definition Language) statements, convert queries between languages, automatically generate complex database queries for data analysis, generate code to validate table contents, and predict trends using ML (Machine Learning).

Implementing Amazon Q Developer

To solve the data modeling challenges SmugMug faced, the team collaborated closely with their AWS Account Team, AWS Professional Services, and the Amazon Q Developer service team to create and test a data modeling assistant solution using Amazon Q Developer.

As a first step, the data modeler needs to bring the right metadata to bear. For simpler cases, the commands “show view myschema.v” or “show table myschema.t” retrieve DDL schema information about the specified view or table from Amazon Redshift into the IDE console.

Here’s an example using simulated data for a hypothetical company. For this typical company that handles orders for products, the result of typing “show table sample.orderinfo” and “show table sample.skuinfo” might be:

Image of SQL statement generated by the show table statement. "CREATE TABLE sample.skuinfo ( sku_id bigint ENCODE raw, sku_vendor bigint ENCODE az64, sku_category character varying(18) ENCODE lzo, sku_description character varying(255) ENCODE lzo, date_sku_created timestamp without time zone ENCODE az64, date_sku_updated timestamp without time zone ENCODE az64, pipeline_inserted_at timestamp without time zone ENCODE az64 ) DISTSTYLE KEY SORTKEY ( sku_id );"

Image of SQL statement generated by the show table statement. "CREATE TABLE sample.orderinfo ( order_id bigint ENCODE raw, shipper_id bigint ENCODE az64 distkey, product_id bigint ENCODE az64, quantity_ordered integer ENCODE az64, date_order_placed timestamp without time zone ENCODE az64 ) DISTSTYLE KEY SORTKEY ( order_id );"

This DDL text is now in the open tab. When the modeler selects the text to highlight it, that DDL text becomes part of the context that Amazon Q Developer sees. The modeler can then start asking questions about it in the Amazon Q Developer chat window in the IDE.

Diagram showing what is considered part of the context included in a request, including the RAG query result, related documents when using the @workspace keyword, the highlighted text in the IDE open tab, the chat history, and the prompt.

In complex scenarios, establishing the correct modeling context requires a combination of schema information, legacy SQL, application source code in various programming languages, sample values, and natural language documentation. Amazon Q Developer addresses this by creating a local index of relevant files and content. When a question is asked using @workspace, this index is consulted to identify and include pertinent sections of code and information in the request. (See this article for additional details on workspace). The prompt plays a crucial role in measuring similarity, so providing comprehensive context within it is essential. To optimize this process, the IDE settings feature a tunable workspace index function, allowing for enhanced performance in identifying and incorporating relevant context.

Image showing the Amazon Q Settings window where you enable the Workspace feature by checking the "Workspace index" box. You can also change the number of worker threads used, and the maximum workspace index size in MB.

Workspace Index Settings
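To make the point about prompt wording and similarity concrete, here is a minimal sketch of similarity-based context selection. It is not how Amazon Q Developer implements its workspace index; it is a toy bag-of-words ranking, and the chunk names, chunk contents, and the functions tokenize, cosine_similarity, and top_chunks are all hypothetical. It only illustrates why a prompt that names the tables and columns tends to pull in more relevant context than a vague one.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric/underscore tokens."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_chunks(prompt, chunks, k=2):
    """Return the names of the k chunks most similar to the prompt (score > 0)."""
    prompt_vec = Counter(tokenize(prompt))
    scored = [
        (cosine_similarity(prompt_vec, Counter(tokenize(text))), name)
        for name, text in chunks.items()
    ]
    # Highest score first; ties are broken alphabetically by chunk name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in scored[:k] if score > 0]


# Hypothetical indexed content standing in for files in the workspace.
chunks = {
    "orderinfo.sql": "create table sample.orderinfo order_id sku_id quantity_ordered date_order_placed",
    "skuinfo.sql": "create table sample.skuinfo sku_id sku_vendor sku_category sku_description",
    "README.md": "deployment notes for the nightly reporting pipeline",
}

vague = top_chunks("How do I join the tables?", chunks)
specific = top_chunks(
    "Which columns join sample.orderinfo and sample.skuinfo on sku_id?", chunks
)
print("vague prompt    ->", vague)
print("specific prompt ->", specific)
```

In this toy setup the specific prompt surfaces both schema files, while the vague prompt shares no table or column names with them.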

By adopting Amazon Q Developer as a team, we are able to jointly develop and share proprietary prompt text to address the four steps in our modeling process, as follows.

Step 1. Define the goal for the data modeling project

From prior knowledge, sketch a high-level goal for a data model. Gather the data for it manually or, for example, by querying a vector database and adding its documents to the project.

For this example, we choose as the goal to compute aggregated metrics from a new table or view composed of two existing tables, sample.orderinfo and sample.skuinfo. These contain simulated data about product sales that are common to many companies. The order table is in the style of a fact table that logs customer orders, and the stock keeping unit (SKU) table is a dimension table that provides additional data points of interest about each order. The order and SKU information need to be combined by a join operation before we can compute the metrics. We would like Amazon Q Developer to tell us how to write that SQL join statement.

Step 2. Conduct an exploratory analysis and generate candidates

Next, prompt Amazon Q Developer for candidate foreign keys to join the tables, and for SQL code to execute those joins. Generate an entity-relationship diagram (ERD) as a visual aid. Prompts do not have to be complicated. For example:

@workspace What columns of database tables sample.orderinfo and sample.skuinfo 
would be best to join the two tables? Provide SQL code for the join. Draw an 
entity relationship diagram that shows the joins between the two tables, and 
includes only the fields involved in the join. Add a crow's foot cardinality 
marker to indicate a 1:many relationship, and add it next to the high 
cardinality table.

Image with the first part of the response to the prompt with the following text: "Based on the table schemas, sku_id is the appropriate column to join these tables. The relationship is likely one-to-many (1:M) where one SKU can appear in multiple orders. Here's the SQL join: SELECT o.order_id, o.sku_id, s.sku_description FROM sample.orderinfo o JOIN sample.skuinfo s ON o.sku_id = s.sku_id;"

Image with the second part of the response to the prompt with the ASCII relationship diagram showing the join relationship.

Each time tables are joined together, new aggregated metrics become available to drive business insights. Now, for instance, we can find the top selling SKUs in October thanks to our results:

Image shows the top 5 results from the prior query showing the top skus in October.
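As a rough illustration of the kind of aggregated metric the join unlocks, the sketch below computes top-selling SKUs for October. Several things here are assumptions: it uses an in-memory SQLite database as a stand-in for Amazon Redshift, the rows are invented, the tables are trimmed to a few columns, and orderinfo is given a sku_id column to match the join in the response above (the DDL shown earlier lists product_id instead).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Attach a second in-memory database named "sample" so the table names
# match the schema-qualified names used in the post.
conn.execute("ATTACH DATABASE ':memory:' AS sample")
conn.executescript("""
CREATE TABLE sample.skuinfo (sku_id INTEGER PRIMARY KEY, sku_description TEXT);
CREATE TABLE sample.orderinfo (
    order_id INTEGER PRIMARY KEY,
    sku_id INTEGER,
    quantity_ordered INTEGER,
    date_order_placed TEXT
);
-- Invented sample rows.
INSERT INTO sample.skuinfo VALUES (1, 'photo book'), (2, 'canvas print'), (3, 'mug');
INSERT INTO sample.orderinfo VALUES
    (100, 1, 2, '2024-10-03 09:15:00'),
    (101, 2, 5, '2024-10-10 14:00:00'),
    (102, 1, 4, '2024-10-21 18:30:00'),
    (103, 3, 9, '2024-09-30 23:59:00'),
    (104, 2, 1, '2024-11-01 00:00:00');
""")

# Join the fact table to the dimension table, keep October orders only,
# and rank SKUs by total units ordered.
top_skus = conn.execute("""
    SELECT s.sku_id, s.sku_description, SUM(o.quantity_ordered) AS units
    FROM sample.orderinfo o
    JOIN sample.skuinfo s ON o.sku_id = s.sku_id
    WHERE o.date_order_placed >= '2024-10-01'
      AND o.date_order_placed < '2024-11-01'
    GROUP BY s.sku_id, s.sku_description
    ORDER BY units DESC
    LIMIT 5
""").fetchall()

for row in top_skus:
    print(row)
```

The half-open date range (on or after October 1, before November 1) avoids having to know the last timestamp of the month.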

Sometimes we need to look at code written in languages other than SQL to complete the data model. For example, the names of some vendors this company works with happen to appear in application PHP code as human readable strings, but are saved in the application database as numbers. The analytics data staged in Redshift contain only the numbers. So, we pull a copy of the PHP text file into @workspace, and ask Amazon Q Developer to translate the relevant string-integer mappings into a SQL CASE statement.

Image shows the selected PHP code with a switch statement mapping Vendor Ids to Vendor Names.

PHP Switch statement showing the mapping of Vendor Ids to String Names.

I am a Redshift database administrator and I am working on a data modeling 
problem. I would like to write SQL statements to join tables sample.orderinfo 
and sample.skuinfo. Please write that SQL to join the two tables. Also, I 
would like to write a SQL case statement to recover all string values defined 
in PHP that are represented as integer values in the database table.

The output of that prompt is shown below.

Image showing the updated SQL query that maps the Vendor Id to the Vendor Name.

Amazon Q Developer automatically detected the PHP switch case statement, converted it to SQL, and added it to the final query. Many other programming languages are supported, and modelers should try this technique with other kinds of source code. Note that data scientists and analysts may not know where to look in complex application code for these details, so this discovery-plus-code translation step is a net new benefit to our company that is only possible thanks to Amazon Q Developer.
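For readers who want to see the shape of such a translation, here is a minimal sketch that turns an integer-to-string mapping into a SQL CASE expression. The vendor names, the build_case_expression helper, and the tiny SQLite table are all hypothetical; the real mapping lives in the PHP switch statement shown in the screenshot, and the real query runs on Amazon Redshift.

```python
import sqlite3

# Hypothetical mapping, standing in for the constants defined in the PHP code.
vendor_names = {1: "Acme Prints", 2: "Globex Frames", 3: "Initech Labs"}


def build_case_expression(column, mapping, default="Unknown"):
    """Build a SQL CASE expression that maps integer codes to string labels."""

    def quote(value):
        # Escape single quotes so the label is a valid SQL string literal.
        return "'" + value.replace("'", "''") + "'"

    lines = ["CASE " + column]
    for key in sorted(mapping):
        lines.append(f"    WHEN {int(key)} THEN {quote(mapping[key])}")
    lines.append(f"    ELSE {quote(default)}")
    lines.append("END")
    return "\n".join(lines)


case_sql = build_case_expression("sku_vendor", vendor_names)
print(case_sql)

# Try the expression on a small invented table (SQLite stands in for Redshift).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE skuinfo (sku_id INTEGER, sku_vendor INTEGER)")
conn.executemany("INSERT INTO skuinfo VALUES (?, ?)", [(10, 1), (11, 3), (12, 7)])
rows = conn.execute(
    f"SELECT sku_id, {case_sql} AS vendor_name FROM skuinfo ORDER BY sku_id"
).fetchall()
print(rows)
```

The ELSE branch keeps unmapped vendor codes visible in reports instead of silently returning NULL.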

Step 3. Create code to test the analysis

Now we request SQL source code for a battery of small test queries. These can return cardinality, grain, arithmetic, and null count results.

Please write a short SQL test to compute counts of the key fields that are used 
in the joins, which will verify the cardinality assignments indicated in the 
entity relationship diagram above. The SQL test should compare distinct counts 
to total counts and null counts when it verifies the cardinality.

Image of resulting SQL queries to check cardinality.
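The sketch below shows one way such a cardinality test could look. It is an illustration under assumptions, not the query from the screenshot: SQLite stands in for Redshift, the rows are invented, and key_counts and looks_unique are hypothetical helper names.

```python
import sqlite3


def key_counts(conn, table, column):
    """Return total, distinct, and NULL counts for one key column."""
    total, distinct, nulls = conn.execute(
        f"SELECT COUNT(*), COUNT(DISTINCT {column}), "
        f"SUM(CASE WHEN {column} IS NULL THEN 1 ELSE 0 END) FROM {table}"
    ).fetchone()
    # SUM over an empty table yields NULL, so fall back to 0.
    return {"total": total, "distinct": distinct, "nulls": nulls or 0}


def looks_unique(counts):
    """A key behaves like the '1' side if every row has a distinct, non-NULL value."""
    return counts["nulls"] == 0 and counts["distinct"] == counts["total"]


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE skuinfo (sku_id INTEGER, sku_description TEXT);
CREATE TABLE orderinfo (order_id INTEGER, sku_id INTEGER);
-- Invented sample rows, including one order with a missing SKU.
INSERT INTO skuinfo VALUES (1, 'photo book'), (2, 'canvas print'), (3, 'mug');
INSERT INTO orderinfo VALUES (100, 1), (101, 2), (102, 1), (103, 3), (104, NULL);
""")

sku_side = key_counts(conn, "skuinfo", "sku_id")
order_side = key_counts(conn, "orderinfo", "sku_id")
print("skuinfo.sku_id  ", sku_side, "unique:", looks_unique(sku_side))
print("orderinfo.sku_id", order_side, "unique:", looks_unique(order_side))
```

A unique key on the SKU side and repeated values on the order side are consistent with the 1:many relationship in the ERD; the NULL count flags orders that an inner join would silently drop.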

Step 4. Validate the results of the analysis

Run the test queries to see if the candidate solution from step 2 meets our goals. The “Insert at cursor” button at the bottom of the response is handy for this. The data modeler can easily spot an error in the join logic and ERD from inspecting the output of the test query. (Or, if it’s hard to interpret the results, keep making the test queries simpler.) If errors arise from the AI misinterpreting or miscalculating a result, or from a vaguely worded prompt, simply adjust the prompt in step 2 to fix the known errors, and repeat steps 2 – 4.

Image showing the query results from the cardinality query.
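One simple validation that is easy to read at a glance is a row-count check on the join itself. The sketch below is an assumption about what such a check could look like, not the test from the screenshot: if a left join from the order table to the SKU table returns more rows than the order table has, the SKU key is not unique and the join fans out. SQLite stands in for Redshift, the rows are invented, and join_report is a hypothetical helper.

```python
import sqlite3


def join_report(conn):
    """Compare fact-table row count with the row count after a LEFT JOIN."""
    fact_rows = conn.execute("SELECT COUNT(*) FROM orderinfo").fetchone()[0]
    joined_rows, unmatched = conn.execute(
        "SELECT COUNT(*), SUM(CASE WHEN s.sku_id IS NULL THEN 1 ELSE 0 END) "
        "FROM orderinfo o LEFT JOIN skuinfo s ON o.sku_id = s.sku_id"
    ).fetchone()
    return {
        "fact_rows": fact_rows,
        "joined_rows": joined_rows,
        "unmatched": unmatched or 0,          # orders with no matching SKU row
        "fan_out": joined_rows > fact_rows,   # True if the join multiplied rows
    }


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE skuinfo (sku_id INTEGER, sku_description TEXT);
CREATE TABLE orderinfo (order_id INTEGER, sku_id INTEGER);
INSERT INTO skuinfo VALUES (1, 'photo book'), (2, 'canvas print');
INSERT INTO orderinfo VALUES (100, 1), (101, 2), (102, 1);
""")

clean = join_report(conn)
print("clean data:     ", clean)

# Simulate a modeling error: a duplicate key in the dimension table.
conn.execute("INSERT INTO skuinfo VALUES (2, 'canvas print (duplicate row)')")
broken = join_report(conn)
print("duplicate SKU 2:", broken)
```

If fan_out is true, any SUM or COUNT computed over the joined rows would be inflated, which is the kind of error this step is meant to catch before the query reaches production.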

After a few iterations, taking from seconds to at most tens of minutes each, the modeling errors have been worked out and we arrive at a valid production query.

Key Benefits and Results

With this Amazon Q Developer powered solution and iterative approach, SmugMug has achieved highly accurate data modeling results across numerous database tables. Once the correct modeling configuration is established, various useful outputs may become available.

We already described production SQL, unit tests, and ERDs for documentation. By the end of the process, because Amazon Q Developer has a good understanding of the data it just modeled in its chat history, it will also generate useful Python machine learning programs to predict business trends. Here is a prompt for that, and a partial screenshot of the Python output:

Please write Python code to implement a linear regression that predicts the 
quantity_ordered value based on other fields in the data set. Choose predictor 
variables that are less likely to cause multi-collinearity problems.

Image showing the python code generated to predict quantity_ordered value.

This only shows the model training step, but the full response included all library imports, a Redshift query, feature engineering steps, ML performance metrics, and code for plotting the metrics. And the AI can produce other types of predictive models. For example, you can try:

Please write Python code to implement an XGBoost model that predicts the 
quantity_ordered value based on other fields in the data set.

Ultimately, the solution has improved team productivity for both existing and new team members, while maintaining legacy knowledge needed to onboard new team members more efficiently. Key benefits include:

  1. Reducing the time SmugMug data analysts and scientists spend on data modeling tasks from days to hours, allowing them to reallocate this time to other high-priority projects.
  2. Automating the generation of BI documentation and predictive ML, also saving crucial time.
  3. Providing net new value by translating application code constant definitions into SQL. Due to organizational boundaries, we would not have achieved this without an assist from the AI.

Future Plans and Expansion

SmugMug conducted the initial data modeling use case testing with over a dozen data science team members and analysts. We are moving on to analyzing more complex tables and data schemas, and to generating Python code in Amazon SageMaker for ML tasks like data preparation, training, inference, and MLOps. From our experience, Amazon Q Developer has become a preferred internal tool for development that has a data modeling component, and its use continues to expand to different groups around the company.

For SmugMug’s data modeling projects, we continue to enhance the four-step process described above. In order to gather the most relevant context to solve a problem, we build vector database collections to pull from schemas, older SQL code, application source code, BI tool content, and curated documentation. The vector search operation surfaces the right content, and spares data modelers from manually searching in different code archives. We use ChromaDB to do the searches, and bring the results from ChromaDB into the workspace as additional files.

Conclusion

Using Amazon Q Developer for data modeling use cases, SmugMug has managed to increase data science and engineering team productivity by up to 100% when compared to prior workflows. To explore how Amazon Q Developer can benefit your organization, get started here. If you have questions or suggestions, please leave a comment below.

About the Authors


Dr. Geoffrey Ryder

Dr. Geoff Ryder serves as the Manager of Data Science and Engineering at SmugMug, where he leads Team Prophecy in managing the company’s cloud-based data warehouse and analytics platforms. With a focus on leveraging the best AI tools, his team empowers photography clients to enhance their sales of both physical and digital photographic products. Geoff brings over two decades of experience in technical and business roles across Silicon Valley companies, and holds a PhD in Computer Engineering from UC-Santa Cruz.

Will Matos

Will Matos is a Principal Specialist Solutions Architect at AWS, revolutionizing developer productivity through Generative AI, AI-powered chat interfaces, and code generation. With 25 years of tech experience, and over 9 years with AWS, he collaborates with product teams to create intelligent solutions that streamline workflows and accelerate software development cycles. A thought leader engaging early adopters, Will bridges innovation and real-world needs.

Sreenivas Adiki

Sreenivas Adiki is a Sr. Customer Delivery Architect in ProServe, with a focus on data and analytics. He ensures success in designing, building, optimizing, and transforming in the area of Big Data/Analytics. Ensuring solutions are well-designed for successful deployment, Sreenivas participates in deep architectural discussions and design exercises. He has also published several AWS assets, such as whitepapers and proof-of-concept papers.

Kevin Bell

Kevin Bell is a Sr. Solutions Architect at AWS based in Seattle. He has been building things in the cloud for about 10 years. You can find him online as @bellkev on GitHub.

Corey Keane

Corey Keane is a Media and Entertainment (M&E) Sr. Account Manager at AWS. Corey has held a number of positions at Amazon and AWS throughout his 8 years with the company across M&E—including technical business development for strategic partnerships with international game developers, in addition to his current role managing AWS customers in the Media vertical. He leans on his pan-Amazon experience from working on other teams to identify new partnerships between our customers and other Amazon businesses to bring disruptive products to market.

Dissecting the Performance Gains in Amazon Q Developer agent for code transformation

Post Syndicated from Jonathan Vogel original https://aws.amazon.com/blogs/devops/dissecting-the-performance-gains-in-amazon-q-developer-agent-for-code-transformation/

Amazon Q Developer Agent for code transformation is an AI-powered tool that modernizes code bases from Java 8 and Java 11 to Java 17. Integrated into VS Code and IntelliJ, Amazon Q simplifies the migration process and reduces the time and effort compared to a manual process. It proposes and verifies code changes, using AI to debug compilation errors. In this blog post, we’ll explore recent improvements to our code transformation agent, particularly its enhanced debugging capabilities. The enhanced debugger agent significantly improves transformation efficiency and quality compared to the existing debugger.

How Amazon Q transforms Java applications

To upgrade Java codebases, the code transformation agent takes the source code as input and verifies that it builds and passes its tests in the source Java version. It then uses deterministic tools to apply code changes, followed by building and testing the changed code in the target Java version. If errors occur in this stage, a generative AI-based system debugs and resolves the compilation errors. Until now, the debugger resolved each error one by one, locating the code file with the error in the codebase and fixing it. This debug step iterates until all compilation errors are resolved or the maximum number of iterations is reached.

A flowchart diagram illustrating Amazon Q's code transformation process for accelerating Java upgrades to version 17. The workflow begins with source code input, flowing through a transformation engine that applies deterministic tools and generative AI, followed by build/test verification cycles and AI-powered debugging to resolve any compilation errors.

As an example, if an import statement is missing or wrong as the result of a library upgrade, the AI debugger will rebuild, iterate to find all the references in multiple files one by one, and update each reference to resolve the error. Refer to the blog post “Three ways Amazon Q Developer agent for code transformation accelerates Java upgrades” for a detailed explanation of each transformation step. This approach has helped Q Developer customers accelerate their migration efforts by over 40%.

Improving the debugging capabilities of code transformations

To further improve the ability of Q Developer to generate error-free code, we’ve just released multiple foundational improvements to the AI debugger.

  • Multi-error context: the debug AI can now take multiple build errors into consideration, which provides more context, leading to better solution discovery.
  • More tools available for the AI: previously, the agent localized each error to a single file and fixed it there; it can now execute multi-file solutions by exploring the codebase and operating on multiple files.
  • Inter-iteration memory: the debugger AI now remembers previous errors, which contributes to debugging new errors.
  • Intelligent backtracking: the debugger AI can now recognize if the current solution path leads to a dead end, in which case the agent can roll back to the previous state.

To implement these capabilities, the debugger AI has been re-architected as a multi-agent system. A memory management agent is responsible for analyzing the results of the last iteration and appending the relevant portions to the inter-iteration memory. A critic agent is responsible for analyzing progress, providing additional information to the debugger agent, and, if a dead end is detected, rolling back the progress to a previous state. A debugger agent analyzes the memory and the critique from the previous agents and modifies or updates the plan to fix the remaining errors in the codebase. The debugger agent has at its disposal a set of generic and specialized tools to browse and explore the codebase, edit source files, trigger builds, add dependencies, and so on. It is important to note that the agent only has access to the files and tools related to the transformation task, which limits hallucinations and drives progress.
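The interplay between the three roles can be pictured with a toy control loop. This is a conceptual sketch only and does not reflect the actual implementation: run_build is a hypothetical stand-in for a real build, the fix names are invented, the "error count did not decrease" dead-end rule is an assumption, and in the real system each role is an AI agent rather than a few lines of Python.

```python
def run_build(applied_fixes):
    """Hypothetical build step: return the errors left after the given fixes."""
    if "rewrite_config_with_springdoc" in applied_fixes:
        return []
    if "re_add_springfox" in applied_fixes:
        return ["incompatible dependency: springfox", "missing symbol: Docket"]
    return ["missing symbol: Docket"]


def debug(candidate_fixes, max_iterations=5):
    applied = []   # current state of the codebase, modeled as a list of fixes
    memory = []    # inter-iteration memory: one note per attempted fix
    errors = run_build(applied)

    for _ in range(max_iterations):
        if not errors:
            break

        # Debugger role: choose a fix that memory does not already list as tried.
        tried = {note["fix"] for note in memory}
        remaining = [fix for fix in candidate_fixes if fix not in tried]
        if not remaining:
            break
        fix = remaining[0]

        snapshot = list(applied)   # saved state to roll back to if needed
        applied.append(fix)
        new_errors = run_build(applied)

        # Critic role: assume a dead end when the error count did not decrease.
        dead_end = len(new_errors) >= len(errors)

        # Memory role: record what was tried and what happened.
        memory.append({
            "fix": fix,
            "errors_before": len(errors),
            "errors_after": len(new_errors),
            "dead_end": dead_end,
        })

        if dead_end:
            applied = snapshot     # roll back to the previous state
        else:
            errors = new_errors

    return applied, errors, memory


applied, errors, memory = debug(["re_add_springfox", "rewrite_config_with_springdoc"])
print("applied fixes:", applied)
print("remaining errors:", errors)
for note in memory:
    print(note)
```

In this run the first attempt makes the build worse, is rolled back, and is remembered, so the second iteration moves on to a different fix instead of repeating it.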

Let’s examine how the agent handles recurring issues across multiple files with these improvements. Consider a scenario where several Java files are missing the same import statement after upgrading from Java 8 to Java 17. This happens when you upgrade from older Java collections (like Vector and Enumeration) to modern streaming operations. The system is capable of helping you update these patterns automatically. The agent is now able to intelligently detect this pattern and implement a comprehensive solution across all affected files. Suppose we have three Java files that use the java.util.stream.Collectors class, but the import is missing in each:

File1.java:

public class File1 {
    public List<String> process(List<String> input) {
        return input.stream()
            .filter(s -> s.length() > 5)
            .collect(Collectors.toList()); // Error: Cannot resolve symbol 'Collectors'
    }
}

File2.java:

public class File2 {
    public Map<String, Long> countWords(List<String> words) {
        return words.stream()
            .collect(Collectors.groupingBy(
                word -> word.toLowerCase(),
                Collectors.counting()
            )); // Error: Cannot resolve symbol 'Collectors'
    }
}

File3.java:

public class File3 {
    public String concatenate(List<String> strings) {
        return strings.stream()
            .collect(Collectors.joining(", "));
            // Error: Cannot resolve symbol 'Collectors'
    }
}

After the agent detects the common issue and applies the fix, all three files would be updated as follows:

File1.java (after fix):

import java.util.stream.Collectors;

public class File1 {
    public List<String> process(List<String> input) {
        return input.stream()
            .filter(s -> s.length() > 5)
            .collect(Collectors.toList());
    }
}    

File2.java (after fix):

import java.util.stream.Collectors;

public class File2 {
    public Map<String, Long> countWords(List<String> words) {
        return words.stream()
            .collect(Collectors.groupingBy(
                word -> word.toLowerCase(),
                Collectors.counting()));
    }
}

File3.java (after fix):

import java.util.stream.Collectors;

public class File3 {
    public String concatenate(List<String> strings) {
        return strings.stream()
            .collect(Collectors.joining(", "));
    }
}

In this example, the agent has identified that the same import statement (import java.util.stream.Collectors;) was missing in all three files. It then applied the fix consistently across all affected files, demonstrating its ability to recognize patterns and implement solutions efficiently across the entire codebase. This avoids attempting a different solution for each individual error and saves iteration budget for solving other errors, if present.
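The idea of treating many errors as one pattern can be sketched in a few lines. This is an illustration, not the agent's implementation: the error-message format is taken from the comments in the example files above, while KNOWN_IMPORTS, the helper functions, and the shortened file contents are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical lookup from a missing symbol to the import that provides it.
KNOWN_IMPORTS = {"Collectors": "java.util.stream.Collectors"}


def group_errors_by_symbol(errors):
    """Group (file, message) build errors by the unresolved symbol they mention."""
    groups = defaultdict(list)
    for file_name, message in errors:
        match = re.search(r"Cannot resolve symbol '(\w+)'", message)
        if match and file_name not in groups[match.group(1)]:
            groups[match.group(1)].append(file_name)
    return dict(groups)


def add_import(source, qualified_name):
    """Prepend an import statement unless the file already has it.

    The sample files have no package declaration; a real tool would insert
    the import after the package line.
    """
    statement = f"import {qualified_name};"
    if statement in source:
        return source
    return statement + "\n\n" + source


def fix_missing_imports(files, errors):
    """Apply one fix per missing symbol to every file that reports it."""
    fixed = dict(files)
    for symbol, file_names in group_errors_by_symbol(errors).items():
        qualified = KNOWN_IMPORTS.get(symbol)
        if qualified is None:
            continue  # unknown symbol: leave it for another strategy
        for name in file_names:
            fixed[name] = add_import(fixed[name], qualified)
    return fixed


# Shortened stand-ins for the three Java files.
files = {
    "File1.java": "public class File1 { /* uses Collectors.toList() */ }",
    "File2.java": "public class File2 { /* uses Collectors.groupingBy() */ }",
    "File3.java": "public class File3 { /* uses Collectors.joining() */ }",
}
errors = [
    ("File1.java", "Cannot resolve symbol 'Collectors'"),
    ("File2.java", "Cannot resolve symbol 'Collectors'"),
    ("File2.java", "Cannot resolve symbol 'Collectors'"),
    ("File3.java", "Cannot resolve symbol 'Collectors'"),
]

groups = group_errors_by_symbol(errors)
fixed = fix_missing_imports(files, errors)
print(groups)
print(fixed["File1.java"])
```

Four reported errors collapse into a single group, so one decision (add the Collectors import) resolves all of them in one pass.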

The contrast between the existing debugger and the enhanced agent is clearer when handling complex, interconnected changes. For instance, in updating Springfox Swagger from 2.0 to 3.0 (OpenAPI), both systems initially made similar changes. However, when faced with subsequent errors, their approaches diverged significantly. Consider this scenario:

Initially, both systems removed Springfox dependencies:

<!-- Removed by both systems -->
<dependency>
    <groupId>io.springfox</groupId>
    <artifactId>springfox-swagger2</artifactId>
    <version>2.9.2</version>
</dependency>

Later, when encountering a “missing symbol: Docket” error, the existing debugger attempted to reintroduce Springfox:

<!-- existing debugger trying to add back Springfox -->
<dependency>
    <groupId>io.springfox</groupId>
    <artifactId>springfox-boot-starter</artifactId>
    <version>3.0.0</version>
</dependency>

In contrast, our agent recognized that the error was a consequence of the previous removal and rewrote the file using SpringDoc OpenAPI:

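import org.springdoc.core.GroupedOpenApi;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;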

@Configuration
public class SwaggerConfig {
    @Bean
    public GroupedOpenApi publicApi() {
        return GroupedOpenApi.builder()
                .group("springshop-public")
                .pathsToMatch("/public/**")
                .build();
    }
}   

These latest improvements in our debug AI have yielded positive results. By incorporating multi-error context analysis, additional tooling for multi-file solutions, and inter-iteration memory, the agent now delivers more comprehensive and consistent codebase upgrades. We tested our new approach on 62 large open-source applications, some containing over 100,000 lines of code, incorporating more than 100 open-source libraries. The results showed an 85% higher success rate compared to the previous approach. These enhancements significantly boost both the quality and efficiency of code transformation, marking a substantial leap forward in automated application modernization for Java.

Conclusion

With the latest improvements, Q Developer continues to accelerate the journey to modernize Java applications across your organization. For more context, please refer to the blog “Accelerate application upgrades with Amazon Q Developer agent for code transformation.”

As we continue to innovate in code transformation use cases, this release creates the foundation to expand language support, further enhance AI-driven problem-solving algorithms, and streamline the integration with development workflows. Our goal remains to provide developers and organizations with cutting-edge tools that simplify complex maintenance and modernization processes and foster the adoption of modern, cloud-native architectures. Stay tuned for future updates as we push the boundaries of AI-assisted code transformation.

About the authors

Omer Tripp

Omer heads the Q Code Transformation science team. His research work is at the intersection of programming languages and AI/ML, emphasizing developer productivity and acceleration as well as software security and reliability. Outside of work, Omer likes to stay physically active (through tennis, basketball, skiing, and various other activities), as well as tour the US and the world with his family.

Jonathan Vogel

Jonathan is a Developer Advocate at AWS. He was a DevOps Specialist Solutions Architect at AWS for two years prior to taking on the Developer Advocate role. Prior to AWS, he practiced professional software development for over a decade. Jonathan enjoys music, birding and climbing rocks.

Yiyi Guo

Yiyi is a Senior Product Manager at AWS working on Amazon Q Developer agent for code transformation. She focuses on leveraging generative AI to accelerate enterprise application modernization.

Elio Damaggio

Elio Damaggio is the product lead for the transformation capabilities of Amazon Q Developer. With more than 15 years in tech, 11 patents, and a PhD in Computer Science, he is now looking for exciting ways to empower developers through AI.

Special thanks to the scientists on the Q Developer team who helped to provide input to this blog: Talha Oz and Zeren Shui.

AWS named as a leader again in the Gartner Magic Quadrant for Distributed Hybrid Infrastructure

Post Syndicated from Channy Yun (윤석찬) original https://aws.amazon.com/blogs/aws/aws-named-as-a-leader-again-in-the-gartner-magic-quadrant-for-distributed-hybrid-infrastructure/

Gartner published the second Magic Quadrant for Distributed Hybrid Infrastructure (DHI), which includes Amazon Web Services (AWS) as a leader again. AWS has three products in this DHI portfolio: AWS Outposts, AWS Snowball, and AWS Local Zones. In the accompanying Gartner’s Critical Capabilities for DHI, AWS is ranked number one in four out of six use cases evaluated by Gartner—including hybrid infrastructure management, edge computing, assured workloads, and artificial intelligence & machine learning (AI/ML)—and among the top two in the use case of container management.

Gartner evaluates 10 DHI providers based on their Ability to Execute, which measures a vendor’s capacity to deliver its products or services effectively, and Completeness of Vision, which assesses a vendor’s understanding of the market and its strategy for future growth.

Here is the graphical representation of the 2024 Gartner Magic Quadrant for DHI.

Gartner recognized AWS strengths as:

  • Leading public cloud provider – AWS DHI solutions appeal to AWS public cloud customers that want to extend their infrastructure to their data center and edge locations, while also migrating from their remaining private cloud infrastructure.
  • As-a-service delivery – The fully managed infrastructure delivery of AWS Outposts simplifies operations and enables a hands-off, single-vendor approach to infrastructure management, including integration with some on-premises technologies.
  • AWS support – Gartner clients report high satisfaction with the AWS worldwide support and services team.

We believe this leader placement reflects our innovation at the edge of the cloud for workloads that require low latency, local data processing, data residency, or migration with on-premises interdependencies. At AWS, we extend the same AWS infrastructure, AWS services, APIs, and tools wherever you need them for a truly consistent cloud experience.

Whether your workloads are running in the AWS Regions, in metro areas with AWS Local Zones, on premises with AWS Outposts, in the telco networks with AWS Wavelength, or at the far edge with AWS Snow Family, you can standardize on the same cloud operating model for all your applications. You can streamline developer workflow by standardizing on a common set of continuous integration and continuous deployment (CI/CD) pipelines. Standardizing in this way also reduces the time, resources, operational risk, and maintenance downtime required to manage IT infrastructure.

As examples of accelerated innovation, we have added the latest generation of GPU-backed instances to Local Zones to better support ML workloads and expanded the number of locations. We have made Outposts available in more countries and added AWS services supported on Outposts to facilitate migration and disaster recovery, such as AWS Elastic Disaster Recovery and Amazon Route 53 Resolver to improve application availability and performance.

In addition, we have improved the disconnection tolerance for container-based workloads on Outposts by making it possible for customers to run both the Kubernetes control plane and nodes locally, and we enhanced its capabilities for multi-rack Outposts deployments.

Access the complete 2024 Gartner Magic Quadrant for DHI report to learn more.

Channy

Gartner does not endorse any vendor, product or service depicted in its research publications and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.

GARTNER is a registered trademark and service mark of Gartner and Magic Quadrant is a registered trademark of Gartner, Inc. and/or its affiliates in the U.S. and internationally and are used herein with permission. All rights reserved.

Run high-availability long-running clusters with Amazon EMR instance fleets

Post Syndicated from Garima Arora original https://aws.amazon.com/blogs/big-data/run-high-availability-long-running-clusters-with-amazon-emr-instance-fleets/

AWS now supports high availability Amazon EMR on EC2 clusters with instance fleet configuration. With high availability instance fleet clusters, you now get the enhanced resiliency and fault tolerance of high availability architecture, along with the improved flexibility and intelligence in Amazon Elastic Compute Cloud (Amazon EC2) instance selection of instance fleets. Amazon EMR is a cloud big data platform for petabyte-scale data processing, interactive analysis, streaming, and machine learning (ML) using open source frameworks such as Apache Spark, Presto and Trino, and Apache Flink. Customers love the scalability and flexibility that Amazon EMR on EC2 offers. However, like most distributed systems running mission-critical workloads, high availability is a core requirement, especially for those with long-running workloads.

In this post, we demonstrate how to launch a high availability instance fleet cluster using the newly redesigned Amazon EMR console, as well as using an AWS CloudFormation template. We also go over the basic concepts of Hadoop high availability, EMR instance fleets, the benefits and trade-offs of high availability, and best practices for running resilient EMR clusters.

High availability in Hadoop

High availability (HA) provides continuous uptime and fault tolerance for a Hadoop cluster. The core components of Hadoop, like Hadoop Distributed File System (HDFS) NameNode and YARN ResourceManager, are single points of failure in clusters with a single primary node. In the event that any of them crash, the entire cluster goes down. High Availability removes this single point of failure by introducing redundant standby nodes that can quickly take over if the primary node fails.

In a high availability EMR cluster, one node serves as the active NameNode that handles client operations, and others act as standby NameNodes. The standby NameNodes constantly synchronize their state with the active one, enabling seamless failover to maintain service availability. To learn more, see Supported applications in an Amazon EMR Cluster with multiple primary nodes.

Key instance fleet differentiations

Amazon EMR recommends using the instance fleet configuration option for provisioning EC2 instances in EMR clusters because it offers a flexible and robust approach to cluster provisioning. Some key advantages include:

  • Flexible instance provisioning – Instance fleets provide a powerful and simple way to specify up to five EC2 instance types on the Amazon EMR console, or up to 30 when using the AWS Command Line Interface (AWS CLI) or API with an allocation strategy. This enhanced diversity helps optimize for cost and performance while increasing the likelihood of fulfilling capacity requirements.
  • Target capacity management – You can specify target capacities for On-Demand and Spot Instances for each fleet. Amazon EMR automatically manages the mix of instances to meet these targets, reducing operational overhead.
  • Improved availability – By spanning multiple instance types and purchasing options such as On-Demand and Spot, instance fleets are more resilient to capacity fluctuations in specific EC2 instance pools.
  • Enhanced Spot Instance handling – Instance fleets offer superior management of Spot Instances, including the ability to set timeouts and specify actions if Spot capacity can’t be provisioned.
  • Reliable cluster launches – You can configure your instance fleet to select multiple subnets for different Availability Zones, allowing Amazon EMR to find the best combination of instances and purchasing options across these zones to launch your cluster in. Amazon EMR will identify the best Availability Zone based on your configuration and available EC2 capacity and launch the cluster.

Prerequisites

Before you launch the high availability EMR instance fleet clusters, make sure you have the following:

  • Latest Amazon EMR release – We recommend that you use the latest Amazon EMR release to benefit from the highest level of resiliency and stability for your high availability clusters. High availability for instance fleets is supported with Amazon EMR releases 5.36.1, 6.8.1, 6.9.1, 6.10.1, 6.11.1, 6.12.0, and later.
  • Supported applications – High availability for instance fleets is supported for applications such as Apache Spark, Presto, Trino, and Apache Flink. Refer to Supported applications in an Amazon EMR Cluster with multiple primary nodes for the complete list of supported applications and their failover processes.

Launch a high availability instance fleet cluster using the Amazon EMR console

Complete the following steps on the Amazon EMR console to configure and launch a high availability EMR cluster with instance fleets:

  1. On the Amazon EMR console, create a new cluster.
  2. For Name, enter a name.
  3. For Amazon EMR release, choose the Amazon EMR release that supports high availability clusters with instance fleets. The setting will default to the latest available Amazon EMR release.

CreateHACluster-EMRRelease

  4. Under Cluster configuration, choose the desired instance types for the primary fleet. (You can select up to five when using the Amazon EMR console.)
  5. Select Use high availability to launch the cluster with three primary nodes.

CreateHACluster

  6. Choose the instance types and target On-Demand and Spot size for the core and task fleet according to your requirements.

InstanceFleet-CreateFleets

  7. Under Allocation strategy, select Apply allocation strategy.
    1. We recommend that you select Price-capacity optimized as the allocation strategy for your cluster for faster cluster provisioning, more accurate Spot Instance allocation, and fewer Spot Instance interruptions.
  8. Under Networking, you can choose multiple subnets for different Availability Zones. This allows Amazon EMR to look across those subnets and launch the cluster in an Availability Zone that best suits your instance and purchasing option requirements.

allocationStrategy

  9. Review your cluster configuration and choose Create cluster.

Amazon EMR will launch your cluster in a few minutes. You can view the cluster details on the Amazon EMR console.
ClusterDetailPage

Launch a high availability cluster with AWS CloudFormation

To launch a high availability cluster using AWS CloudFormation, complete the following steps:

  1. Create a CloudFormation template with EMR resource type AWS::EMR::Cluster and JobFlowInstancesConfig property types MasterInstanceFleet, CoreInstanceFleet, and (optionally) TaskInstanceFleets. To launch a high availability cluster, configure TargetOnDemandCapacity=3 and TargetSpotCapacity=0 for the primary instance fleet, and WeightedCapacity=1 for each instance type configured for that fleet. See the following code:
{
  "AWSTemplateFormatVersion": "2010-09-09",
  "Resources": {
    "cluster": {
      "Type": "AWS::EMR::Cluster",
      "Properties": {
        "Instances": {
          "Ec2SubnetIds": [
            "subnet-003c889b8379f42d1",
            "subnet-0382aadd4de4f5da9",
            "subnet-078fbbb77c92ab099"
          ],
          "MasterInstanceFleet": {
            "Name": "HAPrimaryFleet",
            "TargetOnDemandCapacity": 3,
            "TargetSpotCapacity": 0,
            "InstanceTypeConfigs": [
              {
                "InstanceType": "m5.xlarge",
                "WeightedCapacity": 1
              },
              {
                "InstanceType": "m5.2xlarge",
                "WeightedCapacity": 1
              },
              {
                "InstanceType": "m5.4xlarge",
                "WeightedCapacity": 1
              }
            ]
          },
          "CoreInstanceFleet": {
            "Name": "cfnCore",
            "InstanceTypeConfigs": [
              {
                "InstanceType": "m5.xlarge",
                "WeightedCapacity": 1
              },
              {
                "InstanceType": "m5.2xlarge",
                "WeightedCapacity": 2
              },
              {
                "InstanceType": "m5.4xlarge",
                "WeightedCapacity": 4
              }
            ],
            "LaunchSpecifications": {
              "SpotSpecification": {
                "TimeoutAction": "SWITCH_TO_ON_DEMAND",
                "TimeoutDurationMinutes": 20,
                "AllocationStrategy": "PRICE_CAPACITY_OPTIMIZED"
              }
            },
            "TargetOnDemandCapacity": "4",
            "TargetSpotCapacity": 0
          },
          "TaskInstanceFleets": [
            {
              "Name": "cfnTask",
              "InstanceTypeConfigs": [
                {
                  "InstanceType": "m5.xlarge",
                  "WeightedCapacity": 1
                },
                {
                  "InstanceType": "m5.2xlarge",
                  "WeightedCapacity": 2
                },
                {
                  "InstanceType": "m5.4xlarge",
                  "WeightedCapacity": 4
                }
              ],
              "LaunchSpecifications": {
                "SpotSpecification": {
                  "TimeoutAction": "SWITCH_TO_ON_DEMAND",
                  "TimeoutDurationMinutes": 20,
                  "AllocationStrategy": "PRICE_CAPACITY_OPTIMIZED"
                }
              },
              "TargetOnDemandCapacity": "0",
              "TargetSpotCapacity": 4
            }
          ]
        },
        "Name": "TestHACluster",
        "ServiceRole": "EMR_DefaultRole",
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ReleaseLabel": "emr-6.15.0",
        "PlacementGroupConfigs": [
          {
            "InstanceRole": "MASTER",
            "PlacementStrategy": "SPREAD"
          }
        ]
      }
    }
  }
}

Make sure to use an Amazon EMR release that supports high availability clusters with instance fleets.
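
Before creating the stack, it can help to check the template against the primary fleet settings described above. The following is a minimal sketch, not part of the original walkthrough: the function name check_ha_primary_fleet is hypothetical, and it assumes the template uses the same resource name ("cluster") and layout as the preceding example.

```python
def check_ha_primary_fleet(template: dict, resource_name: str = "cluster") -> list:
    """Return a list of problems found in the primary fleet settings.

    An empty list means the fleet matches the high availability settings
    described in the post (3 On-Demand units, 0 Spot units, weight 1).
    """
    instances = template["Resources"][resource_name]["Properties"]["Instances"]
    fleet = instances["MasterInstanceFleet"]
    problems = []

    # Values may be written as integers or as strings, so normalize with int().
    if int(fleet.get("TargetOnDemandCapacity", 0)) != 3:
        problems.append("TargetOnDemandCapacity must be 3")
    if int(fleet.get("TargetSpotCapacity", 0)) != 0:
        problems.append("TargetSpotCapacity must be 0")

    # Each primary instance type should count as exactly one capacity unit.
    # Assumption: a missing WeightedCapacity is treated as 1.
    for cfg in fleet.get("InstanceTypeConfigs", []):
        if int(cfg.get("WeightedCapacity", 1)) != 1:
            problems.append(f"{cfg['InstanceType']}: WeightedCapacity must be 1")

    return problems
```

To use it, load cfn-template.json with json.load and pass the resulting dictionary to the function. An empty result means the primary fleet settings match the values given in step 1.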

  2. Create a CloudFormation stack with the preceding template:
aws cloudformation create-stack --stack-name HAInstanceFleetCluster --template-body file://cfn-template.json --region us-east-1
  3. Retrieve the cluster ID from the list-clusters response to use in the following steps. You can further filter this list based on filters like cluster status, creation date, and time.
aws emr list-clusters --query "Clusters[?Name=='<YourClusterName>']"
  4. Run the following describe-cluster command:
aws emr describe-cluster --cluster-id j-XXXXXXXXXXX --region us-east-1

If the high availability cluster was launched successfully, the describe-cluster response will return the state of the primary fleet as RUNNING and provisionedOnDemandCapacity as 3. By this point, all three primary nodes have been started successfully.

DescribeClusterResponse

Primary node failover with High Availability clusters

To fetch information on all EC2 instances for an instance fleet, use the list-instances command:

aws emr list-instances --cluster-id j-XXXXXXXXXXX --instance-fleet-type MASTER --region us-east-1

For high availability clusters, it will return three instances in RUNNING state for the primary fleet and other attributes like public and private DNS names.
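
If you want to script this check, the list-instances output can be parsed directly. The following is a minimal sketch under stated assumptions: the function names are hypothetical, and the code relies only on the Instances, Ec2InstanceId, and Status.State fields of the JSON response.

```python
import json


def running_primary_instances(list_instances_json: str) -> list:
    """Return the EC2 instance IDs of instances reported in the RUNNING state."""
    response = json.loads(list_instances_json)
    return [
        inst["Ec2InstanceId"]
        for inst in response.get("Instances", [])
        if inst.get("Status", {}).get("State") == "RUNNING"
    ]


def primary_fleet_is_healthy(list_instances_json: str, expected: int = 3) -> bool:
    """True when exactly `expected` primary instances are RUNNING."""
    return len(running_primary_instances(list_instances_json)) == expected
```

You could pipe the output of the preceding list-instances command into a script that calls primary_fleet_is_healthy. Instances in other states (for example, a replaced primary node) are ignored by the count.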

PrimaryInstance-DescribeCluster

The following screenshot shows the instance fleet status on the Amazon EMR console.

Instancefleet status

Let’s examine two cases for primary node failover.

Case 1: One of the three primary instances is accidentally stopped

When an EC2 instance is accidentally stopped by a user, Amazon EMR detects this and performs a failover for the stopped primary node. Amazon EMR also attempts to launch a new primary node with the same private IP and DNS name to restore the quorum. During this failover, the cluster remains fully operational, providing true resiliency to single primary node failures.

The following screenshots illustrate the instance fleet details.

InstanceFleetDetail-PrimaryInstanceTerminated

instanceFleerRecovery

This automatic recovery for primary nodes is also reflected in the MultiMasterInstanceGroupNodesRunning or MultiMasterInstanceGroupNodesRunningPercentage Amazon CloudWatch metric emitted by Amazon EMR for your cluster. The following screenshot shows an example of these metrics.

CloudwatchMetrics

Case 2: One of the three primary instances becomes unhealthy

If Amazon EMR continuously receives failures when trying to connect to a primary instance, the instance is deemed unhealthy and Amazon EMR will attempt to replace it. Similar to case 1, Amazon EMR will perform a failover for the unhealthy primary node and also attempt to launch a new primary node with the same private IP and DNS name to restore the quorum.

UnhealthyPrimaryInstance
PrimaryInstanceFailover-2

If you list the instances for the primary fleet, the response will include information for the EC2 instance that was stopped by the user and the new primary instance that replaced it with the same private IP and DNS name.
DescribeClusterResponse-instanceFailover

The following screenshot shows an example of the CloudWatch metrics.

An instance can have connection failures for multiple reasons, including but not limited to insufficient disk space on the instance, critical cluster daemons such as the instance controller shutting down with errors, and high CPU utilization. Amazon EMR is continuously improving its health monitoring criteria to better identify unhealthy nodes on an EMR cluster.

Considerations and best practices

The following are some of the key considerations and best practices for using EMR instance fleets to launch a high availability cluster with multiple primary nodes:

  • Use the latest EMR release – With the latest EMR releases, you get the highest level of resiliency and stability for your high availability EMR clusters with multiple primary nodes.
  • Configure subnets for high availability – Amazon EMR can’t replace a failed primary node if the subnet is oversubscribed (there aren’t any available private IP addresses in the subnet). This results in a cluster failure as soon as the second primary node fails. Limited availability of IP addresses in a subnet can also result in cluster launch or scaling failures. To avoid such scenarios, we recommend that you dedicate an entire subnet to an EMR cluster.
  • Configure core nodes for enhanced data availability – To minimize the risk of local HDFS data loss on your production clusters, we recommend that you set the dfs.replication parameter to 3 and launch at least four core nodes. Setting dfs.replication to 1 on clusters with fewer than four core nodes can lead to data loss if a single core node goes down. For clusters with three or fewer core nodes, set the dfs.replication parameter to at least 2 to achieve sufficient HDFS data replication. For more information, see HDFS configuration.
  • Use an allocation strategy – We recommend enabling an allocation strategy option for your instance fleet cluster to provide faster cluster provisioning, more accurate Spot Instance allocation, and fewer Spot Instance interruptions.
  • Set alarms for monitoring primary nodes – You should monitor the health and status of primary nodes of your long-running clusters to maintain smooth operations. Configure alarms using CloudWatch metrics such as MultiMasterInstanceGroupNodesRunning, MultiMasterInstanceGroupNodesRunningPercentage, or MultiMasterInstanceGroupNodesRequested.
  • Integrate with EC2 placement groups – You can also choose to protect primary instances against hardware failures by using a placement group strategy for your primary fleet. This will spread the three primary instances across separate underlying hardware to avoid loss of multiple primary nodes at the same time in the event of a hardware failure. See Amazon EMR integration with EC2 placement groups for more details.
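
As an illustration of the alarm recommendation above, the following minimal sketch builds the parameters for a CloudWatch alarm that fires when fewer than three primary nodes are running. The function name, alarm name, period, and statistic are assumptions chosen for illustration. The sketch also assumes that the metric is published in the AWS/ElasticMapReduce namespace with the JobFlowId dimension.

```python
def primary_node_alarm_params(cluster_id: str, expected_nodes: int = 3) -> dict:
    """Build keyword arguments for CloudWatch put_metric_alarm.

    The alarm triggers when the minimum number of running primary nodes
    over one period drops below `expected_nodes`.
    """
    return {
        "AlarmName": f"emr-{cluster_id}-primary-nodes-running",  # hypothetical naming scheme
        "Namespace": "AWS/ElasticMapReduce",
        "MetricName": "MultiMasterInstanceGroupNodesRunning",
        "Dimensions": [{"Name": "JobFlowId", "Value": cluster_id}],
        "Statistic": "Minimum",
        "Period": 300,  # seconds; an assumed evaluation window
        "EvaluationPeriods": 1,
        "Threshold": expected_nodes,
        "ComparisonOperator": "LessThanThreshold",
    }
```

With boto3 installed and credentials configured, you could pass the result to boto3.client("cloudwatch").put_metric_alarm(**params), and add AlarmActions to route notifications to your preferred target.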

When setting up a high availability instance fleet cluster with Amazon EMR on EC2, it’s important to understand that all EMR nodes, including the three primary nodes, are launched within a single Availability Zone. Although this configuration maintains high availability within that Availability Zone, it also means that the entire cluster can’t tolerate an Availability Zone outage. To mitigate the risk of cluster failures due to Spot Instance reclamation, Amazon EMR launches the primary nodes using On-Demand instances, providing an additional layer of reliability for these critical components of the cluster.

Conclusion

This post demonstrated how you can use high availability with EMR on EC2 instance fleets to enhance the resiliency and reliability of your big data workloads. By using instance fleets with multiple primary nodes, EMR clusters can withstand failures and maintain uninterrupted operations, while providing enhanced instance diversity and better Spot capacity management within a single Availability Zone. You can quickly set up these high availability clusters using the Amazon EMR console or AWS CloudFormation, and monitor their health using CloudWatch metrics.

To learn more about the supported applications and their failover process, see Supported applications in an Amazon EMR Cluster with multiple primary nodes. To get started with this feature and launch a high availability EMR on EC2 cluster, refer to Plan and configure primary nodes.


About the Authors

Garima Arora is a Software Development Engineer for Amazon EMR at Amazon Web Services. She specializes in capacity optimization and helps build services that allow customers to run big data applications and petabyte-scale data analytics faster. When not hard at work, she enjoys reading fiction novels and watching anime.

Ravi Kumar is a Senior Product Manager Technical-ES (PMT) at Amazon Web Services, specializing in building exabyte-scale data infrastructure and analytics platforms. With a passion for building innovative tools, he helps customers unlock valuable insights from their structured and unstructured data. Ravi’s expertise lies in creating robust data foundations, using open-source technologies and advanced cloud computing, that power advanced artificial intelligence and machine learning use cases. A recognized thought leader in the field, he advances the data and AI ecosystem through pioneering solutions and collaborative industry initiatives. As a strong advocate for customer-centric solutions, Ravi constantly seeks ways to simplify complex data challenges and enhance user experiences. Outside of work, Ravi is an avid technology enthusiast who enjoys exploring emerging trends in data science, cloud computing, and machine learning.

Tarun Chanana is a Software Development Manager for Amazon EMR at Amazon Web Services.

How Volkswagen Autoeuropa built a data solution with a robust governance framework, simplifying access to quality data using Amazon DataZone

Post Syndicated from Dhrubajyoti Mukherjee original https://aws.amazon.com/blogs/big-data/how-volkswagen-autoeuropa-built-a-data-solution-with-a-robust-governance-framework-simplifying-access-to-quality-data-using-amazon-datazone/

This is a joint post co-authored with Martin Mikoleizig from Volkswagen Autoeuropa.

This is the second post of a two-part series that details how Volkswagen Autoeuropa, a Volkswagen Group plant, together with AWS, built a data solution with a robust governance framework using Amazon DataZone to become a data-driven factory. Part 1 of this series focused on the customer challenges, overall solution architecture and solution features, and how they helped Volkswagen Autoeuropa overcome their challenges. This post dives into the technical details, highlighting the robust data governance framework that enables ease of access to quality data using Amazon DataZone.

At Amazon, we work backward, a systematic way to vet ideas and create new products. The key tenet of this approach is to start by defining the customer experience, then iteratively work backward from that point until the team achieves clarity of thought around what to build. The first section of this post discusses how we aligned the technical design of the data solution with the data strategy of Volkswagen Autoeuropa. Next, we detail the governance guardrails of the Volkswagen Autoeuropa data solution. Finally, we highlight the key business outcomes.

Aligning the solution with the data strategy

At an early stage of the project, the Volkswagen Autoeuropa and AWS team determined that a data mesh architecture for the data solution aligned with Volkswagen Autoeuropa’s vision of becoming a data-driven factory. With this in mind, the team implemented the following steps:

  • Define data domains – In a workshop, the team identified the data landscape and its distribution in Volkswagen Autoeuropa. Next, the team grouped the data assets of the organization along the lines of business and defined the data domains. Because Volkswagen Autoeuropa is at an early stage of their data mesh journey, defining data domains along the lines of business is the recommended approach. As the data solution evolves, Volkswagen Autoeuropa might consider other criteria such as business subdomains to define data domains. The team defined more than five data domains, such as production, quality, logistics, planning, and finance.
  • Identify pioneer cases – The team identified two pioneer use cases to onboard to the data solution first, to validate its business value. The first use case helps predict test results during the car assembly process. The second use case enables the creation of reports containing shop floor key metrics for different management levels. The following criteria were considered to identify these use cases:
    • Use cases that deliver measurable business value for Volkswagen Autoeuropa.
    • Use cases with high AWS maturity.
    • Use cases whose requirements can be met with the first release version of the data solution.
  • Onboard key data products – The team identified the key data products that enabled these two use cases and aligned to onboard them into the data solution. These data products belonged to data domains such as production, finance, and logistics. In addition, the team aligned on business metadata attributes that would help with data discovery. The data products are classified as either source-based data or consumer-based data. Source-based data is the unaltered, raw data that is generated from source systems (for example, quality data, safety data) and is useful for other business use cases. Consumer-based data is the aggregated and transformed data from source systems. Reuse of consumer-based data saves cost in extract, transform, and load (ETL) implementation and system maintenance.

In addition to the preceding steps, the team established a data quality framework to improve the quality of the data products registered in the data solution. The following table shows the mapping of the data mesh-based solution components to Amazon DataZone and AWS Glue features. The table also provides generic examples of the components in the automotive industry.

Data Solution Components | AWS Service Features | Generic Examples
Data domains | Amazon DataZone projects and Amazon DataZone domain units | Production, logistics
Use cases | Amazon DataZone projects | Smart manufacturing, predictive maintenance
Data products | Amazon DataZone assets | Sales data, sensor data
Business metadata | Amazon DataZone glossaries and metadata forms | Data product owner information, data refresh frequency
Data quality framework | AWS Glue Data Quality | A quality score of 92%

Empowering teams with a governance framework

This section discusses the governance framework that was put in place to empower the teams at Volkswagen Autoeuropa by enhancing their analytics journey. It highlights the guardrails that enable ease of access to quality data.

Business metadata

Business metadata helps users understand the context of the data, which can lead to increased trust in the data. Moreover, establishing a common set of attributes of the data products promotes a consistent experience for the users. In addition to the business context, at Volkswagen Autoeuropa, the metadata includes information related to data classification and whether the data contains personally identifiable information (PII). The data solution uses Amazon DataZone glossaries and metadata forms to provide business context to its data. Beyond these benefits, using the appropriate keywords in Amazon DataZone glossary terms and metadata forms makes data products easier to search and filter in the Amazon DataZone data portal.

Data quality framework

The data quality framework is a comprehensive solution designed to streamline the process of data quality checks and publishing a quality score. It uses AWS Glue Data Quality to generate recommendation rulesets, run orchestrated jobs, store results, and send notifications. This framework can be seamlessly integrated into an AWS Glue job, providing a quality score for data pipeline jobs. The quality score of a data product is published in the Amazon DataZone data portal for consumers to evaluate. The key components of the solution are as follows:

  • Recommendation ruleset generation – The framework generates tailored rulesets based on metadata from the AWS Glue Data Catalog table, providing relevant and comprehensive quality checks.
  • Orchestrated job execution – Jobs are run in AWS Step Functions to perform data quality checks using the generated rulesets against data sources, evaluating data quality based on defined rules and criteria.
  • Result storage and notification – Results, including quality scores, quality status, and rulesets checked, are stored in an Amazon Simple Storage Service (Amazon S3) bucket, maintaining a historical record. End-users receive notifications with relevant details.
  • Data quality score publishing – The quality scores are published in the Amazon DataZone data portal, enabling consumers to access and evaluate data quality.
  • Subscription and quality score requirements – Consumers can subscribe to data sources or targets based on their desired quality score thresholds, making sure they receive data that meets their specific needs and standards.
  • Integration and extensibility – The framework is designed for seamless integration into existing AWS Glue jobs or data pipelines and provides a flexible and extensible architecture for customization and enhancement.
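
To make the score and threshold idea concrete, the following is a minimal sketch, not the framework’s actual implementation. It assumes that the quality score is the percentage of ruleset rules that passed, which the post does not state explicitly. The function names and the example rule names are hypothetical.

```python
def quality_score(rule_outcomes: dict) -> float:
    """Return the percentage of rules that passed.

    `rule_outcomes` maps a rule description to True (passed) or False (failed).
    """
    if not rule_outcomes:
        raise ValueError("at least one rule outcome is required")
    passed = sum(1 for ok in rule_outcomes.values() if ok)
    return 100.0 * passed / len(rule_outcomes)


def meets_threshold(score: float, threshold: float) -> bool:
    """True when a data product's score satisfies a consumer's minimum score."""
    return score >= threshold
```

For example, a data product with three of four rules passing would score 75.0 and would not satisfy a consumer that requires a threshold of 92.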

Federated governance

Federated governance empowers producer and consumer teams to operate independently while adhering to a central governance model. For the data solution at Volkswagen Autoeuropa, this meant a centralized team defined the governance guardrails and decentralized data teams employed those guardrails. The following are a few examples of how the team established federated governance in Volkswagen Autoeuropa:

  • Management of Amazon DataZone glossaries and metadata forms – In this mechanism, the Volkswagen Autoeuropa IT team defined the Amazon DataZone glossaries and metadata forms centrally. The data teams used them to publish the data assets in Amazon DataZone. This provides consistency of business metadata across the organization. The following figure explains the process.
    The workflow in the Amazon DataZone data portal consists of the following steps:
    1. The data solution administrator belonging to the Volkswagen Autoeuropa IT team aligns with stakeholders such as data producers, data consumers, and source system owners, and maintains the business metadata using the Amazon DataZone glossaries and metadata forms.
    2. The producer project teams use the Amazon DataZone glossary terms and fill the Amazon DataZone metadata forms to enrich the inventory assets.
    3. After the business metadata is populated, the team publishes the assets in the Amazon DataZone data portal.
  • Management of Amazon DataZone project membership – In this scenario, the management of Amazon DataZone project membership is delegated to a designated administrator of the project. The following figure explains the process.
    The workflow consists of the following steps:
    1. The data solution administrator belonging to the Volkswagen Autoeuropa IT team provisions the Amazon DataZone project and environment using automation. The data solution administrator is the owner of the project.
    2. The data solution administrator delegates the management of the Amazon DataZone project membership to a designated administrator by assigning the owner role.
    3. The Amazon DataZone project administrator assigns the contributor role to eligible users.
    4. The users access the Amazon DataZone project and its assets from the Amazon DataZone data portal.
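The membership delegation workflow above could be automated along the following lines. This is a minimal sketch, not the team's automation: it only builds the request parameters, on the assumption that they are later passed to the Amazon DataZone `CreateProjectMembership` API (for example through the boto3 `datazone` client). The identifiers used are placeholders.

```python
# Designations corresponding to the owner and contributor roles in the workflow.
ALLOWED_DESIGNATIONS = {"PROJECT_OWNER", "PROJECT_CONTRIBUTOR"}


def membership_request(domain_id, project_id, user_id, designation):
    """Build keyword arguments for a project membership call.

    Assumption: the resulting dict is passed to something like
    boto3.client("datazone").create_project_membership(**request).
    """
    if designation not in ALLOWED_DESIGNATIONS:
        raise ValueError(f"unsupported designation: {designation}")
    return {
        "domainIdentifier": domain_id,
        "projectIdentifier": project_id,
        "designation": designation,
        "member": {"userIdentifier": user_id},
    }


# Step 2: the data solution administrator delegates ownership (placeholder IDs).
owner_request = membership_request(
    "example-domain-id", "example-project-id", "example-admin-user", "PROJECT_OWNER"
)

# Step 3: the project administrator adds an eligible user as a contributor.
contributor_request = membership_request(
    "example-domain-id", "example-project-id", "example-data-user", "PROJECT_CONTRIBUTOR"
)
```

Keeping request construction separate from the API call makes the delegation rules easy to review and test without AWS credentials.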

Authentication and authorization

The Amazon DataZone portal supports two types of authorizations: AWS Identity and Access Management (IAM) roles and AWS IAM Identity Center users. The data solution supports both of these authorization methods. The choice of authentication mechanism is a function of the type of authorization used for Amazon DataZone.

For IAM role authorization, an IAM role is created for each user, incorporating a prefix. Each data solution user role has a permission to list the Amazon DataZone domains (datazone:ListDomains) and to get the data portal login URL (datazone:GetIamPortalLoginUrl) in the Amazon DataZone AWS account. For reasons that are out of scope for this post, there could only be three SAML federated roles in an AWS account in the customer environment. As such, the team didn’t have a dedicated SAML federated role for each Amazon DataZone user. The data solution user role implemented a trust policy allowing the user’s AWS Security Token Service (AWS STS) federated user session principal Amazon Resource Name (ARN). If you don’t have limitations on the number of SAML federated roles per AWS account, you can make all data solution user roles SAML federated roles and update the trust policy accordingly.
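As an illustration of the IAM role setup described above, the following sketch builds the two policy documents as Python dictionaries. The two `datazone` actions come from this post; the account ID, user name, `Resource` scope, and the exact shape of the federated user principal ARN are assumptions for the example and should be adapted to your environment.

```python
import json


def portal_permissions_policy():
    """Identity policy letting a data solution user role reach the data portal."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "datazone:ListDomains",
                    "datazone:GetIamPortalLoginUrl",
                ],
                # Assumption: not scoped to a specific domain in this sketch.
                "Resource": "*",
            }
        ],
    }


def trust_policy(account_id, federated_user_name):
    """Trust policy allowing one STS federated user session to assume the role."""
    # Assumption: the principal uses the STS federated-user ARN format.
    principal_arn = (
        f"arn:aws:sts::{account_id}:federated-user/{federated_user_name}"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "sts:AssumeRole",
            }
        ],
    }


# Placeholder account ID and user name, for illustration only.
permissions_json = json.dumps(portal_permissions_policy(), indent=2)
trust_json = json.dumps(trust_policy("111122223333", "example-user"), indent=2)
```

The resulting JSON strings could be supplied when creating each user role, for example through infrastructure as code.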

For IAM Identity Center authorization, the configuration is done either at the AWS Organizations level or AWS account level in IAM Identity Center. Because there are currently no APIs available for identity source configuration in IAM Identity Center, the team followed the appropriate instructions to configure the identity source on the AWS Management Console.

After the chosen authorization option is activated, Amazon DataZone administrators grant the IAM principals (IAM role or IAM Identity Center user) access to the Amazon DataZone portal. For more details, refer to Manage users in the Amazon DataZone console.

Business outcomes

Volkswagen Autoeuropa and AWS established an iterative mechanism to enable the continuous growth of the data solution. This iterative improvement is expressed as a flywheel as shown in the following figure.

The outcome of each component of the flywheel powers the next component, creating a virtuous cycle. The data solution flywheel consists of five components:

  1. Data solution growth – The primary focus of the flywheel is to accelerate the growth of the data solution. This growth is measured by metrics such as number of data products, number of use cases onboarded into the solution, and number of users.
  2. Enhancing user experience – This component focuses on enhancing the user experience of the data solution. One way to measure the user experience is through user feedback surveys.
  3. Data solution use cases – Improved, positive user experience with the data solution contributes to the increased number of use cases that want to onboard the data solution.
  4. Data producers and consumers – As the number of use cases increases, so does the number of data producers and consumers. Data producers make data available to power the use cases. Data consumers use the data to drive the use cases.
  5. Selection of data products – After data producers onboard the data solution, they publish the assets in the Amazon DataZone data portal. This leads to a larger selection of data products. This, in turn, creates a positive experience for the data solution users.

In addition to the previous components, the positive user experience is reinforced by improving governance guardrails, increasing number of reusable assets, and maximizing operational excellence.

At the time of writing, Volkswagen Autoeuropa has reduced the time to discover data from days to minutes using the data solution, an approximately 384-fold improvement in data discovery time. Before the Volkswagen Autoeuropa and AWS collaboration, data access took several weeks. With the data solution powered by Amazon DataZone, data access time was reduced to minutes. Overall, the data solution regained between 48 hours and weeks of customer productivity over the course of a month.

The data solution powered by Amazon DataZone is driving measurable business impact for Volkswagen Autoeuropa. It enables Volkswagen Autoeuropa to deliver digital use cases faster, with less effort, and a higher overall quality. Volkswagen Autoeuropa believes that Amazon DataZone will be key in their journey to become a data-driven factory and to leverage AI.

Conclusion

This post explored how Volkswagen Autoeuropa built a robust and scalable data solution using Amazon DataZone. The first step was to align the solution with Volkswagen Autoeuropa’s overarching data strategy to drive business value.

The establishment of a comprehensive governance framework was central to this effort. This framework encompasses key components, such as business metadata, data quality, federated governance, access controls, and security, which maintain the trustworthiness and reliability of Volkswagen Autoeuropa’s data assets. The post highlighted the Volkswagen Autoeuropa data solution flywheel, showcasing how the solution enabled improved decision-making, increased operational efficiency, and accelerated digital transformation initiatives across the organization.

The data solution built at Volkswagen Autoeuropa is one of the first implementations within the Volkswagen Group and is a blueprint for other Volkswagen production plants.

“This project is a blueprint for other Volkswagen production plants. By involving the AWS team and using Amazon DataZone, we are able to govern our data centrally and make it accessible in an automated and secure way.”

– Daniel Madrid, Head of IT, Volkswagen Autoeuropa.

If you’re looking to harness the power of data mesh to drive innovation and business value within your organization, we’ve got you covered. In Strategies for building a data mesh-based enterprise solution on AWS, we dive deep into the key considerations and current recommendations to establish a robust, scalable, and well-governed data mesh on AWS. This documentation covers everything from aligning your data mesh with overall business strategy to implementing the data mesh strategy framework.

To get hands-on experience with real-world code examples, see our GitHub repository. This open source project provides a step-by-step blueprint for constructing a data mesh architecture using the powerful capabilities of Amazon DataZone, AWS Cloud Development Kit (AWS CDK), and AWS CloudFormation.


About the Authors

Dhrubajyoti Mukherjee is a Cloud Infrastructure Architect with a strong focus on data strategy, data analytics, and data governance at AWS. He uses his deep expertise to provide guidance to global enterprise customers across industries, helping them build scalable and secure AWS solutions that drive meaningful business outcomes. Dhrubajyoti is passionate about creating innovative, customer-centric solutions that enable digital transformation, business agility, and performance improvement. An active contributor to the AWS community, Dhrubajyoti authors AWS Prescriptive Guidance publications, blog posts, and open source artifacts, sharing his insights and best practices with the broader community. Outside of work, Dhrubajyoti enjoys spending quality time with his family and exploring nature through his love of hiking mountains.

Ravi Kumar is a Data Architect and Analytics expert at AWS, where he finds immense fulfilment in working with data. His days are dedicated to designing and analyzing complex data systems, uncovering valuable insights that drive business decisions. Outside of work, he unwinds by listening to music and watching movies, activities that allow him to recharge after a long day of data wrangling.

Martin Mikoleizig studied mechanical engineering and production technology at RWTH Aachen University before joining Dr. h.c. Ing. F. Porsche AG in 2015 as a production planner for the engine assembly. Over several years as a Project Manager on Testing Technology for new engine models, he also introduced several innovations like human-machine collaborations and intelligent assistance systems. Starting in 2017, he was responsible for the shop floor IT team of the module lines in Zuffenhausen before he became responsible for the planning of the E-Drive assembly at Porsche. Additionally, he was responsible for the Digitalisation Strategy of the Production Ressort at Porsche. In October 2022, he was assigned to Volkswagen Autoeuropa in Portugal in the role of a Digital Transformation Manager for the plant, driving the digital transformation towards a data-driven factory.

Weizhou Sun is a Lead Architect at AWS, specializing in digital manufacturing solutions and IoT. With extensive experience in Europe, she has enhanced operational efficiencies, reducing latency and increasing throughput. Weizhou’s expertise includes industrial computer vision, predictive maintenance, and predictive quality, consistently delivering top performance and client satisfaction. A recognized thought leader in IoT and remote driving, she has contributed to business growth through innovations and open source work. Committed to knowledge sharing, Weizhou mentors colleagues and contributes to practice development. Known for her problem-solving skills and customer focus, she delivers solutions that exceed expectations. In her free time, Weizhou explores new technologies and fosters a collaborative culture.

Ajinkya Patil is a Senior Security Architect with AWS Professional Services, specializing in security consulting for customers in the automotive industry. Since joining AWS in 2019, he has played a key role in helping automotive companies design and implement robust security solutions on AWS. Ajinkya is an active contributor to the AWS community, having presented at AWS re:Inforce and authored articles for the AWS Security Blog and AWS Prescriptive Guidance. Outside of his professional pursuits, Ajinkya is passionate about travel and photography, often capturing the diverse landscapes he encounters on his journeys.

Adjoa Taylor has over 20 years of experience in industrial manufacturing, providing industry and technology consulting services, digital transformation, and solution delivery. Currently, Adjoa leads Product Centric Digital Transformation, enabling customers in solving complex manufacturing problems using smart factory and industry-leading transformation mechanisms. Most recently, she drives value with AI/ML and generative AI use cases for the plant floor. Adjoa is an experienced leader, having spent over 20 years of her career delivering projects in countries throughout North America, Latin America, Europe, and Asia. Adjoa brings deep experience across multiple business segments with a focus on business outcome-driven solutions. Adjoa is passionate about helping customers solve problems while realizing the art of the possible through implementing value-based solutions.

How Volkswagen Autoeuropa built a data mesh to accelerate digital transformation using Amazon DataZone

Post Syndicated from Dhrubajyoti Mukherjee original https://aws.amazon.com/blogs/big-data/how-volkswagen-autoeuropa-built-a-data-mesh-to-accelerate-digital-transformation-using-amazon-datazone/

This is a joint blog post co-authored with Martin Mikoleizig from Volkswagen Autoeuropa.

Volkswagen Autoeuropa is a Volkswagen Group plant that produces the T-Roc. The plant is located near Lisbon, Portugal, and produces about 934 cars per day. In 2023, Volkswagen Autoeuropa represented 1.3% of Portugal’s national GDP and 4% of the national export of goods, with a sales volume of 3.3511 billion euros. Volkswagen Autoeuropa aims to become a data-driven factory and has been using cutting-edge technologies to enhance digitalization efforts.

In this post, we discuss how Volkswagen Autoeuropa used Amazon DataZone to build a data marketplace based on data mesh architecture to accelerate their digital transformation. The data mesh, built on Amazon DataZone, simplified data access, improved data quality, and established governance at scale to power analytics, reporting, AI, and machine learning (ML) use cases. As a result, the data solution offers benefits such as faster access to data, expeditious decision making, accelerated time to value for use cases, and enhanced data governance.

Understanding Volkswagen Autoeuropa’s challenges

At the time of writing this post, Volkswagen Autoeuropa has already implemented more than 15 successful digital use cases in the context of real-time visualization, business intelligence, industrial computer vision, and AI.

Before the AWS partnership, Volkswagen Autoeuropa faced the following challenges.

  • Long lead time to access data – The digital use cases launched by Volkswagen Autoeuropa spent most of their project time getting access to the data that was relevant to their use cases. After the right data for the use case was found, the IT team provided access to the data through manual configuration. The lead time to access data was often from several days to weeks.
  • Insufficient data governance and auditing – Data was shared directly with use cases by copying it. As a result, the IT team manually connected the data from its sources to the desired destinations multiple times. This process wasn’t centrally tracked, so no information about data sharing could be retrieved: for example, whether the data had been copied in the past, how many use cases had access to the data, when access was granted, and who granted it.
  • Redundant effort to process the same information – Because the IT team copied the data sources based on the exact use case requirements, they shared specific columns of the tables from the data. As additional use cases requested access to the same data with different column requirements, even more copies of the data were created.
  • Repeated process to establish security and governance guardrails – Each time the IT and the security team provided a connection to a new data source, they had to set up the security and governance guardrails. This required repeated manual effort.
  • Data quality issues – Because the data was processed redundantly and shared multiple times, there was no guarantee of or control over the quality of the data. This led to reduced trust in the data.
  • Absence of data catalog and metadata management – Data didn’t have any metadata associated with it, and so use cases couldn’t consume the data without further explanation from the data source owners and specialists. Furthermore, no process to discover new data existed. Similar to the consumption process, use cases would consult specialists to understand the context of the data and if it could provide value.

Envisioning a data solution for Volkswagen Autoeuropa

To address these challenges, Volkswagen Autoeuropa embarked on a bold vision. They envisioned a seamless data consumption process, similar to an online shopping experience. They envisioned a data marketplace where data users could browse and access high-quality, secure data with clear specifications, business context, and relevant attributes. This vision materialized into a project aimed at transforming data accessibility and governance as the foundation for the digital ecosystem. The vision to be realized: Data as seamless as online shopping.

In collaboration with Amazon Web Services (AWS), Volkswagen Autoeuropa joined the Enhanced Plant Onboarding Program of the Global Volkswagen Group’s Digital Production Platform (DPP EPO) strategy. Through this partnership, AWS and Volkswagen Autoeuropa created a data marketplace that significantly improved data availability.

In the discovery phase of the project, Volkswagen Autoeuropa and AWS evaluated several options to build the data solution. In the end, Volkswagen Autoeuropa chose a solution based on data mesh architecture using Amazon DataZone. Being a managed service, Amazon DataZone provided the necessary speed and agility to build the solution. At the same time, it led to higher operational efficiencies and lower operational overhead. The team adopted a data mesh architecture because the principles of the data mesh aligned with Volkswagen Autoeuropa’s vision of being a data driven factory.

Solution overview

This section describes the key features and architecture of the Volkswagen Autoeuropa data solution. The solution is based on a data mesh architecture.

Data solution features

The following figure shows the key capabilities of the Volkswagen Autoeuropa data solution.

The key capabilities of the solution are:

  • Data quality – In the solution, we’ve built a data quality framework to streamline the process of data quality checks and publishing quality scores. It uses AWS Glue Data Quality to generate recommendation rulesets, run orchestrated jobs, store results, and send notifications to users. This framework can be seamlessly integrated into AWS Glue jobs, providing a quality score for data pipeline jobs. In addition, the quality score is published in the Amazon DataZone data portal, allowing consumers to subscribe to the data based on its quality score.
    Assigning a quality score to the data helps build trust in the data, and shifts the responsibility of maintaining data quality to the data owner. As a result, the quality of the results delivered by these use cases improves.
  • Data registration – The producers sign in to the Amazon DataZone data portal using their AWS Identity and Access Management (IAM) credentials or single sign-on with integration through AWS IAM Identity Center. They register their data assets, which are stored in Amazon Simple Storage Service (Amazon S3), in the Amazon DataZone data catalog. The metadata of the data assets is stored in an AWS Glue catalog and made available in the business data catalog of Amazon DataZone and in the Amazon DataZone data source. The producers add business context such as business unit name, data owner contact information, and data refresh frequency using Amazon DataZone glossaries and metadata forms. In addition, they use generative AI capabilities to generate business metadata. After the business metadata is generated, they review the changes and modify the metadata if needed.
    Because all data products in Volkswagen Autoeuropa are now registered in the same location, the likelihood of data duplication is significantly reduced. Moreover, the data producers are improving the quality of the data by adding business context to it.
  • Data discovery – The consumers sign in to the Amazon DataZone data portal using their IAM credentials or single sign-on with integration through IAM Identity Center and search the data using keywords in the search bar. After the results are returned, they can further filter the results using glossary terms and project names. Finally, they review the business metadata of the data assets to evaluate if the data is relevant to their business use cases. They can check the quality score of the data assets and the refresh schedule for their use cases.
    With a data discovery capability in place, consumers can gain information about the data without the need to consult the source system owners or specialists.
  • Data access management – When the consumers find a data asset that’s relevant to their use case, they request access to it using the subscription feature of Amazon DataZone. Data is classified as public, internal, and confidential. For public and internal data assets, the access request is automatically approved. For confidential data assets, the data producer team reviews the access request and either accepts or rejects the subscription request.
    With a central place to manage data access, data owners can view which use cases have access to their data and when the access request was granted. The fine-grained access control feature of Amazon DataZone gives data owners granular control of their data at the row and column levels.
  • Data consumption – Upon approval of the subscription request, Amazon DataZone provisions the backend infrastructure to make the data accessible to the corresponding consumers. After this process is complete, the consumers can access the data through Amazon Athena using the deep link feature of Amazon DataZone. The data consumption pattern in Volkswagen Autoeuropa supports two use cases:
    • Cloud-to-cloud consumption – Both data assets and consumer teams or applications are hosted in the cloud.
    • Cloud-to-on-premises consumption – Data assets are hosted in the cloud and consumer use cases or applications are hosted on-premises.

Because each use case needs access only to the data assets relevant to it, sharing data with use cases through Amazon DataZone doesn’t require creating multiple copies. As a result, duplication and redundant processing of data are reduced. Furthermore, by reducing the number of copies of the data, the overall quality of the data products improves. In addition, the backend automation that Amazon DataZone uses to make data available to use cases reduces manual effort and shortens the lead time to access data.

  • Single collaborative environment – The Amazon DataZone data portal provides a single collaborative environment to the users in Volkswagen Autoeuropa. Data consumers such as use case owners, data engineers, data scientists, and ML engineers can browse and request access to data assets. At the same time, data producers, such as use case owners and source system owners, can publish and curate their data in the Amazon DataZone data portal. This collaborative experience promotes teamwork and accelerates the realization of business value. Furthermore, the security and governance guardrails scale across the organization as the number of use cases increases.
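The approval rule described under data access management (automatic approval for public and internal assets, producer review for confidential assets) can be expressed as a short decision function. This is an illustrative sketch only; the enum, function, and status names are hypothetical and not part of Amazon DataZone.

```python
from enum import Enum


class Classification(Enum):
    PUBLIC = "public"
    INTERNAL = "internal"
    CONFIDENTIAL = "confidential"


def initial_decision(classification):
    """Return the initial status of a subscription request.

    Public and internal assets are approved automatically; confidential
    assets wait for the data producer team to accept or reject the request.
    """
    if classification in (Classification.PUBLIC, Classification.INTERNAL):
        return "APPROVED"
    return "PENDING_PRODUCER_REVIEW"


# Initial status for each classification level.
decisions = {level.value: initial_decision(level) for level in Classification}
```

Encoding the rule in one place keeps the approval behavior consistent as more data assets are classified and published.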

Data solution architecture

The following figure displays the reference architecture of the data solution at Volkswagen Autoeuropa. In the next part of the post, we discuss how we arrived at the solution.

The architecture includes:

  1. The data from SAP applications, manufacturing execution systems (MES), and supervisory control and data acquisition (SCADA) systems is ingested into the producer accounts of Volkswagen Autoeuropa.
  2. In the producer account, raw data is transformed using AWS Glue. The technical metadata of the data is stored in AWS Glue catalog. The data quality is measured using the data quality framework. The data stored in Amazon Simple Storage Service (Amazon S3) is registered as an asset in the Amazon DataZone data catalog hosted in the central governance account.
  3. The central governance account hosts the Amazon DataZone domain and the related Amazon DataZone data portal. The AWS accounts of the data producers and consumers are associated with the Amazon DataZone domain. Amazon DataZone projects belonging to the data producers and consumers are created under the related Amazon DataZone domain units.
  4. Consumers of the data products sign in to the Amazon DataZone data portal hosted in the central governance account using their IAM credentials or single sign-on with integration through IAM Identity Center. They search, filter, and view asset information (for example, data quality, business, and technical metadata).
  5. After the consumer finds the asset they need, they request access to the asset using the subscription feature of Amazon DataZone. Based on the validity of the request, the asset owner approves or rejects the request.
  6. After the subscription request is granted and fulfilled, the asset is accessed in the consumer account for a one-time query using Athena and Microsoft Power BI applications hosted on premises. This consumption pattern can be extended for AI and machine learning (AI/ML) model development using Amazon SageMaker and reporting purposes using Amazon QuickSight.
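For the one-time query in step 6, a consumer could submit a query to Athena programmatically. The following sketch only prepares the parameters for the Athena `StartQueryExecution` API; the database, table, and workgroup names are placeholders, and the assumption is that the dictionary is later passed to a client such as `boto3.client("athena").start_query_execution(**request)`.

```python
def athena_query_request(database, table, workgroup, limit=10):
    """Build keyword arguments for a simple preview query on a subscribed table."""
    if limit <= 0:
        raise ValueError("limit must be a positive integer")
    query = f'SELECT * FROM "{table}" LIMIT {int(limit)}'
    return {
        "QueryString": query,
        "QueryExecutionContext": {"Database": database},
        "WorkGroup": workgroup,
    }


# Placeholder names; real values depend on the consumer environment.
request = athena_query_request(
    database="example_consumer_db",
    table="example_asset",
    workgroup="example-workgroup",
    limit=5,
)
```

The same request shape could back a scheduled report or a notebook cell, which is one way the consumption pattern might be extended.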

User journey

After discussing the desired system with the use case teams and stakeholders and analyzing the current workflow, Volkswagen Autoeuropa grouped the user personas of the data solution into three main categories: data producer, data consumer, and data solution administrator. This sets the foundation for the desired user experience and what’s needed to achieve the solution goals.

Data producer

Data producers create the data products in the data solution. There are two types of data producers.

  • Data source owners – Data source owners publish the raw data in the Amazon DataZone data portal. These data products are attributed as source-based data.
  • Use case owners – Use case owners publish data that’s fit for consumption by other use cases. These data products are called consumer-based data.

The following figure shows the user journey of a data producer:

 

A data producer’s journey includes:

  1. Identify data of interest
    1. Identify data (Volkswagen Autoeuropa network).
    2. Perform data quality checks (Volkswagen Autoeuropa network).
  2. Connect data to the data solution
    1. Ingest data into the data solution (Amazon DataZone portal).
    2. Start process to connect data using AWS Glue.
  3. Locate the data source in the data solution
    1. Register data (Amazon DataZone portal).
    2. Add data to the inventory in Amazon DataZone.
  4. Add or edit metadata
    1. Add or edit metadata (Amazon DataZone portal).
    2. Publish data assets (Amazon DataZone portal).
  5. Approve or reject subscription request
    1. Review subscription requests.
  6. Maintain data assets
    1. Manage data assets (Amazon DataZone portal).

Data consumer

Data consumers use data for business analytics, machine learning, AI, and business reporting. Data consumers are data engineers, data scientists, ML engineers, and business users. The following diagram shows the journey of a data consumer.

A data consumer’s journey includes:

  1. Access Amazon DataZone portal
    1. Amazon DataZone portal – Access is granted based on the user’s assigned domain and projects.
  2. Search for data assets
    1. Data assets in Amazon DataZone portal – Search for data and browse the results by glossary terms or the project name. Use additional filters to refine the results.
  3. View business metadata
    1. Select a data asset to see additional information – Review the description, data quality score and metadata.
  4. Request access to data (subscribe)
    1. Subscribe to request access.
    2. After the subscription request is approved, review the data products that you have access to.
    3. Query the data to view and consume the data.
  5. Retrieve additional data
    1. Repeat the steps as needed to access and retrieve additional data.

Data solution administrator

Data solution administrators are responsible for performing administrative tasks on the data solution. The following figure shows the common tasks performed by the data solution administrator.

A data solution administrator’s journey includes:

  1. Manage projects
    1. Manage Amazon DataZone domain.
    2. Manage Amazon DataZone projects within the domain.
  2. Manage environment
    1. Set up the environment to manage the infrastructure.
  3. Manage business metadata glossary
    1. Manage and enable Amazon DataZone glossaries and metadata forms.
  4. Manage data assets
    1. Manage assets.
    2. Query the data to view and consume the data.
  5. Manage access to data solution
    1. Monitor and revoke access when appropriate.

Conclusion

In this post, you learned how Volkswagen Autoeuropa embarked on a bold vision to become a data-driven factory. The post showed how this vision was put into action by building a data solution based on data mesh architecture using Amazon DataZone, highlighted the key features and architecture of the data solution, and presented the user journey. At the time of writing, Volkswagen Autoeuropa has reduced data discovery time from days to minutes using the data solution. Before the Volkswagen Autoeuropa and AWS collaboration, getting access to data took several weeks. Now, with the help of the data solution, the data access time has been reduced to several minutes.

In May 2024, the team achieved a major milestone by successfully offering data on the data solution and transporting it instantly to Power BI, a process that previously took several weeks.

“After one year of work, we did the full roundtrip from offering data on our new data marketplace built using Amazon DataZone to transporting it instantly to third-party tools, a process that previously took several weeks. This was a big achievement for our team.”

– Jorge Paulino, Product owner of the data solution, Volkswagen Autoeuropa.

The next post of this two-part series discusses how we built the solution, its technical details, and the business value created.

If you want to harness the agility and scalability of a data mesh architecture and Amazon DataZone to accelerate innovation and drive business value for your organization, we have the resources to get you started. Be sure to check out the AWS Prescriptive Guidance: Strategies for building a data mesh-based enterprise solution on AWS. This comprehensive guide covers the key considerations and best practices for establishing a robust, well-governed data mesh on AWS. From aligning your data mesh with overall business strategy to scaling the data mesh across your organization, this Prescriptive Guidance provides a clear roadmap to help you succeed.

If you’re curious to get hands-on, see the GitHub repository: Building an enterprise Data Mesh with Amazon DataZone, AWS CDK, and AWS CloudFormation. This open source project delivers a step-by-step guide to build a data mesh architecture using Amazon DataZone, AWS Cloud Development Kit (AWS CDK), and AWS CloudFormation.


About the Authors

Dhrubajyoti Mukherjee is a Cloud Infrastructure Architect with a strong focus on data strategy, data analytics, and data governance at Amazon Web Services (AWS). He uses his deep expertise to provide guidance to global enterprise customers across industries, helping them build scalable and secure AWS solutions that drive meaningful business outcomes. Dhrubajyoti is passionate about creating innovative, customer-centric solutions that enable digital transformation, business agility, and performance improvement. An active contributor to the AWS community, Dhrubajyoti authors AWS Prescriptive Guidance publications, blog posts, and open-source artifacts, sharing his insights and best practices with the broader community. Outside of work, Dhrubajyoti enjoys spending quality time with his family and exploring nature through his love of hiking mountains.

Ravi Kumar is a Data Architect and Analytics expert at Amazon Web Services; he finds immense fulfillment in working with data. His days are dedicated to designing and analyzing complex data systems, uncovering valuable insights that drive business decisions. Outside of work, he unwinds by listening to music and watching movies, activities that allow him to recharge after a long day of data wrangling.

Martin Mikoleizig studied mechanical engineering and production technology at RWTH Aachen University before joining Dr. h.c. Ing. F. Porsche AG in 2015 as a production planner for the engine assembly. Over several years as a Project Manager on Testing Technology for new engine models, he also introduced several innovations like human-machine collaborations and intelligent assistance systems. From 2017, he was responsible for the shop floor IT team of the module lines in Zuffenhausen before he became responsible for the planning of the E-Drive assembly at Porsche. In addition, he was responsible for the Digitalisation Strategy of the Production Ressort at Porsche. Since October 2022, he has been assigned to Volkswagen Autoeuropa in Portugal in the role of a Digital Transformation Manager for the plant, driving the digital transformation towards a data-driven factory.

Weizhou Sun is a Lead Architect at Amazon Web Services, specializing in digital manufacturing solutions and IoT. With extensive experience in Europe, she has enhanced operational efficiencies, reducing latency and increasing throughput. Weizhou’s expertise includes Industrial Computer Vision, predictive maintenance, and predictive quality, consistently delivering top performance and client satisfaction. A recognized thought leader in IoT and remote driving, she has contributed to business growth through innovations and open-source work. Committed to knowledge sharing, Weizhou mentors colleagues and contributes to practice development. Known for her problem-solving skills and customer focus, she delivers solutions that exceed expectations. In her free time, Weizhou explores new technologies and fosters a collaborative culture.

Shameka Almond is an Advisory Consultant at Amazon Web Services. She works closely with enterprise customers to help them better understand the business impact and value of implementing data solutions, including data governance best practices. Shameka has over a decade of wide-ranging IT experience in the manufacturing and aerospace industries, and the nonprofit sector. She has supported several data governance initiatives, helping both public and private organizations identify opportunities for improvement and increased efficiency. Outside of the office she enjoys hosting large family gatherings, and supporting community outreach events dedicated to introducing students in K-12 to STEM.

Adjoa Taylor has over 20 years of experience in industrial manufacturing, providing industry and technology consulting services, digital transformation, and solution delivery. Currently, Adjoa leads Product Centric Digital Transformation, enabling customers to solve complex manufacturing problems by leveraging Smart Factory and industry-leading transformation mechanisms. Most recently, she has been driving value with AI/ML and generative AI use cases for the plant floor. Adjoa is an experienced leader who has spent over 20 years of her career delivering projects in countries throughout North America, Latin America, Europe, and Asia. Through prior roles, Adjoa brings deep experience across multiple business segments with a focus on business-outcome-driven solutions. Adjoa is passionate about helping customers solve problems while realizing the art of the possible through the right value-based solution.

Celebrating 10 Years of Amazon ECS: Powering a Decade of Containerized Innovation

Post Syndicated from Matheus Guimaraes original https://aws.amazon.com/blogs/aws/celebrating-10-years-of-amazon-ecs-powering-a-decade-of-containerized-innovation/

Today, we celebrate 10 years of Amazon Elastic Container Service (ECS) and its incredible journey of pushing the boundaries of what’s possible in the cloud! What began as a solution to streamline running Docker containers on Amazon Web Services (AWS) has evolved into a cornerstone technology, offering both impressive performance and operational simplicity, including a serverless option with AWS Fargate for seamless container orchestration.

Over the past decade, Amazon ECS has become a trusted solution for countless organizations, providing the reliability and performance that customers such as SmugMug rely on to power their operations without being bogged down by infrastructure challenges. As Andrew Shieh, Principal Engineer at SmugMug, shares, Amazon ECS has been the “unsung hero” behind their seamless transition to AWS and efficient handling of massive data operations, such as migrating petabytes of photos to Amazon Simple Storage Service (Amazon S3). “The blazingly fast container spin-ups allow us to deliver awesome experiences to our customers,” he adds. It’s this kind of dependable support that has made Amazon ECS a favorite among developers and platform teams, helping them scale their solutions and innovate over the years.

In the early 2010s, as container technologies like Docker gained traction, developers started looking for efficient ways to manage and scale their applications in this new paradigm. Traditional infrastructure was cumbersome, and managing containers at scale was challenging. Amazon ECS arrived in 2014, just when developers were looking to adopt containers at scale. It offered a fully managed and reliable solution that streamlined container orchestration on AWS. Teams could focus on building and deploying applications without the overhead of managing clusters or complex infrastructure, ushering in a new era of cloud-native development.

When the Amazon ECS team set out to build the service, their vision was clear. As Deepak Singh, the product manager who launched Amazon ECS and who now serves as VP of Next Generation Developer Experience, said at the time, “Our customers wanted a solution that was deeply integrated with AWS, that could work for them at scale and could grow as they grew.” Amazon ECS was designed to use the best of what AWS has to offer (scalability, availability, resilience, and security) to give customers the confidence to run their applications in production environments.

Evolution
Amazon ECS has consistently innovated for customers over the past decade. It marked the beginning of the container innovation journey at AWS, paving the way for a broader ecosystem of container-related services that have transformed how businesses build and manage applications.

Smartsheet proudly sings the praises of the significant impact that Amazon ECS, and especially AWS Fargate, has had on their business to date. “Our teams can deploy more frequently, increase throughput, and reduce the engineering time to deploy from hours to minutes. We’ve gone from weekly deployments to deployments that we do multiple times a day. And from what used to be hours of at least two engineers’ time, we’ve been able to shave that down to several minutes,” said Skylar Graika, distinguished engineer at Smartsheet. “Within the last year, we have been able to scale out its capacity by 50 times, and by leveraging deep integrations across AWS services, we have improved efficiencies and simplified our security and compliance process. Additionally, by adopting AWS Graviton with the Fargate deployments, we’ve seen a 20 percent reduction in cost.”

Amazon ECS played a pivotal role as the starting point for a decade of container evolution at AWS. Today, it still stands as one of the most scalable and reliable container orchestration solutions, powering massive operations such as Prime Day 2024, where Amazon launched an impressive 77.24 million ECS tasks; Rufus, a shopping assistant experience powered by generative AI that uses Amazon ECS as part of its core architecture; and many others.

Rustem Feyzkhanov, ML engineering manager at Instrumental and AWS Machine Learning Hero, is quick to recognize the increased efficiency gained from adopting the service. “Amazon ECS has become an indispensable tool in our work,” says Rustem. “Over the past years, it has simplified container management and service scaling, allowing us to focus on development rather than infrastructure. This service makes it possible for application code teams to co-own infrastructure and that speeds up the development process.”

Timeline
Let’s have a look at some of the key milestones that have shaped the evolution of ECS, marking pivotal moments that changed how customers harness the power of containers on AWS.

2014: Introducing Amazon EC2 Container Service! – Check out this nostalgic blog post, which marked the release of ECS in preview mode. It shows how much functionality the service already launched with, making a big impact from the get-go! Customers could already run, stop, and manage Docker containers on a cluster of Amazon Elastic Compute Cloud (EC2) instances, with built-in resource management and task scheduling. It became generally available on April 9, 2015.

2015: Amazon ECS auto-scaling – With the introduction of added support for more Amazon CloudWatch metrics, customers could now automatically scale their clusters in and out by monitoring the CPU and memory usage in the cluster and configuring threshold values for auto scaling. I think this is a great example of how seemingly modest releases can have a huge impact for customers. Another impactful release was the introduction of Amazon ECR, a fully managed container registry that streamlines container storage and deployment.

2016: Application Load Balancer (ALB) for ECS – The introduction of ALB for ECS provided advanced routing features for containerized applications. ALB enabled more efficient load balancing across microservices, improving traffic management and scalability for ECS workloads. Windows users also benefitted from various releases this year, including added support for Windows Server 2016 with several AMIs and beta support for Windows Server Containers.

2017: Introducing AWS Fargate! – Fargate was a huge leap forward towards customers being able to run containers without managing the underlying infrastructure, which significantly streamlined their operations. Developers no longer had to worry about provisioning, scaling, or maintaining the EC2 instances on which their containers ran and could now focus entirely on their application logic while AWS handled the rest. This helped them to scale faster and innovate more freely, accelerating their cloud-centered journeys and transforming how they approached containerized applications.

2018: AWS Auto Scaling – With this release, teams could now build scaling plans easily for their Amazon ECS tasks. This year also saw the release of many improvements such as moving Amazon ECR to its own console experience outside of the Amazon ECS console, integration of Amazon ECS with AWS Cloud Map, and many others. Additionally, AWS Fargate continued to expand into regions worldwide.

2019: Arm-based Graviton2 instances available on Amazon ECS – AWS Graviton2 was released during a time when many businesses were turning their attention towards reprioritizing their sustainability goals. With a focus on improved performance and lower power usage, EC2 instances powered by Graviton2 were supported on Amazon ECS from day 1 of their launch. Customers could take full advantage of this new groundbreaking custom chipset specially built for the cloud. Another great highlight from this year was the launch of AWS Fargate Spot, which helped customers achieve significant cost reductions.

2020: Bottlerocket – An open-source, Linux-based operating system optimized for running containers. Designed to improve security and simplify updates, Bottlerocket helped Amazon ECS users achieve greater efficiency and stability in managing containerized workloads.

2021: ECS Exec – Amazon ECS introduced ECS Exec in March 2021. With it, customers could run commands directly inside a running container on Amazon EC2 or AWS Fargate. This feature provided enhanced troubleshooting and debugging capabilities without requiring customers to modify or redeploy containers, streamlining operational workflows. This year also saw releases for Amazon ECS Windows containers that streamlined operations for those running them in their cluster.

2022: Amazon ECS introduces Service Connect – The release of ECS Service Connect marked a pivotal moment for organizations running microservices architectures on Amazon ECS because it abstracted away much of the complexity involved in service-to-service networking. This dramatically streamlined management of communication between services. With a native service discovery and service mesh capability, developers could now define and manage how their services interacted with each other seamlessly, improving observability, resilience, and security without the need to manage custom networking or load balancers.

2023: Amazon GuardDuty ECS runtime monitoring – Last year, Amazon GuardDuty introduced ECS Runtime Monitoring for AWS Fargate, enhancing security by detecting potential threats within running containers. This feature provides continuous visibility into container workloads, improving security posture without additional performance overhead.

2024: Amazon ECS Fargate with EBS Integration – In January this year, Amazon ECS and AWS Fargate added support for Amazon EBS volumes, enabling persistent storage for containers. This integration allows users to attach EBS volumes to Fargate tasks, making it easier to deploy storage and support data-intensive applications.

Where are we now?
Amazon ECS is in an exciting place right now as it enjoys a level of maturity that allows it to keep innovating while delivering huge value to both new and existing customers. This year has seen many improvements to the service, making it increasingly secure, cost-effective, and straightforward to use.

These include releases such as support for automatic traffic encryption using TLS in Service Connect; enhanced stopped task error messages, which make it more straightforward to troubleshoot task launch failures; and the ability to restart containers without having to relaunch the task. The introduction of Graviton2-based instances with AWS Fargate Spot provided customers with a great opportunity to double down on their cost savings.

As usual with AWS, the Amazon ECS team are very focused on delighting customers. “With Amazon ECS and AWS Fargate, we make it really easy for you to focus on your differentiated business logic while leveraging all the powerful compute that AWS offers without having to manage it,” says Nick Coult, director of Product and Science, Serverless Compute. “Our vision with these services was, and still is, to enable you to minimize infrastructure management, write less code, architect for extensibility, and drive high performance, resilience, and security. And, we have continuously innovated in these areas with this goal in mind over the past 10 years. At Amazon ECS, we remain steadfast in our commitment to delivering agility without compromising security, empowering developers with an exceptional experience, unlocking broader, simpler integrations, and new possibilities for emerging workloads like generative AI.”

Conclusion
Looking back on its history, it’s clear to me that ECS is a testament to the AWS approach of working backwards from customer needs. From its early days of streamlining container orchestration to the transformative introduction of Fargate and Service Connect, ECS has consistently evolved to remove barriers for developers and businesses alike.

As we look to the future, I think ECS will keep pushing boundaries, enabling even more innovative and scalable solutions. I encourage everyone to continue exploring what ECS has to offer, discovering new ways to build and pushing the platform to its full potential. There’s a lot more to come, and I’m excited to see where the journey takes us.

Learning resources
If you’re new to Amazon ECS, I recommend you read the comprehensive and accessible Getting Started With Amazon ECS guide.

When you’re ready to skill up with some hands-on free training, I recommend trying this self-paced Amazon ECS workshop, which covers many aspects of the service, including many of the features mentioned in this post.

Thank you, Amazon ECS, and thank you to all of you who use this service and continue to help us make it better for you. Here’s to another 10 years of container innovation! 🥂

How to build a Security Guardians program to distribute security ownership

Post Syndicated from Mitch Beaumont original https://aws.amazon.com/blogs/security/how-to-build-your-own-security-guardians-program/

Welcome to the second post in our series on Security Guardians, a mechanism to distribute security ownership at Amazon Web Services (AWS) that trains, develops, and empowers builder teams to make security decisions about the software that they create. In the previous post, you learned the importance of building a culture of security ownership to scale security within your organization, and how AWS achieves this using the Security Guardians program. Since then, many customers have asked how they can build their own, similar program.

In this post, you will learn the steps to build your own Security Guardians program for your organization, including how to:

  • Set the vision, mission, and goals of your program
  • Identify developer teams that can pilot your new program
  • Define the expected behaviors for those teams
  • Develop training and create opportunities for career development to keep your teams engaged in the program

The guidance in this post is based on what we learned at AWS. Because every organization is different, the final version of the program you build is likely to look different from the one at AWS. Your program needs to reflect the current state of your organization’s culture of security and be designed to cultivate the security-related behaviors that are most important to your organization.

Security Guardians program mechanism

As discussed in the previous post, mechanisms form a key part of our business at AWS. Figure 1 demonstrates how a mechanism is a complete process, or virtuous cycle, that reinforces and improves itself as it operates. It takes controllable inputs and transforms them into ongoing outputs to address a recurring business challenge. In this case, the business challenge AWS faced was that security findings were being identified late in the development lifecycle, making it more expensive—in terms of time, money and effort—to remediate them. This led to bottlenecks in our security review process. The culture of security at AWS, specifically our culture of ownership, provides support to solve this challenge, but we needed the Security Guardians mechanism to actually do it.

Figure 1: AWS mechanism cycle

With most mechanisms, driving adoption is difficult, especially when the mechanism requires human participation to succeed. This is also true in the case of Security Guardians, and you can use our experience to help you avoid some of the challenges and growing pains of driving adoption.

Getting everyone aligned

“If I had an hour to solve a problem and my life depended on the solution, I would spend the first 55 minutes determining the proper question to ask, for once I know the proper question, I could solve the problem in less than five minutes.” – Albert Einstein

Getting alignment for the need to distribute security expertise starts with deeply understanding what problems need to be addressed. For example:

  • Is product delivery velocity being negatively impacted by delays in the security review process?
  • What business goal or metric are these delays negatively impacting?
  • Where in the security review process are those delays occurring?
  • What factors are contributing to those delays?
  • Is it a lack of time, people, or skills?

Thoroughly understanding the specific problems and their root causes, as identified by answering those questions, allows you to evaluate whether distributing security ownership is the appropriate solution. This in turn makes it easier to gain alignment and buy-in across the organization for the chosen approach.

A component of a culture of security

Building a strong culture of security requires support from executive leadership to set the direction for the rest of the organization. Executive support makes it easier for product leaders to secure the resources and finances needed for a Security Guardians program to be successful. To align with your organization’s leaders, you can reflect on the goals of your leaders and how the Security Guardians program can be built to meet those goals.

For example, if your business goal is to ship products 25 percent faster, understand how a particular resourcing effort from Security Guardians is going to help your organization meet that goal. AWS benefited from the program with a 26.9 percent reduction in the time to review a new service or feature when a Security Guardian was involved.
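To make the arithmetic behind such a figure concrete, here is a minimal sketch that applies the 26.9 percent reduction cited above to a review time. The 20-day baseline and the function name are assumptions chosen purely for illustration; they are not AWS figures.

```python
def reduced_review_time(baseline_days: float, reduction_percent: float) -> float:
    """Return the review time after applying a percentage reduction."""
    if not 0 <= reduction_percent <= 100:
        raise ValueError("reduction_percent must be between 0 and 100")
    return baseline_days * (1 - reduction_percent / 100)


# Hypothetical baseline: a security review that takes 20 days end to end.
baseline_days = 20.0

# 26.9 percent is the reduction reported in this post.
with_guardian = reduced_review_time(baseline_days, 26.9)
print(f"Review time with a Security Guardian: {with_guardian:.2f} days")
```

Running the same calculation against your own measured baseline gives a rough sense of what a comparable reduction would mean for your organization's goal.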

Our experience is that it’s challenging to establish a Security Guardians program without executive support. If you’re struggling to identify a business leader to sponsor the program and provide insight on the business problem, your AWS account team—including your account manager or solutions architect—can help. If you’re a business leader or executive reading this post, consider becoming that sponsor yourself.

One step at a time

A step-by-step approach to implementing the Security Guardians program helps overcome organizational challenges and avoid common pitfalls that could lead to failure. These steps, shown in Figure 2, are:

  1. Set the vision
  2. Choose innovators
  3. Define behaviors
  4. Maintain interest
  5. Measure success

These steps support the activities that make a mechanism successful: adoption, inspection, and tools.

Figure 2: Steps for implementing a Security Guardians program

Set the vision

Now that you’ve identified the business problem or business goal, set the vision for the Security Guardians program by working backwards from this problem or goal to define the purpose of your program. For example, the vision of the AWS Security Guardians is “To nourish security ownership that consistently delights our customers with security-by-design throughout the development lifecycle.”

Craft an ambitious vision for your Security Guardians program. Think beyond easy wins and focus on bold, forward-thinking security outcomes for your organization. Make sure that each element of your vision aligns with a business problem or goal. The following table is an example of how the vision of the program is aligned with business goals:

| Business goals | Security outcome | Long-term goals |
| Develop products faster and more efficiently. | To improve developer agility while reducing security risk. | (1) Increase the number of threat models performed by Security Guardians (instead of by application security engineers). Over time, this goal could change to “increase the quality of threat models.” (2) Decrease the average monthly security issue rate. (3) Train three new Security Guardians each quarter. |
| Reduce long-term security spend. | To identify and mitigate security risk as early as possible. | |
| Increase customer trust. | To exceed customer security expectations by raising the security bar. | |

The next step is to define a clear mission that is supported with measurable goals. The mission and goals must be achievable and help to move the needle towards the long-term vision.

The final part is to name your program. We chose Security Guardians, like Marvel’s Guardians of the Galaxy. We’ve also heard customers using Security Champions, Security Advocates, Security Innovators, and Security Drivers. Have fun with it and make sure the name resonates with as many participants as possible.

After you’ve defined the vision, future state, mission, measurable goals, and name of the program, review them with your security and business leaders. It’s beneficial to include your innovators or Security Guardians who will be early adopters of the program in this review. In the next section, you’ll learn how to identify these innovators.

Choosing innovators

Just as you develop for and iterate with early adopters of the products you’re building, you should identify individuals and teams who will pilot the program with you. Before the AWS Security Guardians program, our application security engineering teams built relationships with product teams through security reviews.

This meant that they already knew which individuals within those product teams had an interest in security. This is where AWS started, but the success of your program isn’t dependent on whether you already know who these individuals are. Development teams will self-identify and nominate Security Guardians from their own teams. Figure 3 shows examples to help you get started understanding which development teams will be good early adopters for your program.

Figure 3: Example product teams for early program adopters

The examples in Figure 3 include:

Candidate A: Quick wins team

Early adopters typically share key traits, including existing security measures and a designated security role or team members with security expertise. Essentially, they already prioritize security at the team level.

Candidate B: High impact team

This is the team most impacted by the disparity between product development teams and security teams; the agility and time-related benefits of the Security Guardians program will be the highest for this team. For example, this team might be facing long delays in launching products because of the current security review process at your organization.

Candidate C: High risk team

This team owns a product that has a high security risk because of the nature of the product. This team will benefit the most from additional security scrutiny and from raising the security bar at your organization. For example, this team might be building a product that’s considered a critical asset, hosts sensitive data, or performs critical processes.

After you’ve identified one or more teams that could be good early adopters of the program, you need to identify at least one individual from each team to serve as the Security Guardian. Keep the vision and goals of your program in mind when selecting your Security Guardian. Your early Security Guardians should have at least the following characteristics:

  • Ability to exercise well-informed and decisive judgement
  • Maintain and showcase their knowledge
  • Not afraid to have their work be independently validated
  • Advocate for their security needs in internal discussions
  • Hold a high security bar
  • Thoughtful and assertive to make customer security a top priority on their team

In terms of time commitment, our experience is that each Security Guardian spends an average of 3.5 hours each month on activities such as answering general security questions, identifying security stories needed for sprints, and diving deep into or supporting security-related tasks. Each application security review takes approximately 4 hours of effort.
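As a rough planning aid, the following sketch turns the two figures above (3.5 hours per Guardian each month and about 4 hours per review) into a monthly effort estimate. The class name, the team size, and the review count are hypothetical, and the sketch assumes review effort is counted in addition to the general monthly activities, which this post does not state explicitly.

```python
from dataclasses import dataclass


@dataclass
class GuardianEffortEstimate:
    """Hypothetical helper for estimating monthly Security Guardian effort."""

    guardians: int
    reviews_per_month: int
    general_hours_per_guardian: float = 3.5  # average cited in this post
    hours_per_review: float = 4.0  # approximate figure cited in this post

    def general_hours(self) -> float:
        # Time spent on questions, sprint stories, and other security tasks.
        return self.guardians * self.general_hours_per_guardian

    def review_hours(self) -> float:
        # Assumption: reviews are counted on top of the general activities.
        return self.reviews_per_month * self.hours_per_review

    def total_hours(self) -> float:
        return self.general_hours() + self.review_hours()


# Illustrative numbers only: 6 Guardians and 5 reviews in a month.
estimate = GuardianEffortEstimate(guardians=6, reviews_per_month=5)
print(f"Estimated effort: {estimate.total_hours()} hours per month")
```

An estimate like this can help when discussing resourcing with the product leaders who sponsor the program.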

The first post of this series contains even more details on the characteristics that make a good Security Guardian.

Defining behaviors

It’s important to set expectations on what behaviors you want Security Guardians, developers, and security teams to exhibit within the context of the program. These behaviors typically relate directly to the goals of the program. For example, if one of the goals is to increase the number of threat models created, then creating threat models will be one of the defined behaviors. The behaviors need to be measurable with some flexibility for change as you improve the program.

At AWS, our Security Guardians have access to a runbook that lists the activities each Guardian should take when engaged as part of a review. With each of these activities understood, the program team will then make sure appropriate training is provided so that the Security Guardians are able to complete each of the activities. For example, AWS Security Guardians are asked to help develop threat models. To support this, the program team has developed and released training material to teach Security Guardians how to create a threat model.

With the defined behaviors, understand how the Security Guardian and product development team will engage with the security team. Although we’re clearly defining behaviors, the behaviors aren’t typically done in a silo for the successful launch of a secure product. At AWS, the Security Guardians and product developers engage with the security teams in key partnership areas. If you’re unsure of where to start in defining the behaviors of your program, Figure 4 shows an example of how teams interact at AWS, beginning with the creation of an initial threat model and going through review, remediation, and testing. Consider creating your own version of the model to help define the behaviors and key partnership areas at your organization.

Figure 4: Example behaviors and partnership areas at AWS

In the example of a threat model review, the Guardian and the central security team will jointly create and review the threat model. Specific activity examples include reviewing threats that have no documented mitigations and discussing additional threats that haven’t yet been considered.

As part of encouraging a culture of ownership, AWS recommends allowing Security Guardians to influence the role within a set of boundaries. An example of this is allowing the Security Guardians to be a part of recurring reviews of the program growth metrics, actively collecting their feedback, and encouraging them to host their own training sessions. Active Security Guardians are key to the success of the program and allowing them to influence the program will give them a sense of ownership and inclusion.

Maintaining interest

It’s important not to lose sight of the fact that a program like the AWS Security Guardians program is supported by volunteers. Most of your Security Guardians will be product developers who already have a full-time job developing products for your organization. The time and effort to find and onboard new Security Guardians will have a low return on investment if they stop engaging because the program owners didn’t keep them engaged. Keeping Security Guardians is just as important as finding them.

At AWS, we invest time to understand how to build trust with Security Guardians and provide value by working backwards from their wants and needs. Some Security Guardians joined the program to learn new skills and for career growth opportunities. AWS built training programs that were designed for Security Guardians and provide metrics that are used to document their impact to their managers and leaders.

AWS Security Guardians constantly tell us that they value recognition of their contributions by leadership. We work to build mechanisms to continuously surface the great work of our Security Guardians. We also recognize the contributions Security Guardians make through awards, gifts, and other incentives. For example, each quarter, the AWS Security Guardians team sends out a newsletter to senior leaders of the organization. This communication identifies the Guardians within their organization and highlights their contributions, including the number and impact of reviews they’ve completed.

Another way that AWS recognizes the contributions of our Security Guardians is through the Guardians Belt Program. The Guardians Belt Program is designed to recognize Security Guardians for their contributions and support them as they work to advance their security skills and expand their scope of impact. Security Guardians earn Black, Green, Yellow, and White belts with each belt corresponding to significant accomplishments that require consistent commitment to raising the security bar.

To make sure that Security Guardians value the program, your organization should provide and actively facilitate benefits. The benefits must be accessible without requiring additional time or effort from the Security Guardians, promoting immediate and direct gains. Consider the following examples of benefits to maintain Security Guardian interest and support:

  • Specialized training: Workshops, game days, challenges and contests.
  • Impact opportunities: Ability to impact multiple products by working with other teams in the organization, ability to help define patterns, best practices, and automation for the program.
  • Community: Collaborate, connect, share and learn from experts and individuals with similar interests.
  • Ownership opportunities: Ability to accelerate certain steps in the process.
  • Leadership opportunities: Active involvement in recurring program or business reviews.

The best ways to maintain interest are determined by the culture of your organization. What does your organization value the most, and how will the program provide that to your Security Guardians? Sometimes, the best way to answer these questions is to ask your early or potential Security Guardians.

Measuring success

The final step of building a successful Security Guardians program is to measure program success. Measuring success is equivalent to the inspection step from Figure 1. It verifies that your desired outcomes are being achieved and provides a jumping-off point for iteration. Measuring success also gives you the opportunity to audit the results of the Security Guardians program and make corrections and improvements.

Earlier in this post, we covered identifying the business problem and creating the vision and measurable goals for your Security Guardians program. Example metrics include:

  • Average time to release features
  • Average number of security issues per team
  • Average time spent by Security Guardians and builders doing security work
  • Percentage of Security Guardians who have taken required and non-required training
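As a rough illustration of how metrics like these might be computed, here is a minimal Python sketch. The `TeamRecord` fields, the team names, and the sample numbers are all hypothetical and are not taken from any AWS tooling; the point is only that each metric reduces to a simple aggregate once the underlying data is collected consistently.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TeamRecord:
    # All fields are hypothetical; adapt them to whatever your tooling records.
    team: str
    days_to_release: list[float]  # days from feature start to release
    security_issues: int          # issues found in the reporting period
    guardians: int                # Security Guardians on the team
    guardians_trained: int        # Guardians who completed required training


def summarize(records):
    """Aggregate per-team records into program-level metrics."""
    all_days = [d for r in records for d in r.days_to_release]
    total_guardians = sum(r.guardians for r in records)
    trained = sum(r.guardians_trained for r in records)
    return {
        "avg_days_to_release": mean(all_days),
        "avg_issues_per_team": mean(r.security_issues for r in records),
        "pct_guardians_trained": 100.0 * trained / total_guardians,
    }


# Made-up sample data for two teams.
records = [
    TeamRecord("payments", [10.0, 14.0], 3, 2, 2),
    TeamRecord("search", [6.0], 1, 2, 1),
]
summary = summarize(records)
print(summary)
```

Tracking the same aggregates each reporting period is what makes trends visible over time.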

Measuring success includes steps to collect feedback and tune the program over time, shown in Figure 5.

Figure 5: Feedback and tuning steps for Security Guardians program.


The cycle to gather feedback and tune the program includes:

  1. Report on metrics
  2. Communicate wins
  3. Measure outcome and cycle time
  4. Identify trends
  5. Review goals

Gathering feedback from Security Guardians is as important as providing feedback to them. One of the ways AWS collects feedback from Security Guardians is through an annual survey about their experiences with the program and its tooling. To help both builders and Security Guardians improve over time, our security review tooling captures feedback from security engineers on the inputs from Security Guardians. Combined, the data gathered through these channels helps our security ownership mechanism reinforce and improve itself over time.

Figure 6 summarizes the steps that you can take to develop your program.

Figure 6: Security Guardians program steps


The broad steps to develop a program include:

  • Set the vision: Set your vision for the program and metrics for success. Get sponsorship from leadership. Choose a name for your program.
  • Choose innovators: Identify innovators who have a passion for security and foster a community with continuous knowledge sharing.
  • Define behaviors: Redefine your RACI (responsible, accountable, consulted, informed) and be clear on expectations from your security advocates.
  • Maintain interest: Provide clear training and learning paths and opportunities for career advancement.
  • Measure success: Gather feedback and measure the program’s effectiveness.

Conclusion

This post and the previous post covered numerous concepts, considerations, and ideas, including:

  • The initial intention of the Security Guardians program is to focus on training developers in product teams. This improves early security-focused design thinking.
  • An alternative approach is to embed or align security engineers directly with product development teams. This can be more effective in organizations where reporting structures and accountability are key considerations.
  • Some organizations draw Security Guardians from all job types. The program can also be used to focus on uplifting developers and broad security culture.
  • You must regularly inspect the outcomes delivered by the Security Guardians program and use the information to make incremental improvements as the program matures.

For additional support building a Security Guardians program, contact your AWS account representative, who can put you in touch with a specialist to help you develop your program.

 
 

Mitch Beaumont

Mitch is a Principal Solutions Architect for Amazon Web Services based in Sydney, Australia. Mitch works with some of Australia’s largest financial services customers, helping them to continually raise the security bar for the products and features that they build and ship. Outside of work, Mitch enjoys spending time with his family, photography, and surfing.
Ana Malhotra

Ana previously worked as a Security Specialist Solutions Architect and was the Healthcare and Life Sciences (HCLS) Security Lead for AWS Industry, based in Seattle, Washington. As a former AWS Application Security Engineer, during her time with AWS Industry, Ana loved talking all things AppSec, including people, process, and technology. In her free time, she enjoys tapping into her creative side with music and dance.

An unexpected discovery: Automated reasoning often makes systems more efficient and easier to maintain

Post Syndicated from Byron Cook original https://aws.amazon.com/blogs/security/an-unexpected-discovery-automated-reasoning-often-makes-systems-more-efficient-and-easier-to-maintain/

During a recent visit to the Defense Advanced Research Projects Agency (DARPA), I mentioned a trend that piqued their interest: Over the last 10 years of applying automated reasoning at Amazon Web Services (AWS), we’ve found that formally verified code is often more performant than the unverified code it replaces.

The reason is that the bug fixes we make during the process of formal verification often positively impact the code’s runtime. Automated reasoning also gives our builders confidence to explore additional optimizations that improve system performance even further. We’ve found that formally verified code is easier to update, modify, and operate, leading to fewer late-night log analysis and debugging sessions. In this post, I’ll share three examples that came up during my discussions with DARPA.

Automated reasoning: The basics

At AWS, we strive to build services that are simple and intuitive for our customers. Underneath that simplicity lie vast, complex distributed systems that process billions of requests every second. Verifying the correctness of these complex systems is a significant challenge. Our production services are in a constant state of evolution as we introduce new features, redesign components, enhance security, and optimize performance. Many of these changes are complex themselves, and must be made without impacting the security or resilience of AWS or our customers.

Design reviews, code audits, stress testing, and fault injection are all invaluable tools we use regularly, and always will. However, we’ve found that we need to supplement these techniques in order to confirm correctness in many cases. Subtle bugs can still escape detection, particularly in large-scale, fault-tolerant architectures. And some issues might even be rooted in the original system design, rather than implementation flaws. As our services have grown in scale and complexity, we’ve had to supplement traditional testing approaches with more powerful techniques based on math and logic. This is where the branch of artificial intelligence (AI) called automated reasoning comes into play.

While traditional testing focuses on validating system behavior under specific scenarios, automated reasoning aims to use logic to verify system behavior under any possible scenario. In even a moderately complex system, it would take an intractably large amount of time to reproduce every combination of possible states and parameters that may occur. With automated reasoning, it’s possible to achieve the same effect quickly and efficiently by computing a logical proof of the correctness of the system.

Using automated reasoning requires our builders to have a different mindset. Instead of trying to think about all possible input scenarios and how they might go wrong, we define how the system should work and identify the conditions that must be met in order for it to behave correctly. Then we can verify that those conditions are true by using mathematical proof. In other words, we can verify that the system is correct.

Automated reasoning views a system’s specification and implementation in mathematics, then applies algorithmic approaches to verify that the mathematical representation of the system satisfies the specification. By encoding our systems as mathematical systems and reasoning about them using formal logic, automated reasoning allows us to efficiently and authoritatively answer critical questions about the systems’ future behavior. What can the system do? What will it do? What can it never do? Automated reasoning can help answer these questions for even the most complex, large-scale, and potentially unbounded systems—scenarios that are impossible to exhaustively validate through traditional testing alone.
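To make the idea of checking every possible behavior concrete, here is a minimal Python sketch of explicit-state exploration, one of the simplest automated reasoning techniques. It is a toy and not the tooling used at AWS: the two-worker lock model, the state encoding, and the names `find_violation` and `make_lock_model` are all assumptions made for illustration. The checker visits every reachable state of the model and either confirms that an invariant holds in all of them or returns a step-by-step counterexample.

```python
from collections import deque


def find_violation(initial, next_states, invariant):
    """Breadth-first search over every reachable state.

    Returns None if the invariant holds everywhere, otherwise the
    shortest path of states leading to a violation.
    """
    parent = {initial: None}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if not invariant(state):
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in next_states(state):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None


def make_lock_model(atomic):
    """Toy model: two workers competing for one lock.

    A state is (pc0, pc1, lock_held). With atomic=False, testing the
    lock and taking it are separate steps, which is the seeded bug.
    """
    def next_states(state):
        pcs, lock = list(state[:2]), state[2]
        for i in (0, 1):
            pc = pcs[i]
            if pc == "idle":
                step = ("want", lock)
            elif pc == "want" and not lock:
                step = ("critical", True) if atomic else ("saw_free", lock)
            elif pc == "saw_free":
                step = ("critical", True)
            elif pc == "critical":
                step = ("idle", False)
            else:
                continue  # worker is blocked waiting for the lock
            new_pcs = pcs.copy()
            new_pcs[i] = step[0]
            yield (new_pcs[0], new_pcs[1], step[1])
    return next_states


def mutual_exclusion(state):
    """The property to check: both workers are never critical at once."""
    return not (state[0] == "critical" and state[1] == "critical")


INITIAL = ("idle", "idle", False)
safe = find_violation(INITIAL, make_lock_model(atomic=True), mutual_exclusion)
buggy = find_violation(INITIAL, make_lock_model(atomic=False), mutual_exclusion)
print("atomic model violation:", safe)
print("non-atomic counterexample:", buggy)
```

For the atomic model the search exhausts the state space without finding a violation, which is a proof for this small model rather than a sample of test runs. For the non-atomic model it returns the interleaving that breaks mutual exclusion. Production-scale tools rely on symbolic techniques rather than enumerating states one at a time, but the question they answer is the same.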

Does automated reasoning allow us to achieve perfection? No, because it still depends on certain assumptions about the correct behavior of the components of a system and the relationship between the system and the model of its environment. For example, the model of a system might incorrectly assume that underlying components such as compilers and processors don’t have any bugs (although it is possible to formally verify those components as well). That said, automated reasoning allows us to achieve higher confidence in correctness than is possible by using traditional software development and testing methods.

Faster development

Automated reasoning is not just for mathematicians and scientists. Our Amazon Simple Storage Service (Amazon S3) engineers use automated reasoning every day to prevent bugs. Behind the simple interface of S3 is one of the world’s largest and most complex distributed systems, holding 400 trillion objects, exabytes of data, and regularly processing over 150 million requests per second. S3 is composed of many subsystems that are distributed systems in their own right, many consisting of tens of thousands of machines. New features are being added all the time, while S3 is under heavy use by our customers.

A key component of S3 is the S3 index subsystem, an object metadata store that enables fast data lookups. This component contains a very large, complex data structure and intricate, optimized algorithms. Because the algorithms are difficult for humans to get right at S3 scale, and because we can’t afford errors in S3 lookups, we made new improvements on a cadence of about once per quarter, due to the extreme care and extensive testing required to confidently make a change.

S3 is a well-built and well-tested system built on 15 years of experience. However, there was a bug in the S3 index subsystem for which we couldn’t determine the root cause for some time. The system was able to automatically recover from the exception, so its presence didn’t impact the behavior of the system. Still, we were not satisfied.

Why was this bug around so long? Distributed systems like S3 have a large number of components, each with their own corner cases, and a number of corner cases happen at the same time. In the case of S3, which has over 300 microservices, the number of potential combinations of these corner cases is enormous. It’s not possible for developers to think through each of these corner cases, even when they have evidence the bug exists and ideas about its root cause—never mind all of the possible combinations of corner cases.

This complexity drove us to look at how we could use automated reasoning to explore the possible states and errors that might be hidden in those states. By building a formal specification of the system, we were able to find the bug and prove the absence of further bugs of its type. Using automated reasoning also gave us the confidence to ship updates and improvements every one to two months rather than just three to four times a year.

Faster code

The correctness of the AWS Identity and Access Management (IAM) service is foundational to the security of our customers’ workloads. Across millions of customers, thousands of resource types, and hundreds of AWS services, every API call—every single request to AWS—is processed by the IAM authorization engine. That’s over 1.2 billion requests per second. This is some of the most security-critical and highly scaled software in AWS.

Before any change at AWS goes into production, we need an extremely high degree of confidence that the system remains secure and correct. Using automated reasoning, we can prove that our systems adhere to specific security properties, under an exhaustive number of circumstances. We call this provable security. Not only has automated reasoning enabled us to provide provable security assurance to our customers, it gives us the ability to deliver functionality, security, and optimization at scale.

Like S3, IAM has evolved over 15 years into a time-tested and trusted system. But we wanted to raise the bar further. We built a formal specification that captures the behavior of the existing IAM authorization engine, codified its policy evaluation principles into provable theorems, and used automated reasoning to build a new and more efficient implementation. Earlier this year, we deployed the new proved-correct authorization engine, and no one noticed. Automated reasoning allowed us to seamlessly replace one of the most critical pieces of AWS infrastructure, the authorization engine, with a proved-correct equivalent.

With the specification and proofs in place, we could safely and aggressively optimize the code with a high degree of confidence. At the massive scale of IAM, every microsecond of performance improvement translates into a better customer experience and better cost optimization for AWS. We optimized string matching, removed unnecessary memory allocation and redundant computations, strengthened security, and improved scalability. After every change, we re-ran our proofs to confirm that the system was still operating correctly.
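The workflow described above (keep a clear specification, optimize the implementation, then re-check after every change) can be sketched in a few lines of Python. This is an illustrative stand-in, not the IAM engine or its proofs: the wildcard syntax (`*` for any run of characters, `?` for exactly one), the function names, and the size bound are assumptions, and a bounded exhaustive comparison is a much weaker guarantee than a mathematical proof over all inputs.

```python
from itertools import product


def matches_spec(pattern, text):
    """Reference specification: short, obviously correct, and slow."""
    if not pattern:
        return not text
    head, rest = pattern[0], pattern[1:]
    if head == "*":
        return any(matches_spec(rest, text[i:]) for i in range(len(text) + 1))
    return bool(text) and head in ("?", text[0]) and matches_spec(rest, text[1:])


def matches_fast(pattern, text):
    """Optimized matcher: one pass, backtracking only to the latest '*'."""
    p = t = 0
    star = -1    # index of the most recent '*' in the pattern
    resume = 0   # text index to retry from after that '*'
    while t < len(text):
        if p < len(pattern) and pattern[p] == "*":
            star, resume = p, t
            p += 1
        elif p < len(pattern) and pattern[p] in ("?", text[t]):
            p += 1
            t += 1
        elif star != -1:
            resume += 1  # let the last '*' absorb one more character
            p, t = star + 1, resume
        else:
            return False
    while p < len(pattern) and pattern[p] == "*":
        p += 1
    return p == len(pattern)


def words(alphabet, max_len):
    """Every string over the alphabet up to max_len, shortest first."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)


def find_mismatch(max_len=4):
    """Compare both matchers on every pattern and text within the bound."""
    for pattern in words("ab*?", max_len):
        for text in words("ab", max_len):
            if matches_spec(pattern, text) != matches_fast(pattern, text):
                return pattern, text
    return None


mismatch = find_mismatch()
print("first disagreement within the bound:", mismatch)
```

Rerunning `find_mismatch()` after each optimization to `matches_fast` plays the role that re-running the proofs plays in the text: any behavioral drift from the specification surfaces immediately as a concrete pattern and input.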

The optimized IAM authorization engine is now 50% faster than its predecessor. We simply would not have been able to make these types of impactful optimizations with such confidence if we didn’t use automated reasoning. For a deeper look at how we did this, see this AWS re:Inforce session.

Faster deployment (of faster code)

Most secure online transactions are protected by encryption. For example, the RSA encryption algorithm protects data by generating two keys: one to encrypt the data, and one to decrypt it. These keys enable secure data transmission as well as secure digital signatures. In the context of encryption, correctness and performance are both essential—a bug in an encryption algorithm can be disastrous.
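For readers who want to see the two-key idea in action, here is a toy "textbook RSA" sketch in Python. The tiny primes, the exponent, and the message value are assumptions chosen so the arithmetic is easy to follow. It omits padding and uses a key size that is trivially breakable, so it is in no way representative of production cryptography libraries.

```python
# Toy textbook RSA with tiny primes: for illustration only, never for real use.
p, q = 61, 53                 # two small primes (assumed for the example)
n = p * q                     # public modulus shared by both keys
phi = (p - 1) * (q - 1)       # Euler's totient of n
e = 17                        # public exponent, coprime with phi
d = pow(e, -1, phi)           # private exponent: modular inverse of e (Python 3.8+)

message = 65                  # a message encoded as an integer smaller than n

# Encrypt with the public key (e, n), decrypt with the private key (d, n).
cipher = pow(message, e, n)
plain = pow(cipher, d, n)

# Sign with the private key, verify with the public key.
signature = pow(message, d, n)
verified = pow(signature, e, n) == message

print("round trip ok:", plain == message, "| signature ok:", verified)
```

Real implementations perform the same modular exponentiations on numbers thousands of bits long, which is why both the correctness and the speed of that arithmetic matter so much.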

As AWS customers move their workloads to AWS Graviton, the benefits of optimizing cryptography for the ARM instruction set increase. But optimizing encryption for better performance is complex, which makes it difficult to verify that modified encryption algorithms are behaving properly. Before we started to use automated reasoning, optimizations to cryptography libraries often required months-long reviews to achieve confidence for release into production.

Enter the power of automated reasoning: formal verification made RSA faster, and faster to deploy. We are seeing similar improvements when we apply automated reasoning to elliptic curve cryptography.

The formation of a virtuous cycle

Over the last decade, we’ve increasingly applied automated reasoning techniques within AWS to prove the correctness of our cloud infrastructure and services. We routinely use these methods not only to verify correctness, but also to enhance security and reliability and minimize design flaws. Automated reasoning can be used to create a precise, testable model of a system, which we can use to quickly verify that changes are safe—or learn they are unsafe without causing harm in production.

We can answer critical questions about our infrastructure to detect misconfigurations that might expose data. We can help stop subtle but serious bugs from reaching production that we would not have found with other techniques. We can make bold performance optimizations that we would not have dared attempt without model checking. Automated reasoning provides rigorous mathematical assurance that critical systems behave as expected.

AWS is the first and only cloud provider to use automated reasoning at this scale. As adoption of automated reasoning tools increases, it becomes easier for us to justify ever-larger investments into improving the usability and scalability of automated reasoning tools. The easier it is to use the automated reasoning tools and the more powerful they become, the more adoption we’ve observed. The more we’re able to prove correctness of our cloud infrastructure, the more compelling the cloud is to security-obsessed customers. And, as the examples in this post illustrate, not only are we able to increase security assurance, we are also delivering higher-performing code to customers faster, translating into cost savings that we can eventually pass on to customers.

My prediction is that we’re in the beginning of an era in which critical properties like security, compliance, availability, durability, and safety can be proved automatically for large-scale cloud architectures. From preventing potential issues with AI hallucinations to analyzing hypervisors, cryptography, and distributed systems, having sound mathematical reasoning at our foundations and continuously analyzing what we build sets Amazon apart.


 
 

Byron Cook

Byron is Professor of Computer Science at University College London (UCL) and a fellow of the UK’s Royal Academy of Engineering. Byron founded the Amazon Automated Reasoning Group in 2015 and currently serves as Vice President and Distinguished Scientist of Automated Reasoning at AWS. Byron’s interests include computer and network security, program analysis and verification, programming languages, theorem proving, logic, hardware design, operating systems, and biological systems.

Strengthening security in the era of generative AI: Must-attend sessions at re:Invent 2024

Post Syndicated from Anna Montalat original https://aws.amazon.com/blogs/security/strengthening-security-in-the-era-of-generative-ai-must-attend-sessions-at-reinvent-2024/


AWS re:Invent 2024, December 2-6, 2024 | Las Vegas, NV

Generative AI is transforming industries in new and exciting ways every single day. At Amazon Web Services (AWS), security is our top priority, and we see security as a foundational enabler for organizations looking to innovate. As you prepare for AWS re:Invent 2024, make sure that these essential sessions are on your schedule to learn how security can help your organization innovate with generative AI solutions quickly and securely. Leading experts will provide deep insights into how you can secure generative AI workloads in order to protect data and navigate governance, risk, and compliance.

In this post, we’ve highlighted some of our must-attend sessions and favorite activities for security leaders and practitioners, technical decision-makers, and artificial intelligence and machine learning (AI/ML) builders. To join in on the fun, register here, and we’ll see you in Vegas!

Keynotes and innovation talks

The AWS re:Invent 2024 keynote and innovation talks offer the opportunity to gain direct, transformative insights from senior AWS leaders. Delve into the latest breakthroughs in generative AI, cloud security, and cutting-edge architectural innovations that are redefining the future of application development and the AWS Cloud.

  • KEY002 – CEO Keynote with Matt Garman. Discover how AWS is innovating across the cloud, from reinventing core services to creating new experiences, empowering customers and partners to build a secure and better future.
  • SEC203-INT – Security insights and innovation from AWS with Chris Betz. Discover how groundbreaking security innovations and generative AI empower your organization to accelerate innovation securely, as AWS CISO Chris Betz reveals transformative strategies to integrate and automate security, freeing your team to focus on high-value initiatives.
  • ARC203-INT – Architectural methods & breakthroughs in innovative apps in the cloud with Shaown Nandi and Ben Cabanas. This talk showcases how generative AI and cutting-edge architectural advancements are transforming application design, empowering AWS customers to modernize their systems, develop robust data strategies, and securely navigate the evolving cloud landscape.
  • Check out the full list of innovation talks. Not attending live this year? The keynote and all innovation talks will be live streamed.

Sessions

Discover a range of learning opportunities designed to deepen your expertise in securing generative AI. This year’s sessions put a strong focus on providing customers with impactful real-world, practical prescriptions for securing your AI workloads and the data that powers them. Whether you prefer lecture-style breakout sessions, interactive chalk talks, hands-on workshops, or code-driven discussions, there’s a session tailored to help meet your needs. Explore the following options and reserve your spot to enhance your understanding and practical skills in this rapidly evolving field.

You can find more details and descriptions for session levels (100 to 400) and session types on the re:Invent website.

Breakout sessions

Breakout sessions are lecture-style, 1-hour sessions delivered by AWS experts, customers, and partners—perfect for deepening your knowledge on important topics, gaining actionable insights, and connecting with industry leaders.

  • SEC214 – Elevating client experiences with secure AI: Rocket Mortgage’s approach. Discover how Rocket Mortgage implemented AWS generative AI services to enhance customer experiences while navigating security challenges. Register for this session
  • SEC315 – Bring your workforce identities to AWS for generative AI and analytics. This session will demonstrate the power of integrating your workforce identity provider to gain easier access to generative AI applications and tools. AWS and NVIDIA will demonstrate a full end-to-end identity-aware experience. Register for this session
  • SEC323 – The AWS approach to secure generative AI. Learn how AWS secures generative AI across the infrastructure, model, and application layers, giving customers control over their data with built-in security features. Register for this session
  • SEC403 – Generative AI for security in the real world. Explore practical generative AI applications for security teams, including incident response, red team/blue team enablement, and security operations center (SOC) use cases, to boost your security operations. Register for this session

Chalk talks

Chalk talks are 1-hour, highly interactive sessions with a small audience. This format is ideal for diving deep into specific topics, engaging directly with AWS experts, and getting your questions answered in real time.

  • SEC303 – Protecting data within your generative AI architectures. Mitigate risks when training large language models (LLMs) on sensitive data. Explore techniques like sanitization, anonymization, and differential privacy. Register for this talk
  • SEC327 – Building secure network designs for gen AI applications. Optimize your cloud network design to power transformative generative AI applications more securely, as we share proven best practices, proactive controls, and reference architectures to build resilient, defense-in-depth architectures and accelerate innovation on AWS. Register for this talk
  • SEC335 – Harness generative AI for business growth amidst the regulatory landscape. Explore how AWS AI/ML solutions can drive business growth while helping you align to responsible practices. Learn from your peers about their strategies to navigate evolving regulatory landscapes, from the European Union’s General Data Protection Regulation (GDPR) to industry-specific mandates. Register for this talk
  • SEC336 – Security and compliance considerations using Amazon Q Business. Discover best practices for securing your Amazon Q Business application, focusing on access control, data protection, and compliance considerations, so that you can keep your AI assistant secure and compliant. Register for this talk
  • SEC338 – Safeguard your generative AI apps from prompt injections. Learn to protect your generative AI applications from prompt injection attacks by understanding input validation, secure prompt engineering, and content moderation. Register for this talk
  • PEX308 – Securing generative AI on AWS. Explore generative AI security considerations through a partner lens, including how partners can enhance data security and the value-adds that partners bring to a customer’s generative AI workloads. Register for this talk
  • AIM344 – Understanding the deep security controls within Amazon Q Business. Learn about the security-related capabilities and controls within Amazon Q that allow you to confidently use your business data safely. Register for this talk
  • AIM407 – Understand the deep security controls within Amazon Bedrock. Dive deep into the security nuances of Amazon Bedrock, as we unpack the architectures, data flows, and lifecycle management of complex features like Guardrails, Agents, and Knowledge Bases, empowering you to use this generative AI service with uncompromising data privacy and control. Register for this talk
  • DEV323 – OWASP Top 10 for LLMs. Strengthen your skills in securing generative AI applications by exploring real-world vulnerabilities and proven mitigation strategies against the OWASP Top 10 risks for large language models (LLMs), through interactive demos and hands-on exercises. Register for this talk

Code talks

Code talks are similar to our popular chalk talk format, but with a focus on live coding or code samples rather than whiteboarding. These sessions look into the actual code used to build a solution, allowing attendees to understand the “why” behind the approach and witness the development process, including any errors that may arise. Participants are encouraged to ask questions and follow along for a deeper, hands-on learning experience.

  • SEC401 – Inspect and secure your application with generative AI. Harness the power of generative AI to bolster your application security, as we unveil how AI-driven tools can rapidly detect vulnerabilities and recommend remediation strategies, empowering you to build more secure software with ease. Register for this talk
  • SEC405 – Consolidated data protection insights with generative AI. Discover how to secure your AWS KMS keys across your accounts by using Amazon Q in QuickSight for quick, actionable insights. Register for this talk

Builders’ sessions

Interact with small groups, led by an AWS expert providing interactive learning about how to build on AWS. Each builders’ session begins with a short explanation or demonstration of what attendees are building, then it’s your turn to build! The expert will guide you end-to-end through this hands-on experience.

Note: You must bring your own laptop to participate in these sessions.

  • DOP302 – Creating secure code with Amazon Q Developer. Supercharge your coding prowess with Amazon Q Developer, as you gain hands-on experience using its AI-powered capabilities to write more secure, optimized code, detect vulnerabilities, and implement instant remediations—transforming your development workflow. Register for this session
  • SMB302 – Empower your business with defense-in-depth architecture. Empower your small-to-medium business to innovate more securely with generative AI by exploring practical, cost-effective defense-in-depth strategies, layered security architectures, and AI-specific safeguards to build resilient, trusted AI-powered solutions in the AWS Cloud. Register for this session

Workshops

Workshops are 2-hour interactive sessions where you collaborate in teams or work individually to solve real-world challenges by using AWS services, making them perfect for hands-on learning. Each workshop begins with a brief lecture, followed by dedicated time to work through the problem.

Note: Don’t forget to bring your laptop to build alongside AWS experts.

  • SEC305 – Generative AI-based code remediations and patch management at scale. Experience hands-on how to use generative AI to assist in automating vulnerability detection and remediation across AWS Lambda, containers, and Amazon Elastic Compute Cloud (Amazon EC2) at scale, empowering your team to proactively secure your applications. Register for this workshop
  • SEC306 – Securing your generative AI applications on AWS. Gain hands-on experience securing generative AI applications by using AWS services and features. Deploy a vulnerable sample AI app, then implement layered security controls to protect, detect, and respond to issues. Use these best practices to secure your own AI apps when you return home! Register for this workshop
  • SEC309 – AWS IAM Identity Center: Secure access to generative AI applications. You’ll learn how to build an identity-aware chat experience, train it on a sample dataset, and connect it to an external workforce identity provider by using native integration between Amazon Q Business and AWS Identity and Access Management (IAM) Identity Center. Register for this workshop
  • SEC310 – Persona-based access to enterprise data for generative AI applications. Learn how to secure document access in generative AI applications by using retrieval augmented generation (RAG), metadata filtering, and Amazon Cognito in this interactive workshop. Register for this workshop

Expo

Want to talk directly with an AWS security expert on generative AI security, or a variety of other security topics? Then don’t miss this opportunity to have one-on-one conversations with leading AWS security experts in the Security Activation area of the expo floor to help you take your organization’s security posture to new heights.

Delve into key security domains such as:

  • Detection and Response: Explore techniques for detecting and responding to security risks to help protect your workloads at scale.
  • Network and Infrastructure Security: Learn how to build and manage a secure global network with AWS services.
  • Application Security: Discover strategies to ship secure software and address the challenges of application security.
  • Identity and Access Management: Adopt modern cloud-native identity solutions and apply least-privilege access controls.
  • Digital Sovereignty & Data Protection: Maintain control over your data and choose how to secure and manage it in the AWS Cloud.

Still time for fun!

After an inspiring week of transformative insights and deep learning, join us for the world-renowned re:Play party, the ultimate re:Invent sendoff! Immerse yourself in live entertainment from headlining musical artists, scrumptious cuisine, and flowing refreshments as we come together to unwind, connect, and toast to a future of limitless possibilities.

Register today

It’s going to be an amazing event, and we can’t wait to see you at re:Invent 2024! Register now to secure your spot.

 
 

Anna Montalat

Anna is a Senior Product Marketing Manager for AWS generative AI security, which includes helping customers securely deploy Amazon Bedrock, Amazon SageMaker, Amazon Q, and other AI/ML solutions. She is passionate about bringing new and emerging technologies to market, working closely with service teams and enterprise customers. Outside of work, Anna skis through wintertime and sails through summer.
Matt Saner

As a Senior Manager at AWS, Matt leads a team of security specialists who help the world’s most complex organizations tackle critical security challenges. Matt and his team work to transform security organizations into strategic business enablers. Before joining AWS, Matt spent close to two decades in the financial services industry. Outside of work, Matt is a pilot who finds joy in flying general aviation aircraft.

How AWS uses active defense to help protect customers from security threats

Post Syndicated from Chris Betz original https://aws.amazon.com/blogs/security/how-aws-uses-active-defense-to-help-protect-customers-from-security-threats/

AWS is deeply committed to earning and maintaining the trust of customers who rely on us to run their workloads. Security has always been our top priority, which includes designing our own services with security in mind at the outset, and taking proactive measures to mitigate potential threats so that customers can focus on their businesses with confidence. We continuously innovate and invest in advancing our security capabilities.

To help prevent security incidents from disrupting our customers’ businesses, we need to stay ahead of potential threats, and protect customers quickly when we become aware of activities that could be potentially harmful to them. We’ve previously shared details about our sophisticated global honeypot system MadPot and our massive internal neural network graph model Mithra. These are two examples of internal threat intelligence tools that we use to take proactive, real-time action to help prevent a potential threat from becoming a real security issue for our customers.

I mentioned another internal threat intelligence tool called Sonaris in my recent re:Inforce keynote. Sonaris is an active defense tool that analyzes potentially harmful network traffic so that we can quickly and automatically restrict threat actors who are hunting for exploitable vulnerabilities. With MadPot, Mithra, and Sonaris in the hands of our world-class security teams, AWS is equipped with powerful, actionable threat intelligence on a scale that is only possible with AWS. This blog post covers why and how we use Sonaris behind the scenes to help protect customers, and how we know our threat intelligence is having measurable results.

AWS security innovation tackles threats with measurable results for customers at global scale

As organizations have migrated to the cloud over the last decade, threat actors have evolved their tactics to exploit environments that aren’t properly secured. In 2017, our security teams observed an increasing number of unauthorized attempts to scan (systematic examination through digital means and tools) and probe AWS customer accounts—activities conducted by threat actors who were hunting for Amazon Simple Storage Service (Amazon S3) buckets that customers unintentionally configured with public access. To help address this security issue on behalf of our customers, AWS security teams developed active defense capabilities to help detect these kinds of suspicious scanning behaviors and then restrict the actions that malicious actors might take to further improperly access a customer’s S3 bucket.

This novel approach to a new cloud-era security challenge evolved to become the threat intelligence tool we now call Sonaris, which today identifies and automatically restricts unauthorized scanning and S3 bucket discovery within minutes at global scale. Sonaris applies security rules and algorithms to identify anomalies from over two hundred billion events each minute. Preventing opportunistic attempts from threat actors to discover and exploit misconfigurations or out-of-date software represents a significant leap forward in our security capabilities at AWS.

How do we know that the network mitigations performed by Sonaris are actually making a difference for our customers? We can compare threat activity between MadPot sensors, with and without Sonaris protections. To do this, we use MadPot to construct two separate large-scale fleets of honeypot testing groups to compare statistics for each security configuration. One group is protected by our perimeter security controls fed by Sonaris analytics, and a separate fleet receives no protection. This allows us to measure the protective coverage of hosts within the AWS network perimeter.

Findings from these split testing groups underscore the sheer volume of potential threats that Sonaris manages to thwart, and the ongoing work behind the scenes to enhance the security of AWS infrastructure. For example, across the hundreds of different types of malicious interactions that MadPot classified, Sonaris observed an 83% reduction in abuse attempts in September 2024. In the past 12 months, Sonaris denied more than 27 billion attempts to find unintentionally public S3 buckets, and prevented nearly 2.7 trillion attempts to discover vulnerable services on Amazon Elastic Compute Cloud (Amazon EC2). This protection drastically reduces risk for AWS customers.
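As a rough illustration of how a split test like this turns into a reduction figure, the following Python sketch compares interaction counts from an unprotected fleet and a protected fleet. The function name and the counts are hypothetical, chosen only so the arithmetic lands on 83%; they are not AWS data, and this is not a description of how Sonaris or MadPot is implemented.

```python
def reduction_percent(unprotected_count, protected_count):
    """Percentage drop in observed interactions, relative to the unprotected fleet."""
    if unprotected_count <= 0:
        raise ValueError("the unprotected fleet must have observed at least one interaction")
    return 100.0 * (unprotected_count - protected_count) / unprotected_count


# Hypothetical counts of malicious interactions seen by each honeypot fleet
# over the same period (illustrative values, not measurements).
unprotected_fleet = 1000
protected_fleet = 170

print(reduction_percent(unprotected_fleet, protected_fleet))
```

The comparison only means something when both fleets are otherwise equivalent (same size, same emulated services, same time window), so that the protection is the only variable that differs.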

How Sonaris detects and restricts abusive scanning and exploitation attempts

Sonaris plays a critical role in helping to secure AWS and our customers by detecting and then restricting certain suspicious behavior aimed at AWS infrastructure and services. What sets Sonaris apart is its integration of network telemetry sources from across AWS with Amazon threat intelligence data, which allows it to provide safe and effective threat mitigation that reduces indiscriminate scanning activity.

Sonaris applies heuristic, statistical, and machine learning algorithms to vast amounts of the summarized metadata and service health telemetry that we use to operate our services. One threat intelligence source that Sonaris uses is MadPot, which receives traffic on tens of thousands of IP addresses every day. MadPot emulates hundreds of different services and mimics customer accounts, and then classifies these interactions into known Common Vulnerabilities and Exposures (CVEs) and other vulnerabilities. Through MadPot, Sonaris can also integrate additional high-fidelity signals that help identify activities of threat actors with enhanced precision. First-party threat intelligence collected from MadPot increases Sonaris confidence and accuracy for automatically restricting known malicious vulnerability enumeration attacks so that customers are protected automatically.

When Sonaris identifies malicious attempts to scan an AWS IP address or customer account, it triggers automated protections in AWS Shield, Amazon Virtual Private Cloud (Amazon VPC), Amazon S3, and AWS WAF, automatically protecting customer resources from unauthorized activity in real time. Sonaris is judicious about what activities it restricts, only intervening when there is a sufficiently high assurance that the interactions are malicious. For example, to help ensure that legitimate customer interactions are not restricted, we developed dynamic guardrail models to identify what normal behavior looks like in AWS services so that only suspicious activities are detected and acted on. We update and refresh these guardrail models constantly with our latest observations to help avoid taking action on legitimate customer activities.

Sonaris is having real-world impact at scale against dynamic threats that exist today

Throughout 2023 and 2024, a large active botnet known as Dota3 has been scanning the internet for vulnerable hosts and devices to install cryptominer malware (malicious software designed to secretly use a victim’s computer or device resources). Sonaris has been effectively protecting customers from this botnet, even as the botnet’s operators try new ways to evade defenses. In Q3 2024, we observed this botnet’s scanning behavior change as it began using different payloads, rates, and endpoints, as shown in the following figure. Thanks to the layered detection methods of Sonaris, this botnet was unable to avoid our automatic detection. Sonaris automatically protected customers from more than 16,000 malicious scanning endpoints each hour.

Figure 1: Dota3 botnet activity suddenly changes in September 2024


AWS is committed to making the internet a safer place

Although Sonaris reduces risk, it does not eliminate it entirely, and our work is far from over. As we continue to evolve and strengthen our security measures, AWS remains committed to making the internet a safer place so that customers can thrive in an increasingly complex digital landscape while maintaining a strong security posture. Through the creation of active security tools such as Sonaris, and our customers’ diligent application of security best practices, we can collectively create a more secure cloud environment for all.

Your feedback is crucial to us and we encourage you to leave comments, reach out to our customer support teams, or engage with us through your preferred channels. Together, we can shape the future of cloud security and stay ahead of emerging threats.

 
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.
 

Chris Betz

Chris is CISO at AWS. He oversees security teams and leads the development and implementation of security policies with the aim of managing risk and aligning the company’s security posture with business objectives. Chris joined Amazon in August 2023 after holding CISO and security leadership roles at leading companies. He lives in Northern Virginia with his family.

Accelerate application upgrades with Amazon Q Developer agent for code transformation

Post Syndicated from Jonathan Vogel original https://aws.amazon.com/blogs/devops/accelerate-application-upgrades-with-amazon-q-developer-agent-for-code-transformation/

In this blog, we will explore how Amazon Q Developer Agent for code transformation accelerates Java application upgrades. We will examine the benefits of this Generative AI-powered agent and outline strategies to achieve maximal acceleration, drawing from real-world success stories and best practices.

Benefits of using Amazon Q Developer to upgrade your applications

Amazon Q Developer addresses a critical challenge for organizations managing numerous Java applications, particularly as they face the approaching end of Long-Term-Support (LTS) for older Java versions. Upgrading to Java 17 enhances security, resolves vulnerabilities, and improves performance while ensuring long-term compatibility and access to modern features. Currently, Q Developer agent for code transformation supports upgrades from Java 8 and 11 to Java 17. Software developers can utilize Q Developer within their IDE (VS Code and JetBrains) to transform both single-module and multi-module applications. Q Developer will generate a plan that identifies necessary library upgrades and replacements for deprecated code in the application, proposing code changes with the goal of ensuring the transformed code compiles successfully in Java 17. Q Developer can significantly enhance the efficiency of your migration workflow, performing code transformations on applications in hours rather than weeks.

Customer success of using Q Developer to modernize legacy Java applications

Customers have used Q Developer to upgrade their Java applications successfully. Here is how two customers as well as Amazon internal teams use Q Developer to accelerate the migration process.

A large insurance company in North America strategically approached their Java upgrade initiative by identifying applications with dependencies that Q Developer could upgrade effectively. They focused on applications that rely on frameworks like Spring Boot, which can be time-consuming to upgrade manually. After using Q Developer to transform four applications in a pilot, they estimated a 36% acceleration in their upgrade process, indicating that Q Developer automatically completed over a third of the work that would otherwise have been done manually. While the remaining portion still required manual intervention to ensure the code would build and run correctly, the acceleration was significant.

A major financial services firm’s experience with Q Developer proved equally compelling. In a focused two-day workshop, 20 developers successfully transformed 20 production applications using the Amazon Q Developer agent. This resulted in a 42% time saving compared to manual upgrades, saving on average 24 hours per application. The firm spent about three weeks preparing for the transformation workshop. During that time, they identified first-party (1P) dependencies, which are internal libraries that other production applications rely on. Q Developer does not guarantee the upgrade of 1P dependencies. With a combination of Q Developer and manual work, the customer upgraded many of these common 1P dependencies leading up to the workshop. This step was crucial to gain maximum acceleration while using Q Developer for the upgrades.

Amazon uses Q Developer internally to upgrade Java applications following company-wide campaigns. The central team that owns the campaigns provides detailed guidance on which Java applications can be upgraded with Q Developer most effectively. This team also manages Amazon’s internal build system and provides tooling to automate part of the manual effort. As a result, they are able to achieve significant savings: Amazon upgraded more than 50% of production applications in six months, and 79% of the auto-generated code reviews were applied without additional changes.

Use Q Developer to upgrade your applications

To ensure that Q Developer is properly applied to the specific characteristics of their codebases, customers create and follow a transformation approach. Teams and individuals who understand the scope of the upgrade run campaigns across the company to use Q Developer effectively. To maximize the acceleration from Q Developer, these teams classify the applications that need to be upgraded, identify which ones can be upgraded using Q Developer, and estimate the manual effort required, which provides a baseline for measuring the value added by Q Developer agent for code transformation. This preparation phase is crucial before starting the execution phase of the upgrade, and each of its steps plays an important role in maximizing the acceleration that Q Developer brings to the upgrade process.

  1. Classifying the applications to upgrade: Q Developer supports the upgrade of the 30 most common Java libraries. Its performance on less common and internal libraries is lower, so for those you can use a combination of Q Developer and manual steps. It’s recommended to include both production applications and internal dependencies in this step. You should also classify your applications and internal libraries based on whether and how they are used by other applications, which helps you prioritize the applications to upgrade first in campaigns. Classifying applications by the libraries they use can help you identify the best upgrade approach using Q Developer.
  2. Defining baselines of efficiency: To measure the efficiency of the upgrade effort in your organization, it is crucial to establish baselines. Based on the classification of applications, use Q Developer in a pilot for each class to see which libraries are transformed correctly, and which ones have to be done manually. This helps you operationalize the process of using Q Developer and the manual steps required, and understand how this procedure accelerates the upgrade of a certain class of applications. Some customers use the manual effort hours for each upgrade of dependency versions and deprecated code as a baseline, and compare those hours with the time taken to complete the upgrade using Q Developer. For example, you can classify the applications based on the main frameworks used before upgrading them using Q Developer. Compare the time taken by Q Developer with manual upgrade hours to understand which applications can be upgraded by Q Developer most effectively.
  3. Identifying applications for migration: Decide which applications to use Q Developer for, and prioritize the applications to upgrade in waves based on expected acceleration and business value. You can prioritize the applications that are most used by other applications and upgrade them in the initial campaign, then upgrade the rest of the applications in subsequent campaigns. By addressing the foundational components first, you streamline the overall upgrade process. At Amazon, a centralized internal team defines migration waves and identifies which packages are included in each upgrade campaign. This team also analyzes the applications to determine the likelihood of a successful upgrade using Q Developer, and provides an estimate of the remaining engineering effort needed to complete the upgrade. The team uses this information to select applications, and uses an Amazon-internal tool to assign the upgrade tasks to the teams that own the applications. While software development engineers (SDEs) are free to run the upgrade on their own, following a campaign with a set deadline mobilizes the application owner teams to complete the upgrade.
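The preparation steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not part of Q Developer or any Amazon-internal tooling: the function names, application names, class labels, and hour figures are all hypothetical. The first function turns pilot results into a per-class acceleration baseline, and the second ranks components by how many other components depend on them, so that foundational ones can go into the first wave.

```python
from collections import Counter, defaultdict


def acceleration_by_class(pilots):
    """pilots: iterable of (app_class, manual_hours, assisted_hours) from pilot upgrades.

    Returns {app_class: percent of manual effort saved}, aggregated per class.
    """
    totals = defaultdict(lambda: [0.0, 0.0])
    for app_class, manual_hours, assisted_hours in pilots:
        totals[app_class][0] += manual_hours
        totals[app_class][1] += assisted_hours
    return {
        app_class: round(100.0 * (manual - assisted) / manual, 1)
        for app_class, (manual, assisted) in totals.items()
        if manual > 0
    }


def rank_by_dependents(dependencies):
    """dependencies: {component: [internal components it depends on]}.

    Returns component names, most depended-upon first (ties broken by name).
    """
    counts = Counter()
    for component, deps in dependencies.items():
        counts[component] += 0  # make sure components nobody depends on still appear
        for dep in set(deps):
            counts[dep] += 1
    return sorted(counts, key=lambda name: (-counts[name], name))


# Hypothetical pilot measurements and dependency map.
pilots = [
    ("spring-boot", 40, 24),
    ("spring-boot", 60, 36),
    ("internal-framework", 50, 45),
]
dependencies = {
    "billing-service": ["auth-lib", "logging-lib"],
    "orders-service": ["auth-lib"],
    "auth-lib": ["logging-lib"],
    "logging-lib": [],
}

print(acceleration_by_class(pilots))
print(rank_by_dependents(dependencies))
```

In this made-up data, the class with the higher measured acceleration is the better candidate for Q Developer, and the shared libraries rank ahead of the services that consume them.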

Use Q Developer to automate upgrade tasks

Once the preparation phase is completed, you can start the execution phase. Software developers can use Q Developer to accelerate many of the steps in the execution phase.

  1. Assess the components of the application to upgrade. When you start a transformation with Q Developer, it generates a transformation plan that shows which dependencies and deprecated code will be upgraded.
  2. Research and update dependency versions compatible with the target version. Q Developer will analyze your app and attempt to update the dependencies to versions compatible with the target Java version and, in some cases, to the latest version.
  3. Replace deprecated methods and API calls that are not compatible with the target version. Q Developer will detect the deprecated code and attempt to update it to what’s recommended in the compatible Java version.
  4. Review the modified code and address any conflicts or issues that may arise. Q Developer will return code changes to you at the end of the transformation. If the transformation is successful, the app will compile in Java 17. If the transformation is partially successful, Q Developer was able to upgrade library versions and make code changes but could not compile the transformed app successfully in Java 17. See our documentation on how to handle partial transformations.
  5. Test the upgraded application thoroughly to ensure correct functionality. Q Developer will run the unit tests and integration tests in your app when compiling in the target version.

Conclusion

As organizations face the pressing need to modernize their Java applications, Amazon Q Developer emerges as a powerful ally in this complex journey. The customer success stories demonstrate the tangible benefits of leveraging AI-assisted code transformation: significant time savings, reduced manual effort, and accelerated upgrade processes.

Q Developer not only addresses the technical challenges of Java upgrades, but also enables organizations to approach these initiatives strategically. By classifying applications, establishing baselines, and prioritizing upgrades, teams can maximize the efficiency of their modernization efforts. While Q Developer streamlines much of the upgrade process, it is important to note that some challenges may still arise. For a comprehensive understanding of potential challenges and detailed guidance on getting started with Q Developer, we encourage you to explore our public documentation.

The journey to Java 17 and beyond doesn’t have to be daunting. With Amazon Q Developer, you have a powerful tool at your disposal to accelerate your upgrade process, reduce costs, and ensure your applications remain secure, performant, and future-ready.

Take the first step towards modernizing your Java ecosystem today. Explore Amazon Q Developer and discover how it can transform your upgrade strategy. See Getting Started with Amazon Q Developer agent for code transformation for a how-to guide on using Q Developer to transform Java applications.

About the authors

Jonathan Vogel

Jonathan is a Developer Advocate at AWS. He was a DevOps Specialist Solutions Architect at AWS for two years prior to taking on the Developer Advocate role. Prior to AWS, he practiced professional software development for over a decade. Jonathan enjoys music, birding and climbing rocks.

Yiyi Guo

Yiyi is a Senior Product Manager at AWS working on Amazon Q Developer agent for code transformation. She focuses on leveraging generative AI to accelerate enterprise application modernization.

Podcast: Empowering organizations to address their digital sovereignty requirements with AWS

Post Syndicated from Marta Taggart original https://aws.amazon.com/blogs/security/podcast-empowering-organizations-to-address-their-digital-sovereignty-requirements-with-aws/

Developing strategies to navigate the evolving digital sovereignty landscape is a top priority for organizations operating across industries and in the public sector. With data privacy, security, and compliance requirements becoming increasingly complex, organizations are seeking cloud solutions that provide sovereign controls and flexibility. Recently, Max Peterson, Amazon Web Services (AWS) Vice President of Sovereign Cloud, sat down with Daniel Newman, CEO of The Futurum Group and co-founder of Six Five Media, to explore how customers are meeting their unique digital sovereignty needs with AWS. Their thought-provoking conversation delves into the factors that are driving digital sovereignty strategies, the key considerations for customers, and AWS offerings that are designed to deliver control, choice, security, and resilience in the cloud. The podcast includes a discussion of AWS innovations, including the AWS Nitro System, AWS Dedicated Local Zones, AWS Key Management Service External Key Store, and the upcoming AWS European Sovereign Cloud. Check out the episode to gain valuable insights that can help you effectively navigate the digital sovereignty landscape while unlocking the full potential of cloud computing.

Visit Digital Sovereignty at AWS to learn how AWS can help you address your digital sovereignty needs.

 
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Marta Taggart

Marta is a Principal Product Marketing Manager focused on digital sovereignty in AWS Security Product Marketing. Outside of work, you’ll find her trying to make sure that her rescue dog, Jack, lives his best life.

Amazon Q Developer Code Challenge

Post Syndicated from Aaron Sempf original https://aws.amazon.com/blogs/devops/amazon-q-developer-code-challenge/

Amazon Q Developer is a generative artificial intelligence (AI) powered conversational assistant that can help you understand, build, extend, and operate AWS applications. You can ask questions about AWS architecture, your AWS resources, best practices, documentation, support, and more.

With Amazon Q Developer in your IDE, you can write a comment in natural language that outlines a specific task, such as, “Upload a file with server-side encryption.” Based on this information, Amazon Q Developer recommends one or more code snippets directly in the IDE that can accomplish the task. You can quickly and easily accept the top suggestions (tab key), view more suggestions (arrow keys), or continue writing your own code.
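To make this concrete, here is a hand-written sketch of the kind of snippet that a comment like this might lead to. It is not actual Amazon Q Developer output. It assumes a boto3 S3 client is passed in and uses the client's `upload_file` method with the `ServerSideEncryption` extra argument; the function name, file name, bucket, and key are placeholders.

```python
# Upload a file with server-side encryption
def upload_file_with_sse(s3_client, file_path, bucket_name, object_key):
    """Upload a local file to S3 and ask S3 to encrypt it at rest with SSE-S3 (AES-256)."""
    s3_client.upload_file(
        file_path,
        bucket_name,
        object_key,
        ExtraArgs={"ServerSideEncryption": "AES256"},
    )


# Example usage (requires boto3 and AWS credentials; all names are placeholders):
#   import boto3
#   upload_file_with_sse(boto3.client("s3"), "report.csv", "example-bucket", "reports/report.csv")
```

Passing the client in as a parameter keeps the function easy to exercise with a stub in unit tests, without touching a real bucket.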

However, Amazon Q Developer in the IDE is more than just a code completion plugin. Amazon Q Developer is a generative AI (GenAI) powered assistant for software development that can be used to have a conversation about your code, get code suggestions, or ask questions about building software. This provides the benefits of collaborative paired programming, powered by GenAI models that have been trained on billions of lines of code, from the Amazon internal code-base and publicly available sources.

The challenge

At the 2024 AWS Summit in Sydney, an exhilarating code challenge took center stage, pitting a Blue Team against a Red Team, with approximately 10 to 15 challengers in each team, in a battle of coding prowess. The challenge consisted of 20 tasks, starting with basic math and string manipulation, and progressively escalating in difficulty to include complex algorithms and intricate ciphers.

The Blue Team had a distinct advantage, leveraging the powerful capabilities of Amazon Q Developer, the most capable generative AI-powered assistant for software development. With Q Developer’s guidance, the Blue Team navigated increasingly complex tasks with ease, tapping into Q Developer’s vast knowledge base and problem-solving abilities. In contrast, the Red Team competed without assistance, relying solely on their own coding expertise and problem-solving skills to tackle daunting challenges.

As the competition unfolded, the two teams battled it out, each striving to outperform the other. The Blue Team’s efficient use of Amazon Q Developer proved to be a game-changer, allowing them to tackle the most challenging tasks with remarkable speed and accuracy. However, the Red Team’s sheer determination and technical prowess kept them in the running, showcasing their ability to think outside the box and devise innovative solutions.

The culmination of the code challenge was a thrilling finale, with both teams pushing the boundaries of their skills and ultimately leaving the audience in a state of admiration for their remarkable achievements.

Graph of elapsed time of teams in the AWS Sydney Summit code challenge

The graph shows average completion times: Team Blue (“Q Developer”) completed more questions in less time across the board than Team Red (“Solo Coder”). Within the 1-hour time limit, Team Blue got all the way to Question 19, whereas Team Red only got to Question 16.

A few assumptions and caveats apply. People who consider themselves very experienced programmers were encouraged to choose Team Red and not use AI, to test themselves against Team Blue, who used AI. The code challenges were designed to test the output of applying logic. They were specifically designed to be passable without the use of Amazon Q Developer, to test the optimization of writing logical code with Amazon Q Developer. As a result, the code tasks worked well with Amazon Q Developer due to the nature and underlying training of Amazon Q Developer models. Many people who attended the event were not Python programmers (we constrained the challenge to Python only), and walked away impressed at how much of the challenge they could complete.

One of the more complex questions that competitors were given to solve was:

Implement the rail fence cipher.
In the Rail Fence cipher, the message is written downwards on successive "rails" of an imaginary fence, then moving up when we get to the bottom (like a zig-zag). Finally the message is then read off in rows.

For example, using three "rails" and the message "WE ARE DISCOVERED FLEE AT ONCE", the cipherer writes out: 

W . . . E . . . C . . . R . . . L . . . T . . . E
. E . R . D . S . O . E . E . F . E . A . O . C .
. . A . . . I . . . V . . . D . . . E . . . N . .

Then reads off: WECRLTEERDSOEEFEAOCAIVDEN

Given variable a. Use a three-rail fence cipher so that result is equal to the decoded message of variable a.
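For reference, one possible Python solution to this question is sketched below. It is our own illustration, not a solution submitted during the challenge or generated by Amazon Q Developer, and the helper names are hypothetical. It assumes, as in the worked example above, that spaces are removed from the message before encoding.

```python
def rail_pattern(length, rails):
    """Return the 0-based rail index visited at each position of the zig-zag."""
    if rails < 2:
        return [0] * length
    cycle = 2 * (rails - 1)  # positions before the zig-zag repeats
    pattern = []
    for i in range(length):
        pos = i % cycle
        pattern.append(pos if pos < rails else cycle - pos)
    return pattern


def encode(message, rails=3):
    """Write the message along the zig-zag, then read it off rail by rail."""
    pattern = rail_pattern(len(message), rails)
    return "".join(
        ch for rail in range(rails) for ch, r in zip(message, pattern) if r == rail
    )


def decode(ciphertext, rails=3):
    """Invert encode(): put each ciphertext character back at its zig-zag position."""
    pattern = rail_pattern(len(ciphertext), rails)
    # Sorting positions by rail (stable sort) gives the order in which
    # the plaintext positions were read off during encoding.
    read_order = sorted(range(len(ciphertext)), key=lambda i: pattern[i])
    plain = [""] * len(ciphertext)
    for ch, position in zip(ciphertext, read_order):
        plain[position] = ch
    return "".join(plain)


a = "WECRLTEERDSOEEFEAOCAIVDEN"
result = decode(a)
print(result)  # WEAREDISCOVEREDFLEEATONCE
```

Decoding reuses the same zig-zag pattern as encoding, so the two functions stay consistent for any number of rails.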

The questions were both algorithmic and logical in nature, which made them great for testing conversational natural language capability to solve questions using Amazon Q Developer, or by applying one’s own logic to write code to solve the question.

Top scoring individual per team:

Team                           Total questions complete   Individual time (min)
With Q Developer (Blue Team)   19                         30.46
Solo Coder (Red Team)          16                         58.06

By comparing the top two competitors, and considering the solo coder was a highly experienced programmer versus the top Q Developer coder, who was a relatively new programmer not familiar with Python, you can see the efficiency gain when using Q Developer as an AI peer programmer. It took the entire 60 minutes for the solo coder to complete 16 questions, whereas the Q Developer coder got to the final question (Question 20, incomplete) in half of the time.
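One way to put a number on this comparison is questions completed per minute. The short Python calculation below uses only the figures from the table above; the variable names are our own, and throughput is just one possible lens, since the questions grew harder as the challenge progressed.

```python
# Top scorer per team: (questions completed, elapsed minutes), taken from the table.
blue_questions, blue_minutes = 19, 30.46   # with Q Developer
red_questions, red_minutes = 16, 58.06     # solo coder

blue_rate = blue_questions / blue_minutes  # questions per minute
red_rate = red_questions / red_minutes
speedup = blue_rate / red_rate

print(f"Blue: {blue_rate:.2f} questions/min")
print(f"Red:  {red_rate:.2f} questions/min")
print(f"Blue completed questions about {speedup:.1f}x as fast")
```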

Summary

Integrating advanced IDE features and adopting paired programming have significantly improved coding efficiency and quality. However, the introduction of Amazon Q Developer has taken this evolution to new heights. By tapping into Q Developer’s vast knowledge base and problem-solving capabilities, the Blue Team was able to navigate complex coding challenges with remarkable speed and accuracy, outperforming the unassisted Red Team. This highlights the transformative impact of leveraging generative AI as a collaborative pair programmer in modern software development, delivering greater efficiency, problem-solving, and, ultimately, higher-quality code. Get started with Amazon Q Developer for your IDE by installing the plugin and enabling your builder ID today.

About the authors:

Aaron Sempf

Aaron Sempf is Next Gen Tech Lead for the AWS Partner Organization in Asia-Pacific and Japan. With over twenty years in software engineering and distributed systems, he focuses on solving large-scale complex integration and event-driven systems. In his spare time, he can be found coding prototypes for autonomous robots, IoT devices, and distributed solutions, and designing Agentic Architecture patterns for GenAI-assisted business automation.

Paul Kukiel


Paul Kukiel is a Senior Solutions Architect at AWS. With a background of over twenty years in software engineering, he particularly enjoys helping customers build modern, API Driven software architectures at scale. In his spare time, he can be found building prototypes for micro front ends and event driven architectures.

AWS named as a Leader in the first Gartner Magic Quadrant for AI Code Assistants

Post Syndicated from Channy Yun (윤석찬) original https://aws.amazon.com/blogs/aws/aws-named-as-a-leader-in-the-first-gartner-magic-quadrant-for-ai-code-assistants/

On August 19th, 2024, Gartner published its first Magic Quadrant for AI Code Assistants, which includes Amazon Web Services (AWS). Amazon Q Developer qualified for inclusion, having launched in general availability on April 30, 2024. AWS was ranked as a Leader for its ability to execute and completeness of vision.

We believe this Leader placement reflects our rapid pace of innovation, which makes the whole software development lifecycle easier and increases developer productivity with enterprise-grade access controls and security.

The Gartner Magic Quadrant evaluates 12 AI code assistants based on their Ability to Execute, which measures a vendor’s capacity to deliver its products or services effectively, and Completeness of Vision, which assesses a vendor’s understanding of the market and its strategy for future growth, according to Gartner’s report, How Markets and Vendors Are Evaluated in Gartner Magic Quadrants.

Here is the graphical representation of the 2024 Gartner Magic Quadrant for AI Code Assistants.

Here is the quote from Gartner’s report:

Amazon Web Services (AWS) is a Leader in this Magic Quadrant. Its product, Amazon Q Developer (formerly CodeWhisperer), is focused on assisting and automating developer tasks using AI. For example, Amazon Q Developer helps with code suggestions and transformation, testing and security, as well as feature development. Its operations are geographically diverse, and its clients are of all sizes. AWS is focused on delivering AI-driven solutions that enhance the software development life cycle (SDLC), automating complex tasks, optimizing performance, ensuring security, and driving innovation.

My team creates content about Amazon Q Developer that directly supports software developers’ jobs-to-be-done, enabled and enhanced by generative AI, in the Amazon Q Developer Center and on Community.aws.

I’ve had the chance to talk with our customers and ask why they choose Amazon Q Developer. They said it accelerates and completes tasks across far more of the SDLC than general-purpose AI code assistants, from coding, testing, and upgrading to troubleshooting, performing security scanning and fixes, optimizing AWS resources, and creating data engineering pipelines.

Here are the highlights that customers talked about most often:

Available everywhere you need it – You can use Amazon Q Developer in any of the following integrated development environments (IDEs): Visual Studio Code, JetBrains IDEs, AWS Toolkit with Amazon Q, JupyterLab, Amazon EMR Studio, Amazon SageMaker Studio, or AWS Glue Studio. You can also use Amazon Q Developer in the AWS Management Console, AWS Command Line Interface (AWS CLI), AWS documentation, AWS Support, AWS Console Mobile Application, Amazon CodeCatalyst, or through Slack and Microsoft Teams with AWS Chatbot. According to Safe Software, “Amazon Q knows all the ways to make use of the many tools that AWS provides. Because we are now able to accomplish more, we will be able to extend our automations into other AWS services and make use of Amazon Q to help us get there.” To learn more, visit Amazon Q Developer features and Amazon Q Developer customers.

Customizing code recommendations – You can get code recommendations based on your internal code base. By making Amazon Q Developer aware of your internal libraries, APIs, best practices, and architectural patterns, you accelerate onboarding to a new code base and get even more relevant inline code recommendations and chat responses (in preview). Your organization’s administrators can securely connect Amazon Q Developer to your internal code bases to create multiple customizations. National Australia Bank (NAB) has used the Amazon Q customization capability to add specific suggestions tailored to NAB coding standards, and reports increased acceptance rates of 60 percent with customization. To learn more, visit Customizing suggestions in the AWS documentation.

Upgrading your Java applications – Amazon Q Developer Agent for code transformation automates the process of upgrading and transforming your legacy Java applications. According to an internal Amazon study, Amazon has migrated tens of thousands of production applications from Java 8 or 11 to Java 17 with assistance from Amazon Q Developer. This represents a savings of over 4,500 years of development work for over a thousand developers (when compared to manual upgrades) and performance improvements worth $260 million in annual cost savings. Transformations from Windows to cross-platform .NET are also coming soon! To learn more, visit Upgrading language versions with the Amazon Q Developer Agent for code transformation in the AWS documentation.

Access the complete 2024 Gartner Magic Quadrant for AI Code Assistants report to learn more.

Channy

Gartner Magic Quadrant for AI Code Assistants, Arun Batchu, Philip Walsh, Matt Brasier, Haritha Khandabattu, 19 August, 2024.

Gartner does not endorse any vendor, product or service depicted in its research publications and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.

GARTNER is a registered trademark and service mark of Gartner and Magic Quadrant is a registered trademark of Gartner, Inc. and/or its affiliates in the U.S. and internationally and are used herein with permission. All rights reserved.