Tag Archives: Financial Services

Modernization of real-time payment orchestration on AWS

Post Syndicated from Neeraj Kaushik original https://aws.amazon.com/blogs/architecture/modernization-of-real-time-payment-orchestration-on-aws/

The global real-time payments market is experiencing significant growth. According to Fortune Business Insights, the market was valued at USD 24.91 billion in 2024 and is projected to grow to USD 284.49 billion by 2032, with a CAGR of 35.4%. Similarly, Grand View Research reports that the global mobile payment market, valued at USD 88.50 billion in 2024, is expected to grow at a CAGR of 38.0% from 2025 to 2030. (Disclaimer: Third-party market research and statistics are provided for informational purposed only. AWS and IBM make no representations about the accuracy of this information.)

This rapid expansion underscores the urgency for financial institutions to modernize their payment processing infrastructure. Financial institutions often need to process high volume of transactions with near-zero latency to meet stringent service level agreements (SLAs) to support surging mobile payments volume.

However, traditional payment orchestration systems, often built on monolithic architectures, struggle to meet these demands due to latency, availability, and scalability challenges. Additionally, their reliance on on-premises infrastructure leads to higher costs and an impediment to innovation, reinforcing the need for modernization.

As sustainability becomes a priority, organizations are turning to cloud-based solutions to optimize infrastructure, reduce carbon footprints, and enhance energy efficiency. This shift provides scalability and performance, and aligns with global sustainability goals, securing the future of real-time payments.

In this post, we discuss the real-time payment orchestration framework. It uses an event-driven architecture and AWS serverless services to enhance the resiliency, efficiency, and scalability of real-time payments. By decomposing payment processing into distinct business capabilities, financial institutions can improve modularity and flexibility. Implementing tenant-based segregation helps with data isolation and security. Additionally, adopting asynchronous communication through Amazon Managed Streaming for Apache Kafka (Amazon MSK) enhances scalability and resilience.

Traditional real-time payment orchestration

Payment orchestration serves as a middleware solution, streamlining transaction processing across multiple payment methods, gateways, and financial institutions. It orchestrates key business functions such as payment authorization, payment processing, settlement and clearing, compliance and risk management, and account management for both inbound and outbound payment flows.

The following diagram depicts the high-level business capabilities supported by payment orchestrators across various payment flows, including real-time payments, digital disbursements, tax payments, wires, and more.

Payment processing system flowchart showing main components from acceptance to billing

Detailed flowchart depicting a payment processing system with multiple components. The diagram shows primary payment types at the top (including Realtime Payments, Digital Disbursement, Credit Transfer, and Peer to Peer Payments) flowing down through core processing stages including Payment Acceptance, Execution, Clearing, Reporting, Tracking, Reversals, and Billing.

Many financial institutions adopt a tenant-based approach organized by geography due to varying clearing processes, localized regulations, and transaction requirements across AWS Regions. However, without proper separation of services, teams often continue to add region-specific logic to existing services, gradually increasing their monolithic complexity and using the same infrastructure for all payment flows.

Traditional payment systems process transactions linearly, with each step waiting for the previous one to complete. However, analysis of payment workflows reveals numerous opportunities for parallel execution:

  • Sanctions screening and fraud detection – Compliance and fraud checks can run simultaneously with initial routing decisions, rather than sequentially blocking all subsequent processing
  • Payment routing and authorization requests – When basic validations are complete, routing and authorization can proceed in parallel rather than one after another
  • Payment execution and ledger updates – The actual payment execution doesn’t need to wait for ledger records to be updated—these can occur concurrently
  • Settlement, reconciliation, and tracking – These post-transaction processes can be initiated independently as soon as the primary transaction is complete

This parallel approach can dramatically improve throughput and reduce latency compared to traditional queue-based systems where operations form a sequential chain that extends processing time and creates bottlenecks.

Most legacy payment orchestration systems rely heavily on on-premises virtual machines (VMs), leading to several challenges:

  • Multi-Region support for disaster recovery and multi-tenancy resulting in significant capital expenditure and operational overhead
  • High latency and SLA issues caused by sequential message processing and delays between globally separated data centers
  • Limited reusability of payment flows as monolithic architectures require region-specific changes for local clearing mechanisms and regulations, increasing complexity and costs
  • Scalability challenges and high memory consumption due to inefficient resource utilization and execution of irrelevant logic across regions
  • Complex cross-border payment routing caused by variations in clearing rules, transaction limits, and local regulations, increasing latency and routing errors
  • Integration challenges with diverse data formats because legacy systems rely on proprietary standards (for example, ISO 20022, SWIFT MT), complicating data conversion and compliance
  • High deployment complexity for new payment flows due to monolithic architectures requiring extensive region-specific modifications, slowing time to market
  • Environmental impact and high carbon footprint from on-premises infrastructure consuming excessive energy, whereas cloud-based approaches improve efficiency

Solution overview

To overcome these challenges, the proposed architecture embraces the following design principles to build a future-ready, real-time payment orchestration solution:

  • Performance at scale – Handling over 1,000 transactions per second (TPS) with consistent low latency under varying load conditions.
  • High availability – Achieving 99.999% uptime to meet the strict requirements of financial transactions.
  • Geographic resilience – Supporting global operations with region-specific compliance while maintaining consistent performance.
  • Cost optimization – Reducing total cost of ownership through efficient resource utilization and serverless technologies.
  • Security and compliance – Supporting data protection and regulatory adherence across different jurisdictions.
  • Operational simplicity – Streamlining deployment, monitoring, and maintenance across the payment ecosystem.
  • Microservices – Decomposing payment processing into distinct business capabilities, so financial institutions can improve modularity and flexibility. This microservices-based approach allows for independent scaling and development of critical components.

The following diagram depicts the high-level solution architecture for real-time payments. The existing channels using synchronous or asynchronous APIs can be modified to use edge-optimized endpoints to reduce latency.

Event-driven payment orchestration system with pub/sub channels connecting multiple payment processing modules

Architecture diagram detailing an AWS-based payment orchestration platform utilizing event-driven principles. Features reusable components across two regions, with dedicated modules for payment initiation, execution, reconciliation, billing, and risk management. Implements pub/sub messaging patterns for inter-component communication and connects to enterprise systems including accounting, compliance, and analytics.

An event-driven architecture is used for payment orchestration, which handles communication through a pub/sub pattern. This architecture maintains persistent connections, improving performance of the end-to-end real-time payment processing.

The event-driven architecture for real-time payment processing allows multiple payment operations to occur simultaneously using different adaptors, as opposed to the traditional systems where payment processes are sequential and flow through a single pipeline. Payment events are distributed to specialized payment processor microservices based on their function (initiation, execution, tracking, settlements), enabling each to process independently without waiting for others to complete.

Because we’re transitioning from sequential processing to distributed, maintaining transaction traceability is crucial. The payment tracking adapters shown in the preceding diagram connect to enterprise analytics systems, creating a specialized layer for monitoring transactions. The pub/sub model allows for attaching correlation IDs to events, enabling systems to track related events across different topics and processing stages.

A standardized event schema serves as the foundation for this architecture, providing consistency across regional deployments while allowing for customization at the adapter level. This schema defines uniform event structures containing tenant-specific metadata and supports versioning to accommodate evolving requirements. By isolating region-specific variations to the adapter layer, the solution maintains core functionality while interfacing with diverse enterprise systems through configuration-driven customization rather than code changes.

For most payment processes, especially those with independent processing steps that can run in parallel, this architecture delivers net performance gains despite the topic switching overhead, particularly for complex transactions where multiple independent validations or processing steps are required.

Deployment on the AWS Cloud

The solution uses edge-optimized Amazon API Gateway for channels. An edge-optimized API endpoint routes requests to the nearest Amazon CloudFront Point of Presence (POP), which can help in cases where your clients are geographically distributed to enable efficient routing within each geographical region, enhancing global responsiveness by minimizing network round trips and making sure requests take the shortest possible path before transitioning from the public internet to the client network.

The following diagram illustrates the high-level solution architecture for real-time payments.

Multi-region AWS payment architecture with managed Kafka topics connecting Lambda microservices and DynamoDB storage

Comprehensive AWS payment orchestration solution implementing modern cloud-native architecture principles. Core processing logic implemented as Lambda functions covering initiation, execution, reconciliation, billing, tracking, risk management, and settlement workflows. Leverages Amazon MSK for reliable event streaming between components, with dedicated Kafka topics for each processing stage. Data persistence handled by Amazon DynamoDB, supporting cross-region operations. Architecture demonstrates AWS best practices for financial services, including regional redundancy, serverless computing, managed services, and event-driven design patterns. System integrates with external banking infrastructure and enterprise systems while maintaining separation of concerns through microservices architecture. Features built-in support for compliance monitoring, risk management, and payment tracking through specialized Lambda functions.

The solution uses Amazon MSK to implement an event-driven architecture that efficiently handles both inbound and outbound channels traffic through API requests and asynchronous message-based events. Amazon MSK communicates using a high-performance binary protocol between producers, consumers, and brokers, providing low latency and high throughput. Real-time payments are logically partitioned across multiple tenants within geographical regions—North America, EMEA, LATAM, and Asia-Pacific.

Each real-time payment tenant follows an active/active disaster recovery strategy by deploying MSK clusters across multiple AWS Regions, designed to achieve high availability and resilience. Amazon MSK offer both serverless and provisioned cluster options. The team can decide to select one or the other depending on the non-functional requirements and team expertise. Amazon MSK automatically manages partition leadership with leaders in primary Regions and followers in secondary Regions. During failover, leaders are re-elected in healthy Regions, designed to help maintain processing capabilities during regional incidents. Sticky partitioning uses consistent hashing for deterministic routing, and cooperative rebalancing enables efficient failover. Multi-AZ deployment provides zone redundancy and isolated clusters per Region for data sovereignty compliance through programmatic AWS Identity and Access Management (IAM) and virtual private cloud (VPC) boundaries.

To support seamless cross-Region replication and maintain message continuity, Amazon MSK Replicator—a fully managed feature of Amazon MSK—is used to replicate topics and synchronize consumer group offsets across clusters. MSK Replicator simplifies the process of building multi-Region Kafka applications by not needing custom code, open-source tool configuration, or infrastructure management. It automatically provisions and scales the necessary resources, so teams can focus on business logic while only paying for the data being replicated. In the event of a regional outage or failover, traffic can be automatically redirected to a healthy Region without data loss or service disruption, providing near-zero Recovery Time Objectives (RTOs) and uninterrupted operations for downstream services such as payment processors and audit trail consumers.

In addition to regional redundancy, the architecture uses an event-driven architecture to enable parallel and decoupled processing of payment transactions. Events such as transaction initiation, validation, and settlement are emitted asynchronously and consumed by various microservices independently, which drastically reduces end-to-end latency.

To process these events at scale, the architecture can use AWS Lambda, Amazon Elastic Container Service (Amazon ECS), or Amazon Elastic Kubernetes Service (Amazon EKS) depending upon non-functional requirements. Automatic scaling responds to Amazon CloudWatch metrics, and exponential backoff retry logic with dead-letter queues (DLQs) handles throttling scenarios. Circuit breakers prevent cascade failures during high error rates.

One of the key benefits of the solution is the reusability of payment flows across different regions. Although each region has its own unique compliance requirements and settlement rules, the core functionalities of real-time payments (payment authorization, payment processing, settlement and clearing) are largely similar. This reusability enables rapid deployment of payment solutions across new regions without rearchitecting the entire system. For example, the real-time payment system in the US and UK might share similar business logic for real-time gross settlement but differ in the clearing and compliance requirements. The solution treats these as bounded contexts within the microservices architecture, providing flexibility while making sure each region can handle its own specific rules and regulations.

Sustainability

AWS relentlessly innovates its infrastructure design, build, and operations to make progress towards net-zero carbon by 2040 and being water positive by 2030. Amazon MSK with AWS Graviton based instances use up to 60% less energy than comparable M5 instances, helping you achieve your sustainability goals. Lambda is inherently sustainable by design. Its serverless model makes sure compute resources are only used when needed, drastically reducing idle infrastructure and wasted energy. Instead of keeping always-on servers for infrequent tasks, Lambda provisions compute power just-in-time, achieving near-zero idle capacity.

Security and compliance in financial services

Given the sensitive nature of payment transactions and financial data, you should apply the security controls required to meet financial regulations such as AWS PCI DSS and AWS Federal Information Processing Standard (FIPS) 140-3 according to your organization’s needs.

The solution should incorporate multi-layered security controls, continuous monitoring, and automated compliance auditing to meet the rigorous expectations of banking regulators and internal risk teams. For more information, refer to Security Guidance.

Conclusion

The modernization of payment orchestration systems using an event-driven architecture and AWS serverless technologies marks a significant advancement in meeting the demands of today’s rapidly evolving financial services landscape. This solution addresses the key challenges faced by traditional payment systems while delivering substantial benefits in performance, scalability, cost optimization, global resilience, sustainability, and compliance. By using cutting-edge cloud technologies and robust security controls, financial institutions can now build a future-ready foundation that adapts to evolving business needs while maintaining the highest standards of performance, security, and reliability. As the real-time payments market continues its explosive growth, this modern architecture provides a solution that meets today’s demands and is also well-positioned to support tomorrow’s payment innovations. Organizations looking to modernize their payment infrastructure can use this blueprint to accelerate their digital transformation journey, supporting sustainable, secure, and efficient payment processing at scale in an increasingly competitive global marketplace.

The architecture presented here is for reference purposes only. IBM will work closely with you to deploy the solution in accordance with industry standards and compliance requirements.For additional resources, refer to:

IBM Consulting is an AWS Premier Tier Services Partner that helps customers who use AWS to harness the power of innovation and drive their business transformation. They are recognized as a Global Systems Integrator (GSI) for over 22 competencies, including Financial Services Consulting. For additional information, please contact an IBM Representative.

Multi-Region keys: A new approach to key replication in AWS Payment Cryptography

Post Syndicated from Ruy Cavalcanti original https://aws.amazon.com/blogs/security/multi-region-keys-a-new-approach-to-key-replication-in-aws-payment-cryptography/

In our previous blog post (Part 1 of our key replication series), Automatically replicate your card payment keys across AWS Regions, we explored an event-driven, serverless architecture using AWS PrivateLink to securely replicate card payment keys across AWS Regions. That solution demonstrated how to build a custom replication framework for payment cryptography keys.

Based on customer feedback requesting a more automated, no-code approach, we’re excited to announce an additional option to this capability with Multi-Region keys for AWS Payment Cryptography in Part 2 of our series.

By using this new feature, you can automatically synchronize payment cryptography keys from a primary Region to other Regions that you select, improving resilience and availability of payment applications. You can also choose between account-level replication or key-level replication, giving more flexibility in how to manage payment keys across Regions.

Multi-Region keys: Overview and benefits

The new Multi-Region key replication feature for AWS Payment Cryptography offers you flexible control over your key replication strategy through the following primary capabilities:

  • Control whether keys are replicated
  • Select specific Regions for key replication
  • Manage replication configuration changes
  • Configure either account-level or key-level replication to meet business needs

Multi-Region keys help deliver several benefits for global payment operations, including:

  • Improved availability: Access your payment keys even if a Region becomes unavailable
  • Disaster recovery: Maintain business continuity with replicated keys across Regions
  • Global operations: Support payment processing across multiple geographic regions
  • Simplified management: Centralized control with distributed availability
  • Consistent key IDs: The same key ID across Regions simplifies application development

Configuration options

Payment Cryptography provides two distinct methods for configuring Multi-Region key replication, giving flexibility to implement a strategy that best fits your organization’s needs. You can choose between a broad, account-level approach or a more granular, key-level method.

Account-level

With account-level configuration, AWS automatically replicates exportable symmetric keys created in your Payment Cryptography account from your designated primary Region to other Regions you specify. This simplifies key management in multi-Region deployments, provides consistent key availability in the Regions that you specify, and reduces the operational overhead of key management.

To configure account-level replication using the AWS Command Line Interface (AWS CLI), use the new enable-default-key-replication-regions API to set the Regions where AWS will replicate your keys. To remove Regions from your default replication list, use the disable-default-key-replication-regions API.

Note: Only symmetric keys created after the account-level replication is enabled will be replicated.

Key-level replication

By using key-level replication, you can achieve more granular control by:

  • Designating specific keys as multi-Region keys
  • Defining custom replication targets for each multi-Region key
  • Maintaining Region-specific keys when needed

Note: Within each Region, Payment Cryptography maintains redundancy of your keys across multiple Availability Zones for high availability. Multi-Region key replication extends across geographic boundaries, giving you additional resilience against Regional outages while maintaining control over where your keys are stored.

You can specify replication Regions during key creation using the --replication-regions parameter, using the AWS CLI, with the create-key or import-key APIs. For existing keys, you can use the new add-key-replication-regions and remove-key-replication-regions APIs to manage which regions receive your replicated keys.

Important: When you specify replication Regions during key creation, these settings take precedence over default replication Regions configured at the account level.

How it works

Figure 1 shows the process when you replicate a key in Payment Cryptography.

  1. The key is created in your designated primary Region
  2. Payment Cryptography automatically replicates the key material asynchronously to the specified replica Regions
  3. The replicated keys maintain the same key ID across Regions; only the Region portion of the Amazon Resource Name (ARN) changes
  4. The key in the primary Region is marked with MultiRegionKeyType: PRIMARY
  5. Keys in replica Regions are marked with MultiRegionKeyType: REPLICA and include a reference to the primary Region
  6. When deleting a key, its deletion cascades from the primary to replica Regions

Figure 1: Representation of key replication from us-east-1 to us-west-2

Figure 1: Representation of key replication from us-east-1 to us-west-2

Example: Creating a multi-Region key at key level

The following is an example of creating a card verification key (CVK) in the primary Region (us-east-1) with replication to us-west-2:

aws payment-cryptography create-key \
--exportable \
--key-attributes KeyAlgorithm=TDES_2KEY,\
KeyUsage=TR31_C0_CARD_VERIFICATION_KEY,\
KeyClass=SYMMETRIC_KEY,KeyModesOfUse='{Generate=true,Verify=true}' \
--region us-east-1 \
--replication-regions us-west-2

The response shows the key being created with replication in progress:

{
  "Key": {
    "KeyArn": "arn:aws:payment-cryptography:us-east-1:111122223333:key/qs6643jl4ohibtqk",
    "KeyAttributes": {
      "KeyUsage": "TR31_C0_CARD_VERIFICATION_KEY",
      "KeyClass": "SYMMETRIC_KEY",
      "KeyAlgorithm": "TDES_2KEY",
      "KeyModesOfUse": {
        "Encrypt": false,
        "Decrypt": false,
        "Wrap": false,
        "Unwrap": false,
        "Generate": true,
        "Sign": false,
        "Verify": true,
        "DeriveKey": false,
        "NoRestrictions": false
      }
    },
    "KeyCheckValue": "CC5EE2",
    "KeyCheckValueAlgorithm": "ANSI_X9_24",
    "Enabled": true,
    "Exportable": true,
    "KeyState": "CREATE_COMPLETE",
    "KeyOrigin": "AWS_PAYMENT_CRYPTOGRAPHY",
    "CreateTimestamp": "2025-08-21T15:25:54.475000-03:00",
    "UsageStartTimestamp": "2025-08-21T15:25:54.287000-03:00",
    "MultiRegionKeyType": "PRIMARY",
    "ReplicationStatus": {
      "us-west-2": {
        "Status": "IN_PROGRESS"
      }
    },
    "UsingDefaultReplicationRegions": false
  }
}

After replication completes, the status updates to SYNCHRONIZED:

aws payment-cryptography get-key \
--key-identifier arn:aws:payment-cryptography:us-east-1:111122223333:key/qs6643jl4ohibtqk \
--region us-east-1

{
    "Key": {
        "KeyArn": "arn:aws:payment-cryptography:us-east-1:111122223333:key/qs6643jl4ohibtqk",
        "KeyAttributes": {
            "KeyUsage": "TR31_C0_CARD_VERIFICATION_KEY",
            "KeyClass": "SYMMETRIC_KEY",
            "KeyAlgorithm": "TDES_2KEY",
            "KeyModesOfUse": {
                "Encrypt": false,
                "Decrypt": false,
                "Wrap": false,
                "Unwrap": false,
                "Generate": true,
                "Sign": false,
                "Verify": true,
                "DeriveKey": false,
                "NoRestrictions": false
            }
        },
        "KeyCheckValue": "CC5EE2",
        "KeyCheckValueAlgorithm": "ANSI_X9_24",
        "Enabled": true,
        "Exportable": true,
        "KeyState": "CREATE_COMPLETE",
        "KeyOrigin": "AWS_PAYMENT_CRYPTOGRAPHY",
        "CreateTimestamp": "2025-08-21T15:25:54.475000-03:00",
        "UsageStartTimestamp": "2025-08-21T15:25:54.287000-03:00",
        "MultiRegionKeyType": "PRIMARY",
        "ReplicationStatus": {
            "us-west-2": {
                "Status": "SYNCHRONIZED"
            }
        },
        "UsingDefaultReplicationRegions": false
    }
}

You can then access the key in the replica Region (us-west-2) using the same key ID and changing only the Region name:

aws payment-cryptography get-key \
--key-identifier arn:aws:payment-cryptography:us-west-2:111122223333:key/qs6643jl4ohibtqk \
--region us-west-2

The response shows the replica key with a reference to the primary Region:

{
    "Key": {
        "KeyArn": "arn:aws:payment-cryptography:us-west-2:111122223333:key/qs6643jl4ohibtqk",
        "KeyAttributes": {
            "KeyUsage": "TR31_C0_CARD_VERIFICATION_KEY",
            "KeyClass": "SYMMETRIC_KEY",
            "KeyAlgorithm": "TDES_2KEY",
            "KeyModesOfUse": {
                "Encrypt": false,
                "Decrypt": false,
                "Wrap": false,
                "Unwrap": false,
                "Generate": true,
                "Sign": false,
                "Verify": true,
                "DeriveKey": false,
                "NoRestrictions": false
            }
        },
        "KeyCheckValue": "CC5EE2",
        "KeyCheckValueAlgorithm": "ANSI_X9_24",
        "Enabled": true,
        "Exportable": true,
        "KeyState": "CREATE_COMPLETE",
        "KeyOrigin": "AWS_PAYMENT_CRYPTOGRAPHY",
        "CreateTimestamp": "2025-08-21T15:25:54.475000-03:00",
        "UsageStartTimestamp": "2025-08-21T15:25:54.287000-03:00",
        "MultiRegionKeyType": "REPLICA",
        "PrimaryRegion": "us-east-1"
    }
}

Things to consider

When using multi-Region keys, several important aspects should be considered. Multi-Region key replication supports only symmetric keys with the exportable attribute enabled, and asymmetric keys are not supported. For billing purposes, AWS bills per key per Region, which means replicating to three Regions incurs costs for the primary key plus costs for each key in the replica Regions.

Key aliases and tags require separate management in each Region because they are not part of the replication process. While primary keys support modifications and updates, replica keys are read-only copies that support only cryptographic operations. Modifications must be made to the key in the primary Region, and Payment Cryptography automatically propagates these changes to the replica Regions. Monitor the replication status to confirm successful synchronization of these changes.

The deletion process for multi-Region keys follows specific behavior patterns that are important to understand. When a primary key is scheduled for deletion, associated replica keys are deleted immediately. The primary key enters a pending deletion state with a minimum 3-day waiting period, during which the deletion can be canceled. However, if you restore the primary key by canceling its deletion, you will need to re-enable replication to recreate the replica keys in your desired Regions. After the 3-day waiting period expires, the primary key is permanently deleted and becomes unrecoverable. Note that deleting a replica key affects only that specific Region and does not impact the primary key or other replica keys.

Multi-Region key replication operates with eventual consistency. When creating new keys or making changes to existing keys, these updates might not appear immediately across all Regions. Applications should be designed to handle this eventual consistency model and not assume immediate availability of keys or key changes in replica Regions. If your application requires strong consistency, implement polling mechanisms using the GetKey API to verify that changes have been synchronized before proceeding with key operations.

Logging and monitoring

Payment Cryptography logs API activity through AWS CloudTrail, which now includes new events and attributes specific to Multi-Region key replication.

New CloudTrail event

The service logs a new event type called SynchronizeMultiRegionKey, which appears in primary and replica Regions.

Primary Region events:

Two SynchronizeMultiRegionKey events are logged in the primary Region for each replication Region defined:

One event related to a key export process.

{
    "eventVersion": "1.11",
    "userIdentity": {
        "accountId": "111122223333",
        "invokedBy": "payment-cryptography.amazonaws.com"
    },
    "eventTime": "2025-08-21T18:25:56Z",
    "eventSource": "payment-cryptography.amazonaws.com",
    "eventName": "SynchronizeMultiRegionKey",
    "awsRegion": "us-east-1",
    "sourceIPAddress": "payment-cryptography.amazonaws.com",
    "userAgent": "payment-cryptography.amazonaws.com",
    "requestParameters": null,
    "responseElements": null,
    "eventID": "fbae27f1-f2ad-49d1-ab05-d460b0b4ca25",
    "readOnly": false,
    "eventType": "AwsServiceEvent",
    "managementEvent": true,
    "recipientAccountId": "111122223333",
    "serviceEventDetails": {
        "keyArn": "arn:aws:payment-cryptography:us-east-1:111122223333:key/qs6643jl4ohibtqk",
        "replicationRegion": "us-west-2",
        "replicationType": "ExportKeyReplica"
    },
    "eventCategory": "Management"
}

One event related to a key import process.

{
    "eventVersion": "1.11",
    "userIdentity": {
        "accountId": "111122223333",
        "invokedBy": "payment-cryptography.amazonaws.com"
    },
    "eventTime": "2025-08-21T18:25:56Z",
    "eventSource": "payment-cryptography.amazonaws.com",
    "eventName": "SynchronizeMultiRegionKey",
    "awsRegion": "us-east-1",
    "sourceIPAddress": "payment-cryptography.amazonaws.com",
    "userAgent": "payment-cryptography.amazonaws.com",
    "requestParameters": null,
    "responseElements": null,
    "eventID": "5c06716f-88ea-4315-b633-5dde83d7232c",
    "readOnly": false,
    "eventType": "AwsServiceEvent",
    "managementEvent": true,
    "recipientAccountId": "111122223333",
    "serviceEventDetails": {
        "keyArn": "arn:aws:payment-cryptography:us-east-1:111122223333:key/qs6643jl4ohibtqk",
        "replicationRegion": "us-west-2",
        "replicationType": "ImportKeyReplica"
    },
    "eventCategory": "Management"
}

Replica Region events:

One SynchronizeMultiRegionKey event is logged as an import key process in each replicated Region.

{
    "eventVersion": "1.11",
    "userIdentity": {
        "accountId": "111122223333",
        "invokedBy": "payment-cryptography.amazonaws.com"
    },
    "eventTime": "2025-08-21T18:25:56Z",
    "eventSource": "payment-cryptography.amazonaws.com",
    "eventName": "SynchronizeMultiRegionKey",
    "awsRegion": "us-west-2",
    "sourceIPAddress": "payment-cryptography.amazonaws.com",
    "userAgent": "payment-cryptography.amazonaws.com",
    "requestParameters": null,
    "responseElements": null,
    "eventID": "0a952017-dd89-435e-8959-5de7b43c86d5",
    "readOnly": false,
    "eventType": "AwsServiceEvent",
    "managementEvent": true,
    "recipientAccountId": "111122223333",
    "serviceEventDetails": {
        "keyArn": "arn:aws:payment-cryptography:us-west-2:111122223333:key/qs6643jl4ohibtqk",
        "replicationRegion": "us-west-2",
        "replicationType": "ImportKeyReplica"
    },
    "eventCategory": "Management"
}

New CloudTrail event attributes

New attributes were included in the service key management APIs. The following are examples of the CreateKey API highlighting the new attributes.

One CreateKey event in the primary Region:

{
    "eventVersion": "1.11",
...
    "eventTime": "2025-08-21T18:25:54Z",
    "eventSource": "payment-cryptography.amazonaws.com",
    "eventName": "CreateKey",
    "awsRegion": "us-east-1",
...
    "requestParameters": {
        "keyAttributes": {
            "keyUsage": "TR31_C0_CARD_VERIFICATION_KEY",
            "keyClass": "SYMMETRIC_KEY",
            "keyAlgorithm": "TDES_2KEY",
            "keyModesOfUse": {
                "encrypt": false,
                "decrypt": false,
                "wrap": false,
                "unwrap": false,
                "generate": true,
                "sign": false,
                "verify": true,
                "deriveKey": false,
                "noRestrictions": false
            }
        },
        "exportable": true,
        "replicationRegions": [
            "us-west-2"
        ]
    },
    "responseElements": {
        "key": {
            "keyArn": "arn:aws:payment-cryptography:us-east-1:111122223333:key/qs6643jl4ohibtqk",
            "keyAttributes": {
                "keyUsage": "TR31_C0_CARD_VERIFICATION_KEY",
                "keyClass": "SYMMETRIC_KEY",
                "keyAlgorithm": "TDES_2KEY",
                "keyModesOfUse": {
                    "encrypt": false,
                    "decrypt": false,
                    "wrap": false,
                    "unwrap": false,
                    "generate": true,
                    "sign": false,
                    "verify": true,
                    "deriveKey": false,
                    "noRestrictions": false
                }
            },
            "keyCheckValue": "CC5EE2",
            "keyCheckValueAlgorithm": "ANSI_X9_24",
            "enabled": true,
            "exportable": true,
            "keyState": "CREATE_COMPLETE",
            "keyOrigin": "AWS_PAYMENT_CRYPTOGRAPHY",
            "createTimestamp": "Aug 21, 2025, 6:25:54 PM",
            "usageStartTimestamp": "Aug 21, 2025, 6:25:54 PM",
            "multiRegionKeyType": "PRIMARY",
            "replicationStatus": {
                "us-west-2": {
                    "status": "IN_PROGRESS"
                }
            },
            "usingDefaultReplicationRegions": false
        }
    },
...
}

One CreateKey event in a replica Region:

{
    "eventVersion": "1.11",
    "userIdentity": {
...
        "invokedBy": "payment-cryptography.amazonaws.com"
    },
    "eventTime": "2025-08-21T18:25:54Z",
    "eventSource": "payment-cryptography.amazonaws.com",
    "eventName": "CreateKey",
    "awsRegion": "us-west-2",
    "sourceIPAddress": "payment-cryptography.amazonaws.com",
    "userAgent": "payment-cryptography.amazonaws.com",
    "requestParameters": {
        "keyAttributes": {
            "keyUsage": "TR31_C0_CARD_VERIFICATION_KEY",
            "keyClass": "SYMMETRIC_KEY",
            "keyAlgorithm": "TDES_2KEY",
            "keyModesOfUse": {
                "encrypt": false,
                "decrypt": false,
                "wrap": false,
                "unwrap": false,
                "generate": true,
                "sign": false,
                "verify": true,
                "deriveKey": false,
                "noRestrictions": false
            }
        },
        "exportable": true,
        "enabled": true
    },
    "responseElements": {
        "key": {
            "keyArn": "arn:aws:payment-cryptography:us-west-2:111122223333:key/qs6643jl4ohibtqk",
            "keyAttributes": {
                "keyUsage": "TR31_C0_CARD_VERIFICATION_KEY",
                "keyClass": "SYMMETRIC_KEY",
                "keyAlgorithm": "TDES_2KEY",
                "keyModesOfUse": {
                    "encrypt": false,
                    "decrypt": false,
                    "wrap": false,
                    "unwrap": false,
                    "generate": true,
                    "sign": false,
                    "verify": true,
                    "deriveKey": false,
                    "noRestrictions": false
                }
            },
            "keyCheckValue": "CC5EE2",
            "keyCheckValueAlgorithm": "ANSI_X9_24",
            "enabled": true,
            "exportable": true,
            "keyState": "CREATE_COMPLETE",
            "keyOrigin": "AWS_PAYMENT_CRYPTOGRAPHY",
            "usageStartTimestamp": "Aug 21, 2025, 6:25:54 PM"
        }
    },
...
}

Getting started

To start using Multi-Region key replication in Payment Cryptography:

  1. Determine your primary Region.
  2. Determine your replica Regions and if you will use account-level or key-level configuration.
  3. Create new exportable symmetric keys or update existing keys to use the Multi-Region key replication feature.
  4. Update your applications to use the consistent key IDs across Regions.

Conclusion

The new Multi-Region key replication feature in Payment Cryptography enhances our automatic key replication capabilities, providing improved resilience and simplified management for global payment applications. This feature helps make sure your payment cryptography keys are available when and where you need them, with the flexibility to choose between account-level or key-level replication strategies.

For more information about AWS Payment Cryptography, visit https://aws.amazon.com/payment-cryptography/.

 
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.
 

Ruy Cavalcanti
Ruy Cavalcanti

Ruy is a Senior Security Architect for the Latin American financial industry at AWS. He has worked in IT and Security for over 19 years, helping customers create secure architectures and solve data protection and compliance challenges. When he’s not architecting secure solutions, he enjoys jamming on his guitar, cooking Brazilian-style barbecue, and spending time with his family and friends.
Mark Cline
Mark Cline

Mark is a Principal Product Manager in AWS Payments, where he brings over 15 years of financial services experience across a variety of use cases and disciplines. He works with leading banks, financial institutions, and technology providers to alleviate heavy lifting in the payment system, allowing customers to focus on innovation. When he’s not simplifying payments, you can find him coaching little league or out on a run.

Announcing SageMaker Unified Studio Workshops for Financial Services

Post Syndicated from Sanjay Ohri original https://aws.amazon.com/blogs/big-data/announcing-sagemaker-unified-studio-workshops-for-financial-services/

In March 2025, AWS announced the general availability of the next generation of Amazon SageMaker, including Amazon SageMaker Unified Studio, a single data and AI development environment that brings together the functionality and tools from existing AWS Analytics and AI/ML services, including Amazon EMR, AWS Glue, Amazon Athena, Amazon Redshift, Amazon Bedrock, and Amazon SageMaker AI. You can discover data and AI assets from across your organization, then work together in projects to securely build and share analytics and AI artifacts, including data, models, and generative AI applications in a trusted and secure environment. Governance features including fine-grained access control are built into Amazon SageMaker Unified Studio using Amazon SageMaker Catalog to help you meet enterprise security requirements across your entire data estate. Unified access to your data is provided by a unified, open, and secure data lakehouse architecture built on Apache Iceberg open standards. Whether your data is stored in Amazon Simple Storage Service (Amazon S3) data lakes, Amazon Redshift data warehouses, or third-party and federated data sources, you can access it from one place and use it with Iceberg-compatible engines and tools.

AWS for Financial Services is a pioneer at the intersection of financial services and technology, enabling our customers to optimize operations and push the boundaries of innovation with the broadest set of services and partner solutions—all while maintaining security, compliance, and resilience at scale. Financial institutions are using AI and machine learning (ML), and generative AI services on AWS to transform their organizations faster and in ways never before possible. With Amazon SageMaker Unified Studio, financial services industry (FSI) customers can seamlessly work across different compute resources and clusters using unified notebooks, including generative AI–powered troubleshooting capabilities, and use the built-in SQL editor to query data stored in data lakes, data warehouses, databases, and applications.

Workshops

In this post, we’re excited to announce the release of four Amazon SageMaker Unified Studio publicly available workshops that are specific to each FSI segment: insurance, banking, capital markets, and payments. These workshops can help you learn how to deploy Amazon SageMaker Unified Studio effectively for business use cases. Follow the links for each FSI use case listed in the following table to get started for these self-paced workshops.

FSI use case Description
Insurance In this workshop, you’ll use Amazon SageMaker Unified Studio and analytics services to transform your insurance business challenges into opportunities. It provides hands-on experience in developing data-driven, generative AI–powered solutions for insurance that deliver measurable business value.
Banking In this workshop, you’ll explore how leading retail banks can unlock business value by using Amazon SageMaker Unified Studio to build, scale, and govern end-to-end data analytics and ML workflows. The workshop walks you through a reference architecture and curated banking-specific datasets covering common retail banking use cases, such as customer segmentation, fraud detection, churn prediction, and generative AI applications like personalized communication.
Capital Markets In this workshop, you’ll use Amazon SageMaker Unified Studio to analyze trade and quote data for the S&P 500 stocks to generate insights. The data is stored in various formats across different sources. This solution will unify the data from disparate sources using a lakehouse architecture and offer team members flexibility to access the data using familiar SQL constructs.
Payments In this workshop, you’ll use Amazon SageMaker Unified Studio and analytics services to enable organizations to ingest, store, process, and analyze payment data, supporting needs from data ingestion and storage to big data analytics, streaming analytics, business intelligence, and machine learning.

Conclusion

We appreciate your comments and feedback to help us accelerate adoption of Amazon SageMaker Unified Studio for financial services workloads. Contact your AWS account team to engage a FSI specialist solutions architect if you require additional expert guidance.

Learn more about AWS for financial services, customer case studies, and additional resources on our Financial Services website.


About the authors

Sanjay Ohri

Sanjay Ohri

Sanjay is an award-winning professional with over 15 years of successful global delivery and program management of cost-efficient cloud and on-premise services to companies like JPMorganChase and Bank of America. He works at AWS as a Principal Manager within Worldwide Financial Services working closely with customers and product teams helping to accelerate adoption of AWS services.

Raghu Prabhu

Raghu Prabhu

Raghu is an experienced information technology executive with a successful track record of implementing large technology initiatives. He has designed and managed execution of corporate IT strategies, product development, large mergers and acquisitions, data center consolidations, cloud system implementations, legacy system conversions and business process. He works at AWS as a Go-To-Market Specialist for SageMaker Unified Studio.

Develop and deploy a generative AI application using Amazon SageMaker Unified Studio

Post Syndicated from Amit Maindola original https://aws.amazon.com/blogs/big-data/develop-and-deploy-a-generative-ai-application-using-amazon-sagemaker-unified-studio/

Picture this: You’re a financial analyst starting your Monday morning with a steaming cup of coffee, ready to review your investment portfolio. But instead of manually scouring dozens of news websites, financial reports, and industry analyses, you simply ask your AI assistant: “What global events happened over the weekend that might impact my technology stock holdings?” Within seconds, you receive a comprehensive analysis of relevant news, sentiment scores, and potential investment implications—all powered by a sophisticated generative AI application you built yourself.

This scenario isn’t science fiction; it’s the reality that modern financial professionals can create today. In an era where information moves at the speed of light and industry conditions can shift dramatically overnight, staying informed isn’t just an advantage—it’s essential for survival in competitive financial landscapes. The challenge lies in processing the overwhelming volume of global information that could impact investments while distinguishing reliable insights from noise.

Amazon SageMaker – Develop and scale AI use cases with the broadest set of tools

Luckily for us, technology is making this more straightforward. The next generation of Amazon SageMaker with Amazon SageMaker Unified Studio is a single data and AI development environment where you can find and access the data in your organization and act on it using the best tools across different use cases. SageMaker Unified Studio brings together the functionality and tools from existing AWS analytics and artificial intelligence and machine learning (AI/ML) services, including Amazon EMR , AWS Glue, Amazon Athena, Amazon Redshift , Amazon Bedrock, and Amazon SageMaker AI. From within SageMaker Unified Studio, you can find, access, and query data and AI assets across your organization, then work together in projects to securely build and share analytics and AI artifacts, including data, models, and generative AI applications.

With SageMaker Unified Studio, you can efficiently build generative AI applications in a trusted and secure environment using Amazon Bedrock. You can choose from a selection of high-performing foundation models (FMs) and advanced customization capabilities like Amazon Bedrock Knowledge Bases, Amazon Bedrock Guardrails, Amazon Bedrock Agents, and Amazon Bedrock Flows. You can rapidly tailor and deploy generative AI applications and share with the built-in catalog for discovery.

What makes SageMaker Unified Studio particularly powerful for organizations is its integration with Amazon Bedrock Flows to build generative AI workflows, which is changing how organizations think about AI application development.

Amazon Bedrock Flows for generative AI application development

With Amazon Bedrock Flows, you can build and execute complex generative AI workflows without writing code, using an intuitive visual interface that democratizes AI development. This capability is transformative for organizations where speed, accuracy, and adaptability are paramount. It offers the following benefits:

  • Visual workflow development – Users can design AI applications by dragging and dropping components onto a canvas, making AI logic transparent and modifiable
  • Business logic flexibility – The service supports complex business logic through conditional branching, multi-path decision trees, and dynamic routing
  • Democratizing AI development – Business experts can directly contribute to AI application development without requiring extensive technical expertise
  • Seamless integration – Amazon Bedrock Flows integrates with FMs, knowledge bases, guardrails, and other AWS services
  • Reduced development complexity – The service handles infrastructure management and scaling through serverless execution and SDK APIs

Solution overview

In this post, we explore a financial use case, in which we want to stay on top of latest global events and determine our investment or financial exposure based on this. We can use a SageMaker Unified Studio flow application to pull in latest news summaries, derive sentiment based on news summary, and determine their effects on my investments. The following diagram illustrates this use case.

In the following sections, we show how to create a new project and build a flow application using a generative AI profile in SageMaker Unified Studio.

Prerequisites

For this walkthrough, you must have the following prerequisites:

  • A demo project – Create a demo project in your SageMaker Unified Studio domain. For instructions, see Create a project. For this example, we choose All capabilities in the project profile section, which includes the generative AI project profile enabled.

Create new project and build a flow application in SageMaker Unified Studio

In this section, we create a new a flow application that uses an Amazon Bedrock knowledge base to provide information about your personal portfolio. Complete the following steps:

  1. In SageMaker Unified Studio, open the project you created as a prerequisite and choose Build and then Flow.

  1. Drag Knowledge Base from Nodes to the design panel to add a knowledge base that will include the user’s investment portfolio and news articles and other information like earnings call transcripts, financial analyst reports, and so on.

  1. Choose the Knowledge Base node and configure the knowledge base as follows:
  2. Add a name for your knowledge base name (for example, portfolio…).
  3. Choose the model (for example, Claude 3.5 Haiku).

  1. Choose Create new Knowledge Base.
  2. Enter a name for the knowledge base.
  3. Select Project data source.
  4. For Select a data source, choose the Amazon Simple Storage Service (Amazon S3) bucket location where you uploaded your data.
  5. Choose Create.

The knowledge base creation process takes a few minutes to complete.

  1. When the knowledge base is ready, choose Save to save it to the flow.

  1. Choose My components, and on the options menu (three vertical dots), choose Sync to sync the knowledge base.

Make sure the S3 bucket has all the data (user portfolio data and latest news information data) before syncing the knowledge base.

We don’t provide any financial or news information data as part of this post. Upload current events or news data and investment portfolio data from your own data sources.

Test the flow application

After the knowledge base sync is complete, you can return to the flow application and ask questions. Using SageMaker Unified Studio flows, a financial analyst can provide a more personalized and customized financial outlook to their customers using rich internal financial information on their customer’s investment portfolio and latest publicly available current events and news information. The following are some example questions that you can ask to test the knowledge base:

Check if Tesla or Apple is in any of user's investment portfolio

Please check latest news information to provide information if Tesla has positive, negative or neutral outlook in the near future

Flow-based applications offer a visual approach to creating complex AI workflows. By chaining different nodes, each optimized for specific functions, you can create sophisticated solutions that are more reliable, maintainable, and efficient than single-prompt approaches. These flows allow for conditional logic and branching paths, mimicking human decision-making processes and enabling more nuanced responses based on context and intermediate results.

Clean up

To avoid ongoing charges in your AWS account, delete the resources you created during this tutorial:

  1. Delete the project.
  2. Delete the domain created as part of the prerequisites.

Conclusion

In this post, we demonstrated how to use Amazon Bedrock Flows in SageMaker Unified Studio to build a sophisticated generative AI application for financial analysis and investment decision-making without extensive coding knowledge. With this integration, you can create sophisticated financial analysis workflows through an intuitive visual interface, where you can process industry data, analyze news sentiment, and assess investment implications in real time. The solution integrates seamlessly with AWS services and FMs while providing essential features like automatic scaling, compliance controls, and audit capabilities. The implementation process involves setting up a SageMaker Unified Studio domain, configuring knowledge bases with portfolio and news data, and creating visual workflows that can analyze complex financial information. This democratized approach to AI development allows both technical and business teams to collaborate effectively, significantly reducing development time while maintaining the sophisticated capabilities needed for modern financial analysis.

To get started, explore the SageMaker Unified Studio documentation, set up a project in your AWS environment, and discover how this solution can transform your organization’s data analytics capabilities.


About the authors

Amit Maindola is a Senior Data Architect focused on data engineering, analytics, and AI/ML at Amazon Web Services. He helps customers in their digital transformation journey and enables them to build highly scalable, robust, and secure cloud-based analytical solutions on AWS to gain timely insights and make critical business decisions.

Arghya Banerjee is a Sr. Solutions Architect at AWS in the San Francisco Bay Area, focused on helping customers adopt and use the AWS Cloud. He is focused on big data, data lakes, streaming and batch analytics services, and generative AI technologies.

Melody Yang is a Principal Analytics Architect for Amazon EMR at AWS. She is an experienced analytics leader working with AWS customers to provide best practice guidance and technical advice in order to assist their success in data transformation. Her areas of interests are open-source frameworks and automation, data engineering and DataOps.

Gaurav Parekh is a Solutions Architect at AWS, specializing in generative AI and data analytics, with extensive experience building production AI systems on AWS.

Simplifying sustainability reporting using AWS and generative AI in banking

Post Syndicated from Sachin Kulkarni original https://aws.amazon.com/blogs/architecture/simplifying-sustainability-reporting-using-aws-and-generative-ai-in-banking/

European banks face a new challenge with the European Commission’s transition from the Non-Financial Reporting Directive (NFRD) to the Corporate Sustainability Reporting Directive (CSRD) regulations. This transition represents an expansion in sustainability reporting scope that will affect approximately 50,000 companies, a significant increase from the previous 11,700.

This means that banks themselves need to file sustainability reports because they will now be one of those 50,000 companies, but for their own reporting, they also need to assess their clients’ sustainability reports because they lend or finance those companies.

In this post, you learn how you can use generative AI services on Amazon Web Services (AWS) to automate your sustainability reporting requirements, reduce manual effort, and improve accuracy. You do this by implementing an automated solution for extracting, processing, and validating data from corporate reports.

The challenge

Financial institutions and sustainability teams managing sustainability reporting face three critical challenges:

  • Scale and complexity: Banks and financial institutions must process thousands of annual reports and sustainability documents, often spanning hundreds of pages each. This process requires extensive data extraction, complex EU Taxonomy alignment calculations, and resource-intensive validation steps. Manual processing introduces significant risks of errors and consumes valuable team resources.
  • Regulatory compliance: Banks must now implement detailed CSRD requirements, track specific metrics for turnover, capital expenditure (CapEx), and operating expenses (OpEx), and calculate their Green Asset Ratio (GAR) as well as environmental risks that come with their loans, debt, or equity investments. These new requirements demand robust data collection and processing capabilities.
  • Data management: Processing Green House Gas (GHG) emissions data across Scope 1, 2, and 3 categories requires analyzing complex lending and investment activities. With strict reporting deadlines, organizations need efficient tools to process this expanding volume of sustainability data.

The sustainability team point of view

Banks finance a large variety of counterparties and economic activities. While their carbon footprint is primarily linked to the greenhouse gas (GHG) emissions of their counterparties (Scope 3), The direct GHG emissions (Scope 1) of financial institutions or the GHG emissions linked to their energy consumption (Scope 2) are usually limited. For banks, the most critical key performance indicator (KPI) is the GAR, which measures the proportion of a bank’s taxonomy-aligned balance sheet exposures versus its total eligible exposures, as shown in the following figure.

To calculate their GAR, banks must obtain and use sustainability data from annual reports or sustainability reports of up to 50,000 companies (many of which are subject to NFRD and CSRD reporting), and understand how much of their activities are linked to EU Taxonomy.

The manual process

In the example that follows, we use the Amazon 2023 Annual Report. Some of the data that teams would have to manually extract includes: Revenue, Scope 1, Scope 2, and Scope 3 emissions.

Amazon Annual Report

As you can see from the page count at the top of the preceding figure, people manually searching for this data would have to go through 92 pages to find the parameters they’re looking for. Next, we might determine that some of the data we need (Scope 1, Scope 2, Scope 3) isn’t available in the annual report, so we need to analyze the sustainability report. As shown in the following figure, to manually retrieve the relevant data from this report, we would have to go through 98 pages of information.

amazon sustainability report

To prepare a GAR, we would have to repeat this process across hundreds or even thousands of companies.

A solution using AWS and generative AI

To address these challenges, we propose an automated approach using AWS services. This approach can help banks streamline their sustainability reporting processes.

high level flow

Here’s how this solution works— as shown in the preceding figure:

  1. Upload your counterparties’ reports to Amazon Simple Storage Service (Amazon S3).
  2. Amazon Bedrock automatically:
    1. Determines NFRD eligibility.
    2. Extracts relevant sustainability data.
    3. Organizes information for GAR calculations.
  3. Review and validate the extracted data.
  4. Generate required regulatory reports.

Architecture

We divide the architecture into two areas:

  1. Data ingestion flow
  2. Report generation flow

Data ingestion flow

We use Amazon Bedrock Knowledge Bases to build an automated data ingestion flow. See Prerequisites for your Amazon Bedrock knowledge base data to understand supported document formats and limits for knowledge base data.

Data Ingestion flow

The workflow, shown in the preceding figure, is:

  1.  Annual reports or sustainability reports are uploaded into an S3 bucket.
  2. On the S3 bucket, we enable event notifications for events such as addition, change, or deletion of the reports.
  3. These events are sent to Amazon Event Bridge, which trigger an AWS Lambda function.
  4. The Lambda function syncs the data source to an Amazon Bedrock knowledge base.
  5. Amazon Bedrock Knowledge Bases processes the documents and converts it into vector embeddings. For more information, see Amazon Bedrock Knowledge Bases supports advanced parsing, chunking, and query reformulation giving greater control of accuracy in RAG based applications
  6. Amazon Bedrock Knowledge Bases stores the vector embeddings in the vector database of your choice, such as in an Amazon OpenSearch Serverless collection.

Now the data is read, broken into chunks, converted to embeddings and stored in a vector store. You use a report generation flow to ask questions about the information in the knowledge base.

Report generation flow

To automate the report generation for sustainability teams, we created the report generation flow shown in the following figure.

report generation flow

The report generation flow includes the following steps:

  1. When user uploads an annual report, the data from the report is ingested into the knowledge base, as shown in the data ingestion flow.
  2. A Lambda function—Invoke Bedrock Agent—is triggered to invoke an Amazon Bedrock agent.
  3. The Amazon Bedrock agent determines NFRD or CSRD applicability based on various parameters such as employee numbers and annual revenues. This agent then passes on what kind of regulation to apply to a Lambda function.
  4. The Lambda function Retrieve Sustainability Metrics retrieves various parameters needed for NFRD or CSRD from the annual report.
    1. The function receives NFRD or CSRD applicability from the Amazon Bedrock agent.
    2. Based on NFRD or CSRD applicability, there are specific sustainability metrics that need to be retrieved. For NFRD, there are about 15 metrics that need to be retrieved, and for CSRD, there are about 30 metrics.
    3.  The function iteratively sends {variable} to the Amazon Bedrock flow. For example, if the metric to be retrieved is Scope 1 emission, then the Lambda function will send variable=‘Scope 1 emission’
    4. The function gets the metric value from the Amazon Bedrock flow and when the required metrics are retrieved, creates a CSV file with the details.
  5. Amazon Bedrock flow:
    1. Retrieve {variable} (for example, ‘Scope 1 emission’) from the annual report. For this, we create a prompt, as shown in the following diagram.
    2. Use the prompt to fetch the value from the knowledge base.
        • Prompt:
          <query> You are an intelligent agent that helps retrieve information from a knowledgebase. Please find {{variable}}. Please return only a number and not any additional text. I only need the value so you will return one word</query>

    3. Return the value to the Lambda function in Step 4.

Breakdown of key components

Amazon S3 is used for storing annual statements and sustainability reports, providing highly durable and secure object storage that facilitate immediate access when needed for processing.

Amazon Bedrock Knowledge Bases enables using Retrieval-Augmented Generation (RAG) to optimize the output of a large language model by giving it the context of companies’ annual reports and regulatory requirements. It does so by creating chunks and vector embeddings from the annual reports to enable efficient information retrieval from a vector database of your choice.

Amazon Bedrock foundation models (FMs) extract information from an Amazon Bedrock knowledge base and generate standardized PDF reports for regulators, providing consistent formatting and alignment with CSRD requirements. We encourage you to choose the best foundational model for your use case through the flexibility and enterprise-grade controls of Amazon Bedrock. For this solution, we used Anthropic’s Claude Sonnet 3.5 as the model, but by using Amazon Bedrock, you can choose from over 50 different models to see which one best fits your use case.

Amazon Bedrock Flows orchestrates the document processing pipeline, coordinating between services to automatically extract required sustainability metrics and validate compliance requirements. This feature helps us manage the workflow from initial document ingestion through to final report generation.

Amazon Bedrock Prompt Management creates and helps manage precise prompts that help retrieve multiple sustainability metrics from reports for example: turnover, Scope 1, Scope 2, and Scope 3 emissions data. These structured prompts facilitate consistent data extraction across different document formats.

Amazon Bedrock Agents evaluates each uploaded document to determine NFRD or CSRD eligibility by analyzing company revenue, employee count, and incorporation details. The agents retrieve these parameters by using a Lambda function that’s part of the actions the agent can perform.

Lambda handles event-driven processing when new documents are uploaded. Lambda functions are also used by the agent to retrieve data from companies’ annual reports and trigger the appropriate workflows based on document type.

Amazon EventBridge is used to build event-driven applications at scale across AWS and manages workflow orchestration, automatically initiating document processing when new reports are uploaded through S3 event notifications.

This architecture enables banks to process thousands of sustainability reports efficiently. The solution scales automatically to handle increasing document volumes while keeping security a top priority.

Additional considerations

You can use the following additional AWS service to help further increase the accuracy of information retrieval from sustainability documents.

Amazon Bedrock Guardrails to make sure that the solution caters to responsible AI policies. Specifically, we have added contextual grounding checks to reduce hallucinations. This is important for the solution because we’re trying to find a few specific values in a large document, and these checks make sure that the metrics retrieved are based on the documents.

Automated reasoning checks which help to verify the metrics returned by the solution. Consider the metric Number of employees. There can be multiple places in the annual report where the number of employees is mentioned; for example, temporary workers, part-time employees, employees from various departments, employees from a company that was taken over last year, and so on. To arrive at the right number, automated reasoning checks help.

Benefits

This sustainability reporting solution cuts document processing time from 8—10 weeks to few hours. Banks get clear audit trails showing exactly how they extracted and validated sustainability data. When regulations are updated, the system adapts through its knowledge base without disrupting operations. Built-in security protects company data through the entire process. Access controls and encryption are in place to secure information. The output delivers standardized, accurate reports. This automation lets sustainability teams concentrate on environmental improvements rather than paperwork. Teams can instead analyze trends and develop initiatives instead of hunting through reports for data points.

Conclusion

As sustainability reporting requirements evolve, having a flexible and automated solution will become crucial. While we focused on NFRD reporting, the same pattern can be adapted for CSRD compliance reporting, SFDR reporting requirements, and Internal sustainability metrics, or EU Taxonomy alignment.

Customers looking to build their products in the Financial Services industry have access to industry and domain AWS specialists; contact us for help in your cloud journey.

You can also learn more about AWS services and solutions for financial services by visiting AWS for Financial Services and Generative AI on AWS.


About the authors

Powering global payout intelligence: How MassPay uses Amazon Redshift Serverless and zero-ETL to drive deeper analytics.

Post Syndicated from Yossi Shlomo original https://aws.amazon.com/blogs/big-data/powering-global-payout-intelligence-how-masspay-uses-amazon-redshift-serverless-and-zero-etl-to-drive-deeper-analytics/

Since the company was founded in 2019, MassPay’s singular objective has been to deliver frictionless global payments that power innovation and lift people, businesses, and quality of life worldwide. Today, the MassPay payment orchestration offering empowers companies to move money across borders effortlessly; enabling local payment experiences in over 175 countries and 70 currencies—including digital wallets, locally preferred alternative payment methods, and cryptocurrencies. From hyper-localized checkout experiences to instant global payouts, we orchestrate seamless financial experiences that reflect how people and businesses transact around the world.

As we have expanded globally, so has the complexity of our data. In this blog post we shall cover how understanding real-time payout performance, identifying customer behavior patterns across regions, and optimizing internal operations required more than traditional business intelligence and analytics tools. And how since implementing Amazon Redshift and Zero-ETL, we’ve seen 90% reduction in data availability latency, payments data available for analytics 1.5x faster, leading to 45% reduction in time-to-insight and 37% fewer support tickets related to transaction visibility and payment inquiries.

Unlocking deeper payout intelligence and global insights

To continue our innovation—and to continue to exceed our partners’ and customers’ expectations—we knew we needed to go beyond basic reporting. We know success is dependent upon developing a truly data-driven organization. This means tracking granular KPIs across payout success rates, payment method adoption, transaction velocity, customer onboarding funnel drop-off, and support ticket correlation. We also wanted to better forecast customer payment expectations, monitor foreign exchange cost trends, and understand market-specific nuances such as how payout timing impacts seller satisfaction in social commerce ecosystems.

We didn’t just want more data. We wanted faster, smarter insights that would shape decisions in real time. Being a data-driven organization means our teams don’t guess. They know. And that gives us, our partners, and our customers real operational and competitive advantages.

– Yossi Schlomo, Director of Payment Systems Architecture

MySQL databases, CSV exports, and third-party reporting tools wouldn’t support the scale or speed we needed to deliver.

Choosing AWS: A scalable and integrated analytics foundation

We chose Amazon Web Services (AWS) for our data infrastructure and to accelerate our analytics capabilities.

At the core of our stack is Amazon Redshift Serverless with AI-driven scaling and optimizations enabled, which gives us scalable, fast, and cost-efficient analytics without the burden of managing infrastructure. Coupled with Amazon Aurora MySQL-Compatible Edition as our transactional data store and Amazon Redshift zero-ETL integration, we eliminated manual data pipelines altogether. Transactional data flows into Amazon Redshift in near real-time, instantly powering dashboards, alerts, and machine learning (ML) models.

This data feeds interactive dashboards—both internally and embedded within our platform for customers. Now, executives, operations leads, and customer success teams can drill into payout performance by region, merchant, or payment method, while customers get real-time visibility into their own payout analytics as part of our platform experience. The architecture is shown in the following figure.

MassPay Zero-ETL architecture with Amazon Redshift Serverless

MassPay Zero-ETL architecture with Amazon Redshift Serverless

Why it’s different and what it unlocked

Without Amazon Redshift Serverless and zero-ETL, we would have had to invest in costly custom data pipelines, maintain separate exchange, transform, and load (ETL) infrastructure, and manually manage data freshness. The integration with Aurora MySQL-Compatible is seamless and reduces our analytics latency from minutes to seconds.

Our differentiator is simple: We operationalize not just transactions but analytics for global payments. Most platforms can tell you if a transaction went through. For payments and payouts, MassPay can tell you how fast it went, what it cost, what method was most effective, and what that means for your business in real time.

– Yossi Schlomo, Director of Payment Systems Architecture

Embedded intelligence, built for scale

Every MassPay customer gets access to comprehensive payment analytics. These are accessed using our API or through a white-label dashboard (shown in the following figure). This detail is core to our product and central to our value proposition. As part of our go-to-market strategy, we showcase these capabilities in every demo, and they’ve proven to be key drivers in conversion and upsell conversations, especially with platforms targeting high-growth ecosystems.We use tiered pricing models based on transaction volume, and our embedded intelligence helps our partners and customers optimize usage and scale efficiently.

MassPay Dashboard

MassPay Dashboard

What we’ve gained

Since implementing Amazon Redshift and Zero-ETL, we’ve seen measurable results including:

  • 90% reduction in data availability latency and data available for analytics 1.5x faster
  • 45% reduction in time-to-insight across payment and payout intelligence reports
  • 37% fewer support tickets related to transaction visibility and payment inquiries
  • Real-time Net Promoter Score (NPS) tracking correlates with payout success metrics, driving faster resolution paths

What’s next

We’re now extending our analytics model to include more advanced ML-based payout failure prediction and ML-based payment authorization prediction, FX optimization alerts, partner-level and network-level benchmarking, and much more.

Conclusion

MassPay isn’t just payments. We aren’t just payouts. We are the engine powering modern commerce. With AWS, we’re turning complex global payments infrastructure into a smart, transparent, and scalable platform for insights. For our partners, and for our customers, this means better decisions, faster payment processing, faster payouts, and truly global reach without guesswork.

We encourage you to leverage below resources to explore these features further


About the authors

Yossi Shlomo serves as the Director of Payment Systems Architecture at MassPay. Yossi is an expert in credit card payment systems, PCI compliance, and secure transaction architecture, helping global platforms process payments at scale with confidence. He specializes in building scalable, cloud-based transaction systems and optimizing global payment gateways for performance and reliability.

Milind Oke is a Amazon Redshift and SageMaker Lakehouse specialist Solutions Architect as AWS. He is based out of New York and has been building enterprise data platforms, data warehousing, and analytics solutions for customers across various domains over two decades. In the 5 years with AWS, Milind has been a speaker at worldwide technical conferences and is co-author of Amazon Redshift: The Definitive Guide: Jump-Start Analytics Using Cloud Data Warehousing 1st Edition.

Transforming Maya’s API management with Amazon API Gateway

Post Syndicated from Arthi Jaganathan original https://aws.amazon.com/blogs/architecture/transforming-mayas-api-management-with-amazon-api-gateway/

In this post, you will learn how Amazon Web Services (AWS) customer, Maya, the Philippines’ leading fintech company and digital bank, built an API management platform to address the growing complexities of managing multiple APIs hosted on Amazon API Gateway. API Gateway is a fully managed service that you can use to create RESTful and WebSocket APIs.

At Maya, different teams build APIs to expose their services to merchants. As the number of applications grew, the overhead of managing APIs increased. An API platform is a set of tools to simplify and standardize across API management concerns such as security, governance, automated deployments, observability, and integrations with multiple AWS accounts. This frees up application teams to focus on features while offloading management concerns to the API platform.

Initial state

Prior to implementing the API platform, Maya used a decentralized API management approach, which created significant challenges. Individual teams operated independent API gateways, resulting in fragmented infrastructure, leading to several issues:

  1. Lack of standardization: Implementing consistent API standards across the organization proved difficult. Each team maintained its own configurations and practices, leading to inconsistencies in security and documentation.
  2. Security posture maintenance: While Maya maintained a strong security posture, doing so across the numerous independent gateways was unsustainable. The overhead of applying consistent security policies and updates across all gateways was becoming increasingly burdensome.
  3. Inconsistent operational visibility: Observability wasn’t inherently limited, rather inconsistently applied. Having multiple, different gateways makes it challenging to enforce a unified observability strategy and correlate data across the entire API ecosystem.

Solution overview

To address these challenges, Maya implemented an API platform, code-named Unified API Gateway. This centralized API management helps enforce consistent standards and improve overall security and observability. The following image illustrates the architecture of the Unified API Gateway and how it integrates with backend services managed and owned by different teams across different AWS accounts.

Enterprise-level AWS architecture diagram showing secured API gateway with multi-account EKS service distribution

API Platform Architecture

Maya chose to host all APIs in a central API account to centralize governance. This is managed by a dedicated shared services cloud team. Amazon CloudFront with AWS WAF and AWS Shield Advanced integration provides perimeter security. An AWS Lambda authorizer provides application security by managing authentication, authorization, and session management. This mitigates against the OWASP top 10 API security risks.

Integration to backend services is configured through API Gateway private integration and AWS Transit Gateway. In a decentralized API deployment strategy where APIs are co-hosted with the service in the respective AWS account, the integration will be simpler because you won’t need cross-account network connectivity. You will still benefit from the API management techniques covered in this post.

Standardization through structured service on-boarding

OpenAPI Specification (OAS) provides a structured definition for APIs. As shown in the following figure, service teams define the API OAS specification. This is embedded in Terraform infrastructure-as-code template for API Gateway. These are checked into source code repository and deployed using GitLab CI.

End-to-end API infrastructure pipeline showing specification integration through GitLab CI to AWS API Gateway

API Gateway Infrastructure-as-code (IaC) Pipeline

A configuration file used as a Terraform template supplies parameters for components of the solution such as backend integration, Lambda authorizer details, and additional headers for auditing. The following OAS snippets demonstrate this.

  1. Integration with the backend service
    x-amazon-apigateway-integration:
       type: "http_proxy"
       connectionId: "${vpc_link_id}"
       httpMethod: "GET"
       uri: "http://$${stageVariables.url}:11620/v1/api/endpoint/{id}" # double $ is not a typo
  2. Adding headers to the request
    x-amazon-apigateway-integration:
       type: "http_proxy"
       connectionId: "${vpc_link_id}"
       httpMethod: "GET"
       uri: "http://$${stageVariables.url}:11620/v1/api/endpoint/{id}"
       requestParameters:
          integration.request.header.x-requesting-service-id: "'api-gw'"
          integration.request.header.x-org-customer-id: "context.authorizer.x-org-customer-id"
  3. Security definition
    securitySchemes:
       lambda-authorizer:
          type: "${authorizer_type}"  
          name: "${authorizer_name}"
          x-amazon-apigateway-authtype: "custom"
          x-amazon-apigateway-authorizer:
             type: "request"
             authorizerUri: "${authorizer_uri}"
             authorizerCredentials: "${authorizer_credentials}"
             identitySource: "${authorizer_identity_source}"

API Gateway supports most of the OpenAPI 2.0 specification and the OpenAPI 3.0 specification but there are a few exceptions. Maya uses a custom plugin in the pipeline to enforce necessary limiting rules to help ensure compatibility with API Gateway.

To simplify deployment for development teams, a custom Terraform module abstracts away the API Gateway implementation details.

module "test-microservice-api-gateway" {
  # module version parameters
  source = "gitlabinstance.com/platform-engineering/apigw-terraform-module/aws"
  version = "1.2.7"

  # module deployed infrastructure parameters
  api_name = var.api_name
  api_mapping_path = var.api_mapping_path
  environment = var.environment
  aws_region = var.aws_region
  account_id = var.account_id
  tags = var.tags
  domain_name = var.domain_name
  stage_name = var.stage_name

  oas_path = var.oas_path # this value is populated via environment variable in Gitlab CI/CD

  providers = {
     aws = aws.apigw
  }
  authorizer_credentials = var.authorizer_credentials
  authorizer_uri = var.authorizer_uri
  vpc_link_id = var.vpc_link_id
  endpoint_url = var.endpoint_url
}

To use multi-level prefixes for custom domains with REST API Gateway, you need the Terraform module for API Gateway v2.

resource "aws_api_gateway_rest_api" "apigw" {
   name = "${var.environment}-${var.api_name}"
   body = templatefile(
     local.oasFilePath,
     {
       vpc_link_id = var.vpc_link_id
       authorizer_uri = var.authorizer_uri
       authorizer_credentials = var.authorizer_credentials
     }
  )
  description = "API Gateway for ${var.api_name}"
  endpoint_configuration {
    types = ["REGIONAL"]
  }

   # Default endpoint needs to be disabled if CloudFront is used as entry point to API Gateway
  disable_execute_api_endpoint = true
  tags = local.tags
  }

  # Use apigatewayv2 in order to have multi level base path ex. /v1/service_name
  resource "aws_apigatewayv2_api_mapping" "this" {
     domain_name = var.domain_name
    api_id = aws_api_gateway_rest_api.apigw.id
    stage = aws_api_gateway_stage.apigw.stage_name
    api_mapping_key = var.api_mapping_path
  }

Simplify API security with automation

Maya’s Unified API Gateway implements a robust, multi-layered security strategy. This approach helps ensure comprehensive protection from external threats and enforces stringent access control policies.

AWS WAF inspects and filters incoming traffic to protect against common web exploits, including OWASP Top 10, such as SQL injection and cross-site scripting attacks. A combination of custom and managed rule sets blocks malicious requests and enforces security policies. AWS Shield Advanced mitigates distributed denial of service (DDoS) attacks and provides 24/7 access to the AWS Shield Response Team (SRT) for expert support during attack events. This helps ensure high availability and resiliency.

API Gateway is integrated with a Lambda authorizer for authentication and authorization. The custom function implements fine-grained access control based on several factors such as identity, roles, and scopes.

To help ensure the consistency and integrity of the API configurations, all updates and deployments are strictly managed through an automated infrastructure-as-code (IaC) pipeline. This helps eliminate the risk of unauthorized or accidental manual changes to the API Gateway and any underlying infrastructure. The IaC pipeline makes sure that all API configurations, including security settings, are deployed through a controlled and auditable process. This prevents configuration drift and makes sure that security policies are consistently applied across all APIs. This also means that all changes are subject to code reviews and version control, adding another layer of security and traceability.

End-to-end visibility with observability

Maya’s Unified API Gateway prioritizes comprehensive observability to proactively monitor API performance, identify potential issues, and provide a seamless user experience. It uses a combination of AWS services and integrated tools to achieve this.

Amazon CloudWatch is used to monitor key performance metrics, including latency, error rates, and requests counts. CloudWatch provides real-time insights into the health and performance of APIs. Alerts on P95 and P99 values help identify and address performance bottlenecks, ensuring responsiveness.

CloudWatch metrics are streamed to Dynatrace, an application performance monitoring (APM) tool. The centralized view helps correlate data from various sources, create custom dashboards, and configure intelligent alerts based on predefined thresholds.

To help ensure complete visibility into API activity, the Lambda authorizer and API Gateway access logs are centralized in Splunk. This provides a comprehensive audit trail to track authentication and authorization events, identify security incidents, and troubleshoot API requests. Headers generated after authentication and authorization are done are passed down to the backend services for proper log correlation.

Future roadmap

The Unified API Gateway will continue to evolve to meet the growing needs of the organization and its partners and customers. The following are the key future enhancements that will further streamline API management, improve the developer experience, and enhance security.

  1. Integration with the internal developer portal: This will provide a self-service UI for bootstrapping new APIs from scratch and further empower developers. This will also simplify documentation and discovery by cataloging all APIs
  2. A modular, extension-based design for enhanced processing: This will introduce custom processing of requests in-line in the gateway account before integrating with backend services. Examples include digital signature verification, message transformation, and custom business logic. A modular design will offer a flexible and scalable way to enhance the functionality of Maya’s APIs without modifying backend services.
  3. Bring your own (BYO) authorizer: Support a wider range of identity providers and authentication protocols, providing greater flexibility and control over API access.
  4. Centralizing schema validation: Moving schema validation to API Gateway to bring consistency and improve the robustness and security of APIs by preventing malformed or malicious requests from being processed.
  5. API monetization: Create new revenue streams by adding support for usage-based billing, tiered pricing, and subscription models.

Conclusion

This post has described the creation of Maya’s robust API management and governance solution, using a combination of native AWS services and powerful partner tools such as Terraform and Dynatrace. We’ve demonstrated how this Unified API Gateway has streamlined and automated core API processes, transforming Maya’s previously fragmented infrastructure into a secure and observable ecosystem. By establishing clear guardrails, the API solution team empowers developers to rapidly deploy APIs while maintaining consistent standards.

With the recent implementation of this solution across more teams, Maya is focused on defining and tracking key performance indicators (KPIs). We anticipate measuring critical metrics such as API onboarding efficiency, developer experience, API latency, and security incident rates. These insights will serve as a foundation for continuous improvement and optimization, ensuring the solution’s sustained effectiveness and evolution.

Visit the API platform guidance on Serverlessland to learn more about building API platforms. See the API Gateway pattern collection to learn more about designing REST API integrations on AWS.


About the Authors

AWS renews its AAA Pinakes rating for the Spanish financial sector

Post Syndicated from Daniel Fuertes original https://aws.amazon.com/blogs/security/aws-renews-its-aaa-pinakes-rating-for-the-spanish-financial-sector/

Amazon Web Services (AWS) has successfully revalidated its prestigious AAA rating under the Pinakes qualification system, with certification coverage extending to 174 services across 31 global AWS Regions. This achievement marks a significant milestone in the commitment of AWS to serving the Spanish financial sector with the highest security standards and assurance.

The Pinakes framework, developed by the Centro de Cooperación Interbancaria (CCI), stands as a comprehensive security rating system designed to evaluate and monitor service providers working with Spanish financial institutions. This sophisticated framework encompasses 1,315 requirements, strategically organized into four fundamental categories: confidentiality, integrity, availability of information, and general requirements.

The framework’s evaluation spans 14 domains, encompassing:

  • Information security management program
  • Third-party management
  • Normative compliance
  • Network controls
  • Access controls
  • Incident management
  • Encryption
  • Secure development
  • Continuous Monitoring
  • Antimalware protection
  • Resilience
  • Systems operation
  • Personnel security
  • Facilities security

Pinakes implements a sophisticated rating scale ranging from A+ to D, where A+ represents the highest level of cybersecurity management implementation, and D indicates compliance with minimum security requirements. Each requirement undergoes thorough evaluation by an independent third-party auditor, providing objective assessment of security measures.

The renewal of AWS A ratings across confidentiality, integrity, and availability domains, culminating in an overall AAA security rating, demonstrates our ongoing investment in meeting industry benchmarks. This achievement validates our robust security controls and underscores our dedication to protecting the interests of our Spanish financial sector customers.

This requalification reaffirms the position AWS holds as a trusted service provider and highlights our continuous commitment to maintaining and enhancing our security posture in the Spanish financial sector.

The full control matrix will be published on AWS Artifact and available on request. Pinakes participants who are AWS customers can contact their AWS account manager to request access to it.

As always, we value your feedback and questions. Reach out to the AWS Compliance team through the Contact Us page. To learn more about our other compliance and security programs, see AWS Compliance Programs.

If you have feedback about this post, submit it in the Comments section below.

Daniel Fuertes

Daniel Fuertes

Daniel is a Security Audit Program Manager at AWS based in Madrid, Spain. Daniel leads multiple security audits, attestations, and certification programs in Spain and other EMEA countries. He has twelve years of experience in security assurance and compliance, including previous experience as an auditor for the PCI DSS security framework. He also holds the CISSP, PCIP, and ISO 27001 Lead Auditor certifications.

Build a high-performance quant research platform with Apache Iceberg

Post Syndicated from Guy Bachar original https://aws.amazon.com/blogs/big-data/build-a-high-performance-quant-research-platform-with-apache-iceberg/

In our previous post Backtesting index rebalancing arbitrage with Amazon EMR and Apache Iceberg, we showed how to use Apache Iceberg in the context of strategy backtesting. In this post, we focus on data management implementation options such as accessing data directly in Amazon Simple Storage Service (Amazon S3), using popular data formats like Parquet, or using open table formats like Iceberg. Our experiments are based on real-world historical full order book data, provided by our partner CryptoStruct, and compare the trade-offs between these choices, focusing on performance, cost, and quant developer productivity.

Data management is the foundation of quantitative research. Quant researchers spend approximately 80% of their time on necessary but not impactful data management tasks such as data ingestion, validation, correction, and reformatting. Traditional data management choices include relational, SQL, NoSQL, and specialized time series databases. In recent years, advances in parallel computing in the cloud have made object stores like Amazon S3 and columnar file formats like Parquet a preferred choice.

This post explores how Iceberg can enhance quant research platforms by improving query performance, reducing costs, and increasing productivity, ultimately enabling faster and more efficient strategy development in quantitative finance. Our analysis shows that Iceberg can accelerate query performance by up to 52%, reduce operational costs, and significantly improve data management at scale.

Having chosen Amazon S3 as our storage layer, a key decision is whether to access Parquet files directly or use an open table format like Iceberg. Iceberg offers distinct advantages through its metadata layer over Parquet, such as improved data management, performance optimization, and integration with various query engines.

In this post, we use the term vanilla Parquet to refer to Parquet files stored directly in Amazon S3 and accessed through standard query engines like Apache Spark, without the additional features provided by table formats such as Iceberg.

Quant developer and researcher productivity

In this section, we focus on the productivity features offered by Iceberg and how it compares to directly reading files in Amazon S3. As mentioned earlier, 80% of quantitative research work is attributed to data management tasks. Business impact heavily relies on quality data (“garbage in, garbage out”). Quants and platform teams have to ingest data from multiple sources with different velocities and update frequencies, and then validate and correct the data. These activities translate into the ability to run append, insert, update, and delete operations. For simple append operations, both Parquet on Amazon S3 and Iceberg offer similar convenience and productivity. However, real-world data is never perfect and needs to be corrected. Gaps filling (inserts), error corrections and restatements (updates), and removing duplicates (deletes) are the most obvious examples. When writing data in the Parquet format directly to Amazon S3 without using an open table format like Iceberg, you have to write code to identify the affected partition, correct errors, and rewrite the partition. Moreover, if the write job fails or a downstream read job occurs during this write operation, all downstream jobs have the possibility of reading inconsistent data. However, Iceberg has built-in insert, update, and delete features with ACID (Atomicity, Consistency, Isolation, Durability) properties, and the framework itself manages the Amazon S3 mechanics on your behalf.

Guarding against lookahead bias is an essential capability of any quant research platform—what backtests as a profitable trading strategy can render itself useless and unprofitable in real time. Iceberg provides time travel and snapshotting capabilities out of the box to manage lookahead bias that could be embedded in the data (such as delayed data delivery).

Simplified data corrections and updates

Iceberg enhances data management for quants in capital markets through its robust insert, delete, and update capabilities. These features allow efficient data corrections, gap-filling in time series, and historical data updates without disrupting ongoing analyses or compromising data integrity.

Unlike direct Amazon S3 access, Iceberg supports these operations on petabyte-scale data lakes without requiring complex custom code. This simplifies data modification processes, which is crucial for ingesting and updating large volumes of market and trade data, quickly iterating on backtesting and reprocessing workflows, and maintaining detailed audit trails for risk and compliance requirements.

Iceberg’s table format separates data files from metadata files, enabling efficient data modifications without full dataset rewrites. This approach also reduces expensive ListObjects API calls typically needed when directly accessing Parquet files in Amazon S3.

Additionally, Iceberg offers merge on read (MoR) and copy on write (CoW) approaches, providing flexibility for different quant research needs. MoR enables faster writes, suitable for frequently updated datasets, and CoW provides faster reads, beneficial for read-heavy workflows like backtesting.

For example, when a new data source or attribute is added, quant researchers can seamlessly incorporate it into their Iceberg tables and then reprocess historical data, confident they’re using correct, time-appropriate information. This capability is particularly valuable in maintaining the integrity of backtests and the reliability of trading strategies.

In scenarios involving large-scale data corrections or updates, such as adjusting for stock splits or dividend payments across historical data, Iceberg’s efficient update mechanisms significantly reduce processing time and resource usage compared to traditional methods.

These features collectively improve productivity and data management efficiency in quant research environments, allowing researchers to focus more on strategy development and less on data handling complexities.

Historical data access for backtesting and validation

Iceberg’s time travel feature can enable quant developers and researchers to access and analyze historical snapshots of their data. This capability can be useful while performing tasks like backtesting, model validation, and understanding data lineage.

Iceberg simplifies time travel workflows on Amazon S3 by introducing a metadata layer that tracks the history of changes made to the table. You can refer to this metadata layer to create a mental model of how Iceberg’s time travel capability works.

Iceberg’s time travel capability is driven by a concept called snapshots, which are recorded in metadata files. These metadata files act as a central repository that stores table metadata, including the history of snapshots. Additionally, Iceberg uses manifest files to provide a representation of data files, their partitions, and any associated deleted files. These manifest files are referenced in the metadata snapshots, allowing Iceberg to identify the relevant data for a specific point in time.

When a user requests a time travel query, the typical workflow involves querying a specific snapshot. Iceberg uses the snapshot identifier to locate the corresponding metadata snapshot in the metadata files. The time travel capability is invaluable to quants, enabling them to backtest and validate strategies against historical data, reproduce and debug issues, perform what-if analysis, comply with regulations by maintaining audit trails and reproducing past states, and roll back and recover from data corruption or errors. Quants can also gain deeper insights into current market trends and correlate them with historical patterns. Also, the time travel feature can further mitigate any risks of lookahead bias. Researchers can access the exact data snapshots that were present in the past, and then run their models and strategies against this historical data, without the risk of inadvertently incorporating future information.

Seamless integration with familiar tools

Iceberg provides a variety of interfaces that enable seamless integration with the open source tools and AWS services that quant developers and researchers are familiar with.

Iceberg provides a comprehensive SQL interface that allows quant teams to interact with their data using familiar SQL syntax. This SQL interface is compatible with popular query engines and data processing frameworks, such as Spark, Trino, Amazon Athena, and Hive. Quant developers and researchers can use their existing SQL knowledge and tools to query, filter, aggregate, and analyze their data stored in Iceberg tables.

In addition to the primary interface of SQL, Iceberg also provides the DataFrame API, which allows quant teams to programmatically interact with their data with popular distributed data processing frameworks like Spark and Flink as well as thin clients like PyIceberg. Quants can further use this API to build more programmatic approaches to access and manipulate data, allowing for the implementation of custom logic and integration of Iceberg with other AWS ecosystems like Amazon EMR.

Although accessing data from Amazon S3 is a viable option, Iceberg provides several advantages like metadata management, performance optimization using partition pruning, data manipulation, and a rich AWS ecosystem integration including services like Athena and Amazon EMR with more seamless and feature-rich data processing experience.

Undifferentiated heavy lifting

Data partitioning is one of major contributing factors to optimizing aggregate throughput to and from Amazon S3, contributing to overall High Performance Computing (HPC) environment price-performance.

Quant researchers often face performance bottlenecks and complex data management challenges when dealing with large-scale datasets in Amazon S3. As discussed in Best practices design patterns: optimizing Amazon S3 performance, single prefix performance is limited to 3,500 PUT/COPY/POST/DELETE or 5,500 GET/HEAD requests per second per partitioned Amazon S3 prefix. Iceberg’s metadata layer and intelligent partitioning strategies automatically optimize data access patterns, reducing the likelihood of I/O throttling and minimizing the need for manual performance tuning. This automation allows quant teams to focus on developing and refining trading strategies rather than troubleshooting data access issues or optimizing storage layouts.

In this section, we discuss situations we discovered while running our experiments at scale and solutions provided by Iceberg vs. vanilla Parquet when accessing data in Amazon S3.

As we mentioned in the introduction, the nature of quant research is “fail fast”—new ideas have to be quickly evaluated and then either prioritized for a deep dive or dismissed. This makes it impossible to come up with universal partitioning that works all the time and for all research styles.

When accessing data directly as Parquet files in Amazon S3, without using an open table format like Iceberg, partitioning and throttling issues can arise. Partitioning in this case is determined by the physical layout of files in Amazon S3, and a mismatch between the intended partitioning and the actual file layout can lead to I/O throttling exceptions. Additionally, listing directories in Amazon S3 can also result in throttling exceptions due to the high number of API calls required.

In contrast, Iceberg provides a metadata layer that abstracts away the physical file layout in Amazon S3. Partitioning is defined at the table level, and Iceberg handles the mapping between logical partitions and the underlying file structure. This abstraction helps mitigate partitioning issues and reduces the likelihood of I/O throttling exceptions. Furthermore, Iceberg’s metadata caching mechanism minimizes the number of List API calls required, addressing the directory listing throttling issue.

Although both approaches involve direct access to Amazon S3, Iceberg is an open table format that introduces a metadata layer, providing better partitioning management and reducing the risk of throttling exceptions. It doesn’t act as a database itself, but rather as a data format and processing engine on top of the underlying storage (in this case, Amazon S3).

One of the most effective techniques to address Amazon S3 API quota limits is salting (random hash prefixes)—a method that adds random partition IDs to Amazon S3 paths. This increases the probability of prefixes residing on different physical partitions, helping distribute API requests more evenly. Iceberg supports this functionality out of the box for both data ingestion and reading.

Implementing salting directly in Amazon S3 requires complex custom code to create and use partitioning schemes with random keys in the naming hierarchy. This approach necessitates a custom data catalog and metadata system to map physical paths to logical paths, allowing direct partition access without relying on Amazon S3 List API calls. Without such a system, applications risk exceeding Amazon S3 API quotas when accessing specific partitions.

At petabyte scale, Iceberg’s advantages become clear. It efficiently manages data through the following features:

  • Directory caching
  • Configurable partitioning strategies (range, bucket)
  • Data management functionality (compaction)
  • Catalog, metadata, and statistics use for optimal execution plans

These built-in features eliminate the need for custom solutions to manage Amazon S3 API quotas and data organization at scale, reducing development time and maintenance costs while improving query performance and reliability.

Performance

We highlighted a lot of the functionality of Iceberg that eliminates undifferentiated heavy lifting and improves developer and quant productivity. What about performance?

This section evaluates whether Iceberg’s metadata layer introduces overhead or delivers optimization for quantitative research use cases, comparing it with vanilla Parquet access on Amazon S3. We examine how these approaches impact common quant research queries and workflows.

The key question is whether Iceberg’s metadata layer, designed to optimize vanilla Parquet access on Amazon S3, introduces overhead or delivers the intended optimization for quantitative research use cases. Then we discuss overlapping optimization techniques, such as data distribution and sorting. We also discuss that there is no magic partitioning and all sorting scheme where one size fits all in the context of quant research. Our benchmarks show that Iceberg performs comparably to direct Amazon S3 access, with additional optimizations from its metadata and statistics usage, similar to database indexing.

Vanilla Parquet vs Iceberg: Amazon S3 read performance

We created four different datasets: two using Iceberg and two with direct Amazon S3 Parquet access, each with both sorted and unsorted write distributions. The purpose of this exercise was to compare the performance of direct Amazon S3 Parquet access vs. the Iceberg open table format, taking into account the impact of write distribution patterns when running various queries commonly used in quantitative trading research.

Query 1

We first run a simple count query to get the total number of records in the table. This query helps understand the baseline performance for a straightforward operation. For example, if the table contains tick-level market data for various financial instruments, the count can give an idea of the total number of data points available for analysis.

The following is the code for vanilla Parquet:

count = spark.read.parquet(s3://example-s3-bucket/path/to/data).count()

The following is the code for Iceberg:

count = spark.read.table(table_name).count()
# We used typical count query for the performance comparision however this could have been also done using metadata as shown below which completes in few seconds 
spark.read.format("iceberg").load(f"{table_name}.files").select(sum("record_count")).show(truncate=False)

Query 2

Our second query is a grouping and counting query to find the number of records for each combination of exchange_code and instrument. This query is commonly used in quantitative trading research to analyze market liquidity and trading activity across different instruments and exchanges.

The following is the code for vanilla Parquet:

spark.read.parquet(s3://example-s3-bucket/path/to/data) \
         .groupBy("exchange_code", "instrument") \
         .count() \
         .orderBy("count", ascending=False) \
         .count().show(truncate=False)

The following is the code for Iceberg:

spark.read.table(table_name) \
        .groupBy("exchange_code", "instrument") \
        .count() \
        .orderBy("count", ascending=False) \
        .show(truncate=False)

Query 3

Next, we run a distinct query to retrieve the distinct combinations of year, month, and day from the adapterTimestamp_ts_utc column. In quantitative trading research, this query can be helpful for understanding the time range covered by the dataset. Researchers can use this information to identify periods of interest for their analysis, such as specific market events, economic cycles, or seasonal patterns.

 The following is the code for vanilla Parquet:

spark.read.parquet(s3://example-s3-bucket/path/to/data) \
         .select(f.year("adapterTimestamp_ts_utc").alias("year"),
                 f.month("adapterTimestamp_ts_utc").alias("month"),
                 f.dayofmonth("adapterTimestamp_ts_utc").alias("day")) \
         .distinct() \
         .count() \
         .show(truncate=False)

The following is the code for Iceberg:

spark.read.table(table_name) \
        .select(f.year("adapterTimestamp_ts_utc").alias("year"),
                f.month("adapterTimestamp_ts_utc").alias("month"),
                f.dayofmonth("adapterTimestamp_ts_utc").alias("day")) \
        .distinct() \
        .count() \
        .show(truncate=False)

Query 4

Lastly, we run a grouping and counting query with a date range filter on the adapterTimestamp_ts_utc column. This query is similar to Query 2 but focuses on a specific time period. You could use this query to analyze market activity or liquidity during specific time periods, such as periods of high volatility, market crashes, or economic events. Researchers can use this information to identify potential trading opportunities or investigate the impact of these events on market dynamics.

 The following is the code for vanilla Parquet:

spark.read.parquet(s3://example-s3-bucket/path/to/data) \
         .filter((f.col("adapterTimestamp_ts_utc") >= "2023-04-17 00:00:00") &
                 (f.col("adapterTimestamp_ts_utc") <= "2023-04-18 23:59:59.999")) \
         .groupBy("exchange_code", "instrument") \
         .count() \
         .orderBy("count", ascending=False) \
         .show(truncate=False)

The following is the code for Iceberg. Because Iceberg has a metadata layer, the row count can be fetched from metadata:

spark.read.table(table_name) \
        .filter((f.col("adapterTimestamp_ts_utc") >= "2023-04-17 00:00:00") &
                (f.col("adapterTimestamp_ts_utc") <= "2023-04-18 23:59:59.999")) \
        .groupBy("exchange_code", "instrument") \
        .count() \
        .orderBy("count", ascending=False) \
        .show(truncate=False)

Test results

To evaluate the performance and cost benefits of using Iceberg for our quant research data lake, we created four different datasets: two with Iceberg tables and two with direct Amazon S3 Parquet access, each using both sorted and unsorted write distributions. We first ran AWS Glue write jobs to create the Iceberg tables and then mirrored the same write processes for the Amazon S3 Parquet datasets. For the unsorted datasets, we partitioned the data by exchange and instrument, and for the sorted datasets, we added a sort key on the time column.

Next, we ran a series of queries commonly used in quantitative trading research, including simple count queries, grouping and counting, distinct value queries, and queries with date range filters. Our benchmarking process involved reading data from Amazon S3, performing various transformations and joins, and writing the processed data back to Amazon S3 as Parquet files.

By comparing runtimes and costs across different data formats and write distributions, we quantified the benefits of Iceberg’s optimized data organization, metadata management, and efficient Amazon S3 data handling. The results showed that Iceberg not only enhanced query performance without introducing significant overhead, but also reduced the likelihood of task failures, reruns, and throttling issues, leading to more stable and predictable job execution, particularly with large datasets stored in Amazon S3.

AWS Glue write jobs

In the following table, we compare the performance and the cost implications of using Iceberg vs. vanilla Parquet access on Amazon S3, taking into account the following use cases:

  • Iceberg table (unsorted) – We created an Iceberg table partitioned by exchange_code and instrument This means that the data was physically partitioned in Amazon S3 based on the unique combinations of exchange_code and instrument values. Partitioning the data in this way can improve query performance, because Iceberg can prune out partitions that aren’t relevant to a particular query, reducing the amount of data that needs to be scanned. The data was not sorted on any column in this case, which is the default behavior.
  • Vanilla Parquet (unsorted) – For this use case, we wrote the data directly as Parquet files to Amazon S3, without using Iceberg. We repartitioned the data by exchange_code and instrument columns using standard hash partitioning before writing it out. Repartitioning was necessary to avoid potential throttling issues when reading the data later, because accessing data directly from Amazon S3 without intelligent partitioning can lead to too many requests hitting the same S3 prefix. Like the Iceberg table, the data was not sorted on any column in this case. To make comparison fair, we used the exact repartition count that Iceberg uses.
  • Iceberg table (sorted) – We created another Iceberg table, this time partitioned by exchange_code and instrument Additionally, we sorted the data in this table on the adapterTimestamp_ts_utc column. Sorting the data can improve query performance for certain types of queries, such as those that involve range filters or ordered outputs. Iceberg automatically handles the sorting and partitioning of the data transparently to the user.
  • Vanilla Parquet (sorted) – For this use case, we again wrote the data directly as Parquet files to Amazon S3, without using Iceberg. We repartitioned the data by range on the exchange_code, instrument, and adapterTimestamp_ts_utc columns before writing it out using standard range partitioning with 1996 partition count, because this was what Iceberg was using based on SparkUI. Repartitioning on the time column (adapterTimestamp_ts_utc) was necessary to achieve a sorted write distribution, because Parquet files are sorted within each partition. This sorted write distribution can improve query performance for certain types of queries, similar to the sorted Iceberg table.
Write Distribution Pattern Iceberg Table (Unsorted) Vanilla Parquet (Unsorted) Iceberg Table (Sorted) Vanilla Parquet
(Sorted)
DPU Hours 899.46639 915.70222 1402 1365
Number of S3 Objects 7444 7288 9283 9283
Size of S3 Parquet Objects 567.7 GB 629.8 GB 525.6 GB 627.1 GB
Runtime 1h 51m 40s 1h 53m 29s 2h 52m 7s 2h 47m 36s

AWS Glue read jobs

For the AWS Glue read jobs, we ran a series of queries commonly used in quantitative trading research, such as simple counts, grouping and counting, distinct value queries, and queries with date range filters. We compared the performance of these queries between the Iceberg tables and the vanilla Parquet files read in Amazon S3. In the following table, you can see two AWS Glue jobs that show the performance and cost implications of access patterns described earlier.

Read Queries / Runtime in Seconds Iceberg Table Vanilla Parquet
COUNT(1) on unsorted 35.76s 74.62s
GROUP BY and ORDER BY on unsorted 34.29s 67.99s
DISTINCT and SELECT on unsorted 51.40s 82.95s
FILTER and GROUP BY and ORDER BY on unsorted 25.84s 49.05s
COUNT(1) on sorted 15.29s 24.25s
GROUP BY and ORDER BY on sorted 15.88s 28.73s
DISTINCT and SELECT on sorted 30.85s 42.06s
FILTER and GROUP BY and ORDER BY on sorted 15.51s 31.51s
AWS Glue DPU hours 45.98 67.97

Test results insights

These test results offered the following insights:

  • Accelerated query performance – Iceberg improved read operations by up to 52% for unsorted data and 51% for sorted data. This speed boost enables quant researchers to analyze larger datasets and test trading strategies more rapidly. In quantitative finance, where speed is crucial, this performance gain allows teams to uncover market insights faster, potentially gaining a competitive edge.
  • Reduced operational costs – For read-intensive workloads, Iceberg reduced DPU hours by 32.4% and achieved a 10–16% reduction in Amazon S3 storage. These efficiency gains translate to cost savings in data-intensive quant operations. With Iceberg, firms can run more comprehensive analyses within the same budget or reallocate resources to other high-value activities, optimizing their research capabilities.
  • Enhanced data management and scalability – Iceberg showed comparable write performance for unsorted data (899.47 DPU hours vs. 915.70 for vanilla Parquet) and maintained consistent object counts across sorted and unsorted scenarios (7,444 and 9,283, respectively). This consistency leads to more reliable and predictable job execution. For quant teams dealing with large-scale datasets, this reduces time spent on troubleshooting data infrastructure issues and increases focus on developing trading strategies.
  • Improved productivity – Iceberg outperformed vanilla Parquet access across various query types. Simple counts were 52.1% faster, grouping and ordering operations improved by 49.6%, and filtered queries were 47.3% faster for unsorted data. This performance enhancement boosts productivity in quant research workflows. It reduces query completion times, allowing quant developers and researchers to spend more time on model development and market analysis, leading to faster iteration on trading strategies.

Conclusion

Quant research platforms often avoid adopting new data management solutions like Iceberg, fearing performance penalties and increased costs. Our analysis disproves these concerns, demonstrating that Iceberg not only matches or enhances performance compared to direct Amazon S3 access, but also provides substantial additional benefits.

Our tests reveal that Iceberg significantly accelerates query performance, with improvements of up to 52% for unsorted data and 51% for sorted data. This speed boost enables quant researchers to analyze larger datasets and test trading strategies more rapidly, potentially uncovering valuable market insights faster.

Iceberg streamlines data management tasks, allowing researchers to focus on strategy development. Its robust insert, update, and delete capabilities, combined with time travel features, enable effortless management of complex datasets, improving backtest accuracy and facilitating rapid strategy iteration.

The platform’s intelligent handling of partitioning and Amazon S3 API quota issues eliminates undifferentiated heavy lifting, freeing quant teams from low-level data engineering tasks. This automation redirects efforts to high-value activities such as model development and market analysis. Moreover, our tests show that for read-intensive workloads, Iceberg reduced DPU hours by 32.4% and achieved a 10–16% reduction in Amazon S3 storage, leading to significant cost savings.

Flexibility is a key advantage of Iceberg. Its various interfaces, including SQL, DataFrames, and programmatic APIs, integrate seamlessly with existing quant research workflows, accommodating diverse analysis needs and coding preferences.

By adopting Iceberg, quant research teams gain both performance enhancements and powerful data management tools. This combination creates an environment where researchers can push analytical boundaries, maintain high data integrity standards, and focus on generating valuable insights. The improved productivity and reduced operational costs enable quant teams to allocate resources more effectively, ultimately leading to a more competitive edge in quantitative finance.


About the Authors

Guy Bachar is a Senior Solutions Architect at AWS based in New York. He specializes in assisting capital markets customers with their cloud transformation journeys. His expertise encompasses identity management, security, and unified communication.

Sercan KaraogluSercan Karaoglu is Senior Solutions Architect, specialized in capital markets. He is a former data engineer and passionate about quantitative investment research.

Boris LitvinBoris Litvin is a Principal Solutions Architect at AWS. His job is in financial services industry innovation. Boris joined AWS from the industry, most recently Goldman Sachs, where he held a variety of quantitative roles across equity, FX, and interest rates, and was CEO and Founder of a quantitative trading FinTech startup.

Salim TutuncuSalim Tutuncu is a Senior Partner Solutions Architect Specialist on Data & AI, based in Dubai with a focus on the EMEA. With a background in the technology sector that spans roles as a data engineer, data scientist, and machine learning engineer, Salim has built a formidable expertise in navigating the complex landscape of data and artificial intelligence. His current role involves working closely with partners to develop long-term, profitable businesses using the AWS platform, particularly in data and AI use cases.

Alex TarasovAlex Tarasov is a Senior Solutions Architect working with Fintech startup customers, helping them to design and run their data workloads on AWS. He is a former data engineer and is passionate about all things data and machine learning.

Jiwan PanjikerJiwan Panjiker is a Solutions Architect at Amazon Web Services, based in the Greater New York City area. He works with AWS enterprise customers, helping them in their cloud journey to solve complex business problems by making effective use of AWS services. Outside of work, he likes spending time with his friends and family, going for long drives, and exploring local cuisine.

Preparing for take-off: Regulatory perspectives on generative AI adoption within Australian financial services

Post Syndicated from Julian Busic original https://aws.amazon.com/blogs/security/preparing-for-take-off-regulatory-perspectives-on-generative-ai-adoption-within-australian-financial-services/

The Australian financial services regulator, the Australian Prudential Regulation Authority (APRA), has provided its most substantial guidance on generative AI to date in Member Therese McCarthy Hockey’s remarks to the AFIA Risk Summit 2024. The guidance gives a green light for banks, insurance companies, and superannuation funds to accelerate their adoption of this transformative technology, but reminded the financial services industry of the need for adequate guardrails to make sure that the benefits of generative AI don’t come at an unacceptable cost to the community.

Amazon Web Services (AWS) is committed to developing AI responsibly and strongly supports APRA’s message to proceed with generative AI adoption with appropriate guardrails implemented. AWS is at the forefront of generative AI research and innovation, and many of our financial services customers are already harnessing the benefits of our artificial intelligence (AI), machine learning (ML), and generative AI services. AWS is committed to the responsible development and use of AI so that we can help our customers achieve their business goals while meeting—and aiming to exceed—their regulators’ expectations.

A green light for AI, ML, and generative AI

APRA’s guidance, as outlined in APRA Member Therese McCarthy Hockey’s remarks to the AFIA Risk Summit 2024, offers a clear pathway for adoption of AI, ML, and generative AI technologies by APRA-regulated entities. Ms. McCarthy Hockey says that there is “keen support” within APRA and across government for companies to realize the benefits of technology-led innovation, and she highlights the significant advantages that effective use of generative AI can deliver, such as improved productivity, cost efficiencies, more personalized customer experiences, and the ability to divert valuable resources to higher-level areas of need.

“Within APRA and across governments and regulators there is keen support for the realisation of tangible improvements through innovation.” — APRA Member Therese McCarthy Hockey’s remarks to AFIA Risk Summit May 2024

AWS financial services customers are starting to use more advanced AI for a variety of purposes, such as customer service, marketing, application development, fraud detection, and regulatory compliance. Specific use cases cited by APRA were the use of generative AI to rapidly review long documents against criteria such as policy requirements, use of generative AI-powered coding tools to produce better code faster, and creating generative AI bots to simulate customer testing of products and services. This is an extension of less sophisticated forms of AI which have been in operation for some time, with APRA citing internet chat bots and natural language processing as examples where businesses have already realized efficiencies by automating and speeding up manual or time-consuming processes.

APRA and other financial services regulators are experimenting internally with AI themselves. In Ms. McCarthy Hockey’s speech, she noted that APRA itself is using text analysis tools on an ongoing basis to review responses to APRA risk culture surveys, with the results helping APRA risk specialists direct focus to where it’s most required. APRA is also experimenting with natural language processing tools to review incident reporting data from regulated entities and to highlight incidents that are worthy of further investigation. This helps to reduce the human effort required by APRA staff and increase regulatory efficiency. Finally, APRA is collaborating with the Australian Securities and Investments Commission (ASIC) and the Reserve Bank of Australia (RBA) on a proof of concept to reduce the effort required to compare, analyze, and summarize the reams of documentation the three agencies must review as part of their regular entity supervision duties.

Risks must be understood and managed

APRA advocates for a prudent approach to experimentation with these technologies. As was the case with cloud adoption, organizations with more mature risk and data management capabilities will be able to move faster than those without.

“APRA’s message to the entities we regulate is that firm board oversight, robust technology platforms and strong risk management are essential for companies that want to begin experimenting with new ways of harnessing AI.” — APRA Member Therese McCarthy Hockey’s remarks to AFIA Risk Summit May 2024

APRA’s current regulatory framework is fit-for-purpose

APRA also made the specific point that its existing prudential framework remains fit-for-purpose for the increased uptake of AI, ML, and generative AI.

APRA’s primary focus is on governance, citing three key areas:

  1. Do boards have sufficient capability to determine an appropriate AI strategy and make sound risk management decisions? Are they able to effectively challenge management? What sort of learning and development programs are in train, and do the boards have access to external skills and advice if required?
  2. How mature is the risk culture? Is a risk management mindset embedded and functioning effectively across all three lines of defense? What controls and monitoring are in place to help prevent employees making unauthorized use of AI, ML, and generative AI tools?
  3. Is there adequate data quality and reliability? AI outputs depend directly on the quality of the inputs. APRA states that data management is an area where many regulated entities have a long way to go.

APRA also focuses on accountability, reminding regulated entities that as with any form of outsourcing or use of third-party services, the regulated entity retains accountability for the outputs of the AI, ML, and generative AI programs they deploy. There must always be a human in the loop: a person accountable for verifying that AI operates as intended. The level of human involvement can vary—for example, APRA does not suggest that a human should be involved in every AI decision made by a fraud detection service, but there should be a human who is accountable for the algorithm it runs, its operations, and the outcomes it drives.

How AWS is helping customers locally and globally use AI responsibly

From the outset, AWS has prioritized responsible AI innovation by embedding safety, fairness, robustness, security, and privacy into our development processes, and continuously educating our employees. We extend this commitment through to our customers by designing services that help customers derive business value from AI in a safe and responsible way.

AWS collaborates with organizations such as the OECD AI working groups, the Partnership on AI, the Responsible AI Institute, and strategic partnerships with universities worldwide. In Australia, AWS collaborates with key institutions like the National AI Centre, CSIRO, the Australian Information Industry Association, and the Tech Council of Australia to provide insights on responsible AI adoption and to maximize the benefits of AI technology for the country. The recent Voluntary AI Safety Standard developed by the National AI Centre is the start of clear guidance for Australian organizations to follow, and AWS is engaging with Australia and other governments on the responsible use adoption and use of generative AI.

Recently, AWS has supported global financial services customers in critical areas such as risk management, financial crime prevention, and cybersecurity by using generative AI to analyze and respond to large data volumes in real-time. Verafin (a Nasdaq company) used Amazon Bedrock to improve anti-money laundering and fraud prevention processes. This application of AI enhances the effectiveness of financial crime management programs. Mastercard employs AWS AI and machine learning services to detect and prevent fraud while providing the most seamless customer experience possible.

Generative AI’s role in modernizing legacy systems is increasingly recognized, especially among Australian financial services customers who are undertaking transformation programs to reduce technology debt and enhance process resilience. CommBank, PEXA, and National Australia Bank (NAB) employ generative AI technology to improve speed, quality, and security when building and modifying applications.

How to implement responsible AI within your organization

The core dimensions of responsible AI at AWS align to the key regulatory considerations of both APRA and regulators globally:

  • Fairness – Considering impacts on different groups of stakeholders
  • Explainability – Understanding and evaluating system outputs
  • Privacy and security – Appropriately obtaining, using, and protecting data and models
  • Safety – Working to prevent harmful system output and misuse
  • Controllability – Having mechanisms to monitor and steer AI system behaviour
  • Veracity and robustness – Achieving correct system outputs, even with unexpected or adversarial inputs
  • Governance – Incorporating best practices into the AI supply chain, including providers and deployers
  • Transparency – Enabling stakeholders to make informed choices about their engagement with an AI system

Note that responsible AI is a continually evolving field. Customers can keep updated with developments in this area on our Responsible AI webpage.

The Cloud Adoption Framework for Artificial Intelligence, Machine Learning, and Generative AI provides extensive guidance, and serves as both a starting point and a guide to help customers meet, and in many cases exceed, regulatory expectations.

We have integrated features into our generative AI services to facilitate the application of responsible AI policies for organizations. For example, Amazon Bedrock Guardrails can help financial services organizations comply with APRA guidance on AI use in several key ways:

  1. Content filtering – Guardrails allows organizations to configure content filters to block harmful or inappropriate content in AI model inputs and outputs. This helps AI applications to adhere to with APRA’s expectations for responsible AI use.
  2. Topic restrictions – Organizations can define specific topics to be avoided in AI interactions. For example, a banking chatbot could be configured so it won’t provide investment advice, aligning with regulatory restrictions.
  3. Sensitive information protection – Guardrails can detect and redact personally identifiable information (PII) in AI inputs and outputs. This helps protect customer privacy and aids in compliance with data protection requirements.
  4. Custom word filters – Companies can set up lists of words or phrases to block, helping maintain appropriate communication.
  5. Contextual grounding checks – This feature helps detect and filter AI hallucinations in model responses where a reference source and a user query are provided, improving the accuracy and reliability of AI-generated responses. This aligns with APRA’s focus on making sure that AI systems provide accurate and trustworthy information.
  6. Customizable policies – Guardrails allows organizations to tailor AI safeguards to their specific needs and regulatory requirements, helping them align with APRA’s principles-based approach.
  7. Consistent safeguards – Guardrails can be applied across multiple AI models and applications, enabling a standardized approach to responsible AI use across the organization.
  8. Transparency and testing – The ability to test guardrails and iterate on configurations supports APRA’s expectations for due diligence and appropriate monitoring of AI systems.

We have a comprehensive user guide detailing how to implement, configure, and test Amazon Bedrock Guardrails.

AWS AI Service Cards also provide detailed information on AWS AI services, including intended use cases, limitations, and responsible AI design choices. This transparency helps financial institutions understand and responsibly use AI technologies.

APRA’s existing prudential standards do not set specific rules for managing AI/ML and generative AI risks. Instead, APRA outlines desired risk management outcomes, leaving it to each regulated entity to assess AI deployment risks and implement appropriate controls. AWS offers the User Guide to Financial Services Regulations and Guidelines in Australia to help customers meet APRA’s requirements.

Ultimately, the rate of AI, ML, and generative AI adoption amongst APRA-regulated entities will be determined by the risk appetite and risk management capability of individual entities. APRA openly encourages its regulated entities—our financial services customers—who are considering AI, ML, and generative AI experimentation and adoption to reach out to APRA directly and initiate dialogue. APRA is a highly experienced, knowledgeable, and approachable regulator, and will be able to provide valuable insights and guidance to regulated entities.

Conclusion and next steps

APRA’s messaging to industry is a significant milestone for AI, ML, and generative AI adoption in the Australian financial services industry. Boards, executives, and technology decision-makers should review APRA’s Risk Summit speech and consider APRA’s support for the adoption of these technologies when refining their strategies and plans.

AWS, and our AWS Partner Network, are experienced in working with financial services customers, and there are already a number of examples both internationally and locally where generative AI has been implemented to create value for our customers. AWS is ready to help our customers meet and exceed APRA’s risk management expectations.

Contact your AWS representative to discuss how the AWS solution architects, AWS Professional Services teams, AWS Training and Certification, and the AWS Partner Network can assist with your AI, ML, and generative AI adoption journey. If you don’t have an AWS representative, please contact us at https://aws.amazon.com/contact-us.
 

Julian Busic
Julian Busic

Julian is a Security Solutions Architect with a focus on regulatory engagement. He works with our customers, their regulators, and AWS teams to help customers raise the bar on secure cloud adoption and usage. Julian has over 15 years of experience working in risk and technology across the financial services industry in Australia and New Zealand.
Jamie Simon
Jamie Simon

Jamie leads AWS business within the banking and financial services industry across Australia and New Zealand, supporting financial services customers as they make use of the cloud to transform their business for a digital and AI-enabled future.
Warren Cammack
Warren Cammack

Warren supports AWS customers in applying the value of the AWS Cloud at scale, focusing on identifying and overcoming blockers to adoption. Currently he is leading the rollout of generative AI services to enable enterprises to benefit from the new technology in a safe, responsible, and effective manner.
Krish De
Krish De

Krish is a Principal Solutions Architect with a focus on financial services. He works with AWS customers, their regulators, and AWS teams to safely accelerate customers’ cloud adoption, with prescriptive guidance on governance, risk, and compliance. Krish has over 20 years of experience working in governance, risk, and technology across the financial services industry in Australia, New Zealand, and the United States.

Amazon EMR on EC2 cost optimization: How a global financial services provider reduced costs by 30%

Post Syndicated from Omar Gonzalez original https://aws.amazon.com/blogs/big-data/amazon-emr-on-ec2-cost-optimization-how-a-global-financial-services-provider-reduced-costs-by-30/

In this post, we highlight key lessons learned while helping a global financial services provider migrate their Apache Hadoop clusters to AWS and best practices that helped reduce their Amazon EMR, Amazon Elastic Compute Cloud (Amazon EC2), and Amazon Simple Storage Service (Amazon S3) costs by over 30% per month.

We outline cost-optimization strategies and operational best practices achieved through a strong collaboration with their DevOps teams. We also discuss a data-driven approach using a hackathon focused on cost optimization along with Apache Spark and Apache HBase configuration optimization.

Background

In early 2022, a business unit of a global financial services provider began their journey to migrate their customer solutions to AWS. This included web applications, Apache HBase data stores, Apache Solr search clusters, and Apache Hadoop clusters. The migration included over 150 server nodes and 1 PB of data. The on-premises clusters supported real-time data ingestion and batch processing.

Because of aggressive migration timelines driven by the closure of data centers, they implemented a lift-and-shift rehosting strategy of their Apache Hadoop clusters to Amazon EMR on EC2, as highlighted in the Amazon EMR migration guide.

Amazon EMR on EC2 provided the flexibility for the business unit to run their applications with minimal changes on managed Hadoop clusters with the required Spark, Hive, and HBase software and versions installed. Because the clusters are managed, they were able to decompose their large on-premises cluster and deploy purpose-built transient and persistent clusters for each use case on AWS without increasing operational overhead.

Challenge

Although the lift-and-shift strategy allowed the business unit to migrate with lower risk and allowed their engineering teams to focus on product development, this came with increased ongoing AWS costs.

The business unit deployed transient and persistent clusters for different use cases. Several application components relied on Spark Streaming for real-time analytics, which was deployed on persistent clusters. They also deployed the HBase environment on persistent clusters.

After the initial deployment, they discovered several configuration issues that led to suboptimal performance and increased cost. Despite using Amazon EMR managed scaling for persistent clusters, the configuration wasn’t efficient due to setting a minimum of 40 core nodes and task nodes, resulting in wasted resources. Core nodes were also misconfigured to auto scale. This led to scale-in events shutting down core nodes with shuffle data. The business unit also implemented Amazon EMR auto-termination policies. Because of shuffle data loss on the EMR on EC2 clusters running Spark applications, certain jobs ran five times longer than planned. Here, auto-termination policies didn’t mark a cluster as idle because a job was still running.

Lastly, there were separate environments for development (dev), user acceptance testing (UAT), production (prod), which were also over-provisioned with the minimum capacity units for the managed scaling policies configured too high, leading to higher costs as shown in the following figure.

Short-term cost-optimization strategy

The business unit completed the migration of applications, databases, and Hadoop clusters in 4 months. Their immediate goal was to get out of their data centers as quickly as possible, followed by cost optimization and modernization. Although they expected greater upfront costs because of the lift-and-shift approach, their costs were 40% higher than forecasted. This sped up their need to optimize.

They engaged with their shared services team and the AWS team to develop a cost-optimization strategy. The business unit began by focusing on cost-optimization best practices to implement immediately that didn’t require product development team engagement or impact their productivity. They performed a cost analysis to determine the largest contributors of cost were EMR on EC2 clusters running Spark, EMR on EC2 clusters running HBase, Amazon S3 storage, and EC2 instances running Solr.

The business unit started by enforcing auto-termination of EMR clusters in their dev environments by using automation. They considered using Amazon EMR isIdle Amazon CloudWatch metrics to build an event-driven solution with AWS Lambda, as described in Optimize Amazon EMR costs with idle checks and automatic resource termination using advanced Amazon CloudWatch metrics and AWS Lambda. They implemented a stricter policy to shut down clusters in their lower environments after 3 hours, regardless of usage. They also updated managed scaling policies in DEV and UAT and set the minimum cluster size to three instances to allow clusters to scale up as needed. This resulted in a 60% savings in monthly dev and UAT costs over 5 months, as shown in the following figure.

For the initial production deployment, they had a subset of Spark jobs running on a persistent cluster with an older Amazon EMR 5.(x) release. To optimize costs, they split smaller jobs and larger jobs to run on separate persistent clusters and configured the minimum number of core nodes required to support jobs in each cluster. Setting the core nodes to a constant size while using managed scaling for only task nodes is a recommended best practice and eliminated the issue of shuffle data loss. This also improved the time to scale in and out, because task nodes don’t store data in Hadoop Distributed File System (HDFS).

Solr clusters ran on EC2 instances. To optimize this environment, they ran performance tests to determine the best EC2 instances for their workload.

With over one petabyte of data, Amazon S3 contributed to over 15% of monthly costs. The business unit enabled the Amazon S3 Intelligent-Tiering storage class to optimize storage expenses for historical data and reduce their monthly Amazon S3 costs by over 40%, as shown in the following figure. They also migrated Amazon Elastic Block Store (Amazon EBS) volumes from gp2 to gp3 volume types.

Longer-term cost-optimization strategy

After the business unit realized initial cost savings, they engaged with the AWS team to organize a financial hackathon (FinHack) event. The goal of the hackathon was to reduce costs further by using a data-driven process to test cost-optimization strategies for Spark jobs. To prepare for the hackathon, they identified a set of jobs to test using different Amazon EMR deployment options (Amazon EC2, Amazon EMR Serverless) and configurations (Spot, AWS Graviton, Amazon EMR managed scaling, EC2 instance fleets) to arrive at the most cost-optimized solution for each job. A sample test plan for a job is shown in the following table. The AWS team also assisted with analyzing Spark configurations and job execution during the event.

Job Test Description Configuration
Job 1 1 Run an EMR on EC2 job with default Spark configurations Non Graviton, On-Demand Instances
2 Run an EMR on Serverless job with default Spark configurations Default configuration
3 Run an EMR on EC2 job with default Spark configuration and Graviton instances Graviton, On-Demand Instances
4 Run an EMR on EC2 job with default Spark configuration and Graviton instances. Hybrid Spot Instance allocation. Graviton, On-Demand and Spot Instances

The business unit also performed extensive testing using Spot Instances before and during the FinHack. They initially used the Spot Instance advisor and Spot Blueprints to create optimal instance fleet configurations. They automated the process to select the most optimal Availability Zone to run jobs by querying for the Spot placement scores using the get_spot_placement_scores API before launching new jobs.

During the FinHack, they also developed an EMR job tracking script and report to granularly track cost per job and measure ongoing improvements. They used the AWS SDK for Python (Boto3) to list the status of all transient clusters in their account and report on cluster-level configurations and instance hours per job.

As they executed the test plan, they found several additional areas of enhancement:

  • One of the test jobs makes API calls to Solr clusters, which introduced a bottleneck in the design. To prevent Spark jobs from overwhelming the clusters, they fine-tuned executor.cores and spark.dynamicAllocation.maxExecutors properties.
  • Task nodes were over-provisioned with large EBS volumes. They reduced the size to 100 GB for additional cost savings.
  • They updated their instance fleet configuration by setting unit/weights proportional based on instance types selected.
  • During the initial migration, they set the spark.sql.shuffle.paritions configuration too high. The configuration was fine-tuned for their on-premises cluster but not updated to align with their EMR clusters. They optimized the configuration by setting the value to one or two times the number of vCores in the cluster .

Following the FinHack, they enforced a cost allocation tagging strategy for persistent clusters that are deployed using Terraform and transient clusters deployed using Amazon Managed Workflows for Apache Airflow (Amazon MWAA). They also deployed an EMR Observability dashboard using Amazon Managed Service for Prometheus and Amazon Managed Grafana.

Results

The business unit reduced monthly costs by 30% over 3 months. This allowed them to continue migration efforts of remaining on-premises workloads. Most of their 2,000 jobs per month now run on EMR transient clusters. They have also increased AWS Graviton usage to 40% of total usage hours per month and Spot usage to 10% in non-production environments.

Conclusion

Through a data-driven approach involving cost analysis, adherence to AWS best practices, configuration optimization, and extensive testing during a financial hackathon, the global financial services provider successfully reduced their AWS costs by 30% over 3 months. Key strategies included enforcing auto-termination policies, optimizing managed scaling configurations, using Spot Instances, adopting AWS Graviton instances, fine-tuning Spark and HBase configurations, implementing cost allocation tagging, and developing cost tracking dashboards. Their partnership with AWS teams and a focus on implementing short-term and longer-term best practices allowed them to continue their cloud migration efforts while optimizing costs for their big data workloads on Amazon EMR.

For additional cost-optimization best practices, we recommend visiting AWS Open Data Analytics.


About the Authors

Omar Gonzalez is a Senior Solutions Architect at Amazon Web Services in Southern California with more than 20 years of experience in IT. He is passionate about helping customers drive business value through the use of technology. Outside of work, he enjoys hiking and spending quality time with his family.

Navnit Shukla, an AWS Specialist Solution Architect specializing in Analytics, is passionate about helping clients uncover valuable insights from their data. Leveraging his expertise, he develops inventive solutions that empower businesses to make informed, data-driven decisions. Notably, Navnit Shukla is the accomplished author of the book Data Wrangling on AWS, showcasing his expertise in the field. He also runs the YouTube channel Cloud and Coffee with Navnit, where he shares insights on cloud technologies and analytics. Connect with him on LinkedIn.

How Banfico built an Open Banking and Payment Services Directive (PSD2) compliance solution on AWS

Post Syndicated from Otis Antoniou original https://aws.amazon.com/blogs/architecture/how-banfico-built-an-open-banking-and-payment-services-directive-psd2-compliance-solution-on-aws/

This post was co-written with Paulo Barbosa, the COO of Banfico. 

Introduction

Banfico is a London-based FinTech company, providing market-leading Open Banking regulatory compliance solutions. Over 185 leading Financial Institutions and FinTech companies use Banfico to streamline their compliance process and deliver the future of banking.

Under the EU’s revised PSD2, banks can use application programming interfaces (APIs) to securely share financial data with licensed and approved third-party providers (TPPs), when there is customer consent. For example, this can allow you to track your bank balances across multiple accounts in a single budgeting app.

PSD2 requires that all parties in the open banking system are identified in real time using secured certificates. Banks must also provide a service desk to TPPs, and communicate any planned or unplanned downtime that could impact the shared services.

In this blog post, you will learn how the Red Hat OpenShift Service on AWS helped Banfico deliver their highly secure, available, and scalable Open Banking Directory — a product that enables seamless and compliant connectivity between banks and FinTech companies.

Using this modular architecture, Banfico can also serve other use cases such as confirmation of payee, which is designed to help consumers verify that the name of the recipient account, or business, is indeed the name that they intended to send money to.

Design Considerations

Banfico prioritized the following design principles when building their product:

  1. Scalability: Banfico needed their solution to be able to scale up seamlessly as more financial institutions and TPPs begin to utilize the solution, without any interruption to service.
  2. Leverage Managed Solutions and Minimize Administrative Overhead: The Banfico team wanted to focus on their areas of core competency around the product, financial services regulation, and open banking. They wanted to leverage solutions that could minimize the amount of infrastructure maintenance they have to perform.
  3. Reliability: Because the PSD2 regulations require real-time identification and up-to-date communication about planned or unplanned downtime, reliability was a top priority to enable stable communication channels between TPPs and banks. The Open Banking Directory therefore needed to reach availability of 99.95%.
  4. Security and Compliance: The Open Banking Directory needed to be highly secure, ensuring that sensitive data is protected at all times. This was also important due to Banfico’s ISO27001 certification.

To address these requirements, Banfico decided to partner up with AWS and Red Hat and use the Red Hat OpenShift Service on AWS (ROSA). This is a service operated by Red Hat and jointly supported with AWS to provide fully managed Red Hat OpenShift platform that gives them a scalable, secure, and reliable way to build their product. They also leveraged other AWS Managed Services to minimize infrastructure management tasks and focus on delivering business value for their customers.

To understand how they were able to architect a solution that addressed their needs while following the design considerations, see the following reference architecture diagram.

Banfico’s Open Banking Directory Architecture Overview:

Banfico's open banking directory architecture overview diagram

Breakdown of key components:

Red Hat OpenShift Service on AWS (ROSA) cluster: The Banfico Open Banking SaaS key services are built on a ROSA cluster that is deployed across three Availability Zones for high availability and fault tolerance. These key services support the following fundamental business capabilities:

  • Their core aggregated API platform that integrates with, and provides access to banking information for TPPs.
  • Facilitating transactions and payment authorizations.
  • TPP authentication and authorization, more specifically:
    • Checking if a certain TPP is authorized by each country’s central bank to check account information and initiate payments.
    • Validating TPP certificates that are issued by Qualified Trust Service Provider (QTSPs), which are: “regulated (Qualified) to provide trusted digital certificates under the electronic Identification and Signature (eIDAS) regulation. PSD2 also requires specific types of eIDAS certificates to be issued.” – Planky Open Banking Glossary
  • Certificate issuing and management. Banfico is able to issue, manage, and store digital certificates that TPPs can use to interact with Open Banking APIs.
  • The collection of data from central banks across the world to collect regulated entity details.

Elastic Load Balancer (ELB): A load balancer helps Banfico deliver their highly-available and scalable product. It allows them to route traffic across their containers as they grow, and perform health checks accordingly, and it provides Banfico customers access to the application workloads running on ROSA through the ROSA router layers.

Amazon Elastic File System (Amazon EFS): During the collection of data from central banks, either through APIs or by scraping HTML, Banfico’s workloads and apps use the highly-scalable and durable Amazon EFS for shared storage. Amazon EFS automatically scales and provides high availability, simplifying operations and enabling Banfico to focus on application development and delivery.

Amazon Simple Storage Service (Amazon S3): To store digital certificates issued and managed by Banfico’s Open Banking Directory, they rely on Amazon S3, which is a highly-durable, available, and scalable object storage service.

Amazon Relational Database Service (Amazon RDS): The Open Banking Directory uses Amazon RDS PostgreSQL to store application data coming from their different containerized services. Using Amazon RDS, they are able to have a highly-available managed relational database which they also replicate to a secondary Region for disaster recovery purposes.

AWS Key Management Service (AWS KMS): Banfico uses AWS KMS to encrypt all data stored on the volumes used by Amazon RDS to make sure their data is secured.

AWS Identity and Access Management (IAM): Leveraging IAM with the principle of least privilege allows the product to follow security best practices.

AWS Shield: Banfico’s product relies on AWS Shield for DDoS protection, which helps in dynamic detection and automatic inline mitigation.

Amazon Route 53: Amazon Route 53 routes end users to Banfico’s site reliably with globally dispersed Domain Name System (DNS) servers and automatic scaling. They can set up in minutes, and having custom routing policies help Banfico maintain compliance.

Using this architecture and AWS technologies, Banfico is able to deliver their Open Banking Directory to their customers, through a SaaS frontend as shown in the following image.

Banfico's Open Banking Directory SaaS front-end

Conclusion

This AWS solution has proven instrumental in meeting Banfico’s critical business needs, delivering 99.95% availability and scalability. Through the utilization of AWS services, the Open Banking Directory product seamlessly accommodates the entirety of Banfico’s client traffic across Europe. This heightened agility not only facilitates rapid feature deployment (40% faster application development), but also enhances user satisfaction. Looking ahead, Banfico’s Open Banking Directory remains committed to fostering safety and trust within the open banking ecosystem, with AWS standing as a valued partner in Banfico’s journey toward sustained success. Customers who are looking to build their own secure and scalable products in the Financial Services Industry have access industry AWS Specialists; contact us for help in your cloud journey. You can also learn more about AWS services and solutions for financial services by visiting AWS for Financial Services.

Introducing the APRA CPS 230 AWS Workbook for Australian financial services customers

Post Syndicated from Krish De original https://aws.amazon.com/blogs/security/introducing-the-apra-cps-230-aws-workbook-for-australian-financial-services-customers/

The Australian Prudential Regulation Authority (APRA) has established the CPS 230 Operational Risk Management standard to verify that regulated entities are resilient to operational risks and disruptions. CPS 230 requires regulated financial entities to effectively manage their operational risks, maintain critical operations during disruptions, and manage the risks associated with service providers.

Amazon Web Services (AWS) is excited to announce the launch of the AWS Workbook for the APRA CPS 230 standard to support AWS customers as they work to meet applicable CPS 230 requirements. The workbook describes operational resilience, AWS and the Shared Responsibility Model, AWS compliance programs, and relevant AWS services and whitepapers that relate to regulatory requirements.

This workbook is complementary to the AWS User Guide to Financial Services Regulations and Guidelines in Australia and is available through AWS Artifact.

As the regulatory environment continues to evolve, we’ll provide further updates regarding AWS offerings in this area on the AWS Security Blog and the AWS Compliance page. The AWS Workbook for the APRA CPS 230 adds to the resources AWS provides about financial services regulation across the world. You can find more information on cloud-related regulatory compliance at the AWS Compliance Center. You can also reach out to your AWS account manager for help finding the resources you need.

 
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Krish De
Krish De

Krish is a Principal Solutions Architect with a focus on financial services. He works with AWS customers, their regulators, and AWS teams to safely accelerate customers’ cloud adoption, with prescriptive guidance on governance, risk, and compliance. Krish has over 20 years of experience working in governance, risk, and technology across the financial services industry in Australia, New Zealand, and the United States.

How CFM built a well-governed and scalable data-engineering platform using Amazon EMR for financial features generation

Post Syndicated from Julien Lafaye original https://aws.amazon.com/blogs/big-data/how-cfm-built-a-well-governed-and-scalable-data-engineering-platform-using-amazon-emr-for-financial-features-generation/

This post is co-written with Julien Lafaye from CFM.

Capital Fund Management (CFM) is an alternative investment management company based in Paris with staff in New York City and London. CFM takes a scientific approach to finance, using quantitative and systematic techniques to develop the best investment strategies. Over the years, CFM has received many awards for their flagship product Stratus, a multi-strategy investment program that delivers decorrelated returns through a diversified investment approach while seeking a risk profile that is less volatile than traditional market indexes. It was first opened to investors in 1995. CFM assets under management are now $13 billion.

A traditional approach to systematic investing involves analysis of historical trends in asset prices to anticipate future price fluctuations and make investment decisions. Over the years, the investment industry has grown in such a way that relying on historical prices alone is not enough to remain competitive: traditional systematic strategies progressively became public and inefficient, while the number of actors grew, making slices of the pie smaller—a phenomenon known as alpha decay. In recent years, driven by the commoditization of data storage and processing solutions, the industry has seen a growing number of systematic investment management firms switch to alternative data sources to drive their investment decisions. Publicly documented examples include the usage of satellite imagery of mall parking lots to estimate trends in consumer behavior and its impact on stock prices. Using social network data has also often been cited as a potential source of data to improve short-term investment decisions. To remain at the forefront of quantitative investing, CFM has put in place a large-scale data acquisition strategy.

As the CFM Data team, we constantly monitor new data sources and vendors to continue to innovate. The speed at which we can trial datasets and determine whether they are useful to our business is a key factor of success. Trials are short projects usually taking up to a several months; the output of a trial is a buy (or not-buy) decision if we detect information in the dataset that can help us in our investment process. Unfortunately, because datasets come in all shapes and sizes, planning our hardware and software requirements several months ahead has been very challenging. Some datasets require large or specific compute capabilities that we can’t afford to buy if the trial is a failure. The AWS pay-as-you-go model and the constant pace of innovation in data processing technologies enable CFM to maintain agility and facilitate a steady cadence of trials and experimentation.

In this post, we share how we built a well-governed and scalable data engineering platform using Amazon EMR for financial features generation.

AWS as a key enabler of CFM’s business strategy

We have identified the following as key enablers of this data strategy:

  • Managed services – AWS managed services reduce the setup cost of complex data technologies, such as Apache Spark.
  • Elasticity – Compute and storage elasticity removes the burden of having to plan and size hardware procurement. This allows us to be more focused on the business and more agile in our data acquisition strategy.
  • Governance – At CFM, our Data teams are split into autonomous teams that can use different technologies based on their requirements and skills. Each team is the sole owner of its AWS account. To share data to our internal consumers, we use AWS Lake Formation with LF-Tags to streamline the process of managing access rights across the organization.

Data integration workflow

A typical data integration process consists of ingestion, analysis, and production phases.

CFM usually negotiates with vendors a download method that is convenient for both parties. We see a lot of possibilities for exchanging data (HTTPS, FPT, SFPT), but we’re seeing a growing number of vendors standardizing around Amazon Simple Storage Service (Amazon S3).

 CFM data scientists then look up the data and build features that can be used in our trading models. The bulk of our data scientists are heavy users of Jupyter Notebook. Jupyter notebooks are interactive computing environments that allow users to create and share documents containing live code, equations, visualizations, and narrative text. They provide a web-based interface where users can write and run code in different programming languages, such as Python, R, or Julia. Notebooks are organized into cells, which can be run independently, facilitating the iterative development and exploration of data analysis and computational workflows.

We invested a lot in polishing our Jupyter stack (see, for example, the open source project Jupytext, which was initiated by a former CFM employee), and we are proud of the level of integration with our ecosystem that we have reached. Although we explored the option of using AWS managed notebooks to streamline the provisioning process, we have decided to continue hosting these components on our on-premises infrastructure for the current timeline. CFM internal users appreciate the existing development environment and switching to an AWS managed environment would imply a change to their habits, and a temporary drop in productivity.

Exploration of small datasets is entirely feasible within this Jupyter environment, but for large datasets, we have identified Spark as the go-to solution. We could have deployed Spark clusters in our data centers, but we have found that Amazon EMR considerably reduces the time to deploy said clusters and provides many interesting features, such as ARM support through AWS Graviton processors, auto scaling capabilities, and the ability to provision transient clusters.

 After a data scientist has written the feature, CFM deploys a script to the production environment that refreshes the feature as new data comes in. These scripts often run in a relatively short amount of time because they only require processing a small increment of data.

Interactive data exploration workflow

CFM’s data scientists’ preferred way of interacting with EMR clusters is through Jupyter notebooks. Having a long history of managing Jupyter notebooks on premises and customizing them, we opted to integrate EMR clusters into our existing stack. The user workflow is as follows:

  1. The user provisions an EMR cluster through the AWS Service Catalog and the AWS Management Console. Users can also use API calls to do this, but usually prefer using the Service Catalog interface. You can choose various instance types that include different combinations of CPU, memory, and storage, giving you the flexibility to choose the appropriate mix of resources for your applications.
  2. The user starts their Jupyter notebook instance and connects to the EMR cluster.
  3. The user interactively works on the data using the notebook.
  4. The user shuts down the cluster through the Service Catalog.

Solution overview

The connection between the notebook and the cluster is achieved by deploying the following open source components:

  • Apache Livy – This service that provides a REST interface to a Spark driver running on an EMR cluster.
  • Sparkmagic – This set of Jupyter magics provides a straightforward way to connect to the cluster and send PySpark code to the cluster through the Livy endpoint.
  • Sagemaker-studio-analytics-extension – This library provides a set of magics to integrate analytics services (such as Amazon EMR) into Jupyter notebooks. It is used to integrate Amazon SageMaker Studio notebooks and EMR clusters (for more details, see Create and manage Amazon EMR Clusters from SageMaker Studio to run interactive Spark and ML workloads – Part 1). Having the requirement to use our own notebooks, we initially didn’t benefit from this integration. To help us, the Amazon EMR service team made this library available on PyPI and guided us in setting it up. We use this library to facilitate the connection between the notebook and the cluster and to forward the user permissions to the clusters through runtime roles. These runtime roles are then used to access the data instead of instance profile roles assigned to the Amazon Elastic Compute Cloud (Amazon EC2) instances that are part of the cluster. This allows more fine-grained access control on our data.

The following diagram illustrates the solution architecture.

Set up Amazon EMR on an EC2 cluster with the GetClusterSessionCredentials API

A runtime role is an AWS Identity and Access Management (IAM) role that you can specify when you submit a job or query to an EMR cluster. The EMR get-cluster-session-credentials API uses a runtime role to authenticate on EMR nodes based on the IAM policies attached runtime role (we document the steps to enable for the Spark terminal; a similar approach can be expanded for Hive and Presto). This option is generally available in all AWS Regions and the recommended release to use is emr-6.9.0 or later.

Connect to Amazon EMR on the EC2 cluster from Jupyter Notebook with the GCSC API

Jupyter Notebook magic commands provide shortcuts and extra functionality to the notebooks in addition to what can be done with your kernel code. We use Jupyter magics to abstract the underlying connection from Jupyter to the EMR cluster; the analytics extension makes the connection through Livy using the GCSC API.

On your Jupyter instance, server, or notebook PySpark kernel, install the following extension, load the magics, and create a connection to the EMR cluster using your runtime role:

pip install sagemaker-studio-analytics-extension
%load_ext sagemaker_studio_analytics_extension.magics
%sm_analytics emr connect --cluster-id j-XXXXXYYYYY --auth-type Basic_Access --language python --emr-executiojn-role-arn

Production with Amazon EMR Serverless

CFM has implemented an architecture based on dozens of pipelines: data is ingested from data on Amazon S3 and transformed using Amazon EMR Serverless with Spark; resulting datasets are published back to Amazon S3.

Each pipeline runs as a separate EMR Serverless application to avoid resource contention between workloads. Individual IAM roles are assigned to each EMR Serverless application to apply least privilege access.

To control costs, CFM uses EMR Serverless automatic scaling combined with the maximum capacity feature (which defines the maximum total vCPU, memory, and disk capacity that can be consumed collectively by all the jobs running under this application). Finally, CFM uses an AWS Graviton architecture to optimize even more cost and performance (as highlighted in the screenshot below).

After some iterations, the user produces a final script that is put in production. For early deployments, we relied on Amazon EMR on EC2 to run those scripts. Based on user feedback, we iterated and investigated for opportunities to reduce cluster startup times. Cluster startups could take up to 8 minutes for a runtime requiring a fraction of that time, which impacted the user experience. Also, we wanted to reduce the operational overhead of starting and stopping EMR clusters.

Those are the reasons why we switched to EMR Serverless a few months after its initial release. This move was surprisingly straightforward because it didn’t require any tuning and worked instantly. The only drawback we have seen is the requirement to update AWS tools and libraries in our software stacks to incorporate all the EMR features (such as AWS Graviton); on the other hand, it led to reduced startup time, reduced costs, and better workload isolation.

At this stage, CFM data scientists can perform analytics and extract value from raw data. Resulting datasets are then published to our data mesh service across our organization to allow our scientists to work on prediction models. In the context of CFM, this requires a strong governance and security posture to apply fine-grained access control to this data. This data mesh approach allows CFM to have a clear view from an audit standpoint on dataset usage.

Data governance with Lake Formation

A data mesh on AWS is an architectural approach where data is treated as a product and owned by domain teams. Each team uses AWS services like Amazon S3, AWS Glue, AWS Lambda, and Amazon EMR to independently build and manage their data products, while tools like the AWS Glue Data Catalog enable discoverability. This decentralized approach promotes data autonomy, scalability, and collaboration across the organization:

  • Autonomy – At CFM, like at most companies, we have different teams with difference skillsets and different technology needs. Enabling teams to work autonomously was a key parameter in our decision to move to a decentralized model where each domain would live in its own AWS account. Another advantage was improved security, particularly the ability to contain the potential impact area in the event of credential leaks or account compromises. Lake Formation is key in enabling this kind of model because it streamlines the process of managing access rights across accounts. In the absence of Lake Formation, administrators would have to make sure that resource policies and user policies align to grant access to data: this is usually considered complex, error-prone, and hard to debug. Lake Formation makes this process a lot less complicated.
  • Scalability – There are no blockers that prevent other organization units from joining the data mesh structure, and we expect more teams to join the effort of refining and sharing their data assets.
  • Collaboration – Lake Formation provides a sound foundation for making data products discoverable by CFM internal consumers. On top of Lake Formation, we developed our own Data Catalog portal. It provides a user-friendly interface where users can discover datasets, read through the documentation, and download code snippets (see the following screenshot). The interface is tailor-made for our work habits.

Lake Formation documentation is extensive and provides a collection of ways to achieve a data governance pattern that fits every organization requirement. We made the following choices:

  • LF-Tags – We use LF-Tags instead of named resource permissioning. Tags are associated to resources, and personas are given the permission to access all resources with a certain tag. This makes scaling the process of managing rights straightforward. Also, this is an AWS recommended best practice.
  • Centralization – Databases and LF-Tags are managed in a centralized account, which is managed by a single team.
  • Decentralization of permissions management – Data producers are allowed to associate tags to the datasets they are responsible for. Administrators of consumer accounts can grant access to tagged resources.

Conclusions

In this post, we discussed how CFM built a well-governed and scalable data engineering platform for financial features generation.

Lake Formation provides a solid foundation for sharing datasets across accounts. It removes the operational complexity of managing complex cross-account access through IAM and resource policies. For now, we only use it to share assets created by data scientists, but plan to add new domains in the near future.

Lake Formation also seamlessly integrates with other analytics services like AWS Glue and Amazon Athena. The ability to provide a comprehensive and integrated suite of analytics tools to our users is a strong reason for adopting Lake Formation.

Last but not least, EMR Serverless reduced operational risk and complexity. EMR Serverless applications start in less than 60 seconds, whereas starting an EMR cluster on EC2 instances typically takes more than 5 minutes (as of this writing). The accumulation of those earned minutes effectively eliminated any further instances of missed delivery deadlines.

If you’re looking to streamline your data analytics workflow, simplify cross-account data sharing, and reduce operational overhead, consider using Lake Formation and EMR Serverless in your organization. Check out the AWS Big Data Blog and reach out to your AWS team to learn more about how AWS can help you use managed services to drive efficiency and unlock valuable insights from your data!


About the Authors

Julien Lafaye is a director at Capital Fund Management (CFM) where he is leading the implementation of a data platform on AWS. He is also heading a team of data scientists and software engineers in charge of delivering intraday features to feed CFM trading strategies. Before that, he was developing low latency solutions for transforming & disseminating financial market data. He holds a Phd in computer science and graduated from Ecole Polytechnique Paris. During his spare time, he enjoys cycling, running and tinkering with electronic gadgets and computers.

Matthieu Bonville is a Solutions Architect in AWS France working with Financial Services Industry (FSI) customers. He leverages his technical expertise and knowledge of the FSI domain to help customer architect effective technology solutions that address their business challenges.

Joel Farvault is Principal Specialist SA Analytics for AWS with 25 years’ experience working on enterprise architecture, data governance and analytics, mainly in the financial services industry. Joel has led data transformation projects on fraud analytics, claims automation, and Master Data Management. He leverages his experience to advise customers on their data strategy and technology foundations.

Automatically replicate your card payment keys across AWS Regions

Post Syndicated from Ruy Cavalcanti original https://aws.amazon.com/blogs/security/automatically-replicate-your-card-payment-keys-across-aws-regions/

In this blog post, I dive into a cross-Region replication (CRR) solution for card payment keys, with a specific focus on the powerful capabilities of AWS Payment Cryptography, showing how your card payment keys can be securely transported and stored.

In today’s digital landscape, where online transactions have become an integral part of our daily lives, ensuring the seamless operation and security of card payment transactions is of utmost importance. As customer expectations for uninterrupted service and data protection continue to rise, organizations are faced with the challenge of implementing robust security measures and disaster recovery strategies that can withstand even the most severe disruptions.

For large enterprises dealing with card payments, the stakes are even higher. These organizations often have stringent requirements related to disaster recovery (DR), resilience, and availability, where even a 99.99 percent uptime isn’t enough. Additionally, because these enterprises deliver their services globally, they need to ensure that their payment applications and the associated card payment keys, which are crucial for securing card data and payment transactions, are securely replicated and stored across AWS Regions.

Furthermore, I explore an event-driven, serverless architecture and the use of AWS PrivateLink to securely move keys through the AWS backbone, providing additional layers of security and efficiency. Overall, this blog post offers valuable insights into using AWS services for secure and resilient data management across AWS Regions.

Card payment key management

If you examine key management, you will notice that card payment keys are shared between devices and third parties today the same as they were around 40 years ago.

A key ceremony is the process held when parties want to securely exchange keys. It involves key custodians responsible for transporting and entering, key components that have been printed on pieces of paper into a hardware security module (HSM). This is necessary to share initial key encryption keys.

Let’s look at the main issues with the current key ceremony process:

  • It requires a secure room with a network-disconnected Payment HSM
  • The logistics are difficult: Three key custodians in the same place at the same time
  • Timewise, it usually takes weeks to have all custodians available, which can interfere with a project release
  • The cost of the operation which includes maintaining a secure room and the travel of the key custodians
  • Lost or stolen key components

Now, let’s consider the working keys used to encrypt sensitive card data. They rely on those initial keys to protect them. If the initial keys are compromised, their associated working keys are also considered compromised. I also see companies using key management standards from the 1990s, such as ANSI X9.17 / FIPS 171, to share working keys. NIST withdrew the FIPS 171 standard in 2005.

Analyzing the current scenario, you’ll notice security risks because of the way keys are shared today and sometimes because organizations are using deprecated standards.

So, let’s get card payment security into the twenty-first century!

Solution overview

AWS Payment Cryptography is a highly available and scalable service that currently operates within the scope of an individual Region. This means that the encryption keys and associated metadata are replicated across multiple Availability Zones within that Region, providing redundancy and minimizing the risk of downtime caused by failures within a single Region.

While this regional replication across multiple Availability Zones provides a higher level of availability and fault tolerance compared to traditional on-premises HSM solutions, some customers with stringent business continuity requirements have requested support for multi-Region replication.

By spanning multiple Regions, organizations can achieve a higher level of resilience and disaster recovery capabilities because data and services can be replicated and failover mechanisms can be implemented across geographically dispersed locations.

This Payment Cryptography CRR solution addresses the critical requirements of high availability, resilience, and disaster recovery for card payment transactions. By replicating encryption keys and associated metadata across multiple Regions, you can maintain uninterrupted access to payment services, even in the event of a regional outage or disaster.

Note: When planning your replication strategy, check the available Payment Cryptography service endpoints.

Here’s how it works:

  1. Primary Region: Encryption keys are generated and managed in a primary Region using Payment Cryptography.
  2. Replication: The generated encryption keys are securely replicated to a secondary Region, creating redundant copies for failover purposes.
  3. Failover: In the event of a regional outage or disaster in the primary Region, payment operations can seamlessly failover to a secondary Region, using the replicated encryption keys to continue processing transactions without interruption.

This cross-Region replication approach enhances availability and resilience and facilitates robust disaster recovery strategies, allowing organizations to quickly restore payment services in a new Region if necessary.

Figure 1: Cross-Region replication (CRR) solution architecture

Figure 1: Cross-Region replication (CRR) solution architecture

The elements of the CRR architecture are as follows:

  1. Payment Cryptography control plane events are sent to an AWS CloudTrail trail.
  2. The CloudTrail trail is configured to send logs to an Amazon CloudWatch Logs log group.
  3. This log group contains an AWS Lambda subscription filter that filters the following events from Payment Cryptography: CreateKey, DeleteKey and ImportKey.
  4. When one of the events is detected, a Lambda function is launched to start key replication.
  5. The Lambda function performs key export and import processes in a secure way using TR-31, which uses an initial key securely generated and shared using TR-34. This initial key is generated when the solution is enabled.
  6. Communication between the primary (origin) Region and the Payment Cryptography service endpoint at the secondary (destination) Region is done through an Amazon Virtual Private Cloud (Amazon VPC) peering connection, over VPC interface endpoints from PrivateLink.
  7. Metadata information is saved on Amazon DynamoDB tables.

Walkthrough

The CRR solution is deployed in several steps, and it’s essential to understand the underlying processes involved, particularly TR-34 (ANSI X9.24-2) and TR-31 (ANSI X9.143-2022), which play crucial roles in ensuring the secure replication of card payment keys across Regions.

  1. Clone the solution repository from GitHub.
  2. Verify that the prerequisites are in place.
  3. Define which Region the AWS Cloud Development Kit (AWS CDK) stack will be deployed in. This is the primary Region that Payment Cryptography keys will be replicated from.
  4. Enable CRR. This step involves the TR-34 process, which is a widely adopted standard for the secure distribution of symmetric keys using asymmetric techniques. In the context of this solution, TR-34 is used to securely exchange the initial key-encrypting key (KEK) between the primary and secondary Regions. This KEK is then used to encrypt and securely transmit the card payment keys (also called working keys) during the replication process. TR-34 uses asymmetric cryptographic algorithms, such as RSA, to maintain the confidentiality, integrity and authenticity of the exchanged keys.

    Figure 2: TR-34 import key process

    Figure 2: TR-34 import key process

  5. Create, import, and delete keys in the primary Region to check that keys will be automatically replicated. This step uses the TR-31 process, which is a standard for the secure exchange of cryptographic keys and related data. In this solution, TR-31 is employed to securely replicate the card payment keys from the primary Region to the secondary Region, using the previously established KEK for encryption. TR-31 incorporates various cryptographic algorithms, such as AES and HMAC, to protect the confidentiality and integrity of the replicated keys during transit

    Figure 3: TR-31 import key process

    Figure 3: TR-31 import key process

  6. Clean up when needed.

Detailed information about key blocks can be found on the related ANSI documentation. To summarize, the TR-31 key block specification and the TR-34 key block specification, which is based on the TR-31 key block specification, consists of three parts:

  1. Key block header (KBH) – Contains attribute information about the key and the key block.
  2. Encrypted data – This is the key (initial key encryption key for TR-34 and working key for TR-31) being exchanged.
  3. Signature (MAC) – Calculated over the KBH and encrypted data.

Figure 4 presents the entire TR-31 and TR-34 key block parts. It is also called key binding method, which is the technique used to protect the key block secrecy and integrity. On both key blocks, the key, its length, and padding fields are encrypted, maintaining the key block secrecy. Signing of the entire key block fields verifies its integrity and authenticity. The signed result is appended to the end of the block.

Figure 4: TR-31 and TR-34 key block formats

Figure 4: TR-31 and TR-34 key block formats

By adhering to industry-standard protocols like TR-34 and TR-31, this solution helps to ensure that the replication of card payment keys across Regions is performed in a secure manner that delivers confidentiality, integrity, and authenticity. It’s worth mentioning that Payment Cryptography fully supports and implements these standards, providing a solution that adheres to PCI standards for secure key management and replication.

If you want to dive deep into this key management processes, see the service documentation page on import and export keys.

Prerequisites

The Payment Cryptography CRR solution will be deployed through the AWS CDK. The code was developed in Python and assumes that there is a python3 executable in your path. It’s also assumed that the AWS Command Line Interface (AWS CLI) and AWS CDK executables exist in the path system variable of your local computer.

Download and install the following:

It’s recommended that you use the latest stable versions. Tests were performed using the following versions:

  • Python: 3.12.2 (MacOS version)
  • jq: 1.7.1 (MacOS version)
  • AWS CLI: aws-cli/2.15.29 Python/3.11.8 Darwin/22.6.0 exe/x86_64 prompt/off
  • AWS CDK: 2.132.1 (build 9df7dd3)

To set up access to your AWS account, see Configure the AWS CLI.

Note: Tests and commands in the following sections where run on a MacOS operating system.

Deploy the primary resources

The solution is deployed in two main parts:

  1. Primary Region resources deployment
  2. CRR setup, where a secondary Region is defined for deployment of the necessary resources

This section will cover the first part:

Figure 5: Primary Region resources

Figure 5: Primary Region resources

Figure 5 shows the resources that will be deployed in the primary Region:

  1. A CloudTrail trail for write-only log events.
  2. CloudWatch Logs log group associated with the CloudTrail trail. An Amazon Simple Storage Service (Amazon S3) bucket is also created to store this trail’s log events.
  3. A VPC, private subnets, a security group, Lambda functions, and VPC endpoint resources to address private communication inside the AWS backbone.
  4. DynamoDB tables and DynamoDB Streams to manage key replication and orchestrate the solution deployment to the secondary Region.
  5. Lambda functions responsible for managing and orchestrating the solution deployment and setup.

Some parameters can be configured before deployment. They’re located in the cdk.json file (part of the GitHub solution to be downloaded) inside the solution base directory.

The parameters reside inside the context.ENVIRONMENTS.dev key:

{
  ...
  "context": {
    ...
    "ENVIRONMENTS": {
      "dev": {
        "origin_vpc_cidr": "10.2.0.0/16",
        "origin_vpc_name": "origin-vpc",
        "origin_subnets_mask": 22,
        "origin_subnets_prefix_name": "origin-subnet-private"
      }
    }
  }
}

Note: You can change the parameters origin_vpc_cidr, origin_vpc_name and origin_subnets_prefix_name.

Validate that there aren’t VPCs already created with the same CIDR range as the one defined in this file. Currently, the solution is set to be deployed in only two Availability Zones, so the suggestion is to keep the origin_subnet_mask value as is.

To deploy the primary resources:

  1. Download the solution folder from GitHub:
    $ git clone https://github.com/aws-samples/automatically-replicate-your-card-payment-keys.git && cd automatically-replicate-your-card-payment-keys

  2. Inside the solution directory, create a python virtual environment:
    $ python3 -m venv .venv

  3. Activate the python virtual environment:
    $ source .venv/bin/activate

  4. Install the dependencies:
    $ pip install -r requirements.txt

  5. If this is the first time deploying resources with the AWS CDK to your account in the selected AWS Region, run:
    $ cdk bootstrap

  6. Deploy the solution using the AWS CDK:
    $ cdk deploy

    Expected output:

    Do you wish to deploy these changes (y/n)? y  
    apc-crr: deploying... [1/1]  
    apc-crr: creating CloudFormation changeset...
     ✅  apc-crr
    ✨  Deployment time: 307.88s
    Stack ARN:
    arn:aws:cloudformation:<aws_region>:<aws_account>:stack/apc-crr/<stack_id>
    ✨  Total time: 316.06s

  7. If the solution is correctly deployed, an AWS CloudFormation stack with the name apc-crr will have a status of CREATE_COMPLETE status. You can check that by running the following command:
    $ aws cloudformation list-stacks --stack-status CREATE_COMPLETE

    Expected output:

    {
        "StackSummaries": [
            {
                "StackId": "arn:aws:cloudformation:us-east-1:111122223333:stack/apc-crr/5933bc00-f5c1-11ee-9bb2-12ef8d00991b",
                "StackName": "apc-crr",
                "CreationTime": "2024-04-08T16:02:07.413000+00:00",
                "LastUpdatedTime": "2024-04-08T16:02:21.439000+00:00",
                "StackStatus": "CREATE_COMPLETE",
                "DriftInformation": {
                    "StackDriftStatus": "NOT_CHECKED"
                }
            },
            {
                "StackId": "arn:aws:cloudformation:us-east-1:111122223333:stack/CDKToolkit/781e5390-e528-11ee-823a-0a6d63bbc467",
                "StackName": "CDKToolkit",
                "TemplateDescription": "This stack includes resources needed to deploy AWS CDK apps into this environment",
                "CreationTime": "2024-03-18T13:07:27.472000+00:00",
                "LastUpdatedTime": "2024-03-18T13:07:35.060000+00:00",
                "StackStatus": "CREATE_COMPLETE",
                "DriftInformation": {
                    "StackDriftStatus": "NOT_CHECKED"
                }
            }
        ]
    }

Set up cross-Region replication

Some parameters can be configured before initiating the setup. They’re located in the enable-crr.json file in the ./application folder.

The contents of the enable-crr.json file are:

{
  "enabled": true,
  "dest_region": "us-east-1",
  "kek_alias": "CRR_KEK_DO-NOT-DELETE_",
  "key_algo": "TDES_3KEY",
  "kdh_alias": "KDH_SIGN_KEY_DO-NOT-DELETE_",
  "krd_alias": "KRD_SIGN_KEY_DO-NOT-DELETE_",
  "dest_vpc_name": "apc-crr/destination-vpc",
  "dest_vpc_cidr": "10.3.0.0/16",
  "dest_subnet1_cidr": "10.3.0.0/22",
  "dest_subnet2_cidr": "10.3.4.0/22",
  "dest_subnets_prefix_name": "apc-crr/destination-vpc/destination-subnet-private",
  "dest_rt_prefix_name": "apc-crr/destination-vpc/destination-rtb-private"
}

You can change the dest_region, dest_vpc_name, dest_vpc_cidr, dest_subnet1_cidr, dest_subnet2_cidr, dest_subnets_prefix_name and dest_rt_prefix_name parameters.

Validate that there are no VPCs or subnets already created with the same CIDR ranges as are defined in this file.

To enable CRR and monitor its deployment process

  1. Enable CRR.

    From the solution base folder, navigate to the application directory:

    $ cd application

    Run the enable script.

    $ ./enable-crr.sh

    Expected output:

    START RequestId: 8aad062a-ff0b-4963-8ca0-f8078346854f Version: $LATEST  
    Setup has initiated. A CloudFormation template will be deployed in us-west-2.  
    Please check the apcStackMonitor log to follow the deployment status.  
    You can do that by checking the CloudWatch Logs Log group /aws/apc-crr/apcStackMonitor in the Management Console,  
    or by typing on a shell terminal: aws logs tail "/aws/lambda/apcStackMonitor" --follow  
    You can also check the CloudFormation Stack in the Management Console: Account 111122223333, Region us-west-2  
    END RequestId: 8aad062a-ff0b-4963-8ca0-f8078346854f  
    REPORT RequestId: 8aad062a-ff0b-4963-8ca0-f8078346854f  Duration: 1484.53 ms  Billed Duration: 1485 ms  Memory Size: 128 MB Max Memory Used: 79 MB  Init Duration: 400.95 ms

    This will launch a CloudFormation stack to be deployed in the AWS Region that the keys will be replicated to (secondary Region). Logs will be presented in the /aws/lambda/apcStackMonitor log (terminal from step 2).

    If the stack is successfully deployed (CREATE_COMPLETE state), then the KEK setup will be invoked. Logs will be presented in the /aws/lambda/apcKekSetup log (terminal from step 3).

    If the following message is displayed in the apcKekSetup log, then it means that the setup was concluded and new working keys created, deleted, or imported will be replicated.

    Keys Generated, Imported and Deleted in <Primary (Origin) Region> are now being automatically replicated to <Secondary (Destination) Region>

    There should be two keys created in the Region where CRR is deployed and two keys created where the working keys will be replicated. Use the following commands to check the keys:

    $ aws payment-cryptography list-keys --region us-east-1

    Command output showing the keys generated in the primary Region (us-east-1 in the example):

    {
        "Keys": [
            {
                "Enabled": true,
                "KeyArn": "arn:aws:payment-cryptography:us-east-1:111122223333:key/oevdxprw6szesmfx",
                "KeyAttributes": {
                    "KeyAlgorithm": "RSA_4096",
                    "KeyClass": "PUBLIC_KEY",
                    "KeyModesOfUse": {
                        "Decrypt": false,
                        "DeriveKey": false,
                        "Encrypt": false,
                        "Generate": false,
                        "NoRestrictions": false,
                        "Sign": false,
                        "Unwrap": false,
                        "Verify": true,
                        "Wrap": false
                    },
                    "KeyUsage": "TR31_S0_ASYMMETRIC_KEY_FOR_DIGITAL_SIGNATURE"
                },
                "KeyState": "CREATE_COMPLETE"
            },
            {
                "Enabled": true,
                "Exportable": true,
                "KeyArn": "arn:aws:payment-cryptography:us-east-1:111122223333:key/ey63g3an7u4ifz7u",
                "KeyAttributes": {
                    "KeyAlgorithm": "TDES_3KEY",
                    "KeyClass": "SYMMETRIC_KEY",
                    "KeyModesOfUse": {
                        "Decrypt": true,
                        "DeriveKey": false,
                        "Encrypt": true,
                        "Generate": false,
                        "NoRestrictions": false,
                        "Sign": false,
                        "Unwrap": true,
                        "Verify": false,
                        "Wrap": true
                    },
                    "KeyUsage": "TR31_K0_KEY_ENCRYPTION_KEY"
                },
                "KeyCheckValue": "7FB069",
                "KeyState": "CREATE_COMPLETE"
            }
        ]
    }
    
    $ aws payment-cryptography list-keys --region us-west-2

    The following is the command output showing the keys generated in the secondary Region (us-west-2 in the example):

    {
        "Keys": [
            {
                "Enabled": true,
                "Exportable": true,
                "KeyArn": "arn:aws:payment-cryptography:us-west-2:111122223333:key/4luahnz4ubuioq4s",
                "KeyAttributes": {
                    "KeyAlgorithm": "RSA_2048",
                    "KeyClass": "ASYMMETRIC_KEY_PAIR",
                    "KeyModesOfUse": {
                        "Decrypt": true,
                        "DeriveKey": false,
                        "Encrypt": true,
                        "Generate": false,
                        "NoRestrictions": false,
                        "Sign": false,
                        "Unwrap": true,
                        "Verify": false,
                        "Wrap": true
                    },
                    "KeyUsage": "TR31_D1_ASYMMETRIC_KEY_FOR_DATA_ENCRYPTION"
                },
                "KeyCheckValue": "56739D06",
                "KeyState": "CREATE_COMPLETE"
            },
            {
                "Enabled": true,
                "Exportable": true,
                "KeyArn": "arn:aws:payment-cryptography:us-west-2:111122223333:key/5gao3i6qvuyqqtzk",
                "KeyAttributes": {
                    "KeyAlgorithm": "TDES_3KEY",
                    "KeyClass": "SYMMETRIC_KEY",
                    "KeyModesOfUse": {
                        "Decrypt": true,
                        "DeriveKey": false,
                        "Encrypt": true,
                        "Generate": false,
                        "NoRestrictions": false,
                        "Sign": false,
                        "Unwrap": true,
                        "Verify": false,
                        "Wrap": true
                    },
                    "KeyUsage": "TR31_K0_KEY_ENCRYPTION_KEY"
                },
                "KeyCheckValue": "7FB069",
                "KeyState": "CREATE_COMPLETE"
            }
        ]
    }

  2. Monitor the resources deployment in the secondary Region. Open a terminal to tail the apcStackMonitor Lambda log and check the deployment of the resources in the secondary Region.
    $ aws logs tail "/aws/lambda/apcStackMonitor" --follow

    The expected output is:

    2024-03-05T15:18:17.870000+00:00 2024/03/05/apcStackMonitor[$LATEST]6e6762b029cb4f7d8963c3206226deac INIT_START Runtime Version: python:3.11.v29  Runtime Version ARN: arn:aws:lambda:us-east-1::runtime:2fb93380dac14772d30092f109b1784b517398458eef71a3f757425231fe6769  
    2024-03-05T15:18:18.321000+00:00 2024/03/05/apcStackMonitor[$LATEST]6e6762b029cb4f7d8963c3206226deac START RequestId: 1bdd37b4-e95b-43bd-a49b-9da55e603845 Version: $LATEST  
    2024-03-05T15:18:18.933000+00:00 2024/03/05/apcStackMonitor[$LATEST]6e6762b029cb4f7d8963c3206226deac Stack creation in progress. Status: CREATE_IN_PROGRESS  
    2024-03-05T15:18:24.017000+00:00 2024/03/05/apcStackMonitor[$LATEST]6e6762b029cb4f7d8963c3206226deac Stack creation in progress. Status: CREATE_IN_PROGRESS  
    2024-03-05T15:18:29.108000+00:00 2024/03/05/apcStackMonitor[$LATEST]6e6762b029cb4f7d8963c3206226deac Stack creation in progress. Status: CREATE_IN_PROGRESS  
    ...  
    2024-03-05T15:21:32.302000+00:00 2024/03/05/apcStackMonitor[$LATEST]6e6762b029cb4f7d8963c3206226deac Stack creation in progress. Status: CREATE_IN_PROGRESS  
    2024-03-05T15:21:37.390000+00:00 2024/03/05/apcStackMonitor[$LATEST]6e6762b029cb4f7d8963c3206226deac Stack creation completed. Status: CREATE_COMPLETE  
    2024-03-05T15:21:38.258000+00:00 2024/03/05/apcStackMonitor[$LATEST]6e6762b029cb4f7d8963c3206226deac Stack successfully deployed. Status: CREATE_COMPLETE  
    2024-03-05T15:21:38.354000+00:00 2024/03/05/apcStackMonitor[$LATEST]6e6762b029cb4f7d8963c3206226deac END RequestId: 1bdd37b4-e95b-43bd-a49b-9da55e603845  
    2024-03-05T15:21:38.354000+00:00 2024/03/05/apcStackMonitor[$LATEST]6e6762b029cb4f7d8963c3206226deac REPORT RequestId: 1bdd37b4-e95b-43bd-a49b-9da55e603845Duration: 200032.11 ms Billed Duration: 200033 ms  Memory Size: 128 MB Max Memory Used: 93 MB  Init Duration: 450.87 ms

  3. Monitor the setup of the KEKs between the primary and secondary Regions. Open another terminal to tail the apcKekSetup Lambda log and check the setup of the KEK between the key distribution host (Payment Cryptography in the primary Region) and the key receiving devices (Payment Cryptography in the secondary Region).

    This process uses the TR-34 norm.

    $ aws logs tail "/aws/lambda/apcKekSetup" -–follow

    The expected output is:

    2024-03-12T14:58:18.954000+00:00 2024/03/12/apcKekSetup[$LATEST]ea4c6a7c85ac42da8a043aa8626d2897 INIT_START Runtime Version: python:3.11.v29  Runtime Version ARN: arn:aws:lambda:us-west-2::runtime:2fb93380dac14772d30092f109b1784b517398458eef71a3f757425231fe6769  
    2024-03-12T14:58:19.399000+00:00 2024/03/12/apcKekSetup[$LATEST]ea4c6a7c85ac42da8a043aa8626d2897 START RequestId: a9b60171-dfaf-433a-954c-b0a332d22f50 Version: $LATEST  
    2024-03-12T14:58:19.596000+00:00 2024/03/12/apcKekSetup[$LATEST]ea4c6a7c85ac42da8a043aa8626d2897 ##### Step 1. Generating Key Encryption Key (KEK) - Key that will be used to encrypt the Working Keys  
    2024-03-12T14:58:19.850000+00:00 2024/03/12/apcKekSetup[$LATEST]ea4c6a7c85ac42da8a043aa8626d2897 ##### Step 2. Getting APC Import Parameters from us-east-1  
    2024-03-12T14:58:21.680000+00:00 2024/03/12/apcKekSetup[$LATEST]ea4c6a7c85ac42da8a043aa8626d2897 ##### Step 3. Importing the Root Wrapping Certificates in us-west-2  
    2024-03-12T14:58:21.826000+00:00 2024/03/12/apcKekSetup[$LATEST]ea4c6a7c85ac42da8a043aa8626d2897 ##### Step 4. Getting APC Export Parameters from us-west-2  
    2024-03-12T14:58:23.193000+00:00 2024/03/12/apcKekSetup[$LATEST]ea4c6a7c85ac42da8a043aa8626d2897 ##### Step 5. Importing the Root Signing Certificates in us-east-1  
    2024-03-12T14:58:23.439000+00:00 2024/03/12/apcKekSetup[$LATEST]ea4c6a7c85ac42da8a043aa8626d2897 ##### Step 6. Exporting the KEK from us-west-2  
    2024-03-12T14:58:23.555000+00:00 2024/03/12/apcKekSetup[$LATEST]ea4c6a7c85ac42da8a043aa8626d2897 ##### Step 7. Importing the Wrapped KEK to us-east-1  
    2024-03-12T14:58:23.840000+00:00 2024/03/12/apcKekSetup[$LATEST]ea4c6a7c85ac42da8a043aa8626d2897 ##### Initial Key Exchange Successfully Completed.  
    2024-03-12T14:58:23.840000+00:00 2024/03/12/apcKekSetup[$LATEST]ea4c6a7c85ac42da8a043aa8626d2897 Keys Generated, Imported and Deleted in us-west-2 are now being automatically replicated to us-east-1  
    2024-03-12T14:58:23.840000+00:00 2024/03/12/apcKekSetup[$LATEST]ea4c6a7c85ac42da8a043aa8626d2897 Keys already present in APC won't be replicated. If you want to, it must be done manually.  
    2024-03-12T14:58:23.844000+00:00 2024/03/12/apcKekSetup[$LATEST]ea4c6a7c85ac42da8a043aa8626d2897 END RequestId: a9b60171-dfaf-433a-954c-b0a332d22f50  
    2024-03-12T14:58:23.844000+00:00 2024/03/12/apcKekSetup[$LATEST]ea4c6a7c85ac42da8a043aa8626d2897 REPORT RequestId: a9b60171-dfaf-433a-954c-b0a332d22f50 Duration: 4444.78 ms  Billed Duration: 4445 ms  Memory Size: 5120 MB  Max Memory Used: 95 MB  Init Duration: 444.73 ms

Testing

Now it’s time to test the solution. The idea is to simulate an application that manages keys in the service. You will use AWS CLI to send commands directly from a local computer to the Payment Cryptography public endpoints.

Check if the user or role being used has the necessary permissions to manage keys in the service. The following AWS Identity and Access Management (IAM) policy example shows an IAM policy that can be attached to the user or role that will run the commands in the service.

{
      "Version": "2012-10-17",
      "Statement": [
            {
               "Effect": "Allow",
               "Action": [
                  "payment-cryptography:CreateKey",
                  "payment-cryptography:ImportKey",
                  "payment-cryptography:DeleteKey"
               ],
               "Resource": [
                  "*"
               ]
            }   
      ]
   }

Note: As an add-on, you can change the *(asterisk) to the Amazon Resource Name (ARN) of the created key.

For information about IAM policies, see the Identity and access management for Payment Cryptography documentation.

To test the solution

  1. Prepare to monitor the replication processes. Open a new terminal to monitor the apcReplicateWk log and verify that keys are being replicated from one Region to the other.
    $ aws logs tail "/aws/lambda/apcReplicateWk" --follow

  2. Create, import, and delete working keys. Start creating and deleting keys in the account and Region where the CRR solution was deployed (primary Region).

    Currently, the solution only listens to the CreateKey, ImportKey and DeleteKey commands. CreateAlias and DeleteAlias commands aren’t yet implemented, so the aliases won’t replicate.

    It takes some time for the replication function to be invoked because it relies on the following steps:

    1. A Payment Cryptography (CreateKey, ImportKey, or DeleteKey) log event is delivered to a CloudTrail trail.
    2. The log event is sent to the CloudWatch Logs log group, which invokes the subscription filter and the Lambda function associated with it is run.

    CloudTrail typically delivers logs within about 5 minutes of an API call. This time isn’t guaranteed. See the AWS CloudTrail Service Level Agreement for more information.

    Example 1: Create a working key

    Run the following command:

    aws payment-cryptography create-key --exportable --key-attributes KeyAlgorithm=TDES_2KEY,\
    KeyUsage=TR31_C0_CARD_VERIFICATION_KEY,KeyClass=SYMMETRIC_KEY, \
    KeyModesOfUse='{Generate=true,Verify=true}'

    Command output:

    {
        "Key": {
            "CreateTimestamp": "2022-10-26T16:04:11.642000-07:00",
            "Enabled": true,
            "Exportable": true,
            "KeyArn": "arn:aws:payment-cryptography:us-east-1:111122223333:key/hjprdg5o4jtgs5tw",
            "KeyAttributes": {
                "KeyAlgorithm": "TDES_2KEY",
                "KeyClass": "SYMMETRIC_KEY",
                "KeyModesOfUse": {
                    "Decrypt": false,
                    "DeriveKey": false,
                    "Encrypt": false,
                    "Generate": true,
                    "NoRestrictions": false,
                    "Sign": false,
                    "Unwrap": false,
                    "Verify": true,
                    "Wrap": false
                },
                "KeyUsage": "TR31_C0_CARD_VERIFICATION_KEY"
            },
            "KeyCheckValue": "B72F",
            "KeyCheckValueAlgorithm": "ANSI_X9_24",
            "KeyOrigin": "AWS_PAYMENT_CRYPTOGRAPHY",
            "KeyState": "CREATE_COMPLETE",
            "UsageStartTimestamp": "2022-10-26T16:04:11.559000-07:00"
        }
    }

    From the terminal where the /aws/lambda/apcReplicateWk log is being tailed, the expected output is:

    2024-03-05T15:57:13.871000+00:00 2024/03/05/apcReplicateWk[$LATEST]66dae4eef2bf42f6afd0e4cc70b48606 INIT_START Runtime Version: python:3.11.v29   Runtime Version ARN: arn:aws:lambda:us-east-1::runtime:2fb93380dac14772d30092f109b1784b517398458eef71a3f757425231fe6769  
    2024-03-05T15:57:14.326000+00:00 2024/03/05/apcReplicateWk[$LATEST]66dae4eef2bf42f6afd0e4cc70b48606 START RequestId: c7670e9b-6db0-494e-86c4-4c64126695ee Version: $LATEST  
    2024-03-05T15:57:14.327000+00:00 2024/03/05/apcReplicateWk[$LATEST]66dae4eef2bf42f6afd0e4cc70b48606 This is a WK! Sync in progress...  
    2024-03-05T15:57:14.717000+00:00 2024/03/05/apcReplicateWk[$LATEST]66dae4eef2bf42f6afd0e4cc70b48606 ##### Step 1. Exporting SYMMETRIC_KEY arn:aws:payment-cryptography:us-east-1:111122223333:key/hjprdg5o4jtgs5tw from us-east-1 using alias/CRR_KEK_DO-NOT-DELETE_6e3606a32690 Key Encryption Key  
    2024-03-05T15:57:15.044000+00:00 2024/03/05/apcReplicateWk[$LATEST]66dae4eef2bf42f6afd0e4cc70b48606 ##### Step 2. Importing the Wrapped Key to us-west-2  
    2024-03-05T15:57:15.661000+00:00 2024/03/05/apcReplicateWk[$LATEST]66dae4eef2bf42f6afd0e4cc70b48606 Imported SYMMETRIC_KEY key: arn:aws:payment-cryptography:us-west-2:111122223333:key/bykk4cwnbyfu3exo as TR31_C0_CARD_VERIFICATION_KEY in us-west-2  
    2024-03-05T15:57:15.794000+00:00 2024/03/05/apcReplicateWk[$LATEST]66dae4eef2bf42f6afd0e4cc70b48606 END RequestId: c7670e9b-6db0-494e-86c4-4c64126695ee  
    2024-03-05T15:57:15.794000+00:00 2024/03/05/apcReplicateWk[$LATEST]66dae4eef2bf42f6afd0e4cc70b48606 REPORT RequestId: c7670e9b-6db0-494e-86c4-4c64126695ee        Duration: 1468.13 ms    Billed Duration: 1469 ms        Memory Size: 128 MB     Max Memory Used: 78 MB  Init Duration: 454.02 ms

    Example 2: Delete a working key:

    Run the following command:

    aws payment-cryptography delete-key \
    --key-identifier arn:aws:payment-cryptography:us-east-1:111122223333:key/hjprdg5o4jtgs5tw

    Command output:

    {
        "Key": {
            "KeyArn": "arn:aws:payment-cryptography:us-east-2:111122223333:key/kwapwa6qaifllw2h",
            "KeyAttributes": {
                "KeyUsage": "TR31_C0_CARD_VERIFICATION_KEY",
                "KeyClass": "SYMMETRIC_KEY",
                "KeyAlgorithm": "TDES_2KEY",
                "KeyModesOfUse": {
                    "Encrypt": false,
                    "Decrypt": false,
                    "Wrap": false,
                    "Unwrap": false,
                    "Generate": true,
                    "Sign": false,
                    "Verify": true,
                    "DeriveKey": false,
                    "NoRestrictions": false
                }
            },
            "KeyCheckValue": "",
            "KeyCheckValueAlgorithm": "ANSI_X9_24",
            "Enabled": false,
            "Exportable": true,
            "KeyState": "DELETE_PENDING",
            "KeyOrigin": "AWS_PAYMENT_CRYPTOGRAPHY",
            "CreateTimestamp": "2023-06-05T12:01:29.969000-07:00",
            "UsageStopTimestamp": "2023-06-05T14:31:13.399000-07:00",
            "DeletePendingTimestamp": "2023-06-12T14:58:32.865000-07:00"
        }
    }

    From the terminal where the /aws/lambda/apcReplicateWk log is being tailed, the expected output is:

    2024-03-05T16:02:56.892000+00:00 2024/03/05/apcReplicateWk[$LATEST]66dae4eef2bf42f6afd0e4cc70b48606 START RequestId: d557cb28-6974-4888-bb7b-9f8aa4b78640 Version: $LATEST  
    2024-03-05T16:02:56.894000+00:00 2024/03/05/apcReplicateWk[$LATEST]66dae4eef2bf42f6afd0e4cc70b48606 This is not CreateKey or ImportKey!  
    2024-03-05T16:02:57.621000+00:00 2024/03/05/apcReplicateWk[$LATEST]66dae4eef2bf42f6afd0e4cc70b48606 arn:aws:payment-cryptography:us-west-2: 111122223333:key/bykk4cwnbyfu3exo deleted from us-west-2.  
    2024-03-05T16:02:57.691000+00:00 2024/03/05/apcReplicateWk[$LATEST]66dae4eef2bf42f6afd0e4cc70b48606 END RequestId: d557cb28-6974-4888-bb7b-9f8aa4b78640  
    2024-03-05T16:02:57.691000+00:00 2024/03/05/apcReplicateWk[$LATEST]66dae4eef2bf42f6afd0e4cc70b48606 REPORT RequestId: d557cb28-6974-4888-bb7b-9f8aa4b78640        Duration: 802.89 ms     Billed Duration: 803 ms Memory Size: 128 MB     Max Memory Used: 79 MB

    See the service documentation for more information about key management operations.

Clean up

You can disable the solution (keys will stop being replicated, but the resources from the primary Region will remain deployed) or destroy the resources that were deployed.

Note: Completing only Step 3, destroying the stack, won’t delete the resources deployed in the secondary Region or the keys that have been generated.

  1. Disable the CRR solution.

    The KEKs created during the enablement process will be disabled and marked for deletion in both the primary and secondary Regions. The waiting period before deletion is 3 days.

    From the base directory where the solution is deployed, run the following commands:

    $ source .venv/bin/activate
    $ cd application
    $ ./disable-crr.sh

    Expected output:

    START RequestId: bc96659c-3063-460a-8b29-2aa21b967c9a Version: $LATEST  
    Deletion has initiated...  
    Please check the apcKekSetup log to check if the solution has been successfully disabled.  
    You can do that by checking the CloudWatch Logs Log group /aws/apc-crr/apcKekSetup in the Management Console,  
    or by typing on a shell terminal: aws logs tail "/aws/lambda/apcKekSetup" --follow  
      
    Please check the apcStackMonitor log to follow the stack deletion status.  
    You can do that by checking the CloudWatch Logs Log group /aws/apc-crr/apcStackMonitor in the Management Console,  
    or by typing on a shell terminal: aws logs tail "/aws/lambda/apcStackMonitor" --follow  
    END RequestId: bc96659c-3063-460a-8b29-2aa21b967c9a  
    REPORT RequestId: bc96659c-3063-460a-8b29-2aa21b967c9a  Duration: 341.94 ms Billed Duration: 342 ms Memory Size: 128 MB Max Memory Used: 76 MB  Init Duration: 429.87 ms

  2. Monitor the Lambda functions logs.

    Open two other terminals.

    1. On the first terminal, run:
      $ aws logs tail "/aws/lambda/apcKekSetup" --follow

    2. On the second terminal, run:
      $ aws logs tail "/aws/lambda/apcStackMonitor" --follow

    Keys created during the exchange of the KEK will be deleted and the logs will be presented in the /aws/lambda/apcKekSetup log group.

    Expected output:

    2024-03-05T16:40:23.510000+00:00 2024/03/05/apcKekSetup[$LATEST]1c97946d8bc747b19cc35d9b1472ff8d INIT_START Runtime Version: python:3.11.v28  Runtime Version ARN: arn:aws:lambda:us-east-1::runtime:7893bafe1f7e5c0681bc8da889edf656777a53c2a26e3f73436bdcbc87ccfbe8  
    2024-03-05T16:40:23.971000+00:00 2024/03/05/apcKekSetup[$LATEST]1c97946d8bc747b19cc35d9b1472ff8d START RequestId: fc10b303-f028-4a94-a2cf-b8c0a762ea16 Version: $LATEST  
    2024-03-05T16:40:23.971000+00:00 2024/03/05/apcKekSetup[$LATEST]1c97946d8bc747b19cc35d9b1472ff8d Disabling CRR and Deleting KEKs  
    2024-03-05T16:40:25.276000+00:00 2024/03/05/apcKekSetup[$LATEST]1c97946d8bc747b19cc35d9b1472ff8d Keys and aliases deleted from APC.  
    2024-03-05T16:40:25.294000+00:00 2024/03/05/apcKekSetup[$LATEST]1c97946d8bc747b19cc35d9b1472ff8d DB status updated.  
    2024-03-05T16:40:25.297000+00:00 2024/03/05/apcKekSetup[$LATEST]1c97946d8bc747b19cc35d9b1472ff8d END RequestId: fc10b303-f028-4a94-a2cf-b8c0a762ea16  
    2024-03-05T16:40:25.297000+00:00 2024/03/05/apcKekSetup[$LATEST]1c97946d8bc747b19cc35d9b1472ff8d REPORT RequestId: fc10b303-f028-4a94-a2cf-b8c0a762ea16 Duration: 1326.39 ms  Billed Duration: 1327 ms  Memory Size: 5120 MB  Max Memory Used: 94 MB  Init Duration: 460.29 ms

    Second, the CloudFormation stack will be deleted with its associated resources.
    Logs will be presented in the /aws/lambda/apcStackMonitor Log group.

    Expected output:

    2024-03-05T16:40:25.854000+00:00 2024/03/05/apcStackMonitor[$LATEST]2cb4c9044a08474894ff5fa81940dbec START RequestId: 6b0b8207-19ae-40a1-b889-c92f8a5c243c Version: $LATEST  
    2024-03-05T16:40:26.486000+00:00 2024/03/05/apcStackMonitor[$LATEST]2cb4c9044a08474894ff5fa81940dbec De-provisioning Resources in the Destination Region. StackName: apc-setup-orchestrator-77aecbcf-1e4f-4e2a-8faa-6e3606a32690  
    2024-03-05T16:40:26.805000+00:00 2024/03/05/apcStackMonitor[$LATEST]2cb4c9044a08474894ff5fa81940dbec Stack deletion in progress. Status: DELETE_IN_PROGRESS  
    2024-03-05T16:40:31.889000+00:00 2024/03/05/apcStackMonitor[$LATEST]2cb4c9044a08474894ff5fa81940dbec Stack deletion in progress. Status: DELETE_IN_PROGRESS  
    2024-03-05T16:40:36.977000+00:00 2024/03/05/apcStackMonitor[$LATEST]2cb4c9044a08474894ff5fa81940dbec Stack deletion in progress. Status: DELETE_IN_PROGRESS  
    2024-03-05T16:40:42.065000+00:00 2024/03/05/apcStackMonitor[$LATEST]2cb4c9044a08474894ff5fa81940dbec Stack deletion in progress. Status: DELETE_IN_PROGRESS  
    2024-03-05T16:40:47.152000+00:00 2024/03/05/apcStackMonitor[$LATEST]2cb4c9044a08474894ff5fa81940dbec Stack deletion in progress. Status: DELETE_IN_PROGRESS  
    ...  
    2024-03-05T16:44:10.598000+00:00 2024/03/05/apcStackMonitor[$LATEST]2cb4c9044a08474894ff5fa81940dbec Stack deletion in progress. Status: DELETE_IN_PROGRESS  
    2024-03-05T16:44:15.683000+00:00 2024/03/05/apcStackMonitor[$LATEST]2cb4c9044a08474894ff5fa81940dbec Stack deletion in progress. Status: DELETE_IN_PROGRESS  
    2024-03-05T16:44:20.847000+00:00 2024/03/05/apcStackMonitor[$LATEST]2cb4c9044a08474894ff5fa81940dbec Stack deletion completed. Status: DELETE_COMPLETE  
    2024-03-05T16:44:21.043000+00:00 2024/03/05/apcStackMonitor[$LATEST]2cb4c9044a08474894ff5fa81940dbec Resources successfully deleted. Status: DELETE_COMPLETE  
    2024-03-05T16:44:21.601000+00:00 2024/03/05/apcStackMonitor[$LATEST]2cb4c9044a08474894ff5fa81940dbec END RequestId: 6b0b8207-19ae-40a1-b889-c92f8a5c243c  
    2024-03-05T16:44:21.601000+00:00 2024/03/05/apcStackMonitor[$LATEST]2cb4c9044a08474894ff5fa81940dbec REPORT RequestId: 6b0b8207-19ae-40a1-b889-c92f8a5c243cDuration: 235746.42 ms Billed Duration: 235747 ms  Memory Size: 128 MB Max Memory Used: 94 MB

  3. Destroy the CDK stack.
    $ cd ..
    $ cdk destroy

    Expected output:

    Are you sure you want to delete: apc-crr (y/n)? y  
    apc-crr: destroying... [1/1]  
     ✅  apc-crr: destroyed

  4. Delete all remaining working keys from the primary Region. If keys generated in the primary Region weren’t deleted before disabling the solution, then they’ll also exist in the secondary Region. To clean up keys in both Regions, get the key ARNS that have a status of CREATE_COMPLETE and delete them.
    $ aws payment-cryptography list-keys --region <Primary (Origin) Region>
    
    $ aws payment-cryptography delete-key \
    --key-identifier <key arn> --region <Primary (Origin) Region>
    
    $ aws payment-cryptography list-keys --region <Secondary (Destination) Region>
    
    $ aws payment-cryptography delete-key \
    --key-identifier <key arn> --region <Secondary (Destination) Region>

Security considerations

While using this Payment Cryptography CRR solution, it is crucial that you follow security best practices to maintain the highest level of protection for sensitive payment data:

  • Least privilege access: Implement strict access controls and follow the principle of least privilege, granting access to payment cryptography resources only to authorized personnel and services.
  • Encryption in transit and at rest: Make sure that sensitive data, including encryption keys and payment card data, is encrypted both in transit and at rest using industry-standard encryption algorithms.
  • Audit logging and monitoring: Activate audit logging and continuously monitor activity logs for suspicious or unauthorized access attempts.
  • Regular key rotation: Implement a key rotation strategy to periodically rotate encryption keys, reducing the risk of key compromise and minimizing potential exposure.
  • Incident response plan: Develop and regularly test an incident response plan to promote efficient and coordinated actions in the event of a security breach or data compromise.

Conclusion

In the rapidly evolving world of card payment transactions, maintaining high availability, resilience, robust security, and disaster recovery capabilities is crucial for maintaining customer trust and business continuity. AWS Payment Cryptography offers a solution that is tailored specifically for protecting sensitive payment card data.

By using the CRR solution, organizations can confidently address the stringent requirements of the payment industry, safeguarding sensitive data while providing continued access to payment services, even in the face of regional outages or disasters. With Payment Cryptography, organizations are empowered to deliver seamless and secure payment experiences to their customers.

To go further with this solution, you can modify it to fit your organization architecture by, for example, adding replication for the CreateAlias command. This will allow the Payment Cryptography keys alias to also be replicated between the primary and secondary Regions.

References

 
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Ruy Cavalcanti
Ruy Cavalcanti

Ruy is a Senior Security Architect for the Latin American financial market at AWS. He has worked in IT and Security for over 19 years, helping customers create secure architectures and solve data protection and compliance challenges. When he’s not architecting secure solutions, he enjoys jamming on his guitar, cooking Brazilian-style barbecue, and spending time with his family and friends.

Encryption in transit over external networks: AWS guidance for NYDFS and beyond

Post Syndicated from Aravind Gopaluni original https://aws.amazon.com/blogs/security/encryption-in-transit-over-external-networks-aws-guidance-for-nydfs-and-beyond/

On November 1, 2023, the New York State Department of Financial Services (NYDFS) issued its Second Amendment (the Amendment) to its Cybersecurity Requirements for Financial Services Companies adopted in 2017, published within Section 500 of 23 NYCRR 500 (the Cybersecurity Requirements; the Cybersecurity Requirements as amended by the Amendment, the Amended Cybersecurity Requirements). In the introduction to its Cybersecurity Resource Center, the Department explains that the revisions are aimed at addressing the changes in the increasing sophistication of threat actors, the prevalence of and relative ease in running cyberattacks, and the availability of additional controls to manage cyber risks.

This blog post focuses on the revision to the encryption in transit requirement under section 500.15(a). It outlines the encryption capabilities and secure connectivity options offered by Amazon Web Services (AWS) to help customers demonstrate compliance with this updated requirement. The post also provides best practices guidance, emphasizing the shared responsibility model. This enables organizations to design robust data protection strategies that address not only the updated NYDFS encryption requirements but potentially also other security standards and regulatory requirements.

The target audience for this information includes security leaders, architects, engineers, and security operations team members and risk, compliance, and audit professionals.

Note that the information provided here is for informational purposes only; it is not legal or compliance advice and should not be relied on as legal or compliance advice. Customers are responsible for making their own independent assessments and should obtain appropriate advice from their own legal and compliance advisors regarding compliance with applicable NYDFS regulations.

500.15 Encryption of nonpublic information

The updated requirement in the Amendment states that:

  1. As part of its cybersecurity program, each covered entity shall implement a written policy requiring encryption that meets industry standards, to protect nonpublic information held or transmitted by the covered entity both in transit over external networks and at rest.
  2. To the extent a covered entity determines that encryption of nonpublic information at rest is infeasible, the covered entity may instead secure such nonpublic information using effective alternative compensating controls that have been reviewed and approved by the covered entity’s CISO in writing. The feasibility of encryption and effectiveness of the compensating controls shall be reviewed by the CISO at least annually.

This section of the Amendment removes the covered entity’s chief information security officer’s (CISO) discretion to approve compensating controls when encryption of nonpublic information in transit over external networks is deemed infeasible. The Amendment mandates that, effective November 2024, organizations must encrypt nonpublic information transmitted over external networks without the option of implementing alternative compensating controls. While the use of security best practices such as network segmentation, multi-factor authentication (MFA), and intrusion detection and prevention systems (IDS/IPS) can provide defense in depth, these compensating controls are no longer sufficient to replace encryption in transit over external networks for nonpublic information.

However, the Amendment still allows for the CISO to approve the use of alternative compensating controls where encryption of nonpublic information at rest is deemed infeasible. AWS is committed to providing industry-standard encryption services and capabilities to help protect customer data at rest in the cloud, offering customers the ability to add layers of security to their data at rest, providing scalable and efficient encryption features. This includes the following services:

While the above highlights encryption-at-rest capabilities offered by AWS, the focus of this blog post is to provide guidance and best practice recommendations for encryption in transit.

AWS guidance and best practice recommendations

Cloud network traffic encompasses connections to and from the cloud and traffic between cloud service provider (CSP) services. From an organization’s perspective, CSP networks and data centers are deemed external because they aren’t under the organization’s direct control. The connection between the organization and a CSP, typically established over the internet or dedicated links, is considered an external network. Encrypting data in transit over these external networks is crucial and should be an integral part of an organization’s cybersecurity program.

AWS implements multiple mechanisms to help ensure the confidentiality and integrity of customer data during transit and at rest across various points within its environment. While AWS employs transparent encryption at various transit points, we strongly recommend incorporating encryption by design into your architecture. AWS provides robust encryption-in-transit capabilities to help you adhere to compliance requirements and mitigate the risks of unauthorized disclosure and modification of nonpublic information in transit over external networks.

Additionally, AWS recommends that financial services institutions adopt a secure by design (SbD) approach to implement architectures that are pre-tested from a security perspective. SbD helps establish control objectives, security baselines, security configurations, and audit capabilities for workloads running on AWS.

Security and Compliance is a shared responsibility between AWS and the customer. Shared responsibility can vary depending on the security configuration options for each service. You should carefully consider the services you choose because your organization’s responsibilities vary depending on the services used, the integration of those services into your IT environment, and applicable laws and regulations. AWS provides resources such as service user guides and AWS Customer Compliance Guides, which map security best practices for individual services to leading compliance frameworks, including NYDFS.

Protecting connections to and from AWS

We understand that customers place a high priority on privacy and data security. That’s why AWS gives you ownership and control over your data through services that allow you to determine where your content will be stored, secure your content in transit and at rest, and manage access to AWS services and resources for your users. When architecting workloads on AWS, classifying data based on its sensitivity, criticality, and compliance requirements is essential. Proper data classification allows you to implement appropriate security controls and data protection mechanisms, such as Transport Layer Security (TLS) at the application layer, access control measures, and secure network connectivity options for nonpublic information over external networks. When it comes to transmitting nonpublic information over external networks, it’s a recommended practice to identify network segments traversed by this data based on your network architecture. While AWS employs transparent encryption at various transit points, it’s advisable to implement encryption solutions at multiple layers of the OSI model to establish defense in depth and enhance end-to-end encryption capabilities. Although requirement 500.15 of the Amendment doesn’t mandate end-to-end encryption, implementing such controls can provide an added layer of security and can help demonstrate that nonpublic information is consistently encrypted during transit.

AWS offers several options to achieve this. While not every option provides end-to-end encryption on its own, using them in combination helps to ensure that nonpublic information doesn’t traverse open, public networks unprotected. These options include:

  • Using AWS Direct Connect with IEEE 802.1AE MAC Security Standard (MACsec) encryption
  • VPN connections
  • Secure API endpoints
  • Client-side encryption of data before sending it to AWS

AWS Direct Connect with MACsec encryption

AWS Direct Connect provides direct connectivity to the AWS network through third-party colocation facilities, using a cross-connect between an AWS owned device and either a customer- or partner-owned device. Direct Connect can reduce network costs, increase bandwidth throughput, and provide a more consistent network experience than internet-based connections. Within Direct Connect connections (a physical construct) there will be one or more virtual interfaces (VIFs). These are logical entities and are reflected as industry-standard 802.1Q VLANs on the customer equipment terminating the Direct Connect connection. Depending on the type of VIF, they will use either public or private IP addressing. There are three different types of VIFs:

  • Public virtual interface – Establish connectivity between AWS public endpoints and your data center, office, or colocation environment.
  • Transit virtual interface – Establish private connectivity between AWS Transit Gateways and your data center, office, or colocation environment. Transit Gateways is an AWS managed high availability and scalability regional network transit hub used to interconnect Amazon Virtual Private Cloud (Amazon VPC) and customer networks.
  • Private virtual interface – Establish private connectivity between Amazon VPC resources and your data center, office, or colocation environment.

By default, a Direct Connect connection isn’t encrypted from your premises to the Direct Connect location because AWS cannot assume your on-premises device supports the MACsec protocol. With MACsec, Direct Connect delivers native, near line-rate, point-to-point encryption, ensuring that data communications between AWS and your corporate network remain protected. MACsec is supported on 10 Gbps and 100 Gbps dedicated Direct Connect connections at selected points of presence. Using Direct Connect with MACsec-enabled connections and combining it with the transparent physical network encryption offered by AWS from the Direct Connect location through the AWS backbone not only benefits you by allowing you to securely exchange data with AWS, but also enables you to use the highest available bandwidth. For additional information on MACsec support and cipher suites, see the MACsec section in the Direct Connect FAQs.

Figure 1 illustrates a sample reference architecture for securing traffic from corporate network to your VPCs over Direct Connect with MACsec and AWS Transit Gateways.

Figure 1: Sample architecture for using Direct Connect with MACsec encryption

Figure 1: Sample architecture for using Direct Connect with MACsec encryption

In the sample architecture, you can see that Layer 2 encryption through MACsec only encrypts the traffic from your on-premises systems to the AWS device in the Direct Connect location, and therefore you need to consider additional encryption solutions at Layer 3, 4, or 7 to get closer to end-to-end encryption to the device where you’re comfortable for the packets to be decrypted. In the next section, let’s review an option for using network layer encryption using AWS Site-to-Site VPN.

Direct Connect with Site-to-Site VPN

AWS Site-to-Site VPN is a fully managed service that creates a secure connection between your corporate network and your Amazon VPC using IP security (IPsec) tunnels over the internet. Data transferred between your VPC and the remote network routes over an encrypted VPN connection to help maintain the confidentiality and integrity of data in transit. Each VPN connection consists of two tunnels between a virtual private gateway or transit gateway on the AWS side and a customer gateway on the on-premises side. Each tunnel supports a maximum throughput of up to 1.25 Gbps. See Site-to-Site VPN quotas for more information.

You can use Site-to-Site VPN over Direct Connect to achieve secure IPsec connection with the low latency and consistent network experience of Direct Connect when reaching resources in your Amazon VPCs.

Figure 2 illustrates a sample reference architecture for establishing end-to-end IPsec-encrypted connections between your networks and Transit Gateway over a private dedicated connection.

Figure 2: Encrypted connections between the AWS Cloud and a customer’s network using VPN

Figure 2: Encrypted connections between the AWS Cloud and a customer’s network using VPN

While Direct Connect with MACsec and Site-to-Site VPN with IPsec can provide encryption at the physical and network layers respectively, they primarily secure the data in transit between your on-premises network and the AWS network boundary. To further enhance the coverage for end-to-end encryption, it is advisable to use TLS encryption. In the next section, let’s review mechanisms for securing API endpoints on AWS using TLS encryption.

Secure API endpoints

APIs act as the front door for applications to access data, business logic, or functionality from other applications and backend services.

AWS enables you to establish secure, encrypted connections to its services using public AWS service API endpoints. Public AWS owned service API endpoints (AWS managed services like Amazon Simple Queue Service (Amazon SQS), AWS Identity and Access Management (IAM), AWS Key Management Service (AWS KMS), others) have certificates that are owned and deployed by AWS. By default, requests to these public endpoints use HTTPS. To align with evolving technology and regulatory standards for TLS, as of February 27, 2024, AWS has updated its TLS policy to require a minimum of TLS 1.2, thereby deprecating support for TLS 1.0 and 1.1 versions on AWS service API endpoints across each of our AWS Regions and Availability Zones.

Additionally, to enhance connection performance, AWS has begun enabling TLS version 1.3 globally for its service API endpoints. If you’re using the AWS SDKs or AWS Command Line Interface (AWS CLI), you will automatically benefit from TLS 1.3 after a service enables it.

While requests to public AWS service API endpoints use HTTPS by default, a few services, such as Amazon S3 and Amazon DynamoDB, allow using either HTTP or HTTPS. If the client or application chooses HTTP, the communication isn’t encrypted. Customers are responsible for enforcing HTTPS connections when using such AWS services. To help ensure secure communication, you can establish an identity perimeter by using the IAM policy condition key aws:SecureTransport in your IAM roles to evaluate the connection and mandate HTTPS usage.

As enterprises increasingly adopt cloud computing and microservices architectures, teams frequently build and manage internal applications exposed as private API endpoints. Customers are responsible for managing the certificates on private customer-owned endpoints. AWS helps you deploy private customer-owned identities (that is, TLS certificates) through the use of AWS Certificate Manager (ACM) private certificate authorities (PCA) and the integration with AWS services that offer private customer-owned TLS termination endpoints.

ACM is a fully managed service that lets you provision, manage, and deploy public and private TLS certificates for use with AWS services and internal connected resources. ACM minimizes the time-consuming manual process of purchasing, uploading, and renewing TLS certificates. You can provide certificates for your integrated AWS services either by issuing them directly using ACM or by importing third-party certificates into the ACM management system. ACM offers two options for deploying managed X.509 certificates. You can choose the best one for your needs.

  • AWS Certificate Manager (ACM) – This service is for enterprise customers who need a secure web presence using TLS. ACM certificates are deployed through Elastic Load Balancing (ELB), Amazon CloudFront, Amazon API Gateway, and other integrated AWS services. The most common application of this type is a secure public website with significant traffic requirements. ACM also helps to simplify security management by automating the renewal of expiring certificates.
  • AWS Private Certificate Authority (Private CA) – This service is for enterprise customers building a public key infrastructure (PKI) inside the AWS Cloud and is intended for private use within an organization. With AWS Private CA, you can create your own certificate authority (CA) hierarchy and issue certificates with it for authenticating users, computers, applications, services, servers, and other devices. Certificates issued by a private CA cannot be used on the internet. For more information, see the AWS Private CA User Guide.

You can use a centralized API gateway service, such as Amazon API Gateway, to securely expose customer-owned private API endpoints. API Gateway is a fully managed service that allows developers to create, publish, maintain, monitor, and secure APIs at scale. With API Gateway, you can create RESTful APIs and WebSocket APIs, enabling near real-time, two-way communication applications. API Gateway operations must be encrypted in-transit using TLS, and require the use of HTTPS endpoints. You can use API Gateway to configure custom domains for your APIs using TLS certificates provisioned and managed by ACM. Developers can optionally choose a specific TLS version for their custom domain names. For use cases that require mutual TLS (mTLS) authentication, you can configure certificate-based mTLS authentication on your custom domains.

Pre-encryption of data to be sent to AWS

Depending on the risk profile and sensitivity of the data that’s being transferred to AWS, you might want to choose encrypting data in an application running on your corporate network before sending it to AWS (client-side encryption). AWS offers a variety of SDKs and client-side encryption libraries to help you encrypt and decrypt data in your applications. You can use these libraries with the cryptographic service provider of your choice, including AWS Key Management Service or AWS CloudHSM, but the libraries do not require an AWS service.

  • The AWS Encryption SDK is a client-side encryption library that you can use to encrypt and decrypt data in your application and is available in several programming languages, including a command-line interface. You can use the SDK to encrypt your data before you send it to an AWS service. The SDK offers advanced data protection features, including envelope encryption and additional authenticated data (AAD). It also offers secure, authenticated, symmetric key algorithm suites, such as 256-bit AES-GCM with key derivation and signing.
  • The AWS Database Encryption SDK is a set of software libraries developed in open source that enable you to include client-side encryption in your database design. The SDK provides record-level encryption solutions. You specify which fields are encrypted and which fields are included in the signatures that help ensure the authenticity of your data. Encrypting your sensitive data in transit and at rest helps ensure that your plaintext data isn’t available to a third party, including AWS. The AWS Database Encryption SDK for DynamoDB is designed especially for DynamoDB applications. It encrypts the attribute values in each table item using a unique encryption key. It then signs the item to protect it against unauthorized changes, such as adding or deleting attributes or swapping encrypted values. After you create and configure the required components, the SDK transparently encrypts and signs your table items when you add them to a table. It also verifies and decrypts them when you retrieve them. Searchable encryption in the AWS Database Encryption SDK enables you search encrypted records without decrypting the entire database. This is accomplished by using beacons, which create a map between the plaintext value written to a field and the encrypted value that is stored in your database. For more information, see the AWS Database Encryption SDK Developer Guide.
  • The Amazon S3 Encryption Client is a client-side encryption library that enables you to encrypt an object locally to help ensure its security before passing it to Amazon S3. It integrates seamlessly with the Amazon S3 APIs to provide a straightforward solution for client-side encryption of data before uploading to Amazon S3. After you instantiate the Amazon S3 Encryption Client, your objects are automatically encrypted and decrypted as part of your Amazon S3 PutObject and GetObject requests. Your objects are encrypted with a unique data key. You can use both the Amazon S3 Encryption Client and server-side encryption to encrypt your data. The Amazon S3 Encryption Client is supported in a variety of programming languages and supports industry-standard algorithms for encrypting objects and data keys. For more information, see the Amazon S3 Encryption Client developer guide.

Encryption in-transit inside AWS

AWS implements responsible and sophisticated technical and physical controls that are designed to help prevent unauthorized access to or disclosure of your content. To protect data in transit, traffic traversing through the AWS network that is outside of AWS physical control is transparently encrypted by AWS at the physical layer. This includes traffic between AWS Regions (except China Regions), traffic between Availability Zones, and between Direct Connect locations and Regions through the AWS backbone network.

Network segmentation

When you create an AWS account, AWS offers a virtual networking option to launch resources in a logically isolated virtual private network (VPN), Amazon Virtual Private Cloud (Amazon VPC). A VPC is limited to a single AWS Region and every VPC has one or more subnets. VPCs can be connected externally using an internet gateway (IGW), VPC peering connection, VPN, Direct Connect, or Transit Gateways. Traffic within the your VPC is considered internal because you have complete control over your virtual networking environment, including selection of your own IP address range, creation of subnets, and configuration of route tables and network gateways.

As a customer, you maintain ownership of your data, and you select which AWS services can process, store, and host your data, and you choose the Regions in which your data is stored. AWS doesn’t automatically replicate data across Regions, unless the you choose to do so. Data transmitted over the AWS global network between Regions and Availability Zones is automatically encrypted at the physical layer before leaving AWS secured facilities. Cross-Region traffic that uses Amazon VPC and Transit Gateway peering is automatically bulk-encrypted when it exits a Region.

Encryption between instances

AWS provides secure and private connectivity between Amazon Elastic Compute Cloud (Amazon EC2) instances of all types. The Nitro System is the underlying foundation for modern Amazon EC2 instances. It’s a combination of purpose-built server designs, data processors, system management components, and specialized firmware that provides the underlying foundation for EC2 instances launched since the beginning of 2018. Instance types that use the offload capabilities of the underlying Nitro System hardware automatically encrypt in-transit traffic between instances. This encryption uses Authenticated Encryption with Associated Data (AEAD) algorithms, with 256-bit encryption and has no impact on network performance. To support this additional in-transit traffic encryption between instances, instances must be of supported instance types, in the same Region, and in the same VPC or peered VPCs. For a list of supported instance types and additional requirements, see Encryption in transit.

Conclusion

The second Amendment to the NYDFS Cybersecurity Regulation underscores the criticality of safeguarding nonpublic information during transmission over external networks. By mandating encryption for data in transit and eliminating the option for compensating controls, the Amendment reinforces the need for robust, industry-standard encryption measures to protect the confidentiality and integrity of sensitive information.

AWS provides a comprehensive suite of encryption services and secure connectivity options that enable you to design and implement robust data protection strategies. The transparent encryption mechanisms that AWS has built into services across its global network infrastructure, secure API endpoints with TLS encryption, and services such as Direct Connect with MACsec encryption and Site-to-Site VPN, can help you establish secure, encrypted pathways for transmitting nonpublic information over external networks.

By embracing the principles outlined in this blog post, financial services organizations can address not only the updated NYDFS encryption requirements for section 500.15(a) but can also potentially demonstrate their commitment to data security across other security standards and regulatory requirements.

For further reading on considerations for AWS customers regarding adherence to the Second Amendment to the NYDFS Cybersecurity Regulation, see the AWS Compliance Guide to NYDFS Cybersecurity Regulation.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the AWS Financial Services re:Post and AWS Security, Identity, & Compliance re:Post ,or contact AWS Support.
 

Aravind Gopaluni
Aravind Gopaluni

Aravind is a Senior Security Solutions Architect at AWS, helping financial services customers navigate ever-evolving cloud security and compliance needs. With over 20 years of experience, he has honed his expertise in delivering robust solutions to numerous global enterprises. Away from the world of cybersecurity, he cherishes traveling and exploring cuisines with his family.
Stephen Eschbach
Stephen Eschbach

Stephen is a Senior Compliance Specialist at AWS, helping financial services customers meet their security and compliance objectives on AWS. With over 18 years of experience in enterprise risk, IT GRC, and IT regulatory compliance, Stephen has worked and consulted for several global financial services companies. Outside of work, Stephen enjoys family time, kids’ sports, fishing, golf, and Texas BBQ.

How Zurich Insurance Group built a log management solution on AWS

Post Syndicated from Jake Obi original https://aws.amazon.com/blogs/big-data/how-zurich-insurance-group-built-a-log-management-solution-on-aws/

This post is written in collaboration with Clarisa Tavolieri, Austin Rappeport and Samantha Gignac from Zurich Insurance Group.

The growth in volume and number of logging sources has been increasing exponentially over the last few years, and will continue to increase in the coming years. As a result, customers across all industries are facing multiple challenges such as:

  • Balancing storage costs against meeting long-term log retention requirements
  • Bandwidth issues when moving logs between the cloud and on premises
  • Resource scaling and performance issues when trying to analyze massive amounts of log data
  • Keeping pace with the growing storage requirements, while also being able to provide insights from the data
  • Aligning license costs for Security Information and Event Management (SIEM) vendors with log processing, storage, and performance requirements. SIEM solutions help you implement real-time reporting by monitoring your environment for security threats and alerting on threats once detected.

Zurich Insurance Group (Zurich) is a leading multi-line insurer providing property, casualty, and life insurance solutions globally. In 2022, Zurich began a multi-year program to accelerate their digital transformation and innovation through the migration of 1,000 applications to AWS, including core insurance and SAP workloads.

The Zurich Cyber Fusion Center management team faced similar challenges, such as balancing licensing costs to ingest and long-term retention requirements for both business application log and security log data within the existing SIEM architecture. Zurich wanted to identify a log management solution to work in conjunction with their existing SIEM solution. The new approach would need to offer the flexibility to integrate new technologies such as machine learning (ML), scalability to handle long-term retention at forecasted growth levels, and provide options for cost optimization. In this post, we discuss how Zurich built a hybrid architecture on AWS incorporating AWS services to satisfy their requirements.

Solution overview

Zurich and AWS Professional Services collaborated to build an architecture that addressed decoupling long-term storage of logs, distributing analytics and alerting capabilities, and optimizing storage costs for log data. The solution was based on categorizing and prioritizing log data into priority levels between 1–3, and routing logs to different destinations based on priority. The following diagram illustrates the solution architecture.

Flow of logs from source to destination. All logs are sent to Cribl which routes portions of logs to the SIEM, portions to Amazon OpenSearch, and copies of logs to Amazon S3.

The workflow steps are as follows:

  1. All of the logs (P1, P2, and P3) are collected and ingested into an extract, transform, and load (ETL) service, AWS Partner Cribl’s Stream product, in real time. Capturing and streaming of logs is configured per use case based on the capabilities of the source, such as using built-in forwarders, installing agents, using Cribl Streams, and using AWS services like Amazon Data Firehose. This ETL service performs two functions before data reaches the analytics layer:
    1. Data normalization and aggregation – The raw log data is normalized and aggregated in the required format to perform analytics. The process consists of normalizing log field names, standardizing on JSON, removing unused or duplicate fields, and compressing to reduce storage requirements.
    2. Routing mechanism – Upon completing data normalization, the ETL service will apply necessary routing mechanisms to ingest log data to respective downstream systems based on category and priority.
  2. Priority 1 logs, such as network detection & response (NDR), endpoint detection and response (EDR), and cloud threat detection services (for example, Amazon GuardDuty), are ingested directly to the existing on-premises SIEM solution for real-time analytics and alerting.
  3. Priority 2 logs, such as operating system security logs, firewall, identity provider (IdP), email metadata, and AWS CloudTrail, are ingested into Amazon OpenSearch Service to enable the following capabilities. Previously, P2 logs were ingested into the SIEM.
    1. Systematically detect potential threats and react to a system’s state through alerting, and integrating those alerts back into Zurich’s SIEM for larger correlation, reducing by approximately 85% the amount of data ingestion into Zurich’s SIEM. Eventually, Zurich plans to use ML plugins such as anomaly detection to enhance analysis.
    2. Develop log and trace analytics solutions with interactive queries and visualize results with high adaptability and speed.
    3. Reduce the average time to ingest and average time to search that accommodates the increasing scale of log data.
    4. In the future, Zurich plans to use OpenSearch’s security analytics plugin, which can help security teams quickly detect potential security threats by using over 2,200 pre-built, publicly available Sigma security rules or create custom rules.
  4. Priority 3 logs, such as logs from enterprise applications and vulnerability scanning tools, are not ingested into the SIEM or OpenSearch Service, but are forwarded to Amazon Simple Storage Service (Amazon S3) for storage. These can be queried as needed using one-time queries.
  5. Copies of all log data (P1, P2, P3) are sent in real time to Amazon S3 for highly durable, long-term storage to satisfy the following:
    1. Long-term data retentionS3 Object Lock is used to enforce data retention per Zurich’s compliance and regulatory requirements.
    2. Cost-optimized storageLifecycle policies automatically transition data with less frequent access patterns to lower-cost Amazon S3 storage classes. Zurich also uses lifecycle policies to automatically expire objects after a predefined period. Lifecycle policies provide a mechanism to balance the cost of storing data and meeting retention requirements.
    3. Historic data analysis – Data stored in Amazon S3 can be queried to satisfy one-time audit or analysis tasks. Eventually, this data could be used to train ML models to support better anomaly detection. Zurich has done testing with Amazon SageMaker and has plans to add this capability in the near future.
  6. One-time query analysis – Simple audit use cases require historical data to be queried based on different time intervals, which can be performed using Amazon Athena and AWS Glue analytic services. By using Athena and AWS Glue, both serverless services, Zurich can perform simple queries without the heavy lifting of running and maintaining servers. Athena supports a variety of compression formats for reading and writing data. Therefore, Zurich is able to store compressed logs in Amazon S3 to achieve cost-optimized storage while still being able to perform one-time queries on the data.

As a future capability, supporting on-demand, complex query, analysis, and reporting on large historical datasets could be performed using Amazon OpenSearch Serverless. Also, OpenSearch Service supports zero-ETL integration with Amazon S3, where users can query their data stored in Amazon S3 using OpenSearch Service query capabilities.

The solution outlined in this post provides Zurich an architecture that supports scalability, resilience, cost optimization, and flexibility. We discuss these key benefits in the following sections.

Scalability

Given the volume of data currently being ingested, Zurich needed a solution that could satisfy existing requirements and provide room for growth. In this section, we discuss how Amazon S3 and OpenSearch Service help Zurich achieve scalability.

Amazon S3 is an object storage service that offers industry-leading scalability, data availability, security, and performance. The total volume of data and number of objects you can store in Amazon S3 are virtually unlimited. Based on its unique architecture, Amazon S3 is designed to exceed 99.999999999% (11 nines) of data durability. Additionally, Amazon S3 stores data redundantly across a minimum of three Availability Zones (AZs) by default, providing built-in resilience against widespread disaster. For example, the S3 Standard storage class is designed for 99.99% availability. For more information, check out the Amazon S3 FAQs.

Zurich uses AWS Partner Cribl’s Stream solution to route copies of all log information to Amazon S3 for long-term storage and retention, enabling Zurich to decouple log storage from their SIEM solution, a common challenge facing SIEM solutions today.

OpenSearch Service is a managed service that makes it straightforward to run OpenSearch without having to manage the underlying infrastructure. Zurich’s current on-premises SIEM infrastructure is comprised of more than 100 servers, all of which have to be operated and maintained. Zurich hopes to reduce this infrastructure footprint by 75% by offloading priority 2 and 3 logs from their existing SIEM solution.

To support geographies with restrictions on cross-border data transfer and to meet availability requirements, AWS and Zurich worked together to define an Amazon OpenSearch Service configuration that would support 99.9% availability using multiple AZs in a single region.

OpenSearch Service supports cross-region and cross-cluster queries, which helps with distributing analysis and processing of logs without moving data, and provides the ability to aggregate information across clusters. Since Zurich plans to deploy multiple OpenSearch domains in different regions, they will use cross-cluster search functionality to query data seamlessly across different regional domains without moving data. Zurich also configured a connector for their existing SIEM to query OpenSearch, which further allows distributed processing from on premises, and enables aggregation of data across data sources. As a result, Zurich is able to distribute processing, decouple storage, and publish key information in the form of alerts and queries to their SIEM solution without having to ship log data.

In addition, many of Zurich’s business units have logging requirements that could also be satisfied using the same AWS services (OpenSearch Service, Amazon S3, AWS Glue, and Amazon Athena). As such, the AWS components of the architecture were templatized using Infrastructure as Code (IaC) for consistent, repeatable deployment. These components are already being used across Zurich’s business units.

Cost optimization

In thinking about optimizing costs, Zurich had to consider how they would continue to ingest 5 TB per day of security log information just for their centralized security logs. In addition, lines of businesses needed similar capabilities to meet requirements, which could include processing 500 GB per day.

With this solution, Zurich can control (by offloading P2 and P3 log sources) the portion of logs that are ingested into their primary SIEM solution. As a result, Zurich has a mechanism to manage licensing costs, as well as improve the efficiency of queries by reducing the amount of information the SIEM needs to parse on search.

Because copies of all log data are going to Amazon S3, Zurich is able to take advantage of the different Amazon S3 storage tiers, such as using S3 Intelligent-Tiering to automatically move data among Infrequent Access and Archive Access tiers, to optimize the cost of retaining multiple years’ worth of log data. When data is moved to the Infrequent Access tier, costs are reduced by up to 40%. Similarly, when data is moved to the Archive Instant Access tier, storage costs are reduced by up to 68%.

Refer to Amazon S3 pricing for current pricing, as well as for information by region. Moving data to S3 Infrequent Access and Archive Access tiers provides a significant cost savings opportunity while meeting long-term retention requirements.

The team at Zurich analyzed priority 2 log sources, and based on historical analytics and query patterns, determined that only the most recent 7 days of logs are typically required. Therefore, OpenSearch Service was right-sized for retaining 7 days of logs in a hot tier. Rather than configuring UltraWarm and cold storage tiers for OpenSearch Service, copies of the remaining logs were simultaneously being sent to Amazon S3 for long-term retention and could be queried using Athena.

The combination of cost-optimization options is projected to reduce by 53% the cost of per GB of log data ingested and stored for 13 months when compared to the previous approach.

Flexibility

Another key consideration for the architecture was the flexibility to integrate with existing alerting systems and data pipelines, as well as the ability to incorporate new technology into Zurich’s log management approach. For example, Zurich also configured a connector for their existing SIEM to query OpenSearch, which further allows distributed processing from on premises and enables aggregation of data across data sources.

Within the OpenSearch Service software, there are options to expand log analysis using security analytics with predefined indicators of compromise across common log types. OpenSearch Service also offers the capability to integrate with ML capabilities such as anomaly detection and alert correlation to enhance log analysis.

With the introduction of Amazon Security Lake, there is another opportunity to expand the solution to more efficiently manage AWS logging sources and add to this architecture. For example, you can use Amazon OpenSearch Ingestion to generate security insights on security data from Amazon Security Lake.

Summary

In this post, we reviewed how Zurich was able to build a log data management architecture that provided the scalability, flexibility, performance, and cost-optimization mechanisms needed to meet their requirements.

To learn more about components of this solution, visit the Centralized Logging with OpenSearch implementation guide, review Querying AWS service logs, or run through the SIEM on Amazon OpenSearch Service workshop.


About the Authors

Clarisa Tavolieri is a Software Engineering graduate with qualifications in Business, Audit, and Strategy Consulting. With an extensive career in the financial and tech industries, she specializes in data management and has been involved in initiatives ranging from reporting to data architecture. She currently serves as the Global Head of Cyber Data Management at Zurich Group. In her role, she leads the data strategy to support the protection of company assets and implements advanced analytics to enhance and monitor cybersecurity tools.

Austin RappeportAustin Rappeport is a Computer Engineer who graduated from the University of Illinois Urbana/Champaign in 2011 with a focus in Computer Security. After graduation, he worked for the Federal Energy Regulatory Commission in the Office of Electric Reliability, working with the North American Electric Reliability Corporation’s Critical Infrastructure Protection Standards on both the audit and enforcement side, as well as standards development. Austin currently works for Zurich Insurance as the Global Head of Detection Engineering and Automation, where he leads the team responsible for using Zurich’s security tools to detect suspicious and malicious activity and improve internal processes through automation.

Samantha Gignac is a Global Security Architect at Zurich Insurance. She graduated from Ferris State University in 2014 with a Bachelor’s degree in Computer Systems & Network Engineering. With experience in the insurance, healthcare, and supply chain industries, she has held roles such as Storage Engineer, Risk Management Engineer, Vulnerability Management Engineer, and SOC Engineer. As a Cybersecurity Architect, she designs and implements secure network systems to protect organizational data and infrastructure from cyber threats.

Claire Sheridan is a Principal Solutions Architect with Amazon Web Services working with global financial services customers. She holds a PhD in Informatics and has more than 15 years of industry experience in tech. She loves traveling and visiting art galleries.

Jake Obi is a Principal Security Consultant with Amazon Web Services based in South Carolina, US, with over 20 years’ experience in information technology. He helps financial services customers improve their security posture in the cloud. Prior to joining Amazon, Jake was an Information Assurance Manager for the US Navy, where he worked on a large satellite communications program as well as hosting government websites using the public cloud.

Srikanth Daggumalli is an Analytics Specialist Solutions Architect in AWS. Out of 18 years of experience, he has over a decade of experience in architecting cost-effective, performant, and secure enterprise applications that improve customer reachability and experience, using big data, AI/ML, cloud, and security technologies. He has built high-performing data platforms for major financial institutions, enabling improved customer reach and exceptional experiences. He is specialized in services like cross-border transactions and architecting robust analytics platforms.

Freddy Kasprzykowski is a Senior Security Consultant with Amazon Web Services based in Florida, US, with over 20 years’ experience in information technology. He helps customers adopt AWS services securely according to industry best practices, standards, and compliance regulations. He is a member of the Customer Incident Response Team (CIRT), helping customers during security events, a seasoned speaker at AWS re:Invent and AWS re:Inforce conferences, and a contributor to open source projects related to AWS security.

New AWS whitepaper: AWS User Guide for Federally Regulated Financial Institutions in Canada

Post Syndicated from Dan MacKay original https://aws.amazon.com/blogs/security/new-aws-whitepaper-aws-user-guide-for-federally-regulated-financial-institutions-in-canada/

Amazon Web Services (AWS) has released a new whitepaper to help financial services customers in Canada accelerate their use of the AWS Cloud.

The new AWS User Guide for Federally Regulated Financial Institutions in Canada helps AWS customers navigate the regulatory expectations of the Office of the Superintendent of Financial Institutions (OSFI) in a shared responsibility environment. It is intended for OSFI-regulated institutions that are looking to run material workloads in the AWS Cloud, and is particularly useful for leadership, security, risk, and compliance teams that need to understand OSFI requirements and guidance applicable to the use of AWS services.

This whitepaper summarizes OSFI’s expectations with respect to Technology and Cyber Risk Management (OSFI Guideline B-13). It also gives OSFI-regulated institutions information that they can use to commence their due diligence and assess how to implement the appropriate programs for their use of AWS Cloud services. In subsequent versions of the whitepaper, we will provide considerations for other OSFI guidelines as applicable.

In addition to this whitepaper, AWS provides updates on the evolving Canadian regulatory landscape on the AWS Security Blog and the AWS Compliance page. Customers looking for more information on cloud-related regulatory compliance in different countries around the world can refer to the AWS Compliance Center. For additional resources or support, reach out to your AWS account manager or contact us here.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Dan MacKay

Dan MacKay

Dan is the Financial Services Compliance Specialist for AWS Canada. He advises financial services customers on best practices and practical solutions for cloud-related governance, risk, and compliance. Dan specializes in helping AWS customers navigate financial services and privacy regulations applicable to the use of cloud technology in Canada with a focus on third-party risk management and operational resilience.

Dave Trieu

Dave Trieu

Dave is an AWS Solutions Architect Manager with over two decades in the tech industry. He excels in guiding organizations through modernization and using cloud technologies for transformation. Dave helps businesses navigate the digital landscape and maintain a competitive edge by crafting and implementing cutting-edge solutions that address immediate business needs while anticipating future trends.

AWS Payment Cryptography is PCI PIN and P2PE certified

Post Syndicated from Tim Winston original https://aws.amazon.com/blogs/security/aws-payment-cryptography-is-pci-pin-and-p2pe-certified/

Amazon Web Services (AWS) is pleased to announce that AWS Payment Cryptography is certified for Payment Card Industry Personal Identification Number (PCI PIN) version 3.1 and as a PCI Point-to-Point Encryption (P2PE) version 3.1 Decryption Component.

With Payment Cryptography, your payment processing applications can use payment hardware security modules (HSMs) that are PCI PIN Transaction Security (PTS) HSM certified and fully managed by AWS, with PCI PIN and P2PE-compliant key management. These attestations give you the flexibility to deploy your regulated workloads with reduced compliance overhead.

The PCI P2PE Decryption Component enables PCI P2PE Solutions to use AWS to decrypt credit card transactions from payment terminals, and PCI PIN attestation is required for applications that process PIN-based debit transactions. According to PCI, “Use of a PCI P2PE Solution can also allow merchants to reduce where and how the PCI DSS applies within their retail environment, increasing security of customer data while simplifying compliance with the PCI DSS”.

Coalfire, a third-party Qualified PIN Assessor (QPA) and Qualified Security Assessor (P2PE), evaluated Payment Cryptography. Customers can access the PCI PIN Attestation of Compliance (AOC) report, the PCI PIN Shared Responsibility Summary, and the PCI P2PE Attestation of Validation through AWS Artifact.

To learn more about our PCI program and other compliance and security programs, see the AWS Compliance Programs page. As always, we value your feedback and questions; reach out to the AWS Compliance team through the Contact Us page.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Author

Tim Winston

Tim is a Principal Payments Industry Specialist for AWS Payment Cryptography. He focuses on compliance for the service and its customers.

Author

Nivetha Chandran

Nivetha is a Security Assurance Manager at AWS. She leads multiple security and compliance initiatives within AWS. Nivetha has over 10 years of experience in security assurance and holds a master’s degree in information management from University of Washington.

Empowering data-driven excellence: How the Bluestone Data Platform embraced data mesh for success

Post Syndicated from Toney Thomas original https://aws.amazon.com/blogs/big-data/empowering-data-driven-excellence-how-the-bluestone-data-platform-embraced-data-mesh-for-success/

This post is co-written with Toney Thomas and Ben Vengerovsky from Bluestone.

In the ever-evolving world of finance and lending, the need for real-time, reliable, and centralized data has become paramount. Bluestone, a leading financial institution, embarked on a transformative journey to modernize its data infrastructure and transition to a data-driven organization. In this post, we explore how Bluestone uses AWS services, notably the cloud data warehousing service Amazon Redshift, to implement a cutting-edge data mesh architecture, revolutionizing the way they manage, access, and utilize their data assets.

The challenge: Legacy to modernization

Bluestone was operating with a legacy SQL-based lending platform, as illustrated in the following diagram. To stay competitive and responsive to changing market dynamics, they decided to modernize their infrastructure. This modernization involved transitioning to a software as a service (SaaS) based loan origination and core lending platforms. Because these new systems produced vast amounts of data, the challenge of ensuring a single source of truth for all data consumers emerged.

Birth of the Bluestone Data Platform

To address the need for centralized, scalable, and governable data, Bluestone introduced the Bluestone Data Platform. This platform became the hub for all data-related activities across the organization. AWS played a pivotal role in bringing this vision to life.

The following are the key components of the Bluestone Data Platform:

  • Data mesh architecture – Bluestone adopted a data mesh architecture, a paradigm that distributes data ownership across different business units. Each data producer within the organization has its own data lake in Apache Hudi format, ensuring data sovereignty and autonomy.
  • Four-layered data lake and data warehouse architecture – The architecture comprises four layers, including the analytical layer, which houses purpose-built facts and dimension datasets that are hosted in Amazon Redshift. These datasets are pivotal for reporting and analytics use cases, powered by services like Amazon Redshift and tools like Power BI.
  • Machine learning analytics – Various business units, such as Servicing, Lending, Sales & Marketing, Finance, and Credit Risk, use machine learning analytics, which run on top of the dimensional model within the data lake and data warehouse. This enables data-driven decision-making across the organization.
  • Governance and self-service – The Bluestone Data Platform provides a governed, curated, and self-service avenue for all data use cases. AWS services like AWS Lake Formation in conjunction with Atlan help govern data access and policies.
  • Data quality framework – To ensure data reliability, they implemented a data quality framework. It continuously assesses data quality and syncs quality scores to the Atlan governance tool, instilling confidence in the data assets within the platform.

The following diagram illustrates the architecture of their updated data platform.

AWS and third-party services

AWS played a pivotal and multifaceted role in empowering Bluestone’s Data Platform to thrive. The following AWS and third-party services were instrumental in shaping Bluestone’s journey toward becoming a data-driven organization:

  • Amazon Redshift – Bluestone harnessed the power of Amazon Redshift and its features like data sharing to create a centralized repository of data assets. This strategic move facilitated seamless data sharing and collaboration across diverse business units, paving the way for more informed and data-driven decision-making.
  • Lake Formation – Lake Formation emerged as a cornerstone in Bluestone’s data governance strategy. It played a critical role in enforcing data access controls and implementing data policies. With Lake Formation, Bluestone achieved protection of sensitive data and compliance with regulatory requirements.
  • Data quality monitoring – To maintain data reliability and accuracy, Bluestone deployed a robust data quality framework. AWS services were essential in this endeavor, because they complemented open source tools to establish an in-house data quality monitoring system. This system continuously assesses data quality, providing confidence in the reliability of the organization’s data assets.
  • Data governance tooling – Bluestone chose Atlan, available through AWS Marketplace, to implement comprehensive data governance tooling. This SaaS service played a pivotal role in onboarding multiple business teams and fostering a data-centric culture within Bluestone. It empowered teams to efficiently manage and govern data assets.
  • Orchestration using Amazon MWAA – Bluestone heavily relied on Amazon Managed Workflows for Apache Airflow (Amazon MWAA) to manage workflow orchestrations efficiently. This orchestration framework seamlessly integrated with various data quality rules, which were evaluated using Great Expectations operators within the Airflow environment.
  • AWS DMS – Bluestone used AWS Database Migration Service (AWS DMS) to streamline the consolidation of legacy data into the data platform. This service facilitated the smooth transfer of data from legacy SQL Server warehouses to the data lake and data warehouse, providing data continuity and accessibility.
  • AWS Glue – Bluestone used the AWS Glue PySpark environment for implementing data extract, transform, and load (ETL) processes. It played a pivotal role in processing data originating from various source systems, providing data consistency and suitability for analytical use.
  • AWS Glue Data Catalog – Bluestone centralized their data management using the AWS Glue Data Catalog. This catalog served as the backbone for managing data assets within the Bluestone data estate, enhancing data discoverability and accessibility.
  • AWS CloudTrail – Bluestone implemented AWS CloudTrail to monitor and audit platform activities rigorously. This security-focused service provided essential visibility into platform actions, providing compliance and security in data operations.

AWS’s comprehensive suite of services has been integral in propelling the Bluestone Data Platform towards data-driven success. These services have not only enabled efficient data governance, quality assurance, and orchestration, but have also fostered a culture of data centricity within the organization, ultimately leading to better decision-making and competitive advantage. Bluestone’s journey showcases the power of AWS in transforming organizations into data-driven leaders in their respective industries.

Bluestone data architecture

Bluestone’s data architecture has undergone a dynamic transformation, transitioning from a lake house framework to a data mesh architecture. This evolution was driven by the organization’s need for data products with distributed ownership and the necessity for a centralized mechanism to govern and access these data products across various business units.

The following diagram illustrates the solution architecture and its use of AWS and third-party services.

Let’s delve deeper into how this architecture shift has unfolded and what it entails:

  • The need for change – The catalyst for this transformation was the growing demand for discrete data products tailored to the unique requirements of each business unit within Bluestone. Because these business units generated their own data assets in their respective domains, the challenge lay in efficiently managing, governing, and accessing these diverse data stores. Bluestone recognized the need for a more structured and scalable approach.
  • Data products with distributed ownership – In response to this demand, Bluestone adopted a data mesh architecture, which allowed for the creation of distinct data products aligned with each business unit’s needs. Each of these data products exists independently, generating and curating data assets specific to its domain. These data products serve as individual data hubs, ensuring data autonomy and specialization.
  • Centralized catalog integration – To streamline the discovery and accessibility of the data assets that are dispersed across these data products, Bluestone introduced a centralized catalog. This catalog acts as a unified repository where all data products register their respective data assets. It serves as a critical component for data discovery and management.
  • Data governance tool integration – Ensuring data governance and lineage tracking across the organization was another pivotal consideration. Bluestone implemented a robust data governance tool that connects to the centralized catalog. This integration makes sure that the overarching lineage of data assets is comprehensively mapped and captured. Data governance processes are thereby enforced consistently, guaranteeing data quality and compliance.
  • Amazon Redshift data sharing for control and access – To facilitate controlled and secure access to data assets residing within individual data product Redshift instances, Bluestone used Amazon Redshift data sharing. This capability allows data assets to be exposed and shared selectively, providing granular control over access while maintaining data security and integrity.

In essence, Bluestone’s journey from a lake house to a data mesh architecture represents a strategic shift in data management and governance. This transformation empowers different business units to operate autonomously within their data domains while ensuring centralized control, governance, and accessibility. The integration of a centralized catalog and data governance tooling, coupled with the flexibility of Amazon Redshift data sharing, creates a harmonious ecosystem where data-driven decision-making thrives, ultimately contributing to Bluestone’s success in the ever-evolving financial landscape.

Conclusion

Bluestone’s journey from a legacy SQL-based system to a modern data mesh architecture on AWS has improved the way the organization interacts with data and positioned them as a data-driven powerhouse in the financial industry. By embracing AWS services, Bluestone has successfully achieved a centralized, scalable, and governable data platform that empowers its teams to make informed decisions, drive innovation, and stay ahead in the competitive landscape. This transformation serves as compelling proof that Amazon Redshift and AWS Cloud data sharing capabilities are a great pathway for organizations looking to embark on their own data-driven journeys with AWS.


About the Authors

Toney Thomas is a Data Architect and Data Engineering Lead at Bluestone, renowned for his role in envisioning and coining the company’s pioneering data strategy. With a strategic focus on harnessing the power of advanced technology to tackle intricate business challenges, Toney leads a dynamic team of Data Engineers, Reporting Engineers, Quality Assurance specialists, and Business Analysts at Bluestone. His leadership extends to driving the implementation of robust data governance frameworks across diverse organizational units. Under his guidance, Bluestone has achieved remarkable success, including the deployment of innovative platforms such as a fully governed data mesh business data system with embedded data quality mechanisms, aligning seamlessly with the organization’s commitment to data democratization and excellence.

Ben Vengerovsky is a Data Platform Product Manager at Bluestone. He is passionate about using cloud technology to revolutionize the company’s data infrastructure. With a background in mortgage lending and a deep understanding of AWS services, Ben specializes in designing scalable and efficient data solutions that drive business growth and enhance customer experiences. He thrives on collaborating with cross-functional teams to translate business requirements into innovative technical solutions that empower data-driven decision-making.

Rada Stanic is a Chief Technologist at Amazon Web Services, where she helps ANZ customers across different segments solve their business problems using AWS Cloud technologies. Her special areas of interest are data analytics, machine learning/AI, and application modernization.