Tag Archives: Security, Identity & Compliance

AWS achieves HDS certification in four additional AWS Regions

Post Syndicated from Janice Leung original https://aws.amazon.com/blogs/security/aws-achieves-hds-certification-in-four-additional-aws-regions/

Amazon Web Services (AWS) is pleased to announce that four additional AWS Regions—Asia Pacific (Hong Kong), Asia Pacific (Osaka), Asia Pacific (Hyderabad), and Israel (Tel Aviv)—have been granted the Health Data Hosting (Hébergeur de Données de Santé, HDS) certification, increasing the scope to 24 global AWS Regions.

The Agence du Numérique en Santé (ANS), the French governmental agency for health, introduced the HDS certification to strengthen the security and protection of personal health data. By achieving this certification, AWS demonstrates our continuous commitment to adhere to the heightened expectations for cloud service providers.

The following 24 Regions are in scope for this certification:

  • US East (N. Virginia)
  • US East (Ohio)
  • US West (N. California)
  • US West (Oregon)
  • Asia Pacific (Hong Kong)
  • Asia Pacific (Hyderabad)
  • Asia Pacific (Jakarta)
  • Asia Pacific (Mumbai)
  • Asia Pacific (Osaka)
  • Asia Pacific (Seoul)
  • Asia Pacific (Singapore)
  • Asia Pacific (Sydney)
  • Asia Pacific (Tokyo)
  • Canada (Central)
  • Europe (Frankfurt)
  • Europe (Ireland)
  • Europe (London)
  • Europe (Milan)
  • Europe (Paris)
  • Europe (Stockholm)
  • Europe (Zurich)
  • Middle East (UAE)
  • Israel (Tel Aviv)
  • South America (São Paulo)

The HDS certification demonstrates that AWS provides a framework for technical and governance measures to secure and protect personal health data according to HDS requirements. Our customers who handle personal health data can continue to manage their workloads in HDS-certified Regions with confidence.

Independent third-party auditors evaluated and certified AWS on September 3, 2024. The HDS Certificate of Compliance demonstrating AWS compliance status is available on the Agence du Numérique en Santé (ANS) website and AWS Artifact. AWS Artifact is a self-service portal for on-demand access to AWS compliance reports. Sign in to AWS Artifact in the AWS Management Console, or learn more at Getting Started with AWS Artifact.

For up-to-date information, including when additional Regions are added, visit the AWS Compliance Programs page and choose HDS.

AWS strives to continuously meet your architectural and regulatory needs. If you have questions or feedback about HDS compliance, reach out to your AWS account team.

To learn more about our compliance and security programs, see AWS Compliance Programs. As always, we value your feedback and questions; reach out to the AWS Compliance team through the Contact Us page.

If you have feedback about this post, submit comments in the Comments section below.

Author

Janice Leung
Janice is a Security Assurance Program Manager at AWS based in New York. She leads various commercial security certifications, within the automobile, healthcare, and telecommunications sectors across Europe. In addition, she leads the AWS infrastructure security program worldwide. Janice has over 10 years of experience in technology risk management and audit at leading financial services and consulting company.

Tea Jioshvili

Tea Jioshvili
Tea is a Security Assurance Manager at AWS, based in Berlin, Germany. She leads various third-party audit programs across Europe. She previously worked in security assurance and compliance, business continuity, and operational risk management in the financial industry for multiple years.

Build a mobile driver’s license solution based on ISO/IEC 18013-5 using AWS Private CA and AWS KMS

Post Syndicated from Ram Ramani original https://aws.amazon.com/blogs/security/build-a-mobile-drivers-license-solution-based-on-iso-iec-18013-5-using-aws-private-ca-and-aws-kms/

A mobile driver’s license (mDL) is a digital representation of a physical driver’s license that’s stored on a mobile device. An mDL is a significant improvement over physical credentials, which can be lost, stolen, counterfeited, damaged, or contain outdated information, and can expose unconsented personally identifiable information (PII). Organizations are working together to use mDLs across various situations, ranging from validating identity during airplane boarding to sharing information for age-restricted activities.

The trust in the mDL system is based on public-private key cryptography where mDLs are signed by issuing authorities using their private key and verified using the issuing authority’s public key. In this blog post, we show you how to build an mDL issuing authority in Amazon Web Services (AWS) using AWS Private Certificate Authority and AWS Key Management Service (AWS KMS) according to mDL specification ISO/IEC 18013-5:2021. These AWS services align with the cryptographic requirements placed on the issuing authorities by ISO/IEC 18013-5. While we have tailored this post to an mDL use case, the sign and verify mechanism using AWS Private CA and AWS KMS can be used for multiple kinds of digital identity verification.

Solution overview

AWS Private CA provides you with a highly available private certificate authority (CA) service without the initial investment and ongoing maintenance costs of operating your own private CA. CA administrators can use AWS Private CA to create a complete CA hierarchy, including online root and subordinate CAs, without needing external CAs. You can issue, rotate, and revoke certificates that are trusted within your organization using AWS Private CA.

AWS Private CA can issue certificates formatted as required by ISO/IEC 18013-5. You can build a certificate authority (CA) in AWS Private CA—referred to as the issuing authority certificate authority (IACA) in ISO/IEC 18013-5. We create an IACA self-signed root certificate and an mDL document signing certificate in AWS Private CA.

AWS KMS is a managed service that you can use to create and control the cryptographic keys that are used to protect your data. AWS KMS uses FIPS 140-2 Level 3 validated hardware security modules (HSMs) to protect AWS KMS keys, which is a requirement for building an issuing authority as described in ISO/IEC 18013-5. We create an asymmetric key pair in AWS KMS for signing and verification of the mDL document. We programmatically create a certificate signing request (CSR) that’s signed by the asymmetric key pair stored in AWS KMS. The CSR is sent to the AWS Private CA service for issuing the mDL document signing certificate that matches the certificate profile requirement specified for the document signing certificate in ISO/IEC 18013-5.

We sign an mDL document using the private key of the asymmetric key pair created in AWS KMS with a KeyUsage value of SIGN_VERIFY. The signed mDL document is delivered to a mobile device where it’s stored in a digital wallet and produced for verification by mDL readers. The mDL readers are configured with IACA certificates from various issuing authorities that allow them to verify the mDL documents signed by respective issuing authorities. An example of an issuing authority could be a state government agency that issues driver’s licenses.

Least privilege

The solution in this post uses AWS KMS and AWS Private CA services. Before you implement the process described in this post, ensure that the AWS Identity and Access Management (IAM) principal you choose follows the principle of least privilege and that permissions are scoped to the minimum required permissions required. See Security best practices in IAM to learn more.

Solution architecture

A sample solution architecture for building an mDL issuing authority in AWS is shown in Figure 1. The figure shows the step-by-step process starting from setting up a private CA and issuing an mDL document signing certificate to mDL issuance and verification. The infrastructure that’s built using this architecture includes a root certificate authority, which issues a document signer certificate. You can find the certificate requirements in section B.1 Certificate Profile of ISO/IEC 18013-5.

Figure 1: mDL issuing authority architecture and process flow in AWS

Figure 1: mDL issuing authority architecture and process flow in AWS

In this post, we use AWS Command Line Interface (AWS CLI) commands, but these can be replaced by AWS SDK API calls if needed. Along with the AWS CLI steps, a GitHub sample is provided that’s used to programmatically create and sign an mDL document signing CSR using AWS KMS.

See the AWS CLI commands documentation for AWS Private CA and AWS KMS for detailed information on the commands used in this solution.

Solution walkthrough

Use the following steps to create the infrastructure needed for mDL signing and verification.

Step 1: Create IACA CA in AWS Private CA

In this step, the root of trust IACA (issuing authority CA) will be created. The IACA root CA is the root of trust that will be used for verification of the mDL.

  1. Create a local ca_config.txt file with the following content. The contents of this file are derived from the Certificate profiles section (Annex B) within ISO/IEC 18013-5. You can change the Country and CommonName values in the file as needed for your requirements.
    {
      "KeyAlgorithm": "EC_prime256v1",
      "SigningAlgorithm": "SHA256WITHECDSA",
      "Subject": {
        "Country": "US",
        "CommonName": "mDL IACA Root"
      }
    }

  2. The IACA root certificate will be paired with a certificate revocation list (CRL). See Planning a certificate revocation list (CRL) for information about configuring CRLs. Create a local file called revocation_config.txt with the following information to configure a CRL. The values for CustomCname and S3BucketName are examples, update them with the values that you have created within your AWS account. Update ExpirationInDays to fit your requirements. We recommend configuring encryption on the Amazon Simple Storage Service (Amazon S3) bucket containing your CRLs.
    {
      "CrlConfiguration": {
        "CustomCname": "example.com",
        "Enabled": true,
        "S3BucketName": "crlmdlbucket",
        "ExpirationInDays": 5000,  
      }
    }

  3. Invoke an AWS CLI command to create a private certificate authority. Replace the region parameter as needed. Update the file:// paths in the following command to the locations where you’ve stored the ca_config.txt and revocation_config.txt files.
    aws acm-pca create-certificate-authority \ 
        --region us-west-1 \
        --certificate-authority-configuration file://ca_config.txt \
        --revocation-configuration file://revocation_config.txt \
        —-certificate-authority-type "ROOT"

  4. The command should produce the following output. The output contains the Amazon Resource Name (ARN) of the created CA. You will need this ARN in subsequent steps.
    {
        "CertificateAuthorityArn": "arn:aws:acm-pca:us-west-1:123412345678:certificate-authority/0116z123-dv7a-59b1-x7be-1231v7257113"
    }

Step 2: Retrieve the CSR for IACA root certificate

You’ll create an IACA root certificate, which starts with retrieving a CSR. This step retrieves the CSR for the IACA root certificate. The certificate-authority-arn parameter carries the CA ARN that was generated in Step 1.

  1. The following command will output a Privacy-Enhanced Mail (PEM) formatted CSR.
    aws acm-pca get-certificate-authority-csr \
        --region us-west-1 \
        --output text \
        --certificate-authority-arn arn:aws:acm-pca:us-west-1:123412345678:certificate-authority/0116z123-dv7a-59b1-x7be-1231v7257113

  2. The following is the format of the output CSR:
    -----BEGIN CERTIFICATE REQUEST-----
    ..
    -----END CERTIFICATE REQUEST-----

  3. Store the output text in a file called IACA.csr.

Step 3: Generate root certificate

  1. This step issues the IACA root certificate. Create a file named extensions.txt using the following contents, which are derived from the Certificate profiles section of ISO/IEC 18013-5.

    The KeyUsage extension with KeyCertSign and CRLSign should be set to true. A custom extension for the CRL distribution point is set and the validity of the certificate should be set to 9 years or 3285 days (set in the next step). Because the IACA root certificate is only used to issued mDLs, a maximum validity period of 9 years is sufficient, as indicated in Table B.1 of ISO/IEC 18013-5. Additionally, a CRL distribution point extension must be present. In the following example, the CRL URL encoded in the CDP extension is http://example.com/crl/0116z123-dv7a-59b1-x7be-1231v72571136.crl, aligning with both the CA CRL configuration applied to the CA at creation and to the CA ID. For base-64 encoding of the CDP extension, you can refer to this java sample.

    {
      "Extensions": {
        "KeyUsage": {
          "KeyCertSign": true,
          "CRLSign": true
        },
        "CustomExtensions": [
          {
            "ObjectIdentifier": "2.5.29.31",
            "Value": "MEgwRqBEoEKGQGh0dHA6Ly9leGFtcGxlLmNvbS9jcmwvMDExNnoxMjMtZHY3YS01OWIxLXg3YmUtMTIzMXY3MjU3MTEzNi5jcmw="
           }
        ]
      }
    }

  2. Issue the following command to AWS Private CA to create the certificate.
    aws acm-pca issue-certificate \
        --region us-west-1 \
        --certificate-authority-arn arn:aws:acm-pca:us-west-1:123412345678:certificate-authority/0116z123-dv7a-59b1-x7be-1231v7257113 \
        --template-arn "arn:aws:acm-pca:::template/BlankRootCACertificate_PathLen0_APIPassthrough/V1" \
        --signing-algorithm "SHA256WITHECDSA" \
        --csr fileb://IACA.csr \
        --validity Value=3285,Type="DAYS" \
        --api-passthrough file://extensions.txt

  3. The preceding command will produce the following output:
    {
      "CertificateArn": "arn:aws:acm-pca:us-west-1:123412345678:certificate-authority/0116z123-dv7a-59b1-x7be-1231v7257113/certificate/34a1dab03117f0e89c54b1234fe13318"
    }

Note that the IACA root CA created with AWS Private CA currently doesn’t have a CRL distribution point (CDP) extension by default. However, that is a mandatory extension according to the IACA root certificate profile in ISO/IEC 18013-5. To implement this, we use a custom extension passed in using API passthrough, which embeds the CDP extension. The distribution point specified in that extension must be based on the CA ID, which is 0116z123-dv7a-59b1-x7be-1231v7257113 derived from the CA ARN that was created in Step 1.

Step 4: Retrieve root certificate

This step retrieves the IACA root certificate in PEM format.

  1. Use the following code to retrieve the IACA root certificate:
    aws acm-pca get-certificate \
        --region us-west-1 \
        --certificate-authority-arn arn:aws:acm-pca:us-west-1:123412345678:certificate-authority/0116z123-dv7a-59b1-x7be-1231v7257113 \
        --certificate-arn arn:aws:acm-pca:us-west-1:123412345678:certificate-authority/0116z123-dv7a-59b1-x7be-1231v7257113/certificate/34a1dab03117f0e89c54b1234fe13318 \
        --output text

  2. The command output will be a PEM formatted certificate similar to the following:
    -----BEGIN CERTIFICATE-----
    ..
    -----END CERTIFICATE-----

  3. Store the output text in a file named IACA-Root-CA-Cert.pem.

Step 5: Import root certificate

Use the following code to import the root certificate into AWS Private CA and make the certificate authority active and ready to issue certificates.

aws acm-pca import-certificate-authority-certificate \
    --region us-west-1 \
    --certificate-authority-arn arn:aws:acm-pca:us-west-1:123412345678:certificate-authority/0116z123-dv7a-59b1-x7be-1231v7257113 \
    --certificate fileb://IACA-Root-CA-Cert.pem

You should see success after running the command.

Step 6: Create an asymmetric key in AWS KMS

In this step, create an asymmetric signing key in AWS KMS which will be used to sign the mDL document signing CSR.

  1. Use the following command to create an asymmetric key:
    aws kms create-key \
        --region us-west-1 \
        --key-spec ECC_NIST_P256 \
        --key-usage SIGN_VERIFY

  2. The command should produce the following output:
    {
      "KeyMetadata": {
        "AWSAccountId": "123412345678",
        "KeyId": "3ab87971-1fe2-45d9-955a-5dc7f65558zf",
        "Arn": "arn:aws:kms:us-west-1:123412345678:key/3ab87971-1fe2-45d8-955c-5dc7f65558ef",
        "CreationDate": "2024-05-18T19:53:27.318000+00:00",
        "Enabled": true,
        "Description": "",
        "KeyUsage": "SIGN_VERIFY",
        "KeyState": "Enabled",
        "Origin": "AWS_KMS",
        "KeyManager": "CUSTOMER",
        "CustomerMasterKeySpec": "ECC_NIST_P256",
        "KeySpec": "ECC_NIST_P256",
        "SigningAlgorithms": [
          "ECDSA_SHA_256"
        ],
        "MultiRegion": false
      }
    }

  3. Note the Arn value from the output. You will use it in Step 7 to configure the CSR creation utility for the mDL document signing certificate.

Step 7: Use the CSR creation utility to generate the document signing CSR

We published a sample utility in GitHub that creates a CSR signed by an AWS asymmetric key.

  1. Clone the GitHub repository and then follow the instructions in the README file from the repository to configure and run it.
  2. This program will output a PEM formatted CSR similar to the following:
    -----BEGIN CERTIFICATE REQUEST-----
    ..
    -----END CERTIFICATE REQUEST-----

  3. Copy the output and store it in a file named document-signing-kms.csr. You will use the file in Step 8 to create the mDL document signing certificate based on this CSR.

Step 8: Generate an mDL document signing certificate

This step creates the document signing certificate from the CSR that’s signed using the AWS KMS asymmetric key.

  1. Create a file named extensionSigner.txt with the following contents. The contents of this file are derived from the Certificate profiles section of ISO/IEC 18013-5. The JSON snippet that follows shows the extension structure containing the KeyUsage extension with DigitalSignature field set to true.
    {
         "Extensions": {
             "KeyUsage": {
                 "DigitalSignature": true
             },
             "ExtendedKeyUsage": [
                 {
                     "ExtendedKeyUsageObjectIdentifier": "1.0.18013.5.1.2"
                 }
             ]
         }
    }

  2. Use the following AWS CLI command to create the certificate.
    aws acm-pca issue-certificate \
        --region us-west-1 \
        --certificate-authority-arn arn:aws:acm-pca:us-west-1:123412345678:certificate-authority/0116z123-dv7a-59b1-x7be-1231v7257113 \
        --template-arn "arn:aws:acm-pca:::template/BlankEndEntityCertificate_APIPassthrough/V1" \
        --signing-algorithm "SHA256WITHECDSA" \
        --csr fileb://document-signing-kms.csr \
        --validity Value=1825,Type="DAYS" \
        --api-passthrough file://extensionSigner.txt

  3. Output:
    {
        "CertificateArn": "arn:aws:acm-pca:us-west-1:123412345678:certificate-authority/0116z123-dv7a-59b1-x7be-1231v7257113/certificate/d462fcd3b9h3beb45c7c312241d42fba"
    }

  4. You will use the CertificateArn from the output in Step 9 to retrieve the mDL document signing certificate.

Step 9: Retrieve the mDL document signing certificate

This step retrieves the document signing certificate in PEM format from AWS Private CA.

  1. Use the following command to retrieve the document signing certificate:
    aws acm-pca get-certificate \
        --region us-west-1 \
        --certificate-authority-arn arn:aws:acm-pca:us-west-1:123412345678:certificate-authority/0116z123-dv7a-59b1-x7be-1231v7257113 \
        --certificate-arn arn:aws:acm-pca:us-west-1:123412345678:certificate-authority/0116z123-dv7a-59b1-x7be-1231v7257113/certificate/d462fcd3b9h3beb45c7c312241d42fba \
        --output text

  2. Store the output text in document_signing_cert.pem.

You now have the mDL document signing certificate for packaging later with the Concise Binary Object Representation (CBOR) structure required by ISO/IEC 18013-5.

Step 10: mDL reader ingests issuing authority’s mDL signing certificate chain

An mDL reader can trust the mDL presented by a user after cryptographically verifying the mDL. This verification requires the reader to possess the mDL signing certificate chain of the issuing authority that issued the user the mDL. As required by the decentralized public key infrastructure (PKI) trust model specified in ISO/IEC 18013-5, the mDL reader will ingest the mDL signing certificate chain of the issuing authority.

Step 11: User makes an mDL signing request to the issuing authority

The user makes a request to the issuing authority to sign the mDL.

Step 12: Issuing authority issues signed mDL to the user

The issuing authority will authenticate the user’s identity and issue a signed mDL. The issuing authority provisions mDL data to the user’s device along with a CBOR encoded object known as a mobile security object (MSO). MSOs contain a digest algorithm, individual digests of mDL data elements, and a validity period. After this MSO has been generated and encoded as required by ISO/IEC 18013-5:2021 section 9.1.2.4, the MSO can be signed by the issuing authority. This signature can be generated in AWS KMS as shown in the following command. Generating the encoded MSO is out of scope for this post.

  1. Use the following command to produce the SHA-256 digest of encoded MSO object using the sha256sum utility.
    sha256sum < EncodedMSO > EncodedMSODigest

  2. Sign the digest using the AWS KMS asymmetric key created in Step 6.
    aws kms sign \
     --region us-west-1 \
     --key-id 3ab87971-1fe2-45d8-955c-5dc7f65558ef \
     --message fileb://EncodedMSODigest \
     --message-type DIGEST \
     --signing-algorithm ECDSA_SHA_256 \
     --output text \
     --query Signature | base64 --decode

  3. This signature will be combined with the issuing authority certificate and the MSO to form a CBOR Object Signing and Encryption (COSE) signed message and will be presented with the mDL data elements to readers. Readers can validate this signature to confirm the integrity of the MSO.

Step 13: User presents their mDL to an mDL reader

The user presents their mDL to the mDL reader for identity verification, such as at an airport. This process is called mDL Initialization in ISO/IEC 18013-5:2021 section 6.3.2.2. The mDL is activated during this initialization step.

Step 14: An mDL reader requests mDL data from a user’s mobile device

The mDL reader issues an mDL retrieval request to the user’s mobile device. A key feature of mDLs is that they allow mDL holders to present a subset of their PII. An mDL reader will request specific attributes such as name and date of birth, requiring the mDL holder to consent to the release of this information. The mDL reader’s request contains the list of PII data element identifiers that it is requesting the mDL holder to share.

Step 15: User consents to share their mDL data

The user receives a prompt notifying them of mDL sharing request. This prompt shows the user the list of PII data elements that are being requested. The user consents to the request and the mDL data that includes the MSO is shared with the reader.

Step 16: Reader validates mDL integrity

The reader receives the mDL data and validates it for integrity. The inclusion of the MSO with the mDL data elements provides mDL readers with a mechanism for validating the integrity of the data they’ve received. The mDL reader can then hash and verify individual mDL data elements presented by the device. If all data elements match their corresponding entries in the MSO, the mDL device reader can attest that the data hasn’t been tampered with.

As an example, assume that the mDL contains the following data elements:

24(<<
  {
    "digestID": 0,
    "random": h'BBA394B98088CAE238D35979F7210E18DFAF70354524D86149CA20046E4321B1',
    "elementIdentifer": "given_name",
    "elementValue": "John"
  }
>>),
24(<<
  {
    "digestID": 1,
    "random": h'901F63FD880A15B30EDCEEFA857201C52FB9EAD1D39C15BB592829D16CB8A368',
    "elementIdentifer": "family_name",
    "elementValue": "Doe"
  }
>>)

And a Mobile Security Object containing the following data element digests:

24(<<
  {
    "version": "1.0",
    "digestAlgorithm": "SHA-256",
    "valueDigests":
    {
      "org.iso.18013.5.1":
      {
        0: h’D6AA81E454036313A9A681809151DDDBDF702289094F18286DDC591C41C6434E',
        1: h'4C3D83940CA8C5DE8060A23EB649C175E79B745B6A7D9939B4D16B3E46BB14D5'
      }
    }
  }
>>)

The MSO’s integrity would first confirm that the validity period of the MSO (not shown) has not expired. It can then verify the signature (not shown) with the issuing authority’s public key. After this has been established, both data elements need to be verified. The CBOR representation of each element (digestID, random, elementIdentifier, and elementValue) is encoded as bytes and then hashed using SHA-256. For example, the following should equal D6AA81E454036313A9A681809151DDDBDF702289094F18286DDC591C41C6434E.

SHA256(CBOR byte representation of 24(<<
    {
      "digestID": 0,
      "random": h'BBA394B98088CAE238D35979F7210E18DFAF70354524D86149CA20046E4321B1',
      "elementIdentifer": "given_name",
      "elementValue": "John"
    }
  >>))
)

Likewise, the following example should equal
4C3D83940CA8C5DE8060A23EB649C175E79B745B6A7D9939B4D16B3E46BB14D5.

SHA256(CBOR byte representation of 24(<<
    {
      "digestID": 1,
      "random": h'901F63FD880A15B30EDCEEFA857201C52FB9EAD1D39C15BB592829D16CB8A368',
      "elementIdentifer": "family_name",
      "elementValue": "Doe"
    }
  >>)))

If all data elements pass this hash verification check, then the presented mDL contents can be trusted by the mDL reader.

Summary

As you saw in this solution, mobile driver’s licenses (mDLs) provide increased security and flexible consent management to preserve privacy for individuals. The principles of cryptographic signing and verification aren’t new and both AWS KMS and AWS Private CA are well suited for supporting digital identity applications, whether it’s a driver’s license or some other kind of identification. To learn more about AWS KMS asymmetric keys and AWS Private CA, see Digital signing with the new asymmetric keys feature of AWS KMS and How to host and manage an entire private certificate infrastructure in AWS.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the AWS Certificate Manager re:Post and AWS AWS Key Management Service re:Post, or contact AWS Support.

Ram Ramani
Ram Ramani

Ram is a Principal Security architect in AWS, responsible for leading the data protection and privacy focus areas. Prior to this role, Ram held software developer positions at various organizations with a focus on applied math and machine learning.
Raj Jain
Raj Jain

Raj is a Senior Software Engineer in the Amazon FinTech organization, responsible for developing security and compliance services that underlie the AWS and broader Amazon infrastructure. Raj is a published author in the Bell Labs Technical Journal, has authored IETF standards, AWS security blogs, and holds twelve patents.
Kyle Schultheiss
Kyle Schultheiss

Kyle is a Senior Software Engineer on the AWS Cryptography team. He has been working on the ACM Private Certificate Authority service since its inception in 2018. In prior roles, he contributed to other AWS services such as Amazon Virtual Private Cloud, Amazon EC2, and Amazon Route 53.

Automatically replicate your card payment keys across AWS Regions

Post Syndicated from Ruy Cavalcanti original https://aws.amazon.com/blogs/security/automatically-replicate-your-card-payment-keys-across-aws-regions/

In this blog post, I dive into a cross-Region replication (CRR) solution for card payment keys, with a specific focus on the powerful capabilities of AWS Payment Cryptography, showing how your card payment keys can be securely transported and stored.

In today’s digital landscape, where online transactions have become an integral part of our daily lives, ensuring the seamless operation and security of card payment transactions is of utmost importance. As customer expectations for uninterrupted service and data protection continue to rise, organizations are faced with the challenge of implementing robust security measures and disaster recovery strategies that can withstand even the most severe disruptions.

For large enterprises dealing with card payments, the stakes are even higher. These organizations often have stringent requirements related to disaster recovery (DR), resilience, and availability, where even a 99.99 percent uptime isn’t enough. Additionally, because these enterprises deliver their services globally, they need to ensure that their payment applications and the associated card payment keys, which are crucial for securing card data and payment transactions, are securely replicated and stored across AWS Regions.

Furthermore, I explore an event-driven, serverless architecture and the use of AWS PrivateLink to securely move keys through the AWS backbone, providing additional layers of security and efficiency. Overall, this blog post offers valuable insights into using AWS services for secure and resilient data management across AWS Regions.

Card payment key management

If you examine key management, you will notice that card payment keys are shared between devices and third parties today the same as they were around 40 years ago.

A key ceremony is the process held when parties want to securely exchange keys. It involves key custodians responsible for transporting and entering, key components that have been printed on pieces of paper into a hardware security module (HSM). This is necessary to share initial key encryption keys.

Let’s look at the main issues with the current key ceremony process:

  • It requires a secure room with a network-disconnected Payment HSM
  • The logistics are difficult: Three key custodians in the same place at the same time
  • Timewise, it usually takes weeks to have all custodians available, which can interfere with a project release
  • The cost of the operation which includes maintaining a secure room and the travel of the key custodians
  • Lost or stolen key components

Now, let’s consider the working keys used to encrypt sensitive card data. They rely on those initial keys to protect them. If the initial keys are compromised, their associated working keys are also considered compromised. I also see companies using key management standards from the 1990s, such as ANSI X9.17 / FIPS 171, to share working keys. NIST withdrew the FIPS 171 standard in 2005.

Analyzing the current scenario, you’ll notice security risks because of the way keys are shared today and sometimes because organizations are using deprecated standards.

So, let’s get card payment security into the twenty-first century!

Solution overview

AWS Payment Cryptography is a highly available and scalable service that currently operates within the scope of an individual Region. This means that the encryption keys and associated metadata are replicated across multiple Availability Zones within that Region, providing redundancy and minimizing the risk of downtime caused by failures within a single Region.

While this regional replication across multiple Availability Zones provides a higher level of availability and fault tolerance compared to traditional on-premises HSM solutions, some customers with stringent business continuity requirements have requested support for multi-Region replication.

By spanning multiple Regions, organizations can achieve a higher level of resilience and disaster recovery capabilities because data and services can be replicated and failover mechanisms can be implemented across geographically dispersed locations.

This Payment Cryptography CRR solution addresses the critical requirements of high availability, resilience, and disaster recovery for card payment transactions. By replicating encryption keys and associated metadata across multiple Regions, you can maintain uninterrupted access to payment services, even in the event of a regional outage or disaster.

Note: When planning your replication strategy, check the available Payment Cryptography service endpoints.

Here’s how it works:

  1. Primary Region: Encryption keys are generated and managed in a primary Region using Payment Cryptography.
  2. Replication: The generated encryption keys are securely replicated to a secondary Region, creating redundant copies for failover purposes.
  3. Failover: In the event of a regional outage or disaster in the primary Region, payment operations can seamlessly failover to a secondary Region, using the replicated encryption keys to continue processing transactions without interruption.

This cross-Region replication approach enhances availability and resilience and facilitates robust disaster recovery strategies, allowing organizations to quickly restore payment services in a new Region if necessary.

Figure 1: Cross-Region replication (CRR) solution architecture

Figure 1: Cross-Region replication (CRR) solution architecture

The elements of the CRR architecture are as follows:

  1. Payment Cryptography control plane events are sent to an AWS CloudTrail trail.
  2. The CloudTrail trail is configured to send logs to an Amazon CloudWatch Logs log group.
  3. This log group contains an AWS Lambda subscription filter that filters the following events from Payment Cryptography: CreateKey, DeleteKey and ImportKey.
  4. When one of the events is detected, a Lambda function is launched to start key replication.
  5. The Lambda function performs key export and import processes in a secure way using TR-31, which uses an initial key securely generated and shared using TR-34. This initial key is generated when the solution is enabled.
  6. Communication between the primary (origin) Region and the Payment Cryptography service endpoint at the secondary (destination) Region is done through an Amazon Virtual Private Cloud (Amazon VPC) peering connection, over VPC interface endpoints from PrivateLink.
  7. Metadata information is saved on Amazon DynamoDB tables.

Walkthrough

The CRR solution is deployed in several steps, and it’s essential to understand the underlying processes involved, particularly TR-34 (ANSI X9.24-2) and TR-31 (ANSI X9.143-2022), which play crucial roles in ensuring the secure replication of card payment keys across Regions.

  1. Clone the solution repository from GitHub.
  2. Verify that the prerequisites are in place.
  3. Define which Region the AWS Cloud Development Kit (AWS CDK) stack will be deployed in. This is the primary Region that Payment Cryptography keys will be replicated from.
  4. Enable CRR. This step involves the TR-34 process, which is a widely adopted standard for the secure distribution of symmetric keys using asymmetric techniques. In the context of this solution, TR-34 is used to securely exchange the initial key-encrypting key (KEK) between the primary and secondary Regions. This KEK is then used to encrypt and securely transmit the card payment keys (also called working keys) during the replication process. TR-34 uses asymmetric cryptographic algorithms, such as RSA, to maintain the confidentiality, integrity and authenticity of the exchanged keys.

    Figure 2: TR-34 import key process

    Figure 2: TR-34 import key process

  5. Create, import, and delete keys in the primary Region to check that keys will be automatically replicated. This step uses the TR-31 process, which is a standard for the secure exchange of cryptographic keys and related data. In this solution, TR-31 is employed to securely replicate the card payment keys from the primary Region to the secondary Region, using the previously established KEK for encryption. TR-31 incorporates various cryptographic algorithms, such as AES and HMAC, to protect the confidentiality and integrity of the replicated keys during transit

    Figure 3: TR-31 import key process

    Figure 3: TR-31 import key process

  6. Clean up when needed.

Detailed information about key blocks can be found on the related ANSI documentation. To summarize, the TR-31 key block specification and the TR-34 key block specification, which is based on the TR-31 key block specification, consists of three parts:

  1. Key block header (KBH) – Contains attribute information about the key and the key block.
  2. Encrypted data – This is the key (initial key encryption key for TR-34 and working key for TR-31) being exchanged.
  3. Signature (MAC) – Calculated over the KBH and encrypted data.

Figure 4 presents the entire TR-31 and TR-34 key block parts. It is also called key binding method, which is the technique used to protect the key block secrecy and integrity. On both key blocks, the key, its length, and padding fields are encrypted, maintaining the key block secrecy. Signing of the entire key block fields verifies its integrity and authenticity. The signed result is appended to the end of the block.

Figure 4: TR-31 and TR-34 key block formats

Figure 4: TR-31 and TR-34 key block formats

By adhering to industry-standard protocols like TR-34 and TR-31, this solution helps to ensure that the replication of card payment keys across Regions is performed in a secure manner that delivers confidentiality, integrity, and authenticity. It’s worth mentioning that Payment Cryptography fully supports and implements these standards, providing a solution that adheres to PCI standards for secure key management and replication.

If you want to dive deep into this key management processes, see the service documentation page on import and export keys.

Prerequisites

The Payment Cryptography CRR solution will be deployed through the AWS CDK. The code was developed in Python and assumes that there is a python3 executable in your path. It’s also assumed that the AWS Command Line Interface (AWS CLI) and AWS CDK executables exist in the path system variable of your local computer.

Download and install the following:

It’s recommended that you use the latest stable versions. Tests were performed using the following versions:

  • Python: 3.12.2 (MacOS version)
  • jq: 1.7.1 (MacOS version)
  • AWS CLI: aws-cli/2.15.29 Python/3.11.8 Darwin/22.6.0 exe/x86_64 prompt/off
  • AWS CDK: 2.132.1 (build 9df7dd3)

To set up access to your AWS account, see Configure the AWS CLI.

Note: Tests and commands in the following sections where run on a MacOS operating system.

Deploy the primary resources

The solution is deployed in two main parts:

  1. Primary Region resources deployment
  2. CRR setup, where a secondary Region is defined for deployment of the necessary resources

This section will cover the first part:

Figure 5: Primary Region resources

Figure 5: Primary Region resources

Figure 5 shows the resources that will be deployed in the primary Region:

  1. A CloudTrail trail for write-only log events.
  2. CloudWatch Logs log group associated with the CloudTrail trail. An Amazon Simple Storage Service (Amazon S3) bucket is also created to store this trail’s log events.
  3. A VPC, private subnets, a security group, Lambda functions, and VPC endpoint resources to address private communication inside the AWS backbone.
  4. DynamoDB tables and DynamoDB Streams to manage key replication and orchestrate the solution deployment to the secondary Region.
  5. Lambda functions responsible for managing and orchestrating the solution deployment and setup.

Some parameters can be configured before deployment. They’re located in the cdk.json file (part of the GitHub solution to be downloaded) inside the solution base directory.

The parameters reside inside the context.ENVIRONMENTS.dev key:

{
  ...
  "context": {
    ...
    "ENVIRONMENTS": {
      "dev": {
        "origin_vpc_cidr": "10.2.0.0/16",
        "origin_vpc_name": "origin-vpc",
        "origin_subnets_mask": 22,
        "origin_subnets_prefix_name": "origin-subnet-private"
      }
    }
  }
}

Note: You can change the parameters origin_vpc_cidr, origin_vpc_name and origin_subnets_prefix_name.

Validate that there aren’t VPCs already created with the same CIDR range as the one defined in this file. Currently, the solution is set to be deployed in only two Availability Zones, so the suggestion is to keep the origin_subnet_mask value as is.

To deploy the primary resources:

  1. Download the solution folder from GitHub:
    $ git clone https://github.com/aws-samples/automatically-replicate-your-card-payment-keys.git && cd automatically-replicate-your-card-payment-keys

  2. Inside the solution directory, create a python virtual environment:
    $ python3 -m venv .venv

  3. Activate the python virtual environment:
    $ source .venv/bin/activate

  4. Install the dependencies:
    $ pip install -r requirements.txt

  5. If this is the first time deploying resources with the AWS CDK to your account in the selected AWS Region, run:
    $ cdk bootstrap

  6. Deploy the solution using the AWS CDK:
    $ cdk deploy

    Expected output:

    Do you wish to deploy these changes (y/n)? y  
    apc-crr: deploying... [1/1]  
    apc-crr: creating CloudFormation changeset...
     ✅  apc-crr
    ✨  Deployment time: 307.88s
    Stack ARN:
    arn:aws:cloudformation:<aws_region>:<aws_account>:stack/apc-crr/<stack_id>
    ✨  Total time: 316.06s

  7. If the solution is correctly deployed, an AWS CloudFormation stack with the name apc-crr will have a status of CREATE_COMPLETE status. You can check that by running the following command:
    $ aws cloudformation list-stacks --stack-status CREATE_COMPLETE

    Expected output:

    {
        "StackSummaries": [
            {
                "StackId": "arn:aws:cloudformation:us-east-1:111122223333:stack/apc-crr/5933bc00-f5c1-11ee-9bb2-12ef8d00991b",
                "StackName": "apc-crr",
                "CreationTime": "2024-04-08T16:02:07.413000+00:00",
                "LastUpdatedTime": "2024-04-08T16:02:21.439000+00:00",
                "StackStatus": "CREATE_COMPLETE",
                "DriftInformation": {
                    "StackDriftStatus": "NOT_CHECKED"
                }
            },
            {
                "StackId": "arn:aws:cloudformation:us-east-1:111122223333:stack/CDKToolkit/781e5390-e528-11ee-823a-0a6d63bbc467",
                "StackName": "CDKToolkit",
                "TemplateDescription": "This stack includes resources needed to deploy AWS CDK apps into this environment",
                "CreationTime": "2024-03-18T13:07:27.472000+00:00",
                "LastUpdatedTime": "2024-03-18T13:07:35.060000+00:00",
                "StackStatus": "CREATE_COMPLETE",
                "DriftInformation": {
                    "StackDriftStatus": "NOT_CHECKED"
                }
            }
        ]
    }

Set up cross-Region replication

Some parameters can be configured before initiating the setup. They’re located in the enable-crr.json file in the ./application folder.

The contents of the enable-crr.json file are:

{
  "enabled": true,
  "dest_region": "us-east-1",
  "kek_alias": "CRR_KEK_DO-NOT-DELETE_",
  "key_algo": "TDES_3KEY",
  "kdh_alias": "KDH_SIGN_KEY_DO-NOT-DELETE_",
  "krd_alias": "KRD_SIGN_KEY_DO-NOT-DELETE_",
  "dest_vpc_name": "apc-crr/destination-vpc",
  "dest_vpc_cidr": "10.3.0.0/16",
  "dest_subnet1_cidr": "10.3.0.0/22",
  "dest_subnet2_cidr": "10.3.4.0/22",
  "dest_subnets_prefix_name": "apc-crr/destination-vpc/destination-subnet-private",
  "dest_rt_prefix_name": "apc-crr/destination-vpc/destination-rtb-private"
}

You can change the dest_region, dest_vpc_name, dest_vpc_cidr, dest_subnet1_cidr, dest_subnet2_cidr, dest_subnets_prefix_name and dest_rt_prefix_name parameters.

Validate that there are no VPCs or subnets already created with the same CIDR ranges as are defined in this file.

To enable CRR and monitor its deployment process

  1. Enable CRR.

    From the solution base folder, navigate to the application directory:

    $ cd application

    Run the enable script.

    $ ./enable-crr.sh

    Expected output:

    START RequestId: 8aad062a-ff0b-4963-8ca0-f8078346854f Version: $LATEST  
    Setup has initiated. A CloudFormation template will be deployed in us-west-2.  
    Please check the apcStackMonitor log to follow the deployment status.  
    You can do that by checking the CloudWatch Logs Log group /aws/apc-crr/apcStackMonitor in the Management Console,  
    or by typing on a shell terminal: aws logs tail "/aws/lambda/apcStackMonitor" --follow  
    You can also check the CloudFormation Stack in the Management Console: Account 111122223333, Region us-west-2  
    END RequestId: 8aad062a-ff0b-4963-8ca0-f8078346854f  
    REPORT RequestId: 8aad062a-ff0b-4963-8ca0-f8078346854f  Duration: 1484.53 ms  Billed Duration: 1485 ms  Memory Size: 128 MB Max Memory Used: 79 MB  Init Duration: 400.95 ms

    This will launch a CloudFormation stack to be deployed in the AWS Region that the keys will be replicated to (secondary Region). Logs will be presented in the /aws/lambda/apcStackMonitor log (terminal from step 2).

    If the stack is successfully deployed (CREATE_COMPLETE state), then the KEK setup will be invoked. Logs will be presented in the /aws/lambda/apcKekSetup log (terminal from step 3).

    If the following message is displayed in the apcKekSetup log, then it means that the setup was concluded and new working keys created, deleted, or imported will be replicated.

    Keys Generated, Imported and Deleted in <Primary (Origin) Region> are now being automatically replicated to <Secondary (Destination) Region>

    There should be two keys created in the Region where CRR is deployed and two keys created where the working keys will be replicated. Use the following commands to check the keys:

    $ aws payment-cryptography list-keys --region us-east-1

    Command output showing the keys generated in the primary Region (us-east-1 in the example):

    {
        "Keys": [
            {
                "Enabled": true,
                "KeyArn": "arn:aws:payment-cryptography:us-east-1:111122223333:key/oevdxprw6szesmfx",
                "KeyAttributes": {
                    "KeyAlgorithm": "RSA_4096",
                    "KeyClass": "PUBLIC_KEY",
                    "KeyModesOfUse": {
                        "Decrypt": false,
                        "DeriveKey": false,
                        "Encrypt": false,
                        "Generate": false,
                        "NoRestrictions": false,
                        "Sign": false,
                        "Unwrap": false,
                        "Verify": true,
                        "Wrap": false
                    },
                    "KeyUsage": "TR31_S0_ASYMMETRIC_KEY_FOR_DIGITAL_SIGNATURE"
                },
                "KeyState": "CREATE_COMPLETE"
            },
            {
                "Enabled": true,
                "Exportable": true,
                "KeyArn": "arn:aws:payment-cryptography:us-east-1:111122223333:key/ey63g3an7u4ifz7u",
                "KeyAttributes": {
                    "KeyAlgorithm": "TDES_3KEY",
                    "KeyClass": "SYMMETRIC_KEY",
                    "KeyModesOfUse": {
                        "Decrypt": true,
                        "DeriveKey": false,
                        "Encrypt": true,
                        "Generate": false,
                        "NoRestrictions": false,
                        "Sign": false,
                        "Unwrap": true,
                        "Verify": false,
                        "Wrap": true
                    },
                    "KeyUsage": "TR31_K0_KEY_ENCRYPTION_KEY"
                },
                "KeyCheckValue": "7FB069",
                "KeyState": "CREATE_COMPLETE"
            }
        ]
    }
    
    $ aws payment-cryptography list-keys --region us-west-2

    The following is the command output showing the keys generated in the secondary Region (us-west-2 in the example):

    {
        "Keys": [
            {
                "Enabled": true,
                "Exportable": true,
                "KeyArn": "arn:aws:payment-cryptography:us-west-2:111122223333:key/4luahnz4ubuioq4s",
                "KeyAttributes": {
                    "KeyAlgorithm": "RSA_2048",
                    "KeyClass": "ASYMMETRIC_KEY_PAIR",
                    "KeyModesOfUse": {
                        "Decrypt": true,
                        "DeriveKey": false,
                        "Encrypt": true,
                        "Generate": false,
                        "NoRestrictions": false,
                        "Sign": false,
                        "Unwrap": true,
                        "Verify": false,
                        "Wrap": true
                    },
                    "KeyUsage": "TR31_D1_ASYMMETRIC_KEY_FOR_DATA_ENCRYPTION"
                },
                "KeyCheckValue": "56739D06",
                "KeyState": "CREATE_COMPLETE"
            },
            {
                "Enabled": true,
                "Exportable": true,
                "KeyArn": "arn:aws:payment-cryptography:us-west-2:111122223333:key/5gao3i6qvuyqqtzk",
                "KeyAttributes": {
                    "KeyAlgorithm": "TDES_3KEY",
                    "KeyClass": "SYMMETRIC_KEY",
                    "KeyModesOfUse": {
                        "Decrypt": true,
                        "DeriveKey": false,
                        "Encrypt": true,
                        "Generate": false,
                        "NoRestrictions": false,
                        "Sign": false,
                        "Unwrap": true,
                        "Verify": false,
                        "Wrap": true
                    },
                    "KeyUsage": "TR31_K0_KEY_ENCRYPTION_KEY"
                },
                "KeyCheckValue": "7FB069",
                "KeyState": "CREATE_COMPLETE"
            }
        ]
    }

  2. Monitor the resources deployment in the secondary Region. Open a terminal to tail the apcStackMonitor Lambda log and check the deployment of the resources in the secondary Region.
    $ aws logs tail "/aws/lambda/apcStackMonitor" --follow

    The expected output is:

    2024-03-05T15:18:17.870000+00:00 2024/03/05/apcStackMonitor[$LATEST]6e6762b029cb4f7d8963c3206226deac INIT_START Runtime Version: python:3.11.v29  Runtime Version ARN: arn:aws:lambda:us-east-1::runtime:2fb93380dac14772d30092f109b1784b517398458eef71a3f757425231fe6769  
    2024-03-05T15:18:18.321000+00:00 2024/03/05/apcStackMonitor[$LATEST]6e6762b029cb4f7d8963c3206226deac START RequestId: 1bdd37b4-e95b-43bd-a49b-9da55e603845 Version: $LATEST  
    2024-03-05T15:18:18.933000+00:00 2024/03/05/apcStackMonitor[$LATEST]6e6762b029cb4f7d8963c3206226deac Stack creation in progress. Status: CREATE_IN_PROGRESS  
    2024-03-05T15:18:24.017000+00:00 2024/03/05/apcStackMonitor[$LATEST]6e6762b029cb4f7d8963c3206226deac Stack creation in progress. Status: CREATE_IN_PROGRESS  
    2024-03-05T15:18:29.108000+00:00 2024/03/05/apcStackMonitor[$LATEST]6e6762b029cb4f7d8963c3206226deac Stack creation in progress. Status: CREATE_IN_PROGRESS  
    ...  
    2024-03-05T15:21:32.302000+00:00 2024/03/05/apcStackMonitor[$LATEST]6e6762b029cb4f7d8963c3206226deac Stack creation in progress. Status: CREATE_IN_PROGRESS  
    2024-03-05T15:21:37.390000+00:00 2024/03/05/apcStackMonitor[$LATEST]6e6762b029cb4f7d8963c3206226deac Stack creation completed. Status: CREATE_COMPLETE  
    2024-03-05T15:21:38.258000+00:00 2024/03/05/apcStackMonitor[$LATEST]6e6762b029cb4f7d8963c3206226deac Stack successfully deployed. Status: CREATE_COMPLETE  
    2024-03-05T15:21:38.354000+00:00 2024/03/05/apcStackMonitor[$LATEST]6e6762b029cb4f7d8963c3206226deac END RequestId: 1bdd37b4-e95b-43bd-a49b-9da55e603845  
    2024-03-05T15:21:38.354000+00:00 2024/03/05/apcStackMonitor[$LATEST]6e6762b029cb4f7d8963c3206226deac REPORT RequestId: 1bdd37b4-e95b-43bd-a49b-9da55e603845Duration: 200032.11 ms Billed Duration: 200033 ms  Memory Size: 128 MB Max Memory Used: 93 MB  Init Duration: 450.87 ms

  3. Monitor the setup of the KEKs between the primary and secondary Regions. Open another terminal to tail the apcKekSetup Lambda log and check the setup of the KEK between the key distribution host (Payment Cryptography in the primary Region) and the key receiving devices (Payment Cryptography in the secondary Region).

    This process uses the TR-34 norm.

    $ aws logs tail "/aws/lambda/apcKekSetup" -–follow

    The expected output is:

    2024-03-12T14:58:18.954000+00:00 2024/03/12/apcKekSetup[$LATEST]ea4c6a7c85ac42da8a043aa8626d2897 INIT_START Runtime Version: python:3.11.v29  Runtime Version ARN: arn:aws:lambda:us-west-2::runtime:2fb93380dac14772d30092f109b1784b517398458eef71a3f757425231fe6769  
    2024-03-12T14:58:19.399000+00:00 2024/03/12/apcKekSetup[$LATEST]ea4c6a7c85ac42da8a043aa8626d2897 START RequestId: a9b60171-dfaf-433a-954c-b0a332d22f50 Version: $LATEST  
    2024-03-12T14:58:19.596000+00:00 2024/03/12/apcKekSetup[$LATEST]ea4c6a7c85ac42da8a043aa8626d2897 ##### Step 1. Generating Key Encryption Key (KEK) - Key that will be used to encrypt the Working Keys  
    2024-03-12T14:58:19.850000+00:00 2024/03/12/apcKekSetup[$LATEST]ea4c6a7c85ac42da8a043aa8626d2897 ##### Step 2. Getting APC Import Parameters from us-east-1  
    2024-03-12T14:58:21.680000+00:00 2024/03/12/apcKekSetup[$LATEST]ea4c6a7c85ac42da8a043aa8626d2897 ##### Step 3. Importing the Root Wrapping Certificates in us-west-2  
    2024-03-12T14:58:21.826000+00:00 2024/03/12/apcKekSetup[$LATEST]ea4c6a7c85ac42da8a043aa8626d2897 ##### Step 4. Getting APC Export Parameters from us-west-2  
    2024-03-12T14:58:23.193000+00:00 2024/03/12/apcKekSetup[$LATEST]ea4c6a7c85ac42da8a043aa8626d2897 ##### Step 5. Importing the Root Signing Certificates in us-east-1  
    2024-03-12T14:58:23.439000+00:00 2024/03/12/apcKekSetup[$LATEST]ea4c6a7c85ac42da8a043aa8626d2897 ##### Step 6. Exporting the KEK from us-west-2  
    2024-03-12T14:58:23.555000+00:00 2024/03/12/apcKekSetup[$LATEST]ea4c6a7c85ac42da8a043aa8626d2897 ##### Step 7. Importing the Wrapped KEK to us-east-1  
    2024-03-12T14:58:23.840000+00:00 2024/03/12/apcKekSetup[$LATEST]ea4c6a7c85ac42da8a043aa8626d2897 ##### Initial Key Exchange Successfully Completed.  
    2024-03-12T14:58:23.840000+00:00 2024/03/12/apcKekSetup[$LATEST]ea4c6a7c85ac42da8a043aa8626d2897 Keys Generated, Imported and Deleted in us-west-2 are now being automatically replicated to us-east-1  
    2024-03-12T14:58:23.840000+00:00 2024/03/12/apcKekSetup[$LATEST]ea4c6a7c85ac42da8a043aa8626d2897 Keys already present in APC won't be replicated. If you want to, it must be done manually.  
    2024-03-12T14:58:23.844000+00:00 2024/03/12/apcKekSetup[$LATEST]ea4c6a7c85ac42da8a043aa8626d2897 END RequestId: a9b60171-dfaf-433a-954c-b0a332d22f50  
    2024-03-12T14:58:23.844000+00:00 2024/03/12/apcKekSetup[$LATEST]ea4c6a7c85ac42da8a043aa8626d2897 REPORT RequestId: a9b60171-dfaf-433a-954c-b0a332d22f50 Duration: 4444.78 ms  Billed Duration: 4445 ms  Memory Size: 5120 MB  Max Memory Used: 95 MB  Init Duration: 444.73 ms

Testing

Now it’s time to test the solution. The idea is to simulate an application that manages keys in the service. You will use AWS CLI to send commands directly from a local computer to the Payment Cryptography public endpoints.

Check if the user or role being used has the necessary permissions to manage keys in the service. The following AWS Identity and Access Management (IAM) policy example shows an IAM policy that can be attached to the user or role that will run the commands in the service.

{
      "Version": "2012-10-17",
      "Statement": [
            {
               "Effect": "Allow",
               "Action": [
                  "payment-cryptography:CreateKey",
                  "payment-cryptography:ImportKey",
                  "payment-cryptography:DeleteKey"
               ],
               "Resource": [
                  "*"
               ]
            }   
      ]
   }

Note: As an add-on, you can change the *(asterisk) to the Amazon Resource Name (ARN) of the created key.

For information about IAM policies, see the Identity and access management for Payment Cryptography documentation.

To test the solution

  1. Prepare to monitor the replication processes. Open a new terminal to monitor the apcReplicateWk log and verify that keys are being replicated from one Region to the other.
    $ aws logs tail "/aws/lambda/apcReplicateWk" --follow

  2. Create, import, and delete working keys. Start creating and deleting keys in the account and Region where the CRR solution was deployed (primary Region).

    Currently, the solution only listens to the CreateKey, ImportKey and DeleteKey commands. CreateAlias and DeleteAlias commands aren’t yet implemented, so the aliases won’t replicate.

    It takes some time for the replication function to be invoked because it relies on the following steps:

    1. A Payment Cryptography (CreateKey, ImportKey, or DeleteKey) log event is delivered to a CloudTrail trail.
    2. The log event is sent to the CloudWatch Logs log group, which invokes the subscription filter and the Lambda function associated with it is run.

    CloudTrail typically delivers logs within about 5 minutes of an API call. This time isn’t guaranteed. See the AWS CloudTrail Service Level Agreement for more information.

    Example 1: Create a working key

    Run the following command:

    aws payment-cryptography create-key --exportable --key-attributes KeyAlgorithm=TDES_2KEY,\
    KeyUsage=TR31_C0_CARD_VERIFICATION_KEY,KeyClass=SYMMETRIC_KEY, \
    KeyModesOfUse='{Generate=true,Verify=true}'

    Command output:

    {
        "Key": {
            "CreateTimestamp": "2022-10-26T16:04:11.642000-07:00",
            "Enabled": true,
            "Exportable": true,
            "KeyArn": "arn:aws:payment-cryptography:us-east-1:111122223333:key/hjprdg5o4jtgs5tw",
            "KeyAttributes": {
                "KeyAlgorithm": "TDES_2KEY",
                "KeyClass": "SYMMETRIC_KEY",
                "KeyModesOfUse": {
                    "Decrypt": false,
                    "DeriveKey": false,
                    "Encrypt": false,
                    "Generate": true,
                    "NoRestrictions": false,
                    "Sign": false,
                    "Unwrap": false,
                    "Verify": true,
                    "Wrap": false
                },
                "KeyUsage": "TR31_C0_CARD_VERIFICATION_KEY"
            },
            "KeyCheckValue": "B72F",
            "KeyCheckValueAlgorithm": "ANSI_X9_24",
            "KeyOrigin": "AWS_PAYMENT_CRYPTOGRAPHY",
            "KeyState": "CREATE_COMPLETE",
            "UsageStartTimestamp": "2022-10-26T16:04:11.559000-07:00"
        }
    }

    From the terminal where the /aws/lambda/apcReplicateWk log is being tailed, the expected output is:

    2024-03-05T15:57:13.871000+00:00 2024/03/05/apcReplicateWk[$LATEST]66dae4eef2bf42f6afd0e4cc70b48606 INIT_START Runtime Version: python:3.11.v29   Runtime Version ARN: arn:aws:lambda:us-east-1::runtime:2fb93380dac14772d30092f109b1784b517398458eef71a3f757425231fe6769  
    2024-03-05T15:57:14.326000+00:00 2024/03/05/apcReplicateWk[$LATEST]66dae4eef2bf42f6afd0e4cc70b48606 START RequestId: c7670e9b-6db0-494e-86c4-4c64126695ee Version: $LATEST  
    2024-03-05T15:57:14.327000+00:00 2024/03/05/apcReplicateWk[$LATEST]66dae4eef2bf42f6afd0e4cc70b48606 This is a WK! Sync in progress...  
    2024-03-05T15:57:14.717000+00:00 2024/03/05/apcReplicateWk[$LATEST]66dae4eef2bf42f6afd0e4cc70b48606 ##### Step 1. Exporting SYMMETRIC_KEY arn:aws:payment-cryptography:us-east-1:111122223333:key/hjprdg5o4jtgs5tw from us-east-1 using alias/CRR_KEK_DO-NOT-DELETE_6e3606a32690 Key Encryption Key  
    2024-03-05T15:57:15.044000+00:00 2024/03/05/apcReplicateWk[$LATEST]66dae4eef2bf42f6afd0e4cc70b48606 ##### Step 2. Importing the Wrapped Key to us-west-2  
    2024-03-05T15:57:15.661000+00:00 2024/03/05/apcReplicateWk[$LATEST]66dae4eef2bf42f6afd0e4cc70b48606 Imported SYMMETRIC_KEY key: arn:aws:payment-cryptography:us-west-2:111122223333:key/bykk4cwnbyfu3exo as TR31_C0_CARD_VERIFICATION_KEY in us-west-2  
    2024-03-05T15:57:15.794000+00:00 2024/03/05/apcReplicateWk[$LATEST]66dae4eef2bf42f6afd0e4cc70b48606 END RequestId: c7670e9b-6db0-494e-86c4-4c64126695ee  
    2024-03-05T15:57:15.794000+00:00 2024/03/05/apcReplicateWk[$LATEST]66dae4eef2bf42f6afd0e4cc70b48606 REPORT RequestId: c7670e9b-6db0-494e-86c4-4c64126695ee        Duration: 1468.13 ms    Billed Duration: 1469 ms        Memory Size: 128 MB     Max Memory Used: 78 MB  Init Duration: 454.02 ms

    Example 2: Delete a working key:

    Run the following command:

    aws payment-cryptography delete-key \
    --key-identifier arn:aws:payment-cryptography:us-east-1:111122223333:key/hjprdg5o4jtgs5tw

    Command output:

    {
        "Key": {
            "KeyArn": "arn:aws:payment-cryptography:us-east-2:111122223333:key/kwapwa6qaifllw2h",
            "KeyAttributes": {
                "KeyUsage": "TR31_C0_CARD_VERIFICATION_KEY",
                "KeyClass": "SYMMETRIC_KEY",
                "KeyAlgorithm": "TDES_2KEY",
                "KeyModesOfUse": {
                    "Encrypt": false,
                    "Decrypt": false,
                    "Wrap": false,
                    "Unwrap": false,
                    "Generate": true,
                    "Sign": false,
                    "Verify": true,
                    "DeriveKey": false,
                    "NoRestrictions": false
                }
            },
            "KeyCheckValue": "",
            "KeyCheckValueAlgorithm": "ANSI_X9_24",
            "Enabled": false,
            "Exportable": true,
            "KeyState": "DELETE_PENDING",
            "KeyOrigin": "AWS_PAYMENT_CRYPTOGRAPHY",
            "CreateTimestamp": "2023-06-05T12:01:29.969000-07:00",
            "UsageStopTimestamp": "2023-06-05T14:31:13.399000-07:00",
            "DeletePendingTimestamp": "2023-06-12T14:58:32.865000-07:00"
        }
    }

    From the terminal where the /aws/lambda/apcReplicateWk log is being tailed, the expected output is:

    2024-03-05T16:02:56.892000+00:00 2024/03/05/apcReplicateWk[$LATEST]66dae4eef2bf42f6afd0e4cc70b48606 START RequestId: d557cb28-6974-4888-bb7b-9f8aa4b78640 Version: $LATEST  
    2024-03-05T16:02:56.894000+00:00 2024/03/05/apcReplicateWk[$LATEST]66dae4eef2bf42f6afd0e4cc70b48606 This is not CreateKey or ImportKey!  
    2024-03-05T16:02:57.621000+00:00 2024/03/05/apcReplicateWk[$LATEST]66dae4eef2bf42f6afd0e4cc70b48606 arn:aws:payment-cryptography:us-west-2: 111122223333:key/bykk4cwnbyfu3exo deleted from us-west-2.  
    2024-03-05T16:02:57.691000+00:00 2024/03/05/apcReplicateWk[$LATEST]66dae4eef2bf42f6afd0e4cc70b48606 END RequestId: d557cb28-6974-4888-bb7b-9f8aa4b78640  
    2024-03-05T16:02:57.691000+00:00 2024/03/05/apcReplicateWk[$LATEST]66dae4eef2bf42f6afd0e4cc70b48606 REPORT RequestId: d557cb28-6974-4888-bb7b-9f8aa4b78640        Duration: 802.89 ms     Billed Duration: 803 ms Memory Size: 128 MB     Max Memory Used: 79 MB

    See the service documentation for more information about key management operations.

Clean up

You can disable the solution (keys will stop being replicated, but the resources from the primary Region will remain deployed) or destroy the resources that were deployed.

Note: Completing only Step 3, destroying the stack, won’t delete the resources deployed in the secondary Region or the keys that have been generated.

  1. Disable the CRR solution.

    The KEKs created during the enablement process will be disabled and marked for deletion in both the primary and secondary Regions. The waiting period before deletion is 3 days.

    From the base directory where the solution is deployed, run the following commands:

    $ source .venv/bin/activate
    $ cd application
    $ ./disable-crr.sh

    Expected output:

    START RequestId: bc96659c-3063-460a-8b29-2aa21b967c9a Version: $LATEST  
    Deletion has initiated...  
    Please check the apcKekSetup log to check if the solution has been successfully disabled.  
    You can do that by checking the CloudWatch Logs Log group /aws/apc-crr/apcKekSetup in the Management Console,  
    or by typing on a shell terminal: aws logs tail "/aws/lambda/apcKekSetup" --follow  
      
    Please check the apcStackMonitor log to follow the stack deletion status.  
    You can do that by checking the CloudWatch Logs Log group /aws/apc-crr/apcStackMonitor in the Management Console,  
    or by typing on a shell terminal: aws logs tail "/aws/lambda/apcStackMonitor" --follow  
    END RequestId: bc96659c-3063-460a-8b29-2aa21b967c9a  
    REPORT RequestId: bc96659c-3063-460a-8b29-2aa21b967c9a  Duration: 341.94 ms Billed Duration: 342 ms Memory Size: 128 MB Max Memory Used: 76 MB  Init Duration: 429.87 ms

  2. Monitor the Lambda functions logs.

    Open two other terminals.

    1. On the first terminal, run:
      $ aws logs tail "/aws/lambda/apcKekSetup" --follow

    2. On the second terminal, run:
      $ aws logs tail "/aws/lambda/apcStackMonitor" --follow

    Keys created during the exchange of the KEK will be deleted and the logs will be presented in the /aws/lambda/apcKekSetup log group.

    Expected output:

    2024-03-05T16:40:23.510000+00:00 2024/03/05/apcKekSetup[$LATEST]1c97946d8bc747b19cc35d9b1472ff8d INIT_START Runtime Version: python:3.11.v28  Runtime Version ARN: arn:aws:lambda:us-east-1::runtime:7893bafe1f7e5c0681bc8da889edf656777a53c2a26e3f73436bdcbc87ccfbe8  
    2024-03-05T16:40:23.971000+00:00 2024/03/05/apcKekSetup[$LATEST]1c97946d8bc747b19cc35d9b1472ff8d START RequestId: fc10b303-f028-4a94-a2cf-b8c0a762ea16 Version: $LATEST  
    2024-03-05T16:40:23.971000+00:00 2024/03/05/apcKekSetup[$LATEST]1c97946d8bc747b19cc35d9b1472ff8d Disabling CRR and Deleting KEKs  
    2024-03-05T16:40:25.276000+00:00 2024/03/05/apcKekSetup[$LATEST]1c97946d8bc747b19cc35d9b1472ff8d Keys and aliases deleted from APC.  
    2024-03-05T16:40:25.294000+00:00 2024/03/05/apcKekSetup[$LATEST]1c97946d8bc747b19cc35d9b1472ff8d DB status updated.  
    2024-03-05T16:40:25.297000+00:00 2024/03/05/apcKekSetup[$LATEST]1c97946d8bc747b19cc35d9b1472ff8d END RequestId: fc10b303-f028-4a94-a2cf-b8c0a762ea16  
    2024-03-05T16:40:25.297000+00:00 2024/03/05/apcKekSetup[$LATEST]1c97946d8bc747b19cc35d9b1472ff8d REPORT RequestId: fc10b303-f028-4a94-a2cf-b8c0a762ea16 Duration: 1326.39 ms  Billed Duration: 1327 ms  Memory Size: 5120 MB  Max Memory Used: 94 MB  Init Duration: 460.29 ms

    Second, the CloudFormation stack will be deleted with its associated resources.
    Logs will be presented in the /aws/lambda/apcStackMonitor Log group.

    Expected output:

    2024-03-05T16:40:25.854000+00:00 2024/03/05/apcStackMonitor[$LATEST]2cb4c9044a08474894ff5fa81940dbec START RequestId: 6b0b8207-19ae-40a1-b889-c92f8a5c243c Version: $LATEST  
    2024-03-05T16:40:26.486000+00:00 2024/03/05/apcStackMonitor[$LATEST]2cb4c9044a08474894ff5fa81940dbec De-provisioning Resources in the Destination Region. StackName: apc-setup-orchestrator-77aecbcf-1e4f-4e2a-8faa-6e3606a32690  
    2024-03-05T16:40:26.805000+00:00 2024/03/05/apcStackMonitor[$LATEST]2cb4c9044a08474894ff5fa81940dbec Stack deletion in progress. Status: DELETE_IN_PROGRESS  
    2024-03-05T16:40:31.889000+00:00 2024/03/05/apcStackMonitor[$LATEST]2cb4c9044a08474894ff5fa81940dbec Stack deletion in progress. Status: DELETE_IN_PROGRESS  
    2024-03-05T16:40:36.977000+00:00 2024/03/05/apcStackMonitor[$LATEST]2cb4c9044a08474894ff5fa81940dbec Stack deletion in progress. Status: DELETE_IN_PROGRESS  
    2024-03-05T16:40:42.065000+00:00 2024/03/05/apcStackMonitor[$LATEST]2cb4c9044a08474894ff5fa81940dbec Stack deletion in progress. Status: DELETE_IN_PROGRESS  
    2024-03-05T16:40:47.152000+00:00 2024/03/05/apcStackMonitor[$LATEST]2cb4c9044a08474894ff5fa81940dbec Stack deletion in progress. Status: DELETE_IN_PROGRESS  
    ...  
    2024-03-05T16:44:10.598000+00:00 2024/03/05/apcStackMonitor[$LATEST]2cb4c9044a08474894ff5fa81940dbec Stack deletion in progress. Status: DELETE_IN_PROGRESS  
    2024-03-05T16:44:15.683000+00:00 2024/03/05/apcStackMonitor[$LATEST]2cb4c9044a08474894ff5fa81940dbec Stack deletion in progress. Status: DELETE_IN_PROGRESS  
    2024-03-05T16:44:20.847000+00:00 2024/03/05/apcStackMonitor[$LATEST]2cb4c9044a08474894ff5fa81940dbec Stack deletion completed. Status: DELETE_COMPLETE  
    2024-03-05T16:44:21.043000+00:00 2024/03/05/apcStackMonitor[$LATEST]2cb4c9044a08474894ff5fa81940dbec Resources successfully deleted. Status: DELETE_COMPLETE  
    2024-03-05T16:44:21.601000+00:00 2024/03/05/apcStackMonitor[$LATEST]2cb4c9044a08474894ff5fa81940dbec END RequestId: 6b0b8207-19ae-40a1-b889-c92f8a5c243c  
    2024-03-05T16:44:21.601000+00:00 2024/03/05/apcStackMonitor[$LATEST]2cb4c9044a08474894ff5fa81940dbec REPORT RequestId: 6b0b8207-19ae-40a1-b889-c92f8a5c243cDuration: 235746.42 ms Billed Duration: 235747 ms  Memory Size: 128 MB Max Memory Used: 94 MB

  3. Destroy the CDK stack.
    $ cd ..
    $ cdk destroy

    Expected output:

    Are you sure you want to delete: apc-crr (y/n)? y  
    apc-crr: destroying... [1/1]  
     ✅  apc-crr: destroyed

  4. Delete all remaining working keys from the primary Region. If keys generated in the primary Region weren’t deleted before disabling the solution, then they’ll also exist in the secondary Region. To clean up keys in both Regions, get the key ARNS that have a status of CREATE_COMPLETE and delete them.
    $ aws payment-cryptography list-keys --region <Primary (Origin) Region>
    
    $ aws payment-cryptography delete-key \
    --key-identifier <key arn> --region <Primary (Origin) Region>
    
    $ aws payment-cryptography list-keys --region <Secondary (Destination) Region>
    
    $ aws payment-cryptography delete-key \
    --key-identifier <key arn> --region <Secondary (Destination) Region>

Security considerations

While using this Payment Cryptography CRR solution, it is crucial that you follow security best practices to maintain the highest level of protection for sensitive payment data:

  • Least privilege access: Implement strict access controls and follow the principle of least privilege, granting access to payment cryptography resources only to authorized personnel and services.
  • Encryption in transit and at rest: Make sure that sensitive data, including encryption keys and payment card data, is encrypted both in transit and at rest using industry-standard encryption algorithms.
  • Audit logging and monitoring: Activate audit logging and continuously monitor activity logs for suspicious or unauthorized access attempts.
  • Regular key rotation: Implement a key rotation strategy to periodically rotate encryption keys, reducing the risk of key compromise and minimizing potential exposure.
  • Incident response plan: Develop and regularly test an incident response plan to promote efficient and coordinated actions in the event of a security breach or data compromise.

Conclusion

In the rapidly evolving world of card payment transactions, maintaining high availability, resilience, robust security, and disaster recovery capabilities is crucial for maintaining customer trust and business continuity. AWS Payment Cryptography offers a solution that is tailored specifically for protecting sensitive payment card data.

By using the CRR solution, organizations can confidently address the stringent requirements of the payment industry, safeguarding sensitive data while providing continued access to payment services, even in the face of regional outages or disasters. With Payment Cryptography, organizations are empowered to deliver seamless and secure payment experiences to their customers.

To go further with this solution, you can modify it to fit your organization architecture by, for example, adding replication for the CreateAlias command. This will allow the Payment Cryptography keys alias to also be replicated between the primary and secondary Regions.

References

 
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Ruy Cavalcanti
Ruy Cavalcanti

Ruy is a Senior Security Architect for the Latin American financial market at AWS. He has worked in IT and Security for over 19 years, helping customers create secure architectures and solve data protection and compliance challenges. When he’s not architecting secure solutions, he enjoys jamming on his guitar, cooking Brazilian-style barbecue, and spending time with his family and friends.

2024 ISO and CSA STAR certificates now available with three additional services

Post Syndicated from Atulsing Patil original https://aws.amazon.com/blogs/security/2024-iso-and-csa-star-certificates-now-available-with-three-additional-services/

Amazon Web Services (AWS) successfully completed an onboarding audit with no findings for ISO 9001:2015, 27001:2022, 27017:2015, 27018:2019, 27701:2019, 20000-1:2018, and 22301:2019, and Cloud Security Alliance (CSA) STAR Cloud Controls Matrix (CCM) v4.0. Ernst and Young CertifyPoint auditors conducted the audit and reissued the certificates on July 22, 2024. The objective of the audit was to assess the level of compliance with the requirements of the applicable international standards.

During the audit, we added the following three AWS services to the scope of the certification:

For a full list of AWS services that are certified under ISO and CSA Star, see the AWS ISO and CSA STAR Certified page. Customers can also access the certifications in the AWS Management Console through AWS Artifact.

If you have feedback about this post, submit comments in the Comments section below.

Atul Patil

Atulsing Patil
Atulsing is a Compliance Program Manager at AWS. He has 27 years of consulting experience in information technology and information security management. Atulsing holds a master of science in electronics degree and professional certifications such as CCSP, CISSP, CISM, CDPSE, ISO 27001 Lead Auditor, HITRUST CSF, Archer Certified Consultant, and AWS CCP.

Nimesh Ravas

Nimesh Ravasa
Nimesh is a Compliance Program Manager at AWS. He leads multiple security and privacy initiatives within AWS. Nimesh has 15 years of experience in information security and holds CISSP, CDPSE, CISA, PMP, CSX, AWS Solutions Architect – Associate, and AWS Security Specialty certifications.

Chinmaee Parulekar

Chinmaee Parulekar
Chinmaee is a Compliance Program Manager at AWS. She has 5 years of experience in information security. Chinmaee holds a master of science degree in management information systems and professional certifications such as CISA.

Summer 2024 SOC report now available with 177 services in scope

Post Syndicated from Brownell Combs original https://aws.amazon.com/blogs/security/summer-2024-soc-report-now-available-with-177-services-in-scope/

We continue to expand the scope of our assurance programs at Amazon Web Services (AWS) and are pleased to announce that the Summer 2024 System and Organization Controls (SOC) 1 report is now available. The report covers 177 services over the 12-month period of July 1, 2023–June 30, 2024, so that customers have a full year of assurance with the report. This report demonstrates our continuous commitment to adhere to the heightened expectations for cloud service providers.

Going forward, we will issue SOC reports covering a 12-month period each quarter as follows:

Report Period covered
Spring SOC 1, 2, and 3 April 1–March 31
Summer SOC 1 July 1–June 30
Fall SOC 1, 2, and 3 October 1–September 30
Winter SOC 1 January 1–December 31

Customers can download the Summer 2024 SOC report through AWS Artifact, a self-service portal for on-demand access to AWS compliance reports. Sign in to AWS Artifact in the AWS Management Console, or learn more at Getting Started with AWS Artifact.

AWS strives to continuously bring services into the scope of its compliance programs to help you meet your architectural and regulatory needs. If you have questions or feedback about SOC compliance, reach out to your AWS account team.

To learn more about our compliance and security programs, see AWS Compliance Programs. As always, we value your feedback and questions; reach out to the AWS Compliance team through the Contact Us page.

If you have feedback about this post, submit comments in the Comments section below.

Brownell Combs
Brownell Combs

Brownell is a Compliance Program Manager at AWS. He leads multiple security and privacy initiatives within AWS. Brownell holds a master of science degree in computer science from University of Virginia and a bachelor of science degree in computer science from Centre College. He has over 20 years of experience in IT risk management and CISSP, CISA, CRISC, and GIAC GCLD certifications.
Paul Hong
Paul Hong

Paul is a Compliance Program Manager at AWS. He leads multiple security, compliance, and training initiatives within AWS, and has 10 years of experience in security assurance. Paul holds CISSP, CEH, and CPA certifications. He has a master’s degree in accounting information systems and a bachelor’s degree in business administration from James Madison University, Virginia.
Tushar Jain
Tushar Jain

Tushar is a Compliance Program Manager at AWS. He leads multiple security, compliance, and training initiatives within AWS. Tushar holds a master of business administration from Indian Institute of Management Shillong, and a bachelor of technology in electronics and telecommunication engineering from Marathwada University. He has over 12 years of experience in information security and holds CCSK and CSXF certifications.
Michael Murphy
Michael Murphy

Michael is a Compliance Program Manager at AWS. He leads multiple security and privacy initiatives within AWS. Michael has 12 years of experience in information security. He holds a master’s degree and a bachelor’s degree in computer engineering from Stevens Institute of Technology. He also holds CISSP, CRISC, CISA, and CISM certifications.
Nathan Samuel
Nathan Samuel

Nathan is a Compliance Program Manager at AWS. He leads multiple security and privacy initiatives within AWS. Nathan has a bachelor of commerce degree from the University of the Witwatersrand, South Africa, and has over 21 years of experience in security assurance. He holds the CISA, CRISC, CGEIT, CISM, CDPSE, and Certified Internal Auditor certifications.
ryan wilks
Ryan Wilks

Ryan is a Compliance Program Manager at AWS. He leads multiple security and privacy initiatives within AWS. Ryan has 13 years of experience in information security. He has a bachelor of arts degree from Rutgers University and holds ITIL, CISM, and CISA certifications.

Encryption in transit over external networks: AWS guidance for NYDFS and beyond

Post Syndicated from Aravind Gopaluni original https://aws.amazon.com/blogs/security/encryption-in-transit-over-external-networks-aws-guidance-for-nydfs-and-beyond/

On November 1, 2023, the New York State Department of Financial Services (NYDFS) issued its Second Amendment (the Amendment) to its Cybersecurity Requirements for Financial Services Companies adopted in 2017, published within Section 500 of 23 NYCRR 500 (the Cybersecurity Requirements; the Cybersecurity Requirements as amended by the Amendment, the Amended Cybersecurity Requirements). In the introduction to its Cybersecurity Resource Center, the Department explains that the revisions are aimed at addressing the changes in the increasing sophistication of threat actors, the prevalence of and relative ease in running cyberattacks, and the availability of additional controls to manage cyber risks.

This blog post focuses on the revision to the encryption in transit requirement under section 500.15(a). It outlines the encryption capabilities and secure connectivity options offered by Amazon Web Services (AWS) to help customers demonstrate compliance with this updated requirement. The post also provides best practices guidance, emphasizing the shared responsibility model. This enables organizations to design robust data protection strategies that address not only the updated NYDFS encryption requirements but potentially also other security standards and regulatory requirements.

The target audience for this information includes security leaders, architects, engineers, and security operations team members and risk, compliance, and audit professionals.

Note that the information provided here is for informational purposes only; it is not legal or compliance advice and should not be relied on as legal or compliance advice. Customers are responsible for making their own independent assessments and should obtain appropriate advice from their own legal and compliance advisors regarding compliance with applicable NYDFS regulations.

500.15 Encryption of nonpublic information

The updated requirement in the Amendment states that:

  1. As part of its cybersecurity program, each covered entity shall implement a written policy requiring encryption that meets industry standards, to protect nonpublic information held or transmitted by the covered entity both in transit over external networks and at rest.
  2. To the extent a covered entity determines that encryption of nonpublic information at rest is infeasible, the covered entity may instead secure such nonpublic information using effective alternative compensating controls that have been reviewed and approved by the covered entity’s CISO in writing. The feasibility of encryption and effectiveness of the compensating controls shall be reviewed by the CISO at least annually.

This section of the Amendment removes the covered entity’s chief information security officer’s (CISO) discretion to approve compensating controls when encryption of nonpublic information in transit over external networks is deemed infeasible. The Amendment mandates that, effective November 2024, organizations must encrypt nonpublic information transmitted over external networks without the option of implementing alternative compensating controls. While the use of security best practices such as network segmentation, multi-factor authentication (MFA), and intrusion detection and prevention systems (IDS/IPS) can provide defense in depth, these compensating controls are no longer sufficient to replace encryption in transit over external networks for nonpublic information.

However, the Amendment still allows for the CISO to approve the use of alternative compensating controls where encryption of nonpublic information at rest is deemed infeasible. AWS is committed to providing industry-standard encryption services and capabilities to help protect customer data at rest in the cloud, offering customers the ability to add layers of security to their data at rest, providing scalable and efficient encryption features. This includes the following services:

While the above highlights encryption-at-rest capabilities offered by AWS, the focus of this blog post is to provide guidance and best practice recommendations for encryption in transit.

AWS guidance and best practice recommendations

Cloud network traffic encompasses connections to and from the cloud and traffic between cloud service provider (CSP) services. From an organization’s perspective, CSP networks and data centers are deemed external because they aren’t under the organization’s direct control. The connection between the organization and a CSP, typically established over the internet or dedicated links, is considered an external network. Encrypting data in transit over these external networks is crucial and should be an integral part of an organization’s cybersecurity program.

AWS implements multiple mechanisms to help ensure the confidentiality and integrity of customer data during transit and at rest across various points within its environment. While AWS employs transparent encryption at various transit points, we strongly recommend incorporating encryption by design into your architecture. AWS provides robust encryption-in-transit capabilities to help you adhere to compliance requirements and mitigate the risks of unauthorized disclosure and modification of nonpublic information in transit over external networks.

Additionally, AWS recommends that financial services institutions adopt a secure by design (SbD) approach to implement architectures that are pre-tested from a security perspective. SbD helps establish control objectives, security baselines, security configurations, and audit capabilities for workloads running on AWS.

Security and Compliance is a shared responsibility between AWS and the customer. Shared responsibility can vary depending on the security configuration options for each service. You should carefully consider the services you choose because your organization’s responsibilities vary depending on the services used, the integration of those services into your IT environment, and applicable laws and regulations. AWS provides resources such as service user guides and AWS Customer Compliance Guides, which map security best practices for individual services to leading compliance frameworks, including NYDFS.

Protecting connections to and from AWS

We understand that customers place a high priority on privacy and data security. That’s why AWS gives you ownership and control over your data through services that allow you to determine where your content will be stored, secure your content in transit and at rest, and manage access to AWS services and resources for your users. When architecting workloads on AWS, classifying data based on its sensitivity, criticality, and compliance requirements is essential. Proper data classification allows you to implement appropriate security controls and data protection mechanisms, such as Transport Layer Security (TLS) at the application layer, access control measures, and secure network connectivity options for nonpublic information over external networks. When it comes to transmitting nonpublic information over external networks, it’s a recommended practice to identify network segments traversed by this data based on your network architecture. While AWS employs transparent encryption at various transit points, it’s advisable to implement encryption solutions at multiple layers of the OSI model to establish defense in depth and enhance end-to-end encryption capabilities. Although requirement 500.15 of the Amendment doesn’t mandate end-to-end encryption, implementing such controls can provide an added layer of security and can help demonstrate that nonpublic information is consistently encrypted during transit.

AWS offers several options to achieve this. While not every option provides end-to-end encryption on its own, using them in combination helps to ensure that nonpublic information doesn’t traverse open, public networks unprotected. These options include:

  • Using AWS Direct Connect with IEEE 802.1AE MAC Security Standard (MACsec) encryption
  • VPN connections
  • Secure API endpoints
  • Client-side encryption of data before sending it to AWS

AWS Direct Connect with MACsec encryption

AWS Direct Connect provides direct connectivity to the AWS network through third-party colocation facilities, using a cross-connect between an AWS owned device and either a customer- or partner-owned device. Direct Connect can reduce network costs, increase bandwidth throughput, and provide a more consistent network experience than internet-based connections. Within Direct Connect connections (a physical construct) there will be one or more virtual interfaces (VIFs). These are logical entities and are reflected as industry-standard 802.1Q VLANs on the customer equipment terminating the Direct Connect connection. Depending on the type of VIF, they will use either public or private IP addressing. There are three different types of VIFs:

  • Public virtual interface – Establish connectivity between AWS public endpoints and your data center, office, or colocation environment.
  • Transit virtual interface – Establish private connectivity between AWS Transit Gateways and your data center, office, or colocation environment. Transit Gateways is an AWS managed high availability and scalability regional network transit hub used to interconnect Amazon Virtual Private Cloud (Amazon VPC) and customer networks.
  • Private virtual interface – Establish private connectivity between Amazon VPC resources and your data center, office, or colocation environment.

By default, a Direct Connect connection isn’t encrypted from your premises to the Direct Connect location because AWS cannot assume your on-premises device supports the MACsec protocol. With MACsec, Direct Connect delivers native, near line-rate, point-to-point encryption, ensuring that data communications between AWS and your corporate network remain protected. MACsec is supported on 10 Gbps and 100 Gbps dedicated Direct Connect connections at selected points of presence. Using Direct Connect with MACsec-enabled connections and combining it with the transparent physical network encryption offered by AWS from the Direct Connect location through the AWS backbone not only benefits you by allowing you to securely exchange data with AWS, but also enables you to use the highest available bandwidth. For additional information on MACsec support and cipher suites, see the MACsec section in the Direct Connect FAQs.

Figure 1 illustrates a sample reference architecture for securing traffic from corporate network to your VPCs over Direct Connect with MACsec and AWS Transit Gateways.

Figure 1: Sample architecture for using Direct Connect with MACsec encryption

Figure 1: Sample architecture for using Direct Connect with MACsec encryption

In the sample architecture, you can see that Layer 2 encryption through MACsec only encrypts the traffic from your on-premises systems to the AWS device in the Direct Connect location, and therefore you need to consider additional encryption solutions at Layer 3, 4, or 7 to get closer to end-to-end encryption to the device where you’re comfortable for the packets to be decrypted. In the next section, let’s review an option for using network layer encryption using AWS Site-to-Site VPN.

Direct Connect with Site-to-Site VPN

AWS Site-to-Site VPN is a fully managed service that creates a secure connection between your corporate network and your Amazon VPC using IP security (IPsec) tunnels over the internet. Data transferred between your VPC and the remote network routes over an encrypted VPN connection to help maintain the confidentiality and integrity of data in transit. Each VPN connection consists of two tunnels between a virtual private gateway or transit gateway on the AWS side and a customer gateway on the on-premises side. Each tunnel supports a maximum throughput of up to 1.25 Gbps. See Site-to-Site VPN quotas for more information.

You can use Site-to-Site VPN over Direct Connect to achieve secure IPsec connection with the low latency and consistent network experience of Direct Connect when reaching resources in your Amazon VPCs.

Figure 2 illustrates a sample reference architecture for establishing end-to-end IPsec-encrypted connections between your networks and Transit Gateway over a private dedicated connection.

Figure 2: Encrypted connections between the AWS Cloud and a customer’s network using VPN

Figure 2: Encrypted connections between the AWS Cloud and a customer’s network using VPN

While Direct Connect with MACsec and Site-to-Site VPN with IPsec can provide encryption at the physical and network layers respectively, they primarily secure the data in transit between your on-premises network and the AWS network boundary. To further enhance the coverage for end-to-end encryption, it is advisable to use TLS encryption. In the next section, let’s review mechanisms for securing API endpoints on AWS using TLS encryption.

Secure API endpoints

APIs act as the front door for applications to access data, business logic, or functionality from other applications and backend services.

AWS enables you to establish secure, encrypted connections to its services using public AWS service API endpoints. Public AWS owned service API endpoints (AWS managed services like Amazon Simple Queue Service (Amazon SQS), AWS Identity and Access Management (IAM), AWS Key Management Service (AWS KMS), others) have certificates that are owned and deployed by AWS. By default, requests to these public endpoints use HTTPS. To align with evolving technology and regulatory standards for TLS, as of February 27, 2024, AWS has updated its TLS policy to require a minimum of TLS 1.2, thereby deprecating support for TLS 1.0 and 1.1 versions on AWS service API endpoints across each of our AWS Regions and Availability Zones.

Additionally, to enhance connection performance, AWS has begun enabling TLS version 1.3 globally for its service API endpoints. If you’re using the AWS SDKs or AWS Command Line Interface (AWS CLI), you will automatically benefit from TLS 1.3 after a service enables it.

While requests to public AWS service API endpoints use HTTPS by default, a few services, such as Amazon S3 and Amazon DynamoDB, allow using either HTTP or HTTPS. If the client or application chooses HTTP, the communication isn’t encrypted. Customers are responsible for enforcing HTTPS connections when using such AWS services. To help ensure secure communication, you can establish an identity perimeter by using the IAM policy condition key aws:SecureTransport in your IAM roles to evaluate the connection and mandate HTTPS usage.

As enterprises increasingly adopt cloud computing and microservices architectures, teams frequently build and manage internal applications exposed as private API endpoints. Customers are responsible for managing the certificates on private customer-owned endpoints. AWS helps you deploy private customer-owned identities (that is, TLS certificates) through the use of AWS Certificate Manager (ACM) private certificate authorities (PCA) and the integration with AWS services that offer private customer-owned TLS termination endpoints.

ACM is a fully managed service that lets you provision, manage, and deploy public and private TLS certificates for use with AWS services and internal connected resources. ACM minimizes the time-consuming manual process of purchasing, uploading, and renewing TLS certificates. You can provide certificates for your integrated AWS services either by issuing them directly using ACM or by importing third-party certificates into the ACM management system. ACM offers two options for deploying managed X.509 certificates. You can choose the best one for your needs.

  • AWS Certificate Manager (ACM) – This service is for enterprise customers who need a secure web presence using TLS. ACM certificates are deployed through Elastic Load Balancing (ELB), Amazon CloudFront, Amazon API Gateway, and other integrated AWS services. The most common application of this type is a secure public website with significant traffic requirements. ACM also helps to simplify security management by automating the renewal of expiring certificates.
  • AWS Private Certificate Authority (Private CA) – This service is for enterprise customers building a public key infrastructure (PKI) inside the AWS Cloud and is intended for private use within an organization. With AWS Private CA, you can create your own certificate authority (CA) hierarchy and issue certificates with it for authenticating users, computers, applications, services, servers, and other devices. Certificates issued by a private CA cannot be used on the internet. For more information, see the AWS Private CA User Guide.

You can use a centralized API gateway service, such as Amazon API Gateway, to securely expose customer-owned private API endpoints. API Gateway is a fully managed service that allows developers to create, publish, maintain, monitor, and secure APIs at scale. With API Gateway, you can create RESTful APIs and WebSocket APIs, enabling near real-time, two-way communication applications. API Gateway operations must be encrypted in-transit using TLS, and require the use of HTTPS endpoints. You can use API Gateway to configure custom domains for your APIs using TLS certificates provisioned and managed by ACM. Developers can optionally choose a specific TLS version for their custom domain names. For use cases that require mutual TLS (mTLS) authentication, you can configure certificate-based mTLS authentication on your custom domains.

Pre-encryption of data to be sent to AWS

Depending on the risk profile and sensitivity of the data that’s being transferred to AWS, you might want to choose encrypting data in an application running on your corporate network before sending it to AWS (client-side encryption). AWS offers a variety of SDKs and client-side encryption libraries to help you encrypt and decrypt data in your applications. You can use these libraries with the cryptographic service provider of your choice, including AWS Key Management Service or AWS CloudHSM, but the libraries do not require an AWS service.

  • The AWS Encryption SDK is a client-side encryption library that you can use to encrypt and decrypt data in your application and is available in several programming languages, including a command-line interface. You can use the SDK to encrypt your data before you send it to an AWS service. The SDK offers advanced data protection features, including envelope encryption and additional authenticated data (AAD). It also offers secure, authenticated, symmetric key algorithm suites, such as 256-bit AES-GCM with key derivation and signing.
  • The AWS Database Encryption SDK is a set of software libraries developed in open source that enable you to include client-side encryption in your database design. The SDK provides record-level encryption solutions. You specify which fields are encrypted and which fields are included in the signatures that help ensure the authenticity of your data. Encrypting your sensitive data in transit and at rest helps ensure that your plaintext data isn’t available to a third party, including AWS. The AWS Database Encryption SDK for DynamoDB is designed especially for DynamoDB applications. It encrypts the attribute values in each table item using a unique encryption key. It then signs the item to protect it against unauthorized changes, such as adding or deleting attributes or swapping encrypted values. After you create and configure the required components, the SDK transparently encrypts and signs your table items when you add them to a table. It also verifies and decrypts them when you retrieve them. Searchable encryption in the AWS Database Encryption SDK enables you search encrypted records without decrypting the entire database. This is accomplished by using beacons, which create a map between the plaintext value written to a field and the encrypted value that is stored in your database. For more information, see the AWS Database Encryption SDK Developer Guide.
  • The Amazon S3 Encryption Client is a client-side encryption library that enables you to encrypt an object locally to help ensure its security before passing it to Amazon S3. It integrates seamlessly with the Amazon S3 APIs to provide a straightforward solution for client-side encryption of data before uploading to Amazon S3. After you instantiate the Amazon S3 Encryption Client, your objects are automatically encrypted and decrypted as part of your Amazon S3 PutObject and GetObject requests. Your objects are encrypted with a unique data key. You can use both the Amazon S3 Encryption Client and server-side encryption to encrypt your data. The Amazon S3 Encryption Client is supported in a variety of programming languages and supports industry-standard algorithms for encrypting objects and data keys. For more information, see the Amazon S3 Encryption Client developer guide.

Encryption in-transit inside AWS

AWS implements responsible and sophisticated technical and physical controls that are designed to help prevent unauthorized access to or disclosure of your content. To protect data in transit, traffic traversing through the AWS network that is outside of AWS physical control is transparently encrypted by AWS at the physical layer. This includes traffic between AWS Regions (except China Regions), traffic between Availability Zones, and between Direct Connect locations and Regions through the AWS backbone network.

Network segmentation

When you create an AWS account, AWS offers a virtual networking option to launch resources in a logically isolated virtual private network (VPN), Amazon Virtual Private Cloud (Amazon VPC). A VPC is limited to a single AWS Region and every VPC has one or more subnets. VPCs can be connected externally using an internet gateway (IGW), VPC peering connection, VPN, Direct Connect, or Transit Gateways. Traffic within the your VPC is considered internal because you have complete control over your virtual networking environment, including selection of your own IP address range, creation of subnets, and configuration of route tables and network gateways.

As a customer, you maintain ownership of your data, and you select which AWS services can process, store, and host your data, and you choose the Regions in which your data is stored. AWS doesn’t automatically replicate data across Regions, unless the you choose to do so. Data transmitted over the AWS global network between Regions and Availability Zones is automatically encrypted at the physical layer before leaving AWS secured facilities. Cross-Region traffic that uses Amazon VPC and Transit Gateway peering is automatically bulk-encrypted when it exits a Region.

Encryption between instances

AWS provides secure and private connectivity between Amazon Elastic Compute Cloud (Amazon EC2) instances of all types. The Nitro System is the underlying foundation for modern Amazon EC2 instances. It’s a combination of purpose-built server designs, data processors, system management components, and specialized firmware that provides the underlying foundation for EC2 instances launched since the beginning of 2018. Instance types that use the offload capabilities of the underlying Nitro System hardware automatically encrypt in-transit traffic between instances. This encryption uses Authenticated Encryption with Associated Data (AEAD) algorithms, with 256-bit encryption and has no impact on network performance. To support this additional in-transit traffic encryption between instances, instances must be of supported instance types, in the same Region, and in the same VPC or peered VPCs. For a list of supported instance types and additional requirements, see Encryption in transit.

Conclusion

The second Amendment to the NYDFS Cybersecurity Regulation underscores the criticality of safeguarding nonpublic information during transmission over external networks. By mandating encryption for data in transit and eliminating the option for compensating controls, the Amendment reinforces the need for robust, industry-standard encryption measures to protect the confidentiality and integrity of sensitive information.

AWS provides a comprehensive suite of encryption services and secure connectivity options that enable you to design and implement robust data protection strategies. The transparent encryption mechanisms that AWS has built into services across its global network infrastructure, secure API endpoints with TLS encryption, and services such as Direct Connect with MACsec encryption and Site-to-Site VPN, can help you establish secure, encrypted pathways for transmitting nonpublic information over external networks.

By embracing the principles outlined in this blog post, financial services organizations can address not only the updated NYDFS encryption requirements for section 500.15(a) but can also potentially demonstrate their commitment to data security across other security standards and regulatory requirements.

For further reading on considerations for AWS customers regarding adherence to the Second Amendment to the NYDFS Cybersecurity Regulation, see the AWS Compliance Guide to NYDFS Cybersecurity Regulation.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the AWS Financial Services re:Post and AWS Security, Identity, & Compliance re:Post ,or contact AWS Support.
 

Aravind Gopaluni
Aravind Gopaluni

Aravind is a Senior Security Solutions Architect at AWS, helping financial services customers navigate ever-evolving cloud security and compliance needs. With over 20 years of experience, he has honed his expertise in delivering robust solutions to numerous global enterprises. Away from the world of cybersecurity, he cherishes traveling and exploring cuisines with his family.
Stephen Eschbach
Stephen Eschbach

Stephen is a Senior Compliance Specialist at AWS, helping financial services customers meet their security and compliance objectives on AWS. With over 18 years of experience in enterprise risk, IT GRC, and IT regulatory compliance, Stephen has worked and consulted for several global financial services companies. Outside of work, Stephen enjoys family time, kids’ sports, fishing, golf, and Texas BBQ.

Making sense of secrets management on Amazon EKS for regulated institutions

Post Syndicated from Piyush Mattoo original https://aws.amazon.com/blogs/security/making-sense-of-secrets-management-on-amazon-eks-for-regulated-institutions/

Amazon Web Services (AWS) customers operating in a regulated industry, such as the financial services industry (FSI) or healthcare, are required to meet their regulatory and compliance obligations, such as the Payment Card Industry Data Security Standard (PCI DSS) or Health Insurance Portability and Accountability Act (HIPPA).

AWS offers regulated customers tools, guidance and third-party audit reports to help meet compliance requirements. Regulated industry customers often require a service-by-service approval process when adopting cloud services to make sure that each adopted service aligns with their regulatory obligations and risk tolerance. How financial institutions can approve AWS services for highly confidential data walks through the key considerations that customers should focus on to help streamline the approval of cloud services. In this post we cover how regulated customers, especially FSI customers, can approach secrets management on Amazon Elastic Kubernetes Service (Amazon EKS) to help meet data protection and operational security requirements. Amazon EKS gives you the flexibility to start, run, and scale Kubernetes applications in the AWS Cloud or on-premises.

Applications often require sensitive information such as passwords, API keys, and tokens to connect to external services or systems. Kubernetes has secrets objects for managing these types of sensitive information. Additional tools and approaches have evolved to supplement the Kubernetes Secrets to help meet the compliance requirements of regulated organizations. One of the driving forces behind the evolution of these tools for regulated customers is that the native Kubernetes Secrets values aren’t encrypted but encoded as base64 strings; meaning that their values can be decoded by a threat actor with either API access or authorization to create a pod in a namespace containing the secret. There are options such as GoDaddy Kubernetes External Secrets, AWS Secrets and Configuration Provider (ASCP) for the Kubernetes Secrets Store CSI Driver, Hashicorp Vault, and Bitnami Sealed secrets that you can use to can help to improve the security, management, and audibility of your secrets usage.

In this post, we cover some of the key decisions involved in choosing between External Secrets Operator (ESO), Sealed Secrets, and ASCP for the Kubernetes Secrets Store Container Storage Interface (CSI) Driver, specifically for FSI customers with regulatory demands. These decision points are also broadly applicable to customers operating in other regulated industries.

AWS Shared Responsibility Model

Security and compliance is a shared responsibility between AWS and the customer. The AWS Shared Responsibility Model describes this as security of the cloud and security in the cloud:

  • AWS responsibility – Security of the cloud: AWS is responsible for protecting the infrastructure that runs the services offered in the AWS Cloud. For Amazon EKS, AWS is responsible for the Kubernetes control plane, which includes the control plane nodes and etcd database. Amazon EKS is certified by multiple compliance programs for regulated and sensitive applications. The effectiveness of the security controls are regularly tested and verified by third-party auditors as part of the AWS compliance programs.
  • Customer responsibility – Security in the cloud: Customers are responsible for the security and compliance of customer configured systems and services deployed on AWS. This includes responsibility for securely deploying, configuring and managing ESO within their Amazon EKS cluster. For Amazon EKS, the customer responsibility depends upon the worker nodes you pick to run your workloads and cluster configuration as shown in Figure 1. In the case of Amazon EKS deployment using Amazon Elastic Compute Cloud (Amazon EC2) hosts, the customer responsibility includes the following areas:
    • The security configuration of the data plane, including the configuration of the security groups that allow traffic to pass from the Amazon EKS control plane into the customer virtual private cloud (VPC).
    • The configuration of the nodes and the containers themselves.
    • The nodes’ operating system, including updates and security patches.
    • Other associated application software:
    • The sensitivity of your data, such as personally identifiable information (PII), keys, passwords, and tokens
      • Customers are responsible for enforcing access controls to protect their data and secrets.
      • Customers are responsible for monitoring and logging activities related to secrets management including auditing access, detecting anomalies and responding to security incidents.
    • Your company’s requirements, applicable laws and regulations
    • When using AWS Fargate, the operational overhead for customers is reduced in the following areas:
      • The customer is not responsible for updating or patching the host system.
      • Fargate manages the placement and scaling of containers.
Figure 1: AWS Shared Responsibility Model with Fargate and Amazon EC2 based workflows

Figure 1: AWS Shared Responsibility Model with Fargate and Amazon EC2 based workflows

As an example of the Shared Responsibility Model in action, consider a typical FSI workload accepting or processing payments cards and subject to PCI DSS requirements. PCI DSS v4.0 requirement 3 focuses on guidelines to secure cardholder data while at rest and in transit:

Control ID Control description
3.6 Cryptographic keys used to protect stored account data are secured.
3.6.1.2 Store secret and private keys used to encrypt and decrypt cardholder data in one (or more) of the following forms:

  • Encrypted with a key-encrypting key that is at least as strong as the data-encrypting key, and that is stored separately from the data-encrypting key.
  • Stored within a secure cryptographic device (SCD), such as a hardware security module (HSM) or PTS-approved point-of-interaction device.
  • Has at least two full-length key components or key shares, in accordance with an industry-accepted method. Note: It is not required that public keys be stored in one of these forms.
3.6.1.3 Access to cleartext cryptographic key components is restricted to the fewest number of custodians necessary.

NIST frameworks and controls are also broadly adopted by FSI customers. NIST Cyber Security Framework (NIST CSF) and NIST SP 800-53 (Security and Privacy Controls for Information Systems and Organizations) include the following controls that apply to secrets:

Regulation or framework Control ID Control description
NIST CSF PR.AC-1 Identities and credentials are issued, managed, verified, revoked, and audited for authorized devices, users and processes.
NIST CSF PR.DS-1 Data-at-rest is protected.
NIST 800-53.r5 AC-2(1)
AC-3(15)
Secrets should have automatic rotation enabled.
Delete unused secrets.

Based on the preceding objectives, the management of secrets can be categorized into two broad areas:

  • Identity and access management ensures separation of duties and least privileged access.
  • Strong encryption, using a dedicated cryptographic device, introduces a secure boundary between the secrets data and keys, while maintaining appropriate management over the cryptographic keys.

Choosing your secrets management provider

To help choose a secrets management provider and apply compensating controls effectively, in this section we evaluate three different options based on the key objectives derived from the PCI DSS and NIST controls described above and other considerations such as operational overhead, high availability, resiliency, and developer or operator experience.

Architecture and workflow

The following architecture and component descriptions highlight the different architectural approaches and responsibilities of each solution’s components, ranging from controllers and operators, command-line interface (CLI) tools, custom resources, and CSI drivers working together to facilitate secure secrets management within Kubernetes environments.

External Secrets Operator (ESO) extends the Kubernetes API using a custom resource definition (CRD) for secret retrieval. ESO enables integration with external secrets management systems such as AWS Secrets Manager, HashiCorp Vault, Google Secrets Manager, Azure Key Vault, IBM Cloud Secrets Manager, and various other systems. ESO watches for changes to an external secret store and keeps Kubernetes secrets in sync. These services offer features that aren’t available with native Kubernetes Secrets, such as fine-grained access controls, strong encryption, and automatic rotation of secrets. By using these purpose-built tools outside of a Kubernetes cluster, you can better manage risk and benefit from central management of secrets across multiple Amazon EKS clusters. For more information, see the detailed walkthrough of using ESO to synchronize secrets from Secrets Manager to your Amazon EKS Fargate cluster.

ESO is comprised of a cluster-side controller that automatically reconciles the state within the Kubernetes cluster and updates the related secrets anytime the external API’s secret undergoes a change.

Figure 2: ESO workflow

Figure 2: ESO workflow

Sealed Secrets is an open source project by Bitnami comprised of a Kubernetes controller coupled with a client-side CLI tool with the objective to store secrets in Git in a secure fashion. Sealed Secrets encrypts your Kubernetes secret into a SealedSecret, which can also be deployed to a Kubernetes cluster using kubectl. For more information, see the detailed walkthough of using tools from the Sealed Secrets open source project to manage secrets in your Amazon EKS clusters.

Sealed Secrets comprises of three main components: First, there is an operator or a controller which is deployed onto a Kubernetes cluster. The controller is responsible for decrypting your secrets. Second, you have a CLI tool called Kubeseal that takes your secret and encrypts it. Third, you have a CRD. Instead of creating regular secrets, you create SealedSecrets, which is a CRD defined within Kubernetes. That is how the operator knows when to perform the decryption process within your Kubernetes cluster.

Upon startup, the controller looks for a cluster-wide private-public key pair and generates a new 4096-bit RSA public-private key pair if one doesn’t exist. The private key is persisted in a secret object in the same namespace as the controller. The public key portion of this is made publicly available to anyone wanting to use Sealed Secrets with this cluster.

Figure 3: Sealed Secrets workflow

Figure 3: Sealed Secrets workflow

The AWS Secrets Manager and Config Provider (ASCP) for Secret Store CSI driver is an open source tool from AWS that allows secrets from Secrets Manager and Parameter Store, a capability of AWS Systems Manager, to be mounted as files inside Amazon EKS pods. It uses a CRD called SecretProviderClass to specify which secrets or parameters to mount. Upon a pod start or restart, the CSI driver retrieves the secrets or parameters from AWS and writes them to a tmpfs volume mounted in the pod. The volume is automatically cleaned up when the pod is deleted, making sure that secrets aren’t persisted. For more information, see the detailed walkthrough on how to set up and configure the ASCP to work with Amazon EKS.

ASCP comprises of a cluster-side controller acting as the provider, allowing secrets from Secrets Manager, and parameters from Parameter Store to appear as files mounted in Kubernetes pods. Secrets Store CSI Driver is a DaemonSet with three containers: node-driver-registrar, which registers the CSI driver with Kubelet; secrets-store, which implements the CSI Node service gRPC services for mounting and unmounting volumes during pod creation and deletion; and  liveness-probe, which monitors the health of the CSI driver and reports to Kubernetes for automatic issue detection and pod restart.

Figure 4: AWS Secrets Manager and configuration provider

Figure 4: AWS Secrets Manager and configuration provider

In the next section, we cover some of the key decisions involved in choosing whether to use ESO, Sealed Secrets, or ASCP for regulated customers to help meet their regulatory and compliance needs.

Comparing ESO, Sealed Secrets, and ASCP objectives

All three solutions address different aspects of secure secrets management and aim to help FSI customers meet their regulatory compliance requirements while upholding the protection of sensitive data in Kubernetes environments.

ESO synchronizes secrets from external APIs into Kubernetes, targeting the cluster operator and application developer personas. The cluster operator is responsible for setting up ESO and managing access policies. The application developer is responsible for defining external secrets and the application configuration.

Sealed Secrets encrypts your Kubernetes secrets before storing them in version control systems such as public Git repositories. This is the case if you decide to check in your Kubernetes manifest to a Git repository granting access to your sensitive secrets to anyone who has access to the Git repository. This is ultimately the reason why Sealed Secrets was created and the sealed secret can be decrypted only by the controller running in the target cluster.

Using ASCP, you can securely store and manage your secrets in Secrets Manager and retrieve them through your applications running on Kubernetes without having to write custom code. Secrets Manager provides features such as rotation, auditing, and access control that can help FSI customers meet regulatory compliance requirements and maintain a robust security posture.

Installation

The deployment and configuration details that follow highlight the different approaches and resources used by each solution to integrate with Kubernetes and external secret stores, catering to the specific requirements of secure secrets management in containerized environments.

ESO provides Helm charts for ease of operator deployment. External Secrets provides custom resources like SecretStore and ExternalSecret for configuring the required operator functionality to synchronize external secrets to your cluster. For instance, SecretStore can be used by the cluster operator to be able to connect to AWS Secrets Manager using appropriate credentials to pull in the secrets.

To install Sealed Secrets, you can deploy the Sealed Secrets Controller onto the Kubernetes cluster. You can deploy the manifest by itself or you can use a Helm chart to deploy the Sealed Secrets Controller for you. After the controller is installed, you use the Kubeseal client-side utility to encrypt secrets using asymmetric cryptography. If you don’t already have the Kubeseal CLI installed, see the installation instructions.

ASCP provides Helm charts to assist in operator deployment. The ASCP operator provides custom resources such as SecretProviderClass to provide provider-specific parameters to the CSI driver. During pod start and restart, the CSI driver will communicate with the provider using gRPC to retrieve the secret content from the external secret store you specified in the SecretProviderClass custom resource. Then the volume is mounted in the pod as tmpfs and the secret contents are written to the volume.

Encryption and key management

These solutions use robust encryption mechanisms and key management practices provided by external secret stores and AWS services such as AWS Key Management Service (AWS KMS) and Secrets Manager. However, additional considerations and configurations might be required to meet specific regulatory requirements, such as PCI DSS compliance for handling sensitive data.

ESO relies on encryption features within the external secrets management system. For instance, Secrets Manager supports envelope encryption with AWS KMS which is FIPS 140-2 Level 3 certified. Secrets Manager has several compliance certifications making it a great fit for regulated workloads. FIPS 140-2 Level 3 ensures only strong encryption algorithms approved by NIST can be used to protect data. It also defines security requirements for the cryptographic module, creating logical and physical boundaries.

Both AWS KMS and Secrets Manager help you to manage key lifecycle and to integrate with other AWS Services. In terms of key rotation, both provide automatic rotation of secrets that runs on a schedule (which you define), and abstract the complexity of managing different versions of keys. For AWS managed keys, the key rotation happens automatically once every year by default. With customer managed keys (CMKs), automatic key rotation is available but not enabled by default.

When using SealedSecrets, you use the Kubeseal tool to convert a standard Kubernetes Secret into a Sealed Secrets resource. The contents of the Sealed Secrets are encrypted with the public key served by the Sealed Secrets Controller as described in the Sealed Secrets project homepage.

In the absence of cloud native secrets management integration, you might have to add compensating controls to achieve the regulatory standards required by your organization. In cases where the underlying SealedSecrets data is sensitive in nature, such as cardholder PII, PCI requires that you store sensitive secrets in a cryptographic device such as a hardware security module (HSM). You can use Secrets Manager to store the master key generated to seal the secrets. However, this you will have to enable additional integration with Amazon EKS APIs to fetch the master key securely from the EKS cluster. You will also have to modify your deployment process to use a master key from Secrets Manager. The applications running in the EKS cluster must have permissions to fetch the SealedSecret and master key from Secrets Manager. This might involve configuring the application to interact with Amazon EKS APIs and Secrets Manager. For non-sensitive data, Kubeseal can be used directly within the EKS cluster to manage secrets and sealing keys.

For key rotation, you can store the controller generated private key in Parameter Store as a SecureString. You can use the advanced tier in Parameter Store if the file containing the private keys exceeds the Standard tier limit of up to 4,096 characters. In addition, if you want to add key rotation, you can use AWS KMS.

The ASCP relies on encryption features within the chosen secret store, such as Secrets Manager. Secrets Manager supports integration with AWS KMS for an additional layer of security by storing encryption keys separately. The Secrets Store CSI Driver facilitates secure interaction with the secret store, but doesn’t directly encrypt secrets. Encrypting mounted content can provide further protection, but introduces operational overhead related to key management.

ASCP relies on Secrets Manager and AWS KMS for encryption and decryption capabilities. As a recommendation, you can encrypt mounted content to further protect the secrets. However, this introduces the additional operational overhead of managing encryption keys and addressing key rotation.

Additional considerations

These solutions address various aspects of secure secrets management, ranging from centralized management, compliance, high availability, performance, developer experience, and integration with existing investments, catering to the specific needs of FSI customers in their Kubernetes environments.

ESO can be particularly useful when you need to manage an identical set of secrets across multiple Kubernetes clusters. Instead of configuring, managing, and rotating secrets at each cluster level individually, you can synchronize your secrets across your clusters. This simplifies secrets management by providing a single interface to manage secrets across multiple clusters and environments.

External secrets management systems typically offer advanced security features such as encryption at rest, access controls, audit logs, and integration with identity providers. This helps FSI customers ensure that sensitive information is stored and managed securely in accordance with regulatory requirements.

FSI customers usually have existing investments in their on-premises or cloud infrastructure, including secrets management solutions. ESO integrates seamlessly with existing secrets management systems and infrastructure, allowing FSI customers to use their investment in these systems without requiring significant changes to their workflow or tooling. This makes it easier for FSI customers to adopt and integrate ESO into their existing Kubernetes environments.

ESO provides capabilities for enforcing policies and governance controls around secrets management such as access control, rotation policies, and audit logging when using services like Secrets Manager. For FSI customers, audits and compliance are critical and ESO verifies that access to secrets is tracked and audit trails are maintained, thereby simplifying the process of demonstrating adherence to regulatory standards. For instance, secrets stored inside Secrets Manager can be audited for compliance with AWS Config and AWS Audit Manager. Additionally, ESO uses role-based access control (RBAC) to help prevent unauthorized access to Kubernetes secrets as documented in the ESO security best practices guide.

High availability and resilience are critical considerations for mission critical FSI applications such as online banking, payment processing, and trading services. By using external secrets management systems designed for high availability and disaster recovery, ESO helps FSI customers ensure secrets are available and accessible in the event of infrastructure failure or outages, thereby minimizing service disruption and downtime.

FSI workloads often experience spikes in transaction volumes, especially during peak days or hours. ESO is designed to efficiently managed a large volume of secrets by using external secrets management that’s optimized for performance and scalability.

In terms of monitoring, ESO provides Prometheus metrics to enable fine-grained monitoring of access to secrets. Amazon EKS pods offer diverse methods to grant access to secrets present on external secrets management solutions. For example, in non-production environments, access can be granted through IAM instance profiles assigned to the Amazon EKS worker nodes. For production, using IAM roles for service accounts (IRSA) is recommended. Furthermore, you can achieve namespace level fine-grained access control by using annotations.

ESO also provides options to configure operators to use a VPC endpoint to comply with FIPS requirements.

Additional developer productivity benefits provided by ESO include support for JSON objects (Secret key/value in the AWS Management console) or strings (Plaintext in the console). With JSON objects, developers can programmatically update multiple values atomically when rotating a client certificate and private key.

The benefit of Sealed Secrets, as discussed previously, is when you upload your manifest to a Git repository. The manifest will contain the encrypted SealedSecrets and not the regular secrets. This assures that no one has access to your sensitive secrets even when they have access to your Git repository. Sealed Secrets offer a few benefits to developers in terms of developer experience. Sealed Secrets gives you access to manage your secrets, making them more readily available to developers. Sealed Secrets offers VSCode extension to assist in integrating it into the software development lifecycle (SDLC). Using Sealed Secrets, you can store the encrypted secrets in the version control systems such as Gitlab and GitHub. Sealed Secrets can reduce operational overhead related to updating dependent objects because whenever a secret resource is updated, the same update is applied to the dependent objects.

ASCP integration with the Kubernetes Secrets Store CSI Driver on Amazon EKS offers enhanced security through seamless integration with Secrets Manager and Parameter Store, ensuring encryption, access control, and auditing. It centralizes management of sensitive data, simplifying operations and reducing the risk of exposure. The dynamic secrets injection capability facilitates secure retrieval and injection of secrets into Kubernetes pods, while automatic rotation provides up-to-date credentials without manual intervention. This combined solution streamlines deployment and management, providing a secure, scalable, and efficient approach to handling secrets and configuration settings in Kubernetes applications.

Consolidated threat model

We created a threat model based on the architecture of the three solution offerings. The threat model provides a comprehensive view of the potential threats and corresponding mitigations for each solution, allowing organizations to proactively address security risks and ensure the secure management of secrets in their Kubernetes environments.

X = Mitigations applicable to the solution

Threat Mitigations ESO Sealed Secrets ASCP
Unauthorized access or modification of secrets
  • Implement least privilege access principles
  • Rotate and manage credentials securely
  • Enable RBAC and auditing in Kubernetes
X X X
Insider threat (for example, a rogue administrator who has legitimate access)
  • Implement least privilege access principles
  • Enable auditing and monitoring
  • Enforce separation of duties and job rotation
X X
Compromise of the deployment process
  • Secure and harden the deployment pipeline
  • Implement secure coding practices
  • Enable auditing and monitoring
X
Unauthorized access or tampering of secrets during transit
  • Enable encryption in transit using TLS
  • Implement mutual TLS authentication between components
  • Use private networking or VPN for secure communication
X X X
Compromise of the Kubernetes API server because of vulnerabilities or misconfiguration
  • Secure and harden the Kubernetes API server
  • Enable authentication and authorization mechanisms (for example, mutual TLS and RBAC)
  • Keep Kubernetes components up-to-date and patched
  • Enable Kubernetes audit logging and monitoring
X
Vulnerability in the external secrets controller leading to privilege escalation or data exposure
  • Keep the external secrets controller up-to-date and patched
  • Regularly monitor for and apply security updates
  • Implement least privilege access principles
  • Enable auditing and monitoring
X
Compromise of the Secrets Store CSI Driver, node-driver-registrar, Secrets Store CSI Provider, kubelet, or Pod could lead to unauthorized access or exposure of secrets
  • Implement least privilege principles and role-based access controls
  • Regularly patch and update the components
  • Monitor and audit the component activities
X
Unauthorized access or data breach in Secrets Manager could expose sensitive secrets
  • Implement strong access controls and access logging for Secrets Manager
  • Encrypt secrets at rest and in transit
  • Regularly rotate and update secrets
X X

Shortcomings and limitations

The following limitations and drawbacks highlight the importance of carefully evaluating the specific requirements and constraints of your organization before adopting any of these solutions. You should consider factors such as team expertise, deployment environments, integration needs, and compliance requirements to promote a secure and efficient secrets management solution that aligns with your organization’s needs.

ESO doesn’t include a default way to restrict network traffic to and from ESO using network policies or similar network or firewall mechanisms. The application team is responsible for properly configuring network policies to improve the overall security posture of ESO within your Kubernetes cluster.

Any time an external secret associated with ESO is rotated, you must restart the deployment that uses that particular external secret. Given the inherent risks associated with integrating an external entity or third-party solution into your system, including ESO, it’s crucial to implement a comprehensive threat model similar to the Kubernetes Admission Control Threat Model.

Also, ESO set up is complicated and the controller must be installed on the Kubernetes cluster.

SealedSecrets cannot be reused across namespaces unless they’re re-encrypted or made cluster-wide, which makes it challenging to manage secrets across multiple namespaces consistently. The need to manually rotate and re-encrypt SealedSecrets with new keys can introduce operational overhead, especially in large-scale environments with numerous secrets. The old sealing keys pose a potential risk of misuse by unauthorized users, which increases the risk. To mitigate both risks (high overhead and old secrets), you should implement additional controls such as deleting older keys as part of the key rotation process or periodically rotate sealing keys and make sure that old sealed secret resources are re-encrypted with the new keys. Sealed Secrets doesn’t support external secret stores such as HashiCorp Vault, or cloud provider services such as Secrets Manager, Parameter Store, or Azure Key Vault. Sealed Secrets requires a Kubeseal client-side binary to encrypt secrets. This can be a concern in FSI environments where client-side tools are restricted by security policies.

While ASCP provides seamless integration with Secrets Manager and Parameter Store, teams unfamiliar with these AWS services might need to invest some additional effort to fully realize the benefits. This additional effort is justified by the long-term benefits of centralized secrets management and access control provided by these services. Additionally, relying primarily on AWS services for secrets management can potentially limit flexibility in deploying to alternative cloud providers or on-premises environments in the future. These factors should be carefully evaluated based on the specific needs and constraints of the application and deployment environment.

Conclusion

We have provided a summary of three options for managing secrets in Amazon EKS, ESO, Sealed Secrets, and AWS Secrets and Configuration Provider (ASCP), and the key considerations for FSI customers when choosing between them. The choice depends on several factors including existing investments in secrets management systems, specific security needs and compliance requirements, preference for a Kubernetes native solution or willingness to accept vendor lock-in.

The guidance provided here covers the strengths, limitations, and trade-offs of each option, allowing regulated institutions to make an informed decision based on their unique requirements and constraints. This guidance can be adapted and tailored to fit the specific needs of an organization, providing a secure and efficient secrets management solution for their Amazon EKS workloads, while aligning with the stringent security and compliance standards of the regulated institutions.

 
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Piyush Mattoo

Piyush Mattoo
Piyush is a Senior Solution Architect for Financial Services Data Provider segment at Amazon Web Services. He is a software technology leader with over a decade long experience building scalable and distributed software systems to enable business value through the use of technology. He is based out of Southern California and current interests include outdoor camping and nature walks.

Ruy Cavalcanti

Ruy Cavalcanti
Ruy is a Senior Security Architect for the Latin American Financial market at AWS. He has been working in IT and Security for over 19 years, helping customers create secure architectures in the AWS Cloud. Ruy’s interests include jamming on his guitar, firing up the grill for some Brazilian-style barbecue, and enjoying quality time with his family and friends.

Chetan Pawar

Chetan Pawar
Chetan is a Cloud Architect specializing in infrastructure within AWS Professional Services. As a member of the Containers Technical Field Community, he provides strategic guidance on enterprise Infrastructure and DevOps for clients across multiple industries. He has an 18-year track record building large-scale Infrastructure and containerized platforms. Outside of work, he is an avid traveler and motorsport enthusiast.

Announcing AWS KMS Elliptic Curve Diffie-Hellman (ECDH) support

Post Syndicated from Patrick Palmer original https://aws.amazon.com/blogs/security/announcing-aws-kms-elliptic-curve-diffie-hellman-ecdh-support/

When using cryptography to protect data, protocol designers often prefer symmetric keys and algorithms for their speed and efficiency. However, when data is exchanged across an untrusted network such as the internet, it becomes difficult to ensure that only the exchanging parties can know the same key. Asymmetric key pairs and algorithms help to solve this problem by allowing a public key to be shared over an untrusted network. And by using a key agreement scheme, two parties can use each other’s public key in combination with their own private key to each derive the same shared secret.

We’re excited to announce that AWS Key Management Service (AWS KMS) now supports Elliptic Curve Diffie-Hellman (ECDH) key agreement on elliptic curve (ECC) KMS keys. You can use the new DeriveSharedSecret API action to enable two parties to establish a secure communication channel by using a derived shared secret.

In this blog post we provide an overview of the new API action and explain how it can help you establish secure communications by exchanging only public keys to obtain a derived shared secret. We then show example commands to demonstrate how AWS KMS and OpenSSL can be used by two parties to derive a shared secret.

With this new DeriveSharedSecret API action, customers can take an external party’s public key and, in combination with a private key that resides within AWS KMS, derive a shared secret which can be used to derive a symmetric encryption key with a key derivation function (KDF). Customers can then use this symmetric encryption key to encrypt data locally within their application.

The same external party can combine their own related private key with the customer’s corresponding public key from AWS KMS to derive the same shared secret.

Now that both parties have the same shared secret, they can generate a symmetric encryption key that can be used to encrypt and decrypt the data they exchange.

DeriveSharedSecret offers a simple and secure way for customers to use their private key from within their application, enabling new asymmetric cryptography use cases for keys protected by AWS KMS, such as elliptic curve integrated encryption scheme (ECIES) or end-to-end encryption (E2EE) schemes.

AWS KMS DeriveSharedSecret overview

The AWS KMS API Reference documentation covers the DeriveSharedSecret API action in more detail than we include in this post. We broadly describe how to interact with the API action, using the following steps:

  1. Create an elliptic curve (ECC) KMS key, selecting that the key be used for KEY_AGREEMENT and choosing one of the supported key specs. You will not be able to modify existing ECC keys to be used for key agreement.
  2. Have another party create an elliptic curve key that matches the key spec you defined for your KMS key.
  3. Retrieve the public key associated with your KMS key by using the existing GetPublicKey API action.
  4. Exchange public keys through a trusted means of exchange with the other party. Note that DeriveSharedSecret expects a base64-encoded DER-formatted public key.
  5. Use the other party’s public key as an input, along with your specified KEY_AGREEMENT key. The only key agreement algorithm supported by AWS KMS at launch is ECDH.
  6. The other party should use the public key retrieved from AWS KMS and the private key associated with their generated ECC key pair to derive a shared secret.

The result of the preceding steps is that both parties have the same output without exchanging secret information. Only public keys were exchanged between the two parties. The output of DeriveSharedSecret is the raw shared secret. This shared secret is the multiplication of points on the elliptic curves and can result in many more bytes than are needed for an encryption key. We recommend that customers use a KDF, following the National Institute of Standards and Technology (NIST) SP800-56A Rev. 3 section 5.8 guidance, to derive encryption keys from this shared secret.

For the purposes of this post, we will demonstrate the steps by using the AWS CLI and OpenSSL command line. AWS has incorporated best practices for customers within the AWS Encryption SDK. You can find more details at AWS KMS ECDH keyrings.

Example use case

An example use case where you might wish to use ECDH key agreement is for end-to-end encryption. Although protocols exist that provide a secure framework for secure communications (for example, within AWS Wickr), we will highlight the simplified high-level steps behind some of these protocols. In our example use case, Alice and Bob are both part of a messaging network. This network is managed by a centralized service, and this service must not be able to access Alice or Bob’s unencrypted messages.

Figure 1: High-level architecture for the service described in the example use case

Figure 1: High-level architecture for the service described in the example use case

As shown in Figure 1, Alice and Bob each have an ECC key pair and participate in the secret derivation by using ECDH, through the following steps:

  1. Alice registers her public key in the centralized key storage service. A detailed discussion of the key storage service is beyond the scope of this post.
  2. Bob, an AWS KMS user, calls the AWS KMS GetPublicKey action to obtain the public key for the ECC KMS key pair.
  3. Bob registers his public key in the same centralized key storage service.
  4. Alice, who wants to exchange encrypted messages with Bob, retrieves Bob’s public key from the centralized key storage service.
  5. Bob gets a notification that Alice wants to communicate with him, and he retrieves Alice’s public key from the centralized key storage service.
  6. Using Bob’s public key and her private key, Alice derives a shared secret by using her cryptography provider.
  7. Using Alice’s public key and his private key, Bob derives a shared secret by using DeriveSharedSecret.
  8. Alice and Bob now have an identical shared secret. From this shared secret, she can create a symmetric encryption key by using a suitable KDF. The symmetric encryption key can be used to create ciphertext that can be sent to Bob.

Example use case walkthrough

You can use the following steps to create a KMS key for ECDH use and derive a shared secret by using AWS KMS. For our demonstration purposes, the user Alice (from our example use case) is using OpenSSL as the cryptography tool. We will show how the AWS KMS user Bob and OpenSSL user Alice can derive a shared secret by using each other’s public key.

General prerequisites

You must have the following prerequisites in place in order to implement the solution:

  • AWS CLI — The latest version is recommended. The example here uses aws-cli/2.15.40 and aws-cli/1.32.110.
  • OpenSSL — The example here uses OpenSSL 3.3.0.
  • Both parties (Alice and Bob, from our example use case) have an ECC key on the same curve. The steps in the next section, Key creation prerequisite, explain how these keys can be created.

Key creation prerequisite

Alice and Bob must use the same ECC curve during key creation. The DeriveSharedSecret API action supports curves ECC_NIST_P256, ECC_NIST_P384, and ECC_NIST_P521, which map to P-256, P-384, and P-521 respectively in OpenSSL. The curves that AWS KMS supports are the curves approved by the U.S. National Institute of Standards and Technology (NIST). Additionally, AWS KMS supports the SM2 key spec only in Amazon Web Services China Regions.

Bob creates an asymmetric KMS key for key agreement purposes

Bob creates a key pair in AWS KMS by using the CreateKey API action. In the following example, Bob creates an ECC key pair with ECC_NIST_P256 for the KeySpec parameter and KEY_AGREEMENT for the KeyUsage parameter.

aws kms create-key \
--key-spec ECC_NIST_P256 \
--key-usage KEY_AGREEMENT \
--description "Example ECDH key pair"

The response looks something like this:

{
    "KeyMetadata": {
        "AWSAccountId": "111122223333",
        "KeyId": "a1b2c3d4-5678-90ab-cdef-EXAMPLE11111",
        "Arn": "arn:aws:kms:us-east-1:111122223333:key/a1b2c3d4-5678-90ab-cdef-EXAMPLE11111",
        "CreationDate": "2024-06-25T13:06:24.888000-07:00",
        "Enabled": true,
        "Description": "Example ECDH key pair",
        "KeyUsage": "KEY_AGREEMENT",
        "KeyState": "Enabled",
        "Origin": "AWS_KMS",
        "KeyManager": "CUSTOMER",
        "CustomerMasterKeySpec": "ECC_NIST_P256",
        "KeySpec": "ECC_NIST_P256",
        "KeyAgreementAlgorithms": [
            "ECDH"
        ],
        "MultiRegion": false
    }
}

You can follow the Creating asymmetric KMS keys documentation to see how to use the AWS Management Console to create a KMS key pair with the same properties as shown here. This example creates a KMS key with a default KMS key policy. You should review and configure your key policy according to the principle of least privilege, as appropriate for your environment.

Note: When a KMS key is created, it will be logged by AWS CloudTrail, a service that monitors and records activity within your account. API calls to the AWS KMS service are logged in CloudTrail, which you can use to audit access to KMS keys.

To allow your KMS key to be identified by a human-readable string rather than by the KeyId value, you can create an alias for the KMS key (replace the target-key-id value of a1b2c3d4-5678-90ab-cdef-EXAMPLE11111 with your KeyId value). This makes it easier to use and manage your KMS keys.

Bob creates an alias for his KMS key by using the CLI with the following command:

aws kms create-alias \
    --alias-name alias/example-ecdh-key \
    --target-key-id a1b2c3d4-5678-90ab-cdef-EXAMPLE11111 

Alice creates an ECC key for key agreement purposes by using OpenSSL

Using the ecparam and genkey option of OpenSSL, Alice creates a P-256 ECC key. The P-256 curve is represented by AWS KMS as ECC_NIST_P256.

Note: For ECDH to work, the curve of the OpenSSL ECC key must be same as the ECC KMS key created by the other party (Bob, in our example use case).

openssl ecparam -name P-256 \
        -genkey -out openssl_ecc_private_key.pem

Key exchange and secret derivation process

The following sections outline the steps that Alice and Bob will follow to share their public keys, retrieve one another’s public key, and then derive the same shared secret using AWS KMS and OpenSSL. The shared secrets derived by Alice and Bob respectively are then compared to show that they both derived the same shared secret.

Step 1: Alice generates and registers her OpenSSL public key with a central service

AWS KMS expects the public key in DER format. Therefore, in this example Alice creates a DER-format public key by using her ECC private key. Alice runs the following command to produce a DER-format file that contains her public key:

openssl ec -in openssl_ecc_private_key.pem \
        -pubout -outform DER \
        > openssl_ecc_public_key.bin.der

The file openssl_ecc_public_key.bin.der will have the public key in DER format, which Alice can store in the centralized key storage service (or send to anyone she would like to communicate with). Details about the centralized key storage service are beyond the scope of this post.

Step 2: Bob obtains the public key for his ECC KMS Key

To retrieve a copy of the public key for his ECC KMS key, Bob uses the GetPublicKey API action. Bob calls this API by using the AWS CLI command get-public-key, as follows:

aws kms get-public-key \
    --key-id alias/example-ecdh-key \
    --output text \
    --query PublicKey | base64 --decode > kms_ecdh_public_key.der

The returned PublicKey value is a DER-encoded X.509 public key. Because the AWS CLI is being used, the public key output is base64-encoded for readability purposes. This base64-encoded value is decoded by using the base64 command, and the decoded value is stored in the output file. The file kms_ecdh_public_key.der contains the DER-encoded public key.

Note: If you call this API by using one of the AWS SDKs, such as Boto3, then the returned PublicKey value is not base64-encoded.

In our example use case, Alice is using OpenSSL, which expects the public key in PEM format. Bob converts his DER-format public key into PEM format by using the following command:

openssl ec -pubin -inform DER -outform PEM \
        -in kms_ecdh_public_key.der \
        -out kms_ecdh_public_key.pem

The file kms_ecdh_public_key.pem contains the public key in PEM format.

Step 3: Bob registers his public key with the centralized key storage service

Bob saves his public key in PEM format, obtained in Step 2, in the centralized key storage service.

Step 4: Alice retrieves Bob’s public key to derive a shared secret

To perform ECDH key agreement, the two parties involved (Alice and Bob, in our example use case) need to exchange their public key with each other. Alice, who wants to send encrypted messages to Bob, retrieves Bob’s public key from the centralized key storage service.

Bob’s public key, kms_ecdh_public_key.pem, is already in PEM format as expected by OpenSSL.

Step 5: Bob retrieves Alice’s public key to derive a shared secret

To perform ECDH key agreement, the two parties involved, Alice and Bob, need to exchange their public key with each other. Bob gets a notification that Alice wants to communicate with him, and he retrieves Alice’s public key from the centralized key storage service.

Alice’s public key, openssl_ecc_public_key.bin.der, is already in DER format as expected by AWS KMS.

Step 6: Alice uses OpenSSL to derive the shared secret

Alice, using her private key and Bob’s public key, can derive the shared secret by using OpenSSL. Alice derives the shared secret by using the OpenSSL pkeyutl command with the derive option, as follows:

openssl pkeyutl -derive \
-inkey openssl_ecc_private_key.pem \
-peerkey kms_ecdh_public_key.pem > openssl.ss

The file openssl.ss will have the shared secret in binary format.

Step 7: Bob uses AWS KMS to derive the shared secret

Bob, using his private key (which remains securely within AWS KMS) and Alice’s public key, can derive the shared secret by using AWS KMS. The following example shows how Bob uses the DeriveSharedSecret API action with the AWS CLI command derive-shared-secret. At launch, the only supported key agreement algorithm is ECDH. Bob passes Alice’s public key for the PublicKey parameter.

aws kms derive-shared-secret \
--key-id alias/example-ecdh-key \
--public-key fileb://path/to/openssl_ecc_public_key.bin.der \
--key-agreement-algorithm ECDH \
--output text --query SharedSecret |base64 --decode > kms.ss

Because the AWS CLI is being used, the returned SharedSecret value is base64-encoded for readability purposes. Using the base64 --decode command, the decoded binary format is stored to the file.

Note: If you call this API by using one of the AWS SDKs, such as Boto3, then the returned SharedSecret value is not base64-encoded.

The file kms.ss will have the shared secret in binary format.

Step 8: Using the shared secret and a suitable KDF, Alice derives an encryption key to encrypt her communication to Bob

You can use the following command to compare the two files containing the derived shared secrets that were obtained in Steps 6 and 7 and verify that they are identical:

diff -qs openssl.ss kms.ss

Because these files are identical, we can see that the same secret was derived using both AWS KMS and OpenSSL.

Using the shared secret, Alice should then derive a symmetric encryption key by using a suitable KDF. She can use this symmetric encryption key to encrypt data and send the ciphertext to Bob.

This blog post does not cover the steps to derive that symmetric encryption key, because that can be a complex topic depending on your use case. However, we note that you should not use the raw shared secret as an encryption key because it is not uniform. In other words, the shared secret has a lot of entropy, but the byte string itself is not random.

NIST recommends that you use a KDF function over the raw shared secret (value Z as described in section 5.8 of NIST SP800-56A Rev. 3). The KDFs that are recommended are described in more detail in NIST SP800-56C Rev. 2. One such example is OpenSSL Single Step KDF (SSKDF) EVP_KDF-SS, but using this KDF involves choosing the other values, such as FixedInfo, carefully.

To help customers make the right choice for the resulting KDF to use on the shared secret, the AWS Encryption SDK now includes AWS KMS ECDH keyrings. The keyring is a construct within the AWS Encryption SDK that you implement within your code. The keyring handles the management of encryption keys while applying best practices to protect your data. You can use the keyring to reference your KMS keys for key agreement, and then call a function to encrypt data. Data will be encrypted by using a derived shared wrapping key following NIST recommendations, and the Encryption SDK applies key commitment to the ciphertext.

Summary

In this blog post, we highlighted how you can use the recently launched DeriveSharedSecret API action to securely derive a shared secret. You’ve seen how ECDH can be used between two parties without having to share secret information across untrusted networks. We explained how you can audit your AWS KMS key usage through AWS CloudTrail logs. We highlighted that you would need to use a KDF to generate a symmetric encryption key from the shared secret. We strongly recommend that you use the AWS Encryption SDK to encrypt your data, which helps make sure that the recommended NIST key derivation functions are used for generating symmetric encryption keys.

 
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Patrick Palmer

Patrick Palmer
Patrick is a Principal Security Specialist Solutions Architect at AWS. He helps customers around the world use AWS services in a secure manner and specializes in cryptography. When not working, he enjoys spending time with his growing family and playing video games.

Raj Puttaiah

Raj Puttaiah
Raj is a Software Development Manager for AWS KMS. Raj leads the development of AWS KMS features, focusing on operational excellence. When not working, Raj spends time with his family hiking the beautiful Washington outdoors, and accompanying his two sons to their activities.

Michael Miller

Michael Miller
Michael is a Senior Solutions Architect at AWS, based in Ireland. He helps public sector customers across the UK and Ireland accelerate their cloud adoption journey and specializes in security and networking. In prior roles, Michael has been responsible for designing architectures and supporting implementations across various sectors including service providers, consultancies, and financial services organizations.

Using Amazon GuardDuty Malware Protection to scan uploads to Amazon S3

Post Syndicated from Luke Notley original https://aws.amazon.com/blogs/security/using-amazon-guardduty-malware-protection-to-scan-uploads-to-amazon-s3/

Amazon Simple Storage Service (Amazon S3) is a widely used object storage service known for its scalability, availability, durability, security, and performance. When sharing data between organizations, customers need to treat incoming data as untrusted and assess it for malicious files before ingesting it into their downstream processes. This traditionally requires setting up secure staging buckets, deploying third-party anti-virus and anti-malware scanning software, and managing a complex data pipeline and processing architecture.

To address the need for malware protection in Amazon S3, Amazon Web Services (AWS) has launched Amazon GuardDuty Malware Protection for Amazon S3. This new feature provides malicious object scanning for objects uploaded to S3 buckets, using multiple AWS-developed and industry-leading third-party malware scanning engines. It eliminates the need for customers to manage their own isolated data pipelines, compute infrastructure, and anti-virus software across accounts and AWS Regions, providing malware detection without compromising the scale, latency, and resiliency of S3 usage.

In this blog post, we share a solution that uses Amazon EventBridge, AWS Lambda, and Amazon S3 to copy scanned S3 objects to a destination S3 bucket. EventBridge is a serverless event bus that you can use to build event-driven architectures and automate your business workflows. In this solution, we allow events to be invoked from an object that is being placed in an S3 bucket. The events can be processed by a serverless function in Lambda to invoke a malware scan. We then show you how to extend this solution for other use cases specific to your organization.

Feature overview

GuardDuty Malware Protection for Amazon S3 provides a malware and anti-virus detection service for new objects uploaded to an S3 bucket. Malware Protection for S3 is enabled from within the AWS Management Console for GuardDuty and GuardDuty threat detection is not required to be enabled to use this feature. If GuardDuty threat detection is enabled, security findings for detected malware are also sent to GuardDuty. This allows customer development or application teams and security teams to work together and oversee malware protection for S3 buckets throughout the organization.

When your AWS account has GuardDuty enabled in an AWS Region, your account is associated to a unique regional entity called a detector ID. All findings that GuardDuty generates and API operations that are performed are associated with this detector ID. If you don’t want to use GuardDuty with your AWS account, Malware Protection for S3 is available as an independent feature. Used independently, Malware Protection for S3 will not create an associated detector ID.

When a malware scan identifies a potentially malicious object and you don’t have a detector ID, no GuardDuty finding will be generated in your AWS account. GuardDuty will publish the malware scan results to your default EventBridge event bus and metrics to an Amazon CloudWatch namespace for you to use for automating additional tasks.

GuardDuty manages error handling and reprocessing of event creation and publication as needed to make sure that each object is properly evaluated before being accessed by downstream resources. GuardDuty supports configuring Amazon S3 object tagging actions to be performed throughout the process.

Figure 1 shows the high-level overview of the S3 object scanning process.

Figure 1: S3 object scanning process

Figure 1: S3 object scanning process

The object scanning process is the following:

  1. An object is uploaded to an S3 bucket that has been configured for malware detection. If the object is uploaded as a multi-part upload, then a new object notification will be generated on completion of the upload.
  2. The malware scan service receives a notification that a new object has been detected in the bucket.
  3. The malware scan service downloads the object by using AWS PrivateLink. This will be automatically created when malware detection is enabled on an S3 bucket. No additional configuration is required.
  4. The malware detection service then reads, decrypts, and scans this object in an isolated VPC with no internet access within the GuardDuty service account. Encryption at rest is used for customer data that is scanned during this process. After the malware detection scan is complete, the object is deleted from the malware scanning environment.
  5. The malware scan result event is sent to the EventBridge default event bus in your AWS account and Region where malware detection has been enabled. When malware is detected, an EventBridge notification is generated that includes details of which S3 object was flagged as malicious and supporting information such as the malware variant and known use cases for the malicious software.
  6. Scan metrics such as number of objects scanned and bytes scanned are sent to Amazon CloudWatch.
  7. If malware is detected, the service sends a finding to the GuardDuty detector ID in the current Region.
  8. If you have configured object tagging, GuardDuty adds a predefined tag with key GuardDutyMalwareScanStatus and a potential scan result value of your scanned S3 object.

IAM permissions

Enabling and using GuardDuty Malware Protection for S3 requires you to add AWS Identity and Access Manager (IAM) role permissions and a specific trust policy for GuardDuty to perform the malware scan on your behalf. GuardDuty provides you flexibility to enable this feature for your entire bucket, or limit the scope of the malware scan to specific object prefixes where GuardDuty scans each uploaded object that starts with up to five selected prefixes.

To allow GuardDuty Malware Protection for S3 to scan and add tags to your S3 objects, you need an IAM role that includes permissions to perform the following tasks:

  1. A trust policy to allow Malware Protection to assume the IAM role.
  2. Allow EventBridge actions to create and manage the EventBridge managed rule to allow Malware Protection for S3 to listen to your S3 object notifications.
  3. Allow Amazon S3 and EventBridge actions to send notification to EventBridge for events in the S3 bucket.
  4. Allow Amazon S3 actions to access the uploaded S3 object and add a predefined tag GuardDutyMalwareScanStatus to the scanned S3 object.
  5. If you’re encrypting S3 buckets with AWS Key Management System (AWS KMS) keys, you must allow AWS KMS key actions to access the object before scanning and putting a test object in S3 buckets with the supported encryption.

This IAM policy is required each time you enable Malware Protection for S3 for a new bucket in your account. Alternatively, update an existing IAM PassRole policy to include the details of another S3 bucket resource each time you enable Malware Protection. See the AWS documentation for example policies and permissions required.

S3 object tagging and access control

When you enable S3 object tagging, GuardDuty adds a predefined tag with key GuardDutyMalwareScanStatus and a potential scan result value of your scanned S3 object. These tags enable the implementation of a tag-based access control (TBAC) policy for the objects, halting access to an S3 object until a malware scan has been completed.

The example S3 bucket policy in the AWS GuardDuty user guide stops anyone other than the GuardDuty Malware scan service principal from reading objects from the specific S3 bucket that aren’t tagged GuardDutyMalwareScanStatus with a value NO_THREATS_FOUND. The policy also helps prevent other roles or users other than GuardDuty from adding the GuardDutyMalwareScanStatus tag.

Configure optional access for other IAM roles that are allowed to override the GuardDutyMalwareScanStatus tag after an object is tagged. Achieve this by replacing <IAM-role-name> in the following example S3 bucket policy.

{
            "Sid": "OnlyGuardDutyCanTag",
            "Effect": "Deny",
            "NotPrincipal": {
                "AWS": [
                    "arn:aws:iam::555555555555:root",
                    "arn:aws:iam::555555555555:role/<IAM-role-name>",
                    "arn:aws:iam::555555555555:assumed-role/<IAM-role-name>/GuardDutyMalwareProtection"
                ]
            },

Change the policy if you are required to allow certain principals or roles to read failed or skipped objects. You can permit a special role to read the malicious object if needed as part of your existing incident response process. Do this by adding an additional statement into the S3 bucket policy and replacing the <IAM-role-name>value in the following example.

{
            "Sid": "AllowSecurityTeamReadMalicious",
            "Effect": "Deny",
            "NotPrincipal": {
                "AWS": [
                    "arn:aws:iam::555555555555:role/<IAM-role-name>"
                ]
            },
            "Action": [
                "s3:GetObject",
                "s3:GetObjectVersion"
            ],
            "Resource": [
                "arn:aws:s3:::DOC-EXAMPLE-BUCKET",
                "arn:aws:s3:::DOC-EXAMPLE-BUCKET/*"
            ],
            "Condition": {
                "StringNotEquals": {
                    "s3:ExistingObjectTag/GuardDutyMalwareScanStatus": "THREATS_FOUND"
                }
            }
        },

Solution overview

This solution is designed to streamline the deployment of GuardDuty Malware Protection for S3, helping you to maintain a secure and reliable S3 storage environment while minimizing the risk of malware infections and their potential consequences. The solution provides several configuration options, allowing you to create a new S3 bucket or use an existing one, enable encryption with a new or existing AWS KMS key, and optionally set up a function to copy objects with a defined tag to a destination S3 bucket. The copy function feature offers an additional layer of protection by separating potentially malicious files from clean ones, allowing you to maintain a separate repository of safe data for continued business operations or further analysis.

Figure 2 shows the solution architecture.

Figure 2: Amazon GuardDuty copy S3 object solution overview

Figure 2: Amazon GuardDuty copy S3 object solution overview

The high-level workflow of the solution is as follows:

  1. An object is uploaded to an S3 bucket that has been configured for malware detection.
  2. The malware scan service receives a notification that a new object has been detected in the bucket and then GuardDuty reads, decrypts, and scans the object in an isolated environment.
  3. An EventBridge rule is configured to listen for events that match the pattern of completed scans for the monitored bucket that have a scan result of NO_THREATS_FOUND.
  4. When the matched event pattern occurs, the copy object Lambda function is invoked.
  5. The Lambda copy object function copies the object from the monitored S3 bucket to the target bucket.

In this solution, you will use the follow AWS services and features:

  • Event tracking: This solution uses an EventBridge rule to listen for completed malware scan result events for a specific S3 bucket, which has been enabled for malware scanning. When the EventBridge rule finds a matched event, the rule passes the required parameters and invokes the Lambda function required to copy the S3 object from the source malware protected bucket to a destination clean bucket. The event pattern used in this solution uses the following format:
    {
      "source": ["aws.guardduty"],
      "detail-type": ["GuardDuty Malware Protection Object Scan Result"],
      "detail": {
        "scanStatus": ["COMPLETED"],
        "resourceType": ["S3_OBJECT"],
        "s3ObjectDetails": {
          "bucketName": ["<DOC-EXAMPLE-BUCKET-111122223333>"]
        },
        "scanResultDetails": {
          "scanResultStatus": ["NO_THREATS_FOUND"]
        }
      }
    }

    Note: Replace the value of the bucketName attribute with the bucket in your account.

  • Task orchestration: A Lambda function handles the logic for copying the S3 object from the source bucket to the destination bucket which has just been scanned by GuardDuty. If the object was created within a new S3 prefix, the prefix and the object will be copied. If the object was tagged by GuardDuty, then the object tag will be copied.

Deploy the solution

The solution CloudFormation template provides you with multiple deployment scenarios so you can choose which best applies to your use case.

Deploy the CloudFormation template

For this next step, make sure that you deploy the CloudFormation template provided in the AWS account and Region where you want to test this solution.

To deploy the CloudFormation template

  1. Choose the Launch Stack button to launch a CloudFormation stack in your account. Note that the stack will launch in the N. Virginia (us-east-1) Region. To deploy this solution in other Regions, download the solution’s CloudFormation template, modify it, and deploy it to the selected Regions.

    Launch Stack

  2. Choose the appropriate scenario and complete the parameters information questions as shown in Figure 3.

    Figure 3: CloudFormation template parameters

    Figure 3: CloudFormation template parameters

    Each of the following scenarios and their parameter information (from Figure 3) can be evaluated to make sure that the CloudFormation template deploys successfully:

    Deployment scenario

    • Create a new bucket or use an existing bucket?
    • If ”new”, should a KMS key be created for the new bucket?
    • Would you like to create the copy function to a destination bucket? Create the Lambda copy function from the protected bucket to the clean bucket.

    Post scan file copy function

    • This will be used as the basis for the copy function and EventBridge rule to invoke the function: Copy files to the clean bucket with either the THREATS or NO_THREATS_FOUND tagged value.

    Existing S3 bucket configuration – not used for new S3 buckets

    • Enter the bucket name that you would like to be your scanned bucket: Enter the existing S3 bucket name that will be enabled for GuardDuty Malware Protection for S3.
    • Enter the bucket name that you would like to be your scanned bucket: Enter the S3 bucket name to be used as the copy destination for S3 objects.
    • Is the existing bucket using a KMS key? Is the existing S3 bucket encrypted with an existing KMS key?
    • ARN of the existing KMS key to be used: Provide the existing KMS key Amazon Resource Name (ARN) to be used for KMS encryption. IAM policies will be configured for this KMS key name.
    • Lambda Copy Function clean bucket: Create a new S3 bucket with the Lambda copy function from the protected bucket to the clean bucket.
  3. Review the stack name and the parameters for the template.
  4. On the Quick create stack screen, scroll to the bottom and select I acknowledge that AWS CloudFormation will create IAM resources.
  5. Choose Create stack. The deployment of the CloudFormation stack will take 3–4 minutes.

After the CloudFormation stack has deployed successfully, the solution will be available for use in the same Region where you deployed the CloudFormation stack. The solution deploys a specific Lambda function and EventBridge rule to match the name of the source S3 bucket.

Deploy the AWS CDK template

Alternatively if you prefer to use AWS CDK, download the CDK code from the GitHub repository.

Follow the readme contained within the repository to deploy the solution or individual components depending your requirements.

Extend the solution

In this section, you’ll find options for extending the solution.

Copy alternative status results

The solution can be extended to copy S3 objects with a scan result status that you define. To change the scan result used to invoke the copy function, update the scanresultstatus in the event pattern defined in EventBridge rule created as part of the solution named S3Malware-CopyS3Object-<DOC-EXAMPLE-BUCKET-111122223333>.

"scanResultDetails": {
      "scanResultStatus": ["<scan result status>"]

Delete source S3 objects

To delete the object from the source after the copy was successful, you will need to update the Lambda function code and the IAM role used by the Lambda function.

The IAM role used by the Lambda function requires a new statement added to the existing role. The JSON formatted statement is provided in the following example.

{
			"Action": [
				"s3:DeleteObject"
					],
			"Resource": [
				"arn:aws:s3:::<DOC-EXAMPLE-BUCKET-111122223333>/*"
			],
			"Effect": "Allow",
			"Sid": "AllowDeleteObjectSourceBucket"
		},

The copy Lambda function requires the following lines to be added at the end of the function code to delete the object:

s3.delete_object(Bucket=SOURCE_BUCKET,Key=SOURCE_KEY)

Scan existing S3 objects

When GuardDuty Malware Protection for S3 is enabled, it scans only new objects put into the bucket. To scan existing objects in a S3 bucket for malware, set up bucket replication to replicate all objects from a source bucket to a destination bucket with Malware Protection enabled.

Automate tagged object deletion

To remove malicious objects from the S3 bucket to help prevent accidental download or access, implement a tag-based lifecycle rule to delete the object after a specific number of days. To achieve this follow the steps in Setting a lifecycle configuration on a bucket to configure a lifecycle rule and make sure the tag key is GuardDutyMalwareScanStatus and value is THREATS_FOUND.

Figure 4: Tag based S3 lifecycle rule

Figure 4: Tag based S3 lifecycle rule

Align the lifecycle policy with your organization’s current S3 object malware investigation procedures. Deleting objects prematurely might hinder security teams’ ability to analyze potentially malicious content. When using bucket versioning instead of permanently deleting the object, Amazon S3 inserts a delete marker that becomes the current version of the object.

AWS Transfer Family integration

If you’re using the AWS Transfer Family service with Secure File Transfer Protocol (SFTP) connector for S3, it’s recommended to scan external uploads for malware before using the received files. This helps ensure the security and integrity of data transferred into your S3 buckets using SFTP.

Figure 5: AWS Transfer Family S3 workflow

Figure 5: AWS Transfer Family S3 workflow

To implement malware scanning, configure a file processing workflow configuration to copy the uploaded objects into an S3 bucket that has GuardDuty Malware Protection for S3 enabled.

Figure 6: Transfer Family configuration workflow

Figure 6: Transfer Family configuration workflow

Summary

Amazon GuardDuty Malware Protection for S3 is now available to assess untrusted objects for malicious files before being ingested by downstream processes within your organization. Customers can automatically scan their S3 objects for malware and take appropriate actions, such as quarantining or remediating infected files. This proactive approach helps mitigate the risks associated with malware infections, data breaches, and potential financial losses. The solution provided offers an additional layer of protection by separating potentially malicious files from clean ones, allowing customers to maintain a separate repository of safe data for continued business operations or further analysis. Visit the 2024 re:Inforce session or the what’s new blog post to understand additional service details.

 
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Luke Notley

Luke Notley
Luke is a Senior Solutions Architect with Amazon Web Services and is based in Western Australia. Luke has a passion for helping customers connect business outcomes with technology and assisting customers throughout their cloud journey, helping them design scalable, flexible, and resilient architectures. In his spare time, he enjoys traveling, coaching basketball teams, and DJing.

Arran Peterson

Arran Peterson
Arran, a Solutions Architect based in Adelaide, South Australia, collaborates closely with customers to deeply understand their distinct business needs and goals. His role extends to assisting customers in recognizing both the opportunities and risks linked to their decisions related to cloud solutions.

How to centrally manage secrets with AWS Secrets Manager

Post Syndicated from Shagun Beniwal original https://aws.amazon.com/blogs/security/how-to-centrally-manage-secrets-with-aws-secrets-manager/

In today’s digital landscape, managing secrets, such as passwords, API keys, tokens, and other credentials, has become a critical task for organizations. For some Amazon Web Services (AWS) customers, centralized management of secrets can be a robust and efficient solution to address this challenge. In this post, we delve into using AWS data protection services such as AWS Secrets Manager and AWS Key Management Service (AWS KMS) to help make secrets management easier in your environment by centrally managing them from a designated AWS account.

Centralized secrets management involves the consolidation of sensitive information into a single, secure repository. This repository acts as a centralized vault where secrets are stored, accessed, and managed with strict security controls. Centralizing secrets can help organizations enforce uniform security policies, streamline access control, and mitigate the risk of unauthorized access or leakage.

This approach offers several key benefits. First, it can enhance security by reducing the threat surface and providing a single point of control for managing access to sensitive information. Additionally, centralized secrets management can facilitate compliance with regulatory requirements by enforcing strict access controls and audit trails.

Furthermore, centralization promotes efficiency and scalability by enabling automated workflows for secret rotation, provisioning, and revocation. This automation reduces administrative tasks and minimizes the risk of human error, enhancing overall operational excellence.

Overview

In this post, we’ll walk you through how to set up a centralized account for managing your secrets and their lifecycle by using AWS Lambda rotation functions. Furthermore, to facilitate efficient access and management across multiple member accounts, we’ll discuss how to establish tunnelling through VPC peering to enable seamless communication between the Centralized Security Account in this architecture and the associated member accounts.

Notably, applications within the member accounts will directly access the secrets stored in the Centralized Security Account through the use of resource policies, streamlining the retrieval process. Additionally, using AWS provided DNS within the Centralized Security Account’s virtual private cloud (VPC) will automate the resolution of database host addresses to their respective control plane IP addresses. This functionality allows AWS Lambda function traffic to efficiently traverse the peering connection, enhancing overall system performance and reliability.

Figure 1 shows the solution architecture. The architecture has four accounts that are managed through AWS Organizations. Out of these four accounts, there are three workload accounts designated as Account A, Account B, and Account C that host the application and database for serving user requests, and a Centralized Security Account from which the secrets will be maintained and managed. VPC 1 from every workload account (Account A, Account B, and Account C) is peered with VPC 1 (part of the Centralized Security Account) to allow communication between workload accounts and the secrets management account. For high availability, secrets are also replicated to a different AWS Region.

Figure 1: Sample solution architecture for centrally managing secrets

Figure 1: Sample solution architecture for centrally managing secrets

Deploy the solution

Follow the steps in this section to deploy the solution.

Step 1: Create secrets, including database secrets, in your Centralized Security Account

First, create the secrets you want to use for this walkthrough. For example, the database secrets will have a following parameters:

{
    "engine": " sql”,
    "username": " admin ",
    "password": "EXAMPLE-PASSWORD",
    "host": "<cross account DB host URL>",
    "dbInstanceIdentifier": "<cross account DB instance identifier>"
    "port": "3306"
}

To create a database secret (console)

  1. Open the AWS Secrets Manager console in the Centralized Security Account.
  2. Choose Store a new secret.
  3. Choose Credentials for other database and provide the user name and password.

    Figure 2: Create and store a new secret using Secrets Manager

    Figure 2: Create and store a new secret using Secrets Manager

  4. For Encryption key, use the instructions in the AWS KMS documentation to create and choose the AWS KMS key that you want Secrets Manager to use to encrypt the secret value. Because you need to access the secret from another AWS account, make sure you are using an AWS KMS customer managed key (CMK).

    Important: Make sure that you do NOT use aws/secretsmanager, because it is an AWS managed key for Secrets Manager and you cannot modify the key policy.

    Figure 3: Select the encryption key to encrypt the secret created

    Figure 3: Select the encryption key to encrypt the secret created

    AWS Secrets Manager makes it possible for you to replicate secrets across multiple AWS Regions to provide regional access and low-latency requirements. If you turn on rotation for your primary secret, Secrets Manager rotates the secret in the primary Region, and the new secret value propagates to the associated Regions. Rotation of replicated secrets does not have to be individually managed.

    Note: When replicating a secret in Secrets Manager, you have the option to choose between using a multi-Region key (MRK) or an independent KMS key in the Region where the secrets are replicated. Your choice depends on your specific requirements such as operational preferences, regulatory compliance, and ease of management.

  5. For Database, select the database from the list of supported database types displayed and provide the host URL in the server address field, the database name, and the port number. Choose Next.

    Figure 4: Selecting the database and providing the database details

    Figure 4: Selecting the database and providing the database details

  6. For Configure secret, provide a secret name (for example, PostgresAppUser) and optionally add a description and tags. The resource permissions required to access the secret from across accounts will be explained later in this post.

    (Optional) Under Replicate secret, select other Regions and customer managed KMS keys from respective Regions to replicate this secret for high availability purposes, and then choose Next.

  7. The next screen will ask you to configure automatic rotation, but you can skip this step for now because you will create the rotation Lambda function in Step 2. Choose Next and then Store to finish saving the secret.

    Note: Secrets Manager rotation uses a Lambda function to update the secret and the database or service. After the secret is created, you must create a rotation Lambda function separately and attach it to the secret for rotating it. This detailed process is covered in the following steps.

Step 2: Deploy the rotation Lambda function where needed

For secrets that require automatic rotation to be turned on, deploy the rotation Lambda function from the serverless application list.

To deploy the rotation Lambda function

  1. In the Centralized Security Account, open the AWS Lambda console.
  2. In the left navigation menu, choose Applications, and then choose Create application.
  3. Choose Serverless Application and then choose the Public Applications tab.
  4. Make sure you have selected the checkbox for Show apps that create custom IAM roles or resource policies.

    Figure 5: Create a rotation Lambda function in the centralized security account for secret rotation

    Figure 5: Create a rotation Lambda function in the centralized security account for secret rotation

  5. In the search field under Serverless application, search for SecretsManager, and the available functions for rotation will be displayed. Choose the Lambda function based on your DB engine type. For example, if the DB engine type is Postgres SQL, select SecretsManagerRDSPostgreSQLRotationSingleUser from the list by choosing the application name.

    Figure 6: Choosing the AWS provided PostgreSQL rotation function (optionally you may choose a different rotation Lambda function)

    Figure 6: Choosing the AWS provided PostgreSQL rotation function (optionally you may choose a different rotation Lambda function)

  6. On the next page, under Application settings, provide the requested details for the following settings:
    1. functionName (for example, PostgresDBUserRotationLambda)
    2. endpoint – For the SecretsManagerRDSPostgreSQLRotationSingleUser option, in the endpoint field, add https://secretsmanager.us-east-1.amazonaws.com. (Choose the Secrets Manager service endpoint based on the Region where the rotation Lambda is created.)
    3. kmsKeyArn – Used by the secret for encryption.
    4. vpcSecurityGroupIds Provide the security group ID for the rotation Lambda function. Under the outbound rules tab of the security group attached to the rotation Lambda, add the required rules for the Lambda function to communicate with the Secrets Manager service endpoint and database. Also, make sure that the security groups attached to your database or service allow inbound connections from the Lambda rotation function.
    5. vpcSubnetIds – When you provide vpcSubnetIDs, provide subnets of a VPC from the Centralized Security Account where you are planning to deploy your rotation Lambda functions.

    Figure 7: Set up rotation Lambda configuration

    Figure 7: Set up rotation Lambda configuration

  7. Select the checkbox next to I acknowledge that this app creates custom IAM roles and resource policies, and then choose Deploy. This will create the required Lambda function to rotate your secret.
  8. Navigate to the Secrets Manager console and edit the secret to turn on automatic rotation (for instructions, see the Secrets Manager documentation).

    Figure 8: Editing the rotation in the Secrets Manager console

    Figure 8: Editing the rotation in the Secrets Manager console

    Set a rotation schedule according to your organization’s data security strategy.

  9. For Lambda rotation function, select the new Lambda function PostgresDbUserRotationLambda that you created in the previous step to associate it with the secret.

    Figure 9: The rotation configuration settings in the Secrets Manager console

    Figure 9: The rotation configuration settings in the Secrets Manager console

Step 3: Set up networking for Lambda to reach the Secrets Manager service endpoint

To provide connectivity to the Lambda function, you can either deploy a VPC endpoint with Private DNS enabled or a NAT gateway.

Deploy a VPC endpoint with Private DNS enabled

To create an Amazon VPC endpoint for AWS Secrets Manager (recommended)

  1. Open the Amazon VPC console, choose Endpoints, and then choose Create endpoint.
  2. For Service category, select AWS services. In the Service Name list, select the Secrets Manager endpoint service named com.amazonaws.<Region>.secretsmanager.

    Figure 10: Create a VPC endpoint for Secrets Manager

    Figure 10: Create a VPC endpoint for Secrets Manager

  3. For VPC, specify the VPC you want to create the endpoint in. This should be the VPC that you selected for hosting centralized secret rotation using the AWS Lambda function.
  4. To create a VPC endpoint, you need to specify the private IP address range in which the endpoint will be accessible. To do this, select the subnet for each Availability Zone (AZ). This restricts the VPC endpoint to the private IP address range specific to each AZ and also creates an AZ-specific VPC endpoint. Specifying more than one subnet-AZ combination helps improve fault tolerance and make the endpoint accessible from a different AZ in case of an AZ failure.
  5. Select the Enable DNS name checkbox for the VPC endpoint. Private DNS resolves the standard Secrets Manager DNS hostname https://secretsmanager.<Region>.amazonaws.com. to the private IP addresses associated with the VPC endpoint specific DNS hostname.

    Figure 11: Set up VPC endpoint configurations

    Figure 11: Set up VPC endpoint configurations

  6. Associate a security group with this endpoint (for instructions, see the AWS PrivateLink documentation). The security group enables you to control the traffic to the endpoint from resources in your VPC. The attached security group should accept inbound connections from the Lambda function for rotation on port 443.

    Figure 12: Attaching the security group to the VPC endpoint

    Figure 12: Attaching the security group to the VPC endpoint

Create a NAT gateway

Alternatively, you can give your function internet access. Place the function in private subnets and route the outbound traffic to a NAT gateway in a public subnet. The NAT gateway has a public IP address and connects to the internet through the VPC’s internet gateway. To create a NAT gateway, follow the steps described in this AWS re:post article.

Step 4: Deploy VPC peering

Next, deploy VPC peering between the Centralized Security Account and the member accounts that hold the database.

To deploy VPC peering

  1. Open the Amazon VPC console in the Centralized Security Account.
  2. In the left navigation pane, choose Peering connections, and then choose Create peering connection.
  3. Configure the following information, and choose Create peering connection when you are done:
    1. Name – You can optionally name your VPC peering connection, for example central_secret_management_vpc_peer.
    2. VPC ID (Requester) – Select the centralized secret management AWS Lambda VPC in your account with which you want to create the VPC peering connection.
    3. Account – Choose Another account.
    4. Account ID – Enter the ID of the AWS account that owns the database.

      Figure 13: Create VPC peering connection

      Figure 13: Create VPC peering connection

    5. VPC ID (Accepter) – Enter the ID of the database VPC with which to create the VPC peering connection.

      Figure 14: Create VPC peering connection – Entering the VPC ID

      Figure 14: Create VPC peering connection – Entering the VPC ID

  4. From the database account, navigate to the Amazon VPC console. Choose Peering connections and then choose Accept request.

    Figure 15: Accepting the VPC peering request from the database account (Accounts A, B, and C)

    Figure 15: Accepting the VPC peering request from the database account (Accounts A, B, and C)

  5. Add a route to the route tables in both VPCs so that you can send and receive traffic across the peering connection. Each table has a local route and a route that sends traffic for the peer VPC to the VPC peering connection.

    Figure 16: Sample table to show VPC peering connections between the Centralized Security Account and application/database accounts

    Figure 16: Sample table to show VPC peering connections between the Centralized Security Account and application/database accounts

  6. Perform the following steps in the Centralized Security Account:
    1. Open the Amazon VPC console in the Centralized Security Account.
    2. Select the Centralized Security Account Lambda VPC. Under Details, choose Main route table.
    3. Choose Edit routes, and then choose Add routes. Under Destination, add the database VPC CIDR (172.31.0.0/16) in an empty field. Under Target, select the peering connection you created in Step 3.
  7. Perform the following steps in Account 2, where the application/database is hosted:
    1. Open the VPC console in the database account.
    2. Select the Centralized Security Account Lambda VPC and then, under Details, choose Main route table.
    3. Choose Edit routes, and then choose Add routes. Under Destination, add the rotation Lambda VPC CIDR (10.0.0.0/16) in an empty field. Under Target, select the peering connection you created in Step 3.

Step 5: Set up resource-based policies on each secret

After the secrets are deployed into the Centralized Security Account, to allow application roles or users in other accounts to access the secrets (known as cross-account access), you must allow access in both a resource policy and in an identity policy. This is different than granting access to identities in the same account rather than the secret.

To set up resource-based policies on each secret

  1. Attach a resource policy to the secret in the Centralized Security Account by using the following steps:
    1. Open the Secrets Manager console. Remember to choose the Region that is appropriate for your setup.
    2. From the list of secrets, choose your secret.
    3. On the Secret details page, choose the Overview tab.
    4. Under Resource permissions, choose Edit permissions.
    5. In the Code field, attach or append the following resource policy statement, and then choose Save:
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "AWS": "arn:aws:iam::<account2-id>:role/ApplicationRole"
          },
          "Action": "secretsmanager:GetSecretValue",
          "Resource": "<ARN of secret to which this policy is attached>"
        }
      ]
    }

  2. Add the following resource policy statement to the key policy for the KMS key in the Centralized Security Account.
    {
          "Effect": "Allow",
          "Principal": {
            "AWS": "arn:aws:iam::<account2-id>:role/ApplicationRole"
          },
          "Action": [
            "kms:Decrypt",
            "kms:DescribeKey"
          ],
          "Resource": "<kms-key-resource-arn>"
        }

    If there exists no policy on the key, add the following policy to the key.

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "AWS": "arn:aws:iam::<account2-id>:role/ApplicationRole"
          },
          "Action": [
            "kms:Decrypt",
            "kms:DescribeKey"
          ],
          "Resource": "<kms-key-resource-arn>"
        }
      ]
    }

  3. Attach an identity policy to the identity in the accounts where you hosted your applications to provide access to the secret and the KMS key used to encrypt the secret.
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": "secretsmanager:GetSecretValue",
          "Resource": "arn:aws:secretsmanager:<your-region>:<centralized-security-account-id>:secret:<secret-id>"
        },
        {
          "Effect": "Allow",
          "Action": "kms:Decrypt",
          "Resource": "arn:aws:kms:<your-region>:<centralized-security-account-id>:key/<key-id>"
        }
      ]
    }

The access policies mentioned here are just for the example in this post. In a production environment, only provide the needed granular permissions by exercising least privilege principles.

What challenges does this solution present, and how can you overcome them?

Along with the advantages discussed in this post, there are a few challenges you should anticipate while deploying this solution:

  1. Currently there is a maximum of 20,480 characters allowed in a resource-based permissions policy attached to a secret. For organizations where a large number of external accounts need to be given access to a secret, you will need to keep this quota in mind.
  2. There is also a limit on the total number of active VPC peering connections per VPC. By default, the limit is 50 connections, but this is adjustable up to 125. If you require more connections across VPCs, you can use other solutions, like a transit gateway, as an alternative.
  3. As the number of applications that require access to secrets from the Centralized Security Account increases, the number of external accesses will also increase, and access control might become difficult over time. To reduce the number of external accounts that have access to the Centralized Security Account, you may choose to use AWS IAM Access Analyzer.

Conclusion

In this post, we provided you with a step-by-step solution to establish a Centralized Security Account that uses the AWS Secrets Manager service for securely storing your secrets in a central place. The post outlined the process of deploying AWS Lambda functions to facilitate automatic rotation of necessary secrets. Furthermore, we delved into the implementation of VPC peering to provide uninterrupted connectivity between the rotation function and your databases or applications housed in different AWS accounts, helping to ensure smooth rotation.

Finally, we discussed the essential policies that are needed to enable applications to use these secrets through resource-based policies. This implementation provides a way for you to conveniently monitor and audit your secrets.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.
 

Shagun Beniwal

Shagun Beniwal
Shagun is a Technical Account Manager at AWS. He manages Global System Integrators (GSIs) and Partners operating on AWS Enterprise Support. He is a member of the internal security community with focus areas in threat detection & incident response, infrastructure security, and IAM. Shagun helps customers achieve strategic business outcomes in security, resilience, cost optimization, and operations. You can follow Shagun on LinkedIn.

Navaneeth Krishnan Venugopal

Navaneeth Krishnan Venugopal
Navaneeth is a Cloud Support – Security Engineer II at AWS and an AWS Secrets Manager subject matter expert (SME). He is passionate about cybersecurity and helps provide tailored, secure solutions for a broad spectrum of technical issues faced by customers. Navaneeth has a focus on security and compliance and enjoys helping customers architect secure solutions on AWS.

Cloud infrastructure entitlement management in AWS

Post Syndicated from Mathangi Ramesh original https://aws.amazon.com/blogs/security/cloud-infrastructure-entitlement-management-in-aws/

Customers use Amazon Web Services (AWS) to securely build, deploy, and scale their applications. As your organization grows, you want to streamline permissions management towards least privilege for your identities and resources. At AWS, we see two customer personas working towards least privilege permissions: security teams and developers. Security teams want to centrally inspect permissions across their organizations to identify and remediate access-related risks, such as excessive permissions, anomalous access to resources or compliance of identities. Developers want policy verification tools that help them set effective permissions and maintain least privilege as they build their applications.

Customers are increasingly turning to cloud infrastructure entitlement management (CIEM) solutions to guide their permissions management strategies. CIEM solutions are designed to identify, manage, and mitigate risks associated with access privileges granted to identities and resources in cloud environments. While the specific pillars of CIEM vary, four fundamental capabilities are widely recognized: rightsizing permissions, detecting anomalies, visualization, and compliance reporting. AWS provides these capabilities through services such as AWS Identity and Access Management (IAM) Access Analyzer, Amazon GuardDuty, Amazon Detective, AWS Audit Manager, and AWS Security Hub. I explore these services in this blog post.

Rightsizing permissions

Customers primarily explore CIEM solutions to rightsize their existing permissions by identifying and remediating identities with excessive permissions that pose potential security risks. In AWS, IAM Access Analyzer is a powerful tool designed to assist you in achieving this goal. IAM Access Analyzer guides you to set, verify, and refine permissions.

After IAM Access Analyzer is set up, it continuously monitors AWS Identity and Access Management (IAM) users and roles within your organization and offers granular visibility into overly permissive identities. This empowers your security team to centrally review and identify instances of unused access, enabling them to take proactive measures to refine access and mitigate risks.

While most CIEM solutions prioritize tools for security teams, it’s essential to also help developers make sure that their policies adhere to security best practices before deployment. IAM Access Analyzer provides developers with policy validation and custom policy checks to make sure their policies are functional and secure. Now, they can use policy recommendations to refine unused access, making sure that identities have only the permissions required for their intended functions.

Anomaly detection

Security teams use anomaly detection capabilities to identify unexpected events, observations, or activities that deviate from the baseline behavior of an identity. In AWS, Amazon GuardDuty supports anomaly detection in an identity’s usage patterns, such as unusual sign-in attempts, unauthorized access attempts, or suspicious API calls made using compromised credentials.

By using machine learning and threat intelligence, GuardDuty can establish baselines for normal behavior and flag deviations that might indicate potential threats or compromised identities. When establishing CIEM capabilities, your security team can use GuardDuty to identify threat and anomalous behavior pertaining to their identities.

Visualization

With visualization, you have two goals. The first is to centrally inspect the security posture of identities, and the second is to comprehensively understand how identities are connected to various resources within your AWS environment. IAM Access Analyzer provides a dashboard to centrally review identities. The dashboard helps security teams gain visibility into the effective use of permissions at scale and identify top accounts that need attention. By reviewing the dashboard, you can pinpoint areas that need focus by analyzing accounts with the highest number of findings and the most commonly occurring issues such as unused roles.

Amazon Detective helps you to visually review individual identities in AWS. When GuardDuty identifies a threat, Detective generates a visual representation of identities and their relationships with resources, such as Amazon Elastic Compute Cloud (Amazon EC2) instances, Amazon Simple Storage Service (Amazon S3) buckets, or AWS Lambda functions. This graphical view provides a clear understanding of the access patterns associated with each identity. Detective visualizes access patterns, highlighting unusual or anomalous activities related to identities. This can include unauthorized access attempts, suspicious API calls, or unexpected resource interactions. You can depend on Detective to generate a visual representation of the relationship between identities and resources.

Compliance reporting

Security teams work with auditors to assess whether identities, resources, and permissions adhere to the organization’s compliance requirements. AWS Audit Manager automates evidence collection to help you meet compliance reporting and audit needs. These automated evidence packages include reporting on identities. Specifically, you can use Audit Manager to analyze IAM policies and roles to identify potential misconfigurations, excessive permissions, or deviations from best practices.

Audit Manager provides detailed compliance reports that highlight non-compliant identities or access controls, allowing your auditors and security teams to take corrective actions and support ongoing adherence to regulatory and organizational standards. In addition to monitoring and reporting, Audit Manager offers guidance to remediate certain types of non-compliant identities or access controls, reducing the burden on security teams and supporting timely resolution of identified issues.

Single pane of glass

While customers appreciate the diverse capabilities AWS offers across various services, they also seek a unified and consolidated view that brings together data from these different sources. AWS Security Hub addresses this need by providing a single pane of glass that enables you to gain a holistic understanding of your security posture. Security Hub acts as a centralized hub, consuming findings from multiple AWS services and presenting a comprehensive view of how identities are being managed and used across the organization.

Conclusion

CIEM solutions are designed to identify, manage, and mitigate risks associated with access privileges granted to identities and resources in cloud environments. The AWS services mentioned in this post can help you achieve your CIEM goals. If you want to explore CIEM capabilities in AWS, use the services mentioned in this post or see the following resources.

Resources

 
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Mathangi Ramesh

Mathangi Ramesh
Mathangi is the Principal Product Manager for AWS IAM Access Analyzer. She enjoys talking to customers and working with data to solve problems. Outside of work, Mathangi is a fitness enthusiast and a Bharatanatyam dancer. She holds an MBA degree from Carnegie Mellon University.

Network perimeter security protections for generative AI

Post Syndicated from Riggs Goodman III original https://aws.amazon.com/blogs/security/network-perimeter-security-protections-for-generative-ai/

Generative AI–based applications have grown in popularity in the last couple of years. Applications built with large language models (LLMs) have the potential to increase the value companies bring to their customers. In this blog post, we dive deep into network perimeter protection for generative AI applications. We’ll walk through the different areas of network perimeter protection you should consider, discuss how those apply to generative AI–based applications, and provide architecture patterns. By implementing network perimeter protection for your generative AI–based applications, you gain controls to help protect from unauthorized use, cost overruns, distributed denial of service (DDoS), and other threat actors or curious users.

Perimeter protection for LLMs

Network perimeter protection for web applications helps answer important questions, for example:

  • Who can access the app?
  • What kind of data is sent to the app?
  • How much data is the app is allowed to use?

For the most part, the same network protection methods used for other web apps also work for generative AI apps. The main focus of these methods is controlling network traffic that is trying to access the app, not the specific requests and responses the app creates. We’ll focus on three key areas of network perimeter protection:

  1. Authentication and authorization for the app’s frontend
  2. Using a web application firewall
  3. Protection against DDoS attacks

The security concerns of using LLMs in these apps, including issues with prompt injections, sensitive information leaks, or excess agency, is beyond the scope of this post.

Frontend authentication and authorization

When designing network perimeter protection, you first need to decide whether you will allow certain users to access the application, based on whether they are authenticated (AuthN) and whether they are authorized (AuthZ) to ask certain questions of the generative AI–based applications. Many generative AI–based applications sit behind an authentication layer so that a user must sign in to their identity provider before accessing the application. For public applications that are not behind any authentication (a chatbot, for example), additional considerations are required with regard to AWS WAF and DDoS protection, which we discuss in the next two sections.

Let’s look at an example. Amazon API Gateway is an option for customers for the application frontend, providing metering of users or APIs with authentication and authorization. It’s a fully managed service that makes it convenient for developers to publish, maintain, monitor, and secure APIs at scale. With API Gateway, you create AWS Lambda authorizers to control access to APIs within your application. Figure 1 shows how access works for this example.

Figure 1: An API Gateway, Lambda authorizer, and basic filter in the signal path between client and LLM

Figure 1: An API Gateway, Lambda authorizer, and basic filter in the signal path between client and LLM

The workflow in Figure 1 is as follows:

  1. A client makes a request to your API that is fronted by the API Gateway.
  2. When the API Gateway receives the request, it sends the request to a Lambda authorizer that authenticates the request through OAuth, SAML, or another mechanism. The Lambda authorizer returns an AWS Identity and Access Management (IAM) policy to the API Gateway, which will permit or deny the request.
  3. If permitted, the API Gateway sends the API request to the backend application. In Figure 1, this is a Lambda function that provides additional capabilities in the area of LLM security, standing in for more complex filtering. In addition to the Lambda authorizer, you can configure throttling on the API Gateway on a per-client basis or on the application methods clients are accessing before traffic makes it to the backend application. Throttling can provide some mitigation against not only DDoS attacks but also model cloning and inversion attacks.
  4. Finally, the application sends requests to your LLM that is deployed on AWS. In this example, the LLM is deployed on Amazon Bedrock.

The combination of Lambda authorizers and throttling helps support a number of perimeter protection mechanisms. First, only authorized users gain access to the application, helping to prevent bots and the public from accessing the application. Second, for authorized users, you limit the rate at which they can invoke the LLM to prevent excessive costs related to requests and responses to the LLM. Third, after users have been authenticated and authorized by the application, the application can pass identity information to the backend data access layer in order to restrict the data available to the LLM, aligning with what the user is authorized to access.

Besides API Gateway, AWS provides other options you can use to provide frontend authentication and authorization. AWS Application Load Balancer (ALB) supports OpenID Connect (OIDC) capabilities to require authentication to your OIDC provider prior to access. For internal applications, AWS Verified Access combines both identity and device trust signals to permit or deny access to your generative AI application.

AWS WAF

Once the authentication or authorization decision is made, the next consideration for network perimeter protection is on the application side. New security risks are being identified for generative AI–based applications, as described in the OWASP Top 10 for Large Language Model Applications. These risks include insecure output handling, insecure plugin design, and other mechanisms that cause the application to provide responses that are outside the desired norm. For example, a threat actor could craft a direct prompt injection to the LLM, which causes the LLM behave improperly. Some of these risks (insecure plugin design) can be addressed by passing identity information to the plugins and data sources. However, many of those protections fall outside the network perimeter protection and into the realm of security within the application. For network perimeter protection, the focus is on validating the users who have access to the application and supporting rules that allow, block, or monitor web requests based on network rules and patterns at the application level prior to application access.

In addition, bot traffic is an important consideration for web-based applications. According to Security Today, 47% of all internet traffic originates from bots. Bots that send requests to public applications drive up the cost of using generative AI–based applications by causing higher request loads.

To protect against bot traffic before the user gains access to the application, you can implement AWS WAF as part of the perimeter protection. Using AWS WAF, you can deploy a firewall to monitor and block the HTTP(S) requests that are forwarded to your protected web application resources. These resources exist behind Amazon API Gateway, ALB, AWS Verified Access, and other resources. From a web application point of view, AWS WAF is used to prevent or limit access to your application before invocation of your LLM takes place. This is an important area to consider because, in addition to protecting the prompts and completions going to and from the LLM itself, you want to make sure only legitimate traffic can access your application. AWS Managed Rules or AWS Marketplace managed rule groups provide you with predefined rules as part of a rule group.

Let’s expand the previous example. As your application shown in Figure 1 begins to scale, you decide to move it behind Amazon CloudFront. CloudFront is a web service that gives you a distributed ingress into AWS by using a global network of edge locations. Besides providing distributed ingress, CloudFront gives you the option to deploy AWS WAF in a distributed fashion to help protect against SQL injections, bot control, and other options as part of your AWS WAF rules. Let’s walk through the new architecture in Figure 2.

Figure 2: Adding AWS WAF and CloudFront to the client-to-model signal path

Figure 2: Adding AWS WAF and CloudFront to the client-to-model signal path

The workflow shown in Figure 2 is as follows:

  1. A client makes a request to your API. DNS directs the client to a CloudFront location, where AWS WAF is deployed.
  2. CloudFront sends the request through an AWS WAF rule to determine whether to block, monitor, or allow the traffic. If AWS WAF does not block the traffic, AWS WAF sends it to the CloudFront routing rules.

    Note: It is recommended that you restrict access to the API Gateway so users cannot bypass the CloudFront distribution to access the API Gateway. An example of how to accomplish this goal can be found in the Restricting access on HTTP API Gateway Endpoint with Lambda Authorizer blog post.

  3. CloudFront sends the traffic to the API Gateway, where it runs through the same traffic path as discussed in Figure 1.

To dive into more detail, let’s focus on bot traffic. With AWS WAF Bot Control, you can monitor, block, or rate limit bots such as scrapers, scanners, crawlers, status monitors, and search engines. Bot Control provides multiple options in terms of configured rules and inspection levels. For example, if you use the targeted inspection level of the rule group, you can challenge bots that don’t self-identify, making it harder and more expensive for malicious bots to operate against your generative AI–based application. You can use the Bot Control managed rule group alone or in combination with other AWS Managed Rules rule groups and your own custom AWS WAF rules. Bot Control also provides granular visibility on the number of bots that are targeting your application, as shown in Figure 3.

Figure 3: Bot control dashboard for bot requests and non-bot requests

Figure 3: Bot control dashboard for bot requests and non-bot requests

How does this functionality help you? For your generative AI–based application, you gain visibility into how bots and other traffic are targeting your application. AWS WAF provides options to monitor and customize the web request handling of bot traffic, including allowing specific bots or blocking bot traffic to your application. In addition to bot control, AWS WAF provides a number of different managed rule groups, including baseline rule groups, use-case specific rule groups, IP reputation rules groups, and others. For more information, take a look at the documentation on both AWS Managed Rules rule groups and AWS Marketplace managed rule groups.

DDoS protection

The last topic we’ll cover in this post is DDoS with LLMs. Similar to threats against other Layer 7 applications, threat actors can send requests that consume an exceptionally high amount of resources, which results in a decline in the service’s responsiveness or an increase in the cost to run the LLMs that are handling the high number of requests. Although throttling can help support a per-user or per-method rate limit, DDoS attacks use more advanced threat vectors that are difficult to protect against with throttling.

AWS Shield helps to provide protection against DDoS for your internet-facing applications, both at Layer 3/4 with Shield standard or Layer 7 with Shield Advanced. For example, Shield Advanced responds automatically to mitigate application threats by counting or blocking web requests that are part of the exploit by using web access control lists (ACLs) that are part of your already deployed AWS WAF. Depending on your requirements, Shield can provide multiple layers of protection against DDoS attacks.

Figure 4 shows how your deployment might look after Shield is added to the architecture.

Figure 4: Adding Shield Advanced to the client-to-model signal path

Figure 4: Adding Shield Advanced to the client-to-model signal path

The workflow in Figure 4 is as follows:

  1. A client makes a request to your API. DNS directs the client to a CloudFront location, where AWS WAF and Shield are deployed.
  2. CloudFront sends the request through an AWS WAF rule to determine whether to block, monitor, or allow the traffic. AWS Shield can mitigate a wide range of known DDoS attack vectors and zero-day attack vectors. Depending on the configuration, Shield Advanced and AWS WAF work together to rate-limit traffic coming from individual IP addresses. If AWS WAF or Shield Advanced don’t block the traffic, the services will send it to the CloudFront routing rules.
  3. CloudFront sends the traffic to the API Gateway, where it will run through the same traffic path as discussed in Figure 1.

When you implement AWS Shield and Shield Advanced, you gain protection against security events and visibility into both global and account-level events. For example, at the account level, you get information on the total number of events seen on your account, the largest bit rate and packet rate for each resource, and the largest request rate for CloudFront. With Shield Advanced, you also get access to notifications of events that are detected by Shield Advanced and additional information about detected events and mitigations. These metrics and data, along with AWS WAF, provide you with visibility into the traffic that is trying to access your generative AI–based applications. This provides mitigation capabilities before the traffic accesses your application and before invocation of the LLM.

Considerations

When deploying network perimeter protection with generative AI applications, consider the following:

  • AWS provides multiple options, on both the frontend authentication and authorization side and the AWS WAF side, for how to configure perimeter protections. Depending on your application architecture and traffic patterns, multiple resources can provide the perimeter protection with AWS WAF and integrate with identity providers for authentication and authorization decisions.
  • You can also deploy more advanced LLM-specific prompt and completion filters by using Lambda functions and other AWS services as part of your deployment architecture. Perimeter protection capabilities are focused on preventing undesired traffic from reaching the end application.
  • Most of the network perimeter protections used for LLMs are similar to network perimeter protection mechanisms for other web applications. The difference is that additional threat vectors come into play compared to regular web applications. For more information on the threat vectors, see OWASP Top 10 for Large Language Model Applications and Mitre ATLAS.

Conclusion

In this blog post, we discussed how traditional network perimeter protection strategies can provide defense in depth for generative AI–based applications. We discussed the similarities and differences between LLM workloads and other web applications. We walked through why authentication and authorization protection is important, showing how you can use Amazon API Gateway to throttle through usage plans and to provide authentication through Lambda authorizers. Then, we discussed how you can use AWS WAF to help protect applications from bots. Lastly, we talked about how AWS Shield can provide advanced protection against different types of DDoS attacks at scale. For additional information on network perimeter protection and generative AI security, take a look at other blogs posts in the AWS Security Blog Channel.

 
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Riggs Goodman III

Riggs Goodman III
Riggs is a Principal Partner Solution Architect at AWS. His current focus is on AI security and data security, providing technical guidance, architecture patterns, and leadership for customers and partners to build AI workloads on AWS. Internally, Riggs focuses on driving overall technical strategy and innovation across AWS service teams to address customer and partner challenges.

Spring 2024 SOC 2 report now available in Japanese, Korean, and Spanish

Post Syndicated from Brownell Combs original https://aws.amazon.com/blogs/security/spring-2024-soc-2-report-now-available-in-japanese-korean-and-spanish/

Japanese | Korean | Spanish

At Amazon Web Services (AWS), we continue to listen to our customers, regulators, and stakeholders to understand their needs regarding audit, assurance, certification, and attestation programs. We are pleased to announce that the AWS System and Organization Controls (SOC) 2 report is now available in Japanese, Korean, and Spanish. This translated report will help drive greater engagement and alignment with customer and regulatory requirements across Japan, Korea, Latin America, and Spain.

The Japanese, Korean, and Spanish language versions of the report do not contain the independent opinion issued by the auditors, but you can find this information in the English language version. Stakeholders should use the English version as a complement to the Japanese, Korean, or Spanish versions.

Going forward, the following reports in each quarter will be translated. Spring and Fall SOC 1 controls are included in the Spring and Fall SOC 2 reports, so this translation schedule will provide year-round coverage of the English versions.

  • Spring SOC 2 (April 1 – March 31)
  • Summer SOC 1 (July 1 – June 30)
  • Fall SOC 2 (October 1 – September 30)
  • Winter SOC 1 (January 1 – December 31)

Customers can download the translated Spring 2024 SOC 2 reports in Japanese, Korean, and Spanish through AWS Artifact, a self-service portal for on-demand access to AWS compliance reports. Sign in to AWS Artifact in the AWS Management Console, or learn more at Getting Started with AWS Artifact.

The Spring 2024 SOC 2 report includes a total of 177 services in scope. For up-to-date information, including when additional services are added, visit the AWS Services in Scope by Compliance Program webpage and choose SOC.

AWS strives to continuously bring services into scope of its compliance programs to help you meet your architectural and regulatory needs. Please reach out to your AWS account team if you have questions or feedback about SOC compliance.

To learn more about our compliance and security programs, see AWS Compliance Programs. As always, we value your feedback and questions; reach out to the AWS Compliance team through the Contact Us page.

 
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.
 


Japanese version

第1四半期 2024 SOC 2 レポートの日本語、韓国語、スペイン語版の提供を開始

当社はお客様、規制当局、利害関係者の声に継続的に耳を傾け、Amazon Web Services (AWS) における監査、保証、認定、認証プログラムに関するそれぞれのニーズを理解するよう努めています。この度、AWS System and Organization Controls (SOC) 2 レポートが、日本語、韓国語、スペイン語で利用可能になりました。この翻訳版のレポートは、日本、韓国、ラテンアメリカ、スペインのお客様および規制要件との連携と協力体制を強化するためのものです。

本レポートの日本語、韓国語、スペイン語版には監査人による独立した第三者の意見は含まれていませんが、英語版には含まれています。利害関係者は、日本語、韓国語、スペイン語版の補足として英語版を参照する必要があります。

今後、四半期ごとの以下のレポートで翻訳版が提供されます。SOC 1 統制は、第1四半期 および 第3四半期 SOC 2 レポートに含まれるため、英語版と合わせ、1 年間のレポートの翻訳版すべてがこのスケジュールで網羅されることになります。

  • 第1四半期 SOC 2 (4 月 1 日〜3 月 31 日)
  • 第2四半期 SOC 1 (7 月 1 日〜6 月 30 日)
  • 第3四半期 SOC 2 (10 月 1 日〜9 月 30 日)
  • 第4四半期 SOC 1 (1 月 1 日〜12 月 31 日)

第1四半期 2024 SOC 2 レポートの日本語、韓国語、スペイン語版は AWS Artifact (AWS のコンプライアンスレポートをオンデマンドで入手するためのセルフサービスポータル) を使用してダウンロードできます。AWS マネジメントコンソール内の AWS Artifact にサインインするか、AWS Artifact の開始方法ページで詳細をご覧ください。

第1四半期 2024 SOC 2 レポートの対象範囲には合計 177 のサービスが含まれます。その他のサービスが追加される時期など、最新の情報については、コンプライアンスプログラムによる対象範囲内の AWS のサービスで [SOC] を選択してご覧いただけます。

AWS では、アーキテクチャおよび規制に関するお客様のニーズを支援するため、コンプライアンスプログラムの対象範囲に継続的にサービスを追加するよう努めています。SOC コンプライアンスに関するご質問やご意見については、担当の AWS アカウントチームまでお問い合わせください。

コンプライアンスおよびセキュリティプログラムに関する詳細については、AWS コンプライアンスプログラムをご覧ください。当社ではお客様のご意見・ご質問を重視しています。お問い合わせページより AWS コンプライアンスチームにお問い合わせください。
 


Korean version

2024년 춘계 SOC 2 보고서의 한국어, 일본어, 스페인어 번역본 제공

Amazon은 고객, 규제 기관 및 이해 관계자의 의견을 지속적으로 경청하여 Amazon Web Services (AWS)의 감사, 보증, 인증 및 증명 프로그램과 관련된 요구 사항을 파악하고 있습니다. AWS System and Organization Controls(SOC) 2 보고서가 이제 한국어, 일본어, 스페인어로 제공됨을 알려 드립니다. 이 번역된 보고서는 일본, 한국, 중남미, 스페인의 고객 및 규제 요건을 준수하고 참여도를 높이는 데 도움이 될 것입니다.

보고서의 일본어, 한국어, 스페인어 버전에는 감사인의 독립적인 의견이 포함되어 있지 않지만, 영어 버전에서는 해당 정보를 확인할 수 있습니다. 이해관계자는 일본어, 한국어 또는 스페인어 버전을 보완하기 위해 영어 버전을 사용해야 합니다.

앞으로 분기마다 다음 보고서가 번역본으로 제공됩니다. SOC 1 통제 조치는 춘계 및 추계 SOC 2 보고서에 포함되어 있으므로, 이 일정은 영어 버전과 함께 모든 번역된 언어로 연중 내내 제공됩니다.

  • 춘계 SOC 2(4/1~3/31)
  • 하계 SOC 1(7/1~6/30)
  • 추계 SOC 2(10/1~9/30)
  • 동계 SOC 1(1/1~12/31)

고객은 AWS 규정 준수 보고서를 필요할 때 이용할 수 있는 셀프 서비스 포털인 AWS Artifact를 통해 한국어, 일본어, 스페인어로 번역된 2024년 춘계 SOC 2 보고서를 다운로드할 수 있습니다. AWS Management Console의 AWS Artifact에 로그인하거나 Getting Started with AWS Artifact(AWS Artifact 시작하기)에서 자세한 내용을 알아보세요.

2024년 춘계 SOC 2 보고서에는 총 177개의 서비스가 포함됩니다. 추가 서비스가 추가되는 시기 등의 최신 정보는 AWS Services in Scope by Compliance Program(규정 준수 프로그램별 범위 내 AWS 서비스)에서 SOC를 선택하세요.

AWS는 고객이 아키텍처 및 규제 요구 사항을 충족할 수 있도록 지속적으로 서비스를 규정 준수 프로그램의 범위에 포함시키기 위해 노력하고 있습니다. SOC 규정 준수에 대한 질문이나 피드백이 있는 경우 AWS 계정 팀에 문의하시기 바랍니다.

규정 준수 및 보안 프로그램에 대한 자세한 내용은 AWS 규정 준수 프로그램을 참조하세요. 언제나 그렇듯이 AWS는 여러분의 피드백과 질문을 소중히 여깁니다. 문의하기 페이지를 통해 AWS 규정 준수 팀에 문의하시기 바랍니다.
 


Spanish version

El informe de SOC 2 primavera 2024 se encuentra disponible actualmente en japonés, coreano y español

Seguimos escuchando a nuestros clientes, reguladores y partes interesadas para comprender sus necesidades en relación con los programas de auditoría, garantía, certificación y acreditación en Amazon Web Services (AWS). Nos enorgullece anunciar que el informe de controles de sistema y organización (SOC) 2 de AWS se encuentra disponible en japonés, coreano y español. Estos informes traducidos ayudarán a impulsar un mayor compromiso y alineación con los requisitos normativos y de los clientes en Japón, Corea, Latinoamérica y España.

Estas versiones del informe en japonés, coreano y español no contienen la opinión independiente emitida por los auditores, pero se puede acceder a esta información en la versión en inglés del documento. Las partes interesadas deben usar la versión en inglés como complemento de las versiones en japonés, coreano y español.

De aquí en adelante, los siguientes informes trimestrales estarán traducidos. Dado que los controles SOC 1 se incluyen en los informes de primavera y otoño de SOC 2, esta programación brinda una cobertura anual para todos los idiomas traducidos cuando se la combina con las versiones en inglés.

  • SOC 2 primavera (del 1/4 al 31/3)
  • SOC 1 verano (del 1/7 al 30/6)
  • SOC 2 otoño (del 1/10 al 30/9)
  • SOC 1 invierno (del 1/1 al 31/12)

Los clientes pueden descargar los informes de SOC 2 primavera 2024 traducidos al japonés, coreano y español a través de AWS Artifact, un portal de autoservicio para el acceso bajo demanda a los informes de cumplimiento de AWS. Inicie sesión en AWS Artifact mediante la Consola de administración de AWS u obtenga más información en Introducción a AWS Artifact.

El informe de SOC 2 primavera 2024 incluye un total de 177 servicios que se encuentran dentro del alcance. Para acceder a información actualizada, que incluye novedades sobre cuándo se agregan nuevos servicios, consulte los Servicios de AWS en el ámbito del programa de conformidad y seleccione SOC.

AWS se esfuerza de manera continua por añadir servicios dentro del alcance de sus programas de conformidad para ayudarlo a cumplir con sus necesidades de arquitectura y regulación. Si tiene alguna pregunta o sugerencia sobre la conformidad de los SOC, no dude en comunicarse con su equipo de cuenta de AWS.

Para obtener más información sobre los programas de conformidad y seguridad, consulte los Programas de conformidad de AWS. Como siempre, valoramos sus comentarios y preguntas; de modo que no dude en comunicarse con el equipo de conformidad de AWS a través de la página Contacte con nosotros.

Brownell Combs

Brownell Combs
Brownell is a Compliance Program Manager at AWS, where he leads multiple security and privacy initiatives. Brownell holds a Master’s Degree in Computer Science from the University of Virginia and a Bachelor’s Degree in Computer Science from Centre College. He has over 20 years of experience in information technology risk management and CISSP, CISA, CRISC, and GIAC GCLD certifications.

Rodrigo Fiuza

Rodrigo Fiuza
Rodrigo is a Security Audit Manager at AWS, based in São Paulo. He leads audits, attestations, certifications, and assessments across Latin America, the Caribbean, and Europe. Rodrigo has worked in risk management, security assurance, and technology audits for the past 12 years.

Paul Hong

Paul Hong
Paul is a Compliance Program Manager at AWS. He leads multiple security, compliance, and training initiatives within AWS and has over 10 years of experience in security assurance. Paul is a CISSP, CEH, and CPA, and holds a Masters of Accounting Information Systems and a Bachelors of Business Administration from James Madison University, Virginia.

Hwee Hwang

Hwee Hwang
Hwee is an Audit Specialist at AWS based in Seoul, South Korea. Hwee is responsible for third-party and customer audits, certifications, and assessments in Korea. Hwee previously worked in security governance, risk, and compliance and is laser focused on building customers’ trust and providing them assurance in the cloud.

Tushar Jain

Tushar Jain
Tushar is a Compliance Program Manager at AWS, where he leads multiple security, compliance, and training initiatives. He holds a Master of Business Administration from Indian Institute of Management, Shillong, India and a Bachelor of Technology in Electronics and Telecommunication Engineering from Marathwada University, India. He has over 12 years of experience in information security and holds CCSK and CSXF certifications.

Eun Jin Kim

Eun Jin Kim
Eun Jin is a security assurance professional working as the Audit Program Manager at AWS. She mainly leads compliance programs in South Korea for the financial sector. She has more than 25 years of experience and holds a Master’s Degree in Management Information Systems from Carnegie Mellon University in Pittsburgh, Pennsylvania and a Master’s Degree in Law from George Mason University in Arlington, Virginia.

Michael Murphy

Michael Murphy
Michael is a Compliance Program Manager at AWS, where he leads multiple security and privacy initiatives. Michael has 12 years of experience in information security. He holds a Master’s Degree in Computer Engineering and a Bachelor’s Degree in Computer Engineering from Stevens Institute of Technology. He also holds CISSP, CRISC, CISA, and CISM certifications.

Nathan Samuel

Nathan Samuel
Nathan is a Compliance Program Manager at AWS, where he leads multiple security and privacy initiatives. Nathan has a Bachelor’s of Commerce degree from the University of the Witwatersrand, South Africa. He has 21 years’ experience in security assurance and holds the CISA, CRISC, CGEIT, CISM, CDPSE, and Certified Internal Auditor certifications.

Seul Un Sung

Seul Un Sung
Seul Un is a Security Assurance Audit Program Manager at AWS, where she has been leading South Korea audit programs, including K-ISMS and RSEFT, for the past four years. She has a Bachelor’s of Information Communication and Electronic Engineering degree from Ewha Womans University and has 14 years of experience in IT risk, compliance, governance, and audit, and holds the CISA certification.

Hidetoshi Takeuchi

Hidetoshi Takeuchi
Hidetoshi is a Senior Audit Program Manager at AWS, based in Japan, leading Japan and India security certification and authorization programs. Hidetoshi has led information technology, cyber security, risk management, compliance, security assurance, and technology audits for the past 28 years and holds the CISSP certifications.

ryan wilks

Ryan Wilks
Ryan is a Compliance Program Manager at AWS, where he leads multiple security and privacy initiatives. Ryan has 13 years of experience in information security. He has a bachelor of arts degree from Rutgers University and holds ITIL, CISM, and CISA certifications.

Hardening the RAG chatbot architecture powered by Amazon Bedrock: Blueprint for secure design and anti-pattern migration

Post Syndicated from Magesh Dhanasekaran original https://aws.amazon.com/blogs/security/hardening-the-rag-chatbot-architecture-powered-by-amazon-bedrock-blueprint-for-secure-design-and-anti-pattern-migration/

This blog post demonstrates how to use Amazon Bedrock with a detailed security plan to deploy a safe and responsible chatbot application. In this post, we identify common security risks and anti-patterns that can arise when exposing a large language model (LLM) in an application. Amazon Bedrock is built with features you can use to mitigate vulnerabilities and incorporate secure design principles. This post highlights architectural considerations and best practice strategies to enhance the reliability of your LLM-based application.

Amazon Bedrock unleashes the fusion of generative artificial intelligence (AI) and LLMs, empowering you to craft impactful chatbot applications. As with technologies handling sensitive data and intellectual property, it’s crucial that you prioritize security and adopt a robust security posture. Without proper measures, these applications can be susceptible to risks such as prompt injection, information disclosure, model exploitation, and regulatory violations. By proactively addressing these security considerations, you can responsibly use Amazon Bedrock foundation models and generative AI capabilities.

The chatbot application use case represents a common pattern in enterprise environments, where businesses want to use the power of generative AI foundation models (FMs) to build their own applications. This falls under the Pre-trained models category of the Generative AI Security Scoping Matrix. In this scope, businesses directly integrate with FMs like Anthropic’s Claude through Amazon Bedrock APIs to create custom applications, such as customer support Retrieval Augmented Generation (RAG) chatbots, content generation tools, and decision support systems.

This post provides a comprehensive security blueprint for deploying chatbot applications that integrate with Amazon Bedrock, enabling the responsible adoption of LLMs and generative AI in enterprise environments. We outline mitigation strategies through secure design principles, architectural considerations, and best practices tailored to the challenges of integrating LLMs and generative AI capabilities.

By following the guidance in this post, you can proactively identify and mitigate risks associated with deploying and operating chatbot applications that integrate with Amazon Bedrock and use generative AI models. The guidance can help you strengthen the security posture, protect sensitive data and intellectual property, maintain regulatory compliance, and responsibly deploy generative AI capabilities within your enterprise environments.

This post contains the following high-level sections:

Chatbot application architecture overview

The chatbot application architecture described in this post represents an example implementation that uses various AWS services and integrates with Amazon Bedrock and Anthropic’s Claude 3 Sonnet LLM. This baseline architecture serves as a foundation to understand the core components and their interactions. However, it’s important to note that there can be multiple ways for customers to design and implement a chatbot architecture that integrates with Amazon Bedrock, depending on their specific requirements and constraints. Regardless of the implementation approach, it’s crucial to incorporate appropriate security controls and follow best practices for secure design and deployment of generative AI applications.

The chatbot application allows users to interact through a frontend interface and submit prompts or queries. These prompts are processed by integrating with Amazon Bedrock, which uses the Anthropic Claude 3 Sonnet LLM and a knowledge base built from ingested data. The LLM generates relevant responses based on the prompts and retrieved context from the knowledge base. While this baseline implementation outlines the core functionality, it requires incorporating security controls and following best practices to mitigate potential risks associated with deploying generative AI applications. In the subsequent sections, we discuss security anti-patterns that can arise in such applications, along with their corresponding mitigation strategies. Additionally, we present a secure and responsible architecture blueprint for the chatbot application powered by Amazon Bedrock.

Figure 1: Baseline chatbot application architecture using AWS services and Amazon Bedrock

Figure 1: Baseline chatbot application architecture using AWS services and Amazon Bedrock

Components in the chatbot application baseline architecture

The chatbot application architecture uses various AWS services and integrates with the Amazon Bedrock service and Anthropic’s Claude 3 Sonnet LLM to deliver an interactive and intelligent chatbot experience. The main components of the architecture (as shown in Figure 1) are:

  1. User interaction layer:
    Users interact with the chatbot application through the Streamlit frontend (3), a Python-based open-source library, used to build the user-friendly and interactive interface.
  2. Amazon Elastic Container Service (Amazon ECS) on AWS Fargate:
    A fully managed and scalable container orchestration service that eliminates the need to provision and manage servers, allowing you to run containerized applications without having to manage the underlying compute infrastructure.
  3. Application hosting and deployment:
    The Streamlit application (3) components are hosted and deployed on Amazon ECS on AWS Fargate (2), maintaining scalability and high availability. This architecture represents the application and hosting environment in an independent virtual private cloud (VPC) to promote a loosely-coupled architecture. The Streamlit frontend can be replaced with your organization’s specific frontend and quickly integrated with the backend Amazon API Gateway in the VPC. An application load balancer is used to distribute traffic to the Streamlit application instances.
  4. API Gateway driven Lambda Integration:
    In this example architecture, instead of directly invoking the Amazon Bedrock service from the frontend, an API Gateway backed by an AWS Lambda function (5) is used as an intermediary layer. This approach promotes better separation of concerns, scalability, and secure access to Amazon Bedrock by limiting direct exposure from the frontend.
  5. Lambda:
    Lambda provides highly scalable, short-term serverless compute. Here, the requests from Streamlit are processed. First, the history of the user’s session is retrieved from Amazon DynamoDB (6). Second, the user’s question, history, and the context are formatted into a prompt template and queried against Amazon Bedrock with the knowledge base, employing retrieval augmented generation (RAG).
  6. DynamoDB:
    DynamoDB is responsible for storing and retrieving chat history, conversation history, recommendations, and other relevant data using the Lambda function.
  7. Amazon S3:
    Amazon Simple Storage Services (Amazon S3) is a data storage service and is used here for storing data artifacts that are ingested into the knowledge base.
  8. Amazon Bedrock:
    Amazon Bedrock plays a central role in the architecture. It handles the questions posed by the user using Anthropic Claude 3 Sonnet LLM (9) combined with a previously generated knowledge base (10) of the customer’s organization-specific data.
  9. Anthropic Claude 3 Sonnet:
    Anthropic Claude 3 Sonnet is the LLM used to generate tailored recommendations and responses based on user inputs and the context retrieved from the knowledge base. It’s part of the text analysis and generation module in Amazon Bedrock.
  10. Knowledge base and data ingestion:
    Relevant documents classified as public are ingested from Amazon S3 (9) into in an Amazon Bedrock knowledge base. Knowledge bases are backed by Amazon OpenSearch Service. Amazon Titan Embeddings (10) are used to generate the vector embeddings database of the documents. Storing the data as vector embeddings allows for semantic similarity searching of the documents to retrieve the context of the question posed by the user (RAG). By providing the LLM with context in addition to the question, there’s a much higher chance of getting a useful answer from the LLM.

Comprehensive logging and monitoring strategy

This section outlines a comprehensive logging and monitoring strategy for the Amazon Bedrock-powered chatbot application, using various AWS services to enable centralized logging, auditing, and proactive monitoring of security events, performance metrics, and potential threats.

  1. Logging and auditing:
    • AWS CloudTrail: Logs API calls made to Amazon Bedrock, including InvokeModel requests, as well as information about the user or service that made the request.
    • AWS CloudWatch Logs: Captures and analyzes Amazon Bedrock invocation logs, user prompts, generated responses, and errors or warnings encountered during the invocation process.
    • Amazon OpenSearch Service: Logs and indexes data related to the OpenSearch integration, context data retrievals, and knowledge base operations.
    • AWS Config: Monitors and audits the configuration of resources related to the chatbot application and Amazon Bedrock service, including IAM policies, VPC settings, encryption key management, and other resource configurations.
  2. Monitoring and alerting:
    • AWS CloudWatch: Monitors metrics specific to Amazon Bedrock, such as the number of model invocations, latency of invocations, and error metrics (client-side errors, server-side errors, and throttling). Configures targeted CloudWatch alarms to proactively detect and respond to anomalies or issues related to Bedrock invocations and performance.
    • AWS GuardDuty: Continuously monitors CloudTrail logs for potential threats and unauthorized activity within the AWS environment.
    • AWS Security Hub: Provides centralized security posture management and compliance checks.
    • Amazon Security Lake: Provides a centralized data lake for log analysis; is integrated with CloudTrail and SecurityHub.
  3. Security information and event management integration:
    • Integrate with security information and event management (SIEM) solutions for centralized log management, real-time monitoring of security events, and correlation of logging data from multiple sources (CloudTrail, CloudWatch Logs, OpenSearch Service, and so on).
  4. Continuous improvement:
    • Regularly review and update logging and monitoring configurations, alerting thresholds, and integration with security solutions to address emerging threats, changes in application requirements, or evolving best practices.

Security anti-patterns and mitigation strategies

This section identifies and explores common security anti-patterns associated with the Amazon Bedrock chatbot application architecture. By recognizing these anti-patterns early in the development and deployment phases, you can implement effective mitigation strategies and fortify your security posture.

Addressing security anti-patterns in the Amazon Bedrock chatbot application architecture is crucial for several reasons:

  1. Data protection and privacy: The chatbot application processes and generates sensitive data, including personal information, intellectual property, and confidential business data. Failing to address security anti-patterns can lead to data breaches, unauthorized access, and potential regulatory violations.
  2. Model integrity and reliability: Vulnerabilities in the chatbot application can enable bad actors to manipulate or exploit the underlying generative AI models, compromising the integrity and reliability of the generated outputs. This can have severe consequences, particularly in decision-support or critical applications.
  3. Responsible AI deployment: As the adoption of generative AI models continues to grow, it’s essential to maintain responsible and ethical deployment practices. Addressing security anti-patterns is crucial for maintaining trust, transparency, and accountability in the chatbot application powered by AI models.
  4. Compliance and regulatory requirements: Many industries and regions have specific regulations and guidelines governing the use of AI technologies, data privacy, and information security. Addressing security anti-patterns is a critical step towards adhering to and maintaining compliance for the chatbot application.

The security anti-patterns that are covered in this post include:

  1. Lack of secure authentication and access controls
  2. Insufficient input validation and sanitization
  3. Insecure communication channels
  4. Inadequate prompt and response logging, auditing, and non-repudiation
  5. Insecure data storage and access controls
  6. Failure to secure FMs and generative AI components
  7. Lack of responsible AI governance and ethics
  8. Lack of comprehensive testing and validation

Anti-pattern 1: Lack of secure authentication and access controls

In a generative AI chatbot application using Amazon Bedrock, a lack of secure authentication and access controls poses significant risks to the confidentiality, integrity, and availability of the system. Identity spoofing and unauthorized access can enable threat actors to impersonate legitimate users or systems, gain unauthorized access to sensitive data processed by the chatbot application, and potentially compromise the integrity and confidentiality of the customer’s data and intellectual property used by the application.

Identity spoofing and unauthorized access are important areas to address in this architecture, as the chatbot application handles user prompts and responses, which may contain sensitive information or intellectual property. If a threat actor can impersonate a legitimate user or system, they can potentially inject malicious prompts, retrieve confidential data from the knowledge base, or even manipulate the responses generated by the Anthropic Claude 3 LLM integrated with Amazon Bedrock.

Anti-pattern examples

  • Exposing the Streamlit frontend interface or the API Gateway endpoint without proper authentication mechanisms, potentially allowing unauthenticated users to interact with the chatbot application and inject malicious prompts.
  • Storing or hardcoding AWS access keys or API credentials in the application code or configuration files, increasing the risk of credential exposure and unauthorized access to AWS services like Amazon Bedrock or DynamoDB.
  • Implementing weak or easily guessable passwords for administrative or service accounts with elevated privileges to access the Amazon Bedrock service or other critical components.
  • Lacking multi-factor authentication (MFA) for AWS Identity and Access Management (IAM) users or roles with privileged access, increasing the risk of unauthorized access to AWS resources, including the Amazon Bedrock service, if credentials are compromised.

Mitigation strategies

To mitigate the risks associated with a lack of secure authentication and access controls, implement robust IAM controls, as well as continuous logging, monitoring, and threat detection mechanisms.

IAM controls:

  • Use industry-standard protocols like OAuth 2.0 or OpenID Connect, and integrate with AWS IAM Identity Center or other identity providers for centralized authentication and authorization for the Streamlit frontend interface and AWS API Gateway endpoints.
  • Implement fine-grained access controls using AWS IAM policies and resource-based policies to restrict access to only the necessary Amazon Bedrock resources, Lambda functions, and other components required for the chatbot application.
  • Enforce the use of MFA for all IAM users, roles, and service accounts with access to critical components like Amazon Bedrock, DynamoDB, or the Streamlit application.

Continuous logging and monitoring and threat detection:

  • See the Comprehensive logging and monitoring strategy section for guidance on implementing centralized logging and monitoring solutions to track and audit authentication events, access attempts, and potential unauthorized access or credential misuse across the chatbot application components and Amazon Bedrock service, as well as using CloudWatch, Lambda, and GuardDuty to detect and respond to anomalous behavior and potential threats.

Anti-pattern 2: Insufficient input sanitization and validation

Insufficient input validation and sanitization in a generative AI chatbot application can expose the system to various threats, including injection events, data tampering, adversarial events, and data poisoning events. These vulnerabilities can lead to unauthorized access, data manipulation, and compromised model outputs.

Injection events: If user prompts or inputs aren’t properly sanitized and validated, a threat actor can potentially inject malicious code, such as SQL code, leading to unauthorized access or manipulation of the DynamoDB chat history data. Additionally, if the chatbot application or components process user input without proper validation, a threat actor can potentially inject and run arbitrary code on the backend systems, compromising the entire application.

Data tampering: A threat actor can potentially modify user prompts or payloads in transit between the chatbot interface and Amazon Bedrock service, leading to unintended model responses or actions. Lack of data integrity checks can allow a threat actor to tamper with the context data exchanged between Amazon Bedrock and OpenSearch, potentially leading to incorrect or malicious search results influencing the LLM responses.

Data poisoning events: If the training data or context data used by the LLM or chatbot application isn’t properly validated and sanitized, bad actors can potentially introduce malicious or misleading data, leading to biased or compromised model outputs.

Anti-pattern examples

  • Failure to validate and sanitize user prompts before sending them to Amazon Bedrock, potentially leading to injection events or unintended data exposure.
  • Lack of input validation and sanitization for context data retrieved from OpenSearch, allowing malformed or malicious data to influence the LLM’s responses.
  • Insufficient sanitization of LLM-generated responses before displaying them to users, enabling potential code injection or rendering of harmful content.
  • Inadequate sanitization of user input in the Streamlit application or Lambda functions, failing to remove or escape special characters, code snippets, or potentially malicious patterns, enabling code injection events.
  • Insufficient validation and sanitization of training data or other data sources used by the LLM or chatbot application, allowing data poisoning events that can introduce malicious or misleading data, leading to biased or compromised model outputs.
  • Allowing unrestricted character sets, input lengths, or special characters in user prompts or data inputs, enabling adversaries to craft inputs that bypass input validation and sanitization mechanisms, potentially causing undesirable or malicious outputs.
  • Relying solely on deny lists for input validation, which can be quickly bypassed by adversaries, potentially leading to injection events, data tampering, or other exploit scenarios.

Mitigation strategies

To mitigate the risks associated with insufficient input validation and sanitization, implement robust input validation and sanitization mechanisms throughout the chatbot application and its components.

Input validation and sanitization:

  • Implement strict input validation rules for user prompts at the chatbot interface and Amazon Bedrock service boundaries, defining allowed character sets, maximum input lengths, and disallowing special characters or code snippets. Use Amazon Bedrock’s Guardrails feature, which allows defining denied topics and content filters to remove undesirable and harmful content from user interactions with your applications.
  • Use allow lists instead of deny lists for input validation to maintain a more robust and comprehensive approach.
  • Sanitize user input by removing or escaping special characters, code snippets, or potentially malicious patterns.

Data flow validation:

  • Validate and sanitize data flows between components, including:
    • User prompts sent to the FM and responses generated by the FM and returned to the chatbot interface.
    • Training data, context data, and other data sources used by the FM or chatbot application.

Protective controls:

  • Use AWS Web Application Firewall (WAF) for input validation and protection against common web exploits.
  • Use AWS Shield for protection against distributed denial of service (DDoS) events.
  • Use CloudTrail to monitor API calls to Amazon Bedrock, including InvokeModel requests.
  • See the Comprehensive logging and monitoring strategy section for guidance on implementing Lambda functions, Amazon EventBridge rules, and CloudWatch Logs to analyze CloudTrail logs, ingest application logs, user prompts, and responses, and integrate with incident response and SIEM solutions for detecting, investigating, and mitigating security incidents related to input validation and sanitization, including jailbreaking attempts and anomalous behavior.

Anti-pattern 3: Insecure communication channels

Insecure communication channels between chatbot application components can expose sensitive data to interception, tampering, and unauthorized access risks. Unsecured channels enable man-in-the-middle events where threat actors intercept, modify data in transit such as user prompts, responses, and context data, leading to data tampering, malicious payload injection, and unauthorized information access.

Anti-pattern examples

  • Failure to use AWS PrivateLink for secure service-to-service communication within the VPC, exposing communications between Amazon Bedrock and other AWS services to potential risks over the public internet, even when using HTTPS.
  • Absence of data integrity checks or mechanisms to detect and prevent data tampering during transmission between components.
  • Failure to regularly review and update communication channel configurations, protocols, and encryption mechanisms to address emerging threats and ensure compliance with security best practices.

Mitigation strategies

To mitigate the risks associated with insecure communication channels, implement secure communication mechanisms and enforce data integrity throughout the chatbot application’s components and their interactions. Proper encryption, authentication, and integrity checks should be employed to protect sensitive data in transit and help prevent unauthorized access, data tampering, and man-in-the-middle events.

Secure communication channels:

  • Use PrivateLink for secure service-to-service communication between Amazon Bedrock and other AWS services used in the chatbot application architecture. PrivateLink provides a private, isolated communication channel within the Amazon VPC, eliminating the need to traverse the public internet. This mitigates the risk of potential interception, tampering, or unauthorized access to sensitive data transmitted between services, even when using HTTPS.
  • Use AWS Certificate Manager (ACM) to manage and automate the deployment of SSL/TLS certificates used for secure communication between the chatbot frontend interface (the Streamlit application) and the API Gateway endpoint. ACM simplifies the provisioning, renewal, and deployment of SSL/TLS certificates, making sure that communication channels between the user-facing components and the backend API are securely encrypted using industry-standard protocols and up-to-date certificates.

Continuous logging and monitoring:

  • See the Comprehensive Logging and Monitoring Strategy section for guidance on implementing centralized logging and monitoring mechanisms to detect and respond to potential communication channel anomalies or security incidents, including monitoring communication channel metrics, API call patterns, request payloads, and response data, using AWS services like CloudWatch, CloudTrail, and AWS WAF.

Network segmentation and isolation controls

  • Implement network segmentation by deploying the Amazon ECS cluster within a dedicated VPC and subnets, isolating it from other components and restricting communication based on the principle of least privilege.
  • Create separate subnets within the VPC for the public-facing frontend tier and the backend application tier, further isolating the components.
  • Use AWS security groups and network access control lists (NACLs) to control inbound and outbound traffic at the instance and subnet levels, respectively, for the ECS cluster and the frontend instances.

Anti-pattern 4: Inadequate logging, auditing, and non-repudiation

Inadequate logging, auditing, and non-repudiation mechanisms in a generative AI chatbot application can lead to several risks, including a lack of accountability, challenges in forensic analysis, and compliance concerns. Without proper logging and auditing, it’s challenging to track user activities, diagnose issues, perform forensic analysis in case of security incidents, and demonstrate compliance with regulations or internal policies.

Anti-pattern examples

  • Lack of logging for data flows between components, such as user prompts sent to Amazon Bedrock, context data exchanged with OpenSearch, and responses from the LLM, hindering investigative efforts in case of security incidents or data breaches.
  • Insufficient logging of user activities within the chatbot application—such as sign in attempts, session duration, and actions performed—limiting the ability to track and attribute actions to specific users.
  • Absence of mechanisms to ensure the integrity and authenticity of logged data, allowing potential tampering or repudiation of logged events.
  • Failure to securely store and protect log data from unauthorized access or modification, compromising the reliability and confidentiality of log information.

Mitigation strategies

To mitigate the risks associated with inadequate logging, auditing, and non-repudiation, implement comprehensive logging and auditing mechanisms to capture critical events, user activities, and data flows across the chatbot application components. Additionally, measures must be taken to maintain the integrity and authenticity of log data, help prevent tampering or repudiation, and securely store and protect log information from unauthorized access.

Comprehensive logging and auditing:

  • See the Comprehensive logging and monitoring strategy section for detailed guidance on implementing logging mechanisms using CloudTrail, CloudWatch Logs, and OpenSearch Service, as well as using CloudTrail for logging and monitoring API calls, especially Amazon Bedrock API calls and other API activities within the AWS environment, using CloudWatch for monitoring Amazon Bedrock-specific metrics, and ensuring log data integrity and non-repudiation through the CloudTrail log file integrity validation feature and implementing S3 Object Lock and S3 Versioning for log data stored in Amazon S3.
  • Make sure that log data is securely stored and protected from unauthorized access by using AWS Key Management Service (AWS KMS) for encryption at rest and implementing restrictive IAM policies and resource-based policies to control access to log data.
  • Retain log data for an appropriate period based on compliance requirements, using CloudTrail log file integrity validation and CloudWatch Logs retention periods and data archiving capabilities.

User activity monitoring and tracking:

  • Use CloudTrail for logging and monitoring API calls, especially Amazon Bedrock API calls and other API activities within the AWS environment, such as API Gateway, Lambda, and DynamoDB. Additionally, use CloudWatch for monitoring metrics specific to Amazon Bedrock, including the number of model invocations, latency, and error metrics (client-side errors, server-side errors, and throttling).
  • Integrate with security information and event management (SIEM) solutions for centralized log management and real-time monitoring of security events.

Data integrity and non-repudiation:

  • Implement digital signatures or non-repudiation mechanisms to verify the integrity and authenticity of logged data, minimizing tampering or repudiation of logged events. Use the CloudTrail log file integrity validation feature, which uses industry-standard algorithms (SHA-256 for hashing and SHA-256 with RSA for digital signing) to provide non-repudiation and verify log data integrity. For log data stored in Amazon S3, enable S3 Object Lock and S3 Versioning to provide an immutable, write once, read many (WORM) data storage model, helping to prevent object deletions or modifications, and maintaining data integrity and non-repudiation. Additionally, implement S3 bucket policies and IAM policies to restrict access to log data stored in S3, further enhancing the security and non-repudiation of logged events.

Anti-pattern 5: Insecure data storage and access controls

Insecure data storage and access controls in a generative AI chatbot application can lead to significant risks, including information disclosure, data tampering, and unauthorized access. Storing sensitive data, such as chat history, in an unencrypted or insecure manner can result in information disclosure if the data store is compromised or accessed by unauthorized entities. Additionally, a lack of proper access controls can allow unauthorized parties to access, modify, or delete data, leading to data tampering or unauthorized access.

Anti-pattern examples

  • Storing chat history data in DynamoDB without encryption at rest using AWS KMS customer-managed keys (CMKs).
  • Lack of encryption at rest using CMKs from AWS KMS for data in OpenSearch, Amazon S3, or other components that handle sensitive data.
  • Overly permissive access controls or lack of fine-grained access control mechanisms for the DynamoDB chat history, OpenSearch, Amazon S3, or other data stores, increasing the risk of unauthorized access or data breaches.
  • Storing sensitive data in clear text, or using insecure encryption algorithms or key management practices.
  • Failure to regularly review and rotate encryption keys or update access control policies to address potential security vulnerabilities or changes in access requirements.

Mitigation strategies

To mitigate the risks associated with insecure data storage and access controls, implement robust encryption mechanisms, secure key management practices, and fine-grained access control policies. Encrypting sensitive data at rest and in transit, using customer-managed encryption keys from AWS KMS, and implementing least- privilege access controls based on IAM policies and resource-based policies can significantly enhance the security and protection of data within the chatbot application architecture.

Key management and encryption at rest:

  • Implement AWS KMS to manage and control access to CMKs for data encryption across components like DynamoDB, OpenSearch, and Amazon S3.
    • Use CMKs to configure DynamoDB to automatically encrypt chat history data at rest.
    • Configure OpenSearch and Amazon S3 to use encryption at rest with AWS KMS CMKs for data stored in these services.
    • CMKs provide enhanced security and control, allowing you to create, rotate, disable, and revoke encryption keys, enabling better key isolation and separation of duties.
    • CMKs enable you to enforce key policies, audit key usage, and adhere to regulatory requirements or organizational policies that mandate customer-managed encryption keys.
    • CMKs offer portability and independence from specific services, allowing you to migrate or integrate data across multiple services while maintaining control over the encryption keys.
    • AWS KMS provides a centralized and secure key management solution, simplifying the management and auditing of encryption keys across various components and services.
  • Implement secure key management practices, including:
    • Regular key rotation to maintain the security of your encrypted data.
    • Separation of duties to make sure that no single individual has complete control over key management operations.
    • Strict access controls for key management operations, using IAM policies and roles to enforce the principle of least privilege.

Fine-grained access controls:

  • Implement fine-grained access controls for the DynamoDB chat history data store, OpenSearch, Amazon S3, and other data stores using IAM policies and roles.
  • Implement fine-grained access controls and define least-privilege access policies for all resources handling sensitive data, such as the DynamoDB chat history data store, OpenSearch, Amazon S3, and other data stores or services. For example, use IAM policies and resource-based policies to restrict access to specific DynamoDB tables, OpenSearch domains, and S3 buckets, limiting access to only the necessary actions (for example, read, write, and list) based on the principle of least privilege. Extend this approach to all resources handling sensitive data within the chatbot application architecture, making sure that access is granted only to the minimum required resources and actions necessary for each component or user role.

Continuous improvement:

  • Regularly review and update encryption configurations, access control policies, and key management practices to address potential security vulnerabilities or changes in access requirements.

Anti-pattern 6: Failure to secure FM and generative AI components

Inadequate security measures for FMs and generative AI components in a chatbot application can lead to severe risks, including model tampering, unintended information disclosure, and denial of service. Threat actors can manipulate unsecured FMs and generative AI models to generate biased, harmful, or malicious responses, potentially causing significant harm or reputational damage.

Lack of proper access controls or input validation can result in unintended information disclosure, where sensitive data is inadvertently included in model responses. Additionally, insecure FM or generative AI components can be vulnerable to denial-of-service events, disrupting the availability of the chatbot application and impacting its functionality.

Anti-pattern examples

  • Insecure model fine tuning practices, such as using untrusted or compromised data sources, can lead to biased or malicious models.
  • Lack of continuous monitoring for FM and generative AI components, leaving them vulnerable to emerging threats or known vulnerabilities.
  • Lack of guardrails or safety measures to control and filter the outputs of FMs and generative AI components, potentially leading to the generation of harmful, biased, or undesirable content.
  • Inadequate access controls or input validation for prompts and context data sent to the FM components, increasing the risk of injection events or unintended information disclosure.
  • Failure to implement secure deployment practices for FM and generative AI components, including secure communication channels, encryption of model artifacts, and access controls.

Mitigation strategies

To mitigate the risks associated with inadequately secured foundational models (FMs) and generative AI components, implement secure integration mechanisms, robust model fine-tuning and deployment practices, continuous monitoring, and effective guardrails and safety measures. These mitigation strategies help prevent model tampering, unintended information disclosure, denial-of-service events, and the generation of harmful or undesirable content, while ensuring the security, reliability, and ethical alignment of the chatbot application’s generative AI capabilities.

Secure integration with LLMs and knowledge bases:

  • Implement secure communication channels (for example HTTPS or PrivateLink) between Amazon Bedrock, OpenSearch, and the FM components to help prevent unauthorized access or data tampering.
  • Implement strict input validation and sanitization for prompts and context data sent to the FM components to help prevent injection events or unintended information disclosure.
  • Implement access controls and least-privilege principles for the OpenSearch integration to limit the data accessible to the LLM components.

Secure model fine tuning, deployment, and monitoring:

  • Establish secure and auditable fine-tuning pipelines, using trusted and vetted data sources, to help prevent tampering or the introduction of biases.
  • Implement secure deployment practices for FM and generative AI components, including access controls, secure communication channels, and encryption of model artifacts.
  • Continuously monitor FM and generative AI components for security vulnerabilities, performance issues, and unintended behavior.
  • Implement rate-limiting, throttling, and load-balancing mechanisms to help prevent denial-of-service events on FM and generative AI components.
  • Regularly review and audit FM and generative AI components for compliance with security policies, industry best practices, and regulatory requirements.

Guardrails and safety measures

  • Implement guardrails, which are safety measures designed to reduce harmful outputs and align the behavior of FMs and generative AI components with human values.
  • Use keyword-based filtering, metric-based thresholds, human oversight, and customized guardrails tailored to the specific risks and cultural and ethical norms of each application domain.
  • Monitor the effectiveness of guardrails through performance benchmarking and adversarial testing.

Jailbreak robustness testing

  • Conduct jailbreak robustness testing by prompting the FMs and generative AI components with a diverse set of jailbreak attempts across different prohibited scenarios to identify weaknesses and improve model robustness.

Anti-pattern 7: Lack of responsible AI governance and ethics

While the previous anti-patterns focused on technical security aspects, it is equally important to address the ethical and responsible governance of generative AI systems. Without strong governance frameworks, ethical guidelines, and accountability measures, chatbot applications can result in unintended consequences, biased outcomes, and a lack of transparency and trust.

Anti-pattern examples

  • Lack of an established ethical AI governance framework, including principles, policies, and processes to guide the responsible development and deployment of the generative AI chatbot application.
  • Insufficient measures to ensure transparency, explainability, and interpretability of the LLM and generative AI components, making it difficult to understand and audit their decision-making processes.
  • Absence of mechanisms for stakeholder engagement, public consultation, and consideration of societal impacts, potentially leading to a lack of trust and acceptance of the chatbot application.
  • Failure to address potential biases, discrimination, or unfairness in the training data, models, or outputs of the generative AI system.
  • Inadequate processes for testing, validation, and ongoing monitoring of the chatbot application’s ethical behavior and alignment with organizational values and societal norms.

Mitigation strategies

To minimize a lack of responsible AI governance and ethics, establish a comprehensive ethical AI governance framework, promote transparency and interpretability, engage stakeholders and consider societal impacts, address potential biases and fairness issues, implement continuous improvement and monitoring processes, and use guardrails and safety measures. These mitigation strategies help to foster trust, accountability, and ethical alignment in the development and deployment of the generative AI chatbot application, mitigating the risks of unintended consequences, biased outcomes, and a lack of transparency.

Ethical AI governance framework:

  • Establish an ethical AI governance framework, including principles, policies, and processes to guide the responsible development and deployment of the generative AI chatbot application.
  • Define clear ethical guidelines and decision-making frameworks to address potential ethical dilemmas, biases, or unintended consequences.
  • Implement accountability measures, such as designated ethics boards, ethics officers, or external advisory committees, to oversee the ethical development and deployment of the chatbot application.

Transparency and interpretability:

  • Implement measures to promote transparency and interpretability of the LLM and generative AI components, allowing for auditing and understanding of their decision-making processes.
  • Provide clear and accessible information to stakeholders and users about the chatbot application’s capabilities, limitations, and potential biases or ethical considerations.

Stakeholder engagement and societal impact:

  • Establish mechanisms for stakeholder engagement, public consultation, and consideration of societal impacts, fostering trust and acceptance of the chatbot application.
  • Conduct impact assessments to identify and mitigate potential negative consequences or risks to individuals, communities, or society.

Bias and fairness:

  • Address potential biases, discrimination, or unfairness in the training data, models, or outputs of the generative AI system through rigorous testing, bias mitigation techniques, and ongoing monitoring.
  • Promote diverse and inclusive representation in the development, testing, and governance processes to reduce potential biases and blind spots.

Continuous improvement and monitoring:

  • Implement processes for ongoing testing, validation, and monitoring of the chatbot application’s behavior and alignment with organizational values and societal norms.
  • Regularly review and update the AI governance framework, policies, and processes to address emerging ethical challenges, societal expectations, and regulatory developments.

Guardrails and safety measures:

  • Implement guardrails, such as Guardrails for Amazon Bedrock, which are safety measures designed to reduce harmful outputs and align the behavior of LLMs and generative AI components with human values and responsible AI policies.
  • Use Guardrails for Amazon Bedrock to define denied topics and content filters to remove undesirable and harmful content from interactions between users and your applications.
    • Define denied topics using natural language descriptions to specify topics or subject areas that are undesirable in the context of your application.
    • Configure content filters to set thresholds for filtering harmful content across categories such as hate, insults, sexuality, and violence based on your use cases and responsible AI policies.
    • Use the personally identifiable information (PII) redaction feature to redact information such as names, email addresses, and phone numbers from LLM-generated responses or block user inputs that contain PII.
  • Integrate Guardrails for Amazon Bedrock with CloudWatch to monitor and analyze user inputs and LLM responses that violate defined policies, enabling proactive detection and response to potential issues.
  • Monitor the effectiveness of guardrails through performance benchmarking and adversarial testing, continuously refining and updating the guardrails based on real-world usage and emerging ethical considerations.

Jailbreak robustness testing:

  • Conduct jailbreak robustness testing by prompting the LLMs and generative AI components with a diverse set of jailbreak attempts across different prohibited scenarios to identify weaknesses and improve model robustness.

Anti-pattern 8: Lack of comprehensive testing and validation

Inadequate testing and validation processes for the LLM system and the generative AI chatbot application can lead to unidentified vulnerabilities, performance bottlenecks, and availability issues. Without comprehensive testing and validation, organizations might fail to detect potential security risks, functionality gaps, or scalability and performance limitations before deploying the application in a production environment.

Anti-pattern examples

  • Lack of functional testing to validate the correctness and completeness of the LLM’s responses and the chatbot application’s features and functionalities.
  • Insufficient performance testing to identify bottlenecks, resource constraints, or scalability limitations under various load conditions.
  • Absence of security testing, such as penetration testing, vulnerability scanning, and adversarial testing to uncover potential security vulnerabilities or model exploits.
  • Failure to incorporate automated testing and validation processes into a continuous integration and continuous deployment (CI/CD) pipeline, leading to manual and one-time testing efforts that might overlook critical issues.
  • Inadequate testing of the chatbot application’s integration with external services and components, such as Amazon Bedrock, OpenSearch, and DynamoDB, potentially leading to compatibility issues or data integrity problems.

Mitigation strategies

To address the lack of comprehensive testing and validation, implement a robust testing strategy encompassing functional, performance, security, and integration testing. Integrate automated testing into a CI/CD pipeline, conduct security testing like threat modeling and penetration testing, and use adversarial validation techniques. Continuously improve testing processes to verify the reliability, security, and scalability of the generative AI chatbot application.

Comprehensive testing strategy:

  • Establish a comprehensive testing strategy that includes functional testing, performance testing, load testing, security testing, and integration testing for the LLM system and the overall chatbot application.
  • Define clear testing requirements, test cases, and acceptance criteria based on the application’s functional and non-functional requirements, as well as security and compliance standards.

Automated testing and CI/CD integration:

  • Incorporate automated testing and validation processes into a CI/CD pipeline, enabling continuous monitoring and assessment of the LLM’s performance, security, and reliability throughout its lifecycle.
  • Use automated testing tools and frameworks to streamline the testing process, improve test coverage, and facilitate regression testing.

Security testing and adversarial validation:

  • Conduct threat modeling exercises early in the design process and as soon as the design is finalized for the chatbot application architecture to proactively identify potential security risks and vulnerabilities. Subsequently, conduct regular security testing—including penetration testing, vulnerability scanning, and adversarial testing—to uncover and validate identified security vulnerabilities or model exploits.
  • Implement adversarial validation techniques, such as prompting the LLM with carefully crafted inputs designed to expose weaknesses or vulnerabilities, to improve the model’s robustness and security.

Performance and load testing:

  • Perform comprehensive performance and load testing to identify potential bottlenecks, resource constraints, or scalability limitations under various load conditions.
  • Use tools and techniques for load generation, stress testing, and capacity planning to ensure the chatbot application can handle anticipated user traffic and workloads.

Integration testing:

  • Conduct thorough integration testing to validate the chatbot application’s integration with external services and components, such as Amazon Bedrock, OpenSearch, and DynamoDB, maintaining seamless communication and data integrity.

Continuous improvement:

  • Regularly review and update the testing and validation processes to address emerging threats, new vulnerabilities, or changes in application requirements.
  • Use testing insights and results to continuously improve the LLM system, the chatbot application, and the overall security posture.

Common mitigation strategies for all anti-patterns

  • Regularly review and update security measures, access controls, monitoring mechanisms, and guardrails for LLM and generative AI components to address emerging threats, vulnerabilities, and evolving responsible AI best practices.
  • Conduct regular security assessments, penetration testing, and code reviews to identify and remediate vulnerabilities or misconfigurations related to logging, auditing, and non-repudiation mechanisms.
  • Stay current with security best practices, guidance, and updates from AWS and industry organizations regarding logging, auditing, and non-repudiation for generative AI applications.

Secure and responsible architecture blueprint

After discussing the baseline chatbot application architecture and identifying critical security anti-patterns associated with generative AI applications built using Amazon Bedrock, we now present the secure and responsible architecture blueprint. This blueprint (Figure 2) incorporates the recommended mitigation strategies and security controls discussed throughout the anti-pattern analysis.

Figure 2: Secure and responsible generative AI chatbot architecture blueprint

Figure 2: Secure and responsible generative AI chatbot architecture blueprint

In this target state architecture, unauthenticated users interact with the chatbot application through the frontend interface (1), where it’s crucial to mitigate the anti-pattern of insufficient input validation and sanitization by implementing secure coding practices and input validation. The user inputs are then processed through AWS Shield, AWS WAF, and CloudFront (2), which provide DDoS protection, web application firewall capabilities, and a content delivery network, respectively. These services help mitigate insufficient input validation, web exploits, and lack of comprehensive testing by using AWS WAF for input validation and conducting regular security testing.

The user requests are then routed through API Gateway (3), which acts as the entry point for the chatbot application, facilitating API connections to the Streamlit frontend. To address anti-patterns related to authentication, insecure communication, and LLM security, it’s essential to implement secure authentication protocols, HTTPS/TLS, access controls, and input validation within API Gateway. Communication between the VPC resources and API Gateway is secured through VPC endpoints (4), using PrivateLink for secure private communication and attaching endpoint policies to control which AWS principals can access the API Gateway service (8), mitigating the insecure communication channels anti-pattern.

The Streamlit application (5) is hosted on Amazon ECS in a private subnet within the VPC. It hosts the frontend interface and must implement secure coding practices and input validation to mitigate insufficient input validation and sanitization. User inputs are then processed by Lambda (6), a serverless compute service hosted within the VPC, which connects to Amazon Bedrock, OpenSearch, and DynamoDB through VPC endpoints (7). These VPC endpoints have endpoint policies attached to control access, enabling secure private communication between the Lambda function and the services, mitigating the insecure communication channels anti-pattern. Within Lambda, strict input validation rules, allow-lists, and user input sanitization are implemented to address the input validation anti-pattern.

User requests from the chatbot application are sent to Amazon Bedrock (12), a generative AI solution that powers the LLM capabilities. To mitigate the failure to secure FM and generative AI components anti-pattern, secure communication channels, input validation, and sanitization for prompts and context data must be implemented when interacting with Amazon Bedrock.

Amazon Bedrock interacts with OpenSearch Service (9) using Amazon Bedrock knowledge bases to retrieve relevant context data for the user’s question. The knowledge base is created by ingesting public documents from Amazon S3 (10). To mitigate the anti-pattern of insecure data storage and access controls, implement encryption at rest using AWS KMS and fine-grained IAM policies and roles for access control within OpenSearch Service. Titan Embeddings (11) are the format of the vector embeddings, which represent the documents stored in Amazon S3. The vector format enables similarity calculation and retrieval of relevant information (12). To address the failure to secure FM and generative AI components anti-pattern, secure integration with Titan Embeddings and input data validation should be implemented.

The knowledge base data, user prompts, and context data are processed by Amazon Bedrock (13) with the Claude 3 LLM (14). To address the anti-patterns of failure to secure FM and generative AI components, as well as lack of responsible AI governance and ethics, secure communication channels, input validation, ethical AI governance frameworks, transparency and interpretability measures, stakeholder engagement, bias mitigation, and guardrails like Guardrails for Amazon Bedrock should be implemented.

The generated responses and recommendations are then stored and retrieved in Amazon DynamoDB (15) by the Lambda function. To mitigate insecure data storage and access, encrypting data at rest with AWS KMS (16) and implement fine-grained access controls through IAM policies and roles.

Comprehensive logging, auditing, and monitoring mechanisms are provided by CloudTrail (17), CloudWatch (18), and AWS Config (19) to address the inadequate logging, auditing, and non-repudiation anti-pattern. See the Comprehensive logging and monitoring strategy section for detailed guidance on implementing comprehensive logging, auditing, and monitoring mechanisms using CloudTrail, CloudWatch, CloudWatch Logs, and AWS Config to address the inadequate logging, auditing, and non-repudiation anti-pattern; including logging API calls made to Amazon Bedrock service, monitoring Amazon Bedrock-specific metrics, capturing and analyzing Bedrock invocation logs, and monitoring and auditing the configuration of resources related to the chatbot application and Amazon Bedrock service.

IAM (20) plays a crucial role in the overall architecture and in mitigating anti-patterns related to authentication and insecure data storage and access. IAM roles and permissions are critical in enforcing secure authentication mechanisms, least privilege access, multi-factor authentication, and robust credential management across the various components of the chatbot application. Additionally, service control policies (SCPs) can be configured to restrict access to specific models or knowledge bases within Amazon Bedrock, preventing unauthorized access or use of sensitive intellectual property.

Finally, GuardDuty (21), Amazon Inspector (22), Security Hub (23), and Security Lake (24) have been included as additional recommended services to further enhance the security posture of the chatbot application. GuardDuty (21) provides threat detection across the control and data planes, Amazon Inspector (22) enables vulnerability assessments and continuous monitoring of Amazon ECS and Lambda workloads. Security Hub (23) offers centralized security posture management and compliance checks, while Security Lake (24) acts as a centralized data lake for log analysis, integrated with CloudTrail and SecurityHub.

Conclusion

By identifying critical anti-patterns and providing comprehensive mitigation strategies, you now have a solid foundation for a secure and responsible deployment of generative AI technologies in enterprise environments.

The secure and responsible architecture blueprint presented in this post serves as a comprehensive guide for organizations that want to use the power of generative AI while ensuring robust security, data protection, and ethical governance. By incorporating industry-leading security controls—such as secure authentication mechanisms, encrypted data storage, fine-grained access controls, secure communication channels, input validation and sanitization, comprehensive logging and auditing, secure FM integration and monitoring, and responsible AI guardrails—this blueprint addresses the unique challenges and vulnerabilities associated with generative AI applications.

Moreover, the emphasis on comprehensive testing and validation processes, as well as the incorporation of ethical AI governance principles, makes sure that you can not only mitigate potential risks, but also promote transparency, explainability, and interpretability of the LLM components, while addressing potential biases and ensuring alignment with organizational values and societal norms.

By following the guidance outlined in this post and depicted in the architectural blueprint, you can proactively identify and mitigate potential risks, enhance the security posture of your generative AI-based chatbot solutions, protect sensitive data and intellectual property, maintain regulatory compliance, and responsibly deploy LLMs and generative AI technologies in your enterprise environments.

 
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Magesh Dhanasekaran
Magesh Dhanasekaran

Magesh is a Security Architect at AWS. He has a proven track record providing information security consulting services to financial institutions and government agencies in Australia and the United States. Magesh uses his experience in cloud security architecture, digital transformation, and secure application development practices to provide security advice on AWS products and services. He currently holds multiple industry certifications.
Amy Tipple
Amy Tipple

Amy is a Senior Data Scientist with the Professional Services Data and Machine Learning team and has been with AWS for approximately four years. Amy has worked on several engagements involving generative AI and is an advocate for making sure that generative AI-related security is accessible and understandable for AWS users.

SaaS authentication: Identity management with Amazon Cognito user pools

Post Syndicated from Shubhankar Sumar original https://aws.amazon.com/blogs/security/saas-authentication-identity-management-with-amazon-cognito-user-pools/

Amazon Cognito is a customer identity and access management (CIAM) service that can scale to millions of users. Although the Cognito documentation details which multi-tenancy models are available, determining when to use each model can sometimes be challenging. In this blog post, we’ll provide guidance on when to use each model and review their pros and cons to help inform your decision.

Cognito overview

Amazon Cognito handles user identity management and access control for web and mobile apps. With Cognito user pools, you can add sign-up, sign-in, and access control to your apps. A Cognito user pool is a user directory within a specific AWS Region where users can authenticate and register for applications. In addition, a Cognito user pool is an OpenID Connect (OIDC) identity provider (IdP). App users can either sign in directly through a user pool or federate through a third-party IdP. Cognito issues a user pool token after successful authentication, which can be used to securely access backend APIs and resources.

Cognito issues three types of tokens:

  • ID token – Contains user identity claims like name, email, and phone number. This token type authenticates users and enables authorization decisions in apps and API gateways.
  • Access token – Includes user claims, groups, and authorized scopes. This token type grants access to API operations based on the authenticated user and application permissions. It also enables fine-grained, user-based access control within the application or service.
  • Refresh token – Retrieves new ID and access tokens when these are expired. Access and ID tokens are short-lived, while the refresh token is long-lived. By default, refresh tokens expire 30 days after the user signs in, but this can be configured to a value between 60 minutes and 10 years.

You can find more information on using tokens and their contents in the Cognito documentation.

Multi-tenancy approaches

Software as a service (SaaS) architectures often use silo, pool, or bridge deployment models, which also apply to CIAM services like Cognito. The silo model isolates tenants in dedicated resources. The pool model shares resources between tenants. The bridge model connects siloed and pooled components. This post compares the Cognito silo and pool models for SaaS identity management.

It’s also possible to combine the silo and pool models by having multiple tiers of resources. For example, you could have a siloed tier for sensitive tenant data along with a pooled tier for shared functionality. This is similar to the silo model but with added routing complexity to connect the tiers. When you have multiple pools or silos, this is a similar approach to the pure silo model but with more components to manage.

More detail on these models are included in the AWS SaaS Lens.

We’ve detailed five possible patterns in the following sections and explored the scenarios where each of the patterns can be used, along with the advantages and disadvantages for each. The rest of the post delves deeper into the details of these different patterns, enabling you to make an informed decision that best aligns with your unique requirements and constraints.

Pattern 1: Representing SaaS identity with custom attributes

To implement multi-tenancy in a SaaS application, tenant context needs to be associated with user identity. This allows implementation of the multi-tenant policies and strategies that comprise our SaaS application. Cognito has user pool attributes, which are pieces of information to represent identity. There are standard attributes, such as name and email, that describe the user identity. Cognito also supports custom attributes that can be used to hold information about the user’s relationship to a tenant, such as tenantId.

By using custom attributes for multi-tenancy in Amazon Cognito, the tenant context for each user can be stored in their user profile.

To enable multi-tenancy, you can add a custom attribute like tenantId to the user profile. When a new user signs up, this tenantId attribute can be set to a value indicating which tenant the user belongs to. For example, users with tenantId “1234” belong to Tenant A, while users with tenantId “5678” belong to Tenant B.

The tenantId attribute value gets returned in the ID token after a successful user authentication. (This value can also be added to the access token through customization by using a pre-token generation Lambda trigger.) The application can then inspect this claim to determine which tenant the user belongs to. The tenantId attribute is typically managed at the SaaS platform level and is read-only to users and the application layer. (Note: SaaS providers need to configure the tenantId attribute to be read-only.)

In addition to storing a tenant ID, you can use custom attributes to model additional tenant context. For instance, attributes like tenantName, tenantTier, or tenantRegion could be defined and set appropriately for each user to provide relevant informational context for the application. However, make sure not to use custom attributes as a database—they are meant to represent identity, not store application data. Custom attributes should only contain information that is relevant for authorization decisions and JSON web token (JWT) compactness and should be relatively static because their values are stored in the Cognito directory. Updating frequently changing data requires modifying the directory, which can be cumbersome.

The custom attributes themselves need to be defined at the time of creating the Amazon Cognito user pool, and there is a maximum of 50 custom attributes that you can create. Once the pool is created, these custom attribute fields will be present on every user profile in that user pool. However, they won’t have values populated yet. The actual tenant attribute values get populated only when a new user is created in the user pool. This can be done in two ways:

  1. During user sign-up, a post confirmation AWS Lambda trigger can be used to set the appropriate tenant attribute values based on the user’s input.
  2. An admin user can provision a new user through the AdminCreateUser API operation and specify the tenant attribute values at that time.

After user creation, the custom tenant attribute values can still be updated by an administrator through the AdminUpdateUserAttributes API operation or by a user with the UpdateUserAttributes API operation, if needed. But the key point is that the custom attributes themselves must be predefined at user pool creation, while the values get set later during user creation and provisioning flows. Figure 1 shows how custom attributes are associated with an ID token and used subsequently in downstream applications.

Figure 1: Associating tenant context with custom attributes

Figure 1: Associating tenant context with custom attributes

As shown in Figure 1:

  • The custom tenant attribute values from the user profile are included in the Cognito ID token that is generated after a successful user authentication. These values can be used for access control for other AWS services, such as Amazon API Gateway.
  • You can configure Amazon API Gateway with a Lambda authorizer function that validates the ID token signature (the aws-jwt-verify library can be used for this purpose) and inspects the tenant ID claim in each request.
  • Based on the tenant ID value extracted from the ID token, the Lambda authorizer can determine which backend resources and services each authenticated user is authorized to access.

You can use this method to provide fine-grained access control, as described in this blog post, by using tenant claims as context in addition to the user claims embedded within the token. This pattern of embedding information about the user’s identity, along with details on their associated tenant, in a single token is what AWS refers to as SaaS identity.

The multi-tenancy approaches of using siloed user pools, shared pools, or custom attributes rely on embedding tenant context within the user identity. This is accomplished by having Cognito include claims with tenant information in the JWTs issued after authentication.

The JWT encodes user identity information like the username, email address, and so on. By adding custom claims that contain tenant identifiers or metadata, the tenant context gets tightly coupled to the user identity. The embedded tenant context in the JWT allows applications to implement access control and authorization based on the associated tenant for each user.

This combination of user identity information and tenant context in the issued JWT represents the SaaS identity—a unified identity spanning both user and tenant dimensions. The application uses this SaaS identity for implementing multi-tenant logic and policies.

Pattern 2: Shared user pool (pool model)

A single, shared Amazon Cognito user pool simplifies identity management for multi-tenant SaaS applications. With one consolidated pool, changes and configurations apply across tenants in one place, which can reduce overhead.

For example, you can define password complexity rules and other settings once at the user pool level, and then these settings are shared across tenants. Adding new tenants is streamlined by using the settings in the existing shared pool, without duplicating setup per tenant. This avoids deploying isolated pools when onboarding new tenants.

Additionally, the tokens issued from the shared pool are signed by the same issuer. There is no tenant-specific issuer in the tokens when using a shared pool. For SaaS apps with common identity needs, a shared multi-tenant pool minimizes friction for rapid onboarding despite that loss of per-tenant customization.

Advantages of the pool model:

  • This model uses a single shared user pool for tenants. This simplifies onboarding by setting user attributes rather than configuring multiple user pools.
  • Tenants authenticate using the same application client and user pool, which keeps the SaaS client configuration simple.

Disadvantages of the pool model:

  • Sharing one pool means that settings like password policies and MFA apply uniformly, without customization per tenant.
  • Some resource quotas are managed at a user pool level (for example, the number of application clients or customer attributes), so you need to consider quotas carefully when adopting this model.

Pattern 3: Group-based multi-tenancy (pool model)

Amazon Cognito user pools give an administrator the capability to add groups and associate users with groups. Doing so introduces specific attributes (cognito:groups and cognito:roles) that are managed and maintained by Cognito and available within the ID tokens. (Access tokens only have the cognito:groups attribute.) These groups can be used to enable multi-tenancy by creating a separate group for each tenant. Users can be assigned to the appropriate tenant group based on the value of a custom tenantId attribute. The application can then implement authorization logic to limit access to resources and data based on the user’s tenant group membership that is encoded in the tokens. This provides isolation and access control across tenants, making use of the native group constructs in Cognito rather than relying entirely on custom attributes.

The group information contained in the tokens can then be used by downstream services to make authorization decisions. Groups are often combined with custom attributes for more granular access control. For example, in the SaaS Factory Serverless SaaS – Reference Solution developed by the AWS SaaS Factory team, roles are specified by using Cognito groups, but tenant identity relies on a custom tenantId attribute. The tenant ID attribute provides isolation between tenants, while the groups define individual user roles and access privileges that apply within a tenant.

Figure 2 shows how groups are associated with the user and then the Lambda authorizer can determine which backend resources and services each authenticated user is authorized to access.

Figure 2: Group-based multi-tenancy

Figure 2: Group-based multi-tenancy

In this model, groups can provide role-based controls, while custom attributes like tenant ID provide the contextual information needed to enforce tenant isolation. The authorization decisions are then made by evaluating a user’s group memberships and attribute values in order to provide fine-grained access tailored to each tenant and user. So groups directly enable role-based checks, while custom attributes provide broader context for conditional access across tenants. Together they can provide the data that is needed to implement granular authorization in a multi-tenant application.

Advantages of group-based multi-tenancy:

  • This model uses a single shared user pool for tenants, so that onboarding requires setting user attributes rather than configuring multiple pools.
  • Tenants authenticate through the same application client and pool, keeping SaaS client configuration straightforward.

Disadvantages of group-based multi-tenancy:

  • Sharing one pool means that settings like password policies and MFA apply uniformly without per-tenant customization.
  • There is a limit of 10,000 groups per user pool.

Pattern 4: Dedicated user pool per tenant (silo model)

Another common approach for multi-tenant identity with Cognito is to provision a separate user pool for each tenant. A Cognito user pool is a user directory, so using distinct pools provides maximum isolation. However, this approach requires that you implement tenant routing logic in the application to determine which user pool a user should authenticate against, based on their tenant.

Tenant routing

With separate user pools per tenant (or application clients, as we’ll discuss later), the application needs logic to route each user to the appropriate pool (or client) for authentication. There are a few options that you can use for this approach:

  • Use a subdomain in the URL that maps to the tenant—for example, tenant1.myapp.com routes to Tenant 1’s user pool. This requires mapping subdomains to tenant pools.
  • Rely on unique email domains per tenant—for example, @tenant1.com goes to Tenant 1’s pool. This requires mapping email domains to pools.
  • Have the user select their tenant from a dropdown list. This requires the tenant choices to be configured.
  • Prompt the user to enter a tenant ID code that maps to pools. This requires mapping codes to pools.

No matter the approach you chose, the key requirements are the following:

  • A data point to identify the tenant (such as subdomain, email, selection, or code).
  • A mapping dataset that takes tenant identifying information from the user and looks up the corresponding user pool to route to for authentication.
  • Routing logic to redirect to the appropriate user pool.

For example, the AWS SaaS Factory Serverless Reference Architecture uses the approach shown in Figure 3.

Figure 3: Dedicated user pool per tenant

Figure 3: Dedicated user pool per tenant

The workflow is as follows:

  1. The user enters their tenant name during sign-in.
  2. The tenant name retrieves tenant-specific information like the user pool ID, application client ID, and API URLs.
  3. Tenant-specific information is passed to the SaaS app to initialize authentication to the correct user pool and app client, and this is used to initialize an authorization code flow.
  4. The app redirects to the Cognito hosted UI for authentication.
  5. User credentials are validated, and Cognito issues an OAuth code.
  6. The OAuth code is exchanged for a JWT token from Cognito.
  7. The JWT token is used to authenticate the user to access microservices.

Advantages of the one pool per tenant model:

  • Users exist in a single directory with no cross-tenant visibility. Tokens are issued and signed with keys that are unique to that pool.
  • Each pool can have customized security policies, like password rules or MFA requirements per tenant.
  • Pools can be hosted in different AWS Regions to meet data residency needs.

Potential disadvantages of the one pool per tenant model:

  • There are limits on the number of pools per account. (The default is 1,000 pools, and the maximum is 10,000.)
  • Additional automation is required to create multiple pools, especially with customized configurations.
  • Applications must implement tenant routing to direct authentication requests to the correct user pool.
  • Troubleshooting can be more difficult, because configuration of each pool is managed separately and tenant routing functionality is added.

In summary, separate user pools maximize tenant isolation but require more complex provisioning and routing. You might also need to consider limits on the pool count for large multi-tenant deployments.

Pattern 5: Application client per tenant (bridge model)

You can achieve some extra tenant isolation by using separate application clients per tenant in a single user pool, in addition to using groups and custom attributes. Cognito configurations from the application client, such as OAuth scopes, hosted UI customization, and security policies can be specific to each tenant. The application client also enables external IdP federation per tenant. However, user pool–level settings, such as password policy, remain shared.

Figure 4 shows how a single user pool can be configured with multiple application clients. Each of those application clients is assigned to a tenant. However, this approach requires that you implement tenant routing logic in the application to determine which application client a tenant should be mapped to (similar to the approach we discussed for the shared user pool). Once the user is authenticated, you can configure Amazon API Gateway with a Lambda authorizer function that validates the ID token signature. Subsequently, the Lambda authorizer can determine which backend resources and services each authenticated user is authorized to access.

Figure 4: Application client based multi-tenancy

Figure 4: Application client based multi-tenancy

For tenants that want to use their own IdP through SAML or OpenID Connect federation, you can create a dedicated application client that will redirect users to authenticate with the tenant’s federated IdP. This has some key benefits:

  • If a single external IdP is enabled on the application client, the hosted UI automatically redirects users without presenting Cognito sign-in screens. This provides a familiar sign-in experience for tenants and is frictionless if users have existing sessions with the tenant IdP.
  • Management of user activities like joining and leaving, passwords, and other tasks are entirely handled by the tenant in their own IdP. The SaaS provider doesn’t need to get involved in these processes.

Importantly, even with federation, Cognito still issues tokens after successful external authentication. So the SaaS provider gets consistent tokens from Cognito to validate during authorization, regardless of the IdP.

Attribute mapping

When federating with an external IdP, Amazon Cognito can dynamically map attributes to populate the tokens it issues. This allows attributes like groups, email addresses, and roles created in the IdP to be passed to Cognito during authentication and added to the tokens.

The mapping occurs upon every sign-in, overwriting the existing mapped attributes to stay in sync with the latest IdP values. Therefore, changes made in the external IdP related to mapped attributes are reflected in Cognito after signing in. If a mapped attribute is required in the Cognito user pool, like email for sign-in, it must have an equivalent in the IdP to map. The target attributes in Cognito must be configured as mutable, since immutable attributes cannot be overwritten after creation, even through mapping.

Important: For SaaS identity, tenant attributes should be defined in Cognito rather than mapped from an external IdP. This helps to prevent tenants from tampering with values and maintains isolation. However, user attributes like groups and roles can be mapped from the tenant’s IdP to manage permissions. This allows tenants to configure application roles by using their own IdP groups.

Advantages of the bridge model:

  • This model enables tenant-specific configuration like OAuth scopes, UI, and IdPs.
  • Tenant users access familiar workflows through external IdPs, and when using external IdPs, tenant user management is handled externally.
  • No custom claim mappings are needed, but can be used optionally.
  • Cognito still issues tokens for authorization.

Disadvantages of the bridge model:

  • Requires routing users to the correct app client per tenant.
  • There is a limit on the number of app clients per user pool.
  • Some user pool settings remain shared, such as password policy.
  • There is no dynamic group claim modification.

Conclusion

In this blog post, we explored various ways Amazon Cognito user pools can enable multi-tenant identity for SaaS solutions. A single shared user pool simplifies management but limits the option to customize user pool–level policies, while separate pools maximize isolation and configurability at the cost of complexity. If you use multiple application clients, you can balance tailored options like external IdPs and OAuth scopes with centralized policies in the user pool. Custom claim mappings provide flexibility but require additional logic.

These two approaches can also be combined. For example, you can have dedicated user pools for select high-tier tenants while others share a multi-tenant pool. The optimal choice depends on the specific tenant needs and on the customization that is required.

In this blog post, we have mainly focused on a static approach. You can also use a pre-token generation Lambda trigger to modify tokens by adding, changing, or removing claims dynamically. The trigger can also override the group membership in both the identity and access tokens. Other claim changes only apply to the ID token. A common use case for this trigger is injecting tenant attributes into the token dynamically.

Evaluate the pros and cons of each approach against the requirements of the SaaS architecture and tenants. Often a hybrid model works best. Cognito constructs like user pools, IdPs, and triggers provide various levers that you can use to fine-tune authentication and authorization across tenants.

For further reading on these topics, see the Common Amazon Cognito scenarios topic in the Cognito Developer Guide and the related blog post How to Use Cognito Pre-Token Generation trigger to Customize Claims in ID Tokens.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on Amazon Cognito re:Post

Shubhankar Sumar

Shubhankar Sumar
Shubhankar is a Senior Solutions Architect at AWS, working with enterprise software and SaaS customers across the UK to help architect secure, scalable, efficient and cost-effective systems. He is an experienced software engineer having built many SaaS solutions. Shubhankar specializes in building multi-tenant platform on the cloud. He is also working closely with customers to bring in GenAI capabilities in their SaaS application.

Owen Hawkins

Owen Hawkins
With over 20 years of information security experience, Owen brings deep expertise to his role as a Principal Solutions Architect at AWS. He works closely with ISV customers, drawing on his extensive background in digital banking security. Owen specializes in SaaS and multi-tenant architecture. He is passionate about enabling companies to securely embrace the cloud. Solving complex challenges excites Owen, who thrives on finding innovative ways to protect and run applications on AWS.

How AWS tracks the cloud’s biggest security threats and helps shut them down

Post Syndicated from CJ Moses original https://aws.amazon.com/blogs/security/how-aws-tracks-the-clouds-biggest-security-threats-and-helps-shut-them-down/

Threat intelligence that can fend off security threats before they happen requires not just smarts, but the speed and worldwide scale that only AWS can offer.

Organizations around the world trust Amazon Web Services (AWS) with their most sensitive data. One of the ways we help secure data on AWS is with an industry-leading threat intelligence program where we identify and stop many kinds of malicious online activities that could harm or disrupt our customers or our infrastructure. Producing accurate, timely, actionable, and scalable threat intelligence is a responsibility we take very seriously, and is something we invest significant resources in.

Customers increasingly ask us where our threat intelligence comes from, what types of threats we see, how we act on what we observe, and what they need to do to protect themselves. Questions like these indicate that Chief Information Security Officers (CISOs)—whose roles have evolved from being primarily technical to now being a strategic, business-oriented function—understand that effective threat intelligence is critical to their organizations’ success and resilience. This blog post is the first of a series that begins to answer these questions and provides examples of how AWS threat intelligence protects our customers, partners, and other organizations.

High-fidelity threat intelligence that can only be achieved at the global scale of AWS

Every day across AWS infrastructure, we detect and thwart cyberattacks. With the largest public network footprint of any cloud provider, AWS has unparalleled insight into certain activities on the internet, in real time. For threat intelligence to have meaningful impact on security, large amounts of raw data from across the internet must be gathered and quickly analyzed. In addition, false positives must be purged. For example, threat intelligence findings could erroneously indicate an insider threat when an employee is logged accessing sensitive data after working hours, when in reality, that employee may have been tasked with a last-minute project and had to work overnight. Producing threat intelligence is very time consuming and requires substantial human and digital resources. Artificial intelligence (AI) and machine learning can help analysts sift through and analyze vast amounts of data. However, without the ability to collect and analyze relevant information across the entire internet, threat intelligence is not very useful. Even for organizations that are able to gather actionable threat intelligence on their own, without the reach of global-scale cloud infrastructure, it’s difficult or impossible for time-sensitive information to be collectively shared with others at a meaningful scale.

The AWS infrastructure radically transforms threat intelligence because we can significantly boost threat intelligence accuracy—what we refer to as high fidelity—because of the sheer number of intelligence signals (notifications generated by our security tools) we can observe. And we constantly improve our ability to observe and react to threat actors’ evolving tactics, techniques, and procedures (TTPs) as we discover and monitor potentially harmful activities through MadPot, our sophisticated globally-distributed network of honeypot threat sensors with automated response capabilities.

With our global network and internal tools such as MadPot, we receive and analyze thousands of different kinds of event signals in real time. For example, MadPot observes more than 100 million potential threats every day around the world, with approximately 500,000 of those observed activities classified as malicious. This means high-fidelity findings (pieces of relevant information) produce valuable threat intelligence that can be acted on quickly to protect customers around the world from harmful and malicious online activities. Our high-fidelity intelligence also generates real-time findings that are ingested into our intelligent threat detection security service Amazon GuardDuty, which automatically detects threats for millions of AWS accounts.

AWS’s Mithra ranks domain trustworthiness to help protect customers from threats

Let’s dive deeper. Identification of malicious domains (physical IP addresses on the internet) is crucial to effective threat intelligence. GuardDuty generates various kinds of findings (potential security issues such as anomalous behaviors) when AWS customers interact with domains, with each domain being assigned a reputation score derived from a variety of metrics that rank trustworthiness. Why this ranking? Because maintaining a high-quality list of malicious domain names is crucial to monitoring cybercriminal behavior so that we can protect customers. How do we accomplish the huge task of ranking? First, imagine a graph so large (perhaps one of the largest in existence) that it’s impossible for a human to view and comprehend the entirety of its contents, let alone derive usable insights.

Meet Mithra. Named after a mythological rising sun, Mithra is a massive internal neural network graph model, developed by AWS, that uses algorithms for threat intelligence. With its 3.5 billion nodes and 48 billion edges, Mithra’s reputation scoring system is tailored to identify malicious domains that customers come in contact with, so the domains can be ranked accordingly. We observe a significant number of DNS requests per day—up to 200 trillion in a single AWS Region alone—and Mithra detects an average of 182,000 new malicious domains daily. By assigning a reputation score that ranks every domain name queried within AWS on a daily basis, Mithra’s algorithms help AWS rely less on third parties for detecting emerging threats, and instead generate better knowledge, produced more quickly than would be possible if we used a third party.

Mithra is not only able to detect malicious domains with remarkable accuracy and fewer false positives, but this super graph is also capable of predicting malicious domains days, weeks, and sometimes even months before they show up on threat intel feeds from third parties. This world-class capability means that we can see and act on millions of security events and potential threats every day.

By scoring domain names, Mithra can be used in the following ways:

  • A high-confidence list of previously unknown malicious domain names can be used in security services like GuardDuty to help protect our customers. GuardDuty also allows customers to block malicious domains and get alerts for potential threats.
  • Services that use third-party threat feeds can use Mithra’s scores to significantly reduce false positives.
  • AWS security analysts can use scores for additional context as part of security investigations.

Sharing our high-fidelity threat intelligence with customers so they can protect themselves

Not only is our threat intelligence used to seamlessly enrich security services that AWS and our customers rely on, we also proactively reach out to share critical information with customers and other organizations that we believe may be targeted or potentially compromised by malicious actors. Sharing our threat intelligence enables recipients to assess information we provide, take steps to reduce their risk, and help prevent disruptions to their business.

For example, using our threat intelligence, we notify organizations around the world if we identify that their systems are potentially compromised by threat actors or appear to be running misconfigured systems vulnerable to exploits or abuse, such as open databases. Cybercriminals are constantly scanning the internet for exposed databases and other vulnerabilities, and the longer a database remains exposed, the higher the risk that malicious actors will discover and exploit it. In certain circumstances when we receive signals that suggest a third-party (non-customer) organization may be compromised by a threat actor, we also notify them because doing so can help head off further exploitation, which promotes a safer internet at large.

Often, when we alert customers and others to these kinds of issues, it’s the first time they become aware that they are potentially compromised. After we notify organizations, they can investigate and determine the steps they need to take to protect themselves and help prevent incidents that could cause disruptions to their organization or allow further exploitation. Our notifications often also include recommendations for actions organizations can take, such as to review security logs for specific domains and block them, implement mitigations, change configurations, conduct a forensic investigation, install the latest patches, or move infrastructure behind a network firewall. These proactive actions help organizations to get ahead of potential threats, rather than just reacting after an incident occurs.

Sometimes, the customers and other organizations we notify contribute information that in turn helps us assist others. After an investigation, if an affected organization provides us with related indicators of compromise (IOCs), this information can be used to improve our understanding of how a compromise occurred. This understanding can lead to critical insights we may be able to share with others, who can use it to take action to improve their security posture—a virtuous cycle that helps promote collaboration aimed at improving security. For example, information we receive may help us learn how a social engineering attack or particular phishing campaign was used to compromise an organization’s security to install malware on a victim’s system. Or, we may receive information about a zero-day vulnerability that was used to perpetrate an intrusion, or learn how a remote code execution (RCE) attack was used to run malicious code and other malware to steal an organization’s data. We can then use and share this intelligence to protect customers and other third parties. This type of collaboration and coordinated response is more effective when organizations work together and share resources, intelligence, and expertise.

Three examples of AWS high-fidelity threat intelligence in action

Example 1: We became aware of suspicious activity when our MadPot sensors indicated unusual network traffic known as backscatter (potentially unwanted or unintended network traffic that is often associated with a cyberattack) that contained known IOCs associated with a specific threat attempting to move across our infrastructure. The network traffic appeared to be originating from the IP space of a large multinational food service industry organization and flowing to Eastern Europe, suggesting potential malicious data exfiltration. Our threat intelligence team promptly contacted the security team at the affected organization, which wasn’t an AWS customer. They were already aware of the issue but believed they had successfully addressed and removed the threat from their IT environment. However, our sensors indicated that the threat was continuing and not resolved, showing that a persistent threat was ongoing. We requested an immediate escalation, and during a late-night phone call, the AWS CISO shared real-time security logs with the CISO of the impacted organization to show that large amounts of data were still being suspiciously exfiltrated and that urgent action was necessary. The CISO of the affected company agreed and engaged their Incident Response (IR) team, which we worked with to successfully stop the threat.

Example 2: Earlier this year, Volexity published research detailing two zero-day vulnerabilities in the Ivanti Connect Secure VPN, resulting in the publication of CVE-2023-46805 (an authentication-bypass vulnerability) and CVE-2024-21887 (a command-injection vulnerability found in multiple web components). The U.S. Cybersecurity and Infrastructure Security Agency (CISA) issued a cybersecurity advisory on February 29, 2024 on this issue. Earlier this year, Amazon security teams enhanced our MadPot sensors to detect attempts by malicious actors to exploit these vulnerabilities. Using information obtained by the MadPot sensors, Amazon identified multiple active exploitation campaigns targeting vulnerable Ivanti Connect Secure VPNs. We also published related intelligence in the GuardDuty common vulnerabilities and exposures (CVE) feed, enabling our customers who use this service to detect and stop this activity if it is present in their environment. (For more on CVSS metrics, see the National Institute of Standards and Technology (NIST) Vulnerability Metrics.)

Example 3: Around the time Russia began its invasion of Ukraine in 2022, Amazon proactively identified infrastructure that Russian threat groups were creating to use for phishing campaigns against Ukrainian government services. Our intelligence findings were integrated into GuardDuty to automatically protect AWS customers while also providing the information to the Ukrainian government for their own protection. After the invasion, Amazon identified IOCs and TTPs of Russian cyber threat actors that appeared to target certain technology supply chains that could adversely affect Western businesses opposed to Russia’s actions. We worked with the targeted AWS customers to thwart potentially harmful activities and help prevent supply chain disruption from taking place.

AWS operates the most trusted cloud infrastructure on the planet, which gives us a unique view of the security landscape and the threats our customers face every day. We are encouraged by how our efforts to share our threat intelligence have helped customers and other organizations be more secure, and we are committed to finding even more ways to help. Upcoming posts in this series will include other threat intelligence topics such as mean time to defend, our internal tool Sonaris, and more.

 
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Max Peterson

CJ Moses
CJ Moses is the Chief Information Security Officer at Amazon. In his role, CJ leads security engineering and operations across Amazon. His mission is to enable Amazon businesses by making the benefits of security the path of least resistance. CJ joined Amazon in December 2007, holding various roles including Consumer CISO, and most recently AWS CISO, before becoming CISO of Amazon in September of 2023.

Prior to joining Amazon, CJ led the technical analysis of computer and network intrusion efforts at the Federal Bureau of Investigation’s Cyber Division. CJ also served as a Special Agent with the Air Force Office of Special Investigations (AFOSI). CJ led several computer intrusion investigations seen as foundational to the security industry today.

CJ holds degrees in Computer Science and Criminal Justice, and is an active SRO GT America GT2 race car driver.

OSPAR 2024 report now available with 163 services in scope

Post Syndicated from Joseph Goh original https://aws.amazon.com/blogs/security/ospar-2024-report-available-with-163-services-in-scope/

Amazon Web Services (AWS) is pleased to announce the completion of our annual Outsourced Service Provider’s Audit Report (OSPAR) audit cycle on July 1, 2024. The 2024 OSPAR certification cycle includes the addition of 10 new services in scope, bringing the total number of services in scope to 163 in the AWS Asia Pacific (Singapore) Region.

Newly added services in scope include the following:

The Association of Banks in Singapore (ABS) has established the Guidelines on Control Objectives and Procedures for Outsourced Service Providers to provide baseline controls criteria that Outsourced Service Providers (“OSPs”) operating in Singapore should have in place. Successfully completing the OSPAR assessment demonstrates that AWS has implemented a robust system of controls that adhere to these guidelines. This underscores our commitment to fulfill the security expectations for cloud service providers set by the financial services industry in Singapore.

Customers can use OSPAR to streamline their due diligence processes, thereby reducing the effort and costs associated with compliance. OSPAR remains a core assurance program for our financial services customers, as it is closely aligned with local regulatory requirements from the Monetary Authority of Singapore (MAS).

You can download the latest OSPAR report from AWS Artifact, a self-service portal for on-demand access to AWS compliance reports. Sign in to AWS Artifact in the AWS Management Console, or learn more at Getting Started with AWS Artifact. The list of services in scope for OSPAR is available in the report, and is also available on the AWS Services in Scope by Compliance Program webpage.

As always, we’re committed to bringing new services into the scope of our OSPAR program based on your architectural and regulatory needs. If you have questions about the OSPAR report, contact your AWS account team.

If you have feedback about this post, submit comments in the Comments section below.

Joseph Goh

Joseph Goh
Joseph is the APJ ASEAN Lead at AWS, based in Singapore. He leads security audits, certifications, and compliance programs across the Asia Pacific region. Joseph is passionate about delivering programs that build trust with customers and providing them assurance on cloud security.

Federated access to Amazon Athena using AWS IAM Identity Center

Post Syndicated from Ajay Rawat original https://aws.amazon.com/blogs/security/federated-access-to-amazon-athena-using-aws-iam-identity-center/

Managing Amazon Athena through identity federation allows you to manage authentication and authorization procedures centrally. Athena is a serverless, interactive analytics service that provides a simplified and flexible way to analyze petabytes of data.

In this blog post, we show you how you can use the Athena JDBC driver (which includes a browser Security Assertion Markup Language (SAML) plugin) to connect to Athena from third-party SQL client tools, which helps you quickly implement identity federation capabilities and multi-factor authentication (MFA). This enables automation and enforcement of data access policies across your organization.

You can use AWS IAM Identity Center to federate access to users to AWS accounts. IAM Identity Center integrates with AWS Organizations to manage access to the AWS accounts under your organization. In this post, you will learn how you can integrate the Athena browser-based SAML plugin to add single sign-on (SSO) and MFA capability with your federation identity provider (IdP).

Prerequisites

To implement this solution, you must have the follow prerequisites:

Note: Lake Formation only supports a single role in the SAML assertion. Multiple roles cannot be used.

Solution overview

Figure 1: Solution architecture

Figure 1: Solution architecture

To implement the solution, complete the steps shown in Figure 1:

  1. An IAM Identity Center administrator configures two custom SAML applications.
  2. An IAM Identity Center administrator configures the attribute mappings and custom attribute mappings for each SAML application and then grants users or groups access to the SAML applications.
  3. An IAM administrator sets up an IAM IdP and uploads corresponding metadata document for each SAML application in their AWS account.
  4. An IAM administrator sets up two IAM roles (sensitive and non-sensitive) and permissions in their AWS account.
  5. A Lake Formation administrator grants two IAM role permissions to the corresponding database and tables.

The solution workflow consists of the following high-level steps as shown in Figure 1:

  1. A user initiates a connection through SQL client.
  2. The SQL client redirects the user to the AWS access portal URL, which is configured in the JDBC client tool for the user authentication.
  3. The user enters workforce identity credentials (username and password). Then selects Sign in.
  4. The AWS access portal verifies the user’s identity. IAM Identity Center redirects the request to the Identity Center authentication service to validate the user’s credentials.
  5. If MFA is enabled for the user, then they are prompted to authenticate their MFA device.
    1. MFA is initiated.
    2. The user enters or approves the MFA details.
    3. The user’s MFA is successfully completed.
  6. The user selects an application.
    1. After successful authentication, the user will be signed in to the AWS access portal. Under the applications tab, they can view available assigned applications.
    2. The user selects a SAML application.
  7. IAM Identity Center redirects the request to the Identity Center authentication service to validate the user’s access to a SAML application.
  8. The user uses the client to run a SQL query.
  9. The client makes a call to Athena to retrieve the table and associated metadata from the Data Catalog.
  10. Athena requests access to the data from Lake Formation.
  11. Lake Formation invokes the AWS Security Token Service (AWS STS).
    1. Lake Formation invokes AWS STS. Lake Formation obtains temporary AWS credentials with permissions of the defined IAM role (sensitive or non-sensitive) associated with the data lake location.
    2. Lake Formation returns temporary credentials to Athena.
  12. Athena uses the temporary credentials to retrieve data objects from Amazon S3.
  13. The Athena engine successfully runs the query and returns the results to the client.

Solution walkthrough

The walkthrough includes eight sections that will guide you through the process of configuring an identity provider and SAML applications, defining roles, managing access to those roles using Lake Formation, and setting up third party SQL clients such as SQL Workbench to connect to your data store and query your data through Athena.

Step 1: Federate onboarding

Federating onboarding is done using a customer managed application. The steps occur within the IAM Identity Center account. As part of federated onboarding, you need to create Identity Center groups. Groups are a collection of people who have the same security rights and permissions. You can create groups and add users to the groups. Create one Identity Center group for sensitive data and another for non-sensitive data to provide distinct access to different classes of data sets. You can assign access to Identity Center applications to a user or group.

To federate onboarding:

  1. Open the AWS Management Console using the IAM Identity Center account and go to IAM Identity Center. Select Applications from the navigation pane and then choose Add Application.

    Figure 2: Add an IAM Identity Center application

    Figure 2: Add an IAM Identity Center application

  2. Select I have an application I want to set up set up, select SAML 2.0 application type, and then choose Next.

    Figure 3: IAM Identity Center application types

    Figure 3: IAM Identity Center application types

  3. Under configure application, enter an application Display name (such as Athena Sensitive Application) and Description. Leave the application properties empty.
  4. To download the SAML metadata file, go to the IAM Identity Center metadata section and choose Download. You will need to have this file available in step 4 when configuring a SAML IdP.
  5. Under the Application properties, add the following:
    1. Enter http://localhost:7890/athena/ as the Application ACS URL.
    2. Enter urn:amazon:webservices as the Application SAML audience.
  6. Choose Submit.
  7. Select Application and then select your application name under Customer managed.
  8. Choose Action, and then select Edit attribute mappings.

    Figure 4: Configuring SAML application attribute mappings

    Figure 4: Configuring SAML application attribute mappings

  9. Update attributes as listed in the following table:

    Attribute Map Format
    Subject ${user:email} emailAddress
    https://aws.amazon.com/SAML
    /Attributes/RoleSessionName
    ${user:email} unspecified
    https://aws.amazon.com/SAML
    /Attributes/Role
    <Athena IAM Role Name from Target Account (e.g. Sensitive-IAM-Role or Non-Sensitive-IAM-Role)>, <saml- IdP ARN>

    For example:

    arn:aws:iam::account-number:role/sensitive,arn:aws:iam::account-number:saml-provider/provider-name

    unspecified
  10. Choose Assign users and groups to assign groups to the Custom SAML 2.0 applications.

    Figure 5: Assigning user and groups to SAML application

    Figure 5: Assigning user and groups to SAML application

  11. Repeat steps 2 through 9 for the non-sensitive data group using Athena Non-Sensitive Application as the application display name.

Step 2: Create a SAML IdP

You must create a SAML IdP that points to the federated service. Before you can create a SAML IdP, you must obtain the SAML metadata document from the federated service’s onboarding section. This involves uploading some metadata about the federated service and naming the new provider.

To create an IdP:

  1. From the IAM console, choose Identity providers, then choose Create Provider.

    Figure 6: Create IAM IdPs

    Figure 6: Create IAM IdPs

  2. Select SAML as the configure provider type.
  3. Enter your provider name, for example FederateDemo, for a testing IdP.
  4. From Metadata document, choose File, and browse to where you saved the Federate metadata file from step 4 of Federate onboarding.
  5. Verify the configuration and choose Add provider.
  6. Write down the IdPs Amazon Resource Name (ARN).

Step 3: Create IAM roles and policies

For this step, create two IAM roles (sensitive-iam-role and non-sensitive-iam-role), along with custom identity-based policies and a trust policy for both IAM roles. The trust policy defines which principals can assume the role and under which conditions. Additionally, you must create custom identity-based policies for both IAM roles. These policies can be attached to an IAM role to specify the actions that the principal can perform on the specified resources.

To create IAM roles:

  1. Using the data lake administrator account, go to the IAM console
  2. In the navigation pane of the console, select Roles, and then choose Create role.
  3. Select Custom trust policy as the type. Paste the following custom trust policy for the role. Replace the federated ARN(<account-id>) with the ARN of the IdP from step 6 of Create a SAML IdP (<idp-federation-name>).
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "SamlTrustPolicy",
          "Effect": "Allow",
          "Principal": {
            "Federated": "arn:aws:iam::<account-id>:saml-provider/<idp-federation-name>"
          },
          "Action": "sts:AssumeRoleWithSAML",
          "Condition": {
            "StringEquals": {
              "saml:aud": [
                "http://localhost:7890/athena/",
                "https://signin.aws.amazon.com/saml"
              ]
            }
          }
        }
      ]
    }

  4. Choose Next.
  5. Enter sensitive-iam-role as the custom role name.
  6. Review the role and then choose Create role.
  7. Repeat steps 1 through 6 and enter non-sensitive-iam-role at step 4.

To create IAM policies:

  1. From the data lake administrator account, select IAM, and then choose Policies.
  2. Choose Create Policy. The following are the custom policies for sensitive-iam-role and non-sensitive-iam-role.
  3. Insert the following policy and update the S3 bucket name (<3-bucket-name>), AWS Region (<region>) account ID (<account-id>), CloudWatch alarm name (<AlarmName>), Athena workgroup name (sensitive or non-sensitive) (<WorkGroupName>), KMS key alias name (<KMS-key-alias-name>), and organization ID (<aws-PrincipalOrgID>).
    {
      "Statement": [
        {
          "Action": [
            "lakeformation:SearchTablesByLFTags",
            "lakeformation:SearchDatabasesByLFTags",
            "lakeformation:ListLFTags",
            "lakeformation:GetResourceLFTags",
            "lakeformation:GetLFTag",
            "lakeformation:GetDataAccess",
            "glue:SearchTables",
            "glue:GetTables",
            "glue:GetTable",
            "glue:GetPartitions",
            "glue:GetDatabases",
            "glue:GetDatabase"
          ],
          "Effect": "Allow",
          "Resource": "*",
          "Sid": "LakeformationAccess"
        },
        {
          "Action": [
            "s3:PutObject",
            "s3:ListMultipartUploadParts",
            "s3:ListBucketMultipartUploads",
            "s3:ListBucket",
            "s3:GetObject",
            "s3:GetBucketLocation",
            "s3:CreateBucket",
            "s3:AbortMultipartUpload"
          ],
          "Effect": "Allow",
          "Resource": [
            "arn:aws:s3:::<s3-bucket-name>/*",
            "arn:aws:s3:::<s3-bucket-name>"
          ],
          "Sid": "S3Access"
        },
        {
          "Action": "s3:ListAllMyBuckets",
          "Effect": "Allow",
          "Resource": "*",
          "Sid": "AthenaS3ListAllBucket"
        },
        {
          "Action": [
            "cloudwatch:PutMetricAlarm",
            "cloudwatch:DescribeAlarms"
          ],
          "Effect": "Allow",
          "Resource": [
            "arn:aws:cloudwatch:<region>:<account-id>:alarm:<AlarmName>"
          ],
          "Sid": "CloudWatchLogs"
        },
        {
          "Action": [
            "athena:UpdatePreparedStatement",
            "athena:StopQueryExecution",
            "athena:StartQueryExecution",
            "athena:ListWorkGroups",
            "athena:ListTableMetadata",
            "athena:ListQueryExecutions",
            "athena:ListPreparedStatements",
            "athena:ListNamedQueries",
            "athena:ListEngineVersions",
            "athena:ListDatabases",
            "athena:ListDataCatalogs",
            "athena:GetWorkGroup",
            "athena:GetTableMetadata",
            "athena:GetQueryResultsStream",
            "athena:GetQueryResults",
            "athena:GetQueryExecution",
            "athena:GetPreparedStatement",
            "athena:GetNamedQuery",
            "athena:GetDatabase",
            "athena:GetDataCatalog",
            "athena:DeletePreparedStatement",
            "athena:DeleteNamedQuery",
            "athena:CreatePreparedStatement",
            "athena:CreateNamedQuery",
            "athena:BatchGetQueryExecution",
            "athena:BatchGetNamedQuery"
          ],
          "Effect": "Allow",
          "Resource": [
            "arn:aws:athena:<region>:<account-id>:workgroup/<WorkGroupName>",
            "arn:aws:athena:{Region}:{Account}:datacatalog/{DataCatalogName}"
          ],
          "Sid": "AthenaAllow"
        },
        {
          "Action": [
            "kms:GenerateDataKey",
            "kms:DescribeKey",
            "kms:Decrypt"
          ],
          "Condition": {
            "ForAnyValue:StringLike": {
              "kms:ResourceAliases": "<KMS-key-alias-name>"
            }
          },
          "Effect": "Allow",
          "Resource": "*",
          "Sid": "kms"
        },
        {
          "Action": "*",
          "Condition": {
            "StringNotEquals": {
              "aws:PrincipalOrgID": "<aws-PrincipalOrgID>"
            }
          },
          "Effect": "Deny",
          "Resource": "*",
          "Sid": "denyRule"
        }
      ],
      "Version": "2012-10-17"
    }

  4. Update the custom policy to add the corresponding Athena workgroup ARN for the sensitive and non-sensitive IAM roles.

    Note: See the documentation for information about AWS global condition context keys.

  5. Choose Create policy to save your new policy. Create one policy for the sensitive IAM role and another for the non-sensitive IAM role.

Step 4: Attach identity-based policies to IAM roles

You can add and remove permissions for an IAM user, group, or role by attaching and detaching IAM policies to that identity. Policies define the permissions that determine what actions an identity can perform on which AWS resources. Attaching a policy grants the associated permissions.

To attach IAM policies to an IAM role:

  1. Attach the custom policy to the corresponding IAM roles.
  2. Referring back to step 9 of the Federate onboarding for the IAM Identity Center custom application, update the attribute mappings ARNs for both the IAM roles and the SAML IdPs. Perform this step for both the sensitive and non-sensitive custom applications.

Step 5. Grant permissions to IAM roles

A data lake administrator has the broad ability to grant a principal (including themselves) permissions on Data Catalog resources. This includes the ability to manage access controls and permissions for the data lake. When you grant Lake Formation permissions on a specific Data Catalog table, you can also include data filtering specifications. This allows you to further restrict access to certain data within the table, limiting what users can see in their query results based on those filtering rules.

To grant permissions to IAM roles:

In the Lake Formation console, under Permissions in the navigation pane, select Data Lake permissions, and then choose Grant.

To grant Database permissions to IAM roles:

  1. Under Principals, select the IAM role name (for example, Sensitive-IAM-Role).
  2. Under Named Data Catalog resources, go to Databases and select a database (for example, demo).

    Figure 7: Select an IAM role and database

    Figure 7: Select an IAM role and database

  3. Under Database permissions, select Describe and then choose Grant.

    Figure 8: Grant database permissions to an IAM role

    Figure 8: Grant database permissions to an IAM role

To grant Tables permissions to IAM roles:

  1. Repeat steps 1 and 2.
  2. Under Tables – optional, choose a table name (for example, demo2).

    Figure 9: Select tables within a database to grant access

    Figure 9: Select tables within a database to grant access

  3. Select the desired Table Permissions (for example, select and describe), and then choose Grant.

    Figure 10: Grant access to tables within the database

    Figure 10: Grant access to tables within the database

  4. Repeat steps 1 through 6 to grant access for the respective database and tables for the non-sensitive IAM role.

Step 6: Client-side setup using JDBC

You can use a JDBC connection to connect Athena and SQL client applications (for example, PyCharm or SQL Workbench) to enable analytics and reporting on the data that Athena returns from Amazon S3 databases. To use the Athena JDBC driver, you must specify the driver class from the JAR file. Additionally, you must pass in some parameters to change the authentication mechanism so the athena-sts-auth libraries are used:

  • aws credentials provider class – Specifies which provider to use, for example, BrowserSaml.
  • S3 output location – Where in S3 the Athena service can write its output. For example, s3://path/to/query/bucket/.

To set up PyCharm

  1. Install Athena JDBC 3.x driver from Athena JDBC 3.x driver.
    1. In the left navigation pane, select JDBC 3.x and then Getting started. Select Uber jar to download a .jar file, which contains the driver and its dependencies.

    Figure 11: Download Athena JDBC jar

    Figure 11: Download Athena JDBC jar

  2. Open PyCharm and create a new project.
    1. Enter a Name for your project
    2. Select the desired project Location
    3. Choose Create

    Figure 12: Create a new project in PyCharm

    Figure 12: Create a new project in PyCharm

  3. Configure Data Source and drivers. Select Data Source, and then choose the plus sign or New to configure new data sources and drivers.

    Figure 13: Add database source properties

    Figure 13: Add database source properties

  4. Configure the Athena driver by selecting the Drivers tab, and then choose the plus sign to add a new driver.

    Figure 14: Add database drivers

    Figure 14: Add database drivers

  5. Under Driver Files, upload the custom JAR file that you downloaded in the Step 1. Select the Athena class dropdown. Enter the driver’s name (for example Athena JDBC Driver). Then choose Apply.

    Figure 15: Add database driver files

    Figure 15: Add database driver files

  6. Configure a new data source. Choose the plus sign and select your driver’s name from the driver dropdown.
  7. Enter the data source name (for example, Athena Demo). For the authentication method, select User & Password.

    Figure 16: Create a project data source profile

    Figure 16: Create a project data source profile

  8. Select the SSH/SSL tab and select Use SSL. Verify that the Use truststore options for IDE, JAVA, and system are all selected.

    Figure 17: Enable data source profile SSL

    Figure 17: Enable data source profile SSL

  9. Select the Options tab and then select Single Session Mode.

    Figure 18: Configure single session mode in PyCharm

    Figure 18: Configure single session mode in PyCharm

  10. Select the General tab and enter the JDBC and SSO URL. The following is a sample JDBC URL based on the SAML application:
    jdbc:athena://Region=<region-name>;CredentialsProvider=BrowserSaml;WorkGroup=<name-of-the-WorkGroup>;SsoLoginUrl=d-xxxxxxxxxx.awsapps.com/start

    1. Choose Apply.
    2. Choose Test Connection. This will open a browser and take you to the IAM Identity Center console. Select the account and role that you want to connect with.

    Figure 19: Test the data source connection

    Figure 19: Test the data source connection

  11. After the connection is successful, select the Schemas tab and select All databases and All schemas.

    Figure 20: Select data source databases and schemas

    Figure 20: Select data source databases and schemas

  12. Run a sample test query: SELECT <table-names> FROM <database-name> limit 10;
  13. Verify that the credentials and permissions are working as expected.

To set up SQL Workbench

  1. Open SQL Workbench.
  2. Configure an Athena driver by selecting File and then Manage Drivers.
  3. Enter the Athena JDBC Driver as the name and set the library to browse the path for the location where you downloaded the driver. Enter com.amazonaws.athena.jdbc.AthenaDriver as the Classname.
  4. Enter the following URL, replacing <us-east-1> with your desired Region and <name-of-the-WorkGroup> with your workgroup name.
    jdbc:athena://Region=<us-east-1>;CredentialsProvider=BrowserSaml;WorkGroup=<name-of-the-WorkGroup>;SsoLoginUrl=d-xxxxxxxxxx.awsapps.com/start;

    Choose OK.

  5. Run a test query, replacing <table-names> and <database-name> with your table and database names:
    SELECT <table-names> FROM <database-name> limit 10;

  6. Verify that the credentials and permissions are working as expected.

Conclusion

In this post, we covered how to use JDBC drivers to connect to Athena from third-party SQL client tools. You learned how to configure IAM Identity Center applications, defining an IAM IdP, IAM roles, and policies. You also learned how to grant permissions to IAM roles using Lake Formation to create distinct access to different classes of data sets and connect to Athena through an SQL client tool (such as PyCharm). This same setup can also work with other supported identity sources such as IAM Identity Centerself-managed or on-premises Active Directory, or an external IdP.

 
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Ajay Rawat

Ajay Rawat
Ajay is a Senior Security Consultant, focusing on AWS Identity and Access Management (IAM), data protection, incident response, and operationalizing AWS security services to increase security effectiveness and reduce risk. Ajay is a technology enthusiast and enjoys working with customers to solve their technical challenges and to improve their security posture in the cloud.

Mihir Borkar

Mihir Borkar
Mihir is an AWS Data Architect who excels at simplifying customer challenges with innovative cloud data solutions. Specializing in AWS Lake Formation and AWS Glue, he designs scalable data lakes and analytics platforms, demonstrating expertise in crafting efficient solutions within the AWS Cloud.

Create a customizable cross-company log lake for compliance, Part I: Business Background

Post Syndicated from Colin Carson original https://aws.amazon.com/blogs/big-data/create-a-customizable-cross-company-log-lake-for-compliance-part-i-business-background/

As described in a previous postAWS Session Manager, a capability of AWS Systems Manager, can be used to manage access to Amazon Elastic Compute Cloud (Amazon EC2) instances by administrators who need elevated permissions for setup, troubleshooting, or emergency changes. While working for a large global organization with thousands of accounts, we were asked to answer a specific business question: “What did employees with privileged access do in Session Manager?”

This question had an initial answer: use logging and auditing capabilities of Session Manager and integration with other AWS services, including recording connections (StartSession API calls) with AWS CloudTrail, and recording commands (keystrokes) by streaming session data to Amazon CloudWatch Logs.

This was helpful, but only the beginning. We had more requirements and questions:

  • After session activity is logged to CloudWatch Logs, then what?
  • How can we provide useful data structures that minimize work to read out, delivering faster performance, using more data, with more convenience?
  • How do we support a variety of usage patterns, such as ongoing system-to-system bulk transfer, or an ad-hoc query by a human for a single session?
  • How should we share and implement governance?
  • Thinking bigger, what about the same question for a different service or across more than one use case? How do we add what other API activity happened before or after a connection—in other words, context?

We needed more comprehensive functionality, more customization, and more control than a single service or feature could offer. Our journey began where previous customer stories about using Session Manager for privileged access (similar to our situation), least privilege, and guardrails ended. We had to create something new that combined existing approaches and ideas:

  • Low-level primitives such as Amazon Simple Storage Service (Amazon S3).
  • Latest features and approaches of AWS, such as vertical and horizontal scaling in AWS Glue.
  • Our experience working with legal, audit, and compliance in large enterprise environments.
  • Customer feedback.

In this post, we introduce Log Lake, a do-it-yourself data lake based on logs from CloudWatch and AWS CloudTrail. We share our story in three parts:

  • Part 1: Business background – We share why we created Log Lake and AWS alternatives that might be faster or easier for you.
  • Part 2: Build – We describe the architecture and how to set it up using AWS CloudFormation templates.
  • Part 3: Add – We show you how to add invocation logs, model input, and model output from Amazon Bedrock to Log Lake.

Do you really want to do it yourself?

Before you build your own log lake, consider the latest, highest-level options already available in AWS–they can save you a lot of work. Whenever possible, choose AWS services and approaches that abstract away undifferentiated heavy lifting to AWS so you can spend time on adding new business value instead of managing overhead. Know the use cases services were designed for, so you have a sense of what they already can do today and where they’re going tomorrow.

If that doesn’t work, and you don’t see an option that delivers the customer experience you want, then you can mix and match primitives in AWS for more flexibility and freedom, as we did for Log Lake.

Session Manager activity logging

As we mentioned in our introduction, you can save logging data to AmazonS3add a table on top, and query that table using Amazon Athena—this is what we recommend you consider first because it’s straightforward.

This would result in files with the sessionid in the name. If you want, you can process these files into a calendarday, sessionid, sessiondata format using an S3 event notification that invokes a function (and make sure to save it to a different bucket, in a different table, to avoid causing recursive loops). The function could derive the calendarday and sessionid from the S3 key metadata, and sessiondata would be the entire file contents.

Alternatively, you can sign to one log group in CloudWatch logs, have an Amazon Data Firehose subscription filter move that to S3 (this file would have additional metadata in the JSON content and more customization potential from filters). This was used in our situation, but it wasn’t enough by itself.

AWS CloudTrail Lake

CloudTrail Lake is for running queries on events over years of history and with near real-time latency and offers a deeper and more customizable view of events than CloudTrail Event history. CloudTrail Lake enables you to federate an event data store, which lets you view the metadata in the AWS Glue catalog and run Athena queries. For needs involving one organization and ongoing ingesting from a trail (or point-in-time import from Amazon S3, or both), you can consider CloudTrail Lake.

We considered CloudTrail Lake, as either a managed lake option or source for CloudTrail only, but ended up creating our own AWS Glue job instead. This was because of a combination of reasons, including full control over schema and jobs, ability to ingest data from an S3 bucket of our choosing as an ongoing source, fine-grained filtering on account, AWS Region, and eventName (eventName filtering wasn’t supported for management events ), and cost.

The cost of CloudTrail lake based on uncompressed data ingested (data size can be 10 times larger than in Amazon S3) was a factor for our use case. In one test, we found CloudTrail Lake to be 38 times faster to process the same workload as Log Lake, but Log Lake was 10–100 times less costly depending on filters, timing, and account activity. Our test workload was 15.9 GB file size in S3, 199 million events, and 400 thousand files, spread across over 150 accounts and 3 Regions. Filters Log Lake applied were eventname='StartSession', 'AssumeRole', 'AssumeRoleWithSAML', and five arbitrary allow listed accounts. These tests might be different from your use case, so you should do your own testing, gather your own data, and decide for yourself.

Other services

The products mentioned previously are the most relevant to the outcomes we were trying to accomplish, but you should consider security, identity, and compliance products on AWS, too. These products and features can be used either as an alternative to Log Lake or to add functionality.

As an example, Amazon Bedrock can add functionality in three ways:

  • To skip the search and query Log Lake for you
  • To summarize across logs
  • As a source for logs (similar to Session Manager as a source for CloudWatch logs)

Querying means you can have an AI agent query your AWS Glue catalog (such as the Log Lake catalog) for data-based results. Summarizing means you can use generative artificial intelligence (AI) to summarize your text logs from a knowledge base as part of retrieval augmented generation (RAG), to ask questions like “How many log files are exactly the same? Who changed IAM roles last night?” Considerations and limitations apply.

Adding Amazon Bedrock as a source means using invocation logging to collect requests and responses.

Because we wanted to store very large amounts of data frugally (compressed and columnar format, not text) and produce non-generative (data-based) results that can be used for legal compliance and security, we didn’t use Amazon Bedrock in Log Lake—but we will revisit this topic in Part 3 when we detail how to use the approach we used for Session Manager for Amazon Bedrock.

Business background

When we began talking with our business partners, sponsors, and other stakeholders, important questions, problems, opportunities, and requirements emerged.

Why we needed to do this

Legal, security, identity, and compliance authorities of the large enterprise we were working for had created a customer-specific control. To comply with the control objective, use of elevated privileges required a manager to manually review all available data (including any session manager activity) to confirm or deny if use of elevated privileges was justified. This was a compliance use case that, when solved, could be applied to more use cases such as auditing and reporting.

Note on terms:

  • Here, the customer in customer-specific control means a control that is solely the responsibility of a customer, not AWS, as described in the AWS Shared Responsibility Model.
  • In this article, we define auditing broadly as testing information technology (IT) controls to mitigate risk, by anyone, at any cadence (ongoing as part of day-to-day operations, or one time only). We don’t refer to auditing that is financial, only conducted by an independent third-party, or only at certain times. We use self-review and auditing interchangeably.
  • We also define reporting broadly as presenting data for a specific purpose in a specific format to evaluate business performance and facilitate data-driven decisions—such as answering “how many employees had sessions last week?”

The use case

Our first and most important use case was a manager who needed to review activity, such as from an after-hours on-call page the previous night. If the manager needed to have additional discussions with their employee or needed additional time to consider activity, they had up to a week (7 calendar days) before they needed to confirm or deny elevated privileges were needed, based on their team’s procedures. A manager needed to review an entire set of events that all share the same session, regardless of known keywords or specific strings, as part of all available data in AWS. This was the workflow:

  1. Employee uses homegrown application and standardized workflow to access Amazon EC2 with elevated privileges using Session Manager.
  2. API activity in CloudTrail and continuous logging to CloudWatch logs.
  3. The problem space – Data somehow gets procured, processed, and provided (this would become Log Lake later).
  4. Another homegrown system (different from step 1) presents session activity to managers and applies access controls (a manager should only review activity for their own employees, and not be able to peruse data outside their team). This data might be only one StartSession API call and no session details, or might be thousands of lines from cat file
  5. The manager reviews all available activity, makes an informed decision, and confirms or denies if use was justified.

This was an ongoing day-to-day operation, with a narrow scope. First, this meant only data available in AWS; if something couldn’t be captured by AWS, it was out of scope. If something was possible, it should be made available. Second, this meant only certain workflows; using Session Manager with elevated privileges for a specific, documented standard operating procedure.

Avoiding review

The simplest solution would be to block sessions on Amazon EC2 with elevated privileges, and fully automate build and deployment. This was possible for some but not all workloads, because some workloads required initial setup, troubleshooting, or emergency changes of Marketplace AMIs.

Is accurate logging and auditing possible?

We won’t extensively detail ways to bypass controls here, but there are important limitations and considerations we had to consider, and we recommend you do too.

First, logging isn’t available for sessionType Port, which includes SSH. This could be mitigated by ensuring employees can only use a custom application layer to start sessions without SSH. Blocking direct SSH access to EC2 instances using security group policies is another option.

Second, there are many ways to intentionally or accidentally hide or obfuscate activity in a session, making review of a specific command difficult or impossible. This was acceptable for our use case for multiple reasons:

  • A manager would always know if a session started and needed review from CloudTrail (our source signal). We joined to CloudWatch to meet our all available data requirement.
  • Continuous streaming to CloudWatch logs would log activity as it happened. Additionally, streaming to CloudWatch Logs supported interactive shell access, and our use case only used interactive shell access (sessionType Standard_Stream). Streaming isn’t supported for sessionType, InteractiveCommands, or NonInteractiveCommands.
  • The most important workflow to review involved an engineered application with one standard operating procedure (less variety than all the ways Session Manager could be used).
  • Most importantly, the manager was responsible for reviewing the reports and expected to apply their own judgement and interpret what happened. For example, a manager review could result in a follow up conversation with the employee that could improve business processes. A manager might ask their employee, “Can you help me understand why you ran this command? Do we need to update our runbook or automate something in deployment?”

To protect data against tampering, changes, or deletion, AWS provides tools and features such as AWS Identity and Access Management (IAM) policies and permissions and Amazon S3 Object Lock.

Security and compliance are a shared responsibility between AWS and the customer, and customers need to decide what AWS services and features to use for their use case. We recommend customers consider a comprehensive approach that considers overall system design and includes multiple layers of security controls (defense in depth). For more information, see the Security pillar of the AWS Well-Architected Framework.

Avoiding automation

Manual review can be a painful process, but we couldn’t automate review for two reasons: Legal requirements and to add friction to the feedback loop felt by a manager whenever an employee used elevated privileges, to discourage using elevated privileges.

Works with existing

We had to work with existing architecture, spanning thousands of accounts and multiple AWS Organizations. This meant sourcing data from buckets as an edge and point of ingress. Specifically, CloudTrail data was managed and consolidated outside of CloudTrail, across organizations and trails, into S3 buckets. CloudWatch data was also consolidated to S3 buckets, from Session Manager to CloudWatch Logs, with Amazon Data Firehose subscription filters on CloudWatch Logs pointing to S3. To avoid negative side effects on existing business processes, our business partners didn’t want to change settings in CloudTrail, CloudWatch, and Firehose. This meant Log Lake needed features and flexibility that enabled changes without impacting other workstreams using the same sources.

Event filtering is not a data lake

Before we were asked to help, there were attempts to do event filtering. One attempt tried to monitor session activity using Amazon EventBridge. This was limited to AWS API operations recorded by CloudTrail such as StartSession and didn’t include the information from inside the session, which was in CloudWatch Logs. Another attempt tried event filtering CloudWatch in the form of a subscription filter. Also, an attempt was made using EventBridge Event Bus with EventBridge rules, and storage in Amazon DynamoDB. These attempts didn’t deliver the expected results because of a combination of factors:

Size

Couldn’t accept large session log payloads because of the EventBridge PutEvents limit of 256 KB entry size. Saving large entries to Amazon S3 and using the object URL in the PutEvents entry would avoid this limitation in EventBridge, but wouldn’t pass the most important information the manager needed to review (the event’s sessionData element). This meant managing files and physical dependencies, and losing the metastore benefit of working with data as logical sets and objects.

Storage

Event filtering was a way to process data, not storage or a source of truth. We asked, how do we restore data lost in flight or destroyed after landing? If components are deleted or undergoing maintenance, can we still procure, process, and provide data—at all three layers independently? Without storage, no.

Data quality

No source of truth meant data quality checks weren’t possible.  We couldn’t answer questions like: “Did the last job process more than 90 percent of events from CloudTrail in DynamoDB?” or“What percentage are we missing from source to target?”

Anti-patterns

DynamoDB as long-term storage wasn’t the most appropriate data store for large analytical workloads, low I/O, and highly complex many-to-many joins.

Reading out

Deliveries were fast, but work (and time and cost) was needed after delivery. In other words, queries had to do extra work to transform raw data into the needed format at time of read, which had a significant, cumulative effect on performance and cost. Imagine users running a select * from table without any filters on years of data and paying for storage and compute of those queries.

Cost of ownership

Filtering by event contents (sessionData from CloudWatch) required knowledge of session behavior, which was business logic. This meant changes to business logic required changes to event filtering. Imagine being asked to change CloudWatch filters or EventBridge rules based on a business process change, and trying to remember where to make the change, or troubleshoot why expected events weren’t being passed. This meant a higher cost of ownership and slower cycle times at best, and inability to meet SLA and scale at worst.

Accidental coupling

Creates accidental coupling between downstream consumers and low-level events. Consumers who directly integrate against events might get different schemas at different times for the same events, or events they don’t need. There’s no way to manage data at a higher level than event, at the level of sets (like all events for one sessionid), or at the object level (a table designed for dependencies). In other words, there was no metastore layer that separated the schema from the files, like in a data lake.

More sources (data to load in)

There were other, less important use cases that we wanted to expand to later: inventory management and security.

For inventory management, such as identifying EC2 instances running a Systems Manager agent that’s missing a patch, finding IAM users with inline policies, or finding Redshift clusters with nodes that aren’t RA3. This data would come from AWS Config unless it isn’t a supported resource type. We cut inventory management from scope because AWS Config data could be added to an AWS Glue catalog later, and queried from Athena using an approach like the one described in How to query your AWS resource configuration states using AWS Config and Amazon Athena.

For security, Splunk and OpenSearch were already in use for serviceability and operational analysis, sourcing files from Amazon S3. Log Lake is a complementary approach sourcing from the same data, which adds metadata and simplified data structures at the cost of latency. For more information about having different tools analyze the same data, see Solving big data problems on AWS.

More use cases (reasons to read out)

We knew from the first meeting that this was a bigger opportunity than just building a dataset for sessions from Systems Manager for manual manager review. Once we had procured logs from CloudTrail and CloudWatch, set up Glue jobs to process logs into convenient tables, and were able to join across these tables, we could change filters and configuration settings to answer questions about additional services and use cases, too. Similar to how we process data for Session Manager, we could expand the filters on Log Lake’s Glue jobs, and add data for Amazon Bedrock model invocation logging. For other use cases, we could use Log Lake as a source for automation (rules-based or ML), deep forensic investigations, or string-match searches (such as IP addresses or user names).

Additional technical considerations

*How did we define session? We would always know if a session started from StartSession event in CloudTrail API activity. Regarding when a session ended, we did not use TerminateSession because this was not always present and we considered this domain-specific logic. Log Lake enabled downstream customers to decide how to interpret the data. For example, our most important workflow had a Systems Manager timeout of 15 minutes, and our SLA was 90 minutes. This meant managers knew a session with a start time more than 2 hours prior to the current time was already ended.

*CloudWatch data required additional processing compared to CloudTrail, because CloudWatch logs from Firehose were saved in gzip format without gz suffix and had multiple JSON documents in the same line that needed to be processed to be on separate lines. Firehose can transform and convert records, such as invoking a Lambda function to transform, convert JSON to ORC, and decompress data, but our business partners didn’t want to change existing settings.

How to get the data (a deep dive)

To support the dataset needed for a manager to review, we needed to identify API-specific metadata (time, event source, and event name), and then join it to session data. CloudTrail was necessary because it was the most authoritative source for AWS API activity, specifically StartSession and AssumeRole and AssumeRoleWithSAML events, and contained context that didn’t exist in CloudWatch Logs (such as the error code AccessDenied) which could be useful for compliance and investigation. CloudWatch was necessary because it contained the keystrokes in a session, in the CloudWatch log’s sessionData element. We needed to obtain the AWS source of record from CloudTrail, but we recommend you check with your authorities to confirm you really need to join to CloudTrail. We mention this in case you hear this question “why not derive some sort of earliest eventTime from CloudWatch logs, and skip joining to CloudTrail entirely? That would cut size and complexity by half.”

To join CloudTrail (eventTime, eventname, errorCode, errorMessage, and so on) with CloudWatch (sessionData), we had to do the following:

  1. Get the higher level API data from CloudTrail (time, event source, and event name), as the authoritative source for auditing Session Manager. To get this, we needed to look inside all CloudTrail logs and get only the rows with eventname=‘StartSession’ and eventsource=‘ssm.amazonaws.com’ (events from Systems Manager)—our business partners described this as looking for a needle in a haystack, because this could be only one session event across millions or billions of files. After we obtained this metadata, we needed to extract the sessionid to know what session to join it to, and we chose to extract sessionid from responseelements. Alternatively, we could use useridentity.sessioncontext.sourceidentity if a principal provided it while assuming a role (requires sts:SetSourceIdentity in the role trust policy).

Sample of a single record’s responseelements.sessionid value: "sessionid":"theuser-thefederation-0b7c1cc185ccf51a9"

The actual sessionid was the final element of the logstream: 0b7c1cc185ccf51a9.

  1. Next we needed to get all logs for a single session from CloudWatch. Similarly to CloudTrail, we needed to look inside all CloudWatch logs landing in Amazon S3 from Firehose to identify only the needles that contained "logGroup":"/aws/ssm/sessionlogs". Then, we could get sessionid from logstream or sessionId, and get session activity from the message.sessionData.

Sample of a single record’s logStream element: "sessionId": "theuser-thefederation-0b7c1cc185ccf51a9"

Note: Looking inside the log isn’t always necessary. We did it because we had to work with existing logs Firehose put to Amazon S3, which didn’t have the logstream (and sessionid) in the file name. For example, a file from Firehose might have a name like

cloudwatch-logs-otherlogs-3-2024-03-03-22-22-55-55239a3d-622e-40c0-9615-ad4f5d4381fa

If we were able to use the ability of Session Manager to send to S3 directly, the file name in S3 is the loggroup (theuser-thefederation-0b7c1cc185ccf51a9.dms)and could be used to derive sessionid without looking inside the file.

  1. Downstream of Log Lake, consumers could join on sessionid which was derived in the previous step.

What’s different about Log Lake

If you remember one thing about Log Lake, remember this: Log Lake is a data lake for compliance-related use cases, uses CloudTrail and CloudWatch as data sources, has separate tables for writing (original raw) and reading (read-optimized or readready), and gives you control over all components so you can customize it for yourself.

Here are some of the signature qualities of Log Lake:

Legal, identity, or compliance use cases

This includes deep dive forensic investigation, meaning use cases that are large volume, historical, and analytical. Because Log Lake uses Amazon S3, it can meet regulatory requirements that require write-once-read-many (WORM) storage.

AWS Well-Architected Framework

Log Lake applies real-world, time-tested design principles from the AWS Well-Architected Framework. This includes, but is not limited to:

Operational Excellence also meant knowing service quotas, performing workload testing, and defining and documenting runbook processes. If we hadn’t tried to break something to see where the limit is, then we considered it untested and inappropriate for production use. To test, we would determine the highest single day volume we’d seen in the past year, and then run that same volume in an hour to see if (and how) it would break.

High-Performance, Portable Partition Adding (AddAPart)

Log Lake adds partitions to tables using Lambda functions with SQS, a pattern we call AddAPart. This uses Amazon Simple Query Service (SQS) to decouple triggers (files landing in Amazon S3) from actions (associating that file with metastore partition). Think of this as having four F’s:

This means no AWS Glue crawlers, no alter table or msck repair table to add partitions in Athena, and can be reused across sources and buckets. The management of partitions in Log Lake makes using partition-related features available in AWS Glue, including AWS Glue partition indexes and workload partitioning and bounded execution.

File name filtering uses the same central controls for lower cost of ownership, faster changes, troubleshooting from one location, and emergency levers—this means that if you want to avoid log recursion happening from a specific account, or want to exclude a Region because of regulatory compliance, you can do it in one place, managed by your change control process, before you pay for processing in downstream jobs.

If you want to tell a team, “onboard your data source to our log lake, here are the steps you can use to self-serve,” you can use AddAPart to do that. We describe this in Part 2.

Readready Tables

In Log Lake, data structures offer differentiated value to users, and original raw data isn’t directly exposed to downstream users by default. For each source, Log Lake has a corresponding read-optimized readready table.

Instead of this:

from_cloudtrail_raw

from_cloudwatch_raw

Log Lake exposes only these to users:

from_cloudtrail_readready

from_cloudwatch_readready

In Part 2, we describe these tables in detail. Here are our answers to frequently asked questions about readready tables:

Q: Doesn’t this have an up-front cost to process raw into readready? Why not pass the work (and cost) to downstream users?

A: Yes, and for us the cost of processing partitions of raw into readready happened once and was fixed, and was offset by the variable costs of querying, which was from many company-wide callers (systemic and human), with high frequency, and large volume.

Q: How much better are readready tables in terms of performance, cost, and convenience? How do you achieve these gains? How do you measure “convenience”?

A: In most tests, readready tables are 5–10 times faster to query and more than 2 times smaller in Amazon S3. Log Lake applies more than one technique: omitting columns, partition design, AWS Glue partition indexes, data types (readready tables don’t allow any nested complex data types within a column, such as struct<struct>), columnar storage (ORC), and compression (ZLIB). We measure convenience as the amount of operations required to join on a sessionid; using Log Lake’s readready tables this is 0 (zero).

Q: Do raw and readready use the same files or buckets?

A: No, files and buckets are not shared. This decouples writes from reads, improves both write and read performance, and adds resiliency.

This question is important when designing for large sizes and scaling, because a single job or downstream read alone can span millions of files in Amazon S3. S3 scaling doesn’t happen immediately, so queries against raw or original data involving many tiny JSON files can cause S3 503 errors when it exceeds 5,500 GET/HEAD per second. More than one bucket helps avoid resource saturation. There is another option that we didn’t have when we created Log Lake: S3 Express One Zone. For reliability, we still recommend not putting all your files in one bucket. Also, don’t forget to filter your data.

Customization and control

You can customize and control all components (columns or schema, data types, compression, job logic, job schedule, and so on) because Log Lake is built using AWS primitives—such as Amazon SQS and Amazon S3—for the most comprehensive combination of features with the most freedom to customize. If you want to change something, you can.

From mono to many

Rather than one large, monolithic lake that is tightly coupled to other systems, Log Lake is just one node in a larger network of distributed data products across different data domains—this concept is data mesh. Just like the AWS APIs it is built on, Log Lake abstracts away heavy lifting and enables users to move faster, more efficiently, and not wait for centralized teams to make changes. Log Lake does not try to cover all use cases—instead, Log Lake’s data can be accessed and consumed by domain-specific teams, empowering business experts to self-serve.

When you need more flexibility and freedom

As builders, sometimes you want to dissect a customer experience, find problems, and figure out ways to make it better. That means going a layer down to mix and match primitives together to get more comprehensive features and more customization, flexibility, and freedom.

We built Log Lake for our long-term needs, but it would have been easier in the short-term to save Session Manager logs to Amazon S3 and query them with Athena. If you have considered what already exists in AWS, and you’re sure you need more comprehensive abilities or customization, read on to Part 2: Build, which explains Log Lake’s architecture and how you can set it up.

If you have feedback and questions, let us know in the comments section.

References


About the authors

Colin Carson is a Data Engineer at AWS ProServe. He has designed and built data infrastructure for multiple teams at Amazon, including Internal Audit, Risk & Compliance, HR Hiring Science, and Security.

Sean O’Sullivan is a Cloud Infrastructure Architect at AWS ProServe. He has over 8 years industry experience working with customers to drive digital transformation projects, helping architect, automate, and engineer solutions in AWS.

AWS completes the first GDV joint audit with participant insurers in Germany

Post Syndicated from Servet Gözel original https://aws.amazon.com/blogs/security/aws-completes-the-first-gdv-joint-audit-with-participant-insurers-in-germany/

We’re excited to announce that Amazon Web Services (AWS) has completed its first German Insurance Association (GDV) joint audit with GDV participant members, which provides assurance to customers in the German insurance industry for the security of their workloads on AWS. This is an important addition to the joint audits performed at AWS by our regulated customers within the financial services industry. Joint audits are an efficient method to provide additional assurance to a group of customers on the “security of the cloud” (as described in the AWS Shared Responsibility Model), in addition to Compliance Programs (for example, C5) and resources that are provided to customers on AWS Artifact.

At AWS, security is our top priority. As customers embrace the scalability and flexibility of AWS, we’re helping them evolve security and compliance into key business enablers. We’re obsessed with earning and maintaining customer trust, and we provide our financial services customers, their end users, and regulatory bodies with the assurance that AWS has the necessary controls in place to help protect their most sensitive material and regulated workloads.

With the increasing digitalization of the financial services industry, and the importance of cloud computing as a key enabling technology for digitalization, security and governance is becoming an ever-more-significant priority for financial services companies. Our engagement with GDV members is an example of how AWS supports customers’ risk management and regulatory compliance. For the first time, this joint audit meticulously assessed the AWS controls that enable us to help protect customers’ workloads, while adhering to strict regulatory obligations. For insurers, moving their workloads to AWS helps protect customer data, support continuity of business-critical operations, and meet new standards in regulatory reporting.

GDV is the association of private insurers in Germany, representing around 470 members in the industry, and is a key player within the German and European financial services industries. GDV’s members participating in this joint audit have reached out to AWS to exercise their audit rights. For this cycle, the 35 participating members from the German insurance industry decided to appoint the Deutsche Cyber-Sicherheitsorganisation GmbH (DCSO) as the single external audit service provider, to perform the audit on behalf of each of the participating members. Because many participating members are affiliates of larger insurance groups and the audit report can be used throughout the group, a coverage of over 70% of the German market in terms of revenue is achieved.

Audit preparations

The scope of the audit was defined with reference to the Federal Office for Information Security (BSI) C5 Framework. It included key domains such as identity and access management, as well as AWS services such as Amazon Elastic Compute Cloud (Amazon EC2), and Regions relevant to participant members such as the Europe (Frankfurt) Region (eu-central-1).

Audit fieldwork

This audit fieldwork phase started after a kick-off in Berlin, Germany. It used a remote approach, with work occurring through the use of videoconferencing and through a secure audit portal for the inspection of evidence. Auditors assessed AWS policies, procedures, and controls, following a risk-based approach, and using sampled evidence, deep-dive sessions and follow-up questions to clarify provided evidence. In the DCSO’s own words regarding their experience during the audit, “We experienced a transparent and comprehensive audit process and appreciate the professional approach as well as the commitment shown by AWS in addressing all our inquiries.”

Audit results

The audit was carried out and completed according to the assessment criteria that were mutually agreed upon by AWS and auditors on behalf of participating members. After a joint review by the auditors and AWS, the auditors finalized the audit report. The results of the GDV joint audit are only available to the participating members and their regulators. The results provide participating members with assurance regarding the AWS controls environment, helping members remove compliance blockers, accelerate their adoption of AWS services, obtain confidence, and gain trust in AWS security controls.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Servet Gözel

Servet Gözel
Servet is a Principal Audit Program Manager in security assurance at AWS, based in Munich, Germany. Servet leads customer audits and assurance programs across Europe. For the past 19 years, he has worked in IT audit, IT advisory, and information security roles areas across various industries and held the CISO role for a group company under a leading insurance provider in Germany.

Andreas Terwellen

Andreas Terwellen
Andreas is a Senior Manager in security audit assurance at AWS, based in Frankfurt, Germany. His team is responsible for third-party and customer audits, attestations, certifications, and assessments across Europe. Previously, he was a CISO in a DAX-listed telecommunications company in Germany. He also worked for different consulting companies managing large teams and programs across multiple industries and sectors.

Daniele Basriev

Daniele Basriev
Daniele is a Security Audit Program Manager at AWS who manages customer security audits and third-party audit programs across Europe. In the past 19 years, he has worked in a wide array of industries and on numerous control frameworks within complex fast-paced environments. He built his expertise in Big 4 accounting firms and then moved into IT security strategy, IT governance, and compliance roles across multiple industries.