Tag Archives: Technical How-to

How to unit test and deploy AWS Glue jobs using AWS CodePipeline

2022-05-04 Praveen Kumar Jeyarajan

Post Syndicated from Praveen Kumar Jeyarajan original https://aws.amazon.com/blogs/devops/how-to-unit-test-and-deploy-aws-glue-jobs-using-aws-codepipeline/

This post is intended to assist users in understanding and replicating a method to unit test Python-based ETL Glue Jobs, using the PyTest Framework in AWS CodePipeline. In the current practice, several options exist for unit testing Python scripts for Glue jobs in a local environment. Although a local development environment may be set up to build and unit test Python-based Glue jobs, by following the documentation, replicating the same procedure in a DevOps pipeline is difficult and time consuming.

Unit test scripts are one of the initial quality gates used by developers to provide a high-quality build. One must reuse these scripts during regression testing to make sure that all of the existing functionality is intact, and that new releases don’t disrupt key application functionality. The majority of the regression test suites are expected to be integrated with the DevOps Pipeline for its execution. Unit testing an application code is a fundamental task that evaluates whether each (unit) code written by a programmer functions as expected. Unit testing of code provides a mechanism to determine that software quality hasn’t been compromised. One of the difficulties in building Python-based Glue ETL tasks is their ability for unit testing to be incorporated within DevOps Pipeline, especially when there are modernization of mainframe ETL process to modern tech stacks in AWS

AWS Glue is a serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, machine learning (ML), and application development. AWS Glue provides all of the capabilities needed for data integration. This means that you can start analyzing your data and putting it to use in minutes rather than months. AWS Glue provides both visual and code-based interfaces to make data integration easier.

Prerequisites

GitHub Repository

Amazon ECR Image URI for Glue Library

Solution overview

A typical enterprise-scale DevOps pipeline is illustrated in the following diagram. This solution describes how to incorporate the unit testing of Python-based AWS Glue ETL processes into the AWS DevOps Pipeline.

Figure 1 Solution Overview

The GitHub repository aws-glue-jobs-unit-testing has a sample Python-based Glue job in the src folder. Its associated unit test cases built using the Pytest Framework are accessible in the tests folder. An AWS CloudFormation template written in YAML is included in the deploy folder. As a runtime environment, AWS CodeBuild utilizes custom container images. This feature is used to build a project utilizing Glue libraries from Public ECR repository, that can run the code package to demonstrate unit testing integration.

Solution walkthrough

Time to read 7 min
Time to complete 15-20 min
Learning level 300
Services used
AWS CodePipeline, AWS CodeCommit, AWS CodeBuild, Amazon Elastic Container Registry (Amazon ECR) Public Repositories, AWS CloudFormation

The container image at the Public ECR repository for AWS Glue libraries includes all of the binaries required to run PySpark-based AWS Glue ETL tasks locally, as well as unit test them. The public container repository has three image tags, one for each AWS Glue version supported by AWS Glue. To demonstrate the solution, we use the image tag glue_libs_3.0.0_image_01 in this post. To utilize this container image as a runtime image in CodeBuild, copy the Image URI corresponding to the image tag that you intend to use, as shown in the following image.

Figure 2 Select Glue Library from Public ECR

The aws-glue-jobs-unit-testing GitHub repository contains a CloudFormation template, pipeline.yml, which deploys a CodePipeline with CodeBuild projects to create, test, and publish the AWS Glue job. As illustrated in the following, use the copied image URL from Amazon ECR public to create and test a CodeBuild project.

  TestBuild:
    Type: AWS::CodeBuild::Project
    Properties:
      Artifacts:
        Type: CODEPIPELINE
      BadgeEnabled: false
      Environment:
        ComputeType: BUILD_GENERAL1_LARGE
        Image: "public.ecr.aws/glue/aws-glue-libs:glue_libs_3.0.0_image_01"
        ImagePullCredentialsType: CODEBUILD
        PrivilegedMode: false
        Type: LINUX_CONTAINER
      Name: !Sub "${RepositoryName}-${BranchName}-build"
      ServiceRole: !GetAtt CodeBuildRole.Arn

The pipeline performs the following operations:

It uses the CodeCommit repository as the source and transfers the most recent code from the main branch to the CodeBuild project for further processing.
The following stage is build and test, in which the most recent code from the previous phase is unit tested and the test report is published to CodeBuild report groups.
If all of the test results are good, then the next CodeBuild project is launched to publish the code to an Amazon Simple Storage Service (Amazon S3) bucket.
Following the successful completion of the publish phase, the final step is to deploy the AWS Glue task using the CloudFormation template in the deploy folder.

Deploying the solution

Set up

Now we’ll deploy the solution using a CloudFormation template.

Using the GitHub Web, download the code.zip file from the aws-glue-jobs-unit-testing repository. This zip file contains the GitHub repository’s src, tests, and deploy folders. You may also create the zip file yourself using command-line tools, such as git and zip. To create the zip file on Linux or Mac, open the terminal and enter the following commands.

git clone https://github.com/aws-samples/aws-glue-jobs-unit-testing.git
cd aws-glue-jobs-unit-testing
git checkout master
zip -r code.zip src/ tests/ deploy/

Sign in to the AWS Management Console and choose the AWS Region of your choice.
Create an Amazon S3 bucket. For more information, see How Do I Create an S3 Bucket? in the AWS documentation.
Upload the downloaded zip package, code.zip, to the Amazon S3 bucket that you created.

In this example, I created an Amazon S3 bucket named aws-glue-artifacts-us-east-1 in the N. Virginia (us-east-1) Region, and used the console to upload the zip package from the GitHub repository to the Amazon S3 bucket.

Figure 3 Upload code.zip file to S3 bucket

Creating the stack

In the CloudFormation console, choose Create stack.
On the Specify template page, choose Upload a template file, and then choose the pipeline.yml template, downloaded from the GitHub repository

Figure 4 Upload pipeline.yml template to create a new CloudFormation stack

Specify the following parameters:.

Stack name: glue-unit-testing-pipeline (Choose a stack name of your choice)
ApplicationStackName: glue-codepipeline-app (This is the name of the CloudFormation stack that will be created by the pipeline)
BranchName: master (This is the name of the branch to be created in the CodeCommit repository to check-in the code from the Amazon S3 bucket zip file)
BucketName: aws-glue-artifacts-us-east-1 (This is the name of the Amazon S3 bucket that contains the zip file. This bucket will also be used by the pipeline for storing code artifacts)
CodeZipFile: lambda.zip (This is the key name of the sample code Amazon S3 object. The object should be a zip file)
RepositoryName: aws-glue-unit-testing (This is the name of the CodeCommit repository that will be created by the stack)
TestReportGroupName: glue-unittest-report (This is the name of the CodeBuild test report group that will be created to store the unit test reports)

Figure 5 Fill parameters for stack creation

Choose Next, and again Next.

On the Review page, under Capabilities, choose the following options:

I acknowledge that CloudFormation might create IAM resources with custom names.

Figure 6 Acknowledge IAM roles creation

Choose Create stack to begin the stack creation process. Once the stack creation is complete, the resources that were created are displayed on the Resources tab. The stack creation takes approximately 5-7 minutes.

Figure 7 Successful completion of stack creation

The stack automatically creates a CodeCommit repository with the initial code checked-in from the zip file uploaded to the Amazon S3 bucket. Furthermore, it creates a CodePipeline view using the CodeCommit repository as the source. In the above example, the CodeCommit repository is aws-glue-unit-test, and the pipeline is aws-glue-unit-test-pipeline.

Testing the solution

To test the deployed pipeline, open the CodePipeline console and select the pipeline created by the CloudFormation stack. Select the Release Change button on the pipeline page.

Figure 8 Choose Release Change on pipeline page

The pipeline begins its execution with the most recent code in the CodeCommit repository.

When the Test_and_Build phase is finished, select the Details link to examine the execution logs.

Figure 9 Successfully completed the Test_and_Build stage

Select the Reports tab, and choose the test report from Report history to view the unit execution results.

Figure 10 Test report from pipeline execution

Finally, after the deployment stage is complete, you can see, run, and monitor the deployed AWS Glue job on the AWS Glue console page. For more information, refer to the Running and monitoring AWS Glue documentation

Figure 11 Successful pipeline execution

Cleanup

To avoid additional infrastructure costs, make sure that you delete the stack after experimenting with the examples provided in the post. On the CloudFormation console, select the stack that you created, and then choose Delete. This will delete all of the resources that it created, including CodeCommit repositories, IAM roles/policies, and CodeBuild projects.

Summary

In this post, we demonstrated how to unit test and deploy Python-based AWS Glue jobs in a pipeline with unit tests written with the PyTest framework. The approach is not limited to CodePipeline, and it can be used to build up a local development environment, as demonstrated in the Big Data blog. The aws-glue-jobs-unit-testing GitHub repository contains the example’s CloudFormation template, as well as sample AWS Glue Python code and Pytest code used in this post. If you have any questions or comments regarding this example, please open an issue or submit a pull request.

Authors:

Access Apache Livy using a Network Load Balancer on a Kerberos-enabled Amazon EMR cluster

2022-05-03 Bharat Gamini

Post Syndicated from Bharat Gamini original https://aws.amazon.com/blogs/big-data/access-apache-livy-using-a-network-load-balancer-on-a-kerberos-enabled-amazon-emr-cluster/

Amazon EMR is a cloud big data platform for running large-scale distributed data processing jobs, interactive SQL queries, and machine learning (ML) applications using open-source analytics frameworks such as Apache Spark, Apache Hive, and Presto. Amazon EMR supports Kerberos for authentication; you can enable Kerberos on Amazon EMR and put the cluster in a private subnet to maximize security.

To access the cluster, the best practice is to use a Network Load Balancer (NLB) to expose only specific ports, which are access-controlled via security groups. By default, the NLB prevents Kerberos ticket authentication to any Amazon EMR service.

Apache Livy is a service that enables easy interaction with a Spark cluster over a REST interface. It enables easy submission of Spark jobs or snippets of Spark code, synchronous or asynchronous result retrieval, as well as SparkContext management, all via a simple REST interface or an RPC client library.

In this post, we discuss how to provide Kerberos ticket access to Livy for external systems like Airflow and Notebooks using an NLB. You can apply this process to other Amazon EMR services beyond Livy, such as Trino and Hive.

Solution overview

The following are the high-level steps required:

Create an EMR cluster with Kerberos security configuration.
Create an NLB with required listeners and target groups.
Update the Kerberos Key Distribution Center (KDC) to create a new service principal and keytab changes.
Update the Livy configuration file.
Verify Livy is accessible via the NLB.
Run the Python Livy test case.

Prerequisites

The advanced configuration presented in this post assumes familiarity with Amazon EMR, Kerberos, Livy, Python and bash.

Create an EMR cluster

Create the Kerberos security configuration using the AWS Command Line Interface (AWS CLI) as follows (this creates the KDC on the EMR primary node):

aws emr create-security-configuration --name kdc-security-config --security-configuration '{
   "EncryptionConfiguration":{
      "InTransitEncryptionConfiguration":{
         "TLSCertificateConfiguration":{
            "CertificateProviderType":"PEM",
            "S3Object":"s3://${conf_bucket}/${certs.zip}"
         }
      },
      "AtRestEncryptionConfiguration":{
         "S3EncryptionConfiguration":{
            "EncryptionMode":"SSE-S3"
         }
      },
      "EnableInTransitEncryption":true,
      "EnableAtRestEncryption":true
   },
   "AuthenticationConfiguration":{
      "KerberosConfiguration":{
         "Provider":"ClusterDedicatedKdc",
         "ClusterDedicatedKdcConfiguration":{
            "TicketLifetimeInHours":24
         }
      }
   }
}'

It’s a security best practice to keep passwords in AWS Secrets Manager. You can use a bash function like the following as the argument to the --kerberos-attributes option so no passwords are stored in the launch script or command line. The function outputs the required JSON for the --kerberos-attributes option after retrieving the password from Secrets Manager.

krbattrs() { # Pull the KDC password from Secrets Manager without saving to disk or var
cat << EOF
  {
    "Realm": "EC2.INTERNAL",
    "KdcAdminPassword": "$(aws secretsmanager get-secret-value \
        --secret-id KDCpasswd  |jq -r .SecretString)"
  }
EOF
}

Create the cluster using the AWS CLI as follows:

aws emr create-cluster \
  --name "<your-cluster-name>" \
  --release-label emr-6.4.0 \
  --log-uri "s3://<your-log-bucket>" \
  --applications Name=Hive Name=Spark \
  --ec2-attributes "KeyName=<your-key-name>,SubnetId=<your-private-subnet>" \
  --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m3.xlarge InstanceGroupType=CORE,InstanceCount=1,InstanceType=m3.xlarge \
  --security-configuration <kdc-security-config> \
  --kerberos-attributes $(krbattrs) \
  --use-default-roles

Create an NLB

Create an internet-facing NLB with TCP listeners in your VPC and subnet. An Internet-facing load balancer routes requests from clients to targets over the internet. Conversely, an Internal NLB routes requests to targets using private IP addresses. For instructions, refer to Create a Network Load Balancer.

The following screenshot shows the listener details.

Create target groups and register the EMR primary instance (Livy3) and KDC instance (KDC3). For this post, these instances are the same; use the respective instances if KDC is running on a different instance.

The KDC and EMR security groups must allow the NLB’s private IP address to access ports 88 and 8998, respectively. You can find the NLB’s private IP address by searching the elastic network interfaces for the NLB’s name. For access control instructions, refer to this article on the knowledge center. Now that the security groups allow access, the NLB health check should pass, but Livy isn’t usable via the NLB until you make further changes (detailed in the following sections). The NLB is actually being used as a proxy to access Livy rather than doing any load balancing.

Update the Kerberos KDC

The KDC used by the Livy service must contain a new HTTP Service Principal Name (SPN) using the public NLB host name.

You can create the new principle from the EMR primary host using the full NLB public host name:

sudo kadmin.local addprinc HTTP/[email protected]

Replace the fully qualified domain name (FQDN) and Kerberos realm as needed. Ensure the NLB hostname is all lowercase.

After the new SPN exists, you create two keytabs containing that SPN. The first keytab is for the Livy service. The second keytab, which must use the same KVNO number as the first keytab, is for the Livy client.

Create Livy service keytab as follows:

sudo kadmin.local ktadd -norandkey -k /etc/livy2.keytab livy/[email protected]
sudo kadmin.local ktadd -norandkey -k /etc/livy2.keytab HTTP/[email protected]
sudo chown livy:livy /etc/livy2.keytab
sudo chmod 600 /etc/livy2.keytab
sudo -u livy klist -e -kt /etc/livy2.keytab

Note the key version number (KVNO) for the HTTP principal in the output of the preceding klist command. The KVNO numbers for the HTTP principal must match the KVNO numbers in the user keytab. Copy the livy2.keytab file to the EMR cluster Livy host if it’s not already there.

Create a user or client keytab as follows:

sudo kadmin.local ktadd -norandkey -k /var/tmp/user1.keytab [email protected]
sudo kadmin.local ktadd -norandkey -k /var/tmp/user1.keytab HTTP/[email protected]

Note the -norandkey option used when adding the SPN. That preserves the KVNO created in the preceding livy2.keytab.

Copy the user1.keytab to the client machine running the Python test case as user1.

Replace the FQDN, realm, and keytab path as needed.

Update the Livy configuration file

The primary change on the EMR cluster primary node running the Livy service is to the /etc/livy/conf/livy.conf file. You change the authentication principal that Livy uses, as well as the associated Kerberos keytab created earlier.

Make the following changes to the livy.conf file with sudo:

livy.server.auth.kerberos.principal = HTTP/[email protected]
livy.server.auth.kerberos.keytab = /etc/livy2.keytab

Don’t change the livy.server.launch.kerberos.* values.

Restart and verify the Livy service:

sudo systemctl restart livy-server
sudo systemctl status livy-server

Verify the Livy port is listening:

sudo lsof -Pi :8998

COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
java 30106 livy 196u IPv6 224853844 0t0 TCP *:8998 (LISTEN)

You can automate these steps (modifying the KDC and Livy config file) by adding a step to the EMR cluster. For more information, refer to Tutorial: Configure a cluster-dedicated KDC.

Verify Livy is accessible via the NLB

You can now use user1.keytab to authenticate against the Livy REST endpoint. Copy the user1.keytab you created earlier to the host and user login, which run the Livy test case. The host running the test case must be configured to connect to the modified KDC.

Create a Linux user (user1) on client host and EMR cluster.

If the client host has a terminal available that the user can run commands in, you can use the following commands to verify network connectivity to Livy before running the actual Livy Python test case.

Verify the NLB host and port are reachable (no data will be returned by the nc command):

$ nc -vz mynlb-a0ac4b07f9f7f0a1.elb.us-west-2.amazonaws.com 8998
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connected to 44.242.1.1:8998.
Ncat: 0 bytes sent, 0 bytes received in 0.02 seconds.

Create a TLS connection, which returns the server’s TLS certificate and TCP packets:

openssl s_client -connect mynlb-a0ac4b07f9f7f0a1.elb.us-west-2.amazonaws.com:8998

If the openssl command doesn’t return a TLS server certificate, the rest of the verification doesn’t succeed. You may have a proxy or firewall interfering with the connection. Investigate your network environment, resolve the issue, and repeat the openssl command to ensure connectivity.

Verify the Livy REST endpoint using curl. This verifies Livy REST but not Spark.

kinit -kt user1.keytab [email protected]
curl -k -u : --negotiate https://mynlb-a0ac4b07f9f7f0a1.elb.us-west-2.amazonaws.com:8998/sessions
{"from":0,"total":0,"sessions":[]}

curl arguments:
  "-k"   - Ignore secure connection check
  "-u :" - Use user name and passwords from environment
  "--negotiate" – Enables negotiate(SPNEGO) authentication

Run the Python Livy test case

The Livy test case is a simple Python3 script named livy-verify.py. You can run this script from a client machine to run Spark commands via Livy using the NLB. The script is as follows:

#!/usr/bin/env python3
# pylint: disable=invalid-name,no-member

"""
Verify Livy (REST) service using pylivy module, install req modules or use PEX:
  sudo yum install python3-devel
  sudo python3 -m pip install livy==0.8.0 requests==2.27.1 requests_gssapi==1.2.3
  https://pypi.org/project/requests-gssapi/

Kerberos authN implicitly uses TGT from kinit in users login env
"""

import shutil
import requests
from requests_gssapi import HTTPSPNEGOAuth
from livy import LivySession

# Disable ssl-warnings for self-signed certs when testing
requests.packages.urllib3.disable_warnings()

# Set the base URI(use FQDN and TLS) to target a Livy service
# Redefine remote_host to specify the remote Livy hostname to connect to
remote_host="mynlb-a0ac4b07f9f7f0a1.elb.us-west-2.amazonaws.com"
livy_uri = "https://" + remote_host + ":8998"

def livy_session():
    ''' Connect to Livy (REST) and run trivial pyspark command '''
    sconf = {"spark.master": "yarn", "spark.submit.deployMode": "cluster"}

    if shutil.which('kinit'):
        kauth = HTTPSPNEGOAuth()
        # Over-ride with an explicit user principal
        #kauth = HTTPSPNEGOAuth(principal="[email protected]")
        print("kinit found, kauth using Krb cache")
    else:
        kauth = None
        print("kinit NOT found, kauth set to None")

    with LivySession.create(livy_uri, verify=False, auth=kauth, spark_conf=sconf) as ls:
        ls.run("rdd = sc.parallelize(range(100), 2)")
        ls.run("rdd.take(3)")

    return 'LivySession complete'

def main():
    """ Starting point """
    livy_session()

if __name__ == '__main__':
    main()

The test case requires the new SPN to be in the user’s Kerberos ticket cache. To get the service principal into the Kerberos cache, use the kinit command with the -S option:

kinit -kt user1.keytab -S HTTP/[email protected] [email protected]

Note the SPN and the User Principal Name (UPN) are both used in the kinit command.

The Kerberos cache should look like the following code, as revealed by the klist command:

klist
Ticket cache: FILE:/tmp/krb5cc_1001
Default principal: [email protected]

Valid starting Expires Service principal
01/20/2022 01:38:06 01/20/2022 11:38:06 HTTP/[email protected]
renew until 01/21/2022 01:38:06

Note the HTTP service principal in the klist ticket cache output.

After the SPN is in the cache as verified by klist, you can run the following command to verify that Livy accepts the Kerberos ticket and runs the simple PySpark script. It generates a simple array, [0,1,2], as the output. The preceding Python script has been copied to the /var/tmp/user1/ folder in this example.

/var/tmp/user1/livy-verify.py
kinit found, kauth using TGT
[0, 1, 2]

It can take a minute or so to generate the result. Any authentication errors will happen in seconds. If the test in the new environment generates the preceding array, the Livy Kerberos configuration has been verified.

Any other client program that needs to have Livy access must be a Kerberos client of the KDC that generated the keytabs. It must also have a client keytab (such as user1.keytab or equivalent) and the service principal key in its Kerberos ticket cache.

In some environments, a simple kinit as follows may be sufficient:

kdestroy
kinit -kt user1.keytab [email protected]

Summary

If you have an existing EMR cluster running Livy and using Kerberos (even in a private subnet), you can add an NLB to connect to the Livy service and still authenticate with Kerberos. For simplicity, we used a cluster-dedicated KDC in this post, but you can use any KDC architecture option supported by Amazon EMR. This post documented all the KDC and Livy changes to make it work; the script and procedure have been run successfully in multiple environments. You can modify the Python script as needed and try running the verification script in your environment.

For more details about the systems and processes described in this post, refer to the following:

About the Authors

John Benninghoff is a AWS Professional Services Sr. Data Architect, focused on Data Lake architecture and implementation.

Bharat Gamini is a Data Architect focused on Big Data & Analytics at Amazon Web Services. He helps customers architect and build highly scalable, robust and secure cloud-based analytical solutions on AWS. Besides family time, he likes watching movies and sports.

Smithy Server and Client Generator for TypeScript (Developer Preview)

2022-05-02 Adam Thomas

Post Syndicated from Adam Thomas original https://aws.amazon.com/blogs/devops/smithy-server-and-client-generator-for-typescript/

We’re excited to announce the Developer Preview of Smithy’s server and client generators for TypeScript. This enables developers to write concise, type-safe code in the same model-first manner that AWS has used to develop its services. Smithy is AWS’s open-source Interface Definition Language (IDL) for web services. AWS uses Smithy and its internal predecessor to model services, generate server scaffolding, and generate rich clients in multiple languages, such as the AWS SDKs.

If you’re unfamiliar with Smithy, check out the Smithy website and watch an introductory talk from Michael Dowling, Smithy’s Principal Engineer.

This post will demonstrate how you can write a simple Smithy model, write a service that implements the model, deploy it to AWS Lambda, and call it using a generated client.

What can the server generator do for me?

Using Smithy and its server generator unlocks model-first development. Model-first development puts your customers first. This forces you to define your interface first rather than let your API to become implicitly defined by your implementation choices.

Smithy’s server generator for TypeScript enables development at a higher level of abstraction. By making serialization, deserialization, and routing an implementation detail in generated code, service developers can focus on writing code against modeled types, rather than against raw HTTP requests. Your business logic and unit tests will be cleaner and more readable, and the way that your messages are represented on the wire is defined explicitly by a protocol, not implicitly by your JSON parser.

The server generator also lets you leverage TypeScript’s type safety. Not only is the business logic of your service written against strongly typed interfaces, but also you can reference your service’s types in your AWS Cloud Development Kit (AWS CDK) definition. This makes sure that your stack will fail at build time rather than deployment time if it’s out of sync with your model.

Finally, using Smithy for service generation lets you ship clients in Smithy’s growing portfolio of generated clients. We’re unveiling a developer preview of the client generator for TypeScript today as well, and we’ll continue to unveil more implementations in the future.

The architecture of a Smithy service

A Smithy service looks much like any other web service running on Lambda behind Amazon API Gateway. The difference lies in the code itself. Where a standard service might use a generic deserializer to parse an incoming request and bind it to an object, a Smithy service relies on code generation for deserialization, serialization, validation, and the object model itself. These functions are generated into a standalone library known as a Smithy server SDK. Using a server SDK with one of AWS’s prepackaged request converters, service developers can focus on their business logic, rather than the undifferentiated heavy lifting of parsing and generating HTTP requests and responses.

A data flow diagram for a Smithy service

Walkthrough

This post will walk you through the process of building and using a Smithy service, from modeling to deployment.

By the end, you should be able to:

Model a simple REST service in Smithy
Generate a Smithy server SDK for TypeScript
Implement a service in Lambda using the generated server SDK
Deploy the service to AWS using the AWS CDK
Generate a client SDK, and use it to call the deployed service

The complete example described in this post can be found here.

Prerequisites

For this walkthrough, you should have the following prerequisites:

An AWS account
JDK >= 8, Node.js >= 14, Yarn >= 2, and Git installed
Your workstation configured to use your AWS account with the CDK

Checking out the sample repository

Create a new repository from the template repository here.

To clone the application in your browser

Open https://github.com/aws-samples/smithy-server-generator-typescript-sample in your browser
Select “Use this template” in the top right-hand corner
Fill out the form, and select “Create repository from template”
Clone your new repository from GitHub by following the instructions in the “Code” dropdown

Exploring and setting up the sample application

The sample application is split into three separate submodules:

model – contains the Smithy model that defines the service
Server – contains the code generation setup, application logic, and CDK stack for the service
typescript-client – contains the code generation setup for a rich client generated in TypeScript

To bootstrap the sample application and run the initial build

Open a terminal and navigate to the root of the sample application
Run the following command:
```
./gradlew build && yarn install
```
Wait until the build finishes successfully

Modeling a service using Smithy

In an IDE of your choice, open the file at model/src/main/smithy/main.smithy. This file defines the interface for the sample web service, a service that can echo strings back to the caller, as well as provide the string length.

The service definition forms the root of a Smithy model. It defines the operations that are available to clients, as well as common errors that are thrown by all of the operations in a service.


@sigv4(name: "execute-api")
@restJson1
service StringWizard {
    version: "2018-05-10",
    operations: [Echo, Length],
    errors: [ValidationException],
}

This service uses the @sigv4 trait to indicate that calls must be signed with AWS Signature V4. In the sample application, API Gateway’s Identity and Access Management (IAM) Authentication support provides this functionality.

@restJson1 indicates the protocol supported by this service. RestJson1 is Smithy’s built-in protocol for RESTful web services that use JSON for requests and responses.

This service advertises two operations: Echo and Length. Furthermore, it indicates that every operation on the service must be expected to throw ValidationException, if an invalid input is supplied.

Next, let’s look at the definition of the Length operation and its input type.

/// An operation that computes the length of a string
/// provided on the URI path
@readonly
@http(code: 200, method: "GET", uri: "/length/{string}",)
operation Length {
     input: LengthInput,
     output: LengthOutput,
     errors: [PalindromeException],
}

@input
structure LengthInput {
     @required
     @httpLabel
     string: String,
}

This operation uses the @http trait to model how requests are processed with restJson1, including the method (GET) and how the URI is formed (using a label to bind the string field from LengthInput to a path segment). HTTP binding with Smithy can be explored in depth at Smithy’s documentation page.

Note that this operation can also throw a PalindromeException, which we’ll explore in more detail when we check out the business logic.

Updating the Smithy model to add additional constraints to the input

Smithy constraint traits are used to enable additional validation for input types. Server SDKs automatically perform validation based on the Smithy constraints in the model. Let’s add a new constraint to the input for the Length operation. Moreover, let’s make sure that only alphanumeric characters can be passed in by the caller.

Open model/src/main/smithy/main.smithy in an editor

Add a @pattern constraint to the string member of Length input. It should look like this:

structure LengthInput {
    @required
    @httpLabel
    @pattern(“^[a-zA-Z0-9]$”)
    string: String,
}

Open a terminal, and navigate to the root of the sample application
Run the following command:
```
yarn build
```
Wait for the build to finish successfully

Using the Smithy Server Generator for TypeScript

The key component of a Smithy web service is its code generator, which translates the Smithy model into actual code. You’ve already run the code generator – it runs every time that you build the sample application.

The codegen directory inside of the server submodule is where the Smithy Server Generator for TypeScript is configured and run. The server generator uses Smithy Build to build, and it’s configured by smithy-build.json.

{
  "version" : "1.0",
  "outputDirectory" : "build/output",
  "projections" : {
      "ts-server" : {
         "plugins": {
           "typescript-ssdk-codegen" : {
              "package" : "@smithy-demo/string-wizard-service-ssdk",
              "packageVersion": "0.0.1"
           }
        }
      },
      "apigateway" : {
        "plugins" : {
          "openapi": {
             "service": "software.amazon.smithy.demo#StringWizard",
             "protocol": "aws.protocols#restJson1",
             "apiGatewayType" : "REST"
           }
         }
      }
   }
}

This smithy-build configures two projections. The ts-server projection generates the server SDK by invoking the typescript-ssdk-codegen plugin. The package and packageVersion arguments are used to generate an npm package that you can add as a dependency in your server code.

The OpenAPI projection configures Smithy’s OpenAPI converter to generate a file that can be imported into API Gateway to host this service. It uses Smithy’s ability to extend models via the imports keyword to extend the base model with an additional API Gateway configuration. The generated OpenAPI specification is used by the CDK stack, which we’ll explore later.

If you open package.json in the server submodule, then you’ll notice this line in the dependencies section:

"@smithy-demo/string-wizard-service-ssdk": "workspace:server/codegen/build/smithyprojections/server-codegen/ts-server/typescript-ssdk-codegen"

The key, @smithy-demo/string-wizard-service-ssdk, matches the package key in the smithy-build.json file. The value uses Yarn’s workspaces feature to set up a local dependency on the generated server SDK. This lets you use the server SDK as a standalone npm dependency without publishing it to a repository. Since we bundle the server application into a zip file before uploading it to Lambda, you can treat the server SDK as an implementation detail that isn’t published externally.

We won’t get into the details here, but you can see the specifics of how the code generator is invoked by looking at the regenerate:ssdk script in the server’s package.json, as well as the build.gradle file in the server’s codegen directory.

Implementing an operation using a server SDK

The server generator takes care of the undifferentiated heavy lifting of writing a Smithy service. However, there are still two tasks left for the service developer: writing the Lambda entrypoint, and implementing the operation’s business logic.

First, let’s look at the entrypoint for the Length operation. Open server/src/length_handler.ts in an editor. You should see the following content:

import { getLengthHandler } from "@smithy-demo/string-wizard-service-ssdk";
import { APIGatewayProxyHandler } from "aws-lambda";
import { LengthOperation } from "./length";
import { getApiGatewayHandler } from "./apigateway";
// This is the entry point for the Lambda Function that services the LengthOperation
export const lambdaHandler: APIGatewayProxyHandler = getApiGatewayHandler(getLengthHandler(LengthOperation));

If you’ve written a Lambda entry-point before, then exporting a function of type APIGatewayProxyHandler will be familiar to you. However, there are a few new pieces here. First, we have a function from the server SDK, called getLengthHandler, that takes a Smithy Operation type and returns a ServiceHandler. Operation is the interface that the server SDK uses to encapsulate business logic. The core task of implementing a Smithy service is to implement Operations. ServiceHandler is the interface that encapsulates the generated logic of a server SDK. It’s the black box that handles serialization, deserialization, error handling, validation, and routing.

The getApiGatewayHandler function simply invokes the request and response conversion logic, and then builds a custom context for the operation. We won’t go into their details here.

Next, let’s explore the operation implementation. Open server/src/length.ts in an editor. You should see the following content:

import { Operation } from "@aws-smithy/server-common";
import {
  LengthServerInput,
  LengthServerOutput,
  PalindromeException,
} from "@smithy-demo/string-wizard-service-ssdk";
import { HandlerContext } from "./apigateway";
import { reverse } from "./util";

// This is the implementation of business logic of the LengthOperation
export const LengthOperation: Operation<LengthServerInput, LengthServerOutput, HandlerContext> = async (
  input,
  context
) => {
  console.log(`Received Length operation from: ${context.user}`);

  if (input.string != undefined && input.string === reverse(input.string)) {
     throw new PalindromeException({ message: "Cannot handle palindrome" });
  }

  return {
     length: input.string?.length,
  };
};

Let’s look at this implementation piece-by-piece. First, the function type Operation<LengthServerInput, LengthServerOutput, HandlerContext> provides the type-safe interface for our business logic. LengthServerInput and LengthServerOutput are the code generated types that correspond to the input and output types for the Length operation in our Smithy model. If we use the wrong type arguments for the Operation, then it will fail type checks against the getLengthHandler function in the entry-point. If we try to access the incorrect properties on the input, then we’ll also see type checker failures. This is one of the core tenets of the Smithy Server Generator for TypeScript: writing a web service should be as strongly typed as writing anything else.

Next, let’s look at the section that validates that the input isn’t a palindrome:

if (input.string != undefined && input.string === reverse(input.string)) {
    throw new PalindromeException({ message: "Cannot handle palindrome" });
}

Although the server SDK can validate the input against Smithy’s constraint traits, there is no constraint trait for rejecting palindromes. Therefore, we must include this validation in our business logic. Our Smithy model includes a PalindromeException definition that includes a message member. This is generated as a standard subclass of Error with a constructor that takes in a message that your operation implementation can throw like any other error. This will be caught and properly rendered as a response by the server SDK.

Finally, there’s the return statement. Since the Smithy model defines LengthOutput as a structure containing an integer member called length, we return an object that has the same structural type here.

Note that this business logic doesn’t have to consider serialization, or the wire format of the request or response, let alone anything else related to HTTP or API Gateway. The unit tests in src/length/length.spec.ts reflect this. They’re the same standard unit tests as you would write against any other TypeScript class. The server SDK lets you write your business logic at a higher level of abstraction, thus simplifying your unit testing and letting your developers focus on their business logic rather than the messy details.

Deploying the sample application

The sample application utilizes the AWS CDK to deploy itself to your AWS account. Explore the CDK definition in server/lib/cdk-stack.ts. An in-depth exploration of the stack is out of the scope for this post, but it looks largely like any other AWS application that deploys TypeScript code to Lambda behind API Gateway.

The key difference is that the cdk stack can rely on a generated OpenAPI definition for the API Gateway resource. This makes sure that your deployed application always matches your Smithy model. Furthermore, it can use the server SDK’s generated types to make sure that every modeled operation has an implementation deployed to Lambda. This means that forgetting to wire up the implementation for a new operation becomes a compile-time failure, rather than a runtime one.

To deploy the sample application from the command line

1. Open a terminal and navigate to the server directory of your sample application.
2. Run the following command:
```
yarn cdk deploy
```
3. The cdk will display a list of security-sensitive resources that will be deployed to your account. These consist mostly of AWS Identity and Access Management (IAM) roles used by your Lambda functions for execution. Enter y to continue deploying the application to your account.
4. When it has completed, the CDK will print your new application’s endpoint and the CloudFormation stack containing your application to the console. It will look something like the following:
```
Outputs:
    StringWizardService.StringWizardApiEndpoint59072E9B
    = https://RANDOMSTRING.execute-api.us-west-2.amazonaws.com/prod/
	
Stack ARN:
    arn:aws:cloudformation:us-west-2:YOURACCOUNTID:stack/StringWizardService/SOME-UUID
```
5. Log on to your AWS account in the AWS Management Console.
6. Navigate to the Lambda console. You should see two new functions: one that starts with StringWizardService-EchoFunction, and one that starts with StringWizardService-EchoFunction. These are the implementations of your Smithy service’s operations.
7. Navigate to the Amazon API Gateway console. You should see a new REST API named StringWizardAPI, with Resources POST /echo and GET /length/{string}, corresponding to your Smithy model.
Calling the sample application with a generated client

The last piece of the Smithy puzzle is the strongly-typed generated client generated by the Smithy Client Generator for TypeScript. It’s located in the typescript-client folder, which has a codegen folder that uses SmithyBuild to generate a client in much the same manner as the server.

The sample application ships with a simple wrapper script for the length operation that uses the generated client to build a rudimentary CLI. Open the typescript-client/bin/length.ts file in your editor. The contents will look like the following:
```
#!/usr/bin/env node

import {LengthCommand, StringWizardClient} from "@smithy-demo/string-client";

const client = new StringWizardClient({endpoint: process.argv[2]});

client.send(new LengthCommand({
     string: process.argv[3]
})).catch((err) => {
     console.log("Failed with error: " + err);
process.exit(1);
}).then((res) => {
     process.stderr.write(res.length?.toString() ?? "0");
});
```
If you’ve used the AWS SDK for JavaScript v3, this will look familiar. This is because it’s generated using the Smithy Client Generator for TypeScript!

From the code, you can see that the CLI takes two positional arguments: the endpoint for the deployed application, and an input string. Let’s give it a spin.

To call the deployed application using the generated client
1. Open a terminal and navigate to the typescript-client directory.
2. Run the following command to build the client:
```
yarn build
```
3. Using the endpoint output by the CDK in the Deploying the sample application section above, run the following command:
```
yarn run str-length https://RANDOMSTRING.execute-api.us-west-2.amazonaws.com/prod/ foo 
```
4. You should see an output of 3, the length of foo.
5. Next, trigger anerror by calling your endpoint with a palindrome by running the following command:
```
yarn run str-length https://RANDOMSTRING.execute-api.us-west-2.amazonaws.com/prod/ kayak
```
6. You should see the following output:
```
Failed with error: PalindromeException: Cannot handle palindrome
```
Cleaning up

To avoid incurring future charges, delete the resources.

To delete the sample application using the CDK
1. Open a terminal and navigate to the server directory.
2. Run the following command:
```
yarn cdk destroy StringWizardService
```
3. Answer y to the prompt Are you sure you want to delete: StringWizardService (y/n)?
4. Wait for the CDK to complete the deletion of your CloudFormation stack. You should see the following when it has completed:
```
✅ StringWizardService: destroyed
```
Conclusion

You have now used a Smithy model to define a service, explored how a generated server SDK can simplify your web service development, deployed the service to the AWS Cloud using the AWS CDK, and called the service using a strongly-typed generated client.

If you aren’t familiar with Smithy, but you want to learn more, then don’t forget to check out the documentation or the introductory video.

To learn more about the Smithy Server Generator for TypeScript, check out its documentation.

If you have feature requests, bug reports, feedback of any kind, or would like to contribute, head over to the GitHub repository.

Adam Thomas

Adam Thomas is a Senior Software Development engineer on the Smithy team. He has been a web service developer at Amazon for over ten years. Outside of work, Adam is a passionate advocate for staying inside, playing video games, and reading fiction.

Chaos experiments on Amazon RDS using AWS Fault Injection Simulator

2022-05-01 Anup Sivadas

Post Syndicated from Anup Sivadas original https://aws.amazon.com/blogs/devops/chaos-experiments-on-amazon-rds-using-aws-fault-injection-simulator/

Performing controlled chaos experiments on your Amazon Relational Database Service (RDS) database instances and validating the application behavior is essential to making sure that your application stack is resilient. How does the application behave when there is a database failover? Will the connection pooling solution or tools being used gracefully connect after a database failover is successful? Will there be a cascading failure if the database node gets rebooted for a few seconds? These are some of the fundamental questions that you should consider when evaluating the resiliency of your database stack. Chaos engineering is a way to effectively answer these questions.

Traditionally, database failure conditions, such as a failover or a node reboot, are often triggered using a script or 3rd party tools. However, at scale, these external dependencies often become a bottleneck and are hard to maintain and manage. Scripts and 3rd party tools can fail when called, whereas a web service is highly available. The scripts and 3rd party tools also tend to require elevated permissions to work, which is a management overhead and insecure from a least privilege access model perspective. This is where AWS Fault Injection Simulator (FIS) comes to the rescue.

AWS Fault Injection Simulator (AWS FIS) is a fully managed service for running fault injection experiments on AWS that makes it easier to improve an application’s performance, observability, and resiliency. Fault injection experiments are used in chaos engineering, which is the practice of stressing an application in testing or production environments by creating disruptive events, such as a sudden increase in CPU or memory consumption, database failover and observing how the system responds, and implementing improvements.

We can define the key phases of chaos engineering as identifying the steady state of the workload, defining a hypothesis, running the experiment, verifying the experiment results and making necessary improvements based on the experiment results. These phases will confirm that you are injecting failures in a controlled environment through well-planned experiments in order to build confidence in the workloads and tools we are using to withstand turbulent conditions.

This diagram explains the phases of chaos engineering. We start with identifying the steady state, defining a hypothesis, run the experiment, verify the experiment results and improve. This is a cycle.

Example—

Baseline: we have a managed database with a replica and automatic failover enabled.
Hypothesis: failure of a single database instance / replica may slow down a few requests but will not adversely affect our application.
Run experiment: trigger a DB failover.
Verify: confirm/dis-confirm the hypothesis by looking at KPIs for the application (e.g., via CloudWatch metric/alarm).

Methodology and Walkthrough

Let’s look at how you can configure AWS FIS to perform failure conditions for your RDS database instances. For this walkthrough, we’ll look at injecting a cluster failover for Amazon Aurora PostgreSQL. You can leverage an existing Aurora PostgreSQL cluster or you can launch a new cluster by following the steps in the Create an Aurora PostgreSQL DB Cluster documentation.

Step 1: Select the Aurora Cluster.

The Aurora PostgreSQL instance that we’ll use for this walkthrough is provisioned in us-east-1 (N. Virginia), and it’s a cluster with two instances. There is one writer instance and another reader instance (Aurora replica). The cluster is named chaostest, the writer instance is named chaostest-instance-1, and the reader is named chaostest-intance-1-us-east-1a.

Under RDS Databases Section, the cluster named chaostest is selected. Under the cluster there are two instances which is available. Chaostest-instance-1 is the writer instance and chaostest-instance-1-us-east-1a is the reader instance.

The goal is to simulate a failover for this Aurora PostgreSQL cluster so that the existing chaostest-intance-1-us-east-1a reader instance will switch roles and then be promoted as the writer, and the existing chaostest-instance-1 will become the reader.

Step 2: Navigate to the AWS FIS console.

We will now navigate to the AWS FIS console to create an experiment template. Select Create experiment template.

Under FIS console, Create experiment template needs to be selected.

Step 3: Complete the AWS FIS template pre-requisites.

Enter a Description, Name, and select the AWS IAM Role for the experiment template.

Under create experiment template section, Simulate Database Failover is entered for the Description field. DBFailover is entered for the Name field(Optional). FISWorkshopServiceRole is selected for the IAM role drop down field.

The IAM role selected above was pre-created. To use AWS FIS, you must create an IAM role that grants AWS FIS the permissions required so that the service can run experiments on your behalf. The role follows the least privileged model and includes permissions to act on your database clusters like trigger a failover. AWS FIS only uses the permissions that have been delegated explicitly for the role. To learn more about how to create an IAM role with the required permissions for AWS FIS, refer to the FIS documentation.

Step 4: Navigate to the Actions, Target, Stop Condition section of the template.

The next key section of AWS FIS is Action, Target, and Stop Condition.

Action, Target and Stop Conditions section is highlighted in the image.

Action—An action is an activity that AWS FIS performs on an AWS resource during an experiment. AWS FIS provides a set of pre-configured actions based on the AWS resource type. Each Action runs for a specified duration during an experiment, or until you stop the experiment. An action can run sequentially or in parallel.

For our experiment, the Action will be aws:rds:failover-db-cluster.

Target—A target is one or more AWS resources on which AWS FIS performs an action during an experiment. You can choose specific resources or select a group of resources based on specific criteria, such as tags or state.

For our experiment, the target will be the chaostest Aurora PostgreSQL cluster.

Stop Condition—AWS FIS provides the controls and guardrails that you need to run experiments safely on your AWS workloads. A stop condition is a mechanism to stop an experiment if it reaches a threshold that you define as an Amazon CloudWatch alarm. If a stop condition is triggered while the experiment is running, then AWS FIS stops the experiment.

For our experiment, we won’t be defining a stop condition. This is because this simple experiment contains only one action. Stop conditions are especially useful for experiments with a series of actions, to prevent them from continuing if something goes wrong.

Step 5: Configure Action.

Now, let’s configure the Action and Target for our experiment template. Under the Actions section, we will select Add action to get the New action window.

The action section displays Name,Description, Action Type and Start After fields. There is a Add action button that needs to be selected.

Enter a Name, a Description, and select Action type aws:rds:failover-db-cluster. Start after is an optional setting. This setting allows you to specify an action that should precede the one we are currently configuring.

Under Actions section, DBFailover is entered for the Name field. DB Failover Action is entered for the Description field. aws:rds:failover-db-cluster is entered for the Action Type field. Start after field is left blank and Clusters-Target-1 is selected for the Target field. The save button will save the info entered for the respective fields.

Step 6: Configure Target.

Note that a Target has been automatically created with the name Clusters-Target-1. Select Save to save the action.

Next, you will edit the Clusters-Target-1 target to select the target, i.e., the Aurora PostgreSQL cluster.

Under Targets section, the edit button for Clusters-Target-1 is highlighted.

Select Target method as Resource IDs, and select the chaostest cluster. If you are interested to select a group of resources, then select Resource tags, filters and parameters option.

Under Edit Target section, Clusters-Target-1 is selected for the Name field. aws:rds:cluster is selected for the Resource type field. Action is set as DBFailover. Resource IDs is selected for the Target Method field. Chaostest cluster is selected for the Resource IDs drop down box. Save button is also available in this section to save the configuration.

Step 7: Create the experiment template to complete this stage.

We will wrap up the process by selecting the create experiment template.

Create experiment template option is highlighted. The user will click this button to proceed creating a template.

We will get a warning stating that a stop condition isn’t defined. We’ll enter create in the provided field to create the template.

After selecting the create experiment template option in the previous screen, the user is prompted to enter "create" in the field to proceed.

We will get a success message if the entries are correct and the template will be successfully created.

"You successfully created experiment template" success message is displayed in the screen and its highlighted in green color.

Step 8: Verify the Aurora Cluster.

Before we run the experiment, let’s double-check the chaostest Aurora Cluster to confirm which instance is the writer and which is the reader.

Under the RDS section, chaos cluster is listed. The user is confirming that chaostest-instance-1 is the writer and chaostest-instance-1-us-east-1a is the reader.

We confirmed that chaostest-instance-1 is the writer and chaostest-instance-1-us-east-1a is the reader.

Step 9: Run the AWS FIS experiment.

Now we’ll run the FIS experiment. Select Actions, and then select Start for the experiment template.

Under the experiment template section, the Simulate Database Failover template is selection. Under the actions section, option Start is selected to start the experiment. The other options under Actions section includes Update, Manage tags and Delete.

Select Start experiment and you’ll get another warning to confirm if you really want to start this experiment. Confirm by entering start say Start experiment.

Under the Start Experiement section, user is promoted to enter "start" in the field to start the experiement.

Step 10: Observe the various stages of the experiment.

The experiment will be in initiating, running and will eventually be in completed states.

The experiment is in initiating state.

The experiment is in Complete state.

Step 11: Verify the Aurora Cluster to confirm failover.

Now let’s look at the chaostest Aurora PostgreSQL cluster to check the state. Note that a failover was indeed triggered by FIS and chaostest-instance-1-us-east-1a is the newly promoted writer and chaostest-instance-1 is the reader now.

Under RDS Section, Chaostest cluster is shown. This time, the writer is chaostest-instance-1-us-east-1a.

Step 12: Verify the Aurora Cluster logs.

We can also confirm the failover action by looking at the Logs and events section of the Aurora Cluster.

Under the Recent Events section of the chaos-test cluster, the failover messages is displayed. One of the messages lists "Started cross AZ failover to DB instance:chaostest-instance-1-us-east-1a. This confirms that the experiment was successful.

Clean up

If you created a new Aurora PostgreSQL cluster for this walkthrough, then you can terminate the cluster to optimize the costs by following the steps in the Deleting an Aurora DB cluster documentation.

You can also delete the AWS FIS experiment template by following the steps in the Delete an experiment template documentation.

You can refer to the AWS FIS documentation to learn more about the service. If you want to know more about chaos engineering, check out the AWS re:Invent session Testing resiliency using chaos engineering and The Chaos Engineering Collection. Finally, check out the following GitHub repo for additional example experiments, and how you can work with AWS FIS using the AWS Cloud Development Kit (AWS CDK).

Conclusion

In this walkthrough, you learned how you can leverage AWS FIS to inject failures into your RDS Instances. To get started with AWS Fault Injection Service for Amazon RDS, refer to the service documentation.

Author:

Extend your pre-commit hooks with AWS CloudFormation Guard

2022-04-26 Joaquin Manuel Rinaudo

Post Syndicated from Joaquin Manuel Rinaudo original https://aws.amazon.com/blogs/security/extend-your-pre-commit-hooks-with-aws-cloudformation-guard/

Git hooks are scripts that extend Git functionality when certain events and actions occur during code development. Developer teams often use Git hooks to perform quality checks before they commit their code changes. For example, see the blog post Use Git pre-commit hooks to avoid AWS CloudFormation errors for a description of how the AWS Integration and Automation team uses various pre-commit hooks to help reduce effort and errors when they build AWS Quick Starts.

This blog post shows you how to extend your Git hooks to validate your AWS CloudFormation templates against policy-as-code rules by using AWS CloudFormation Guard. This can help you verify that your code follows organizational best practices for security, compliance, and more by preventing you from commit changes that fail validation rules.

We will also provide patterns you can use to centrally maintain a list of rules that security teams can use to roll out new security best practices across an organization. You will learn how to configure a pre-commit framework by using an example repository while you store Guard rules in both a central Amazon Simple Storage Service (Amazon S3) bucket or in versioned code repositories (such as AWS CodeCommit, GitHub, Bitbucket, or GitLab).

Prerequisites

To complete the steps in this blog post, first perform the following installations.

Install AWS Command Line Interface (AWS CLI).
Install the Git CLI.
Install the pre-commit framework by running the following command.
pip install pre-commit
Install the Rust programming language by following these instructions.
(Windows only) Install the version of Microsoft Visual C++ Build Tools 2019 that provides just the Visual C++ build tools.

Solution walkthrough

In this section, we walk you through an exercise to extend a Java service on an Amazon EKS example repository with Git hooks by using AWS CloudFormation Guard. You can choose to upload your Guard rules in either a separate GitHub repository or your own S3 bucket.

First, download the sample repository that you will add the pre-commit framework to.

To clone the test repository

Clone the repo to a local directory by running the following command in your local terminal.
git clone https://github.com/aws-samples/amazon-eks-example-for-stateful-java-service.git

Next, create Guard rules that reflect the organization’s policy-as-code best practices and store them in an S3 bucket.

To set up an S3 bucket with your Guard rules

Create an S3 bucket by running the following command in the AWS CLI.
aws s3 mb s3://<account-id>-cfn-guard-rules --region <aws-region>

where <account-id> is the ID of the AWS account you’re using and <aws-region> is the AWS Region you want to use.
(Optional) Alternatively, you can follow the Getting started with Amazon S3 tutorial to create the bucket and upload the object (as described in step 4 that follows) by using the AWS Management Console.
When you store your Guard rules in an S3 bucket, you can make the rules accessible to other member accounts in your organization by using the aws:PrincipalOrgID condition and setting the value to your organization ID in the bucket policy.
Create a file that contains a Guard rule named rules.guard, with the following content.
```
let eks_cluster = Resources.* [ Type == 'AWS::EKS::Cluster' ]
rule eks_public_disallowed when %eks_cluster !empty {
      %eks_cluster.Properties.ResourcesVpcConfig.EndpointPublicAccess == false
}
```
This rule will verify that public endpoints are disabled by checking that resources that are created by using the AWS::EKS::Cluster resource type have the EndpointPublicAccess property set to false. For more information about authoring your own rules using Guard domain-specific language (DSL), see Introducing AWS CloudFormation Guard 2.0.
Upload the rule set to your S3 bucket by running the following command in the AWS CLI.
aws s3 cp rules.guard s3://<account-id>-cfn-guard-rules/rules/rules.guard

In the next step, you will set up the pre-commit framework in the repository to run CloudFormation Guard against code changes.

To configure your pre-commit hook to use Guard

Run the following command to create a new branch where you will test your changes.
git checkout -b feature/guard-hook
Navigate to the root directory of the project that you cloned earlier and create a .pre-commit-config.yaml file with the following configuration.
```
repos:
  - repo: local
    hooks:
      -   id: cfn-guard-rules
          name: Rules for AWS
          description: Download Organization rules
          entry: aws s3 cp --recursive s3://<account-id>-cfn-guard-rules/rules  guard-rules/org-rules/
          language: system
          pass_filenames: false
      -   id: cfn-guard
          name: AWS CloudFormation Guard
          description: Validate code against your Guard rules
          entry: bash -c 'for template in "$@"; do cfn-guard validate -r guard-rules -d "$template" || SCAN_RESULT="FAILED"; done; if [[ "$SCAN_RESULT" = "FAILED" ]]; then exit 1; fi'
          language: rust
          files: \.(json|yaml|yml|template\.json|template)$
          additional_dependencies:
            - cli:cfn-guard
```
You will need to replace the <account-id> placeholder value with the AWS account ID you entered in the To set up an S3 bucket with your Guard rules procedure.

This hook configuration uses local pre-commit hooks to download the latest version of Guard rules from the bucket you created previously. This allows you to set up a centralized set of Guard rules across your organization.

Alternatively, you can create and use a code repository such as GitHub, AWS CodeCommit, or Bitbucket to keep your rules in version control. To do so, replace the command in the Download Organization rules step of the .pre-commit-config.yaml file with:

bash -c ‘if [ -d guard-rules/org-rules ]; then cd guard-rules/org-rules && git pull; else git clone <guard-rules-repository-target> guard-rules/org-rules; fi’

Where <guard-rules-repository-target> is the HTTPS or SSH URL of your repository. This command will clone or pull the latest rules from your Git repo by using your Git credentials.

The hook will also install Guard as an additional dependency by using a Rust hook. Using Guard, it will run the code changes in the repository directory against the downloaded rule set. When misconfigurations are detected, the hook stops the commit.

You can further extend your organization rules with your own Guard rules by adding them to the cfn-guard-rules folder. You should commit these rules in your repository and add cfn-guard-rules/org-rules/* to your .gitignore file.
Run a pre-commit install command to install the hooks you just created.

Finally, test that the pre-commit’s Guard hook fails commits of code changes that do not follow organizational best practices.

To test pre-commit hooks

Add EndpointPublicAccess: true in cloudformation/eks.template.yaml, as shown following. This describes the test-only intent (meaning that you want to detect and flag errors in your rule) of adding public access to the Amazon Elastic Kubernetes Service (Amazon EKS) cluster.
```
  EKSCluster:
    Type: AWS::EKS::Cluster
    Properties:
      Name: java-app-demo-cluster
      ResourcesVpcConfig:
        EndpointPublicAccess: true
        SecurityGroupIds:
          - !Ref EKSControlPlaneSecurityGroup
```
Add your changes with the git add command.
git add .pre-commit-config.yaml

git add cloudformation/eks.template.yaml

Commit changes with the following command.

git commit -m “bad config”

You should see the following error that disallows the commit to the local repository and shows which one of your Guard rules failed.

amazon-eks-controlplane.template.yaml Status = FAIL
		
FAILED rules
		
rules.guard/eks_public_disallowed    FAIL
		
---
		
Evaluation of rules rules.guard against data amazon-eks-controlplane.template.yaml
		
---
		
Property
[/Resources/EKS/Properties/ResourcesVpcConfig/EndpointPublicAccess] in data
[eks.template.yaml] is not compliant with [rules.guard/eks_public_disallowed] 
because provided value [true] did not match expected value [false]. 
Error Message []

(Optional) You can also test hooks before committing by using the pre-commit run command to see similar output.

Cleanup

To avoid incurring ongoing charges, follow these cleanup steps to delete the resources and files you created as you followed along with this blog post.

To clean up resources and files

Remove your local repository.
rm -rf /path/to/repository
Delete the S3 bucket you created by running the following command.
aws s3 rb s3://<account-id>-cfn-guard-rules --force
(Optional) Remove the pre-commit hooks framework by running this command.
pip uninstall pre-commit

Conclusion

In this post, you learned how to use AWS CloudFormation Guard with the pre-commit framework locally to validate your infrastructure-as-code solutions before you push remote changes to your repositories.

You also learned how to extend the solution to use a centralized list of security rules that is stored in versioned code repositories (GitHub, Bitbucket, or GitLab) or an S3 bucket. And you learned how to further extend the solution with your own rules. You can find examples of rules to use in Guard’s Github repository or refer to write preventative compliance rules for AWS CloudFormation templates the cfn-guard way. You can then further configure other repositories to prevent misconfigurations by using the same Guard rules.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the KMS re:Post or contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Deploy .NET Blazor WebAssembly Application to AWS Amplify

2022-04-20 Adilson Perinei

Post Syndicated from Adilson Perinei original https://aws.amazon.com/blogs/devops/deploy-net-blazor-webassembly-application-to-aws-amplify/

Blazor can run your client-side C# code directly in the browser, using WebAssembly. It is a .NET running on WebAssembly, and you can reuse code and libraries from the server-side parts of your application.

Overview of solution

In this post, you will deploy a Blazor WebAssembly Application from git repository to AWS Amplify. We will use .NET 6. to create a Blazor WebAssembly on local machine using AWS Command Line Interface (AWS CLI), use GitHub as a git repository, and deploy the application to Amplify.

Follow this post on: Windows 10, Windows 11/Ubuntu 20.04 LTS/macOS 10.15 “Catalina”, macOS 11.0 “Big Sur”, or macOS 12.0 “Monterey”.

User pushes Blazor WebAssembly code and amplify.yml to Github. This action will trigger the amplify pipeline that will use amplify.yml to build and deploy the application.

Walkthrough

We will walk through the following steps:

Create Blazor WebAssembly application on our local machine using AWS CLI
Test /run the application locally
Create a new repository on Github
Create a local repository
Setup Amplify
Test /run the application on AWS

Prerequisites

For this walkthrough, you should have the following prerequisites:

AWS Account – How to create an AWS account
Dotnet version on Windows/Linux/MacOS – How to install Dotnet version
Git CLI on Windows/Linux/MacOS – How to install Git CLI
Github account – How to create a github account

Let’s start creating a Blazor WebAssembly application on our local machine using CLI:

1. Open the command line interface
2. Create a directory for your application running the following command:

mkdir BlazorWebApp

1. Change to the application directory running the following command:

cd BlazorWebApp

1. Create the Blazor WebAssembly Application running the following command:

dotnet new blazorwasm

1. Run the application:

dotnet run

1. Copy the URL after “Now listening on:”, and paste it on your browser.
  Example: http://localhost:5152 (port might be different in your CLI)

Sample app

1. After testing your application, go back to the terminal and press <ctrl> + c to stop the application.
2. Create a gitignore for your project running the following command:

dotnet new gitignore

1. Create a file called “amplify.yml” in the root directory of your application. The name must be exactly “amplify.yml”. This file contains the commands to build your application used by AWS CodeBuild.
2. Copy and paste the following code to the file amplify.yml.

version: 1 frontend: phases: preBuild: commands: - curl -sSL https://dot.net/v1/dotnet-install.sh > dotnet-install.sh - chmod +x *.sh - ./dotnet-install.sh -c 6.0 -InstallDir ./dotnet6 - ./dotnet6/dotnet --version build: commands: - ./dotnet6/dotnet publish -c Release -o release artifacts: baseDirectory: /release/wwwroot files: - '**/*' cache: paths: []

Create a new repository on Github:

1. 1. Log in to the Github website and create a new repository:

Create Github repo

1. 1. Type a name for your repository, choose private, and add a read.me file as shown in the following screenshot:

Create repo

Create a local repository for your application:

1. 1. On the root folder of your application enter the following commands. Make sure that you have configured git CLI with email and user

git add --all git commit -m “first commit” git branch -M main git remote add origin https://github.com/perinei/blazorToAmplify.git (replace red text with your repo) for ssh authentication use: git remote add origin [email protected]: perinei/blazorToAmplify.git git push -u origin main

Setup Amplify:

1. 1. Log in to the AWS account
  2. Go to AWS Amplify Service
  3. On the left panel, choose All apps
  4. Select New app as per the following screen
  5. Select Host Web App from the dropdown list

1. 1. Choose Github

1. 1. Select Continue. If you are still logged in on your Github account, then the page will automatically authenticate you, otherwise select the Authenticate Button
  2. Choose your repository: in my case, perinei/bazortoamplify
  3. Branch: main
  4. Select next

1. 1. Give your app a name
  2. amplify.yml will be automatically detected and will be used to build the application on AWS

1. 1. Select Next to review the configuration
  2. Select Save and Deploy
  3. Amplify will provision, build, deploy, and verify the application

1. 1. When the process is complete, select the URL of your application and test the application.

1. 1. Congratulations! Your Blazor WebAssembly is running on Amplify.

Cleaning up

To avoid incurring future charges, delete the resources. On Amplify, choose your app name on the left panel, select action, and then delete app.

Conclusion

Congratulations, you deployed your first Blazor Webassembly Application to AWS Amplify.

In this blog post you learned how to easily build a full CI/CD pipeline for a Blazor WebAssembly using the AWS amplify. It was only necessary to specify the repository and the commands build the application on the file amplify.yml that should be include on the root folder of repository. You can also easily add a custom domain to your application. Visit Set up custom domains on AWS Amplify Hosting

AWS can help you to migrate .NET applications to the Cloud. Visit .NET on AWS.

The .NET on AWS YouTube playlist is the place to get the latest .NET on AWS videos, including AWS re:Invent sessions.

To learn more about how to amplify.yml to build your application, visit Configuring build settings – AWS Amplify.

How to protect HMACs inside AWS KMS

2022-04-20 Jeremy Stieglitz

Post Syndicated from Jeremy Stieglitz original https://aws.amazon.com/blogs/security/how-to-protect-hmacs-inside-aws-kms/

Today AWS Key Management Service (AWS KMS) is introducing new APIs to generate and verify hash-based message authentication codes (HMACs) using the Federal Information Processing Standard (FIPS) 140-2 validated hardware security modules (HSMs) in AWS KMS. HMACs are a powerful cryptographic building block that incorporate secret key material in a hash function to create a unique, keyed message authentication code.

In this post, you will learn the basics of the HMAC algorithm as a cryptographic building block, including how HMACs are used. In the second part of this post, you will see a few real-world use cases that show an application builder’s perspective on using the AWS KMS HMAC APIs.

HMACs provide a fast way to tokenize or sign data such as web API requests, credit cards, bank routing information, or personally identifiable information (PII).They are commonly used in several internet standards and communication protocols such as JSON Web Tokens (JWT), and are even an important security component for how you sign AWS API requests.

HMAC as a cryptographic building block

You can consider an HMAC, sometimes referred to as a keyed hash, to be a combination function that fuses the following elements:

A standard hash function such as SHA-256 to produce a message authentication code (MAC).
A secret key that binds this MAC to that key’s unique value.

Combining these two elements creates a unique, authenticated version of the digest of a message. Because the HMAC construction allows interchangeable hash functions as well as different secret key sizes, one of the benefits of HMACs is the easy replaceability of the underlying hash function (in case faster or more secure hash functions are required), as well as the ability to add more security by lengthening the size of the secret key used in the HMAC over time. The AWS KMS HMAC API is launching with support for SHA-224, SHA-256, SHA-384, and SHA-512 algorithms to provide a good balance of key sizes and performance trade-offs in the implementation. For more information about HMAC algorithms supported by AWS KMS, see HMAC keys in AWS KMS in the AWS KMS Developer Guide.

HMACs offer two distinct benefits:

Message integrity: As with all hash functions, the output of an HMAC will result in precisely one unique digest of the message’s content. If there is any change to the data object (for example you modify the purchase price in a contract by just one digit: from “$350,000” to “$950,000”), then the verification of the original digest will fail.
Message authenticity: What distinguishes HMAC from other hash methods is the use of a secret key to provide message authenticity. Only message hashes that were created with the specific secret key material will produce the same HMAC output. This dependence on secret key material ensures that no third party can substitute their own message content and create a valid HMAC without the intended verifier detecting the change.

HMAC in the real world

HMACs have widespread applications and industry adoption because they are fast, high performance, and simple to use. HMACs are particularly popular in the JSON Web Token (JWT) open standard as a means of securing web applications, and have replaced older technologies such as cookies and sessions. In fact, Amazon implements a custom authentication scheme, Signature Version 4 (SigV4), to sign AWS API requests based on a keyed-HMAC. To authenticate a request, you first concatenate selected elements of the request to form a string. You then use your AWS secret key material to calculate the HMAC of that string. Informally, this process is called signing the request, and the output of the HMAC algorithm is informally known as the signature, because it simulates the security properties of a real signature in that it represents your identity and your intent.

Advantages of using HMACs in AWS KMS

AWS KMS HMAC APIs provide several advantages over implementing HMACs in application software because the key material for the HMACs is generated in AWS KMS hardware security modules (HSMs) that are certified under the FIPS 140-2 program and never leave AWS KMS unencrypted. In addition, the HMAC keys in AWS KMS can be managed with the same access control mechanisms and auditing features that AWS KMS provides on all AWS KMS keys. These security controls ensure that any HMAC created in AWS KMS can only ever be verified in AWS KMS using the same KMS key. Lastly, the HMAC keys and the HMAC algorithms that AWS KMS uses conform to industry standards defined in RFC 2104 HMAC: Keyed-Hashing for Message Authentication.

Use HMAC keys in AWS KMS to create JSON Web Tokens

The JSON Web Token (JWT) open standard is a common use of HMAC. The standard defines a portable and secure means to communicate a set of statements, known as claims, between parties. HMAC is useful for applications that need an authorization mechanism, in which claims are validated to determine whether an identity has permission to perform some action. Such an application can only work if a validator can trust the integrity of claims in a JWT. Signing JWTs with an HMAC is one way to assert their integrity. Verifiers with access to an HMAC key can cryptographically assert that the claims and signature of a JWT were produced by an issuer using the same key.

This section will walk you through an example of how you can use HMAC keys from AWS KMS to sign JWTs. The example uses the AWS SDK for Python (Boto3) and implements simple JWT encoding and decoding operations. This example shows the ease with which you can integrate HMAC keys in AWS KMS into your JWT application, even if your application is in another language or uses a more formal JWT library.

Create an HMAC key in AWS KMS

Begin by creating an HMAC key in AWS KMS. You can use the AWS KMS console or call the CreateKey API action. The following example shows creation of a 256-bit HMAC key:

import boto3

kms = boto3.client('kms')

# Use CreateKey API to create a 256-bit key for HMAC
key_id = kms.create_key(
	KeySpec='HMAC_256',
	KeyUsage='GENERATE_VERIFY_MAC'
)['KeyMetadata']['KeyId']

Use the HMAC key to encode a signed JWT

Next, you use the HMAC key to encode a signed JWT. There are three components to a JWT token: the set of claims, header, and signature. The claims are the very application-specific statements to be authenticated. The header describes how the JWT is signed. Lastly, the MAC (signature) is the output of applying the header’s described operation to the message (the combination of the claims and header). All these are packed into a URL-safe string according to the JWT standard.

The following example uses the previously created HMAC key in AWS KMS within the construction of a JWT. The example’s claims simply consist of a small claim and an issuance timestamp. The header contains key ID of the HMAC key and the name of the HMAC algorithm used. Note that HS256 is the JWT convention used to represent HMAC with SHA-256 digest. You can generate the MAC using the new GenerateMac API action in AWS KMS.

import base64
import json
import time

def base64_url_encode(data):
	return base64.b64encode(data, b'-_').rstrip(b'=')

# Payload contains simple claim and an issuance timestamp
payload = json.dumps({
	"does_kms_support_hmac": "yes",
	"iat": int(time.time())
}).encode("utf8")

# Header describes the algorithm and AWS KMS key ID to be used for signing
header = json.dumps({
	"typ": "JWT",
	"alg": "HS256",
	"kid": key_id #This key_id is from the “Create an HMAC key in AWS KMS” #example. The “Verify the signed JWT” example will later #assert that the input header has the same value of the #key_id 
}).encode("utf8")

# Message to sign is of form <header_b64>.<payload_b64>
message = base64_url_encode(header) + b'.' + base64_url_encode(payload)

# Generate MAC using GenerateMac API of AWS KMS
MAC = kms.generate_mac(
	KeyId=key_id, #This key_id is from the “Create an HMAC key in AWS KMS” 
				 #example
	MacAlgorithm='HMAC_SHA_256',
	Message=message
)['Mac']

# Form JWT token of form <header_b64>.<payload_b64>.<mac_b64>
jwt_token = message + b'.' + base64_url_encode(mac)

Verify the signed JWT

Now that you have a signed JWT, you can verify it using the same KMS HMAC key. The example below uses the new VerifyMac API action to validate the MAC (signature) of the JWT. If the MAC is invalid, AWS KMS returns an error response and the AWS SDK throws an exception. If the MAC is valid, the request succeeds and the application can continue to do further processing on the token and its claims.

def base64_url_decode(data):
	return base64.b64decode(data + b'=' * (4 - len(data) % 4), b'-_')

# Parse out encoded header, payload, and MAC from the token
message, mac_b64 = jwt_token.rsplit(b'.', 1)
header_b64, payload_b64 = message.rsplit(b'.', 1)

# Decode header and verify its contents match expectations
header_map = json.loads(base64_url_decode(header_b64).decode("utf8"))
assert header_map == {
	"typ": "JWT",
	"alg": "HS256",
	"kid": key_id #This key_id is from the “Create an HMAC key in AWS KMS” 
				 #example
}

# Verify the MAC using VerifyMac API of AWS KMS. # If the verification fails, this will throw an error.
kms.verify_mac(
	KeyId=key_id, #This key_id is from the “Create an HMAC key in AWS KMS” 
				 #example
	MacAlgorithm='HMAC_SHA_256',
	Message=message,
	Mac=base64_url_decode(mac_b64)
)

# Decode payload for use application-specific validation/processing
payload_map = json.loads(base64_url_decode(payload_b64).decode("utf8"))

Create separate roles to control who has access to generate HMACs and who has access to validate HMACs

It’s often helpful to have separate JWT creators and validators so that you can distinguish between the roles that are allowed to create tokens and the roles that are allowed to verify tokens. HMAC signatures performed outside of AWS-KMS don’t work well for this because you can’t isolate creators and verifiers if they both must have a copy of the same key. However, this is not an issue for HMAC keys in AWS KMS. You can use key policies to separate out who has permission to ask AWS KMS to generate HMACs and who has permission to ask AWS KMS to validate. Each party uses their own unique access keys to access the HMAC key in AWS KMS. Only HSMs in AWS KMS will ever have access to the actual key material. See the following example key policy statements that separate out GenerateMac and VerifyMac permissions:

{
	"Id": "example-jwt-policy",
	"Version": "2012-10-17",
	"Statement": [
		{
			"Sid": "Allow use of the key for creating JWTs",
			"Effect": "Allow",
			"Principal": {
				"AWS": "arn:aws:iam::111122223333:role/JwtProducer"
			},
			"Action": [
				"kms:GenerateMac"
			],
			"Resource": "*"
		},
		{
			"Sid": "Allow use of the key for validating JWTs",
			"Effect": "Allow",
			"Principal": {
				"AWS": "arn:aws:iam::111122223333:role/JwtConsumer"
			},
			"Action": [
				"kms:VerifyMac"
			],
			"Resource": "*"
		}
	]
}

Conclusion

In this post, you learned about the new HMAC APIs in AWS KMS (GenerateMac and VerifyMac). These APIs complement existing AWS KMS cryptographic operations: symmetric key encryption, asymmetric key encryption and signing, and data key creation and key enveloping. You can use HMACs for JWTs, tokenization, URL and API signing, as a key derivation function (KDF), as well as in new designs that we haven’t even thought of yet. To learn more about HMAC functionality and design, see HMAC keys in AWS KMS in the AWS KMS Developer Guide.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the KMS re:Post or contact AWS Support.
Want more AWS Security news? Follow us on Twitter.

Build a data pipeline to automatically discover and mask PII data with AWS Glue DataBrew

2022-04-13 Samson Lee

Post Syndicated from Samson Lee original https://aws.amazon.com/blogs/big-data/build-a-data-pipeline-to-automatically-discover-and-mask-pii-data-with-aws-glue-databrew/

Personally identifiable information (PII) data handling is a common requirement when operating a data lake at scale. Businesses often need to mitigate the risk of exposing PII data to the data science team while not hindering the productivity of the team to get to the data they need in order to generate valuable data insights. However, there are challenges in striking the right balance between data governance and agility:

Proactively identifying the dataset that contains PII data, if it’s not labeled by the data providers
Determining to what extent the data scientists can access the dataset
Minimizing chances that the data lake operator is visually exposed to PII data when they process the data

To help overcome these challenges, we can build a data pipeline that automatically scans data upon its arrival to the data lake, then further masks the portion of data that is labeled as PII data. Automating the PII data scanning and masking tasks helps prevent human actors processing the data while the PII data is still presented in plain text, yet still provides data consumers timely access to the newly arrived dataset.

To build a data pipeline that can automatically handle PII data, you can use AWS Glue DataBrew. DataBrew is a no-code data preparation tool with pre-built transformations to automate data preparation tasks. It natively supports PII data identification, entity detection, and PII data handling features. In addition to its visual interface for no-code data preparation, it offers APIs to let you orchestrate the creation and running of DataBrew profile jobs and recipe jobs.

In this post, we illustrate how you can orchestrate DataBrew jobs with AWS Step Functions to build a data pipeline to handle PII data. The pipeline is triggered by Amazon Simple Storage Service (Amazon S3) event notifications sent to Amazon EventBridge whenever there is a new data object lands in a S3 bucket. We also include an AWS CloudFormation template for you to deploy as a reference.

Solution overview

The following diagram describes the solution architecture.

Architecture Diagram

The solution includes a S3 bucket as the data input bucket and another S3 bucket as the data output bucket. Data uploaded to the data input bucket sends an event to EventBridge to trigger the data pipeline. The pipeline is composed of a Step Functions state machine, DataBrew jobs, and an AWS Lambda function used for reading the results of the DataBrew profile job.

The solution workflow includes the following steps:

A new data file is uploaded to the data input bucket.
EventBridge receives an object created event from the S3 bucket, and triggers the Step Functions state machine.
The state machine uses DataBrew to register the S3 object as a new DataBrew dataset, and creates a profile job. The profile job results, including the PII statistics, are written to the data output bucket.
A Lambda function reads the profile job results and returns whether the data file contains PII data.
If no PII data is found, the workflow is complete; otherwise, a DataBrew recipe job is created to target the columns that contain PII data.
When running the DataBrew recipe job, DataBrew uses the secret (a base64 encoded string, such as TXlTZWNyZXQ=) stored in AWS Secrets Manager to hash the PII columns.
When the job is complete, the new data file with PII data hashed is written to the data output bucket.

Prerequisites

To deploy the solution, you should have the following prerequisites:

An AWS account
An AWS user with AWS Identity and Access Management (IAM) permissions to manage AWS resources including Amazon S3, DataBrew, Step Functions, Lambda, and Secrets Manager

Deploy the solution using AWS CloudFormation

To deploy the solution using the CloudFormation template, complete the following steps.

Sign in to your AWS account.
Choose Launch Stack:
Navigate to one of the AWS Regions where DataBrew is available (such as us-east-1).
For Stack name, enter a name for the stack or leave as default (automate-pii-handling-data-pipeline).
For HashingSecretValue, enter a secret (which is base64 encoded during the CloudFormation stack creation) to use for data hashing.
For PIIMatchingThresholdValue, enter a threshold value (1–100 in terms of percentage, default is 80) to indicate the desired percentage of records the DataBrew profile job must identify as PII data in a given column, so that the data in the column is further hashed by the subsequent DataBrew PII recipe job.
Select I acknowledge that AWS CloudFormation might create IAM resources.
Choose Create stack.

CloudFormation Quick Launch Page

The CloudFormation stack creation process takes around 3-4 minutes to complete.

Test the data pipeline

To test the data pipeline, you can download the sample synthetic data generated by Mockaroo. The dataset contains synthetic PII fields such as email, contact number, and credit card number.

Data Preview

The sample data contains columns of PII data as an illustration; you can use DataBrew to detect PII values down to the cell level.

On the AWS CloudFormation console, navigate to the Outputs tab for the stack you created.
Choose the URL value for AmazonS3BucketForGlueDataBrewDataInput to navigate to the S3 bucket created for DataBrew data input.
Choose Upload.
Choose Add files to upload the data file you downloaded.
Choose Upload again.
Return to the Outputs tab for the CloudFormation stack.
Choose the URL value for AWSStepFunctionsStateMachine.

You’re redirected to the Step Functions console, where you can review the state machine you created. The state machine should be in a Running state.

In the Executions list, choose the current run of the state machine.

A graph inspector visualizes which step of the pipeline is being run. You can also inspect the step input and output of each step completed.
For the provided sample dataset, with 8 columns containing 1,000 rows of records, the whole run takes approximately 7–8 minutes.

Data pipeline details

While we’re waiting for the steps to complete, let’s explain more of how this data pipeline is built. The following figure is the detailed workflow of the Step Functions state machine.

Step Functions Workflow

The key step in the state machine is the Lambda function used to parse the DataBrew profile job result. The following code is a snippet of the profile job result in JSON format:

{
    "defaultProfileConfiguration": {...
    },
    "entityDetectorConfigurationOverride": {
        "AllowedStatistics": [...
        ],
        "EntityTypes": [
            "USA_ALL",
            "PERSON_NAME"
        ]
    },
    "datasetConfigurationOverride": {},
    "sampleSize": 1000,
    "duplicateRowsCount": 0,
    "columns": [
        {...
        },
        {
            "name": "email_address",
            "type": "string",
            "entity": {
                "rowsCount": 1000,
                "entityTypes": [
                    {
                        "entityType": "EMAIL",
                        "rowsCount": 1000
                    }
                ]
            }...
        }...
    ]...
}

Inside columns, each column object has the property entity if it’s detected to be a column containing PII data. rowsCount inside entity tells us how many rows out of the total sample are identified as PII, followed by entityTypes to indicate the type of PII identified.

The following is the Python code used in the Lambda function:

import json
import boto3
import os

def lambda_handler(event, context):

  s3Bucket = event["Outputs"][0]["Location"]["Bucket"]
  s3ObjKey = event["Outputs"][0]["Location"]["Key"]

  s3 =boto3.client('s3')
  glueDataBrewProfileResultFile = s3.get_object(Bucket=s3Bucket, Key=s3ObjKey)
  glueDataBrewProfileResult = json.loads(glueDataBrewProfileResultFile['Body'].read().decode('utf-8'))
  columnsProfiled = glueDataBrewProfileResult["columns"]
  PIIColumnsList = []

  for item in columnsProfiled:
    if "entityTypes" in item["entity"]:
      if (item["entity"]["rowsCount"]/glueDataBrewProfileResult["sampleSize"]) >= int(os.environ.get("threshold"))/100:
        PIIColumnsList.append(item["name"])

  if PIIColumnsList == []:
    return 'No PII columns found.'
  else:
    return PIIColumnsList

To summarize what the logic of the Lambda function is, a for-loop is implemented to aggregate a list of column names, in which the ratio of PII rows over the total sample size of that column is larger than or equal to the threshold value set earlier in the CloudFormation stack creation step. The Lambda function returns the list of column names to the Step Functions state machine to author a DataBrew recipe that masks only the columns in the returned list, instead of all the columns of the dataset. This way, we retain the content of non-PII columns for the data consumer while not exposing the PII data in plain text.

Step Functions Workflow Studio

We use CRYPTOGRAPHIC_HASH in this solution for the Operation parameter of the DataBrew CreateRecipe step. Because the profile job result and threshold value have already been used to determine which columns contain PII data to mask, the recipe step doesn’t include the parameter entityTypeFilter to enforce all rows of the columns getting hashed. Otherwise, some rows in the column might not be hashed by the operation if the particular rows of data are not identified by DataBrew as PII.

If your dataset potentially contains free-text columns such as doctor notes and email body, it would be beneficial to include the parameter entityTypeFilter in an additional recipe step to handle the free-text columns. For more information, refer to the values supported for this parameter.

To customize the solution further, you can also choose other PII recipe steps available from DataBrew to mask, replace, or transform the data in approaches best suited for your use cases.

Data pipeline results

After a deeper dive into the solution components, let’s check if all the steps in the Step Functions state machine are complete and review the results.

Navigate to the Datasets page on the DataBrew console to view the data profile result of the dataset you just uploaded.

Five columns of the dataset have been identified as columns containing PII data. Depending on the threshold value you set when creating the CloudFormation stack (the default is 80), the column spoken_language wouldn’t be included in the PII data masking step because only 14% of the rows were identified as a name of a person.

Navigate to the Jobs page to inspect the output of the data masking step.
Choose 1 output to see the S3 bucket containing the data output.
Choose the value for Destination to navigate to the S3 bucket.

The data output S3 bucket contains a .json file, which is the data profile result you just reviewed in JSON format. There is also a folder path that contains the data output of the PII data masking task.

Choose the folder path.
Select the CSV file, which is the output of the DataBrew recipe job.
On the Actions menu, choose Query with S3 Select.
In the SQL query section, choose Run SQL query.

The query results sampled five rows from the data output of the DataBrew recipe job; the columns identified as PII (full_name, email_address, and contact_phone_number) have been masked. Congratulations! You have successfully produced a dataset from a data pipeline that detects and masks PII data automatically.

Clean up

To avoid incurring future charges, delete the resources you created as part of this post.

On the AWS CloudFormation console, delete the stack you created (default name is automate-pii-handling-data-pipeline).

Conclusion

In this post, you learned how to build a data pipeline that automatically detects PII data and masks the data accordingly when a new data file arrives in an S3 bucket. With DataBrew profile jobs, you can develop logics with low code to run automatically on the profile results. For this post, our job determined which columns to mask. You can also author the DataBrew recipe job in an automated approach, which helps limit occasions when human actors can access the PII data while it’s still in plain text.

You can learn more about this solution and the source code by visiting the GitHub repository. To learn more about what DataBrew can do in handling PII data, refer to Introducing PII data identification and handling using AWS Glue DataBrew and Personally identifiable information (PII) recipe steps.

About the Author

Author

Samson Lee is a Solutions Architect with a focus on the data analytics domain. He works with customers to build enterprise data platforms, discovering and designing solutions on AI/ML use cases. Samson also enjoys coffee and wine tasting outside of work.

Orchestrating high performance computing with AWS Step Functions and AWS Batch

2022-04-12 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/orchestrating-high-performance-computing-with-aws-step-functions-and-aws-batch/

This post is written by Dan Fox, Principal Specialist Solutions Architect; Sabha Parameswaran, Senior Solutions Architect.

High performance computing (HPC) workloads address challenges in a wide variety of industries, including genomics, financial services, oil and gas, weather modeling, and semiconductor design. These workloads frequently have complex orchestration requirements that may include large datasets and hundreds or thousands of compute tasks.

AWS Step Functions is a workflow automation service that can simplify orchestrating other AWS services. AWS Batch is a fully managed batch processing service that can dynamically scale to address computationally intensive workloads. Together, these services can orchestrate and run demanding HPC workloads.

This blog post identifies three common challenges when creating HPC workloads. It describes some features with Step Functions and AWS Batch that can help to solve these challenges. It then shows a sample project that performs complex task orchestration using Step Functions and AWS Batch.

Reaching service quotas with polling iterations

It’s common for HPC workflows to require that a step comprising multiple jobs completes before advancing to the next step. In these cases, it’s typical for developers to implement iterative polling patterns to track job completions.

To handle task orchestration for a workload like this, you may choose to use Step Functions. Step Functions has a hard limit of 25,000 history events. A single state transition may contain multiple history events. For example, an entry to the state and an exit from the state count as two steps. A workflow that iteratively polls many long-running processes may run into limits with this service quota.

Step Functions addresses this by providing synchronization capabilities with several integrated services, including AWS Batch. For integrated services, Step Functions can wait for a task to complete before progressing to the next state. This feature, called “Run a Job (.sync)” in the documentation, returns a Success or Failure message only when the compute service task is complete, reducing the number of entries in the event history log. View the Step Functions documentation for a complete list of supported service integrations.

Supporting parallel and dynamic tasks

HPC workloads may require a changing number of compute tasks from execution to execution. For example, the number of compute tasks required in a workflow may depend on the complexity or length of an input dataset. For performance reasons, you may desire for these tasks to run in parallel.

Step Functions supports faster data processing with a fixed number of parallel executions through the parallel state type. If a workload has an unknown number of branches, the map state type can run a set of parallel steps for each element of an input array. We refer to this pattern as dynamic parallelism.

Limits and flow control with dynamic tasks

The map state may limit concurrent iterations. When this occurs, some iterations do not begin until previous iterations have completed. The likelihood of this occurring increases when an input array has over 40 items. If your HPC workload benefits from increased concurrency, you may use nested workflows. Step Functions allows you to orchestrate more complex processes by composing modular, reusable workflows.

For example, a map state may invoke secondary, nested workflows, which also contain map states. By nesting Step Functions workflows, you can build larger, more complex workflows out of smaller, simpler workflows.

As your workflow grows in complexity, you may use the callback task, which is an additional flow control feature. Callback tasks provide a way to pause a workflow pending the return of a unique task token. A callback task passes this token to an integrated service and then stops. Once the integrated service has completed, it may return the task token to the Step Functions workflow with a SendTaskSuccess or SendTaskFailure call. Once the callback task receives the task token, the workflow proceeds to the next state.

View the documentation for a list of integrated services that support this pattern.

A sample project with several orchestration patterns

This sample project demonstrates orchestration of HPC workloads using Step Functions, AWS Batch, and Lambda. The nesting in this project is three layers deep. The primary state machine runs the second layer nested state machines, which in turn run the third layer nested state machines.

The primary state machine demonstrates dynamic parallelism. It receives an input payload that contains an array of items used as input for its map step. Each dynamic map branch runs a nested secondary state machine.

The secondary state machine demonstrates several workflow patterns including a Lambda function with callback, a third layer nested state machine in sync mode, an AWS Batch job in sync mode, and a Lambda function calling AWS Batch with a callback. The tertiary state machine only notifies its parent with Success when called in sync mode.

Explore the ASL in this project to review the code for these patterns.

Deploy the sample application

The README of the Github project contains more detailed instructions, but you may also follow these steps.

Prerequisites

AWS Account: If you don’t have an AWS account, navigate to https://aws.amazon.com/ and choose Create an AWS Account.
VPC: A valid existing VPC with subnets (for execution of AWS Batch jobs). Refer to https://docs.aws.amazon.com/vpc/latest/userguide/vpc-getting-started.html for creating a VPC.
AWS CLI: This project requires a local install of AWS CLI. Refer to https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html for installing AWS CLI.
Git CLI: This project requires a local install of Git CLI. Refer to https://git-scm.com/book/en/v2/Getting-Started-Installing-Git
AWS SAM CLI: This project requires a local install of AWS SAM CLI to build and deploy the sample application. Refer to https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/serverless-sam-cli-install.html for instructions to install and use AWS SAM CLI.
Python 3.8 and Docker: Required only if you plan for local development and testing with AWS SAM CLI.

Build and deploy in your account

Follow these steps to build this application locally and deploy it in your AWS account:

Git clone the repository into a local folder.

git clone https://github.com/aws-samples/aws-stepfunction-complex-orchestrator-app
cd aws-stepfunction-complex-orchestrator-app

Build the application using AWS SAM CLI.
```
sam build
```
Use AWS SAM deploy with the --guided flag.
```
sam deploy --guided
```
1. Parameter BatchScriptName (accept default: batch-notify-step-function.sh)
2. Parameter VPCID (enter the id of your VPC)
3. Parameter FargateSubnetAccessibility (choose Private or Public depending on subnet configuration)
4. Parameter Subnets (enter ID for a single subnet)
5. Parameter batchSleepDuration (accept default: 15)
6. Accept all other defaults
Note the name of the BucketName parameter in the Outputs of the deploy process. Save this S3 bucket name for the next step.

Copy the batch script into the S3 bucket created in the prior step.

aws s3 cp ./batch-script/batch-notify-step-function.sh s3://<S3-bucket-name>

Testing the sample application

Once the SAM CLI deploy is complete, navigate to Step Functions in the AWS Console.

Note the three new state machines deployed to your account. Each state machine has a random suffix generated as part of the AWS SAM deploy process:
1. ComplexOrchestratorStateMachine1-…
2. SyncNestedStateMachine2-…
3. CallbackNotifyStateMachine3-…
Follow the link for the primary state machine: ComplexOrchestratorStateMachine1-…
Choose Start execution.
The sample payload for this state machine is located here: orchestrator-step-function-payload.json. This JSON document has 2 input elements within the entries array. Select and copy this JSON and paste it into the Input field in the console, overwriting the default value. This causes the map state to create and execute two nested state machines in parallel. You may modify this JSON to contain more elements to increase the number of parallel executions.
View the “Execution event history” in the console for this execution. Under the Resource column, locate the links to the nested state machines. Select these links to follow their individual executions. You may also explore the links to Lambda, CloudWatch, and AWS Batch job.
Navigate to AWS Batch in the console. Step Functions workflows are available to AWS Batch users within the AWS Batch console. Find the integration from the AWS Batch side navigation under “Related AWS services”. You can also read this blog post to learn more.
Common troubleshooting. If the batch job fails or if the Step Functions workflow times out, make sure that you have correctly copied over the batch script into the S3 bucket as described in step 5 of the Build and Deploy in your account section of this post. Also make sure that the FargateSubnetAccessibility Parameter matches the configuration of your subnet (Public or Private).
The state machine may take several minutes to complete. When successful, the Graph Inspector displays:

Cleaning up

To clean up the deployment, run the following commands to delete the stack associated with the AWS SAM deployment:

aws cloudformation delete-stack --stack-name <stack-name>

Conclusion

This blog post describes several challenges common to orchestrating HPC workloads. I describe how Step Functions with AWS Batch can solve many of these challenges. I provide a project that contains several sample patterns and show how to deploy and test this in your account.

For more information on serverless patterns, visit Serverless Land.

Introducing global endpoints for Amazon EventBridge

2022-04-08 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/introducing-global-endpoints-for-amazon-eventbridge/

This post is written by Stephen Liedig, Sr Serverless Specialist SA.

Last year, AWS announced two new features for Amazon EventBridge that allow you to route events from any commercial AWS Region, and across your AWS accounts. This supported a wide range of use cases, allowing you to implement easily global event delivery and replication scenarios.

From today, EventBridge extends this capability with global endpoints. Global endpoints provide a simpler and more reliable way for you to improve the availability and reliability of event-driven applications. The feature allows you to fail over event ingestion automatically to a secondary Region during service disruptions. Global endpoints also provide optional managed event replication, simplifying your event bus configuration and reducing the risk of event loss during any service disruption.

This blog post explains how to configure global endpoints in your AWS account, update your applications to publish events to the endpoint, and how to test endpoint failover.

How global endpoints work

Customers building multi-Region architectures today are building more resilience by using self-managed replication via EventBridge’s cross-Region capabilities. With this architecture, events are sent directly to an event bus in a primary Region and replicated to another event bus in a secondary Region.

This event flow can be interrupted if there is a service disruption. In this scenario, event producers in the primary Region cannot PutEvents to their event bus, and event replication to the secondary Region is impacted.

To put more resiliency around multi-Region architectures, you can now use global endpoints. Global endpoints solve these issues by introducing two core service capabilities:

A global endpoint is a managed Amazon Route 53 DNS endpoint. It routes events to the event buses in either Region, depending on the health of the service in the primary Region.
There is a new EventBridge metric called IngestionToInvocationStartLatency. This exposes the time to process events from the point at which they are ingested by EventBridge to the point the first invocation of a target in your rules is made. This is a service-level metric measured across all of your rules and provides an indication of the health of the EventBridge service. Any extended periods of high latency over 30 seconds may indicate a service disruption.

These two features provide you with the ability to failover event ingestion automatically to the event bus in the secondary Region. The failover is triggered via a Route 53 health check that monitors a CloudWatch alarm that observes the IngestionToInvocationStartLatency in the primary Region.

If the metric exceeds the configured threshold of 30 seconds consecutively for 5 minutes, the alarm state changes to “ALARM”. This causes the Route 53 health check state to become unhealthy, and updates the routing of the global endpoint. All events from that point on are delivered to the event bus in the secondary Region.

The diagram below illustrates how global endpoints reroutes events being delivered from the event bus in the primary Region to the event bus in the secondary Region when CloudWatch alarms trigger the failover of the Route 53 health check.

Once events are routed to the secondary Region, you have a couple of options:

Continue processing events by deploying the same solution that processes events in the primary Region to the secondary Region.
Create an EventBridge archive to persist all events coming through the secondary event bus. EventBridge archives provide you with a type of “Active/Archive” architecture allowing you to replay events to the event bus once the primary Region is healthy again.

When the global endpoint alarm returns to a healthy state, the health check updates the endpoint configuration, and begins routing events back to the primary Region.

Global endpoints can be optionally configured to replicate events across Regions. When enabled, managed rules are created on your primary and secondary event buses that define the event bus in the other Region as the rule target.

Under normal operating conditions, events being sent to the primary event bus are also sent to the secondary in near-real-time to keep both Regions synchronized. When a failover occurs, consumers in the secondary Region have an up-to-date state of processed events, or the ability to replay messages delivered to the secondary Region, before the failover happens.

When the secondary Region is active, the replication rule attempts to send events back to the event bus in the primary Region. If the event bus in the primary Region is not available, EventBridge attempts to redeliver the events for up to 24 hours, per its default event retry policy. As this is a managed rule, you cannot change this. To manage for a longer period, you can archive events being ingested by the secondary event bus.

How do you identify if the event has been replicated from another Region? Events routed via global endpoints have identical resource fields, which contain the Amazon Resource Name (ARN) of the global endpoint that routed the event to the event bus. The region field shows the origin of the event. In the following example, the event is sent to the event bus in the primary Region and replicated to the event bus in the secondary Region. The event in the secondary Region is us-east-1, showing the source of the event was the event bus in the primary Region.

If there is a failover, events are routed to the secondary Region and replicated to the primary Region. Inspecting these events, you would expect to see us-west-2 as the source Region.

The two preceding events are identical except for the id. Event IDs can change across API calls so correlating events across Regions requires you to have an immutable, unique identifier. Consumers should also be designed with idempotency in mind. If you are replicating events, or replaying them from archives, this ensures that there are no side effects from duplicate processing.

Setting up a global endpoint

To configure a Global endpoint, define two event buses — one in the “primary” Region (this is the same Region you configure the endpoint in) and one in a “secondary” Region. To ensure that events are routed correctly, the event bus in the secondary Region must have the same name, in the same account, as the primary event bus.

Create two event buses in different Regions with the same name. This is quickly set up using the AWS Command Line Interface (AWS CLI):Primary event bus:
aws events create-event-bus --name orders-bus --region us-east-1Secondary event bus:
aws events create-event-bus --name orders-bus --region us-west-2
Open the Amazon EventBridge console in the Region where you want to create the global endpoint. This aligns with your primary Region. Navigate to the new global endpoints page and create a new endpoint.
In the Endpoint details panel, specify a name for your global endpoint (for example, OrdersGlobalEndpoint) and enter a description.
Select the event bus in the primary Region, orders-bus.
Select the Region used when creating the secondary event bus previously. the secondary event bus by choosing the Region it was created in from the dropdown.
Select the Route 53 health check for triggering failover and recovery. If you have not created one before, choose New health check. This opens an AWS CloudFormation console to create the “LatencyFailuresHealthCheck” health check and CloudWatch alarm in your account. EventBridge provides a template with recommended defaults for a CloudWatch alarm that is triggered when the average latency exceeds 30 seconds for 5 minutes.
Once the CloudFormation stack is deployed, return to the EventBridge console and refresh the dropdown list of health checks. Select the physical ID of the health check you created.
Ensure that event replication is enabled, and create the endpoint.
Once the endpoint is created, it appears in the console. The global endpoint URL contains the EndpointId, which you must specify in PutEvents API calls to publish events to the endpoint.

Testing failover

Once you have created an endpoint, you can test the configuration by creating “catch all” rules on the primary and secondary event buses. The simplest way to see events being processed is to create rules with CloudWatch log group target.

Testing global endpoint failure over is accomplished by inverting the Route 53 health check. This can be accomplished using any of the Route 53 APIs. Using the console, open the Route 53 health checks landing page and edit the “LatencyFailuresHealthCheck” associated with your global endpoint. Check “Invert health check status” and save to update the health check.

Within a few minutes, the health check changes state from “Healthy” to “Unhealthy” and you see events flowing to the event bus in the secondary.

Using the PutEvents API with global endpoints

To use global endpoints in your applications, you must update your current PutEvents API call. All AWS SDKs have been updated to include an optional EndpointId parameter that you must set when publishing events to a global endpoint. Even though you are no longer putting events directly on the event bus, the EventBusName must be defined to validate the endpoint configuration.

PutEvents SDK support for global endpoints requires the AWS Common Runtime (CRT) library, which is available for multiple programming languages, including Python, Node.js, and Java:

https://github.com/awslabs/aws-crt-python
https://github.com/awslabs/aws-crt-nodejs
https://github.com/awslabs/aws-crt-java

To install the awscrt module for Python using pip, run:

python3 -m pip install boto3 awscrt

This example shows how to send an event to a global endpoint using the Python SDK:

import json
import boto3
from datetime import datetime
import uuid
import random

client = session.client('events', config=my_config)

detail = {
    "order_date": datetime.now().isoformat(),
    "customer_id": str(uuid.uuid4()),
    "order_id": str(uuid.uuid4()),
    "order_total": round(random.uniform(1.0, 1000.0), 2)
}

put_response = client.put_events(
    EndpointId=" y6gho8g4kc.veo",
    Entries=[
        {
            'Source': 'com.aws.Orders',
            'DetailType': 'OrderCreated',
            'Detail': json.dumps(detail),
            'EventBusName': 'orders-bus'
        }
    ]
)

Event producers can suffer data loss if the PutEvents API call fails, even if you are using global endpoints. Global endpoints allow you to automate the re-routing of events to another event-bus in another Region, but the health checks triggering the failover won’t be invoked for at least 5 minutes. It’s possible that your applications experience increased error rates for PutEvents operations before the failover occurs and events are routed to a healthy Region. To safeguard against message loss during this time, it’s best practice to use exponential retry and back-off patterns and durable store-and-forward capability at the producer level.

Conclusion

This blog shows how to create an EventBridge global endpoint to improve the availability and reliability of event ingestion of event-driven applications. This example shows how to use the PutEvents in the Python AWS SDK to publish events to a global endpoint.

To create a global endpoint using the API, see CreateEndpoint in the Amazon EventBridge API Reference. You can also create a global endpoint by using AWS CloudFormation using an AWS::Events:: Endpoints resource.

To learn more about EventBridge global endpoints, see the EventBridge Developer Guide. For more serverless learning resources, visit Serverless Land.

Improve reusability and security using Amazon Athena parameterized queries

2022-03-23 Blayze Stefaniak

Post Syndicated from Blayze Stefaniak original https://aws.amazon.com/blogs/big-data/improve-reusability-and-security-using-amazon-athena-parameterized-queries/

Amazon Athena is a serverless interactive query service that makes it easy to analyze data in Amazon Simple Storage Service (Amazon S3) using standard SQL, and you only pay for the amount of data scanned by your queries. If you use SQL to analyze your business on a daily basis, you may find yourself repeatedly running the same queries, or similar queries with minor adjustments. Athena parameterized queries enable you to prepare statements you can reuse with different argument values you provide at run time. Athena parameterized queries also provide a layer of security against SQL injection attacks, and mask the query string in AWS CloudTrail for workloads with sensitive data.

This post shows you how to create and run parameterized queries in Athena. This post provides an example of how Athena parameterized queries protect against SQL injection, and shows the CloudTrail events with the masked query string. Lastly, the post reviews functions related to managing Athena prepared statements. If you want to follow along, this post provides steps to set up the components with a sample dataset; alternatively, you can use your own dataset.

Reusability

Athena prepared statements allow you to run and reuse queries within your Athena workgroup. By decoupling the queries from the code, you can update your prepared statements and your applications independent from one another. If a data lake has schema updates, it could require query updates. If multiple applications share the same Athena workgroup and are using similar queries, you can create a new query or update the existing query to serve multiple use cases, without each application being required to adjust similar queries in their own source code. Parameterized queries are currently supported for SELECT, INSERT INTO, CTAS, and UNLOAD statements. For the most current list, refer to Considerations and Limitations in Querying with Prepared Statements.

Security

Athena prepared statements provide a layer of protection against SQL injection. If you are using Athena behind an application interface, free text inputs inherently present a SQL injection threat vector which, if left unmitigated, could result in data exfiltration. When the parameterized query is run, Athena interprets the arguments as literal values, not as executable commands nor SQL fragments like SQL operators.

When using Athena, CloudTrail captures all Athena API calls as audit events to provide a record of actions taken by an AWS user, role, or AWS service. Customers with sensitive data in their data lakes, such as personally identifiable information (PII), have told us they don’t want query strings in their CloudTrail event history for compliance reasons. When running parameterized queries, the query string is masked with HIDDEN_DUE_TO_SECURITY_REASONS in the CloudTrail event, so you don’t show protected data within your log streams.

Solution overview

This post documents the steps using the public Amazon.com customer reviews dataset; however, you can follow similar steps to use your own dataset.

The example query is to find a product’s 4-star (out of 5 stars) reviews voted as the most helpful by other customers. The intent behind the query is to find query results that indicate constructive product feedback. The intent is to validate the feedback and get helpful feedback incorporated into the product roadmap. The product used in this use case is the Amazon Smile eGift Card.

Prerequisites

As a prerequisite, you need a foundational understanding of SQL syntax, as well as a foundational understanding the following AWS services:

Athena
AWS CloudFormation
AWS Glue and the AWS Glue Data Catalog
AWS Identity and Access Management (IAM)
Amazon S3

This post assumes you have:

A CloudTrail trail created – for instructions, refer to Creating a trail.
An S3 bucket created – for instructions, refer to Creating a bucket.

Deploy resources for the example dataset

If you’re using the example dataset, follow the steps in this section. The data is in an S3 bucket in an AWS-managed AWS account. You need to create Athena and AWS Glue resources to get started.

This post provides a CloudFormation template that deploys the following resources in your AWS account:

AthenaWorkGroup – An Athena workgroup for your dataset and prepared statements. On the console, this workgroup is named PreparedStatementsWG.
GlueDatabase – A database in the AWS Glue Data Catalog for table metadata. The database is named athena_prepared_statements.
GlueTableAmazonReviews – An external table with Amazon.com customer reviews in the Data Catalog.

The following diagram shows how the resources interact when the query runs.

To deploy the CloudFormation template, follow these steps:

Navigate to this post’s GitHub repository.
Clone the repository or copy the CloudFormation template athena-prepared-statements.yaml.
On the AWS CloudFormation console, choose Create stack.
Select Upload a template file and choose Choose file.
Upload athena-prepared-statements.yaml, then choose Next.
On the Specify stack details page, enter the stack name athena-prepared-statements-blog.
For S3QueryResultsBucketName, enter your S3 bucket name.
If you leave AthenaWorkGroupName as default, the Athena workgroup is named PreparedStatementsWG. If you change the value, the Athena workgroup name must be unique in your AWS Region.
Choose Next.
On the Configure stack options page, choose Next.
On the Review page, choose Create stack.

The script takes less than a minute to run and change to a CREATE_COMPLETE state. If you deploy the stack twice in the same AWS account and Region, the AWS Glue database, table, or Athena workgroup may already exist, and the process fails with a message indicating that the resource already exists in another template.

For least-privilege authorization for deployment of the CloudFormation template, you can create an AWS CloudFormation service role with the following IAM policy actions. To do this, you must create an IAM policy and IAM role, and choose this role when configuring stack options.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "glue:CreateDatabase"
      ],
      "Resource": [
        "arn:${Partition}:glue:${Region}:${Account}:catalog",
        "arn:${Partition}:glue:${Region}:${Account}:database/athena_prepared_statements"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "glue:DeleteDatabase"
      ],
      "Resource": [
        "arn:${Partition}:glue:${Region}:${Account}:catalog",
        "arn:${Partition}:glue:${Region}:${Account}:database/athena_prepared_statements",
        "arn:${Partition}:glue:${Region}:${Account}:table/athena_prepared_statements/*",
        "arn:${Partition}:glue:${Region}:${Account}:userDefinedFunction/athena_prepared_statements/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "glue:CreateTable"
      ],
      "Resource": [
        "arn:${Partition}:glue:${Region}:${Account}:catalog",
        "arn:${Partition}:glue:${Region}:${Account}:database/athena_prepared_statements",
        "arn:${Partition}:glue:${Region}:${Account}:table/athena_prepared_statements/amazon_reviews_parquet"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "glue:DeleteTable"
      ],
      "Resource": [
        "arn:${Partition}:glue:${Region}:${Account}:catalog",
        "arn:${Partition}:glue:${Region}:${Account}:database/athena_prepared_statements",
        "arn:${Partition}:glue:${Region}:${Account}:table/athena_prepared_statements/amazon_reviews_parquet"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "athena:CreateWorkGroup",
        "athena:DeleteWorkGroup",
        "athena:GetWorkGroup"
      ],
      "Resource": "arn:${Partition}:athena:${Region}:${Account}:workgroup/PreparedStatementsWG"
    }
  ]
}

For authorization for the IAM principal running the CloudFormation template and following along, this post was tested with the following AWS managed policies and the customer managed policy below.

AWS managed policies:

AmazonAthenaFullAccess
AWSCloudTrailReadOnlyAccess
AWSCloudFormationFullAccess

Customer managed policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ViewS3BucketsWithoutErrors",
            "Effect": "Allow",
            "Action": [
                "s3:ListAllMyBuckets"
            ],
            "Resource": [
                "*"
            ]
        },
        {
            "Sid": "InteractWithMyBucketAndDataSetBucket",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:GetBucketLocation",
                "s3:ListBucket",
                "s3:PutObject"
            ],
            "Resource": [
                "arn:aws:s3:::${my-bucket-name}*",
                "arn:aws:s3:::amazon-reviews-pds*"
            ]
        },
        {
            "Sid": "UploadCloudFormationTemplate",
            "Effect": "Allow",
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::cf-template*"
        },
        {
            "Sid": "CleanUpResults",
            "Effect": "Allow",
            "Action": [
                "s3:DeleteObject"
            ],
            "Resource": [
                "arn:aws:s3:::${my-bucket-name}/results*"
            ]
        },
        {
            "Sid": "ListRolesForCloudFormationDeployment",
            "Effect": "Allow",
            "Action": [
                "iam:ListRoles"
            ],
            "Resource": [
                "*"
            ]
        },
        {
            "Sid": "IAMRoleForCloudFormationDeployment",
            "Effect": "Allow",
            "Action": [
                "iam:PassRole"
            ],
            "Resource": [
                "arn:${Partition}:iam::${Account}:role/${role-name}"
            ],
            "Condition": {
                "StringEquals": {
                    "iam:PassedToService": "cloudformation.amazonaws.com"
                }
            }
        }
    ]
}

Partition the example dataset

The CloudFormation template created an external table pointing at a dataset of over 130 million customer reviews from Amazon.com. Partitioning data improves query performance and reduces cost by restricting the amount of data scanned by each query. The external table for this dataset has Hive-compatible partitions. The MSCK REPAIR TABLE SQL statement scans the prefix paths in the S3 bucket and updates the metadata in the Data Catalog with the partition metadata. To access the dataset, the external table’s partitions must be updated.

After you deploy the CloudFormation template, complete the following steps:

On the Athena console, choose Query editor in the navigation pane.
For Data Source, enter AwsDataCatalog.
For Database, enter athena_prepared_statements.
On the Workgroup drop-down menu, choose PreparedStatementsWG.
Choose Acknowledge to confirm.
In the query editor pane, run the following SQL statement for your external table:

MSCK REPAIR TABLE athena_prepared_statements.amazon_reviews_parquet;

This query takes approximately 15 seconds to run when tested in us-east-1.

Run the following query to list the available partitions. The example dataset has partitions based on product_category.

SHOW PARTITIONS athena_prepared_statements.amazon_reviews_parquet;

Run a SELECT statement to output a sample of data available in the table:

SELECT * FROM athena_prepared_statements.amazon_reviews_parquet limit 10;

Create prepared statements

To use Athena parameterized queries, first you run the PREPARE SQL statement and specify your positional parameters, denoted by question marks. The Athena prepared statement is stored with a name you specify.

Run the following PREPARE statement in the Athena query editor. This example query, named product_helpful_reviews, provides customer reviews with three parameters for a specified product ID, star rating provided by the reviewer, and minimum number of helpful votes provided to the review by other Amazon.com customers.

PREPARE product_helpful_reviews FROM
SELECT product_id, product_title, star_rating, helpful_votes, review_headline, review_body
FROM amazon_reviews_parquet WHERE product_id = ? AND star_rating = ? AND helpful_votes > ?
ORDER BY helpful_votes DESC
LIMIT 10;

You could also use the CreatePreparedStatement API or SDK. For example, to create your prepared statement from AWS CLI, run the following command:

aws athena create-prepared-statement \
--statement-name "product_helpful_reviews" \
--query-statement "SELECT product_id, product_title, star_rating, helpful_votes, review_headline, review_body FROM amazon_reviews_parquet WHERE product_id = ? AND star_rating = ? AND helpful_votes > ? ORDER BY helpful_votes DESC LIMIT 10;" \
--work-group PreparedStatementsWG \
--region region

For more information on creating prepared statements, refer to SQL Statements in Querying with Prepared Statements.

Run parameterized queries

You can run a parameterized query against the prepared statement with the EXECUTE SQL statement and a USING clause. The USING clause specifies the argument values for the prepared statement’s parameters.

Run the following EXECUTE statement in the Athena query editor. The prepared statement created in the previous section is run with parameters to output 4-star reviews for the Amazon Smile eGift Card product ID with at least 10 helpful votes.

EXECUTE product_helpful_reviews USING 'BT00DDVMVQ', 4, 10;

If you receive the message PreparedStatement product_helpful_reviews was not found in workGroup primary, make sure you selected the PreparedStatementsWG workgroup.

For more information on running parameterized queries, refer to SQL Statements in Querying with Prepared Statements.

Mask query string data in CloudTrail events using parameterized queries

You may want to use parameterized queries to redact sensitive data from the query string visible in CloudTrail events. For example, you may have columns containing PII as parameters, which you don’t want visible in logs. Athena automatically masks query strings from CloudTrail events for EXECUTE statements, replacing the query string with the value HIDDEN_DUE_TO_SECURITY_REASONS. This helps you avoid displaying protected data in your log streams.

To access the CloudTrail event for the query, complete the following steps:

Navigate to the Event history page on the CloudTrail console.
On the drop-down menu, choose Event name.
Search for StartQueryExecution events.

CloudTrail event records for parameterized queries include a queryString value redacted with HIDDEN_DUE_TO_SECURITY_REASONS. The query string is visible in the Athena workgroup’s query history. You can control access by using least-privilege IAM policies to Athena, the AWS Glue Data Catalog, and the Amazon S3 query output location in your workgroup settings. For more information on viewing recent queries, refer to Viewing Recent Queries. For more information on IAM policies, refer to Actions, resources, and condition keys for AWS services.

Layer of protection for SQL injection

In this section, you’re shown an example of a SQL injection attack, and how prepared statements can protect against the same attack. We use the Athena console to invoke the StartQueryExecution API against a table named users with three rows.

SQL injection is an attempt to insert malicious SQL code into requests to change the statement and extract data from your dataset’s tables. Without Athena parameterized queries, if you’re querying a dataset directly or appending user input to a SQL query, and users can append SQL fragments, the dataset may be vulnerable to SQL injection attacks which return unauthorized data in the result set.

This post shows an example of inserting a SQL fragment in a malicious way. In the example, an OR condition which will always return true (such as OR 1=1) is appended to the WHERE clause. The same example query is shown with Athena parameterized queries, and the query fails because it contains an invalid parameter value, since the parameter value is expected to be an integer but contains the characters “OR”. If the parameter was based on a String column, then the same SQL injection attempt would result in the query returning no results because the positional argument is interpreted as a literal parameter value.

Athena provides an additional layer of defense against multi-statement SQL injection attacks. Attempting to perform SQL injection with an executable command (such as DROP) results in a failed query with Athena providing an error Only one sql statement is allowed, because Athena only accepts one executable command per SQL statement submission.

Although Athena prepared statements provide a layer of protection against SQL injection attacks, other precautions provide additional layers of defense. Athena prepared statements can be a part of your defense-in-depth strategy. For more information on layers of security, refer to Amazon Athena Security.

SQL injection example

The intended use of the SELECT query in the example is to receive a small set of values. However, an attacker can manipulate the input to append malicious SQL code. For example, an attacker can input a value of 1 OR 1=1, which appends a true condition to the WHERE clause and returns all records in the table:

SELECT * FROM users WHERE id = 1 OR 1=1;

By appending malicious SQL code, the attacker can retrieve all rows of the users table, as shown in the following screenshot.
An image of the Athena graphical user interface. A query SELECT * FROM users WHERE id = 1 OR 1=1; has been run. All 3 users in the table, with ids 1, 2, and 3, returned with all columns of the table.

SQL injection attempt with a prepared statement

If we create prepared statements with the same query from the previous example, the executable command is passed as a literal argument for the parameter’s value. If a user tries to pass additional SQL, they receive a syntax error because the WHERE clause is based on ID, which expects an integer value.

Create a prepared statement using the same query against the users table:

PREPARE get_user FROM SELECT * FROM users WHERE id = ?

Set the parameter to a legitimate value:

EXECUTE get_user USING 1

The expected result returns, as shown in the following screenshot.

Graphical user interface of Athena running query EXECUTE get_user USING 1. Only the user with id 1 returned.

Now, attempt to pass a malicious value:

EXECUTE get_user USING 1 OR 1=1

Running this prepared statement produces a syntax error, because an integer value is expected, but it receives an invalid integer value of 1 OR 1=1. The query and syntax error are shown in the following screenshot.

Graphical user interface of Athena querying EXECUTE get_user USING 1 OR 1=1. There is an error. The error says "SYNTAX_ERROR: line 1:24: Left side of logical expression must evaluate to a boolean (actual: integer). This query ran against the "default" database, unless qualified by the query. Please post the error message in our forum."

Working with prepared statements

This section describes administrative functions to make it easier to work with prepared statements.

List all prepared statements in my AWS account

To list all prepared statements in an Athena workgroup from the AWS Command Line Interface (AWS CLI), you can run the following command:

aws athena list-prepared-statements --work-group workgroup_name --region region_name

If following the example above, the command will return the following response.

{
  "PreparedStatements": [
    {
      "StatementName": "product_helpful_reviews",
      "LastModifiedTime": "2022-01-14T15:33:07.935000+00:00"
    }
  ]
}

To list all available prepared statements in your AWS account, you can use the AWS APIs. This post provides a sample script using the AWS SDK for Python (Boto3) to loop through all Regions in your account, and provide the prepared statements per Athena workgroup.

Make sure you have AWS credentials where you plan to run the Python script. For more information, refer to Credentials.

Clone the GitHub repo or copy the Python script list-prepared-statements.py from the repo and run the script:

python3 list-prepared-statements.py

Replace <my-profile-name> with your AWS profile name when it prompts you, or leave empty to use default local credentials.

Enter the AWS CLI profile name or leave blank if using instance profile: <my-profile-name>

The following text is the output of the script. If following along, the response returns only the product_helpful_reviews prepared statement.

eu-north-1:
ap-south-1:
eu-west-3:
eu-west-2:
eu-west-1:
ap-northeast-3:
ap-northeast-2:
ap-northeast-1:
sa-east-1:
ca-central-1:
ap-southeast-1:
ap-southeast-2:
eu-central-1:
us-east-1:
        athena-v2-wg: my_select
        PreparedStatementsWG: get_user
        PreparedStatementsWG: get_contacts_by_company
        PreparedStatementsWG: product_helpful_reviews
        PreparedStatementsWG: count_stars
        PreparedStatementsWG: helpful_reviews
        PreparedStatementsWG: get_product_info
        PreparedStatementsWG: check_avg_stars_of_category
        PreparedStatementsWG: my_select_v1
        PreparedStatementsWG: my_select_v2
us-east-2:
us-west-1:
us-west-2:

Update prepared statements

You have a few options for updating prepared statements. You may want to do this to optimize your query performance, change the values you select, or for several other reasons.

Rerun the PREPARE statement with the changes in the Athena query editor or against the StartQueryExecution API.
Use the UpdatePreparedStatement API via the AWS CLI or SDK.

You can use this API to add a description to your prepared statements or update your queries. To update your query statement via this method, you must provide the statement name, workgroup name, updated query statement, and optionally a new description. For more information about the UpdatePreparedStatement API, refer to update-prepared-statement.

You may want to roll out versions of your query. To maintain backward-compatibility for users, you could create a new prepared statement with a different name. For example, the prepared statement could have a version number in its name (such as my_select_v1 and my_select_v2). When necessary, you could communicate changes to teams who rely on the prepared statement, and later deallocate the old prepared statement versions.

Delete prepared statements

To delete a prepared statement, you can use the following query syntax when against the StartQueryExecution API, or from within the Athena query editor:

DEALLOCATE PREPARE product_helpful_reviews

You could also use the DeletePreparedStatement API or SDK. For example, to delete your prepared statement from AWS CLI, run the following command:

aws athena delete-prepared-statement --statement-name product_helpful_reviews --work-group PreparedStatementsWG --region region

Clean up

If you followed along with this post, you created several components that may incur costs. To avoid future charges, remove the resources with the following steps:

Delete the S3 bucket’s results prefix created after you run a query on your workgroup.

With the default template, it’s named <S3QueryResultsBucketName>/athena-results. Use caution in this step. Unless you are using versioning on your S3 bucket, deleting S3 objects cannot be undone.

Delete the Athena prepared statements in the PreparedStatementsWG

You can follow the steps in the Delete prepared statements section of this post using either the DEALLOCATE PREPARE statement or delete-prepared-statement API for each prepared statement you created.

To remove the CloudFormation stack, select the stack on the AWS CloudFormation console, choose Delete, and confirm.

Conclusion

Athena parameterized queries make it easy to decouple your code base from your queries by providing a way to store common queries within your Athena workgroup. This post provided information about how Athena parameterized queries can improve your code reusability and data lake security. We showed how you can set up a sample data lake and start using parameterized queries today. We also provided an example of the protections parameterized queries offers, and detailed additional administrative functions.

You can get started with Athena prepared statements via the Athena console, the AWS CLI, or the AWS SDK. To learn more about Athena, refer to the Amazon Athena User Guide.

Thanks for reading this post! If you have questions about Athena parameterized queries, don’t hesitate to leave a comment in the comments section.

About the Authors

Blayze Stefaniak is a Senior Solutions Architect at AWS who works with public sector, federal financial, and healthcare organizations. Blayze is based out of Pittsburgh. He is passionate about breaking down complex situations into something practical and actionable. His interests include artificial intelligence, distributed systems, and Excel formula gymnastics. Blayze holds a B.S.B.A. in Accounting and B.S. in Information Systems from Clarion University of Pennsylvania. In his spare time, you can find Blayze listening to Star Wars audiobooks, trying to make his dogs laugh, and probably talking on mute.

Daniel Tatarkin is a Solutions Architect at Amazon Web Services (AWS) supporting Federal Financial organizations. He is passionate about big data analytics and serverless technologies. Outside of work, he enjoys learning about personal finance, coffee, and trying out new programming languages for fun.

Federated access to Amazon Redshift clusters in AWS China Regions with Active Directory Federation Services

2022-03-23 Clement Yuan

Post Syndicated from Clement Yuan original https://aws.amazon.com/blogs/big-data/federated-access-to-amazon-redshift-clusters-in-aws-china-regions-with-active-directory-federation-services/

Many customers already manage user identities through identity providers (IdPs) for single sign-on access. With an IdP such as Active Directory Federation Services (AD FS), you can set up federated access to Amazon Redshift clusters as a mechanism to control permissions for the database objects by business groups. This provides a seamless user experience, and centralizes the governance of authentication and permissions for end-users. For more information, refer to the blog post series “Federate access to your Amazon Redshift cluster with Active Directory Federation Services (AD FS)” (part 1, part 2).

Due to the differences in the implementation of Amazon Web Services in China, customers have to adjust the configurations accordingly. For example, AWS China Regions (Beijing and Ningxia) are in a separate AWS partition, therefore all the Amazon Resource Names (ARNs) include the suffix -cn. AWS China Regions are also hosted at a different domain: www.amazonaws.cn.

This post introduces a step-by-step procedure to set up federated access to Amazon Redshift in AWS China Regions. It pinpoints the key differences you should pay attention to and provides a troubleshooting guide for common errors.

Solution overview

The following diagram illustrates the process of Security Assertion Markup Language 2.0 (SAML)-based federation access to Amazon Redshift in AWS China Regions. The workflow includes the following major steps:

The SQL client provides a user name and password to AD FS.
AD FS authenticates the credential and returns a SAML response if successful.
The SQL client makes an API call to AWS Security Token Service (AWS STS) to assume a preferred role with SAML.
AWS STS authenticates the SAML response based on the mutual trust and returns temporary credentials if successful.
The SQL client communicates with Amazon Redshift to get back a database user with temporary credentials, then uses it to join database groups and connect to the specified database.

We organize the walkthrough in the following high-level steps:

Configure an AD FS relying party trust for AWS China Regions and define basic claim rules.
Provision an AWS Identity and Access Management (IAM) identity provider and roles.
Complete the remaining relying party trust’s claim rules based on the IAM resources.
Connect to Amazon Redshift with federated access via a JDBC-based SQL client.

Prerequisites

This post assumes that you have the following prerequisites:

Windows Server 2016
The ability to create users and groups in AD
The ability to configure a relying party trust and define claim rules in AD FS
An AWS account
Sufficient permissions to provision IAM identity providers, roles, Amazon Virtual Private Cloud (Amazon VPC) related resources, and an Amazon Redshift cluster via AWS Cloud Development Kit (AWS CDK)

Configure an AD FS relying party trust and define claim rules

A relying party trust allows AWS and AD FS to communicate with each other. It is possible to configure two relying party trusts for both AWS China Regions and AWS Regions in the same AD FS at the same time. For AWS China Regions, we need to use a different SAML metadata document at https://signin.amazonaws.cn/static/saml-metadata.xml. The relying party’s identifier for AWS China Regions is urn:amazon:webservices:cn-north-1, whereas that for AWS Global Regions is urn:amazon:webservices. Note down this identifier to use later in this post.

Add AD groups and users

With SAML-based federation, end-users assume an IAM role and use it to join multiple database (DB) groups. The permissions on such roles and DB groups can be effectively managed by AD groups. We use different prefixes in AD group names to distinguish them, which help map to roles and DB group claim rules. It’s important to distinguish correctly the two types of AD groups because they’re mapped to different AWS resources.

We continue our walkthrough with an example. Suppose in business there are two roles: data scientist and data engineer, and two DB groups: oncology and pharmacy. Data scientists can join both groups, and data engineers can only join the pharmacy group. On the AD side, we define one AD group for each role and group. On the AWS side, we define one IAM role for each role and one Amazon Redshift DB group for each DB group. Suppose Clement is a data scientist and Jimmy is a data engineer, and both are already managed by AD. The following diagram illustrates this relationship.

You may create the AD groups and users with either the AWS Command Line Interface (AWS CLI) or the AWS Management Console. We provide sample commands in the README file in the GitHub repo.

Follow substeps a to o of Step 2 in Setting up JDBC or ODBC Single Sign-on authentication with AD FS to set up the relying party with the correct SAML metadata document for AWS China Regions and define the first three claim rules (NameId, RoleSessionName, and Get AD Groups). We resume after the IAM identity provider and roles are provisioned.

Provision an IAM identity provider and roles

You establish the trust for AD with AWS by provisioning an IAM identity provider. The IAM identity provider and assumed roles should be in one AWS account, otherwise you get the following error message during federated access: “Principal exists outside the account of the Role being assumed.” Follow these steps to provision the resources:

Download the metadata file at https://yourcompany.com/FederationMetadata/2007-06/FederationMetadata.xml from your AD FS server.
Save it locally at /tmp/FederationMetadata.xml.
Check out the AWS CDK code on GitHub.
Use AWS CDK to deploy the stack named redshift-cn:

export AWS_ACCOUNT=YOUR_AWS_ACCOUNT
export AWS_DEFAULT_REGION=cn-north-1
export AWS_PROFILE=YOUR_PROFILE

cdk deploy redshift-cn --require-approval never

The AWS CDK version should be 2.0 or newer. For testing purposes, you may use the AdministratorAccess managed policy for the deployment. For production usage, use a profile with least privilege.

The following table summarizes the resources that the AWS CDK package provisions.

Service	Resource	Count	Notes
Amazon VPC	VPC	1	.
	Subnet	2	.
	Internet gateway	1	.
	Route table	1	.
	Security group	1	.
IAM	SAML Identity provider	1	.
IAM	Role	3	1 service role for cluster 2 federated roles
Amazon Redshift	Cluster	1	1 node, dc2.large
AWS Secrets Manager	Secret	1	.

In this example, the Publicly Accessible setting of the Amazon Redshift cluster is set to Enabled for simplicity. However, in a production environment, you should disable this setting and place the cluster inside a private subnet group. Refer to How can I access a private Amazon Redshift cluster from my local machine for more information.

Configure a security group

Add an inbound rule for your IP address to allow connection to the Amazon Redshift cluster.

Find the security group named RC Default Security Group.
Obtain the pubic IP address of your machine.
Add an inbound rule for this IP address and the default port for Amazon Redshift 5439.

Complete the remaining claim rules

After you provision the IAM identity provider and roles, add a claim rule to define SAML roles. We add a customer claim rule with the name Roles. It finds AD groups with the prefix role_ and replaces it with a combined ARN string. Pay attention to the ARNs of the resources where the partition is aws-cn. Replace AWS_ACCOUNT with your AWS account ID. The following table demonstrates how the selected AD groups are transformed to IAM role ARNs.

Selected AD Group	Transformed IAM Role ARN
`role_data_scientist`	`arn:aws-cn:iam::AWS_ACCOUNT:role/rc_data_scientist`
`role_data_engineer`	`arn:aws-cn:iam::AWS_ACCOUNT:role/rc_data_engineer`

To add the claim rule, open the AD FS management console in your Windows Server and complete the following steps:

Choose Relying Party Trusts, then choose the relying party for AWS China.
Choose Edit Claim Issuance Policy, then choose Add Role.
On the Claim rule template menu, choose Send Claims Using a Custom Rule.
For Claim rule name, enter Roles.
In the Custom rule section, enter the following:

c:[Type == "http://temp/variable", Value =~ "(?i)^role_"]
=> issue(Type = "https://aws.amazon.com/SAML/Attributes/Role",
Value = RegExReplace(c.Value, "role_",
"arn:aws-cn:iam::AWS_ACCOUNT:saml-provider/rc-provider,arn:aws-cn:iam::AWS_ACCOUNT:role/rc_"));

The optional parameters of DbUser, AutoCreate, and DbGroups can be provided via either JDBC connection parameters or SAML attribute values. The benefit of user federation is to manage users in one place centrally. Therefore, the DbUser value should be automatically provided by the SAML attribute. The AutoCreate parameter should always be true, otherwise you have to create DB users beforehand. Finally, the DbGroups parameter could be provided by SAML attributes provided that such relationship is defined in AD.

To summarize, we recommend to provide at least DbUser and AutoCreate in SAML attributes, such that the end-user can save time by composing shorter connection strings. In our example, we provide all three parameters via SAML attributes.

Add a customer claim rule named DbUser. We use an email address as the value for DbUser:

c:[Type == "http://schemas.microsoft.com/ws/2008/06/identity/claims/windowsaccountname", 
 Issuer == "AD AUTHORITY"]
=> issue(store = "Active Directory",
types = ("https://redshift.amazon.com/SAML/Attributes/DbUser"),
query = ";mail;{0}", param = c.Value);

You can also choose a Security Accounts Manager (SAM) account name, which is usually the user name of the email address. Using an email address plays an important role in IAM role policy setting. We revisit this issue later.

Add the custom claim rule named AutoCreate:

=> issue(type = "https://redshift.amazon.com/SAML/Attributes/AutoCreate", value = "true");

Add a customer claim rule named DbGroups. It finds all AD groups with the prefix group_ and lists them as values for DbGroups:

c:[Type == "http://temp/variable", Value =~ "(?i)^group_"]
=> issue(Type = "https://redshift.amazon.com/SAML/Attributes/DbGroups", Value = c.Value);

You can test the preceding setting is correct by obtaining the SAML response via your browser.

Visit https://yourcompany.com/adfs/ls/IdpInitiatedSignOn.aspx on your Windows Server, log in with user clement, and check that the following SAML attributes exist. For user jimmy, the role is rc_data_engineer and the DB group contains only group_pharmacy.

<AttributeStatement>
    <Attribute Name="https://aws.amazon.com/SAML/Attributes/RoleSessionName">
        <AttributeValue>[email protected]</AttributeValue>
    </Attribute>
    <Attribute Name="https://aws.amazon.com/SAML/Attributes/Role">
        <AttributeValue>arn:aws-cn:iam::AWS_ACCOUNT:saml-provider/rc-provider,arn:aws-cn:iam::AWS_ACCOUNT:role/rc_data_scientist</AttributeValue>
    </Attribute>
    <Attribute Name="https://redshift.amazon.com/SAML/Attributes/DbUser">
        <AttributeValue>[email protected]</AttributeValue>
    </Attribute>
    <Attribute Name="https://redshift.amazon.com/SAML/Attributes/AutoCreate">
        <AttributeValue>true</AttributeValue>
    </Attribute>
    <Attribute Name="https://redshift.amazon.com/SAML/Attributes/DbGroups">
        <AttributeValue>group_pharmacy</AttributeValue>
        <AttributeValue>group_oncology</AttributeValue>
    </Attribute>
</AttributeStatement>

The preceding SAML attribute names are verified valid for AWS China Regions. The URLs end with amazon.com. It’s incorrect to change them to amazonaws.cn or amazon.cn.

Connect to Amazon Redshift with a SQL client

We use JDBC-based SQL Workbench/J (SQL client) to connect to the Amazon Redshift cluster. Amazon Redshift uses a DB group to collect DB users. The database privileges are managed collectively at group level. In this post, we don’t dive deep into privilege management. However, you need to create the preceding two DB groups.

Connect to the provisioned cluster and create the groups. You can connect on the AWS Management Console via the query editor with temporary credentials, or via a SQL client with database user admin and password. The password is stored in AWS Secrets Manager. You may need proper permissions for the above operations.

create group group_oncology;
create group group_pharmacy;

Follow the instructions in Connect to your cluster by using SQL Workbench/J to download and install the SQL client and the Amazon Redshift JDBC driver.

We recommend the JDBC driver version 2.1 with AWS SDK driver-dependent libraries.

Test that the cluster is connectable via its endpoint. The primary user name is admin. You retrieve the secret value of the cluster’s password via Secrets Manager. Specify DSILogLevel and LogPath to collect driver logs and help diagnosis. The connection string looks like the following code. Replace CLUSTER_ENDPOINT with the correct value and delete all line breakers. We split the line for readability.

jdbc:redshift://CLUSTER_ENDPOINT.cn-north-1.redshift.amazonaws.com.cn:5439/main
;ssl_insecure=true
;DSILogLevel=3
;LogPath=/tmp

For AWS China Regions, one extra JDBC driver option loginToRp must be set as you set up a separate relying party trust for the AWS China Regions. If an AD user is mapped to more than one AWS role, in the connection string, use preferred_role to specify the exact role to assume for federated access.

Copy the role ARN directly and pay attention to the aws-cn partition.

If the user is mapped to only one role, this option can be omitted.

Replace CLUSTER_ID with the correct cluster identifier. For the user name, enter yourcompany\clement; for the password, enter the credential from AD:

jdbc:redshift:iam://CLUSTER_ID:cn-north-1/main
;ssl_insecure=true
;DSILogLevel=3
;LogPath=/tmp
;loginToRp=urn:amazon:webservices:cn-north-1
;plugin_name=com.amazon.redshift.plugin.AdfsCredentialsProvider
;idp_host=adfsserver.yourcompany.com
;preferred_role=arn:aws-cn:iam::AWS_ACCOUNT:role/rc_data_scientist

When you’re connected, run the SQL statement as shown in the following screenshot.

The user prefixed with IAMA indicates that the user connected with federated access and was auto-created.

As an optional step, in the connection string, you can set the DbUser, AutoCreate, and DbGroups parameters.

Parameters from the connection string are before those from SAML attributes. We recommend you set at least DbUser and AutoCreate via SAML attributes. If it’s difficult to manage DB groups in AD users or you want flexibility, specify DbGroups in the connection string. See the following code:

jdbc:redshift:iam://CLUSTER_ID:cn-north-1/main
;ssl_insecure=true
;DSILogLevel=3
;LogPath=/tmp
;loginToRp=urn:amazon:webservices:cn-north-1
;plugin_name=com.amazon.redshift.plugin.AdfsCredentialsProvider
;idp_host=adfsserver.yourcompany.com
;preferred_role=arn:aws-cn:iam::AWS_ACCOUNT:role/rc_data_scientist
;[email protected]
;AutoCreate=true
;DbGroups=group_oncology

Use an email or SAM account name as DB user

The role policy follows the example policy for using GetClusterCredentials. It further allows redshift:DescribeClusters on the cluster because the role queries the cluster endpoint and port based on its identifier and Region. To make sure that the DB user is the same as the AD user, in this post, we use the following condition to check, where ROLE_ID is the unique identifier of the role:

{"StringEqualsIgnoreCase": {"aws:userid": "ROLD_ID:${redshift:DbUser}"}}

The example policy uses the following condition:

{"StringEqualsIgnoreCase": {"aws:userid": "ROLD_ID:${redshift:DbUser}@yourcompany.com"}}

The difference is apparent. The aws:userid contains the RoleSessionName, which is the email address. The SAM account name is the string before @ in the email address. Because the connection string parameter is before the SAML attribute parameter, we summarize the possible cases as follows:

If SAML attributes contain DbUser:
- If the condition value contains a domain suffix:
  - If the DbUser SAML attribute value is an email address, DbUser must be in the connection string without the domain suffix.
  - If the DbUser SAML attribute value is a SAM account name, DbUser can be omitted in the connection string. Otherwise, the value must not contain a domain suffix.
- If the condition value doesn’t contain a domain suffix:
  - If the DbUser SAML attribute value is an email address, DbUser can be omitted in the connection string. Otherwise, the value must contain a domain suffix.
  - If the DbUser SAML attribute value is a SAM account name, DbUser must be in the connection string with a domain suffix.
If SAML attributes don’t contain DbUser:
- If the condition value contains a domain suffix, DbUser must be in the connection string without a domain suffix.
- If the condition value doesn’t contain a domain suffix, DbUser can be omitted in the connection string, because RoleSessionName value which is the email address acts as DbUser. Otherwise, the value must contain a domain suffix.

Troubleshooting

Federated access to Amazon Redshift is a non-trivial process. However, it consists of smaller steps that we can divide and conquer when problems occur. Refer to the access diagram in the solution overview. We can split the process into three phrases:

Is SAML-based federation successful? Verify this by visiting the single sign-on page of AD FS and make sure you can sign in to the console with the federated role. Do you configure the relying party with the AWS China specific metadata document? Obtain the SAML response and check if the destination is https://signin.amazonaws.cn/saml. Are the SAML provider ARN and IAM role ARNs correct? Check if the role’s trust relationship contains the correct value for SAML:aud. For other possible checkpoints, refer to Troubleshooting SAML 2.0 federation with AWS.
Are the role policies correct? If SAML-based federation is successful, check the role policies are correct. Compare yours with those provided by this post. Did you use aws where aws-cn should be used? If the policy condition contains a domain suffix, is it the correct domain suffix? You can obtain the domain suffix in use if you get the error that the assumed role isn’t authorized to perform an action.
Is the SQL client connecting successfully? Is the cluster identifier correct? Make sure that your connection string contains the loginToRp option and points to the AWS China relying party. If multiple IAM roles are mapped, make sure preferred_role is one of them with the correct role ARN. You can get the list of roles in the SAML response. Try to set ssl_insecure to true temporarily for debugging. Check the previous subsection and make sure the DbUser is properly used or set according to the DbUser SAML attribute and condition value for aws:user. Turn on the driver logs and get debug hints there. Sometimes you may need to restart the SQL client to clear the cache and retry.

Security concerns

In a production environment, we suggest applying the following security settings, which aren’t used in this post.

For the Amazon Redshift cluster, complete the following:

Disable the publicly accessible option and place the cluster inside a private or isolated subnet group
Encrypt the cluster, for example, with a customer managed AWS Key Management Service (AWS KMS) key
Enable enhanced VPC routing such that the network doesn’t leave your VPC
Configure the cluster to require Secure Sockets Layer (SSL) and use one-way SSL authentication

For the IAM federated roles:

Specify the exact DB groups for action redshift:JoinGroup. If you want to use a wildcard, make sure it doesn’t permit unwanted DB groups.
Check StringEquals for aws:user against the role ID along with the Amazon Redshift DB user. This condition can be checked for GetClusterCredentials, CreateClusterUser, and JoinGroup actions. Refer to the sample code for detailed codes.

In Amazon Redshift, the DB group is used to manage privileges for a collection of DB users. A DB user joins some DB groups during a login session and is granted the privileges associated to the groups. As we discussed before, you can use either the SAML attribute value or the connection property to specify the DB groups. The Amazon Redshift driver prefers the value from the connection string to that from the SAML attribute. As a result, the end-user can override the DB groups in the connection string. Therefore, to confine the privileges a DB user can be granted, the IAM role policy must restrict which DB groups the DB user is allowed to join safely, otherwise there might be a security risk. The following policy snippet shows such a risk. Always follow the least privilege principle when defining permission policies.

{
    "Effect": "Allow",
    "Action": "redshift:JoinGroup",
    "Resource": "*"
}

Clean up

Run the following command to destroy the resources and stop incurring charges:

cdk destroy redshift-cn --force

Remove the users and groups created in the AD FS. Finally, remove the relying party trust for AWS China Regions in your AD FS if you don’t need it anymore.

Conclusion

In this post, we walked you through how to connect to Amazon Redshift in China with federated access based on AD FS. AWS China Regions are in a partition different from other AWS Regions, so you must pay special attention during the configuration. In summary, you need to check AWS resources ARNs with the aws-cn partition, SAML-based federation with the AWS China specific metadata document, and an Amazon Redshift JDBC driver with extra connecting options. This post also discusses different usage scenarios for the redshift:Dbuser parameter and provides common troubleshooting suggestions.

For more information, refer to the Amazon Redshift Cluster Management Guide. Find the code used for this post in the following GitHub repository.

About the Authors

Wenjun Yuan is a Cloud Infra Architect in AWS Professional Services based in Chengdu, China. He works with various customers, from startups to international enterprises, helping them build and implement solutions with state-of-the-art cloud technologies and achieve more in their cloud explorations. He enjoys reading poetry and traveling around the world in his spare time.

Khoa Nguyen is a Big Data Architect in AWS Professional Services. He works with large enterprise customers and AWS partners to accelerate customers’ business outcomes by providing expertise in Big Data and AWS services.

Yewei Li is a Data Architect in AWS Professional Services based in Shanghai, China. He works with various enterprise customers to design and build data warehousing and data lake solutions on AWS. In his spare time, he loves reading and doing sports.

How to re-platform and modernize Java web applications on AWS

2022-03-23 Rick Armstrong

Post Syndicated from Rick Armstrong original https://aws.amazon.com/blogs/compute/re-platform-java-web-applications-on-aws/

This post is written by: Bill Chan, Enterprise Solutions Architect

According to a report from Grand View Research, “the global application server market size was valued at USD 15.84 billion in 2020 and is expected to expand at a compound annual growth rate (CAGR) of 13.2% from 2021 to 2028.” The report also suggests that Java based application servers “accounted for the largest share of around 50% in 2020.” This means that many organizations continue to rely on Java application server capabilities to deliver middleware services that underpin the web applications running their transactional, content management and business process workloads.

The maturity of the application server technology also means that many of these web applications were built on traditional three-tier web architectures running in on-premises data centers. And as organizations embark on their journey to cloud, the question arises as to what is the best approach to migrate these applications?

There are seven common migration strategies when moving applications to the cloud, including:

Retain – keeping applications running as is and revisiting the migration at a later stage
Retire – decommissioning applications that are no longer required
Repurchase – switching from existing applications to a software-as-a-service (SaaS) solution
Rehost – moving applications as is (lift and shift), without making any changes to take advantage of cloud capabilities
Relocate – moving applications as is, but at a hypervisor level
Replatform – moving applications as is, but introduce capabilities that take advantage of cloud-native features
Refactor – re-architect the application to take full advantage of cloud-native features

Refer to Migrating to AWS: Best Practices & Strategies and the 6 Strategies for Migrating Applications to the Cloud for more details.

This blog focuses on the ‘replatform’ strategy, which suits customers who have large investments in application server technologies and the business case for re-architecting doesn’t stack up. By re-platforming their applications into the cloud, customers can benefit from the flexibility of a ‘pay-as-you-go’ model, dynamically scale to meet demand and provision infrastructure as code. Additionally, customers can increase the speed and agility to modernize existing applications and build new cloud-native applications to deliver better customer experiences.

In this post, we walk through the steps to replatform a simple contact management Java application running on an open-source Tomcat application server, along with modernization aspects that include:

Deploying a Tomcat web application with automatic scaling capabilities
Integrating Tomcat with Redis cache (using Redisson Session Manager for Tomcat)
Integrating Tomcat with Amazon Cognito for authentication (using Boyle Software’s OpenID Connect Authenticator for Tomcat)
Delegating user log in and sign up to Amazon Cognito

Overview of solution

The solution is comprised of the following components:

A VPC across two Availability Zones
Two public subnets, two private app subnets, and two private DB subnets
An Internet Gateway attached to the VPC
- A public route table routing internet traffic to the Internet Gateway
- Two private route tables routing traffic internally within the VPC
A frontend web server application Elastic Load Balancing that routes traffic to the Apache Web Servers
An Auto Scaling group that launches additional Apache Web Servers based on defined scaling policies. Each instance of the web server is based on a launch template, which defines the same configuration for each new web server.
A hosted zone in Amazon Route 53 with a domain name that routes to the frontend web server Elastic Load Balancing
An application Elastic Load Balancing that routes traffic to the Tomcat application servers
An Auto Scaling group that launches additional Tomcat Application Servers based on defined scaling policies. Each instance of the Tomcat application server is based on a launch template, which defines the same configuration and software components for each new application server
A Redis cache cluster with a primary and replica node to store session data after the user has authenticated, making your application servers stateless
A Redis open-source Java client, with a Tomcat Session Manager implementation to store authenticated user session data in Redis cache
A MySQL Amazon Relational Database Service (Amazon RDS) Multi-AZ deployment for MySQL RDS to store the contact management and role access tables
An Amazon Simple Storage Service (Amazon S3) bucket to store the application and framework artifacts, images, scripts and configuration files that are referenced by any new Tomcat application server instances provisioned by automatic scaling
Amazon Cognito with a sign-up Lambda function to register users and insert a corresponding entry in the user account tables. Cognito acts as an identity provider and performs the user authentication using an OpenID Connect Authenticator Java component

Walkthrough

The following steps overviews how to deploy the blog solution:

Clone and build the Sample Web Application and AWS Signup Lambda Maven projects from GitHub repository
Deploy the CloudFormation template (java-webapp-infra.yaml) to create the AWS networking infrastructure and the CloudFormation template (java-webapp-rds.yaml) to create the database instance
Update and build the sample web application and signup Lambda function
Upload the packages into your S3 bucket
Deploy the CloudFormation template (java-webapp-components.yaml) to create the blog solution components
Update the solution configuration files and upload them into your S3 bucket
Run a script to provision the underlying database tables
Validate the web application, session cache and automatic scaling functionality
Clean up resources

Prerequisites

For this walkthrough, you should have the following prerequisites:

An AWS account
An Amazon Elastic Compute Cloud (Amazon EC2) key pair (required for authentication). For more details, see Amazon EC2 key pairs
A Java Integrated Development Environment (IDE) such as Eclipse or NetBeans. AWS also offers a cloud-based IDE that lets you write, run and debug code in your browser without having to install files or configure your development machine, called AWS Cloud9. I will show how AWS Cloud9 can be used as part of a DevOps solution in a subsequent post
A valid domain name and SSL certificate for the deployed web application. To validate the OAuth 2.0 integration, Cognito requires the URL that the user is redirected to after successful sign-in to be HTTPS. Refer to a configuring a user pool app client for more details
Downloaded the following JARs:

Note: the solution was validated in the preceding versions and therefore, the launch template created for the CloudFormation solution stack refers to these specific JARs. If you decide to use different versions, then the ‘java-webapp-components.yaml’ will need to be updated to reflect the new versions. Alternatively, you can externalize the parameters in the template.

Clone the GitHub repository to your local machine

This repository contains the sample code for the Java web application and post confirmation sign-up Lambda function. It also contains the CloudFormation templates required to set up the AWS infrastructure, SQL script to create the supporting database and configuration files for the web server, Tomcat application server and Redis cache.

Deploy infrastructure CloudFormation template

2. Create the infrastructure stack using the java-webapp-infra.yaml template (located in the ‘config’ directory of the repo).

3. Infrastructure stack outputs:

Deploy database CloudFormation template

Log in to the AWS Management Console and open the CloudFormation service.
Create the infrastructure stack using the java-webapp-rds.yaml template (located in the ‘config’ directory of the repo).
Database stack outputs.

Update and build sample web application and signup Lambda function

Import the ‘sample-webapp’ and ‘aws-signup-lambda’ Maven projects from the repository into your IDE.

Update the sample-webapp’s UserDAO class to reflect the RDSEndpoint, DBUserName, and DBPassword from the previous step:”

// Externalize and update jdbcURL, jdbcUsername, jdbcPassword parameters specific to your environment
	private String jdbcURL = "jdbc:mysql://<RDSEndpoint>:3306/webappdb?useSSL=false";
	private String jdbcUsername = "<DBUserName>";
	private String jdbcPassword = "<DBPassword>";

To build the ‘sample-webapp’ Maven project, use the standard ‘clean install’ goals.

Update the aws-signup-lambda’s signupHandler class to reflect RDSEndpoint, DBUserName, and DBPassword from the solution stack:

// Update with your database connection details
		String jdbcURL = "jdbc:mysql://<RDSEndpoint>:3306/webappdb?useSSL=false";
		String jdbcUsername = "<DBUserName>";
		String jdbcPassword = "<DBPassword>";

To build the aws-signup-lambda Maven project, use the ‘package shade:shade’ goals to include all dependencies in the package.
Two packages are created in their respective target directory: ‘sample-webapp.war’ and ‘create-user-lambda-1.0.jar’

Upload the packages into your S3 bucket

Log in to the AWS Management Console and open the S3 service.
Select the bucket created by the infrastructure CloudFormation template in an earlier step.

3. Create a ‘config’ and ‘lib’ folder in the bucket.

4. Upload the ‘sample-webapp.war’ and ‘create-user-lambda-1.0.jar’ created an earlier step (along with the downloaded packages from the pre-requisites section) into the ‘lib’ folder of the bucket. The ‘lib’ folder should look like this:

Note: the solution was validated in the preceding versions and therefore, the launch template created for the CloudFormation solution stack refers to these specific package names.

Deploy the solution components CloudFormation template

1. Log in to the AWS Management Console and open the CloudFormation service (if you aren’t already logged in from the previous step).

2. Create the web application solution stack using the ‘java-webapp-components.yaml’ template (located in the ‘config’ directory of the repo).

3. Guidance on the different template input parameters:

a. BastionSGSource – default is 0.0.0.0/0, but it is recommended to restrict this to your allowed IPv4 CIDR range for additional security

b. BucketName – the bucket name created as part of the infrastructure stack. This blog uses the bucket name is ‘chanbi-java-webapp-bucket’

c. CallbackURL – the URL that the user is redirected to after successful sign up/sign in is composed of your domain name (blog.example.com), the application root (sample-webapp), and the authentication form action ‘j_security_check’. As noted earlier, this needs to be over HTTPS

d. CreateUserLambdaKey – the S3 object key for the signup Lambda package. This blog uses the key ‘lib/create-user-lambda-1.0.jar’

e. DBUserName – the database user name for the MySQL RDS. Make note of this as it will be required in a subsequent step

f. DBUserPassword – the database user password. Make note of this as it will be required in a subsequent step

g. KeyPairName – the key pair to use when provisioning the EC2 instances. This key pair was created in the pre-requisite step

h. WebALBSGSource – the IPv4 CIDR range allowed to access the web app. Default is 0.0.0.0/0

i. The remaining parameters are import names from the infrastructure stack. Use default settings

4. After successful stack creation, you should see the following java web application solution stack output:

Update configuration files

The GitHub repository’s ‘config’ folder contains the configuration files for the web server, Tomcat application server and Redis cache, which needs to be updated to reflect the parameters specific to your stack output in the previous step.
Update the virtual hosts in ‘httpd.conf’ to proxy web traffic to the internal app load balancer. Use the value defined by the key ‘AppALBLoadBalancerDNS’ from the stack output.
```
<VirtualHost *:80>
ProxyPass / http://<AppALBLoadBalancerDNS>:8080/
ProxyPassReverse / http://<AppALBLoadBalancerDNS>:8080/
</VirtualHost>
```

Update JDBC resource for the ‘webappdb’ in the ‘context.xml, with the values defined by the RDSEndpoint, DBUserName, and DBPassword from the solution components CloudFormation stack:

<Resource name="jdbc/webappdb" auth="Container" type="javax.sql.DataSource"
               maxTotal="100" maxIdle="30" maxWaitMillis="10000"
               username="<DBUserName>" password="<DBPassword>" driverClassName="com.mysql.jdbc.Driver"
               url="jdbc:mysql://<RDSEndpoint>:3306/webappdb"/>

Log in to the AWS Management Console and open the Amazon Cognito service. Select ‘Manage User Pools’ and you will notice that a ‘java-webapp-pool’ has been created by the solution components CloudFormation stack. Select the ‘java-webapp-pool’ and make note of the ‘Pool Id’, ‘App client id’ and ‘App client secret’.

5. Update ‘Valve’ configuration in the ‘context.xml’, with the ‘Pool Id’, ‘App client id’ and ‘App client secret’ values from the previous step. The Cognito IDP endpoint specific to your Region can be found here. The host base URI needs to be replaced with the domain for your web application.

    <Valve className="org.bsworks.catalina.authenticator.oidc.tomcat90.OpenIDConnectAuthenticator"
       providers="[
           {
               name: 'Amazon Cognito',
               issuer: https://<cognito-idp-endpoint-for-you-region>/<cognito-pool-id>,
               clientId: <user-pool-app-client-id>,
               clientSecret: <user-pool-app-client-secret>
           }
       ]"
        hostBaseURI="https://<your-sample-webapp-domain>" usernameClaim="email" />

6. Update the ‘address’ parameter in ‘redisson.yaml’ with Redis cluster endpoint. Use the value defined by the key ‘RedisClusterEndpoint’ from the solution components CloudFormation stack output.

singleServerConfig:
    address: "redis://<RedisClusterEndpoint>:6379"

7. No updates are required to the following files:

a. server.xml – defines a data source realm for the user names, passwords, and roles assigned to users

      <Realm className="org.apache.catalina.realm.DataSourceRealm"
   dataSourceName="jdbc/webappdb" localDataSource="true"
   userTable="user_accounts" userNameCol="user_name" userCredCol="user_pass"
   userRoleTable="user_account_roles" roleNameCol="role_name" debug="9" />
      </Realm>

b. tomcat.service – allows Tomcat to run as a service

c. uninstall-sample-webapp.sh – removes the sample web application

Upload configuration files into your S3 bucket

Upload the configuration files from the previous step into the ‘config’ folder of the bucket. The ‘config’ folder should look like this:

Update the Auto Scaling groups

Auto Scaling groups manage the provisioning and removal of the web and application instances in our solution. To start an instance of the web server, update the Auto Scaling group’s desired capacity (1), minimum capacity (1) and maximum capacity (2) as shown in the following image:

2. To start an instance of the application server, update the Auto Scaling group’s desired capacity (1), minimum capacity (1) and maximum capacity (2) for as shown in the following image:

The web and application scaling groups will show a status of “Updating capacity” (as shown in the following image) as the instances start up.

After web and application servers have started, an instance will appear under ‘Instance management’ with a ‘Healthy’ status for each Auto Scaling group (as shown in the following image).

Run the database script webappdb_create_tables.sql

The database script creates the database and underlying tables required by the web application. As the database server resides in the DB private subnet and is only accessible from the application server instance, we need to first connect (via SSH) to the bastion host (using public IPv4 DNS), and from there we can connect (via SSH) to the application server instance (using its private IPv4 address). This will in turn allow us to connect to the database instance and run the database script. Refer to connecting to your Linux instance using SSH for more details. Instance details are located under the ‘Instances’ view (as shown in the following image).

2. Transfer the database script webappdb_create_tables.sql to the application server instance via the Bastion Host. Refer to transferring files using a client for details.

3. Once connected to the application server via SSH, execute the command to connect to the database instance:

mysql -h <RDSEndpoint> -P 3306 -u <DBUserName> -p

4. Enter the DB user password used when creating the database instance. You will be presented with the MySQL prompt after successful login:

Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MySQL connection id is 300
Server version: 8.0.23 Source distribution

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MySQL [(none)]>

5. Run the command to run the database script webappdb_create_tables.sql:

source /home/ec2-user/webappdb_create_tables.sql

Add an HTTPS listener to the external web load balancer

Log in to the AWS Management Console and select Load Balancers as part of the EC2 service
Add a HTTPS listener on port 443 for the web load balancer. The default action for the listener is to forward traffic to the web instance target group. Refer to create an HTTPS listener for your Application Load Balancer for more details.

Reference the SSL certificate for your domain. In the following example, I have used a certificate from AWS Certificate Manager (ACM) for my domain. You also have the option of using a certificate from Identity Access Management or importing your own certificate.

Update your DNS to route traffic from the external web load balancer

In this example, I use Amazon Route 53 as the Domain Name Server (DNS) service, but the steps will be similar when using your own DNS service.
Create an A record type that routes traffic from your domain name to the external web load balancer. For more details, refer to creating records by using the Amazon Route 53 console.

Validate the web application

In your browser, access the following https://<yourdomain.example.com>/sample-webapp
Select “Amazon Cognito” to authenticate using Cognito as the Identity Provider (IdP). You will be redirected to the login page for your Cognito domain.
Select the “Sign up” to create a new user and enter your email and password. Note the password strength requirements that can be configured as part of the user pool’s policies.
An email with the verification code will be sent to the sign-up email address. Enter the code on the verification code screen.
After successful confirmation, you will be re-directed to the authenticated landing page for the web application.
The simple web application allows you to add, edit, and delete contacts as shown in the following image.

Validate the session data on Redis

Follow the steps outlined in connecting to nodes for details on connecting to your Redis cache cluster. You will need to connect to your application server instance (via the bastion host) to perform this as the Redis cache is only accessible from the private subnet.
After successfully installing the Redis client, search for your authenticated user session key in the cluster by running the command (from within the ‘redis-stable’ directory):
```
src/redis-cli -c -h <RedisClusterEndpoint> -p 6379 -–bigkeys
```

You should see an output with your Tomcat authenticated session (if you can’t, perform another login via the Cognito login screen):

# Scanning the entire keyspace to find biggest keys as well as
# average sizes per key type.  You can use -i 0.1 to sleep 0.1 sec
# per 100 SCAN commands (not usually needed).

[00.00%] Biggest hash   found so far '"redisson:tomcat_session:AE647D93F2BECEFEE07B5B42C435E3DE"' with 8 fields

Connect to the cache cluster:

# src/redis-cli -c -h <RedisClusterEndpoint> -p 6379

Run the HGETALL command to get the session details:

java-webapp-redis-cluster.<xxxxxx>.0001.apse2.cache.amazonaws.com:6379> HGETALL "redisson:tomcat_session:AE647D93F2BECEFEE07B5B42C435E3DE"
 1) "session:creationTime"
 2) "\x04L\x00\x00\x01}\x16\x92\x1bX"
 3) "session:lastAccessedTime"
 4) "\x04L\x00\x00\x01}\x16\x92%\x9c"
 5) "session:principal"
 6) "\x04\x04\t>@org.apache.catalina.realm.GenericPrincipal$SerializablePrincipal\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x04>\x04name\x16\x00>\bpassword\x16\x00>\tprincipal\x16\x00>\x05roles\x16\x00\x16>\[email protected]>\[email protected]\x01B\x01\x14>\bstandard"
 7) "session:maxInactiveInterval"
 8) "\x04K\x00\x00\a\b"
 9) "session:isValid"
10) "\x04P"
11) "session:authtype"
12) "\x04>\x04FORM"
13) "session:isNew"
14) "\x04Q"
15) "session:thisAccessedTime"
16) "\x04L\x00\x00\x01}\x16\x92%\x9c"

Scale your web and application server instances

Amazon EC2 Auto Scaling provides several ways for you to scale instances in your Auto Scaling group such as scaling manually as we did in an earlier step. But you also have the option to scale dynamically to meet changes in demand (such as maintaining CPU Utilization at 50%), predictively scale in advance of daily and weekly patterns in traffic flows, or scale based on a scheduled time. Refer to scaling the size of your Auto Scaling group for more details.
We will create a scheduled action to provision another application server instance.
As per our scheduled action, at 11.30 am, an additional application server instance is started.
Under instance management, you will see an additional instance in ‘Pending’ state as it starts.
To test the stateless nature of your application, you can manually stop the original application server instance and observe that your end-user experience is unaffected i.e. you are not prompted to re-authenticate and can continue using the application as your session data is stored in Redis ElastiCache and not tied to the original instance.

Cleaning up

To avoid incurring future charges, remove the resources by deleting the java-webapp-components, java-webapp-rds and java-webapp-infra CloudFormation stacks.

Conclusion

Customers with significant investments in Java application server technologies have options to migrate to the cloud without requiring a complete re-architecture of their applications. In this blog, we’ve shown an approach to modernizing Java applications running on Tomcat Application Server in AWS. And in doing so, take advantage of cloud-native features such as automatic scaling, provisioning infrastructure as code, and leveraging managed services (such as ElastiCache for Redis and Amazon RDS) to make our application stateless. We also demonstrated modernization features such as authentication and user provisioning via an external IdP (Amazon Cognito). For more information on different re-platforming patterns refer to the AWS Prescriptive Guidance on Migration.

Managing and Securing AWS Outposts Instances using AWS Systems Manager, Amazon Inspector, and Amazon GuardDuty

2022-03-22 sbbusser

Post Syndicated from sbbusser original https://aws.amazon.com/blogs/compute/managing-and-securing-aws-outposts-instances-using-aws-systems-manager-amazon-inspector-and-amazon-guardduty/

This post is written by Sumeeth Siriyur, Specialist Solutions Architect.

AWS Outposts is a family of fully managed solutions that deliver AWS infrastructure and services to virtually any on-premises or edge location for a truly consistent hybrid experience. Outposts is ideal for workloads that need low latency access to on-premises applications or systems, local data processing, and secure storage of sensitive customer data that must remain anywhere without an AWS region, including inside company-controlled environments or a specific country.

A key feature of Outposts is that it offers the same AWS hardware infrastructure, services, APIs, and tools to build and run your applications on-premises and “in AWS Regions”. Outposts is part of the cloud for a truly consistent hybrid experience. AWS compute, storage, database, and other services run locally on Outposts, and you can access the full range of AWS services available in the Region to build, manage, and scale your on-premises applications using familiar AWS services and tools.

Outposts comes in a variety of form factors, from 1U and 2U servers to 42U Outposts rack. This post focuses on the 42U form factor of Outposts.

This post demonstrates how to use some of the existing AWS services in the Region, such as AWS System Manager (SSM), Amazon Inspector, and Amazon GuardDuty to manage and secure your workload environment on Outposts rack. This is no different from how you use these services for workloads in the AWS Regions.

Solution overview

In this scenario, Outposts rack is locally installed in a customer premises. The service link connectivity to the AWS Region can be either via an AWS Direct Connect private virtual interface, a public virtual interface, or the public internet.

The local gateway (LGW) provides connectivity between the Outposts instances and the local on-premises network.

A virtual private cloud (VPC) spans all Availability Zones in its AWS Region. You can extend the VPC in the Region to the Outpost by adding an Outpost subnet. To add an Outpost subnet to a VPC, specify the Amazon Resource Name (ARN) – arn:aws:outposts:region:account-id – of the Outpost when you create the subnet. Outposts rack support multiple subnets. In this scenario, we have extended the VPC from the Region (us-west-2) to the Outpost.

To improve the security posture of the Outposts instance, you can configure AWS SSM to use an interface VPC endpoint in Amazon Virtual Private Cloud (VPC). An interface VPC endpoint lets you connect to services powered by AWS PrivateLink, a technology that lets you privately access AWS SSM APIs by using private IP addresses. See the details in the following AWS SSM section for the VPC endpoints.

Most importantly, to leverage any of the AWS services in the Region, Outposts rack relies on connectivity to the parent AWS Region. Outposts rack is not designed for disconnected operations or environments with limited to no connectivity. We recommend that you have highly-available networking connections back to your AWS Region. For an optimal experience and resiliency, AWS recommends that you use redundant connectivity of at least 500 Mbps (1 Gbps or higher) for the service link connection to the AWS Region.

Outposts offers a consistent experience with the same hardware infrastructure, services, APIs, management, and operations on-premises as in the AWS Regions. Unlike other hybrid solutions that require different APIs, manual software updates, and purchase of third-party hardware and support, Outposts enables developers and IT operations teams to achieve the same pace of innovation across different environments.

In the first section, let’s see how we can use AWS SSM services for managing and operating Outposts instances.

Managing Outposts instances using AWS SSM

The Amazon Systems Manager Agent (SSM Agent) is installed and running on the Outposts instances.

SSM Agent is installed by default on Amazon Linux, Amazon Linux 2, Ubuntu Server16.04 and Ubuntu Server 18.04 LTS based Amazon Elastic Compute Cloud (EC2) AMIs. If SSM Agent isn’t preinstalled, then you must manually install the agent. Agent communication with SSM is via TCP port 443.

Linux: Manually install SSM Agent on EC2 instances for Linux

Windows: Manually install SSM Agent on EC2 instances for Windows Server

Create an IAM instance profile for SSM

By default, SSM doesn’t have permission to perform actions on your instances. Grant access by using an AWS Identity and Access Management (IAM) instance profile. An instance profile is a container that passes IAM role information to an Amazon EC2 instance at launch. You can create an instance profile for SSM by attaching one or more IAM policies that define the necessary permissions to a new role or to a role that you already created. Make sure that you follow AWS best practices by having a least-privileges policy created.

Create VPC endpoints for SSM.

a. amazonaws.us-west-2.ssm: The endpoint for the Systems Manager service.

b. amazonaws.us-west-2.ec2messages: Systems Manager uses this endpoint to make calls from the SSM Agent to the Systems Manager service.

c. amazonaws.us-west-2.ec2: If you’re using Systems Manager to create VSS-enabled snapshots, then you must make sure that you have an endpoint to the EC2 service. Without the EC2 endpoint defined, a call to enumerate attached Amazon Elastic Block Storage (EBS) volumes fails, which causes the Systems Manager command to fail.

d. amazonaws.us-west-2.ssmmessages: This endpoint is for connecting to your instances with a secure data channel using Session Manager.

e. amazonaws.us-west-2.s3: Systems Manager uses this endpoint to update SSM agent, perform patch operation, and for uploading logs into Amazon Simple Storage Service (S3) buckets.

Once the SSM agent has been installed and the necessary permission has been provided for the Systems Manager, log in to Systems Manager Console and navigate to Fleet Manager to discover the Outposts instances as shown in the following image.

4. You can use compliance to scan the Outposts instances for patch compliance and configuration inconsistencies.

5. AWS Systems Manager Inventory provides visibility into your Outposts computing environment. You can use this inventory to collect metadata about the instances.

6. With Session Manager, you can log into your Outposts instances. You can use either an interactive one-click browser-based shell, or the AWS Command Line Interface (CLI) for Linux based EC2 instances. For Windows instances, you can connect using Remote Desktop Protocol (RDP). For better SEO, suggest replacing this with “Check out”, attach the link to “how to connect to Windows instances from the Fleet Manager console”, and delete can be found here. here.

Note that accessing the Outposts EC2 instances through SSH or RDP via the Region based Session Manager will have more latency via service link than accessing via the LGW.

7. Patch Manager automated the process of patching the Outposts instances with both security-related and other types of updates. In the following you can see that one of the Outposts instances is scanned and updated with an operational update.

Security at AWS is the highest priority. Security is a shared responsibility between AWS and customers. We offer the security tools and procedures to secure the Outposts instances as in the AWS region. By using AWS services, you can enhance your security posture on Outposts rack in these areas.

Data Protection
IAM
Infrastructure Security
Resiliency
Compliance Validation
Find out more about these individual aspects of security in AWS Outposts.

In the second section, let’s see how we can use Amazon Inspector running in the AWS Region to scan for vulnerabilities within the Outposts environment. Amazon Inspector uses the widely deployed SSM Agent to automatically scan for vulnerabilities on Outposts instances.

Scan Outposts instances for vulnerabilities using Amazon Inspector

Amazon Inspector is an automated vulnerability management service that continually scans AWS workloads for software vulnerabilities and unintended network exposure. Amazon Inspector automatically discovers all of the Outposts EC2 instances (installed with SSM Agent) and container images residing in Amazon Elastic Container Registry (ECR) that are identified for scanning. Then, it immediately starts scanning them for software vulnerabilities and unintended network exposure.

All workloads are continually rescanned when a new Common Vulnerabilities And Exposures (CVE) is published, or when there are changes in the workloads, such as installation of new software in an Outposts EC2 instance.

Amazon Inspector uses the widely deployed SSM Agent (deployed in the previous scenario) to collect the software inventory and configurations from your Outposts EC2 instances. Use the VPC interface endpoint – com.amazonaws.us-west-2.inspector2 – to privately access Amazon Inspector. The collected application inventory and configurations are used to assess workloads for vulnerabilities.

The following Summary Dashboard provides information on how many Outposts EC2 instances and the container repositories are scanned and discovered.

2. The findings by Vulnerability tab help to identify the most vulnerable Outposts EC2 instances in your environment. In the following, you can see Outposts instances with the following vulnerability highlighted.

a. Port range 0 to 65535 is reachable from an Internet Gateway

b. Port 22 is reachable from an Internet Gateway

3. The findings by instance tab shows you all of the active findings for a Single Outposts instance in your environment. In the following, you can see that for this instance there are a total of 12 high and 19 medium findings based on the rules in the Common Vulnerabilities And Exposures (CVE) package.

In the last section, let’s see how we can use GuardDuty to detect any threats within the Outposts environment.

Threat Detection service for your AWS accounts and Outposts workloads using Amazon GuardDuty

GuardDuty is a threat detection service that continuously monitors your AWS accounts and workloads for malicious activities and delivers detailed security findings for visibility and remediation.

GuardDuty continuously monitors and analyses the Outposts instances and reports suspicious activities using the GuardDuty console. It gets this information from CloudTrail Management Events, VPC Flow Logs, and DNS logs.

In this scenario, GuardDuty has detected an SSH brute force attack against an Outposts instance.

Costs associated with the scenario

Systems Manager: With AWS Systems Manager, you pay only for what you use on the priced feature. In this scenario, we have used the following features.
1. Inventory – No additional charges
2. Session Manager – No additional charges
3. Patch Manager – No additional charges

*Note that there will be charges for the VPC endpoint created.

Amazon Inspector: Costs for Amazon Inspector are based on container images scanned to ECR and the EC2 instances being scanned.
1. The average number of EC2 instances scanned per month in US-WEST-2 region is $1.258 per instance. In the above scenario, there are three instances within the Outposts at $1.258 = $3.774
Amazon GuardDuty: VPC Flow logs and CloudWatch logs are used for GuardDuty analysis. In this scenario, Only VPC Flow logs are considered.
1. VPC Flow log is charged per GB/month. In US-WEST-2 region – the First 500 GB/month is $1 per GB. In the above scenario, there are three instances within the Outposts that would generate approximately 80 MB of data, which is still within the 500 GB limit.
Understand more about AWS Outposts rack pricing on our website.

Cleaning up

Please delete example resources if they are no longer needed to avoid incurring future costs.

Amazon Inspector: Disable Amazon Inspector from the Amazon Inspector Console.
Amazon GuardDuty: You can use the GuardDuty console to suspend or disable GuardDuty. You are not charged for using GuardDuty when the service is suspended.
Delete unused IAM policies

Conclusion

On-premises data centers traditionally use a variety of infrastructure, tools, and APIs. This disparate assortment of hardware and software solutions results in complexity. In turn, this leads to greater management costs, inability of staff to translate skills from one setting to another, and limits in innovation and knowledge-sharing between environments.

Using a common set of tools, services in the AWS Regions and on Outposts on premises allows you to have a consistent operation environment, thereby delivering a true hybrid cloud experience. Equally, by using the same tools to deploy and manage workloads in both environments, you can reduce operational overhead.

To get started with Outposts, see AWS Outposts Family. For more information about Outposts availability, see the Outposts rack FAQ.

Automate the Creation of On-Demand Capacity Reservations for running EC2 instances

2022-03-22 sbbusser

Post Syndicated from sbbusser original https://aws.amazon.com/blogs/compute/automate-the-creation-of-on-demand-capacity-reservations-for-running-ec2-instances/

This post is written by Ballu Singh a Principal Solutions Architect at AWS, Neha Joshi a Senior Solutions Architect at AWS, and Naveen Jagathesan a Technical Account Manager at AWS.

Customers have asked how they can “create On-Demand Capacity Reservations (ODCRs) for their existing instances during events, such as the holiday season, Black Friday, marketing campaigns, or others?”

ODCRs let you reserve compute capacity for your your Amazon Elastic Compute Cloud (Amazon EC2) instances. ODCRs further make sure that you always have EC2 capacity access when required, and for as long as you need it. Customers who want to make sure that any instances that are stopped/started during the critical event and are available when needed should be covered by ODCRs.

ODCRs let you reserve compute capacity for your Amazon EC2 instances in a specific availability zone for any duration. This means that you can create and manage capacity reservations independently from the billing discounts offered by Savings Plans or Regional Reserved Instances. You can create ODCR at any time, without entering into a one-year or three-year term commitment, and the capacity is available immediately. Billing starts as soon as the ODCR enters the active state. When you no longer need it, cancel the ODCR to stop incurring charges.

At the time of this blog publication, if you need to create ODCR for existing running instances, you must manually identify your running instances configuration with matching attributes, such as instance type, platform, and Availability Zone. This is a time and resource consuming process.

In this post, we provide an automated way to manage ODCR operations. This includes creating, modifying, and cancelling ODCRs for the running instances across regions in an account, all without requiring any manual intervention of specifying instance configuration attributes. Additionally, it creates an Amazon CloudWatch Alarm for InstanceUtilization and an Amazon Simple Notification Service (Amazon SNS) topic with topic name ODCRAlarmNotificationTopic to notify when the threshold breaches.

Note: This will not create cluster placement group ODCRs. For details on capacity reservations in cluster placement groups, refer here.

Getting started

Before you create Capacity Reservations, note the limitations and restrictions here.

To get started, download the scripts for registering, modifying, and canceling ODCRs and associated requirements.txt, as well as AWS Identity and Access Management (IAM) policy from the GitHub link here.

Pre-requisites

To implement these scripts, you need the following prerequisites:

Access to AWS Management Console, AWS Command Line Interface (CLI),or AWS SDK for ODCR.
The following IAM role permissions for IAM users using the solution as provided in ODCR_IAM.json.
Amazon EC2 instance having supported platform for capacity reservation. Capacity Reservations support the following platforms listed here for Linux and Windows.
Refer to the above GitHub link for the code, and save the requirements.txt file in the same directory with other python scripts. You may want to run the requirements.txt file if you don’t have appropriate dependency to run the rest of the python scripts. You can run this using the following command:

pip3 install -r requirements.txt

Implementation Details

To create ODCR capacity reservation

The following instructions will guide you through creating a capacity reservation of running instances across all of the Regions within an AWS account.
Input variables needed from users:

EndDateType (String) – Indicates how the Capacity Reservation ends. A Capacity Reservation can have one of the following end types:
- - unlimited – The Capacity Reservation remains active until you explicitly cancel it. Don’t provide an EndDate if the EndDateType is unlimited.
  - limited – The Capacity Reservation expires automatically at a specified date and time. You must provide an EndDate value if the EndDateType value is limited.

EndDate (datetime) – The date and time when the Capacity Reservation expires. When a Capacity Reservation expires, the reserved capacity is released and you can no longer launch instances into it. The Capacity Reservation’s state changes to expired when it reaches its end date and time.

You must provide EndDateType as ‘limited’ and the EndDate in standard UTC format to secure instances for a limited period. Command to execute register ODCR script with limited period:

You must provide EndDateType as ‘unlimited’ to secure instances for unlimited period. Command to execute register ODCR script with unlimited period:

registerODCR.py '<EndDateType>' '<EndDate>'
    Example- registerODCR.py 'limited' '2022-01-31 14:30:00'

You must provide EndDateType as ‘unlimited’ to secure instances for unlimited period. Command to execute register ODCR script with unlimited period:

registerODCR.py 'EndDateType'
    Example- registerODCR.py 'unlimited'

This registerODCR.py script does following four things:

1. Describe instances cross-region in an account. It checks for the instance that has:

- No Capacity reservation
- State of the instance is running
- Tenancy is default
- InstanceLifecycle is None indicates whether this is a Spot Instance or a Scheduled Instance

Note: Describe instances API call is counted toward your account API limit. Therefore, it is advisable to run the script during non-peak hours or before the short-term scaling event begins. Work with AWS Support team if you run into API throttling.

2. Aggregates instances with similar attributes, such as InstanceType, AvailabilityZone, Tenancy, and Platform.

3. Describe reserved instances cross-region in an account. It checks for instance(s) that have Zonal Reservation Instances (ZRIs) and compares them with aggregated instances with similar attributes.

4. Finally,

- Reserves ODCR(s) for existing running instances with matching attributes for which ZRIs do not exist.

Note: If you have one or more ZRIs in an account, then the script compares them with the existing instances with matching characteristics – Instance Type, AZ, and Platform – and does NOT create ODCR for the ZRIs to avoid incurring redundant charges. If there are more running instances than ZRIs, then the script creates an ODCR for just the delta.

- Creates an SNS topic with the topic name – ODCRAlarmNotificationTopic in the region where you’re registering ODCR, if it doesn’t already exist.
- Creates CloudWatch alarm for InstanceUtilization using the best practices, which can be found here.

Note: You must subscribe and confirm to the SNS topic, if you haven’t already, to receive notifications.

The CloudWatch alarm is also created on your behalf in the region for each ODCR. This alarm monitors your ODCR metric- InstanceUtilization. Whenever it breaches threshold (50% in this case), it enters the alarm state and sends an SNS notification using the topic that was created for you if you subscribed to it.

Note: You can change the alarm threshold based on your specific needs.

You will receive an email notification when CloudWatch Alarm State changes to Alarm with:
- SNS Subject (Assuming CW alarms triggers in US East region).

ALARM: "ODCRAlarm-cr-009969c7abf4daxxx" in US East (N. Virginia)

- SNS Body will have the details
  - CW alarm, region, link to view the alarm, alarm details, and state change actions.

With this, if your ODCR InstanceUtilization drops, then you will be notified in near-real time to help you optimize the capacity and stop unnecessary payments for unused capacity.

To modify ODCR capacity reservation

To modify the attributes of an active capacity reservation after you have created it, adhere to the following instructions.

Note: When modifying a Capacity Reservation, you can only increase or decrease the quantity and change how it is released. You can’t change the instance type, EBS optimization, instance store settings, platform, Availability Zone, or instance eligibility of a Capacity Reservation. If you must modify any of these attributes, then we recommend that you cancel the reservation, and then create a new one with the required attributes. You can’t modify a Capacity Reservation after it has expired or after you have explicitly canceled it.

Input variables needed from users:
- CapacityReservationID – The ID of the Capacity Reservation that you want to modify.
- InstanceCount (integer) – The number of instances for which to reserve capacity. The number of instances can’t be increased or decreased by more than 1000 in a single request.
- EndDateType (String) – Indicates how the Capacity Reservation ends. A Capacity Reservation can have one of the following end types:
  - unlimited – The Capacity Reservation remains active until you explicitly cancel it. Don’t provide an EndDate if the EndDateType is unlimited.
  - limited – The Capacity Reservation expires automatically at a specified date and time. You must provide an EndDate value if the EndDateType value is limited.
- EndDate (datetime) – The date and time of when the Capacity Reservation expires. When a Capacity Reservation expires, the reserved capacity is released, and you can no longer launch
- instances into it. The Capacity Reservation’s state changes to expired when it reaches its end date and time.
  Example to run the modify ODCR script for ‘limited’ period:
- You must provide EndDateType as ‘unlimited’ to modify instances for an unlimited period. Command to the run modify ODCR script with unlimited period:

Command to execute modify ODCR script:

    modifyODCR.py <CapacityReservationId> <InstanceCount> <EndDateType> <EndDate>

Example to execute the modify ODCR script for limited period:

modifyODCR.py 'cr-05e6a94b99915xxxx' '1' 'limited' '2022-01-31 14:30:00'

Note: EndDate is in the standard UTC time.

You must provide EndDateType as ‘unlimited’ to modify instances for unlimited period. Command to execute modify ODCR script with unlimited period:

modifyODCR.py <CapacityReservationId> <InstanceCount> <EndDateType>

Example to execute the modify ODCR script for unlimited period:

modifyODCR.py 'cr-05e6a94b99915xxxx' '1' 'unlimited'

To cancel ODCR capacity reservation

To cancel the ODCR that are in the “Active” state, follow these instructions:

Note: Once the cancellation request succeeds, the reservation status will be marked as “cancelled”.

Input variables needed from users:
- CapacityReservationID – The ID of the Capacity Reservation to cancel.
You must provide one parameter while executing the cancellation script.
Command to execute cancel ODCR script:

cancelODCR.py <CapacityReservationId>

Example to execute the cancel ODCR script:

Example - cancelODCR.py 'cr-05e6a94b99915xxxx'

Monitoring

CloudWatch metrics let you monitor the unused capacity in your Capacity Reservations to optimize the ODCR. ODCRs send metric data to CloudWatch every five minutes. Although Capacity Reservation usage metrics are UsedInstanceCount, AvailableInstanceCount, TotalInstanceCount, and InstanceUtilization, for this solution we will be using the InstanceUtilization metric. This shows the percentage of reserved capacity instances that are currently in use. This will be useful for monitoring and optimizing ODCR consumption.

For example, if your On-Demand Capacity Reservation is for four instances and with matching criteria only one EC2 instance is currently running, then the InstanceUtilization metric will be 25% for your respective capacity reservation.

Let’s look at the steps to create the CloudWatch monitoring dashboard for your On-Demand Capacity Reservation solution:

Open the CloudWatch console at https://console.aws.amazon.com/cloudwatch/.
If necessary, change the Region. From the navigation bar, select the Region where your Capacity Reservation resides. For more information, see Regions and Endpoints.
In the navigation pane, choose Metrics.

For All metrics, choose EC2 Capacity Reservations.

4. Choose the metric dimension By Capacity Reservation. Metrics will be grouped by

5. Select the dropdown arrow for InstanceUtilization, and select Search for this only.

Once we see the InstanceUtilization metric in the filter list, select Graph Search.

This displays the InstanceUtilization metrics for the selected period.

OPTIONAL: To display the Capacity Reservation IDs for active metrics only:

- Navigate to Graphed metrics.

- Under Details column, select Edit math expression.

- Edit the math expression with the following, and select Apply:

REMOVE_EMPTY(SEARCH('{AWS/EC2CapacityReservations,CapacityReservationId} MetricName="InstanceUtilization"', 'Average', 300))

This displays the Capacity Reservation IDs for active metrics only.

With this configuration, whenever new Capacity Reservations are created, the InstanceUtilization metric for respective Capacity Reservation IDs will be populated.

6. From the Actions drop-down menu, select Add to dashboard.

Select Create new to create a new dashboard for monitoring your ODCR metrics.

Specify the new dashboard name, and select Add to dashboard.

7. These configuration steps will navigate you to your newly created CloudWatch dashboard under Dashboards.

Once this is created, if you create new Capacity Reservations, or new instances get added to existing reservations, then those metrics will be automatically be added to your CloudWatch Dashboard.

Note: You may see a delay of approximately 5-10 minutes from the point when changes are made to your environment (ODCR operations or instances launch/termination activities) to those changes getting reflected on your CloudWatch Dashboard metrics.

Conclusion

In this post, we discussed a solution for automating ODCR operations for existing EC2 instances. This included creating capacity reservation, modifying capacity reservation, and cancelling capacity reservation operations that inherit your existing EC2 instances for attribute details. We also discussed monitoring aspects of ODCR metrics using CloudWatch. This solution allows you to automate some of the ODCR operations for existing instances, thereby optimizing and speeding up the entire process.

For more information, see Target a group of Amazon EC2 On-Demand Capacity Reservations blog and Capacity Reservations documentation.

If you have feedback or questions about this post, please submit your comments in the comments section or contact AWS Support.

Running cross-account workflows with AWS Step Functions and Amazon API Gateway

2022-03-21 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/running-cross-account-workflows-with-aws-step-functions-and-amazon-api-gateway/

This post is written by Hardik Vasa, Senior Solutions Architect, and Pratik Jain, Cloud Infrastructure Architect.

AWS Step Functions allow you to build scalable and distributed applications using state machines. With the launch of Step Functions nested workflows, you can start a Step Functions workflow from another workflow. However, this requires both workflows to be in the same account. There are many use cases that require you to orchestrate workflows across different AWS accounts from one central AWS account.

This blog post covers a solution to invoke Step Functions workflows cross account using Amazon API Gateway. With this, you can perform cross-account orchestration for scheduling, ETL automation, resource deployments, security audits, and log aggregations all from a central account.

Overview

The following architecture shows a Step Functions workflow in account A invoking an API Gateway endpoint in account B, and passing the payload in the API request. The API then invokes another Step Functions workflow in account B asynchronously. The resource policy on the API allows you to restrict access to a specific Step Functions workflow to prevent anonymous access.

You can extend this architecture to run workflows across multiple Regions or accounts. This blog post shows running cross-account workflows with two AWS accounts.

To invoke an API Gateway endpoint, you can use Step Functions AWS SDK service integrations. This approach allows users to build solutions and integrate services within a workflow without writing code.

The example demonstrates how to use the cross-account capability using two AWS example accounts:

Step Functions state machine A: Account ID #111111111111
API Gateway API and Step Functions state machine B: Account ID #222222222222

Setting up

Start by creating state machine A in the account #111111111111. Next, create the state machine in target account #222222222222, followed by the API Gateway REST API integrated to the state machine in the target account.

Account A: #111111111111

In this account, create a state machine, which includes a state that invokes an API hosted in a different account.

Create an IAM role for Step Functions

Sign in to the IAM console in account #111111111111, and then choose Roles from left navigation pane
Choose Create role.
For the Select trusted entity, under AWS service, select Step Functions from the list, and then choose Next.
On the Add permissions page, choose Next.
On the Review page, enter StepFunctionsAPIGatewayRole for Role name, and then choose Create role.
Create inline policies to allow Step Functions to access the API actions of the services you need to control. Navigate to the role that you created and select Add Permissions and then Create inline policy.
Use the Visual editor or the JSON tab to create policies for your role. Enter the following:
```
Service: Execute-API
Action: Invoke
Resource: All Resources
```
Choose Review policy.
Enter APIExecutePolicy for name and choose Create Policy.

Creating a state machine in source account

Navigate to the Step Functions console in account #111111111111 and choose Create state machine
Select Design your workflow visually, and the click Standard and then click Next
On the design page, search for APIGateway:Invoke state, then drag and drop the block on the page:
In the API Gateway Invoke section on the right panel, update the API Parameters with the following JSON policy:
```
 {
  "ApiEndpoint.$": "$.ApiUrl",
  "Method": "POST",
  "Stage": "dev",
  "Path": "/execution",
  "Headers": {},
  "RequestBody": {
   "input.$": "$.body",
   "stateMachineArn.$": "$.stateMachineArn"
  },
  "AuthType": "RESOURCE_POLICY"
}
```
These parameters indicate that the ApiEndpoint, payload (body) and stateMachineArn are dynamically assigned values based on input provided during workflow execution. You can also choose to assign these values statically, based on your use case.
[Optional] You can also configure the API Gateway Invoke state to retry upon task failure by configuring the retries setting.
Choose Next and then choose Next again. On the Specify state machine settings page:
1. Enter a name for your state machine.
2. Select Choose an existing role under Permissions and choose StepFunctionsAPIGatewayRole.
3. Select Log Level ERROR.
Choose Create State Machine.

After creating this state machine, copy the state machine ARN for later use.

Account B: #222222222222

In this account, create an API Gateway REST API that integrates with the target state machine and enables access to this state machine by means of a resource policy.

Creating a state machine in the target account

Navigate to the Step Functions Console in account #222222222222 and choose Create State Machine.
Under Choose authoring method select Design your workflow visually and the type as Standard.
Choose Next.
On the design page, search for Pass state. Drag and drop the state.
Choose Next.
In the Review generated code page, choose Next and:
1. Enter a name for the state machine.
2. Select Create new role under the Permissions section.
3. Select Log Level ERROR.
Choose Create State Machine.

Once the state machine is created, copy the state machine ARN for later use.

Next, set up the API Gateway REST API, which acts as a gateway to accept requests from the state machine in account A. This integrates with the state machine you just created.

Create an IAM Role for API Gateway

Before creating the API Gateway API endpoint, you must give API Gateway permission to call Step Functions API actions:

Sign in to the IAM console in account #222222222222 and choose Roles. Choose Create role.
On the Select trusted entity page, under AWS service, select API Gateway from the list, and then choose Next.
On the Select trusted entity page, choose Next
On the Name, review, and create page, enter APIGatewayToStepFunctions for Role name, and then choose Create role
Choose the name of your role and note the Role ARN:
arn:aws:iam::222222222222:role/APIGatewayToStepFunctions
Select the IAM role (APIGatewayToStepFunctions) you created.
On the Permissions tab, choose Add permission and choose Attach Policies.
Search for AWSStepFunctionsFullAccess, choose the policy, and then click Attach policy.

Creating the API Gateway API endpoint

After creating the IAM role, create a custom API Gateway API:

Open the Amazon API Gateway console in account #222222222222.
Click Create API. Under REST API choose Build.
Enter StartExecutionAPI for the API name, and then choose Create API.
On the Resources page of StartExecutionAPI, choose Actions, Create Resource.
Enter execution for Resource Name, and then choose Create Resource.
On the /execution Methods page, choose Actions, Create Method.
From the list, choose POST, and then select the check mark.

Configure the integration for your API method

On the /execution – POST – Setup page, for Integration Type, choose AWS Service. For AWS Region, choose a Region from the list. For Regions that currently support Step Functions, see Supported Regions.
For AWS Service, choose Step Functions from the list.
For HTTP Method, choose POST from the list. All Step Functions API actions use the HTTP POST method.
For Action Type, choose Use action name.
For Action, enter StartExecution.
For Execution Role, enter the role ARN of the IAM role that you created earlier, as shown in the following example. The Integration Request configuration can be seen in the image below.
arn:aws:iam::222222222222:role/APIGatewayToStepFunctions
Choose Save. The visual mapping between API Gateway and Step Functions is displayed on the /execution – POST – Method Execution page.

After you configure your API, you can configure the resource policy to allow the invoke action from the cross-account Step Functions State Machine. For the resource policy to function in cross-account scenarios, you must also enable AWS IAM authorization on the API method.

Configure IAM authorization for your method

On the /execution – POST method, navigate to the Method Request, and under the Authorization option, select AWS_IAM and save.
In the left navigation pane, choose Resource Policy.

Use this policy template to define and enter the resource policy for your API.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "states.amazonaws.com"
            },
            "Action": "execute-api:Invoke",
            "Resource": "execute-api:/*/*/*",
            "Condition": {
                "StringEquals": {
                    "aws:SourceArn": [
                        "<SourceAccountStateMachineARN>"
                     ]
                }
            }
        }
    ]
}

Note: You must replace <SourceAccountStateMachineARN> with the state machine ARN from account #111111111111 (account A).

Choose Save.

Once the resource policy is configured, deploy the API to a stage.

Deploy the API

In the left navigation pane, click Resources and choose Actions.
From the Actions drop-down menu, choose Deploy API.
In the Deploy API dialog box, choose [New Stage], enter dev in Stage name.
Choose Deploy to deploy the API.

After deployment, capture the API ID, API Region, and the stage name. These are used as inputs during the execution phase.

Starting the workflow

To run the Step Functions workflow in account A, provide the following input:

{
   "ApiUrl": "<api_id>.execute-api.<region>.amazonaws.com",
   "stateMachineArn": "<stateMachineArn>",
   "body": "{\"someKey\":\"someValue\"}"
}

Paste in the values of APIUrl and stateMachineArn from account B in the preceding input. Make sure the ApiUrl is in the format as shown.

AWS Serverless Application Model deployment

You can deploy the preceding solution architecture with the AWS Serverless Application Model (AWS SAM), which is an open-source framework for building serverless applications. During deployment, AWS SAM transforms and expands the syntax into AWS CloudFormation syntax, enabling you to build serverless applications faster.

Logging and monitoring

Logging and monitoring are vital for observability, measuring performance and audit purposes. Step Functions allows logging using CloudWatch Logs. Step Functions also automatically sends execution metrics to CloudWatch. You can learn more on monitoring Step Functions using CloudWatch.

Cleaning up

To avoid incurring any charges, delete all the resources that you have created in both the accounts. This would include deleting the Step Functions state machines and API Gateway API.

Conclusion

This blog post provides a step-by-step guide on securely invoking a cross-account Step Functions workflow from a central account using API Gateway as front end. This pattern can be extended to scale workflow executions across different Regions and accounts.

By using a centralized account to orchestrate workflows across AWS accounts, this can help prevent duplicating work in each account.

To learn more about serverless and AWS Step Functions, visit the Step Functions Developer Guide.

For more serverless learning resources, visit Serverless Land.

Incident notification mechanism using Amazon Pinpoint two-way SMS

2022-03-21 Pavlos Ioannou Katidis

Post Syndicated from Pavlos Ioannou Katidis original https://aws.amazon.com/blogs/messaging-and-targeting/incident-notification-mechanism-using-amazon-pinpoint-two-way-sms/

Unexpected situations that require immediate attention can occur in any industry. Part of resolving these incidents is the notifications’ delivery. For example, utility companies that have installed gas sensors need to notify immediately the available engineer if a leak occurs.

The goal of an incident management process is to restore a normal service operation as quickly as possible and to minimize the impact on business operations, thus ensuring that the best possible levels of service quality and availability are maintained. A key element of incident management is sending timely notifications to the assigned or available resource(s) who can rectify the issue.

An incident can take place at any time and the resource(s) assigned to it might not have internet access and even if they receive the message they might not be equipped to work on it. This creates five key requirements for an incident notifications mechanism:

Notify the resources via a communication channel that ensures message delivery even without internet access
Enable assigned resources to respond to a request via a communication channel that doesn’t require internet access
Send reminder(s) in case there is no response from the assigned resource(s)
Escalate to another resource in case the first one doesn’t reply or declines the incident
Store the incident details & status for reporting and data analysis

In this blog post, I share a solution on how you can automate the delivery of incident notifications. This solution utilizes Amazon Pinpoint SMS channel to contact the designated resources who might not have access to the internet. Furthermore, the recipient of the SMS is able to reply with an acknowledgement. AWS Step Functions orchestrates the user journey using AWS Lambda functions to evaluate the recipients’ response and trigger the next best action. You will use AWS CloudFormation to deploy this solution.

Use Cases

An incident notification mechanism can vary depending the organization’s requirements and 3rd party system integrations. In this blog the solution covers all five points listed above but it might require further modifications depending your use case.

With minor modifications this solution can also be used in the following use cases:

Medicine intake notification: It will notify the patient via SMS that it is their time to take their medicine. If the patient doesn’t acknowledge the SMS by replying then this can be escalated to their assigned doctor
Assignment submission: It will notify the student that their assignment is due. If the student doesn’t acknowledge the SMS by replying then this can be escalated to their teacher

High-level Architecture

The solution requires the country of your SMS recipients to support two-way SMS. To check which countries, support two-way SMS visit this page. If two-way SMS is supported then you will need to request a dedicated originating identity. You can also use Toll Free Number or 10DLC if your recipients are in the US.

Note: Sender ID doesn’t support two-way SMS.

A new incident is represented as an item in an Amazon DynamoDB table containing information such as description, URL, incident_id as well as the contact numbers for two resources. A resource is someone who has been assigned to work on this incident. The second resource is for escalation purposes in case the first one doesn’t acknowledge or decline the incident notification.

The Amazon DynamoDB table covers three functions for this solution:

A way to add new incidents using either the AWS console or programmatically
As a storage for variables that indicate the incident’s status and can be used from the solution to determine the next action(s)
As a historical data storage for all incidents that have been created for data analysis purposes

The solution utilizes Amazon DynamoDB Streams to invoke an AWS Lambda function every time a new incident is created. The AWS Lambda function triggers an AWS Step Function State machine, which orchestrates three AWS Lambda functions:

Send_First_SMS: Sends the first SMS
Reminder_SMS: Sends a reminder SMS if the resource does not acknowledge the first SMS
Incident_State_Review: Assesses the status of the incident and either goes back to the first AWS Lambda function or finishes the AWS Step Function State machine execution

The AWS Step Functions State machine uses the Choice state, which evaluates the response of the previous AWS Lambda function and decides on the next state. This is a very useful feature that can reduce custom code and potentially AWS Lambda invocations resulting to cost savings.

Additionally, the waiting between steps is also managed from AWS Step Functions State machine using the Wait state. This can be configured to wait seconds, days or till a specific point in the future.

To be able to receive SMS, this solution uses Amazon Pinpoint’s two-way SMS feature. When receiving an SMS Amazon Pinpoint sends a payload to an Amazon SNS topic, which needs to be created separately. An AWS Lambda function that is subscribed to the Amazon SNS topic processes the SMS content and performs one or both of the following actions:

Update the incident status in the DynamoDB table
Create a new Step Function State machine execution

In this solution SMS recipients can reply by typing either yes or no. The SMS response is not case sensitive.

An inbound SMS payload contains the originationNumber, destinationNumber, messageKeyword, messageBody, inboundMessageId and previousPublishedMessageId. Noticeably there isn’t a direct way to associate an inbound SMS with an incident. To overcome this challenge this solution uses a second DynamoDB table, which stores the message_id and incident_id every time an SMS is send to any of the two resources. This allows the solution to use the previousPublishedMessageId from the inbound SMS payload to fetch the respective incident_id from the second DynamoDB table.

The code in this solution uses AWS SDK for Python (Boto3).

Prerequisites

An Amazon Pinpoint project with the SMS channel enabled – Guide on how to enable Amazon Pinpoint SMS channel
Check if the country you want to send SMS to, supports two-way SMS – List with countries that support two-way SMS
An originating identity that supports two-way SMS – Guide on how to request a phone number
Increase your monthly SMS spending quota for Amazon Pinpoint – Guide on how to increase the monthly SMS spending quota

Deploy the solution

Step 1: Create an S3 bucket

Navigate to the Amazon S3 console
Select Create bucket
Enter a unique name for Bucket name
Select the AWS Region to be the same as the one of your Amazon Pinpoint project
Scroll to the bottom of the page and select Create bucket
Follow this link to download the GitHub repository. Once the repository is downloaded, unzip it and navigate to \amazon-pinpoint-incident-notifications-mechanism-main\src
Access the S3 bucket created above and upload the five .zip files

Step 2: Create a stack

The application is deployed using an AWS CloudFormation template.
Navigate to the AWS CloudFormation console select Create stack > With new resources (standard)
Select Template is ready as Prerequisite – Prepare template and choose Upload a template file as Template source
Select Choose file and from the GitHub repository downloaded in step 1.6 navigate to amazon-pinpoint-incident-notifications-mechanism-main\cfn upload CloudFormation_template.yaml and select Next
Type Pinpoint-Incident-Notifications-Mechanism as Stack name, paste the S3 bucket name created in step 1.5 as the LambdaCodeS3BucketName, type the Amazon Pinpoint Originating Number in E.164 format as OriginatingIdenity, paste the Amazon Pinpoint project ID as PinpointProjectId and type 40 for WaitingBetweenSteps
Select Next, till you reach to Step 4 Review where you will need to check the box I acknowledge that AWS CloudFormation might create IAM resources and then select Create Stack
The stack creation process takes approximately 2 minutes. Click on the refresh button to get the latest event regarding the deployment status. Once the stack has been deployed successfully you should see the last Event with Logical ID Pinpoint-Incident-Notifications-Mechanism and with Status CREATE_COMPLETE

Step 3: Configure two-way SMS SNS topic

Navigate to the Amazon Pinpoint console > SMS and voice > Phone numbers. Select the originating identity that supports two-way SMS. Scroll to the bottom of the page and click to expand the and check the box to enable it.

For SNS topic select Choose an existing SNS topic then using the drop down choose the one that contains the name of the AWS CloudFormation stack from Step 2.4 as well as the name TwoWaySMSSNSTopic and click Save.

Step 4: Create a new incident

To create a new incident, navigate to Amazon DynamoDB console > Tables and select the table containing the name of the AWS CloudFormation stack from Step 2.4 as well as the name IncidentInfoDynamoDB. Select View items and then Create item.

On the Create item page choose JSON, copy and paste the JSON below into the text box and replace the values for the first_contact and second_contact with a valid mobile number that you have access to.

Note: If you don’t have two different mobile numbers, enter the same for both first_contact and second_contact fields. The mobile numbers must follow E.164 format +<country code><number>.

{
   "incident_id":{
      "S":"123"
   },
   "incident_stat":{
      "S":"not_acknowledged"
   },
   "double_escalation":{
      "S":"no"
   },
   "description":{
      "S":"Error 111, unit 1 malfunctioned. Urgent assistance is required."
   },
   "url":{
      "S":"https://example.com/incident/111/overview"
   },
   "first_contact":{
      "S":"+4479083---"
   },
   "second_contact":{
      "S":"+4479083---"
   }
}

Incident fields description:

incident_id: Needs to be unique
incident_stat: This is used from the application to store the incident status. When creating the incident, this value should always be not_acknowledged
double_escalation: This is used from the application as a flag for recipients who try to escalate an incident that is already escalated. When creating the incident, this value should always be no
description: You can type a description that best describes the incident. Be aware that depending the number of characters the SMS parts will increase. For more information on SMS character limits visit this page
url: You can add a URL that resources can access to resolve the issue. If this field is not pertinent to your use case then type no url
first_contact: This should contain the mobile number in E.164 format for the first resource
second_contact: This should contain the mobile number in E.164 format for the second resource. The second resource will be contacted only if the first one does not acknowledge the SMS or declines the incident

Once the above is ready you can select Create item. This will execute the AWS Step Functions State machine and you should receive an SMS. You can reply with yes to acknowledge the incident or with no to decline it. Depending your response, the incident status in the DynamoDB table will be updated and if you reply no then the incident will be escalated sending a SMS to the second_contact.

Note: The SMS response is not case sensitive.

Clean-up

To remove the solution:

Delete the AWS CloudFormation stack by following the steps listed in this guide
Delete the dedicated originating identity that you used to send the SMS by following the steps listed in this guide
Delete the Amazon Pinpoint project by navigating the Amazon Pinpoint console, select your Amazon Pinpoint Project, choose Settings > General settings > Delete Project

Next Steps

This solution currently works only if your SMS recipients are in one country. If your use case requires to send SMS to multiple countries you will need to:

Check this page to ensure that these countries support two-way SMS
Follow the instructions in this page to obtain a number that supports two-way SMS for each country
Expand the solution to identify the country of the SMS recipient and to choose the correct number accordingly. To identify the country of the SMS recipient you can use Amazon Pinpoint’s phone number validate service via Amazon Pinpoint API or SDKs. The phone validate service returns a list of data points per mobile number with one of them being the Country

Incidents that are not being acknowledged by any of the assigned resources, have their status updated to unacknowledged but they don’t escalate further. Depending your requirements, you can expand the solution to send an email using Amazon Pinpoint APIs or perform an outbound call using Amazon Connect APIs.

Conclusion

In this blog post, I have demonstrated how your organization can use Amazon Pinpoint two-way SMS and Step Functions to automate incident notifications. Furthermore, the solution highlights the synergy of AWS services and how you can build a custom solution with little effort that meets your requirements.

About the Author

Pavlos Ioannou Katidis

Pavlos Ioannou Katidis is an Amazon Pinpoint and Amazon Simple Email Service Specialist Solutions Architect at AWS. He loves to dive deep into his customer’s technical issues and help them design communication solutions. In his spare time, he enjoys playing tennis, watching crime TV series, playing FPS PC games, and coding personal projects.

Implementing mutual TLS for Java-based AWS Lambda functions

2022-03-16 Julian Wood

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/implementing-mutual-tls-for-java-based-aws-lambda-functions-2/

This post is written by Dhiraj Mahapatro, Senior Specialist SA, Serverless and Christian Mueller, Principal Solutions Architect

Modern secure applications establish network connections to other services through HTTPS. This ensures that the application connects to the right party and encrypts the data before sending it over the network.

You might not want unauthenticated users to connect to your service as a service provider. One solution to this requirement is to use mutual TLS (Transport Layer Security). Mutual TLS (or mTLS) is a common security mechanism that uses client certificates to add an authentication layer. This allows the service provider to verify the client’s identity cryptographically.

The purpose of mutual TLS in serverless

mTLS refers to two parties authenticating each other at the same time when establishing a connection. By default, the TLS protocol only proves the identity of the server to a client using X.509 certificates. With mTLS, a client must prove its identity to the server to communicate. This helps support a zero-trust policy to protect against adversaries like man-in-the-middle attacks.

mTLS is often used in business-to-business (B2B) applications and microservices, where interservice communication needs mutual authentication of parties. In Java, you see the following error when the server expects a certificate, but the client does not provide one:

PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target

This blog post explains multiple ways to implement a Java-based AWS Lambda function that uses mTLS to authenticate with a third-party internal or external service. The sample application and this post explain the advantages and tradeoffs of each approach.

The KeyStore and TrustStore in Java

The TrustStore is used to store certificate public keys from a certificate authority (CA) or trusted servers. A client can verify the public certificate presented by the server in a TLS connection. A KeyStore stores private key and identity certificates that a specific application uses to prove the client’s identity.

The stores contain opposite certificates. The TrustStore holds the identification certificates that identify others, while the KeyStore holds the identification certificates that identify itself.

Overview

To start, you create certificates. For brevity, this sample application uses a script that uses OpenSSL and Java’s keytool for self-signed certificates from a CA. You store the generated keys in Java KeyStore and TrustStore. However, the best practice for creating and maintaining certificates and private CA is to use AWS Certificate Manager and AWS Certificate Manager Private Certificate Authority.

You can find the details of the script in the README file.

The following diagram shows the use of KeyStore and TrustStore in the client Lambda function, and the server running on Fargate.

KeyStore and TrustStore

The demo application contains several Lambda functions. The Lambda functions act as clients to services provided by Fargate behind an Amazon Network Load Balancer (NLB) running in a private Amazon VPC. Amazon Route 53 private hosted zones are used to resolve selected hostnames. You attach the Lambda functions to this VPC to resolve the hostnames for the NLB. To learn more, read how AWS Lambda uses Hyperplane elastic network interfaces to work with custom VPC.

The following examples refer to portions of InfrastructureStack.java and the implementation in the corresponding Lambda functions.

Providing a client certificate in a Lambda function artifact

The first option is to provide the KeyStore and TrustStore in a Lambda functions’ .zip artifact. You provide specific Java environment variables within the Lambda configuration to instruct the JVM to load and trust your provided Keystore and TrustStore. The JVM uses these settings instead of the Java Runtime Environment’s (JRE) default settings (use a stronger password for your use case):

"-Djavax.net.ssl.keyStore=./client_keystore_1.jks -Djavax.net.ssl.keyStorePassword=secret -Djavax.net.ssl.trustStore=./client_truststore.jks -Djavax.net.ssl.trustStorePassword=secret"

The JRE uses this KeyStore and TrustStore to build a default SSLContext. The HttpClient uses this default SSLContext to create a TLS connection to the backend service running on Fargate.

The following architecture diagram shows the sample implementation. It consists of an Amazon API Gateway endpoint with a Lambda proxy integration that calls a backend Fargate service running behind an NLB.

Providing a client certificate in a Lambda function artifact

This is a basic approach for a prototype. However, it has a few shortcomings related to security and separation of duties. The KeyStore contains the private key, and the password is exposed to the source code management (SCM) system, which is a security concern. Also, it is the Lambda function owner’s responsibility to update the certificate before its expiration. You can address these concerns about separation of duties with the following approach.

Providing the client certificate in a Lambda layer

In this approach, you separate the responsibility between two entities. The Lambda function owner and the KeyStore and TrustStore owner.

The KeyStore and TrustStore owner provides the certificates securely to the function developer who may be working in a separate AWS environment. For simplicity, the demo application uses the same AWS account.

The KeyStore and TrustStore owner achieves this by using AWS Lambda layers. The KeyStore and TrustStore owner packages and uploads the certificates as a Lambda layer and only allows access to authorized functions. The Lambda function owner does not access the KeyStore or manage its lifecycle. The KeyStore and TrustStore owner’s responsibility is to release a new version of this layer when necessary and inform users.

Providing the client certificate in a Lambda layer

The KeyStore and TrustStore are extracted under the path /opt as part of including a Lambda layer. The Lambda function can now use the layer as:

Function lambdaLayerFunction = new Function(this, "LambdaLayerFunction", FunctionProps.builder()
  .functionName("lambda-layer")
  .handler("com.amazon.aws.example.AppClient::handleRequest")
  .runtime(Runtime.JAVA_11)
  .architecture(ARM_64)
  .layers(singletonList(lambdaLayerForService1cert))
  .vpc(vpc)
  .code(Code.fromAsset("../software/2-lambda-using-separate-layer/target/lambda-using-separate-layer.jar"))
  .memorySize(1024)
  .environment(Map.of(
    "BACKEND_SERVICE_1_HOST_NAME", BACKEND_SERVICE_1_HOST_NAME,
    "JAVA_TOOL_OPTIONS", "-Djavax.net.ssl.keyStore=/opt/client_keystore_1.jks -Djavax.net.ssl.keyStorePassword=secret -Djavax.net.ssl.trustStore=/opt/client_truststore.jks -Djavax.net.ssl.trustStorePassword=secret"
  ))
  .timeout(Duration.seconds(10))
  .logRetention(RetentionDays.ONE_WEEK)
  .build());

The KeyStore and TrustStore passwords are still supplied as environment variables and stored in the SCM system, which is against best practices. You can address this with the next approach.

Storing passwords securely in AWS Systems Manager Parameter Store

AWS Systems Manager Parameter Store provides secure, hierarchical storage for configuration data and secret management. You can use Parameter Store to store the KeyStore and TrustStore passwords instead of environment variables. The Lambda function uses an IAM policy to access Parameter Store and gets the passwords as a secure string during the Lambda initialization phase.

With this approach, you build a custom SSLContext after retrieving the KeyStore and TrustStore passwords from the Parameter Store. Once you create SSLContext, provide that to the HttpClient you use to connect with the backend service:

HttpClient client = HttpClient.newBuilder()
  .version(HttpClient.Version.HTTP_2)
  .connectTimeout(Duration.ofSeconds(5))
  .sslContext(sslContext)
  .build();

You can also use a VPC interface endpoint for AWS Systems Manager to keep the traffic from your Lambda function to Parameter Store internal to AWS. The following diagram shows the interaction between AWS Lambda and Parameter Store.

Storing passwords securely in AWS Systems Manager Parameter Store

This approach works for Lambda functions interacting with a single backend service requiring mTLS. However, it is common in a modern microservices architecture to integrate with multiple backend services. Sometimes, these services require a client to assume different identities by using different KeyStores. The next approach explains how to handle the multiple services scenario.

Providing multiple client certificates in Lambda layers

You can provide multiple KeyStore and TrustStore pairs within multiple Lambda layers. All layers attached to a function are merged when provisioning the function. Ensure your KeyStore and TrustStore names are unique. A Lambda function can use up to five Lambda layers.

Similar to the previous approach, you load multiple KeyStores and TrustStores to construct multiple SSLContext objects. You abstract the common logic to create an SSLContext object in another Lambda layer. Now, the Lambda function calling two different backend services uses 3 Lambda layers:

Lambda layer for backend service 1 (under /opt)
Lambda layer for backend service 2 (under /opt)
Lambda layer for the SSL utility that takes the KeyStore, TrustStore, and their passwords to return an SSLContext object

SSL utility Lambda layer provides the getSSLContext default method in a Java interface. The Lambda function implements this interface. Now, you create a dedicated HTTP client per service.

The following diagram shows your final architecture:

Providing multiple client certificates in Lambda layers

Prerequisites

To run the sample application, you need:

To build and provision the stack:

Clone the git repository.

git clone https://github.com/aws-samples/serverless-mutual-tls.git
cd serverless-mutual-tls

Create the two root CA’s, client, and server certificates.

./scripts/1-create-certificates.sh

Build and package all examples.

./scripts/2-build_and_package-functions.sh

Provision the AWS infrastructure (make sure that Docker is running).

./scripts/3-provision-infrastructure.sh

Verification

Verify that the API endpoints are working and using mTLS by running these commands from the base directory:

export API_ENDPOINT=$(cat infrastructure/target/outputs.json | jq -r '.LambdaMutualTLS.apiendpoint')

To see the error when mTLS is not used in the Lambda function, run:

curl -i $API_ENDPOINT/lambda-no-mtls

The preceding curl command responds with an HTTP status code 500 and plain body as:

PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target

For successful usage of mTLS as shown in the previous use cases, run:

curl -i $API_ENDPOINT/lambda-only
curl -i $API_ENDPOINT/lambda-layer
curl -i $API_ENDPOINT/lambda-parameter-store
curl -i $API_ENDPOINT/lambda-multiple-certificates

The last curl command responds with an HTTP status code 200 and body as:

[
 {"hello": "from backend service 1"}, 
 {"hello": "from backend service 2"}
]

Additional security

You can add additional controls via Java environment variables. Compliance standards like PCI DSS in financial services require customers to exercise more control over the underlying negotiated protocol and ciphers.

Some of the useful Java environment variables to troubleshoot SSL/TLS connectivity issues in a Lambda function are:

-Djavax.net.debug=all
-Djavax.net.debug=ssl,handshake
-Djavax.net.debug=ssl:handshake:verbose:keymanager:trustmanager
-Djavax.net.debug=ssl:record:plaintext

You can enforce a specific minimum version of TLS (for example, v1.3) to meet regulatory requirements:

-Dhttps.protocols=TLSv1.3

Alternatively, programmatically construct your SSLContext inside the Lambda function:

SSLContext sslContext = SSLContext.getInstance("TLSv1.3");

You can also use the following Java environment variable to limit the use of weak cipher suites or unapproved algorithms, and explicitly provide the supported cipher suites:

-Dhttps.cipherSuites=TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA384,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384,TLS_DHE_RSA_WITH_AES_128_GCM_SHA256,TLS_DHE_RSA_WITH_AES_256_GCM_SHA384,TLS_DHE_RSA_WITH_AES_128_CBC_SHA,TLS_DHE_RSA_WITH_AES_256_CBC_SHA,TLS_DHE_RSA_WITH_AES_128_CBC_SHA256,TLS_DHE_RSA_WITH_AES_256_CBC_SHA256

You achieve the same programmatically with the following code snippet:

httpClient = HttpClient.newBuilder()
  .version(HttpClient.Version.HTTP_2)
  .connectTimeout(Duration.ofSeconds(5))
  .sslContext(sslContext)
  .sslParameters(new SSLParameters(new String[]{
    "TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256",
    "TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384",
    "TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA",
    ………
  }))
  .build();

Cleaning up

The stack creates a custom VPC and other related resources. Clean up after usage to avoid the ongoing cost of running these services. To clean up the infrastructure and the self-generated certificates, run:

./scripts/4-delete-certificates.sh
./scripts/5-deprovision-infrastructure.sh

Conclusion

mTLS in Java using KeyStore and TrustStore is a well-established approach for using client certificates to add an authentication layer. This blog highlights the four approaches that you can take to implement mTLS using Java-based Lambda functions.

Each approach addresses the separation of concerns required while implementing mTLS with additional security features. Use an approach that suits your needs, organizational security best practices, and enterprise requirements. Refer to the demo application for additional details.

For more serverless learning resources, visit Serverless Land.

Extend SQL Server DR using log shipping for SQL Server FCI with Amazon FSx for Windows configuration

2022-03-14 Yogi Barot

Post Syndicated from Yogi Barot original https://aws.amazon.com/blogs/architecture/extend-sql-server-dr-using-log-shipping-for-sql-server-fci-with-amazon-fsx-for-windows-configuration/

This week for Women’s History Month, we’re continuing to feature female authors. We’re showcasing women in the tech industry who are building, creating, and, above all, inspiring, empowering, and encouraging everyone—especially women and girls—in tech.

Companies choosing to rehost their on-premises SQL Server workloads to AWS can face challenges with setting up their disaster recovery (DR) strategy. Solutions such as Always On can be a more expensive, complex configuration across Regions. It can cause latency issues when synchronously replicating data cross-Region. Snapshots have additional overhead and may breach their stringent recovery point objective/recovery time objective (RPO/RTO) requirements.

A log shipping solution can take advantage of cross-Region replication of data using Amazon FSx for Windows File Server. It has less maintenance overhead, doesn’t introduce latency, and meets RPO/RTO requirements. A multi-Region architecture for Microsoft SQL Server is often adopted for SQL Server deployments for business continuity (disaster recovery) and improved latency (for a geographically distributed customer base).

This blog post explores SQL Server DR architecture using SQL Server failover cluster with Amazon FSx for Windows File Server for the primary site and secondary DR site. We describe how to set up a multi-Region DR using log shipping. We’ll explain the architecture patterns so you can follow along and effectively design a highly available SQL Server deployment that spans two or more AWS Regions.

Here are some advantages of using log shipping versus Always On distributed availability group DR setup.

Log shipping works with SQL Server Standard edition
It lowers total cost of ownership (TCO) as you only need one SQL Server Standard edition license at the primary/DR site
It’s straightforward to configure
There’s no need for clustering setup at the OS level
It supports all SQL Server versions. You don’t need the SQL Server version to be the same for source and destination instances.

Log shipping DR solution for SQL Server FCI with Amazon FSx

The architecture diagram in Figure 1 depicts SQL Server failover cluster instance (FCI) using Amazon FSx as storage (multiple Availability Zones) in Region 1. It uses a standalone or a similar setup on Region 2. It uses a log shipping feature for replication and DR. This will also serve as the reference architecture for our solution.

Figure 1. Log shipping DR solution for SQL Server FCI with Amazon FSx

Figure 1 shows an SQL cluster in Region 1 and standalone SQL cluster in Region 2. The primary cluster in Region 1 is initially configured with SQL Server failover cluster instance (FCI) using Amazon FSx for its shared storage. Region 2 can have a standalone Amazon EC2 server with SQL Server and Amazon Elastic Block Store (EBS) as storage. Or it can have an identical configuration to Region 1, but with different hostnames, and an SQL network name (SQLFCI02) to avoid possible collisions.

You can build the VPC peering or AWS Transit Gateway to have seamless connectivity between the two Regions for the opened ports (SQL Server, SMB for file share, and others.)

With Amazon FSx, you get a fully managed and shared file storage solution, that automatically replicates the underlying storage synchronously across multiple Availability Zones. Amazon FSx provides high availability with automatic failure detection and automatic failover if there are any hardware or storage issues. The service fully supports continuously available shares, a feature that permits SQL Server uninterrupted access to shared file data.

There is an asynchronous replication setup from Region 1 to Region 2 using the log shipping feature. In this type of configuration, Microsoft SQL Server log shipping replicates databases using transaction logs. This ensures that a physically replicated warm standby database is an exact binary replica of the primary database. This is referred to as physical replication.

Log shipping can be configured with two available modes. These are related to the state of the secondary log-shipped SQL Server database.

Standby mode. The database is available for querying. Users cannot access the database while restore is going on. But once restore is completed, users can access it in read-only mode.
Restore mode. The database is not accessible for users.

In this solution, you configure a warm standby SQL Server database on an EC2 instance designated in SQL FCI using Amazon FSx as shared storage. You can send transaction log backups asynchronously between your primary Region database and the warm standby server in the other Region. The transaction log backups are then applied to the warm standby database sequentially. When all the logs have been applied, you can perform a manual failover and point the application to the secondary Region. We recommend running the primary and secondary database instances in separate Availability Zones, and configuring a monitor instance to track all the details of log shipping.

Prerequisites

Configure two SQL Server clusters with FCI in Region 1 and Region 2, or have SQL Server cluster on Region 1 and Amazon EC2 with SQL Server with EBS on Region 2. Learn more about how to set up SQL Server FCI with Amazon FSx.
Configure VPC peering or Transit Gateway for the VPCs (where the SQL Server clusters reside).
- VPC peering – Learn more about setting up Inter-Region VPC Peering.
- Transit gateway – Similar to VPC peering. Learn how to Use an AWS Transit Gateway to Simplify Your Network Architecture and Scaling VPN throughput using AWS Transit Gateway.
Configure networking and security to work across the peering connection or Transit Gateway.
Verify that Amazon FSx and SQL connectivity is seamless across both Regions. For example, we should be able to connect Amazon FSx and SQL Server remotely from one Region to the other. Confirm that security group rules are in place. Learn more about FSx for Windows File Server.
SQL Server shouldn’t be running in Express edition as log shipping supports all editions except Express edition.
Give shared folders on primary and secondary Regions appropriate permissions so the network path is accessible across Regions.
The databases for log shipping must be in FULL recovery mode. Learn more about log shipping.

Walkthrough steps to set up DR for SQL Server FCI

Following are the steps required to configure SQL Server DR using SQL Server failover cluster. Amazon FSx for Windows File Server is used for the primary site and secondary DR site. We also demonstrate how to set up a multi-Region log shipping.

Assumed variables

Region_01:
WSFC Cluster Name: SQLCluster1
FCI Virtual Network Name: SQLFCI01
Region_02:
Amazon EC2 Name: EC2SQL2

Make sure to configure network connectivity between your clusters. In this solution, we are using two VPCs in two separate Regions.

- VPC peering is configured to enable network traffic on both VPCs.
- The domain controller (AWS Managed Microsoft AD) on both VPCs are configured with conditional forwarding. This enables DNS resolution between the two VPCs.

Configure SQL FCI setup using Amazon FSx as shared storage on Region_01.
Configure SQL standalone instance on Region_02 with EBS volume as storage.
Create an Amazon FSx in the primary Region with AWS managed Active Directory, or on-premises Active Directory connected with trust relation or AD Connector.
Create a SQL Server service account with proper permissions to be able to set up transaction log settings.
Configure VPC peering between the primary and DR/secondary Region.
Join the domain to the Active Directory network for both primary and secondary servers in primary Region.
Mount Amazon FSx on primary and secondary server and allow shared permissions, so SQL Server is able to access the folder. Use Amazon FSx for storing transaction log backups and EBS for storing transaction logs on the secondary Region.
Set up log shipping from the primary server SQL Server FCI01 to the secondary SQL Server EC2SQL2 with the standby option enabled. This way the databases can be in read on the secondary SQL Server.
In case of disaster, follow the FAILOVER and FAILBACK steps in the next sections. Learn more by reading Change Roles Between Primary and Secondary Log Shipping Servers.

Failover steps

In case of disaster at primary Region node SQLFCI01, log shipping acts as DR solution. Following, we show the steps to bring the databases online on EC2SQL02. Once SQLFCI01 is back, Use the following steps if DR drill checks to failover. In a real disaster, follow the process from Step 3 onwards.

1. Stop all activities on SQLFCI01 databases involved in log shipping jobs on SQLFCI01 and EC2SQL02. Confirm if any process is running by using the following query:

Use master
Go
select * from sysprocesses where dbid = DB_ID('DatabaseName')

2. Take full backup on SQLFCI01 as rollback option.

BACKUP DATABASE [DatabaseName]
TO DISK = N'Provide Drive details'
WITH COMPRESSION
GO

3. Take last tail transaction log backup if we have access to SQL Server. Otherwise, check the last available transaction log stored in EC2SQL02 and restore it with RECOVERY to bring the databases online on EC2SQL02.

RESTORE LOG [DatabaseName] FROM DISK = N'Provide path of last tlog'
WITH FILE = 1, RECOVERY, NOUNLOAD, STATS = 10
GO

4. Redirect the application connections to EC2SQL02.

Failback methods

1. Native backup/restore or rollback strategy

Take full backup from EC2SQL02 and copy to the SQLFCI01.
RESTORE the full backup on SQLFCI01.
Reconfigure log shipping between SQLFCI01 and EC2SQL02.

2. Reverse log shipping

In case of DR drills or business continuity and disaster recovery (BCDR) activities, we can set up reverse log shipping to reduce the time taken to failover. It doesn’t require reinitializing the database with a full backup if performed carefully. It is crucial to preserve the log sequence number (LSN) chain. Perform the final log backup using the NORECOVERY option. Backing up the log with this option puts the database in a state where log backups can be restored. It ensures that the database’s LSN chain doesn’t deviate. This procedure helps reduce downtime to bring back SQLFCI01.

STOP all activities on SQLFCI01 databases involved in log shipping jobs on SQLFCI01 and EC2SQL02.
TAKE Tlog backup of SQLFCI01 with NORECOVERY option.

BACKUP LOG [DatabaseName]
TO DISK = 'BackupFilePathname'
WITH NORECOVERY;

RESTORE transaction log backup on EC2SQL02 with NORECOVERY.
Reconfigure log shipping and reenable the jobs back.
Reconfigure the application connections to SQLFCI01.

Conclusion

A multi-Region strategy for your mission-critical SQL Server deployments is key for business continuity and disaster recovery. This blog post shows how to achieve that using log shipping for SQL Server FCI deployment. Setting up DR using log shipping can help you save costs and meet your business requirements.

To learn more, check out Simplify your Microsoft SQL Server high availability deployments using Amazon FSx for Windows File Server.

How to set up federated single sign-on to AWS using Google Workspace

2022-03-10 Wei Chen

Post Syndicated from Wei Chen original https://aws.amazon.com/blogs/security/how-to-set-up-federated-single-sign-on-to-aws-using-google-workspace/

Organizations who want to federate their external identity provider (IdP) to AWS will typically do it through AWS Single Sign-On (AWS SSO), AWS Identity and Access Management (IAM), or use both. With AWS SSO, you configure federation once and manage access to all of your AWS accounts centrally. With AWS IAM, you configure federation to each AWS account, and manage access individually for each account. AWS SSO supports identity synchronization through the System for Cross-domain Identity Management (SCIM) v2.0 for several identity providers. For IdPs not currently supported, you can provision users manually. Otherwise, you can choose to federate to AWS from Google Workspace through IAM federation, which this post will cover below.

Google Workspace offers a single sign-on service based off of the Security Assertion Markup Language (SAML) 2.0. Users can use this service to access to your AWS resources by using their existing Google credentials. For users to whom you grant access, they will see an additional SAML app in their Google Workspace console. When your users choose this SAML app, they will be redirected to www.google.com the AWS Management Console.

Solution Overview

In this solution, you will create a SAML identity provider in IAM to establish a trusted communication channel across which user authentication information may be securely passed with your Google IdP in order to permit your Google Workspace users to access the AWS Management Console. You, as the AWS administrator, delegate responsibility for user authentication to a trusted IdP, in this case Google Workspace. Google Workspace leverages SAML 2.0 messages to communicate user authentication information between Google and your AWS account. The information contained within the SAML 2.0 messages allows an IAM role to grant the federated user permissions to sign in to the AWS Management Console and access your AWS resources. The IAM policy attached to the role they select determines which permissions the federated user has in the console.

Figure 1: Login process for IAM federation

Figure 1 illustrates the login process for IAM federation. From the federated user’s perspective, this process happens transparently: the user starts at the Google Workspace portal and ends up at the AWS Management Console, without having to supply yet another user name and password.

The portal verifies the user’s identity in your organization. The user begins by browsing to your organization’s portal and selects the option to go to the AWS Management Console. In your organization, the portal is typically a function of your IdP that handles the exchange of trust between your organization and AWS. In Google Workspace, you navigate to https://myaccount.google.com/ and select the nine dots icon on the top right corner. This will show you a list of apps, one of which will log you in to AWS. This blog post will show you how to configure this custom app.

Figure 2: Google Account page
The portal verifies the user’s identity in your organization.
The portal generates a SAML authentication response that includes assertions that identify the user and include attributes about the user. The portal sends this response to the client browser. Although not discussed here, you can also configure your IdP to include a SAML assertion attribute called SessionDuration that specifies how long the console session is valid. You can also configure the IdP to pass attributes as session tags.
The client browser is redirected to the AWS single sign-on endpoint and posts the SAML assertion.
The endpoint requests temporary security credentials on behalf of the user, and creates a console sign-in URL that uses those credentials.
AWS sends the sign-in URL back to the client as a redirect.
The client browser is redirected to the AWS Management Console. If the SAML authentication response includes attributes that map to multiple IAM roles, the user is first prompted to select the role for accessing the console.

The list below is a high-level view of the specific step-by-step procedures needed to set up federated single sign-on access via Google Workspace.

The setup

Follow these top-level steps to set up federated single sign-on to your AWS resources by using Google Apps:

Download the Google identity provider (IdP) information.
Create the IAM SAML identity provider in your AWS account.
Create roles for your third-party identity provider.
Assign the user’s role in Google Workspace.
Set up Google Workspace as a SAML identity provider (IdP) for AWS.
Test the integration between Google Workspace and AWS IAM.
Roll out to a wider user base.

Detailed procedures for each of these steps compose the remainder of this blog post.

Step 1. Download the Google identity provider (IdP) information

First, let’s get the SAML metadata that contains essential information to enable your AWS account to authenticate the IdP and locate the necessary communication endpoint locations:

Log in to the Google Workspace Admin console
From the Admin console Home page, select Security > Settings > Set up single sign-on (SSO) with Google as SAML Identity Provider (IdP).

Figure 3: Accessing the “single sign-on for SAML applications” setting
Choose Download Metadata under IdP metadata.

Figure 4: The “SSO with Google as SAML IdP” page

Step 2. Create the IAM SAML identity provider in your account

Now, create an IAM IdP for Google Workspace in order to establish the trust relationship between Google Workspace and your AWS account. The IAM IdP you create is an entity within your AWS account that describes the external IdP service whose users you will configure to assume IAM roles.

Sign in to the AWS Management Console and open the IAM console at https://console.aws.amazon.com/iam/.
In the navigation pane, choose Identity providers and then choose Add provider.
For Configure provider, choose SAML.
Type a name for the identity provider (such as GoogleWorkspace).
For Metadata document, select Choose file then specify the SAML metadata document that you downloaded in Step 1–c.
Verify the information that you have provided. When you are done, choose Add provider.

Figure 5: Adding an Identity provider
Document the Amazon Resource Name (ARN) by viewing the identity provider you just created in step f. The ARN should looks similar to this:
arn:aws:iam::123456789012:saml-provider/GoogleWorkspace

Step 3. Create roles for your third-party Identity Provider

For users accessing the AWS Management Console, the IAM role that the user assumes allows access to resources within your AWS account. The role is where you define what you allow a federated user to do after they sign in.

To create an IAM role, go to the AWS IAM console. Select Roles > Create role.
Choose the SAML 2.0 federation role type.
For SAML Provider, select the provider which you created in Step 2.
Choose Allow programmatic and AWS Management Console access to create a role that can be assumed programmatically and from the AWS Management Console.
Review your SAML 2.0 trust information and then choose Next: Permissions.

Figure 6: Reviewing your SAML 2.0 trust information

GoogleSAMLPowerUserRole:

For this walkthrough, you are going to create two roles that can be assumed by SAML 2.0 federation. For GoogleSAMLPowerUserRole, you will attach the PowerUserAccess AWS managed policy. This policy provides full access to AWS services and resources, but does not allow management of users and groups. Choose Filter policies, then select AWS managed – job function from the dropdown. This will show a list of AWS managed policies designed around specific job functions.

Figure 7: Selecting the AWS managed job function
To attach the policy, select PowerUserAccess. Then choose Next: Tags, then Next: Review.

Figure 8: Attaching the PowerUserAccess policy to your role
Finally, choose Create role to finalize creation of your role.

Figure 9: Creating your role

GoogleSAMLViewOnlyRole

Repeat steps a to g for the GoogleSAMLViewOnlyRole, attaching the ViewOnlyAccess AWS managed policy.

Figure 10: Creating the GoogleSAMLViewOnlyRole

Figure 11: Attaching the ViewOnlyAccess permissions policy

Document the ARN of both roles. The ARN should be similar to
arn:aws:iam::123456789012:role/GoogleSAMLPowerUserRole and

arn:aws:iam::123456789012:role/GoogleSAMLViewOnlyAccessRole.

Step 4. Assign the user’s role in Google Workspace

Here you will specify the role or roles that this user can assume in AWS.

Log in to the Google Admin console.
From the Admin console Home page, go to Directory > Users and select Manage custom attributes from the More dropdown, and choose Add Custom Attribute.
Configure the custom attribute as follows:

Category: AWS

Description: Amazon Web Services Role Mapping

For Custom fields, enter the following values:

Name: AssumeRoleWithSaml

Info type: Text

Visibility: Visible to user and admin

InNo. of values: Multi-value
Choose Add. The new category should appear in the Manage user attributes page.

Figure12: Adding the custom attribute
Navigate to Users, and find the user you want to allow to federate into AWS. Select the user’s name to open their account page, then choose User Information.
Select on the custom attribute you recently created, named AWS. Add two rows, each of which will include the values you recorded earlier, using the format below for each AssumeRoleWithSaml row.
Row 1:
arn:aws:iam::123456789012:role/GoogleSAMLPowerUserRole,arn:aws:iam:: 123456789012:saml-provider/GoogleWorkspace

Row 2:
arn:aws:iam::123456789012:role/GoogleSAMLViewOnlyAccessRole,arn:aws:iam:: 123456789012:saml-provider/GoogleWorkspace

The format of the AssumeRoleWithSaml is constructed by using the RoleARN(from Step 3-h) + “,”+ Identity provider ARN (from Step 2-g), this value will be passed as SAML attribute value for attribute with name https://aws.amazon.com/SAML/Attributes/Role. The final result will look similar to below:

Figure 13: Adding the roles that the user can assume

Step 5. Set up Google Workspace as a SAML identity provider (IdP) for AWS

Now you’ll set up the SAML app in your Google Workspace account. This includes adding the SAML attributes that the AWS Management Console expects in order to allow a SAML-based authentication to take place.

Log into the Google Admin console.

From the Admin console Home page, go to Apps > Web and mobile apps.
Choose Add custom SAML app from the Add App dropdown.
Enter AWS Single-Account Access for App name and upload an optional App icon to identify your SAML application, and select Continue.

Figure 14: Naming the custom SAML app and setting the icon
Fill in the following values:

ACS URL: https://signin.aws.amazon.com/saml

Entity ID: urn:amazon:webservices

Name ID format: EMAIL

Name ID: Basic Information > Primary email

Note: Your primary email will become your role’s AWS session name
Choose CONTINUE.

Figure 15: Adding the custom SAML app

AWS requires the IdP to issue a SAML assertion with some mandatory attributes (known as claims). The AWS documentation explains how to configure the SAML assertion. In short, you need to create an assertion with the following:

An attribute of name https://aws.amazon.com/SAML/Attributes/Role. This element contains one or more AttributeValue elements that list the IAM identity provider and role to which the user is mapped by your IdP. The IAM role and IAM identity provider are specified as a comma-delimited pair of ARNs in the same format as the RoleArn and PrincipalArn parameters that are passed to AssumeRoleWithSAML.
An attribute of name https://aws.amazon.com/SAML/Attributes/RoleSessionName (again, this is just a definition of type, not an actual URL) with a string value. This is the federated user’s role session name in AWS.

A name identifier (NameId) that is used to identify the subject of a SAML assertion.

Google Directory attributes	App attributes
AWS > AssumeRoleWithSaml	https://aws.amazon.com/SAML/Attributes/Role
Basic Information > Primary email	https://aws.amazon.com/SAML/Attributes/RoleSessionName

Figure 16: Mapping between Google Directory attributes and SAML attributes

Choose FINISH and save the mapping.

Step 6. Test the integration between Google Workspace and AWS IAM

Log into the Google Admin portal.
From the Admin console Home page, go to Apps > Web and mobile apps.
Select the Application you created in Step 5-i.
At the top left, select TEST SAML LOGIN, then choose ALLOW ACCESS within the popup box.

Figure 18: Testing the SAML login
Select ON for everyone in the Service status section, and choose SAVE. This will allow every user in Google Workspace to see the new SAML custom app.

Figure 19: Saving the custom app settings
Now navigate to Web and mobile apps and choose TEST SAML LOGIN again. Amazon Web Services should open in a separate tab and display two roles for users to choose from:

FIgure 20: Testing SAML login again

Figure 21: Selecting the IAM role you wish to assume for console access
Select the desired role and select Sign in.
You should now be redirected to AWS Management Console home page.
Google workspace users should now be able to access the AWS application from their workspace:

Figure 22: Viewing the AWS custom app

Conclusion

By following the steps in this blog post, you’ve configured your Google Workspace directory and AWS accounts to allow SAML-based federated sign-on for selected Google Workspace users. Using this over IAM users helps centralize identity management, making it easier to adopt a multi-account strategy.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Category:	AWS
Description:	Amazon Web Services Role Mapping

Name:	AssumeRoleWithSaml
Info type:	Text
Visibility:	Visible to user and admin
InNo. of values:	Multi-value

ACS URL:	https://signin.aws.amazon.com/saml
Entity ID:	urn:amazon:webservices
Name ID format:	EMAIL
Name ID:	Basic Information > Primary email

Deploying the solution

Set up

Solution overview

Prerequisites

Create an EMR cluster

Create an NLB

Update the Kerberos KDC

Update the Livy configuration file

Verify Livy is accessible via the NLB

Run the Python Livy test case

Summary

About the Authors

What can the server generator do for me?

The architecture of a Smithy service

Walkthrough

Prerequisites

Checking out the sample repository

To clone the application in your browser

Exploring and setting up the sample application

Modeling a service using Smithy

Updating the Smithy model to add additional constraints to the input

Using the Smithy Server Generator for TypeScript

Implementing an operation using a server SDK

Deploying the sample application

Calling the sample application with a generated client

To call the deployed application using the generated client

Cleaning up

To delete the sample application using the CDK

Conclusion

Example—

Methodology and Walkthrough

Clean up

Conclusion

Author:

Prerequisites

Solution walkthrough

Cleanup

Conclusion

Overview of solution

Walkthrough

Prerequisites

Create a new repository on Github:

Create a local repository for your application:

Setup Amplify:

Cleaning up

Conclusion

HMAC as a cryptographic building block

HMAC in the real world

Advantages of using HMACs in AWS KMS

Use HMAC keys in AWS KMS to create JSON Web Tokens

Create an HMAC key in AWS KMS

Use the HMAC key to encode a signed JWT

Verify the signed JWT

Create separate roles to control who has access to generate HMACs and who has access to validate HMACs

Conclusion

Solution overview

Prerequisites

Deploy the solution using AWS CloudFormation

Test the data pipeline

Data pipeline details

Data pipeline results

Clean up

Conclusion

About the Author

Reaching service quotas with polling iterations

Supporting parallel and dynamic tasks

Limits and flow control with dynamic tasks

A sample project with several orchestration patterns

Deploy the sample application

Prerequisites

Build and deploy in your account

Testing the sample application

Cleaning up

Conclusion

How global endpoints work

Setting up a global endpoint

Testing failover

Using the PutEvents API with global endpoints

Conclusion

Reusability