Tag Archives: contributed

Building dynamic Amazon SNS subscriptions for auto scaling container workloads 

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/building-dynamic-amazon-sns-subscriptions-for-auto-scaling-container-workloads/

This post is written by Mithun Mallick, Senior Specialist Solutions Architect, App Integration.

Amazon Simple Notification Service (SNS) is a serverless publish/subscribe messaging service. It supports a push-based subscription model where subscribers must register an endpoint to receive messages. Amazon Simple Queue Service (SQS) is one such endpoint, which is used by applications to receive messages published on an SNS topic.

With containerized applications, the container instances poll the queue and receive the messages. However, containerized applications can scale out for a variety of reasons. The creation of an SQS queue for each new container instance creates maintenance overhead for customers. You must also clean up the SNS-SQS subscription once the instance scales in.

This blog walks through a dynamic subscription solution, which automates the creation, subscription, and deletion of SQS queues for an Auto Scaling group of containers running in Amazon Elastic Container Service (ECS).

Overview

The solution is based on the use of events to achieve the dynamic subscription pattern. ECS uses the concept of tasks to create an instance of a container. You can find more details on ECS tasks in the ECS documentation.

This solution uses the events generated by ECS to manage the complete lifecycle of an SNS-SQS subscription. It uses the task ID as the name of the queue that is used by the ECS instance for pulling messages. More details on the ECS task ID can be found in the task documentation.

The solution also uses Amazon EventBridge to apply rules to ECS events and trigger AWS Lambda functions. The first rule detects the running state of an ECS task and triggers a Lambda function, which creates the SQS queue with the task ID as the queue name. It also grants permission to the queue and creates the SNS subscription on the topic.

As the container instance starts up, it can send a request to its metadata URL and retrieve the task ID. The task ID is used by the container instance to poll for messages. If the container instance terminates, ECS generates a task stopped event. This event matches a rule in Amazon EventBridge and triggers a Lambda function. The Lambda function retrieves the task ID, deletes the queue, and deletes the subscription from the SNS topic. The solution decouples the container instance from any overhead in maintaining queues, applying permissions, or managing subscriptions. The security permissions for all SNS-SQS management are handled by the Lambda functions.

This diagram shows the solution architecture:

Solution architecture

Events from ECS are sent to the default event bus. There are various events that are generated as part of the lifecycle of an ECS task. You can find more on the various ECS task states in the ECS task documentation. This solution uses ECS as the container orchestration service, but you can also use Amazon Elastic Kubernetes Service (EKS). For EKS, you must apply the rules for EKS task state events.

Walkthrough of the implementation

The code snippets are shortened for brevity. The full source code of the solution is in the GitHub repository. The solution uses AWS Serverless Application Model (AWS SAM) for deployment.

SNS topic

The SNS topic is used to send notifications to the ECS tasks. The following snippet from the AWS SAM template shows the definition of the SNS topic:

  SNSDynamicSubsTopic:
    Type: AWS::SNS::Topic
    Properties:
      TopicName: !Ref DynamicSubTopicName

Container instance

The container instance subscribes to the SNS topic using an SQS queue. The container image runs a Java application that reads messages from an SQS queue and prints them in the logs. The following code shows some of the message processor implementation:

// SQS client and responder used by the message consumer
AmazonSQS sqs = AmazonSQSClientBuilder.defaultClient();
AmazonSQSResponder responder = AmazonSQSResponderClientBuilder.standard()
        .withAmazonSQS(sqs)
        .build();

// Poll the task-specific queue, log each message, then delete it
SQSMessageConsumer consumer = SQSMessageConsumerBuilder.standard()
        .withAmazonSQS(responder.getAmazonSQS())
        .withQueueUrl(queue_url)
        .withConsumer(message -> {
            System.out.println("The message is " + message.getBody());
            sqs.deleteMessage(queue_url, message.getReceiptHandle());
        }).build();
consumer.start();

The queue_url is derived from the task ID of the ECS task, which is retrieved in the constructor of the class:

String metaDataURL = map.get("ECS_CONTAINER_METADATA_URI_V4");

HttpGet request = new HttpGet(metaDataURL);
CloseableHttpResponse response = httpClient.execute(request);

HttpEntity entity = response.getEntity();
if (entity != null) {
    String result = EntityUtils.toString(entity);
    String taskARN = JsonPath.read(result, "$['Labels']['com.amazonaws.ecs.task-arn']").toString();
    String[] arnTokens = taskARN.split("/");
    taskId = arnTokens[arnTokens.length-1];
    System.out.println("The task arn : "+taskId);
}

queue_url = sqs.getQueueUrl(taskId).getQueueUrl();

The queue URL is constructed from the task ID of the container. Each queue is dedicated to a single task, or instance of the container, running in ECS.

EventBridge rules

The following event pattern on the default event bus captures events that match the start of the container instance. The rule triggers a Lambda function:

      EventPattern:
        source:
          - aws.ecs
        detail-type:
          - "ECS Task State Change"
        detail:
          desiredStatus:
            - "RUNNING"
          lastStatus:  
            - "RUNNING"

The start rule routes events to a Lambda function that creates a queue with the name as the task ID. It creates the subscription to the SNS topic and grants permission on the queue to receive messages from the topic.
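
The following is a minimal sketch of that creation logic in Python with boto3, shown for illustration; the queue policy statement and the TOPIC_ARN environment variable are assumptions rather than the repository's exact code:

import json
import os

import boto3

sqs_client = boto3.client("sqs")
sns = boto3.client("sns")

TOPIC_ARN = os.environ["TOPIC_ARN"]  # assumed environment variable


def subscribe_task_queue(task_id):
    # Create a queue named after the ECS task ID and look up its ARN
    queue_url = sqs_client.create_queue(QueueName=task_id)["QueueUrl"]
    queue_arn = sqs_client.get_queue_attributes(
        QueueUrl=queue_url, AttributeNames=["QueueArn"]
    )["Attributes"]["QueueArn"]

    # Allow the SNS topic to send messages to this queue
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "sns.amazonaws.com"},
            "Action": "sqs:SendMessage",
            "Resource": queue_arn,
            "Condition": {"ArnEquals": {"aws:SourceArn": TOPIC_ARN}},
        }],
    }
    sqs_client.set_queue_attributes(
        QueueUrl=queue_url, Attributes={"Policy": json.dumps(policy)}
    )

    # Subscribe the queue to the topic and return the subscription ARN
    return sns.subscribe(
        TopicArn=TOPIC_ARN,
        Protocol="sqs",
        Endpoint=queue_arn,
        ReturnSubscriptionArn=True,
    )["SubscriptionArn"]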

This event pattern matches STOPPED events of the container task. It also triggers a Lambda function to delete the queue and the associated subscription:

      EventPattern:
        source:
          - aws.ecs
        detail-type:
          - "ECS Task State Change"
        detail:
          desiredStatus:
            - "STOPPED"
          lastStatus:  
            - "STOPPED"

Lambda functions

There are two Lambda functions that perform the queue creation, subscription, authorization, and deletion.

The SNS-SQS-Subscription-Service

The following code creates the queue based on the task ID, applies policies, and subscribes it to the topic. It also stores the subscription ARN in an Amazon DynamoDB table:

# get the task ID from the event
taskArn = event['detail']['taskArn']
taskArnTokens = taskArn.split('/')
taskId = taskArnTokens[len(taskArnTokens)-1]

# create the queue, named after the task ID
create_queue_resp = sqs_client.create_queue(QueueName=queue_name)

# subscribe the queue (by its ARN) to the topic; the response contains the subscription ARN
response = sns.subscribe(TopicArn=topic_arn, Protocol="sqs", Endpoint=queue_arn)

ddbresponse = dynamodb.update_item(
    TableName=SQS_CONTAINER_MAPPING_TABLE,
    Key={
        'id': {
            'S' : taskId.strip()
        }
    },
    AttributeUpdates={
        'SubscriptionArn':{
            'Value': {
                'S': subscription_arn
            }
        }
    },
    ReturnValues="UPDATED_NEW"
)

The cleanup service

The cleanup function is triggered when the container instance is stopped. It fetches the subscription ARN from the DynamoDB table based on the taskId. It deletes the subscription from the topic and deletes the queue. You can modify this code to include any other cleanup actions or trigger a workflow. The main part of the function code is:

# derive the task ID from the event's task ARN
taskId = taskArnTokens[len(taskArnTokens)-1]

# look up the subscription ARN stored for this task
ddbresponse = dynamodb.get_item(TableName=SQS_CONTAINER_MAPPING_TABLE, Key={'id': {'S': taskId}})
subscription_arn = ddbresponse['Item']['SubscriptionArn']['S']

# remove the subscription, then delete the task's queue
snsresp = sns.unsubscribe(SubscriptionArn=subscription_arn)
queue_url = sqs_client.get_queue_url(QueueName=taskId)['QueueUrl']
queuedelresp = sqs_client.delete_queue(QueueUrl=queue_url)

Conclusion

This blog shows an event-driven approach to handling dynamic SNS subscription requirements. It relies on ECS task state change events to trigger the appropriate Lambda functions. These create the subscription queue, subscribe it to a topic, and delete it once the container instance is terminated.

The approach also allows the container application logic to focus only on consuming and processing the messages from the queue. It does not need any additional permissions to subscribe or unsubscribe from the topic or apply any additional permissions on the queue. Although the solution has been presented using ECS as the container orchestration service, it can be applied for EKS by using its service events.

For more serverless learning resources, visit Serverless Land.

Visualizing AWS Step Functions workflows from the AWS Batch console

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/visualizing-aws-step-functions-workflows-from-the-aws-batch-console/

This post written by Dhiraj Mahapatro, Senior Specialist SA, Serverless.

AWS Step Functions is a low-code visual workflow service used to orchestrate AWS services, automate business processes, and build serverless applications. Step Functions workflows manage failures, retries, parallelization, service integrations, and observability so builders can focus on business logic.

AWS Batch is one of the service integrations that are available for Step Functions. AWS Batch enables users to more easily and efficiently run hundreds of thousands of batch computing jobs on AWS. AWS Batch dynamically provisions the optimal quantity and type of compute resources based on the volume and specific resource requirements of the batch jobs submitted. AWS Batch plans, schedules, and runs batch computing workloads across the full range of AWS compute services and features, such as AWS Fargate, Amazon EC2, and Spot Instances.

Now, Step Functions is available to AWS Batch users through the AWS Batch console. This feature enables AWS Batch users to augment compute options and have additional orchestration capabilities to manage their batch jobs.

This blog walks through Step Functions integration in AWS Batch console and shows how AWS Batch users can efficiently use Step Functions workflow orchestrators in batch workloads. A sample application also highlights the use of AWS Lambda as a compute option for AWS Batch.

Introducing workflow orchestration in AWS Batch console

Today, AWS users use AWS Batch for high performance computing, post-trade analytics, fraud surveillance, screening, DNA sequencing, and more. AWS Batch minimizes human error, increases speed and accuracy, and reduces costs with automation so that users can refocus on evolving the business.

In addition to using compute-intensive tasks, users sometimes need Lambda for simpler, less intense processing. Users also want to combine the two in a single business process that is scalable and repeatable.

Workflow orchestration (powered by Step Functions) in the AWS Batch console allows orchestration of batch jobs with a Step Functions state machine:

Workflow orchestration in Batch console

Using batch-related patterns from Step Functions

Error handling

Step Functions natively handles errors and retries of its workflows. Users rely on this native error handling mechanism to focus on building business logic.

Workflow orchestration in AWS Batch console provides common batch-related patterns that are present in Step Functions. Handling errors while submitting batch jobs in Step Functions is one of them.

Getting started with orchestration in Batch

  1. Choose Get Started from Handle complex errors.
  2. From the pop-up, choose Start from a template and choose Continue.

A new browser tab opens with Step Functions Workflow Studio. The Workflow Studio designer has a pre-created workflow pattern template. Diving deeper into the workflow shows that it submits a batch job and then handles the success or error scenario by sending the corresponding Amazon SNS notification.

Alternatively, choosing Deploy a sample project from the Get Started pop-up deploys a sample Step Functions workflow.

Deploying a sample project

This option allows creating a state machine from scratch, reviewing the workflow definition, deploying an AWS CloudFormation stack, and running the workflow in the Step Functions console.

Deploy and run from console

Once deployed, the state machine is visible in the Step Functions console as:

Viewing the state machine in the AWS Step Functions console

Select the BatchJobNotificationStateMachine to land on the details page:

View the state machine’s details

The CloudFormation template has already provisioned the required batch job in AWS Batch and the SNS topic for success and failure notification.

To see the Step Functions workflow in action, use Start execution. Keep the optional name and input as is and choose Start execution:

Run the Step Function

The state machine completes the tasks successfully by Submitting Batch Job using AWS Batch and Notifying Success using the SNS topic:

The successful results in the console

The state machine used the AWS Batch Submit Job task. Workflow orchestration in the AWS Batch console now highlights this newly created Step Functions state machine:

The state machine is listed in the Batch console

Therefore, any state machine that uses this task in Step Functions for this account is listed here as a state machine that orchestrates batch jobs.

Combine Batch and Lambda

Another pattern to use in Step Functions is the combination of Lambda and batch job.

Select Get Started from the Combine Batch and Lambda pop-up, followed by Start from a template and Continue. This takes the user to Step Functions Workflow Studio with the following pattern. The Lambda task generates input for the subsequent batch job task, and the Submit Batch Job task takes that input and submits the batch job:

Combining AWS Lambda with AWS Step Functions

Step Functions enables AWS Batch users to combine Batch and Lambda functions to optimize compute spend while using the power of the different compute choices.

Fan out to multiple Batch jobs

In addition to error handling and combining Lambda with AWS Batch jobs, a user can fan out multiple batch jobs using Step Functions’ map state. Map state in Step Functions provides dynamic parallelism.

With dynamic parallelism, a user can submit multiple batch jobs based on a collection of batch job input data. With visibility to each iteration’s input and output, users can easily navigate and troubleshoot in case of failure.

Easily navigate and troubleshoot in case of failure

AWS Batch users are not limited to the previous three patterns shown in Workflow orchestration in the AWS Batch console. AWS Batch users can start from scratch and build a Step Functions state machine by navigating to the bottom right and using Create state machine:

Create a state machine from the Step Functions console

Create state machine in the AWS Batch console opens a new tab with the Step Functions console’s Create state machine page.

Design a workflow visually

Refer to Building a state machine in AWS Step Functions Workflow Studio for additional details.

Deploying the application

The sample application shows the fan-out to multiple batch jobs pattern. Before deploying the application, you need:

To deploy:

  1. From a terminal window, clone the GitHub repo:
    git clone git@github.com:aws-samples/serverless-batch-job-workflow.git
  2. Change directory:
    cd ./serverless-batch-job-workflow
  3. Download and install dependencies:
    sam build
  4. Deploy the application to your AWS account:
    sam deploy --guided

To run the application using the AWS CLI, replace the state machine ARN from the output of deployment steps:

aws stepfunctions start-execution \
    --state-machine-arn <StepFunctionArnHere> \
    --region <RegionWhereApplicationDeployed> \
    --input "{}"

Step Functions is not limited to AWS Batch’s Submit Job API action

In September 2021, Step Functions announced integration support for 200 AWS services to enable easier workflow automation. With this announcement, Step Functions is no longer limited to the AWS Batch SubmitJob API and can integrate with any AWS Batch SDK API today.

Step Functions can automate the lifecycle of an AWS Batch job, starting from creating a compute environment, creating job queues, registering job definitions, submitting a job, and finally cleaning up.
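
As a rough illustration of that lifecycle, here are the same AWS Batch API actions expressed as boto3 calls; all names, subnets, roles, and images below are placeholders, not values from any sample project:

import boto3

batch = boto3.client("batch")

# 1. Create a managed Fargate compute environment (subnets/security groups are placeholders)
ce = batch.create_compute_environment(
    computeEnvironmentName="demo-ce",
    type="MANAGED",
    computeResources={
        "type": "FARGATE",
        "maxvCpus": 4,
        "subnets": ["subnet-0123456789abcdef0"],
        "securityGroupIds": ["sg-0123456789abcdef0"],
    },
)

# 2. Create a job queue (in practice, wait until the compute environment is VALID)
queue = batch.create_job_queue(
    jobQueueName="demo-queue",
    priority=1,
    computeEnvironmentOrder=[
        {"order": 1, "computeEnvironment": ce["computeEnvironmentArn"]},
    ],
)

# 3. Register a job definition
job_def = batch.register_job_definition(
    jobDefinitionName="demo-job-def",
    type="container",
    platformCapabilities=["FARGATE"],
    containerProperties={
        "image": "public.ecr.aws/amazonlinux/amazonlinux:latest",
        "command": ["echo", "hello"],
        "executionRoleArn": "arn:aws:iam::111222333444:role/demo-execution-role",
        "resourceRequirements": [
            {"type": "VCPU", "value": "0.25"},
            {"type": "MEMORY", "value": "512"},
        ],
    },
)

# 4. Submit a job
batch.submit_job(
    jobName="demo-job",
    jobQueue=queue["jobQueueArn"],
    jobDefinition=job_def["jobDefinitionArn"],
)

# 5. Clean up later with deregister_job_definition, delete_job_queue,
#    and delete_compute_environment (after disabling the queue and environment)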

Other AWS service integrations

Step Functions support for 200 AWS services equates to integration with more than 9,000 API actions across these services. AWS Batch tasks in Step Functions can evolve by integrating with the other available services in the workflow for pre- and post-processing needs.

For example, batch job input data can be sanitized inside a Lambda function and then pushed to an Amazon SQS queue, or stored as an Amazon S3 object for auditability purposes.
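
A pre-processing Lambda function along these lines could sanitize the job input and keep an audit copy in S3; the bucket name, field names, and key prefix below are hypothetical:

import json
import os

import boto3

s3 = boto3.client("s3")
AUDIT_BUCKET = os.environ["AUDIT_BUCKET"]  # hypothetical bucket name


def lambda_handler(event, context):
    # Keep only expected fields and normalize values before the Batch job runs
    sanitized = {
        "jobName": str(event.get("jobName", "unnamed-job"))[:128],
        "parameters": {
            key: str(value) for key, value in event.get("parameters", {}).items()
        },
    }

    # Store the sanitized input in S3 for auditability
    s3.put_object(
        Bucket=AUDIT_BUCKET,
        Key=f"batch-inputs/{context.aws_request_id}.json",
        Body=json.dumps(sanitized),
    )

    # The downstream Submit Batch Job task consumes this output
    return sanitized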

Similarly, Amazon SNS, Amazon Pinpoint, or Amazon SES can send a notification once an AWS Batch job task is complete.

There are multiple ways to build pre- and post-processing steps around an AWS Batch job task. Refer to AWS SDK service integrations and optimized integrations for Step Functions for additional details.

Important considerations

Workflow orchestration in the AWS Batch console only shows Step Functions state machines that use the AWS Batch Submit Job task. Step Functions state machines do not appear in the AWS Batch console when:

  1. A state machine uses any other AWS SDK Batch API integration task
  2. AWS Batch’s SubmitJob API is invoked inside a Lambda function task using an AWS SDK client (such as Boto3, or the Node.js or Java SDKs), as in the following sketch
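
For example, a function like this sketch submits a job directly through the SDK rather than through a Step Functions task, so its parent state machine is not listed in the Batch console (the queue and job definition names are placeholders):

import boto3

batch = boto3.client("batch")


def lambda_handler(event, context):
    # Direct SDK call: the Batch console cannot attribute this to a state machine
    response = batch.submit_job(
        jobName="sdk-submitted-job",
        jobQueue="demo-queue",          # placeholder
        jobDefinition="demo-job-def",   # placeholder
    )
    return {"jobId": response["jobId"]}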

Cleanup

The sample application provisions AWS Batch (the job definition, job queue, and ECS compute environment inside a VPC). It also creates subnets, route tables, and an internet gateway. Clean up the stack after testing the application to avoid the ongoing cost of running these services.

To delete the sample application stack, use the latest version of AWS SAM CLI and run:

sam delete

Conclusion

To learn more about AWS Batch, read the Orchestrating Batch jobs section in the Batch developer guide.

To get started, open the workflow orchestration page in the Batch console. Select Orchestrate Batch jobs with Step Functions Workflows to deploy a sample project, if you are new to Step Functions.

This feature is available in all Regions where both Step Functions and AWS Batch are available. View the AWS Regions table for details.

To learn more on Step Functions patterns, visit Serverless Land.

Accepting API keys as a query string in Amazon API Gateway

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/accepting-api-keys-as-a-query-string-in-amazon-api-gateway/

This post was written by Ronan Prenty, Sr. Solutions Architect and Zac Burns, Cloud Support Engineer & API Gateway SME

Amazon API Gateway is a fully managed service that makes it easier for developers to create, publish, maintain, monitor, and secure APIs at any scale. APIs act as the front door to applications and allow developers to offload tasks like authorization, throttling, caching, and more.

A common feature requested by customers is the ability to track usage for specific users or services through API keys. API Gateway REST APIs support this feature and, for added security, require that the API key resides in a header or an authorizer.

Developers may also need to pass API keys in the query string parameters. Best practices encourage refactoring the requests at the client level to move API keys to the header. However, this may not be possible during the migration.

This blog explains how to build an API Gateway REST API that temporarily accepts API keys as query string parameters. This post helps customers who have APIs that accept API keys as query string parameters and want to migrate to API Gateway with minimal impact on their clients. The post also discusses increasing security by refactoring the client to send API keys as a header instead of a query string.

There is also an example project for you to test and evaluate. This solution uses a custom authorizer AWS Lambda function to extract the API key from the query string parameter and apply it to a usage plan. The sample application uses the AWS Serverless Application Model (AWS SAM) for deployment.

Key concepts

API keys and usage plans

API keys are alphanumeric strings that are distributed to developers to grant access to an API. API Gateway can generate these on your behalf, or you can import them.

Usage plans let you provide API keys to your customers so that you can track and limit their usage. API keys are not a primary authorization mechanism for your APIs. If multiple APIs are associated with a usage plan, a user with a valid API key can access all APIs in that usage plan. We provide numerous options for securing access to your APIs, including resource policies, Lambda authorizers, and Amazon Cognito user pools.

Usage plans define who can access deployed API stages and methods along with metering their usage. Usage plans use API keys to identify who is making requests and apply throttling and quota limits.

How API Gateway handles API keys

API Gateway supports API keys sent as headers in a request. It does not support API keys sent as a query string parameter. API Gateway only accepts requests over HTTPS, which means that the request is encrypted. When sending API keys as query string parameters, there is still a risk that URLs are logged in plaintext by the client sending requests.

API Gateway has two settings to accept API keys:

  1. Header: The request contains the values as the X-API-Key header. API Gateway then validates the key against a usage plan.
  2. Authorizer: The authorizer includes the API key as part of the authorization response. Once API Gateway receives the API key as part of the response, it validates it against a usage plan.

Solution overview

To accept an API key as a query string parameter temporarily, create a custom authorizer using a Lambda function:

Note: the apiKeySource property of your API must be set to Authorizer instead of Header.

  1. The client sends an HTTP request to the API Gateway endpoint with the API key in the query string.
  2. API Gateway sends the request to a REQUEST type custom authorizer.
  3. The custom authorizer function extracts the API key from the payload and constructs the response object with the API key as the value for the `usageIdentifierKey` property (see the sketch after this list).
  4. The response gets sent back to API Gateway for validation.
  5. API Gateway validates the API key against a usage plan.
  6. If valid, API Gateway passes the request to the backend.
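
The following is a minimal sketch of such a REQUEST authorizer in Python; the principal ID and the deny behavior are illustrative assumptions, and the example project's implementation may differ:

def lambda_handler(event, context):
    # API Gateway passes query string parameters to a REQUEST authorizer
    api_key = (event.get("queryStringParameters") or {}).get("apiKey")

    # Without a key, deny the request outright
    effect = "Allow" if api_key else "Deny"

    return {
        "principalId": "api-key-client",
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [{
                "Action": "execute-api:Invoke",
                "Effect": effect,
                "Resource": event["methodArn"],
            }],
        },
        # API Gateway validates this value against the usage plan,
        # because the API's apiKeySource is set to Authorizer
        "usageIdentifierKey": api_key,
    }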

Deploying the solution

Prerequisites

This solution requires no pre-existing AWS resources and deploys everything you need from the template. Deploying the solution requires:

You can find the solution on GitHub using this link.

With the prerequisites completed, deploy the template with the following commands:

git clone https://github.com/aws-samples/amazon-apigateway-accept-apikeys-as-querystring.git
cd amazon-apigateway-accept-apikeys-as-querystring
sam build --use-container
sam deploy --guided

Long term considerations

This temporary solution enables developers to migrate APIs to API Gateway and maintain query string-based API keys. While this solution does work, it does not follow best practices.

In addition to security, there is also a cost factor. Each time the client request contains an API key, the custom authorizer AWS Lambda function is invoked, increasing the total number of Lambda invocations you are billed for. To ensure you are billed only for valid requests, you can add an identity source to the custom authorizer, so that only requests containing this identity source are sent to the Lambda function. Requests that do not contain this identity source are not billed by Lambda or API Gateway. Migrating to a header-based API key removes the need for a custom authorizer and the extra Lambda function invocations. You can find out more information on AWS Lambda billing here.

Customer migration process

With this in mind, the structure of the request sent by API clients must change from:

GET /some-endpoint?apiKey=abc123456789

To:

GET /some-endpoint
x-api-key: abc123456789
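
As an illustration only, using the Python requests library and a placeholder endpoint, the client-side change looks like this:

import requests

API_URL = "https://example.execute-api.us-east-1.amazonaws.com/prod/some-endpoint"  # placeholder
API_KEY = "abc123456789"

# Temporary pattern: API key as a query string parameter
requests.get(API_URL, params={"apiKey": API_KEY})

# Target pattern: API key sent in the x-api-key header
requests.get(API_URL, headers={"x-api-key": API_KEY})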

You can provide clients with a notice period while this temporary solution is operational. Afterwards, they must migrate to a new API endpoint using a header to provide the API keys. Once the client migration is complete, you can retire the custom solution.

Developer portal

In addition to migrating API keys to a header-based solution, customers also ask us how to manage customer keys and usage plans. One option is to deploy the API Gateway developer portal.

This portal enables your customers to discover available APIs, browse API documentation, register for API keys, test APIs in the user interface, and monitor their API usage. This portal also allows you to publish non-API Gateway managed APIs by uploading OpenAPI definitions. The serverless developer portal can be customized and branded to suit your organization.

Conclusion

This blog post demonstrates how to use custom authorizers in API Gateway to accept API keys as a query string parameter. It also provides an AWS SAM template to deploy an example application for testing. Finally, it discusses the importance of moving customers to header-based API keys and managing those keys with the developer portal.

For more serverless content, visit Serverless Land.

Using JSONPath effectively in AWS Step Functions

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/using-jsonpath-effectively-in-aws-step-functions/

This post is written by Dhiraj Mahapatro, Senior Serverless Specialist SA, Serverless.

AWS Step Functions uses Amazon States Language (ASL), which is a JSON-based, structured language used to define the state machine. ASL uses paths for input and output processing in between states. Paths follow JSONPath syntax.

JSONPath provides the capability to select parts of JSON structures similar to how XPath expressions select nodes of XML documents. Step Functions provides the data flow simulator, which helps in modeling input and output path processing using JSONPath.

This blog post explains how you can effectively use JSONPath in a Step Functions workflow. It shows how you can separate concerns between states by specifically identifying input to and output from each state. It also explains how you can use advanced JSONPath expressions for filtering and mapping JSON content.

Overview

The sample application in this blog is based on a use case in the insurance domain. A new potential customer signs up with an insurance company by creating an account. The customer provides their basic information and the types of insurance they are interested in shopping for later.

The information provided by the potential insurance customer is accepted by the insurance company’s new account application for processing. This application is built using Step Functions, which accepts provided input as a JSON payload and applies the following business logic:

Example application architecture

  1. Verify the identity of the user.
  2. Verify the address of the user.
  3. Approve the new account application if the checks pass.
  4. Upon approval, insert user information into the Amazon DynamoDB Accounts table.
  5. Collect home insurance interests and store in an Amazon SQS queue.
  6. Send email notification to the user about the application approval.
  7. Deny the new account application if the checks fail.
  8. Send an email notification to the user about the application denial.

Deploying the application

Before deploying the solution, you need:

To deploy:

  1. From a terminal window, clone the GitHub repo:
    git clone git@github.com:aws-samples/serverless-account-signup-service.git
  2. Change directory:
    cd ./serverless-account-signup-service
  3. Download and install dependencies:
    sam build
  4. Deploy the application to your AWS account:
    sam deploy --guided
  5. During the guided deployment process, enter a valid email address for the parameter “Email” to receive email notifications.
  6. Once deployed, a confirmation email is sent to the provided email address from SNS. Confirm the subscription by clicking the link in the email.
    Email confirmation

To run the application using the AWS CLI, replace the state machine ARN from the output of deployment steps:

aws stepfunctions start-execution \
  --state-machine-arn <StepFunctionArnHere> \
  --input "{\"data\":{\"firstname\":\"Jane\",\"lastname\":\"Doe\",\"identity\":{\"email\":\"[email protected]\",\"ssn\":\"123-45-6789\"},\"address\":{\"street\":\"123 Main St\",\"city\":\"Columbus\",\"state\":\"OH\",\"zip\":\"43219\"},\"interests\":[{\"category\":\"home\",\"type\":\"own\",\"yearBuilt\":2004},{\"category\":\"auto\",\"type\":\"car\",\"yearBuilt\":2012},{\"category\":\"boat\",\"type\":\"snowmobile\",\"yearBuilt\":2020},{\"category\":\"auto\",\"type\":\"motorcycle\",\"yearBuilt\":2018},{\"category\":\"auto\",\"type\":\"RV\",\"yearBuilt\":2015},{\"category\":\"home\",\"type\":\"business\",\"yearBuilt\":2009}]}}"

Paths in Step Functions

Here is the sample payload structure:

{
  "data": {
    "firstname": "Jane",
    "lastname": "Doe",
    "identity": {
      "email": "[email protected]",
      "ssn": "123-45-6789"
    },
    "address": {
      "street": "123 Main St",
      "city": "Columbus",
      "state": "OH",
      "zip": "43219"
    },
    "interests": [
      {"category": "home", "type": "own", "yearBuilt": 2004},
      {"category": "auto", "type": "car", "yearBuilt": 2012},
      {"category": "boat", "type": "snowmobile", "yearBuilt": 2020},
      {"category": "auto", "type": "motorcycle", "yearBuilt": 2018},
      {"category": "auto", "type": "RV", "yearBuilt": 2015},
      {"category": "home", "type": "business", "yearBuilt": 2009}
    ]
  }
}

The payload has data about the new user (identity and address information) and the user’s interests in the types of insurance.

The Compute Blog post on using data flow simulator elaborates on how to use Step Functions paths. To summarize how paths work:

  1. InputPath – What input does a task need?
  2. Parameters – How does the task need the structure of the input to be?
  3. ResultSelectors – What to choose from the task’s output?
  4. ResultPath – Where to put the chosen output?
  5. OutputPath – What output to send to the next state?

The key idea is that the input of downstream states depends on the output of previous states. JSONPath expressions help structure the input and output between states.
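
As a rough analogy in plain Python (not the actual ASL evaluation engine), the paths act like successive transformations on the state data:

state_input = {"data": {"identity": {"ssn": "123-45-6789"}, "interests": []}}

# InputPath "$.data.identity": pass only the part the task needs
task_input = state_input["data"]["identity"]

# Parameters: reshape that input into the request the service expects
request = {"FunctionName": "CheckIdentity", "Payload": task_input}

# ResultSelector: keep only the useful part of the raw task output
raw_output = {"Payload": {"body": '{"approved": true}'}, "StatusCode": 200}
selected = {"identity": raw_output["Payload"]["body"]}

# ResultPath "$.result": merge the selection back into the original input
state_output = {**state_input, "result": selected}

# OutputPath "$.result": forward only that portion to the next state
next_state_input = state_output["result"]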

Using JSONPath inside paths

This is how paths are used in the sample application for each type.

InputPath

The first two main tasks in the Step Functions state machine validate the identity and the address of the user. Since both validations are unrelated, they can work independently by using parallel state.

Each state needs the identity and address information provided by the input payload. There is no requirement to provide interests to those states, so InputPath can help answer “What input does a task need?”.

Inside the Check Identity state:

"InputPath": "$.data.identity"

Inside the Check Address state:

"InputPath": "$.data.address"

Parameters

What should the input of the underlying task look like? Check Identity and Check Address use their respective AWS Lambda functions. When Lambda functions or any other AWS service integration is used as a task, the state machine should follow the request syntax of the corresponding service.

For a Lambda function as a task, the state should provide the FunctionName and an optional Payload as parameters. For the Check Identity state, the parameters section looks like:

"Parameters": {
    "FunctionName": "${CheckIdentityFunctionArn}",
    "Payload.$": "$"
}

Here, Payload is the entire identity JSON object provided by InputPath.

ResultSelector

Once the Check Identity task is invoked, the Lambda function successfully validates the user’s identity and responds with an approval response:

{
  "ExecutedVersion": "$LATEST",
  "Payload": {
    "statusCode": "200",
    "body": "{\"approved\": true,\"message\": \"identity validation passed\"}"
  },
  "SdkHttpMetadata": {
    "HttpHeaders": {
      "Connection": "keep-alive",
      "Content-Length": "43",
      "Content-Type": "application/json",
      "Date": "Thu, 16 Apr 2020 17:58:15 GMT",
      "X-Amz-Executed-Version": "$LATEST",
      "x-amzn-Remapped-Content-Length": "0",
      "x-amzn-RequestId": "88fba57b-adbe-467f-abf4-daca36fc9028",
      "X-Amzn-Trace-Id": "root=1-5e989cb6-90039fd8971196666b022b62;sampled=0"
    },
    "HttpStatusCode": 200
  },
  "SdkResponseMetadata": {
    "RequestId": "88fba57b-adbe-467f-abf4-daca36fc9028"
  },
  "StatusCode": 200
}

The identity validation approval must be provided to the downstream states for additional processing. However, the downstream states only need the Payload.body from the preceding JSON.

You can use a combination of intrinsic function and ResultSelector to choose attributes from the task’s output:

"ResultSelector": {
  "identity.$": "States.StringToJson($.Payload.body)"
}

ResultSelector takes the JSON string $.Payload.body and applies States.StringToJson to convert the string to JSON, storing it in a new attribute named identity:

"identity": {
    "approved": true,
    "message": "identity validation passed"
}

When Check Identity and Check Address states finish their work and exit, the step output from each state is captured as a JSON array. This JSON array is the step output of the parallel state. Reconcile the results from the JSON array using the ResultSelector that is available in parallel state.

"ResultSelector": {
    "identityResult.$": "$[0].result.identity",
    "addressResult.$": "$[1].result.address"
}

ResultPath

After ResultSelector, where should the identity result go to in the initial payload? The downstream states need access to the actual input payload in addition to the results from the previous state. ResultPath provides the mechanism to extend the initial payload to add results from the previous state.

ResultPath: "$.result" informs the state machine that any result selected from the task output (actual output if none specified) should go under result JSON attribute and result should get added to the incoming payload. The output from ResultPath looks like:

{
  "data": {
    "firstname": "Jane",
    "lastname": "Doe",
    "identity": {
      "email": "[email protected]",
      "ssn": "123-45-6789"
    },
    "address": {
      "street": "123 Main St",
      "city": "Columbus",
      "state": "OH",
      "zip": "43219"
    },
    "interests": [
      {"category":"home", "type":"own", "yearBuilt":2004},
      {"category":"auto", "type":"car", "yearBuilt":2012},
      {"category":"boat", "type":"snowmobile","yearBuilt":2020},
      {"category":"auto", "type":"motorcycle","yearBuilt":2018},
      {"category":"auto", "type":"RV", "yearBuilt":2015},
      {"category":"home", "type":"business", "yearBuilt":2009}
    ]
  },
  "result": {
    "identity": {
      "approved": true,
      "message": "identity validation passed"
    }
  }
}

The preceding JSON contains the results of the operation, while the incoming payload remains intact for business logic in downstream states.

This pattern ensures that the previous state keeps the payload hydrated for the next state. Use these combinations of paths across all states to make sure that each state has all the information needed.

As with the parallel state’s ResultSelector, an appropriate ResultPath is needed to hold the results from both Check Identity and Check Address, so that the following results JSON object is added to the payload:

"results": {
  "addressResult": {
    "approved": true,
    "message": "address validation passed"
  },
  "identityResult": {
    "approved": true,
    "message": "identity validation passed"
  }
}

With this approach for all of the downstream states, the input payload is still intact and the state machine has collected results from each state in results.

OutputPath

To return results from the state machine, ideally you do not send back the input payload to the caller of the Step Functions workflow. You can use OutputPath to select a portion of the state output as an end result. OutputPath determines what output to send to the next state.

In the sample application, the last states (Approved Message and Deny Message) defined OutputPath as:

"OutputPath": "$.results"

The output from the state machine is:

{
  "addressResult": {
    "approved": true,
    "message": "address validation passed"
  },
  "identityResult": {
    "approved": true,
    "message": "identity validation passed"
  },
  "accountAddition": {
    "statusCode": 200
  },
  "homeInsuranceInterests": {
    "statusCode": 200
  },
  "sendApprovedNotification": {
    "statusCode": 200
  }
}

This response strategy is also effective when using a Synchronous Express Workflow for this business logic.

Advanced JSONPath

You can declaratively use advanced JSONPath expressions to apply logic without writing imperative code in utility functions.

Let’s focus on the interests that the new customer has asked for in the input payload. The Step Functions state machine has a state that focuses on interests in the “home” insurance category. Once the new account application is approved and added to the database successfully, the application captures home insurance interests, adds the home-related details to the HomeInterestsQueue SQS queue, and transitions to the Approved Message state.

The interests JSON array has the information about insurance interests. An effective way to get the home-related details is to filter the interests array on the category “home”. You can try this in the data flow simulator:

Data flow simulator

You can apply additional filter expressions to filter data according to your use case. To learn more, visit the data flow simulator blog.

Inside the state machine JSON, the Home Insurance Interests task has:

"InputPath": "$..interests[?(@.category==home)]"

It uses advanced JSONPath with $.. notation and [?(@.category==home)] filters.
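
The declarative filter replaces imperative filtering code such as the following Python equivalent, shown only to illustrate what the expression selects:

payload = {
    "data": {
        "interests": [
            {"category": "home", "type": "own", "yearBuilt": 2004},
            {"category": "auto", "type": "car", "yearBuilt": 2012},
            {"category": "home", "type": "business", "yearBuilt": 2009},
        ]
    }
}

# Equivalent of $..interests[?(@.category==home)]
home_interests = [
    interest
    for interest in payload["data"]["interests"]
    if interest["category"] == "home"
]
# [{'category': 'home', 'type': 'own', ...}, {'category': 'home', 'type': 'business', ...}]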

Using advanced expressions on JSONPath is not limited to home insurance interests and can be extended to other categories and business logic.

Cleanup

To delete the sample application, use the latest version of the AWS SAM CLI and run:

sam delete

Conclusion

This post uses a sample application to highlight effective use of JSONPath and data filtering strategies that can be used in Step Functions.

JSONPath provides the flexibility to work on JSON objects and arrays inside the Step Functions state machine, reducing the amount of utility code. It allows developers to build state machines by separating concerns for states’ input and output data. Advanced JSONPath expressions help write declarative filtering logic without imperative utility code, optimizing both cost and performance.

For more serverless learning resources, visit Serverless Land.

Operating serverless at scale: Improving consistency – Part 2

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/operating-serverless-at-scale-improving-consistency-part-2/

This post is written by Jerome Van Der Linden, Solutions Architect.

Part one of this series describes how to maintain visibility on your AWS resources to operate them properly. This second part focuses on provisioning these resources.

It is relatively easy to create serverless resources. For example, you can set up and deploy a new AWS Lambda function in a few clicks. By using infrastructure as code, such as the AWS Serverless Application Model (AWS SAM), you can deploy hundreds of functions in a matter of minutes.

Before reaching these numbers, companies often want to standardize developments. Standardization is an effective way of reducing development time and costs, by using common tools, best practices, and patterns. It helps in meeting compliance objectives and lowering some risks, mainly around security and operations, by adopting enterprise standards. It can also reduce the scope of required skills, both in terms of development and operations. However, excessive standardization can reduce agility and innovation and sometimes even productivity.

You may want to provide application samples or archetypes, so that each team does not reinvent the wheel for every new project. These archetypes get a standard structure so that any developer joining the team is quickly up-to-speed. The archetypes also bundle everything that is required by your company:

  • Security library and setup to apply a common security layer to all your applications.
  • Observability framework and configuration, to collect and centralize logs, metrics and traces.
  • Analytics, to measure usage of the application.
  • Other capabilities or standard services you might have in your company.

This post describes different ways to create and share project archetypes.

Sharing AWS SAM templates

AWS SAM makes it easier to create and deploy serverless applications. It uses infrastructure as code (AWS SAM templates) and a command line tool (AWS SAM CLI). You can also create customizable templates that anyone can use to initialize a project. Using the sam init command and the --location option, you can bootstrap a serverless project based on the template available at this location.

For example, here is a CRUDL (Create/Read/Update/Delete/List) microservice archetype. It contains an Amazon DynamoDB table, an API Gateway, and five AWS Lambda functions written in Java (one for each operation):

CRUDL microservice

You must first create a template: not only an AWS SAM template describing your resources, but also the source code, dependencies, and some config files. Using cookiecutter, you can parameterize this template. You add variables surrounded by {{ and }} in files and folders (for example, {{cookiecutter.my_variable}}). You also define a cookiecutter.json file that describes all the variables and their possible values.

In this CRUD microservice, I add variables in the AWS SAM template file, the Java code, and in other project resources:

Variables in the project

You then need to share this template either in a Git/Mercurial repository or an HTTP/HTTPS endpoint. Here the template is available on GitHub.

After sharing, anyone within your company can use it and bootstrap a new project with the AWS SAM CLI and the init command:

$ sam init --location git+ssh://git@github.com/aws-samples/java-crud-microservice-template.git

project_name [Name of the project]: product-crud-microservice
object_model [Model to create / read / update / delete]: product
runtime [java11]:

The command prompts you to enter the variables and generates the following project structure and files.

sam init output files

Variable placeholders have been replaced with their values. The project can now be modified for your business requirements and deployed in the AWS Cloud.

There are alternatives to AWS SAM like AWS CloudFormation modules or AWS CDK constructs. These define high-level building blocks that can be shared across the company. Enterprises with multiple development teams and platform engineering teams can also use AWS Proton. This is a managed solution to create templates (see an example for serverless) and share them across the company.
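
As an illustration of the CDK option, a shared construct might bundle a table and a function with company defaults baked in. This is a sketch assuming AWS CDK v2 for Python; the construct name and its properties are hypothetical:

from aws_cdk import aws_dynamodb as dynamodb
from aws_cdk import aws_lambda as _lambda
from constructs import Construct


class CrudMicroservice(Construct):
    """Reusable building block: one table plus a handler with company defaults."""

    def __init__(self, scope: Construct, construct_id: str, *, handler_asset: str) -> None:
        super().__init__(scope, construct_id)

        # Company default: on-demand billing for every microservice table
        table = dynamodb.Table(
            self, "Table",
            partition_key=dynamodb.Attribute(
                name="id", type=dynamodb.AttributeType.STRING
            ),
            billing_mode=dynamodb.BillingMode.PAY_PER_REQUEST,
        )

        # Handler code comes from the team's asset directory
        function = _lambda.Function(
            self, "Handler",
            runtime=_lambda.Runtime.PYTHON_3_9,
            handler="app.lambda_handler",
            code=_lambda.Code.from_asset(handler_asset),
            environment={"TABLE_NAME": table.table_name},
        )
        table.grant_read_write_data(function)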

Creating a base container image for Lambda functions

Defining complete application archetypes may be too much work for your company, or you may want to let teams design their architecture and choose their programming languages and frameworks, while still requiring them to apply a few patterns (centralized logging, standard security) plus some custom libraries. In that case, use Lambda layers if you deploy your functions as zip files, or provide a base image if you deploy them as container images.

When building a Lambda function as a container image, you must choose a base image. It contains the runtime interface client to manage the interaction between Lambda and your function code. AWS provides a set of open-source base images that you can use to create your container image.

Using Docker, you can also build your own base image on top of these, including any library, piece of code, configuration, and data that you want to apply to all Lambda functions. Developers can then use this base image containing standard components and benefit from them. This also reduces the risk of misconfiguration or forgetting to add something important.

First, create the base image. Using Docker and a Dockerfile, specify the files that you want to add to the image. The following example uses the Python base image and installs some libraries (security, logging) thanks to pip and the requirements.txt file. It also adds Python code and some config files:

Dockerfile and requirements.txt

It uses the /var/task folder, which is the working directory for Lambda functions, and where code should reside. Next you must create the image and push it to Amazon Elastic Container Registry (ECR):

# login to ECR with Docker
$ aws ecr get-login-password --region <region> | docker login --username AWS --password-stdin <account-number>.dkr.ecr.<region>.amazonaws.com

# if not already done, create a repository to store the image
$ aws ecr create-repository --repository-name <my-company-image-name> --image-tag-mutability IMMUTABLE --image-scanning-configuration scanOnPush=true

# build the image
$ docker build -t <my-company-image-name>:<version> .

# get the image hash (alphanumeric string, 12 chars, eg. "ece3a6b5894c")
$ docker images | grep my-company-image-name | awk '{print $3}'

# tag the image
$ docker tag <image-hash> <account-number>.dkr.ecr.<region>.amazonaws.com/<my-company-image-name>:<version>

# push the image to the registry
$ docker push <account-number>.dkr.ecr.<region>.amazonaws.com/<my-company-image-name>:<version>

The base image is now available to use for everyone with access to the ECR repository. To use this image, developers must use it in the FROM instruction in their Lambda Dockerfile. For example:

FROM <account-number>.dkr.ecr.<region>.amazonaws.com/<my-company-image-name>:<version>

COPY app.py ${LAMBDA_TASK_ROOT}

COPY requirements.txt .

RUN pip3 install -r requirements.txt -t "${LAMBDA_TASK_ROOT}"

CMD ["app.lambda_handler"]

Now all Lambda functions using this base image include your standard components and comply with your requirements. See this blog post for more details on building Lambda container-based functions with AWS SAM.

Conclusion

In this post, I show a number of solutions to create and share archetypes or layers across the company. With these archetypes, development teams can quickly bootstrap projects with company standards and best practices. It provides consistency across applications and helps meet compliance rules. From a developer standpoint, it’s a good accelerator, and it lets them rely on the archetype to handle cross-cutting topics.

In terms of governance, companies generally want to enforce these rules. Part 3 will describe how to be more restrictive using different guardrail mechanisms.

Read more information about AWS Proton, which is now generally available.

For more serverless learning resources, visit Serverless Land.

Using Okta as an identity provider with Amazon MWAA

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/using-okta-as-an-identity-provider-with-amazon-mwaa/

This post is written by Henry Robalino, Solutions Architect.

Amazon Managed Workflows for Apache Airflow (Amazon MWAA) is a fully managed service that allows data engineers and data scientists to run data processing workflows in the cloud. Okta is a third-party identity provider (IdP) that can be integrated with AWS Single Sign-On (AWS SSO) so that employees can log in quickly and securely.

This blog post shows how to integrate Okta with AWS SSO to access Amazon MWAA using single sign-on.

Overview

Customers use Amazon MWAA to run workflows at scale in the cloud. They want to use their existing login solutions and the investments the business has made in their current IdP, in this case Okta.

AWS SSO does not yet provide APIs to automate creation and configuration of custom SAML 2.0 applications. As a result, many of the steps in this blog are manual and require using the AWS Management Console.

Prerequisites

Deploying this solution requires:

Creating an Amazon MWAA application in AWS SSO

Create a custom SAML 2.0 application for Amazon MWAA

  1. Sign into the AWS Management Console, using an account with the appropriate permissions to modify AWS SSO.
  2. In the AWS SSO console, navigate to Applications. Select “Add a new application”.
    Add a new application
  3. On the Add New Application page, select “Add a custom SAML 2.0 application”:
    Add a custom SAML 2.0 application
  4. On the Configure Custom SAML 2.0 application:
    1. For Display name, enter AWS_SSO_Amazon_MWAA.
    2. For Description, enter AWS SSO Application for Amazon MWAA.
      Configuration
  5. In the Application metadata section, select the option to manually type in the metadata values.
    Before:
    Application metadata
    After:
    After entries
  6. Enter the Application properties and Application metadata sections:
    • Application start URL: This is the Amazon MWAA WebLogin URL, which you can locate in the Amazon MWAA console.
      • For example: https://123456a0-0101-2020-9e11-1b159eec9000.c2.us-east-1.airflow.amazonaws.com
    • Application ACS URL: This is the Assertion Consumer Service (ACS) URL that AWS SSO provides.
      • For example: https://us-east-1.signin.aws.amazon.com/platform/saml/acs/012345678-0102-0304-0506-EXAMPLE01
    • Application SAML audience: This is the SAML audience that AWS SSO provides.
      • For example: https://us-east-1.signin.aws.amazon.com/platform/saml/d-012345678E
  7. The Application properties and Application metadata now look like this:
    Resulting dialog
  8. Choose Save changes. A custom SAML 2.0 application for Amazon MWAA is created. You are now redirected to the AWS_SSO_Amazon_MWAA application page.
  9. On the Attribute mappings tab, modify the existing Subject attribute to “${user:subject}” and a Format of “unspecified.” Choose Save changes.
    Subject field
  10. On the Assigned users tab, add the previously created Amazon MWAA Okta user. Select Assign users and the user. Choose Save changes.
    Assign users

You have now created a custom application for Amazon MWAA in AWS SSO. You have added a user and configured the attribute mappings.

Configuring an Amazon MWAA permission set in AWS SSO

Assign IAM permissions to the newly created Amazon MWAA application by using a permission set. A permission set is a collection of administrator-defined policies that AWS SSO uses to determine a user’s effective permissions to access a given AWS account.

  1. Navigate to the AWS SSO console. Select AWS accounts on the left-hand side. Select the Permission sets tab and choose the Create permission set button.
    Create permission set
  2. Select the Create a custom permission set option.
    Create permission set workflow
  3. Provide a name for the Custom Permission Set and an optional description. Choose the Create a custom permissions policy check box.
    Workflow step 2
  4. In the new text field, add the IAM policy below. This set of permissions is associated with the AWS_SSO_Amazon_MWAA application. Make sure to use the correct Amazon Resource Name (ARN) for your Amazon MWAA environment in the below sample text. Sample IAM policy:
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "airflow:GetEnvironment",
                    "airflow:CreateCliToken"
                ],
                "Resource": "arn:aws:airflow:us-east-1:111222333444:environment/MY-MWAA-ENV"
            },
            {
                "Effect": "Allow",
                "Action": "airflow:CreateWebLoginToken",
                "Resource": "arn:aws:airflow:us-east-1:111222333444:role/MY-MWAA-ENV/viewer"
            }
        ]
    }
    

    The policy enables the following permissions:

    • GetEnvironment – retrieves the details of an Amazon MWAA environment
    • CreateCLIToken – creates a CLI token request for an MWAA environment.
    • CreateWebLoginToken – creates an Airflow web UI login token request for the Amazon MWAA environment.

  5. Follow the prompts to fill out tags as necessary. Choose Proceed to AWS accounts.
    Proceed to AWS accounts

You have now finished configuring the Amazon MWAA application inside of AWS SSO.
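
For reference, the three allowed actions map to the MWAA API calls below, shown here as a boto3 sketch for illustration only; the environment name is a placeholder:

import boto3

mwaa = boto3.client("mwaa")
ENV_NAME = "MY-MWAA-ENV"  # placeholder environment name

# airflow:GetEnvironment
environment = mwaa.get_environment(Name=ENV_NAME)["Environment"]

# airflow:CreateCliToken
cli_token = mwaa.create_cli_token(Name=ENV_NAME)["CliToken"]

# airflow:CreateWebLoginToken
web_login = mwaa.create_web_login_token(Name=ENV_NAME)
# Documented sign-in URL format for a web login token
login_url = (
    f"https://{web_login['WebServerHostname']}/aws_mwaa/aws-console-sso"
    f"?login=true#{web_login['WebToken']}"
)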

Testing and validation

To test and validate the configuration:

  1. Navigate to your Okta SSO portal. Sign in with the appropriate account that is assigned to the Amazon MWAA application.
    Single sign-on
  2. To access Amazon MWAA, select the AWS Account application. This opens up the AWS Management Console in another window. Once this window opens, close it. As of this writing Amazon MWAA does not support “Auth Mode: SSO”, hence this workaround.
  3. Next, select the AWS_SSO_Amazon_MWAA application. You are redirected to the Amazon MWAA SSO Page.
  4. Choose the Sign in with AWS Management Console SSO.
    Sign in to Airflow
  5. You are redirected to the Amazon MWAA web server UI.
    Amazon MWAA web server UI

In this page, you can see all the DAGs available to you and view the DAG history. In the top-right corner, you can see that you are logged in using the AWS SSO assumed role.

Conclusion

This blog post shows you how to integrate Amazon MWAA with Okta as your managed AWS SSO implementation. You can use this solution for your own use cases to enable Okta SSO with Amazon MWAA.

To stay up to date with AWS Identity launches, see: https://aws.amazon.com/blogs/security/highlights-from-the-latest-aws-identity-launches/.

For more serverless learning resources, visit Serverless Land.

Operating serverless at scale: Implementing governance – Part 1

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/operating-serverless-at-scale-implementing-governance-part-1/

This post is written by Jerome Van Der Linden, Solutions Architect.

With serverless services, infrastructure management tasks like capacity provisioning and patching are handled by AWS, so you can focus on writing code and deliver value to your customers. By reducing operational overhead, developers can iterate faster and release new features more often.

But with increased agility and productivity, you must also keep control. When scaling to thousands of AWS Lambda functions, hundreds of AWS Step Functions workflows, and millions of Amazon EventBridge events sent throughout the company, you must maintain visibility. What is provisioned, what is running, and how does everything work as a whole?

This three-part series covers important topics to help you maintain control over a growing set of serverless resources.

Maintaining visibility on resources and workloads

For governance, the first recommendation is to have clear visibility of your environment:

  • Visibility on resources: APIs, Lambda functions, state machines, event buses, queues, or topics. It is essential to have an up-to-date inventory of all your resources together with metadata such as the application it belongs to, the environment where it is deployed, and the owner. This is needed to track cost, manage compliance and evaluate risks.
  • Visibility on how these resources are linked together. They may be components of the same application. You must track who is calling who (in synchronous calls) or who is consuming what message or event from whom (in asynchronous calls). This dynamic view is as necessary as the inventory. It gives you important insight on the architecture and potential security and compliance issues.

This visibility into your AWS environment is essential to understand what you do with it and be able to operate it. It allows you to understand your workloads and make sure they follow your compliance rules. It allows you to track your usage of AWS and potentially optimize and reduce your costs. This is a best practice for building and growing on AWS.

Tagging your resources

For all resources on AWS, assign tags to your resources. A tag consists of a label (the key) and an optional value. It makes it easier to organize, search for, and filter resources by application, environment, or other criteria. Tags can serve different purposes: automation, access control, cost allocation, and risk management. Above all, they provide an additional layer of information you can use to understand why each resource exists.

Assigning tags during provisioning is the preferred method. Using the AWS Serverless Application Model (AWS SAM), you can define tags. For example, for a Lambda function:

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31

Resources:
  MyFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: function/
      Handler: app.lambda_handler
      Runtime: python3.8
      Tags:
        mycompany:environment: "dev"
        mycompany:application-id: "ecommerce"
        mycompany:service-id: "products"
        mycompany:business-owner: "[email protected]"

As the number of resources increases in a template, you can use the --tags option of the sam deploy command. This applies a set of tags to all compatible resources declared in the template:

sam deploy \
--tags mycompany:environment=dev \
mycompany:application-id=ecommerce \
mycompany:service-id=products \
mycompany:[email protected]

You can also add these tags in the samconfig.toml file so you don’t need to specify them each time on the command line:

tags = "mycompany:environment=dev mycompany:application-id=ecommerce mycompany:service-id=products mycompany:[email protected]"

Enforcing consistency in tags

To maintain organization, tags must be consistent. For example, do you use “environment”, “Environment”, or “env”? And is the value “dev”, “Dev”, or “Development”?

  1. Define which tags are necessary and what you need to identify: the owner, the application, the environment, the business line, and so on. You can have up to 50 tags per resource, but you should restrict yourself to a small set of needed tags and iterate. Apply the YAGNI principle to minimize maintenance.
  2. Agree on the syntax. You may use spinal-case (lower case with hyphens to separate words) and a prefix to identify your company. For example, mycompany:application-id. Having a tag dictionary shared with developers and administrators may be useful. See this guide for best practices.
  3. Enforce the “rules” that are established:
    • At the Organization level, use Service Control Policies to block the creation of a resource if not correctly tagged. The following example denies the creation of Lambda functions when the environment tag is absent. You can find more examples here. Before applying, test policies in a sandbox:
      {
         "Sid": "DenyCreateLambdaWithNoEnvironmentTag",
         "Effect": "Deny",
         "Action": "lambda:CreateFunction",
         "Resource": [
            "arn:aws:lambda:*:*:function/*"
         ],
         "Condition": {
            "Null": {
                "aws:RequestTag/environment": "true"
             }
          }
      }
      
    • Use Config Rules in AWS Config to verify that resources have the appropriate tags (see this example for Lambda). You can also perform remediation if necessary.
    • Use Tag Policies to enforce consistency across your accounts and resources. Tag Policies verify the syntax and values set on your resources. They mark as non-compliant all the resources that do not match the policies. For example, the following policy defines the “mycompany:environment” tag and its possible values:
      {
          "tags": {
              "mycompany:environment": {
                  "tag_value": {
                      "@@assign": [
                          "dev",
                          "test",
                          "qa",
                          "prod"
                      ]
                  }
              }
          }
      }
      

  4. Monitor the percentage of untagged or incorrectly tagged resources and try to improve it (a minimal sketch follows). Also iterate on your tag dictionary as your requirements evolve by adding new tags or removing unused ones.
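
As an illustration of this monitoring step, the following Python sketch measures how many Lambda functions are missing a required tag. The tag key mycompany:environment follows the dictionary used in this post; the script itself is not part of the original solution, so treat it as a starting point only.

import boto3

lambda_client = boto3.client("lambda")

def untagged_ratio(required_key="mycompany:environment"):
    total = 0
    missing = 0
    # Iterate over every function in the current Region and account.
    paginator = lambda_client.get_paginator("list_functions")
    for page in paginator.paginate():
        for function in page["Functions"]:
            total += 1
            # ListTags returns the tags currently applied to the function.
            tags = lambda_client.list_tags(Resource=function["FunctionArn"])["Tags"]
            if required_key not in tags:
                missing += 1
    return missing / total if total else 0.0

if __name__ == "__main__":
    print(f"Functions missing the mycompany:environment tag: {untagged_ratio():.0%}")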

Applying tags on serverless resources

For serverless resources, there are additional tags you may add:

  • For an API or a microservice, you may want to know if it is public (B2C), semi-public (B2B) or private (internal). This can help you adjust the RTO and the level of support. You can add a tag “exposition” with a value “public” or “private” to your API Gateway and Lambda functions.
  • For an API or a microservice again, you may want to know the route from where traffic is coming. For a Lambda function, use the tag “route” with a value like “POST /products”, “GET /products/_id_” (“{}” are not valid characters for tags).
  • For Lambda functions, add sources (triggers) and destinations within tags. For example, add the SNS topic name or SQS queue name to a “trigger” or “destination” tag. This helps document the dependencies between resources and gives you a better view of the architecture.

Adding more tags increases maintenance, and unmaintained tags can be misleading. Add tags only if you really need them and if you can automate their maintenance, for example with AWS CloudFormation or AWS SAM, or with scripts or scheduled Lambda functions (using propagate-cfn-tags, for example).
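
If you automate this with a script or a scheduled Lambda function, the tagging call itself is a single API request. The following Python sketch applies “trigger” and “destination” tags to a function; the tag keys, function ARN, and resource names are hypothetical and only illustrate the idea.

import boto3

lambda_client = boto3.client("lambda")

def tag_function_dependencies(function_arn, trigger, destination):
    # TagResource merges these tags with any tags already on the function.
    lambda_client.tag_resource(
        Resource=function_arn,
        Tags={
            "mycompany:trigger": trigger,
            "mycompany:destination": destination,
        },
    )

# Hypothetical example: the function is triggered by an SQS queue
# and publishes its results to an SNS topic.
tag_function_dependencies(
    "arn:aws:lambda:eu-west-1:123456789012:function:products-service",
    trigger="products-queue",
    destination="products-topic",
)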

Grouping related resources

In addition to tags, use resource groups to better organize resources, by creating groups of related resources. For example, you can consolidate a set of Lambda functions and APIs for a microservice, or more globally a set of components related to the same application.

If you already created tagged resources, you can create a resource group based on these tags, using the following command. This example groups all supported resources that are tagged with the tag “mycompany:service-id” and value “products”:

aws resource-groups create-group \
--name products-service \
--resource-query '{"Type":"TAG_FILTERS_1_0","Query":"{\"ResourceTypeFilters\":[\"AWS::AllSupported\"],\"TagFilters\":[{\"Key\":\"mycompany:service-id\",\"Values\":[\"products\"]}]}"}'

If you use infrastructure as code (AWS CloudFormation or AWS SAM), which is the recommended approach, you can have a resource group created for a complete stack. This is convenient for serverless applications with a reasonable number of related resources:

Resources:
  ResourceGroup:
    Type: AWS::ResourceGroups::Group
    Properties:
      Name: products-service

With a group defined, visualize the list of its resources in the AWS Management Console or using the following CLI command:

aws resource-groups list-group-resources --group products-service

For more details on resource groups, read this blog post.

Getting a dynamic view of your resources

Tags and resource groups give you a static and declarative view of your resources and their relations. To know what is running, and how everything is linked, it is also important to have a dynamic view. Such a view is the best representation of your environment, but when it is documentation-based and rarely automated, it quickly becomes inaccurate and outdated.

Using AWS X-Ray, you can trace requests made from one resource to another, and thus have a complete map of interactions between components of your application. Combine it with Amazon CloudWatch Logs and metrics and you have Amazon CloudWatch ServiceLens.

The primary use case for ServiceLens is to help you debug and troubleshoot performance issues. But you can also use the Service Map to gain visibility into your applications. What are the components that make up your application? What are the transactions and dependencies between those components?

AWS X-Ray ServiceLens

To obtain such a map, you must enable X-Ray tracing in your application for all supported resources. This can be done in the AWS SAM template by enabling “tracing”:

Resources:
  MyFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: function/
      Handler: app.lambda_handler
      Runtime: python3.8
      Tracing: Active
      
  MyApi:
    Type: AWS::Serverless::Api
    Properties:
      DefinitionBody:
        Fn::Transform:
          Name: "AWS::Include"
          Parameters:
            Location: "resources/openapi.yaml"
      EndpointConfiguration: REGIONAL
      StageName: prod
      TracingEnabled: true
      
  MyStateMachine:
    Type: AWS::Serverless::StateMachine
    Properties:
      DefinitionUri: statemachine/my_state_machine.asl.json
      Role: arn:aws:iam::123456123456:role/service-role/my-sample-role
      Tracing:
        Enabled: True

You may also need to instrument your Lambda function code using the X-Ray SDK, to retrieve and propagate traces when using services like Amazon SNS, Amazon SQS or Amazon EventBridge.
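
As an illustration, the following Python sketch shows the kind of instrumentation this involves, using the aws_xray_sdk package to patch boto3 so that downstream calls appear as subsegments. The topic ARN is hypothetical and the handler is only a placeholder, not code from this post's solution.

import boto3
from aws_xray_sdk.core import patch_all, xray_recorder

# Patch boto3 and other supported libraries so that calls to SNS, SQS,
# EventBridge, and other services are traced automatically.
patch_all()

sns = boto3.client("sns")

def lambda_handler(event, context):
    # Optionally wrap business logic in a custom subsegment.
    with xray_recorder.in_subsegment("publish-notification"):
        sns.publish(
            TopicArn="arn:aws:sns:eu-west-1:123456789012:my-topic",  # hypothetical
            Message="hello",
        )
    return {"status": "ok"}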

When this is done, open the CloudWatch ServiceLens console to get the dynamic view of all your components. See their respective sizes (based on the number of requests they handle), their relations, and the services they use. Because it is based on the real execution of your application, it is always up to date.

Conclusion

Having visibility on your AWS resources is the key to operating and growing successfully. In this first part of this series on serverless governance, I describe how you can get this visibility by using tags to organize and group your resources, and ease the search and management of related resources. I also describe how AWS X-Ray, combined with CloudWatch ServiceLens, can provide a dynamic view of workloads and help you understand how serverless resources are acting together.

The second part will focus on provisioning and how to standardize deployments to improve consistency and compliance.

Read this whitepaper on tagging best practices to get more details on tags.

For more serverless learning resources, visit Serverless Land.

Simplifying B2B integrations with AWS Step Functions Workflow Studio

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/simplifying-b2b-integrations-with-aws-step-functions-workflow-studio/

This post is written by Patrick Guha, Associate Solutions Architect, WWPS, Hamdi Hmimy, Associate Solutions Architect, WWPS, and Madhu Bussa, Senior Solutions Architect, WWPS.

B2B integration helps organizations with disparate technologies communicate and exchange business-critical information. However, organizations typically have few options in building their B2B pipeline infrastructure. Often, they use an out-of-the-box SaaS integration platform that may not meet all of their technical requirements. Other times, they code an entire pipeline from scratch.

AWS Step Functions offers a solution to this challenge. It provides both the code customizability of AWS Lambda and the low-code, visual building experience of Workflow Studio.

This post introduces a customizable serverless architecture for B2B integrations. It explains how the solution works and how it can save you time in building B2B integrations.

Overview

The following diagram illustrates components and interactions of the example application architecture:

Example architecture

This section describes building a configurable B2B integration pipeline with AWS serverless technologies. It walks through the components and discusses the flow of transactions from trading partners to a consolidated database.

Communication

The B2B integration pipeline accepts transactions both in batch (a single file with many transactions) and in real-time (a single transaction) modes. Batch communication is over open standard SFTP protocol, and real-time communication is over the REST/HTTPS protocol.

For batch communication needs, AWS Transfer Family provides managed support for file transfers directly into and out of Amazon Simple Storage Service (S3) or Amazon Elastic File System (EFS). EFS provides a serverless, elastic file system that lets you share file data without provisioning or managing storage.

Amazon EventBridge provides serverless event bus functionality to automate specific actions in the B2B pipeline. In this case, batch transaction uploads from a partner trigger the B2B pipeline. When a file is put in S3 from the Transfer SFTP server, EventBridge captures the event and routes it to a target Lambda function.

As batch transactions are saved via AWS Transfer SFTP to Amazon S3, AWS CloudTrail captures the events. It records the underlying API requests as files are PUT into S3, which trigger the EventBridge rule created previously.

For real-time communication needs, Amazon API Gateway provides an API management layer, allowing you to manage the lifecycle of APIs. Trading partners can send their transactions to this API over the ubiquitous REST API protocol.

Processing

Amazon Simple Queue Service (SQS) is a fully managed queuing service that allows you to decouple applications. In this solution, SQS manages and stores messages to be processed individually.

Lambda is a fully managed serverless compute service that allows you to create business logic functions as needed. In this example, Lambda functions process the data from the pipeline to clean, format, and upload the transactions from SQS.

Step Functions manages the workflow of a B2B transaction. Step Functions is a low-code visual workflow service used to orchestrate AWS services, automate business processes, and build serverless applications. Workflows manage failures, retries, parallelization, service integrations, and observability so developers can focus on higher-value business logic.

API Gateway is used in the processing pipeline of the solution to enrich the transactions coming through the pipeline.

Amazon DynamoDB serves as the database for the solution. DynamoDB is a key-value and document database that can scale to virtually any number of requests. As the pipeline experiences a wide range of transaction throughputs, DynamoDB is a good solution to store processed transactions as they arrive.

Batch transaction flow

  1. A trading partner logs in to AWS Transfer SFTP and uploads a batch transaction file.
  2. An S3 bucket is populated with the batch transaction file.
  3. A CloudTrail data event captures the batch transaction file being PUT into S3.
  4. An EventBridge rule is triggered from the CloudTrail data event.
  5. Lambda is triggered from the EventBridge rule. It processes each transaction from the batch transaction file and sends individual messages to SQS (see the sketch after this list).
  6. SQS stores each message from the file as it is passed through from Lambda.
  7. Lambda is triggered from each SQS incoming message, then invokes Step Functions to run through the following steps for each transaction.
  8. Lambda accepts and formats the transaction payload.
  9. API Gateway enriches the transaction.
  10. DynamoDB stores the transaction.
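
To make step 5 more concrete, here is a minimal Python sketch of such a Lambda handler. It assumes the EventBridge rule forwards the CloudTrail PutObject event, that the queue URL is provided in an environment variable, and that the batch file uses the XML structure shown later in this post; the repository implementation may differ.

import json
import os
import xml.etree.ElementTree as ET

import boto3

s3 = boto3.client("s3")
sqs = boto3.client("sqs")

# Assumed environment variable name.
QUEUE_URL = os.environ["TRANSACTION_QUEUE_URL"]

def handler(event, context):
    # The CloudTrail data event carries the bucket and key of the uploaded file.
    params = event["detail"]["requestParameters"]
    body = s3.get_object(Bucket=params["bucketName"], Key=params["key"])["Body"].read()

    # Split the batch file and send each transaction as an individual message.
    for tx in ET.fromstring(body).findall("Transaction"):
        message = {
            "transactionId": tx.get("TransactionID"),
            "notes": tx.findtext("Notes"),
        }
        sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=json.dumps(message))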

Single/real-time transaction flow

  1. A trading partner uploads a single transaction via an API Gateway REST API.
  2. API Gateway sends the single transaction to the Lambda SQS writer function via proxy integration (see the sketch after this list).
  3. SQS stores each message from the API POSTs.
  4. Lambda is triggered from each SQS incoming message. It invokes Step Functions to run through the workflow for each transaction.
  5. Lambda accepts and formats the transaction payload.
  6. API Gateway enriches the transaction.
  7. DynamoDB stores the transaction.
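
The SQS writer function in step 2 can be equally small. The following Python sketch accepts the form-encoded transaction shown later in this post (passed either as query string parameters or in the request body) and forwards it to SQS; the environment variable name and response shape are assumptions.

import json
import os
from urllib.parse import parse_qs

import boto3

sqs = boto3.client("sqs")
QUEUE_URL = os.environ["TRANSACTION_QUEUE_URL"]  # assumed name

def handler(event, context):
    # The console test sends the fields as query string parameters;
    # clients may also send them in the request body.
    params = event.get("queryStringParameters") or {
        k: v[0] for k, v in parse_qs(event.get("body") or "").items()
    }
    message = {
        "transactionId": params.get("transactionId"),
        "transactionMessage": params.get("transactionMessage"),
    }
    sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=json.dumps(message))
    return {"statusCode": 200, "body": json.dumps({"status": "queued"})}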

Exploring and testing the architecture

To explore and test the architecture, there is an AWS Serverless Application Model (AWS SAM) template. The AWS SAM template creates an AWS CloudFormation stack for you. This can help you save time building your own B2B pipeline, as you can deploy and customize the example application.

To deploy in your own AWS account:

  1. To install AWS SAM, visit the installation page.
  2. To deploy the AWS SAM template, navigate to the directory where the template is located. Run the following bash commands in the terminal:
    git clone https://github.com/aws-samples/simplified-serverless-b2b-application
    cd simplified-serverless-b2b-application
    sam build
    sam deploy --guided --capabilities CAPABILITY_NAMED_IAM

Prerequisites

  1. Create an SSH key pair. To authenticate to the AWS Transfer SFTP server and upload files, you must generate an SSH key pair. Once created, you must copy the contents of the public key to the SshPublicKeyParameter displayed after running the sam deploy command. Follow the instructions to create an SSH key pair for Transfer.
  2. Copy batch and real-time input. The following XML content contains multiple example transactions to be processed in the batch workflow. Create an XML file on your machine with the following content:
    <?xml version="1.0" encoding="UTF-8"?>
    <Transactions>
      <Transaction TransactionID="1">
        <Notes>Transaction made between user 57 and user 732.</Notes>
      </Transaction>
      <Transaction TransactionID="2">
        <Notes>Transaction made between user 9824 and user 2739.</Notes>
      </Transaction>
      <Transaction TransactionID="3">
        <Notes>Transaction made between user 126 and user 543.</Notes>
      </Transaction>
      <Transaction TransactionID="4">
        <Notes>Transaction made between user 5785 and user 839.</Notes>
      </Transaction>
      <Transaction TransactionID="5">
        <Notes>Transaction made between user 83782 and user 547.</Notes>
      </Transaction>
      <Transaction TransactionID="6">
        <Notes>Transaction made between user 64783 and user 1638.</Notes>
      </Transaction>
      <Transaction TransactionID="7">
        <Notes>Transaction made between user 785 and user 7493.</Notes>
      </Transaction>
      <Transaction TransactionID="8">
        <Notes>Transaction made between user 5473 and user 3829.</Notes>
      </Transaction>
      <Transaction TransactionID="9">
        <Notes>Transaction made between user 3474 and user 9372.</Notes>
      </Transaction>
      <Transaction TransactionID="10">
        <Notes>Transaction made between user 1537 and user 9473.</Notes>
      </Transaction>
      <Transaction TransactionID="11">
        <Notes>Transaction made between user 2837 and user 7383.</Notes>
      </Transaction>
    </Transactions>
    

    Similarly, the following content contains a single transaction to be processed in the real-time workflow.

    transactionId=12&transactionMessage= Transaction made between user 687 and user 329.

  3. Download Cyberduck, an SFTP client, to upload the batch transaction file to the B2B pipeline.

Uploading the XML file to Transfer and POST to API Gateway

Use Cyberduck to upload the batch transaction file to the B2B pipeline. Follow the instructions here to upload the preceding transactions XML file. You can find the Transfer server endpoint in both the Transfer console and the Outputs section of the AWS SAM template.

Use the API Gateway console to test the POST method for the single transaction workflow. Navigate to API Gateway in the AWS Management Console and choose the REST API created by the AWS SAM template called SingleTransactionAPI.

In the Resources pane, view the methods. Choose the POST method. Next, choose the client Test bar on the left.

API Gateway console

Copy the single real-time transaction into the Query Strings text box then choose Test. This sends the single transaction and starts the real-time workflow of the B2B pipeline.

Testing in the console

Viewing Step Functions executions

Navigate to the Step Functions console. Choose the state machine created by the AWS SAM template called StepFunctionsStateMachine.

In the Executions tab, you see a number of successful executions. Each transaction represents a Step Functions state machine execution. This means that every time a transaction is submitted by a trading partner to SQS, it is individually processed as a unique Step Functions state machine execution. This granularity is useful for tracking transactions.

Step Functions console

Viewing Workflow Studio

Next, view the Step Functions state machine definition. On the StepFunctionsStateMachine page, choose Edit. You see a code and visual representation of your workflow.

The code version uses Amazon States Language, allowing you to modify the state machine as needed. Choose the Workflow Studio button to get a visual representation of the services and integrations in the workflow.

ASL in Workflow Studio

Workflow Studio helps you to save time while building a B2B pipeline. There are over 40 actions you can take on various AWS services, in addition to flow states that provide additional logic to the workflow.

Step Functions Workflow Studio

One of the largest benefits of Workflow Studio is the time saved through built-in integrations with AWS services. This architecture includes two integrations: API Gateway request and DynamoDB PutItem.

Choose the API Gateway request state in the diagram. To make a request to the API Gateway REST API, you update the API Parameters section in Configuration. Previously, you may have used a Lambda function to perform this action, adding extra code to maintain a B2B pipeline.

Updating parameters

The same is true for DynamoDB. Choose the DynamoDB PutItem state to view more. The configuration to put the item is made in the API Parameters section. To connect to any other AWS services via Actions, add AWS Identity and Access Management (IAM) permissions for Step Functions to access them. The AWS SAM template includes the necessary IAM permissions for Step Functions to call both API Gateway and DynamoDB.

PutItem workflow

Cleaning up

To avoid ongoing charges to your AWS account, remove the created resources:

  • Use the CloudFormation console to delete the stack created as part of this demo. Choose the stack, choose Delete, and then choose Delete stack.
  • Navigate to the S3 console, then empty and delete the S3 buckets created by the stack: [stackName]-cloudtrails3bucket-[uniqueId] and [stackName]-sftpservers3bucket-[uniqueId]
  • Navigate to the CloudWatch console and delete the following log groups created from the stack: /aws/lambda/IntakeSingleTransactionLambdaFunction, /aws/lambda/SingleQueueUploadLambda, and /aws/lambda/TriggerStepFunctionsLambdaFunction.

Conclusion

This post shows an architecture to share your business data with your trading partners using API Gateway, AWS Transfer for SFTP, Lambda, and Step Functions. This architecture enables organizations to quickly on-board partners, build event-driven pipelines, and streamline business processes.

To learn more about B2B pipelines on AWS, read: https://aws.amazon.com/blogs/compute/integrating-b2b-using-event-notifications-with-amazon-sns/.

For more serverless learning resources, visit Serverless Land.

Building an API poller with AWS Step Functions and AWS Lambda

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/building-an-api-poller-with-aws-step-functions-and-aws-lambda/

This post is written by Siarhei Kazhura, Solutions Architect.

Many customers have to integrate with external APIs. One of the most common use cases is data synchronization between a customer and their trusted partner.

There are multiple ways of doing this. For example, the customer can provide a webhook that the partner can call to notify the customer of any data changes. Often the customer has to poll the partner API to stay up to date with the changes. Even when using a webhook, a complete synchronization running on a schedule is often necessary.

Furthermore, the partner API may not allow loading all the data at once. Often, a pagination solution allows loading only a portion of the data via one API call. That requires the customer to build an API poller that can iterate through all the data pages to fully synchronize.

This post demonstrates a sample API poller architecture, using AWS Step Functions for orchestration, AWS Lambda for business logic processing, along with Amazon API Gateway, Amazon DynamoDB, Amazon SQS, Amazon EventBridge, Amazon Simple Storage Service (Amazon S3), and the AWS Serverless Application Model (AWS SAM).

Overall architecture

Reference architecture

The application consists of the following resources defined in the AWS SAM template:

  • PollerHttpAPI: The front door of the application represented via an API Gateway HTTP API.
  • PollerTasksEventBus: An EventBridge event bus that is directly integrated with API Gateway. That means that an API call results in an event being created in the event bus. EventBridge allows you to route the event to the destination you want. It also allows you to archive and replay events as needed, adding resiliency to the architecture. Each event has a unique id that this solution uses for tracing purposes.
  • PollerWorkflow: The Step Functions workflow.
  • ExternalHttpApi: The API Gateway HTTP API that is used to simulate an external API.
  • PayloadGenerator: A Lambda function that is generating a sample payload for the application.
  • RawPayloadBucket: An Amazon S3 bucket that stores the payload received from the external API. Step Functions supports a payload size of up to 256 KB. For larger payloads, you can store the API payload in an S3 bucket.
  • PollerTasksTable: A DynamoDB table that tracks each poller’s progress. The table has a TimeToLive (TTL) attribute enabled. This automatically discards tasks that exceed the TTL value.
  • ProcessPayloadQueue: An Amazon SQS queue that decouples the payload fetching mechanism from the payload processing mechanism.
  • ProcessPayloadDLQ: An Amazon SQS dead-letter queue that collects the messages that cannot be processed.
  • ProcessPayload: A Lambda function that processes the payload. The function reports the progress of each poller task, marking it as complete when the given payload is processed successfully.

Data flow

Data flow

When the API poller runs:

  1. After a POST call is made to PollerHttpAPI /jobs endpoint, an event containing the API payload is put on the PollerTasksEventBus.
  2. The event triggers the PollerWorkflow execution. The event payload (including the event unique id) is passed to the PollerWorkflow.
  3. The PollerWorkflow starts by running the PreparePollerJob function. The function retrieves required metadata from the ExternalHttpAPI, for example, the total number of records to be loaded and the maximum number of records that can be retrieved via a single API call. The function creates the poller tasks that are required to fetch the data. The task calculation is based on the metadata received.
  4. The PayloadGenerator function generates random ExternalHttpAPI payloads. The PayloadGenerator function also includes code that simulates random errors and delays.
  5. All the tasks are processed in a fan-out fashion using dynamic-parallelism. The FetchPayload function retrieves a payload chunk from the ExternalHttpAPI, and the payload is saved to the RawPayloadBucket.
  6. A message containing a pointer to the payload file stored in the RawPayloadBucket, the id of the task, and other task information is sent to the ProcessPayloadQueue. Each message has jobId and taskId attributes. This helps correlate the message with the poller task.
  7. Anytime a task changes its status (for example, when the payload is saved to the S3 bucket, or when a message is sent to the SQS queue), the progress is reported to the PollerTasksTable.
  8. The ProcessPayload function long-polls the ProcessPayloadQueue. As messages appear on the queue, they are processed (see the sketch after this list).
  9. The ProcessPayload function removes the corresponding object from the RawPayloadBucket. This is done to illustrate a type of processing that you can do with the payload stored in the S3 bucket.
  10. After the payload is removed successfully, the progress is reported to the PollerTasksTable. The corresponding task is marked as complete.
  11. If the ProcessPayload function experiences errors, it tries to process the message several times. If it cannot process the message, the message is pushed to the ProcessPayloadDLQ. This is configured as a dead-letter queue for the ProcessPayloadQueue.
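
The following Python sketch illustrates steps 8 to 10 in a ProcessPayload-style handler: it reads each SQS message, deletes the referenced payload object, and marks the task as complete in DynamoDB. The message field names, table key schema, and environment variable are assumptions for illustration; refer to the GitHub repository for the actual implementation.

import json
import os

import boto3

s3 = boto3.client("s3")
dynamodb = boto3.resource("dynamodb")
# Assumed environment variable holding the PollerTasksTable name.
tasks_table = dynamodb.Table(os.environ["POLLER_TASKS_TABLE"])

def handler(event, context):
    # Lambda delivers a batch of SQS messages in a single invocation.
    for record in event["Records"]:
        message = json.loads(record["body"])

        # Remove the payload object to simulate payload processing.
        s3.delete_object(Bucket=message["bucket"], Key=message["key"])

        # Report progress: mark the corresponding poller task as complete.
        tasks_table.update_item(
            Key={"jobId": message["jobId"], "taskId": message["taskId"]},
            UpdateExpression="SET #s = :status",
            ExpressionAttributeNames={"#s": "status"},
            ExpressionAttributeValues={":status": "SuccessfullyCompleted"},
        )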

Step Functions state machine

State machine

The Step Functions state machine orchestrates the following workflow:

  1. Fetch external API metadata and create tasks required to fetch all payload.
  2. For each task, report that the task has entered Started state.
  3. Since the task is in the Started state, the next action is FetchPayload.
  4. Fetch payload from the external API and store it in an S3 bucket.
  5. In case of success, move the task to a PayloadSaved state.
  6. In case of an error, report that the task is in a failed state.
  7. Report that the task has entered PayloadSaved (or failed) state.
  8. In case the task is in the PayloadSaved state, move to the SendToSQS step. If the task is in a failed state, exit.
  9. Send the S3 object pointer and additional task metadata to the SQS queue.
  10. Move the task to an enqueued state.
  11. Report that the task has entered enqueued state.
  12. Since the task is in the enqueued state, we are done.
  13. Combine the results for a single task execution.
  14. Combine the results for all the task executions.

Prerequisites to implement the solution

The following prerequisites are required for this walk-through:

Step-by-step instructions

You can use AWS Cloud9, or your preferred IDE, to deploy the AWS SAM template. Refer to the cleanup section of this post for instructions to delete the resources to stop incurring any further charges.

  1. Clone the repository by running the following command:
    git clone https://github.com/aws-samples/sam-api-poller.git
  2. Change to the sam-api-poller directory, install dependencies and build the application:
    npm install
    sam build -c -p
  3. Package and deploy the application to the AWS Cloud, following the series of prompts. Name the stack sam-api-poller:
    sam deploy --guided --capabilities CAPABILITY_NAMED_IAM
  4. After stack creation, you see ExternalHttpApiUrl, PollerHttpApiUrl, StateMachineName, and RawPayloadBucket in the outputs section.
    CloudFormation outputs
  5. Store API URLs as variables:
    POLLER_HTTP_API_URL=$(aws cloudformation describe-stacks --stack-name sam-api-poller --query "Stacks[0].Outputs[?OutputKey=='PollerHttpApiUrl'].OutputValue" --output text)
    
    EXTERNAL_HTTP_API_URL=$(aws cloudformation describe-stacks --stack-name sam-api-poller --query "Stacks[0].Outputs[?OutputKey=='ExternalHttpApiUrl'].OutputValue" --output text)
  6. Make an API call:
    REQUEST_PAYLOAD=$(printf '{"url":"%s/payload"}' $EXTERNAL_HTTP_API_URL)
    EVENT_ID=$(curl -d $REQUEST_PAYLOAD -H "Content-Type: application/json" -X POST $POLLER_HTTP_API_URL/jobs | jq -r '.Entries[0].EventId')
    
  7. The EventId that is returned by the API is stored in a variable. You can trace all the poller tasks related to this execution via the EventId. Run the following command to track task progress:
    curl -H "Content-Type: application/json" $POLLER_HTTP_API_URL/jobs/$EVENT_ID
  8. Inspect the output. For example:
    {"Started":9,"PayloadSaved":15,"Enqueued":11,"SuccessfullyCompleted":0,"FailedToComplete":0,"Total":35}%
  9. Navigate to the Step Functions console and choose the state machine name that corresponds to the StateMachineName from step 4. Choose an execution and inspect the visual flow.
    Workflow
  10. Inspect each individual step by clicking on it. For example, for the PollerJobComplete step, you see:
    Step output

Cleanup

  1. Make sure that the `RawPayloadBucket` bucket is empty. If the bucket contains any files, follow the emptying a bucket guide.
  2. To delete all the resources permanently and stop incurring costs, navigate to the CloudFormation console. Select the sam-api-poller stack, then choose Delete -> Delete stack.

Cost optimization

For Step Functions, this example uses the Standard Workflow type because it has a visualization tool. If you are planning to re-use the solution, consider switching from standard to Express Workflows. This may be a better option for the type of workload in this example.

Conclusion

This post shows how to use Step Functions, Lambda, EventBridge, S3, API Gateway HTTP APIs, and SQS to build a serverless API poller. I show how you can deploy a sample solution, process sample payload, and store it to S3.

I also show how to perform clean-up to avoid any additional charges. You can modify this example for your needs and build a custom solution for your use case.

For more serverless learning resources, visit Serverless Land.

Managing federated schema with AWS Lambda and Amazon S3

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/managing-federated-schema-with-aws-lambda-and-amazon-s3/

This post is written by Krzysztof Lis, Senior Software Development Engineer, IMDb.

GraphQL schema management is one of the biggest challenges in the federated setup. IMDb has 19 subgraphs (graphlets) – each of them owns and publishes a part of the schema as a part of an independent CI/CD pipeline.

To manage federated schema effectively, IMDb introduced a component called Schema Manager. This is responsible for fetching the latest schema changes and validating them before publishing them to the Gateway.

Part 1 presents the migration from a monolithic REST API to a federated GraphQL (GQL) endpoint running on AWS Lambda. This post focuses on schema management in federated GQL systems. It shows the challenges that the teams faced when designing this component and how we addressed them. It also shares best practices and processes for schema management, based on our experience.

Comparing monolithic and federated GQL schema

In the standard, monolithic implementation of GQL, there is a single file used to manage the whole schema. This makes it easier to ensure that there are no conflicts between new changes and the earlier schema. Everything can be validated at build time and there is no risk that external changes break the endpoint at runtime.

This is not true for the federated GQL endpoint. The gateway fetches service definitions from the graphlets at runtime and composes the overall schema. If any of the graphlets introduces a breaking change, the gateway fails to compose the schema and cannot serve requests.

The more graphlets we federate to, the higher the risk of introducing a breaking change. In enterprise scale systems, you need a component that protects the production environment from potential downtime. It must notify graphlet owners that they are about to introduce a breaking change, preferably during development before releasing the change.

Federated schema challenges

There are other aspects of handling federated schema to consider. If you use AWS Lambda, the default schema composition increases the gateway startup time, which impacts the endpoint’s performance. If any of the declared graphlets are unavailable at the time of schema composition, there may be gateway downtime or at least an incomplete overall schema. If schemas are pre-validated and stored in a highly available store such as Amazon S3, you mitigate both of these issues.

Another challenge is schema consistency. Ideally, you want to propagate the changes to the gateway in a timely manner after a schema change is published. You also need to consider handling field deprecation and field transfer across graphlets (change of ownership). To catch potential errors early, the system should support dry-run-like functionality that will allow developers to validate changes against the current schema during the development stage.

The Schema Manager

Schema Manager

To mitigate these challenges, the Gateway/Platform team introduces a Schema Manager component to the workload. Whenever there’s a deployment in any of the graphlet pipelines, the schema validation process is triggered.

Schema Manager fetches the most recent sub-schemas from all the graphlets and attempts to compose an overall schema. If there are no errors and conflicts, a change is approved and can be safely promoted to production.

In the case of a validation failure, the breaking change is blocked in the graphlet deployment pipeline and the owning team must resolve the issue before they can proceed with the change. Deployments of graphlet code changes also depend on this approval step, so when the approval step blocks a schema change, there is no risk that the schema and backend logic get out of sync.

Integration with the Gateway

To handle versioning of the composed schema, a manifest file stores the locations of the latest approved set of graphlet schemas. The manifest is a JSON file mapping the name of the graphlet to the S3 key of the schema file, in addition to the endpoint of the graphlet service.

The file name of each graphlet schema is a hash of the schema. The Schema Manager pulls the current manifest and uses the hash of the validated schema to determine if it has changed:

{
   "graphlets": {
     "graphletOne": {
        "schemaPath": "graphletOne/1a3121746e75aafb3ca9cccb94f23d89",
        "endpoint": "arn:aws:lambda:us-east-1:123456789:function:GraphletOne"
     },
     "graphletTwo": { 
        "schemaPath": "graphletTwo/213362c3684c89160a9b2f40cd8f191a",
        "endpoint": "arn:aws:lambda:us-east-1:123456789:function:GraphletTwo"
     },
     ...
  }
}

Based on these details, the Gateway fetches the graphlet schemas from S3 as part of service startup and stores them in an in-memory cache. It then polls for updates every 5 minutes.

Using S3 as the schema store addresses the latency, availability, and validation concerns of fetching schemas directly from the graphlets at runtime.
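
To illustrate the hashing and manifest logic described above, here is a minimal Python sketch of what a schema publication step could look like. IMDb's Schema Manager is not published as code, so the bucket name, manifest key, choice of MD5 (matching the 32-character hashes in the example manifest), and function shape are all assumptions.

import hashlib
import json

import boto3

s3 = boto3.client("s3")
SCHEMA_BUCKET = "federated-schema-store"  # hypothetical bucket name
MANIFEST_KEY = "manifest.json"            # hypothetical manifest key

def publish_schema(graphlet_name, schema_sdl, endpoint_arn):
    # The schema file name is a hash of the validated schema contents.
    schema_hash = hashlib.md5(schema_sdl.encode("utf-8")).hexdigest()
    schema_path = f"{graphlet_name}/{schema_hash}"

    manifest = json.loads(
        s3.get_object(Bucket=SCHEMA_BUCKET, Key=MANIFEST_KEY)["Body"].read()
    )
    current = manifest["graphlets"].get(graphlet_name, {})

    # Only upload the schema and update the manifest when the hash changed.
    if current.get("schemaPath") != schema_path:
        s3.put_object(Bucket=SCHEMA_BUCKET, Key=schema_path, Body=schema_sdl)
        manifest["graphlets"][graphlet_name] = {
            "schemaPath": schema_path,
            "endpoint": endpoint_arn,
        }
        s3.put_object(
            Bucket=SCHEMA_BUCKET, Key=MANIFEST_KEY, Body=json.dumps(manifest)
        )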

Eventual schema consistency

Since there are multiple graphlets that can be updated at the same time, there is no guarantee that one schema validation workflow will not overwrite the results of another.

For example:

  1. SchemaUpdater 1 runs for graphlet A.
  2. SchemaUpdater 2 runs for graphlet B.
  3. SchemaUpdater 1 pulls the manifest v1.
  4. SchemaUpdater 2 pulls the manifest v1.
  5. SchemaUpdater 1 uploads manifest v2 with changes to graphlet A.
  6. SchemaUpdater 2 uploads manifest v3, which overwrites the changes in v2 and contains only the changes to graphlet B.

This is not a critical issue because, no matter which version of the manifest wins in this scenario, both manifests represent a valid schema and the gateway does not have any issues. When SchemaUpdater runs for graphlet A again, it sees that the current manifest does not contain the changes uploaded before, so it uploads them again.

To reduce the risk of schema inconsistency, Schema Manager polls for schema changes every 15 minutes and the Gateway polls every 5 minutes.

Local schema development

Schema validation runs automatically for any graphlet change as a part of deployment pipelines. However, that feedback loop happens too late for an efficient schema development cycle. To reduce friction, the team uses a tool that performs this validation step without publishing any changes. Instead, it outputs the results of the validation to the developer.

Schema validation

The Schema Validator script can be added as a dependency to any of the graphlets. It fetches the graphlet's schema definition described in Schema Definition Language (SDL) and passes it as payload to Schema Manager. It performs the full schema validation and returns any validation errors (or success codes) to the user.

Best practices for federated schema development

Schema Manager addresses the most critical challenges that come from federated schema development. However, there are other issues when organizing work processes at your organization.

It is crucial for long term maintainability of the federated schema to keep a high-quality bar for the incoming schema changes. Since there are multiple owners of sub-schemas, it’s good to keep a communication channel between the graphlet teams so that they can provide feedback for planned schema changes.

It is also good to extract common parts of the graph to a shared library and generate typings and the resolvers. This lets the graphlet developers benefit from strongly typed code. We use open-source libraries to do this.

Conclusion

Schema Management is a non-trivial challenge in federated GQL systems. The highest risk to your system availability comes with the potential of one of the graphlets introducing a breaking schema change, after which your system cannot serve any requests. There is also the problem of a delayed feedback loop for the engineers working on schema changes, and the impact of runtime schema composition on service latency.

IMDb addresses these issues with a Schema Manager component running on Lambda, using S3 as the schema store. We have put guardrails in our deployment pipelines to ensure that no breaking change is deployed to production. Our graphlet teams are using common schema libraries with automatically generated typings and review the planned schema changes during schema working group meetings to streamline the development process.

These factors enable us to have stable and highly maintainable federated graphs, with automated change management. Next, our solution must provide mechanisms to prevent still-in-use fields from getting deleted and to allow schema changes coordinated between multiple graphlets. There are still plenty of interesting challenges to solve at IMDb.

For more serverless learning resources, visit Serverless Land.

Building federated GraphQL on AWS Lambda

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/building-federated-graphql-on-aws-lambda/

This post is written by Krzysztof Lis, Senior Software Development Engineer, IMDb.

IMDb is the world’s most popular source for movie, TV, and celebrity content. It deals with a complex business domain including movies, shows, celebrities, industry professionals, events, and a distributed ownership model. There are clear boundaries between systems and data owned by various teams.

Historically, IMDb uses a monolithic REST gateway system that serves clients. Over the years, it has become challenging to manage effectively. There are thousands of files, business logic that lacks clear ownership, and unreliable integration tests tied to the data. To fix this, the team used GraphQL (GQL). This is a query language for APIs that lets you request only the data that you need and a runtime for fulfilling those queries with your existing data.

It’s common to implement this protocol by creating a monolithic service that hosts the complete schema and resolves all fields requested by the client. It is good for applications with a relatively small domain and clear, single-threaded ownership. IMDb chose the federated approach, which allows us to federate GQL requests to all existing data teams. This post shows how to build federated GraphQL on AWS Lambda.

Overview

This article covers migration from a monolithic REST API and monolithic frontend to a federated backend system powering a modern frontend. It enumerates challenges in the earlier system and explains why federated GraphQL addresses these problems.

I present the architecture overview and describe the decisions made when designing the new system. I also present our experiences with developing and running high-traffic and high-visibility pages on the new endpoint: improvements in IMDb’s ownership model and development lifecycle, in addition to ease of scaling.

Comparing GraphQL with federated GraphQL

Comparing GraphQL with federated GraphQL

Federated GraphQL allows you to combine GraphQL APIs from multiple microservices into a single API. Clients can make a single request and fetch data from multiple sources, including joining across data sources, without additional support from the source services.

This is an example schema fragment:

type TitleQuote {
  "Quote ID"
  id: ID!
  "Is this quote a spoiler?"
  isSpoiler: Boolean!
  "The lines that make up this quote"
  lines: [TitleQuoteLine!]!
  "Votes from users about this quote..."
  interestScore: InterestScore!
  "The language of this quote"
  language: DisplayableLanguage!
}
"A specific line in the Title Quote. Can be a verbal line with characters speaking or stage directions"
type TitleQuoteLine {
  "The characters who speak this line, e.g.  'Rick'. Not required: a line may be non-verbal"
  characters: [TitleQuoteCharacter!]
  "The body of the quotation line, e.g 'Here's looking at you kid. '.  Not required: you may have stage directions with no dialogue."
  text: String
  "Stage direction, e.g. 'Rick gently places his hand under her chin and raises it so their eyes meet'. Not required."
  stageDirection: String
}

This is an example monolithic query: “Get the 2 top quotes from The A-Team (title identifier: tt0084967)”:

{ 
  title(id:"tt0084967"){ 
    quotes(first:2){ 
      lines { text } 
    } 
  }
}

Here is an example response:

{ 
  "data": { 
    "title": { 
      "quotes": { 
        "lines": [
          { 
            "text": "I love it when a plan comes together!"
          },
          {
            "text": "10 years ago a crack commando unit was sent to prison by a military court for a crime they didn't commit..."
          }
        ]
      } 
    }
  }
}

This is an example federated query: “What is Jackie Chan (id nm0000329) known for? Get the text, rating and image for each title”

{
  name(id: "nm0000329") {
    knownFor(first: 4) {
      title {
        titleText {
          text
        }
        ratingsSummary {
          aggregateRating
        }
        primaryImage {
          url
        }
      }
    }
  }
}

The monolithic example fetches quotes from a single service. In the federated example, knownFor, titleText, ratingsSummary, primaryImage are fetched transparently by the gateway from separate services. IMDb federates the requests across 19 graphlets, which are transparent to the clients that call the gateway.

Architecture overview

Architecture overview

IMDb supports three channels for users: website, iOS, and Android apps. Each of the channels can request data from a single federated GraphQL gateway, which federates the request to multiple graphlets (sub-graphs). Each of the invoked graphlets returns a partial response, which the gateway merges with responses returned by other graphlets. The client receives only the data that they requested, in the shape specified in the query. This can be especially useful when the developers must be conscious of network usage (for example, over mobile networks).

This is the architecture diagram:

Architecture diagram

There are two core components in the architecture: the Gateway and Schema Manager, which run on Lambda. The Gateway is a Node.js-based Lambda function that is built on top of open-source Apollo Gateway code. It is customized with code responsible predominantly for handling authentication, caching, metrics, and logging.

Other noteworthy components are Upstream Graphlets and an A/B Testing Service that enables A/B tests in the graph. The Gateway is connected to an Application Load Balancer, which is protected by AWS WAF and fronted by Amazon CloudFront as our CDN layer. This uses Amazon ElastiCache with Redis as the engine to cache partial responses coming from graphlets. All logs and metrics generated by the system are collected in Amazon CloudWatch.

Choosing the compute platform

This uses Lambda, since it scales on demand. IMDb uses Lambda’s Provisioned Concurrency to reduce cold start latency. The infrastructure is abstracted away so there is no need for us to manage our own capacity and handle patches.

Additionally, Application Load Balancer (ALB) has support for directing HTTP traffic to Lambda. This is an alternative to API Gateway. The workload does not need many of the features that API Gateway provides, since the gateway has a single endpoint, making ALB a better choice. ALB also supports AWS WAF.

Using Lambda, the team designed a way to fetch schema updates without needing to fetch the schema with every request. This is addressed with the Schema Manager component. This component reduces latency and improves the overall customer experience.

Integration with legacy data services

The main purpose of the federated GQL migration is to deprecate a monolithic service that aggregates data from multiple backend services before sending it to the clients. Some of the data in the federated graph comes from brand new graphlets that are developed with the new system in mind.

However, much of the data powering the GQL endpoint is sourced from the existing backend services. One benefit of running on Lambda is the flexibility to choose the runtime environment that works best with the underlying data sources and data services.

For the graphlets relying on the legacy services, IMDb uses lightweight Java Lambda functions using provided client libraries written in Java. They connect to legacy backends via AWS PrivateLink, fetch the data, and shape it in the format expected by the GQL request. For the modern graphlets, we recommend that graphlet teams explore Node.js as the first option due to improved performance and ease of development.

Caching

The gateway supports two caching modes for graphlets: static and dynamic. Static caching allows graphlet owners to specify a default TTL for responses returned by a graphlet. Dynamic caching calculates TTL based on a custom caching extension returned with the partial response. It allows graphlet owners to decide on the optimal TTL for content returned by their graphlet. For example, it can be 24 hours for queries containing only static text.

Permissions

Each of the graphlets runs in a separate AWS account. The graphlet accounts grant the gateway AWS account (as AWS principal) invoke permissions on the graphlet Lambda function. This uses a cross-account IAM role in the development environment that is assumed by stacks deployed in engineers’ personal accounts.
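
In IMDb’s environment these grants are defined through shared CDK constructs. Purely as an illustration of the permission being granted, the equivalent call expressed with boto3 could look like the following; the function name and gateway account ID are hypothetical.

import boto3

lambda_client = boto3.client("lambda")

# Allow the gateway account to invoke this graphlet's Lambda function.
lambda_client.add_permission(
    FunctionName="GraphletOne",            # hypothetical graphlet function
    StatementId="AllowGatewayInvoke",
    Action="lambda:InvokeFunction",
    Principal="111122223333",              # hypothetical gateway account ID
)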

Experience with developing on federated GraphQL

The migration to federated GraphQL delivered on expected results. We moved the business logic closer to the teams that have the right expertise – the graphlet teams. At the same time, a dedicated platform team owns and develops the core technical pieces of the ecosystem. This includes the Gateway and Schema Manager, in addition to the common libraries and CDK constructs that can be reused by the graphlet teams. As a result, there is a clear ownership structure, which is aligned with the way IMDb teams are organized.

In terms of operational excellence of the platform team, this reduced support tickets related to business logic. Previously, these were routed to an appropriate data service team with a delay. Integration tests are now stable and independent from underlying data, which increases our confidence in the Continuous Deployment process. It also eliminates changing data as a potential root cause for failing tests and blocked pipelines.

The graphlet teams now have full ownership of the data available in the graph. They own the partial schema and the resolvers that provide data for that part of the graph. Since they have the most expertise in that area, the potential issues are identified early on. This leads to a better customer experience and overall health of the system.

The web and app developers groups are also impacted by the migration. The learning curve was aided by tools like GraphQL Playground and Apollo Client. The teams covered the learning gap quickly and started delivering new features.

One of the main pages at IMDb.com is the Title Page (for example, Shutter Island). This was successfully migrated to use the new GQL endpoint. This proves that the new, serverless federated system can serve production traffic at over 10,000 TPS.

Conclusion

A single, highly discoverable, and well-documented backend endpoint enabled our clients to experiment with the data available in the graph. We were able to clean up the backend API layer, introduce clear ownership boundaries, and give our clients powerful tools to speed up their development cycle.

The infrastructure uses Lambda to remove the burden of managing, patching, and scaling our EC2 fleets. The team dedicated this time to work on features and operational excellence of our systems.

Part two will cover how IMDb manages the federated schema and the guardrails used to ensure high availability of the federated GraphQL endpoint.

For more serverless learning resources, visit Serverless Land.

Building a serverless distributed application using a saga orchestration pattern

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/building-a-serverless-distributed-application-using-a-saga-orchestration-pattern/

This post is written by Anitha Deenadayalan, Developer Specialist SA, DevAx (Developer Acceleration).

This post shows how to use the saga design pattern to preserve data integrity in distributed transactions across multiple services. In a distributed transaction, multiple services can be called before a transaction is completed. When the services store data in different data stores, it can be challenging to maintain data consistency across these data stores.

To maintain consistency in a transaction, relational databases provide two-phase commit (2PC). This consists of a prepare phase and a commit phase. In the prepare phase, the coordinating process requests the transaction’s participating processes (participants) to promise to commit or rollback the transaction. In the commit phase, the coordinating process requests the participants to commit the transaction. If the participants cannot agree to commit in the prepare phase, then the transaction is rolled back.

In distributed systems architected with microservices, two-phase commit is not an option as the transaction is distributed across various databases. In this case, one solution is to use the saga pattern.

A saga consists of a sequence of local transactions. Each local transaction in the saga updates the database and triggers the next local transaction. If a transaction fails, then the saga runs compensating transactions to revert the database changes made by the preceding transactions.

There are two types of implementations for the saga pattern: choreography and orchestration.

Saga choreography

The saga choreography pattern depends on the events emitted by the microservices. The saga participants (microservices) subscribe to the events and they act based on the event triggers. For example, the order service in the following diagram emits an OrderPlaced event. The inventory service subscribes to that event and updates the inventory when the OrderPlaced event is emitted. Similarly the participant services act based on the context of the emitted event.

Solution architecture

Saga orchestration

The saga orchestration pattern has a central coordinator called orchestrator. The saga orchestrator manages and coordinates the entire transaction lifecycle. It is aware of the series of steps to be performed to complete the transaction. To run a step, it sends a message to the participant microservice to perform the operation. The participant microservice completes the operation and sends a message to the orchestrator. Based on the received message, the orchestrator decides which microservice to run next in the transaction:

Saga orchestrator in flow

You can use AWS Step Functions to implement the saga orchestration when the transaction is distributed across multiple databases.

Overview

This example uses a Step Functions workflow to implement the saga orchestration pattern, using the following architecture:

API Gateway to Lambda to Step Functions

When a customer calls the API, API Gateway invokes a Lambda function that performs any pre-processing. The function then starts the Step Functions workflow to begin processing the distributed transaction.
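
The sample application's functions are written in .NET, but to show the role of this pre-processing function, here is a Python sketch of the same idea: validate or enrich the request, then hand the transaction to the orchestrator. The environment variable name and response shape are assumptions.

import json
import os

import boto3

sfn = boto3.client("stepfunctions")
# Assumed environment variable holding the state machine ARN.
STATE_MACHINE_ARN = os.environ["STATE_MACHINE_ARN"]

def handler(event, context):
    # Pre-process the API request, then start the saga orchestration.
    order = json.loads(event["body"])
    response = sfn.start_execution(
        stateMachineArn=STATE_MACHINE_ARN,
        input=json.dumps(order),
    )
    return {
        "statusCode": 202,
        "body": json.dumps({"executionArn": response["executionArn"]}),
    }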

The Step Functions workflow calls the individual services for order placement, inventory update, and payment processing to complete the transaction. It sends an event notification for further processing. The Step Functions workflow acts as the orchestrator to coordinate the transactions. If there is any error in the workflow, the orchestrator runs the compensatory transactions to ensure that the data integrity is maintained across various services.

When pre-processing is not required, you can also trigger the Step Functions workflow directly from API Gateway without the Lambda function.

The Step Functions workflow

The following diagram shows the steps that are run inside the Step Functions workflow. The green boxes show the steps that are run successfully. The order is placed, inventory is updated, and payment is processed before a Success state is returned to the caller.

The orange boxes indicate the compensatory transactions that are run when any one of the steps in the workflow fails. If the workflow fails at the Update inventory step, then the orchestrator calls the Revert inventory and Remove order steps before returning a Fail state to the caller. These compensatory transactions ensure that data integrity is maintained: the inventory reverts to its original levels and the order is removed.

Step Functions workflow

The preceding workflow is an example of a distributed transaction. The transaction data is stored across different databases and each service writes to its own database.

Prerequisites

For this walkthrough, you need:

Setting up the environment

For this walkthrough, use the AWS CDK code in the GitHub Repository to create the AWS resources. These include IAM roles, REST API using API Gateway, DynamoDB tables, the Step Functions workflow and Lambda functions.

  1. You need an AWS access key ID and secret access key for configuring the AWS Command Line Interface (AWS CLI). To learn more about configuring the AWS CLI, follow these instructions.
  2. Clone the repo:
    git clone https://github.com/aws-samples/saga-orchestration-netcore-blog
  3. After cloning, this is the directory structure:
    Directory structure
  4. The Lambda functions in the saga-orchestration directory must be packaged and copied to the cdk-saga-orchestration\lambdas directory before deployment. Run these commands to process the PlaceOrderLambda function:
    cd PlaceOrderLambda/src/PlaceOrderLambda 
    dotnet lambda package
    cp bin/Release/netcoreapp3.1/PlaceOrderLambda.zip ../../../../cdk-saga-orchestration/lambdas
    
  5. Repeat the same commands for all the Lambda functions in the saga-orchestration directory.
  6. Build the CDK code before deploying to the console:
    cd cdk-saga-orchestration/src/CdkSagaOrchestration
    dotnet build
    
  7. Install the aws-cdk package:
    npm install -g aws-cdk 
  8. The cdk synth command causes the resources defined in the application to be translated into an AWS CloudFormation template. The cdk deploy command deploys the stacks into your AWS account. Run:
    cd cdk-saga-orchestration
    cdk synth 
    cdk deploy
    
  9. CDK deploys the environment to AWS. You can monitor the progress using the CloudFormation console. The stack name is CdkSagaOrchestrationStack:
    CloudFormation console

The Step Functions configuration

The CDK creates the Step Functions workflow, DistributedTransactionOrchestrator. The following snippet defines the workflow with AWS CDK for .NET:

var stepDefinition = placeOrderTask
    .Next(new Choice(this, "Is order placed")
        .When(Condition.StringEquals("$.Status", "ORDER_PLACED"), updateInventoryTask
            .Next(new Choice(this, "Is inventory updated")
                .When(Condition.StringEquals("$.Status", "INVENTORY_UPDATED"),
                    makePaymentTask.Next(new Choice(this, "Is payment success")
                        .When(Condition.StringEquals("$.Status", "PAYMENT_COMPLETED"), successState)
                        .When(Condition.StringEquals("$.Status", "ERROR"), revertPaymentTask)))
                .When(Condition.StringEquals("$.Status", "ERROR"), waitState)))
        .When(Condition.StringEquals("$.Status", "ERROR"), failState));

Step Functions workflow

Compare the Amazon States Language definition for the state machine with the CDK definition above. Also observe the inputs and outputs for each step and how the conditions have been configured. The steps with type Task call a Lambda function for processing. The steps with type Choice are decision-making steps that define the workflow.
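
To retrieve the deployed Amazon States Language definition for that comparison, you could use Boto3 as in the following sketch; the state machine name is assumed to match the CDK construct above.

import boto3

sfn = boto3.client("stepfunctions")

# Find the state machine deployed by the CDK stack and print its
# Amazon States Language definition for comparison with the CDK code.
for machine in sfn.list_state_machines()["stateMachines"]:
    if machine["name"].startswith("DistributedTransactionOrchestrator"):  # assumed name
        details = sfn.describe_state_machine(
            stateMachineArn=machine["stateMachineArn"]
        )
        print(details["definition"])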

Setting up the DynamoDB table

The Orders and Inventory DynamoDB tables are created using AWS CDK. The following snippet creates a DynamoDB table with AWS CDK for .NET:

var inventoryTable = new Table(this, "Inventory", new TableProps
{
    TableName = "Inventory",
    PartitionKey = new Attribute
    {
        Name = "ItemId",
        Type = AttributeType.STRING
    },
    RemovalPolicy = RemovalPolicy.DESTROY
});
  1. Open the DynamoDB console and select the Inventory table.
  2. Choose Create Item.
  3. Select Text, paste the following contents, then choose Save.
    {
      "ItemId": "ITEM001",
      "ItemName": "Soap",
      "ItemsInStock": 1000,
      "ItemStatus": ""
    }
    

    Create Item dialog

  4. Create two more items in the Inventory table:
    {
      "ItemId": "ITEM002",
      "ItemName": "Shampoo",
      "ItemsInStock": 500,
      "ItemStatus": ""
    }
    
    {
      "ItemId": "ITEM003",
      "ItemName": "Toothpaste",
      "ItemsInStock": 2000,
      "ItemStatus": ""
    }
    

The Lambda functions UpdateInventoryLambda and RevertInventoryLambda decrement and increment the ItemsInStock attribute value, respectively. The Lambda functions PlaceOrderLambda and UpdateOrderLambda insert and delete items in the Orders table. These are invoked by the saga orchestration workflow.
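
As a hedged sketch of how such an update might look, the following decrements ItemsInStock with a conditional expression; the actual functions in the repository may be implemented differently.

import boto3

dynamodb = boto3.resource("dynamodb")
inventory_table = dynamodb.Table("Inventory")

def update_inventory(item_id, quantity=1):
    # Decrement the stock only if enough items remain; the condition
    # prevents the count from going negative.
    inventory_table.update_item(
        Key={"ItemId": item_id},
        UpdateExpression="SET ItemsInStock = ItemsInStock - :qty",
        ConditionExpression="ItemsInStock >= :qty",
        ExpressionAttributeValues={":qty": quantity},
    )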

Triggering the saga orchestration workflow

The API Gateway endpoint, SagaOrchestratorAPI, is created using AWS CDK. To invoke the endpoint:

  1. From the API Gateway service page, select the SagaOrchestratorAPI:
    List of APIs
  2. Select Stages in the left menu panel:
    Stages menu
  3. Select the prod stage and copy the Invoke URL:
    Invoke URL
  4. From Postman, open a new tab. Select POST in the dropdown and enter the copied URL in the textbox. Move to the Headers tab and add a new header with the key ‘Content-Type’ and value as ‘application/json’:
    Postman configuration
  5. In the Body tab, enter the following input and choose Send.
    {
      "ItemId": "ITEM001",
      "CustomerId": "ABC/002",
      "MessageId": "",
      "FailAtStage": "None"
    }
    
  6. You see the output:
    Output
  7. Open the Step Functions console and view the execution. The graph inspector shows that the execution has completed successfully.
    Successful workflow execution
  8. Check the items in the Orders and Inventory DynamoDB tables. You can see an item in the Orders table indicating that an order is placed. The ItemsInStock value in the Inventory table has been reduced.
    Changes in DynamoDB tables
  9. To simulate the failure workflow in the saga orchestrator, send the following JSON as body in the Postman call. The FailAtStage parameter injects the failure in the workflow. Select Send in Postman after updating the Body:
    {
      "ItemId": "ITEM002",
      "CustomerId": "DEF/002",
      "MessageId": "",
      "FailAtStage": "UpdateInventory"
    }
    
  10. Open the Step Functions console to see the execution.
  11. While the workflow waits in the wait state, look at the items in the DynamoDB tables. A new item is added to the Orders table and the stock for Shampoo is deducted in the Inventory table.
    Changes in DynamoDB table
  12. Once the wait completes, the compensatory transaction steps are run:
    Workflow execution result
  13. In the graph inspector, select the Update Inventory step. In the right pane, choose the Step output tab. The status is ERROR, which changes the control flow to run the compensatory transactions.
    Step output
  14. Look at the items in the DynamoDB table again. The data is now back to a consistent state, as the compensatory transactions have run to preserve data integrity:
    DynamoDB table items

The Step Functions workflow implements the saga orchestration pattern. It performs the coordination across distributed services and runs the transactions. It also performs compensatory transactions to preserve the data integrity.

Cleaning up

To avoid incurring additional charges, clean up all the resources that have been created. Run the following command from a terminal window. This deletes all the resources that were created as part of this example.

cdk destroy

Conclusion

This post showed how to implement the saga orchestration pattern using API Gateway, Step Functions, Lambda, DynamoDB, and .NET Core 3.1. This can help maintain data integrity in distributed transactions across multiple services. Step Functions makes it easier to implement the orchestration in the saga pattern.

To learn more about developing microservices on AWS, refer to the whitepaper on microservices. To learn more about the features, refer to the AWS CDK Features page.

Increasing performance of Java AWS Lambda functions using tiered compilation

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/increasing-performance-of-java-aws-lambda-functions-using-tiered-compilation/

This post is written by Mark Sailes, Specialist Solutions Architect, Serverless and Richard Davison, Senior Partner Solutions Architect.

The Operating Lambda: Performance optimization blog series covers important topics for developers, architects, and system administrators who are managing applications using AWS Lambda functions. This post explains how you can reduce the initialization time to start a new execution environment when using the Java managed runtimes.

Lambda lifecycle

Many Lambda workloads are designed to deliver fast responses to synchronous or asynchronous requests in a fraction of a second. Examples include public APIs that deliver dynamic content to websites or a near-real-time data pipeline doing small batch processing.

As usage of these systems increases, Lambda creates new execution environments. When a new environment is created and used for the first time, additional work is done to make it ready to process an event. This creates two different performance profiles: one with and one without the additional work.

To improve the response time, you can minimize the effect of this additional work. One way you can minimize the time taken to create a new managed Java execution environment is to tune the JVM. It can be optimized specifically for these workloads that do not have long execution durations.

One example of this is configuring a feature of the JVM called tiered compilation. From version 8 of the Java Development Kit (JDK), the two just-in-time compilers C1 and C2 have been used in combination. C1 is designed for use on the client side and to enable short feedback loops for developers. C2 is designed for use on the server side and to achieve higher performance after profiling.

Tiering determines which compiler to use to achieve the best performance. It is represented as five levels:

Tiering levels

Profiling has an overhead, and performance improvements are only achieved after a method has been invoked a number of times, the default being 10,000. Lambda customers wanting to achieve faster startup times can use level 1 with little risk of reducing warm start performance. The article “Startup, containers & Tiered Compilation” explains tiered compilation further.

For customers who are doing highly repetitive processing, this configuration might not be suitable. Applications that repeat the same code paths many times benefit from the JVM profiling and optimizing those paths. Concrete examples would be using Lambda to run Monte Carlo simulations or hash calculations. You can run the same simulations thousands of times and the JVM profiling can reduce the total execution time significantly.

Performance improvements

The example project is a Java 11-based application used to analyze the impact of this change. The application is triggered by Amazon API Gateway and then puts an item into Amazon DynamoDB. To compare the performance difference caused by this change, there is one Lambda function with the additional changes and one without. There are no other differences in the code.

Download the code for this example project from the GitHub repo: https://github.com/aws-samples/aws-lambda-java-tiered-compilation-example.

To install prerequisite software:

  1. Install the AWS CDK.
  2. Install Apache Maven, or use your preferred IDE.
  3. Build and package the Java application in the software folder:
    cd software/ExampleFunction/
    mvn package
  4. Zip the execution wrapper script:
    cd ../OptimizationLayer/
    ./build-layer.sh
    cd ../../
  5. Synthesize CDK. This previews changes to your AWS account before it makes them:
    cd infrastructure
    cdk synth
  6. Deploy the Lambda functions:
    cdk deploy --outputs-file outputs.json

The API Gateway endpoint URL is displayed in the output and saved in a file named outputs.json. The contents are similar to:

InfrastructureStack.apiendpoint = https://{YOUR_UNIQUE_ID_HERE}.execute-api.eu-west-1.amazonaws.com

Using Artillery to load test the changes

First, install prerequisites:

  1. Install jq and Artillery Core.
  2. Run the following two scripts from the /infrastructure directory:
    artillery run -t $(cat outputs.json | jq -r '.InfrastructureStack.apiendpoint') -v '{ "url": "/without" }' loadtest.yml
    
    artillery run -t $(cat outputs.json | jq -r '.InfrastructureStack.apiendpoint') -v '{ "url": "/with" }' loadtest.yml

Check the results using Amazon CloudWatch Insights

  1. Navigate to Amazon CloudWatch.
  2. Select Logs then Logs Insights.
  3. Select the following two log groups from the drop-down list:
    /aws/lambda/example-with-layer
    /aws/lambda/example-without-layer
  4. Copy the following query and choose Run query:
        filter @type = "REPORT"
        | parse @log /\d+:\/aws\/lambda\/example-(?<function>\w+)-\w+/
        | stats
        count(*) as invocations,
        pct(@duration, 0) as p0,
        pct(@duration, 25) as p25,
        pct(@duration, 50) as p50,
        pct(@duration, 75) as p75,
        pct(@duration, 90) as p90,
        pct(@duration, 95) as p95,
        pct(@duration, 99) as p99,
        pct(@duration, 100) as p100
        group by function, ispresent(@initDuration) as coldstart
        | sort by function, coldstart
    

    Query window

You see results similar to:

Query results

Here is a simplified table of the results:

Settings                                 Type        # of invocations   p90 (ms)   p95 (ms)   p99 (ms)
Default settings                         Cold start  754                5,212      5,338      5,517
Default settings                         Warm start  35,247             58         93         255
Tiered compilation stopping at level 1   Cold start  383                2,071      2,086      2,221
Tiered compilation stopping at level 1   Warm start  35,618             23         32         86

The results are from testing 120 concurrent requests over 5 minutes using an open-source software project called Artillery. You can find instructions on how to run these tests in the GitHub repo. The results show that for this application, cold starts for 90% of invocations improve by 3,141 ms (60%). These numbers are specific to this application, and your application may behave differently.

Using wrapper scripts for Lambda functions

Wrapper scripts are a feature available in Java 8 and Java 11 on Amazon Linux 2 managed runtimes. They are not available for the Java 8 on Amazon Linux 1 managed runtime.

To apply this optimization flag to Java Lambda functions, create a wrapper script and add it to a Lambda layer zip file. This script alters the JVM flags that Java starts with inside the execution environment.

#!/bin/sh
shift
export _JAVA_OPTIONS="-XX:+TieredCompilation -XX:TieredStopAtLevel=1"
java "$@"

Read the documentation to learn how to create and share a Lambda layer.
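
If you prefer scripting over the console walkthrough that follows, the same change can be applied with Boto3. This is a sketch only; the layer name, zip path, and function name are assumptions.

import boto3

lambda_client = boto3.client("lambda")

# Publish the zipped wrapper script as a layer version (path is an assumption).
with open("OptimizationLayer/layer.zip", "rb") as f:
    layer = lambda_client.publish_layer_version(
        LayerName="java-optimization-layer",
        Content={"ZipFile": f.read()},
    )

# Attach the layer and point the runtime at the wrapper script.
# Note: this call replaces the function's existing environment variables.
lambda_client.update_function_configuration(
    FunctionName="example-with-layer",  # assumed function name
    Layers=[layer["LayerVersionArn"]],
    Environment={"Variables": {"AWS_LAMBDA_EXEC_WRAPPER": "/opt/java-exec-wrapper"}},
)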

Console walkthrough

This change can be configured using AWS Serverless Application Model (AWS SAM), the AWS Command Line Interface (AWS CLI), AWS CloudFormation, or from within the AWS Management Console.

Using the AWS Management Console:

  1. Navigate to the AWS Lambda console.
  2. Select Functions and choose the Lambda function to add the layer to.
    Lambda functions
  3. The Code tab is selected by default. Scroll down to the Layers panel.
  4. Select Add a layer.
    Add a layer
  5. Select Custom layers and choose your layer.
    Add layer
  6. Select the Version. Choose Add.
  7. From the menu, select the Configuration tab and Environment variables. Choose Edit.
    Configuration tab
  8. Choose Add environment variable. Add the following:
    – Key: AWS_LAMBDA_EXEC_WRAPPER
    – Value: /opt/java-exec-wrapper
    Edit environment variables
  9. Choose Save. You can verify that the changes are applied by invoking the function and viewing the log events. The log line Picked up _JAVA_OPTIONS: -XX:+TieredCompilation -XX:TieredStopAtLevel=1 is added.
    Log events

Conclusion

Tiered compilation stopping at level 1 reduces the time the JVM spends optimizing and profiling your code. This can help reduce startup times for Java applications that require fast responses, where the workload doesn’t meet the requirements to benefit from profiling.

You can make further reductions in startup time using GraalVM. Read more about GraalVM and the Quarkus framework in the architecture blog. View the code example at https://github.com/aws-samples/aws-lambda-java-tiered-compilation-example to see how you can apply this to your Lambda functions.

For more serverless learning resources, visit Serverless Land.

Sending mobile push notifications and managing device tokens with serverless applications

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/sending-mobile-push-notifications-and-managing-device-tokens-with-serverless-application/

This post is written by Rafa Xu, Cloud Architect, Serverless and Joely Huang, Cloud Architect, Serverless.

Amazon Simple Notification Service (SNS) is a fast, flexible, fully managed push messaging service in the cloud. SNS can send mobile push notifications directly to applications on mobile devices such as message alerts and badge updates. SNS sends push notifications to a mobile endpoint created by supplying a mobile token and platform application.

When publishing mobile push notifications, a device token is used to generate an endpoint. This identifies where the push notification is sent (target destination). To push notifications successfully, the token must be up to date and the endpoint must be validated and enabled.

A common challenge when pushing notifications is keeping the token up to date. Tokens can automatically change due to reasons such as mobile operating system (OS) updates and application store updates.

This post provides a serverless solution to this challenge. It also provides a way to publish push notifications to specific end users by maintaining a mapping between users, endpoints, and tokens.

Overview

To publish mobile push notifications using SNS, generate an SNS endpoint to use as a destination target for the push notification. To create the endpoint, you must supply:

  1. A mobile application token: The mobile operating system (OS) issues the token to the application. It is a unique identifier for the application and mobile device pair.
  2. Platform Application Amazon Resource Name (ARN): SNS provides this ARN when you create a platform application object. The platform application object requires a valid set of credentials issued by the mobile platform, which you provide to SNS.

Once the endpoint is generated, you can store and reuse it again. This prevents the application from creating endpoints indefinitely, which could exhaust the SNS endpoint limit.

To reuse the endpoints and successfully push notifications, there are a number of challenges:

  • Mobile application tokens can change due to a number of reasons, such as application updates. As a result, the publisher must update the platform endpoint to ensure it uses an up-to-date token.
  • Mobile application tokens can become invalid. When this happens, messages aren’t published, and SNS disables the endpoint with the invalid token. To resolve this, publishers must retrieve a valid token and re-enable the platform endpoint.
  • Mobile applications can have many users, each user could have multiple devices, or one device could have multiple users. To send a push notification to a specific user, a mapping between the user, device, and platform endpoints should be maintained.

For more information on best practices for managing mobile tokens, refer to this post.

Follow along with this blog post to learn how to implement a serverless workflow for managing and maintaining valid endpoints and user mappings.

Solution overview

The solution uses the following AWS services:

  • Amazon API Gateway: Provides a token registration endpoint URL used by the mobile application. Once called, it invokes an AWS Lambda function via the Lambda integration.
  • Amazon SNS: Generates and maintains the target endpoint and manages platform application objects.
  • Amazon DynamoDB: Serverless database for storing endpoints that also maintains a mapping between the user, endpoint, and mobile operating system.
  • AWS Lambda: Retrieves endpoints from DynamoDB, validates and generates endpoints, and publishes notifications by making requests to SNS.

The following diagram represents a simplified interaction flow between the AWS services:

Solution architecture

To register the token, the mobile app invokes the registration token endpoint URL generated by Amazon API Gateway. The token registration happens every time a user logs in or opens the application. This ensures that the token and endpoints are always valid during the application usage.

The mobile application passes the token, user, and mobileOS as parameters to API Gateway, which forwards the request to the Lambda function.

The Lambda function validates the token and endpoint for the user by making API calls to DynamoDB and SNS:

  1. The Lambda function checks DynamoDB to see if the endpoint has been previously created.
    1. If the endpoint does not exist, it creates a platform endpoint via SNS.
  2. Obtain the endpoint attributes from SNS:
    1. Check the “enabled” endpoint attribute and set it to “true” to enable the platform endpoint, if necessary.
    2. Validate the “token” endpoint attribute with the token provided in the API Gateway request. If it does not match, update the “token” attribute.
    3. Send a request to SNS to update the endpoint attributes.
  3. If a new endpoint is created, update DynamoDB with the new endpoint.
  4. Return a successful response to API Gateway.

Deploying the AWS Serverless Application Model (AWS SAM) template

Use the AWS SAM template to deploy the infrastructure for this workflow. Before deploying the template, create a platform application in SNS.

  1. Navigate to the SNS console. Select Push Notifications on the left-hand menu to create a platform application:
    Mobile push notifications
  2. This shows the creation of a platform application for iOS applications:
    Create platform application
  3. To install AWS SAM, visit the installation page.
  4. To deploy the AWS SAM template, navigate to the directory where the template is located. Run the commands in the terminal:
    git clone https://github.com/aws-samples/serverless-mobile-push-notification
    cd serverless-mobile-push-notification
    sam build
    sam deploy --guided

Lambda function code snippets

The following section explains code from the Lambda function for the workflow.

Create the platform endpoint

If the platform endpoint does not exist in the DynamoDB table, create a new endpoint. Otherwise, reuse the endpoint stored in the table:

        need_update_ddb = False
        response = table.get_item(Key={'username': username, 'appos': appos})
        if 'Item' not in response:
            # create endpoint
            response = snsClient.create_platform_endpoint(
                PlatformApplicationArn=SUPPORTED_PLATFORM[appos],
                Token=token,
            )
            devicePushEndpoint = response['EndpointArn']
            need_update_ddb = True
        else:
            # update the endpoint
            devicePushEndpoint = response['Item']['endpoint']

Check and update endpoint attributes

Check that the token attribute for the platform endpoint matches the token received from the mobile application through the request. This also checks for the endpoint “enabled” attribute and re-enables the endpoint if necessary:

response = snsClient.get_endpoint_attributes(
    EndpointArn=devicePushEndpoint
)
endpointAttributes = response['Attributes']

previousToken = endpointAttributes['Token']
previousStatus = endpointAttributes['Enabled']
if previousStatus.lower() != 'true' or previousToken != token:
    snsClient.set_endpoint_attributes(
        EndpointArn=devicePushEndpoint,
        Attributes={
            'Token': token,
            'Enabled': 'true'
        }
    )

Update the DynamoDB table with the newly generated endpoint

If a platform endpoint is newly created, meaning there is no item in the DynamoDB table, create a new item in the table:

        if need_update_ddb:
            table.update_item(
                Key={
                    'username': username,
                    'appos': appos
                },
                UpdateExpression="set endpoint=:e",
                ExpressionAttributeValues={
                    ':e': devicePushEndpoint
                },
                ReturnValues="UPDATED_NEW"
            )

As best practice, the code cleans up the table, in case there are multiple entries for the same endpoint mapped to different users. This can happen when the mobile application is used by multiple users on the same device. When one user logs out and a different user logs in, this creates a new entry in the DynamoDB table to map the endpoint with the new user.

As a result, you must remove the entry that maps the same endpoint to the previously logged in user. This way, you only keep the endpoint that matches the user provided by the mobile application through the request.

result = table.query(
    # Add the name of the index you want to use in your query.
    IndexName="endpoint-index",
    KeyConditionExpression=Key('endpoint').eq(devicePushEndpoint),
)
for item in result['Items']:
    if item['username'] != username and item['appos'] == appos:
        print(f"deleting orphan item: username {item['username']}, os {appos}")
        table.delete_item(
            Key={
                'username': item['username'],
                'appos': appos
            },
        )
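
Publishing a notification to a specific user

With the mapping in place, publishing a notification to a user is a lookup followed by a publish call. The following is a sketch rather than code from the sample repository; the table name and the APNS payload format are assumptions and depend on your platform application.

import json

import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("user-endpoints")  # assumed table name
snsClient = boto3.client("sns")

def notify_user(username, appos, title, body):
    # Look up the stored platform endpoint for this user and device OS.
    item = table.get_item(Key={"username": username, "appos": appos}).get("Item")
    if not item:
        return
    # Publish a platform-specific payload directly to the endpoint.
    apns_payload = json.dumps({"aps": {"alert": {"title": title, "body": body}}})
    snsClient.publish(
        TargetArn=item["endpoint"],
        MessageStructure="json",
        Message=json.dumps({"default": body, "APNS": apns_payload}),
    )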

Conclusion

This blog shows how to deploy a serverless solution for validating and managing SNS platform endpoints and tokens. To publish push notifications successfully, use SNS to check the endpoint attribute and ensure it is mapped to the correct token and the endpoint is enabled.

This approach uses DynamoDB to store the device token and platform endpoints for each user. This allows you to send push notifications to specific users, retrieve, and reuse previously created endpoints. You create a Lambda function to facilitate the workflow, including validating the DynamoDB item for storing an enabled and up-to-date token.

Visit this link to learn more about Amazon SNS mobile push notifications: http://docs.aws.amazon.com/sns/latest/dg/SNSMobilePush.html

For more serverless learning resources, visit Serverless Land.

Adding resiliency to AWS CloudFormation custom resource deployments

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/adding-resiliency-to-aws-cloudformation-custom-resource-deployments/

This post is written by Dathu Patil, Solutions Architect and Naomi Joshi, Cloud Application Architect.

AWS CloudFormation custom resources allow you to write custom provisioning logic in templates. These run anytime you create, update, or delete stacks. Using AWS Lambda-backed custom resources, you can associate a Lambda function with a CloudFormation custom resource. The function is invoked whenever the custom resource is created, updated, or deleted.

When CloudFormation asynchronously invokes the function, it passes the request data, such as the request type and resource properties, to the function. The customizability of Lambda functions in combination with CloudFormation allows a wide range of scenarios. For example, you can dynamically look up Amazon Machine Image (AMI) IDs during stack creation or use utilities such as string reversal functions.

Unhandled exceptions or transient errors in the custom resource Lambda function can cause your code to exit without sending a response. CloudFormation requires an HTTPS response to confirm whether the operation succeeded. An unreported exception causes CloudFormation to wait until the operation times out before starting a stack rollback.

If the exception occurs again on rollback, CloudFormation waits for a timeout exception before ending in a rollback failure. During this time, your stack is unusable. You can learn more about this and best practices by reviewing Best Practices for CloudFormation Custom Resources.
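
One way to reduce this risk is to make sure every code path reports a result back to CloudFormation, for example with the cfnresponse module that Lambda provides for functions defined inline in a template. This sketch is illustrative and is not the code used in this example.

import cfnresponse  # available to Lambda functions defined inline (ZipFile) in a template

def lambda_handler(event, context):
    try:
        # ... provisioning logic for Create/Update/Delete requests goes here ...
        cfnresponse.send(event, context, cfnresponse.SUCCESS, {"Message": "done"})
    except Exception:
        # Reporting the failure immediately lets CloudFormation start the
        # rollback without waiting for the custom resource to time out.
        cfnresponse.send(event, context, cfnresponse.FAILED, {})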

In this blog, you learn how you can use Amazon SQS and Lambda to add resiliency to your Lambda-backed CloudFormation custom resource deployments. The example shows how to use a CloudFormation custom resource to look up an AMI ID dynamically during Amazon EC2 instance creation.

Overview

CloudFormation templates that declare an EC2 instance must also specify an AMI ID. The AMI includes an operating system and other software and configuration information used to launch the instance. The correct AMI ID depends on the instance type and Region in which you’re launching your stack. AMI IDs can change regularly, such as when an AMI is updated with software updates.

Customers often implement a CloudFormation custom resource to look up an AMI ID while creating an EC2 instance. In this example, the lookup Lambda function calls the EC2 API. It fetches the available AMI IDs, uses the latest AMI ID, and checks for a compliance tag. This implementation assumes that there are separate processes for creating AMIs and running compliance checks. The process that performs compliance and security checks creates a compliance tag on a successful scan.

This solution shows how you can use SQS and Lambda to add resiliency to handle an exception. In this case, the exception occurs in the AMI lookup custom resource due to a missing compliance tag. When the AMI lookup function fails processing, it uses the Lambda destination configuration to send the request to an SQS queue. The message is reprocessed using the SQS queue and Lambda function.

Solution architecture

  1. The CloudFormation custom resource asynchronously invokes the AMI lookup Lambda function to perform appropriate actions.
  2. The AMI lookup Lambda function calls the EC2 API to fetch the list of AMIs and checks for a compliance tag. If the tag is missing, it throws an unhandled exception.
  3. On failure, the Lambda destination configuration sends the request to the retry queue that is configured as a dead-letter queue (DLQ). SQS adds a custom delay between retry processing to support more than two retries.
  4. The retry Lambda function processes messages in the retry queue using Lambda with SQS. Lambda polls the queue and invokes the retry Lambda function synchronously with an event that contains queue messages.
  5. The retry function then synchronously invokes the AMI lookup function using the information from the request SQS message.

The AMI Lookup Lambda function

An AWS Serverless Application Model (AWS SAM) template is used to create the AMI lookup Lambda function. You can configure asynchronous event options, such as the number of retries, on the Lambda function. The maximum number of retries allowed is 2, and there is no option to set a delay between invocation attempts.

When a transient failure or unhandled error occurs, the request is forwarded to the retry queue. This part of the AWS SAM template creates AMI lookup Lambda function:

  AMILookupLambda:
    Type: AWS::Serverless::Function 
    Properties:
      CodeUri: amilookup/
      Handler: app.lambda_handler
      Runtime: python3.8
      Timeout: 300
      EventInvokeConfig:
          MaximumEventAgeInSeconds: 60
          MaximumRetryAttempts: 2
          DestinationConfig:
            OnFailure:
              Type: SQS
              Destination: !GetAtt RetryQueue.Arn
      Policies:
        - AMIDescribePolicy: {}

This function calls the EC2 API using the boto3 AWS SDK for Python. It calls the describe_images method to get a list of images with given filter conditions. The Lambda function iterates through the AMI list and checks for compliance tags. If the tag is not present, it raises an exception:

ec2_client = boto3.client('ec2', region_name=region)
# Get AMI IDs with the specified name pattern and owner
describe_response = ec2_client.describe_images(
    Filters=[{'Name': "name", 'Values': architectures},
             {'Name': "tag-key", 'Values': ['ami-compliance-check']}],
    Owners=["amazon"]
)

The queue and the retry Lambda function

The retry queue adds a 60-second delay before a message is available for the processing. The time delay between retry processing attempts provides time for transient errors to be corrected. This is the AWS SAM template for creating these resources:

RetryQueue:
  Type: AWS::SQS::Queue
  Properties:
    VisibilityTimeout: 60
    DelaySeconds: 60
    MessageRetentionPeriod: 600

RetryFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: retry/
      Handler: app.lambda_handler
      Runtime: python3.8
      Timeout: 60
      Events:
        MySQSEvent:
          Type: SQS
          Properties:
            Queue: !GetAtt RetryQueue.Arn
            BatchSize: 1
      Policies:
        - LambdaInvokePolicy:
            FunctionName: !Ref AMILookupFunction

The retry Lambda function periodically polls for new messages in the retry queue. The function synchronously invokes the AMI lookup Lambda function. On success, a response is sent to CloudFormation. This process runs until the AMI lookup function returns a successful response or the message is deleted from the SQS queue. The deletion is based on the MessageRetentionPeriod, which is set to 600 seconds in this case.

for record in event['Records']:
        body = json.loads(record['body'])
        response = client.invoke(
            FunctionName=body['requestContext']['functionArn'],
            InvocationType='RequestResponse',
            Payload=json.dumps(body['requestPayload']).encode()
        )            

Deployment walkthrough

Prerequisites

To get started with this solution, you need:

  • AWS CLI and AWS SAM CLI installed to deploy the solution.
  • An existing Amazon EC2 public image. You can choose any of the AMIs from the AWS Management Console with Architecture = X86_64 and Owner = amazon for test purposes. Note the AMI ID.

Download the source code from the resilient-cfn-custom-resource GitHub repository. The template.yaml file is an AWS SAM template. It deploys the Lambda functions, SQS, and IAM roles required for the Lambda function. It uses Python 3.8 as the runtime and assigns 128 MB of memory for the Lambda functions.

  1. To build and deploy this application, use the AWS SAM CLI build and guided deploy commands:
    sam build --use-container
    sam deploy --guided

The custom resource stack creation invokes the AMI lookup Lambda function. This fetches the AMI ID from all public EC2 images available in your account with the tag ami-compliance-check. Typically, the compliance tags are created by a process that performs security scans.

In this example, the security scan process is not running and the tag is not yet added to any AMIs. As a result, the custom resource throws an exception, which goes to the retry queue. This is retried by the retry function until it is successfully processed.

  1. Use the console or AWS CLI to add the tag to the chosen EC2 AMI. In this example, this is analogous to a separate governance process that checks for AMI compliance and adds the compliance tag if passed. Replace the $AMI-ID with the AMI ID captured in the prerequisites:
    aws ec2 create-tags --resources $AMI-ID --tags Key=ami-compliance-check,Value=True
  2. After the tags are added, a response is sent successfully from the custom resource Lambda function to the CloudFormation stack. It includes your $AMI-ID and a test EC2 instance is created using that image. The stack creation completes successfully with all resources deployed.

Conclusion

This blog post demonstrates how to use SQS and Lambda to add resiliency to CloudFormation custom resources deployments. This solution can be customized for use cases where CloudFormation stacks have a dependency on a custom resource.

CloudFormation custom resource failures can happen due to unhandled exceptions. These are caused by issues with a dependent component, internal service, or transient system errors. Using this solution, you can handle the failures automatically without the need for manual intervention. To get started, download the code from the GitHub repo and start customizing.

For more serverless learning resources, visit Serverless Land.

Authenticating and authorizing Amazon MQ users with Lightweight Directory Access Protocol

Post Syndicated from Talia Nassi original https://aws.amazon.com/blogs/compute/authenticating-and-authorizing-amazon-mq-users-with-lightweight-directory-access-protocol/

This post is written by Dominic Gagné and Mithun Mallick.

Amazon MQ is a managed message broker service for Apache ActiveMQ and RabbitMQ that simplifies setting up and operating message brokers in the AWS Cloud. Integrating an Amazon MQ broker with a Lightweight Directory Access Protocol (LDAP) server allows you to manage credentials and permissions for users in a single location. There is also the added benefit of not requiring a message broker reboot for new authorization rules to take effect.

This post explores concepts around Amazon MQ’s authentication and authorization model. It covers the steps to set up Amazon MQ access for a Microsoft Active Directory user.

Authentication and authorization

Amazon MQ for ActiveMQ uses native ActiveMQ authentication to manage user permissions by default. Users are created within Amazon MQ to allow broker access, and are mapped to read, write, and admin operations on various destinations. This local user model is referred to as the simple authentication type.

As an alternative to simple authentication, you can maintain broker access control authorization rules within an LDAP server on a per-destination or destination set basis. Wildcards are also supported for rules that apply to multiple destinations.

The LDAP integration feature uses the ActiveMQ standard Java Authentication and Authorization Service (JAAS) plugin. Additional details on the plugin can be found within ActiveMQ security documentation. Authentication details are defined as part of the ldapServerMetadata attribute. Authorization settings are configured as part of the cachedLDAPAuthorizationMap node in the broker’s activemq.xml configuration.

Here is an overview of the integration:

overview graph

  1. Client requests access to a queue or topic.
  2. Authenticate and authorize the client via JAAS.
  3. Grant or deny access to the specified queue or topic.
  4. If access is granted, allow the client to read, write, or create.

Integration with LDAP

ActiveMQ integration with LDAP sets up a secure LDAP access connection between an Amazon MQ for ActiveMQ broker and a Microsoft Active Directory server. You can also use other implementations of LDAP as the directory server, such as OpenLDAP.

Amazon MQ encrypts all data between a broker and LDAP server, and enforces secure LDAP (LDAPS) via public certificates. Unsecured LDAP on port 389 is not supported; traffic must communicate via the secure LDAP port 636. In this example, a Microsoft Active Directory server has LDAPS configured with a public certificate. To set up a Simple AD server with LDAPS and a public certificate, read this blog post.

To integrate with a Microsoft Active Directory server:

  1. Configure users in the Microsoft Active Directory directory information tree (DIT) structure for client authentication to the broker.
  2. Configure destinations in the Microsoft Active Directory DIT structure to allow destination-level authorization for individual users or entire groups.
  3. Create an ActiveMQ configuration to allow authorization via LDAP.
  4. Create a broker and perform a basic test to validate authentication and authorization access for a test user.

Configuring Microsoft Active Directory for client authentication

Create the hierarchy structure within the Microsoft Active Directory DIT to provision users. The server must be part of the domain and have a domain admin user. The domain admin user is needed in the broker configuration.

In this DIT, the domain corp.example.com is used, though you can use any domain name. An organizational unit (OU) named corp exists under the root. ActiveMQ related entities are defined under the corp OU.

This OU is the user base that the broker uses to search for users when performing authentication operations. Represented as LDIF, the user base is:

OU=Users,OU=corp,DC=corp,DC=example,DC=com

To create this OU and user:

  1. Log on to the Windows Server using a domain admin user.
  2. Open Active Directory Users and Computers by running dsa.msc from the command line.
  3. Choose corp and create an OU named Users, located within corp.
  4. Select the Users OU, create a new user, and enter the name mquser.
  5. Deselect the option to change password on next logon.
  6. Finally, choose Next to create the user.

Because the ActiveMQ source code hardcodes the attribute name for users to uid, make sure that each user has this attribute set. For simplicity, use the user’s connection user name. For more information, see the ActiveMQ source code and knowledgebase article.

Users must belong to the amazonmq-console-admins group to enable console access. Members of this group can create and delete any destinations via the console, regardless of other authorization rules in place. Access to this group should be granted sparingly.

Configuring Microsoft Active Directory for authorization

Now that our broker knows where to search for users, configure the DIT such that the broker can search for user permissions relating to authorization.

Back in the root OU corp where the Users OU was previously created:

  1. Create a new OU named Destination.
  2. Within the Destination OU, create an OU for each type of destination that ActiveMQ offers. These are Queue, Topic, and Temp.

For each destination that you want to allow authorization:

  1. Add an OU under the type of destination.
  2. Provide the name of the destination as the name of the OU. Wildcards are also supported, as found in ActiveMQ documentation.

This example shows three OUs that require authorization. These are DEMO.MYQUEUE, DEMO.MYSECONDQUEUE, and DEMO.EVENTS.$. The queue search base, which provides authorization information for destinations of type Queue, has the following location in the DIT:

OU=Queue,OU=Destination,OU=corp,DC=corp,DC=example,DC=com

Note the DEMO.EVENTS.$ wildcard queue name. Permissions in that OU apply to all queue names matching that wildcard.

Within each OU representing a destination or wildcard destination set, create three security groups. These groups relate to specific permissions on the relevant destination, using the same admin, read, and write permissions rules as ActiveMQ documentation describes.

There is a conflict with the group name “admin”. Legacy “pre-Windows 2000” rules do not allow groups to share the same name, even in different locations of the DIT. The value in the “pre-Windows 2000” text box does not impact the setup, but it must be globally unique. In the following screenshot, a UUID suffix is appended to each admin group name.

Adding a user to the admin security group for a particular destination enables the user to create and delete that destination. Adding them to the read security group enables them to read from the destination, and adding them to the write group enables them to write to the destination.

In this example, mquser is added to the admin and write groups for the queue DEMO.MYQUEUE. Later, you test this user’s authorization permissions to confirm that the integration works as expected.

In addition to adding individual users to security group permissions, you can add entire groups. Because ActiveMQ hardcodes attribute names for groups, ensure that the group has the object class groupOfNames, as shown in the ActiveMQ source code.

To do this, follow the same process as with the UID for users. See the knowledgebase article for additional information.

The LDAP server is now compatible with ActiveMQ. Next, create a broker and configure LDAP values based on the LDAP deployment.

Creating a configuration to enable authorization via LDAP

Authorization rules in ActiveMQ are sourced from the broker’s activemq.xml configuration file.

  1. Begin by navigating to the Amazon MQ console to create a configuration with the Authentication Type set as LDAP.
  2. Edit this configuration to include the cachedLDAPAuthorizationMap, which is the node used to configure the locations in the LDAP DIT where authorization rules are stored. For more information on this topic, visit ActiveMQ documentation.
  3. Within the cachedLDAPAuthorizationMap node in the broker’s configuration, add the locations of the OUs related to authorization.
  4. Under the authorizationPlugin tag, enter a cachedLDAPAuthorizationMap node.
  5. Do not specify connectionUrl, connectionUsername, or connectionPassword. These values are filled in using the LDAP Server Metadata specified when creating the broker. If you specify these values, they are ignored. An example cachedLDAPAuthorizationMap is presented in the following image:

Creating a broker and testing Active Directory integration

Start by creating a broker using the default durability optimized storage.

  1. Select a Single-instance broker. You can use Active/standby broker or Network of Brokers if required.
  2. Choose Next.
  3. In the next page, under Configure Settings, set a name for the broker.
  4. Select an instance type.
  5. In the ActiveMQ Access section, select LDAP Authentication & Authorization. The input fields display parameters for connecting with the LDAP server. The service account must be associated with a user that can bind to your LDAP server. The server does not need to be public, but the domain name must be publicly resolvable.
  6. The next section of the page includes the search configuration for Active Directory users who are authorized to access the queues and topics. The values depend on the org structure created in the Active Directory setup. These values are based on your DIT.
  7. Once users and role search metadata are provided, configure the broker to launch with the configuration created in the previous section (named my-ldap-authorization-conf). Do this by selecting the Additional Settings drop-down and choosing the correct configuration file.
  8. Use the configuration where you defined cachedLDAPAuthorizationMap. This enables the broker to enforce read/write/admin permissions for client connections to the broker. These are defined in the LDAP server’s Destination OU.

Once the broker is running, authentication and authorization rules are enforced using the users and authorization rules defined in the configured LDAP server. During the Microsoft Active Directory setup, mquser is added to the admin and write groups for the queue DEMO.MYQUEUE. This means mquser can create and write to the queue DEMO.MYQUEUE but cannot perform any actions on other queues.

Test this by writing to the queue:

The client can connect to the broker and send messages to the queue DEMO.MYQUEUE using the credentials for mquser.
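
As a quick check from a client machine, a short STOMP script can attempt the write. This sketch uses the third-party stomp.py library, and the broker endpoint and password shown are placeholders; it is not part of the walkthrough.

import stomp  # third-party stomp.py library

# Amazon MQ exposes STOMP over SSL on port 61614 (endpoint is a placeholder).
host_and_port = ("b-1234abcd-1.mq.us-east-1.amazonaws.com", 61614)

conn = stomp.Connection(host_and_ports=[host_and_port])
conn.set_ssl(for_hosts=[host_and_port])
conn.connect("mquser", "mquser-password", wait=True)

# Succeeds because mquser is in the write group for DEMO.MYQUEUE; sending to
# another queue should be rejected by the LDAP authorization rules.
conn.send(destination="/queue/DEMO.MYQUEUE", body="hello from mquser")
conn.disconnect()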

Conclusion

This post shows the steps to integrate an LDAP server with an Amazon MQ broker. After the integration, you can manage authentication and authorization rules for your users, without rebooting the broker.

For more serverless learning resources, visit Serverless Land.

 

Understanding VPC links in Amazon API Gateway private integrations

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/understanding-vpc-links-in-amazon-api-gateway-private-integrations/

This post is written by Jose Eduardo Montilla Lugo, Security Consultant, AWS.

A VPC link is a resource in Amazon API Gateway that allows for connecting API routes to private resources inside a VPC. A VPC link acts like any other integration endpoint for an API and is an abstraction layer on top of other networking resources. This helps simplify configuring private integrations.

This post looks at the underlying technologies that make VPC links possible. I further describe what happens under the hood when a VPC link is created for both REST APIs and HTTP APIs. Understanding these details can help you better assess the features and benefits provided by each type. This also helps you make better architectural decisions when designing API Gateway APIs.

This article assumes you have experience in creating APIs in API Gateway. The main purpose is to provide a deeper explanation of the technologies that make private integrations possible. For more information on creating API Gateway APIs with private integrations, refer to the Amazon API Gateway documentation.

Overview

AWS Hyperplane and AWS PrivateLink

There are two types of VPC links: VPC links for REST APIs and VPC links for HTTP APIs. Both provide access to resources inside a VPC. They are built on top of an internal AWS service called AWS Hyperplane. This is an internal network virtualization platform, which supports inter-VPC connectivity and routing between VPCs. Internally, Hyperplane supports multiple network constructs that AWS services use to connect with the resources in customers’ VPCs. One of those constructs is AWS PrivateLink, which is used by API Gateway to support private APIs and private integrations.

AWS PrivateLink allows access to AWS services and services hosted by other AWS customers, while maintaining network traffic within the AWS network. Since the service is exposed via a private IP address, all communication is virtually local and private. This reduces the exposure of data to the public internet.

In AWS PrivateLink, a VPC endpoint service is a networking resource in the service provider side that enables other AWS accounts to access the exposed service from their own VPCs. VPC endpoint services allow for sharing a specific service located inside the provider’s VPC by extending a virtual connection via an elastic network interface in the consumer’s VPC.

An interface VPC endpoint is a networking resource in the service consumer side, which represents a collection of one or more elastic network interfaces. This is the entry point that allows for connecting to services powered by AWS PrivateLink.

Comparing private APIs and private integrations

Private APIs are different to private integrations. Both use AWS PrivateLink but they are used in different ways.

A private API means that the API endpoint is reachable only through the VPC. Private APIs are accessible only from clients within the VPC or from clients that have network connectivity to the VPC. For example, from on-premises clients via AWS Direct Connect. To enable private APIs, an AWS PrivateLink connection is established between the customer’s VPC and API Gateway’s VPC.

Clients connect to private APIs via an interface VPC endpoint, which routes requests privately to the API Gateway service. The traffic is initiated from the customer’s VPC and flows through the AWS PrivateLink to the API Gateway’s AWS account:

Consumer connected to provider through VPC Link

When the VPC endpoint for API Gateway is enabled, all requests to API Gateway APIs made from inside the VPC go through the VPC endpoint. This is true for private APIs and public APIs. Public APIs are still accessible from the internet and private APIs are accessible only from the interface VPC endpoint. Currently, you can only configure REST APIs as private.

A private integration means that the backend endpoint resides within a VPC and it’s not publicly accessible. With a private integration, API Gateway service can access the backend endpoint in the VPC without exposing the resources to the public internet.

A private integration uses a VPC link to encapsulate connections between API Gateway and targeted VPC resources. VPC links allow access to HTTP/HTTPS resources within a VPC without having to deal with advanced network configurations. Both REST APIs and HTTP APIs offer private integrations but only VPC links for REST APIs use AWS PrivateLink internally.

VPC links for REST APIs

When you create a VPC link for a REST API, a VPC endpoint service is also created, making the AWS account a service provider. The service consumer in this case is API Gateway’s account. The API Gateway service creates an interface VPC endpoint in their account for the Region where the VPC link is being created. This establishes an AWS PrivateLink from the API Gateway VPC to your VPC. The target of the VPC endpoint service and the VPC link is a Network Load Balancer, which forwards requests to the target endpoints:

VPC Link for REST APIs

Before establishing any AWS PrivateLink connection, the service provider must approve the connection request. Requests from the API Gateway accounts are automatically approved in the VPC link creation process. This is because the AWS accounts that serve API Gateway for each Region are allow-listed in the VPC endpoint service.

When a Network Load Balancer is associated with an endpoint service, the traffic to the targets is sourced from the NLB. The targets receive the private IP addresses of the NLB, not the IP addresses of the service consumers.

This is helpful when configuring the security groups of the instances behind the NLB for two reasons. First, you do not know the IP address range of the VPC that’s connecting to the service. Second, NLB’s elastic network interfaces do not have any security groups attached. This means that they cannot be used as a source in the security groups of the targets. To learn more, read how to find the internal IP addresses assigned to an NLB.
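
For reference, creating a VPC link for a REST API only requires a name and the ARN of that Network Load Balancer. The following Boto3 sketch uses a placeholder ARN.

import boto3

apigateway = boto3.client("apigateway")

# The target must be a Network Load Balancer in your VPC (ARN is a placeholder).
vpc_link = apigateway.create_vpc_link(
    name="rest-api-vpc-link",
    targetArns=[
        "arn:aws:elasticloadbalancing:us-east-1:123456789012:"
        "loadbalancer/net/my-nlb/1234567890abcdef"
    ],
)
print(vpc_link["id"], vpc_link["status"])  # status starts as PENDING while the link is provisioned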

To create a private API with a private integration, two AWS PrivateLink connections are established. The first is from a customer VPC to API Gateway’s VPC so that clients in the VPC can reach the API Gateway service endpoint. The other is from API Gateway’s VPC to the customer VPC so that API Gateway can reach the backend endpoint. Here is an example architecture:

Private API with private integrations

VPC links for HTTP APIs

HTTP APIs are the latest type of API Gateway APIs that are cheaper and faster than REST APIs. VPC links for HTTP APIs do not require the creation of VPC endpoint services so a Network Load Balancer is not necessary. With VPC Links for HTTP APIs, you can now use an ALB or an AWS Cloud Map service to target private resources. This allows for more flexibility and scalability in the configuration required on both sides.

Configuring multiple integration targets is also easier with VPC links for HTTP APIs. For example, VPC links for REST APIs can be associated only with a single NLB. Configuring multiple backend endpoints requires some workarounds such as using multiple listeners on the NLB, associated with different target groups.

In contrast, a single VPC link for HTTP APIs can be associated with multiple backend endpoints without additional configuration. Also, with the new VPC link, customers with containerized applications can use ALBs instead of NLBs and take advantage of layer-7 load-balancing capabilities and other features such as authentication and authorization.
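
A comparable Boto3 sketch for an HTTP API VPC link shows that it is defined by subnets and security groups rather than an NLB; the IDs below are placeholders.

import boto3

apigatewayv2 = boto3.client("apigatewayv2")

# Individual ALB, NLB, or AWS Cloud Map targets are chosen later, per integration.
vpc_link = apigatewayv2.create_vpc_link(
    Name="http-api-vpc-link",
    SubnetIds=["subnet-0abc1234", "subnet-0def5678"],
    SecurityGroupIds=["sg-0123456789abcdef0"],
)
print(vpc_link["VpcLinkId"], vpc_link["VpcLinkStatus"])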

AWS Hyperplane supports multiple types of network virtualization constructs, including AWS PrivateLink. VPC links for REST APIs rely on AWS PrivateLink. However, VPC links for HTTP APIs use VPC-to-VPC NAT, which provides a higher level of abstraction.

The new construct is conceptually similar to a tunnel between both VPCs. These are created via elastic network interface attachments on the provider and consumer ends, which are both managed by AWS Hyperplane. This tunnel allows a service hosted in the provider’s VPC (API Gateway) to initiate communications to resources in a consumer’s VPC. API Gateway has direct connectivity to these elastic network interfaces and can reach the resources in the VPC directly from their own VPC. Connections are permitted according to the configuration of the security groups attached to the elastic network interfaces in the customer side.

Although it seems to provide the same functionality as AWS PrivateLink, these constructs differ in implementation details. A service endpoint in AWS PrivateLink allows for multiple connections to a single endpoint (the NLB), whereas the new approach allows a source VPC to connect to multiple destination endpoints. As a result, a single VPC link can integrate with multiple Application Load Balancers, Network Load Balancers, or resources registered with an AWS Cloud Map service on the customer side:

VPC Link for HTTP APIs

This approach is similar to the way that other services such as Lambda access resources inside customer VPCs.

Conclusion

This post explores how VPC links can set up API Gateway APIs with private integrations. VPC links for REST APIs encapsulate AWS PrivateLink resources, such as interface VPC endpoints and VPC endpoint services, to configure connections from API Gateway’s VPC to the customer’s VPC and access private backend endpoints.

VPC links for HTTP APIs use a different construct in the AWS Hyperplane service to provide API Gateway with direct network access to VPC private resources. Understanding the differences between the two is important when adding private integrations as part of your API architecture design.

For more serverless learning resources, visit Serverless Land.

Building a serverless multiplayer game that scales: Part 2

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/building-a-serverless-multiplayer-game-that-scales-part-2/

This post is written by Vito De Giosa, Sr. Solutions Architect and Tim Bruce, Sr. Solutions Architect, Developer Acceleration.

This series discusses solutions for scaling serverless games, using the Simple Trivia Service, a game that relies on user-generated content. Part 1 describes the overall architecture, how to deploy to your AWS account, and different communications methods.

This post discusses how to scale via automation and asynchronous processes. You can use automation to minimize the need to scale personnel to review player-generated content for acceptability. It also introduces asynchronous processing, which allows you to run non-critical processes in the background and batch data together. This helps to improve resource usage and game performance. Both scaling techniques can also reduce overall spend.

To set up the example, see the instructions in the GitHub repo and the README.md file. This example uses services beyond the AWS Free Tier and incurs charges. Instructions to remove the example application from your account are also in the README.md file.

Technical implementation

Supporting auto-moderated avatars requires three mechanisms: an upload process that allows the player to send content to the game, a content moderation process that removes unacceptable content, and a messaging process that informs players of the status of their content.

Here is the architecture for this feature in Simple Trivia Service, which is combined within the avatar workflow:

Architecture diagram

This architecture processes images uploaded to Amazon S3 and notifies the user of the processing result via HTTP WebPush. This solution uses AWS Serverless services and the Amazon Rekognition moderation API.

Uploading avatars

Players start the process by uploading avatars via the game client. Using presigned URLs, the client allows players to upload images directly to S3 without sharing AWS credentials or exposing the bucket publicly.

The URL embeds all the parameters of the S3 request. It includes a Signature Version 4 signature, generated on the backend with AWS credentials, which allows S3 to authorize the request. A sketch of generating such a URL follows the upload steps below.

S3 upload process

  1. The front end retrieves the presigned URL by invoking an AWS Lambda function through an Amazon API Gateway HTTP API endpoint.
  2. The front end uses the URL to send a PUT request to S3 with the image.
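
A minimal sketch of the backend Lambda code that generates the presigned URL is shown below. The bucket name and key scheme are chosen here for illustration; the actual names are defined in the repository templates.

import boto3

s3 = boto3.client("s3")

def get_upload_url(player_id: str, bucket: str = "avatar-uploads") -> str:
    """Return a short-lived presigned URL for a PUT upload of the player's avatar."""
    return s3.generate_presigned_url(
        ClientMethod="put_object",
        Params={
            "Bucket": bucket,
            "Key": f"avatars/{player_id}.png",
            # The client must send the same Content-Type header with its PUT request
            "ContentType": "image/png",
        },
        ExpiresIn=300,  # URL is valid for five minutes
    )

The front end then sends the image bytes directly to the returned URL with an HTTP PUT request; S3 validates the embedded signature before accepting the object.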

Processing avatars

After the upload completes, the backend performs a set of activities. These include content moderation, generating the thumbnail variant, and saving the image URL to the player profile. AWS Step Functions orchestrates the workflow by coordinating tasks and integrating with AWS services, such as Lambda and Amazon DynamoDB. Step Functions enables creating workflows without writing code and handles errors, retries, and state management. This enables traffic control to avoid overloading single components when traffic surges.

The avatar processing workflow runs asynchronously. This allows players to play the game without being blocked and enables you to batch the requests. The Step Functions workflow is triggered from an Amazon EventBridge event. When the user uploads an image to S3, an event is published to EventBridge. The event is routed to the avatar processing Step Functions workflow.
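
As a sketch of this wiring, the following boto3 calls create an EventBridge rule that matches S3 “Object Created” events for the upload bucket and targets the Step Functions workflow. The bucket name, state machine ARN, and IAM role ARN are placeholders, and the bucket must have EventBridge notifications enabled.

import json
import boto3

events = boto3.client("events")

events.put_rule(
    Name="avatar-uploaded",
    State="ENABLED",
    EventPattern=json.dumps({
        "source": ["aws.s3"],
        "detail-type": ["Object Created"],
        "detail": {"bucket": {"name": ["avatar-uploads"]}},
    }),
)

events.put_targets(
    Rule="avatar-uploaded",
    Targets=[{
        "Id": "avatar-workflow",
        "Arn": "arn:aws:states:eu-west-1:123456789012:stateMachine:AvatarProcessing",
        # Role that allows EventBridge to start the state machine execution
        "RoleArn": "arn:aws:iam::123456789012:role/EventBridgeStartExecution",
    }],
)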

The single avatar feature runs in seconds and uses Step Functions Express Workflows, which are ideal for high-volume event-processing use cases. Step Functions can also support longer running processes and manual steps, depending on your requirements.

To keep performance at scale, the solution adopts four strategies. First, it moderates content automatically, requiring no human intervention. This is done via the Amazon Rekognition moderation API, which can detect inappropriate content in uploaded avatars. Developers do not need machine learning expertise to use this API. If it identifies unacceptable content, the Step Functions workflow deletes the uploaded picture.
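
Here is a hedged sketch of the moderation step as it might look inside a Lambda task of the workflow. The bucket and key come from the triggering event, and the confidence threshold is an assumption.

import boto3

rekognition = boto3.client("rekognition")
s3 = boto3.client("s3")

def moderate_avatar(bucket: str, key: str, min_confidence: float = 80.0) -> bool:
    """Return True if the image is acceptable; delete it and return False otherwise."""
    result = rekognition.detect_moderation_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MinConfidence=min_confidence,
    )
    if result["ModerationLabels"]:
        # Unacceptable content found: remove the uploaded picture
        s3.delete_object(Bucket=bucket, Key=key)
        return False
    return True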

Second, it uses avatar thumbnails on the top navigation bar and on leaderboards. This speeds up page loading and uses less network bandwidth. Image-editing software runs in a Lambda function to modify the uploaded file and store the result in S3 with the original.
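
The post does not name the image-editing library, so this sketch assumes Pillow packaged with the Lambda function; the thumbnail size and key naming are also illustrative.

import io

import boto3
from PIL import Image

s3 = boto3.client("s3")

def create_thumbnail(bucket: str, key: str, size=(96, 96)) -> str:
    """Generate a thumbnail alongside the original object and return its key."""
    original = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    image = Image.open(io.BytesIO(original))
    image.thumbnail(size)  # resize in place, preserving aspect ratio
    buffer = io.BytesIO()
    image.save(buffer, format="PNG")
    buffer.seek(0)
    thumb_key = key.replace("avatars/", "avatars/thumbnails/", 1)
    s3.put_object(Bucket=bucket, Key=thumb_key, Body=buffer, ContentType="image/png")
    return thumb_key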

Third, it uses Amazon CloudFront as a content delivery network (CDN) with the S3 bucket hosting images. This improves performance by implementing caching and serving static content from locations closer to the player. Additionally, using CloudFront allows you to keep the bucket private and provide greater security for the content stored within S3.

Finally, it stores profile picture URLs in DynamoDB and replicates the thumbnail URL in an Amazon Cognito user attribute named picture. This allows the game to retrieve the avatar URL as part of the login process, saving an HTTP GET request for the player profile.
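
As a sketch of this step, assuming a player profile table, key schema, and user pool ID chosen for illustration, the workflow task could persist the URLs and mirror the thumbnail into the Cognito picture attribute like this:

import boto3

dynamodb = boto3.resource("dynamodb")
cognito = boto3.client("cognito-idp")

def save_avatar_urls(player_id, avatar_url, thumbnail_url,
                     table_name="PlayerProfile",
                     user_pool_id="eu-west-1_EXAMPLE"):
    # Store both URLs on the player profile item
    dynamodb.Table(table_name).update_item(
        Key={"playerId": player_id},
        UpdateExpression="SET avatarUrl = :a, thumbnailUrl = :t",
        ExpressionAttributeValues={":a": avatar_url, ":t": thumbnail_url},
    )
    # Replicate the thumbnail URL into the standard Cognito 'picture' attribute
    # so it is returned as part of the login flow
    cognito.admin_update_user_attributes(
        UserPoolId=user_pool_id,
        Username=player_id,
        UserAttributes=[{"Name": "picture", "Value": thumbnail_url}],
    )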

The last step of the workflow publishes the result via an event to EventBridge for downstream systems to consume. The service routes the event to the notification component to inform the player about the moderation status.

Notifying users of the processing result

The result of the avatar workflow is important to the player but not urgent. Players want to know the result without it interrupting their gameplay experience. A solution for this challenge is HTTP web push. It uses the HTTP protocol and does not require a constant communication channel between the backend and front end. This allows players to keep playing without being blocked and without introducing latency to the game communications channel.

Applications requiring low-latency, fully bidirectional communication, such as highly interactive multiplayer games, typically use WebSockets. This creates a persistent two-way channel for the front end and backend to exchange information. The web push mechanism can provide non-urgent data and messages to the player without interrupting the WebSockets channel.

The web push protocol describes how to use a consolidated push service as a broker between the web client and the backend. It accepts subscriptions from the client and receives push message delivery requests from the backend. Each browser vendor provides a push service implementation that is compliant with the W3C Push API specification and is external to both client and backend.

The web client is typically a browser where a JavaScript application interacts with the push service to subscribe and listen for incoming notifications. The backend is the application that notifies the front end. Here is an overview of the protocol with all the parties involved.

Notification process

  1. A component on the client subscribes to the configured push service by sending an HTTP POST request. The client keeps a background connection waiting for messages.
  2. The push service returns a URL identifying a push resource that the client distributes to backend applications that are allowed to send notifications.
  3. Backend applications request a message delivery by sending an HTTP POST request to the previously distributed URL.
  4. The push service forwards the information to the client.

This approach has four advantages. First, it reduces the effort to manage the reliability of the delivery process by off-loading it to an external and standardized component. Second, it minimizes cost and resource consumption. This is because it doesn’t require the backend to keep a persistent communication channel or compute resources to be constantly available. Third, it keeps complexity to a minimum because it relies on HTTP only without requiring additional technologies. Finally, HTTP web push addresses concepts such as message urgency and time-to-live (TTL) by using a standard.

Serverless HTTP web push

The implementation of the web push protocol requires the following components, per the Push API specification. First, the front end is required to create a push subscription. This is implemented through a service worker, a script running in the origin of the application. The service worker exposes operations to access the push service either creating subscriptions or listening for push events.

Serverless HTTP web push

  1. The client uses the service worker to subscribe to the push service via the Push API.
  2. The push service responds with a payload including a URL, which is the client’s push endpoint. The URL is used to create notification delivery requests.
  3. The browser enriches the subscription with public cryptographic keys, which are used to encrypt messages ensuring confidentiality.
  4. The backend must receive and store the subscription for when a delivery request is made to the push service. This is provided by API Gateway, Lambda, and DynamoDB. API Gateway exposes an HTTP API endpoint that accepts POST requests with the push service subscription as payload. The payload is stored in DynamoDB alongside the player identifier.

This front end code implements the process:

//Once service worker is ready
navigator.serviceWorker.ready
  .then(function (registration) {
    //Retrieve existing subscription or subscribe
    return registration.pushManager.getSubscription()
      .then(async function (subscription) {
        if (subscription) {
          console.log('got subscription!', subscription)
          return subscription;
        }
        /*
         * Using Public key of our backend to make sure only our
         * application backend can send notifications to the returned
         * endpoint
         */
        const convertedVapidKey = self.vapidKey;
        return registration.pushManager.subscribe({
          userVisibleOnly: true,
          applicationServerKey: convertedVapidKey
        });
      });
  }).then(function (subscription) {
    //Distributing the subscription to the application backend
    console.log('register!', subscription);
    const body = JSON.stringify(subscription);
    const parms = {jwt: jwt, playerName: playerName, subscription: body};
    //Call to the API endpoint to save the subscription
    const res = DataService.postPlayerSubscription(parms);
    console.log(res);
  });

 

Next, the backend reacts to the avatar workflow completed custom event to create a delivery request. This is accomplished with EventBridge and Lambda, as sketched after the following steps.

Backend process after avatar workflow completed

  1. EventBridge routes the event to a Lambda function.
  2. The function retrieves the player’s agent subscriptions, including push endpoint and encryption keys, from DynamoDB.
  3. The function sends an HTTP POST to the push endpoint with the encrypted message as payload.
  4. When the push service delivers the message, the browser activates the service worker updating local state and displaying the notification.
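
The post does not show the delivery Lambda itself, so here is a minimal sketch using the pywebpush library. The DynamoDB table name, key schema, SSM parameter name, and contact claim are assumptions made for illustration.

import json

import boto3
from pywebpush import webpush, WebPushException

dynamodb = boto3.resource("dynamodb")
ssm = boto3.client("ssm")

def send_push(player_id: str, message: dict,
              table_name: str = "PushSubscriptions") -> None:
    # Load the stored subscription (endpoint plus encryption keys) for the player
    item = dynamodb.Table(table_name).get_item(Key={"playerId": player_id})["Item"]
    subscription = json.loads(item["subscription"])

    # VAPID private key stored in SSM Parameter Store (see next section)
    private_key = ssm.get_parameter(
        Name="/webpush/vapid/private", WithDecryption=True
    )["Parameter"]["Value"]

    try:
        webpush(
            subscription_info=subscription,
            data=json.dumps(message),
            vapid_private_key=private_key,
            vapid_claims={"sub": "mailto:admin@example.com"},
        )
    except WebPushException as error:
        # 404/410 responses indicate an expired subscription that can be deleted
        print(f"Push delivery failed: {error}")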

The push service accepts delivery requests from any party that knows the push endpoint, and the front end enables the backend to deliver messages by distributing that endpoint to it. HTTPS provides encryption for data in transit, while DynamoDB encrypts all of your data at rest, providing confidentiality and security for the endpoint.

Security of WebPush can be further improved by using Voluntary Application Server Identification (VAPID). With WebPush, the clients authenticate messages at delivery time. VAPID allows the push service to perform message authentication on behalf of the web client avoiding denial-of-service risk. Without the additional security of VAPID, any application knowing the push service endpoint might successfully create delivery requests with an invalid payload. This can cause the player’s agent to accept messages from unauthorized services and, possibly, cause a denial-of-service to the client by overloading its capabilities.

VAPID requires backend applications to own a key pair. In Simple Trivia Service, a Lambda function, which is an AWS CloudFormation custom resource, generates the key pair when deploying the stack. It securely saves the values in AWS Systems Manager (SSM) Parameter Store.
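
A sketch of what such a custom resource could do, using the cryptography package to generate a P-256 key pair and storing it under assumed parameter names:

import base64

import boto3
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric import ec

def generate_vapid_keys() -> None:
    private_key = ec.generate_private_key(ec.SECP256R1())
    private_pem = private_key.private_bytes(
        encoding=serialization.Encoding.PEM,
        format=serialization.PrivateFormat.PKCS8,
        encryption_algorithm=serialization.NoEncryption(),
    ).decode()
    # Browsers expect the public key as a base64url-encoded uncompressed EC point
    public_raw = private_key.public_key().public_bytes(
        encoding=serialization.Encoding.X962,
        format=serialization.PublicFormat.UncompressedPoint,
    )
    public_b64 = base64.urlsafe_b64encode(public_raw).rstrip(b"=").decode()

    ssm = boto3.client("ssm")
    ssm.put_parameter(Name="/webpush/vapid/private", Value=private_pem,
                      Type="SecureString", Overwrite=True)
    ssm.put_parameter(Name="/webpush/vapid/public", Value=public_b64,
                      Type="String", Overwrite=True)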

Here is a representation of VAPID in action:

VAPID process architecture

  1. The front end specifies which backend the push service can accept messages from. It does this by including the public key from VAPID in the subscription request.
  2. When requesting a message delivery, the backend identifies itself by including the public key and a token signed with the private key in the HTTP Authorization header. If the signature is valid and the public key matches the one the client provided at subscription time, the push service delivers the message. If not, the push service blocks the message.

The Lambda function that sends delivery requests to the push service reads the key values from SSM. It uses them to generate the Authorization header to include in the request, allowing for successful delivery to the client endpoint.

Conclusion

This post shows how you can add scaling support for a game via automation. The example uses Amazon Rekognition to check images for unacceptable content and uses asynchronous architecture patterns with Step Functions and HTTP WebPush. These scaling approaches can help you to maximize your technical and personnel investments.

For more serverless learning resources, visit Serverless Land.

Automating Amazon CloudWatch dashboards and alarms for Amazon Managed Workflows for Apache Airflow

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/automating-amazon-cloudwatch-dashboards-and-alarms-for-amazon-managed-workflows-for-apache-airflow/

This post is written by Mark Richman, Senior Solutions Architect.

Amazon Managed Workflows for Apache Airflow (MWAA) is a fully managed service that makes it easier to run open-source versions of Apache Airflow on AWS. It allows you to build workflows to run your extract-transform-load (ETL) jobs and data pipelines.

When working with MWAA, you often need to know more about the performance of your Airflow environment to achieve full insight and observability of your system. Airflow emits a number of useful metrics to Amazon CloudWatch, which are described in the product documentation. MWAA allows customers to define CloudWatch dashboards and alarms based upon the metrics and logs that Apache Airflow emits.

Airflow exposes metrics such as number of Directed Acyclic Graph (DAG) processes, DAG bag size, number of currently running tasks, task failures, and successes. Airflow is already set up to send metrics for an MWAA environment to CloudWatch.

This blog demonstrates a solution that automatically detects any deployed Airflow environments associated with the AWS account. It builds a CloudWatch dashboard and some useful alarms for each. All source code for this blog post is available on GitHub.

You automate the creation of a CloudWatch dashboard, which displays several of these key metrics together with CloudWatch alarms. These alarms receive notifications when the metrics fall outside of the thresholds that you configure, and allow you to perform actions in response.

Prerequisites

Deploying this solution requires:

Overview

Based on AWS serverless services, this solution includes:

  1. An Amazon EventBridge rule that runs on a schedule to invoke an AWS Step Functions workflow.
  2. The Step Functions workflow orchestrates several AWS Lambda functions to query the existing MWAA environments.
  3. Lambda functions update the CloudWatch dashboard definition to include metrics such as QueuedTasks, RunningTasks, SchedulerHeartbeat, TasksPending, and TotalParseTime.
  4. CloudWatch alarms are created for unhealthy workers and heartbeat failure across all MWAA environments. These alarms are removed for any nonexistent environments.
  5. Any MWAA environments that no longer exist have their respective CloudWatch dashboards removed.

Reference architecture

EventBridge, a serverless event service, is configured to invoke a Step Functions workflow every 10 minutes. You can configure this for your preferred interval. Step Functions invokes a number of Lambda functions in parallel. If a function throws an error, the failed step in the state machine transitions to its respective failed state and the entire workflow ends.

Each of the Lambda functions performs a single task, orchestrated by Step Functions. Each function has a descriptive name for the task it performs in the workflow.
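
For example, a function that discovers the deployed environments might look like the following sketch. The actual implementation lives in the repository; this only illustrates the underlying boto3 call.

import boto3

def list_mwaa_environments() -> list:
    """Return the names of all MWAA environments in the current account and Region."""
    mwaa = boto3.client("mwaa")
    names, token = [], None
    while True:
        kwargs = {"NextToken": token} if token else {}
        response = mwaa.list_environments(**kwargs)
        names.extend(response.get("Environments", []))
        token = response.get("NextToken")
        if not token:
            return names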

Understanding the CreateDashboardFunction function

When you deploy the AWS SAM template, the SeedDynamoDBFunction Lambda function is invoked. The function populates a DynamoDB table called DashboardTemplateTable with a CloudWatch dashboard definition. This definition is a template for any new CloudWatch dashboard created by CreateDashboardFunction.

You can see this definition in the GitHub repo:

{
  "widgets": [{
      "type": "metric",
      "x": 0,
      "y": 0,
      "width": 12,
      "height": 6,
      "properties": {
        "view": "timeSeries",
        "stacked": true,
        "metrics": [
          [
            "AmazonMWAA",
            "QueuedTasks",
            "Function",
            "Executor",
            "Environment",
            "${EnvironmentName}"
          ]
        ],
        "region": "${AWS::Region}",
        "title": "QueuedTasks ${EnvironmentName}",
        "period": 300
      }
    },
    ...
}

When the Step Functions workflow runs, it invokes CreateDashboardFunction. This function iterates through all the MWAA environments in the account, creating or updating the corresponding CloudWatch dashboard for each. You can see the code in /functions/create_dashboard/app.py:

for env in mwaa_environments:
    dashboard_name = f"Airflow-{env}"

    # Substitute the Region and environment name into the stored dashboard template
    dashboard_body = dashboard_template.replace(
        "${AWS::Region}", os.getenv("AWS_REGION", "us-east-1")
    ).replace("${EnvironmentName}", env)

    logger.info(f"Creating/updating dashboard: {dashboard_name}")
    logger.debug(dashboard_body)

    response = cloudwatch.put_dashboard(
        DashboardName=dashboard_name, DashboardBody=dashboard_body
    )

    logger.info(json.dumps(response, indent=2))

Step Functions state machine

Here is a visualization of the Step Functions state machine:

State machine visualization

Step Functions is based on state machines and tasks. A state machine is a workflow. A task is a state in a workflow that represents a single unit of work that another AWS service performs. Each step in a workflow is a state. The state machine in this solution is defined in JSON format in the repo.

Building and deploying the example application

To build and deploy this solution:

  1. Clone the repo from GitHub:
    git clone https://github.com/aws-samples/mwaa-dashboard
    cd mwaa-dashboard
    
  2. Build the application. Since Lambda functions may depend on packages that have natively compiled programs, use the --use-container flag. This flag compiles your functions locally in a Docker container that behaves like the Lambda environment:
    sam build --use-container
  3. Deploy the application to your AWS account:
    sam deploy --guided

This command packages and deploys the application to your AWS account. It provides a series of prompts:

  • Stack Name: The name of the stack to deploy to AWS CloudFormation. This should be unique to your account and Region. This walkthrough uses mwaa-dashboard throughout this project.
  • AWS Region: The AWS Region you want to deploy your app to.
  • Confirm changes before deploy: If set to yes, any change sets will be shown to you for manual review. If set to no, the AWS SAM CLI automatically deploys application changes.
  • Respond to the remaining prompts per the SAM CLI command reference.

Updating the CloudWatch dashboard template definition in DynamoDB

The CloudWatch dashboard template definition is stored in DynamoDB. This is a one-time setup step, performed by the functions/seed_dynamodb Lambda custom resource at stack deployment time.

To override the template, you can edit the data directly in DynamoDB using the AWS Management Console. Alternatively, modify the scripts/dashboard-template.json file and update DynamoDB using the scripts/seed.py script.

cd scripts
./seed.py -t <dynamodb-table-name>
cd ..

Here, <dynamodb-table-name> is the name of the DynamoDB table created during deployment. For example:

./seed.py -t mwaa-dashboard-DashboardTemplateTable-VA2M5945RCF1

Viewing the CloudWatch dashboards and alarms

If you have any existing MWAA environments, or create new ones, a dashboard for each appears with the Airflow- prefix. If you delete an MWAA environment, the corresponding dashboard is also deleted. No CloudWatch metrics are deleted.

Upon successful completion of the Step Functions workflow, you see a list of custom dashboards. There is one for each of your MWAA environments:

Custom dashboards view

Choosing the dashboard name displays the widgets defined in the JSON described previously. Each widget corresponds to an Airflow key performance indicator (KPI). The dashboards can be customized through the AWS Management Console without any code changes.

Example dashboard

These are the metrics:

  • QueuedTasks: The number of tasks with queued state. Corresponds to the executor.queued_tasks Airflow metric.
  • TasksPending: The number of tasks pending in executor. Corresponds to the scheduler.tasks.pending Airflow metric.
  • RunningTasks: The number of tasks running in executor. Corresponds to the executor.running_tasks Airflow metric.
  • SchedulerHeartbeat: The number of check-ins Airflow performs on the scheduler job. Corresponds to the scheduler_heartbeat Airflow metric.
  • TotalParseTime: Number of seconds taken to scan and import all DAG files once. Corresponds to the dag_processing.total_parse_time Airflow metric.

More information on all MWAA metrics available in CloudWatch can be found in the documentation. CloudWatch alarms are also created for each MWAA environment:

CloudWatch alarms list

By default, you create two alarms (a minimal sketch of creating the heartbeat alarm follows this list):

  • {environment name}-UnhealthyWorker: This alarm triggers if the number of QueuedTasks is greater than the number of RunningTasks, and the number of RunningTasks is zero, for a period of 15 minutes.
  • {environment name}-HeartbeatFail: This alarm triggers if SchedulerHeartbeat is zero for a period of five minutes.
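
Here is a minimal sketch of creating the heartbeat alarm with boto3. The dimension names follow the AmazonMWAA pattern used in the dashboard template above and should be verified against your environment's metrics; the SNS topic ARN is a placeholder.

import boto3

cloudwatch = boto3.client("cloudwatch")

def create_heartbeat_alarm(env_name: str, sns_topic_arn: str = None) -> None:
    cloudwatch.put_metric_alarm(
        AlarmName=f"{env_name}-HeartbeatFail",
        Namespace="AmazonMWAA",
        MetricName="SchedulerHeartbeat",
        Dimensions=[
            {"Name": "Function", "Value": "Scheduler"},
            {"Name": "Environment", "Value": env_name},
        ],
        Statistic="Sum",
        Period=60,
        EvaluationPeriods=5,           # five one-minute periods
        Threshold=0,
        ComparisonOperator="LessThanOrEqualToThreshold",
        TreatMissingData="breaching",  # missing heartbeat data counts as a failure
        AlarmActions=[sns_topic_arn] if sns_topic_arn else [],
    )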

You can configure actions in response to these alarms, such as an Amazon SNS notification to email or a Slack message.

Cleaning up

After testing this application, delete the resources created to avoid ongoing charges. You can use the AWS CLI, AWS Management Console, or the AWS APIs to delete the CloudFormation stack deployed by AWS SAM.

To delete the stack via the AWS CLI, run the following command:

aws cloudformation delete-stack --stack-name mwaa-dashboard

The log groups all share the prefix /aws/lambda/mwaa-dashboard. Delete these with the command:

aws logs delete-log-group --log-group-name <log group>

Conclusion

With Amazon MWAA, you can spend more time building workflows and less time managing and scaling infrastructure. This article shows a serverless example that automatically creates CloudWatch dashboards and alarms for all existing and new MWAA environments. With this example, you can achieve better observability for your MWAA environments.

To get started with MWAA, visit the user guide. To deploy this solution in your own AWS account, visit the GitHub repo for this article.

For more serverless learning resources, visit Serverless Land.

Integrating Amazon API Gateway private endpoints with on-premises networks

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/integrating-amazon-api-gateway-private-endpoints-with-on-premises-networks/

This post was written by Ahmed ElHaw, Sr. Solutions Architect

Using AWS Direct Connect or AWS Site-to-Site VPN, customers can establish a private virtual interface from their on-premises network directly to their Amazon Virtual Private Cloud (VPC). Hybrid networking enables customers to benefit from the scalability, elasticity, and ease of use of AWS services while using their corporate network.

Amazon API Gateway can make it easier for developers to interface with and expose other services in a uniform and secure manner. You can use it to interface with other AWS services such as Amazon SageMaker endpoints for real-time machine learning predictions or serverless compute with AWS Lambda. API Gateway can also integrate with HTTP endpoints and VPC links in the backend.

This post shows how to set up a private API Gateway endpoint with a Lambda integration. It uses a Route 53 resolver, which enables on-premises clients to resolve AWS private DNS names.

Overview

API Gateway private endpoints allow you to use private API endpoints inside your VPC. When used with Route 53 resolver endpoints and hybrid connectivity, you can access APIs and their integrated backend services privately from on-premises clients.

You can deploy the example application using the AWS Serverless Application Model (AWS SAM). The deployment creates a private API Gateway endpoint with a Lambda integration and a Route 53 inbound endpoint. I explain the security configuration of the AWS resources used. This is the solution architecture:

Private API Gateway with a Hello World Lambda integration.

 

  1. The client calls the private API endpoint (for example, GET https://abc123xyz0.execute-api.eu-west-1.amazonaws.com/demostage).
  2. The client asks the on-premises DNS server to resolve (abc123xyz0.execute-api.eu-west-1.amazonaws.com). You must configure the on-premises DNS server to forward DNS queries for the AWS-hosted domains to the IP addresses of the inbound resolver endpoint. Refer to the documentation for your on-premises DNS server to configure DNS forwarders.
  3. After the client successfully resolves the API Gateway private DNS name, it receives the private IP address of the VPC Endpoint of the API Gateway.
    Note: You must call the API Gateway DNS endpoint for the HTTPS certificate to validate. You cannot call the IP address of the endpoint directly.
  4. Amazon API Gateway passes the payload to Lambda through an integration request.
  5. If Route 53 Resolver query logging is configured, queries from on-premises resources that use the endpoint are logged.

Prerequisites

To deploy the example application in this blog post, you need:

  • AWS credentials that provide the necessary permissions to create the resources. This example uses admin credentials.
  • An AWS Site-to-Site VPN or AWS Direct Connect connection with routing rules that allow DNS traffic to pass through to the Amazon VPC.
  • The AWS SAM CLI installed.
  • Clone the GitHub repository.

Deploying with AWS SAM

  1. Navigate to the cloned repo directory. Alternatively, use the sam init command and paste the repo URL:

    SAM init example

  2. Build the AWS SAM application:
    sam build
  3. Deploy the AWS SAM application:
    sam deploy --guided

This stack creates and configures a virtual private cloud (VPC) with two private subnets (for resiliency) and DNS resolution enabled. It also creates a VPC endpoint for the service name com.amazonaws.{region}.execute-api, with private DNS names enabled and a security group that allows TCP port 443 inbound from a managed prefix list. You can edit the created prefix list with one or more CIDR blocks.

It also deploys an API Gateway private endpoint and an API Gateway resource policy that restricts access to the API, except from the VPC endpoint. There is also a “Hello world” Lambda function and a Route 53 inbound resolver with a security group that allows TCP and UDP on port 53 (DNS) inbound from the on-premises prefix list.

A VPC endpoint is a logical construct consisting of elastic network interfaces deployed in subnets. The elastic network interface is assigned a private IP address from your subnet space. For high availability, deploy in at least two Availability Zones.
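
The AWS SAM template provisions this endpoint for you. For illustration, the equivalent boto3 call looks roughly like the following sketch, with the VPC, subnet, and security group IDs as placeholders.

import boto3

ec2 = boto3.client("ec2")

response = ec2.create_vpc_endpoint(
    VpcEndpointType="Interface",
    VpcId="vpc-0123456789abcdef0",
    ServiceName="com.amazonaws.eu-west-1.execute-api",
    SubnetIds=["subnet-0aaa1111bbb222333", "subnet-0ccc4444ddd555666"],
    SecurityGroupIds=["sg-0123456789abcdef0"],
    PrivateDnsEnabled=True,  # lets clients in the VPC resolve the API's default DNS name
)
print(response["VpcEndpoint"]["VpcEndpointId"])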

Private API Gateway VPC endpoint

Route 53 inbound resolver endpoint

Route 53 resolver is the Amazon DNS server. It is sometimes referred to as “AmazonProvidedDNS” or the “.2 resolver” that is available by default in all VPCs. Route 53 resolver responds to DNS queries from AWS resources within a VPC for public DNS records, VPC-specific DNS names, and Route 53 private hosted zones.

Integrating your on-premises DNS server with AWS DNS server requires a Route 53 resolver inbound endpoint (for DNS queries that you’re forwarding to your VPCs). When creating an API Gateway private endpoint, a private DNS name is created by API Gateway. This endpoint is resolved automatically from within your VPC.

However, on-premises servers must resolve this hostname through AWS. To enable this, create a Route 53 inbound resolver endpoint and point your on-premises DNS server to it. This allows your corporate network resources to resolve AWS private DNS hostnames.

To improve reliability, the resolver requires that you specify two IP addresses for DNS queries. AWS recommends configuring IP addresses in two different Availability Zones. After you add the first two IP addresses, you can optionally add more in the same or different Availability Zone.

The inbound resolver is a logical resource consisting of two elastic network interfaces. These are deployed in two different Availability Zones for resiliency.
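
The stack also creates this inbound endpoint. As a sketch, the equivalent boto3 call is shown below; the subnet and security group IDs are placeholders for the two private subnets and the ResolverSG created by the deployment.

import uuid

import boto3

resolver = boto3.client("route53resolver")

response = resolver.create_resolver_endpoint(
    CreatorRequestId=str(uuid.uuid4()),  # idempotency token
    Name="onprem-inbound",
    Direction="INBOUND",
    SecurityGroupIds=["sg-0123456789abcdef0"],
    IpAddresses=[
        {"SubnetId": "subnet-0aaa1111bbb222333"},
        {"SubnetId": "subnet-0ccc4444ddd555666"},
    ],
)
print(response["ResolverEndpoint"]["Id"])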

Route 53 inbound resolver

Configuring the security groups and resource policy

In the security pillar of the AWS Well-Architected Framework, one of the seven design principles is applying security at all layers: Apply a defense in depth approach with multiple security controls. Apply to all layers (edge of network, VPC, load balancing, every instance and compute service, operating system, application, and code).

A few security configurations are required for the solution to function:

  • The resolver security group (referred to as ‘ResolverSG’ in solution diagram) inbound rules must allow TCP and UDP on port 53 (DNS) from your on-premises network-managed prefix list (source). Note: configure the created managed prefix list with your on-premises network CIDR blocks.
  • The security group of the VPC endpoint of the API Gateway “VPCEndpointSG” must allow HTTPS access from your on-premises network-managed prefix list (source). Note: configure the created managed prefix list with your on-premises network CIDR blocks.
  • For a private API Gateway to work, a resource policy must be configured. The AWS SAM deployment sets up an API Gateway resource policy that allows access to your API only from the VPC endpoint: any request that does not originate from the defined source VPC endpoint is explicitly denied.
    Note: the AWS SAM template creates the following policy:

    {
      "Version": "2012-10-17",
      "Statement": [
          {
              "Effect": "Allow",
              "Principal": "*",
              "Action": "execute-api:Invoke",
              "Resource": "arn:aws:execute-api:eu-west-1:12345678901:dligg9dxuk/DemoStage/GET/hello"
          },
          {
              "Effect": "Deny",
              "Principal": "*",
              "Action": "execute-api:Invoke",
              "Resource": "arn:aws:execute-api:eu-west-1: 12345678901:dligg9dxuk/DemoStage/GET/hello",
              "Condition": {
                  "StringNotEquals": {
                      "aws:SourceVpce": "vpce-0ac4147ba9386c9z7"
                  }
              }
          }
      ]
    }

     

The AWS SAM deployment creates a Hello World Lambda. For demonstration purposes, the Lambda function always returns a successful response, conforming with API Gateway integration response.

Testing the solution

To test, invoke the API using a curl command from an on-premises client. To get the API URL, copy it from the AWS SAM deployment outputs shown on screen. Alternatively, go to the AWS CloudFormation console and copy it from the stack’s Outputs section.

CloudFormation outputs

Next, go to Route 53 resolvers, select the created inbound endpoint, and note the endpoint IP addresses. Configure your on-premises DNS forwarder with these IP addresses. Refer to the documentation for your on-premises DNS server to configure DNS forwarders.

Route 53 resolver IP addresses

Finally, log on to your on-premises client and call the API Gateway endpoint. You should get a success response from the API Gateway as shown.

curl https://dligg9dxuk.execute-api.eu-west-1.amazonaws.com/DemoStage/hello

{"response": {"resultStatus": "SUCCESS"}}

Monitoring and troubleshooting

Route 53 resolver query logging allows you to log the DNS queries that originate in your VPCs. It shows which domain names are queried, the originating AWS resources (including source IP and instance ID) and the responses.

You can log the DNS queries that originate in VPCs that you specify, in addition to the responses to those DNS queries. You can also log DNS queries from on-premises resources that use an inbound resolver endpoint, and DNS queries that use an outbound resolver endpoint for recursive DNS resolution.

After configuring query logging from the console, you can use Amazon CloudWatch as the destination for the query logs. You can use this feature to view and troubleshoot the resolver.

{
    "version": "1.100000",
    "account_id": "1234567890123",
    "region": "eu-west-1",
    "vpc_id": "vpc-0c00ca6aa29c8472f",
    "query_timestamp": "2021-04-25T12:37:34Z",
    "query_name": "dligg9dxuk.execute-api.eu-west-1.amazonaws.com.",
    "query_type": "A",
    "query_class": "IN",
    "rcode": "NOERROR",
    "answers": [
        {
            "Rdata": "10.0.140.226”, API Gateway VPC Endpoint IP#1
            "Type": "A",
            "Class": "IN"
        },
        {
            "Rdata": "10.0.12.179", API Gateway VPC Endpoint IP#2
            "Type": "A",
            "Class": "IN"
        }
    ],
    "srcaddr": "172.31.6.137", ONPREMISES CLIENT
    "srcport": "32843",
    "transport": "UDP",
    "srcids": {
        "resolver_endpoint": "rslvr-in-a7dd746257784e148",
        "resolver_network_interface": "rni-3a4a0caca1d0412ab"
    }
}

Cleaning up

To remove the example application, navigate to CloudFormation and delete the stack.

Conclusion

API Gateway private endpoints enable building private API–based services inside your VPCs. You can keep both the frontend of your application (API Gateway) and the backend service private inside your VPC.

I discuss how to access your private APIs from your corporate network through Direct Connect or Site-to-Site VPN without exposing your endpoints to the internet. You deploy the demo using AWS Serverless Application Model (AWS SAM). You can also change the template for your own needs.

To learn more, visit the API Gateway tutorials and workshops page in the API Gateway developer guide.

For more serverless learning resources, visit Serverless Land.