Tag Archives: Technical How-to

Building serverless multi-Region WebSocket APIs

2022-03-10 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/building-serverless-multi-region-websocket-apis/

This post is written by Ben Freiberg, Senior Solutions Architect, and Marcus Ziller, Senior Solutions Architect.

Many modern web applications use the WebSocket protocol for bidirectional communication between frontend clients and backends. The fastest way to get started with WebSockets on AWS is to use WebSocket APIs powered by Amazon API Gateway.

This serverless solution allows customers to get started with WebSockets without having the complexity of running a WebSocket API. WebSocket APIs are a Regional service bound to a single Region, which may affect latency and resilience for some workloads.

This post shows how to build a multi-regional WebSocket API for a global real-time chat application.

Overview of the solution

This solution uses AWS Cloud Development Kit (CDK). This is an open source software development framework to model and provision cloud application resources. Using the CDK can reduce the complexity and amount of code needed to automate the deployment of resources.

This solution uses AWS Lambda, Amazon API Gateway, Amazon DynamoDB, and Amazon EventBridge.

This diagram outlines the workflow implemented in this blog:

Users across different Regions establish WebSocket connections to an API endpoint in a Region. For every connection, the respective API Gateway invokes the ConnectionHandler Lambda function, which stores the connection details in a Regional DynamoDB table.
User A sends a chat message via the established WebSocket connection. The API Gateway invokes the ClientMessageHandler Lambda function with the received message. The Lambda function publishes an event to an EventBridge event bus that contains the message and the connectionId of the message sender.
The event bus invokes the EventBusMessageHandler Lambda function, which pushes the received message to all other clients connected in the Region. It also replicates the event into us-west-1.
EventBusMessageHandler in us-west-1 receives and send it out to all connected clients in the Region via the same mechanism.

Walkthrough

The following walkthrough explains the required components, their interactions and how the provisioning can be automated via CDK.

For this walkthrough, you need:

An AWS account
Installed Node.js
Installed git
AWS CDK installed

Checkout and deploy the sample stack:

After completing the prerequisites, clone the associated GitHub repository by running the following command in a local directory:
```
git clone [email protected]/aws-samples/multi-region-websocket-api
```
Open the repository in your preferred editor and review the contents of the src and cdk folder.
Follows the instructions in the README.md to deploy the stack.

The following components are deployed in your account for every specified Region. If you didn’t change the default, the Regions are eu-west-1 and us-west-1.

API Gateway for WebSocket connectivity

API Gateway is a fully managed service that makes it easier for developers to create, publish, maintain, monitor, and secure APIs at any scale. APIs act as the “front door” for applications to access data, business logic, or functionality from your backend services. Using API Gateway, you can create RESTful APIs and WebSocket APIs that enable real-time two-way communication applications.

WebSocket APIs serve as a stateful frontend for an AWS service, in this case AWS Lambda. A Lambda function is used for the WebSocket endpoint that maintains a persistent connection to handle message transfer between the backend service and clients. The WebSocket API invokes the backend based on the content of the messages that it receives from client apps.

There are three predefined routes that can be used: $connect, $disconnect, and $default.

const connectionLambda = new lambda.Function(..);
const requestHandlerLambda = new lambda.Function(..);

const webSocketApi = new apigwv2.WebSocketApi(this, 'WebsocketApi', {
      apiName: 'WebSocketApi',
      description: 'A regional Websocket API for the multi-region chat application sample',
      connectRouteOptions: {
        integration: new WebSocketLambdaIntegration('connectionIntegration', connectionLambda.fn),
      },
      disconnectRouteOptions: {
        integration: new WebSocketLambdaIntegration('disconnectIntegration', connectionLambda.fn),
      },
      defaultRouteOptions: {
        integration: new WebSocketLambdaIntegration('defaultIntegration', requestHandlerLambda.fn),
      },
});

const websocketStage = new apigwv2.WebSocketStage(this, 'WebsocketStage', {
      webSocketApi,
      stageName: 'dev',
      autoDeploy: true,
});

$connect and $disconnect are used by clients to initiate or end a connection with the API Gateway. Each route has a backend integration that is invoked for the respective event. In this example, a Lambda function gets invoked with details of the event. The following code snippet shows how you can track each of the connected clients in an Amazon DynamoDB table. Amazon DynamoDB is a fully managed, serverless, key-value NoSQL database designed to run high-performance applications at any scale.

// Simplified example for brevity
// Visit GitHub repository for complete code

function connectionHandler(event: APIGatewayEvent) {
  if (eventType === 'CONNECT') {
    await dynamoDbClient.put({
      Item: {
        connectionId,
        chatId: 'DEFAULT',
        ttl: Math.round(Date.now() / 1000 + 3600) // TTL of one hour
      },
    });
  }

  if (eventType === 'DISCONNECT') {
    await dynamoDbClient.delete({
      TableName: process.env.TABLE_NAME!,
      Key: {
        connectionId,
        chatId: 'DEFAULT',
      },
    })
  }

  return ..
}

The $default route is used when the route selection expression produces a value that does not match any of the other route keys in your API routes. For this post, we use it as a default route for all messages sent to the API Gateway by a client. For each message, a Lambda function is invoked with an event of the following format.

{
     "requestContext": {
         "routeKey": "$default",
         "messageId": "GXLKJfX4FiACG1w=",
         "eventType": "MESSAGE",
         "messageDirection": "IN",
         "connectionId": "GXLKAfX1FiACG1w=",
         "apiId": "3m4dnp0wy4",
         "requestTimeEpoch": 1632812813588,
         // some fields omitted for brevity   
         },
     "body": "{ .. }",
     "isBase64Encoded": false
}

EventBridge for cross-Region message distribution

The Lambda function uses the AWS SDK to publish the message data in event.body to EventBridge. EventBridge is a serverless event bus that makes it easier to build event-driven applications at scale. It delivers a stream of real-time data from event sources to targets. You can set up routing rules to determine where to send your data to build application architectures that react in real time to your data sources with event publishers and consumers decoupled.

The following CDK code defines routing rules on the event bus that is applied for every event with source ChatApplication and detail type ChatMessageReceived.

    new events.Rule(this, 'ProcessRequest', {
      eventBus,
      enabled: true,
      ruleName: 'ProcessChatMessage',
      description: 'Invokes a Lambda function for each chat message to push the event via websocket and replicates the event to event buses in other regions.',
      eventPattern: {
        detailType: ['ChatMessageReceived'],
        source: ['ChatApplication'],
      },
      targets: [
        new LambdaFunction(processLambda.fn),
        ...additionalEventBuses,
      ],
    });

Intra-Region message delivery

The first target is a Lambda function that sends the message out to clients connected to the API Gateway endpoint in the same Region where the message was received.

To that end, the function first uses the AWS SDK to query DynamoDB for active connections for a given chatId in its AWS Region. It then removes the connectionId of the message sender from the list and calls postToConnection(..) for the remaining connection ids to push the message to the respective clients.

export async function handler(event: EventBridgeEvent<'EventResponse', ResponseEventDetails>): Promise<any> {
  const connections = await getConnections(event.detail.chatId);
  connections
    .filter((cId: string) => cId !== event.detail.senderConnectionId)
    .map((connectionId: string) => gatewayClient.postToConnection({
      ConnectionId: connectionId,
      Data: JSON.stringify({ data: event.detail.message }),
    })
}

Inter-Region message delivery

To send messages across Regions, this solution uses EventBridge’s cross-Region event routing capability. Cross-Region event routing allows you to replicate events across Regions by adding an event bus in another Region as the target of a rule. In this case, the architecture is a mesh of event buses in all Regions so that events in every event bus are replicated to all other Regions.

A message sent to an event bus in a Region is replicated to the event buses in the other Regions and trigger the intra-region workflow that I described earlier. However, to avoid infinite loops the EventBridge service implements circuit breaker logic that prevents infinite loops of event buses sending messages back and forth. Thus, only ProcessRequestLambda is invoked as a rule target. The function receives the message via its invocation event and looks up the active WebSocket connections in its Region. It then pushes the message to all relevant clients.

This process happens in every Region so that the initial message is delivered to every connected client with at-least-once semantics.

Improving resilience

The architecture of this solution is resilient to service disruptions in a Region. In such an event, all clients connected to the affected Region reconnect to an unaffected Region and continue to receive events. Although this isn’t covered in the CDK code, you can also set up Amazon Route 53 health checks to automate DNS failover to a healthy Region.

Testing the workflow

You can use any WebSocket client to test the application. Here you can see three clients, one connected to the us-west-1 API Gateway endpoint and two connected to the eu-west-1 endpoint. Each one sends a message to the application and every other client receives it, regardless of the Region it is connected to.

Cleaning up

Most services used in this blog post have an allowance in the AWS Free Tier. Be sure to check potential costs of this solution and delete the stack if you don’t need it anymore. Instructions on how to do this are included inside the README in the repository.

Conclusion

This blog post shows how to use the AWS serverless platform to build a multi-regional chat application over WebSockets. With the cross-Region event routing of EventBridge the architecture is resilient as well as extensible.

For more resources on how to get the most out of the AWS serverless platform, visit Serverless Land.

Composing AWS Step Functions to abstract polling of asynchronous services

2022-03-09 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/composing-aws-step-functions-to-abstract-polling-of-asynchronous-services/

This post is written by Nicolas Jacob Baer, Senior Cloud Application Architect, Goeksel Sarikaya, Senior Cloud Application Architect, and Shukhrat Khodjaev, Engagement Manager, AWS ProServe.

AWS Step Functions workflows can use the three main integration patterns when calling AWS services. A common integration pattern is to call a service and wait for a response before proceeding to the next step.

While this works well with AWS services, there is no built-in support for third-party, on-premises, or custom-built services running on AWS. When a workflow integrates with such a service in an asynchronous fashion, it requires a primary step to invoke the service. There are additional steps to wait and check the result, and handle possible error scenarios, retries, and fallbacks.

Although this approach may work well for small workflows, it does not scale to multiple interactions in different steps of the workflow and becomes repetitive. This may result in a complex workflow, which makes it difficult to build, test, and modify. In addition, large workflows with many repeated steps are more difficult to troubleshoot and understand.

This post demonstrates how to decompose the custom service integration by nesting Step Functions workflows. One basic workflow is dedicated to handling the asynchronous communication that offers modularity. It can be re-used as a building block. Another workflow is used to handle the main process by invoking the nested workflow for service interaction, where all the repeated steps are now hidden in multiple executions of the nested workflow.

Overview

Consider a custom service that provides an asynchronous interface, where an action is initially triggered by an API call. After a few minutes, the result is available to be polled by the caller. The following diagram shows a basic workflow interacting with this custom service that encapsulates the service communication in a workflow:

Call Custom Service API – calls a custom service, in this case through an API. Potentially, this could use a service integration or use AWS Lambda if there is custom code.
Wait – waits for the service to prepare a result. This depends on the service that the workflow is interacting with, and could vary from seconds to days to months.
Poll Result – attempts to poll the result from the custom service.
Choice – repeats the polling in case the result was not available yet, move on to failed or success state if result was retrieved. In addition, a timeout should be in place here in case the result is not available within the expected time range. Otherwise, this might lead to an infinite loop.
Fail – fails the workflow if a timeout or a threshold for the number of retries with error conditions is reached.
Transform Result – transforms the result or adds additional meta information to provide further information to the caller (for example, runtime or retries).
Success – finishes the workflow successfully.

If you build a larger workflow that interacts with this custom service in multiple steps in a workflow, you can reduce the complexity by using the Step Functions integration to call the nested workflow with a Wait state.

An illustration of this can be found in the following diagram, where the nested stack is called three times sequentially. Likewise, you can build a more complex workflow that adds additional logic through more steps or interacts with a custom service in parallel. The polling logic is hidden in the nested workflow.

Walkthrough

To get started with AWS Step Functions and Amazon API Gateway using the AWS Management Console:

Go to AWS Step Functions in the AWS Management Console.
Choose Run a sample project and choose Start a workflow within a workflow.
Scroll down to the sample projects, which are defined using Amazon States Language (ASL).
Review the example definition, then choose Next.
Choose Deploy resources. The deployment can take up to 10 minutes. After deploying the resources, you can edit the sample ASL code to define steps in the state machine.
The deployment creates two state machines: NestingPatternMainStateMachine and NestingPatternAnotherStateMachine. NestingPatternMainStateMachine orchestrates the other nested state machines sequentially or in parallel.
Select a state machine, then choose Edit to edit the workflow. In the NestingPatternMainStateMachine, the first state triggers the nested workflow NestingPatternAnotherStateMachine. You can pass necessary parameters to the nested workflow by using Parameters and Input as shown in the example below with Parameter1 and Parameter2. Once the first nested workflow completes successfully, the second nested workflow is triggered. If the result of the first nested workflow is not successful, the NestingPatternMainStateMachine fails with the Fail state.
Select nested workflow NestingPatternAnotherStateMachine, and then select Edit to add AWS Lambda functions to start a job and poll the state of the jobs. This can be any asynchronous job that needs to be polled to query its state. Based on expected job duration, the Wait state can be configured for 10-20 seconds. If the workflow is successful, the main workflow returns a successful result.

Use cases and limitations

This approach allows encapsulation of workflows consisting of multiple sequential or parallel services. Therefore, it provides flexibility that can be used for different use cases. Services can be part of distributed applications, part of automated business processes, big data or machine learning pipelines using AWS services.

Each nested workflow is responsible for an individual step in the main workflow, providing flexibility and scalability. Hundreds of nested workflows can run and be monitored in parallel with the main workflow (see AWS Step Functions Service Quotas).

The approach described here is not applicable for custom service interactions faster than 1 second, since it is the minimum configurable value for a wait step.

Nested workflow encapsulation

Similar to the principle of encapsulation in object-oriented programming, you can use a nested workflow for different interactions with a custom service. You can dynamically pass input parameters to the nested workflow during workflow execution and receive return values. This way, you can define a clear interface between the nested workflow and the parent workflow with different actions and integrations. Depending on the use-case, a custom service may offer a variety of different actions that must be integrated into workflows that run in Step Functions, but can all be combined into a single workflow.

Debugging and tracing

Additionally, debugging and tracing can be done through Execution Event History in the State Machine Management Console. In the Resource column, you can find a link to the executed nested step function. It can be debugged in case of any error in the nested Step Functions workflow.

However, debugging can be challenging in case of multiple parallel nested workflows. In such cases, AWS X-Ray can be enabled to visualize the components of a state machine, identify performance bottlenecks, and troubleshoot requests that have led to an error.

To enable AWS X-Ray in AWS Step Functions:

Open the Step Functions console and choose Edit state machine.
Scroll down to Tracing settings, and Choose Enable X-Ray tracing.

For detailed information regarding AWS X-Ray and AWS Step Functions please refer to the following documentation: https://docs.aws.amazon.com/step-functions/latest/dg/concepts-xray-tracing.html

Conclusion

This blog post describes how to compose a nested Step Functions workflow, which asynchronously manages a custom service using the polling mechanism.

To learn more about how to use AWS Step Functions workflows for serverless microservices orchestration, visit Serverless Land.

Decoding protobuf messages using AWS Lambda

2022-03-07 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/decoding-protobuf-messages-using-aws-lambda/

This post is written by Ennio Pastore, Data Lab Architect.

Protobuf is short for protocol buffers, which are language- and platform-neutral mechanisms for serializing structured data. Compared to XML or JSON the size of the messages is smaller, so the network transfer is faster, reducing latency in the interactions between applications. They are commonly used in communications protocols like RPC systems, for persistent storage of data in a variety of storage systems, and in use-cases ranging from data analysis pipelines to mobile clients.

Since the protobuf messages are encoded in a binary format, they are non-human readable and in order to be processed you have to decode them. You define how you want your data to be structured once, then you can use generated source code to read and write structured data more easily. You can use a variety of languages to read and write data from a variety of data streams. Currently the supported languages are C++, C#, Dart, Go, Java, Kotlin, Python.

This blog post shows you how to decode protobuf messages in a data stream processing application using AWS Lambda functions.

Overview

This example assumes you are already receiving protobuf messages in an Amazon Kinesis Data Streams.

You will learn how to deploy a Lambda function that decodes protobuf messages and store them in JSON format in an Amazon S3 bucket.

To achieve this, create an AWS Lambda layer (step 1) containing the protobuf libraries that are required for the decoding. You can use any development environment where you can install Python 3.x and pip to create the Lambda layers.

After creating the layer, you can include it in the Lambda function (step 2) and you can implement the logic to decode the messages.

Prerequisites

You need the following prerequisites to deploy the solution:

AWS account
AWS CLI
AWS Serverless Application Model (AWS SAM) CLI
Python 3.9
An AWS Identity and Access Management (IAM) role with appropriate access.
Python source code for protobuf

To generate the Python source code required to decode protobuf data, you need a development environment with Python (3.x) and pip already installed.

You can use a local machine, an Amazon EC2 instance, or if you cannot install Python locally, use AWS Cloud9.

Generation of the Python source code for protobuf

Generate the Python source code required for the protobuf encoding and decoding, starting from the proto definition file. This code can be generated using the protobuf compiler from the proto definition file.

Create the proto definition file:

cat > /home/ec2-user/environment/demo.proto << ENDOFFILE
syntax = "proto3";
message demo {
  optional int32 id = 1;
  optional string name = 2;
  optional int32 timevalue = 3;
  optional string event = 4;
}
ENDOFFILE

Compile this file with the protobuf compiler (protoc) to generate the Python source code required for the protobuf encoding/decoding. The generated code only works for the classes defined in the proto definition file.

wget 
https://github.com/protocolbuffers/protobuf/releases/download/v3.19.1/protoc-3.19.1-linux-x86_64.zip

unzip protoc-3.19.1-linux-x86_64.zip

mkdir /home/ec2-user/environment/output

/home/ec2-user/environment/bin/protoc -I=/home/ec2-user/environment/ --python_out=/home/ec2-user/environment/output demo.proto

Create the Lambda layer

In your development environment, in the output directory, create a new directory named protobuf. Install the protobuf libraries locally:
```
mkdir -p ~/environment/output/protobuf
cd ~/environment/output/protobuf
mkdir python
cd python
pip3 install protobuf --target .
```

Include the Python source code to the libraries installed locally:

mkdir custom
cd custom
cp ~/environment/output/demo_pb2.py .
echo 'custom' >> ~/environment/output/protobuf/python/protobuf-3.19.1.dist-info/namespace_packages.txt
echo 'custom/demo_pb2.py' >> ~/environment/output/protobuf/python/protobuf-3.19.1.dist-info/RECORD
echo 'custom' >> ~/environment/output/protobuf/python/protobuf-3.19.1.dist-info/top_level.txt

Zip the Python folder:

cd ~/environment/output/protobuf
zip -r protobuf.zip .

The Lambda layer is ready. If you built it on a remote instance, you must download it in your local machine.

Step 4: Adding the Protobuf Layer to Lambda

Add the layer created in the previous steps to Lambda:

From the AWS Management Console select the Lambda service and choose Create a Layer:
Enter the name protobuf-lambda and upload the protobuf.zip that you created in the previous step.
Once the upload is complete, select x86_64 compatible architecture and select the corresponding Python runtime versions.

Implementation

The full source of the solution is in the GitHub repository and is deployed with AWS SAM.

Clone the solution repository using git:

git clone https://github.com/aws-samples/lambda-protobuf-decoder

Build the AWS SAM project:
```
sam build
```
Deploy the project using AWS SAM and the AWS SAM CLI. Follow the prompts, entering:
1. The name of the Kinesis Data Stream containing the protobuf messages
2. The name of the S3 Bucket that will be used to store the decoded messages
3. The name of your previously created AWS Lambda layer.For all other prompts select “Y”.

Deploy the project using AWS SAM:

sam deploy --guided --capabilities CAPABILITY_NAMED_IAM

The stack is complete when the message “Successfully created/updated stack”. If the stack fails, find the resources that failed to create and troubleshoot any issues.

Testing the AWS SAM stack

Once the AWS SAM stack is successfully deployed, navigate to the Lambda service and choose “protobuf-decoder-lambda”.
Choose the “Monitoring” tab, then “View logs in CloudWatch”:
Select the top Log stream from the list. The logs show for each message the original protobuf message and the decoded message:

Check that all the messages are stored correctly in JSON format in the S3 bucket:

Navigate to the Amazon S3 console and find the destination bucket you specified in the AWS SAM template.
There are multiple files. Select one and choose Actions -> Query with S3 Select.
In the “Input settings” panel and “Output settings” panels, for the “Format” options, select the value JSON.
In the “SQL query” panel, using the default query, choose Run SQL Query. You can see that the content of the object in the S3 bucket is a JSON message.

Cleaning up

If you have generated any events, empty the S3 bucket before deleting the entire stack. If you do not, the data will not be deleted.

To delete the stack, use the AWS SAM CLI. Assuming the stack name is protodecoder, run:

sam delete --stack-name protodecoder

Conclusion

This post shows how to create a Lambda function to decode in real-time protobuf messages. You import the proto message definition in a development environment and compile it to generate the Python source code.

You create the Lambda layer for the protobuf decoding Lambda function, integrating the Python source code previously created with the protobuf libraries. Using AWS SAM, you create the Lambda function including the protobuf libraries.

If you want to dig deeper into Lambda functions, see What is AWS Lambda? To extend the Lambda function to interact with multiple AWS services, see the Boto3 documentation.

For more serverless learning resources, visit Serverless Land.

Migrating a monolithic .NET REST API to AWS Lambda

2022-03-02 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/migrating-a-monolithic-net-rest-api-to-aws-lambda/

This post is written by James Eastham, Cloud Infrastructure Architect.

There are many ways to deploy a .NET application to AWS. From a single process ASP.NET core web API hosted on an EC2 instance to a serverless API backed by AWS Lambda. This post explains key topics to simplify your move from monolith to serverless.

The .NET Framework launched in 2002. This means that there are years’ worth of existing .NET application code that can benefit from moving to a serverless architecture. With the release of the AWS Porting Assistant for .NET and the AWS Microsoft Extractor for .NET, AWS tooling can assist directly with this modernization.

These tools help modernization but don’t migrate the compute layer from traditional servers to serverless technology.

Hexagonal architecture

The hexagonal architecture pattern proposes the division of a system into loosely coupled and interchangeable components. The application and business logic sit at the core of the application.

The next layer up is a set of interfaces that handle bidirectional communication from the core business logic layer. Implementation details are moved to the outside. The inputs (API controllers, UI, consoles, test scripts) and outputs (database implementations, message bus interactions) are at the perimeter.

The chosen compute layer becomes an implementation detail, not a core part of the system. It allows a cleaner process for migrating any integrations, from the frontend, to the compute layer and underlying database engine.

Code examples

The GitHub repo contains the code examples from this post with instructions for deploying the migrated serverless application.

The repository contains a .NET core REST API. It uses MySQL for its database engine and relies on an external API as part of its business logic. It also contains a migrated serverless version of the same application that you can deploy to your AWS account. This uses a combination of the AWS Cloud Development Kit (CDK) and the AWS Serverless Application Model (AWS SAM) CLI.

The architecture of the deployed monolithic application is:

After migrating the application to Lambda, the architecture is:

Integrations

Modern web applications rely on databases, file systems, and even other applications. With first class support for dependency injection in .NET Core, managing these integrations is simpler.

The following code snippet is taken from the BookingController.cs file. It shows how required interfaces are injected into the constructor of the controller. One of the controller methods uses the injected interface to list bookings from the BookingRepository.

    [ApiController]
    [Route("[controller]")]
    public class BookingController : ControllerBase
    {
        private readonly ILogger<BookingController> _logger;
        private readonly IBookingRepository _bookingRepository;
        private readonly ICustomerService _customerService;

        public BookingController(ILogger<BookingController> logger,
            IBookingRepository bookingRepository,
            ICustomerService customerService)
        {
            this._logger = logger;
            this._bookingRepository = bookingRepository;
            this._customerService = customerService;
        }

        /// <summary>
        /// HTTP GET endpoint to list all bookings for a customer.
        /// </summary>
        /// <param name="customerId">The customer id to list for.</param>
        /// <returns>All <see cref="Booking"/> for the given customer.</returns>
        [HttpGet("customer/{customerId}")]
        public async Task<IActionResult> ListForCustomer(string customerId)
        {
            this._logger.LogInformation($"Received request to list bookings for {customerId}");

            return this.Ok(await this._bookingRepository.ListForCustomer(customerId));
        }
}

The implementation of the IBookingRepository is configured at startup using dependency injection in the Startup.cs file.

services.AddTransient<IBookingRepository, BookingRepository>();

This works when using an ASP.NET Core Web API project, since the framework abstracts much of the complexity and configuration. But it’s possible to apply the same practices for .NET core code running in Lambda.

Configuring dependency injection in AWS Lambda

The startup logic is moved to a standalone DotnetToLambda.Serverless.Config library. This allows you to share the dependency injection configuration between multiple Lambda functions. This library contains a single static class named ServerlessConfig.

There is little difference between this file and the Startup.cs file:

public void ConfigureServices(IServiceCollection services)
{
	var databaseConnection =
		new DatabaseConnection(this.Configuration.GetConnectionString("DatabaseConnection"));
	
	services.AddSingleton<DatabaseConnection>(databaseConnection);
	
	services.AddDbContext<BookingContext>(options =>
		options.UseMySQL(databaseConnection.ToString()));

	services.AddTransient<IBookingRepository, BookingRepository>();
	services.AddHttpClient<ICustomerService, CustomerService>();
	
	services.AddControllers();
}

And the configuration method in the ServerlessConfig class:


public static void ConfigureServices()
{
	var client = new AmazonSecretsManagerClient();
	
	var serviceCollection = new ServiceCollection();

	var connectionDetails = LoadDatabaseSecret(client);

	serviceCollection.AddDbContext<BookingContext>(options =>
		options.UseMySQL(connectionDetails.ToString()));
	
	serviceCollection.AddHttpClient<ICustomerService, CustomerService>();
	serviceCollection.AddTransient<IBookingRepository, BookingRepository>();
	serviceCollection.AddSingleton<DatabaseConnection>(connectionDetails);
	serviceCollection.AddSingleton<IConfiguration>(LoadAppConfiguration());

	serviceCollection.AddLogging(logging =>
	{
		logging.AddLambdaLogger();
		logging.SetMinimumLevel(LogLevel.Debug);
	});

	Services = serviceCollection.BuildServiceProvider();
}

The key addition is the manual creation of the ServiceCollection object on line 27 and the call to BuildServiceProvider on line 45. In.NET core the framework abstracts away this manual object initialization. The created ServiceProvider is then exposed as a read-only property of the ServerlessConfig class. All we have done is taken the boilerplate code that an ASP.NET Core web API performs behind the scenes and brought it into the foreground.

This allows you to copy and paste large parts of the startup configuration directly from the web API and re-use it in your Lambda functions.

Lambda API controllers

For the function code, follow a similar process. For example, here is the ListForCustomer endpoint re-written for Lambda:

 public class Function
{
	private readonly IBookingRepository _bookingRepository;
	private readonly ILogger<Function> _logger;
	
	public Function()
	{
		ServerlessConfig.ConfigureServices();

		this._bookingRepository = ServerlessConfig.Services.GetRequiredService<IBookingRepository>();
		this._logger = ServerlessConfig.Services.GetRequiredService<ILogger<Function>>();
	}
	
	public async Task<APIGatewayProxyResponse> FunctionHandler(APIGatewayProxyRequest apigProxyEvent, ILambdaContext context)
	{
		if (!apigProxyEvent.PathParameters.ContainsKey("customerId"))
		{
			return new APIGatewayProxyResponse
			{
				StatusCode = 400,
				Headers = new Dictionary<string, string> { { "Content-Type", "application/json" } }
			};
		}

		var customerId = apigProxyEvent.PathParameters["customerId"];
		
		this._logger.LogInformation($"Received request to list bookings for: {customerId}");

		var customerBookings = await this._bookingRepository.ListForCustomer(customerId);
		
		return new APIGatewayProxyResponse
		{
			Body = JsonSerializer.Serialize(customerBookings),
			StatusCode = 200,
			Headers = new Dictionary<string, string> { { "Content-Type", "application/json" } }
		};
	}
}

The function constructor calls the startup configuration. This allows the initial configuration to be re-used while the Lambda execution environment is still active. Once the services have been configured any required interfaces can be retrieved from the services property of the ServerlessConfig class.

The second key differences are the mapping of the inbound request and response back to API Gateway. The HTTP request arrives as an event and the contents must be manually parsed out of the raw HTTP data. The same applies to the HTTP response, which must be constructed manually. Other than these two differences, it’s a copy from the original BookingController.

Application configuration

An ASP.NET Core Web API contains an appsettings.json file, which contains runtime specific configuration. The framework handles loading the file and exposing it as an injectable IConfiguration interface. It’s also possible to load settings from environment variables.

This is still possible when using Lambda. You can package an appsettings.json file with the compiled code and load it manually at runtime. However, when using Lambda as the compute layer, there are AWS-specific options for managing configuration.

Environment variables

Lambda environment variables are used to add runtime configuration, as shown in the template.yaml file:

 Environment:
	Variables:
		SERVICE: bookings
		DATABASE_CONNECTION_SECRET_ID: !Ref SecretArn

This AWS SAM configuration adds an environment variable named DATABASE_CONNECTION_SECRET_ID. You can access this in Lambda the same way an environment variable is accessed in any C# application:

 var databaseConnectionSecret = client.GetSecretValueAsync(new GetSecretValueRequest()
            {
                SecretId = Environment.GetEnvironmentVariable("DATABASE_CONNECTION_SECRET_ID"),
            }).Result;

This is the simplest way to add runtime configuration. The variables are stored in plaintext and any change requires a redeployment or manual interaction.

External configuration services

AWS has services that allow you to move application configuration outside of the function code. These include AWS Systems Manager Parameter Store, AWS AppConfig and AWS Secrets Manager.

You can use Parameter Store to store plaintext parameters that can also be encrypted using the AWS Key Management Service. The contents of the appsettings.json file from the ASP.NET Core API is directly copied into the parameter string and deployed using the AWS CDK.

 var parameter = new StringParameter(this, "dev-configuration", new StringParameterProps()
{
	ParameterName = "dotnet-to-lambda-dev",
	StringValue = "{\"CustomerApiEndpoint\": \"https://jsonplaceholder.typicode.com/users\"}",
	DataType = ParameterDataType.TEXT,
	Tier = ParameterTier.STANDARD,
	Type = ParameterType.STRING,
	Description = "Dev configuration for dotnet to lambda",
});

This JSON data is loaded as part of the startup configuration. The IConfiguration implementation is then built manually using the parameter string.

 private static IConfiguration LoadAppConfiguration()
{
	var client = new AmazonSimpleSystemsManagementClient();
	var param = client.GetParameterAsync(new GetParameterRequest()
	{
		Name = "dotnet-to-lambda-dev"
	}).Result;
	
	return new ConfigurationBuilder()
		.AddJsonStream(new MemoryStream(Encoding.ASCII.GetBytes(param.Parameter.Value)))
		.Build();

The second configuration mechanism is Secrets Manager. This helps protect secrets and provides easier rotation and management of database credentials.

Amazon RDS is integrated with Secrets Manager. When creating a new RDS instance, the database connection details can be automatically encrypted and persisted as a secret. The details for the MySQL instance are stored in Secrets Manager and are not exposed. These connection details can be accessed as part of the startup configuration using the Secrets Manager SDK.

private static DatabaseConnection LoadDatabaseSecret(AmazonSecretsManagerClient client)
{
	var databaseConnectionSecret = client.GetSecretValueAsync(new GetSecretValueRequest()
	{
		SecretId = Environment.GetEnvironmentVariable("DATABASE_CONNECTION_SECRET_ID"),
	}).Result;

	return JsonSerializer
		.Deserialize<DatabaseConnection>(databaseConnectionSecret.SecretString);
}

The Lambda functions require IAM permissions to access both Secrets Manager and Parameter Store. AWS SAM includes pre-defined policy templates that you can add to the template. Four lines of YAML apply the required Secrets Manager and SSM permissions:

Policies:
	- AWSSecretsManagerGetSecretValuePolicy:
		SecretArn: !Ref SecretArn
	- SSMParameterReadPolicy:
		ParameterName: dotnet-to-lambda-dev

For a full list, see the policy template list.

Networking

The final architectural component is the network. Lambda functions are deployed into a VPC owned by the service. The function can access anything available on the public internet such as other AWS services, HTTPS endpoints for APIs, or services and endpoints outside AWS. The function then has no way to connect to your private resources inside of your VPC.

When deploying an RDS instance into AWS, it’s best practice to place the database in a private subnet with external ingress. If Lambda uses RDS, you must create a connection between the Lambda service VPC and your VPC. The details of this networking component can be found in this blog article.

The AWS SAM template defines this networking configuration:

VpcConfig:
	SubnetIds:
	  - !Ref PrivateSubnet1
	  - !Ref PrivateSubnet2
	SecurityGroupIds:
	  - !Ref SecurityGroup

In this example, the networking configuration is applied globally. This means that the same configuration is applied to all Lambda functions in the template. The functions here are deployed across two subnets and one security group. Learn more about the steps for configuring the subnets and security groups for RDS access in this article.

The specific values for the subnets and security groups are taken from environment variables. When running locally, you can provide these variables manually. When deploying via CICD, these variables can be changed dynamically based on the stage of the pipeline.

 PrivateSubnet1:
	Description: 'Required. Private subnet 1. Output from cdk deploy'
	Type: 'String'
PrivateSubnet2:
	Description: 'Required. Private subnet 2. Output from cdk deploy'
	Type: 'String'
SecurityGroup:
	Description: 'Required. Security group. Output from cdk deploy'
	Type: 'String'

Conclusion

This blog post shows the required considerations for migrating a .NET core REST API to AWS Lambda. You can now start to look at your existing code base and make an informed decision whether Lambda is for you. With the right abstractions and configuration, you can migrate a .NET core API to Lambda compute with copy and paste.

For more serverless learning resources, visit Serverless Land.

Audit AWS service events with Amazon EventBridge and Amazon Kinesis Data Firehose

2022-03-01 Anand Shah

Post Syndicated from Anand Shah original https://aws.amazon.com/blogs/big-data/audit-aws-service-events-with-amazon-eventbridge-and-amazon-kinesis-data-firehose/

Amazon EventBridge is a serverless event bus that makes it easy to build event-driven applications at scale using events generated from your applications, integrated software as a service (SaaS) applications, and AWS services. Many AWS services generate EventBridge events. When an AWS service in your account emits an event, it goes to your account’s default event bus.

The following are a few event examples:

Amazon EC2 Auto Scaling events
Amazon CloudWatch alarm status change events
Events generated from developer tools on AWS services like AWS CodeCommit and AWS CodeDeploy
AWS Step Functions state change events
Analytics on AWS services like AWS Glue, Amazon Redshift, and Amazon EMR

By default, these AWS service-generated events are transient and therefore not retained. This post shows how you can forward AWS service-generated events or custom events to Amazon Simple Storage Service (Amazon S3) for long-term storage, analysis, and auditing purposes using EventBridge rules and Amazon Kinesis Data Firehose.

Solution overview

In this post, we provide a working example of AWS service-generated events ingested to Amazon S3. To make sure we have some service events available in default event bus, we use Parameter Store, a capability of AWS Systems Manager to store new parameters manually. This action generates a new event, which is ingested by the following pipeline.

Architecture Diagram

The pipeline includes the following steps:

AWS service-generated events (for example, a new parameter created in Parameter Store) goes to the default event bus at EventBridge.
The EventBridge rule matches all events and forwards those to Kinesis Data Firehose.
Kinesis Data Firehose delivers events to the S3 bucket partitioned by detail-type and receipt time using its dynamic partitioning capability.
The S3 bucket stores the delivered events, and their respective event schema is registered to the AWS Glue Data Catalog using an AWS Glue crawler.
You query events using Amazon Athena.

Deploy resources using AWS CloudFormation

We use AWS CloudFormation templates to create all the necessary resources for the ingestion pipeline. This removes opportunities for manual error, increases efficiency, and provides consistent configurations over time. The template is also available on GitHub.

Complete the following steps:

Click here to
Acknowledge that the template may create AWS Identity and Access Management (IAM) resources.
Choose Create stack.

The template takes about 10 minutes to complete and creates the following resources in your AWS account:

An S3 bucket to store event data.
A Firehose delivery stream with dynamic partitioning configuration. Dynamic partitioning enables you to continuously partition streaming data in Kinesis Data Firehose by using keys within the data (for example, customer_id or transaction_id) and then deliver the data grouped by these keys into corresponding S3 prefixes.
An EventBridge rule that forwards all events from the default event bus to Kinesis Data Firehose.
An AWS Glue crawler that references the path to the event data in the S3 bucket. The crawler inspects data landed to Amazon S3 and registers tables as per the schema with the AWS Glue Data Catalog.
Athena named queries for you to query the data processed by this example.

Trigger a service event

After you create the CloudFormation stack, you trigger a service event.

On the AWS CloudFormation console, navigate to the Outputs tab for the stack.
Choose the link for the key CreateParameter.

Create Parameter

You’re redirected to the Systems Manager console to create a new parameter.

For Name, enter a name (for example, my-test-parameter).
For Value, enter the test value of your choice (for example, test-value).

My Test parameter

Leave everything else as default and choose Create parameter.

This step saves the new Systems Manager parameter and pushes the parameter-created event to the default EventBridge event bus, as shown in the following code:

{
  "version": "0",
  "id": "6a7e4feb-b491-4cf7-a9f1-bf3703497718",
  "detail-type": "Parameter Store Change",
  "source": "aws.ssm",
  "account": "123456789012",
  "time": "2017-05-22T16:43:48Z",
  "region": "us-east-1",
  "resources": [
    "arn:aws:ssm:us-east-1:123456789012:parameter/foo"
  ],
  "detail": {
    "operation": "Create",
    "name": "my-test-parameter",
    "type": "String",
    "description": ""
  }
}

Discover the event schema

After the event is triggered by saving the parameter, wait at least 2 minutes for the event to be ingested via Kinesis Data Firehose to the S3 bucket. Now complete the following steps to run an AWS Glue crawler to discover and register the event schema in the Data Catalog:

On the AWS Glue console, choose Crawlers in the navigation pane.
Select the crawler with the name starting with S3EventDataCrawler.
Choose Run crawler.

Run Crawler

This step runs the crawler, which takes about 2 minutes to complete. The crawler discovers the schema from all events and registers it as tables in the Data Catalog.

Query the event data

When the crawler is complete, you can start querying event data. To query the event, complete the following steps:

On the AWS CloudFormation console, navigate to the Outputs tab for your stack.
Choose the link for the key AthenaQueries.

Athena Queries

You’re redirected to the Saved queries tab on the Athena console. If you’re running Athena queries for the first time, set up your S3 output bucket. For instructions, see Working with Query Results, Recent Queries, and Output Files.

Search for Blog to find the queries created by this post.
Choose the query Blog – Query Parameter Store Events.

Find Athena Saved Queries

The query opens on the Athena console.

Choose Run query.

You can update the query to search the event you created earlier.

Apply a WHERE clause with the parameter name you selected earlier:

SELECT * FROM "AwsDataCatalog"."eventsdb-randomId"."parameter_store_change"
WHERE detail.name = 'Your event name'

You can also choose the link next to the key CuratedBucket from the CloudFormation stack outputs to see paths and the objects loaded to the S3 bucket from other event sources. Similarly, you can query them via Athena.

Clean up

Complete the following steps to delete your resources and stop incurring costs:

On the AWS CloudFormation console, select the stack you created and choose Delete.
On the Amazon S3 console, find the bucket with the name starting with eventbridge-firehose-blog-curatedbucket.
Select the bucket and choose Empty.
Enter permanently delete to confirm the choice.
Select the bucket again and choose Delete.
Confirm the action by entering the bucket name when prompted.
On the Systems Manager console, go to the parameter store and delete the parameter you created earlier.

Summary

This post demonstrates how to use an EventBridge rule to redirect AWS service-generated events or custom events to Amazon S3 using Kinesis Data Firehose to use for long-term storage, analysis, querying, and audit purposes.

For more information, see the Amazon EventBridge User Guide. To learn more about AWS service events supported by EventBridge, see Events from AWS services.

About the Author

Anand Shah is a Big Data Prototyping Solution Architect at AWS. He works with AWS customers and their engineering teams to build prototypes using AWS analytics services and purpose-built databases. Anand helps customers solve the most challenging problems using the art of the possible technology. He enjoys beaches in his leisure time.

Avoid affecting your production environment during migration with AWS Application Migration Service

2022-02-28 Michael Spence

Post Syndicated from Michael Spence original https://aws.amazon.com/blogs/architecture/avoid-affecting-your-production-environment-during-migration-with-aws-application-migration-service/

Customers commonly use AWS Application Migration Service to migrate Active Directory joined Windows or Linux servers to Amazon Web Services (AWS). However, this process can affect the production environment during testing. For example, if you update DNS addresses during testing, clients that try to reach the original server will be redirected to the testing server.

To avoid this issue, you could launch instances in a completely isolated network environment. However, this may not be an option for you if your application stack depends on other applications or services.

In this post, we offer an architecture and process that can be applied to applications that cannot be completely isolated during testing. It uses Microsoft Domain Name System (DNS) Query Policies and a Group Policy and does not affect the production environment.

Overview of solution

Using a simple application stack of a WordPress site as an example, we’ll show you how to implement this solution. We’ll also discuss the prequisites required to implement this solution and describe potential domain issues you may encounter and how to solve them.

Figure 1 shows a view of the testing architecture on-premises and in AWS:

It uses an Amazon Virtual Private Cloud (Amazon VPC) that was set up specifically for testing.
Within this Amazon VPC, a management subnet that runs on Amazon Elastic Compute Cloud (Amazon EC2) contains the production environment’s self-managed domain controller. This domain controller provides authentication and DNS services to the instances launched during testing.
The client that tests the application is located within the same private subnet as the applications being tested.

Figure 1. Testing architecture design

Prerequisites

To implement this solution, you will need:

An AWS account
Network connectivity between an on-premises server (or other cloud) and AWS
Active Directory (Windows Server 2016 or later)
Active Directory Integrated DNS
Basic understanding of the following technologies:
- Application Migration Service
- EC2 launch templates
- Active Directory (Windows Version 2016, 2019, or 2022)
- Microsoft Group Policy
- Active Directory Integrated DNS
- Windows PowerShell

Potential domain issues and how to solve them

Application Migration Service is a block-level replication of the original source server, meaning that the replicated server is an exact copy of the original server. During testing, the original source server and the server that was launched for testing are active on the domain at the same time.

This has the potential to create these issues:

Each server joined to an Active Directory domain has an associated computer object, and that object has a password. The server will change this password automatically every 30 days by default. If either server changes the password for that object, it will cause the other server to fail because it cannot establish a secure session with a domain controller.
Active Directory supports dynamic DNS updates by allowing servers to register and dynamically update their resource records. If the replicated server updates DNS with a new IP address, that change can propagate to the rest of the environment. This means that clients trying to reach the original server will be redirected to the testing server, which will affect the production environment.

By creating a new Group Policy Object (GPO) with the following settings (and as shown in Figure 2), you can avoid these potential issues.

Disable Machine Password Changes. Use reverse logic notation to enable the setting to disable the automatic password changes on the server.
Disable Dynamic Updates from the DNS Client. This setting will prevent the server from updating DNS with a new IP address when you are testing it.

Apply this newly created GPO to the servers that are being migrated. The servers can be targeted for the policy by Active Directory security group, or you can move the servers to a separate organizational unit (OU) within Active Directory and apply the GPO to that OU.

Figure 2. GPO settings

After you apply GPO settings to the server being migrated, verify the GPO settings from an administrative command prompt:

C:\> gpupdate /force

C:\> gpresult /r /scope computer

Active Directory replication

You can disable outbound replication from the domain controller in the testing VPC while still allowing inbound replication. We intentionally configured the network configuration within the testing VPC to allow connectivity to on-premises and other networks.

To do this, issue the following command from an administrative command prompt with a domain administrator privileged user:

C:\> repadmin /options <Testing DC> +disable_outbound_repl

Example walkthrough

1. Example application description

Let’s look at a simple application stack of a WordPress site. It consists of a single web server and a single database server that are being migrated from on-premises to AWS. The WordPress stack could be Linux or Windows. The WordPress configuration on the web server points to the database by hostname only (not by IP address).

When the web server starts up during the testing phase, it issues a DNS query to find the database server by name. Note that we don’t want to return the IP address of the current production database server; we want the one currently being tested. When our workstation in the testing VPC opens up a browser to the WordPress site, we also want it to return the IP address of the web server being tested.

For reference, Table 1 reflects the IP addresses, network information, and URL used in the testing example.

Table 1. Example values for the testing scenario
Testing Private Subnet CIDR	10.0.67.0/24
Domain Zone Name	awslaunch.local
WordPress Site URL	http://wswpweb1

Hostname	Original IP Address	Testing IP Address
wswpweb1	10.0.4.171	10.0.67.171
wswpsql1	10.0.4.202	10.0.67.202

2. Modify EC2 launch templates

To ensure that we launch the EC2 instances with the IP addresses we specified in Table 1, let’s change the EC2 launch template associated with each of the source servers from within Application Migration Service. The EC2 launch template should specify the testing subnet and the IP address to be assigned to the test instances.

Figure 3. EC2 launch template settings

Once you’ve modified the EC2 launch template, be sure to set it as the “Default” template because this is the version of the template that Application Migration Service will launch upon testing.

3. Create Microsoft DNS Query Policies

Now that we’ve launched the EC2 instances for testing, let’s set up DNS Query Policies for DNS to return an alternate set of IP addresses during the testing phase.

We’ll set these policies to return alternate IP addresses from DNS for the servers involved in testing and only within the boundary of the testing private subnet (Policy 3.4). By doing this, if any client outside of the testing private subnet queries this domain controller, it will return the original IP addresses and not the testing ones.

Because DNS Query Policies are specific to a specific domain controller, let’s issue the following PowerShell commands on the domain controller that is active in our testing VPC (AD-DC-2 in Figure 1) to create the following policies:

Policy 3.1 A client subnet that represents our private subnet in the testing VPC. This is a one-time configuration.

Add-DnsServerClientSubnet -Name
"MGN-Testing-AZ1" -IPv4Subnet "10.0.67.0/24"

Policy 3.2 A zone scope within the domain zone name. This is a one-time configuration.

Add-DnsServerZoneScope -ZoneName
"awslaunch.local" -Name "MGN-Testing-AZ1-Scope"

Policy 3.3 Alternative IP addresses in the new zone scope.

Add-DnsServerResourceRecord -ZoneName
"awslaunch.local" -A -Name "wswpweb1" -IPv4Address
"10.0.67.171" -ZoneScope "MGN-Testing-AZ1-Scope"

Add-DnsServerResourceRecord -ZoneName
"awslaunch.local" -A -Name "wswpsql1" -IPv4Address
"10.0.67.202" -ZoneScope "MGN-Testing-AZ1-Scope"

Policy 3.4 A query resolution policy that returns alternate IP addresses for our testing servers without impacting the rest of the domain.

Add-DnsServerQueryResolutionPolicy -Name
"MGN-Testing-AZ1-Policy" -Action ALLOW -FQDN "eq,wswpweb1.awslaunch.local,wswpsql1.awslaunch.local"
-ClientSubnet "eq,MGN-Testing-AZ1" -ZoneScope
"MGN-Testing-AZ1-Scope,1" -ZoneName "awslaunch.local"

4. Verify current production IP addresses

Now that we have our policies set, let’s make sure that the application stack IP addresses are still returning correctly from the current production environment with the following command:

C:\>nslookup wswpweb1

Server: WSDC1.awslaunch.local

Address: 10.0.5.10

Name: wswpweb1.awslaunch.local

Address: 10.0.4.171

C:\>nslookup wswpsql1

Server: WSDC1.awslaunch.local

Address: 10.0.5.10

Name: wswpsql1.awslaunch.local

Address: 10.0.4.202

These are the correct IP addresses. They match the original IP addresses from Table 1.

5. Verify testing IP addresses

Now let’s verify that the application stack IP addresses are returning correctly from the testing environment.

C:\>nslookup wswpweb1

Server: WSDC5.awslaunch.local

Address: 10.0.69.10

Name: wswpweb1.awslaunch.local

Address: 10.0.67.171

C:\>nslookup wswpsql1

Server: WSDC5.awslaunch.local

Address: 10.0.69.10

Name: wswpsql1.awslaunch.local

Address: 10.0.67.202

There are the correct addresses. They match the testing IP addresses from Table 1.

We can also verify that the WordPress site is functional by opening a browser from our testing client and browsing to the URL (https://wswpweb1, listed in Table 1).

Figure 4. WordPress site verification

Additionally, we can verify that the web server is connecting to the correct SQL server database by issuing the following command when logged into the WordPress web server:

C:\ >netstat | findstr /c:"ms-sql-s"

TCP 10.0.67.171:50355 10.0.67.202:ms-sql-s TIME_WAIT

TCP 10.0.67.171:50361
10.0.67.202:ms-sql-s
TIME_WAIT

6. Cleanup

After you’ve tested everything, mark the source servers in Application Migration Service as “Ready for Cutover.” This will terminate the EC2 testing instances.

The following PowerShell commands will remove the DNS resource records and the DNS Query Policy:

Remove-DnsServerResourceRecord -ZoneName
"awslaunch.local" -ZoneScope "MGN-Testing-AZ1-Scope"
-RRType "A" -Name "wswpweb1"

Remove-DnsServerResourceRecord -ZoneName
"awslaunch.local" -ZoneScope "MGN-Testing-AZ1-Scope"
-RRType "A" -Name "wswpsql1"

Remove-DnsServerQueryResolutionPolicy -Name
"MGN-Testing-AZ1-Policy" -ZoneName "awslaunch.local"

This command will re-enable replication from the testing domain controller.

C:\> repadmin /options <Testing DC>
-disable_outbound_repl

Once the servers are launched for final cutover, you can remove the Group Policy applied to the servers to allow the server to resume password changes and dynamic DNS updates.

Conclusion

When you migrate your server resources to AWS with Application Migration Service, the testing phase of the migration lifecycle is critical. It verifies that your application will function properly within the AWS environment before committing to a final cutover. The testing phase can uncover unexpected application dependencies, gauge application performance, and deliver the confidence that your migration to AWS will be successful without risks to your production environment.

In this blog post, we explored an architecture and process that uses Microsoft DNS Query Policies and Group Policy to address potential domain issues and avoids affecting the production environment. This testing framework persists during the entire cloud migration and across multiple application migrations. Once migration is complete, the testing architecture can be removed or maintained for future cloud migrations.

Ready to get started? Read more about the AWS Application Migration Service and other blog posts related to migrations on the AWS Cloud Enterprise Strategy Blog and the AWS Architecture Blog.

Looking for more architecture content? AWS Architecture Center provides reference architecture diagrams, vetted architecture solutions, Well-Architected best practices, patterns, icons, and more!

Automate code reviews with Amazon CodeGuru Reviewer

2022-02-15 Dhiraj Thakur

Post Syndicated from Dhiraj Thakur original https://aws.amazon.com/blogs/devops/automate-code-reviews-with-amazon-codeguru-reviewer/

A common problem in software development is accidentally or unintentionally merging code with bugs, defects, or security vulnerabilities into your main branch. Finding and mitigating these faulty lines of code deployed to the production environment can cause severe outages in running applications and can cost unnecessary time and effort to fix.

Amazon CodeGuru Reviewer tackles this issue using automated code reviews, which allows developers to fix the issue based on automated CodeGuru recommendations before the code moves to production.

This post demonstrates how to use CodeGuru for automated code reviews and uses an AWS CodeCommit approval process to set up a code approval governance model.

Solution overview

In this post, you create an end-to-end code approval workflow and add required approvers to your repository pull requests. This can help you identify and mitigate issues before they’re merged into your main branches.

Let’s discuss the core services highlighted in our solution. CodeGuru Reviewer is a machine learning-based service for automated code reviews and application performance recommendations. CodeCommit is a fully managed and secure source control repository service. It eliminates the need to scale infrastructure to support highly available and critical code repository systems. CodeCommit allows you to configure approval rules on pull requests. Approval rules act as a gatekeeper on your source code changes. Pull requests that fail to satisfy the required approvals can’t be merged into your main branch for production deployment.

The following diagram illustrates the architecture of this solution.

With CodeCommit repository, creating a pull request and approval rule. Then run the workflow to test the code, review CodeGuru recommendations to make appropriate changes, and run the workflow again to confirm that the code is ready to be merged

The solution has three personas:

Repository admin – Sets up the code repository in CodeCommit
Developer – Develops the code and uses pull requests in the main branch to move the code to production
Code approver – Completes the code review based on the recommendations from CodeGuru and either approves the code or asks for fixes for the issue

The solution workflow contains the following steps:

The repository admin sets up the workflow, including a code repository in CodeCommit for the development group, required access to check in their code to the dev branch, integration of the CodeCommit repository with CodeGuru, and approval details.
Developers develop the code and check in their code in the dev branch. This creates a pull request to merge the code in the main branch.
CodeGuru analyzes the code and reports any issues, along with recommendations based on the code quality.
The code approver analyzes the CodeGuru recommendations and provides comments for how to fix the issue in the code.
The developers fix the issue based on the feedback they received from the code approver.
The code approver analyzes the CodeGuru recommendations of the updated code. They approve the code to merge if everything is okay.
The code gets merged in the main branch upon approval from all approvers.
An AWS CodePipeline pipeline is triggered to move the code to the preproduction or production environment based on its configuration.

In the following sections, we walk you through configuring the CodeCommit repository and creating a pull request and approval rule. We then run the workflow to test the code, review recommendations and make appropriate changes, and run the workflow again to confirm that the code is ready to be merged.

Prerequisites

Before we get started, we create an AWS Cloud9 development environment, which we use to check in the Python code for this solution. The sample Python code for the exercise is available at the link. Download the .py files to a local folder.

Complete the following steps to set up the prerequisite resources:

Set up your AWS Cloud9 environment and access the bash terminal, preferably in the us-east-1 Region.
Create three AWS Identity and Access Management (IAM) users and its roles for the repository admin, developer, and approver by running the AWS CloudFormation template.

Configuring IAM roles and users

Sign in to the AWS Management Console.
Download ‘Persona_Users.yaml’ from github
Navigate to AWS CloudFormation and click on Create Stack drop down to choose With new resouces (Standard).
click on Upload a template file to upload file form local.
Enter a Stack Name such as ‘Automate-code-reviews-codeguru-blog’.
Enter IAM user’s temp password.
Click Next to all the other default options.
Check mark I acknowledge that AWS CloudFormation might create IAM resources with custom names. Click Create Stack.

This template creates three IAM users for Repository admin, Code Approver, Developer that are required at different steps while following this blog.

Configure the CodeCommit repository

Let’s start with CodeCommit repository. The repository works as the source control for the Java and Python code.

Sign in to the AWS Management Console as the repository admin.
On the CodeCommit console, choose Getting started in the navigation pane.
Choose Create repository.

Creating AWS CodeCommit create a new repository using AWS Console

For Repository name, enter transaction_alert_repo.
Select Enable Amazon CodeGuru Reviewer for Java and Python – optional.
Choose Create.

create CodeCommit repository named transaction_alert_repo, check box on Enable Amazon CodeGuru Reviewer for Java and Python

The repository is created.

On the repository details page, choose Clone HTTPS on the Clone URL menu.

clone HTTPS link for CodeCommit repo transaction_alert_repo using clone URL menu

Copy the URL to use in the next step to clone the repository in the development environment.

Clone link for HTTPS for CodeCommit repo transaction_alert_repo is avaiable to copy

On the CodeGuru console, choose Repositories in the navigation pane under Reviewer.

You can see our CodeCommit repository is associated with CodeGuru.

CodeCommit repository is to be associated with CodeGuru

Sign in to the console as the developer.
On the AWS Cloud9 console, clone the repository, using the URL that you copied in the previous step.

This action clones the repository and creates the transaction_alert_repo folder in the environment.

git clone https://git-codecommit.us-east-.amazonaws.com/v1/repos/transaction_alert_repo
cd transaction_alert_repo
echo "This is a test file" > README.md
git add -A
git commit -m "initial setup"
git push

git clone CodeCommit repo to Cloud9 using git clone command, readme.md file is created locally and pushed back to CodeCommit repo]

Check the file in CodeCommit to confirm that the README.md file is copied and available in the CodeCommit repository.

CodeCommit repo is now pushed with readme.md file

In the AWS Cloud9 environment, choose the transaction_alert_repo folder.
On the File menu, choose Upload Local Files to upload the Python files from your local folder (which you downloaded earlier).

Upload downloaded python test files that we are going to use for this blog from local system to Cloud9

Choose Select files and upload read_file.py and read_rule.py.

Drag and drop python files on cloud9 upload UI

You can see that both files are copied in the AWS Cloud9 environment under the transaction_alert_repo folder:

git checkout -b dev
git add -A
git commit -m "initial import of files"
git push --set-upstream origin dev

Push python local files are pushed to CodeCommit repo using git push command

Check the CodeCommit console to confirm that the read_file.py and read_rule.py files are copied in the repository.

Check the CodeCommit console to verify these pushed files are available

Create a pull request

Now we create our pull request.

On the CodeCommit console, navigate to your repository and choose Pull requests in the navigation pane.
Choose Create pull request.

Create pull request for the new files added

For Destination, choose master.
For Source, choose dev.
Choose Compare to see any conflict details in merging the request.

Pull request is visible to master branch, ready to merge

If the environments are mergeable, enter a title and description.
Choose Create pull request.

Pull request is merged and CodeGuru recommendation is triggered

Create an approval rule

We now create an approval rule as the repository admin.

Sign in to the console as the repository admin.
On the CodeCommit console, navigate to the pull request you created.
On the Approvals tab, choose Create approval rule.

Creating new Approval rule for any merge action

For Rule name, enter Require an approval before merge.
For Number of approvals needed, enter 1.
Under Approval pool members, provide an IAM ARN value for the code approver.
Choose Create.

Approval Rule mentions, requires an approval before merge

Review recommendations

We can now view any recommendations regarding our pull request code review.

As the repository admin, on the CodeGuru console, choose Code reviews in the navigation pane.
On the Pull request tab, confirm that the code review is completed, as it might take some time to process.
To review recommendations, choose the completed code review.

Check CodeGuru recommendation to see avaiable recommendation

You can now review the recommendation details, as shown in the following screenshot.

Review CodeGuru review recommendation details

Sign in to the console as the code approver.
Navigate to the pull request to view its details.

check pull request in detail, check stauts and Approval status

On the Changes tab, confirm that the CodeGuru recommendation files are available.

confirm that the CodeGuru recommendation files are available

Check the details of each recommendation and provide any comments in the New comment section.

The developer can see this comment as feedback from the approver to fix the issue.

Choose Save.

In CodeGuru console developer can see this comment as feedback from the approver to fix the issue

Enter any overall comments regarding the changes and choose Save.

Enter any overall comments regarding the changes and choose save

Sign in to the console as the developer.
On the CodeCommit console, navigate to the pull request -> select the request -> click on Changes to review the approver feedback.

click on Changes to review the approver feedback in CodeCommit console

Make changes, rerun the code review, and merge the environments

Let’s say the developer makes the required changes in the code to address the issue and uploads the new code in the AWS Cloud9 environment. If CodeGuru doesn’t find additional issues, we can merge the environments.

Run the following command to push the updated code to CodeCommit:

git add -A
git commit -m "code-fixed"
git push --set-upstream origin dev

git clone CodeCommit repo to Cloud9 using git clone command, readme.md file is created locally and pushed back to CodeCommit repo

Sign in to the console as the approver.
Navigate to the code review.

CodeGuru hasn’t found any issue in the updated code, so there are no recommendations.

CodeGuru hasn’t found any issue in the updated code, so there are no recommendations avaiable this time

On the CodeCommit console, you can verify the code and provide your approval comment.
Choose Save.

Using CodeCommit console, code can be now verified for approval

On the pull request details page, choose Approve.

New code is found with no conflict and can be approved

Now the developer can see on the CodeCommit console that the pull request is approved.

Code pull request is in Approved status

Approved code is ready to be merged

Select your merge strategy. For this post, we select Fast forward merge.
Choose Merge pull request.

Fast and forward merge is used to merge the code

You can see a success message.

Success message is generated for successful merge

On the CodeCommit console, choose Code in the navigation pane for your repository.
Choose master from the branch list.

The read_file.py and read_rule.py files are available under the main branch.

the new files are also avaiable in main branch beacuse of the successful merge

Clean up the resources

To avoid incurring future charges, remove the resources created by this solution by

Conclusion

This post highlighted the benefits of CodeGuru automated code reviews. You created an end-to-end code approval workflow and added required approvers to your repository pull requests. This solution can help you identify and mitigate issues before they’re merged into your main branches.

You can get started from the CodeGuru console by integrating CodeGuru Reviewer with your supported CI/CD pipeline.

For more information about automating code reviews and check out the documentation.

About the Authors

Building custom connectors using the Amazon AppFlow Custom Connector SDK

2022-02-14 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/building-custom-connectors-using-the-amazon-appflow-custom-connector-sdk/

This post is written by Kamen Sharlandjiev, Sr. Specialist SA, Integration, Ray Jang, Principal PMT, Amazon AppFlow, and Dhiraj Mahapatro, Sr. Specialist SA, Serverless.

Amazon AppFlow is a fully managed integration service that enables you to transfer data securely between software as a service (SaaS) applications like Salesforce, SAP, Zendesk, Slack, ServiceNow, and AWS services like Amazon S3 and Amazon Redshift. Amazon AppFlow lets you run enterprise-scale data flows on a schedule, in response to business events, or on-demand.

Amazon AppFlow is a managed integration service that replaces the heavy-lifting of developing, maintaining, and updating connectors. It supports bidirectional integration between external SaaS applications and AWS services.

The Custom Connector Software Development Kit (SDK) now makes it easier to integrate with private API endpoints, proprietary applications, or other cloud services. It provides access to all available managed integrations and the ability to build your own custom integration as part of the integrated experience. The SDK is open-source and available for Java or Python.

You can deploy custom connectors built with the SDK in different ways:

Private – The connector is available only inside the AWS account where deployed.
Shared – The connector can be shared for use with other AWS accounts.
Public – Publish connectors on the AWS Marketplace for free or charge a subscription fee. For more information, refer to Sharing AppFlow connectors via AWS Marketplace.

Overview

This blog takes you through building and deploying your own Amazon AppFlow Custom Connector using the Java SDK. The sample application shows how to build your first custom connector with Amazon AppFlow.

The process of building, deploying, and using a custom connector is:

Create a custom connector as an AWS Lambda function using the Amazon AppFlow Custom Connector SDK.
Deploy the custom connector Lambda function, which provides the serverless compute for the connector.
Lambda function integrates with a SaaS application or private API.
Register the custom connector with Amazon AppFlow.
Users can now use this custom connector in the Amazon AppFlow service.

Building an Amazon AppFlow custom connector

The sample application used in this blog creates a new custom connector that implements a MySQL JDBC driver. With this connector, you can connect to a remote MySQL or MariaDB instance to read and write data.

The SDK allows you to build custom connectors and use the service’s built-in authentication support for: OAuth2, API key, and basic auth. For other use cases, such as JDBC, you must create your own custom authentication implementation.

The SDK includes the source code for an example Salesforce connector. This highlights a complete use case for a source and destination Amazon AppFlow connector using OAuth2 as authentication.

Details

There are three mandatory Java interfaces that a connector must implement:

ConfigurationHandler.java: Defines the functionality to implement connector configurations, and credentials-related operations.
MetadataHandler.java: Represents the functionality to implement for objects metadata.
RecordHandler.java: Defines functionality to implement record-related CRUD operations.

Prerequisites

Ensure that the following software is installed on your workstation:

To run the sample application:

Clone the code repository:

git clone https://github.com/aws-samples/amazon-appflow-custom-jdbc-connector.git

cd amazon-appflow-custom-jdbc-connector

After cloning the sample application, visit these Java classes for more details:

JDBCConnectorConfigurationHandler.java: A configuration handler validates credentials for the JDBC client connection, validates connector runtime settings, and describes connector configuration.
JDBCConnectorMetadataHandler.java: A metadata handler describes and lists all entities for the JDBC connector metadata
JDBCConnectorRecordsHandler.java: A record handler that defines how to implement CRUD operations for this JDBC connector

To add JDBC clients for other database engines, implement JDBCClient.java interface. The custom connector uses a Lambda function as a POJO class to handle requests. The SDK provides an abstract BaseLambdaConnectorHandler class that, which you use as follows:

import com.amazonaws.appflow.custom.connector.lambda.handler.BaseLambdaConnectorHandler;

public class JDBCConnectorLambdaHandler extends BaseLambdaConnectorHandler {

  public JDBCConnectorLambdaHandler() {
    super(
      new JDBCConnectorMetadataHandler(),
      new JDBCConnectorRecordHandler(),
      new JDBCConnectorConfigurationHandler()
    );
  }
}

Local testing and debugging

While developing the connector specific functionality, developers require local testing capability to build and debug faster. The SDK and the example connector provides examples on testing custom connectors.

Additionally, you can experiment with JUnit and the DSL builders provided by the SDK. The JUnit test allows you to test this implementation locally by simulating an appropriate request to the Lambda functions. You can use debug points and step into the code implementation from start to end using the built-in IDE debugger. The sample application comes with example of JUnit tests that can be used with debug points.

Credentials management

Amazon AppFlow stores all sensitive information in AWS Secrets Manager. The secret is created when you create a connector profile. The secret ARN is passed in the ConnectorContext that forms part of the Lambda function’s invocation request.

To test locally:

Mock the “CredentialsProvider” and stub out the response of GetCredentials API. Note that the CredentialProvider provides several different GetCredentials methods, depending on the authentication used.
Create a secret in AWS Secrets Manager. Configure an IAM user with programmatic access and sufficient permissions to allow the secretsmanager:GetSecretValue action and let the CredentialsProvider call Secrets Manager locally. When you initialize a new service client without supplying any arguments, the SDK attempts to find AWS credentials by using the default credential provider chain.

For more information, read Working with AWS Credentials (SDK for Java) and Creating an IAM user with programmatic access.

Deploying the Lambda function in an AWS account

This example connector package provides an AWS Serverless Application Model (AWS SAM) template in the project folder. It describes the following resources:

The Lambda function containing the custom connector code.
The AWS IAM policy, allowing the function to read secrets from AWS Secrets Manager.
The AWS Lambda policy permission allowing Amazon AppFlow to invoke the Lambda function.

The sample application’s AWS SAM template provides two resources:

AWSTemplateFormatVersion: '2010-09-09'
Transform: 'AWS::Serverless-2016-10-31'
Description: Template to deploy the lambda connector in your account.
Resources:
  ConnectorFunction:
    Type: 'AWS::Serverless::Function'
    Properties:
      Handler: "org.custom.connector.jdbc.handler.JDBCConnectorLambdaHandler::handleRequest"
      CodeUri: "./target/appflow-custom-jdbc-connector-jdbc-1.0.jar"
      Description: "AppFlow custom JDBC connector example"
      Runtime: java11
      Timeout: 30
      MemorySize: 1024
      Policies:
        Version: '2012-10-17'
        Statement:
          Effect: Allow
          Action: 'secretsmanager:GetSecretValue'
          Resource: !Sub 'arn:aws:secretsmanager:${AWS::Region}:${AWS::AccountId}:secret:appflow!${AWS::AccountId}-*'

  PolicyPermission:
    Type: 'AWS::Lambda::Permission'
    Properties:
      FunctionName: !GetAtt ConnectorFunction.Arn
      Action: lambda:InvokeFunction
      Principal: 'appflow.amazonaws.com'
      SourceAccount: !Ref 'AWS::AccountId'
      SourceArn: !Sub 'arn:aws:appflow:${AWS::Region}:${AWS::AccountId}:*'

Deploy this custom connector by using the following command from the amazon-appflow-custom-jdbc-connector base directory:

mvn package && sam deploy –-guided

Once deployment completes, follow below steps to register and use the connector.

Registering the custom connector

There are two ways to register the custom connector.

1. Register through the AWS Management Console

From the AWS Management Console, navigate to Amazon AppFlow. Select Connectors on the left-side menu. Choose on the “Register New Connector” button.
Register the connector by selecting your Lambda function and typing in the connector label.
The newly created custom connector Lambda function appears in the list if you deployed using AWS SAM by following the steps in this tutorial. If you deployed the Lambda function manually, ensure that appropriate Lambda permissions are set, as described in the Lambda Permissions and Resource Policy section.
Provide a label for the connector. The label must be unique per account per Region. Choose Register.
The connector appears in the list of custom connectors.

2. Register with the API

Invoke the registerConnector public API endpoint with the following request payload:

{
   "connectorLabel":"TestCustomConnector",
   "connectorProvisioningType":"LAMBDA",
   "connectorProvisioningConfig":{
      "lambda":{ "lambdaArn":"arn:aws:lambda:<region>:<aws_account_id>:function:<lambdaFunctionName>"
      }
   }
}

For connectorLabel, use a unique label. Currently, the only supported connectorProvisioningType is LAMBDA.

Using the new custom connector

Navigate to the Connections link from the left-menu. Select the registered connector from the drop-down.
Choose Create Connection.
Complete the connector-specific setup:
Proceed with creating a flow and selecting your new connection.
Check Lambda function’s Amazon CloudWatch Logs to troubleshoot errors, if any, during connector registration, connector profile creation, and flow execution process.

Production considerations

This example is a proof of concept. To build a production-ready solution, review the non-exhaustive list of differences between sample and production-ready solutions.

If you plan to use the custom connector with high concurrency, review AWS Lambda quotas and limitations.

Cleaning up the custom connector stack

To delete the connector:

Delete all flows in Amazon AppFlow that you created as part of this tutorial.
Delete any connector profiles.
Unregister the custom connector.
To delete the stack, run the following command from the amazon-appflow-custom-jdbc-connector base directory:
```
sam delete
```

Conclusion

This blog post shows how to extend the Amazon AppFlow service to move data between SaaS endpoints and custom APIs. You can now build custom connectors using the Amazon AppFlow Custom Connector SDK.

Using custom connectors in Amazon AppFlow allows you to integrate siloed applications with minimal code. For example, different business units using legacy applications in an organization can now integrate their services via the Amazon AppFlow Custom Connectors SDK.

Depending on your choice of framework you can use the open source Python SDK or Java SDK from GitHub. To learn more, refer to the Custom Connector SDK Developer Guide.

For more serverless learning resources, visit Serverless Land.

Introducing AWS Virtual Waiting Room

2022-02-10 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/introducing-aws-virtual-waiting-room/

This post is written by Justin Pirtle, Principal Solutions Architect, Joan Morgan, Software Developer Engineer, and Jim Thario, Software Developer Engineer.

Today, AWS is introducing an official AWS Virtual Waiting Room solution. You can integrate this new, open-source solution with existing web and mobile applications. It can help buffer users during times of peak demand and sudden bursts of traffic, preventing systems from resource exhaustion.

Events commonly use virtual waiting rooms where there is either unknown demand or expected large bursts of traffic. Examples of such events include concert ticket sales, Black Friday promotions, COVID-19 vaccine registrations, and more. Virtual waiting rooms allow a quota of users to view, select, and complete their transactions directly. They shield the application’s backend environment from traffic by buffering users in a waiting room until it is their turn in line.

Like any real-life queuing system, a user enters the AWS Virtual Waiting Room and requests a number in line. After receiving a number corresponding to the unique device ID, the browser then polls regularly for updates. The update provides the current number being served and anticipated time until they are front of line.

After reaching the front of the line, the user can exchange the number and device ID for a secure session token. This is included with their downstream requests to authenticate users securely.

If a user discovers the backend endpoint and tries to send requests, they are redirected into the waiting room. The API requests are denied access until they have a valid token. This prevents the backend from needing to scale to accommodate all users at a single time.

Integrating the AWS Virtual Waiting Room into your application

Integration steps depend on the integration pattern for your application. You can decide if all users are routed through the waiting room or only during periods of excessive traffic. You can also choose to protect only the web host serving the backend webpages or one or more APIs powering backend commerce services.

There are four common patterns supported for integrating the waiting room into your application:

Upstream redirection of all traffic from the main target site to flow through AWS Virtual Waiting Room. This option sends all user traffic through the waiting room with the initial capacity of users permitted to the protected system. The traffic passes through transparently, then it buffers the remaining users. It admits new users as capacity becomes available. The target system is only accessible by users who pass through the waiting room.
Downstream redirection to the virtual waiting room from the target site. This option sends all traffic to the target site. The target site conditionally redirects requests that need to enter the waiting room. No DNS or upstream modifications are needed. The target site must be able to handle the initial user requests and redirection responses.
Direct target site API integration for buffering users from an existing website without any redirection. Your web or mobile application integrates the virtual waiting room at the API-level. This does not need any redirection to a different waiting room endpoint or site. This can offer a seamless user experience but may require more development for the integration.
OpenID Connect (OIDC) adapter. This option offers no-code native integration of the waiting room with OpenID Connect-enabled system components, such as the AWS Application Load Balancer (ALB). Users are redirected by the load balancer or similar component to the waiting room. They are buffered until issued a signed, time-limited JSON Web Token (JWT). Once the user’s JWT token is issued, the load balancer then forwards user requests to the target backend systems.

Overview of the AWS Virtual Waiting Room solution

The AWS Virtual Waiting Room solution implementation includes three main components:

Core APIs. The main resources deployed include two Amazon API Gateway deployments, a VPC, several AWS Lambda functions, an Amazon DynamoDB table, and an Amazon ElastiCache cluster. This API provides the basic mechanisms for tracking clients entering the waiting room. It requests status of the line progression and an authentication token to enter the target protected site.
Waiting room front-end website. The waiting room static site is shown to users awaiting their turn. This site dynamically updates the position being served and their place in line on a configurable interval. You customize this site’s HTML, CSS, and JavaScript to match your frontend styling and theme.
Lambda authorizer for protected target system. The Lambda authorizer wraps and protects the downstream protected target system’s APIs. This ensures that all user invocations have a validated time-limited token issued by the waiting room core API. It helps to prevent users from bypassing the waiting room.

The Virtual Waiting Room CloudFormation template deploys the following infrastructure:

An Amazon CloudFront distribution to deliver public API calls for the client.
Amazon API Gateway public API resources to process queue requests from the virtual waiting room, track the queue position, and support validation of tokens that allow access to the target website.
An Amazon Simple Queue Service (Amazon SQS) queue to regulate traffic to the AWS Lambda function that processes the queue messages. Instead of invoking the Lambda function for each request, the SQS queue batches the incoming bursts of requests.
API Gateway private API resources to support administrative functions.
Lambda functions to validate and process public and private API requests, and return the appropriate responses.
Amazon Virtual Private Cloud (VPC) to host the Lambda functions that interact directly with the Amazon ElastiCache for Redis cluster. VPC endpoints allow Lambda functions in the VPC to communicate with services within the solution.
An Amazon CloudWatch rule to invoke a Lambda function that works with a custom Amazon EventBridge bus to periodically broadcast status updates.
An Amazon DynamoDB table to store token data.
AWS Secrets Manager to store keys for token operations and other sensitive data.
(Optional) Authorizer component consisting of an AWS Identity and Access Management (IAM) role and a Lambda function to validate signatures for your API calls. The only requirement for the authorizer to protect your API is to use API Gateway.
(Optional) Amazon Simple Notification Service (Amazon SNS), CloudWatch, and Lambda functions to support two inlet strategies.
(Optional) OpenID adaptor component with API Gateway and Lambda functions to allow an OpenID provider to authenticate users to your website. CloudFront distribution with an Amazon Simple Storage Service (Amazon S3) bucket for the waiting room page for this component.
(Optional) A CloudFront distribution with Amazon S3 origin bucket for the optional sample waiting room web application.

Deploying the AWS Virtual Waiting Room

To get started with the AWS Virtual Waiting Room, deploy the Getting Started stack. This deploys the Core APIs stack, the Authorizers stack, and a sample application CloudFormation stack:

Launch the Getting Started CloudFormation stack. The template launches in the US East (N. Virginia) Region by default. To launch the solution in a different AWS Region, use the Region selector in the console navigation bar.
On the Create stack page, verify that the correct template URL is in the Amazon S3 URL text box and choose Next.
On the Specify stack details page, assign a name to your solution stack, and accept all default parameter values. For information about naming character limitations, refer to IAM and STS Limits in the AWS Identity and Access Management User Guide. Choose Next.
On the Configure stack options page, choose Next.
On the Review page, review and confirm the settings. Check the box acknowledging that the template creates AWS Identity and Access Management (IAM) resources.
Choose Create stack to deploy the stack.
You can view the status of the stack in the AWS CloudFormation Console in the Status column. You should receive a CREATE_COMPLETE status in approximately 30 minutes.
Once successfully deployed, browse to the Outputs tab.
Copy the ControlPanelURL and WaitingRoomURL to a scratch pad file for later use.

Configuring the AWS Virtual Waiting Room

After deploying the three stacks, test the waiting room using the sample application:

Navigate to the IAM console. Create a new IAM user or select an existing IAM user in the same account where you deployed the waiting room stack.
Grant the selected IAM user programmatic access. Download the key file or copy the access key ID and secret access key values to your scratch pad for later use.
Add the IAM user to the ProtectedAPIGroup IAM user group created by the getting started template:
Open the control panel in a new tab or browser window using the ControlPanelURL output you saved earlier.
In the control panel, expand the Configuration section.
Enter the access key ID and secret access key that you retrieved in Generate AWS keys to call the IAM secured APIs. The endpoints and event ID are filled in from the URL parameters.
Choose Use. The button activates after you have supplied the credentials.
You now see the status “Connected” shown following the various metrics reported:

Test the sample waiting room

Browse to the sample waiting room in a new browser tab. Use the WaitingRoomURL you captured previously from the CloudFormation stack output values.
Select Reserve to enter the waiting room. If you are unable to proceed with your transaction, your assigned number is not yet reached.
Navigate back to the browser tab with the control panel.
Under Increment Serving Counter, select Change. This manually increments the serving counter and allows 100 users to move on from the waiting room to the target site.
Navigate back to the waiting room and choose Check out now! You are redirected to the target site since your serving number is eligible to proceed beyond the waiting room.
Select Purchase now to finish your transaction at the target site. This page represents the protected system beyond the waiting room. Replace this with the actual system users you are protecting.
After the simulated purchase is complete, you can see that the transaction is successful. This transaction is authorized using the time-limited authorization token, which came from the waiting room previously. If a user bypasses the waiting room, they would not be successful in completing a transaction.

Customizing the AWS Virtual Waiting Room for your application

The sample browser client demonstrates an entire user flow frontend with the AWS Virtual Waiting Room flow for an ecommerce purchase. You can use this code as a starting point for your waiting room or reference the API communication code for integrating the waiting room into your existing website.

This sample code is built with Vue.js and Bootstrap to render the user interface. It uses the Axios and Axios-Retry packages to make API calls to the virtual waiting room stack. The sample code uses the Axios-Retry package to show how to handle throttling conditions and exponential backoff in high-traffic situations.

The control panel client is used to make requests to the private waiting room API that requires IAM-based authorization. The control panel client demonstrates how to construct and sign requests to the private API. It can be used in production or customized further. All of the sample source code room source is available in GitHub including the sample user client and control panel client.

Conclusion

The AWS Virtual Waiting Room solution is available today at no additional cost, provided as open source under the Apache 2 license. It supports customized integration with any front-end application via a variety of integration techniques. You can also customize how and when the waiting room allows users to progress into the protected target system using a variety of strategies.

To learn more about the AWS Virtual Waiting Room solution, visit the solution implementation and implementation guide.

For more serverless learning resources, visit Serverless Land.

Creating computing quotas on AWS Outposts rack with EC2 Capacity Reservations sharing

2022-02-10 Rachel Zheng

Post Syndicated from Rachel Zheng original https://aws.amazon.com/blogs/compute/creating-computing-quotas-on-aws-outposts-rack-with-ec2-capacity-reservation-sharing/

This post is written by Yi-Kang Wang, Senior Hybrid Specialist SA.

AWS Outposts rack is a fully managed service that delivers the same AWS infrastructure, AWS services, APIs, and tools to virtually any on-premises datacenter or co-location space for a truly consistent hybrid experience. AWS Outposts rack is ideal for workloads that require low latency access to on-premises systems, local data processing, data residency, and migration of applications with local system interdependencies. In addition to these benefits, we have started to see many of you need to share Outposts rack resources across business units and projects within your organization. This blog post will discuss how you can share Outposts rack resources by creating computing quotas on Outposts with Amazon Elastic Compute Cloud (Amazon EC2) Capacity Reservations sharing.

In AWS Regions, you can set up and govern a multi-account AWS environment using AWS Organizations and AWS Control Tower. The natural boundaries of accounts provide some built-in security controls, and AWS provides additional governance tooling to help you achieve your goals of managing a secure and scalable cloud environment. And while Outposts can consistently use organizational structures for security purposes, Outposts introduces another layer to consider in designing that structure. When an Outpost is shared within an Organization, the utilization of the purchased capacity also needs to be managed and tracked within the organization. The account that owns the Outpost resource can use AWS Resource Access Manager (RAM) to create resource shares for member accounts within their organization. An Outposts administrator (admin) can share the ability to launch instances on the Outpost itself, access to the local gateways (LGW) route tables, and/or access to customer-owned IPs (CoIP). Once the Outpost capacity is shared, the admin needs a mechanism to control the usage and prevent over utilization by individual accounts. With the introduction of Capacity Reservations on Outposts, we can now set up a mechanism for computing quotas.

Concept of computing quotas on Outposts rack

In the AWS Regions, Capacity Reservations enable you to reserve compute capacity for your Amazon EC2 instances in a specific Availability Zone for any duration you need. On May 24, 2021, Capacity Reservations were enabled for Outposts rack. It supports not only EC2 but Outposts services running over EC2 instances such as Amazon Elastic Kubernetes (EKS), Amazon Elastic Container Service (ECS) and Amazon EMR. The computing power of above services could be covered in your resource planning as well. For example, you’d like to launch an EKS cluster with two self-managed worker nodes for high availability. You can reserve two instances with Capacity Reservations to secure computing power for the requirement.

Here I’ll describe a method for thinking about resource pools that an admin can use to manage resource allocation. I’ll use three resource pools, that I’ve named reservation pool, bulk and attic, to effectively and extensibly manage the Outpost capacity.

A reservation pool is a resource pool reserved for a specified member account. An admin creates a Capacity Reservation to match member account’s need, and shares the Capacity Reservation with the member account through AWS RAM.

A bulk pool is an unreserved resource pool that is used when member accounts run out of compute capacity such as EC2, EKS, or other services using EC2 as underlay. All compute capacity in the bulk pool can be requested to launch until it is exhausted. Compute capacity that is not under a reservation pool belongs to the bulk pool by default.

An attic is a resource pool created to hold the compute capacity that the admin wouldn’t like to share with member accounts. The compute capacity remains in control by admin, and can be released to the bulk pool when needed. Admin creates a Capacity Reservation for the attic and owns the Capacity Reservation.

The following diagram shows how the admin uses Capacity Reservations with AWS RAM to manage computing quotas for two member accounts on an Outpost equipped with twenty-four m5.xlarge. Here, I’m going to break the idea into several pieces to help you understand easily.

There are three Capacity Reservations created by admin. CR #1 reserves eight m5.xlarge for the attic, CR #2 reserves four m5.xlarge instances for account A and CR #3 reserves six m5.xlarge instances for account B.
The admin shares Capacity Reservation CR #2 and CR #3 with account A and B respectively.
Since eighteen m5.xlarge instances are reserved, the remaining compute capacity in the bulk pool will be six m5.xlarge.
Both Account A and B can continue to launch instances exceeding the amount in their Capacity Reservation, by utilizing the instances available to everyone in the bulk pool.

Once the bulk pool is exhausted, account A and B won’t be able to launch extra instances from the bulk pool.
The admin can release more compute capacity from the attic to refill the bulk pool, or directly share more capacity with CR#2 and CR#3. The following diagram demonstrates how it works.

Based on this concept, we realize that compute capacity can be securely and efficiently allocated among multiple AWS accounts. Reservation pools allow every member account to have sufficient resources to meet consistent demand. Making the bulk pool empty indirectly sets the maximum quota of each member account. The attic plays as a provider that is able to release compute capacity into the bulk pool for temporary demand. Here are the major benefits of computing quotas.

Centralized compute capacity management
Reserving minimum compute capacity for consistent demand
Resizable bulk pool for temporary demand
Limiting maximum compute capacity to avoid resource congestion.

Configuration process

To take you through the process of configuring computing quotas in the AWS console, I have simplified the environment like the following architecture. There are four m5.4xlarge instances in total. An admin account holds two of the m5.4xlarge in the attic, and a member account gets the other two m5.4xlarge for the reservation pool, which results in no extra instance in the bulk pool for temporary demand.

Prerequisites

The admin and the member account are within the same AWS Organization.
The Outpost ID, LGW and CoIP have been shared with the member account.

Creating a Capacity Reservation for the member account

Sign in to AWS console of the admin account and navigate to the AWS Outposts page. Select the Outpost ID you want to share with the member account, choose Actions, and then select Create Capacity Reservation. In this case, reserve two m5.4xlarge instances.

In the Reservation details, you can terminate the Capacity Reservation by manually enabling or selecting a specific time. The first option of Instance eligibility will automatically count the number of instances against the Capacity Reservation without specifying a reservation ID. To avoid misconfiguration from member accounts, I suggest you select Any instance with matching details in most use cases.

Sharing the Capacity Reservation through AWS RAM

Go to the RAM page, choose Create resource share under Resource shares page. Search and select the Capacity Reservation you just created for the member account.

Choose a principal that is an AWS ID of the member account.

Creating a Capacity Reservation for attic

Create a Capacity Reservation like step 1 without sharing with anyone. This reservation will just be owned by the admin account. After that, check Capacity Reservations under the EC2 page, and the two Capacity Reservations there, both with availability of two m5.4xlarge instances.

Launching EC2 instances

Log in to the member account, select the Outpost ID the admin shared in step 2 then choose Actions and select Launch instance. Follow AWS Outposts User Guide to launch two m5.4xlarge on the Outpost. When the two instances are in Running state, you can see a Capacity Reservation ID on Details page. In this case, it’s cr-0381467c286b3d900.

So far, the member account has run out of two m5.4xlarge instances that the admin reserved for. If you try to launch the third m5.4xlarge instance, the following failure message will show you there is not enough capacity.

Allocating more compute capacity in bulk pool

Go back to the admin console, select the Capacity Reservation ID of the attic on EC2 page and choose Edit. Modify the value of Quantity from 2 to 1 and choose Save, which means the admin is going to release one more m5.4xlarge instance from the attic to the bulk pool.

Launching more instances from bulk pool

Switch to the member account console, and repeat step 4 but only launch one more m5.4xlarge instance. With the resource release on step 5, the member account successfully gets the third instance. The compute capacity is coming from the bulk pool, so when you check the Details page of the third instance, the Capacity Reservation ID is blank.

Cleaning up

Terminate the three EC2 instances in the member account.
Unshare the Capacity Reservation in RAM and delete it in the admin account.
Unshare the Outpost ID, LGW and CoIP in RAM to get the Outposts resources back to the admin.

Conclusion

In this blog post, the admin can dynamically adjust compute capacity allocation on Outposts rack for purpose-built member accounts with an AWS Organization. The bulk pool offers an option to fulfill flexibility of resource planning among member accounts if the maximum instance need per member account is unpredictable. By contrast, if resource forecast is feasible, the admin can revise both the reservation pool and the attic to set a hard limit per member account without using the bulk pool. In addition, I only showed you how to create a Capacity Reservation of m5.4xlarge for the member account, but in fact an admin can create multiple Capacity Reservations with various instance types or sizes for a member account to customize the reservation pool. Lastly, if you would like to securely share Amazon S3 on Outposts with your member accounts, check out Amazon S3 on Outposts now supports sharing across multiple accounts to get more details.

How UnitedHealth Group Improved Disaster Recovery for Machine-to-Machine Authentication

2022-02-08 Vinodh Kumar Rathnasabapathy

Post Syndicated from Vinodh Kumar Rathnasabapathy original https://aws.amazon.com/blogs/architecture/how-unitedhealth-group-improved-disaster-recovery-for-machine-to-machine-authentication/

This blog post was co-authored by Vinodh Kumar Rathnasabapathy, Senior Manager of Software Engineering, UnitedHealth Group.

Engineers who use Amazon Cognito for machine-to-machine authentication select a primary Region where they deploy their application infrastructure and the Amazon Cognito authorization endpoint. Amazon Cognito is a highly available service in single Region deployments with a published service-level agreement (SLA) target of 99.9%. The UnitedHealth Group (UHG) team needed a solution that would enable them to build and deploy their applications in multiple Regions to achieve higher availability targets. A multi-Region application architecture would also allow UHG engineers to failover to a secondary Region in the event that their application experienced issues in the primary Region.

At UHG, Federated Data Services (FDS) is a business-critical customer-facing application, which requires 99.95% availability and disaster recovery features. The FDS engineering team needed their Amazon Cognito infrastructure to be highly available in case of any service events in AWS Regions, along with having greater flexibility of switching between Regions.

The FDS engineering team worked on a custom solution using existing AWS services to fulfill their availability and recovery requirements. This solution not only serves the purpose of their current business needs but also provides recovery in case of any future disaster.

Overview of the solution

In this solution, we select two AWS Regions which will include the primary Region and the failover Region. Amazon Cognito app clients (including client ID and client secret pair) are created in both Regions and stored in an Amazon DynamoDB table. Client applications are given the client ID and client secret of the primary Region. Optionally, an application-generated ID and secret can be provided to the client to conceal the actual Amazon Cognito client ID and client secret. The process is as follows:

The client application (machine) initiates an authentication request by sending the Amazon Cognito app client ID and client secret to an Amazon Route 53 domain record.
Route 53 routes authentication requests to the in-Region Amazon API Gateway utilizing a Simple routing policy. From there, API Gateway shuts down the TLS connection using AWS Certificate Manager (ACM), and serves as a proxy for the authentication request to AWS Lambda.
AWS Lambda verifies the client ID and client secret, and uses them to look up the in-Region client ID and client secret.
Lambda uses preconfigured environment variables to request the appropriate Region from DynamoDB.
AWS Lambda then passes the returned app client credentials to the in-Region Amazon Cognito deployment. Amazon Cognito verifies the client ID and client secret, and returns an access token to the Lambda function.
The client application (machine) can now use this token to access downstream applications.
The client authentication process in the secondary (failover) Region is the same, with one exception. In the secondary Region, the Lambda environment variables retrieve app client credentials from the DynamoDB database for the secondary Region Amazon Cognito instance.

To initiate a failover between Regions, the Route 53 domain record needs to be pointed to the secondary Region API Gateway Regional endpoint. The downstream application’s Amazon Cognito configuration files must also be updated to point to the secondary Region Amazon Cognito instance. Alerts can be enabled using Amazon CloudWatch alarms to notify system operators of issues that may warrant a failover (a manual process to help system operators decide when to failover). The entire failover process takes just a couple of seconds for DNS to switch over and for the application to start accepting tokens from the secondary Region. This failover process could be automated based on generated alerts.

This architecture is suitable for a hot standby, active-passive type of application deployment. It is important to note that independent Amazon Cognito environments are being used in each Region, so you will need to set up your application to failover to the secondary Region for authentication. For example, your backend should be able to accept and validate access tokens from both primary and secondary Amazon Cognito user pools. To learn more about disaster recovery options in AWS, visit Disaster recovery options in the cloud.

Architecture overview

Figure 1 shows you how to build a multi-Region machine-to-machine architecture using Amazon Cognito, which uses DynamoDB global tables to perform the data replication. A Lambda function is utilized to retrieve the credentials for the active Region that the application is operating in, and a Regional Amazon Cognito endpoint returns the required token.

Figure 1. Multi-Region Amazon Cognito machine-to-machine architecture

Process flow

The Route53 domain record for the authentication proxy service is given to the client application and pointed at the API Gateway Regional endpoint. The client passes primary Region app client credentials to the API Gateway.
API Gateway passes Lambda the client ID and client secret pair.
Lambda does a lookup in DynamoDB to verify the client ID and client secret. After the identity is confirmed, Lambda uses Region-based environment variables to identify if the client should be using the primary Region or secondary Region for authenticating to Amazon Cognito. Lambda retrieves the Region-based client ID and client secret from DynamoDB.
Lambda passes the Region-based app ID and secret to Amazon Cognito, which verifies the client ID and client secret, and returns an access token to the Lambda function.
Lambda passes the access token from the Regional Amazon Cognito environment back to the client to be used against Region-based backend applications.

Prerequisites

For this walkthrough, you should have the following prerequisites:

Note: Ensure that you are following your organization’s security best practices while deploying this architecture.

Implementation

This implementation will focus on the Lambda logic that is used to retrieve user credentials based on Lambda environment variables. We will share snippets of the Lambda function code so you can create the logic necessary to enable multi-Region application architecture using Amazon Cognito app clients. In addition to the Lambda function, you will also need to create and configure the following resources using security implementation designated by your organization:

DynamoDB table with fields primaryClientID, primaryClientSecret, secondaryClientID, and secondaryClientSecret.

1. Because this table is used to store secrets, make sure encryption is enabled and you follow Security Best Practices for Amazon DynamoDB and your organization’s security best practices.
2. Enable DynamoDB global tables.

API Gateway Regional endpoint with TLS encryption using ACM.
Route53 domain record routing traffic to Regional API Gateway.
Downstream application configuration pointing at the Regional Amazon Cognito endpoint.
IAM role that grants access from Lambda to DynamoDB.

Now let’s configure our Lambda function using Node.js language. Within the Lambda console create a new Lambda function that you will author from scratch. Select the Node.js runtime and change the runtime Lambda execution role to the IAM role that you have created for Lambda. Next, we will walk through the Lambda function code and configuration.

Attach your new IAM role as a Lambda runtime role that will grant your Lambda function access to the DynamoDB table.
Within the Lambda configuration environment variables, create several key-value variables in the Lambda function. The following are the environmental variables we will add.

1. key: OAUTH_HOST_PRIMARY, value: https://${cognito-primary-region-domain-name}/oauth2/token
2. key: OAUTH_HOST_SECONDARY, value: https://${cognito-secondary-region-domain-name}/oauth2/token

Import the Node.js libraries using the following code.

"dependencies": {
    "aws-sdk": "^2.723.0",
    "aws-serverless-express": "^3.3.8",
    "axios": "^0.21.1",
    "base-64": "^1.0.0",
    "cors": "^2.8.5",
    "dotenv": "^8.2.0",
    "express": "^4.17.1",
    "jsonwebtoken": "^8.5.1",
    "jwt-decode": "^2.2.0",
    }

Parse data from the incoming client application authentication request.

router.post("/", async (request, response) => {
  const client_id = request.body["client_id"];
  const client_secret = request.body["client_secret"]
      
  const client = await dynamoDB.getClientCredentialById(client_id, client_secret);
    
  if (client === undefined) {
    return response.status(400).json({
       message:
          "Client not configured for in Cognito. Please check with support team",
        });
      }
    
   const client_credentials = getClientCredentials(client);
   let access_token = await authService.getJwtToken(client_credentials);
          
   response.send(access_token);
});

Reference the environment variables to determine the Region that the Lambda function is operating in and set the Region as primary or secondary.

getClientCredentials = client => {
   return region === "us-east-1" ? { clientId: client.primaryClientId, clientSecret: client.primaryClientSecret, oAuthHost: config.OAUTH_HOST }
   : region === "us-east-2" ? { clientId: client.secondaryClientId, clientSecret: client.secondaryClientSecret, oAuthHost: config.OAUTH_HOST_EAST_2 }
   : {};
}

Verify the client ID and client secret against the DynamoDB table and get the Region-based client ID and client secret.

const getClientCredentialById = async (client_id, client_secret) => {

  let params = {
    TableName: clientCredentialTable,
    Key: {
      primaryClientId: client_id,
      primaryClientSecret: client_secret,
    },
  };

  const clientCredential = await ddb.get(params).promise();
  return clientCredential.Item;
};

Pass the Regional client ID and client secret to Amazon Cognito. You will receive an access token from Amazon Cognito.

   const base64ClientCredentials = base64.encode(
     client_credentials.clientId.concat(":").concat(client_credentials.clientSecret)
   );
   const headers = {
     "content-type": "application/x-www-form-urlencoded",
     authorization: "Basic " + base64ClientCredentials,
    };
    const data = "grant_type=client_credentials";
    
    // Post request to Cognito OAuth URL
    const token = await Axios.post(client_credentials.oAuthHost, data, { headers });
    return token.data;
 };

Pass the access token from the Regional Amazon Cognito environment back to the client to be used against Region-based backend applications.

let access_token = await authService.getJwtToken(client_credentials);
 
  response.send(access_token);

New app client creation

You need to implement this Lambda function in both the primary and secondary Regions. Modify the environmental variables in the secondary Region with the secondary Region’s information.

To enroll a new client in this multi-Region architecture using Amazon Cognito we will go through the process as shown in the following illustration.

Figure 2. Creating a new app client

You need to create a new:

App ID and secret in Amazon Cognito in the primary Region.
App ID and secret in Amazon Cognito in the secondary Region.
Item in DynamoDB table consisting of the Amazon Cognito credentials created: primaryClientID, primaryClientSecret, secondaryClientID, and secondaryClientSecret.

Failover process

In this blog post we are creating a hot standby, active-passive type of application deployment. You will need to ensure your application is configured to be able to use either the primary Region Amazon Cognito or the secondary Region Amazon Cognito environment. The process to failover between Regions consists of the following:

Start application backend in the secondary Region.
Reconfigure the application backend Amazon Cognito identity provider YAML file to point at the secondary Region Amazon Cognito identity provider.
Modify the Route 53 domain record to point client applications at the secondary Region API Gateway Regional endpoint.

Cleaning up

In order to avoid unnecessary charges, please be sure to clean up any resources that were built as part of this architecture that are no longer in use.

Conclusion

In this blog post, we presented a solution that allows you to failover Amazon Cognito app clients from one AWS Region to another Region. The benefits of this architecture will allow you to have greater flexibility for running your Amazon Cognito app clients in the Region that is best suited for your use case. With this solution you now have the capability to failover Amazon Cognito app clients to a different AWS Region in the event of application or system errors.

Several variants of this solution can be implemented using the provided Lambda failover logic. You can store App ID and secret in AWS Secrets Manager. To learn more, see How to replicate secrets in AWS Secrets Manager to multiple Regions. You can also automate the failover process between primary and secondary Regions. To accomplish this, you will need to evaluate events which should cause a failover in your environment. Later, build the appropriate automation to failover your downstream application to the secondary Region Amazon Cognito deployment.

Amazon Cognito can be used for machine authentication as per the limits posted in Quotas in Amazon Cognito. Review the limits of number of app clients per user pool and the other applicable rate limits (for example, client credentials rate limits) and verify that these limits meet the needs of your application.

Optum’s Story

UnitedHealth Group (UHG) is the health and well-being company responsible for over 150 million lives globally. Optum, a part of UnitedHealth Group, is a health services business serving the healthcare marketplace, including payers, care providers, employers, governments, life sciences companies and consumers, through its OptumHealth, OptumInsight, OptumRx, and OptumServe businesses.

Federated Data Services (FDS) is the power behind the scenes that enables interoperability and secure transmission of personally identifiable information. It protects health information between lines of businesses, technology systems, applications, members and providers. With FDS, data is centralized, making it easier to share and retrieve by other systems. This assures businesses get a flexible and scalable solution that adapts to changes in technology and business needs.

How to mount Linux volume and keep mount point consistency

2022-02-05 limillan

Post Syndicated from limillan original https://aws.amazon.com/blogs/compute/how-to-mount-linux-volume-and-keep-mount-point-consistency/

This post is written by: Leonardo Azize Martins, Cloud Infrastructure Architect, Professional Services

Customers often use Amazon Elastic Compute Cloud (Amazon EC2) Linux based instances with many Amazon Elastic Block Store (Amazon EBS) volumes attached. In this case, device name can vary depending on some facts, such as virtualization type, instance type, or operating system. As the device name can change, the customer shouldn’t rely on the device name to mount volumes from it. The customer wants to avoid the situation where a volume is mounted on a different mount point just because the device name changed, or the device name doesn’t exist because the name pattern changed.

Customers who want to utilize the latest instance family usually change the instance type when a new one is available. The device name can be different between instance families, such as T2 and T3. T2 uses /dev/sd[a-z], while T3 uses /dev/nvme[0-26]n1. If you mount a device on T2 called /dev/sdc, when you change the instance family to T3 the same device won’t be called /dev/sdc anymore. In this case, it will fail to mount.

Amazon EBS volumes are exposed as NVMe block devices on instances built on the AWS Nitro System. The block device driver can assign NVMe device names in a different order than what you specified for the volumes in the block device mapping. In this situation, a device that should be mounted on /data could end-up being mounted on /logs.

On Linux, you can use the fstab file to mount devices using kernel name descriptors (the traditional way), file system labels, or the file system UUID. Kernel name descriptors aren’t persistent and can change each boot. Therefore, they shouldn’t be used in configuration files.

UUID is a mechanism for giving each filesystem a unique identifier. These identifiers’ attributes are generated by filesystem utilities (mkfs.*) when the device is formatted, and they’re designed so that collisions are unlikely. All GNU/Linux filesystems (including swap and LUKS headers of raw encrypted devices) support UUID.

As UUID is a filesystem attribute, it can also be used with Logical Volume Manager (LVM) and Linux software RAID (mdadm).

Depending on the fstab file configuration, you may find that you can’t access your instance, which requires you to follow a rescue process to fix issues. This is the case if you configure the fstab file with the device name and change the instance type.

This post shows how to mount Linux volumes and keep mount points preserved when the instance type is changed or the instance is rebooted.

Overview of solution

When you create an instance, you specify block devices mapping. It doesn’t mean that the Linux device has the same name or can be discovered in the same order as specified on the instance mapping. This situation can be more evident when using applications that require more volumes.

Using UUID to mount volumes lets you mitigate future issues when you stop and start your instance or change the instance type.

Figure 1: EC2 instance block device mapping

Walkthrough

You will create one instance with three volumes: one root volume and two data volumes. We use Amazon Linux 2 in this post. In this instance type, volumes have a specific name format. Later, you will change the instance type. The new instance type volumes will have another name format.

Follow these steps:

Create an instance with three volumes
Create the filesystem on the volumes and mount them
Change the instance type

Prerequisites

For this walkthrough, you should have the following prerequisites:

An AWS account
Linux knowledge

Create instance

Create one instance with three volumes: a root volume and two data volumes. Use Launch Instance Wizard with the following details.

Launch Amazon Linux 2 instance

On Step 1, choose Amazon Linux 2 AMI (HVM), SSD Volume Type.
On Step 2, choose micro.
On Step 3, choose Next.
On Step 4, add two new volumes, device /dev/sdb 10 GiB and device /dev/sdc 12 GiB.

Figure 2: Launch instances, add storage

Create filesystem and mount

Connect to your instance using EC2 Instance Connect or any other method that feels comfortable for you. Mount the device using UUID instead of the device name. Run the following instructions as the user root.

Format and mount the device

Run the following command to confirm that you have three disks:
```
$ lsblk
```
Format disk as xfs, and run the following commands:
```
mkfs.xfs /dev/xvdb
mkfs.xfs /dev/xvdc
```
Create mount point, and run the following commands:
```
mkdir /mnt/disk1
mkdir /mnt/disk2
```

Add mount instructions, and run the following commands:

echo "$(blkid /dev/xvdb | awk '{print $2}') /mnt/disk1 xfs defaults,noatime" | tee -a /etc/fstab
echo "$(blkid /dev/xvdc | awk '{print $2}') /mnt/disk2 xfs defaults,noatime" | tee -a /etc/fstab

Mount volumes, create dummy file, and run the following commands:
```
mount -a
touch /mnt/disk1/file1.txt
touch /mnt/disk2/file2.txt
```

You will have an fstab file like the following:

cat /etc/fstab
UUID=7b355c6b-f82b-4810-94b9-4f3af651f629     /           xfs    defaults,noatime  1   1
UUID="2c160dd6-586c-4258-84bb-c79933b9ae02" /mnt/disk1 xfs defaults,noatime
UUID="3e7a0c28-7cf1-40dc-82a1-2f5cfb53f9a4" /mnt/disk2 xfs defaults,noatime

Change instance type

Change the instance type from t2.micro to t3.micro.

Launch Amazon Linux 2 instance

Stop instance.
Change the instance type to micro.
Start instance.
Connect to your instance using EC2 Instance Connect.
Check the device name, and run the following command:

lsblk

List files, and run the following command:

ls -l /mnt/*

Note that the device names are changed from xvd* to nvme*. All of the devices are mounted without any issue and with the correct mount points.

Cleaning up

To avoid incurring future charges, delete the instance and all of the volumes that you created in this post.

The other side

UUID is an attribute of the filesystem that was generated when you formatted your device. Therefore, it will follow your device even if you create an AMI or snapshot. So you don’t need to worry about a restore process, and it will smoothly proceed to an instance restore. You must be careful if you restore a snapshot from one volume and attach it to the same instance, as you will end-up with two volumes that are using the same UUID. If you try to mount the restored volume on the same instance, then it will fail and you will find this message on /var/log/messages file. kernel: XFS (xvdf1): Filesystem has duplicate UUID f6646b81-f2a6-46ca-9f3d-c746cf015379 - can't mount It is even more important to be careful if you attach a volume created from the snapshot of the root volume and restart your instance. Since both volumes have the same UUID, you may find that a volume other than the one attached to /dev/xvda or /dev/sda has become the root volume of your instance. See the following example for details. Note that both volumes have the same UUID, but the one mounted on / is /dev/xvdf1, not /dev/xvda1, which is the real root volume for this instance.

$ blkid
/dev/xvda1: LABEL="/" UUID="f6646b81-f2a6-46ca-9f3d-c746cf015379" TYPE="xfs" PARTLABEL="Linux" PARTUUID="79fae994-3708-4293-bb29-4d069d1c786b"
/dev/xvdf1: LABEL="/" UUID="f6646b81-f2a6-46ca-9f3d-c746cf015379" TYPE="xfs" PARTLABEL="Linux" PARTUUID="79fae994-3708-4293-bb29-4d069d1c786b"

$ lsblk
NAME    MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
xvda    202:0    0   8G  0 disk
└─xvda1 202:1    0   8G  0 part
xvdf    202:80   0   8G  0 disk
└─xvdf1 202:81   0   8G  0 part /

Conclusion

In this post, we covered how to use UUID to mount Linux devices using fstab file. This keeps the mount point on the correct device. It also lets you change the instance type without changes to the fstab file. You can use UUID with LVM and Linux software RAID (mdadm), UUID, as an attribute of the filesystem, will be the same even after a backup and restore process, snapshot, or clone. To learn more, check out our block device mappings and device names on Linux instances documentation.

Expanding Your EC2 Possibilities By Utilizing the CPU Launch Options

2022-02-04 limillan

Post Syndicated from limillan original https://aws.amazon.com/blogs/compute/expanding-your-ec2-possibilities-by-utilizing-the-cpu-launch-options/

This post is written by: Matthew Brunton, Senior Solutions Architect – WWPS

To ensure our customers have the appropriate machines available for their workloads, AWS offers a wide range of hardware options that include hundreds of types of instances that help customers achieve the best price performance for their workloads. In some specialized circumstances, our customers need an even wider range of options, or more flexibility. This could be driven by a desire to optimize licensing costs, or the customer wanting more hardware configuration options. Some high performance workloads can improve by turning off simultaneous multithreading. In our AWS Well Architected Framework – High Performance Computing Lens we have the following recommendation “Unless an application has been tested with hyperthreading enabled, it is recommended that hyperthreading be disabled”. With these factors in mind, AWS offers the ability to configure some options regarding the CPU configuration in launched instances.

Our larger instance types that have a higher number of cores, and offer multithreaded cores will translate to a larger combination of potential options. The valid combinations of cores and threads per core can be found here. To consider utilizing the CPU options for both cores and threads per core, you will need to consider instance types that have multiple CPU’s and/or cores.

You can specify numerous CPU options for some of our larger instances via the console, command line interface, or the API. Moreover, you can remove CPUs from the launch configuration, or deactivate threading within CPUs that have multiple threads per core. Amazon Elastic Compute Cloud (Amazon EC2) FAQ’s for the Optimize CPU’s feature can be found here. You should be aware that this feature is only available during instance launch and cannot be modified after launch. The launch options persist after you reboot, stop, or start an instance.

You can easily determine how many CPUs and threads a machine has. There are numerous ways to see this via the AWS Management Console and the AWS Command Line Interface (CLI).

Within the AWS Management Console, under the ‘Instance Details’ section, opening up the ‘Host and placement group’ item reveals the number of vCPUs that your machine has.

Figure 1: Console details showing number of vCPU’s

This information is also available using the AWS CLI as follows:

aws ec2 describe-instances --region us-east-1 --filters "Name=instance-type,Values='c6i.*large'"

...
    "Instances": [
        {
            "Monitoring": {
                "State": "disabled"
            }, 
            "PrivateDnsName": "ip-172-31-44-121.ec2.internal",
 "PrivateIpAddress": "172.31.44.121",                              
 "State": {
                "Code": 16, 
                "Name": "running"
            }, 
            "EbsOptimized": false, 
            "LaunchTime": "2021-11-22T01:46:58+00:00",
            "ProductCodes": [], 
            "VpcId": " vpc-7f7f1502", 
            "CpuOptions": { "CoreCount": 32, "ThreadsPerCore": 1 }, 
            "StateTransitionReason": "", 
            ...
        }
    ]
...

Figure 2: ec2 describe-instances cli output

A handy AWS CLI command ‘describe-instance-types’ will show the list of valid cores, the possible threads-per-core, and the default values for the vCPUs and cores. These are listed in the ‘DefaultVCpus’ and ‘DefaultCores’ items in the returned JSON listed as follows.

aws ec2 describe-instance-types --region us-east-1 --filters "Name=instance-type,Values='c6i.*'"
{
    "InstanceTypes": [
        {
            "InstanceType": "c6i.4xlarge",
            "CurrentGeneration": true,
            "FreeTierEligible": false,
            "SupportedUsageClasses": [
                "on-demand",
                "spot"
            ],
            "SupportedRootDeviceTypes": [
                "ebs"
            ],
            "SupportedVirtualizationTypes": [
                "hvm"
            ],
            "BareMetal": false,
            "Hypervisor": "nitro",
            "ProcessorInfo": {
                "SupportedArchitectures": [
                    "x86_64"
                ],
                "SustainedClockSpeedInGhz": 3.5
            },
            "VCpuInfo": { "DefaultVCpus": 16, "DefaultCores": 8, "DefaultThreadsPerCore": 2, "ValidCores": [ 2, 4, 6, 8 ], "ValidThreadsPerCore": [ 1, 2 ] },
            "MemoryInfo": {
                "SizeInMiB": 32768
            },

Figure 3: ec2 describe-instance-types cli output

To utilize the AWS CLI to launch one or multiple instances, you should use the run instances CLI command.

The following is the shorthand syntax:

aws ec2 run-instances --image-id xxx --instance-type xxx --cpu-options “CoreCount=xx,ThreadsPerCore=xx” --key-name xxx --region xxx

For example, the following command launches a 32-core machine with only 1 thread per core instead of the standard 2 threads per core:

aws ec2 run-instances --image-id ami-0c2b8ca1dad447f8a --instance-type c6i.16xlarge
--cpu-options " CoreCount =32, ThreadsPerCore =1" --key-name MyKeyPair --region xxx

If you are using the CPU options parameters in CloudFormation templates, then the following applies:

https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-ec2-instance-cpuoptions.html

The following is an example of the YAML syntax for specifying the CPU configuration.

Resources:
  CustomEC2Instance:
    Type: "AWS::EC2::Instance"
    Properties:
      InstanceType: xxx
      ImageId: xxx
      CpuOptions:
CoreCount: xx
ThreadsPerCore: x

As can be seen in the following table, there are a number of valid CPU core options, as well as the option to set one or two threads per core for each CPU for certain instance types. This significantly opens up the number of combinations and permutations to meet your specific workload need. In the case of the c6i instance listed in the following table, there are an additional 32 CPU core and threading combinations available to customers.

Instance type	Default vCPUs	Default CPU cores	Default threads per core	Valid CPU cores	Valid threads per core
c6i.16xlarge	64	32	2	2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32	1, 2

Figure 4: Valid Core and Thread Launch Options

Note that you still pay the same amount for the EC2 instance, even if you deactivate some of the cores or threads.

Conclusion

This approach can let customers customize the EC2 hardware options even further to ensure a wider range of CPU/memory combinations over and above the already extensive AWS instance options. This lets customers finely tune hardware for their exact application requirements, whether that is running High Performance workloads or trying to save money where license restrictions mean you can benefit from a specific CPU configuration.

We have run through various scenarios in this post, which detailed how to launch instances with alternate CPU configurations, easily check the current configuration of your running instances via our API and console, and how to configure the options in CloudFormation templates.

We love to see our customers maximizing the flexibility of the AWS platform to deliver outstanding results. Have a look at some of your High Performance workloads and give the threading options a try, or take a deep dive into any of your more expensive licenses to see if you could benefit from an alternate CPU configuration. In order to get started, check out our detailed developer documentation for the optimize CPU options.

How to deploy AWS Network Firewall to help protect your network from malware

2022-01-27 Ajit Puthiyavettle

Post Syndicated from Ajit Puthiyavettle original https://aws.amazon.com/blogs/security/how-to-deploy-aws-network-firewall-to-help-protect-your-network-from-malware/

Protecting your network and computers from security events requires multi-level strategies, and you can use network level traffic filtration as one level of defense. Users need access to the internet for business reasons, but they can inadvertently download malware, which can impact network and data security. This post describes how to use custom Suricata Rules with AWS Network Firewall to add protections that prevent users from downloading malware. You can use your own internal list, or a list from commercial or open-source threat intelligence feeds.

Network Firewall is a managed service that makes it easy to deploy essential network protection for all of your Amazon Virtual Private Cloud (Amazon VPC) Infrastructure. Network Firewall’s flexible rules engine lets you define firewall rules, giving you fine-grained control over network traffic, such as blocking outbound requests to prevent the spread of potential malware.

Features of Network Firewall

This section describes features of Network Firewall that help improve the overall security of your network.

Network Firewall:

Is a managed Amazon Web Services (AWS) service, so you don’t have to build and maintain the infrastructure to host the network firewall.
Integrates with AWS Firewall Manager, which allows you to centrally manage security policies and automatically enforce mandatory security policies across existing and newly created accounts and virtual private clouds (VPCs).
Protects application availability by filtering inbound internet traffic using tools such as access control list (ACL) rules, stateful inspection, protocol detection, and intrusion prevention.
Provides URL, IP address, and domain-based outbound traffic filtering to help you meet compliance requirements, stop potential data leaks, and block communication with known malware hosts.
Gives you control and visibility of VPC-to-VPC traffic to logically separate networks that host sensitive applications or line-of-business resources.
Complements existing network and application security services on AWS by providing control and visibility to layer 3 through 7 network traffic for your entire VPC.

Automating deployment of Network Firewall and management of Network Firewall rules support management at-scale and help in timely response, as Network Firewall is designed to block access to insecure sites before they impact your resources. For the solution in this blog post, you’ll use an AWS CloudFormation template to deploy the network architecture with Network Firewall.

Solution architecture

Figure 1 shows a sample architecture to demonstrate how users are able to download malware files, and how you can prevent this using network firewall rules.

Network Firewall is deployed in a single VPC architecture, where it is placed in line with the traffic to and from the internet.

Figure 1. Network architecture diagram

The network architecture shown in Figure 1 includes three subnets:

A network firewall subnet
Hosts the Network Firewall endpoint interface. All outbound traffic from this network goes through the internet gateway.
A public subnet
Hosts a NAT gateway. The next hop from the public subnet is the Network Firewall endpoint, where all traffic can be inspected before being forwarded to the internet.
A private network subnet
Used to host the client instances. All outbound traffic from this network goes to the NAT gateway endpoint.

In the network architecture shown in Figure 1, only one AZ is shown for simplicity, but best practices recommend deploying infrastructure across multiple AZs

To run the CloudFormation deployment template

To set up the architecture shown in Figure 1, launch the provided CloudFormation deployment template using the Launch stack button in step 2 below.
This CloudFormation template:
- Sets up VPCs and appropriate subnets as required by the network architecture.
- Creates a route table with appropriate routes and attaches it to the appropriate subnet (i.e. private subnet, firewall subnet, public subnet).
- Creates a test instance with appropriate security groups.
- Deploys Network Firewall with firewall policy.
- Creates a Rule Group SampleStatefulRulegroupName with Suricata rules, which is not attached to a firewall policy
To launch the stack, click the Launch Stack button below.

Name the newly created stack (for example, nfw-stack).
The template will also install two sample rules that will be used to protect against accessing two sample malware site URLs, but it will not automatically attach them to a firewall policy
You can see that Network Firewall with firewall policy was deployed as part of the basic CloudFormation deployment. It also created Suricata rules in rule groups, but is not yet attached to the firewall policy.

Note: Unless you attach the rule to the Network Firewall, it will not provide the required protection.

Example: confirming vulnerability

We have identified two sample URLs that contain malware to use for demonstration.

In the example screen shot below, we tested vulnerability by logging into test instance using AWS Session Manager. and at the shell prompt, used wget to access and download a malware file.

Figure 2 that follows is a screenshot of how a user could access and download two different malware files.

Note: Since these URLs contain malware files, we do not recommend users perform this test, but are providing a screenshot as a demonstration. If you wish to actually test ability to download files, use URLs you know are safe for testing.

Figure 2. Insecure URL access

Network Firewall policies

Before the template creates the Network Firewall rule group, it creates a Network Firewall policy and attaches it to the Network Firewall. An AWS Network Firewall firewall policy defines the monitoring and protection behavior for a firewall. The details of the behavior are defined in the rule groups that you add to your policy.

Network Firewall rules

A Network Firewall rule group is a reusable set of criteria for inspecting and handling network traffic. You can add one or more rule groups to a firewall policy as part of policy configuration. The included template does this for you.

Network Firewall rule groups are either stateless or stateful. Stateless rule groups evaluate packets in isolation, while stateful rule groups evaluate them in the context of their traffic flow. Network Firewall uses a Suricata rules engine to process all stateful rules.

Suricata rules can be used to create a Network Firewall stateful rule to prevent insecure URL access. Figure 3 shows the Suricata rules that the template adds and attaches to the Network Firewall policy in order to block access to the sample malware URLs used in the previous example.

Figure 3. Suricata rules in a Network Firewall rule group

Attach the rule group to the Network Firewall policy

When you launched the CloudFormation template, it automatically created these rules in the rule group. You will now be attaching this rule group to the firewall policy in order to enable the protection. You will need similar rules to block the test URLs that are used for your testing.

Figure 3 shows two Suricata rules that have been configured to block the insecure malware URLs.

To add Suricata rules to Network Firewall

To improve site security and protect against downloading malware, you can add Suricata rules to Network Firewall to secure your site. You’ll do this by:

Creating and attaching a firewall policy to the Network Firewall.
Creating rules as part of rule groups, which are attached to the firewall policy
Testing to verify that access to malware URLs from the instance is blocked.

Let’s review Suricata Rules that are created, which can be attached to Network Firewall.

Suricata rule parts

Each Suricata rule has three parts:

Action

drop action that should be taken

Header

http this is the traffic protocol

$HOME_NET anywhere $HOME_NET is a Suricata variable. By default it is set to the CIDR range of the VPC where Network Firewall is deployed and any refers to any source port

$EXTERNAL_NET 80 where $EXTERNAL_NET 80 is a Suricata standard variable that refers to traffic destination, and 80 refers to the destination port

-> is the direction that tells in which direction the signature has to match

Options

msg “MALWARE custom solution” – gives textual information about the signature and the possible alert

flow to_server,established – it is used to match on the direction of the flow and established refers to match on established connections

classtype trojan-activity – gives information about the classification of rules and alerts

sid:xxxxx gives every signature its own id

content “xxxx” – This keyword is very important and it identifies the pattern that your signature should match.

http_uri is a content modifier that helps you match specifically and only on the request URI

rev:xxx this goes along with sid keyword. It represents the version of the signature

The signatures in the Suricate rule shown in Figure 3 will block traffic that matches the http_uri contents /data/js_crypto_miner.html and /data/java_jre17_exec.html when the traffic is initiated from the VPC to the public network.

To attach a rule group to an existing Network Firewall

In Figure 4, the Network Firewall has a policy attached. but it does not have a rule group

Figure 4. A policy is attached, but not a rule group

As shown in Figure 5, choose Add rule group to start adding your Suricata rule to the Network Firewall.
Choose Add from existing stateful rule groups to attach an already created Suricata rule group.

Figure 5. Choose Add rule group

Figure 6 shows the Suriacata rule groups that are already created. SampleStatefulRulegroupName is the rule group created by the CloudFormation template.
Select the rule group and choose Add stateful rule group to finish adding the rule group to Network Firewall.

Figure 6. Review the rule groups that are already created

Figure 7 shows that the rule group SampleStatefulRulegroupName is now part of the Stateful rule group section of Network Firewall screen, which completes adding Suricata rules to Network Firewall.

Figure 7. Shows the new rule group is now added

Example: validating the solution

Your Network Firewall is now configured to block malware URLs that are defined in the rulegroup SampleStatefulRulegroupName.

As in the example above where we confirmed vulnerability, Figure 8 shows how to validate that the solution is now protecting your users from accessing malware sites.

Figure 8 shows a user trying to access the same insecure URLs we tested earlier and shows that the URLs are now blocked and the attempted connection times out.

Figure 8. Insecure URL access blocked

Validating blocking access helps your security team ensure that users or applications on your network cannot download malware. You can add similar rules for any URLs you identify as insecure. SOC operators are typically not familiar with updating CloudFormation templates, but you can use a deployment pipeline where the data required for the rule is stored in Amazon DynamoDB and use AWS Lambda functions to automate updating rules.

Now that you have an example running, you should implement a complete rule set that meets your requirement from a publicly available malware list such as CISSECURITY MALWARE LIST.

Cleanup

AWS resources created for testing can result in additional costs. Since this environment used a CloudFormation template, you can remove all AWS resources associated with the solution by deleting the CloudFormation stack you named previously (for example, nfw-stack).

Conclusion

This blog describes an approach for preventing users from downloading malware. The solution presented uses AWS Network Firewall to secure your environment by blocking access to the specified malware URLs. The supplied CloudFormation template can be used to automate this protection, and to easily set up a test environment to simulate the scenario.

For additional best practice information, see:

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Deploy and Manage Gitlab Runners on Amazon EC2

2022-01-26 Sylvia Qi

Post Syndicated from Sylvia Qi original https://aws.amazon.com/blogs/devops/deploy-and-manage-gitlab-runners-on-amazon-ec2/

Gitlab CI is a tool utilized by many enterprises to automate their Continuous integration, continuous delivery and deployment (CI/CD) process. A Gitlab CI/CD pipeline consists of two major components: A .gitlab-ci.yml file describing a pipeline’s jobs, and a Gitlab Runner, an application that executes the pipeline jobs.

Setting up the Gitlab Runner is a time-consuming process. It involves provisioning the necessary infrastructure, installing the necessary software to run pipeline workloads, and configuring the runner. For enterprises running hundreds of pipelines across multiple environments, it is essential to automate the Gitlab Runner deployment process so as to be deployed quickly in a repeatable, consistent manner.

This post will guide you through utilizing Infrastructure-as-Code (IaC) to automate Gitlab Runner deployment and administrative tasks on Amazon EC2. With IaC, you can quickly and consistently deploy the entire Gitlab Runner architecture by running a script. You can track and manage changes efficiently. And, you can enforce guardrails and best practices via code. The solution presented here also offers autoscaling so that you save costs by terminating resources when not in use. You will learn:

How to deploy Gitlab Runner quickly and consistently across multiple AWS accounts.
How to enforce guardrails and best practices on the Gitlab Runner through IaC.
How to autoscale Gitlab Runner based on workloads to ensure best performance and save costs.

This post comes from a DevOps engineer perspective, and assumes that the engineer is familiar with the practices and tools of IaC and CI/CD.

Overview of the solution

The following diagram displays the solution architecture. We use AWS CloudFormation to describe the infrastructure that is hosting the Gitlab Runner. The main steps are as follows:

The user runs a deploy script in order to deploy the CloudFormation template. The template is parameterized, and the parameters are defined in a properties file. The properties file specifies the infrastructure configuration, as well as the environment in which to deploy the template.
The deploy script calls CloudFormation CreateStack API to create a Gitlab Runner stack in the specified environment.
During stack creation, an EC2 autoscaling group is created with the desired number of EC2 instances. Each instance is launched via a launch template, which is created with values from the properties file. An IAM role is created and attached to the EC2 instance. The role contains permissions required for the Gitlab Runner to execute pipeline jobs. A lifecycle hook is attached to the autoscaling group on instance termination events. This ensures graceful instance termination.
During instance launch, CloudFormation uses a cfn-init helper script to install and configure the Gitlab Runner:
1. cfn-init installs the Gitlab Runner software on the EC2 instance.
2. cfn-init configures the Gitlab Runner as a docker executor using a pre-defined docker image in the Gitlab Container Registry. The docker executor implementation lets the Gitlab Runner run each build in a separate and isolated container. The docker image contains the software required to run the pipeline workloads, thereby eliminating the need to install these packages during each build.
3. cfn-init registers the Gitlab Runner to Gitlab projects specified in the properties file, so that these projects can utilize the Gitlab Runner to run pipelines.

The user may repeat the same steps to deploy Gitlab Runner into another environment.

Architecture diagram previously explained in post.

Walkthrough

This walkthrough will demonstrate how to deploy the Gitlab Runner, and how easy it is to conduct Gitlab Runner administrative tasks via this architecture. We will walk through the following tasks:

Build a docker executor image for the Gitlab Runner.
Deploy the Gitlab Runner stack.
Update the Gitlab Runner.
Terminate the Gitlab Runner.
Add/Remove Gitlab projects from the Gitlab Runner.
Autoscale the Gitlab Runner based on workloads.

The code in this post is available at https://github.com/aws-samples/amazon-ec2-gitlab-runner.git

Prerequisites

For this walkthrough, you need the following:

A Gitlab account (all tiers including Gitlab Free self-managed, Gitlab Free SaaS, and higher tiers). This demo uses gitlab.com free tire.
A Gitlab Container Registry.
A Git client to clone the source code provided.
An AWS account with local credentials properly configured (typically under ~/.aws/credentials).
The latest version of the AWS CLI. For more information, see Installing, updating, and uninstalling the AWS CLI.
Docker is installed and running on the localhost/laptop.
Nodejs and npm installed on the localhost/laptop.
A VPC with 2 private subnets and that is connected to the internet via NAT gateway allowing outbound traffic.
The following IAM service-linked role created in the AWS account: AWSServiceRoleForAutoScaling
An Amazon S3 bucket for storing Lambda deployment packages.
Familiarity with Git, Gitlab CI/CD, Docker, EC2, CloudFormation and Amazon CloudWatch.

Build a docker executor image for the Gitlab Runner

The Gitlab Runner in this solution is implemented as docker executor. The Docker executor connects to Docker Engine and runs each build in a separate and isolated container via a predefined docker image. The first step in deploying the Gitlab Runner is building a docker executor image. We provided a simple Dockerfile in order to build this image. You may customize the Dockerfile to install your own requirements.

To build a docker image using the sample Dockerfile:

Create a directory where we will store our demo code. From your terminal run:

mkdir demo-repos && cd demo-repos

Clone the source code repository found in the following location:

git clone https://github.com/aws-samples/amazon-ec2-gitlab-runner.git

Create a new project on your Gitlab server. Name the project any name you like.
Clone your newly created repo to your laptop. Ignore the warning about cloning an empty repository.

git clone <your-repo-url>

Copy the demo repo files into your newly created repo on your laptop, and push it to your Gitlab repository. You may customize the Dockerfile before pushing it to Gitlab.

cp -r amazon-ec2-gitlab-runner/* <your-repo-dir>
cd <your-repo-dir>
git add .
git commit -m “Initial commit”
git push

On the Gitlab console, go to your repository’s Package & Registries -> Container Registry. Follow the instructions provided on the Container Registry page in order to build and push a docker image to your repository’s container registry.

Deploy the Gitlab Runner stack

Once the docker executor image has been pushed to the Gitlab Container Registry, we can deploy the Gitlab Runner. The Gitlab Runner infrastructure is described in the Cloudformation template gitlab-runner.yaml. Its configuration is stored in a properties file called sample-runner.properties. A launch template is created with the values in the properties file. Then it is used to launch instances. This architecture lets you deploy Gitlab Runner to as many environments as you like by utilizing the configurations provided in the appropriate properties files.

During the provisioning process, utilize a cfn-init helper script to run a series of commands to install and configure the Gitlab Runner.

          commands:
            01InstallDocker:
              command: sudo yum -y install docker
            02StartDocker:
              command: sudo service docker start
            03DownloadGitlabRunner:
              command: sudo wget -O /usr/bin/gitlab-runner https://gitlab-runner-downloads.s3.amazonaws.com/latest/binaries/gitlab-runner-linux-amd64
            04ChmodGitlabRunner:
              command: sudo chmod a+x /usr/bin/gitlab-runner
            05AddUser:
              command: sudo useradd --comment 'GitLab Runner' --create-home gitlab-runner --shell /bin/bash
            06InstallGitlabRunner:
              command: sudo gitlab-runner install --user=gitlab-runner --working-directory=/home/gitlab-runner
            07SetRegion:
              command: !Sub 'aws configure set default.region ${AWS::Region}'
            08ConfigureDockerExecutor:
              command: !Sub 
                - |
                  for GitlabGroupToken in `aws ssm get-parameters --names /${AWS::StackName}/ci-tokens --query 'Parameters[0].Value' | sed -e "s/\"//g" | sed "s/,/ /g"`;do
                      sudo gitlab-runner register \
                      --non-interactive \
                      --url "${GitlabServerURL}" \
                      --registration-token $GitlabGroupToken \
                      --executor "docker" \
                      --docker-image "${DockerImagePath}" \
                      --description "Gitlab Runner with Docker Executor" \
                      --locked="${isLOCKED}" --access-level "${ACCESS}" \
                      --docker-volumes "/var/run/docker.sock:/var/run/docker.sock" \
                      --tag-list "${RunnerEnvironment}-${RunnerVersion}-docker"
                  done
                - isLOCKED: !FindInMap [GitlabRunnerRegisterOptionsMap, !Ref RunnerEnvironment, isLOCKED]
                  ACCESS: !FindInMap [GitlabRunnerRegisterOptionsMap, !Ref RunnerEnvironment, ACCESS]                              
            09StartGitlabRunner:
              command: sudo gitlab-runner start

The helper script ensures that the Gitlab Runner setup is consistent and repeatable for each deployment. If a configuration change is required, users simply update the configuration steps and redeploy the stack. Furthermore, all changes are tracked in Git, which allows for versioning of the Gitlab Runner.

To deploy the Gitlab Runner stack:

Obtain the runner registration tokens of the Gitlab projects that you want registered to the Gitlab Runner. Obtain the token by selecting the project’s Settings > CI/CD and expand the Runners section.
Update the sample-runner.properties file parameters according to your own environment. Refer to the gitlab-runner.yaml file for a description of these parameters. Rename the file if you like. You may also create an additional properties file for deploying into other environments.
Run the deploy script to deploy the runner:

cd <your-repo-dir>
./deploy-runner.sh <properties-file> <region> <aws-profile> <stack-name>

<properties-file> is the name of the properties file.

<region> is the region where you want to deploy the stack.

<aws-profile> is the name of the CLI profile you set up in the prerequisites section.

<stack-name> is the name you chose for the CloudFormation stack.

For example:

./deploy-runner.sh sample-runner.properties us-east-1 dev amazon-ec2-gitlab-runner-demo

After the stack is deployed successfully, you will see the Gitlab Runner autoscaling group created in the EC2 console:

After the stack is deployed successfully, you will see the Gitlab Runner autoscaling group created in the EC2 console.

Under your Gitlab project Settings > CICD > Runners > Available specific runners, you will see the fully configured Gitlab Runner. The green circle indicates that the Gitlab Runner is ready for use.

Now go to your Gitlab project Settings  CICD  Runners  Available specific runners, you will see the fully configured Gitlab Runner. The green circle indicates that the Gitlab Runner is ready for use.

Updating the Gitlab Runner

There are times when you would want to update the Gitlab Runner. For example, updating the instance VolumeSize in order to resolve a disk space issue, or updating the AMI ID when a new AMI becomes available.

Utilizing the properties file and launch template makes it easy to update the Gitlab Runner. Simply update the Gitlab Runner configuration parameters in the properties file. Then, run the deploy script to udpate the Gitlab Runner stack. To ensure that the changes take effect immediately (e.g., existing instances are replaced by new instances with the new configuration), we utilize an AutoscalingRollingUpdate update policy to automatically update the instances in the autoscaling group.

    UpdatePolicy:
      AutoScalingRollingUpdate:
        MinInstancesInService: !Ref MinInstancesInService
        MaxBatchSize: !Ref MaxBatchSize
        PauseTime: "PT5M"
        WaitOnResourceSignals: true
        SuspendProcesses:
          - HealthCheck
          - ReplaceUnhealthy
          - AZRebalance
          - AlarmNotification
          - ScheduledActions

The policy tells CloudFormation that when changes are detected in the launch template, update the instances in batch size of MaxBatchSize, while keeping a number of instances (specified in MinInstanceInService) in service during the update.

Below is an example of updating the Gitlab Runner instance type.

To update the instance type of the runner instance:

Update the “InstanceType” parameter in the properties file.

InstanceType=t2.medium

Run the deploy-runner.sh script to update the CloudFormation stack:

cd <your-repo-dir>
./deploy-runner.sh <properties-file> <region> <aws-profile> <stack-name>

In the CloudFormation console, you will see that the launch template is updated first, then a rolling update is initiated. The instance type update requires a replacement of the original instance, so a temporary instance was launched and put in service. Then, the temporary instance was terminated when the new instance was launched successfully.

After the update is complete, you will see that on the Gitlab project’s console, the old Gitlab Runner, ez_5x8Rv, is replaced by the new Gitlab Runner, N1_UQ7yc.

Terminate the Gitlab Runner

There are times when an autoscaling group instance must be terminated. For example, during an autoscaling scale-in event, or when the instance is being replaced by a new instance during a stack update, as seen previously. When terminating an instance, you must ensure that the Gitlab Runner finishes executing any running jobs before the instance is terminated, otherwise your environment could be left in an inconsistent state. Also, we want to ensure that the terminated Gitlab Runner is removed from the Gitlab project. We utilize an autoscaling lifecycle hook to achieve these goals.

The lifecycle hook works like this: A CloudWatch event rule actively listens for the EC2 Instance-terminate events. When one is detected, the event rule triggers a Lambda function. The Lambda function calls SSM Run Command to run a series of commands on the EC2 instances, via a SSM Document. The commands include stopping the Gitlab Runner gracefully when all running jobs are finished, de-registering the runner from Gitlab projects, and signaling the autoscaling group to terminate the instance.

There are also times when you want to terminate an instance manually. For example, when an instance is suspected to not be functioning properly. To terminate an instance from the Gitlab Runner autoscaling group, use the following command:

aws autoscaling terminate-instance-in-auto-scaling-group \
    --instance-id="${InstanceId}" \
    --no-should-decrement-desired-capacity \
    --region="${region}" \
    --profile="${profile}"

The above command terminates the instance. The lifecycle hook ensures that the cleanup steps are conducted properly, and the autoscaling group launches another new instance to replace the old one.

Note that if you terminate the instance by using the “ec2 terminate-instance” command, then the autoscaling lifecycle hook actions will not be triggered.

Add/Remove Gitlab projects from the Gitlab Runner

As new projects are added to your enterprise, you may want to register them to the Gitlab Runner, so that those projects can utilize the Gitlab Runner to run pipelines. On the other hand, you would want to remove the Gitlab Runner from a project if it no longer wants to utilize the Gitlab Runner, or if it qualifies to utilize the Gitlab Runner. For example, if a project is no longer allowed to deploy to an environment configured by the Gitlab Runner. Our architecture offers a simple way to add and remove projects from the Gitlab Runner. To add new projects to the Gitlab Runner, update the RunnerRegistrationTokens parameter in the properties file, and then rerun the deploy script to update the Gitlab Runner stack.

To add new projects to the Gitlab Runner:

Update the RunnerRegistrationTokens parameter in the properties file. For example:

RunnerRegistrationTokens=ps8RjBSruy1sdRdP2nZX,XbtZNv4yxysbYhqvjEkC

Update the Gitlab Runner stack. This updates the SSM parameter which stores the tokens.

cd <your-repo-dir>
./deploy-runner.sh <properties-file> <region> <aws-profile> <stack-name>

Relaunch the instances in the Gitlab Runner autoscaling group. The new instances will use the new RunnerRegistrationTokens value. Run the following command to relaunch the instances:

./cycle-runner.sh <runner-autoscaling-group-name> <region> <optional-aws-profile>

To remove projects from the Gitlab Runner, follow the steps described above, with just one difference. Instead of adding new tokens to the RunnerRegistrationTokens parameter, remove the token(s) of the project that you want to dissociate from the runner.

Autoscale the runner based on custom performance metrics

Each Gitlab Runner can be configured to handle a fixed number of concurrent jobs. Once this capacity is reached for every runner, any new jobs will be in a Queued/Waiting status until the current jobs complete, which would be a poor experience for our team. Setting the number of concurrent jobs too high on our runners would also result in a poor experience, because all jobs leverage the same CPU, memory, and storage in order to conduct the builds.

In this solution, we utilize a scheduled Lambda function that runs every minute in order to inspect the number of jobs running on every runner, leveraging the Prometheus Metrics endpoint that the runners expose. If we approach the concurrent build limit of the group, then we increase the Autoscaling Group size so that it can take on more work. As the number of concurrent jobs decreases, then the scheduled Lambda function will scale the Autoscaling Group back in an effort to minimize cost. The Scaling-Up operation will ignore the Autoscaling Group’s cooldown period, which will help ensure that our team is not waiting on a new instance, whereas the Scale-Down operation will obey the group’s cooldown period.

Here is the logical sequence diagram for the work:

Sequence diagram

For operational monitoring, the Lambda function also publishes custom CloudWatch Metrics for the count of active jobs, along with the target and actual capacities of the Autoscaling group. We can utilize this information to validate that the system is working properly and determine if we need to modify any of our autoscaling parameters.

Congratulations! You have completed the walkthrough. Take some time to review the resources you have deployed, and practice the various runner administrative tasks that we have covered in this post.

Troubleshooting

Problem: I deployed the CloudFormation template, but no runner is listed in my repository.

Possible Cause: Errors have been encountered during cfn-init, causing runner registration to fail. Connect to your runner EC2 instance, and check /var/log/cfn-*.log files.

Cleaning up

To avoid incurring future charges, delete every resource provisioned in this demo by deleting the CloudFormation stack created in the “Deploy the Gitlab Runner stack” section.

Conclusion

This article demonstrated how to utilize IaC to efficiently conduct various administrative tasks associated with a Gitlab Runner. We deployed Gitlab Runner consistently and quickly across multiple accounts. We utilized IaC to enforce guardrails and best practices, such as tracking Gitlab Runner configuration changes, terminating the Gitlab Runner gracefully, and autoscaling the Gitlab Runner to ensure best performance and minimum cost. We walked through the deploying, updating, autoscaling, and terminating of the Gitlab Runner. We also saw how easy it was to clean up the entire Gitlab Runner architecture by simply deleting a CloudFormation stack.

About the authors

How to use tokenization to improve data security and reduce audit scope

2022-01-25 Tim Winston

Post Syndicated from Tim Winston original https://aws.amazon.com/blogs/security/how-to-use-tokenization-to-improve-data-security-and-reduce-audit-scope/

Tokenization of sensitive data elements is a hot topic, but you may not know what to tokenize, or even how to determine if tokenization is right for your organization’s business needs. Industries subject to financial, data security, regulatory, or privacy compliance standards are increasingly looking for tokenization solutions to minimize distribution of sensitive data, reduce risk of exposure, improve security posture, and alleviate compliance obligations. This post provides guidance to determine your requirements for tokenization, with an emphasis on the compliance lens given our experience as PCI Qualified Security Assessors (PCI QSA).

What is tokenization?

Tokenization is the process of replacing actual sensitive data elements with non-sensitive data elements that have no exploitable value for data security purposes. Security-sensitive applications use tokenization to replace sensitive data, such as personally identifiable information (PII) or protected health information (PHI), with tokens to reduce security risks.

De-tokenization returns the original data element for a provided token. Applications may require access to the original data, or an element of the original data, for decisions, analysis, or personalized messaging. To minimize the need to de-tokenize data and to reduce security exposure, tokens can retain attributes of the original data to enable processing and analysis using token values instead of the original data. Common characteristics tokens may retain from the original data are:

Format attributes

Length	for compatibility with storage and reports of applications written for the original data
Character set	for compatibility with display and data validation of existing applications
Preserved character positions	such as first 6 and last 4 for credit card PAN

Analytics attributes

Mapping consistency	where the same data always results in the same token
Sort order

Retaining functional attributes in tokens must be implemented in ways that do not defeat the security of the tokenization process. Using attribute preservation functions can possibly reduce the security of a specific tokenization implementation. Limiting the scope and access to tokens addresses limitations introduced when using attribute retention.

Why tokenize? Common use cases

I need to reduce my compliance scope

Tokens are generally not subject to compliance requirements if there is sufficient separation of the tokenization implementation and the applications using the tokens. Encrypted sensitive data may not reduce compliance obligations or scope. Such industry regulatory standards as PCI DSS 3.2.1 still consider systems that store, process, or transmit encrypted cardholder data as in-scope for assessment; whereas tokenized data may remove those systems from assessment scope. A common use case for PCI DSS compliance is replacing PAN with tokens in data sent to a service provider, which keeps the service provider from being subject to PCI DSS.

I need to restrict sensitive data to only those with a “need-to-know”

Tokenization can be used to add a layer of explicit access controls to de-tokenization of individual data items, which can be used to implement and demonstrate least-privileged access to sensitive data. For instances where data may be co-mingled in a common repository such as a data lake, tokenization can help ensure that only those with the appropriate access can perform the de-tokenization process and reveal sensitive data.

I need to avoid sharing sensitive data with my service providers

Replacing sensitive data with tokens before providing it to service providers who have no access to de-tokenize data can eliminate the risk of having sensitive data within service providers’ control, and avoid having compliance requirements apply to their environments. This is common for customers involved in the payment process, which provides tokenization services to merchants that tokenize the card holder data, and return back to their customers a token they can use to complete card purchase transactions.

I need to simplify data lake security and compliance

A data lake centralized repository allows you to store all your structured and unstructured data at any scale, to be used later for not-yet-determined analysis. Having multiple sources and data stored in multiple structured and unstructured formats creates complications for demonstrating data protection controls for regulatory compliance. Ideally, sensitive data should not be ingested at all; however, that is not always feasible. Where ingestion of such data is necessary, tokenization at each data source can keep compliance-subject data out of data lakes, and help avoid compliance implications. Using tokens that retain data attributes, such as data-to-token consistency (idempotence) can support many of the analytical capabilities that make it useful to store data in the data lake.

I want to allow sensitive data to be used for other purposes, such as analytics

Your organization may want to perform analytics on the sensitive data for other business purposes, such as marketing metrics, and reporting. By tokenizing the data, you can minimize the locations where sensitive data is allowed, and provide tokens to users and applications needing to conduct data analysis. This allows numerous applications and processes to access the token data and maintain security of the original sensitive data.

I want to use tokenization for threat mitigation

Using tokenization can help you mitigate threats identified in your workload threat model, depending on where and how tokenization is implemented. At the point where the sensitive data is tokenized, the sensitive data element is replaced with a non-sensitive equivalent throughout the data lifecycle, and across the data flow. Some important questions to ask are:

What are the in-scope compliance, regulatory, privacy, or security requirements for the data that will be tokenized?
When does the sensitive data need to be tokenized in order to meet security and scope reduction objectives?
What attack vector is being addressed for the sensitive data by tokenizing it?
Where is the tokenized data being hosted? Is it in a trusted environment or an untrusted environment?

For additional information on threat modeling, see the AWS security blog post How to approach threat modeling.

Tokenization or encryption consideration

Tokens can provide the ability to retain processing value of the data while still managing the data exposure risk and compliance scope. Encryption is the foundational mechanism for providing data confidentiality.

Encryption rarely results in cipher text with a similar format to the original data, and may prevent data analysis, or require consuming applications to adapt.

Your decision to use tokenization instead of encryption should be based on the following:

Reduction of compliance scope	As discussed above, by properly utilizing tokenization to obfuscate sensitive data you may be able to reduce the scope of certain framework assessments such as PCI DSS 3.2.1.
Format attributes	Used for compatibility with existing software and processes.
Analytics attributes	Used to support planned data analysis and reporting.
Elimination of encryption key management	A tokenization solution has one essential API—create token—and one optional API—retrieve value from token. Managing access controls can be simpler than some non-AWS native general purpose cryptographic key use policies. In addition, the compromise of the encryption key compromises all data encrypted by that key, both past and future. The compromise of the token database compromises only existing tokens.

Where encryption may make more sense

Although scope reduction, data analytics, threat mitigation, and data masking for the protection of sensitive data make very powerful arguments for tokenization, we acknowledge there may be instances where encryption is the more appropriate solution. Ask yourself these questions to gain better clarity on which solution is right for your company’s use case.

Scalability	If you require a solution that scales to large data volumes, and have the availability to leverage encryption solutions that require minimal key management overhead, such as AWS Key Management Services (AWS KMS), then encryption may be right for you.
Data format	If you need to secure data that is unstructured, then encryption may be the better option given the flexibility of encryption at various layers and formats.
Data sharing with 3rd parties	If you need to share sensitive data in its original format and value with a 3rd party, then encryption may be the appropriate solution to minimize external access to your token vault for de-tokenization processes.

What type of tokenization solution is right for your business?

When trying to decide which tokenization solution to use, your organization should first define your business requirements and use cases.

What are your own specific use cases for tokenized data, and what is your business goal? Identifying which use cases apply to your business and what the end state should be is important when determining the correct solution for your needs.
What type of data does your organization want to tokenize? Understanding what data elements you want to tokenize, and what that tokenized data will be used for may impact your decision about which type of solution to use.
Do the tokens need to be deterministic, the same data always producing the same token? Knowing how the data will be ingested or used by other applications and processes may rule out certain tokenization solutions.
Will tokens be used internally only, or will the tokens be shared across other business units and applications? Identifying a need for shared tokens may increase the risk of token exposure and, therefore, impact your decisions about which tokenization solution to use.
How long does a token need to be valid? You will need to identify a solution that can meet your use cases, internal security policies, and regulatory framework requirements.

Choosing between self-managed tokenization or tokenization as a service

Do you want to manage the tokenization within your organization, or use Tokenization as a Service (TaaS) offered by a third-party service provider? Some advantages to managing the tokenization solution with your company employees and resources are the ability to direct and prioritize the work needed to implement and maintain the solution, customizing the solution to the application’s exact needs, and building the subject matter expertise to remove a dependency on a third party. The primary advantages of a TaaS solution are that it is already complete, and the security of both tokenization and access controls are well tested. Additionally, TaaS inherently demonstrates separation of duties, because privileged access to the tokenization environment is owned by the tokenization provider.

Choosing a reversible tokenization solution

Do you have a business need to retrieve the original data from the token value? Reversible tokens can be valuable to avoid sharing sensitive data with internal or third-party service providers in payments and other financial services. Because the service providers are passed only tokens, they can avoid accepting additional security risk and compliance scope. If your company implements or allows de-tokenization, you will need to be able to demonstrate strict controls on the management and use of de-tokenization privilege. Eliminating the implementation of de-tokenization is the clearest way to demonstrate that downstream applications cannot have sensitive data. Given the security and compliance risks of converting tokenized data back into its original data format, this process should be highly monitored, and you should have appropriate alerting in place to detect each time this activity is performed.

Operational considerations when deciding on a tokenization solution

While operational considerations are outside the scope of this post, they are important factors for choosing a solution. Throughput, latency, deployment architecture, resiliency, batch capability, and multi-regional support can impact the tokenization solution of choice. Integration mechanisms with identity and access control and logging architectures, for example, are important for compliance controls and evidence creation.

No matter which deployment model you choose, the tokenization solution needs to meet security standards, similar to encryption standards, and must prevent determining what the original data is from the token values.

Conclusion

Using tokenization solutions to replace sensitive data offers many security and compliance benefits. These benefits include lowered security risk and smaller audit scope, resulting in lower compliance costs and a reduction in regulatory data handling requirements.

Your company may want to use sensitive data in new and innovative ways, such as developing personalized offerings that use predictive analysis and consumer usage trends and patterns, fraud monitoring and minimizing financial risk based on suspicious activity analysis, or developing business intelligence to improve strategic planning and business performance. If you implement a tokenization solution, your organization can alleviate some of the regulatory burden of protecting sensitive data while implementing solutions that use obfuscated data for analytics.

On the other hand, tokenization may also add complexity to your systems and applications, as well as adding additional costs to maintain those systems and applications. If you use a third-party tokenization solution, there is a possibility of being locked into that service provider due to the specific token schema they may use, and switching between providers may be costly. It can also be challenging to integrate tokenization into all applications that use the subject data.

In this post, we have described some considerations to help you determine if tokenization is right for you, what to consider when deciding which type of tokenization solution to use, and the benefits. disadvantages, and comparison of tokenization and encryption. When choosing a tokenization solution, it’s important for you to identify and understand all of your organizational requirements. This post is intended to generate questions your organization should answer to make the right decisions concerning tokenization.

You have many options available to tokenize your AWS workloads. After your organization has determined the type of tokenization solution to implement based on your own business requirements, explore the tokenization solution options available in AWS Marketplace. You can also build your own solution using AWS guides and blog posts. For further reading, see this blog post: Building a serverless tokenization solution to mask sensitive data.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the Amazon Security Assurance Services or contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Migrating AWS Lambda functions to Arm-based AWS Graviton2 processors

2022-01-24 Julian Wood

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/migrating-aws-lambda-functions-to-arm-based-aws-graviton2-processors/

AWS Lambda now allows you to configure new and existing functions to run on Arm-based AWS Graviton2 processors in addition to x86-based functions. Using this processor architecture option allows you to get up to 34% better price performance. This blog post highlights some considerations when moving from x86 to arm64 as the migration process is code and workload dependent.

Functions using the Arm architecture benefit from the performance and security built into the Graviton2 processor, which is designed to deliver up to 19% better performance for compute-intensive workloads. Workloads using multithreading and multiprocessing, or performing many I/O operations, can experience lower invocation time, which reduces costs.

Duration charges, billed with millisecond granularity, are 20 percent lower when compared to current x86 pricing. This also applies to duration charges when using Provisioned Concurrency. Compute Savings Plans supports Lambda functions powered by Graviton2.

The architecture change does not affect the way your functions are invoked or how they communicate their responses back. Integrations with APIs, services, applications, or tools are not affected by the new architecture and continue to work as before.

The following runtimes, which use Amazon Linux 2, are supported on Arm:

Node.js 12 and 14
Python 3.8 and 3.9
Java 8 (java8.al2) and 11
.NET Core 3.1
Ruby 2.7
Custom runtime (provided.al2)

Lambda@Edge does not support Arm as an architecture option.

You can create and manage Lambda functions powered by Graviton2 processor using the AWS Management Console, AWS Command Line Interface (AWS CLI), AWS CloudFormation, AWS Serverless Application Model (AWS SAM), and AWS Cloud Development Kit (AWS CDK). Support is also available through many AWS Lambda Partners.

Understanding Graviton2 processors

AWS Graviton processors are custom built by AWS. Generally, you don’t need to know about the specific Graviton processor architecture, unless your applications can benefit from specific features.

The Graviton2 processor uses the Neoverse-N1 core and supports Arm V8.2 (include CRC and crypto extensions) plus several other architectural extensions. In particular, Graviton2 supports the Large System Extensions (LSE), which improve locking and synchronization performance across large systems.

Migrating x86 Lambda functions to arm64

Many Lambda functions may only need a configuration change to take advantage of the price/performance of Graviton2. Other functions may require repackaging the Lambda function using Arm-specific dependencies, or rebuilding the function binary or container image.

You may not require an Arm processor on your development machine to create Arm-based functions. You can build, test, package, compile, and deploy Arm Lambda functions on x86 machines using AWS SAM and Docker Desktop. If you have an Arm-based system, such as an Apple M1 Mac, you can natively compile binaries.

Functions without architecture-specific dependencies or binaries

If your functions don’t use architecture-specific dependencies or binaries, you can switch from one architecture to the other with a single configuration change. Many functions using interpreted languages such as Node.js and Python, or functions compiled to Java bytecode, can switch without any changes. Ensure you check binaries in dependencies, Lambda layers, and Lambda extensions.

To switch functions from x86 to arm64, you can change the Architecture within the function runtime settings using the Lambda console.

Edit AWS Lambda function Architecture

If you want to display or log the processor architecture from within a Lambda function, you can use OS specific calls. For example, Node.js process.arch or Python platform.machine().

When using the AWS CLI to create a Lambda function, specify the --architectures option. If you do not specify the architecture, the default value is x86-64. For example, to create an arm64 function, specify --architectures arm64.

aws lambda create-function \
    --function-name MyArmFunction \
    --runtime nodejs14.x \
    --architectures arm64 \
    --memory-size 512 \
    --zip-file fileb://MyArmFunction.zip \
    --handler lambda.handler \
    --role arn:aws:iam::123456789012:role/service-role/MyArmFunction-role

When using AWS SAM or CloudFormation, add or amend the Architectures property within the function configuration.

MyArmFunction:
  Type: AWS::Lambda::Function
  Properties:
    Runtime: nodejs14.x
    Code: src/
    Architectures:
  	- arm64
    Handler: lambda.handler
    MemorySize: 512

When initiating an AWS SAM application, you can specify:

sam init --architecture arm64

When building Lambda layers, you can specify CompatibleArchitectures.

MyArmLayer:
  Type: AWS::Lambda::LayerVersion
  Properties:
    ContentUri: layersrc/
    CompatibleArchitectures:
      - arm64

Building function code for Graviton2

If you have dependencies or binaries in your function packages, you must rebuild the function code for the architecture you want to use. Many packages and dependencies have arm64 equivalent versions. Test your own workloads against arm64 packages to see if your workloads are good migration candidates. Not all workloads show improved performance due to the different processor architecture features.

For compiled languages like Rust and Go, you can use the provided.al2 custom runtime, which supports Arm. You provide a binary that communicates with the Lambda Runtime API.

When compiling for Go, set GOARCH to arm.

GOOS=linux GOARCH=arm go build

When compiling for Rust, set the target.

cargo build --release -- target-cpu=neoverse-n1

The default installation of Python pip on some Linux distributions is out of date (<19.3). To install binary wheel packages released for Graviton, upgrade the pip installation using:

sudo python3 -m pip install --upgrade pip

The Arm software ecosystem is continually improving. As a general rule, use later versions of compilers and language runtimes whenever possible. The AWS Graviton Getting Started GitHub repository includes known recent changes to popular packages that improve performance, including ffmpeg, PHP, .Net, PyTorch, and zlib.

You can use https://pkgs.org/ as a package repository search tool.

Sometimes code includes architecture specific optimizations. These can include code optimized in assembly using specific instructions for CRC, or enabling a feature that works well on particular architectures. One way to see if any optimizations are missing for arm64 is to search the code for __x86_64__ ifdefs and see if there is corresponding arm64 code included. If not, consider alternative solutions.

For additional language-specific considerations, see the links within the GitHub repository.

The Graviton performance runbook is a performance profiling reference by the Graviton to benchmark, debug, and optimize application code.

Building functions packages as container images

Functions packaged as container images must be built for the architecture (x86 or arm64) they are going to use. There are arm64 architecture versions of the AWS provided base images for Lambda. To specify a container image for arm64, use the arm64 specific image tag, for example, for Node.js 14:

public.ecr.aws/lambda/nodejs:14-arm64
public.ecr.aws/lambda/nodejs:latest-arm64
public.ecr.aws/lambda/nodejs:14.2021.10.01.16-arm64

Arm64 Images are also available from Docker Hub.

You can also use arbitrary Linux base images in addition to the AWS provided Amazon Linux 2 images. Images that support arm64 include Alpine Linux 3.12.7 or later, Debian 10 and 11, Ubuntu 18.04 and 20.04. For more information and details of other supported Linux versions, see Operating systems available for Graviton based instances.

Migrating a function

Here is an example of how to migrate a Lambda function from x86 to arm64 and take advantage of newer software versions to improve price and performance. You can follow a similar approach to test your own code.

I have an existing Lambda function as part of an AWS SAM template configured without an Architectures property, which defaults to x86_64.

  Imagex86Function:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: src/
      Handler: app.lambda_handler
      Runtime: python3.9

The Lambda function code performs some compute intensive image manipulation. The code uses a dependency configured with the following version:

{
  "dependencies": {
    "imagechange": "^1.1.1"
  }
}

I duplicate the Lambda function within the AWS SAM template using the same source code and specify arm64 as the Architectures.

  ImageArm64Function:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: src/
      Handler: app.lambda_handler
      Runtime: python3.9
      Architectures:
        - arm64

I use AWS SAM to build both Lambda functions. I specify the --use-container flag to build each function within its architecture-specific build container.

sam build –use-container

I can use sam local invoke to test the arm64 function locally even on an x86 system.

AWS SAM local invoke

I then use sam deploy to deploy the functions to the AWS Cloud.

The AWS Lambda Power Tuning open-source project runs your functions using different settings to suggest a configuration to minimize costs and maximize performance. The tool allows you to compare two results on the same chart and incorporate arm64-based pricing. This is useful to compare two versions of the same function, one using x86 and the other arm64.

I compare the performance of the X86 and arm64 Lambda functions and see that the arm64 Lambda function is 12% cheaper to run:

Compare x86 and arm64 with dependency version 1.1.1

I then upgrade the package dependency to use version 1.2.1, which has been optimized for arm64 processors.

{
  "dependencies": {
    "imagechange": "^1.2.1"
  }
}

I use sam build and sam deploy to redeploy the updated Lambda functions with the updated dependencies.

I compare the original x86 function with the updated arm64 function. Using arm64 with a newer dependency code version increases the performance by 30% and reduces the cost by 43%.

Compare x86 and arm64 with dependency version 1.2.1

You can use Amazon CloudWatch,to view performance metrics such as duration, using statistics. You can then compare average and p99 duration between the two architectures. Due to the Graviton2 architecture, functions may be able to use less memory. This could allow you to right-size function memory configuration, which also reduces costs.

Deploying arm64 functions in production

Once you have confirmed your Lambda function performs successfully on arm64, you can migrate your workloads. You can use function versions and aliases with weighted aliases to control the rollout. Traffic gradually shifts to the arm64 version or rolls back automatically if any specified CloudWatch alarms trigger.

AWS SAM supports gradual Lambda deployments with a feature called Safe Lambda deployments using AWS CodeDeploy. You can compile package binaries for arm64 using a number of CI/CD systems. AWS CodeBuild supports building Arm based applications natively. CircleCI also has Arm compute resource classes for deployment. GitHub Actions allows you to use self-hosted runners. You can also use AWS SAM within GitHub Actions and other CI/CD pipelines to create arm64 artifacts.

Conclusion

Lambda functions using the Arm/Graviton2 architecture provide up to 34 percent price performance improvement. This blog discusses a number of considerations to help you migrate functions to arm64.

Many functions can migrate seamlessly with a configuration change, others need to be rebuilt to use arm64 packages. I show how to migrate a function and how updating software to newer versions may improve your function performance on arm64. You can test your own functions using the Lambda PowerTuning tool.

Start migrating your Lambda functions to Arm/Graviton2 today.

For more serverless learning resources, visit Serverless Land.

Continuous compliance monitoring using custom audit controls and frameworks with AWS Audit Manager

2022-01-18 Deenadayaalan Thirugnanasambandam

Post Syndicated from Deenadayaalan Thirugnanasambandam original https://aws.amazon.com/blogs/security/continuous-compliance-monitoring-using-custom-audit-controls-and-frameworks-with-aws-audit-manager/

For most customers today, security compliance auditing can be a very cumbersome and costly process. This activity within a security program often comes with a dependency on third party audit firms and robust security teams, to periodically assess risk and raise compliance gaps aligned with applicable industry requirements. Due to the nature of how audits are now performed, many corporate IT environments are left exposed to threats until the next manual audit is scheduled, performed, and the findings report is presented.

AWS Audit Manager can help you continuously audit your AWS usage and simplify how you assess IT risks and compliance gaps aligned with industry regulations and standards. Audit Manager automates evidence collection to reduce the “all hands-on deck” manual effort that often happens for audits, while enabling you to scale your audit capability in the cloud as your business grows. Customized control frameworks help customers evaluate IT environments against their own established assessment baseline, enabling them to discern how aligned they are with a set of compliance requirements tailored to their business needs. Custom controls can be defined to collect evidence from specific data sources, helping rate the IT environment against internally defined audit and compliance requirements. Each piece of evidence collected during the compliance assessment becomes a record that can be used to demonstrate compliance with predefined requirements specified by a control.

In this post, you will learn how to leverage AWS Audit Manager to create a tailored audit framework to continuously evaluate your organization’s AWS infrastructure against the relevant industry compliance requirements your organization needs to adhere to. By implementing this solution, you can simplify yet accelerate the detection of security risks present in your AWS environment, which are relevant to your organization, while providing your teams with the information needed to remedy reported compliance gaps.

Solution overview

This solution utilizes an event-driven architecture to provide agility while reducing manual administration effort.

AWS Audit Manager–AWS Audit Manager helps you continuously audit your AWS usage to simplify how you assess risk and compliance with regulations and industry standards.
AWS Lambda–AWS Lambda is a serverless compute service that lets you run code without provisioning or managing servers, in response to events such as changes in data, application state or user actions.
Amazon Simple Storage Service (Amazon S3) –Amazon S3 is object storage built to store and retrieve any amount of data from anywhere, that offers industry leading availability, performance, security, and virtually unlimited scalability at very low costs.
AWS Cloud Development Kit (AWS CDK)–AWS Cloud Development Kit is a software development framework for provisioning your cloud infrastructure in code through AWS CloudFormation.

Architecture

This solution enables automated controls management using event-driven architecture with AWS Services such as AWS Audit Manager, AWS Lambda and Amazon S3, in integration with code management services like GitHub and AWS CodeCommit. The Controls owner can design, manage, monitor and roll out custom controls in GitHub with a simple custom controls configuration file, as illustrated in Figure 1. Once the controls configuration file is placed in an Amazon S3 bucket, the on-commit event of the file triggers a control pipeline to load controls in audit manager using a Lambda function.

Figure 1: Solution workflow

Solution workflow overview

The Control owner loads the controls as code (Controls and Framework) into an Amazon S3 bucket.
Uploading the Controls yaml file into the S3 bucket triggers a Lambda function to process the control file.
The Lambda function processes the Controls file, and creates a new control (or updates an existing control) in the Audit Manager.
Uploading the Controls Framework yaml file into the S3 bucket triggers a Lambda function to process the Controls Framework file.
The Lambda function validates the Controls Framework file, and updates the Controls Framework library in Audit Manager

This solution can be extended to create custom frameworks based on the controls, and to run an assessment framework against the controls.

Prerequisite steps

Sign in to your AWS Account
Login to the AWS console and choose the appropriate AWS Region.
In the Search tab, search for AWS Audit Manager

Figure 2. AWS Audit Manager

Choose Set up AWS Audit Manager.

Keep the default configurations from this page, such as Permissions and Data encryption. When done choose Complete setup.

Before deploying the solution, please ensure that the following software packages and their dependencies are installed on your local machine:

Node.js v12 or above	https://nodejs.org/en/
AWS CLI version 2	https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html
AWS CDK	https://docs.aws.amazon.com/cdk/latest/guide/getting_started.html
jq	https://stedolan.github.io/jq/
git	https://git-scm.com/
AWS CLI configuration	https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-quickstart.html

Solution details

To provision the enterprise control catalog with AWS Audit Manager, start by cloning the sample code from the aws-samples repository on GitHub, followed by running the installation script (included in this repository) with sample controls and framework from your AWS Account.

To clone the sample code from the repository

On your development terminal, git clone the source code of this blog post from the AWS public repository:

git clone [email protected]:aws-samples/enterprise-controls-catalog-via-aws-audit-manager.git

To bootstrap CDK and run the deploy script

The CDK Toolkit Stack will be created by cdk bootstrap and will manage resources necessary to enable deployment of Cloud Applications with AWS CDK.

cdk bootstrap aws://<AWS Account Number>/<Region> # Bootstrap CDK in the specified account and region

cd audit-manager-blog

./deploy.sh

Workflow

Figure 3 illustrates the overall deployment workflow. The deployment script triggers the NPM package manager, and invokes AWS CDK to create necessary infrastructure using AWS CloudFormation. The CloudFormation template offers an easy way to provision and manage lifecycles, by treating infrastructure as code.

Figure 3: Detailed workflow lifecycle

Once the solution is successfully deployed, you can view two custom controls and one custom framework available in AWS Audit Manager. The custom controls use a combination of manual and automated evidence collection, using compliance checks for resource configurations from AWS Config.

To verify the newly created custom data security controls

In the AWS console, go to AWS Audit Manager and select Control library
Choose Custom controls to view the controls DataSecurity-DatainTransit and DataSecurity-DataAtRest

Figure 4. View custom controls

To verify the newly created custom framework

In the AWS console, go to AWS Audit Manager and select Framework library.
Choose Custom frameworks to view the following framework:

Figure 5. Custom frameworks list

You have now successfully created the custom controls and framework using the proposed solution.

Next, you can create your own controls and add to your frameworks using a simple configuration file, and let the implemented solution do the automated provisioning.

To set up error reporting

Before you begin creating your own controls and frameworks, you should complete the error reporting configuration. The solution automatically sets up the error reporting capability using Amazon SNS, a web service that enables sending and receiving notifications from the cloud.

In the AWS Console, go to Amazon SNS > Topics > AuditManagerBlogNotification
Select Create subscription and choose Email as your preferred endpoint to subscribe.
This will trigger an automated email on subscription confirmation. Upon confirmation, you will begin receiving any error notifications by email.

To create your own custom control as code

Follow these steps to create your own controls and frameworks:

Create a new control file named example-control.yaml with contents as shown below. This creates a custom control to check whether all public access to data in Amazon S3 is prohibited:

name:
DataSecurity-PublicAccessProhibited

description:
Information and records (data) are managed consistent with the organization’s risk strategy to protect the confidentiality, integrity, and availability of information.

actionPlanTitle:
All public access block settings are enabled at account level

actionPlanInstructions:
Ensure all Amazon S3 resources have public access prohibited

testingInformation:
Test attestations – preventive and detective controls for prohibiting public access

tags:
ID: PRDS-3Subcategory: Public-Access-Prohibited
Category: Data Security-PRDS
CIS: CIS17
COBIT: COBIT 5 APO07-03
NIST: NIST SP 800-53 Rev 4

datasources:
sourceName: Config attestation
sourceDescription: Config attestation
sourceSetUpOption: System_Controls_Mapping
sourceType: AWS_Config

sourceKeyword:
keywordInputType: SELECT_FROM_LIST
keywordValue: S3_ACCOUNT_LEVEL_PUBLIC_ACCESS_BLOCKS

Go to AWS Console > AWS CloudFormation > Stacks. Select AuditManagerBlogStack and choose Outputs.
Make note of the bucketOutput name that starts with auditmanagerblogstack-
Upload the example-control.yaml file into the auditmanagerblogstack- bucket noted in step 3, inside the controls folder
The event-driven architecture is deployed as part of the solution. Uploading the file to the Amazon S3 bucket triggers an automated event to create the new custom control in AWS Audit Manager.

To validate your new custom control is automatically provisioned in AWS Audit Manager

In the AWS console, go to AWS Audit Manager and select Control library
Choose Custom controls to view the following controls:

Figure 6. Audit Manager custom controls are listed as Custom controls

To create your own custom framework as code

Create a new framework file named example-framework.yaml with contents as shown below:

name:
Sample DataSecurity Framework

description:
A sample data security framework to prohibit public access to data

complianceType:
NIST

controlSets:
– name: Prohibit public access
controls:
– DataSecurity-PublicAccessProhibited

tags:
Tag1: DataSecurity
Tag2: PublicAccessProhibited

Go to AWS Console > AWS CloudFormation > Stacks. Select AuditManagerBlogStack and choose Outputs.
Make note of the bucketOutput name that starts with auditmanagerblogstack-
Upload the example-framework.yaml file into the bucket noted in step 3 above, inside the frameworks folder
The event driven architecture is deployed as part of the blog. The file upload to Amazon S3 triggers an automated event to create the new custom framework in AWS Audit Manager.

To validate your new custom framework automatically provisioned in AWS Audit Manager

Go to AWS Audit Manager in the AWS console and select Control library
Click Custom controls and you should be able to see the following controls:

Figure 7. View custom controls created via custom repo

Congratulations, you have successfully created your new custom control and framework using the proposed solution.

Next steps

An Audit Manager assessment is based on a framework, which is a grouping of controls. Using the framework of your choice as a starting point, you can create an assessment that collects evidence for the controls in that framework. In your assessment, you can also define the scope of your audit. This includes specifying which AWS accounts and services you want to collect evidence for. You can create an assessment from a custom framework you build yourself, using steps from the Audit Manager documentation.

Conclusion

The solution provides the dynamic ability to design, develop and monitor capabilities that can be extended as a standardized enterprise IT controls catalogue for your company. With AWS Audit Manager, you can build compliance controls as code, with capability to audit your environment on a daily, weekly, or monthly basis. You can use this solution to improve the dynamic nature of assessments with AWS Audit Manager’s compliance audit, on time with reduced manual effort. To learn more about our standard frameworks to assist you, see Supported frameworks in AWS Audit Manager which provides prebuilt frameworks based on AWS best practices.

Configure AWS SSO ABAC for EC2 instances and Systems Manager Session Manager

2022-01-12 Rodrigo Ferroni

Post Syndicated from Rodrigo Ferroni original https://aws.amazon.com/blogs/security/configure-aws-sso-abac-for-ec2-instances-and-systems-manager-session-manager/

In this blog post, I show you how to configure AWS Single Sign-On to define attribute-based access control (ABAC) permissions to manage Amazon Elastic Compute Cloud (Amazon EC2) instances and AWS Systems Manager Session Manager for federated users. This combination allows you to control access to specific Amazon EC2 instances based on users’ attributes. I show you how defined AWS SSO identity source attributes like login and department can be used, and how custom attributes like SSMSessionRunAs can be used to pass these attributes into Amazon Web Services (AWS) from an external identity provider (IdP) using SAML 2.0 assertion.

AWS SSO added support for ABAC to enable you to create fine-grained permissions for your workforce in AWS using user attributes. Using user attributes as tags in AWS helps you simplify the process of creating fine-grained permissions in AWS and enables you to ensure that your workforce has access only to the AWS resources with matching tags.

The new feature works with any supported AWS SSO identity source. This post walks you through the steps to enable attributes for access control, create permission sets and manage assignments when using a supported external IdP as your identity source.

Solution overview

The following architecture diagram—Figure 1—presents an overview of the solution.

Figure 1: Solution architecture diagram

In the example in Figure 1, Alice and Bob are users who each have the attributes
login, department, and SSMSessionRunAs. These attributes are created and updated in the external directory—Okta in this example—under those users’ profiles. The first two attributes are automatically synchronized by using System for Cross-domain Identity Management (SCIM) protocol between AWS SSO and Okta and configured within AWS SSO settings. The third custom attribute is passed directly from Okta into the AWS accounts as a new SAML assertion.

Both users are using the same AWS SSO custom permission set that allows them to launch a new Amazon EC2 instance with proper tags enforcement. Based on those tags, they can start, stop, and restart the EC2 instance if they are in the same department, and to terminate it if they are the owner. Also, they can connect using Session Manager if they’re in the same department. Users can sign in to those instances using the Linux OS user defined in the attribute SSMSessionRunAs.

Prerequisites

To perform the steps to use AWS SSO attributes for ABAC, you must already have deployed AWS SSO for your AWS Organizations and have connected with an external identity source using SAML and SCIM protocols. For more information, see Checklist: Configuring ABAC in AWS using AWS SSO.

You need two test users for implementing and testing the solution. You can use two existing users, or create new users named Alice and Bob to match the solution and testing described in the following sections.

Implement the solution

The basic steps to implement the solution are:

Confirm in AWS SSO settings that you have defined an external IdP, authentication via SAML 2.0, and provisioning via SCIM protocol.
Enable attributes for access control and define the two supported attributes: login and department.
Create a new user attribute in the Okta Directory.
Edit and confirm the users’ attributes defined in the Okta Directory profile.
Configure the SAML attribute statement in the Okta AWS SSO application.
Create a new permission set using an ABAC policy.
Create an AWS account assignment to the users using the permission set created in the previous step.

Confirm AWS SSO configuration

In this first step, you confirm that AWS SSO has been properly configured. Go to AWS SSO console SSO settings to check that the configuration of your identity source, authentication, and provisioning is as follows:

Identity source: External Identity Provider
Authentication: SAML 2.0
Provisioning: SCIM

Confirm authentication is working as expected, by going to your user portal URL in a new browser instance (to ensure your user authentication doesn’t overwrite your existing authentication). The user portal offers a single place to access all the assigned AWS accounts, roles, and applications. For example, it should look like https://exampledomain.awsapps.com/start. Once you access it, the process automatically redirects the request to your external provider for authentication, and then returns the user to the AWS SSO user portal.
To confirm provisioning, go to the AWS SSO console and choose Users from the right panel. You should see your Okta users assigned to the AWS SSO application being synchronized by SCIM protocol. Select any user to see the Created by SCIM and Updated by SCIM information for that user.

Enable AWS SSO attributes for access control

In this step, you enable ABAC and then configure AWS SSO attributes. This solution uses the Attributes for access control page in the AWS Management Console to enter the key and value pairs.

To enable attributes for access control

Open the AWS SSO console.
Choose Settings.
On the Settings page, under Identity source, next to Attributes for access control, select Enable. As shown in Figure 2.

Figure 2: Attributes for access control settings (enable ABAC)

Once ABAC is enabled, you can select the attributes to be synchronized. For this use case, select login and department.

To select your attributes using the AWS SSO console

Open the AWS SSO console.
Choose Settings.
On the Settings page, under Identity source, next to Attributes for access control, choose View details.
On the Attributes for access control page, notice the Key and Value columns. This is where you will be mapping the attribute from your identity source to an attribute that AWS SSO passes as a session tag. Set the first key and value pair by entering login as the key and ${path:userName} as the value. Set the second key and value pair to department and ${path:enterprise.department}. The settings are shown in Figure 3 below.

Figure 3: Map attributes using the Attributes for access control page
Choose Save changes.

Create a new attribute in Okta Directory

In this third step, you create the new custom attribute SSMSessionRunAs.

To create a new user attribute

Open the Okta console.
Under Directory, choose Profile Editor.
Choose Edit Profile for Okta User (default).
Under Attributes, choose Add Attribute as follows:
Data type: Select String
Display Name: Enter SSMSessionRunAs
Variable Name: Enter SSMSessionRunAs
Attribute Length: Select Less than and enter 10 (max).
Choose Save.

Edit and confirm users’ attributes defined in Okta Directory profile

Now that you have the new attribute SSMSessionRunAs created, go to the users’ profiles to enter the Department and SSMSessionRunAs values for both users.

To edit and confirm users’ attributes

Open the Okta console.
Under Directory, choose People.
Select user Bob.
Under Profile tab choose Edit as follows:
For the key Department, enter blue as the value.

For the key SSMSessionRunAs, enter bob as the value.
Choose Save.
Repeat steps 1 through 5 for Alice. For the key Department, enter amber as the value and for SSMSessionRunAs, enter alice as the value.
Confirm that the attributes of both users are defined in the external directory as follows:Username (login): [email protected]
First name (firstName): Bob
Last name (lastName): Rodriguez
Display name (displayName): Bob
Department (department): blue
SSMSessionRunAs (SSMSessionRunAs): bob

Username (login): [email protected]
First name (firstName): Alice
Last name (lastName): Rosalez
Display name (displayName): Alice
Department (department): amber
SSMSessionRunAs (SSMSessionRunAs): alice

Configure SAML attribute statement in Okta AWS SSO application

The attribute SSMSessionRunAs isn’t available as an attribute within AWS SSO. However, you can include it by defining SAML attribute statements, which are inserted into the SAML assertions.

To create a new SAML attribute

Open the Okta Application console.
Choose AWS Single Sign-on application.
On the Sign On tab, choose Edit Settings.
Under SAML 2.0 Attributes Statements enter the following:
- For Name, enter https://aws.amazon.com/SAML/Attributes/AccessControl:SSMSessionRunAs
- For Name format, select URI Reference
- For Value, enter user.SSMSessionRunAs
Choose Save.

Create a new permission set using an ABAC policy

In this step, you create a permissions policy that determines who can access your AWS resources based on the configured attribute value. When you enable ABAC and specify attributes, AWS SSO passes the attribute value of the authenticated user into AWS Identity and Access Management (IAM) for use in policy evaluation.

To create a permission set

Open the AWS SSO console.
Choose AWS accounts.
Select the Permission sets tab.
Choose Create permission set.

On the Create new permission set page, choose Create a custom permission set.

Choose Next: Details.
Under Create a custom permission set, enter a name that will identify this permission set in AWS SSO. This name will also appear as an IAM role in the user portal for any users who have access to it. For this solution, name it myCustomPermissionSetEC2SSM.

Choose Create a custom permissions policy and paste in the following ABAC policy document:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowDescribeList",
      "Action": [
        "ec2:Describe*",
        "ssm:Describe*",
        "ssm:Get*",
        "ssm:List*",
        "iam:ListInstanceProfiles",
        "cloudwatch:DescribeAlarms"
      ],
      "Effect": "Allow",
      "Resource": "*"
    },
    {
      "Sid": "AllowRunInstancesResources",
      "Effect": "Allow",
      "Action": "ec2:RunInstances",
      "Resource": [
        "arn:aws:ec2:*::image/*",
        "arn:aws:ec2:*::snapshot/*",
        "arn:aws:ec2:*:*:subnet/*",
        "arn:aws:ec2:*:*:key-pair/*",
        "arn:aws:ec2:*:*:security-group/*",
        "arn:aws:ec2:*:*:network-interface/*"
      ]
    },
    {
      "Sid": "AllowRunInstancesConditions",
      "Effect": "Allow",
      "Action": "ec2:RunInstances",
      "Resource": [
        "arn:aws:ec2:*:*:instance/*",
        "arn:aws:ec2:*:*:volume/*",
        "arn:aws:ec2:*:*:network-interface/*"
      ],
      "Condition": {
        "StringLike": {
          "aws:RequestTag/Name": "*"
        },
        "StringEquals": {
          "aws:RequestTag/Owner": "${aws:PrincipalTag/login}",
          "aws:RequestTag/Department": "${aws:PrincipalTag/department}"
        },
        "ForAllValues:StringEquals": {
          "aws:TagKeys": [
            "Name",
            "Owner",
            "Department"
          ]
        }
      }
    },
    {
      "Sid": "AllowCreateTagsOnRunInstance",
      "Effect": "Allow",
      "Action": "ec2:CreateTags",
      "Resource": [
        "arn:aws:ec2:*:*:volume/*",
        "arn:aws:ec2:*:*:instance/*",
        "arn:aws:ec2:*:*:network-interface/*"
      ],
      "Condition": {
        "StringEquals": {
          "ec2:CreateAction": "RunInstances"
        }
      }
    },
    {
      "Sid": "AllowPassRoleSpecificRole",
      "Effect": "Allow",
      "Action": "iam:PassRole",
      "Resource": "arn:aws:iam::*:role/EC2UbuntuSSMRole"
    },
    {
      "Sid": "AllowEC2ActionsConditions",
      "Effect": "Allow",
      "Action": [
        "ec2:StartInstances",
        "ec2:StopInstances",
        "ec2:RebootInstances"
      ],
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "ec2:ResourceTag/Department": "${aws:PrincipalTag/department}"
        }
      }
    },
    {
      "Sid": "AllowTerminateConditions",
      "Effect": "Allow",
      "Action": [
        "ec2:TerminateInstances"
      ],
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "ec2:ResourceTag/Owner": "${aws:PrincipalTag/login}"
        }
      }
    },
    {
      "Sid": "AllowStartSessionConditions",
      "Effect": "Allow",
      "Action": [
        "ssm:StartSession"
      ],
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "ssm:resourceTag/Department": "${aws:PrincipalTag/department}"
        }
      }
    },
    {
      "Sid": "AllowTerminateSessionConditions",
      "Effect": "Allow",
      "Action": [
        "ssm:TerminateSession"
      ],
      "Resource": [
        "arn:aws:ssm:*:*:session/${aws:PrincipalTag/login}-*"
      ]
    }
  ]
}

Choose Next: Tags.
Review the selections you made, and then choose Create.

The policy described above uses SAML session tags for the ABAC to define permissions based on attributes. These attributes are the tags passed in the AssumeRoleWithSAML operation when the SAML-based federation occurs.

A combination of global (aws:TagKeys, aws:PrincipalTag, aws:RequestTag) and service (ec2:ResourceTag, ec2:CreateAction, ssm:resourceTag) condition keys is used to assign the permissions.

To learn more about AWS global and service conditions keys, see AWS global condition context keys and The condition keys table for AWS services.

Assign users to an AWS account

In this step, you use the permission set created in the previous step to assign access to the users for a specified AWS account.

To assign access to users

Open the AWS SSO console.
Choose AWS accounts.
Under the AWS organization tab, in the list of AWS accounts, select one or more accounts to which you want to assign access.
Choose Assign users.
On the Select users or groups page, select both test users from the list of users as shown in Figure 4.

Note: You can use the search box to look for specific users.

Figure 4: Select users to assign to AWS accounts
Choose Next: Permission sets.
On the Select permission sets page, select the permission sets that you created in step 5 to apply to the users from the table as shown in Figure 5.

Figure 5: Select permissions sets
Choose Finish to start the configuration of your AWS account. When configuration is complete, a message is displayed stating that you have successfully configured your AWS account as shown in Figure 6.

Figure 6: Confirmation that configuration is complete

Test the solution

Now that you have everything in place, let’s test the solution. To test the solution, you’ll log in to AWS SSO, access the AWS account and check the event logs, and test the Amazon EC2 operations.

Log in to AWS SSO as Bob through your external IdP

Enter the user portal URL in a browser window and log in to AWS SSO as Bob. AWS SSO redirects to the external provider for the log in process. After successful authentication, the external provider redirects to the AWS SSO portal, which shows you a list of the AWS accounts that you have access to. In this case, Bob has access to one AWS account as shown in Figure 7.

Figure 7: AWS SSO showing AWS accounts that the user has access to

Access the AWS account using the permission set and confirm the event logs

Select the Management console link for the AWS account that has the myCustomPermissionSetEC2SSM permission set that you created earlier. This action federates into the AWS account and is logged in to AWS CloudTrail with the API AssumeRoleWithSAML. To confirm that the SAML session tags are being passed in the session, look at the API event log in the CloudTrail Event history console. In the following example, you can check the principalTags keys and their values under requestParameters.

{
     "eventVersion": "1.08",
     "userIdentity": {
          "type": "SAMLUser",
          "principalId": "d/UbWH0ijLBmlakaboZwi5CA/30=:[email protected]",
          "userName": "[email protected]",
          "identityProvider": "d/UbWH0ijLBmlakaboZwi5CA/30="
},
     "eventTime": "2021-05-13T16:08:48Z",
     "eventSource": "sts.amazonaws.com",
     "eventName": "AssumeRoleWithSAML",
     ...
     "requestParameters": {
        "sAMLAssertionID": "_5072d119-64f5-4341-aeed-30d9b7c24b5b",
        "roleSessionName": "[email protected]",
        "principalTags": {
            "SSMSessionRunAs": "bob",
            "department": "blue",
            "login": "[email protected]"
        },
        "durationSeconds": 3600,
        "roleArn": "arn:aws:iam::555555555555:role/aws-reserved/sso.amazonaws.com/AWSReservedSSO_myCustomPermissionSetEC2SSM_9e80ec498218bbea",
        "principalArn": "arn:aws:iam::555555555555:saml-provider/AWSSSO_5f872b6782a0507a_DO_NOT_DELETE"
    },
     "responseElements": {
     ...

Test EC2 operations

Open the Amazon EC2 console:
For this example, when opening the Amazon EC2 console there are already three running EC2 instances to test the ABAC policy that have been created with proper tags explained in the following step. From the top menu, you can also confirm the federated login AWSReservedSSO_myCustomPermissionSetEC2SSM_9e80ec498218bbea/[email protected] that represents the AWS SSO managed role and the user as shown in Figure 8.

Figure 8: EC2 instances and user information
Launch a new EC2 instance:
Start testing the ABAC policy by launching a new EC2 instance. This action is authorized only when you fill in the three required tags: Name, Owner, and Department.
1. From the Amazon EC2 console, choose Launch Instances.
2. Set the AMI, for this example select an Ubuntu-based OS.
3. Set the Instance Type, a t2.micro will work.
4. Configure the EC2 instance. Choose an IAM role to allow Systems Manager to manage the new EC2 instance. In this case, you have to create the IAM role EC2UbuntuSSMRole with the AWS managed policy AmazonEC2RoleforSSM attached in advanced with proper IAM permissions since the user Bob is not allow to do so. Then, you must use the user data to create the OS Ubuntu user—Bob—that you need to log in to the EC2 instance by using Session Manager. You can copy and paste the following to create the user “Bob”:#!/bin/bash
  sudo useradd -m bob
5. Add storage using the default settings.
6. Add tags. From the ABAC policy previously created, you can confirm that tag key Name can be anything as the condition StringLike is indicated with a wildcard (*). The tag keys Owner and Department have to match the principal session tags passed through federation. In this case, enter [email protected] as the key Owner, and enter blue as the Department, as shown in Figure 9.
  
  Figure 9: EC2 tags describing key value pairs
7. Configure security groups. When configuring security groups, you can choose an existing security group that doesn’t allow any inbound traffic to the SSH port. Since when using Session Manager you connect to the EC2 instance through an API that is going to be an outbound connection. This way you can safely leave the security group inbound rules close.
8. Review and launch. It will ask you about selecting or creating a key pair. You don’t need one, because you’re using Session Manager. Proceed without selecting or creating a new SSH key pair. When launching the EC2 instance with the correct tag keys and values, you get the success message shown in Figure 10.
  
  Figure 10: EC2 success message launching an instance with the correct tags
  
  If there are any missing tag keys or the values aren’t correct, the action will be denied as shown in Figure 11. For more information, you can decode the authorization error message using the API DecodeAuthorizationMessage.
  
  Figure 11: EC2 failed message launching an instance with incorrect tags
Stop, reboot, and terminate EC2 instances.
The next tests are to be stop, reboot, and terminate the EC2 instances. In the ABAC policy you defined that only users who have the same department value as the resource can perform the first two actions. You can terminate and EC2 instance only if you are an owner. To stop, reboot, and terminate instances, open the EC2 Console, choose Instances, and select the instance you want to affect. Choose Instance state and choose the action you want to test: Stop instance, Reboot instance or Terminate instance.

Trying to stop the EC2 instance amber-instance where Department is amber is shown in Figure 12.

Figure 12: EC2 console showing how to stop an instance

The action should fail as shown in Figure 13.

Figure 13: EC2 instance failure message stopping an instance with wrong tags

Only when the department value of the EC2 instance is blue is it possible to stop or reboot the instance as shown in Figure 14.

Figure 14: EC2 success message stopping an instance with correct tags

Only when the owner who launched the EC2 instance matches with the federated login is it possible to terminate the instance. Trying to terminate an EC2 instance that was launched by anyone other than the owner will lead to a failed action as shown in Figure 15.

Figure 15: EC2 failed message terminating an instance with incorrect tags
Try to modify tags. Because ABAC policies rely on tags, you cannot modify tags after the resources have been created. This is set in the ABAC policy statement AllowCreateTagsOnRunInstance in Create a new permission set using an ABAC policy. If you try to modify any tag keys or values on existing resources, the changes will be denied. For example, if you try to modify the owner of a tag on an existing EC2 instance, you get the “Failed to update tags” error message as shown in Figure 16.

Figure 16: Failed message when attempting to modify tags
Connect to the EC2 instance using Session Manager.
1. Test logging in to the EC2 instance by choosing the new instance and choosing Connect as shown in Figure 17.
  
  Figure 17: EC2 console selecting an instance to connect
2. Then choose the Session Manager tab and choose Connect as shown in Figure 18.
  
  Figure 18: EC2 console selecting Session Manager to connect
  
  This will open a new tab in the browser redirecting to a Systems Manager session where you can confirm that the Ubuntu OS user is Bob as shown in Figure 19.
  
  Figure 19: Systems Manager session started confirming Ubunto OS user
  
  Note: By default, sessions are launched using the credentials of a system-generated account named ssm-user that is created on a managed instance. However, you can instead launch sessions using any OS user by enabling the run as feature in SSM. To learn more about this, see Enable run as support for Linux and macOS instances in the Systems Manager Session Manager user guide.
3. Performing the same action in an EC2 instance with a different Department tag will lead to a denied action as shown in Figure 20. This is because the ABAC policy allows the StartSession action only when the Department key matches the Department value in the EC2 instance.
  
  Figure 20: Systems Manager StartSession failed message

Conclusion

In this blog post, you learned how to use AWS SSO with the two methods of passing attributes to AWS account using session tags for ABAC. You also learned how to build policies with tags as conditions to simplify and reuse custom permission sets. You have seen working examples with services like EC2, and Systems Manager Session Manager. To learn more about ABAC policies, SAML session tags, and how to pass session tags in federation, see IAM tutorial: Use SAML session tags for ABAC and Passing session tags using AssumeRoleWithSAML.

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security news? Follow us on Twitter.

Detect Real-Time Anomalies and Failures in Industrial Processes Using Apache Flink

2022-01-10 Hubert Asamer

Post Syndicated from Hubert Asamer original https://aws.amazon.com/blogs/architecture/detect-real-time-anomalies-and-failures-in-industrial-processes-using-apache-flink/

For a long time, industrial control systems were the heart of the manufacturing process which allows collecting, processing, and acting on data from the shop floor. Process manufacturers used a distributed control system (DCS) to do the automated control and operation of an industrial process or plant.

With the convergence of operational technology and information technology (IT), customers such as Yara are integrating their DCS with additional intelligence from the IT side. This provides customers with a holistic view of the different data sources to make more complex decisions with advanced analytics.

In this blog post, we show how to start with advanced analytics on streaming data coming from the shop floor. The sensor data, such as pressure and temperature, is typically published by a DCS. It is then ingested with a local edge gateway and streamed to the cloud with streaming and industrial internet of things (IoT) technology. Analytics on the streaming data is typically done before all data points are stored in the data layer. Figure 1 shows how the data flow can be modeled and visualized with AWS services.

Figure 1: High-level ingestion and analytics architecture

In this example, we are concentrating on the streaming analytics part in the Cloud. We will generate data from a simulated DCS to Amazon Kinesis Data Streams where you have a gateway such as AWS IoT Greengrass and maybe other IoT services in-between.

For the simulated process that the DCS is controlling, we use a well-documented industrial process for creating a chemical compound (acetic anhydride) called the Tennesee Eastman process (TEP). There are several simulations available as open source. We demonstrate how to use this data as a constant stream with more than 30 real-time measurement parameters, ingest to Kinesis Data Streams, and run in-stream analytics using Apache Flink. Within Apache Flink, data is grouped and mapped to the respective stages and parts of the industrial process, and constantly analyzed by calculating anomalies of all process stages. All raw data, plus the derived anomalies and failure patterns, are then ingested from Apache Flink to Amazon Timestream for further use in near real-time dashboards.

Overview of solution

Note: Refer to steps 1 to 6 in Figure 2.

As a starting point for a realistic and data intensive measurement source, we use an already existing (TEP) simulation framework written in C++ originally created from National Institute of Standards and Technology, and published as open source. The GitHub Blog repository contains a small patch which adds AWS connectivity with the software development kits (SDKs) and modifications to the command line arguments. The programs provided by this framework are (step 1) a simulation process starter with configurable starting conditions and timestep configurations and a real-time client (step 2) which connects to the simulation and sends the simulation output data to the AWS Cloud.

Tennesee Eastman process (TEP) background

A paper by Downs & Vogel, A plant-wide industrial process control problem, from 1991 states:

“This chemical standard process consists of a reactor/separator/recycle arrangement involving two simultaneous gas-liquid exothermic reactions.”

“The process produces two liquid products from four reactants. Also present are an inert and a byproduct making a total of eight components. Two additional byproduct reactions also occur. The process has 12 valves available for manipulation and 41 measurements available for monitoring or control.“

The simulation framework used can control all of the 12 valve settings and produces 41 measurement variables with varying sampling frequency.

Data ingestion

The 41 measurement variables, named xmeas_1 to xmeas_41, are emitted by the real-time client (step 2) as key-value JSON messages. The client code is configured to produce 100 messages per second. A built-in C++ Kinesis SDK allows the real-time client to directly stream JSON messages to a Kinesis data stream (step 3).

Figure 2 – Detailed system architecture

Stream processing with Apache Flink

Messages sent to Amazon Kinesis Data Stream are processed in configurable batch sizes by an Apache Flink application, deployed in Amazon Kinesis Data Analytics. Apache Flink is an open-source stream processing framework, written and usable in Java or Scala. As described in Figure 3, it allows the definition of various data sources (for example, a Kinesis data stream) and data sinks for storing processing results. In-between data can be processed by a range of operators—typically mapping and reducing functions (step 4).

In our case, we use a mapping operator where each batch of incoming messages is processed. In Code snippet 1, we apply a custom mapping function to the raw data stream. For rapid and iterative development purposes it’s possible to have the complete stream processing pipeline running in a local Java or Scala IDE such as Maven, Eclipse, or IntelliJ.

Figure 3: Flink execution plan (green: streaming data sources; yellow: data sinks)

public class StreamingJob extends AnomalyDetector {
---
  public static DataStream<String> createKinesisSource
    (StreamExecutionEnvironment env, 
     ParameterTool parameter)
    {
    // create Stream
    return kinesisStream;
  }
---
  public static void main(String[] args) {
    // set up the execution environment
    final StreamExecutionEnvironment env = 
      StreamExecutionEnvironment.getExecutionEnvironment();
---
    DataStream<List<TimestreamPoint>> mainStream =
      createKinesisSource(env, parameter)
      .map(new AnomalyJsonToTimestreamPayloadFn(parameter))
      .name("MaptoTimestreamPayload");
---
    env.execute("Amazon Timestream Flink Anomaly Detection Sink");
  }
}

Code snippet 1: Flink application main class

In-stream anomaly detection

Within the Flink mapping operator, a statistical outlier detection (anomaly detection) is implemented. Flink allows the inclusion of custom libraries within its operators. The library used here is published by AWS—a Random Cut Forest implementation available from GitHub. Random Cut Forest is a well understood statistical method which can operate on batches of measurements. It then calculates an anomaly score for each new measurement by comparing a new value with a cached pool (=forest) of older values.

The algorithm allows the creation of grouped anomaly scores, where a set of variables is combined to calculate a single anomaly score. In the simulated chemical process (TEP), we can group the measurement variables into three process stages:

reactor feed analysis
purge gas analysis
product analysis.

Each group consists of 5–10 measurement variables. We’re getting anomaly scores for a, b, and c. In Code snippet 2 we can learn how an anomaly detector is created. The class AnomalyDetector is instantiated and extended then three times (for our three distinct process stages) within the mapping function as described in Code snippet 3.

Flink distributes this calculation across its worker nodes and handles data deduplication processes within its system.

---
public class AnomalyDetector {
    protected final ParameterTool parameter;
    protected final Function<RandomCutForest, LineTransformer> algorithmInitializer;
    protected LineTransformer algorithm;
    protected ShingleBuilder shingleBuilder;
    protected double[] pointBuffer;
    protected double[] shingleBuffer;
    public AnomalyDetector(
      ParameterTool parameter,
      Function<RandomCutForest,LineTransformer> algorithmInitializer)
    {
      this.parameter = parameter;
      this.algorithmInitializer = algorithmInitializer;
    }
    public List<String> run(Double[] values) {
            if (pointBuffer == null) {
                prepareAlgorithm(values.length);
            }
      return processLine(values);
    }
    protected void prepareAlgorithm(int dimensions) {
---
      RandomCutForest forest = RandomCutForest.builder()
        .numberOfTrees(Integer.parseInt(
          parameter.get("RcfNumberOfTrees", "50")))
        .sampleSize(Integer.parseInt(
          parameter.get("RcfSampleSize", "8192")))
        .dimensions(shingleBuilder.getShingledPointSize())
        .lambda(Double.parseDouble(
          parameter.get("RcfLambda", "0.00001220703125")))
        .randomSeed(Integer.parseInt(
          parameter.get("RcfRandomSeed", "42")))
      .build();
---
    algorithm = algorithmInitializer.apply(forest);
  }

Code snippet 2: AnomalyDetector base class, which gets extended by the streaming applications main class

public class AnomalyJsonToTimestreamPayloadFn extends 
    RichMapFunction<String, List<TimestreamPoint>> {
  protected final ParameterTool parameter;
  private final Logger logger = 

  public AnomalyJsonToTimestreamPayloadFn(ParameterTool parameter) {
    this.parameter = parameter;
  }

  // create new instance of StreamingJob for running our Forest
  StreamingJob overallAnomalyRunner1;
  StreamingJob overallAnomalyRunner2;
  StreamingJob overallAnomalyRunner3;
---

  // use `open`method as RCF initialization
  @Override
  public void open(Configuration parameters) throws Exception {
    overallAnomalyRunner1 = new StreamingJob(parameter);
    overallAnomalyRunner2 = new StreamingJob(parameter);
    overallAnomalyRunner3 = new StreamingJob(parameter);
  super.open(parameters);
}
---

Code snippet 3: Mapping Function uses the Flink RichMapFunction open routine to initialize three distinct Random Cut Forests

Data persistence – Flink data sinks

After all anomalies are calculated, we can decide where to send this data. Flink provides various ready-to-use data sinks. In these examples, we fan out all (raw and processed) data to Amazon Kinesis Data Firehose for storing in Amazon Simple Storage Service (Amazon S3) (long term) (step 5) and to Amazon Timestream (short term) (step 5). Kinesis Data Firehose is configured with a small AWS Lambda function to reformat data from JSON to CSV, and data is stored with automated partitioning to Amazon S3. A Timestream data sink does not come pre-bundled with Flink. A custom Timestream ingestion code is used in these examples. Flink provides extensible operator interfaces for the creation of custom map and sink functions.

Timeseries handling

Timestream, in combination with Grafana, is used for near real-time monitoring. Grafana comes bundled with a Timestream data source plugin and can constantly query and visualize Timestream data (step 6).

Walkthrough

Our architecture is available as a deployable AWS CloudFormation template. The simulation framework comes packed as a docker image, with an option to install it locally on a linux host.

Prerequisites

To implement this architecture, you will need:

An AWS account
Docker (CE) Engine v18++
Java JDK v11++
maven v3.6++

We recommend running a local and recent Linux environment. It is assumed that you are using AWS Cloud9, deployed with CloudFormation, within your AWS account.

Steps

Follow these steps to deploy the solution and play with the simulation framework. At the end, detected anomalies derived from Flink are stored next to all raw data in Timestream and presented in Grafana. We’re using AWS Cloud9 and its Linux terminal capabilities here to fire up a Grafana instance, then manually run the simulation to ingest data to Kinesis and optionally manually start the Flink app from the console using Maven.

Deploy stack

After you’re logged in to the AWS Management console you can deploy the CloudFormation stack. This stack creates a fully configured AWS Cloud9 environment with the related GitHub Repo already in place, a Kinesis data stream, Kinesis Data Firehose delivery stream, Kinesis Data Analytics with Flink app deployed, Timestream database, and an S3 bucket.

After successful deployment, record two important facts from the CloudFormation console: the chosen stack name and the attribute 03Cloud9EnvUrl displayed in the Output Section of the stack. The attribute’s URL will take you directly to our deployed AWS Cloud9 environment.

Run post install step within AWS Cloud9

The deployed stack created an AWS Cloud9 environment and an AWS Identity and Access Management (IAM) instance profile. We apply this instance profile to AWS Cloud9 to interact with Kinesis, Timestream, and Amazon S3 throughout the next steps. The used script also configures and installs other required tools.

1. Open a terminal window.

$ cd flinkAnomalySteps/deployment
$ source c9-postInstall.sh
---SETTING UP IAM INSTANCE PROFILE
Please enter cloudformation stack name (default: flink-rcf-app):
# enter your stack name

Start a Grafana development server

In this section we are starting a Grafana server using docker. Cloud 9 allows us to expose web applications (for demo & development purposes) on container port 8080.

1. Open a terminal window.

$ cd ../src/grafana-dashboard
$ docker volume create grafana-storage
# this creates a docker volume for persisting your Grafana settings
$./start-grafana.sh
# this starts a recent Grafana using docker with Timestream plugin and a pre-configured dashboard in place

2. Open the preview panel by selecting Preview, and then select Preview Running Application.

3. Next, in the preview pane, select Pop out into new Window.

4. A new browser tab with Grafana opens.

5. Choose any username and password combination.

6. In Grafana use the “Search dashboards” icon on the left and choose “TEP-SIM-DEV”. This pre-configured dashboard displays data from Amazon Timestream (see step “Open Grafana”).

TEP simulation procedure

Within your local or AWS Cloud9 Linux environment, fetch the simulation docker image from the public AWS container registry, or build the simulation binaries locally, for building manually check the GitHub repo.

Start simulation (in separate terminal)

# starts container and switch into the container-shell
$ docker run -it --rm \
  --network host \
  --name tesim-runner \
  tesim-runner:01 \
 /bin/bash
# then inside container
$ ./tesim --simtime 100  --external-ctrl
# simulation started…

Manipulate simulation (in separate terminal)

Follow the steps here for a basic process disturbance task. Review the aspects of influencing the simulation in the GitHub-Repo. The rtclient program has a range of commands to use for introducing disturbances.

# first switch into running simulation container
$ docker exec -it tesim-runner /bin/bash
# now we can access the shared storage of the simulation process…
$ ./rtclient –setidv 6
# this enables one of the built in process disturbances (1-20)
$ ./rtclient –setidv 7
$ ./rtclient –setidv 8
$ …

Stream Simulation data to Amazon Kinesis DataStream (in separate terminal)

The client has a built-in record frequency of 50 messages per second. One message contains more than 50 measurements, so we have approximately 2,500 measurements per second.

$ ./rtclient -k

AWS libcrypto resolve: found static libcrypto 1.1.1 HMAC symbols
AWS libcrypto resolve: found static libcrypto 1.1.1 EVP_MD symbols
{"xmeas_1": 3649.739476,"xmeas_2": 4451.32071,"xmeas_3": 9.223142558,"xmeas_4": 32.39290913,"xmeas_5": 47.55975621,"xmeas_6": 2798.975688,"xmeas_7": 64.99582601,"xmeas_8": 122.8987929,"xmeas_9": 0.1978264656,…}
# Messages in JSON sent to Kinesis DataStream visible via stdout

Compile and start Flink Application (optional step)

If you want deeper insights into the Flink Application, we can start this as well from the AWS Cloud9 instance. Note: this is only appropriate in development.

$ cd flinkAnomalySteps/src
$ cd flink-rcf-app
$ mvn clean compile
# the Flink app gets compiled
$ mvn exec:java -Dexec.mainClass= \
    "com.amazonaws.services.kinesisanalytics.StreamingJob"
# Flink App is started with default settings in place…
…

Open Grafana dashboard (from the step Start a Grafana development server)

Process anomalies are visible instantly after you start the simulation. Use Grafana to drill down into the data as needed.

/**example - simplest possible Timestream query used for Viz:**/

SELECT CREATE_TIME_SERIES(time, measure_value::double) as anomaly_stream6 FROM "kdaflink"."kinesisdata1"

    WHERE measure_name='anomaly_score_stream6' AND

    time between ago(15m) and now()

Code snippet 4: Timestream SQL example; Timestream database is `kdaflink` – table is `kinesisdata1`

Figure 4 – Grafana dashboard showing near real-time simulation data, three anomalies, mapped to the TEP process, are constantly calculated by Flink

S3 raw metrics bucket

For the sake of completeness and potential usefulness, the Flink Application emits all raw data in an intermediate step to Kinesis Data Firehose. The service converts all JSON data to CSV format by using a small AWS Lambda function.

$ aws s3 ls flink-rcf-app-rawmetricsbucket-<CFN-UUID>/tep-raw-csv/2021/11/26/19/

Cleaning up

Delete the deployed CloudFormation stack. All resources (excluding S3 buckets) are permanently deleted.

Conclusion

In this blog post, we learned that in-stream anomaly detection and constant measurement data insights can work together. The Apache Flink framework offers a ready-to-use platform that is mission critical for future adoption across manufacturing and other industries. Other applications of the presented Flink pattern can run on capable edge compute devices. Integration with AWS IoT Greengrass and AWS Greengrass Stream Manager are part of the GitHub Blog repository.

Another extension includes measurement data pattern detection routines, which can coexist with in-stream anomaly detection and can detect specific failure patterns over time using time-windowing features of the Flink framework. You can refer to the GitHub repo which accompanies this blog post. Give it a try and let us know your feedback in the comments!