Things to Consider When You Build REST APIs with Amazon API Gateway

Post Syndicated from George Mao original https://aws.amazon.com/blogs/architecture/things-to-consider-when-you-build-rest-apis-with-amazon-api-gateway/

A few weeks ago, we kicked off this series with a discussion on REST vs GraphQL APIs. This post will dive deeper into the things an API architect or developer should consider when building REST APIs with Amazon API Gateway.

Request Rate (a.k.a. “TPS”)

Request rate is the first thing you should consider when designing REST APIs. By default, API Gateway allows up to 10,000 requests per second. You should use the built-in Amazon CloudWatch metrics to review how your API is being used. The Count metric in particular can help you review the total number of API requests in a given period.

It’s important to understand the actual request rate that your architecture is capable of supporting. For example, consider this architecture:

REST API 1

This API accepts GET requests to retrieve a user's cart by using a Lambda function to perform SQL queries against a relational database managed in RDS. If you receive a large burst of traffic, both API Gateway and Lambda will scale in response to the traffic. However, relational databases typically have limited memory/CPU capacity and will quickly exhaust the available connections.

As an API architect, you should design your APIs to protect your downstream applications. You can start by defining API Keys and requiring your clients to deliver a key with incoming requests. This lets you track each application or client that is consuming your API. It also lets you create Usage Plans and throttle your clients according to the plan you define. For example, if you know your architecture is capable of sustaining 200 requests per second, you should define a Usage Plan that sets a rate of 200 RPS and optionally configure a quota to allow a certain number of requests per day, week, or month.
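For example, a Usage Plan that throttles clients to 200 RPS with a daily quota could be created with the AWS SDK for Python, as in the minimal sketch below (the API ID, stage name, API key ID, and limits are placeholder values you would replace with your own):

import boto3

apigw = boto3.client("apigateway")

# Create a usage plan that throttles clients to 200 requests per second
# (with a small burst allowance) and caps them at 100,000 requests per day.
plan = apigw.create_usage_plan(
    name="StandardClients",
    apiStages=[{"apiId": "a1b2c3d4e5", "stage": "PROD"}],  # placeholder API and stage
    throttle={"rateLimit": 200.0, "burstLimit": 100},
    quota={"limit": 100000, "period": "DAY"},
)

# Associate an existing API key with the plan so clients presenting that key
# are throttled and metered according to the plan.
apigw.create_usage_plan_key(
    usagePlanId=plan["id"],
    keyId="abcdefghij",  # placeholder API key ID
    keyType="API_KEY",
)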

Additionally, API Gateway lets you define throttling settings for the whole stage or per method. If you know that a GET operation is less resource-intensive than a POST operation, you can override the stage settings and set different throttling settings for each resource.

Integrations and Design patterns

The example above describes a synchronous, tightly coupled architecture where the request must wait for a response from the backend integration (RDS in this case). This results in system scaling characteristics that are the lowest common denominator of all components. Instead, you should look for opportunities to design an asynchronous, loosely coupled architecture. A decoupled architecture separates the data ingestion from the data processing and allows you to scale each system separately. Consider this new architecture:

REST API 2

This architecture enables ingestion of orders directly into a highly scalable and durable data store such as Amazon Simple Queue Service (SQS). Your backend can process these orders at whatever speed suits your business requirements and system capacity. Most importantly, the health of the backend processing system does not affect your ability to continue accepting orders.
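As a minimal sketch of the ingestion side, the producer (whether a Lambda function or any other component) simply writes each order to the queue with the AWS SDK for Python; the queue URL below is a placeholder:

import json
import boto3

sqs = boto3.client("sqs")

# Placeholder queue URL for the order ingestion queue.
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/orders"

def ingest_order(order: dict) -> str:
    """Accept an order immediately; backend consumers process it later."""
    response = sqs.send_message(
        QueueUrl=QUEUE_URL,
        MessageBody=json.dumps(order),
    )
    return response["MessageId"]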

Security

Security with API Gateway falls into three major buckets, and I’ll outline them below. Remember, you should enable all three options to combine multiple layers of security.

Option 1 (Application Firewall)

You can enable AWS Web Application Firewall (WAF) for your entire API. WAF will inspect all incoming requests and block requests that fail your inspection rules. For example, WAF can inspect requests for SQL Injection, Cross Site Scripting, or whitelisted IP addresses.

Option 2 (Resource Policy)

You can apply a Resource Policy that protects your entire API. This is an IAM policy attached to your API that you can use to whitelist or blacklist client IP ranges, or to allow specific AWS accounts and AWS principals to access your API.

Option 3 (AuthZ)

  1. IAM: This AuthZ option requires clients to sign requests with the AWS Signature Version 4 (SigV4) signing process (see the sketch after this list). The associated IAM role or user must have permission to perform the execute-api:Invoke action against the API.
  2. Cognito: This AuthZ option requires clients to log in to Amazon Cognito and then pass the returned ID or access JWT in the Authorization header.
  3. Lambda Authorizer: This AuthZ option is the most flexible and lets you execute a Lambda function to perform any custom auth strategy you need. A common use case for this is OpenID Connect.
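Here is a minimal sketch of the IAM option from the client side, using Python with the requests library and botocore's SigV4 signer. The endpoint URL and Region are placeholders, and the calling credentials are assumed to have execute-api:Invoke permission:

import boto3
import requests
from botocore.auth import SigV4Auth
from botocore.awsrequest import AWSRequest

# Placeholder API endpoint and Region.
url = "https://a1b2c3d4e5.execute-api.us-east-1.amazonaws.com/prod/cart"
region = "us-east-1"

# Sign the request with the caller's AWS credentials (SigV4, service "execute-api").
credentials = boto3.Session().get_credentials()
aws_request = AWSRequest(method="GET", url=url)
SigV4Auth(credentials, "execute-api", region).add_auth(aws_request)

# Send the signed request; API Gateway verifies the signature and the caller's
# permission to perform execute-api:Invoke before forwarding it to the backend.
response = requests.get(url, headers=dict(aws_request.headers))
print(response.status_code, response.text)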

A Couple of Tips

Tip #1: Use Stage variables to avoid hard coding your backend Lambda and HTTP integrations. For example, you probably have multiple stages such as “QA” and “PROD” or “V1” and “V2.” You can define the same variable in each stage and specify different values. For example, you might have an API that executes a Lambda function. In each stage, define the same variable called functionArn. You can reference this variable as your Lambda ARN during your integration configuration using this notation: ${stageVariables.functionArn}. API Gateway will inject the corresponding value for the stage dynamically at runtime, allowing you to execute different Lambda functions by stage.
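As a sketch of how the same variable name can carry a different value per stage, the stages could be created with the AWS SDK for Python as follows (the API ID, deployment IDs, and function ARNs are placeholders):

import boto3

apigw = boto3.client("apigateway")

# Placeholder API identifier.
REST_API_ID = "a1b2c3d4e5"

# Each stage defines the same variable name with a stage-specific value.
for stage_name, deployment_id, function_arn in [
    ("QA",   "depqa123",   "arn:aws:lambda:us-east-1:123456789012:function:cart-qa"),
    ("PROD", "depprod456", "arn:aws:lambda:us-east-1:123456789012:function:cart-prod"),
]:
    apigw.create_stage(
        restApiId=REST_API_ID,
        stageName=stage_name,
        deploymentId=deployment_id,  # placeholder existing deployment
        variables={"functionArn": function_arn},
    )

# The integration can then reference ${stageVariables.functionArn} and resolve
# to the correct function for whichever stage serves the request.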

Tip #2: Use Path and Query variables to inject dynamic values into your HTTP integrations. For example, your cart API may define a userId Path variable that is used to lookup a user’s cart: /cart/profile/{userId}. You can inject this variable directly into your backend HTTP integration URL settings like this: http://myapi.someds.com/cart/profile/{userId}
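A sketch of that integration setup with the AWS SDK for Python might look like the following. The API ID, resource ID, and backend URL are placeholders, and the method is assumed to already declare the userId path parameter:

import boto3

apigw = boto3.client("apigateway")

# Placeholder identifiers for the /cart/profile/{userId} resource.
REST_API_ID = "a1b2c3d4e5"
RESOURCE_ID = "res123"

apigw.put_integration(
    restApiId=REST_API_ID,
    resourceId=RESOURCE_ID,
    httpMethod="GET",
    type="HTTP_PROXY",
    integrationHttpMethod="GET",
    uri="http://myapi.someds.com/cart/profile/{userId}",
    # Map the {userId} path parameter from the method request into the
    # {userId} placeholder in the backend URL. Assumes the method was created
    # with requestParameters={"method.request.path.userId": True}.
    requestParameters={
        "integration.request.path.userId": "method.request.path.userId"
    },
)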

Summary

This post covered strategies you should use to ensure your REST API architectures are scalable and easy to maintain. I hope you've enjoyed this post; our next post will cover GraphQL API architectures with AWS AppSync.

About the Author

George MaoGeorge Mao is a Specialist Solutions Architect at Amazon Web Services, focused on the Serverless platform. George is responsible for helping customers design and operate Serverless applications using services like Lambda, API Gateway, Cognito, and DynamoDB. He is a regular speaker at AWS Summits, re:Invent, and various tech events. George is a software engineer and enjoys contributing to open source projects, delivering technical presentations at technology events, and working with customers to design their applications in the Cloud. George holds a Bachelor of Computer Science and Masters of IT from Virginia Tech.

How to Architect APIs for Scale and Security

Post Syndicated from George Mao original https://aws.amazon.com/blogs/architecture/how-to-architect-apis-for-scale-and-security/

We hope you’ve enjoyed reading our posts on best practices for your serverless applications. This series of posts will focus on best practices and concepts you should be familiar with when you architect APIs for your applications. We’ll kick this first post off with a comparison between REST and GraphQL API architectures.

Introduction

Developers have been creating RESTful APIs for a long time, typically using HTTP methods, such as GET, POST, DELETE to perform operations against the API. Amazon API Gateway is designed to make it easy for developers to create APIs at any scale without managing any servers. API Gateway will handle all of the heavy lifting needed including traffic management, security, monitoring, and version/environment management.

GraphQL APIs are relatively new, with a primary design goal of allowing clients to define the structure of the data that they require. AWS AppSync allows you to create flexible APIs that access and combine multiple data sources.

REST APIs

Architecting a REST API is structured around creating combinations of resources and methods. Resources are paths that are present in the request URL, and methods are HTTP actions that you take against the resource. For example, you may define a resource called “cart”: http://myapi.somecompany.com/cart. The cart resource can respond to HTTP POSTs for adding items to a shopping cart or HTTP GETs for retrieving the items in your cart. With API Gateway, you implement this by defining the cart resource and attaching the GET and POST methods to it.

Behind the scenes, you can integrate with nearly any backend to provide the compute logic, data persistence, or business workflows. For example, you can configure an AWS Lambda function to perform the addition of an item to a shopping cart (HTTP POST). You can also use API Gateway to directly interact with AWS services like Amazon DynamoDB. An example is using API Gateway to retrieve items in a cart from DynamoDB (HTTP GET).

RESTful APIs tend to use Path and Query parameters to inject dynamic values into APIs. For example, if you want to retrieve a specific cart with an id of abcd123, you could design the API to accept a query or path parameter that specifies the cartId:

/cart?cartId=abcd123 or /cart/abcd123

Finally, when you need to add functionality to your API, the typical approach is to add additional resources. For example, to add a checkout function, you could add a resource called /cart/checkout.
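For instance, adding the checkout resource and a POST method with the AWS SDK for Python might look like this sketch (the API ID and parent resource ID are placeholders):

import boto3

apigw = boto3.client("apigateway")

# Placeholder API and parent (/cart) resource identifiers.
REST_API_ID = "a1b2c3d4e5"
CART_RESOURCE_ID = "res123"

# Create the /cart/checkout resource and expose a POST method on it.
checkout = apigw.create_resource(
    restApiId=REST_API_ID,
    parentId=CART_RESOURCE_ID,
    pathPart="checkout",
)
apigw.put_method(
    restApiId=REST_API_ID,
    resourceId=checkout["id"],
    httpMethod="POST",
    authorizationType="NONE",  # use IAM, Cognito, or a Lambda authorizer in production
)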

GraphQL APIs

Architecting GraphQL APIs is not structured around resources and HTTP verbs. Instead, you define your data types and configure where the operations retrieve data through resolvers. An operation is either a query or a mutation: queries retrieve data, while mutations modify data. Using the same example from above, you could define a Cart data type as follows:

type Cart {
  cartId: ID!
  customerId: String
  name: String
  email: String
  items: [String]
}

Next, you configure the fields in the Cart type to map to specific data sources. AppSync is then responsible for executing resolvers to obtain the appropriate information. Your client sends an HTTP POST to the AppSync endpoint with the exact shape of the data it requires. AppSync executes all configured resolvers to obtain the requested data and returns a single response to the client.

Figure: two example getCart queries requesting different sets of fields

With GraphQL, the client can change its query to specify the exact data that it needs. The example above shows two queries that ask for different sets of information. The first getCart query asks for all of the customer's static information (customerId, name, email) and the list of items in the cart. The second query asks only for the customer's static information. Based on the incoming query, AppSync executes the correct resolver(s) to obtain the data. The client submits the payload via an HTTP POST to the same endpoint in both cases; only the payload of the POST body changes.
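For illustration, such a request might look like the following sketch using Python and the requests library. The AppSync endpoint and API key are placeholders, the getCart field mirrors the example above, and other AppSync auth modes would use different headers:

import json
import requests

# Placeholder AppSync GraphQL endpoint and API key.
ENDPOINT = "https://example1234567890.appsync-api.us-east-1.amazonaws.com/graphql"
API_KEY = "da2-exampleapikey"

# Ask only for the fields this client needs.
query = """
query getCart($cartId: ID!) {
  getCart(cartId: $cartId) {
    customerId
    name
    email
    items
  }
}
"""

response = requests.post(
    ENDPOINT,
    headers={"x-api-key": API_KEY, "Content-Type": "application/json"},
    data=json.dumps({"query": query, "variables": {"cartId": "abcd123"}}),
)
print(response.json())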

As we saw above, a REST based implementation would require the API to define multiple HTTP resources and methods or path/query parameters to accomplish this.

AppSync also provides other powerful features that are not possible with REST APIs such as real-time data synchronization and multiple methods of authentication at the field and operation level.

Summary

As you can see, these are two different approaches to architecting your API. In our next few posts, we’ll cover specific features and architecture details you should be aware of when choosing between API Gateway (REST) and AppSync (GraphQL) APIs. In the meantime, you can read more about working with API Gateway and AppSync.

About the Author

George MaoGeorge Mao is a Specialist Solutions Architect at Amazon Web Services, focused on the Serverless platform. George is responsible for helping customers design and operate Serverless applications using services like Lambda, API Gateway, Cognito, and DynamoDB. He is a regular speaker at AWS Summits, re:Invent, and various tech events. George is a software engineer and enjoys contributing to open source projects, delivering technical presentations at technology events, and working with customers to design their applications in the Cloud. George holds a Bachelor of Computer Science and Masters of IT from Virginia Tech.

Simple Two-way Messaging using the Amazon SQS Temporary Queue Client

Post Syndicated from Rachel Richardson original https://aws.amazon.com/blogs/compute/simple-two-way-messaging-using-the-amazon-sqs-temporary-queue-client/

This post is contributed by Robin Salkeld, Sr. Software Development Engineer

Amazon SQS is a fully managed message queuing service that makes it easy to decouple and scale microservices, distributed systems, and serverless applications. Asynchronous workflows have always been the primary use case for SQS. Using queues ensures one component can keep running smoothly without losing data when another component is unavailable or slow.

We were surprised, then, to discover that many customers use SQS in synchronous workflows. For example, many applications use queues to communicate between frontends and backends when processing a login request from a user.

Why would anyone use SQS for this? The service stores messages for up to 14 days with high durability, but messages in a synchronous workflow often must be processed within a few minutes, or even seconds. Why not just set up an HTTPS endpoint?

The more we talked to customers, the more we understood. Here’s what we learned:

  • Creating a queue is often easier and faster than creating an HTTPS endpoint and the infrastructure necessary to ensure the endpoint’s scalability.
  • Queues are safe by default because they are locked down to the AWS account that created them. In addition, any DDoS attempt on your service is absorbed by SQS instead of loading down your own servers.
  • There is generally no need to create firewall rules for the communication between microservices if they use queues.
  • Although SQS provides durable storage (which isn’t necessary for short-lived messages), it is still a cost-effective solution for this use case. This is especially true when you consider all the messaging broker maintenance that is done for you.

However, setting up efficient two-way communication through one-way queues requires some non-trivial client-side code. In our previous two-part post series on implementing enterprise integration patterns with AWS messaging services, Point-to-point channels and Publish-subscribe channels, we discussed the Request-Response Messaging Pattern. In this pattern, each requester creates a temporary destination to receive each response message.

The simplest approach is to create a new queue for each response, but this is like building a road just so a single car can drive on it before tearing it down. Technically, this can work (and SQS can create and delete queues quickly), but we can definitely make it faster and cheaper.

To better support short-lived, lightweight messaging destinations, we are pleased to present the Amazon SQS Temporary Queue Client. This client makes it easy to create and delete many temporary messaging destinations without inflating your AWS bill.

Virtual queues

The key concept behind the client is the virtual queue. Virtual queues let you multiplex many low-traffic queues onto a single SQS queue. Creating a virtual queue only instantiates a local buffer to hold messages for consumers as they arrive; there is no API call to SQS and no costs associated with creating a virtual queue.

The Temporary Queue Client includes the AmazonSQSVirtualQueuesClient class for creating and managing virtual queues. This class implements the AmazonSQS interface and adds support for attributes related to virtual queues. You can create a virtual queue using this client by calling the CreateQueue API action and including the HostQueueURL queue attribute. This attribute specifies the existing SQS queue on which to host the virtual queue. The queue URL for a virtual queue is in the form <host queue URL>#<virtual queue name>. For example:

https://sqs.us-east-2.amazonaws.com/123456789012/MyQueue#MyVirtualQueueName

When you call the SendMessage or SendMessageBatch API actions on AmazonSQSVirtualQueuesClient with a virtual queue URL, the client first extracts the virtual queue name. It then attaches this name as an additional message attribute to each message, and sends the messages to the host queue. When you call the ReceiveMessage API action on a virtual queue, the calling thread waits for messages to appear in the in-memory buffer for the virtual queue. Meanwhile, a background thread polls the host queue and dispatches messages to these buffers according to the additional message attribute.

This mechanism is similar to how the AmazonSQSBufferedAsyncClient prefetches messages, and the benefits are similar. A single call to SQS can provide messages for up to 10 virtual queues, reducing the API calls that you pay for by up to a factor of ten. Deleting a virtual queue simply removes the client-side resources used to implement it, again without making API calls to SQS.
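To make the multiplexing idea concrete, here is a rough conceptual sketch in Python using plain boto3 calls. It is not the Temporary Queue Client itself (which is a Java library) and omits the background polling thread; it only illustrates routing messages on a shared host queue by a message attribute:

import boto3

sqs = boto3.client("sqs")
HOST_QUEUE_URL = "https://sqs.us-east-2.amazonaws.com/123456789012/MyQueue"  # placeholder

def send_to_virtual_queue(virtual_queue_name: str, body: str) -> None:
    # Tag the message with the virtual queue name; it travels on the host queue.
    sqs.send_message(
        QueueUrl=HOST_QUEUE_URL,
        MessageBody=body,
        MessageAttributes={
            "VirtualQueueName": {"DataType": "String", "StringValue": virtual_queue_name}
        },
    )

def dispatch_from_host_queue(buffers: dict) -> None:
    # One poller serves many virtual queues: route each received message to the
    # in-memory buffer named by its attribute.
    messages = sqs.receive_message(
        QueueUrl=HOST_QUEUE_URL,
        MessageAttributeNames=["VirtualQueueName"],
        MaxNumberOfMessages=10,
        WaitTimeSeconds=20,
    ).get("Messages", [])
    for msg in messages:
        name = msg["MessageAttributes"]["VirtualQueueName"]["StringValue"]
        buffers.setdefault(name, []).append(msg)
        sqs.delete_message(QueueUrl=HOST_QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])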

The diagram below illustrates the end-to-end process for sending messages through virtual queues:

Sending messages through virtual queues

Virtual queues are similar to virtual machines. Just as a virtual machine provides the same experience as a physical machine, a virtual queue divides the resources of a single SQS queue into smaller logical queues. This is ideal for temporary queues, since they frequently only receive a handful of messages in their lifetime. Virtual queues are currently implemented entirely within the Temporary Queue Client, but additional support and optimizations might be added to SQS itself in the future.

In most cases, you don’t have to manage virtual queues yourself. The library also includes the AmazonSQSTemporaryQueuesClient class. This class automatically creates virtual queues when the CreateQueue API action is called and creates host queues on demand for all queues with the same queue attributes. To optimize existing application code that creates and deletes queues, you can use this class as a drop-in replacement implementation of the AmazonSQS interface.

The client also includes the AmazonSQSRequester and AmazonSQSResponder interfaces, which enable two-way communication through SQS queues. The following is an example of an RPC implementation for a web application’s login process.

/**
 * This class handles a user's login request on the client side.
 */
public class LoginClient {

    // The SQS queue to send the requests to.
    private final String requestQueueUrl;

    // The AmazonSQSRequester creates a temporary queue for each response.
    private final AmazonSQSRequester sqsRequester = AmazonSQSRequesterClientBuilder.defaultClient();

    public LoginClient(String requestQueueUrl) {
        this.requestQueueUrl = requestQueueUrl;
    }

    /**
     * Send a login request to the server.
     */
    public String login(String body) throws TimeoutException {
        SendMessageRequest request = new SendMessageRequest()
                .withMessageBody(body)
                .withQueueUrl(requestQueueUrl);

        // This:
        //  - creates a temporary queue
        //  - attaches its URL as an attribute on the message
        //  - sends the message
        //  - receives the response from the temporary queue
        //  - deletes the temporary queue
        //  - returns the response
        //
        // If something goes wrong and the server's response never shows up, this method throws a
        // TimeoutException.
        Message response = sqsRequester.sendMessageAndGetResponse(request, 20, TimeUnit.SECONDS);
        
        return response.getBody();
    }
}

/**
 * This class processes users' login requests on the server side.
 */
public class LoginServer {

    // The SQS queue to poll for login requests.
    // Assume that on construction a thread is created to poll this queue and call
    // handleLoginRequest() below for each message.
    private final String requestQueueUrl;

    // The AmazonSQSResponder sends responses to the correct response destination.
    private final AmazonSQSResponder sqsResponder = AmazonSQSResponderClientBuilder.defaultClient();

    public LoginServer(String requestQueueUrl) {
        this.requestQueueUrl = requestQueueUrl;
    }

    /**
     * Handle a login request sent from the client above.
     */
    public void handleLoginRequest(Message message) {
        // Assume doLogin does the actual work, and returns a serialized result
        String response = doLogin(message.getBody());

        // This extracts the URL of the temporary queue from the message attribute and sends the
        // response to that queue.
        sqsResponder.sendResponseMessage(MessageContent.fromMessage(message), new MessageContent(response));  
    }
}

Cleaning up

As with any other AWS SDK client, your code should call the shutdown() method when the temporary queue client is no longer needed. The AmazonSQSRequester interface also provides a shutdown() method, which shuts down its internal temporary queue client. This ensures that the in-memory resources needed for any virtual queues are reclaimed, and that the host queue that the client automatically created is also deleted automatically.

However, in the world of distributed systems things are a little more complex. Processes can run out of memory and crash, and hosts can reboot suddenly and unexpectedly. There are even cases where you don’t have the opportunity to run custom code on shutdown.

The Temporary Queue Client addresses this issue as well. For each host queue with recent API calls, the client periodically uses the TagQueue API action to attach a fresh tag value that indicates the queue is still being used. The tagging process serves as a heartbeat to keep the queue alive. According to a configurable time period (by default, 5 minutes), a background thread uses the ListQueues API action to obtain the URLs of all queues with the configured prefix. Then, it deletes each queue that has not been tagged recently. The mechanism is similar to how the Amazon DynamoDB Lock Client expires stale lock leases.

If you use the AmazonSQSTemporaryQueuesClient directly, you can customize how long queues must be idle before they are deleted by configuring the IdleQueueRetentionPeriodSeconds queue attribute. The client supports setting this attribute on both host queues and virtual queues. For virtual queues, setting this attribute ensures that the in-memory resources do not become a memory leak in the presence of application bugs.

Any API call to a queue marks it as non-idle, including ReceiveMessage calls that don’t return any messages. The only reason to increase the retention period attribute is to give the client more time when it can’t send heartbeats—for example, due to garbage collection pauses or networking issues.

But what if you want to use this client in a fleet of a thousand EC2 instances? Won’t every client spend a lot of time checking every queue for idleness? Doesn’t that imply duplicate work that increases as the fleet is scaled up?

We thought of this too. The Temporary Queue Client creates a shared queue for all clients using the same queue prefix, and uses this queue as a work queue for the distributed task. Instead of every client calling the ListQueues API action every five minutes, a new seed message (which triggers the sweeping process) is sent to this queue every five minutes.

When one of the clients receives this message, it calls the ListQueues API action and sends each queue URL in the result as another kind of message to the same shared work queue. The work of actually checking each queue for idleness is distributed roughly evenly to the active clients, ensuring scalability. There is even a mechanism that works around the fact that the ListQueues API action currently returns no more than 1,000 queue URLs at a time.

Summary

We are excited about how the new Amazon SQS Temporary Queue Client makes more messaging patterns easier and cheaper for you. Download the code from GitHub, have a look at Temporary Queues in the Amazon SQS Developer Guide, try out the client, and let us know what you think!

Best Practices for Developing on AWS Lambda

Post Syndicated from George Mao original https://aws.amazon.com/blogs/architecture/best-practices-for-developing-on-aws-lambda/

In our previous post we discussed the various ways you can invoke AWS Lambda functions. In this post, we’ll provide some tips and best practices you can use when building your AWS Lambda functions.

One of the benefits of using Lambda is that you don't have to worry about server and infrastructure management. This means AWS will handle the heavy lifting needed to execute your Lambda functions. You should take advantage of this architecture with the tips below.

Tip #1: When to VPC-Enable a Lambda Function

Lambda functions always operate from an AWS-owned VPC. By default, your function has full ability to make network requests to any public internet address — this includes access to any of the public AWS APIs. For example, your function can interact with AWS DynamoDB APIs to PutItem or Query for records. You should only enable your functions for VPC access when you need to interact with a private resource located in a private subnet. An RDS instance is a good example.

RDS instance: When to VPC enable a Lambda function

Once your function is VPC-enabled, all network traffic from your function is subject to the routing rules of your VPC/Subnet. If your function needs to interact with a public resource, you will need a route through a NAT gateway in a public subnet.

Tip #2: Deploy Common Code to a Lambda Layer (i.e. the AWS SDK)

If you intend to reuse code in more than one function, consider creating a Layer and deploying it there. A great candidate would be a logging package that your team is required to standardize on. Another great example is the AWS SDK. AWS includes the AWS SDK in the Node.js and Python runtimes (and updates the SDK periodically). However, you should bundle your own SDK and pin your functions to a version of the SDK you have tested.
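For example, a shared logging package bundled as a zip file could be published as a layer with the AWS SDK for Python, as sketched below; the layer name, file name, and runtime list are placeholders:

import boto3

lambda_client = boto3.client("lambda")

# Publish a zip file containing shared code (for example, a logging package or
# a pinned AWS SDK) as a new layer version.
with open("logging-layer.zip", "rb") as f:  # placeholder artifact
    layer = lambda_client.publish_layer_version(
        LayerName="shared-logging",
        Description="Standardized logging package pinned to a tested version",
        Content={"ZipFile": f.read()},
        CompatibleRuntimes=["python3.7"],
    )

print("Published layer version ARN:", layer["LayerVersionArn"])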

Tip #3: Watch Your Package Size and Dependencies

Lambda functions require you to package all needed dependencies (or attach a Layer) — the bigger your deployment package, the slower your function will cold-start. Remove all unnecessary items, such as documentation and unused libraries. If you are using Java functions with the AWS SDK, only bundle the module(s) that you actually need to use — not the entire SDK.

Good:

<dependency>
    <groupId>software.amazon.awssdk</groupId>
    <artifactId>dynamodb</artifactId>
    <version>2.6.0</version>
</dependency>

Bad:

<!-- https://mvnrepository.com/artifact/software.amazon.awssdk/aws-sdk-java -->
<dependency>
    <groupId>software.amazon.awssdk</groupId>
    <artifactId>aws-sdk-java</artifactId>
    <version>2.6.0</version>
</dependency>

Tip #4: Monitor Your Concurrency (and Set Alarms)

Our first post in this series talked about how concurrency can affect your downstream systems. Since Lambda functions can scale extremely quickly, you should have controls in place to notify you when you have a spike in concurrency. A good idea is to deploy a CloudWatch alarm that notifies your team when function metrics such as ConcurrentExecutions or Invocations exceed your threshold. You should also create an AWS Budget so you can monitor costs on a daily basis. Here is a great example of how to set up automated cost controls.
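A sketch of such an alarm created with the AWS SDK for Python might look like the following; the threshold and SNS topic ARN are placeholders you would tune to your own architecture:

import boto3

cloudwatch = boto3.client("cloudwatch")

# Alarm when account-level concurrent executions exceed the threshold for one
# minute; notifications go to a placeholder SNS topic.
cloudwatch.put_metric_alarm(
    AlarmName="lambda-concurrency-spike",
    Namespace="AWS/Lambda",
    MetricName="ConcurrentExecutions",
    Statistic="Maximum",
    Period=60,
    EvaluationPeriods=1,
    Threshold=500,  # placeholder threshold
    ComparisonOperator="GreaterThanThreshold",
    TreatMissingData="notBreaching",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:lambda-alerts"],
)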

Tip #5: Over-Provision Memory (in some use cases) but Not Function Timeout

Lambda allocates compute power in proportion to the memory you allocate to your function. This means you can over-provision memory to run your functions faster and potentially reduce your costs. You should benchmark your use case to determine where the breakeven point is between running faster with more memory and running slower with less memory.

However, we recommend that you do not over-provision your function timeout settings. Always understand your code's performance and set a function timeout accordingly. Over-provisioning the function timeout often results in Lambda functions running longer than expected, and in unexpected costs.

About the Author

George MaoGeorge Mao is a Specialist Solutions Architect at Amazon Web Services, focused on the Serverless platform. George is responsible for helping customers design and operate Serverless applications using services like Lambda, API Gateway, Cognito, and DynamoDB. He is a regular speaker at AWS Summits, re:Invent, and various tech events. George is a software engineer and enjoys contributing to open source projects, delivering technical presentations at technology events, and working with customers to design their applications in the Cloud. George holds a Bachelor of Computer Science and Masters of IT from Virginia Tech.

How to migrate a digital signing workload to AWS CloudHSM

Post Syndicated from Tracy Pierce original https://aws.amazon.com/blogs/security/how-to-migrate-a-digital-signing-workload-to-aws-cloudhsm/

Is your on-premises Hardware Security Module (HSM) at end-of-life? Does continued maintenance of your on-premises hardware take a lot of time and cost a lot of money? Do you want or need all of your workloads to be performed on AWS? By migrating these workloads to AWS CloudHSM, you receive automated backups, low-cost HSMs, managed maintenance, automatic recovery in the event of a hardware failure, integrated fault tolerance, and high availability. One such workload you might consider migrating is the secret key material used for digital signing operations.

Enterprise certificate authority (CA) or public key infrastructure (PKI) applications use the private portion of an asymmetric key pair generated and stored in a hardware security module (HSM) to perform signing operations. Examples of such operations include the creation of digital certificates for web-servers or IoT devices, file signatures, or when negotiating a TLS session. Migrating this type of workload to AWS may save you time and money. If your HSM is at end of life and you need an alternative, you can migrate the digital signing workload to AWS CloudHSM in just a few steps.

This post will focus on a workload that allows you to create and use a digital certificate to digitally sign an arbitrary file. I’ll show you how to create a new asymmetric key pair and generate the corresponding certificate signing request (CSR) on AWS CloudHSM. This CSR, once signed by the appropriate issuing CA, allows your new key pair and the associated certificate to be trusted in the same way as the key pairs in your original HSM. You could then move traffic related to signing operations or issuing certificates to your AWS CloudHSM cluster.

Background

Before I walk you through the steps of migrating a certificate signing workload into CloudHSM, I'll provide a little background information so you'll know how CloudHSM, PKI, and CAs work together. Every certificate is associated with a key pair made up of a private (secret) key and a public key. The private key associated with a certificate needs to be kept confidential, so it typically resides on a hardware security module (HSM). The public portion of the key pair is not confidential, is included in the certificate, and can be shared with anyone who wants to verify a digital signature made with the corresponding private key.

In a PKI, a CA is the trusted entity that issues digital certificates on behalf of end-entities. At the top of the trust hierarchy is a root CA, which is implicitly trusted when it is established because it acts as the root of trust for intermediate CAs and end-entity certificates that may be issued underneath it. Intermediate CAs are trusted because their certificates are signed by the root CA. Intermediate CAs in turn sign end-entity certificates, which are used to authenticate identities of various actors across the data transfer process. A common use case for end-entity certificates is for web servers so that connecting clients can verify the server's identity. Generally, end-entity certificates are valid for 1-3 years, intermediate CA certificates are valid for 5-10 years, and root CAs are valid for 30 years or more.

Beyond solving for the non-repudiation of objects signed by end-entity certificates to ensure the owner of the private key performed the signing operation, there is still the problem of trusting that the owner of the private key is the identity they claim to be. When evaluating trust in this way, there are generally two options: relying on public CAs or private CAs.

Public CAs widely distribute the public keys of their root certificates into popular client trust stores (for example, browsers and operating systems). This allows users to verify that the identity of the end-entity has been attested to by a publicly trusted CA. This helps when the signer and the verifier of the digital asset don't know each other and haven't shared cryptographic material with each other in advance to perform future validations. Private CAs are those for which there are no widely distributed copies of their associated public keys. The verifier has to retrieve the public key from the private CA and has to explicitly trust the cert without any third-party attestation of the signer's identity. This is appropriate for cases when signers and verifiers are in the same company or know each other. Examples of when to use a private CA are securing virtual private networks, data or file replication between internal servers, remote backups, file-sharing, email, or other personal accounts.

Regardless of the certificate trust model you need, AWS CloudHSM can be used to create the initial key pair and CSR for both public and private CA requests. Note that AWS offers some alternatives for certificate management that may simplify your workloads without having to use AWS CloudHSM directly. AWS Certificate Manager (ACM) automatically creates key pairs and issues public or private certificates to identify resources within your organization. For use cases that need capabilities not yet supported by ACM, or in unusual situations in which a single-tenant HSM under your control is required for compliance reasons, you can use AWS CloudHSM directly for key generation and signing operations.

Organizations currently using an on-premises HSM for the creation of asymmetric keys used in digital certificates often use a vendor-proprietary mechanism to replicate key material across multiple HSMs for resiliency. However, this method prevents the key material from ever being transferred to an HSM offered by a different vendor. Consider it "vendor lock-in" by design. So the private keys corresponding to the certificates you use for signing and authentication are locked inside that HSM. But if they are locked inside, how do you move to AWS CloudHSM? The answer is that you don't have to rely on these inaccessible keys: you can create a new key pair and use it within AWS CloudHSM to begin issuing end-entity certificates.

Solution overview

I will go over creating a new private key in AWS CloudHSM using the Windows client and using Microsoft certreq to generate a corresponding CSR. You provide this CSR to your private or public CA to receive a signed certificate in return. This certificate and its public key then need to be propagated to wherever your signatures are verified. At the end of this post, I will show you how to verify your digital signatures using Microsoft SignTool. SignTool is provided by Microsoft to allow Windows users to digitally sign files, verify file signatures, and timestamp files.
 

Figure 1: Procedural diagram

As shown in the diagram above, the steps followed in this post are:

  1. Create a new RSA private key using KSP/CNG through the AWS CloudHSM Windows client.
  2. Using Microsoft certreq, create your CSR.
  3. Provide the CSR to your CA for signing.
  4. Use Microsoft SignTool to sign files in your environment.

Note: You may have to register this new certificate with any partners that do not automatically verify the entire certificate chain. These could be third-party applications, vendors, or outside entities that utilize your certificates to determine trust.

Prerequisites

In this walkthrough, I assume that you already have an AWS CloudHSM cluster set up and initialized with at least one HSM device, and an Amazon Elastic Compute Cloud (EC2) Windows-based instance with the AWS CloudHSM client, PowerShell, and Windows SDK with Microsoft SignTool installed. You must have a crypto user (CU) on the HSM to perform the steps in this post.

Deploying the solution

Step 1: Create a new private key using KSP/CNG using the AWS CloudHSM Windows client

On your Windows server where the AWS CloudHSM Windows client is installed, use a text editor to create a certificate request file named IISCertRequest.inf. For the purpose of this post, I have filled out an example file below.


[Version]
Signature = "$Windows NT$"
[NewRequest]
Subject = "CN=example.com,C=US,ST=Washington,L=Seattle,O=ExampleOrg,OU=WebServer"
HashAlgorithm = SHA256
KeyAlgorithm = RSA
KeyLength = 2048
ProviderName = "Cavium Key Storage Provider"
KeyUsage = "CERT_DIGITAL_SIGNATURE_KEY_USAGE"
MachineKeySet = True    

Step 2: Using Microsoft certreq, create your CSR

On the same server, open PowerShell and, at the PowerShell prompt, create a CSR from the IISCertRequest.inf file by using the Windows certreq command. Here's an example of the command; replace the file names with your own if they differ.


PS C:\>certreq -new IISCertRequest.inf IISCertRequest.csr
	SDK Version: 2.03
CertReq: Request Created

If successful, you'll see the "Request Created" message above, as well as the new file IISCertRequest.csr on your server. This certificate request will be provided to your choice of public CA for certificate issuance. This needs to be completed manually via your public CA's suggested method of certificate request.

Step 3: Provide the CSR to your CA for signing

The CA that had been signing your existing end-entity certificates with keys generated by your original HSM is the one you use to sign the new certificates with keys generated by AWS CloudHSM, as well. There are many CAs to choose from, such as Digicert, Trustwave, GoDaddy, and so on. You will want to follow their steps for submitting your CSR to receive your certificate in return.

Step 4: Use Microsoft SignTool to sign files in your environment

When you receive your signed certificate back from your chosen CA, save a copy locally on your Windows server. Then, move the certificate file to the Personal Certificate Store in Windows so it can be used by other applications, such as Microsoft SignTool. Here's an example of the command. Be sure to replace the value in angle brackets with your actual certificate name.
PS C:\>certreq -accept <signedCertificate.cer>

Now, the certificate is ready for use, and I’ll show you how to use it to sign a file. First, you have to get the thumbprint of your certificate. To do this, open PowerShell as an Administrator (right-click the app and choose Run as Administrator). Type this command:
PS C:\>Get-ChildItem -path cert:\LocalMachine\My

If successful, you should see an output similar to this. Copy the thumbprint that is returned. You’ll need it when you perform the actual signing operation on a file.


Thumbprint				                Subject
---------------						-----------
49DF7HDJT84723FDKCURLSXYRF9830568CXHSUB2		CN=WINDOWS-CA
VJFU57E6DI9DKMCHAKLDFJA8E73739Q04730QU7A		CN=www.example.com, OU=Certif….

To open the SignTool application, navigate to the app’s directory within PowerShell. By default, this is typically:
C:\Program Files (x86)\Windows Kits\<SDK Version>\bin\<version number>\<CPU architecture>

For example, if you had downloaded the Microsoft Windows SDK 10 version, the application would be stored in:

C:\Program Files (x86)\Windows Kits\10\bin\10.0.17763.0\x64

When you've located the directory, sign your file by running the command below. Remember to replace the values in angle brackets with your own values. The test.exe file in this example can be any valid executable file in your directory.
PS C:\>.\signtool.exe sign /v /fd sha256 /sha1 <thumbprint> /sm /as C:\Users\Administrator\Desktop\<test.exe>

You should see a message like this:


Done Adding Additional Store
Successfully signed C:\User\Administrator\Desktop\<test.exe>

Number of files successfully Signed: 1
Number of warnings: 0
Number of errors: 0

One last optional step is to verify the signature on the file using the command below. Again, replace the values in angle brackets with your own.
PS C:\>.\signtool.exe verify /v /pa C:\Users\Administrators\Desktop\<test.exe>

You’ve now successfully migrated your file signing workload to AWS CloudHSM. If your signing certificate was not issued by a publicly trusted CA but instead by a private CA, make sure to deploy a copy of the root CA certificate and any intermediate certs from the private CA on any systems you want to verify the integrity of your signed file.

Conclusion

In this post, I walked you through creating a new RSA asymmetric key pair and using it to create a CSR. After supplying the CSR to your chosen CA and receiving a signed certificate in return, I showed you how to use Microsoft SignTool with AWS CloudHSM to sign files in your environment. You can now use AWS CloudHSM to sign code, documents, or other certificates in the same way you did with your original HSMs.

If you have feedback about this blog post, submit comments in the Comments section below. If you have questions about this blog post, start a new thread on the AWS CloudHSM forum.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Tracy Pierce

Tracy Pierce is a Senior Consultant, Security Specialty, for Remote Consulting Services. She enjoys the peculiar culture of Amazon and uses that to ensure every day is exciting for her fellow engineers and customers alike. Customer Obsession is her highest priority and she shows this by improving processes, documentation, and building tutorials. She has her AS in Computer Security & Forensics from SCTD, SSCP certification, AWS Developer Associate certification, and AWS Security Specialist certification. Outside of work, she enjoys time with friends, her Great Dane, and three cats. She keeps work interesting by drawing cartoon characters on the walls at request.

Understanding the Different Ways to Invoke Lambda Functions

Post Syndicated from George Mao original https://aws.amazon.com/blogs/architecture/understanding-the-different-ways-to-invoke-lambda-functions/

In our first post, we talked about general design patterns to enable massive scale with serverless applications. In this post, we’ll review the different ways you can invoke Lambda functions and what you should be aware of with each invocation model.

Synchronous Invokes

Synchronous invocations are the most straightforward way to invoke your Lambda functions. In this model, your functions execute immediately when you perform the Lambda Invoke API call. This can be accomplished through a variety of options, including using the CLI or any of the supported SDKs.

Here is an example of a synchronous invoke using the CLI:

aws lambda invoke --function-name MyLambdaFunction --invocation-type RequestResponse --payload "[JSON string here]" outfile.txt

The Invocation-type flag specifies a value of "RequestResponse". This instructs AWS to execute your Lambda function and wait for the function to complete. When you perform a synchronous invoke, you are responsible for checking the response, determining whether there was an error, and deciding whether to retry the invoke.
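The same synchronous invoke through the AWS SDK for Python might look like the sketch below, including a check of the FunctionError field to decide whether to retry; the function name and payload are placeholders:

import json
import boto3

lambda_client = boto3.client("lambda")

response = lambda_client.invoke(
    FunctionName="MyLambdaFunction",   # placeholder function name
    InvocationType="RequestResponse",  # synchronous: wait for the result
    Payload=json.dumps({"orderId": "abcd123"}),  # placeholder payload
)

# With synchronous invokes, the caller is responsible for error handling.
if "FunctionError" in response:
    error_payload = json.loads(response["Payload"].read())
    raise RuntimeError(f"Function returned an error: {error_payload}")

result = json.loads(response["Payload"].read())
print(result)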

Many AWS services can emit events that trigger Lambda functions. Here is a list of services that invoke Lambda functions synchronously:

Asynchronous Invokes

Here is an example of an asynchronous invoke using the CLI:

aws lambda invoke --function-name MyLambdaFunction --invocation-type Event --payload "[JSON string here]" outfile.txt

Notice that the Invocation-type flag specifies "Event." If your function returns an error, AWS will automatically retry the invoke twice, for a total of three invocations.

Here is a list of services that invoke Lambda functions asynchronously:

Asynchronous invokes place your invoke request in the Lambda service queue, and we process the requests as they arrive. You should use AWS X-Ray to review how long your request spent in the service queue by checking the "dwell time" segment.

Poll based Invokes

This invocation model is designed to allow you to integrate with AWS stream- and queue-based services with no code or server management. Lambda will poll the following services on your behalf, retrieve records, and invoke your functions. The following services are supported:

AWS will manage the poller on your behalf and perform synchronous invokes of your function with this type of integration. The retry behavior for this model is based on data expiration in the data source. For example, Kinesis Data Streams stores records for 24 hours by default (up to 168 hours). The specific details of each integration are linked above.
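For example, wiring a function to a Kinesis data stream is a single event source mapping, sketched here with the AWS SDK for Python; the stream ARN and function name are placeholders:

import boto3

lambda_client = boto3.client("lambda")

# Lambda manages the pollers: it reads batches from the stream and invokes the
# function synchronously with each batch of records.
lambda_client.create_event_source_mapping(
    EventSourceArn="arn:aws:kinesis:us-east-1:123456789012:stream/orders",  # placeholder
    FunctionName="ProcessOrders",                                           # placeholder
    StartingPosition="LATEST",
    BatchSize=100,
)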

Conclusion

In our next post, we’ll provide some tips and best practices for developing Lambda functions. Happy coding!

 

About the Author

George MaoGeorge Mao is a Specialist Solutions Architect at Amazon Web Services, focused on the Serverless platform. George is responsible for helping customers design and operate Serverless applications using services like Lambda, API Gateway, Cognito, and DynamoDB. He is a regular speaker at AWS Summits, re:Invent, and various tech events. George is a software engineer and enjoys contributing to open source projects, delivering technical presentations at technology events, and working with customers to design their applications in the Cloud. George holds a Bachelor of Computer Science and Masters of IT from Virginia Tech.

Configuring user creation workflows with AWS Step Functions and AWS Managed Microsoft AD logs

Post Syndicated from Rachel Richardson original https://aws.amazon.com/blogs/compute/configuring-user-creation-workflows-with-aws-step-functions-and-aws-managed-microsoft-ad-logs/

This post is contributed by Taka Matsumoto, Cloud Support Engineer

AWS Directory Service lets you run Microsoft Active Directory as a managed service. Directory Service for Microsoft Active Directory, also referred to as AWS Managed Microsoft AD, is powered by Microsoft Windows Server 2012 R2. It manages users and makes it easy to integrate with compatible AWS services and other applications. Using the log forwarding feature, you can stay aware of all security events in Amazon CloudWatch Logs. This helps monitor events like the addition of a new user.

When new users are created in your AWS Managed Microsoft AD, you might go through the initial setup workflow manually. However, AWS Step Functions can coordinate new user creation activities into serverless workflows that automate the process. With Step Functions, AWS Lambda can be also used to run code for the automation workflows without provisioning or managing servers.

In this post, I show how to create and trigger a new user creation workflow in Step Functions. This workflow creates a WorkSpace in Amazon WorkSpaces and a user in Amazon Connect using AWS Managed Microsoft AD, Step Functions, Lambda, and Amazon CloudWatch Logs.

Overview

The following diagram shows the solution graphically.

Configuring user creation workflows with AWS Step Functions and AWS Managed Microsoft AD logs

Walkthrough

Using the following procedures, create an automated user creation workflow with AWS Managed Microsoft AD. The solution requires the creation of new resources in CloudWatch, Lambda, and Step Functions, and a new user in Amazon WorkSpaces and Amazon Connect. Here’s the list of steps:

  1. Enable log forwarding.
  2. Create the Lambda functions.
  3. Set up log streaming.
  4. Create a state machine in Step Functions.
  5. Test the solution.

Requirements

To follow along, you need the following resources:

  • AWS Managed Microsoft AD
    • Must be registered with Amazon WorkSpaces
    • Must be registered with Amazon Connect

In this example, you use an Amazon Connect instance with SAML 2.0-based authentication as identity management. For more information, see Configure SAML for Identity Management in Amazon Connect.

Enable log forwarding

Enable log forwarding for your AWS Managed Microsoft AD. Use /aws/directoryservice/<directory id> for the CloudWatch log group name. You will use this log group name when you set up log streaming in Step 3.

Create Lambda functions

Create two Lambda functions. The first starts a Step Functions execution with CloudWatch Logs. The second performs a user registration process with Amazon WorkSpaces and Amazon Connect within a Step Functions execution.

Create the first function with the following settings:

  • Name: DS-Log-Stream-Function
  • Runtime: Python 3.7
  • Memory: 128 MB
  • Timeout: 3 seconds
  • Environment variables:
    • Key: stateMachineArn
    • Value: arn:aws:states:<Region>:<AccountId>:stateMachine:NewUserWorkFlow
  • IAM role with the following permissions:
    • AWSLambdaBasicExecutionRole
    • The following permissions policy
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "states:StartExecution",
            "Resource": "*"
        }
    ]
}
import base64
import boto3
import gzip
import json
import re
import os
def lambda_handler(event, context):
    logEvents = DecodeCWPayload(event)
    print('Event payload:', logEvents)
    returnResultDict = []
    
    # Because there can be more than one message pushed in a single payload, use a for loop to start a workflow for every user
    for logevent in logEvents:
        logMessage = logevent['message']
        upnMessage =  re.search("(<Data Name='UserPrincipalName'>)(.*?)(<\/Data>)",logMessage)
        if upnMessage != None:
            upn = upnMessage.group(2).lower()
            userNameAndDomain = upn.split('@')
            userName = userNameAndDomain[0].lower()
            domainName = userNameAndDomain[1].lower()
            sfnInputDict = {'Username': userName, 'UPN': upn, 'DomainName': domainName}
            sfnResponse = StartSFNExecution(json.dumps(sfnInputDict))
            print('Username:',upn)
            print('Execution ARN:', sfnResponse['executionArn'])
            print('Execution start time:', sfnResponse['startDate'])
            returnResultDict.append({'Username': upn, 'ExectionArn': sfnResponse['executionArn'], 'Time': str(sfnResponse['startDate'])})

    returnObject = {'Result':returnResultDict}
    return {
        'statusCode': 200,
        'body': json.dumps(returnObject)
    }

# Helper function decode the payload
def DecodeCWPayload(payload):
    # CloudWatch Log Stream event 
    cloudWatchLog = payload['awslogs']['data']
    # Base 64 decode the log 
    base64DecodedValue = base64.b64decode(cloudWatchLog)
    # Uncompress the gzipped decoded value
    gunzipValue = gzip.decompress(base64DecodedValue)
    dictPayload = json.loads(gunzipValue)
    decodedLogEvents = dictPayload['logEvents']
    return decodedLogEvents

# Step Functions state machine execution function
def StartSFNExecution(sfnInput):
    sfnClient = boto3.client('stepfunctions')
    try:
        response = sfnClient.start_execution(
            stateMachineArn=os.environ['stateMachineArn'],
            input=sfnInput
        )
        return response
    except Exception as e:
        return e

For the other function used to perform a user creation task, use the following settings:

  • Name: SFN-New-User-Flow
  • Runtime: Python 3.7
  • Memory: 128 MB
  • Timeout: 3 seconds
  • Environment variables:
    • Key: nameDelimiter
    • Value: . [period]

This delimiter is used to split the username into a first name and last name, as Amazon Connect instances with SAML-based authentication require both a first name and last name for users. For more information, see CreateUser API and UserIdentity Info.

  • Key: bundleId
  • Value: <WorkSpaces bundle ID>

Run the following AWS CLI command to return Amazon-owned WorkSpaces bundles. Use one of the bundle IDs for the key-value pair.

aws workspaces describe-workspace-bundles --owner AMAZON

  • Key: directoryId
  • Value: <WorkSpaces directory ID>

Run the following AWS CLI command to return Amazon WorkSpaces directories. Use your directory ID for the key-value pair.

aws workspaces describe-workspace-directories

  • Key: instanceId
  • Value: <Amazon Connect instance ID>

Find the Amazon Connect instance ID in the Amazon Connect console.

  • Key: routingProfile
  • Value: <Amazon Connect routing profile>

Run the following AWS CLI command to list routing profiles with their IDs. For this walkthrough, use the ID for the basic routing profile.

aws connect list-routing-profiles --instance-id <instance id>

  • Key: securityProfile
  • Value: <Amazon Connect security profile>

Run the following AWS CLI command to list security profiles with their IDs. For this walkthrough, use the ID for an agent security profile.

aws connect list-security-profiles --instance-id <instance id>

  • IAM role permissions:
    • AWSLambdaBasicExecutionRole

The following permissions policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "connect:CreateUser",
                "workspaces:CreateWorkspaces"
            ],
            "Resource": "*"
        }
    ]
}
import json
import os
import boto3

def lambda_handler(event, context):
    userName = event['input']['User']
    nameDelimiter = os.environ['nameDelimiter']
    if nameDelimiter in userName:
        firstName = userName.split(nameDelimiter)[0]
        lastName = userName.split(nameDelimiter)[1]
    else:
        firstName = userName
        lastName = userName
    domainName = event['input']['Domain']
    upn = event['input']['UPN']
    serviceName = event['input']['Service']
    if serviceName == 'WorkSpaces':
        # Setting WorkSpaces variables
        workspacesDirectoryId = os.environ['directoryId']
        workspacesUsername = upn
        workspacesBundleId = os.environ['bundleId']
        createNewWorkSpace = create_new_workspace(
            directoryId=workspacesDirectoryId,
            username=workspacesUsername,
            bundleId=workspacesBundleId
        )
        return createNewWorkSpace
    elif serviceName == 'Connect':
        createConnectUser = create_connect_user(
            connectUsername=upn,
            connectFirstName=firstName,
            connectLastName=lastName,
            securityProfile=os.environ['securityProfile'], 
            routingProfile=os.environ['routingProfile'], 
            instanceId=os.environ['instanceId']
        )
        return createConnectUser
    else:
        print(serviceName, 'is not recognized...')
        print('Available service names are WorkSpaces and Connect')
        unknownServiceException = {
            'statusCode': 500,
            'body': json.dumps(f'Service name, {serviceName}, is not recognized')}
        raise Exception(unknownServiceException)

class FailedWorkSpaceCreationException(Exception):
    pass

class WorkSpaceResourceExists(Exception):
    pass

def create_new_workspace(directoryId, username, bundleId):
    workspacesClient = boto3.client('workspaces')
    response = workspacesClient.create_workspaces(
        Workspaces=[{
                'DirectoryId': directoryId,
                'UserName': username,
                'BundleId': bundleId,
                'WorkspaceProperties': {
                    'RunningMode': 'AUTO_STOP',
                    'RunningModeAutoStopTimeoutInMinutes': 60,
                    'RootVolumeSizeGib': 80,
                    'UserVolumeSizeGib': 100,
                    'ComputeTypeName': 'VALUE'
                    }}]
                    )
    print('create_workspaces response:',response)
    for pendingRequest in response['PendingRequests']:
        if pendingRequest['UserName'] == username:
            workspacesResultObject = {'UserName':username, 'ServiceName':'WorkSpaces', 'Status': 'Success'}
            return {
                'statusCode': 200,
                'body': json.dumps(workspacesResultObject)
                }
    for failedRequest in response['FailedRequests']:
        if failedRequest['WorkspaceRequest']['UserName'] == username:
            errorCode = failedRequest['ErrorCode']
            errorMessage = failedRequest['ErrorMessage']
            errorResponse = {'Error Code': errorCode, 'Error Message': errorMessage}
            if errorCode == "ResourceExists.WorkSpace": 
                raise WorkSpaceResourceExists(str(errorResponse))
            else:
                raise FailedWorkSpaceCreationException(str(errorResponse))
                
def create_connect_user(connectUsername, connectFirstName, connectLastName, securityProfile, routingProfile, instanceId):
    connectClient = boto3.client('connect')
    response = connectClient.create_user(
                    Username=connectUsername,
                    IdentityInfo={
                        'FirstName': connectFirstName,
                        'LastName': connectLastName
                        },
                    PhoneConfig={
                        'PhoneType': 'SOFT_PHONE',
                        'AutoAccept': False,
                        },
                    SecurityProfileIds=[
                        securityProfile,
                        ],
                    RoutingProfileId=routingProfile,
                    InstanceId = instanceId
                    )
    connectSuccessResultObject = {'UserName':connectUsername,'ServiceName':'Connect','FirstName': connectFirstName, 'LastName': connectLastName,'Status': 'Success'}
    return {
        'statusCode': 200,
        'body': json.dumps(connectSuccessResultObject)
        }

Set up log streaming

Create a new CloudWatch Logs subscription filter that sends log data to the Lambda function DS-Log-Stream-Function created in Step 2.

  1. In the CloudWatch console, choose Logs, Log Groups, and select the log group, /aws/directoryservice/<directory id>, for the directory set up in Step 1.
  2. Choose Actions, Stream to AWS Lambda.
  3. Choose Destination, and select the Lambda function DS-Log-Stream-Function.
  4. For Log format, choose Other as the log format and enter “<EventID>4720</EventID>” (include the double quotes).
  5. Choose Start streaming.

If the log group already has a subscription filter, run the following AWS CLI command instead to create the subscription filter for the Lambda function DS-Log-Stream-Function.

aws logs put-subscription-filter \
  --log-group-name /aws/directoryservice/<directoryid> \
  --filter-name NewUser \
  --filter-pattern "<EventID>4720</EventID>" \
  --destination-arn arn:aws:lambda:<Region>:<ACCOUNT_NUMBER>:function:DS-Log-Stream-Function

For more information, see Using CloudWatch Logs Subscription Filters.
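
Note that when you create the subscription filter from the CLI, CloudWatch Logs also needs permission to invoke the Lambda function. The following is a minimal sketch using boto3; the statement ID is an assumption, and the placeholder values mirror the CLI example above, so adapt them to your account:

import boto3

lambda_client = boto3.client('lambda')

# Allow CloudWatch Logs (for this log group only) to invoke DS-Log-Stream-Function
lambda_client.add_permission(
    FunctionName='DS-Log-Stream-Function',
    StatementId='cloudwatch-logs-invoke',
    Action='lambda:InvokeFunction',
    Principal='logs.amazonaws.com',
    SourceArn='arn:aws:logs:<Region>:<ACCOUNT_NUMBER>:log-group:/aws/directoryservice/<directoryid>:*',
    SourceAccount='<ACCOUNT_NUMBER>'
)
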

Create a state machine in Step Functions

The next step is to create a state machine in Step Functions. This state machine runs the Lambda function, SFN-New-User-Flow, to create a user in Amazon WorkSpaces and Amazon Connect.

Define the state machine, using the following settings:

  • Name: NewUserWorkFlow
  • State machine definition: Copy the following state machine definition:
{
    "Comment": "An example state machine for a new user creation workflow",
    "StartAt": "Parallel",
    "States": {
        "Parallel": {
            "Type": "Parallel",
            "End": true,
            "Branches": [
                {
                    "StartAt": "CreateWorkSpace",
                    "States": {
                        "CreateWorkSpace": {
                            "Type": "Task",
                            "Parameters": {
                                "input": {
                                    "User.$": "$.Username",
                                    "UPN.$": "$.UPN",
                                    "Domain.$": "$.DomainName",
                                    "Service": "WorkSpaces"
                                }
                            },
                            "Resource": "arn:aws:lambda:{region}:{account id}:function:SFN-New-User-Flow",
                            "Retry": [
                                {
                                    "ErrorEquals": [
                                        "WorkSpaceResourceExists"
                                    ],
                                    "IntervalSeconds": 1,
                                    "MaxAttempts": 0,
                                    "BackoffRate": 1
                                },
                                {
                                    "ErrorEquals": [
                                        "States.ALL"
                                    ],
                                    "IntervalSeconds": 10,
                                    "MaxAttempts": 2,
                                    "BackoffRate": 2
                                }
                            ],
                            "Catch": [
                                {
                                    "ErrorEquals": [
                                        "WorkSpaceResourceExists"
                                    ],
                                    "ResultPath": "$.workspacesResult",
                                    "Next": "WorkSpacesPassState"
                                },
                                {
                                    "ErrorEquals": [
                                        "States.ALL"
                                    ],
                                    "ResultPath": "$.workspacesResult",
                                    "Next": "WorkSpacesPassState"
                                }
                            ],
                            "End": true
                        },
                        "WorkSpacesPassState": {
                            "Type": "Pass",
                            "Parameters": {
                                "Result.$": "$.workspacesResult"
                            },
                            "End": true
                        }
                    }
                },
                {
                    "StartAt": "CreateConnectUser",
                    "States": {
                        "CreateConnectUser": {
                            "Type": "Task",
                            "Parameters": {
                                "input": {
                                    "User.$": "$.Username",
                                    "UPN.$": "$.UPN",
                                    "Domain.$": "$.DomainName",
                                    "Service": "Connect"
                                }
                            },
                            "Resource": "arn:aws:lambda:{region}:{account id}:function:SFN-New-User-Flow",
                            "Retry": [
                                {
                                    "ErrorEquals": [
                                        "DuplicateResourceException"
                                    ],
                                    "IntervalSeconds": 1,
                                    "MaxAttempts": 0,
                                    "BackoffRate": 1
                                },
                                {
                                    "ErrorEquals": [
                                        "States.ALL"
                                    ],
                                    "IntervalSeconds": 10,
                                    "MaxAttempts": 2,
                                    "BackoffRate": 2
                                }
                            ],
                            "Catch": [
                                {
                                    "ErrorEquals": [
                                        "DuplicateResourceException"
                                    ],
                                    "ResultPath": "$.connectResult",
                                    "Next": "ConnectPassState"
                                },
                                {
                                    "ErrorEquals": [
                                        "States.ALL"
                                    ],
                                    "ResultPath": "$.connectResult",
                                    "Next": "ConnectPassState"
                                }
                            ],
                            "End": true,
                            "ResultPath": "$.connectResult"
                        },
                        "ConnectPassState": {
                            "Type": "Pass",
                            "Parameters": {
                                "Result.$": "$.connectResult"
                            },
                            "End": true
                        }
                    }
                }
            ]
        }
    }
}

After entering the name and state machine definition, choose Next.

Configure the settings by choosing Create an IAM role for me. This creates an IAM role for the state machine to run the Lambda function SFN-New-User-Flow.

Here’s the list of states in the NewUserWorkFlow state machine definition:

  • Start—When the state machine starts, it creates a parallel state to start both the CreateWorkSpace and CreateConnectUser states.
  • CreateWorkSpace—This task state runs the SFN-New-User-Flow Lambda function to create a new WorkSpace for the user. If this is successful, it goes to the End state.
  • WorkSpacesPassState—This pass state returns the result from the CreateWorkSpace state.
  • CreateConnectUser—This task state runs the SFN-New-User-Flow Lambda function to create a user in Amazon Connect. If this is successful, it goes to the End state.
  • ConnectPassState—This pass state returns the result from the CreateConnectUser state.
  • End—Both branches complete and the state machine execution ends.

The following diagram shows how these states relate to each other.

Step Functions State Machine

Test the solution

It’s time to test the solution. Create a user in AWS Managed Microsoft AD with the attributes that the workflow expects, including the user logon name (UPN).

This starts a new state machine execution in Step Functions. Here’s the flow:

  1. When there is a user creation event (Event ID: 4720) in the AWS Managed Microsoft AD security log, CloudWatch invokes the Lambda function, DS-Log-Stream-Function, to start a new state machine execution in Step Functions.
  2. To create a new WorkSpace and create a user in the Amazon Connect instance, the state machine execution runs tasks to invoke the other Lambda function, SFN-New-User-Flow.
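
If you want to exercise the workflow without waiting for a directory event, you can also start an execution directly and pass the input fields the state machine expects (Username, UPN, and DomainName). This is a minimal sketch; the state machine ARN and sample values are assumptions:

import json
import boto3

sfn = boto3.client('stepfunctions')

# Start the NewUserWorkFlow state machine with the input fields used by the Task states
response = sfn.start_execution(
    stateMachineArn='arn:aws:states:<Region>:<ACCOUNT_NUMBER>:stateMachine:NewUserWorkFlow',
    input=json.dumps({
        'Username': 'jdoe',                  # assumed example user name
        'UPN': 'jdoe@corp.example.com',      # user principal name
        'DomainName': 'corp.example.com'
    })
)
print(response['executionArn'])
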

Conclusion

This solution automates the initial user registration workflow. Step Functions provides the flexibility to customize the workflow to meet your needs. This walkthrough included Amazon WorkSpaces and Amazon Connect; both services are used to register the new user. For organizations that create a number of new users on a regular basis, this new user automation workflow can save time when configuring resources for a new user.

The event source of the automation workflow can be any event that triggers the new user workflow, so the event source isn't limited to CloudWatch Logs. Also, the integrated service used for new user registration can be any AWS service that offers an API and works with AWS Managed Microsoft AD. Other programmatically accessible services, within or outside AWS, can also fill that role.

In this post, I showed you how serverless workflows can streamline and coordinate user creation activities. Step Functions provides this functionality, with the help of Lambda, Amazon WorkSpaces, AWS Managed Microsoft AD, and Amazon Connect. Together, these services offer increased power and functionality when managing users, monitoring security, and integrating with compatible AWS services.

How to set up an outbound VPC proxy with domain whitelisting and content filtering

Post Syndicated from Vesselin Tzvetkov original https://aws.amazon.com/blogs/security/how-to-set-up-an-outbound-vpc-proxy-with-domain-whitelisting-and-content-filtering/

Controlling outbound communication from your Amazon Virtual Private Cloud (Amazon VPC) to the internet is an important part of your overall preventive security controls. By limiting outbound traffic to certain trusted domains (called “whitelisting”) you help prevent instances from downloading malware, communicating with bot networks, or attacking internet hosts. It’s not practical to prevent all outbound web traffic, though. Often, you want to allow access to certain well-known domains (for example, to communicate with partners, to download software updates, or to communicate with AWS API endpoints). In this post, I’ll show you how to limit outbound web connections from your VPC to the internet, using a web proxy with custom domain whitelists or DNS content filtering services. The solution is scalable, highly available, and deploys in a fully automated way.

Solution benefits and deliverables

This solution is based on the open source HTTP proxy Squid. The proxy can be used for all workloads running in the VPC, like Amazon Elastic Compute Cloud (EC2) and AWS Fargate. The solution provides you with the following benefits:

  • An outbound proxy that permits connections to whitelisted domains that you define, while presenting customizable error messages when connections are attempted to unapproved domains.
  • Optional domain content filtering based on DNS, delivered by external services like OpenDNS, Quad9, CleanBrowsing, Yandex.DNS or others. For this option, you do need to be a customer of these external services.
  • Transparent encryption handling, due to the extraction of the domain information from the Server Name Indication (SNI) extension in TLS. Encryption in transit is preserved and end-to-end encryption is maintained.
  • An auto-scaling group with Elastic Load Balancing (ELB) Network Load Balancers that spread over several of your existing subnets (and Availability Zones) and scale based on CPU load.
  • One Elastic IP address per proxy instance for internet communication. Sometimes the websites that you're communicating with want to know your IP address so they can accept traffic from you. Assigning Elastic IP addresses to the proxies lets you know which IP addresses your web connections will come from.
  • Proxy access logs delivered to CloudWatch Logs.
  • Proxy metrics, available in CloudWatch Metrics.
  • Automated solution deployment via AWS CloudFormation.

Out of scope

  • This solution does not serve applications that aren’t proxy capable. Deep packet inspection is also out of scope.
  • TLS encryption is kept end-to-end, and only the SNI extension is examined. For unencrypted traffic (HTTP), only the host header is analyzed.
  • DNS content filtering must be delivered by an external provider; this solution only integrates with it.

Services used, cost, and performance

The solution uses the following services: Amazon EC2 with EC2 Auto Scaling (for the Squid proxy fleet), Elastic Load Balancing (Network Load Balancer), AWS Secrets Manager, Amazon CloudWatch (logs and metrics), and AWS CloudFormation.

In total, the solution costs a few dollars per day, depending on the Region and the bandwidth usage. If you use a DNS filtering service, you may also be charged by the service provider.

Note: An existing VPC and internet gateway are prerequisites to this solution, and aren’t included in the pricing calculations.

Solution architecture

 

Figure 1: Solution overview

As shown in Figure 1:

  1. The solution is deployed automatically via an AWS CloudFormation template.
  2. CloudWatch Logs stores the Squid access log so that you can search and analyze it.
  3. The list of allowed (whitelisted) domains is stored in AWS Secrets Manager. The Amazon EC2 instance retrieves the domain list every 5 minutes via cronjob and updates the proxy configuration if the list has changed. The values in Secrets Manager are provisioned by CloudFormation and can be read only by the proxy EC2 instances.
  4. The client running on the EC2 instance must have proxy settings pointing toward the Network Load Balancer. The load balancer will forward the request to the fleet of proxies in the target group.
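
Step 3 above describes how each proxy instance refreshes its whitelist from Secrets Manager on a schedule. The exact implementation ships inside the CloudFormation template, but conceptually the cron job behaves like the following sketch; the secret name, file path, and reload command here are assumptions, not the solution's actual code:

import subprocess
import boto3

# Pull the whitelisted domains from Secrets Manager (secret name is an assumed placeholder)
secret = boto3.client('secretsmanager').get_secret_value(SecretId='proxy/whitelisted-domains')
domains = secret['SecretString']

# Compare against the file referenced by the Squid ACL configuration (assumed path)
whitelist_path = '/etc/squid/whitelist.txt'
with open(whitelist_path) as f:
    current = f.read()

if current != domains:
    with open(whitelist_path, 'w') as f:
        f.write(domains)
    # Ask Squid to reload its configuration without a full restart
    subprocess.run(['squid', '-k', 'reconfigure'], check=True)
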

Prerequisites

  1. You need an already deployed VPC, with public and private subnets spreading over several Availability Zones (AZs). You can find a description of how to set up your VPC environment at Default VPC Setup.
  2. You must have an internet gateway, with routing set up so that only traffic from a public subnet can reach the internet.

You don’t need a NAT (network address translation) gateway deployed, since this function is provided by the outbound proxy.

Integration with content filtering DNS services

If you require content filtering from an external company, like OpenDNS or Yandex.DNS, you must register and become a customer of that service. Many have free services, in addition to paid plans if you need advanced statistics and custom categories. This is your responsibility as the customer. (Learn more about the shared responsibility between AWS and the customer.)

Your DNS service provider will assign you a list of DNS IP addresses. You’ll need to enter the IP addresses when you provision (see Installation below).

If the DNS provider requires it, you may give them the source IPs of the proxies. There are four reserved IPs that you can find in the stack output (see Output parameters below).

Installation (one-time setup)

    1. Select the Launch Stack button to launch the CloudFormation template:
      The "Launch Stack" button

      Note: You must sign in to your AWS account in order to launch the stack in the required Region. The stack content can also be downloaded here.

    2. Provide the following proxy parameters, as shown in Figure 2:
      • Allowed domains: Enter your whitelisted domains. Use a leading dot (“.”) to indicate subdomains.
      • Custom DNS servers (optional): List any DNS servers that will be used by the proxy. Leave the default value to use the default Amazon DNS server.
      • Proxy Port: Enter the listener port of the proxy.
      • Instance Type: Enter the EC2 instance type that you want to use for the proxies. Instance type will affect vertical scaling capabilities and solution cost. For more information, see Amazon EC2 Instance Types.
      • AMI ID to be used: This field is prepopulated with the Amazon Machine Image (AMI) ID found in AWS Systems Manager Parameter Store. By default, it will point toward the latest Amazon Linux 2 image. You do not need to adjust this value.
      • SSH Key name (optional): Enter the name of the SSH key for your proxy EC2 instances. This is relevant only for debugging, or if you need to log in on the proxy servers. Consider using AWS Systems Manager Session Manager instead of SSH.
    3. Next, provide the following network parameters, as shown in Figure 2:
      • VPC ID: The VPC where the solution will be deployed.
      • Public subnets: The subnets where the proxies will be deployed. Select between 2 and 3 subnets.
      • Private subnets: The subnets where the Network Load Balancer will be deployed. Select between 2 and 3 subnets.
      • Allowed client CIDR: The value you enter here will be added to the proxy security group. By default, the private IP range 172.31.0.0/16 is allowed. The allowed block size is between a /32 netmask and an /8 netmask. This prevents you from using an open IP range like 0.0.0.0/0. If you were to set an open IP range, your proxies would accept traffic from anywhere on the internet, which is a bad practice.

 

Figure 2: Launching the CloudFormation template

    4. When you’ve entered all your proxy and network parameters, select Next. On the following wizard screens, you can keep the default values and select Next and Create Stack.

 

Find the output parameters

After the stack status changes to CREATE_COMPLETE, note down the output parameters to configure your clients. Look for the following parameters in the Outputs tab of the stack:

  • The domain name of the proxy that should be configured on the client.
  • The port of the proxy that should be configured on the client.
  • Four Elastic IP addresses for the proxy instances. These are used for outbound connections to the internet.
  • The CloudWatch Log Group for access logs.
  • The Security Group that is attached to the proxies.
  • The Linux command to set the proxy. You can copy and paste this into your shell.

Figure 3: Stack output parameters

Use the proxy

Proxy setting parameters are specific to every application. Most Linux applications use the environment variables http_proxy and https_proxy.

    1. Log in on the Linux EC2 instance that’s allowed to use the proxy.
    2. To set the shell parameter temporarily (only for the current shell session), execute the following export commands:
      
          $ export http_proxy=http://<Proxy-DOMAIN>:<Proxy-Port>
          $ export https_proxy=$http_proxy
          

      1. Replace <Proxy-DOMAIN> with the domain of the load balancer, which you can find in the stack output parameter.
      2. Replace <Proxy-Port> with the port of your proxy, which is also listed in the stack output parameter.

 

  3. Next, you can use cURL (for example) to test the connection. Replace <URL> with one of your whitelisted URLs:
    
            $ curl -k <URL>
            <!DOCTYPE html>
            …
        

  4. You can add the proxy parameters permanently to interactive and non-interactive shells. If you do this, you won’t need to set them again after reloading. Execute the following commands in your application shell:
    
            $ echo 'export http_proxy=http://<Proxy-DOMAIN>:<Proxy-Port>' >> ~/.bashrc
            $ echo 'export https_proxy=$http_proxy' >> ~/.bashrc

            $ echo 'export http_proxy=http://<Proxy-DOMAIN>:<Proxy-Port>' >> ~/.bash_profile
            $ echo 'export https_proxy=$http_proxy' >> ~/.bash_profile
        

    1. Replace <Proxy-DOMAIN> with the domain of the load balancer.
    2. Replace <Proxy-Port> with the port of your proxy.
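
Applications that don't read these environment variables can usually be pointed at the proxy explicitly. Here's a minimal sketch with the Python requests library; the domain and port placeholders come from the stack output, and the test URL is just an example of a whitelisted domain:

import requests

# Route both HTTP and HTTPS traffic through the outbound proxy (placeholders from the stack output)
proxies = {
    'http': 'http://<Proxy-DOMAIN>:<Proxy-Port>',
    'https': 'http://<Proxy-DOMAIN>:<Proxy-Port>',
}

# The request succeeds only if the target domain is on the whitelist
response = requests.get('https://www.amazon.com', proxies=proxies, timeout=10)
print(response.status_code)
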

Customize the access denied page

An error page will display when a user’s access is blocked or if there’s an internal error. You can adjust the look and feel of this page (HTML or styles) according to the Squid error directory tag.

Use the proxy access log

The proxy access log is an important tool for troubleshooting. It contains the client IP address, the destination domain, the port, and errors with timestamps. The access logs from Squid are uploaded to CloudWatch. You can find them in the CloudWatch console under Log Groups, with the prefix Proxy, as shown in the figure below.

Figure 4: CloudWatch log with access group

You can use CloudWatch Logs Insights to analyze and visualize your queries. See the following figure for an example of denied connections visualized on a timeline:

Figure 5: Access logs analysis with CloudWatch Logs Insights
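
For example, Squid marks blocked requests with TCP_DENIED in its access log, so a query similar to the following sketch can count denied connections over time. The log group name is a placeholder, and the query string is an assumption about the log format rather than part of the solution:

import time
import boto3

logs = boto3.client('logs')

# Count denied connections in 5-minute buckets over the last hour (log group name is a placeholder)
query = logs.start_query(
    logGroupName='Proxy-<your-log-group>',
    startTime=int(time.time()) - 3600,
    endTime=int(time.time()),
    queryString='filter @message like /TCP_DENIED/ | stats count() as denied by bin(5m)'
)

# Poll until the query finishes, then print the results
while True:
    results = logs.get_query_results(queryId=query['queryId'])
    if results['status'] == 'Complete':
        break
    time.sleep(1)
print(results['results'])
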

Monitor your metrics with CloudWatch

The main proxy metrics are uploaded to CloudWatch Metrics every five minutes, in the proxy namespace:

  • client_http.errors /sec – errors in processing client requests per second
  • client_http.hits /sec – cache hits per second
  • client_http.kbytes_in /sec – client uploaded data per second
  • client_http.kbytes_out /sec – client downloaded data per second
  • client_http.requests /sec – number of requests per second
  • server.all.errors /sec – proxy server errors per second
  • server.all.kbytes_in /sec – proxy server uploaded data per second
  • server.all.kbytes_out /sec – proxy downloaded data per second
  • server.all.requests /sec – all requests sent by proxy server per second

In the figure below, you can see an example of metrics. For more information on metric use, see the Squid project information.

Figure 6: Example of CloudWatch metrics
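
If you want to pull these metrics programmatically instead of browsing the console, you can query CloudWatch directly. A rough sketch follows; the namespace and metric name here are assumptions based on the list above, so check the actual names in your account before relying on them:

from datetime import datetime, timedelta
import boto3

cloudwatch = boto3.client('cloudwatch')

# Average client request rate over the last three hours (namespace and metric name are assumptions)
stats = cloudwatch.get_metric_statistics(
    Namespace='Proxy',
    MetricName='client_http.requests',
    StartTime=datetime.utcnow() - timedelta(hours=3),
    EndTime=datetime.utcnow(),
    Period=300,
    Statistics=['Average']
)
for point in sorted(stats['Datapoints'], key=lambda p: p['Timestamp']):
    print(point['Timestamp'], point['Average'])
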

Manage the proxy configuration

From time to time, you may want to add or remove domains from the whitelist. To change your whitelisted domains, you must update the input values in the CloudFormation stack. This will cause the values stored in Secrets Manager to update as well. Every five minutes, the proxies will pull the list from Secrets Manager and update as needed. This means it can take up to five minutes for your change to propagate. The change will be propagated to all instances without terminating or deploying them.
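
For example, a whitelist change can be pushed as a stack update that only touches the relevant parameter. The following is a sketch using boto3; the stack name and the parameter keys (AllowedDomains, CustomDNS) are assumptions, so use the names from your deployed stack:

import boto3

cfn = boto3.client('cloudformation')

# Reuse the existing template and change only the allowed-domains parameter.
# Every other parameter must be carried over with UsePreviousValue.
cfn.update_stack(
    StackName='outbound-proxy',
    UsePreviousTemplate=True,
    Parameters=[
        {'ParameterKey': 'AllowedDomains', 'ParameterValue': '.amazon.com, .amazonaws.com'},
        {'ParameterKey': 'CustomDNS', 'UsePreviousValue': True},  # repeat for all remaining parameters
    ],
    Capabilities=['CAPABILITY_IAM']
)
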

Note that when the whitelist is updated, the Squid proxy processes are restarted, which will interrupt ALL connections passing through them at that time. This can be disruptive, so be careful about when you choose to adjust the whitelist.

If you want to change other CloudFormation parameters, like DNS or Security Group settings, you can again update the CloudFormation stack with new values. The CloudFormation stack will launch a new instance and terminate legacy instances (a rolling update).

You can change the proxy Squid configuration by editing the CloudFormation template (section AWS::CloudFormation::Init) and updating the stack. However, you should not do this unless you have advanced AWS and Squid experience.

Update the instances

To update your AMI, you can update the stack. If the AMI has been updated with a newer version, then a rolling update will redeploy the EC2 instances and Squid software. This automates the process of patching managed instances with both security-related and other updates. If the AMI has not changed, no update will be performed.

Alternately, you can terminate the instance, and the auto scaling group will launch a new instance with the latest updates for Squid and the OS, starting from scratch. This approach may lead to a short service interruption for the clients served on this instance, during the time in which the load balancer is switching to an active instance.

Troubleshooting

I’ve summarized a few common problems and solutions below.

Problem: I receive a timeout at the client application.
Solutions:
  • Check that you’ve configured the client application to use the proxy. (See Use the proxy, above.)
  • Check that the Security Group allows access from the client instance.
  • Verify that your NACL and routing table allow communication to and from the Network Load Balancer.

Problem: I receive an error page saying that access was blocked by the administrator.
Solution: Check the stack input parameter for allowed domains. The domains must be comma separated. Included subdomains must start with a dot. For example:
  • To include www.amazon.com, specify www.amazon.com
  • To include all subdomains of amazon.com as part of a list, specify .amazon.com

Problem: I received a 500 error page from the proxy.
Solutions:
  • Make sure that the proxy EC2 instance has internet access. The public subnets must have an Internet Gateway connected and set as the default route.
  • Check the DNS input parameter in the CloudFormation stack, if you use an external DNS service. Make sure the DNS provider has the correct proxy IPs (if you were required to provide them).

Problem: The webpage doesn’t look as expected. There are fragments or styles missing.
Solution: Many pages download content from multiple domains. You need to whitelist all of these domains. Use the access logs in CloudWatch Logs to determine which domains are blocked, then update the stack.

Problem: On the proxy error page, I receive "unknown certificate issuer."
Solution: During the setup, a self-signed certificate for the Squid error page is generated. If you need to add your own certificate, you can adapt the CloudFormation template. This requires moderate knowledge of Unix/Linux and AWS CloudFormation.

Conclusion

In this blog post, I showed you how to configure an outbound proxy for controlling internet communication from a VPC. If you need Squid support, you can find various offerings on the Squid Support page. The AWS forums provide support for Amazon Elastic Compute Cloud (EC2). When you need AWS experts to help you plan, build, or optimize your infrastructure, consider engaging AWS Professional Services.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Vesselin Tzvetkov

Vesselin is senior security consultant at AWS Professional Services and is passionate about security architecture and engineering innovative solutions. Outside of technology, he likes classical music, philosophy, and sports. He holds a Ph.D. in security from TU-Darmstadt and a M.S. in electrical engineering from Bochum University in Germany.

How to Design Your Serverless Apps for Massive Scale

Post Syndicated from George Mao original https://aws.amazon.com/blogs/architecture/how-to-design-your-serverless-apps-for-massive-scale/

Serverless is one of the hottest design patterns in the cloud today, allowing you to focus on building and innovating, rather than worrying about the heavy lifting of server and OS operations. In this series of posts, we’ll discuss topics that you should consider when designing your serverless architectures. First, we’ll look at architectural patterns designed to achieve massive scale with serverless.

Scaling Considerations

In general, developers in a “serverful” world need to worry about how many total requests can be served throughout the day, week, or month, and how quickly their system can scale. As you move into the serverless world, the most important question becomes: “What is the concurrency that your system is designed to handle?”

The AWS Serverless platform allows you to scale very quickly in response to demand. Below is an example of a serverless design that is fully synchronous throughout the application. During periods of extremely high demand, Amazon API Gateway and AWS Lambda will scale in response to your incoming load. This design places extremely high load on your backend relational database because Lambda can easily scale from thousands to tens of thousands of concurrent requests. In most cases, your relational databases are not designed to accept the same number of concurrent connections.

Serverless at scale-1

This design risks bottlenecks at your relational database and may cause service outages. This design also risks data loss due to throttling or database connection exhaustion.

Cloud Native Design

Instead, you should consider decoupling your architecture and moving to an asynchronous model. In this architecture, you use an intermediary service to buffer incoming requests, such as Amazon Kinesis or Amazon Simple Queue Service (SQS). You can configure Kinesis or SQS as out-of-the-box event sources for Lambda. In the design below, AWS will automatically poll your Kinesis stream or SQS resource for new records and deliver them to your Lambda functions. You can control the batch size per delivery and further place throttles on a per-function basis.

Serverless at scale - 2

This design allows you to accept an extremely high volume of requests, store the requests in a durable data store, and process them at the speed your system can handle.
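
As a concrete sketch of the knobs mentioned above, the following boto3 calls wire an SQS queue to a Lambda function with a fixed batch size and cap the function's concurrency. The queue ARN and function name are placeholders, not part of any specific solution:

import boto3

lambda_client = boto3.client('lambda')

# Have Lambda poll the queue and deliver records in batches of 10
lambda_client.create_event_source_mapping(
    EventSourceArn='arn:aws:sqs:us-east-1:123456789012:orders-queue',
    FunctionName='process-orders',
    BatchSize=10
)

# Throttle processing so downstream systems (such as a relational database) aren't overwhelmed
lambda_client.put_function_concurrency(
    FunctionName='process-orders',
    ReservedConcurrentExecutions=50
)
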

Conclusion

Serverless computing allows you to scale much more quickly than with server-based applications, but that means application architects should always consider the effects of scaling on downstream services. Always keep cost, speed, and reliability in mind when you're building your serverless applications.

Our next post in this series will discuss the different ways to invoke your Lambda functions and how to design your applications appropriately.

About the Author

George MaoGeorge Mao is a Specialist Solutions Architect at Amazon Web Services, focused on the Serverless platform. George is responsible for helping customers design and operate Serverless applications using services like Lambda, API Gateway, Cognito, and DynamoDB. He is a regular speaker at AWS Summits, re:Invent, and various tech events. George is a software engineer and enjoys contributing to open source projects, delivering technical presentations at technology events, and working with customers to design their applications in the Cloud. George holds a Bachelor of Computer Science and Masters of IT from Virginia Tech.

Updates to Serverless Architectural Patterns and Best Practices

Post Syndicated from Drew Dennis original https://aws.amazon.com/blogs/architecture/updates-to-serverless-architectural-patterns-and-best-practices/

As we sail past the halfway point between re:Invent 2018 and re:Invent 2019, I’d like to revisit some of the recent serverless announcements we’ve made. These are all complementary to the patterns discussed in the re:Invent architecture track’s Serverless Architectural Patterns and Best Practices session.

AWS Event Fork Pipelines

AWS Event Fork Pipelines was announced in March 2019. Many customers use asynchronous event-driven processing in their serverless applications to decouple application components and address high concurrency needs. And in doing so, they often find themselves needing to backup, search, analyze, or replay these asynchronous events. That is exactly what AWS Event Fork Pipelines aims to achieve. You can plug them into a new or existing SNS topic used by your application and immediately address retention and compliance needs, gain new business insights, or even improve your application’s disaster recovery abilities.

AWS Event Fork Pipelines is a suite of three applications. The first application addresses event storage and backup needs by writing all events to an S3 bucket where they can be queried with services like Amazon Athena. The second is a search and analytics pipeline that delivers events to a new or existing Amazon ES domain, enabling search and analysis of your events. Finally, the third application is an event replay pipeline that can be used to reprocess messages should a downstream failure occur in your application. AWS Event Fork Pipelines is available as AWS Serverless Application Model (SAM) templates in the AWS Serverless Application Repository (SAR). Check out our example e-commerce application on GitHub.

Amazon API Gateway Serverless Developer Portal

If you publish APIs for developers allowing them to build new applications and capabilities with your data, you understand the need for a developer portal. Also, in March 2019, we announced some significant upgrades to the API Gateway Serverless Developer Portal. The portal’s front end is written in React and is designed to be fully customizable.

The API Gateway Serverless Developer Portal is also available in GitHub and the AWS SAR. As you can see from the architecture diagram below, it is integrated with Amazon Cognito User Pools to allow developers to sign-up, receive an API Key, and register for one or more of your APIs. You can now also enable administrative scenarios from your developer portal by logging in as users belonging to the portal’s Admin group which is created when the portal is initially deployed to your account. For example, you can control which APIs appear in a customer’s developer portal, enable SDK downloads, solicit developer feedback, and even publish updates for APIs that have been recently revised.

AWS Lambda with Amazon Application Load Balancer (ALB)

Serverless microservices have been built by our customers for quite a while, with AWS Lambda and Amazon API Gateway. At re:Invent 2018, during Dr. Werner Vogels' keynote, a new approach to serverless microservices was announced: Lambda functions as ALB targets.

ALB’s support for Lambda targets gives customers the ability to deploy serverless code behind an ALB, alongside servers, containers, and IP addresses. With this feature, ALB path and host-based routing can be used to direct incoming requests to Lambda functions. Also, ALB can now provide an entry point for legacy applications to take on new serverless functionality, and enable migration scenarios from monolithic legacy server or container-based applications.

Use cases for Lambda targets for ALB include adding new functionality to an existing application that already sits behind an ALB. This could be request monitoring by sending HTTP headers to Elasticsearch clusters, or implementing controls that manage cookies. Check out our demo of this new feature. For additional details, take a look at the feature's documentation.
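
When a Lambda function sits behind an ALB, it receives the HTTP request as the event and must return a response object that the load balancer understands. A minimal handler sketch looks like this (the message content is just an example):

import json

def lambda_handler(event, context):
    # The ALB passes the HTTP method, path, headers, and body in the event
    path = event.get('path', '/')

    # Return the response shape that ALB expects from a Lambda target
    return {
        'statusCode': 200,
        'statusDescription': '200 OK',
        'isBase64Encoded': False,
        'headers': {'Content-Type': 'application/json'},
        'body': json.dumps({'message': f'Hello from Lambda behind ALB, path={path}'})
    }
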

Security Overview of AWS Lambda Whitepaper

Finally, I’d be remiss if I didn’t point out the great work many of my colleagues have done in releasing the Security Overview of AWS Lambda Whitepaper. It is a succinct and enlightening read for anyone wishing to better understand the Lambda runtime environment, function isolation, or data paths taken for payloads sent to the Lambda service during synchronous and asynchronous invocations. It also has some great insight into compliance, auditing, monitoring, and configuration management of your Lambda functions. A must read for anyone wishing to better understand the overall security of AWS serverless applications.

I look forward to seeing everyone at re:Invent 2019 for more exciting serverless announcements!

About the author

Drew DennisDrew Dennis is a Global Solutions Architect with AWS based in Dallas, TX. He enjoys all things Serverless and has delivered the Architecture Track’s Serverless Patterns and Best Practices session at re:Invent the past three years. Today, he helps automotive companies with autonomous driving research on AWS, connected car use cases, and electrification.

How to securely provide database credentials to Lambda functions by using AWS Secrets Manager

Post Syndicated from Ramesh Adabala original https://aws.amazon.com/blogs/security/how-to-securely-provide-database-credentials-to-lambda-functions-by-using-aws-secrets-manager/

As a solutions architect at AWS, I often assist customers in architecting and deploying business applications using APIs and microservices that rely on serverless services such as AWS Lambda and database services such as Amazon Relational Database Service (Amazon RDS). Customers can take advantage of these fully managed AWS services to unburden their teams from infrastructure operations and other undifferentiated heavy lifting, such as patching, software maintenance, and capacity planning.

In this blog post, I’ll show you how to use AWS Secrets Manager to secure your database credentials and send them to Lambda functions that will use them to connect to and query the backend database service Amazon RDS—without hardcoding the secrets in code or passing them through environment variables. This approach will help you secure last-mile secrets and protect your backend databases. Long-lived credentials need to be managed and regularly rotated to keep access into critical systems secure, so it’s a security best practice to periodically reset your passwords. Manually changing the passwords would be cumbersome, but AWS Secrets Manager helps by managing and rotating the RDS database passwords.

Solution overview

This is sample code: you’ll use an AWS CloudFormation template to deploy the following components to test the API endpoint from your browser:

  • An RDS MySQL database instance on a db.t2.micro instance
  • Two Lambda functions with necessary IAM roles and IAM policies, including access to AWS Secrets Manager:
    • LambdaRDSCFNInit: This Lambda function will execute immediately after the CloudFormation stack creation. It will create an “Employees” table in the database, where it will insert three sample records.
    • LambdaRDSTest: This function will query the Employees table and return the record count in an HTML string format
  • RESTful API with “GET” method on AWS API Gateway

Here’s the high level setup of the AWS services that will be created from the CloudFormation stack deployment:
 

Figure 1: Solution architecture

  1. Clients call the RESTful API hosted on AWS API Gateway
  2. The API Gateway executes the Lambda function
  3. The Lambda function retrieves the database secrets using the Secrets Manager API
  4. The Lambda function connects to the RDS database using database secrets from Secrets Manager and returns the query results

You can access the source code for the sample used in this post here: https://github.com/awslabs/automating-governance-sample/tree/master/AWS-SecretsManager-Lambda-RDS-blog.

Deploying the sample solution

Set up the sample deployment by selecting the Launch Stack button below. If you haven’t logged into your AWS account, follow the prompts to log in.

By default, the stack will be deployed in the us-east-1 region. If you want to deploy this stack in any other region, download the code from the above GitHub link, place the Lambda code zip file in a region-specific S3 bucket and make the necessary changes in the CloudFormation template to point to the right S3 bucket. (Please refer to the AWS CloudFormation User Guide for additional details on how to create stacks using the AWS CloudFormation console.)
 
Select this image to open a link that starts building the CloudFormation stack

Next, follow these steps to execute the stack:

  1. Leave the default location for the template and select Next.
     
    Figure 2: Keep the default location for the template

  2. On the Specify Details page, you’ll see the parameters pre-populated. These parameters include the name of the database and the database user name. Select Next on this screen
     
    Figure 3: Parameters on the “Specify Details” page

  3. On the Options screen, select the Next button.
  4. On the Review screen, select both check boxes, then select the Create Change Set button:
     
    Figure 4: Select the check boxes and “Create Change Set”

  5. After the change set creation is completed, choose the Execute button to launch the stack.
  6. Stack creation will take between 10 and 15 minutes. After the stack is created successfully, select the Outputs tab of the stack, then select the link.
     
    Figure 5: Select the link on the “Outputs” tab

    This action will trigger the code in the Lambda function, which will query the “Employees” table in the MySQL database and will return the results count back to the API. You’ll see the following screen as output from the RESTful API endpoint:
     

    Figure 6: Output from the RESTful API endpoint

At this point, you’ve successfully deployed and tested the API endpoint with a backend Lambda function and RDS resources. The Lambda function is able to successfully query the MySQL RDS database and is able to return the results through the API endpoint.

What’s happening in the background?

The CloudFormation stack deployed a MySQL RDS database with a randomly generated password using a secret resource. Now that the secret resource with randomly generated password has been created, the CloudFormation stack will use dynamic reference to resolve the value of the password from Secrets Manager in order to create the RDS instance resource. Dynamic references provide a compact, powerful way for you to specify external values that are stored and managed in other AWS services, such as Secrets Manager. The dynamic reference guarantees that CloudFormation will not log or persist the resolved value, keeping the database password safe. The CloudFormation template also creates a Lambda function to do automatic rotation of the password for the MySQL RDS database every 30 days. Native credential rotation can improve security posture, as it eliminates the need to manually handle database passwords through the lifecycle process.

Below is the CloudFormation code that covers these details:


#This is a Secret resource with a randomly generated password in its SecretString JSON.
MyRDSInstanceRotationSecret:
  Type: AWS::SecretsManager::Secret
  Properties:
    Description: 'This is my rds instance secret'
    GenerateSecretString:
      SecretStringTemplate: !Sub '{"username": "${RDSUserName}"}'
      GenerateStringKey: 'password'
      PasswordLength: 16
      ExcludeCharacters: '"@/\'
    Tags:
      - Key: AppName
        Value: MyApp

#This is an RDS instance resource. Its master username and password use dynamic references to resolve values from
#Secrets Manager. The dynamic reference guarantees that CloudFormation will not log or persist the resolved value.
#We use a Ref to the Secret resource logical id in order to construct the dynamic reference, since the Secret name is
#generated by CloudFormation.
MyDBInstance2:
  Type: AWS::RDS::DBInstance
  Properties:
    AllocatedStorage: 20
    DBInstanceClass: db.t2.micro
    DBName: !Ref RDSDBName
    Engine: mysql
    MasterUsername: !Ref RDSUserName
    MasterUserPassword: !Join ['', ['{{resolve:secretsmanager:', !Ref MyRDSInstanceRotationSecret, ':SecretString:password}}' ]]
    MultiAZ: False
    PubliclyAccessible: False
    StorageType: gp2
    DBSubnetGroupName: !Ref myDBSubnetGroup
    VPCSecurityGroups:
      - !Ref RDSSecurityGroup
    BackupRetentionPeriod: 0
    DBInstanceIdentifier: 'rotation-instance'

#This is a SecretTargetAttachment resource which updates the referenced Secret resource with properties about
#the referenced RDS instance.
SecretRDSInstanceAttachment:
  Type: AWS::SecretsManager::SecretTargetAttachment
  Properties:
    SecretId: !Ref MyRDSInstanceRotationSecret
    TargetId: !Ref MyDBInstance2
    TargetType: AWS::RDS::DBInstance

#This is a RotationSchedule resource. It configures rotation of the password for the referenced secret using a rotation lambda.
#The first rotation happens at resource creation time, with subsequent rotations scheduled according to the rotation rules.
#We explicitly depend on the SecretTargetAttachment resource being created to ensure that the secret contains all the
#information necessary for rotation to succeed.
MySecretRotationSchedule:
  Type: AWS::SecretsManager::RotationSchedule
  DependsOn: SecretRDSInstanceAttachment
  Properties:
    SecretId: !Ref MyRDSInstanceRotationSecret
    RotationLambdaARN: !GetAtt MyRotationLambda.Arn
    RotationRules:
      AutomaticallyAfterDays: 30

#This is a Lambda Function resource. We will use this lambda to rotate secrets.
#For details about rotation lambdas, see https://docs.aws.amazon.com/secretsmanager/latest/userguide/rotating-secrets.html
#The example below assumes that the lambda code has been uploaded to an S3 bucket, and that it will rotate a MySQL database password.
MyRotationLambda:
  Type: AWS::Serverless::Function
  Properties:
    Runtime: python2.7
    Role: !GetAtt MyLambdaExecutionRole.Arn
    Handler: mysql_secret_rotation.lambda_handler
    Description: 'This is a lambda to rotate MySql user passwd'
    FunctionName: 'cfn-rotation-lambda'
    CodeUri: 's3://devsecopsblog/code.zip'
    Environment:
      Variables:
        SECRETS_MANAGER_ENDPOINT: !Sub 'https://secretsmanager.${AWS::Region}.amazonaws.com'

Verifying the solution

To be certain that everything is set up properly, you can look at the Lambda code that’s querying the database table by following the below steps:

  1. Go to the AWS Lambda service page
  2. From the list of Lambda functions, click on the function with the name scm2-LambdaRDSTest-…
  3. You can see the environment variables at the bottom of the Lambda Configuration details screen. Notice that there should be no database password supplied as part of these environment variables:
     
    Figure 7: Environment variables
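
    You can also review the function code, which retrieves the database password from Secrets Manager at run time and then queries the Employees table: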

    
        import sys
        import pymysql
        import boto3
        import botocore
        import base64
        import json
        import random
        import time
        import os
        from botocore.exceptions import ClientError
        
        # rds settings
        rds_host = os.environ['RDS_HOST']
        name = os.environ['RDS_USERNAME']
        db_name = os.environ['RDS_DB_NAME']
        helperFunctionARN = os.environ['HELPER_FUNCTION_ARN']
        
        secret_name = os.environ['SECRET_NAME']
        my_session = boto3.session.Session()
        region_name = my_session.region_name
        conn = None
        
        # Get the service resource.
        lambdaClient = boto3.client('lambda')
        
        
        def invokeConnCountManager(incrementCounter):
            # return True
            response = lambdaClient.invoke(
                FunctionName=helperFunctionARN,
                InvocationType='RequestResponse',
                Payload='{"incrementCounter":' + str.lower(str(incrementCounter)) + ',"RDBMSName": "Prod_MySQL"}'
            )
            retVal = response['Payload']
            # Read the payload stream and decode it so it can be compared as a string
            retVal1 = retVal.read().decode('utf-8')
            return retVal1
        
        
        def openConnection():
            print("In Open connection")
            global conn
            password = "None"
            # Create a Secrets Manager client
            session = boto3.session.Session()
            client = session.client(
                service_name='secretsmanager',
                region_name=region_name
            )
            
            # In this sample we only handle the specific exceptions for the 'GetSecretValue' API.
            # See https://docs.aws.amazon.com/secretsmanager/latest/apireference/API_GetSecretValue.html
            # We rethrow the exception by default.
            
            try:
                get_secret_value_response = client.get_secret_value(
                    SecretId=secret_name
                )
                print(get_secret_value_response)
            except ClientError as e:
                print(e)
                if e.response['Error']['Code'] == 'DecryptionFailureException':
                    # Secrets Manager can't decrypt the protected secret text using the provided KMS key.
                    # Deal with the exception here, and/or rethrow at your discretion.
                    raise e
                elif e.response['Error']['Code'] == 'InternalServiceErrorException':
                    # An error occurred on the server side.
                    # Deal with the exception here, and/or rethrow at your discretion.
                    raise e
                elif e.response['Error']['Code'] == 'InvalidParameterException':
                    # You provided an invalid value for a parameter.
                    # Deal with the exception here, and/or rethrow at your discretion.
                    raise e
                elif e.response['Error']['Code'] == 'InvalidRequestException':
                    # You provided a parameter value that is not valid for the current state of the resource.
                    # Deal with the exception here, and/or rethrow at your discretion.
                    raise e
                elif e.response['Error']['Code'] == 'ResourceNotFoundException':
                    # We can't find the resource that you asked for.
                    # Deal with the exception here, and/or rethrow at your discretion.
                    raise e
            else:
                # Decrypts secret using the associated KMS CMK.
                # Depending on whether the secret is a string or binary, one of these fields will be populated.
                if 'SecretString' in get_secret_value_response:
                    secret = get_secret_value_response['SecretString']
                    j = json.loads(secret)
                    password = j['password']
                else:
                    # Binary secrets are base64-encoded; decode the value and use it as the password
                    decoded_binary_secret = base64.b64decode(get_secret_value_response['SecretBinary'])
                    password = decoded_binary_secret.decode('utf-8')
            
            try:
                if conn is None:
                    conn = pymysql.connect(
                        host=rds_host, user=name, passwd=password, db=db_name, connect_timeout=5)
                elif not conn.open:
                    # Reconnect if the previous connection was closed
                    conn = pymysql.connect(
                        host=rds_host, user=name, passwd=password, db=db_name, connect_timeout=5)
        
            except Exception as e:
                print (e)
                print("ERROR: Unexpected error: Could not connect to MySql instance.")
                raise e
        
        
        def lambda_handler(event, context):
            if invokeConnCountManager(True) == "false":
                print ("Not enough Connections available.")
                return False
        
            item_count = 0
            try:
                openConnection()
                # Introducing artificial random delay to mimic actual DB query time. Remove this code for actual use.
                time.sleep(random.randint(1, 3))
                with conn.cursor() as cur:
                    cur.execute("select * from Employees")
                    for row in cur:
                        item_count += 1
                        print(row)
                        # print(row)
            except Exception as e:
                # Error while opening connection or processing
                print(e)
            finally:
                print("Closing Connection")
                if(conn is not None and conn.open):
                    conn.close()
                invokeConnCountManager(False)
        
            content =  "Selected %d items from RDS MySQL table" % (item_count)
            response = {
                "statusCode": 200,
                "body": content,
                "headers": {
                    'Content-Type': 'text/html',
                }
            }
            return response        
        

In the AWS Secrets Manager console, you can also look at the new secret that was created from CloudFormation execution by following the below steps:

  1. Go to the AWS Secrets Manager console with appropriate IAM permissions
  2. From the list of secrets, click on the latest secret with the name MyRDSInstanceRotationSecret-…
  3. You will see the secret details and rotation information on the screen, as shown in the following screenshot:
     
    Figure 8: Secret details and rotation information

Conclusion

In this post, I showed you how to manage database secrets using AWS Secrets Manager and how to leverage Secrets Manager’s API to retrieve the secrets into a Lambda execution environment to improve database security and protect sensitive data. Secrets Manager helps you protect access to your applications, services, and IT resources without the upfront investment and ongoing maintenance costs of operating your own secrets management infrastructure. To get started, visit the Secrets Manager console. To learn more, visit Secrets Manager documentation.

If you have feedback about this post, add it to the Comments section below. If you have questions about implementing the example used in this post, open a thread on the Secrets Manager Forum.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Ramesh Adabala

Ramesh is a Solution Architect on the Southeast Enterprise Solution Architecture team at AWS.

How to use AWS Secrets Manager client-side caching in .NET

Post Syndicated from Sepehr Samiei original https://aws.amazon.com/blogs/security/how-to-use-aws-secrets-manager-client-side-caching-in-dotnet/

AWS Secrets Manager now has a client-side caching library for .NET that makes it easier to access secrets from .NET applications. This is in addition to the client-side caching libraries for Java, JDBC, Python, and Go. These libraries help you improve availability, reduce latency, and reduce the cost of retrieving your secrets. The Secrets Manager cache library does this by serving secrets out of a local cache and eliminating frequent Secrets Manager API calls.

AWS Secrets Manager enables you to automatically rotate, manage, and retrieve secrets throughout their lifecycle. Users and applications can access secrets through a call to Secrets Manager APIs, eliminating the need to hardcode sensitive information in plain text. It offers secret rotation with built-in integration for AWS services such as Amazon Relational Database Service (Amazon RDS) and Amazon Redshift, and it’s also extensible to other types of secrets. Secrets Manager enables you to control access to secrets using fine-grained permissions, and all actions on secrets, including retrievals, are traceable and auditable through AWS CloudTrail.

AWS Secrets Manager client-side caching for .NET extends the benefits of AWS Secrets Manager to a wider range of use cases in .NET applications. These extra benefits are now available without having to spend precious time and effort on developing your own caching solution.

In this post, I’ll discuss the following topics:

  • The benefits offered by Secrets Manager client-side caching library for .NET
  • How Secrets Manager client-side caching library for .NET works
  • How to use Secrets Manager client-side library in .NET applications
  • How to extend Secrets Manager client-side library with your own custom logic

The benefits offered by Secrets Manager client-side caching library for .NET

Client-side caching is beneficial in the following ways:

  • Availability: Network links sometimes suffer slowdowns or intermittent breaks. Client-side caching can significantly improve availability by eliminating a large number of API calls.
  • Latency: Retrieving secrets through API calls includes the network latency. Retrieving secrets from the local cache eliminates that latency and, therefore, improves performance.
  • Cost: Each API call to a Secrets Manager endpoint encounters a small charge. Using a local cache saves costs associated with API calls.

Using a client-side cache is a best practice; however, in the same way that you don’t want to reinvent the wheel every time you need one, crafting your own client-side caching solution is suboptimal. The Secrets Manager client-side caching library relieves you from writing your own client-side caching solution while still giving you its benefits. Furthermore, it includes best practices such as:

  • Automatically refreshing cached secrets: the library periodically updates secrets to ensure your application gets the most recent version of a secret. You can control and change refresh intervals using configuration properties.
  • Integration with your applications: To use this library, just add the dependency to your .NET project and provide the identifier to the secret you want to access in your code.

How Secrets Manager client-side caching library for .NET works

The library is implemented in .NET Standard. This means you can reuse the same library in projects of all flavors of .NET, including .NET Framework, .NET Core, and Xamarin.

Note: Because the AWS Secrets Manager client-side caching library depends on the Microsoft.Extensions.Caching.Memory package, make sure you add it to your project dependencies.

As an extension to Secrets Manager .NET SDK, the cache library provides you an alternative to direct invocation of Secrets Manager API methods. You invoke cache library methods, and if the value doesn’t exist in the cache, the cache library invokes Secrets Manager methods on your behalf.

The default refresh interval for the “current” version of a secret (the latest value stored in Secrets Manager for that secret) is 1 hour, because the latest version may change from time to time. The library allows you to configure this frequency to be higher or lower per your specific application requirements.

If you request a specific version of a secret by specifying both the secret ID and secret version parameters, the library sets the refresh interval to 48 hours by default. Because each version of a secret is immutable, there is no need to refresh it frequently.

You can also enable “Last known good value caching” to provide some protection in cases of transient network issues or service outages. If this is enabled, the cache will keep track of the last known good secret value, and in the event of an error occurring while refreshing a secret value from the service, the cache will return the last known good value. This feature is disabled by default, and can be enabled by setting the EnableLastKnownGoodValueCaching property of the SecretsManagerCacheOptions class to true. You can pass your instance of SecretsManagerCacheOptions to the SecretsManagerCache constructor.

The cache library provides a thread-safe implementation for both cache checks and entry population. Therefore, simultaneous requests for a secret that is not available in the cache will result in a single API request to Secrets Manager.

How to use Secrets Manager client-side caching library in .NET applications

You can add Secrets Manager client-side caching library to your projects either directly or through dependency injection. The dependency package is also available through NuGet. In this example, I use NuGet to add the library to my project. Open the NuGet Package Manager console and browse for AWSSDK.SecretsManager.Caching. Select the library and install it.
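
If you prefer the command line, you can add the same packages with the .NET CLI. This is a sketch; the package names are the ones referenced in this post (the caching library and its Microsoft.Extensions.Caching.Memory dependency):

dotnet add package AWSSDK.SecretsManager.Caching
dotnet add package Microsoft.Extensions.Caching.Memory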
 

Figure 1: Select the AWSSDK.SecretsManager.Caching library


Before using the cache, you need to have at least one secret stored in your account using AWS Secrets Manager. To create a test secret:

  1. Go to the AWS Console, and then select AWS Secrets Manager.
  2. Select Store a new secret.
  3. For secret type, select Other type of secret, and then add three key/value pairs as shown here:
    
        {
          "Domain": "<yourDomainName>",
          "UserName": "<yourUserName>",
          "Password": "<yourPassword>"
        }  
        

  4. Next, create a cache object, and then invoke its methods with appropriate parameters. Below is a code snippet that uses the AWS Secrets Manager client-side cache library to access our secret. Note that this snippet assumes you’ve added the Newtonsoft.Json library to your project:
    
        public class MyClass : IDisposable
        {
            private readonly IAmazonSecretsManager secretsManager;
            private readonly ISecretsManagerCache cache;

            public MyClass()
            {
                this.secretsManager = new AmazonSecretsManagerClient();
                this.cache = new SecretsManagerCache(this.secretsManager);
            }

            public void Dispose()
            {
                this.secretsManager.Dispose();
                this.cache.Dispose();
            }

            public async Task<NetworkCredential> GetNetworkCredential(string secretId)
            {
                // GetSecretAsync serves the secret from the local cache and only
                // calls Secrets Manager when the cached entry needs refreshing.
                var sec = await this.cache.GetSecretAsync(secretId);
                var jo = Newtonsoft.Json.Linq.JObject.Parse(sec.SecretString);
                return new NetworkCredential(
                    domain: jo["Domain"].ToObject<string>(),
                    userName: jo["UserName"].ToObject<string>(),
                    password: jo["Password"].ToObject<string>());
            }
        }
        

For ASP.NET projects, you can use the library with dependency injection. To do this, you first register Secrets Manager caching with the dependency injection service collection in the Startup class of your ASP.NET project:


public class Startup
{
    public void ConfigureServices(IServiceCollection services)
    {
        services.AddSecretsManagerCaching();
    }
}

Then, you’ll be able to consume the cache using constructor injection in your classes.


public class MyClass
    {
        private readonly ISecretsManagerCache cache;

        // The cache is injected by the DI container, which owns its lifetime,
        // so this class doesn't need to implement IDisposable.
        public MyClass(ISecretsManagerCache cache)
        {
            this.cache = cache;
        }

        public async Task<NetworkCredential> GetNetworkCredential(string secretId)
        {
            var sec = await this.cache.GetSecretAsync(secretId);
            var jo = Newtonsoft.Json.Linq.JObject.Parse(sec.SecretString);
            return new NetworkCredential(
                domain: jo["Domain"].ToObject<string>(),
                userName: jo["UserName"].ToObject<string>(),
                password: jo["Password"].ToObject<string>());
        }
    }

How to add in-memory encryption and other custom extensions

The Secrets Manager caching library is designed to be extensible with your own custom logic. One possibility is to extend its implementation to include in-memory encryption of cached secrets, adding another layer of protection to your retrieved secrets. For this purpose, you have to manually implement two of the interfaces included in the library. The library includes the SecretCacheEntry class, which implements the ISecretCacheEntry interface. This is the object that stores secrets in memory. You could create another class implementing the same ISecretCacheEntry interface to add in-memory encryption and decryption.


public class EncryptedSecretCacheEntry : ISecretCacheEntry
    {
        public EncryptedSecretCacheEntry(GetSecretValueResponse response, TimeSpan expiry)
        {
            this.VersionId = response.VersionId;
            this.LastRetreived = DateTime.UtcNow;
            this.Name = response.Name;
            this.Expires = this.LastRetreived.Add(expiry);

            if (response.SecretBinary != null && response.SecretBinary.Length > 0)
            {
                using (var ms = response.SecretBinary)
                {
                    this.SecretBinary = ms.ToArray();
                }
            }
            else
            {
                this.SecretString = response.SecretString; 
            }
        }
    
        private byte[] _EncryptedSecretString;
        public string SecretString
        {
            get { return MyCustomCipherService.DecryptString(_EncryptedSecretString); }
            set { _EncryptedSecretString = MyCustomCipherService.EncryptString(value); }
        }
    
        private byte[] _EncryptedSecretBinary;
        public byte[] SecretBinary
        {
            get { return MyCustomCipherService.Decrypt(_EncryptedSecretBinary); }
            set { _EncryptedSecretBinary = MyCustomCipherService.Encrypt(value); }
        }
    
        public string VersionId { get; private set; }

        public string Name { get; private set; }

        public string LocalId => $"{this.Name}:{this.VersionId}";

        public DateTime LastRetreived { get; private set; }

        public DateTime Expires { get; private set; }
    } 

The second step is to implement the ISecretCacheEntryFactory interface:


public class EncryptedSecretCacheEntryFactory : ISecretCacheEntryFactory
    {
        public ISecretCacheEntry CreateEntry(GetSecretValueResponse response, TimeSpan expiry)
        {
            return new EncryptedSecretCacheEntry(response, expiry);
        }
    }

With these two classes in place, I can now modify the constructor of my SecretsUserClass to add my custom encryption logic to the Secrets Manager cache library:


public SecretsUserClass()
    {
        this.secretsManager = new AmazonSecretsManagerClient();
        this.cache = new SecretsManagerCache(this.secretsManager, new EncryptedSecretCacheEntryFactory(), new SecretsManagerCacheOptions(), null);
    }

You could even go further and fully customize the cache by implementing ISecretsManagerCache or implementing a child class that inherits functionality of SecretsManagerCache and adds new methods to it.

Conclusion

It’s critical for enterprises to protect secrets from unauthorized access and adhere to various industry or legislative compliance requirements. Mitigating the risk of compromise often involves complex techniques, significant effort, and costs, such as applying encryption, managing vaults and hardware security modules (HSMs), rotating secrets, auditing access, and so on. Because the level of effort is high, many developers tend to use the much simpler, but substantially riskier, alternative of hard-coding secrets in application code, or simply storing secrets in plain-text format. These practices are problematic from the security and compliance point of view, but they need to be understood as symptoms of the more fundamental problem of complexity in the systems enterprises have built. To address the problem of weak security and compliance practices, you have to address the problem of complexity. Complex systems can be simplified and made more secure when they are reusable, accessible, and automated, needing no human interaction.

In this post, I’ve shown how you can improve availability, reduce latency, and reduce the cost of using your secrets by using the Secrets Manager client-side caching library for .NET. I also showed how to extend it by implementing your own custom logic for more advanced use-cases, such as in-memory encryption of secrets.

To get started managing secrets, open the Secrets Manager console. To learn more, read How to Store, Distribute, and Rotate Credentials Securely with Secrets Manager or refer to the Secrets Manager documentation. See the AWS Region Table for the list of AWS Regions where Secrets Manager is available.

If you have feedback about this blog post, submit comments in the Comments section below. If you have questions about this blog post, start a new thread in the Secrets Manager forum.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Sepehr Samiei

Sepehr is currently a Senior Solutions Architect at AWS. He started his professional career as a .Net developer, which continued for more than 10 years. Early on, he quickly became a fan of cloud computing and loves to help customers utilise the power of Microsoft tech on AWS. His wife and daughter are the most precious parts of his life, and he and his wife expect to have a son soon!

Simplify DNS management in a multi-account environment with Route 53 Resolver

Post Syndicated from Mahmoud Matouk original https://aws.amazon.com/blogs/security/simplify-dns-management-in-a-multiaccount-environment-with-route-53-resolver/

In a previous post, I showed you a solution to implement central DNS in a multi-account environment that simplified DNS management by reducing the number of servers and forwarders you needed when implementing cross-account and AWS-to-on-premises domain resolution. With the release of the Amazon Route 53 Resolver service, you now have access to a native conditional forwarder that will simplify hybrid DNS resolution even more.

In this post, I’ll show you a modernized solution to centralize DNS management in a multi-account environment by using Route 53 Resolver. This solution allows you to resolve domains across multiple accounts and between workloads running on AWS and on-premises without the need to run a domain controller in AWS.

Solution overview

My solution will show you how to solve three primary use-cases for domain resolution:

  • Resolving on-premises domains from workloads running in your VPCs.
  • Resolving private domains in your AWS environment from workloads running on-premises.
  • Resolving private domains between workloads running in different AWS accounts.

The following diagram shows the high-level architecture.
 

Figure 1: Solution architecture diagram


In this architecture:

  1. This is the Amazon-provided default DNS server for the central DNS VPC, which we’ll refer to as the DNS-VPC. This is the second IP address in the VPC CIDR range (as illustrated, this is 172.27.0.2). This default DNS server will be the primary domain resolver for all workloads running in participating AWS accounts.
  2. This shows the Route 53 Resolver endpoints. The inbound endpoint will receive queries forwarded from on-premises DNS servers and from workloads running in participating AWS accounts. The outbound endpoint will be used to forward domain queries from AWS to on-premises DNS.
  3. This shows conditional forwarding rules. For this architecture, we need two rules, one to forward domain queries for onprem.private zone to the on-premises DNS server through the outbound gateway, and a second rule to forward domain queries for awscloud.private to the resolver inbound endpoint in DNS-VPC.
  4. This indicates that these two forwarding rules are shared with all other AWS accounts through AWS Resource Access Manager and are associated with all VPCs in these accounts.
  5. This shows the private hosted zone created in each account with a unique subdomain of awscloud.private.
  6. This shows the on-premises DNS server with conditional forwarders configured to forward queries to the awscloud.private zone to the IP addresses of the Resolver inbound endpoint.

Note: This solution doesn’t require VPC-peering or connectivity between the source/destination VPCs and the DNS-VPC.

How it works

Now, I’m going to show how the domain resolution flow of this architecture works according to the three use-cases I’m focusing on.

First use case

 

Figure 2: Use case for resolving on-premises domains from workloads running in AWS

First, I’ll look at resolving on-premises domains from workloads running in AWS. If the server with private domain host1.acc1.awscloud.private attempts to resolve the address host1.onprem.private, here’s what happens:

  1. The DNS query will route to the default DNS server of the VPC that hosts host1.acc1.awscloud.private.
  2. Because the VPC is associated with the forwarding rules shared from the central DNS account, these rules will be evaluated by the default Amazon-provided DNS in the VPC.
  3. In this example, one of the rules indicates that queries for onprem.private should be forwarded to an on-premises DNS server. Following this rule, the query will be forwarded to an on-premises DNS server.
  4. The forwarding rule is associated with the Resolver outbound endpoint, so the query will be forwarded through this endpoint to an on-premises DNS server.

In this flow, the DNS query that was initiated in one of the participating accounts has been forwarded to the centralized DNS server which, in turn, forwarded this to the on-premises DNS.

Second use case

Next, here’s how on-premises workloads will be able to resolve private domains in your AWS environment:
 

Figure 3: Use case for how on-premises workloads will be able to resolve private domains in your AWS environment


In this case, the query for host1.acc1.awscloud.private is initiated from an on-premises host. Here’s what happens next:

  1. The domain query is forwarded to on-premises DNS server.
  2. The query is then forwarded to the Resolver inbound endpoint via a conditional forwarder rule on the on-premises DNS server.
  3. The query reaches the default DNS server for DNS-VPC.
  4. Because DNS-VPC is associated with the private hosted zone acc1.awscloud.private, the default DNS server will be able to resolve this domain.

In this case, the DNS query has been initiated on-premises and forwarded to centralized DNS on the AWS side through the inbound endpoint.

Third use case

Finally, you might need to resolve domains across multiple AWS accounts. Here’s how you could achieve this:
 

Figure 4: Use case for how to resolve domains across multiple AWS accounts


Let’s say that host1 in host1.acc1.awscloud.private attempts to resolve the domain host2.acc2.awscloud.private. Here’s what happens:

  1. The domain query is sent to the default DNS server for the VPC hosting source machine (host1).
  2. Because the VPC is associated with the shared forwarding rules, these rules will be evaluated.
  3. A rule indicates that queries for the awscloud.private zone should be forwarded to the IP addresses of the Resolver inbound endpoint in DNS-VPC, which will then use the Amazon-provided default DNS to resolve the query.
  4. Because DNS-VPC is associated with the acc2.awscloud.private hosted zone, the default DNS will use auto-defined rules to resolve this domain.

This use case explains the AWS-to-AWS case where the DNS query has been initiated on one participating account and forwarded to central DNS for resolution of domains in another AWS account. Now, I’ll look at what it takes to build this solution in your environment.

How to deploy the solution

I’ll show you how to configure this solution in four steps:

  1. Set up a centralized DNS account.
  2. Set up each participating account.
  3. Create private hosted zones and Route 53 associations.
  4. Configure on-premises DNS forwarders.

Step 1: Set up a centralized DNS account

In this step, you’ll set up resources in the centralized DNS account. Primarily, this includes the DNS-VPC, Resolver endpoints, and forwarding rules.

  1. Create a VPC to act as DNS-VPC according to your business scenario, either using the web console or from an AWS Quick Start. You can review common scenarios in the Amazon VPC user guide; one very common scenario is a VPC with public and private subnets.
  2. Create resolver endpoints. You need to create an outbound endpoint to forward DNS queries to on-premises DNS and an inbound endpoint to receive DNS queries forwarded from on-premises workloads and other AWS accounts.
  3. Create two forwarding rules. The first rule forwards DNS queries for the zone onprem.private to your on-premises DNS server IP addresses, and the second rule forwards DNS queries for the zone awscloud.private to the IP addresses of the Resolver inbound endpoint. (Steps 3 through 5 are sketched with the AWS CLI after the note below.)
  4. After creating the rules, associate them with DNS-VPC that was created in step #1. This will allow the Route 53 Resolver to start forwarding domain queries accordingly.
  5. Finally, you need to share the two forwarding rules with all participating accounts. To do that, you’ll use AWS Resource Access Manager and you can share the rules with your entire AWS Organization or with specific accounts.

Note: To be able to forward domain queries to your on-premises DNS server, you need connectivity between your data center and DNS-VPC, which could be established either using site-to-site VPN or AWS Direct Connect.
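
Here’s a minimal AWS CLI sketch of steps 3 through 5. The endpoint IDs, target IP addresses, rule IDs and ARNs, VPC ID, and account ID are placeholders for this sketch and must be replaced with your own values:

# Step 3: create the two forwarding rules.
aws route53resolver create-resolver-rule \
    --creator-request-id onprem-rule-1 \
    --name forward-onprem-private \
    --rule-type FORWARD \
    --domain-name onprem.private \
    --resolver-endpoint-id <outbound-endpoint-id> \
    --target-ips Ip=<onprem-dns-ip-1>,Port=53 Ip=<onprem-dns-ip-2>,Port=53

aws route53resolver create-resolver-rule \
    --creator-request-id awscloud-rule-1 \
    --name forward-awscloud-private \
    --rule-type FORWARD \
    --domain-name awscloud.private \
    --resolver-endpoint-id <outbound-endpoint-id> \
    --target-ips Ip=<inbound-endpoint-ip-1>,Port=53 Ip=<inbound-endpoint-ip-2>,Port=53

# Step 4: associate the rules with DNS-VPC.
aws route53resolver associate-resolver-rule \
    --resolver-rule-id <resolver-rule-id> \
    --vpc-id <dns-vpc-id>

# Step 5: share the rules with participating accounts (or your organization) through AWS Resource Access Manager.
aws ram create-resource-share \
    --name central-dns-forwarding-rules \
    --resource-arns <resolver-rule-arn-1> <resolver-rule-arn-2> \
    --principals <participating-account-id>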

Step 2: Set up participating accounts

For each participating account, you need to configure your VPCs to use the shared forwarding rules, and you need to create a private hosted zone for each account.

  • Accept the shared rules from AWS Resource Access Manager. (This step is not required if the rules were shared with your AWS Organization.) Then, associate the forwarding rules with the VPCs that host your workloads in each account, as shown in the sketch below. Once associated, the Resolver will start forwarding DNS queries according to the rules.
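
The following sketch shows how a participating account could accept the share and associate a shared rule with a workload VPC; the invitation ARN, rule ID, and VPC ID are placeholders:

# Find and accept the pending resource share invitation (skip this if the rules were shared with your AWS Organization).
aws ram get-resource-share-invitations
aws ram accept-resource-share-invitation --resource-share-invitation-arn <invitation-arn>

# Associate each shared forwarding rule with the workload VPCs in this account.
aws route53resolver associate-resolver-rule \
    --resolver-rule-id <shared-resolver-rule-id> \
    --vpc-id <workload-vpc-id>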

At this point, you should be able to resolve on-premises domains from workloads running in any VPC associated with the shared forwarding rules. To create private domains in AWS, you need to create Private Hosted Zones.

Step 3: Create private hosted zones

In this step, you need to create a private hosted zone in each account with a subdomain of awscloud.private. Use unique names for each private hosted zone to avoid domain conflicts in your environment (for example, acc1.awscloud.private or dev.awscloud.private).

  1. Create a private hosted zone in each participating account with a subdomain of awscloud.private and associate it with the VPCs running in that account (a CLI sketch follows this list).
  2. Associate the private hosted zone with DNS-VPC. This allows the centralized DNS-VPC to resolve domains in the private hosted zone and act as a DNS resolver between AWS accounts.
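
Here’s a minimal CLI sketch of step 1 for one account; the caller reference is an arbitrary unique string, and the Region and VPC ID are placeholders:

aws route53 create-hosted-zone \
    --name acc1.awscloud.private \
    --caller-reference acc1-awscloud-private-001 \
    --vpc VPCRegion=<region>,VPCId=<workload-vpc-id>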

Because the private hosted zone and DNS-VPC are in different accounts, you need to associate the private hosted zone with DNS-VPC. To do that, you need to create authorization from the account that owns the private hosted zone and accept this authorization from the account that owns DNS-VPC. You can do that using AWS CLI:

  1. In each participating account, create the authorization using the private hosted zone ID, the region, and the VPC ID that you want to associate (DNS-VPC).
    
        aws route53 create-vpc-association-authorization --hosted-zone-id <hosted-zone-id> --vpc VPCRegion=<region>,VPCId=<vpc-id>
    

  2. In the centralized DNS account, associate the DNS-VPC with the hosted zone in each participating account.
    
        aws route53 associate-vpc-with-hosted-zone --hosted-zone-id <hosted-zone-id> --vpc VPCRegion=<region>,VPCId=<vpc-id>    
    

Step 4: Configure on-premises DNS forwarders

To be able to resolve subdomains within the awscloud.private domain from workloads running on-premises, you need to configure conditional forwarding rules that forward domain queries to the two IP addresses of the Resolver inbound endpoint created in the central DNS account. Note that this requires connectivity between your data center and DNS-VPC, which could be established using either site-to-site VPN or AWS Direct Connect.

Additional considerations and limitations

Thanks to the flexibility of Route 53 Resolver and conditional forwarding rules, you can control which queries to send to central DNS and which ones to resolve locally in the same account. This is particularly important when you plan to use some AWS services, such as AWS PrivateLink or Amazon Elastic File System (EFS), because domain names associated with these services need to be resolved locally in the account that owns them. In this section, I describe two use cases that require additional consideration.

  1. Interface VPC Endpoints (AWS PrivateLink)

    When you create an AWS PrivateLink interface endpoint, AWS generates endpoint-specific DNS hostnames that you can use to communicate with the service. For AWS services and AWS Marketplace partner services, you can optionally enable private DNS for the endpoint. This option associates a private hosted zone with your VPC. The hosted zone contains a record set for the default DNS name for the service (for example, ec2.us-east-1.amazonaws.com) that resolves to the private IP addresses of the endpoint network interfaces in your VPC. This enables you to make requests to the service using its default DNS hostname instead of the endpoint-specific DNS hostnames.

    If you use private DNS for your endpoint, you have to resolve DNS queries to the endpoint local to the account and use the default DNS provided by AWS. So, in this case, I recommend that you resolve domain queries in amazonaws.com locally and not forward these queries to central DNS.

  2. Mounting EFS with a DNS name

    You can mount an Amazon EFS file system on an Amazon EC2 instance using DNS names. The file system DNS name automatically resolves to the mount target’s IP address in the Availability Zone of the connecting Amazon EC2 instance. To be able to do that, the VPC must use the default DNS provided by Amazon to resolve EFS DNS names.

    If you plan to use EFS in your environment, I recommend that you resolve EFS DNS names locally and avoid sending these queries to central DNS because clients in that case would not receive answers optimized for their availability zone, which might result in higher operation latencies and less durability.

Summary

In this post, I introduced a simplified solution to implement central DNS resolution in a multi-account and hybrid environment. This solution uses Amazon Route 53 Resolver, AWS Resource Access Manager, and native Route 53 capabilities, and it reduces complexity and operational effort by removing the need for custom DNS servers or forwarders in your AWS environment.

If you have feedback about this blog post, submit comments in the Comments section below. If you have questions about this blog post, start a new thread in the AWS forums.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Mahmoud Matouk

Mahmoud is part of our world-wide public sector Solutions Architects, helping higher education customers build innovative, secured, and highly available solutions using various AWS services.

How to decrypt ciphertexts in multiple regions with the AWS Encryption SDK in C

Post Syndicated from Liz Roth original https://aws.amazon.com/blogs/security/how-to-decrypt-ciphertexts-multiple-regions-aws-encryption-sdk-in-c/

You’ve told us that you want to encrypt data once with AWS Key Management Service (AWS KMS) and decrypt that data with customer master keys (CMKs) that you specify, often with CMKs in different AWS Regions. Doing this saves you compute resources and helps you to enable secure and efficient high-availability schemes.

The AWS Crypto Tools team has introduced the AWS Encryption SDK for C so you can achieve these goals. The new tool also adds more options for language and platform support and is fully interoperable with the implementations in Java and Python.

The AWS Encryption SDK is a client-side encryption library that helps make it easier for you to implement encryption best practices in your applications. You can use it with master keys from multiple sources, including AWS KMS CMKs. The AWS Encryption SDK doesn’t require AWS KMS or any other AWS service.

You can use AWS KMS APIs directly to encrypt data keys using multiple CMKs, but the AWS Encryption SDK provides tools to make working with multiple CMKs even easier, with everything you need stored in the Encryption SDK’s portable encrypted message format. The AWS Encryption SDK for C uses the concept of keyrings, which makes it easy to work with ciphertexts encrypted using multiple CMKs.

In this post, I will walk you through an example using the new AWS Encryption SDK for C. I’ll focus on some highlights from example code in the context of what an example application deployment might look like. You can find the complete example code in this GitHub repository. As always, we welcome your comments and your contributions.

Example scenario

To add some context around the example code, assume that you have a data processing application deployed both in US West (Oregon) us-west-2 and EU Central (Frankfurt) eu-central-1. For added durability, this example application creates and encrypts data in us-west-2 before it’s copied to the eu-central-1 Region. You have assurance that you could decrypt that data in us-west-2 if needed, but you want to mitigate the case where the decryption service in us-west-2 is unavailable. So how do you ensure you can decrypt your data in the eu-central-1 region when you need to?

In this example, your data processing application uses the AWS Encryption SDK and AWS KMS to generate a 256-bit data key to encrypt content locally in us-west-2. The AWS Encryption SDK for C deletes the plaintext data key after use, but an encrypted copy of that data key is included in the encrypted message that the AWS Encryption SDK returns. This prevents you from losing the encrypted copy of the data key, which would make your encrypted content unrecoverable. The data key is encrypted under the AWS KMS CMKs in each of the two regions in which you might want to decrypt the data in the future.

A best practice is to plan to decrypt data using in-region data keys and CMKs. This reduces latency and simplifies the permissions and auditing properties of the decryption operation. The latency impact from the cross-region API calls occurs only during the encryption operation.

In this scenario, the AWS KMS CMK key policy permissions look like this:

  • To encrypt data, the AWS identity used by the data processing application in us-west-2 needs kms:GenerateDataKey permission on the us-west-2 CMK and kms:Encrypt permission on the eu-central-1 CMK. You can specify these permissions in a key policy or IAM policy (see the example policy sketch after this list). This will let the application create a data key in us-west-2 and encrypt the data key under CMKs in both AWS Regions.
  • To decrypt data, the AWS identity used by the data processing application in us-west-2 needs kms:Decrypt permissions on the CMK in us-west-2 or the CMK in eu-central-1.
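
As an illustration only, here’s a sketch of an IAM policy that grants these permissions to the application’s role, using the example key ARNs defined in the next section. The role name, policy name, and file name are hypothetical placeholders:

# Hypothetical role and policy names; the key ARNs are the example ARNs used in this post.
cat > encryption-app-kms-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "kms:GenerateDataKey",
      "Resource": "arn:aws:kms:us-west-2:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab"
    },
    {
      "Effect": "Allow",
      "Action": "kms:Encrypt",
      "Resource": "arn:aws:kms:eu-central-1:111122223333:key/0987dcba-09fe-87dc-65ba-ab0987654321"
    },
    {
      "Effect": "Allow",
      "Action": "kms:Decrypt",
      "Resource": [
        "arn:aws:kms:us-west-2:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab",
        "arn:aws:kms:eu-central-1:111122223333:key/0987dcba-09fe-87dc-65ba-ab0987654321"
      ]
    }
  ]
}
EOF

aws iam put-role-policy \
    --role-name data-processing-app \
    --policy-name encryption-sdk-kms-access \
    --policy-document file://encryption-app-kms-policy.json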

Encryption path

First, define variables for the Amazon Resource Names (ARNs) of your CMKs in us-west-2 and eu-central-1. In the Encryption SDK for C, to encrypt, you can identify a CMK by its CMK ARN or the Alias ARN that is mapped to the CMK ARN.


const char *KEY_ARN_US_WEST_2 = "arn:aws:kms:us-west-2:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab";

const char *KEY_ARN_EU_CENTRAL_1 = "arn:aws:kms:eu-central-1:111122223333:key/0987dcba-09fe-87dc-65ba-ab0987654321";      
     

Now, use the CMK ARNs to create a keyring. In the Encryption SDK, a keyring is used to generate, encrypt, and decrypt data keys under multiple master keys. You’ll create a KMS keyring configured to use multiple CMKs.


struct aws_cryptosdk_keyring *kms_keyring = Aws::Cryptosdk::KmsKeyring::Builder().Build(KEY_ARN_US_WEST_2, { KEY_ARN_EU_CENTRAL_1 });

When the AWS Encryption SDK uses this keyring to encrypt data, it calls GenerateDataKey on the first CMK that you specify, and Encrypt on each of the remaining CMKs that you specify. The result is a plaintext data key generated in us-west-2, an encryption of the data key using the CMK in us-west-2, and an encryption of the data key using the CMK in eu-central-1.

The plaintext data key that AWS KMS generated in us-west-2 is protected under a TLS session using only cipher suites that support forward-secrecy. The process of sending that same plaintext data key to the AWS KMS endpoint in eu-central-1 for encryption is also protected under a similar TLS session.

The Encryption SDK uses the data key to encrypt your data, and it stores the encrypted data keys with your encrypted content. The result is an encrypted message that can be decrypted using the CMK in us-west-2 or the CMK in eu-central-1.

Now that you understand what’s going to happen after you create the keyring, I’ll return to the code sample. Next, you need to create an encrypt-mode session with your keyring. In the AWS Encryption SDK for C, you use a session to encrypt a single plaintext message or decrypt a single ciphertext message, regardless of its size. The session maintains the state of the message throughout its processing.


struct aws_cryptosdk_session *session = aws_cryptosdk_session_new_from_keyring(alloc, AWS_CRYPTOSDK_ENCRYPT, kms_keyring);

With the keyring and encrypt-mode session, the data processing application can ask the Encryption SDK to encrypt the data under the CMKs that you specified in two different AWS regions:


aws_cryptosdk_session_process(
    session,
    out_ciphertext,
    out_ciphertext_buf_sz,
    out_ciphertext_len,
    in_plaintext,
    in_plaintext_len,
    &in_plaintext_consumed);

The result is an encrypted message that contains the ciphertext and two encrypted copies of the same data key. One encrypted data key was encrypted by your CMK in us-west-2, and the other encrypted data key was encrypted by your CMK in eu-central-1.

Decryption path

In the AWS Encryption SDK for C, you use keyrings for both encrypting and decrypting. You can use the same keyring for both, or you can use different keyrings for each operation.

Why would you want to use a different keyring for decryption? At a high level, encrypt keyrings specify all CMKs that can decrypt the ciphertext. Decrypt keyrings constrain the CMKs the application is permitted to use.

Reusing a keyring for both encrypt and decrypt mode can simplify your AWS Encryption SDK client configuration, but splitting the keyring and using different AWS KMS clients provides more flexibility to meet your security and architecture goals. The option you choose depends in part on the constraints you want to place on the CMKs your application uses.

The Decrypt API in the AWS KMS service doesn’t permit you to specify a CMK as a request parameter. But the AWS Encryption SDK lets you specify one or many CMKs in a decryption keyring, or even discover which CMKs to try automatically. I’ll discuss each option in the next section.

Decryption path 1: Use a specific CMK

This keyring option configures the AWS Encryption SDK to use only a specified CMK in the specified AWS Region. This implies that your data processing application will need kms:Decrypt permissions on that specific CMK and your application will always call the same AWS KMS endpoints in the specified AWS Region. CloudTrail events from the Decrypt API will also only appear in the specified AWS Region.

You might use a specific CMK when the user or application that is decrypting the data has kms:Decrypt permission on only one of the CMKs that encrypted the data keys.

The CMK that you specify to decrypt the data must be one of the CMKs that was used to encrypt the data. Make sure that at least one of the CMKs from your encrypt keyring is included in the decrypt keyring and that the caller has kms:Decrypt permission to use it.

In my example, I encrypted the data keys using CMKs in us-west-2 and eu-central-1, so I’ll start decrypting in eu-central-1 because I want to have a specific decrypt instantiation of the data processing application dedicated to eu-central-1. Assume the eu-central-1 data processing application has configured AWS IAM credentials for a principal with permission to call the Decrypt operation on the eu-central-1 CMK.

Configure a keyring that asks the AWS Encryption SDK to use the CMK in eu-central-1 to decrypt:

Aws::Cryptosdk::KmsKeyring::Builder().Build(KEY_ARN_EU_CENTRAL_1)

The Encryption SDK reads the encrypted message, finds the encrypted data key that was encrypted using the CMK in eu-central-1, and uses this keyring to decrypt.

Decryption path 2: Use any of several CMKs

This keyring option configures the AWS Encryption SDK to try several specific CMKs during its decryption attempts, stopping as soon as it succeeds. You should configure the AWS IAM credentials used by your data processing application to have kms:Decrypt permissions on each of the specified regional CMKs.

Your application could end up calling multiple regional AWS KMS endpoints. CloudTrail events from the Decrypt API will appear in the AWS Region in which the decrypt operation succeeds, and in any of the other AWS Regions that the keyring attempts to use. The CMK that you specify to decrypt the data must be one of the CMKs that was used to encrypt the data. Make sure that at least one of the CMKs from your encrypt keyring is included in the decrypt keyring and that the application has kms:Decrypt permission to use it.

You might define an encryption keyring that includes multiple CMKs so that users with different permissions can decrypt the same message. For example, you might include in your encryption keyring keys in multiple AWS regions.

Here’s an example keyring constructed with multiple CMKs:

Aws::Cryptosdk::KmsKeyring::Builder().Build(KEY_ARN_EU_CENTRAL_1, { KEY_ARN_US_WEST_2 })

The AWS Encryption SDK reads each of the encrypted data keys stored in the encrypted message in the order that they appear. For each data key, the Encryption SDK searches the keyring for the matching CMK that encrypted it. If it finds that CMK, the AWS Encryption SDK calls AWS KMS in the AWS Region where the CMK exists to decrypt that data key, then uses that decrypted key to decrypt the message. If the decryption operation fails for any reason, the AWS Encryption SDK moves on to the next encrypted data key in the message and tries again.

The AWS Encryption SDK will try to decrypt the encrypted message in this way until either decryption succeeds, or the AWS Encryption SDK has attempted and failed to decrypt any of the encrypted data keys using the CMKs specified in the keyring.

If this keyring configuration looks familiar, it’s because it’s similar to the configuration you used on the encrypt path when you encrypted under multiple CMKs. The difference is this:

  • Encryption: The AWS Encryption SDK uses every CMK in the keyring to encrypt the data key, and adds all of the encrypted data keys to the encrypted message.
  • Decryption: The AWS Encryption SDK attempts to decrypt one of the encrypted data keys using only the CMKs in the keyring. It stops as soon as it succeeds.

Decryption path 3: Discover the CMKs automatically with a Discovery keyring

The previous decryption paths required you to keep track of the exact CMKs used during the encryption operation, which may suit your needs for security and event logging. But what if you want more flexibility? What if you want to change the CMKs that you use in encryption operations without updating the data processing application that decrypts your data? You can configure a keyring that doesn’t specify CMKs to use for decryption, but instead tries each CMK that encrypted a data key until decryption succeeds or all referenced CMKs fail. We call this configuration a KMS Discovery keyring.

A Discovery keyring is equivalent to a keyring that includes all of the same CMKs that were used to encrypt the data, but it’s simpler and less error-prone. You might use a KMS Discovery keyring if you have no preference among the CMKs that encrypted a data key, and don’t mind the latency tradeoffs of trying CMKs in remote AWS Regions, or trying CMKs that will fail a permissions check while searching for one that succeeds. You can think of the KMS Discovery keyring as a universal keyring that you can use and reuse in your applications in many AWS Regions.

When you use a KMS Discovery keyring, the AWS Encryption SDK reads each encrypted data key and discovers the ARN of the CMK used to encrypt it. The AWS Encryption SDK then uses the configured IAM credentials to call AWS KMS in that CMK’s AWS Region to decrypt the data key. The AWS Encryption SDK repeats that process until it has decrypted the data key or runs out of encrypted data keys to try.


Aws::Cryptosdk::KmsKeyring::Builder().BuildDiscovery();

While KMS Discovery keyrings are simpler, you run the risk of having your data processing application make a cross-region call to an AWS KMS endpoint that adds unwanted latency. In my example, you might not want the decrypting application running in us-west-2 to wait for the AWS Encryption SDK to call AWS KMS in eu-central-1. To use only the CMKs in a particular AWS Region to decrypt the data keys, create a KMS Regional Discovery keyring that specifies the AWS Region, but not the CMK ARNs. In my example, the following keyring allows the AWS Encryption SDK to use only CMKs in us-west-2.


Aws::Cryptosdk::KmsKeyring::Builder()
        .WithKmsClient(create_kms_client(Aws::Region::US_WEST_2)).BuildDiscovery();

Because this example KMS Regional Discovery keyring specifies a client for the us-west-2 AWS Region, not a CMK ARN, the AWS Encryption SDK will only try to decrypt any encrypted data key it finds that was encrypted using any CMK in us-west-2. If, for some reason, none of the encrypted data keys was encrypted using a CMK in us-west-2, or the application decrypting the data doesn’t have permission to use CMKs in us-west-2, the AWS Encryption SDK call to decrypt the message with this keyring fails and fails fast. This may provide you with more options for deterministic error handling.

Keep in mind that the KMS Regional Discovery keyring allows the AWS Encryption SDK to try the CMK for each encrypted data key in the specified AWS Region. However, AWS KMS never uses a CMK until it verifies that the caller has permission to perform the requested operation. If the application doesn’t have kms:Decrypt permission for any of the CMKs that were used to encrypt the data keys, decryption fails.

Summary

Encrypting KMS data keys using multiple CMKs provides a variety of options to decrypt ciphertexts to meet your security, auditing, and latency requirements. My examples show how encrypted messages can be decrypted by using AWS KMS CMKs in multiple AWS Regions. You can also use the Encryption SDK with master keys supplied by a custom key management infrastructure independent of AWS.

The AWS Encryption SDK’s portable and interoperable encrypted message format makes it easier to combine multiple encrypted data keys with your encrypted data to support the decryption access scheme you want. The AWS Encryption SDK for C brings these utilities to a new, broader set of platform and application environments to complement the existing Java and Python versions.

You can find the AWS Encryption SDK for C on GitHub.

If you have feedback about this blog post, submit comments in the Comments section below. If you have questions about this blog post, start a new thread on the AWS Crypto Tools forum or contact AWS Support.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Liz Roth

Liz is a Senior Software Development Engineer at Amazon Web Services. She has been at Amazon for more than 8 years and has more than 10 years of industry experience across a variety of areas, including security, networks, and operations.

Create fine-grained session permissions using IAM managed policies

Post Syndicated from Sulay Shah original https://aws.amazon.com/blogs/security/create-fine-grained-session-permissions-using-iam-managed-policies/

As a security best practice, AWS Identity and Access Management (IAM) recommends that you use temporary security credentials from AWS Security Token Service (STS) when you access your AWS resources. Temporary credentials are short-term credentials generated dynamically and provided to the user upon request. Today, one of the most widely used mechanisms for requesting temporary credentials in AWS is an IAM role. The advantage of using an IAM role is that multiple users in your organization can assume the same IAM role. By default, all users assuming the same role get the same permissions for their role session.

To create distinctive role session permissions or to further restrict session permissions, users or systems can set a session policy when assuming a role. A session policy is an inline permissions policy which users pass in the session when they assume the role. You can pass the policy yourself, or you can configure your broker to insert the policy when your identities federate into AWS (if you have an identity broker configured in your environment). This allows your administrators to reduce the number of roles they need to create, since multiple users can assume the same role yet have unique session permissions. If users don’t require all the permissions associated with the role to perform a specific action in a given session, your administrator can configure the identity broker to pass a session policy to reduce the scope of session permissions when users assume the role. This helps administrators set permissions for users to perform only those specific actions for that session.

With today’s launch, AWS now enables you to specify multiple IAM managed policies as session policies when users assume a role. This means you can use multiple IAM managed policies to create fine-grained session permissions for your user’s sessions. Additionally, you can centrally manage session permissions using IAM managed policies.

In this post, I review session policies and their current capabilities, introduce the concept of using IAM managed policies as session policies to control session permissions, and show you how to use managed policies to create fine-grained session permissions in AWS.

How do session policies work?

Before I walk through an example, I’ll review session policies.

A session policy is an inline policy that you can create on the fly and pass in the session during role assumption to further scope the permissions of the role session. The effective permissions of the session are the intersection of the role’s identity-based policies and the session policy. The maximum permissions that a session can have are the permissions that are allowed by the role’s identity-based policies. You can pass a single inline session policy programmatically by using the policy parameter with the AssumeRole, AssumeRoleWithSAML, AssumeRoleWithWebIdentity, and GetFederationToken API operations.

Next, I’ll provide an example with an inline session policy to demonstrate how you can restrict session permissions.

Example: Passing a session policy with AssumeRole API to restrict session permissions

Consider a scenario where security administrator John has administrative privileges when he assumes the role SecurityAdminAccess in the organization’s AWS account. When John assumes this role, he knows the specific actions he’ll perform using this role. John is cautious of the role permissions and follows the practice of restricting his own permissions by using a session policy when assuming the role. This way, John ensures that at any given point in time, his role session can only perform the specific action for which he assumed the SecurityAdminAccess role.

In my example, John only needs permissions to access an Amazon Simple Storage Service (S3) bucket called NewHireOrientation in the same account. He passes a session policy using the policy.json file below to reduce his session permissions when assuming the role SecurityAdminAccess.


{
"Version":"2012-10-17",
"Statement":[{
    "Sid":"Statement1",
    "Effect":"Allow",
    "Action":["s3:ListBucket", "s3:GetObject"],
    "Resource": ["arn:aws:s3:::NewHireOrientation", "arn:aws:s3:::NewHireOrientation/*"]
    }]
}

In this example, the action and resources elements of the policy statement allow access only to the NewHireOrientation bucket and all the objects inside this bucket.

Using the AWS Command Line Interface (AWS CLI), John can pass the session policy’s file path (that is, file://policy.json) while calling the AssumeRole API with the following commands:


aws sts assume-role \
    --role-arn "arn:aws:iam::111122223333:role/SecurityAdminAccess" \
    --role-session-name "s3-session" \
    --policy file://policy.json

When John assumes the SecurityAdminAccess role using the above command, his effective session permissions are the intersection of the permissions on the role and the session policy. This means that although the SecurityAdminAccess role has administrative privileges, John’s resulting session permissions are s3:ListBucket and s3:GetObject on the NewHireOrientation bucket. This way, John can ensure he only has access to the NewHireOrientation bucket for this session, as the check below illustrates.
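
As an illustrative check (not part of the original walkthrough), you could export the temporary credentials returned by assume-role and confirm that only the S3 actions in the session policy succeed. The use of jq below is an assumption for this sketch:

# Capture the temporary credentials from the assume-role call.
CREDS=$(aws sts assume-role \
    --role-arn "arn:aws:iam::111122223333:role/SecurityAdminAccess" \
    --role-session-name "s3-session" \
    --policy file://policy.json)

export AWS_ACCESS_KEY_ID=$(echo "$CREDS" | jq -r '.Credentials.AccessKeyId')
export AWS_SECRET_ACCESS_KEY=$(echo "$CREDS" | jq -r '.Credentials.SecretAccessKey')
export AWS_SESSION_TOKEN=$(echo "$CREDS" | jq -r '.Credentials.SessionToken')

# Allowed by the session policy:
aws s3 ls s3://NewHireOrientation

# Denied, even though the role itself has administrative privileges:
aws ec2 describe-instances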

Using IAM managed policies as session policies

You can now pass up to 10 IAM managed policies as session policies. This gives you the ability to further restrict session permissions. The managed policy you pass can be AWS managed or customer managed. To pass managed policies as session policies, you need to specify the Amazon Resource Name (ARN) of the IAM policies using the new policy-arns parameter in the AssumeRole, AssumeRoleWithSAML, AssumeRoleWithWebIdentity, or GetFederationToken API operations. You can use existing managed policies or create new policies in your account and pass them as session policies with any of the aforementioned APIs. The managed policies passed in the role session must be in the same account as that of the role. Additionally, you can pass an inline session policy and ARNs of managed policies in the same role session. To learn more about the sizing guidelines for session policies, please review the STS documentation.

Next, I’ll provide an example using IAM managed policies as session policies to help you understand how you can use multiple managed policies to create fine-grained session permissions.

Example: Passing IAM managed policies in a role session

Consider an example where Mary has a software development team in California (us-west-1) working on a project using Amazon Elastic Compute Cloud (EC2). This team needs permissions to spin up new EC2 instances to meet the project’s scalability requirements. Mary’s organization has a security policy that requires developers to create and manage AWS resources in their respective geographic locations only. This means a developer from California should have permissions to launch new EC2 instances only in California. Now, Mary’s organization has an identity and authentication system such as Active Directory, for which all employees already have identities created. Additionally, there is a custom identity broker application which verifies that employees are signed into the existing identity and authentication system. This broker application is configured to obtain temporary security credentials for the employees using the AssumeRole API. (To learn more about using an identity provider and identity broker with AWS, see AWS Federated Authentication with Active Directory Federation Services.)

Mary creates a managed policy called DevCalifornia and adds a region restriction for California using the aws:RequestedRegion condition key. Following the best practice of granting least privilege, Mary lists out the specific actions the developers would need for spinning up EC2 instances:


{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeAccountAttributes",
                "ec2:DescribeAvailabilityZones",
                "ec2:DescribeInternetGateways",
                "ec2:DescribeSecurityGroups",
                "ec2:DescribeSubnets",
                "ec2:DescribeVpcAttribute",
                "ec2:DescribeVpcs",
                "ec2:DescribeInstances",
                "ec2:DescribeImages",
                "ec2:DescribeKeyPairs",
                "ec2:RunInstances"                               
            ],
            "Resource": "*",
                    "Condition": {
                "StringEquals": {
                    "aws:RequestedRegion": "us-west-1"
                }
            }
        }
        
    ]
}    

The above policy grants specific permissions to launch EC2 instances. The condition element of the policy sets a restriction on the Region where these actions can be performed. The condition key aws:RequestedRegion ensures that these service-specific actions can only be performed in California.
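
Mary can create this managed policy from the AWS CLI before the broker references it; the policy document file name below is a placeholder for a file containing the policy shown above:

aws iam create-policy \
    --policy-name DevCalifornia \
    --policy-document file://devcalifornia-policy.json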

For Mary’s team’s use case, instead of creating a new role, Mary uses an existing role in her account called EC2Admin, which has the AmazonEC2FullAccess AWS managed policy attached to it, granting full access to Amazon EC2. Next, Mary configures the identity broker in such a way that the developers from the team in California can assume the EC2Admin role but with reduced session permissions. The broker passes the DevCalifornia managed policy as a session policy to reduce the scope of the session permissions when a developer from Mary’s team assumes the role. This way, Mary can ensure the team remains compliant with her organization’s security policy.

If performed using the AWS CLI, the command would look like this:

aws sts assume-role --role-arn "arn:aws:iam::444455556666:role/EC2Admin" --role-session-name "teamCalifornia-session" --policy-arns arn="arn:aws:iam::444455556666:policy/DevCalifornia"

If you want to pass multiple managed policies as session policies, then the command would look like this:

aws sts assume-role --role-arn "arn:aws:iam::<accountID>:role/<RoleName>" --role-session-name "<example-session>" --policy-arns arn="arn:aws:iam::<accountID>:policy/<PolicyName1>" arn="arn:aws:iam::<accountID>:policy/<PolicyName2>"

In the above example, PolicyName1 and PolicyName2 can be AWS managed or customer managed policies. You can also use them in conjunction, where PolicyName1 is an AWS managed policy and PolicyName2 a customer managed policy.

Conclusion

You can now use IAM managed policies as session policies in role sessions and federated sessions to create fine-grained session permissions. You can use this functionality today by creating IAM managed policies using your existing inline session policies and referencing their policy ARNs in your role sessions. You can also keep using your existing session policy and pass the ARNs of IAM managed policies using the new policy-arn parameter to further scope your session permissions.

If you have comments about this post, submit them in the Comments section below. If you have questions about or suggestions for this solution, start a new thread on the IAM forum.

Sulay Shah

Sulay is the product manager for the Identity and Access Management service at AWS. He strongly believes in the customer-first approach and is always looking for new opportunities to assist customers. Outside of work, Sulay enjoys playing soccer and watching movies. Sulay holds a master’s degree in computer science from North Carolina State University.

Improve availability and latency of applications by using AWS Secrets Manager’s Python client-side caching library

Post Syndicated from Paavan Mistry original https://aws.amazon.com/blogs/security/improve-availability-and-latency-of-applications-by-using-aws-secret-managers-python-client-side-caching-library/

Note from May 10, 2019: We’ve updated a code sample for accuracy.


Today, AWS Secrets Manager introduced a client-side caching library for Python that improves the availability and latency of accessing and distributing credentials to your applications. It can also help you reduce the cost associated with retrieving secrets. In this post, I’ll walk you through the following topics:

  • An overview of the Secrets Manager client-side caching library for Python
  • How to use the Python client-side caching library to retrieve a secret

Here are the key benefits of client-side caching libraries:

  • Improved availability: You can cache secrets to reduce the impact of network availability issues such as increased response times and temporary loss of network connectivity.
  • Improved latency: Retrieving secrets from the local cache is faster than retrieving secrets by sending API requests to Secrets Manager within a Virtual Private Cloud (VPC) or over the Internet.
  • Reduced cost: Retrieving secrets from the cache can reduce the number of API requests made to and billed by Secrets Manager.
  • Automatic refresh of secrets: The library updates the cache by calling Secrets Manager periodically, ensuring your applications use the most current secret value, so regularly rotated secrets are retrieved automatically.
  • Implementation in just two steps: Add the Python library dependency to your application, and then provide the identifier of the secret that you want the library to use.

Using the Secrets Manager client-side caching library for Python

First, I’ll walk you through an example in which I retrieve a secret without using the Python cache. Then I’ll show you how to update your code to use the Python client-side caching library.

Retrieving a secret without using a cache

Using the AWS SDK for Python (Boto3), you can retrieve a secret from Secrets Manager with the API call flow shown below.

Figure 1: Diagram showing GetSecretValue API call without the Python cache

To understand the benefits of using a cache, I’m going to create a sample secret using the AWS Command Line Interface (AWS CLI):


aws secretsmanager create-secret --name python-cache-test --secret-string "cache-test"

The code below demonstrates a GetSecretValue API call to AWS Secrets Manager without using the cache feature. Every time the application needs the secret, it calls the Secrets Manager GetSecretValue API, which increases secret retrieval latency. Additionally, there is a minor cost associated with each API call made to the AWS Secrets Manager API endpoint.


    import boto3
    import base64
    from botocore.exceptions import ClientError

    def get_secret():

        secret_name = "python-cache-test"
        region_name = "us-west-2"

        # Create a Secrets Manager client
        session = boto3.session.Session()
        client = session.client(
            service_name='secretsmanager',
            region_name=region_name
        )

        # In this sample we only handle the specific exceptions for the 'GetSecretValue' API.
        # See https://docs.aws.amazon.com/secretsmanager/latest/apireference/API_GetSecretValue.html
        # We rethrow the exception by default.

        try:
            get_secret_value_response = client.get_secret_value(
                SecretId=secret_name
            )
        except ClientError as e:
            if e.response['Error']['Code'] == 'DecryptionFailureException':
                # Secrets Manager can't decrypt the protected secret text using the provided KMS key.
                # Deal with the exception here, and/or rethrow at your discretion.
                raise e
            else:
                # Rethrow any other client error so it isn't silently swallowed.
                raise e
        else:
            # Decrypts secret using the associated KMS CMK.
            # Depending on whether the secret is a string or binary, one of these fields will be populated.
            if 'SecretString' in get_secret_value_response:
                secret = get_secret_value_response['SecretString']
                print(secret)
            else:
                decoded_binary_secret = base64.b64decode(get_secret_value_response['SecretBinary'])
                # Use decoded_binary_secret in your code.

        # Your code goes here.

    get_secret()

Using the Python client-side caching library to retrieve a secret

With the Python cache feature, you can now use the cache library to reduce calls to the AWS Secrets Manager API, improving the availability and latency of your application. As shown in the diagram below, when you implement the Python cache, the call to retrieve the secret is routed to the local cache before reaching the AWS Secrets Manager API. If the secret exists in the cache, the application retrieves the secret from the client-side cache. If the secret does not exist in the client-side cache, the request is routed to the AWS Secrets Manager endpoint to retrieve the secret.

Figure 2: Diagram showing GetSecretValue API call using Python client-side cache

In the example below, I'll implement a Python cache so that the secret is retrieved from the local cache, avoiding repeated calls to the AWS Secrets Manager API:


    import boto3
    import base64
    from aws_secretsmanager_caching import SecretCache, SecretCacheConfig
    from botocore.exceptions import ClientError

    def get_secret():

        secret_name = "python-cache-test"
        region_name = "us-west-2"

        # Create a Secrets Manager client
        session = boto3.session.Session()
        client = session.client(
            service_name='secretsmanager',
            region_name=region_name
        )

        try:
            # Create a cache backed by the Secrets Manager client
            cache = SecretCache(SecretCacheConfig(), client)

            # Get the secret string from the cache; only the first call
            # (and periodic refreshes) reach the Secrets Manager API
            get_secret_value_response = cache.get_secret_string(secret_name)

        except ClientError as e:
            if e.response['Error']['Code'] == 'DecryptionFailureException':
                # Deal with the exception here, and/or rethrow at your discretion.
                raise e
            else:
                raise e
        else:
            secret = get_secret_value_response
            print(secret)

        # Your code goes here.

    get_secret()

The cache allows advanced configuration using the SecretCacheConfig class. This class lets you define cache configuration parameters to help meet your application security, performance, and cost requirements. The SDK enforces the configured thresholds for maximum cache size, the default secret version stage to request, and the secret refresh interval between requests. It also allows configuration of various exception thresholds. Further detail is provided in the library documentation.
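As a reference point, here is a minimal sketch of passing a custom configuration to the cache. This is a hedged example: the parameter names (max_cache_size, secret_refresh_interval, default_version_stage) reflect the SecretCacheConfig options at the time of writing, so verify them against the version of the library you install.

    import boto3
    from aws_secretsmanager_caching import SecretCache, SecretCacheConfig

    client = boto3.client('secretsmanager', region_name='us-west-2')

    # Assumed SecretCacheConfig option names; confirm against your installed library version.
    config = SecretCacheConfig(
        max_cache_size=512,               # maximum number of secrets held in the cache
        secret_refresh_interval=900,      # seconds between refresh checks for each secret
        default_version_stage='AWSCURRENT'
    )

    cache = SecretCache(config, client)
    secret = cache.get_secret_string('python-cache-test')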

Based on the secret refresh interval defined in your cache configuration, the cache will check the version of the secret at the defined interval, using the DescribeSecret API to determine if a new version is available. If there is a newer version of the secret, the cache will update to the latest version from AWS Secrets Manager, using the GetSecretValue API. This ensures that an updated version of the secret is available in the cache.

Additionally, the Python client-side caching library allows developers to retrieve secrets from the cache directly by secret name, using decorator functions. An example of using a decorator function is shown below:


    import boto3
    from aws_secretsmanager_caching import SecretCache, SecretCacheConfig
    from aws_secretsmanager_caching.decorators import InjectKeywordedSecretString

    # The decorator needs an existing cache instance, and the referenced secret must be
    # a JSON string containing the keys 'secret_key1' and 'secret_key2'.
    client = boto3.client('secretsmanager', region_name='us-west-2')
    cache = SecretCache(SecretCacheConfig(), client)

    class TestClass:
        def __init__(self):
            pass

        @InjectKeywordedSecretString('python-cache-test', cache, arg1='secret_key1', arg2='secret_key2')
        def my_test_function(self, arg1, arg2):
            print("arg1: {}".format(arg1))
            print("arg2: {}".format(arg2))

    test = TestClass()
    test.my_test_function()

To delete the secret created in this post, run the command below:


aws secretsmanager delete-secret --secret-id python-cache-test --force-delete-without-recovery

Summary

In this post, we've shown how you can improve availability, reduce latency, and reduce API call cost for your secrets by using the Secrets Manager client-side caching library for Python. To get started managing secrets, open the Secrets Manager console. To learn more, read How to Store, Distribute, and Rotate Credentials Securely with Secret Manager or refer to the Secrets Manager documentation.

If you have comments about this post, submit them in the Comments section below. If you have questions about anything in this post, start a new thread on the Secrets Manager forum or contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Paavan Mistry

Paavan is a Security Specialist Solutions Architect at AWS where he enjoys solving customers’ cloud security, risk, and compliance challenges. Outside of work, he enjoys reading about leadership, politics, law, and human rights.

How to migrate your EC2 Oracle Transparent Data Encryption (TDE) database encryption wallet to CloudHSM

Post Syndicated from Tracy Pierce original https://aws.amazon.com/blogs/security/how-to-migrate-your-ec2-oracle-transparent-data-encryption-tde-database-encryption-wallet-to-cloudhsm/

In this post, I’ll show you how to migrate an encryption wallet for an Oracle database installed on Amazon EC2 from using an outside HSM to using AWS CloudHSM. Transparent Data Encryption (TDE) for Oracle is a common use case for Hardware Security Module (HSM) devices like AWS CloudHSM. Oracle TDE uses what is called “envelope encryption.” Envelope encryption is when the encryption key used to encrypt the tables of your database is in turn encrypted by a master key that resides either in a software keystore or on a hardware keystore, like an HSM. This master key is non-exportable by design to protect the confidentiality and integrity of your database encryption. This gives you a more granular encryption scheme on your data.

An encryption wallet is an encrypted container used to store the TDE master key for your database. The encryption wallet needs to be opened manually after a database startup and prior to the TDE encrypted data being accessed, so the master key is available for data decryption. The process I talk about in this post can be used with any non-AWS hardware or software encryption wallet, or a hardware encryption wallet that utilizes AWS CloudHSM Classic. For my examples in this tutorial, I will be migrating from a CloudHSM Classic to a CloudHSM cluster. It is worth noting that Gemalto has announced the end-of-life for Luna 5 HSMs, which our CloudHSM Classic fleet uses.

Note: You cannot migrate from an Oracle instance in Amazon Relational Database Service (Amazon RDS) to AWS CloudHSM. You must install the Oracle database on an Amazon EC2 instance. Amazon RDS is not currently integrated with AWS CloudHSM.

When you move from one type of encryption wallet to another, new TDE master keys are created inside the new wallet. To ensure that you have access to backups that rely on your old HSM, consider leaving the old HSM running for your normal recovery window period. The steps I discuss will perform the decryption of your TDE keys and then re-encrypt them with the new TDE master key for you.

Once you’ve migrated your Oracle databases to use AWS CloudHSM as your encryption wallet, it’s also a good idea to set up cross-region replication for disaster recovery efforts. With copies of your database and encryption wallet in another region, you can be back in production quickly should a disaster occur. I’ll show you how to take advantage of this by setting up cross-region snapshots of your Oracle database Amazon Elastic Block Store (EBS) volumes and copying backups of your CloudHSM cluster between regions.

Solution overview

For this solution, you will modify the Oracle database’s encryption wallet to use AWS CloudHSM. This is completed in three steps, which will be detailed below. First, you will switch from the current encryption wallet, which is your original HSM device, to a software wallet. This is done by reverse migrating to a local wallet. Second, you’ll replace the PKCS#11 provider of your original HSM with the CloudHSM PKCS#11 software library. Third, you’ll switch the encryption wallet for your database to your CloudHSM cluster. Once this process is complete, your database will automatically re-encrypt all data keys using the new master key.

To complete the disaster recovery (DR) preparation portion of this post, you will perform two more steps. These consist of copying over snapshots of your EC2 volumes and your CloudHSM cluster backups to your DR region. The following diagram illustrates the steps covered in this post.
 

Figure 1: Steps to migrate your EC2 Oracle TDE database encryption wallet to CloudHSM

  1. Switch the current encryption wallet for the Oracle database TDE from your original HSM to a software wallet via a reverse migration process.
  2. Replace the PKCS#11 provider of your original HSM with the AWS CloudHSM PKCS#11 software library.
  3. Switch your encryption wallet to point to your AWS CloudHSM cluster.
  4. (OPTIONAL) Set up cross-region copies of the EC2 instance housing your Oracle database.
  5. (OPTIONAL) Set up a cross-region copy of your recent CloudHSM cluster backup.

Prerequisites

This process assumes you have the below items already set up or configured:

Deploy the solution

Now that you have the basic steps, I’ll go into more detail on each of them. I’ll show you the steps to migrate your encryption wallet to a software wallet using a reverse migration command.

Step 1: Switching the current encryption wallet for the Oracle database TDE from your original HSM to a software wallet via a reverse migration process.

To begin, you must configure the sqlnet.ora file for the reverse migration. In Oracle databases, the sqlnet.ora file is a plain-text configuration file that contains parameters, such as encryption, connection routing, and naming settings, that determine how the Oracle server and clients use network database access capabilities. You will want to create a backup so you can roll back in the event of any errors. You can make a copy with the command below. Make sure to replace </path/to/> with the actual path to your sqlnet.ora file location. The standard location for this file is “$ORACLE_HOME/network/admin“, but check your setup to ensure this is correct.

cp </path/to/>sqlnet.ora </path/to/>sqlnet.ora.backup

The software wallet must be created before you edit this file, and it should preferably be empty. Then, using your favorite text editor, open the sqlnet.ora file and set the below configuration. If an entry already exists, replace it with the below text.


ENCRYPTION_WALLET_LOCATION=
  (SOURCE=(METHOD=FILE)(METHOD_DATA=
    (DIRECTORY=<path_to_keystore>)))

Make sure to replace the <path_to_keystore> with the directory location of your destination wallet. The destination wallet is the path you choose for the local software wallet. You will notice in Oracle the words “keystore” and “wallet” are interchangeable for this post. Next, you’ll configure the wallet for the reverse migration. For this, you will use the ADMINISTER KEY MANAGEMENT statement with the SET ENCRYPTION KEY and REVERSE MIGRATE clauses as shown in the example below.

By using the REVERSE MIGRATE USING clause in your statement, you ensure the existing TDE table keys and tablespace encryption keys are decrypted by the hardware wallet TDE master key and then re-encrypted with the software wallet TDE master key. You will need to log into the database instance as a user that has been granted the ADMINISTER KEY MANAGEMENT or SYSKM privileges to run this statement. An example of the login is below. Make sure to replace the <sec_admin> and <password> with your administrator user name and password for the database.


sqlplus c##<sec_admin> syskm
Enter password: <password> 
Connected.

Once you’re connected, you’ll run the SQL statement below. Make sure to replace <password> with your own existing wallet password and <username:password> with your own existing wallet user ID and password. We are going to run this statement with the WITH BACKUP parameter, as it’s always ideal to take a backup in case something goes wrong.

ADMINISTER KEY MANAGEMENT SET ENCRYPTION KEY IDENTIFIED BY <password> REVERSE MIGRATE USING "<username:password>" WITH BACKUP;

If successful, you will see the text keystore altered. When complete, you do not need to restart your database or manually re-open the local wallet as the migration process loads this into memory for you.

With the migration complete, you’ll now move onto the next step of replacing the PKCS#11 provider of your original HSM with the CloudHSM PKCS#11 software library. This library is a PKCS#11 standard implementation that communicates with the HSMs in your cluster and is compliant with PKCS#11 version 2.40.

Step 2: Replacing the PKCS#11 provider of your original HSM with the AWS CloudHSM PKCS#11 software library.

You’ll begin by installing the software library with the below two commands.

wget https://s3.amazonaws.com/cloudhsmv2-software/CloudHsmClient/EL6/cloudhsm-client-pkcs11-latest.el6.x86_64.rpm

sudo yum install -y ./cloudhsm-client-pkcs11-latest.el6.x86_64.rpm

When installation completes, you will be able to find the CloudHSM PKCS#11 software library files in the /opt/cloudhsm/lib directory, the default directory for AWS CloudHSM software library installs. To improve processing speed and throughput against the HSMs, I suggest installing a Redis cache as well. This cache stores key handles and attributes locally, so you may access them without making a call to the HSMs. As this step is optional and not required for this post, I will leave the link for installation instructions here. With the software library installed, you want to ensure the CloudHSM client is running. You can check this with the command below.

sudo start cloudhsm-client

Step 3: Switching your encryption wallet to point to your AWS CloudHSM cluster.

Once you’ve verified the client is running, you’re going to create another backup of the sqlnet.ora file. It’s always a good idea to take backups before making any changes. The command would be similar to below, replacing </path/to/> with the actual path to your sqlnet.ora file.

cp </path/to/>sqlnet.ora </path/to/>sqlnet.ora.backup2

With this done, again open the sqlnet.ora file with your favorite text editor. You are going to edit the ENCRYPTION_WALLET_LOCATION entry to resemble the text below.


ENCRYPTION_WALLET_LOCATION=
  (SOURCE=(METHOD=HSM))

Save the file and exit. You will need to create the directory where your Oracle database will expect to find the library file for the AWS CloudHSM PKCS#11 software library. You do this with the command below.

sudo mkdir -p /opt/oracle/extapi/64/hsm

With the directory created, you next copy over the CloudHSM PKCS#11 software library from the original installation directory to this one. It is important that this new directory contains only this one library file. Should any files exist in the directory that are not directly related to the way you installed the CloudHSM PKCS#11 software library, remove them. The command to copy is below.

sudo cp /opt/cloudhsm/lib/libcloudhsm_pkcs11_standard.so /opt/oracle/extapi/64/hsm

Now, modify the ownership of the directory and everything inside. The Oracle user must have access to these library files to run correctly. The command to do this is below.

sudo chown -R oracle:dba /opt/oracle

With that done, you can start your Oracle database. This completes the migration of your encryption wallet and TDE keys from your original encryption wallet to a local wallet, and finally to CloudHSM as the new encryption wallet. Should you decide you wish to create new TDE master encryption keys on CloudHSM, you can follow the steps here to do so.

The remaining two steps are optional, but helpful in the event you must restore your database to production quickly. For customers that leverage DR environments, we have two great blog posts here and here to walk you through each step of the cross-region replication process. The first uses a combination of AWS Step Functions and Amazon CloudWatch Events to copy your EBS snapshots to your DR region, and the second showcases how to copy your CloudHSM cluster backups to your DR region.

Summary

In this post, I walked you through how to migrate your Oracle TDE database encryption wallet to point to CloudHSM for secure storage of your TDE master keys. I showed you how to properly install the CloudHSM PKCS#11 software library and place it in the directory for Oracle to find and use. This process can be used to migrate most on-premises encryption wallets to AWS CloudHSM to help secure your TDE keys and meet compliance requirements.

If you have feedback about this blog post, submit comments in the Comments section below. If you have questions about this blog post, start a new thread on the AWS CloudHSM forum.

Want more AWS Security news? Follow us on Twitter.

Author

Tracy Pierce

Tracy Pierce is a Senior Cloud Support Engineer at AWS. She enjoys the peculiar culture of Amazon and uses that to ensure every day is exciting for her fellow engineers and customers alike. Customer Obsession is her highest priority and she shows this by improving processes, documentation, and building tutorials. She has her AS in Computer Security & Forensics from SCTD, SSCP certification, AWS Developer Associate certification, and AWS Security Specialist certification. Outside of work, she enjoys time with friends, her Great Dane, and three cats. She keeps work interesting by drawing cartoon characters on the walls at request.

From Poll to Push: Transform APIs using Amazon API Gateway REST APIs and WebSockets

Post Syndicated from Chris Munns original https://aws.amazon.com/blogs/compute/from-poll-to-push-transform-apis-using-amazon-api-gateway-rest-apis-and-websockets/

This post is courtesy of Adam Westrich – AWS Principal Solutions Architect and Ronan Prenty – Cloud Support Engineer

Want to deploy a web application and give a large number of users controlled access to data analytics? Or maybe you have a retail site that is fulfilling purchase orders, or an app that enables users to connect data from potentially long-running third-party data sources. Similar use cases exist across every imaginable industry and entail sending long-running requests to perform a subsequent action. For example, a healthcare company might build a custom web portal for physicians to connect and retrieve key metrics from patient visits.

This post is aimed at optimizing the delivery of information without needing to poll an endpoint. First, we outline the current challenges with the consumer polling pattern and alternative approaches to solve these information delivery challenges. We then show you how to build and deploy a solution in your own AWS environment.

Here is a glimpse of the sample app that you can deploy in your own AWS environment:

What’s the problem with polling?

Many customers need to implement the delivery of long-running activities (such as a query to a data warehouse or data lake, or retail order fulfillment). They may have developed a polling solution that looks similar to the following:

  1. POST sends a request.
  2. GET returns an empty response.
  3. Another… GET returns an empty response.
  4. Yet another… GET returns an empty response.
  5. Finally, GET returns the data for which you were looking.

The challenges of traditional polling methods

  • Unnecessary chattiness and cost due to polling for result sets—Every time your frontend polls an API, it’s adding costs by leveraging infrastructure to compute the result. Empty polls are just wasteful!
  • Hampered mobile battery life—Excessive polling is one of the top contributors to battery drain in mobile apps. Make sure that your app isn’t on your users’ Top App Battery Usage list of shame, which could result in deletion.
  • Delayed data arrival due to polling schedule—Some approaches to polling include an incremental backoff to limit the number of empty polls. This sometimes results in a delay between data readiness and data arrival.

But what about long polling?

  • User request deadlocks can hamper application performance—Long synchronous user responses can lead to unexpected user wait times or UI deadlocks, which can affect mobile devices especially.
  • Memory leaks and consumption could bring your app down—Keeping long-running task queries open may overburden your backend and create failure scenarios, which may bring down your app.
  • HTTP default timeouts across browsers may result in inconsistent client experience—These timeouts vary across browsers, and can lead to an inconsistent experience across your end users. Depending on the size and complexity of the requests, processing can last longer than many of these timeouts and take minutes to return results.

Instead, create an event-driven architecture and move your APIs from poll to push.

Asynchronous push model

To deliver an optimal UX, frontend developers often strive to create progressive and reactive user experiences. Users can interact with frontend objects (for example, push buttons) with little lag when sending requests and receiving data. But frontend developers also want users to receive timely data, without sacrificing additional user actions or performing unnecessary processing.

The birth of microservices and the cloud over the past several years has enabled frontend developers and backend service designers to think about these problems in an asynchronous manner. This enables the virtually unlimited resources of the cloud to choreograph data processing. It also enables clients to benefit from progressive and reactive user experiences.

This is a fresh alternative to the synchronous design pattern, which often relies on client consumers to act as the conductor for all user requests. The following diagram compares the flow of communication between patterns.

Comparison of communication patterns

Asynchronous orchestration can be easily choreographed using the workflow definition, monitoring, and tracking console with AWS Step Functions state machines. Break up services into functions with AWS Lambda and track executions within a state diagram, like the following:

With this approach, consumers send a request, which initiates downstream distributed processing. The subsequent steps are invoked according to the state machine workflow and each execution can be monitored.
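To make this concrete, here is a hedged sketch of a request handler that starts a Step Functions execution with Boto3 and returns the execution ARN to the caller. The state machine ARN environment variable, the payload shape, and the 202 response are illustrative assumptions, not part of the sample application described later.

    import json
    import os
    import boto3

    # Illustrative: the state machine ARN is assumed to be supplied via an environment variable.
    STATE_MACHINE_ARN = os.environ.get('STATE_MACHINE_ARN')

    sfn = boto3.client('stepfunctions')

    def handler(event, context):
        # Start the long-running workflow asynchronously and return immediately.
        params = event.get('queryStringParameters') or {}
        response = sfn.start_execution(
            stateMachineArn=STATE_MACHINE_ARN,
            input=json.dumps({'country': params.get('Country')})
        )
        # The client can use the execution ARN to correlate later status updates.
        return {
            'statusCode': 202,
            'body': json.dumps({'executionArn': response['executionArn']})
        }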

Okay, but how does the consumer get the result of what they’re looking for?

Multiple approaches to this problem

There are different approaches for consumers to retrieve their resulting data. In order to create the optimal solution, there are several questions that a service owner may need to ask.

Is there known trust with the client (such as another microservice)?

If the answer is Yes, one approach is to use Amazon SNS. That way, consumers can subscribe to topics and have data delivered using email, SMS, or even HTTP/HTTPS as an event subscriber (that is, webhooks).

With webhooks, the consumer creates an endpoint where the service provider can issue a callback using a small amount of consumer-side resources. The consumer is waiting for an incoming request to facilitate downstream processing.

  1. Send a request.
  2. Subscribe the consumer to the topic.
  3. Open the endpoint.
  4. SNS sends the POST request to the endpoint.
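As a rough sketch of steps 2 and 4 in that flow, the backend could manage the subscription and delivery with Boto3 as shown below; the topic ARN, endpoint URL, and message body are illustrative placeholders.

    import boto3

    sns = boto3.client('sns')

    # Illustrative topic ARN and consumer endpoint; replace with your own values.
    topic_arn = 'arn:aws:sns:us-west-2:111122223333:order-results'

    # Step 2: subscribe the consumer's webhook endpoint to the topic.
    subscription = sns.subscribe(
        TopicArn=topic_arn,
        Protocol='https',
        Endpoint='https://consumer.example.com/webhook',
        ReturnSubscriptionArn=True
    )

    # Once the consumer confirms the subscription, publishing to the topic
    # delivers a POST request to the endpoint (step 4).
    sns.publish(TopicArn=topic_arn, Message='{"status": "complete"}')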

If the trust answer is No, then clients may not be able to open a lightweight HTTP webhook after sending the initial request. In that case, consider an alternative framework.

Are you restricted to only using a REST protocol?

Some applications can only run HTTP requests. These scenarios include the age of the technology or browser for end users, and potential security requirements that may block other protocols.

In these scenarios, customers may use a GET method as a best practice, but we still advise avoiding polling. Some design questions to consider: Does data readiness happen at a predefined time or duration interval? Or can the user experience tolerate time between data readiness and data arrival?

If the answer to both of these is Yes, then consider trying to send GET calls to your RESTful API one time. For example, if a job averages 10 minutes, make your GET call at 10 minutes after the submission. Sounds simple, right? Much simpler than polling.

Should I use GraphQL, a WebSocket API, or another framework?

Each framework has tradeoffs.

If you want a more flexible query schema, you may gravitate to GraphQL, which follows a “data-driven UI” approach. If data drives the UI, then GraphQL may be the best solution for your use case.

AWS AppSync is a serverless GraphQL engine that supports the heavy lifting of these constructs. It offers functionality such as AWS service integration, offline data synchronization, and conflict resolution. With GraphQL, there’s a construct called Subscriptions, which clients can use to receive event-based updates when data changes.

Amazon API Gateway makes it easy for developers to deploy secure APIs at scale. With the recent introduction of API Gateway WebSocket APIs, web and mobile clients and backend services can communicate over an established WebSocket connection. This also allows clients to be more reactive to data updates and only do work one time after an update has been received over the WebSocket connection.

The typical frontend design approach is to create a UI component that is updated when the results of the given procedure are complete. This is more efficient than a complete webpage refresh and improves the user experience.

Because many companies have elected to use the REST framework for creating API-driven, tightly bound service contracts, a RESTful interface can be used to validate and receive the request. It can also provide an endpoint for delivering status, and it offers additional flexibility in delivering the result to a variety of clients, alongside the WebSocket API.

Poll-to-push solution with API Gateway

Imagine a scenario where you want to be updated as soon as the data is created. Instead of the traditional polling methods described earlier, use an API Gateway WebSocket API. That pushes new data to the client as it’s created, so that it can be rendered on the client UI.

Alternatively, a WebSocket server can be deployed on Amazon EC2. With this approach, your server is always running to accept and maintain new connections. In addition, you manage the scaling of the instance manually at times of high demand.

By using an API Gateway WebSocket API in front of Lambda, you don’t need a machine that is always on, eating away at your project budget. API Gateway handles connections and invokes Lambda whenever there’s a new event. Scaling is handled on the service side. To update our connected clients from the backend, we can use the API Gateway callback URL. The AWS SDKs make communication from the backend easy. As an example, see the Boto3 sample of post_to_connection:

import boto3 
#Use a layer or deployment package to 
#include the latest boto3 version.
...
apiManagement = boto3.client('apigatewaymanagementapi', region_name={{api_region}},
                      endpoint_url={{api_url}})
...
response = apiManagement.post_to_connection(Data={{message}},ConnectionId={{connectionId}})
...
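Building on that snippet, a hedged sketch of a callback helper might look like the following. The endpoint URL format and the GoneException handling come from the apigatewaymanagementapi Boto3 client; the function name, message format, and variable names are illustrative assumptions.

    import json
    import boto3

    def push_result(connection_id, message, api_url):
        # api_url is the WebSocket callback endpoint, for example (illustrative):
        # https://<api-id>.execute-api.<region>.amazonaws.com/<stage>
        apigw = boto3.client('apigatewaymanagementapi', endpoint_url=api_url)
        try:
            apigw.post_to_connection(
                ConnectionId=connection_id,
                Data=json.dumps(message).encode('utf-8')
            )
        except apigw.exceptions.GoneException:
            # The client disconnected before the result was ready; nothing to deliver.
            print("Connection {} is no longer available".format(connection_id))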

Solution example

To create this more optimized solution, create a simple web application that enables a user to make a request against a large dataset. It returns the results in a flat file over WebSocket. In this case, we’re querying, via Amazon Athena, a data lake on S3 populated with AWS Twitter sentiment (Twitter: @awscloud or #awsreinvent). However, this solution could apply to any data store, data mart, or data warehouse environment, or the long-running return of data for a response.

For the frontend architecture, create this web application using a JavaScript framework (such as jQuery). Use API Gateway to accept a REST API request for the data and then open a WebSocket connection on the client to facilitate the return of results:

Poll to push solution architecture

  1. The client sends a REST request to API Gateway. This invokes a Lambda function that starts the Step Functions state machine execution. The function also returns a task token for the open connection activity worker. The function then returns the execution ARN and task token to the client.
  2. Using the data returned by the REST request, the client connects to the WebSocket API and sends the task token to the WebSocket connection.
  3. The WebSocket notifies the Step Functions state machine that the client is connected. The Lambda function completes the OpenConnection task through validating the task token and sending a success message.
  4. After RunAthenaQuery and OpenConnection are successful, the state machine updates the connected client over the WebSocket API that their long-running job is complete. It uses the REST API call post_to_connection.
  5. The client receives the update over their WebSocket connection using the IssueCallback Lambda function, with the callback URL from the API Gateway WebSocket API.

In this example application, the data response is an S3 presigned URL composed of results from Athena. You can then configure your frontend to use that S3 link to download to the client.
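If you generate that presigned URL from your own backend code, a minimal Boto3 sketch looks like the following; the bucket and key names are placeholders for wherever Athena writes the query results.

    import boto3

    s3 = boto3.client('s3')

    # Placeholder bucket and key for the Athena query results object (illustrative).
    presigned_url = s3.generate_presigned_url(
        'get_object',
        Params={'Bucket': 'my-athena-results-bucket', 'Key': 'results/query-output.csv'},
        ExpiresIn=3600  # URL validity in seconds
    )

    # Return presigned_url to the connected client, for example over the WebSocket connection.
    print(presigned_url)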

Why not just open the WebSocket API for the request?

While this approach can work, we advise against it for this use case. Break the required interfaces for this use case into three processes:

  1. Submit a request.
  2. Get the status of the request (to detect failure).
  3. Deliver the results.

For the submit request interface, RESTful APIs have strong controls to ensure that user requests for data are validated and given guardrails for the intended query purpose. This helps prevent rogue requests and unforeseen impacts to the analytics environment, especially when exposed to a large number of users.

With this example solution, you’re restricting the data requests to specific countries in the frontend JavaScript. Using the RESTful POST method for API requests enables you to validate data as query string parameters, such as the following:

https://<apidomain>.amazonaws.com/Demo/CreateStateMachineAndToken?Country=France

API Gateway models and mapping templates can also be used to validate or transform request payloads at the API layer before they hit the backend. Use models to ensure that the structure or contents of the client payload are the same as you expect. Use mapping templates to transform the payload sent by clients to another format, to be processed by the backend.

This REST validation framework can also be used to detect header information on WebSocket browser compatibility. While using WebSocket has many advantages, not all browsers support it, especially older browsers. Therefore, a REST API request layer can pass this browser metadata and determine whether a WebSocket API can be opened.

Because a REST interface is already created for submitting the request, you can easily add another GET method if the client must query the status of the Step Functions state machine. That might be the case if a health check in the request is taking longer than expected. You can also add another GET method as an alternative access method for REST-only compatible clients.

If low-latency request and retrieval are an important characteristic of your API and there aren’t any browser-compatibility risks, use a WebSocket API with JSON model selection expressions to protect your backend with a schema.

In the spirit of picking the best tool for the job, use a REST API for the request layer and a WebSocket API to listen for the result.

This solution, although secure, is an example of how a non-polling solution can work on AWS. At scale, it may require refactoring due to cross-talk at high concurrency, which may result in client resubmissions.

To discover, deploy, and extend this solution into your own AWS environment, follow the PollToPush instructions in the AWS Serverless Application Repository.

Conclusion

When application consumers poll for long-running tasks, it can be a wasteful, detrimental, and costly use of resources. This post outlined multiple ways to refactor the polling method. Use API Gateway to host a RESTful interface, Step Functions to orchestrate your workflow, Lambda to perform backend processing, and an API Gateway WebSocket API to push results to your clients.