Tag Archives: AWS Storage

Use AWS Nitro Enclaves to perform computation of multiple sensitive datasets

Post Syndicated from Sheila Busser original https://aws.amazon.com/blogs/compute/leveraging-aws-nitro-enclaves-to-perform-computation-of-multiple-sensitive-datasets/

This blog post is written by, Jeff Wisman, Principal Solutions Architect and Andrew Lee, Solutions Architect.

Introduction

Many organizations have sensitive datasets that they do not want to share with others because of stringent security and compliance requirements. However, they would still like to use each other’s data to perform processing and aggregation. For example, B2B (business to business) companies often want to augment their customer information dataset with additional demographic or psychographic signals. This enrichment of data is often done by one party sending customer information to be matched against another party’s data universe. Naturally, privacy and the revealing of business-critical customer information to an external entity is a major concern here. In this blog, we present a solution where multiple parties can choose to give an isolated compute environment access to their encrypted data to be decrypted and processed in a secure way using AWS Nitro Enclaves.

Designing and building your own secure private computing solution can be challenging, with few out-of-the-box solutions. Our sample application uses Nitro Enclaves, which support the creation of an isolated execution environment called an enclave and a cryptographic attestation process for generating and validating the enclave’s identity. The attestation process makes it possible to ensure only authorized code is running, as well as integration with the AWS Key Management Service (AWS KMS), so that only enclaves that you choose can access sensitive data. Nitro Enlaves enables customers to focus more on their application instead of worrying about integration with external services. While many enterprise use cases involve complex datasets, we’ll use a hypothetical scenario to learn the fundamentals of how this works. The example proof of concept (POC) application will be centered around a third-party bidding service for real estate transactions. Buyers will submit encrypted bids to the application. Once all the bids have been entered, the application will decrypt the bids, determine the highest bidder, and return a result without disclosing the actual bid amounts to any party.

How it works

The POC will be deployed across three AWS accounts, one each for buyer-1, buyer-2, and the bidding service. The bidding service will be run in an enclave on the bidding service’s account.

  1. The bidding service will generate a set of measurements called platform configuration registers (PCRs) from the application code that uses the attestation process. PCRs are cryptographic measurements that are unique to an enclave. An attestation document can be used to verify the identity of the enclave and establish trust.
  2. The buyers will each generate their own AWS KMS key and use AWS KMS to authorize cryptographic requests from the bidding service enclave based on PCR values in the attestation document.
  3. The buyers will place their bids into a file, encrypt the file using AWS KMS, and store them in their own Amazon Simple Storage Service (Amazon S3)
  4. The bidding service will run the application, which will retrieve the encrypted bids from each buyer’s S3 bucket, decrypt the bids, calculate the highest bidder, and store the result in an S3 bucket.

Overall workflow of POC

Implementation

Let’s take a deeper dive into the steps involved in implementing this POC. To deploy the POC to your environment, follow the instructions in the AWS Nitro Enclaves Bidding Service GitHub project.

Enclave image generation

The first step is for the bidding service to launch the parent instance that will host the enclave. Refer to Launch the parent instance for more information about this process. The two real estate buyers, which we will call buyer-1 and buyer-2, will need to review the application code of the enclave application and agree that their data will not be exposed outside the enclave. Once they agree on the code, the bidding service generates the enclave image and a set of measurements as part of the attestation process. The buyers should also perform this process to ensure that the enclave image was generated and its measurements are from the agreed-upon application code. During the generation process, a unique set of measurements is taken of the application, which will make up its identity. When the enclave makes a request to decrypt data with AWS KMS, those measurements will be included in an attestation document to prove the enclave’s identity. Access policies in AWS KMS can then grant access to that identity. An example of a set of measurements is shown here:

Enclave Image successfully created.
{ "Measurements": { "HashAlgorithm": "Sha384 { ... }",
"PCR0":"287b24930a9f0fe14b01a71ecdc00d8be8fad90f9834d547158854b8279c74095c43f8d7f047714e98deb7903f20e3dd",
"PCR1":"aca6e62ffbf5f7deccac452d7f8cee1b94048faf62afc16c8ab68c9fed8c38010c73a669f9a36e596032f0b973d21895",
"PCR2":"0315f483ae1220b5e023d8c80ff1e135edcca277e70860c31f3003b36e3b2aaec5d043c9ce3a679e3bbd5b3b93b61d6f"
} }

Preparing encrypted data

Each buyer will create an AWS KMS key and use that key to encrypt their bids. The encrypted bids will be stored in their respective S3 buckets. Because AWS KMS integrates with Nitro Enclaves to provide built-in attestation support, each buyer can add the PCR values generated earlier as a condition to their respective AWS KMS key policies. This will ensure that only the enclave application code agreed upon by both buyers will have access to utilize the keys for decryption. The following is an example of a KMS key policy with PCR values as a condition:

{
  "Version": "2012-10-17",
  "Id": "key-default-1",
  "Statement": [{
    "Sid": "Enable decrypt from enclave",
    "Effect": "Allow",
    "Principal": < PARENT INSTANCE ROLE ARN > ,
    "Action": "kms:Decrypt",
    "Resource": "",
    "Condition": {
      "StringEqualsIgnoreCase": {
        "kms:RecipientAttestation:ImageSha384": "<PCR0 VALUE FROM BUILDING ENCLAVE IMAGE>",
        "kms:RecipientAttestation:PCR1":"<PCR1 VALUE FROM BUILDING ENCLAVE IMAGE>",
        "kms:RecipientAttestation:PCR2":"<PCR2 VALUE FROM BUILDING ENCLAVE IMAGE>"
      }
    }
  }]
}

The previous example only shows a key policy that uses PCR0, PCR1, and PCR2. You can further scope down the permissions by adding additional PCR values, for instance, role, parent instance ID, and a signing certificate for the enclave image. Refer to the AWS Nitro Enclaves User Guide for more details about PCR values.

Running the POC

The bidding service will run the enclave image generated earlier on the parent instance. The application runs as two parts, one part on the parent instance and another in the enclave. Communication between the parent and the enclave is done through a vsock connection. An AWS KMS proxy is also used on the parent to allow communication between the enclave and AWS KMS for decrypting data. The parent application will retrieve the encrypted bids from each buyer’s S3 bucket and send them to the enclave. The enclave will decrypt the data using both buyers’ AWS KMS keys and present attestation documents signed by the Nitro Hypervisor. AWS KMS will validate that the PCR values in the attestation documents match the key policy before performing the decryption. The decryption event will be logged in AWS CloudTrail for auditing purposes. Once the encrypted bids are decrypted, the values are compared, and the winning buyer is recorded in the result. The unencrypted result is then returned to the parent and written to an S3 bucket in the bidding service’s account. A diagram of this process is shown in Figure 2.

Cleanup

Be sure you delete all the resources that were created when following the included Github project:

  • Bidding service EC2 instance EC2 instance IAM role and policy
  • AWS KMS Customer managed keys
  • S3 Buckets for storing encrypted files

Conclusion

In this blog post, we introduced a sample POC utilizing Nitro Enclaves to allow a bidding service to process two parties’ encrypted data without revealing their data to any party. We did this by ensuring access to sensitive data is only allowed from an application running within an enclave. With the straightforward integration of AWS KMS and the attestation process, customers can quickly develop applications on Nitro Enclaves that can enable computing on encrypted datasets from multiple accounts. For more information on AWS Nitro Enclaves, see the official product documentation or the introductory videos on YouTube.

Choosing between AWS Lambda data storage options in web apps

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/choosing-between-aws-lambda-data-storage-options-in-web-apps/

AWS Lambda is an on-demand compute service that powers many serverless applications. Lambda functions are ephemeral, with execution environments only existing for a brief time when the function is invoked. Many compute operations need access to external data for a variety of purposes. This includes importing third-party libraries, accessing machine learning models, or exporting the output of the compute operation.

Lambda provides a comprehensive range of storage options to meet the needs of web application developers. These include other AWS services such as Amazon S3 and Amazon EFS. There are also native storage options available, such as temporary storage or Lambda layers. In this blog post, I explain the differences between these options, and discuss common use-cases to help you choose for your own applications.

This post references the Happy Path web application series, and you can download the code for that application from the repository.

Amazon S3 – Object storage

Amazon S3 is an object storage service that scales elastically. It offers high availability and 11 9’s of durability. The service is ideal for storing unstructured data. This includes binary data, such as images or media, log files and sensor data.

Sample contents from an S3 bucket.

There are certain characteristics of S3 object storage that are important to remember. While S3 objects can be versioned, you cannot append data as you could in a file system. You have to store an entirely new version of an object. S3 also has a flat storage hierarchy that’s different to a file system. Instead of directories, you use folders to logically organize objects, by prefixing ‘foldername/’ in the key name.

S3 has important event integrations for serverless developers. It has a native integration with Lambda, which allows you to invoke a function in response to an S3 event. This can provide a scalable way to trigger application workflows when objects are created or deleted in S3. In the Happy Path application, the image-processing workflows are initiated by this event integration. To learn more about using S3 to trigger automated serverless workflows, visit the learning path.

S3 is often an important repository for an organization’s data lake. If your application writes data to S3 buckets, this can be a useful staging area for downstream processing. For analytics workloads, you can use AWS Glue to perform extract, transform, and loan (ETL) operations. To create ad hoc visualizations and business analysis reports, Amazon QuickSight can connect to your S3 buckets and produce interactive dashboards. To learn how to build business intelligence dashboards for your web application, visit the Innovator Island workshop.

S3 also provides object lifecycle management. This allows you to automatically change storage classes when certain conditions are met. For example, an application for uploading expenses could automatically archive PDFs after 1 year to Amazon S3 Glacier to reduce storage costs. In the Happy Path application, the original high-resolution uploads are stored in a separate bucket from the optimized distribution assets. To reduce storage costs, lifecycle management could be configured to automatically delete these original photo assets after 30 days.

Temporary storage with /tmp

The Lambda execution environment provides a file system for your code to use at /tmp. This space has a fixed size of 512 MB. The same Lambda execution environment may be reused by multiple Lambda invocations to optimize performance. The /tmp area is preserved for the lifetime of the execution environment and provides a transient cache for data between invocations. Each time a new execution environment is created, this area is deleted.

Consequently, this is intended as an ephemeral storage area. While functions may cache data here between invocations, it should be used only for data needed by code in a single invocation. It’s not a place to store data permanently, and is better-used to support operations required by your code.

Operationally, working with files in /tmp is the same as your local hard disk, and offers fast I/O throughput. For example, to unzip a file into this space in Python, use:

import os, zipfile
os.chdir('/tmp')
with zipfile.ZipFile(myzipfile, 'r') as zip:
    zip.extractall()

Lambda layers

Your Lambda functions may use additional libraries as part of the deployment package. You can bundle these in the deployment archive or optionally move to a layer instead. A Lambda function can have up to five layers, and is subject to the maximum deployment size of 50 MB (zipped). Packages in layers are available in the /opt directory during invocations. While layers are private to you by default, you can also share layers with other AWS accounts, or make layers public.

Lambda layers in the console

There are many benefits to using layers throughout the functions in your serverless application. It’s best practice to include the AWS SDK instead of depending on the version bundled with the Lambda service. This enables you to pin the version of the SDK. By using a layer, you don’t need to bundle the package with each function, which can increase your deployment package size and slow down deployments. You can create an AWS SDK layer and then include a reference to the layer in each function.

Layers can be an effective way to bundle large dependencies, or share compiled libraries with binaries that vary by operating system. For example, the Happy Path application uses the Sharp npm graphics library to process images. Similarly, the Innovator Island workshop uses the OpenCV library to perform image manipulation, and this is imported using a shared layer.

Layers are static once they are deployed. You can only change the contents of a layer by deploying a new version. Any Lambda function using the layer binds to a specific version and must be updated to change layer versions. To learn more, see using Lambda layers to simplify your development process.

Amazon EFS for Lambda

Amazon EFS is a fully managed, elastic, shared file system that integrates with other AWS services. It is durable storage option that offers high availability. You can now mount EFS volumes in Lambda functions, which makes it simpler to share data across invocations. The file system grows and shrinks as you add or delete data, so you do not need to manage storage limits.

EFS file system in the console.

The Lambda service mounts EFS file systems when the execution environment is prepared. This happens in parallel with other initialization operations so typically does not impact cold start latency. If the execution environment is warm from previous invocations, the mount is already prepared. To use EFS, your Lambda function must be in the same VPC as the file system.

EFS enables new capabilities for serverless applications. The file system is a dynamic binding for Lambda functions, unlike layers. This makes it useful for deploying code libraries where you want to always use the latest version. You configure the mount path when integrating the file system with your function, and then include packages from this location. Additionally, you can use this to include packages that exceed the limits of layers.

Due to its speed and support of standard file operations, EFS is also useful for ingesting or writing large numbers files durably. This can be helpful for zipping or unzipping large archives, for example. For appending to existing files, EFS is also a preferred option to using S3.

To learn more, see using Amazon EFS for AWS Lambda in your serverless applications.

Comparing the different data storage options

This table compares the characteristics of these four different data storage options for Lambda:

Amazon S3 /tmp Lambda Layers Amazon EFS
Maximum size Elastic 512 MB 50 MB Elastic
Persistence Durable Ephemeral Durable Durable
Content Dynamic Dynamic Static Dynamic
Storage type Object File system Archive File system
Lambda event source integration Native N/A N/A N/A
Operations supported Atomic with versioning Any file system operation Immutable Any file system operation
Object tagging Y N N N
Object metadata Y N N N
Pricing model Storage + requests + data transfer Included in Lambda Included in Lambda Storage + data transfer + throughput
Sharing/permissions model IAM Function-only IAM IAM + NFS
Source for AWS Glue Y N N N
Source for Amazon QuickSight Y N N N
Relative data access speed from Lambda Fast Fastest Fastest Very fast

Conclusion

Lambda is a flexible, on-demand compute service for serverless application. It supports a wide variety of workloads by providing a number of different data storage options.

In this post, I compare the capabilities and use-cases of S3, EFS, Lambda layers, and temporary storage for Lambda functions. There are benefits to each approach, as each type has different behaviors and characteristics. For web application developers, these storage types support different operations depending upon the needs of your serverless backend.

As the newest integration with Lambda, EFS now enables new workloads and capabilities. This includes sharing large code packages with Lambda, or durably operating on large numbers of files. It also opens up new possibilities for developers working on deep learning inference models.

To learn more about storage options available, visit the AWS Serverless homepage. For more serverless learning resources, visit https://serverlessland.com.