Enhancing file sharing using Amazon S3 and AWS Step Functions

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/enhancing-file-sharing-using-amazon-s3-and-aws-step-functions/

This post is written by Islam Elhamaky, Senior Solutions Architect and Adrian Tadros, Senior Solutions Architect.

Amazon S3 is a cloud storage service that many customers use for secure file storage. S3 offers a feature called presigned URLs, which generate temporary links that give authorized users an effective and secure way to upload and download data.

There are times when customers need more control over how data is accessed. For example, they may want to limit downloads based on IAM roles instead of presigned URLs, or limit the number of downloads per object to control data access costs. Additionally, it can be useful to track which individuals access those download URLs.

This blog post presents an example application that can provide this extra functionality, using AWS serverless services.

Overview

The code included in this example uses a variety of serverless services:

  • Amazon API Gateway receives all incoming requests from users and authorizes access using Amazon Cognito.
  • AWS Step Functions coordinates file sharing and downloading activities such as user validation, checking download eligibility, recording events, request routing, and response formatting.
  • AWS Lambda implements admin activities such as retrieving metadata, listing files, and deleting files.
  • Amazon DynamoDB stores permissions to ensure users only have access to files that have been shared with them.
  • Amazon S3 provides durable storage for users to upload and download files.
  • Amazon Athena provides an efficient way to query S3 Access Logs to extract download and bandwidth usage.
  • Amazon QuickSight provides a visual dashboard to view download and bandwidth analytics.

AWS Cloud Development Kit (AWS CDK) deploys the AWS resources and can plug into your preferred CI/CD process.

Architecture Overview

Architecture

  1. User Interface: The front end is a static React single page application hosted on S3 and served via Amazon CloudFront. The UI uses AWS NorthStar and Cloudscape design components. Amplify UI simplifies interactions with Amazon Cognito such as providing the ability to log in, sign up, and perform email verification.
  2. API Gateway: Users interact via an API Gateway REST API.
  3. Authentication: Amazon Cognito manages user identities and access. Users sign up using their email address and then verify it. Requests to the API include an access token, which is verified using an Amazon Cognito authorizer.
  4. Microservices: The core operations are built with Lambda. The primary workflows allow users to share and download files, and Step Functions orchestrates the multiple steps in each process. These can include validating requests, authorizing that users have the correct permissions to access files, sending notifications, auditing, and keeping track of who is accessing files.
  5. Permission store: DynamoDB stores essential information about files such as ownership details and permissions for sharing. It tracks who owns a file and who has been granted access to download it.
  6. File store: An S3 bucket is the central file repository. Each user has a dedicated folder within the S3 bucket to store files.
  7. Notifications: The solution uses Amazon Simple Notification Service (SNS) to send email notifications to recipients when a file is shared.
  8. Analytics: S3 Access Logs are generated whenever users download or upload files to the file storage bucket. Amazon Athena filters these logs to generate a download report, extracting key information (such as the identity of the users who downloaded files and the total bandwidth consumed during the downloads).
  9. Reporting: Amazon QuickSight provides an interface for administrators to view download reports and dashboards.
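
The analytics step (item 8) amounts to filtering access log entries for downloads and grouping them by requester. The following TypeScript sketch illustrates that aggregation in memory. In the example application, Athena performs this work over the S3 access logs, so this is not the repository's code. The `AccessLogRecord` shape and the `summarizeDownloads` function are hypothetical names, with fields loosely modeled on S3 server access log fields (requester, operation, key, bytes sent).

```typescript
// Hypothetical, simplified view of one parsed S3 access log entry.
interface AccessLogRecord {
  requester: string; // identity that made the request
  operation: string; // for example "REST.GET.OBJECT" for a download
  key: string;       // object key that was accessed
  bytesSent: number; // bytes returned to the requester
}

interface DownloadSummary {
  downloads: number;
  totalBytes: number;
}

// Group download (GET object) entries by requester and total the bandwidth.
function summarizeDownloads(
  records: AccessLogRecord[],
): Record<string, DownloadSummary> {
  const summary: Record<string, DownloadSummary> = {};
  for (const record of records) {
    // Skip uploads and any other non-download operations.
    if (record.operation !== "REST.GET.OBJECT") continue;
    const seen = Object.prototype.hasOwnProperty.call(summary, record.requester);
    const current: DownloadSummary = seen
      ? summary[record.requester]
      : { downloads: 0, totalBytes: 0 };
    summary[record.requester] = {
      downloads: current.downloads + 1,
      totalBytes: current.totalBytes + record.bytesSent,
    };
  }
  return summary;
}
```

A dashboard such as the QuickSight report could then display the per-requester download counts and byte totals.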

Walkthrough

As prerequisites, you need:

  • Node.js version 16+.
  • AWS CLI version 2+.
  • An AWS account and a profile set up on your computer.

Follow the instructions in the code repository to deploy the example to your AWS account. Once the application is deployed, you can access the user interface.

In this example, you walk through the steps to upload a file and share it with a recipient:

  1. The example requires users to identify themselves using an email address. Choose Create Account then Sign In with your credentials.
    Create account
  2. Select Share a file.
    Share a file
  3. Select Choose file to browse and select a file to share. Choose Next.
    Choose file
  4. You must populate at least one recipient. Choose Add recipient to add more recipients. Choose Next.
  5. Set Expire date and Limit downloads to configure the share expiry date and limit the number of allowed downloads. Choose Next.
  6. Review the share request details. You can navigate to previous screens to modify them. Choose Submit once done.
  7. Choose My files to view your shared file.

Extending the solution

The example uses Step Functions so that you can extend and customize the workflows. It implements default workflows, and you can override their logic or introduce new steps to meet your requirements.

This section walks through the default behavior of the Share File and Download File Step Functions workflows.

The Share File workflow

Share File workflow

The share file workflow consists of the following steps:

  1. Validate: check that the share request contains all mandatory fields.
  2. Get User Info: retrieve the logged-in user’s information such as name and email address from Amazon Cognito.
  3. Authorize: check the permissions stored in DynamoDB to verify that the user owns the file and has permission to share it.
  4. Audit: record the share attempt for auditing purposes.
  5. Process: update the permission store in DynamoDB.
  6. Send notifications: send email notifications to recipients to let them know that a new file has been shared with them.
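
The steps above can be sketched as a single function that runs them in order against in-memory stand-ins. This is a minimal illustration under stated assumptions, not the repository's state machine: `ShareRequest`, `FileRecord`, `shareFile`, and the error names are hypothetical, the `files` object stands in for the DynamoDB permission store, and the Amazon Cognito lookup and email delivery are omitted.

```typescript
interface ShareRequest {
  userId: string;
  fileId: string;
  recipients: string[];
}

// Hypothetical permission record: who owns a file and who may download it.
interface FileRecord {
  ownerId: string;
  recipients: string[];
}

type ShareResult =
  | { status: "SUCCEEDED"; notified: string[] }
  | { status: "FAILED"; error: string };

function shareFile(
  request: ShareRequest,
  files: Record<string, FileRecord>, // stand-in for the permission store
  auditLog: string[],                // stand-in for the audit trail
): ShareResult {
  // 1. Validate: all mandatory fields must be present.
  if (!request.userId || !request.fileId || request.recipients.length === 0) {
    return { status: "FAILED", error: "ValidationError" };
  }
  // 2. Get User Info: omitted here (the example retrieves it from Amazon Cognito).
  // 3. Authorize: only the owner may share the file in this sketch.
  const file: FileRecord | undefined = files[request.fileId];
  if (!file || file.ownerId !== request.userId) {
    return { status: "FAILED", error: "NotAuthorized" };
  }
  // 4. Audit: record the share attempt.
  auditLog.push(`share:${request.userId}:${request.fileId}`);
  // 5. Process: add any new recipients to the permission record.
  for (const recipient of request.recipients) {
    if (file.recipients.indexOf(recipient) === -1) {
      file.recipients.push(recipient);
    }
  }
  // 6. Send notifications: return the list that would be emailed.
  return { status: "SUCCEEDED", notified: request.recipients.slice() };
}
```

Modeling each step as a separate block mirrors the reason for using Step Functions: a step can be replaced or a new one inserted without touching the others.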

The Download File workflow

Download File workflow

The download file workflow consists of the following steps:

  1. Validate: check that the download request contains the required fields (for example, user ID and file ID).
  2. Get user info: retrieve the user’s information from Amazon Cognito such as their name and email address.
  3. Authorize: check the permissions store in DynamoDB to verify that the user owns the file or is a valid recipient with permission to download it.
  4. Audit: record the download attempt.
  5. Process: generate a short-lived S3 presigned download URL and return it to the user.
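
The authorization decision in step 3 can be illustrated with a small pure function. This sketch assumes a hypothetical `SharePermission` record that carries the expiry date and download limit set when the file was shared. It also assumes that owners are not subject to those limits and that the limits are enforced at this step, neither of which the post confirms. The field names and `authorizeDownload` are illustrative and not taken from the repository.

```typescript
// Hypothetical permission record for one shared file.
interface SharePermission {
  ownerId: string;
  recipients: string[];
  expiresAt: number;     // expiry as epoch milliseconds
  downloadLimit: number; // maximum number of recipient downloads
  downloadCount: number; // recipient downloads so far
}

type Decision =
  | { allowed: true }
  | { allowed: false; reason: string };

function authorizeDownload(
  userId: string,
  permission: SharePermission,
  now: number,
): Decision {
  // Assumption: the owner can always download their own file.
  if (userId === permission.ownerId) {
    return { allowed: true };
  }
  // Recipients must be listed on the permission record.
  if (permission.recipients.indexOf(userId) === -1) {
    return { allowed: false, reason: "not a recipient" };
  }
  // The share must not have expired.
  if (now > permission.expiresAt) {
    return { allowed: false, reason: "share expired" };
  }
  // The download limit must not be exhausted.
  if (permission.downloadCount >= permission.downloadLimit) {
    return { allowed: false, reason: "download limit reached" };
  }
  return { allowed: true };
}
```

Only when the decision is `allowed` would the Process step go on to generate the presigned URL.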

Step Functions API data mapping

The example uses API Gateway request and response data mappings to allow the REST API to communicate directly with Step Functions. This section shows how to customize the mapping based on your use case.

Request data mapping

The API Gateway REST API uses Apache Velocity Template Language (VTL) templates to transform and construct requests to the underlying service. This solution abstracts the construction of these templates using a CDK construct:

api.root
  .addResource('share')
  .addResource('{fileId}')
  .addMethod(
    'POST',
    StepFunctionApiIntegration(shareStepFunction, [
      { name: 'fileId', sourceType: 'params' },
      { name: 'recipients', sourceType: 'body' },
      /* your custom input fields */
    ]),
    authorizerSettings,
  );

The StepFunctionApiIntegration construct handles the request mapping allowing you to extract fields from the incoming API request and pass these as inputs to a Step Functions workflow. This generates the following VTL template:

{
  "name": "$context.requestId",
  "input": "{\"userId\":\"$context.authorizer.claims.sub\",\"fileId\":\"$util.escapeJavaScript($input.params('fileId'))\",\"recipients\":$util.escapeJavaScript($input.json('$.recipients'))}",
  "stateMachineArn": "...stateMachineArn"
}

In this scenario, fields are extracted from the API request parameters, body, and authorization header and passed to the workflow. You can customize the configuration to meet your requirements.
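
To make the mapping concrete, the following sketch shows one way a field list could be turned into a template of that shape. It is an assumption about how such a helper might work, not the repository's `StepFunctionApiIntegration` implementation. `buildRequestTemplate` and `FieldMapping` are hypothetical names.

```typescript
type SourceType = "params" | "body";

interface FieldMapping {
  name: string;
  sourceType: SourceType;
}

// Build a VTL request template that starts a workflow execution.
function buildRequestTemplate(
  stateMachineArn: string,
  fields: FieldMapping[],
): string {
  // The user ID always comes from the Amazon Cognito authorizer claims.
  const entries: string[] = [
    '\\"userId\\":\\"$context.authorizer.claims.sub\\"',
  ];
  for (const field of fields) {
    if (field.sourceType === "params") {
      // Request parameters are plain strings, so wrap them in escaped quotes.
      entries.push(
        `\\"${field.name}\\":\\"$util.escapeJavaScript($input.params('${field.name}'))\\"`,
      );
    } else {
      // Body fields are already JSON, so embed them without extra quotes.
      entries.push(
        `\\"${field.name}\\":$util.escapeJavaScript($input.json('$.${field.name}'))`,
      );
    }
  }
  return [
    "{",
    '  "name": "$context.requestId",',
    `  "input": "{${entries.join(",")}}",`,
    `  "stateMachineArn": "${stateMachineArn}"`,
    "}",
  ].join("\n");
}
```

Calling `buildRequestTemplate` with the `fileId` and `recipients` mappings from the CDK snippet produces an `input` line matching the template shown above. Adding a custom field means adding one more entry to the list.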

Response data mapping

The example has response mapping templates using Apache VTL. The output of the last step in a workflow is mapped as a JSON response and returned to the user through API Gateway. The response also includes CORS headers:

#set($context.responseOverride.header.Access-Control-Allow-Headers = '*')
#set($context.responseOverride.header.Access-Control-Allow-Origin = '*')
#set($context.responseOverride.header.Access-Control-Allow-Methods = '*')
#if($input.path('$.status').toString().equals("FAILED"))
#set($context.responseOverride.status = 500)
{
  "error": "$input.path('$.error')",
  "cause": "$input.path('$.cause')"
}
#else
  $input.path('$.output')
#end

You can customize this response template to meet your requirements. For example, you may provide custom behavior for different response codes.
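
The logic of the response template, plus one possible customization, can be expressed as a plain function. This sketch is not part of the example application. `ExecutionResult` lists only the fields the template reads, and `mapResponse` with its `errorStatusCodes` parameter is a hypothetical illustration of mapping specific workflow errors to HTTP status codes other than 500.

```typescript
// Fields of a workflow execution result that the template reads.
interface ExecutionResult {
  status: string;
  output?: string; // JSON string produced by the last workflow step
  error?: string;
  cause?: string;
}

interface ApiResponse {
  statusCode: number;
  headers: Record<string, string>;
  body: string;
}

function mapResponse(
  result: ExecutionResult,
  errorStatusCodes: Record<string, number> = {}, // hypothetical customization
): ApiResponse {
  // CORS headers are added to every response, as in the template.
  const headers: Record<string, string> = {
    "Access-Control-Allow-Headers": "*",
    "Access-Control-Allow-Origin": "*",
    "Access-Control-Allow-Methods": "*",
  };
  if (result.status === "FAILED") {
    // Use a custom status code when one is configured, otherwise 500.
    const hasCustomCode =
      result.error !== undefined &&
      Object.prototype.hasOwnProperty.call(errorStatusCodes, result.error);
    const statusCode =
      hasCustomCode && result.error !== undefined
        ? errorStatusCodes[result.error]
        : 500;
    return {
      statusCode,
      headers,
      body: JSON.stringify({
        error: result.error ?? "",
        cause: result.cause ?? "",
      }),
    };
  }
  // On success, pass the workflow output through unchanged.
  return { statusCode: 200, headers, body: result.output ?? "" };
}
```

For example, passing `{ NotAuthorized: 403 }` as `errorStatusCodes` would return a 403 for a failed authorization step, assuming the workflow reports an error with that name.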

Conclusion

This blog post shows how to securely share files with authorized external parties and track their access using AWS serverless services. The sample application uses Step Functions so that you can extend and customize the workflows to meet your use case requirements.

For more serverless learning resources, visit Serverless Land. Learn about data processing in Step Functions by reading the guide: Introduction to Distributed Map for Serverless Data Processing.