Tag Archives: Amazon CloudFront

Optimizing the cost of serverless web applications

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/optimizing-the-cost-of-serverless-web-applications/

Web application backends are one of the most frequent types of serverless use-case for customers. The pay-for-value model can make it cost-efficient to build web applications using serverless tools.

While serverless cost is generally correlated with level of usage, there are architectural decisions that impact cost efficiency. The impact of these choices is more significant as your traffic grows, so it’s important to consider the cost-effectiveness of different designs and patterns.

This blog post reviews some common areas in web applications where you may be able to optimize cost. It uses the Happy Path web application as a reference example, which you can read about in the introductory blog post.

Serverless web applications generally use a combination of the services shown in the following diagram. I cover each of these layers to highlight common opportunities for cost optimization.

Serverless architecture by AWS service

The API management layer: Selecting the right API type

Most serverless web applications use an API between the frontend client and the backend architecture. Amazon API Gateway is a common choice since it is a fully managed service that scales automatically. There are three types of API offered by the service – REST APIs, WebSocket APIs, and the more recent HTTP APIs.

HTTP APIs offer many of the features in the REST APIs service, but the cost is often around 70% less. It supports Lambda service integration, JWT authorization, CORS, and custom domain names. It also has a simpler deployment model than REST APIs. This feature set tends to work well for web applications, many of which mainly use these capabilities. Additionally, HTTP APIs will gain feature parity with REST APIs over time.

The Happy Path application is designed for 100,000 monthly active users. It uses HTTP APIs, and you can inspect the backend/template.yaml to see how to define these in the AWS Serverless Application Model (AWS SAM). If you have existing AWS SAM templates that are using REST APIs, in many cases you can change these easily:

REST to HTTP API
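
For example, here is a minimal sketch of a Lambda-backed route defined as an HTTP API in AWS SAM. The resource names are illustrative, not the Happy Path template; converting from a REST API is often just a change of the event type:

ListItemsFunction:
  Type: AWS::Serverless::Function
  Properties:
    Handler: app.handler
    Runtime: nodejs12.x
    Events:
      ListItems:
        Type: HttpApi      # previously 'Api' for a REST API
        Properties:
          Path: /items
          Method: GET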

Content distribution layer: Optimizing assets

Amazon CloudFront is a content delivery network (CDN). It enables you to distribute content globally across 216 Points of Presence without deploying or managing any infrastructure. It reduces latency for users who are geographically dispersed and can also reduce load on other parts of your service.

A typical web application uses CDNs in a couple of different ways. First, there is the distribution of the application itself. For single-page application frameworks like React or Vue.js, the build processes create static assets that are ideal for serving over a CDN.

However, these builds may not be optimized and can be larger than necessary. Many frameworks offer optimization plugins, and the JavaScript community frequently uses Webpack to bundle modules and shrink deployment packages. Similarly, any media assets used in the application build should be optimized. You can use tools like Lighthouse to analyze your web apps to find images that can be resized or compressed.

Optimizing images

The second common CDN use-case for web apps is for user-generated content (UGC). Many apps allow users to upload images, which are then shared with other users. A typical photo from a 12-megapixel smartphone is 3–9 MB in size. This high resolution is not necessary when photos are rendered within web apps. Displaying the high-resolution asset results in slower download performance and higher data transfer costs.

The Happy Path application uses a Resizer Lambda function to optimize these uploaded assets. This process creates two different optimized images depending upon which component loads the asset.

Image sizes in front-end applications

The upload S3 bucket shows the original size of the upload from the smartphone:

The distribution S3 bucket contains the two optimized images at different sizes:

Optimized images in the distribution S3 bucket

The distribution file sizes are 98–99% smaller. For a busy web application, using optimized image assets can make a significant difference to data transfer and CloudFront costs.

Additionally, you can convert to highly optimized file formats such as WebP to reduce file size even further. Not all browsers support this format, but you can provide a fallback in the frontend markup for browsers that don’t:

<img src="myImage.webp" onerror="this.onerror=null; this.src='myImage.jpg'">

The data layer

AWS offers many different database and storage options that can be useful for web applications. Billing models vary by service and Region. By understanding the data access and storage requirements of your app, you can make informed decisions about the right service to use.

Generally, it’s more cost-effective to store binary data in S3 than a database. First, when the data is uploaded, you can upload directly to S3 with presigned URLs instead of proxying data via API Gateway or another service.

If you are using Amazon DynamoDB, it’s best practice to store larger items in S3 and include a reference token in a table item. Part of DynamoDB pricing is based on read capacity units (RCUs). For binary items such as images, it is usually more cost-efficient to use S3 for storage.

Many web developers who are new to serverless are familiar with relational databases, so they choose Amazon RDS for their database needs. Depending upon your use-case and data access patterns, it may be more cost-effective to use DynamoDB instead. RDS is not a serverless service, so there are monthly charges for the underlying compute instance. DynamoDB pricing is based upon usage and storage, so it may be a lower-cost choice for many web apps.

Integration layer

This layer includes services like Amazon SQS, Amazon SNS, and Amazon EventBridge, which are essential for decoupling serverless applications. Each of these has a request-based pricing component, where each 64 KB chunk of a payload is billed as one request. For example, a single SQS message with a 256 KB payload is billed as four requests. There are three optimization methods common for web applications.

1. Combine messages

Many messages sent to these services are much smaller than 64 KB. In some applications, the publishing service can combine multiple messages to reduce the total number of publish actions to SNS. Additionally, by either eliminating unused attributes in the message or compressing the message, you can store more data in a single request.

For example, a publishing service may be able to combine multiple messages together in a single publish action to an SNS topic:

  • Before optimization, a publishing service sends 100 million 1-KB messages to an SNS topic. This is charged as 100 million requests for a total cost of $50.00.
  • After optimization, the publishing service combines messages to send 1,562,500 64-KB messages to an SNS topic. This is charged as 1,562,500 requests for a total cost of $0.78.

2. Filter messages

In many applications, not every message is useful for a consuming service. For example, an SNS topic may publish to a Lambda function, which checks the content and discards the message based on some criteria. In this case, it’s more cost effective to use the native filtering capabilities of SNS. The service can filter messages and only invoke the Lambda function if the criteria are met. This lowers the compute cost by only invoking Lambda when necessary.

For example, an SNS topic receives messages about customer orders and forwards these to a Lambda function subscriber. The function is only interested in canceled orders and discards all other messages:

  • Before optimization, the SNS topic sends all messages to a Lambda function. It evaluates the message for the presence of an order canceled attribute. On average, only 25% of the messages are processed further. While SNS does not charge for delivery to Lambda functions, you are charged each time the Lambda service is invoked, for 100% of the messages.
  • After optimization, using an SNS subscription filter policy, the SNS subscription filters for canceled orders and only forwards matching messages. Since the Lambda function is only invoked for 25% of the messages, this may reduce the total compute cost by up to 75%.
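
As a sketch, the filter policy can be declared on the SNS subscription itself in CloudFormation. The message attribute name orderStatus and the resource names are assumptions for illustration:

CanceledOrdersSubscription:
  Type: AWS::SNS::Subscription
  Properties:
    TopicArn: !Ref OrdersTopic
    Protocol: lambda
    Endpoint: !GetAtt ProcessCanceledOrderFunction.Arn
    FilterPolicy:
      orderStatus:
        - canceled
# A matching AWS::Lambda::Permission is also required so that SNS can invoke the function.

With this policy in place, SNS only delivers messages whose orderStatus message attribute equals canceled.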

3. Choose a different messaging service

For complex filtering options based upon matching patterns, you can use EventBridge. The service can filter messages based upon prefix matching, numeric matching, and other patterns, combining several rules into a single filter. You can create branching logic within the EventBridge rule to invoke downstream targets.

EventBridge offers a broader range of targets than SNS destinations. In cases where you publish from an SNS topic to a Lambda function to invoke an EventBridge target, you could use EventBridge instead and eliminate the Lambda invocation. For example, instead of routing from SNS to Lambda to AWS Step Functions, create an EventBridge rule that routes events directly to a state machine.
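
As a minimal sketch (the event pattern and resource names are illustrative), an EventBridge rule can target a state machine directly:

CanceledOrderRule:
  Type: AWS::Events::Rule
  Properties:
    EventPattern:
      source:
        - custom.orders                # illustrative event source
      detail:
        orderStatus:
          - canceled
    Targets:
      - Arn: !Ref OrderStateMachine    # !Ref returns the state machine ARN
        Id: OrderWorkflowTarget
        RoleArn: !GetAtt EventsInvokeWorkflowRole.Arn   # role that allows states:StartExecution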

Business logic layer

Step Functions allows you to orchestrate complex workflows in serverless applications while eliminating common boilerplate code. Standard Workflows charge per state transition. Express Workflows were introduced in December 2019, with pricing based on requests and duration instead of transitions.

For workloads processing large numbers of events over short durations, Express Workflows can be more cost-effective. They are designed for high-volume event workloads, such as streaming data processing or IoT data ingestion. For these cases, compare the cost of the two workflow types to see if you can reduce cost by switching.
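
With AWS SAM, the workflow type is a single property on the state machine resource. This is a sketch with illustrative names and paths:

StreamProcessorStateMachine:
  Type: AWS::Serverless::StateMachine
  Properties:
    Type: EXPRESS                                    # the default is STANDARD
    DefinitionUri: statemachine/processor.asl.json
    Policies:
      - LambdaInvokePolicy:
          FunctionName: !Ref ProcessFunction         # illustrative function invoked by the workflow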

Lambda is the on-demand compute layer in serverless applications, billed by requests and GB-seconds. GB-seconds are calculated by multiplying the duration in seconds by the memory allocated to the function, in GB. For a function with a 1-second duration, invoked 1 million times, here is how memory allocation affects the total cost in the US East (N. Virginia) Region:

Memory (MB) | GB-seconds | Compute cost | Total cost
128         | 125,000    | $2.08        | $2.28
512         | 500,000    | $8.34        | $8.54
1024        | 1,000,000  | $16.67       | $16.87
1536        | 1,500,000  | $25.01       | $25.21
2048        | 2,000,000  | $33.34       | $33.54
3008        | 2,937,500  | $48.97       | $49.17
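
For example, the 128 MB row is derived as follows, using the published US East (N. Virginia) rates at the time of writing ($0.0000166667 per GB-second of compute and $0.20 per million requests):

1,000,000 invocations × 1 second × (128 MB ÷ 1024) = 125,000 GB-seconds
125,000 GB-seconds × $0.0000166667 per GB-second ≈ $2.08 compute cost
$2.08 + (1,000,000 requests × $0.20 per million) = $2.28 total cost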

There are many ways to optimize Lambda functions, but one of the most important choices is memory allocation. You can allocate between 128 MB and 3008 MB, and the amount of virtual CPU scales with the memory setting. Since total cost is a combination of memory and duration, choosing more memory can often reduce duration and lower overall cost.
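
In AWS SAM, memory is set per function with the MemorySize property (a sketch with illustrative names):

ResizerFunction:
  Type: AWS::Serverless::Function
  Properties:
    Handler: app.handler
    Runtime: nodejs12.x
    MemorySize: 1024      # in MB; virtual CPU scales with this value
    Timeout: 10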

Instead of manually setting the memory for a Lambda function and running executions to compare duration, you can use the AWS Lambda Power Tuning tool. This uses Step Functions to run your function against varying memory configurations. It can produce a visualization to find the optimal memory setting, based upon cost or execution time.

Optimizing costs with the AWS Lambda Power Tuning tool

Conclusion

Web application backends are one of the most popular workload types for serverless applications. The pay-for-value model works well for this type of workload. As traffic grows, it’s important to consider the design choices and service configurations used to optimize your cost.

Serverless web applications generally use a common range of services, which you can logically split into different layers. This post examines each layer and suggests common cost optimizations helpful for web app developers.

To learn more about building web apps with serverless, see the Happy Path series. For more serverless learning resources, visit https://serverlessland.com.

Using serverless backends to iterate quickly on web apps – part 1

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/using-serverless-backends-to-iterate-quickly-on-web-apps-part-1/

For many organizations, building applications is an iterative process where requirements change quickly. Traditional software architectures can be challenging to adapt to these changes. Often, early architectural decisions may limit the developers’ ability to deliver new features. Serverless architectural patterns are often much more adaptable, and can help developers keep pace with an evolving list of end-user requirements.

This blog series explores how to structure and build a serverless web app backend to enable the most flexibility for changing product requirements. It covers how to use serverless services in your architecture, and how to separate parts of the backend to make maintenance easier. I also show how you can use AWS Step Functions to encapsulate complex workflows and minimize the amount of custom code in your applications.

In this series:

  • Part 1: Deploy the application, test the upload process, and review the architecture.
  • Part 2: Understand how to use Step Functions, and deploy a custom workflow.
  • Part 3: Advanced workflows with custom branching and image moderation.

The code uses the AWS Serverless Application Model (AWS SAM), enabling you to deploy the application easily in your own AWS account. This walkthrough creates resources covered in the AWS Free Tier but you may incur cost for usage beyond development and testing.

To set up the example, visit the GitHub repo and follow the instructions in the README.md file.

Introducing the “Happy Path” web application

In this scenario, a startup creates a web application called Happy Path. This app is designed to help state parks and nonprofit organizations replace printed materials, such as flyers and maps, with user-generated content. It allows visitors to capture images of park notices and photos of hiking trails. They can share these with other users to reduce printed waste.

The frontend displays and captures images of different locations, and the backend processes this data according to a set of business rules. This web application is designed for smartphones, since it’s used while visitors are at the locations. Here is the typical user flow:

Happy Path user interface

  1. When park visitors first navigate to the site’s URL, it shows their current location with parks highlighted in the vicinity.
  2. The visitor selects a park. It shows thumbnails of any maps, photos, and images already uploaded by other users.
  3. If the visitor is logged in, they can upload their own images directly from their smartphone.

The first production version of this application provides a simple way for users to upload photos. It does little more than provide an uploading and sharing process.

However, the developer team quickly realizes that they must make some improvements. The developers need a way to implement complex, changing workflows on the backend without refactoring the code that is running in production. The architecture must also scale for an expected 100,000 monthly active users.

First, they want to optimize the large uploaded images to improve the speed of downloads. Next, they must also determine the suitability of images to ensure that the app only shows appropriate photos. There is also a rapidly growing list of feature requirements from organizations using the app.

In this series, I show how the development team can design the app to provide this level of flexibility. This way, they can implement new features and even pivot the core application if needed.

Deploying the application

In the GitHub repo, there are detailed deployment instructions in the README. The repo contains separate directories for the frontend, backend, and workflows. You must deploy the backend first. Once you have completed the deployment, you can run the frontend code on your local machine.

To launch the frontend application:

  1. Change to the frontend directory.
  2. Run npm run serve to start the development server. After building the modules in the project, the terminal shows the local URL where the application is running:
    Vue build completed
  3. Open a web browser and navigate to http://localhost:8080 to see the application.
  4. Open the developer console in your browser (for Google Chrome, Mozilla Firefox and Microsoft Edge, press F12 on the keyboard). This displays the application in a responsive layout and shows console logging. This can help you understand the flow of execution in the application.

Happy Path browser developer console

Testing the application

Now that you have deployed the backend to your AWS account and are running the frontend locally, you can test the application.

To upload an image for a location:

  1. Choose Log In and sign into the application, creating a new account if necessary.
  2. Select a location on the map to open the information window.
    Select a location on the map
  3. Choose Show Details, then choose Upload Images.
    Uploading images in Happy Path
  4. In the file picker dialog, select any one of the images from the sample photos dataset.

At this stage, the image is now uploaded to the S3 Uploads bucket on the backend. To verify this:

  1. Navigate to the Amazon S3 console.
  2. Choose the application’s upload bucket, then choose the folder name to open its contents. This shows the uploaded image.
    S3 bucket contents
  3. Navigate to the Amazon DynamoDB console.
  4. Select the hp-application table, then select the Items tab.
    DynamoDB table contents

There are two records shown:

  • The place listing: this item contains details about the selected park, such as the name and address.
  • The file metadata: this stores information about who uploaded the file, the timestamp, and the state of the upload.

At this stage, you have successfully tested that the frontend can upload images to the backend.

Architecture overview

After deploying the application using the repo’s README instructions, the backend architecture looks like this:

Happy Path backend architecture

There are five distinct functional areas for the backend application:

  1. API layer: when users interact with one of the API endpoints, this is processed by the API layer. Each API route invokes a Lambda function to complete its task, storing and fetching data from the storage layer.
  2. Storage layer: information about user uploads is persisted durably here. The application uses Amazon S3 buckets to store the binary objects, and a DynamoDB table for associated metadata.
  3. Notification layer: when images are uploaded, the PUT event triggers a Lambda function. This publishes the event to the Amazon EventBridge default event bus, as sketched after this list.
  4. Business logic layer: the customized business logic is encapsulated in AWS Step Functions workflows.
  5. Content distribution: the processed images are served via an Amazon CloudFront distribution to reduce latency and optimize delivery cost.
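
As referenced in step 3, here is a minimal sketch of how an S3 PUT event can invoke a Lambda function in AWS SAM. The resource names are illustrative, not the actual Happy Path template:

UploadNotifierFunction:
  Type: AWS::Serverless::Function
  Properties:
    Handler: app.handler
    Runtime: nodejs12.x
    Events:
      NewUpload:
        Type: S3
        Properties:
          Bucket: !Ref UploadBucket          # the bucket must be defined in the same template
          Events: s3:ObjectCreated:Put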

For future requirements, you can implement increasingly complex customized logic entirely within the business logic layer. All new workflow features are implemented here, without needing to modify other parts of the application.

Conclusion

This series is about using serverless backends to allow you to iterate quickly on web application functionality.

In this post, I introduce the Happy Path example web application. I show the main features of the application, enabling end-users to upload maps and photos to the backend application. I walk through the deployment of the backend and frontend applications. Finally, you test with a sample image upload.

In part 2, you will deploy the image processing and workflow part of the application. This series explores progressively more complicated workflows, and how to manage their deployment. I discuss some architectural choices that help to build in flexibility and scalability when designing backend applications.

To learn more about building serverless web applications, see the Ask Around Me series.

The serverless LAMP stack part 4: Building a serverless Laravel application

Post Syndicated from Benjamin Smith original https://aws.amazon.com/blogs/compute/the-serverless-lamp-stack-part-4-building-a-serverless-laravel-application/

In this post, you learn how to deploy a Laravel application with a serverless approach.

This is the fourth post in the “Serverless LAMP stack” series; previous posts covered:

Laravel is an open source web application framework for PHP. Using a framework helps developers to build faster by reusing generic components and modules. It also helps long-term maintenance by complying with development standards. However, there are still challenges when scaling PHP frameworks with a traditional LAMP stack. Deploying a framework using a serverless approach can help solve these challenges.

There are a number of solutions that simplify the deployment of a Laravel application onto a serverless infrastructure. The following solution uses an AWS Serverless Application Model (AWS SAM) template. This deploys a Laravel application into a single Lambda function. The function uses the Bref FPM custom runtime layer to run PHP. The AWS SAM template deploys the following architecture, explained in detail in “The Serverless LAMP stack Part 3: Replacing the web server”:

The serverless LAMP stack

Deploying Laravel and Bref with AWS SAM

Composer is a dependency management tool for PHP. It allows you to declare and manage your project libraries and dependencies such as Laravel and Bref.

Deploy Laravel and Bref with AWS SAM using the following steps:

  1. Download the Laravel installer using Composer:
    composer global require laravel/installer
  2. Install Laravel:
    composer create-project --prefer-dist laravel/laravel blog
  3. In the Laravel project, install Bref using Composer:
    composer require bref/laravel-bridge
  4. Clone the AWS SAM template in your application’s root directory:
    git clone https://github.com/aws-samples/php-examples-for-aws-lambda/
  5. Change directory into “0.4-Building-A-Serverless-Laravel-App-With-AWS-SAM”:
    cd 0.4-Building-A-Serverless-Laravel-App-With-AWS-SAM
  6. Deploy the application using the AWS SAM CLI guided deploy:
    sam deploy -g

Once AWS SAM deploys the application, it returns the Amazon CloudFront distribution’s domain name. This distribution serves the serverless Laravel application.

CloudFront domain name from AWS SAM template

Configuring Laravel for Lambda

There are some configuration changes required for Laravel to run in a Lambda function.

Session data store

While Lambda includes a 512 MB temporary file system, this is an ephemeral resource not intended for durable storage. This is because there is no guarantee of reusing the same Lambda function environment for each invocation.

For this reason, if you need Laravel session data, it must be stored outside of the Lambda function. There are a range of different options available for managing state with serverless applications. In this instance, it is recommended to store session data either in a database or using browser cookies.

Update the Laravel .env file to set SESSION_DRIVER to cookie:

SESSION_DRIVER=cookie

Logging

Laravel uses Monolog, a PHP logging library, as a common interface for writing logs to a number of destinations. Laravel specifies these destinations using log channels. Each channel is defined within the /config/logging.php file as an associative array.

Since the Lambda filesystem is not shared between multiple Lambda function invocations, application logs must be written to an external central location such as Amazon CloudWatch Logs. All errors, warnings, and notices emitted by PHP are forwarded to CloudWatch Logs. This makes it easy to view, search, filter, or archive logs for future analysis from a single location. To configure this, add the following to the Laravel .env file:

LOG_CHANNEL=stderr

This ensures that the stderr channel is used to write all application logs, which are automatically forwarded to CloudWatch Logs. This channel is defined in /config/logging.php:

'stderr' => [
    'driver' => 'monolog',
    'handler' => StreamHandler::class,
    'formatter' => env('LOG_STDERR_FORMATTER'),
    'with' => [
        'stream' => 'php://stderr',
    ],
],

CloudWatch Logs for a single Lambda invocation

Compiled views

Views contain the HTML served by an application, separating application logic from presentation logic. By default, views are compiled on demand inside the application’s storage directory.

As Lambda does not have write access to the storage directory, Laravel must be configured to write views to the function’s /tmp directory. This is a temporary file system for ephemeral data that’s only needed for the duration of each HTTP request.

In the .env file, add the following line to configure Laravel to use a new directory path for compiled views:

VIEW_COMPILED_PATH=/tmp/storage/framework/views

Laravel uses service providers to register or “bootstrap” components to your application. The AppServiceProvider.php file provides a central location to share data with all views. Add the following code to the Providers/AppServiceProvider.php file:

public function boot()
{
    // Make sure the directory for compiled views exists
    if (! is_dir(config('view.compiled'))) {
        mkdir(config('view.compiled'), 0755, true);
    }
}

This ensures that the view directory is automatically created for each Lambda function invocation, if it does not already exist.

File system abstraction with Amazon S3

Laravel uses a filesystem abstraction package called Flysystem. This provides a simple driver mechanism to configure the filesystem location. As Lambda’s /tmp directory is ephemeral, the filesystem location must be outside of the Lambda function. Configure Laravel to use the Amazon S3 filesystem driver by adding the following line to the .env file:

FILESYSTEM_DRIVER=s3

The AWS SAM template deploys an S3 bucket to store these objects:

Storage:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: php-example-laravel-FileSystemBucket

The bucket name is provided to the Lambda function as an environment variable from within the AWS SAM template:

    Environment:
      Variables:
        AWS_BUCKET: !Ref Storage

The Lambda function is granted permission to read/write to the S3 bucket, using an IAM policy definition:

Policies:
        - S3FullAccessPolicy:
            BucketName: !Ref Storage

Laravel’s filesystem configuration is found at config/filesystems.php. This is where the S3 filesystem disk is defined using the AWS SAM environment variable.

's3' => [
            'driver' => 's3',
            'key' => env('AWS_ACCESS_KEY_ID'),
            'secret' => env('AWS_SECRET_ACCESS_KEY'),
            'token' => env('AWS_SESSION_TOKEN'),
            'region' => env('AWS_DEFAULT_REGION'),
            'bucket' => env('AWS_BUCKET'),
            'url' => env('AWS_URL'),
            'endpoint' => env('AWS_ENDPOINT'),
        ],

The AWS credentials and bucket name are provided by the Lambda environment that is running PHP, accessed using Laravel’s env() function.

Public asset files

Laravel has a public disk driver for storing publicly accessible files such as images and CSS files. By default, the public disk driver stores these files in storage/app/public/. These files must instead be stored in S3. Change the configuration in config/filesystems.php to the following:

+ 'public' => env('FILESYSTEM_DRIVER_PUBLIC', 'public_local'),
    
    'disks' => [

        'local' => [
            'driver' => 'local',
            'root' => storage_path('app'),
        ],

- 'public' => [
+ 'public_local' => [
            'driver' => 'local',
            'root' => storage_path('app/public'),
            'url' => env('APP_URL').'/storage',
            'visibility' => 'public',
        ],

        's3' => [
            'driver' => 's3',
            'key' => env('AWS_ACCESS_KEY_ID'),
            'secret' => env('AWS_SECRET_ACCESS_KEY'),
            'token' => env('AWS_SESSION_TOKEN'),
            'region' => env('AWS_DEFAULT_REGION'),
            'bucket' => env('AWS_BUCKET'),
            'url' => env('AWS_URL'),
            'endpoint' => env('AWS_ENDPOINT'),
        ],

+ 's3_public' => [
+     'driver' => 's3',
+     'key' => env('AWS_ACCESS_KEY_ID'),
+     'secret' => env('AWS_SECRET_ACCESS_KEY'),
+     'token' => env('AWS_SESSION_TOKEN'),
+     'region' => env('AWS_DEFAULT_REGION'),
+     'bucket' => env('AWS_PUBLIC_BUCKET'),
+     'url' => env('AWS_URL'),
+ ],

    ],

This adds a new filesystem disk named s3_public, which uses the S3 driver. Laravel’s env() function retrieves the environment variable AWS_PUBLIC_BUCKET to configure the bucket location. The bucket name is passed to the Lambda function as an environment variable.

Add the following line to the .env file to configure the public disk to use S3:

FILESYSTEM_DRIVER_PUBLIC=s3_public

Referencing static assets in view templates

Laravel’s asset() helper function generates a URL for an asset using the current scheme of the request (HTTP or HTTPS):

$url = asset('img/photo.jpg');

These assets must be stored on S3 and served via CloudFront’s global CDN. Configure the URL host by setting the ASSET_URL variable in your .env file:

ASSET_URL=https://{YourCloudFrontDomain}.cloudfront.net

This allows the application to correctly reference assets from S3, via the CloudFront domain. Laravel’s native asset() helper function is used from within the view templates with the following format:

<img src="{{ asset('assets/icons.png') }}">

Serverless Laravel App with Lambda

Alternative deployment methods for a serverless Laravel application

1. Bref, an open source custom runtime for PHP, recently merged a new pull request to automatically configure Laravel for Lambda. This new package also provides a way to integrate Amazon SQS with the Laravel Queues Jobs system.

2. Laravel Vapor is a serverless deployment platform for Laravel. This is a paid service, built by the Laravel team on the AWS Cloud.

Conclusion

This post explains how to deploy a PHP Laravel application using a serverless approach with AWS SAM. It explains the initial Laravel configuration steps required to implement a session store, centralized logging, an external filesystem, and static assets in S3.

PHP development teams can focus on shipping code without changing the way they build. Start building serverless applications with PHP.

Visit this GitHub repository for accompanying code and instructions.

Building well-architected serverless applications: Controlling serverless API access – part 1

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/building-well-architected-serverless-applications-controlling-serverless-api-access-part-1/

This series of blog posts uses the AWS Well-Architected Tool with the Serverless Lens to help customers build and operate applications using best practices. In each post, I address the nine serverless-specific questions identified by the Serverless Lens along with the recommended best practices. See the Introduction post for a table of contents and explanation of the example application.

Security question SEC1: How do you control access to your serverless API?

Use authentication and authorization mechanisms to prevent unauthorized access, and enforce quota for public resources. By controlling access to your API, you can help protect against unauthorized access and prevent unnecessary use of resources.

AWS has a number of services to provide API endpoints including Amazon API Gateway and AWS AppSync.

Use Amazon API Gateway for RESTful and WebSocket APIs. Here is an example serverless web application architecture using API Gateway.

Example serverless application architecture using API Gateway

Use AWS AppSync for managed GraphQL APIs.

AWS AppSync overview diagram

The serverless airline example in this series uses AWS AppSync to provide the frontend, user-facing public API. The application also uses API Gateway to provide backend, internal, private REST APIs for the loyalty and payment services.

Good practice: Use an authentication and an authorization mechanism

Authentication and authorization are mechanisms for controlling and managing access to a resource. In this well-architected question, that is a serverless API. Authentication is verifying who a client or user is. Authorization is deciding whether they have the permission to access a resource. By enforcing authorization, you can prevent unauthorized access to your workload from non-authenticated users.

Integrate with an identity provider that can validate your API consumer’s identity. An identity provider is a system that provides user authentication as a service. The identity provider may use the XML-based Security Assertion Markup Language (SAML) or JSON Web Tokens (JWT) for authentication. It may also federate with other identity management systems. JWT is an open standard that defines a way to transmit information securely between parties as a JSON object. JWTs are used with frameworks such as OAuth 2.0 for authorization and OpenID Connect (OIDC), which builds on OAuth 2.0 and adds authentication.

Only authorize access to consumers that have successfully authenticated. Use an identity provider rather than API keys as a primary authorization method. API keys are more suited to rate limiting and throttling.

Evaluate authorization mechanisms

Use AWS Identity and Access Management (IAM) for authorizing access to internal or private API consumers, or other AWS services such as AWS Lambda.

For public, user facing web applications, API Gateway accepts JWT authorizers for authenticating consumers. You can use either Amazon Cognito or OpenID Connect (OIDC).

App client authenticates and gets tokens
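
As a sketch, a JWT authorizer backed by an Amazon Cognito user pool can be attached to an HTTP API in AWS SAM. The resource names here are illustrative:

AppHttpApi:
  Type: AWS::Serverless::HttpApi
  Properties:
    Auth:
      DefaultAuthorizer: CognitoJwtAuthorizer
      Authorizers:
        CognitoJwtAuthorizer:
          IdentitySource: $request.header.Authorization
          JwtConfiguration:
            issuer: !Sub https://cognito-idp.${AWS::Region}.amazonaws.com/${UserPool}
            audience:
              - !Ref UserPoolClient    # the app client ID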

For custom authorization needs, you can use Lambda authorizers.

A Lambda authorizer (previously called a custom authorizer) is an AWS Lambda function which API Gateway calls for an authorization check when a client makes a request to an API method. This means you do not have to write custom authorization logic in a function behind an API. The Lambda authorizer function can validate a bearer token such as JWT, OAuth, or SAML, or request parameters and grant access. Lambda authorizers can be used when using an identity provider other than Amazon Cognito or AWS IAM, or when you require additional authorization customization.

Lambda authorizers

For more information, see the AWS Hero blog post, “The Complete Guide to Custom Authorizers with AWS Lambda and API Gateway”.

The AWS documentation also has a useful section on “Understanding Lambda Authorizers Auth Workflow with Amazon API Gateway”.

Enforce authorization for non-public resources within your API

Within API Gateway, you can enable native authorization for users authenticated using Amazon Cognito or AWS IAM. For authorizing users authenticated by other identity providers, use Lambda authorizers.

For example, within the serverless airline, the loyalty service uses a Lambda function to fetch loyalty points and next tier progress. AWS AppSync acts as the client using an HTTP resolver, via an API Gateway REST API /loyalty/{customerId}/get resource, to invoke the function.

To ensure only AWS AppSync is authorized to invoke the API, IAM authorization is set within the API Gateway method request.

Viewing API Gateway IAM authorization

The serverless airline uses the AWS Serverless Application Model (AWS SAM) to deploy the backend infrastructure as code. This makes it easier to know which IAM role has access to the API. One of the benefits of using infrastructure as code is visibility into all deployed application resources, including IAM roles.

The loyalty service AWS SAM template contains the AppsyncLoyaltyRestApiIamRole.

  AppsyncLoyaltyRestApiIamRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Principal:
              Service: appsync.amazonaws.com
            Action: sts:AssumeRole
      Path: /
      Policies:
        - PolicyName: LoyaltyApiInvoke
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Action:
                  - execute-api:Invoke
                # arn:aws:execute-api:region:account-id:api-id/stage/METHOD_HTTP_VERB/Resource-path
                Resource: !Sub arn:aws:execute-api:${AWS::Region}:${AWS::AccountId}:${LoyaltyApi}/*/*/*

The IAM role specifies that appsync.amazonaws.com can perform an execute-api:Invoke action on the specific API Gateway resource arn:aws:execute-api:${AWS::Region}:${AWS::AccountId}:${LoyaltyApi}/*/*/*.

Within AWS AppSync, you can enable native authorization for users authenticating using Amazon Cognito or AWS IAM. You can also use any external identity provider compliant with OpenID Connect (OIDC).

Improvement plan summary:

  1. Evaluate authorization mechanisms.
  2. Enforce authorization for non-public resources within your API.

Required practice: Use appropriate endpoint type and mechanisms to secure access to your API

APIs may have public or private endpoints. Consider public endpoints to serve consumers where they may not be part of your network perimeter. Consider private endpoints to serve consumers within your network perimeter where you may not want to expose the API publicly. Public and private endpoints may have different levels of security.

Determine your API consumer and choose an API endpoint type

For providing public content, use Amazon API Gateway or AWS AppSync public endpoints.

For providing content with restricted access, use Amazon API Gateway with authorization to specific resources, methods, and actions you want to restrict. For example, the serverless airline application uses AWS IAM to restrict access to the private loyalty API so only AWS AppSync can call it.

With AWS AppSync providing a GraphQL API, restrict access to specific data types, data fields, queries, mutations, or subscriptions.

You can create API Gateway private REST APIs that you can only access from your Amazon Virtual Private Cloud (VPC) by using an interface VPC endpoint.

API Gateway private endpoints

For more information, see “Choose an endpoint type to set up for an API Gateway API”.
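
As a sketch in AWS SAM, a private REST API can combine the PRIVATE endpoint type with a resource policy that only allows calls through a specific interface VPC endpoint (the resource names are illustrative):

InternalApi:
  Type: AWS::Serverless::Api
  Properties:
    StageName: prod
    EndpointConfiguration: PRIVATE
    Auth:
      ResourcePolicy:
        CustomStatements:
          - Effect: Allow
            Principal: '*'
            Action: execute-api:Invoke
            Resource: execute-api:/*/*/*
            Condition:
              StringEquals:
                aws:SourceVpce: !Ref ApiVpcEndpoint   # the interface VPC endpoint ID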

Implement security mechanisms appropriate to your API endpoint

With Amazon API Gateway and AWS AppSync, for both public and private endpoints, there are a number of mechanisms for access control.

For providing content with restricted access, API Gateway REST APIs support native authorization using AWS IAM, Amazon Cognito user pools, and Lambda authorizers. Amazon Cognito user pools provide a managed user directory for authentication. For more detailed information, see the AWS Hero blog post, “Picking the correct authorization mechanism in Amazon API Gateway“.

You can also use resource policies to restrict content to a specific VPC, VPC endpoint, a data center, or a specific AWS Account.

API Gateway resource policies are different from IAM identity policies. IAM identity policies are attached to IAM users, groups, or roles. These policies define what that identity can do on which resources. For example, in the serverless airline, the IAM role AppsyncLoyaltyRestApiIamRole specifies that appsync.amazonaws.com can perform an execute-api:Invoke action on the specific API Gateway resource arn:aws:execute-api:${AWS::Region}:${AWS::AccountId}:${LoyaltyApi}/*/*/*.

Resource policies are attached to resources such as an Amazon S3 bucket, or an API Gateway resource or method. The policies define what identities can access the resource.

IAM access is determined by a combination of identity policies and resource policies.

For more information on the differences, see “Identity-Based Policies and Resource-Based Policies”. To see which services support resource-based policies, see “AWS Services That Work with IAM”.

API Gateway HTTP APIs support JWT authorizers as a part of OpenID Connect (OIDC) and OAuth 2.0 frameworks.

API Gateway WebSocket APIs support AWS IAM and Lambda authorizers.

With AWS AppSync public endpoints, you can enable authorization with the following:

  • AWS IAM
  • Amazon Cognito User pools for email and password functionality
  • Social providers (Facebook, Google+, and Login with Amazon)
  • Enterprise federation with SAML

Within the serverless airline, AWS Amplify Console hosts the public user facing site. Amplify Console provides a git-based workflow for building, deploying, and hosting serverless web applications. Amplify Console manages the hosting of the frontend assets for single page app (SPA) frameworks in addition to static websites, along with an optional serverless backend. Frontend assets are stored in S3 and the Amazon CloudFront global edge network distributes the web app globally.

The AWS Amplify CLI toolchain allows you to add backend resources using AWS CloudFormation.

Using Amplify CLI to add authentication

For the serverless airline, I use the Amplify CLI to add authentication using Amazon Cognito with the following command:

amplify add auth

When prompted, I specify the authentication parameters I require.

Amplify add auth

Amplify CLI creates a local CloudFormation template. Use the following command to deploy the updated authentication configuration to the cloud:

amplify push

Once the deployment is complete, I view the deployed authentication nested stack resources from within the CloudFormation Console. I see the Amazon Cognito user pool.

View Amplify authentication CloudFormation nested stack resources

For a more detailed walkthrough using Amplify CLI to add authentication for the serverless airline, see the build video.

For more information on Amplify CLI and authentication, see “Authentication with Amplify”.

Conclusion

To help protect against unauthorized access and prevent unnecessary use of serverless API resources, control access using authentication and authorization mechanisms.

In this post, I cover the different mechanisms for authorization available for API Gateway and AWS AppSync. I explain the different approaches for public or private endpoints and show how to use IAM to control access to internal or private API consumers. I walk through how to use the Amplify CLI to create an Amazon Cognito user pool.

This well-architected question will be continued in a future post where I continue using the Amplify CLI to add a GraphQL API. I will explain how to view JSON Web Tokens (JWT) claims, and how to use Cognito identity pools to grant temporary access to AWS services. I will also show how to use API keys and API Gateway usage plans for rate limiting and throttling requests.

The Serverless LAMP stack part 3: Replacing the web server

Post Syndicated from Benjamin Smith original https://aws.amazon.com/blogs/compute/the-serverless-lamp-stack-part-3-replacing-the-web-server/

In this post, you learn how to build serverless PHP applications without needing a web server.

Later in this post, Matthieu Napoli, the creator of Bref and Serverless Visually Explained, explains how the implementation of FastCGI Process Manager inside Lambda helps make this possible. Bref is an open source runtime Lambda layer for PHP.

I show how to configure Amazon CloudFront to securely serve and cache static assets from a private Amazon S3 bucket. Dynamic requests are routed downstream to Amazon API Gateway and onto a single AWS Lambda function.

These services combine to replace the traditional web server for PHP applications.

Visit this GitHub repository for the sample code.

This serverless LAMP stack architecture is first discussed in this post. A web application is split into two components (static assets and the backend application that generates dynamic content). The Lambda function contains the application’s business logic and interactions with the MySQL database. Each response is synchronously returned via API Gateway.

Routing with API Gateway

The serverless LAMP stack does not use an HTTP server. Instead, API Gateway replaces the routing mechanism of Apache or NGINX. The AWS Serverless Application Model (AWS SAM) is used to configure API Gateway routing rules.

      Events:
        DynamicRequestsRoot:
          Type: HttpApi
          Properties:
            Path: /
            Method: ANY
        DynamicRequestsProxy:
          Type: HttpApi
          Properties:
            Path: /{proxy+}
            Method: ANY

AWS SAM template to route all inbound requests from HTTP API to a single Lambda function.

The preceding template creates an HTTP API with a “catch-all” rule for inbound requests. The request context is sent downstream to a single Lambda function. This is similar behavior to that of a PHP MVC framework that forwards requests to an index.php file. The following shows how this is achieved in a traditional LAMP stack, using a combination of web server and .htaccess configurations.

Alias /yourdir /var/www/html/yourdir/public/
<Directory "/var/www/html/yourdir/public">
    AllowOverride All
    Order allow,deny
    Allow from all
</Directory>

apache2.conf file configuration

<IfModule mod_rewrite.c> 
RewriteEngine On 
RewriteBase / 
RewriteRule ^index\.php$ - [L] 
RewriteCond %{REQUEST_FILENAME} !-f 
RewriteCond %{REQUEST_FILENAME} !-d 
RewriteRule . /index.php [L] 
</IfModule>

public/.htaccess configuration

Using Bref to host traditional PHP frameworks

Bref is an open source PHP runtime layer for Lambda. Using the bref-fpm layer, it’s possible to build applications with traditional PHP frameworks such as Symfony and Laravel. The framework sits within a single Lambda function and is invoked using the service architecture and routing rules illustrated previously. This is made possible due to Bref’s implementation of FastCGI Process Manager. Matthieu Napoli, creator of Bref, explains how.

Bref’s “FPM runtime” runs the php-fpm binary. PHP-FPM is a server implementing the FastCGI protocol, developed by the PHP core team. It is traditionally used with HTTP servers like Apache or NGINX.

Bref’s implementation of PHP-FPM allows PHP applications to run in a familiar environment by:

  • Running each HTTP request in a new process, which is the foundation of PHP’s “shared-nothing” execution model.
  • Populating the global variables ($_GET, $_POST…) used to access HTTP request data.
  • Providing a mechanism for PHP scripts to return HTTP responses (the header() function, stdout…).
  • Providing performance optimizations, such as OPcache (opcode cache), APCu (shared memory cache), or database persistent connections.

Most PHP frameworks are built around these PHP-FPM features, making this runtime an excellent transition from “server hosting” to serverless.

Here is an overview of how the runtime works:

Bref-fpm cycle

Startup

On initial invoke of a new Lambda environment, Bref’s bootstrap is executed and starts the php-fpm process in the background. This PHP-FPM server now waits for new connections on the FastCGI protocol.

The request/response cycle

Whenever a new HTTP request is sent to the application, the following happens:

  1. API Gateway receives the HTTP request and invokes AWS Lambda.
  2. The Lambda function environment executes the bootstrap for the Bref based runtime.
  3. Bref converts the HTTP request from the API Gateway format to the FastCGI format.
  4. Bref calls PHP-FPM through the FastCGI protocol.
  5. PHP-FPM runs the PHP handler and returns its response.
  6. Bref converts the FastCGI response to the API Gateway format.
  7. Bref returns the response to API Gateway, which returns the HTTP response to the client.

While there are multiple processes, this happens quickly.

AWS X-Ray trace view shows that the Lambda function finishes executing in 9 ms.

The Bref runtime performs a job similar to Apache or NGINX (forwarding an HTTP request through the FastCGI protocol), and PHP-FPM has been optimized for decades. Between requests, PHP-FPM does not kill and create new PHP processes. It keeps the same process and resets its memory (preserving in-memory caches like OPcache and APCu).

Configuring PHP for Lambda

Bref optimizes the configuration of PHP-FPM for AWS Lambda:

  • PHP-FPM runs a single “worker” because a Lambda instance handles one HTTP request at a time.
  • The standard error output of PHP-FPM is forwarded to CloudWatch. This makes logging from PHP as easy as writing to “stderr”.
  • All PHP errors, warnings, and notices are logged outside of the HTTP response and forwarded to CloudWatch by default.
  • PHP’s OPcache is optimized to avoid reading from disk because the PHP code base is mounted as read-only in Lambda.

Additionally, Bref adds behaviors that provide an easy migration from Apache/NGINX to API Gateway and Lambda:

  • Uploaded files are bridged with PHP-FPM’s uploaded file mechanism.
  • HTTP requests with binary content are automatically decoded from API Gateway’s base64 format.
  • Binary HTTP responses can also be automatically encoded to base64 by Bref.
  • Cookies are adapted to work with PHP-FPM’s mechanisms.

Bref also supports both v1 and v2 payload formats from API Gateway requests.
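
If an application expects a specific format, the version can be pinned in the AWS SAM event definition. As a sketch, extending the earlier catch-all route (HTTP APIs default to version 2.0):

DynamicRequestsProxy:
  Type: HttpApi
  Properties:
    Path: /{proxy+}
    Method: ANY
    PayloadFormatVersion: "1.0"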

Static content routing and caching with Amazon CloudFront

The Lambda pricing model charges per request and for duration, priced by the GB of RAM allocated. This makes it ideal for handling requests for dynamic compute.

Amazon CloudFront handles requests for static content more efficiently than a server. This is a large-scale, global content delivery network (CDN) that provides secure, scalable delivery of content. It does this by caching data across points of presence distributed all over the globe. This reduces the load on an application origin and improves the experience of the requestor by delivering a local copy of the content.

A CloudFront web distribution can serve different types of data from multiple origins. This template configures CloudFront to route requests for static assets directly to an S3 bucket. It routes all other requests directly to API Gateway.

Origins:
  - Id: Website
    DomainName: !Join ['.', [!Ref ServerlessHttpApi, 'execute-api', !Ref AWS::Region, 'amazonaws.com']]
    # This is the stage
    OriginPath: "/dev"
    CustomOriginConfig:
      OriginProtocolPolicy: 'https-only' # API Gateway only supports HTTPS
  # The assets (S3)
  - Id: Assets
    DomainName: !GetAtt Assets.RegionalDomainName
    S3OriginConfig: {}

API Gateway routing is configured with HTTP APIs to route all inbound requests downstream to a single Lambda function, as previously shown.

Restricting access to Amazon S3 assets by using an origin access identity (OAI)

It is best practice to implement least privilege access permissions for each resource. This reduces security risk and the impact that could result from errors or malicious intent. Following this best practice, a security restriction is applied to the S3 bucket. The bucket is made private, with the objects inside only made available via the CloudFront distribution.

This is achieved by using an origin access identity (OAI). The OAI is defined within the CloudFormation template:

  S3OriginIdentity:
    Type: AWS::CloudFront::CloudFrontOriginAccessIdentity
    Properties:
      CloudFrontOriginAccessIdentityConfig:
        Comment: CloudFront OAI

It is then set as a principal within the S3 bucket’s policy.

AssetsBucketPolicy: 
    Type: AWS::S3::BucketPolicy
    Properties: 
      Bucket:
        Ref: Assets # References the bucket we defined above
      PolicyDocument: 
        Statement:
          Effect: Allow  
          Action: s3:GetObject # to read
          Principal: 
            CanonicalUser: 
              Fn::GetAtt: S3OriginIdentity.S3CanonicalUserId
          Resource: # things in the bucket 'arn:aws:s3:::<bucket-name>/*'
            Fn::Join: 
                - ""
                - 
                  - "arn:aws:s3:::"
                  - 
                    Ref: Assets
                  - "/*"

Deploying the infrastructure

This GitHub repository contains an AWS SAM template with instructions to deploy this infrastructure. It has a single Lambda function (index.php) which uses Bref’s php-73-fpm:25 runtime layer:

Layers:
        - 'arn:aws:lambda:us-east-1:209497400698:layer:php-73-fpm:25'

A /vendors directory, holding the Bref runtime dependencies, is also included. The handler inside index.php returns HTML content to API Gateway’s requests. Within the Lambda function handler, there is a reference to a static image and a static CSS file:

<link href="/assets/style.css" rel="stylesheet">
…
<img src="/assets/serverless-lamp-stack.png">

These files are referenced relatively (not absolutely), because they are served under the same CloudFront domain as the dynamic portion of the website. Navigating to the generated CloudFront domain shows the dynamic webpage, along with the referenced static image. The Lambda function uses the global $_GET variable, made available to it by FastCGI Process Manager.

Serverless PHP website example

By building on or replacing index.php with your own framework, it’s possible to deploy feature-rich serverless web applications with PHP. Refer to Bref’s documentation for more information on building with popular PHP frameworks using the bref-fpm custom runtime.

Conclusion

This post explains how to build PHP applications with Lambda and API Gateway in place of an HTTP server like Apache or NGINX. It describes how to separate your application into static and dynamic requests. All dynamic HTTP requests are routed to a single Lambda function using Bref’s FPM custom runtime layer. The custom runtime’s implementation of FastCGI Process Manager makes it possible to build PHP applications with traditional frameworks.

Replacing the HTTP server frees developers from the responsibilities of web server maintenance, configuration, synchronization and scaling. PHP development teams can focus on shipping code without changing the way they build.

Start building serverless applications with PHP.

Deploy a dashboard for AWS WAF with minimal effort

Post Syndicated from Tomasz Stachlewski original https://aws.amazon.com/blogs/security/deploy-dashboard-for-aws-waf-minimal-effort/

In this post, I’ll show you how to deploy a solution in your Amazon Web Services (AWS) account that provides a fully automated dashboard for the AWS Web Application Firewall (AWS WAF) service. The solution uses logs generated and collected by AWS WAF, and displays them in a user-friendly dashboard, shown in Figure 1.

Figure 1: User-friendly dashboard for AWS Web Application Firewall

The dashboard provides multiple out-of-the-box graphs that you can reference, filter, and adjust. The example in Figure 1 shows data from a sample web page that I created, where you can see:

  • Executed AWS WAF rules
  • Number of all requests
  • Number of blocked requests
  • Allowed versus blocked requests
  • Countries by number of requests
  • HTTP methods
  • HTTP versions
  • Unique IP count
  • Request count
  • Top 10 IP addresses
  • Top 10 countries
  • Top 10 user-agents
  • Top 10 hosts
  • Top 10 web ACLs

The dashboard is created using Kibana, which provides flexibility by enabling you to add new diagrams and visualizations.

AWS WAF is a web application firewall. It helps protect your web applications or APIs against common web exploits that can affect availability, compromise security, or consume excessive resources. In just a few steps, you can deploy AWS WAF to your Application Load Balancer, Amazon CloudFront distribution, or Amazon API Gateway stages. I’ll show you how you can use it to get more insights into what’s happening at the AWS WAF layer. AWS WAF provides two versions of the service: AWS WAF (version 2) and AWS WAF classic. We recommend using version 2 of AWS WAF to stay up to date with the latest features as AWS WAF classic is no longer being updated. The solution that I describe in this blog post works with both AWS WAF versions.

The solution is swift to deploy: the dashboard can be ready to use in less than an hour. It is built with multiple AWS services, such as Amazon Elasticsearch Service (Amazon ES), AWS Lambda, Amazon Kinesis Data Firehose, Amazon Cognito, and Amazon EventBridge. However, you don’t need to know those services in detail to build and use the dashboard. I prepared a CloudFormation template that you can deploy in the AWS Management Console to set up the whole solution automatically in your AWS account. You can also find the whole solution on GitHub. It’s open source, so you can use and edit it to meet your needs.

The architecture of the solution can be broken down into 7 steps, which are outlined in Figure 2.

Figure 2: Interaction points while architecting the dashboard

The interaction points are as follows:

  1. One of the features of the AWS WAF service is AWS WAF logs, which capture information about blocked and allowed requests. These logs are forwarded to the Kinesis Data Firehose service.
  2. The Kinesis Data Firehose buffer receives the information and sends it to Amazon ES—the core of the solution.
  3. Some information, such as the names of AWS WAF web ACLs, isn’t provided in the AWS WAF logs. To make the whole solution more user friendly, I used EventBridge, which is called whenever a user changes their AWS WAF configuration.
  4. EventBridge calls a Lambda function when new rules are created.
  5. Lambda retrieves the information about all existing rules and updates the mapping between the IDs of the rules and their names in the Amazon ES cluster (a sketch of such a function follows this list).
  6. To make the whole solution more secure, I use the Amazon Cognito service to store the credentials of authorized dashboard users.
  7. The user enters their credentials to access the dashboard in Kibana, which is installed on the Amazon ES cluster.
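The deployed solution contains the real implementation; purely as an illustration, a minimal sketch of the mapping function in step 5 could look like the following. The endpoint and index name are hypothetical, and it assumes the Amazon ES domain’s access policy permits these requests (a production setup would sign them with SigV4):

import json
import boto3
import urllib3

# Hypothetical endpoint and index; the real solution wires these up itself.
ES_ENDPOINT = "https://search-waf-dashboard-xxxx.us-east-1.es.amazonaws.com"
INDEX = "waf-acl-names"

wafv2 = boto3.client("wafv2", region_name="us-east-1")
http = urllib3.PoolManager()

def handler(event, context):
    # Web ACLs attached to CloudFront must be listed with the CLOUDFRONT scope.
    web_acls = wafv2.list_web_acls(Scope="CLOUDFRONT")["WebACLs"]
    for acl in web_acls:
        # Store the ID-to-name mapping so Kibana can display friendly names.
        http.request(
            "PUT",
            f"{ES_ENDPOINT}/{INDEX}/_doc/{acl['Id']}",
            body=json.dumps({"id": acl["Id"], "name": acl["Name"]}),
            headers={"Content-Type": "application/json"},
        )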

Now, let’s deploy the solution and see how it works.

Step 1: Deploy solution using CloudFormation template

Click Launch Stack to launch a CloudFormation stack in your account and deploy the solution.


You’ll be redirected to the CloudFormation console in the US East (N. Virginia) Region, which is the default Region for deploying this solution for an AWS WAF web ACL associated with CloudFront. You can change the Region if you want. This template spins up multiple cloud resources, including but not limited to:

  • Amazon ES cluster with Kibana for storing data and displaying dashboard
  • Amazon Cognito user pool with a registry of users who have access to dashboards
  • Kinesis Data Firehose for streaming logs to Amazon ES

In the wizard, you’ll be asked to modify or provide four different parameters. They are:

  • DataNodeEBSVolumeSize: Storage size of the Amazon ES cluster that will be created. You can leave the default value.
  • ElasticSearchDomainName: Name of your Amazon ES cluster domain. You can leave the default value.
  • NodeType: Type of the instance used to create the Amazon ES cluster. You can keep the default, or change it to accommodate your needs.
  • UserEmail: You must update this parameter. It is the email address that will receive the password to log in to Kibana.

Step 2: Wait

Launching the stack, which I named aws-waf-dashboard for this example, takes 20–30 minutes. You can take a break and wait until the status of the stack changes to CREATE_COMPLETE.

Figure 3: Completed launch of the CloudFormation template

Step 3: Validate that Kibana and dashboards work

Check your email. You should have received a message with the password required to log in to the Kibana dashboard. Make a note of it. Now return to the CloudFormation console and select the aws-waf-dashboard stack. On the Outputs tab, there should be one parameter with a link to your dashboard in the Value column.

Figure 4: Output of the CloudFormation template

Select the link and log in to Kibana. Provide the email address that you set up in Step 1 and the password that was sent to it. You might be prompted to update the password.

In Kibana, select the Dashboard tab, as shown in Figure 5, and then select WAFDashboard in the table. This calls up the AWS WAF dashboard. It should still be empty, because it hasn’t been connected to AWS WAF yet.

Figure 5: Empty Kibana dashboard

Step 4: Connect AWS WAF logs

Now it’s time to enable AWS WAF logs on the web ACL for which you want to create a dashboard, and to connect them to this solution. Open the AWS WAF console, select Web ACLs, and then select your desired web ACL. In this example, I use a previously created web ACL called MyPageWAF, as shown in Figure 6.

Figure 6: WAF & Shield

If you haven’t enabled AWS WAF logs yet, you need to do so now in order to continue. To do this, select Logging and metrics in your web ACL, and then select Enable logging, as shown in Figure 7.

Figure 7: Enable AWS WAF logs

Select the drop-down list under Amazon Kinesis Data Firehose Delivery Stream, and then select the delivery stream that was created by the CloudFormation template. Its name starts with aws-waf-logs. Save your changes.

Figure 8: Select the Kinesis Firehose
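If you prefer to script this step, AWS WAF (version 2) exposes the same operation through the PutLoggingConfiguration API. A minimal boto3 sketch, with placeholder ARNs:

import boto3

# For a web ACL associated with CloudFront, the wafv2 client must be
# created in us-east-1. Both ARNs below are placeholders.
wafv2 = boto3.client("wafv2", region_name="us-east-1")

wafv2.put_logging_configuration(
    LoggingConfiguration={
        "ResourceArn": "arn:aws:wafv2:us-east-1:123456789012:global/webacl/MyPageWAF/abcd1234",
        # The delivery stream name must start with aws-waf-logs-.
        "LogDestinationConfigs": [
            "arn:aws:firehose:us-east-1:123456789012:deliverystream/aws-waf-logs-dashboard"
        ],
    }
)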

Step 5: Final validation

Your AWS WAF logs will be sent from the AWS WAF service through Kinesis Data Firehose directly to an Amazon ES cluster and will be available to you using Kibana dashboards. After a couple of minutes, you should start seeing data on your dashboard similar to the screenshot in Figure 1.

And that’s all! In just a few steps, we built and deployed a solution that we can use to examine our AWS WAF configuration and see what kinds of requests are being made and whether they’re blocked or allowed.

Sample Scenario

Let’s go through a sample scenario to see one way you can use this solution. I built a small website for my dog and configured CloudFront to accelerate it and to make it more secure.

Figure 9: Java the Dog’s homepage

Next, I configured an AWS WAF web ACL and attached it to my CloudFront distribution, which is the entry point of my website. In my AWS WAF web ACL, I didn’t add any rules, but allowed all requests. This will allow me to log all requests and understand who is visiting my website. Then I configured an AWS WAF dashboard by following the steps in this blog.

My imaginary website is mainly dedicated to three countries—the USA, Germany, and Japan—where French Bulldogs are very popular. I noticed that I was getting quite a lot of users from India, which was unexpected. In Figure 10, the AWS WAF dashboard includes data from all four countries and shows over 11,000 requests for my website.

Figure 10: Kibana dashboard with requests from USA, Japan, Germany, and India

To understand the data better, I filtered on requests coming only from India, which is shown in Figure 11:

Figure 11: Website requests coming from India only

The dashboard shows that I got more than 700 requests from India in the previous hour. This could have been a great success for my website, but unfortunately, all the requests were coming from a single IP address. Additionally, most of them had a suspicious user-agent header: “secret-hacker-agent.” This information is provided on the Visualize tab in Kibana, shown in Figure 12.

Figure 12: Visualize tab of Kibana dashboard

This doesn’t look good, so I decided to block those requests using AWS WAF.

So, the question now is: what exactly should I block? I could block all requests coming from India, but this isn’t the best idea because there might be other Indian fans of French Bulldogs. I could block this single IP address, but the hacker might use a different IP to continue hitting my website. Instead, I decided to create an AWS WAF rule that inspects the user-agent header: if it contains “secret-hacker-agent,” the rule blocks the request.
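As an illustration, such a rule can be expressed as an AWS WAF (version 2) rule statement. The following sketch shows only the rule definition (name and priority are illustrative); adding it to an existing web ACL with update_web_acl also requires the ACL’s current rules and lock token, which I omit here:

# Sketch of a WAFv2 rule that blocks requests whose user-agent header
# contains "secret-hacker-agent".
block_secret_agent = {
    "Name": "BlockSecretHackerAgent",
    "Priority": 0,
    "Statement": {
        "ByteMatchStatement": {
            "SearchString": b"secret-hacker-agent",
            "FieldToMatch": {"SingleHeader": {"Name": "user-agent"}},
            "TextTransformations": [{"Priority": 0, "Type": "NONE"}],
            "PositionalConstraint": "CONTAINS",
        }
    },
    "Action": {"Block": {}},
    "VisibilityConfig": {
        "SampledRequestsEnabled": True,
        "CloudWatchMetricsEnabled": True,
        "MetricName": "BlockSecretHackerAgent",
    },
}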

Within a couple of minutes of configuring my AWS WAF rule, I noticed that I was still getting requests from India, but this time, requests with the suspicious user-agent header were blocked! As shown in Figure 13, there were around 2,700 requests, but about 2,000 of them were blocked.

Figure 13: Blocked suspicious requests

In reality, I was attacking my own website with the secret-hacker-agent user-agent for the sake of the example. You can see in the following command line screenshot that my request (using wget) with the suspicious user-agent header was blocked, receiving a “403 Forbidden” response. When I use a different header (“good-agent”), my request passes the AWS WAF rule successfully.

Figure 14: Command line screenshot of the ‘wget’ request
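The same check can be scripted; here is a small sketch using Python’s standard library, with a hypothetical CloudFront distribution domain:

import urllib.error
import urllib.request

URL = "https://d1234abcd5678.cloudfront.net/"  # hypothetical distribution domain

def probe(user_agent: str) -> None:
    req = urllib.request.Request(URL, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(req) as resp:
            print(f"{user_agent}: HTTP {resp.status}")
    except urllib.error.HTTPError as err:
        # AWS WAF returns 403 Forbidden for blocked requests.
        print(f"{user_agent}: HTTP {err.code}")

probe("secret-hacker-agent")  # expected: HTTP 403
probe("good-agent")           # expected: HTTP 200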

Summary

In this post, we’ve detailed how to deploy a dashboard for AWS WAF in a few steps, and how to use it to troubleshoot and block a web application attack. Now it’s your turn to deploy this solution for your own application. Please share your feedback about the solution and the dashboard. You can submit comments in the Comments section below or on the project’s GitHub page.

This post was inspired by a blog post created by my friend Tom Adamski, who also described how to use Kibana and Amazon ES to visualize AWS WAF logs, and was written with the help of Achraf Souk, who contributed his specialist knowledge in AWS edge services.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Tomasz Stachlewski

Tomasz is a Senior Solution Architecture Manager at AWS, where he helps companies of all sizes (from startups to enterprises) in their Cloud journey. He is a big believer in innovative technology such as serverless architecture, which allows organizations to accelerate their digital transformation.

Building a serverless URL shortener app without AWS Lambda – part 3

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/building-a-serverless-url-shortener-app-without-lambda-part-3/

This is the final installment of a three-part series on building a serverless URL shortener without using AWS Lambda. This series highlights the power of Amazon API Gateway and its ability to directly integrate with services like Amazon DynamoDB. The result is a low latency, highly available application that is built with managed services and requires minimal code.

In part one of this series, I demonstrate building a serverless URL shortener application without using AWS Lambda. In part two, I walk through implementing application security using Amazon API Gateway settings and Amazon Cognito. In this final part, I cover application observability and performance.

Application observability

Before I can gauge the performance of the application, I must first be able to observe the performance of my application. There are two AWS services that I configure to help with observability, AWS X-Ray and Amazon CloudWatch.

X-Ray

X-Ray is a tracing service that enables developers to observe and debug distributed applications. With X-Ray enabled, every call to the API Gateway endpoint is tagged and monitored throughout the application services. Now that I have the application up and running, I want to test for errors and latency. I use Nordstrom’s open-source load testing library, serverless-artillery, to generate activity to the API endpoint. During the load test, serverless-artillery generates 8,000 requests per second (RPS) for a period of five minutes. The results are as follows:

X-Ray Tracing

This indicates that, from the point the request reaches API Gateway to when a response is generated, the average time for each request is 8 milliseconds (ms) with a 4 ms integration time to DynamoDB. It also indicates that there were no errors, faults, or throttling.

I change the parameters to increase the load and observe how the application performs. This time serverless-artillery generates 11,000 rps for a period of 30 seconds. The results are as follows:

X-Ray tracing with throttling

X-Ray now indicates request throttling. This is due to the default throttling limits of API Gateway. Each account has a soft limit of 10,000 rps with a burst limit of 5,000 requests. Since I am load testing the API with 11,000 rps, API Gateway throttles the requests over 10,000 per second. When throttling occurs, API Gateway responds to the client with a status code of 429. Using X-Ray, I can drill down into the response data to get a closer look at requests by status code.

X-Ray analytics

CloudWatch

The next tool I use for application observability is Amazon CloudWatch. CloudWatch captures data for individual services and supports metric-based alarms. I create the following alarms to gain insight into my application:

  • APIGateway4xxAlarm – One percent of the API calls result in a 4xx error over a one-minute period.
  • APIGateway5xxAlarm – One percent of the API calls result in a 5xx error over a one-minute period.
  • APIGatewayLatencyAlarm – The p99 latency is over 75 ms over a five-minute period.
  • DDB4xxAlarm – One percent of the DynamoDB requests result in a 4xx error over a one-minute period.
  • DDB5xxAlarm – One percent of the DynamoDB requests result in a 5xx error over a one-minute period.
  • CloudFrontTotalErrorRateAlarm – Five requests to CloudFront result in a 4xx or 5xx error over a one-minute period.
  • CloudFrontTotalCacheHitRateAlarm – 80% or less of the requests to CloudFront result in a cache hit over a five-minute period. While this is not an error or a problem, it indicates the need for a more aggressive caching strategy.

Each of these alarms is configured to publish to a notification topic using Amazon Simple Notification Service (SNS). In this example I have configured my email address as a subscriber to the SNS topic. I could also subscribe a Lambda function or a mobile number for SMS message notification. I can also get a quick view of the status of my alarms on the CloudWatch console.
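As an illustration, one of these alarms could be created with boto3 as follows; the API name and topic ARN are assumptions, since the original application defines them in its own template:

import boto3

cloudwatch = boto3.client("cloudwatch")

# APIGateway4xxAlarm: API Gateway reports 4XXError as a 0-1 ratio per period,
# so a threshold of 0.01 corresponds to one percent of calls.
cloudwatch.put_metric_alarm(
    AlarmName="APIGateway4xxAlarm",
    Namespace="AWS/ApiGateway",
    MetricName="4XXError",
    Dimensions=[{"Name": "ApiName", "Value": "url-shortener"}],  # assumed name
    Statistic="Average",
    Period=60,
    EvaluationPeriods=1,
    Threshold=0.01,
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    TreatMissingData="notBreaching",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:url-shortener-alarms"],
)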

CloudWatch and X-Ray provide additional alerts when there are problems. They also provide observability to help remediate discovered issues.

Performance

With observability tools in place, I am now able to evaluate the performance of the application. In part one, I discuss using API Gateway and DynamoDB as the primary services for this application and the performance advantage provided. However, these performance advantages are limited to the backend only. To improve performance between the client and the API I configure throttling and a content delivery network with Amazon CloudFront.

Throttling

Request throttling is handled with API Gateway and can be configured at the stage level or at the resource and method level. Because this application is a URL shortener, the most important action is the 301 redirect that happens at /{linkId} – GET. I want to ensure that these calls take priority, so I set a throttling limit on all other actions.

The best way to do this is to set a global throttle of 2,000 rps with a burst of 1,000, and then configure an override on the /{linkId} – GET method of 10,000 rps with a burst of 5,000. If the API experiences an extraordinarily high volume of calls, all other calls are rejected while the redirects continue to be served.
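One way to express this configuration outside the console is through the UpdateStage API. A sketch follows; the API ID and stage name are assumptions:

import boto3

apigateway = boto3.client("apigateway")

apigateway.update_stage(
    restApiId="a1b2c3d4e5",  # assumed REST API ID
    stageName="Prod",        # assumed stage name
    patchOperations=[
        # Stage-wide default: 2,000 rps with a burst of 1,000.
        {"op": "replace", "path": "/*/*/throttling/rateLimit", "value": "2000"},
        {"op": "replace", "path": "/*/*/throttling/burstLimit", "value": "1000"},
        # Override for the redirect route; '~1' escapes the '/' in the path.
        {"op": "replace", "path": "/~1{linkId}/GET/throttling/rateLimit", "value": "10000"},
        {"op": "replace", "path": "/~1{linkId}/GET/throttling/burstLimit", "value": "5000"},
    ],
)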

Content delivery network

The distance between a user and the API endpoint can severely affect the performance of an application. Simply put, the further the data has to travel, the slower the application. By configuring a CloudFront distribution to use the Amazon CloudFront Global Edge Network, I bring the data closer to the user and increase performance.

I configure the cache for /{linkId} – GET to “max-age=300”, which tells CloudFront to store the response of that call for 300 seconds (five minutes). The first call queries the API and database for a response, while all subsequent calls in the next five minutes receive the locally cached response. I then set the cache for all other endpoints to “no-cache, no-store”, which tells CloudFront never to store the value from these calls. This ensures that as users create or edit their short links, they get the latest data.

By bringing the data closer to the user, I now ensure that regardless of where the user is, they receive improved performance. To evaluate this, I return to serverless-artillery and test the CloudFront endpoint. The results are as follows:

  • Min: 8.12 ms
  • Max: 739 ms
  • Average: 21.7 ms
  • p10: 10.1 ms
  • p50: 12.1 ms
  • p90: 20 ms
  • p95: 34 ms
  • p99: 375 ms

To be clear, these are the 301 redirect response times. I configured serverless-artillery not to follow the redirects, as I have no control over the speed of the resulting site. The maximum response time of 739 ms corresponds to the initial uncached call. The p50 metric shows that half of the traffic sees a 12.1 ms or better response time, while the p95 metric indicates that 95% of my traffic experiences a response time of 34 ms or better.

Conclusion

In this series, I talk through building a serverless URL shortener without the use of any Lambda functions. The resulting architecture looks like this:

This application is built by integrating multiple managed services together and applying business logic via mapping templates on API Gateway. Because these are all serverless managed services, they provide inherent availability and scale to meet the client load as needed.

While the “Lambda-less” pattern is not a match for every application, it is a great answer for building highly performant applications with minimal logic. The advantage to this pattern is also in its extensibility. With the data saved to DynamoDB, I can use the DynamoDB streaming feature to connect additional processing as needed. I can also use CloudFront access logs to evaluate internal application metrics. Clone this repo to start serving your own shortened URLs and submit a pull request if you have an improvement.

Did you miss any of this series?

  1. Part 1: Building the application.
  2. Part 2: Securing the application.

Happy coding!

Architecting a Low-Cost Web Content Publishing System

Post Syndicated from Craig Jordan original https://aws.amazon.com/blogs/architecture/architecting-a-low-cost-web-content-publishing-system/

Introduction

When an IT team first contemplates reducing the on-premises hardware they manage to support their workloads, they often feel a tension between wanting to use cloud-native services and taking a lift-and-shift approach. Cloud-native services based on serverless designs could reduce costs and enable a solution that is easier to operate, but they can appear disruptive to end-user processes and tools. A lift-and-shift migration, though it can eliminate on-premises hardware and maintain existing workflows, doesn’t eliminate the need to manage a server infrastructure, does nothing to improve a team’s agility in releasing enhancements after migration to the cloud, and may not optimize the cost of the resulting solution. Rather than settling for an either/or option that sacrifices cost savings and ease of operation in order to be non-intrusive to their web authors’ daily work, the University of St. Thomas (Minnesota) team implemented a creative hybrid approach that both avoids end-user disruption and achieves the cost savings, agility, and simplified administration that a cloud-native solution can provide.

The Situation

The University of St. Thomas wanted to reduce the on-premises hardware it managed for the university website. In addition, by migrating this functionality to the cloud, the team intended to increase the website’s availability. The on-premises solution was deployed on an IIS server maintained by the IT team, but the content of the website was authored by staff members in departments across the university using two different content management systems (CMS). The publishing process from these tools to the web server worked well, and there was no appetite for eliminating either the distributed nature of the website’s development or the content management systems that the authors were comfortable with.

The IT team hoped to implement a serverless solution utilizing only Amazon Simple Storage Service (S3) to host the static website content. Not only would that reduce the cost of the solution, it would also eliminate having to manage web servers. One of the two content management systems could publish directly to S3, but unfortunately the other CMS could not.

A lift-and-shift migration approach would move the website onto an IIS server in Amazon Elastic Compute Cloud (EC2) and update the publishing process to write its outputs to this new server. This solution would avoid any impact to the authors because all the changes would be made behind the scenes by the IT team. However, this approach did not achieve the team’s goals of creating a solution that cost less and was easier to manage than the current on-premises one.

Rather than giving up on creating a cloud-native solution, the team worked from the constraints on the edges of the solution toward the middle.

Solution

Achieving the cost savings, management ease, and high availability for the solution depended upon using S3 to store the website’s contents (#1 in the diagram). If the CMS tools could have published directly to S3, the solution would have been completed by simply adjusting the CMS tools to target their output to S3. However, only one of the two CMS tools could do this. The other one expects to publish its output to a file system that is accessible to the on-premises server where the CMS tool runs. The team solved this problem by launching a t3.small EC2 instance (#2) to sit between the CMS tools and the S3 bucket that would store the website’s production content. Initially, it seemed like using two simple file sync processes could keep the file system of the EC2 instance synchronized with the CMS files. However, when the team first attempted this approach to build a copy of the website on EC2’s file system, they discovered that one of the sync processes would delete the other tool’s output rather than ignoring it when synchronizing updates from its tool to EC2.

To overcome this issue, the team created separate website roots in the EC2 file system into which each CMS would synchronize. Using Unionfs, a Linux utility that combines multiple directories into a single logical directory, a unified root folder for the website (#3) was created that could be easily pushed to S3 using the S3 CLI.

With this much of the solution in place the team had successfully created a new architecture for their website that was nearly as inexpensive as a static website hosted on S3, but that also maintained the tools and processes that their website authors were familiar with.

There was just one more technical issue to address: the IIS site contained internal metadata that redirected its users from virtual directories to the physical content located elsewhere in the website. For example, https://..../law might be redirected to https://..../lawschool/. To achieve similar functionality, the IT team created one HTML file for each of these redirects and added them to a third website root directory in the EC2 instance (#4). These files contain the static HTML needed to redirect the user’s browser to the desired endpoint. Blending this directory with the other two through Unionfs creates a single logical copy of the website’s contents that can be synchronized out to S3 with an S3 sync CLI command.

A final enhancement to the website was to use an Amazon CloudFront distribution (#5) to cache its contents, providing improved response times for website visitors. The distribution’s object caching TTLs are set to the defaults. The publishing process runs every 15 minutes, so to ensure that website visitors receive the latest content, the team wrote an AWS Lambda function (#6) that invalidates the cache each time an object is created in or removed from the S3 bucket, using S3 event notifications.
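A minimal sketch of such an invalidation function is shown below; the distribution ID is a placeholder, and the team’s actual implementation may differ in detail:

import time
import urllib.parse
import boto3

cloudfront = boto3.client("cloudfront")
DISTRIBUTION_ID = "E1ABCDEF234567"  # placeholder

def handler(event, context):
    # Build one invalidation path per object reported by the S3 event.
    paths = [
        "/" + urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        for record in event["Records"]
    ]
    cloudfront.create_invalidation(
        DistributionId=DISTRIBUTION_ID,
        InvalidationBatch={
            "Paths": {"Quantity": len(paths), "Items": paths},
            # CallerReference must be unique per invalidation request.
            "CallerReference": str(time.time()),
        },
    )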

Conclusion

The University of Saint Thomas IT team found a creative way to implement a new solution for their university website that reduces the time and effort required to manage servers, achieves operational simplicity and cost savings by using cloud-native services, and yet doesn’t interfere with the web authoring tools and processes their customers were happy with. The mix of server-based and serverless components in their design illustrates how flexible cloud architectures can be and highlights the ingenuity of the team that built it.

Acknowledgements

Thank you to the following people at the University of Saint Thomas:

This solution was architected by Julian Mino, Cloud Architect. The creative use of Unionfs was suggested by William Bear, AVP for Applications and Infrastructure and former Linux administrator. Vicky Vue, Systems Engineer, and Keith Ketchmark, Sr. Systems Engineer, implemented the solution using Terraform, Ansible, and Python. Daniel Strojny (Associate Director, Networks & IT Operations) helped resolve some internal DNS issues the team encountered.

200 Amazon CloudFront Points of Presence + Price Reduction

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/200-amazon-cloudfront-points-of-presence-price-reduction/

Less than two years ago I announced the 100th Point of Presence for Amazon CloudFront.

The overall Point of Presence footprint is now growing at 50% per year. Since we launched the 100th PoP in 2017, we have expanded to 77 cities in 34 countries including China, Israel, Denmark, Norway, South Africa, UAE, Bahrain, Portugal, and Belgium.

CloudFront has been used to deliver many high-visibility live-streaming events, including Super Bowl LIII, Thursday Night Football (via Prime Video), the Royal Wedding, the Winter Olympics, the Commonwealth Games, a multitude of soccer games (including the 2019 FIFA World Cup), and much more.

Whether used alone or in conjunction with other AWS services, CloudFront is a great way to deliver content, with plenty of options that also help to secure the content and to protect the underlying source. For example:

DDoS Protection – Amazon CloudFront customers were automatically protected against 84,289 Distributed Denial of Service (DDoS) attacks in 2018, including a 1.4 Tbps memcached reflection attack.

Attack Mitigation – CloudFront customers used AWS Shield Advanced and AWS WAF to mitigate application-layer attacks, including a flood of over 20 million requests per second.

Certificate Management – We announced CloudFront Integration with AWS Certificate Manager in 2016, and use of custom certificates has grown by 600%.

New Locations in South America
Today I am happy to announce that our global network continues to grow, and now includes 200 Points of Presence, including new locations in Argentina (198), Chile (199), and Colombia (200):

AWS customer NED is based in Chile. They are using CloudFront to deliver server-side ad injection and low-latency content distribution to their clients, and are also using Lambda@Edge to implement robust anti-piracy protection.

Price Reduction
We are also reducing the pricing for on-demand data transfer from CloudFront by 56% for all Points of Presence in South America, effective November 1, 2019. Check out the CloudFront Pricing page to learn more.

CloudFront Resources
Here are some resources to help you learn how to make great use of CloudFront in your organization:

Jeff;

 

Learn about AWS Services & Solutions – September AWS Online Tech Talks

Post Syndicated from Jenny Hang original https://aws.amazon.com/blogs/aws/learn-about-aws-services-solutions-september-aws-online-tech-talks/

Learn about AWS Services & Solutions – September AWS Online Tech Talks

AWS Tech Talks

Join us this September to learn about AWS services and solutions. The AWS Online Tech Talks are live, online presentations that cover a broad range of topics at varying technical levels. These tech talks, led by AWS solutions architects and engineers, feature technical deep dives, live demonstrations, customer examples, and Q&A with AWS experts. Register Now!

Note – All sessions are free and in Pacific Time.

Tech talks this month:

 

Compute:

September 23, 2019 | 11:00 AM – 12:00 PM PT – Build Your Hybrid Cloud Architecture with AWS – Learn about the extensive range of services AWS offers to help you build a hybrid cloud architecture best suited for your use case.

September 26, 2019 | 1:00 PM – 2:00 PM PT – Self-Hosted WordPress: It’s Easier Than You Think – Learn how you can easily build a fault-tolerant WordPress site using Amazon Lightsail.

October 3, 2019 | 11:00 AM – 12:00 PM PT – Lower Costs by Right Sizing Your Instance with Amazon EC2 T3 General Purpose Burstable Instances – Get an overview of T3 instances, understand what workloads are ideal for them, and understand how the T3 credit system works so that you can lower your EC2 instance costs today.

 

Containers:

September 26, 2019 | 11:00 AM – 12:00 PM PT – Develop a Web App Using Amazon ECS and AWS Cloud Development Kit (CDK) – Learn how to build your first app using CDK and AWS container services.

 

Data Lakes & Analytics:

September 26, 2019 | 9:00 AM – 10:00 AM PT – Best Practices for Provisioning Amazon MSK Clusters and Using Popular Apache Kafka-Compatible Tooling – Learn best practices on running Apache Kafka production workloads at a lower cost on Amazon MSK.

 

Databases:

September 25, 2019 | 1:00 PM – 2:00 PM PT – What’s New in Amazon DocumentDB (with MongoDB compatibility) – Learn what’s new in Amazon DocumentDB, a fully managed MongoDB compatible database service designed from the ground up to be fast, scalable, and highly available.

October 3, 2019 | 9:00 AM – 10:00 AM PT – Best Practices for Enterprise-Class Security, High-Availability, and Scalability with Amazon ElastiCache – Learn about new enterprise-friendly Amazon ElastiCache enhancements like customer managed key and online scaling up or down to make your critical workloads more secure, scalable and available.

 

DevOps:

October 1, 2019 | 9:00 AM – 10:00 AM PT – CI/CD for Containers: A Way Forward for Your DevOps Pipeline – Learn how to build CI/CD pipelines using AWS services to get the most out of the agility afforded by containers.

 

Enterprise & Hybrid:

September 24, 2019 | 1:00 PM – 2:30 PM PT – Virtual Workshop: How to Monitor and Manage Your AWS Costs – Learn how to visualize and manage your AWS cost and usage in this virtual hands-on workshop.

October 2, 2019 | 1:00 PM – 2:00 PM PT – Accelerate Cloud Adoption and Reduce Operational Risk with AWS Managed Services – Learn how AMS accelerates your migration to AWS, reduces your operating costs, improves security and compliance, and enables you to focus on your differentiating business priorities.

 

IoT:

September 25, 2019 | 9:00 AM – 10:00 AM PT – Complex Monitoring for Industrial with AWS IoT Data Services – Learn how to solve your complex event monitoring challenges with AWS IoT Data Services.

 

Machine Learning:

September 23, 2019 | 9:00 AM – 10:00 AM PT – Training Machine Learning Models Faster – Learn how to train machine learning models quickly and with a single click using Amazon SageMaker.

September 30, 2019 | 11:00 AM – 12:00 PM PT – Using Containers for Deep Learning Workflows – Learn how containers can help address challenges in deploying deep learning environments.

October 3, 2019 | 1:00 PM – 2:30 PM PT – Virtual Workshop: Getting Hands-On with Machine Learning and Ready to Race in the AWS DeepRacer League – Join DeClercq Wentzel, Senior Product Manager for AWS DeepRacer, for a presentation on the basics of machine learning and how to build a reinforcement learning model that you can use to join the AWS DeepRacer League.

 

AWS Marketplace:

September 30, 2019 | 9:00 AM – 10:00 AM PT – Advancing Software Procurement in a Containerized World – Learn how to deploy applications faster with third-party container products.

 

Migration:

September 24, 2019 | 11:00 AM – 12:00 PM PT – Application Migrations Using AWS Server Migration Service (SMS) – Learn how to use AWS Server Migration Service (SMS) for automating application migration and scheduling continuous replication, from your on-premises data centers or Microsoft Azure to AWS.

 

Networking & Content Delivery:

September 25, 2019 | 11:00 AM – 12:00 PM PT – Building Highly Available and Performant Applications using AWS Global Accelerator – Learn how to build highly available and performant architectures for your applications with AWS Global Accelerator, now with source IP preservation.

September 30, 2019 | 1:00 PM – 2:00 PM PT – AWS Office Hours: Amazon CloudFront – Just getting started with Amazon CloudFront and Lambda@Edge? Get answers directly from our experts during AWS Office Hours.

 

Robotics:

October 1, 2019 | 11:00 AM – 12:00 PM PT – Robots and STEM: AWS RoboMaker and AWS Educate Unite! – Come join members of the AWS RoboMaker and AWS Educate teams as we provide an overview of our education initiatives and walk you through the newly launched RoboMaker Badge.

 

Security, Identity & Compliance:

October 1, 2019 | 1:00 PM – 2:00 PM PT – Deep Dive on Running Active Directory on AWS – Learn how to deploy Active Directory on AWS and start migrating your Windows workloads.

 

Serverless:

October 2, 2019 | 9:00 AM – 10:00 AM PT – Deep Dive on Amazon EventBridge – Learn how to optimize event-driven applications, and use rules and policies to route, transform, and control access to these events that react to data from SaaS apps.

 

Storage:

September 24, 2019 | 9:00 AM – 10:00 AM PT – Optimize Your Amazon S3 Data Lake with S3 Storage Classes and Management Tools – Learn how to use the Amazon S3 Storage Classes and management tools to better manage your data lake at scale and to optimize storage costs and resources.

October 2, 2019 | 11:00 AM – 12:00 PM PT – The Great Migration to Cloud Storage: Choosing the Right Storage Solution for Your Workload – Learn more about AWS storage services and identify which service is the right fit for your business.

 

 

Getting started with serverless

Post Syndicated from Rachel Richardson original https://aws.amazon.com/blogs/compute/getting-started-with-serverless/

This post is contributed by Maureen Lonergan, Director, AWS Training and Certification

We consistently hear from customers that they’re interested in building serverless applications to take advantage of the increased agility and decreased total cost of ownership (TCO) that serverless delivers. But we also know that serverless may be intimidating for those who are more accustomed to using instances or containers for compute.

Since we launched AWS Lambda in 2014, our serverless portfolio has expanded beyond event-driven computing. We now have serverless databases, integration, and orchestration tools. This enables you to build end-to-end serverless applications—but it also means that you must learn how to build using a new serverless operational model.

For this reason, AWS Training and Certification is pleased to offer a new course through Coursera entitled AWS Fundamentals: Building Serverless Applications.

This scenario-based course, developed by the experts at AWS, will:

  • Introduce the AWS serverless framework and architecture in the context of a real business problem.
  • Provide the foundational knowledge to become more proficient in choosing and creating serverless solutions using AWS.
  • Provide demonstrations of the AWS services needed for deploying serverless solutions.
  • Help you develop skills in building and deploying serverless solutions using real-world examples of a serverless website and chatbot.

The syllabus allocates more than nine hours of video content and reading material over four weekly lessons. Each lesson has an estimated 2–3 hours per week of study time (though you can set your own pace and deadlines), with suggested exercises in the AWS Management Console. There is an end-of-course assessment that covers all the learning objectives and content.

The course is on-demand and 100% digital; you can even audit it for free. A completion certificate and access to the graded assessments are available for $49.

What can you expect?

In this course you will learn to use the AWS Serverless portfolio to create a chatbot that answers the question, “Can I let my cat outside?” You will build an application using every one of the concepts and services discussed in the class, including:

At the end of the class, you can audibly interact with the application to ask that essential question, “Can my cat go out in Denver?” (See the conversation in the following screenshot.)

Serverless Coursera training app

Across the four weeks of the course, you learn:

  1. What serverless computing is and how to create a chatbot with Amazon Lex using an S3 bucket to host a web application.
  2. How to build a highly scalable API with API Gateway and use Amazon CloudFront as a content delivery network (CDN) for your site and API.
  3. How to use Lambda to build serverless functions that write data to DynamoDB.
  4. How to apply lessons from the previous weeks to extend and add functionality to the chatbot.

Serverless Coursera training

AWS Fundamentals: Building Serverless Applications is now available. This course complements other standalone digital courses by AWS Training and Certification. They include the highly recommended Introduction to Serverless Development, as well as the following:

Analyze your Amazon CloudFront access logs at scale

Post Syndicated from Steffen Grunwald original https://aws.amazon.com/blogs/big-data/analyze-your-amazon-cloudfront-access-logs-at-scale/

Many AWS customers are using Amazon CloudFront, a global content delivery network (CDN) service. It delivers websites, videos, and API operations to browsers and clients with low latency and high transfer speeds. Amazon CloudFront protects your backends from massive load or malicious requests by caching content or by using a web application firewall. As a result, sometimes only a small fraction of all requests reaches your backends. You can configure Amazon CloudFront to store access logs with detailed information about every request in Amazon Simple Storage Service (S3). This lets you gain insight into your cache efficiency and learn how your customers are using your products.

A common choice for running standard SQL queries on your data in S3 is Amazon Athena. Queries analyze your data immediately, without requiring you to set up infrastructure or load your data first. You pay only for the queries that you run. Amazon Athena is ideal for quick, interactive querying, and it supports complex analysis of your data, including large joins, unions, nested queries, and window functions.

This blog post shows you how you can restructure your Amazon CloudFront access logs storage to optimize the cost and performance for queries. It demonstrates common patterns that are also applicable to other sources of time series data.

Optimizing Amazon CloudFront access logs for Amazon Athena queries

There are two main aspects to optimize: cost and performance.

Cost should be low for both the storage of your data and the queries. Access logs are stored in S3, which is billed per GB-month. Thus, it makes sense to compress your data – especially when you want to keep your logs for a long time. Cost also incurs on queries, but when you optimize the storage cost, the query cost usually follows. Access logs are delivered gzip-compressed, and Amazon Athena can deal with compression. Amazon Athena is billed by the amount of compressed data scanned, so the benefits of compression are passed on to you as cost savings.

Queries further benefit from partitioning. Partitioning divides your table into parts and keeps the related data together based on column values. For time-based queries, you benefit from partitioning by year, month, day, and hour. In Amazon CloudFront access logs, this indicates the request time. Depending on your data and queries, you add further dimensions to partitions. For example, for access logs it could be the domain name that was requested. When querying your data, you specify filters based on the partition to make Amazon Athena scan less data.

Generally, performance improves by scanning less data. Converting your access logs to a columnar format reduces the data to scan significantly. Columnar formats retain all information but store values by column. This allows the creation of dictionaries and the effective use of Run Length Encoding and other compression techniques. Amazon Athena can further optimize the amount of data to read, because it does not scan a column at all if the column is not used in a filter or in the result of a query. Columnar formats also split a file into chunks and calculate metadata at the file and chunk level, such as the range (min/max), count, or sum of values. If the metadata indicates that a file or chunk is not relevant for the query, Amazon Athena skips it. In addition, if you know your queries and the information you are looking for, you can further aggregate your data (for example, by day) for improved performance of frequent queries.

This blog post focuses on two measures to restructure Amazon CloudFront access logs for optimization: partitioning and conversion to columnar formats. For more details on performance tuning read the blog post about the top 10 performance tuning tips for Amazon Athena.

This blog post describes the concepts of a solution and includes code excerpts for better illustration of the implementation. Visit the AWS Samples repository for a fully working implementation of the concepts. Launching the packaged sample application from the AWS Serverless Application Repository, you deploy it within minutes in one step:

Partitioning CloudFront Access Logs in S3

Amazon CloudFront delivers each access log file in CSV format to an S3 bucket of your choice. Its name adheres to the following format (for more information, see Configuring and Using Access Logs):

/optional-prefix/distribution-ID.YYYY-MM-DD-HH.unique-ID.gz

The file name includes the date and time of the period in which the requests occurred in Coordinated Universal time (UTC). Although you can specify an optional prefix for an Amazon CloudFront distribution, all access log files for a distribution are stored with the same prefix.

When you have a large amount of access log data, this flat structure makes it hard to scan and process only parts of it efficiently. Thus, you must partition your data. Most tools in the big data space (for example, the Apache Hadoop ecosystem, Amazon Athena, and AWS Glue) can deal with partitioning using the Apache Hive style. A partition is a directory that is self-descriptive: the directory name not only reflects the value of a column but also the column name. For access logs, this is a desirable structure:

/optional-prefix/year=YYYY/month=MM/day=DD/hour=HH/distribution-ID.YYYY-MM-DD-HH.unique-ID.gz

To generate this structure, the sample application initiates the processing of each file through an S3 event notification. As soon as Amazon CloudFront puts a new access log file into the S3 bucket, an event triggers the AWS Lambda function moveAccessLogs. This moves the file to a prefix corresponding to its file name. Technically, the move is a copy followed by deletion of the original file.
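The function might look like the following sketch. The regular expression and prefix names are inferred from the file name format above; the actual implementation in the sample repository may differ:

import re
import urllib.parse
import boto3

s3 = boto3.client("s3")

# Matches distribution-ID.YYYY-MM-DD-HH.unique-ID.gz at the end of the key.
LOG_NAME = re.compile(
    r"(?P<file>[^/]+\.(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})-(?P<hour>\d{2})\.[^.]+\.gz)$"
)

def handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        match = LOG_NAME.search(key)
        if not match:
            continue  # not an access log file
        target = (
            "partitioned-gz/"
            f"year={match['year']}/month={match['month']}/"
            f"day={match['day']}/hour={match['hour']}/{match['file']}"
        )
        # A "move" in S3 is a copy followed by deleting the original object.
        s3.copy_object(Bucket=bucket, CopySource={"Bucket": bucket, "Key": key}, Key=target)
        s3.delete_object(Bucket=bucket, Key=key)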

 

 

Migration of your Amazon CloudFront Access Logs

The deployment of the sample application contains a single S3 bucket called <StackName>-cf-access-logs. You can modify your existing Amazon CloudFront distribution configuration to deliver access logs to this bucket with the new/ prefix. Files are moved to the canonical file structure for Amazon Athena partitioning as soon as they are put into the bucket.

To migrate all previous access log files, copy them manually to the new/ folder in the bucket. For example, you can copy the files by using the AWS Command Line Interface (AWS CLI). These files are then treated the same way as newly delivered files.

Load the Partitions and query your Access Logs

Before you can query the access logs in your bucket with Amazon Athena, the AWS Glue Data Catalog needs the metadata. On deployment, the sample application creates a table with the definition of the schema and the location. The new table is created by adding the partitioning information to the CREATE TABLE statement from the Amazon CloudFront documentation (note the PARTITIONED BY clause):

CREATE EXTERNAL TABLE IF NOT EXISTS
    cf_access_logs.partitioned_gz (
         date DATE,
         time STRING,
         location STRING,
         bytes BIGINT,
         requestip STRING,
         method STRING,
         host STRING,
         uri STRING,
         status INT,
         referrer STRING,
         useragent STRING,
         querystring STRING,
         cookie STRING,
         resulttype STRING,
         requestid STRING,
         hostheader STRING,
         requestprotocol STRING,
         requestbytes BIGINT,
         timetaken FLOAT,
         xforwardedfor STRING,
         sslprotocol STRING,
         sslcipher STRING,
         responseresulttype STRING,
         httpversion STRING,
         filestatus STRING,
         encryptedfields INT 
)
PARTITIONED BY(
         year string,
         month string,
         day string,
         hour string )
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 's3://<StackName>-cf-access-logs/partitioned-gz/'
TBLPROPERTIES ( 'skip.header.line.count'='2');

You can load the partitions added so far by running the metastore check (msck) statement via the Amazon Athena query editor. It discovers the partition structure in S3 and adds partitions to the metastore.

msck repair table cf_access_logs.partitioned_gz

You are now ready for your first query on your data in the Amazon Athena query editor:

SELECT SUM(bytes) AS total_bytes
FROM cf_access_logs.partitioned_gz
WHERE year = '2017'
AND month = '10'
AND day = '01'
AND hour BETWEEN '00' AND '11';

This query does not specify the request date column of the table (called date in a previous example) but rather the columns used for partitioning. These columns depend on the request date, but the table definition does not specify this relationship. When you specify only the request date column, Amazon Athena scans every file, as there is no hint which files contain the relevant rows and which do not. By specifying the partition columns, Amazon Athena scans only a small subset of the total amount of Amazon CloudFront access log files. This optimizes both the performance and the cost of your queries. You can add further columns to the WHERE clause, such as the time, to further narrow down the results.

To save cost, consider narrowing the scope of partitions down to a minimum by also putting the partitioning columns into the WHERE clause. You validate the approach by observing the amount of data that was scanned in the query execution statistics for your queries. These statistics are also displayed in the Amazon Athena query editor after your statement has been run:

Adding Partitions continuously

As Amazon CloudFront continuously delivers new access log data for requests, new prefixes for partitions are created in S3. However, Amazon Athena only queries the files contained in the known partitions, that is, partitions that have already been added to the metastore. That’s why periodically triggering the msck command would not be the best solution. First, it is a time-consuming operation, since Amazon Athena scans all S3 paths to validate and load your partitions. More importantly, this way you only add partitions that already have data delivered. Thus, there is a time period when the data exists in S3 but is not yet visible to Amazon Athena queries.

The sample application solves this by adding the partition for each hour in advance because partitions are just dependent on the request time. This way Amazon Athena scans files as soon as they exist in S3. A scheduled AWS Lambda function runs a statement like this:

ALTER TABLE cf_access_logs.partitioned_gz
ADD IF NOT EXISTS 
PARTITION (
    year = '2017',
    month = '10',
    day = '01',
    hour = '02' );

It can omit the specification of the canonical location attribute in this statement, as it is automatically derived from the column values.
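A sketch of such a scheduled function using boto3 and the Athena API follows; the query output location is an assumption:

import datetime
import boto3

athena = boto3.client("athena")
OUTPUT = "s3://<StackName>-cf-access-logs/athena-query-results/"  # assumption

def handler(event, context):
    # Create the partition for the upcoming hour before any data arrives.
    ts = datetime.datetime.utcnow() + datetime.timedelta(hours=1)
    ddl = (
        "ALTER TABLE cf_access_logs.partitioned_gz ADD IF NOT EXISTS PARTITION ("
        f"year = '{ts:%Y}', month = '{ts:%m}', day = '{ts:%d}', hour = '{ts:%H}');"
    )
    athena.start_query_execution(
        QueryString=ddl,
        ResultConfiguration={"OutputLocation": OUTPUT},
    )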

Conversion of the Access Logs to a Columnar Format

As mentioned previously, with columnar formats Amazon Athena skips scanning of data not relevant for a query resulting in less cost. Amazon Athena currently supports the columnar formats Apache ORC and Apache Parquet.

Key to the conversion is the Amazon Athena CREATE TABLE AS SELECT (CTAS) feature. A CTAS query creates a new table from the results of another SELECT query. Amazon Athena stores the data files created by the CTAS statement in a specified location in Amazon S3. You can use CTAS to aggregate or transform the data, and to convert it into columnar formats. The sample application uses CTAS to rewrite all logs hourly from the CSV format to the Apache Parquet format. The resulting data is then added to a single partitioned table (the target table).

Creating the Target Table in Apache Parquet Format

The target table is a slightly modified version of the partitioned_gz table. Besides a different location, it uses a Serializer/Deserializer (SerDe) configuration for Apache Parquet:

CREATE EXTERNAL TABLE `cf_access_logs.partitioned_parquet`(
  `date` date, 
  `time` string, 
  `location` string, 
  `bytes` bigint, 
  `requestip` string, 
  `method` string, 
  `host` string, 
  `uri` string, 
  `status` int, 
  `referrer` string, 
  `useragent` string, 
  `querystring` string, 
  `cookie` string, 
  `resulttype` string, 
  `requestid` string, 
  `hostheader` string, 
  `requestprotocol` string, 
  `requestbytes` bigint, 
  `timetaken` float, 
  `xforwardedfor` string, 
  `sslprotocol` string, 
  `sslcipher` string, 
  `responseresulttype` string, 
  `httpversion` string, 
  `filestatus` string, 
  `encryptedfields` int)
PARTITIONED BY ( 
  `year` string, 
  `month` string, 
  `day` string, 
  `hour` string)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  's3://<StackName>-cf-access-logs/partitioned-parquet'
TBLPROPERTIES (
  'has_encrypted_data'='false', 
  'parquet.compression'='SNAPPY')

Transformation to Apache Parquet by the CTAS Query

The sample application provides a scheduled AWS Lambda function transformPartition that runs a CTAS query on a single partition per run, taking one hour of data into account. The target location for the Apache Parquet files is the Apache Hive style path in the location of the partitioned_parquet table.

 

 

The files written to S3 are important, but the table in the AWS Glue Data Catalog for this data is just a by-product. Hence, the function drops the CTAS table immediately and creates the corresponding partition in the partitioned_parquet table instead.

CREATE TABLE cf_access_logs.ctas_2017_10_01_02
WITH ( format='PARQUET',
    external_location='s3://<StackName>-cf-access-logs/partitioned_parquet/year=2017/month=10/day=01/hour=02',
    parquet_compression = 'SNAPPY')
AS SELECT *
FROM cf_access_logs.partitioned_gz
WHERE year = '2017'
    AND month = '10'
    AND day = '01'
    AND hour = '02';

DROP TABLE cf_access_logs.ctas_2017_10_01_02;

ALTER TABLE cf_access_logs.partitioned_parquet
ADD IF NOT EXISTS 
PARTITION (
    year = '2017',
    month = '10',
    day = '01',
    hour = '02' );

The statement should be run as soon as new data is written. Amazon CloudFront usually delivers the log file for a time period to your Amazon S3 bucket within an hour of the events that appear in the log. The sample application schedules the transformPartition function hourly to transform the data for the hour before the previous hour.

Some or all log file entries for a time period can occasionally be delayed by up to 24 hours. If you must mitigate this case, delete and recreate the affected partition after that period. Also, if you migrated partitions from previous Amazon CloudFront access logs, run the transformPartition function for each migrated partition, since the sample application only transforms continuously added files.

When all files of a gzip partition are converted to Apache Parquet, you can save cost by getting rid of data that you no longer need. Use S3 Lifecycle policies to archive the gzip files in a cheaper storage class, or delete them after a specific number of days.
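A lifecycle rule like the following sketch could transition the gzip partitions to a cheaper storage class and later delete them; the day counts are assumptions you should adapt to your retention requirements:

import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="<StackName>-cf-access-logs",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "ArchiveAndExpirePartitionedGz",
                "Filter": {"Prefix": "partitioned-gz/"},
                "Status": "Enabled",
                # Assumed retention: archive after 30 days, delete after 365.
                "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
                "Expiration": {"Days": 365},
            }
        ]
    },
)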

Query data over Multiple Tables

You now have two derived tables from the original Amazon CloudFront access log data:

  • partitioned_gz contains gzip compressed CSV files that are added as soon as new files are delivered.
  • Access logs in partitioned_parquet are written after one hour at the latest. A rough assumption is that the CTAS query takes a maximum of 15 minutes to transform a gzip partition. You must measure and confirm this assumption; depending on the data size, it can be much faster.

The following diagram shows how the complete view of all data is composed of the two tables. The last complete partition of Apache Parquet files ends before the current time minus the transformation duration and the duration until Amazon CloudFront delivers the access log files.

For convenience, the sample application creates the Amazon Athena view combined as a union of both tables. It includes an additional column called file, which holds the name of the file that stores the row.

CREATE OR REPLACE VIEW cf_access_logs.combined AS
SELECT *, "$path" AS file
FROM cf_access_logs.partitioned_gz
WHERE concat(year, month, day, hour) >=
       date_format(date_trunc('hour', (current_timestamp -
       INTERVAL '15' MINUTE - INTERVAL '1' HOUR)), '%Y%m%d%H')
UNION ALL SELECT *, "$path" AS file
FROM cf_access_logs.partitioned_parquet
WHERE concat(year, month, day, hour) <
       date_format(date_trunc('hour', (current_timestamp -
       INTERVAL '15' MINUTE - INTERVAL '1' HOUR)), '%Y%m%d%H')

Now you can query the data from the view, taking advantage of the columnar file partitions automatically. As mentioned before, add the partition columns (year, month, day, and hour) to your statement to limit the files that Amazon Athena scans.

SELECT SUM(bytes) AS total_bytes
FROM cf_access_logs.combined
WHERE year = '2017'
   AND month = '10'
   AND day = '01'

Summary

In this blog post, you learned how to optimize the cost and performance of your Amazon Athena queries in two steps. First, you divide the overall data into small partitions. This allows queries to run much faster by reducing the number of files to scan. Second, you convert each partition into a columnar format to reduce storage cost and increase the efficiency of scans by Amazon Athena.

The results of both steps are combined in a single view for convenient interactive queries by you or your application. All data is partitioned by the time of the request. Thus, this format is best suited for interactive drill-downs into your logs for which the columns are limited and the time range is known. This way, it complements the Amazon CloudFront reports, for example, by providing easy access to:

  • Data from more than 60 days in the past
  • The distribution of detailed HTTP status codes (for example, 200, 403, 404) on a certain day or hour
  • Statistics based on the URI paths
  • Statistics of objects that are not listed in Amazon CloudFront’s 50 most popular objects report
  • A drill down into the attributes of each request

We hope you find this blog post and the sample application useful for other types of time series data as well, beyond Amazon CloudFront access logs. Feel free to submit enhancements to the example application in the source repository or provide feedback in the comments.

 


About the Author

Steffen Grunwald is a senior solutions architect with Amazon Web Services. Supporting German enterprise customers on their journey to the cloud, he loves to dive deep into application architectures and development processes to drive performance, operational efficiency, and increase the speed of innovation.

 

 

 

 

Protecting your API using Amazon API Gateway and AWS WAF — Part 2

Post Syndicated from Chris Munns original https://aws.amazon.com/blogs/compute/protecting-your-api-using-amazon-api-gateway-and-aws-waf-part-2/

This post courtesy of Heitor Lessa, AWS Specialist Solutions Architect – Serverless

In Part 1 of this blog, we described how to protect your API provided by Amazon API Gateway using AWS WAF. In this post, we show how to use an API key between an Amazon CloudFront distribution and API Gateway to secure access to your API, in addition to the authorization (AuthZ) mechanism that you already have set up in API Gateway. For more information about AuthZ mechanisms in API Gateway, see Secure API Access with Amazon Cognito Federated Identities, Amazon Cognito User Pools, and Amazon API Gateway.

We also extend the AWS CloudFormation stack previously used, to automate the creation of the resources that this solution requires.

The following are alternative solutions to using an API key, depending on your security requirements:

  • Using a randomly generated HTTP secret header in CloudFront and verifying it with API Gateway request validation
  • Signing incoming requests with Lambda@Edge and verifying them with API Gateway Lambda authorizers

Requirements

To follow along, you need full permissions to create, update, and delete API Gateway, CloudFront, Lambda, and CloudWatch Events through AWS CloudFormation.

Extending the existing AWS CloudFormation stack

First, click here to download the full template. Then follow these steps to update the existing AWS CloudFormation stack:

  1. Go to the AWS Management Console and open the AWS CloudFormation console.
  2. Select the stack that you created in Part 1, right-click it, and select Update Stack.
  3. For option 2, choose Choose file and select the template that you downloaded.
  4. Fill in the required parameters as shown in the following image.

Here’s more information about these parameters:

  • API Gateway to send traffic to – We use the same API Gateway URL as in Part 1 except without the URL scheme (https://): cxm45444t9a.execute-api.us-east-2.amazonaws.com/prod
  • Rotating API Keys – We define Daily and use 2018-04-03 as the timestamp value to append to the API key name

Continue with the AWS CloudFormation console to complete the operation. It might take a couple of minutes to update the stack, because CloudFront takes time to propagate changes across all points of presence.

Enabling API Keys in the example Pet Store API

While the stack completes in the background, let’s enable the use of API Keys in the API that CloudFront will send traffic to.

  1. Go to the AWS Management Console and open the API Gateway console.
  2. Select the API that you created in Part 1 and choose Resources.
  3. Under /pets, choose GET and then choose Method Request.
  4. For API Key Required, choose the dropdown menu and choose true.
  5. To save this change, select the highlighted check mark as shown in the following image.

Next, we need to deploy these changes so that requests sent to /pets fail if an API key isn’t present.

  1. Choose Actions and select Deploy API.
  2. Choose the Deployment stage dropdown menu and select the stage you created in Part 1.
  3. Add a deployment description such as “Requires API Keys under /pets” and choose Deploy.

When the deployment succeeds, you’re redirected to the API Gateway Stage page. There you can use the Invoke URL to verify that a request without an API key fails, as in the sketch below.
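
As a minimal illustration, the following Python sketch compares a request without an API key to one that supplies it. The Invoke URL and key value are placeholders for your own:

import urllib.error
import urllib.request

# Placeholders - substitute your own Invoke URL and API key value.
INVOKE_URL = 'https://cxm45444t9a.execute-api.us-east-2.amazonaws.com/prod/pets'
API_KEY = 'REPLACE_WITH_YOUR_API_KEY'

def call(headers):
    request = urllib.request.Request(INVOKE_URL, headers=headers)
    try:
        with urllib.request.urlopen(request) as response:
            return response.status, response.read().decode()
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode()

print(call({}))                      # expect 403 Forbidden without a key
print(call({'x-api-key': API_KEY}))  # expect 200 and the PetStore JSON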

This failure is expected and proves that our deployed changes are working. Next, let’s try to access the same API but this time through our CloudFront distribution.

  1. From the AWS Management Console, open the AWS CloudFormation console.
  2. Select the stack that you created in Part 1 and choose Outputs at the bottom left.
  3. On the CFDistribution line, copy the URL. Before you paste it into a new browser tab or window, append ‘/pets’ to it.

As opposed to our first attempt without an API key, we receive a JSON response from the PetStore API. This is because CloudFront is injecting an API key before it forwards the request to the PetStore API. The following image demonstrates both of these tests:

  1. Successful request when accessing the API through CloudFront
  2. Unsuccessful request when accessing the API directly through its Invoke URL

This works as a secret between CloudFront and API Gateway, which could be any agreed random secret that can be rotated like an API key. However, it’s important to know that API keys are a feature for tracking or metering API consumers’ usage. They are not a secure authorization mechanism and should therefore be used only in conjunction with an API Gateway authorizer.

Rotating API keys

API keys are automatically rotated based on the schedule (e.g., daily or monthly) that you chose when updating the AWS CloudFormation stack. This requires no maintenance or intervention on your part. In this section, we explain how this process works under the hood and what you can do if you want to manually trigger an API key rotation.

The AWS CloudFormation template that we downloaded and used to update our stack builds on Part 1 by doing the following.

Introduce a Timestamp parameter that is appended to the API key name

Parameters:
  Timestamp:
    Type: String
    Description: Fill in this format <Year>-<Month>-<Day>
    Default: 2018-04-02

Create an API Gateway API key and usage plan, associate the new key with the API given as a parameter, and configure the CloudFront distribution to send a custom header when forwarding traffic to API Gateway:

CFDistribution:
  Type: AWS::CloudFront::Distribution
  Properties:
    DistributionConfig:
      Logging:
        IncludeCookies: 'false'
        Bucket: !Sub ${S3BucketAccessLogs}.s3.amazonaws.com
        Prefix: cloudfront-logs
      Enabled: 'true'
      Comment: API Gateway Regional Endpoint Blog post
      Origins:
        -
          Id: APIGWRegional
          DomainName: !Select [0, !Split ['/', !Ref ApiURL]]
          CustomOriginConfig:
            HTTPPort: 443
            OriginProtocolPolicy: https-only
          OriginCustomHeaders:
            - 
              HeaderName: x-api-key
              HeaderValue: !Ref ApiKey
              ...

ApiUsagePlan:
  Type: AWS::ApiGateway::UsagePlan
  Properties:
    Description: CloudFront usage only
    UsagePlanName: CloudFront_only
    ApiStages:
      - 
        ApiId: !Select [0, !Split ['.', !Ref ApiURL]]
        Stage: !Select [1, !Split ['/', !Ref ApiURL]]

ApiKey: 
  Type: "AWS::ApiGateway::ApiKey"
  Properties: 
    Name: !Sub "CloudFront-${Timestamp}"
    Description: !Sub "CloudFormation API Key ${Timestamp}"
    Enabled: true

ApiKeyUsagePlan:
  Type: "AWS::ApiGateway::UsagePlanKey"
  Properties:
    KeyId: !Ref ApiKey
    KeyType: API_KEY
    UsagePlanId: !Ref ApiUsagePlan

As shown in the ApiKey resource, we append the given Timestamp to Name as well as use it in the API Gateway usage plan key resource. This means that whenever the Timestamp parameter changes, AWS CloudFormation triggers a resource replacement and updates every resource that depends on that API key. In this case, that includes the Amazon CloudFront configuration and the API Gateway usage plan.

But what does the rotation schedule that you chose at the beginning of this blog mean in this example?

Create a scheduled activity to trigger a Lambda function on a given schedule

Parameters:
...
  ApiKeyRotationSchedule: 
    Description: Schedule to rotate API Keys e.g. Daily, Monthly, Bimonthly basis
    Type: String
    Default: Daily
    AllowedValues:
      - Daily
      - Fortnightly
      - Monthly
      - Bimonthly
      - Quarterly
    ConstraintDescription: Must be any of the available options

Mappings: 

  ScheduleMap: 
    CloudwatchEvents: 
      Daily: "rate(1 day)"
      Fortnightly: "rate(14 days)"
      Monthly: "rate(30 days)"
      Bimonthly: "rate(60 days)"
      Quarterly: "rate(90 days)"

Resources:
...
  RotateApiKeysScheduledJob: 
    Type: "AWS::Events::Rule"
    Properties: 
      Description: "ScheduledRule"
      ScheduleExpression: !FindInMap [ScheduleMap, CloudwatchEvents, !Ref ApiKeyRotationSchedule]
      State: "ENABLED"
      Targets: 
        - 
          Arn: !GetAtt RotateApiKeysFunction.Arn
          Id: "RotateApiKeys"

The resource RotateApiKeysScheduledJob shows that the schedule that you selected through a dropdown menu when updating the AWS CloudFormation stack is actually converted to a CloudWatch Events rule. This in turn triggers a Lambda function that is defined in the same template.

RotateApiKeysFunction:
      Type: "AWS::Lambda::Function"
      Properties:
        Handler: "index.lambda_handler"
        Role: !GetAtt RotateApiKeysFunctionRole.Arn
        Runtime: python3.6
        Environment:
          Variables:
            StackName: !Ref "AWS::StackName"
        Code:
          ZipFile: !Sub |
            import datetime
            import os

            import boto3
            from botocore.exceptions import ClientError

            session = boto3.Session()
            cfn = session.client('cloudformation')
            
            timestamp = datetime.date.today()            
            params = {
                'StackName': os.getenv('StackName'),
                'UsePreviousTemplate': True,
                'Capabilities': ["CAPABILITY_IAM"],
                'Parameters': [
                    {
                      'ParameterKey': 'ApiURL',
                      'UsePreviousValue': True
                    },
                    {
                      'ParameterKey': 'ApiKeyRotationSchedule',
                      'UsePreviousValue': True
                    },
                    {
                      'ParameterKey': 'Timestamp',
                      'ParameterValue': str(timestamp)
                    },
                ],                
            }

            def lambda_handler(event, context):
              """ Updates CloudFormation Stack with a new timestamp and returns CloudFormation response"""
              try:
                  response = cfn.update_stack(**params)
              except ClientError as err:
                  if "No updates are to be performed" in err.response['Error']['Message']:
                      return {"message": err.response['Error']['Message']}
                  else:
                      raise Exception("An error happened while updating the stack: {}".format(err))          
  
              return response

All this Lambda function does is trigger an AWS CloudFormation stack update via API (exactly what you did through the console but programmatically) and updates the Timestamp parameter. As a result, it rotates the API key and the CloudFront distribution configuration.

This gives you enough flexibility to change the API key rotation schedule at any time without maintaining or writing any code. You can also manually update the stack and rotate the keys by updating the AWS CloudFormation stack’s Timestamp parameter.

Next Steps

We hope you found the information in this blog helpful. You can use it to understand how to create a mechanism to allow traffic only from CloudFront to API Gateway and avoid bypassing the AWS WAF rules that Part 1 set up.

Keep the following important notes in mind about this solution:

  • It assumes that you already have a strong AuthZ mechanism, managed by API Gateway, to control access to your API.
  • The API Gateway usage plan and other resources created in this solution work only for APIs created in the same account (the ApiUrl parameter).
  • If you already use API keys for tracking API usage, consider either of the following solutions as a replacement:
    • Use a random HTTP header value in the CloudFront origin configuration and use API Gateway request validation to verify it instead of an API key alone.
    • Combine Lambda@Edge and an API Gateway custom authorizer to sign and verify incoming requests using a shared secret known only to the two. This is a more advanced technique.

Introducing Amazon API Gateway Private Endpoints

Post Syndicated from Chris Munns original https://aws.amazon.com/blogs/compute/introducing-amazon-api-gateway-private-endpoints/

One of the biggest trends in application development today is the use of APIs to power the backend technologies supporting a product. Increasingly, the way mobile, IoT, web applications, or internal services talk to each other and to application frontends is using some API interface.

Alongside this trend of building API-powered applications is the move to a microservices application design pattern. A larger application is represented by many smaller application components, also typically communicating via API. The combined use of APIs and microservices is growing across all sorts of companies, from startups up through enterprises. The number of tools required to manage APIs at scale, securely, and with minimal operational overhead is growing as well.

Today, we’re excited to announce the launch of Amazon API Gateway private endpoints. This has been one of the most heavily requested features for this service. We believe this is going to make creating and managing private APIs even easier.

API Gateway overview

When API Gateway first launched, it came with what are now known as edge-optimized endpoints. These publicly facing endpoints came fronted with Amazon CloudFront, a global content delivery network with over 100 points of presence today.

Edge-optimized endpoints helped you reduce latency to clients accessing your API on the internet from anywhere; typically, mobile, IoT, or web-based applications. Behind API Gateway, you could back your API with a number of options for backend technologies: AWS Lambda, Amazon EC2, Elastic Load Balancing products such as Application Load Balancers or Classic Load Balancers, Amazon DynamoDB, Amazon Kinesis, or any publicly available HTTPS-based endpoint.

In February 2016, AWS launched the ability for AWS Lambda functions to access resources inside of an Amazon VPC. With this launch, you could build API-based services that did not require a publicly available endpoint. They could still interact with private services, such as databases, inside your VPC.

In November 2017, API Gateway launched regional API endpoints, which are publicly available endpoints without any preconfigured CDN in front of them. Regional endpoints are great for helping to reduce request latency when API requests originate from the same Region as your REST API. You can also configure your own CDN distribution, which allows you to protect your public APIs with AWS WAF, for example. With regional endpoints, nothing changed about the backend technologies supported.

At re:Invent 2017, we announced endpoint integrations inside a private VPC. With this capability, you can now have your backend running on EC2 be private inside your VPC without the need for a publicly accessible IP address or load balancer. Beyond that, you can also now use API Gateway to front APIs hosted by backends that exist privately in your own data centers, using AWS Direct Connect links to your VPC. Private integrations were made possible via VPC Link and Network Load Balancers, which support backends such as EC2 instances, Auto Scaling groups, and Amazon ECS using the Fargate launch type.

Combined with the other capabilities of API Gateway—such as Lambda authorizers, resource policies, canary deployments, SDK generation, and integration with Amazon Cognito User Pools—you’ve been able to build publicly available APIs, with nearly any backend you could want, securely, at scale, and with minimal operations overhead.

Private endpoints

Today’s launch solves one of the missing pieces of the puzzle, which is the ability to have private API endpoints inside your own VPC. With this new feature, you can still use API Gateway features, while securely exposing REST APIs only to the other services and resources inside your VPC, or those connected via Direct Connect to your own data centers.

Here’s how this works.

API Gateway private endpoints are made possible via AWS PrivateLink interface VPC endpoints. Interface endpoints work by creating elastic network interfaces in subnets that you define inside your VPC. Those network interfaces then provide access to services running in other VPCs, or to AWS services such as API Gateway. When configuring your interface endpoints, you specify which service traffic should go through them. When using private DNS, all traffic to that service is directed to the interface endpoint instead of through a default route, such as through a NAT gateway or public IP address.

API Gateway, as a fully managed service, runs its infrastructure in its own VPCs. When you interact with its publicly accessible endpoints, you do so over public networks. When endpoints are configured as private, those public networks are not available to route your API. Instead, your API can be accessed only through the interface endpoints that you have configured.

Some things to note:

  • Because you configure the subnets in which your endpoints are made available, you control the availability of access to your API Gateway hosted APIs. Make sure that you provide interfaces in multiple subnets across Availability Zones in your VPC. In the diagram above, there is one endpoint in each subnet in each Availability Zone for which the VPC is configured.
  • Each endpoint is an elastic network interface configured in your VPC that has security groups configured. Network ACLs apply to the network interface as well.

For more information about endpoint limits, see Interface VPC Endpoints.

Setting up a private endpoint

Getting up and running with your private API Gateway endpoint requires just a few things (a scripted sketch follows this list):

  • A virtual private cloud (VPC) configured with at least one subnet and DNS resolution enabled.
  • A VPC endpoint with the following configuration:
    • Service name = “com.amazonaws.{region}.execute-api”
    • Enable Private DNS Name = enabled
    • A security group set to allow TCP Port 443 inbound from either an IP range in your VPC or another security group in your VPC
  • An API Gateway managed API with the following configuration:
    • Endpoint Type = “Private”
    • An API Gateway resource policy that allows access to your API from the VPC endpoint
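
If you prefer to script this instead of using the console, a boto3 sketch of the endpoint creation looks like the following. All IDs are hypothetical placeholders:

import boto3

ec2 = boto3.client('ec2', region_name='us-east-2')

# Placeholders - substitute the IDs from your own VPC.
response = ec2.create_vpc_endpoint(
    VpcEndpointType='Interface',
    VpcId='vpc-0123456789abcdef0',
    ServiceName='com.amazonaws.us-east-2.execute-api',
    SubnetIds=['subnet-aaaa1111', 'subnet-bbbb2222'],
    SecurityGroupIds=['sg-0123456789abcdef0'],
    PrivateDnsEnabled=True  # resolve execute-api DNS names to the endpoint
)
print(response['VpcEndpoint']['VpcEndpointId'])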

Create the VPC

To create a VPC using AWS CloudFormation, choose Launch stack.

This VPC has two private and two public subnets, one of each per Availability Zone, as seen in CloudFormation Designer.

  1. Name the stack “PrivateAPIDemo”.
  2. Set the Environment to “Demo”. This has no real effect beyond tagging and naming certain resources accordingly.
  3. Choose Next.
  4. On the Options page, leave all of the defaults and choose Next.
  5. On the Review page, choose Create. It takes just a few moments for all of the resources in this template to be created.
  6. After the VPC has a status of “CREATE_COMPLETE”, choose Outputs and make note of the values for VpcId, both public and private subnets 1 and 2, and the endpoint security group.

Create the VPC endpoint for API Gateway

  1. Open the Amazon VPC console.
  2. Make sure that you are in the same Region in which you just created the above stack.
  3. In the left navigation pane, choose Endpoints, Create Endpoint.
  4. For Service category, keep it set to “AWS Services”.
  5. For Service Name, set it to “com.amazonaws.{region}.execute-api”.
  6. For VPC, select the one created earlier.
  7. For Subnets, select the two private labeled subnets from this VPC created earlier, one in each Availability Zone. You can find them labeled as “privateSubnet01” and “privateSubnet02”.
  8. For Enable Private DNS Name, keep it checked as Enabled for this endpoint.
  9. For Security Group, select the group named “EndpointSG”. It allows for HTTPS access to the endpoint for the entire VPC IP address range.
  10. Choose Create Endpoint.

Creating the endpoint takes a few moments as it goes through all of the interface endpoint lifecycle steps. You need the DNS names later, so note them now.

Create the API

Follow the Pet Store example in the API Gateway documentation:

  1. Open the API Gateway console in the same Region as the VPC and private endpoint.
  2. Choose Create API, Example API.
  3. For Endpoint Type, choose Private.
  4. Choose Import.

Before deploying the API, create a resource policy to allow access to the API from inside the VPC.
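
The Source VPC Whitelist example used below produces a resource policy of roughly the following shape. This sketch also shows how it could be applied with boto3 instead of the console; the API ID, account ID, and endpoint ID are hypothetical placeholders:

import json

import boto3

apigw = boto3.client('apigateway', region_name='us-east-2')

# Placeholders - use your own API, account, and VPC endpoint IDs.
policy = {
    'Version': '2012-10-17',
    'Statement': [{
        'Effect': 'Allow',
        'Principal': '*',
        'Action': 'execute-api:Invoke',
        'Resource': 'arn:aws:execute-api:us-east-2:123456789012:abcdef1234/*',
        'Condition': {'StringEquals': {'aws:SourceVpce': 'vpce-0123456789abcdef0'}}
    }]
}

# Attach the policy to the API, mirroring the console's Resource Policy editor.
apigw.update_rest_api(
    restApiId='abcdef1234',
    patchOperations=[{'op': 'replace', 'path': '/policy', 'value': json.dumps(policy)}]
)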

  1. In the left navigation pane, choose Resource Policy.
  2. Choose Source VPC Whitelist from the three available examples.
  3. Replace {{vpceID}} with the ID of your VPC endpoint.
  4. Choose Save.
  5. In the left navigation pane, select the new API and choose Actions, Deploy API.
    1. Choose [New Stage].
    2. Name the stage demo.
    3. Choose Deploy.

Your API is now fully deployed and available from inside your VPC. Next, test to confirm that it’s working.

Test the API

To emphasize the “privateness” of this API, test it from a resource that only lives inside your VPC and has no direct network access to it, in the traditional networking sense.

Launch a Lambda function inside the VPC, with no public access. To show its ability to hit the private API endpoint, invoke it using the console. The function is launched inside the private subnets inside the VPC without access to a NAT gateway, which would be required for any internet access. This works because Lambda functions are invoked using the service API, not any direct network access to the function’s underlying resources inside your VPC.
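
The template bundles its own function code. As an illustration only, a minimal Python handler that calls the private endpoint could look like the following sketch; the endpoint DNS name, API ID, and stage are hypothetical, and the template's actual code and runtime may differ:

import http.client
import json

# Placeholders - use your VPC endpoint DNS name and your API's host name.
VPCE_DNS = 'vpce-0123456789abcdef0-abcdefgh.execute-api.us-east-2.vpce.amazonaws.com'
API_HOST = 'abcdef1234.execute-api.us-east-2.amazonaws.com'

def lambda_handler(event, context):
    # Connect to the interface endpoint; the Host header tells API Gateway
    # which private API the request is meant for.
    connection = http.client.HTTPSConnection(VPCE_DNS, 443)
    connection.request('GET', '/demo/pets', headers={'Host': API_HOST})
    response = connection.getresponse()
    body = response.read().decode()
    connection.close()
    return {'statusCode': response.status, 'body': json.loads(body)}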

To create a Lambda function using CloudFormation, choose Launch stack.

All the code for this function is located inside of the template and the template creates just three resources, as shown in the diagram from Designer:

  • A Lambda function
  • An IAM role
  • A VPC security group
  1. Name the stack LambdaTester, or something easy to remember.
  2. For the first parameter, enter a DNS name from your VPC endpoint. You can find these in the Amazon VPC console under Endpoints. For this example, use the endpoints that start with “vpce”; these are their private DNS names. For the API Gateway endpoint DNS, see the dashboard for your API Gateway API and copy the URL from the top of the page. Use just the endpoint DNS, not the “https://” prefix or the “/demo/” at the end.
  3. Select the same value for Environment as you did earlier in creating your VPC.
  4. Choose Next.
  5. Leave all options as the default values and choose Next.
  6. Select the check box next to I acknowledge that… and choose Create.
  7. When your stack reaches the “CREATE_COMPLETE” state, choose Resources.
  8. To go to the Lambda console for this function, choose the Physical ID of the AWS::Lambda::Function resource.

Note: If you chose a different environment than “Demo” for this example, modify the line “path: ‘/demo/pets’,” to the appropriate value.

  1. Choose Test in the top right of the Lambda console. You are prompted to create a test event to pass to the function. Because the function doesn’t need any input to call the internal API, you can create a blank payload or leave the default as shown. Choose Save.
  2. Choose Test again. This invokes the function and passes in the payload that you just saved. It takes just a few moments for the new function’s environment to spin up and run the code configured for it. You should now see the results of the API call to the PetStore API.

The JSON returned is from your API Gateway powered private API endpoint. Visit the API Gateway console to see activity on the dashboard and confirm again that this API was called by the Lambda function, as in the following screenshot:

Cleanup

Cleaning up from this demo requires a few simple steps:

  1. Delete the stack for your Lambda function.
  2. Delete the VPC endpoint.
  3. Delete the API Gateway API.
  4. Delete the VPC stack that you created first.

Conclusion

API Gateway private endpoints enable use cases for building private API-based services inside your own VPCs. You can now keep both the frontend to your API (API Gateway) and the backend service (Lambda, EC2, ECS, etc.) private inside your VPC. Or you can expose your APIs to networks connected via Direct Connect without exposing them to the internet in any way. All of this without the need to manage the infrastructure that powers the API gateway itself!

You can continue to use the advanced features of API Gateway such as custom authorizers, Amazon Cognito User Pools integration, usage tiers, throttling, deployment canaries, and API keys.

We believe that this feature greatly simplifies the growth of API-based microservices. We look forward to your feedback here, on social media, or in the AWS forums.

Protecting your API using Amazon API Gateway and AWS WAF — Part I

Post Syndicated from Chris Munns original https://aws.amazon.com/blogs/compute/protecting-your-api-using-amazon-api-gateway-and-aws-waf-part-i/

This post courtesy of Thiago Morais, AWS Solutions Architect

When you build web applications or expose any data externally, you probably look for a platform where you can build highly scalable, secure, and robust REST APIs. As APIs are publicly exposed, there are a number of best practices for providing a secure mechanism to consumers using your API.

Amazon API Gateway handles all the tasks involved in accepting and processing up to hundreds of thousands of concurrent API calls, including traffic management, authorization and access control, monitoring, and API version management.

In this post, I show you how to take advantage of the regional API endpoint feature in API Gateway, so that you can create your own Amazon CloudFront distribution and secure your API using AWS WAF.

AWS WAF is a web application firewall that helps protect your web applications from common web exploits that could affect application availability, compromise security, or consume excessive resources.

As you make your APIs publicly available, you are exposed to attackers trying to exploit your services in several ways. The AWS security team published a whitepaper solution using AWS WAF, How to Mitigate OWASP’s Top 10 Web Application Vulnerabilities.

Regional API endpoints

Edge-optimized APIs are endpoints that are accessed through a CloudFront distribution created and managed by API Gateway. Before the launch of regional API endpoints, this was the default option when creating APIs using API Gateway. It primarily helped to reduce latency for API consumers that were located in different geographical locations than your API.

When API requests predominantly originate from an Amazon EC2 instance or other services within the same AWS Region as the API is deployed, a regional API endpoint typically lowers the latency of connections. It is recommended for such scenarios.

For better control around caching strategies, customers can use their own CloudFront distribution for regional APIs. They also have the ability to use AWS WAF protection, as I describe in this post.

Edge-optimized API endpoint

The following diagram is an illustrated example of the edge-optimized API endpoint where your API clients access your API through a CloudFront distribution created and managed by API Gateway.

Regional API endpoint

For the regional API endpoint, your customers access your API from the same Region in which your REST API is deployed. This helps you to reduce request latency and particularly allows you to add your own content delivery network, as needed.

Walkthrough

In this section, you implement the following steps:

  • Create a regional API using the PetStore sample API.
  • Create a CloudFront distribution for the API.
  • Test the CloudFront distribution.
  • Set up AWS WAF and create a web ACL.
  • Attach the web ACL to the CloudFront distribution.
  • Test AWS WAF protection.

Create the regional API

For this walkthrough, use an existing PetStore API. All new APIs launch by default as the regional endpoint type. To change the endpoint type for your existing API, choose the cog icon on the top right corner:

After you have created the PetStore API on your account, deploy a stage called “prod” for the PetStore API.

On the API Gateway console, select the PetStore API and choose Actions, Deploy API.

For Stage name, type prod and add a stage description.

Choose Deploy and the new API stage is created.

Use the following AWS CLI command to update your API from edge-optimized to regional:

aws apigateway update-rest-api \
--rest-api-id {rest-api-id} \
--patch-operations op=replace,path=/endpointConfiguration/types/EDGE,value=REGIONAL

A successful response looks like the following:

{
    "description": "Your first API with Amazon API Gateway. This is a sample API that integrates via HTTP with your demo Pet Store endpoints", 
    "createdDate": 1511525626, 
    "endpointConfiguration": {
        "types": [
            "REGIONAL"
        ]
    }, 
    "id": "{api-id}", 
    "name": "PetStore"
}

After you change your API endpoint to regional, you can now assign your own CloudFront distribution to this API.

Create a CloudFront distribution

To make things easier, I have provided an AWS CloudFormation template to deploy a CloudFront distribution pointing to the API that you just created. Click the button to deploy the template in the us-east-1 Region.

For Stack name, enter RegionalAPI. For APIGWEndpoint, enter your API FQDN in the following format:

{api-id}.execute-api.us-east-1.amazonaws.com

After you fill out the parameters, choose Next to continue the stack deployment. It takes a couple of minutes to finish. After it completes, the Outputs tab lists the following items:

  • A CloudFront domain URL
  • An S3 bucket for CloudFront access logs

Output from CloudFormation

Test the CloudFront distribution

To see if the CloudFront distribution was configured correctly, use a web browser and enter the URL from your distribution, with the following parameters:

https://{your-distribution-url}.cloudfront.net/{api-stage}/pets

You should get the following output:

[
  {
    "id": 1,
    "type": "dog",
    "price": 249.99
  },
  {
    "id": 2,
    "type": "cat",
    "price": 124.99
  },
  {
    "id": 3,
    "type": "fish",
    "price": 0.99
  }
]

Set up AWS WAF and create a web ACL

With the new CloudFront distribution in place, you can now start setting up AWS WAF to protect your API.

For this demo, you deploy the AWS WAF Security Automations solution, which provides fine-grained control over the requests attempting to access your API.

For more information about deployment, see Automated Deployment. If you prefer, you can launch the solution directly into your account using the following button.

For CloudFront Access Log Bucket Name, add the name of the bucket created during the deployment of the CloudFormation stack for your CloudFront distribution.

The solution allows you to adjust thresholds and also choose which automations to enable to protect your API. After you finish configuring these settings, choose Next.

To start the deployment process in your account, follow the creation wizard and choose Create. It takes a few minutes to finish the deployment. You can follow the creation process through the CloudFormation console.

After the deployment finishes, you can see the new web ACL deployed on the AWS WAF console, AWSWAFSecurityAutomations.

Attach the AWS WAF web ACL to the CloudFront distribution

With the solution deployed, you can now attach the AWS WAF web ACL to the CloudFront distribution that you created earlier.

To assign the newly created AWS WAF web ACL, go back to your CloudFront distribution. After you open your distribution for editing, choose General, Edit.

Select the new AWS WAF web ACL that you created earlier, AWSWAFSecurityAutomations.

Save the changes to your CloudFront distribution and wait for the deployment to finish.

Test AWS WAF protection

To validate the AWS WAF Web ACL setup, use Artillery to load test your API and see AWS WAF in action.

To install Artillery on your machine, run the following command:

$ npm install -g artillery

After the installation completes, you can check if Artillery installed successfully by running the following command:

$ artillery -V
$ 1.6.0-12

At the time of publication, Artillery is on version 1.6.0-12.

One of the AWS WAF web ACL rules that you have set up is a rate-based rule. By default, it is set up to block any requester that exceeds 2000 requests within a 5-minute period. Try it out.
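
For context, a rate-based rule of this shape can be created through the classic AWS WAF API. The following boto3 sketch shows the idea; the rule and metric names are hypothetical, and the solution's CloudFormation template creates its own resources:

import boto3

# The 'waf' client is the classic, global AWS WAF API used with CloudFront.
waf = boto3.client('waf')

# Every mutating call to classic AWS WAF requires a fresh change token.
token = waf.get_change_token()['ChangeToken']

rule = waf.create_rate_based_rule(
    Name='HTTPFloodRule',        # hypothetical name
    MetricName='HTTPFloodRule',
    RateKey='IP',                # requests are counted per originating IP
    RateLimit=2000,              # block IPs exceeding 2000 requests per 5 minutes
    ChangeToken=token
)
print(rule['RateBasedRule']['RuleId'])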

First, use cURL to query your distribution and see the API output:

$ curl -s https://{distribution-name}.cloudfront.net/prod/pets
[
  {
    "id": 1,
    "type": "dog",
    "price": 249.99
  },
  {
    "id": 2,
    "type": "cat",
    "price": 124.99
  },
  {
    "id": 3,
    "type": "fish",
    "price": 0.99
  }
]

Based on the test above, the result looks good. But what happens if you exceed the limit of 2000 requests in under 5 minutes?

Run the following Artillery command:

artillery quick -n 2000 --count 10  https://{distribution-name}.cloudfront.net/prod/pets

What you are doing is firing 2000 requests to your API from 10 concurrent users. For brevity, I am not posting the Artillery output here.

After Artillery finishes its execution, try to run the cURL request again and see what happens:

$ curl -s https://{distribution-name}.cloudfront.net/prod/pets

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<HTML><HEAD><META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<TITLE>ERROR: The request could not be satisfied</TITLE>
</HEAD><BODY>
<H1>ERROR</H1>
<H2>The request could not be satisfied.</H2>
<HR noshade size="1px">
Request blocked.
<BR clear="all">
<HR noshade size="1px">
<PRE>
Generated by cloudfront (CloudFront)
Request ID: [removed]
</PRE>
<ADDRESS>
</ADDRESS>
</BODY></HTML>

As you can see from the output above, the request was blocked by AWS WAF. Your IP address is removed from the blocked list after it falls below the request limit rate.

Conclusion

In this first part, you saw how to use the new API Gateway regional API endpoint together with Amazon CloudFront and AWS WAF to secure your API from a series of attacks.

In the second part, I will demonstrate some other techniques to protect your API using API keys and Amazon CloudFront custom headers.

Enhanced Domain Protections for Amazon CloudFront Requests

Post Syndicated from Colm MacCarthaigh original https://aws.amazon.com/blogs/security/enhanced-domain-protections-for-amazon-cloudfront-requests/

Over the coming weeks, we’ll be adding enhanced domain protections to Amazon CloudFront. The short version is this: the new measures are designed to ensure that requests handled by CloudFront are handled on behalf of legitimate domain owners.

Using CloudFront to receive traffic for a domain you aren’t authorized to use is already a violation of our AWS Terms of Service. When we become aware of this type of activity, we deal with it behind the scenes by disabling abusive accounts. Now we’re integrating checks directly into the CloudFront API and Content Distribution service, as well.

Enhanced Protection against Dangling DNS entries
To use CloudFront with your domain, you must configure your domain to point at CloudFront. You may use a traditional CNAME, or an Amazon Route 53 “ALIAS” record.

A problem can arise if you delete your CloudFront distribution, but leave your DNS still pointing at CloudFront, popularly known as a “dangling” DNS entry. Thankfully, this is very rare, as the domain will no longer work, but we occasionally see customers who leave their old domains dormant. This can also happen if you leave this kind of “dangling” DNS entry pointing at other infrastructure you no longer control. For example, if you leave a domain pointing at an IP address that you don’t control, then there is a risk that someone may come along and “claim” traffic destined for your domain.

In an even more rare set of circumstances, an abuser can exploit a subdomain of a domain that you are actively using. For example, if a customer left “images.example.com” dangling and pointing to a deleted CloudFront distribution which is no longer in use, but they still actively use the parent domain “example.com”, then an abuser could come along and register “images.example.com” as an alternative name on their own distribution and claim traffic that they aren’t entitled to. This also means that cookies may be set and intercepted for HTTP traffic potentially including the parent domain. HTTPS traffic remains protected if you’ve removed the certificate associated with the original CloudFront distribution.

Of course, the best fix for this kind of risk is not to leave dangling DNS entries in the first place. Earlier in February 2018, we added a new warning to our systems. With this warning, if you remove an alternate domain name from a distribution, you are reminded to delete any DNS entries that may still be pointing at CloudFront.

We also have long-standing checks in the CloudFront API that ensure this kind of domain claiming can’t occur when you are using wildcard domains. If you attempt to add *.example.com to your CloudFront distribution, but another account has already registered www.example.com, then the attempt will fail.

With the new enhanced domain protection, CloudFront will now also check your DNS whenever you remove an alternate domain. If we determine that the domain is still pointing at your CloudFront distribution, the API call will fail and no other accounts will be able to claim this traffic in the future.
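
You can also audit your own zones for records like this. As a rough illustration, assuming the third-party dnspython package and a hypothetical list of hostnames, a script along these lines flags records that still point at CloudFront so that you can verify each one:

import dns.resolver  # third-party package: pip install dnspython

# Hypothetical list - feed in the hostnames from your own zone data.
HOSTNAMES = ['images.example.com', 'static.example.com']

for name in HOSTNAMES:
    try:
        answer = dns.resolver.resolve(name, 'CNAME')
    except (dns.resolver.NoAnswer, dns.resolver.NXDOMAIN):
        continue
    for record in answer:
        target = str(record.target).rstrip('.')
        if target.endswith('.cloudfront.net'):
            # Confirm that one of your live distributions still claims this
            # name; otherwise the record is dangling and should be deleted.
            print('{} -> {}: verify this distribution'.format(name, target))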

Enhanced Protection against Domain Fronting
CloudFront will also soon be implementing enhanced protections against so-called “Domain Fronting”. Domain fronting is when a non-standard client makes a TLS/SSL connection to a certain name but then makes an HTTPS request for an unrelated name. For example, the TLS connection may connect to “www.example.com” but then issue a request for “www.example.org”.

In certain circumstances this is normal and expected. For example, browsers can re-use persistent connections for any domain that is listed in the same SSL Certificate, and these are considered related domains. But in other cases, tools including malware can use this technique between completely unrelated domains to evade restrictions and blocks that can be imposed at the TLS/SSL layer.

To be clear, this technique can’t be used to impersonate domains. The clients are non-standard and are working around the usual TLS/SSL checks that ordinary clients impose. But clearly, no customer ever wants to find that someone else is masquerading as their innocent, ordinary domain. Although these cases are also already handled as a breach of our AWS Terms of Service, in the coming weeks we will be checking that the account that owns the certificate we serve for a particular connection always matches the account that owns the request we handle on that connection. As ever, the security of our customers is our top priority, and we will continue to provide enhanced protection against misconfigurations and abuse from unrelated parties.

Interested in additional AWS Security news? Follow the AWS Security Blog on Twitter.

Give Your WordPress Blog a Voice With Our New Amazon Polly Plugin

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/give-your-wordpress-blog-a-voice-with-our-new-amazon-polly-plugin/

I first told you about Polly in late 2016 in my post Amazon Polly – Text to Speech in 47 Voices and 24 Languages. After that AWS re:Invent launch, we added support for Korean, five new voices, and made Polly available in all Regions in the aws partition. We also added whispering, speech marks, a timbre effect, and dynamic range compression.

New WordPress Plugin
Today we are launching a WordPress plugin that uses Polly to create high-quality audio versions of your blog posts. You can access the audio from within the post or in podcast form using a feature that we call Amazon Pollycast! Both options make your content more accessible and can help you to reach a wider audience. This plugin was a joint effort between the AWS team and our friends at AWS Advanced Technology Partner WP Engine.

As you will see, the plugin is easy to install and configure. You can use it with installations of WordPress that you run on your own infrastructure or on AWS. Either way, you have access to all of Polly’s voices along with a wide variety of configuration options. The generated audio (an MP3 file for each post) can be stored alongside your WordPress content, or in Amazon Simple Storage Service (S3), with optional support for content distribution via Amazon CloudFront.

Installing the Plugin
I don’t have an existing WordPress-powered blog, so I begin by launching a Lightsail instance using the WordPress 4.8.1 blueprint:

Then I follow these directions to access my login credentials:

Credentials in hand, I log in to the WordPress Dashboard:

The plugin makes calls to AWS and needs credentials in order to do so. I hop over to the IAM console and create a new policy. The policy allows the plugin to access a carefully selected set of S3 and Polly functions (find the full policy in the README):

Then I create an IAM user (wp-polly-user). I enter the name and indicate that it will be used for Programmatic Access:

Then I attach the policy that I just created, and click on Review:

I review my settings (not shown) and then click on Create User. Then I copy the two values (Access Key ID and Secret Access Key) into a secure location. Possession of these keys allows the bearer to make calls to AWS so I take care not to leave them lying around.

Now I am ready to install the plugin! I go back to the WordPress Dashboard and click on Add New in the Plugins menu:

Then I click on Upload Plugin and locate the ZIP file that I downloaded from the WordPress Plugins site. After I find it I click on Install Now to proceed:

WordPress uploads and installs the plugin. Now I click on Activate Plugin to move ahead:

With the plugin installed, I click on Settings to set it up:

I enter my keys and click on Save Changes:

The General settings let me control the sample rate, voice, player position, the default setting for new posts, and the autoplay option. I can leave all of the settings as-is to get started:

The Cloud Storage settings let me store audio in S3 and to use CloudFront to distribute the audio:

The Amazon Pollycast settings give me control over the iTunes parameters that are included in the generated RSS feed:

Finally, the Bulk Update button lets me regenerate all of the audio files after I change any of the other settings:

With the plugin installed and configured, I can create a new post. As you can see, the plugin can be enabled and customized for each post:

I can see how much it will cost to convert to audio with a click:

When I click on Publish, the plugin breaks the text into multiple blocks on sentence boundaries, calls the Polly SynthesizeSpeech API for each block, and accumulates the resulting audio in a single MP3 file. The published blog post references the file using the <audio> tag. Here’s the post:

I can’t seem to use an <audio> tag in this post, but you can download and play the MP3 file yourself if you’d like.
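
Under the hood, each block is a call to Polly’s SynthesizeSpeech API. The following boto3 sketch shows the same idea in miniature; the sentence splitting is deliberately naive, the voice choice is arbitrary, and the plugin’s actual PHP implementation differs:

import boto3

polly = boto3.client('polly')

post_text = 'Polly converts text to lifelike speech. It supports dozens of voices.'
# Naive sentence split - the plugin's real segmentation is more careful.
sentences = [s.strip() + '.' for s in post_text.split('.') if s.strip()]

with open('post-audio.mp3', 'wb') as output:
    for sentence in sentences:
        response = polly.synthesize_speech(
            Text=sentence,
            OutputFormat='mp3',
            VoiceId='Joanna'  # arbitrary choice; any Polly voice works
        )
        # MP3 frames can simply be appended to build a single file.
        output.write(response['AudioStream'].read())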

The Pollycast feature generates an RSS file with links to an MP3 file for each post:

Pricing
The plugin will make calls to Amazon Polly each time the post is saved or updated. Pricing is based on the number of characters in the speech requests, as described on the Polly Pricing page. Also, the AWS Free Tier lets you process up to 5 million characters per month at no charge, for a period of one year that starts when you make your first call to Polly.
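
As a rough worked example, at the standard rate of $4.00 per 1 million characters (check the Polly Pricing page for current rates), a 3,000-character post costs about 3,000 / 1,000,000 × $4.00 ≈ $0.012 to synthesize, and the free tier covers more than 1,600 posts of that size per month.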

Going Further
The plugin is available on GitHub in source code form and we are looking forward to your pull requests! Here are a couple of ideas to get you started:

Voice Per Author – Allow selection of a distinct Polly voice for each author.

Quoted Text – For blogs that make frequent use of embedded quotes, use a distinct voice for the quotes.

Translation – Use Amazon Translate to translate the texts into another language, and then use Polly to generate audio in that language.

Other Blogging Engines – Build a similar plugin for your favorite blogging engine.

SSML Support – Figure out an interesting way to use Polly’s SSML tags to add additional character to the audio.

Let me know what you come up with!

Jeff;